Automation#

ProDock automation workflow

One end-to-end entry point

Run preparation, docking, postprocessing, and optional database export through one Python function or one CLI command.

JSON-first workflow

Define campaigns in one all-in-one config or split them across config, receptor, and ligand JSON files.

Override-ready CLI

Keep stable defaults in JSON and override engines, compute settings, paths, and workflow flags directly from the terminal.

Reproducible effective config

Validate merged inputs, print the final effective config, and write it to disk for debugging and reproducibility.

Automation architecture#

The automation layer is the highest-level workflow entry point in ProDock. It connects the lower-level stages into one reproducible campaign:

  • receptor preparation or prepared receptor reuse,

  • ligand preparation or prepared ligand reuse,

  • single- or multi-engine docking,

  • optional interaction extraction,

  • optional SQLite export.

The same workflow is exposed through two interfaces:

  • prodock() for Python usage,

  • prodock --config ... or python -m prodock --config ... for CLI usage.

Configuration patterns#

The CLI supports two main input patterns:

  • all-in-one JSON — one file contains project directory, receptor input, ligand input, and run options

  • split JSON — the main config contains project/run options, while receptor and ligand definitions are passed separately through --receptor-json and --ligand-json

This makes the automation layer suitable both for simple tutorial runs and for larger projects where receptor and ligand collections are maintained separately.
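The relationship between the two layouts can be sketched in a few lines of Python. This is only an illustration of how an all-in-one payload maps onto the three split files; `split_campaign` and `write_split` are hypothetical helpers, not part of ProDock, and the file names simply mirror the examples used on this page.

```python
import json
from pathlib import Path


def split_campaign(all_in_one: dict) -> dict:
    """Map an all-in-one campaign dict onto the three split-layout payloads."""
    receptor = {"receptors": all_in_one["receptors"]}
    ligand = {"ligands": all_in_one["ligands"]}
    # Everything else (project_dir, config, ...) stays in the main config.
    main = {k: v for k, v in all_in_one.items() if k not in ("receptors", "ligands")}
    return {"config.json": main, "receptor.json": receptor, "ligand.json": ligand}


def write_split(all_in_one: dict, out_dir: str) -> list:
    """Write the three payloads into out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, payload in split_campaign(all_in_one).items():
        path = out / name
        path.write_text(json.dumps(payload, indent=2))
        written.append(path)
    return written
```

The sketch only handles the raw `receptors` and `ligands` keys, because those are the only keys this page shows inside the split files.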

Override precedence#

Configuration values are resolved in this order, with each later source overriding the earlier ones:

  1. --config provides the base payload

  2. --receptor-json replaces any receptors embedded in --config

  3. --ligand-json replaces any ligands embedded in --config

  4. explicit CLI flags override all JSON-derived values

This means you can keep a stable base config and vary engines, compute settings, or workflow flags from the terminal without editing the JSON files.
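The precedence rules can be modelled as a small merge function. The sketch below is not ProDock's implementation; `merge_effective_config` is a hypothetical name, and it assumes that CLI flags map onto keys of the "config" block (for example --cpu onto "cpu") and that an unset flag is represented as None.

```python
def merge_effective_config(base, receptor_json=None, ligand_json=None, cli_flags=None):
    """Illustrate the documented precedence: base < receptor/ligand JSON < CLI flags."""
    # 1. --config provides the base payload (copied so the input is not mutated).
    effective = {k: v for k, v in base.items() if k != "config"}
    effective["config"] = dict(base.get("config", {}))

    # 2. --receptor-json replaces receptors embedded in the base config.
    if receptor_json is not None:
        effective["receptors"] = receptor_json["receptors"]

    # 3. --ligand-json replaces ligands embedded in the base config.
    if ligand_json is not None:
        effective["ligands"] = ligand_json["ligands"]

    # 4. Explicit CLI flags win over every JSON-derived value.
    #    Assumption: None means "flag not given", so the JSON value is kept.
    for key, value in (cli_flags or {}).items():
        if value is not None:
            effective["config"][key] = value
    return effective
```

Keeping "flag not given" distinct from "flag set to a falsy value" is the important design point: it lets a negative flag such as --no-save-to-database override a true value in the JSON.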

Input modes#

After config merging, the final workflow must contain exactly one receptor mode and exactly one ligand mode.

Supported receptor modes:

  • receptors for raw receptor specifications

  • prepared_receptors for already prepared receptor inputs

Supported ligand modes:

  • ligands for inline ligand dictionaries

  • ligand_dir for a directory of prepared ligand files

This keeps the high-level workflow unambiguous while still supporting both raw and preprocessed campaigns.
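The "exactly one mode" rule is easy to check before launching a run. The following is a minimal sketch of such a check, not ProDock's own validator; `check_input_modes` is a hypothetical helper, and it assumes that an empty list or empty string counts as "not provided".

```python
RECEPTOR_MODES = ("receptors", "prepared_receptors")
LIGAND_MODES = ("ligands", "ligand_dir")


def check_input_modes(effective: dict) -> tuple:
    """Return (receptor_mode, ligand_mode) or raise if either is ambiguous or missing."""

    def pick(modes, label):
        # Assumption: empty values are treated the same as absent keys.
        present = [mode for mode in modes if effective.get(mode)]
        if len(present) != 1:
            raise ValueError(
                f"expected exactly one {label} mode out of {modes}, found {present}"
            )
        return present[0]

    return pick(RECEPTOR_MODES, "receptor"), pick(LIGAND_MODES, "ligand")
```

For example, a merged config containing "receptors" and "ligand_dir" yields `("receptors", "ligand_dir")`, while one containing both "receptors" and "prepared_receptors" raises ValueError.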

All-in-one JSON#

Single-file campaign config

Keep receptors, ligands, and run options together in one JSON file.

The simplest automation layout is one file containing everything:

{
  "project_dir": "Demo",
  "receptors": [
    {
      "pdb_id": "4WKQ",
      "receptor_name": "EGFR_4WKQ",
      "ligand_code": "IRE",
      "chains": ["A"],
      "cofactors": []
    }
  ],
  "ligands": [
    {
      "id": "erlotinib",
      "smiles": "COCCOc1cc2c(ncnc2cc1OCCOC)Nc1cccc(c1)C#C"
    },
    {
      "id": "gefitinib",
      "smiles": "COc1cc2ncnc(c2cc1OCCCN1CCOCC1)Nc1ccc(c(c1)Cl)F"
    }
  ],
  "config": {
    "engines": ["qvina", "qvina-w"],
    "extract_interaction": true,
    "save_to_database": true,
    "db_name": "demo.db",
    "cpu": 8,
    "n_jobs": 8,
    "exhaustiveness": 16,
    "n_poses": 20
  }
}

Run it with either form:

prodock --config run.json
python -m prodock --config run.json

This is the most convenient pattern for tutorials, notebooks, and compact project runs.
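When the ligand list comes from a script or notebook, the all-in-one file can be generated rather than written by hand. The sketch below builds the same structure as the JSON above from a mapping of ligand IDs to SMILES; `build_campaign` is a hypothetical helper, and the only key names it uses are the ones shown in the example.

```python
import json


def build_campaign(project_dir, receptors, smiles_by_id, **run_options):
    """Assemble an all-in-one campaign dict in the layout shown above."""
    ligands = [{"id": lig_id, "smiles": smi} for lig_id, smi in smiles_by_id.items()]
    return {
        "project_dir": project_dir,
        "receptors": receptors,
        "ligands": ligands,
        "config": run_options,
    }


if __name__ == "__main__":
    campaign = build_campaign(
        "Demo",
        receptors=[{
            "pdb_id": "4WKQ",
            "receptor_name": "EGFR_4WKQ",
            "ligand_code": "IRE",
            "chains": ["A"],
            "cofactors": [],
        }],
        smiles_by_id={"erlotinib": "COCCOc1cc2c(ncnc2cc1OCCOC)Nc1cccc(c1)C#C"},
        engines=["qvina", "qvina-w"],
        cpu=8,
    )
    # Write the file that `prodock --config run.json` would read.
    with open("run.json", "w") as handle:
        json.dump(campaign, handle, indent=2)
```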

Split JSON input#

Split config, receptor, and ligand files

Keep project settings separate from receptor and ligand collections.

For larger campaigns, receptor and ligand definitions can live in separate files.

Example config.json:

{
  "project_dir": "Demo",
  "config": {
    "engines": ["qvina", "qvina-w"],
    "extract_interaction": true,
    "save_to_database": true,
    "db_name": "demo.db",
    "cpu": 8,
    "n_jobs": 8,
    "exhaustiveness": 16,
    "n_poses": 20
  }
}

Example receptor.json:

{
  "receptors": [
    {
      "pdb_id": "4WKQ",
      "receptor_name": "EGFR_4WKQ",
      "ligand_code": "IRE",
      "chains": ["A"],
      "cofactors": []
    }
  ]
}

Example ligand.json:

{
  "ligands": [
    {
      "id": "erlotinib",
      "smiles": "COCCOc1cc2c(ncnc2cc1OCCOC)Nc1cccc(c1)C#C"
    },
    {
      "id": "gefitinib",
      "smiles": "COc1cc2ncnc(c2cc1OCCCN1CCOCC1)Nc1ccc(c(c1)Cl)F"
    }
  ]
}

Run with split inputs:

prodock \
  --config config.json \
  --receptor-json receptor.json \
  --ligand-json ligand.json

If --receptor-json or --ligand-json is not supplied, the CLI falls back to the receptors or ligands embedded in --config.

Python automation#

prodock()

Run the same automated workflow directly from Python.

Use prodock() to run the whole workflow, from preparation through optional database export, in a single Python call.

Minimal run:

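from prodock import prodock

# Assumed input shape: the same dictionaries as the "receptors" and "ligands"
# arrays in the JSON examples above.
RECEPTORS = [{"pdb_id": "4WKQ", "receptor_name": "EGFR_4WKQ", "ligand_code": "IRE", "chains": ["A"], "cofactors": []}]
LIGANDS = [{"id": "erlotinib", "smiles": "COCCOc1cc2c(ncnc2cc1OCCOC)Nc1cccc(c1)C#C"}]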

result = prodock(
    "Quick_Run",
    receptors=RECEPTORS,
    ligands=LIGANDS,
)

print(result.campaign_json)

Interaction-aware run:

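from prodock import prodock

# RECEPTORS and LIGANDS are lists of dicts, assumed to have the same shape as
# the "receptors" and "ligands" arrays in the JSON examples.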

result = prodock(
    "Interaction_Run",
    receptors=RECEPTORS,
    ligands=LIGANDS,
    engines=["qvina", "qvina-w"],
    extract_interaction=True,
)

print(result.pose_df.head())
print(result.merged_df.head())

Database-focused run:

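from prodock import prodock

# RECEPTORS and LIGANDS are lists of dicts, assumed to have the same shape as
# the "receptors" and "ligands" arrays in the JSON examples.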

result = prodock(
    "Database_Run",
    receptors=RECEPTORS,
    ligands=LIGANDS,
    engines=["qvina", "qvina-w"],
    extract_interaction=True,
    save_to_database=True,
    db_name="results.db",
)

print(result.db_path)
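Once a run has produced a database, a quick sanity check is to list its tables and row counts. The sketch below uses only the standard-library sqlite3 module and makes no assumption about ProDock's schema; `table_row_counts` is a hypothetical helper that works on any SQLite file.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def table_row_counts(db_path) -> dict:
    """Return {table_name: row_count} for every user table in a SQLite file."""
    path = Path(db_path)
    if not path.is_file():
        # sqlite3.connect() would otherwise silently create an empty database.
        raise FileNotFoundError(path)
    with closing(sqlite3.connect(str(path))) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Names are read from the database itself, so double-quoting is enough.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
```

After a database-focused run, `table_row_counts(result.db_path)` would show which tables were written and how many rows each holds.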

Prepared-input modes#

Prepared receptors and ligand directories

Skip preprocessing when docking-ready inputs already exist.

Automation also supports already prepared inputs.

Prepared receptor mode:

{
  "project_dir": "DemoPrepared",
  "prepared_receptors": [
    {
      "receptor_id": "4WKQ",
      "receptor_pdbqt": "prepared/4WKQ/4WKQ.pdbqt",
      "center": [5.0, 10.0, 12.0],
      "size": [20.0, 20.0, 20.0]
    }
  ],
  "ligands": [
    {
      "id": "erlotinib",
      "smiles": "COCCOc1cc2c(ncnc2cc1OCCOC)Nc1cccc(c1)C#C"
    }
  ],
  "config": {
    "engines": ["qvina"],
    "save_to_database": true
  }
}

Ligand directory mode:

{
  "project_dir": "DemoLigandDir",
  "receptors": [
    {
      "pdb_id": "4WKQ",
      "receptor_name": "EGFR_4WKQ",
      "ligand_code": "IRE",
      "chains": ["A"],
      "cofactors": []
    }
  ],
  "ligand_dir": "prepared_ligands",
  "config": {
    "engines": ["qvina", "vina"],
    "extract_interaction": false
  }
}

Python prepared-input example:

from prodock import prodock

result = prodock(
    "Prepared_Run",
    prepared_receptors=[
        {
            "receptor_id": "4WKQ",
            "receptor_pdbqt": "Prepared_Run/4WKQ/filtered_protein/4WKQ.pdbqt",
            "center": (2.865, 193.257, 21.367),
            "size": (27.091, 27.091, 27.091),
        }
    ],
    ligand_dir="Prepared_Run/ligands",
    engines=["qvina"],
    extract_interaction=True,
    save_to_database=True,
    db_name="prepared.db",
)

print(result.db_path)

CLI overrides#

Override config values from the terminal

Keep stable defaults in JSON and vary runtime behavior with CLI flags.

Override engines and compute settings:

prodock \
  --config config.json \
  --receptor-json receptor.json \
  --ligand-json ligand.json \
  --engines qvina smina vina \
  --cpu 8 \
  --n-jobs 8 \
  --exhaustiveness 16 \
  --n-poses 20

Boolean workflow flags support both positive and negative forms:

prodock --config run.json --progress
prodock --config run.json --no-progress

prodock --config run.json --extract-interaction
prodock --config run.json --no-extract-interaction

prodock --config run.json --save-to-database
prodock --config run.json --no-save-to-database

prodock --config run.json --replace
prodock --config run.json --no-replace
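Paired --flag / --no-flag options like these are commonly built with argparse.BooleanOptionalAction (Python 3.9+). The sketch below shows the pattern for the four flags listed above; it illustrates the technique and is not taken from ProDock's source. A default of None is used so that "flag not given" stays distinguishable from an explicit false.

```python
import argparse

PAIRED_FLAGS = ("progress", "extract-interaction", "save-to-database", "replace")


def build_parser() -> argparse.ArgumentParser:
    """Create a demo parser in which every flag also accepts a --no- form."""
    parser = argparse.ArgumentParser(prog="flags-demo")
    for name in PAIRED_FLAGS:
        # BooleanOptionalAction registers both --name and --no-name.
        parser.add_argument(
            f"--{name}", action=argparse.BooleanOptionalAction, default=None
        )
    return parser


def flag_overrides(argv) -> dict:
    """Return only the flags that were given explicitly on the command line."""
    namespace = build_parser().parse_args(argv)
    return {key: value for key, value in vars(namespace).items() if value is not None}
```

With this parser, `flag_overrides(["--no-progress", "--save-to-database"])` contains only the two flags that were passed, so untouched JSON defaults are left alone.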

Interaction-focused overrides:

prodock \
  --config run.json \
  --extract-interaction \
  --interaction-batch-size 8 \
  --interaction-n-jobs 4 \
  --interaction-progress \
  --include-fingerprint-columns \
  --include-interaction-events

Use the InteractionProfiler backend explicitly:

prodock \
  --config run.json \
  --extract-interaction \
  --use-interaction-profiler

Database-focused overrides:

prodock \
  --config run.json \
  --save-to-database \
  --db-name demo.db \
  --replace \
  --replace-interactions
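When campaigns are launched from scripts, the override flags can be assembled from a dictionary rather than typed by hand. The sketch below is a hypothetical helper (`build_cli_args` is not part of ProDock) that assumes each option name maps to a flag by replacing underscores with hyphens. Note that this page shows --no- forms only for progress, extract-interaction, save-to-database, and replace, so false values should be limited to those keys.

```python
import shlex


def build_cli_args(config_path: str, overrides: dict) -> list:
    """Translate a dict of overrides into a prodock argument list."""
    args = ["prodock", "--config", config_path]
    for key, value in overrides.items():
        flag = key.replace("_", "-")
        if isinstance(value, bool):
            # Booleans become --flag or --no-flag.
            args.append(f"--{flag}" if value else f"--no-{flag}")
        elif isinstance(value, (list, tuple)):
            # Multi-value options such as --engines take space-separated values.
            args.append(f"--{flag}")
            args.extend(str(item) for item in value)
        else:
            args.extend([f"--{flag}", str(value)])
    return args


if __name__ == "__main__":
    command = build_cli_args(
        "run.json",
        {"engines": ["qvina", "vina"], "cpu": 8, "extract_interaction": True},
    )
    # Print a copy-pasteable shell command; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```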

Validation and reproducibility#

Validate, inspect, and save the merged config

Check the final resolved workflow before running the campaign.

The CLI can validate merged inputs without running docking:

prodock \
  --config config.json \
  --receptor-json receptor.json \
  --ligand-json ligand.json \
  --validate-only

Print the final merged effective configuration:

prodock \
  --config config.json \
  --receptor-json receptor.json \
  --ligand-json ligand.json \
  --print-effective-config

Write the effective configuration to disk:

prodock \
  --config config.json \
  --receptor-json receptor.json \
  --ligand-json ligand.json \
  --effective-config-json effective.json

Write a compact run summary:

prodock \
  --config run.json \
  --summary-json summary.json

Show a traceback on errors:

prodock --config run.json --traceback

These options are especially useful for debugging configuration merges, keeping reproducible campaign records, and capturing exactly what was executed.
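Saved effective configs are most useful when two runs can be compared. The sketch below diffs two JSON files key by key without assuming anything about their schema; `flatten` and `diff_configs` are hypothetical helpers, and lists are compared as whole values.

```python
import json

MISSING = "<missing>"


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value}; lists are kept as leaf values."""
    if isinstance(obj, dict):
        flat = {}
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
        return flat
    return {prefix.rstrip("."): obj}


def diff_configs(old: dict, new: dict) -> dict:
    """Return {dotted_key: (old_value, new_value)} for every key that differs."""
    a, b = flatten(old), flatten(new)
    return {
        key: (a.get(key, MISSING), b.get(key, MISSING))
        for key in sorted(a.keys() | b.keys())
        if a.get(key, MISSING) != b.get(key, MISSING)
    }


def diff_files(old_path: str, new_path: str) -> dict:
    """Load two effective-config JSON files and diff them."""
    with open(old_path) as old_file, open(new_path) as new_file:
        return diff_configs(json.load(old_file), json.load(new_file))
```

An empty result from `diff_files("effective_a.json", "effective_b.json")` indicates that both runs resolved to the same configuration; the two file names here are placeholders.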

Path resolution notes#

Relative paths inside each JSON file are resolved relative to the directory of that JSON file.

Relative paths passed to --summary-json and --effective-config-json are resolved relative to the directory that contains the main --config file.
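The rule "relative to the directory of the JSON file" corresponds to a short pathlib expression. The sketch below illustrates that rule only; `resolve_from_json` is a hypothetical helper, and ProDock's own resolution code may differ in details such as symlink handling.

```python
from pathlib import Path


def resolve_from_json(json_path: str, value: str) -> Path:
    """Resolve a path found in (or alongside) a JSON file against that file's directory."""
    candidate = Path(value)
    if candidate.is_absolute():
        # Absolute paths are used as given.
        return candidate
    return Path(json_path).parent / candidate
```

For instance, "prepared/4WKQ/4WKQ.pdbqt" inside /data/campaign/receptor.json would resolve to /data/campaign/prepared/4WKQ/4WKQ.pdbqt, regardless of the directory the command is launched from.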

Minimal end-to-end example#

Example

Run one complete automated campaign from split JSON files

prodock \
  --config config.json \
  --receptor-json receptor.json \
  --ligand-json ligand.json \
  --engines qvina qvina-w \
  --extract-interaction \
  --save-to-database \
  --db-name results.db \
  --effective-config-json effective.json \
  --summary-json summary.json

See also#

  • Core API — full reference for the high-level automation entry points

  • Preprocess — prepare inputs explicitly before docking

  • Dock — run single or batch docking directly

  • Postprocess — analyze docking outputs after the campaign

  • Database — store and query campaigns in SQLite