Structure

Structure intake

Retrieve, clean, and organize experimental protein structures for docking

The prodock.structure module converts raw experimental PDB entries into clean receptor and ligand artifacts for downstream docking workflows.

Workflow at a glance

  • Input (PDB entry): an experimental structure identifier, such as 1M17.

  • Step 1 (Fetch structure): download the PDB file and initialize a local receptor workspace.

  • Step 2 (Filter chains): retain only the selected receptor chains for downstream preparation.

  • Step 3 (Extract ligand): save the bound ligand as a reusable structural reference.

  • Step 4 (Clean structure): remove solvent residues while preserving selected cofactors.

  • Output (Filtered receptor): a clean receptor ready for preprocessing and docking.

  • Ligand output (Reference ligand): reference_ligand/<ligand_code>.sdf
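At the level of raw PDB records, Steps 2 and 3 amount to selecting lines by chain ID and by residue name. The sketch below is a hypothetical, dependency-free illustration of that idea, not ProDock's implementation: `keep_chains` and `split_ligand_records` are names invented here, the column positions follow the fixed-width PDB format (residue name in columns 18-20, chain ID in column 22), and writing the ligand out as SDF would additionally need a cheminformatics toolkit, which the sketch omits.

```python
# Hypothetical sketch of Steps 2 and 3 on fixed-width PDB records;
# this is not ProDock's implementation.
from typing import Iterable, List, Tuple

COORD_RECORDS = ("ATOM", "HETATM")


def keep_chains(lines: Iterable[str], chains: Iterable[str]) -> List[str]:
    """Keep ATOM/HETATM records whose chain ID (column 22) is in `chains`."""
    wanted = set(chains)
    # Other record types (HEADER, TER, END, ...) are dropped for brevity.
    return [
        line for line in lines
        if line.startswith(COORD_RECORDS) and line[21:22] in wanted
    ]


def split_ligand_records(
    lines: Iterable[str], ligand_code: str
) -> Tuple[List[str], List[str]]:
    """Split records into (ligand HETATMs, everything else) by residue name."""
    ligand: List[str] = []
    rest: List[str] = []
    for line in lines:
        # Residue name occupies columns 18-20 (0-based slice 17:20).
        if line.startswith("HETATM") and line[17:20].strip() == ligand_code:
            ligand.append(line)
        else:
            rest.append(line)
    return ligand, rest
```

Keeping the two steps as separate functions mirrors the page's ordering: chains are filtered first, so a ligand copy bound to an unselected chain never reaches the extraction step.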

Main objects

  • PDBEngine: step-wise backend engine for one receptor structure.

  • PDBQuery: thin public wrapper for single-entry and batch-oriented workflows.

  • PDBQTSanitizer: backend-aware validator and sanitizer for generated PDBQT files.

PDBEngine

PDBEngine is the main step-wise backend for preparing a PDB structure for downstream use.

It typically performs:

  • validation of runtime requirements and canonical output paths

  • fetching the structure into a local working directory

  • filtering the structure to selected chains

  • extracting the requested ligand as reference and co-crystal files

  • removing solvent residues while optionally preserving cofactors

  • saving the filtered protein structure
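The solvent and cofactor cleaning step can be pictured as a filter over HETATM records. The following is a minimal sketch under one assumed rule: every HETATM residue (water, ions, other hetero groups) is dropped unless its residue name appears in `cofactors`. ProDock's actual rule is not spelled out on this page, and `clean_hetero` is a hypothetical name. Note that this naive rule would also drop modified residues such as MSE, which the PDB records as HETATM.

```python
# Hypothetical sketch of solvent/cofactor cleaning on PDB records.
# Assumed rule: drop every HETATM residue unless it is a listed cofactor.
from typing import Iterable, List


def clean_hetero(lines: Iterable[str], cofactors: Iterable[str] = ()) -> List[str]:
    keep = {code.strip().upper() for code in cofactors}
    cleaned: List[str] = []
    for line in lines:
        if line.startswith("HETATM"):
            # Residue name occupies columns 18-20 (0-based slice 17:20).
            resname = line[17:20].strip().upper()
            if resname not in keep:
                continue  # water, ions, and unlisted hetero groups are removed
        cleaned.append(line)  # protein ATOM records and other lines pass through
    return cleaned
```

With `cofactors=[]`, as in the examples below, this sketch would keep no hetero groups at all.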

Typical example

from pathlib import Path
from prodock.structure.pdb_engine import PDBEngine

engine = (
    PDBEngine(
        pdb_id="1M17",
        base_out=Path("tutorial/1M17"),
        chains=["A"],
        ligand_code="AQ4",
        cofactors=[],
    )
    .run_all()
)

print(engine.filtered_path)
print(engine.ref_path)
print(engine.cocrystal_path)

Step-wise usage

from pathlib import Path
from prodock.structure.pdb_engine import PDBEngine

engine = PDBEngine(
    pdb_id="1M17",
    base_out=Path("tutorial/1M17"),
    chains=["A"],
    ligand_code="AQ4",
    cofactors=[],
)

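(
    engine.validate()                      # check runtime requirements and output paths
          .fetch()                         # download the PDB file
          .filter_chains()                 # keep only the selected chains
          .extract_ligand()                # write reference and co-crystal ligand files
          .clean_solvents_and_cofactors()  # remove solvent, keep listed cofactors
          .save_filtered_protein()         # write the filtered receptor structure
)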
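The chained call works because each step returns an object that exposes the next step, most naturally the engine itself. The toy class below is a hypothetical illustration of that fluent pattern, not PDBEngine: it only records step names, and the strict "each step requires the previous one" check is an assumption added to show why the order matters.

```python
# Hypothetical toy illustrating a fluent, step-wise engine; not PDBEngine.
from typing import List


class StepwiseToy:
    # Step order copied from the example above; enforcing it strictly is an assumption.
    ORDER = [
        "validate",
        "fetch",
        "filter_chains",
        "extract_ligand",
        "clean_solvents_and_cofactors",
        "save_filtered_protein",
    ]

    def __init__(self) -> None:
        self.completed: List[str] = []

    def _step(self, name: str) -> "StepwiseToy":
        idx = self.ORDER.index(name)
        if idx > 0 and self.ORDER[idx - 1] not in self.completed:
            raise RuntimeError(f"{name}() requires {self.ORDER[idx - 1]}() first")
        self.completed.append(name)
        return self  # returning self is what makes .a().b().c() chaining work

    def validate(self) -> "StepwiseToy":
        return self._step("validate")

    def fetch(self) -> "StepwiseToy":
        return self._step("fetch")

    def filter_chains(self) -> "StepwiseToy":
        return self._step("filter_chains")

    def extract_ligand(self) -> "StepwiseToy":
        return self._step("extract_ligand")

    def clean_solvents_and_cofactors(self) -> "StepwiseToy":
        return self._step("clean_solvents_and_cofactors")

    def save_filtered_protein(self) -> "StepwiseToy":
        return self._step("save_filtered_protein")

    def run_all(self) -> "StepwiseToy":
        # Convenience wrapper: call every step in order and return self.
        for name in self.ORDER:
            getattr(self, name)()
        return self
```

In this design, `run_all()` is simply the full chain written once, which is why the typical example and the step-wise example can end in the same state.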

Generated outputs

  • fetched_protein/<pdb_id>.pdb

  • filtered_protein/<pdb_id>.pdb

  • reference_ligand/<ligand_code>.sdf

  • cocrystal/<pdb_id>.sdf
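Assuming the paths above are resolved relative to `base_out` (this page does not say so explicitly), a small helper can predict where each artifact should land and report which ones are missing after a run. `expected_outputs` and `missing_outputs` are hypothetical helpers, not part of ProDock.

```python
# Hypothetical helpers; assumes the documented layout is relative to base_out.
from pathlib import Path
from typing import Dict, List, Union

PathLike = Union[str, Path]


def expected_outputs(base_out: PathLike, pdb_id: str, ligand_code: str) -> Dict[str, Path]:
    base = Path(base_out)
    return {
        "fetched": base / "fetched_protein" / f"{pdb_id}.pdb",
        "filtered": base / "filtered_protein" / f"{pdb_id}.pdb",
        "reference": base / "reference_ligand" / f"{ligand_code}.sdf",
        "cocrystal": base / "cocrystal" / f"{pdb_id}.sdf",
    }


def missing_outputs(paths: Dict[str, Path]) -> List[str]:
    # Report which expected artifacts are not on disk (dict order is preserved).
    return [key for key, path in paths.items() if not path.exists()]
```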

PDBQuery

PDBQuery is a thin public wrapper around PDBEngine. It preserves a compact, backward-compatible public API while delegating the actual preparation logic to the underlying engine.
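The delegation described above can be sketched with two toy classes. `EngineToy` and `QueryToy` are hypothetical stand-ins. The name mapping they use (`output_dir` to `base_out`, `filtered_protein_path` to `filtered_path`) mirrors the parameter and attribute names in this page's examples, but how PDBQuery maps them internally is an assumption.

```python
# Hypothetical sketch of a thin wrapper delegating to an engine; not ProDock code.
from pathlib import Path
from typing import Optional


class EngineToy:
    """Stand-in engine using engine-style names (base_out, filtered_path)."""

    def __init__(self, pdb_id: str, base_out: Path) -> None:
        self.pdb_id = pdb_id
        self.base_out = Path(base_out)
        self.filtered_path: Optional[Path] = None

    def run_all(self) -> "EngineToy":
        # A real engine would fetch, filter, and clean here; the toy only sets a path.
        self.filtered_path = self.base_out / "filtered_protein" / f"{self.pdb_id}.pdb"
        return self


class QueryToy:
    """Wrapper exposing wrapper-style names and forwarding all work to the engine."""

    def __init__(self, pdb_id: str, output_dir: str) -> None:
        self._engine = EngineToy(pdb_id=pdb_id, base_out=Path(output_dir))

    def run_all(self) -> "QueryToy":
        self._engine.run_all()
        return self

    @property
    def filtered_protein_path(self) -> Optional[Path]:
        # Read-only alias onto the engine attribute; no state is duplicated.
        return self._engine.filtered_path
```

Keeping the wrapper's attributes as read-only properties means the engine stays the single source of truth, so old attribute names can be preserved without copying state.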

Batch example

from prodock.structure.pdb_query import PDBQuery

PROJECT = "tutorial"

RECEPTORS = [
    {
        "pdb_id": "1M17",
        "receptor_name": "EGFR_1M17",
        "ligand_code": "AQ4",
        "chains": ["A"],
        "cofactors": [],
    },
    {
        "pdb_id": "2ITY",
        "receptor_name": "EGFR_2ITY",
        "ligand_code": "IRE",
        "chains": ["A"],
        "cofactors": [],
    },
    {
        "pdb_id": "4WKQ",
        "receptor_name": "EGFR_4WKQ",
        "ligand_code": "IRE",
        "chains": ["A"],
        "cofactors": [],
    },
]

PDBQuery.process_batch(RECEPTORS, output_dir=PROJECT)
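Before launching a batch, it can help to sanity-check the receptor records. The sketch below is a hypothetical pre-flight check, not part of ProDock. It uses the keys from the example above (which keys `process_batch` actually requires is not stated here), accepts only classic four-character PDB IDs, and treats duplicate `receptor_name` values as an error on the assumption that names should be unique.

```python
# Hypothetical pre-flight check for batch records; not part of ProDock.
import re
from typing import Any, Dict, List

REQUIRED_KEYS = ("pdb_id", "receptor_name", "ligand_code", "chains", "cofactors")
# Classic PDB IDs: a digit 1-9 followed by three alphanumerics (e.g. 1M17).
PDB_ID_RE = re.compile(r"[1-9][A-Za-z0-9]{3}")


def check_receptors(records: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the batch looks well-formed."""
    problems: List[str] = []
    seen = set()
    for i, rec in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in rec]
        if missing:
            problems.append(f"record {i}: missing keys {missing}")
            continue
        pdb_id = rec["pdb_id"]
        if not (isinstance(pdb_id, str) and PDB_ID_RE.fullmatch(pdb_id)):
            problems.append(f"record {i}: unexpected PDB ID {pdb_id!r}")
        if not isinstance(rec["chains"], list) or not rec["chains"]:
            problems.append(f"record {i}: chains must be a non-empty list")
        if rec["receptor_name"] in seen:
            problems.append(f"record {i}: duplicate receptor_name {rec['receptor_name']!r}")
        seen.add(rec["receptor_name"])
    return problems
```

Running `check_receptors(RECEPTORS)` on the list above should return an empty list.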

Single-receptor example

from prodock.structure import PDBQuery

pq = PDBQuery(
    pdb_id="1M17",
    output_dir="tutorial/1M17",
    chains=["A"],
    ligand_code="AQ4",
    cofactors=[],
)

pq.run_all()

print(pq.filtered_protein_path)
print(pq.reference_ligand_path)
print(pq.cocrystal_ligand_path)

PDBQTSanitizer

PDBQTSanitizer is a backend-aware validator and sanitizer for PDBQT files.

It is useful when a generated .pdbqt file contains non-canonical element fields, atom-type-like trailing tokens, or other formatting patterns that one or more docking backends may fail to parse.
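In a well-formed PDBQT file, the AutoDock atom type is the last field on each ATOM/HETATM line (columns 78-79), after the partial charge. A minimal detection sketch can therefore compare the final whitespace-separated token against a set of accepted types. `unknown_atom_types` and the type set below are illustrative assumptions; PDBQTSanitizer's actual checks, and the types each backend accepts, may differ.

```python
# Hypothetical detection sketch; not PDBQTSanitizer's implementation.
from typing import AbstractSet, Iterable, List, Tuple

# Illustrative subset of AutoDock 4 atom types; real backends accept more.
KNOWN_TYPES = {
    "C", "A", "N", "NA", "OA", "S", "SA", "H", "HD",
    "F", "Cl", "Br", "I", "P", "Mg", "Mn", "Zn", "Ca", "Fe",
}


def unknown_atom_types(
    lines: Iterable[str], known: AbstractSet[str] = KNOWN_TYPES
) -> List[Tuple[int, str]]:
    """Return (1-based line number, token) for ATOM/HETATM lines with an unexpected type."""
    flagged: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(("ATOM", "HETATM")):
            tokens = line.split()
            # The atom type should be the final field of a coordinate record.
            token = tokens[-1] if tokens else ""
            if token not in known:
                flagged.append((number, token))
    return flagged
```

A SYBYL-style token such as `N.ar` in the type position is the kind of "atom-type-like trailing token" this check would flag.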

Example

from prodock.structure import PDBQTSanitizer

sanitizer = PDBQTSanitizer("ligand.pdbqt", backend="meeko")
warnings = sanitizer.validate(strict=True)
sanitizer.sanitize(rebuild=True, aggressive=False)
sanitizer.write("ligand.sanitized.pdbqt")

In-place sanitization

from prodock.structure import PDBQTSanitizer

sanitizer = PDBQTSanitizer("ligand.pdbqt", backend="obabel")
sanitizer.sanitize_inplace(rebuild=True, aggressive=False, backup=True)
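An in-place rewrite with `backup=True` is safest when the original is copied aside before the file is replaced. The sketch below shows one common way to do that with the standard library (`shutil.copy2`, `tempfile.mkstemp`, `os.replace`). The `rewrite_inplace` helper and the `.bak` suffix are hypothetical, and PDBQTSanitizer may name or place its backup differently.

```python
# Hypothetical backup-then-replace helper; not PDBQTSanitizer's implementation.
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def rewrite_inplace(path: str, transform: Callable[[str], str], backup: bool = True) -> Path:
    """Apply `transform` to a text file, optionally keeping a .bak copy of the original."""
    src = Path(path)
    text = src.read_text()
    if backup:
        shutil.copy2(src, src.with_name(src.name + ".bak"))
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp = tempfile.mkstemp(dir=src.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(transform(text))
        shutil.copymode(src, tmp)  # keep the original file's permission bits
        os.replace(tmp, src)       # atomic rename on the same filesystem
    except Exception:
        os.unlink(tmp)             # do not leave a stray temp file behind
        raise
    return src
```

Writing to a temporary file first means a failure inside `transform` leaves the original file untouched.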

Summary

The structure module provides the earliest stage of a ProDock workflow:

  • PDBEngine for explicit step-wise structure preparation

  • PDBQuery for compact public and batch interfaces

  • PDBQTSanitizer for validating and normalizing generated PDBQT files