Structure
Retrieve, clean, and organize experimental protein structures for docking
The prodock.structure module converts raw experimental PDB entries into clean receptor and ligand artifacts for downstream docking workflows.
Workflow at a glance
- PDB entry: an experimental structure identifier such as 1M17.
- Fetch structure: download the PDB file and initialize a local receptor workspace.
- Filter chains: retain only the selected receptor chains for downstream preparation.
- Extract ligand: save the bound ligand as a reusable structural reference.
- Clean structure: remove solvent residues while preserving selected cofactors.
- Filtered receptor: a clean receptor ready for preprocessing and docking.
- Reference ligand: written to reference_ligand/<ligand_code>.sdf.
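To make the "Filter chains" step concrete: in the fixed-width PDB format, ATOM, HETATM and TER records store the chain identifier in column 22. The following is a minimal stand-alone sketch of chain filtering on raw PDB lines. `filter_chains_in_lines` is a hypothetical helper, not ProDock's implementation, and it ignores multi-model files and alternate locations. The example records are made up and truncated after the residue number for brevity.

```python
from typing import Iterable, List

# Record names (columns 1-6) that carry a chain ID in column 22.
CHAIN_RECORDS = {"ATOM  ", "HETATM", "TER   "}


def filter_chains_in_lines(lines: Iterable[str], chains: Iterable[str]) -> List[str]:
    """Keep ATOM/HETATM/TER records whose chain ID is in `chains`.

    All other records (HEADER, REMARK, END, ...) pass through unchanged.
    """
    wanted = set(chains)
    kept = []
    for line in lines:
        if line[:6] in CHAIN_RECORDS:
            chain_id = line[21:22]  # column 22 in 1-based PDB numbering
            if chain_id not in wanted:
                continue
        kept.append(line)
    return kept


pdb_lines = [
    "HEADER    TRANSFERASE",
    "ATOM      1  N   ALA A   1",
    "ATOM      2  N   ALA B   1",
    "END",
]
print(filter_chains_in_lines(pdb_lines, ["A"]))
```

Because non-coordinate records are passed through, header metadata survives the filter. A production tool would more likely use a structure parser than slice columns by hand.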
Main objects
- PDBEngine: step-wise backend engine for a single receptor structure.
- PDBQuery: thin public wrapper for single-entry and batch-oriented workflows.
- PDBQTSanitizer: backend-aware validator and sanitizer for generated PDBQT files.
PDBEngine
PDBEngine is the main step-wise backend for preparing a PDB structure for downstream use.
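A full preparation run performs the following steps in order. Each step is also available as its own method:
- validate(): check runtime requirements and resolve canonical output paths
- fetch(): download the structure into a local working directory
- filter_chains(): keep only the selected chains
- extract_ligand(): save the requested ligand as reference and co-crystal files
- clean_solvents_and_cofactors(): remove solvent residues, optionally preserving cofactors
- save_filtered_protein(): write the filtered protein structure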
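The cleaning step can be illustrated at the record level. Below is a minimal sketch, assuming that "solvent" means HETATM residues whose three-letter name (columns 18-20) appears in a solvent list, and that any name passed as a cofactor is always kept. `drop_solvent_records` and `DEFAULT_SOLVENTS` are hypothetical. The solvent set is illustrative, not ProDock's actual list.

```python
from typing import Iterable, List

# Illustrative only: water plus a few common crystallization additives.
DEFAULT_SOLVENTS = frozenset({"HOH", "WAT", "DOD", "SO4", "GOL", "EDO"})


def drop_solvent_records(lines: Iterable[str],
                         cofactors: Iterable[str] = (),
                         solvents: Iterable[str] = DEFAULT_SOLVENTS) -> List[str]:
    """Remove HETATM records whose residue name is a solvent, unless it is a cofactor."""
    keep = {name.upper() for name in cofactors}
    drop = {name.upper() for name in solvents} - keep  # cofactors override solvents
    kept = []
    for line in lines:
        residue = line[17:20].strip().upper()  # residue name, columns 18-20
        if line.startswith("HETATM") and residue in drop:
            continue
        kept.append(line)
    return kept


records = [
    "ATOM      1  N   ALA A   1",
    "HETATM    2  O   HOH A 301",
    "HETATM    3  S   SO4 A 302",
    "HETATM    4  C1  GOL A 303",
]
# Keep the sulfate as if it were a cofactor; water and glycerol are removed.
print(drop_solvent_records(records, cofactors=["SO4"]))
```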
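Typical example
from pathlib import Path
from prodock.structure.pdb_engine import PDBEngine
# Prepare chain A of EGFR (PDB 1M17) with its bound ligand AQ4 (erlotinib),
# running every preparation step in a single call.
engine = (
    PDBEngine(
        pdb_id="1M17",
        base_out=Path("tutorial/1M17"),
        chains=["A"],
        ligand_code="AQ4",
        cofactors=[],  # no cofactors to preserve
    )
    .run_all()
)
# Paths to the generated artifacts
print(engine.filtered_path)   # cleaned receptor
print(engine.ref_path)        # reference ligand
print(engine.cocrystal_path)  # co-crystal ligand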
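The reference and co-crystal files above hold the ligand taken from the deposited structure. At the record level, choosing that ligand amounts to selecting HETATM lines whose residue name matches the ligand code, optionally restricted to one chain. The sketch below shows only that selection. `select_ligand_records` is a hypothetical helper, and the records are made up. Writing a proper .sdf also requires bond orders, which PDB coordinates do not carry, so that part needs a chemistry toolkit and is outside this sketch.

```python
from typing import Iterable, List, Optional


def select_ligand_records(lines: Iterable[str], ligand_code: str,
                          chain: Optional[str] = None) -> List[str]:
    """Return HETATM records for residue `ligand_code`, optionally in one chain only."""
    code = ligand_code.upper()
    selected = []
    for line in lines:
        if not line.startswith("HETATM"):
            continue
        if line[17:20].strip().upper() != code:  # residue name, columns 18-20
            continue
        if chain is not None and line[21:22] != chain:  # chain ID, column 22
            continue
        selected.append(line)
    return selected


records = [
    "ATOM      1  N   ALA A   1",
    "HETATM    2  C1  AQ4 A   1",
    "HETATM    3  O   HOH A 301",
]
print(select_ligand_records(records, "AQ4", chain="A"))
```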
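Step-wise usage
from pathlib import Path
from prodock.structure.pdb_engine import PDBEngine
engine = PDBEngine(
    pdb_id="1M17",
    base_out=Path("tutorial/1M17"),
    chains=["A"],
    ligand_code="AQ4",
    cofactors=[],
)
# Each step returns the engine, which is what allows the calls to be chained.
(
    engine.validate()                 # check requirements and output paths
    .fetch()                          # download 1M17 into the workspace
    .filter_chains()                  # keep chain A only
    .extract_ligand()                 # write the reference and co-crystal AQ4 files
    .clean_solvents_and_cofactors()   # drop solvent; no cofactors kept here
    .save_filtered_protein()          # write the cleaned receptor
)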
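The chained calls above only work if every step method returns the engine itself. The toy class below is hypothetical and is not ProDock code. It shows that pattern and how a run_all-style method can be composed from the individual steps. It records which steps ran and does no real work.

```python
from typing import List


class ToyStepwiseEngine:
    """Hypothetical stand-in for a step-wise engine; it only records which steps ran."""

    def __init__(self) -> None:
        self.completed: List[str] = []

    def _step(self, name: str) -> "ToyStepwiseEngine":
        self.completed.append(name)
        return self  # returning self is what makes chained calls possible

    def validate(self) -> "ToyStepwiseEngine":
        return self._step("validate")

    def fetch(self) -> "ToyStepwiseEngine":
        return self._step("fetch")

    def filter_chains(self) -> "ToyStepwiseEngine":
        return self._step("filter_chains")

    def run_all(self) -> "ToyStepwiseEngine":
        # A run_all-style method can simply chain the individual steps.
        return self.validate().fetch().filter_chains()


engine = ToyStepwiseEngine().run_all()
print(engine.completed)  # ['validate', 'fetch', 'filter_chains']
```

Running steps individually, as in the step-wise example, lets you inspect intermediate files between calls. The one-shot form is more convenient for scripted runs.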
Generated outputs
- fetched_protein/<pdb_id>.pdb
- filtered_protein/<pdb_id>.pdb
- reference_ligand/<ligand_code>.sdf
- cocrystal/<pdb_id>.sdf
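A small helper can check which of these artifacts exist after a run. This assumes the four paths above are created directly under the output directory (base_out for PDBEngine), which the listing does not state explicitly. `expected_outputs` and `missing_outputs` are hypothetical helpers, and the dictionary keys are arbitrary labels.

```python
from pathlib import Path
from typing import Dict


def expected_outputs(base_out: Path, pdb_id: str, ligand_code: str) -> Dict[str, Path]:
    """Map each artifact to its path, assuming the layout sits under `base_out`."""
    base = Path(base_out)
    return {
        "fetched": base / "fetched_protein" / f"{pdb_id}.pdb",
        "filtered": base / "filtered_protein" / f"{pdb_id}.pdb",
        "reference_ligand": base / "reference_ligand" / f"{ligand_code}.sdf",
        "cocrystal": base / "cocrystal" / f"{pdb_id}.sdf",
    }


def missing_outputs(base_out: Path, pdb_id: str, ligand_code: str) -> Dict[str, Path]:
    """Return only the expected artifacts that are not present on disk."""
    outputs = expected_outputs(base_out, pdb_id, ligand_code)
    return {name: path for name, path in outputs.items() if not path.exists()}


for name, path in expected_outputs(Path("tutorial/1M17"), "1M17", "AQ4").items():
    print(f"{name}: {path}")
```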
PDBQuery
PDBQuery is a thin public wrapper around PDBEngine. It preserves a compact,
backward-compatible public API while delegating the actual preparation logic to the
underlying engine.
Batch example
from prodock.structure.pdb_query import PDBQuery
PROJECT = "tutorial"
# One dict per receptor: PDB ID, output name, ligand residue code, chains, cofactors.
RECEPTORS = [
    {
        "pdb_id": "1M17",
        "receptor_name": "EGFR_1M17",
        "ligand_code": "AQ4",
        "chains": ["A"],
        "cofactors": [],
    },
    {
        "pdb_id": "2ITY",
        "receptor_name": "EGFR_2ITY",
        "ligand_code": "IRE",
        "chains": ["A"],
        "cofactors": [],
    },
    {
        "pdb_id": "4WKQ",
        "receptor_name": "EGFR_4WKQ",
        "ligand_code": "IRE",
        "chains": ["A"],
        "cofactors": [],
    },
]
# Prepare all receptors, writing results under PROJECT.
PDBQuery.process_batch(RECEPTORS, output_dir=PROJECT)
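Before calling process_batch on a long receptor list, it can help to sanity-check the dictionaries so that a typo fails early rather than midway through the batch. The sketch below is a hypothetical pre-flight check, not part of ProDock. It assumes that all five keys used in RECEPTORS are required and that PDB IDs are classic four-character codes, which may be stricter than what process_batch actually enforces.

```python
from typing import Any, Dict, List, Sequence

# Treats every key used in the RECEPTORS example as required (an assumption).
REQUIRED_KEYS = ("pdb_id", "receptor_name", "ligand_code", "chains", "cofactors")


def check_receptor_specs(specs: Sequence[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems: List[str] = []
    seen_names = set()
    for index, spec in enumerate(specs):
        missing = [key for key in REQUIRED_KEYS if key not in spec]
        if missing:
            problems.append(f"entry {index}: missing keys {missing}")
        name = spec.get("receptor_name")
        if name is not None:
            if name in seen_names:
                problems.append(f"entry {index}: duplicate receptor_name {name!r}")
            seen_names.add(name)
        pdb_id = spec.get("pdb_id")
        if not (isinstance(pdb_id, str) and len(pdb_id) == 4 and pdb_id.isalnum()):
            problems.append(f"entry {index}: pdb_id {pdb_id!r} is not a 4-character code")
        for key in ("chains", "cofactors"):
            if key in spec and not isinstance(spec[key], list):
                problems.append(f"entry {index}: {key} should be a list")
    return problems


specs = [
    {"pdb_id": "1M17", "receptor_name": "EGFR_1M17", "ligand_code": "AQ4",
     "chains": ["A"], "cofactors": []},
    {"pdb_id": "2ITY", "receptor_name": "EGFR_1M17", "ligand_code": "IRE",
     "chains": "A", "cofactors": []},
]
for problem in check_receptor_specs(specs):
    print(problem)
```

The second entry deliberately reuses a receptor name and passes chains as a string, so both problems are reported.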
Single-receptor example
from prodock.structure import PDBQuery
pq = PDBQuery(
    pdb_id="1M17",
    output_dir="tutorial/1M17",
    chains=["A"],
    ligand_code="AQ4",
    cofactors=[],
)
pq.run_all()
# Paths to the generated artifacts
print(pq.filtered_protein_path)
print(pq.reference_ligand_path)
print(pq.cocrystal_ligand_path)
PDBQTSanitizer
PDBQTSanitizer is a backend-aware validator and sanitizer for PDBQT files.
It is useful when a generated .pdbqt file contains non-canonical element fields,
atom-type-like trailing tokens, or formatting patterns that may cause failures
in one or more docking backends.
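As a rough illustration of one such check, the sketch below flags ATOM/HETATM lines whose last whitespace-separated field is not a recognized AutoDock atom type. For example, a plain "O" where AutoDock4 expects "OA" would be flagged. `find_suspicious_types` is hypothetical and unrelated to PDBQTSanitizer's internals. Its type set is a non-exhaustive subset of AutoDock4 types, and it is not backend-aware, so it would also flag legitimate backend-specific types that a real per-backend validator accepts.

```python
from typing import List, Tuple

# Illustrative, non-exhaustive subset of AutoDock4 atom types.
KNOWN_AD_TYPES = frozenset({
    "H", "HD", "HS", "C", "A", "N", "NA", "NS", "OA", "OS", "SA", "S", "P",
    "F", "Cl", "CL", "Br", "BR", "I", "Mg", "MG", "Ca", "CA", "Mn", "MN",
    "Fe", "FE", "Zn", "ZN",
})


def find_suspicious_types(pdbqt_text: str) -> List[Tuple[int, str]]:
    """Return (line number, token) for ATOM/HETATM lines whose last field is not a known type."""
    issues = []
    for number, line in enumerate(pdbqt_text.splitlines(), start=1):
        if not line.startswith(("ATOM", "HETATM")):
            continue
        fields = line.split()
        token = fields[-1] if fields else ""
        if token not in KNOWN_AD_TYPES:
            issues.append((number, token))
    return issues


sample = "\n".join([
    "REMARK  made-up example",
    "ATOM      1  C1  LIG     1       0.000   0.000   0.000  1.00  0.00     0.050 C",
    "ATOM      2  O1  LIG     1       1.230   0.000   0.000  1.00  0.00    -0.300 O",
    "ATOM      3  N1  LIG     1       2.400   0.000   0.000  1.00  0.00    -0.200 NA",
])
print(find_suspicious_types(sample))  # [(3, 'O')]
```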
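Example
from prodock.structure import PDBQTSanitizer
sanitizer = PDBQTSanitizer("ligand.pdbqt", backend="meeko")
# Collect validation warnings before modifying anything
warnings = sanitizer.validate(strict=True)
# Normalize the records in non-aggressive mode
sanitizer.sanitize(rebuild=True, aggressive=False)
# Save the sanitized output to a separate file
sanitizer.write("ligand.sanitized.pdbqt")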
In-place sanitization
from prodock.structure import PDBQTSanitizer
sanitizer = PDBQTSanitizer("ligand.pdbqt", backend="obabel")
# Overwrite ligand.pdbqt with the sanitized version; backup=True keeps a copy of the original
sanitizer.sanitize_inplace(rebuild=True, aggressive=False, backup=True)
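In-place rewriting is safest when the original is backed up and the new content is swapped in only after it has been fully written. The sketch below shows that general pattern for any text transform. `rewrite_inplace` and its ".bak" naming are hypothetical and say nothing about how sanitize_inplace names or stores its backup. os.replace is atomic on POSIX systems when source and target are on the same filesystem, which is why the temporary file is created in the target's directory.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def rewrite_inplace(path: str, transform: Callable[[str], str], backup: bool = True) -> Path:
    """Replace a text file with transform(contents), optionally keeping '<name>.bak'."""
    target = Path(path)
    original = target.read_text()
    if backup:
        # Hypothetical naming scheme: ligand.pdbqt -> ligand.pdbqt.bak
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    # Write into a temp file in the same directory, then swap it in with os.replace,
    # so an interrupted write never leaves a half-written file at `path`.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(transform(original))
        shutil.copymode(target, tmp_name)  # keep the original file permissions
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)  # clean up the temp file if anything failed
        raise
    return target
```

For example, `rewrite_inplace("ligand.pdbqt", str.rstrip)` would strip trailing whitespace from the file while leaving ligand.pdbqt.bak as the untouched original.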
Summary
The structure module provides the earliest stage of a ProDock workflow:
- PDBEngine for explicit step-wise structure preparation
- PDBQuery for compact public and batch interfaces
- PDBQTSanitizer for validating and normalizing generated PDBQT files