Structure API#
PDB query#
- class PDBQuery(
- pdb_id,
- output_dir,
- chains=None,
- ligand_code='',
- ligand_name=None,
- cofactors=None,
- protein_name=None,
- auto_create_dirs=True,
Bases:
object
Thin public wrapper around the PDB orchestrator.
This wrapper preserves the old public API surface while delegating the implementation of individual steps to smaller modules (fetch, convert, selection, etc.). Use this class when you want a single entry point that behaves like the original PDBQuery.
- Parameters:
pdb_id (str) – PDB identifier (case-insensitive), e.g. "5N2F".
output_dir (Union[str, Path]) – Base output directory where per-PDB subfolders will be created.
chains (Optional[Sequence[str]]) – Sequence of chain identifiers to keep (e.g. ["A"]). If None or empty, all chains are preserved.
ligand_code (str) – Three-letter ligand residue name to extract (e.g. "HEM" or "8HW").
ligand_name (Optional[str]) – Friendly ligand name. Kept for compatibility; canonical filenames use ligand_code.
cofactors (Optional[Sequence[str]]) – Residue names to preserve when cleaning solvents (e.g. ["HEM"]).
protein_name (Optional[str]) – Optional friendly protein name (not used for canonical filenames).
auto_create_dirs (bool) – If True (default), the orchestrator will create the standard output sub-folders (fetched_protein, filtered_protein, reference_ligand, cocrystal). Set to False to opt out of automatic directory creation.
- Returns:
instance of
PDBQuery
Note
The machinery for fetching, selection, and conversion is implemented in prodock.structure.fetch, prodock.structure.selection and prodock.structure.convert. The wrapper only exposes a compact, backward-compatible API.
Examples#
Basic usage:
.. code-block:: python

    from prodock.structure import PDBQuery

    pq = PDBQuery(
        pdb_id="5N2F",
        output_dir="out/5N2F",
        chains=["A"],
        ligand_code="HEM",
        cofactors=["HEM"],
        auto_create_dirs=True,
    )
    pq.run_all()
Batch usage (convenience helper):
.. code-block:: python

    items = [
        {"pdb_id": "5N2F", "ligand_code": "HEM", "chains": ["A"]},
        {"pdb_id": "1ABC", "ligand_code": "ABC", "chains": []},
    ]
    PDBQuery.process_batch(items, output_dir="out/batch")
- run_all()#
Run the full pipeline:
validate() -> fetch() -> filter_chains() -> extract_ligand() -> clean_solvents_and_cofactors() -> save_filtered_protein()
Returns#
self
PDB engine#
- class PDBEngine(
- pdb_id,
- base_out,
- chains=None,
- ligand_code='',
- cofactors=None,
- auto_create_dirs=True,
Bases:
object
Step-wise backend engine for preparing a PDB structure for downstream use.
This class orchestrates a typical receptor-preparation workflow around a PyMOL session. The workflow can include:
validating runtime requirements and output paths
fetching a structure by PDB identifier
filtering the structure to selected chains
extracting a bound ligand as reference and cocrystal files
removing solvents while optionally preserving cofactors
saving the filtered protein structure
The engine is designed in a fluent style, so most public methods return the current instance and can be chained.
- Parameters:
pdb_id (str) – PDB identifier of the structure to fetch and process.
base_out (Path) – Base output directory under which subdirectories for fetched proteins, filtered proteins, reference ligands, and cocrystal ligands are created.
chains (Optional[List[str]]) – Optional list of chain identifiers to keep. If empty or None, all chains are retained.
ligand_code (str) – Residue name of the ligand to extract, for example "ATP" or "HEM". If empty, ligand extraction is skipped.
cofactors (Optional[List[str]]) – Optional list of residue names that should be preserved even if they appear in the solvent-removal list.
auto_create_dirs (bool) – Whether output directories should be created automatically when needed.
- Raises:
RuntimeError – Raised later by validate() or fetch() if PyMOL is not available at runtime.
Example#
Basic end-to-end usage:
    from pathlib import Path

    from prodock.structure.pdb_engine import PDBEngine

    engine = (
        PDBEngine(
            pdb_id="1M17",
            base_out=Path("tutorial/1M17"),
            chains=["A"],
            ligand_code="AQ4",
            cofactors=[],
        )
        .run_all()
    )
    print(engine.filtered_path)
    print(engine.ref_path)
    print(engine.cocrystal_path)
Example#
Step-wise usage for finer control:
    engine = PDBEngine(
        pdb_id="1M17",
        base_out=Path("tutorial/1M17"),
        chains=["A"],
        ligand_code="AQ4",
        cofactors=[],
    )
    (
        engine.validate()
        .fetch()
        .filter_chains()
        .extract_ligand()
        .clean_solvents_and_cofactors()
        .save_filtered_protein()
    )
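The fluent style described above can be illustrated with a minimal, self-contained sketch. The class `MiniEngine` and its `steps` list are hypothetical stand-ins and not part of ProDock; the only point is that each step returns `self`, which is what makes chaining possible.

```python
from typing import List


class MiniEngine:
    """Hypothetical stand-in for a fluent, step-wise engine (not ProDock code)."""

    def __init__(self) -> None:
        self.steps: List[str] = []  # records the order in which steps ran

    def validate(self) -> "MiniEngine":
        self.steps.append("validate")
        return self  # returning self is what enables chaining

    def fetch(self) -> "MiniEngine":
        self.steps.append("fetch")
        return self

    def run_all(self) -> "MiniEngine":
        # Convenience wrapper: run every step in a fixed order.
        return self.validate().fetch()


engine = MiniEngine().run_all()
print(engine.steps)
```

Because every step returns the same instance, attributes set by earlier steps remain readable after the chain ends.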
- validate()#
Validate runtime requirements and initialize canonical output paths.
This method verifies that the PyMOL cmd API is available, ensures that required output directories exist, and sets the expected output file paths for the fetched structure, filtered protein, reference ligand, and cocrystal ligand.
- Returns:
The current engine instance.
- Return type:
- Raises:
RuntimeError – If PyMOL cmd is not importable.
Example#
    engine = PDBEngine("1ABC", Path("out"), ligand_code="LIG")
    engine.validate()
    print(engine.pdb_path)
    print(engine.filtered_path)
    print(engine.ref_path)
    print(engine.cocrystal_path)
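As a rough illustration of the directory-preparation part of this step, the sketch below creates a set of output sub-folders under a base directory. The helper `prepare_output_dirs` is hypothetical, and the sub-folder names are taken from the `auto_create_dirs` description of PDBQuery; the engine's actual file naming is not shown here.

```python
import tempfile
from pathlib import Path
from typing import Dict

# Sub-folder names as listed in the PDBQuery parameter documentation.
SUBDIRS = ("fetched_protein", "filtered_protein", "reference_ligand", "cocrystal")


def prepare_output_dirs(base_out: Path, create: bool = True) -> Dict[str, Path]:
    """Hypothetical helper: map each sub-folder name to its path under base_out."""
    dirs = {name: base_out / name for name in SUBDIRS}
    if create:
        for path in dirs.values():
            # parents=True also creates base_out; exist_ok makes reruns safe.
            path.mkdir(parents=True, exist_ok=True)
    return dirs


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "1ABC"
    dirs = prepare_output_dirs(base)
    created = sorted(p.name for p in base.iterdir())
    print(created)
```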
- fetch()#
Fetch the requested PDB structure and load it into the active PyMOL session.
The structure file is retrieved via fetch_pdb_to_dir(), stored in the configured fetch directory, and then loaded into PyMOL.
- Returns:
The current engine instance.
- Return type:
- Raises:
RuntimeError – If PyMOL cmd is not available.
Example#
    engine = PDBEngine("1ABC", Path("out")).validate().fetch()
    print(engine.pdb_path)
- filter_chains()#
Keep only the requested chains in the PyMOL session.
If no chains were configured, the structure is left unchanged. When chains are provided, a PyMOL selection is built and all other atoms are removed from the current session.
- Returns:
The current engine instance.
- Return type:
Example#
    engine = (
        PDBEngine("1ABC", Path("out"), chains=["A", "B"])
        .validate()
        .fetch()
        .filter_chains()
    )
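One way such a chain selection could be built is sketched below. The helper `build_chain_selection` is hypothetical and only produces the selection string; the engine's actual selection expression may differ.

```python
from typing import Optional, Sequence


def build_chain_selection(chains: Optional[Sequence[str]]) -> Optional[str]:
    """Hypothetical helper: return a PyMOL selection string, or None to keep all."""
    if not chains:
        return None  # no chains configured: leave the structure unchanged
    return " or ".join(f"chain {c}" for c in chains)


selection = build_chain_selection(["A", "B"])
print(selection)

# With PyMOL available, atoms outside the selection could then be removed with:
#     from pymol import cmd
#     cmd.remove(f"not ({selection})")
```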
- extract_ligand()#
Extract the requested ligand and save reference and cocrystal files.
The ligand is written first as a temporary PDB file and then converted into the configured SDF outputs. If chain identifiers were provided, extraction is attempted chain by chain. If no chains were provided, extraction is attempted without chain restriction.
If no ligand code was configured, this step is skipped.
- Returns:
The current engine instance.
- Return type:
- Raises:
RuntimeError – If ligand extraction was requested but no ligand could be saved.
Example#
    engine = (
        PDBEngine(
            pdb_id="1ABC",
            base_out=Path("out"),
            ligand_code="ATP",
            chains=["A"],
        )
        .validate()
        .fetch()
        .extract_ligand()
    )
    print(engine.ref_path)
    print(engine.cocrystal_path)
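The chain-by-chain behaviour described above can be sketched as a list of candidate selections to try in order. The helper `ligand_selections` is hypothetical, and the exact selection strings used by the engine are an assumption.

```python
from typing import List, Optional, Sequence


def ligand_selections(
    ligand_code: str, chains: Optional[Sequence[str]] = None
) -> List[str]:
    """Hypothetical helper: list the PyMOL selections to try for a ligand."""
    if not ligand_code:
        return []  # no ligand configured: the extraction step is skipped
    if chains:
        # One attempt per requested chain.
        return [f"resn {ligand_code} and chain {c}" for c in chains]
    # No chains configured: a single attempt without chain restriction.
    return [f"resn {ligand_code}"]


print(ligand_selections("ATP", ["A", "B"]))
print(ligand_selections("ATP"))
```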
- clean_solvents_and_cofactors()#
Remove configured solvent residues while optionally preserving cofactors.
Solvent residue names are taken from DEFAULT_SOLVENTS. If cofactors were configured, they are excluded from removal even if they overlap with the solvent list.
- Returns:
The current engine instance.
- Return type:
Example#
    engine = (
        PDBEngine(
            pdb_id="1ABC",
            base_out=Path("out"),
            cofactors=["MG", "ZN"],
        )
        .validate()
        .fetch()
        .clean_solvents_and_cofactors()
    )
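The exclusion rule can be sketched as a filter over the solvent list. The names `EXAMPLE_SOLVENTS` and `solvent_removal_selection` are hypothetical, and the residue names below are illustrative only; the real contents of DEFAULT_SOLVENTS are not documented here.

```python
from typing import Optional, Sequence

# Illustrative residue names only; not the actual DEFAULT_SOLVENTS.
EXAMPLE_SOLVENTS = ("HOH", "SO4", "GOL", "MG")


def solvent_removal_selection(
    solvents: Sequence[str], cofactors: Optional[Sequence[str]] = None
) -> Optional[str]:
    """Hypothetical helper: build a 'resn ...' selection that skips cofactors."""
    keep = {c.upper() for c in (cofactors or [])}
    to_remove = [s for s in solvents if s.upper() not in keep]
    if not to_remove:
        return None  # nothing left to remove
    return "resn " + "+".join(to_remove)


selection = solvent_removal_selection(EXAMPLE_SOLVENTS, cofactors=["MG"])
print(selection)
```

Here `MG` appears in the example solvent list but is preserved because it was also passed as a cofactor.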
- save_filtered_protein()#
Save the current PyMOL session as the filtered protein structure.
The structure is saved to filtered_path using the PyMOL selection "all". After saving, the PyMOL session is cleared with cmd.delete("all") on a best-effort basis.
- Returns:
The current engine instance.
- Return type:
Example#
    engine = (
        PDBEngine("1ABC", Path("out"))
        .validate()
        .fetch()
        .save_filtered_protein()
    )
    print(engine.filtered_path)
- run_all()#
Execute the full PDB preparation workflow.
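The workflow consists of validate(), fetch(), filter_chains(), extract_ligand(), clean_solvents_and_cofactors(), and save_filtered_protein(), run in that order.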
- Returns:
The current engine instance after all processing steps complete.
- Return type:
Example#
    engine = PDBEngine(
        pdb_id="1ABC",
        base_out=Path("out"),
        chains=["A"],
        ligand_code="LIG",
        cofactors=["MG"],
    ).run_all()
    print("Filtered protein:", engine.filtered_path)
    print("Reference ligand:", engine.ref_path)
    print("Cocrystal ligand:", engine.cocrystal_path)
PDBQT sanitizer#
- class PDBQTSanitizer(path=None, *, backend='meeko')#
Bases:
object
Backend-aware PDBQT sanitizer and validator.
This sanitizer is designed for ligand PDBQT compatibility with older Vina/QuickVina-family parsers that are sensitive to fixed-column formatting.
Main behavior#
rebuilds ATOM/HETATM lines into one consistent fixed-width format
preserves legacy AutoDock/Vina atom types such as A, OA, NA, SA, and HD
selectively downgrades unsupported pseudo-types such as CG0 and G0
keeps torsion-tree records unchanged
Recommended usage#
    sanitizer = PDBQTSanitizer("ligand.pdbqt", backend="meeko")
    sanitizer.validate(strict=False)
    sanitizer.sanitize(rebuild=True, aggressive=False)
    sanitizer.write("ligand.sanitized.pdbqt")
- Parameters:
path (Optional[str | pathlib.Path]) – Optional path to a PDBQT file to load immediately.
backend (Literal["meeko", "obabel"]) – Sanitizer behavior profile.
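To make "fixed-width" concrete, the sketch below formats one atom record using the conventional AutoDock PDBQT column layout (PDB columns up to 66, partial charge in columns 71-76, atom type in columns 78-79). The function `format_pdbqt_atom` is hypothetical and is not the sanitizer's implementation; the atom-name alignment in particular is a simplification.

```python
def format_pdbqt_atom(
    serial: int,
    name: str,
    resname: str,
    chain: str,
    resseq: int,
    x: float,
    y: float,
    z: float,
    charge: float,
    atom_type: str,
    record: str = "ATOM",
    occupancy: float = 1.0,
    temp_factor: float = 0.0,
) -> str:
    """Hypothetical helper: build one fixed-width PDBQT ATOM/HETATM line."""
    # Simplified name alignment: names shorter than 4 characters start in
    # column 14, which matches the usual PDB style for one-letter elements.
    name_field = name[:4] if len(name) >= 4 else f" {name:<3}"
    return (
        f"{record:<6}{serial:>5} {name_field} {resname:>3} {chain:1}{resseq:>4}"
        f"    {x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{temp_factor:6.2f}"
        f"    {charge:6.3f} {atom_type:<2}"
    )


line = format_pdbqt_atom(1, "C1", "LIG", "A", 1, 1.0, -2.5, 3.25, -0.123, "OA")
print(line)
print(len(line))
```

Parsers that read by column position depend on every field landing at the same offsets, which is why rebuilding each line from parsed values is more robust than patching whitespace.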
- read(path)#
Load a PDBQT file into memory.
- Parameters:
path (str | pathlib.Path) – Input file path.
- Returns:
Current instance.
- Return type:
- write(out_path)#
Write sanitized content to disk.
- Parameters:
out_path (str | pathlib.Path) – Output file path.
- Returns:
Written path.
- Return type:
- sanitize_inplace(rebuild=True, aggressive=False, backup=True)#
Sanitize and overwrite the loaded file.
- Parameters:
- Returns:
Written path.
- Return type:
- classmethod sanitize_file(
- path,
- out_path=None,
- *,
- backend='meeko',
- rebuild=True,
- aggressive=False,
- backup=True,
Convenience wrapper for sanitizing a file.
- Parameters:
path (str | pathlib.Path) – Input PDBQT path.
out_path (Optional[str | pathlib.Path]) – Output path. If None, overwrite original.
backend (Literal["meeko", "obabel"]) – Sanitizer behavior profile.
rebuild (bool) – Rebuild ATOM/HETATM lines into fixed-width PDBQT format.
aggressive (bool) – Allow stronger fallback heuristics for malformed lines.
backup (bool) – Create backup if overwriting.
- Returns:
Sanitized file path.
- Return type:
- validate(strict=False)#
Validate the loaded PDBQT and collect warnings.
- sanitize(rebuild=True, aggressive=False)#
Produce sanitized content.
For older qvina, rebuild=True is strongly recommended.
- Parameters:
- Returns:
Current instance.
- Return type:
- Parameters:
path (Optional[str | Path])
backend (SanitizeBackend)
Conversion#
- convert_with_mekoo(
- mekoo_cmd,
- input_pdb,
- out_basename,
- write_pdbqt=None,
- box_center=None,
- box_size=None,
- *,
- sanitize_rebuild=True,
- sanitize_aggressive=False,
- sanitize_backup=False,
Run a Meeko receptor-preparation command and collect produced artifacts.
This helper calls mk_prepare_receptor.py (or a compatible wrapper) to generate receptor-preparation outputs. If a PDBQT file is produced, it is always sanitized in place.
- Parameters:
mekoo_cmd (str) – Path or executable name for the Meeko receptor-preparation command.
input_pdb (Path) – Input PDB file passed to Meeko.
out_basename (Path) – Base output path used by Meeko for generated files, without the final extension.
write_pdbqt (Optional[Path]) – Optional explicit path for a PDBQT output file.
box_center (Optional[Tuple[float, float, float]]) – Optional grid-box center as (x, y, z).
box_size (Optional[Tuple[float, float, float]]) – Optional grid-box size as (sx, sy, sz).
sanitize_rebuild (Optional[bool]) – Whether sanitization should rebuild fixed-width ATOM/HETATM records.
sanitize_aggressive (bool) – Whether sanitization should use aggressive cleanup heuristics.
sanitize_backup (bool) – Whether a backup should be created before in-place sanitization.
- Returns:
Dictionary summarizing command execution. Keys include "called", "rc", "stdout", "stderr", and "produced".
- Return type:
Dict[str, Any]
- convert_with_meeko(
- mekoo_cmd,
- input_pdb,
- out_basename,
- write_pdbqt=None,
- box_center=None,
- box_size=None,
- *,
- sanitize_rebuild=True,
- sanitize_aggressive=False,
- sanitize_backup=False,
Run a Meeko receptor-preparation command and collect produced artifacts.
This helper calls mk_prepare_receptor.py (or a compatible wrapper) to generate receptor-preparation outputs. If a PDBQT file is produced, it is always sanitized in place.
- Parameters:
mekoo_cmd (str) – Path or executable name for the Meeko receptor-preparation command.
input_pdb (Path) – Input PDB file passed to Meeko.
out_basename (Path) – Base output path used by Meeko for generated files, without the final extension.
write_pdbqt (Optional[Path]) – Optional explicit path for a PDBQT output file.
box_center (Optional[Tuple[float, float, float]]) – Optional grid-box center as (x, y, z).
box_size (Optional[Tuple[float, float, float]]) – Optional grid-box size as (sx, sy, sz).
sanitize_rebuild (Optional[bool]) – Whether sanitization should rebuild fixed-width ATOM/HETATM records.
sanitize_aggressive (bool) – Whether sanitization should use aggressive cleanup heuristics.
sanitize_backup (bool) – Whether a backup should be created before in-place sanitization.
- Returns:
Dictionary summarizing command execution. Keys include "called", "rc", "stdout", "stderr", and "produced".
- Return type:
Dict[str, Any]
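The shape of the returned dictionary can be illustrated with a generic run-and-collect sketch. The helper `run_and_collect` is hypothetical, and the value types chosen for each key (a command string for "called", a list of paths for "produced") are assumptions; only the key names come from the documentation above.

```python
import subprocess
import sys
from pathlib import Path
from typing import Any, Dict, Sequence


def run_and_collect(
    cmd: Sequence[str], expected_outputs: Sequence[Path]
) -> Dict[str, Any]:
    """Hypothetical helper: run a command and summarize what happened."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    # Report only those expected files that actually exist after the run.
    produced = [str(p) for p in expected_outputs if Path(p).exists()]
    return {
        "called": " ".join(cmd),
        "rc": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "produced": produced,
    }


# A harmless stand-in command is used here instead of a real Meeko call.
result = run_and_collect([sys.executable, "-c", "print('ok')"], expected_outputs=[])
print(result["rc"], result["stdout"].strip(), result["produced"])
```

A caller would typically check "rc" first and inspect "stderr" before relying on any path listed under "produced".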
- convert_with_obabel(
- input_path,
- output_path,
- extra_args=None,
- *,
- sanitize_rebuild=False,
- sanitize_aggressive=False,
- sanitize_backup=False,
- validate_receptor=False,
- flexibility=False,
Convert a structure file with Open Babel and sanitize generated PDBQT output.
Input and output formats are inferred from file extensions and passed to Open Babel using the canonical CLI form:
obabel -i<in_ext> input -o<out_ext> -O output
When the output format is .pdbqt, the generated file is sanitized in place. For receptor preparation, the helper can also validate whether the written PDBQT matches the requested rigid or flexible mode.
- Parameters:
input_path (Path) – Input structure file path.
output_path (Path) – Output structure file path.
extra_args (Optional[Sequence[str]]) – Optional extra CLI arguments forwarded to Open Babel.
sanitize_rebuild (Optional[bool]) – Whether sanitization should rebuild fixed-width ATOM/HETATM records.
sanitize_aggressive (bool) – Whether sanitization should use aggressive cleanup heuristics.
sanitize_backup (bool) – Whether a backup should be created before in-place sanitization.
validate_receptor (bool) – Whether to validate the generated PDBQT as receptor output.
flexibility (bool) – Whether receptor PDBQT output should be generated in flexible mode.
- Returns:
This function returns nothing.
- Return type:
None
- Raises:
RuntimeError – If Open Babel is unavailable, conversion fails, output is missing, or receptor validation fails.
ValueError – If the input or output file extension is missing.
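The CLI form shown above can be assembled from the two file paths as in the following sketch. The helper `build_obabel_command` is hypothetical and only builds the argument list; it does not run Open Babel or perform any sanitization.

```python
from pathlib import Path
from typing import List, Optional, Sequence


def build_obabel_command(
    input_path: Path,
    output_path: Path,
    extra_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper: build an obabel argument list from file extensions."""
    in_ext = input_path.suffix.lstrip(".").lower()
    out_ext = output_path.suffix.lstrip(".").lower()
    if not in_ext or not out_ext:
        # Formats are inferred from extensions, so both must be present.
        raise ValueError("input and output paths must both have a file extension")
    cmd = [
        "obabel",
        f"-i{in_ext}", str(input_path),
        f"-o{out_ext}", "-O", str(output_path),
    ]
    cmd.extend(extra_args or [])
    return cmd


command = build_obabel_command(Path("lig.sdf"), Path("lig.pdbqt"), ["-h"])
print(command)
```

Passing the command as a list (for example to `subprocess.run`) avoids shell quoting problems with paths that contain spaces.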
- pdb_to_pdbqt(
- input_pdb,
- output_pdbqt,
- *,
- mode,
- backend,
- extra_args=None,
- meeko_cmd=None,
- mgltools_cmd=None,
- sanitize_rebuild=None,
- sanitize_aggressive=False,
- sanitize_backup=False,
- validate_receptor=None,
- flexibility=False,
Convert PDB to PDBQT using a single explicit backend.
- pdbqt_to_pdb(input_pdbqt, output_pdb, *, backend, extra_args=None)#
Convert PDBQT to PDB using Open Babel.
- sdf_to_pdb(input_sdf, output_pdb, *, backend, extra_args=None)#
Convert SDF to PDB using a single explicit backend.
- sdf_to_pdbqt(
- input_sdf,
- output_pdbqt,
- *,
- backend,
- tmp_from_sdf_backend='rdkit',
- extra_args=None,
- meeko_cmd=None,
- mgltools_cmd=None,
- sanitize_rebuild=None,
- sanitize_aggressive=False,
- sanitize_backup=False,
Convert SDF to PDBQT using a single explicit backend.
- ensure_pdbqt(
- input_path,
- output_dir,
- *,
- backend,
- mode='ligand',
- tmp_from_sdf_backend='rdkit',
- extra_args=None,
- meeko_cmd=None,
- mgltools_cmd=None,
- sanitize_rebuild=None,
- sanitize_aggressive=False,
- sanitize_backup=False,
- validate_receptor=None,
- flexibility=False,
Ensure that an input file is available as PDBQT using the chosen backend.
- Parameters:
backend (Literal['meeko', 'obabel', 'mgltools'])
mode (Literal['receptor', 'ligand'])
tmp_from_sdf_backend (Literal['rdkit', 'obabel'])
meeko_cmd (str | None)
mgltools_cmd (str | None)
sanitize_rebuild (bool | None)
sanitize_aggressive (bool)
sanitize_backup (bool)
validate_receptor (bool | None)
flexibility (bool)
- Return type:
- pdb_to_sdf(input_pdb, output_sdf, *, backend, extra_args=None)#
Convert PDB to SDF using a single explicit backend.
- pdbqt_to_sdf(
- input_pdbqt,
- output_sdf,
- *,
- backend='auto',
- extra_args=None,
- is_dlg=False,
- sanitize=True,
Convert PDBQT to SDF.
- load_sdf_for_interactions(
- sdf_path,
- *,
- sanitize=True,
- remove_hs=False,
- strict_parsing=False,
- resname='LIG',
- resnumber=1,
- chain='',
Load the first valid ligand molecule from an SDF for interaction analysis.
This helper is intended for the common ProDock interaction workflow where a temporary SDF already corresponds to a single pose. It applies a relaxed fallback strategy if strict RDKit sanitization fails and restores minimal residue metadata required by downstream ProLIF conversion.
- Parameters:
sdf_path (Union[str, Path]) – Input SDF path.
sanitize (bool) – Whether to sanitize on the first RDKit load attempt.
remove_hs (bool) – Whether RDKit should remove hydrogens during load.
strict_parsing (bool) – Whether RDKit strict parsing should be enabled.
resname (str) – Default residue name to attach if missing.
resnumber (int) – Default residue number to attach if missing.
chain (str) – Default chain identifier to attach if missing.
- Returns:
First valid RDKit molecule from the SDF.
- Return type:
Any
- Raises:
ValueError – If no valid molecule can be loaded.