Structure API#

PDB query#

class PDBQuery(
pdb_id,
output_dir,
chains=None,
ligand_code='',
ligand_name=None,
cofactors=None,
protein_name=None,
auto_create_dirs=True,
)#

Bases: object

Thin public wrapper around the PDB orchestrator.

This wrapper preserves the old public API surface while delegating the implementation of individual steps to smaller modules (fetch, convert, selection, etc.). Use this class when you want a single entry point that behaves like the original PDBQuery.

Parameters:
  • pdb_id (str) – PDB identifier (case-insensitive), e.g. "5N2F".

  • output_dir (Union[str, Path]) – Base output directory where per-PDB subfolders will be created.

  • chains (Optional[Sequence[str]]) – Sequence of chain identifiers to keep (e.g. ["A"]). If None or empty, all chains are preserved.

  • ligand_code (str) – Three-letter ligand residue name to extract (e.g. "HEM" or "8HW").

  • ligand_name (Optional[str]) – Friendly ligand name. Kept for compatibility; canonical filenames use ligand_code.

  • cofactors (Optional[Sequence[str]]) – Residue names to preserve when cleaning solvents (e.g. ["HEM"]).

  • protein_name (Optional[str]) – Optional friendly protein name (not used for canonical filenames).

  • auto_create_dirs (bool) – If True (default), the orchestrator creates the standard output sub-folders (fetched_protein, filtered_protein, reference_ligand, cocrystal). Set to False to opt out of automatic directory creation.

Returns:

instance of PDBQuery

Note

The machinery for fetching / selection / conversion is implemented in prodock.structure.fetch, prodock.structure.selection and prodock.structure.convert. The wrapper only exposes a compact, backward-compatible API.
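As a rough illustration of what auto_create_dirs=True implies, the sketch below creates the four sub-folders named in the parameter description. The helper name create_standard_dirs is hypothetical, and placing the folders directly under the given directory is an assumption; the orchestrator's actual layout logic may differ.

```python
from pathlib import Path
import tempfile

# Sub-folder names as listed in the auto_create_dirs description.
STANDARD_SUBDIRS = (
    "fetched_protein",
    "filtered_protein",
    "reference_ligand",
    "cocrystal",
)


def create_standard_dirs(output_dir):
    """Hypothetical helper: create the standard sub-folders, return them by name."""
    base = Path(output_dir)
    created = {}
    for name in STANDARD_SUBDIRS:
        folder = base / name
        # parents=True builds missing parents; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created[name] = folder
    return created


# Demonstrate inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    layout = create_standard_dirs(Path(tmp) / "5N2F")
    layout_names = sorted(path.name for path in layout.values())
    all_exist = all(path.is_dir() for path in layout.values())
```

With auto_create_dirs=False, a caller would presumably need to prepare these folders before running the pipeline.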

Examples#

Basic usage:

    from prodock.structure import PDBQuery

    pq = PDBQuery(
        pdb_id="5N2F",
        output_dir="out/5N2F",
        chains=["A"],
        ligand_code="HEM",
        cofactors=["HEM"],
        auto_create_dirs=True
    )
    pq.run_all()

Batch usage (convenience helper):

    items = [
        {"pdb_id": "5N2F", "ligand_code": "HEM", "chains": ["A"]},
        {"pdb_id": "1ABC", "ligand_code": "ABC", "chains": []},
    ]
    PDBQuery.process_batch(items, output_dir="out/batch")

validate()#

Ensure runtime preconditions and (optionally) create output directories.

Returns#

self

run_all()#

Run the full pipeline:

validate() -> fetch() -> filter_chains() -> extract_ligand() -> clean_solvents_and_cofactors() -> save_filtered_protein()

Returns#

self

property pdb_path: str | None#
property filtered_protein_path: str | None#
property reference_ligand_path: str | None#
property cocrystal_ligand_path: str | None#
classmethod process_batch(items, output_dir, **kwargs)#

Batch helper (kept for backwards compatibility).

Example#

items = [{"pdb_id": "5N2F", "ligand_code": "HEM", "chains": ["A"]}]
PDBQuery.process_batch(items, output_dir="out/batch")

Parameters:

PDB engine#

class PDBEngine(
pdb_id,
base_out,
chains=None,
ligand_code='',
cofactors=None,
auto_create_dirs=True,
)#

Bases: object

Step-wise backend engine for preparing a PDB structure for downstream use.

This class orchestrates a typical receptor-preparation workflow around a PyMOL session. The workflow can include:

  • validating runtime requirements and output paths

  • fetching a structure by PDB identifier

  • filtering the structure to selected chains

  • extracting a bound ligand as reference and cocrystal files

  • removing solvents while optionally preserving cofactors

  • saving the filtered protein structure

The engine is designed in a fluent style, so most public methods return the current instance and can be chained.
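The fluent style can be illustrated with a toy class in which every step records its name and returns self. This is a minimal sketch of the pattern only: FluentPipeline is a hypothetical stand-in and does none of the real PyMOL work.

```python
class FluentPipeline:
    """Toy stand-in showing the fluent pattern; not the PDBEngine implementation."""

    def __init__(self):
        self.log = []  # names of the steps that have run, in order

    def _step(self, name):
        self.log.append(name)
        return self  # returning self is what allows calls to be chained

    def validate(self):
        return self._step("validate")

    def fetch(self):
        return self._step("fetch")

    def filter_chains(self):
        return self._step("filter_chains")

    def extract_ligand(self):
        return self._step("extract_ligand")

    def clean_solvents_and_cofactors(self):
        return self._step("clean_solvents_and_cofactors")

    def save_filtered_protein(self):
        return self._step("save_filtered_protein")

    def run_all(self):
        # Same step order as the documented workflow.
        return (
            self.validate()
            .fetch()
            .filter_chains()
            .extract_ligand()
            .clean_solvents_and_cofactors()
            .save_filtered_protein()
        )


pipeline = FluentPipeline().run_all()
```

Because each method returns the instance, a caller can stop the chain early or skip a step without changing the class.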

Parameters:
  • pdb_id (str) – PDB identifier of the structure to fetch and process.

  • base_out (Path) – Base output directory under which subdirectories for fetched proteins, filtered proteins, reference ligands, and cocrystal ligands are created.

  • chains (Optional[List[str]]) – Optional list of chain identifiers to keep. If empty or None, all chains are retained.

  • ligand_code (str) – Residue name of the ligand to extract, for example "ATP" or "HEM". If empty, ligand extraction is skipped.

  • cofactors (Optional[List[str]]) – Optional list of residue names that should be preserved even if they appear in the solvent-removal list.

  • auto_create_dirs (bool) – Whether output directories should be created automatically when needed.

Raises:

RuntimeError – Raised later by validate() or fetch() if PyMOL is not available at runtime.

Example#

Basic end-to-end usage:

from prodock.structure.pdb_engine import PDBEngine
from pathlib import Path
engine = (
    PDBEngine(
        pdb_id="1M17",
        base_out=Path("tutorial/1M17"),
        chains=["A"],
        ligand_code="AQ4",
        cofactors=[],
    )
    .run_all()
)

print(engine.filtered_path)
print(engine.ref_path)
print(engine.cocrystal_path)

Example#

Step-wise usage for finer control:

engine = PDBEngine(
    pdb_id="1M17",
    base_out=Path("tutorial/1M17"),
    chains=["A"],
    ligand_code="AQ4",
    cofactors=[],
)

(
    engine.validate()
    .fetch()
    .filter_chains()
    .extract_ligand()
    .clean_solvents_and_cofactors()
    .save_filtered_protein()
)

validate()#

Validate runtime requirements and initialize canonical output paths.

This method verifies that the PyMOL cmd API is available, ensures that required output directories exist, and sets the expected output file paths for the fetched structure, filtered protein, reference ligand, and cocrystal ligand.

Returns:

The current engine instance.

Return type:

PDBEngine

Raises:

RuntimeError – If PyMOL cmd is not importable.

Example#

engine = PDBEngine("1ABC", Path("out"), ligand_code="LIG")
engine.validate()

print(engine.pdb_path)
print(engine.filtered_path)
print(engine.ref_path)
print(engine.cocrystal_path)

fetch()#

Fetch the requested PDB structure and load it into the active PyMOL session.

The structure file is retrieved via fetch_pdb_to_dir(), stored in the configured fetch directory, and then loaded into PyMOL.

Returns:

The current engine instance.

Return type:

PDBEngine

Raises:

RuntimeError – If PyMOL cmd is not available.

Example#

engine = PDBEngine("1ABC", Path("out")).validate().fetch()
print(engine.pdb_path)

filter_chains()#

Keep only the requested chains in the PyMOL session.

If no chains were configured, the structure is left unchanged. When chains are provided, a PyMOL selection is built and all other atoms are removed from the current session.

Returns:

The current engine instance.

Return type:

PDBEngine

Example#

engine = (
    PDBEngine("1ABC", Path("out"), chains=["A", "B"])
    .validate()
    .fetch()
    .filter_chains()
)
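
To make the selection step more concrete, here is one way the chain filter could be expressed as PyMOL selection strings. The helper names are hypothetical and the exact selection text the engine builds is not documented here, so treat this as a sketch of the idea rather than the engine's code.

```python
def chain_selection(chains):
    """Build a selection matching the requested chains, or None if none are given."""
    cleaned = [c.strip() for c in (chains or []) if c and c.strip()]
    if not cleaned:
        return None  # no chains configured: leave the structure unchanged
    return " or ".join(f"chain {c}" for c in cleaned)


def removal_selection(chains):
    """Build the complementary selection: everything outside the kept chains."""
    keep = chain_selection(chains)
    return None if keep is None else f"not ({keep})"


keep_ab = chain_selection(["A", "B"])
drop_others = removal_selection(["A", "B"])
no_filter = removal_selection([])

# In a live PyMOL session the removal would then be something like:
#     from pymol import cmd
#     if drop_others is not None:
#         cmd.remove(drop_others)
```

Returning None for an empty chain list mirrors the documented behaviour that the structure is left unchanged when no chains are configured.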
extract_ligand()#

Extract the requested ligand and save reference and cocrystal files.

The ligand is written first as a temporary PDB file and then converted into the configured SDF outputs. If chain identifiers were provided, extraction is attempted chain by chain. If no chains were provided, extraction is attempted without chain restriction.

If no ligand code was configured, this step is skipped.

Returns:

The current engine instance.

Return type:

PDBEngine

Raises:

RuntimeError – If ligand extraction was requested but no ligand could be saved.

Example#

engine = (
    PDBEngine(
        pdb_id="1ABC",
        base_out=Path("out"),
        ligand_code="ATP",
        chains=["A"],
    )
    .validate()
    .fetch()
    .extract_ligand()
)

print(engine.ref_path)
print(engine.cocrystal_path)
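
The chain-by-chain fallback described above could look roughly like the following. Everything here is an assumption made for illustration: the helper names, the selection strings, the try_save callback (which stands in for writing the temporary PDB and converting it to SDF), and stopping at the first chain that succeeds.

```python
def ligand_selections(ligand_code, chains=None):
    """Candidate selections: one per chain, or one unrestricted selection."""
    code = (ligand_code or "").strip()
    if not code:
        return []  # no ligand code configured
    if chains:
        return [f"resn {code} and chain {c}" for c in chains]
    return [f"resn {code}"]


def extract_ligand_selection(ligand_code, chains, try_save):
    """Return the first selection that try_save accepts, or None if skipped."""
    candidates = ligand_selections(ligand_code, chains)
    if not candidates:
        return None  # step skipped, as documented for an empty ligand code
    for selection in candidates:
        if try_save(selection):
            return selection
    raise RuntimeError("ligand extraction was requested but no ligand could be saved")


def fake_save(selection):
    """Stand-in for the real save step: pretend the ligand only sits on chain B."""
    return selection.endswith("chain B")


found = extract_ligand_selection("ATP", ["A", "B"], fake_save)
skipped = extract_ligand_selection("", ["A"], fake_save)
```

The RuntimeError branch corresponds to the documented failure when extraction was requested but nothing could be saved.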
clean_solvents_and_cofactors()#

Remove configured solvent residues while optionally preserving cofactors.

Solvent residue names are taken from DEFAULT_SOLVENTS. If cofactors were configured, they are excluded from removal even if they overlap with the solvent list.

Returns:

The current engine instance.

Return type:

PDBEngine

Example#

engine = (
    PDBEngine(
        pdb_id="1ABC",
        base_out=Path("out"),
        cofactors=["MG", "ZN"],
    )
    .validate()
    .fetch()
    .clean_solvents_and_cofactors()
)
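
The interaction between the solvent list and the cofactor list amounts to a set difference. The sketch below shows that logic with a placeholder solvent list; the real contents of DEFAULT_SOLVENTS are not given in this reference, and the helper name is hypothetical.

```python
# Placeholder list for illustration only; not the actual DEFAULT_SOLVENTS.
EXAMPLE_SOLVENTS = ("HOH", "WAT", "SO4", "GOL", "EDO")


def solvent_removal_selection(solvents, cofactors=None):
    """Build a 'resn A+B+...' selection for solvents that are not cofactors."""
    keep = {name.upper() for name in (cofactors or [])}
    to_remove = [name for name in solvents if name.upper() not in keep]
    if not to_remove:
        return None  # every listed solvent is protected as a cofactor
    return "resn " + "+".join(to_remove)


# SO4 is listed as a solvent but is protected here as a cofactor.
selection = solvent_removal_selection(EXAMPLE_SOLVENTS, cofactors=["SO4"])
unprotected = solvent_removal_selection(EXAMPLE_SOLVENTS)

# In a live PyMOL session the removal would then be something like:
#     from pymol import cmd
#     if selection is not None:
#         cmd.remove(selection)
```

Cofactors that never appear in the solvent list, such as MG or ZN in the example above, are unaffected by this step either way.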
save_filtered_protein()#

Save the current PyMOL session as the filtered protein structure.

The structure is saved to filtered_path using the PyMOL selection "all". After saving, the PyMOL session is cleared with cmd.delete("all") on a best-effort basis.

Returns:

The current engine instance.

Return type:

PDBEngine

Example#

engine = (
    PDBEngine("1ABC", Path("out"))
    .validate()
    .fetch()
    .save_filtered_protein()
)

print(engine.filtered_path)

run_all()#

Execute the full PDB preparation workflow.

The workflow consists of:

  1. validate()

  2. fetch()

  3. filter_chains()

  4. extract_ligand()

  5. clean_solvents_and_cofactors()

  6. save_filtered_protein()

Returns:

The current engine instance after all processing steps complete.

Return type:

PDBEngine

Example#

engine = PDBEngine(
    pdb_id="1ABC",
    base_out=Path("out"),
    chains=["A"],
    ligand_code="LIG",
    cofactors=["MG"],
).run_all()

print("Filtered protein:", engine.filtered_path)
print("Reference ligand:", engine.ref_path)
print("Cocrystal ligand:", engine.cocrystal_path)

PDBQT sanitizer#

class PDBQTSanitizer(path=None, *, backend='meeko')#

Bases: object

Backend-aware PDBQT sanitizer and validator.

This sanitizer is designed for ligand PDBQT compatibility with older Vina/QuickVina-family parsers that are sensitive to fixed-column formatting.

Main behavior#

  • rebuilds ATOM/HETATM lines into one consistent fixed-width format

  • preserves legacy AutoDock/Vina atom types such as A, OA, NA, SA, and HD

  • selectively downgrades unsupported pseudo-types such as CG0 and G0

  • keeps torsion-tree records unchanged

Parameters:
  • path (Optional[str | Path])

  • backend (SanitizeBackend)
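
To show what rebuilding a record into one consistent fixed-width format involves, the sketch below formats a single atom using the commonly used PDBQT column layout (coordinates in columns 31 to 54, partial charge in columns 71 to 76, atom type in columns 78 to 79). The function name and its argument list are hypothetical, the atom-name padding is simplified, and the downgrading of pseudo-types such as CG0 and G0 is not covered because its mapping is not described here.

```python
def format_pdbqt_atom(record, serial, name, resname, chain, resseq,
                      x, y, z, charge, atom_type,
                      occupancy=1.0, temp_factor=0.0):
    """Hypothetical helper: emit one fixed-width ATOM/HETATM PDBQT line."""
    # Simplified PDB convention: names shorter than 4 characters start in column 14.
    padded_name = name if len(name) >= 4 else f" {name}"
    return (
        f"{record:<6}{serial:>5} {padded_name:<4}"   # record, serial, atom name
        f" {resname:>3} {chain:1}{resseq:>4}    "     # altLoc blank, residue, iCode blank
        f"{x:8.3f}{y:8.3f}{z:8.3f}"                   # coordinates, columns 31-54
        f"{occupancy:6.2f}{temp_factor:6.2f}    "     # occupancy, B-factor, 4 blanks
        f"{charge:6.3f} {atom_type:<2}"               # partial charge, atom type
    )


line = format_pdbqt_atom(
    "HETATM", 1, "O1", "LIG", "A", 1,
    11.104, 6.134, -6.504, charge=-0.302, atom_type="OA",
)
```

Two-character legacy types such as OA, NA, SA, and HD fit the final field exactly, whereas a three-character pseudo-type such as CG0 does not, which is consistent with the sanitizer downgrading those types.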

Conversion#

convert_with_mekoo(
mekoo_cmd,
input_pdb,
out_basename,
write_pdbqt=None,
box_center=None,
box_size=None,
*,
sanitize_rebuild=True,
sanitize_aggressive=False,
sanitize_backup=False,
)#

Run a Meeko receptor-preparation command and collect produced artifacts.

This helper calls mk_prepare_receptor.py (or a compatible wrapper) to generate receptor-preparation outputs. If a PDBQT file is produced, it is always sanitized in place.

Parameters:
  • mekoo_cmd (str) – Path or executable name for the Meeko receptor-preparation command.

  • input_pdb (Path) – Input PDB file passed to Meeko.

  • out_basename (Path) – Base output path used by Meeko for generated files, without the final extension.

  • write_pdbqt (Optional[Path]) – Optional explicit path for a PDBQT output file.

  • box_center (Optional[Tuple[float, float, float]]) – Optional grid-box center as (x, y, z).

  • box_size (Optional[Tuple[float, float, float]]) – Optional grid-box size as (sx, sy, sz).

  • sanitize_rebuild (Optional[bool]) – Whether sanitization should rebuild fixed-width ATOM/HETATM records.

  • sanitize_aggressive (bool) – Whether sanitization should use aggressive cleanup heuristics.

  • sanitize_backup (bool) – Whether a backup should be created before in-place sanitization.

Returns:

Dictionary summarizing command execution. Keys include "called", "rc", "stdout", "stderr", and "produced".

Return type:

Dict[str, Any]

convert_with_meeko(
mekoo_cmd,
input_pdb,
out_basename,
write_pdbqt=None,
box_center=None,
box_size=None,
*,
sanitize_rebuild=True,
sanitize_aggressive=False,
sanitize_backup=False,
)#

Run a Meeko receptor-preparation command and collect produced artifacts.

This helper calls mk_prepare_receptor.py (or a compatible wrapper) to generate receptor-preparation outputs. If a PDBQT file is produced, it is always sanitized in place.

Parameters:
  • mekoo_cmd (str) – Path or executable name for the Meeko receptor-preparation command.

  • input_pdb (Path) – Input PDB file passed to Meeko.

  • out_basename (Path) – Base output path used by Meeko for generated files, without the final extension.

  • write_pdbqt (Optional[Path]) – Optional explicit path for a PDBQT output file.

  • box_center (Optional[Tuple[float, float, float]]) – Optional grid-box center as (x, y, z).

  • box_size (Optional[Tuple[float, float, float]]) – Optional grid-box size as (sx, sy, sz).

  • sanitize_rebuild (Optional[bool]) – Whether sanitization should rebuild fixed-width ATOM/HETATM records.

  • sanitize_aggressive (bool) – Whether sanitization should use aggressive cleanup heuristics.

  • sanitize_backup (bool) – Whether a backup should be created before in-place sanitization.

Returns:

Dictionary summarizing command execution. Keys include "called", "rc", "stdout", "stderr", and "produced".

Return type:

Dict[str, Any]
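
The shape of the returned summary can be pictured with a small stand-in that runs an arbitrary command and fills the five documented keys. The value types chosen here (a list for "called", a list of existing paths for "produced") are assumptions, and run_and_summarize is a hypothetical name; the demo runs the Python interpreter instead of Meeko so that it works anywhere.

```python
import subprocess
import sys
from pathlib import Path


def run_and_summarize(cmd, expected_outputs):
    """Hypothetical helper: run cmd and summarize it under the documented keys."""
    # check=False so a failing command is reported through "rc", not an exception.
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return {
        "called": list(cmd),        # the command line that was executed
        "rc": proc.returncode,      # process exit code
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        # Only report files that actually exist after the run.
        "produced": [str(p) for p in expected_outputs if Path(p).exists()],
    }


summary = run_and_summarize([sys.executable, "-c", "print('ok')"], expected_outputs=[])
```

A caller would typically check summary["rc"] and summary["produced"] before using any generated PDBQT file.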

convert_with_obabel(
input_path,
output_path,
extra_args=None,
*,
sanitize_rebuild=False,
sanitize_aggressive=False,
sanitize_backup=False,
validate_receptor=False,
flexibility=False,
)#

Convert a structure file with Open Babel and sanitize generated PDBQT output.

Input and output formats are inferred from file extensions and passed to Open Babel using the canonical CLI form:

obabel -i<in_ext> input -o<out_ext> -O output

When the output format is .pdbqt, the generated file is sanitized in place. For receptor preparation, the helper can also validate whether the written PDBQT matches the requested rigid or flexible mode.

Parameters:
  • input_path (Path) – Input structure file path.

  • output_path (Path) – Output structure file path.

  • extra_args (Optional[Sequence[str]]) – Optional extra CLI arguments forwarded to Open Babel.

  • sanitize_rebuild (Optional[bool]) – Whether sanitization should rebuild fixed-width ATOM/HETATM records.

  • sanitize_aggressive (bool) – Whether sanitization should use aggressive cleanup heuristics.

  • sanitize_backup (bool) – Whether a backup should be created before in-place sanitization.

  • validate_receptor (bool) – Whether to validate the generated PDBQT as receptor output.

  • flexibility (bool) – Whether receptor PDBQT output should be generated in flexible mode.

Returns:

This function returns nothing.

Return type:

None

Raises:
  • RuntimeError – If Open Babel is unavailable, conversion fails, output is missing, or receptor validation fails.

  • ValueError – If the input or output file extension is missing.

pdb_to_pdbqt(
input_pdb,
output_pdbqt,
*,
mode,
backend,
extra_args=None,
meeko_cmd=None,
mgltools_cmd=None,
sanitize_rebuild=None,
sanitize_aggressive=False,
sanitize_backup=False,
validate_receptor=None,
flexibility=False,
)#

Convert PDB to PDBQT using a single explicit backend.

Parameters:
  • input_pdb (str | Path)

  • output_pdbqt (str | Path)

  • mode (Literal['receptor', 'ligand'])

  • backend (Literal['meeko', 'obabel', 'mgltools'])

  • extra_args (Sequence[str] | None)

  • meeko_cmd (str | None)

  • mgltools_cmd (str | None)

  • sanitize_rebuild (bool | None)

  • sanitize_aggressive (bool)

  • sanitize_backup (bool)

  • validate_receptor (bool | None)

  • flexibility (bool)

Return type:

Path

pdbqt_to_pdb(input_pdbqt, output_pdb, *, backend, extra_args=None)#

Convert PDBQT to PDB using Open Babel.

Parameters:
Return type:

Path

sdf_to_pdb(input_sdf, output_pdb, *, backend, extra_args=None)#

Convert SDF to PDB using a single explicit backend.

Parameters:
Return type:

Path

sdf_to_pdbqt(
input_sdf,
output_pdbqt,
*,
backend,
tmp_from_sdf_backend='rdkit',
extra_args=None,
meeko_cmd=None,
mgltools_cmd=None,
sanitize_rebuild=None,
sanitize_aggressive=False,
sanitize_backup=False,
)#

Convert SDF to PDBQT using a single explicit backend.

Parameters:
  • input_sdf (str | Path)

  • output_pdbqt (str | Path)

  • backend (Literal['meeko', 'obabel', 'mgltools'])

  • tmp_from_sdf_backend (Literal['rdkit', 'obabel'])

  • extra_args (Sequence[str] | None)

  • meeko_cmd (str | None)

  • mgltools_cmd (str | None)

  • sanitize_rebuild (bool | None)

  • sanitize_aggressive (bool)

  • sanitize_backup (bool)

Return type:

Path

ensure_pdbqt(
input_path,
output_dir,
*,
backend,
mode='ligand',
tmp_from_sdf_backend='rdkit',
extra_args=None,
meeko_cmd=None,
mgltools_cmd=None,
sanitize_rebuild=None,
sanitize_aggressive=False,
sanitize_backup=False,
validate_receptor=None,
flexibility=False,
)#

Ensure that an input file is available as PDBQT using the chosen backend.

Parameters:
  • input_path (str | Path)

  • output_dir (str | Path)

  • backend (Literal['meeko', 'obabel', 'mgltools'])

  • mode (Literal['receptor', 'ligand'])

  • tmp_from_sdf_backend (Literal['rdkit', 'obabel'])

  • extra_args (Sequence[str] | None)

  • meeko_cmd (str | None)

  • mgltools_cmd (str | None)

  • sanitize_rebuild (bool | None)

  • sanitize_aggressive (bool)

  • sanitize_backup (bool)

  • validate_receptor (bool | None)

  • flexibility (bool)

Return type:

Path

pdb_to_sdf(input_pdb, output_sdf, *, backend, extra_args=None)#

Convert PDB to SDF using a single explicit backend.

Parameters:
Return type:

Path

pdbqt_to_sdf(
input_pdbqt,
output_sdf,
*,
backend='auto',
extra_args=None,
is_dlg=False,
sanitize=True,
)#

Convert PDBQT to SDF.

Parameters:
Return type:

Path

load_sdf_for_interactions(
sdf_path,
*,
sanitize=True,
remove_hs=False,
strict_parsing=False,
resname='LIG',
resnumber=1,
chain='',
)#

Load the first valid ligand molecule from an SDF for interaction analysis.

This helper is intended for the common ProDock interaction workflow where a temporary SDF already corresponds to a single pose. It applies a relaxed fallback strategy if strict RDKit sanitization fails and restores minimal residue metadata required by downstream ProLIF conversion.

Parameters:
  • sdf_path (Union[str, Path]) – Input SDF path.

  • sanitize (bool) – Whether to sanitize on the first RDKit load attempt.

  • remove_hs (bool) – Whether RDKit should remove hydrogens during load.

  • strict_parsing (bool) – Whether RDKit strict parsing should be enabled.

  • resname (str) – Default residue name to attach if missing.

  • resnumber (int) – Default residue number to attach if missing.

  • chain (str) – Default chain identifier to attach if missing.

Returns:

First valid RDKit molecule from the SDF.

Return type:

Any

Raises:

ValueError – If no valid molecule can be loaded.