Preprocess API#

Grid box#

Utilities to compute docking grid boxes from ligand coordinates.

This module provides GridBox, a lightweight object-oriented interface for constructing docking search boxes from ligand coordinates. It supports:

  • loading ligands from file paths or raw structure text,

  • building boxes with multiple algorithms,

  • post-processing boxes by snapping or cubic expansion,

  • exporting Vina-style configuration snippets.

Low-level structure parsing is delegated to prodock.preprocess.gridbox.parsers, while geometric box-construction algorithms are delegated to prodock.preprocess.gridbox.algorithms.

Supported algorithms#

The module-level dispatch helper supports the following algorithm names:

  • "scale"

  • "pad"

  • "advanced"

  • "percentile"

  • "pca-aabb"

  • "centroid-fixed"

  • "union"

Example#

from prodock.preprocess.gridbox.gridbox import GridBox, compute_with_algo

gb = GridBox().load_ligand("ligand.sdf").from_ligand_scale(
    scale=2.0,
    isotropic=True,
)

print(gb.center)
print(gb.size)
print(gb.to_vina_lines())

gb2 = compute_with_algo("pad", "ligand.sdf", pad=3.0, isotropic=False)
print(gb2.to_vina_lines())

Configured automatic execution is also supported:

gb = GridBox(algo="advanced", algo_kwargs={"pad": 4.0, "snap": 0.25})
gb.load_ligand("ligand.sdf")
compute_with_algo(algoname, ligand, **kwargs)#

Load a ligand and compute a grid box using a named algorithm.

This convenience wrapper constructs a GridBox, loads the ligand, looks up the requested algorithm in ALGO_MAP, and applies it using the provided keyword arguments.

Parameters:
  • algoname (str) – Algorithm key defined in ALGO_MAP.

  • ligand (Union[str, Path]) – Ligand source, given either as a filesystem path or raw structure text.

  • kwargs (dict) – Algorithm-specific keyword arguments forwarded to the selected builder.

Returns:

Grid box after algorithm application.

Return type:

GridBox

Raises:

ValueError – If the algorithm name is unknown or ligand parsing fails.
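
The name-based lookup described above can be pictured as a small dispatch table. The block below is only a sketch under stated assumptions: SKETCH_ALGO_MAP, dispatch, and the two one-dimensional toy builders are hypothetical names made up for illustration, and the real ALGO_MAP may be organised differently.

```python
from typing import Callable, Dict, Tuple

Box1D = Tuple[float, float]  # toy stand-in: (center, size) along one axis


def pad_builder(lo: float, hi: float, pad: float = 4.0) -> Box1D:
    """Toy builder: span plus padding on both sides."""
    return ((lo + hi) / 2.0, (hi - lo) + 2.0 * pad)


def scale_builder(lo: float, hi: float, scale: float = 2.0) -> Box1D:
    """Toy builder: span multiplied by a factor."""
    return ((lo + hi) / 2.0, (hi - lo) * scale)


# Hypothetical dispatch table keyed by algorithm name (not the library's ALGO_MAP).
SKETCH_ALGO_MAP: Dict[str, Callable[..., Box1D]] = {
    "pad": pad_builder,
    "scale": scale_builder,
}


def dispatch(algoname: str, lo: float, hi: float, **kwargs) -> Box1D:
    """Look up a builder by name and forward keyword arguments to it."""
    try:
        builder = SKETCH_ALGO_MAP[algoname]
    except KeyError:
        # Unknown names are reported as ValueError, mirroring the documented contract.
        raise ValueError(f"Unknown algorithm: {algoname!r}") from None
    return builder(lo, hi, **kwargs)
```

Keeping the mapping in one dictionary means a new algorithm only needs a new key, while callers keep using the same entry point.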

Example#

gb = compute_with_algo(
    "pad",
    "ligand.sdf",
    pad=4.0,
    isotropic=False,
)
class GridBox(mol=None, algo=None, algo_kwargs=None, round_ndigits=3)#

Bases: object

Represent and compute a docking grid box.

Instances store an optional ligand molecule together with a computed box center and size. Builder methods mutate the instance and return self so calls can be chained.

An algorithm may optionally be configured at construction time via algo and algo_kwargs. When present, that algorithm is applied automatically after a ligand is loaded, either through the constructor mol argument or later via load_ligand().

Parameters:
  • mol (Optional[Chem.Mol]) – Optional RDKit molecule used to initialize the object.

  • algo (Optional[str]) – Optional algorithm name corresponding to a key in ALGO_MAP.

  • algo_kwargs (Optional[Dict[str, Any]]) – Optional keyword arguments forwarded to the configured algorithm.

  • round_ndigits (int) – Default round_ndigits value inserted into algo_kwargs when that key is not already provided.

Example#

gb = GridBox()
gb.load_ligand("ligand.sdf")
gb.from_ligand_pad(pad=4.0, isotropic=False)

print(gb.center)
print(gb.size)

Automatic algorithm execution:

gb = GridBox(
    algo="percentile",
    algo_kwargs={"low": 5.0, "high": 95.0, "pad": 2.0},
)
gb.load_ligand("ligand.sdf")
load_ligand(data, fmt=None)#

Load a ligand from a path or raw structure text.

If an initialization algorithm was configured when the object was constructed, that algorithm is executed automatically after successful parsing.

Parameters:
  • data (Union[str, Path]) – Ligand source as a filesystem path or raw molecular text.

  • fmt (Optional[str]) – Optional explicit format hint such as "sdf", "pdb", "mol2", or "xyz".

Returns:

The current grid box instance.

Return type:

GridBox

Raises:

ValueError – If ligand parsing fails.

Example#

gb = GridBox()
gb.load_ligand("ligand.sdf")
from_ligand_scale(scale=2.0, isotropic=False, round_ndigits=3)#

Build a box by scaling the ligand axis-aligned bounding box.

The box center is computed from the ligand coordinate bounds, and the box size is computed as ligand span multiplied by scale.

Parameters:
  • scale (float) – Multiplicative factor applied to the ligand span.

  • isotropic (bool) – If True, use the maximum scaled span for all axes to produce a cubic box.

  • round_ndigits (int) – Number of decimal places used to round output values.

Returns:

The current grid box instance.

Return type:

GridBox

Raises:

ValueError – If no ligand has been loaded.
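
The scaling rule described for this builder can be written out in a few lines. This is a minimal standalone sketch, assuming the ligand is given as a plain list of (x, y, z) tuples; scaled_box is a hypothetical helper and not part of the GridBox API.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def scaled_box(
    coords: Sequence[Vec3],
    scale: float = 2.0,
    isotropic: bool = False,
    ndigits: int = 3,
) -> Tuple[Vec3, Vec3]:
    """Return (center, size) of the scaled axis-aligned bounding box."""
    if not coords:
        raise ValueError("no coordinates supplied")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    # The center is the midpoint of the coordinate bounds.
    center = tuple(round((lo + hi) / 2.0, ndigits) for lo, hi in zip(mins, maxs))
    # The size is the ligand span multiplied by the scale factor.
    size = [(hi - lo) * scale for lo, hi in zip(mins, maxs)]
    if isotropic:
        # Use the largest scaled span on all axes to obtain a cube.
        size = [max(size)] * 3
    return center, tuple(round(s, ndigits) for s in size)
```

For a ligand spanning 2 × 4 × 6 Å, scale=2.0 gives a 4 × 8 × 12 Å box, or a 12 Å cube when isotropic=True.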

from_ligand_pad(pad=4.0, isotropic=False, min_size=0.0, round_ndigits=3)#

Build a box by padding the ligand axis-aligned bounding box.

The resulting box size is the ligand span plus twice the padding, with optional minimum edge lengths enforced afterwards.

Parameters:
  • pad (Union[float, Tuple[float, float, float]]) – Padding in Ångström, provided either as a scalar or per-axis triple.

  • isotropic (bool) – If True, first convert the ligand span to a cubic span before applying padding.

  • min_size (Union[float, Tuple[float, float, float]]) – Minimum allowed size, provided either as a scalar or per-axis triple.

  • round_ndigits (int) – Number of decimal places used to round output values.

Returns:

The current grid box instance.

Return type:

GridBox

Raises:

ValueError – If no ligand has been loaded.
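
The padding rule (span plus twice the padding, then a minimum edge length) can be sketched as follows. The helper names padded_box and _triple are hypothetical, and the input is assumed to be a plain list of (x, y, z) tuples rather than an RDKit molecule.

```python
from typing import Sequence, Tuple, Union

Vec3 = Tuple[float, float, float]
ScalarOrTriple = Union[float, Tuple[float, float, float]]


def _triple(value: ScalarOrTriple) -> Vec3:
    """Broadcast a scalar to three axes, or validate a per-axis triple."""
    if isinstance(value, (int, float)):
        return (float(value),) * 3
    if len(value) != 3:
        raise ValueError("expected a scalar or a 3-tuple")
    return tuple(float(v) for v in value)


def padded_box(
    coords: Sequence[Vec3],
    pad: ScalarOrTriple = 4.0,
    isotropic: bool = False,
    min_size: ScalarOrTriple = 0.0,
    ndigits: int = 3,
) -> Tuple[Vec3, Vec3]:
    """Return (center, size) of the padded axis-aligned bounding box."""
    if not coords:
        raise ValueError("no coordinates supplied")
    pads, floors = _triple(pad), _triple(min_size)
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    span = [hi - lo for lo, hi in zip(mins, maxs)]
    if isotropic:
        # Convert the span to a cubic span before padding, as documented.
        span = [max(span)] * 3
    # Padding is added on both sides; the minimum size is enforced afterwards.
    size = [max(s + 2.0 * p, f) for s, p, f in zip(span, pads, floors)]
    center = tuple(round((lo + hi) / 2.0, ndigits) for lo, hi in zip(mins, maxs))
    return center, tuple(round(s, ndigits) for s in size)
```

Because padding is applied per axis after the optional cubic conversion, a per-axis pad combined with isotropic=True can still yield a non-cubic box in this sketch.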

from_ligand_pad_adv(
pad=4.0,
isotropic=False,
min_size=0.0,
*,
heavy_only=False,
snap_step=None,
round_ndigits=3,
)#

Build a box with advanced padding logic.

This method extends simple padding with optional heavy-atom-only bounds and optional snapping of the resulting center and size.

Parameters:
  • pad (Union[float, Tuple[float, float, float]]) – Padding in Ångström, provided either as a scalar or per-axis triple.

  • isotropic (bool) – If True, produce a cubic box using the maximum span.

  • min_size (Union[float, Tuple[float, float, float]]) – Minimum allowed size, provided either as a scalar or per-axis triple.

  • heavy_only (bool) – If True, compute the ligand bounds using heavy atoms only.

  • snap_step (Optional[float]) – Optional snapping interval in Ångström applied to center and size.

  • round_ndigits (int) – Number of decimal places used to round output values.

Returns:

The current grid box instance.

Return type:

GridBox

Raises:

ValueError – If no ligand has been loaded.
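
The effect of heavy_only is easiest to see on the coordinate bounds themselves. The sketch below assumes atoms are given as (element, x, y, z) tuples and treats only the symbol "H" as a non-heavy atom; atom_bounds is a hypothetical helper, not the library's implementation.

```python
from typing import Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (element symbol, x, y, z)
Vec3 = Tuple[float, float, float]


def atom_bounds(atoms: Sequence[Atom], heavy_only: bool = False) -> Tuple[Vec3, Vec3]:
    """Return (mins, maxs) over all atoms, optionally ignoring hydrogens."""
    pts = [
        (x, y, z)
        for element, x, y, z in atoms
        if not (heavy_only and element.strip().upper() == "H")
    ]
    if not pts:
        raise ValueError("no atoms left to compute bounds from")
    mins = tuple(min(p[i] for p in pts) for i in range(3))
    maxs = tuple(max(p[i] for p in pts) for i in range(3))
    return mins, maxs
```

Excluding hydrogens typically tightens the bounds slightly, so the same pad produces a somewhat smaller box.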

from_ligand_percentile(
low=5.0,
high=95.0,
pad=0.0,
isotropic=False,
round_ndigits=3,
)#

Build a box from coordinate percentiles.

This builder reduces the influence of outlier coordinates by using lower and upper coordinate percentiles instead of raw extrema.

Parameters:
  • low (float) – Lower percentile in the range 0 to 100.

  • high (float) – Upper percentile in the range 0 to 100.

  • pad (float) – Padding in Ångström applied after percentile bounds are computed.

  • isotropic (bool) – If True, make the final box cubic using the maximum span.

  • round_ndigits (int) – Number of decimal places used to round output values.

Returns:

The current grid box instance.

Return type:

GridBox

Raises:

ValueError – If no ligand has been loaded.
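
A percentile-based box can be sketched as below. Two points are assumptions rather than documented behaviour: percentiles are computed with linear interpolation between sorted values, and pad is added on both sides of each axis. The helper names percentile and percentile_box are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation; q is in the range 0 to 100."""
    if not values:
        raise ValueError("no values supplied")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must lie between 0 and 100")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)  # pos is non-negative, so int() acts as floor
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def percentile_box(
    coords: Sequence[Vec3],
    low: float = 5.0,
    high: float = 95.0,
    pad: float = 0.0,
    ndigits: int = 3,
) -> Tuple[Vec3, Vec3]:
    """Return (center, size) from per-axis percentile bounds."""
    center, size = [], []
    for axis in range(3):
        axis_values = [c[axis] for c in coords]
        lo, hi = percentile(axis_values, low), percentile(axis_values, high)
        center.append(round((lo + hi) / 2.0, ndigits))
        size.append(round((hi - lo) + 2.0 * pad, ndigits))
    return tuple(center), tuple(size)
```

With low=0 and high=100 the bounds reduce to the raw extrema, so the result matches a plain padded bounding box.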

from_ligand_pca_aabb(scale=1.0, pad=0.0, isotropic=False, round_ndigits=3)#

Build a box using a PCA-oriented bounding procedure.

The ligand is analyzed in a PCA frame, expanded there, and the result is converted back to an axis-aligned bounding box in the original frame.

Parameters:
  • scale (float) – Scale factor applied in PCA space.

  • pad (float) – Padding in Ångström applied in PCA space.

  • isotropic (bool) – If True, make the final box cubic using the maximum axis.

  • round_ndigits (int) – Number of decimal places used to round output values.

Returns:

The current grid box instance.

Return type:

GridBox

Raises:

ValueError – If no ligand has been loaded.

from_centroid_fixed(size)#

Center the box at the ligand centroid and use a fixed user-supplied size.

Parameters:

size (Tuple[float, float, float]) – Explicit box size given as (sx, sy, sz) in Ångström.

Returns:

The current grid box instance.

Return type:

GridBox

Raises:

ValueError – If no ligand has been loaded or if size contains non-positive values.
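
As a rough illustration, the centroid-plus-fixed-size rule could look like the sketch below. It assumes the centroid is the unweighted mean of the atom coordinates (the source does not say whether masses are used), and centroid_fixed_box is a hypothetical helper name.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def centroid_fixed_box(coords: Sequence[Vec3], size: Vec3) -> Tuple[Vec3, Vec3]:
    """Center a box of fixed size at the unweighted coordinate centroid."""
    if not coords:
        raise ValueError("no coordinates supplied")
    if len(size) != 3 or any(s <= 0 for s in size):
        raise ValueError("size must contain three positive values")
    n = len(coords)
    center = tuple(sum(c[i] for c in coords) / n for i in range(3))
    return center, tuple(float(s) for s in size)
```

Unlike the bounding-box builders, the ligand extent does not influence the size here, only the placement.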

from_union(ligand_paths, fmt=None, pad=0.0, round_ndigits=3)#

Build the axis-aligned union of boxes computed for multiple ligands.

Each ligand is parsed independently and converted into a padded box before all boxes are merged.

Parameters:
  • ligand_paths (Iterable[Union[str, Path]]) – Iterable of ligand paths or raw text entries.

  • fmt (Optional[str]) – Optional format hint used when entries are raw text.

  • pad (float) – Padding in Ångström applied to each ligand before union.

  • round_ndigits (int) – Number of decimal places used to round output values.

Returns:

The current grid box instance.

Return type:

GridBox

Raises:

ValueError – If any ligand fails to parse or if no ligands are provided.
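
Merging several boxes into their axis-aligned union amounts to taking the extreme corners over all boxes. The following sketch assumes each box is already available as a (center, size) pair; union_boxes is a hypothetical helper and does not parse ligands.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (center, size)


def union_boxes(boxes: Sequence[Box], ndigits: int = 3) -> Box:
    """Return the smallest axis-aligned box that contains every input box."""
    if not boxes:
        raise ValueError("no boxes supplied")
    # Convert each (center, size) pair to its lower and upper corner per axis.
    lows = [min(c[i] - s[i] / 2.0 for c, s in boxes) for i in range(3)]
    highs = [max(c[i] + s[i] / 2.0 for c, s in boxes) for i in range(3)]
    center = tuple(round((lo + hi) / 2.0, ndigits) for lo, hi in zip(lows, highs))
    size = tuple(round(hi - lo, ndigits) for lo, hi in zip(lows, highs))
    return center, size
```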

Example#

gb = GridBox().from_union(
    ["lig1.sdf", "lig2.sdf", "lig3.sdf"],
    pad=2.0,
)
grow_to_min_cube()#

Expand the current box into the smallest cube containing it.

The center is preserved, and all three edge lengths are set to the current maximum edge length.

Returns:

The current grid box instance.

Return type:

GridBox

snap(step=0.25, round_ndigits=3)#

Snap the current center and size to a regular grid.

Center and size values are snapped to multiples of step and then rounded to round_ndigits decimal places.

Parameters:
  • step (float) – Grid step size in Ångström.

  • round_ndigits (int) – Number of decimal places used after snapping.

Returns:

The current grid box instance.

Return type:

GridBox
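
The two post-processing steps, cubic expansion and snapping, can be sketched on plain tuples. The sketch assumes snapping rounds each value to the nearest multiple of step; the library might round sizes upwards instead. snap_value, snap_box, and grow_to_cube are hypothetical names.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def snap_value(value: float, step: float, ndigits: int = 3) -> float:
    """Round a value to the nearest multiple of step (assumed rounding rule)."""
    if step <= 0:
        raise ValueError("step must be positive")
    return round(round(value / step) * step, ndigits)


def snap_box(center: Vec3, size: Vec3, step: float = 0.25, ndigits: int = 3) -> Tuple[Vec3, Vec3]:
    """Snap both center and size to the grid defined by step."""
    return (
        tuple(snap_value(v, step, ndigits) for v in center),
        tuple(snap_value(v, step, ndigits) for v in size),
    )


def grow_to_cube(center: Vec3, size: Vec3) -> Tuple[Vec3, Vec3]:
    """Keep the center and set every edge to the current maximum edge length."""
    edge = max(size)
    return center, (edge, edge, edge)
```

Snapping before growing to a cube keeps the final edge length on the grid, since the maximum of snapped values is itself a snapped value.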

property center: Tuple[float, float, float]#

Return the computed box center.

Returns:

Box center as (x, y, z) in Ångström.

Return type:

Tuple[float, float, float]

Raises:

ValueError – If the center has not been computed yet.

property size: Tuple[float, float, float]#

Return the computed box size.

Returns:

Box size as (sx, sy, sz) in Ångström.

Return type:

Tuple[float, float, float]

Raises:

ValueError – If the size has not been computed yet.

property vina_dict: Dict[str, float]#

Return the box in a Vina-compatible dictionary representation.

The returned dictionary contains the six standard Vina keys: center_x, center_y, center_z, size_x, size_y, and size_z.

Returns:

Dictionary of Vina-style box parameters.

Return type:

Dict[str, float]

to_vina_lines(fmt='{k} = {v:.3f}')#

Render the current box as a Vina-style multiline text block.

Parameters:

fmt (str) – Per-line format string receiving k and v fields.

Returns:

Multiline text snippet containing Vina box parameters.

Return type:

str
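
How a per-line format string with k and v fields turns the six Vina keys into text can be shown with plain str.format. This is a sketch only: vina_lines is a hypothetical function, and the key order is assumed to follow the order listed for vina_dict.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def vina_lines(center: Vec3, size: Vec3, fmt: str = "{k} = {v:.3f}") -> str:
    """Render center and size as one 'key = value' line per Vina parameter."""
    params = {
        "center_x": center[0],
        "center_y": center[1],
        "center_z": center[2],
        "size_x": size[0],
        "size_y": size[1],
        "size_z": size[2],
    }
    # Each line is produced by filling the k (key) and v (value) fields of fmt.
    return "\n".join(fmt.format(k=key, v=value) for key, value in params.items())
```

Passing a different template, for example "--{k} {v:.2f}", would produce command-line style flags from the same data.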

Example#

print(gb.to_vina_lines())
as_tuple()#

Return the box as (center, size).

Returns:

Pair of tuples containing center and size.

Return type:

Tuple[Tuple[float, float, float], Tuple[float, float, float]]

Ligand preparation#

prodock.process.ligand.prep#

Utilities for converting SMILES strings into per-ligand 3D structure files.

This module provides the LigandPrep class, which supports:

  • loading ligands from SMILES lists, dictionaries, or pandas DataFrames,

  • optional 3D embedding and geometry optimization,

  • writing one intermediate SDF per ligand,

  • converting SDF files into final formats such as PDB or PDBQT,

  • keeping all generated structures in memory as MolBlock strings,

  • exporting a CSV manifest summarizing processing results.

Overview#

Each ligand is represented internally as a record containing the input SMILES, optional name, processing status, output path, error message, and generated MolBlock string.

Default behavior#

  • final output format: "pdbqt"

  • conversion backend: "meeko"

  • explicit hydrogens are added before embedding

  • 3D embedding is enabled

  • geometry optimization is enabled

  • intermediate SDF files are removed after conversion unless LigandPrep.set_keep_intermediate() is enabled

Processing model#

For each input ligand, the workflow is typically:

  1. Parse the SMILES string

  2. Build 3D coordinates using either Conformer or an RDKit fallback

  3. Optionally optimize the geometry

  4. Store the generated MolBlock in memory

  5. Write an intermediate SDF file

  6. Optionally convert the SDF into PDB or PDBQT

If the final output format is already "sdf", no additional structure conversion is performed.

Examples#

Basic usage from a list of SMILES:

from prodock.process.ligand import LigandPrep

proc = (
    LigandPrep(output_dir="ligands_out")
    .from_smiles_list(
        ["CCO", "c1ccccc1"],
        names=["ethanol", "benzene"],
    )
    .process_all()
    .save_manifest("ligands_manifest.csv")
)

print(proc.summary)
print(proc.output_paths)

Using in-memory mode only (no files written):

proc = (
    LigandPrep(output_dir=None)
    .set_output_format("sdf")
    .from_smiles_list(["CCO", "CCN"])
    .process_all()
)

print(proc.sdf_strings[0])

Loading from a pandas DataFrame:

import pandas as pd
from prodock.process.ligand import LigandPrep

df = pd.DataFrame(
    {
        "smiles": ["CCO", "CC(=O)O"],
        "name": ["ethanol", "acetic_acid"],
    }
)

proc = (
    LigandPrep(output_dir="ligands_df")
    .from_dataframe(df)
    .set_output_format("pdb")
    .set_converter_backend("obabel")
    .process_all()
)

print(proc.ok)
class LigandPrep(output_dir='ligands_out', smiles_key='smiles', name_key='name', index_pad=4)#

Bases: object

High-level helper to convert SMILES strings into per-ligand 3D structure files.

This class provides a compact workflow for ligand preparation starting from SMILES input. It can ingest ligands from multiple input formats, generate 3D coordinates, optimize geometries, write intermediate SDF files, and convert them into downstream formats such as PDB and PDBQT.

One internal record is stored for each ligand and updated in place during processing.

Internal record schema#

Each record has the form:

{
    "index": int,
    "smiles": str,
    "name": str,
    "out_path": Optional[Path],
    "status": "pending" | "ok" | "failed",
    "error": Optional[str],
    "molblock": Optional[str],
}
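
The in-place update of a record during processing can be illustrated without RDKit. In the sketch below, make_record, process_record, and toy_builder are hypothetical helpers; toy_builder merely stands in for the real SMILES-to-MolBlock step and treats any string containing a space as unparseable.

```python
from typing import Any, Callable, Dict


def make_record(index: int, smiles: str, name: str) -> Dict[str, Any]:
    """Create a record in the documented shape with status 'pending'."""
    return {
        "index": index,
        "smiles": smiles,
        "name": name,
        "out_path": None,
        "status": "pending",
        "error": None,
        "molblock": None,
    }


def toy_builder(smiles: str) -> str:
    """Stand-in for 3D generation: fails on empty or space-containing input."""
    if not smiles or " " in smiles:
        raise ValueError(f"cannot parse SMILES: {smiles!r}")
    return f"MOLBLOCK({smiles})"


def process_record(record: Dict[str, Any], builder: Callable[[str], str]) -> Dict[str, Any]:
    """Update a record in place; failures are captured instead of raised."""
    try:
        record["molblock"] = builder(record["smiles"])
        record["status"] = "ok"
        record["error"] = None
    except Exception as exc:  # keep batch processing going on per-ligand errors
        record["status"] = "failed"
        record["error"] = str(exc)
    return record
```

Capturing the error message on the record lets one bad ligand fail without interrupting the rest of the batch.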

Default behavior#

  • output format: "pdbqt"

  • conversion backend: "meeko"

  • intermediate SDF files are deleted by default

  • 3D embedding, hydrogen addition, and optimization are enabled

Notes#

  • If the optional Conformer helper is available, it is preferred for 3D generation.

  • Otherwise, an RDKit-based fallback is used.

  • If output_dir is None, processing is performed entirely in memory and no files are written.

  • If the final output format is not "sdf", an intermediate SDF is written and converted using structure conversion helpers.

Parameters:
  • output_dir (Optional[Union[str, Path]]) – Directory used for writing output files. If None, file output is disabled and structures are only stored in memory.

  • smiles_key (str) – Key used to locate SMILES values in dictionary rows and DataFrames.

  • name_key (str) – Key used to locate ligand names in dictionary rows and DataFrames.

  • index_pad (int) – Zero-padding width used when auto-generating names for unnamed ligands. For example, 4 produces names such as 0000, 0001, and so on.

Raises:

OSError – If the output directory cannot be created.
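
The zero-padded fallback names controlled by index_pad correspond to ordinary integer formatting. A minimal sketch, with auto_name as a hypothetical helper:

```python
def auto_name(index: int, index_pad: int = 4) -> str:
    """Format a record index as a zero-padded name of at least index_pad digits."""
    return f"{index:0{index_pad}d}"
```

Indices with more digits than index_pad are not truncated, so names remain unique for large inputs.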

Example#

Create and process ligands into PDBQT files:

from prodock.process.ligand import LigandPrep

proc = (
    LigandPrep(output_dir="ligands_out")
    .set_output_format("pdbqt")
    .set_converter_backend("meeko")
    .from_smiles_list(
        ["CCO", "CCN"],
        names=["ethanol", "ethylamine"],
    )
    .process_all()
)

print(proc.summary)
print(proc.output_paths)

Example#

Process ligands without writing any files:

proc = (
    LigandPrep(output_dir=None)
    .from_smiles_list(["CCO"])
    .process_all()
)

print(proc.sdf_strings)
print(proc.mols)
set_options(embed3d=None, add_hs=None, optimize=None)#

Set simple boolean processing options.

Only values explicitly provided are updated. Passing None leaves the corresponding setting unchanged.

Parameters:
  • embed3d (Optional[bool]) – Enable or disable 3D coordinate embedding.

  • add_hs (Optional[bool]) – Enable or disable explicit hydrogen addition before embedding.

  • optimize (Optional[bool]) – Enable or disable geometry optimization after embedding.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

proc = LigandPrep().set_options(
    embed3d=True,
    add_hs=True,
    optimize=False,
)
set_embed_method(embed_algorithm)#

Set the embedding algorithm used by Conformer or RDKit.

Common values include "ETKDGv3", "ETKDGv2", and "ETKDG". Passing None clears the explicit preference and lets the fallback logic choose the best available method.

Parameters:

embed_algorithm (Optional[str]) – Embedding algorithm name, or None.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

proc = LigandPrep().set_embed_method("ETKDGv3")
set_opt_method(method)#

Set the molecular mechanics optimization method.

Typical values include "MMFF94" and "UFF". The chosen method is used for geometry optimization after 3D embedding, when optimization is enabled.

Parameters:

method (str) – Optimizer name.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

proc = LigandPrep().set_opt_method("UFF")
set_conformer_seed(seed)#

Set the random seed used for conformer generation.

A fixed seed makes conformer generation reproducible in embedding workflows that support seeding.

Parameters:

seed (int) – Integer random seed.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

proc = LigandPrep().set_conformer_seed(123)
set_conformer_jobs(n_jobs)#

Set the number of parallel jobs used for conformer generation.

This setting is forwarded to the optional Conformer helper when available.

Parameters:

n_jobs (int) – Number of parallel jobs.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

proc = LigandPrep().set_conformer_jobs(4)
set_opt_max_iters(max_iters)#

Set the maximum number of optimization iterations.

This value is used by the selected force-field optimizer.

Parameters:

max_iters (int) – Maximum optimization iteration count.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

proc = LigandPrep().set_opt_max_iters(500)
set_output_format(fmt)#

Set the final output format for processed ligands.

Supported formats are "sdf", "pdb", and "pdbqt".

Parameters:

fmt (str) – Requested output format string.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Raises:

ValueError – If the requested format is unsupported.

Example#

proc = LigandPrep().set_output_format("pdbqt")
set_converter_backend(backend)#

Set the backend used for SDF-to-final-format conversion.

Typical values include:

  • "meeko" for PDBQT conversion

  • "obabel" for PDB or PDBQT conversion

  • "rdkit" for supported PDB conversion paths

Parameters:

backend (Optional[str]) – Backend name, or None to clear the explicit selection.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

proc = LigandPrep().set_converter_backend("obabel")
set_backend(backend)#

Alias for set_converter_backend().

This method exists as a short convenience name.

Parameters:

backend (Optional[str]) – Backend name, or None.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

proc = LigandPrep().set_backend("meeko")
set_keep_intermediate(keep)#

Control whether intermediate SDF files are retained.

When the final output format is not "sdf", each ligand is first written to an intermediate SDF file. By default, that file is removed after conversion. Setting keep=True preserves it.

Parameters:

keep (bool) – Whether to keep intermediate SDF files.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

proc = LigandPrep().set_keep_intermediate(True)
set_output_dir(path)#

Set or clear the output directory used for file writing.

Passing None switches the instance into in-memory mode, where MolBlock strings are still generated but no output files are written.

Parameters:

path (Optional[Union[str, Path]]) – New output directory path, or None to disable file writing.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Raises:

OSError – If the new directory cannot be created.

Example#

proc = LigandPrep().set_output_dir("prepared_ligands")
from_smiles_list(smiles, names=None)#

Load ligand records from a sequence of SMILES strings.

Optional names can be supplied in parallel. If no names are provided, fallback names based on the record index are used when output files are written.

Parameters:
  • smiles (Sequence[str]) – Sequence of SMILES strings.

  • names (Optional[Sequence[str]]) – Optional sequence of ligand names with the same length as smiles.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Raises:

ValueError – If names is provided but its length does not match smiles.

Example#

proc = (
    LigandPrep()
    .from_smiles_list(
        ["CCO", "CCN"],
        names=["ethanol", "ethylamine"],
    )
)
from_list_of_dicts(rows)#

Load ligand records from a sequence of dictionaries.

Each row must contain at least the configured SMILES key. If the configured name key is present, it is used as the ligand name.

Parameters:

rows (Sequence[Dict[str, Any]]) – Sequence of dictionaries containing ligand metadata.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

rows = [
    {"smiles": "CCO", "name": "ethanol"},
    {"smiles": "CCN", "name": "ethylamine"},
]

proc = LigandPrep().from_list_of_dicts(rows)
from_dataframe(df)#

Load ligand records from a pandas DataFrame.

The DataFrame must contain at least the configured SMILES column. If the configured name column exists, it is used to populate ligand names.

Parameters:

df (pandas.DataFrame) – DataFrame containing ligand input records.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

import pandas as pd

df = pd.DataFrame(
    {
        "smiles": ["CCO", "CCN"],
        "name": ["ethanol", "ethylamine"],
    }
)

proc = LigandPrep().from_dataframe(df)
process_all(start=0, stop=None)#

Process all loaded ligand records between start and stop.

Each selected record is converted into a MolBlock representation in memory. If file output is enabled, an intermediate SDF is written and optionally converted into the configured final format.

The range follows standard Python slicing rules: start is inclusive and stop is exclusive.

Parameters:
  • start (int) – Start index of the record range to process, inclusive.

  • stop (Optional[int]) – Stop index of the record range to process, exclusive. If None, processing continues to the end of the loaded records.

Returns:

The current instance for method chaining.

Return type:

LigandPrep
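
The slicing semantics of start and stop can be checked on a plain list. In this sketch, select_indices is a hypothetical helper that returns the record indices a given range would cover.

```python
from typing import List, Optional, Sequence


def select_indices(records: Sequence[object], start: int = 0, stop: Optional[int] = None) -> List[int]:
    """Return the indices covered by records[start:stop] (stop is exclusive)."""
    return list(range(len(records)))[start:stop]
```

With three loaded records, start=1 and stop=3 select the second and third record, which matches the example that follows.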

Example#

proc = (
    LigandPrep(output_dir="ligands_out")
    .from_smiles_list(["CCO", "CCN", "CCC"])
    .process_all(start=1, stop=3)
)

print(proc.summary)
save_manifest(path='ligands_manifest.csv')#

Save a CSV manifest describing all processed records.

The manifest contains one row per internal record and includes the record index, input SMILES, ligand name, output path, status, and error message.

Parameters:

path (Union[str, Path]) – Destination path for the CSV manifest.

Returns:

The current instance for method chaining.

Return type:

LigandPrep
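
A manifest with one row per record can be produced with the standard csv module. The sketch below writes to a string instead of a file; the column names and their order are assumptions based on the fields listed above, and manifest_csv is a hypothetical helper.

```python
import csv
import io
from typing import Any, Dict, Sequence

# Assumed column layout; the real manifest may name or order columns differently.
FIELDS = ["index", "smiles", "name", "out_path", "status", "error"]


def manifest_csv(records: Sequence[Dict[str, Any]]) -> str:
    """Serialize records to CSV text, writing None values as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = {key: "" if record.get(key) is None else record[key] for key in FIELDS}
        writer.writerow(row)
    return buffer.getvalue()
```

The in-memory MolBlock is deliberately left out of the row, in line with the column list given for the manifest.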

Example#

proc = (
    LigandPrep(output_dir="ligands_out")
    .from_smiles_list(["CCO"], names=["ethanol"])
    .process_all()
    .save_manifest("ligands_manifest.csv")
)
property records: List[Dict[str, Any]]#

Return a shallow copy of the internal record list.

Returns:

List of record dictionaries.

Return type:

List[Dict[str, Any]]

Example#

proc = LigandPrep().from_smiles_list(["CCO"])
print(proc.records)
property output_paths: List[Path | None]#

Return output paths corresponding to all records.

Entries may be None for records processed in in-memory mode or for failed records.

Returns:

List of output paths or None.

Return type:

List[Optional[Path]]

Example#

proc = LigandPrep(output_dir="ligands_out").from_smiles_list(["CCO"])
proc.process_all()
print(proc.output_paths)
property failed: List[Dict[str, Any]]#

Return records that failed processing.

Returns:

List of failed record dictionaries.

Return type:

List[Dict[str, Any]]

Example#

proc = LigandPrep().from_smiles_list(["not_a_smiles"]).process_all()
print(proc.failed)
property ok: List[Dict[str, Any]]#

Return records that were processed successfully.

Returns:

List of successful record dictionaries.

Return type:

List[Dict[str, Any]]

Example#

proc = LigandPrep().from_smiles_list(["CCO"]).process_all()
print(proc.ok)
property summary: Dict[str, int]#

Return summary counts for total, successful, failed, and pending records.

Returns:

Summary dictionary with keys "total", "ok", "failed", and "pending".

Return type:

Dict[str, int]
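
The summary counts follow directly from the status field of each record. A minimal sketch using collections.Counter, with summarize as a hypothetical helper:

```python
from collections import Counter
from typing import Any, Dict, Sequence


def summarize(records: Sequence[Dict[str, Any]]) -> Dict[str, int]:
    """Count records by status and report the total alongside."""
    counts = Counter(record["status"] for record in records)
    return {
        "total": len(records),
        "ok": counts["ok"],
        "failed": counts["failed"],
        "pending": counts["pending"],
    }
```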

Example#

proc = LigandPrep().from_smiles_list(["CCO", "CCN"]).process_all()
print(proc.summary)
property sdf_strings: List[str]#

Return MolBlock strings for successfully processed records.

Despite the property name, the stored values are MolBlock strings held in memory and not full multi-record SDF files.

Returns:

List of MolBlock strings.

Return type:

List[str]

Example#

proc = LigandPrep(output_dir=None).from_smiles_list(["CCO"]).process_all()
print(proc.sdf_strings[0])
property mols: List[Any]#

Return RDKit Mol objects parsed from stored MolBlock strings.

Returns:

List of RDKit Mol objects.

Return type:

List[Any]

Raises:

RuntimeError – If RDKit is unavailable.

Example#

proc = LigandPrep(output_dir=None).from_smiles_list(["CCO"]).process_all()
mols = proc.mols
print(len(mols))
clear_records()#

Remove all loaded records from the instance.

This resets the processing state but does not delete any files already written to disk.

Returns:

The current instance for method chaining.

Return type:

LigandPrep

Example#

proc = LigandPrep().from_smiles_list(["CCO"])
proc.clear_records()
print(len(proc))

Receptor preparation#

ReceptorPrep orchestration using small helpers (minimizers + converters).

Key behaviour:

  • use_meeko=True by default.

  • The Meeko executable name is fixed internally as "mk_prepare_receptor.py" and is not provided by callers.

  • prep(…) is the main orchestration method (previously fix_and_minimize_pdb).

  • OpenMM minimization is attempted first; on failure, the OpenBabel minimizer is used as a fallback.

  • If the fallback to OpenBabel minimization occurs, OpenBabel is also preferred for conversions.

  • If Meeko conversion fails, conversion falls back to OpenBabel.

  • The produced receptor PDBQT is validated and sanitized to improve downstream docking robustness.

Note: By default, prep appends a "_prep" suffix to output basenames to avoid in-place overwrites. Set add_prep_suffix=False to disable that behavior.

class ReceptorPrep(use_meeko=True, enable_logging=False)#

Bases: ReprMixin

High-level receptor preprocessor.

Parameters:
  • use_meeko (bool) – If True, attempt to use Meeko for receptor PDBQT conversion first.

  • enable_logging (bool) – Enable console logging for the instance.

Notes#

The Meeko executable is a fixed internal constant: "mk_prepare_receptor.py".

enable_console_logging(level=logging.DEBUG)#

Enable console logging for this ReceptorPrep instance.

Parameters:

level (int) – Logging level.

Return type:

None

toggle_meeko(on_off)#

Enable or disable use of Meeko for conversions.

Parameters:

on_off (bool) – True to enable Meeko, False to disable it.

Return type:

None

property use_meeko: bool#

Whether Meeko is enabled for this instance.

property mekoo_cmd: str#

Internal Meeko command (fixed): returns "mk_prepare_receptor.py".

property final_artifact: Path | None#

Path to the final artifact produced by the last run (or None).

property last_simulation_report: Dict[str, Any] | None#

Last simulation report dictionary (or None if none).

property used_obabel: bool#

True if the last run used OpenBabel as fallback for minimization/conversion.

property minimized_stage: str | None#

Which minimization stage succeeded (‘gas’, ‘solvent’, ‘obabel’, etc.).

property conversion_backend: str | None#

Backend used for final conversion (‘meeko’, ‘obabel’, or None).

property conversion_fallback: bool#

True if conversion required fallback from preferred backend.

to_dict()#

Return a copy of the last_simulation_report (or None).

Returns:

Shallow copy of the report, or None.

Return type:

dict or None

save_report(path, *, indent=2)#

Save last_simulation_report as JSON.

Parameters:
  • path (str or pathlib.Path) – Destination file path.

  • indent (int) – JSON indent level.

Returns:

Path of the written file.

Return type:

pathlib.Path

Raises:

RuntimeError – If there is no report to save.

property expected_output_path: Path | None#

Return the path where the final artifact will be written for the most recent prep() parameters, or the actual final artifact if a run completed.

expected_output_for(
input_pdb,
output_dir,
out_fmt='pdb',
add_prep_suffix=True,
basename=None,
)#

Compute the expected output Path for the provided arguments without changing instance state.

Return type:

Path
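
One plausible way to derive such a path from the arguments in the signature is sketched below. The naming scheme (input stem, optional "_prep" suffix, out_fmt as the file extension) is an assumption made for illustration, and expected_output is a hypothetical stand-in rather than the method itself.

```python
from pathlib import Path
from typing import Optional, Union


def expected_output(
    input_pdb: Union[str, Path],
    output_dir: Union[str, Path],
    out_fmt: str = "pdb",
    add_prep_suffix: bool = True,
    basename: Optional[str] = None,
) -> Path:
    """Build an output path without touching the filesystem (assumed layout)."""
    # Use the explicit basename if given, otherwise the input file name without extension.
    stem = basename if basename is not None else Path(input_pdb).stem
    if add_prep_suffix:
        stem += "_prep"  # avoids overwriting the input when directories coincide
    return Path(output_dir) / f"{stem}.{out_fmt}"
```

Because the function only manipulates path objects, it can be called before a run to check for collisions with existing files.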

prep(
input_pdb,
output_dir,
out_fmt='pdb',
energy_diff=10.0,
max_minimization_steps=5000,
start_at=1,
ion_conc=0.15,
cofactors=None,
minimize_in_water=False,
backbone_k_kcal_per_A2=5.0,
enable_logging=False,
obabel_steps=500,
obabel_convert_args=None,
add_prep_suffix=True,
)#

High-level orchestration for preparing a receptor.

Behaviour summary:

  • run PDBFixer

  • attempt OpenMM minimization

  • if OpenMM fails, fall back to OpenBabel minimization

  • if out_fmt == 'pdbqt':

      • if the minimization fallback was used, use OpenBabel conversion

      • otherwise, if use_meeko=True, try Meeko conversion first

      • if Meeko conversion fails, fall back to OpenBabel conversion

  • receptor PDBQT is sanitized and validated for downstream docking

  • run PyMOL postprocessing for PDB artifacts

Raises:

RuntimeError – If minimization fails, or if out_fmt='pdbqt' and no valid PDBQT is produced.

Parameters:
  • input_pdb (str)

  • output_dir (str)

  • out_fmt (str)

  • energy_diff (float)

  • max_minimization_steps (int)

  • start_at (int)

  • ion_conc (float)

  • cofactors (List[str] | None)

  • minimize_in_water (bool)

  • backbone_k_kcal_per_A2 (float)

  • enable_logging (bool)

  • obabel_steps (int)

  • obabel_convert_args (List[str] | None)

  • add_prep_suffix (bool)

Return type:

ReceptorPrep
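
The fallback behaviour of prep can be summarised as "try backends in order, keep the first that succeeds". The sketch below illustrates that control flow with plain callables. run_with_fallback and conversion_order are hypothetical helpers, no real OpenMM, Meeko, or OpenBabel calls are made, and the choice of OpenBabel alone when use_meeko=False is an assumption.

```python
from typing import Callable, List, Sequence, Tuple


def run_with_fallback(steps: Sequence[Tuple[str, Callable[[], str]]]) -> Tuple[str, str]:
    """Run steps in order and return (name, result) of the first that succeeds."""
    errors: List[str] = []
    for name, step in steps:
        try:
            return name, step()
        except Exception as exc:  # remember the failure and try the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def conversion_order(used_obabel_minimizer: bool, use_meeko: bool) -> List[str]:
    """Choose the PDBQT conversion backends to try, in order."""
    if used_obabel_minimizer or not use_meeko:
        # After an OpenBabel minimization fallback, OpenBabel is preferred for conversion.
        return ["obabel"]
    return ["meeko", "obabel"]
```

Recording which backend finally succeeded corresponds to what properties such as minimized_stage and conversion_backend report after a run.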