Postprocess API

Extraction

crawl_scores(
roots,
include_logs=None,
include_tables=None,
engine_hint=None,
*,
layout='auto',
recursive=True,
)

Discover and parse docking outputs under one or more filesystem roots.

Supported layouts are:

  • auto: generic crawling with best-effort engine detection

  • single_file: explicit log file parsing with required engine

  • flat_dir: log directory parsing with required engine

  • engine_tree: high-level directory whose immediate child folders are engine names

Parameters:
  • roots (Sequence[pathlib.Path | str]) – Files or directories to inspect.

  • include_logs (Sequence[str] | None) – Optional log-file glob patterns used only in layout="auto".

  • include_tables (Sequence[str] | None) – Optional table-file glob patterns used only in layout="auto".

  • engine_hint (str | None) – Engine required for single_file and flat_dir. In auto mode, it acts as a fallback hint.

  • layout (Literal["auto", "single_file", "flat_dir", "engine_tree"]) – Extraction layout mode.

  • recursive (bool) – Whether directory-based layouts should recurse into nested folders.

Returns:

Combined parsed dataframe, or None when no parsable data is found.

Return type:

pandas.DataFrame | None

Example

df1 = crawl_scores(
    roots=["logs/erlotinib.log"],
    engine_hint="vina",
    layout="single_file",
)

df2 = crawl_scores(
    roots=["logs/smina_run"],
    engine_hint="smina",
    layout="flat_dir",
    recursive=True,
)

df3 = crawl_scores(
    roots=["logs"],
    layout="engine_tree",
)

df4 = crawl_scores(
    roots=["results"],
    layout="auto",
)
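The engine_tree layout can be pictured with a small standalone walker. The sketch below is a hypothetical helper (iter_engine_tree is not part of this API) that assumes log files are matched by .log and .txt globs, as described for extract_logs_dir(), and that engine names are normalized to lowercase. crawl_scores() may differ in detail.

```python
from pathlib import Path


def iter_engine_tree(root, patterns=("*.log", "*.txt"), recursive=True):
    """Yield (engine, log_path) pairs from an engine_tree-style directory.

    Hypothetical illustration: every immediate child folder of ``root`` is
    treated as an engine name, and log files inside it are matched with the
    glob ``patterns`` (an assumption, not ProDock's actual pattern list).
    """
    root = Path(root)
    for engine_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Assumed normalization: engine names are reported in lowercase.
        engine = engine_dir.name.lower()
        for pattern in patterns:
            # rglob descends into nested folders; glob stays at one level.
            matches = engine_dir.rglob(pattern) if recursive else engine_dir.glob(pattern)
            for path in sorted(matches):
                if path.is_file():
                    yield engine, path
```

With recursive=False, only files sitting directly inside each engine folder are picked up, which mirrors the documented meaning of recursive.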
class Extractor(
include_logs=None,
include_tables=None,
match_mode='substring',
crawl_func=None,
engine_map=None,
)

Bases: object

High-level score extractor with layout-aware helpers.

The extractor wraps crawl_scores() and provides convenience methods for engine filtering, engine listing, and explicit layout-based extraction.

Parameters:
  • include_logs (Sequence[str] | None) – Optional custom log-file glob patterns.

  • include_tables (Sequence[str] | None) – Optional custom table-file glob patterns.

  • match_mode (str) – Matching mode used when filtering extracted rows by engine, such as "substring" (the default) or "exact".

  • crawl_func (Callable | None) – Optional custom crawl function used instead of crawl_scores().

  • engine_map (dict | None) – Optional mapping from logical engine groups to concrete engine tokens.

Example

extractor = Extractor(
    match_mode="exact",
    engine_map={"vina-family": ["vina", "vina-gpu", "qvina", "qvina-gpu"]},
)
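The difference between match_mode="substring" and match_mode="exact", and the role of engine_map, can be illustrated with a small standalone predicate. This is a hypothetical sketch of plausible semantics, not Extractor's actual code: it assumes substring mode checks whether a filter token occurs inside the engine name, and that engine_map keys are expanded into their concrete tokens before matching.

```python
def engine_matches(engine, filters, match_mode="substring", engine_map=None):
    """Return True if ``engine`` passes any of ``filters`` (hypothetical helper)."""
    engine = engine.lower()
    engine_map = engine_map or {}
    tokens = []
    for f in filters:
        # Expand a logical group (e.g. "vina-family") into concrete engine tokens;
        # plain engine names pass through unchanged.
        tokens.extend(engine_map.get(f, [f]))
    tokens = [t.lower() for t in tokens]
    if match_mode == "exact":
        return engine in tokens
    if match_mode == "substring":
        return any(t in engine for t in tokens)
    raise ValueError(f"unknown match_mode: {match_mode!r}")
```

Under these assumptions, substring matching with "vina" would also keep "qvina" and "vina-gpu" rows, which is one reason to pair "exact" with an explicit engine_map, as in the example above.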
extract_scores(
roots,
engines=None,
engine_hint=None,
*,
layout='auto',
recursive=True,
)

Extract scores and optionally filter the results by engine.

Parameters:
  • roots (Sequence[str | pathlib.Path]) – Files or directories to inspect.

  • engines (Iterable[str] | None) – Optional iterable of engine filters.

  • engine_hint (str | None) – Optional engine hint used during parsing.

  • layout (LayoutMode) – Extraction layout mode.

  • recursive (bool) – Whether directory-based layouts should recurse into nested folders.

Returns:

Extracted dataframe, optionally filtered by engine, or None when no data is found.

Return type:

pandas.DataFrame | None

Example

extractor = Extractor(match_mode="exact")

df = extractor.extract_scores(
    roots=["logs"],
    engines=["vina", "smina"],
    layout="engine_tree",
)
list_engines(roots, engine_hint=None, *, layout='auto', recursive=True)

List unique engine names discovered under the given roots.

Parameters:
  • roots (Sequence[str | pathlib.Path]) – Files or directories to inspect.

  • engine_hint (str | None) – Optional engine hint used during parsing.

  • layout (LayoutMode) – Extraction layout mode.

  • recursive (bool) – Whether directory-based layouts should recurse into nested folders.

Returns:

Set of lowercased engine names found in the extracted data.

Return type:

set[str]

Example

extractor = Extractor()
engines = extractor.list_engines(
    roots=["logs"],
    layout="engine_tree",
)
extract_log_file(path, *, engine)

Extract scores from a single explicit log file.

Parameters:
  • path (str | pathlib.Path) – Path to a .log or .txt file.

  • engine (str) – Required engine name for the file.

Returns:

Extracted dataframe or None when no rows are found.

Return type:

pandas.DataFrame | None

Example

extractor = Extractor()
df = extractor.extract_log_file(
    "logs/erlotinib.log",
    engine="vina",
)
extract_logs_dir(path, *, engine, recursive=True)

Extract scores from a directory of log files belonging to one engine.

Parameters:
  • path (str | pathlib.Path) – Directory containing .log or .txt files.

  • engine (str) – Required engine name applied to all discovered files.

  • recursive (bool) – Whether nested subdirectories should also be searched.

Returns:

Extracted dataframe or None when no rows are found.

Return type:

pandas.DataFrame | None

Example

extractor = Extractor()
df = extractor.extract_logs_dir(
    "logs/smina_run",
    engine="smina",
    recursive=True,
)
extract_engine_folders(path, *, recursive=True)

Extract scores from a high-level directory whose immediate subfolders are engine names.

Parameters:
  • path (str | pathlib.Path) – High-level logs directory.

  • recursive (bool) – Whether nested subdirectories inside each engine folder should also be searched.

Returns:

Extracted dataframe or None when no rows are found.

Return type:

pandas.DataFrame | None

Example

extractor = Extractor()
df = extractor.extract_engine_folders(
    "logs",
    recursive=True,
)
extract_scores(roots, engines=None, engine_hint=None, *, layout='auto', recursive=True)

Extract scores using the default Extractor instance.

Parameters:
  • roots (Sequence[str | pathlib.Path]) – Files or directories to inspect.

  • engines (Iterable[str] | None) – Optional iterable of engine filters.

  • engine_hint (str | None) – Optional engine hint used during parsing.

  • layout (LayoutMode) – Extraction layout mode.

  • recursive (bool) – Whether directory-based layouts should recurse into nested folders.

Returns:

Extracted dataframe or None when no data is found.

Return type:

pandas.DataFrame | None

Example

df = extract_scores(
    roots=["logs"],
    layout="engine_tree",
)
list_engines(roots, engine_hint=None, *, layout='auto', recursive=True)

List unique engine names using the default Extractor instance.

Parameters:
  • roots (Sequence[str | pathlib.Path]) – Files or directories to inspect.

  • engine_hint (str | None) – Optional engine hint used during parsing.

  • layout (LayoutMode) – Extraction layout mode.

  • recursive (bool) – Whether directory-based layouts should recurse into nested folders.

Returns:

Set of lowercased engine names.

Return type:

set[str]

Example

engines = list_engines(
    roots=["logs"],
    layout="engine_tree",
)
extract_log_file(path, *, engine)

Extract scores from a single explicit log file using the default extractor.

Parameters:
  • path (str | pathlib.Path) – Path to a .log or .txt file.

  • engine (str) – Required engine name for the file.

Returns:

Extracted dataframe or None when no rows are found.

Return type:

pandas.DataFrame | None

Example

df = extract_log_file(
    "logs/erlotinib.log",
    engine="vina",
)
extract_logs_dir(path, *, engine, recursive=True)

Extract scores from a flat log directory using the default extractor.

Parameters:
  • path (str | pathlib.Path) – Directory containing .log or .txt files.

  • engine (str) – Required engine name applied to all discovered files.

  • recursive (bool) – Whether nested subdirectories should also be searched.

Returns:

Extracted dataframe or None when no rows are found.

Return type:

pandas.DataFrame | None

Example

df = extract_logs_dir(
    "logs/smina_run",
    engine="smina",
    recursive=True,
)
extract_engine_folders(path, *, recursive=True)

Extract scores from a high-level directory whose immediate subfolders are engine names.

Parameters:
  • path (str | pathlib.Path) – High-level logs directory.

  • recursive (bool) – Whether nested subdirectories inside each engine folder should also be searched.

Returns:

Extracted dataframe or None when no rows are found.

Return type:

pandas.DataFrame | None

Example

df = extract_engine_folders(
    "logs",
    recursive=True,
)
parse_log_text(text, engine=None, regex=None)

Parse docking log text using built-in or custom regex rules.

The parser first resolves the engine name using canonicalize_engine_name() and, if needed, automatic engine detection. When regex is provided, a custom row pattern is tried first. If custom parsing yields rows, those rows are returned immediately. Otherwise, the built-in engine-specific parsers are used.

For GNINA logs, the parser first tries the GNINA table parser and falls back to the Vina-family parser if no GNINA rows are found. This fallback exists because some GNINA outputs also contain Vina-like score tables.

Parameters:
  • text (str) – Raw docking log text to parse.

  • engine (Optional[str]) – Optional engine name such as "vina", "smina", "qvina", or "gnina". If omitted, the engine is inferred automatically from the input text when possible.

  • regex (Optional[dict[str, str]]) – Optional mapping of custom regex patterns. Supported keys are "vina_row" and "gnina_row". The selected pattern must expose four capture groups in the same order as the built-in parser expects.

Returns:

Parsed docking rows. For Vina-family logs, rows contain mode, affinity_kcal_mol, rmsd_lb, and rmsd_ub. For GNINA logs, rows contain mode, affinity_kcal_mol, cnn_pose, and cnn_affinity.

Return type:

list[dict]

Example

text = '''
-----+------------+----------+----------
   1       -7.5      0.000      0.000
   2       -7.1      1.200      2.400
'''
rows = parse_log_text(text, engine="vina")

Example

custom = {
    "vina_row": r"^\s*(\d+)\s+([-+]?\d*\.?\d+)\s+([-+]?\d*\.?\d+)\s+([-+]?\d*\.?\d+)$"
}
rows = parse_log_text(text, engine="vina", regex=custom)
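Before passing a custom pattern to parse_log_text(), it can help to check it against sample lines with Python's re module. The sketch below is a standalone check, not part of the API (preview_rows is a hypothetical name), and it assumes the parser applies the pattern line by line, which the anchored ^...$ pattern suggests but the reference above does not state.

```python
import re

# Custom row pattern from the example above: four capture groups in the order
# mode, affinity, and the two remaining numeric columns.
VINA_ROW = r"^\s*(\d+)\s+([-+]?\d*\.?\d+)\s+([-+]?\d*\.?\d+)\s+([-+]?\d*\.?\d+)$"


def preview_rows(text, pattern=VINA_ROW):
    """Apply ``pattern`` to each line and return the captured values as tuples."""
    compiled = re.compile(pattern)
    # The parser expects exactly four capture groups; fail early otherwise.
    if compiled.groups != 4:
        raise ValueError(f"expected 4 capture groups, got {compiled.groups}")
    rows = []
    for line in text.splitlines():
        m = compiled.match(line)
        if m:
            mode, affinity, col3, col4 = m.groups()
            rows.append((int(mode), float(affinity), float(col3), float(col4)))
    return rows
```

Header and separator lines such as -----+------------ do not match the pattern and are skipped, so only score rows are returned.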

Pose processing

class PoseCrawler(roots, *, engine=None, recursive=True, docked_suffix='_docked.pdbqt')

Bases: object

High-level helper for discovering, summarizing, converting, and loading docked poses.

This class provides a compact interface over the lower-level pose utilities for:

  • discovering pose files

  • building PoseRecord entries

  • converting records into DataFrames

  • loading RDKit molecules

  • selecting the best-scoring pose per group

  • converting discovered .pdbqt files to .sdf

Supported input layouts

  1. A direct path to one .pdbqt file, with engine specified.

  2. A direct path to a flat folder of .pdbqt files, with engine specified.

  3. A higher-level ProDock tree such as <root>/<receptor>/results/docked/<engine>/*.pdbqt, where receptor id and engine are inferred automatically.
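For the third layout, receptor id and engine come from fixed positions in the path. The hypothetical helper below (infer_pose_ids is not a ProDock function) shows one way to read them with pathlib; deriving ligand_id by stripping the docked suffix from the filename is an additional assumption.

```python
from pathlib import PurePath


def infer_pose_ids(path, docked_suffix="_docked.pdbqt"):
    """Infer (receptor_id, engine, ligand_id) from a ProDock-style pose path.

    Assumes the layout <root>/<receptor>/results/docked/<engine>/<file>.
    """
    parts = PurePath(path).parts
    # parts[-4] and parts[-3] must be the fixed "results/docked" segments.
    if len(parts) < 5 or parts[-4:-2] != ("results", "docked"):
        raise ValueError(f"not a <receptor>/results/docked/<engine>/ path: {path}")
    receptor_id, engine, filename = parts[-5], parts[-2], parts[-1]
    # Hypothetical rule: the ligand name is the filename minus the docked suffix.
    if filename.endswith(docked_suffix):
        ligand_id = filename[: -len(docked_suffix)]
    else:
        ligand_id = PurePath(filename).stem
    return receptor_id, engine, ligand_id
```

For example, infer_pose_ids("Data/testcase/post/1M17/results/docked/vina/erlotinib_docked.pdbqt") gives ("1M17", "vina", "erlotinib").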

Important

When a root is a directory, only files whose names end with docked_suffix (default "_docked.pdbqt") are retained. This prevents receptor preparation files such as filtered_protein/4WKQ.pdbqt from being treated as docked ligand poses.

Direct file inputs are not filtered by suffix. This preserves the original direct-file behavior.

Parameters:
  • roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.

  • engine (Optional[str]) – Optional engine hint for direct-file or flat-directory inputs, or an optional filter for hierarchical ProDock trees.

  • recursive (bool) – Whether nested directories should be searched recursively.

  • docked_suffix (str) – Required filename suffix applied only to records discovered from directory roots. Default is "_docked.pdbqt".

Example

from prodock.postprocess.pose.core import PoseCrawler

crawler = PoseCrawler(["Data/testcase/post"])

df = crawler.crawl()
best_df = crawler.best()

mol_df = crawler.crawl_mols(save_sdf=True)
best_mol_df = crawler.best_mols()

sdf_paths = crawler.convert(out_dir="Data/testcase/post/converted_sdf")

A direct single-file workflow is also supported:

crawler = PoseCrawler(
    ["Data/testcase/post/1M17/results/docked/vina/erlotinib_docked.pdbqt"],
    engine="vina",
)
df = crawler.crawl()

Typical single-file inputs include:

"Data/testcase/post/1M17/results/docked/qvina/erlotinib_docked.pdbqt"
"Data/testcase/post/1M17/results/docked/smina/erlotinib_docked.pdbqt"
"Data/testcase/post/1M17/results/docked/vina/erlotinib_docked.pdbqt"
records()

Return discovered pose records.

This method delegates to prodock.postprocess.pose.io.build_pose_records() using the crawler configuration captured at initialization, then filters directory-derived records so that only *_docked.pdbqt files are retained.

Returns:

Discovered pose records.

Return type:

list[prodock.postprocess.pose.model.PoseRecord]

Example

crawler = PoseCrawler(["Data/testcase/post"])
records = crawler.records()
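The directory-versus-file filtering rule applied by records() can be sketched with pathlib alone. discover_pose_files below is hypothetical and only illustrates the documented behavior: files found under directory roots must end with docked_suffix, while direct file roots are kept as given.

```python
from pathlib import Path


def discover_pose_files(roots, docked_suffix="_docked.pdbqt", recursive=True):
    """Collect .pdbqt files, filtering only directory-derived hits by suffix."""
    found = []
    for root in map(Path, roots):
        if root.is_file():
            # Direct file input: kept without any suffix check.
            found.append(root)
            continue
        candidates = root.rglob("*.pdbqt") if recursive else root.glob("*.pdbqt")
        # Directory input: drop receptor files such as filtered_protein/4WKQ.pdbqt.
        found.extend(sorted(p for p in candidates if p.name.endswith(docked_suffix)))
    return found
```

Under this rule, passing a receptor .pdbqt file directly still returns it, whereas the same file found while crawling a directory is skipped.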
crawl()

Return discovered pose records as a standardized DataFrame.

The returned table uses the public pose schema:

  • receptor_id

  • ligand_id

  • engine

  • pose_rank

  • affinity

Returns:

Pose summary table.

Return type:

pandas.DataFrame

Example

crawler = PoseCrawler(["Data/testcase/post"])
df = crawler.crawl()
crawl_mols(
*,
backend='auto',
sanitize=True,
remove_hs=False,
save_sdf=False,
overwrite_sdf=False,
)

Return a DataFrame containing pose metadata and RDKit molecules.

This method loads molecules from the discovered pose files and returns a standardized table with the public pose-plus-molecule schema:

  • receptor_id

  • ligand_id

  • engine

  • pose_rank

  • affinity

  • mol

For directory roots, only records whose source files end with *_docked.pdbqt are processed.

Parameters:
  • backend (str) – Conversion backend used during PDBQT-to-SDF conversion.

  • sanitize (bool) – Whether imported RDKit molecules should be sanitized.

  • remove_hs (bool) – Whether hydrogens should be removed during SDF import.

  • save_sdf (bool) – Whether to also write an SDF file beside each source .pdbqt file.

  • overwrite_sdf (bool) – Whether an existing neighboring SDF file may be overwritten.

Returns:

Pose table with RDKit molecules.

Return type:

pandas.DataFrame

Example

crawler = PoseCrawler(
    ["Data/testcase/post/1M17/results/docked/smina/erlotinib_docked.pdbqt"],
    engine="smina",
)
mol_df = crawler.crawl_mols(save_sdf=True)
best(*, by=('receptor_id', 'ligand_id', 'engine'))

Return best-scoring pose rows per group.

Lower affinity is treated as better. By default, one best row is selected for each (receptor_id, ligand_id, engine) group.

Parameters:
  • by (Sequence[str]) – Grouping columns that define independent selection groups.

Returns:

Best-scoring rows per group.

Return type:

pandas.DataFrame

Example

crawler = PoseCrawler(["Data/testcase/post"])
best_df = crawler.best()
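The selection rule (lower affinity wins within each group) can be written without any ProDock code. The sketch below operates on plain dicts that use the pose-schema keys; best_rows is a hypothetical name, and keeping the first row on ties is an assumption, since the tie-breaking behavior of best() is not documented.

```python
def best_rows(rows, by=("receptor_id", "ligand_id", "engine")):
    """Keep the lowest-affinity row for each group key (hypothetical helper)."""
    best = {}
    for row in rows:
        key = tuple(row[col] for col in by)
        # Lower affinity (more negative kcal/mol) is treated as better;
        # on ties, the first row seen is kept.
        if key not in best or row["affinity"] < best[key]["affinity"]:
            best[key] = row
    return list(best.values())
```

With a pandas DataFrame, an equivalent selection is df.loc[df.groupby(list(by))["affinity"].idxmin()].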
best_mols(
*,
by=('receptor_id', 'ligand_id', 'engine'),
backend='obabel',
sanitize=True,
remove_hs=False,
save_sdf=False,
overwrite_sdf=False,
)

Return best-scoring pose rows per group, including RDKit molecules.

This method first builds a pose-plus-molecule DataFrame and then applies best-pose selection on top of it.

Parameters:
  • by (Sequence[str]) – Grouping columns that define independent selection groups.

  • backend (str) – Conversion backend used during PDBQT-to-SDF conversion.

  • sanitize (bool) – Whether imported RDKit molecules should be sanitized.

  • remove_hs (bool) – Whether hydrogens should be removed during SDF import.

  • save_sdf (bool) – Whether to also write an SDF file beside each source .pdbqt file.

  • overwrite_sdf (bool) – Whether an existing neighboring SDF file may be overwritten.

Returns:

Best-scoring rows with molecule objects.

Return type:

pandas.DataFrame

Example

crawler = PoseCrawler(["Data/testcase/post"])
best_mol_df = crawler.best_mols(save_sdf=False)
convert(*, backend='obabel', overwrite=False, out_dir=None)

Convert discovered PDBQT pose files into SDF files.

When out_dir is omitted, each SDF file is written beside its source .pdbqt file. When out_dir is provided, converted files are written into that shared destination directory.

For directory roots, only files ending with *_docked.pdbqt are converted. Direct file inputs are converted regardless of their suffix.

Parameters:
  • backend (str) – Conversion backend used for PDBQT-to-SDF conversion.

  • overwrite (bool) – Whether existing output files may be overwritten.

  • out_dir (Optional[str | pathlib.Path]) – Optional shared output directory. When omitted, SDF files are saved beside the source .pdbqt files.

Returns:

Written or reused SDF paths.

Return type:

list[pathlib.Path]

Example

crawler = PoseCrawler(["Data/testcase/post"])
sdf_paths = crawler.convert(
    out_dir="Data/testcase/post/converted_sdf",
    overwrite=True,
)
crawl_poses(roots, *, engine=None, recursive=True, docked_suffix='_docked.pdbqt')

Convenience wrapper around PoseCrawler.crawl().

Parameters:
  • roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.

  • engine (Optional[str]) – Optional engine hint or filter.

  • recursive (bool) – Whether nested directories should be searched recursively.

  • docked_suffix (str) – Required filename suffix applied only to records discovered from directory roots.

Returns:

Standardized pose summary table.

Return type:

pandas.DataFrame

Example

df = crawl_poses(["Data/testcase/post"])
crawl_pose_mols(
roots,
*,
engine=None,
recursive=True,
docked_suffix='_docked.pdbqt',
backend='obabel',
sanitize=True,
remove_hs=False,
save_sdf=False,
overwrite_sdf=False,
)

Convenience wrapper around PoseCrawler.crawl_mols().

Parameters:
  • roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.

  • engine (Optional[str]) – Optional engine hint or filter.

  • recursive (bool) – Whether nested directories should be searched recursively.

  • docked_suffix (str) – Required filename suffix applied only to records discovered from directory roots.

  • backend (str) – Conversion backend used during PDBQT-to-SDF conversion.

  • sanitize (bool) – Whether imported RDKit molecules should be sanitized.

  • remove_hs (bool) – Whether hydrogens should be removed during SDF import.

  • save_sdf (bool) – Whether to also write an SDF beside each source .pdbqt file.

  • overwrite_sdf (bool) – Whether an existing neighboring SDF file may be overwritten.

Returns:

Standardized pose-plus-molecule table.

Return type:

pandas.DataFrame

Example

mol_df = crawl_pose_mols(
    ["Data/testcase/post/1M17/results/docked/qvina/erlotinib_docked.pdbqt"],
    engine="qvina",
    save_sdf=True,
)
select_best_poses(
roots,
*,
engine=None,
recursive=True,
docked_suffix='_docked.pdbqt',
by=('receptor_id', 'ligand_id', 'engine'),
)

Convenience wrapper around PoseCrawler.best().

Parameters:
  • roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.

  • engine (Optional[str]) – Optional engine hint or filter.

  • recursive (bool) – Whether nested directories should be searched recursively.

  • docked_suffix (str) – Required filename suffix applied only to records discovered from directory roots.

  • by (Sequence[str]) – Grouping columns that define independent selection groups.

Returns:

Best-scoring pose rows per group.

Return type:

pandas.DataFrame

Example

best_df = select_best_poses(["Data/testcase/post"])
select_best_pose_mols(
roots,
*,
engine=None,
recursive=True,
docked_suffix='_docked.pdbqt',
by=('receptor_id', 'ligand_id', 'engine'),
backend='obabel',
sanitize=True,
remove_hs=False,
save_sdf=False,
overwrite_sdf=False,
)

Convenience wrapper around PoseCrawler.best_mols().

Parameters:
  • roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.

  • engine (Optional[str]) – Optional engine hint or filter.

  • recursive (bool) – Whether nested directories should be searched recursively.

  • docked_suffix (str) – Required filename suffix applied only to records discovered from directory roots.

  • by (Sequence[str]) – Grouping columns that define independent selection groups.

  • backend (str) – Conversion backend used during PDBQT-to-SDF conversion.

  • sanitize (bool) – Whether imported RDKit molecules should be sanitized.

  • remove_hs (bool) – Whether hydrogens should be removed during SDF import.

  • save_sdf (bool) – Whether to also write an SDF beside each source .pdbqt file.

  • overwrite_sdf (bool) – Whether an existing neighboring SDF file may be overwritten.

Returns:

Best-scoring pose rows with molecule objects.

Return type:

pandas.DataFrame

Example

best_mol_df = select_best_pose_mols(
    ["Data/testcase/post"],
    save_sdf=False,
)
save_pose_sdf(pdbqt_file, *, backend='obabel', overwrite=False, out_file=None)

Convert a docked .pdbqt pose file to .sdf and save it on disk.

By default, the output SDF is written next to the input file using the same file stem. An explicit output path may also be supplied via out_file.

If the destination file already exists and overwrite is False, the existing path is returned without performing a new conversion.

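Parameters:
  • pdbqt_file (str | pathlib.Path) – Input docked .pdbqt pose file.

  • backend (str) – Conversion backend used for PDBQT-to-SDF conversion.

  • overwrite (bool) – Whether an existing destination file may be overwritten. When False, an existing file is reused without conversion.

  • out_file (Optional[str | pathlib.Path]) – Optional explicit output path. When omitted, the SDF is written next to the input file using the same stem.
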
Returns:

Path to the written or reused SDF file.

Return type:

pathlib.Path

Example

sdf_path = save_pose_sdf(
    "Data/testcase/post/1M17/results/docked/qvina/erlotinib_docked.pdbqt",
    backend="obabel",
    overwrite=True,
)
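The output-path and reuse rules described above can be sketched with pathlib. resolve_sdf_target is a hypothetical helper that only decides where the SDF goes and whether a conversion is needed; the conversion itself is out of scope here.

```python
from pathlib import Path


def resolve_sdf_target(pdbqt_file, out_file=None, overwrite=False):
    """Return (target_path, needs_conversion) for a .pdbqt -> .sdf conversion."""
    src = Path(pdbqt_file)
    # Default: same directory and stem as the input, with an .sdf extension.
    target = Path(out_file) if out_file is not None else src.with_suffix(".sdf")
    # An existing target is reused unless overwrite=True.
    needs_conversion = overwrite or not target.exists()
    return target, needs_conversion
```

For erlotinib_docked.pdbqt, the default target is erlotinib_docked.sdf in the same folder.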
pdbqt_to_rdkit_mols(pdbqt_file, *, backend='auto', sanitize=True, remove_hs=False)

Convert a docked .pdbqt file into RDKit molecules via a temporary SDF.

The function first converts the input .pdbqt file into a temporary SDF file using prodock.structure.conversion.pdbqt_to_sdf(), then loads the molecules with rdkit.Chem.SDMolSupplier.

Invalid molecules returned as None by the supplier are discarded.

Parameters:
  • pdbqt_file (str | pathlib.Path) – Input docked .pdbqt file.

  • backend (str) – Conversion backend passed to the PDBQT-to-SDF converter.

  • sanitize (bool) – Whether RDKit sanitization should be applied while reading the temporary SDF.

  • remove_hs (bool) – Whether hydrogens should be removed during SDF import.

Returns:

List of successfully loaded RDKit molecule objects.

Return type:

list[rdkit.Chem.Mol]

Example

mols = pdbqt_to_rdkit_mols(
    "Data/testcase/post/1M17/results/docked/smina/erlotinib_docked.pdbqt",
    sanitize=True,
    remove_hs=False,
)
convert_pose_tree(
roots,
*,
engine=None,
recursive=True,
backend='obabel',
overwrite=False,
out_dir=None,
)

Convert discovered .pdbqt pose files into .sdf files.

When out_dir is omitted, each output SDF is written beside its source .pdbqt file. When out_dir is provided, all SDF files are written into that shared destination directory.

If multiple input files share the same stem and a shared out_dir is used, unique filenames are generated by appending suffixes such as _2, _3, and so on.

Parameters:
  • roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.

  • engine (Optional[str]) – Optional engine filter applied to the parent directory name of discovered pose files.

  • recursive (bool) – Whether to recurse into nested directories during file discovery.

  • backend (str) – Conversion backend passed to the underlying PDBQT-to-SDF converter.

  • overwrite (bool) – Whether existing SDF files may be overwritten.

  • out_dir (Optional[str | pathlib.Path]) – Optional shared output directory for all converted SDF files.

Returns:

Paths to written or reused SDF files.

Return type:

list[pathlib.Path]

Example

outputs = convert_pose_tree(
    ["Data/testcase/post/1M17/results/docked"],
    engine="vina",
    recursive=True,
    out_dir="Data/testcase/post/converted_sdf",
)
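The _2, _3, ... naming scheme for a shared out_dir can be sketched as follows. unique_sdf_path is hypothetical, and it only tracks names assigned within one run; whether convert_pose_tree() also checks files already on disk is not stated above.

```python
from pathlib import Path


def unique_sdf_path(out_dir, stem, taken):
    """Return a non-colliding <stem>.sdf path inside ``out_dir``.

    ``taken`` is a set of paths already assigned during this conversion run;
    the returned path is added to it.
    """
    out_dir = Path(out_dir)
    candidate = out_dir / f"{stem}.sdf"
    n = 2
    # Append _2, _3, ... until the name is free.
    while candidate in taken:
        candidate = out_dir / f"{stem}_{n}.sdf"
        n += 1
    taken.add(candidate)
    return candidate
```

Converting three erlotinib_docked.pdbqt files from different engine folders into one out_dir would then produce erlotinib_docked.sdf, erlotinib_docked_2.sdf, and erlotinib_docked_3.sdf.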

Interaction analysis

class InteractionProfiler(
interactions=None,
parameters=None,
count=False,
vicinity_cutoff=6.0,
receptor_selection=None,
receptor_use_segid=None,
ligand_resname='LIG',
ligand_resnumber=1,
ligand_chain='',
ligand_use_segid=False,
sdf_sanitize=True,
receptor_guess_bonds=True,
receptor_vdwradii=None,
suppress_mdanalysis_warnings=True,
suppress_mdanalysis_info_logs=True,
progress=False,
n_jobs=1,
drop_empty=True,
)

Bases: object

High-level helper for protein-ligand interaction extraction using ProLIF.

This class stores all interaction-analysis settings in one place and exposes two main execution methods:

  • run() for one receptor plus one ligand source

  • run_pose_table() for automated pose-table workflows across one or multiple receptors

The class is designed for ProDock automation and notebook workflows where reproducible settings, pose-level summaries, and optional fingerprint vectors are useful.

Parameters:
  • interactions (Optional[Sequence[str]]) – Optional subset of ProLIF interaction names to enable. If None, ProLIF defaults are used.

  • parameters (Optional[Dict[str, Dict[str, Any]]]) – Optional parameter overrides passed directly to ProLIF interaction definitions.

  • count (bool) – Whether to generate count fingerprints instead of boolean fingerprints.

  • vicinity_cutoff (float) – Distance cutoff (in Å) used by ProLIF when automatically selecting nearby receptor residues.

  • receptor_selection (Optional[str]) – Optional MDAnalysis selection string for the receptor. If None, all atoms from the receptor structure are used.

  • receptor_use_segid (Optional[bool]) – Whether ProLIF should use segment id instead of chain id for receptor residue identifiers.

  • ligand_resname (str) – Default ligand residue name used when an RDKit molecule has no residue metadata.

  • ligand_resnumber (int) – Default ligand residue number used when an RDKit molecule has no residue metadata.

  • ligand_chain (str) – Default ligand chain id used when an RDKit molecule has no residue metadata.

  • ligand_use_segid (bool) – Whether ProLIF should use segment id instead of chain id for ligands.

  • sdf_sanitize (bool) – Whether RDKit should sanitize molecules when reading an SDF file.

  • receptor_guess_bonds (bool) – Whether to proactively guess receptor bond topology before ProLIF converts the receptor to RDKit.

  • receptor_vdwradii (Optional[Mapping[str, float]]) – Optional VdW radii mapping forwarded to MDAnalysis bond guessing.

  • suppress_mdanalysis_warnings (bool) – Whether to suppress known non-actionable MDAnalysis warnings.

  • suppress_mdanalysis_info_logs (bool) – Whether to suppress repeated MDAnalysis info log messages.

  • progress (bool) – Whether ProLIF should show a progress bar.

  • n_jobs (Optional[int]) – Number of parallel jobs used by ProLIF.

  • drop_empty (bool) – Whether to drop empty columns in the wide fingerprint table.

Example

Create a profiler and run interaction extraction for one SDF file:

from prodock.postprocess.interaction.core import InteractionProfiler

profiler = InteractionProfiler(
    count=False,
    vicinity_cutoff=6.0,
    progress=False,
    n_jobs=1,
)

result = profiler.run(
    receptor_pdb="Data/testcase/Multi/1M17/filtered_protein/1M17.pdb",
    ligands="Data/testcase/post/1M17/erlotinib.sdf",
)

print(result.fingerprint_df.head())
print(result.interaction_df.head())

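Example

Run pose-table automation for multiple receptors. run_pose_table() expects a pose table that includes a mol column, such as the one returned by PoseCrawler.crawl_mols():

from prodock.postprocess.interaction.core import InteractionProfiler
from prodock.postprocess.pose.core import PoseCrawler

df = PoseCrawler(["Data/testcase/post"]).crawl_mols()

profiler = InteractionProfiler(progress=False, n_jobs=1)

result = profiler.run_pose_table(
    poses=df,
    receptor_pdb_by_id={
        "1M17": "Data/testcase/Multi/1M17/filtered_protein/1M17.pdb",
        "4WKQ": "Data/testcase/Multi/4WKQ/filtered_protein/4WKQ.pdb",
    },
    batch_size=10,
    ultra_safe=False,  # batch_size only takes effect when ultra_safe is False
    include_interaction_events=True,
    include_bitvectors=False,
    include_countvectors=False,
    fail_fast=True,
)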

merged_df = result.merged_df
interaction_df = result.interaction_df
summary_df = result.summary_df
interactions: Sequence[str] | None = None
parameters: Dict[str, Dict[str, Any]] | None = None
count: bool = False
vicinity_cutoff: float = 6.0
receptor_selection: str | None = None
receptor_use_segid: bool | None = None
ligand_resname: str = 'LIG'
ligand_resnumber: int = 1
ligand_chain: str = ''
ligand_use_segid: bool = False
sdf_sanitize: bool = True
receptor_guess_bonds: bool = True
receptor_vdwradii: Mapping[str, float] | None = None
suppress_mdanalysis_warnings: bool = True
suppress_mdanalysis_info_logs: bool = True
progress: bool = False
n_jobs: int | None = 1
drop_empty: bool = True
available_interactions(show_hidden=False, show_bridged=False)

List interactions available in the installed ProLIF version.

Parameters:
  • show_hidden (bool) – Whether hidden interactions should be included.

  • show_bridged (bool) – Whether bridged interactions should be included.

Returns:

List of interaction names supported by the installed ProLIF version.

Return type:

list[str]

settings_snapshot()

Return a serializable snapshot of the current profiler settings.

Returns:

Serializable dictionary of profiler settings.

Return type:

Dict[str, Any]
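The name: type = default field listing above suggests the profiler is a dataclass, but that is an assumption. If it is, a serializable snapshot can be produced with dataclasses.asdict and verified with json.dumps. ProfilerSettings below is a hypothetical stand-in holding only a few of the documented fields.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Sequence


@dataclass
class ProfilerSettings:
    """Hypothetical stand-in with a subset of the documented profiler fields."""
    interactions: Optional[Sequence[str]] = None
    count: bool = False
    vicinity_cutoff: float = 6.0
    ligand_resname: str = "LIG"
    n_jobs: Optional[int] = 1


settings = ProfilerSettings(interactions=["HBDonor", "Hydrophobic"])
snapshot = asdict(settings)
# json.dumps succeeds only if every value in the snapshot is JSON-serializable.
print(json.dumps(snapshot, sort_keys=True))
```

Storing such a snapshot next to the results makes it possible to rerun an analysis with identical settings later.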

run(receptor_pdb, ligands, residues=None)

Extract protein-ligand interactions for one receptor and one ligand input source.

Parameters:
  • receptor_pdb (str | pathlib.Path) – Path to the receptor PDB file.

  • ligands (str | pathlib.Path | Any | Sequence[Any] | Iterable[Any] | Mapping[str, Any]) – Ligand input source.

  • residues (Optional[Sequence[str] | str]) – Optional residue subset passed to Fingerprint.run_from_iterable.

Returns:

Structured interaction extraction result.

Return type:

InteractionRunResult

run_pose_table(
poses,
receptor_pdb_by_id,
*,
receptor_col='receptor_id',
ligand_col='ligand_id',
engine_col='engine',
pose_rank_col='pose_rank',
affinity_col='affinity',
mol_col='mol',
pose_id_col=None,
residues=None,
batch_size=1,
include_fingerprint_columns=False,
include_interaction_events=True,
include_bitvectors=False,
include_countvectors=False,
fingerprint_prefix='ifp__',
gc_collect=True,
fail_fast=True,
ultra_safe=True,
)

Compute pose-centric interactions for a pose table.

Parameters:
  • poses (pandas.DataFrame) – Input pose table with at least receptor, ligand, engine, rank, affinity, and molecule columns.

  • receptor_pdb_by_id (Mapping[str, str | pathlib.Path]) – Mapping from receptor id to receptor PDB path.

  • receptor_col (str) – Column containing receptor identifiers.

  • ligand_col (str) – Column containing ligand identifiers.

  • engine_col (str) – Column containing engine identifiers.

  • pose_rank_col (str) – Column containing pose rank.

  • affinity_col (str) – Column containing affinity score.

  • mol_col (str) – Column containing RDKit molecules.

  • pose_id_col (Optional[str]) – Optional pre-existing pose id column.

  • residues (Optional[Sequence[str] | str]) – Optional ProLIF residue subset.

  • batch_size (int) – Number of poses to process together when ultra_safe is False.

  • include_fingerprint_columns (bool) – Retained for API compatibility.

  • include_interaction_events (bool) – Whether to compute and store raw event payloads.

  • include_bitvectors (bool) – Whether to collect ProLIF bitvectors aligned to pose order.

  • include_countvectors (bool) – Whether to collect ProLIF countvectors aligned to pose order.

  • fingerprint_prefix (str) – Retained for API compatibility.

  • gc_collect (bool) – Whether to call garbage collection between batches.

  • fail_fast (bool) – Whether to stop immediately on the first failing batch.

  • ultra_safe (bool) – Whether to force one-pose-at-a-time processing.

Returns:

Pose-centric interaction result.

Return type:

PoseInteractionTableResult
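The interplay of batch_size and ultra_safe described above can be sketched as a batching plan over pose indices. plan_batches is a hypothetical helper; it follows the documented rule that ultra_safe forces one-pose-at-a-time processing, but the internal batching of run_pose_table() may differ.

```python
def plan_batches(n_poses, batch_size=1, ultra_safe=True):
    """Split pose indices 0..n_poses-1 into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    # ultra_safe overrides batch_size and processes one pose at a time.
    size = 1 if ultra_safe else batch_size
    return [
        list(range(start, min(start + size, n_poses)))
        for start in range(0, n_poses, size)
    ]
```

Under this reading, the defaults (batch_size=1, ultra_safe=True) process poses one by one, and a larger batch_size only takes effect together with ultra_safe=False.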

extract_interactions(
receptor_pdb,
ligands,
*,
interactions=None,
parameters=None,
count=False,
vicinity_cutoff=6.0,
receptor_selection=None,
receptor_use_segid=None,
ligand_resname='LIG',
ligand_resnumber=1,
ligand_chain='',
ligand_use_segid=False,
sdf_sanitize=True,
receptor_guess_bonds=True,
receptor_vdwradii=None,
suppress_mdanalysis_warnings=True,
suppress_mdanalysis_info_logs=True,
progress=False,
n_jobs=1,
residues=None,
drop_empty=True,
)#

Convenience wrapper around InteractionProfiler for single-run extraction.

Returns:

Structured single-run interaction result.

Return type:

InteractionRunResult
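extract_interactions() exposes vicinity_cutoff with a default of 6.0. This section does not document the parameter. The sketch below assumes it is forwarded to ProLIF, where a vicinity cutoff limits the analysis to protein residues that have at least one atom within that distance (in Å) of the ligand. The selection then reduces to the pure-Python distance test below. The function name and coordinates are made up for illustration, and ProLIF performs the real selection on MDAnalysis/RDKit structures.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residues_near_ligand(
    residue_atoms: Dict[str, Sequence[Coord]],
    ligand_atoms: Sequence[Coord],
    cutoff: float = 6.0,
) -> List[str]:
    """Return residue ids with at least one atom within `cutoff` Å of any ligand atom."""
    selected = []
    for residue_id, atoms in residue_atoms.items():
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            selected.append(residue_id)
    return selected


# Toy coordinates in Å, not taken from a real structure.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = {
    "TYR10.A": [(4.0, 0.0, 0.0)],  # 2.5 Å from the second ligand atom
    "LEU20.A": [(9.0, 0.0, 0.0)],  # 7.5 Å away: outside the default cutoff
    "ASP30.A": [(0.0, 5.9, 0.0)],  # 5.9 Å from the first ligand atom
}
near = residues_near_ligand(protein, ligand)
```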

extract_pose_table_interactions(
poses,
receptor_pdb_by_id,
*,
interactions=None,
parameters=None,
count=False,
vicinity_cutoff=6.0,
receptor_selection=None,
receptor_use_segid=None,
ligand_resname='LIG',
ligand_resnumber=1,
ligand_chain='',
ligand_use_segid=False,
sdf_sanitize=True,
receptor_guess_bonds=True,
receptor_vdwradii=None,
suppress_mdanalysis_warnings=True,
suppress_mdanalysis_info_logs=True,
progress=False,
n_jobs=1,
receptor_col='receptor_id',
ligand_col='ligand_id',
engine_col='engine',
pose_rank_col='pose_rank',
affinity_col='affinity',
mol_col='mol',
pose_id_col=None,
residues=None,
batch_size=1,
include_fingerprint_columns=False,
include_interaction_events=True,
include_bitvectors=False,
include_countvectors=False,
fingerprint_prefix='ifp__',
gc_collect=True,
fail_fast=True,
ultra_safe=True,
drop_empty=True,
)#

Convenience wrapper that extracts interactions for every pose in a pose table, resolving each pose's receptor structure through receptor_pdb_by_id.

Returns:

Pose-centric interaction result containing merged_df, interaction_df, and summary_df.

Return type:

PoseInteractionTableResult

Parameters:
  • poses (pandas.DataFrame)

  • receptor_pdb_by_id (Mapping[str, str | Path])

  • interactions (Sequence[str] | None)

  • parameters (Dict[str, Dict[str, Any]] | None)

  • count (bool)

  • vicinity_cutoff (float)

  • receptor_selection (str | None)

  • receptor_use_segid (bool | None)

  • ligand_resname (str)

  • ligand_resnumber (int)

  • ligand_chain (str)

  • ligand_use_segid (bool)

  • sdf_sanitize (bool)

  • receptor_guess_bonds (bool)

  • receptor_vdwradii (Mapping[str, float] | None)

  • suppress_mdanalysis_warnings (bool)

  • suppress_mdanalysis_info_logs (bool)

  • progress (bool)

  • n_jobs (int | None)

  • receptor_col (str)

  • ligand_col (str)

  • engine_col (str)

  • pose_rank_col (str)

  • affinity_col (str)

  • mol_col (str)

  • pose_id_col (str | None)

  • residues (Sequence[str] | str | None)

  • batch_size (int)

  • include_fingerprint_columns (bool)

  • include_interaction_events (bool)

  • include_bitvectors (bool)

  • include_countvectors (bool)

  • fingerprint_prefix (str)

  • gc_collect (bool)

  • fail_fast (bool)

  • ultra_safe (bool)

  • drop_empty (bool)
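The following is a minimal sketch of inputs for extract_pose_table_interactions() that uses the default column names from the signature. In real use the mol column must hold RDKit molecules. The None placeholders keep the sketch runnable without RDKit, and the ids, affinities, and file paths are hypothetical. The pre-flight check is not part of the library. It catches receptor ids that have no PDB entry before a long run starts.

```python
from pathlib import Path

import pandas as pd

# Default column names from the signature; values are illustrative only.
poses = pd.DataFrame(
    {
        "receptor_id": ["rec1", "rec1", "rec2"],
        "ligand_id": ["lig1", "lig1", "lig2"],
        "engine": ["vina", "vina", "smina"],
        "pose_rank": [1, 2, 1],
        "affinity": [-8.0, -7.5, -9.0],
        "mol": [None, None, None],  # replace with rdkit.Chem.Mol objects in real use
    }
)

# Hypothetical receptor files keyed by the values found in receptor_id.
receptor_pdb_by_id = {"rec1": Path("receptors/rec1.pdb")}

required = ["receptor_id", "ligand_id", "engine", "pose_rank", "affinity", "mol"]
missing_cols = [c for c in required if c not in poses.columns]
missing_receptors = sorted(set(poses["receptor_id"]) - set(receptor_pdb_by_id))

# batch_size only takes effect when ultra_safe is False:
# result = extract_pose_table_interactions(
#     poses, receptor_pdb_by_id, batch_size=8, ultra_safe=False
# )
```

Here missing_receptors flags rec2, so the mapping would need an entry for it before the call.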

tanimoto_similarity_matrix(vectors, names=None)#

Compute a pairwise Tanimoto similarity matrix.

This function computes the all-against-all Tanimoto similarity between fingerprint vectors and returns the result as a square pandas.DataFrame. It is suitable for RDKit explicit bit vectors as well as sparse count vectors, which makes it useful for both boolean and count-based ProLIF fingerprints.

When names is not provided, default labels of the form mol_0000, mol_0001, and so on are generated automatically.

Parameters:
  • vectors (Sequence[Any]) – Sequence of RDKit-compatible fingerprint vectors. Each element should be accepted by rdkit.DataStructs.TanimotoSimilarity().

  • names (Sequence[str] | None) – Optional labels to use for both the row index and column names of the returned matrix. When omitted, default molecule labels are generated from the vector order.

Returns:

Square dataframe whose (i, j) entry contains the Tanimoto similarity between vectors[i] and vectors[j].

Return type:

pandas.DataFrame

Raises:

MissingDependencyError – Raised when RDKit is not installed or cannot be imported.
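For bit vectors, the Tanimoto similarity is c / (a + b - c), where a and b are the on-bit counts and c is the number of shared on-bits. For count vectors, the common generalization replaces c with the sum of element-wise minima and a, b with the summed counts. The pure-Python sketch below illustrates what the matrix contains. It stores vectors as {index: count} dicts instead of RDKit objects and returns a dict of dicts instead of a DataFrame. The function names and the treatment of two all-zero vectors are choices made for this sketch, not this library's API.

```python
from typing import Dict, Mapping, Optional, Sequence

SparseVector = Mapping[int, int]  # feature index -> count (bit vectors use count 1)


def tanimoto(a: SparseVector, b: SparseVector) -> float:
    """Generalized Tanimoto: sum(min) / (sum(a) + sum(b) - sum(min))."""
    shared = sum(min(count, b[key]) for key, count in a.items() if key in b)
    total = sum(a.values()) + sum(b.values()) - shared
    # Convention for this sketch: two all-zero vectors score 0.0.
    return shared / total if total else 0.0


def similarity_matrix(
    vectors: Sequence[SparseVector], names: Optional[Sequence[str]] = None
) -> Dict[str, Dict[str, float]]:
    """All-against-all matrix as {row_name: {col_name: similarity}}."""
    if names is None:
        names = [f"mol_{i:04d}" for i in range(len(vectors))]  # same default labels
    if len(names) != len(vectors):
        raise ValueError("names and vectors must have the same length")
    return {
        row: {col: tanimoto(vectors[i], vectors[j]) for j, col in enumerate(names)}
        for i, row in enumerate(names)
    }


bits = [{0: 1, 1: 1, 2: 1}, {1: 1, 2: 1, 3: 1}]  # on-bits {0, 1, 2} and {1, 2, 3}
counts = [{5: 2, 7: 1}, {5: 1, 7: 3}]  # count-based fingerprints
bit_sim = similarity_matrix(bits)
count_sim = similarity_matrix(counts, names=["pose_a", "pose_b"])
```

Passing either result to pandas.DataFrame gives the square layout. Because the matrix is symmetric, row and column orientation do not matter.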

Example#

# `result` must carry collected bitvectors (see include_bitvectors).
sim = tanimoto_similarity_matrix(
    vectors=result.bitvectors,
    names=result.molecule_names,
)
print(sim.iloc[:5, :5])  # top-left 5x5 block

Another Example#

# `vectors`: any sequence of RDKit-compatible fingerprint vectors
sim = tanimoto_similarity_matrix(vectors)
assert sim.shape == (len(vectors), len(vectors))

class JournalStyle(
name,
palette,
heatmap_cmap='cividis',
background='white',
panel_facecolor='white',
grid_color='#d9dde3',
spine_color='#4a4f57',
text_color='#222222',
title_size=10.5,
label_size=9.0,
tick_size=8.0,
panel_label_size=11.0,
line_width=0.8,
grid_alpha=0.35,
histogram_alpha=0.85,
scatter_alpha=0.85,
marker_size=20.0,
)#

Bases: object

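Visual style configuration for publication-style figures. The style argument of the plotting functions takes a preset name such as 'nature' (the default); make_summary_panel_2x3() documents that a JournalStyle instance is accepted as well.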

name: str#
palette: tuple[str, ...]#
heatmap_cmap: str = 'cividis'#
background: str = 'white'#
panel_facecolor: str = 'white'#
grid_color: str = '#d9dde3'#
spine_color: str = '#4a4f57'#
text_color: str = '#222222'#
title_size: float = 10.5#
label_size: float = 9.0#
tick_size: float = 8.0#
panel_label_size: float = 11.0#
line_width: float = 0.8#
grid_alpha: float = 0.35#
histogram_alpha: float = 0.85#
scatter_alpha: float = 0.85#
marker_size: float = 20.0#
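The fields of JournalStyle are listed like dataclass fields. If it is a dataclass (frozen or not), dataclasses.replace() can derive a variant without restating every field. This section does not show the import path, so the sketch below uses a hypothetical local stand-in, JournalStyleStandIn, with a subset of the documented fields and an illustrative palette.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class JournalStyleStandIn:
    """Hypothetical stand-in mirroring a few documented JournalStyle fields."""

    name: str
    palette: tuple[str, ...]
    heatmap_cmap: str = "cividis"
    title_size: float = 10.5
    marker_size: float = 20.0


base = JournalStyleStandIn(name="custom", palette=("#0072B2", "#D55E00", "#009E73"))
# Derive a variant for dense scatter plots without mutating the base style.
dense = replace(base, name="custom-dense", marker_size=8.0)

try:
    dense.marker_size = 4.0  # frozen dataclasses reject attribute assignment
    mutable = True
except FrozenInstanceError:
    mutable = False
```

If the real class behaves this way, a style derived from a JournalStyle instance could be passed as style=... to make_summary_panel_2x3(), whose signature accepts str | JournalStyle.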

build_pose_visualization_table(result)#

Build a merged pose-level dataframe convenient for plotting.

Parameters:

result (PoseInteractionTableResult)

Return type:

pandas.DataFrame
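One plausible ingredient of such a plotting table is a per-pose interaction count joined onto the pose table. The count and interaction plots below rely on this kind of value. The following is a pandas sketch of that join. The pose_id and interaction_type column names and the one-row-per-interaction layout are assumptions, not the documented schema of PoseInteractionTableResult.

```python
import pandas as pd

# Hypothetical layouts with toy values: one row per pose, one row per interaction.
poses = pd.DataFrame(
    {
        "pose_id": ["p1", "p2", "p3"],
        "engine": ["vina", "vina", "smina"],
        "affinity": [-8.2, -7.5, -9.0],
    }
)
interactions = pd.DataFrame(
    {
        "pose_id": ["p1", "p1", "p3"],
        "interaction_type": ["Hydrophobic", "HBDonor", "Hydrophobic"],
    }
)

# Count interaction rows per pose.
counts = interactions.groupby("pose_id").size().rename("n_interactions").reset_index()
# A left join keeps poses with no interactions; their missing count becomes 0.
plot_df = poses.merge(counts, on="pose_id", how="left")
plot_df["n_interactions"] = plot_df["n_interactions"].fillna(0).astype(int)
```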

make_affinity_histogram(
result,
*,
bins=20,
group_by=None,
figsize=(3.35, 2.6),
title='Affinity distribution',
xlabel='Affinity',
ylabel='Count',
style='nature',
)#

Plot a histogram of pose affinities, optionally grouped by group_by; bins sets the number of bins.

Return type:

Any

save_affinity_histogram(result, output_path, **kwargs)#

Save the affinity histogram to output_path.

Parameters:
  • result (PoseInteractionTableResult)

  • output_path (str | Path)

  • kwargs (Any)

Return type:

Path

make_best_pose_bar(
result,
*,
group_cols=('receptor_id', 'ligand_id', 'engine'),
figsize=(4.5, 2.8),
title='Best pose per group',
xlabel='Group',
ylabel='Best affinity',
style='nature',
)#
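
Plot the best affinity of each group as a bar chart. Groups are defined by group_cols, which default to receptor, ligand, and engine.
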
Return type:

Any
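Suppose "best" means the lowest affinity, as with Vina-family scores in kcal/mol, where more negative values indicate stronger predicted binding. The reduction behind this bar chart is then a groupby minimum over group_cols. The following pandas sketch uses toy values; the label column is an illustrative addition, not necessarily how the library labels its bars.

```python
import pandas as pd

poses = pd.DataFrame(
    {
        "receptor_id": ["rec1", "rec1", "rec1", "rec2"],
        "ligand_id": ["lig1", "lig1", "lig2", "lig1"],
        "engine": ["vina", "vina", "vina", "smina"],
        "affinity": [-8.1, -8.9, -7.4, -9.6],  # toy values
    }
)

group_cols = ["receptor_id", "ligand_id", "engine"]  # the documented default grouping
# Lower (more negative) affinity is treated as better in this sketch.
best = poses.groupby(group_cols, as_index=False)["affinity"].min()
best["label"] = best["receptor_id"] + " | " + best["ligand_id"] + " | " + best["engine"]
```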

save_best_pose_bar(result, output_path, **kwargs)#

Save the best-pose bar chart to output_path.

Parameters:
  • result (PoseInteractionTableResult)

  • output_path (str | Path)

  • kwargs (Any)

Return type:

Path

make_interaction_type_bar(
result,
*,
top_n=10,
normalize=False,
figsize=(3.35, 2.6),
title='Interaction type frequency',
xlabel='Type',
ylabel=None,
style='nature',
)#

Plot how often each interaction type occurs, as a bar chart of the top_n most frequent types.

Return type:

Any

save_interaction_type_bar(result, output_path, **kwargs)#

Save the interaction-type bar chart to output_path.

Parameters:
  • result (PoseInteractionTableResult)

  • output_path (str | Path)

  • kwargs (Any)

Return type:

Path

make_residue_contact_bar(
result,
*,
interaction_type=None,
top_n=15,
normalize=False,
figsize=(4.0, 2.6),
title='Residue contact frequency',
xlabel='Residue',
ylabel=None,
style='nature',
)#
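
Plot how often each residue is contacted, as a bar chart of the top_n most frequent residues. Set interaction_type to count only contacts of that type (for example 'Hydrophobic').
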
Return type:

Any
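The interaction-type bar and the residue-contact bar both rank frequencies and keep the top_n entries. The residue variant can first restrict events to one interaction_type. The following stdlib sketch shows that counting. It assumes one (residue, interaction type) record per detected interaction and treats normalize=True as conversion to fractions of all counted events. Both are assumptions about the library's behavior, and top_counts is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Event = Tuple[str, str]  # (residue id, interaction type); hypothetical record layout


def top_counts(
    events: Iterable[Event],
    *,
    by: str = "type",
    interaction_type: Optional[str] = None,
    top_n: int = 10,
    normalize: bool = False,
) -> List[Tuple[str, float]]:
    """Rank interaction types (by="type") or residues (by="residue") and keep the top_n."""
    selected = [e for e in events if interaction_type is None or e[1] == interaction_type]
    keys = [residue if by == "residue" else itype for residue, itype in selected]
    ranked = Counter(keys).most_common(top_n)  # ties keep first-seen order
    if normalize and keys:
        total = len(keys)  # denominator covers all selected events, not only the top_n
        return [(key, count / total) for key, count in ranked]
    return ranked


events = [
    ("LEU20.A", "Hydrophobic"),
    ("LEU20.A", "Hydrophobic"),
    ("VAL30.A", "Hydrophobic"),
    ("ASP40.A", "HBDonor"),
    ("ASP40.A", "Cationic"),
]
type_freq = top_counts(events, by="type", top_n=2)
hydrophobic_residues = top_counts(events, by="residue", interaction_type="Hydrophobic")
type_share = top_counts(events, by="type", top_n=1, normalize=True)
```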

save_residue_contact_bar(result, output_path, **kwargs)#

Save the residue-contact bar chart to output_path.

Parameters:
  • result (PoseInteractionTableResult)

  • output_path (str | Path)

  • kwargs (Any)

Return type:

Path

make_interaction_count_histogram(
result,
*,
bins=20,
count_kind='compact',
group_by=None,
figsize=(3.35, 2.6),
title='Interaction count distribution',
xlabel=None,
ylabel='Count',
style='nature',
)#
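
Plot a histogram of the number of interactions per pose, optionally grouped by group_by; count_kind selects which interaction count is used.
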
Return type:

Any

save_interaction_count_histogram(result, output_path, **kwargs)#

Save the interaction-count histogram to output_path.

Parameters:
  • result (PoseInteractionTableResult)

  • output_path (str | Path)

  • kwargs (Any)

Return type:

Path

make_affinity_vs_interaction_count_scatter(
result,
*,
count_kind='compact',
group_by=None,
figsize=(3.35, 2.6),
title='Affinity vs interaction count',
xlabel='Affinity',
ylabel=None,
style='nature',
)#
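
Plot pose affinity against the number of interactions per pose as a scatter plot, optionally grouped by group_by; count_kind selects which interaction count is used.
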
Return type:

Any

save_affinity_vs_interaction_count_scatter(result, output_path, **kwargs)#

Save the affinity-versus-interaction-count scatter plot to output_path.

Parameters:
  • result (PoseInteractionTableResult)

  • output_path (str | Path)

  • kwargs (Any)

Return type:

Path

plot_similarity_heatmap(
result,
*,
figsize=(4.2, 3.5),
annotate=False,
title='Fingerprint similarity',
xlabel='Pose',
ylabel='Pose',
vmin=0.0,
vmax=1.0,
style='nature',
max_labels=30,
)#
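
Plot the pairwise fingerprint similarity between poses as a heatmap, with the color scale spanning vmin to vmax.
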
Return type:

Any

save_similarity_heatmap(result, output_path, *, dpi=300, **kwargs)#

Save the similarity heatmap to output_path at the given dpi.

Parameters:
  • result (PoseInteractionTableResult)

  • output_path (str | Path)

  • dpi (int)

  • kwargs (Any)

Return type:

Path

make_summary_panel_2x3(
result,
*,
style='nature',
figsize=(10.5, 6.8),
top_n_types=8,
top_n_residues=12,
residue_interaction_type='Hydrophobic',
scatter_group_by='engine',
hist_group_by='engine',
title='Docking interaction summary',
)#

Build a 2x3 publication-style summary panel.

Panels:
  • a) affinity distribution

  • b) best affinity per receptor-ligand-engine group

  • c) interaction type frequency

  • d) residue contact frequency

  • e) interaction count distribution

  • f) affinity vs interaction count

Parameters:
  • result (PoseInteractionTableResult)

  • style (str | JournalStyle)

  • figsize (tuple[float, float])

  • top_n_types (int)

  • top_n_residues (int)

  • residue_interaction_type (str | None)

  • scatter_group_by (str | None)

  • hist_group_by (str | None)

  • title (str)

Return type:

Any

save_summary_panel_2x3(
result,
output_path,
*,
style='nature',
figsize=(10.5, 6.8),
top_n_types=8,
top_n_residues=12,
residue_interaction_type='Hydrophobic',
scatter_group_by='engine',
hist_group_by='engine',
title='Docking interaction summary',
dpi=300,
)#

Build the same 2x3 summary panel as make_summary_panel_2x3() and save it to output_path at the given dpi.

Return type:

Path
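The six panels listed for make_summary_panel_2x3() fit a 2x3 grid. The documented return type is only Any, so the following sketch assumes matplotlib figures and that panels a to f are laid out row by row. It builds the empty layout with the default figure size, title sizes, and panel letters, then saves it at the default dpi=300 and returns the path. summary_layout and save_layout are hypothetical helpers. The real functions also apply the JournalStyle and draw the data.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend for file output
import matplotlib.pyplot as plt

PANELS = [
    "Affinity distribution",
    "Best affinity per group",
    "Interaction type frequency",
    "Residue contact frequency",
    "Interaction count distribution",
    "Affinity vs interaction count",
]


def summary_layout(figsize=(10.5, 6.8), title="Docking interaction summary"):
    """Empty 2x3 grid with panel titles and bold letters a-f, filled row by row."""
    fig, axes = plt.subplots(2, 3, figsize=figsize, constrained_layout=True)
    for letter, ax, panel_title in zip("abcdef", axes.flat, PANELS):
        ax.set_title(panel_title, fontsize=10.5)
        # Panel letter just outside the top-left corner of each axes.
        ax.text(-0.12, 1.05, letter, transform=ax.transAxes,
                fontsize=11.0, fontweight="bold", va="bottom")
    fig.suptitle(title)
    return fig


def save_layout(output_path, dpi=300):
    """Render the layout, write it to output_path, and return the path."""
    fig = summary_layout()
    path = Path(output_path)
    fig.savefig(path, dpi=dpi)
    plt.close(fig)  # free the figure after saving
    return path


out = save_layout(Path(tempfile.mkdtemp()) / "summary.png")
```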