Postprocess API
Extraction
- crawl_scores(roots, include_logs=None, include_tables=None, engine_hint=None, *, layout='auto', recursive=True)
Discover and parse docking outputs under one or more filesystem roots.
Supported layouts are:
auto: generic crawling with best-effort engine detection
single_file: explicit log-file parsing; requires engine_hint
flat_dir: log-directory parsing; requires engine_hint
engine_tree: high-level directory whose immediate child folders are engine names
- Parameters:
roots (Sequence[pathlib.Path | str]) – Files or directories to inspect.
include_logs (Sequence[str] | None) – Optional log-file glob patterns, used only in layout="auto".
include_tables (Sequence[str] | None) – Optional table-file glob patterns, used only in layout="auto".
engine_hint (str | None) – Engine required for single_file and flat_dir. In auto mode, it acts as a fallback hint.
layout (Literal["auto", "single_file", "flat_dir", "engine_tree"]) – Extraction layout mode.
recursive (bool) – Whether directory-based layouts should recurse into nested folders.
- Returns:
Combined parsed dataframe, or None when no parsable data is found.
- Return type:
pandas.DataFrame | None
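Example
# Single explicit log file: the engine must be given
df1 = crawl_scores(
    roots=["logs/erlotinib.log"],
    engine_hint="vina",
    layout="single_file",
)
# Flat directory of logs produced by one engine
df2 = crawl_scores(
    roots=["logs/smina_run"],
    engine_hint="smina",
    layout="flat_dir",
    recursive=True,
)
# Directory whose immediate subfolders are engine names
df3 = crawl_scores(
    roots=["logs"],
    layout="engine_tree",
)
# Generic crawling with best-effort engine detection
df4 = crawl_scores(
    roots=["results"],
    layout="auto",
)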
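The four layout modes differ mainly in whether an engine must be supplied up front. The sketch below encodes the rules stated above as a pre-flight check; check_layout_args and the ValueError it raises are hypothetical illustrations, not part of ProDock, and the real crawl_scores() may validate differently.

```python
from typing import Optional

# Hypothetical names: the real crawl_scores() may validate differently or raise another error type.
VALID_LAYOUTS = ("auto", "single_file", "flat_dir", "engine_tree")
LAYOUTS_NEEDING_ENGINE = ("single_file", "flat_dir")


def check_layout_args(layout: str, engine_hint: Optional[str]) -> None:
    """Raise ValueError when layout and engine_hint break the documented rules."""
    if layout not in VALID_LAYOUTS:
        raise ValueError(f"unknown layout {layout!r}; expected one of {VALID_LAYOUTS}")
    if layout in LAYOUTS_NEEDING_ENGINE and not engine_hint:
        raise ValueError(f"layout={layout!r} requires engine_hint")
    # "auto" treats engine_hint as a fallback only, and "engine_tree" reads the
    # engine from the child folder names, so neither case requires a hint here.


check_layout_args("single_file", "vina")  # accepted
check_layout_args("engine_tree", None)    # accepted
try:
    check_layout_args("flat_dir", None)
except ValueError as err:
    print(err)  # layout='flat_dir' requires engine_hint
```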
- class Extractor(include_logs=None, include_tables=None, match_mode='substring', crawl_func=None, engine_map=None)
Bases: object
High-level score extractor with layout-aware helpers.
The extractor wraps crawl_scores() and provides convenience methods for engine filtering, engine listing, and explicit layout-based extraction.
- Parameters:
include_logs (Sequence[str] | None) – Optional custom log-file glob patterns.
include_tables (Sequence[str] | None) – Optional custom table-file glob patterns.
match_mode (str) – Matching mode used when filtering extracted rows by engine.
crawl_func (Callable | None) – Optional custom crawl function used instead of crawl_scores().
engine_map (dict | None) – Optional mapping from logical engine groups to concrete engine tokens.
Example
extractor = Extractor(
    match_mode="exact",
    # A logical group name expands to several concrete engine tokens
    engine_map={"vina-family": ["vina", "vina-gpu", "qvina", "qvina-gpu"]},
)
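The interplay between engine_map and match_mode is easiest to see on a few rows. The following is a minimal sketch under stated assumptions: rows are plain dicts with an "engine" key, group names are expanded before matching, and "substring" means a requested token may appear anywhere inside the engine name. expand_engines and filter_by_engine are hypothetical helpers, not the Extractor's actual internals.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Sequence

# Hypothetical sketch: the real Extractor may expand groups or match names differently.


def expand_engines(engines: Iterable[str],
                   engine_map: Optional[Mapping[str, Sequence[str]]] = None) -> List[str]:
    """Replace logical group names (engine_map keys) with their lowercased concrete tokens."""
    groups = {key.lower(): list(tokens) for key, tokens in (engine_map or {}).items()}
    expanded: List[str] = []
    for name in engines:
        key = name.lower()
        expanded.extend(token.lower() for token in groups.get(key, [key]))
    return expanded


def filter_by_engine(rows: Sequence[Dict[str, str]], engines: Iterable[str],
                     match_mode: str = "substring",
                     engine_map: Optional[Mapping[str, Sequence[str]]] = None) -> List[Dict[str, str]]:
    """Keep rows whose 'engine' value matches any requested token."""
    tokens = expand_engines(engines, engine_map)

    def matches(value: str) -> bool:
        value = value.lower()
        if match_mode == "exact":
            return value in tokens
        # "substring" (assumed meaning): a token anywhere inside the engine name matches
        return any(token in value for token in tokens)

    return [row for row in rows if matches(row["engine"])]


rows = [{"engine": "vina"}, {"engine": "qvina-gpu"}, {"engine": "smina"}]
print(filter_by_engine(rows, ["vina"], "substring"))  # vina and qvina-gpu
print(filter_by_engine(rows, ["vina"], "exact"))      # vina only
family = {"vina-family": ["vina", "vina-gpu", "qvina", "qvina-gpu"]}
print(filter_by_engine(rows, ["vina-family"], "exact", family))  # vina and qvina-gpu
```

The substring case shows why "exact" is often the safer choice: asking for "vina" also picks up "qvina-gpu".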
- extract_scores(roots, engines=None, engine_hint=None, *, layout='auto', recursive=True)
Extract scores and optionally filter the results by engine.
- Parameters:
roots (Sequence[str | pathlib.Path]) – Files or directories to inspect.
engines (Iterable[str] | None) – Optional iterable of engine filters.
engine_hint (str | None) – Optional engine hint used during parsing.
layout (LayoutMode) – Extraction layout mode.
recursive (bool) – Whether directory-based layouts should recurse into nested folders.
- Returns:
Extracted dataframe, optionally filtered by engine, or None when no data is found.
- Return type:
pandas.DataFrame | None
Example
extractor = Extractor(match_mode="exact")
df = extractor.extract_scores(
    roots=["logs"],
    engines=["vina", "smina"],
    layout="engine_tree",
)
- list_engines(roots, engine_hint=None, *, layout='auto', recursive=True)
List unique engine names discovered under the given roots.
- Parameters:
roots (Sequence[str | pathlib.Path]) – Files or directories to inspect.
engine_hint (str | None) – Optional engine hint used during parsing.
layout (LayoutMode) – Extraction layout mode.
recursive (bool) – Whether directory-based layouts should recurse into nested folders.
- Returns:
Set of lowercased engine names found in the extracted data.
- Return type:
set[str]
Example
extractor = Extractor()
engines = extractor.list_engines(
    roots=["logs"],
    layout="engine_tree",
)
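"Lowercased" and "unique" together mean that "Vina" and "vina" collapse into one entry. A small sketch of that normalization, assuming the engine names live in a column or key called "engine" (unique_engines is a hypothetical helper, not the library's implementation):

```python
from typing import Iterable, Mapping, Set


def unique_engines(rows: Iterable[Mapping[str, object]], column: str = "engine") -> Set[str]:
    """Collect de-duplicated, lowercased engine names, skipping missing or empty values."""
    return {str(row[column]).strip().lower() for row in rows if row.get(column)}


rows = [{"engine": "Vina"}, {"engine": "vina"}, {"engine": "SMINA"}, {"engine": None}]
print(sorted(unique_engines(rows)))  # ['smina', 'vina']
```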
- extract_log_file(path, *, engine)
Extract scores from a single explicit log file.
- Parameters:
path (str | pathlib.Path) – Path to a .log or .txt file.
engine (str) – Required engine name for the file.
- Returns:
Extracted dataframe, or None when no rows are found.
- Return type:
pandas.DataFrame | None
Example
extractor = Extractor()
df = extractor.extract_log_file(
    "logs/erlotinib.log",
    engine="vina",
)
- extract_logs_dir(path, *, engine, recursive=True)
Extract scores from a directory of log files belonging to one engine.
- Parameters:
path (str | pathlib.Path) – Directory containing .log or .txt files.
engine (str) – Required engine name applied to all discovered files.
recursive (bool) – Whether nested subdirectories should also be searched.
- Returns:
Extracted dataframe, or None when no rows are found.
- Return type:
pandas.DataFrame | None
Example
extractor = Extractor()
df = extractor.extract_logs_dir(
    "logs/smina_run",
    engine="smina",
    recursive=True,
)
- extract_engine_folders(path, *, recursive=True)
Extract scores from a high-level directory whose immediate subfolders are engine names.
- Parameters:
path (str | pathlib.Path) – High-level logs directory.
recursive (bool) – Whether nested subdirectories inside each engine folder should also be searched.
- Returns:
Extracted dataframe, or None when no rows are found.
- Return type:
pandas.DataFrame | None
Example
extractor = Extractor()
df = extractor.extract_engine_folders(
    "logs",
    recursive=True,
)
- extract_scores(roots, engines=None, engine_hint=None, *, layout='auto', recursive=True)
Extract scores using the default Extractor instance.
- Parameters:
roots (Sequence[str | pathlib.Path]) – Files or directories to inspect.
engines (Iterable[str] | None) – Optional iterable of engine filters.
engine_hint (str | None) – Optional engine hint used during parsing.
layout (LayoutMode) – Extraction layout mode.
recursive (bool) – Whether directory-based layouts should recurse into nested folders.
- Returns:
Extracted dataframe, or None when no data is found.
- Return type:
pandas.DataFrame | None
Example
df = extract_scores(
    roots=["logs"],
    layout="engine_tree",
)
- list_engines(roots, engine_hint=None, *, layout='auto', recursive=True)
List unique engine names using the default Extractor instance.
- Parameters:
roots (Sequence[str | pathlib.Path]) – Files or directories to inspect.
engine_hint (str | None) – Optional engine hint used during parsing.
layout (LayoutMode) – Extraction layout mode.
recursive (bool) – Whether directory-based layouts should recurse into nested folders.
- Returns:
Set of lowercased engine names.
- Return type:
set[str]
Example
engines = list_engines(
    roots=["logs"],
    layout="engine_tree",
)
- extract_log_file(path, *, engine)
Extract scores from a single explicit log file using the default extractor.
- Parameters:
path (str | pathlib.Path) – Path to a .log or .txt file.
engine (str) – Required engine name for the file.
- Returns:
Extracted dataframe, or None when no rows are found.
- Return type:
pandas.DataFrame | None
Example
df = extract_log_file(
    "logs/erlotinib.log",
    engine="vina",
)
- extract_logs_dir(path, *, engine, recursive=True)
Extract scores from a flat log directory using the default extractor.
- Parameters:
path (str | pathlib.Path) – Directory containing .log or .txt files.
engine (str) – Required engine name applied to all discovered files.
recursive (bool) – Whether nested subdirectories should also be searched.
- Returns:
Extracted dataframe, or None when no rows are found.
- Return type:
pandas.DataFrame | None
Example
df = extract_logs_dir(
    "logs/smina_run",
    engine="smina",
    recursive=True,
)
- extract_engine_folders(path, *, recursive=True)
Extract scores from a high-level directory whose immediate subfolders are engine names.
- Parameters:
path (str | pathlib.Path) – High-level logs directory.
recursive (bool) – Whether nested subdirectories inside each engine folder should also be searched.
- Returns:
Extracted dataframe, or None when no rows are found.
- Return type:
pandas.DataFrame | None
Example
df = extract_engine_folders(
    "logs",
    recursive=True,
)
- parse_log_text(text, engine=None, regex=None)
Parse docking log text using built-in or custom regex rules.
The parser first resolves the engine name using canonicalize_engine_name() and, if needed, automatic engine detection. When regex is provided, the custom row pattern is tried first; if it yields rows, those rows are returned immediately. Otherwise, the built-in engine-specific parsers are used.
For GNINA logs, the parser first tries the GNINA table parser and falls back to the Vina-family parser if no GNINA rows are found. This fallback is useful because some GNINA outputs may also contain Vina-like score tables.
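The order of attempts described above can be sketched as a small dispatcher. This is an illustration under assumptions, not the library's code: parse_with_fallback and the stub parsers are hypothetical, the engine is assumed to be already canonicalized to lowercase, and non-GNINA engines are assumed to share the Vina-family parser.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]
Parser = Callable[[str], List[Row]]


def parse_with_fallback(text: str, engine: str, builtin: Dict[str, Parser],
                        custom: Optional[Parser] = None) -> List[Row]:
    """Hypothetical dispatcher mirroring the documented order of attempts."""
    # 1. A custom row pattern, when supplied, wins as soon as it yields any rows.
    if custom is not None:
        rows = custom(text)
        if rows:
            return rows
    # 2. GNINA: try the GNINA table parser, then fall back to the Vina-family parser.
    if engine == "gnina":
        return builtin["gnina"](text) or builtin["vina"](text)
    # 3. Assumption: every other engine (vina, smina, qvina, ...) uses the Vina-family parser.
    return builtin["vina"](text)


# Stub parsers stand in for the real built-in ones.
stubs: Dict[str, Parser] = {
    "gnina": lambda text: [],  # pretend no GNINA table was found
    "vina": lambda text: [{"mode": 1, "affinity_kcal_mol": -7.5}],
}
print(parse_with_fallback("...", "gnina", stubs))  # falls back to the Vina-family rows
```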
- Parameters:
text (str) – Raw docking log text to parse.
engine (Optional[str]) – Optional engine name such as "vina", "smina", "qvina", or "gnina". If omitted, the engine is inferred automatically from the input text when possible.
regex (Optional[dict[str, str]]) – Optional mapping of custom regex patterns. Supported keys are "vina_row" and "gnina_row". The selected pattern must expose four capture groups in the same order as the built-in parser expects.
- Returns:
Parsed docking rows. For Vina-family logs, rows contain mode, affinity_kcal_mol, rmsd_lb, and rmsd_ub. For GNINA logs, rows contain mode, affinity_kcal_mol, cnn_pose, and cnn_affinity.
- Return type:
Example
text = '''
-----+------------+----------+----------
   1       -7.5      0.000      0.000
   2       -7.1      1.200      2.400
'''
rows = parse_log_text(text, engine="vina")
Example
# Four capture groups: mode, affinity, RMSD lower bound, RMSD upper bound
custom = {
    "vina_row": r"^\s*(\d+)\s+([-+]?\d*\.?\d+)\s+([-+]?\d*\.?\d+)\s+([-+]?\d*\.?\d+)$"
}
rows = parse_log_text(text, engine="vina", regex=custom)
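The four capture groups of the custom vina_row pattern map positionally onto the documented Vina-family row keys. A minimal sketch of that mapping, assuming rows are returned as dicts and the pattern is applied line by line with re.MULTILINE (both assumptions; parse_vina_rows is a hypothetical helper, not the built-in parser):

```python
import re
from typing import Dict, List, Union

# Same four-group pattern as the custom "vina_row" example:
# mode, affinity, RMSD lower bound, RMSD upper bound.
VINA_ROW = r"^\s*(\d+)\s+([-+]?\d*\.?\d+)\s+([-+]?\d*\.?\d+)\s+([-+]?\d*\.?\d+)$"


def parse_vina_rows(text: str, pattern: str = VINA_ROW) -> List[Dict[str, Union[int, float]]]:
    """Turn every matching line into a row dict keyed like the documented Vina-family rows."""
    rows = []
    for match in re.finditer(pattern, text, flags=re.MULTILINE):
        mode, affinity, rmsd_lb, rmsd_ub = match.groups()
        rows.append({
            "mode": int(mode),
            "affinity_kcal_mol": float(affinity),
            "rmsd_lb": float(rmsd_lb),
            "rmsd_ub": float(rmsd_ub),
        })
    return rows


log = """
-----+------------+----------+----------
   1       -7.5      0.000      0.000
   2       -7.1      1.200      2.400
"""
for row in parse_vina_rows(log):
    print(row)
```

The separator line never matches because the first group requires digits, so only the two pose rows are returned.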
Pose processing
- class PoseCrawler(roots, *, engine=None, recursive=True, docked_suffix='_docked.pdbqt')
Bases: object
High-level helper for discovering, summarizing, converting, and loading docked poses.
This class provides a compact interface over the lower-level pose utilities for:
discovering pose files
building PoseRecord entries
converting records into DataFrames
loading RDKit molecules
selecting the best-scoring pose per group
converting discovered .pdbqt files to .sdf
Supported input layouts
A direct path to one .pdbqt file, with engine=....
A direct path to a flat folder of .pdbqt files, with engine=....
A higher-level ProDock tree such as <root>/<receptor>/results/docked/<engine>/*.pdbqt, where the receptor id and engine are inferred automatically.
Important
When a root is a directory, only files whose names end with docked_suffix (by default "_docked.pdbqt") are retained. This prevents receptor preparation files such as filtered_protein/4WKQ.pdbqt from being treated as docked ligand poses. Direct file inputs are not filtered by suffix, which preserves the original direct-file behavior.
- Parameters:
roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.
engine (Optional[str]) – Optional engine hint for direct-file or flat-directory inputs, or an optional filter for hierarchical ProDock trees.
recursive (bool) – Whether nested directories should be searched recursively.
docked_suffix (str) – Required filename suffix applied only to records discovered from directory roots. Default is "_docked.pdbqt".
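For the hierarchical layout <root>/<receptor>/results/docked/<engine>/*.pdbqt, the receptor id and engine can be read straight off the path components. The sketch below shows one way to do that; infer_pose_ids is hypothetical, and deriving the ligand id by stripping the docked suffix is an assumption about how ProDock names ligands.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def infer_pose_ids(pose_path: str, docked_suffix: str = "_docked.pdbqt") -> Optional[Tuple[str, str, str]]:
    """Return (receptor_id, engine, ligand_id) for <root>/<receptor>/results/docked/<engine>/<file>."""
    parts = PurePosixPath(pose_path).parts
    # The tree needs at least <receptor>/results/docked/<engine>/<file>.
    if len(parts) < 5 or parts[-4:-2] != ("results", "docked"):
        return None
    receptor_id, engine, filename = parts[-5], parts[-2], parts[-1]
    # Assumption: the ligand id is the file name without the docked suffix.
    if filename.endswith(docked_suffix):
        ligand_id = filename[: -len(docked_suffix)]
    else:
        ligand_id = PurePosixPath(filename).stem
    return receptor_id, engine, ligand_id


print(infer_pose_ids("Data/testcase/post/1M17/results/docked/vina/erlotinib_docked.pdbqt"))
# ('1M17', 'vina', 'erlotinib')
```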
Example
from prodock.postprocess.pose.core import PoseCrawler
crawler = PoseCrawler(["Data/testcase/post"])
df = crawler.crawl()                        # pose summary table
best_df = crawler.best()                    # best pose per group
mol_df = crawler.crawl_mols(save_sdf=True)  # poses plus RDKit molecules
best_mol_df = crawler.best_mols()
sdf_paths = crawler.convert(out_dir="Data/testcase/post/converted_sdf")
A direct single-file workflow is also supported:
crawler = PoseCrawler(
    ["Data/testcase/post/1M17/results/docked/vina/erlotinib_docked.pdbqt"],
    engine="vina",
)
df = crawler.crawl()
Common real input examples include:
"Data/testcase/post/1M17/results/docked/qvina/erlotinib_docked.pdbqt"
"Data/testcase/post/1M17/results/docked/smina/erlotinib_docked.pdbqt"
"Data/testcase/post/1M17/results/docked/vina/erlotinib_docked.pdbqt"
- records()
Return discovered pose records.
This method delegates to prodock.postprocess.pose.io.build_pose_records() using the crawler configuration captured at initialization, then filters directory-derived records so that only *_docked.pdbqt files are retained.
- Returns:
Discovered pose records.
- Return type:
list[prodock.postprocess.pose.model.PoseRecord]
Example
crawler = PoseCrawler(["Data/testcase/post"])
records = crawler.records()
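The suffix rule applied by records() is asymmetric: directory-derived hits must carry the docked suffix, while direct file inputs pass through untouched. A minimal sketch of that rule, where Candidate and keep_docked are hypothetical stand-ins for the real PoseRecord handling:

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a discovered pose file."""
    path: PurePosixPath
    from_directory: bool  # True when found by walking a directory root


def keep_docked(candidates: List[Candidate], docked_suffix: str = "_docked.pdbqt") -> List[Candidate]:
    """Direct file inputs always pass; directory hits must end with docked_suffix."""
    return [c for c in candidates if not c.from_directory or c.path.name.endswith(docked_suffix)]


found = [
    Candidate(PurePosixPath("post/1M17/results/docked/vina/erlotinib_docked.pdbqt"), True),
    Candidate(PurePosixPath("post/4WKQ/filtered_protein/4WKQ.pdbqt"), True),  # receptor file
    Candidate(PurePosixPath("my_pose.pdbqt"), False),                         # direct input
]
for cand in keep_docked(found):
    print(cand.path)
```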
- crawl()
Return discovered pose records as a standardized DataFrame.
The returned table uses the public pose schema:
receptor_id, ligand_id, engine, pose_rank, affinity
- Returns:
Pose summary table.
- Return type:
pandas.DataFrame
Example
crawler = PoseCrawler(["Data/testcase/post"])
df = crawler.crawl()
- crawl_mols(*, backend='auto', sanitize=True, remove_hs=False, save_sdf=False, overwrite_sdf=False)
Return a DataFrame containing pose metadata and RDKit molecules.
This method loads molecules from the discovered pose files and returns a standardized table with the public pose-plus-molecule schema:
receptor_id, ligand_id, engine, pose_rank, affinity, mol
For directory roots, only records whose source files end with *_docked.pdbqt are processed.
- Parameters:
backend (str) – Conversion backend used during PDBQT-to-SDF conversion.
sanitize (bool) – Whether imported RDKit molecules should be sanitized.
remove_hs (bool) – Whether hydrogens should be removed during SDF import.
save_sdf (bool) – Whether to also write an SDF file beside each source .pdbqt file.
overwrite_sdf (bool) – Whether an existing neighboring SDF file may be overwritten.
- Returns:
Pose table with RDKit molecules.
- Return type:
pandas.DataFrame
Example
crawler = PoseCrawler(
    ["Data/testcase/post/1M17/results/docked/smina/erlotinib_docked.pdbqt"],
    engine="smina",
)
mol_df = crawler.crawl_mols(save_sdf=True)
- best(*, by=('receptor_id', 'ligand_id', 'engine'))
Return best-scoring pose rows per group.
Lower affinity is treated as better. By default, one best row is selected for each (receptor_id, ligand_id, engine) group.
- Parameters:
by (Sequence[str]) – Grouping columns that define independent selection groups.
- Returns:
Best-scoring rows per group.
- Return type:
pandas.DataFrame
Example
crawler = PoseCrawler(["Data/testcase/post"])
best_df = crawler.best()
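"Lower affinity is better, one row per group" maps naturally onto a sort followed by de-duplication in pandas. The sketch below is one way to express it; best_per_group is hypothetical, the toy numbers are illustrative rather than real docking results, and the tie-breaking and NaN handling are choices of this sketch that the real best() may not share.

```python
from typing import Sequence

import pandas as pd


def best_per_group(df: pd.DataFrame,
                   by: Sequence[str] = ("receptor_id", "ligand_id", "engine")) -> pd.DataFrame:
    """Keep the lowest-affinity row of each group (more negative = better)."""
    # A stable sort keeps the original order among ties; NaN affinities sort last,
    # so they are chosen only when a group has no numeric affinity at all.
    ordered = df.sort_values("affinity", kind="mergesort", na_position="last")
    return ordered.drop_duplicates(subset=list(by), keep="first").reset_index(drop=True)


poses = pd.DataFrame({
    "receptor_id": ["1M17"] * 4,
    "ligand_id": ["erlotinib"] * 4,
    "engine": ["vina", "vina", "smina", "smina"],
    "pose_rank": [1, 2, 1, 2],
    "affinity": [-7.5, -7.1, -8.0, -8.2],
})
print(best_per_group(poses))
```

Note that the best pose is not necessarily pose_rank 1: in the toy smina group, rank 2 has the lower affinity and is the one kept.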
- best_mols(*, by=('receptor_id', 'ligand_id', 'engine'), backend='obabel', sanitize=True, remove_hs=False, save_sdf=False, overwrite_sdf=False)
Return best-scoring pose rows per group, including RDKit molecules.
This method first builds a pose-plus-molecule DataFrame and then applies best-pose selection on top of it.
- Parameters:
by (Sequence[str]) – Grouping columns that define independent selection groups.
backend (str) – Conversion backend used during PDBQT-to-SDF conversion.
sanitize (bool) – Whether imported RDKit molecules should be sanitized.
remove_hs (bool) – Whether hydrogens should be removed during SDF import.
save_sdf (bool) – Whether to also write an SDF file beside each source .pdbqt file.
overwrite_sdf (bool) – Whether an existing neighboring SDF file may be overwritten.
- Returns:
Best-scoring rows with molecule objects.
- Return type:
pandas.DataFrame
Example
crawler = PoseCrawler(["Data/testcase/post"])
best_mol_df = crawler.best_mols(save_sdf=False)
- convert(*, backend='obabel', overwrite=False, out_dir=None)
Convert discovered PDBQT pose files into SDF files.
When out_dir is omitted, each SDF file is written beside its source .pdbqt file. When out_dir is provided, converted files are written into that shared destination directory.
For directory roots, only files ending with *_docked.pdbqt are converted. Direct file inputs are not filtered by suffix.
- Parameters:
backend (str) – Conversion backend used for PDBQT-to-SDF conversion.
overwrite (bool) – Whether existing output files may be overwritten.
out_dir (Optional[str | pathlib.Path]) – Optional shared output directory. When omitted, SDF files are saved beside the source .pdbqt files.
- Returns:
Written or reused SDF paths.
- Return type:
Example
crawler = PoseCrawler(["Data/testcase/post"])
sdf_paths = crawler.convert(
    out_dir="Data/testcase/post/converted_sdf",
    overwrite=True,
)
- crawl_poses(roots, *, engine=None, recursive=True, docked_suffix='_docked.pdbqt')
Convenience wrapper around PoseCrawler.crawl().
- Parameters:
roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.
engine (Optional[str]) – Optional engine hint or filter.
recursive (bool) – Whether nested directories should be searched recursively.
docked_suffix (str) – Required filename suffix applied only to records discovered from directory roots.
- Returns:
Standardized pose summary table.
- Return type:
pandas.DataFrame
Example
df = crawl_poses(["Data/testcase/post"])
- crawl_pose_mols(roots, *, engine=None, recursive=True, docked_suffix='_docked.pdbqt', backend='obabel', sanitize=True, remove_hs=False, save_sdf=False, overwrite_sdf=False)
Convenience wrapper around PoseCrawler.crawl_mols().
- Parameters:
roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.
engine (str | None) – Optional engine hint or filter.
recursive (bool) – Whether nested directories should be searched recursively.
docked_suffix (str) – Required filename suffix applied only to records discovered from directory roots.
backend (str) – Conversion backend used during PDBQT-to-SDF conversion.
sanitize (bool) – Whether imported RDKit molecules should be sanitized.
remove_hs (bool) – Whether hydrogens should be removed during SDF import.
save_sdf (bool) – Whether to also write an SDF file beside each source .pdbqt file.
overwrite_sdf (bool) – Whether an existing neighboring SDF file may be overwritten.
- Returns:
Standardized pose-plus-molecule table.
- Return type:
pandas.DataFrame
Example
mol_df = crawl_pose_mols(
    ["Data/testcase/post/1M17/results/docked/qvina/erlotinib_docked.pdbqt"],
    engine="qvina",
    save_sdf=True,
)
- select_best_poses(roots, *, engine=None, recursive=True, docked_suffix='_docked.pdbqt', by=('receptor_id', 'ligand_id', 'engine'))
Convenience wrapper around PoseCrawler.best().
- Parameters:
roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.
engine (Optional[str]) – Optional engine hint or filter.
recursive (bool) – Whether nested directories should be searched recursively.
docked_suffix (str) – Required filename suffix applied only to records discovered from directory roots.
by (Sequence[str]) – Grouping columns that define independent selection groups.
- Returns:
Best-scoring pose rows per group.
- Return type:
pandas.DataFrame
Example
best_df = select_best_poses(["Data/testcase/post"])
- select_best_pose_mols(roots, *, engine=None, recursive=True, docked_suffix='_docked.pdbqt', by=('receptor_id', 'ligand_id', 'engine'), backend='obabel', sanitize=True, remove_hs=False, save_sdf=False, overwrite_sdf=False)
Convenience wrapper around PoseCrawler.best_mols().
- Parameters:
roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.
engine (Optional[str]) – Optional engine hint or filter.
recursive (bool) – Whether nested directories should be searched recursively.
docked_suffix (str) – Required filename suffix applied only to records discovered from directory roots.
by (Sequence[str]) – Grouping columns that define independent selection groups.
backend (str) – Conversion backend used during PDBQT-to-SDF conversion.
sanitize (bool) – Whether imported RDKit molecules should be sanitized.
remove_hs (bool) – Whether hydrogens should be removed during SDF import.
save_sdf (bool) – Whether to also write an SDF file beside each source .pdbqt file.
overwrite_sdf (bool) – Whether an existing neighboring SDF file may be overwritten.
- Returns:
Best-scoring pose rows with molecule objects.
- Return type:
pandas.DataFrame
Example
best_mol_df = select_best_pose_mols(
    ["Data/testcase/post"],
    save_sdf=False,
)
- save_pose_sdf(pdbqt_file, *, backend='obabel', overwrite=False, out_file=None)
Convert a docked .pdbqt pose file to .sdf and save it on disk.
By default, the output SDF is written next to the input file using the same file stem. An explicit output path may also be supplied via out_file.
If the destination file already exists and overwrite is False, the existing path is returned without performing a new conversion.
- Parameters:
pdbqt_file (str | pathlib.Path) – Input docked .pdbqt file.
backend (str) – Conversion backend passed to prodock.structure.conversion.pdbqt_to_sdf().
overwrite (bool) – Whether an existing output file may be overwritten.
out_file (Optional[str | pathlib.Path]) – Optional explicit destination path. When omitted, the output file is created beside the input file with suffix .sdf.
- Returns:
Path to the written or reused SDF file.
- Return type:
Example
sdf_path = save_pose_sdf(
    "Data/testcase/post/1M17/results/docked/qvina/erlotinib_docked.pdbqt",
    backend="obabel",
    overwrite=True,
)
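The reuse-or-convert behavior described above (default destination beside the input, early return when the file exists and overwrite is False) can be sketched without any chemistry toolkit by injecting the converter. save_sdf_sketch and fake_convert are hypothetical; the real function calls prodock.structure.conversion.pdbqt_to_sdf() instead.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional, Union

PathLike = Union[str, Path]


def save_sdf_sketch(pdbqt_file: PathLike, convert: Callable[[Path, Path], None], *,
                    overwrite: bool = False, out_file: Optional[PathLike] = None) -> Path:
    """Hypothetical reuse-or-convert logic; convert(src, dst) stands in for the real backend."""
    src = Path(pdbqt_file)
    # Default destination: same folder and stem, suffix swapped to .sdf
    dst = Path(out_file) if out_file is not None else src.with_suffix(".sdf")
    if dst.exists() and not overwrite:
        return dst  # reuse the existing file without converting again
    convert(src, dst)
    return dst


calls = []


def fake_convert(src: Path, dst: Path) -> None:
    calls.append(src.name)
    dst.write_text(f"placeholder SDF for {src.name}\n")


with tempfile.TemporaryDirectory() as tmp:
    pose = Path(tmp) / "erlotinib_docked.pdbqt"
    pose.write_text("placeholder\n")
    first = save_sdf_sketch(pose, fake_convert)
    second = save_sdf_sketch(pose, fake_convert)  # reused: converter not called again
    print(first.name, first == second, len(calls))  # erlotinib_docked.sdf True 1
```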
- pdbqt_to_rdkit_mols(pdbqt_file, *, backend='auto', sanitize=True, remove_hs=False)
Convert a docked .pdbqt file into RDKit molecules via a temporary SDF.
The function first converts the input .pdbqt file into a temporary SDF file using prodock.structure.conversion.pdbqt_to_sdf(), then loads the molecules with rdkit.Chem.SDMolSupplier.
Invalid molecules returned as None by the supplier are discarded.
- Parameters:
pdbqt_file (str | pathlib.Path) – Input docked .pdbqt file.
backend (str) – Conversion backend passed to the PDBQT-to-SDF converter.
sanitize (bool) – Whether RDKit sanitization should be applied while reading the temporary SDF.
remove_hs (bool) – Whether hydrogens should be removed during SDF import.
- Returns:
List of successfully loaded RDKit molecule objects.
- Return type:
list[rdkit.Chem.Mol]
Example
mols = pdbqt_to_rdkit_mols(
    "Data/testcase/post/1M17/results/docked/smina/erlotinib_docked.pdbqt",
    sanitize=True,
    remove_hs=False,
)
- convert_pose_tree(roots, *, engine=None, recursive=True, backend='obabel', overwrite=False, out_dir=None)
Convert discovered .pdbqt pose files into .sdf files.
When out_dir is omitted, each output SDF is written beside its source .pdbqt file. When out_dir is provided, all SDF files are written into that shared destination directory.
If multiple input files share the same stem and a shared out_dir is used, unique filenames are generated by appending suffixes such as _2, _3, and so on.
- Parameters:
roots (Sequence[str | pathlib.Path]) – Root files or directories to inspect.
engine (Optional[str]) – Optional engine filter applied to the parent directory name of discovered pose files.
recursive (bool) – Whether to recurse into nested directories during file discovery.
backend (str) – Conversion backend passed to the underlying PDBQT-to-SDF converter.
overwrite (bool) – Whether existing SDF files may be overwritten.
out_dir (Optional[str | pathlib.Path]) – Optional shared output directory for all converted SDF files.
- Returns:
Paths to written or reused SDF files.
- Return type:
Example
outputs = convert_pose_tree(
    ["Data/testcase/post/1M17/results/docked"],
    engine="vina",
    recursive=True,
    out_dir="Data/testcase/post/converted_sdf",
)
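Identical stems are common in a ProDock tree, because the same ligand is docked by several engines. The sketch below plans destination paths following the naming rules described above; plan_outputs is hypothetical, and unlike the real function it only tracks names within one call, not files already present on disk.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Optional, Set


def plan_outputs(sources: Iterable[str], out_dir: Optional[str] = None) -> List[PurePosixPath]:
    """Hypothetical planner for SDF destinations (beside source, or shared dir with _2, _3, ...)."""
    planned: List[PurePosixPath] = []
    used: Set[str] = set()
    for src in sources:
        path = PurePosixPath(src)
        if out_dir is None:
            planned.append(path.with_suffix(".sdf"))  # beside the source file
            continue
        # Shared directory: the first file keeps its stem, clashes get _2, _3, ...
        name, n = path.stem, 1
        while name in used:
            n += 1
            name = f"{path.stem}_{n}"
        used.add(name)
        planned.append(PurePosixPath(out_dir) / f"{name}.sdf")
    return planned


sources = [f"post/1M17/results/docked/{e}/erlotinib_docked.pdbqt" for e in ("qvina", "smina", "vina")]
for out in plan_outputs(sources, out_dir="post/converted_sdf"):
    print(out)
```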
Interaction analysis
- class InteractionProfiler(interactions=None, parameters=None, count=False, vicinity_cutoff=6.0, receptor_selection=None, receptor_use_segid=None, ligand_resname='LIG', ligand_resnumber=1, ligand_chain='', ligand_use_segid=False, sdf_sanitize=True, receptor_guess_bonds=True, receptor_vdwradii=None, suppress_mdanalysis_warnings=True, suppress_mdanalysis_info_logs=True, progress=False, n_jobs=1, drop_empty=True)
Bases: object
High-level helper for protein-ligand interaction extraction using ProLIF.
This class stores all interaction-analysis settings in one place and exposes two main execution methods:
run() for one receptor plus one ligand source
run_pose_table() for automated pose-table workflows across one or multiple receptors
The class is designed for ProDock automation and notebook workflows where reproducible settings, pose-level summaries, and optional fingerprint vectors are useful.
- Parameters:
interactions (Optional[Sequence[str]]) – Optional subset of ProLIF interaction names to enable. If None, ProLIF defaults are used.
parameters (Optional[Dict[str, Dict[str, Any]]]) – Optional parameter overrides passed directly to ProLIF interaction definitions.
count (bool) – Whether to generate count fingerprints instead of boolean fingerprints.
vicinity_cutoff (float) – Distance cutoff used by ProLIF when automatically selecting nearby receptor residues.
receptor_selection (Optional[str]) – Optional MDAnalysis selection string for the receptor. If None, all atoms from the receptor structure are used.
receptor_use_segid (Optional[bool]) – Whether ProLIF should use the segment id instead of the chain id for receptor residue identifiers.
ligand_resname (str) – Default ligand residue name used when an RDKit molecule has no residue metadata.
ligand_resnumber (int) – Default ligand residue number used when an RDKit molecule has no residue metadata.
ligand_chain (str) – Default ligand chain id used when an RDKit molecule has no residue metadata.
ligand_use_segid (bool) – Whether ProLIF should use segment id instead of chain id for ligands.
sdf_sanitize (bool) – Whether RDKit should sanitize molecules when reading an SDF file.
receptor_guess_bonds (bool) – Whether to proactively guess receptor bond topology before ProLIF converts the receptor to RDKit.
receptor_vdwradii (Optional[Mapping[str, float]]) – Optional VdW radii mapping forwarded to MDAnalysis bond guessing.
suppress_mdanalysis_warnings (bool) – Whether to suppress known non-actionable MDAnalysis warnings.
suppress_mdanalysis_info_logs (bool) – Whether to suppress repeated MDAnalysis info log messages.
progress (bool) – Whether ProLIF should show a progress bar.
n_jobs (Optional[int]) – Number of parallel jobs used by ProLIF.
drop_empty (bool) – Whether to drop empty columns in the wide fingerprint table.
Example
Create a profiler and run interaction extraction for one SDF file:
from prodock.postprocess.interaction.core import InteractionProfiler
profiler = InteractionProfiler(
    count=False,
    vicinity_cutoff=6.0,
    progress=False,
    n_jobs=1,
)
result = profiler.run(
    receptor_pdb="Data/testcase/Multi/1M17/filtered_protein/1M17.pdb",
    ligands="Data/testcase/post/1M17/erlotinib.sdf",
)
print(result.fingerprint_df.head())
print(result.interaction_df.head())
Example
Run pose-table automation for multiple receptors:
# df: a pose table with an RDKit "mol" column, e.g. from PoseCrawler.crawl_mols()
profiler = InteractionProfiler(progress=False, n_jobs=1)
result = profiler.run_pose_table(
    poses=df,
    receptor_pdb_by_id={
        "1M17": "Data/testcase/Multi/1M17/filtered_protein/1M17.pdb",
        "4WKQ": "Data/testcase/Multi/4WKQ/filtered_protein/4WKQ.pdb",
    },
    batch_size=10,
    ultra_safe=False,  # batch_size only takes effect when ultra_safe is False
    include_interaction_events=True,
    include_bitvectors=False,
    include_countvectors=False,
    fail_fast=True,
)
merged_df = result.merged_df
interaction_df = result.interaction_df
summary_df = result.summary_df
- available_interactions(show_hidden=False, show_bridged=False)
List interactions available in the installed ProLIF version.
- settings_snapshot()
Return a serializable snapshot of the current profiler settings.
- Returns:
Serializable dictionary of profiler settings.
- Return type:
Dict[str, Any]
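A serializable settings snapshot is mainly useful for provenance: it can be stored next to results and used to rebuild an identically configured profiler. The sketch below shows the idea with a dataclass covering a few of the constructor options; ProfilerSettings is hypothetical and not how InteractionProfiler stores its settings.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ProfilerSettings:
    """Hypothetical container for a few of the InteractionProfiler options."""
    interactions: Optional[List[str]] = None
    parameters: Optional[Dict[str, Dict[str, Any]]] = None
    count: bool = False
    vicinity_cutoff: float = 6.0
    ligand_resname: str = "LIG"
    n_jobs: Optional[int] = 1

    def snapshot(self) -> Dict[str, Any]:
        # asdict() builds new lists/dicts, so mutating the snapshot leaves the settings untouched.
        return asdict(self)


settings = ProfilerSettings(interactions=["HBDonor", "HBAcceptor", "Hydrophobic"], count=True)
text = json.dumps(settings.snapshot(), sort_keys=True)  # e.g. stored next to results for provenance
restored = ProfilerSettings(**json.loads(text))
print(restored == settings)  # True
```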
- run(receptor_pdb, ligands, residues=None)
Extract protein-ligand interactions for one receptor and one ligand input source.
- Parameters:
receptor_pdb (str | pathlib.Path) – Path to the receptor PDB file.
ligands (str | pathlib.Path | Any | Sequence[Any] | Iterable[Any] | Mapping[str, Any]) – Ligand input source.
residues (Optional[Sequence[str] | str]) – Optional residue subset passed to Fingerprint.run_from_iterable.
- Returns:
Structured interaction extraction result.
- Return type:
InteractionRunResult
- run_pose_table(poses, receptor_pdb_by_id, *, receptor_col='receptor_id', ligand_col='ligand_id', engine_col='engine', pose_rank_col='pose_rank', affinity_col='affinity', mol_col='mol', pose_id_col=None, residues=None, batch_size=1, include_fingerprint_columns=False, include_interaction_events=True, include_bitvectors=False, include_countvectors=False, fingerprint_prefix='ifp__', gc_collect=True, fail_fast=True, ultra_safe=True)
Compute pose-centric interactions for a pose table.
- Parameters:
poses (pandas.DataFrame) – Input pose table with at least receptor, ligand, engine, rank, affinity, and molecule columns.
receptor_pdb_by_id (Mapping[str, str | pathlib.Path]) – Mapping from receptor id to receptor PDB path.
receptor_col (str) – Column containing receptor identifiers.
ligand_col (str) – Column containing ligand identifiers.
engine_col (str) – Column containing engine identifiers.
pose_rank_col (str) – Column containing pose rank.
affinity_col (str) – Column containing affinity score.
mol_col (str) – Column containing RDKit molecules.
pose_id_col (Optional[str]) – Optional pre-existing pose id column.
residues (Optional[Sequence[str] | str]) – Optional ProLIF residue subset.
batch_size (int) – Number of poses to process together when ultra_safe is False.
include_fingerprint_columns (bool) – Retained for API compatibility.
include_interaction_events (bool) – Whether to compute and store raw event payloads.
include_bitvectors (bool) – Whether to collect ProLIF bitvectors aligned to pose order.
include_countvectors (bool) – Whether to collect ProLIF countvectors aligned to pose order.
fingerprint_prefix (str) – Retained for API compatibility.
gc_collect (bool) – Whether to call garbage collection between batches.
fail_fast (bool) – Whether to stop immediately on the first failing batch.
ultra_safe (bool) – Whether to force one-pose-at-a-time processing.
- Returns:
Pose-centric interaction result.
- Return type:
PoseInteractionTableResult
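Three of the options above interact: ultra_safe overrides batch_size, fail_fast decides whether a failing batch aborts the run, and gc_collect triggers garbage collection between batches. A minimal driver illustrating that interplay, assuming a generic per-batch work function; iter_pose_batches and run_batches are hypothetical, and recording failures when fail_fast is False is an assumption about the non-fail-fast behavior.

```python
import gc
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_pose_batches(poses: Sequence[T], batch_size: int = 1, ultra_safe: bool = True) -> Iterator[List[T]]:
    """Yield consecutive batches; ultra_safe forces one pose per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    size = 1 if ultra_safe else batch_size  # batch_size only matters when ultra_safe is False
    for start in range(0, len(poses), size):
        yield list(poses[start:start + size])


def run_batches(poses: Sequence[T], work: Callable[[List[T]], List[R]], *, batch_size: int = 1,
                ultra_safe: bool = True, fail_fast: bool = True,
                gc_collect: bool = True) -> Tuple[List[R], List[Tuple[List[T], Exception]]]:
    """Hypothetical driver: collect results, and either stop or record failing batches."""
    results: List[R] = []
    failures: List[Tuple[List[T], Exception]] = []
    for batch in iter_pose_batches(poses, batch_size, ultra_safe):
        try:
            results.extend(work(batch))
        except Exception as exc:
            if fail_fast:
                raise  # stop on the first failing batch
            failures.append((batch, exc))  # assumption: otherwise keep going
        if gc_collect:
            gc.collect()  # free memory between batches
    return results, failures


def work(batch: List[int]) -> List[int]:
    if 3 in batch:
        raise RuntimeError("bad pose")
    return [pose * 10 for pose in batch]


results, failures = run_batches(list(range(5)), work, batch_size=2, ultra_safe=False, fail_fast=False)
print(results, [batch for batch, _ in failures])  # [0, 10, 40] [[2, 3]]
```

With ultra_safe=True (the default), the same call would process five single-pose batches and lose only pose 3.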
- extract_interactions(receptor_pdb, ligands, *, interactions=None, parameters=None, count=False, vicinity_cutoff=6.0, receptor_selection=None, receptor_use_segid=None, ligand_resname='LIG', ligand_resnumber=1, ligand_chain='', ligand_use_segid=False, sdf_sanitize=True, receptor_guess_bonds=True, receptor_vdwradii=None, suppress_mdanalysis_warnings=True, suppress_mdanalysis_info_logs=True, progress=False, n_jobs=1, residues=None, drop_empty=True)
Convenience wrapper around InteractionProfiler for single-run extraction.
- Returns:
Structured single-run interaction result.
- Return type:
InteractionRunResult
- Parameters:
ligands (str | Path | Any | Sequence[Any] | Iterable[Any] | Mapping[str, Any])
count (bool)
vicinity_cutoff (float)
receptor_selection (str | None)
receptor_use_segid (bool | None)
ligand_resname (str)
ligand_resnumber (int)
ligand_chain (str)
ligand_use_segid (bool)
sdf_sanitize (bool)
receptor_guess_bonds (bool)
suppress_mdanalysis_warnings (bool)
suppress_mdanalysis_info_logs (bool)
progress (bool)
n_jobs (int | None)
drop_empty (bool)
- extract_pose_table_interactions(poses, receptor_pdb_by_id, *, interactions=None, parameters=None, count=False, vicinity_cutoff=6.0, receptor_selection=None, receptor_use_segid=None, ligand_resname='LIG', ligand_resnumber=1, ligand_chain='', ligand_use_segid=False, sdf_sanitize=True, receptor_guess_bonds=True, receptor_vdwradii=None, suppress_mdanalysis_warnings=True, suppress_mdanalysis_info_logs=True, progress=False, n_jobs=1, receptor_col='receptor_id', ligand_col='ligand_id', engine_col='engine', pose_rank_col='pose_rank', affinity_col='affinity', mol_col='mol', pose_id_col=None, residues=None, batch_size=1, include_fingerprint_columns=False, include_interaction_events=True, include_bitvectors=False, include_countvectors=False, fingerprint_prefix='ifp__', gc_collect=True, fail_fast=True, ultra_safe=True, drop_empty=True)
Convenience wrapper for automated pose-table interaction extraction.
- Returns:
Pose-centric interaction result containing merged_df, interaction_df, and summary_df.
- Return type:
PoseInteractionTableResult
- Parameters:
poses (pandas.DataFrame)
count (bool)
vicinity_cutoff (float)
receptor_selection (str | None)
receptor_use_segid (bool | None)
ligand_resname (str)
ligand_resnumber (int)
ligand_chain (str)
ligand_use_segid (bool)
sdf_sanitize (bool)
receptor_guess_bonds (bool)
suppress_mdanalysis_warnings (bool)
suppress_mdanalysis_info_logs (bool)
progress (bool)
n_jobs (int | None)
receptor_col (str)
ligand_col (str)
engine_col (str)
pose_rank_col (str)
affinity_col (str)
mol_col (str)
pose_id_col (str | None)
batch_size (int)
include_fingerprint_columns (bool)
include_interaction_events (bool)
include_bitvectors (bool)
include_countvectors (bool)
fingerprint_prefix (str)
gc_collect (bool)
fail_fast (bool)
ultra_safe (bool)
drop_empty (bool)
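The column-name parameters above indicate which columns `poses` is expected to carry, and each receptor ID in it presumably has to be resolvable through `receptor_pdb_by_id`. The sketch below is a hypothetical pre-flight check, `check_pose_table`, which is not part of this API. It assumes `receptor_pdb_by_id` is a plain mapping from receptor ID to PDB path and uses the default column names from the signature; the `egfr` receptor, its path, and the affinity values are toy placeholders.

```python
from collections.abc import Mapping

import pandas as pd

# Default column names taken from the extract_pose_table_interactions signature.
DEFAULT_POSE_COLUMNS = {
    "receptor_col": "receptor_id",
    "ligand_col": "ligand_id",
    "engine_col": "engine",
    "pose_rank_col": "pose_rank",
    "affinity_col": "affinity",
    "mol_col": "mol",
}


def check_pose_table(poses: pd.DataFrame, receptor_pdb_by_id: Mapping[str, str],
                     **column_overrides: str) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    columns = {**DEFAULT_POSE_COLUMNS, **column_overrides}
    problems = [
        f"missing column {name!r} (from {param})"
        for param, name in columns.items()
        if name not in poses.columns
    ]
    receptor_col = columns["receptor_col"]
    if receptor_col in poses.columns:
        # Every receptor referenced by a pose needs an entry in the PDB mapping.
        unknown = sorted(set(poses[receptor_col]) - set(receptor_pdb_by_id))
        problems.extend(f"no PDB for receptor {rid!r}" for rid in unknown)
    return problems


poses = pd.DataFrame({  # toy values for illustration
    "receptor_id": ["egfr", "egfr"],
    "ligand_id": ["erlotinib", "erlotinib"],
    "engine": ["vina", "vina"],
    "pose_rank": [1, 2],
    "affinity": [-9.1, -8.7],
    "mol": [None, None],  # RDKit Mol objects in real use; placeholders here
})
print(check_pose_table(poses, {"egfr": "receptors/egfr.pdb"}))
```

Custom column names can be checked the same way, e.g. `check_pose_table(df, pdbs, affinity_col="score")`, mirroring the keyword overrides the real function accepts.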
- tanimoto_similarity_matrix(vectors, names=None)
Compute a pairwise Tanimoto similarity matrix.
This function computes the all-against-all Tanimoto similarity between fingerprint vectors and returns the result as a square pandas.DataFrame. It is suitable for RDKit explicit bit vectors as well as sparse count vectors, which makes it useful for both boolean and count-based ProLIF fingerprints.
When names is not provided, default labels of the form mol_0000, mol_0001, and so on are generated automatically.
- Parameters:
vectors (Sequence[Any]) – Sequence of RDKit-compatible fingerprint vectors. Each element should be accepted by rdkit.DataStructs.TanimotoSimilarity().
names (Sequence[str] | None) – Optional labels used for both the row index and the column names of the returned matrix. When omitted, default molecule labels are generated from the vector order.
- Returns:
Square dataframe whose (i, j) entry contains the Tanimoto similarity between vectors[i] and vectors[j].
- Return type:
pandas.DataFrame
- Raises:
MissingDependencyError – Raised when RDKit is not installed or cannot be imported.
Example

    sim = tanimoto_similarity_matrix(
        vectors=result.bitvectors,
        names=result.molecule_names,
    )
    print(sim.iloc[:5, :5])
Another Example

    sim = tanimoto_similarity_matrix(vectors)
    assert sim.shape == (len(vectors), len(vectors))
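To make the similarity definition concrete, here is a pure-Python sketch. It is neither the library implementation nor RDKit. It assumes bit vectors are given as sets of on-bit indices and count vectors as `{bit: count}` dicts, and it mirrors the documented `mol_0000`-style default labels. For bit vectors the Tanimoto coefficient is |A ∩ B| / (|A| + |B| − |A ∩ B|); for count vectors the sketch uses the common generalization Σ min(aᵢ, bᵢ) / (Σ aᵢ + Σ bᵢ − Σ min(aᵢ, bᵢ)). The helper names `tanimoto` and `tanimoto_matrix` are hypothetical.

```python
import pandas as pd


def tanimoto(a, b) -> float:
    """Tanimoto similarity for bit sets (on-bit indices) or count dicts."""
    if isinstance(a, dict):
        # Count vectors: sum of minima over (sum(a) + sum(b) - sum of minima).
        common = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
        total = sum(a.values()) + sum(b.values())
    else:
        # Bit vectors: |A & B| / (|A| + |B| - |A & B|).
        common = len(a & b)
        total = len(a) + len(b)
    denom = total - common
    return common / denom if denom else 0.0  # sketch convention: two empty vectors -> 0.0


def tanimoto_matrix(vectors, names=None) -> pd.DataFrame:
    """All-against-all similarity as a square, labelled DataFrame."""
    if names is None:
        names = [f"mol_{i:04d}" for i in range(len(vectors))]
    names = list(names)
    rows = [[tanimoto(u, v) for v in vectors] for u in vectors]
    return pd.DataFrame(rows, index=names, columns=names)


bits = [{1, 2, 3}, {2, 3, 4}, {7}]
print(tanimoto_matrix(bits).round(2))
```

The all-pairs loop is quadratic in the number of vectors, which is also why the real function returns a dense square matrix rather than a pair list.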
- class JournalStyle(
- name,
- palette,
- heatmap_cmap='cividis',
- background='white',
- panel_facecolor='white',
- grid_color='#d9dde3',
- spine_color='#4a4f57',
- text_color='#222222',
- title_size=10.5,
- label_size=9.0,
- tick_size=8.0,
- panel_label_size=11.0,
- line_width=0.8,
- grid_alpha=0.35,
- histogram_alpha=0.85,
- scatter_alpha=0.85,
- marker_size=20.0,
Bases: object
Visual style configuration for publication-style figures.
- Parameters:
name (str)
heatmap_cmap (str)
background (str)
panel_facecolor (str)
grid_color (str)
spine_color (str)
text_color (str)
title_size (float)
label_size (float)
tick_size (float)
panel_label_size (float)
line_width (float)
grid_alpha (float)
histogram_alpha (float)
scatter_alpha (float)
marker_size (float)
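The JournalStyle fields correspond closely to ordinary Matplotlib styling options. The sketch below shows one possible mapping onto Matplotlib `rcParams` keys. `StyleSketch` is a hypothetical stand-in that copies a subset of the documented defaults (omitting `name`, `palette`, and the alpha/marker fields), and `to_rcparams` is an assumed mapping, not how this library actually applies a style.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleSketch:
    """Hypothetical stand-in holding a subset of JournalStyle's fields and defaults."""
    heatmap_cmap: str = "cividis"
    background: str = "white"
    panel_facecolor: str = "white"
    grid_color: str = "#d9dde3"
    spine_color: str = "#4a4f57"
    text_color: str = "#222222"
    title_size: float = 10.5
    label_size: float = 9.0
    tick_size: float = 8.0
    line_width: float = 0.8
    grid_alpha: float = 0.35


def to_rcparams(style: StyleSketch) -> dict:
    """Map style fields onto Matplotlib rcParams keys (an assumed mapping)."""
    return {
        "figure.facecolor": style.background,
        "savefig.facecolor": style.background,
        "axes.facecolor": style.panel_facecolor,
        "axes.edgecolor": style.spine_color,   # spines
        "axes.linewidth": style.line_width,
        "axes.labelcolor": style.text_color,
        "text.color": style.text_color,
        "xtick.color": style.text_color,
        "ytick.color": style.text_color,
        "axes.titlesize": style.title_size,
        "axes.labelsize": style.label_size,
        "xtick.labelsize": style.tick_size,
        "ytick.labelsize": style.tick_size,
        "grid.color": style.grid_color,
        "grid.alpha": style.grid_alpha,
        "image.cmap": style.heatmap_cmap,      # default colormap for heatmaps
    }


params = to_rcparams(StyleSketch())
# With Matplotlib installed, the dict could be applied via matplotlib.rcParams.update(params).
```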
- build_pose_visualization_table(result)
Build a merged pose-level dataframe convenient for plotting.
- Parameters:
result (PoseInteractionTableResult)
- Return type:
pandas.DataFrame
- make_affinity_histogram(
- result,
- *,
- bins=20,
- group_by=None,
- figsize=(3.35, 2.6),
- title='Affinity distribution',
- xlabel='Affinity',
- ylabel='Count',
- style='nature',
- save_affinity_histogram(result, output_path, **kwargs)
- make_best_pose_bar(
- result,
- *,
- group_cols=('receptor_id', 'ligand_id', 'engine'),
- figsize=(4.5, 2.8),
- title='Best pose per group',
- xlabel='Group',
- ylabel='Best affinity',
- style='nature',
- save_best_pose_bar(result, output_path, **kwargs)
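make_best_pose_bar reduces the poses to one value per group_cols group. The pandas sketch below performs that reduction under two assumptions that are not documented behavior of this function: the default column names are used, and "best" means the most negative affinity, as with Vina-style scores in kcal/mol. The helper `best_pose_per_group` and the toy table are hypothetical.

```python
import pandas as pd


def best_pose_per_group(poses: pd.DataFrame,
                        group_cols=("receptor_id", "ligand_id", "engine"),
                        affinity_col="affinity") -> pd.DataFrame:
    """Keep the lowest-affinity row per group (assumes lower is better)."""
    idx = poses.groupby(list(group_cols))[affinity_col].idxmin()
    return poses.loc[idx].sort_values(affinity_col).reset_index(drop=True)


poses = pd.DataFrame({  # toy values for illustration
    "receptor_id": ["r1", "r1", "r1", "r2"],
    "ligand_id": ["l1", "l1", "l1", "l1"],
    "engine": ["vina", "vina", "smina", "vina"],
    "pose_rank": [1, 2, 1, 1],
    "affinity": [-8.0, -7.5, -8.4, -6.9],
})
print(best_pose_per_group(poses))
```

Using `idxmin` keeps the whole winning row (including `pose_rank`) rather than only the minimum value, which is convenient when the bar labels need the pose identity.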
- make_interaction_type_bar(
- result,
- *,
- top_n=10,
- normalize=False,
- figsize=(3.35, 2.6),
- title='Interaction type frequency',
- xlabel='Type',
- ylabel=None,
- style='nature',
- save_interaction_type_bar(result, output_path, **kwargs)
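The top_n and normalize arguments suggest a frequency count over interaction types. The following is a minimal pandas sketch. It assumes a long-format interaction table with one row per detected interaction and a hypothetical `interaction_type` column. The source does not say what the library normalizes by, so this sketch simply divides by the number of interaction rows.

```python
import pandas as pd


def interaction_type_frequency(interactions: pd.DataFrame, top_n=10, normalize=False,
                               type_col="interaction_type") -> pd.Series:
    """Count (or take fractions of) interaction types, most frequent first."""
    counts = interactions[type_col].value_counts(normalize=normalize)
    return counts.head(top_n)


interactions = pd.DataFrame({"interaction_type": [  # toy values for illustration
    "Hydrophobic", "Hydrophobic", "HBDonor", "Hydrophobic", "PiStacking", "HBDonor",
]})
print(interaction_type_frequency(interactions, top_n=2))
print(interaction_type_frequency(interactions, normalize=True))
```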
- make_residue_contact_bar(
- result,
- *,
- interaction_type=None,
- top_n=15,
- normalize=False,
- figsize=(4.0, 2.6),
- title='Residue contact frequency',
- xlabel='Residue',
- ylabel=None,
- style='nature',
- save_residue_contact_bar(result, output_path, **kwargs)
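Residue contact frequency differs from the type count mainly by the optional interaction_type filter. The sketch below assumes the same long-format table as before, with hypothetical `residue` and `interaction_type` columns. It restricts the rows to one interaction type when requested and then counts contacts per residue. The residue labels are toy values.

```python
import pandas as pd


def residue_contact_frequency(interactions: pd.DataFrame, interaction_type=None,
                              top_n=15, normalize=False,
                              residue_col="residue", type_col="interaction_type") -> pd.Series:
    """Count residue contacts, optionally restricted to one interaction type."""
    if interaction_type is not None:
        interactions = interactions[interactions[type_col] == interaction_type]
    return interactions[residue_col].value_counts(normalize=normalize).head(top_n)


interactions = pd.DataFrame({  # toy values for illustration
    "residue": ["LEU10", "LEU10", "MET20", "THR30", "VAL40"],
    "interaction_type": ["Hydrophobic", "Hydrophobic", "HBAcceptor", "Hydrophobic", "Hydrophobic"],
})
print(residue_contact_frequency(interactions, interaction_type="Hydrophobic"))
```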
- make_interaction_count_histogram(
- result,
- *,
- bins=20,
- count_kind='compact',
- group_by=None,
- figsize=(3.35, 2.6),
- title='Interaction count distribution',
- xlabel=None,
- ylabel='Count',
- style='nature',
- save_interaction_count_histogram(result, output_path, **kwargs)
- make_affinity_vs_interaction_count_scatter(
- result,
- *,
- count_kind='compact',
- group_by=None,
- figsize=(3.35, 2.6),
- title='Affinity vs interaction count',
- xlabel='Affinity',
- ylabel=None,
- style='nature',
- save_affinity_vs_interaction_count_scatter(result, output_path, **kwargs)
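The scatter plot pairs each pose's affinity with the number of interactions it makes. The sketch below builds that pose-level table under the assumption that a pose table and a long interaction table (one row per interaction) share a hypothetical `pose_id` key. It counts every interaction row per pose, which may not match any of the library's count_kind options.

```python
import pandas as pd


def affinity_vs_count(poses: pd.DataFrame, interactions: pd.DataFrame,
                      key="pose_id", affinity_col="affinity") -> pd.DataFrame:
    """One row per pose: affinity plus number of interaction rows (0 if none)."""
    counts = interactions.groupby(key).size()
    table = poses[[key, affinity_col]].copy()
    # Poses with no recorded interactions map to NaN, so fill them with 0.
    table["n_interactions"] = table[key].map(counts).fillna(0).astype(int)
    return table


poses = pd.DataFrame({"pose_id": ["p1", "p2", "p3"], "affinity": [-9.0, -7.2, -6.1]})  # toy values
interactions = pd.DataFrame({"pose_id": ["p1", "p1", "p1", "p2"]})
table = affinity_vs_count(poses, interactions)
print(table)
print(table["affinity"].corr(table["n_interactions"]))  # Pearson correlation
```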
- plot_similarity_heatmap(
- result,
- *,
- figsize=(4.2, 3.5),
- annotate=False,
- title='Fingerprint similarity',
- xlabel='Pose',
- ylabel='Pose',
- vmin=0.0,
- vmax=1.0,
- style='nature',
- max_labels=30,
- save_similarity_heatmap(result, output_path, *, dpi=300, **kwargs)
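max_labels=30 caps how many pose names are shown along each heatmap axis. One plausible way to do this, shown here as a sketch and not taken from the library, is to keep every k-th label with k = ⌈n / max_labels⌉ and blank the rest, so that the tick count still matches the matrix size. The helper `thin_labels` is hypothetical.

```python
import math


def thin_labels(labels, max_labels=30):
    """Blank all but every k-th label, with k = ceil(n / max_labels)."""
    if max_labels < 1:
        raise ValueError("max_labels must be >= 1")
    n = len(labels)
    if n <= max_labels:
        return list(labels)
    step = math.ceil(n / max_labels)
    return [lab if i % step == 0 else "" for i, lab in enumerate(labels)]


names = [f"mol_{i:04d}" for i in range(100)]
shown = [lab for lab in thin_labels(names) if lab]
print(len(shown), shown[:3])
```

Because k is rounded up, at most max_labels labels survive. With 100 poses and the default of 30, every 4th label is kept.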
- make_summary_panel_2x3(
- result,
- *,
- style='nature',
- figsize=(10.5, 6.8),
- top_n_types=8,
- top_n_residues=12,
- residue_interaction_type='Hydrophobic',
- scatter_group_by='engine',
- hist_group_by='engine',
- title='Docking interaction summary',
Build a 2x3 publication-style summary panel.
Panels:
a) affinity distribution
b) best affinity per receptor-ligand-engine group
c) interaction type frequency
d) residue contact frequency
e) interaction count distribution
f) affinity vs interaction count
- save_summary_panel_2x3(
- result,
- output_path,
- *,
- style='nature',
- figsize=(10.5, 6.8),
- top_n_types=8,
- top_n_residues=12,
- residue_interaction_type='Hydrophobic',
- scatter_group_by='engine',
- hist_group_by='engine',
- title='Docking interaction summary',
- dpi=300,
- Parameters:
- Return type: