Database API#
Core database#
- class PoseDatabase(db_path, *, compress_mol=True, create=True, timeout=30.0)#
Bases: object
SQLite database wrapper for docking pose, score, and interaction storage.
The wrapper exposes convenience APIs for three common workflows:
- Insert docking poses from row mappings or a pandas DataFrame
- Store interactions either row-by-row or from pose-keyed dictionaries
- Query poses, scores, and interactions with flexible filters
Tables#
- receptors: receptor dimension table
- ligands: ligand dimension table
- engines: docking engine dimension table
- poses: pose identity, optional external pose_id, and molecules
- pose_scores: affinity and score payloads
- interactions: one row per interaction event or summary interaction
If a DataFrame does not provide an external pose_id column, the logical unique key remains (receptor_id, ligand_id, engine, pose_rank). If an external pose_id is present, it is stored and can later be used to import interactions from pose-keyed dictionaries.
- param db_path:
SQLite database file path.
- type db_path:
PathLike
- param compress_mol:
Whether serialized RDKit molecules should be compressed with zlib.
- type compress_mol:
bool
- param create:
Whether to create the schema on initialization.
- type create:
bool
- param timeout:
SQLite connection timeout in seconds.
- type timeout:
float
Example#
from prodock.database import PoseDatabase

db = PoseDatabase("poses.sqlite")
db.insert_dataframe(df)
db.upsert_interaction_payload(interactions_by_pose)
fp = db.interaction_fingerprint(mode="binary")
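The compress_mol option can be pictured as a thin zlib layer around the serialized molecule bytes. The sketch below is only an illustration built on the standard library: the helper names pack_blob and unpack_blob are hypothetical, and a plain bytes payload stands in for a serialized RDKit molecule. It is not taken from ProDock's storage code.

```python
import zlib


def pack_blob(payload: bytes, compress: bool = True) -> bytes:
    """Return the bytes to store, optionally zlib-compressed (hypothetical helper)."""
    return zlib.compress(payload) if compress else payload


def unpack_blob(blob: bytes, compressed: bool = True) -> bytes:
    """Reverse pack_blob; the caller must know whether compression was used."""
    return zlib.decompress(blob) if compressed else blob


# Stand-in for a serialized molecule; real code would serialize an RDKit Mol.
payload = b"C" * 1000
stored = pack_blob(payload, compress=True)
restored = unpack_blob(stored, compressed=True)
```

Because the reader has to know whether a blob was compressed, a database that mixes both settings would need to record that choice somewhere; how ProDock handles this is not stated here.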
- property connection: Connection#
Return the underlying SQLite connection.
- Returns:
Active SQLite connection.
- Return type:
sqlite3.Connection
- close()#
Close the active SQLite connection.
- Returns:
None
- Return type:
None
- create_schema()#
Create the database schema if it does not yet exist.
- Returns:
None
- Return type:
None
Example#
db = PoseDatabase("poses.sqlite", create=False)
db.create_schema()
- upsert_pose(
- *,
- receptor_id,
- ligand_id,
- engine,
- pose_rank,
- affinity,
- mol,
- pose_id=None,
- pose_metadata=None,
- score_data=None,
- score_metadata=None,
- receptor_metadata=None,
- ligand_metadata=None,
- engine_metadata=None,
Insert or update one docking pose and its score row.
- Parameters:
receptor_id (str) – Receptor identifier.
ligand_id (str) – Ligand identifier.
engine (str) – Docking engine name.
pose_rank (int) – Pose rank within the receptor-ligand-engine group.
affinity (Optional[float]) – Primary affinity score.
mol (rdchem.Mol) – RDKit molecule to store.
pose_id (Optional[str]) – Optional external stable pose identifier.
pose_metadata (Optional[Mapping[str, Any]]) – Optional pose metadata payload.
score_data (Optional[Mapping[str, Any]]) – Optional structured score payload.
score_metadata (Optional[Mapping[str, Any]]) – Optional score metadata payload.
receptor_metadata (Optional[Mapping[str, Any]]) – Optional receptor metadata.
ligand_metadata (Optional[Mapping[str, Any]]) – Optional ligand metadata.
engine_metadata (Optional[Mapping[str, Any]]) – Optional engine metadata.
- Returns:
Internal pose_db_id.
- Return type:
int
Example#
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")
pose_db_id = db.upsert_pose(
    receptor_id="1M17",
    ligand_id="erlotinib",
    engine="qvina",
    pose_rank=1,
    affinity=-8.1,
    mol=mol,
    pose_id="1M17__erlotinib__qvina__pose1",
)
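The insert-or-update behaviour on the logical key (receptor_id, ligand_id, engine, pose_rank) can be sketched with plain sqlite3. The single-table schema and the upsert_pose_row helper below are assumptions made for illustration; the real schema is split across several tables and may be implemented differently. The ON CONFLICT clause requires SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; not the actual ProDock schema.
conn.execute(
    """
    CREATE TABLE poses (
        pose_db_id INTEGER PRIMARY KEY,
        receptor_id TEXT NOT NULL,
        ligand_id TEXT NOT NULL,
        engine TEXT NOT NULL,
        pose_rank INTEGER NOT NULL,
        affinity REAL,
        UNIQUE (receptor_id, ligand_id, engine, pose_rank)
    )
    """
)


def upsert_pose_row(receptor_id, ligand_id, engine, pose_rank, affinity):
    """Insert a pose or update its affinity, then return the internal row id."""
    key = (receptor_id, ligand_id, engine, pose_rank)
    with conn:  # commit on success, roll back on error
        conn.execute(
            """
            INSERT INTO poses (receptor_id, ligand_id, engine, pose_rank, affinity)
            VALUES (?, ?, ?, ?, ?)
            ON CONFLICT (receptor_id, ligand_id, engine, pose_rank)
            DO UPDATE SET affinity = excluded.affinity
            """,
            key + (affinity,),
        )
    row = conn.execute(
        "SELECT pose_db_id FROM poses WHERE receptor_id = ? AND ligand_id = ? "
        "AND engine = ? AND pose_rank = ?",
        key,
    ).fetchone()
    return row[0]


first_id = upsert_pose_row("1M17", "erlotinib", "qvina", 1, -8.1)
second_id = upsert_pose_row("1M17", "erlotinib", "qvina", 1, -8.4)
```

Both calls address the same logical key, so the second call updates the existing row instead of creating a new one.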
- insert_many(rows, *, replace=True)#
Insert many pose rows inside one transaction.
Each row should contain at least receptor_id, ligand_id, engine, pose_rank, affinity, and mol. An optional external string pose_id is supported.
- Parameters:
- Returns:
None
- Return type:
None
Example#
rows = [
    {
        "pose_id": "1M17__erol__qvina__pose1",
        "receptor_id": "1M17",
        "ligand_id": "erol",
        "engine": "qvina",
        "pose_rank": 1,
        "affinity": -8.2,
        "mol": mol,
    }
]
db.insert_many(rows, replace=True)
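Inserting many rows "inside one transaction" means that either every row is written or none is. A minimal sqlite3 sketch of that idea follows; the table layout, the insert_rows helper, and the use of INSERT OR REPLACE for replace=True are assumptions for illustration, not ProDock's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table without the molecule column.
conn.execute(
    "CREATE TABLE poses (receptor_id TEXT, ligand_id TEXT, engine TEXT, "
    "pose_rank INTEGER, affinity REAL, "
    "UNIQUE (receptor_id, ligand_id, engine, pose_rank))"
)

rows = [
    {"receptor_id": "1M17", "ligand_id": "erol", "engine": "qvina",
     "pose_rank": 1, "affinity": -8.2},
    {"receptor_id": "1M17", "ligand_id": "erol", "engine": "qvina",
     "pose_rank": 2, "affinity": -7.9},
]


def insert_rows(connection, rows, replace=True):
    """Insert all rows atomically; either every row lands or none does."""
    verb = "INSERT OR REPLACE" if replace else "INSERT"
    sql = (
        f"{verb} INTO poses (receptor_id, ligand_id, engine, pose_rank, affinity) "
        "VALUES (:receptor_id, :ligand_id, :engine, :pose_rank, :affinity)"
    )
    with connection:  # one transaction: commit on success, roll back on any error
        connection.executemany(sql, rows)


insert_rows(conn, rows)
```

With replace=False, a duplicate logical key raises sqlite3.IntegrityError and the whole batch is rolled back, including rows that preceded the duplicate.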
- insert_dataframe(
- df,
- *,
- replace=True,
- interactions_by_pose=None,
- replace_interactions=True,
Insert a pandas DataFrame of docking poses.
Required columns are receptor_id, ligand_id, engine, pose_rank, affinity, and mol. An optional pose_id column is stored when present.
If interactions_by_pose is supplied, it must be keyed by the stored external pose_id values.
- Parameters:
df (pd.DataFrame) – Input DataFrame.
replace (bool) – Whether existing pose rows should be updated.
interactions_by_pose (Optional[Mapping[str, Mapping[str, Any]]]) – Optional interaction payload keyed by external pose_id.
replace_interactions (bool) – Whether existing interactions for affected poses should first be deleted.
- Returns:
None
- Return type:
None
- Raises:
ValueError – If required DataFrame columns are missing.
Example#
db.insert_dataframe(df, replace=True)
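The required-column check that leads to the documented ValueError can be sketched as follows. The constant and function names are hypothetical, and the function takes any iterable of column names so the example runs without pandas.

```python
REQUIRED_COLUMNS = ("receptor_id", "ligand_id", "engine", "pose_rank", "affinity", "mol")


def check_required_columns(columns):
    """Raise ValueError listing any required column that is absent."""
    present = set(columns)
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")


# With pandas this could be called as check_required_columns(df.columns).
check_required_columns(
    ["receptor_id", "ligand_id", "engine", "pose_rank", "affinity", "mol", "pose_id"]
)
```

Extra columns such as pose_id pass the check; only absent required columns trigger the error.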
- classmethod from_dataframe(
- db_path,
- df,
- *,
- compress_mol=True,
- replace=True,
- interactions_by_pose=None,
- replace_interactions=True,
Build a new database file from a DataFrame.
- Parameters:
db_path (PathLike) – Output SQLite file path.
df (pd.DataFrame) – Input DataFrame containing docking poses.
compress_mol (bool) – Whether stored molecule blobs should be compressed.
replace (bool) – Whether duplicate logical keys should be updated.
interactions_by_pose (Optional[Mapping[str, Mapping[str, Any]]]) – Optional interaction payloads keyed by external pose id.
replace_interactions (bool) – Whether to replace existing interactions when interaction payloads are supplied.
- Returns:
Initialized database instance.
- Return type:
PoseDatabase
- query_poses(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- top_rank=None,
- affinity_threshold=None,
- affinity_min=None,
- interaction_type=None,
- residue_id=None,
- chain_id=None,
- residue_name=None,
- residue_number=None,
- include_mol=True,
- include_interactions=False,
- interaction_mode='summary',
- as_dataframe=False,
- order_by=None,
- limit=None,
Query poses using flexible logical and interaction-aware filters.
Interaction filters can be used to return only poses that contain particular interactions, for example interaction_type="Hydrophobic" and residue_id="LEU23.A".
If include_interactions is enabled, pose rows are enriched with either summary or detailed interaction payloads.
- Parameters:
pose_db_id (Optional[int]) – Optional internal pose id filter.
pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id or sequence of ids.
receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor id or sequence of receptor ids.
ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand id or sequence of ligand ids.
engine (Optional[Union[str, Sequence[str]]]) – Optional engine name or sequence of engine names.
pose_rank (Optional[int]) – Optional exact pose rank.
top_rank (Optional[int]) – Optional maximum pose rank to keep.
affinity_threshold (Optional[float]) – Optional maximum affinity threshold.
affinity_min (Optional[float]) – Optional minimum affinity threshold.
interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction type filter.
residue_id (Optional[Union[str, Sequence[str]]]) – Optional residue id filter such as "LEU23.A".
chain_id (Optional[Union[str, Sequence[str]]]) – Optional chain filter.
residue_name (Optional[Union[str, Sequence[str]]]) – Optional residue-name filter.
residue_number (Optional[int]) – Optional residue-number filter.
include_mol (bool) – Whether deserialized RDKit molecules should be included.
include_interactions (bool) – Whether interaction payloads should be attached.
interaction_mode (str) – Interaction payload style, either "summary" or "detailed".
as_dataframe (bool) – Whether to return a pandas DataFrame instead of dataclass records.
order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition passed to resolve_order_by().
limit (Optional[int]) – Optional maximum number of returned rows.
- Returns:
List of PoseRecord objects or a pandas DataFrame.
- Return type:
Union[list[PoseRecord], pd.DataFrame]
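Filters that accept either a single value or a sequence map naturally onto "=" and "IN (...)" conditions in a parameterized query. The build_where helper below is a hypothetical sketch of that mapping, not the library's query builder. Column names are taken from keyword names, so they must come from trusted code rather than user input.

```python
def build_where(**filters):
    """Turn optional scalar-or-sequence filters into a WHERE clause and parameters."""
    clauses, params = [], []
    for column, value in filters.items():
        if value is None:
            continue  # unset filters do not constrain the query
        if isinstance(value, (list, tuple, set)):
            values = list(value)
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f"{column} IN ({placeholders})")
            params.extend(values)
        else:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return where, params


where, params = build_where(
    receptor_id="1M17",
    engine=["qvina", "smina"],
    ligand_id=None,
)
```

Values always travel as bound parameters, so identifiers containing quotes or other special characters do not need escaping.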
- get_pose(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- include_mol=True,
- include_interactions=False,
- interaction_mode='summary',
Fetch one exact pose by internal id, external id, or logical key.
- Parameters:
pose_db_id (Optional[int]) – Internal pose id.
pose_id (Optional[str]) – External stable pose id.
receptor_id (Optional[str]) – Receptor identifier.
ligand_id (Optional[str]) – Ligand identifier.
engine (Optional[str]) – Engine name.
pose_rank (Optional[int]) – Pose rank within the receptor-ligand-engine group.
include_mol (bool) – Whether to include the RDKit molecule.
include_interactions (bool) – Whether to attach interactions.
interaction_mode (str) – "summary" or "detailed".
- Returns:
Matching pose or None if no match exists.
- Return type:
Optional[PoseRecord]
- query_scores(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- top_rank=None,
- affinity_threshold=None,
- affinity_min=None,
- as_dataframe=False,
- order_by=None,
- limit=None,
Query the dedicated pose_scores table joined to pose identity.
- Parameters:
pose_db_id (Optional[int]) – Optional internal pose id filter.
pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id or sequence of ids.
receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor id filter.
ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand id filter.
engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
top_rank (Optional[int]) – Optional maximum pose rank.
affinity_threshold (Optional[float]) – Optional maximum affinity threshold.
affinity_min (Optional[float]) – Optional minimum affinity threshold.
as_dataframe (bool) – Whether to return a DataFrame.
order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition.
limit (Optional[int]) – Optional maximum number of rows.
- Returns:
List of ScoreRecord objects or a DataFrame.
- Return type:
Union[list[ScoreRecord], pd.DataFrame]
- count_poses(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- top_rank=None,
- affinity_threshold=None,
- affinity_min=None,
- interaction_type=None,
- residue_id=None,
- chain_id=None,
- residue_name=None,
- residue_number=None,
Count poses matching the supplied filters.
- add_interaction(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- interaction_type,
- chain_id=None,
- residue_name=None,
- residue_number=None,
- residue_id=None,
- ligand_residue=None,
- occurrence_index=0,
- ligand_atom_indices=None,
- protein_atom_indices=None,
- ligand_parent_atom_indices=None,
- protein_parent_atom_indices=None,
- distance=None,
- angle=None,
- metadata=None,
- replace=False,
Insert one interaction linked to a stored pose.
A single row can represent either a summarized interaction or one detailed interaction event.
- Parameters:
pose_db_id (Optional[int]) – Internal pose id.
pose_id (Optional[str]) – External stable pose id.
receptor_id (Optional[str]) – Receptor id for logical-key lookup.
ligand_id (Optional[str]) – Ligand id for logical-key lookup.
engine (Optional[str]) – Engine name for logical-key lookup.
pose_rank (Optional[int]) – Pose rank for logical-key lookup.
interaction_type (str) – Interaction family, for example "Hydrophobic".
chain_id (Optional[str]) – Protein chain identifier.
residue_name (Optional[str]) – Residue name, for example "LEU".
residue_number (Optional[int]) – Residue sequence number.
residue_id (Optional[str]) – Combined residue id such as "LEU23.A".
ligand_residue (Optional[str]) – Ligand residue identifier if available.
occurrence_index (int) – Zero-based event index within one pose / residue / interaction type.
ligand_atom_indices (Optional[Sequence[int]]) – Ligand atom indices for the specific event.
protein_atom_indices (Optional[Sequence[int]]) – Protein atom indices for the specific event.
ligand_parent_atom_indices (Optional[Sequence[int]]) – Parent ligand atom indices when available.
protein_parent_atom_indices (Optional[Sequence[int]]) – Parent protein atom indices when available.
distance (Optional[float]) – Optional interaction distance.
angle (Optional[float]) – Optional interaction angle.
metadata (Optional[Mapping[str, Any]]) – Optional arbitrary metadata payload.
replace (bool) – Whether an existing unique interaction row should be updated.
- Returns:
New interaction identifier.
- Return type:
int
- delete_interactions_for_pose(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
Delete all interactions linked to a single pose.
- upsert_interaction_payload(interactions_by_pose, *, replace=True)#
Insert interaction payloads keyed by external pose_id.
Supported payload formats per pose are:
- summary: {"Hydrophobic": ["LEU23.A", "VAL31.A"]}
- detailed: {"Hydrophobic": {"LEU23.A": [{...}, {...}]}}
None or empty payloads are treated as “no interactions”. In that case, if replace=True, existing interactions for that pose are deleted and no new rows are inserted.
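One way to picture how both payload formats end up as interaction rows is the flattening step below. The payload_to_rows helper and the row keys it produces are hypothetical. The sketch only distinguishes the two documented shapes: a list of residue ids for the summary format, and a mapping of residue id to events for the detailed format.

```python
def payload_to_rows(pose_id, payload):
    """Flatten one pose's summary or detailed payload into interaction rows."""
    rows = []
    for interaction_type, residues in (payload or {}).items():
        if isinstance(residues, dict):
            # Detailed format: {residue_id: [event, ...]}
            for residue_id, events in residues.items():
                for index, event in enumerate(events):
                    rows.append({
                        "pose_id": pose_id,
                        "interaction_type": interaction_type,
                        "residue_id": residue_id,
                        "occurrence_index": index,
                        "event": event,
                    })
        else:
            # Summary format: [residue_id, ...]
            for residue_id in residues:
                rows.append({
                    "pose_id": pose_id,
                    "interaction_type": interaction_type,
                    "residue_id": residue_id,
                    "occurrence_index": 0,
                    "event": None,
                })
    return rows


summary_rows = payload_to_rows("p1", {"Hydrophobic": ["LEU23.A", "VAL31.A"]})
detailed_rows = payload_to_rows(
    "p1", {"Hydrophobic": {"LEU23.A": [{"distance": 3.8}, {"distance": 4.1}]}}
)
empty_rows = payload_to_rows("p1", None)
```

A None or empty payload yields no rows, which matches the "no interactions" case described above.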
- insert_interactions(rows, *, replace=False)#
Insert many interaction rows inside one transaction.
Each row must provide either pose_db_id, external pose_id, or the full logical pose key.
- query_interactions(
- *,
- interaction_id=None,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- interaction_type=None,
- chain_id=None,
- residue_name=None,
- residue_number=None,
- residue_id=None,
- ligand_residue=None,
- as_dataframe=False,
- order_by=None,
- limit=None,
Query stored interactions using pose-level and residue-level filters.
- Parameters:
interaction_id (Optional[int]) – Optional interaction primary-key filter.
pose_db_id (Optional[int]) – Optional internal pose id filter.
pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id or sequence of ids.
receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.
ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.
engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction type filter.
chain_id (Optional[Union[str, Sequence[str]]]) – Optional chain filter.
residue_name (Optional[Union[str, Sequence[str]]]) – Optional residue-name filter.
residue_number (Optional[int]) – Optional residue-number filter.
residue_id (Optional[Union[str, Sequence[str]]]) – Optional combined residue-id filter.
ligand_residue (Optional[Union[str, Sequence[str]]]) – Optional ligand residue filter.
as_dataframe (bool) – Whether to return a DataFrame.
order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition.
limit (Optional[int]) – Optional maximum number of rows.
- Returns:
List of InteractionRecord objects or a DataFrame.
- Return type:
Union[list[InteractionRecord], pd.DataFrame]
- get_interaction_summary(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- interaction_type=None,
- residue_id=None,
- return_by='pose_key',
Return summarized interactions grouped by pose.
The output payload is compatible with the compact interaction format: {pose_key: {interaction_type: [residue_id, ...]}}.
- Parameters:
pose_db_id (Optional[Union[int, Sequence[int]]]) – Optional pose id or sequence of internal pose ids.
pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id filter.
receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.
ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.
engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction-type filter.
residue_id (Optional[Union[str, Sequence[str]]]) – Optional residue-id filter.
return_by (str) – One of "pose_db_id", "pose_id", or "pose_key".
- Returns:
Nested summary mapping grouped by pose.
- Return type:
dict[Union[int, str], dict[str, list[str]]]
- get_interaction_details(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- interaction_type=None,
- residue_id=None,
- return_by='pose_key',
Return detailed interactions grouped by pose.
The output mirrors the nested detailed format: {pose_key: {interaction_type: {residue_id: [event, ...]}}}.
- Parameters:
pose_db_id (Optional[Union[int, Sequence[int]]]) – Optional pose id or sequence of internal pose ids.
pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id filter.
receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.
ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.
engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction-type filter.
residue_id (Optional[Union[str, Sequence[str]]]) – Optional residue-id filter.
return_by (str) – One of "pose_db_id", "pose_id", or "pose_key".
- Returns:
Nested detailed mapping grouped by pose.
- Return type:
dict[Union[int, str], dict[str, dict[str, list[dict[str, Any]]]]]
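The nested formats returned by get_interaction_summary() and get_interaction_details() are both groupings of flat interaction rows. The sketch below illustrates the summary grouping only. The group_summary helper and the sample rows (including the "HBond" type and "MET98.A" residue) are invented for illustration.

```python
from collections import defaultdict


def group_summary(rows, return_by="pose_id"):
    """Group flat interaction rows into {pose: {interaction_type: [residue_id, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        residues = grouped[row[return_by]][row["interaction_type"]]
        if row["residue_id"] not in residues:  # a summary lists each residue once
            residues.append(row["residue_id"])
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {pose: dict(by_type) for pose, by_type in grouped.items()}


rows = [
    {"pose_id": "p1", "interaction_type": "Hydrophobic", "residue_id": "LEU23.A"},
    {"pose_id": "p1", "interaction_type": "Hydrophobic", "residue_id": "LEU23.A"},
    {"pose_id": "p1", "interaction_type": "Hydrophobic", "residue_id": "VAL31.A"},
    {"pose_id": "p2", "interaction_type": "HBond", "residue_id": "MET98.A"},
]
summary = group_summary(rows)
```

The detailed format would keep every event in a per-residue list instead of collapsing repeated residues.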
- interaction_fingerprint(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- interaction_type=None,
- residue_id=None,
- mode='binary',
- feature_sep='::',
- index_by='pose_key',
Build a pose-by-feature interaction fingerprint matrix.
Features are named as <interaction_type><feature_sep><residue_id>.
- Parameters:
pose_db_id (Optional[Union[int, Sequence[int]]]) – Optional pose id or sequence of internal pose ids.
pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id filter.
receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.
ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.
engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction-type filter.
residue_id (Optional[Union[str, Sequence[str]]]) – Optional residue-id filter.
mode (str) – Either "binary" or "count".
feature_sep (str) – Separator used when building feature names.
index_by (str) – One of "pose_db_id", "pose_id", or "pose_key".
- Returns:
Fingerprint matrix as a pandas DataFrame.
- Return type:
pd.DataFrame
- Raises:
ValueError – If mode or index_by is invalid.
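The difference between the "binary" and "count" modes can be shown on a small set of interaction rows. The fingerprint_matrix helper below is a hypothetical, pandas-free sketch that returns feature names plus one value list per pose; the library itself returns a pandas DataFrame.

```python
from collections import Counter


def fingerprint_matrix(rows, mode="binary", feature_sep="::"):
    """Return (feature_names, {pose_id: [values]}) for the given interaction rows."""
    if mode not in ("binary", "count"):
        raise ValueError(f"Unsupported mode: {mode!r}")
    counts = {}
    for row in rows:
        feature = f"{row['interaction_type']}{feature_sep}{row['residue_id']}"
        counts.setdefault(row["pose_id"], Counter())[feature] += 1
    # Columns are the sorted union of all features seen across poses.
    features = sorted({name for counter in counts.values() for name in counter})
    matrix = {}
    for pose_id, counter in counts.items():
        values = [counter[name] for name in features]
        matrix[pose_id] = [min(v, 1) for v in values] if mode == "binary" else values
    return features, matrix


rows = [
    {"pose_id": "p1", "interaction_type": "Hydrophobic", "residue_id": "LEU23.A"},
    {"pose_id": "p1", "interaction_type": "Hydrophobic", "residue_id": "LEU23.A"},
    {"pose_id": "p1", "interaction_type": "Hydrophobic", "residue_id": "VAL31.A"},
    {"pose_id": "p2", "interaction_type": "Hydrophobic", "residue_id": "VAL31.A"},
]
features, binary = fingerprint_matrix(rows, mode="binary")
_, counted = fingerprint_matrix(rows, mode="count")
```

In binary mode a feature is 1 when the pose has at least one matching interaction; in count mode it is the number of matching rows.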
- summarize()#
Return one summary row per receptor-ligand-engine group.
- Returns:
Summary DataFrame with pose counts and best affinity values.
- Return type:
pd.DataFrame
- list_receptors()#
List all receptor identifiers present in the database.
- list_ligands()#
List all ligand identifiers present in the database.
- list_engines()#
List all docking engine names present in the database.
- delete_poses(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- top_rank=None,
- affinity_threshold=None,
- affinity_min=None,
Delete poses matching the supplied filters.
At least one filter must be provided to prevent accidental deletion of the entire table.
- Parameters:
pose_db_id (Optional[int]) – Optional internal pose id filter.
pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id or sequence of ids.
receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.
ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.
engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
top_rank (Optional[int]) – Optional maximum pose rank.
affinity_threshold (Optional[float]) – Optional maximum affinity threshold.
affinity_min (Optional[float]) – Optional minimum affinity threshold.
- Returns:
Number of deleted pose rows.
- Return type:
int
- Raises:
ValueError – If no filters are provided.
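The safeguard against deleting the whole table amounts to rejecting a call in which every filter is None. A minimal sketch of such a guard, using a hypothetical require_filters helper:

```python
def require_filters(**filters):
    """Return only the filters that were set; refuse an unfiltered delete."""
    active = {name: value for name, value in filters.items() if value is not None}
    if not active:
        raise ValueError("At least one filter is required to delete poses.")
    return active


active = require_filters(receptor_id="1M17", engine=None)
```

Only the filters that were actually set would then be used to build the DELETE condition.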
- delete_interactions(
- *,
- interaction_id=None,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- interaction_type=None,
- chain_id=None,
- residue_name=None,
- residue_number=None,
- residue_id=None,
- ligand_residue=None,
Delete interactions matching the supplied filters.
- Parameters:
interaction_id (Optional[int]) – Optional interaction primary-key filter.
pose_db_id (Optional[int]) – Optional pose id filter.
pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id filter.
receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.
ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.
engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction-type filter.
chain_id (Optional[Union[str, Sequence[str]]]) – Optional chain filter.
residue_name (Optional[Union[str, Sequence[str]]]) – Optional residue-name filter.
residue_number (Optional[int]) – Optional residue-number filter.
residue_id (Optional[Union[str, Sequence[str]]]) – Optional residue-id filter.
ligand_residue (Optional[Union[str, Sequence[str]]]) – Optional ligand-residue filter.
- Returns:
Number of deleted interaction rows.
- Return type:
int
- Raises:
ValueError – If no filters are provided.
- vacuum()#
Run SQLite VACUUM to compact the database file.
- Returns:
None
- Return type:
None
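The effect of VACUUM can be observed directly with sqlite3: deleting rows frees pages inside the file, but the file only shrinks once it is rebuilt. The sketch below uses a temporary file and a made-up blobs table, and assumes SQLite's default settings (auto_vacuum disabled).

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "poses.sqlite")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE blobs (payload BLOB)")
with conn:
    conn.executemany(
        "INSERT INTO blobs VALUES (?)", [(b"x" * 4096,) for _ in range(200)]
    )
with conn:
    conn.execute("DELETE FROM blobs")  # pages are freed but stay in the file

size_before = os.path.getsize(path)
conn.execute("VACUUM")  # must run outside a transaction; rebuilds the file
size_after = os.path.getsize(path)
conn.close()
```

VACUUM rewrites the whole database, so it can take noticeable time and temporary disk space on large files.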
Query database#
- class PoseQuery(db_path=None, *, connection=None, timeout=30.0, read_only=True)#
Bases: object
Standalone query client for an existing ProDock SQLite database.
The class opens an existing database file or attaches to an existing SQLite connection, then provides read/query helpers for stored poses, score rows, interactions, interaction summaries, and fingerprint matrices.
- Parameters:
db_path (Optional[PathLike]) – Path to an existing ProDock SQLite database. Required when connection is not supplied.
connection (Optional[sqlite3.Connection]) – Existing SQLite connection to reuse.
timeout (float) – SQLite connection timeout in seconds.
read_only (bool) – Whether to open db_path in SQLite read-only mode.
- Raises:
ValueError – If neither db_path nor connection is provided.
FileNotFoundError – If db_path does not exist.
Example#
from prodock.database import PoseQuery

q = PoseQuery("prodock.db")
df = q.poses(as_dataframe=True)
print(df.head())
- property connection: Connection#
Return the underlying SQLite connection.
- Returns:
Active SQLite connection.
- Return type:
sqlite3.Connection
- close()#
Close the connection owned by this query object.
Connections passed in through connection=... are not closed here.
- Returns:
None
- Return type:
None
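The ownership rule for close() can be sketched with a small stand-in class: the client records whether it opened the connection itself and closes it only in that case. QueryClient and its attributes are hypothetical names used for illustration.

```python
import sqlite3


class QueryClient:
    """Hypothetical client that closes only connections it opened itself."""

    def __init__(self, db_path=None, *, connection=None):
        if connection is None and db_path is None:
            raise ValueError("Provide either db_path or connection.")
        self._owns_connection = connection is None
        if connection is None:
            connection = sqlite3.connect(db_path)
        self._connection = connection

    def close(self):
        # Borrowed connections stay open for whoever created them.
        if self._owns_connection:
            self._connection.close()


shared = sqlite3.connect(":memory:")
client = QueryClient(connection=shared)
client.close()
still_open = shared.execute("SELECT 1").fetchone()[0]
```

This lets several query objects share one connection without any of them invalidating it for the others.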
- poses(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- top_rank=None,
- affinity_threshold=None,
- affinity_min=None,
- interaction_type=None,
- residue_id=None,
- chain_id=None,
- residue_name=None,
- residue_number=None,
- include_mol=True,
- include_interactions=False,
- interaction_mode='summary',
- as_dataframe=False,
- order_by=None,
- limit=None,
Query poses using logical and interaction-aware filters.
Interaction filters can be used to return only poses that contain particular interaction patterns, for example interaction_type="Hydrophobic" and residue_id="LEU23.A".
When include_interactions is enabled, each returned pose is enriched with either a compact summary payload or a detailed nested interaction payload.
- Parameters:
pose_db_id (Optional[int]) – Optional internal pose id filter.
pose_id (FilterStr) – Optional external pose id or sequence of ids.
receptor_id (FilterStr) – Optional receptor id or sequence of receptor ids.
ligand_id (FilterStr) – Optional ligand id or sequence of ligand ids.
engine (FilterStr) – Optional engine name or sequence of engine names.
pose_rank (Optional[int]) – Optional exact pose rank.
top_rank (Optional[int]) – Optional maximum pose rank to keep.
affinity_threshold (Optional[float]) – Optional maximum affinity threshold.
affinity_min (Optional[float]) – Optional minimum affinity threshold.
interaction_type (FilterStr) – Optional interaction type filter.
residue_id (FilterStr) – Optional residue id filter such as "LEU23.A".
chain_id (FilterStr) – Optional chain filter.
residue_name (FilterStr) – Optional residue-name filter.
residue_number (Optional[int]) – Optional residue-number filter.
include_mol (bool) – Whether deserialized RDKit molecules should be included.
include_interactions (bool) – Whether interaction payloads should be attached.
interaction_mode (str) – Interaction payload style, either "summary" or "detailed".
as_dataframe (bool) – Whether to return a pandas DataFrame instead of dataclass records.
order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition passed to resolve_order_by().
limit (Optional[int]) – Optional maximum number of returned rows.
- Returns:
List of PoseRecord objects or a pandas DataFrame.
- Return type:
Union[list[PoseRecord], pd.DataFrame]
Example#
q = PoseQuery("prodock.db")
df = q.poses(
    receptor_id="1M17",
    top_rank=3,
    include_interactions=True,
    interaction_mode="summary",
    as_dataframe=True,
)
print(df[["pose_id", "affinity", "interaction_summary"]].head())
- pose(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- include_mol=True,
- include_interactions=False,
- interaction_mode='summary',
Fetch one exact pose by internal id, external id, or logical key.
- Parameters:
pose_db_id (Optional[int]) – Internal pose id.
pose_id (Optional[str]) – External stable pose id.
receptor_id (Optional[str]) – Receptor identifier.
ligand_id (Optional[str]) – Ligand identifier.
engine (Optional[str]) – Engine name.
pose_rank (Optional[int]) – Pose rank within the receptor-ligand-engine group.
include_mol (bool) – Whether to include the RDKit molecule.
include_interactions (bool) – Whether to attach interactions.
interaction_mode (str) – "summary" or "detailed".
- Returns:
Matching pose or None if no match exists.
- Return type:
Optional[PoseRecord]
Example#
q = PoseQuery("prodock.db")
pose = q.pose(
    receptor_id="1M17",
    ligand_id="erlotinib",
    engine="qvina",
    pose_rank=1,
)
print(pose)
- scores(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- top_rank=None,
- affinity_threshold=None,
- affinity_min=None,
- as_dataframe=False,
- order_by=None,
- limit=None,
Query score rows joined to pose identity.
- Parameters:
pose_db_id (Optional[int]) – Optional internal pose id filter.
pose_id (FilterStr) – Optional external pose id or sequence of ids.
receptor_id (FilterStr) – Optional receptor id filter.
ligand_id (FilterStr) – Optional ligand id filter.
engine (FilterStr) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
top_rank (Optional[int]) – Optional maximum pose rank.
affinity_threshold (Optional[float]) – Optional maximum affinity threshold.
affinity_min (Optional[float]) – Optional minimum affinity threshold.
as_dataframe (bool) – Whether to return a DataFrame.
order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition.
limit (Optional[int]) – Optional maximum number of rows.
- Returns:
List of ScoreRecord objects or a pandas DataFrame.
- Return type:
Union[list[ScoreRecord], pd.DataFrame]
- count_poses(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- top_rank=None,
- affinity_threshold=None,
- affinity_min=None,
- interaction_type=None,
- residue_id=None,
- chain_id=None,
- residue_name=None,
- residue_number=None,
Count poses matching the supplied filters.
- interactions(
- *,
- interaction_id=None,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- interaction_type=None,
- chain_id=None,
- residue_name=None,
- residue_number=None,
- residue_id=None,
- ligand_residue=None,
- as_dataframe=False,
- order_by=None,
- limit=None,
Query stored interactions using pose-level and residue-level filters.
- Parameters:
interaction_id (Optional[int]) – Optional interaction primary-key filter.
pose_db_id (Optional[int]) – Optional internal pose id filter.
pose_id (FilterStr) – Optional external pose id or sequence of ids.
receptor_id (FilterStr) – Optional receptor filter.
ligand_id (FilterStr) – Optional ligand filter.
engine (FilterStr) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
interaction_type (FilterStr) – Optional interaction type filter.
chain_id (FilterStr) – Optional chain filter.
residue_name (FilterStr) – Optional residue-name filter.
residue_number (Optional[int]) – Optional residue-number filter.
residue_id (FilterStr) – Optional combined residue-id filter.
ligand_residue (FilterStr) – Optional ligand residue filter.
as_dataframe (bool) – Whether to return a DataFrame.
order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition.
limit (Optional[int]) – Optional maximum number of rows.
- Returns:
List of InteractionRecord objects or a pandas DataFrame.
- Return type:
Union[list[InteractionRecord], pd.DataFrame]
Example#
q = PoseQuery("prodock.db")
df = q.interactions(
    receptor_id="1M17",
    interaction_type="Hydrophobic",
    as_dataframe=True,
)
print(df[["pose_id", "interaction_type", "residue_id"]].head())
- interaction_summary(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- interaction_type=None,
- residue_id=None,
- return_by='pose_key',
Return summarized interactions grouped by pose.
The returned payload uses the compact format {pose_key: {interaction_type: [residue_id, ...]}}.
- Parameters:
pose_db_id (FilterInt) – Optional pose id or sequence of internal pose ids.
pose_id (FilterStr) – Optional external pose id filter.
receptor_id (FilterStr) – Optional receptor filter.
ligand_id (FilterStr) – Optional ligand filter.
engine (FilterStr) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
interaction_type (FilterStr) – Optional interaction-type filter.
residue_id (FilterStr) – Optional residue-id filter.
return_by (str) – One of "pose_db_id", "pose_id", or "pose_key".
- Returns:
Nested summary mapping grouped by pose.
- Return type:
dict[Union[int, str], dict[str, list[str]]]
Example#
q = PoseQuery("prodock.db")
summary = q.interaction_summary(
    receptor_id="1M17",
    return_by="pose_id",
)
print(summary)
- interaction_details(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- interaction_type=None,
- residue_id=None,
- return_by='pose_key',
Return detailed interactions grouped by pose.
The returned payload mirrors the nested detailed format {pose_key: {interaction_type: {residue_id: [event, ...]}}}.
- Parameters:
pose_db_id (FilterInt) – Optional pose id or sequence of internal pose ids.
pose_id (FilterStr) – Optional external pose id filter.
receptor_id (FilterStr) – Optional receptor filter.
ligand_id (FilterStr) – Optional ligand filter.
engine (FilterStr) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
interaction_type (FilterStr) – Optional interaction-type filter.
residue_id (FilterStr) – Optional residue-id filter.
return_by (str) – One of "pose_db_id", "pose_id", or "pose_key".
- Returns:
Nested detailed mapping grouped by pose.
- Return type:
dict[Union[int, str], dict[str, dict[str, list[dict[str, Any]]]]]
Example#
q = PoseQuery("prodock.db")
details = q.interaction_details(
    receptor_id="1M17",
    return_by="pose_key",
)
print(details)
- fingerprint(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- interaction_type=None,
- residue_id=None,
- mode='binary',
- feature_sep='::',
- index_by='pose_key',
Build a pose-by-feature interaction fingerprint matrix.
Features are named as <interaction_type><feature_sep><residue_id>.
- Parameters:
pose_db_id (FilterInt) – Optional internal pose id or sequence of internal pose ids.
pose_id (FilterStr) – Optional external pose id filter.
receptor_id (FilterStr) – Optional receptor filter.
ligand_id (FilterStr) – Optional ligand filter.
engine (FilterStr) – Optional engine filter.
pose_rank (Optional[int]) – Optional exact pose-rank filter.
interaction_type (FilterStr) – Optional interaction-type filter.
residue_id (FilterStr) – Optional residue-id filter.
mode (str) – Either "binary" or "count".
feature_sep (str) – Separator used when building feature names.
index_by (str) – One of "pose_db_id", "pose_id", or "pose_key".
- Returns:
Fingerprint matrix as a pandas DataFrame.
- Return type:
pd.DataFrame
- Raises:
ValueError – If mode or index_by is invalid.
Example#
q = PoseQuery("prodock.db")
fp = q.fingerprint(
    receptor_id="1M17",
    mode="binary",
    index_by="pose_key",
)
print(fp.head())
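To make the feature naming and the two modes concrete, here is a small standard-library sketch that builds a pose-by-feature table from a summary-style mapping. It is not the library's implementation: sketch_fingerprint is a hypothetical name, the sample data is invented, and treating "count" as the number of times a residue appears for an interaction type is an assumption.

```python
def sketch_fingerprint(summary, mode="binary", feature_sep="::"):
    """Build {pose_key: {feature_name: value}} from a summary mapping."""
    if mode not in ("binary", "count"):
        raise ValueError(f"mode must be 'binary' or 'count', got {mode!r}")

    # Feature names follow <interaction_type><feature_sep><residue_id>.
    features = sorted(
        {
            f"{interaction_type}{feature_sep}{residue_id}"
            for by_type in summary.values()
            for interaction_type, residues in by_type.items()
            for residue_id in residues
        }
    )

    matrix = {}
    for pose_key, by_type in summary.items():
        row = dict.fromkeys(features, 0)
        for interaction_type, residues in by_type.items():
            for residue_id in residues:
                name = f"{interaction_type}{feature_sep}{residue_id}"
                row[name] = 1 if mode == "binary" else row[name] + 1
        matrix[pose_key] = row
    return features, matrix


# Invented sample data; LEU149.A is listed twice to show the mode difference.
sample = {
    "pose1": {"Hydrophobic": ["LEU149.A", "LEU149.A"], "HBDonor": ["MET124.A"]},
    "pose2": {"Hydrophobic": ["VAL57.A"]},
}
features, binary_fp = sketch_fingerprint(sample, mode="binary")
_, count_fp = sketch_fingerprint(sample, mode="count")
print(features)
print(binary_fp["pose1"])
print(count_fp["pose1"])
```

Every pose row carries the same sorted feature columns, so poses without a given interaction get an explicit 0 rather than a missing entry.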
- summary()#
Return one summary row per receptor-ligand-engine group.
The summary includes the number of stored poses, best affinity value, maximum stored pose rank, and the number of linked interaction rows.
- Returns:
Summary DataFrame with pose counts and best affinity values.
- Return type:
pd.DataFrame
Example#
q = PoseQuery("prodock.db")
print(q.summary())
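The grouping behind such a summary can be pictured with plain Python. This sketch assumes that the "best" affinity is the most negative value, which is the usual docking convention but is not confirmed here; the field names n_poses, best_affinity, and max_pose_rank are hypothetical, and the count of linked interaction rows is left out for brevity.

```python
def sketch_summary(poses):
    """Aggregate pose dictionaries per (receptor_id, ligand_id, engine)."""
    groups = {}
    for pose in poses:
        key = (pose["receptor_id"], pose["ligand_id"], pose["engine"])
        group = groups.setdefault(
            key, {"n_poses": 0, "best_affinity": None, "max_pose_rank": 0}
        )
        group["n_poses"] += 1
        group["max_pose_rank"] = max(group["max_pose_rank"], pose["pose_rank"])
        affinity = pose.get("affinity")
        # Assumption: lower (more negative) affinity is better.
        if affinity is not None and (
            group["best_affinity"] is None or affinity < group["best_affinity"]
        ):
            group["best_affinity"] = affinity
    return groups


# Invented sample rows for illustration only.
sample_poses = [
    {"receptor_id": "1M17", "ligand_id": "erlotinib", "engine": "qvina",
     "pose_rank": 1, "affinity": -7.1},
    {"receptor_id": "1M17", "ligand_id": "erlotinib", "engine": "qvina",
     "pose_rank": 2, "affinity": -6.2},
    {"receptor_id": "1M17", "ligand_id": "erlotinib", "engine": "vina",
     "pose_rank": 1, "affinity": None},
]
groups = sketch_summary(sample_poses)
for key, stats in groups.items():
    print(key, stats)
```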
- receptors()#
List all receptor identifiers present in the database.
- ligands()#
List all ligand identifiers present in the database.
- engines()#
List all docking engine names present in the database.
- query_poses(**kwargs)#
Alias for poses().
- Parameters:
kwargs (Any)
- Return type:
list[PoseRecord] | pandas.DataFrame
- get_pose(**kwargs)#
Alias for pose().
- Parameters:
kwargs (Any)
- Return type:
PoseRecord | None
- query_scores(**kwargs)#
Alias for scores().
- Parameters:
kwargs (Any)
- Return type:
list[ScoreRecord] | pandas.DataFrame
- query_interactions(**kwargs)#
Alias for interactions().
- Parameters:
kwargs (Any)
- Return type:
list[InteractionRecord] | pandas.DataFrame
- get_interaction_summary(**kwargs)#
Alias for interaction_summary().
- get_interaction_details(**kwargs)#
Alias for interaction_details().
- interaction_fingerprint(**kwargs)#
Alias for fingerprint().
- Parameters:
kwargs (Any)
- Return type:
pandas.DataFrame
- list_receptors()#
Alias for receptors().
- build_pose_where_clause(
- *,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- top_rank=None,
- affinity_threshold=None,
- affinity_min=None,
- interaction_type=None,
- residue_id=None,
- chain_id=None,
- residue_name=None,
- residue_number=None,
Build a WHERE clause for pose queries.
Interaction filters are translated to an EXISTS subquery so poses can be queried directly by interaction content.
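The EXISTS translation can be illustrated with the standard sqlite3 module. The schema below is a deliberately simplified, hypothetical two-table layout (the real database also uses dimension tables), and sketch_pose_where is an illustrative helper rather than the library function.

```python
import sqlite3

# Simplified, hypothetical schema with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE poses (
        pose_db_id INTEGER PRIMARY KEY,
        receptor_id TEXT,
        pose_rank INTEGER
    );
    CREATE TABLE interactions (
        interaction_id INTEGER PRIMARY KEY,
        pose_db_id INTEGER REFERENCES poses(pose_db_id),
        interaction_type TEXT,
        residue_id TEXT
    );
    INSERT INTO poses VALUES (1, '1M17', 1), (2, '1M17', 2);
    INSERT INTO interactions VALUES
        (1, 1, 'Hydrophobic', 'LEU149.A'),
        (2, 1, 'Hydrophobic', 'VAL57.A'),
        (3, 2, 'HBDonor', 'MET124.A');
    """
)


def sketch_pose_where(receptor_id=None, interaction_type=None, residue_id=None):
    """Return (where_sql, params) with interaction filters wrapped in EXISTS."""
    clauses, params = [], []
    if receptor_id is not None:
        clauses.append("p.receptor_id = ?")
        params.append(receptor_id)

    sub_clauses, sub_params = [], []
    if interaction_type is not None:
        sub_clauses.append("i.interaction_type = ?")
        sub_params.append(interaction_type)
    if residue_id is not None:
        sub_clauses.append("i.residue_id = ?")
        sub_params.append(residue_id)
    if sub_clauses:
        clauses.append(
            "EXISTS (SELECT 1 FROM interactions i "
            "WHERE i.pose_db_id = p.pose_db_id AND "
            + " AND ".join(sub_clauses)
            + ")"
        )
        params.extend(sub_params)

    where_sql = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    return where_sql, params


where_sql, params = sketch_pose_where(
    receptor_id="1M17", interaction_type="Hydrophobic"
)
rows = conn.execute(
    f"SELECT p.pose_db_id FROM poses p {where_sql} ORDER BY p.pose_db_id",
    params,
).fetchall()
print(where_sql)
print(rows)
```

One reason to prefer EXISTS over a JOIN here is that a pose with several matching interactions still appears once in the result, so no DISTINCT is needed. Values are passed as bound parameters rather than interpolated into the SQL string.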
- build_interaction_where_clause(
- *,
- interaction_id=None,
- pose_db_id=None,
- pose_id=None,
- receptor_id=None,
- ligand_id=None,
- engine=None,
- pose_rank=None,
- interaction_type=None,
- chain_id=None,
- residue_name=None,
- residue_number=None,
- residue_id=None,
- ligand_residue=None,
Build a WHERE clause for interaction queries.
- resolve_order_by(order_by)#
Resolve a public order key to a SQL ORDER BY clause.
- Parameters:
order_by (Optional[Union[str, Sequence[str]]]) – Column name or list of column names. Prefix with - for descending.
- Returns:
SQL ORDER BY clause.
- Return type:
- Raises:
ValueError – If an unsupported key is supplied.
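A minimal sketch of this kind of resolution is shown below. The allow-list ALLOWED_ORDER_KEYS, its column expressions, and the function name sketch_resolve_order_by are all hypothetical; the set of keys the library actually supports is not listed in this section.

```python
# Hypothetical mapping from public keys to SQL column expressions.
ALLOWED_ORDER_KEYS = {
    "affinity": "s.affinity",
    "pose_rank": "p.pose_rank",
    "receptor_id": "r.receptor_id",
}


def sketch_resolve_order_by(order_by):
    """Translate public order keys into an ORDER BY clause."""
    if order_by is None:
        return ""
    keys = [order_by] if isinstance(order_by, str) else list(order_by)
    if not keys:
        return ""

    parts = []
    for key in keys:
        descending = key.startswith("-")
        name = key[1:] if descending else key
        # Only allow-listed names reach the SQL string.
        if name not in ALLOWED_ORDER_KEYS:
            raise ValueError(f"Unsupported order key: {key!r}")
        direction = "DESC" if descending else "ASC"
        parts.append(f"{ALLOWED_ORDER_KEYS[name]} {direction}")
    return "ORDER BY " + ", ".join(parts)


clause = sketch_resolve_order_by(["affinity", "-pose_rank"])
print(clause)
```

Mapping through an allow-list matters because ORDER BY targets cannot be passed as bound parameters, so unchecked user input would otherwise end up directly in the SQL text.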
Records#
- class PoseRecord(
- pose_db_id,
- pose_id,
- receptor_id,
- ligand_id,
- engine,
- pose_rank,
- affinity,
- mol,
- pose_metadata,
- score_data,
- score_metadata,
- interaction_summary=<factory>,
- interaction_details=<factory>,
- created_at='',
Bases: object
Immutable in-memory representation of a docking pose row.
This record combines core pose identity fields with optional molecule content, score payloads, and aggregated interaction summaries. It is designed to act as a typed container for rows reconstructed from the ProDock SQLite database layer.
- Parameters:
pose_db_id (int) – Internal SQLite integer primary key for the pose row.
pose_id (Optional[str]) – Optional external stable pose identifier, for example 1M17__erlotinib__qvina__pose1. When absent, a logical pose key can still be generated from the receptor, ligand, engine, and rank fields.
receptor_id (str) – Receptor identifier associated with the pose.
ligand_id (str) – Ligand identifier associated with the pose.
engine (str) – Docking engine name, for example vina, smina, or qvina.
pose_rank (int) – One-based pose rank within the receptor-ligand-engine group.
affinity (Optional[float]) – Primary affinity value associated with the pose, if available.
mol (Optional[rdchem.Mol]) – Deserialized RDKit molecule for the pose. This may be None when molecule blobs are not loaded from the database.
pose_metadata (dict[str, Any]) – Free-form pose-level metadata stored with the poses table row.
score_data (dict[str, Any]) – Structured score payload stored in the related pose_scores row.
score_metadata (dict[str, Any]) – Additional metadata associated with the score payload.
interaction_summary (dict[str, list[str]]) – Optional grouped interaction summary in the form {interaction_type: [residue_id, ...]}.
interaction_details (dict[str, Any]) – Optional grouped detailed interaction payload, typically shaped like {interaction_type: {residue_id: [event, ...]}} or a similar nested structure.
created_at (str) – SQLite insertion timestamp for the pose row.
- Example:
>>> record = PoseRecord(
...     pose_db_id=1,
...     pose_id=None,
...     receptor_id="1M17",
...     ligand_id="erlotinib",
...     engine="qvina",
...     pose_rank=1,
...     affinity=-6.2,
...     mol=None,
...     pose_metadata={},
...     score_data={"affinity": -6.2},
...     score_metadata={},
... )
>>> record.pose_key
'1M17__erlotinib__qvina__pose1'
- property pose_key: str#
Return the best available human-readable pose key.
This property prefers the stored external pose_id when present. Otherwise, it deterministically reconstructs a logical key from the receptor identifier, ligand identifier, engine, and pose rank.
- Returns:
Stable pose key suitable for display, export, or downstream matching.
- Return type:
- Example:
>>> record.pose_key
'1M17__erlotinib__qvina__pose1'
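The fallback described for pose_key can be sketched with a small frozen dataclass. PoseIdentity is a hypothetical stand-in, not the library class; the key layout is inferred from the documented example output, and treating only None (rather than an empty string) as "absent" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PoseIdentity:
    """Hypothetical stand-in holding only the pose identity fields."""

    receptor_id: str
    ligand_id: str
    engine: str
    pose_rank: int
    pose_id: Optional[str] = None

    @property
    def pose_key(self) -> str:
        # Prefer the external id when one was stored.
        if self.pose_id is not None:
            return self.pose_id
        # Otherwise rebuild a deterministic key from the identity fields.
        return (
            f"{self.receptor_id}__{self.ligand_id}"
            f"__{self.engine}__pose{self.pose_rank}"
        )


logical = PoseIdentity("1M17", "erlotinib", "qvina", 1)
external = PoseIdentity("1M17", "erlotinib", "qvina", 1, pose_id="my-pose-001")
print(logical.pose_key)
print(external.pose_key)
```

Because the dataclass is frozen and the key depends only on its fields, the same record always yields the same key, which is what makes it usable for matching across exports.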
- class ScoreRecord(
- pose_db_id,
- pose_id,
- receptor_id,
- ligand_id,
- engine,
- pose_rank,
- affinity,
- score_data,
- metadata,
Bases: object
Immutable in-memory representation of a pose score row.
This record stores resolved pose identity fields together with a structured score payload and optional metadata. It is typically used for score-centric queries where loading the full pose molecule or interaction details is not necessary.
- Parameters:
pose_db_id (int) – Internal SQLite pose primary key.
pose_id (Optional[str]) – Optional external stable pose identifier.
receptor_id (str) – Receptor identifier resolved through the associated pose row.
ligand_id (str) – Ligand identifier resolved through the associated pose row.
engine (str) – Docking engine name resolved through the associated pose row.
pose_rank (int) – One-based pose rank mirrored from the pose row.
affinity (Optional[float]) – Primary affinity value, if available.
score_data (dict[str, Any]) – Structured score payload, for example containing raw engine-specific score terms.
metadata (dict[str, Any]) – Additional metadata associated with the score record.
- Example:
>>> record = ScoreRecord(
...     pose_db_id=1,
...     pose_id=None,
...     receptor_id="1M17",
...     ligand_id="erlotinib",
...     engine="vina",
...     pose_rank=2,
...     affinity=-7.1,
...     score_data={"affinity": -7.1, "cnn_pose": 0.82},
...     metadata={},
... )
>>> record.pose_key
'1M17__erlotinib__vina__pose2'
- property pose_key: str#
Return the best available human-readable pose key.
This property prefers the stored external pose_id when available. Otherwise, it reconstructs a deterministic logical key from the pose identity fields.
- Returns:
Stable pose key suitable for display and record matching.
- Return type:
- class InteractionRecord(
- interaction_id,
- pose_db_id,
- pose_id,
- receptor_id,
- ligand_id,
- engine,
- pose_rank,
- interaction_type,
- chain_id,
- residue_name,
- residue_number,
- residue_id,
- ligand_residue,
- occurrence_index,
- ligand_atom_indices,
- protein_atom_indices,
- ligand_parent_atom_indices,
- protein_parent_atom_indices,
- distance,
- angle,
- metadata,
- created_at,
Bases: object
Immutable in-memory representation of a pose interaction row.
Each instance represents one detailed interaction event associated with a specific docking pose. The record includes resolved pose identity fields, residue-level annotations, atom index mappings, geometric descriptors, and arbitrary extra metadata.
- Parameters:
interaction_id (int) – Internal SQLite integer primary key for the interaction row.
pose_db_id (int) – Foreign-key link to the associated pose row.
pose_id (Optional[str]) – Optional external stable pose identifier.
receptor_id (str) – Receptor identifier resolved through the associated pose row.
ligand_id (str) – Ligand identifier resolved through the associated pose row.
engine (str) – Docking engine name resolved through the associated pose row.
pose_rank (int) – One-based pose rank resolved through the associated pose row.
interaction_type (str) – Interaction label such as Hydrophobic, VdWContact, or HBDonor.
chain_id (Optional[str]) – Optional protein chain identifier.
residue_name (Optional[str]) – Optional residue name, for example LEU.
residue_number (Optional[int]) – Optional residue number, for example 149.
residue_id (Optional[str]) – Optional compact residue identifier such as LEU149.A.
ligand_residue (Optional[str]) – Optional ligand residue label such as LIG1.
occurrence_index (int) – Zero-based occurrence index for repeated interactions of the same type at the same residue.
ligand_atom_indices (list[int]) – Ligand atom indices participating directly in the interaction.
protein_atom_indices (list[int]) – Protein atom indices participating directly in the interaction.
ligand_parent_atom_indices (list[int]) – Parent ligand atom indices when available from the upstream interaction extractor.
protein_parent_atom_indices (list[int]) – Parent protein atom indices when available from the upstream interaction extractor.
distance (Optional[float]) – Optional interaction distance value.
angle (Optional[float]) – Optional interaction angle value.
metadata (dict[str, Any]) – Additional free-form metadata associated with the interaction event.
created_at (str) – SQLite insertion timestamp for the interaction row.
- Example:
>>> record = InteractionRecord(
...     interaction_id=1,
...     pose_db_id=10,
...     pose_id=None,
...     receptor_id="1M17",
...     ligand_id="erlotinib",
...     engine="qvina",
...     pose_rank=1,
...     interaction_type="Hydrophobic",
...     chain_id="A",
...     residue_name="LEU",
...     residue_number=149,
...     residue_id="LEU149.A",
...     ligand_residue="LIG1",
...     occurrence_index=0,
...     ligand_atom_indices=[2],
...     protein_atom_indices=[9],
...     ligand_parent_atom_indices=[2],
...     protein_parent_atom_indices=[2392],
...     distance=4.49,
...     angle=None,
...     metadata={},
...     created_at="2026-04-02 10:00:00",
... )
>>> record.pose_key
'1M17__erlotinib__qvina__pose1'
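The occurrence_index field can be understood as a running counter per pose, interaction type, and residue. The sketch below shows one way such indices could be assigned to a list of raw events; scoping the counter to a single pose and numbering events in input order are assumptions, and assign_occurrence_indices is a hypothetical helper.

```python
from collections import defaultdict


def assign_occurrence_indices(events):
    """Return copies of the events with a zero-based occurrence_index added."""
    counters = defaultdict(int)
    indexed = []
    for event in events:
        # Repeats are counted per pose, interaction type, and residue.
        key = (event["pose_db_id"], event["interaction_type"], event["residue_id"])
        indexed.append({**event, "occurrence_index": counters[key]})
        counters[key] += 1
    return indexed


# Invented events: two hydrophobic contacts at the same residue, one elsewhere.
raw_events = [
    {"pose_db_id": 10, "interaction_type": "Hydrophobic", "residue_id": "LEU149.A"},
    {"pose_db_id": 10, "interaction_type": "Hydrophobic", "residue_id": "LEU149.A"},
    {"pose_db_id": 10, "interaction_type": "HBDonor", "residue_id": "MET124.A"},
]
indexed_events = assign_occurrence_indices(raw_events)
for event in indexed_events:
    print(event["interaction_type"], event["residue_id"], event["occurrence_index"])
```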
- property pose_key: str#
Return the best available human-readable pose key.
This property prefers the stored external pose_id when available. Otherwise, it reconstructs a deterministic logical key from the pose identity fields.
- Returns:
Stable pose key suitable for grouping, display, and record matching.
- Return type: