Database API

Core database

class PoseDatabase(db_path, *, compress_mol=True, create=True, timeout=30.0)

Bases: object

SQLite database wrapper for docking pose, score, and interaction storage.

The wrapper exposes convenience APIs for three common workflows:

  1. Insert docking poses from row mappings or a pandas DataFrame

  2. Store interactions either row-by-row or from pose-keyed dictionaries

  3. Query poses, scores, and interactions with flexible filters

Tables

  • receptors: receptor dimension table

  • ligands: ligand dimension table

  • engines: docking engine dimension table

  • poses: pose identity, optional external pose_id, and molecules

  • pose_scores: affinity and score payloads

  • interactions: one row per interaction event or summary interaction

Without an external pose_id column, poses are identified by the logical unique key (receptor_id, ligand_id, engine, pose_rank). When an external pose_id is present, it is stored as well and can later be used to import interactions from pose-keyed dictionaries.
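The following minimal sketch shows how the logical key and an external pose_id can relate. The helpers `make_pose_id` and `parse_pose_id` are hypothetical. The `receptor__ligand__engine__pose<rank>` layout only copies the example ids used on this page, and ProDock may accept any string as a pose_id.

```python
# Hypothetical helpers: the "__" separator and "pose<N>" suffix copy the
# pose_id strings used in the examples on this page; ProDock does not
# necessarily require or enforce this format.
SEP = "__"


def make_pose_id(receptor_id: str, ligand_id: str, engine: str, pose_rank: int) -> str:
    """Derive an external pose_id from the logical key."""
    return SEP.join([receptor_id, ligand_id, engine, f"pose{pose_rank}"])


def parse_pose_id(pose_id: str) -> tuple:
    """Recover the logical key; assumes no component contains the separator."""
    receptor_id, ligand_id, engine, rank_part = pose_id.split(SEP)
    if not rank_part.startswith("pose"):
        raise ValueError(f"unexpected rank component: {rank_part!r}")
    return receptor_id, ligand_id, engine, int(rank_part[len("pose"):])


key = ("1M17", "erlotinib", "qvina", 1)
pose_id = make_pose_id(*key)
print(pose_id)  # 1M17__erlotinib__qvina__pose1
```

A deterministic id like this stays stable across re-imports. Interaction payloads keyed by it therefore still line up with the stored poses.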

Parameters:
  • db_path (PathLike) – SQLite database file path.

  • compress_mol (bool) – Whether serialized RDKit molecules should be compressed with zlib.

  • create (bool) – Whether to create the schema on initialization.

  • timeout (float) – SQLite connection timeout in seconds.
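The compress_mol option can be pictured as a zlib round trip around whatever bytes represent the molecule. This sketch rests on a few assumptions:

  • The helper names are hypothetical.
  • The serialization step is not documented here. With RDKit, `mol.ToBinary()` produces bytes and `Chem.Mol(blob)` rebuilds the molecule.
  • A plain byte string stands in for that output, so the example runs without RDKit.

```python
import zlib


def encode_mol_blob(serialized: bytes, compress_mol: bool = True) -> bytes:
    """Optionally zlib-compress already-serialized molecule bytes before storage."""
    return zlib.compress(serialized) if compress_mol else serialized


def decode_mol_blob(blob: bytes, compress_mol: bool = True) -> bytes:
    """Undo encode_mol_blob; the reader must know whether compression was used."""
    return zlib.decompress(blob) if compress_mol else blob


# Stand-in for real molecule bytes (with RDKit these could come from mol.ToBinary()).
raw = b"C1=CC=CC=C1 " * 50
stored = encode_mol_blob(raw)
assert decode_mol_blob(stored) == raw
print(len(raw), "->", len(stored))
```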

Example

from prodock.database import PoseDatabase

db = PoseDatabase("poses.sqlite")
db.insert_dataframe(df)
db.upsert_interaction_payload(interactions_by_pose)

fp = db.interaction_fingerprint(mode="binary")

property connection: Connection

Return the underlying SQLite connection.

Returns:

Active SQLite connection.

Return type:

sqlite3.Connection

close()

Close the active SQLite connection.

Returns:

None

Return type:

None

create_schema()

Create the database schema if it does not yet exist.

Returns:

None

Return type:

None

Example

db = PoseDatabase("poses.sqlite", create=False)
db.create_schema()

upsert_pose(
*,
receptor_id,
ligand_id,
engine,
pose_rank,
affinity,
mol,
pose_id=None,
pose_metadata=None,
score_data=None,
score_metadata=None,
receptor_metadata=None,
ligand_metadata=None,
engine_metadata=None,
)

Insert or update one docking pose and its score row.

Parameters:
  • receptor_id (str) – Receptor identifier.

  • ligand_id (str) – Ligand identifier.

  • engine (str) – Docking engine name.

  • pose_rank (int) – Pose rank within the receptor-ligand-engine group.

  • affinity (Optional[float]) – Primary affinity score.

  • mol (rdchem.Mol) – RDKit molecule to store.

  • pose_id (Optional[str]) – Optional external stable pose identifier.

  • pose_metadata (Optional[Mapping[str, Any]]) – Optional pose metadata payload.

  • score_data (Optional[Mapping[str, Any]]) – Optional structured score payload.

  • score_metadata (Optional[Mapping[str, Any]]) – Optional score metadata payload.

  • receptor_metadata (Optional[Mapping[str, Any]]) – Optional receptor metadata.

  • ligand_metadata (Optional[Mapping[str, Any]]) – Optional ligand metadata.

  • engine_metadata (Optional[Mapping[str, Any]]) – Optional engine metadata.

Returns:

Internal pose_db_id.

Return type:

int

Example

from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")
pose_db_id = db.upsert_pose(
    receptor_id="1M17",
    ligand_id="erlotinib",
    engine="qvina",
    pose_rank=1,
    affinity=-8.1,
    mol=mol,
    pose_id="1M17__erlotinib__qvina__pose1",
)

insert_many(rows, *, replace=True)

Insert many pose rows inside one transaction.

Each row should contain at least receptor_id, ligand_id, engine, pose_rank, affinity, and mol. An optional external string pose_id is supported.

Parameters:
  • rows (Iterable[Mapping[str, Any]]) – Iterable of row-like mappings.

  • replace (bool) – Whether to upsert existing logical keys.

Returns:

None

Return type:

None

Example

rows = [
    {
        "pose_id": "1M17__erol__qvina__pose1",
        "receptor_id": "1M17",
        "ligand_id": "erol",
        "engine": "qvina",
        "pose_rank": 1,
        "affinity": -8.2,
        "mol": mol,
    }
]
db.insert_many(rows, replace=True)

insert_dataframe(
df,
*,
replace=True,
interactions_by_pose=None,
replace_interactions=True,
)

Insert a pandas DataFrame of docking poses.

Required columns are receptor_id, ligand_id, engine, pose_rank, affinity, and mol. An optional pose_id column is stored when present.

If interactions_by_pose is supplied, it must be keyed by the stored external pose_id values.

Parameters:
  • df (pd.DataFrame) – Input DataFrame.

  • replace (bool) – Whether existing pose rows should be updated.

  • interactions_by_pose (Optional[Mapping[str, Mapping[str, Any]]]) – Optional interaction payload keyed by external pose_id.

  • replace_interactions (bool) – Whether existing interactions for affected poses should first be deleted.

Returns:

None

Return type:

None

Raises:

ValueError – If required DataFrame columns are missing.
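Here is a sketch of the column check that the ValueError above describes. It assumes the check compares against the six required names. `check_required_columns` is a hypothetical helper, not part of the documented API.

```python
from typing import Iterable

# Required columns as listed for insert_dataframe(); pose_id is optional.
REQUIRED_COLUMNS = ("receptor_id", "ligand_id", "engine", "pose_rank", "affinity", "mol")


def check_required_columns(columns: Iterable[str]) -> None:
    """Raise ValueError naming every required column that is absent."""
    present = set(columns)
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"DataFrame is missing required columns: {missing}")


# With pandas you would pass df.columns; a plain list behaves the same way.
check_required_columns(["pose_id", *REQUIRED_COLUMNS])  # passes silently
try:
    check_required_columns(["receptor_id", "ligand_id", "mol"])
except ValueError as exc:
    print(exc)
```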

Example

db.insert_dataframe(df, replace=True)

classmethod from_dataframe(
db_path,
df,
*,
compress_mol=True,
replace=True,
interactions_by_pose=None,
replace_interactions=True,
)

Build a new database file from a DataFrame.

Parameters:
  • db_path (PathLike) – Output SQLite file path.

  • df (pd.DataFrame) – Input DataFrame containing docking poses.

  • compress_mol (bool) – Whether stored molecule blobs should be compressed.

  • replace (bool) – Whether duplicate logical keys should be updated.

  • interactions_by_pose (Optional[Mapping[str, Mapping[str, Any]]]) – Optional interaction payloads keyed by external pose id.

  • replace_interactions (bool) – Whether to replace existing interactions when interaction payloads are supplied.

Returns:

Initialized database instance.

Return type:

PoseDatabase

query_poses(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
top_rank=None,
affinity_threshold=None,
affinity_min=None,
interaction_type=None,
residue_id=None,
chain_id=None,
residue_name=None,
residue_number=None,
include_mol=True,
include_interactions=False,
interaction_mode='summary',
as_dataframe=False,
order_by=None,
limit=None,
)

Query poses using flexible logical and interaction-aware filters.

Interaction filters can be used to return only poses that contain particular interactions, for example interaction_type="Hydrophobic" and residue_id="LEU23.A".

If include_interactions is enabled, pose rows are enriched with either summary or detailed interaction payloads.

Parameters:
  • pose_db_id (Optional[int]) – Optional internal pose id filter.

  • pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id or sequence of ids.

  • receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor id or sequence of receptor ids.

  • ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand id or sequence of ligand ids.

  • engine (Optional[Union[str, Sequence[str]]]) – Optional engine name or sequence of engine names.

  • pose_rank (Optional[int]) – Optional exact pose rank.

  • top_rank (Optional[int]) – Optional maximum pose rank to keep.

  • affinity_threshold (Optional[float]) – Optional maximum affinity threshold.

  • affinity_min (Optional[float]) – Optional minimum affinity threshold.

  • interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction type filter.

  • residue_id (Optional[Union[str, Sequence[str]]]) – Optional residue id filter such as "LEU23.A".

  • chain_id (Optional[Union[str, Sequence[str]]]) – Optional chain filter.

  • residue_name (Optional[Union[str, Sequence[str]]]) – Optional residue-name filter.

  • residue_number (Optional[int]) – Optional residue-number filter.

  • include_mol (bool) – Whether deserialized RDKit molecules should be included.

  • include_interactions (bool) – Whether interaction payloads should be attached.

  • interaction_mode (str) – Interaction payload style, either "summary" or "detailed".

  • as_dataframe (bool) – Whether to return a pandas DataFrame instead of dataclass records.

  • order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition passed to resolve_order_by().

  • limit (Optional[int]) – Optional maximum number of returned rows.

Returns:

List of PoseRecord objects or a pandas DataFrame.

Return type:

Union[list[PoseRecord], pd.DataFrame]
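The filter parameters typed Optional[Union[str, Sequence[str]]] accept either one value or several. The sketch below shows one plausible translation into SQL. It is hypothetical, not ProDock's actual query builder:

  • A single value becomes an equality test.
  • A sequence becomes an IN list.
  • None adds no constraint.
  • Separate filters are combined with AND.

```python
import sqlite3
from typing import Any, Sequence, Union

Scalar = Union[str, int, float]


def filter_clause(column: str, value: Union[None, Scalar, Sequence[Scalar]]) -> tuple:
    """Map one filter argument to a SQL fragment and its bound parameters."""
    if value is None:
        return "", []                          # filter not supplied: no constraint
    if isinstance(value, (str, int, float)):
        return f"{column} = ?", [value]        # single value: equality
    values = list(value)                       # sequence: membership test
    return f"{column} IN ({', '.join('?' for _ in values)})", values


def where_clause(**filters: Any) -> tuple:
    """AND together all supplied filters. Column names must come from trusted code."""
    parts, params = [], []
    for column, value in filters.items():
        sql, args = filter_clause(column, value)
        if sql:
            parts.append(sql)
            params.extend(args)
    return (" WHERE " + " AND ".join(parts) if parts else ""), params


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE poses (receptor_id TEXT, engine TEXT, pose_rank INTEGER)")
con.executemany(
    "INSERT INTO poses VALUES (?, ?, ?)",
    [("1M17", "qvina", 1), ("1M17", "smina", 1), ("2ITO", "qvina", 2)],
)
sql, params = where_clause(receptor_id="1M17", engine=["qvina", "smina"], pose_rank=None)
n_matches = con.execute("SELECT COUNT(*) FROM poses" + sql, params).fetchone()[0]
print(sql, params, n_matches)
```

Values are always passed as bound parameters. Only column names, which come from the fixed keyword set, are interpolated into the SQL text.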

get_pose(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
include_mol=True,
include_interactions=False,
interaction_mode='summary',
)

Fetch one exact pose by internal id, external id, or logical key.

Parameters:
  • pose_db_id (Optional[int]) – Internal pose id.

  • pose_id (Optional[str]) – External stable pose id.

  • receptor_id (Optional[str]) – Receptor identifier.

  • ligand_id (Optional[str]) – Ligand identifier.

  • engine (Optional[str]) – Engine name.

  • pose_rank (Optional[int]) – Pose rank within the receptor-ligand-engine group.

  • include_mol (bool) – Whether to include the RDKit molecule.

  • include_interactions (bool) – Whether to attach interactions.

  • interaction_mode (str) – "summary" or "detailed".

Returns:

Matching pose or None if no match exists.

Return type:

Optional[PoseRecord]

query_scores(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
top_rank=None,
affinity_threshold=None,
affinity_min=None,
as_dataframe=False,
order_by=None,
limit=None,
)

Query the dedicated pose_scores table joined to pose identity.

Parameters:
  • pose_db_id (Optional[int]) – Optional internal pose id filter.

  • pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id or sequence of ids.

  • receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor id filter.

  • ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand id filter.

  • engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • top_rank (Optional[int]) – Optional maximum pose rank.

  • affinity_threshold (Optional[float]) – Optional maximum affinity threshold.

  • affinity_min (Optional[float]) – Optional minimum affinity threshold.

  • as_dataframe (bool) – Whether to return a DataFrame.

  • order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition.

  • limit (Optional[int]) – Optional maximum number of rows.

Returns:

List of ScoreRecord or a DataFrame.

Return type:

Union[list[ScoreRecord], pd.DataFrame]

count_poses(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
top_rank=None,
affinity_threshold=None,
affinity_min=None,
interaction_type=None,
residue_id=None,
chain_id=None,
residue_name=None,
residue_number=None,
)

Count poses matching the supplied filters.

Returns:

Number of matching pose rows.

Return type:

int

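Parameters:
  • pose_db_id, pose_id, receptor_id, ligand_id, engine, pose_rank, top_rank, affinity_threshold, affinity_min, interaction_type, residue_id, chain_id, residue_name, residue_number – Same filters, with the same meanings, as in query_poses().

add_interaction(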
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
interaction_type,
chain_id=None,
residue_name=None,
residue_number=None,
residue_id=None,
ligand_residue=None,
occurrence_index=0,
ligand_atom_indices=None,
protein_atom_indices=None,
ligand_parent_atom_indices=None,
protein_parent_atom_indices=None,
distance=None,
angle=None,
metadata=None,
replace=False,
)

Insert one interaction linked to a stored pose.

A single row can represent either a summarized interaction or one detailed interaction event.

Parameters:
  • pose_db_id (Optional[int]) – Internal pose id.

  • pose_id (Optional[str]) – External stable pose id.

  • receptor_id (Optional[str]) – Receptor id for logical-key lookup.

  • ligand_id (Optional[str]) – Ligand id for logical-key lookup.

  • engine (Optional[str]) – Engine name for logical-key lookup.

  • pose_rank (Optional[int]) – Pose rank for logical-key lookup.

  • interaction_type (str) – Interaction family, for example "Hydrophobic".

  • chain_id (Optional[str]) – Protein chain identifier.

  • residue_name (Optional[str]) – Residue name, for example "LEU".

  • residue_number (Optional[int]) – Residue sequence number.

  • residue_id (Optional[str]) – Combined residue id such as "LEU23.A".

  • ligand_residue (Optional[str]) – Ligand residue identifier if available.

  • occurrence_index (int) – Zero-based event index within one pose / residue / interaction type.

  • ligand_atom_indices (Optional[Sequence[int]]) – Ligand atom indices for the specific event.

  • protein_atom_indices (Optional[Sequence[int]]) – Protein atom indices for the specific event.

  • ligand_parent_atom_indices (Optional[Sequence[int]]) – Parent ligand atom indices when available.

  • protein_parent_atom_indices (Optional[Sequence[int]]) – Parent protein atom indices when available.

  • distance (Optional[float]) – Optional interaction distance.

  • angle (Optional[float]) – Optional interaction angle.

  • metadata (Optional[Mapping[str, Any]]) – Optional arbitrary metadata payload.

  • replace (bool) – Whether an existing unique interaction row should be updated.
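add_interaction() accepts both the combined residue_id and its parts (residue_name, residue_number, chain_id). This page does not say whether one is derived from the other. The hypothetical helpers below assume the `<name><number>.<chain>` layout of the "LEU23.A" example and keep the two representations consistent.

```python
import re

# Assumed layout "<name><number>.<chain>", inferred from the "LEU23.A" example;
# insertion codes and other residue-id formats are not handled.
RESIDUE_ID_RE = re.compile(r"^(?P<name>[A-Za-z]+)(?P<number>-?\d+)\.(?P<chain>[A-Za-z0-9]+)$")


def split_residue_id(residue_id: str) -> dict:
    """Split a combined residue id into residue_name, residue_number and chain_id."""
    match = RESIDUE_ID_RE.match(residue_id)
    if match is None:
        raise ValueError(f"unrecognised residue id: {residue_id!r}")
    return {
        "residue_name": match["name"],
        "residue_number": int(match["number"]),
        "chain_id": match["chain"],
    }


def join_residue_id(residue_name: str, residue_number: int, chain_id: str) -> str:
    """Inverse of split_residue_id."""
    return f"{residue_name}{residue_number}.{chain_id}"


fields = split_residue_id("LEU23.A")
print(fields)
```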

Returns:

New interaction identifier.

Return type:

int

delete_interactions_for_pose(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
)

Delete all interactions linked to a single pose.

Returns:

Number of deleted interaction rows.

Return type:

int

Parameters:
  • pose_db_id (int | None)

  • pose_id (str | None)

  • receptor_id (str | None)

  • ligand_id (str | None)

  • engine (str | None)

  • pose_rank (int | None)

upsert_interaction_payload(interactions_by_pose, *, replace=True)

Insert interaction payloads keyed by external pose_id.

Supported payload formats per pose are:

  • summary: {"Hydrophobic": ["LEU23.A", "VAL31.A"]}

  • detailed: {"Hydrophobic": {"LEU23.A": [{...}, {...}]}}

None or empty payloads are treated as “no interactions”: with replace=True, the pose's existing interactions are deleted and no new rows are inserted.
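The two payload shapes can be flattened into interaction rows of the kind insert_interactions() accepts. This sketch is not ProDock's parser:

  • `flatten_payload` is hypothetical.
  • It assumes each detailed event is a dict whose keys reuse add_interaction() field names such as distance.
  • Detailed events get a zero-based occurrence_index per pose, interaction type and residue.
  • Summary entries get index 0.

```python
from collections.abc import Mapping
from typing import Any, Optional


def flatten_payload(pose_id: str, payload: Optional[Mapping[str, Any]]) -> list:
    """Turn one pose's summary or detailed payload into interaction row dicts."""
    rows = []
    for interaction_type, body in (payload or {}).items():
        if isinstance(body, Mapping):
            # Detailed format: {residue_id: [event, ...]}; one row per event.
            for residue_id, events in body.items():
                for occurrence_index, event in enumerate(events):
                    rows.append({**event, "pose_id": pose_id,
                                 "interaction_type": interaction_type,
                                 "residue_id": residue_id,
                                 "occurrence_index": occurrence_index})
        else:
            # Summary format: [residue_id, ...]; one row per residue.
            for residue_id in body:
                rows.append({"pose_id": pose_id, "interaction_type": interaction_type,
                             "residue_id": residue_id, "occurrence_index": 0})
    return rows


summary_rows = flatten_payload("p1", {"Hydrophobic": ["LEU23.A", "VAL31.A"]})
detailed_rows = flatten_payload(
    "p1", {"Hydrophobic": {"LEU23.A": [{"distance": 3.8}, {"distance": 4.1}]}}
)
print(detailed_rows)
```

An empty or None payload yields no rows. This matches the "no interactions" behaviour described above.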

Parameters:
  • interactions_by_pose (Mapping[str, Optional[Mapping[str, Any]]]) – Mapping from external pose_id to interaction payload.

  • replace (bool) – Whether existing interactions for each affected pose should first be deleted.

Returns:

None

Return type:

None

insert_interactions(rows, *, replace=False)

Insert many interaction rows inside one transaction.

Each row must provide either pose_db_id, external pose_id, or the full logical pose key.

Parameters:
  • rows (Iterable[Mapping[str, Any]]) – Iterable of interaction row mappings.

  • replace (bool) – Whether conflicting interaction rows should be updated.

Returns:

None

Return type:

None

query_interactions(
*,
interaction_id=None,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
interaction_type=None,
chain_id=None,
residue_name=None,
residue_number=None,
residue_id=None,
ligand_residue=None,
as_dataframe=False,
order_by=None,
limit=None,
)

Query stored interactions using pose-level and residue-level filters.

Parameters:
  • interaction_id (Optional[int]) – Optional interaction primary-key filter.

  • pose_db_id (Optional[int]) – Optional internal pose id filter.

  • pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id or sequence of ids.

  • receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.

  • ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.

  • engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction type filter.

  • chain_id (Optional[Union[str, Sequence[str]]]) – Optional chain filter.

  • residue_name (Optional[Union[str, Sequence[str]]]) – Optional residue-name filter.

  • residue_number (Optional[int]) – Optional residue-number filter.

  • residue_id (Optional[Union[str, Sequence[str]]]) – Optional combined residue-id filter.

  • ligand_residue (Optional[Union[str, Sequence[str]]]) – Optional ligand residue filter.

  • as_dataframe (bool) – Whether to return a DataFrame.

  • order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition.

  • limit (Optional[int]) – Optional maximum number of rows.

Returns:

List of InteractionRecord or a DataFrame.

Return type:

Union[list[InteractionRecord], pd.DataFrame]

get_interaction_summary(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
interaction_type=None,
residue_id=None,
return_by='pose_key',
)

Return summarized interactions grouped by pose.

The output payload is compatible with the compact interaction format: {pose_key: {interaction_type: [residue_id, ...]}}.

Parameters:
  • pose_db_id (Optional[Union[int, Sequence[int]]]) – Optional pose id or sequence of internal pose ids.

  • pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id filter.

  • receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.

  • ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.

  • engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction-type filter.

  • residue_id (Optional[Union[str, Sequence[str]]]) – Optional residue-id filter.

  • return_by (str) – One of "pose_db_id", "pose_id", or "pose_key".

Returns:

Nested summary mapping grouped by pose.

Return type:

dict[Union[int, str], dict[str, list[str]]]

get_interaction_details(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
interaction_type=None,
residue_id=None,
return_by='pose_key',
)

Return detailed interactions grouped by pose.

The output mirrors the nested detailed format: {pose_key: {interaction_type: {residue_id: [event, ...]}}}.

Parameters:
  • pose_db_id (Optional[Union[int, Sequence[int]]]) – Optional pose id or sequence of internal pose ids.

  • pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id filter.

  • receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.

  • ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.

  • engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction-type filter.

  • residue_id (Optional[Union[str, Sequence[str]]]) – Optional residue-id filter.

  • return_by (str) – One of "pose_db_id", "pose_id", or "pose_key".

Returns:

Nested detailed mapping grouped by pose.

Return type:

dict[Union[int, str], dict[str, dict[str, list[dict[str, Any]]]]]

interaction_fingerprint(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
interaction_type=None,
residue_id=None,
mode='binary',
feature_sep='::',
index_by='pose_key',
)

Build a pose-by-feature interaction fingerprint matrix.

Features are named as <interaction_type><feature_sep><residue_id>.

Parameters:
  • pose_db_id (Optional[Union[int, Sequence[int]]]) – Optional pose id or sequence of internal pose ids.

  • pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id filter.

  • receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.

  • ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.

  • engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction-type filter.

  • residue_id (Optional[Union[str, Sequence[str]]]) – Optional residue-id filter.

  • mode (str) – Either "binary" or "count".

  • feature_sep (str) – Separator used when building feature names.

  • index_by (str) – One of "pose_db_id", "pose_id", or "pose_key".

Returns:

Fingerprint matrix as a pandas DataFrame.

Return type:

pd.DataFrame

Raises:

ValueError – If mode or index_by is invalid.
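This standard-library sketch shows how a pose-by-feature matrix can be derived from per-pose payloads. Feature names are built as <interaction_type><feature_sep><residue_id>. `fingerprint_rows` is hypothetical. It assumes that "count" means the number of stored events per feature, with each summary entry counting once.

```python
from collections import Counter
from collections.abc import Mapping


def fingerprint_rows(payloads, mode="binary", feature_sep="::"):
    """Return (features, {pose_key: [value per feature]}) from per-pose payloads."""
    if mode not in ("binary", "count"):
        raise ValueError(f"mode must be 'binary' or 'count', got {mode!r}")
    counts = {}
    for pose_key, payload in payloads.items():
        counter = Counter()
        for interaction_type, body in (payload or {}).items():
            if isinstance(body, Mapping):      # detailed: count events per residue
                for residue_id, events in body.items():
                    counter[f"{interaction_type}{feature_sep}{residue_id}"] += len(events)
            else:                              # summary: each residue counts once
                for residue_id in body:
                    counter[f"{interaction_type}{feature_sep}{residue_id}"] += 1
        counts[pose_key] = counter
    # Feature columns: every feature seen at least once, in sorted order.
    features = sorted({f for c in counts.values() for f, n in c.items() if n > 0})
    rows = {
        pose_key: [c[f] if mode == "count" else int(c[f] > 0) for f in features]
        for pose_key, c in counts.items()
    }
    return features, rows


payloads = {
    "p1": {"Hydrophobic": {"LEU23.A": [{}, {}]}},
    "p2": {"Hydrophobic": ["LEU23.A", "VAL31.A"], "HBDonor": ["THR90.A"]},
}
features, binary = fingerprint_rows(payloads)
_, counts = fingerprint_rows(payloads, mode="count")
print(features, binary, counts)
```

To get a DataFrame shaped like the documented return value, pass the result to `pd.DataFrame.from_dict(binary, orient="index", columns=features)`.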

summarize()

Return one summary row per receptor-ligand-engine group.

Returns:

Summary DataFrame with pose counts and best affinity values.

Return type:

pd.DataFrame
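The per-group summary can be approximated with a single GROUP BY. This example makes several assumptions for illustration:

  • The flat table and its column names are simplifications. The real schema spreads these fields over several tables.
  • "Best affinity" is read as the minimum (most negative) value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified single-table layout; the real schema splits receptors, ligands,
# engines, poses, and pose_scores into separate tables.
con.execute(
    "CREATE TABLE poses (receptor_id TEXT, ligand_id TEXT, engine TEXT,"
    " pose_rank INTEGER, affinity REAL)"
)
con.executemany(
    "INSERT INTO poses VALUES (?, ?, ?, ?, ?)",
    [
        ("1M17", "erlotinib", "qvina", 1, -8.1),
        ("1M17", "erlotinib", "qvina", 2, -7.4),
        ("1M17", "erlotinib", "smina", 1, -7.9),
    ],
)
SUMMARY_SQL = """
SELECT receptor_id, ligand_id, engine,
       COUNT(*)      AS n_poses,
       MIN(affinity) AS best_affinity  -- assumes lower (more negative) is better
FROM poses
GROUP BY receptor_id, ligand_id, engine
ORDER BY receptor_id, ligand_id, engine
"""
summary = con.execute(SUMMARY_SQL).fetchall()
for row in summary:
    print(row)
```

With pandas installed, `pd.read_sql_query(SUMMARY_SQL, con)` returns the same rows as a DataFrame.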

list_receptors()

List all receptor identifiers present in the database.

Returns:

Sorted receptor identifiers.

Return type:

list[str]

list_ligands()

List all ligand identifiers present in the database.

Returns:

Sorted ligand identifiers.

Return type:

list[str]

list_engines()

List all docking engine names present in the database.

Returns:

Sorted engine names.

Return type:

list[str]

delete_poses(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
top_rank=None,
affinity_threshold=None,
affinity_min=None,
)

Delete poses matching the supplied filters.

At least one filter must be provided to prevent accidental deletion of the entire table.

Parameters:
  • pose_db_id (Optional[int]) – Optional internal pose id filter.

  • pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id or sequence of ids.

  • receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.

  • ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.

  • engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • top_rank (Optional[int]) – Optional maximum pose rank.

  • affinity_threshold (Optional[float]) – Optional maximum affinity threshold.

  • affinity_min (Optional[float]) – Optional minimum affinity threshold.

Returns:

Number of deleted pose rows.

Return type:

int

Raises:

ValueError – If no filters are provided.
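This sketch shows the no-filter guard together with one possible mapping from filters to comparisons. The `delete_poses` function here is a hypothetical stand-in that operates on a simplified poses table. Treating top_rank, affinity_threshold and affinity_min as inclusive bounds is an assumption drawn from the "maximum"/"minimum" wording above.

```python
import sqlite3

# Assumed comparison per filter; inclusive bounds are a guess, not documented.
COMPARISONS = {
    "receptor_id": "receptor_id = ?",
    "ligand_id": "ligand_id = ?",
    "engine": "engine = ?",
    "pose_rank": "pose_rank = ?",
    "top_rank": "pose_rank <= ?",
    "affinity_threshold": "affinity <= ?",
    "affinity_min": "affinity >= ?",
}


def delete_poses(con: sqlite3.Connection, **filters) -> int:
    """Delete matching rows; refuse to run without at least one active filter."""
    unknown = set(filters) - set(COMPARISONS)
    if unknown:
        raise TypeError(f"unknown filters: {sorted(unknown)}")
    active = [(COMPARISONS[name], value) for name, value in filters.items() if value is not None]
    if not active:
        raise ValueError("at least one filter is required")
    # Only fixed SQL fragments from COMPARISONS are interpolated; values are bound.
    where = " AND ".join(sql for sql, _ in active)
    with con:  # commits on success, rolls back if the statement fails
        cursor = con.execute(f"DELETE FROM poses WHERE {where}", [v for _, v in active])
    return cursor.rowcount


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE poses (receptor_id TEXT, engine TEXT, pose_rank INTEGER, affinity REAL)")
con.executemany(
    "INSERT INTO poses VALUES (?, ?, ?, ?)",
    [("1M17", "qvina", 1, -8.1), ("1M17", "qvina", 2, -7.4), ("1M17", "smina", 1, -7.9)],
)
deleted = delete_poses(con, engine="qvina", affinity_min=-7.5)
print(deleted)
```

Passing only None values counts as "no filters". This mirrors the documented protection against wiping the whole table.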

delete_interactions(
*,
interaction_id=None,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
interaction_type=None,
chain_id=None,
residue_name=None,
residue_number=None,
residue_id=None,
ligand_residue=None,
)

Delete interactions matching the supplied filters.

Parameters:
  • interaction_id (Optional[int]) – Optional interaction primary-key filter.

  • pose_db_id (Optional[int]) – Optional pose id filter.

  • pose_id (Optional[Union[str, Sequence[str]]]) – Optional external pose id filter.

  • receptor_id (Optional[Union[str, Sequence[str]]]) – Optional receptor filter.

  • ligand_id (Optional[Union[str, Sequence[str]]]) – Optional ligand filter.

  • engine (Optional[Union[str, Sequence[str]]]) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • interaction_type (Optional[Union[str, Sequence[str]]]) – Optional interaction-type filter.

  • chain_id (Optional[Union[str, Sequence[str]]]) – Optional chain filter.

  • residue_name (Optional[Union[str, Sequence[str]]]) – Optional residue-name filter.

  • residue_number (Optional[int]) – Optional residue-number filter.

  • residue_id (Optional[Union[str, Sequence[str]]]) – Optional residue-id filter.

  • ligand_residue (Optional[Union[str, Sequence[str]]]) – Optional ligand-residue filter.

Returns:

Number of deleted interaction rows.

Return type:

int

Raises:

ValueError – If no filters are provided.

vacuum()

Run SQLite VACUUM to compact the database file.

Returns:

None

Return type:

None


Query database

class PoseQuery(db_path=None, *, connection=None, timeout=30.0, read_only=True)

Bases: object

Standalone query client for an existing ProDock SQLite database.

The class opens an existing database file or attaches to an existing SQLite connection, then provides read/query helpers for stored poses, score rows, interactions, interaction summaries, and fingerprint matrices.

Parameters:
  • db_path (Optional[PathLike]) – Path to an existing ProDock SQLite database. Required when connection is not supplied.

  • connection (Optional[sqlite3.Connection]) – Existing SQLite connection to reuse.

  • timeout (float) – SQLite connection timeout in seconds.

  • read_only (bool) – Whether to open db_path in SQLite read-only mode.
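read_only=True plausibly relies on SQLite's URI `mode=ro` flag, but this page does not say how ProDock implements it. The sketch below demonstrates that mechanism with the standard sqlite3 module. `open_read_only` and its FileNotFoundError check are hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path, timeout: float = 30.0) -> sqlite3.Connection:
    """Open an existing SQLite file through a URI with mode=ro."""
    path = Path(db_path)
    if not path.exists():  # mode=ro never creates a file; fail early and clearly
        raise FileNotFoundError(path)
    return sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True, timeout=timeout)


with tempfile.TemporaryDirectory() as tmp:
    db_file = os.path.join(tmp, "prodock.db")
    writer = sqlite3.connect(db_file)
    writer.execute("CREATE TABLE t (x INTEGER)")
    writer.execute("INSERT INTO t VALUES (1)")
    writer.commit()
    writer.close()

    reader = open_read_only(db_file)
    value = reader.execute("SELECT x FROM t").fetchone()[0]
    try:
        reader.execute("INSERT INTO t VALUES (2)")  # rejected on a read-only handle
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True
    reader.close()

print(value, write_blocked)
```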

Raises:

Example

from prodock.database import PoseQuery

q = PoseQuery("prodock.db")
df = q.poses(as_dataframe=True)
print(df.head())

property connection: Connection

Return the underlying SQLite connection.

Returns:

Active SQLite connection.

Return type:

sqlite3.Connection

close()

Close the connection owned by this query object.

Connections passed in through connection=... are not closed here.

Returns:

None

Return type:

None
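The ownership rule in close() can be expressed with a flag recorded at construction time. `OwnedConnectionClient` is a hypothetical stand-in, not PoseQuery itself. Raising ValueError when neither argument is given is an assumption based on db_path being "required when connection is not supplied".

```python
import sqlite3
from typing import Optional


class OwnedConnectionClient:
    """Hypothetical stand-in that follows the close() rule described for PoseQuery."""

    def __init__(self, db_path: Optional[str] = None, *,
                 connection: Optional[sqlite3.Connection] = None) -> None:
        if connection is None and db_path is None:
            raise ValueError("db_path is required when connection is not supplied")
        # Remember whether this object opened the connection itself.
        self._owns_connection = connection is None
        self._connection = connection if connection is not None else sqlite3.connect(db_path)

    @property
    def connection(self) -> sqlite3.Connection:
        return self._connection

    def close(self) -> None:
        # Only close what this object opened; a borrowed connection stays usable.
        if self._owns_connection:
            self._connection.close()


shared = sqlite3.connect(":memory:")
borrowing = OwnedConnectionClient(connection=shared)
borrowing.close()
print(shared.execute("SELECT 1").fetchone())  # the shared connection is still open

owning = OwnedConnectionClient(":memory:")
owning.close()  # this one opened its own connection, so it is closed here
```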

poses(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
top_rank=None,
affinity_threshold=None,
affinity_min=None,
interaction_type=None,
residue_id=None,
chain_id=None,
residue_name=None,
residue_number=None,
include_mol=True,
include_interactions=False,
interaction_mode='summary',
as_dataframe=False,
order_by=None,
limit=None,
)

Query poses using logical and interaction-aware filters.

Interaction filters can be used to return only poses that contain particular interaction patterns, for example interaction_type="Hydrophobic" and residue_id="LEU23.A".

When include_interactions is enabled, each returned pose is enriched with either a compact summary payload or a detailed nested interaction payload.

Parameters:
  • pose_db_id (Optional[int]) – Optional internal pose id filter.

  • pose_id (FilterStr) – Optional external pose id or sequence of ids.

  • receptor_id (FilterStr) – Optional receptor id or sequence of receptor ids.

  • ligand_id (FilterStr) – Optional ligand id or sequence of ligand ids.

  • engine (FilterStr) – Optional engine name or sequence of engine names.

  • pose_rank (Optional[int]) – Optional exact pose rank.

  • top_rank (Optional[int]) – Optional maximum pose rank to keep.

  • affinity_threshold (Optional[float]) – Optional maximum affinity threshold.

  • affinity_min (Optional[float]) – Optional minimum affinity threshold.

  • interaction_type (FilterStr) – Optional interaction type filter.

  • residue_id (FilterStr) – Optional residue id filter such as "LEU23.A".

  • chain_id (FilterStr) – Optional chain filter.

  • residue_name (FilterStr) – Optional residue-name filter.

  • residue_number (Optional[int]) – Optional residue-number filter.

  • include_mol (bool) – Whether deserialized RDKit molecules should be included.

  • include_interactions (bool) – Whether interaction payloads should be attached.

  • interaction_mode (str) – Interaction payload style, either "summary" or "detailed".

  • as_dataframe (bool) – Whether to return a pandas DataFrame instead of dataclass records.

  • order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition passed to resolve_order_by().

  • limit (Optional[int]) – Optional maximum number of returned rows.

Returns:

List of PoseRecord objects or a pandas DataFrame.

Return type:

Union[list[PoseRecord], pd.DataFrame]

Example

q = PoseQuery("prodock.db")

df = q.poses(
    receptor_id="1M17",
    top_rank=3,
    include_interactions=True,
    interaction_mode="summary",
    as_dataframe=True,
)
print(df[["pose_id", "affinity", "interaction_summary"]].head())

pose(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
include_mol=True,
include_interactions=False,
interaction_mode='summary',
)

Fetch one exact pose by internal id, external id, or logical key.

Parameters:
  • pose_db_id (Optional[int]) – Internal pose id.

  • pose_id (Optional[str]) – External stable pose id.

  • receptor_id (Optional[str]) – Receptor identifier.

  • ligand_id (Optional[str]) – Ligand identifier.

  • engine (Optional[str]) – Engine name.

  • pose_rank (Optional[int]) – Pose rank within the receptor-ligand-engine group.

  • include_mol (bool) – Whether to include the RDKit molecule.

  • include_interactions (bool) – Whether to attach interactions.

  • interaction_mode (str) – "summary" or "detailed".

Returns:

Matching pose or None if no match exists.

Return type:

Optional[PoseRecord]

Example

q = PoseQuery("prodock.db")

pose = q.pose(
    receptor_id="1M17",
    ligand_id="erlotinib",
    engine="qvina",
    pose_rank=1,
)
print(pose)

scores(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
top_rank=None,
affinity_threshold=None,
affinity_min=None,
as_dataframe=False,
order_by=None,
limit=None,
)

Query score rows joined to pose identity.

Parameters:
  • pose_db_id (Optional[int]) – Optional internal pose id filter.

  • pose_id (FilterStr) – Optional external pose id or sequence of ids.

  • receptor_id (FilterStr) – Optional receptor id filter.

  • ligand_id (FilterStr) – Optional ligand id filter.

  • engine (FilterStr) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • top_rank (Optional[int]) – Optional maximum pose rank.

  • affinity_threshold (Optional[float]) – Optional maximum affinity threshold.

  • affinity_min (Optional[float]) – Optional minimum affinity threshold.

  • as_dataframe (bool) – Whether to return a DataFrame.

  • order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition.

  • limit (Optional[int]) – Optional maximum number of rows.

Returns:

List of ScoreRecord or a pandas DataFrame.

Return type:

Union[list[ScoreRecord], pd.DataFrame]

count_poses(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
top_rank=None,
affinity_threshold=None,
affinity_min=None,
interaction_type=None,
residue_id=None,
chain_id=None,
residue_name=None,
residue_number=None,
)

Count poses matching the supplied filters.

Returns:

Number of matching pose rows.

Return type:

int

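Parameters:
  • pose_db_id, pose_id, receptor_id, ligand_id, engine, pose_rank, top_rank, affinity_threshold, affinity_min, interaction_type, residue_id, chain_id, residue_name, residue_number – Same filters, with the same meanings, as in poses().

interactions(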
*,
interaction_id=None,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
interaction_type=None,
chain_id=None,
residue_name=None,
residue_number=None,
residue_id=None,
ligand_residue=None,
as_dataframe=False,
order_by=None,
limit=None,
)

Query stored interactions using pose-level and residue-level filters.

Parameters:
  • interaction_id (Optional[int]) – Optional interaction primary-key filter.

  • pose_db_id (Optional[int]) – Optional internal pose id filter.

  • pose_id (FilterStr) – Optional external pose id or sequence of ids.

  • receptor_id (FilterStr) – Optional receptor filter.

  • ligand_id (FilterStr) – Optional ligand filter.

  • engine (FilterStr) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • interaction_type (FilterStr) – Optional interaction type filter.

  • chain_id (FilterStr) – Optional chain filter.

  • residue_name (FilterStr) – Optional residue-name filter.

  • residue_number (Optional[int]) – Optional residue-number filter.

  • residue_id (FilterStr) – Optional combined residue-id filter.

  • ligand_residue (FilterStr) – Optional ligand residue filter.

  • as_dataframe (bool) – Whether to return a DataFrame.

  • order_by (Optional[Union[str, Sequence[str]]]) – Optional ordering clause definition.

  • limit (Optional[int]) – Optional maximum number of rows.

Returns:

List of InteractionRecord or a pandas DataFrame.

Return type:

Union[list[InteractionRecord], pd.DataFrame]

Example

q = PoseQuery("prodock.db")

df = q.interactions(
    receptor_id="1M17",
    interaction_type="Hydrophobic",
    as_dataframe=True,
)
print(df[["pose_id", "interaction_type", "residue_id"]].head())

interaction_summary(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
interaction_type=None,
residue_id=None,
return_by='pose_key',
)#

Return summarized interactions grouped by pose.

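The returned payload uses the compact format {pose_key: {interaction_type: [residue_id, ...]}}, where the outer key is the internal pose id, the external pose id, or the pose key, depending on return_by.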

Parameters:
  • pose_db_id (FilterInt) – Optional internal pose id or sequence of internal pose ids.

  • pose_id (FilterStr) – Optional external pose id filter.

  • receptor_id (FilterStr) – Optional receptor filter.

  • ligand_id (FilterStr) – Optional ligand filter.

  • engine (FilterStr) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • interaction_type (FilterStr) – Optional interaction-type filter.

  • residue_id (FilterStr) – Optional residue-id filter.

  • return_by (str) – One of "pose_db_id", "pose_id", or "pose_key".

Returns:

Nested summary mapping grouped by pose.

Return type:

dict[Union[int, str], dict[str, list[str]]]

Example#

q = PoseQuery("prodock.db")

summary = q.interaction_summary(
    receptor_id="1M17",
    return_by="pose_id",
)
print(summary)
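The compact summary shape can be illustrated without a database. The sketch below assumes interaction rows are plain dictionaries carrying pose_key, interaction_type, and residue_id; the helper name summarize_interactions is hypothetical, and the deduplication and sorting of residue ids are assumptions about ordering, not documented behavior of interaction_summary().

```python
from collections import defaultdict


def summarize_interactions(rows):
    """Group flat interaction rows into {pose_key: {interaction_type: [residue_id, ...]}}.

    Hypothetical helper: each row is assumed to be a dict with the keys
    "pose_key", "interaction_type", and "residue_id".
    """
    # pose_key -> interaction_type -> set of residue ids (sets drop duplicates).
    grouped = defaultdict(lambda: defaultdict(set))
    for row in rows:
        grouped[row["pose_key"]][row["interaction_type"]].add(row["residue_id"])
    # Convert the nested sets to sorted lists for a stable, JSON-friendly payload.
    return {
        pose_key: {itype: sorted(residues) for itype, residues in by_type.items()}
        for pose_key, by_type in grouped.items()
    }


key = "1M17__erlotinib__qvina__pose1"
rows = [
    {"pose_key": key, "interaction_type": "Hydrophobic", "residue_id": "LEU149.A"},
    {"pose_key": key, "interaction_type": "Hydrophobic", "residue_id": "LEU149.A"},
    {"pose_key": key, "interaction_type": "VdWContact", "residue_id": "LEU149.A"},
]
summary = summarize_interactions(rows)
print(summary)
# {'1M17__erlotinib__qvina__pose1': {'Hydrophobic': ['LEU149.A'], 'VdWContact': ['LEU149.A']}}
```

Repeated events at the same residue collapse to a single entry, which is what makes this format compact compared with interaction_details().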
interaction_details(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
interaction_type=None,
residue_id=None,
return_by='pose_key',
)#

Return detailed interactions grouped by pose.

The returned payload mirrors the nested detailed format {pose_key: {interaction_type: {residue_id: [event, ...]}}}, where the outer key follows return_by.

Parameters:
  • pose_db_id (FilterInt) – Optional internal pose id or sequence of internal pose ids.

  • pose_id (FilterStr) – Optional external pose id filter.

  • receptor_id (FilterStr) – Optional receptor filter.

  • ligand_id (FilterStr) – Optional ligand filter.

  • engine (FilterStr) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • interaction_type (FilterStr) – Optional interaction-type filter.

  • residue_id (FilterStr) – Optional residue-id filter.

  • return_by (str) – One of "pose_db_id", "pose_id", or "pose_key".

Returns:

Nested detailed mapping grouped by pose.

Return type:

dict[Union[int, str], dict[str, dict[str, list[dict[str, Any]]]]]

Example#

q = PoseQuery("prodock.db")

details = q.interaction_details(
    receptor_id="1M17",
    return_by="pose_key",
)
print(details)
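The detailed format keeps every event rather than collapsing them. As a minimal sketch, assuming rows are dictionaries and that every field other than the three grouping keys belongs in the per-event payload, the nesting can be built as follows; detail_interactions and the second row's values are illustrative, not part of the library.

```python
from collections import defaultdict

GROUPING_KEYS = ("pose_key", "interaction_type", "residue_id")


def detail_interactions(rows):
    """Group rows into {pose_key: {interaction_type: {residue_id: [event, ...]}}}.

    Hypothetical helper: each row is a dict; every key other than the three
    grouping keys is copied into the per-event dictionary.
    """
    nested = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for row in rows:
        event = {k: v for k, v in row.items() if k not in GROUPING_KEYS}
        nested[row["pose_key"]][row["interaction_type"]][row["residue_id"]].append(event)
    # Replace the defaultdicts with plain dicts so the result compares and prints cleanly.
    return {
        pose: {itype: dict(by_residue) for itype, by_residue in by_type.items()}
        for pose, by_type in nested.items()
    }


key = "1M17__erlotinib__qvina__pose1"
rows = [
    {"pose_key": key, "interaction_type": "Hydrophobic", "residue_id": "LEU149.A",
     "occurrence_index": 0, "distance": 4.49, "ligand_atom_indices": [2]},
    {"pose_key": key, "interaction_type": "Hydrophobic", "residue_id": "LEU149.A",
     "occurrence_index": 1, "distance": 4.1, "ligand_atom_indices": [5]},
]
details = detail_interactions(rows)
events = details[key]["Hydrophobic"]["LEU149.A"]
print(len(events), events[0]["distance"])  # 2 4.49
```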
fingerprint(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
interaction_type=None,
residue_id=None,
mode='binary',
feature_sep='::',
index_by='pose_key',
)#

Build a pose-by-feature interaction fingerprint matrix.

Features are named as <interaction_type><feature_sep><residue_id>.

Parameters:
  • pose_db_id (FilterInt) – Optional internal pose id or sequence of internal pose ids.

  • pose_id (FilterStr) – Optional external pose id filter.

  • receptor_id (FilterStr) – Optional receptor filter.

  • ligand_id (FilterStr) – Optional ligand filter.

  • engine (FilterStr) – Optional engine filter.

  • pose_rank (Optional[int]) – Optional exact pose-rank filter.

  • interaction_type (FilterStr) – Optional interaction-type filter.

  • residue_id (FilterStr) – Optional residue-id filter.

  • mode (str) – Either "binary" or "count".

  • feature_sep (str) – Separator used when building feature names.

  • index_by (str) – One of "pose_db_id", "pose_id", or "pose_key".

Returns:

Fingerprint matrix as a pandas DataFrame.

Return type:

pd.DataFrame

Raises:

ValueError – If mode or index_by is invalid.

Example#

q = PoseQuery("prodock.db")

fp = q.fingerprint(
    receptor_id="1M17",
    mode="binary",
    index_by="pose_key",
)
print(fp.head())
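The feature naming and the two modes can be sketched in plain Python. This assumes that "count" counts interaction events per feature and that "binary" clips those counts to 1; build_fingerprint is a hypothetical helper that returns plain lists so it runs without pandas.

```python
def build_fingerprint(events, mode="binary", feature_sep="::"):
    """Build a pose-by-feature matrix from (pose_key, interaction_type, residue_id) events.

    Hypothetical helper returning (pose_keys, feature_names, rows).
    """
    if mode not in ("binary", "count"):
        raise ValueError(f"mode must be 'binary' or 'count', got {mode!r}")
    # Count events per feature for each pose; features follow <type><sep><residue_id>.
    counts = {}
    for pose_key, interaction_type, residue_id in events:
        feature = f"{interaction_type}{feature_sep}{residue_id}"
        pose_counts = counts.setdefault(pose_key, {})
        pose_counts[feature] = pose_counts.get(feature, 0) + 1

    pose_keys = list(counts)
    features = sorted({f for pose_counts in counts.values() for f in pose_counts})

    def cell(pose_key, feature):
        n = counts[pose_key].get(feature, 0)
        return min(n, 1) if mode == "binary" else n

    rows = [[cell(p, f) for f in features] for p in pose_keys]
    return pose_keys, features, rows


events = [
    ("1M17__erlotinib__qvina__pose1", "Hydrophobic", "LEU149.A"),
    ("1M17__erlotinib__qvina__pose1", "Hydrophobic", "LEU149.A"),
    ("1M17__erlotinib__qvina__pose2", "VdWContact", "LEU149.A"),
]
pose_keys, features, binary_rows = build_fingerprint(events, mode="binary")
_, _, count_rows = build_fingerprint(events, mode="count")
print(features)     # ['Hydrophobic::LEU149.A', 'VdWContact::LEU149.A']
print(binary_rows)  # [[1, 0], [0, 1]]
print(count_rows)   # [[2, 0], [0, 1]]
```

With pandas installed, pandas.DataFrame(rows, index=pose_keys, columns=features) gives a matrix shaped like the one fingerprint() is documented to return.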
summary()#

Return one summary row per receptor-ligand-engine group.

The summary includes the number of stored poses, best affinity value, maximum stored pose rank, and the number of linked interaction rows.

Returns:

Summary DataFrame with pose counts and best affinity values.

Return type:

pd.DataFrame

Example#

q = PoseQuery("prodock.db")
print(q.summary())
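The aggregation behind a per-group summary can be sketched in SQLite. The schema below is hypothetical and heavily simplified (identifiers and affinity sit directly on poses instead of in dimension and score tables), and "best" is assumed to mean the lowest affinity, as with Vina-style scores where more negative is better. Interaction counts are pre-aggregated per pose so the join does not inflate the pose count.

```python
import sqlite3

# Hypothetical, simplified schema used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE poses (
        pose_db_id INTEGER PRIMARY KEY,
        receptor_id TEXT, ligand_id TEXT, engine TEXT,
        pose_rank INTEGER, affinity REAL
    );
    CREATE TABLE interactions (
        interaction_id INTEGER PRIMARY KEY,
        pose_db_id INTEGER REFERENCES poses(pose_db_id)
    );
    INSERT INTO poses VALUES
        (1, '1M17', 'erlotinib', 'qvina', 1, -6.2),
        (2, '1M17', 'erlotinib', 'qvina', 2, -5.8),
        (3, '1M17', 'erlotinib', 'vina', 1, -7.1);
    INSERT INTO interactions (pose_db_id) VALUES (1), (1), (3);
    """
)

# Assumption: "best affinity" is MIN(affinity).
SUMMARY_SQL = """
SELECT p.receptor_id, p.ligand_id, p.engine,
       COUNT(*)              AS n_poses,
       MIN(p.affinity)       AS best_affinity,
       MAX(p.pose_rank)      AS max_pose_rank,
       SUM(COALESCE(i.n, 0)) AS n_interactions
FROM poses AS p
LEFT JOIN (
    SELECT pose_db_id, COUNT(*) AS n FROM interactions GROUP BY pose_db_id
) AS i ON i.pose_db_id = p.pose_db_id
GROUP BY p.receptor_id, p.ligand_id, p.engine
ORDER BY p.receptor_id, p.ligand_id, p.engine
"""
rows = conn.execute(SUMMARY_SQL).fetchall()
for row in rows:
    print(row)
# ('1M17', 'erlotinib', 'qvina', 2, -6.2, 2, 2)
# ('1M17', 'erlotinib', 'vina', 1, -7.1, 1, 1)
```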
receptors()#

List all receptor identifiers present in the database.

Returns:

Sorted receptor identifiers.

Return type:

list[str]

ligands()#

List all ligand identifiers present in the database.

Returns:

Sorted ligand identifiers.

Return type:

list[str]

engines()#

List all docking engine names present in the database.

Returns:

Sorted engine names.

Return type:

list[str]

query_poses(**kwargs)#

Alias for poses().

Parameters:

kwargs (Any)

Return type:

list[PoseRecord] | pandas.DataFrame

get_pose(**kwargs)#

Alias for pose().

Parameters:

kwargs (Any)

Return type:

PoseRecord | None

query_scores(**kwargs)#

Alias for scores().

Parameters:

kwargs (Any)

Return type:

list[ScoreRecord] | pandas.DataFrame

query_interactions(**kwargs)#

Alias for interactions().

Parameters:

kwargs (Any)

Return type:

list[InteractionRecord] | pandas.DataFrame

get_interaction_summary(**kwargs)#

Alias for interaction_summary().

Parameters:

kwargs (Any)

Return type:

dict[int | str, dict[str, list[str]]]

get_interaction_details(**kwargs)#

Alias for interaction_details().

Parameters:

kwargs (Any)

Return type:

dict[int | str, dict[str, dict[str, list[dict[str, Any]]]]]

interaction_fingerprint(**kwargs)#

Alias for fingerprint().

Parameters:

kwargs (Any)

Return type:

pandas.DataFrame

summarize()#

Alias for summary().

Return type:

pandas.DataFrame

list_receptors()#

Alias for receptors().

Return type:

list[str]

list_ligands()#

Alias for ligands().

Return type:

list[str]

list_engines()#

Alias for engines().

Return type:

list[str]

build_pose_where_clause(
*,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
top_rank=None,
affinity_threshold=None,
affinity_min=None,
interaction_type=None,
residue_id=None,
chain_id=None,
residue_name=None,
residue_number=None,
)#

Build a WHERE clause for pose queries.

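Interaction filters are translated into an EXISTS subquery, so poses can be filtered by their interaction content directly, without joining the interactions table into the result (a join would repeat a pose once per matching interaction row).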

Returns:

SQL fragment and bound parameters.

Return type:

tuple[str, list[Any]]
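The EXISTS translation can be sketched as a small clause builder plus an in-memory check. The function pose_where_clause, its three filters, and the two-table schema are hypothetical stand-ins; the point shown is that a correlated EXISTS keeps each matching pose to a single row and that all values travel as bound parameters.

```python
import sqlite3


def pose_where_clause(receptor_id=None, interaction_type=None, residue_id=None):
    """Hypothetical sketch: filter poses by pose fields and by interaction content."""
    clauses, params = [], []
    if receptor_id is not None:
        clauses.append("p.receptor_id = ?")
        params.append(receptor_id)
    sub_clauses, sub_params = [], []
    if interaction_type is not None:
        sub_clauses.append("i.interaction_type = ?")
        sub_params.append(interaction_type)
    if residue_id is not None:
        sub_clauses.append("i.residue_id = ?")
        sub_params.append(residue_id)
    if sub_clauses:
        # Correlated EXISTS: true if at least one interaction row of this pose matches.
        clauses.append(
            "EXISTS (SELECT 1 FROM interactions AS i "
            "WHERE i.pose_db_id = p.pose_db_id AND " + " AND ".join(sub_clauses) + ")"
        )
        params.extend(sub_params)
    where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    return where, params


# Minimal in-memory tables with only the columns this sketch touches.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE poses (pose_db_id INTEGER PRIMARY KEY, receptor_id TEXT);
    CREATE TABLE interactions (pose_db_id INTEGER, interaction_type TEXT, residue_id TEXT);
    INSERT INTO poses VALUES (1, '1M17'), (2, '1M17');
    INSERT INTO interactions VALUES
        (1, 'Hydrophobic', 'LEU149.A'),
        (1, 'Hydrophobic', 'LEU149.A'),
        (2, 'VdWContact', 'LEU149.A');
    """
)
where, params = pose_where_clause(receptor_id="1M17", interaction_type="Hydrophobic")
sql = f"SELECT p.pose_db_id FROM poses AS p {where} ORDER BY p.pose_db_id"
pose_ids = [row[0] for row in conn.execute(sql, params)]
print(pose_ids)  # [1]: pose 1 appears once despite two matching interaction rows
```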

build_interaction_where_clause(
*,
interaction_id=None,
pose_db_id=None,
pose_id=None,
receptor_id=None,
ligand_id=None,
engine=None,
pose_rank=None,
interaction_type=None,
chain_id=None,
residue_name=None,
residue_number=None,
residue_id=None,
ligand_residue=None,
)#

Build a WHERE clause for interaction queries.

Returns:

SQL fragment and bound parameters.

Return type:

tuple[str, list[Any]]
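Several filters are typed FilterStr and accept either one value or a sequence of values. A minimal sketch of how such a value might be turned into SQL, assuming a single string maps to = ? and a sequence maps to IN (...); filter_clause and where_clause are hypothetical helpers, and treating an empty sequence as "match nothing" is an assumption.

```python
from collections.abc import Sequence


def filter_clause(column, value):
    """Hypothetical sketch: turn a FilterStr-style value into a SQL fragment."""
    if value is None:
        return None, []
    # Check str first: a str is itself a Sequence of characters.
    if isinstance(value, str):
        return f"{column} = ?", [value]
    if isinstance(value, Sequence):
        values = list(value)
        if not values:
            return "0", []  # an empty filter matches nothing (0 is false in SQLite)
        placeholders = ", ".join("?" for _ in values)
        return f"{column} IN ({placeholders})", values
    raise TypeError(f"Unsupported filter value for {column}: {value!r}")


def where_clause(**filters):
    """Combine filters with AND; column names must come from trusted code, not user input."""
    clauses, params = [], []
    for column, value in filters.items():
        clause, values = filter_clause(column, value)
        if clause is not None:
            clauses.append(clause)
            params.extend(values)
    return ("WHERE " + " AND ".join(clauses)) if clauses else "", params


where, params = where_clause(engine="qvina", residue_id=["LEU149.A", "LEU149.B"], chain_id=None)
print(where)   # WHERE engine = ? AND residue_id IN (?, ?)
print(params)  # ['qvina', 'LEU149.A', 'LEU149.B']
```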

resolve_order_by(order_by)#

Resolve a public order key to a SQL ORDER BY clause.

Parameters:

order_by (Optional[Union[str, Sequence[str]]]) – Column name or list of column names. Prefix with - for descending.

Returns:

SQL ORDER BY clause.

Return type:

str

Raises:

ValueError – If an unsupported key is supplied.
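Because SQL placeholders bind values but not column names, an ORDER BY clause has to be assembled from a whitelist. The sketch below follows the documented "-" prefix convention; the key-to-column mapping and the name resolve_order_by_sketch are hypothetical and do not reproduce the library's actual table aliases.

```python
# Hypothetical whitelist mapping public order keys to SQL column expressions.
ORDER_COLUMNS = {
    "pose_id": "p.pose_id",
    "receptor_id": "p.receptor_id",
    "ligand_id": "p.ligand_id",
    "engine": "p.engine",
    "pose_rank": "p.pose_rank",
    "affinity": "s.affinity",
}


def resolve_order_by_sketch(order_by):
    """Translate "key" / "-key" (or a sequence of them) into an ORDER BY clause."""
    if order_by is None:
        return ""
    keys = [order_by] if isinstance(order_by, str) else list(order_by)
    parts = []
    for key in keys:
        descending = key.startswith("-")
        name = key[1:] if descending else key
        # Unknown keys are rejected, so arbitrary SQL can never reach the query.
        if name not in ORDER_COLUMNS:
            raise ValueError(f"Unsupported order_by key: {key!r}")
        parts.append(f"{ORDER_COLUMNS[name]} {'DESC' if descending else 'ASC'}")
    return ("ORDER BY " + ", ".join(parts)) if parts else ""


print(resolve_order_by_sketch(["affinity", "-pose_rank"]))
# ORDER BY s.affinity ASC, p.pose_rank DESC
```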

Records#

class PoseRecord(
pose_db_id,
pose_id,
receptor_id,
ligand_id,
engine,
pose_rank,
affinity,
mol,
pose_metadata,
score_data,
score_metadata,
interaction_summary=<factory>,
interaction_details=<factory>,
created_at='',
)#

Bases: object

Immutable in-memory representation of a docking pose row.

This record combines core pose identity fields with optional molecule content, score payloads, and aggregated interaction summaries. It is designed to act as a typed container for rows reconstructed from the ProDock SQLite database layer.

Parameters:
  • pose_db_id (int) – Internal SQLite integer primary key for the pose row.

  • pose_id (Optional[str]) – Optional external stable pose identifier, for example 1M17__erlotinib__qvina__pose1. When absent, a logical pose key can still be generated from the receptor, ligand, engine, and rank fields.

  • receptor_id (str) – Receptor identifier associated with the pose.

  • ligand_id (str) – Ligand identifier associated with the pose.

  • engine (str) – Docking engine name, for example vina, smina, or qvina.

  • pose_rank (int) – One-based pose rank within the receptor-ligand-engine group.

  • affinity (Optional[float]) – Primary affinity value associated with the pose, if available.

  • mol (Optional[rdchem.Mol]) – Deserialized RDKit molecule for the pose. This may be None when molecule blobs are not loaded from the database.

  • pose_metadata (dict[str, Any]) – Free-form pose-level metadata stored with the poses table row.

  • score_data (dict[str, Any]) – Structured score payload stored in the related pose_scores row.

  • score_metadata (dict[str, Any]) – Additional metadata associated with the score payload.

  • interaction_summary (dict[str, list[str]]) – Optional grouped interaction summary in the form {interaction_type: [residue_id, ...]}.

  • interaction_details (dict[str, Any]) – Optional grouped detailed interaction payload, typically shaped like {interaction_type: {residue_id: [event, ...]}} or a similar nested structure.

  • created_at (str) – SQLite insertion timestamp for the pose row.
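The <factory> defaults in the signature are how dataclass fields declared with field(default_factory=...) are displayed. Assuming PoseRecord is a frozen dataclass (suggested by "immutable" but not stated), the trimmed-down, hypothetical MiniPoseRecord below shows both consequences: each instance gets its own dict, and fields cannot be reassigned, although the dicts themselves stay mutable.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class MiniPoseRecord:
    """Hypothetical, trimmed-down stand-in for PoseRecord."""

    pose_db_id: int
    receptor_id: str
    # default_factory gives every instance a fresh dict; a literal {} default
    # is rejected by dataclasses because it would be shared between instances.
    interaction_summary: dict[str, list[str]] = field(default_factory=dict)
    created_at: str = ""


a = MiniPoseRecord(1, "1M17")
b = MiniPoseRecord(2, "1M17")

try:
    a.receptor_id = "2ITO"  # reassigning a field on a frozen dataclass fails
except FrozenInstanceError:
    print("fields are read-only")

a.interaction_summary["Hydrophobic"] = ["LEU149.A"]  # the dict itself is still mutable
print(b.interaction_summary)  # {} because b has its own dict
```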

Example:
>>> record = PoseRecord(
...     pose_db_id=1,
...     pose_id=None,
...     receptor_id="1M17",
...     ligand_id="erlotinib",
...     engine="qvina",
...     pose_rank=1,
...     affinity=-6.2,
...     mol=None,
...     pose_metadata={},
...     score_data={"affinity": -6.2},
...     score_metadata={},
... )
>>> record.pose_key
'1M17__erlotinib__qvina__pose1'
pose_db_id: int#
pose_id: str | None#
receptor_id: str#
ligand_id: str#
engine: str#
pose_rank: int#
affinity: float | None#
mol: rdkit.Chem.rdchem.Mol | None#
pose_metadata: dict[str, Any]#
score_data: dict[str, Any]#
score_metadata: dict[str, Any]#
interaction_summary: dict[str, list[str]]#
interaction_details: dict[str, Any]#
created_at: str = ''#
property pose_key: str#

Return the best available human-readable pose key.

This property prefers the stored external pose_id when present. Otherwise, it deterministically reconstructs a logical key from the receptor identifier, ligand identifier, engine, and pose rank.

Returns:

Stable pose key suitable for display, export, or downstream matching.

Return type:

str

Example:
>>> record.pose_key
'1M17__erlotinib__qvina__pose1'
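The fallback rule can be written as a small function. The key layout below is inferred from the documented examples rather than from the library source, and make_pose_key is a hypothetical name; whether the library also treats an empty-string pose_id as absent is not documented, so this sketch only checks for None.

```python
from typing import Optional


def make_pose_key(
    pose_id: Optional[str],
    receptor_id: str,
    ligand_id: str,
    engine: str,
    pose_rank: int,
) -> str:
    """Prefer the external pose_id; otherwise rebuild the logical key.

    The fallback format is inferred from the documented examples,
    e.g. '1M17__erlotinib__qvina__pose1'.
    """
    if pose_id is not None:
        return pose_id
    return f"{receptor_id}__{ligand_id}__{engine}__pose{pose_rank}"


print(make_pose_key(None, "1M17", "erlotinib", "qvina", 1))  # 1M17__erlotinib__qvina__pose1
```

One consequence of this layout is that identifiers containing a double underscore make the reconstructed key ambiguous to split back into its parts.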
class ScoreRecord(
pose_db_id,
pose_id,
receptor_id,
ligand_id,
engine,
pose_rank,
affinity,
score_data,
metadata,
)#

Bases: object

Immutable in-memory representation of a pose score row.

This record stores resolved pose identity fields together with a structured score payload and optional metadata. It is typically used for score-centric queries where loading the full pose molecule or interaction details is not necessary.

Parameters:
  • pose_db_id (int) – Internal SQLite pose primary key.

  • pose_id (Optional[str]) – Optional external stable pose identifier.

  • receptor_id (str) – Receptor identifier resolved through the associated pose row.

  • ligand_id (str) – Ligand identifier resolved through the associated pose row.

  • engine (str) – Docking engine name resolved through the associated pose row.

  • pose_rank (int) – One-based pose rank mirrored from the pose row.

  • affinity (Optional[float]) – Primary affinity value, if available.

  • score_data (dict[str, Any]) – Structured score payload, for example containing raw engine-specific score terms.

  • metadata (dict[str, Any]) – Additional metadata associated with the score record.

Example:
>>> record = ScoreRecord(
...     pose_db_id=1,
...     pose_id=None,
...     receptor_id="1M17",
...     ligand_id="erlotinib",
...     engine="gnina",
...     pose_rank=2,
...     affinity=-7.1,
...     score_data={"affinity": -7.1, "cnn_pose": 0.82},
...     metadata={},
... )
>>> record.pose_key
'1M17__erlotinib__gnina__pose2'
pose_db_id: int#
pose_id: str | None#
receptor_id: str#
ligand_id: str#
engine: str#
pose_rank: int#
affinity: float | None#
score_data: dict[str, Any]#
metadata: dict[str, Any]#
property pose_key: str#

Return the best available human-readable pose key.

This property prefers the stored external pose_id when available. Otherwise, it reconstructs a deterministic logical key from the pose identity fields.

Returns:

Stable pose key suitable for display and record matching.

Return type:

str

class InteractionRecord(
interaction_id,
pose_db_id,
pose_id,
receptor_id,
ligand_id,
engine,
pose_rank,
interaction_type,
chain_id,
residue_name,
residue_number,
residue_id,
ligand_residue,
occurrence_index,
ligand_atom_indices,
protein_atom_indices,
ligand_parent_atom_indices,
protein_parent_atom_indices,
distance,
angle,
metadata,
created_at,
)#

Bases: object

Immutable in-memory representation of a pose interaction row.

Each instance represents one detailed interaction event associated with a specific docking pose. The record includes resolved pose identity fields, residue-level annotations, atom index mappings, geometric descriptors, and arbitrary extra metadata.

Parameters:
  • interaction_id (int) – Internal SQLite integer primary key for the interaction row.

  • pose_db_id (int) – Foreign-key link to the associated pose row.

  • pose_id (Optional[str]) – Optional external stable pose identifier.

  • receptor_id (str) – Receptor identifier resolved through the associated pose row.

  • ligand_id (str) – Ligand identifier resolved through the associated pose row.

  • engine (str) – Docking engine name resolved through the associated pose row.

  • pose_rank (int) – One-based pose rank resolved through the associated pose row.

  • interaction_type (str) – Interaction label such as Hydrophobic, VdWContact, or HBDonor.

  • chain_id (Optional[str]) – Optional protein chain identifier.

  • residue_name (Optional[str]) – Optional residue name, for example LEU.

  • residue_number (Optional[int]) – Optional residue number, for example 149.

  • residue_id (Optional[str]) – Optional compact residue identifier such as LEU149.A.

  • ligand_residue (Optional[str]) – Optional ligand residue label such as LIG1.

  • occurrence_index (int) – Zero-based occurrence index for repeated interactions of the same type at the same residue.

  • ligand_atom_indices (list[int]) – Ligand atom indices participating directly in the interaction.

  • protein_atom_indices (list[int]) – Protein atom indices participating directly in the interaction.

  • ligand_parent_atom_indices (list[int]) – Parent ligand atom indices when available from the upstream interaction extractor.

  • protein_parent_atom_indices (list[int]) – Parent protein atom indices when available from the upstream interaction extractor.

  • distance (Optional[float]) – Optional interaction distance value.

  • angle (Optional[float]) – Optional interaction angle value.

  • metadata (dict[str, Any]) – Additional free-form metadata associated with the interaction event.

  • created_at (str) – SQLite insertion timestamp for the interaction row.
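Two of these fields are derived rather than raw: residue_id combines residue name, number, and chain, and occurrence_index numbers repeated interactions of one type at one residue. The helpers below are hypothetical sketches; the residue_id layout mirrors the documented example LEU149.A, and dropping the chain suffix when chain_id is missing is an assumption.

```python
from collections import Counter


def format_residue_id(residue_name, residue_number, chain_id=None):
    """Hypothetical helper mirroring the documented example 'LEU149.A'."""
    base = f"{residue_name}{residue_number}"
    return f"{base}.{chain_id}" if chain_id else base


def assign_occurrence_indices(events):
    """Attach a zero-based occurrence_index per (interaction_type, residue_id) pair.

    `events` is assumed to be an ordered list of dicts for a single pose.
    """
    seen = Counter()
    indexed = []
    for event in events:
        key = (event["interaction_type"], event["residue_id"])
        # Counter returns 0 for unseen keys, so the first occurrence gets index 0.
        indexed.append({**event, "occurrence_index": seen[key]})
        seen[key] += 1
    return indexed


leu = format_residue_id("LEU", 149, "A")
events = assign_occurrence_indices([
    {"interaction_type": "Hydrophobic", "residue_id": leu},
    {"interaction_type": "Hydrophobic", "residue_id": leu},
    {"interaction_type": "VdWContact", "residue_id": leu},
])
print(leu, [e["occurrence_index"] for e in events])  # LEU149.A [0, 1, 0]
```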

Example:
>>> record = InteractionRecord(
...     interaction_id=1,
...     pose_db_id=10,
...     pose_id=None,
...     receptor_id="1M17",
...     ligand_id="erlotinib",
...     engine="qvina",
...     pose_rank=1,
...     interaction_type="Hydrophobic",
...     chain_id="A",
...     residue_name="LEU",
...     residue_number=149,
...     residue_id="LEU149.A",
...     ligand_residue="LIG1",
...     occurrence_index=0,
...     ligand_atom_indices=[2],
...     protein_atom_indices=[9],
...     ligand_parent_atom_indices=[2],
...     protein_parent_atom_indices=[2392],
...     distance=4.49,
...     angle=None,
...     metadata={},
...     created_at="2026-04-02 10:00:00",
... )
>>> record.pose_key
'1M17__erlotinib__qvina__pose1'
interaction_id: int#
pose_db_id: int#
pose_id: str | None#
receptor_id: str#
ligand_id: str#
engine: str#
pose_rank: int#
interaction_type: str#
chain_id: str | None#
residue_name: str | None#
residue_number: int | None#
residue_id: str | None#
ligand_residue: str | None#
occurrence_index: int#
ligand_atom_indices: list[int]#
protein_atom_indices: list[int]#
ligand_parent_atom_indices: list[int]#
protein_parent_atom_indices: list[int]#
distance: float | None#
angle: float | None#
metadata: dict[str, Any]#
created_at: str#
property pose_key: str#

Return the best available human-readable pose key.

This property prefers the stored external pose_id when available. Otherwise, it reconstructs a deterministic logical key from the pose identity fields.

Returns:

Stable pose key suitable for grouping, display, and record matching.

Return type:

str