Database#

[Figure: ProDock database workflow]

Campaign storage

Store receptors, ligands, engines, poses, scores, and interactions in one SQLite database instead of scattered output files.

Query analysis

Reopen the campaign later and filter poses by receptor, ligand, engine, rank, affinity, or interaction content without rebuilding results.

Database architecture#

ProDock stores docking campaigns in a compact normalized SQLite schema. The main design idea is simple:

  • write results once,

  • query them many times later.

The schema is organized so that receptor, ligand, and engine identifiers are stored once, while pose-specific, score-specific, and interaction-specific records remain linked through stable pose keys.

[Figure: ProDock database architecture]

This gives three practical benefits:

  • compact storage across many receptors, ligands, and engines,

  • easy reconstruction of analysis tables,

  • consistent filtering across identity, score, and interaction layers.

The database is organized into three layers:

  • dimension tables for receptors, ligands, and engines,

  • pose records for pose identity, serialized molecules, and metadata,

  • analysis tables for score payloads and residue-level interactions.

In practice, this means one campaign can be explored later without reparsing engine logs, reconverting pose files, or recomputing interaction summaries.
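The three layers can be pictured with a small standalone SQLite sketch. Every table and column name below is an assumption chosen for illustration. It is not ProDock's actual schema, which PoseDatabase creates for you.

```python
import sqlite3

# Hypothetical three-layer layout; ProDock's real table and column names may differ.
SCHEMA = """
-- Layer 1: dimension tables, each identifier stored once
CREATE TABLE receptors (id INTEGER PRIMARY KEY, receptor_id TEXT UNIQUE NOT NULL);
CREATE TABLE ligands   (id INTEGER PRIMARY KEY, ligand_id   TEXT UNIQUE NOT NULL);
CREATE TABLE engines   (id INTEGER PRIMARY KEY, engine      TEXT UNIQUE NOT NULL);

-- Layer 2: pose records (identity, serialized molecule, metadata)
CREATE TABLE poses (
    pose_db_id  INTEGER PRIMARY KEY,
    pose_id     TEXT UNIQUE,
    receptor_fk INTEGER NOT NULL REFERENCES receptors(id),
    ligand_fk   INTEGER NOT NULL REFERENCES ligands(id),
    engine_fk   INTEGER NOT NULL REFERENCES engines(id),
    pose_rank   INTEGER NOT NULL,
    mol_block   TEXT,
    UNIQUE (receptor_fk, ligand_fk, engine_fk, pose_rank)
);

-- Layer 3: analysis tables linked back to poses by key
CREATE TABLE scores (
    pose_db_id INTEGER NOT NULL REFERENCES poses(pose_db_id),
    name       TEXT NOT NULL,
    value      REAL,
    PRIMARY KEY (pose_db_id, name)
);
CREATE TABLE interactions (
    pose_db_id       INTEGER NOT NULL REFERENCES poses(pose_db_id),
    interaction_type TEXT NOT NULL,
    residue_id       TEXT NOT NULL
);
"""


def create_sketch_db(path=":memory:"):
    """Create the illustrative schema and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_sketch_db()
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Because the score and interaction rows carry only a pose key, an analysis table can be rebuilt with joins instead of reparsing engine output.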

Writer and reader

The database layer is split into one writer API and one read/query API.

The two main public objects are:

  • PoseDatabase — create the schema and insert or update campaign data

  • PoseQuery — open an existing database and run analysis-friendly queries

This separation keeps workflow generation and downstream analysis cleanly decoupled.

Pose identity#

A stored pose can be addressed in two ways:

  • pose_db_id — internal SQLite integer primary key

  • pose_id — optional stable external id such as "1M17__erlotinib__qvina__pose1"

If no external pose_id is stored, a pose is still uniquely identified by:

(receptor_id, ligand_id, engine, pose_rank)

Automated inserts can therefore rely on the composite key alone, while human-readable campaign exports can refer to poses by pose_id.
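The two addressing modes can be illustrated with plain sqlite3. The helper make_pose_id and the single flattened poses table are hypothetical. They only mirror the id pattern and composite key described above and are not part of the ProDock API.

```python
import sqlite3


def make_pose_id(receptor_id, ligand_id, engine, pose_rank):
    # Hypothetical helper mirroring the "1M17__erlotinib__qvina__pose1" pattern.
    return f"{receptor_id}__{ligand_id}__{engine}__pose{pose_rank}"


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE poses (
        pose_db_id  INTEGER PRIMARY KEY,  -- internal integer key
        pose_id     TEXT UNIQUE,          -- optional external id
        receptor_id TEXT NOT NULL,
        ligand_id   TEXT NOT NULL,
        engine      TEXT NOT NULL,
        pose_rank   INTEGER NOT NULL,
        UNIQUE (receptor_id, ligand_id, engine, pose_rank)
    )
    """
)

key = ("1M17", "erlotinib", "qvina", 1)
conn.execute(
    "INSERT INTO poses (pose_id, receptor_id, ligand_id, engine, pose_rank) "
    "VALUES (?, ?, ?, ?, ?)",
    (make_pose_id(*key), *key),
)

# A second row with the same composite key is rejected even without a pose_id.
try:
    conn.execute(
        "INSERT INTO poses (receptor_id, ligand_id, engine, pose_rank) "
        "VALUES (?, ?, ?, ?)",
        key,
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(make_pose_id(*key), duplicate_rejected)
```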

Write campaign data#

PoseDatabase

Store pose tables, score rows, molecules, and interactions in one campaign database.

Use PoseDatabase when you already have a pose dataframe from docking or postprocessing and want to persist it for later analysis.

Minimal write example:

from prodock.database import PoseDatabase

# pose_dataframe: the pose table produced by docking or postprocessing
db = PoseDatabase("poses.sqlite")
db.insert_dataframe(pose_dataframe)

The dataframe is expected to contain the core pose columns:

  • receptor_id

  • ligand_id

  • engine

  • pose_rank

  • affinity

  • mol
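Before inserting, it can help to confirm that these columns are present. The check below is a minimal sketch. The names CORE_POSE_COLUMNS and missing_core_columns are hypothetical and not part of ProDock, and the function accepts any iterable of column names, such as pose_dataframe.columns.

```python
# Core columns as listed in this section.
CORE_POSE_COLUMNS = (
    "receptor_id",
    "ligand_id",
    "engine",
    "pose_rank",
    "affinity",
    "mol",
)


def missing_core_columns(columns):
    """Return the core pose columns that are absent, in documented order."""
    present = set(columns)
    return [name for name in CORE_POSE_COLUMNS if name not in present]


# Example: a table that lacks the serialized molecule column.
example_columns = ["receptor_id", "ligand_id", "engine", "pose_rank", "affinity"]
print(missing_core_columns(example_columns))
```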

A one-step construction pattern is also available:

from prodock.database import PoseDatabase

db = PoseDatabase.from_dataframe(
    "poses.sqlite",
    pose_dataframe,
)

If interaction payloads are already available, they can be stored at import time:

db.insert_dataframe(
    pose_dataframe,
    interactions_by_pose=interactions_by_pose,
    replace=True,
    replace_interactions=True,
)

Query stored campaigns#

PoseQuery

Open an existing database and query poses, scores, and interactions through a read-focused API.

Use PoseQuery after a campaign has already been stored. By default, it opens the database in read-only mode.
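How PoseQuery implements read-only access is not documented here. As a general illustration, SQLite itself supports read-only connections through a URI with mode=ro, as in this standard-library sketch, where the file path and table are throwaway placeholders:

```python
import sqlite3
import tempfile
from pathlib import Path

# Create a small throwaway database file to reopen read-only.
path = Path(tempfile.mkdtemp()) / "poses.sqlite"
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE poses (pose_id TEXT)")
writer.commit()
writer.close()

# Reopen through a file: URI with mode=ro; writes are refused by SQLite.
reader = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
try:
    reader.execute("INSERT INTO poses VALUES ('x')")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True

row_count = reader.execute("SELECT COUNT(*) FROM poses").fetchone()[0]
reader.close()
print(write_blocked, row_count)
```

Opening analysis sessions this way protects a finished campaign from accidental modification while leaving all SELECT queries available.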

Typical query patterns include:

  • pose tables filtered by receptor, ligand, engine, or rank,

  • score tables without loading molecules,

  • exact retrieval of one stored pose,

  • interaction-aware filtering,

  • interaction summaries and fingerprint matrices,

  • campaign-level summaries.

Basic examples:

from prodock.database import PoseQuery

q = PoseQuery("poses.sqlite")

poses = q.poses(
    receptor_id="1M17",
    engine="qvina",
    as_dataframe=True,
)

scores = q.scores(
    receptor_id="1M17",
    top_rank=3,
    as_dataframe=True,
)

print(poses[["pose_id", "pose_rank", "affinity"]].head())
print(scores[["pose_id", "affinity"]].head())

Retrieve one exact pose:

pose = q.pose(
    receptor_id="1M17",
    ligand_id="erlotinib",
    engine="qvina",
    pose_rank=1,
    include_interactions=True,
    interaction_mode="summary",
)

print(pose)

Interaction querying#

One advantage of keeping interactions in the same schema is that pose queries can also filter by interaction content.

from prodock.database import PoseQuery

q = PoseQuery("poses.sqlite")

selected = q.poses(
    receptor_id="1M17",
    interaction_type="Hydrophobic",
    residue_id="LEU23.A",
    as_dataframe=True,
)

print(selected[["pose_id", "affinity"]])
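Conceptually, a filter like this amounts to a join between pose rows and interaction rows. The sketch below shows that idea on a simplified, hypothetical two-table layout with made-up toy rows. It does not reproduce the SQL that PoseQuery issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE poses (pose_id TEXT PRIMARY KEY, receptor_id TEXT, affinity REAL);
    CREATE TABLE interactions (pose_id TEXT, interaction_type TEXT, residue_id TEXT);

    -- Toy rows for illustration only; not real docking results.
    INSERT INTO poses VALUES
        ('p1', '1M17', -8.1), ('p2', '1M17', -7.4), ('p3', 'OTHER', -9.0);
    INSERT INTO interactions VALUES
        ('p1', 'Hydrophobic', 'LEU23.A'),
        ('p1', 'HBond',       'MET98.A'),
        ('p2', 'HBond',       'MET98.A'),
        ('p3', 'Hydrophobic', 'LEU23.A');
    """
)

# Keep poses of one receptor that have the requested contact on the given residue.
rows = conn.execute(
    """
    SELECT DISTINCT p.pose_id, p.affinity
    FROM poses AS p
    JOIN interactions AS i ON i.pose_id = p.pose_id
    WHERE p.receptor_id = ?
      AND i.interaction_type = ?
      AND i.residue_id = ?
    ORDER BY p.affinity
    """,
    ("1M17", "Hydrophobic", "LEU23.A"),
).fetchall()

print(rows)
```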

Compact summaries can be rebuilt directly from stored interaction rows:

summary = q.interaction_summary(
    receptor_id="1M17",
    return_by="pose_id",
)

print(summary)

Detailed event payloads can also be reconstructed:

details = q.interaction_details(
    receptor_id="1M17",
    return_by="pose_key",
)

print(details)

Fingerprint matrices are also available:

fp = q.fingerprint(
    receptor_id="1M17",
    mode="binary",
    index_by="pose_key",
)

print(fp.head())
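A binary fingerprint can be thought of as one row per pose and one column per (interaction type, residue) pair, with 1 marking that the contact is present. The function below is a pure-Python sketch of that idea on toy rows. The name binary_fingerprint and the row layout are assumptions, and the output format of q.fingerprint may differ.

```python
def binary_fingerprint(rows):
    """Build a 0/1 presence matrix from (pose_key, interaction_type, residue_id) rows."""
    rows = list(rows)  # allow generators, since the rows are scanned twice
    # One column per distinct (interaction_type, residue_id) pair, in sorted order.
    features = sorted({(itype, residue) for _, itype, residue in rows})
    column = {feature: i for i, feature in enumerate(features)}

    matrix = {}
    for pose_key, itype, residue in rows:
        bits = matrix.setdefault(pose_key, [0] * len(features))
        bits[column[(itype, residue)]] = 1
    return features, matrix


# Toy interaction rows for illustration only.
rows = [
    ("p1", "Hydrophobic", "LEU23.A"),
    ("p1", "HBond", "MET98.A"),
    ("p2", "HBond", "MET98.A"),
]
features, matrix = binary_fingerprint(rows)
print(features)
print(matrix)
```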

Minimal end-to-end example#

Example

Write once, query later

from prodock.database import PoseDatabase, PoseQuery

# Step 1: write campaign results
db = PoseDatabase("poses.sqlite")
db.insert_dataframe(
    pose_dataframe,
    interactions_by_pose=interactions_by_pose,
    replace=True,
    replace_interactions=True,
)

# Step 2: reopen with the read/query API
q = PoseQuery("poses.sqlite")

best = q.poses(
    receptor_id="1M17",
    top_rank=1,
    include_interactions=True,
    interaction_mode="summary",
    as_dataframe=True,
)

fp = q.fingerprint(
    receptor_id="1M17",
    mode="binary",
    index_by="pose_key",
)

summary = q.summary()

print(best.head())
print(fp.head())
print(summary.head())

See also#

  • Database API — full reference for PoseDatabase and PoseQuery

  • Postprocess API — build pose tables and interaction payloads before database import

  • Postprocess — postprocess docking outputs into reusable tables and summaries