Modelling

Overview

This module provides utilities to post-process AlphaFold-Multimer predictions, extract interface features, and rescore complexes with pDockQ/pDockQ2. Typical workflows:

single model analysis (single_analysis)
per-entry processing across ranked models (all_analysis)
batch processing across many entry folders (batch_analysis)

Examples

Single (direct) analysis

Analyze one predicted model (JSON + PDB):

from pepkit.modelling.af.post.analysis import Analysis

a = Analysis(
    json_path="data/examples/7QWV_A_7QWV_B/7QWV_A_7QWV_B_scores_rank_001_alphafold2_multimer_v3_model_3_seed_000.json",
    pdb_path="data/examples/7QWV_A_7QWV_B/7QWV_A_7QWV_B_relaxed_rank_001_alphafold2_multimer_v3_model_3_seed_000.pdb",
    peptide_chain_position="last",
    distance_cutoff=8.0,
)

result = a.single_analysis()
subset = {
    "composite_ptm": result.get("composite_ptm"),
    "pdockq": result.get("pdockq"),
    "pdockq2": result.get("pdockq2"),
}
print(subset)

Example output:

{'composite_ptm': 0.82, 'pdockq': 0.16, 'pdockq2': 0.36}

Per-entry (all ranked models)

Process every ranked model in a folder:

a = Analysis(peptide_chain_position="last")
entry_result = a.all_analysis("data/examples/7QWV_A_7QWV_B/")

The returned entry_result is a dict with per-rank keys (rank001, rank002 …) and meta keys such as length and processing_time.

Command line usage

Single entry

Module invocation:

python -m pepkit.modelling.af.post.analysis --entry_dir data/examples/7QWV_A_7QWV_B

Installed CLI:

pepkit postprocess --af-out data/examples/7QWV_A_7QWV_B --single-entry

The command writes result.json inside the entry directory (af-out/result.json).

Example af-out/result.json (valid JSON; truncated):

{
  "rank001": {
    "mean_plddt": 83.091,
    "median_plddt": 95.0,
    "peptide_plddt": 78.062,
    "protein_interface_plddt": 87.986,
    "peptide_interface_plddt": 78.062,
    "interface_plddt": 83.024,
    "mean_pae": 8.683,
    "max_pae": 31.312,
    "peptide_pae": 8.837,
    "protein_interface_pae": 6.613,
    "peptide_interface_pae": 8.837,
    "mean_interface_pae": 4.408,
    "pdockq": 0.16,
    "pdockq2": 0.36,
    "composite_ptm": 0.82
  },
  "rank002": {},
  "length": 5,
  "processing_time": 12.3
}

Batch (multiple entries)

Module invocation:

python -m pepkit.modelling.af.post.analysis --batch_dir data/examples

Installed CLI:

pepkit postprocess --af-out data/examples

The batch run processes all entry folders under the provided directory.

Turning results into a summary table

Collect pDockQ / pDockQ2 across ranks and build a DataFrame:

import json
import pandas as pd
from pathlib import Path

rows = []
for entry_dir in Path("data/examples").iterdir():
    p = entry_dir / "result.json"
    if not p.exists():
        continue
    data = json.loads(p.read_text())
    for rank_key, record in data.items():
        if not rank_key.startswith("rank"):
            continue
        rows.append({
            "entry": entry_dir.name,
            "rank": rank_key,
            "pdockq": record.get("pdockq"),
            "pdockq2": record.get("pdockq2"),
            "composite_ptm": record.get("composite_ptm"),
        })

df = pd.DataFrame(rows).sort_values(["entry", "rank"])
print(df.head())

Quick notes & tips

peptide_chain_position: use "last" if the peptide is the last chain in the PDB.
distance_cutoff: interface radius in Å (example uses 8.0 Å).
Output is plain JSON dicts; save result.json per entry for reproducibility.

API quick reference

`Analysis(...)`	Constructor. Common args: `json_path`, `pdb_path`, `peptide_chain_position`, `distance_cutoff`.
`single_analysis()`	Analyze a single JSON+PDB pair → metrics dict.
`all_analysis(folder_path)`	Process ranked models in `folder_path` → dict keyed by rank.
`batch_analysis(parent_path, progress_step_pct=20)`	Process multiple entries under `parent_path` → mapping entry→results.