Modelling ========= Overview -------- This module provides utilities to post-process AlphaFold-Multimer predictions, extract interface features, and rescore complexes with ``pDockQ``/``pDockQ2``. Typical workflows: - single model analysis (``single_analysis``) - per-entry processing across ranked models (``all_analysis``) - batch processing across many entry folders (``batch_analysis``) Examples -------- Single (direct) analysis ^^^^^^^^^^^^^^^^^^^^^^^^ Analyze one predicted model (JSON + PDB): .. code-block:: python from pepkit.modelling.af.post.analysis import Analysis a = Analysis( json_path="data/examples/7QWV_A_7QWV_B/7QWV_A_7QWV_B_scores_rank_001_alphafold2_multimer_v3_model_3_seed_000.json", pdb_path="data/examples/7QWV_A_7QWV_B/7QWV_A_7QWV_B_relaxed_rank_001_alphafold2_multimer_v3_model_3_seed_000.pdb", peptide_chain_position="last", distance_cutoff=8.0, ) result = a.single_analysis() subset = { "composite_ptm": result.get("composite_ptm"), "pdockq": result.get("pdockq"), "pdockq2": result.get("pdockq2"), } print(subset) Example output: .. code-block:: python {'composite_ptm': 0.82, 'pdockq': 0.16, 'pdockq2': 0.36} Per-entry (all ranked models) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Process every ranked model in a folder: .. code-block:: python a = Analysis(peptide_chain_position="last") entry_result = a.all_analysis("data/examples/7QWV_A_7QWV_B/") The returned ``entry_result`` is a dict with per-rank keys (``rank001``, ``rank002`` …) and meta keys such as ``length`` and ``processing_time``. Command line usage ------------------ Single entry ^^^^^^^^^^^^ Module invocation: .. code-block:: console python -m pepkit.modelling.af.post.analysis --entry_dir data/examples/7QWV_A_7QWV_B Installed CLI: .. code-block:: console pepkit postprocess --af-out data/examples/7QWV_A_7QWV_B --single-entry The command writes ``result.json`` inside the entry directory (``af-out/result.json``). Example ``af-out/result.json`` (valid JSON; truncated): .. code-block:: json { "rank001": { "mean_plddt": 83.091, "median_plddt": 95.0, "peptide_plddt": 78.062, "protein_interface_plddt": 87.986, "peptide_interface_plddt": 78.062, "interface_plddt": 83.024, "mean_pae": 8.683, "max_pae": 31.312, "peptide_pae": 8.837, "protein_interface_pae": 6.613, "peptide_interface_pae": 8.837, "mean_interface_pae": 4.408, "pdockq": 0.16, "pdockq2": 0.36, "composite_ptm": 0.82 }, "rank002": {}, "length": 5, "processing_time": 12.3 } Batch (multiple entries) ^^^^^^^^^^^^^^^^^^^^^^^^ Module invocation: .. code-block:: console python -m pepkit.modelling.af.post.analysis --batch_dir data/examples Installed CLI: .. code-block:: console pepkit postprocess --af-out data/examples The batch run processes all entry folders under the provided directory. Turning results into a summary table ------------------------------------ Collect ``pDockQ`` / ``pDockQ2`` across ranks and build a DataFrame: .. code-block:: python import json import pandas as pd from pathlib import Path rows = [] for entry_dir in Path("data/examples").iterdir(): p = entry_dir / "result.json" if not p.exists(): continue data = json.loads(p.read_text()) for rank_key, record in data.items(): if not rank_key.startswith("rank"): continue rows.append({ "entry": entry_dir.name, "rank": rank_key, "pdockq": record.get("pdockq"), "pdockq2": record.get("pdockq2"), "composite_ptm": record.get("composite_ptm"), }) df = pd.DataFrame(rows).sort_values(["entry", "rank"]) print(df.head()) Quick notes & tips ------------------ - ``peptide_chain_position``: use ``"last"`` if the peptide is the last chain in the PDB. - ``distance_cutoff``: interface radius in Å (example uses 8.0 Å). - Output is plain JSON dicts; save ``result.json`` per entry for reproducibility. API quick reference ------------------- .. list-table:: :widths: 25 75 :header-rows: 0 * - ``Analysis(...)`` - Constructor. Common args: ``json_path``, ``pdb_path``, ``peptide_chain_position``, ``distance_cutoff``. * - ``single_analysis()`` - Analyze a single JSON+PDB pair → metrics dict. * - ``all_analysis(folder_path)`` - Process ranked models in `folder_path` → dict keyed by rank. * - ``batch_analysis(parent_path, progress_step_pct=20)`` - Process multiple entries under `parent_path` → mapping entry→results. See also -------- - :doc:`getting_started` - :doc:`api`