Chemical Modeling

The pepkit.chem package contains utilities for peptide sequences and peptide-like molecules:

FASTA/SMILES conversion (linear peptides)
Standardization & filtering (drop non-canonical residues, batch/DataFrame processing)
Peptide properties (net charge, molecular weight, pI)
Descriptor calculation for ML pipelines
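Net charge and pI are usually estimated from the pKa values of a peptide's ionizable groups (the two termini plus ionizable side chains) with the Henderson-Hasselbalch equation. The sketch below shows that idea in plain Python. It is not pepkit's implementation. The pKa table is an assumed textbook set, and the function names `net_charge` and `isoelectric_point` are hypothetical. Other pKa scales will give somewhat different numbers.

```python
# Hypothetical sketch: net charge and pI of a linear peptide via the
# Henderson-Hasselbalch equation. The pKa values are an ASSUMED textbook set;
# pepkit may use a different scale.
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq: str, ph: float = 7.0) -> float:
    """Approximate net charge of a linear peptide at the given pH."""
    # Each terminus counts once; side chains count once per occurrence.
    pos_groups = ["Nterm"] + [aa for aa in seq if aa in PKA_POSITIVE]
    neg_groups = ["Cterm"] + [aa for aa in seq if aa in PKA_NEGATIVE]
    positive = sum(1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[g])) for g in pos_groups)
    negative = sum(1.0 / (1.0 + 10 ** (PKA_NEGATIVE[g] - ph)) for g in neg_groups)
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Find the pH where the net charge crosses zero, by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Net charge decreases monotonically as pH increases.
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(net_charge("ACDE"), 2))
print(round(isoelectric_point("ACDE"), 2))
```

Bisection works here because the positive term shrinks and the negative term grows as pH rises. The net charge is therefore monotonic in pH.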

At a glance 

Inputs	FASTA strings, peptide sequences, SMILES, or pandas DataFrames
Outputs	Canonical SMILES, cleaned/standardized sequences, property dicts, descriptor tables
Where to look next	API Reference for full function/class docs

Sequence ⇄ SMILES conversion 

Convert a peptide sequence → SMILES 

Convert a sequence to a canonical SMILES string:

from pepkit.chem.conversion import fasta_to_smiles

fasta = "ACDE"
smiles = fasta_to_smiles(fasta)
print("SMILES:", smiles)

Example output

SMILES: C[C@H](N)C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)O
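A linear peptide SMILES can be pictured as per-residue backbone fragments joined by amide bonds. The dependency-free sketch below illustrates that idea. It is not how pepkit builds SMILES. The fragment table covers only a few residues, and the name `sequence_to_smiles_sketch` is hypothetical. The output is a valid SMILES but not a canonical one. Canonicalizing it with RDKit (`Chem.MolToSmiles(Chem.MolFromSmiles(s))`) would normalize its form.

```python
# Hypothetical illustration of assembling a linear peptide SMILES from
# residue fragments; NOT pepkit's implementation. Each fragment is written
# N->C as "N[C@@H](side chain)C(=O)", which encodes the L configuration;
# glycine has no stereocentre.
RESIDUE_FRAGMENTS = {
    "A": "N[C@@H](C)C(=O)",
    "C": "N[C@@H](CS)C(=O)",
    "D": "N[C@@H](CC(=O)O)C(=O)",
    "E": "N[C@@H](CCC(=O)O)C(=O)",
    "F": "N[C@@H](Cc1ccccc1)C(=O)",
    "G": "NCC(=O)",
    "K": "N[C@@H](CCCCN)C(=O)",
    "S": "N[C@@H](CO)C(=O)",
}


def sequence_to_smiles_sketch(seq: str) -> str:
    """Join residue fragments and close the C-terminus as a free acid."""
    missing = sorted(set(seq) - set(RESIDUE_FRAGMENTS))
    if missing:
        raise ValueError(f"no fragment for residue(s): {missing}")
    # Each fragment ends in C(=O); the next fragment's leading N forms the
    # amide bond, and the final "O" makes the last carbonyl a carboxylic acid.
    return "".join(RESIDUE_FRAGMENTS[aa] for aa in seq) + "O"


print(sequence_to_smiles_sketch("ACDE"))
```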

Convert SMILES → sequence 

Convert a peptide-like SMILES back to a FASTA/sequence (when the SMILES corresponds to a peptide-like backbone):

from pepkit.chem.conversion import smiles_to_fasta

# `smiles` is the string produced in the previous example
seq = smiles_to_fasta(smiles, header="peptide1")
print(seq)

Example output

>peptide1
ACDE
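When a `header` is given, the result is a FASTA record: a `>` header line followed by the sequence. The hypothetical helpers below, `format_fasta` and `split_fasta`, are not part of pepkit. They show that record shape under standard FASTA conventions, including how a wrapped sequence is joined back into one string.

```python
# Hypothetical helpers illustrating the FASTA record shape shown above;
# they are not part of pepkit.
def format_fasta(seq: str, header: str, width: int = 60) -> str:
    """Build a FASTA record, wrapping the sequence every `width` characters."""
    lines = [f">{header}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


def split_fasta(record: str) -> tuple[str, str]:
    """Return (header, sequence) from a single FASTA record."""
    lines = [line.strip() for line in record.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with a '>' header line")
    # Concatenate sequence lines to undo any line wrapping.
    return lines[0][1:], "".join(lines[1:])


record = format_fasta("ACDE", header="peptide1")
print(record)
print(split_fasta(record))
```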

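Round-trip quick check

from pepkit.chem.conversion import fasta_to_smiles, smiles_to_fasta

seq = "ACDEFGHIK"
# split=True is used so the result can be compared with the bare sequence
# (see the API Reference for its exact behavior)
ok = smiles_to_fasta(fasta_to_smiles(seq), split=True) == seq
print("Round trip:", ok)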
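The check above tests a single sequence. To screen a whole list, you can wrap the same idea in a helper that collects the sequences which fail to round-trip. `round_trip_failures` is a hypothetical helper. It takes the two conversion functions as arguments. With pepkit you would pass `fasta_to_smiles` and `lambda s: smiles_to_fasta(s, split=True)`. The toy stand-ins below exist only so the sketch runs without pepkit or RDKit.

```python
from typing import Callable, Iterable


def round_trip_failures(
    seqs: Iterable[str],
    to_smiles: Callable[[str], str],
    to_seq: Callable[[str], str],
) -> list[str]:
    """Return the sequences that do not survive seq -> SMILES -> seq."""
    failures = []
    for seq in seqs:
        try:
            ok = to_seq(to_smiles(seq)) == seq
        except Exception:
            # Treat a conversion error as a failed round trip.
            ok = False
        if not ok:
            failures.append(seq)
    return failures


# Toy stand-ins (hypothetical) so the sketch runs on its own.
def fake_to_smiles(seq: str) -> str:
    return seq.lower()


def fake_to_seq(smiles: str) -> str:
    if "x" in smiles:
        raise ValueError("unsupported residue")
    return smiles.upper()


print(round_trip_failures(["ACDE", "AXC", "GK"], fake_to_smiles, fake_to_seq))
```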

Note

These conversions are intended for linear peptides. Modified or cyclic peptides, as well as arbitrary small-molecule SMILES, may not round-trip.
RDKit is required.

When remove_non_canonical=True, records containing non-canonical residues are either dropped or set to None, depending on the function. Check how your downstream pipeline handles missing values before enabling this option.
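The two behaviors described in the note have different effects on downstream code. Dropping a record shortens the output. Setting it to None keeps the output aligned with the input rows. The sketch below illustrates both. `standardize` and its `drop` flag are hypothetical and are not pepkit's `remove_non_canonical` API. "Canonical" is assumed to mean the 20 standard one-letter amino-acid codes, after whitespace stripping and upper-casing.

```python
from typing import Iterable, Optional

# Assumed definition: the 20 standard amino acids (one-letter codes).
CANONICAL = frozenset("ACDEFGHIKLMNPQRSTVWY")


def standardize(seqs: Iterable[str], drop: bool = True) -> list[Optional[str]]:
    """Strip/upper-case each sequence and handle non-canonical ones.

    drop=True  -> remove the record entirely (output may be shorter)
    drop=False -> keep the slot as None (output stays aligned with input)
    """
    out: list[Optional[str]] = []
    for seq in seqs:
        clean = seq.strip().upper()
        if clean and set(clean) <= CANONICAL:
            out.append(clean)
        elif not drop:
            out.append(None)
    return out


seqs = ["acde", "ACXE", " GK "]
print(standardize(seqs, drop=True))   # ['ACDE', 'GK']
print(standardize(seqs, drop=False))  # ['ACDE', None, 'GK']
```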

Descriptors 

Generate ML features using peptide-sequence descriptors or RDKit molecular descriptors.

from pepkit.chem.desc import Descriptor

# Peptide sequence descriptors
data_pep = [{"id": "pep1", "peptide_sequence": "ACDE"}]
desc_pep = Descriptor(engine="peptides").calculate(data_pep)
print(desc_pep[0])

# RDKit molecular descriptors
data_mol = [{"id": "mol1", "smiles": "CCO"}]
desc_mol = Descriptor(engine="rdkit").calculate(data_mol)
print(desc_mol[0])
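The exact columns each engine returns are listed in the API Reference. The general shape of a descriptor table is one row per input record with a fixed set of numeric columns. The hypothetical `composition_descriptors` function below illustrates that shape with amino-acid composition only. It uses the same record format as `data_pep` above. It is far simpler than the peptides engine, and its column names are made up for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_descriptors(records: list[dict]) -> list[dict]:
    """Amino-acid composition (fraction of each residue) per record.

    Records follow the shape {"id": ..., "peptide_sequence": ...}.
    """
    rows = []
    for rec in records:
        seq = rec["peptide_sequence"]
        counts = Counter(seq)
        length = len(seq) or 1  # avoid division by zero on empty sequences
        row = {"id": rec["id"], "length": len(seq)}
        # One fixed column per canonical residue keeps the table rectangular.
        row.update({f"frac_{aa}": counts[aa] / length for aa in AMINO_ACIDS})
        rows.append(row)
    return rows


rows = composition_descriptors([{"id": "pep1", "peptide_sequence": "ACDE"}])
print(rows[0]["frac_A"], rows[0]["frac_G"])  # 0.25 0.0
```

Because every row has the same keys, `pandas.DataFrame(rows)` turns the result directly into a feature matrix.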

Note

The peptides engine requires the third-party package peptides. Install with pip install peptides when using engine="peptides".
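You can check for the optional dependency before building a pipeline, so a missing package fails early with a clear message. The sketch below uses the standard-library `importlib.util.find_spec`, which locates a package without importing it. The helper names `has_package` and `require_peptides` are hypothetical.

```python
import importlib.util


def has_package(name: str) -> bool:
    """True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_peptides() -> None:
    # Fail early with an actionable message instead of deep inside a pipeline.
    if not has_package("peptides"):
        raise ImportError(
            'engine="peptides" needs the third-party package: pip install peptides'
        )


print(has_package("json"))
```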
