AbAffinity is a Large Languge Model to predict binding affinity of scFv antibody sequence against SARS-CoV-2 HR2 peptide. It takes input of antibody heavy and light chain sequence, and predicts the binding affinity against a peptide in SARS-CoV-2 HR2 peptide. This peptide is common in all the variants of SARS-CoV-2.
-
Predict Binding Affinity: Given the input antiobdy seqeunce, predict binding affinity
-
Antibody Representation: Given the input antiobdy seqeunce, provide embedding of the antibody. The model gives both residue level representation and sequence level representation.
-
Attention Contact Map: Given the input antibody sequence, AbAffinity will give residue-residue attention maps of the antibody.
You can install AbAffinity using the following command:
pip install git+https://huggingface.co/faisalashraf/abaffinityYou can also install it in a local folder:
git lfs install
git clone https://huggingface.co/faisalashraf/abaffinity
cd abaffinity
pip install .Here's a quick example to get started:
from AbAffinity import AbAffinity
# Example usage
abmodel=AbAffinity()
#The model takes complete scFv sequences as input. Heavy and Light chain are connected with a linker sequence. Use make_scFv() method from the model to get the complete scFv seqeunce from heavy chain and light chain sequence.
heavy_seq = 'EVQLVESGAEVKKPGASVKVSCKASGYTFTSYGISWVRQAPGQGLEWMGWISAYNGNTNYAQKLQGRVTMTTDTSTSTAYMELRSLRSDDTAVYYCARVGRGVIDHWGQGTLVTVSS'
light_seq = 'SSELTQDPAVSVALGQTVRITCEGDSLDYYYANWYQQKPGQAPILVIYGKNNRPSGIADRFSGSNSGDTSSLIITGAQAEDEADYYCSSRDSSGFEVTFGAGTKLTVL'
scFv_seq = abmodel.make_scFv(heavy_seq, light_seq)
print(scFv_seq) # Output: EVQLVESGAEVKKPGASVKVSCKASGYTFTSYGISWVRQAPGQGLEWMGWISAYNGNTNYAQKLQGRVTMTTDTSTSTAYMELRSLRSDDTAVYYCARVGRGVIDHWGQGTLVTVSSGGGGSGGGGSGGGGSSSELTQDPAVSVALGQTVRITCEGDSLDYYYANWYQQKPGQAPILVIYGKNNRPSGIADRFSGSNSGDTSSLIITGAQAEDEADYYCSSRDSSGFEVTFGAGTKLTVL
#Use `get_affinity()` method to get the predicted binding affinity of the antibody sequence.
#You can pass a list of sequences to get embeddings for all. Make sure that you have enough memory to process the sequences altogether. You can tune the batch size for this purpose. Example: `model.get_affinity(list_sequences, batch_size=16)`. Default batch_size is 4.
pred_affinity = abmodel.get_affinity(scFv_seq)
print(pred_affinity) # Output: tensor([3.1595])
# Use `get_embeddings()` method to get the embeddings for input sequences. Use `mode='res'` to get residue wise embeddings, and `mode='seq'` will give seqeunce embedding.
# You can pass a list of sequences to get embeddings for all. Make sure that you have enough memory to process the sequences altogether. You can tune the batch size for this purpose. Example: `model.get_embeddings(list_sequences, mode='seq', batch_size=16)`. Default batch_size is 4.
res_emb = abmodel.get_embeddings(scFv_seq, mode='res')
print(res_emb.shape) # Output: torch.Size([258, 1280])
seq_emb = abmodel.get_embeddings(scFv_seq, mode='seq')
print(seq_emb.shape) # Output: torch.Size([1280])
# Use `get_contact_map()` method to get the contact maps of the given antibody sequence. It will return a matrix of shape `L x L` where `L` is the length of input sequence. Each value in the matrix represents the contact weight between two residue in the sequence.
# Use `mode='VH-VL'` if you want to plot the contacts for heavy chain and light chain separately, and `mode='scFv'` to plot single contacts for the entire scFv sequence.
contacts = abmodel.get_contact_map(scFv_seq, mode = 'scFv')
print(contacts.shape) # Output: contact map figure, (240, 240)This project is licensed under the MIT License - see the LICENSE file for details.
For citation:
@article{ashraf2024large,
title={A Large Language Model Guides the Affinity Maturation of Variant Antibodies Generated by Combinatorial Optimization},
author={Ashraf, Faisal Bin and Zhang, Zihao and Paco, Karen and Mendivil, Mariana P and Lay, Jordan A and Ray, Animesh and Lonardi, Stefano},
journal={bioRxiv},
pages={2024--12},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}