
A Large Language Model to predict the binding affinity of scFv antibody sequences against SARS-CoV-2. It takes antibody heavy- and light-chain sequences as input and predicts binding affinity against a peptide common to SARS-CoV-2 variants.


ucrbioinfo/AbAffinity


Overview

AbAffinity is a Large Language Model that predicts the binding affinity of scFv antibody sequences against the SARS-CoV-2 HR2 peptide. It takes antibody heavy- and light-chain sequences as input and predicts their binding affinity against this peptide, which is common to all SARS-CoV-2 variants.

Key Features

  • Predict Binding Affinity: Given an input antibody sequence, predict its binding affinity.

  • Antibody Representation: Given an input antibody sequence, return embeddings of the antibody at both the residue level and the sequence level.

  • Attention Contact Map: Given an input antibody sequence, return residue-residue attention maps of the antibody.
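The two representation levels are related: a common way to turn per-residue embeddings into one sequence-level vector is to average them over residue positions (mean pooling). The sketch below shows that operation in plain Python. Whether AbAffinity's `mode='seq'` pools exactly this way (for example, whether special start/end tokens are excluded) is an assumption, and `mean_pool` is a hypothetical helper, not part of the AbAffinity API.

```python
from typing import List


def mean_pool(residue_embeddings: List[List[float]]) -> List[float]:
    """Average per-residue vectors (L x D) into one sequence vector (D).

    Hypothetical helper: mean pooling is an assumed strategy, not confirmed
    to be what AbAffinity uses for sequence-level embeddings.
    """
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    dim = len(residue_embeddings[0])
    if any(len(vec) != dim for vec in residue_embeddings):
        raise ValueError("all residue embeddings must have the same dimension")
    n = len(residue_embeddings)
    # Sum each feature column across residues, then divide by the residue count.
    return [sum(vec[j] for vec in residue_embeddings) / n for j in range(dim)]


# Toy example: 3 residues, each with a 2-dimensional embedding.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mean_pool(toy))  # [3.0, 4.0]
```

Applied to a residue-level output of shape `(L, 1280)`, this kind of pooling yields a 1280-dimensional vector, the same shape reported for `mode='seq'` in the usage example.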

Installation

You can install AbAffinity using the following command:

pip install git+https://huggingface.co/faisalashraf/abaffinity

Alternatively, clone the repository and install it from the local folder:

git lfs install
git clone https://huggingface.co/faisalashraf/abaffinity
cd abaffinity 
pip install .
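To confirm that either installation route worked, you can check that the `AbAffinity` module (the name imported in the usage example) is findable without loading the model. This is a minimal standard-library sketch; `is_installed` is a hypothetical helper, not part of AbAffinity.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path."""
    # find_spec locates the module without executing it, so no model weights are loaded.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("AbAffinity installed:", is_installed("AbAffinity"))
```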

Usage

Here's a quick example to get started:

from AbAffinity import AbAffinity

# Example usage
abmodel = AbAffinity()


# The model takes complete scFv sequences as input, with the heavy and light chains joined by a linker sequence. Use the make_scFv() method to build the complete scFv sequence from the heavy-chain and light-chain sequences.

heavy_seq = 'EVQLVESGAEVKKPGASVKVSCKASGYTFTSYGISWVRQAPGQGLEWMGWISAYNGNTNYAQKLQGRVTMTTDTSTSTAYMELRSLRSDDTAVYYCARVGRGVIDHWGQGTLVTVSS' 
light_seq = 'SSELTQDPAVSVALGQTVRITCEGDSLDYYYANWYQQKPGQAPILVIYGKNNRPSGIADRFSGSNSGDTSSLIITGAQAEDEADYYCSSRDSSGFEVTFGAGTKLTVL'

scFv_seq = abmodel.make_scFv(heavy_seq, light_seq) 
print(scFv_seq)  # Output: EVQLVESGAEVKKPGASVKVSCKASGYTFTSYGISWVRQAPGQGLEWMGWISAYNGNTNYAQKLQGRVTMTTDTSTSTAYMELRSLRSDDTAVYYCARVGRGVIDHWGQGTLVTVSSGGGGSGGGGSGGGGSSSELTQDPAVSVALGQTVRITCEGDSLDYYYANWYQQKPGQAPILVIYGKNNRPSGIADRFSGSNSGDTSSLIITGAQAEDEADYYCSSRDSSGFEVTFGAGTKLTVL

# Use the `get_affinity()` method to get the predicted binding affinity of an scFv sequence.
# You can also pass a list of sequences to get predictions for all of them. Make sure you have enough memory to process them together; tune `batch_size` if needed, e.g. `abmodel.get_affinity(list_sequences, batch_size=16)`. The default batch_size is 4.

pred_affinity = abmodel.get_affinity(scFv_seq)
print(pred_affinity) # Output: tensor([3.1595]) 


# Use the `get_embeddings()` method to get embeddings for input sequences. `mode='res'` returns residue-wise embeddings and `mode='seq'` returns a single sequence embedding.
# You can also pass a list of sequences to get embeddings for all of them. Make sure you have enough memory to process them together; tune `batch_size` if needed, e.g. `abmodel.get_embeddings(list_sequences, mode='seq', batch_size=16)`. The default batch_size is 4.

res_emb = abmodel.get_embeddings(scFv_seq, mode='res')
print(res_emb.shape)  # Output: torch.Size([258, 1280])
 
seq_emb = abmodel.get_embeddings(scFv_seq, mode='seq')
print(seq_emb.shape) # Output: torch.Size([1280]) 

# Use the `get_contact_map()` method to get the contact map of an antibody sequence. It returns an `L x L` matrix, where `L` is the length of the input sequence; each value is the contact weight between two residues.
# Use `mode='VH-VL'` to plot contacts for the heavy and light chains separately, or `mode='scFv'` to plot a single contact map for the entire scFv sequence.

contacts = abmodel.get_contact_map(scFv_seq, mode='scFv')
print(contacts.shape)  # Output: a contact-map plot, then (240, 240)
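If you want to prepare inputs offline, for example to check lengths before calling the model, the scFv layout can be read off the `make_scFv()` output above: heavy chain, then the linker `GGGGSGGGGSGGGGS` ((G4S)3), then light chain. The sketch below rebuilds that string, records where VH and VL sit in it (handy for cutting an `L x L` contact map into chain blocks), and splits a list of sequences into batches of 4, the documented default `batch_size`. The helpers `build_scfv`, `block`, and `batches` are hypothetical, this is not AbAffinity's implementation, treating the linker as fixed is an assumption inferred from the example output, and whether `mode='VH-VL'` slices the map this way is not documented.

```python
from typing import Iterator, List, Sequence, Tuple

# Assumption: the (G4S)3 linker seen in the make_scFv() example output is fixed.
LINKER = "GGGGS" * 3


def build_scfv(heavy: str, light: str, linker: str = LINKER) -> Tuple[str, range, range]:
    """Join VH + linker + VL and return (scfv, vh_range, vl_range), 0-based."""
    scfv = heavy + linker + light
    vh = range(0, len(heavy))
    vl = range(len(heavy) + len(linker), len(scfv))
    return scfv, vh, vl


def block(matrix: Sequence[Sequence[float]], rows: range, cols: range) -> List[List[float]]:
    """Cut the rows x cols sub-matrix out of a square L x L matrix (nested lists)."""
    return [list(row[cols.start:cols.stop]) for row in matrix[rows.start:rows.stop]]


def batches(seqs: List[str], batch_size: int = 4) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size sequences."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(seqs), batch_size):
        yield seqs[start:start + batch_size]


heavy_seq = "EVQLVESGAEVKKPGASVKVSCKASGYTFTSYGISWVRQAPGQGLEWMGWISAYNGNTNYAQKLQGRVTMTTDTSTSTAYMELRSLRSDDTAVYYCARVGRGVIDHWGQGTLVTVSS"
light_seq = "SSELTQDPAVSVALGQTVRITCEGDSLDYYYANWYQQKPGQAPILVIYGKNNRPSGIADRFSGSNSGDTSSLIITGAQAEDEADYYCSSRDSSGFEVTFGAGTKLTVL"

scfv, vh, vl = build_scfv(heavy_seq, light_seq)
print(len(scfv), len(vh), len(vl), vl.start)  # 240 117 108 132

# An all-zero L x L matrix standing in for a real contact map.
contacts = [[0.0] * len(scfv) for _ in range(len(scfv))]
vh_vl = block(contacts, vh, vl)  # heavy-chain rows x light-chain columns
print(len(vh_vl), len(vh_vl[0]))  # 117 108

print([len(b) for b in batches([scfv] * 10)])  # [4, 4, 2]
```

The light chain starts at index 132 rather than 117 because of the 15-residue linker, which is also why the scFv contact map in the example is 240 x 240 (117 + 15 + 108) rather than 225 x 225.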

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

If you use AbAffinity, please cite:

@article{ashraf2024large,
  title={A Large Language Model Guides the Affinity Maturation of Variant Antibodies Generated by Combinatorial Optimization},
  author={Ashraf, Faisal Bin and Zhang, Zihao and Paco, Karen and Mendivil, Mariana P and Lay, Jordan A and Ray, Animesh and Lonardi, Stefano},
  journal={bioRxiv},
  pages={2024--12},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}
