A Siamese BERT model to predict peptide-HLA immunogenic binding.
- Peptide pretraining run files format: 'pretraining_LLM_{}mlm{}.out'.format(peptide_max_length, mask_rate)
- Siamese BERT run files format: 'prot_bert_{}{}mlm{}{}.out'.format(peptide_max_length, mhc_max_length, mask_rate, previous_peptide_pretraining_steps), where the last field is the number of previously completed pretraining steps of the peptide BERT. If the filename is followed by 'new_split', the run uses the new MHC split into train and test sets. (A filename-construction sketch is included at the end of this file.)
- bash commands: (1) LLM pretraining: 'train_llm.sh' for a single GPU, 'train_llm_ddp.sh' for multiple GPUs (up to 4 on the CCDB Cedar cluster). (2) Siamese BERT training: 'train_protbert.sh'. I tested multi-GPU training, but it was slower than a single GPU, so I recommend a single GPU for now.
- Virtual environment package requirements: 'requirements_pep.txt' for LLM pretraining and 'requirements_prot_bert' for Siamese BERT training
- Sample interactive run with 4 GPUs: 'interactive'
- Result analysis: sample code 'plot.py'
- My working log on this dataset: 'MHCAttNet_log'
- Model architecture details: 'Our_model_details' (a rough illustrative sketch of the Siamese setup is included at the end of this file)
- peptide_max_length: 48
- mhc_max_length (maximum MHC length): 350
- MLM mask rate: 0.25 (its use in masking is sketched at the end of this file)
- last saved checkpoint step: 44000
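For concreteness, here is a minimal sketch of how the run-file names above are assembled from the settings listed in this file. The format strings are taken directly from the notes above; how the mask rate and step count are actually rendered in the real file names (e.g. '0.25' vs '25') is an assumption.

```python
# Build the run-file names described above from the listed defaults.
peptide_max_length = 48
mhc_max_length = 350
mask_rate = 0.25
previous_peptide_pretraining_steps = 44000

pretraining_out = 'pretraining_LLM_{}mlm{}.out'.format(peptide_max_length, mask_rate)
siamese_out = 'prot_bert_{}{}mlm{}{}.out'.format(
    peptide_max_length, mhc_max_length, mask_rate, previous_peptide_pretraining_steps)

print(pretraining_out)  # pretraining_LLM_48mlm0.25.out
print(siamese_out)      # prot_bert_48350mlm0.2544000.out
```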
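The MLM mask rate of 0.25 can be wired into a HuggingFace-style masking collator as sketched below. This is only an illustration: the 'Rostlab/prot_bert' tokenizer checkpoint and the space-separated amino-acid input format are assumptions, not necessarily what the training scripts in this repo use.

```python
# Sketch of MLM masking at rate 0.25 with a HuggingFace data collator.
# Tokenizer checkpoint and input format are assumptions.
from transformers import BertTokenizer, DataCollatorForLanguageModeling

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)  # assumed checkpoint
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.25,  # the MLM mask rate listed above
)

# Peptides padded/truncated to peptide_max_length = 48.
encoded = tokenizer(
    ["M K T A Y I A K Q R"],
    padding="max_length", max_length=48, truncation=True, return_tensors="pt",
)
batch = collator([{k: v[0] for k, v in encoded.items()}])
print(batch["input_ids"].shape, batch["labels"].shape)  # both torch.Size([1, 48])
```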
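Finally, a rough, hypothetical illustration of the Siamese arrangement for readers skimming this file; the authoritative description is in 'Our_model_details'. The checkpoint paths, the choice of MHC encoder, and the classification head below are all assumptions.

```python
# Rough illustrative sketch only; see 'Our_model_details' for the real architecture.
import torch
import torch.nn as nn
from transformers import BertModel

class SiamesePeptideMHC(nn.Module):
    def __init__(self,
                 peptide_ckpt="checkpoints/pretraining_LLM/checkpoint-44000",  # assumed path to the step-44000 checkpoint
                 mhc_ckpt="Rostlab/prot_bert"):                                # assumed MHC encoder
        super().__init__()
        self.peptide_bert = BertModel.from_pretrained(peptide_ckpt)
        self.mhc_bert = BertModel.from_pretrained(mhc_ckpt)
        hidden = self.peptide_bert.config.hidden_size + self.mhc_bert.config.hidden_size
        self.classifier = nn.Linear(hidden, 1)  # binding-score head

    def forward(self, peptide_inputs, mhc_inputs):
        # Pooled embedding of each branch, concatenated and scored.
        pep = self.peptide_bert(**peptide_inputs).pooler_output
        mhc = self.mhc_bert(**mhc_inputs).pooler_output
        return self.classifier(torch.cat([pep, mhc], dim=-1))  # pre-sigmoid logit
```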