ERGO is a deep learning based model for predicting TCR-peptide binding.
Check our web-tool at http://tcr.cs.biu.ac.il
pytorch 1.4.0
numpy 1.18.1
scikit-learn 0.22.1
The main module for training is ERGO.py.
For training, run:
python ERGO.py train model_type database specific gpu --model_file=model.pt --train_data_file=train_data --test_data_file=test_data
where:
- model_type is the type of TCR encoding: LSTM based with 'lstm' or autoencoder based with 'ae'.
- database is the training database: McPAS-TCR with 'mcpas' or VDJdb with 'vdjdb'.
- gpu is the CUDA device to use (e.g. 'cuda:0'), or 'cpu' for CPU (which may be much slower).
- --model_file is the file to which the model is saved after training.
- --train_data_file and --test_data_file are the train and test data files. You can set them to 'auto' for the defaults.
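The training invocation above can also be assembled from a script, which makes it easier to catch a mistyped model type or database before a long run starts. The following is a minimal sketch and not part of ERGO: the helper name build_train_command is hypothetical, and the positional 'specific' argument is passed through unchanged because its meaning is not described here.

```python
import subprocess
import sys

# Values listed in the parameter descriptions above.
MODEL_TYPES = {"lstm", "ae"}
DATABASES = {"mcpas", "vdjdb"}


def build_train_command(model_type, database, specific, device, model_file,
                        train_data_file="auto", test_data_file="auto"):
    """Return the ERGO.py training command as an argument list.

    Hypothetical helper: it only mirrors the argument order shown above.
    """
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {sorted(MODEL_TYPES)}")
    if database not in DATABASES:
        raise ValueError(f"database must be one of {sorted(DATABASES)}")
    return [
        sys.executable, "ERGO.py", "train",
        model_type, database, specific, device,
        f"--model_file={model_file}",
        f"--train_data_file={train_data_file}",
        f"--test_data_file={test_data_file}",
    ]


if __name__ == "__main__":
    cmd = build_train_command("lstm", "mcpas", "specific", "cuda:0", "model.pt")
    print(" ".join(cmd))
    # To actually start training from the repository root, uncomment:
    # subprocess.run(cmd, check=True)
```

Passing the command as a list to subprocess.run avoids shell quoting problems when file names contain spaces.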
If you are interested only in prediction and not in training ERGO models, it might be more convenient to use our web tool, available at http://tcr.cs.biu.ac.il. You can choose which model and training set to use, and get the binding scores of given TCRs and peptides from a CSV file.
You can also predict using the ERGO.py module.
The command is similar to the training command. Run:
python ERGO.py predict model_type database specific gpu --model_file=model.pt --train_data_file=train_data --test_data_file=test_data
where:
- --model_file is the trained model file.
- --test_data_file is a CSV file with TCR and peptide columns. See the example file on the ERGO website.
- All other command-line parameters are the same as in the training process.
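As a rough illustration of preparing a prediction input, the sketch below writes TCR-peptide pairs to a two-column CSV file. The exact layout ERGO expects is defined by the example file on the ERGO website, so the choices here (no header row, TCR in the first column, peptide in the second) are assumptions, the function names are hypothetical, and the sequences are placeholders rather than real data. Rows containing characters outside the 20 standard amino acids are skipped.

```python
import csv
import io

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_sequence(seq):
    """True if seq is non-empty and uses only standard amino-acid letters."""
    return bool(seq) and set(seq) <= AMINO_ACIDS


def write_pairs(pairs, handle):
    """Write valid (tcr, peptide) pairs as CSV rows; return the number written.

    Assumed layout: no header, TCR first, peptide second.
    """
    writer = csv.writer(handle, lineterminator="\n")
    written = 0
    for tcr, peptide in pairs:
        if is_valid_sequence(tcr) and is_valid_sequence(peptide):
            writer.writerow([tcr, peptide])
            written += 1
    return written


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    pairs = [("CASSLAPGATNEKLFF", "GILGFVFTL"), ("CASSX1", "GILGFVFTL")]
    buffer = io.StringIO()
    count = write_pairs(pairs, buffer)
    print(count, "row(s) written")
    print(buffer.getvalue(), end="")
```

To write to disk instead of an in-memory buffer, pass a file opened with open(path, "w", newline="") as the handle.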
The trained models and some of the train/test datasets we used are stored in the models directory.
The autoencoder-based model requires a pre-trained TCR autoencoder. To train the TCR autoencoder, go to the TCR_Autoencoder directory using
cd TCR_Autoencoder
and run:
python train_tcr_autoencoder.py BM_data_CDR3s device model_file.pt
where device is a CUDA GPU device (e.g. 'cuda:0') or 'cpu' for CPU.
The trained autoencoder will be saved in model_file.pt as a PyTorch model.
You can use the already trained tcr_autoencoder.pt model instead.
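The autoencoder training call can be wrapped in the same way as the main training command. This is a minimal sketch with hypothetical helper names (is_valid_device, build_autoencoder_command). It accepts only the two device forms mentioned above, 'cpu' and 'cuda:N'. That is a deliberately strict assumption, since PyTorch itself accepts other device strings.

```python
import re
import subprocess
import sys

# Matches device strings of the form 'cuda:0', 'cuda:1', ...
_CUDA_PATTERN = re.compile(r"cuda:\d+")


def is_valid_device(device):
    """True for 'cpu' or 'cuda:N' (assumed to be the only forms needed here)."""
    return device == "cpu" or _CUDA_PATTERN.fullmatch(device) is not None


def build_autoencoder_command(data_file, device, model_file):
    """Return the train_tcr_autoencoder.py command as an argument list."""
    if not is_valid_device(device):
        raise ValueError(f"unsupported device string: {device!r}")
    return [sys.executable, "train_tcr_autoencoder.py",
            data_file, device, model_file]


if __name__ == "__main__":
    cmd = build_autoencoder_command("BM_data_CDR3s", "cpu", "model_file.pt")
    print(" ".join(cmd))
    # Run from inside the TCR_Autoencoder directory to start training:
    # subprocess.run(cmd, check=True)
```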