Elution Profile-Based Protein Complexes Inference Algorithm

bio-it-station/SPIFFED

SPIFFED – Software for Prediction of Interactome with Feature-extraction Free Elution Data

A balanced end-to-end deep learning model for interactome prediction from co-fractionation/mass-spectrometry (CF-MS) data


Introduction

SPIFFED is modified from Elution Profile-Based Inference of Protein Complexes (EPIC), a widely used protein-protein interaction predictor and protein complex inference tool. Unlike EPIC, SPIFFED uses a convolutional neural network to analyze raw co-elution data, which removes the need for manual feature engineering and is intended to improve the accuracy of protein interaction predictions.

Install

To install SPIFFED, first make sure you have Python 2.7, then run:

$ git clone https://github.com/bio-it-station/SPIFFED
$ cd SPIFFED
$ conda create -n "EPIC_test" python=2.7.16
$ conda activate EPIC_test

$ pip install -r requirements.txt
$ pip install beautifulsoup4
$ pip install tensorflow==1.13.1
$ pip install Keras==2.2.4
$ conda install rpy2
$ pip install scikit-plot

SPIFFED depends on the following packages:

1. scikit-learn
2. requests
3. beautifulsoup4
4. mock
5. kohonen
6. numpy
7. matplotlib
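
After installation, it can be useful to confirm that the listed packages are importable in the active environment. The following is a minimal sketch, not part of SPIFFED: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the import names are assumptions based on how these packages are usually imported (scikit-learn as `sklearn`, beautifulsoup4 as `bs4`).

```python
import importlib

# Assumed import names for the packages listed above.
REQUIRED_MODULES = [
    "sklearn", "requests", "bs4", "mock", "kohonen", "numpy", "matplotlib",
]


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError on Python 3 as well.
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED_MODULES)` inside the conda environment returns an empty list when every package can be imported; any names it returns still need to be installed.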

Run SPIFFED

SPIFFED is run with a single command:

python ./src/main.py -s feature_selection input_directory -c gold_standard_file_path output_directory -o output_filename_prefix -M training_method -n number_of_cores -m EXP -f STRING --LEARNING_SELECTION learning_method_selection --K_D_TRAIN fold_or_direct_training --FOLD_NUM number_of_folds --TRAIN_TEST_RATIO testing_data_ratio --POS_NEG_RATIO negative_PPIs_ratio --NUM_EP number_of_elution_profiles --NUM_FRC number_of_fractions --CNN_ENSEMBLE ensemble_bool


Parameter Definition

  1. (-s feature_selection) or (--feature_selection feature_selection): Specify which inputs SPIFFED uses, as a string of "0" and "1" flags. Eight correlation scores and the raw elution profile are available, in this order: Mutual Information, Bayes Correlation, Euclidean Distance, Weighted Cross-Correlation, Jaccard Score, PCCN, Pearson Correlation Coefficient, Apex Score, and Raw elution profile. "0" means the item is not used and "1" means it is used.

    • If you want to run Convolutional Neural Network (CNN) or Label Spreading (LS), you must set this parameter to "-s 000000001" (note that the string has 9 characters).
    • If you want to use correlation scores instead (required for RF), you can set this parameter to, for example, "-s 11101001" (note that the string has 8 characters). This example uses Mutual Information, Bayes Correlation, Euclidean Distance, Jaccard Score and Apex Score.

  2. input_directory: Path to the directory that contains your elution profile file. An absolute path is recommended over a relative path.

  3. (-c gold_standard_file_path) or (--cluster gold_standard_file_path): This parameter stores the path to the gold standard file that you curated.

  4. output_directory: Path to the output directory. Make sure the directory already exists before running the command. An absolute path is recommended over a relative path.

  5. (-o output_filename_prefix) or (--output_prefix output_filename_prefix): Prefix for the names of all output files. The default is "Out".

  6. (-M training_method) or (--classifier training_method): The classifier to use. Possible options are RF, CNN and LS. Note that RF must be used with selected correlation scores such as "-s 11101001" rather than the raw elution profile ("-s 000000001"). CNN and LS must be used with the raw elution profile ("-s 000000001").

  7. (-n number_of_cores) or (--num_cores number_of_cores): The number of cores used to run SPIFFED. The default is 1. For example, to run SPIFFED on six cores, set "-n 6".

  8. --LEARNING_SELECTION learning_method_selection: Whether to use supervised or semi-supervised learning. For supervised learning, set "--LEARNING_SELECTION sl" (training_method can be RF or CNN); for semi-supervised learning, set "--LEARNING_SELECTION ssl" (training_method can be CNN or LS).

  9. --K_D_TRAIN fold_or_direct_training: Set d to directly train the model; set k to run with k-fold training. (options: d and k; default: d)

  10. --FOLD_NUM number_of_folds: If you set --K_D_TRAIN k, this parameter sets the number of folds used to evaluate your model. Note that this parameter must be bigger than 2. (default: 5)

  11. --TRAIN_TEST_RATIO testing_data_ratio: This parameter stores the ratio of testing data to all data. (default: 0.3)

  12. --POS_NEG_RATIO negative_PPIs_ratio: This parameter stores the ratio of negative PPIs to positive PPIs. (default: 1)

  13. --NUM_EP number_of_elution_profiles: This parameter stores the number of elution profiles inside each PPI. (default: 2)

  14. --NUM_FRC number_of_fractions: This parameter stores the number of fractions in the elution profile file. (default: 27)

  15. --CNN_ENSEMBLE ensemble_bool: This parameter is a boolean value. If it is 0, you need to provide one elution profile; if it is 1, you need to provide multiple elution profiles.
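
The "-s" flag string and the pairing rules between "-s", "-M" and "--LEARNING_SELECTION" described above can be sketched in Python. This is an illustration of the documented rules only, not SPIFFED's own argument handling; `decode_feature_selection` and `check_options` are hypothetical helper names.

```python
# Flag order as listed under the -s parameter above.
SCORE_NAMES = [
    "Mutual Information",
    "Bayes Correlation",
    "Euclidean Distance",
    "Weighted Cross-Correlation",
    "Jaccard Score",
    "PCCN",
    "Pearson Correlation Coefficient",
    "Apex Score",
    "Raw elution profile",
]

RAW_ONLY = "000000001"


def decode_feature_selection(flags):
    """Return the names switched on by a -s string of '0'/'1' characters."""
    if not flags or len(flags) > len(SCORE_NAMES) or set(flags) - set("01"):
        raise ValueError("expected 1-9 characters of '0' or '1', got %r" % (flags,))
    # zip stops at the shorter input, so 8-character strings are accepted.
    return [name for bit, name in zip(flags, SCORE_NAMES) if bit == "1"]


def check_options(flags, classifier, learning):
    """Return a list of rule violations (empty if the combination is allowed)."""
    problems = []
    if classifier in ("CNN", "LS") and flags != RAW_ONLY:
        problems.append("%s requires -s %s" % (classifier, RAW_ONLY))
    if classifier == "RF" and "Raw elution profile" in decode_feature_selection(flags):
        problems.append("RF requires correlation scores, not the raw elution profile")
    if learning == "sl" and classifier not in ("RF", "CNN"):
        problems.append("sl supports RF or CNN only")
    if learning == "ssl" and classifier not in ("CNN", "LS"):
        problems.append("ssl supports CNN or LS only")
    return problems
```

For example, `decode_feature_selection("11101001")` yields the five scores named in the "-s" description, and `check_options("000000001", "CNN", "sl")` returns an empty list because that combination satisfies all of the rules above.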


SPIFFED Command Examples

To run SPIFFED:

python ./main.py -s 000000001 /ccb/salz3/kh.chao/SPIFFED/input/EPIC_DATA/beadsALF -c /ccb/salz3/kh.chao/SPIFFED/input/EPIC_DATA/Worm_reference_complexes.txt /ccb/salz3/kh.chao/SPIFFED/output/EPIC_DATA/beadsALF/TEST/CNN_SL/FOLDS/beadsALF__K_D__k__CNN_SL__fold_number_5__negative_ratio_5__test_ratio_30 -o TEST -M CNN -n 10 -m EXP -f STRING --LEARNING_SELECTION sl --K_D_TRAIN k --FOLD_NUM 5 --TRAIN_TEST_RATIO 0.3 --POS_NEG_RATIO 5 --CNN_ENSEMBLE 0
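
Because the command line is long, it may be easier to assemble it programmatically when running many configurations. The sketch below builds the argument list for a raw-profile run in the same order as the example above. `build_command` is a hypothetical helper, the default for `classifier` is an assumption, and "-m EXP -f STRING" are copied unchanged from the example.

```python
def build_command(input_dir, gold_standard, output_dir, prefix="Out",
                  classifier="CNN", cores=1, learning="sl", train_mode="d",
                  folds=5, test_ratio=0.3, neg_ratio=1, ensemble=0,
                  script="./main.py"):
    """Assemble the argument list for one SPIFFED run on raw elution profiles."""
    return [
        "python", script,
        "-s", "000000001", input_dir,        # raw elution profile, then input directory
        "-c", gold_standard, output_dir,     # gold standard file, then output directory
        "-o", prefix,
        "-M", classifier,
        "-n", str(cores),
        "-m", "EXP",                         # copied as-is from the example command
        "-f", "STRING",                      # copied as-is from the example command
        "--LEARNING_SELECTION", learning,
        "--K_D_TRAIN", train_mode,
        "--FOLD_NUM", str(folds),
        "--TRAIN_TEST_RATIO", str(test_ratio),
        "--POS_NEG_RATIO", str(neg_ratio),
        "--CNN_ENSEMBLE", str(ensemble),
    ]
```

The returned list can be passed to `subprocess.call` from the standard library, which avoids shell quoting problems with long paths. The paths you pass in are your own; the output directory must already exist.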


Multiple

To run SPIFFED with the ensemble model:

python ./main.py -s 000000001 /home/kuan-hao/SPIFFED/input/OUR_DATA/intensity_HML_ensemble/ -c /home/kuan-hao/SPIFFED/input/OUR_DATA/gold_standard.tsv /home/kuan-hao/SPIFFED/output/SELF_DATA/intensity_HML_ensemble__negative_ratio_5/ -o out -M CNN -n 10 -m EXP -f STRING --LEARNING_SELECTION sl --K_D_TRAIN d --FOLD_NUM 5 --TRAIN_TEST_RATIO 0.7 --POS_NEG_RATIO 5 --NUM_EP 2 --NUM_FRC 27 --CNN_ENSEMBLE 1
