Skip to content

brycejoh16/DevRep

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

67 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DevRep

Gp2 Developability Modeling, Hackel Lab Univ. of Minnesota Lead contact: Alex Golinski golin010@umn.edu

Modeling python scripts used to predict the yield of Gp2 paratope variants.

The code for the first part of the project, determining the most predictive HT assays, can be found in ./main_paper_one/

The code for the second part of the project, creating a sequence-based model to predict yield via transfer learning of DevRep, can be found in ./main_paper_two/

Both files in the main_paper_x folders need to me moved to the main directory to run.

There are brief examples of how to use the code for the most predictive models for each paper in the main directory under main*example.py. Beyond unzipping the datasets, these scripts should run without without any other modifications.

For non-top performing models, saved hyperparameter trials and model stats can be found within the zipped folder in the repective folders

To create the environment with the conda package manager run from the command line

conda create --name <env> python=3.7.5 tensorflow=2.0.0 numpy=1.17.4 pandas=0.25.3 seaborn=0.10.1 scikit-learn=0.22 matplotlib

Where <env> is your environment name. Then from the commmand line type:

conda activate <env>

conda install -c conda-forge hyperopt=0.2.2

The environment for DevRep is now setup!

File descriptions: model_module.py - base model class that defines how to cross-validate, test, and evaluate model performances. submodels_module.py - subclasses that modify the model inputs/outputs and datasets for model evlaulation model_architectures.py - describes the hyperparameters and construction of the possible model architectures used plot_model.py - helper class to plot the predicted results from cv and testing load_format_data.py - helper functions to format the data from the pickeled DataFrames to useful inputs for model evaluations

Folder descriptions: /datasets/ - location of saved sequences' yields and assay scores. Due to GitHub size limits, you will have to unzip the datasets and the example predicted datasets Datasets are a pickeled DataFrame, which can be opened via panda.read_pickle()

/trials/ - location of hyperparameter trials during cross-validation saved as pickeled hyperopt files. /model_stats/ - location of the best cv- and test- performance of the models /models/ - location of saved models as either pickeled scikit-learn models or tensorflow2 weights /plotpairs/ - location of saved pairs of (predicted value, true value, strain or assay id) /figures/ - location of the saved predicted figures for cv and testing

About

Gp2 Developability Modeling, Hackel Lab Univ. of Minnesota

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.9%
  • Shell 0.1%