DevRep

Gp2 Developability Modeling, Hackel Lab, Univ. of Minnesota. Lead contact: Alex Golinski (golin010@umn.edu)

Python modeling scripts used to predict the yield of Gp2 paratope variants.

The code for the first part of the project, determining the most predictive HT assays, can be found in ./main_paper_one/

The code for the second part of the project, creating a sequence-based model to predict yield via transfer learning of DevRep, can be found in ./main_paper_two/

The files in both main_paper_x folders need to be moved to the main directory to run.

There are brief examples of how to use the code for the most predictive models for each paper in the main directory: main_HT_assay_example.py and main_DevRep_example.py. Beyond unzipping the datasets, these scripts should run without any other modifications.
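The unzipping step can be scripted. The following is a minimal sketch, not part of the repository: the helper name unzip_all is hypothetical, and it assumes that each archive should be extracted into the folder that contains it, which may not match how the archives were packed.

```python
import zipfile
from pathlib import Path


def unzip_all(root="."):
    """Extract every .zip archive found under `root` into the archive's own folder.

    Assumption: the archive locations and their internal layout are not
    documented here, so extracting next to the archive is a guess.
    """
    extracted = []
    # sorted() materializes the matches before anything is extracted.
    for archive in sorted(Path(root).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted
```

Calling unzip_all(".") from the main directory returns the list of archives it extracted, so it is easy to confirm that the dataset archives were found before running the example scripts.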

For models other than the top performers, saved hyperparameter trials and model stats can be found in the zipped folder within the respective folders.

To create the environment with the conda package manager, run the following from the command line:

conda create --name <env> python=3.7.5 tensorflow=2.0.0 numpy=1.17.4 pandas=0.25.3 seaborn=0.10.1 scikit-learn=0.22 matplotlib

Here <env> is your environment name. Then, from the command line, type:

conda activate <env>

conda install -c conda-forge hyperopt=0.2.2

The environment for DevRep is now set up!
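To confirm that the activated environment matches the pinned versions, a short check like the one below may help. It is a sketch under assumptions: the helper check_versions is hypothetical, the pins are copied from the conda commands above, and it relies on each package exposing a __version__ attribute (a package without one is reported as a mismatch).

```python
import importlib

# Pins copied from the conda commands above. Keys are import names,
# so scikit-learn appears as "sklearn".
EXPECTED = {
    "tensorflow": "2.0.0",
    "numpy": "1.17.4",
    "pandas": "0.25.3",
    "seaborn": "0.10.1",
    "sklearn": "0.22",
    "hyperopt": "0.2.2",
}


def check_versions(expected=EXPECTED):
    """Return {module: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, pin in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is not installed in the active environment.
            report[name] = (None, False)
            continue
        version = getattr(module, "__version__", None)
        # Accept an exact match or a patch release of the pin (0.22 -> 0.22.x).
        matches = version is not None and (
            version == pin or version.startswith(pin + ".")
        )
        report[name] = (version, matches)
    return report
```

Running check_versions() inside the activated environment returns one entry per package; any entry whose second element is False points at a missing or differently versioned install.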

File descriptions:
model_module.py - base model class that defines how to cross-validate, test, and evaluate model performance.
submodels_module.py - subclasses that modify the model inputs/outputs and datasets for model evaluation.
model_architectures.py - describes the hyperparameters and construction of the possible model architectures used.
plot_model.py - helper class to plot the predicted results from cross-validation (cv) and testing.
load_format_data.py - helper functions to format the data from the pickled DataFrames into useful inputs for model evaluation.

Folder descriptions:
/datasets/ - location of saved sequences' yields and assay scores. Due to GitHub size limits, you will have to unzip the datasets and the example predicted datasets. Each dataset is a pickled DataFrame, which can be opened via pandas.read_pickle().
/trials/ - location of hyperparameter trials during cross-validation, saved as pickled hyperopt files.
/model_stats/ - location of the best cv and test performance of the models.
/models/ - location of saved models as either pickled scikit-learn models or TensorFlow 2 weights.
/plotpairs/ - location of saved pairs of (predicted value, true value, strain or assay id).
/figures/ - location of the saved prediction figures for cv and testing.
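Since several of these folders hold pickled objects, a quick way to see what each file contains is to load it and report its type. The sketch below is illustrative only: the helper summarize_pickles is hypothetical, and the "*.pkl" file pattern is an assumption, so adjust it to the actual file names. Unpickling requires that the library that defined the object (pandas, hyperopt, or scikit-learn) is importable, and it should only be done with files you trust.

```python
import pickle
from pathlib import Path


def summarize_pickles(folder, pattern="*.pkl"):
    """Map each matching file name to the type name of the object it holds.

    Assumption: "*.pkl" is a placeholder pattern, not a documented naming rule.
    """
    summary = {}
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, "rb") as handle:
            obj = pickle.load(handle)  # only unpickle files you trust
        summary[path.name] = type(obj).__name__
    return summary
```

For example, summarize_pickles("./datasets") would be expected to report DataFrame for each dataset file, assuming the pattern matches; those files can then be opened directly with pandas.read_pickle() as noted above.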

