Repository for UW-Madison Machine Learning Marathon 2025
- Get ESM running on Platform R - Zach
- Show group how to run ESM on Platform R - Zach
- Read MaveDB paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1845-6
- Mike
- Zach
- Sean
- Brendan
- Read Kaggle website (https://www.kaggle.com/competitions/mave-db-amino-acid-substitution-prediction/data)
- Mike
- Zach
- Sean
- Brendan
- Download Data
- Mike
- Zach
- Sean
- Brendan
- Code to generate protein sequences and variants - Brendan
- EDA Slides due 9/30
- Create slide deck - Mike 9/22 (https://docs.google.com/presentation/d/1bbqywkXwJe28KgNbH5wtFZf6BR64EssxS5nZa9J39Ks/edit?usp=sharing)
- Explanation of problem - Mike 9/22
- Results of first model fit - Mike 9/29
- Key technologies used
- Figure for ESM - Zach or Brendan
- Explanation of downloading protein embeddings - Zach or Brendan
- Initial EDA explanation
- Best figure for showing proteins in embedding space - Zach
- Best figure for showing distribution of variants - Sean
- Next steps
- Update with at least one bullet point for next steps
- Mike 9/29
- Zach
- Sean
- Brendan