🌐 Team7_EpiXposome

🧬 Epigenomics_Harmonization_Exposome

🔍 Key Features

EpiXposome uses machine learning to study how environmental exposures affect epigenetic changes. Our goal is to predict disease risk by modeling how exposome variables interact with epigenetic markers such as DNA methylation and miRNA expression, and to turn those findings into actionable insights for disease prevention and treatment.

📂 Datasets

JPSS (NOAA): NOAA Joint Polar Satellite System (JPSS)
TARGET: Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
CPTAC-2: Clinical Proteomic Tumor Analysis Consortium 2 (CPTAC-2)
TCGA: The Cancer Genome Atlas - Registry of Open Data on AWS
ENCODE: Encyclopedia of DNA Elements (ENCODE)

🔧 Parameter Usage

Data Preprocessing Parameters

missing_data_threshold
normalization_method
batch_size

Model Training Parameters

Bidirectional RNNs (BRNNs) / BiLSTMs / GRUs

sequence_length
hidden_units
dropout_rate
learning_rate
epochs
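The parameters above could be collected in one configuration object so each run is reproducible. This is a minimal sketch, assuming a Python dataclass (which needs Python 3.7+). The default values and the allowed `normalization_method` options are illustrative placeholders, not settings used by the project.

```python
from dataclasses import dataclass, asdict


@dataclass
class PipelineConfig:
    # Data preprocessing parameters (defaults are illustrative placeholders)
    missing_data_threshold: float = 0.2   # max fraction of missing values per feature
    normalization_method: str = "zscore"  # hypothetical options: "zscore" or "minmax"
    batch_size: int = 32

    # Model training parameters for BRNN / BiLSTM / GRU models
    sequence_length: int = 10
    hidden_units: int = 64
    dropout_rate: float = 0.3
    learning_rate: float = 1e-3
    epochs: int = 50

    def validate(self) -> None:
        """Raise ValueError if any parameter is outside a sensible range."""
        if not 0.0 <= self.missing_data_threshold <= 1.0:
            raise ValueError("missing_data_threshold must be in [0, 1]")
        if self.normalization_method not in {"zscore", "minmax"}:
            raise ValueError(f"unknown normalization_method: {self.normalization_method}")
        if not 0.0 <= self.dropout_rate < 1.0:
            raise ValueError("dropout_rate must be in [0, 1)")
        for name in ("batch_size", "sequence_length", "hidden_units", "epochs"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


config = PipelineConfig(hidden_units=128)
config.validate()
print(asdict(config))
```

Calling `validate()` before preprocessing or training catches out-of-range values early. `asdict()` makes it easy to save the configuration next to the results.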

💻 System Requirements

OS: Linux (Ubuntu, CentOS, Amazon Linux), macOS, or Windows 10 with WSL2.
CPU: Intel i5/i7 or AMD equivalent; multi-core recommended.
RAM: 16 GB minimum, 32 GB+ recommended.
Storage: 256 GB SSD minimum; more for large datasets.
Software: Python 3.6+, key libraries (numpy, pandas, scikit-learn), Jupyter.

🔗 Dependencies

Python
BRNNs
Bidirectional LSTMs (BiLSTMs)
GRUs
RNNs
XGBoost
Genetic algorithm
Logistic regression models
JPSS
AWS CLI

📥 Example Inputs

Gene expression and miRNA expression data from TCGA

📤 Example Outputs

Results in CSV (.csv) and JSON (.json) files.
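This minimal sketch uses only the standard library to write the same per-sample results in both formats. The field names (`sample_id`, `predicted_risk`), the output directory, and the file names are hypothetical.

```python
import csv
import json
from pathlib import Path


def write_results(records, out_dir="results"):
    """Write a list of dicts to results.csv and results.json in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # CSV: one row per record, columns taken from the first record's keys
    csv_file = out / "results.csv"
    with open(csv_file, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)

    # JSON: the same records as a list of objects
    json_file = out / "results.json"
    with open(json_file, "w") as fh:
        json.dump(records, fh, indent=2)

    return csv_file, json_file


# Hypothetical example records
results = [
    {"sample_id": "S001", "predicted_risk": 0.82},
    {"sample_id": "S002", "predicted_risk": 0.17},
]
csv_path, json_path = write_results(results)
```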

Steps

Step 1: Data Collection
- Data Sources:
  - miRNA expression data from CPTAC-2 and TARGET
  - DNA methylation data from TCGA, ENCODE, and TARGET
  - Exposome data from NOAA Joint Polar Satellite System (JPSS)
Step 2: Data Preprocessing
- Clean the data and impute missing values so that downstream analyses are robust.
- Address any quality control issues before analysis.
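The sketch below applies the `missing_data_threshold` and `normalization_method` parameters to a samples × features NumPy array. It assumes that missing values are encoded as `np.nan`, that remaining gaps are filled with the per-feature median, and that features are then z-scored or min-max scaled. These choices are assumptions; the project may use different imputation or normalization methods.

```python
import numpy as np


def preprocess(X, missing_data_threshold=0.2, normalization_method="zscore"):
    """Drop sparse features, impute remaining gaps, and normalize columns.

    X is a samples x features array with np.nan marking missing values.
    Returns the cleaned array and the indices of the features that were kept.
    """
    X = np.asarray(X, dtype=float)

    # 1. Drop features whose fraction of missing values exceeds the threshold
    missing_frac = np.isnan(X).mean(axis=0)
    keep = np.where(missing_frac <= missing_data_threshold)[0]
    X = X[:, keep]  # fancy indexing returns a copy, so the input is untouched

    # 2. Impute remaining missing values with the per-feature median
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]

    # 3. Normalize each feature
    if normalization_method == "zscore":
        std = X.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant features
        X = (X - X.mean(axis=0)) / std
    elif normalization_method == "minmax":
        span = X.max(axis=0) - X.min(axis=0)
        span[span == 0] = 1.0
        X = (X - X.min(axis=0)) / span
    else:
        raise ValueError(f"unknown normalization_method: {normalization_method}")
    return X, keep


X = np.array([[1.0, np.nan, 5.0],
              [2.0, np.nan, np.nan],
              [3.0, 4.0, 7.0],
              [4.0, np.nan, 9.0]])
X_clean, kept = preprocess(X, missing_data_threshold=0.5)
```

In this example the middle feature is 75% missing, so it is dropped. The gap in the last feature is filled with that feature's median (7.0). scikit-learn's `SimpleImputer(strategy="median")` and `StandardScaler` provide the same steps in a pipeline-friendly form.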
Step 3: Dimensionality Reduction and Data Integration
- Use principal component analysis (PCA) to reduce the data to three dimensions.
- Use non-negative matrix factorization (NMF) to integrate the disparate data sets and uncover underlying relationships.
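Here is a minimal NumPy sketch of the PCA step. It projects mean-centered data onto its top three principal components using the singular value decomposition. The input here is synthetic and only illustrates the shapes involved.

```python
import numpy as np


def pca(X, n_components=3):
    """Project X (samples x features) onto its top principal components.

    Uses the singular value decomposition of the mean-centered data.
    Returns the projected scores and the fraction of variance each
    retained component explains.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # samples x n_components
    explained = (S ** 2) / np.sum(S ** 2)        # variance ratio per component
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                   # 100 samples, 20 features (synthetic)
scores, explained = pca(X, n_components=3)
print(scores.shape)                              # (100, 3)
```

`sklearn.decomposition.PCA(n_components=3)` gives the same projection (up to component sign) and also exposes `explained_variance_ratio_`.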

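NMF formula:

$$V \approx WH, \qquad \min_{W \ge 0,\; H \ge 0} \lVert V - WH \rVert_F^2$$

where $V \in \mathbb{R}_{\ge 0}^{m \times n}$ is the non-negative data matrix, $W \in \mathbb{R}_{\ge 0}^{m \times k}$ holds the $k$ basis vectors, and $H \in \mathbb{R}_{\ge 0}^{k \times n}$ holds their coefficients.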
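This is a minimal NumPy sketch of NMF that uses the Lee and Seung multiplicative updates. These updates reduce the squared Frobenius error between V and WH while keeping both factors non-negative. The integration scheme in the example is an assumption made for illustration: two synthetic non-negative blocks measured on the same samples are stacked along the feature axis, so the shared H gives one set of factor scores per sample.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-10, seed=0):
    """Factorize a non-negative matrix V (m x n) as W (m x k) @ H (k x n).

    Uses Lee and Seung multiplicative updates for the squared Frobenius
    loss ||V - WH||_F^2; W and H stay non-negative throughout.
    """
    V = np.asarray(V, dtype=float)
    if np.any(V < 0):
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)     # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)     # update basis vectors
    return W, H


# Hypothetical integration: stack two non-negative blocks that share the
# same samples (columns), so H holds one set of factor scores per sample.
rng = np.random.default_rng(1)
methylation = rng.random((50, 30))               # 50 CpG sites x 30 samples (synthetic)
mirna = rng.gamma(2.0, 1.0, size=(40, 30))       # 40 miRNAs x 30 samples (synthetic)
V = np.vstack([methylation, mirna])
W, H = nmf(V, k=4)
W_methylation, W_mirna = W[:50], W[50:]          # block-specific loadings
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In practice, blocks on very different scales should be normalized before stacking so that no single data type dominates the loss. `sklearn.decomposition.NMF` provides a tuned implementation of the same factorization.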

Step 4: Data Splitting
- Randomly split the dataset into training (80%) and testing (20%) sets.
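A reproducible 80/20 random split can be written in a few lines of NumPy. In this sketch the fixed seed and the synthetic data are illustrative.

```python
import numpy as np


def train_test_split(X, y, test_fraction=0.2, seed=42):
    """Randomly split samples into training and testing sets.

    Rows of X and entries of y are shuffled together with a fixed seed
    so the split is reproducible.
    """
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


X = np.arange(200).reshape(100, 2)   # 100 synthetic samples
y = np.arange(100) % 2               # synthetic binary labels
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))     # 80 20
```

scikit-learn's `sklearn.model_selection.train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)` does the same. Its `stratify` option keeps class proportions equal across the two sets.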
Step 5: Bidirectional Recurrent Neural Networks (BiRNNs)
- Run a bidirectional recurrent neural network (BiRNN) model to capture dependencies between exposome variables, DNA methylation, and microRNA data sets.
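The idea behind a bidirectional RNN is that one recurrent pass reads the sequence forward in time and a second pass reads it backward. Their hidden states are then concatenated at every step, so each output sees both past and future context. This minimal NumPy sketch shows only the forward computation of a simple (Elman) BiRNN with random, untrained weights. The input layout is hypothetical: exposome, methylation, and miRNA features concatenated at each time step. A real model would more likely be built with `torch.nn.LSTM(..., bidirectional=True)` or `tf.keras.layers.Bidirectional`.

```python
import numpy as np


def rnn_pass(X, W_x, W_h, b):
    """Run a simple (Elman) RNN over X (time x features); return all hidden states."""
    T = X.shape[0]
    H = np.zeros((T, W_h.shape[0]))
    h = np.zeros(W_h.shape[0])
    for t in range(T):
        h = np.tanh(X[t] @ W_x + h @ W_h + b)
        H[t] = h
    return H


def birnn_forward(X, params_fwd, params_bwd):
    """Bidirectional RNN: one pass forward in time, one pass backward,
    with the two hidden states concatenated at each time step."""
    H_fwd = rnn_pass(X, *params_fwd)
    H_bwd = rnn_pass(X[::-1], *params_bwd)[::-1]   # re-align to original time order
    return np.concatenate([H_fwd, H_bwd], axis=1)  # time x (2 * hidden_units)


def init_params(n_features, hidden_units, rng):
    """Small random input/recurrent weights and a zero bias (untrained)."""
    return (rng.normal(scale=0.1, size=(n_features, hidden_units)),
            rng.normal(scale=0.1, size=(hidden_units, hidden_units)),
            np.zeros(hidden_units))


rng = np.random.default_rng(0)
sequence_length, n_features, hidden_units = 10, 6, 8
# Hypothetical input: one sample with exposome, methylation and miRNA
# features concatenated at each of sequence_length time steps
X = rng.normal(size=(sequence_length, n_features))
H = birnn_forward(X, init_params(n_features, hidden_units, rng),
                  init_params(n_features, hidden_units, rng))
print(H.shape)  # (10, 16)
```

Swapping `np.tanh` cells for LSTM or GRU cells gives the BiLSTM and bidirectional GRU variants listed under Model Training Parameters. The bidirectional wiring stays the same.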

Step 6: Model Training
- Train the selected machine learning model using the training dataset.
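Training a BiRNN end to end is usually done in a deep learning framework. The sketch below swaps in logistic regression, which is listed under Dependencies, to show how the `learning_rate`, `epochs`, and `batch_size` parameters drive a mini-batch gradient descent loop. The synthetic data and parameter values are illustrative only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.1, epochs=100, batch_size=32, seed=0):
    """Fit binary logistic regression with mini-batch gradient descent.

    Returns weights w and bias b that reduce the mean binary cross-entropy.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            p = sigmoid(X[batch] @ w + b)          # predicted probabilities
            err = p - y[batch]                     # gradient of BCE w.r.t. the logits
            w -= learning_rate * X[batch].T @ err / len(batch)
            b -= learning_rate * err.mean()
    return w, b


# Synthetic training data labeled by a known linear rule
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = (X_train @ np.array([2.0, -1.0, 0.5]) > 0).astype(int)
w, b = train_logistic_regression(X_train, y_train, learning_rate=0.1, epochs=50)
train_acc = np.mean((sigmoid(X_train @ w + b) >= 0.5) == y_train)
```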
Step 7: Model Testing
- Assess the model's performance on the held-out testing dataset to measure its accuracy on unseen data.
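For a binary outcome such as disease versus no disease, the test-set assessment can report more than accuracy. This sketch computes accuracy, precision, recall, F1, and the confusion matrix from predicted probabilities. The 0.5 threshold is an assumption.

```python
import numpy as np


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute accuracy, precision, recall and F1 for binary predictions."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: true 0/1, cols: predicted 0/1
    }


metrics = evaluate([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.8, 0.6])
print(metrics["accuracy"])  # 0.6
```

`sklearn.metrics` provides the same quantities as `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `confusion_matrix`.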
Step 8: Experimentation
- Investigate the effects of different exposome variables on mRNA and methylation activity.
- Determine if there are specific genes or pathways involved.
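One way to estimate how much each exposome variable contributes to a trained model is permutation importance. This is an illustrative choice, not necessarily the project's method: shuffle one feature at a time and measure how much the model's score drops. In the sketch, the fitted model is replaced by a hypothetical stand-in that depends only on feature 0, and the data are synthetic.

```python
import numpy as np


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Estimate each feature's importance as the average drop in `metric`
    when that feature's column is randomly shuffled.

    predict: callable mapping X (samples x features) to predictions
    metric:  callable (y_true, y_pred) -> score, higher is better
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature j's link to y
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


def predict(A):
    """Hypothetical stand-in for a fitted model: uses only feature 0."""
    return 3.0 * A[:, 0]


def r_squared(y_true, y_pred):
    """Coefficient of determination (R^2)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data: the target depends on feature 0 but not on feature 1
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=300)
importances = permutation_importance(predict, X, y, r_squared)
```

Feature 0 shows a large drop in R², while feature 1 shows none because the model ignores it. `sklearn.inspection.permutation_importance` implements the same idea for scikit-learn estimators.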
Step 9: Results Interpretation
- Use bioinformatics analyses, such as gene set enrichment analysis (GSEA) and Gene Ontology (GO) enrichment, to interpret the model's findings.
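GSEA works on a full ranked gene list. A simpler complementary check is over-representation analysis, which this sketch shows instead: a one-sided hypergeometric test of whether the genes flagged by the model overlap a GO term or pathway more than chance would predict. The gene counts are illustrative. `math.comb` needs Python 3.8+.

```python
from math import comb


def enrichment_p_value(n_genome, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    n_genome:   total number of genes considered (background)
    n_pathway:  genes annotated to the pathway or GO term
    n_selected: genes flagged by the model (e.g. top-ranked genes)
    n_overlap:  flagged genes that are also in the pathway
    Returns P(X >= n_overlap) under random sampling without replacement.
    """
    total = comb(n_genome, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_genome - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


p = enrichment_p_value(n_genome=10, n_pathway=4, n_selected=3, n_overlap=3)
print(p)  # 4/120, about 0.0333
```

With SciPy, the same value is `scipy.stats.hypergeom.sf(n_overlap - 1, n_genome, n_pathway, n_selected)`. When many terms are tested, the p-values should be corrected for multiple testing.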
Step 10: Visualization
- Develop visualizations that present the results clearly (e.g., line plots showing changes over time).

🌐 Process Flowchart

🔮 Future Aims

Standardization and Packaging
- Package the model into a standardized, reusable module.
- Prepare a Python package for easy distribution and use.
Publication and Sharing
- Publish the findings and the Python package for the broader research community.
Documentation and Reproducibility
- Ensure all steps are well-documented to allow for reproducibility of the results.
- Include instructions for setting up the computational environment and running the analysis.

👤 Contributors

Halina Krzystek
Kirtan Dave
Paul Kao
Maciej Kowalski
Aung Myat Phyo
Diya
Alishba Nadeem
