
SFGLab/EpiXposome


🌐 Team7_EpiXposome

🧬 Epigenomics_Harmonization_Exposome

🔍 Key Features

EpiXposome uses machine learning to study how environmental exposures shape epigenetic changes. By modeling how exposome variables interact with epigenetic markers, the project aims to predict disease risk and provide actionable insights for disease prevention and treatment.

📂 Datasets

🔧 Parameter Usage

Data Preprocessing Parameters

  • missing_data_threshold
  • normalization_method
  • batch_size
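The README lists these parameters without defining them. Below is a minimal sketch of how they might be applied, under stated assumptions: missing_data_threshold is taken to be the maximum fraction of missing values a feature may have before it is dropped, the remaining gaps are filled with the column median (an assumed imputation method, not one confirmed by the project), normalization_method is assumed to accept "zscore" or "minmax", and batch_size is the number of samples per batch. The functions preprocess and iter_batches are hypothetical, not part of the repository.

```python
import numpy as np

def preprocess(X, missing_data_threshold=0.2, normalization_method="zscore"):
    """Filter, impute and normalize a (samples x features) array that may contain NaN.

    Hypothetical helper. Returns the processed array and a boolean mask of kept columns.
    """
    X = np.array(X, dtype=float)  # copy so the caller's data is untouched
    # 1. Drop features whose fraction of missing values exceeds the threshold.
    keep = np.isnan(X).mean(axis=0) <= missing_data_threshold
    X = X[:, keep]
    # 2. Impute remaining gaps with each column's median (assumed choice).
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = np.nanmedian(X, axis=0)[cols]
    # 3. Normalize each feature column.
    if normalization_method == "zscore":
        std = X.std(axis=0)
        std[std == 0] = 1.0  # constant columns become all zeros instead of dividing by 0
        X = (X - X.mean(axis=0)) / std
    elif normalization_method == "minmax":
        lo = X.min(axis=0)
        span = X.max(axis=0) - lo
        span[span == 0] = 1.0
        X = (X - lo) / span
    else:
        raise ValueError(f"unknown normalization_method: {normalization_method!r}")
    return X, keep

def iter_batches(X, batch_size=32):
    """Yield consecutive row batches of at most batch_size samples."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size]
```

Dropping sparse features before imputing keeps the median estimates from being based on only a handful of observed values.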

Model Training Parameters

Bidirectional RNNs (BRNNs) / BiLSTMs / GRUs

  • sequence_length
  • hidden_units
  • dropout_rate
  • learning_rate
  • epochs
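These model parameters are also listed without definitions. The sketch below assumes sequence_length is the number of consecutive time steps in each input window, and bundles all five parameters in a hypothetical RNNConfig container with basic range checks. make_sequences is likewise a hypothetical helper that reshapes a time series into the (samples, time steps, features) layout that recurrent layers such as Keras LSTM expect. The default values are placeholders, not settings used by this project.

```python
from typing import NamedTuple
import numpy as np

class RNNConfig(NamedTuple):
    """Hypothetical container for BRNN/BiLSTM/GRU parameters (placeholder defaults)."""
    sequence_length: int = 10
    hidden_units: int = 64
    dropout_rate: float = 0.2
    learning_rate: float = 1e-3
    epochs: int = 50

    def validate(self):
        """Raise ValueError for out-of-range settings; return self otherwise."""
        if min(self.sequence_length, self.hidden_units, self.epochs) < 1:
            raise ValueError("sequence_length, hidden_units and epochs must be >= 1")
        if not 0.0 <= self.dropout_rate < 1.0:
            raise ValueError("dropout_rate must be in [0, 1)")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        return self

def make_sequences(X, sequence_length):
    """Slice a (time_steps x features) array into overlapping windows.

    Output shape: (time_steps - sequence_length + 1, sequence_length, features).
    """
    X = np.asarray(X)
    n_windows = X.shape[0] - sequence_length + 1
    if n_windows < 1:
        raise ValueError("sequence_length is longer than the series")
    return np.stack([X[i:i + sequence_length] for i in range(n_windows)])
```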

💻 System Requirements

  • OS: Linux (Ubuntu, CentOS, Amazon Linux), macOS, or Windows 10 with WSL2.
  • CPU: Intel i5/i7 or AMD equivalent; multi-core recommended.
  • RAM: 16 GB minimum, 32 GB+ recommended.
  • Storage: 256 GB SSD minimum; more for large datasets.
  • Software: Python 3.6+, key libraries (numpy, pandas, scikit-learn), Jupyter.

🔗 Dependencies

  • Python
  • BRNNs
  • Bidirectional LSTMs (BiLSTMs)
  • GRUs
  • RNNs
  • XGBoost
  • Genetic algorithm
  • Logistic regression models
  • JPSS
  • AWS CLI

📥 Example Inputs

Gene expression and miRNA expression data (TCGA)

📤 Example Outputs

.csv and .json files

Steps

  1. Step 1: Data Collection

    • Data Sources:
      • miRNA expression data from CPTAC2 and TARGET
      • DNA methylation data from TCGA, ENCODE, and TARGET
      • Exposome data from NOAA Joint Polar Satellite System (JPSS)
  2. Step 2: Data Preprocessing

    • Clean the data and impute missing values.
    • Address quality control issues before analysis.
  3. Step 3: Dimensionality Reduction and Data Integration

    • Use principal component analysis (PCA) to reduce the data to three dimensions.
    • Use Non-Negative Matrix Factorization (NMF) to integrate disparate data sets and find underlying relationships.
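A minimal NumPy sketch of Step 3, assuming samples are rows and features are columns. pca_reduce performs PCA through a singular value decomposition of the centered data, and nmf uses the standard Lee and Seung multiplicative updates for the squared Frobenius error. Both function names are hypothetical; the project may instead rely on scikit-learn's PCA and NMF classes.

```python
import numpy as np

def pca_reduce(X, n_components=3):
    """Project samples onto the top principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # (samples x n_components) scores

def nmf(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (m x n) as W @ H with W, H >= 0.

    Multiplicative updates keep W and H non-negative at every iteration.
    """
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)      # eps avoids division by zero
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

To integrate data sets with NMF, one option is to concatenate non-negative feature matrices that share the same samples (for example np.hstack([methylation, mirna])) before factoring, so each latent factor carries weights for features from every source. Note that PCA scores can be negative, so they cannot be passed to nmf directly.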

NMF formula:

[Figure: NMF formula]
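The NMF formula is provided only as an image. For reference, the standard NMF formulation, which may differ in detail from the one in the original figure, approximates a non-negative data matrix V by the product of two non-negative factors of rank k, minimizes the squared Frobenius error, and is commonly solved with multiplicative updates (products and divisions element-wise):

```latex
V \approx W H, \qquad
V \in \mathbb{R}_{\ge 0}^{m \times n},\;
W \in \mathbb{R}_{\ge 0}^{m \times k},\;
H \in \mathbb{R}_{\ge 0}^{k \times n}

\min_{W \ge 0,\; H \ge 0} \; \lVert V - W H \rVert_F^2

H \leftarrow H \odot \frac{W^\top V}{W^\top W H}, \qquad
W \leftarrow W \odot \frac{V H^\top}{W H H^\top}
```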
  4. Step 4: Data Splitting

    • Randomly split the dataset into training (80%) and testing (20%) sets.
  5. Step 5: Bidirectional Recurrent Neural Networks (BiRNNs)

    • Run a BiRNN model to capture dependencies between the exposome variables and the methylation and microRNA data sets.
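Step 4 can be sketched as a seeded random permutation of sample indices. train_test_split_random is a hypothetical helper; scikit-learn's train_test_split(X, y, test_size=0.2, random_state=...) does the same job.

```python
import numpy as np

def train_test_split_random(X, y, test_fraction=0.2, seed=42):
    """Randomly split samples into training and testing sets (80/20 by default)."""
    X, y = np.asarray(X), np.asarray(y)
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")
    rng = np.random.default_rng(seed)            # fixed seed for a reproducible split
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```

If the samples are overlapping time windows, a purely random split places windows that share time steps in both sets, which can inflate test scores; splitting by time or by subject avoids this.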

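To illustrate what "bidirectional" means in Step 5, here is a minimal NumPy forward pass of a plain tanh BiRNN (no training and no LSTM or GRU gating). The function names and the random initialization are hypothetical; in practice a framework layer such as Keras Bidirectional wrapping an LSTM or GRU would be used.

```python
import numpy as np

def init_params(n_features, hidden_units, rng, scale=0.1):
    """Hypothetical small random initialization for one direction: (Wx, Wh, b)."""
    return (rng.normal(scale=scale, size=(hidden_units, n_features)),
            rng.normal(scale=scale, size=(hidden_units, hidden_units)),
            np.zeros(hidden_units))

def rnn_pass(X, Wx, Wh, b):
    """Run a plain tanh RNN over X (time x features); return every hidden state."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x_t in X:
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        states.append(h)
    return np.stack(states)                        # (time x hidden)

def birnn_forward(X, fwd_params, bwd_params):
    """Concatenate a forward-in-time pass with a backward-in-time pass.

    The output at time t combines inputs up to t (forward) and from t onward (backward).
    """
    X = np.asarray(X, dtype=float)
    h_fwd = rnn_pass(X, *fwd_params)
    h_bwd = rnn_pass(X[::-1], *bwd_params)[::-1]   # flip back to the original time order
    return np.concatenate([h_fwd, h_bwd], axis=1)  # (time x 2*hidden)
```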

  6. Step 6: Model Training

    • Train the selected machine learning model using the training dataset.
  7. Step 7: Model Testing

    • Assess the model's performance using the testing dataset to validate its accuracy.
  8. Step 8: Experimentation

    • Investigate the effects of different exposome variables on mRNA and methylation activity.
    • Determine whether specific genes or pathways are involved.
  9. Step 9: Results Interpretation

    • Use bioinformatics analyses such as gene set enrichment analysis (GSEA) and Gene Ontology (GO) enrichment to interpret the model's findings.
  10. Step 10: Visualization

    • Develop visualizations that present the results clearly (e.g., line plots showing changes over time).
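For Steps 7 and 8, one model-agnostic option (an assumption; the README does not name a method) is to score the trained model on the test set and then measure how much that score drops when a single exposome feature is randomly shuffled. The helpers accuracy and permutation_importance below are hypothetical; scikit-learn offers a comparable sklearn.inspection.permutation_importance for fitted estimators.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def permutation_importance(predict, X_test, y_test, metric=accuracy, n_repeats=10, seed=0):
    """Return the baseline test score and the mean score drop per shuffled feature.

    predict: any callable mapping a (samples x features) array to predictions.
    A larger drop suggests the model relies more on that feature.
    """
    rng = np.random.default_rng(seed)
    X_test = np.asarray(X_test, dtype=float)
    baseline = metric(y_test, predict(X_test))
    drops = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between feature j and y
            scores.append(metric(y_test, predict(X_perm)))
        drops[j] = baseline - np.mean(scores)
    return baseline, drops
```

For a regression target, pass a different metric (for example negative mean squared error) so that a larger drop still means a more important feature.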

🌐 Process Flowchart

[Figure: Team 7 process flowchart]

🔮 Future Aims

  • Standardization and Packaging

    • Package the model into a standardized, reusable module.
    • Prepare a Python package for easy distribution and use.
  • Publication and Sharing

    • Publish the findings and the Python package for the broader research community.
  • Documentation and Reproducibility

    • Ensure all steps are well-documented to allow for reproducibility of the results.
    • Include instructions for setting up the computational environment and running the analysis.

👤 Contributors

  • Halina Krzystek
  • Kirtan Dave
  • Paul Kao
  • Macciej Kowalski
  • Aung Myat Phyo
  • Diya
  • Alishba Nadeem

🔗 References
