
lucianhu/MethProScan


MethProScan

🧬 Methylation Analysis with Promoter Region Scanning

📖 Overview

MethProScan is an R package designed for DNA methylation array data analysis, with a special focus on promoter regions. It provides a complete workflow — from preprocessing and statistical testing to visualization and biological interpretation.

This package is part of my PhD project on Brain Epigenomics. While similar analysis pipelines can be found elsewhere, MethProScan integrates visualization and summary statistics in a way that makes data exploration more intuitive, clear, and engaging.

You don’t need to write extra code to answer questions such as “how many CpGs are up/down?” or “how many are in promoter regions?”. These counts are computed and visualized automatically 🧠📊.
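The sketch below shows the kind of tally that is automated. It is plain base R and assumes a hypothetical per-CpG results table with columns `direction` ("up", "down" or "ns") and `in_promoter` (logical). MethProScan's actual result columns and summary code may differ.

```r
# Hypothetical per-CpG results table (toy data, not real MethProScan output)
dmc <- data.frame(
  cpg         = c("cg01", "cg02", "cg03", "cg04", "cg05", "cg06"),
  direction   = c("up", "down", "up", "ns", "down", "up"),
  in_promoter = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Cross-tabulate direction of change against promoter membership
summary_tab <- table(
  direction   = factor(dmc$direction, levels = c("up", "down", "ns")),
  in_promoter = dmc$in_promoter
)
print(summary_tab)

# Counts of changed CpGs overall and within promoters
n_up            <- sum(dmc$direction == "up")
n_down          <- sum(dmc$direction == "down")
n_up_promoter   <- sum(dmc$direction == "up" & dmc$in_promoter)
n_down_promoter <- sum(dmc$direction == "down" & dmc$in_promoter)
cat(sprintf("Up: %d (%d in promoters); down: %d (%d in promoters)\n",
            n_up, n_up_promoter, n_down, n_down_promoter))
```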

⚙️ Key Features

  • Comprehensive methylation pipeline for EPICv1 and EPICv2 arrays
  • Liftover support — convert EPICv2 probe data to EPICv1 coordinates for compatibility
  • Promoter region analysis — CpG-level and region-level statistical testing
  • Quality control (QC) with plots for probe and sample filtering
  • Automatic summary statistics (e.g. CpG up/down counts, promoter enrichment)
  • Visualization — volcano plots, MA plots, heatmaps, PCA, and promoter-specific views
  • Flexible parameters — thresholds, imputation, batch correction, and parallelization
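Here is a rough sketch of what the liftover step involves. EPICv2 probe names add a suffix (for example `_TC21`) to the familiar `cg` ID, and some probes appear as replicates. One common way to map them to EPICv1 names is to strip the suffix and average the replicates. This base-R example shows that idea on toy data. It is not necessarily how `process_epicv2_to_epicv1()` works, and `epicv1_probes` is a hypothetical probe list.

```r
# Toy EPICv2 beta matrix: rows are probe IDs carrying replicate suffixes
beta_v2 <- matrix(
  c(0.10, 0.20,
    0.30, 0.40,
    0.80, 0.60,
    0.50, 0.50),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("cg00000029_TC21", "cg00000029_TC22", "cg00000109_BC21", "cg00000155_TC11"),
    c("sample1", "sample2")
  )
)

# Strip the suffix ("_" and everything after it) to get an EPICv1-style ID
v1_ids <- sub("_.*$", "", rownames(beta_v2))

# Average replicate probes that collapse onto the same ID:
# rowsum() sums rows per group, and dividing by the group sizes gives the mean
sums    <- rowsum(beta_v2, group = v1_ids)
counts  <- as.vector(table(v1_ids)[rownames(sums)])
beta_v1 <- sums / counts

# Keep only IDs present on a (hypothetical) EPICv1 probe list
epicv1_probes <- c("cg00000029", "cg00000155")
beta_v1 <- beta_v1[rownames(beta_v1) %in% epicv1_probes, , drop = FALSE]
print(beta_v1)
```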

🧩 Installation

Install the development version directly from GitHub:

# install.packages("devtools")
devtools::install_github("lucianhu/MethProScan")

Then load the package:

library(MethProScan)

🚀 Basic Usage Example

Below are simplified examples demonstrating the main functions. Note: replace all file paths, and sample-sheet column names such as idat_ffpe, rna_kryo and max_class, with those from your own project.

1️⃣ Run the methylation processing pipeline

# Load metadata
metadata <- read.csv("samplesheet.csv", stringsAsFactors = FALSE)

# Keep EPICv1 samples only (subset() also drops rows whose array_type is NA)
metadata_epicv1 <- subset(metadata, array_type == "EPICv1")

# Run the analysis pipeline
results <- process_methylation(
  metadata = metadata_epicv1,
  idat_col = "idat_ffpe",
  sample_col = "rna_kryo",
  group_col = "max_class",
  idat_dir = "data/idat_files",
  output_dir = "results/methylation",
  array_type = "EPICv1",
  genome_build = "hg38",
  remove_sex = TRUE
)
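The `remove_sex = TRUE` option presumably drops probes on the sex chromosomes. Left in, those probes tend to separate samples by sex rather than by the groups of interest. Below is a minimal base-R sketch of such a filter. It assumes a hypothetical annotation table with `probe` and `chr` columns and is not MethProScan's internal code.

```r
# Toy beta matrix (rows = probes, columns = samples)
beta <- matrix(
  c(0.1, 0.2, 0.9, 0.8, 0.5, 0.4, 0.3, 0.3),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("cg01", "cg02", "cg03", "cg04"), c("s1", "s2"))
)

# Hypothetical probe annotation: one chromosome per probe
annotation <- data.frame(
  probe = c("cg01", "cg02", "cg03", "cg04"),
  chr   = c("chr1", "chrX", "chr7", "chrY")
)

# Look up each probe's chromosome and drop probes on chrX/chrY
probe_chr      <- annotation$chr[match(rownames(beta), annotation$probe)]
keep           <- !(probe_chr %in% c("chrX", "chrY"))
beta_autosomal <- beta[keep, , drop = FALSE]
print(rownames(beta_autosomal))
```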

2️⃣ Process and merge EPICv2 → EPICv1 datasets

# Keep EPICv2 samples only (subset() also drops rows whose array_type is NA)
metadata_epicv2 <- subset(metadata, array_type == "EPICv2")

# Run EPICv2 → EPICv1 liftover pipeline
results <- process_epicv2_to_epicv1(
  metadata = metadata_epicv2,
  idat_dir = "data/idat_files",
  epicv1_mset_path = "results/Noob_mset.RDS",
  output_dir = "results/epicv1_epicv2_merged",
  idat_col = "idat_ffpe",
  sample_col = "rna_kryo",
  group_col = "max_class",
  array_col = "array_type",
  genome_build = "hg38",
  probe_pval_threshold = 0.05,
  sample_qc_threshold = 0.05,
  sample_threshold = 0.1,
  n_cores = 8,
  impute_liftover = FALSE,
  apply_combat = TRUE,
  generate_qc_plots = TRUE,
  prefix = "EPICv2_EPICv1_merged"
)

# Access results
beta_values <- results$beta
m_values <- results$m_values
phenotype <- results$phenotype
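The pipeline returns both `beta` and `m_values`. These are related by the standard logit transform M = log2(β / (1 − β)). The base-R sketch below shows the conversion in both directions. The offset that keeps β values of exactly 0 or 1 finite is an assumption, and MethProScan may handle such values differently.

```r
# Beta (0-1) to M-value: M = log2(beta / (1 - beta)).
# Values are clamped to [offset, 1 - offset] so 0 and 1 do not give -Inf/Inf.
beta_to_m <- function(beta, offset = 1e-6) {
  beta <- pmin(pmax(beta, offset), 1 - offset)
  log2(beta / (1 - beta))
}

# Inverse transform: beta = 2^M / (2^M + 1)
m_to_beta <- function(m) {
  2^m / (2^m + 1)
}

# Toy matrix (filled column-wise): s1 = (0.5, 0.2), s2 = (0.8, 0.0)
beta_toy <- matrix(c(0.5, 0.2, 0.8, 0.0), nrow = 2,
                   dimnames = list(c("cg01", "cg02"), c("s1", "s2")))
m_toy <- beta_to_m(beta_toy)
print(round(m_toy, 3))
```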

3️⃣ Differentially Methylated CpG (DMC) analysis

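# 'results' is the object returned by the pipeline run above
diff_dmc <- dmc_analysis(
  mset = results$ratioSet,
  annotation = NULL,
  lfc_cutoff = 0.2,          # minimum effect size
  p_adjust = 0.05,           # adjusted p-value threshold
  reference_level = "LOW",   # baseline group
  comparison_level = "HIGH", # group compared against the baseline
  output_dir = "results/dmc_analysis"
)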
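The two thresholds in `dmc_analysis()` presumably combine an effect-size cutoff (`lfc_cutoff`) with an adjusted p-value cutoff (`p_adjust`). The base-R sketch below applies that logic to toy data. The Benjamini-Hochberg method, the column names and the use of `>=` for the effect-size test are all assumptions, not confirmed MethProScan behavior.

```r
# Hypothetical per-CpG statistics (toy data): effect size and raw p-value
stats <- data.frame(
  cpg   = paste0("cg0", 1:6),
  logFC = c(0.50, -0.35, 0.10, 0.30, -0.05, -0.60),
  p     = c(1e-6, 1e-4, 1e-5, 0.20, 0.03, 1e-3)
)

lfc_cutoff <- 0.2   # minimum absolute effect size
p_adjust   <- 0.05  # maximum adjusted p-value

# Benjamini-Hochberg adjustment across all tested CpGs
stats$padj <- p.adjust(stats$p, method = "BH")

# Label each CpG by direction relative to the reference group
sig <- stats$padj < p_adjust & abs(stats$logFC) >= lfc_cutoff
stats$direction <- ifelse(!sig, "ns", ifelse(stats$logFC > 0, "up", "down"))
print(stats)
print(table(stats$direction))
```

With these toy values, cg03 and cg05 pass the p-value threshold but fail the effect-size cutoff, so both are labelled "ns".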

4️⃣ Promoter-level exploration

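# Example: plot CpGs within significant PDE4D promoters
library(dplyr)  # provides the %>% pipe used below

# custom_df (promoter-level results), cpg_promoter_map, mset and pheno
# come from your own earlier analysis steps; they are not created above
significant_promoters <- custom_df %>%
  dplyr::filter(geneName == "PDE4D", padj < 0.05) %>%
  dplyr::arrange(padj) %>%
  dplyr::pull(promoterEnsemblId)

# One plot per significant promoter
for (promoter in significant_promoters) {
  plot_promoter_cpgs(
    promoter_id = promoter,
    cpg_map = cpg_promoter_map,
    mset = mset,
    pheno = pheno,
    n_cpgs = 10,
    output_dir = "results/dmr_analysis/promoter_plots"
  )
}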
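The loop above calls `plot_promoter_cpgs()` once per promoter. To make the data behind such a plot concrete, here is a hypothetical base-R sketch that computes the mean beta value per group for up to `n_cpgs` CpGs of one promoter. The map and matrix layouts, and the choice of simply taking the first `n_cpgs` CpGs, are assumptions. The real function's CpG selection and plotting are not reproduced.

```r
# Hypothetical CpG-to-promoter map (toy IDs)
cpg_map <- data.frame(
  promoter_id = c("P1", "P1", "P1", "P2"),
  cpg_id      = c("cg01", "cg02", "cg03", "cg04")
)

# Toy beta matrix (rows = CpGs, columns = samples) and sample groups
beta <- matrix(
  c(0.20, 0.30, 0.70, 0.80,
    0.10, 0.10, 0.50, 0.60,
    0.40, 0.40, 0.40, 0.40,
    0.90, 0.80, 0.20, 0.10),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cg0", 1:4), paste0("s", 1:4))
)
group <- factor(c("LOW", "LOW", "HIGH", "HIGH"), levels = c("LOW", "HIGH"))

# Hypothetical helper: per-group mean beta for the first n_cpgs CpGs of a promoter
promoter_group_means <- function(promoter, cpg_map, beta, group, n_cpgs = 10) {
  cpgs <- cpg_map$cpg_id[cpg_map$promoter_id == promoter]
  cpgs <- head(intersect(cpgs, rownames(beta)), n_cpgs)
  sub_beta <- beta[cpgs, , drop = FALSE]
  # t(apply(...)) gives one row per CpG and one column per group level
  t(apply(sub_beta, 1, function(x) tapply(x, group, mean)))
}

means_p1 <- promoter_group_means("P1", cpg_map, beta, group)
print(means_p1)
```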

🧠 Outputs

After running the pipeline, MethProScan produces:

  • Processed methylation objects (.RDS)
  • Differentially methylated results (_all_results.csv, _significant.csv)
  • Plots: MA, volcano, heatmap, PCA, t-SNE, and promoter CpG visualizations
  • Summary text file with up/down counts and significance thresholds
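Because the result tables share the `_all_results.csv` and `_significant.csv` suffixes, they can be collected by pattern. The base-R sketch below assumes only those suffixes. The file-name prefixes and column layout depend on your run, and the toy file names in the demonstration are made up.

```r
# Hypothetical helper: read every "*_significant.csv" table in an output folder
read_significant_tables <- function(output_dir) {
  files  <- list.files(output_dir, pattern = "_significant\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  names(tables) <- basename(files)
  tables
}

# Demonstration with a temporary folder and toy files
demo_dir <- file.path(tempdir(), "dmc_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(cpg = c("cg01", "cg02"), logFC = c(0.5, -0.4)),
          file.path(demo_dir, "demo_significant.csv"), row.names = FALSE)
write.csv(data.frame(cpg = "cg03", logFC = 0.1),
          file.path(demo_dir, "demo_all_results.csv"), row.names = FALSE)

sig_tables <- read_significant_tables(demo_dir)
print(names(sig_tables))
```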

🧬 Citation

If you use MethProScan in your research, please cite:

Nguyen, Q. N. MethProScan: Methylation Analysis with Promoter Region Scanning. PhD project, Heidelberg University.

📄 License

This package is distributed under the MIT License.
