MethProScan is an R package designed for DNA methylation array data analysis, with a special focus on promoter regions. It provides a complete workflow — from preprocessing and statistical testing to visualization and biological interpretation.
This package is part of my PhD project on Brain Epigenomics. While similar analysis pipelines can be found elsewhere, MethProScan integrates visualization and summary statistics in a way that makes data exploration more intuitive, clear, and engaging.
You don’t need to write extra code or ask “how many CpGs are up/down” or “how many are in promoter regions” — everything is automatically computed and visualized for you 🧠📊.
- Comprehensive methylation pipeline for EPICv1 and EPICv2 arrays
- Liftover support — convert EPICv2 probe data to EPICv1 coordinates for compatibility
- Promoter region analysis — CpG-level and region-level statistical testing
- Quality control (QC) with plots for probe and sample filtering
- Automatic summary statistics (e.g. CpG up/down counts, promoter enrichment)
- Visualization — volcano plots, MA plots, heatmaps, PCA, and promoter-specific views
- Flexible parameters — thresholds, imputation, batch correction, and parallelization
Install the development version directly from GitHub:
# install.packages("devtools")
devtools::install_github("lucianhu/MethProScan")

Then load the package:
library(MethProScan)

Below are simplified examples demonstrating the main functions. Note: all file paths should be replaced with your own working directories.
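Before running the examples, it can help to confirm that the sample sheet contains the columns the calls refer to. The following is a minimal base R sketch and not part of MethProScan: `check_samplesheet()` is a hypothetical helper, the toy values are made up, and the column names are simply the ones used in the examples in this README.

```r
# Hypothetical helper (not a MethProScan function): stop early if the
# sample sheet lacks a column that the example calls refer to.
check_samplesheet <- function(metadata, required_cols) {
  missing_cols <- setdiff(required_cols, colnames(metadata))
  if (length(missing_cols) > 0) {
    stop("Sample sheet is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy sample sheet standing in for read.csv("samplesheet.csv")
toy_metadata <- data.frame(
  idat_ffpe  = c("toy_R01C01", "toy_R02C01"),
  rna_kryo   = c("S1", "S2"),
  max_class  = c("LOW", "HIGH"),
  array_type = c("EPICv1", "EPICv2"),
  stringsAsFactors = FALSE
)

check_samplesheet(
  toy_metadata,
  required_cols = c("idat_ffpe", "rna_kryo", "max_class", "array_type")
)

# Number of samples per array type, useful before splitting the sheet
table(toy_metadata$array_type)
```

If a column is missing, the helper stops with a message naming it, which is usually easier to act on than an error raised deep inside a pipeline call.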
# Load metadata
metadata <- read.csv("samplesheet.csv", stringsAsFactors = FALSE)
# Filter metadata (example)
metadata_epicv1 <- metadata[metadata$array_type == "EPICv1", ]
# Run the analysis pipeline
results <- process_methylation(
  metadata = metadata_epicv1,
  idat_col = "idat_ffpe",
  sample_col = "rna_kryo",
  group_col = "max_class",
  idat_dir = "data/idat_files",
  output_dir = "results/methylation",
  array_type = "EPICv1",
  genome_build = "hg38",
  remove_sex = TRUE
)

# Filter for EPICv2 samples
metadata_epicv2 <- metadata[metadata$array_type == "EPICv2", ]
# Run EPICv2 → EPICv1 liftover pipeline
results <- process_epicv2_to_epicv1(
  metadata = metadata_epicv2,
  idat_dir = "data/idat_files",
  epicv1_mset_path = "results/Noob_mset.RDS",
  output_dir = "results/epicv1_epicv2_merged",
  idat_col = "idat_ffpe",
  sample_col = "rna_kryo",
  group_col = "max_class",
  array_col = "array_type",
  genome_build = "hg38",
  probe_pval_threshold = 0.05,
  sample_qc_threshold = 0.05,
  sample_threshold = 0.1,
  n_cores = 8,
  impute_liftover = FALSE,
  apply_combat = TRUE,
  generate_qc_plots = TRUE,
  prefix = "EPICv2_EPICv1_merged"
)
# Access results
beta_values <- results$beta
m_values <- results$m_values
phenotype <- results$phenotype

diff_dmc <- dmc_analysis(
  mset = results$ratioSet,
  annotation = NULL,
  lfc_cutoff = 0.2,
  p_adjust = 0.05,
  reference_level = "LOW",
  comparison_level = "HIGH",
  output_dir = "results/dmc_analysis"
)

# Example: plot CpGs within significant PDE4D promoters
# custom_df, cpg_promoter_map, mset, and pheno are assumed to exist from earlier steps.
# %>% comes from magrittr; attach it if it is not already available in your session.
library(magrittr)
significant_promoters <- custom_df %>%
  dplyr::filter(geneName == "PDE4D", padj < 0.05) %>%
  dplyr::arrange(padj) %>%
  dplyr::pull(promoterEnsemblId)
for (promoter in significant_promoters) {
  plot_promoter_cpgs(
    promoter_id = promoter,
    cpg_map = cpg_promoter_map,
    mset = mset,
    pheno = pheno,
    n_cpgs = 10,
    output_dir = "results/dmr_analysis/promoter_plots"
  )
}

After running the pipeline, MethProScan produces:
- Processed methylation objects (.RDS)
- Differentially methylated results (_all_results.csv, _significant.csv)
- Plots: MA, Volcano, Heatmap, PCA, tSNE, Promoter CpG visualization
- Summary text file with up/down counts and significance thresholds
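The up/down counts in the summary can also be reproduced by hand from a results table. The sketch below is an illustration under stated assumptions, not the package's own code: the column names `logFC` and `adj.P.Val` are hypothetical (check the header of your own `_all_results.csv`, which may differ), the strict inequalities are a guess at how the thresholds are applied, and the cutoffs reuse `lfc_cutoff = 0.2` and `p_adjust = 0.05` from the example call in this README.

```r
# Toy results table; column names are assumptions, not the package's schema
toy_results <- data.frame(
  cpg       = c("cg01", "cg02", "cg03", "cg04", "cg05"),
  logFC     = c(0.35, -0.50, 0.10, -0.25, 0.40),
  adj.P.Val = c(0.01, 0.001, 0.02, 0.20, 0.04)
)

lfc_cutoff <- 0.2
p_adjust   <- 0.05

# A CpG is counted only if it passes both thresholds
significant <- toy_results$adj.P.Val < p_adjust &
  abs(toy_results$logFC) > lfc_cutoff

# Assumption: a positive effect means higher methylation in the comparison group
direction <- ifelse(toy_results$logFC > 0, "up", "down")

# Fixing the factor levels keeps a zero count visible instead of dropping it
counts <- table(factor(direction[significant], levels = c("up", "down")))
print(counts)
```

With the toy values above, two CpGs pass both thresholds with a positive effect and one with a negative effect; the other two fail either the effect-size or the adjusted p-value cutoff.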
If you use MethProScan in your research, please cite:
Nguyen, Q. N., MethProScan: Methylation Analysis with Promoter Region Scanning (PhD project, Heidelberg University).
This package is distributed under the MIT License.