GWPIS (Geometric Weighted Pathway Interaction Score) is a framework for exploring interactions between SARS-CoV-2 proteins and host immune pathways. It combines protein-protein interaction predictions from D-SCRIPT, a structure-aware deep learning model built on protein language model embeddings (https://d-script.readthedocs.io), with weighted pathway activity scores from PROGENy (https://saezlab.github.io/progeny/) to quantify, at the pathway level, how SARS-CoV-2 proteins interact with host immune responses.
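As a rough illustration of how interaction probabilities and pathway weights can be combined, the sketch below multiplies a viral-by-host matrix of D-SCRIPT probabilities by a host-gene-by-pathway weight matrix. This is not the GWPIS definition: the actual score, including its geometric weighting, is implemented in 04_Fig2_GWPIS.Rmd and may differ. The function name pathway_interaction_sketch, the matrix layout, and the toy values are all hypothetical.

```r
# Illustrative sketch only: this is NOT the GWPIS formula from 04_Fig2_GWPIS.Rmd.
# P: viral proteins x host proteins matrix of D-SCRIPT interaction probabilities.
# W: host genes x pathways matrix (or data frame) of PROGENy-style gene weights.
# Assumes host identifiers in colnames(P) use the same namespace as rownames(W).
pathway_interaction_sketch <- function(P, W) {
  W <- as.matrix(W)                              # accept a data frame of weights
  shared <- intersect(colnames(P), rownames(W))  # host genes present in both inputs
  if (length(shared) == 0) stop("No shared host identifiers between P and W")
  # Score for (viral protein, pathway) = sum over shared host genes of
  # interaction probability x pathway weight of that gene.
  P[, shared, drop = FALSE] %*% W[shared, , drop = FALSE]
}

# Toy inputs with made-up identifiers and values, for illustration only
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE,
            dimnames = list(c("viralA", "viralB"), c("host1", "host2")))
W <- matrix(c( 1.5, 0,
              -0.5, 2), nrow = 2, byrow = TRUE,
            dimnames = list(c("host1", "host2"), c("pathwayX", "pathwayY")))
S <- pathway_interaction_sketch(P, W)
print(S)
```

A matrix product keeps the example short; in a real analysis W might be, for example, the gene-by-pathway weight table provided by the progeny package, and P would be built from D-SCRIPT predictions.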
This project requires both R (4.0.2) and Python (3.7.12) environments.
R packages required:
- dplyr (1.1.3)
- ggplot2 (3.4.4)
- reshape2 (1.4.4)
- tidyr (1.3.0)
- igraph (1.5.1)
- Matrix (1.6-1.1)
- DESeq2 (1.42.0)
- Mfuzz (2.48.0)
- PROGENy (1.10.0)
- clusterProfiler (4.9.0.002)
- GenomicRanges (1.52.1)
- SummarizedExperiment (1.30.2)
- e1071 (1.7-4)
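You can install the necessary R packages with the following commands in R. Note that the Bioconductor package for PROGENy is named progeny, and that these commands install the current releases rather than the exact versions listed above.
```r
install.packages(c("dplyr", "ggplot2", "reshape2", "tidyr", "igraph", "Matrix", "e1071"))  # CRAN packages
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")  # Bioconductor installer
BiocManager::install(c("DESeq2", "Mfuzz", "clusterProfiler", "progeny", "GenomicRanges", "SummarizedExperiment"))
```
Python packages required: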
- D-SCRIPT (0.2.8)
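You can install D-SCRIPT with the following commands from a shell:
```bash
# Clone the D-SCRIPT repository and enter it
git clone https://github.com/samsledje/D-SCRIPT.git
cd D-SCRIPT
# Create and activate the conda environment defined in environment.yml
conda env create --file environment.yml
conda activate dscript
# Install the D-SCRIPT package (append ==0.2.8 to match the version listed above)
pip install dscript
```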
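Because install.packages() and BiocManager::install() fetch current releases, it can be useful to compare what is installed with the versions listed above. The helper below is a hypothetical convenience, not part of this repository; it uses only base R and reports NA for packages that are not installed.

```r
# Hypothetical helper: compare installed versions of the required R packages
# with the versions listed in this README. Missing packages show NA.
required <- c(
  dplyr = "1.1.3", ggplot2 = "3.4.4", reshape2 = "1.4.4", tidyr = "1.3.0",
  igraph = "1.5.1", Matrix = "1.6-1.1", DESeq2 = "1.42.0", Mfuzz = "2.48.0",
  progeny = "1.10.0", clusterProfiler = "4.9.0.002", GenomicRanges = "1.52.1",
  SummarizedExperiment = "1.30.2", e1071 = "1.7-4"
)

check_versions <- function(required) {
  installed <- vapply(names(required), function(pkg) {
    tryCatch(as.character(utils::packageVersion(pkg)),
             error = function(e) NA_character_)  # package not installed
  }, character(1))
  # Compare as package versions so that "1.7-4" and "1.7.4" count as equal
  matches <- mapply(function(inst, expected) {
    !is.na(inst) && package_version(inst) == package_version(expected)
  }, installed, required)
  data.frame(package = names(required), expected = unname(required),
             installed = unname(installed), match = unname(matches),
             stringsAsFactors = FALSE)
}

print(check_versions(required))
```

Rows with match = FALSE are not necessarily a problem, but they point to where results could differ from those produced with the listed versions.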
This repository is organized into three main folders: data, script, and analysis. Below is a breakdown of the files contained in each folder:
The script folder contains the R Markdown scripts used for the analysis, including data processing, clustering, and pathway analysis:
- 01_Fig1_Bulk_RNAseq.Rmd: R Markdown script for processing bulk RNA-seq data.
- 02_Fig1_Mfuzz.Rmd: R Markdown script for performing gene clustering using the Mfuzz package.
- 03_Fig1_PROGENy.Rmd: R Markdown script for conducting pathway analysis of the samples using PROGENy.
- 04_Fig2_GWPIS.Rmd: R Markdown script for calculating the interaction scores between SARS-CoV-2 proteins and immune pathways using the GWPIS method.
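The numeric prefixes suggest the scripts are meant to run in order. A minimal sketch for rendering them sequentially is shown below; it assumes the rmarkdown package is installed (it is not in the list above) and that the folder is named script relative to the working directory. Whether each notebook expects a particular working directory is not stated here, so check the file paths inside each .Rmd first. The helper name render_pipeline is hypothetical.

```r
# Hypothetical helper: render the analysis notebooks in numeric-prefix order.
# Set render = FALSE to only list the scripts that would be rendered.
render_pipeline <- function(script_dir = "script", render = TRUE) {
  # Zero-padded prefixes (01_, 02_, ...) make alphabetical order match run order
  scripts <- sort(list.files(script_dir, pattern = "\\.Rmd$", full.names = TRUE))
  for (path in scripts) {
    message("Rendering ", basename(path))
    if (render) rmarkdown::render(path)  # requires the rmarkdown package
  }
  invisible(scripts)
}

# Usage (from the repository root):
# render_pipeline("script")
```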
The data folder contains the input data files used in the analysis:
- 01_RawCount.txt: Raw RNA-seq count data used for Figure 1 (Bulk RNA-seq).
- 04_Predict.tsv: Protein-protein interaction data generated by the D-SCRIPT method, used for Figure 2 (protein interaction analysis).
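A minimal sketch for loading the D-SCRIPT predictions in R follows. It assumes 04_Predict.tsv is a headerless, tab-separated file with three columns (protein 1, protein 2, predicted interaction probability); check the file before relying on this layout. The function name and the 0.5 cutoff are illustrative choices, not values taken from the analysis.

```r
# Sketch: load D-SCRIPT pair predictions and keep high-probability pairs.
# Assumes three tab-separated columns and no header row.
read_dscript_predictions <- function(path, threshold = 0.5) {
  pred <- read.delim(path, header = FALSE, stringsAsFactors = FALSE,
                     col.names = c("protein1", "protein2", "probability"))
  # Keep pairs whose predicted probability reaches the illustrative threshold
  pred[pred$probability >= threshold, , drop = FALSE]
}

# Usage (from the repository root):
# hits <- read_dscript_predictions("data/04_Predict.tsv")
```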
The analysis folder contains processed analysis results and intermediate data objects:
- 01_ddsres.Rds: DESeq2 object created using the raw RNA-seq counts.
- 02_cl.Rds/02_df.Rds: Results of gene clustering using Mfuzz.
- 03_Progeny.Rds: PROGENy pathway activity scores.
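The intermediate objects can be loaded with base R's readRDS(). The sketch below reads every .Rds file in a folder into a named list; it assumes the folder is named analysis relative to the working directory, and the helper name is hypothetical. The packages that created each object (for example DESeq2 for 01_ddsres.Rds) should be installed before working with the loaded objects.

```r
# Hypothetical helper: read every .Rds file in a folder into a named list,
# using the file name without its extension as the list name.
load_analysis_objects <- function(analysis_dir = "analysis") {
  files <- list.files(analysis_dir, pattern = "\\.rds$", ignore.case = TRUE,
                      full.names = TRUE)
  objs <- lapply(files, readRDS)
  names(objs) <- sub("\\.rds$", "", basename(files), ignore.case = TRUE)
  objs
}

# Usage (from the repository root):
# objs <- load_analysis_objects()
# str(objs[["03_Progeny"]], max.level = 1)
```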
References:
- Sledzieski S, Singh R, Cowen L, Berger B. D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions. Cell Syst. 2021 Oct 20;12(10):969-982.e6. doi: 10.1016/j.cels.2021.08.010. Epub 2021 Oct 9. PMID: 34536380; PMCID: PMC8586911.
- Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, Garnett MJ, Blüthgen N, Saez-Rodriguez J. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nat Commun. 2018 Jan 2;9(1):20. doi: 10.1038/s41467-017-02391-6. PMID: 29295995; PMCID: PMC5750219.
If you use the data or code from this repository in your work, please cite this repository:
This repository accompanies a manuscript currently under peer review. Citation details will be updated upon publication.