This project presents an end-to-end RNA-seq analysis investigating the host transcriptional response to Influenza A virus (WSN/33) infection.
The workflow integrates experimental design awareness with reproducible computational steps to characterize virus-induced changes in host gene expression, with a particular focus on interferon-mediated antiviral responses.
How does Influenza A virus infection alter host gene expression, and which components of the innate immune response are transcriptionally activated in infected cells compared to mock controls?
- Host: Human (GENCODE v49, GRCh38)
- Virus: Influenza A virus (WSN/33 strain)
- Conditions:
  - mock: uninfected control
  - virus: Influenza A–infected samples
- Sequencing: RNA-seq (paired-end)
The repository is organized to clearly separate metadata, references, scripts, and results.
- config/: Configuration files used by the shell scripts (paths, parameters).
- data/metadata/: Experimental metadata (samples.csv) describing samples, conditions, and sequencing runs.
- ref/: Reference files used for analysis, including:
  - a combined host–virus FASTA (human GENCODE v49 + Influenza A WSN/33),
  - the GENCODE v49 annotation (GTF),
  - a transcript-to-gene mapping file (tx2gene).
- scripts/: Modular shell and R scripts implementing each step of the workflow, from data retrieval to visualization.
- results/: Processed outputs and final results, including:
  - figures/: PCA, volcano plot, and ISG heatmap,
  - host_gene/: gene-level differential expression results,
  - host_matrix/: host-only expression matrices.
Large intermediate files (raw FASTQ, Salmon indices, quantification outputs) are intentionally excluded.
Raw RNA-seq data were retrieved using the SRA Toolkit and ENA; having two download routes makes retrieval robust to network instability.
Sequencing quality was assessed using:
- FastQC for individual samples
- MultiQC for aggregated reports
A combined host–virus reference was constructed by merging:
- the human transcriptome (GENCODE v49),
- the Influenza A (WSN/33) genome.
This approach enables simultaneous quantification of host and viral transcripts.
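As a rough illustration of the merging step, the Python sketch below concatenates a host FASTA and a viral FASTA into one file. It is not the repository's script: the function name `merge_fasta`, the `IAV_` prefix added to viral record names, and the toy sequences are all assumptions made for this example. Prefixing the viral records is one possible way to make them easy to recognise after quantification.

```python
from pathlib import Path
import tempfile


def merge_fasta(host_fasta, virus_fasta, combined_fasta, virus_prefix="IAV_"):
    """Concatenate two FASTA files and return the viral record names.

    Host records are copied unchanged. Viral record names are truncated at
    the first whitespace and given a prefix (an illustrative assumption) so
    that they can be separated from host transcripts later.
    """
    viral_names = []
    with open(combined_fasta, "w") as out:
        with open(host_fasta) as host:
            for line in host:
                out.write(line.rstrip("\n") + "\n")
        with open(virus_fasta) as virus:
            for line in virus:
                line = line.rstrip("\n")
                if line.startswith(">"):
                    name = virus_prefix + line[1:].split()[0]
                    viral_names.append(name)
                    line = ">" + name
                out.write(line + "\n")
    return viral_names


# Toy input files (hypothetical names and sequences, not real references).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "host.fa").write_text(">ENST_TOY_1\nACGT\n")
    (tmp / "virus.fa").write_text(">PB2 segment 1\nGGCC\n")
    names = merge_fasta(tmp / "host.fa", tmp / "virus.fa", tmp / "combined.fa")
    combined_text = (tmp / "combined.fa").read_text()

print(names)
print(combined_text)
```

The returned list of viral names can be kept alongside the combined reference, so that downstream steps do not have to guess which records are viral.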
Transcript-level quantification was performed using Salmon on the combined reference.
Host and viral transcripts were quantified together, after which viral transcripts were excluded during downstream host gene-level analysis.
Host-only expression matrices were generated by summarizing transcript-level estimates to gene level using a curated tx2gene mapping derived from GENCODE v49.
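The core idea of excluding viral transcripts and summarizing to gene level can be sketched in a few lines of Python. This is only a conceptual illustration under stated assumptions: the repository lists tximport among its R packages, which does considerably more than a plain sum, and the identifiers and counts below are invented toy values. The sketch assumes the tx2gene mapping contains host transcripts only, so any transcript missing from it is treated as non-host and dropped.

```python
from collections import defaultdict


def summarize_to_gene(tx_counts, tx2gene):
    """Sum transcript-level estimates per gene for transcripts in tx2gene.

    Transcripts absent from the mapping (here, the viral ones) are excluded
    from the gene-level result and returned separately for inspection.
    """
    gene_counts = defaultdict(float)
    dropped = []
    for tx, count in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            dropped.append(tx)
            continue
        gene_counts[gene] += count
    return dict(gene_counts), dropped


# Toy transcript-level estimates for one sample (hypothetical identifiers).
tx_counts = {"ENST_A1": 10.0, "ENST_A2": 5.5, "ENST_B1": 3.0, "IAV_NP": 200.0}
# Toy host-only transcript-to-gene mapping.
tx2gene = {"ENST_A1": "GENE_A", "ENST_A2": "GENE_A", "ENST_B1": "GENE_B"}

gene_counts, dropped = summarize_to_gene(tx_counts, tx2gene)
print(gene_counts)  # {'GENE_A': 15.5, 'GENE_B': 3.0}
print(dropped)      # ['IAV_NP']
```

Returning the dropped identifiers makes it easy to check that only viral records, and no unexpected host transcripts, were removed.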
Gene-level differential expression was performed with DESeq2, using an explicit contrast:
- virus vs mock
With this contrast, genes with a positive log2 fold change are more highly expressed in infected samples (induced by infection), while negative values indicate lower expression relative to mock (repressed).
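To make the sign convention concrete, the Python sketch below computes a naive log2 fold change from group means, with virus as the numerator and mock as the denominator. This is for intuition only: DESeq2 estimates fold changes from a negative binomial model rather than from a simple ratio of means, and the pseudocount and toy values here are assumptions chosen for the example.

```python
import math


def log2_fold_change(mean_virus, mean_mock, pseudocount=1.0):
    """Naive log2 ratio of virus over mock (pseudocount avoids division by zero)."""
    return math.log2((mean_virus + pseudocount) / (mean_mock + pseudocount))


def direction(lfc):
    """Translate the sign of a virus-vs-mock log2 fold change into words."""
    if lfc > 0:
        return "induced"
    if lfc < 0:
        return "repressed"
    return "unchanged"


# Toy mean normalized counts (hypothetical values).
up = log2_fold_change(mean_virus=7.0, mean_mock=1.0)    # log2(8 / 2)
down = log2_fold_change(mean_virus=1.0, mean_mock=7.0)  # log2(2 / 8)

print(up, direction(up))      # 2.0 induced
print(down, direction(down))  # -2.0 repressed
```

Swapping numerator and denominator flips every sign, which is why stating the contrast explicitly matters when reading the results tables.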
To explore and interpret host responses:
- PCA was applied to variance-stabilized counts to assess global transcriptional differences,
- Volcano plots were used to identify significantly induced and repressed genes,
- A heatmap of interferon-stimulated genes (ISGs) was generated to highlight coordinated antiviral responses.
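As a minimal sketch of how a volcano plot separates induced from repressed genes, the Python example below assigns each gene a category and its plot coordinates. The cutoffs (|log2 fold change| ≥ 1, adjusted p-value < 0.05), the gene labels, and the numbers are illustrative assumptions; the source does not state which thresholds were used. Missing adjusted p-values (DESeq2 can report NA) are represented as `None` and treated as not significant.

```python
import math


def classify(lfc, padj, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Label a gene as induced, repressed, or not significant."""
    if padj is None or padj >= padj_cutoff:
        return "not significant"
    if lfc >= lfc_cutoff:
        return "induced"
    if lfc <= -lfc_cutoff:
        return "repressed"
    return "not significant"


def volcano_point(lfc, padj):
    """Volcano coordinates: x = log2 fold change, y = -log10(adjusted p-value)."""
    return lfc, -math.log10(padj)


# Toy results table: (gene label, log2 fold change, adjusted p-value).
rows = [
    ("gene_1", 3.2, 1e-10),
    ("gene_2", -1.5, 0.001),
    ("gene_3", 0.4, 0.0001),
    ("gene_4", 2.5, 0.2),
    ("gene_5", 1.8, None),
]

labels = {gene: classify(lfc, padj) for gene, lfc, padj in rows}
for gene, label in labels.items():
    print(gene, label)
```

Requiring both an effect-size cutoff and a significance cutoff keeps small but statistically significant changes (such as `gene_3` above) out of the highlighted sets.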
Principal component analysis reveals a clear separation between mock and virus samples, indicating a strong virus-driven transcriptional effect.
The analysis identifies robust induction of classical interferon-stimulated genes (ISGs), including:
- OAS1, OAS3
- STAT1
- IRF7
- IFIT family genes
These genes are hallmarks of early innate immune activation and antiviral defense.
Genes with higher expression in mock samples likely represent baseline cellular processes that are transcriptionally reprogrammed upon infection.
All analytical steps are implemented as modular scripts, allowing the full workflow to be rerun from raw data if needed.
Large intermediate files (raw reads, indices, quantification outputs) are intentionally excluded from version control, while all scripts and final results required for interpretation are provided.
- R (≥ 4.2) with packages: DESeq2, tximport, ggplot2, ggrepel, pheatmap, dplyr, readr
- Salmon for transcript quantification
- FastQC / MultiQC for quality control
Yasmina Soumahoro
Biologist | Bioinformatics | Host–Pathogen Transcriptomics