End-to-end host–virus RNA-seq analysis of Influenza A infection.

yasmina-bioinfo/Viro_InfluenzaA

Viro_InfluenzaA — Host–Virus RNA-seq Analysis

Overview

This project presents an end-to-end RNA-seq analysis investigating the host transcriptional response to Influenza A virus (WSN/33) infection.

The workflow pairs an explicit experimental design (mock vs. infected samples) with reproducible computational steps to characterize virus-induced changes in host gene expression, with a particular focus on interferon-mediated antiviral responses.


Biological question

How does Influenza A virus infection alter host gene expression, and which components of the innate immune response are transcriptionally activated in infected cells compared to mock controls?


Experimental design

  • Host: Human (GENCODE v49, GRCh38)
  • Virus: Influenza A virus (WSN/33 strain)
  • Conditions:
    • mock: uninfected control
    • virus: Influenza A–infected samples
  • Sequencing: RNA-seq (paired-end)

Project structure

The repository is organized to clearly separate metadata, references, scripts, and results.

  • config/
    Contains configuration files used by shell scripts (paths, parameters).

  • data/metadata/
    Experimental metadata (samples.csv) describing samples, conditions, and sequencing runs.

  • ref/
    Reference files used for analysis, including:

    • a combined host–virus FASTA (human GENCODE v49 + Influenza A WSN/33),
    • the GENCODE v49 annotation (GTF),
    • a transcript-to-gene mapping file (tx2gene).
  • scripts/
    Modular shell and R scripts implementing each step of the workflow, from data retrieval to visualization.

  • results/
    Processed outputs and final results, including:

    • figures/: PCA, volcano plot, and ISG heatmap,
    • host_gene/: gene-level differential expression results,
    • host_matrix/: host-only expression matrices.

Large intermediate files (raw FASTQ, Salmon indices, quantification outputs) are intentionally excluded.


Analysis workflow

1. Data acquisition

Raw RNA-seq reads were retrieved using the SRA Toolkit and ENA; drawing on two download sources keeps retrieval robust to network instability.
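
One way to tolerate flaky connections is to wrap each download in a retry loop. The sketch below is an assumption about how this could look, not the repository's actual script: the `retry` and `fetch_run` helpers, the attempt count, the delay, and the output directory are hypothetical, while `prefetch` and `fasterq-dump` are the standard SRA Toolkit commands. A download from ENA could be wrapped with the same `retry` helper.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: retry a flaky download command a few times.

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "failed after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s: $*" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# fetch_run ACCESSION OUTDIR: download one run and convert it to paired FASTQ.
fetch_run() {
  local acc=$1 outdir=$2
  mkdir -p "$outdir"
  retry 3 30 prefetch "$acc" --output-directory "$outdir" &&
    retry 3 30 fasterq-dump "$outdir/$acc" --split-files --outdir "$outdir"
}

# Example (the accession is a placeholder, not taken from the repository):
# fetch_run SRRXXXXXXX data/fastq
```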

2. Quality control

Sequencing quality was assessed using:

  • FastQC for individual samples
  • MultiQC for aggregated reports
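
The two QC tools could be chained roughly as follows. This is a minimal sketch under assumptions: the `run_qc` function, the directory layout, and the `*.fastq.gz` naming are hypothetical and may differ from the scripts in `scripts/`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: per-sample FastQC followed by one aggregated MultiQC report.

# run_qc FASTQ_DIR QC_DIR [THREADS]
run_qc() {
  local fastq_dir=$1 qc_dir=$2 threads=${3:-4}
  mkdir -p "$qc_dir/fastqc" "$qc_dir/multiqc"
  # FastQC writes one report per FASTQ file.
  fastqc --threads "$threads" --outdir "$qc_dir/fastqc" "$fastq_dir"/*.fastq.gz
  # MultiQC scans the FastQC output directory and aggregates it into one report.
  multiqc "$qc_dir/fastqc" --outdir "$qc_dir/multiqc"
}

# Example (paths are placeholders):
# run_qc data/fastq results/qc 8
```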

3. Reference preparation

A combined host–virus reference was constructed by merging:

  • the human transcriptome (GENCODE v49),
  • the Influenza A (WSN/33) genome.

This approach enables simultaneous quantification of host and viral transcripts.
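
Merging the two references amounts to concatenating FASTA files and indexing the result. The sketch below assumes uncompressed inputs; the `build_combined_reference` function and all file names are hypothetical. The commented `salmon index` call uses Salmon's `--gencode` flag, which trims GENCODE transcript names at the first `|`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: merge host transcripts and viral sequences into one FASTA.

# build_combined_reference HOST_FASTA VIRUS_FASTA OUT_FASTA
build_combined_reference() {
  local host_fa=$1 virus_fa=$2 out_fa=$3
  local dup
  # Refuse to continue if a sequence name occurs more than once across the
  # inputs, since duplicate names would make quantification ambiguous.
  dup=$(grep -h '^>' "$host_fa" "$virus_fa" | cut -d' ' -f1 | sort | uniq -d)
  if [ -n "$dup" ]; then
    echo "duplicate sequence names: $dup" >&2
    return 1
  fi
  # "awk 1" behaves like cat but also adds a missing final newline, so the
  # first viral header never fuses with the last host sequence line.
  awk 1 "$host_fa" "$virus_fa" > "$out_fa"
}

# Example (file names are placeholders):
# build_combined_reference gencode.transcripts.fa wsn33.fa ref/combined.fa
# salmon index -t ref/combined.fa -i ref/salmon_index --gencode
```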

4. Quantification

Transcript-level quantification was performed using Salmon on the combined reference.
Host and viral transcripts were quantified together; viral transcripts were then excluded from the downstream host gene-level analysis.
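
A quantification loop and a host-only filter might look like the sketch below. It rests on assumptions: the `quantify_samples` and `host_only` helpers, the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming, and the thread count are hypothetical, and the filter assumes that host transcript names start with `ENST` while viral sequence names do not.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: quantify each paired-end sample against the combined index.

# quantify_samples INDEX_DIR FASTQ_DIR OUT_DIR [THREADS]
quantify_samples() {
  local index=$1 fastq_dir=$2 out_dir=$3 threads=${4:-8}
  local r1 r2 sample
  for r1 in "$fastq_dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || { echo "no FASTQ files in $fastq_dir" >&2; return 1; }
    sample=$(basename "$r1" _1.fastq.gz)
    r2="$fastq_dir/${sample}_2.fastq.gz"
    # -l A lets Salmon infer the library type automatically.
    salmon quant -i "$index" -l A -1 "$r1" -2 "$r2" \
      -p "$threads" --validateMappings -o "$out_dir/$sample" || return 1
  done
}

# host_only QUANT_SF: keep the header plus rows whose name starts with ENST,
# which drops the viral sequences before host gene-level analysis.
host_only() {
  awk -F'\t' 'NR == 1 || $1 ~ /^ENST/' "$1"
}

# Example (paths are placeholders):
# quantify_samples ref/salmon_index data/fastq results/salmon
# host_only results/salmon/sample1/quant.sf > sample1.host.sf
```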

5. Host gene expression matrices

Host-only expression matrices were generated by summarizing transcript-level estimates to gene level using a curated tx2gene mapping derived from GENCODE v49.
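
A transcript-to-gene table can be derived directly from the annotation. The following is a sketch assuming an uncompressed GENCODE-style GTF; the `make_tx2gene` function is hypothetical and the repository's curated mapping may have been built differently. The resulting two-column table is the kind of input that tximport expects for gene-level summarization.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: derive a transcript-to-gene table from a GENCODE GTF.

# make_tx2gene GTF > tx2gene.tsv
# Emits one "transcript_id<TAB>gene_id" line per transcript record.
make_tx2gene() {
  awk -F'\t' '
    $0 !~ /^#/ && $3 == "transcript" {
      tx = ""; gene = ""
      # Pull the quoted values out of the attribute column (field 9).
      if (match($9, /transcript_id "[^"]+"/))
        tx = substr($9, RSTART + 15, RLENGTH - 16)
      if (match($9, /gene_id "[^"]+"/))
        gene = substr($9, RSTART + 9, RLENGTH - 10)
      if (tx != "" && gene != "") print tx "\t" gene
    }
  ' "$1"
}

# Example (file names are placeholders):
# make_tx2gene ref/gencode.annotation.gtf > ref/tx2gene.tsv
```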

6. Differential expression analysis

Gene-level differential expression analysis was performed with DESeq2, using an explicit contrast:

  • virus vs mock

Genes with a positive log2 fold change are expressed at higher levels in virus than in mock samples (induced upon infection), while a negative log2 fold change indicates lower expression in infected samples (repression).
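
The sign convention follows from how DESeq2 parameterizes its model. As a sketch, assuming a single-factor design (`~ condition`) with mock as the reference level, which the contrast implies but the text does not state explicitly:

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_j \, q_{ij}, \qquad
\log_2 q_{ij} = \beta_{i0} + \beta_{i1} x_j,
\qquad
\mathrm{log2FC}_i = \beta_{i1}
  = \log_2 \frac{q_{i,\mathrm{virus}}}{q_{i,\mathrm{mock}}}
```

Here K_ij is the read count for gene i in sample j, s_j is the sample's size factor, alpha_i is the gene's dispersion, and x_j is 1 for virus samples and 0 for mock samples. Under this parameterization a log2 fold change of 2 corresponds to roughly four-fold higher expression in infected cells, and a value of -1 to roughly half the mock level.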

7. Visualization and interpretation

To explore and interpret host responses:

  • PCA was applied to variance-stabilized counts to assess global transcriptional differences,
  • Volcano plots were used to identify significantly induced and repressed genes,
  • A heatmap of interferon-stimulated genes (ISGs) was generated to highlight coordinated antiviral responses.
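
The induced/repressed calls behind a volcano plot reduce to two thresholds. The sketch below labels each gene from a tab-separated DESeq2 results table; the `classify_genes` function, the file layout (gene ID in the first column, DESeq2's standard `log2FoldChange` and `padj` column names), and the cutoffs (adjusted p-value below 0.05, absolute log2 fold change of at least 1) are assumptions, not values taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: label genes as induced, repressed, or ns (not significant).

# classify_genes RESULTS_TSV [PADJ_CUTOFF] [LFC_CUTOFF]
classify_genes() {
  awk -F'\t' -v padj_cut="${2:-0.05}" -v lfc_cut="${3:-1}" '
    NR == 1 {
      # Map column names to positions so column order does not matter.
      for (i = 1; i <= NF; i++) col[$i] = i
      print "gene\tlog2FoldChange\tpadj\tstatus"
      next
    }
    {
      lfc = $(col["log2FoldChange"]); padj = $(col["padj"])
      status = "ns"
      # An NA adjusted p-value is never called significant.
      if (padj != "NA" && padj + 0 < padj_cut) {
        if (lfc + 0 >= lfc_cut) status = "induced"
        else if (lfc + 0 <= -lfc_cut) status = "repressed"
      }
      print $1 "\t" lfc "\t" padj "\t" status
    }
  ' "$1"
}

# Example (the file name is a placeholder):
# classify_genes results/host_gene/deseq2_results.tsv 0.05 1
```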

Key results

Global transcriptional response

Principal component analysis reveals a clear separation between mock and virus samples, indicating a strong virus-driven transcriptional effect.

Differentially expressed genes

The analysis identifies robust induction of classical interferon-stimulated genes (ISGs), including:

  • OAS1, OAS3
  • STAT1
  • IRF7
  • IFIT family genes

These genes are hallmarks of early innate immune activation and antiviral defense.

Genes with higher expression in mock samples likely represent baseline cellular processes that are transcriptionally reprogrammed upon infection.


Reproducibility

All analytical steps are implemented as modular scripts, allowing the full workflow to be rerun from raw data if needed.
Large intermediate files (raw reads, indices, quantification outputs) are intentionally excluded from version control, while all scripts and final results required for interpretation are provided.


Software and tools

  • R (≥ 4.2) with packages: DESeq2, tximport, ggplot2, ggrepel, pheatmap, dplyr, readr
  • Salmon for transcript quantification
  • FastQC / MultiQC for quality control
  • SRA Toolkit for data retrieval

Author

Yasmina Soumahoro
Biologist | Bioinformatics | Host–Pathogen Transcriptomics
