Tutorial
Here we provide an example run using data from SRA and OICR.
FASTQ files from various international institutes are publicly available in the Sequence Read Archive (SRA). Three randomly selected samples sequenced on the Illumina platform were downloaded from SRA to demonstrate the output generated by ncov-tools.
The following samples were downloaded from SRA:
VA-DCLS-1905
SRX9446939: Amplicon-based sequencing of SARS-CoV-2: VA-DCLS-1905
1 ILLUMINA (Illumina MiSeq) run: 190,731 spots, 18.6M bases, 7.8Mb downloads
ARTIC PCR-tiling of viral cDNA (V3), sequenced by Illumina MiSeq with DNA Flex library prep-kit. Only reads aligned to SARS-CoV-2 reference (NC_045512.2) retained.
VA-DCLS-1863
SRX9446956: Amplicon-based sequencing of SARS-CoV-2: VA-DCLS-1863
1 ILLUMINA (Illumina MiSeq) run: 244,177 spots, 23.9M bases, 10.1Mb downloads
ARTIC PCR-tiling of viral cDNA (V3), sequenced by Illumina MiSeq with DNA Flex library prep-kit. Only reads aligned to SARS-CoV-2 reference (NC_045512.2) retained.
VA-DCLS-1856
SRX9446952: Amplicon-based sequencing of SARS-CoV-2: VA-DCLS-1856
1 ILLUMINA (Illumina MiSeq) run: 229,677 spots, 22.4M bases, 9.5Mb downloads
ARTIC PCR-tiling of viral cDNA (V3), sequenced by Illumina MiSeq with DNA Flex library prep-kit. Only reads aligned to SARS-CoV-2 reference (NC_045512.2) retained.
Note that OICR has provided negative control FASTQ files.
The Connor Lab has built a Nextflow pipeline, with a focus on COVID-19, that runs alignment and variant-calling tools and generates output for downstream analysis. Review the documentation for Nextflow and the ncov2019-artic-nf pipeline for instructions on installing and running the pipeline.
Nextflow v20.10.0 build 5430
https://github.com/connor-lab/ncov2019-artic-nf
- Create the following directory structure:
run_name
├── data
└── qc
    └── data
- Transfer the FASTQ files from the SRA samples into the run_name/data directory.
- Clone the negative controls repository and copy its FASTQ files into the run_name/data directory.
- Clone the Connor Lab Nextflow pipeline repository and the ncov primer schemes into run_name:
cd run_name
git clone git@github.com:connor-lab/ncov2019-artic-nf.git
git clone git@github.com:artic-network/artic-ncov2019.git
- Run the Nextflow pipeline inside the run_name directory:
nextflow run ncov2019-artic-nf/main.nf --schemeVersion V3 --directory data --illumina --prefix run_name
- Link all .bam, .consensus.fa, and .variants.tsv files into the qc/data directory:
cd qc/data
ln -s ../../results/ncovIllumina_sequenceAnalysis_trimPrimerSequences/<sample>.mapped.bam
ln -s ../../results/ncovIllumina_sequenceAnalysis_trimPrimerSequences/<sample>.mapped.primertrimmed.sorted.bam
ln -s ../../results/ncovIllumina_sequenceAnalysis_readMapping/<sample>.sorted.bam
ln -s ../../results/ncovIllumina_sequenceAnalysis_makeConsensus/<sample>.primertrimmed.consensus.fa
ln -s ../../results/ncovIllumina_sequenceAnalysis_callVariants/<sample>.variants.tsv
- If you haven't installed the ncov-tools package, follow the installation documentation.
- Run the ncov-tools pipeline:
snakemake -s /path/to/Snakefile --cores <number of cores> all_qc_sequencing
snakemake -s /path/to/Snakefile --cores <number of cores> all_qc_analysis
snakemake -s /path/to/Snakefile --cores <number of cores> all_qc_reports
- Review the plots (in plots/) and the generated reports (in qc_reports/).
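The linking step above can be scripted for several samples at once. The following is a minimal sketch, assuming the run_name layout and the Nextflow output directories listed above; `link_sample_outputs` is a hypothetical helper (not part of ncov-tools or ncov2019-artic-nf), and the sample names passed to it are assumed to match the prefixes of the Nextflow output files.

```shell
# Hypothetical helper (not part of ncov-tools): symlink the per-sample
# Nextflow outputs into <run_dir>/qc/data, mirroring the manual ln -s steps.
# Usage: link_sample_outputs <run_dir> <sample> [<sample> ...]
link_sample_outputs() {
  run_dir="$1"; shift
  # Relative to <run_dir>/qc/data, ../../results is <run_dir>/results,
  # the directory Nextflow writes to when run inside <run_dir>.
  results="../../results"
  mkdir -p "${run_dir}/qc/data"
  (
    cd "${run_dir}/qc/data" || exit 1
    for sample in "$@"; do
      # -f replaces links left over from an earlier run.
      ln -sf "${results}/ncovIllumina_sequenceAnalysis_trimPrimerSequences/${sample}.mapped.bam" .
      ln -sf "${results}/ncovIllumina_sequenceAnalysis_trimPrimerSequences/${sample}.mapped.primertrimmed.sorted.bam" .
      ln -sf "${results}/ncovIllumina_sequenceAnalysis_readMapping/${sample}.sorted.bam" .
      ln -sf "${results}/ncovIllumina_sequenceAnalysis_makeConsensus/${sample}.primertrimmed.consensus.fa" .
      ln -sf "${results}/ncovIllumina_sequenceAnalysis_callVariants/${sample}.variants.tsv" .
    done
  )
}
```

For example, `link_sample_outputs run_name <sample1> <sample2>` run from the directory containing run_name. Using `ln -sf` instead of `ln -s` is a design choice so the helper can be re-run safely; the links are relative, so they stay valid if run_name is moved.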
The _summary_qc.tsv table summarizes metadata for each sample and gives a final classification in the qc_pass column, which indicates whether the sample passes or fails. In this instance, the negative control fails due to a lack of viral template (classified as INCOMPLETE_GENOME), while all other samples pass all criteria.
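To pull out only the failing samples from the summary table, a short awk filter can be used. This is a sketch under stated assumptions: the file is tab-separated with a header row containing columns named `sample` and `qc_pass`, and passing samples are marked `TRUE` in qc_pass. `failed_samples` is a hypothetical helper name.

```shell
# Hypothetical helper: print "sample<TAB>qc_pass" for every row whose
# qc_pass value is not TRUE. Assumes a tab-separated file with a header row
# that contains columns named "sample" and "qc_pass".
failed_samples() {
  awk -F '\t' '
    NR == 1 {
      # Locate the columns by header name rather than by position.
      for (i = 1; i <= NF; i++) {
        if ($i == "sample")  s = i
        if ($i == "qc_pass") q = i
      }
      if (!s || !q) { print "missing sample/qc_pass column" > "/dev/stderr"; exit 1 }
      next
    }
    $q != "TRUE" { print $s "\t" $q }
  ' "$1"
}
```

For example, `failed_samples qc_reports/run_name_summary_qc.tsv`; the report file name prefix depends on the run name configured for ncov-tools and is assumed here.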
The _negative_control_report.tsv shows that the Neg1 sample passed as a negative control, i.e. it shows no evidence of contamination: 0 amplicons were detected in the control, and only 6 bases of the genome were covered by the alignment.
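To inspect specific fields of a report such as the negative control report, a column selector that looks columns up by header name can help. This is a minimal sketch; `select_columns` is a hypothetical helper, and the column names in the usage line (`file`, `qc`, `genome_covered_bases`, `amplicons_detected`) are assumptions about the report header that should be checked against the actual file.

```shell
# Hypothetical helper: print the named columns (header included) from a
# tab-separated file. Exits with an error if a requested column is missing.
# Usage: select_columns <file> <column> [<column> ...]
select_columns() {
  file="$1"; shift
  awk -F '\t' -v OFS='\t' -v want="$*" '
    NR == 1 {
      n = split(want, names, " ")
      for (i = 1; i <= NF; i++) idx[$i] = i
      for (j = 1; j <= n; j++) {
        if (!(names[j] in idx)) { print "no column: " names[j] > "/dev/stderr"; exit 1 }
        col[j] = idx[names[j]]
      }
    }
    {
      # Rebuild each line from the selected column positions, in request order.
      line = $(col[1])
      for (j = 2; j <= n; j++) line = line OFS $(col[j])
      print line
    }
  ' "$file"
}
```

For example (column names assumed): `select_columns qc_reports/run_name_negative_control_report.tsv file qc amplicons_detected genome_covered_bases`.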
The _mixture_report.tsv shows no evidence of contamination (mixtures) between samples.
The _ambiguous_position_report.tsv does not identify any ambiguous positions shared by two or more samples.
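A quick way to check these two reports from the command line is to count the data rows beneath the header. This sketch assumes each flagged sample pair or shared ambiguous position is written as one row below a single header line, so a count of 0 would correspond to the clean results described above; `count_report_rows` is a hypothetical helper.

```shell
# Hypothetical helper: for each report, print "<file><TAB><number of
# non-empty lines after the header>". Assumes a single header line.
count_report_rows() {
  for f in "$@"; do
    # grep -c . counts non-empty lines; it prints 0 when there are none.
    printf '%s\t%d\n' "$f" "$(tail -n +2 "$f" | grep -c .)"
  done
}
```

For example: `count_report_rows qc_reports/run_name_mixture_report.tsv qc_reports/run_name_ambiguous_position_report.tsv` (file name prefixes assumed).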
The _amplicon_coverage_heatmap.pdf plot shows that Neg1 has zero coverage across all amplicons, while the remaining 3 samples have substantial coverage across all amplicons. Note that amplicon 64 is a commonly reported low-coverage amplicon.
The _amplicon_covered_fraction.pdf plot shows that Neg1 has a low covered fraction for every amplicon. All other samples show 100% coverage across all amplicons.
The _depth_by_position.pdf plot shows that Neg1 has coverage only at scattered positions across the genome, with most positions at zero depth. All other samples show consistent coverage across all genomic positions.
The _tree_snps.pdf plot includes all 3 positive samples and the reference genome, labelled MN908947.3. Neg1 is absent from the plot because the tree only includes samples with at least 75% genome completeness. From this plot we can identify mutations shared between samples and their phylogenetic relationships.
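To see which samples would clear the completeness cutoff used for the tree, the summary table can be filtered directly. This is a sketch under stated assumptions: the summary file has a header with columns named `sample` and `genome_completeness`, and completeness is stored as a fraction between 0 and 1. `samples_above_completeness` is a hypothetical helper, and its default of 0.75 simply mirrors the 75% figure above.

```shell
# Hypothetical helper: print samples whose genome_completeness is at least
# the given threshold (default 0.75). Assumes a tab-separated file with a
# header row and completeness expressed as a fraction (0-1).
samples_above_completeness() {
  file="$1"
  min="${2:-0.75}"
  awk -F '\t' -v min="$min" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "sample") s = i
        if ($i == "genome_completeness") c = i
      }
      if (!s || !c) { print "missing sample/genome_completeness column" > "/dev/stderr"; exit 1 }
      next
    }
    # Force numeric comparison on both sides.
    $c + 0 >= min + 0 { print $s }
  ' "$file"
}
```

For example: `samples_above_completeness qc_reports/run_name_summary_qc.tsv` (file name prefix assumed).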