.=============================================.
_
._ _ ___ ._ _ ___ ___ ___ ._ _ ___ | |
| ' |<_> || ' |/ . \| . \<_> || ' |/ ._>| |
|_|_|<___||_|_|\___/| _/<___||_|_|\___.|_| 2
|_|
.=============================================.
Nanopanel2 (np2) is a somatic variant caller for Nanopore panel sequencing data. Np2 works directly on basecalled FAST5 files and outputs VCF and TSV files containing variant calls and associated statistics. It also produces haplotype map TSV and PDF files that inform about haplotype distributions of called (PASS) variants.
A preprint describing and evaluating the Nanopanel2 software is available on bioRxiv.
The recommended way to use np2 is via the released Singularity v3.4.1 images, which contain all required 3rd party tools in the supported software versions. Nanopanel2 SIF files can be found in the releases section.
Users that prefer to run the python code directly should make sure that the following 3rd party tools are available:
The following 3rd party executables are called by the np2 python pipeline and are packaged in the singularity container:
Note that the call path of the respective tools can be configured in np2's JSON config file (section 'exe') which makes it possible to install these tools locally.
To run np2, you need the following input data files:
- Guppy-basecalled FAST5 files [required]
- FASTA file containing all considered amplicon sequences [required]
Additionally, you can provide a truth-set VCF file per sample if you want to measure the performance of np2 on your data.
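To make concrete what comparing calls against a truth set involves, here is a minimal sketch in Python. It is not np2's own evaluation code: the helper names `variant_keys` and `precision_recall` are hypothetical, variants are matched only on exact (CHROM, POS, REF, ALT) identity, and the FILTER column is ignored for simplicity.

```python
def variant_keys(vcf_text):
    """Extract (CHROM, POS, REF, ALT) keys from VCF text, skipping header lines."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip empty lines and VCF meta/header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # Multi-allelic records list several ALT alleles separated by commas
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def precision_recall(called, truth):
    """Return (precision, recall) for two sets of variant keys."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

In practice you would read the np2 output VCF and your truth-set VCF from disk and pass their contents to `variant_keys` before calling `precision_recall`.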
Np2 supports multiplexed input data and can be configured to automatically run porechop for demultiplexing before any further processing is done. Note that you do not need to configure all of your multiplexed samples in case you want to process only a subset of them. Np2 will ignore the other samples in this case. Please note that porechop is not actively supported anymore and that we will also likely switch to another demultiplexing tool in future nanopanel versions.
Alternatively, you can do the demultiplexing yourself (make sure that the tool you select results in actual FAST5 and not only FASTQ files) before configuring np2 for each sample individually.
Np2 input FAST5 files must contain guppy basecalling information. Np2 was developed and tested with guppy v3.6.1. An example command line to call guppy is:
guppy_basecaller \
-i fast5_file \
-s output_dir \
-c dna_r9.4.1_450bps_hac.cfg \
--fast5_out \
--trace_categories_logs Move \
--num_callers 14 \
--gpu_runners_per_device 8 \
--chunks_per_runner 768 \
--chunk_size 500 \
--disable_pings \
--compress_fastq \
-x auto
Np2 is fully configured via a single JSON configuration file; a commented example can be found in the docs folder. Please note that np2 uses the commentjson package to parse input JSON files, so you can use Python/JavaScript style inline comments.
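As an illustration of what such a commented configuration might look like, the sketch below parses a small config string in Python. Np2 itself relies on commentjson; the function `strip_inline_comments` here is only a hypothetical standard-library stand-in that removes `#` and `//` comments outside of string literals. The key layout and the tool path are assumptions for illustration (only the names 'threads', 'basecall_grp' and 'exe' come from this README); the commented example in the docs folder remains the authoritative reference.

```python
import json


def strip_inline_comments(text):
    """Remove '#' and '//' comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            # Inside a string literal: copy everything, track escapes and the closing quote
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif ch == "#" or text.startswith("//", i):
            # Comment start: skip to the end of the line, keeping the newline itself
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical config; real key names and nesting may differ.
EXAMPLE_CONFIG = """
{
    # number of worker threads (this README recommends at least 8)
    "threads": 8,
    // basecall group to read basecalling probabilities from
    "basecall_grp": "Basecall_1D_000",
    "exe": {
        "minimap2": "/opt/tools/minimap2"  // hypothetical local install path
    }
}
"""

config = json.loads(strip_inline_comments(EXAMPLE_CONFIG))
```

If commentjson is installed, `commentjson.loads(EXAMPLE_CONFIG)` should serve the same purpose without the hand-written stripper.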
NOTE: the 'basecall_grp' configuration parameter tells np2 from which basecall group to get the basecalling probabilities. By default, guppy will add a new basecall group with every run (i.e., the first is called 'Basecall_1D_000', the second 'Basecall_1D_001', etc.). If you are unsure about the structure in your FAST5 files, you can either inspect them with common HDF5 tools or run 'nanopanel2 show_fast5_struct -i <my_fast5>'.
Np2 can align nanopore reads with 3 aligners: minimap2, ngmlr and last. Note, however, that we currently recommend using only minimap2, as it showed the best performance in our evaluation.
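If you prefer to check the available basecall groups from Python, a minimal sketch could look as follows. It assumes the third-party h5py package is installed and that basecall groups appear as path components named 'Basecall_1D_NNN'; the helper names `basecall_groups` and `list_hdf5_paths` are hypothetical and not part of np2.

```python
import re

# Matches group names such as 'Basecall_1D_000' or 'Basecall_1D_001'
BASECALL_GROUP = re.compile(r"Basecall_1D_\d{3}")


def basecall_groups(hdf5_paths):
    """Return the sorted, de-duplicated basecall group names found in HDF5 object paths."""
    found = set()
    for path in hdf5_paths:
        for part in path.split("/"):
            if BASECALL_GROUP.fullmatch(part):
                found.add(part)
    return sorted(found)


def list_hdf5_paths(fast5_file):
    """Collect all object paths in a FAST5 (HDF5) file; requires h5py."""
    import h5py  # imported lazily so basecall_groups() works without h5py

    paths = []
    with h5py.File(fast5_file, "r") as handle:
        # visit() calls the function once per object name in the file
        handle.visit(paths.append)
    return paths
```

A call such as `basecall_groups(list_hdf5_paths("my.fast5"))` would then list the candidate values for 'basecall_grp'.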
singularity run nanopanel2_XXX.sif call --conf config.json --out .
Runtime and memory requirements strongly depend on the size of the input data and the number of configured threads. We recommend running np2 with at least 64 GB RAM; larger flowcells may require 128 GB. The number of CPU cores/threads used (we recommend at least 8) is configurable via the 'threads' parameter in the JSON config file.
Np2 now runs the whole processing pipeline (see block diagram above) and produces result files along the way. If np2 fails at some stage, you can typically restart it, and it will resume the pipeline from the stage that failed.
Nanopanel2 is free for academic use.
If you want to use Nanopanel2 for commercial applications but don't want to adhere to the GNU Affero General Public License v3.0, you can purchase a commercial license. Please contact the author in this case.
Copyright (c) 2020-2021 Niko Popitsch.
Detailed license information can be found in the LICENSE file.
This distribution may include materials developed by third parties. For license and attribution notices for these materials, please refer to the LICENSE file.
If you make use of nanopanel2, please cite our paper:
Niko Popitsch, Sandra Preuner, Thomas Lion, Nanopanel2 calls phased low-frequency variants in Nanopore panel sequencing data, Bioinformatics, 2021, doi: https://doi.org/10.1093/bioinformatics/btab526
