GitHub - popitsch/nanopanel2: Nanopanel2: a somatic variant caller for Nanopore panel sequencing data

.=============================================.
                                         _
._ _  ___ ._ _  ___  ___  ___ ._ _  ___ | |
| ' |<_> || ' |/ . \| . \<_> || ' |/ ._>| |
|_|_|<___||_|_|\___/|  _/<___||_|_|\___.|_| 2
                    |_|
.=============================================.

Introduction

Nanopanel2 (np2) is a somatic variant caller for Nanopore panel sequencing data. Np2 works directly on basecalled FAST5 files and outputs VCF and TSV files containing variant calls and associated statistics. It also produces haplotype map TSV and PDF files that inform about haplotype distributions of called (PASS) variants.

A preprint describing and evaluating the Nanopanel2 software is available on bioRxiv (biorxiv.org).

Installation

The recommended way to use np2 is via the released Singularity v3.4.1 images, which contain all required 3rd party tools in the supported software versions. Nanopanel2 SIF files can be found in the releases section.

Users who prefer to run the Python code directly should make sure that the following 3rd party tools are available:

3rd Party tools

The following 3rd party executables are called by the np2 Python pipeline and are packaged in the Singularity container:

samtools v1.9
porechop v0.2.4
minimap2 v2.17
ngmlr v0.2.7
last v1042
bgzip v1.2.1++

Note that the call path of each tool can be configured in np2's JSON config file (section 'exe'), which makes it possible to install these tools locally.
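
If you install the tools locally, it can help to confirm that every configured call path resolves before starting a run. The following is a minimal sketch, not part of np2: the key names and paths inside the 'exe' dictionary are illustrative assumptions (check the commented example config in the docs folder for the real ones), and `find_missing_tools` is a hypothetical helper.

```python
import shutil

# Hypothetical 'exe' section: key names and paths are illustrative
# assumptions, not copied from np2's example configuration.
exe = {
    "samtools": "samtools",
    "porechop": "porechop",
    "minimap2": "minimap2",
    "ngmlr": "ngmlr",
    "lastal": "lastal",
    "bgzip": "bgzip",
}


def find_missing_tools(exe_section):
    """Return the names of tools whose configured path cannot be resolved."""
    missing = []
    for name, path in exe_section.items():
        # shutil.which accepts bare command names (searched on PATH)
        # as well as explicit paths to an executable file.
        if shutil.which(path) is None:
            missing.append(name)
    return sorted(missing)


missing = find_missing_tools(exe)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All configured tools were found.")
```

This only checks that the executables can be found; it does not verify that the installed versions match the ones listed above.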

Input data

To run np2, you need the following input data files:

Guppy-basecalled FAST5 files [required]
FASTA file containing all considered amplicon sequences [required]

Additionally, you can provide a truth-set VCF file per sample if you want to measure the performance of np2 on your data.
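
Before configuring np2, you may want to check which amplicon sequences your FASTA file actually contains. The sketch below is an independent helper, not np2 code; the record names and sequences are made up for illustration.

```python
def read_fasta_lengths(lines):
    """Map each FASTA record name to its sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("FASTA header without a record name")
            # Use the first whitespace-delimited token as the record name.
            name = parts[0]
            if name in lengths:
                raise ValueError(f"duplicate record name: {name}")
            lengths[name] = 0
        elif name is None:
            raise ValueError("sequence data before first header")
        else:
            lengths[name] += len(line)
    return lengths


# Hypothetical amplicon names and sequences, for illustration only.
example = """>amplicon_A some description
ACGTACGT
ACGT
>amplicon_B
GGCC
"""
print(read_fasta_lengths(example.splitlines()))
```

To check a real file, pass an open file handle instead of the example lines, e.g. `read_fasta_lengths(open("amplicons.fa"))` (the file name is a placeholder).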

Multiplexed data

Np2 supports multiplexed input data and can be configured to automatically run porechop for demultiplexing before any further processing is done. You do not need to configure all of your multiplexed samples if you want to process only a subset of them; np2 will ignore the other samples in this case. Please note that porechop is no longer actively supported and that we will likely switch to another demultiplexing tool in future nanopanel versions.

Alternatively, you can do the demultiplexing yourself (make sure that the tool you select results in actual FAST5 and not only FASTQ files) before configuring np2 for each sample individually.

Guppy preprocessing

Np2 input FAST5 files must contain guppy basecalling information. Np2 was developed and tested with guppy v3.6.1. An example command line to call guppy is:

guppy_basecaller \
     -i fast5_file \
     -s output_dir \
     -c dna_r9.4.1_450bps_hac.cfg \
     --fast5_out \
     --trace_categories_logs Move \
     --num_callers 14 \
     --gpu_runners_per_device 8 \
     --chunks_per_runner 768 \
     --chunk_size 500 \
     --disable_pings \
     --compress_fastq \
     -x auto

Configuration file

Np2 is fully configured via a single JSON configuration file; a commented example can be found in the docs folder. Please note that np2 uses the commentjson package to parse input JSON files, so you can use Python/JavaScript style inline comments.

NOTE: the 'basecall_grp' configuration parameter tells np2 from which basecall group to get the basecalling probabilities. By default, guppy will add a new basecall group with every run (i.e., the first is called 'Basecall_1D_000', the second 'Basecall_1D_001', etc.). If you are unsure about the structure in your FAST5 files, you can either inspect them with common HDF5 tools or run 'nanopanel2 show_fast5_struct -i <my_fast5>'.
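
Once you have a listing of the object paths in a FAST5 file (for example from an HDF5 inspection tool), picking out the available basecall groups is a simple pattern match. The following is a rough sketch under that assumption; `list_basecall_groups` is a hypothetical helper and the example paths are illustrative, so your files may be organised differently.

```python
import re

# Matches group names following the 'Basecall_1D_NNN' naming described above.
BASECALL_GROUP = re.compile(r"Basecall_1D_\d{3}")


def list_basecall_groups(hdf5_paths):
    """Return the distinct basecall group names found in HDF5 object paths."""
    found = set()
    for path in hdf5_paths:
        found.update(BASECALL_GROUP.findall(path))
    return sorted(found)


# Illustrative paths; real FAST5 files may be organised differently.
paths = [
    "/read_0001/Analyses/Basecall_1D_000/BaseCalled_template/Fastq",
    "/read_0001/Analyses/Basecall_1D_001/BaseCalled_template/Fastq",
    "/read_0001/Analyses/Segmentation_000/Summary",
]
print(list_basecall_groups(paths))
```

If more than one group is listed, the file has been basecalled more than once, and 'basecall_grp' should name the group produced by the guppy run you want np2 to use.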

Read mappers

Np2 can align nanopore reads with 3 aligners: minimap2, ngmlr and last. Note, however, that we currently recommend using only minimap2, as it showed the best performance in our evaluation.

General usage

singularity run nanopanel2_XXX.sif call --conf config.json --out .

Runtime and memory requirements strongly depend on the size of the input data and the number of configured threads. We recommend running np2 with at least 64 GB RAM; larger flowcells may require 128 GB. The number of used CPU cores/threads (we recommend at least 8) is configurable via the 'threads' parameter in the JSON config file.

Np2 now runs the whole processing pipeline (see block diagram above) and produces result files along the way. If np2 fails at some stage, you can typically restart it, and it will continue the pipeline from the stage that failed.
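
If you launch np2 from a script, for example once per sample, the command line shown above can be assembled programmatically. This is a minimal sketch: `build_np2_call` is a hypothetical helper, and the image and config file names are placeholders.

```python
import shlex


def build_np2_call(sif_image, config_json, out_dir="."):
    """Build the argument list for the `call` command shown above."""
    return ["singularity", "run", sif_image, "call",
            "--conf", config_json, "--out", out_dir]


# Placeholder file names; substitute your own SIF image and config file.
cmd = build_np2_call("nanopanel2_XXX.sif", "config.json")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises an exception if the run exits with a non-zero status.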

License

Nanopanel2 is free for academic use.

If you want to use Nanopanel2 for commercial applications but don't want to adhere to the GNU Affero General Public License v3.0, you can purchase a commercial license. Please contact the author in this case.

Detailed license information can be found in the LICENSE file.

This distribution may include materials developed by third parties. For license and attribution notices for these materials, please refer to the LICENSE file.

Citation

If you make use of nanopanel2, please cite our paper:

Niko Popitsch, Sandra Preuner, Thomas Lion, Nanopanel2 calls phased low-frequency variants in Nanopore panel sequencing data, Bioinformatics, 2021, doi: https://doi.org/10.1093/bioinformatics/btab526

