.github/workflows/main.yml (2 changes: 1 addition & 1 deletion)
@@ -60,7 +60,7 @@ jobs:
 - name: test short single
 run: nextflow run -ansi-log -profile docker,test_illumina_single aline.nf
 - name: test short single cram
-run: nextflow run -ansi-log -profile docker,test_illumina_single aline.nf --cram
+run: nextflow run -ansi-log -profile docker,test_illumina_single aline.nf --cram --filter_unmapped
 - name: test short paired
 run: nextflow run -ansi-log -profile docker,test_illumina_paired aline.nf
 - name: test ont
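The updated CI step exercises `--cram` together with the new `--filter_unmapped` option. For context on what "unmapped" means at the record level: a SAM/BAM/CRAM record is unmapped when bit 0x4 of its FLAG field (column 2) is set, and `samtools view -F 4` excludes such records. The sketch below applies that same rule to plain SAM text with POSIX awk, purely as an illustration; the function name `drop_unmapped_sam` is hypothetical, and this is not how AliNe implements the option (the README only states that filtering is performed during sorting when possible).

```shell
# Hypothetical helper for illustration only (not part of AliNe).
# Reads SAM text from the given files (or stdin), keeps every header line,
# and drops alignment records whose FLAG (column 2) has bit 0x4 (unmapped) set.
# Same selection as: samtools view -h -F 4 input.sam
drop_unmapped_sam() {
    awk -F '\t' '
        /^@/ { print; next }            # header lines (@HD, @SQ, @PG, ...) pass through
        int($2 / 4) % 2 == 0 { print }  # bit 0x4 clear: the read is mapped, keep it
    ' "$@"
}
```

For example, `drop_unmapped_sam aln.sam > aln.mapped.sam`. For BAM or CRAM files, `samtools view` is the appropriate tool since it decodes the binary formats directly.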
README.md (58 changes: 9 additions & 49 deletions)
@@ -91,6 +91,7 @@ It is then translated to the correct option in the following aligners:
 | bwamem | 🚫 | 🚫 | 🚫 |
 | bwamem2 | 🚫 | 🚫 | 🚫 |
 | bwasw | 🚫 | 🚫 | 🚫 |
+| dragmap | 🚫 | 🚫 | 🚫 |
 | graphmap2 | 🚫 | 🚫 | 🚫 |
 | hisat2 | --rna-strandness [ F / R / FR / RF ] | SF / SR / ISF OSF MSF / ISR OSR MSR | strand information |
 | hisat2 | --fr / --rf / --ff | I / O / M | read orientation |
@@ -128,6 +129,7 @@ If you provide an annotation file the pipeline will pass automatically the file
 | bwamem | 🚫 |
 | bwamem2 | 🚫 |
 | bwasw | 🚫 |
+| dragmap | 🚫 |
 | graphmap2 | GTF (--gtf) |
 | hisat2 | 🚫 |
 | kallisto | 🚫 |
@@ -335,9 +337,8 @@ On success you should get a message looking like this:
 control1,path/to/data1.fastq.gz,,auto,short_single,rna
 control2,path/to/data2_R1.fastq.gz,path/to/data2_R2.fastq.gz,auto,short_paired,rna
 --reference path to the reference file (fa, fa.gz, fasta or fasta.gz)
---aligner aligner(s) to use among this list (comma or space separated) [bbmap, bowtie, bowtie2, bwaaln, bwamem, bwamem2, bwasw, graphmap2, hisat2, kallisto, minimap2, novoalign, nucmer, ngmlr, star, subread, sublong]
+--aligner aligner(s) to use among this list (comma or space separated) [bbmap, bowtie, bowtie2, bwaaln, bwamem, bwamem2, bwasw, dragmap, graphmap2, hisat2, kallisto, minimap2, novoalign, nucmer, ngmlr, salmon, star, subread, sublong]
 --outdir path to the output directory (default: alignment_results)
---cram output alignment files in sorted CRAM format instead of sorted BAM (default: false). This saves disk space but disables FastQC on alignment files.
 --annotation [Optional][used by graphmap2, STAR, subread] Absolute path to the annotation file (gtf or gff3)

 Type of input reads
@@ -354,6 +355,8 @@ On success you should get a message looking like this:
 --trimming_fastp run fastp for trimming (default: false)
 --fastqc run fastqc on raw and aligned reads (default: false). Note: FastQC will be automatically disabled for alignment files when --cram is enabled.
 --samtools_stats run samtools stats on aligned reads (default: false)
+--filter_unmapped filter out unmapped reads from final alignment files (default: false). Filtering is performed during sorting when possible for optimal performance.
+--cram output alignment files in sorted CRAM format instead of sorted BAM (default: false). This saves disk space but disables FastQC on alignment files. Conversion is performed during sorting when possible for optimal performance.
 --multiqc_config path to the multiqc config file (default: config/multiqc_conf.yml)

 Aligner specific options
@@ -364,6 +367,7 @@ On success you should get a message looking like this:
 --bwamem_options additional options for bwamem
 --bwamem2_options additional options for bwamem2
 --bwasw_options additional options for bwasw
+--dragmap_options additional options for dragmap
 --graphmap2_options additional options for graphmap2
 --hisat2_options additional options for hisat2
 --kallisto_options additional options for kallisto
@@ -452,7 +456,8 @@ Here the description of typical ouput you will get from AliNe:
 └── MultiQC # MultiQC folder that aggregate results across many samples into a single report
 ├── multiqc_report.html # Report with interactive plots for statistics across many samples.
-└── multiqc_report_data # Plot and data used by the multiqc_report.html
+├── multiqc_report_data # Plot and data used by the multiqc_report.html
+└── alignment_comparison.tsv # A tsv table summarizing the statistics of the different aligners across all samples.
 ```

 ### Statistics
@@ -497,52 +502,7 @@ Some information produced via FastQC or Samtools stats are reported at the top o

 <img src="img/multiqc.png" />

-In order to facilitate the reading of this `General Statistics` you can export the table in tsv using the `Export as CSV...` button and execute the following piece of R code on the downloaded `general_stats_table.tsv` file :
-
-```R
-# install packages
-install.packages("dplyr")
-install.packages("stringr")
-install.packages("tidyr")
-install.packages("knitr")
-
-# Load necessary libraries
-library(dplyr)
-library(stringr)
-library(tidyr)
-library(knitr)
-
-# Read the TSV file
-file_path <- "general_stats_table.tsv"
-df <- read.delim(file_path, check.names = FALSE)
-
-# clean sample name to remove suffix _*_samtoolsstats
-df$Sample <- df$Sample |> stringr::str_remove_all("_\\d+_samtoolsstats")
-
-# sample name as row name
-rownames(df) <- df$Sample
-
-# remove Sample column and clean up the column names
-tableout <- cbind(ID = rownames(df), stack(df[-1])) |>
-transform(ind = as.character(ind) |> stringr::str_remove_all("\\.\\d+"))
-
-# remove na values
-tableout <- tableout[!is.na(tableout$values),]
-# remove . values
-tableout$values <- tableout$values |> stringr::str_remove_all("^\\.$")
-
-# pivot data
-tableout <- tableout |> pivot_wider(id_cols = ID , names_from = ind, values_from = values,
-values_fn = \(x) paste(unique(x), collapse = ""))
-
-# round each value to 4 decimals
-tableout <- tableout |> mutate(across(-ID, ~round(as.numeric(.), 4)))
-
-# print with nice output
-knitr::kable(tableout)
-```
-
-You will get a table similar to this one:
+To make the General Statistics easier to read and compare, AliNe also generates a TSV file named `alignment_comparison.tsv`, located in the `<output_directory>/MultiQC` directory. This file contains the same information as the `General Statistics` table, but in a simpler, tabular format that is more convenient for comparisons. It looks like this:

 ```
 |ID | Dups| GC| Seqs| Error rate| Non-primary| Reads mapped| % Mapped| Total seqs|
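This change replaces the R post-processing recipe with the `alignment_comparison.tsv` file that AliNe writes to `<output_directory>/MultiQC`. As a small sketch of querying that file directly from the shell, the function below reports the ID with the highest `% Mapped` value. It assumes the file is tab-separated with a header row whose column names match the example table in the README (`ID`, `% Mapped`, ...); the function name `best_mapped` is hypothetical, and the real column set may differ between AliNe versions.

```shell
# Hypothetical helper for illustration only (not part of AliNe).
# Prints "<ID><TAB><value>" for the row with the highest "% Mapped" value in a
# tab-separated table with a header row. Assumes columns named "ID" and "% Mapped".
best_mapped() {
    awk -F '\t' '
        NR == 1 {
            # find the columns by header name instead of relying on their position
            for (i = 1; i <= NF; i++) {
                if ($i == "ID") id_col = i
                if ($i == "% Mapped") pct_col = i
            }
            if (!id_col || !pct_col) {
                print "best_mapped: missing ID or % Mapped column" > "/dev/stderr"
                exit 1
            }
            next
        }
        # skip empty values; the first row wins among equal maxima
        $pct_col != "" && (best_id == "" || $pct_col + 0 > best + 0) {
            best = $pct_col
            best_id = $id_col
        }
        END { if (best_id != "") print best_id "\t" best }
    ' "$@"
}
```

For example, with the default `--outdir`: `best_mapped alignment_results/MultiQC/alignment_comparison.tsv`. Looking columns up by header name keeps the sketch working if columns are added or reordered.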