
Removing optical duplicates with clumpify doesn't work #1

@AJL07


Hi,

I wanted to use the clumpify.sh module to estimate and remove optical duplicates from my FASTQs, but when I ran the command, the number of reads in the output was the same as in the input (even though it does find optical duplicates).
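A quick way to confirm the symptom on the files themselves, and to see whether the duplicates were tagged rather than dropped (the logged command includes `markduplicates=true`), is to count the records and the tagged headers in each output file. This is a minimal Python sketch, not part of BBTools. It assumes, without checking the BBTools source, that marked reads have the word `duplicate` appended to their header line, so `DUP_TAG` may need adjusting. It also assumes plain 4-line FASTQ records.

```python
import gzip
import sys
from typing import Iterator, Tuple

# Assumption: reads flagged in clumpify's markduplicates mode end their
# header line with this word. Adjust if the real tag differs.
DUP_TAG = "duplicate"


def fastq_headers(path: str) -> Iterator[str]:
    """Yield the header line of each record in a 4-line-per-record FASTQ.

    Files ending in .gz are read through gzip; anything else as plain text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # first line of every record is the @header
                yield line.rstrip("\n")


def count_reads(path: str) -> Tuple[int, int]:
    """Return (total reads, reads whose header ends with DUP_TAG)."""
    total = marked = 0
    for header in fastq_headers(path):
        total += 1
        if header.endswith(DUP_TAG):
            marked += 1
    return total, marked


if __name__ == "__main__":
    # Usage: python count_fastq.py file1.fastq.gz [file2.fastq.gz ...]
    for fq in sys.argv[1:]:
        total, marked = count_reads(fq)
        print(f"{fq}\treads={total}\tmarked_duplicate={marked}")
```

Run on the input and on both outputs (saved under a hypothetical name such as `count_fastq.py`), identical `reads=` values together with a non-zero `marked_duplicate=` count would mean the duplicates were kept and only relabeled.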

Here's one of the logs:


Version 39.33

Read Estimate:          818713
Memory Estimate:        624 MB
Memory Available:       8358 MB
Set groups to 1
Executing clump.KmerSort1 [in1=25-410_R1_001.fastq.gz, in2=25-410_R2_001.fastq.gz, out1=estimate_optical_duplicates/25-410_R1_001.markdup.fastq.gz, out2=estimate_optical_duplicates/25-410_R2_001.dedup.fastq.gz, groups=1, ecco=false, rename=false, shortname=f, unpair=false, repair=false, namesort=false, ow=true, dedupe=true, markduplicates=true]

Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Starting cris 0.
Fetching reads.
Making fetch threads.
Starting threads.
Waiting for threads.
Fetch time: 	4.344 seconds.
Closing input stream.
Combining thread output.
Combine time: 	0.001 seconds.
Sorting.
Sort time: 	0.743 seconds.
Making clumps.
Clump time: 	0.560 seconds.
Deduping.
Dedupe time: 	1.145 seconds.
Writing.
Waiting for writing to complete.
Write time: 	6.892 seconds.
Done!
Time:                         	13.996 seconds.
Reads Processed:          247k 	17.70k reads/sec
Bases Processed:        37158k 	2.65m bases/sec

Reads In:                        247726
Clumps Formed:           50157
Duplicates Found:        14432	5.826%
Reads Out:                     247726
Bases Out:                     37158900
Total time: 	14.604 seconds.

Thanks
Anaïs
