Feature Request: Add option for circular contig simulation in RandomReadsMG #4

@teojcryan

Description

The current implementation of RandomReadsMG treats every contig as linear, which prevents it from generating reads that span the (often arbitrary) start/end seam of circular elements. It would be very helpful if RandomReadsMG could simulate reads from circular genomes (e.g., plasmids, bacterial/archaeal chromosomes), since this is critical for generating realistic metagenomic datasets. Other simulators, such as readSimulator, already offer this functionality.
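To make the seam behaviour concrete, here is a minimal Python sketch of how read extraction could differ between linear and circular contigs. It is not RandomReadsMG's actual code; the names `extract_read` and `sample_read`, the 0-based coordinates, and the choice to reject reads longer than the contig are all illustrative assumptions.

```python
import random


def extract_read(contig: str, start: int, length: int, circular: bool) -> str:
    """Return `length` bases starting at 0-based `start` (hypothetical helper)."""
    n = len(contig)
    if length > n:
        raise ValueError("read longer than contig")
    if not 0 <= start < n:
        raise ValueError("start out of range")
    end = start + length
    if end <= n:
        # Read lies entirely inside the contig: same for linear and circular.
        return contig[start:end]
    if not circular:
        raise ValueError("read runs off the end of a linear contig")
    # Circular contig: wrap past the last base back to position 0,
    # producing a read that spans the end/start seam.
    return contig[start:] + contig[: end - n]


def sample_read(contig: str, length: int, circular: bool, rng: random.Random) -> str:
    """Draw one read with a uniformly random start (hypothetical helper)."""
    n = len(contig)
    # Circular: all n positions are valid starts; linear: only n - length + 1.
    n_starts = n if circular else n - length + 1
    start = rng.randrange(n_starts)
    return extract_read(contig, start, length, circular)
```

For example, `extract_read("ACGTACGGTT", 8, 4, circular=True)` returns `"TTAC"`, joining the last two bases to the first two. Because every position is a valid start on a circular contig, coverage stays uniform across the seam, whereas linear sampling leaves lower coverage near both ends.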

A key difficulty is how to designate specific contigs in a multi-FASTA input as circular. Requiring a specific FASTA header format would be complex and brittle, so I think it would suffice to apply the circularity property at the input-file level by extending the existing custom depth notation.

The new notation could be (feel free to pick any other notation):

  • Current notation: <file>=X (e.g., ecoli.fa=40) sets a custom depth of 40x for ecoli.fa.
  • Proposed notation: <file>=Xc (e.g., plasmid_library.fa=50c) would set a custom depth of 50x and treat all contigs within plasmid_library.fa as circular.
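One way the proposed suffix could be parsed is sketched below in Python. It assumes the depth is a non-negative integer or decimal, that a trailing lowercase `c` is the only flag, and that the last `=` separates the path from the depth; `parse_depth_spec` and `DepthSpec` are hypothetical names, not part of RandomReadsMG.

```python
import re
from typing import NamedTuple


class DepthSpec(NamedTuple):
    path: str
    depth: float
    circular: bool


# Greedy `.+` backtracks to the last '=' whose remainder is a valid depth,
# so paths that themselves contain '=' are still handled.
_SPEC = re.compile(r"^(?P<path>.+)=(?P<depth>\d+(?:\.\d+)?)(?P<circ>c?)$")


def parse_depth_spec(arg: str) -> DepthSpec:
    """Parse '<file>=X' (linear) or '<file>=Xc' (circular)."""
    m = _SPEC.match(arg)
    if m is None:
        raise ValueError(f"not a <file>=X or <file>=Xc spec: {arg!r}")
    return DepthSpec(m["path"], float(m["depth"]), m["circ"] == "c")
```

With this sketch, `parse_depth_spec("plasmid_library.fa=50c")` yields `DepthSpec(path='plasmid_library.fa', depth=50.0, circular=True)`, while the existing `ecoli.fa=40` form parses unchanged with `circular=False`, so current command lines would keep working.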
