jonahdaus/ensembleassembly

 
 

Snakemake Ensemble Assembly Pipeline

This repository contains a Snakemake pipeline for ensemble genome assembly. It combines de novo and reference-based assembly to improve assembly quality for groups of similar genomes (e.g., bacterial) while removing contaminant sequences (e.g., host DNA).

Project Structure

.
├── README.md              # This file
├── config                 # Configuration files
│   └── config.yaml        # Main configuration
├── resources              # Resources (e.g., reference genomes)
├── results                # Results directory
└── workflow               # Snakemake workflow
    ├── Snakefile          # Main Snakemake file
    ├── env                # Conda environments
    ├── rules              # Snakemake rules (e.g., assembly, filtering)
    └── scripts            # Auxiliary scripts

Installation

  1. Create Conda environment:

    conda env create -f workflow/env/myenv.yaml
    conda activate myenv
  2. Install Snakemake (if not already installed):

    conda install -c conda-forge snakemake

Running the Pipeline

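To run the pipeline from the repository root (recent Snakemake versions require a core count; adjust --cores to your machine):

snakemake --use-conda --cores 4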

Configuration

Edit config/config.yaml to customize parameters such as input FASTQ files, reference genome path, and tool settings for the assembly steps.
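
Snakemake exposes the loaded configuration to the Snakefile as a plain Python dictionary named `config`, so a quick sanity check before running can catch typos early. The sketch below is only an illustration: the keys `reads`, `reference`, and `threads` are hypothetical stand-ins, and the actual keys in config/config.yaml may differ.

```python
# Hypothetical keys for illustration; the real config/config.yaml may use other names.
REQUIRED_KEYS = ("reads", "reference", "threads")
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")

    # Only inspect values for keys that are actually present.
    if "reads" in config:
        reads = config["reads"]
        if not isinstance(reads, list) or not reads:
            problems.append("'reads' should be a non-empty list of FASTQ paths")
        else:
            for path in reads:
                if not str(path).endswith(FASTQ_SUFFIXES):
                    problems.append(f"not a FASTQ file name: {path}")

    if "threads" in config:
        threads = config["threads"]
        if not isinstance(threads, int) or threads < 1:
            problems.append("'threads' should be a positive integer")

    return problems


# Stand-in for the dictionary Snakemake would build from the YAML file.
example = {
    "reads": ["resources/sample_R1.fastq.gz", "resources/sample_R2.fastq.gz"],
    "reference": "resources/reference.fasta",
    "threads": 4,
}
print(validate_config(example))
```

A check like this could be called near the top of the Snakefile so that a malformed configuration fails before any assembly job starts.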

Workflow Overview

  • Initial de novo assembly of input reads
  • Selection of best assembly to serve as reference
  • Reference-based improvement of other assemblies
  • Consensus genome construction
  • Filtering of contaminant sequences
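
The "selection of best assembly" step above can be sketched as follows. This README does not state which criterion the pipeline uses, so the sketch assumes N50 (a contiguity metric reported by QUAST) with contig count as a tie-breaker; the function names and sample data are hypothetical.

```python
def n50(contig_lengths):
    """Length L of the contig at which the running sum of contigs,
    sorted longest first, reaches at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def pick_reference(assemblies):
    """Return the name of the assembly with the highest N50.
    Ties are broken in favor of the assembly with fewer contigs."""
    return max(
        assemblies,
        key=lambda name: (n50(assemblies[name]), -len(assemblies[name])),
    )


# Hypothetical contig lengths (bp) for three de novo assemblies.
assemblies = {
    "sample_a": [500, 400, 300, 200, 100],
    "sample_b": [900, 300, 200, 100],
    "sample_c": [250, 250, 250, 250, 250, 250],
}

best = pick_reference(assemblies)
others = [name for name in assemblies if name != best]
print("reference:", best)
print("to improve against the reference:", others)
```

In this sketch the remaining assemblies in `others` are the ones that would go on to the reference-based improvement step.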

Results

Output files will be saved in the results/ directory and include assembled genomes, intermediate files, quality reports, and the final consensus assembly.

Acknowledgments

This pipeline incorporates tools for de novo and reference-based genome assembly. Tool choices may include (but are not limited to): SPAdes, BWA, Samtools, and QUAST.

