Snakemake Ensemble Assembly Pipeline

This repository contains a Snakemake pipeline for ensemble genome assembly. It combines de novo and reference-based assembly to improve assembly quality for similar genomes (e.g., bacterial genomes) while removing contaminant sequences (e.g., host DNA).

Project Structure

.
├── README.md              # This file
├── config                 # Configuration files
│   └── config.yaml        # Main configuration
├── resources              # Resources (e.g., reference genomes)
├── results                # Results directory
└── workflow               # Snakemake workflow
    ├── Snakefile          # Main Snakemake file
    ├── env                # Conda environments
    ├── rules              # Snakemake rules (e.g., assembly, filtering)
    └── scripts            # Auxiliary scripts

Installation

Create and activate the Conda environment:

conda env create -f workflow/env/myenv.yaml
conda activate myenv

Install Snakemake (if not already installed):
```
conda install -c conda-forge snakemake
```

Running the Pipeline

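To run the pipeline from the repository root (recent Snakemake versions require an explicit core count; adjust the number to your machine):

snakemake --use-conda --cores 4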

Configuration

Edit config/config.yaml to customize parameters such as input FASTQ files, reference genome path, and tool settings for the assembly steps.
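
Because a missing or malformed setting tends to surface only after a long assembly step has already started, it can help to check the parsed configuration up front. The following is a minimal sketch, not part of this repository: the key names `samples`, `reference`, and `threads`, the file paths, and the helper `config_problems` are all hypothetical, and the real config/config.yaml may use different names.

```python
from typing import Any, Dict, List

# Hypothetical key names; the actual config/config.yaml may differ.
REQUIRED_KEYS = ("samples", "reference", "threads")


def config_problems(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in a parsed config."""
    problems = []

    # Every required top-level key must be present.
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")

    # Each sample is assumed to map a name to a list of FASTQ paths.
    samples = config.get("samples", {})
    if isinstance(samples, dict):
        for name, fastqs in samples.items():
            if not fastqs:
                problems.append(f"sample {name} has no FASTQ files")
    else:
        problems.append("samples must be a mapping")

    # Thread count, if given, must be a positive integer.
    threads = config.get("threads")
    if threads is not None and (not isinstance(threads, int) or threads < 1):
        problems.append("threads must be a positive integer")

    return problems


# Illustrative values only; these paths are placeholders.
example = {
    "samples": {
        "isolate1": ["reads/isolate1_R1.fastq.gz", "reads/isolate1_R2.fastq.gz"],
    },
    "reference": "resources/reference.fasta",
    "threads": 8,
}
print(config_problems(example))  # prints []
```

A check like this could live in workflow/scripts and run on the dictionary that Snakemake exposes as `config`, assuming the keys are adapted to the real file.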

Workflow Overview

1. Initial de novo assembly of input reads
2. Selection of best assembly to serve as reference
3. Reference-based improvement of other assemblies
4. Consensus genome construction
5. Filtering of contaminant sequences
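
The reference-selection and contaminant-filtering steps above can be illustrated with a small sketch. This README does not say which criterion the pipeline uses to rank assemblies, so the sketch assumes N50 (with fewer contigs as a tie-breaker), which is one common contiguity metric; the actual workflow may rely on other metrics, such as those in a QUAST report. The function names, sample names, and contig lengths are hypothetical.

```python
from typing import Dict, List, Set


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def pick_reference(assemblies: Dict[str, List[int]]) -> str:
    """Pick the assembly with the highest N50; break ties by fewer contigs."""
    return max(
        assemblies,
        key=lambda name: (n50(assemblies[name]), -len(assemblies[name])),
    )


def drop_contaminants(contigs: Dict[str, str], flagged: Set[str]) -> Dict[str, str]:
    """Keep only contigs whose IDs were not flagged as contaminant
    (for example, contigs that matched a host genome)."""
    return {cid: seq for cid, seq in contigs.items() if cid not in flagged}


# Hypothetical contig lengths for three de novo assemblies.
assemblies = {
    "sample_a": [500, 400, 100],
    "sample_b": [900, 50, 50],
    "sample_c": [300, 300, 300, 100],
}
print(pick_reference(assemblies))  # prints sample_b
```

N50 rewards contiguity only, so a real selection step would likely also consider completeness and misassembly counts before fixing a reference.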

Results

Output files will be saved in the results/ directory and include assembled genomes, intermediate files, quality reports, and the final consensus assembly.

Acknowledgments

This pipeline incorporates tools for de novo and reference-based genome assembly. Tool choices may include (but are not limited to): SPAdes, BWA, Samtools, and QUAST.

jonahdaus/ensembleassembly
