PIP-eco

Pathotype Identification Pipeline for Escherichia coli (PIP-eco)

Overview: PIP-eco is an analytical tool designed to identify and characterize Escherichia coli (E. coli) pathotypes. The pipeline supports detailed analysis of both single and hybrid pathotypes using whole genome sequencing (WGS) data. It consists of three components: the marker gene alignment process, the pan-phylogenetic analysis process, and the pathogenicity islands (PAIs) analysis process. It accepts assembled bacterial strain collections as input, which can be either NCBI RefSeq records or the user's own data in FASTA format. The pipeline first performs genome annotation on the input WGS data. Following this, the pathotype is determined based on marker genes. A phylogenetic analysis based on pan-genome analysis then investigates the genetic distances between strains, which allows hybrid pathotypes to be discriminated. Through these processes, the PIP-eco pipeline can be used not only for pathotype assignment but also for tracing the trajectories of pathogenic factors. Processing within the PIP-eco pipeline relies on publicly available tools: PROKKA, USEARCH, MUSCLE, and MAFFT.
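
Because the pipeline takes assembled genomes in FASTA format as input, it can help to check the input files before starting a run. The following is a minimal sketch of such a check and is not part of PIP-eco itself: the function names `summarize_fasta` and `find_assemblies` and the accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed file extensions for assembled genomes; adjust to your data.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def summarize_fasta(path):
    """Return (number of records, total sequence length) for a FASTA file.

    Raises ValueError if the file has no records or if sequence data
    appears before the first '>' header line.
    """
    n_records = 0
    total_len = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                n_records += 1
            elif n_records == 0:
                raise ValueError(f"{path}: sequence data before first header")
            else:
                total_len += len(line)
    if n_records == 0:
        raise ValueError(f"{path}: no FASTA records found")
    return n_records, total_len


def find_assemblies(directory):
    """List files in `directory` whose extension looks like FASTA (hypothetical helper)."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in FASTA_SUFFIXES
    )
```

Calling `summarize_fasta` on each path returned by `find_assemblies` gives a quick per-genome count of contigs and bases, which makes empty or truncated assemblies easy to spot before annotation.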

Pipeline overview

Quick start and installing dependencies

1. Download the PIP-eco pipeline from GitHub and synchronize the environment.

git clone https://github.com/SBL-Kimlab/PIP-eco.git
cd PIP-eco
conda env create -f pathotype.yaml

2. Installing dependencies

Overview of dependencies:

Genome annotation: Prokka
Local alignment tool: USEARCH
Sequence alignment tools: MUSCLE, MAFFT
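
Before running the pipeline, it may be useful to confirm that the external tools named in this README (Prokka, USEARCH, MUSCLE, and MAFFT) can be found on the `PATH`. The sketch below is one way to do that with the Python standard library. The executable names are assumptions: USEARCH binaries in particular are often distributed under versioned file names, so the mapping may need to be adjusted. The helper `find_missing_tools` is hypothetical and not part of PIP-eco.

```python
import shutil

# Assumed executable names; edit to match the names installed on your system.
REQUIRED_TOOLS = {
    "Prokka": "prokka",
    "USEARCH": "usearch",
    "MUSCLE": "muscle",
    "MAFFT": "mafft",
}


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found on PATH.")
```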

Usage

In the PIP-eco pipeline, each process is performed by a defined module. Users can call the individual modules directly, as shown below, or run them one after another to execute the whole pipeline.

# Before executing the PIP-eco pipeline, load /include/include.ipynb.

import os
import os.path as path
from time import sleep

# path_root is the parent of the current working directory
path_root = path.abspath( path.join( os.getcwd(), ".." ) )
path_local = path_root + "/PIPeco"
path_include = path_root + "/include"
file_include = path_include + "/include.ipynb"

# IPython/Jupyter magic: run the include notebook in this session
%run $file_include

# PIP-eco pipeline execution
os.chdir( path_root )
pipeco = pathotype()

pipeco.method.genome_annotation()
pipeco.method.marker_alignment()
pipeco.method.vf_based_phylogenetic()
pipeco.method.pai_analysis()
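
When the four modules are run one after another, a small wrapper can report progress and timing for each step. The following is a minimal sketch under the assumption that each module is a plain callable taking no arguments, as in the calls above. The helper `run_steps` is hypothetical and not part of PIP-eco.

```python
import time


def run_steps(steps):
    """Run (name, callable) pairs in order and return elapsed seconds per step.

    An exception raised by a step is not caught, so later steps do not run
    on the output of a failed one.
    """
    timings = {}
    for name, step in steps:
        print(f"[PIP-eco] starting {name}")
        start = time.perf_counter()
        step()
        timings[name] = time.perf_counter() - start
        print(f"[PIP-eco] finished {name} in {timings[name]:.1f} s")
    return timings


# Usage inside the notebook, after `pipeco = pathotype()` has been created:
# timings = run_steps([
#     ("genome_annotation", pipeco.method.genome_annotation),
#     ("marker_alignment", pipeco.method.marker_alignment),
#     ("vf_based_phylogenetic", pipeco.method.vf_based_phylogenetic),
#     ("pai_analysis", pipeco.method.pai_analysis),
# ])
```

Keeping the steps in an explicit list also makes it easy to rerun only the later stages by passing a shorter list.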

Reference

Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), 2068-2069.
Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460-2461.
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research, 32(5), 1792-1797.
Katoh, K., Misawa, K., Kuma, K. I., & Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic acids research, 30(14), 3059-3066.
