Home

Welcome to the CIRCOS_PanGenome wiki!

This program takes two arguments: a groups file and a file listing the names of the FASTA files to use. Both inputs are plain-text files.

The groups file lists clusters of genes and, for each cluster, the species that contain a similar gene. The second file lists the names of your FASTA files. Each of these FASTA files gives the common name and the amino acid sequence of each protein.

This program lets you visualize these genomes by overlaying similar genes. Its output files can be loaded directly into the CIRCOS visualization software, which draws your circular genomes, including the genes that occur in one or more of your species.

The groups file is a text file in the format:

cluster1: speciesA|arbitrary_name1 speciesB|arbitrary_name2 speciesC|arbitrary_name3
cluster2: speciesB|arbitrary_name4 speciesC|arbitrary_name5 speciesD|arbitrary_name6
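To make the groups format concrete, here is a minimal Python sketch that splits one line into its cluster name and its species/name pairs. The function name `parse_groups_line` is hypothetical and is not taken from pangenome.py; the sketch assumes entries are separated by whitespace and that each entry contains a single `|`.

```python
def parse_groups_line(line):
    """Split one groups-file line into (cluster_name, [(species, name), ...])."""
    # Everything before the first ':' is the cluster name.
    cluster, _, members = line.partition(":")
    pairs = []
    for token in members.split():
        # Each entry looks like 'speciesA|arbitrary_name1'.
        species, _, name = token.partition("|")
        pairs.append((species, name))
    return cluster.strip(), pairs


example = "cluster1: speciesA|arbitrary_name1 speciesB|arbitrary_name2 speciesC|arbitrary_name3"
cluster, pairs = parse_groups_line(example)
print(cluster)
print(pairs)
```
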

The file containing the names of your FASTA files is a text file in the format:

fasta_file1.fasta
fasta_file2.fasta

Each FASTA file named in that list must be in one of the following two formats:

>gi|123456789|gb|arbitrary_name1| common_protein_name1 [Genus species]
AMINO ACID SEQUENCE
>gi|987654321|gb|arbitrary_name2| common_protein_name2 [Genus species]
AMINO ACID SEQUENCE

or 

>fig|arbitrary_name1	common_name1
AMINO ACID SEQUENCE
>fig|arbitrary_name2	common_name2
AMINO ACID SEQUENCE

SPECIAL NOTE*

Each FASTA file that you prepare for this program may be formatted slightly differently, depending on how you name your sequences. The program looks for your sequence names in one of the two common formats shown above. The first format is generated by GenBank (hence the gb as the third field in the name), whereas the second is generated by RAST. Either is acceptable, but if you choose a different format you MUST change the function 'get_info(each_line)' in lines 185 to 215.
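As a rough illustration of what handling the two header styles involves, the sketch below extracts the sequence name and common name from a single header line. It is not the program's actual 'get_info(each_line)'. The function name `parse_header` is hypothetical, and which fields the real program extracts is an assumption. A custom header format would need a comparable branch.

```python
def parse_header(header):
    """Return (sequence_name, common_name) for a GenBank- or RAST-style header."""
    header = header.strip().lstrip(">")
    if header.startswith("gi|"):
        # GenBank style: gi|<number>|gb|<name>| <common name> [Genus species]
        fields = header.split("|", 4)
        name = fields[3]
        description = fields[4].strip()
        # Drop the trailing "[Genus species]" annotation if present.
        common_name = description.split(" [")[0]
        return name, common_name
    if header.startswith("fig|"):
        # RAST style: fig|<name><TAB><common name>
        name, _, common_name = header[len("fig|"):].partition("\t")
        return name, common_name.strip()
    raise ValueError("Unrecognized FASTA header: " + header)


genbank = parse_header(">gi|123456789|gb|arbitrary_name1| common_protein_name1 [Genus species]")
rast = parse_header(">fig|arbitrary_name2\tcommon_name2")
print(genbank)
print(rast)
```
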

On a Linux-based system, pangenome.py can be run from the command line like so:

[user@localhost] $ python pangenome.py groups_file.txt fasta_file_names.txt
