Home
Welcome to the CIRCOS_PanGenome wiki!
This program takes two arguments: a groups file and a file listing the names of the FASTA files to use. Both inputs are plain-text files.
The groups file lists clusters of genes and, for each cluster, the species that contain a similar gene. The second file lists the names of your FASTA files. Each of these FASTA files gives the common name and the amino acid sequence of each protein.
This program lets you visualize these genomes through the overlap of similar genes. Its output files can be loaded directly into the CIRCOS visualization software, which displays your circular genomes, including the genes that occur in one or more of your species.
The groups file is a text file in the format:
cluster1: speciesA|arbitrary_name1 speciesB|arbitrary_name2 speciesC|arbitrary_name3
cluster2: speciesB|arbitrary_name4 speciesC|arbitrary_name5 speciesD|arbitrary_name6
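As a rough illustration of how a groups file in this format could be read, here is a minimal sketch. The function name `parse_groups` is hypothetical and is not taken from pangenome.py; the sketch assumes one cluster per line, a colon after the cluster name, and whitespace-separated `species|name` entries.

```python
def parse_groups(lines):
    """Map each cluster name to a list of (species, gene_name) pairs."""
    clusters = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Everything before the first colon is the cluster name.
        cluster, _, members = line.partition(":")
        pairs = []
        for member in members.split():
            # Each entry looks like "speciesA|arbitrary_name1".
            species, _, gene = member.partition("|")
            pairs.append((species, gene))
        clusters[cluster.strip()] = pairs
    return clusters


example = [
    "cluster1: speciesA|arbitrary_name1 speciesB|arbitrary_name2 speciesC|arbitrary_name3",
    "cluster2: speciesB|arbitrary_name4 speciesC|arbitrary_name5 speciesD|arbitrary_name6",
]
clusters = parse_groups(example)
```

With the two example lines, `clusters["cluster1"]` holds three pairs, the first being `("speciesA", "arbitrary_name1")`.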
The file listing your FASTA file names is a text file in the format:
fasta_file1.fasta
fasta_file2.fasta
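A file like this could be read along the following lines. This is only a sketch: `read_fasta_names` is a hypothetical helper, not a function from pangenome.py, and it assumes one file name per line with blank lines ignored.

```python
import io


def read_fasta_names(handle):
    """Return the non-empty, whitespace-stripped lines of a names file."""
    return [line.strip() for line in handle if line.strip()]


# An in-memory stand-in for fasta_file_names.txt.
listing = io.StringIO("fasta_file1.fasta\n\nfasta_file2.fasta\n")
names = read_fasta_names(listing)
```

For a real file, the same function can be given an open file object, for example `with open("fasta_file_names.txt") as handle: names = read_fasta_names(handle)`.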
Each FASTA file named in this list must be in one of the following two formats:
>gi|123456789|gb|arbitrary_name1| common_protein_name1 [Genus species]
AMINO ACID SEQUENCE
>gi|987654321|gb|arbitrary_name2| common_protein_name2 [Genus species]
AMINO ACID SEQUENCE
or
>fig|arbitrary_name1 common_name1
AMINO ACID SEQUENCE
>fig|arbitrary_name2 common_name2
AMINO ACID SEQUENCE
SPECIAL NOTE
The headers in your FASTA files may be formatted slightly differently, depending on how your sequences were named. This program looks for sequence names in one of the two common formats shown above. The first format is generated by GenBank (hence the gb in the third field of the header), whereas the second is generated by RAST. Either is acceptable for this program, but if your files use a different format, you MUST change the function 'get_info(each_line)' in lines 185 to 215.
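To show what extracting names from the two header formats involves, here is a hedged sketch. It is not the actual 'get_info(each_line)' from pangenome.py, whose inputs and return values may differ; `parse_header` is a hypothetical name, and returning a (sequence name, common name) pair is an assumption made for illustration. A custom header format would need a similar extra branch.

```python
def parse_header(header):
    """Return (sequence_name, common_name) from a GenBank- or RAST-style header."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header line: %r" % header)
    body = header[1:].strip()

    if body.startswith("gi|"):
        # GenBank style: gi|<number>|gb|<name>| <common name> [Genus species]
        fields = body.split("|", 4)
        if len(fields) < 5:
            raise ValueError("incomplete GenBank-style header: %r" % header)
        name = fields[3]
        # Drop the trailing "[Genus species]" part, if present.
        common = fields[4].split("[", 1)[0].strip()
        return name, common

    if body.startswith("fig|"):
        # RAST style: fig|<name> <common name>
        rest = body[len("fig|"):]
        name, _, common = rest.partition(" ")
        return name, common.strip()

    raise ValueError("unrecognized header format: %r" % header)


genbank = parse_header(
    ">gi|123456789|gb|arbitrary_name1| common_protein_name1 [Genus species]"
)
rast = parse_header(">fig|arbitrary_name1 common_name1")
```

Here `genbank` is `("arbitrary_name1", "common_protein_name1")` and `rast` is `("arbitrary_name1", "common_name1")`. Headers in any other format raise `ValueError` rather than being silently skipped.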
On a Linux-based system, pangenome.py can be run from the command line like so:
[user@localhost] $ python pangenome.py groups_file.txt fasta_file_names.txt