Alternative Splicing Evolution

Summary

This repository contains the code for our paper: The role of alternative splicing in driving yet another phase transition in genomic complexity [doi]

Code

step1_get_data.py

Download genome annotations and protein sequences for all species on Ensembl.
Calculate descriptive statistics for gene length, exon count, and protein length for each species.
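
As a rough illustration of the statistics step, the sketch below derives gene length and exon count from GFF3 lines and summarizes them. It is not taken from step1_get_data.py. The function names `gene_stats` and `summarize` are hypothetical, the attribute layout (`ID`, `Parent`, `biotype`) follows the usual Ensembl GFF3 conventions, and counting exons as the maximum over a gene's transcripts is an assumption made here for simplicity.

```python
import statistics
from collections import defaultdict


def gene_stats(gff3_lines):
    """Return {gene_id: (gene_length, exon_count)} for protein-coding genes.

    Assumption: exon_count is the largest exon count among a gene's mRNAs.
    """
    gene_len = {}             # gene ID -> length in bp
    tx_parent = {}            # transcript ID -> gene ID
    exons = defaultdict(int)  # transcript ID -> number of exons
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if ftype == "gene" and attrs.get("biotype") == "protein_coding":
            gene_len[attrs["ID"]] = end - start + 1  # GFF3 coordinates are 1-based, inclusive
        elif ftype == "mRNA":
            tx_parent[attrs["ID"]] = attrs.get("Parent")
        elif ftype == "exon":
            exons[attrs.get("Parent")] += 1

    # Resolve exon counts to genes only after reading everything,
    # so the result does not depend on feature order in the file.
    per_gene = defaultdict(int)
    for tx, n in exons.items():
        gene = tx_parent.get(tx)
        if gene in gene_len:
            per_gene[gene] = max(per_gene[gene], n)
    return {g: (gene_len[g], per_gene.get(g, 0)) for g in gene_len}


def summarize(values):
    """Mean and population variance of a non-empty list of numbers."""
    return {"mean": statistics.fmean(values), "variance": statistics.pvariance(values)}
```

Passing an open file handle for a decompressed GFF3 file to `gene_stats` should work, since the function only iterates over lines.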

step2_merge_all_data.py

Helper script that aggregates result files if step1 was interrupted.
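
One way such an aggregation could work is sketched below. This is an assumption about the approach, not the contents of step2_merge_all_data.py: the function name `merge_tables`, the `species` key column, and the rule of keeping the last row seen for each species are all hypothetical.

```python
import csv


def merge_tables(tables, key="species"):
    """Merge tab-separated tables, keeping the last row seen for each key.

    Each table is any iterable of text lines (for example an open file),
    with a header row. Rows repeated across partial result files from an
    interrupted run collapse to a single entry.
    """
    merged = {}        # key value -> row dict; dicts preserve first-seen order
    fieldnames = None
    for table in tables:
        reader = csv.DictReader(table, delimiter="\t")
        if fieldnames is None:
            fieldnames = reader.fieldnames  # take the header from the first table
        for row in reader:
            merged[row[key]] = row
    return fieldnames, list(merged.values())
```

The returned header and rows can be written back out with `csv.DictWriter` using the same tab delimiter.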

step3_avg_variance_exon.py

Script for Fig.1A

step4_mean_plots_gene_length.py

Script for Fig.1B

step5_mean_plots_protein_length.py

Script for Fig.1B inset

Data

Genome Annotation

Protein-coding gene annotations in GFF3 format are obtained from the Ensembl and EnsemblGenomes FTP servers. Archaea and Bacteria are not included in this study because they generally lack alternative splicing.

Protein sequences

Protein sequences in FASTA format are obtained from the Ensembl and EnsemblGenomes FTP servers.
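
For illustration, protein lengths can be read from a FASTA file with a few lines of plain Python. This is a minimal sketch rather than the repository's own parser: the function name `protein_lengths` is hypothetical, and using the first whitespace-delimited token of each header as the record ID is an assumption.

```python
def protein_lengths(fasta_lines):
    """Return {record_id: sequence_length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Assumption: the record ID is the first token after '>'.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            # Sequences may wrap over several lines; a trailing '*'
            # (stop symbol, present in some files) is not counted.
            lengths[current] += len(line.rstrip("*"))
    return lengths
```

Because the function only iterates over lines, it also accepts a handle opened with `gzip.open(path, "rt")` for compressed downloads.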

Database	Release	Genome Annotation	Protein Sequence
Ensembl	114	Link	Link
EnsemblMetazoa	61	Link	Link
EnsemblPlants	61	Link	Link
EnsemblFungi	61	Link	Link
EnsemblProtists	61	Link	Link

Results

After data acquisition and aggregation, panel_a_data.tsv is generated and serves as the primary dataset for statistical analysis and visualization.

KorkinLab/AS_evolution
