Genomic Resources
Please visit https://robertslab.github.io/resources/ instead.
This page compiles genomic resources so that they are readily available and briefly described. Where possible, the corresponding index files are kept alongside so the files can be loaded directly in IGV and similar tools.
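As a quick sanity check before loading a genome into IGV, the FastA index can be read to list sequence names and lengths. This is a minimal sketch, assuming the standard tab-delimited `.fai` layout written by `samtools faidx` (name, length, offset, bases per line, bytes per line); the contig names and values in the example are hypothetical.

```python
import csv
import io


def read_fai(handle):
    """Return {sequence_name: length} from a samtools faidx (.fai) index."""
    lengths = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        # Column 1 is the sequence name, column 2 its length in bases.
        lengths[row[0]] = int(row[1])
    return lengths


# Hypothetical two-contig index, for illustration only.
example_fai = io.StringIO(
    "contig_1\t1500\t10\t60\t61\n"
    "contig_2\t800\t1546\t60\t61\n"
)
lengths = read_fai(example_fai)
print(lengths)
```

For a downloaded index, the same function can be called on an open file handle, e.g. `read_fai(open("Olurida_v081.fa.fai"))`.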
-
Archived versions of this page: 091319
-
Nightingales (Google Sheet) : Database of all raw high-throughput sequencing data
Species list
| C bairdi | C gigas | C virginica | Hematodinium | M magister | O lurida | P generosa | QPX |
-
cbai_genome_v1.01.fasta (18MB)
-
cbai_genome_v1.0.fasta (19MB)
-
-
MD5 = aeec8ffbf8fa44fb1750caee6abaf68a
-
BUSCOs: C:96.5%[S:40.3%,D:56.2%],F:2.2%,M:1.3%,n:978
-
FastA index (samtools faidx)
-
BLASTx annotation (outfmt6)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-UW with non_Alveolata. Derived from cbai_transcriptome_v3.0.fasta
-
-
-
Assembly from 20200518
-
MD5 = 5516789cbad5fa9009c3566003557875
-
BUSCOs: C:97.6%[S:39.1%,D:58.5%],F:1.6%,M:0.8%,n:978
-
FastA index (samtools faidx)
-
BLASTx annotation (outfmt6)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-UW with no taxonomic filter.
-
-
-
MD5 = 1fb788175f9bb7cd5145370a399ae857
-
BUSCOs: C:98.3%[S:25.2%,D:73.1%],F:1.4%,M:0.3%,n:978
-
FastA index (samtools faidx)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-GW, 2020-UW with non_Alveolata. Derived from cbai_transcriptome_v2.0.fasta
-
-
-
Also referred to as 20200507.C_bairdi.Trinity.fasta.
-
MD5 = 01adbd54298495c147767b19ee5c0de9
-
https://gannet.fish.washington.edu/Atumefaciens/20200526_cbai_trinotate_transcriptome-v3.0/20200526.cbai.trinotate.go_annotations.txt
-
BUSCOs: C:98.8%[S:24.9%,D:73.9%],F:0.9%,M:0.3%,n:978
-
FastA index (samtools faidx)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-GW, 2020-UW with no taxonomic filter.
-
-
-
MD5 = 032d1f81c7744736ebeefe7f63ed6d95
-
Assembly from 20200527
-
FastA index (samtools faidx)
-
cbai_transcriptome_v1.7.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/cbai_transcriptome_v1.7.fasta.fai
-
BUSCOs: C:86.7%[S:66.5%,D:20.2%],F:8.2%,M:5.1%,n:978
-
BLASTx Annotation (outfmt6)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-UW with Arthropoda only reads.
-
-
-
MD5 = 46d77ce86cdbbcac26bf1a6cb820651e
-
FastA index (samtools faidx)
-
cbai_transcriptome_v1.6.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/cbai_transcriptome_v1.6.fasta.fai
-
BUSCOs: C:91.7%[S:62.6%,D:29.1%],F:6.2%,M:2.1%,n:978
-
BLASTx Annotation (outfmt6)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-GW, 2020-UW with Arthropoda only reads.
-
-
-
MD5 = e61d68c45728ffbb91e3d34c087d9aa9
-
BUSCOs: C:91.8%[S:64.0%,D:27.8%],F:5.9%,M:2.3%,n:978
-
FastA index (samtools faidx)
-
cbai_transcriptome_v1.5.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/cbai_transcriptome_v1.5.fasta.fai
-
Updated assembly from 20200330. Also referred to as 20200408.C_bairdi.megan.Trinity.fasta
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-GW with Arthropoda only reads.
-
-
-
MD5 = fb28a203154b44b67ec2e2476d96d326
-
BUSCOs: C:85.5%[S:64.7%,D:20.8%],F:9.3%,M:5.2%,n:978
-
FastA index (samtools faidx)
-
cbai_transcriptome_v1.0.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/cbai_transcriptome_v1.0.fasta.fasta.fai
-
Initial Trinity assembly from 20200122
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019 with Arthropoda only reads.
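The BUSCO summary strings listed for each assembly share one compact notation, so they can be parsed and compared across versions. Below is a minimal sketch, assuming the `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout shown above; the function name `parse_busco` and the dictionary keys are illustrative choices, not part of BUSCO itself. The two input strings are copied from entries above.

```python
import re

# Matches the short BUSCO summary notation, e.g.
# C:85.5%[S:64.7%,D:20.8%],F:9.3%,M:5.2%,n:978
BUSCO_PATTERN = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Return BUSCO percentages as floats plus the lineage set size n."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary string: {summary!r}")
    fields = match.groupdict()
    n = int(fields.pop("n"))
    result = {key: float(value) for key, value in fields.items()}
    result["n"] = n
    return result


# Two summary strings taken from the entries listed above.
busco_a = parse_busco("C:85.5%[S:64.7%,D:20.8%],F:9.3%,M:5.2%,n:978")
busco_b = parse_busco("C:91.8%[S:64.0%,D:27.8%],F:5.9%,M:2.3%,n:978")
print(busco_a["complete"], busco_b["complete"])
```

Keeping the single-copy and duplicated fractions separate is useful here, since the Trinity assemblies above differ far more in duplication than in overall completeness.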
-
-
Compilation of DNA Methylation Genome Feature Tracks (Crassostrea gigas) circa 2015
-
Re-defining Cgigas Canonical features circa 2015
-
Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa :
http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa
-
MD5 = 6de9d1239eb10ea0545bed6c4e746d6c
-
FastA index (samtools faidx) : http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa.fai
-
-
Crassostrea_gigas.oyster_v9.dna_sm.toplevel_bisulfite.tar.gz :
http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel_bisulfite.tar.gz
- Gzipped tarball of bisulfite genome for use with Bismark
- Creation details here
-
Cgigas_v9_gene.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff
-
Cgigas_v9_exon.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff
-
Cgigas_v9_intron.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
-
Cgigas_v9_TE.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff
- Contains Tandem Repeats and wublastx features.
-
Cgigas_v9_CG.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff
- index: https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff.idx
-
Cgigas_v9_1k5p_gene_promoter.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff
-
Cgigas_v9_COMP_gene_prom_TE.bed :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed
-
Crassostrea_gigas.oyster_v9.40.gff3 :
http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.40.gff3
- MD5 = 90a747fbc94a0a9225c43f75cc40b9db
-
Crassostrea_gigas.oyster_v9.40.abinitio.gff3 :
http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.40.abinitio.gff3
- MD5 = c2a8c388f5a8afb22a115d61dee3dda0
-
Crassostrea_gigas.oyster_v9.40_mRNA.gff3
grep "mRNA" Crassostrea_gigas.oyster_v9.40.gff3 > Crassostrea_gigas.oyster_v9.40_mRNA.gff3
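The `grep` above keeps every line that contains the string `mRNA` anywhere, including in the attributes column. If only features whose type is exactly `mRNA` are wanted, one stricter alternative is to test the third GFF3 column. This is a sketch of that alternative, not a description of how the listed file was made; the example records are hypothetical.

```python
def filter_gff3_by_type(lines, feature_type="mRNA"):
    """Keep GFF3 feature lines whose type column (column 3) equals feature_type."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip directives and comments
        fields = line.rstrip("\n").split("\t")
        # A GFF3 feature line has nine tab-separated columns.
        if len(fields) >= 9 and fields[2] == feature_type:
            kept.append(line)
    return kept


# Hypothetical records: the gene line mentions "mRNA" only in its attributes.
example_gff3 = [
    "##gff-version 3\n",
    "scaffold1\tensembl\tgene\t1\t900\t.\t+\t.\tID=gene:G1;description=binds mRNA\n",
    "scaffold1\tensembl\tmRNA\t1\t900\t.\t+\t.\tID=transcript:T1;Parent=gene:G1\n",
]
mrna_lines = filter_gff3_by_type(example_gff3)
print(len(mrna_lines))
```

In this example a plain substring match would keep both feature lines, while the column test keeps only the transcript record.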
-
Cvirginica_v300.fa :
http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300.fa
-
MD5 = f9135e323583dc77fc726e9df2677a32
-
FastA index (samtools faidx)
-
Cvirginica_v300.fa.fai :
http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300.fa.fai
-
-
GCF_002022765.2_C_virginica-3.0_genomic.fna :
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/022/765/GCF_002022765.2_C_virginica-3.0/GCF_002022765.2_C_virginica-3.0_genomic.fna.gz
- compressed version of Cvirginica_v300.fa (same files)
-
Cvirginica_v300_bisulfite.tar.gz :
http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300_bisulfite.tar.gz
- Gzipped tarball of bisulfite genome for use with Bismark
- Creation details here
-
C_virginica-3.0_Gnomon_mRNA.gff3 :
http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_Gnomon_mRNA.gff3
-
C_virginica-3.0_Gnomon_exon.bed :
http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_Gnomon_exon.bed
-
C_virginica-3.0_intron.bed :
http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_intron.bed
-
C_virginica-3.0_CG-motif.bed :
http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_CG-motif.bed
-
MD5 = f88c171bccf45a6f3afcf455b6be810f
-
Dead link in this Jupyter Notebook obscures details on how this was generated (via Galaxy):
-
-
C_virginica-3.0_TE-all.gff :
http://owl.fish.washington.edu/halfshell/genomic-databank/C_virginica-3.0_TE-all.gff
-
MD5 = d0d81fc6cf7525bc2c61984bee23521b
-
-
C_virginica-3.0_TE-Cg.gff :
http://owl.fish.washington.edu/halfshell/genomic-databank/C_virginica-3.0_TE-Cg.gff
-
MD5 = 83cd753c171076464fee1165b7e1c6ba
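The MD5 values listed with each file can be used to confirm that a download is complete and unmodified. A minimal sketch using only the Python standard library follows; the helper names `md5sum` and `matches_md5` are illustrative, and the demonstration runs on a small temporary file rather than on one of the genomes.

```python
import hashlib
import os
import tempfile


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected):
    """Return True if the file's MD5 equals the expected hex string."""
    return md5sum(path) == expected.strip().lower()


# Self-contained demonstration on a temporary file (not a real genome).
demo_bytes = b">contig_1\nACGT\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(demo_bytes)
    demo_path = tmp.name
demo_expected = hashlib.md5(demo_bytes).hexdigest()
demo_ok = matches_md5(demo_path, demo_expected)
os.remove(demo_path)
print(demo_ok)
```

After downloading, the same check would look like `matches_md5("Cvirginica_v300.fa", "f9135e323583dc77fc726e9df2677a32")`, using the value listed for that file.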
-
-
hemat_transcriptome_v1.5.fasta
-
MD5 = b8d4a3c1bad2e07da8431bf70bdabfdd
-
BUSCOs: C:25.6%[S:20.7%,D:4.9%],F:11.7%,M:62.7%,n:978
-
FastA index (samtools faidx)
-
hemat_transcriptome_v1.5.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/hemat_transcriptome_v1.5.fasta.fai
-
Updated assembly from 20200330.
-
BLASTx Annotation (txt; 355KB)
-
Trinotate GO Terms Annotation (txt; 2.3MB)
-
internal short-hand: includes 2018, 2019, 2020-GW with Alveolata only reads.
-
-
hemat_transcriptome_v1.0.fasta (3.9MB)
-
MD5 = fa5eb74767d180af5265d2d1f80b6430
-
BUSCOs: C:25.1%[S:19.2%,D:5.9%],F:9.5%,M:65.4%,n:978
-
FastA index (samtools faidx)
-
hemat_transcriptome_v1.0.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/hemat_transcriptome_v1.0.fasta.fai
-
Initial Trinity assembly from 20200122
-
BLASTx Annotation (txt; 308KB)
-
Trinotate GO Terms Annotation (txt; 2.1MB)
-
internal short-hand: includes 2018, 2019 with Alveolata only reads.
-
-
mmag_pilon_scaffolds.fasta
-
MD5 = 5dfa2ba11edf0ff8191f112e0b1378d1
-
Not shared publicly until permission received from NOAA.
-
Roberts Lab members can access on Owl: /web/halfshell/genomic-databank/mmag_pilon_scaffolds.fasta
-
Original filename: pilon_scaffolds.fasta
-
FastA index (samtools faidx) : mmag_pilon_scaffolds.fasta.fai
-
-
Olurida_v081.fa :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa
-
MD5 = 3ac56372bd62038f264d27eef0883bd3
-
This is v080 with only contigs > 1000bp. Details of how v080 was reduced are found here.
-
FastA index (samtools faidx)
-
Olurida_v081.fa.fai :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa.fai
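Olurida_v081 is described above as v080 restricted to contigs longer than 1000bp. The sketch below illustrates that kind of length filter with a small pure-Python FastA reader. It is an illustration of the idea only, not the procedure actually used to produce v081; the function names and example sequences are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FastA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=1000):
    """Keep records whose sequence is strictly longer than min_length."""
    return [(header, seq) for header, seq in records if len(seq) > min_length]


# Hypothetical contigs: one of 40 bp and one of 1200 bp.
example_fasta = [">short_contig", "ACGT" * 10, ">long_contig", "ACGT" * 300]
kept = filter_by_length(read_fasta(example_fasta))
print([header for header, _ in kept])
```

For a real assembly the reader can be given an open file handle instead of a list, and the kept records written back out in FastA format.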
-
-
Olurida_v080.fa :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080.fa
-
MD5 = 9258398f554493e08fdc30e9c1409864
-
FastA index (samtools faidx)
-
Olurida_v080.fa.fai :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080.fa.fai
-
Also known as pbjelly_sjw_01. Details found here, though confirmation would be good.
-
Olurida_v080_bisulfite.tar.gz :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080_bisulfite.tar.gz
- Gzipped tarball of bisulfite genome for use with Bismark
- Creation details here
-
-
Olurida_transcriptome_v3.fasta
- MD5 = 9da3242af2be0463051ec7e1f39b2593
-
Olurida_v081_genome_snap02.all.renamed.putative_function.domain_added.gff (2.9GB) :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081_genome_snap02.all.renamed.putative_function.domain_added.gff
- MD5 = f54512bd964f45645c34b1e8e403a2b0
-
Olurida_v081-20190709.CDS.gff :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081-20190709.CDS.gff
-
Olurida_v081-20190709.exon.gff :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081-20190709.exon.gff
-
Olurida_v081-20190709.gene.gff :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081-20190709.gene.gff
-
Olurida_v081-20190709.mRNA.gff :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081-20190709.mRNA.gff
-
Olurida_v081_TE-Cg.gff :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081_TE-Cg.gff
-
MD5 = 977fd7cdb460cd0b9df5e875e1e880ea
-
Transposable Element track - more details in Sam's Notebook, including a summary table.
-
-
Olurida_v081_CG-motif.gff :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081_CG-motif.gff
-
Pgenerosa_v074.fa :
http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_v074.fa
-
Version of 070 containing 18 largest scaffolds (details on subsetting)
-
MD5 = 32976550b9030126c07920d5f2db179c
-
BUSCO scores: C:71.6%[S:70.7%,D:0.9%],F:4.7%,M:23.7%,n:978
- Notebook entry
-
FastA index (samtools faidx) : http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_v074.fa.fai
-
These originate from GenSAS annotation on 20190710
-
BUSCO:
- 68.4% complete BUSCOs
-
Panopea-generosa-vv0.74.a3.gene.gff3 :
https://owl.fish.washington.edu/halfshell/genomic-databank/Panopea-generosa-vv0.74.a3.gene.gff3
-
Panopea-generosa-vv0.74.a3.exon.gff3 :
https://owl.fish.washington.edu/halfshell/genomic-databank/Panopea-generosa-vv0.74.a3.exon.gff3
-
Panopea-generosa-vv0.74.a3.intragenic.bed :
https://gannet.fish.washington.edu/seashell/wd/091319/tracks/Panopea-generosa-vv0.74.a3.intragenic.bed
-
Panopea-generosa-vv0.74.a3.intron.bed :
https://gannet.fish.washington.edu/seashell/wd/091319/tracks/Panopea-generosa-vv0.74.a3.intron.bed
-
Panopea-generosa-vv0.74.a3.rm.gff3 :
https://gannet.fish.washington.edu/seashell/wd/091319/tracks/Panopea-generosa-vv0.74.a3.rm.gff3
- index: https://gannet.fish.washington.edu/seashell/wd/091319/tracks/Panopea-generosa-vv0.74.a3.rm.gff3.idx
-
Pgenerosa_v074.CpG.gff :
https://gannet.fish.washington.edu/seashell/wd/091319/tracks/Pgenerosa_v074.CpG.gff
-
Panopea-generosa-vv0.74.a3.CDS.gff3 :
https://owl.fish.washington.edu/halfshell/genomic-databank/Panopea-generosa-vv0.74.a3.CDS.gff3
-
Panopea-generosa-vv0.74.a3.mRNA.gff3 :
https://owl.fish.washington.edu/halfshell/genomic-databank/Panopea-generosa-vv0.74.a3.mRNA.gff3
-
Pgenerosa_transcriptome_v5.fasta :
http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_transcriptome_v5.fasta
- MD5 = 5a21424ecbc88c3b01daefe56bed79da
- Transcriptome generated from various libraries - details here
-
QPX_v017.fasta :
http://eagle.fish.washington.edu/QPX_genome/QPX_v017.fasta
CLC v5.1 Mismatch cost = 2; Perform scaffolding = Yes; Mapping mode = Map reads back to contigs (slow); Deletion cost = 3; Similarity fraction = 0.9; Length fraction = 0.8; Insertion cost = 3; Update contigs = Yes; Automatic word size = Yes; Minimum contig length = 10000; Automatic bubble size = Yes; input: filtered_QPX_DNA_GTGAAA_L001_R1 trimmed.
-
QPX_v017.fasta :
https://ndownloader.figshare.com/files/3085550
CLC v5.1 Mismatch cost = 2; Perform scaffolding = Yes; Mapping mode = Map reads back to contigs (slow); Deletion cost = 3; Similarity fraction = 0.9; Length fraction = 0.8; Insertion cost = 3; Update contigs = Yes; Automatic word size = Yes; Minimum contig length = 10000; Automatic bubble size = Yes; input: filtered_QPX_DNA_GTGAAA_L001_R1 trimmed.
-
QPX_v015.fasta :
https://doi.org/10.1371/journal.pone.0074196.s001
De novo assembly was performed with Genomics Workbench v. 5.0 (CLC Bio, Germany) on quality trimmed sequences with the following parameters: mismatch cost = 2, deletion cost = 3, similarity fraction = 0.9, insertion cost = 3, length fraction = 0.8 and minimum contig size of 100 bp for genomic data and 200 bp for transcriptomic data. In order to remove ribosomal RNA sequences from the transcriptome data, consensus sequences were compared to the NCBI nt database using the BLASTn algorithm [59]. Sequences with significant matches (9) were removed and not considered in subsequent analyses.
Manuscript: https://doi.org/10.1371/journal.pone.0074196
QPX_Transcriptome v2.1
Subset of version 1 (v1) that only includes sequences with e-value < 1E-20. Based on Swiss-Prot blastx output, all sequences are oriented 5' - 3'. nucleotides between stop codons; minimum size 200.
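The e-value cutoff described above (sequences with e-value < 1E-20) can be sketched as a filter over tabular BLAST output. This assumes the standard 12-column outfmt6 layout, where the e-value is column 11; the page does not state which BLAST output format was actually used for QPX, and the sequence and accession names in the example are hypothetical.

```python
def ids_below_evalue(blast_lines, max_evalue=1e-20):
    """Return query IDs with at least one hit whose e-value is below max_evalue.

    Assumes standard 12-column BLAST outfmt6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    keep = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or non-tabular lines
        if float(fields[10]) < max_evalue:
            keep.add(fields[0])
    return keep


# Hypothetical hits: only the first passes the 1E-20 cutoff.
example_hits = [
    "QPX_seq_1\tsp|P12345|X\t80.0\t200\t40\t0\t1\t600\t1\t200\t1e-50\t300\n",
    "QPX_seq_2\tsp|Q67890|Y\t35.0\t90\t58\t2\t10\t279\t5\t94\t2e-05\t48.1\n",
]
kept_ids = ids_below_evalue(example_hits)
print(sorted(kept_ids))
```

The resulting set of IDs could then be used to pull the matching records out of the v1 FastA file.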