Phase 2 species identification #5

@alimanfoo

Reported by Dan Lawson:

I ran some quick and dirty QC (Perl one-liners) on the phase 2 AR1 release and stumbled at almost the first hurdle: species identification.

I took the M/S assignment (column 9 of the samples.meta.txt file) and the 5 columns that constitute the samples.species.txt file, and checked whether the SNP-based inference agrees with the M or S form assignment (now An. coluzzii, An. gambiae, or a hybrid). Here's what I found:

// Assignments to M, S or hybrid

perl -ne 'next if (/ox_code/);chomp;@f=split/\t/;print "$f[8]\n";' samples.meta.txt | sort | uniq -c | sort -rn
720 S
287 M
113
22 M/S

=> 113 individuals do not have an assignment

// Check consistency where an assignment present

Count  [samples.species columns]  == m_s

650 [S S S S S] == S
56 [S . S S S] == S
7 [S M/S S S S] == S
4 [S M/S M/S M/S M/S] == S
1 [S S S S M/S] == S
1 [S S M/S M/S M/S] == S
1 [S M/S M M M] == S

260 [M M M M M] == M
15 [M M M M M/S] == M
6 [M . M M M] == M
3 [M . M M M/S] == M
2 [M S S S S] == M
1 [M M/S M/S M/S M/S] == M

8 [M/S M/S M/S M/S M/S] == M/S
6 [M/S S M/S M/S M/S] == M/S
4 [M/S S S S S] == M/S
2 [M/S M M/S M/S M/S] == M/S
1 [M/S M/S S S S] == M/S
1 [M/S M/S M/S M/S M] == M/S

101 [ S S S S] == AWOL S ?
7 [ . S S S] == AWOL S ?
2 [ M/S S S S] == AWOL S ?
1 [ S S S M/S] == AWOL S ?
1 [ S S M/S S] == AWOL S ?
1 [ M/S M/S M/S M/S] == AWOL ?

My take on this is twofold. First, I don't know what the various columns represent: the *snp columns are diagnostic SNPs taken from the reads, but I don't know what meta and sine_hmm pertain to. Second, there is quite a lot of heterogeneity here.

I'd love to understand this a bit more, but the key thing is to review the missing data for the 113 individuals.
