Description
Reported by Dan Lawson:
I ran some quick and dirty QC (Perl one-liners) on the phase 2 AR1 release and stumbled at almost the first hurdle: species identification.
I took the M/S assignment (column 9 of the samples.meta.txt file) and the 5 columns that constitute the samples.species.txt file to check whether the SNP-based inference of M or S form (now An. coluzzii, An. gambiae, or a hybrid) is consistent. Here's what I found.
// Assignments to M, S or hybrid
perl -ne 'next if (/ox_code/);chomp;@f=split/\t/;print "$f[8]\n";' samples.meta.txt | sort | uniq -c | sort -rn
720 S
287 M
113
22 M/S
=> 113 individuals do not have an assignment
// Check consistency where an assignment present
Count samples.species m_s
650 [S S S S S] == S
56 [S . S S S] == S
7 [S M/S S S S] == S
4 [S M/S M/S M/S M/S] == S
1 [S S S S M/S] == S
1 [S S M/S M/S M/S] == S
1 [S M/S M M M] == S
260 [M M M M M] == M
15 [M M M M M/S] == M
6 [M . M M M] == M
3 [M . M M M/S] == M
2 [M S S S S] == M
1 [M M/S M/S M/S M/S] == M
8 [M/S M/S M/S M/S M/S] == M/S
6 [M/S S M/S M/S M/S] == M/S
4 [M/S S S S S] == M/S
2 [M/S M M/S M/S M/S] == M/S
1 [M/S M/S S S S] == M/S
1 [M/S M/S M/S M/S M] == M/S
101 [ S S S S] == AWOL S ?
7 [ . S S S] == AWOL S ?
2 [ M/S S S S] == AWOL S ?
1 [ S S S M/S] == AWOL S ?
1 [ S S M/S S] == AWOL S ?
1 [ M/S M/S M/S M/S] == AWOL ?
My take on this is twofold. Firstly, I don't know what the various columns relate to: the *snp columns are diagnostic SNPs taken from the reads, but I don't know what meta and sine_hmm pertain to. Secondly, there is quite a lot of heterogeneity here.
I'd love to understand this a bit more but a key thing is to review the missing data for the 113 individuals.
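For anyone who wants to reproduce the consistency table and pull out the samples with no assignment, here is a rough Python sketch (Python rather than the Perl one-liners I actually ran). It assumes both files are tab-delimited, that each has a header line containing ox_code, and that ox_code is the first column of both files. The last point is a guess on my part about samples.species.txt. The function names, demo sample IDs and placeholder column names are made up for illustration.

```python
import io
from collections import Counter


def read_rows(handle):
    """Yield tab-split rows, skipping blank lines and the header line.

    The header is recognised the same way as in the Perl one-liner:
    any line containing the string 'ox_code' is skipped.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line or "ox_code" in line:
            continue
        yield line.split("\t")


def tabulate(meta_handle, species_handle, m_s_index=8):
    """Count each (species calls, m_s assignment) combination.

    m_s_index=8 is column 9 of samples.meta.txt, as in the one-liner.
    ASSUMPTION: ox_code is the first column of both files.
    """
    m_s = {}
    for row in read_rows(meta_handle):
        m_s[row[0]] = row[m_s_index] if len(row) > m_s_index else ""

    counts = Counter()
    missing = []  # ox_codes with an empty (or absent) m_s assignment
    for row in read_rows(species_handle):
        ox_code, calls = row[0], tuple(row[1:])
        assignment = m_s.get(ox_code, "")
        if not assignment:
            missing.append(ox_code)
        counts[(calls, assignment or "AWOL")] += 1
    return counts, missing


def report(counts):
    """Print the table, most frequent combination first."""
    for (calls, assignment), n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{n} [{' '.join(calls)}] == {assignment}")


# Tiny in-memory demo with invented sample IDs and placeholder column names.
DEMO_META = "\n".join([
    "\t".join(["ox_code"] + [f"col{i}" for i in range(2, 9)] + ["m_s"]),
    "\t".join(["X0001"] + ["-"] * 7 + ["S"]),
    "\t".join(["X0002"] + ["-"] * 7 + ["M"]),
    "\t".join(["X0003"] + ["-"] * 7 + [""]),  # no assignment
]) + "\n"

DEMO_SPECIES = "\n".join([
    "\t".join(["ox_code"] + [f"call_{i}" for i in range(1, 6)]),
    "\t".join(["X0001", "S", "S", "S", "S", "S"]),
    "\t".join(["X0002", "M", ".", "M", "M", "M"]),
    "\t".join(["X0003", "", "S", "S", "S", "S"]),
]) + "\n"

demo_counts, demo_missing = tabulate(io.StringIO(DEMO_META), io.StringIO(DEMO_SPECIES))
report(demo_counts)
print("no assignment:", demo_missing)
```

To run it on the real release, pass open file handles for samples.meta.txt and samples.species.txt instead of the StringIO demo data. The returned list of ox_codes is the starting point for reviewing the 113 unassigned individuals.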