sample and phenotype preprocessing for aou v8 #32

Open
aryarm wants to merge 12 commits into main from aryarm

@aryarm aryarm commented Feb 3, 2026

some differences between v7 sample preprocessing and v8 sample preprocessing:

  1. v7 used the number of singletons, while v8 instead uses the number of variants that do not appear in gnomAD 3.1
  2. v7 used a single het/homvar genotype ratio across all genotypes, while v8 has separate het/homvar ratios for SNPs and indels
  3. v7 filtered samples for call rate > 0.9, while v8 does not perform any filtering for call rate. We should filter for call rate before each GWAS.
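
Since v8 no longer applies a call-rate filter, one would need to be added before each GWAS. Below is a minimal sketch of what that could look like, assuming per-sample call rates are available as a mapping from sample ID to a fraction in [0, 1]. The function name `filter_by_call_rate`, the data layout, and the example IDs are hypothetical and not taken from this PR; only the 0.9 threshold comes from the v7 behaviour described above.

```python
from typing import Dict, List

# Threshold taken from the v7 behaviour described above (call rate > 0.9).
CALL_RATE_THRESHOLD = 0.9


def filter_by_call_rate(
    call_rates: Dict[str, float],
    threshold: float = CALL_RATE_THRESHOLD,
) -> List[str]:
    """Return the sample IDs whose call rate is strictly above `threshold`.

    `call_rates` is a hypothetical mapping of sample ID -> call rate; the
    real metrics may use a different layout or column name.
    """
    return sorted(s for s, rate in call_rates.items() if rate > threshold)


# Made-up values, for illustration only.
example = {"s1": 0.99, "s2": 0.90, "s3": 0.85, "s4": 0.95}
kept = filter_by_call_rate(example)
```

The comparison is strict, so a sample sitting exactly at 0.9 is dropped; whether the boundary should be inclusive is a choice to confirm against the v7 code.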

sample QC report

  • started out with 414831 samples
  • our filtering step left 380247 samples (removed 34,584 samples)
    • of those 34,584 samples, AoU had flagged 988 of them
  • 9436 samples that passed in v7 are missing from the v8 passing samples
    • and the v8 data was already missing 2684 samples from v7 before any qc
  • overlapping them:
    • how many samples in v7 before qc? 245395
    • and how many in v8 that overlap v7? 242711
    • that is a loss of 2684, which matches the number above!
  • what does the passing data look like after running the v8 code on only the v7 samples from the v8 metrics (aka v8code+v7subset)?
    • comparing v8code+v7subset with v7passing:
      • how many unique to v8code+v7subset? 4910 samples
      • how many unique to v7passing? 9442 samples
  • ok, now let's use the v7 metrics to run finish_sampleqc.py with the v8 code (aka v7metrics+v8code)
    • comparing v7metrics+v8code with v8code+v7subset:
      • how many unique to v7metrics+v8code? 7 samples
      • how many unique to v8code+v7subset? 87 samples
    • comparing v7metrics+v8code with v7passing:
      • how many unique to v7metrics+v8code? 4823 samples
      • how many unique to v7passing? 9435 samples
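
Each "how many unique to ..." line above is a set difference between two lists of sample IDs. Below is a minimal sketch of that comparison; `compare_sample_sets` and the toy IDs are hypothetical and not code from this PR. The second half checks that the reported counts agree with each other. Because |A − B| − |B − A| = |A| − |B| for any two sets, the size differences implied by the three pairwise comparisons have to add up.

```python
from typing import Iterable, Tuple


def compare_sample_sets(a: Iterable[str], b: Iterable[str]) -> Tuple[int, int, int]:
    """Return (number only in a, number only in b, number in both)."""
    set_a, set_b = set(a), set(b)
    return len(set_a - set_b), len(set_b - set_a), len(set_a & set_b)


# Toy sample IDs, for illustration only.
only_a, only_b, both = compare_sample_sets(
    ["s1", "s2", "s3"],
    ["s2", "s3", "s4", "s5"],
)

# Consistency checks on the counts reported in the list above.
removed_matches = 414831 - 380247 == 34584   # samples removed by filtering
overlap_matches = 245395 - 242711 == 2684    # v7 samples absent from v8

# Size differences implied by each pairwise comparison (unique-left minus unique-right).
v7metrics_minus_v7subset = 7 - 87          # v7metrics+v8code vs v8code+v7subset
v7subset_minus_v7passing = 4910 - 9442     # v8code+v7subset vs v7passing
v7metrics_minus_v7passing = 4823 - 9435    # v7metrics+v8code vs v7passing

counts_consistent = (
    v7metrics_minus_v7subset + v7subset_minus_v7passing
    == v7metrics_minus_v7passing
)
```

All three checks hold for the numbers in the report, so the pairwise comparisons are at least arithmetically consistent with one another.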

@aryarm aryarm marked this pull request as ready for review February 6, 2026 19:20
@aryarm aryarm changed the title from "sample preprocessing for aou v8" to "sample and phenotype preprocessing for aou v8" Feb 13, 2026