standage commented Apr 24, 2025
    def __init__(self, name, rsids, index, xrefs=None, source=None):
        self.name = Marker.check_name(name)
        self.source_name = str(self.name)
Comment on lines -49 to +56
-       self.source_name_map[marker.source.name][marker.name] = self.definition_names[marker.posstr()]
+       self.source_name_map[marker.source.name][marker.source_name] = self.definition_names[marker.posstr()]
        continue
    else:
        new_name = marker.name
        if len(self.markers_by_definition) > 1:
            new_name = f"{marker.name}.v{len(self.definition_names) + 1}"
        self.definition_names[marker.posstr()] = new_name
-       self.source_name_map[marker.source.name][marker.name] = new_name
+       self.source_name_map[marker.source.name][marker.source_name] = new_name
   - 2413 distinct loci
  [frequencies]
-  - 59753 haplotypes
+  - 59704 haplotypes
Correcting for frequency records using deprecated marker identifiers
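To illustrate why the map is keyed on the source name, here is a minimal sketch, not the project's actual build code. The function name `build_source_name_map`, the tuple layout, and the identifiers are all hypothetical. The only point is that a frequency record labeled with a deprecated identifier can still be resolved when the map carries the name the marker had in its source.

```python
from collections import defaultdict


def build_source_name_map(markers):
    """Map (source, any known name) -> final name.

    `markers` is an iterable of (source, current_name, source_name, final_name)
    tuples; this layout is an assumption made for the example.
    """
    name_map = defaultdict(dict)
    for source, current_name, source_name, final_name in markers:
        # Register the current name...
        name_map[source][current_name] = final_name
        # ...and the name originally used by the source, which may be deprecated.
        name_map[source][source_name] = final_name
    return name_map


# Toy data: the source published the marker under an old identifier.
markers = [("StudyA", "mh01XX-001", "mh01OLD-001", "mh01XX-001.v2")]
name_map = build_source_name_map(markers)

# A frequency record that still uses the deprecated identifier resolves correctly.
resolved = name_map["StudyA"]["mh01OLD-001"]
print(resolved)
```

Registering both keys is a choice made for this sketch so that either spelling resolves; the diff above keys the map on `marker.source_name`.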
standage commented Apr 24, 2025
microhapdb/tests/test_frequency.py (Outdated)
Comment on lines 130 to 135
    def test_marker_names_valid():
        freq_markers = set(microhapdb.frequencies.Marker)
        markers = set(microhapdb.markers.Name)
        invalid = freq_markers - markers
        print(invalid)
        assert len(invalid) == 0
Added this regression test
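The check amounts to a set difference: any marker name that appears in the frequency records but not in the marker table is invalid. A minimal sketch with plain Python collections and made-up identifiers, rather than the real microhapdb tables (the helper name `invalid_marker_names` is hypothetical):

```python
def invalid_marker_names(freq_markers, marker_names):
    """Return names used in frequency records but absent from the marker table."""
    return set(freq_markers) - set(marker_names)


# Hypothetical identifiers for illustration only.
marker_names = ["mhA.v1", "mhA.v2", "mhB"]
freq_markers = ["mhA.v1", "mhA", "mhB", "mhB"]

invalid = invalid_marker_names(freq_markers, marker_names)
print(sorted(invalid))  # ['mhA']
```

Here the frequency records still refer to `mhA`, a name that no longer exists after versioned renaming, so the test would fail until the build step maps it to a current name.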
Additional issues discovered after running the regression test on the master branch.
In this PR I'm updating the binder demo notebook. In the process, I changed the allele formatting from A|T|T|A to A:T:T:A to avoid confusion with conventional genetic notation for haplotype phases. (I would love to have dropped the separators altogether, but some legacy functions of the database still need to handle microhaps with indels correctly.)
I also found a bug in how non-1KGP allele frequencies were being renamed after locus and allele definition identifiers were resolved. It only affected four allele definitions at two loci, and was resolved with a simple change to the build procedure.
None of the standard 1KGP allele frequencies or Ae scores were affected.
Update: Actually, after running the new regression test on the master branch, I found three more affected loci (see comment below). As before, the 1KGP allele frequencies remain unaffected.
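As a rough illustration of the formatting change, the sketch below rewrites an allele string from one separator to the other. The helper `reformat_allele` and the indel example are hypothetical and are not taken from the microhapdb code.

```python
def reformat_allele(allele, old_sep="|", new_sep=":"):
    """Rewrite a microhap allele string from one separator to another."""
    return new_sep.join(allele.split(old_sep))


print(reformat_allele("A|T|T|A"))  # A:T:T:A

# With an indel, one position can span more than one base (hypothetical example).
with_indel = reformat_allele("A|TG|T|A")
print(with_indel.split(":"))  # ['A', 'TG', 'T', 'A']
# Without separators this would read "ATGTA", which no longer shows
# where one position ends and the next begins.
```

This is why the separators cannot simply be dropped: once a position can hold more than one base, the concatenated string is ambiguous.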