Replace the pomegranate-based HMM with a direct numpy/scipy implementation featuring distance-dependent transitions, joint (log2, BAF) emissions parameterized by (purity, ploidy), and grid search over marginal likelihood. This removes ~500MB of transitive dependencies (pomegranate, PyTorch, numba, llvmlite).

Key changes:
- hmm.py: Forward/Viterbi algorithms in log space, batched grid search vectorized across all (purity, ploidy) grid points per arm
- commands.py: Add --purity-output flag to segment command for saving purity/ploidy estimates as TSV
- Remove pomegranate from all dependency and environment files
- Update documentation to reflect new HMM implementation
- Remove stale pomegranate ImportWarning workarounds from test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
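As an illustration of the log-space forward pass with distance-dependent transitions described above, here is a minimal sketch. It is not the code in hmm.py: the function names, the `scale` parameter, and the exponential form of the switching probability are assumptions made for the example.

```python
import numpy as np
from scipy.special import logsumexp


def log_transition_matrix(distance, n_states, scale=1e6):
    """Log transition matrix for two bins separated by `distance` bases.

    Assumed form (not taken from the PR): the probability of leaving the
    current state is 1 - exp(-distance / scale), split evenly among the
    other states.
    """
    p_switch = -np.expm1(-distance / scale)
    p_switch = np.clip(p_switch, 1e-12, 1.0 - 1e-12)  # keep the logs finite
    mat = np.full((n_states, n_states), p_switch / (n_states - 1))
    np.fill_diagonal(mat, 1.0 - p_switch)
    return np.log(mat)


def forward_log_likelihood(log_emis, distances, scale=1e6):
    """Marginal log-likelihood of one chromosome arm via the forward algorithm.

    log_emis:  (n_bins, n_states) per-bin log emission probabilities.
    distances: (n_bins - 1,) genomic distances between consecutive bins.
    """
    n_bins, n_states = log_emis.shape
    # Uniform initial state distribution (an assumption of this sketch).
    alpha = log_emis[0] - np.log(n_states)
    for t in range(1, n_bins):
        log_trans = log_transition_matrix(distances[t - 1], n_states, scale)
        # Sum over the previous state in log space, then add the emission term.
        alpha = logsumexp(alpha[:, None] + log_trans, axis=0) + log_emis[t]
    return logsumexp(alpha)
```

A grid search over (purity, ploidy) could then evaluate `forward_log_likelihood` once per grid point, with emissions recomputed for that point, and keep the point with the highest total across arms.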
…icks

Remove run_exports with Jinja2 pin_subpackage template from conda meta.yaml: it is not appropriate for a pure Python package and caused a check-yaml hook failure. Fix single-backtick RST inline code to double-backtick.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skip purity/ploidy grid search when no autosomal chromosomes are found (e.g. chrX/chrY/chrM-only data), falling back to purity=1.0, ploidy=2.0 with a warning. Rename state_cn to state_indices for clarity, since the variable holds HMM state indices, not copy number values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
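The fallback described above might look roughly like the sketch below. The helper names `is_autosome` and `estimate_purity_ploidy`, the `grid_search` callable, and the human-specific autosome set (1 to 22) are all hypothetical; the PR's actual function signatures are not shown in this conversation.

```python
import logging

# Assumption for this sketch: human autosomes, named with or without a "chr" prefix.
AUTOSOMES = {str(i) for i in range(1, 23)}


def is_autosome(chrom):
    """Return True if `chrom` names an autosome (e.g. "chr7" or "7")."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return name in AUTOSOMES


def estimate_purity_ploidy(chromosomes, grid_search):
    """Run `grid_search` on autosomes only, or fall back to (1.0, 2.0).

    `grid_search` is a hypothetical callable that takes the list of
    autosomal chromosome names and returns a (purity, ploidy) tuple.
    """
    autosomal = [c for c in chromosomes if is_autosome(c)]
    if not autosomal:
        logging.warning(
            "No autosomal chromosomes found; assuming purity=1.0, ploidy=2.0"
        )
        return 1.0, 2.0
    return grid_search(autosomal)
```

For example, `estimate_purity_ploidy(["chrX", "chrY", "chrM"], grid_search)` returns the defaults without calling `grid_search` at all.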
Use aggregated minor allele counts and read depths (from VCF) instead of derived mirrored BAF ratios for the HMM emission model. The beta-binomial model naturally accounts for sequencing depth, respects the [0,1] domain, and eliminates the mirroring mismatch where observations were >= 0.5 but expected values could be < 0.5.

- Add VariantArray.baf_counts_by_ranges() to aggregate counts per bin
- Add _betabinom_logpmf() vectorized helper using gammaln/betaln
- Rename _expected_baf → _expected_minor_freq (corrects misleading name)
- Update log_emission_probs, _batched_emission_baf, prepare_observations, grid_search_purity_ploidy, and segment_hmm for count-based interface
- Clamp alt_count to [0, depth] for robustness against inconsistent VCFs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
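To make the count-based emission concrete, here is a minimal sketch of a vectorized beta-binomial log-PMF built from gammaln/betaln. It is not the PR's `_betabinom_logpmf`: the mean/overdispersion parameterization (`mu`, `rho`, with alpha + beta = (1 - rho) / rho) and the clipping constant are assumptions chosen for illustration.

```python
import numpy as np
from scipy.special import betaln, gammaln


def betabinom_logpmf(k, n, mu, rho):
    """Log-PMF of a beta-binomial with mean fraction `mu` and overdispersion `rho`.

    Assumed parameterization: alpha = mu * c and beta = (1 - mu) * c,
    where c = (1 - rho) / rho. All arguments broadcast against each other.
    """
    n = np.asarray(n, dtype=float)
    # Clamp counts to [0, depth] so inconsistent inputs cannot produce NaNs.
    k = np.clip(np.asarray(k, dtype=float), 0.0, n)
    # Keep mu strictly inside (0, 1) so alpha and beta stay positive.
    mu = np.clip(mu, 1e-6, 1.0 - 1e-6)
    conc = (1.0 - rho) / rho
    a = mu * conc
    b = (1.0 - mu) * conc
    log_choose = gammaln(n + 1.0) - gammaln(k + 1.0) - gammaln(n - k + 1.0)
    return log_choose + betaln(k + a, n - k + b) - betaln(a, b)
```

With this form, a bin with zero depth (`k = 0`, `n = 0`) contributes a log-probability of 0, so it neither favors nor penalizes any state.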
@claude Please review.
Claude finished @etal's task

PR Review: Native HMM Segmentation

Reviewing PR #1001, which implements native HMM segmentation to replace the pomegranate dependency.

Review Checklist:
Review Summary

This PR represents a significant and well-executed refactoring that removes the pomegranate dependency and implements a native HMM segmentation algorithm using pure numpy/scipy. The implementation is mathematically sound and follows good software engineering practices.

✅ Strengths

Architecture & Design:
Mathematical Soundness:
Code Quality:
🔧 Implementation Details

Key algorithmic improvements:
Dependency management:
🧪 Testing & Validation

Test coverage:
Code quality improvements:
🎯 Suggestions for Improvement
Codecov Report

❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| | master | #1001 | +/- |
|---|---|---|---|
| Coverage | 61.68% | 62.43% | +0.74% |
| Files | 72 | 72 | |
| Lines | 6968 | 7168 | +200 |
| Branches | 1213 | 1239 | +26 |
| Hits | 4298 | 4475 | +177 |
| Misses | 2296 | 2316 | +20 |
| Partials | 374 | 377 | +3 |
16 tests covering:
- _betabinom_logpmf (vs scipy reference, zero-depth, broadcasting, extreme parameters)
- _expected_minor_freq (diploid het, LOH, CN=0, impure, all-states bound check)
- log_emission_probs (log2-only, with BAF counts, zero-depth neutrality, het-favors-diploid)
- _batched_emission_baf (shape, zero-depth zeroed)

Also improve BETABINOM_RHO comment and segment_hmm docstring clarity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
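The cases listed for _expected_minor_freq (diploid het, LOH, CN=0, impure) can be illustrated with a small sketch of the standard tumor/normal mixture formula. This is an assumption about what the helper computes, not its actual code: the name `expected_minor_freq`, its signature, and the neutral value 0.5 returned when no DNA is expected are all hypothetical.

```python
import numpy as np


def expected_minor_freq(total_cn, minor_cn, purity):
    """Expected minor-allele fraction at a germline heterozygous site.

    Mixes tumor cells (total_cn copies, minor_cn of the minor allele) at
    fraction `purity` with normal diploid cells (2 copies, 1 minor).
    """
    total_cn = np.asarray(total_cn, dtype=float)
    minor_cn = np.asarray(minor_cn, dtype=float)
    numer = purity * minor_cn + (1.0 - purity) * 1.0
    denom = purity * total_cn + (1.0 - purity) * 2.0
    # A pure sample with total CN 0 yields no reads; return 0.5 as a
    # neutral placeholder (an assumption of this sketch).
    safe_denom = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, numer / safe_denom, 0.5)
```

Under this formula the result never exceeds 0.5 as long as `minor_cn <= total_cn / 2`, which is the kind of property an all-states bound check would assert.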
Fixes the RUF059 lint error.