[Feature] Local PCA and CA-PCA dimension estimators #39

Merged
Xmaster6y merged 10 commits into main from local-pca
Feb 14, 2026

@Xmaster6y (Owner) commented Feb 13, 2026

What does this PR do?

Key insights about the PR.

Linked Issues

  • Closes #?
  • #?

Checklist

  • I have read the CONTRIBUTING guide.
  • I have added tests for my changes if needed.
  • I have updated the documentation if needed.

Summary by cubic

Adds LocalPCA and curvature-adjusted CA‑PCA estimators for per‑point intrinsic dimension from k‑NN neighborhoods. Ships a new notebook with synthetic data, MNIST, TwoNN plots, clearer per‑point visuals, and a per‑label MNIST analysis.

  • New Features

    • LocalPcaDimensionEstimator: PCA on k‑NN neighborhoods with “maxgap” or “ratio”; k can be int or “auto”; outputs (..., N).
    • CaPcaDimensionEstimator: curvature‑adjusted PCA using k+1 neighbors; returns NaN for degenerate/too‑small neighborhoods; outputs (..., N).
    • Exported in latent.dimension_estimation; neighbor util now returns distances and indices; LocalKnn and TwoNn updated.
    • Docs: Dimension Estimation notebook (now includes per‑label MNIST results), methods page card, README link; references updated (Fukunaga, Bruske, CA‑PCA).
  • Dependencies

    • scikit‑learn required for PCA estimators, enforced via a requires_sklearn decorator with a clear ImportError.
    • Added ipywidgets to docs/notebooks extras for notebook rendering.
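To make the "maxgap" idea concrete, here is a minimal NumPy-only sketch of a local PCA estimator. It is not the PR's implementation: it uses a brute-force neighbor search and `np.linalg.svd` in place of scikit-learn's PCA, the function name `local_pca_dimension` is hypothetical, and it assumes "maxgap" means picking the position of the largest ratio between consecutive PCA eigenvalues.

```python
import numpy as np


def local_pca_dimension(points: np.ndarray, k: int = 10) -> np.ndarray:
    """Per-point dimension estimate: PCA on each k-NN neighborhood, then 'maxgap'."""
    n = points.shape[0]
    # Brute-force pairwise Euclidean distances, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbor_idx = np.argsort(dists, axis=1)[:, :k]

    dims = np.empty(n, dtype=np.int64)
    for i in range(n):
        neighborhood = points[neighbor_idx[i]]
        centered = neighborhood - neighborhood.mean(axis=0)
        # Squared singular values are proportional to the PCA eigenvalues.
        variances = np.linalg.svd(centered, compute_uv=False) ** 2
        # Assumed 'maxgap' rule: largest ratio between consecutive eigenvalues.
        ratios = variances[:-1] / np.maximum(variances[1:], 1e-12)
        dims[i] = int(np.argmax(ratios)) + 1
    return dims
```

For an input of shape (N, D) the sketch returns one integer estimate per point, shape (N,), which mirrors the per-point output described above for the unbatched case.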

Written for commit 1594373. Summary will update on new commits.

@codecov bot commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 97.94872% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.81%. Comparing base (5ad1107) to head (1594373).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/tdhook/latent/dimension_estimation/ca_pca.py | 95.55% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #39      +/-   ##
==========================================
+ Coverage   96.71%   96.81%   +0.09%     
==========================================
  Files          33       36       +3     
  Lines        2038     2226     +188     
==========================================
+ Hits         1971     2155     +184     
- Misses         67       71       +4     

@cubic-dev-ai cubic-dev-ai bot left a comment

3 issues found across 9 files

Confidence score: 3/5

  • Potential runtime failure in src/tdhook/latent/dimension_estimation/local_pca.py if tensors require gradients, since the neighborhood is converted to NumPy without detaching.
  • Neighbor selection edge cases in local_pca.py (self/too-close points marked inf but still selected) could lead to fewer than k valid neighbors and unstable estimates.
  • Score reflects a few medium-severity, user-impacting correctness risks in dimension estimation rather than merge-blocking defects.
  • Pay close attention to src/tdhook/latent/dimension_estimation/local_pca.py and src/tdhook/latent/dimension_estimation/ca_pca.py: gradient detachment and neighbor selection consistency.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/tdhook/latent/dimension_estimation/local_pca.py">

<violation number="1" location="src/tdhook/latent/dimension_estimation/local_pca.py:101">
P2: `sorted_neighbors` marks self/too-close points as `inf`, but `neighbor_idx = indices[i, :k]` never checks whether those indices correspond to finite distances. If fewer than k valid neighbors exist (e.g., duplicates within `eps`), this will include excluded points instead of returning NaN or skipping the neighborhood.</violation>

<violation number="2" location="src/tdhook/latent/dimension_estimation/local_pca.py:103">
P2: Detach the neighborhood tensor before converting to NumPy so this works with tensors that require gradients.</violation>
</file>
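The two local_pca.py findings above can be illustrated with a short sketch. It is not the code from the PR: the helper `neighborhood_or_none` and the `sorted_dists` / `indices` tensors are hypothetical stand-ins for whatever the neighbor utility returns, assumed here to be ascending distances with excluded entries set to `inf`.

```python
import numpy as np
import torch


def neighborhood_or_none(points, sorted_dists, indices, i, k):
    """Return the k-NN neighborhood of point i as a NumPy array, or None if invalid.

    `sorted_dists` and `indices` are assumed (N, M) tensors of ascending
    neighbor distances and matching indices, with excluded entries set to inf.
    """
    # Violation 1: reject the neighborhood if any of the first k distances is inf.
    if not torch.isfinite(sorted_dists[i, :k]).all():
        return None
    neighborhood = points[indices[i, :k]]
    # Violation 2: detach (and move to CPU) before converting to NumPy.
    return neighborhood.detach().cpu().numpy()


points = torch.randn(6, 3, requires_grad=True)
with torch.no_grad():
    dists = torch.cdist(points, points)
    dists.fill_diagonal_(float("inf"))  # mark self-distances as excluded
sorted_dists, indices = torch.sort(dists, dim=1)

valid = neighborhood_or_none(points, sorted_dists, indices, i=0, k=3)
too_few = neighborhood_or_none(points, sorted_dists, indices, i=0, k=6)
```

With six points, each row holds five finite distances, so `k=3` yields a (3, 3) array while `k=6` reaches the excluded entry and returns `None`; a caller could record NaN for that point.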

<file name="src/tdhook/latent/dimension_estimation/ca_pca.py">

<violation number="1" location="src/tdhook/latent/dimension_estimation/ca_pca.py:88">
P2: PCA neighborhood uses only k neighbors even though CA-PCA is described as using k+1 neighbors and r is computed from k and k+1 distances. This inconsistency can bias the dimension estimate; include the (k+1)-th neighbor in the PCA neighborhood.</violation>
</file>
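As a rough illustration of the ca_pca.py suggestion above, the sketch below builds a neighborhood from k+1 neighbors so that it matches the distances used for r. The function name is hypothetical, and the radius formula (midpoint of the k-th and (k+1)-th neighbor distances) is an assumption; the review only says r is computed from those two distances.

```python
import numpy as np


def ca_pca_neighborhood(points: np.ndarray, i: int, k: int):
    """Return (neighborhood, r) for point i using its k+1 nearest neighbors."""
    n = points.shape[0]
    if k + 1 >= n:
        # Fewer than k+1 other points exist: signal a too-small neighborhood.
        return None, float("nan")
    dists = np.linalg.norm(points - points[i], axis=1)
    dists[i] = np.inf  # exclude the query point itself
    order = np.argsort(dists)
    # Assumed radius: midpoint of the k-th and (k+1)-th neighbor distances.
    r = 0.5 * (dists[order[k - 1]] + dists[order[k]])
    # Include the (k+1)-th neighbor so PCA sees the same points r is based on.
    neighborhood = points[order[: k + 1]]
    return neighborhood, r
```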

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="docs/source/notebooks/methods/dimension-estimation.ipynb">

<violation number="1" location="docs/source/notebooks/methods/dimension-estimation.ipynb:551">
P2: Indexing a torch tensor with a NumPy boolean mask (`labels == d`) is not reliably supported and can throw TypeError or mis-index. Convert the mask to a torch boolean tensor before indexing to avoid runtime errors.</violation>
</file>
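One way to apply the notebook fix suggested above is to convert the NumPy mask with `torch.from_numpy` before indexing. The helper name `mean_per_label` and the sample values are illustrative only and do not come from the notebook.

```python
import numpy as np
import torch


def mean_per_label(values: torch.Tensor, labels: np.ndarray) -> dict:
    """Average a per-point tensor within each label, using torch boolean masks."""
    out = {}
    for d in np.unique(labels):
        # NumPy bool array -> torch.bool tensor on the same device as `values`.
        mask = torch.from_numpy(labels == d).to(values.device)
        out[int(d)] = values[mask].mean().item()
    return out


labels = np.array([0, 1, 0, 2, 1])
dims = torch.tensor([2.0, 3.0, 2.5, 4.0, 3.5])  # hypothetical per-point estimates
per_label = mean_per_label(dims, labels)
```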


@Xmaster6y Xmaster6y merged commit b7c78fb into main Feb 14, 2026
7 checks passed
@Xmaster6y Xmaster6y deleted the local-pca branch February 14, 2026 15:10