fix: Handle wide matrices in orthogonal initializer #629

Open
blasphemetheus wants to merge 1 commit into elixir-nx:main from blasphemetheus:fix/orthogonal-wide-matrix

@blasphemetheus

Title: Fix orthogonal initializer for wide matrices

Summary

  • Fix orthogonal_impl to generate a square {max(m,n), max(m,n)} random matrix instead of an {m, n} one, so that QR decomposition yields enough orthogonal columns for wide shapes (n > m); the result is then sliced to {m, n}
  • Add test for wide 2D matrix {8, 32} with orthonormality assertion
  • Add test for wide high-rank shape {2, 8}

Fixes #628
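
The approach in the summary can be illustrated with a small, language-neutral sketch. This is not the Axon code: it is plain Python on lists, it uses modified Gram-Schmidt as a stand-in for the library's QR routine, and the name `orthogonal_init` and its `seed` parameter are hypothetical. It only shows why starting from a square max(m, n) matrix leaves enough orthonormal vectors to slice out a wide {m, n} result.

```python
import random


def orthogonal_init(m, n, seed=0):
    """Hypothetical sketch: m x n matrix with orthonormal rows (m <= n) or columns (m >= n)."""
    k = max(m, n)
    rng = random.Random(seed)

    # Square k x k Gaussian matrix, stored as a list of column vectors.
    cols = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(k)]

    # Modified Gram-Schmidt: orthonormalize the columns one by one.
    # The resulting columns form the Q factor of a QR decomposition (up to signs).
    q_cols = []
    for v in cols:
        w = list(v)
        for u in q_cols:
            proj = sum(a * b for a, b in zip(u, w))
            w = [a - proj * b for a, b in zip(w, u)]
        norm = sum(a * a for a in w) ** 0.5
        q_cols.append([a / norm for a in w])

    # Q[i][j] == q_cols[j][i]; keep the first m rows and the first n columns.
    return [[q_cols[j][i] for j in range(n)] for i in range(m)]


t = orthogonal_init(8, 32)
print(len(t), len(t[0]))  # 8 32
```

Because the square Q is orthogonal, any m of its rows are orthonormal, which is what makes the slice valid for the wide case.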

Test plan

  • Existing orthogonal tests pass (property, raises on rank < 2)
  • New wide matrix test: {8, 32} produces orthonormal rows (t * t^T ≈ I)
  • New wide high-rank test: {2, 8} returns correct shape
  • All 25 doctests + 10 tests pass (3.8s)
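
For readers unfamiliar with the t * t^T ≈ I assertion, here is a rough sketch of what such a check computes. It is written in plain Python instead of the PR's Elixir test code; the helper name `rows_orthonormal` and the tolerance value are assumptions chosen for illustration.

```python
def rows_orthonormal(t, tol=1e-6):
    """Return True if t * t^T is within tol of the identity, entry by entry."""
    for i, row_i in enumerate(t):
        for j, row_j in enumerate(t):
            dot = sum(a * b for a, b in zip(row_i, row_j))
            expected = 1.0 if i == j else 0.0
            if abs(dot - expected) > tol:
                return False
    return True


# A wide 2 x 4 matrix whose rows are unit length and mutually orthogonal.
wide = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 0.6, 0.8, 0.0]]
print(rows_orthonormal(wide))  # True

# Rows that overlap (dot product 1) fail the check.
skewed = [[1.0, 0.0],
          [1.0, 1.0]]
print(rows_orthonormal(skewed))  # False
```

Note that for a wide matrix only the rows can be orthonormal: t * t^T is the {m, m} identity, while t^T * t cannot be the {n, n} identity when n > m.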

For a wide {m, n} matrix (n > m), QR decomposition produces a Q of shape
{m, m}, which has too few columns to fill the requested {m, n} shape
(e.g. LSTM weights {hidden, 4*hidden}), so the initializer fails.
Generate a {max(m,n), max(m,n)} square random matrix so QR always
produces enough orthogonal columns, then slice to {m, n}.

Adds tests for wide 2D and high-rank shapes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Orthogonal initializer crashes on wide matrices (e.g. LSTM/GRU kernels)
