
Add Metal backend for Apple Silicon GPU acceleration #6

Open
robtaylor wants to merge 1 commit into facebookresearch:main from ChipFlow:pr/metal-backend

@robtaylor

Summary

Implements Apple Metal support as a GPU backend for macOS/iOS with Apple Silicon:

  • MetalDefs.h/mm: Buffer registry, context management, MetalMirror helper for CPU/GPU data sharing
  • MetalKernels.metal: Compute shaders for factorization and solve operations
  • MatOpsMetal.mm: NumericCtx and SolveCtx using Metal + MPS + Eigen

Key Features

  • Float-only precision (Apple Silicon GPUs have no FP64 hardware support, and the Metal Shading Language has no double type)
  • Metal Performance Shaders for gemm on large matrices (above a 64x64x64 size threshold)
  • Eigen fallback for dense operations (potrf, trsm) and small matrices
  • MTLResourceStorageModeShared for efficient CPU/GPU data sharing
  • BackendAuto for automatic backend detection (CUDA > Metal > OpenCL > CPU)
  • detectBestBackend() helper function
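
The backend priority listed above (CUDA > Metal > OpenCL > CPU) can be sketched as a small resolver. This is only an illustration of the ordering, not the code in this PR: the `Backend` enum, the `Availability` struct, and the functions `pickBestBackend` and `resolveBackend` are hypothetical names chosen for the example, and the real `BackendAuto` / `detectBestBackend()` may be declared and probed differently.

```cpp
#include <cassert>

// Hypothetical names throughout; this is not the project's actual API.
enum class Backend { Cpu, OpenCl, Metal, Cuda, Auto };

// Which GPU backends were compiled in and found at runtime.
// How these flags get populated is outside the scope of this sketch.
struct Availability {
  bool cuda = false;
  bool metal = false;
  bool opencl = false;
};

// Pick a concrete backend following the stated priority:
// CUDA > Metal > OpenCL > CPU.
Backend pickBestBackend(const Availability& a) {
  if (a.cuda) return Backend::Cuda;
  if (a.metal) return Backend::Metal;
  if (a.opencl) return Backend::OpenCl;
  return Backend::Cpu;  // CPU is assumed to be always available
}

// Auto defers to detection; an explicit request is passed through unchanged.
Backend resolveBackend(Backend requested, const Availability& a) {
  Backend resolved =
      (requested == Backend::Auto) ? pickBestBackend(a) : requested;
  assert(resolved != Backend::Auto);  // callers always get a concrete backend
  return resolved;
}
```

On an Apple Silicon machine with no CUDA device, a request for `Backend::Auto` would resolve to `Backend::Metal` under this ordering, while an explicit `Backend::Cpu` request is left alone.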

Tests

  • MetalFactorTest.cpp, MetalSolveTest.cpp (8 tests, all passing)
  • All 89 CPU tests continue to pass

Test Plan

  • Build with -DBASPACHO_USE_METAL=1 on macOS
  • Run Metal tests on Apple Silicon hardware
  • Verify CPU tests still pass

🤖 Generated with Claude Code

Implements Apple Metal support as an additional backend alongside CPU and CUDA:

- MetalDefs.h/mm: Buffer registry, context management, and MetalMirror helper
- MetalKernels.metal: Compute shaders for factorization and solve operations
- MatOpsMetal.mm: NumericCtx and SolveCtx implementations using Metal + Eigen
- MetalFactorTest.cpp, MetalSolveTest.cpp: Test suites for factor and solve ops

Key implementation details:
- Float-only (Apple Silicon GPUs lack double precision support)
- Uses Eigen for dense operations (potrf, trsm, saveSyrkGemm)
- Metal compute kernels for sparse operations (factor_lumps, sparse_elim, assemble)
- MTLResourceStorageModeShared for CPU/GPU data sharing
- Row-major storage for Eigen compatibility
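
The PR summary describes a size-based split for gemm: Metal Performance Shaders above a 64x64x64 threshold, Eigen otherwise. A minimal sketch of such a dispatch decision follows. The names `GemmPath` and `chooseGemmPath` are hypothetical, and the comparison itself is an assumption: the sketch compares the product m*n*k against 64*64*64, whereas the actual code may test each dimension separately.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; the threshold semantics below are an assumption.
enum class GemmPath { EigenCpu, MpsGpu };

// Taken from the "64x64x64" figure in the PR summary.
constexpr std::int64_t kMpsThresholdDim = 64;

// Assumed reading: use MPS when the multiply-add count m*n*k exceeds
// 64*64*64, and fall back to Eigen on the CPU for smaller problems.
GemmPath chooseGemmPath(std::int64_t m, std::int64_t n, std::int64_t k) {
  assert(m >= 0 && n >= 0 && k >= 0);  // dimensions must be non-negative
  constexpr std::int64_t kThreshold =
      kMpsThresholdDim * kMpsThresholdDim * kMpsThresholdDim;
  return (m * n * k > kThreshold) ? GemmPath::MpsGpu : GemmPath::EigenCpu;
}
```

With shared storage, either path can operate on the same buffer, so a decision like this only has to weigh GPU dispatch overhead against the arithmetic saved on large blocks.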

All 8 Metal tests pass (factor, solve with sparse elimination + dense factorization).
All 89 CPU tests continue to pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@meta-cla

meta-cla bot commented Jan 2, 2026

Hi @robtaylor!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
