Add GitHub Actions CI with GPU testing infrastructure #8

Open
robtaylor wants to merge 3 commits into facebookresearch:main from ChipFlow:pr/ci-infrastructure
@robtaylor

Summary

Implements a comprehensive CI/CD pipeline for testing all backends:

Workflows

  • macos-metal.yml: macOS ARM64 runner for Metal backend testing
  • test.yml: Standard CPU tests on Linux/macOS
  • test-gpu-cloudrun.yml: GPU testing via Google Cloud Run (NVIDIA)

Infrastructure

  • Dockerfile.gpu-base: CUDA-enabled container with CLBlast, sccache
  • scripts/setup_gcp_gpu_ci.sh: GCP project setup for Cloud Run jobs
  • scripts/run-gpu-tests.sh: GPU test execution wrapper

Features

  • sccache with GCS backend for build caching
  • PoCL-based OpenCL testing on GitHub runners
  • Automatic GPU detection and backend selection
  • Comprehensive test matrix (CPU, Metal, OpenCL, CUDA)

Dependencies

⚠️ This PR depends on:

Please merge those PRs first, then this PR can be rebased cleanly.

Test Plan

  • Verify macOS CI runs Metal tests
  • Verify Linux CI runs CPU tests
  • Verify OpenCL tests work with PoCL

🤖 Generated with Claude Code

robtaylor and others added 3 commits January 2, 2026 13:57
Implements Apple Metal support as an additional backend alongside CPU and CUDA:

- MetalDefs.h/mm: Buffer registry, context management, and MetalMirror helper
- MetalKernels.metal: Compute shaders for factorization and solve operations
- MatOpsMetal.mm: NumericCtx and SolveCtx implementations using Metal + Eigen
- MetalFactorTest.cpp, MetalSolveTest.cpp: Test suites for factor and solve ops

Key implementation details:
- Float-only (Metal on Apple Silicon GPUs has no double-precision support)
- Uses Eigen for dense operations (potrf, trsm, saveSyrkGemm)
- Metal compute kernels for sparse operations (factor_lumps, sparse_elim, assemble)
- MTLResourceStorageModeShared for CPU/GPU data sharing
- Row-major storage for Eigen compatibility

All 8 Metal tests pass (factor, solve with sparse elimination + dense factorization).
All 89 CPU tests continue to pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add OpenCL/CLBlast backend as portable GPU fallback:
- Add BASPACHO_USE_OPENCL CMake option with CLBlast dependency
- Add FindCLBlast.cmake module
- Add BackendOpenCL to BackendType enum
- Update detectBestBackend() priority: CUDA > Metal > OpenCL > CPU
- Create OpenCLDefs.h/cpp with context management and buffer mirroring
- Port sparse kernels to OpenCL (factor_lumps, assemble, solve kernels)
- Create MatOpsOpenCL.cpp with NumericCtx/SolveCtx stubs
  - CPU fallback for potrf (CLBlast doesn't have this)
  - CLBlast ready for trsm/gemm (CPU fallback for now)

This is a foundational commit: the OpenCL backend compiles, but
operations that need full GPU execution throw "not yet implemented".
CPU-only build verified: 89 tests pass.
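
The backend priority described above (CUDA > Metal > OpenCL > CPU) can be sketched as a simple fall-through. This is only an illustration, not the code in this PR: the `Backend` enum, `pickBestBackend`, and `backendName` are hypothetical names, and the three availability flags stand in for whatever compile-time and runtime checks `detectBestBackend()` actually performs.

```cpp
#include <string>

// Hypothetical stand-in for the project's backend enum; the commit only
// confirms that a BackendOpenCL value was added to BackendType.
enum class Backend { Cpu, OpenCL, Metal, Cuda };

// Assumed inputs: whether each GPU backend was compiled in and has a usable
// device at runtime. How the real code determines this is not shown here.
Backend pickBestBackend(bool hasCuda, bool hasMetal, bool hasOpenCL) {
    if (hasCuda) return Backend::Cuda;      // highest priority
    if (hasMetal) return Backend::Metal;    // Apple GPUs
    if (hasOpenCL) return Backend::OpenCL;  // portable GPU fallback
    return Backend::Cpu;                    // always available
}

// Human-readable name, e.g. for logging which backend a CI job selected.
std::string backendName(Backend b) {
    switch (b) {
        case Backend::Cuda:   return "CUDA";
        case Backend::Metal:  return "Metal";
        case Backend::OpenCL: return "OpenCL";
        case Backend::Cpu:    return "CPU";
    }
    return "unknown";
}
```

With this ordering, a machine that exposes both Metal and OpenCL would select Metal, and a runner with no GPU backend at all (such as a standard Linux CI runner without PoCL) would fall back to CPU.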

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Metal backend solver to benchmark suite (Bench.cpp)
  - Uses float precision (Metal hardware limitation)
  - Supports factor and solve operations with timing

- Create GitHub Actions workflow (macos-metal.yml)
  - Runs on macos-14 runner (Apple Silicon M1/M2)
  - Two jobs: build-and-test, benchmark
  - Runs all CPU and Metal tests
  - Executes benchmarks comparing Metal vs CPU BLAS
  - Uploads benchmark results as artifacts
  - Posts summary to GitHub Actions

The workflow can be triggered manually with custom parameters:
  - benchmark_iterations: Number of iterations per problem
  - problem_filter: Regex to filter specific problems

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@meta-cla

meta-cla bot commented Jan 2, 2026

Hi @robtaylor!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
