79 commits
544f231 feat: Add parallelism analysis for IREE/Baspacho test cases (robtaylor, Mar 6, 2026)
a882e24 feat: Add NR phase timing profiler with JAX named scopes (robtaylor, Mar 7, 2026)
8efa57c feat: Add eval branch specialization analysis to parallelism script (robtaylor, Mar 7, 2026)
232595b feat: Inline shared params/cache as literals for branch specialization (robtaylor, Mar 7, 2026)
c27da3d fix: Resolve lint errors in analyze_parallelism.py (robtaylor, Mar 7, 2026)
46cea11 style: Apply ruff format to all files (robtaylor, Mar 7, 2026)
8ee5bbc docs: Add linting instructions to CLAUDE.md (robtaylor, Mar 7, 2026)
373842c feat: Wire SCCP dead branch elimination into cache-split codegen path (robtaylor, Mar 8, 2026)
90f379e style: Fix lint errors in check_constant_folding.py (robtaylor, Mar 8, 2026)
cf9f4b3 perf: Remove redundant Tikhonov regularization from dense linear solver (robtaylor, Mar 9, 2026)
d009e91 perf: Make output recording unconditional in transient loop (robtaylor, Mar 9, 2026)
29ac4c3 fix: Fix VACASK test paths and wire dump_jaxpr into compare_vacask (robtaylor, Mar 9, 2026)
5476ad2 fix: Stop inlining shared params as literals in generated eval code (robtaylor, Mar 9, 2026)
b695068 fix: Add None check for _transient_setup_cache in dump_jaxpr (robtaylor, Mar 9, 2026)
131e160 perf: Add XLA flag sweep script and CI workflow for CUDA optimization (robtaylor, Mar 9, 2026)
7f463c1 perf: Disable SCCP for unified eval function (cache invalidation) (robtaylor, Mar 9, 2026)
97fdfb9 perf: Return numpy arrays from run_transient to avoid CUDA dynamic_sl… (robtaylor, Mar 9, 2026)
f7adaf9 ci: Install nsight-systems-cli for nsys profiling workflow (robtaylor, Mar 9, 2026)
5fbff74 ci: Fix nsys package name to cuda-nsight-systems-12-6 (robtaylor, Mar 9, 2026)
9f9e8ec ci: Export nsys SQLite + CSV stats and improve summary (robtaylor, Mar 9, 2026)
6cc1186 ci: Add libcudnn9-cuda-12 to nsys profiling workflow (robtaylor, Mar 9, 2026)
927e4dc ci: Install full cuda-toolkit-12-6 for nsys profiling (robtaylor, Mar 9, 2026)
43fc0c8 ci: Add libcudss0-cuda-12 to nsys profiling workflow (robtaylor, Mar 9, 2026)
a0b9161 ci: Add cancel-in-progress concurrency to all PR workflows (robtaylor, Mar 9, 2026)
7356c78 perf: Exclude JIT warmup from nsys profiling via CUDA profiler API (robtaylor, Mar 9, 2026)
982caeb feat: Add HLO analysis script for dense benchmark circuits (robtaylor, Mar 9, 2026)
75dc078 fix: Tolerate SIGSEGV during nsys teardown (exit 139) (robtaylor, Mar 9, 2026)
c39943c fix: Set nsys profile_name env var before running nsys (robtaylor, Mar 9, 2026)
869524d fix: Add --force-export=true to nsys stats commands (robtaylor, Mar 9, 2026)
06dc3bc fix: Drop cudaProfilerApi capture range from nsys profiling (robtaylor, Mar 9, 2026)
338ced1 feat: Convert NR while_loop to fori_loop for GPU-resident execution (robtaylor, Mar 9, 2026)
ab0a322 Revert "feat: Convert NR while_loop to fori_loop for GPU-resident exe… (robtaylor, Mar 9, 2026)
880c12e feat: Enable XLA command buffer WHILE+CONDITIONAL for GPU profiling (robtaylor, Mar 9, 2026)
b5a9e27 feat: Add XLA command buffer diagnostics to nsys profiling workflow (robtaylor, Mar 9, 2026)
2bd9412 chore: Clean up nsys workflow after command buffer investigation (robtaylor, Mar 9, 2026)
cdc85c0 feat: Add BaSpaCho dense solver integration for CUDA (robtaylor, Mar 9, 2026)
83d0fe8 feat: Add BaSpaCho option to nsys profiling workflow (robtaylor, Mar 10, 2026)
cf50c8f fix: Add libopenblas-dev for BaSpaCho build in nsys workflow (robtaylor, Mar 10, 2026)
6fefb8e chore: Add solver availability logging to nsys profiling script (robtaylor, Mar 10, 2026)
d0a6271 fix: Force reinstall spineax with --reinstall --no-cache for BaSpaCho (robtaylor, Mar 10, 2026)
3c8568a chore: Add ls -lR diagnostic for spineax install verification (robtaylor, Mar 10, 2026)
ab03e95 chore: Add -vvv to uv pip install for BaSpaCho build diagnostics (robtaylor, Mar 10, 2026)
f2adf52 chore: Enable cancel-in-progress for nsys profiling concurrency group (robtaylor, Mar 10, 2026)
2347bee perf: Cache apt packages for CUDA toolkit across CI runs (robtaylor, Mar 10, 2026)
3ab2cb9 fix: Use sccache for BaSpaCho cmake build, fix python path in CI (robtaylor, Mar 10, 2026)
3f31bf5 chore: Use cache-apt-pkgs-action for CUDA toolkit caching (robtaylor, Mar 10, 2026)
08e98c7 chore: Consolidate all apt packages into single cache-apt-pkgs-action… (robtaylor, Mar 10, 2026)
2749ae9 feat: Add runner selection to nsys profiling workflow (robtaylor, Mar 10, 2026)
769de43 chore: Add NVIDIA CUDA repo to apt-sources for cache-apt-pkgs-action (robtaylor, Mar 10, 2026)
495c6d1 fix: Use YAML literal block for apt-sources, make env setup resilient (robtaylor, Mar 10, 2026)
3f4d08e fix: Pass CUDAToolkit_ROOT to cmake for BaSpaCho build (robtaylor, Mar 10, 2026)
6646a36 fix: Enable install scripts for CUDA apt packages, dynamic CUDA root (robtaylor, Mar 10, 2026)
0c21c0d fix: Bump apt cache version to invalidate stale cache (robtaylor, Mar 10, 2026)
874ea27 chore: Add dpkg and CUDA file diagnostics to env setup (robtaylor, Mar 10, 2026)
090617f fix: Include runner name in apt cache key (robtaylor, Mar 10, 2026)
205bffa fix: Remove duplicate NVIDIA CUDA apt source causing Signed-By conflict (robtaylor, Mar 10, 2026)
68d59d4 ci: Add python-version to setup-uv for correct cache keys (robtaylor, Mar 10, 2026)
5b91c1c ci: Use cache-apt-pkgs-action for all apt installs (robtaylor, Mar 10, 2026)
5cbb073 fix: Run ldconfig after apt cache restore to fix OpenBLAS symlinks (robtaylor, Mar 10, 2026)
28dc623 fix: Reinstall libopenblas-dev if .so missing after apt cache restore (robtaylor, Mar 10, 2026)
19e4633 fix: Run apt --fix-broken install before reinstalling openblas (robtaylor, Mar 10, 2026)
92060b6 fix: Force dpkg overwrite to fix gcc-14-base version conflict from cache (robtaylor, Mar 10, 2026)
11a1211 fix: Remove apt workaround, deleted corrupted cache instead (robtaylor, Mar 10, 2026)
346dfd1 ci: Replace cuda-toolkit-12-6 with minimal CUDA packages (robtaylor, Mar 10, 2026)
2bc8f4e ci: Add libcufft-12-6 runtime library for JAX CUDA plugin (robtaylor, Mar 10, 2026)
c13e4a5 ci: Disable uv cache pruning to preserve downloaded packages (robtaylor, Mar 11, 2026)
e230113 ci: Add cuda-libraries-12-6 and cuda-cupti-12-6 for JAX runtime (robtaylor, Mar 11, 2026)
5ac9686 ci: Pass sccache to spineax build via cmake.define flags (robtaylor, Mar 11, 2026)
4136e73 ci: Add quarterly uv cache cleanup workflow (robtaylor, Mar 11, 2026)
369f6ac fix: Add CUDA lib64 to LD_LIBRARY_PATH for JAX runtime (robtaylor, Mar 11, 2026)
49c0bfd fix: Write LD_LIBRARY_PATH as single GITHUB_ENV entry (robtaylor, Mar 11, 2026)
5d81200 debug: Add phase timing to nsys profiling target (robtaylor, Mar 11, 2026)
b863279 perf: Scope nsys capture to run_transient() via NVTX range (robtaylor, Mar 11, 2026)
ad5d880 fix: Drop NVTX capture-range and remove --no-cache from spineax build (robtaylor, Mar 11, 2026)
20c1430 ci: Fix BLAS detection and add compiler cache to UMFPACK build (robtaylor, Mar 11, 2026)
5a22180 debug: Add XLA VLOG flags for command buffer conversion diagnostics (robtaylor, Mar 11, 2026)
ca33f18 fix: Use TF_CPP_VMODULE for XLA debug logging (not TF_CPP_VLOG_FLAGS) (robtaylor, Mar 11, 2026)
e389e63 ci: Comment out XLA VLOG flags after confirming command buffer findings (robtaylor, Mar 11, 2026)
ea5a6b4 ci: Re-enable XLA VLOG for command buffer verification (robtaylor, Mar 11, 2026)
.github/workflows/benchmark-comparison.yml (100 changes: 57 additions, 43 deletions)
@@ -9,6 +9,10 @@ on:
   schedule:
     - cron: '0 6 * * *' # Daily at 6am UTC
 
+concurrency:
+  group: benchmark-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
 env:
   CARGO_TERM_COLOR: always
   CARGO_INCREMENTAL: 0
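The workflow-level `concurrency` block above keys each run on the pull request number when there is one and on `github.ref` otherwise. That works because the `||` operator in GitHub Actions expressions returns its first truthy operand rather than a boolean. A minimal Python sketch of the resolution follows; the helpers `gha_or` and `concurrency_group` and the workflow name used in the example are hypothetical, for illustration only.

```python
from typing import Optional, Union


def gha_or(left, right):
    """Mimic GitHub Actions `left || right`: return `left` if truthy, else `right`."""
    return left if left else right


def concurrency_group(prefix: str, workflow: str, pr_number: Optional[int], ref: str) -> str:
    """Resolve a group shaped like
    <prefix>-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}.

    `pr_number` is None for events without a pull request (push, schedule),
    standing in for the null that the missing property evaluates to.
    """
    key: Union[int, str] = gha_or(pr_number, ref)
    return f"{prefix}-{workflow}-{key}"


if __name__ == "__main__":
    workflow = "Benchmark Comparison"  # hypothetical workflow `name:`
    # pull_request event: the PR number is truthy, so it is used.
    print(concurrency_group("benchmark", workflow, 123, "refs/pull/123/merge"))
    # schedule event: no PR, so the expression falls back to the ref.
    print(concurrency_group("benchmark", workflow, None, "refs/heads/main"))
```

Because scheduled runs carry no pull request, they fall back to the default branch's `github.ref`. If this workflow is also triggered by pushes to that branch (the full trigger list is not shown in this hunk), a push and a nightly run would share one group, and `cancel-in-progress: true` would cancel whichever was already running.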
@@ -34,8 +38,6 @@ jobs:
     name: benchmark (${{ matrix.solver }}-${{ matrix.platform }})
     runs-on: ${{ matrix.runner }}
     timeout-minutes: ${{ matrix.solver == 'sparse' && matrix.platform == 'cuda' && 360 || 90 }}
-    concurrency:
-      group: benchmark-${{ matrix.solver }}-${{ matrix.platform }}-${{ github.ref }}
 
     steps:
       - name: Checkout with submodules
@@ -54,45 +56,39 @@ jobs:
           nvcc --version 2>/dev/null || echo "nvcc not in PATH"
 
       # ── System dependencies ──────────────────────────────────────
-      - name: Cache apt packages
-        uses: actions/cache@v4
-        with:
-          path: /var/cache/apt/archives
-          key: apt-${{ runner.os }}-${{ runner.arch }}-${{ matrix.platform }}-v1
-          restore-keys: |
-            apt-${{ runner.os }}-${{ runner.arch }}-${{ matrix.platform }}-
-
-      - name: Install system dependencies (CPU)
+      - name: Install apt dependencies (CPU)
         if: matrix.platform == 'cpu'
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y \
-            cmake ninja-build \
-            flex bison libfl-dev \
-            libsuitesparse-dev libopenblas-dev \
+        uses: robtaylor/cache-apt-pkgs-action@feat/apt-sources
+        with:
+          packages: >-
+            llvm-18-dev clang-18 libclang-18-dev lld-18
+            cmake ninja-build flex bison libfl-dev
+            libsuitesparse-dev libopenblas-dev
             ccache bc
+          apt-sources: |
+            https://apt.llvm.org/llvm-snapshot.gpg.key | deb http://apt.llvm.org/noble/ llvm-toolchain-noble-18 main
+          execute_install_scripts: true
 
-      - name: Install system dependencies (CUDA)
+      - name: Install apt dependencies (CUDA)
         if: matrix.platform == 'cuda'
+        uses: robtaylor/cache-apt-pkgs-action@feat/apt-sources
+        with:
+          packages: >-
+            llvm-18-dev clang-18 libclang-18-dev lld-18
+            cuda-nvcc-12-6 cuda-cudart-dev-12-6 cuda-driver-dev-12-6
+            libcublas-dev-12-6 libcusolver-dev-12-6 libcusparse-dev-12-6
+            libnvjitlink-dev-12-6
+            cuda-libraries-12-6 cuda-cupti-12-6
+            libcudnn9-cuda-12 libcudss0-cuda-12
+            libsuitesparse-dev libopenblas-dev swig cmake pkg-config
+          # NVIDIA CUDA repo is already on the runner image (cuda-archive-keyring.gpg).
+          # Adding it again via apt-sources causes "Conflicting Signed-By" error.
+          apt-sources: |
+            https://apt.llvm.org/llvm-snapshot.gpg.key | deb http://apt.llvm.org/noble/ llvm-toolchain-noble-18 main
+          execute_install_scripts: true
+
+      - name: Set LLVM and CUDA environment
         run: |
-          sudo apt-get update
-          sudo apt-get install -y \
-            cmake pkg-config swig \
-            libsuitesparse-dev libopenblas-dev
-
-      # ── LLVM 18 (idempotent, works on all runners) ──────────────
-      - name: Install LLVM 18
-        run: |
-          if ! llvm-config-18 --version 2>/dev/null; then
-            wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key | \
-              sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc > /dev/null
-            sudo chmod a+r /etc/apt/trusted.gpg.d/apt.llvm.org.asc
-            wget -q https://apt.llvm.org/llvm.sh
-            chmod +x llvm.sh
-            sudo ./llvm.sh 18
-            rm llvm.sh
-          fi
-          sudo apt-get install -y lld-18
           echo "/usr/lib/llvm-18/bin" >> "$GITHUB_PATH"
           echo "LLVM_SYS_181_PREFIX=/usr/lib/llvm-18" >> "$GITHUB_ENV"
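The new install steps pass `packages` as a folded block scalar (`>-`) and `apt-sources` as a literal block scalar (`|`). Under YAML's folding rules, the `>-` lines reach the action as one space-separated string with no trailing newline, while `|` keeps each source on its own line. The sketch below approximates both behaviours in plain Python for the simple case seen here only (every line at the same indentation, no blank lines inside the block); `fold_block` and `literal_block` are illustrative helpers, not a YAML library API.

```python
def fold_block(lines: list[str]) -> str:
    """Approximate YAML '>-' (folded, strip chomping) for same-indent lines
    with no blank lines: line breaks become single spaces, final newline dropped."""
    return " ".join(line.strip() for line in lines)


def literal_block(lines: list[str]) -> str:
    """Approximate YAML '|' (literal, clip chomping): line breaks are kept
    and exactly one trailing newline is retained."""
    return "\n".join(line.strip() for line in lines) + "\n"


# Contents of the CPU step's `packages: >-` block.
CPU_PACKAGES = [
    "llvm-18-dev clang-18 libclang-18-dev lld-18",
    "cmake ninja-build flex bison libfl-dev",
    "libsuitesparse-dev libopenblas-dev",
    "ccache bc",
]
# Contents of the `apt-sources: |` block.
APT_SOURCES = [
    "https://apt.llvm.org/llvm-snapshot.gpg.key | deb http://apt.llvm.org/noble/ llvm-toolchain-noble-18 main",
]

if __name__ == "__main__":
    packages = fold_block(CPU_PACKAGES)
    print(packages)
    print(len(packages.split()), "packages")
    print(repr(literal_block(APT_SOURCES)))
```

Folding is what lets the package list be wrapped across several readable lines in the workflow while the action still receives a single list.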

@@ -134,24 +130,37 @@ jobs:
             openvaf_jax/openvaf_py
             vendor/OpenVAF
 
-      # ── CUDA toolkit ─────────────────────────────────────────────
-      - name: Install CUDA toolkit
+      # ── CUDA environment ─────────────────────────────────────────
+      - name: Set CUDA environment
         if: matrix.platform == 'cuda'
         run: |
-          sudo apt-get install -y cuda-toolkit-12-6 libcudnn9-cuda-12 libcudss0-cuda-12
-          echo "/usr/local/cuda-12.6/bin" >> "$GITHUB_PATH"
-          CUDSS_LIB=$(dpkg -L libcudss0-cuda-12 | grep '\.so' | head -1)
+          CUDA_ROOT=$(find /usr/local -maxdepth 1 -name "cuda-12*" -type d 2>/dev/null | sort -V | tail -1)
+          if [ -z "$CUDA_ROOT" ] && [ -d "/usr/local/cuda" ]; then
+            CUDA_ROOT="/usr/local/cuda"
+          fi
+          EXTRA_LD=""
+          if [ -n "$CUDA_ROOT" ]; then
+            echo "${CUDA_ROOT}/bin" >> "$GITHUB_PATH"
+            # CUDA lib64 has cuSPARSE, cuFFT, etc. needed by JAX at startup.
+            EXTRA_LD="${CUDA_ROOT}/lib64"
+          fi
+          CUDSS_LIB=$(dpkg -L libcudss0-cuda-12 2>/dev/null | grep '\.so' | head -1)
           if [ -n "$CUDSS_LIB" ]; then
             CUDSS_DIR=$(dirname "$CUDSS_LIB")
-            echo "LD_LIBRARY_PATH=${CUDSS_DIR}:${LD_LIBRARY_PATH}" >> "$GITHUB_ENV"
+            EXTRA_LD="${CUDSS_DIR}:${EXTRA_LD}"
             echo "cuDSS library found at: $CUDSS_LIB"
           fi
+          if [ -n "$EXTRA_LD" ]; then
+            echo "LD_LIBRARY_PATH=${EXTRA_LD}:${LD_LIBRARY_PATH}" >> "$GITHUB_ENV"
+          fi
 
       # ── Python environment ──────────────────────────────────────
       - name: Install uv
         uses: astral-sh/setup-uv@v6
         with:
           enable-cache: true
+          prune-cache: false
+          python-version: "3.12"
 
       - name: Set up Python
         run: uv python install 3.12
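The new `Set CUDA environment` step no longer hard-codes `/usr/local/cuda-12.6`: it takes the highest `cuda-12*` directory under `/usr/local` by version sort (`sort -V | tail -1`), falls back to `/usr/local/cuda`, and then builds one combined `LD_LIBRARY_PATH` (cuDSS directory first, then `${CUDA_ROOT}/lib64`, then any inherited value) that it writes to `$GITHUB_ENV` once. Writing once matters because values appended to `$GITHUB_ENV` only become visible in later steps, so a second write in the same step would replace the first rather than extend it. The Python sketch below mirrors that logic; the function names and the cuDSS path in the example are hypothetical. One deliberate difference: it skips empty components, whereas the shell version leaves a trailing `:` when `LD_LIBRARY_PATH` starts out empty, and an empty entry can be interpreted by the dynamic loader as the current directory.

```python
import re


def _version_key(path: str) -> tuple[int, ...]:
    """Numeric sort key, e.g. '/usr/local/cuda-12.6' -> (12, 6), similar to `sort -V`."""
    suffix = path.rsplit("cuda-", 1)[-1]
    return tuple(int(part) for part in re.findall(r"\d+", suffix))


def pick_cuda_root(candidates: list[str], default_exists: bool) -> str:
    """Highest-versioned cuda-12* directory, else /usr/local/cuda if present, else ''."""
    if candidates:
        return max(candidates, key=_version_key)
    return "/usr/local/cuda" if default_exists else ""


def compose_ld_library_path(cuda_root: str, cudss_dir: str, existing: str) -> str:
    """cuDSS dir first, then CUDA lib64, then the inherited value; empty parts skipped."""
    parts = [cudss_dir, f"{cuda_root}/lib64" if cuda_root else "", existing]
    return ":".join(part for part in parts if part)


if __name__ == "__main__":
    # Version order picks 12.10 over 12.6, unlike a plain string sort.
    root = pick_cuda_root(["/usr/local/cuda-12.6", "/usr/local/cuda-12.10"], default_exists=True)
    print(root)
    print(compose_ld_library_path(root, "/opt/hypothetical/cudss/lib", ""))
```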
@@ -193,7 +202,12 @@ jobs:
         working-directory: vajax/sparse
         run: |
           uv pip install scikit-build-core nanobind
-          uv pip install --no-build-isolation .
+          LAUNCHER=${{ matrix.platform == 'cuda' && 'sccache' || 'ccache' }}
+          uv pip install --no-build-isolation \
+            -C cmake.define.BLA_VENDOR=OpenBLAS \
+            -C cmake.define.CMAKE_C_COMPILER_LAUNCHER=$LAUNCHER \
+            -C cmake.define.CMAKE_CXX_COMPILER_LAUNCHER=$LAUNCHER \
+            .
 
       # ── Run VAJAX unit tests (CPU dense only) ────────────────────
       - name: Run VAJAX tests
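Both `timeout-minutes` and the new `LAUNCHER` line use the GitHub Actions `cond && a || b` idiom as a ternary. It works because `&&` and `||` return one of their operands, but it silently yields `b` whenever `a` is itself falsy (for example `0` or `''`); here `360` and `'sccache'` are truthy, so both expressions behave as intended. The Python sketch below models the two operators with hypothetical helpers `gha_and`/`gha_or` (Python truthiness matches GitHub's falsy values `false`, `0`, `''` and `null`, though not `NaN`), and shows how each `-C cmake.define.NAME=VALUE` config setting corresponds roughly to a CMake `-DNAME=VALUE` definition in scikit-build-core; `to_cmake_defines` is an illustrative helper, not scikit-build-core's API.

```python
def gha_and(left, right):
    """GitHub Actions `left && right`: return `left` if falsy, else `right`."""
    return right if left else left


def gha_or(left, right):
    """GitHub Actions `left || right`: return `left` if truthy, else `right`."""
    return left if left else right


def launcher(platform: str) -> str:
    # ${{ matrix.platform == 'cuda' && 'sccache' || 'ccache' }}
    return gha_or(gha_and(platform == "cuda", "sccache"), "ccache")


def timeout_minutes(solver: str, platform: str) -> int:
    # ${{ matrix.solver == 'sparse' && matrix.platform == 'cuda' && 360 || 90 }}
    # `&&` binds tighter than `||`, so this is ((a && b) && 360) || 90.
    return gha_or(gha_and(gha_and(solver == "sparse", platform == "cuda"), 360), 90)


def to_cmake_defines(settings: dict[str, str]) -> list[str]:
    """Map `cmake.define.NAME=VALUE` config settings to `-DNAME=VALUE` flags."""
    prefix = "cmake.define."
    return [f"-D{key[len(prefix):]}={value}" for key, value in settings.items() if key.startswith(prefix)]


if __name__ == "__main__":
    print(launcher("cuda"), launcher("cpu"))
    print(timeout_minutes("sparse", "cuda"), timeout_minutes("dense", "cuda"))
    # Pitfall: a falsy middle operand falls through to the fallback value.
    print(gha_or(gha_and(True, 0), 90))
    print(to_cmake_defines({
        "cmake.define.BLA_VENDOR": "OpenBLAS",
        "cmake.define.CMAKE_C_COMPILER_LAUNCHER": launcher("cuda"),
        "cmake.define.CMAKE_CXX_COMPILER_LAUNCHER": launcher("cuda"),
    }))
```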
.github/workflows/cache-cleanup.yml (23 changes: 23 additions, 0 deletions)
@@ -0,0 +1,23 @@
+name: Cache Cleanup
+
+on:
+  schedule:
+    - cron: '0 0 1 */3 *' # First day of every 3rd month
+  workflow_dispatch:
+
+jobs:
+  cleanup:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Delete uv caches
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          echo "=== Deleting uv caches ==="
+          gh cache list --repo ${{ github.repository }} --key setup-uv --json id,key,sizeInBytes \
+            --jq '.[] | "\(.id)\t\(.key)\t\(.sizeInBytes)"' | \
+          while IFS=$'\t' read -r id key size; do
+            echo "Deleting: $key ($(numfmt --to=iec $size))"
+            gh cache delete "$id" --repo ${{ github.repository }}
+          done
+          echo "Done"
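The cleanup job lists every Actions cache whose key starts with `setup-uv` (the `--key` filter of `gh cache list` matches a key prefix), prints each key with its size in 1024-based units via `numfmt --to=iec`, and deletes it by ID. Its cron expression `0 0 1 */3 *` fires at 00:00 UTC on the 1st of January, April, July and October. The Python sketch below reproduces the selection, size formatting and month schedule on in-memory data; the sample entries and all helper names are made up for illustration, and `iec` only approximates `numfmt` output. One detail worth checking in the real step: `gh cache list` fetches at most 30 caches unless `--limit` is raised, so a single run may not clear a larger backlog.

```python
def select_caches(entries: list[dict], prefix: str = "setup-uv") -> list[dict]:
    """Keep entries whose key starts with `prefix`, like `gh cache list --key`."""
    return [entry for entry in entries if entry["key"].startswith(prefix)]


def iec(size: int) -> str:
    """Rough analogue of `numfmt --to=iec`: 1024-based units, one decimal from K up."""
    value = float(size)
    for unit in ("", "K", "M", "G"):
        if value < 1024:
            return f"{int(value)}" if unit == "" else f"{value:.1f}{unit}"
        value /= 1024
    return f"{value:.1f}T"


def cron_step_months(step: int = 3) -> list[int]:
    """Months matched by the cron month field `*/step` over its 1-12 range."""
    return list(range(1, 13, step))


if __name__ == "__main__":
    sample = [  # made-up data in the shape of `--json id,key,sizeInBytes`
        {"id": 1, "key": "setup-uv-example-key", "sizeInBytes": 700 * 1024**2},
        {"id": 2, "key": "apt-Linux-X64-cuda-v1", "sizeInBytes": 1536},
    ]
    for entry in select_caches(sample):
        print(f"Deleting: {entry['key']} ({iec(entry['sizeInBytes'])})")
    print(cron_step_months())
```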
.github/workflows/lint.yml (6 changes: 6 additions, 0 deletions)
@@ -6,6 +6,10 @@ on:
   pull_request:
     branches: [main]
 
+concurrency:
+  group: lint-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
 jobs:
   lint:
     runs-on: ubuntu-latest
@@ -17,6 +21,8 @@ jobs:
         uses: astral-sh/setup-uv@v6
         with:
           enable-cache: true
+          prune-cache: false
+          python-version: "3.11"
 
       - name: Set up Python
         run: uv python install 3.11