[Examples] Add MachSuite Benchmarks #287
I think we need to clean up the folder to eliminate the data files. It'd be better to include those inputs in the kernel file or use small random inputs to reduce the testing time.

Yes, I need to clean up the tests. I'll let you know when this PR is ready.

BugBot run
🚨 BugBot failed to run: Remote branch not found for this Pull Request. It may have been merged or deleted (requestId: serverGenReqId_74d9584d-f604-4c85-a057-e061761da4fc).

@zzzDavid Could you find some time to clean up this PR?
Pull request overview
This PR adds MachSuite benchmark implementations as Allo examples. MachSuite is a collection of 19 benchmarks representing low-level kernels suitable for hardware acceleration. The implementation includes various algorithms across different domains including sparse matrix operations, sorting, FFT, graph algorithms, neural networks, and cryptography.
Changes:
- Added Allo implementations for multiple MachSuite benchmarks (spmv, mergesort, md, kmp, gemm, fft, bfs, backprop, aes)
- Included test infrastructure and data files for verification
- Provided both basic implementations and optimized versions for some benchmarks
Reviewed changes
Copilot reviewed 54 out of 87 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| examples/machsuite/spmv/ellpack/ellpack.py | ELLPACK sparse matrix-vector multiplication implementation |
| examples/machsuite/spmv/crs/run_test.py | Test runner for CRS sparse matrix format |
| examples/machsuite/spmv/crs/crs.py | CRS sparse matrix-vector multiplication implementation |
| examples/machsuite/mergesort/testing_backup.py | Backup testing file for merge sort with reference implementation |
| examples/machsuite/mergesort/mergesort.py | Merge sort implementation (commented out with debugging notes) |
| examples/machsuite/merge/testing.py | Test runner for merge sort |
| examples/machsuite/merge/mergesort.py | Alternative merge sort implementation |
| examples/machsuite/md/knn/md.py | Molecular dynamics k-nearest neighbor implementation |
| examples/machsuite/md/grid/md.py | Molecular dynamics grid-based implementation |
| examples/machsuite/kmp/kmp.py | Knuth-Morris-Pratt string matching algorithm |
| examples/machsuite/gemm/gemm_ncubed.py | Basic N-cubed GEMM implementation |
| examples/machsuite/gemm/gemm_blocked.py | Blocked GEMM implementation |
| examples/machsuite/fft/transpose/transpose_fft.py | FFT with transpose optimization |
| examples/machsuite/fft/strided/strided_fft.py | FFT with strided memory access |
| examples/machsuite/bfs/bfs_queue_allo.py | Breadth-first search using queue-based approach |
| examples/machsuite/bfs/bfs_bulk_allo.py | Breadth-first search using bulk synchronous approach |
| examples/machsuite/backprop/backprop.py | Neural network backpropagation implementation |
| examples/machsuite/aes/aes.py | AES encryption implementation |
Pull request overview
Copilot reviewed 64 out of 64 changed files in this pull request and generated 65 comments.
zzzDavid left a comment
Thanks for the review. Addressed the valid findings:

Fixed:
- Removed duplicate imports in `transpose_fft.py` (`float32`, `int32` imported twice)
- Removed unused `cmplx_MUL_x/y` functions (identical to `cmplx_M_x/y` and never called)
- Removed commented-out `FFT4` code block in `transpose_fft.py`
- Removed commented-out class definitions in `md/grid/md.py`
- Removed unused imports across 11 files: `numpy`, `math`, `Struct`, `index`, `Int`, `int32`, `float32`, `allo.ir.types as T`

Not applicable (false positives):
- "Call to a non-callable of builtin-class module" (~40 comments): `allo.customize()` is the core API of the Allo framework; these are all valid calls, not errors.
- `N_TOKENS` in `viterbi.py`: standard pattern documenting the emission matrix shape, consistent with `N_OBS` and `N_STATES`, which are used.

Hi @chhzh123 this PR is ready for human review :)
chhzh123 left a comment
I suggest better organizing the files and making them consistent. It'd be better to separate the Allo implementation and the Numpy implementation into two different files, and create a run_test.py for each application.

@chhzh123 Addressed your review feedback:

- Fix allo.sin()/allo.cos(): add missing SinOp to math op dispatch dict and add F64Type support to the type guard in builder.py (MLIR math dialect supports all float types natively)
- Add new implementations: Needleman-Wunsch (nw/) and Radix Sort (sort/radix/)
- Fix all benchmarks: resolve loop iterator pre-declarations, scalar-from-array workarounds, caller variable shadowing, module-level customize interference, AoS data parsing, and hardcoded user paths
- Fix all test scripts to use os.path.dirname(__file__) for portable data paths
- Add stencil data files for stencil2d/stencil3d

All 19 MachSuite benchmark variants build and pass with LLVM backend: AES, Backprop, BFS/Bulk, BFS/Queue, FFT/Strided, FFT/Transpose, GEMM/Ncubed, GEMM/Blocked, KMP, MD/Grid, MD/KNN, MergeSort, NW, RadixSort, SPMV/CRS, SPMV/ELLPACK, Stencil2D, Stencil3D, Viterbi

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 36 .data files (~130K lines) with programmatic input generation and Python/NumPy reference validation in each test. All 19 benchmarks generate their own inputs (seeded random or inline constants) and validate against a Python reference implementation instead of golden data files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
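The pattern this commit describes (seeded random inputs checked against a Python reference instead of golden data files) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `make_inputs`, `merge_sort_kernel`, and `test_merge_sort` are hypothetical names, and the pure-Python sort stands in for a built Allo kernel.

```python
import random


def make_inputs(n, seed=0):
    """Generate n reproducible integers; a fixed seed makes failures repeatable."""
    rng = random.Random(seed)
    return [rng.randint(-1000, 1000) for _ in range(n)]


def merge_sort_kernel(values):
    """Stand-in for the kernel under test (hypothetical; a real test would
    call the compiled Allo module here instead)."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort_kernel(values[:mid])
    right = merge_sort_kernel(values[mid:])
    out, i, j = [], 0, 0
    # Merge the two sorted halves.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def test_merge_sort(n=64, seed=42):
    """Validate the kernel against Python's built-in sort as the reference."""
    data = make_inputs(n, seed)
    expected = sorted(data)
    assert merge_sort_kernel(data) == expected
```

Because the inputs come from a seeded generator, nothing needs to be checked into the repository, and a failing case can be reproduced from `n` and `seed` alone.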
Add psize.json with "full" and "small" size tiers for all 19 benchmarks, following the polybench pattern. Guard module-level allo.customize()/build() calls so kernel files can be imported without side effects. Add test_*() functions to each benchmark that accept a size parameter, and create test_machsuite.py as the pytest entry point using small sizes for CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
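A rough sketch of the size-tier idea, under stated assumptions: the JSON layout, the `gemm` entry, and the values 64 and 8 are illustrative guesses rather than the contents of the PR's `psize.json`, and `load_sizes`, `gemm_reference`, and `test_gemm` are hypothetical helpers. The reference multiply stands in for the step where a real test would build and run the Allo kernel.

```python
import json

# Hypothetical stand-in for psize.json; the real file's keys and values may differ.
PSIZE_TEXT = """
{
  "gemm": {
    "full":  {"N": 64},
    "small": {"N": 8}
  }
}
"""


def load_sizes(benchmark, tier, text=PSIZE_TEXT):
    """Return the size parameters for one benchmark at one tier."""
    return json.loads(text)[benchmark][tier]


def gemm_reference(a, b, n):
    """Plain-Python n x n matrix multiply used as the reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def test_gemm(tier="small"):
    """Size-parameterized test; defaults to the small tier for CI."""
    n = load_sizes("gemm", tier)["N"]
    a = [[i * n + j for j in range(n)] for i in range(n)]
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    # Multiplying by the identity must return the original matrix.
    assert gemm_reference(a, identity, n) == a


if __name__ == "__main__":
    # Guarded so that importing this file triggers no build or run.
    test_gemm("full")
```

Keeping the expensive call behind the `__main__` guard is what lets a single pytest entry point import each benchmark file and choose the tier itself.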
…te mergesort dir

Remove 14 files (680 lines) not used by the test suite: debug scripts (debug_aes.py, reproduce.py, no-code.py, neg_loop_step.py), file I/O helpers (support.py, read.py, write.py), HLS synthesis variants (*_opt.py), the duplicate mergesort/ directory, and setup-py312.sh. Move top-level imports of removed modules into __main__ guards in generate.py and viterbi.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run the 19 MachSuite benchmark tests (small sizes) alongside the existing polybench benchmarks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When pytest discovers examples/machsuite/ as a directory, it was collecting tests from both test_machsuite.py and individual subdirectory files (34 items), causing module name collisions (md/grid/md.py vs md/knn/md.py) and fatal aborts. The conftest.py ignores all benchmark subdirectories so only test_machsuite.py is collected (19 items). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
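One way to get the behavior described here is pytest's `collect_ignore` variable, which a `conftest.py` can define as a list of paths (relative to its own directory) to skip during collection. The sketch below assumes that mechanism; the PR's actual `conftest.py` may be written differently, and `benchmark_dirs` is a hypothetical helper.

```python
# conftest.py (sketch), assumed to sit next to test_machsuite.py
from pathlib import Path


def benchmark_dirs(root):
    """Return the sorted names of root's immediate subdirectories,
    skipping hidden and dunder directories such as __pycache__."""
    return sorted(
        p.name
        for p in Path(root).iterdir()
        if p.is_dir() and not p.name.startswith((".", "__"))
    )


# pytest reads this module-level list and skips the listed paths, so only
# files directly in this directory (e.g. test_machsuite.py) are collected.
# The guard keeps the sketch importable where __file__ is undefined.
collect_ignore = benchmark_dirs(Path(__file__).parent) if "__file__" in globals() else []
```

Ignoring whole subdirectories avoids importing two different files that share a module name, which is the collision the commit message describes for `md/grid/md.py` and `md/knn/md.py`.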
…ed-out code

Address Copilot review findings:
- Remove duplicate float32/int32 imports and unused cmplx_MUL_x/y in transpose_fft.py
- Remove unused imports (numpy, math, Struct, index, Int, int32, float32, T) across 11 files
- Remove commented-out FFT4 block and class definitions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unused `import numpy as np` from ellpack.py, radix_sort.py, gemm_ncubed.py, and nw.py kernel files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reorganize the BFS benchmark into two subfolders matching the other multi-variant benchmarks (md/grid, md/knn, fft/strided, fft/transpose). Each subfolder contains the Allo kernel, Python reference, and run_test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename all test files to `run_test.py` for consistency across benchmarks
- Flatten `sort/radix/` to `radix_sort/` since there's only one sorting algorithm
- Update test_machsuite.py to reflect all path changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Combine run_test_blocked.py into run_test.py so each benchmark has a single test file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Inline the Python reference implementation into run_test.py and remove the separate viterbi.py file. Clean up unused imports in viterbi_allo.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename viterbi_allo.py to viterbi.py
- Merge BFS python references into run_test.py, remove bfs_bulk_python.py and bfs_queue_python.py
- Clean up kernel files: remove __main__ blocks and unused imports

Each benchmark now consistently has <name>.py (Allo kernel) and run_test.py (test + reference implementation).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Every benchmark now follows the same pattern:
Description
MachSuite is a set of 19 benchmarks designed to mimic low-level kernels suitable for hardware acceleration. This PR adds Allo implementations of the MachSuite benchmarks as examples.
Contributors
Francis Pham @fpham0701
Rhoda Ma @rhodama
Raymond Lin @rlin569
William Yoon @wty5
Nicole Li @nicolelii
Juhyoung Lee @Juhyoung29
Yuqiang Ge @YqGe585