Releases: JuliaGPU/JACC.jl
v1.0.0
What's Changed
- Use explicit type for dims in ParallelReduce by @PhilipFackler in #292
- Fix backend bits by @PhilipFackler in #293
- Fix type instability causing crash in 2D parallel_for on AMDGPU by @PhilipFackler in #299
- Add to_device and create_stream by @PhilipFackler in #300
- Versions of `array()` for allocating uninitialized arrays by @PhilipFackler in #301
- Add Apple GPU CI on ExCL by @williamfgc in #305
- Fix runners in CI by @williamfgc in #307
- Metal backend by @williamfgc in #306
- Use correct arch label in CI by @williamfgc in #308
- Update README by @williamfgc in #310
- Correct dimensions for `JACC.shared` by @PhilipFackler in #309
- Use explicit type for workspace member to avoid type instability by @PhilipFackler in #313
- Add basic macro syntax by @PhilipFackler in #312
- Refactored `ParallelReduce` outer constructors into JACC.reducer by @PhilipFackler in #314
- Update AMDGPU perf-test kernel by @luraess in #315
- Custom ranges for parallel_for and parallel_reduce by @PhilipFackler in #316
- Repo GPU CI JuliaORNL to JuliaGPU by @williamfgc in #318
- Fix ReadMe by @williamfgc in #317
- Fix documentation link in badge by @williamfgc in #319
- Update deploydocs org to JuliaGPU by @williamfgc in #321
- Add API documentation by @williamfgc in #320
- Update api_usage.md by @PhilipFackler in #322
New Contributors
Full Changelog: v0.6.0...v1.0.0
v0.6.0
What's Changed
- Skip set_backend if passed the same backend by @PhilipFackler in #251
- Add ubuntu arm runner by @PhilipFackler in #250
- Add project documentation by @williamfgc in #252
- Add docs badge and acknowledgement by @williamfgc in #253
- Add missing `synchronize` implementation by @PhilipFackler in #256
- Improvements to "threads" backend by @PhilipFackler in #257
- Occupancy with oneAPI by @PhilipFackler in #260
- Fixed incorrect condition in AMDGPU LaunchSpec parallel_for by @PhilipFackler in #263
- Update NVIDIA CI runner by @williamfgc in #269
- Managed/Unmanaged reduce workspace for GPU backends by @PhilipFackler in #270
- Use -1 to signal default shmem_size by @PhilipFackler in #272
- Added to_host function by @PhilipFackler in #273
- Add GTX1080 CI workflow by @williamfgc in #274
- Docs badge by @williamfgc in #275
- Implement JACC.Multi for oneAPI by @PhilipFackler in #279
- Add `@inline` to default parallel_for by @williamfgc in #282
- N-dimensional versions and API update by @PhilipFackler in #276
- Allow more manipulation of backends by @PhilipFackler in #277
- Add JACC.Async implementations by @PhilipFackler in #278
- Simplify parallel_reduce implementation by @PhilipFackler in #284
- Test cleanup and benchmarks by @PhilipFackler in #246
- Add `Op` type parameter to `ParallelReduce` by @PhilipFackler in #287
- Updated version to 0.6.0 by @PhilipFackler in #288
- Fixed compat entries by @PhilipFackler in #289
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's Changed
- Added JACC.Async for CUDA backend. JACC.Async.copy does not work XD by @pedrovalerolara in #227
- Fix scoping for parallel_for in threads impl by @PhilipFackler in #229
- Fix conditions in 2d reduce kernel by @PhilipFackler in #231
- `shared(::AbstractArray)` and `sync_workgroup()` by @PhilipFackler in #202
- JACC.Multi API updates by @PhilipFackler in #228
- Use max shmem device props by @PhilipFackler in #235
- Added API functions for do-style syntax by @PhilipFackler in #241
- Scope synchronize properly for threads by @PhilipFackler in #243
- Added `fill` function by @PhilipFackler in #244
- Fix AMDGPU version by @williamfgc in #239
- Update README by @williamfgc in #238
- Refactor parallel_reduce for do syntax by @PhilipFackler in #248
- Updated version to 0.5.0 by @PhilipFackler in #249
Full Changelog: v0.4.0...v0.5.0
v0.4.0
What's Changed
- Add `LaunchSpec` versions of `parallel_reduce` by @PhilipFackler in #216
- Better occupancy for 2D parallel_for and prevent oversubscription by @PhilipFackler in #224
- Release v0.4.0 by @PhilipFackler in #225
Full Changelog: v0.3.1...v0.4.0
v0.3.1
What's Changed
- Replaced while loops in reduce kernels by @PhilipFackler in #206
- Change `shmem_size` to use thread count (like AMDGPUExt) by @PhilipFackler in #210
- Release v0.3.1 by @PhilipFackler in #211
Full Changelog: v0.3.0...v0.3.1
v0.3.0
What's Changed
- Bump to Julia 1.11.3 in cousteau by @williamfgc in #197
- Reorganize source code and modules by @PhilipFackler in #193
- Release v0.3.0 by @PhilipFackler in #204
- Make sure comparison is in host memory in `shared` test case by @PhilipFackler in #208
Full Changelog: v0.2.1...v0.3.0
v0.2.1
What's Changed
- Install backend on `set_backend` by @PhilipFackler in #195
- Release v0.2.1 by @PhilipFackler in #196
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's Changed
- Added JACCASYNC API for threads.jl backend. Note that JACC.Async work… by @pedrovalerolara in #161
- Bump Atomix to 1.0.1 by @williamfgc in #165
- Remove rocm by @williamfgc in #166
- Bump AMDGPU CI version by @williamfgc in #168
- Update CI labels by @williamfgc in #169
- Update README info by @williamfgc in #164
- Added @init_backend for user convenience. by @PhilipFackler in #170
- Update `parallel_reduce` by @PhilipFackler in #173
- Remove `Array` type and add `array` function by @PhilipFackler in #177
- Fix bug in oneAPI 1D reduce by @PhilipFackler in #182
- WIP: AMDGPU compute occupancy by @PhilipFackler in #178
- WIP: Better blocks/threads calculations for CUDA backend by @PhilipFackler in #136
- Add parallel_for API with keyword struct by @PhilipFackler in #188
- Use computed occupancy for amdgpu parallel_reduce by @PhilipFackler in #190
- Release v0.2.0 by @PhilipFackler in #191
Full Changelog: v0.1.1...v0.2.0
v0.1.1
What's Changed
- Update to macos-latest for github actions by @PhilipFackler in #152
- Update to macos-latest for github actions by @PhilipFackler in #153
- Switched to TestItemRunner and enabled selecting tests by name or tag by @PhilipFackler in #154
- Updated JACC.BLAS test by @PhilipFackler in #155
- Update oneAPI testing by @PhilipFackler in #151
- Fixed `@maybe_threaded` to work as intended with precompilation by @PhilipFackler in #158
- Release v0.1.1 by @PhilipFackler in #159
Full Changelog: v0.1.0...v0.1.1
JACC v0.1.0
What's Changed
- Reorder GPU grid indices by @williamfgc in #104
- Reorder AMDGPU gridsize by @williamfgc in #105
- Revert reordering by @williamfgc in #106
- swapped thread dimension by @ygtangg in #107
- Changed thread dimension to: 1-32-32 by @ygtangg in #108
- Reverted GPU thread dimension: 32-32-1 by @ygtangg in #109
- Promoted JACC.shared for CUDA backend. Added test case in tests_cuda by @pedrovalerolara in #111
- Promoted JACC.shared for AMDGPU backend. Test added by @pedrovalerolara in #112
- Promoted JACC.shared OneAPI implementation. Added testing. by @pedrovalerolara in #113
- Blas level1 by @hetmankad in #110
- Added JACC.multi for CUDA. Testing doesn't work, I got a segmentation… by @pedrovalerolara in #116
- Added JACC.multi implementation for the AMDGPU backend, non-included … by @pedrovalerolara in #124
- Update JACCMULTI.jl by @pedrovalerolara in #125
- Update to latest checkout action by @williamfgc in #126
- Added stencil-aware functions for JACC.multi on CUDA back ends. There … by @pedrovalerolara in #127
- Integrate other PRs for fixing extensions by @PhilipFackler in #123
- Bump Julia to 1.10 on AMD GPU CI on cousteau by @williamfgc in #128
- Added ghost cells support for JACC.multi on AMDGPU backend. There is … by @pedrovalerolara in #132
- Added range checks to parallel_for implementations by @PhilipFackler in #131
- Moved and refactored most tests as common portable versions by @PhilipFackler in #135
- Custom operators for `parallel_reduce` by @PhilipFackler in #120
- Capitalized modules Multi and Experimental and applied formatting by @PhilipFackler in #146
- Release v0.1.0 by @PhilipFackler in #147
- Fixed [compat] entry for julia by @PhilipFackler in #148
New Contributors
- @ygtangg made their first contribution in #107
- @hetmankad made their first contribution in #110
Full Changelog: v0.0.5...v0.1.0