Releases · chaxu01/llama.cpp

27 Aug 11:10

1e74897

b6298 Latest

CANN: refactor mask handling and improve performance in FA (#15561)

* CANN(flash-attn): refactor mask handling and improve performance

1. Refactored the mask computation in Flash Attention and unified the logic so that prefill and decode no longer take separate paths.
2. Improved performance in non-ALiBi scenarios by removing one repeat operation.
3. Updated operator management to explicitly mark the unsupported cases: 310P devices, and any dim that is not divisible by 16.

Signed-off-by: noemotiovon <757486878@qq.com>

* [CANN]: fix review

Signed-off-by: noemotiovon <757486878@qq.com>

* [CANN]: Optimize FA BNSD to BSND

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>
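
The third item of the first commit above (explicitly marking unsupported cases) can be pictured as a small capability check. The following is only an illustrative sketch, not the backend's actual code: the function name `fa_case_supported`, the device-name strings, and the substring match on "310P" are all hypothetical, and `dim` stands for whichever dimension the commit message refers to.

```python
def fa_case_supported(device_name: str, dim: int) -> bool:
    """Hypothetical check mirroring the two conditions named in the commit message.

    Returns False for the cases the release notes describe as unsupported:
    310P devices, and any dim that is not a multiple of 16.
    """
    # Assumption: the device family can be recognised from its name string.
    if "310P" in device_name.upper():
        return False
    # Unsupported when dim is not divisible by 16.
    return dim % 16 == 0


# Illustrative inputs only; the device names are made up for this sketch.
print(fa_case_supported("Ascend910B", 128))
print(fa_case_supported("Ascend310P3", 128))
print(fa_case_supported("Ascend910B", 72))
```

Rejecting these cases up front, rather than failing inside the kernel, lets a caller fall back to another code path for that operator.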

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-08-27T11:10:16Z

llama-b6298-bin-macos-arm64.zip
sha256:4d3ec49c5850c07e78f117fe56ca39e6dba6c9cbabd490591ff48887b7108421
10.9 MB 2025-08-27T11:10:31Z

llama-b6298-bin-macos-x64.zip
sha256:34c56e374f01617a13d1054a380b0a9d49b04dd62898b91d4df97ca455212249
28.1 MB 2025-08-27T11:10:33Z

llama-b6298-bin-ubuntu-vulkan-x64.zip
sha256:bf9b604781eafd3922564177b0616926d25c3b1d3524791a2d1dedbeb08e2b9e
25 MB 2025-08-27T11:10:34Z

llama-b6298-bin-ubuntu-x64.zip
sha256:e3467a47d984a03163ac04d381f226a150d010acbe518954a1688bec0018c649
12.9 MB 2025-08-27T11:10:36Z

llama-b6298-bin-win-cpu-arm64.zip
sha256:2377ddc863e84eef848904ca4b347bb614a10f2e87fd66cf5d1807c16ecc25e0
11.1 MB 2025-08-27T11:10:38Z

llama-b6298-bin-win-cpu-x64.zip
sha256:c6dd8dce6a67b42455ae6f84244c3c23dce6e50dc9ce5f5704ea5d9be49b1f49
14.1 MB 2025-08-27T11:10:39Z

llama-b6298-bin-win-cuda-12.4-x64.zip
sha256:e94299f1c96656fefbc05d5533c5c3fa8a9caf67ce5f1081e8e81c409cc73efb
137 MB 2025-08-27T11:10:40Z

llama-b6298-bin-win-hip-radeon-x64.zip
sha256:2e6df2049cad9a9904b68fd771bddc958aa077eb7872febe7ed4a91d9a33cbca
287 MB 2025-08-27T11:10:46Z

llama-b6298-bin-win-opencl-adreno-arm64.zip
sha256:06822ffc418704c51260344f2a6baa11d73b0447ca88ff93886e5a8f88ec94c7
11.5 MB 2025-08-27T11:10:59Z

Source code (zip)
2025-08-27T09:21:41Z

Source code (tar.gz)
2025-08-27T09:21:41Z
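
Each binary asset above is listed with a `sha256:` digest, so a downloaded archive can be checked against it before use. The sketch below shows one way to do that with Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_published_digest` are made up for this example, and it assumes the archive has already been saved locally.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, published: str) -> bool:
    """Compare a local file against a digest written as 'sha256:<hex>' or plain hex."""
    expected = published.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]
    return sha256_of_file(path) == expected
```

For example, assuming `llama-b6298-bin-ubuntu-x64.zip` was saved to the current directory, calling `matches_published_digest("llama-b6298-bin-ubuntu-x64.zip", "sha256:e3467a47d984a03163ac04d381f226a150d010acbe518954a1688bec0018c649")` should return `True` only if the file matches the listing byte for byte.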

14 Jul 09:18

github-actions

b5891

0d92267

llama : add jinja template for rwkv-world (#14665)

* llama : add jinja template for rwkv-world

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Assets 15

18 Jun 09:55

github-actions

b5695

9540255

llama-chat : fix multiple system message for gemma, orion (#14246)

Assets 15
