[pull] master from ggml-org:master by pull[bot] · Pull Request #809 · LongLeCE/llama.cpp

pull · 2026-01-24T08:42:02Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


* optimize flash attention kernel by improving score computation and online softmax update
* wip
* Refactor online softmax update in flash attention kernel for improved performance
* Optimize flash attention kernel by replacing float array with HVX_Vector for score computation
* wip

* ggml-cuda: add split-wise cuda graph
* add n-cpu-moe compare_llama_bench.py
* fix hip/musa builds

chraac and others added 2 commits January 23, 2026 22:02

ggml-cuda: enable cuda-graphs for n-cpu-moe (#18934)

81ab64f


pull bot locked and limited conversation to collaborators Jan 24, 2026

pull bot added the ⤵️ pull label Jan 24, 2026

pull bot merged commit 81ab64f into LongLeCE:master Jan 24, 2026

github-actions bot added the Nvidia GPU, python, ggml, and script labels Jan 24, 2026

pull[bot] merged 2 commits into LongLeCE:master from ggml-org:master
