
[pull] master from ggml-org:master#809

Merged: pull[bot] merged 2 commits into LongLeCE:master from ggml-org:master on Jan 24, 2026.



pull[bot] commented on Jan 24, 2026:

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


chraac and others added 2 commits January 23, 2026 22:02
* optimize flash attention kernel by improving score computation and online softmax update

* wip

* Refactor online softmax update in flash attention kernel for improved performance

* Optimize flash attention kernel by replacing float array with HVX_Vector for score computation

* wip

* ggml-cuda: add split-wise cuda graph

* add n-cpu-moe compare_llama_bench.py

* fix hip/musa builds
pull[bot] locked and limited conversation to collaborators on Jan 24, 2026.
pull[bot] added the ⤵️ pull label on Jan 24, 2026.
pull[bot] merged commit 81ab64f into LongLeCE:master on Jan 24, 2026.