[pull] master from ggml-org:master by pull[bot] · Pull Request #800 · LongLeCE/llama.cpp

pull · 2026-01-21T20:42:02Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

Change ggml_vk_mul_mat_vec_id_q_f16 to loop over the batch dimension and update the indexing calculations in get_offsets. Mat-vec is faster than mat-mat for small values of n. We don't get the same reuse of the weights as in the non-ID path, but with this the cost is linear in n rather than n>1 being far slower than n==1.
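The batching strategy described in this commit can be illustrated with a small CPU-side sketch. It is not the Vulkan shader or the actual `ggml_vk_mul_mat_vec_id_q_f16` code: the function name `mul_mat_vec_id_ref`, the row-major layout, and the plain `float` types are assumptions made for illustration. It runs one matrix-vector product per batch column, each with the expert selected by `ids`, so the work grows linearly with n.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: one mat-vec per batch column, expert chosen per column.
// weights: n_experts matrices, each rows x cols, row-major, stored back to back.
// x:       n input vectors of length cols, stored back to back.
// ids:     ids[b] is the expert index used for batch column b.
// Returns n output vectors of length rows, stored back to back.
std::vector<float> mul_mat_vec_id_ref(const std::vector<float> & weights,
                                      const std::vector<float> & x,
                                      const std::vector<int>   & ids,
                                      std::size_t rows, std::size_t cols) {
    const std::size_t n = ids.size();
    std::vector<float> y(n * rows, 0.0f);
    for (std::size_t b = 0; b < n; ++b) {  // loop over the batch dimension
        // Offsets depend on the batch index: pick this column's expert and input.
        const float * w  = weights.data() + static_cast<std::size_t>(ids[b]) * rows * cols;
        const float * xb = x.data() + b * cols;
        for (std::size_t r = 0; r < rows; ++r) {
            float acc = 0.0f;
            for (std::size_t c = 0; c < cols; ++c) {
                acc += w[r * cols + c] * xb[c];
            }
            y[b * rows + r] = acc;
        }
    }
    return y;
}
```

Because each column may select a different expert, the weights are not reused across columns the way they are in the non-ID path, which matches the trade-off the commit message describes.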

* from previous PR
* Make instruction(system) as first message
* Convert [input_message] (text/image/file)
* Rename convert_responses_to_chatcmpl(body) -> response_body
* Initial tool call support
* Erase instructions field from chatcmpl body
* Feed reasoning texts to chat template
* Use std::vector instead of opaque json array
* Make output_item.added events consistent
* Move `server_task_result_cmpl_partial::update` from header to source
* Match ID of output_item.added and .done events
* Add function_call only if there is no "fc_" prefix
* Add function call output at non-streaming API
* Test if ID is persistent
* Add doc
* Fix style - use trailing comma
* Rewrite state management
* catch up with upstream/master
* Fix style - "type" is the first item of SSE data
* Explicitly check "instructions" from response_body
* Make lambdas static
* Check if reasoning content exists
* Add `oai_resp_id` to task_result_state (also initialized at ctor), server_task_result_cmpl_partial, and server_task_result_cmpl_final
* Reject `input_file` since it is not supported by chatcmpl
* Add "fc_" prefix to non-streaming function call id as coderabbit pointed out

---------

Co-authored-by: openingnow <>
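As a rough illustration of the "instructions as first message" step in this commit, the sketch below builds a chat-completions style message list from a Responses-style request. The real server operates on JSON bodies; the `chat_message` struct and the `to_chat_messages` helper are hypothetical names introduced here and do not come from the llama.cpp source.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified message type; the real server works on JSON bodies.
struct chat_message {
    std::string role;
    std::string content;
};

// Build a chat-completions style message list from a Responses-style request:
// a non-empty "instructions" text becomes the first message, with role "system",
// and the converted input messages follow in their original order.
std::vector<chat_message> to_chat_messages(const std::string & instructions,
                                           const std::vector<chat_message> & input) {
    std::vector<chat_message> out;
    out.reserve(input.size() + 1);
    if (!instructions.empty()) {
        out.push_back(chat_message{"system", instructions});
    }
    out.insert(out.end(), input.begin(), input.end());
    return out;
}
```

Placing the instructions first keeps them ahead of any user or assistant turns when the chat template is applied.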

* vulkan: Remove transfer_ctx, do everything in compute_ctx. We had a bug where a set_tensor_async (using transfer_ctx) didn't get submitted before the graph_compute (using compute_ctx) that came after it. To avoid this sort of issue, just do everything in compute_ctx. Remove transfer_cmd_pool, which was already unused.
* fix crash with perf logger
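The ordering problem described in this commit can be modeled with a toy example. This is not the Vulkan API or the ggml backend code: `toy_ctx` and both `run_*` functions are hypothetical and only model "recorded but not yet submitted" work. With two contexts, the compute work can be submitted before the pending upload; with one context, recording order is execution order.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a command context: commands are recorded, then run on submit.
struct toy_ctx {
    std::vector<std::function<void()>> cmds;
    void record(std::function<void()> f) { cmds.push_back(std::move(f)); }
    void submit() {
        for (auto & f : cmds) f();
        cmds.clear();
    }
};

// Two contexts: the upload is recorded in one, the compute in the other.
// Submitting compute first means it reads the stale tensor value.
int run_two_contexts() {
    int tensor = 0;
    int result = -1;
    toy_ctx transfer;
    toy_ctx compute;
    transfer.record([&] { tensor = 42; });     // stands in for set_tensor_async
    compute.record([&] { result = tensor; });  // stands in for graph_compute
    compute.submit();                          // upload has not been submitted yet
    transfer.submit();
    return result;
}

// One context: the upload is recorded before the compute, so it always runs first.
int run_single_context() {
    int tensor = 0;
    int result = -1;
    toy_ctx compute;
    compute.record([&] { tensor = 42; });
    compute.record([&] { result = tensor; });
    compute.submit();
    return result;
}
```

In this model `run_two_contexts` observes the stale value while `run_single_context` observes the uploaded one, which is the motivation the commit gives for doing everything in compute_ctx.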

* jinja: support none|string
* Update common/jinja/value.cpp
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-jinja.cpp
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add as_string()

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

jeffbolznv and others added 9 commits January 21, 2026 16:22

Revert "vulkan: force full subgroups for flash attention to fix intel subgroup crash (#17356)" (#18831)

067b8d7

This reverts commit 980b7cd.

vulkan: support flash attention GQA/split_k with small batches (#18938)

33f890e

common : improve error message when HTTPS is missing but required (#18987)

14be5a3

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

llama : clarify nemotron-h.cpp comment about RoPE [no ci] (#18997)

9da3dcd

This commit removes the mention of RoPE in the comment for the Q and K computation as RoPE is not applied.

fix: Use tabular-nums for chat message statistics (#18915)

3802d3c

* fix: Use `tabular-nums` for chat message statistics
* fix: Rebuild WebUI

pull bot locked and limited conversation to collaborators Jan 21, 2026

pull bot added the ⤵️ pull label Jan 21, 2026

pull bot merged commit c301172 into LongLeCE:master Jan 21, 2026

github-actions bot added testing examples python ggml server Vulkan model jinja parser labels Jan 21, 2026

pull[bot] merged 9 commits into LongLeCE:master from ggml-org:master


7 participants
