
[pull] master from ggml-org:master #756

Merged

pull[bot] merged 7 commits into LongLeCE:master from ggml-org:master on Jan 8, 2026
Conversation

pull[bot] commented on Jan 8, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

DocShotgun and others added 7 commits on January 8, 2026
* ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH

  Makes the min_batch_size for triggering op offload configurable via an env var, defaulting to the prior hardcoded value of 32.

  * ggml: read GGML_OP_OFFLOAD_MIN_BATCH once and store it in the device context
  * cann: forward declaration of the device context struct
  * cann: move the offload op check after the device context declaration
  * cuda: fix whitespace

  Co-authored-by: Aman Gupta <amangupta052@gmail.com>
* Add template specialization for kernel_mul_mm_id_map0 with ne20=5 to support models using 5 active experts (e.g., VAETKI)
* vendor : update cpp-httplib to 0.30.0
* common : allow custom headers when downloading
* vulkan: optimize ssm_scan

* fix warp vs subgroup naming

  I added an assert to catch further mismatches, and it found several; those are fixed too.
pull[bot] locked and limited conversation to collaborators on Jan 8, 2026
pull[bot] added the ⤵️ pull label on Jan 8, 2026
pull[bot] merged commit 2524c26 into LongLeCE:master on Jan 8, 2026

6 participants