Skip to content

[pull] master from ggml-org:master#705

Merged
pull[bot] merged 5 commits intoLongLeCE:masterfrom
ggml-org:master
Dec 24, 2025
Merged

[pull] master from ggml-org:master#705
pull[bot] merged 5 commits intoLongLeCE:masterfrom
ggml-org:master

Conversation

@pull
Copy link

@pull pull bot commented Dec 24, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

yoka and others added 5 commits December 24, 2025 17:19
Move the graph property checking code into methods of LRU cache.

Signed-off-by: Wang Weixuan <wangweixvan@gmail.com>
* model: llama-embed-nemotron

* minor: python lint

* changed arch-name

* templated llm_build_llama to be used for both llama and llama-embed arch
* CUDA: experimental native mxfp4 support for blackwell

* optimize load_tiles

* optimize quantize_mxfp4

* cleanup

* first pass review: formatting

* use interleaved layout for mma

* mmq: add assert for size

* use __nv_fp4x4_e2m1

* use iter_k as 512, cleanup

* Use 1200 as blackwell instead of 1000

* address review comments

* mmq: fix stride

* quantize.cu: use reference impl of e8m0 scale

* address review comments

* add 120f-virtual + minor fixes

---------

Co-authored-by: Aman Gupta <aman>
@pull pull bot locked and limited conversation to collaborators Dec 24, 2025
@pull pull bot added the ⤵️ pull label Dec 24, 2025
@pull pull bot merged commit c8a2417 into LongLeCE:master Dec 24, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants