Pr/step3.5 flash by Nexesenex · Pull Request #338 · Nexesenex/croco.cpp

Nexesenex · 2026-02-06T15:45:52Z

…gml-org#19376) The CPU and CUDA backends use fp16 for the VKQ accumulator type; this change does the same for Vulkan. This helps particularly with large head sizes, which are very register-limited. I tried this for the coopmat1 path and it slowed down a bit. I didn't try it for scalar. I applied the softmax bias that the CUDA backend uses to avoid overflow, although I was not able to reproduce the original bug without it.
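As a rough illustration of why a softmax bias can protect an fp16 accumulator, here is a minimal Python sketch. It is not the Vulkan or CUDA code: `to_f16`, `attend`, the scalar values, and the bias of `3 * ln 2` are all assumptions made for this example. The idea shown is only that shifting the exponent by a constant scales the weights and the normaliser by the same factor, so the result is unchanged while the accumulator stays smaller.

```python
import math
import struct


def to_f16(x: float) -> float:
    """Round x to the nearest fp16 value; return +/-inf when it does not fit."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def attend(scores, values, bias=0.0):
    """Softmax-weighted average of scalar values using an fp16 accumulator.

    `bias` (hypothetical parameter) is added to the maximum score before
    exponentiation, so every weight is scaled by exp(-bias). The normaliser
    is scaled the same way, which leaves the quotient unchanged.
    """
    m = max(scores) + bias
    acc = 0.0    # emulated fp16 accumulator, standing in for VKQ
    denom = 0.0  # sum of weights, kept in higher precision
    for s, v in zip(scores, values):
        w = math.exp(s - m)
        acc = to_f16(acc + to_f16(w * v))
        denom += w
    return acc / denom


scores = [0.0] * 100
values = [1024.0] * 100
print(attend(scores, values))                    # accumulator exceeds the fp16 range
print(attend(scores, values, 3 * math.log(2)))   # accumulator stays finite
```

With equal scores every unbiased weight is 1, so the accumulator has to hold the plain sum of the values; the bias shrinks each term by a factor of about 8 before it is added.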

* kimi linear model implementation
* kimi linear convert_hf_to_gguf
* kimi linear constants.py tensor_mapping.py
* Kimi Linear ggml.h
* kimi linear ggml-cpu
* Kimi Linear ggml-cuda
* Kimi Linear ggml.c
* kimi linear src/llama
* remove "const int64_t n_seq_tokens = q->ne[2];" to get rid of unused variable warning
* remove type mismatch warning
* read MoE params
* removed some hard coded code
* removed all hard code
* use DeepseekV2 tokenizer
* removed unnecessary internal methods called by the old set_vocab of KimiLinear
* rewrite get_vocab for KimiLinear. Removed all kda_scan code
* removed all traces of kda_scan
* reduce OP count by 1 due to removal of kda_scan
* Move KIMI_LINEAR to llm_arch_is_hybrid to enable KV cache
* set n_embd_head_k/v to ensure kv cache works
* don't quantize conv1d of Kimi Linear
* Kimi Linear backend agnostic
* removed LOG_INFO
* naive chunking form implemented
* fixed some comments
* add Kimi-K2 specific tokens to be recognized as EOG
* build_kda_autoregressive is implemented to replace build_kda_recurrent for faster inference. sync'd to b7682
* replaced Akk and Aqk with mul_mat and clamp
* no clamp version
* Moved Aqk computation out of the loop
* fixed typo and split wkv_b into wk_b and wv_b
* MLA KV cache support
* fix trailing spaces
* moved const llama_model & model; around to follow qwen3next format and see if it can pass the -Wunused-private-field error
* fix trailing whitespace
* removed trailing whitespaces in empty line + make sure indentation is multiple of 4
* try to make lint happy
* remove blank lines to make lint happy
* removed at least blank line containing white space
* fixed flake8 complaints locally
* return ggml_tensor * pair in kda_autoregressive and kda_chunking as in ngxson's Qwen3Next improvement
* removed Kimi-Linear specific change that causes failure at server-windows
* removed private: from kimi_linear to make build checks happy
* removed unnecessary ggml_cont before ggml_reshape
* created static function causal_conv1d to abstract similar code for q/k/v
* merged dt_bias to SSM_DT. Do -exp(log_A) in convert_hf_to_gguf.py.
* reverted to original
* fixed find_hparam calls. Fixed e_score_correction_bias to use bias instead of weight. Removed all ssm_conv bias terms.
* remove DT_B from constants.py. remove one comment line in llama-model.cpp
* new class llm_graph_input_mem_hybrid_k to get around the new MLA change. switch the concat order of ggml_concat calls in kimi-linear.cpp to accommodate MLA changes. Removed support for exp_probs_b.weight
* remove ssm_o_norm_b
* remove ssm_o_norm_b
* changed hparams.kda_head_dim to hparams.n_embd_head_kda. added TODO comment for class llama_graph_mem_hybrid_k
* removed all ggml_cont before ggml_reshape_4d
* Whitespace
* replaced all hparams.get with find_hparams
* added new names for n_experts, n_experts_used and score_func in TextModel and removed their code in KimiLinear in convert_hf_to_gguf.py. Removed unnecessary ggml_cont and GGML_ASSERT in kimi-linear.cpp
* use is_mla to switch between different mem_hybrid types
* fixed logical errors in convert_hf_to_gguf.py pointed out by CISC
* removed if else for required parameters kv_lora_rank and qk_rope_head_dim
* add back ggml_cont for Vcur
* minor changes
* removed extra line in llama-vocab.cpp. Added back the comment in llama-graph.cpp
* f16 gguf cannot run without context length
* made a mistake of adding back n_ctx parsing

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
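The Kimi Linear commit message mentions a shared `causal_conv1d` helper for q/k/v but does not show it. For readers unfamiliar with the operation, the following plain-Python sketch shows what a depthwise causal 1-D convolution computes. The data layout (time-major lists), the tap order, and the zero left-padding are assumptions for illustration; the actual ggml graph code, including any cached convolution state, is not reproduced here.

```python
def causal_conv1d(x, weight):
    """Depthwise causal 1-D convolution (illustrative sketch only).

    x:      list of T time steps, each a list of C channel values
    weight: list of C kernels, each a list of K taps; the last tap
            multiplies the current time step
    """
    T, C = len(x), len(x[0])
    K = len(weight[0])
    out = []
    for t in range(T):
        row = []
        for c in range(C):
            acc = 0.0
            for j in range(K):
                src = t - (K - 1) + j  # never exceeds t, so no look-ahead
                if src >= 0:           # implicit zero padding on the left
                    acc += weight[c][j] * x[src][c]
            row.append(acc)
        out.append(row)
    return out


x = [[1.0], [2.0], [3.0]]   # T = 3 steps, C = 1 channel
w = [[1.0, 1.0]]            # K = 2 taps
print(causal_conv1d(x, w))  # each output sums the current and previous input
```

Because each channel has its own kernel and no output depends on later inputs, the same routine can be applied to the q, k, and v projections independently.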

ggerganov and others added 21 commits February 6, 2026 09:25

metal : skip loading all-zero mask (ggml-org#19337)

7fcf1ef

* metal : skip loading all-zero mask
* cont : minor

vulkan: make FA mask/softcap enables spec constants (ggml-org#19309)

f9bd518

* vulkan: make FA mask/softcap enables spec constants
* don't specialize for sinks
* bump timeout a little bit

Support Step3.5-Flash

02092c6

fix: norm.weight + 1 (HF zero_centered=true)

5c7c683

step35: simplify GGUF conversion + drop redundant rope KVs

d9d7431

Address review feedback

34a4d1a

rename limits -> clamp

60ab182

Apply suggestions from code review

0293d36

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Apply suggestion from @CISC

ff62b6c

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

rename swiglu limits -> swiglu clamp in LLM_KV

512a735

avoid CI fail

19cfffe

Apply suggestions from code review

f7ca995

Apply suggestions from code review

aea967f

disabled KV shifting for LLM_ARCH_STEP35

4e6e242

Apply suggestions from code review

a7e96cf

mistakenly removed cmath

46e8431

add model size && apply missed suggestion

f542d91

assert partial_rotary_factors

430da16

fix CI errors:

402fc2e
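The commit "fix: norm.weight + 1 (HF zero_centered=true)" (5c7c683) suggests that the Hugging Face checkpoint stores norm weights as offsets from 1. A minimal sketch of that relationship follows, assuming an RMS-style norm; the function names, the `eps` value, and the choice to fold the `+ 1` in at conversion time are hypothetical and not taken from the PR.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Standard RMS norm: scale x by 1/rms(x), then by a per-element gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]


def zero_centered_rms_norm(x, weight, eps=1e-6):
    """Zero-centered variant: the stored weight is an offset from 1."""
    return rms_norm(x, [w + 1.0 for w in weight], eps)


def convert_norm_weight(weight):
    """Fold the +1 into the tensor once, so a standard norm can be used later."""
    return [w + 1.0 for w in weight]


x = [1.0, 2.0, 3.0]
stored = [0.0, 0.5, -0.25]  # zero-centered weights, as in the checkpoint
same = zero_centered_rms_norm(x, stored) == rms_norm(x, convert_norm_weight(stored))
print(same)
```

Skipping the `+ 1` would make a zero-initialised weight wipe out the activation instead of passing it through unchanged, which is the kind of error the commit title points at.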

Nexesenex merged commit ef75147 into Nexesenex:Step35 Feb 6, 2026
60 of 77 checks passed

Nexesenex merged 21 commits into Nexesenex:Step35 from forforever73:pr/step3.5-flash


6 participants
