Pull requests: jd-opensource/xllm
Pull requests list
feat: support index cache transfer in PD separate deployment scenario.
#895
opened Feb 7, 2026 by
sunbaosong
feat: adapt for CANN 8.5 and PyTorch 2.7.1 for npu device.
#891
opened Feb 6, 2026 by
haimbb000
feat: add beam search for recommendation in cuda env.
#890
opened Feb 6, 2026 by
liujinguang0125
bugfix: fix mrope position tensor shape mismatch in NPU graph mode.
#885
opened Feb 4, 2026 by
QwertyJack
2 tasks done
feat: add VMM submitter APIs for non-blocking vmm::map/unmap.
#874
opened Feb 3, 2026 by
shifengmin
feat: add kernels that support Qwen3 model on musa device.
#856
opened Feb 2, 2026 by
FleckyFelix
feat: implement rpc interface in APIService for xllm service internal usage.
#837
opened Jan 30, 2026 by
weizhehuang0827
bugfix: fix MTP k>1 crash by loading embed_tokens weights
#836
opened Jan 29, 2026 by
QwertyJack
Draft
3 tasks done
feat: support QwenImageEditPlus pipeline with embedding infer.
#834
opened Jan 29, 2026 by
shan-chen-feng
bugfix: fix KV cache memory leak when prefix cache is enabled.
#829
opened Jan 29, 2026 by
QwertyJack
Draft
3 tasks
bugfix: get a suitable token budget allocated for the sequence when MTP is enabled with overlap.
#823
opened Jan 28, 2026 by
RobbieLeung
bugfix: fix missing M-RoPE section in GLM-4V model args.
#822
opened Jan 28, 2026 by
wly-115