Just stopping by to thank the author and report the hardware configuration I used to reproduce the results

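Hardware:
CPU: EPYC 9654 ×1
Motherboard: Gigabyte MZ33-AR0
Memory: Samsung DDR5 64 GB ×12
GPU: 4090 24 GB, blower-style
Storage: Samsung M.2 4 TB
Case: cardboard ATX case
PSU: Great Wall 2200 W, Gold-rated
The lousy motherboard's memory slots block the GPU, so I also had to add a PCIe x16 riser cable.
BIOS: NPS=1, SMT off, AVX512 on, core count auto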

OS1: Windows Server 2022
OS2: Ubuntu 24.04
Driver: NVIDIA 570
CUDA Toolkit: 12.4.1
Python: 3.12
Git
openai library
flashinfer
AI model: unsloth DeepSeek 671B Q4 in multi-part GGUF files, downloaded from Hugging Face

Launch command:
export HF_ENDPOINT="https://hf-mirror.com"
python -m ktransformers.local_chat \
	--model_path deepseek-ai/DeepSeek-R1 \
	--gguf_path /home/dministrator/models/DeepSeek-R1-Q4_K_M \
	--max_new_tokens 4096 \
	--total_context 102800 \
	--cpu_infer 84 \
	--cache_q4 true \
	--temperature 0.6 \
	--top_p 0.95
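
For anyone who wants to vary a single setting between runs (for example `--cpu_infer`), the same command can be assembled from a dictionary. This is only a sketch: the flag names and values are copied from the command above, while the `flags` dictionary and the `build_command` helper are hypothetical names introduced here for illustration.

```python
import shlex

# Flags and values copied from the launch command above.
# The dictionary layout itself is just an illustrative convenience.
flags = {
    "--model_path": "deepseek-ai/DeepSeek-R1",
    "--gguf_path": "/home/dministrator/models/DeepSeek-R1-Q4_K_M",
    "--max_new_tokens": 4096,
    "--total_context": 102800,
    "--cpu_infer": 84,
    "--cache_q4": "true",
    "--temperature": 0.6,
    "--top_p": 0.95,
}


def build_command(flags):
    """Turn the flag dictionary into an argv list (hypothetical helper)."""
    argv = ["python", "-m", "ktransformers.local_chat"]
    for name, value in flags.items():
        argv += [name, str(value)]
    return argv


command = build_command(flags)
# Print a shell-safe, copy-pasteable form of the command.
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, env=...)`, with `HF_ENDPOINT` set in the environment as in the original command.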

Test environment:
OS1: WSL2 Ubuntu
ktransformers 0.2.3post2-fancy

OS2: Ubuntu
ktransformers 0.2.3post2-fancy

Local chat mode:
Prompt: "请说一段50字以内的笑话。" (Please tell a joke of 50 characters or fewer.)

OS1: eval 9 tps
OS2: eval 14.02 tps
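
The two eval figures can be compared directly. The snippet below is a small worked calculation, not part of the original test; `relative_speedup` is a hypothetical helper name, and the 9 tps figure is used as reported, so it may already be rounded.

```python
def relative_speedup(baseline_tps: float, new_tps: float) -> float:
    """Return how much faster new_tps is than baseline_tps, as a fraction."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (new_tps - baseline_tps) / baseline_tps


wsl2_tps = 9.0      # OS1 (WSL2 Ubuntu) eval speed as reported
native_tps = 14.02  # OS2 (native Ubuntu) eval speed as reported

speedup = relative_speedup(wsl2_tps, native_tps)
print(f"Native Ubuntu is about {speedup:.1%} faster than WSL2")
# prints "Native Ubuntu is about 55.8% faster than WSL2"
```

This is a single short prompt on each system, so the number is best read as a rough indication rather than a benchmark.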

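Conclusion: on native Ubuntu, token generation is roughly 40-55% faster than on Windows or in virtualized or layered environments such as Windows WSL2, Docker, Anaconda, and MSVC.
Thanks to the author for helping me reproduce this on Ubuntu. I am a complete beginner at programming and understand none of it; just getting the drivers installed on Ubuntu took me three days.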

![Image](https://github.com/user-attachments/assets/41f1bd9a-be52-49d9-abf0-9667f60f851d)
