[Bug] Abnormal accuracy score when evaluating the Qwen3-Omni-30B-A3B-Thinking model on the vocalsound dataset with aisbench

### Operating system and version

openEuler24.03-lts

### Python environment where the tool is installed

Python environment inside a Docker container

### Python version

3.11

### AISBench tool version

aisbench built from the current master branch source of the repository

### AISBench command executed

ais_bench --models vllm_api_stream_chat --datasets vocalsound_gen --mode all --dump-eval-details --merge-ds --debug

### Model configuration file or custom configuration file contents

from ais_bench.benchmark.models import VLLMCustomAPIChat
from ais_bench.benchmark.utils.postprocess.model_postprocessors import extract_non_reasoning_content

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-stream-chat",
        path="/home/msmodelslim/model_quant/Qwen3-Omni-30B-A3B-Thinking-w8a8/",
        model="Qwen3",
        stream=True,
        request_rate=0,
        use_timestamp=False,
        retry=2,
        api_key="",
        host_ip="localhost",
        host_port=8066,
        url="",
        max_out_len=2048,
        batch_size=72,
        trust_remote_code=False,
        generation_kwargs=dict(
            temperature=0,
            top_p=1,
            top_k=-1,
            ignore_eos=False,
        ),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    )
]
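
For readers unfamiliar with the `pred_postprocessor` entry above: a Thinking model normally emits its reasoning before the final answer, and the postprocessor is meant to leave only the answer for scoring. The sketch below illustrates that idea only. `strip_reasoning` and the `<think>`/`</think>` tag names are assumptions made for illustration; this is not the `extract_non_reasoning_content` implementation from ais_bench, and it is not claimed to be the cause of the scoring problem reported here.

```python
import re


# Hypothetical helper for illustration; NOT the ais_bench implementation.
def strip_reasoning(text: str, start: str = "<think>", end: str = "</think>") -> str:
    """Return the prediction with the reasoning span removed."""
    # Only the closing tag is present (for example, the opening tag was
    # supplied by the chat template): keep what follows the first closing tag.
    if start not in text and end in text:
        return text.split(end, 1)[1].strip()
    # Complete start...end spans: remove each of them.
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    # If the output was truncated before the closing tag, nothing matches
    # and the text is returned unchanged (apart from whitespace trimming).
    return pattern.sub("", text).strip()
```

If the reasoning is not removed before scoring, label words mentioned inside it could be matched instead of the final answer, so it may be worth checking the dumped evaluation details to see what text the evaluator actually received.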

### Expected behavior

1. The accuracy evaluation result is correct.

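### Actual behavior

1. The accuracy score is abnormal; the recorded score values are inaccurate.

After applying the modification shown in the image below, the score is correct.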


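### Pre-submission checks

- [x] I have read the Quick Start in the main documentation and it did not resolve the problem
- [x] I have searched the FAQ and found no duplicate
- [x] I have searched existing issues and found no duplicate
- [x] I have updated to the latest version and the problem persists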
