[Bug] textvqa多模态测试预测结果为空，acc为0

### 操作系统及版本

openEuler24.03

### 安装工具的python环境

docker容器中的python环境

### python版本

3.11

### AISBench工具版本

3.1.20260211

### AISBench执行命令

ais_bench --models vllm_api_general_chat --datasets textvqa_gen --debug

### 模型配置文件或自定义配置文件内容

from ais_bench.benchmark.models import VLLMCustomAPIChat
from ais_bench.benchmark.utils.postprocess.model_postprocessors import extract_non_reasoning_content

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-general-chat",
        path="",
        model="qwen3.5",
        stream=False,
        request_rate=0,
        use_timestamp=False,
        retry=2,
        api_key="",
        host_ip="100.100.135.166",
        host_port=8010,
        url="",
        max_out_len=512,
        batch_size=1,
        trust_remote_code=False,
        generation_kwargs=dict(
            temperature=0.6,
            ignore_eos=False,
        ),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    )
]


### 预期行为

之前已用纯文本数据集进行测试，确定服务端模型可正常提供服务；预期测试textvqa也应输出预测结果

### 实际行为

acc为0，预测结果为空。
执行结果：
[root@fb1f86dde726 train_images]# ais_bench --models vllm_api_general_chat --datasets textvqa_gen --debug
[2026-03-05 02:04:39,255] [ais_bench] [INFO] Loading vllm_api_general_chat: /workspace/benchmark/ais_bench/benchmark/configs/./models/vllm_api/vllm_api_general_chat.py
[2026-03-05 02:04:39,262] [ais_bench] [INFO] Loading textvqa_gen: /workspace/benchmark/ais_bench/benchmark/configs/./datasets/textvqa/textvqa_gen.py
[2026-03-05 02:04:39,265] [ais_bench] [INFO] Loading example: /workspace/benchmark/ais_bench/benchmark/configs/./summarizers/example.py
[2026-03-05 02:04:39,304] [ais_bench] [INFO] Current exp folder: outputs/default/20260305_020428
[2026-03-05 02:04:39,379] [ais_bench] [INFO] Starting inference tasks...
[2026-03-05 02:04:39,383] [ais_bench] [INFO] Partitioned into 1 tasks.
[2026-03-05 02:04:39,398] [ais_bench] [INFO] Launch TasksMonitor, PID: 2360, Refresh interval: 0.5, Run in background: True
[2026-03-05 02:04:51,910] [ais_bench] [INFO] Debug mode, print progress directly
[2026-03-05 02:04:51,912] [ais_bench] [INFO] Task [vllm-api-general-chat/textvqa]
[2026-03-05 02:04:52,992] [ais_bench] [INFO] Zero Retriever initialized, returning empty shot case for all queries
[2026-03-05 02:04:55,348] [ais_bench] [INFO] Apply ice template finished
[2026-03-05 02:04:55,833] [ais_bench] [INFO] Start warmup, run with concurrency: 1
Warmup: 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 198.55case/s]
[2026-03-05 02:04:55,841] [ais_bench] [INFO] Warmup finished Total Count: 1 Success Count: 1 Failed Count: 0
[2026-03-05 02:04:56,175] [ais_bench] [INFO] Dataset needed memory size: 5.24575233 MB
[2026-03-05 02:04:56,175] [ais_bench] [INFO] Memory usage check passed: 2.84% < 80% (Available: 1957.29 GB)
[2026-03-05 02:04:56,184] [ais_bench] [WARNING] The request rate is below 0.1, resulting in an excessively long interval between two consecutive requests.
[2026-03-05 02:04:56,184] [ais_bench] [INFO] Traffic request rate: 0 RPS with burstiness 1.0.
[2026-03-05 02:04:56,188] [ais_bench] [INFO] Request rate (0.0) or ramp end rps (None) < 0.001, sending all requests simultaneously
[2026-03-05 02:04:56,190] [ais_bench] [INFO] Debug mode, run with concurrency: 1
[2026-03-05 02:04:56,290] [ais_bench] [INFO] All subprocesses have finished deserializing the first batch of data
[2026-03-05 02:04:56,389] [ais_bench] [INFO] Starting progress bar Total data num: 4984 Finished data num: 0 Left data num: 4984
Progress: 100%|███████████████████████████████████████████████████████████████████████████| 4984/4984 [00:31<00:00, 158.92case/s]
POST=4984 (0.0/s)  RECV=4984 (0.0/s)  FAIL=0 (0.0/s)  FINISH=4984 (0.0/s)
[2026-03-05 02:05:27,774] [ais_bench] [INFO] Api infer task time elapsed: 35.86s
[2026-03-05 02:05:29,445] [ais_bench] [INFO] Inference tasks completed.
[2026-03-05 02:05:29,447] [ais_bench] [INFO] Starting evaluation tasks...
[2026-03-05 02:05:29,451] [ais_bench] [INFO] Partitioned into 1 tasks.
[2026-03-05 02:05:29,465] [ais_bench] [INFO] Launch TasksMonitor, PID: 2378, Refresh interval: 0.5, Run in background: True
[2026-03-05 02:05:41,899] [ais_bench] [INFO] Debug mode, print progress directly
[2026-03-05 02:05:44,130] [ais_bench] [INFO] Running 1-th replica of evaluation
[2026-03-05 02:05:46,859] [ais_bench] [INFO] Task vllm-api-general-chat/textvqa: {'accuracy': 0.0}
[2026-03-05 02:05:46,888] [ais_bench] [INFO] Evaluation task time elapsed: 4.99s
[2026-03-05 02:05:48,447] [ais_bench] [INFO] Evaluation tasks completed.
[2026-03-05 02:05:48,447] [ais_bench] [INFO] Summarizing evaluation results...
dataset    version    metric    mode      vllm-api-general-chat
---------  ---------  --------  ------  -----------------------
textvqa    4005f4     accuracy  gen                        0.00
[2026-03-05 02:05:48,451] [ais_bench] [INFO] write summary to /workspace/benchmark/ais_bench/datasets/textvqa/train_images/outputs/default/20260305_020428/summary/summary_20260305_020428.txt
[2026-03-05 02:05:48,452] [ais_bench] [INFO] write csv to /workspace/benchmark/ais_bench/datasets/textvqa/train_images/outputs/default/20260305_020428/summary/summary_20260305_020428.csv


The markdown format results is as below:

| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| textvqa | 4005f4 | accuracy | gen | 0.00 |

[2026-03-05 02:05:48,452] [ais_bench] [INFO] write markdown summary to /workspace/benchmark/ais_bench/datasets/textvqa/train_images/outputs/default/20260305_020428/summary/summary_20260305_020428.md


具体预测值：
{"data_abbr": "textvqa", "id": 0, "success": true, "uuid": "f072c2c71fd3495e9bd3d49f0646d830", "origin_prompt": [{"role": "H     UMAN", "prompt": [{"image_url": {"url": "file:///workspace/benchmark/ais_bench/datasets/textvqa/train_images/003a8ae2ef43b90     1.jpg"}, "type": "image_url"}, {"text": "what is the brand of this camera? Answer the question using a single word or phrase     .", "type": "text"}]}], "prediction": "", "gold": [{"answer": "nous les gosses", "answer_confidence": "yes", "answer_id": 0}     , {"answer": "dakota", "answer_confidence": "yes", "answer_id": 1}, {"answer": "clos culombu", "answer_confidence": "yes", "     answer_id": 2}, {"answer": "dakota digital", "answer_confidence": "yes", "answer_id": 3}, {"answer": "dakota", "answer_confi     dence": "yes", "answer_id": 4}, {"answer": "dakota", "answer_confidence": "yes", "answer_id": 5}, {"answer": "dakota digital     ", "answer_confidence": "yes", "answer_id": 6}, {"answer": "dakota digital", "answer_confidence": "yes", "answer_id": 7}, {"     answer": "dakota", "answer_confidence": "yes", "answer_id": 8}, {"answer": "dakota", "answer_confidence": "yes", "answer_id"     : 9}]}
   2 {"data_abbr": "textvqa", "id": 1, "success": true, "uuid": "34f79be48244405e81e046bcfb86ed80", "origin_prompt": [{"role": "H     UMAN", "prompt": [{"image_url": {"url": "file:///workspace/benchmark/ais_bench/datasets/textvqa/train_images/b9dc400eb20bad6     4.jpg"}, "type": "image_url"}, {"text": "what does the small white text spell? Answer the question using a single word or ph     rase.", "type": "text"}]}], "prediction": "", "gold": [{"answer": "copenhagen", "answer_confidence": "yes", "answer_id": 0},      {"answer": "copenhagen", "answer_confidence": "yes", "answer_id": 1}, {"answer": "copenhagen", "answer_confidence": "yes",      "answer_id": 2}, {"answer": "copenhagen", "answer_confidence": "yes", "answer_id": 3}, {"answer": "copenhagen", "answer_conf     idence": "yes", "answer_id": 4}, {"answer": "thursday", "answer_confidence": "yes", "answer_id": 5}, {"answer": "copenhagen"     , "answer_confidence": "yes", "answer_id": 6}, {"answer": "copenhagen", "answer_confidence": "yes", "answer_id": 7}, {"answe     r": "copenhagen", "answer_confidence": "yes", "answer_id": 8}, {"answer": "copenhagen", "answer_confidence": "yes", "answer_     id": 9}]}


### 前置检查

- [x] 我已读懂主页文档的快速入门，无法解决问题
- [x] 我已检索过FAQ，无重复问题
- [x] 我已搜索过现有Issue，无重复问题
- [ ] 我已更新到最新版本，问题仍存在

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] textvqa多模态测试预测结果为空，acc为0 #166

操作系统及版本

安装工具的python环境

python版本

AISBench工具版本

AISBench执行命令

模型配置文件或自定义配置文件内容

预期行为

实际行为

前置检查

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Bug] textvqa多模态测试预测结果为空，acc为0 #166

Description

操作系统及版本

安装工具的python环境

python版本

AISBench工具版本

AISBench执行命令

模型配置文件或自定义配置文件内容

预期行为

实际行为

前置检查

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions