📈 Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
- DeepVerifier enables self-evolving Deep Research Agents (DRAs) by verifying an agent's draft answer, generating rubric-guided feedback, and iterating, which yields an inference-time scaling effect without additional training (see the sketch right after this list).
- We build an automatically constructed DRA Failure Taxonomy (5 major classes, 13 subclasses) and derive structured rubrics to make verification and feedback more targeted and reliable.
- Across challenging benchmarks (e.g., GAIA / XBench-DeepSearch / BrowseComp), DeepVerifier improves verification quality and supports multi-round refinement for stronger final task accuracy.
- 🧠 Verification via Asymmetry + Decomposition: breaks hard verification into small, source-checkable questions.
- 📜 Rubric-Guided Feedback: taxonomy-derived rubrics produce actionable, structured corrections (not just “judge” scores).
- 🔌 Plug-and-Play Test-Time Self-Evolution: integrates into existing agent pipelines as a verifier + feedback module.
- 📦 DeepVerifier-4K Release: a curated SFT dataset (4,646 pairs) to train stronger reflection and self-critique in open models.
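A minimal sketch of the test-time loop described above, assuming hypothetical `run_agent` and `verify_with_rubrics` interfaces (the actual verifier lives in `System/ckv3/DeepVerifier/verifier.py`):

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    correct: bool          # verifier's judgement of the draft answer
    feedback: str = ""     # rubric-guided, actionable corrections

def self_evolve(task, run_agent, verify_with_rubrics, max_retries: int = 3) -> dict:
    """Run the agent, verify its draft answer, and retry with feedback until accepted."""
    attempts, feedback = [], None
    for _ in range(max_retries + 1):
        answer = run_agent(task, feedback=feedback)    # draft answer, optionally conditioned on feedback
        verdict = verify_with_rubrics(task, answer)    # verification decomposed into small, source-checkable questions
        attempts.append({"answer": answer,
                         "correct": verdict.correct,
                         "feedback": verdict.feedback})
        if verdict.correct:
            break
        feedback = verdict.feedback                    # structured corrections guide the next try
    return {"task": task,
            "final_answer": attempts[-1]["answer"],
            "attempts": attempts}                      # mirrors the "attempts" field described below
```

Running more verification-and-retry rounds trades extra inference compute for accuracy, which is the inference-time scaling effect referred to above.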
- All related code is in `System/ckv3/DeepVerifier/verifier.py`.
- Datasets are in `data/dataset`.
- Please install necessary dependencies following Cognitive Kernel-Pro.
There are three running modes:
1. Run verification
2. Run the CK Agent
3. Analyze the outputs of 1 or 2 and calculate accuracy
Run verification: the input is a ck_agent trajectory (jsonl) and the output is a verifier trajectory (jsonl).
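Trajectories are plain jsonl (one JSON object per line), so you can quickly inspect a file before or after verification; the exact field names depend on the agent/verifier, so this sketch only prints the keys:

```python
import json

# Peek at the schema of a ck_agent or verifier trajectory (jsonl: one JSON object per line).
path = "path/to/trajectory.jsonl"   # replace with your ck_agent or verifier trajectory
with open(path, "r", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(f"{len(records)} records")
print("keys in the first record:", sorted(records[0].keys()))
```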
There are three different verifiers:
- LLM Verifier
- Simple Agent Verifier
- Deep Verifier
To run these verifiers, first `cd run_scripts/`, then:
- Export environment variables (OpenAI, Azure OpenAI, or AWS keys):
export OPENAI_API_KEY="YOUR_API_KEY"
export OPENAI_ENDPOINT="YOUR_ENDPOINT"
export OPENAI_API_VERSION="YOUR_API_VERSION"
export AZURE_OPENAI_ENDPOINT="YOUR_ENDPOINT"
export AZURE_OPENAI_API_KEY="YOUR_API_KEY"
export AZURE_OPENAI_API_VERSION="YOUR_API_VERSION"
export AWS_ACCESS_KEY="YOUR_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
- In verify.sh: modify the `default_input` and `default_output` parameters in lines 49-50, and `--project_path` in line 67.
- Run the following command:
bash verify.sh 0 deep_verifier # for deep verifier
bash verify.sh 0 llm_verifier # for llm verifier
bash verify.sh 0 simple_verifier # for simple verifier
# the first argument is the port index of the headless browser: 0 is 3000, 1 is 3001, x is 300x

Run CK Agent: the input is a GAIA query (jsonl) or a **ck_agent trajectory** (jsonl); the output is also a ck_agent trajectory (jsonl).
To run the CK Agent, first `cd run_scripts/`, then:
- In verify.sh: modify the `default_input` and `default_output` parameters in lines 49-50, and other parameters in lines 57-67:
--verify bool # Whether to use DeepVerifier to verify the ck_agent's answer and retry if the answer is incorrect.
--provide_feedback # Whether to provide feedback to ck_agent in the next try when the answer is incorrect.
--max_retries int # The maximum number of retries for ck_agent when the answer is incorrect.
# all retries will be recorded in the ck_agent's trajectory in the "attempts" field.
- Run the following command:
bash verify.sh 0 ck_agent

Analyze: the input is a ck_agent trajectory (jsonl) or a verifier trajectory (jsonl); the output is a CSV file and the printed accuracy.
bash verify.sh analyze path/to/the/jsonl_file
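Beyond the aggregate accuracy printed by the analyze mode, you can inspect the retries recorded in each record's "attempts" field. A small sketch, assuming one entry per try in that field (other field names in your trajectories may differ):

```python
import json
from collections import Counter

# Summarize how many tries each task needed, using the "attempts" field documented above.
path = "path/to/ck_agent_trajectory.jsonl"
with open(path, "r", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

tries = Counter(len(r.get("attempts", [])) or 1 for r in records)
for n_attempts, n_tasks in sorted(tries.items()):
    print(f"{n_tasks} task(s) finished after {n_attempts} attempt(s)")
```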
bash run_script/split.sh: split a jsonl into multiple parts for parallel running
bash run_script/merge.sh: merge multiple parts into a single file
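One possible parallelization pattern, sketched below under assumptions: split the input jsonl with split.sh, launch one verify.sh shard per headless-browser port, then merge the outputs with merge.sh. The split.sh/merge.sh arguments are not shown here (check the scripts), and each shard must be configured with its own `default_input` / `default_output`.

```python
import subprocess

# Launch several verify.sh shards in parallel, one headless-browser port per shard
# (port index 0 -> 3000, 1 -> 3001, ...). Splitting the input and merging the outputs
# is done with run_script/split.sh and run_script/merge.sh (arguments not shown here).
# Run this from the directory that contains verify.sh, with each shard pointed at its
# own default_input / default_output.
NUM_SHARDS = 3
procs = [subprocess.Popen(["bash", "verify.sh", str(i), "deep_verifier"])
         for i in range(NUM_SHARDS)]
for p in procs:
    p.wait()   # block until every shard finishes before merging outputs
```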
- Deep Research Agent framework: Cognitive Kernel-Pro
- Agent Self-Evolving Research, including WebEvolver, WebCoT, WebVoyager, OpenWebVoyager, WebAggregatorQA.
@misc{wan2026inference,
title={Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification},
author={Wan, Yuxuan and Fang, Tianqing and Li, Zaitang and Huo, Yintong and Wang, Wenxuan and Mi, Haitao and Yu, Dong and Lyu, Michael R},
year={2026},
eprint={2601.15808},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2601.15808},
}
@misc{fang2025cognitivekernelpro,
title={Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training},
author={Tianqing Fang and Zhisong Zhang and Xiaoyang Wang and Rui Wang and Can Qin and Yuxuan Wan and Jun-Yu Ma and Ce Zhang and Jiaqi Chen and Xiyun Li and Hongming Zhang and Haitao Mi and Dong Yu},
year={2025},
eprint={2508.00414},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2508.00414},
}
