📈 Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

arXiv: https://arxiv.org/abs/2601.15808

🌟 Introduction

  • DeepVerifier enables self-evolving Deep Research Agents (DRAs): it verifies an agent’s draft answer, generates rubric-guided feedback, and iterates, yielding an inference-time scaling effect without additional training (see the sketch below).
  • We automatically construct a DRA Failure Taxonomy (5 major classes, 13 subclasses) and derive structured rubrics from it to make verification and feedback more targeted and reliable.
  • Across challenging benchmarks (e.g., GAIA, XBench-DeepSearch, BrowseComp), DeepVerifier improves verification quality and supports multi-round refinement for stronger final task accuracy.
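
At a high level, the loop looks like the following (a minimal sketch; run_agent, verify, and make_feedback are hypothetical stand-ins for the logic in System/ckv3/DeepVerifier/verifier.py, not the repository's actual API):

# Minimal sketch of the test-time self-evolution loop. All function names
# here are hypothetical placeholders, not the repository's actual API.
def self_evolve(task, run_agent, verify, make_feedback, max_retries=3):
    feedback = None
    attempts = []
    for _ in range(max_retries + 1):
        draft = run_agent(task, feedback)       # DRA produces a draft answer
        verdict = verify(task, draft)           # rubric-guided verification
        attempts.append({"answer": draft, "verdict": verdict})
        if verdict["correct"]:
            break
        feedback = make_feedback(verdict)       # structured rubric feedback
    return attempts[-1]["answer"], attempts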

✨ Features

  • 🧠 Verification via Asymmetry + Decomposition: breaks hard verification into small, source-checkable questions (see the sketch after this list).
  • 📜 Rubric-Guided Feedback: taxonomy-derived rubrics produce actionable, structured corrections (not just “judge” scores).
  • 🔌 Plug-and-Play Test-Time Self-Evolution: integrates into existing agent pipelines as a verifier + feedback module.
  • 📦 DeepVerifier-4K Release: a curated SFT dataset (4,646 pairs) to train stronger reflection and self-critique in open models.
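
To illustrate the asymmetry + decomposition idea (again a hedged sketch: the sub-question generator and source checker below are placeholders, not the repository's implementation):

def decompose_and_verify(task, answer, generate_subquestions, check_against_sources):
    # Verifying a full answer is hard; checking each atomic claim against
    # sources is much easier. Decompose, check each piece, then aggregate.
    subquestions = generate_subquestions(task, answer)
    results = [check_against_sources(q) for q in subquestions]
    passed = all(r["passed"] for r in results)
    return passed, results  # per-subquestion results feed the rubric feedback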

🚀 Usage

1. Overview

  • All related code is in System/ckv3/DeepVerifier/verifier.py
  • Datasets are in data/dataset
  • Install the necessary dependencies by following the Cognitive Kernel-Pro setup.

2. Run Scripts

There are three run modes:

  1. Run verification
  2. Run CK Agent
  3. Analyze the outputs of 1 or 2 and calculate accuracy

2.1 Run Verification

Input is a ck_agent trajectory (jsonl); output is a verifier trajectory (jsonl).

There are three different verifiers:

  • LLM Verifier
  • Simple Agent Verifier
  • Deep Verifier

To run these verifiers, first cd run_scripts/, then:

  1. Export environment variables (OpenAI, Azure, or AWS keys):
export OPENAI_API_KEY="YOUR_API_KEY"
export OPENAI_ENDPOINT="YOUR_ENDPOINT"
export OPENAI_API_VERSION="YOUR_API_VERSION"

export AZURE_OPENAI_ENDPOINT="YOUR_ENDPOINT"  
export AZURE_OPENAI_API_KEY="YOUR_API_KEY"
export AZURE_OPENAI_API_VERSION="YOUR_API_VERSION"
export AWS_ACCESS_KEY="YOUR_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
  2. In verify.sh: modify the default_input and default_output parameters on lines 49-50, and --project_path on line 67.
  3. Run the following command:
bash verify.sh 0 deep_verifier # for deep verifier
bash verify.sh 0 llm_verifier # for llm verifier
bash verify.sh 0 simple_verifier # for simple verifier
# the first argument is the web port index for the headless browser: 0 maps to port 3000, 1 to 3001, x to 300x.

2.2 Run CK Agent

Input is a GAIA query (jsonl) or a ck_agent trajectory (jsonl); output is also a ck_agent trajectory (jsonl).

To run the CK Agent, first cd run_scripts/, then:

  1. In verify.sh: modify the default_input and default_output parameters on lines 49-50, and the other parameters on lines 57-67:
--verify bool # Whether to use DeepVerifier to verify the ck_agent's answer and retry if the answer is incorrect.
--provide_feedback # Whether to provide feedback to ck_agent on the next try when the answer is incorrect.
--max_retries int # The maximum number of retries for ck_agent when the answer is incorrect.
# all retries will be recorded in the ck_agent's trajectory in the "attempts" field.
  2. Run the following command:
bash verify.sh 0 ck_agent 
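
Since every retry lands in the "attempts" field, the output trajectory can be inspected afterwards, e.g. to see how many tries each task needed. A minimal sketch (the per-record "id" key is an assumption about the jsonl schema):

import json

# Count retries per task from an output trajectory jsonl. Only the
# "attempts" field is documented above; "id" is a hypothetical key.
with open("output_trajectory.jsonl") as f:
    for line in f:
        record = json.loads(line)
        attempts = record.get("attempts", [])
        print(f"task={record.get('id', '?')}  attempts={len(attempts)}")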

2.3 Run Analysis

Input is a ck_agent trajectory (jsonl) or a verifier trajectory (jsonl); output is a csv file and the accuracy printed to stdout.

bash verify.sh analyze path/to/the/jsonl_file
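
If you prefer to recompute accuracy yourself, the calculation reduces to a mean over per-record correctness flags. A sketch under an assumed schema (the "correct" field name is a guess, not documented behavior):

import json

# Recompute accuracy from a verifier trajectory jsonl. The "correct"
# field is a hypothetical name; adapt it to the actual output schema.
with open("verifier_trajectory.jsonl") as f:
    records = [json.loads(line) for line in f]
accuracy = sum(bool(r.get("correct")) for r in records) / len(records)
print(f"accuracy = {accuracy:.3f} over {len(records)} records")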

3. Utilities

bash run_script/split.sh: splits a jsonl file into multiple parts for parallel runs

bash run_script/merge.sh: merges multiple parts back into a single file
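
The scripts' exact arguments are not documented here, but the underlying logic is simple; a minimal Python equivalent for illustration:

def split_jsonl(path, n_parts):
    # Round-robin the lines of a jsonl file into n_parts part files.
    with open(path) as f:
        lines = f.readlines()
    for i in range(n_parts):
        with open(f"{path}.part{i}", "w") as out:
            out.writelines(lines[i::n_parts])

def merge_jsonl(part_paths, out_path):
    # Concatenate part files back into a single jsonl file.
    with open(out_path, "w") as out:
        for p in part_paths:
            with open(p) as f:
                out.write(f.read())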


Friendly links to other works from Tencent AI Lab

Citation

@misc{wan2026inference,
      title={Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification}, 
      author={Wan, Yuxuan and Fang, Tianqing and Li, Zaitang and Huo, Yintong and Wang, Wenxuan and Mi, Haitao and Yu, Dong and Lyu, Michael R},
      year={2026},
      eprint={2601.15808},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.15808}, 
}

@misc{fang2025cognitivekernelpro,
      title={Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training}, 
      author={Tianqing Fang and Zhisong Zhang and Xiaoyang Wang and Rui Wang and Can Qin and Yuxuan Wan and Jun-Yu Ma and Ce Zhang and Jiaqi Chen and Xiyun Li and Hongming Zhang and Haitao Mi and Dong Yu},
      year={2025},
      eprint={2508.00414},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.00414}, 
}
