Skip to content
/ ETC Public

The official code of ETC, Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG (AAAI'26, Oral)

License

Notifications You must be signed in to change notification settings

pkuserc/ETC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG
[AAAI'26 Oral]
Bo Li, Tian Tian, Zhenghua Xu, Hao Cheng, Shikun Zhang and Wei Ye

Dynamic retrieval-augmented generation (RAG) enables large language models (LLMs) to fetch external knowledge on demand, improving adaptability over static RAG. A key challenge in this setting is determining when retrieval should occur. Prior methods typically trigger retrieval based on low confidence in individual tokens, which can result in delayed intervention after errors have already occurred. We propose the Entropy-Trend Constraint (ETC), a training-free method that selects optimal retrieval timing by modeling the dynamics of token-level uncertainty. Specifically, ETC leverages first- and second-order differences of the entropy sequence to capture emerging uncertainty trends, enabling earlier and more precise retrieval. Experiments across six QA benchmarks and three LLM backbones show that ETC consistently outperforms strong baselines while reducing retrieval frequency. It is especially effective in domain-specific settings, demonstrating robust generalization. Further ablation studies and qualitative analysis confirm that trend-aware uncertainty modeling leads to more effective retrieval timing. Our approach is plug-and-play, model-agnostic, and easy to integrate into existing decoding pipelines.

Install environment

conda create -n etc python=3.9
conda activate etc
pip install torch==2.1.1 transformers==4.30.2 beir==1.0.1
python -m spacy download en_core_web_sm

Prepare Retriever

Followed by dragin. Use the Wikipedia dump and elastic search to build the retriever

Build Wikipedia index

Download the Wikipedia dump

mkdir -p data/dpr
wget -O data/dpr/psgs_w100.tsv.gz https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
pushd data/dpr
gzip -d psgs_w100.tsv.gz
popd

Use Elasticsearch to index the Wikipedia dump

cd data
wget -O elasticsearch-7.17.9.tar.gz https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.9-linux-x86_64.tar.gz  # download Elasticsearch
tar zxvf elasticsearch-7.17.9.tar.gz
rm elasticsearch-7.17.9.tar.gz 
cd elasticsearch-7.17.9
nohup bin/elasticsearch &  # run Elasticsearch in background

Build the index

python prep_elastic.py --data_path data/dpr/psgs_w100.tsv --index_name wiki 
Download Dataset

Get Dataset

For 2WikiMultihopQA:

Download the 2WikiMultihop dataset from its repository https://www.dropbox.com/s/ms2m13252h6xubs/data_ids_april7.zip?e=1. Unzip it and move the folder to data/2wikimultihopqa.

For StrategyQA:

wget -O data/strategyqa_dataset.zip https://storage.googleapis.com/ai2i/strategyqa/data/strategyqa_dataset.zip
mkdir -p data/strategyqa
unzip data/strategyqa_dataset.zip -d data/strategyqa
rm data/strategyqa_dataset.zip 

For HotpotQA:

mkdir -p data/hotpotqa
wget -O data/hotpotqa/hotpotqa-dev.json http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_distractor_v1.json

For IIRC:

wget -O data/iirc.tgz https://iirc-dataset.s3.us-west-2.amazonaws.com/iirc_train_dev.tgz
tar -xzvf data/iirc.tgz
mv iirc_train_dev/ data/iirc
rm data/iirc.tgz

For BioASQ:

mkdir -p data/bioasq_7b_yesno
wget -O data/bioasq_7b_yesno/Task7B_yesno_train.json \
  https://huggingface.co/datasets/nanyy1025/bioasq_7b_yesno/resolve/main/Task7B_yesno_train.json
wget -O data/bioasq_7b_yesno/Task7B_yesno_validation.json \
  https://huggingface.co/datasets/nanyy1025/bioasq_7b_yesno/resolve/main/Task7B_yesno_validation.json
wget -O data/bioasq_7b_yesno/Task7B_yesno_test.json \
  https://huggingface.co/datasets/nanyy1025/bioasq_7b_yesno/resolve/main/Task7B_yesno_test.json

For PubMedQA:

mkdir -p data/pubmedQA
mkdir -p data/pubmedQA
wget -O data/pubmedQA/pqal_train_set.json \
  https://huggingface.co/datasets/tan9/pubmedQA/resolve/main/pqal_train_set.json
wget -O data/pubmedQA/test_set.json \
  https://huggingface.co/datasets/tan9/pubmedQA/resolve/main/test_set.json

Run

The following parameters can be selected in config.json:

Parameter Meaning Example / Options
model_name_or_path Hugging Face model meta-llama/Llama-2-13b-chat
dataset Dataset 2wikimultihopqa, hotpotqa, iirc, strategyqa
data_path The folder where the data is located. If you use the above code to download the data, the folder will be ../data/dataset. ../data/2wikimultihopqa
fewshot Number of few-shot examples 6
sample Number of questions sampled from the dataset. -1 means use the entire dataset. 1000
shuffle Whether to shuffle the dataset. Without this parameter, the dataset will not be shuffled. true, false (without)
generate_max_length Maximum generated length of a question 64
query_formulation Way to generate retrieval question. Main: direct, real_words. Other options: current_wo_wrong, current, forward_all, last_n_tokens, last_sentence real_words
retrieve_keep_top_k Number of reserved tokens when generating a search question 35
output_dir The generated results will be stored in a folder with a numeric name inside the output folder you specify. If the folder doesn’t exist, it will be created automatically. ../result/2wikimultihopqa_llama2_13b
retriever Type of retriever BM25, SGPT
es_index_name Index names in Elasticsearch wiki
retrieve_topk Number of related documents retained 3
hallucination_threshold Threshold at which a word is judged to be incorrect 1.2
check_real_words Whether only content words participate in threshold judgment. Without this parameter, all words will be considered. true, false (without)
use_counter Whether to count the number of generations, retrievals, questions, tokens, and sentences. Without this parameter, counting will be disabled. true, false (without)
thres_abs Whether to use absolute value for threshold determination false
Running Example
python main.py -c path_to_config_file

Evaluation

python evaluate_.py

About

The official code of ETC, Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG (AAAI'26, Oral)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages