We design a large-scale multi-session conversation dataset to study implicit reasoning in personalized conversations, and a hierarchical tree framework for efficient, level-based retrieval.
conda create -n ImplexConv python=3.9
conda activate ImplexConv
python -m pip install -r requirements.txt
If you need to use the OpenAI APIs, obtain an API key and export it:
export OPENAI_API_KEY=[your OpenAI API Key]
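Before launching a long batch run, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch and not part of this repository: the helper name `require_openai_key` is hypothetical, and it only assumes that the key is read from the `OPENAI_API_KEY` environment variable exported above.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; the repository scripts may read
    the key differently.
    """
    env = os.environ if env is None else env
    # Treat a missing or whitespace-only value as "not set".
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the scripts."
        )
    return key
```

Calling `require_openai_key()` at the top of a session fails fast with a readable message instead of surfacing an authentication error partway through a run.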
All datasets referenced in the paper are available on HuggingFace.
- Create conversation summaries and facts:
python fact_sum_batch.py \
--home_dir ./datasets \
--dataset_name opposed_reasoning \
--model_type gpt-4o-mini \
--output_file summarized_opposed_facts.json
- Generate the response and retrieved content:
python fact_topic_reasoning.py \
--home_dir ./datasets \
--dataset_name opposed_reasoning \
--model_type gpt-4o-mini \
--summy_info summarized_opposed_facts.json \
--output_response_file opposed_response.json \
--output_retrieve_file opposed_retrieved_text.json
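After both steps finish, a quick sanity check on the output files can catch empty or truncated results. The sketch below is an illustration only: `describe_json` is a hypothetical helper, the internal schema of the JSON files is not assumed, and the file locations (the current working directory here) are an assumption that may need adjusting to wherever the scripts write their output.

```python
import json
from pathlib import Path


def describe_json(path):
    """Load a JSON file and return (top-level type name, number of entries).

    Makes no assumption about the file's schema: lists and dicts report
    their length, any other JSON value counts as a single entry.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size


if __name__ == "__main__":
    # File names match the --output_* arguments used above; their location
    # (current directory) is an assumption.
    for name in ("opposed_response.json", "opposed_retrieved_text.json"):
        p = Path(name)
        if p.exists():
            kind, size = describe_json(p)
            print(f"{name}: top-level {kind} with {size} entries")
        else:
            print(f"{name}: not found")
```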
If you find the work useful, please cite:
@article{li2025toward,
title={Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning},
author={Li, Xintong and Bantupalli, Jalend and Dharmani, Ria and Zhang, Yuwei and Shang, Jingbo},
journal={arXiv preprint arXiv:2503.07018},
year={2025}
}