CompanyRehearsal: Retrieval-Augmented Financial QA with Knowledge Graph Grounding

This repository accompanies the paper “CompanyRehearsal: Retrieval-Augmented Financial QA with Knowledge Graph Grounding” and includes the datasets used in the study. The project explores Retrieval-Augmented Generation (RAG) techniques that leverage knowledge graphs (KGs) or past earnings call Q&A to improve factual accuracy and reasoning in the financial domain, particularly for earnings call transcripts (ECC).

We provide ECC QA pairs, the knowledge graphs used in the experiments, and financial terminology resources to support research on financial question answering.

Please cite the following reference if you use the released data.

@inproceedings{shih2025company,
  title={Company-Specific Knowledge Matters: Retrieval-Augmented Generation for Earnings Call Answer Rehearsal},
  author={Shih, Yung-Yu and Chen, Yun-Nung and Chen, Chung-Chi},
  booktitle={Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
  pages={5243--5247},
  year={2025}
}

📁 Repository Structure

`data/`

Contains all raw and processed data used in the RAG experiments.

`knowledge_graph/`

Knowledge graphs used for retrieval and reasoning tasks.

cause_effect_sentence_pairs.txt — Sentence-level cause–effect relationships.

| Column | Description |
| --- | --- |
| Cause sentence | The initiating or source statement (e.g., '2Q18 earnings met expectations') |
| Effect sentence | The resulting or affected statement (e.g., 'FY18 EPS estimate remains unchanged at $4.10') |

cause_effect_term_pairs.csv — Term-level cause–effect pairs, formatted as (head → cause, tail → effect, weight). Each row represents a directed relationship from a cause term to an effect term, along with its frequency count in the corpus.

| Column | Description |
| --- | --- |
| Cause | The source or initiating term (e.g., `growth`) |
| Effect | The resulting or affected term (e.g., `revenue`) |
| Count | The number of times the cause–effect pair appears in the dataset |
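
The term-level pairs can be read as a weighted directed graph. The following is a minimal sketch, assuming each row holds a cause term, an effect term, and an integer count in that order, and that any header row can be skipped because its third field is not a number. The function names `load_term_graph` and `top_effects` are hypothetical helpers, and the sample rows are made up for illustration rather than taken from the released file.

```python
import csv
import io
from collections import defaultdict


def load_term_graph(handle):
    """Build {cause: {effect: count}} from rows of (cause, effect, count)."""
    graph = defaultdict(dict)
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        cause, effect, count = (field.strip() for field in row[:3])
        if not count.isdigit():
            continue  # skip a header row, if the file has one
        graph[cause][effect] = graph[cause].get(effect, 0) + int(count)
    return dict(graph)


def top_effects(graph, cause, k=3):
    """Return the k most frequent effect terms for a cause term."""
    effects = graph.get(cause, {})
    return sorted(effects.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Made-up rows for illustration only; not taken from the released data.
sample = io.StringIO(
    "Cause,Effect,Count\n"
    "growth,revenue,12\n"
    "growth,margin,5\n"
    "demand,revenue,7\n"
)
graph = load_term_graph(sample)
print(top_effects(graph, "growth"))  # [('revenue', 12), ('margin', 5)]
```

To use the real file, replace the `io.StringIO` sample with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to `cause_effect_term_pairs.csv` in your local checkout.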

`earnings_call_qa/`

Question–Answer pairs derived from financial earnings call transcripts.

all_qa/ — QA pairs collected across all companies.
cs_qa(aapl)/ — QA pairs extracted from Apple earnings call transcripts, covering 2022 Q1 to 2024 Q3 (11 sessions in total).

Each file (e.g., A_q4_2020.txt) contains structured JSON data with detailed QA annotations, including term-level mappings and semantic analysis results.

Example: A_q4_2020.txt

{
  "0": {
    "question": "A couple of questions from me, Mike, maybe on -- first on the guidance part here. I guess, the Q1 guidance of 4.5% to 5.5% core, does it have -- does it assume any co-tailwinds, because, I guess, if you look at Q4, I mean 6% core, any reason why the core should slow down sequentially?",
    "answer": [
      "Yeah. Let me start, before Bob. So again, thanks for the earlier comments, Vijay. So how we characterize our Q1 guide is positive, but we use a very prudent approach...",
      "Yeah, Vijay. I think a couple of things. The thing that I would say is, we didn't end the year with emptying the tank out and feel really good about that..."
    ],
    "KG_terms": ["management", "demand", "shareholder", "results", "market", "backlog", "uncertainty", "visibility", "prudent"],
    "question_terms": ["core", "mean", "guidance", "down", "slow", "part"],
    "answer_terms": ["year", "upside", "order", "business", "quarter", "recovery", "visibility", "prudent"],
    "in_A_in_Q": [],
    "in_A_not_in_Q": ["year", "upside", "order", "business", "recovery", "visibility", "prudent"]
  }
}
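
A QA file with this layout can be parsed with the standard `json` module. The sketch below assumes the top-level keys are integer strings (`"0"`, `"1"`, ...) and reads `in_A_in_Q` and `in_A_not_in_Q` as the answer terms that do or do not also appear among the question terms. That reading is inferred from the field names only; the released annotations may apply extra filtering (in the example above, `quarter` is an answer term but is absent from `in_A_not_in_Q`). `load_qa` and `term_overlap` are hypothetical helper names.

```python
import json


def load_qa(text):
    """Parse a QA file's JSON text into a list of records ordered by key."""
    data = json.loads(text)
    return [data[key] for key in sorted(data, key=int)]


def term_overlap(record):
    """Split answer terms by whether they also occur in the question terms."""
    question_terms = set(record["question_terms"])
    shared = [t for t in record["answer_terms"] if t in question_terms]
    answer_only = [t for t in record["answer_terms"] if t not in question_terms]
    return shared, answer_only


# Abridged copy of the example record above (question and answer text omitted).
sample = json.dumps({
    "0": {
        "question_terms": ["core", "mean", "guidance", "down", "slow", "part"],
        "answer_terms": ["year", "upside", "order", "business", "quarter",
                         "recovery", "visibility", "prudent"],
    }
})

records = load_qa(sample)
shared, answer_only = term_overlap(records[0])
print(shared)
print(answer_only)
```

For a real file, pass the file's contents to `load_qa`, for example `load_qa(open(path, encoding="utf-8").read())`.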

`financial_terms.txt`

A curated list of domain-specific financial terms used to enhance retrieval accuracy, entity linking, and knowledge grounding in financial text analysis.
Each line represents a single term.

Example entries: liquidity coverage ratio, cost of goods sold(COGS), Treasuries, NCO (Net Charge-Offs), P/S, ...
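
One simple way to use the term list is to flag which terms occur in a passage. The sketch below assumes one term per line, as described above, and uses case-insensitive matching with word-boundary-style guards. This is an illustrative choice, not the matching procedure used in the paper, and `load_terms` and `find_terms` are hypothetical helper names.

```python
import re


def load_terms(lines):
    """Return non-empty, stripped terms from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def find_terms(text, terms):
    """Return the terms that occur in text as whole words or phrases."""
    found = []
    for term in terms:
        # Guards keep a term from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Three of the example entries listed above, plus a blank line to be skipped.
terms = load_terms(["liquidity coverage ratio\n", "Treasuries\n", "P/S\n", "\n"])
text = "The bank's liquidity coverage ratio improved as it bought Treasuries."
print(find_terms(text, terms))  # ['liquidity coverage ratio', 'Treasuries']
```

To load the released list, pass an opened file handle for `financial_terms.txt` to `load_terms`; escaping each term with `re.escape` keeps entries such as `P/S` or `NCO (Net Charge-Offs)` from being interpreted as regular expression syntax.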
