An enhanced fact-checking system that verifies claims using web search and LLM-based reasoning.
🌐 Try it online: https://claimclaire.vercel.app/
ClaimCLAIRE is an AI-powered fact-checking system that verifies claims by:
- Decomposing claims into atomic components for systematic verification
- Gathering evidence from web searches with source trust ratings
- Evaluating each component against collected evidence
- Synthesizing a final verdict with detailed explanations and citations
The agent follows a five-stage pipeline:
- Claim Decomposition: Breaks down the input claim into atomic components using iterative validation
- Holistic Evidence Gathering: A ReAct-style agent searches the web using Serper.dev API with LLM-based reranking
- Component Evaluation & Gap-Filling: Evaluates each component against gathered evidence, performs targeted searches for unverified components
- Verdict Synthesis: Applies deterministic logic rules to determine if the claim is consistent or inconsistent
- Report Generation: Generates a natural-language explanation with citations and trust ratings
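Stage four's deterministic rules are internal to the agent; purely as an illustration (hypothetical class and field names, not the repository's actual API), such a rule might look like:

```python
from dataclasses import dataclass

# Hypothetical types for illustration; the repository's actual
# classes and field names may differ.
@dataclass
class ComponentEvaluation:
    component: str
    status: str  # "supported", "refuted", or "unverified"

def synthesize_verdict(evaluations: list[ComponentEvaluation]) -> str:
    """One plausible deterministic rule: any refuted component makes
    the claim inconsistent; all-supported makes it consistent;
    otherwise the evidence is insufficient."""
    statuses = {e.status for e in evaluations}
    if "refuted" in statuses:
        return "inconsistent"
    if statuses == {"supported"}:
        return "consistent"
    return "not enough evidence"
```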
All web searches are performed using the Serper.dev Google Search API with optional Gemini-based reranking for improved relevance. Sources are assigned trust ratings (Reliable, Mixed, or Unreliable) to weight their credibility.
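One simple way to implement such trust weighting is a domain lookup table mapped to numeric weights. The sketch below is illustrative only; the domains, ratings, and weights are assumptions, not the system's actual tables:

```python
from urllib.parse import urlparse

# Illustrative trust table; the real system's ratings and
# weighting scheme may differ.
TRUST_RATINGS = {
    "reuters.com": "Reliable",
    "example-blog.net": "Mixed",
    "known-satire.site": "Unreliable",
}
TRUST_WEIGHTS = {"Reliable": 1.0, "Mixed": 0.5, "Unreliable": 0.1}

def trust_weight(url: str) -> float:
    """Look up a source's trust rating by domain and map it to a
    numeric weight; unknown domains default to 'Mixed'."""
    domain = urlparse(url).netloc.removeprefix("www.")
    rating = TRUST_RATINGS.get(domain, "Mixed")
    return TRUST_WEIGHTS[rating]
```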
A sample dataset (sample_dev.json) with 5 examples is included in data/ for quick testing:
```bash
pixi shell
python evaluate_ablations.py \
    --data-path data/sample_dev.json \
    --ablation A4 \
    --output-path results/sample_test.json
```

For complete evaluations, download the full AVeriTeC dataset:
- Download `dev.json` from HuggingFace
- Place it in the `data/` directory (the directory already exists; just drop `dev.json` there)
The dataset contains fact-checking claims with labels: "Supported", "Refuted", "Not Enough Evidence", or "Conflicting Evidence/Cherrypicking".
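A quick way to sanity-check a downloaded file is to validate its labels against this set (assuming each entry carries its gold label under a `label` field, as in AVeriTeC):

```python
import json

# The four gold labels used by the AVeriTeC dataset.
VALID_LABELS = {
    "Supported",
    "Refuted",
    "Not Enough Evidence",
    "Conflicting Evidence/Cherrypicking",
}

def check_labels(examples: list[dict]) -> list[int]:
    """Return indices of examples whose label is not one of the
    four AVeriTeC labels."""
    return [i for i, ex in enumerate(examples)
            if ex.get("label") not in VALID_LABELS]

# Usage after downloading:
# with open("data/dev.json") as f:
#     bad = check_labels(json.load(f))
```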
- Linux (tested on Ubuntu 22.04) or macOS.
- Python 3.12 (managed automatically via pixi).
- Access credentials for your preferred LLM provider.
Install pixi and clone the repository:

```bash
curl -fsSL https://pixi.sh/install.sh | bash
git clone https://github.com/yo-lxmmm/ClaimCLAIRE.git
cd ClaimCLAIRE
```

Option A: Using pixi (recommended)

```bash
pixi shell
```

The first run resolves Python 3.12 and all dependencies defined in pixi.toml. Subsequent invocations reuse the cached environment.
Option B: Using pip

```bash
python3.12 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

The agent requires API keys for LLM providers and web search. Create a `.env` file in the project root with your credentials. You can use `.env.template` as a starting point:

```bash
cp .env.template .env
```

Then edit `.env` with your credentials.
Required API Keys:

```bash
# Serper.dev API key (REQUIRED for web search)
SERPER_API_KEY=your_serper_api_key_here

# Google Gemini API key (if using Google models)
GOOGLE_API_KEY=your_google_api_key_here
```

Optional API Keys (for other LLM providers):

```bash
# Anthropic (for Claude models - used in decomposition by default)
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Azure OpenAI
AZURE_OPENAI_API_KEY=your_azure_openai_api_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2024-06-01

# OpenAI
OPENAI_API_KEY=your_openai_api_key_here
```

ℹ️ Note: The system uses Claude Sonnet 4 for claim decomposition (A2+) and the specified `--engine` model for other components. Refer to the LangChain chat model docs for provider-specific variables.
Use the live web interface: https://claimclaire.vercel.app/
No installation or API keys needed - just visit the link and start verifying claims!
Run the Flask web application locally for an interactive interface:
```bash
# Inside `pixi shell` or activated venv
python baseline_web_app.py
```

Then open your browser to http://localhost:8080 to use the web interface.

ℹ️ Note: Make sure you have set `GOOGLE_API_KEY` and `SERPER_API_KEY` in your `.env` file before running the web app.
You can integrate the agent directly into your application:
Example code:

```python
import asyncio

from claire_agent import InconsistencyAgent
from utils.report_rendering import render_inconsistency_report

agent = InconsistencyAgent(
    engine="gemini-2.5-flash",
    model_provider="google_genai",
    num_results_per_query=10,
    reasoning_effort=None,
)

async def main():
    claim = "Bernie Sanders purchased an opulent Vermont mansion in 2016 for $2.5 million."
    report = await agent.analyze_claim(
        claim_text=claim,
        passage=claim,  # Can provide additional context if available
    )
    render_inconsistency_report(report)

if __name__ == "__main__":
    asyncio.run(main())
```

The system includes ablation studies (A0-A4) to evaluate different components:
| Ablation | ReAct Agent | Iterative Decomposition | Trust Weighting | Gap-Filling | Description |
|---|---|---|---|---|---|
| A0 | ❌ | ❌ | ❌ | ❌ | Baseline RAG (simple search) |
| A1 | ✅ | ❌ | ❌ | ❌ | A0 + ReAct Agent |
| A2 | ✅ | ✅ | ❌ | ❌ | A1 + Iterative Decomposition |
| A3 | ✅ | ✅ | ✅ | ❌ | A2 + Trust Weighting |
| A4 | ✅ | ✅ | ✅ | ✅ | A3 + Gap-Filling (Full System) |
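Since each ablation is strictly cumulative, the table can be summarized as a set of feature flags (illustrative names, not necessarily the evaluation script's internals):

```python
# Each ablation enables all features of the previous one plus one more.
ABLATION_FEATURES = {
    "A0": set(),
    "A1": {"react_agent"},
    "A2": {"react_agent", "iterative_decomposition"},
    "A3": {"react_agent", "iterative_decomposition", "trust_weighting"},
    "A4": {"react_agent", "iterative_decomposition", "trust_weighting",
           "gap_filling"},
}

def enabled(ablation: str, feature: str) -> bool:
    """Check whether a given feature is active for an ablation."""
    return feature in ABLATION_FEATURES[ablation]
```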
Evaluate a specific ablation variant:
```bash
# Inside pixi shell
python evaluate_ablations.py \
    --data-path data/dev.json \
    --ablation A4 \
    --output-path results/ablation_A4_full.json \
    --engine gemini-2.5-flash \
    --model-provider google_genai \
    --num-results 10
```

To test on a subset:
```bash
python evaluate_ablations.py \
    --data-path data/dev.json \
    --ablation A2 \
    --output-path results/ablation_A2_test.json \
    --max-examples 10 \
    --engine gemini-2.5-flash \
    --model-provider google_genai \
    --num-results 10
```

Use the provided script to run all ablations sequentially:

```bash
# Inside pixi shell
./run_evaluations.sh
```

This will run A0, A1, A2, A3, and A4 on the full dataset and save results to the results/ directory.
Each evaluation produces three files:

- JSON file (`ablation_A4_full.json`): Complete results with metrics and predictions
  - `metrics`: Overall accuracy, macro F1, per-class precision/recall/F1
  - `label_breakdown`: Performance broken down by original label
  - `results`: Per-example predictions
- Results CSV (`ablation_A4_full_results.csv`): Main evaluation results
  - Columns: `claim_id`, `claim`, `gold_label`, `predicted_verdict`, `correct`, `num_components`, `num_sources`, `gap_fill_triggered`, etc.
- Components CSV (`ablation_A4_full_components.csv`): Component-level details
  - Shows each claim component's evaluation, gap-filling status, and verdict changes
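The metrics JSON can be inspected programmatically; a small helper to pull out the headline numbers (field names taken from the sample output in this README):

```python
import json

def summarize(results: dict) -> dict:
    """Extract accuracy, macro F1, and per-class F1 from an
    evaluation results object."""
    m = results["metrics"]
    summary = {"accuracy": m["accuracy"], "macro_f1": m["macro_f1"]}
    for label, scores in m["per_class"].items():
        summary[f"f1_{label}"] = scores["f1"]
    return summary

# Usage:
# with open("results/ablation_A4_full.json") as f:
#     print(summarize(json.load(f)))
```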
```json
{
  "metrics": {
    "accuracy": 0.85,
    "macro_f1": 0.83,
    "per_class": {
      "consistent": {"precision": 0.87, "recall": 0.82, "f1": 0.84},
      "inconsistent": {"precision": 0.81, "recall": 0.86, "f1": 0.83}
    }
  }
}
```

This system builds upon the CLAIRE architecture introduced in:
Sina J. Semnani, Jirayu Burapacheep, Arpandeep Khatua, Thanawan Atchariyachanvanit, Zheng Wang, Monica S. Lam. "Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language Models." EMNLP 2025.
This code is released under the Apache 2.0 license.
