DocuCheck

🧠 Overview

DocuCheck is a Python-based tool that extracts factual claims from documents (like PDFs) and performs automated fact-checking using a generative AI model (Google’s Gemini).
It produces a human-readable HTML report (report.html) summarizing extracted claims, internal consistency checks, and external verification results.

⚙️ Features

Structural Text Extraction: Parses PDFs to understand document structure (headings vs. paragraphs) for better context.
AI-Powered Claim Extraction: Uses a generative model to identify and extract factual claims from text.
Internal Consistency Analysis: Detects contradictions within the document’s own claims.
External Fact-Checking: Verifies claims against the model’s external knowledge to identify outdated or invalid information.
Modern HTML Reporting: Generates a clean, single-page HTML report with a visual summary dashboard.
Result Caching: Caches analysis results to avoid re-processing and repeated API calls.

🗂️ Project Structure

Docucheck/
├── main.py           # Main CLI entry point with argument parsing
├── extractor.py      # Handles PDF parsing and claim extraction
├── verifier.py       # Handles internal/external fact-checking
├── reporter.py       # Generates the final HTML report
├── caching.py        # Manages file hashing and result caching
└── utils.py          # Shared utilities (e.g., JSON parsing)

run.py                # The main script to execute the package
requirements.txt      # Python dependencies
.env.example          # Example environment variables
.gitignore            # Ignores .env, .venv, pycache, etc.
LICENSE               # MIT License
README.md             # This file

🚀 Quick Start

1. Installation

Clone the repository and navigate into the project directory:

git clone https://github.com/<your-username>/docucheck.git
cd docucheck

2. Set Up Virtual Environment

Windows (PowerShell):

python -m venv .venv
.\.venv\Scripts\Activate.ps1

macOS/Linux (bash):

python3 -m venv .venv
source .venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Set Up Environment

Copy the example .env file and add your Gemini API key.

Windows (PowerShell):

copy .env.example .env
# Now edit the .env file with Notepad or VS Code

macOS/Linux (bash):

cp .env.example .env
# Now edit the .env file with nano, vim, or VS Code

Your .env file should look like this:

GEMINI_API_KEY=your-api-key-goes-here
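As a rough illustration of how a key stored this way can be picked up at runtime, here is a minimal sketch using only the standard library. The helper names `load_env_file` and `get_api_key` are hypothetical and are not taken from the DocuCheck source; the project may well load the file differently (for example, through a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Hypothetical helper: copy KEY=VALUE lines from a .env file into os.environ."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Variables already set in the real environment take precedence.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_api_key():
    """Return GEMINI_API_KEY, or raise a clear error if it is missing or still the placeholder."""
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key or key == "your-api-key-goes-here":
        raise RuntimeError("GEMINI_API_KEY is not set; edit your .env file first.")
    return key
```

Failing early with an explicit message is usually friendlier than letting the first API call fail with an authentication error.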

🧩 Usage

Run the application using run.py, passing the path to the document you want to analyze.

Basic Example

python run.py "path/to/your/document.pdf"

This will analyze the PDF and save the report as report.html in the same directory.

Command-Line Arguments

| Argument | Description | Required | Default |
|---|---|---|---|
| `input_file` | Path to the input file (.pdf, .txt, etc.) | ✅ | — |
| `-o`, `--output` | Path to save the output HTML report | ❌ | `report.html` |
| `-l`, `--limit` | Limit the number of claims to externally fact-check (0 = all) | ❌ | `0` |
| `--force` | Force re-analysis and bypass cached results | ❌ | `False` |
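For readers who want to see how these arguments map onto `argparse`, the following is a minimal sketch. The function name `build_parser` and the help strings are assumptions for illustration; only the argument names and defaults come from the list above, and the actual parser in the project may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(
        prog="run.py",
        description="Extract and fact-check claims from a document.",
    )
    parser.add_argument("input_file", help="Path to the input file (.pdf, .txt, etc.)")
    parser.add_argument("-o", "--output", default="report.html",
                        help="Path to save the output HTML report")
    parser.add_argument("-l", "--limit", type=int, default=0,
                        help="Number of claims to externally fact-check (0 = all)")
    parser.add_argument("--force", action="store_true",
                        help="Force re-analysis and bypass cached results")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(["my_research.pdf", "-l", "5", "--force"])
print(args.input_file, args.output, args.limit, args.force)
```

Using `action="store_true"` gives `--force` its documented default of `False` without requiring a value on the command line.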

Full Example

python run.py "my_research.pdf" -o "MyReport.html" -l 5 --force

This command:

- Analyzes my_research.pdf
- Saves the report as MyReport.html
- Externally fact-checks only the first 5 claims
- Bypasses the cache
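The "0 = all" convention for the limit can be sketched in a few lines. The helper name `select_claims_to_check` is hypothetical, and treating negative values as an error is an assumption; the project may handle them differently.

```python
def select_claims_to_check(claims, limit=0):
    """Hypothetical helper: return the claims to send for external fact-checking.

    A limit of 0 means "check everything"; a positive limit keeps the first N claims.
    """
    if limit < 0:
        # Assumption: negative limits are rejected rather than silently ignored.
        raise ValueError("limit must be 0 (all) or a positive integer")
    if limit == 0:
        return list(claims)
    return list(claims[:limit])


claims = ["claim A", "claim B", "claim C"]
print(select_claims_to_check(claims, limit=2))
print(select_claims_to_check(claims))
```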

🧠 How It Works

1. run.py executes the main() function in Docucheck/__main__.py.
2. The script parses command-line arguments.
3. caching.py generates a SHA-256 hash of the input file and checks for a cached result.
4. If a cached result exists (and --force is not used), analysis is skipped and report generation begins.
5. If there is no cached result:
   - extractor.py uses PyMuPDF to extract structured text.
   - The text is sent to the Gemini API for claim extraction.
6. verifier.py:
   - Checks for internal contradictions.
   - Performs external fact-checking using Gemini’s knowledge.
7. reporter.py compiles all results into a single HTML report.
8. caching.py saves all results (claims, contradictions, checks) to a .Docucheck_Cache JSON file.
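The hash-then-lookup flow described above can be sketched with the standard library alone. This is an illustration under assumptions, not the code in caching.py: the function names, the `cache_dir` parameter, and the one-JSON-file-per-hash layout are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=65536):
    """Hash the file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path_for(input_path, cache_dir):
    # Assumed layout: one JSON file per document, named after its content hash.
    return Path(cache_dir) / f"{file_sha256(input_path)}.json"


def load_cached(input_path, cache_dir, force=False):
    """Return cached results, or None if absent or if force is set."""
    path = cache_path_for(input_path, cache_dir)
    if force or not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def save_cached(input_path, results, cache_dir):
    """Write results (claims, contradictions, checks) as JSON keyed by file hash."""
    path = cache_path_for(input_path, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(results, indent=2), encoding="utf-8")
```

Keying the cache on the file's content rather than its name means a renamed document still hits the cache, while any edit to the document produces a new hash and triggers a fresh analysis.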

📄 License

Distributed under the MIT License.
See LICENSE for more information.
