
Document Data Extraction and Schema Builder

This project provides a comprehensive solution for extracting structured data from documents using Llama Cloud's AI services and managing the extracted data in a SQLite database. It consists of a FastAPI backend and a React-based schema builder frontend.

Features

  • Document Upload: Upload documents for data extraction
  • Schema Builder: Create custom data schemas for extraction
  • AI-Powered Extraction: Use Llama Cloud's AI to extract structured data from documents
  • Database Integration: Store extracted data in SQLite with dynamic table creation
  • CORS Support: Ready for frontend integration
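The "dynamic table creation" feature can be sketched with the stdlib sqlite3 module (the app itself uses SQLAlchemy; the all-TEXT columns and the `push_to_db` helper name here are simplifications for illustration):

```python
import sqlite3

def push_to_db(conn, table, record):
    # Create the table from the record's keys if it doesn't exist yet,
    # then insert one row. Columns are TEXT-only for brevity; the real
    # backend maps schema types via SQLAlchemy.
    cols = ", ".join(f'"{k}" TEXT' for k in record)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    names = ", ".join(f'"{k}"' for k in record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f'INSERT INTO "{table}" ({names}) VALUES ({placeholders})',
        list(record.values()),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
push_to_db(conn, "resumes", {"name": "Ada Lovelace", "email": "ada@example.com"})
rows = conn.execute('SELECT name, email FROM "resumes"').fetchall()
print(rows)  # → [('Ada Lovelace', 'ada@example.com')]
```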

Project Structure

.
├── app.py                 # FastAPI backend
├── schema-builder/        # React frontend for schema building
├── uploaded_files/        # Storage for uploaded documents
├── llama_cloud_services/  # Llama Cloud integration
└── database.db           # SQLite database

Prerequisites

  • Python 3.8+
  • Node.js 14+ (for schema-builder)
  • Llama Cloud API key

Setup

  1. Backend Setup:

    # Create and activate virtual environment
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
    # Install dependencies
    pip install fastapi uvicorn sqlalchemy pydantic python-multipart
  2. Frontend Setup:

    cd schema-builder
    npm install
  3. Environment Configuration: Create a .env file with your Llama Cloud API key:

    LLAMA_CLOUD_API_KEY="your-api-key"
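Loading the key at startup can be done with python-dotenv, or with a minimal stdlib-only sketch like the one below (the `load_env` helper is illustrative, not part of the project; the demo writes a throwaway file so it is self-contained):

```python
import os
import tempfile

def load_env(path=".env"):
    # Minimal .env loader: parses KEY=value / KEY="value" lines into
    # os.environ, skipping blanks and comments. python-dotenv does this
    # more robustly in a real deployment.
    try:
        with open(path) as fh:
            for raw in fh:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip().strip('"').strip("'")
    except FileNotFoundError:
        pass  # no .env file: rely on the existing environment

# Demo with a throwaway .env file.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write('LLAMA_CLOUD_API_KEY="your-api-key"\n')
load_env(fh.name)
print(os.environ["LLAMA_CLOUD_API_KEY"])  # → your-api-key
```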
    

Running the Application

  1. Start the Backend:

    uvicorn app:app --reload
  2. Start the Frontend:

    cd schema-builder
    npm run dev

API Endpoints

  • POST /generate-schema: Generate a Pydantic model from a user-defined schema
  • POST /upload-file: Upload a document for extraction
  • POST /extract-data: Extract data from uploaded document using AI
  • POST /push-to-db: Store extracted data in SQLite database
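Conceptually, /generate-schema turns the schema builder's field definitions into a Pydantic model. A minimal sketch using `pydantic.create_model` (the name→type payload shape below is an assumption, since the endpoint's exact request format isn't documented here):

```python
from pydantic import create_model

# Hypothetical payload from the schema builder: field name -> type name.
user_schema = {"name": "str", "email": "str", "years_experience": "int"}

TYPE_MAP = {"str": str, "int": int, "float": float, "bool": bool}

# Build the model dynamically; the Ellipsis default marks each field required.
ExtractionModel = create_model(
    "ExtractionModel",
    **{field: (TYPE_MAP[t], ...) for field, t in user_schema.items()},
)

record = ExtractionModel(name="Ada", email="ada@example.com", years_experience=7)
print(record.years_experience)  # → 7
```

The resulting model can then be handed to the extraction step so the AI's output is validated against the user's schema.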

Usage

  1. Use the schema builder to define the structure of data you want to extract
  2. Upload a document (e.g., a resume or invoice)
  3. The system will extract data according to your schema
  4. Extracted data can be stored in the SQLite database

Security Notes

  • The .env file containing your API key is excluded from version control
  • CORS is configured to allow frontend integration
  • File uploads are stored in a dedicated directory
