
Document Data Extraction and Schema Builder

This project provides a comprehensive solution for extracting structured data from documents using Llama Cloud's AI services and managing the extracted data in a SQLite database. It consists of a FastAPI backend and a React-based schema builder frontend.

Features

  • Document Upload: Upload documents for data extraction
  • Schema Builder: Create custom data schemas for extraction
  • AI-Powered Extraction: Use Llama Cloud's AI to extract structured data from documents
  • Database Integration: Store extracted data in SQLite with dynamic table creation
  • CORS Support: Ready for frontend integration
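The "dynamic table creation" feature can be sketched with the stdlib sqlite3 module (the app itself uses SQLAlchemy; the all-TEXT columns and the `push_to_db` helper name here are simplifications for illustration):

```python
import sqlite3

def push_to_db(conn, table, record):
    # Create the table from the record's keys if it doesn't exist yet,
    # then insert one row. Columns are TEXT-only for brevity; the real
    # backend maps schema types via SQLAlchemy.
    cols = ", ".join(f'"{k}" TEXT' for k in record)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    names = ", ".join(f'"{k}"' for k in record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f'INSERT INTO "{table}" ({names}) VALUES ({placeholders})',
        list(record.values()),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
push_to_db(conn, "resumes", {"name": "Ada Lovelace", "email": "ada@example.com"})
rows = conn.execute('SELECT name, email FROM "resumes"').fetchall()
print(rows)  # → [('Ada Lovelace', 'ada@example.com')]
```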

Project Structure

.
├── app.py                 # FastAPI backend
├── schema-builder/        # React frontend for schema building
├── uploaded_files/        # Storage for uploaded documents
├── llama_cloud_services/  # Llama Cloud integration
└── database.db           # SQLite database

Prerequisites

  • Python 3.8+
  • Node.js 14+ (for schema-builder)
  • Llama Cloud API key

Setup

  1. Backend Setup:

    # Create and activate virtual environment
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
    # Install dependencies
    pip install fastapi uvicorn sqlalchemy pydantic python-multipart
  2. Frontend Setup:

    cd schema-builder
    npm install
  3. Environment Configuration: Create a .env file with your Llama Cloud API key:

    LLAMA_CLOUD_API_KEY="your-api-key"
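Loading the key at startup can be done with python-dotenv, or with a minimal stdlib-only sketch like the one below (the `load_env` helper is illustrative, not part of the project; the demo writes a throwaway file so it is self-contained):

```python
import os
import tempfile

def load_env(path=".env"):
    # Minimal .env loader: parses KEY=value / KEY="value" lines into
    # os.environ, skipping blanks and comments. python-dotenv does this
    # more robustly in a real deployment.
    try:
        with open(path) as fh:
            for raw in fh:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip().strip('"').strip("'")
    except FileNotFoundError:
        pass  # no .env file: rely on the existing environment

# Demo with a throwaway .env file.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write('LLAMA_CLOUD_API_KEY="your-api-key"\n')
load_env(fh.name)
print(os.environ["LLAMA_CLOUD_API_KEY"])  # → your-api-key
```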
    

Running the Application

  1. Start the Backend:

    uvicorn app:app --reload
  2. Start the Frontend:

    cd schema-builder
    npm run dev

API Endpoints

  • POST /generate-schema: Generate a Pydantic model from a user-defined schema
  • POST /upload-file: Upload a document for extraction
  • POST /extract-data: Extract data from uploaded document using AI
  • POST /push-to-db: Store extracted data in SQLite database
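Conceptually, /generate-schema turns the schema builder's field definitions into a Pydantic model. A minimal sketch using `pydantic.create_model` (the name→type payload shape below is an assumption, since the endpoint's exact request format isn't documented here):

```python
from pydantic import create_model

# Hypothetical payload from the schema builder: field name -> type name.
user_schema = {"name": "str", "email": "str", "years_experience": "int"}

TYPE_MAP = {"str": str, "int": int, "float": float, "bool": bool}

# Build the model dynamically; the Ellipsis default marks each field required.
ExtractionModel = create_model(
    "ExtractionModel",
    **{field: (TYPE_MAP[t], ...) for field, t in user_schema.items()},
)

record = ExtractionModel(name="Ada", email="ada@example.com", years_experience=7)
print(record.years_experience)  # → 7
```

The resulting model can then be handed to the extraction step so the AI's output is validated against the user's schema.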

Usage

  1. Use the schema builder to define the structure of data you want to extract
  2. Upload a document (e.g., a resume or invoice)
  3. The system will extract data according to your schema
  4. Extracted data can be stored in the SQLite database

Security Notes

  • The .env file containing your API key is excluded from version control
  • CORS is configured to allow frontend integration
  • File uploads are stored in a dedicated directory
