This project provides a comprehensive solution for extracting structured data from documents using Llama Cloud's AI services and managing the extracted data in a SQLite database. It consists of a FastAPI backend and a React-based schema builder frontend.
- Document Upload: Upload documents for data extraction
- Schema Builder: Create custom data schemas for extraction
- AI-Powered Extraction: Use Llama Cloud's AI to extract structured data from documents
- Database Integration: Store extracted data in SQLite with dynamic table creation
- CORS Support: Ready for frontend integration
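To make the database feature concrete: "dynamic table creation" means the SQLite table is derived at runtime from whatever schema you define, rather than being hard-coded. A minimal sketch of that idea using SQLAlchemy (table and column names are illustrative assumptions, not the exact code in app.py):

```python
# Sketch: create a SQLite table at runtime from a user-defined schema,
# then insert one extracted record. Names here are illustrative.
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

engine = create_engine("sqlite:///database.db")
metadata = MetaData()

# Field definitions as the schema builder might emit them (assumed shape).
fields = {"full_name": String, "email": String}

columns = [Column("id", Integer, primary_key=True)]
columns += [Column(name, col_type) for name, col_type in fields.items()]
extracted = Table("extracted_records", metadata, *columns)
metadata.create_all(engine)  # issues CREATE TABLE IF NOT EXISTS

with engine.begin() as conn:
    conn.execute(extracted.insert().values(full_name="Ada Lovelace",
                                           email="ada@example.com"))
```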
```
.
├── app.py                  # FastAPI backend
├── schema-builder/         # React frontend for schema building
├── uploaded_files/         # Storage for uploaded documents
├── llama_cloud_services/   # Llama Cloud integration
└── database.db             # SQLite database
```
- Python 3.8+
- Node.js 14+ (for schema-builder)
- Llama Cloud API key
- Backend Setup:

  ```bash
  # Create and activate virtual environment
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate

  # Install dependencies
  pip install fastapi uvicorn sqlalchemy pydantic python-multipart
  ```

- Frontend Setup:

  ```bash
  cd schema-builder
  npm install
  ```

- Environment Configuration: Create a `.env` file with your Llama Cloud API key (a sketch of loading it follows these steps):

  ```
  LLAMA_CLOUD_API_KEY="your-api-key"
  ```

- Start the Backend:

  ```bash
  uvicorn app:app --reload
  ```

- Start the Frontend:

  ```bash
  cd schema-builder
  npm run dev
  ```
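How app.py reads the key from `.env` is its own detail; a minimal loading sketch, assuming the python-dotenv package (`pip install python-dotenv`):

```python
# Sketch: load LLAMA_CLOUD_API_KEY from .env into the environment.
# Assumes python-dotenv is installed; app.py may load it differently.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

api_key = os.environ.get("LLAMA_CLOUD_API_KEY")
if not api_key:
    raise RuntimeError("LLAMA_CLOUD_API_KEY not set; check your .env file")
```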
The backend exposes four endpoints:

- `POST /generate-schema`: Generate a Pydantic model from a user-defined schema
- `POST /upload-file`: Upload a document for extraction
- `POST /extract-data`: Extract data from the uploaded document using AI
- `POST /push-to-db`: Store extracted data in the SQLite database
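The schema-to-model step in `/generate-schema` is the kind of thing Pydantic's dynamic model factory handles; the sketch below shows the general technique (the request shape and type mapping are assumptions, not the exact code in app.py):

```python
# Sketch: build a Pydantic model at runtime from a user-defined schema.
# The `fields` payload shape is an assumption for illustration.
from pydantic import create_model

TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}

def build_model(name: str, fields: list[dict]) -> type:
    """Turn [{'name': 'full_name', 'type': 'string'}, ...] into a model class."""
    kwargs = {f["name"]: (TYPE_MAP[f["type"]], ...) for f in fields}
    return create_model(name, **kwargs)

Resume = build_model("Resume", [{"name": "full_name", "type": "string"},
                                {"name": "email", "type": "string"}])
print(Resume(full_name="Ada Lovelace", email="ada@example.com"))
```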
- Use the schema builder to define the structure of data you want to extract
- Upload a document (e.g., a resume or an invoice)
- The system will extract data according to your schema
- Extracted data can be stored in the SQLite database
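End to end, that workflow maps onto the four endpoints above. A hypothetical client session (payload field names and response shapes are assumptions; requires the requests package):

```python
# Hypothetical end-to-end client session against the backend.
# Endpoint paths match the list above; JSON field names are assumptions.
import requests

BASE = "http://localhost:8000"

# 1. Define the extraction schema.
schema = {"name": "Invoice",
          "fields": [{"name": "vendor", "type": "string"},
                     {"name": "total", "type": "number"}]}
requests.post(f"{BASE}/generate-schema", json=schema).raise_for_status()

# 2. Upload the document to extract from.
with open("invoice.pdf", "rb") as f:
    requests.post(f"{BASE}/upload-file", files={"file": f}).raise_for_status()

# 3. Run AI extraction against the uploaded file.
extracted = requests.post(f"{BASE}/extract-data").json()

# 4. Persist the extracted record to SQLite.
requests.post(f"{BASE}/push-to-db", json=extracted).raise_for_status()
```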
- The `.env` file containing your API key is excluded from version control
- CORS is configured to allow frontend integration
- File uploads are stored in a dedicated directory