Tests related to converting arbitrary PDFs into iTELL JSON format.
Pipeline for Extracting Images & Embedding in ITELL JSON using Gemini API
- Extract Images from PDF
  - Use `PyMuPDF` to parse the PDF and extract all images.
  - Store images locally for now, with possible migration to a hosted DB later.
  - Extract and record metadata for each image:
    - Original position (coordinates) within the PDF page
    - Page number, image size, etc.
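The extraction step above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function names, the record keys (`page`, `xref`, `bbox`, `file`) and the file-naming scheme are all hypothetical. It assumes PyMuPDF is installed and importable as `fitz`, and it records only the first placement rectangle when an image appears more than once on a page.

```python
from pathlib import Path


def image_record(page_number, xref, rect, width, height, ext):
    """Build one JSON-serializable metadata entry (hypothetical schema)."""
    return {
        "page": page_number,                      # 1-based page number
        "xref": xref,                             # PDF object number of the image
        "bbox": [round(v, 2) for v in rect],      # x0, y0, x1, y1 in PDF points
        "width": width,                           # pixel width of the image
        "height": height,                         # pixel height of the image
        "file": f"page{page_number}_img{xref}.{ext}",
    }


def extract_images(pdf_path, out_dir):
    """Save every image in the PDF to out_dir and return its metadata."""
    import fitz  # PyMuPDF; imported here so image_record works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    records = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for img in page.get_images(full=True):
                xref = img[0]
                info = doc.extract_image(xref)        # raw bytes plus format info
                rects = page.get_image_rects(xref)    # where the image is drawn
                rect = tuple(rects[0]) if rects else (0.0, 0.0, 0.0, 0.0)
                rec = image_record(page.number + 1, xref, rect,
                                   info["width"], info["height"], info["ext"])
                (out / rec["file"]).write_bytes(info["image"])
                records.append(rec)
    return records


# Example of the metadata produced for a single image
rec = image_record(2, 17, (10.0, 20.123, 110.0, 95.5), 400, 300, "png")
```

Keeping the record builder separate from the PyMuPDF loop makes it easy to swap local storage for a hosted store later without touching the metadata shape.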
- Gemini API Integration
  - Send the PDF file directly to the Gemini API.
  - In the prompt, include:
    - Reference to the ITELL guide
    - Example ITELL JSON
    - The image metadata (positions, page numbers, etc.)
  - Goal: Ensure Gemini embeds images within the ITELL JSON at their correct locations according to the extracted PDF positions.
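One way the prompt described above might be assembled is sketched below. The wording of the instructions, the function name `build_prompt`, and the sample guide, reference JSON and image record are all illustrative assumptions; only the three ingredients (guide, example JSON, image metadata) come from the list above. Sending the PDF itself is left out.

```python
import json


def build_prompt(guide_text, reference_json, image_records):
    """Combine the guide, an example JSON and image metadata into one prompt."""
    parts = [
        "Convert the attached PDF into iTELL JSON.",
        "Follow this guide:\n" + guide_text.strip(),
        "Example iTELL JSON:\n" + json.dumps(reference_json, indent=2),
        "Extracted image metadata (one entry per image):\n"
        + json.dumps(image_records, indent=2),
        "Embed each image at the location that matches its page and bbox.",
    ]
    # Blank lines between sections keep the prompt easy to read and debug
    return "\n\n".join(parts)


# Placeholder inputs; a real run would load the guide and reference from disk
prompt = build_prompt(
    "Use short chunks.",
    {"Title": "Demo"},
    [{"page": 1, "bbox": [0, 0, 10, 10], "file": "page1_img5.png"}],
)
```

Passing the metadata as JSON rather than free text gives the model exact page numbers and coordinates to match against.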
- Python 3.8 or higher
- Clone the repository (if not already done):

      git clone <repository-url>
      cd itell-volume-generation

- Create a virtual environment:

      python -m venv venv

- Activate the virtual environment:
  - On macOS/Linux:

        source venv/bin/activate

  - On Windows:

        venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Create a `.env` file in the project root:

      touch .env

- Configure environment variables in `.env`:

      # Required: Choose one API provider
      OPENAI_API_KEY=your_openai_api_key_here
      # OR
      OPENROUTER_API_KEY=your_openrouter_api_key_here

      # Optional: Model configuration
      OPENAI_MODEL=gpt-4o-mini
      OPENROUTER_MODEL=google/gemini-2.5-flash

      # Optional: Base URL overrides
      OPENAI_BASE_URL=https://api.openai.com/v1
      OPENROUTER_BASE_URL=https://openrouter.ai/api/v1

      # Optional: OpenRouter-specific headers
      OPENROUTER_SITE_URL=https://yoursite.com
      OPENROUTER_APP_NAME=YourAppName

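To sanity-check a `.env` file before running anything, a small standard-library check like the one below may help. It is only a sketch: the helper names `parse_env` and `check_keys` are hypothetical, the parser handles plain `KEY=VALUE` lines only, and the project itself may load the file differently (for example with a dotenv library).

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_keys(values):
    """Return the provider API keys that are set to a non-empty value."""
    return [k for k in ("OPENAI_API_KEY", "OPENROUTER_API_KEY") if values.get(k)]


sample = """\
# Required: Choose one API provider
OPENROUTER_API_KEY=your_openrouter_api_key_here
OPENROUTER_MODEL=google/gemini-2.5-flash
"""
env = parse_env(sample)
found = check_keys(env)
if not found:
    raise SystemExit("Set OPENAI_API_KEY or OPENROUTER_API_KEY in .env")
```

For a real file, replace `sample` with `Path(".env").read_text()`.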
Run the pipeline:
python src/pipeline/main.py \
--pdf src/resources/input.pdf \
--guide src/resources/guide.md \
--reference-json src/resources/reference.json \
--image-dir results/extracted-images \
--output results/itell.json

- Provide only `OPENROUTER_API_KEY` (omit `OPENAI_API_KEY`) to automatically target `https://openrouter.ai/api/v1`. Override with `OPENROUTER_BASE_URL` or `--base-url` if needed.
- Most OpenRouter models use provider-scoped names such as `google/gemini-2.5-flash` (the default when no model is specified) or `openrouter/openai/gpt-4o-mini`. Pass a custom name via `--model` or set `OPENROUTER_MODEL`.
- Optional headers recommended by OpenRouter can be set via `OPENROUTER_SITE_URL` (becomes `HTTP-Referer`) and `OPENROUTER_APP_NAME` (becomes `X-Title`).
- If you keep both OpenAI and OpenRouter keys, pass `--api-key`/`--base-url` explicitly so the correct provider is used.
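
The provider rules above can be illustrated with a short sketch. It is not the code in `src/pipeline/main.py`: the function names are hypothetical, the precedence (CLI flag, then environment variable, then default) is an assumption, and so is `gpt-4o-mini` as the OpenAI fallback model. The flag names and environment variables are the ones listed in this README.

```python
import argparse

OPENAI_URL = "https://api.openai.com/v1"
OPENROUTER_URL = "https://openrouter.ai/api/v1"


def build_parser():
    """Mirror the CLI flags mentioned in this README (sketch only)."""
    parser = argparse.ArgumentParser(description="PDF to iTELL JSON (sketch)")
    for flag in ("--pdf", "--guide", "--reference-json", "--image-dir", "--output"):
        parser.add_argument(flag, required=True)
    for flag in ("--model", "--api-key", "--base-url"):
        parser.add_argument(flag, default=None)
    return parser


def resolve_provider(args, env):
    """Pick API key, base URL, model and headers from flags and environment."""
    # OpenRouter is auto-selected only when its key is the sole key present
    use_openrouter = bool(env.get("OPENROUTER_API_KEY")) and not env.get("OPENAI_API_KEY")
    prefix = "OPENROUTER" if use_openrouter else "OPENAI"
    default_url = OPENROUTER_URL if use_openrouter else OPENAI_URL
    # The OpenAI default model is an assumption, not confirmed by the README
    default_model = "google/gemini-2.5-flash" if use_openrouter else "gpt-4o-mini"
    settings = {
        "api_key": args.api_key or env.get(f"{prefix}_API_KEY"),
        "base_url": args.base_url or env.get(f"{prefix}_BASE_URL") or default_url,
        "model": args.model or env.get(f"{prefix}_MODEL") or default_model,
        "headers": {},
    }
    if use_openrouter:
        if env.get("OPENROUTER_SITE_URL"):
            settings["headers"]["HTTP-Referer"] = env["OPENROUTER_SITE_URL"]
        if env.get("OPENROUTER_APP_NAME"):
            settings["headers"]["X-Title"] = env["OPENROUTER_APP_NAME"]
    return settings


args = build_parser().parse_args([
    "--pdf", "src/resources/input.pdf",
    "--guide", "src/resources/guide.md",
    "--reference-json", "src/resources/reference.json",
    "--image-dir", "results/extracted-images",
    "--output", "results/itell.json",
])
settings = resolve_provider(
    args, {"OPENROUTER_API_KEY": "key", "OPENROUTER_APP_NAME": "YourAppName"}
)
```

In a real run, `env` would be `os.environ` after the `.env` file has been loaded.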