Learning Track: From Chatbots to Production Components
Focus: Fine control, guardrails, and output reliability
A production-grade LLM-powered data extraction system that demonstrates how to build reliable, trustworthy LLM components through structured outputs, validation, and automatic error recovery.
🆓 Uses FREE local models via Ollama - no API costs!
By the end of this project, you'll understand:
- Function Calling - How to force LLMs to return structured data instead of free-form text
- Schema Validation - Using Pydantic to enforce type safety and catch errors immediately
- Retry Logic - Automatic recovery from validation failures with error-aware prompting
- Deterministic Behavior - Configuring LLMs for consistency over creativity
- Production Patterns - Moving from "works sometimes" to "works reliably"
The Big Picture: Transform LLMs from unpredictable text generators into components you can actually use in production pipelines.
┌─────────────────┐
│ Raw Text Input │ (email, invoice, support ticket)
└────────┬────────┘
│
▼
┌─────────────────────────────────────────┐
│ LLM Extractor (with Function Calling) │
│ • llama3.2 via Ollama, low temperature │
│ • Pydantic schema as function def │
│ • Structured JSON output enforced │
└────────┬────────────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ Pydantic Validation Layer │
│ • Type checking │
│ • Field constraints │
│ • Business logic rules │
└────────┬───────────┬────────────┘
│ │
✓ Valid ✗ Invalid
│ │
│ ▼
│ ┌──────────────────┐
│ │ Retry Logic │
│ │ • Error feedback│
│ │ • Max 3 attempts│
│ └──────┬───────────┘
│ │
│ ▼
│ (retry with corrections)
│
▼
┌─────────────────┐
│ Valid JSON Out │
└─────────────────┘
- Python 3.10+
- Ollama installed (free local LLM runtime)
- Install from: https://ollama.ai
- Or via: brew install ollama (macOS)
- Basic understanding of async/await (helpful but not required)
# Install Ollama (if not already installed)
# macOS/Linux:
curl -fsSL https://ollama.ai/install.sh | sh
# Or macOS with Homebrew:
brew install ollama
# Start Ollama service (in a separate terminal)
ollama serve
# Pull the default model (llama3.2 - fast and capable)
ollama pull llama3.2
# Clone or navigate to the project
cd llm_102
# Install Python dependencies
pip install -r requirements.txt
# Set up environment variables (optional - has defaults)
cp .env.example .env
# Edit .env if you want to change model or Ollama host

# Extract data from a sample invoice
python cli.py extract \
--input sample_inputs/invoice_tech.txt \
--type invoice \
--output output/invoice_result.json
# Extract data from an email
python cli.py extract \
--input sample_inputs/email_project.txt \
--type email
# Process a support ticket
python cli.py extract \
--input sample_inputs/support_ticket_urgent.txt \
--type support_ticket \
  --verbose

# List the available extraction schemas
python cli.py list-schemas

# Validate an existing JSON file against a schema
python cli.py validate \
  --schema invoice \
  --file sample_outputs/invoice_success.json

llm_102/
├── README.md # This file
├── CONCEPTS.md # Deep dive into core concepts
├── requirements.txt # Python dependencies
├── .env.example # Environment template
├── cli.py # CLI interface (entry point)
│
├── src/
│ ├── __init__.py
│ ├── schemas.py # Pydantic models for extraction
│ ├── extractor.py # Core extraction engine
│ └── logging_config.py # Logging setup
│
├── sample_inputs/ # Example documents to extract from
│ ├── invoice_tech.txt
│ ├── email_project.txt
│ ├── email_inquiry.txt
│ └── support_ticket_urgent.txt
│
└── sample_outputs/ # Example successful extractions
├── invoice_success.json
└── email_success.json
Create a .env file with:
# Optional (defaults shown)
MODEL_NAME=llama3.2 # Free Ollama model
OLLAMA_HOST=http://localhost:11434 # Local Ollama server
TEMPERATURE=0.1 # Low for deterministic outputs
MAX_RETRIES=3 # Retry attempts on validation failure

Available Ollama Models:
- llama3.2 (recommended) - Fast, capable, 3B params
- llama3.1 - More powerful, 8B params
- mistral - Alternative, good for JSON
- phi3 - Microsoft's model, very fast
Pull any model with: ollama pull <model-name>
# Full control over extraction
python cli.py extract \
--input <file> \
--type <invoice|email|support_ticket> \
--output <optional-output-file> \
--model llama3.1 \ # Override model
--temperature 0.0 \ # Override temperature
--max-retries 5 \ # Override retry limit
--verbose \ # Debug logging
--show-attempts # Show all retry attempts

Instead of parsing free-form text:
# ❌ Unreliable: Parse LLM's creative response
"The invoice total is $1,234.56 and it's due on March 15th"
# ✅ Reliable: Schema-enforced JSON
{
"total_amount": 1234.56,
"currency": "USD",
"due_date": "2025-03-15"
}

How it works:
1. We define a Pydantic schema (e.g., InvoiceData)
2. Convert it to JSON Schema format
3. Include it in the prompt to guide the model
4. LLM returns JSON conforming to the schema
5. We validate with Pydantic
Note: Ollama models don't have native function calling like OpenAI, but we achieve the same result through structured prompting with JSON schemas.
See src/extractor.py lines 126-180 for implementation.
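As a concrete sketch of that flow (illustrative, not the project's exact code; the prompt wording and the extract_invoice helper are made up here):

```python
import json

import ollama  # official Ollama Python client
from pydantic import BaseModel, Field

class InvoiceData(BaseModel):  # trimmed-down example schema
    invoice_number: str
    total_amount: float = Field(..., gt=0)
    due_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")

def extract_invoice(text: str) -> InvoiceData:
    # Steps 1-3: Pydantic model -> JSON Schema, embedded in the prompt
    schema = json.dumps(InvoiceData.model_json_schema(), indent=2)
    prompt = (
        "Extract the invoice fields from the text below. "
        f"Respond ONLY with JSON matching this schema:\n{schema}\n\nText:\n{text}"
    )
    # Step 4: format='json' nudges Ollama to emit syntactically valid JSON
    response = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": prompt}],
        options={"temperature": 0.1},
        format="json",
    )
    # Step 5: validate; raises pydantic.ValidationError on schema violations
    return InvoiceData.model_validate_json(response["message"]["content"])
```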
Type Safety:
class InvoiceData(BaseModel):
    invoice_number: str
    total_amount: float = Field(..., gt=0)  # Must be positive
    due_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")

Benefits:
- Automatic type conversion ("42" → 42)
- Field constraints (regex, min/max, custom validators)
- Clear error messages for debugging
- Self-documenting schemas
See src/schemas.py for all schema definitions.
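You can exercise the validation layer in isolation; a quick sketch with a deliberately bad payload (the fields mirror the InvoiceData example above):

```python
from pydantic import BaseModel, Field, ValidationError

class InvoiceData(BaseModel):
    invoice_number: str
    total_amount: float = Field(..., gt=0)
    due_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")

bad = '{"invoice_number": "INV-1", "total_amount": -5, "due_date": "03/15/2025"}'
try:
    InvoiceData.model_validate_json(bad)
except ValidationError as e:
    # Each error names the failing field and the violated constraint;
    # these messages are exactly what the retry loop feeds back to the LLM.
    for err in e.errors():
        print(err["loc"], "->", err["msg"])
```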
When validation fails, we don't just give up:
# Attempt 1: LLM returns invalid date format
{"due_date": "03/15/2025"} # ❌ Doesn't match YYYY-MM-DD
# System builds feedback prompt:
"Field 'due_date': string does not match regex pattern ^\d{4}-\d{2}-\d{2}$"
# Attempt 2: LLM corrects the error
{"due_date": "2025-03-15"} # ✓ Valid!Key insight: The LLM learns from its mistakes within the same request chain.
See src/extractor.py lines 75-120 for retry implementation.
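In outline, the retry loop can be sketched like this (simplified; call_llm is a hypothetical stand-in for the model call, InvoiceData reuses the model shown above, and the real implementation in src/extractor.py adds logging and attempt tracking):

```python
from typing import Callable
from pydantic import ValidationError

def extract_with_retry(
    text: str,
    call_llm: Callable[[str, str], str],  # hypothetical: (text, feedback) -> raw JSON
    max_retries: int = 3,
) -> InvoiceData:
    feedback = ""
    for attempt in range(1, max_retries + 1):
        raw = call_llm(text, feedback)
        try:
            return InvoiceData.model_validate_json(raw)
        except ValidationError as e:
            # Fold each validation error into the next prompt so the model
            # can correct the specific fields that failed.
            feedback = "\n".join(
                f"Field '{'.'.join(map(str, err['loc']))}': {err['msg']}"
                for err in e.errors()
            )
    raise RuntimeError(f"Extraction failed after {max_retries} attempts")
```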
Temperature = 0.1 (not 0.7+)
- Consistent outputs for the same input
- Still flexible enough to handle variations
- Critical for production reliability
Trade-off:
- Low temperature → deterministic, reliable
- High temperature → creative, unpredictable
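With the Ollama Python client, temperature is a per-request option; a fixed seed tightens reproducibility further (a sketch; the prompt and option values here are illustrative):

```python
import ollama

prompt = "Extract the invoice fields as JSON: Invoice INV-1, total $50, due 2025-03-15"
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": prompt}],
    options={
        "temperature": 0.1,  # low randomness -> consistent outputs
        "seed": 42,          # same seed + same input -> repeatable sampling
    },
)
print(response["message"]["content"])
```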
Read CONCEPTS.md - it explains why these patterns matter.
Open src/schemas.py and see:
- How Pydantic models define structure
- Field validators for business logic
- Nested models (e.g., InvoiceItem in InvoiceData)
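The nesting pattern looks roughly like this (field names here are assumed for illustration; see src/schemas.py for the real definitions):

```python
from typing import List
from pydantic import BaseModel, Field

class InvoiceItem(BaseModel):
    description: str
    quantity: int = Field(..., gt=0)
    unit_price: float = Field(..., ge=0)

class InvoiceData(BaseModel):
    invoice_number: str
    items: List[InvoiceItem]  # each item is validated recursively
    total_amount: float = Field(..., gt=0)
```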
Run with --verbose and watch the logs:
python cli.py extract \
--input sample_inputs/invoice_tech.txt \
--type invoice \
  --verbose

Follow the flow:
- Input text loaded
- LLM called with function schema
- Response validated
- (If failure) Retry with error feedback
- Success or final failure
Try breaking things to learn:
# What happens with incomplete data?
echo "Invoice #123, total $50" > test.txt
python cli.py extract --input test.txt --type invoice
# Can it handle ambiguous dates?
# Can it extract from messy formatting?

Add your own schema:
1. Define a new Pydantic model in schemas.py
2. Add it to EXTRACTION_SCHEMAS
3. Create sample inputs
4. Test with the CLI
Example: Resume parser, product catalog, contract terms, etc.
⚙ Starting extraction...
┌─ Extraction Attempts ──────────────────┐
│ # │ Status │ Details │
├─────┼────────────┼─────────────────────┤
│ 1 │ ✓ Success │ Valid data extracted│
└─────┴────────────┴─────────────────────┘
✓ Extraction succeeded after 1 attempt!
┌─ Extracted Data ───────────────────────┐
│ {
│ "invoice_number": "INV-2025-0342",
│ "invoice_date": "2025-03-10",
│ "due_date": "2025-04-09",
│ ...
│ }
└────────────────────────────────────────┘
⚙ Starting extraction...
┌─ Extraction Attempts ──────────────────────────────────┐
│ # │ Status │ Details │
├─────┼───────────┼──────────────────────────────────────┤
│ 1 │ ✗ Failed │ due_date: string does not match regex│
│ 2 │ ✓ Success │ Valid data extracted │
└─────┴───────────┴──────────────────────────────────────┘
✓ Extraction succeeded after 2 attempts!
- Free and open-source - no API costs
- Privacy - data never leaves your machine
- Fast - local inference, no network latency
- Flexible - try different models easily
- Production-ready - can deploy same setup anywhere
- Good balance of speed and capability
- Excellent at structured output tasks
- Small enough to run on most machines (3B params)
- Can upgrade to llama3.1 or mistral for better accuracy
Yes! The architecture is provider-agnostic. To add OpenAI support:
1. Install the openai package
2. Modify extractor.py to support multiple backends
3. Add API key to .env
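One low-effort route (a sketch, not the project's current code): Ollama also exposes an OpenAI-compatible endpoint, so a single openai client can target either backend just by switching base_url:

```python
from openai import OpenAI

# Local Ollama via its OpenAI-compatible API (the api_key value is ignored)
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Hosted OpenAI (reads OPENAI_API_KEY from the environment by default)
hosted = OpenAI()

def complete(client: OpenAI, model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,  # e.g. "llama3.2" locally, "gpt-4o-mini" on OpenAI
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
    )
    return resp.choices[0].message.content
```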
- Industry standard for Python data validation
- Excellent error messages
- Type safety without boilerplate
- Auto-generates JSON schemas
- Focus on core concepts without UI distractions
- Easy to integrate into scripts/pipelines
- Lower barrier to entry (no frontend setup)
- Can easily wrap in FastAPI/Streamlit later (see the sketch below)
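For example, a thin FastAPI wrapper might look like this (a sketch; the endpoint shape is made up, and it assumes the LLMExtractor interface shown in the Integration section below):

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from src import LLMExtractor  # the project's extractor

app = FastAPI()
extractor = LLMExtractor(api_key="...")  # constructor args as in the Integration section

class ExtractRequest(BaseModel):
    text: str
    schema_type: str  # "invoice" | "email" | "support_ticket"

@app.post("/extract")
def extract(req: ExtractRequest):
    result = extractor.extract(req.text, schema_type=req.schema_type)
    if not result.success:
        raise HTTPException(status_code=422, detail=result.error_message)
    return result.data
```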
- LLMs are probabilistic, not deterministic
- Even with low temperature, occasional errors happen
- Retrying with error feedback has ~95% success rate
- Graceful degradation: fail after N attempts, don't hang forever
# Missing required fields
echo "Just some random text" > test.txt
python cli.py extract --input test.txt --type invoice
# → Should fail gracefully after max retries
# Malformed data
echo "Invoice: ABC, Total: not-a-number" > test.txt
python cli.py extract --input test.txt --type invoice
# → Should retry and either fix or fail clearly

# Create invalid JSON
echo '{"invoice_number": 123}' > bad.json
python cli.py validate --schema invoice --file bad.json
# → Should fail: invoice_number must be string
# Test with valid data
python cli.py validate --schema invoice --file sample_outputs/invoice_success.json
# → Should pass

Logs are saved to logs/extraction_<timestamp>.log:
# Run with verbose mode
python cli.py extract --input sample_inputs/invoice_tech.txt --type invoice --verbose
# Check the latest log
ls -lt logs/ | head -2
cat logs/extraction_<timestamp>.log

What to look for:
- Validation errors and retry triggers
- LLM response times
- Token usage (if you add an API-backed provider like OpenAI)
- Patterns in failures (schema issue? prompt issue?)
| Failure Type | Cause | How We Handle |
|---|---|---|
| Schema violation | Wrong type, missing field | Retry with error feedback |
| Hallucination | LLM invents data | Retry with stricter prompt |
| Partial extraction | Some fields missing | Retry with field-specific guidance |
| Timeout/API error | Network/rate limit | Fail fast with clear error |
| Invalid API key | Configuration error | Fail fast immediately |
| Max retries exceeded | Persistent validation errors | Fail gracefully with full error log |
✅ Type Safety: Pydantic ensures no silent failures
✅ Observability: Comprehensive logging of all attempts
✅ Resilience: Automatic retry with error recovery
✅ Fail-Fast: Clear errors on unrecoverable failures
✅ Configurability: Environment-based config, no hardcoded secrets
✅ Extensibility: Easy to add new schemas
✅ Testability: Validate schemas independently
✅ Documentation: Self-documenting code with Pydantic models
1. Define the model in src/schemas.py:

class ContractData(BaseModel):
    contract_id: str
    parties: List[str]
    effective_date: str
    termination_date: str
    key_terms: List[str]

2. Register it in EXTRACTION_SCHEMAS:

"contract": {
    "model": ContractData,
    "description": "Extract structured data from contracts",
    "name": "extract_contract_data"
}

3. Use it:
python cli.py extract --input contract.txt --type contract

from pathlib import Path
from src import LLMExtractor

extractor = LLMExtractor(api_key="...")

# Process a batch
for file_path in invoice_files:
    text = Path(file_path).read_text()
    result = extractor.extract(text, schema_type="invoice")
    if result.success:
        save_to_database(result.data)
    else:
        log_failure(file_path, result.error_message)

The current implementation is synchronous. For high-throughput:
from openai import AsyncOpenAI

class AsyncLLMExtractor:
    async def extract(self, text: str, schema_type: str):
        # Use await self.client.chat.completions.create(...)
        ...
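Once that stub is fleshed out, fan-out with asyncio.gather gives concurrent batch processing (a sketch under that assumption; invoice_texts is a hypothetical list of document strings):

```python
import asyncio

async def process_batch(texts: list[str]):
    extractor = AsyncLLMExtractor()
    # Launch all extractions concurrently instead of one at a time
    return await asyncio.gather(
        *(extractor.extract(t, schema_type="invoice") for t in texts)
    )

results = asyncio.run(process_batch(invoice_texts))
```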
Token Usage:
- Invoice: ~500-800 tokens per extraction
- Email: ~400-600 tokens per extraction
- Support ticket: ~600-900 tokens per extraction
Cost (Ollama):
- FREE - runs locally on your machine
- No API costs, no usage limits
- Only cost is electricity (minimal)
Latency:
- Typical: 2-5 seconds per extraction (depends on hardware)
- With retries: 6-15 seconds worst case
- Much faster on GPU-enabled machines
Hardware Requirements:
- Minimum: 8GB RAM for llama3.2
- Recommended: 16GB RAM for better performance
- Optional: GPU for 5-10x speed improvement
Optimization tips:
- Batch similar documents together
- Use async for concurrent processing
- Cache repeated extractions (see the sketch below)
- Monitor token usage and adjust schemas
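A minimal in-memory cache keyed on input text and schema type might look like this (illustrative only; the project ships no cache, and extractor is the LLMExtractor instance from the Integration section):

```python
import hashlib

_cache: dict[str, object] = {}

def cached_extract(extractor, text: str, schema_type: str):
    # Hash the (schema, text) pair so identical inputs skip the LLM call
    key = hashlib.sha256(f"{schema_type}:{text}".encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    result = extractor.extract(text, schema_type=schema_type)
    if result.success:
        _cache[key] = result.data  # cache only validated extractions
        return result.data
    return None  # don't cache failures; let the caller retry later
```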
- Check if the schema is too strict for the input data
- Review logs to see what fields are failing
- Consider making some fields optional
- Improve the system prompt for clarity
- Rare but possible with some inputs
- Usually means the input is too ambiguous
- Add more context or examples to the prompt
- Not applicable with Ollama (local inference)
- If using a shared Ollama server, coordinate with team
- Make sure Ollama is installed: ollama --version
- Start Ollama service: ollama serve
- Check if running: curl http://localhost:11434/api/tags
- Verify OLLAMA_HOST in .env matches your setup

- Pull the model: ollama pull llama3.2
- List available models: ollama list
- Check MODEL_NAME in .env matches an installed model
- Ollama Documentation
- Llama 3.2 Model Card
- Pydantic Documentation
- Designing LLM Systems for Production
- Prompt Engineering Guide
After completing this project, you should be able to:
- ✅ Explain the difference between free-form text generation and structured function calling
- ✅ Design Pydantic schemas for real-world data extraction tasks
- ✅ Implement retry logic with error-aware prompting
- ✅ Configure LLMs for deterministic vs creative behavior
- ✅ Debug validation failures using logs and error messages
- ✅ Extend the system with new extraction schemas
- ✅ Integrate LLM components into production pipelines
- ✅ Evaluate when to use LLMs vs traditional parsing
The Big Win: You now know how to build LLM systems that are reliable enough to deploy, not just impressive demos.
Ideas for improvements:
- Add async support for batch processing
- Implement caching layer for repeated inputs
- Add Streamlit UI for visual extraction
- Support for PDF/image inputs (OCR + extraction)
- Multi-language support
- Custom validation rules engine
- A/B testing different prompts
- Cost tracking and analytics
MIT License - feel free to use this for learning and commercial projects.
Built as part of a comprehensive LLM engineering learning track. This project demonstrates real-world patterns used in production systems at companies building reliable LLM applications.
Key insight: The difference between a cool demo and a production system is reliability. This project shows you how to cross that gap.
Questions? Issues? Check the logs first, then review CONCEPTS.md for deeper explanations of the patterns used here.
Happy extracting! 🚀