
Week 3: Reliable LLM Systems - Data Extraction with Guardrails

Learning Track: From Chatbots to Production Components
Focus: Fine control, guardrails, and output reliability

A production-grade LLM-powered data extraction system that demonstrates how to build reliable, trustworthy LLM components through structured outputs, validation, and automatic error recovery.

🆓 Uses FREE local models via Ollama - no API costs!

🎯 What You'll Master

By the end of this project, you'll understand:

  1. Function Calling - How to force LLMs to return structured data instead of free-form text
  2. Schema Validation - Using Pydantic to enforce type safety and catch errors immediately
  3. Retry Logic - Automatic recovery from validation failures with error-aware prompting
  4. Deterministic Behavior - Configuring LLMs for consistency over creativity
  5. Production Patterns - Moving from "works sometimes" to "works reliably"

The Big Picture: Transform LLMs from unpredictable text generators into components you can actually use in production pipelines.


🏗️ System Architecture

┌─────────────────┐
│  Raw Text Input │  (email, invoice, support ticket)
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────────┐
│  LLM Extractor (with Function Calling)  │
│  • Ollama llama3.2, low temperature     │
│  • Pydantic schema as function def      │
│  • Structured JSON output enforced      │
└────────┬────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────┐
│  Pydantic Validation Layer      │
│  • Type checking                │
│  • Field constraints            │
│  • Business logic rules         │
└────────┬───────────┬────────────┘
         │           │
    ✓ Valid     ✗ Invalid
         │           │
         │           ▼
         │    ┌──────────────────┐
         │    │  Retry Logic     │
         │    │  • Error feedback│
         │    │  • Max 3 attempts│
         │    └──────┬───────────┘
         │           │
         │           ▼
         │    (retry with corrections)
         │
         ▼
┌─────────────────┐
│  Valid JSON Out │
└─────────────────┘

📋 Prerequisites

  • Python 3.10+
  • Ollama installed (free local LLM runtime)
  • Basic understanding of async/await (helpful but not required)

🚀 Quick Start

1. Installation

# Install Ollama (if not already installed)
# macOS/Linux:
curl -fsSL https://ollama.ai/install.sh | sh
# Or macOS with Homebrew:
brew install ollama

# Start Ollama service (in a separate terminal)
ollama serve

# Pull the default model (llama3.2 - fast and capable)
ollama pull llama3.2

# Clone or navigate to the project
cd llm_102

# Install Python dependencies
pip install -r requirements.txt

# Set up environment variables (optional - has defaults)
cp .env.example .env
# Edit .env if you want to change model or Ollama host

2. Run Your First Extraction

# Extract data from a sample invoice
python cli.py extract \
  --input sample_inputs/invoice_tech.txt \
  --type invoice \
  --output output/invoice_result.json

# Extract data from an email
python cli.py extract \
  --input sample_inputs/email_project.txt \
  --type email

# Process a support ticket
python cli.py extract \
  --input sample_inputs/support_ticket_urgent.txt \
  --type support_ticket \
  --verbose

3. List Available Schemas

python cli.py list-schemas

4. Validate Existing JSON

python cli.py validate \
  --schema invoice \
  --file sample_outputs/invoice_success.json

📚 Project Structure

llm_102/
├── README.md                    # This file
├── CONCEPTS.md                  # Deep dive into core concepts
├── requirements.txt             # Python dependencies
├── .env.example                 # Environment template
├── cli.py                       # CLI interface (entry point)
│
├── src/
│   ├── __init__.py
│   ├── schemas.py               # Pydantic models for extraction
│   ├── extractor.py             # Core extraction engine
│   └── logging_config.py        # Logging setup
│
├── sample_inputs/               # Example documents to extract from
│   ├── invoice_tech.txt
│   ├── email_project.txt
│   ├── email_inquiry.txt
│   └── support_ticket_urgent.txt
│
└── sample_outputs/              # Example successful extractions
    ├── invoice_success.json
    └── email_success.json

🔧 Configuration

Environment Variables

Create a .env file with:

# Optional (defaults shown)
MODEL_NAME=llama3.2              # Free Ollama model
OLLAMA_HOST=http://localhost:11434  # Local Ollama server
TEMPERATURE=0.1                  # Low for deterministic outputs
MAX_RETRIES=3                    # Retry attempts on validation failure

Available Ollama Models:

  • llama3.2 (recommended) - Fast, capable, 3B params
  • llama3.1 - More powerful, 8B params
  • mistral - Alternative, good for JSON
  • phi3 - Microsoft's model, very fast

Pull any model with: ollama pull <model-name>

CLI Options

# Full control over extraction. Flag reference:
#   --model          override model
#   --temperature    override temperature
#   --max-retries    override retry limit
#   --verbose        debug logging
#   --show-attempts  show all retry attempts
python cli.py extract \
  --input <file> \
  --type <invoice|email|support_ticket> \
  --output <optional-output-file> \
  --model llama3.1 \
  --temperature 0.0 \
  --max-retries 5 \
  --verbose \
  --show-attempts

📖 Core Concepts

1. Function Calling (Structured Outputs)

Instead of parsing free-form text:

# ❌ Unreliable: Parse LLM's creative response
"The invoice total is $1,234.56 and it's due on March 15th"

# ✅ Reliable: Schema-enforced JSON
{
  "total_amount": 1234.56,
  "currency": "USD",
  "due_date": "2025-03-15"
}

How it works:

  • We define a Pydantic schema (e.g., InvoiceData)
  • Convert it to JSON Schema format
  • Include it in the prompt to guide the model
  • LLM returns JSON conforming to the schema
  • We validate with Pydantic

Note: Ollama models don't have native function calling like OpenAI, but we achieve the same result through structured prompting with JSON schemas.

See src/extractor.py lines 126-180 for implementation.
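
To make this concrete, here is a minimal, hypothetical sketch of the pattern. The real code lives in src/extractor.py; build_prompt and call_ollama are illustrative names, not the project's API, and the prompt wording is an assumption:

# Illustrative sketch only - names and prompt wording are assumptions.
import json

import requests
from pydantic import BaseModel

class InvoiceData(BaseModel):  # trimmed-down stand-in for the real schema
    invoice_number: str
    total_amount: float
    due_date: str

def build_prompt(text: str) -> str:
    # Pydantic v2 can emit a JSON Schema for any model
    schema = json.dumps(InvoiceData.model_json_schema(), indent=2)
    return (
        "Extract the fields defined by this JSON Schema from the document.\n"
        f"Respond with ONLY valid JSON matching the schema:\n{schema}\n\n"
        f"Document:\n{text}"
    )

def call_ollama(prompt: str) -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2",
            "prompt": prompt,
            "format": "json",   # Ollama constrains the output to valid JSON
            "stream": False,
            "options": {"temperature": 0.1},
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]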

2. Pydantic Validation

Type Safety:

from pydantic import BaseModel, Field

class InvoiceData(BaseModel):
    invoice_number: str
    total_amount: float = Field(..., gt=0)  # Must be positive
    due_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")

Benefits:

  • Automatic type conversion ("42" → 42)
  • Field constraints (regex, min/max, custom validators)
  • Clear error messages for debugging
  • Self-documenting schemas

See src/schemas.py for all schema definitions.
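
For instance, feeding bad values into the model above surfaces every violation at once (a sketch, assuming Pydantic v2):

from pydantic import BaseModel, Field, ValidationError

class InvoiceData(BaseModel):
    invoice_number: str
    total_amount: float = Field(..., gt=0)
    due_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")

try:
    InvoiceData(invoice_number="INV-1", total_amount=-5, due_date="03/15/2025")
except ValidationError as exc:
    print(exc)  # names each failing field with a human-readable reason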

3. Retry Logic with Error Feedback

When validation fails, we don't just give up:

# Attempt 1: LLM returns invalid date format
{"due_date": "03/15/2025"}  # ❌ Doesn't match YYYY-MM-DD

# System builds feedback prompt:
"Field 'due_date': string does not match regex pattern ^\d{4}-\d{2}-\d{2}$"

# Attempt 2: LLM corrects the error
{"due_date": "2025-03-15"}  # ✓ Valid!

Key insight: The LLM learns from its mistakes within the same request chain.

See src/extractor.py lines 75-120 for retry implementation.
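
A hypothetical version of that loop, reusing the build_prompt / call_ollama sketches from above (the project's actual loop is in src/extractor.py):

from pydantic import ValidationError

def extract_with_retry(text: str, max_retries: int = 3) -> InvoiceData:
    feedback = ""
    for attempt in range(1, max_retries + 1):
        raw = call_ollama(build_prompt(text) + feedback)
        try:
            return InvoiceData.model_validate_json(raw)
        except ValidationError as exc:
            # Append the validator's complaints so the next attempt can fix them
            feedback = (
                "\n\nYour previous answer failed validation:\n"
                f"{exc}\nReturn corrected JSON only."
            )
    raise RuntimeError(f"Validation failed after {max_retries} attempts")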

4. Deterministic Behavior

Temperature = 0.1 (not 0.7+)

  • Consistent outputs for the same input
  • Still flexible enough to handle variations
  • Critical for production reliability

Trade-off:

  • Low temperature → deterministic, reliable
  • High temperature → creative, unpredictable
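
In practice this is just a decoding option on each request. A sketch against Ollama's HTTP API; pinning a seed is an extra stabilizer, not necessarily something the project sets:

import requests

payload = {
    "model": "llama3.2",
    "prompt": "...",  # prompt built as in the sketches above
    "stream": False,
    "options": {
        "temperature": 0.1,  # low randomness -> consistent outputs
        "seed": 42,          # fixed seed makes repeated runs more reproducible
    },
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)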

🎓 Learning Path

Step 1: Understand the Problem

Read CONCEPTS.md - it explains why these patterns matter.

Step 2: Explore the Schemas

Open src/schemas.py and see:

  • How Pydantic models define structure
  • Field validators for business logic
  • Nested models (e.g., InvoiceItem in InvoiceData)

Step 3: Trace an Extraction

Run with --verbose and watch the logs:

python cli.py extract \
  --input sample_inputs/invoice_tech.txt \
  --type invoice \
  --verbose

Follow the flow:

  1. Input text loaded
  2. LLM called with function schema
  3. Response validated
  4. (If failure) Retry with error feedback
  5. Success or final failure

Step 4: Experiment

Try breaking things to learn:

# What happens with incomplete data?
echo "Invoice #123, total $50" > test.txt
python cli.py extract --input test.txt --type invoice

# Can it handle ambiguous dates?
# Can it extract from messy formatting?

Step 5: Extend

Add your own schema:

  1. Define a new Pydantic model in schemas.py
  2. Add it to EXTRACTION_SCHEMAS
  3. Create sample inputs
  4. Test with the CLI

Example: Resume parser, product catalog, contract terms, etc.


🔬 Example Outputs

Successful Extraction (1 attempt)

⚙ Starting extraction...

┌─ Extraction Attempts ──────────────────┐
│  #  │ Status     │ Details             │
├─────┼────────────┼─────────────────────┤
│  1  │ ✓ Success  │ Valid data extracted│
└─────┴────────────┴─────────────────────┘

✓ Extraction succeeded after 1 attempt!

┌─ Extracted Data ───────────────────────┐
│ {
│   "invoice_number": "INV-2025-0342",
│   "invoice_date": "2025-03-10",
│   "due_date": "2025-04-09",
│   ...
│ }
└────────────────────────────────────────┘

Failed Extraction with Retry

⚙ Starting extraction...

┌─ Extraction Attempts ──────────────────────────────────┐
│  #  │ Status    │ Details                              │
├─────┼───────────┼──────────────────────────────────────┤
│  1  │ ✗ Failed  │ due_date: string does not match regex│
│  2  │ ✓ Success │ Valid data extracted                 │
└─────┴───────────┴──────────────────────────────────────┘

✓ Extraction succeeded after 2 attempts!

🛠️ Design Decisions

Why Ollama?

  • Free and open-source - no API costs
  • Privacy - data never leaves your machine
  • Fast - local inference, no network latency
  • Flexible - try different models easily
  • Production-ready - can deploy same setup anywhere

Why llama3.2?

  • Good balance of speed and capability
  • Excellent at structured output tasks
  • Small enough to run on most machines (3B params)
  • Can upgrade to llama3.1 or mistral for better accuracy

Can I still use OpenAI?

Yes! The architecture is provider-agnostic. To add OpenAI support:

  1. Install openai package
  2. Modify extractor.py to support multiple backends
  3. Add API key to .env

Why Pydantic?

  • Industry standard for Python data validation
  • Excellent error messages
  • Type safety without boilerplate
  • Auto-generates JSON schemas

Why CLI (not Web UI)?

  • Focus on core concepts without UI distractions
  • Easy to integrate into scripts/pipelines
  • Lower barrier to entry (no frontend setup)
  • Can easily wrap in FastAPI/Streamlit later

Why Retry Logic?

  • LLMs are probabilistic, not deterministic
  • Even with low temperature, occasional errors happen
  • Retrying with error feedback has ~95% success rate
  • Graceful degradation: fail after N attempts, don't hang forever

🧪 Testing the System

Test with Invalid Inputs

# Missing required fields
echo "Just some random text" > test.txt
python cli.py extract --input test.txt --type invoice
# → Should fail gracefully after max retries

# Malformed data
echo "Invoice: ABC, Total: not-a-number" > test.txt
python cli.py extract --input test.txt --type invoice
# → Should retry and either fix or fail clearly

Test Validation Separately

# Create invalid JSON
echo '{"invoice_number": 123}' > bad.json
python cli.py validate --schema invoice --file bad.json
# → Should fail: invoice_number must be string

# Test with valid data
python cli.py validate --schema invoice --file sample_outputs/invoice_success.json
# → Should pass
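
The same check can be done in a few lines of Python; a sketch assuming the EXTRACTION_SCHEMAS registry described below (the exact import path may differ):

import json
from pathlib import Path

from src.schemas import EXTRACTION_SCHEMAS  # assumed import path

data = json.loads(Path("bad.json").read_text())
model = EXTRACTION_SCHEMAS["invoice"]["model"]
model.model_validate(data)  # raises ValidationError listing every bad field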

📊 Logs and Debugging

Logs are saved to logs/extraction_<timestamp>.log:

# Run with verbose mode
python cli.py extract --input sample_inputs/invoice_tech.txt --type invoice --verbose

# Check the latest log
ls -lt logs/ | head -2
cat logs/extraction_<timestamp>.log

What to look for:

  • Validation errors and retry triggers
  • LLM response times
  • Token usage (Ollama reports prompt_eval_count/eval_count per request)
  • Patterns in failures (schema issue? prompt issue?)

🚦 Failure Modes and How We Handle Them

Failure Type            Cause                          How We Handle
Schema violation        Wrong type, missing field      Retry with error feedback
Hallucination           LLM invents data               Retry with stricter prompt
Partial extraction      Some fields missing            Retry with field-specific guidance
Timeout/connection      Ollama down or unreachable     Fail fast with clear error
Misconfiguration        Wrong host or model name       Fail fast immediately
Max retries exceeded    Persistent validation errors   Fail gracefully with full error log

🎯 What Makes This "Production-Ready"

  • Type Safety: Pydantic ensures no silent failures
  • Observability: Comprehensive logging of all attempts
  • Resilience: Automatic retry with error recovery
  • Fail-Fast: Clear errors on unrecoverable failures
  • Configurability: Environment-based config, no hardcoded secrets
  • Extensibility: Easy to add new schemas
  • Testability: Validate schemas independently
  • Documentation: Self-documenting code with Pydantic models


🔮 Extending the System

Add a New Schema

  1. Define the model in src/schemas.py:

from typing import List
from pydantic import BaseModel

class ContractData(BaseModel):
    contract_id: str
    parties: List[str]
    effective_date: str
    termination_date: str
    key_terms: List[str]

  2. Register it in EXTRACTION_SCHEMAS:

"contract": {
    "model": ContractData,
    "description": "Extract structured data from contracts",
    "name": "extract_contract_data"
}

  3. Use it:

python cli.py extract --input contract.txt --type contract

Integrate into a Pipeline

from pathlib import Path

from src import LLMExtractor

# Model and host are read from .env (see Configuration); exact constructor
# arguments may differ - check src/extractor.py.
extractor = LLMExtractor()

# Process a batch
for file_path in invoice_files:
    text = Path(file_path).read_text()
    result = extractor.extract(text, schema_type="invoice")

    if result.success:
        save_to_database(result.data)
    else:
        log_failure(file_path, result.error_message)

Add Async Support

The current implementation is synchronous. For high-throughput workloads, the ollama Python package provides an AsyncClient you could build on:

from ollama import AsyncClient  # pip install ollama

class AsyncLLMExtractor:
    def __init__(self, model: str = "llama3.2"):
        self.client = AsyncClient()
        self.model = model

    async def extract(self, text: str, schema_type: str):
        # Use await self.client.generate(model=self.model, prompt=..., format="json")
        ...
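
With that in place, a batch can be fanned out concurrently (usage sketch, assuming the class above):

import asyncio

async def run_batch(texts: list[str]):
    extractor = AsyncLLMExtractor()
    return await asyncio.gather(
        *(extractor.extract(t, schema_type="invoice") for t in texts)
    )

results = asyncio.run(run_batch(["Invoice #1 ...", "Invoice #2 ..."]))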

📈 Performance Considerations

Token Usage:

  • Invoice: ~500-800 tokens per extraction
  • Email: ~400-600 tokens per extraction
  • Support ticket: ~600-900 tokens per extraction

Cost (Ollama):

  • FREE - runs locally on your machine
  • No API costs, no usage limits
  • Only cost is electricity (minimal)

Latency:

  • Typical: 2-5 seconds per extraction (depends on hardware)
  • With retries: 6-15 seconds worst case
  • Much faster on GPU-enabled machines

Hardware Requirements:

  • Minimum: 8GB RAM for llama3.2
  • Recommended: 16GB RAM for better performance
  • Optional: GPU for 5-10x speed improvement

Optimization tips:

  • Batch similar documents together
  • Use async for concurrent processing
  • Cache repeated extractions (see the sketch after this list)
  • Monitor token usage and adjust schemas
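
As one way to implement the caching tip, a small file-based cache keyed on a hash of the input (a sketch; run stands in for whatever extraction callable you use):

import hashlib
import json
from pathlib import Path
from typing import Callable

CACHE_DIR = Path(".cache")

def cached_extract(text: str, schema_type: str, run: Callable[[str, str], dict]) -> dict:
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(f"{schema_type}:{text}".encode()).hexdigest()
    hit = CACHE_DIR / f"{key}.json"
    if hit.exists():
        return json.loads(hit.read_text())  # reuse the earlier result
    data = run(text, schema_type)           # your real extraction call
    hit.write_text(json.dumps(data))
    return data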

🤔 Common Issues

"Validation failed after 3 attempts"

  • Check if the schema is too strict for the input data
  • Review logs to see what fields are failing
  • Consider making some fields optional
  • Improve the system prompt for clarity

"No function call in response"

  • Rare but possible with some inputs
  • Usually means the input is too ambiguous
  • Add more context or examples to the prompt

"Rate limit exceeded"

  • Not applicable with Ollama (local inference)
  • If using a shared Ollama server, coordinate with team

"Connection refused" or "Ollama not responding"

  • Make sure Ollama is installed: ollama --version
  • Start Ollama service: ollama serve
  • Check if running: curl http://localhost:11434/api/tags
  • Verify OLLAMA_HOST in .env matches your setup

"Model not found"

  • Pull the model: ollama pull llama3.2
  • List available models: ollama list
  • Check MODEL_NAME in .env matches an installed model
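
A quick programmatic health check for the last two issues, using Ollama's /api/tags endpoint:

import requests

def ollama_models(host: str = "http://localhost:11434") -> list[str]:
    """Return installed model names, or raise if Ollama isn't reachable."""
    resp = requests.get(f"{host}/api/tags", timeout=5)
    resp.raise_for_status()
    return [m["name"] for m in resp.json().get("models", [])]

print(ollama_models())  # e.g. ['llama3.2:latest']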


🎓 Learning Outcomes

After completing this project, you should be able to:

  1. Explain the difference between free-form text generation and structured function calling
  2. Design Pydantic schemas for real-world data extraction tasks
  3. Implement retry logic with error-aware prompting
  4. Configure LLMs for deterministic vs creative behavior
  5. Debug validation failures using logs and error messages
  6. Extend the system with new extraction schemas
  7. Integrate LLM components into production pipelines
  8. Evaluate when to use LLMs vs traditional parsing

The Big Win: You now know how to build LLM systems that are reliable enough to deploy, not just impressive demos.


🤝 Contributing

Ideas for improvements:

  • Add async support for batch processing
  • Implement caching layer for repeated inputs
  • Add Streamlit UI for visual extraction
  • Support for PDF/image inputs (OCR + extraction)
  • Multi-language support
  • Custom validation rules engine
  • A/B testing different prompts
  • Cost tracking and analytics

📝 License

MIT License - feel free to use this for learning and commercial projects.


🙏 Acknowledgments

Built as part of a comprehensive LLM engineering learning track. This project demonstrates real-world patterns used in production systems at companies building reliable LLM applications.

Key insight: The difference between a cool demo and a production system is reliability. This project shows you how to cross that gap.


Questions? Issues? Check the logs first, then review CONCEPTS.md for deeper explanations of the patterns used here.

Happy extracting! 🚀
