t2sql - Text-to-SQL PoC CLI

A proof-of-concept command-line interface for translating natural language questions into SQL queries using an agentic approach with OpenAI's language models.

Features

  • Agentic SQL generation: Translates business questions to PostgreSQL queries
  • Multi-turn conversations: Maintains context across the session for follow-up questions
  • Clarifying questions: Asks for clarification when questions are ambiguous
  • Error recovery: Automatically repairs and retries a failed query once
  • Plain-text TUI: Simple, accessible terminal interface with no colors or fancy formatting
  • Comprehensive logging: All interactions logged to file for debugging and audit

Architecture

  • No RAG, no schema slicing: Full database schema and context are embedded in the system prompt at startup
  • Single database connection: PostgreSQL only (easily extensible to other databases)
  • Tool-based execution: Uses OpenAI function calling to execute SQL queries (see the sketch after this list)
  • Context-aware: Loads business background, guardrails, and examples from configuration files
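
For illustration only, the execute_sql tool mentioned above could be declared to the OpenAI Chat Completions API roughly as follows; the tool name and parameter shape here are assumptions, not the project's actual definitions:

# Hypothetical sketch of the function-calling tool declaration (not the project's real one)
EXECUTE_SQL_TOOL = {
    "type": "function",
    "function": {
        "name": "execute_sql",
        "description": "Run a read-only PostgreSQL query and return the result rows.",
        "parameters": {
            "type": "object",
            "properties": {
                "sql_query": {"type": "string", "description": "The SQL statement to execute"},
            },
            "required": ["sql_query"],
        },
    },
}
# A list such as [EXECUTE_SQL_TOOL] would then be passed as tools=... on each chat.completions.create call.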

Requirements

  • Python 3.12+
  • PostgreSQL database
  • OpenAI API key (or compatible API endpoint)
  • uv - Fast Python package installer

Installation

Quick Start

  1. Install uv (if not already installed):

    # macOS/Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Windows
    powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
    
    # Or with pip
    pip install uv
  2. Clone and setup:

    cd text-to-sql-poc
    
    # One-command setup: sync dependencies from lock file
    uv sync
    
    # Create .env file from template
    make setup-env
    
    # Edit .env with your credentials
    # Then run the application
    make run

That's it! uv sync automatically:

  • Detects Python 3.12 from .python-version
  • Creates a virtual environment (.venv)
  • Installs all dependencies from pyproject.toml
  • Creates/updates the lock file (uv.lock)

Using the Makefile

A comprehensive Makefile is provided with modern uv commands:

# See all available commands
make help

# Quick setup (sync dependencies + create .env)
make setup

# Run the application (uses uv run)
make run

# Add a new dependency
make add PKG=requests

# Add a dev dependency
make add-dev PKG=pytest

# Update lock file after manual pyproject.toml changes
make lock

# Clean up (remove venv and cached files)
make clean

# View project information
make info

Available Makefile targets:

  • Setup Commands: sync, lock, setup, setup-env
  • Dependency Management: add, add-dev, remove
  • Application Commands: run, test, format, lint, check
  • Maintenance: clean, info, logs, logs-tail, logs-clean
  • Development Helpers: shell, python, db-check

All commands use modern uv features:

  • uv sync - Sync dependencies from lock file
  • uv run - Run commands in the project environment
  • uv add - Add dependencies to pyproject.toml
  • uv lock - Update the lock file

Configuration

1. Environment Variables

Create a .env file in the project root (or set environment variables):

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional, defaults to OpenAI
LLM_MODEL=gpt-4o-mini  # Or gpt-4, gpt-4o, etc.

# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=your_database_name
DB_USER=your_database_user
DB_PASSWORD=your_database_password

# Logging (optional)
LOG_DIR=./logs
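
How these values are consumed is internal to the application; purely as an illustration, they could be read with python-dotenv along these lines (the variable handling below is an assumption, not the project's actual code):

# Illustrative only: load .env into the process environment and read the values
import os
from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # picks up .env from the project root

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]      # required
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4o-mini")  # optional, falls back to a default
DB_PORT = int(os.getenv("DB_PORT", "5432"))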

2. Context Files

The application loads context from files in t2sql/context/. These files define the business domain, database schema, and behavior:

  • business_background.md: Description of the business domain and key concepts
  • persona.md: Agent communication style and behavior guidelines
  • guardrails.md: Safety rules and constraints for SQL generation
  • ddl.sql: Full PostgreSQL schema (CREATE TABLE statements)
  • meanings.yaml: Detailed table/column descriptions and relationships
  • examples.md: High-quality example queries (natural language → SQL pairs)

Example files are provided in the repository. Customize these files to match your database schema and business domain.
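
One way to picture how these files come together: at startup they can simply be concatenated into the system prompt. The snippet below is a hypothetical sketch of that idea, not the project's actual loader:

# Hypothetical sketch: stitch the context files into a single system prompt
from pathlib import Path

CONTEXT_DIR = Path("t2sql/context")
PARTS = [
    "business_background.md", "persona.md", "guardrails.md",
    "ddl.sql", "meanings.yaml", "examples.md",
]

system_prompt = "\n\n".join(
    f"# {name}\n{(CONTEXT_DIR / name).read_text()}" for name in PARTS
)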

3. Database Setup

Ensure your PostgreSQL database is running and accessible. The application will:

  • Connect on startup to validate credentials
  • Execute queries with configurable timeouts
  • Automatically append LIMIT clauses to prevent large result sets
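
As a rough sketch of what that enforcement can look like with psycopg2 (the helper below is illustrative, not the project's actual implementation):

# Illustrative only: run a query with a statement timeout and a row cap
import psycopg2

def run_query(sql: str, dsn: str, max_rows: int = 1000, timeout_ms: int = 60000):
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(f"SET statement_timeout = {int(timeout_ms)}")
            if "limit" not in sql.lower():  # naive guard; real SQL handling should be more careful
                sql = sql.rstrip().rstrip(";") + f" LIMIT {int(max_rows)}"
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()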

Usage

Starting the CLI

Using the Makefile (recommended):

make run

Or directly with uv:

uv run python -m t2sql

Or manually activate the virtual environment:

source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python -m t2sql

Interactive Session

t2sql - Text-to-SQL PoC CLI
Session log: ./logs/session-20250314-143022.log
Type :help for commands, :exit to quit

t2sql> What were the top 5 countries by revenue last month?

Plan: Join orders and customers to calculate country revenue for last month.

SQL:
SELECT
    c.country_code,
    SUM(o.total_amount) AS revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.shipped_at >= date_trunc('month', CURRENT_DATE) - INTERVAL '1 month'
  AND o.shipped_at < date_trunc('month', CURRENT_DATE)
  AND o.status IN ('shipped', 'delivered')
GROUP BY c.country_code
ORDER BY revenue DESC
LIMIT 5;

Answer: Top 5 countries by revenue last month:

  1. USA: $1,234,567.89
  2. KOR: $987,654.32
  3. JPN: $765,432.10
  4. DEU: $543,210.98
  5. GBR: $432,109.87

t2sql> :exit
Goodbye!


Commands

  • :help - Show help message with available commands
  • :exit or :quit - Exit the application
  • :history - Show conversation history count

How It Works

  1. System Prompt Construction: At startup, the application loads all context files and builds a comprehensive system prompt containing:
     • Business domain knowledge
     • Complete database schema (DDL)
     • Table/column meanings and relationships
     • Behavior guidelines and guardrails
     • Example queries

  2. User Input Processing: When you enter a question:
     • The agent analyzes the question
     • Plans which tables and columns to use
     • Generates a SQL query
     • Calls the execute_sql tool (a sketch of this loop follows the list)

  3. Query Execution: The tool:
     • Enforces a LIMIT clause (default: 1000 rows)
     • Sets a timeout (default: 60 seconds)
     • Executes the query against PostgreSQL
     • Returns structured results or error details

  4. Error Handling: If a query fails:
     • The agent analyzes the error (e.g., SchemaError, TypeError, Timeout)
     • Attempts to repair the query once
     • If repair fails, asks the user for clarification

  5. Multi-turn Context: The agent maintains conversation history, allowing:
     • Follow-up questions ("What about this quarter?")
     • Refinements ("Show only Electronics category")
     • Clarifications ("I meant customer country, not shipping country")

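As a concrete but hypothetical sketch of the tool-calling loop in steps 2-4, using the OpenAI Python SDK (only the SDK calls are real; the function names, stub, and message handling are assumptions):

# Hypothetical agent loop; execute_sql stands in for the project's query tool
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY / OPENAI_BASE_URL from the environment

def execute_sql(sql_query: str) -> dict:
    """Stub for the project's query tool (see Query Execution above)."""
    raise NotImplementedError

def run_turn(messages: list, tools: list, model: str = "gpt-4o-mini") -> str:
    while True:
        response = client.chat.completions.create(model=model, messages=messages, tools=tools)
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # final answer or a clarifying question
        messages.append(msg)    # keep the assistant's tool-call turn in the history
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            result = execute_sql(**args)  # LIMIT/timeout enforcement lives inside the tool
            messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
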
Logging

All interactions are logged to logs/session-YYYYMMDD-HHMMSS.log:

  • User input (not echoed to stdout for privacy)
  • LLM requests and responses
  • Tool calls (SQL queries and parameters)
  • Tool results (row counts, errors)
  • Debug information for troubleshooting

Logs use INFO level for high-level events and DEBUG for detailed payloads.

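A minimal sketch of how such a timestamped session log could be configured with the standard logging module (the project's actual handler setup may differ):

# Illustrative logging setup; the real configuration may differ
import logging
import os
import time

log_dir = os.getenv("LOG_DIR", "./logs")
os.makedirs(log_dir, exist_ok=True)
log_path = os.path.join(log_dir, time.strftime("session-%Y%m%d-%H%M%S.log"))

logging.basicConfig(
    filename=log_path,
    level=logging.DEBUG,  # DEBUG captures detailed payloads; INFO covers high-level events
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger("t2sql").info("Session started, logging to %s", log_path)
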
Limitations (Current PoC Scope)

  • No RAG: The full schema must fit in the context window
  • No schema slicing: Cannot dynamically select relevant tables
  • Single database: PostgreSQL only (though easily extensible)
  • No streaming: Responses are not streamed
  • No query budgets: No cost or execution time tracking
  • No caching: Each query executes fresh (no result caching)
  • No visualization: Text-only output (no charts or graphs)
  • Single process: No multi-user support or concurrent sessions
Customization

Adding Your Own Schema

  1. Edit t2sql/context/ddl.sql with your CREATE TABLE statements
  2. Update t2sql/context/meanings.yaml with your table/column descriptions
  3. Add example queries to t2sql/context/examples.md
  4. Optionally customize business_background.md, persona.md, and guardrails.md

Using a Different LLM

Set the OPENAI_BASE_URL and LLM_MODEL environment variables to point to any OpenAI-compatible API:

# Example: Using a local LLM server
OPENAI_BASE_URL=http://localhost:8000/v1
LLM_MODEL=mistral-7b-instruct

# Example: Using Azure OpenAI
OPENAI_BASE_URL=https://your-resource.openai.azure.com/
LLM_MODEL=gpt-4

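Assuming the application talks to the endpoint through the standard OpenAI Python SDK, any server that speaks the same API can be used; roughly (the exact wiring inside the project is an assumption):

# Illustrative: point the OpenAI SDK at any OpenAI-compatible endpoint
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
)
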
Extending to Other Databases

Modify t2sql/tools.py to support other database drivers:

# Example: Adding MySQL support (sketch; mirror the existing PostgreSQL logic)
import mysql.connector

def execute_sql_mysql(sql_query: str, max_rows: int = 1000):  # plus whatever parameters the PostgreSQL version takes
    conn = mysql.connector.connect(
        host=config.db_host,
        database=config.db_name,
        user=config.db_user,
        password=config.db_password,
    )
    try:
        cur = conn.cursor()
        cur.execute(sql_query)
        return cur.fetchmany(max_rows)  # apply the same LIMIT/timeout handling as the PostgreSQL path
    finally:
        conn.close()

Troubleshooting

"Required environment variable OPENAI_API_KEY is not set"

  • Create a .env file or set environment variables
  • Ensure the API key is valid

"Failed to initialize agent"

  • Check database connection settings
  • Verify PostgreSQL is running and accessible
  • Check logs in logs/ directory for details

"Query timeout"

  • Increase the timeout via the :timeout_ms parameter
  • Simplify the query or add more selective WHERE conditions
  • Check database performance and indexes

Agent produces incorrect SQL

  • Review and improve t2sql/context/meanings.yaml descriptions
  • Add similar examples to t2sql/context/examples.md
  • Refine guardrails.md to prevent specific issues

License

This is a proof-of-concept demonstration. Customize as needed for your use case.

Contributing

This PoC is designed to be extended and customized. Areas for enhancement:

  • Add support for multiple database backends
  • Implement schema slicing for large databases
  • Add result caching and query optimization
  • Support for visualization and chart generation
  • Multi-user session management
  • Query budget tracking and cost controls
  • Streaming responses for better UX
