# t2sql - Text-to-SQL PoC CLI

A proof-of-concept command-line interface for translating natural language questions into SQL queries using an agentic approach with OpenAI's language models.

## Features
- **Agentic SQL generation**: Translates business questions to PostgreSQL queries
- **Multi-turn conversations**: Maintains context across the session for follow-up questions
- **Clarifying questions**: Asks for clarification when questions are ambiguous
- **Error recovery**: Automatically retries failed queries once with repairs
- **Plain-text TUI**: Simple, accessible terminal interface with no colors or fancy formatting
- **Comprehensive logging**: All interactions are logged to a file for debugging and auditing
- **No RAG, no schema slicing**: The full database schema and context are embedded in the system prompt at startup
- **Single database connection**: PostgreSQL only (easily extensible to other databases)
- **Tool-based execution**: Uses OpenAI function calling to execute SQL queries
- **Context-aware**: Loads business background, guardrails, and examples from configuration files
## Requirements

- Python 3.12+
- PostgreSQL database
- OpenAI API key (or compatible API endpoint)
- uv - Fast Python package installer
## Installation

1. Install uv (if not already installed):

   ```bash
   # macOS/Linux
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Windows
   powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

   # Or with pip
   pip install uv
   ```

2. Clone and set up:

   ```bash
   cd text-to-sql-poc

   # One-command setup: sync dependencies from lock file
   uv sync

   # Create .env file from template
   make setup-env

   # Edit .env with your credentials
   # Then run the application
   make run
   ```
That's it! `uv sync` automatically:

- Detects Python 3.12 from `.python-version`
- Creates a virtual environment (`.venv`)
- Installs all dependencies from `pyproject.toml`
- Creates/updates the lock file (`uv.lock`)
## Makefile Commands

A comprehensive Makefile is provided with modern uv commands:
```bash
# See all available commands
make help

# Quick setup (sync dependencies + create .env)
make setup

# Run the application (uses uv run)
make run

# Add a new dependency
make add PKG=requests

# Add a dev dependency
make add-dev PKG=pytest

# Update lock file after manual pyproject.toml changes
make lock

# Clean up (remove venv and cached files)
make clean

# View project information
make info
```

Available Makefile targets:
- **Setup Commands**: `sync`, `lock`, `setup`, `setup-env`
- **Dependency Management**: `add`, `add-dev`, `remove`
- **Application Commands**: `run`, `test`, `format`, `lint`, `check`
- **Maintenance**: `clean`, `info`, `logs`, `logs-tail`, `logs-clean`
- **Development Helpers**: `shell`, `python`, `db-check`
All commands use modern uv features:

- `uv sync` - Sync dependencies from the lock file
- `uv run` - Run commands in the project environment
- `uv add` - Add dependencies to pyproject.toml
- `uv lock` - Update the lock file
## Configuration

Create a `.env` file in the project root (or set environment variables):
```bash
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional, defaults to OpenAI
LLM_MODEL=gpt-4o-mini                      # Or gpt-4, gpt-4o, etc.

# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=your_database_name
DB_USER=your_database_user
DB_PASSWORD=your_database_password

# Logging (optional)
LOG_DIR=./logs
```

### Context Files

The application loads context from files in `t2sql/context/`. These files define the business domain, database schema, and behavior:
- `business_background.md`: Description of the business domain and key concepts
- `persona.md`: Agent communication style and behavior guidelines
- `guardrails.md`: Safety rules and constraints for SQL generation
- `ddl.sql`: Full PostgreSQL schema (CREATE TABLE statements)
- `meanings.yaml`: Detailed table/column descriptions and relationships
- `examples.md`: High-quality example queries (natural language → SQL pairs)
Example files are provided in the repository. Customize these files to match your database schema and business domain.
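For illustration, an entry in `meanings.yaml` might look like this (the `orders` table and its columns here are hypothetical, chosen to match the example session later in this README):

```yaml
tables:
  orders:
    description: One row per customer order.
    columns:
      total_amount: Order total in the account currency, including tax.
      shipped_at: Timestamp when the order left the warehouse; NULL if unshipped.
      status: Lifecycle state, e.g. 'pending', 'shipped', 'delivered'.
```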
### Database Setup

Ensure your PostgreSQL database is running and accessible. The application will:
- Connect on startup to validate credentials
- Execute queries with configurable timeouts
- Automatically append LIMIT clauses to prevent large result sets
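The LIMIT enforcement can be sketched roughly as follows (a simplified illustration, not the project's actual implementation; the 1000-row default matches the tool behavior described under "How It Works"):

```python
import re

DEFAULT_LIMIT = 1000  # default row cap

def enforce_limit(sql: str, limit: int = DEFAULT_LIMIT) -> str:
    """Append a LIMIT clause when the statement lacks one."""
    stripped = sql.rstrip().rstrip(";").rstrip()
    # Naive suffix check; a production version would use a real SQL parser.
    if re.search(r"\bLIMIT\s+\d+\s*$", stripped, re.IGNORECASE):
        return stripped + ";"
    return f"{stripped} LIMIT {limit};"
```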
## Usage

### Running the Application

Using the Makefile (recommended):

```bash
make run
```

Or directly with uv:

```bash
uv run python -m t2sql
```

Or manually activate the virtual environment:

```bash
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python -m t2sql
```

### Example Session

````
t2sql - Text-to-SQL PoC CLI
Session log: ./logs/session-20250314-143022.log
Type :help for commands, :exit to quit

t2sql> What were the top 5 countries by revenue last month?

Plan: Join orders and customers to calculate country revenue for last month.

SQL:
```sql
SELECT
  c.country_code,
  SUM(o.total_amount) AS revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.shipped_at >= date_trunc('month', CURRENT_DATE) - INTERVAL '1 month'
  AND o.shipped_at < date_trunc('month', CURRENT_DATE)
  AND o.status IN ('shipped', 'delivered')
GROUP BY c.country_code
ORDER BY revenue DESC
LIMIT 5;
```

Answer: Top 5 countries by revenue last month:
- USA: $1,234,567.89
- KOR: $987,654.32
- JPN: $765,432.10
- DEU: $543,210.98
- GBR: $432,109.87

t2sql> :exit
Goodbye!
````
### Commands
- `:help` - Show help message with available commands
- `:exit` or `:quit` - Exit the application
- `:history` - Show conversation history count
## How It Works
1. **System Prompt Construction**: At startup, the application loads all context files and builds a comprehensive system prompt containing:
- Business domain knowledge
- Complete database schema (DDL)
- Table/column meanings and relationships
- Behavior guidelines and guardrails
- Example queries
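Conceptually, the prompt assembly can be sketched as follows (file names come from the Context Files section above; the ordering and section markers are assumptions, not the project's actual code):

```python
from pathlib import Path

# Context files listed in the Configuration section of this README.
CONTEXT_DIR = Path("t2sql/context")
CONTEXT_FILES = [
    "business_background.md",
    "persona.md",
    "guardrails.md",
    "ddl.sql",
    "meanings.yaml",
    "examples.md",
]

def build_system_prompt(context_dir: Path = CONTEXT_DIR) -> str:
    """Concatenate all context files into one system prompt string."""
    sections = []
    for name in CONTEXT_FILES:
        path = context_dir / name
        if path.exists():
            # Label each section with its source file for traceability.
            sections.append(f"## {name}\n\n{path.read_text()}")
    return "\n\n".join(sections)
```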
2. **User Input Processing**: When you enter a question:
- The agent analyzes the question
- Plans which tables and columns to use
- Generates a SQL query
- Calls the `execute_sql` tool
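The tool call in step 2 can be illustrated with a function-calling schema like the one below (the `sql_query` parameter name mirrors the customization example later in this README, but the exact schema is an assumption):

```python
# Illustrative tool definition passed to the OpenAI Chat Completions API.
EXECUTE_SQL_TOOL = {
    "type": "function",
    "function": {
        "name": "execute_sql",
        "description": "Run a read-only PostgreSQL query and return rows.",
        "parameters": {
            "type": "object",
            "properties": {
                "sql_query": {
                    "type": "string",
                    "description": "The SQL statement to execute.",
                },
            },
            "required": ["sql_query"],
        },
    },
}
```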
3. **Query Execution**: The tool:
- Enforces a LIMIT clause (default: 1000 rows)
- Sets a timeout (default: 60 seconds)
- Executes the query against PostgreSQL
- Returns structured results or error details
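One plausible way to implement the timeout (a sketch; the project's actual mechanism is not shown in this README) is PostgreSQL's session-level `statement_timeout` setting:

```python
TIMEOUT_SECONDS = 60  # default query timeout per the tool description

def session_setup_sql(timeout_s: int = TIMEOUT_SECONDS) -> str:
    """Build the session-level statement that caps query runtime.

    PostgreSQL's statement_timeout (in milliseconds) aborts any statement
    exceeding the budget, surfacing an error the agent can react to.
    """
    return f"SET statement_timeout = {timeout_s * 1000};"
```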
4. **Error Handling**: If a query fails:
- The agent analyzes the error (e.g., SchemaError, TypeError, Timeout)
- Attempts to repair the query once
- If repair fails, asks the user for clarification
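The retry-once flow can be sketched like this (a simplified illustration; `generate_sql` and `execute` are hypothetical stand-ins for the agent's LLM call and the `execute_sql` tool):

```python
def run_with_repair(generate_sql, execute):
    """Generate and execute a query; on failure, attempt one repair."""
    sql = generate_sql(error=None)
    result = execute(sql)
    if result.get("ok"):
        return result
    # One repair attempt: feed the error message back to the model.
    repaired = generate_sql(error=result.get("error"))
    result = execute(repaired)
    if result.get("ok"):
        return result
    # Repair failed too: escalate to the user for clarification.
    return {"ok": False, "needs_clarification": True, "error": result.get("error")}
```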
5. **Multi-turn Context**: The agent maintains conversation history, allowing:
- Follow-up questions ("What about this quarter?")
- Refinements ("Show only Electronics category")
- Clarifications ("I meant customer country, not shipping country")
## Logging
All interactions are logged to `logs/session-YYYYMMDD-HHMMSS.log`:
- User input (not echoed to stdout for privacy)
- LLM requests and responses
- Tool calls (SQL queries and parameters)
- Tool results (row counts, errors)
- Debug information for troubleshooting
Logs use INFO level for high-level events and DEBUG for detailed payloads.
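A minimal sketch of the per-session log setup, assuming the stdlib `logging` module (the actual implementation may differ):

```python
import logging
from datetime import datetime
from pathlib import Path

def make_session_logger(log_dir: str = "./logs") -> logging.Logger:
    """One log file per session, named session-YYYYMMDD-HHMMSS.log."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    logger = logging.getLogger("t2sql.session")
    logger.setLevel(logging.DEBUG)  # DEBUG captures detailed payloads
    handler = logging.FileHandler(Path(log_dir) / f"session-{stamp}.log")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```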
## Limitations (Current PoC Scope)
- **No RAG**: Full schema must fit in context window
- **No schema slicing**: Cannot dynamically select relevant tables
- **Single database**: PostgreSQL only (though easily extensible)
- **No streaming**: Responses are not streamed
- **No query budgets**: No cost or execution time tracking
- **No caching**: Each query executes fresh (no result caching)
- **No visualization**: Text-only output (no charts or graphs)
- **Single process**: No multi-user support or concurrent sessions
## Customization
### Adding Your Own Schema
1. Edit `t2sql/context/ddl.sql` with your CREATE TABLE statements
2. Update `t2sql/context/meanings.yaml` with your table/column descriptions
3. Add example queries to `t2sql/context/examples.md`
4. Optionally customize `business_background.md`, `persona.md`, and `guardrails.md`
### Using a Different LLM
Set the `OPENAI_BASE_URL` and `LLM_MODEL` environment variables to point to any OpenAI-compatible API:
```bash
# Example: Using a local LLM server
OPENAI_BASE_URL=http://localhost:8000/v1
LLM_MODEL=mistral-7b-instruct

# Example: Using Azure OpenAI
OPENAI_BASE_URL=https://your-resource.openai.azure.com/
LLM_MODEL=gpt-4
```
### Using a Different Database

Modify `t2sql/tools.py` to support other database drivers:

```python
# Example: Adding MySQL support
import mysql.connector

def execute_sql_mysql(sql_query: str, ...):
    conn = mysql.connector.connect(
        host=config.db_host,
        database=config.db_name,
        user=config.db_user,
        password=config.db_password
    )
    # ... similar logic
```

## Troubleshooting

### Connection or Startup Issues

- Create a `.env` file or set environment variables
- Ensure the API key is valid
- Check database connection settings
- Verify PostgreSQL is running and accessible
- Check logs in the `logs/` directory for details
### Query Timeouts

- Adjust the timeout via the `:timeout_ms` parameter
- Simplify the query or add more selective WHERE conditions
- Check database performance and indexes
### Poor SQL Quality

- Review and improve the descriptions in `t2sql/context/meanings.yaml`
- Add similar examples to `t2sql/context/examples.md`
- Refine `guardrails.md` to prevent specific issues
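For example, a rule added to `guardrails.md` might read as follows (illustrative wording, not a shipped rule):

```text
- Never generate INSERT, UPDATE, DELETE, or DDL statements; SELECT only.
- When a question spans an ambiguous date range, ask which range is meant
  before running the query.
```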
This is a proof-of-concept demonstration. Customize as needed for your use case.
## Future Enhancements

This PoC is designed to be extended and customized. Areas for enhancement:
- Add support for multiple database backends
- Implement schema slicing for large databases
- Add result caching and query optimization
- Support for visualization and chart generation
- Multi-user session management
- Query budget tracking and cost controls
- Streaming responses for better UX