dkovacevic/sql-llm
SQL-LLM Fine-tuning & Deployment (TinyLLaMA + LoRA on MPS)

This repository shows how to fine-tune a TinyLLaMA model for text-to-SQL generation with LoRA adapters on the Gretel synthetic text-to-sql dataset, then export it for local inference in Ollama.

This guide walks through training, converting, quantizing, and running a SQL-aware LLM locally with llama.cpp and Ollama.

1. Environment Setup

1.1 Create & activate virtual environment

python3.12 -m venv .venv
source .venv/bin/activate

1.2 Install compatible dependencies

We pin to versions that work well together for training & conversion:

pip install --upgrade pip
pip uninstall -y transformers trl accelerate peft datasets huggingface_hub torchvision torchaudio

pip install \
    torch==2.4.1 \
    transformers==4.43.3 \
    trl==0.9.6 \
    accelerate==0.33.0 \
    peft==0.12.0 \
    datasets==2.20.0 \
    huggingface_hub==0.23.5
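
The same pins can also live in a requirements.txt so the environment is reproducible with a single `pip install -r requirements.txt`:

```
torch==2.4.1
transformers==4.43.3
trl==0.9.6
accelerate==0.33.0
peft==0.12.0
datasets==2.20.0
huggingface_hub==0.23.5
```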
    

2. Train / Fine-tune Model

For faster trial runs, reduce the workload in main.py:

  • MAXLEN=256
  • Use select(range(500)) instead of the full dataset
  • max_steps=20

Run:

python main.py
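
Training examples should use the same Context/Question/Response layout that the Ollama TEMPLATE (section 5) expects at inference time. A minimal sketch of such a formatting helper — hypothetical, the actual main.py may structure this differently, and the real Gretel dataset column names may not match these argument names:

```python
def format_example(context: str, question: str, sql: str) -> str:
    """Build one supervised training example in the
    Context/Question/Response layout used by the Modelfile TEMPLATE."""
    return (
        "### Context:\n"
        f"{context}\n\n"
        "### Question:\n"
        f"{question}\n\n"
        "### Response:\n"
        f"{sql}"
    )

example = format_example(
    "Users(id INT, name TEXT, age INT)",
    "How many users are older than 30?",
    "SELECT COUNT(*) FROM Users WHERE age > 30;",
)
print(example.splitlines()[0])  # → ### Context:
```

Keeping the training prompt and the serving template byte-identical is what lets the fine-tuned model recognize where the SQL answer should begin.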

3. Convert HuggingFace Model to GGUF

3.1 Clone llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

3.2 Convert merged HF model to GGUF

python3.12 convert_hf_to_gguf.py ../merged-model-tinyllama \
    --outfile ../llama-sql-f16.gguf \
    --outtype f16

4. Quantization (optional but recommended)

./build/bin/llama-quantize ../llama-sql-f16.gguf ../llama-sql-Q4_K_M.gguf Q4_K_M

Resulting file sizes:

llama-sql-f16.gguf  -> ~2.0 GB
llama-sql-Q4_K_M.gguf -> ~637 MB
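
These sizes follow from a back-of-envelope bits-per-weight calculation. Assuming roughly 1.1B parameters for TinyLlama and an average of about 4.6 bits per weight for Q4_K_M (both figures are approximations, not measured values):

```python
# Rough size estimate: parameters × bits-per-weight, converted to GB.
PARAMS = 1.1e9  # approximate TinyLlama-1.1B parameter count

def gguf_size_gb(bits_per_weight: float) -> float:
    """Approximate model file size in GB for a given average bit width."""
    return PARAMS * bits_per_weight / 8 / 1e9

f16_gb = gguf_size_gb(16)   # ≈ 2.2 GB, close to the ~2.0 GB observed
q4_gb = gguf_size_gb(4.6)   # ≈ 0.63 GB, close to the ~637 MB observed
print(round(f16_gb, 2), round(q4_gb, 2))  # → 2.2 0.63
```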

5. Create Ollama Model

Modelfile example:

FROM ./llama-sql-Q4_K_M.gguf
TEMPLATE """
### Context:
{{ context }}

### Question:
{{ question }}

### Response:
"""
PARAMETER temperature 0

6. Register the Model in Ollama

ollama rm sql-llm:latest 2>/dev/null || true
ollama create sql-llm:latest -f Modelfile
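
Once registered, the model can also be queried programmatically through Ollama's local HTTP API (`POST /api/generate` on port 11434), not just via `ollama run`. A minimal request-building sketch; note that the full prompt is passed as one string, consistent with the `--var` caveat in the notes below:

```python
import json
from urllib import request

def build_generate_request(model: str, prompt: str,
                           host: str = "http://localhost:11434") -> request.Request:
    """Assemble a request for Ollama's /api/generate endpoint.
    stream=False asks for one JSON response instead of a chunk stream."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Usage (requires a running Ollama server):
# req = build_generate_request("sql-llm:latest",
#                              "Users(id INT, name TEXT)\n\nQuestion:\nCount users.")
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```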

7. Run Queries

Example:

ollama run sql-llm "Users(id INT, name TEXT, age INT)
Orders(id INT, user_id INT, total NUMERIC)

Question:
Find top 5 users by total spend."

Expected output:

<sql_query>
SELECT
  u.id,
  u.name,
  COALESCE(SUM(o.total), 0) AS total_spend
FROM Users u
LEFT JOIN Orders o ON o.user_id = u.id
GROUP BY u.id, u.name
ORDER BY total_spend DESC
LIMIT 5;
</sql_query>
<explanation>
Join Orders to Users on user_id, aggregate totals, sort by spend descending, and return top 5 users.
</explanation>
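
Because the model wraps its answer in `<sql_query>` and `<explanation>` tags, downstream code can extract each part with a small regex parser. A sketch:

```python
import re

def parse_response(text: str) -> dict:
    """Pull the SQL and explanation out of the tagged model output.
    Missing tags yield empty strings rather than raising."""
    out = {}
    for tag in ("sql_query", "explanation"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        out[tag] = m.group(1).strip() if m else ""
    return out

reply = """<sql_query>
SELECT 1;
</sql_query>
<explanation>
A trivial query.
</explanation>"""
parsed = parse_response(reply)
print(parsed["sql_query"])  # → SELECT 1;
```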

Notes:

  • Ollama’s template variables ({{ context }}, {{ question }}) require a matching TEMPLATE in the Modelfile.
  • If Ollama fails with unknown flag: --var, pass all input as a single string instead of separate --var arguments.

About

Small LLM trained on 100k SQL queries.