This repository shows how to fine-tune a TinyLLaMA model for Text-to-SQL generation using the Gretel synthetic text-to-sql dataset with LoRA adapters, and then export it for local inference in Ollama.
This guide walks through training, converting, quantizing, and running a SQL-aware LLM locally with llama.cpp and Ollama.
python3.12 -m venv .venv
source .venv/bin/activate
We pin versions that work well together for training and conversion:
pip install --upgrade pip
pip uninstall -y transformers trl accelerate peft datasets huggingface_hub torchvision torchaudio
pip install \
torch==2.4.1 \
transformers==4.43.3 \
trl==0.9.6 \
accelerate==0.33.0 \
peft==0.12.0 \
datasets==2.20.0 \
huggingface_hub==0.23.5
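To confirm that the active environment actually matches the pins above, a small check along these lines may help. It is a minimal sketch: the `PINS` dict copies the versions from the install command, while `find_mismatches` is a hypothetical helper name and not part of this repository.

```python
from importlib import metadata

# Versions copied from the pip install command above.
PINS = {
    "torch": "2.4.1",
    "transformers": "4.43.3",
    "trl": "0.9.6",
    "accelerate": "0.33.0",
    "peft": "0.12.0",
    "datasets": "2.20.0",
    "huggingface_hub": "0.23.5",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Ignore local build suffixes such as "2.4.1+cu121".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for package, (expected, installed) in find_mismatches(PINS).items():
        print(f"{package}: expected {expected}, found {installed}")
```

Running the script inside the virtual environment prints one line per package that differs from its pin, and nothing if everything matches.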
For faster runs:
- MAXLEN=256
- Use select(range(500)) instead of the full dataset
- max_steps=20
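The subset tip above can fail if the dataset has fewer rows than requested, since `select` expects every index to be valid. The sketch below clamps the size first. `take_subset` and `QUICK_RUN` are hypothetical names used for illustration; how `main.py` actually reads these settings is not shown here, so treat the wiring as an assumption.

```python
# Quick-run values taken from the list above; the dict itself is illustrative.
QUICK_RUN = {"MAXLEN": 256, "subset_size": 500, "max_steps": 20}


def take_subset(dataset, n=QUICK_RUN["subset_size"]):
    """Return the first n rows, or every row if the dataset is shorter than n."""
    n = min(n, len(dataset))
    if hasattr(dataset, "select"):
        # Hugging Face datasets.Dataset: select rows by index.
        return dataset.select(range(n))
    # Plain list (or other sliceable sequence) fallback.
    return dataset[:n]
```

With a Hugging Face dataset this would be called as `train = take_subset(train)` before tokenization; the list fallback only exists so the helper can be tried without downloading anything.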
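Run:
python main.py
Then clone llama.cpp and convert the merged model to GGUF (f16):
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
python3.12 convert_hf_to_gguf.py ../merged-model-tinyllama --outfile ../llama-sql-f16.gguf --outtype f16
Quantize to Q4_K_M (the path below assumes llama.cpp has already been built into ./build):
./build/bin/llama-quantize ../llama-sql-f16.gguf ../llama-sql-Q4_K_M.gguf Q4_K_M
Resulting file sizes: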
llama-sql-f16.gguf -> ~2.0 GB
llama-sql-Q4_K_M.gguf -> ~637 MB
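As a rough sanity check on these sizes, the arithmetic below estimates the compression ratio and the average bits per weight. It assumes a parameter count of about 1.1 billion for TinyLlama and reads GB and MB as binary units (1 GB = 1024 MB). Both are assumptions, not figures stated above.

```python
N_PARAMS = 1.1e9          # assumed TinyLlama parameter count
F16_SIZE_MB = 2.0 * 1024  # "~2.0 GB" read as 2048 MB (assumption)
Q4_SIZE_MB = 637          # "~637 MB"


def compression_ratio(original_mb, quantized_mb):
    """How many times smaller the quantized file is."""
    return original_mb / quantized_mb


def bits_per_weight(size_mb, n_params=N_PARAMS):
    """Average bits stored per parameter, counting the whole file."""
    return size_mb * 1024 * 1024 * 8 / n_params


if __name__ == "__main__":
    print(f"compression: {compression_ratio(F16_SIZE_MB, Q4_SIZE_MB):.2f}x")
    print(f"f16:    {bits_per_weight(F16_SIZE_MB):.2f} bits/weight")
    print(f"Q4_K_M: {bits_per_weight(Q4_SIZE_MB):.2f} bits/weight")
```

Under these assumptions the quantized file is roughly 3.2x smaller and averages just under 5 bits per weight. The file size also includes metadata, so the per-weight figures are slight overestimates.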
Modelfile example:
FROM ./llama-sql-Q4_K_M.gguf
TEMPLATE """
### Context:
{{{{ context }}}}
### Question:
{{{{ question }}}}
### Response:
"""
PARAMETER temperature 0
Create the model (removing any previous build first):
ollama rm sql-llm:latest 2>/dev/null || true
ollama create sql-llm:latest -f Modelfile
Example:
ollama run sql-llm "Users(id INT, name TEXT, age INT)
Orders(id INT, user_id INT, total NUMERIC)
Question:
Find top 5 users by total spend."
Expected output:
<sql_query>
SELECT
u.id,
u.name,
COALESCE(SUM(o.total), 0) AS total_spend
FROM Users u
LEFT JOIN Orders o ON o.user_id = u.id
GROUP BY u.id, u.name
ORDER BY total_spend DESC
LIMIT 5;
</sql_query>
<explanation>
Join Orders to Users on user_id, aggregate totals, sort by spend descending, and return top 5 users.
</explanation>
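If the model's reply is consumed by a script, the two tagged sections shown above can be pulled apart with a small parser. This is a minimal sketch that assumes the reply follows the `<sql_query>` / `<explanation>` layout of the expected output; `extract_tag` and `parse_response` are hypothetical helper names.

```python
import re


def extract_tag(text, tag):
    """Return the stripped contents of the first <tag>...</tag> pair, or None."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_response(text):
    """Split a model reply into its SQL and explanation parts."""
    return {
        "sql_query": extract_tag(text, "sql_query"),
        "explanation": extract_tag(text, "explanation"),
    }
```

A missing tag yields `None` rather than an exception, so callers can decide how to handle replies that do not follow the format.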
Notes:
- Ollama’s template variables ({{{{ context }}}}, {{{{ question }}}}) require a matching TEMPLATE in the Modelfile.
- If Ollama fails with unknown flag: --var, pass all input as a single string instead of separate --var arguments.
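The single-string approach from the last note can be scripted. The sketch below joins the schema and the question in the same layout as the example above and passes the result to `ollama run` as one argument. `build_prompt` and `run_sql_llm` are hypothetical helper names, and the call assumes the `ollama` CLI is on the PATH and the `sql-llm` model has been created.

```python
import subprocess


def build_prompt(schema, question):
    """Join schema and question into one string, following the example's layout."""
    return f"{schema.strip()}\nQuestion:\n{question.strip()}"


def run_sql_llm(schema, question, model="sql-llm"):
    """Invoke `ollama run MODEL PROMPT` and return the model's stdout."""
    result = subprocess.run(
        ["ollama", "run", model, build_prompt(schema, question)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ollama exits with an error
    )
    return result.stdout


# Example call (requires a local Ollama install):
# print(run_sql_llm("Users(id INT, name TEXT, age INT)", "List all users."))
```

Passing the prompt as a list element avoids shell quoting problems with multi-line schemas.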