SQL-LLM Fine-tuning & Deployment (TinyLLaMA + LoRA on MPS)

This repository shows how to fine-tune a TinyLLaMA model for Text-to-SQL generation using the Gretel synthetic text-to-sql dataset with LoRA adapters, and then export it for local inference in Ollama.

This guide walks through training, converting, quantizing, and running a SQL-aware LLM locally with llama.cpp and Ollama.

1. Environment Setup

1.1 Create & activate virtual environment

python3.12 -m venv .venv
source .venv/bin/activate

1.2 Install compatible dependencies

We pin to versions that work well together for training & conversion:

pip install --upgrade pip
pip uninstall -y transformers trl accelerate peft datasets huggingface_hub torchvision torchaudio

pip install \
    torch==2.4.1 \
    transformers==4.43.3 \
    trl==0.9.6 \
    accelerate==0.33.0 \
    peft==0.12.0 \
    datasets==2.20.0 \
    huggingface_hub==0.23.5

2. Train / Fine-tune Model

For faster runs:

MAXLEN=256
Use select(range(500)) instead of full dataset
max_steps=20

Run:

python main.py

3. Convert HuggingFace Model to GGUF

3.1 Clone llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

3.2 Convert merged HF model to GGUF

python3.12 convert_hf_to_gguf.py ../merged-model-tinyllama     --outfile ../llama-sql-f16.gguf     --outtype f16

4. Quantization (optional but recommended)

./build/bin/llama-quantize ../llama-sql-f16.gguf ../llama-sql-Q4_K_M.gguf Q4_K_M

Resulting file sizes:

llama-sql-f16.gguf  -> ~2.0 GB
llama-sql-Q4_K_M.gguf -> ~637 MB

5. Create Ollama Model

Modelfile example:

FROM ./llama-sql-Q4_K_M.gguf
TEMPLATE """
### Context:
{{{{ context }}}}

### Question:
{{{{ question }}}}

### Response:
"""
PARAMETER temperature 0

6. Create model in Ollama:

ollama rm sql-llm:latest 2>/dev/null || true
ollama create sql-llm:latest -f Modelfile

7. Run Queries

Example:

ollama run sql-llm "Users(id INT, name TEXT, age INT)
Orders(id INT, user_id INT, total NUMERIC)

Question:
Find top 5 users by total spend."

Expected output:

<sql_query>
SELECT
  u.id,
  u.name,
  COALESCE(SUM(o.total), 0) AS total_spend
FROM Users u
LEFT JOIN Orders o ON o.user_id = u.id
GROUP BY u.id, u.name
ORDER BY total_spend DESC
LIMIT 5;
</sql_query>
<explanation>
Join Orders to Users on user_id, aggregate totals, sort by spend descending, and return top 5 users.
</explanation>

Notes:

Ollama’s template variables ({{{{ context }}}}, {{{{ question }}}}) require a matching TEMPLATE in Modelfile.
If Ollama fails with unknown flag: --var, pass all input as a single string instead of separate --var arguments.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Modelfile		Modelfile
README.md		README.md
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SQL-LLM Fine-tuning & Deployment (TinyLLaMA + LoRA on MPS)

1. Environment Setup

1.1 Create & activate virtual environment

1.2 Install compatible dependencies

2. Train / Fine-tune Model

3. Convert HuggingFace Model to GGUF

3.1 Clone llama.cpp

3.2 Convert merged HF model to GGUF

4. Quantization (optional but recommended)

5. Create Ollama Model

6. Create model in Ollama:

7. Run Queries

About

Uh oh!

Releases

Packages

Languages

dkovacevic/sql-llm

Folders and files

Latest commit

History

Repository files navigation

SQL-LLM Fine-tuning & Deployment (TinyLLaMA + LoRA on MPS)

1. Environment Setup

1.1 Create & activate virtual environment

1.2 Install compatible dependencies

2. Train / Fine-tune Model

3. Convert HuggingFace Model to GGUF

3.1 Clone llama.cpp

3.2 Convert merged HF model to GGUF

4. Quantization (optional but recommended)

5. Create Ollama Model

6. Create model in Ollama:

7. Run Queries

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages