Fine-tuning Qwen3-VL-2B for chart question answering using SFT and GRPO.
This repository implements a two-stage training pipeline for chart question answering:
- SFT (Supervised Fine-Tuning): Initial fine-tuning on the ChartQA dataset
- GRPO (Group Relative Policy Optimization): Reinforcement learning to improve format compliance and accuracy
- Base Model: Qwen/Qwen3-VL-2B-Instruct
- SFT Model: Nhaass/Qwen3-VL-2B-ChartQA
- GRPO Model: Nhaass/Qwen3-VL-2B-ChartQA-GRPO
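For quick inference with the released checkpoints, a minimal sketch along these lines should work, assuming a recent `transformers` release with Qwen3-VL support and that the checkpoints are standalone (merged) weights; the image path and question are placeholders. If the checkpoints are LoRA adapters rather than merged weights, load the base model first and attach the adapter with `peft` instead.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# Load the GRPO checkpoint; swap in "Nhaass/Qwen3-VL-2B-ChartQA" for the SFT-only model.
model_id = "Nhaass/Qwen3-VL-2B-ChartQA-GRPO"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One chart image plus one question, in chat format ("chart.png" is a placeholder path).
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "chart.png"},
        {"type": "text", "text": "What is the highest value shown in the chart?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```

Because the reward described below encourages JSON-formatted answers, the decoded string may need a `json.loads` step to extract the final answer.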
pip install -r requirements.txt

ChartQADataset/
├── train/
│ ├── train_augmented.json
│ └── png/
├── val/
│ ├── val_augmented.json
│ └── png/
└── test/
├── test_augmented.json
└── png/
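Each split pairs an `*_augmented.json` annotation file with a `png/` image folder. A small sketch of iterating over the training split; the `imgname`/`query`/`label` field names follow the public ChartQA release and are an assumption here, so adjust if your copy differs:

```python
import json
from pathlib import Path

from PIL import Image

# Assumed dataset root and ChartQA-style annotation fields ("imgname", "query", "label").
root = Path("ChartQADataset/train")
records = json.loads((root / "train_augmented.json").read_text())

for rec in records[:3]:
    image = Image.open(root / "png" / rec["imgname"])
    print(rec["query"], "->", rec["label"], image.size)
```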
cd sft
python train.py

cd grpo
python train.py

python test_model.py

ChartQA/
├── sft/ # Supervised fine-tuning
│ ├── config.py
│ ├── model.py
│ ├── data_loader.py
│ ├── collator.py
│ ├── callbacks.py
│ ├── trainer.py
│ └── train.py
│
├── grpo/ # GRPO training
│ ├── config.py
│ ├── model.py
│ ├── data_loader.py
│ ├── rewards.py
│ ├── callbacks.py
│ ├── trainer.py
│ └── train.py
│
├── test_model.py # Evaluation script
└── requirements.txt
- Model: Qwen3-VL-2B-Thinking
- LoRA rank: 16
- Batch size: 1 (with gradient accumulation)
- Learning rate: 2e-5
- Reward components: format, accuracy, length
- Lambda weights: configurable
- KL coefficient: 0.1
- Num samples per prompt: 4
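A rough sketch of how these hyperparameters could map onto `peft`/`trl` config objects. The authoritative values live in `sft/config.py` and `grpo/config.py`; `lora_alpha`, the target modules, `output_dir`, and the gradient accumulation value below are illustrative assumptions:

```python
from peft import LoraConfig
from trl import GRPOConfig

# LoRA adapter settings (rank from the list above; alpha and target modules are assumptions).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# GRPO trainer settings mirroring the values listed above.
grpo_config = GRPOConfig(
    output_dir="outputs/grpo",       # assumption
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,   # assumption; see grpo/config.py
    beta=0.1,                        # KL coefficient
    num_generations=4,               # samples per prompt
)
```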
GRPO uses a multi-component reward:
reward = λ_format × format × (λ_acc × accuracy + λ_len × length) + λ_format × format - 1
Where:
- format: 1 if the output follows the expected JSON format, 0 otherwise
- accuracy: 1 if the answer matches the ground truth, 0 otherwise
- length: Dynamic reward based on response length
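A minimal sketch of this reward; the JSON `"answer"` field, the exact-match accuracy check, the length-shaping curve, and the default λ values are simplified assumptions, and the actual components live in `grpo/rewards.py`:

```python
import json

def compute_reward(output: str, ground_truth: str,
                   lam_format: float = 1.0, lam_acc: float = 1.0,
                   lam_len: float = 0.1, target_len: int = 128) -> float:
    """Combine format, accuracy, and length terms as in the formula above."""
    # format: 1 if the output parses as JSON with an "answer" field (assumption), else 0.
    try:
        parsed = json.loads(output)
        fmt = 1.0 if isinstance(parsed, dict) and "answer" in parsed else 0.0
    except json.JSONDecodeError:
        parsed, fmt = None, 0.0

    # accuracy: exact string match against the ground truth (simplified).
    answer = str(parsed.get("answer", "")) if fmt else ""
    acc = 1.0 if answer.strip().lower() == ground_truth.strip().lower() else 0.0

    # length: simple shaping that prefers responses near a target length (assumption).
    length = max(0.0, 1.0 - abs(len(output) - target_len) / target_len)

    # reward = λ_format × format × (λ_acc × accuracy + λ_len × length) + λ_format × format - 1
    return lam_format * fmt * (lam_acc * acc + lam_len * length) + lam_format * fmt - 1.0
```

Note that when format is 0 the reward collapses to -1, so non-JSON outputs are penalized regardless of answer content.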
Test the model on the ChartQA test set:

python test_model.py

Results are saved to test_results.json.
You can try the live demo on Hugging Face: ChartQA-Qwen3-VL-2B Demo
MIT License - see LICENSE file for details.
@misc{chartqa-qwen3vl,
  title={ChartQA with Qwen3-VL: SFT and GRPO Training},
  author={ChartQA Project},
  year={2026},
  url={https://github.com/yourusername/ChartQA}
}