This project implements a compact convolutional neural network for classifying 28×28 grayscale drawings from the Quick, Draw! dataset (or subsets exported in .npy format). The codebase handles dataset ingestion, training, evaluation, and exporting the model.
The goal is straightforward: learn a fast, reasonably expressive architecture that performs well on low-resolution synthetic line drawings without depending on heavyweight backbones.
```
.
│   inference.py
│   model.py
│   prepare_data.py
│   training.py
│
├───scripts
│   │   check_torch_cuda.py
│   │   export_as_onnx.py
│   │   prepare_subset.py
│   └───__pycache__
└───__pycache__
```
- `model.py` — Defines the `SketchCNN` architecture.
- `prepare_data.py` — Contains the `QuickDrawNPY` dataset class.
- `training.py` — Training loop, evaluation function, checkpoint saving.
- `inference.py` — Runs inference with a trained model.
- scripts/ — Subset generation, CUDA diagnostics, and ONNX export.
The project expects a directory containing one .npy file per class, where each file holds every sample for that class.
Example:

```
data/
└── quickdraw_subset_10k/
    ├── airplane.npy
    ├── apple.npy
    ├── backpack.npy
    ...
```
During loading:

- All `.npy` arrays are concatenated.
- Each sample becomes a `(1, 28, 28)` float tensor in `[0, 1]`.
- Class labels are derived from the sorted filenames.
- Optional transforms apply after tensor conversion.
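The loading steps above could be sketched roughly as follows. The class name `QuickDrawNPY` comes from the module list; the exact array layout on disk (flattened `(N, 784)` rows are reshaped here) and the transform hook are assumptions, not the project's confirmed implementation:

```python
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import Dataset


class QuickDrawNPY(Dataset):
    """One .npy file per class; labels follow sorted filename order."""

    def __init__(self, root, transform=None):
        self.transform = transform
        samples, labels = [], []
        # Sorting makes the filename -> class-index mapping deterministic.
        files = sorted(Path(root).glob("*.npy"))
        self.classes = [f.stem for f in files]
        for idx, f in enumerate(files):
            arr = np.load(f).reshape(-1, 28, 28)  # assumed (N, 784) or (N, 28, 28)
            samples.append(arr)
            labels.append(np.full(len(arr), idx, dtype=np.int64))
        # All .npy arrays are concatenated into one dataset.
        self.data = np.concatenate(samples)
        self.labels = np.concatenate(labels)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        # Scale to [0, 1] and add the channel dimension -> (1, 28, 28).
        x = torch.from_numpy(self.data[i]).float().div(255.0).unsqueeze(0)
        if self.transform is not None:
            x = self.transform(x)  # transforms apply after tensor conversion
        return x, int(self.labels[i])
```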
SketchCNN is a compact but deep convolutional network tuned for low-resolution doodle-like drawings.
- Convolutional blocks expand channels 64 → 128 → 256.
- Each block uses repeated Conv → BatchNorm → LeakyReLU layers.
- MaxPool downsampling in the first two stages; final AvgPool compresses to 4×4.
- LeakyReLU mitigates dead features often seen in sparse line drawings.
Output shape before the classifier: 256×4×4.
- Flatten → Linear(4096 → 256) → LeakyReLU → Dropout → Linear → logits.
- Final output dimension = number of classes (auto-detected).
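The architecture described above can be sketched like this. The channel progression, pooling placement, and 4096 → 256 classifier match the bullets; kernel sizes, the LeakyReLU slope, the dropout rate, and the use of `AdaptiveAvgPool2d` for the final 4×4 compression are assumptions:

```python
import torch
import torch.nn as nn


class SketchCNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()

        def block(cin: int, cout: int) -> nn.Sequential:
            # Repeated Conv -> BatchNorm -> LeakyReLU layers per block.
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1),
                nn.BatchNorm2d(cout),
                nn.LeakyReLU(0.1),
                nn.Conv2d(cout, cout, 3, padding=1),
                nn.BatchNorm2d(cout),
                nn.LeakyReLU(0.1),
            )

        self.features = nn.Sequential(
            block(1, 64), nn.MaxPool2d(2),           # 28x28 -> 14x14
            block(64, 128), nn.MaxPool2d(2),         # 14x14 -> 7x7
            block(128, 256), nn.AdaptiveAvgPool2d(4),  # 7x7 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                             # 256*4*4 = 4096
            nn.Linear(4096, 256),
            nn.LeakyReLU(0.1),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```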
Training configuration lives in training.py.
- Loss: CrossEntropy
- Optimizer: Adam (lr=1e-3, weight_decay=1e-4)
- Scheduler: StepLR (step_size=10, gamma=0.5)
- Epochs: 20
- Batch size: 256
- Device: CUDA if available
The dataset is split into 90% train / 10% test via `random_split`.

Normalization: `Normalize(mean=0.5, std=0.5)`
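The configuration above could be wired together as in this sketch. The hyperparameters match the list; the helper name `build_training_setup` and the DataLoader settings are hypothetical:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, random_split


def build_training_setup(model, dataset):
    # 90% train / 10% test split via random_split.
    n_train = int(0.9 * len(dataset))
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=256, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=256)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
    # Halve the learning rate every 10 epochs.
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
    return train_loader, test_loader, criterion, optimizer, scheduler
```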
Each epoch:

- Full training pass with gradient updates.
- Validation accuracy computed via `evaluate()`.
- Scheduler step.
Training prints epoch loss and validation accuracy.
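The per-epoch flow could be sketched as follows. The `evaluate()` name comes from the text; the loss averaging and the exact print format are assumptions:

```python
import torch


def evaluate(model, loader, device):
    """Fraction of correctly classified samples in `loader`."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total


def train(model, train_loader, test_loader, criterion, optimizer, scheduler,
          device, epochs=20):
    for epoch in range(epochs):
        # Full training pass with gradient updates.
        model.train()
        running_loss = 0.0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * y.numel()
        # Validation accuracy, then the scheduler step.
        acc = evaluate(model, test_loader, device)
        scheduler.step()
        print(f"epoch {epoch + 1}: "
              f"loss={running_loss / len(train_loader.dataset):.4f} "
              f"val_acc={acc:.4f}")
```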
After training, weights are saved to `models/model_weights.pth`.

Run training with:

```bash
python training.py
```