swift-microgpt

A Swift port of Andrej Karpathy's microgpt, built directly on Apple Silicon GPUs using Metal Performance Shaders Graph (MPSGraph).

It's mathematically identical to the original Python version (same architecture, same Adam optimizer, same loss math) but runs roughly 40-50x faster in the benchmarks below, since every step executes natively on the GPU.

Architecture

I stuck to Karpathy's minimal GPT-2-inspired design with a few simplifications:

  • Decoder-only transformer (configurable depth/width)
  • Multi-head causal self-attention with separate Q, K, V, O projection matrices
  • RMSNorm instead of LayerNorm, and no biases anywhere
  • Pre-norm residual connections on attention and MLP blocks
  • MLP block: standard linear -> ReLU -> linear (4x expansion)
  • Adam optimizer with full bias correction and learning rate decay
  • Masked loss: cross-entropy averaged over real tokens (ignores padding)
  • Temperature sampling with numerically stable softmax for generation
  • Top-level async/await to play nice with Swift 6 strict concurrency
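The last-but-one bullet, temperature sampling with a numerically stable softmax, can be sketched in plain Swift. This is an illustrative standalone function, not the repo's actual API; the real code does this inside the MPSGraph pipeline.

```swift
import Foundation

// Numerically stable softmax with temperature: subtract the max scaled
// logit before exponentiating so exp() cannot overflow.
func sampleToken(logits: [Float], temperature: Float) -> Int {
    let scaled = logits.map { $0 / temperature }
    let maxLogit = scaled.max() ?? 0
    let exps = scaled.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    let probs = exps.map { $0 / total }

    // Inverse-CDF sampling from the resulting categorical distribution.
    var r = Float.random(in: 0..<1)
    for (i, p) in probs.enumerated() {
        r -= p
        if r < 0 { return i }
    }
    return probs.count - 1
}
```

Lower temperatures sharpen the distribution (more conservative samples); higher temperatures flatten it.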

Performance

Benchmarks on M-series hardware (training on 32K names):

| Script | 1,000 steps | 5,000 steps | Compute |
| --- | --- | --- | --- |
| microgpt.py (Karpathy) | ~80 s | ~6.5 min | CPU (single-threaded scalar) |
| SwiftMicroGPT | ~1.7 s | ~9.5 s | GPU (MPSGraph, vectorized) |

Why is the Swift/Metal version so fast?

  1. Parallelism: MPSGraph runs all that matrix math across the GPU cores simultaneously, rather than chugging through Python's loops.
  2. Graph compilation: The forward pass, backward pass, and optimizer are compiled into one giant pre-optimized operation tree. No step-by-step overhead.
  3. Metal kernels: Apple's backend is hyper-optimized for their own SIMD/matrix silicon.
  4. Unified memory: CPU and GPU share the same RAM, so we don't have to copy state back and forth across a slow PCIe bus like on traditional discrete GPUs.
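Point 2 (graph compilation) is the core idea: you describe the whole computation symbolically, and MPSGraph compiles and fuses it before any data flows. A toy sketch of that style, with placeholder shapes chosen arbitrarily and much simpler than the repo's actual training graph:

```swift
import MetalPerformanceShadersGraph

// One graph holds the forward pass, the gradients, and (in the real code)
// the Adam update, so a full training step becomes a single GPU dispatch.
let graph = MPSGraph()
let x = graph.placeholder(shape: [1, 16], dataType: .float32, name: "x")
let w = graph.placeholder(shape: [16, 27], dataType: .float32, name: "w")
let logits = graph.matrixMultiplication(primary: x, secondary: w, name: "logits")
let loss = graph.reductionSum(with: logits, axes: [0, 1], name: "loss")

// Symbolic autodiff: d(loss)/d(w) is just another tensor in the same
// graph, compiled and optimized together with the forward pass.
let grads = graph.gradients(of: loss, with: [w], name: nil)
```

Because the backward pass is a graph transformation rather than a Python-level tape replay, there is no per-op interpreter overhead at step time.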

Output

After 5,000 training steps on the names dataset:

Dataset loaded. Vocab Size: 27, Documents: 32033
--- GPU Training Initiated ---
Step    0 | Loss: 9.2592
Step 1000 | Loss: 2.5203
Step 2000 | Loss: 2.8649
Step 3000 | Loss: 2.6742
Step 4000 | Loss: 2.5009
Step 4999 | Loss: 2.3209
Training Completed in 11.06 seconds.

--- Inference Generation ---
Sample  1: kanare
Sample  2: andie
Sample  3: bannalie
Sample  4: jadona
Sample  5: mailann
Sample  6: rinan
Sample  7: adilen
Sample  8: danion
Sample  9: mashan
Sample 10: maman

Getting Started

Prerequisites

  • macOS with Apple Silicon (M1 or newer)
  • Swift 6.1+ (included with Xcode 16+)

Build and Run

swiftc -O -o microgpt src/SwiftMicroGPT.swift src/main.swift \
  -framework Metal \
  -framework MetalPerformanceShaders \
  -framework MetalPerformanceShadersGraph

./microgpt

The dataset is expected at src/data/names.txt relative to the working directory.

Configuration

Hyperparameters are defined in the GPTConfig struct:

struct GPTConfig {
    let numberOfLayers: Int = 1
    let embeddingDimension: Int = 16
    let contextWindowSize: Int = 16
    let numberOfHeads: Int = 4
    var headDimension: Int { embeddingDimension / numberOfHeads }

    let learningRate: Float = 0.01
    let beta1: Float = 0.85
    let beta2: Float = 0.99
    let adamEpsilon: Float = 1e-8
    let numberOfTrainingSteps: Int = 5000
    let temperature: Float = 0.5
}
| Parameter | Description |
| --- | --- |
| numberOfLayers | Number of transformer blocks |
| embeddingDimension | Embedding dimension (model width) |
| contextWindowSize | Maximum context length (attention window) |
| numberOfHeads | Number of attention heads (must divide embeddingDimension) |
| learningRate | Peak learning rate (linearly decayed to zero) |
| beta1, beta2 | Adam momentum coefficients |
| numberOfTrainingSteps | Total training iterations |
| temperature | Sampling temperature for inference (lower = more conservative) |
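The "linearly decayed to zero" schedule mentioned above amounts to a one-line formula. This is an illustrative helper consistent with that description, not necessarily the repo's exact code:

```swift
// Linear learning-rate decay from the peak value down to zero over
// the full training run (illustrative; the repo's schedule may differ).
func learningRate(step: Int, peak: Float = 0.01, totalSteps: Int = 5000) -> Float {
    peak * (1 - Float(step) / Float(totalSteps))
}
// step 0 -> 0.01 (peak), step 2500 -> 0.005, step 5000 -> 0.0
```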

License

MIT License. See LICENSE for details.


*P.S. Expecting a kid and can't decide on a name? Dump your shortlist into src/data/names.txt, spin up a quick run, and let your Mac's GPU name your baby.*
