A Swift port of Andrej Karpathy's microgpt, built directly on Apple Silicon GPUs using Metal Performance Shaders Graph (MPSGraph).
It's mathematically identical to the original Python version (same architecture, same Adam optimizer, same loss math) but runs roughly 40x faster in the benchmarks below, since it's fully native on the GPU.
I stuck to Karpathy's minimal GPT-2-inspired design with a few simplifications:
- Decoder-only transformer (configurable depth/width)
- Multi-head causal self-attention with separate Q, K, V, O projection matrices
- RMSNorm instead of LayerNorm, and no biases anywhere
- Pre-norm residual connections on attention and MLP blocks
- MLP block: standard linear -> ReLU -> linear (4x expansion)
- Adam optimizer with full bias correction and learning rate decay
- Masked loss: cross-entropy averaged over real tokens (ignores padding)
- Temperature sampling with numerically stable softmax for generation
- Top-level async/await to play nice with Swift 6 strict concurrency
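As one concrete piece of the list above, RMSNorm is simple enough to show in a few lines. This is a plain-Swift sketch for illustration (the real model builds the equivalent ops as an MPSGraph tensor expression, and the function name here is not from the repo): each vector is scaled by the reciprocal of its root-mean-square, with no mean subtraction and no bias.

```swift
import Foundation

/// RMSNorm: scale a vector by the reciprocal of its root-mean-square.
/// Unlike LayerNorm there is no mean subtraction, and this model uses no biases.
func rmsNorm(_ x: [Float], eps: Float = 1e-5) -> [Float] {
    // Mean of squared elements, stabilized by a small epsilon.
    let meanSquare = x.reduce(0) { $0 + $1 * $1 } / Float(x.count)
    let inverseRMS = 1 / sqrt(meanSquare + eps)
    return x.map { $0 * inverseRMS }
}
```

After normalization the output vector has an RMS of (approximately) 1, which keeps activations in a stable range across residual blocks.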
Benchmarks on M-series hardware (training on 32K names):
| Script | 1,000 steps | 5,000 steps | CPU/GPU |
|---|---|---|---|
| microgpt.py (Karpathy) | ~80s | ~6.5m | CPU (single-thread scalar) |
| SwiftMicroGPT | ~1.7s | ~9.5s | GPU (MPSGraph, vectorized) |
Why is the Swift/Metal version so fast?
- Parallelism: MPSGraph spreads the matrix math across the GPU's cores simultaneously, instead of looping over scalars in Python.
- Graph compilation: The forward pass, backward pass, and optimizer are compiled into one giant pre-optimized operation tree. No step-by-step overhead.
- Metal kernels: Apple's backend is hyper-optimized for their own SIMD/matrix silicon.
- Unified memory: CPU and GPU share the same RAM, so we don't have to copy state back and forth across a slow PCIe bus like on traditional discrete GPUs.
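The "graph compilation" point is the core of the design: operations are declared symbolically up front, then MPSGraph compiles and dispatches them as one fused unit. A minimal sketch of that pattern (illustrative shapes and names, not SwiftMicroGPT's actual graph):

```swift
import Metal
import MetalPerformanceShadersGraph

// Declare ops symbolically; nothing executes yet.
let graph = MPSGraph()
let x = graph.placeholder(shape: [4, 8], dataType: .float32, name: "x")
let w = graph.placeholder(shape: [8, 8], dataType: .float32, name: "w")
let h = graph.matrixMultiplication(primary: x, secondary: w, name: nil)
let y = graph.reLU(with: h, name: nil)
// Running the graph (graph.run / MPSGraphExecutable) then executes the whole
// fused matmul + ReLU on the GPU, rather than op-by-op from the CPU.
```

In SwiftMicroGPT the forward pass, backward pass, and Adam update are all expressed this way, so one dispatch per training step covers the entire computation.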
After 5,000 training steps on the names dataset:
```
Dataset loaded. Vocab Size: 27, Documents: 32033
--- GPU Training Initiated ---
Step 0    | Loss: 9.2592
Step 1000 | Loss: 2.5203
Step 2000 | Loss: 2.8649
Step 3000 | Loss: 2.6742
Step 4000 | Loss: 2.5009
Step 4999 | Loss: 2.3209
Training Completed in 11.06 seconds.
--- Inference Generation ---
Sample 1: kanare
Sample 2: andie
Sample 3: bannalie
Sample 4: jadona
Sample 5: mailann
Sample 6: rinan
Sample 7: adilen
Sample 8: danion
Sample 9: mashan
Sample 10: maman
```
- macOS with Apple Silicon (M1 or newer)
- Swift 6.1+ (included with Xcode 16+)
```
swiftc -O -o microgpt src/SwiftMicroGPT.swift src/main.swift \
  -framework Metal \
  -framework MetalPerformanceShaders \
  -framework MetalPerformanceShadersGraph

./microgpt
```

The dataset is expected at `src/data/names.txt` relative to the working directory.
Hyperparameters are defined in the GPTConfig struct:
```swift
struct GPTConfig {
    let numberOfLayers: Int = 1
    let embeddingDimension: Int = 16
    let contextWindowSize: Int = 16
    let numberOfHeads: Int = 4
    var headDimension: Int { embeddingDimension / numberOfHeads }
    let learningRate: Float = 0.01
    let beta1: Float = 0.85
    let beta2: Float = 0.99
    let adamEpsilon: Float = 1e-8
    let numberOfTrainingSteps: Int = 5000
    let temperature: Float = 0.5
}
```

| Parameter | Description |
|---|---|
| `numberOfLayers` | Number of transformer blocks |
| `embeddingDimension` | Embedding dimension (model width) |
| `contextWindowSize` | Maximum context length (attention window) |
| `numberOfHeads` | Number of attention heads (must divide `embeddingDimension`) |
| `learningRate` | Peak learning rate (linearly decayed to zero) |
| `beta1`, `beta2` | Adam momentum coefficients |
| `adamEpsilon` | Denominator stabilizer in the Adam update |
| `numberOfTrainingSteps` | Total training iterations |
| `temperature` | Sampling temperature for inference (lower = more conservative) |
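The temperature parameter feeds the sampling step described earlier: logits are divided by the temperature, pushed through a numerically stable softmax, and a token is drawn from the resulting distribution. A plain-Swift sketch of that logic (function names here are illustrative, not the repo's actual API):

```swift
import Foundation

/// Numerically stable softmax with temperature. Subtracting the max logit
/// before exponentiating prevents overflow; temperature < 1 sharpens the
/// distribution, temperature > 1 flattens it.
func softmax(_ logits: [Float], temperature: Float) -> [Float] {
    let scaled = logits.map { $0 / temperature }
    let maxLogit = scaled.max() ?? 0
    let exps = scaled.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

/// Draw one token index from the categorical distribution via inverse CDF.
func sample(_ probs: [Float]) -> Int {
    var r = Float.random(in: 0..<1)
    for (i, p) in probs.enumerated() {
        r -= p
        if r < 0 { return i }
    }
    return probs.count - 1
}
```

At the default `temperature = 0.5` the softmax is sharpened, so generation leans toward high-probability characters and produces the fairly conservative names shown in the sample output above.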
MIT License. See LICENSE for details.
*P.S. Expecting a kid and can't decide on a name? Dump your shortlist into src/data/names.txt, spin up a quick run, and let your Mac's GPU name your baby.*