Swift port of Karpathy's microgpt.py. Zero dependencies. Runs on macOS and Linux. ~4x faster than CPython.
I built this to understand how GPTs work, not by reading about transformers but by porting every operation from Python to Swift, line by line. Karpathy's blog post puts it well: "if you understand microgpt, you understand the algorithmic essence" of LLMs. I wanted to see if I actually did.
There are already dozens of ports out there (Rust, C, C++, Zig, even other Swift ones) but I still found it worthwhile. I wanted to understand the algorithm by reimplementing it in the language I work in daily.
```sh
swift build -c release
.build/release/microgpt-swift
```

The binary downloads `names.txt` on first run.

`swift run` also works, but it compiles in debug mode, which is ~10x slower.
| | Time (1000 steps) | vs. Swift |
|---|---|---|
| Python (CPython 3.14.3) | 65.0 ± 0.6s | 3.8x |
| Swift (release) | 17.0 ± 0.1s | 1.0x |
Apple M1 Max. Swift 6.2.4, CPython 3.14.3. Default hyperparameters, data pre-downloaded. Measured with hyperfine (3 runs, 1 warmup).
| Flag | Default | Description |
|---|---|---|
| `--steps` | 1000 | training steps |
| `--lr` | 0.01 | learning rate |
| `--seed` | 42 | RNG seed |
| `--temperature` | 0.5 | sampling temperature |
| `--samples` | 20 | names to generate |
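A note on `--temperature`: sampling divides the logits by the temperature before the softmax, so values below 1 sharpen the distribution (safer, more repetitive names) and values above 1 flatten it (more variety, more noise). A minimal Swift sketch of the idea, illustrative rather than the repo's actual code:

```swift
import Foundation

// Temperature-scaled softmax: logits / T, then normalize.
// T < 1 sharpens the distribution; T > 1 flattens it toward uniform.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    let scaled = logits.map { $0 / temperature }
    let maxLogit = scaled.max() ?? 0          // subtract max for numerical stability
    let exps = scaled.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

let logits = [2.0, 1.0, 0.1]
print(softmax(logits, temperature: 0.5))  // sharper: mass concentrates on index 0
print(softmax(logits, temperature: 2.0))  // flatter: closer to uniform
```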
```sh
.build/release/microgpt-swift --steps 2000 --temperature 0.8
# or: swift run microgpt-swift -- --steps 2000 --temperature 0.8
```

```
num docs: 32033
vocab size: 27
num params: 4192
step 1000 / 1000 | loss 2.6181
--- inference (new, hallucinated names) ---
sample 1: mari
sample 2: maren
sample 3: ran
sample 4: leynn
sample 5: amaron
sample 6: jaria
sample 7: kara
sample 8: orel
sample 9: tarili
sample 10: jarian
sample 11: fama
sample 12: arian
sample 13: araha
sample 14: ianda
sample 15: varria
sample 16: alinile
sample 17: darcan
sample 18: lare
sample 19: kareen
sample 20: radeia
```
- Andrej Karpathy for microgpt
- Point-Free for Xoshiro256** (MIT)
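For the curious: xoshiro256\*\* (the algorithm behind the credited Point-Free dependency) is a small, fast, seedable generator, which is why `--seed` makes runs fully reproducible. A from-scratch sketch of the published algorithm, not Point-Free's code:

```swift
// xoshiro256** sketch: 256 bits of state plus pure bit-twiddling.
// Seed expansion via SplitMix64 is the conventional pairing.
struct Xoshiro256StarStar {
    var state: (UInt64, UInt64, UInt64, UInt64)

    init(seed: UInt64) {
        var x = seed
        func splitMix64() -> UInt64 {
            x &+= 0x9E3779B97F4A7C15
            var z = x
            z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
            z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
            return z ^ (z >> 31)
        }
        state = (splitMix64(), splitMix64(), splitMix64(), splitMix64())
    }

    private func rotl(_ x: UInt64, _ k: UInt64) -> UInt64 {
        (x << k) | (x >> (64 - k))
    }

    mutating func next() -> UInt64 {
        let result = rotl(state.1 &* 5, 7) &* 9
        let t = state.1 << 17
        state.2 ^= state.0
        state.3 ^= state.1
        state.1 ^= state.2
        state.0 ^= state.3
        state.2 ^= t
        state.3 = rotl(state.3, 45)
        return result
    }
}

var a = Xoshiro256StarStar(seed: 42)
var b = Xoshiro256StarStar(seed: 42)
// Same seed, same stream: the determinism --seed relies on.
print(a.next() == b.next())  // true
```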
MIT