A TypeScript port of Andrej Karpathy's microgpt — a complete GPT built from scratch with zero runtime dependencies. Supports training and inference.
No PyTorch. No TensorFlow. Just TypeScript, ~500 lines of code. Runs in the browser.
Try the live playground: microgpt-ts.vercel.app
A character-level GPT that learns to generate text. Everything is built from scratch: a Value class that tracks computations and backpropagates gradients through the chain rule, a GPT architecture (embeddings, multi-head attention, MLP, residual connections, RMSNorm), and an Adam optimizer. Train it on a preset dataset or your own text, then run inference.
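The heart of the autograd engine is a scalar `Value` that records how it was computed and replays the chain rule in reverse. Below is an illustrative TypeScript sketch of the idea — the method and field names here are assumptions for exposition, not the repo's actual `value.ts` API:

```typescript
// Minimal scalar autograd sketch (illustrative; names are assumptions).
class Value {
  grad = 0;
  private backwardFn: () => void = () => {};
  constructor(public data: number, private prev: Value[] = []) {}

  add(other: Value): Value {
    const out = new Value(this.data + other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += out.grad; // d(a+b)/da = 1
      other.grad += out.grad; // d(a+b)/db = 1
    };
    return out;
  }

  mul(other: Value): Value {
    const out = new Value(this.data * other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += other.data * out.grad; // d(a*b)/da = b
      other.grad += this.data * out.grad; // d(a*b)/db = a
    };
    return out;
  }

  backward(): void {
    // Topologically sort the graph, then apply the chain rule in reverse.
    const topo: Value[] = [];
    const visited = new Set<Value>();
    const build = (v: Value) => {
      if (visited.has(v)) return;
      visited.add(v);
      for (const p of v.prev) build(p);
      topo.push(v);
    };
    build(this);
    this.grad = 1;
    for (const v of topo.reverse()) v.backwardFn();
  }
}
```

For example, with `c = a.mul(b).add(a)` and `a = 2, b = 3`, calling `c.backward()` yields `a.grad = b + 1 = 4` and `b.grad = a = 2`.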
Prerequisites: Node.js (v20+) and pnpm
pnpm install
pnpm demo

Trains on ~32k names for 1000 steps, then generates new ones:
num docs: 32033
vocab size: 27
chars: abcdefghijklmnopqrstuvwxyz
num params: 4192
step 1 / 1000 | loss 3.3075
step 2 / 1000 | loss 2.9965
step 3 / 1000 | loss 3.2289
...
step 801 / 1000 | loss 1.9760
training time: 11.06s
--- inference (new, hallucinated names) ---
sample 1: kadan
sample 2: alele
sample 3: rian
sample 4: kanani
sample 5: dahe
An interactive browser playground that trains and runs the GPT entirely client-side — no server needed.
cd web
pnpm install
pnpm dev

Head to localhost:3000/playground to train a model and generate text in real time, or use the hosted version at microgpt-ts.vercel.app.
Features:
- Preset datasets — baby names, Pokemon, company names, movie titles, fortunes, and more
- Custom data — paste or upload your own text
- Configurable architecture — adjust embedding size, attention heads, layers, and block size
- Live training — real-time loss charts and sample generation as the model trains
- Token-level inference — step through generation one token at a time with probability distributions
- Temperature control — adjust sampling randomness during generation
- Prefix conditioning — seed generation with a starting string
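To illustrate how temperature control changes sampling, here is a minimal sketch (the playground's actual sampling code may differ): logits are divided by the temperature before the softmax, so T < 1 sharpens the distribution toward the most likely token and T > 1 flattens it.

```typescript
// Numerically stable softmax: subtract the max before exponentiating.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Sample an index from the temperature-scaled distribution.
function sampleWithTemperature(logits: number[], temperature: number): number {
  const probs = softmax(logits.map((x) => x / temperature));
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}
```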
microgpt/ Core library (~500 lines, zero dependencies)
value.ts Autograd engine (forward + backward via chain rule)
model.ts GPT architecture (embeddings, attention, MLP, inference)
train.ts Training loop with Adam optimizer
utils.ts Math helpers (softmax, sampling, matrix ops)
browser.ts Browser runtime (async training, Web Workers, serialization)
datasets/ Preset training datasets (baby names, pokemon, company names, ...)
scripts/demo.ts CLI entry point: load data, train, generate
web/ Next.js interactive playground
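For reference, a single Adam update of the kind train.ts performs can be sketched as follows. This is illustrative only — the state layout and function names are assumptions, and the hyperparameter defaults follow the original Adam paper rather than the repo's code:

```typescript
// Illustrative Adam step (names and state layout are assumptions).
interface AdamState {
  m: number[]; // moving average of gradients
  v: number[]; // moving average of squared gradients
  t: number;   // step counter, used for bias correction
}

function adamStep(
  params: number[], grads: number[], state: AdamState,
  lr = 1e-3, beta1 = 0.9, beta2 = 0.999, eps = 1e-8,
): void {
  state.t += 1;
  for (let i = 0; i < params.length; i++) {
    state.m[i] = beta1 * state.m[i] + (1 - beta1) * grads[i];
    state.v[i] = beta2 * state.v[i] + (1 - beta2) * grads[i] * grads[i];
    // Bias-corrected estimates, then the parameter update.
    const mHat = state.m[i] / (1 - beta1 ** state.t);
    const vHat = state.v[i] / (1 - beta2 ** state.t);
    params[i] -= (lr * mHat) / (Math.sqrt(vHat) + eps);
  }
}
```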
The blog post builds the model up one component at a time, and this repo follows the same progression — each step is a separate PR:
| Step | What it adds | PR |
|---|---|---|
| train0 | Bigram count table — no neural net, no gradients | #1 |
| train1 | MLP + manual gradients (numerical & analytic) + SGD | #2 |
| train2 | Autograd (Value class) — replaces manual gradients | #3 |
| train3 | Position embeddings + single-head attention + RMSNorm + residuals | #4 |
| train4 | Multi-head attention + layer loop — full GPT architecture | #5 |
| train5 | Adam optimizer | #6 |
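RMSNorm, introduced at the train3 step, is simple enough to sketch inline (an illustrative version; the repo's code may differ, and in a full GPT it is usually paired with a learnable gain, omitted here). Unlike LayerNorm, it skips mean subtraction and just rescales by the root-mean-square of the activations:

```typescript
// Rescale x so its root-mean-square is ~1 (eps guards against division by zero).
function rmsnorm(x: number[], eps = 1e-5): number[] {
  const meanSq = x.reduce((acc, v) => acc + v * v, 0) / x.length;
  const scale = 1 / Math.sqrt(meanSq + eps);
  return x.map((v) => v * scale);
}
```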
Each step is also tagged (step-1-bigram through step-6-microgpt-completed), so you can browse the snapshots on GitHub or check out any one locally: git checkout step-2-mlp.
Scripts from the repo root:
pnpm demo # Run CLI demo (train + generate)
pnpm dev # Start web dev server (localhost:3000)
pnpm build:web # Production build
pnpm storybook # Component stories (localhost:6006)
pnpm lint # Check with Biome
pnpm lint:fix # Auto-fix with Biome

The web app uses Next.js 16, shadcn/ui, Tailwind CSS v4, and Biome for linting/formatting. See AGENTS.md for coding conventions.
Direct port of Karpathy's microgpt.py, which accompanies his blog post microgpt. As he puts it:
The most atomic way to train and run inference for a GPT in pure, dependency-free Python. This file is the complete algorithm. Everything else is just efficiency.