A TypeScript port of Andrej Karpathy's microgpt — a complete GPT built from scratch with zero runtime dependencies. Supports training and inference.
No PyTorch. No TensorFlow. Just TypeScript, ~500 lines of code. Runs in the browser.
Try the live playground: microgpt-ts.vercel.app
A character-level GPT that learns to generate text. Everything is built from scratch: a Value class that tracks computations and backpropagates gradients through the chain rule, a GPT architecture (embeddings, multi-head attention, MLP, residual connections, RMSNorm), and an Adam optimizer. Train it on a preset dataset or your own text, then run inference.
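The heart of the autograd engine is a scalar `Value` that records how it was computed and replays the chain rule in reverse. Below is an illustrative TypeScript sketch of the idea — the method and field names here are assumptions for exposition, not the repo's actual `value.ts` API:

```typescript
// Minimal scalar autograd sketch (illustrative; names are assumptions).
class Value {
  grad = 0;
  private backwardFn: () => void = () => {};
  constructor(public data: number, private prev: Value[] = []) {}

  add(other: Value): Value {
    const out = new Value(this.data + other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += out.grad; // d(a+b)/da = 1
      other.grad += out.grad; // d(a+b)/db = 1
    };
    return out;
  }

  mul(other: Value): Value {
    const out = new Value(this.data * other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += other.data * out.grad; // d(a*b)/da = b
      other.grad += this.data * out.grad; // d(a*b)/db = a
    };
    return out;
  }

  backward(): void {
    // Topologically sort the graph, then apply the chain rule in reverse.
    const topo: Value[] = [];
    const visited = new Set<Value>();
    const build = (v: Value) => {
      if (visited.has(v)) return;
      visited.add(v);
      for (const p of v.prev) build(p);
      topo.push(v);
    };
    build(this);
    this.grad = 1;
    for (const v of topo.reverse()) v.backwardFn();
  }
}
```

For example, with `c = a.mul(b).add(a)` and `a = 2, b = 3`, calling `c.backward()` yields `a.grad = b + 1 = 4` and `b.grad = a = 2`.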
Prerequisites: Node.js (v20+) and pnpm
pnpm install
pnpm demo

Trains on ~32k names for 1000 steps, then generates new ones:
num docs: 32033
vocab size: 27
chars: abcdefghijklmnopqrstuvwxyz
num params: 4192
step 1 / 1000 | loss 3.3075
step 2 / 1000 | loss 2.9965
step 3 / 1000 | loss 3.2289
...
step 801 / 1000 | loss 1.9760
training time: 11.06s
--- inference (new, hallucinated names) ---
sample 1: kadan
sample 2: alele
sample 3: rian
sample 4: kanani
sample 5: dahe
An interactive browser playground that trains and runs the GPT entirely client-side — no server needed.
cd web
pnpm install
pnpm dev

Head to localhost:3000/playground to train a model and generate text in real time, or use the hosted version at microgpt-ts.vercel.app.
Features:
- Preset datasets — baby names, Pokemon, company names, movie titles, fortunes, and more
- Custom data — paste or upload your own text
- Configurable architecture — adjust embedding size, attention heads, layers, and block size
- Live training — real-time loss charts and sample generation as the model trains
- Token-level inference — step through generation one token at a time with probability distributions
- Temperature control — adjust sampling randomness during generation
- Prefix conditioning — seed generation with a starting string
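To illustrate how temperature control changes sampling, here is a minimal sketch (the playground's actual sampling code may differ): logits are divided by the temperature before the softmax, so T < 1 sharpens the distribution toward the most likely token and T > 1 flattens it.

```typescript
// Numerically stable softmax: subtract the max before exponentiating.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Sample an index from the temperature-scaled distribution.
function sampleWithTemperature(logits: number[], temperature: number): number {
  const probs = softmax(logits.map((x) => x / temperature));
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}
```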
microgpt/ Core library (~500 lines, zero dependencies)
value.ts Autograd engine (forward + backward via chain rule)
model.ts GPT architecture (embeddings, attention, MLP, inference)
train.ts Training loop with Adam optimizer
utils.ts Math helpers (softmax, sampling, matrix ops)
browser.ts Browser runtime (async training, Web Workers, serialization)
datasets/ Preset training datasets (baby names, pokemon, company names, ...)
scripts/demo.ts CLI entry point: load data, train, generate
web/ Next.js interactive playground
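For reference, a single Adam update of the kind train.ts performs can be sketched as follows. This is illustrative only — the state layout and function names are assumptions, and the hyperparameter defaults follow the original Adam paper rather than the repo's code:

```typescript
// Illustrative Adam step (names and state layout are assumptions).
interface AdamState {
  m: number[]; // moving average of gradients
  v: number[]; // moving average of squared gradients
  t: number;   // step counter, used for bias correction
}

function adamStep(
  params: number[], grads: number[], state: AdamState,
  lr = 1e-3, beta1 = 0.9, beta2 = 0.999, eps = 1e-8,
): void {
  state.t += 1;
  for (let i = 0; i < params.length; i++) {
    state.m[i] = beta1 * state.m[i] + (1 - beta1) * grads[i];
    state.v[i] = beta2 * state.v[i] + (1 - beta2) * grads[i] * grads[i];
    // Bias-corrected estimates, then the parameter update.
    const mHat = state.m[i] / (1 - beta1 ** state.t);
    const vHat = state.v[i] / (1 - beta2 ** state.t);
    params[i] -= (lr * mHat) / (Math.sqrt(vHat) + eps);
  }
}
```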
The blog post builds the model up one component at a time, and this repo follows the same progression — each step is a separate PR:
| Step | What it adds | PR |
|---|---|---|
| train0 | Bigram count table — no neural net, no gradients | #1 |
| train1 | MLP + manual gradients (numerical & analytic) + SGD | #2 |
| train2 | Autograd (Value class) — replaces manual gradients | #3 |
| train3 | Position embeddings + single-head attention + RMSNorm + residuals | #4 |
| train4 | Multi-head attention + layer loop — full GPT architecture | #5 |
| train5 | Adam optimizer | #6 |
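RMSNorm, introduced at the train3 step, is simple enough to sketch inline (an illustrative version; the repo's code may differ, and in a full GPT it is usually paired with a learnable gain, omitted here). Unlike LayerNorm, it skips mean subtraction and just rescales by the root-mean-square of the activations:

```typescript
// Rescale x so its root-mean-square is ~1 (eps guards against division by zero).
function rmsnorm(x: number[], eps = 1e-5): number[] {
  const meanSq = x.reduce((acc, v) => acc + v * v, 0) / x.length;
  const scale = 1 / Math.sqrt(meanSq + eps);
  return x.map((v) => v * scale);
}
```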
Each step is also tagged (step-1-bigram through step-6-microgpt-completed), so you can browse the snapshots on GitHub or check out any one locally: git checkout step-2-mlp.
Scripts from the repo root:
pnpm demo # Run CLI demo (train + generate)
pnpm dev # Start web dev server (localhost:3000)
pnpm build:web # Production build
pnpm storybook # Component stories (localhost:6006)
pnpm lint # Check with Biome
pnpm lint:fix # Auto-fix with Biome

The web app uses Next.js 16, shadcn/ui, Tailwind CSS v4, and Biome for linting/formatting. See AGENTS.md for coding conventions.
Direct port of Karpathy's microgpt.py, which accompanies his blog post microgpt. As he puts it:
The most atomic way to train and run inference for a GPT in pure, dependency-free Python. This file is the complete algorithm. Everything else is just efficiency.