Milestone 2: Tokenizer · Parser · AST · Generator
🧠 Goal: compile a minimal language statement
`exit <int>;` → NASM x86-64 assembly → ELF executable → returns that exit code.
Part 1: https://l1nq.com/HydrogenPart1NotionNotes
- Lexer for: `return`, integer literals, `;`
- Direct tokens → NASM (no AST)
- Assembles with `nasm` + `ld`
- Demo program returns the chosen exit code (0–255)

Part 2:
- Replaced keyword `return` → `exit`
- Introduced a Tokenizer class with `peek()` + `consume()`
- Added a Parser + AST
Grammar:
```
ExitNode   → 'exit' Expression ';'
Expression → IntLiteral
```
- Added a Generator that emits NASM from the AST
- Full pipeline: tokenize → parse → generate → assemble → link → run
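
The tokenize → parse → generate stages listed above can be sketched in one small C++17 file. This is a minimal illustration, not the code in `src/`: the `Token`, `TokenType`, `parse`, and `generate` names and all signatures are assumptions chosen to mirror the tokens and grammar named in this README. Assembling and linking are left to `nasm` and `ld`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token model mirroring EXIT / INT_LITERAL / SEMICOLON.
enum class TokenType { Exit, IntLiteral, Semicolon };

struct Token {
    TokenType type;
    std::string text;  // digits for IntLiteral, empty otherwise
};

// Character-level tokenizer built around peek() and consume().
class Tokenizer {
public:
    explicit Tokenizer(std::string src) : src_(std::move(src)) {}

    std::vector<Token> tokenize() {
        std::vector<Token> tokens;
        while (peek()) {
            const unsigned char c = static_cast<unsigned char>(*peek());
            if (std::isspace(c)) {
                consume();  // skip whitespace
            } else if (std::isalpha(c)) {
                std::string word;
                while (peek() && std::isalnum(static_cast<unsigned char>(*peek())))
                    word += consume();
                if (word != "exit")
                    throw std::runtime_error("unknown identifier: " + word);
                tokens.push_back({TokenType::Exit, ""});
            } else if (std::isdigit(c)) {
                std::string digits;
                while (peek() && std::isdigit(static_cast<unsigned char>(*peek())))
                    digits += consume();
                tokens.push_back({TokenType::IntLiteral, digits});
            } else if (c == ';') {
                consume();
                tokens.push_back({TokenType::Semicolon, ""});
            } else {
                throw std::runtime_error("unexpected character");
            }
        }
        return tokens;
    }

private:
    // Look at the current character without advancing; empty at end of input.
    std::optional<char> peek() const {
        if (pos_ < src_.size()) return src_[pos_];
        return std::nullopt;
    }
    // Return the current character and advance by one.
    char consume() {
        assert(pos_ < src_.size());
        return src_[pos_++];
    }

    std::string src_;
    std::size_t pos_ = 0;
};

// AST for: ExitNode → 'exit' Expression ';'   Expression → IntLiteral
struct IntLiteral { std::string value; };
struct Expression { IntLiteral literal; };
struct ExitNode   { Expression expr; };

// Accepts exactly one `exit <int>;` statement (assumption: one statement per program).
ExitNode parse(const std::vector<Token>& t) {
    if (t.size() != 3 || t[0].type != TokenType::Exit ||
        t[1].type != TokenType::IntLiteral || t[2].type != TokenType::Semicolon)
        throw std::runtime_error("expected: exit <int>;");
    return ExitNode{Expression{IntLiteral{t[1].text}}};
}

// Emits NASM that calls the Linux x86-64 exit syscall (number 60).
std::string generate(const ExitNode& node) {
    std::string out;
    out += "global _start\n";
    out += "_start:\n";
    out += "    mov rax, 60\n";
    out += "    mov rdi, " + node.expr.literal.value + "\n";
    out += "    syscall\n";
    return out;
}
```

Calling `generate(parse(Tokenizer("exit 21;").tokenize()))` returns the assembly text as a string, which a driver could write to `out.asm` before invoking `nasm` and `ld`. The sketch does not range-check the literal; the OS keeps only the low 8 bits of the exit status, which is why the demo is limited to 0–255.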

Requirements:
- Linux / WSL (for `nasm` + `ld`)
- `cmake`, `g++` or `clang++`

Install on Ubuntu:
```bash
sudo apt update
sudo apt install -y nasm build-essential cmake
```

Using helper scripts:
```bash
./scripts/build.sh
./scripts/run.sh
# prints "exit code: 21"
```

Or manually:
```bash
cmake -S . -B build
cmake --build build -j
./build/hydro examples/exit_ok.hy
./out
echo $?
```

1. Tokenize
`exit 21;` → `[EXIT, INT_LITERAL(21), SEMICOLON]`

2. Parse → AST
```
Exit {
  Expression {
    IntLiteral(21)
  }
}
```

3. Generate NASM
```nasm
global _start
_start:
    mov rax, 60 ; sys_exit
    mov rdi, 21 ; exit code
    syscall
```

4. Assemble + Link
```bash
nasm -felf64 out.asm
ld -o out out.o
```

5. Run
```bash
./out
echo $? # → 21
```

```
Hydrogen/
├─ src/
│  ├─ main.cpp
│  ├─ tokenization.hpp
│  ├─ parser.hpp
│  └─ generation.hpp
├─ examples/
│  └─ exit_ok.hy
├─ scripts/
│  ├─ build.sh
│  └─ run.sh
├─ docs/
│  └─ grammar.md
├─ CMakeLists.txt
└─ README.md
```

🗓️ Milestone History

| Version | Stage | Highlights |
|---|---|---|
| v0.1-part1 | Minimal compiler | tokens → NASM → ELF |
| v0.2-part2 | Tokenizer + Parser + AST + Generator | full pipeline implemented |
🧩 Hydrogen is an educational experiment in building a self-hosting compiler from scratch — one step at a time.