Neko is an educational end-to-end compiler that translates a custom high-level language into native x86_64 Linux Assembly. It is designed to demonstrate the core principles of compiler construction in a clear, modular, and modern C++ way.
The compiler follows the classic 5-phase architecture (the pipeline does not include an optimisation phase):
- Lexical Analysis (Lexer): Converts raw source code into a stream of tokens.
- Syntax Analysis (Parser): Transforms tokens into an Abstract Syntax Tree (AST) using recursive descent.
- Semantic Analysis (Sema): Validates the AST for scope rules, variable declarations, and basic type consistency.
- Intermediate Representation (IR): Flattens the AST into Three-Address Code (TAC), handling control flow and temporaries.
- Code Generation (CodeGen): Translates TAC into x86_64 NASM Assembly following the System V AMD64 ABI.
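To make the IR phase concrete, here is a minimal sketch of how an expression tree can be flattened into Three-Address Code with fresh temporaries. It is not taken from Neko's source: the `Expr` struct, the `leaf`, `binary` and `lower` helpers, and the `t0`, `t1` naming scheme are all assumptions made for illustration, and the real AST and IR classes may look quite different.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node, for illustration only (not Neko's actual class).
struct Expr {
    std::string op;                  // empty for a leaf, otherwise "+", "-", "*", "/"
    std::string value;               // leaf text: a literal or a variable name
    std::unique_ptr<Expr> lhs, rhs;  // children of a binary node
};

// Builds a leaf node holding a literal or a variable name.
std::unique_ptr<Expr> leaf(std::string v) {
    auto e = std::make_unique<Expr>();
    e->value = std::move(v);
    return e;
}

// Builds a binary operator node that owns both operands.
std::unique_ptr<Expr> binary(std::string op,
                             std::unique_ptr<Expr> l,
                             std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = std::move(op);
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Post-order walk: lowers both operands first, then emits one instruction
// per binary operator. Returns the name that holds the node's result.
std::string lower(const Expr& e, std::vector<std::string>& code, int& nextTemp) {
    if (e.op.empty()) {
        return e.value;  // leaves need no instruction
    }
    assert(e.lhs && e.rhs);
    std::string l = lower(*e.lhs, code, nextTemp);
    std::string r = lower(*e.rhs, code, nextTemp);
    std::string t = "t" + std::to_string(nextTemp++);
    code.push_back(t + " = " + l + " " + e.op + " " + r);
    return t;
}
```

Calling `lower` on the tree for `a + b * c` with an empty instruction list and a temporary counter starting at 0 produces `t0 = b * c` followed by `t1 = a + t0`, and returns `t1`. Each instruction has at most one operator and two operands, which is what makes the later translation to assembly straightforward.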
The language supports:
- Variables: Declaration and assignment (var x = 10;).
- Functions: First-class function definitions with parameters and return values.
- Control Flow: if-else statements and while loops.
- Expressions: Arithmetic operations, comparisons, and logical negation.
- Built-ins: Native print statement for integers and strings.
The easiest way to build and run Neko is using the provided Docker environment, which comes pre-configured with all necessary tools (cmake, nasm, gcc, etc.).
From the project root, build and run the Docker container:
# Build the image
docker build -t neko-compiler .
# Run the container in the background
docker run -d --name neko-dev neko-compiler
# Enter the container
docker exec -it neko-dev bash
Inside the container, use the provided build script:
# Build the C++ compiler binary
./build.sh
This creates the neko executable inside the build/ directory.
To compile a source file into assembly:
./build/neko ../tests/functions.ne
This will:
- Print the AST.
- Print the generated Three-Address Code.
- Print the final x86_64 Assembly.
- Save the assembly to output.asm.
Once you have output.asm, you can turn it into a native Linux executable:
# switch to the build directory
cd build
# assemble into an object file
nasm -felf64 output.asm
# link
gcc -no-pie output.o -o output
# and finally run the native binary
./output
To keep the codebase clean, I use clang-format. You can run it via the build script or CMake:
./build.sh --format
# or just directly run the script
./format.sh
All types of contributions are welcome!