A simple interpreter for a custom scripting language, implemented in C++ using classic compiler design principles.
This project implements a tree-walking interpreter that processes a dynamically-typed scripting language. The interpreter follows a three-stage architecture:
- Lexical Analysis - Tokenizes input into meaningful symbols
- Parsing - Builds an Abstract Syntax Tree (AST) using recursive descent
- Evaluation - Executes the program by traversing the AST
The language supports two dynamic types:
- Integers - Whole numbers (e.g.,
42,-17,0) - Strings - Text literals in double quotes (e.g.,
"hello","world\n")
Variables are dynamically typed and can hold either type.
Arithmetic/String Operators:
+- Addition (integers) or concatenation (strings)-- Subtraction (integers) or substring removal (strings)*- Multiplication (integers) or string repetition (string * int)/- Integer division=- Assignment (right-associative)
Operator Precedence (highest to lowest):
*,/+,-=
print <expr>- Output expression value without newlineprintln <expr>- Output expression value with newlinerepeat <count> begin <statements> end- Loop construct<variable> = <expr>- Variable assignment
All statements must end with a semicolon (;).
Single-line comments begin with //:
// This is a comment
x = 5; // Assign 5 to x
Basic Arithmetic:
x = 10;
y = 20;
println x + y; // Outputs: 30
String Operations:
name = "Alice";
greeting = "Hello, " + name;
println greeting; // Outputs: Hello, Alice
// String repetition
stars = "*" * 5;
println stars; // Outputs: *****
// Substring removal
text = "banana";
result = text - "na";
println result; // Outputs: bana (removes first occurrence)
Loops:
repeat 3 begin
println "Iteration";
end;
Chained Assignments:
a = b = c = 100; // Right-to-left: c=100, b=100, a=100
println a; // Outputs: 100
Complex Example:
// Fibonacci-like sequence
a = 0;
b = 1;
println a;
println b;
repeat 5 begin
temp = a + b;
println temp;
a = b;
b = temp;
end;
Lexer (lex.h, lex.cpp)
- Finite state machine tokenizer
- Converts source code into token stream
- Tracks line numbers for error reporting
- Supports escape sequences in strings (
\n, etc.)
Parser (parse.h, parse.cpp)
- Recursive descent parser with one-token lookahead
- Builds Abstract Syntax Tree (AST)
- Enforces grammar rules and operator precedence
- Continues parsing after errors to report multiple issues
Parse Tree (pt.h)
- Defines AST node types (expressions, statements, literals)
- Implements tree-walking interpreter via
Eval()methods - Each node type knows how to evaluate itself
Value System (val.h, val.cpp)
- Dynamic typing with runtime type checking
- Operator overloading for type-specific behaviors
- Type coercion rules for mixed operations
Main Driver (main.cpp)
- Command-line argument processing
- Parse tree statistics reporting
- Execution orchestration
Prog → Sl
Sl → Stmt ';' Sl | Stmt ';'
Stmt → PrintStmt | PrintlnStmt | RepeatStmt | Expr
PrintStmt → 'print' Expr
PrintlnStmt→ 'println' Expr
RepeatStmt → 'repeat' Expr 'begin' Stmt+ 'end'
Expr → Sum ('=' (Sum | Expr))*
Sum → Prod (('+' | '-') Prod)*
Prod → Primary (('*' | '/') Primary)*
Primary → '(' Expr ')' | IDENT | ICONST | SCONST
- C++11 compatible compiler (g++)
- Standard C++ library (no external dependencies)
makeThis produces an executable named program.
To clean build artifacts:
make clean./programType your program, then press Ctrl+D (EOF) to execute.
./program myprogram.txt-v- Verbose mode (TODO)<filename>- Read program from file
After execution, the interpreter prints statistics:
PLUS COUNT: <number of + operations>
EQ COUNT: <number of = operations>
MAX DEPTH: <maximum AST depth>
Parse Errors:
- Reported with line numbers
- Parser continues to find additional errors
- Program does not execute if parse errors exist
Runtime Errors:
- Type mismatches (e.g., adding string to integer)
- Division by zero
- Undefined variable access
- Invalid repeat counts (non-integer or negative)
Example error output:
RUNTIME ERROR: Repeat statement count must be an integer
- No conditionals - No
if/elsestatements - No functions - Cannot define subroutines
- No input - No
readorinputstatements - Global scope only - All variables are global
- Integer math only - No floating-point numbers
- No arrays - Only scalar values
- No comparison operators - No
<,>,==, etc.
Operations check types at runtime:
int + int→ integer additionstring + string→ concatenationint * int→ multiplicationstring * intorint * string→ repetitionint - int→ subtractionstring - string→ remove first occurrence of substring
Mixed-type operations (e.g., int + string) produce runtime errors.
String subtraction removes only the first occurrence of the substring:
text = "banana";
result = text - "na"; // "bana", not "ba"
Assignment is right-associative and returns a value:
x = y = 5; // Equivalent to: y = 5; x = y;
Assignment in parentheses is not allowed:
z = (x = 5); // PARSE ERROR
The body of a repeat statement must contain actual statements (print, assignment, or nested repeat), not bare expressions:
repeat 3 begin
println "ok"; // Valid
end;
repeat 3 begin
42; // INVALID - bare expression
end;
cpp-interpreter/
├── lex.h / lex.cpp # Lexical analyzer
├── parse.h / parse.cpp # Parser
├── pt.h # AST nodes and interpreter
├── val.h / val.cpp # Value type system
├── main.cpp # Driver program
├── io.cpp # I/O utilities
├── Makefile # Build configuration
└── README.md
Educational/academic project.