My implementation of second part of Lox programming language from Crafting interpreters in C++. This implementation uses a bytecode VM to execute the programs.
Install meson, ninja and a C++ 20 capable compiler
$ meson setup builddir
$ cd builddir
$ ninja -j8
If you are on linux, run
$ ./src/bin/cpplox
On windows, run
.\src\bin\cpplox.exe
After building, three binaries are produced
./builddir/src/bin/cpplox - Includes a Compiler, VM and a REPL
./builddir/src/bin/loxvm - Standalone VM, does not include compiler, used for running compiled lox programs
./builddir/src/bin/loxdump - For disassembling compiled lox programs
./builddir/src/bin/loxcrc - For updating the CRC checksum of a compiled lox program (operates on the file inplace)
To run tests, you need to have Python 3
First install pytest, a test automation library used to test programs.
Create a virtual environment
$ python -m venv .venv
Install dependencies
$ pip install -r requirements.txt
Run tests
$ meson setup builddir -Denable-tests=true
$ cd builddir
$ ninja -j8
$ meson test -v
$ cd builddir
$ meson test --benchmark --interactive
Have Clang installed along with asan. You also need to follow the steps in the above section, pytest must be installed and available on the PATH
Then run the following command,
$ ./development.sh
$ cpplox --help
Lox compiler and interpreter in C++
Usage:
cpplox [OPTION...] positional parameters
--print-tokens Show all tokens generated by the lexer
along with line position
--compile-only Compiles the source code to bytecode but
does not execute it
--dump-bytecode Dumps the bytecode generated by the
compiler before execution
--trace-execution Used for debugging: Prints the bytecode
instruction before executing it
--trace-eval-stack Used for debugginig: Prints the internal
evaluation stack before the instruction is
executed
--enable-step-mode Used for debugging: Enables step mode and
waits for input before an instruction is
executed
--eval-stack-size arg Maximum evaluation stack size allowed
(default: 1024)
--gc-initial-collection-threshold arg
Number of bytes after which GC will collect
garbage for the first time (default:
1048576)
--log-gc Logs events that occur during a GC
collection
--display-mem-stats Display memory stats, such as number of
objects allocated, bytes allocated, etc on
program exit
--stress-gc Stresses the GC by triggering a collection
everytime a new object is created
-c, --command arg Execute the given command and exit
-g, --emit-debug-info Include debug information in compiled
bytecode (line number mappings). Can only
be used with -o/--output flag
-o, --output arg Compiles the script, generates the bytecode
and store it in the specified file. Does
not execute the program
-h, --help Prints this help message
-v, --version Prints program version
Check out the examples in the examples/ directory.
For a complex example see, a math evaluation program using Shunting-Yard algorithm here
Supports storing the compiled bytecode in a file, then loading and executing it. See bytecode file format to learn more about how the compiled program is stored on disk.
To generate a bytecode file, use the -o/--output flag, and optionally the -g/--emit-debug-info flag to emit the line-bytecode offset mapping.
Example:
$ cpp-lox program.lox -o compiled.loxc
or
$ cpp-lox program.lox -o compiled.loxc -g
If you also want to include bytecode line positions.
Execute it with,
$ cpp-lox compiled.loxc
Use src/bin/loxdump to disassemble a compiled lox program
$ ./src/bin/loxdump compiled.loxc
Implement list data structure (Note: It's implemented as a dynamic array (vector) rather than a linked list (like CPython))
Example:
var x = [1, 2, 3]; // Create a list
println(x[0]); // Get element at index, prints 1
println(x); // Print the list
println(x[0] = 8); // Prints 8, sets index 0 to 8
Implementation: Two new opcodes LIST and LIST_APPEND have been added. LIST takes a single byte operand, the intial size of the list (n), and it reads & pops the top n values on stack and creates a new list object. If n >= 10, the compiler will emit a LIST list instruction to build the list with 10 values, and for the remaining values, it adds a LIST_APPEND after emitting the bytecode for the expression. This is done so that the stack does not overflow when a list is declared with a lot of elements.
Note: Allocating huge lists eg (list(10000000, 0)) may cause the program to pause because the GC has to trace through all those values even if there are no object references. In the future, try to optimize this
Another two instructions: LOAD_INDEX and STORE_INDEX to access and modify list members
For STORE_INDEX: [container, index, value] -> [ ] after execution
For LOAD_INDEX: [container, index] -> [value]
Native functions to work with lists
len(list) int: Returns the length of the list, works the same with stringscap(list) int: Returns the capacity of the backing array of the listlist(len, default, cap) list: Creates a new list with the specified length, capacity, and default value. All three arguments are optional. len is the initial length of the list, cap is the capacity of the list (after which a new reallocation will be triggered), and default is the default value when creating the list (default is nil).append(list, element): Adds an element to the end of a listdelete(list, index): Removes an element at specified index from listclear(list): Clears all values of the listpop(list) value: Removes and returns the last element from a list
Indexing works with strings too
The result of indexing a string results in another string (there is no character datatype)
Example:
var x = "hello";
println(x[1]); // prints e
x[2] = "x"; // Not allowed, throws error since strings are immutable
This implementation also supports associative data structure, map. Internally it is implemented using hash maps (the same implementation is used for methods and fields)
Declaring a new map,
// Keys & values can be any expression
var x = {
"name": "hello",
3: "three",
3: "three",
};
// Empty map
var x = {};
// Can also be declared as
var x = map();
// Printing type
println(type(x)) // map
Accessing & Modifying values in a map
var x = {
"three": 3,
"four": 4
};
println(x["three"]); // Prints 3
x["three"] = -3; // Updates the value of key "three" to -3
x["two"] = 2; // If a key does not exist, it gets created
println(x["five"]); // If a key does not exist, throws runtime error
Native functions to work with maps
len(map) int: Returns the number of key-value pairs in the mapmap() map: Creates a new mapappend(map, key, value): Adds the key-value pair to the map (same as using the[]operator)delete(map, key) bool: Deletes the key from the map, returns true if the key was removed, false if the key does not existkeys(map) list: Returns a list of keys of the mapvalues(map) list: Returns a list of values of the maphas(map, key) bool: Returns true if the key exists in the mapclear(map): Clears all key-value pairs of the mapget(map, key, default): Returns the value associated with the key in the map, if the key does not exist, returns the set default value.defaultis optional, and if not specified, returns nil
Note: Currently there is no way to iterate over all elements of the map using a loop (no range for loop yet), so use the keys function to retrieve all the keys, then iterate over this list using a loop.
Note: Also objects can be used as map keys, but there is no way to specify custom hash or equality operators, so they will be checked for equality based on whether they are the same object. (same memory location)
Lists & Maps are unhashable, and cannot be used as map keys (similar behavior to CPython)
The compiler de-duplicates integers (but not real numbers)
Supports 65k constants, and 65k global variables (accessed with 2 byte unsigned index)
const variables
In global scope,
If a constant of the same name has already been declared, do not allow
redeclaration with var Example:
>> const x = 32;
>> var x = 50; // Not allowed
>> const x = 80;
>> const x = 90; // Not Allowed
>> var x = 95;
>> var x = 100; // Allowed
One difference is that redeclaration with const is not allowed since it helps make the vm simpler, otherwise an extra flag/byte needs to be maintained to identify if it's a const declaration or a var declaration.
Supports C-style switch statements with cases and default:
switch (value) {
case 1:
print "one";
break;
case 2:
var foo = "hello";
print foo;
break;
default:
print "other";
}
- Both the selector and the case value can be expressions
- Each arm of the switch statement defines a local scope, so variables can be declared within it
- There is no break statement, there is no fallthrough
- Cases cannot appear after the default statement
In local scope redeclaration of any form is disallowed
-
sqrt(value number) double- Returns the square root of the number -
exit(status_code int)- Exits the interpreter with the status code -
input() string- Takes a line of input fromstdin, does not include terminating\n -
print(args...)- Prints all the arguments, separated by a space -
println(args...)- Prints all the arguments, separated by a space, and adds a newline at the end -
len(str string) int- Returns the length of the string / list -
to_string(value) string- Converts the given value into a string -
to_int(value) int- Converts the given value into an int -
to_double(value) double- Converts the given value into a double -
type(value) string- Returns the type of the value as string -
rand() double- Returns a random double between [0, 1) (inclusive of 0, exclusive of 1) -
randint(m int, n int) int- Returns a random integer between m & n (inclusive at both ends) -
assert(value bool, message string)- Raises a runtime error & prints the message if value is false. Does nothing if value is true -
sys__mem_get_bytes_allocated() int- Returns the total number of bytes allocated by the garbage collector -
sys__mem_get_bytes_freed() int- Returns the total number of bytes freed by the garbage collector -
sys__mem_get_next_gc() int- Returns the threshold in bytes at which the next garbage collection will trigger -
sys__mem_get_objects_created() int- Returns the total number of objects created by the garbage collector -
sys__mem_get_objects_freed() int- Returns the total number of objects freed by the garbage collector -
sys__mem_get_live_objects() int- Returns the current number of live objects managed by the garbage collector -
sys__mem_get_net_bytes() int- Returns the net bytes currently allocated (allocated - freed) -
sys__mem_display_gc_stats()- Prints detailed garbage collection statistics to the console -
hash(value)- Returns the hash value (int64_t of a value), throws error if the type is not hashable -
is_hashable(value) bool- Returns true if the value is hashable, false otherwise (like lists, maps)
print statement is renamed to echo. This was done to maintain compatability with tests without major refactoring.
Do not use this statement unless needed, and instead use the print() and println() functions.
Note: The GC implementation in my project does not track all memory and it will use a bit of extra memory. This is because some parts of the application uses std::vector and std::string that are not managed by the garbage collector. In the book, all memory is allocated through reallocate() hence the GC has complete control over the memory. But in my implementation, all vectors & strings are wrapped in VM objects, so once they are freed, the associated vectors & strings get freed too.
Native functions for manipulating and accessing properties (fields)
has_property(instance, string) bool- Returns true if the property exists on the instance (similar tohasattrof python)get_property(instance, string) value- Returns the value of the property on the instance (similar togetattrof python)set_property(instance, string, value)- Sets the property on the instance to the given value (similar tosetattrof python)del_property(instance, string) bool- Deletes a property on an instance, returns true if the property was deleted, false if the property did not not exist
This operator is similar to the one in JS and can be used to access properties of nested objects easily (when some intermediate object is nil / not an instance)
Example:
var result = x?.y?.z?.w;
If any of x, y, or z is nil, or the property does not exist, or the property is not an instance, the result is nil
If the base (x) is not defined, the expression results in an error
class Foo {}
var f = Foo();
f.x = print;
println(f.x?.prop); // nil (becasue x is a native function)
For inheritance, instead of < operator, I have decided to use : in my language (similar to C++)
Example:
class Foo {
}
class Bar : Foo {
}
Three new instructions, ZERO, ONE, MINUS_ONE, that push 0, 1, and -1 respectively on to the stack. This was added as an optimization, so that the value need not be loaded from the constant table whenever these values are present.
- Fix division by zero error
- Implement break statements
- Implement slices
- Implement
clear()method for lists - Implement
copy()method for lists & maps (shallow copy, copies only the top level values) - Range for loops (to iterate over maps/lists)
- Support custom hash, comparision operators between objects
- Fix opts.debug_trace_execution (L70 in vm)
- flush native method (when using print)
- implement a chr and ord function (to convert to and from ascii values)