cpp-lox

My implementation of second part of Lox programming language from Crafting interpreters in C++. This implementation uses a bytecode VM to execute the programs.

How to run?

Install meson, ninja and a C++ 20 capable compiler

$ meson setup builddir
$ cd builddir
$ ninja -j8

If you are on linux, run

$ ./src/bin/cpplox

On windows, run

.\src\bin\cpplox.exe

After building, three binaries are produced

./builddir/src/bin/cpplox - Includes a Compiler, VM and a REPL

./builddir/src/bin/loxvm - Standalone VM, does not include compiler, used for running compiled lox programs

./builddir/src/bin/loxdump - For disassembling compiled lox programs

./builddir/src/bin/loxcrc - For updating the CRC checksum of a compiled lox program (operates on the file inplace)

To run tests

To run tests, you need to have Python 3

First install pytest, a test automation library used to test programs.

Create a virtual environment

$ python -m venv .venv

Install dependencies

$ pip install -r requirements.txt

Run tests

$ meson setup builddir -Denable-tests=true
$ cd builddir
$ ninja -j8
$ meson test -v

To run benchmarks

$ cd builddir
$ meson test --benchmark --interactive

Development build

Have Clang installed along with asan. You also need to follow the steps in the above section, pytest must be installed and available on the PATH

Then run the following command,

$ ./development.sh

Help

$ cpplox --help
Lox compiler and interpreter in C++
Usage:
  cpplox [OPTION...] positional parameters

      --print-tokens            Show all tokens generated by the lexer 
                                along with line position
      --compile-only            Compiles the source code to bytecode but 
                                does not execute it
      --dump-bytecode           Dumps the bytecode generated by the 
                                compiler before execution
      --trace-execution         Used for debugging: Prints the bytecode 
                                instruction before executing it
      --trace-eval-stack        Used for debugginig: Prints the internal 
                                evaluation stack before the instruction is 
                                executed
      --enable-step-mode        Used for debugging: Enables step mode and 
                                waits for input before an instruction is 
                                executed
      --eval-stack-size arg     Maximum evaluation stack size allowed 
                                (default: 1024)
      --gc-initial-collection-threshold arg
                                Number of bytes after which GC will collect 
                                garbage for the first time (default: 
                                1048576)
      --log-gc                  Logs events that occur during a GC 
                                collection
      --display-mem-stats       Display memory stats, such as number of 
                                objects allocated, bytes allocated, etc on 
                                program exit
      --stress-gc               Stresses the GC by triggering a collection 
                                everytime a new object is created
  -c, --command arg             Execute the given command and exit
  -g, --emit-debug-info         Include debug information in compiled 
                                bytecode (line number mappings). Can only 
                                be used with -o/--output flag
  -o, --output arg              Compiles the script, generates the bytecode 
                                and store it in the specified file. Does 
                                not execute the program
  -h, --help                    Prints this help message
  -v, --version                 Prints program version

Examples

Check out the examples in the examples/ directory.

For a complex example see, a math evaluation program using Shunting-Yard algorithm here

Notable Changes from the book & Usage

Serialization of bytecode to file and execution of it

Supports storing the compiled bytecode in a file, then loading and executing it. See bytecode file format to learn more about how the compiled program is stored on disk.

To generate a bytecode file, use the -o/--output flag, and optionally the -g/--emit-debug-info flag to emit the line-bytecode offset mapping.

Example:

$ cpp-lox program.lox -o compiled.loxc

or

$ cpp-lox program.lox -o compiled.loxc -g

If you also want to include bytecode line positions.

Execute it with,

$ cpp-lox compiled.loxc

Use src/bin/loxdump to disassemble a compiled lox program

$ ./src/bin/loxdump compiled.loxc

Lists (dynamic arrays)

Implement list data structure (Note: It's implemented as a dynamic array (vector) rather than a linked list (like CPython))

Example:

var x = [1, 2, 3]; // Create a list
println(x[0]); // Get element at index, prints 1
println(x); // Print the list
println(x[0] = 8); // Prints 8, sets index 0 to 8

Implementation: Two new opcodes LIST and LIST_APPEND have been added. LIST takes a single byte operand, the intial size of the list (n), and it reads & pops the top n values on stack and creates a new list object. If n >= 10, the compiler will emit a LIST list instruction to build the list with 10 values, and for the remaining values, it adds a LIST_APPEND after emitting the bytecode for the expression. This is done so that the stack does not overflow when a list is declared with a lot of elements.

Note: Allocating huge lists eg (list(10000000, 0)) may cause the program to pause because the GC has to trace through all those values even if there are no object references. In the future, try to optimize this

Another two instructions: LOAD_INDEX and STORE_INDEX to access and modify list members

For STORE_INDEX: [container, index, value] -> [ ] after execution For LOAD_INDEX: [container, index] -> [value]

Native functions to work with lists

len(list) int: Returns the length of the list, works the same with strings
cap(list) int: Returns the capacity of the backing array of the list
list(len, default, cap) list: Creates a new list with the specified length, capacity, and default value. All three arguments are optional. len is the initial length of the list, cap is the capacity of the list (after which a new reallocation will be triggered), and default is the default value when creating the list (default is nil).
append(list, element): Adds an element to the end of a list
delete(list, index): Removes an element at specified index from list
clear(list): Clears all values of the list
pop(list) value: Removes and returns the last element from a list

String indexing

Indexing works with strings too

The result of indexing a string results in another string (there is no character datatype)

Example:

var x = "hello";
println(x[1]); // prints e

x[2] = "x"; // Not allowed, throws error since strings are immutable

Maps (Hashmap)

This implementation also supports associative data structure, map. Internally it is implemented using hash maps (the same implementation is used for methods and fields)

Declaring a new map,

// Keys & values can be any expression
var x = {
    "name": "hello",
    3: "three",
    3: "three",
};

// Empty map
var x = {};

// Can also be declared as
var x = map();

// Printing type
println(type(x)) // map

Accessing & Modifying values in a map

var x  = {
    "three": 3,
    "four": 4
};
println(x["three"]); // Prints 3
x["three"] = -3; // Updates the value of key "three" to -3
x["two"] = 2; // If a key does not exist, it gets created
println(x["five"]); // If a key does not exist, throws runtime error

Native functions to work with maps

len(map) int: Returns the number of key-value pairs in the map
map() map: Creates a new map
append(map, key, value): Adds the key-value pair to the map (same as using the [] operator)
delete(map, key) bool: Deletes the key from the map, returns true if the key was removed, false if the key does not exist
keys(map) list: Returns a list of keys of the map
values(map) list: Returns a list of values of the map
has(map, key) bool: Returns true if the key exists in the map
clear(map): Clears all key-value pairs of the map
get(map, key, default): Returns the value associated with the key in the map, if the key does not exist, returns the set default value. default is optional, and if not specified, returns nil

Note: Currently there is no way to iterate over all elements of the map using a loop (no range for loop yet), so use the keys function to retrieve all the keys, then iterate over this list using a loop.

Note: Also objects can be used as map keys, but there is no way to specify custom hash or equality operators, so they will be checked for equality based on whether they are the same object. (same memory location)

Lists & Maps are unhashable, and cannot be used as map keys (similar behavior to CPython)

De-duplication of integers

The compiler de-duplicates integers (but not real numbers)

More constants

Supports 65k constants, and 65k global variables (accessed with 2 byte unsigned index)

`const` variables

const variables

In global scope,

If a constant of the same name has already been declared, do not allow
redeclaration with var Example:
>> const x = 32;
>> var x = 50; // Not allowed

>> const x = 80;
>> const x = 90; // Not Allowed

>> var x = 95;
>> var x = 100; // Allowed

One difference is that redeclaration with const is not allowed since it helps make the vm simpler, otherwise an extra flag/byte needs to be maintained to identify if it's a const declaration or a var declaration.

`switch` statement

Supports C-style switch statements with cases and default:

switch (value) {
    case 1:
        print "one";
        break;
    case 2:
        var foo = "hello";
        print foo;
        break;
    default:
        print "other";
}

Both the selector and the case value can be expressions
Each arm of the switch statement defines a local scope, so variables can be declared within it
There is no break statement, there is no fallthrough
Cases cannot appear after the default statement

Local scope redeclaration

In local scope redeclaration of any form is disallowed

List of native functions

sqrt(value number) double - Returns the square root of the number
exit(status_code int) - Exits the interpreter with the status code
input() string - Takes a line of input from stdin, does not include terminating \n
print(args...) - Prints all the arguments, separated by a space
println(args...) - Prints all the arguments, separated by a space, and adds a newline at the end
len(str string) int - Returns the length of the string / list
to_string(value) string - Converts the given value into a string
to_int(value) int - Converts the given value into an int
to_double(value) double - Converts the given value into a double
type(value) string - Returns the type of the value as string
rand() double - Returns a random double between [0, 1) (inclusive of 0, exclusive of 1)
randint(m int, n int) int - Returns a random integer between m & n (inclusive at both ends)
assert(value bool, message string) - Raises a runtime error & prints the message if value is false. Does nothing if value is true
sys__mem_get_bytes_allocated() int - Returns the total number of bytes allocated by the garbage collector
sys__mem_get_bytes_freed() int - Returns the total number of bytes freed by the garbage collector
sys__mem_get_next_gc() int - Returns the threshold in bytes at which the next garbage collection will trigger
sys__mem_get_objects_created() int - Returns the total number of objects created by the garbage collector
sys__mem_get_objects_freed() int - Returns the total number of objects freed by the garbage collector
sys__mem_get_live_objects() int - Returns the current number of live objects managed by the garbage collector
sys__mem_get_net_bytes() int - Returns the net bytes currently allocated (allocated - freed)
sys__mem_display_gc_stats() - Prints detailed garbage collection statistics to the console
hash(value) - Returns the hash value (int64_t of a value), throws error if the type is not hashable
is_hashable(value) bool - Returns true if the value is hashable, false otherwise (like lists, maps)

No print statements

print statement is renamed to echo. This was done to maintain compatability with tests without major refactoring. Do not use this statement unless needed, and instead use the print() and println() functions.

Note: The GC implementation in my project does not track all memory and it will use a bit of extra memory. This is because some parts of the application uses std::vector and std::string that are not managed by the garbage collector. In the book, all memory is allocated through reallocate() hence the GC has complete control over the memory. But in my implementation, all vectors & strings are wrapped in VM objects, so once they are freed, the associated vectors & strings get freed too.

Native functions for accessing properties

Native functions for manipulating and accessing properties (fields)

has_property(instance, string) bool - Returns true if the property exists on the instance (similar to hasattr of python)
get_property(instance, string) value - Returns the value of the property on the instance (similar to getattr of python)
set_property(instance, string, value) - Sets the property on the instance to the given value (similar to setattr of python)
del_property(instance, string) bool - Deletes a property on an instance, returns true if the property was deleted, false if the property did not not exist

Optional chaining operator `?.`

This operator is similar to the one in JS and can be used to access properties of nested objects easily (when some intermediate object is nil / not an instance)

Example:

var result = x?.y?.z?.w;

If any of x, y, or z is nil, or the property does not exist, or the property is not an instance, the result is nil

If the base (x) is not defined, the expression results in an error

class Foo {}
var f = Foo();
f.x = print;
println(f.x?.prop); // nil (becasue x is a native function)

`:` instead of `<` for inheritance

For inheritance, instead of < operator, I have decided to use : in my language (similar to C++)

Example:

class Foo {
}
class Bar : Foo {
}

Additional OPCODES for optimization of constants

Three new instructions, ZERO, ONE, MINUS_ONE, that push 0, 1, and -1 respectively on to the stack. This was added as an optimization, so that the value need not be loaded from the constant table whenever these values are present.

Documents

Opcodes list

Bytecode file format

Name		Name	Last commit message	Last commit date
Latest commit History 165 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
documents		documents
examples		examples
flags		flags
include		include
src		src
subprojects		subprojects
tests		tests
thirdparty		thirdparty
.clang-format		.clang-format
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
development.sh		development.sh
meson.build		meson.build
meson_options.txt		meson_options.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

cpp-lox

How to run?

To run tests

To run benchmarks

Development build

Help

Examples

Notable Changes from the book & Usage

Serialization of bytecode to file and execution of it

Lists (dynamic arrays)

String indexing

Maps (Hashmap)

De-duplication of integers

More constants

`const` variables

`switch` statement

Local scope redeclaration

List of native functions

No print statements

Native functions for accessing properties

Optional chaining operator `?.`

`:` instead of `<` for inheritance

Additional OPCODES for optimization of constants

Documents

TODO

About

Uh oh!

Releases 2

Uh oh!

Languages

License

ananthvk/cpp-lox

Folders and files

Latest commit

History

Repository files navigation

cpp-lox

How to run?

To run tests

To run benchmarks

Development build

Help

Examples

Notable Changes from the book & Usage

Serialization of bytecode to file and execution of it

Lists (dynamic arrays)

String indexing

Maps (Hashmap)

De-duplication of integers

More constants

const variables

switch statement

Local scope redeclaration

List of native functions

No print statements

Native functions for accessing properties

Optional chaining operator ?.

: instead of < for inheritance

Additional OPCODES for optimization of constants

Documents

TODO

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Uh oh!

Languages

`const` variables

`switch` statement

Optional chaining operator `?.`

`:` instead of `<` for inheritance