A statically typed DSL for columnar DataFrame manipulation, transpiling to C++23.
extern fn read_csv(path: String) -> DataFrame from "csv.hpp";
let prices = read_csv("prices.csv");
// Filter, then group-by aggregation
let ohlc = prices[
filter price > 1.0,
select { open = first(price), high = max(price), low = min(price), close = last(price) },
by symbol,
];
// Add derived columns (all existing columns are preserved)
let annotated = prices[update { price_k = price / 1000.0 }];
// Join two tables
let enriched = prices join ohlc on symbol;
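To make the filter-then-aggregate query above concrete, here is a minimal C++ sketch of the same computation over two plain column vectors. It is not the code Ibex generates: `Prices`, `Ohlc`, and `ohlc_by_symbol` are hypothetical names, and the use of `std::map` (which sorts groups by key) is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical columnar table: one vector per column, all of equal length.
struct Prices {
    std::vector<std::string> symbol;
    std::vector<double> price;
};

// One aggregated output row per symbol.
struct Ohlc {
    double open, high, low, close;
};

// Rough equivalent of:
//   prices[filter price > min_price,
//          select { open = first(price), high = max(price),
//                   low = min(price), close = last(price) },
//          by symbol]
inline std::map<std::string, Ohlc> ohlc_by_symbol(const Prices& t, double min_price) {
    assert(t.symbol.size() == t.price.size());
    std::map<std::string, Ohlc> groups;
    for (std::size_t i = 0; i < t.price.size(); ++i) {
        const double p = t.price[i];
        if (!(p > min_price)) continue;  // filter runs before grouping
        auto it = groups.find(t.symbol[i]);
        if (it == groups.end()) {
            // First surviving row of this group sets all four fields.
            groups.emplace(t.symbol[i], Ohlc{p, p, p, p});
        } else {
            Ohlc& g = it->second;
            if (p > g.high) g.high = p;
            if (p < g.low) g.low = p;
            g.close = p;  // last row seen so far
        }
    }
    return groups;
}
```

Calling `ohlc_by_symbol(table, 1.0)` returns one `Ohlc` per symbol, computed only from rows that pass the filter.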
Aggregation benchmarks on 4 M rows (prices.csv, 252 symbols).
Release build (-O2), 5 iterations, 1 warmup, WSL2 / clang++.
| query | ibex | ibex+parse | polars | pandas | data.table | dplyr |
|---|---|---|---|---|---|---|
| mean by symbol | 46.6 ms | 44.9 ms | 34.9 ms | 180.2 ms | 24.0 ms | 51.6 ms |
| OHLC by symbol | 51.2 ms | 50.5 ms | 28.7 ms | 195.7 ms | 27.4 ms | 50.6 ms |
| update price×2 | 3.54 ms | 3.21 ms | 2.77 ms | 5.42 ms | 40.0 ms | 5.40 ms |
| count by symbol×day | 137.8 ms | — | 51.8 ms | 314.5 ms | 24.6 ms | 103.4 ms |
| mean by symbol×day | 138.3 ms | — | 53.5 ms | 318.3 ms | 23.2 ms | 116.0 ms |
| OHLC by symbol×day | 154.5 ms | — | 57.5 ms | 338.0 ms | 27.4 ms | 137.2 ms |
ibex vs. others (geometric mean): 3.1× faster than pandas, on par with dplyr, 2.3× slower than polars, 3.5× slower than data.table.
ibex+parse includes text parsing and IR lowering; the overhead is negligible: in the three rows that report it, ibex+parse is within about 10% of the plain ibex timing.
See benchmarking/ for methodology and reproduction instructions.
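For readers who want a feel for the "1 warmup, 5 iterations" setup before opening benchmarking/, the following is a small timing-loop sketch. It is an illustration under stated assumptions, not the project's harness: `time_query` is a hypothetical helper, and how the per-iteration timings are summarized is deliberately left out because the text above does not say.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs `query` `warmup` times without timing it, then `iterations` times
// with timing. Returns the wall-clock duration of each timed run in ms.
inline std::vector<double> time_query(const std::function<void()>& query,
                                      int warmup, int iterations) {
    assert(warmup >= 0 && iterations > 0);
    for (int i = 0; i < warmup; ++i) query();  // untimed warmup runs

    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        query();
        const auto stop = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}
```

`std::chrono::steady_clock` is used because it is monotonic, so a timed run cannot come out negative if the system clock is adjusted.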
extern fn read_csv(path: String) -> DataFrame from "csv.hpp";
let iris = read_csv("data/iris.csv");
// Filter rows, select columns
iris[filter `Sepal.Length` > 5.0, select { Species, `Sepal.Length` }];
// Mean sepal length per species
iris[select { mean_sl = mean(`Sepal.Length`) }, by Species];
// Add derived columns — all existing columns are preserved
iris[update { sl_doubled = `Sepal.Length` * 2.0 }];
// Unique species values
iris[distinct `Species`];
// Unique (Species, Sepal.Length) pairs
iris[distinct { `Species`, `Sepal.Length` }];
// Order by a single key (ascending by default)
iris[order `Species`];
// Order by multiple keys with explicit directions
iris[order { `Species` asc, `Sepal.Length` desc }];
// Order by all columns (schema order)
iris[order];
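The multi-key ordering above (`Species` ascending, then `Sepal.Length` descending) can be pictured as sorting a permutation of row indices rather than the columns themselves. The sketch below assumes that approach; `order_species_asc_length_desc` is a hypothetical name, and nothing here is taken from the Ibex implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Returns row indices ordered by species ascending, then sepal_length
// descending. The input columns are not modified; a caller would gather
// every column through the returned permutation.
inline std::vector<std::size_t> order_species_asc_length_desc(
    const std::vector<std::string>& species,
    const std::vector<double>& sepal_length) {
    assert(species.size() == sepal_length.size());
    std::vector<std::size_t> idx(species.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
    // stable_sort keeps the original row order for fully equal keys.
    std::stable_sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
        if (species[a] != species[b]) return species[a] < species[b];  // asc
        return sepal_length[a] > sepal_length[b];                      // desc
    });
    return idx;
}
```

Sorting indices keeps the comparison logic in one place and lets every column be reordered with the same permutation.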
let total = scalar(prices[select { total = sum(price) }], total);
let enriched = prices join ohlc on symbol;
let with_meta = prices left join metadata on symbol;
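The difference between `join` and `left join` can be sketched as a hash join over the key column that either drops or keeps unmatched left rows. This is an assumed strategy for illustration only; `hash_join` and `RowPair` are hypothetical names, and how Ibex treats duplicate keys is not stated in this README.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// (left row, right row). The right index is empty when a left join
// finds no matching key on the right side.
using RowPair = std::pair<std::size_t, std::optional<std::size_t>>;

// Hash join on a string key column.
//   keep_unmatched_left == false: inner-join behaviour
//   keep_unmatched_left == true:  left-join behaviour
inline std::vector<RowPair> hash_join(const std::vector<std::string>& left_key,
                                      const std::vector<std::string>& right_key,
                                      bool keep_unmatched_left) {
    // Build phase: key -> every right-side row carrying that key.
    std::unordered_map<std::string, std::vector<std::size_t>> index;
    for (std::size_t r = 0; r < right_key.size(); ++r) {
        index[right_key[r]].push_back(r);
    }

    // Probe phase: walk the left side in order and look up each key.
    std::vector<RowPair> out;
    for (std::size_t l = 0; l < left_key.size(); ++l) {
        auto it = index.find(left_key[l]);
        if (it != index.end()) {
            for (std::size_t r : it->second) out.emplace_back(l, r);
        } else if (keep_unmatched_left) {
            out.emplace_back(l, std::nullopt);
        }
    }
    return out;
}
```

The result is a list of row-index pairs; materializing the joined table would then gather columns from both sides through those indices.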
ibex/
├── include/ibex/ Public headers
│ ├── core/ Column<T>, DataFrame<Schema>
│ ├── ir/ Typed IR nodes (Scan, Filter, Project, Aggregate)
│ ├── parser/ Lexer, recursive-descent parser
│ ├── runtime/ Extern function registry, execution engine
│ └── repl/ Interactive REPL session
├── src/ Implementation files (mirrors include/)
├── libraries/ Bundled plugin sources (csv.hpp, csv.cpp → csv.so)
├── scripts/ Helper shell scripts (build, run, plugin-build)
├── tests/ Catch2 unit tests
├── tools/ CLI binaries (REPL, compiler, benchmark)
├── examples/ Usage examples
└── cmake/ Build system modules
| Module | Responsibility | Dependencies |
|---|---|---|
| core | Columnar storage (Column<T>, DataFrame) | None |
| ir | Typed intermediate representation nodes | core |
| parser | Source text → IR tree | ir |
| runtime | Extern function registry, execution | core |
| repl | Interactive read-eval-print loop | parser, runtime |
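As a rough picture of what "typed IR nodes" between the parser and the runtime might look like, the sketch below models the four node kinds named in the layout (Scan, Filter, Project, Aggregate) as a `std::variant` tree. The field names, the string-typed predicates, and the `make_node` and `describe` helpers are all assumptions for illustration; the real headers under include/ibex/ir/ may differ substantially.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<const Node>;

// Hypothetical node payloads; expressions are plain strings to stay short.
struct Scan      { std::string table; };
struct Filter    { NodePtr input; std::string predicate; };
struct Project   { NodePtr input; std::vector<std::string> columns; };
struct Aggregate { NodePtr input; std::vector<std::string> keys; std::string expr; };

using Op = std::variant<Scan, Filter, Project, Aggregate>;
struct Node { Op op; };

inline NodePtr make_node(Op op) {
    return std::make_shared<Node>(Node{std::move(op)});
}

// Helper for building one visitor out of several lambdas.
template <class... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

inline std::string join_names(const std::vector<std::string>& names) {
    std::string out;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i > 0) out += ", ";
        out += names[i];
    }
    return out;
}

// Renders a plan tree as nested text, innermost node being the scan.
inline std::string describe(const Node& n) {
    return std::visit(Overloaded{
        [](const Scan& s) { return "Scan(" + s.table + ")"; },
        [](const Filter& f) {
            return "Filter[" + f.predicate + "](" + describe(*f.input) + ")";
        },
        [](const Project& p) {
            return "Project[" + join_names(p.columns) + "](" + describe(*p.input) + ")";
        },
        [](const Aggregate& a) {
            return "Aggregate[" + a.expr + " by " + join_names(a.keys) + "](" +
                   describe(*a.input) + ")";
        }}, n.op);
}
```

Because every node kind is an alternative of one variant, a pass such as `describe` must handle all four cases or it fails to compile, which is one way a typed IR keeps the parser and the executor decoupled.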
- Static typing: Schema-level type safety for columns and DataFrames
- Relational IR: Clean separation between parsing and execution via a typed IR layer
- C++ interop: Register external C++ functions for use within Ibex queries
- Zero-copy where possible: std::span-based access to columnar data
- Modern C++23: Concepts, std::expected, std::variant, RAII, no raw new/delete
Requirements: Clang 17+, CMake 3.26+, Ninja (recommended).
# Debug (with sanitizers)
cmake -B build -G Ninja \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Debug \
-DIBEX_ENABLE_SANITIZERS=ON
cmake --build build
ctest --test-dir build --output-on-failure
# Release
cmake -B build-release -G Ninja \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release
cmake --build build-release

| Option | Default | Description |
|---|---|---|
| IBEX_WARNINGS_AS_ERRORS | OFF | Treat compiler warnings as errors |
| IBEX_ENABLE_LTO | OFF | Link-time optimization (Release) |
| IBEX_ENABLE_SANITIZERS | OFF | ASan + UBSan (Debug only) |
| IBEX_BUILD_TESTS | ON | Build Catch2 test suite |
| IBEX_BUILD_TOOLS | ON | Build REPL binary |
| IBEX_BUILD_EXAMPLES | ON | Build example programs |
# With the bundled CSV plugin
IBEX_LIBRARY_PATH=./build-release/libraries ./build-release/tools/ibex
# Or pass the plugin directory explicitly
./build-release/tools/ibex --plugin-path ./build-release/libraries
:tables                 List available tables
:scalars                List scalar bindings and values
:schema <table>         Show column names and types
:head <table> [n]       Show first n rows (default 10)
:describe <table> [n]   Schema + first n rows
:load <file>            Load and execute an .ibex script
Ibex data-source functions (e.g. read_csv, read_parquet) are plugins —
shared libraries loaded at runtime when a script declares an extern fn.
When the REPL encounters:
extern fn read_csv(path: String) -> DataFrame from "csv.hpp";
it looks for csv.so in the plugin search path and calls its
ibex_register(ExternRegistry*) entry point to register the function.
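The lookup step described above (header name in the `from` clause, then a matching `.so` somewhere on the search path) can be sketched as two small string helpers. This is a guess at the shape of the logic, not the REPL's code: the function names are hypothetical, and treating the search path as a `:`-separated list is an assumption. Actually loading a candidate and resolving `ibex_register` would typically go through POSIX `dlopen` and `dlsym`, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// "csv.hpp" -> "csv.so". A name without a ".hpp" suffix just gets ".so" appended.
inline std::string plugin_library_name(const std::string& header) {
    const std::string suffix = ".hpp";
    std::string stem = header;
    if (stem.size() >= suffix.size() &&
        stem.compare(stem.size() - suffix.size(), suffix.size(), suffix) == 0) {
        stem.erase(stem.size() - suffix.size());
    }
    return stem + ".so";
}

// Splits "dir1:dir2" into {"dir1", "dir2"}, skipping empty entries.
// ASSUMPTION: ':' as the separator is not confirmed by the README.
inline std::vector<std::string> split_search_path(const std::string& path) {
    std::vector<std::string> dirs;
    std::size_t start = 0;
    while (start <= path.size()) {
        std::size_t end = path.find(':', start);
        if (end == std::string::npos) end = path.size();
        if (end > start) dirs.push_back(path.substr(start, end - start));
        start = end + 1;
    }
    return dirs;
}

// Candidate shared-library paths to try, in search-path order.
inline std::vector<std::string> plugin_candidates(const std::string& header,
                                                  const std::string& search_path) {
    std::vector<std::string> out;
    const std::string lib = plugin_library_name(header);
    for (const std::string& dir : split_search_path(search_path)) {
        out.push_back(dir + "/" + lib);
    }
    return out;
}
```

For example, `plugin_candidates("csv.hpp", "./build-release/libraries")` yields the single candidate `./build-release/libraries/csv.so`.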
- Create a header (my_source.hpp) that implements your function returning ibex::runtime::Table.
- Create a registration file (my_source.cpp):
#include "my_source.hpp"
#include <ibex/runtime/extern_registry.hpp>
extern "C" void ibex_register(ibex::runtime::ExternRegistry* registry) {
registry->register_table("my_source", [](const ibex::runtime::ExternArgs& args) {
// ...
});
}
- Compile it with the helper script:
scripts/ibex-plugin-build.sh my_source.cpp
# Produces: my_source.so next to my_source.cpp
- Use it from Ibex:
extern fn my_source(path: String) -> DataFrame from "my_source.hpp";
let df = my_source("data/file.bin");
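To show how the registration callback and the later call fit together, here is a self-contained mock of the registry pattern used in the steps above. Everything in the `sketch` namespace is a stand-in: the real `ibex::runtime::ExternRegistry`, `ExternArgs`, and `Table` types are defined in the Ibex headers and may look quite different, and the `contains` and `call` methods are hypothetical additions so the example can be exercised.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Stand-ins for the ibex::runtime types (ASSUMED shapes, not the real ones).
using ExternArgs = std::vector<std::string>;
struct Table { std::vector<std::string> column_names; };
using TableFn = std::function<Table(const ExternArgs&)>;

class ExternRegistry {
public:
    // Stores a table-producing function under the name used in `extern fn`.
    void register_table(const std::string& name, TableFn fn) {
        table_fns_[name] = std::move(fn);
    }
    bool contains(const std::string& name) const {
        return table_fns_.count(name) != 0;
    }
    // Looks up a registered function and invokes it with the given arguments.
    Table call(const std::string& name, const ExternArgs& args) const {
        auto it = table_fns_.find(name);
        if (it == table_fns_.end()) {
            throw std::out_of_range("unknown extern fn: " + name);
        }
        return it->second(args);
    }
private:
    std::map<std::string, TableFn> table_fns_;
};

// Mirrors the shape of a plugin's ibex_register entry point.
inline void register_my_source(ExternRegistry* registry) {
    assert(registry != nullptr);
    registry->register_table("my_source", [](const ExternArgs& args) {
        // A real plugin would open args[0] and decode the file;
        // this placeholder ignores it and returns a fixed schema.
        (void)args;
        return Table{{"symbol", "price"}};
    });
}

}  // namespace sketch
```

With this mock, `registry.call("my_source", {"data/file.bin"})` plays the role of the Ibex expression `my_source("data/file.bin")`: the runtime finds the function by name and forwards the arguments.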
| Script | Description |
|---|---|
| scripts/ibex-plugin-build.sh <src.cpp> [-o out.so] | Compile a plugin .cpp into a loadable .so |
| scripts/ibex-build.sh <file.ibex> [-o output] | Transpile an .ibex file and produce a binary |
| scripts/ibex-run.sh <file.ibex> [-- args...] | Transpile, compile, and run an .ibex file |
All scripts respect IBEX_ROOT, BUILD_DIR, and CXX environment overrides.
- Time-indexed DataFrame support
- Query optimizer (predicate pushdown, projection pruning)
- REPL tab completion and history