Sirius is a GPU-native SQL engine. It plugs into existing databases such as DuckDB via the standard Substrait query format, requiring no query rewrites or major system changes. Sirius currently supports DuckDB, with Doris support coming soon and other engines on our roadmap. Built on NVIDIA CUDA-X libraries, including cuDF and the RAPIDS Memory Manager (RMM), Sirius delivers high-performance GPU-accelerated analytics.
Running TPC-H on SF=100, Sirius achieves ~8x speedup over existing CPU query engines at the same hardware rental cost, making it well-suited for interactive analytics, financial workloads, and ETL jobs.
Experiment Setup:
- GPU instance: GH200@LambdaLabs ($1.5/hour)
- CPU instance: c8i.8xlarge@AWS ($1.5/hour)
- Ubuntu >= 22.04
- NVIDIA Volta™ or higher with compute capability 7.0+
- CUDA >= 13.0
- CMake >= 3.30.4 (follow these instructions to upgrade CMake)
- libcudf >= 26.04
- We recommend building Sirius with at least 16 vCPUs to ensure faster compilation.
- Git (to clone the repo)
- Pixi (install instructions here)
To clone the Sirius repository:
git clone --recurse-submodules https://github.com/sirius-db/sirius.git
cd sirius
The --recurse-submodules flag ensures DuckDB is pulled, which is required to build the extension.
There is a Pixi manifest available to set up an environment with all required dependencies installed. Start a shell in the environment with:
pixi shell
The environment activation handles setting up everything needed to build and test.
To build Sirius:
CMAKE_BUILD_PARALLEL_LEVEL={nproc} make
Note that if building the extension consumes too much memory, try reducing the CMAKE_BUILD_PARALLEL_LEVEL value used when invoking make.
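Heavily templated CUDA builds can need several gigabytes of RAM per compiler job, so a sensible parallel level is bounded by memory as well as by core count. A small sketch of picking one (the helper name and the 2 GB-per-job figure are illustrative assumptions, not measured values; memory detection uses Linux sysconf):

```python
import os

def suggested_parallel_level(gb_per_job: int = 2) -> int:
    """Cap build parallelism by available RAM (Linux-only sysconf lookup)."""
    cpus = os.cpu_count() or 1
    try:
        mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        mem_jobs = max(1, int(mem_bytes / (gb_per_job * 1024**3)))
    except (ValueError, OSError):
        mem_jobs = cpus  # fall back to core count if sysconf is unavailable
    return max(1, min(cpus, mem_jobs))

print(suggested_parallel_level())
```

The returned value is what you would pass as CMAKE_BUILD_PARALLEL_LEVEL when invoking make.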
Optionally, to use the Python API in Sirius, we also need to build the duckdb-python package with the following commands:
pushd duckdb-python
pip install .
popd
Common issues: if pip install . fails outside a virtual environment, create and activate one from the Sirius home directory before installing:
python3 -m venv --prompt duckdb .venv
source .venv/bin/activate
To generate the TPC-H dataset:
cd test_datasets
unzip tpch-dbgen.zip
cd tpch-dbgen
./dbgen -s 1 && mkdir s1 && mv *.tbl s1 # generates the SF1 dataset
cd ../../
To load the TPC-H dataset into DuckDB:
./build/release/duckdb {DATABASE_NAME}.duckdb
.read scripts/tpch_load.sql
To download the ClickBench dataset, run:
cd test_datasets
wget https://pages.cs.wisc.edu/~yxy/sirius-datasets/test_hits.tsv.gz
gzip -d test_hits.tsv.gz
cd ..
To load the dataset into DuckDB:
./build/release/duckdb {DATABASE_NAME}.duckdb
.read scripts/clickbench_load_duckdb.sql
To run the Sirius CLI, simply start the shell with ./build/release/duckdb {DATABASE_NAME}.duckdb.
From the duckdb shell, initialize the Sirius buffer manager with call gpu_buffer_init. This API accepts two parameters: the GPU caching region size and the GPU processing region size. The GPU caching region is the memory region where raw data is stored on the GPU, whereas the GPU processing region is where intermediate results (hash tables, join results, etc.) are stored.
For example, to set the caching region as 1 GB and the processing region as 2 GB, we can run the following command:
call gpu_buffer_init("1 GB", "2 GB");
By default, Sirius also allocates pinned memory based on the two arguments above. To explicitly specify the amount of pinned memory to allocate during initialization, run:
call gpu_buffer_init("1 GB", "2 GB", pinned_memory_size = "4 GB");
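The sizes passed to gpu_buffer_init are human-readable strings. As an illustration of how such strings map to byte counts, here is a hypothetical parser (our own sketch, not Sirius's actual implementation, assuming binary units):

```python
import re

_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def parse_size(text: str) -> int:
    """Parse strings like '1 GB' or '512 MB' into a byte count."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", text, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = m.groups()
    return int(float(value) * _UNITS[unit.upper()])

print(parse_size("1 GB"))  # 1073741824
print(parse_size("2 GB"))  # 2147483648
```

Under this reading, the example above reserves roughly 1 GiB for cached base data and 2 GiB for intermediates; size the processing region generously, since hash tables and join results can exceed the input size.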
After setting up Sirius, we can execute SQL queries on the GPU using call gpu_processing:
call gpu_processing("select
l.l_orderkey,
sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
o.o_orderdate,
o.o_shippriority
from
customer c,
orders o,
lineitem l
where
c.c_mktsegment = 'HOUSEHOLD'
and c.c_custkey = o.o_custkey
and l.l_orderkey = o.o_orderkey
and o.o_orderdate < date '1995-03-25'
and l.l_shipdate > date '1995-03-25'
group by
l.l_orderkey,
o.o_orderdate,
o.o_shippriority
order by
revenue desc,
o.o_orderdate
limit 10;");
A cold run in Sirius will be significantly slower due to data loading from storage and conversion from the DuckDB format to Sirius's native format. Subsequent runs are faster since they benefit from caching in GPU memory.
All 22 TPC-H queries are saved in tpch-queries.sql. To run all queries:
.read tpch-queries.sql
Make sure to build the duckdb-python package using the method described above before using the Python API. To use the Sirius Python API, add the following code to the beginning of your Python program:
import duckdb
con = duckdb.connect('{DATABASE_NAME}.duckdb', config={"allow_unsigned_extensions": "true"})
con.execute("load '{SIRIUS_HOME_PATH}/build/release/extension/sirius/sirius.duckdb_extension'")
con.execute("call gpu_buffer_init('{GPU_CACHE_SIZE}', '{GPU_PROCESSING_SIZE}')")
To execute a query in Python:
con.execute('''
call gpu_processing("select
l.l_orderkey,
sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
o.o_orderdate,
o.o_shippriority
from
customer c,
orders o,
lineitem l
where
c.c_mktsegment = 'HOUSEHOLD'
and c.c_custkey = o.o_custkey
and l.l_orderkey = o.o_orderkey
and o.o_orderdate < date '1995-03-25'
and l.l_shipdate > date '1995-03-25'
group by
l.l_orderkey,
o.o_orderdate,
o.o_shippriority
order by
revenue desc,
o.o_orderdate
limit 10;");
''').fetchall()
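Since every GPU query is issued as call gpu_processing("…"), it can be convenient to build that statement from a plain SQL string. A minimal sketch (the helper name and the quote-doubling scheme are our own illustration, not part of the Sirius API):

```python
def gpu_query(sql: str) -> str:
    """Wrap a plain SQL string in a `call gpu_processing(...)` statement.

    Double quotes inside the query are escaped by doubling, the usual
    convention for quoted SQL literals.
    """
    escaped = sql.strip().rstrip(";").replace('"', '""')
    return f'call gpu_processing("{escaped};")'

stmt = gpu_query("select count(*) from lineitem")
print(stmt)  # call gpu_processing("select count(*) from lineitem;")
# con.execute(stmt).fetchall()  # with the connection set up as shown above
```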
Sirius provides a test suite that compares Sirius against DuckDB for correctness across many test queries. Note that these are end-to-end tests: they run SQL queries through Sirius and compare the output against the expected result. To run them, generate the datasets using the method described above, then run:
make test
To run a specific test, run the following commands from the root directory:
CMAKE_BUILD_PARALLEL_LEVEL={nproc} make
build/release/test/unittest --test-dir . test/sql/tpch-sirius.test
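Conceptually, each end-to-end test runs the same query on both engines and checks that the result sets agree. A simplified sketch of such a comparison (order-insensitive, with a float tolerance; this is our illustration, not the actual test harness):

```python
import math

def results_match(a, b, rel_tol=1e-6):
    """Compare two result sets (lists of row tuples), ignoring row order."""
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(sorted(a), sorted(b)):
        for x, y in zip(row_a, row_b):
            if isinstance(x, float) or isinstance(y, float):
                if not math.isclose(x, y, rel_tol=rel_tol):
                    return False
            elif x != y:
                return False
    return True

print(results_match([(1, 2.0)], [(1, 2.0000000001)]))  # True
```

Queries with an ORDER BY clause would of course need an order-sensitive comparison instead.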
Sirius also implements C++ tests for all of the APIs it implements. These tests are meant to be individual unit tests for each of the classes/functions used to run Sirius. You can find examples on how to implement these unit tests in test/cpp. You can run all of the unit tests using:
CMAKE_BUILD_PARALLEL_LEVEL={nproc} make
build/release/extension/sirius/test/cpp/sirius_unittest
To run the tests associated with a specific tag, or to run a single test by name, execute the test binary like this:
CMAKE_BUILD_PARALLEL_LEVEL={nproc} make
build/release/extension/sirius/test/cpp/sirius_unittest "[cpu_cache]"
build/release/extension/sirius/test/cpp/sirius_unittest "test_cpu_cache_basic_string_single_col"
Any logs produced during test execution are saved in:
build/release/extension/sirius/test/cpp/log
Like DuckDB, we use Catch2 as our testing framework, so more details about writing and running tests can be found in the Catch2 documentation.
Sirius uses spdlog for logging messages during query execution. The default log directory is ${CMAKE_BINARY_DIR}/log and the default log level is info; both can be configured via the environment variables SIRIUS_LOG_DIR and SIRIUS_LOG_LEVEL. For example:
export SIRIUS_LOG_DIR={PATH for logging}
export SIRIUS_LOG_LEVEL=debug
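The resolution logic amounts to an environment lookup with a fallback to the documented defaults. A stdlib-only sketch of that lookup (the function itself is illustrative, not part of Sirius):

```python
import os

def log_config(env=os.environ):
    """Resolve log directory and level, falling back to the documented defaults."""
    return {
        "dir": env.get("SIRIUS_LOG_DIR", "log"),       # default: ${CMAKE_BINARY_DIR}/log
        "level": env.get("SIRIUS_LOG_LEVEL", "info"),  # default level: info
    }

print(log_config({"SIRIUS_LOG_LEVEL": "debug"}))
```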
Sirius is under active development, and several features are still in progress. Notable current limitations include:
- Data Size Limitations: Sirius currently only works when the dataset fits in GPU memory. Specifically, it returns an error if the input data is larger than the GPU caching region or if the intermediate results are larger than the GPU processing region. We are actively addressing this by adding support for partitioning and batch execution (issues #12 and #19), multi-GPU execution (issue #18), spilling to disk/host memory (issue #19), and distributed query execution (issue #18).
- Row Count Limitations: Sirius uses libcudf to implement FILTER, PROJECTION, JOIN, GROUP-BY, ORDER-BY, and AGGREGATION. However, since libcudf uses int32_t for row IDs, this limits the maximum row count that Sirius can currently handle (~2B rows). See libcudf issue #13159 for more details. We are actively addressing this by adding support for partitioning and batch execution; see Sirius issue #12 for more details.
- Data Type Coverage: Sirius currently supports commonly used data types, including INTEGER, BIGINT, FLOAT, DOUBLE, VARCHAR, DATE, TIMESTAMP, and DECIMAL. We are actively working on supporting additional data types, such as nested types. See issue #20 for more details.
- Operator Coverage: At present, Sirius supports a range of operators including FILTER, PROJECTION, JOIN, GROUP-BY, ORDER-BY, AGGREGATION, TOP-N, LIMIT, and CTE. We are working on adding more advanced operators such as WINDOW functions and ASOF JOIN. See issue #21 for more details.
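The ~2B figure in the row-count limitation falls directly out of the signed 32-bit row IDs:

```python
# libcudf addresses rows with int32_t, so a single table (or intermediate
# result) cannot exceed the signed 32-bit maximum:
INT32_MAX = 2**31 - 1
print(f"{INT32_MAX:,} rows")  # 2,147,483,647 rows, i.e. ~2.1 billion
```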
For a full list of current limitations and ongoing work, please refer to our GitHub issues page. When these limitations are encountered at runtime, Sirius gracefully falls back to DuckDB query execution on the CPU.
Sirius is still under major development, and we are working on adding more features, such as storage/disk support, multi-GPU and multi-node execution, more operators and data types, accelerating more engines, and much more.
Sirius always welcomes new contributors! If you are interested, check our website, reach out to our email, or join our slack channel.
Let's kickstart the GPU era for data analytics!