
BenchBox


"Running a benchmark should be as simple as import {benchmark}"

BenchBox is a "benchmarking toolbox" that makes it simple to benchmark analytic (OLAP) databases.

Overview

BenchBox provides industry-standard (TPC-H, TPC-DS), academic (Join Order), and custom-designed (Primitives) benchmarks for data warehouse workloads.

BenchBox embeds the entire benchmark lifecycle - query and data generation, result analysis, and reporting - in a single Python tool with simple setup.

BenchBox uses Python-native interfaces for popular local data tools (DuckDB, DataFusion, Polars) and cloud platforms (Snowflake, Databricks, ClickHouse).

Versioning

BenchBox loosely follows Semantic Versioning, using the MAJOR.MINOR.PATCH scheme. We deviate from the "official" SemVer spec where it better suits BenchBox's nature as an evolving benchmarking tool rather than a stable API library. See below for details.

  • MAJOR when we make incompatible changes OR significant changes in scope or functionality.
  • MINOR when we add backward-compatible changes OR significantly expand functionality.
  • PATCH when we make bug fixes or documentation updates; patch-level bug fixes may not be backward-compatible.

Current release: v0.1.1. Check your installation with benchbox --version, which also reports metadata consistency diagnostics pulled from pyproject.toml and documentation markers.

For Developers: See Release Automation Guide for the automated release process with reproducible builds and timestamp normalization.

Alpha Software

BenchBox is ALPHA software. APIs may change, features may be incomplete, and production use is not recommended. See DISCLAIMER.md for full details on what this means and how to get help.

Features

  • Embedded Benchmarks: Self-contained benchmark data and queries
  • Eighteen Benchmarks: TPC-H, TPC-DS, TPC-DI, TPC-DS-OBT, TPC-H Skew, TPC-Havoc, SSB, AMPLab, JoinOrder, ClickBench, H2ODB, NYC Taxi, TSBS DevOps, CoffeeShop, TPC-H Data Vault, Read Primitives, Write Primitives, Transaction Primitives
  • Cross-Database: Same benchmarks work on any database platform
  • DataFrame Mode: Native DataFrame API benchmarking with Polars, Pandas, and 5 other libraries
  • SQL Platforms (31): DuckDB, MotherDuck, SQLite, DataFusion, PostgreSQL, TimescaleDB, Polars, ClickHouse, Firebolt, InfluxDB, Databricks, Snowflake, BigQuery, Redshift, Azure Synapse, Microsoft Fabric Warehouse, Trino, Starburst, Presto, Athena, Spark, PySpark, AWS Glue, Amazon EMR Serverless, Athena Spark, GCP Dataproc, GCP Dataproc Serverless, Microsoft Fabric Spark, Azure Synapse Spark, Snowpark Connect, Onehouse Quanton
  • DataFrame Platforms (7): Polars-DF, Pandas-DF, DataFusion-DF, Modin-DF, Dask-DF, cuDF-DF (GPU), PySpark-DF
  • Open Table Formats: Delta Lake, Apache Iceberg, Apache Hudi (via Databricks, Quanton, Trino, Spark platforms)
  • SQL Translation: Automatic query conversion between SQL dialects (see the sketch after this list)
  • Self-Contained Python Package: Core install requires no external database servers or system dependencies; opt-in to extra package installs for cloud platforms when needed.
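
BenchBox performs the SQL translation automatically; as a rough illustration of the idea, the sketch below uses sqlglot (the translation library in BenchBox's core dependencies) directly. The query and dialect choices are arbitrary examples, not BenchBox's internal API.

import sqlglot

# An arbitrary DuckDB-flavored query, transpiled to two other dialects.
duckdb_sql = "SELECT l_returnflag, SUM(l_quantity) AS qty FROM lineitem GROUP BY 1 LIMIT 10"

for dialect in ("snowflake", "clickhouse"):
    translated = sqlglot.transpile(duckdb_sql, read="duckdb", write=dialect)[0]
    print(f"{dialect}: {translated}")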

Documentation Tour

The full documentation lives under docs/ and is published with Sphinx.

| Topic | Where to start |
| --- | --- |
| Install & first benchmark | docs/usage/getting-started.md |
| Everyday CLI workflows | docs/usage/cli-quick-start.md |
| DataFrame benchmarking | docs/platforms/dataframe.md |
| Config and automation | docs/usage/configuration.md & docs/usage/examples.md |
| Platform guidance | docs/platforms/platform-selection-guide.md |
| Troubleshooting | docs/TROUBLESHOOTING.md |
| Developer docs | docs/DEVELOPMENT.md & docs/design/README.md |

Run uv run -- sphinx-build -b html docs docs/_build/html to build the local site.

Benchmarks

  • TPC Standards: TPC-H, TPC-DS, TPC-DI
  • Academic Benchmarks: SSB, AMPLab, JoinOrder
  • Industry Benchmarks: ClickBench, H2ODB, NYC Taxi, TSBS DevOps, CoffeeShop
  • Data Modeling Variants: TPC-H Data Vault
  • BenchBox Primitives: Read Primitives, Write Primitives, Transaction Primitives
  • BenchBox Experimental: TPC-DS-OBT, TPC-Havoc, TPC-H Skew

Related Tools

BenchBox is one of several open-source database benchmarking tools. Each has different strengths:

BenchBox vs BenchBase

BenchBox focuses on OLAP analytic benchmarks while BenchBase focuses on OLTP transactional benchmarks.

BenchBox provides Python-native benchmarking with embedded data generation, while BenchBase uses Java with JDBC drivers. Both have their setup requirements - BenchBox requires Python dependencies and database connections, while BenchBase requires Java and JDBC setup.

Consider BenchBox for analytical workloads when you prefer Python-based tooling. Consider BenchBase for transactional workloads or when you need mature, production-tested benchmarking infrastructure with diverse OLTP workloads (TPC-C, Twitter, YCSB, etc.).

BenchBox vs HammerDB

BenchBox and HammerDB (hosted by the TPC Council) target different workload types.

HammerDB focuses on OLTP transactional benchmarks (TPROC-C derived from TPC-C) with support for enterprise databases - Oracle, SQL Server, PostgreSQL, MySQL/MariaDB, and IBM Db2. It uses Tcl and provides GUI, CLI, and web service interfaces. HammerDB measures throughput in NOPM (New Orders Per Minute) and has decades of enterprise credibility.

BenchBox focuses on OLAP analytical benchmarks (TPC-H, TPC-DS, ClickBench, etc.) with support for the broad platform spectrum - from embedded engines like DuckDB and DataFusion, through DataFrame libraries like Polars and Pandas, to cloud data warehouses like Snowflake, BigQuery, and Databricks.

Consider HammerDB when testing transactional throughput on enterprise databases or when you need TPC Council-sponsored credibility. Consider BenchBox when benchmarking analytical queries across cloud data warehouses, embedded engines, or DataFrame libraries.

BenchBox vs LakeBench

BenchBox and LakeBench are both Python-based benchmarking frameworks, but target different ecosystems.

LakeBench focuses on lakehouse compute engines (Spark, Fabric, Synapse, HDInsight) and evaluates end-to-end ELT workflows - ingestion, transformation, maintenance, and queries - using Delta Lake tables. It offers 4 benchmarks including ELTBench, a custom workflow-oriented benchmark.

BenchBox focuses on the broad ecosystem of analytic platforms - from single-node engines like DuckDB and DataFrame libraries like Polars and Pandas to cloud data warehouses like Snowflake, BigQuery, and Redshift. It provides 18 benchmarks including TPC standards, academic workloads like SSB and JoinOrder, and BenchBox-original benchmarks like TPC-Havoc for optimizer stress testing.

Consider LakeBench when evaluating Spark-based lakehouse engines, testing complete ELT pipeline performance, or working primarily in Microsoft Fabric/Azure environments. Consider BenchBox when benchmarking across the analytic platform spectrum, needing benchmark variety beyond TPC standards, or comparing DataFrame libraries alongside SQL engines.

Installation

BenchBox ships as a Python package with optional extras that enable specific database platforms. Start with the core installation, then layer in the extras that match your environment.

Core Installation (DuckDB + SQLite)

The base package includes everything you need for local development with DuckDB and SQLite.

  • Embedded DuckDB engine for quick benchmarks
  • Local data generators and CLI utilities
  • SQLite integration for lightweight testing
  • Does not include remote warehouse connectors (Databricks, Snowflake, etc.)

Install the core package with your preferred tool:

Recommended (using uv):

uv add benchbox

Alternative (pip-compatible):

uv pip install benchbox
# or
python -m pip install benchbox
# or
pipx install benchbox

Optional Dependency Groups

Extras unlock connectors and helpers for each platform. Quote the extras specification so shells like zsh do not expand the brackets.

  • [cloud] – Databricks, BigQuery, Redshift, Snowflake connectors (recommended starting point)
  • [cloudstorage] – Cloud storage helpers (cloudpathlib)
  • [databricks] – Databricks SQL Warehouses (databricks-sql-connector, cloudpathlib)
  • [bigquery] – Google BigQuery (google-cloud-bigquery, google-cloud-storage, cloudpathlib)
  • [redshift] – Amazon Redshift (redshift-connector, boto3, cloudpathlib)
  • [snowflake] – Snowflake Data Cloud (snowflake-connector-python, cloudpathlib)
  • [clickhouse] – ClickHouse Analytics (clickhouse-driver)
  • [datafusion] – Apache DataFusion OLAP Engine (datafusion, pyarrow)
  • [all] – Everything (all connectors, cloud tooling, ClickHouse, and DataFusion)

Installation Matrix

Choose the installation that matches your environment and requirements:

Common Installation Scenarios

| Use Case | Platforms Enabled | Extras | Recommended Command (uv) | Alternative (pip-compatible) |
| --- | --- | --- | --- | --- |
| Local development & testing | DuckDB, SQLite | (none) | uv add benchbox | uv pip install benchbox |
| Cloud storage helpers | S3, GCS, Azure path utilities | [cloudstorage] | uv add benchbox --extra cloudstorage | uv pip install "benchbox[cloudstorage]" |
| All cloud platforms | Databricks, BigQuery, Redshift, Snowflake | [cloud] | uv add benchbox --extra cloud | uv pip install "benchbox[cloud]" |
| Everything included | All platforms + ClickHouse | [all] | uv add benchbox --extra all | uv pip install "benchbox[all]" |
| Development with cloud | Core + all platforms + dev tools | [cloud,dev] | uv add benchbox --extra cloud --extra dev | uv pip install "benchbox[cloud,dev]" |

Single Platform Installations

| Platform | What's Included | Extras | Recommended Command (uv) | Alternative (pip-compatible) |
| --- | --- | --- | --- | --- |
| Databricks | SQL Warehouses, Unity Catalog, DBFS | [databricks] | uv add benchbox --extra databricks | uv pip install "benchbox[databricks]" |
| Google BigQuery | BigQuery, Cloud Storage | [bigquery] | uv add benchbox --extra bigquery | uv pip install "benchbox[bigquery]" |
| Amazon Redshift | Redshift, S3 integration | [redshift] | uv add benchbox --extra redshift | uv pip install "benchbox[redshift]" |
| Snowflake | Snowflake Data Cloud | [snowflake] | uv add benchbox --extra snowflake | uv pip install "benchbox[snowflake]" |
| ClickHouse | ClickHouse Analytics | [clickhouse] | uv add benchbox --extra clickhouse | uv pip install "benchbox[clickhouse]" |
| DataFusion | Apache DataFusion OLAP Engine | [datafusion] | uv add benchbox --extra datafusion | uv pip install "benchbox[datafusion]" |

Advanced Combinations

| Scenario | Recommended Command (uv) | Alternative (pip-compatible) | Use Case |
| --- | --- | --- | --- |
| Multi-cloud analytics | uv add benchbox --extra cloud --extra clickhouse | uv pip install "benchbox[cloud,clickhouse]" | Compare cloud platforms + ClickHouse |
| Full development setup | uv add benchbox --extra all --extra dev --extra docs | uv pip install "benchbox[all,dev,docs]" | Contributing to BenchBox |
| AWS-focused | uv add benchbox --extra redshift | uv pip install "benchbox[redshift]" | Amazon Redshift only |
| Google Cloud-focused | uv add benchbox --extra bigquery | uv pip install "benchbox[bigquery]" | Google BigQuery only |
| Azure-compatible | uv add benchbox --extra databricks --extra snowflake | uv pip install "benchbox[databricks,snowflake]" | Databricks + Snowflake on Azure |

Alternative Package Managers

All installation commands above work with different Python package managers:

| Package Manager | Recommended Format | Example | Alternative (pip-compatible) |
| --- | --- | --- | --- |
| uv (recommended) | uv add benchbox --extra <name> | uv add benchbox --extra cloud | uv pip install "benchbox[cloud]" |
| pip | python -m pip install "benchbox[extras]" | python -m pip install "benchbox[cloud]" | N/A (only format available) |
| pipx | pipx install "benchbox[extras]" | pipx install "benchbox[cloud]" | N/A (only format available) |

Note: When using pip-compatible syntax, use quotes around the package specification ("benchbox[extras]") to prevent shell expansion in zsh and other shells. The uv add syntax doesn't require quotes.

Installing Multiple Extras

You can combine extras in a single installation command. Order does not matter.

Recommended (using uv):

uv add benchbox --extra cloud --extra clickhouse

Alternative (pip-compatible):

uv pip install "benchbox[cloud,clickhouse]"
python -m pip install "benchbox[cloud,clickhouse]"
pipx install "benchbox[cloud,clickhouse]"

Already installed BenchBox? Re-run the installer with the extras you need or use pipx inject benchbox "benchbox[cloud]" to add connectors to an existing pipx environment.

Validate Your Installation

Use the built-in dependency checker to confirm that everything is ready before running benchmarks.

# Overview of installed extras
benchbox check-deps

# Focus on a single platform
benchbox check-deps --platform databricks

# View the installation matrix in the terminal
benchbox check-deps --matrix

# Include detailed guidance and next steps
benchbox check-deps --verbose

Troubleshooting Installation Issues

Common Installation Problems

Shell and Package Manager Issues:

  • Shell quoting errors (zsh: no matches found) – wrap extras in quotes: "benchbox[cloud]"
  • uv not installed – install with pipx install uv or use python -m pip install ... instead
  • pip cannot find wheels – upgrade packaging tools: python -m pip install --upgrade pip setuptools wheel
  • Conflicting virtual environments – remove old installs: pip uninstall benchbox before re-installing

Platform-Specific Compilation Issues:

  • macOS SSL errors – update certificates: /Applications/Python 3.x/Install Certificates.command
  • Windows Visual C++ build tools missing – install "Desktop development with C++" workload from Visual Studio Installer
  • Linux missing development packages – install build tools: sudo apt-get install build-essential (Ubuntu/Debian) or sudo yum groupinstall "Development Tools" (RHEL/CentOS)

Cloud Platform Authentication:

  • Databricks connection issues – verify SQL warehouse is running and accessible
  • BigQuery authentication errors – ensure service account credentials or gcloud auth is configured
  • Snowflake connection timeouts – check network connectivity and account URL format
  • Redshift SSL errors – verify cluster security group allows connections

Installation Verification

After installation, verify everything works:

# Check if BenchBox is installed and working
benchbox --version

# Verify your platform dependencies
benchbox check-deps

# Test core functionality
python -c "from benchbox import TPCH; print('✅ BenchBox core working')"

# Test specific platform (example for BigQuery)
python -c "from benchbox.platforms.bigquery import BigQueryAdapter; print('✅ BigQuery connector working')"

Quick Fixes for Common Errors

Import Error: No module named 'benchbox'

# Verify installation
pip list | grep benchbox
# If missing, reinstall
uv add benchbox
# or: uv pip install benchbox

ModuleNotFoundError for platform-specific connectors

# Install the missing platform extra
uv add benchbox --extra databricks
# or: uv pip install "benchbox[databricks]"

Permission denied errors

# Use user installation if you don't have admin rights
python -m pip install --user "benchbox[cloud]"

Virtual environment conflicts

# Clean install in fresh environment
python -m venv fresh_env
source fresh_env/bin/activate  # or `fresh_env\Scripts\activate` on Windows
uv add benchbox --extra cloud
# or: uv pip install "benchbox[cloud]"

For detailed platform-specific setup guides, see Platform Documentation and the Troubleshooting Guide.

Dependency Overview

BenchBox uses a layered dependency approach: minimal core dependencies for local development plus optional extras for specific platforms.

Core Dependencies (installed automatically)

These libraries are required for every installation and provide complete local benchmarking functionality:

  • sqlglot – SQL dialect translation between databases
  • click – Command-line interface framework
  • rich – Terminal output formatting and progress indicators
  • psutil – System resource monitoring
  • pydantic – Data validation and configuration models
  • pyyaml – YAML configuration file support
  • duckdb – Embedded analytical database engine
  • pytest libraries – Testing framework components for built-in validation

The core package includes all necessary Python dependencies for local benchmarking - DuckDB is embedded and ready to go. No external database servers or system installations are required for basic functionality.

Optional Platform Dependencies (installed on demand)

These extras add connectivity to specific platforms and are installed only when needed:

Cloud Platform SDKs:

  • [cloud] – All major cloud platforms (Databricks, BigQuery, Redshift, Snowflake)
  • [databricks] – Databricks SQL Warehouses (databricks-sql-connector, cloudpathlib)
  • [bigquery] – Google BigQuery and Cloud Storage (google-cloud-bigquery, google-cloud-storage)
  • [redshift] – Amazon Redshift (redshift-connector, boto3 for S3)
  • [snowflake] – Snowflake Data Cloud (snowflake-connector-python)

Database-Specific Drivers:

  • [clickhouse] – ClickHouse Analytics (clickhouse-driver)

Development Tools:

  • [dev] – Development dependencies (additional testing tools)
  • [docs] – Documentation generation tools

Why This Architecture?

  • Fast installation: Core package installs quickly with minimal dependencies
  • No vendor lock-in: Install only the platforms you actually use
  • Reduced conflicts: Platform-specific dependencies are isolated
  • Easy maintenance: Update cloud SDKs independently of core functionality
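
As an illustration of this layering, the sketch below probes which optional connectors are importable. It approximates what benchbox check-deps reports; the module names are assumptions based on the packages listed above, not BenchBox's actual implementation.

import importlib.util

# Assumed import names for each extra (based on the package lists above).
EXTRA_MODULES = {
    "databricks": ["databricks.sql", "cloudpathlib"],
    "bigquery": ["google.cloud.bigquery", "google.cloud.storage"],
    "redshift": ["redshift_connector", "boto3"],
    "snowflake": ["snowflake.connector"],
    "clickhouse": ["clickhouse_driver"],
}

def importable(name: str) -> bool:
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:  # parent package not installed
        return False

for extra, modules in EXTRA_MODULES.items():
    missing = [m for m in modules if not importable(m)]
    print(f"[{extra}]", "ok" if not missing else f"missing: {', '.join(missing)}")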

Quick Start

Get started with BenchBox in 3 steps:

1. Install BenchBox

Choose the installation that matches your target platform:

Recommended (using uv):

# For local development (DuckDB + SQLite)
uv add benchbox

# For cloud platforms (recommended)
uv add benchbox --extra cloud

# For everything (all platforms + ClickHouse)
uv add benchbox --extra all

Alternative (pip-compatible):

uv pip install benchbox
uv pip install "benchbox[cloud]"
uv pip install "benchbox[all]"

2. Verify Installation

Check that everything is working:

# Verify BenchBox is installed
benchbox --version

# Check available platforms
benchbox check-deps --matrix

3. Run Your First Benchmark

Start with a simple local benchmark:

from benchbox import TPCH

# Create a small TPC-H benchmark for testing
tpch = TPCH(scale_factor=0.01)  # ~10MB dataset for quick testing

# Generate sample data
print("Generating data...")
data_paths = tpch.generate_data()
print(f"✅ Generated {len(data_paths)} data files")

# Get a sample query
query1 = tpch.get_query(1, seed=42)  # Reproducible parameters
print(f"✅ Generated TPC-H Query 1")

# Run on embedded DuckDB (no setup required)
import duckdb
conn = duckdb.connect(":memory:")

# Create schema and load data
conn.execute(tpch.get_create_tables_sql())
for table_file in data_paths:
    table_name = table_file.split('/')[-1].replace('.csv', '')
    conn.execute(f"COPY {table_name} FROM '{table_file}' WITH (DELIMITER '|', HEADER false)")

# Execute the query
result = conn.execute(query1).fetchdf()
print(f"✅ Query executed successfully, returned {len(result)} rows")

Run a DataFrame Benchmark

Compare SQL vs DataFrame execution paradigms:

# SQL mode - queries executed via SQL
benchbox run --platform duckdb --benchmark tpch --scale 0.01

# DataFrame mode - queries executed via native Polars API
benchbox run --platform polars-df --benchmark tpch --scale 0.01

Same benchmark, same scale factor, different execution paradigm.
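
To make the paradigm difference concrete, here is a small sketch using DuckDB and Polars directly (an illustration, not BenchBox's internals) that computes a TPC-H Q6-style revenue aggregate both ways:

import duckdb
import polars as pl

# Tiny stand-in for the TPC-H lineitem table.
lineitem = pl.DataFrame({
    "l_extendedprice": [100.0, 200.0, 300.0],
    "l_discount": [0.05, 0.07, 0.02],
    "l_quantity": [10, 30, 20],
})

# SQL paradigm: DuckDB resolves the local Polars frame by name.
revenue_sql = duckdb.sql(
    "SELECT SUM(l_extendedprice * l_discount) AS revenue FROM lineitem "
    "WHERE l_discount BETWEEN 0.05 AND 0.07 AND l_quantity < 24"
).fetchone()[0]

# DataFrame paradigm: the same logic as a native Polars expression chain.
revenue_df = (
    lineitem
    .filter(pl.col("l_discount").is_between(0.05, 0.07) & (pl.col("l_quantity") < 24))
    .select((pl.col("l_extendedprice") * pl.col("l_discount")).sum().alias("revenue"))
)["revenue"][0]

print(revenue_sql, revenue_df)  # both paradigms compute the same revenue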

Next Steps

For Cloud Platforms:

  • See Platform Documentation for platform-specific setup
  • Start with examples/getting_started/ for zero-config DuckDB runs and credential-ready cloud samples
  • Use examples/BENCHMARK_GUIDE.md for a quick reference on running all available benchmarks
  • Explore examples/features/ for capability-specific examples (query subsets, tuning, result analysis, etc.)
  • Check examples/use_cases/ for real-world patterns (CI/CD regression testing, platform evaluation, cost optimization)
  • See examples/programmatic/ for Python API usage and integration patterns
  • Use --dry-run OUTPUT_DIR on the CLI or example scripts to export a JSON/YAML plan and per-query SQL files before executing benchmarks
  • Use benchbox run CLI for full benchmark execution

For Advanced Usage:

  • Explore all 18 benchmark suites: TPC-H, TPC-DS, TPC-DI, ClickBench, H2ODB, and more
  • Scale up with larger datasets (scale factors 1.0, 10.0, 100.0+)
  • Compare performance across different platforms
  • See examples/INDEX.md for complete examples navigation
  • See examples/PATTERNS.md for common workflow patterns

Command-Line Interface (CLI)

BenchBox provides a comprehensive command-line interface (CLI) for all benchmarking operations, from data generation to result analysis.

Quick CLI Reference

| Command | Purpose | Example |
| --- | --- | --- |
| benchbox run | Execute benchmarks | benchbox run --platform duckdb --benchmark tpch |
| benchbox shell | Interactive SQL shell | benchbox shell --last --benchmark tpch |
| benchbox platforms | Manage database platforms | benchbox platforms list |
| benchbox check-deps | Check dependencies | benchbox check-deps --platform databricks |
| benchbox profile | System analysis | benchbox profile |
| benchbox benchmarks | Manage benchmark suites | benchbox benchmarks list |

Common CLI Workflows

Local Development:

# TPC-H benchmark on DuckDB
benchbox run --platform duckdb --benchmark tpch

# Run specific queries only (in custom order)
benchbox run --platform duckdb --benchmark tpch --queries "Q1,Q6,Q17"

# Run a single query for testing
benchbox run --platform duckdb --benchmark tpch --queries "Q1"

# Explore benchmark data interactively
benchbox shell --last --benchmark tpch

# System analysis for optimization recommendations
benchbox profile

# See all CLI examples
benchbox run --help examples

# Check available benchmarks
benchbox benchmarks list

Running Specific Queries with --queries

The --queries flag allows you to run a subset of benchmark queries in your specified order, useful for debugging and focused testing:

# Run specific TPC-H queries in custom order
benchbox run --platform duckdb --benchmark tpch --queries "1,6,17"

# Run single query for debugging
benchbox run --platform duckdb --benchmark tpch --queries "6"

# TPC-DS queries (1-99)
benchbox run --platform duckdb --benchmark tpcds --queries "1,2,3"

Query ID Ranges by Benchmark:

  • TPC-H: 1-22
  • TPC-DS: 1-99
  • SSB: 1-13

⚠️ Important Constraints:

  1. TPC-H Compliance: Using --queries overrides the official TPC-H stream permutation order, making results non-compliant with official TPC-H benchmarks. Use for development/debugging only.

  2. Validation Limits (see the sketch after this list):

    • Maximum 100 queries per run
    • Query IDs must be alphanumeric (letters, numbers, dash, underscore)
    • Maximum 20 characters per query ID
    • Duplicate query IDs are removed automatically
  3. Phase Compatibility: Only applies to power and standard phases. Ignored for warmup, throughput, and maintenance phases.

  4. Order Preservation: Queries execute in exactly the order you specify, not the benchmark's default order.
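
The validation limits above can be expressed as a short sketch (assumed behavior reconstructed from the listed rules, not BenchBox's actual code):

import re

MAX_QUERIES = 100
QUERY_ID_RE = re.compile(r"^[A-Za-z0-9_-]{1,20}$")  # letters, numbers, dash, underscore; max 20 chars

def parse_query_subset(spec: str) -> list[str]:
    seen, ordered = set(), []
    for raw in spec.split(","):
        qid = raw.strip()
        if not QUERY_ID_RE.match(qid):
            raise ValueError(f"Invalid query ID format: {qid!r}")
        if qid not in seen:       # duplicates removed automatically
            seen.add(qid)
            ordered.append(qid)   # execution order preserved as given
    if len(ordered) > MAX_QUERIES:
        raise ValueError(f"Too many queries: {len(ordered)} (max {MAX_QUERIES})")
    return ordered

print(parse_query_subset("1,6,17,6"))  # ['1', '6', '17']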

Error Examples:

# ERROR: Invalid query ID for TPC-H (only 1-22 valid)
benchbox run --platform duckdb --benchmark tpch --queries "99"
# ❌ Invalid query IDs: 99. Available: 1-22

# ERROR: Invalid format (no special characters)
benchbox run --platform duckdb --benchmark tpch --queries "1;DROP TABLE"
# ❌ Invalid query ID format (must be alphanumeric)

# ERROR: Too many queries
benchbox run --platform duckdb --benchmark tpch --queries "1,2,3,...,101"
# ❌ Too many queries: 101 (max 100)

# ERROR: Incompatible phases
benchbox run --platform duckdb --benchmark tpch --queries "1,6" --phases warmup
# ❌ --queries only works with power/standard phases

Programmatic API Equivalent:

from benchbox.platforms.duckdb import DuckDBAdapter
from benchbox.tpch import TPCH

benchmark = TPCH(scale_factor=1.0)
adapter = DuckDBAdapter()

# Load data
adapter.load_benchmark_data(benchmark)

# Run specific queries
run_config = {
    "query_subset": ["1", "6", "17"],  # Note: parameter is 'query_subset'
    "timeout": 60,
    "verbose": True
}

results = adapter.run_standard_queries(benchmark, run_config)

Cloud Platform Setup:

# Check platform dependencies
benchbox check-deps --platform databricks

# Install platform dependencies (if needed)
uv add benchbox --extra databricks
# or: uv pip install "benchbox[databricks]"

# Configure platform
benchbox platforms setup

Production Benchmarking:

# Full TPC-DS benchmark on Databricks with tuning
benchbox run --platform databricks --benchmark tpcds --scale 1 \
  --tuning tuned --phases power,throughput \
  --output dbfs:/Volumes/workspace/benchmarks/

# BigQuery with custom configuration
benchbox run --platform bigquery --benchmark tpch --scale 0.1 \
  --platform-option project_id=my-project \
  --verbose

# Snowflake baseline comparison
benchbox run --platform snowflake --benchmark tpch --scale 1 \
  --tuning notuning --output s3://my-bucket/baseline/

Data Generation and Testing:

# Generate test data only
benchbox run --benchmark tpch --scale 0.01 --phases generate \
  --output ./test-data

# Preview configuration without execution
benchbox run --platform databricks --benchmark tpcds --scale 0.1 \
  --dry-run ./preview

# Load data into database
benchbox run --platform duckdb --benchmark tpch --scale 0.1 \
  --phases load --force

Key CLI Features

Multi-Phase Execution:

  • generate: Create benchmark data files
  • load: Load data into database
  • warmup: Warm up database caches
  • power: Execute single-stream queries
  • throughput: Execute concurrent query streams
  • maintenance: Execute data maintenance operations

Platform Integration:

  • Automatic platform detection and configuration
  • Platform-specific options via --platform-option KEY=VALUE
  • Cloud storage support (S3, GCS, Azure Blob, DBFS)
  • Authentication via environment variables

Advanced Configuration:

  • Tuning modes: tuned, notuning, or custom config files
  • Compression options: none, gzip, zstd with configurable levels
  • Validation: preflight and post-load data validation
  • Reproducible runs with seed control

Output and Analysis:

  • Multiple output formats: JSON, CSV, HTML
  • Dry-run mode for configuration preview
  • Verbose logging for debugging
  • Query plan analysis with --show-query-plans

Interactive vs Non-Interactive Mode

Interactive Mode (Default):

# Guided setup with system recommendations
benchbox run

Non-Interactive Mode:

# Direct execution with all parameters specified
benchbox run --platform duckdb --benchmark tpch --scale 0.01 \
  --non-interactive

# Automation-friendly with environment variables
BENCHBOX_NON_INTERACTIVE=true benchbox run \
  --platform databricks --benchmark tpcds --quiet

Platform-Specific Examples

Databricks:

# Databricks SQL Warehouse with Unity Catalog
benchbox run --platform databricks --benchmark tpch --scale 1 \
  --platform-option catalog=main \
  --platform-option schema=benchbox \
  --output dbfs:/Volumes/main/benchbox/results/

# Check available Databricks options
benchbox run --describe-platform-options databricks

BigQuery:

# BigQuery with custom project and dataset
benchbox run --platform bigquery --benchmark tpcds --scale 0.1 \
  --platform-option project_id=my-project \
  --platform-option dataset=benchbox \
  --output gs://my-bucket/benchmarks/

# BigQuery with specific location
benchbox run --platform bigquery --benchmark tpch \
  --platform-option location=europe-west1

Snowflake:

# Snowflake with custom warehouse
benchbox run --platform snowflake --benchmark tpch --scale 1 \
  --platform-option warehouse=LARGE_WH \
  --platform-option database=BENCHBOX \
  --tuning tuned

# Snowflake baseline run
benchbox run --platform snowflake --benchmark tpcds \
  --tuning notuning --phases power

ClickHouse:

# Local ClickHouse instance
benchbox run --platform clickhouse --benchmark clickbench \
  --platform-option mode=local \
  --platform-option port=9000

# ClickHouse with TLS
benchbox run --platform clickhouse --benchmark tpch \
  --platform-option secure=true \
  --platform-option port=9440

Troubleshooting CLI Issues

Common Issues:

  1. Platform Dependencies Missing:
# Check what's needed
benchbox check-deps --platform databricks

# Install missing dependencies
uv add benchbox --extra databricks
# or: uv pip install "benchbox[databricks]"
  2. Authentication Errors:
# Check platform status
benchbox platforms status

# Verify environment variables
echo $DATABRICKS_TOKEN
  3. Memory or Storage Issues:
# Profile system for recommendations
benchbox profile

# Use smaller scale factors
benchbox run --platform duckdb --benchmark tpch --scale 0.001
  4. Configuration Problems:
# Validate configuration
benchbox validate

# Preview settings with dry-run
benchbox run --dry-run ./debug --platform duckdb --benchmark tpch

CLI Help and Documentation

Get Help:

# General help
benchbox --help

# Command-specific help
benchbox run --help
benchbox platforms --help

# Platform options
benchbox run --describe-platform-options clickhouse

Enable Verbose Output:

# Standard verbose logging
benchbox run --verbose --platform duckdb --benchmark tpch

# Very verbose for debugging
benchbox run -vv --platform duckdb --benchmark tpch

For complete CLI documentation, see CLI Reference.

Remote Output Paths (Minimal Example)

When you pass a remote output root (dbfs:/, s3://, gs://, abfss://), BenchBox appends the dataset suffix automatically for consistency with local paths.

CLI example:

benchbox run \
  --platform databricks \
  --benchmark tpch \
  --scale 0.01 \
  --output dbfs:/Volumes/workspace/raw/source/
# Writes to: dbfs:/Volumes/workspace/raw/source/tpch_sf01

Python helper:

from benchbox.utils.output_path import normalize_output_root
print(normalize_output_root("s3://bucket/prefix", "tpch", 0.01))
# s3://bucket/prefix/tpch_sf01

Platform-Specific Options

Platform adapters register their own CLI options, supplied via the generic --platform-option flag. Each option follows a KEY=VALUE format and can be provided multiple times. For example, to run ClickHouse in local mode with TLS enabled:

benchbox run \
  --platform clickhouse \
  --benchmark tpch \
  --platform-option mode=local \
  --platform-option secure=true

You can inspect the available options for any platform without executing a benchmark by using --describe-platform-options:

benchbox run --describe-platform-options clickhouse

Benchmarks Provided

TPC Standards

  • TPC-H - 22 queries for data warehouses. Tests basic SQL operations with string columns and date predicates.
  • TPC-DS - 99 complex queries with CTEs, subqueries, window functions. Tests advanced SQL features.
  • TPC-DI - ETL workflows and data integration testing. Focuses on data transformation pipelines.

Academic Benchmarks

  • SSB - Star Schema Benchmark. 13 queries over a star schema derived from TPC-H.
  • AMPLab - Big Data Benchmark from UC Berkeley's AMPLab. Scan, aggregation, and join workloads.
  • JoinOrder - Join Order Benchmark. Stress-tests optimizer join ordering and cardinality estimation.

Industry Benchmarks

  • ClickBench - ClickHouse's web analytics benchmark over a wide, denormalized events table.
  • H2ODB - Groupby and join operations from the H2O.ai db-benchmark.
  • NYC Taxi - Analytical queries over the New York City taxi trip record dataset.
  • TSBS DevOps - Time Series Benchmark Suite DevOps monitoring workload.
  • CoffeeShop

Data Modeling Variants

  • TPC-H Data Vault - TPC-H queries adapted for Data Vault 2.0 modeling (Hubs, Links, Satellites). Tests enterprise DWH patterns.
    • Newly created for BenchBox

BenchBox Primitives

  • Read Primitives - 90+ queries testing aggregation, joins, filters, window functions, and advanced SQL operations.
    • Newly created for BenchBox
  • Write Primitives - 117 write operations testing INSERT, UPDATE, DELETE, BULK_LOAD, MERGE, DDL operations.
    • Newly created for BenchBox
  • Transaction Primitives - 8 transaction operations testing ACID compliance, isolation levels, savepoints.
    • Newly created for BenchBox

BenchBox Experimental

  • TPC-DS-OBT - TPC-DS queries adapted for a single denormalized "One Big Table" schema. Tests wide-table analytics.
    • Newly created for BenchBox
  • TPC-Havoc - Query optimizer stress testing. 220 query variants (22 TPC-H queries × 10 syntax variants).
    • Newly created for BenchBox
  • TPC-H Skew - TPC-H with configurable data skew distributions. Tests optimizer behavior on non-uniform data.
    • Newly created for BenchBox

TPC-H Detailed Example

BenchBox provides complete TPC-H implementation:

  • Data generation per specification
  • 22 queries with parameter substitution
  • Schema definition and SQL generation
  • Database loading and query execution
  • Stream generation for concurrent testing
  • Performance measurement and reporting

Basic Usage

from benchbox import TPCH

# Initialize TPC-H at scale factor 1 (~1GB data)
tpch = TPCH(scale_factor=1, output_dir="tpch_data")

# Generate data files (returns paths to the generated files)
data_files = tpch.generate_data()

# Get schema information
schema = tpch.get_schema()

# Get SQL to create tables
create_tables_sql = tpch.get_create_tables_sql()

# Get a specific query with random parameters
query1 = tpch.get_query(1)

# Get a query with specific parameters
params = {"days": 90}  # 90 days for Query 1
query1_with_params = tpch.get_query(1, params=params, seed=42)

# Get all queries
all_queries = tpch.get_queries()

Advanced Features

from benchbox import TPCH

# Initialize with verbose output
tpch = TPCH(scale_factor=1, output_dir="tpch_data", verbose=True)

# Generate data if needed
tpch.generate_data()

# Generate query streams for concurrent testing
stream_files = tpch.generate_streams(
    num_streams=4,          # 4 concurrent streams
    rng_seed=42,           # Reproducible parameters
    streams_output_dir="streams"
)

# Get stream information
stream_info = tpch.get_all_streams_info()
for stream in stream_info:
    print(f"Stream {stream['stream_id']}: {len(stream['queries'])} queries")

# Load data directly into a database
tpch.load_data_to_database(
    connection_string="duckdb://tpch.db",
    dialect="duckdb",
    drop_existing=True
)

# Run individual queries with timing
result = tpch.run_query(
    query_id=1,
    connection_string="duckdb://tpch.db",
    dialect="duckdb"
)
print(f"Query 1 took {result['execution_time']:.3f}s")

# Run the full benchmark
benchmark_results = tpch.run_benchmark(
    connection_string="duckdb://tpch.db",
    queries=[1, 2, 3, 4, 5],  # Run specific queries
    iterations=3,              # Run each query 3 times
    dialect="duckdb"
)

# Run concurrent streams
stream_results = tpch.run_streams(
    connection_string="duckdb://tpch.db",
    stream_files=stream_files,
    concurrent=True,
    dialect="duckdb"
)

Working with Databases

from benchbox import TPCH
from pathlib import Path
import duckdb

# Initialize TPC-H benchmark
tpch = TPCH(scale_factor=0.1)  # Small scale for quick testing

# Generate data
tpch.generate_data()

# Create a DuckDB database and connection
conn = duckdb.connect("tpch.db")

# Create tables
conn.execute(tpch.get_create_tables_sql())

# Load data using DuckDB's efficient CSV reading
data_files = tpch.generate_data()
for file_path in data_files:
    table_name = Path(file_path).stem  # Get filename without extension
    print(f"Loading {table_name} from {file_path}")

    # DuckDB can read CSV files directly with proper delimiter
    conn.execute(f"""
        COPY {table_name} FROM '{file_path}'
        WITH (DELIMITER '|', HEADER false)
    """)

# Run a query
query = tpch.get_query(1)
results = conn.execute(query).fetchall()
print(results)

# You can also run all queries and time them
import time
for query_id in range(1, 23):  # TPC-H has 22 queries
    query = tpch.get_query(query_id, seed=42)  # Use seed for reproducible parameters
    start_time = time.time()
    result = conn.execute(query).fetchall()
    end_time = time.time()
    print(f"Query {query_id}: {len(result)} rows, {end_time - start_time:.3f}s")

# Clean up
conn.close()

Testing

Run the test suite using either make commands or direct pytest:

# Fast tests (default)
make test
# or
uv run -- python -m pytest -m fast

# All tests
make test-all
# or
uv run -- python -m pytest

# Specific benchmark tests
make test-tpch
# or
uv run -- python -m pytest -m tpch

# Unit tests only
make test-unit
# or
uv run -- python -m pytest -m unit

# Integration tests only
make test-integration
# or
uv run -- python -m pytest -m "integration and not live_integration"

# With coverage
make coverage
# or
uv run -- python -m pytest --cov=benchbox --cov-report=term-missing

Cloud Platform Notebooks

Ready-to-run notebooks for major cloud platforms are available in examples/notebooks:

  • Databricks: examples/notebooks/databricks_benchmarking.ipynb
  • BigQuery: examples/notebooks/bigquery_benchmarking.ipynb
  • Snowflake: examples/notebooks/snowflake_benchmarking.ipynb
  • Redshift: examples/notebooks/redshift_benchmarking.ipynb
  • ClickHouse: examples/notebooks/clickhouse_benchmarking.ipynb

See examples/notebooks/README.md for structure, prerequisites, and selection guidance.

Extending BenchBox

Add new benchmarks by extending BaseBenchmark:

from benchbox import BaseBenchmark

class MyCustomBenchmark(BaseBenchmark):
    def __init__(self, scale_factor=1.0, **kwargs):
        super().__init__(scale_factor=scale_factor, **kwargs)
        # Custom initialization

    def generate_data(self, tables=None, output_format="memory"):
        # Implement data generation logic
        pass

    def get_query(self, query_id):
        # Return a specific query
        pass

    def get_all_queries(self):
        # Return all benchmark queries
        pass

    def execute_query(self, query_id, connection, params=None):
        # Execute a query against a database
        pass
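
For instance, a minimal hypothetical subclass that serves one hard-coded query and generates no data, assuming only the method signatures shown above:

from benchbox import BaseBenchmark

class TinyBenchmark(BaseBenchmark):
    """Hypothetical example: a single fixed query, no generated data."""

    QUERIES = {1: "SELECT 42 AS answer"}

    def generate_data(self, tables=None, output_format="memory"):
        return []  # nothing to generate

    def get_query(self, query_id):
        return self.QUERIES[query_id]

    def get_all_queries(self):
        return dict(self.QUERIES)

    def execute_query(self, query_id, connection, params=None):
        # Works with any DB-API-style connection, e.g. duckdb.connect(":memory:")
        return connection.execute(self.get_query(query_id)).fetchall()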

Project Structure

BenchBox/
├── benchbox/                       # Main package directory
│   ├── __init__.py                 # Package initialization
│   ├── base.py                     # Base class for benchmarks
│   ├── tpch.py                     # TPC-H implementation
│   ├── core/                       # Core implementation modules
│   │   ├── __init__.py
│   │   ├── tpch/                   # TPC-H detailed implementation
│   │   │   ├── __init__.py
│   │   │   ├── benchmark.py        # Main TPC-H benchmark class
│   │   │   ├── generator.py        # Data generation logic
│   │   │   ├── queries.py          # Query management
│   │   │   └── schema.py           # Schema definition
├── tests/                          # Test directory
│   ├── __init__.py
│   ├── conftest.py                 # Common pytest fixtures
│   ├── test_tpch.py                # Tests for TPC-H benchmark
│   ├── test_tpch_comprehensive.py  # Comprehensive TPC-H tests
│   ├── specialized/                # Specialized test cases
│   │   ├── test_tpch_minimal.py    # Minimal TPC-H tests
│   │   └── test_tpcds_minimal.py   # Minimal TPC-DS tests
│   ├── utilities/                  # Unified test utilities
│   │   ├── unified_test_runner.py  # Unified test runner
│   │   └── benchmark_validator.py  # Benchmark validation
│   ├── integration/                # Integration tests
│   │   ├── __init__.py
│   │   └── test_database_integration.py  # Database integration tests
├── examples/                       # Example scripts and documentation
│   ├── getting_started/            # Beginner-friendly examples
│   │   ├── local/                  # DuckDB and SQLite examples
│   │   └── cloud/                  # Cloud platform examples
│   ├── features/                   # Feature-specific examples (8 files)
│   │   ├── test_types.py           # Power, throughput, maintenance tests
│   │   ├── query_subset.py         # Query selection strategies
│   │   ├── tuning_comparison.py    # Baseline vs tuned comparison
│   │   ├── result_analysis.py      # Result loading and comparison
│   │   ├── multi_platform.py       # Multi-platform execution
│   │   ├── export_formats.py       # JSON, CSV, HTML export
│   │   ├── data_validation.py      # Data quality checks
│   │   └── performance_monitoring.py # Resource monitoring
│   ├── use_cases/                  # Production-ready patterns (4 files)
│   │   ├── ci_regression_test.py   # CI/CD regression testing
│   │   ├── platform_evaluation.py  # Platform comparison
│   │   ├── incremental_tuning.py   # Iterative optimization
│   │   └── cost_optimization.py    # Cost management
│   ├── programmatic/               # Python API documentation
│   │   └── README.md               # API reference and integration examples
│   ├── BENCHMARK_GUIDE.md          # Quick reference for all benchmarks
│   ├── INDEX.md                    # Complete examples navigation
│   └── PATTERNS.md                 # Common workflow patterns
├── Makefile                        # Build and test automation
├── pytest.ini                     # Fast local pytest configuration
├── pytest-ci.ini                  # CI pytest profile (coverage + reports)
└── README.md                       # Project README

Contributing

As alpha software, BenchBox benefits greatly from community feedback and contributions. Here's how you can help:

Reporting Issues

Bug Reports: Found a problem? Create an issue with:

  • Steps to reproduce the issue
  • Expected vs actual behavior
  • Environment details (Python version, platform, database)
  • Minimal code example if possible

Feature Requests: Have an idea? Open an issue describing:

  • The use case and problem you're trying to solve
  • Proposed solution or approach
  • How it fits with existing functionality

Community Guidelines

  • Be patient: As alpha software, responses may take time
  • Search first: Check existing issues before creating new ones
  • Be specific: Detailed reports help us understand and fix issues faster
  • Stay constructive: Focus on problems and solutions, not criticism

Development Contributions

Ready to contribute code? Here's the process:

  1. Fork and clone the repository
  2. Install dependencies: uv sync --group dev (or uv pip install -e ".[dev]")
  3. Run tests: make test to ensure everything works
  4. Make changes with appropriate tests
  5. Test thoroughly: make test-all and make lint
  6. Submit pull request with clear description of changes

Contact

  • GitHub Issues: Primary channel for bugs and features
  • Discussions: Use GitHub discussions for questions and ideas
  • Email: For security issues or private concerns: joe@benchbox.dev

Disclaimer

BenchBox is an independent personal project by Joe Harris, not affiliated with any past or present employer. See DISCLAIMER.md for full details.

License

This project is licensed under the MIT License - see the LICENSE file for details.
