"Running a benchmark should be as simple as import {benchmark}"
BenchBox is a "benchmarking toolbox" that makes it simple to benchmark analytic (OLAP) databases.
BenchBox provides industry-standard (TPC-H, TPC-DS), academic (Join Order), and custom-designed (Primitives) benchmarks for data warehouse workloads.
BenchBox packages the entire benchmark lifecycle (query and data generation, execution, result analysis, and reporting) in a single Python tool with simple setup.
BenchBox uses Python-native interfaces for popular local data tools (DuckDB, DataFusion, Polars) and cloud platforms (Snowflake, Databricks, ClickHouse).
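In the spirit of that tagline, a minimal session looks like this (a short sketch using the `TPCH` class and methods documented later in this README):

```python
from benchbox import TPCH

# Tiny scale factor for a quick local smoke test
tpch = TPCH(scale_factor=0.01)
tpch.generate_data()               # write the benchmark tables to disk
print(tpch.get_query(1, seed=42))  # TPC-H Query 1 with reproducible parameters
```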
BenchBox loosely follows Semantic Versioning using the MAJOR.MINOR.PATCH scheme. Variations on the "official" SemVer spec are made to better fit the nature of BenchBox as an evolving benchmarking tool rather than a stable API library. See below for details.
- MAJOR when we make incompatible changes OR significant changes in scope or functionality.
- MINOR when we add backward-compatible changes OR significantly expand functionality.
- PATCH when we make bug fixes or documentation updates; note that bug fixes may not be backward-compatible.
Current release: v0.1.1. Check your installation with `benchbox --version`, which also reports metadata consistency diagnostics pulled from pyproject.toml and documentation markers.
For Developers: See Release Automation Guide for the automated release process with reproducible builds and timestamp normalization.
BenchBox is ALPHA software. APIs may change, features may be incomplete, and production use is not recommended. See DISCLAIMER.md for full details on what this means and how to get help.
- Embedded Benchmarks: Self-contained benchmark data and queries
- Eighteen Benchmarks: TPC-H, TPC-DS, TPC-DI, TPC-DS-OBT, TPC-H Skew, TPC-Havoc, SSB, AMPLab, JoinOrder, ClickBench, H2ODB, NYC Taxi, TSBS DevOps, CoffeeShop, TPC-H Data Vault, Read Primitives, Write Primitives, Transaction Primitives
- Cross-Database: Same benchmarks work on any database platform
- DataFrame Mode: Native DataFrame API benchmarking with Polars, Pandas, and five other libraries
- SQL Platforms (31): DuckDB, MotherDuck, SQLite, DataFusion, PostgreSQL, TimescaleDB, Polars, ClickHouse, Firebolt, InfluxDB, Databricks, Snowflake, BigQuery, Redshift, Azure Synapse, Microsoft Fabric Warehouse, Trino, Starburst, Presto, Athena, Spark, PySpark, AWS Glue, Amazon EMR Serverless, Athena Spark, GCP Dataproc, GCP Dataproc Serverless, Microsoft Fabric Spark, Azure Synapse Spark, Snowpark Connect, Onehouse Quanton
- DataFrame Platforms (7): Polars-DF, Pandas-DF, DataFusion-DF, Modin-DF, Dask-DF, cuDF-DF (GPU), PySpark-DF
- Open Table Formats: Delta Lake, Apache Iceberg, Apache Hudi (via Databricks, Quanton, Trino, Spark platforms)
- SQL Translation: Automatic query conversion between SQL dialects (see the sketch after this list)
- Self-Contained Python Package: Core install requires no external database servers or system dependencies; opt-in to extra package installs for cloud platforms when needed.
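The dialect translation layer is built on sqlglot (listed under core dependencies below). As an illustration of the general mechanism (not BenchBox's internal API), sqlglot can transpile a query from one dialect to another:

```python
import sqlglot

# Translate a DuckDB-flavored query into Snowflake SQL.
sql = "SELECT date_trunc('month', o_orderdate) AS mo, sum(o_totalprice) FROM orders GROUP BY 1"
print(sqlglot.transpile(sql, read="duckdb", write="snowflake")[0])
```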
The full documentation lives under docs/ and is published with Sphinx.
| Topic | Where to start |
|---|---|
| Install & first benchmark | docs/usage/getting-started.md |
| Everyday CLI workflows | docs/usage/cli-quick-start.md |
| DataFrame benchmarking | docs/platforms/dataframe.md |
| Config and automation | docs/usage/configuration.md & docs/usage/examples.md |
| Platform guidance | docs/platforms/platform-selection-guide.md |
| Troubleshooting | docs/TROUBLESHOOTING.md |
| Developer docs | docs/DEVELOPMENT.md & docs/design/README.md |
Run `uv run -- sphinx-build -b html docs docs/_build/html` to build the local site.
- TPC Standards: TPC-H, TPC-DS, TPC-DI
- Academic Benchmarks: SSB, AMPLab, JoinOrder
- Industry Benchmarks: ClickBench, H2ODB, NYC Taxi, TSBS DevOps, CoffeeShop
- Data Modeling Variants: TPC-H Data Vault
- BenchBox Primitives: Read Primitives, Write Primitives, Transaction Primitives
- BenchBox Experimental: TPC-DS-OBT, TPC-Havoc, TPC-H Skew
BenchBox is one of several open-source database benchmarking tools. Each has different strengths:
BenchBox focuses on OLAP analytic benchmarks while BenchBase focuses on OLTP transactional benchmarks.
BenchBox provides Python-native benchmarking with embedded data generation, while BenchBase uses Java with JDBC drivers. Each has its own setup requirements: BenchBox needs Python dependencies and database connections, while BenchBase needs Java and JDBC configuration.
Consider BenchBox for analytical workloads when you prefer Python-based tooling. Consider BenchBase for transactional workloads or when you need mature, production-tested benchmarking infrastructure with diverse OLTP workloads (TPC-C, Twitter, YCSB, etc.).
BenchBox and HammerDB (hosted by the TPC Council) target different workload types.
HammerDB focuses on OLTP transactional benchmarks (TPROC-C derived from TPC-C) with support for enterprise databases - Oracle, SQL Server, PostgreSQL, MySQL/MariaDB, and IBM Db2. It uses Tcl and provides GUI, CLI, and web service interfaces. HammerDB measures throughput in NOPM (New Orders Per Minute) and has decades of enterprise credibility.
BenchBox focuses on OLAP analytical benchmarks (TPC-H, TPC-DS, ClickBench, etc.) with support for the broad platform spectrum - from embedded engines like DuckDB and DataFusion, through DataFrame libraries like Polars and Pandas, to cloud data warehouses like Snowflake, BigQuery, and Databricks.
Consider HammerDB when testing transactional throughput on enterprise databases or when you need TPC Council-sponsored credibility. Consider BenchBox when benchmarking analytical queries across cloud data warehouses, embedded engines, or DataFrame libraries.
BenchBox and LakeBench are both Python-based benchmarking frameworks, but target different ecosystems.
LakeBench focuses on lakehouse compute engines (Spark, Fabric, Synapse, HDInsight) and evaluates end-to-end ELT workflows - ingestion, transformation, maintenance, and queries - using Delta Lake tables. It offers 4 benchmarks including ELTBench, a custom workflow-oriented benchmark.
BenchBox focuses on the broad ecosystem of analytic platforms, from single-node engines like DuckDB, through DataFrame libraries like Polars and Pandas, to cloud data warehouses like Snowflake, BigQuery, and Redshift. It provides 18 benchmarks including TPC standards, academic workloads like SSB and JoinOrder, and BenchBox-original benchmarks like TPC-Havoc for optimizer stress testing.
Consider LakeBench when evaluating Spark-based lakehouse engines, testing complete ELT pipeline performance, or working primarily in Microsoft Fabric/Azure environments. Consider BenchBox when benchmarking across the analytic platform spectrum, needing benchmark variety beyond TPC standards, or comparing DataFrame libraries alongside SQL engines.
BenchBox ships as a Python package with optional extras that enable specific database platforms. Start with the core installation, then layer in the extras that match your environment.
The base package includes everything you need for local development, DuckDB, and SQLite workflows.
- Embedded DuckDB engine for quick benchmarks
- Local data generators and CLI utilities
- SQLite integration for lightweight testing
- Does not include remote warehouse connectors (Databricks, Snowflake, etc.)
Install the core package with your preferred tool:
Recommended (using uv):
uv add benchbox
Alternative (pip-compatible):
uv pip install benchbox
# or
python -m pip install benchbox
# or
pipx install benchbox
Extras unlock connectors and helpers for each platform. Quote the extras specification so shells like zsh do not expand the brackets.
- `[cloud]` – Databricks, BigQuery, Redshift, Snowflake connectors (recommended starting point)
- `[cloudstorage]` – Cloud storage helpers (`cloudpathlib`)
- `[databricks]` – Databricks SQL Warehouses (`databricks-sql-connector`, `cloudpathlib`)
- `[bigquery]` – Google BigQuery (`google-cloud-bigquery`, `google-cloud-storage`, `cloudpathlib`)
- `[redshift]` – Amazon Redshift (`redshift-connector`, `boto3`, `cloudpathlib`)
- `[snowflake]` – Snowflake Data Cloud (`snowflake-connector-python`, `cloudpathlib`)
- `[clickhouse]` – ClickHouse Analytics (`clickhouse-driver`)
- `[datafusion]` – Apache DataFusion OLAP Engine (`datafusion`, `pyarrow`)
- `[all]` – Everything (all connectors, cloud tooling, ClickHouse, and DataFusion)
Choose the installation that matches your environment and requirements:
| Use Case | Platforms Enabled | Extras | Recommended Command (uv) | Alternative (pip-compatible) |
|---|---|---|---|---|
| Local development & testing | DuckDB, SQLite | (none) | `uv add benchbox` | `uv pip install benchbox` |
| Cloud storage helpers | S3, GCS, Azure path utilities | `[cloudstorage]` | `uv add benchbox --extra cloudstorage` | `uv pip install "benchbox[cloudstorage]"` |
| All cloud platforms | Databricks, BigQuery, Redshift, Snowflake | `[cloud]` | `uv add benchbox --extra cloud` | `uv pip install "benchbox[cloud]"` |
| Everything included | All platforms + ClickHouse | `[all]` | `uv add benchbox --extra all` | `uv pip install "benchbox[all]"` |
| Development with cloud | Core + all platforms + dev tools | `[cloud,dev]` | `uv add benchbox --extra cloud --extra dev` | `uv pip install "benchbox[cloud,dev]"` |
| Platform | What's Included | Extras | Recommended Command (uv) | Alternative (pip-compatible) |
|---|---|---|---|---|
| Databricks | SQL Warehouses, Unity Catalog, DBFS | `[databricks]` | `uv add benchbox --extra databricks` | `uv pip install "benchbox[databricks]"` |
| Google BigQuery | BigQuery, Cloud Storage | `[bigquery]` | `uv add benchbox --extra bigquery` | `uv pip install "benchbox[bigquery]"` |
| Amazon Redshift | Redshift, S3 integration | `[redshift]` | `uv add benchbox --extra redshift` | `uv pip install "benchbox[redshift]"` |
| Snowflake | Snowflake Data Cloud | `[snowflake]` | `uv add benchbox --extra snowflake` | `uv pip install "benchbox[snowflake]"` |
| ClickHouse | ClickHouse Analytics | `[clickhouse]` | `uv add benchbox --extra clickhouse` | `uv pip install "benchbox[clickhouse]"` |
| DataFusion | Apache DataFusion OLAP Engine | `[datafusion]` | `uv add benchbox --extra datafusion` | `uv pip install "benchbox[datafusion]"` |
| Scenario | Recommended Command (uv) | Alternative (pip-compatible) | Use Case |
|---|---|---|---|
| Multi-cloud analytics | `uv add benchbox --extra cloud --extra clickhouse` | `uv pip install "benchbox[cloud,clickhouse]"` | Compare cloud platforms + ClickHouse |
| Full development setup | `uv add benchbox --extra all --extra dev --extra docs` | `uv pip install "benchbox[all,dev,docs]"` | Contributing to BenchBox |
| AWS-focused | `uv add benchbox --extra redshift` | `uv pip install "benchbox[redshift]"` | Amazon Redshift only |
| Google Cloud-focused | `uv add benchbox --extra bigquery` | `uv pip install "benchbox[bigquery]"` | Google BigQuery only |
| Azure-compatible | `uv add benchbox --extra databricks --extra snowflake` | `uv pip install "benchbox[databricks,snowflake]"` | Databricks + Snowflake on Azure |
All installation commands above work with different Python package managers:
| Package Manager | Recommended Format | Example | Alternative (pip-compatible) |
|---|---|---|---|
| uv (recommended) | `uv add benchbox --extra <name>` | `uv add benchbox --extra cloud` | `uv pip install "benchbox[cloud]"` |
| pip | `python -m pip install "benchbox[extras]"` | `python -m pip install "benchbox[cloud]"` | N/A (only format available) |
| pipx | `pipx install "benchbox[extras]"` | `pipx install "benchbox[cloud]"` | N/A (only format available) |
Note: When using pip-compatible syntax, quote the package specification (`"benchbox[extras]"`) to prevent shell expansion in zsh and other shells. The `uv add` syntax doesn't require quotes.
You can combine extras in a single installation command. Order does not matter.
Recommended (using uv):
uv add benchbox --extra cloud --extra clickhouse
Alternative (pip-compatible):
uv pip install "benchbox[cloud,clickhouse]"
python -m pip install "benchbox[cloud,clickhouse]"
pipx install "benchbox[cloud,clickhouse]"Already installed BenchBox? Re-run the installer with the extras you need or use pipx inject benchbox "benchbox[cloud]" to add connectors to an existing pipx environment.
Use the built-in dependency checker to confirm that everything is ready before running benchmarks.
# Overview of installed extras
benchbox check-deps
# Focus on a single platform
benchbox check-deps --platform databricks
# View the installation matrix in the terminal
benchbox check-deps --matrix
# Include detailed guidance and next steps
benchbox check-deps --verbose
Shell and Package Manager Issues:
- Shell quoting errors (`zsh: no matches found`) – wrap extras in quotes: `"benchbox[cloud]"`
- `uv` not installed – install with `pipx install uv` or use `python -m pip install ...` instead
- `pip` cannot find wheels – upgrade packaging tools: `python -m pip install --upgrade pip setuptools wheel`
- Conflicting virtual environments – remove old installs: `pip uninstall benchbox` before re-installing
Platform-Specific Compilation Issues:
- macOS SSL errors – update certificates: `/Applications/Python 3.x/Install Certificates.command`
- Windows Visual C++ build tools missing – install the "Desktop development with C++" workload from Visual Studio Installer
- Linux missing development packages – install build tools: `sudo apt-get install build-essential` (Ubuntu/Debian) or `sudo yum groupinstall "Development Tools"` (RHEL/CentOS)
Cloud Platform Authentication:
- Databricks connection issues – verify SQL warehouse is running and accessible
- BigQuery authentication errors – ensure service account credentials or `gcloud auth` is configured
- Snowflake connection timeouts – check network connectivity and account URL format
- Redshift SSL errors – verify cluster security group allows connections
After installation, verify everything works:
# Check if BenchBox is installed and working
benchbox --version
# Verify your platform dependencies
benchbox check-deps
# Test core functionality
python -c "from benchbox import TPCH; print('✅ BenchBox core working')"
# Test specific platform (example for BigQuery)
python -c "from benchbox.platforms.bigquery import BigQueryAdapter; print('✅ BigQuery connector working')"Import Error: No module named 'benchbox'
# Verify installation
pip list | grep benchbox
# If missing, reinstall
uv add benchbox
# or: uv pip install benchbox
ModuleNotFoundError for platform-specific connectors
# Install the missing platform extra
uv add benchbox --extra databricks
# or: uv pip install "benchbox[databricks]"Permission denied errors
# Use user installation if you don't have admin rights
python -m pip install --user "benchbox[cloud]"
Virtual environment conflicts
# Clean install in fresh environment
python -m venv fresh_env
source fresh_env/bin/activate # or `fresh_env\Scripts\activate` on Windows
uv add benchbox --extra cloud
# or: uv pip install "benchbox[cloud]"For detailed platform-specific setup guides, see Platform Documentation and the Troubleshooting Guide.
BenchBox uses a layered dependency approach: minimal core dependencies for local development plus optional extras for specific platforms.
These libraries are required for every installation and provide complete local benchmarking functionality:
- sqlglot – SQL dialect translation between databases
- click – Command-line interface framework
- rich – Terminal output formatting and progress indicators
- psutil – System resource monitoring
- pydantic – Data validation and configuration models
- pyyaml – YAML configuration file support
- duckdb – Embedded analytical database engine
- pytest libraries – Testing framework components for built-in validation
The core package includes all necessary Python dependencies for local benchmarking - DuckDB is embedded and ready to go. No external database servers or system installations are required for basic functionality.
These extras add connectivity to specific platforms and are installed only when needed:
Cloud Platform SDKs:
- `[cloud]` – All major cloud platforms (Databricks, BigQuery, Redshift, Snowflake)
- `[databricks]` – Databricks SQL Warehouses (`databricks-sql-connector`, `cloudpathlib`)
- `[bigquery]` – Google BigQuery and Cloud Storage (`google-cloud-bigquery`, `google-cloud-storage`)
- `[redshift]` – Amazon Redshift (`redshift-connector`, `boto3` for S3)
- `[snowflake]` – Snowflake Data Cloud (`snowflake-connector-python`)
Database-Specific Drivers:
- `[clickhouse]` – ClickHouse Analytics (`clickhouse-driver`)
Development Tools:
- `[dev]` – Development dependencies (additional testing tools)
- `[docs]` – Documentation generation tools
- Fast installation: Core package installs quickly with minimal dependencies
- No vendor lock-in: Install only the platforms you actually use
- Reduced conflicts: Platform-specific dependencies are isolated
- Easy maintenance: Update cloud SDKs independently of core functionality
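As a rough sketch of what this isolation means at runtime (a hypothetical check for illustration only; `benchbox check-deps` is the supported way), you can probe whether a platform's driver module is importable before selecting that platform:

```python
from importlib import util

# Hypothetical mapping from extras to the driver modules they install.
EXTRA_DRIVERS = {
    "snowflake": "snowflake.connector",
    "clickhouse": "clickhouse_driver",
    "bigquery": "google.cloud.bigquery",
}

for extra, module in EXTRA_DRIVERS.items():
    try:
        found = util.find_spec(module) is not None
    except ModuleNotFoundError:  # parent package absent entirely
        found = False
    print(f"[{extra}] {'installed' if found else 'missing'}")
```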
Get started with BenchBox in 3 steps:
Choose the installation that matches your target platform:
Recommended (using uv):
# For local development (DuckDB only)
uv add benchbox
# For cloud platforms (recommended)
uv add benchbox --extra cloud
# For everything (all platforms + ClickHouse)
uv add benchbox --extra all
Alternative (pip-compatible):
uv pip install benchbox
uv pip install "benchbox[cloud]"
uv pip install "benchbox[all]"Check that everything is working:
# Verify BenchBox is installed
benchbox --version
# Check available platforms
benchbox check-deps --matrix
Start with a simple local benchmark:
from benchbox import TPCH
# Create a small TPC-H benchmark for testing
tpch = TPCH(scale_factor=0.01) # ~10MB dataset for quick testing
# Generate sample data
print("Generating data...")
data_paths = tpch.generate_data()
print(f"✅ Generated {len(data_paths)} data files")
# Get a sample query
query1 = tpch.get_query(1, seed=42) # Reproducible parameters
print(f"✅ Generated TPC-H Query 1")
# Run on embedded DuckDB (no setup required)
import duckdb
conn = duckdb.connect(":memory:")
# Create schema and load data
conn.execute(tpch.get_create_tables_sql())
for table_file in data_paths:
table_name = table_file.split('/')[-1].replace('.csv', '')
conn.execute(f"COPY {table_name} FROM '{table_file}' WITH (DELIMITER '|', HEADER false)")
# Execute the query
result = conn.execute(query1).fetchdf()
print(f"✅ Query executed successfully, returned {len(result)} rows")Compare SQL vs DataFrame execution paradigms:
# SQL mode - queries executed via SQL
benchbox run --platform duckdb --benchmark tpch --scale 0.01
# DataFrame mode - queries executed via native Polars API
benchbox run --platform polars-df --benchmark tpch --scale 0.01
Same benchmark, same scale factor, different execution paradigm.
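To make the two paradigms concrete, here is the same TPC-H Q6-style aggregation expressed both ways. This is an illustrative sketch that drives DuckDB and Polars directly, outside BenchBox's runners, and assumes both libraries are installed:

```python
import duckdb
import polars as pl

lineitem = pl.DataFrame({
    "l_quantity": [10.0, 30.0, 5.0],
    "l_extendedprice": [1000.0, 2000.0, 500.0],
    "l_discount": [0.06, 0.05, 0.07],
})

# SQL paradigm: the query is a string handed to the engine
# (DuckDB can scan the in-scope Polars DataFrame directly).
sql_revenue = duckdb.sql("""
    SELECT sum(l_extendedprice * l_discount) AS revenue
    FROM lineitem
    WHERE l_discount BETWEEN 0.05 AND 0.07 AND l_quantity < 24
""").fetchone()[0]

# DataFrame paradigm: the same logic as native API calls.
df_revenue = (
    lineitem
    .filter((pl.col("l_discount").is_between(0.05, 0.07)) & (pl.col("l_quantity") < 24))
    .select((pl.col("l_extendedprice") * pl.col("l_discount")).sum().alias("revenue"))
    .item()
)

print(sql_revenue, df_revenue)  # identical results, different execution paths
```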
For Cloud Platforms:
- See Platform Documentation for platform-specific setup
- Start with `examples/getting_started/` for zero-config DuckDB runs and credential-ready cloud samples
- Use `examples/BENCHMARK_GUIDE.md` for a quick reference on running all 18 benchmarks
- Explore `examples/features/` for capability-specific examples (query subsets, tuning, result analysis, etc.)
- Check `examples/use_cases/` for real-world patterns (CI/CD regression testing, platform evaluation, cost optimization)
- See `examples/programmatic/` for Python API usage and integration patterns
- Use `--dry-run OUTPUT_DIR` on the CLI or example scripts to export a JSON/YAML plan and per-query SQL files before executing benchmarks
- Use the `benchbox run` CLI for full benchmark execution
For Advanced Usage:
- Explore all 18 benchmark suites: TPC-H, TPC-DS, TPC-DI, ClickBench, H2ODB, and more
- Scale up with larger datasets (scale factors 1.0, 10.0, 100.0+)
- Compare performance across different platforms
- See examples/INDEX.md for complete examples navigation
- See examples/PATTERNS.md for common workflow patterns
BenchBox provides a comprehensive command-line interface (CLI) for all benchmarking operations, from data generation to result analysis.
| Command | Purpose | Example |
|---|---|---|
| `benchbox run` | Execute benchmarks | `benchbox run --platform duckdb --benchmark tpch` |
| `benchbox shell` | Interactive SQL shell | `benchbox shell --last --benchmark tpch` |
| `benchbox platforms` | Manage database platforms | `benchbox platforms list` |
| `benchbox check-deps` | Check dependencies | `benchbox check-deps --platform databricks` |
| `benchbox profile` | System analysis | `benchbox profile` |
| `benchbox benchmarks` | Manage benchmark suites | `benchbox benchmarks list` |
Local Development:
# TPC-H benchmark on DuckDB
benchbox run --platform duckdb --benchmark tpch
# Run specific queries only (in custom order)
benchbox run --platform duckdb --benchmark tpch --queries "Q1,Q6,Q17"
# Run a single query for testing
benchbox run --platform duckdb --benchmark tpch --queries "Q1"
# Explore benchmark data interactively
benchbox shell --last --benchmark tpch
# System analysis for optimization recommendations
benchbox profile
# See all CLI examples
benchbox run --help examples
# Check available benchmarks
benchbox benchmarks list
The `--queries` flag allows you to run a subset of benchmark queries in your specified order, useful for debugging and focused testing:
# Run specific TPC-H queries in custom order
benchbox run --platform duckdb --benchmark tpch --queries "1,6,17"
# Run single query for debugging
benchbox run --platform duckdb --benchmark tpch --queries "6"
# TPC-DS queries (1-99)
benchbox run --platform duckdb --benchmark tpcds --queries "1,2,3"
Query ID Ranges by Benchmark:
- TPC-H: 1-22
- TPC-DS: 1-99
- SSB: 1-13
- TPC-H Compliance: Using `--queries` overrides the official TPC-H stream permutation order, making results non-compliant with official TPC-H benchmarks. Use for development/debugging only.
- Validation Limits (see the sketch after these notes):
  - Maximum 100 queries per run
  - Query IDs must be alphanumeric (letters, numbers, dash, underscore)
  - Maximum 20 characters per query ID
  - Duplicate query IDs are removed automatically
- Phase Compatibility: Only applies to `power` and `standard` phases. Ignored for `warmup`, `throughput`, and `maintenance` phases.
- Order Preservation: Queries execute in exactly the order you specify, not the benchmark's default order.
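The validation rules above are easy to picture in code. A hypothetical re-implementation for illustration, not BenchBox's actual validator:

```python
import re

MAX_QUERIES, MAX_ID_LEN = 100, 20
VALID_ID = re.compile(r"^[A-Za-z0-9_-]+$")  # letters, numbers, dash, underscore

def validate_query_subset(raw: str) -> list[str]:
    """Parse a --queries value like "Q1,Q6,Q17" into a validated, ordered list."""
    ids, seen = [], set()
    for qid in (part.strip() for part in raw.split(",")):
        if len(qid) > MAX_ID_LEN or not VALID_ID.match(qid):
            raise ValueError(f"Invalid query ID: {qid!r}")
        if qid not in seen:          # duplicates dropped, first occurrence wins
            seen.add(qid)
            ids.append(qid)          # order preserved exactly as given
    if len(ids) > MAX_QUERIES:
        raise ValueError(f"Too many queries: {len(ids)} (max {MAX_QUERIES})")
    return ids

print(validate_query_subset("Q1,Q6,Q17,Q6"))  # ['Q1', 'Q6', 'Q17']
```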
Error Examples:
# ERROR: Invalid query ID for TPC-H (only 1-22 valid)
benchbox run --platform duckdb --benchmark tpch --queries "99"
# ❌ Invalid query IDs: 99. Available: 1-22
# ERROR: Invalid format (no special characters)
benchbox run --platform duckdb --benchmark tpch --queries "1;DROP TABLE"
# ❌ Invalid query ID format (must be alphanumeric)
# ERROR: Too many queries
benchbox run --platform duckdb --benchmark tpch --queries "1,2,3,...,101"
# ❌ Too many queries: 101 (max 100)
# ERROR: Incompatible phases
benchbox run --platform duckdb --benchmark tpch --queries "1,6" --phases warmup
# ❌ --queries only works with power/standard phases
Programmatic API Equivalent:
from benchbox.platforms.duckdb import DuckDBAdapter
from benchbox.tpch import TPCH
benchmark = TPCH(scale_factor=1.0)
adapter = DuckDBAdapter()
# Load data
adapter.load_benchmark_data(benchmark)
# Run specific queries
run_config = {
"query_subset": ["1", "6", "17"], # Note: parameter is 'query_subset'
"timeout": 60,
"verbose": True
}
results = adapter.run_standard_queries(benchmark, run_config)
Cloud Platform Setup:
# Check platform dependencies
benchbox check-deps --platform databricks
# Install platform dependencies (if needed)
uv add benchbox --extra databricks
# or: uv pip install "benchbox[databricks]"
# Configure platform
benchbox platforms setup
Production Benchmarking:
# Full TPC-DS benchmark on Databricks with tuning
benchbox run --platform databricks --benchmark tpcds --scale 1 \
--tuning tuned --phases power,throughput \
--output dbfs:/Volumes/workspace/benchmarks/
# BigQuery with custom configuration
benchbox run --platform bigquery --benchmark tpch --scale 0.1 \
--platform-option project_id=my-project \
--verbose
# Snowflake baseline comparison
benchbox run --platform snowflake --benchmark tpch --scale 1 \
--tuning notuning --output s3://my-bucket/baseline/
Data Generation and Testing:
# Generate test data only
benchbox run --benchmark tpch --scale 0.01 --phases generate \
--output ./test-data
# Preview configuration without execution
benchbox run --platform databricks --benchmark tpcds --scale 0.1 \
--dry-run ./preview
# Load data into database
benchbox run --platform duckdb --benchmark tpch --scale 0.1 \
--phases load --force
Multi-Phase Execution:
- `generate`: Create benchmark data files
- `load`: Load data into database
- `warmup`: Warm up database caches
- `power`: Execute single-stream queries
- `throughput`: Execute concurrent query streams
- `maintenance`: Execute data maintenance operations
Platform Integration:
- Automatic platform detection and configuration
- Platform-specific options via `--platform-option KEY=VALUE`
- Cloud storage support (S3, GCS, Azure Blob, DBFS)
- Authentication via environment variables

Advanced Configuration:
- Tuning modes: `tuned`, `notuning`, or custom config files
- Compression options: `none`, `gzip`, `zstd` with configurable levels
- Validation: preflight and post-load data validation
- Reproducible runs with seed control

Output and Analysis:
- Multiple output formats: JSON, CSV, HTML
- Dry-run mode for configuration preview
- Verbose logging for debugging
- Query plan analysis with `--show-query-plans`
Interactive Mode (Default):
# Guided setup with system recommendations
benchbox run
Non-Interactive Mode:
# Direct execution with all parameters specified
benchbox run --platform duckdb --benchmark tpch --scale 0.01 \
--non-interactive
# Automation-friendly with environment variables
BENCHBOX_NON_INTERACTIVE=true benchbox run \
--platform databricks --benchmark tpcds --quiet
Databricks:
# Databricks SQL Warehouse with Unity Catalog
benchbox run --platform databricks --benchmark tpch --scale 1 \
--platform-option catalog=main \
--platform-option schema=benchbox \
--output dbfs:/Volumes/main/benchbox/results/
# Check available Databricks options
benchbox run --describe-platform-options databricks
BigQuery:
# BigQuery with custom project and dataset
benchbox run --platform bigquery --benchmark tpcds --scale 0.1 \
--platform-option project_id=my-project \
--platform-option dataset=benchbox \
--output gs://my-bucket/benchmarks/
# BigQuery with specific location
benchbox run --platform bigquery --benchmark tpch \
--platform-option location=europe-west1
Snowflake:
# Snowflake with custom warehouse
benchbox run --platform snowflake --benchmark tpch --scale 1 \
--platform-option warehouse=LARGE_WH \
--platform-option database=BENCHBOX \
--tuning tuned
# Snowflake baseline run
benchbox run --platform snowflake --benchmark tpcds \
--tuning notuning --phases power
ClickHouse:
# Local ClickHouse instance
benchbox run --platform clickhouse --benchmark clickbench \
--platform-option mode=local \
--platform-option port=9000
# ClickHouse with TLS
benchbox run --platform clickhouse --benchmark tpch \
--platform-option secure=true \
--platform-option port=9440
Common Issues:
- Platform Dependencies Missing:
# Check what's needed
benchbox check-deps --platform databricks
# Install missing dependencies
uv add benchbox --extra databricks
# or: uv pip install "benchbox[databricks]"- Authentication Errors:
# Check platform status
benchbox platforms status
# Verify environment variables
echo $DATABRICKS_TOKEN
- Memory or Storage Issues:
# Profile system for recommendations
benchbox profile
# Use smaller scale factors
benchbox run --platform duckdb --benchmark tpch --scale 0.001
- Configuration Problems:
# Validate configuration
benchbox validate
# Preview settings with dry-run
benchbox run --dry-run ./debug --platform duckdb --benchmark tpch
Get Help:
# General help
benchbox --help
# Command-specific help
benchbox run --help
benchbox platforms --help
# Platform options
benchbox run --describe-platform-options clickhouse
Enable Verbose Output:
# Standard verbose logging
benchbox run --verbose --platform duckdb --benchmark tpch
# Very verbose for debugging
benchbox run -vv --platform duckdb --benchmark tpch
For complete CLI documentation, see CLI Reference.
When you pass a remote output root (dbfs:/, s3://, gs://, abfss://), BenchBox appends the dataset suffix automatically for consistency with local paths.
CLI example:
benchbox run \
--platform databricks \
--benchmark tpch \
--scale 0.01 \
--output dbfs:/Volumes/workspace/raw/source/
# Writes to: dbfs:/Volumes/workspace/raw/source/tpch_sf01
Platform adapters register their own CLI options, supplied via the generic `--platform-option` flag. Each option follows a KEY=VALUE format and can be provided multiple times. For example, to run ClickHouse in local mode with TLS enabled:
benchbox run \
--platform clickhouse \
--benchmark tpch \
--platform-option mode=local \
--platform-option secure=true
You can inspect the available options for any platform without executing a benchmark by using `--describe-platform-options`:
benchbox run --describe-platform-options clickhouse
Python helper:
from benchbox.utils.output_path import normalize_output_root
print(normalize_output_root("s3://bucket/prefix", "tpch", 0.01))
# s3://bucket/prefix/tpch_sf01
- TPC-H - 22 queries for data warehouses. Tests basic SQL operations with string columns and date predicates.
- Official site: http://www.tpc.org/tpch
- TPC-DS - 99 complex queries with CTEs, subqueries, window functions. Tests advanced SQL features.
- Official site: http://www.tpc.org/tpcds
- TPC-DI - ETL workflows and data integration testing. Focuses on data transformation pipelines.
- Official site: http://www.tpc.org/tpcdi
- SSB - Star schema queries for OLAP testing. Simplified dimensional modeling.
- Original paper: https://www.cs.umb.edu/~poneil/StarSchemaB.PDF
- AMPLab - Big data benchmark with text processing. Complex data patterns.
- Original site: https://amplab.cs.berkeley.edu/benchmark/
- Join Order - IMDB dataset for join optimization testing. Complex join patterns test cardinality estimation.
- Original paper: https://www.vldb.org/pvldb/vol9/p204-leis.pdf
- ClickBench - Real-world analytical queries from web analytics. Wide range of operations.
- Official site: https://benchmark.clickhouse.com
- H2ODB/db-benchmark - Data science operations. GroupBy and join patterns for analytical workloads.
- Current version: https://duckdblabs.github.io/db-benchmark/
- NYC Taxi - 25 OLAP queries on real NYC TLC taxi trip data. Temporal, geographic, and financial analytics.
- TSBS DevOps - Time Series Benchmark Suite for DevOps monitoring. 18 queries testing CPU, memory, disk, network metrics.
- Based on: https://github.com/timescale/tsbs
- CoffeeShop - Point-of-sale benchmark with regional weighting. 11 analytics queries on retail transaction data.
- Newly created for BenchBox
- TPC-H Data Vault - TPC-H queries adapted for Data Vault 2.0 modeling (Hubs, Links, Satellites). Tests enterprise DWH patterns.
- Newly created for BenchBox
- Read Primitives - 90+ queries testing aggregation, joins, filters, window functions, and advanced SQL operations.
- Newly created for BenchBox
- Write Primitives - 117 write operations testing INSERT, UPDATE, DELETE, BULK_LOAD, MERGE, DDL operations.
- Newly created for BenchBox
- Transaction Primitives - 8 transaction operations testing ACID compliance, isolation levels, savepoints.
- Newly created for BenchBox
- TPC-DS-OBT - TPC-DS queries adapted for a single denormalized "One Big Table" schema. Tests wide-table analytics.
- Newly created for BenchBox
- TPC-Havoc - Query optimizer stress testing. 220 query variants (22 TPC-H queries × 10 syntax variants).
- Newly created for BenchBox
- TPC-H Skew - TPC-H with configurable data skew distributions. Tests optimizer behavior on non-uniform data.
- Newly created for BenchBox
BenchBox provides complete TPC-H implementation:
- Data generation per specification
- 22 queries with parameter substitution
- Schema definition and SQL generation
- Database loading and query execution
- Stream generation for concurrent testing
- Performance measurement and reporting
from benchbox import TPCH
# Initialize TPC-H at scale factor 1 (~1GB data)
tpch = TPCH(scale_factor=1, output_dir="tpch_data")
# Generate data files (returns paths to the generated files)
data_files = tpch.generate_data()
# Get schema information
schema = tpch.get_schema()
# Get SQL to create tables
create_tables_sql = tpch.get_create_tables_sql()
# Get a specific query with random parameters
query1 = tpch.get_query(1)
# Get a query with specific parameters
params = {"days": 90} # 90 days for Query 1
query1_with_params = tpch.get_query(1, params=params, seed=42)
# Get all queries
all_queries = tpch.get_queries()
from benchbox import TPCH
# Initialize with verbose output
tpch = TPCH(scale_factor=1, output_dir="tpch_data", verbose=True)
# Generate data if needed
tpch.generate_data()
# Generate query streams for concurrent testing
stream_files = tpch.generate_streams(
num_streams=4, # 4 concurrent streams
rng_seed=42, # Reproducible parameters
streams_output_dir="streams"
)
# Get stream information
stream_info = tpch.get_all_streams_info()
for stream in stream_info:
print(f"Stream {stream['stream_id']}: {len(stream['queries'])} queries")
# Load data directly into a database
tpch.load_data_to_database(
connection_string="duckdb://tpch.db",
dialect="duckdb",
drop_existing=True
)
# Run individual queries with timing
result = tpch.run_query(
query_id=1,
connection_string="duckdb://tpch.db",
dialect="duckdb"
)
print(f"Query 1 took {result['execution_time']:.3f}s")
# Run the full benchmark
benchmark_results = tpch.run_benchmark(
connection_string="duckdb://tpch.db",
queries=[1, 2, 3, 4, 5], # Run specific queries
iterations=3, # Run each query 3 times
dialect="duckdb"
)
# Run concurrent streams
stream_results = tpch.run_streams(
connection_string="duckdb://tpch.db",
stream_files=stream_files,
concurrent=True,
dialect="duckdb"
)
from benchbox import TPCH
from pathlib import Path
import duckdb
# Initialize TPC-H benchmark
tpch = TPCH(scale_factor=0.1) # Small scale for quick testing
# Generate data
tpch.generate_data()
# Create a DuckDB database and connection
conn = duckdb.connect("tpch.db")
# Create tables
conn.execute(tpch.get_create_tables_sql())
# Load data using DuckDB's efficient CSV reading
data_files = tpch.generate_data()
for file_path in data_files:
table_name = Path(file_path).stem # Get filename without extension
print(f"Loading {table_name} from {file_path}")
# DuckDB can read CSV files directly with proper delimiter
conn.execute(f"""
COPY {table_name} FROM '{file_path}'
WITH (DELIMITER '|', HEADER false)
""")
# Run a query
query = tpch.get_query(1)
results = conn.execute(query).fetchall()
print(results)
# You can also run all queries and time them
import time
for query_id in range(1, 23): # TPC-H has 22 queries
query = tpch.get_query(query_id, seed=42) # Use seed for reproducible parameters
start_time = time.time()
result = conn.execute(query).fetchall()
end_time = time.time()
print(f"Query {query_id}: {len(result)} rows, {end_time - start_time:.3f}s")
# Clean up
conn.close()
Run the test suite using either make commands or direct pytest:
# Fast tests (default)
make test
# or
uv run -- python -m pytest -m fast
# All tests
make test-all
# or
uv run -- python -m pytest
# Specific benchmark tests
make test-tpch
# or
uv run -- python -m pytest -m tpch
# Unit tests only
make test-unit
# or
uv run -- python -m pytest -m unit
# Integration tests only
make test-integration
# or
uv run -- python -m pytest -m "integration and not live_integration"
# With coverage
make coverage
# or
uv run -- python -m pytest --cov=benchbox --cov-report=term-missing
Ready-to-run notebooks for major cloud platforms are available in examples/notebooks:
- Databricks: examples/notebooks/databricks_benchmarking.ipynb
- BigQuery: examples/notebooks/bigquery_benchmarking.ipynb
- Snowflake: examples/notebooks/snowflake_benchmarking.ipynb
- Redshift: examples/notebooks/redshift_benchmarking.ipynb
- ClickHouse: examples/notebooks/clickhouse_benchmarking.ipynb
See examples/notebooks/README.md for structure, prerequisites, and selection guidance.
Add new benchmarks by extending BaseBenchmark:
from benchbox import BaseBenchmark
class MyCustomBenchmark(BaseBenchmark):
def __init__(self, scale_factor=1.0, **kwargs):
super().__init__(scale_factor=scale_factor, **kwargs)
# Custom initialization
def generate_data(self, tables=None, output_format="memory"):
# Implement data generation logic
pass
def get_query(self, query_id):
# Return a specific query
pass
def get_all_queries(self):
# Return all benchmark queries
pass
def execute_query(self, query_id, connection, params=None):
# Execute a query against a database
pass
BenchBox/
├── benchbox/ # Main package directory
│ ├── __init__.py # Package initialization
│ ├── base.py # Base class for benchmarks
│ ├── tpch.py # TPC-H implementation
│ ├── core/ # Core implementation modules
│ │ ├── __init__.py
│ │ ├── tpch/ # TPC-H detailed implementation
│ │ │ ├── __init__.py
│ │ │ ├── benchmark.py # Main TPC-H benchmark class
│ │ │ ├── generator.py # Data generation logic
│ │ │ ├── queries.py # Query management
│ │ │ └── schema.py # Schema definition
├── tests/ # Test directory
│ ├── __init__.py
│ ├── conftest.py # Common pytest fixtures
│ ├── test_tpch.py # Tests for TPC-H benchmark
│ ├── test_tpch_comprehensive.py # Comprehensive TPC-H tests
│ ├── specialized/ # Specialized test cases
│ │ ├── test_tpch_minimal.py # Minimal TPC-H tests
│ │ └── test_tpcds_minimal.py # Minimal TPC-DS tests
│ ├── utilities/ # Unified test utilities
│ │ ├── unified_test_runner.py # Unified test runner
│ │ └── benchmark_validator.py # Benchmark validation
│ ├── integration/ # Integration tests
│ │ ├── __init__.py
│ │ └── test_database_integration.py # Database integration tests
├── examples/ # Example scripts and documentation
│ ├── getting_started/ # Beginner-friendly examples
│ │ ├── local/ # DuckDB and SQLite examples
│ │ └── cloud/ # Cloud platform examples
│ ├── features/ # Feature-specific examples (8 files)
│ │ ├── test_types.py # Power, throughput, maintenance tests
│ │ ├── query_subset.py # Query selection strategies
│ │ ├── tuning_comparison.py # Baseline vs tuned comparison
│ │ ├── result_analysis.py # Result loading and comparison
│ │ ├── multi_platform.py # Multi-platform execution
│ │ ├── export_formats.py # JSON, CSV, HTML export
│ │ ├── data_validation.py # Data quality checks
│ │ └── performance_monitoring.py # Resource monitoring
│ ├── use_cases/ # Production-ready patterns (4 files)
│ │ ├── ci_regression_test.py # CI/CD regression testing
│ │ ├── platform_evaluation.py # Platform comparison
│ │ ├── incremental_tuning.py # Iterative optimization
│ │ └── cost_optimization.py # Cost management
│ ├── programmatic/ # Python API documentation
│ │ └── README.md # API reference and integration examples
│ ├── BENCHMARK_GUIDE.md # Quick reference for all 18 benchmarks
│ ├── INDEX.md # Complete examples navigation
│ └── PATTERNS.md # Common workflow patterns
├── Makefile # Build and test automation
├── pytest.ini # Fast local pytest configuration
├── pytest-ci.ini # CI pytest profile (coverage + reports)
└── README.md # Project README
As alpha software, BenchBox benefits greatly from community feedback and contributions. Here's how you can help:
Bug Reports: Found a problem? Create an issue with:
- Steps to reproduce the issue
- Expected vs actual behavior
- Environment details (Python version, platform, database)
- Minimal code example if possible
Feature Requests: Have an idea? Open an issue describing:
- The use case and problem you're trying to solve
- Proposed solution or approach
- How it fits with existing functionality
- Be patient: As alpha software, responses may take time
- Search first: Check existing issues before creating new ones
- Be specific: Detailed reports help us understand and fix issues faster
- Stay constructive: Focus on problems and solutions, not criticism
Ready to contribute code? Here's the process:
- Fork and clone the repository
- Install dependencies: `uv sync --group dev` (or `uv pip install -e ".[dev]"`)
- Run tests: `make test` to ensure everything works
- Make changes with appropriate tests
- Test thoroughly: `make test-all` and `make lint`
- Submit pull request with clear description of changes
- GitHub Issues: Primary channel for bugs and features
- Discussions: Use GitHub discussions for questions and ideas
- Email: For security issues or private concerns: joe@benchbox.dev
BenchBox is an independent personal project by Joe Harris, not affiliated with any past or present employer. See DISCLAIMER.md for full details.
This project is licensed under the MIT License - see the LICENSE file for details.