High-performance, multi-protocol storage library for AI/ML workloads with universal copy operations across S3, Azure, GCS, local file systems, and DirectIO.
```bash
pip install s3dlio
```

s3dlio requires several system libraries. Install them before building:
Ubuntu/Debian:

```bash
# Quick install - run our helper script
./scripts/install-system-deps.sh

# Or manually:
sudo apt-get install -y \
    build-essential pkg-config libssl-dev \
    libhdf5-dev libhwloc-dev cmake
```

RHEL/CentOS/Fedora/Rocky/AlmaLinux:
```bash
# Quick install
./scripts/install-system-deps.sh

# Or manually:
sudo dnf install -y \
    gcc gcc-c++ make pkg-config openssl-devel \
    hdf5-devel hwloc-devel cmake
```

macOS:
```bash
# Quick install
./scripts/install-system-deps.sh

# Or manually:
brew install pkg-config openssl@3 hdf5 hwloc cmake

# Set environment variables (add to ~/.zshrc or ~/.bash_profile):
export PKG_CONFIG_PATH="$(brew --prefix openssl@3)/lib/pkgconfig:$PKG_CONFIG_PATH"
export OPENSSL_DIR="$(brew --prefix openssl@3)"
```

Arch Linux:
```bash
# Quick install
./scripts/install-system-deps.sh

# Or manually:
sudo pacman -S base-devel pkg-config openssl hdf5 hwloc cmake
```

Install the Rust toolchain if you don't already have it:

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
```

Build from source:

```bash
# Clone the repository
git clone https://github.com/russfellows/s3dlio.git
cd s3dlio

# Build with all features
cargo build --release --all-features

# Or build with default features (recommended)
cargo build --release

# Run tests
cargo test

# Build Python bindings (optional)
./build_pyo3.sh
```

Note: The hwloc library is optional but recommended for NUMA support on multi-socket systems. s3dlio will build without it but won't have NUMA topology detection.
- 5+ GB/s Performance: High-throughput S3 reads, 2.5+ GB/s writes
- Zero-Copy Architecture: `bytes::Bytes` throughout for minimal memory overhead
- Multi-Protocol: S3, Azure Blob, GCS, `file://`, `direct://` (O_DIRECT)
- Python & Rust: Native Rust library with zero-copy Python bindings (PyO3), `bytearray` support for efficient memory management
- Multi-Endpoint Load Balancing: RoundRobin/LeastConnections across storage endpoints
- AI/ML Ready: PyTorch DataLoader integration, TFRecord/NPZ format support
- High-Speed Data Generation: 50+ GB/s test data with configurable compression/dedup
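The zero-copy claims above rest on Python's buffer protocol: a `memoryview` (and NumPy's `frombuffer`) aliases existing memory instead of duplicating it. A stdlib-only illustration of the mechanism (no s3dlio required):

```python
# A bytearray owns a mutable buffer; memoryview aliases it without copying.
payload = bytearray(b"abcd" * 1024)  # 4 KiB of data

view = memoryview(payload)  # zero-copy: no bytes are duplicated

# Mutating the original buffer is visible through the view,
# proving both names refer to the same memory.
payload[0] = ord("Z")
assert view[0] == ord("Z")

# Slicing a memoryview is also zero-copy.
head = view[:4]
assert head.tobytes() == b"Zbcd"
```

This is the same mechanism s3dlio's Python bindings use to hand buffers to NumPy and PyTorch without intermediate copies.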
v0.9.50 (February 2026) - Python runtime fixes, s3torchconnector compat, range download optimization, multipart upload improvements.
Recent highlights:
- v0.9.50 - Python multi-threaded runtime fix (io_uring-style submit), s3torchconnector zero-copy rewrite, S3 range download optimization (76% faster for large objects), multipart upload zero-copy chunking, all 526 tests passing
- v0.9.40 - Enhanced Python bytearray documentation with performance benchmarks (2.5-3x speedup)
- v0.9.37 - Test suite modernization, zero build warnings
- v0.9.36 - BREAKING: `ObjectStore::put()` now takes `Bytes` instead of `&[u8]` for true zero-copy
- v0.9.35 - Hardware detection module, 50+ GB/s data generation
- v0.9.30 - Zero-copy refactor, PyO3 0.27 migration
📖 Complete Changelog - Full version history, migration guides, API details
For detailed release notes and migration guides, see the Complete Changelog.
Recent versions:
- v0.9.10 (19, October 2025) - Pre-stat size cache for benchmarking (2.5x faster multi-object downloads)
- v0.9.9 (18, October 2025) - Buffer pool optimization for DirectIO (15-20% throughput improvement)
- v0.9.8 (17, October 2025) - Dual GCS backend options, configurable page cache hints
- v0.9.6 (10, October 2025) - RangeEngine disabled by default (performance fix)
- v0.9.5 (9, October 2025) - Adaptive concurrency for deletes (10-70x faster)
- v0.9.3 (8, October 2025) - RangeEngine for Azure & GCS
- v0.9.2 (8, October 2025) - Graceful shutdown & configuration hierarchy
- v0.9.1 (8, October 2025) - Zero-copy Python API with BytesView
- v0.9.0 (7, October 2025) - bytes::Bytes migration (BREAKING)
- v0.8.x (2024-2025) - Production features (universal commands, OpLog, TFRecord indexing)
s3dlio provides unified storage operations across all backends with consistent URI patterns:
- 🗄️ Amazon S3: `s3://bucket/prefix/` - High-performance S3 operations (5+ GB/s reads, 2.5+ GB/s writes)
- ☁️ Azure Blob Storage: `az://container/prefix/` - Complete Azure integration with RangeEngine (30-50% faster for large blobs)
- 🌐 Google Cloud Storage: `gs://bucket/prefix/` or `gcs://bucket/prefix/` - Production ready with RangeEngine and full ObjectStore integration
- 📁 Local File System: `file:///path/to/directory/` - High-speed local file operations with RangeEngine support
- ⚡ DirectIO: `direct:///path/to/directory/` - Bypass OS cache for maximum I/O performance with RangeEngine
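The URI scheme alone selects the backend. A toy dispatcher sketches the routing idea (the mapping is illustrative; s3dlio's real dispatch happens inside the Rust library):

```python
from urllib.parse import urlsplit

# Scheme -> backend name, mirroring the URI patterns listed above.
BACKENDS = {
    "s3": "Amazon S3",
    "az": "Azure Blob Storage",
    "gs": "Google Cloud Storage",
    "gcs": "Google Cloud Storage",
    "file": "Local File System",
    "direct": "DirectIO",
}

def backend_for(uri: str) -> str:
    """Return the backend name implied by a URI's scheme."""
    scheme = urlsplit(uri).scheme
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {scheme!r}") from None

print(backend_for("gs://ml-bucket/models/"))  # Google Cloud Storage
print(backend_for("direct:///nvme/cache/"))   # DirectIO
```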
Concurrent range downloads hide network latency by parallelizing HTTP range requests.
Backends with RangeEngine Support:
- ✅ Azure Blob Storage: 30-50% faster for large files (must enable explicitly)
- ✅ Google Cloud Storage: 30-50% faster for large files (must enable explicitly)
- ⚠️ Local File System: Rarely beneficial due to seek overhead (disabled by default)
- ⚠️ DirectIO: Rarely beneficial due to O_DIRECT overhead (disabled by default)
- 🚧 S3: Coming soon
Default Configuration (v0.9.6+):
- Status: Disabled by default (was: enabled in v0.9.5)
- Reason: Extra HEAD request on every GET causes 50% slowdown for typical workloads
- Threshold: 16MB when enabled
- Chunk size: 64MB default
- Max concurrent: 32 ranges (network) or 16 ranges (local)
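With those defaults it is easy to estimate how a single GET splits into range requests once the engine is enabled. This is plain arithmetic over the documented constants, not an s3dlio API:

```python
import math

# Documented defaults (v0.9.6+): 16 MiB threshold, 64 MiB chunks.
MIB = 1024 * 1024
THRESHOLD = 16 * MIB
CHUNK = 64 * MIB

def range_requests(object_size: int) -> int:
    """Number of HTTP range requests the engine would issue, per the defaults above."""
    if object_size < THRESHOLD:
        return 1  # below threshold: one plain GET, no splitting
    return math.ceil(object_size / CHUNK)

print(range_requests(8 * MIB))     # 1  (below the 16 MiB threshold)
print(range_requests(1024 * MIB))  # 16 (1 GiB split into 64 MiB chunks)
```

Note that a sub-threshold object still pays for the extra HEAD request when the engine is on, which is why it is disabled by default.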
How to Enable for Large-File Workloads:
```rust
use s3dlio::object_store::{AzureObjectStore, AzureConfig};

let config = AzureConfig {
    enable_range_engine: true,  // Explicitly enable for large files
    ..Default::default()
};
let store = AzureObjectStore::with_config(config);
```

When to Enable:
- ✅ Large-file workloads (average size >= 64 MiB)
- ✅ High-bandwidth, high-latency networks
- ❌ Mixed or small-object workloads
- ❌ Local file systems
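The checklist above condenses into a tiny decision helper (the 64 MiB threshold mirrors the guidance in this section; the function is illustrative, not part of the s3dlio API):

```python
MIB = 1024 * 1024

def should_enable_range_engine(avg_object_size: int, local_filesystem: bool = False) -> bool:
    """Rule of thumb from this section: enable only for large-object,
    non-local workloads (average size >= 64 MiB)."""
    if local_filesystem:
        return False  # seek/O_DIRECT overhead usually outweighs the benefit
    return avg_object_size >= 64 * MIB

print(should_enable_range_engine(128 * MIB))                         # True
print(should_enable_range_engine(4 * MIB))                           # False
print(should_enable_range_engine(128 * MIB, local_filesystem=True))  # False
```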
s3dlio supports two S3 backend implementations. Native AWS SDK is the default and recommended for production use:
```bash
# Default: Native AWS SDK backend (RECOMMENDED for production)
cargo build --release
# or explicitly:
cargo build --no-default-features --features native-backends

# Experimental: Apache Arrow object_store backend (optional, for testing)
cargo build --no-default-features --features arrow-backend
```

Why native-backends is default:
- Proven performance in production workloads
- Optimized for high-throughput S3 operations (5+ GB/s reads, 2.5+ GB/s writes)
- Well-tested with MinIO, Vast, and AWS S3
About arrow-backend:
- Experimental alternative implementation
- No proven performance advantage over native backend
- Useful for comparison testing and development
- Not recommended for production use
s3dlio supports two mutually exclusive GCS backend implementations that can be selected at compile time. Community backend (gcs-community) is the default and recommended for production use:
```bash
# Default: Community backend (RECOMMENDED for production)
cargo build --release
# or explicitly:
cargo build --release --features gcs-community

# Experimental: Official Google backend (for testing only)
cargo build --release --no-default-features --features native-backends,s3,gcs-official
```

Why gcs-community is default:
- ✅ Production-ready and stable (10/10 tests pass consistently)
- ✅ Uses community-maintained `gcloud-storage` v1.1 crate
- ✅ Full ADC (Application Default Credentials) support
- ✅ All operations work reliably: GET, PUT, DELETE, LIST, STAT, range reads
About gcs-official:
- ⚠️ Experimental only - known transport flakes in test suites
- Uses official Google `google-cloud-storage` v1.1 crate
- Individual operations work correctly (100% pass when tested alone)
- Full test suite experiences intermittent "transport error" failures (7/10 tests fail)
- Root cause: Upstream HTTP/2 connection pool flake in google-cloud-rust library
- Bug Report: googleapis/google-cloud-rust#3574
- Related Issue: googleapis/google-cloud-rust#3412
- Not recommended for production until upstream issue is resolved
For more details: See GCS Backend Selection Guide
Rust CLI:

```bash
git clone https://github.com/russfellows/s3dlio.git
cd s3dlio
cargo build --release
```

Python Library:

```bash
pip install s3dlio
# or build from source:
./build_pyo3.sh && ./install_pyo3_wheel.sh
```

- CLI Guide - Complete command-line interface reference with examples
- Python API Guide - Complete Python library reference with examples
- Multi-Endpoint Guide - Load balancing across multiple storage endpoints (v0.9.14+)
- Rust API Guide v0.9.0 - Complete Rust library reference with migration guide
- Changelog - Version history and release notes
- Adaptive Tuning Guide - Optional performance auto-tuning
- Testing Guide - Test suite documentation
- v0.9.2 Test Summary - ✅ 122/130 tests passing (93.8%)
s3dlio treats upload and download as enhanced versions of the Unix cp command, working across all storage backends:
CLI Usage:

```bash
# Upload to any backend with real-time progress
s3-cli upload /local/data/*.log s3://mybucket/logs/
s3-cli upload /local/files/* az://container/data/
s3-cli upload /local/models/* gs://ml-bucket/models/
s3-cli upload /local/backup/* file:///remote-mount/backup/
s3-cli upload /local/cache/* direct:///nvme-storage/cache/

# Download from any backend
s3-cli download s3://bucket/data/ ./local-data/
s3-cli download az://container/logs/ ./logs/
s3-cli download gs://ml-bucket/datasets/ ./datasets/
s3-cli download file:///network-storage/data/ ./data/

# Cross-backend copying workflow
s3-cli download s3://source-bucket/data/ ./temp/
s3-cli upload ./temp/* gs://dest-bucket/data/
```

Advanced Pattern Matching:
```bash
# Glob patterns for file selection (upload)
s3-cli upload "/data/*.log" s3://bucket/logs/
s3-cli upload "/files/data_*.csv" az://container/data/

# Regex patterns for listing (use single quotes to prevent shell expansion)
s3-cli ls -r s3://bucket/ -p '.*\.txt$'          # Only .txt files
s3-cli ls -r gs://bucket/ -p '.*\.(csv|json)$'   # CSV or JSON files
s3-cli ls -r az://acct/cont/ -p '.*/data_.*'     # Files with "data_" in path

# Count objects matching pattern (with progress indicator)
s3-cli ls -rc gs://bucket/data/ -p '.*\.npz$'
# Output: ⠙ [00:00:05] 71,305 objects (14,261 obj/s)
#         Total objects: 142,610 (10.0s, rate: 14,261 objects/s)

# Delete only matching files
s3-cli delete -r s3://bucket/logs/ -p '.*\.log$'
```

See CLI Guide for complete command reference and pattern syntax.
High-Performance Data Operations:
```python
import s3dlio

# Universal upload/download across all backends
s3dlio.upload(['/local/data.csv'], 's3://bucket/data/')
s3dlio.upload(['/local/logs/*.log'], 'az://container/logs/')
s3dlio.upload(['/local/models/*.pt'], 'gs://ml-bucket/models/')
s3dlio.download('s3://bucket/data/', './local-data/')
s3dlio.download('gs://ml-bucket/datasets/', './datasets/')

# High-level AI/ML operations
dataset = s3dlio.create_dataset("s3://bucket/training-data/")
loader = s3dlio.create_async_loader("gs://ml-bucket/data/", {"batch_size": 32})

# PyTorch integration
from s3dlio.torch import S3IterableDataset
from torch.utils.data import DataLoader
dataset = S3IterableDataset("gs://bucket/data/", loader_opts={})
dataloader = DataLoader(dataset, batch_size=16)
```

Streaming & Compression:

```python
# High-performance streaming with compression
options = s3dlio.PyWriterOptions()
options.compression = "zstd"
options.compression_level = 3
writer = s3dlio.create_s3_writer('s3://bucket/data.zst', options)
writer.write_chunk(large_data_bytes)
stats = writer.finalize()  # Returns (bytes_written, compressed_bytes)

# Data generation with configurable modes
s3dlio.put("s3://bucket/test-data-{}.bin", num=1000, size=4194304,
           data_gen_mode="streaming")  # 2.6-3.5x faster for most cases
```

Multi-Endpoint Load Balancing (v0.9.14+):
```python
# Distribute I/O across multiple storage endpoints
store = s3dlio.create_multi_endpoint_store(
    uris=[
        "s3://bucket-1/data",
        "s3://bucket-2/data",
        "s3://bucket-3/data",
    ],
    strategy="least_connections"  # or "round_robin"
)

# Zero-copy data access (memoryview compatible)
data = store.get("s3://bucket-1/file.bin")
array = np.frombuffer(memoryview(data), dtype=np.float32)

# Monitor load distribution
stats = store.get_endpoint_stats()
for i, s in enumerate(stats):
    print(f"Endpoint {i}: {s['requests']} requests, {s['bytes_transferred']} bytes")
```

📖 Complete Multi-Endpoint Guide - Load balancing, configuration, use cases
s3dlio delivers world-class performance across all operations:
| Operation | Performance | Notes |
|---|---|---|
| S3 PUT | Up to 3.089 GB/s | Exceeds steady-state baseline by 17.8% |
| S3 GET | Up to 4.826 GB/s | Near line-speed performance |
| Multi-Process | 2-3x faster | Improvement over single process |
| Streaming Mode | 2.6-3.5x faster | For 1-8MB objects vs single-pass |
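To put those throughput figures in context, they translate directly into transfer-time estimates for capacity planning (simple arithmetic assuming sustained rates, not an s3dlio call):

```python
GB = 1000**3  # throughput figures use decimal gigabytes

def transfer_seconds(size_bytes: int, rate_gb_per_s: float) -> float:
    """Time to move size_bytes at a sustained rate in GB/s."""
    return size_bytes / (rate_gb_per_s * GB)

# Reading a 100 GB dataset at the ~4.8 GB/s GET rate above:
print(round(transfer_seconds(100 * GB, 4.8), 1))  # 20.8 seconds

# Writing it back at the ~3.0 GB/s PUT rate:
print(round(transfer_seconds(100 * GB, 3.0), 1))  # 33.3 seconds
```

Real runs will vary with object sizes, concurrency, and network conditions; the table rates are peak measurements.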
- HTTP/2 Support: Modern multiplexing for enhanced throughput (with Apache Arrow backend only)
- Intelligent Defaults: Streaming mode automatically selected based on benchmarks
- Multi-Process Architecture: Massive parallelism for maximum performance
- Zero-Copy Streaming: Memory-efficient operations for large datasets
- Configurable Chunk Sizes: Fine-tune performance for your workload
```python
store = s3dlio.PyCheckpointStore('file:///tmp/checkpoints/')
store.save('model_state', your_model_data)
loaded_data = store.load('model_state')
```
**Ready for Production**: All core functionality validated, comprehensive test suite, and honest documentation matching actual capabilities.
## Configuration & Tuning
### Environment Variables
s3dlio supports comprehensive configuration through environment variables:
- **HTTP Client Optimization**: `S3DLIO_USE_OPTIMIZED_HTTP=true` - Enhanced connection pooling
- **Runtime Scaling**: `S3DLIO_RT_THREADS=32` - Tokio worker threads
- **Connection Pool**: `S3DLIO_MAX_HTTP_CONNECTIONS=400` - Max connections per host
- **Range GET**: `S3DLIO_RANGE_CONCURRENCY=64` - Large object optimization
- **Operation Logging**: `S3DLIO_OPLOG_LEVEL=2` - S3 operation tracking
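A typical way to combine these knobs for a high-throughput run (the values are examples, not recommendations; tune for your own hardware and network):

```bash
# Hypothetical tuning profile for a high-bandwidth host; adjust to taste.
export S3DLIO_USE_OPTIMIZED_HTTP=true   # enhanced connection pooling
export S3DLIO_RT_THREADS=32             # Tokio worker threads
export S3DLIO_MAX_HTTP_CONNECTIONS=400  # per-host connection cap
export S3DLIO_RANGE_CONCURRENCY=64      # parallel ranges for large GETs

s3-cli download s3://bucket/dataset/ ./dataset/
```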
📖 [Environment Variables Reference](docs/api/Environment_Variables.md)
### Operation Logging (Op-Log)
Universal operation trace logging across all backends with zstd-compressed TSV format, warp-replay compatible.
```python
import s3dlio

s3dlio.init_op_log("operations.tsv.zst")
# All operations automatically logged
s3dlio.finalize_op_log()
```

See S3DLIO OpLog Implementation for detailed usage.
- Rust: Install Rust toolchain
- Python 3.12+: For Python library development
- UV (recommended): Install UV
- HDF5: Required for HDF5 support (`libhdf5-dev` on Ubuntu, `brew install hdf5` on macOS)
```bash
# Python environment
uv venv && source .venv/bin/activate

# Rust CLI
cargo build --release

# Python library
./build_pyo3.sh && ./install_pyo3_wheel.sh
```

Required for S3 operations:

```bash
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_ENDPOINT_URL=https://your-s3-endpoint
AWS_REGION=us-east-1
```

Enable comprehensive S3 operation logging compatible with MinIO warp format:
```bash
cargo build --release --features profiling
cargo run --example simple_flamegraph_test --features profiling
```

```python
import s3dlio

options = s3dlio.PyWriterOptions()
options.compression = "zstd"
writer = s3dlio.create_s3_writer('s3://bucket/data.zst', options)
writer.write_chunk(large_data)
stats = writer.finalize()
```

```bash
# Use pre-built container
podman pull quay.io/russfellows-sig65/s3dlio
podman run --net=host --rm -it quay.io/russfellows-sig65/s3dlio

# Or build locally
podman build -t s3dlio .
```

Note: Always use `--net=host` for storage backend connectivity.
- 🖥️ CLI Guide: docs/CLI_GUIDE.md - Complete command-line reference
- 🐍 Python API: docs/PYTHON_API_GUIDE.md - Python library reference
- 📚 API Documentation: docs/api/
- 📝 Changelog: docs/Changelog.md
- 🧪 Testing Guide: docs/TESTING-GUIDE.md
- 📊 Performance: docs/performance/
- sai3-bench - Multi-protocol I/O benchmarking suite built on s3dlio
- polarWarp - Op-log analysis tool for parsing and visualizing s3dlio operation logs
Licensed under the Apache License 2.0 - see LICENSE file.
🚀 Ready to get started? Check out the Quick Start section above or explore our example scripts for common use cases!