A modern, high-performance CLI tool for spatial data conversion and processing
GeoETL is a next-generation geospatial ETL (Extract, Transform, Load) tool built with Rust, designed to become a modern replacement for GDAL. It builds on Apache DataFusion and DataFusion Ballista, targeting high performance on both single-node and distributed systems.
Our vision is to become the modern standard for vector spatial data processing, giving users a targeted 5-10x speedup over GDAL, seamless scalability from laptop to cluster, and an intuitive developer experience.
Read the full vision: docs/VISION.md
- High Performance: Vectorized execution using Apache DataFusion, multi-threaded by default
- Scalable: Seamlessly scale from single-node to distributed processing with Ballista
- Memory Safe: Built with Rust for zero-cost memory safety guarantees
- Cloud Native: First-class support for cloud storage (S3, Azure Blob, GCS)
- Modern Architecture: Leverages the GeoRust ecosystem for spatial operations
- Streaming I/O: Process datasets larger than available RAM
- No GDAL Dependency: Clean, modern implementation without legacy dependencies
- Rust: Memory-safe systems programming language
- Apache DataFusion: SQL query engine for fast analytics
- DataFusion Ballista: Distributed compute platform
- Apache Arrow: Columnar in-memory data format
- geo: Geospatial algorithms and operations
- geozero: Zero-copy geospatial data streaming
- proj: Coordinate reference system transformations
- rstar: R-tree spatial indexing
- geojson, flatgeobuf: Format support
Phase 1: Foundation (Q1 2026 - Current)
GeoETL is in active early development. We are currently establishing the core architecture and foundational components.
The driver registry lists 68+ vector format drivers, including:
Core Formats:
- GeoJSON, GeoJSONSeq
- ESRI Shapefile
- GeoPackage (GPKG)
- FlatGeobuf
- (Geo)Parquet
- (Geo)Arrow IPC
Databases:
- PostgreSQL/PostGIS
- MySQL, SQLite/Spatialite
- Oracle Spatial, MongoDB
- Microsoft SQL Server
CAD & Engineering:
- AutoCAD DXF, DWG
- Microstation DGN
- ESRI File Geodatabase
Web Services:
- OGC WFS, OGC API - Features
- Carto, Elasticsearch
- Google Earth Engine
...and many more! Run geoetl-cli drivers for the complete list.
Note: GeoETL is in early development. Pre-built binaries will be available in future releases.
# Prerequisites: Rust 1.90.0 or later
git clone https://github.com/yourusername/geoetl.git
cd geoetl
cargo build --release
# The binary will be at: target/release/geoetl-cli

For detailed build instructions and development setup, see docs/DEVELOPMENT.md.
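
Because the build requires Rust 1.90.0 or later, it can help to check the installed toolchain before running `cargo build`. The following is a minimal sketch, not something shipped with GeoETL: the helper names `version_ge` and `check_rust` are hypothetical, and the parsing assumes the usual `rustc 1.90.0 (...)` output of `rustc --version`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of GeoETL) for checking the Rust toolchain.

# version_ge A B: succeed when dotted version A >= B.
# Compares major, minor, and patch numerically, left to right.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# check_rust [MIN]: report whether the installed rustc is at least MIN
# (defaults to 1.90.0, the prerequisite stated above).
check_rust() {
    min="${1:-1.90.0}"
    # Second field of `rustc --version` is the version number.
    have=$(rustc --version 2>/dev/null | awk '{ print $2 }')
    if [ -z "$have" ]; then
        echo "rustc not found; install Rust $min or later" >&2
        return 1
    fi
    if version_ge "$have" "$min"; then
        echo "rustc $have satisfies >= $min"
    else
        echo "rustc $have is older than $min" >&2
        return 1
    fi
}
```

With these functions defined, `check_rust 1.90.0 && cargo build --release` would start the build only when the toolchain is new enough.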
# List available drivers
geoetl-cli drivers
# Convert between formats
geoetl-cli convert \
-i input.geojson \
-o output.parquet \
--input-driver GeoJSON \
--output-driver Parquet
# Get dataset information
geoetl-cli info data.geojson
geoetl-cli info --detailed --stats data.geojson
# Generate shell completions
geoetl-cli completions bash > ~/.local/share/bash-completion/completions/geoetl
# Enable verbose logging
geoetl-cli -v convert -i input.geojson -o output.parquet

# See all 68+ supported driver formats
geoetl-cli drivers

# GeoJSON to Parquet
geoetl-cli convert \
-i cities.geojson \
-o cities.parquet \
--input-driver GeoJSON \
--output-driver Parquet
# More formats coming in Phase 2

# Basic information
geoetl-cli info data.geojson
# Detailed with statistics
geoetl-cli info --detailed --stats data.geojson

GeoETL supports shell completions for faster command-line usage:
# Bash
geoetl-cli completions bash > ~/.local/share/bash-completion/completions/geoetl
# Zsh
geoetl-cli completions zsh > ~/.zsh/completions/_geoetl
# Fish
geoetl-cli completions fish > ~/.config/fish/completions/geoetl.fish
# PowerShell
geoetl-cli completions powershell > geoetl.ps1
# Elvish
geoetl-cli completions elvish > ~/.elvish/completions/geoetl.elv

- Documentation Website: Complete guide to using GeoETL with detailed examples
- Quick Reference: Fast command reference and cheat sheet
- Development Guide: Build instructions, workflow, and contribution guidelines
- DataFusion Geospatial Format Integration Guide: Comprehensive guide for implementing custom geospatial file format support using DataFusion and GeoArrow
- Vision Document: Project vision, goals, and strategic roadmap
- Architecture Decision Records: Detailed technical design decisions
GeoETL is organized as a Rust workspace with the following crates:
geoetl/
├── crates/
│ ├── geoetl-cli/ # Command-line interface
│ ├── geoetl-core/ # Core library with spatial operations
│ ├── geoetl-formats/ # Format readers and writers (planned)
│ ├── geoetl-exec/ # Query execution engine (planned)
│ └── geoetl-ballista/ # Distributed execution (planned)
└── docs/ # Documentation
High-level data flow:
CLI → Core Library → DataFusion Engine → Format I/O → Data Sources
                            ↓
                  Single-Node / Ballista
See ADR 0001 for detailed architecture documentation.
- ✅ Workspace structure
- ✅ Vision and architecture documentation
- ✅ CLI framework with clap (argument parsing, logging)
- ✅ Driver registry (68+ GDAL-compatible drivers)
- ✅ CLI command structure (convert, info, drivers)
- ✅ Tabled-based output formatting
- ⏳ DataFusion integration
- ⏳ Basic vector I/O implementation (GeoJSON, Parquet)
- Vector I/O implementation (read/write operations)
- Driver auto-detection from file extensions
- Core spatial operations
- CRS transformations
- Performance benchmarking
- Advanced spatial algorithms
- Query optimization
- Performance parity with GDAL
- Ballista integration
- Cloud storage support
- Horizontal scaling
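
Driver auto-detection from file extensions is listed above as planned work. Until it lands, a small wrapper can approximate it from the outside. The sketch below is an illustration under stated assumptions, not GeoETL's implementation: `guess_driver` and `convert_cmd` are hypothetical helpers, and only the two driver names that appear in this README's examples (GeoJSON and Parquet) are mapped.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of GeoETL): pick a driver name from a file extension.

# guess_driver FILE: print the driver name for FILE's extension (case-insensitive).
# Fails with a message on stderr when the extension is not in the table.
guess_driver() {
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        geojson) echo "GeoJSON" ;;
        parquet) echo "Parquet" ;;
        *) echo "no known driver for: $1" >&2; return 1 ;;
    esac
}

# convert_cmd INPUT OUTPUT: print (do not run) the geoetl-cli convert command
# with both drivers filled in from the file extensions.
convert_cmd() {
    in_driver=$(guess_driver "$1") || return 1
    out_driver=$(guess_driver "$2") || return 1
    echo "geoetl-cli convert -i $1 -o $2 --input-driver $in_driver --output-driver $out_driver"
}
```

Calling `convert_cmd cities.geojson cities.parquet` prints a one-line equivalent of the conversion example shown earlier. Printing rather than executing keeps the sketch safe to try while the vector I/O implementation is still in progress.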
| Operation | Target vs GDAL | Method |
|---|---|---|
| Format conversion | 5-10x faster | Vectorized processing |
| Spatial filtering | 5x faster | R-tree indexing, SIMD |
| Buffer operations | 3-5x faster | Parallel execution |
| Spatial joins | 5x faster | Partition-based parallelism |
| Distributed (1TB) | Linear scaling | Ballista partitioning |
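
The targets above are ratios of GDAL wall time to GeoETL wall time for the same job. As a rough illustration of how such a ratio could be measured, the sketch below times a command and divides two timings. It is not the project's benchmark harness: `elapsed_seconds` and `speedup` are hypothetical helpers, and timing uses whole seconds from `date +%s`, so it only suits runs that last many seconds.

```shell
#!/bin/sh
# Hypothetical helpers (not part of GeoETL) for a rough speedup measurement.

# elapsed_seconds CMD [ARGS...]: run the command, discard its output,
# and print the wall-clock time it took in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# speedup BASELINE CANDIDATE: print BASELINE / CANDIDATE to one decimal place.
# Prints "n/a" and fails when CANDIDATE is zero or negative.
speedup() {
    awk -v base="$1" -v cand="$2" 'BEGIN {
        if (cand + 0 <= 0) { print "n/a"; exit 1 }
        printf "%.1f\n", base / cand
    }'
}
```

For example, assuming a GDAL build with Parquet support, `gdal=$(elapsed_seconds ogr2ogr -f Parquet out_gdal.parquet cities.geojson)` followed by a timed `geoetl-cli convert` run stored in `geoetl`, and then `speedup "$gdal" "$geoetl"`, would give the observed ratio. A result of 5.0 or more would meet the format-conversion target in the table.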
We welcome contributions! Whether you want to report bugs, suggest features, or contribute code, we'd love your help.
- Report Issues: GitHub Issues
- Discuss Ideas: GitHub Discussions
- Contribute Code: See docs/DEVELOPMENT.md for setup and guidelines
GeoETL is committed to leveraging and contributing back to the GeoRust ecosystem through open, transparent, community-driven development.
- Documentation: Check the Documentation Website for detailed usage instructions
- Command Help: Run geoetl-cli --help or geoetl-cli <command> --help
- Issues: Report bugs at GitHub Issues
- Questions: Ask questions in GitHub Discussions
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
GeoETL builds on the shoulders of giants:
- The GeoRust community for excellent geospatial libraries
- The Apache Arrow project for DataFusion and Arrow
- The GeoArrow ecosystem for geospatial data in Arrow format
- The GDAL project for decades of geospatial innovation
- The Rust community for an amazing language and ecosystem
Status: Early Development | Rust Version: 1.90+ | License: MIT/Apache-2.0