A Rust DSL for building database query rewrite rules with uninterpreted symbols. RuleScript provides a minimal, pragmatic API that wraps DataFusion's native query planning while enabling rule verification and code generation.
RuleScript lets you express query optimizer rewrite rules using abstract patterns with uninterpreted symbols:
// Pattern: source.filter(P).filter(Q) → source.filter(P AND Q)
crate::rule! {
FilterMergeRule {
schemas: {
source: (col: T),
},
functions: {
P(T) -> Bool,
Q(T) -> Bool,
},
from: {
let inner = crate::filter!(source, P(col));
crate::filter!(inner, Q(col))
},
to: crate::filter!(source, P(col) && Q(col)),
}
}The P and Q are uninterpreted predicates - they can represent ANY boolean expression. This means one rule definition covers infinite concrete cases.
Filter Rules:
- FilterMergeRule - Merge consecutive filters
- FilterProjectTransposeRule - Push filter below projection
- FilterAggregateTransposeRule - Push filter predicates on GROUP BY columns below aggregate
- FilterIntoJoinRule - Merge filter into join condition
- FilterReduceTrueRule - Remove filter with true predicate
- FilterReduceFalseRule - Replace filter with false predicate with empty relation
Project Rules:
- ProjectMergeRule - Merge consecutive projections
- ProjectRemoveRule - Remove identity projections
Join Rules:
- JoinCommuteRule - Swap join inputs
- JoinLeftConditionPushRule - Push left-table predicates down as filter on left input
- JoinRightConditionPushRule - Push right-table predicates down as filter on right input
- JoinExtractFilterRule - Extract join condition as filter above join
- JoinLeftProjectTransposeRule - Pull projection from left join input up
- JoinRightProjectTransposeRule - Pull projection from right join input up
- JoinAssociateRule - Restructure nested joins using associativity
Semi-Join Rules:
- LeftSemiJoinFilterTransposeRule - Pull filter above left semi-join
- RightSemiJoinFilterTransposeRule - Pull filter above right semi-join
Prune Empty Rules:
- PruneEmptyFilterRule - Remove filter over empty relation
- PruneEmptyProjectRule - Remove projection over empty relation
- PruneEmptyUnionLeftRule - Simplify union with empty left input
- PruneEmptyUnionRightRule - Simplify union with empty right input
- PruneEmptyUnionBothRule - Simplify union with both inputs empty
All rules have comprehensive tests (79 unit tests + 21 doc tests, all passing).
See src/rule/impls/README.md for detailed rule documentation.
- Pattern Matching: Full support for Filter, Project, Join, and user-defined operators
- Predicate Decomposition: Automatic splitting of conjunctive predicates based on column dependencies
- Function Composition: Support for nested function applications (e.g.,
f(g(x))) - Alias Handling: Transparent matching through alias wrappers
- Column Abstraction: Smart column pattern matching that works with field partitions
- User-Defined Operators: Extensibility for custom logical operators with pattern matching and QED verification support
- QED Export: Serialization to QED format for rule verification, including EXISTS subqueries with outer column references
- DataFusion Integration: RuleWrapper adapter for seamless optimizer integration
src/
ast/
opaque.rs - Abstract types, fields, schemas
relational.rs - Logical plan patterns (Source, Filter, Project, Join)
pattern.rs - Pattern functions (ScalarPattern, AggregatePattern)
extension.rs - User-defined operator support (UserDefinedLogicalOperator trait)
source.rs - Source node implementation
matcher/
mod.rs - PatternMatcher trait and error types
default.rs - DefaultMatcher with full pattern matching logic
rule/
mod.rs - Rule traits (RewriteRule, ApplicableRule)
test.rs - Test utilities (table helpers)
impls/ - Concrete rule implementations
verifier/
mod.rs - Verifier trait for rule verification
qed.rs - QED format serialization with subquery support
lib.rs - Public API exports
examples/
optimizer_repl/ - Interactive demo with all rules
user_defined_left_semi_join.rs - Example of user-defined operator with EXISTS semantics
crate::rule! {
MyRule {
schemas: {
source: (x: T),
},
functions: {
P(T) -> Bool,
},
from: crate::filter!(source, P(x)),
to: source, // Remove the filter
}
}use rulescript::rule::{ApplicableRule, impls::FilterMergeRule};
let rule = FilterMergeRule;
let optimized_plan = rule.try_apply(&concrete_plan)?;use rulescript::rule::RuleWrapper;
use datafusion::optimizer::Optimizer;
let optimizer_rule = RuleWrapper::new(FilterMergeRule);
optimizer.add_rule(Arc::new(optimizer_rule));cargo run --example optimizer_replInteractive demonstration with optimization rules:
- Choose which rules to apply
- See before/after query plans
- Real SQL parsing with DataFusion
See examples/README.md for detailed usage.
# All tests
cargo test
# Specific rule
cargo test filter_merge
# With output
cargo test -- --nocapture
# Clippy checks
cargo clippy --all-targetscrate::rule! {
RuleName {
schemas: {
input_name: (field: Type),
other_input: (x: T1, y: T2),
},
functions: {
FuncName(InputType) -> OutputType,
Predicate(T1, T2) -> Bool,
},
from: { /* pattern to match */ },
to: { /* replacement pattern */ },
}
}// Filter
crate::filter!(source, predicate)
// Project
crate::project!(source, [expr1, expr2])
crate::project!(source, [expr as alias])
// Join
crate::join!(left, right, Inner, condition)
crate::join!(left, right, Left, condition)Based on the paper "RuleScript: A DSL for Query Optimizer Rules" which addresses the challenge of correctly implementing hundreds of rewrite rules in modern optimizers.
Key Concepts:
- Uninterpreted Symbols: Abstract types/functions represent families of concrete queries
- Pattern Matching: Declarative patterns with automatic instantiation
- Verification: Rules can be verified via QED solver (future integration)
Why This Approach:
Modern query optimizers suffer from:
- Error-prone manual implementation (100-200+ rules per optimizer)
- Difficult to verify correctness
- Redundant code across similar rules
RuleScript solves this by:
- One rule definition → many concrete applications
- Automated verification possible
- Declarative patterns reduce implementation complexity
Pattern Matching:
- Column patterns match based on field partitions (one pattern can match multiple columns)
- Predicates decompose automatically based on column dependencies
- Function composition works through context mapping
Implementation:
- Minimal abstraction over DataFusion's native types
- No async/tokio in tests (fast, simple tests)
- Smart defaults (all types map to Binary for uniformity)
- Functions are UDFs that error on execution (pattern-only)
Rule Application:
- DefaultMatcher manages three binding types: fields, functions, sources
- Context-preserving instantiation (bindings stay in their context)
- Recursive plan transformation with captured bindings
The project references two external directories not tracked in git:
parser/- Java implementation with QED-verified rulescalcite/- Apache Calcite source for rule reference
See PARSER_AND_CALCITE_NOTES.txt for details on these directories.
Near-term:
- More complex join rules (with 4-predicate decomposition)
- Rule families with meta-variables
- Additional user-defined operator examples
Long-term:
- SMT solver integration for automated verification
- Code generation adapters for different engines
- Performance optimizations for pattern matching
Current Implementation:
- Join rules only support INNER joins (OUTER joins require IS NOT NULL predicates)
- Some complex expression types not yet handled (SIMILAR TO with escape)
- No optimization for pattern matching efficiency
Design Constraints:
- All abstract symbols must bind to at least one match (no optional predicates)
- Single binding per symbol per rule application
- Strict column validation (all references must exist in context)
- datafusion - Query planning framework
- thiserror - Error handling macros
Active development. Core pattern matching complete. User-defined operator support implemented. QED export working with subquery support. API stabilizing.
Test Status: 79 unit tests + 21 doc tests passing
The project emphasizes correctness and extensibility. Pattern matching, instantiation, and user-defined operators are fully implemented with 22 working rules demonstrating the approach works with real DataFusion plans. QED serialization enables rule verification including complex cases like EXISTS subqueries with outer column references.