Skip to content

qed-solver/rulescript

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

76 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RuleScript

A Rust DSL for building database query rewrite rules with uninterpreted symbols. RuleScript provides a minimal, pragmatic API that wraps DataFusion's native query planning while enabling rule verification and code generation.

What It Does

RuleScript lets you express query optimizer rewrite rules using abstract patterns with uninterpreted symbols:

// Pattern: source.filter(P).filter(Q) → source.filter(P AND Q)
crate::rule! {
    FilterMergeRule {
        schemas: {
            source: (col: T),
        },
        functions: {
            P(T) -> Bool,
            Q(T) -> Bool,
        },
        from: {
            let inner = crate::filter!(source, P(col));
            crate::filter!(inner, Q(col))
        },
        to: crate::filter!(source, P(col) && Q(col)),
    }
}

The P and Q are uninterpreted predicates - they can represent ANY boolean expression. This means one rule definition covers infinite concrete cases.

Current State

Implemented Rules (22 total)

Filter Rules:

  • FilterMergeRule - Merge consecutive filters
  • FilterProjectTransposeRule - Push filter below projection
  • FilterAggregateTransposeRule - Push filter predicates on GROUP BY columns below aggregate
  • FilterIntoJoinRule - Merge filter into join condition
  • FilterReduceTrueRule - Remove filter with true predicate
  • FilterReduceFalseRule - Replace filter with false predicate with empty relation

Project Rules:

  • ProjectMergeRule - Merge consecutive projections
  • ProjectRemoveRule - Remove identity projections

Join Rules:

  • JoinCommuteRule - Swap join inputs
  • JoinLeftConditionPushRule - Push left-table predicates down as filter on left input
  • JoinRightConditionPushRule - Push right-table predicates down as filter on right input
  • JoinExtractFilterRule - Extract join condition as filter above join
  • JoinLeftProjectTransposeRule - Pull projection from left join input up
  • JoinRightProjectTransposeRule - Pull projection from right join input up
  • JoinAssociateRule - Restructure nested joins using associativity

Semi-Join Rules:

  • LeftSemiJoinFilterTransposeRule - Pull filter above left semi-join
  • RightSemiJoinFilterTransposeRule - Pull filter above right semi-join

Prune Empty Rules:

  • PruneEmptyFilterRule - Remove filter over empty relation
  • PruneEmptyProjectRule - Remove projection over empty relation
  • PruneEmptyUnionLeftRule - Simplify union with empty left input
  • PruneEmptyUnionRightRule - Simplify union with empty right input
  • PruneEmptyUnionBothRule - Simplify union with both inputs empty

All rules have comprehensive tests (79 unit tests + 21 doc tests, all passing).

See src/rule/impls/README.md for detailed rule documentation.

Core Features

  • Pattern Matching: Full support for Filter, Project, Join, and user-defined operators
  • Predicate Decomposition: Automatic splitting of conjunctive predicates based on column dependencies
  • Function Composition: Support for nested function applications (e.g., f(g(x)))
  • Alias Handling: Transparent matching through alias wrappers
  • Column Abstraction: Smart column pattern matching that works with field partitions
  • User-Defined Operators: Extensibility for custom logical operators with pattern matching and QED verification support
  • QED Export: Serialization to QED format for rule verification, including EXISTS subqueries with outer column references
  • DataFusion Integration: RuleWrapper adapter for seamless optimizer integration

Architecture

src/
  ast/
    opaque.rs      - Abstract types, fields, schemas
    relational.rs  - Logical plan patterns (Source, Filter, Project, Join)
    pattern.rs     - Pattern functions (ScalarPattern, AggregatePattern)
    extension.rs   - User-defined operator support (UserDefinedLogicalOperator trait)
    source.rs      - Source node implementation
  matcher/
    mod.rs         - PatternMatcher trait and error types
    default.rs     - DefaultMatcher with full pattern matching logic
  rule/
    mod.rs         - Rule traits (RewriteRule, ApplicableRule)
    test.rs        - Test utilities (table helpers)
    impls/         - Concrete rule implementations
  verifier/
    mod.rs         - Verifier trait for rule verification
    qed.rs         - QED format serialization with subquery support
  lib.rs           - Public API exports
examples/
  optimizer_repl/  - Interactive demo with all rules
  user_defined_left_semi_join.rs - Example of user-defined operator with EXISTS semantics

Quick Start

Define a Rule

crate::rule! {
    MyRule {
        schemas: {
            source: (x: T),
        },
        functions: {
            P(T) -> Bool,
        },
        from: crate::filter!(source, P(x)),
        to: source,  // Remove the filter
    }
}

Apply a Rule

use rulescript::rule::{ApplicableRule, impls::FilterMergeRule};

let rule = FilterMergeRule;
let optimized_plan = rule.try_apply(&concrete_plan)?;

Integrate with DataFusion

use rulescript::rule::RuleWrapper;
use datafusion::optimizer::Optimizer;

let optimizer_rule = RuleWrapper::new(FilterMergeRule);
optimizer.add_rule(Arc::new(optimizer_rule));

Run Interactive Demo

cargo run --example optimizer_repl

Interactive demonstration with optimization rules:

  • Choose which rules to apply
  • See before/after query plans
  • Real SQL parsing with DataFusion

See examples/README.md for detailed usage.

Run Tests

# All tests
cargo test

# Specific rule
cargo test filter_merge

# With output
cargo test -- --nocapture

# Clippy checks
cargo clippy --all-targets

Macro Reference

rule! - Define Complete Rules

crate::rule! {
    RuleName {
        schemas: {
            input_name: (field: Type),
            other_input: (x: T1, y: T2),
        },
        functions: {
            FuncName(InputType) -> OutputType,
            Predicate(T1, T2) -> Bool,
        },
        from: { /* pattern to match */ },
        to: { /* replacement pattern */ },
    }
}

Plan Construction Macros

// Filter
crate::filter!(source, predicate)

// Project
crate::project!(source, [expr1, expr2])
crate::project!(source, [expr as alias])

// Join
crate::join!(left, right, Inner, condition)
crate::join!(left, right, Left, condition)

Theoretical Foundation

Based on the paper "RuleScript: A DSL for Query Optimizer Rules" which addresses the challenge of correctly implementing hundreds of rewrite rules in modern optimizers.

Key Concepts:

  • Uninterpreted Symbols: Abstract types/functions represent families of concrete queries
  • Pattern Matching: Declarative patterns with automatic instantiation
  • Verification: Rules can be verified via QED solver (future integration)

Why This Approach:

Modern query optimizers suffer from:

  1. Error-prone manual implementation (100-200+ rules per optimizer)
  2. Difficult to verify correctness
  3. Redundant code across similar rules

RuleScript solves this by:

  1. One rule definition → many concrete applications
  2. Automated verification possible
  3. Declarative patterns reduce implementation complexity

Key Design Decisions

Pattern Matching:

  • Column patterns match based on field partitions (one pattern can match multiple columns)
  • Predicates decompose automatically based on column dependencies
  • Function composition works through context mapping

Implementation:

  • Minimal abstraction over DataFusion's native types
  • No async/tokio in tests (fast, simple tests)
  • Smart defaults (all types map to Binary for uniformity)
  • Functions are UDFs that error on execution (pattern-only)

Rule Application:

  • DefaultMatcher manages three binding types: fields, functions, sources
  • Context-preserving instantiation (bindings stay in their context)
  • Recursive plan transformation with captured bindings

External Resources

The project references two external directories not tracked in git:

  • parser/ - Java implementation with QED-verified rules
  • calcite/ - Apache Calcite source for rule reference

See PARSER_AND_CALCITE_NOTES.txt for details on these directories.

Future Work

Near-term:

  • More complex join rules (with 4-predicate decomposition)
  • Rule families with meta-variables
  • Additional user-defined operator examples

Long-term:

  • SMT solver integration for automated verification
  • Code generation adapters for different engines
  • Performance optimizations for pattern matching

Known Limitations

Current Implementation:

  • Join rules only support INNER joins (OUTER joins require IS NOT NULL predicates)
  • Some complex expression types not yet handled (SIMILAR TO with escape)
  • No optimization for pattern matching efficiency

Design Constraints:

  • All abstract symbols must bind to at least one match (no optional predicates)
  • Single binding per symbol per rule application
  • Strict column validation (all references must exist in context)

Dependencies

  • datafusion - Query planning framework
  • thiserror - Error handling macros

Status

Active development. Core pattern matching complete. User-defined operator support implemented. QED export working with subquery support. API stabilizing.

Test Status: 79 unit tests + 21 doc tests passing

The project emphasizes correctness and extensibility. Pattern matching, instantiation, and user-defined operators are fully implemented with 22 working rules demonstrating the approach works with real DataFusion plans. QED serialization enables rule verification including complex cases like EXISTS subqueries with outer column references.

About

Rust impl of RuleScript based on Datafusion

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages