Skip to content

A high-performance Rust library for weighted finite-state transducers with comprehensive semiring support.

License

Notifications You must be signed in to change notification settings

aaronstevenwhite/arcweight

Repository files navigation

ArcWeight

Crates.io Documentation Build Status License: Apache 2.0 DOI

A high-performance Rust library for weighted finite-state transducers with comprehensive semiring support.

ArcWeight provides efficient algorithms for constructing, combining, and optimizing weighted finite-state transducers (WFSTs), making it suitable for natural language processing, speech recognition, and computational linguistics applications.

Features

  • Core FST Operations: Composition, determinization, minimization, closure, union, concatenation
  • Advanced Algorithms: Shortest path, weight pushing, epsilon removal, pruning, synchronization
  • High-Performance Minimization: Hopcroft's O(n log n) algorithm and Brzozowski's algorithm
  • Rich Semiring Support: Tropical, log, probability, boolean, integer, product, and Gallic weights
  • Multiple FST Implementations: Vector-based, constant, compact, lazy evaluation, CSR format, and cached
  • Type-Safe Design: Zero-cost abstractions with trait-based polymorphism
  • OpenFST Compatible: Read and write OpenFST format files
  • Python Bindings: Full-featured Python API via PyO3 for easy integration
  • Pure Rust: Memory-safe implementation with no C++ dependencies
  • Parallel Processing: Rayon-based parallelization with work-stealing for large FSTs
  • SIMD Optimizations: AVX2/AVX-512 vectorized weight operations (where available)
  • Custom Allocators: Optional mimalloc support via fast-alloc feature

Quick Start

Add ArcWeight to your Cargo.toml:

[dependencies]
arcweight = "0.3"

Basic Example

use arcweight::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a simple FST
    let mut fst = VectorFst::<TropicalWeight>::new();

    // Add states
    let s0 = fst.add_state();
    let s1 = fst.add_state();
    let s2 = fst.add_state();

    // Set start and final states
    fst.set_start(s0);
    fst.set_final(s2, TropicalWeight::one());

    // Add arcs
    fst.add_arc(s0, Arc::new(1, 1, TropicalWeight::one(), s1));
    fst.add_arc(s1, Arc::new(2, 2, TropicalWeight::one(), s2));

    // Perform operations
    let minimized = minimize(&fst)?;

    println!("Original states: {}", fst.num_states());
    println!("Minimized states: {}", minimized.num_states());

    Ok(())
}

Python Bindings

ArcWeight also provides Python bindings for easy integration into Python projects:

pip install arcweight
import arcweight

# Create a new FST
fst = arcweight.VectorFst()

# Add states
s0 = fst.add_state()
s1 = fst.add_state()

# Set start state
fst.set_start(s0)

# Add an arc: from s0 to s1, input=1, output=1, weight=1.0
fst.add_arc(s0, 1, 1, 1.0, s1)

# Set final state
fst.set_final(s1, 0.5)

# Perform operations
minimized = arcweight.minimize(fst)
composed = arcweight.compose(fst1, fst2)

The Python API provides full access to all FST operations and algorithms. See the Python bindings documentation for more details.

Examples

ArcWeight includes comprehensive examples demonstrating real-world applications:

# String edit distance
cargo run --example edit_distance

# Spell checking and correction
cargo run --example spell_checking

# Morphological analysis
cargo run --example morphological_analyzer

# Phonological rules
cargo run --example phonological_rules

# Text normalization
cargo run --example number_date_normalizer

See the examples/ directory for complete implementations with detailed explanations.

Documentation

Minimum Supported Rust Version (MSRV)

ArcWeight requires Rust 1.85.0 or later.

The MSRV is explicitly tested in CI and will only be increased in minor version updates. When the MSRV is increased, the previous two stable releases will still be supported for six months.

Performance

ArcWeight is designed for high performance:

  • Zero-copy arc iteration minimizes allocations
  • Cache-friendly data structures optimize memory access (CSR format, SmallVec inline storage)
  • Optional parallel algorithms leverage multi-core processors with work-stealing
  • Automatic algorithm selection based on FST properties
  • SIMD-accelerated weight operations (AVX2/AVX-512)
  • Custom allocator support (mimalloc via fast-alloc feature)
  • FxHashMap for faster hash operations in composition and determinization
  • LTO and single-codegen-unit builds for release optimization

Run benchmarks on your system:

cargo bench

Performance Features

Enable additional optimizations:

[dependencies]
arcweight = { version = "0.3", features = ["fast-alloc", "parallel"] }

Available performance features:

  • fast-alloc - Use mimalloc allocator for reduced allocation overhead
  • parallel - Enable Rayon-based parallel algorithms (enabled by default)
  • gpu - Experimental GPU acceleration via wgpu

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Quick checklist:

  • Follow existing code style (run cargo fmt)
  • Add tests for new functionality (run cargo test)
  • Update documentation for public APIs (run cargo doc)
  • Ensure all CI checks pass (run cargo clippy)

Getting Help

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

Citation

If you use ArcWeight in your research, please cite:

@software{arcweight,
  author = {White, Aaron Steven},
  title = {ArcWeight: A Rust Library for Weighted Finite-State Transducers},
  url = {https://github.com/aaronstevenwhite/arcweight},
  doi = {10.5281/zenodo.17371992},
  year = {2025}
}

References

ArcWeight implements algorithms based on:

Acknowledgments

This library was architected and implemented with the help of Claude Code.

About

A high-performance Rust library for weighted finite-state transducers with comprehensive semiring support.

Resources

License

Contributing

Stars

Watchers

Forks