Add implementation plan for Rust engine by galpin · Pull Request #11 · galpin/pluck

galpin · 2026-02-22T10:24:03Z

Outlines the architecture and steps for adding a PyO3/maturin-based
Rust engine to accelerate JSON normalization, tree walking, and frame
extraction while preserving the public Python API, with a graceful
fallback to the existing Python implementation.
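
To make "JSON normalization with cross-joins" concrete, here is a minimal pure-Python sketch of the kind of semantics the engine is meant to accelerate. It is not pluck's actual code: the function name `normalize`, the dot-joined column names, and the choice to drop rows when a list is empty are all assumptions made for illustration.

```python
from itertools import product
from typing import Any


def normalize(value: Any, prefix: str = "") -> list[dict[str, Any]]:
    """Flatten nested JSON into flat rows, cross-joining sibling fields."""
    if isinstance(value, dict):
        # Normalize each field on its own, giving one list of partial rows per field.
        parts = [
            normalize(child, f"{prefix}.{key}" if prefix else key)
            for key, child in value.items()
        ]
        # Cross-join: every combination of one partial row per field becomes a row.
        rows = []
        for combo in product(*parts):
            merged: dict[str, Any] = {}
            for part in combo:
                merged.update(part)
            rows.append(merged)
        return rows
    if isinstance(value, list):
        # Each list element contributes its own rows under the same column prefix.
        rows = []
        for item in value:
            rows.extend(normalize(item, prefix))
        return rows
    # Scalar leaf: one row with one column (hypothetical naming: dotted path).
    return [{prefix: value}]
```

For example, `normalize({"a": 1, "b": [{"c": 2}, {"c": 3}], "d": [4, 5]})` yields 1 × 2 × 2 = 4 rows, the first being `{"a": 1, "b.c": 2, "d": 4}`. The multiplicative row count is why this step dominates runtime on large responses.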

https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh

Rebased onto the latest main, which migrated from Poetry to uv (with the uv_build backend) and added pre-commit hooks (ruff + ty). Updated the plan to reflect:

- Replace uv_build (not poetry-core) with maturin as the build backend
- CI steps use uv commands (uv sync, uv run) instead of poetry
- Pre-commit compatibility with ty's unresolved-import = "warn"
- Development workflow uses maturin develop + uv run pytest

https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh

Add a compiled Rust extension (pluck._pluck_engine) that accelerates the two most expensive operations: JSON normalization with cross-joins, and frame extraction from GraphQL responses. Key changes:

- rust/: PyO3 extension with normalize(), extract_frames(), and a walker
- src/pluck/_engine.py: engine selector with graceful Python fallback
- src/pluck/_execution.py: delegates to the engine instead of making direct calls
- pyproject.toml: maturin build backend replaces uv_build
- CI.yml: adds Rust toolchain and maturin build steps
- tests/test_performance.py: benchmark comparing Python vs Rust (2.2x speedup)

The public API is unchanged. When the Rust extension is unavailable, the library transparently falls back to the existing Python code.

https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh

Two key optimizations to the Rust normalization engine:

1. Native Rust Value enum: convert Python objects to Rust types once at the boundary, do all normalization in pure Rust (no GIL interaction), then convert back. This eliminates ~33K Py_INCREF calls per benchmark. Column names use Rc<str>, so cloning them during cross-joins is a pointer copy plus a reference-count increment rather than a string allocation.
2. Columnar output format: the new normalize_columnar() returns {col: [vals]} instead of [{col: val}]. This creates 1 dict + N_cols lists instead of N_rows dicts, and pandas consumes columnar data much faster.

End-to-end benchmarks (normalize + DataFrame creation):

- 5K rows: Python 0.077s → Rust 0.014s (5.6x)
- 20K rows: Python 0.222s → Rust 0.049s (4.6x)
- 60K rows: Python 0.717s → Rust 0.203s (3.5x)

https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh
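
The difference between the two output shapes can be illustrated with a small pure-Python conversion. This is only a sketch of the shape change, assuming missing cells become None; the PR builds the columnar form directly in Rust rather than converting rows afterwards, and `rows_to_columnar` is a hypothetical helper.

```python
from typing import Any


def rows_to_columnar(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Convert [{col: val}, ...] into {col: [val, ...]}, filling gaps with None."""
    columns: dict[str, list[Any]] = {}
    for i, row in enumerate(rows):
        for col, val in row.items():
            if col not in columns:
                # Column first seen at row i: backfill earlier rows with None.
                columns[col] = [None] * i
            columns[col].append(val)
        # Pad columns this row did not mention so all lists stay aligned.
        for values in columns.values():
            if len(values) < i + 1:
                values.append(None)
    return columns
```

The result holds one list per column regardless of row count, and a dict of equal-length lists can be passed straight to pandas.DataFrame, which is where the per-row dict overhead disappears.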

…(6-7x)

- Replace path.to_vec() allocations with a mutable push/pop path stack
- Add a HashMap cache for generate_name to avoid redundant string allocation
- Add normalize_columnar_batch: a single Rust call for all items, eliminating per-item Python↔Rust round-trips and the Python-side merge loop
- Update _execution.py to use the batch function
- All 43 tests pass; the benchmark shows a 5-7x end-to-end speedup

https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh
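
The push/pop path stack and the generate_name cache are Rust-side changes; the sketch below transliterates the same pattern into Python to show its structure. The naming rule (joining segments with ".") and `walk_leaves` are hypothetical, and functools.lru_cache stands in for the HashMap cache. In Rust the saving comes from not calling path.to_vec() at every node; in Python, tuple(path) is still built at each leaf, so this illustrates the shape of the technique rather than its performance.

```python
from functools import lru_cache
from typing import Any, Iterator


@lru_cache(maxsize=None)
def generate_name(path: tuple[str, ...]) -> str:
    # Hypothetical naming rule: join path segments with ".".
    return ".".join(path)


def walk_leaves(value: Any) -> Iterator[tuple[str, Any]]:
    """Yield (column_name, leaf) pairs, reusing one mutable path stack."""
    path: list[str] = []

    def visit(node: Any) -> Iterator[tuple[str, Any]]:
        if isinstance(node, dict):
            for key, child in node.items():
                path.append(key)  # push before descending
                yield from visit(child)
                path.pop()  # pop on the way back up
        elif isinstance(node, list):
            for item in node:
                yield from visit(item)  # list items share the parent's path
        else:
            # The path is only snapshotted at leaves, and the snapshot is the cache key.
            yield generate_name(tuple(path)), node

    yield from visit(value)
```

Repeated paths (every element of a list of objects shares the same column names) hit the cache, so each distinct column name is built once per process.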

- Add arrow crate (v55) with pyarrow FFI for zero-copy Rust→Python transfer
- New normalize_arrow_batch: builds typed Arrow arrays (Int64, Float64, Boolean, Utf8) directly from the Rust Value enum and passes a RecordBatch to Python via the Arrow C Data Interface, with no per-cell PyObject creation
- Add pyarrow as a runtime dependency
- Update _execution.py to use the Arrow path as the primary path when the Rust engine is available
- Add create_from_arrow to DataFrameLibrary using RecordBatch.to_pandas()
- Benchmark: 6.2x (5K rows), 7.6x (20K), 10.1x (60K) end-to-end speedup

https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh
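
Building typed Arrow arrays requires choosing one type per column. The sketch below shows one plausible rule for mapping values onto Int64, Float64, Boolean, and Utf8; the promotion of mixed int/float columns to Float64, the Utf8 fallback for everything else, the Null label for all-null columns, and the `infer_arrow_type` name are assumptions, not taken from the PR. It also ignores integers outside the 64-bit range.

```python
from typing import Any


def infer_arrow_type(values: list[Any]) -> str:
    """Choose an Arrow type label for one column; None entries become nulls."""
    present = [v for v in values if v is not None]
    if not present:
        return "Null"  # assumption: an all-null column gets Arrow's Null type
    # Check bool first: in Python, bool is a subclass of int.
    if all(isinstance(v, bool) for v in present):
        return "Boolean"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in present):
        return "Int64"  # note: no check that values fit in 64 bits
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "Float64"  # assumption: mixed ints and floats promote to Float64
    return "Utf8"  # assumption: strings and mixed kinds fall back to Utf8
```

On the Python side, such a label could be mapped to pyarrow.int64(), pyarrow.float64(), pyarrow.bool_(), or pyarrow.string() when building an array with pyarrow.array(values, type=...); the PR instead does this typing in Rust so no per-cell Python objects are created.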

claude added 6 commits February 22, 2026 10:01

galpin wants to merge 6 commits into main from claude/rust-pluck-engine-RnHyI

galpin commented Feb 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

galpin commented Feb 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants