Outlines the architecture and steps for adding a PyO3/maturin-based Rust engine to accelerate JSON normalization, tree walking, and frame extraction while preserving the public Python API with graceful fallback. https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh
Rebased onto latest main which migrated from Poetry to uv with uv_build backend and added pre-commit hooks (ruff + ty). Updated plan to reflect:
- Replace uv_build (not poetry-core) with maturin as build backend
- CI steps use uv commands (uv sync, uv run) instead of poetry
- Pre-commit compatibility with ty's unresolved-import = "warn"
- Development workflow uses maturin develop + uv run pytest
https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh
Add a compiled Rust extension (pluck._pluck_engine) that accelerates the two most expensive operations: JSON normalization with cross-joins and frame extraction from GraphQL responses.
Key changes:
- rust/: PyO3 extension with normalize(), extract_frames(), and walker
- src/pluck/_engine.py: engine selector with graceful Python fallback
- src/pluck/_execution.py: delegates to engine instead of direct calls
- pyproject.toml: maturin build backend replaces uv_build
- CI.yml: adds Rust toolchain and maturin build steps
- tests/test_performance.py: benchmark comparing Python vs Rust (2.2x)
The public API is unchanged. When the Rust extension is unavailable, the library falls back transparently to the existing Python code.
https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh
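The fallback described above could look roughly like the sketch below. This is an illustration only, not the contents of src/pluck/_engine.py: the helper names `load_normalize` and `_python_normalize` are hypothetical, and the pure-Python stand-in just copies its input so that the example runs without the library installed.

```python
import importlib


def _python_normalize(record: dict) -> list[dict]:
    # Hypothetical stand-in for the existing pure-Python implementation.
    return [dict(record)]


def load_normalize(module_name: str = "pluck._pluck_engine"):
    """Return (normalize_fn, engine_name), preferring the compiled extension."""
    try:
        engine = importlib.import_module(module_name)
    except ImportError:
        # Extension not built or not importable: use the Python code path.
        return _python_normalize, "python"
    return engine.normalize, "rust"


# A module name that cannot be imported exercises the fallback branch.
normalize, engine_name = load_normalize("no_such_pkg_for_demo._engine")
print(engine_name, normalize({"a": 1}))
```

Resolving the engine once at import time keeps the choice out of the hot path, and callers never need to know which implementation they received.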
Two key optimizations to the Rust normalization engine:
1. Native Rust Value enum: Convert Python objects to Rust types once at
the boundary, do all normalization in pure Rust (no GIL interaction),
then convert back. Eliminates ~33K Py_INCREF calls per benchmark.
Uses Rc<str> for column names so cross-join cloning is a pointer copy.
2. Columnar output format: New normalize_columnar() returns {col: [vals]}
instead of [{col: val}]. Creates 1 dict + N_cols lists instead of
N_rows dicts. Pandas consumes columnar data much faster.
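To make the difference between the two output shapes concrete, here is a small pure-Python sketch that converts row-oriented records into the columnar layout. It is not the Rust normalize_columnar() implementation; `rows_to_columns` is a hypothetical helper, and padding missing keys with None is an assumption about how ragged rows would be handled.

```python
def rows_to_columns(rows: list[dict]) -> dict[str, list]:
    """Convert [{col: val}, ...] into {col: [vals]}, padding gaps with None."""
    columns: dict[str, list] = {}
    for i, row in enumerate(rows):
        for key, value in row.items():
            if key not in columns:
                # Column first seen at row i: backfill earlier rows.
                columns[key] = [None] * i
            columns[key].append(value)
        for col in columns.values():
            # Columns absent from this row get a placeholder.
            if len(col) <= i:
                col.append(None)
    return columns


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3}]
print(rows_to_columns(rows))
```

For N rows and C columns, the row layout allocates N dicts while the columnar layout allocates one dict and C lists, which is where the saving described above comes from.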
End-to-end benchmarks (normalize + DataFrame creation):
5K rows: Python 0.077s → Rust 0.014s (5.6x)
20K rows: Python 0.222s → Rust 0.049s (4.6x)
60K rows: Python 0.717s → Rust 0.203s (3.5x)
https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh
…(6-7x)
- Replace path.to_vec() allocations with mutable push/pop path stack
- Add HashMap cache for generate_name to avoid redundant string allocation
- Add normalize_columnar_batch: single Rust call for all items, eliminating per-item Python↔Rust round-trips and Python-side merge loop
- Update _execution.py to use batch function
- All 43 tests pass, benchmark shows 5-7x end-to-end speedup
https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh
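The push/pop path stack and the name cache can be sketched in Python as follows. This only mirrors the structure of the change: the real code is Rust, and the assumption that generate_name joins path segments with "." is mine, as are the names `flatten` and `name_for`. In Python the tuple key still allocates, so the sketch shows the shape of the technique, not its performance.

```python
def flatten(obj, separator: str = "."):
    """Flatten nested dicts into {dotted.path: leaf} using one shared path stack."""
    path: list[str] = []                       # mutated in place, never copied per node
    name_cache: dict[tuple[str, ...], str] = {}
    out: dict[str, object] = {}

    def name_for() -> str:
        key = tuple(path)
        name = name_cache.get(key)
        if name is None:
            # Build the joined name once per distinct path.
            name = separator.join(key)
            name_cache[key] = name
        return name

    def walk(node) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                path.append(key)               # push on the way down
                walk(value)
                path.pop()                     # pop on the way back up
        else:
            out[name_for()] = node

    walk(obj)
    return out


print(flatten({"a": {"b": 1, "c": 2}, "d": 3}))
```

The cache pays off when the same path is visited many times, for example once per item in a batch of similarly shaped records.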
- Add arrow crate (v55) with pyarrow FFI for zero-copy Rust→Python transfer
- New normalize_arrow_batch: builds typed Arrow arrays (Int64, Float64, Boolean, Utf8) directly from Rust Value enum, passes RecordBatch to Python via Arrow C Data Interface — no per-cell PyObject creation
- Add pyarrow as runtime dependency
- Update _execution.py to use Arrow path as primary when Rust engine available
- Add create_from_arrow to DataFrameLibrary using RecordBatch.to_pandas()
- Benchmark: 6.2x (5K rows), 7.6x (20K), 10.1x (60K) end-to-end speedup
https://claude.ai/code/session_012euCx7hSyKp8V4yuYCK9Dh