Add ruby-rbs crate: Safe Rust wrapper over ruby-rbs-sys
#2808
Open: alexcrocha wants to merge 32 commits into ruby:master from Shopify:rust-rbs
+1,271 −0
This commit introduces the `ruby-rbs` crate, which will provide a safe, high-level Rust API for the RBS C library. It follows the common Rust pattern of separating the safe wrapper from the `*-sys` crate that provides the raw FFI bindings.
The `ruby-rbs` crate will depend on `ruby-rbs-sys` for the unsafe C bindings and will expose a safe, idiomatic Rust interface. This commit sets up the foundation for that structure.
The initial implementation includes:
- The basic crate structure with its own `Cargo.toml`, declaring a dependency on `ruby-rbs-sys`.
- A build script (`build.rs`) that will be responsible for generating safe Rust wrappers from the C API. Currently, it only generates an empty `bindings.rs` file.
- The `ruby-rbs` crate is added to the main workspace `Cargo.toml`.
While the interaction is not yet implemented, this setup paves the way for providing a robust Rust interface for RBS, which will improve safety and developer experience.
The build script now reads the `config.yml` file and generates corresponding Rust struct definitions for all RBS AST nodes.
Implementation details:
- Parse `config.yml` using serde to extract node definitions
- Generate proper Rust module hierarchy from `::` namespace separators
- Apply Rust naming conventions:
  - Modules use snake_case
  - Structs remain PascalCase
- Handle Rust reserved keywords (`Use` -> `UseDirective`, `Self` -> `SelfType`)
- Smart PascalCase to snake_case conversion that correctly handles acronyms (e.g., 'AST' -> 'ast', not 'a_s_t')
The generated bindings create empty struct definitions organized in the correct module hierarchy, laying the foundation for the safe Rust API that will wrap the `ruby-rbs-sys` FFI bindings.
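As an illustration of the acronym-aware conversion and keyword handling described above, here is a minimal sketch. It is not the code in `build.rs`; the function names `pascal_to_snake` and `escape_reserved` are hypothetical, and only the two renames named in the commit message are covered.

```rust
/// Convert a PascalCase identifier to snake_case, keeping runs of capitals
/// (acronyms) together instead of splitting every letter.
pub fn pascal_to_snake(name: &str) -> String {
    let chars: Vec<char> = name.chars().collect();
    let mut out = String::with_capacity(name.len() + 4);
    for (i, &c) in chars.iter().enumerate() {
        if c.is_ascii_uppercase() && i > 0 {
            let prev = chars[i - 1];
            let next_is_lower = chars
                .get(i + 1)
                .map_or(false, |n| n.is_ascii_lowercase());
            // Start a new word after a lowercase letter or digit, or when an
            // acronym ends and a capitalized word begins (e.g. "ASTNode").
            let boundary = prev.is_ascii_lowercase()
                || prev.is_ascii_digit()
                || (prev.is_ascii_uppercase() && next_is_lower);
            if boundary {
                out.push('_');
            }
        }
        out.push(c.to_ascii_lowercase());
    }
    out
}

/// Rename node names that collide with Rust keywords.
/// Only the two cases mentioned in the commit message are handled here.
pub fn escape_reserved(name: &str) -> &str {
    match name {
        "Use" => "UseDirective",
        "Self" => "SelfType",
        other => other,
    }
}
```

The boundary rule looks one character ahead, which is what keeps `AST` from becoming `a_s_t` while still splitting `ASTNode` into two words.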
Instead of auto-generating nested module paths from RBS nested naming conventions, use explicit `rust_name` fields in `config.yml` and generate flat structs.
- Add `rust_name` field to all node definitions in `config.yml`
- Remove complex module/path parsing logic from `build.rs`
- Generate flat structs (e.g., `ClassNode`) instead of nested modules
- Add `Node` enum to wrap all node types
This makes the generated Rust code easier to work with.
Handle rbs_string field types when generating Rust structs from config.yml. The RBSString struct wraps rbs_string_t pointers and provides an as_bytes() method that safely calculates string length using pointer arithmetic.
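A rough sketch of the length calculation, for readers unfamiliar with the pattern. `RawString` is a stand-in for the C `rbs_string_t`, and its start/end pointer layout is an assumption made for this example; the constructor `new` is also hypothetical. The real wrapper in this PR may differ.

```rust
use std::marker::PhantomData;
use std::os::raw::c_char;

/// Stand-in for the C `rbs_string_t` (assumed layout: start and end pointers).
#[repr(C)]
pub struct RawString {
    pub start: *const c_char,
    pub end: *const c_char,
}

/// Thin wrapper that borrows the C string for the lifetime `'a`.
pub struct RBSString<'a> {
    raw: *const RawString,
    _marker: PhantomData<&'a RawString>,
}

impl<'a> RBSString<'a> {
    /// Hypothetical constructor used only by this sketch.
    pub fn new(raw: &'a RawString) -> Self {
        RBSString {
            raw: raw as *const RawString,
            _marker: PhantomData,
        }
    }

    /// View the string contents as bytes. The length is the distance
    /// between the end and start pointers.
    pub fn as_bytes(&self) -> &'a [u8] {
        unsafe {
            let s = &*self.raw;
            // Both pointers must point into the same allocation, with
            // `end` at or after `start`.
            let len = s.end.offset_from(s.start) as usize;
            std::slice::from_raw_parts(s.start as *const u8, len)
        }
    }
}
```

Returning `&[u8]` rather than `&str` avoids assuming the source is valid UTF-8.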
The `parse` function enables parsing RBS code from Rust. This provides a safe Rust interface to the C parser, handling memory management and encoding setup.
Since `bool` is a primitive type with direct FFI mapping between C and Rust, we don't need a wrapper struct like we do for complex types (`rbs_string_t`, etc.).
Symbol fields in RBS AST nodes store their values as constant IDs that need to be resolved through the parser's constant pool. This safe Rust wrapper (`RBSSymbol`) maintains a reference to the parser and provides access to the symbol's name bytes, similar to how `RBSString` handles string types. The build script now generates accessors for `rbs_ast_symbol` fields that properly pass both the symbol pointer and parser reference to enable constant pool lookups.
Refactor node structs to use pointer-based access and add NodeList iterator
Changes node generation from storing individual fields to holding a single pointer to the C struct. This avoids duplicating data in Rust structs and matches the pattern used in Prism's bindings. We just maintain a thin wrapper around the C pointer and dereference it in accessor methods.
Adds `NodeList`/`NodeListIter` to enable idiomatic Rust iteration over RBS's linked list structures, and implements a `Node::new()` factory method that type-checks the C node pointer and constructs the appropriate Rust variant with proper pointer casting.
Also adds a `convert_name()` helper to generate C identifiers from RBS node names (`snake_case_t` for types, `UPPER_CASE` for enum constants).
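To make the linked-list iteration concrete, here is a minimal sketch of a `NodeList`/`NodeListIter` pair. The `Raw*` structs are simplified stand-ins for the C types (their field names and the `kind` tag are assumptions for this example), and the iterator yields raw node references rather than the generated `Node` enum.

```rust
use std::marker::PhantomData;

/// Stand-in for a C AST node; `kind` plays the role of the node type tag.
#[repr(C)]
pub struct RawNode {
    pub kind: u32,
}

/// Stand-in for one cell of the C linked list.
#[repr(C)]
pub struct RawListCell {
    pub node: *mut RawNode,
    pub next: *mut RawListCell,
}

/// Stand-in for the C list header.
#[repr(C)]
pub struct RawNodeList {
    pub head: *mut RawListCell,
}

/// Thin wrapper holding only a pointer to the C list.
pub struct NodeList<'a> {
    raw: *const RawNodeList,
    _marker: PhantomData<&'a RawNodeList>,
}

pub struct NodeListIter<'a> {
    current: *mut RawListCell,
    _marker: PhantomData<&'a RawListCell>,
}

impl<'a> NodeList<'a> {
    /// Hypothetical constructor used only by this sketch.
    pub fn new(raw: &'a RawNodeList) -> Self {
        NodeList { raw: raw as *const RawNodeList, _marker: PhantomData }
    }

    pub fn iter(&self) -> NodeListIter<'a> {
        let head = unsafe { (*self.raw).head };
        NodeListIter { current: head, _marker: PhantomData }
    }
}

impl<'a> Iterator for NodeListIter<'a> {
    type Item = &'a RawNode;

    fn next(&mut self) -> Option<Self::Item> {
        if self.current.is_null() {
            return None; // reached the end of the list
        }
        unsafe {
            let cell = &*self.current;
            self.current = cell.next; // advance before yielding
            Some(&*cell.node)
        }
    }
}
```

Implementing `Iterator` means callers get `map`, `filter`, `collect`, and `for` loops without any list-specific API.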
Many AST nodes in `config.yml` have location fields (`rbs_location`, `rbs_location_list`). This change adds the necessary wrapper structs (`RBSLocation`, `RBSLocationList`) and updates `build.rs` to generate accessors for these fields. The `RBSLocation` wrapper includes a reference to the parser to support future functionality like source extraction.
Enable nested AST traversal by exposing `rbs_node` and `rbs_node_list` fields.
Nested structure traversal (e.g., class members, constant types) depends on access to `rbs_node` and `rbs_node_list` fields. Making these fields accessible aligns the Rust bindings with the C API. Fields named "type" are accessible via `type_` to avoid a Rust keyword collision.
Adds `test_parse_integer()` which parses an integer literal type alias and traverses the AST (`TypeAlias` -> `LiteralType` -> `Integer`) using pattern matching to verify node types and extract values. This validates that the generated node wrappers enable AST traversal in pure Rust with proper type safety. Also adds `Debug` derives and refactors memory management by returning `SignatureNode` instead of raw pointer, with `Drop` impl to free parser.
Refactor the previous implementation of `Symbol`/`Keyword` handling to treat them as first-class nodes in the build configuration.
`Keyword` and `Symbol` represent identifiers (interned strings), not traditional AST nodes. However, the C parser defines them in `rbs_node_type` (as `RBS_KEYWORD` and `RBS_AST_SYMBOL`) and treats them as nodes (`rbs_node_t*`) in many contexts (lists, hashes).
Instead of manually defining `RBSSymbol`/`RBSKeyword` structs, we now inject them into the `config.yml` node list in `build.rs`. This allows them to be generated as `SymbolNode`/`KeywordNode` variants in the `Node` enum, enabling polymorphic handling (in node lists and hashes).
Add support for RBS hashes (`rbs_hash_t`), which are used in Record types and Function keyword arguments
Enable walking the AST by generating a `Visit` trait with per-node visitor methods. It uses double dispatch to route each node type to its corresponding visitor method. This avoids consumers needing to manually match on Node variants and allows overriding specific visits while inheriting default behaviour for others.
Some C struct pointer fields can be NULL (super_class when no parent class, comment when no doc comment). This metadata allows our Rust codegen to generate Option<T> return types for these accessors instead of unconditionally wrapping potentially NULL pointers.
Read `optional: true` annotations from `config.yml` and generate `Option<T>` return types with null checks, so we don't crash at runtime. The extracted helper function centralizes the accessor generation logic for pointer-based field types.
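A sketch of what a generated accessor for an optional pointer field might look like. The `RawClass` and `RawComment` structs are simplified stand-ins for the C types, and the `line` field and `new` constructor are invented for the example; only the null-check-to-`Option` shape is the point here.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Stand-in for a C comment node.
#[repr(C)]
pub struct RawComment {
    pub line: u32,
}

/// Stand-in for a C class declaration whose `comment` may be NULL.
#[repr(C)]
pub struct RawClass {
    pub comment: *mut RawComment,
}

pub struct CommentNode<'a> {
    pointer: NonNull<RawComment>,
    _marker: PhantomData<&'a RawComment>,
}

impl<'a> CommentNode<'a> {
    pub fn line(&self) -> u32 {
        unsafe { self.pointer.as_ref().line }
    }
}

pub struct ClassNode<'a> {
    pointer: *const RawClass,
    _marker: PhantomData<&'a RawClass>,
}

impl<'a> ClassNode<'a> {
    /// Hypothetical constructor used only by this sketch.
    pub fn new(raw: &'a RawClass) -> Self {
        ClassNode { pointer: raw as *const RawClass, _marker: PhantomData }
    }

    /// Accessor for a field marked `optional: true`:
    /// a NULL pointer becomes `None` instead of being wrapped blindly.
    pub fn comment(&self) -> Option<CommentNode<'a>> {
        let raw = unsafe { (*self.pointer).comment };
        NonNull::new(raw).map(|pointer| CommentNode {
            pointer,
            _marker: PhantomData,
        })
    }
}
```

`NonNull::new` performs the null check, so callers are forced to handle the missing case through `Option` rather than dereferencing a NULL pointer at runtime.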
The `Visit` trait added in #69 provided the scaffolding for AST traversal, but the visitor functions were empty stubs that didn't recurse into child nodes. Without this, the visitor pattern is incomplete, as we'd have to manually write traversal logic every time we want to walk the tree.
This commit adds the generation of visitor functions for child node traversal. We handle four field types:
- `rbs_node`: single child node
- `rbs_node_list`: list of child nodes
- `rbs_hash`: key-value pairs of nodes
- Wrapper types (`rbs_type_name`, `rbs_namespace`, etc.): each with its own visitor method
Each case handles optional fields to safely skip NULL pointers.
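The overall shape of the visitor can be sketched as follows. This uses owned, hand-written node types instead of the generated pointer wrappers, and the `walk_class_node` helper name is an assumption, so treat it as an illustration of the dispatch and default-recursion pattern rather than the generated code.

```rust
pub struct TypeAliasNode {
    pub name: String,
}

pub struct ClassNode {
    pub name: String,
    pub members: Vec<Node>,
}

pub enum Node {
    Class(ClassNode),
    TypeAlias(TypeAliasNode),
}

pub trait Visit {
    /// Entry point: route each node to its type-specific method.
    fn visit(&mut self, node: &Node) {
        match node {
            Node::Class(n) => self.visit_class_node(n),
            Node::TypeAlias(n) => self.visit_type_alias_node(n),
        }
    }

    /// Default behaviour: recurse into the children.
    fn visit_class_node(&mut self, node: &ClassNode) {
        walk_class_node(self, node);
    }

    /// Leaf node in this sketch: nothing to recurse into.
    fn visit_type_alias_node(&mut self, _node: &TypeAliasNode) {}
}

/// Child traversal lives in a free function so that an overriding
/// visitor can still call it to continue walking.
pub fn walk_class_node<V: Visit + ?Sized>(visitor: &mut V, node: &ClassNode) {
    for member in &node.members {
        visitor.visit(member);
    }
}

/// Example visitor: overrides one method, inherits the rest.
pub struct AliasCollector {
    pub names: Vec<String>,
}

impl Visit for AliasCollector {
    fn visit_type_alias_node(&mut self, node: &TypeAliasNode) {
        self.names.push(node.name.clone());
    }
}
```

Because `AliasCollector` does not override `visit_class_node`, the default implementation walks nested classes for it, which is the benefit the commit message describes.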
Each node already has location data in its C struct, but it wasn't exposed through the Rust API. This adds a generated `location()` method to every node type, making it easy to get source ranges for any part of the AST. Also removes `parser` from the location structs, as it is not needed.
Addressing some linting warnings
Adds `location()` accessor to the `Node` enum, delegating to each variant's `location()` method. A previous commit added `location()` to individual node types but missed the enum itself. This allows getting the location of the entire node definition when working with the `Node` enum directly.
Reorder `lib.rs` structs alphabetically
Improve bindings code formatting
Remove TODO comments from Rust crate
Some nodes don't use their parser field, but conditionally omitting it adds significant complexity. Keep parser on all nodes and suppress the warning on the parser field.
Remove debug comment from generated bindings
Adds lifetimes to make borrowing relationships clearer so the Rust compiler can validate and enforce them.
Replaced `*mut T` with `NonNull<T>` for the parser pointer to make the ‘never null’ assumption explicit. `NonNull<T>` represents a non-null raw pointer (a wrapper around `*mut T`) that guarantees the pointer is never null.
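The combination of `NonNull`, a `Drop` impl on the root, and lifetimes on child nodes can be sketched like this. The parser is simulated with a boxed Rust struct, and `raw_parser_new`, `raw_parser_free`, `NodeRef`, and `source_len` are all hypothetical names standing in for the C API and the generated wrappers.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Stand-in for the C parser state.
pub struct RawParser {
    source_len: usize,
}

/// Simulates the C allocation function (hypothetical).
fn raw_parser_new(source: &str) -> *mut RawParser {
    Box::into_raw(Box::new(RawParser { source_len: source.len() }))
}

/// Simulates the C free function (hypothetical).
/// Safety: `p` must come from `raw_parser_new` and be freed only once.
unsafe fn raw_parser_free(p: *mut RawParser) {
    unsafe { drop(Box::from_raw(p)) }
}

/// Root of the parsed tree; owns the parser.
pub struct SignatureNode {
    parser: NonNull<RawParser>,
}

impl SignatureNode {
    pub fn parse(source: &str) -> Option<Self> {
        // `NonNull::new` turns a NULL result into `None` up front, so the
        // rest of the code never needs to re-check the pointer.
        NonNull::new(raw_parser_new(source)).map(|parser| SignatureNode { parser })
    }

    /// Child nodes borrow from `self`, so they cannot outlive the parser.
    pub fn root(&self) -> NodeRef<'_> {
        NodeRef { parser: self.parser, _marker: PhantomData }
    }
}

impl Drop for SignatureNode {
    fn drop(&mut self) {
        // Frees the parser exactly once, when the root goes out of scope.
        unsafe { raw_parser_free(self.parser.as_ptr()) }
    }
}

pub struct NodeRef<'a> {
    parser: NonNull<RawParser>,
    _marker: PhantomData<&'a SignatureNode>,
}

impl<'a> NodeRef<'a> {
    pub fn source_len(&self) -> usize {
        unsafe { self.parser.as_ref().source_len }
    }
}
```

With this shape, holding a `NodeRef` after its `SignatureNode` has been dropped is rejected by the borrow checker at compile time instead of becoming a use-after-free.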
`TypeApplicationAnnotation`, `InstanceVariableAnnotation`, `ClassAliasAnnotation`, and `ModuleAliasAnnotation` also need `rust_name` fields for Rust binding code generation.
Description
This PR introduces `ruby-rbs`, a safe Rust wrapper for the RBS parser. It builds on #2807 (`ruby-rbs-sys`); please refer to that PR for motivation and background on the two-crate approach.
ruby-rbs Overview
- `build.rs` reads `config.yml` and generates Rust structs matching the C AST node types
- Lifetimes (`'a`) tie all nodes to the parser, preventing use-after-free
- `SignatureNode` implements `Drop` to free the parser when dropped
- `Visit` trait for traversing the AST
Changes to config.yml
This PR adds two new fields to node definitions:
- `rust_name`: Specifies the Rust struct name (e.g., `BoolNode` for `RBS::AST::Bool`)
- `optional`: Documents which fields can be NULL in the C parser
No impact on existing code. These fields are ignored by the Ruby/C code generators.