Support loading Variable and Sample metadata from parquet file metada…#16
Merged
…ta if it is present.
Merge clippy lints not related to this PR.
…on't match their data type.
- Fix Parquet physical type to IPUMS type mapping (INT32/INT64 -> integer, FLOAT/DOUBLE -> double, BYTE_ARRAY -> string)
- Return errors on variable parse failures instead of silently skipping
- Fix category sorting for Float and String types
- Optimize dataset lookup by hoisting it outside variable loop
- Add missing assert!() in test and improve test clarity
- Add test for parquet_type_to_ipums_type conversion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reads the IPUMS Parquet file metadata (key-value pairs) and provides it to the Context instance. Looks for the 'variable' and 'sample' keys and deserializes their content into IpumsVariable and IpumsSample structs.
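As a rough illustration of the key lookup described above, here is a minimal sketch that uses only the standard library. It assumes the Parquet key-value metadata has already been collected into a `HashMap<String, String>`. The names `RawIpumsMetadata` and `extract_ipums_entries` are hypothetical and do not come from this PR. Deserializing into IpumsVariable and IpumsSample is left out, since the serialization format is not stated here.

```rust
use std::collections::HashMap;

/// Raw, still-serialized metadata entries (hypothetical holder type).
#[derive(Debug, Default, PartialEq)]
pub struct RawIpumsMetadata {
    /// Content stored under the 'variable' key, if present.
    pub variable: Option<String>,
    /// Content stored under the 'sample' key, if present.
    pub sample: Option<String>,
}

/// Pull the 'variable' and 'sample' entries out of the file's key-value
/// metadata. A missing key yields `None` rather than an error, matching
/// the "if it is present" wording of the PR title.
pub fn extract_ipums_entries(kv: &HashMap<String, String>) -> RawIpumsMetadata {
    RawIpumsMetadata {
        variable: kv.get("variable").cloned(),
        sample: kv.get("sample").cloned(),
    }
}
```

A caller would then hand each `Some(..)` payload to whatever deserializer matches the stored format, and fall back to other sources when the entry is `None`.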
This implements some context methods for loading from Parquet and adds the parquet_metadata module. There is a mapping from Parquet physical types to IPUMS types, so if the variable metadata does not include the data type, we can consult the Parquet schema to get the variable's type.
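The mapping and fallback described above could look roughly like the sketch below. The function name `parquet_type_to_ipums_type` and the INT32/INT64, FLOAT/DOUBLE, and BYTE_ARRAY cases come from the commit message. The rest is assumed for illustration: the local `PhysicalType` and `IpumsType` enums, the `Option` return type, the handling of BOOLEAN as unmapped, and the `resolve_type` helper. The actual code presumably works with the `parquet` crate's own type enum.

```rust
/// Subset of Parquet physical types (hypothetical local stand-in for the
/// type enum provided by the `parquet` crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhysicalType {
    Int32,
    Int64,
    Float,
    Double,
    ByteArray,
    Boolean,
}

/// IPUMS variable data types named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IpumsType {
    Integer,
    Double,
    String,
}

/// Map a Parquet physical type to an IPUMS type.
/// Returning `None` for unmapped types is an assumption of this sketch.
pub fn parquet_type_to_ipums_type(physical: PhysicalType) -> Option<IpumsType> {
    match physical {
        PhysicalType::Int32 | PhysicalType::Int64 => Some(IpumsType::Integer),
        PhysicalType::Float | PhysicalType::Double => Some(IpumsType::Double),
        PhysicalType::ByteArray => Some(IpumsType::String),
        PhysicalType::Boolean => None,
    }
}

/// Prefer the type from the variable metadata; consult the Parquet schema
/// only when the metadata does not carry one (hypothetical helper).
pub fn resolve_type(
    from_metadata: Option<IpumsType>,
    physical: PhysicalType,
) -> Option<IpumsType> {
    from_metadata.or_else(|| parquet_type_to_ipums_type(physical))
}
```

Keeping the fallback in a separate helper keeps the priority explicit: a type declared in the metadata always wins, and the schema is consulted only when that type is absent.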