Support loading Variable and Sample metadata from parquet file metada… #16

Merged: ccdavis merged 5 commits into main from parquet-metadata on Dec 29, 2025

ccdavis (Owner) commented Aug 14, 2025

Reads the key-value metadata from an IPUMS Parquet file and provides it to the Context instance. It looks for the 'variable' and 'sample' keys and deserializes their content into IpumsVariable and IpumsSample structs.
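A minimal sketch of the key lookup step, assuming the crate is Rust (the clippy and `assert!()` mentions suggest so) and that the file's key-value metadata has already been read into a list of entries shaped like the `parquet` crate's `KeyValue` (a `key: String` plus a `value: Option<String>`). The local `KeyValue`, `MetadataPayloads` and `find_ipums_metadata` names are hypothetical, and turning each payload into `IpumsVariable` / `IpumsSample` is left out because this PR doesn't describe the serialized format.

```rust
/// Hypothetical stand-in for one Parquet key-value metadata entry
/// (same shape as the `parquet` crate's `KeyValue`: a key and an optional value).
#[derive(Debug, Clone)]
struct KeyValue {
    key: String,
    value: Option<String>,
}

/// Raw serialized payloads found under the 'variable' and 'sample' keys.
/// Deserializing them into IpumsVariable / IpumsSample is a separate step.
#[derive(Debug, Default, PartialEq)]
struct MetadataPayloads {
    variable: Option<String>,
    sample: Option<String>,
}

/// Scan the key-value metadata for the 'variable' and 'sample' entries.
/// A missing key leaves the field as None, so the caller can fall back
/// to another source (for example the Parquet schema) instead of failing.
fn find_ipums_metadata(kv: &[KeyValue]) -> MetadataPayloads {
    let mut found = MetadataPayloads::default();
    for entry in kv {
        match entry.key.as_str() {
            "variable" => found.variable = entry.value.clone(),
            "sample" => found.sample = entry.value.clone(),
            _ => {} // ignore unrelated keys written by other tools
        }
    }
    found
}

fn main() {
    let kv = vec![
        KeyValue { key: "sample".to_string(), value: Some("payload".to_string()) },
        KeyValue { key: "writer.version".to_string(), value: Some("x".to_string()) },
    ];
    println!("{:?}", find_ipums_metadata(&kv));
}
```

Returning `Option` per key rather than an error keeps "no IPUMS metadata in this file" a normal case, which fits the schema fallback described in the next paragraph.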

This implements some Context methods for loading from Parquet and adds the parquet_metadata module.

It also adds a mapping from Parquet physical types to IPUMS types, so if the variable metadata doesn't include a data type, the variable's type can be taken from the Parquet schema instead.
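A sketch of that fallback, using the mapping listed in the commit message (INT32/INT64 -> integer, FLOAT/DOUBLE -> double, BYTE_ARRAY -> string). The function name echoes `parquet_type_to_ipums_type` from the commits, but its signature, the local `PhysicalType` and `IpumsType` enums, the `resolve_type` helper, and returning `None` for unmapped types (BOOLEAN, INT96, FIXED_LEN_BYTE_ARRAY) are all assumptions made for illustration.

```rust
/// Parquet physical types as named in the Parquet format specification.
/// Local enum for illustration only; the `parquet` crate defines its own.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType {
    Boolean,
    Int32,
    Int64,
    Int96,
    Float,
    Double,
    ByteArray,
    FixedLenByteArray,
}

/// Hypothetical IPUMS type enum for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum IpumsType {
    Integer,
    Double,
    String,
}

/// INT32/INT64 -> integer, FLOAT/DOUBLE -> double, BYTE_ARRAY -> string.
/// Any other physical type yields None (an assumption of this sketch).
fn parquet_type_to_ipums_type(t: PhysicalType) -> Option<IpumsType> {
    match t {
        PhysicalType::Int32 | PhysicalType::Int64 => Some(IpumsType::Integer),
        PhysicalType::Float | PhysicalType::Double => Some(IpumsType::Double),
        PhysicalType::ByteArray => Some(IpumsType::String),
        _ => None,
    }
}

/// Use the type from variable metadata when present; otherwise consult
/// the column's Parquet physical type.
fn resolve_type(from_metadata: Option<IpumsType>, physical: PhysicalType) -> Option<IpumsType> {
    from_metadata.or_else(|| parquet_type_to_ipums_type(physical))
}

fn main() {
    // Metadata wins when it carries a type...
    println!("{:?}", resolve_type(Some(IpumsType::String), PhysicalType::Int32));
    // ...and the schema is consulted only when it doesn't.
    println!("{:?}", resolve_type(None, PhysicalType::Double));
}
```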

ccdavis and others added 5 commits August 13, 2025 23:46
Merge clippy lints not related to this PR.
- Fix Parquet physical type to IPUMS type mapping (INT32/INT64 -> integer,
  FLOAT/DOUBLE -> double, BYTE_ARRAY -> string)
- Return errors on variable parse failures instead of silently skipping
- Fix category sorting for Float and String types
- Optimize dataset lookup by hoisting it outside variable loop
- Add missing assert!() in test and improve test clarity
- Add test for parquet_type_to_ipums_type conversion
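Two of the fixes above can be illustrated in a few lines: surfacing variable parse failures instead of skipping them, and sorting float categories even though `f64` is not `Ord`. This std-only sketch is not the PR's code; `ParseError`, `parse_variable`, `parse_all` and the sort helpers are hypothetical, and the "NAME:WIDTH" entry format is a toy stand-in for the real metadata encoding.

```rust
/// Hypothetical error naming the variable that failed to parse.
#[derive(Debug, PartialEq)]
struct ParseError {
    variable: String,
}

/// Hypothetical parser for a toy "NAME:WIDTH" entry (not the real format).
fn parse_variable(raw: &str) -> Result<(String, u32), ParseError> {
    let (name, width) = raw
        .split_once(':')
        .ok_or_else(|| ParseError { variable: raw.to_string() })?;
    let width = width
        .parse::<u32>()
        .map_err(|_| ParseError { variable: name.to_string() })?;
    Ok((name.to_string(), width))
}

/// Collecting Results into Result<Vec<_>, _> stops at the first Err and
/// returns it, instead of silently dropping bad entries as filter_map would.
fn parse_all(raws: &[&str]) -> Result<Vec<(String, u32)>, ParseError> {
    raws.iter().copied().map(parse_variable).collect()
}

/// f64 is not Ord, so plain sort() won't compile; total_cmp supplies a
/// total order that also places NaN values consistently.
fn sort_float_categories(values: &mut [f64]) {
    values.sort_by(|a, b| a.total_cmp(b));
}

/// String is Ord, so sort() works directly.
fn sort_string_categories(values: &mut [String]) {
    values.sort();
}

fn main() {
    println!("{:?}", parse_all(&["AGE:3", "SEX:1"]));
    println!("{:?}", parse_all(&["AGE:3", "SEX:x"]));

    let mut floats = vec![2.5, -1.0, 0.0];
    sort_float_categories(&mut floats);
    println!("{:?}", floats);

    let mut labels = vec!["b".to_string(), "a".to_string()];
    sort_string_categories(&mut labels);
    println!("{:?}", labels);
}
```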

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ccdavis merged commit ecd8cd9 into main on Dec 29, 2025
1 check failed