Dynamically narrow logical types in Parquet writer

From [discord conversation](https://discord.com/channels/885562378132000778/1166447479609376850/1450038250549805086).

The idea is that since we now dynamically adapt predicates/filters on a per-file basis including adding/removing casts, etc. we could dynamically at write time narrow the *logical* types we encode in Parquet.

For example, if we are writing a column that is `Int64` in the schema but for this file in particular the min/max values we see are `(2, 19)` we can add a `UINT_16` annotation to signal to readers that this column is a `UInt16` column. I'm not sure but I'd guess this can be done after writing the data as it's a metadata only operation. Then when a reader sees a predicate like `col > 5` it can cast `5` to `UInt16` and evaluate the whole thing more efficiently. I guess if it were `col > 9999` it could (just from the metadata, without even looking at stats) replace that with `false`?.

A next step would be to narrow the physical representation of the data to save storage costs, etc, but I think that would be more involved since it would require two passes over the data or something like that.

cc @asubiotto 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Dynamically narrow logical types in Parquet writer #19337

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Dynamically narrow logical types in Parquet writer #19337

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions