@wgtmac
Member

@wgtmac wgtmac commented Dec 24, 2025

Implemented the DeleteFileIndex and Builder to manage and efficiently filter
delete files (equality deletes, position deletes, and deletion vectors)
based on sequence numbers and partitions.

Key changes:

  • Added DeleteFileIndex and DeleteFileIndex::Builder in src/iceberg/delete_file_index.{h,cc}.
  • Added ContentFileUtil for helper functions related to content files and DVs.
  • Updated ManifestReader to support dropping stats via TryDropStats().
  • Added comprehensive unit tests in src/iceberg/test/delete_file_index_test.cc.
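
For readers unfamiliar with the filtering rule, here is a minimal sketch of the sequence-number check described above. It is not the code in this PR: `DeleteKind`, `DeleteEntry`, `AppliesTo`, and `FilterDeletes` are hypothetical names, partition matching is assumed to have happened already, and deletion vectors (matched by referenced data file) are left out. The comparison rules follow the Iceberg table spec.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified types for illustration only; not the PR's API.
enum class DeleteKind { kEquality, kPosition };

struct DeleteEntry {
  DeleteKind kind;
  int64_t data_sequence_number;
};

// Iceberg spec rules:
//  - an equality delete applies to data files with a strictly smaller
//    data sequence number;
//  - a position delete applies to data files whose data sequence number
//    is less than or equal to its own.
inline bool AppliesTo(const DeleteEntry& del, int64_t data_file_seq) {
  switch (del.kind) {
    case DeleteKind::kEquality:
      return data_file_seq < del.data_sequence_number;
    case DeleteKind::kPosition:
      return data_file_seq <= del.data_sequence_number;
  }
  return false;
}

// Keeps only the candidates (assumed to share the data file's partition)
// that apply to a data file with the given data sequence number.
inline std::vector<DeleteEntry> FilterDeletes(
    const std::vector<DeleteEntry>& candidates, int64_t data_file_seq) {
  std::vector<DeleteEntry> result;
  for (const auto& del : candidates) {
    if (AppliesTo(del, data_file_seq)) {
      result.push_back(del);
    }
  }
  return result;
}
```

A real index would likely keep each partition's deletes sorted by sequence number so the lookup can use a binary search instead of a linear scan; the linear loop here only keeps the sketch short.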

@wgtmac wgtmac force-pushed the delete_file_index branch 4 times, most recently from 9345afc to 0b8f579 Compare December 25, 2025 03:20
// select record_count, which is a primitive type.
if (!columns.empty()) {
const std::unordered_set<std::string_view> selected(columns.cbegin(), columns.cend());
if (selected.contains(ManifestReader::kAllColumns)) {
Collaborator

Suggested change
- if (selected.contains(ManifestReader::kAllColumns)) {
+ if (selected.contains(Schema::kAllColumns)) {

"upper_bounds", "record_count"};
const std::vector<std::string> kStatsColumns = {
"value_counts", "null_value_counts", "nan_value_counts", "lower_bounds",
"upper_bounds", "column_sizes", "record_count"};
Collaborator

The newly added column_sizes does not seem to match the Java implementation. Is that a bug in Java, or is our implementation different?

3 participants