


@bbalser bbalser commented Feb 9, 2026

This branch adds a Write-Audit-Publish workflow to the helium_iceberg crate — a pattern for safely writing
data to Iceberg tables in three phases: write to an isolated branch, audit/validate, then atomically publish
to main.
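For reviewers new to the pattern, the three phases can be sketched with a toy in-memory table. This is only an illustration of write, audit, publish: `ToyTable`, `run_wap`, and the "no negative values" audit rule are hypothetical and are not code from this branch.

```rust
use std::collections::HashMap;

/// Toy stand-in for an Iceberg table: each branch name maps to the rows visible on it.
/// (Hypothetical type for illustration; not part of helium_iceberg.)
pub struct ToyTable {
    branches: HashMap<String, Vec<i64>>,
}

impl ToyTable {
    pub fn new() -> Self {
        let mut branches = HashMap::new();
        branches.insert("main".to_string(), Vec::new());
        ToyTable { branches }
    }

    /// Phase 1 (write): stage rows on an isolated branch forked from main.
    pub fn write_to_branch(&mut self, branch: &str, rows: &[i64]) {
        let mut staged = self.branches["main"].clone();
        staged.extend_from_slice(rows);
        self.branches.insert(branch.to_string(), staged);
    }

    /// Phase 2 (audit): validate the staged data. The rule here (no negatives) is made up.
    pub fn audit(&self, branch: &str) -> Result<(), String> {
        let rows = self
            .branches
            .get(branch)
            .ok_or_else(|| format!("no such branch: {}", branch))?;
        if rows.iter().any(|r| *r < 0) {
            return Err(format!("audit failed on {}: negative value", branch));
        }
        Ok(())
    }

    /// Phase 3 (publish): replace main with the audited branch in one step, then drop the branch.
    pub fn publish(&mut self, branch: &str) -> Result<(), String> {
        let staged = self
            .branches
            .remove(branch)
            .ok_or_else(|| format!("no such branch: {}", branch))?;
        self.branches.insert("main".to_string(), staged);
        Ok(())
    }

    pub fn main_rows(&self) -> &[i64] {
        &self.branches["main"]
    }
}

/// Run all three phases; a failed audit discards the branch and leaves main untouched.
pub fn run_wap(table: &mut ToyTable, branch: &str, rows: &[i64]) -> Result<(), String> {
    table.write_to_branch(branch, rows);
    if let Err(e) = table.audit(branch) {
        table.branches.remove(branch);
        return Err(e);
    }
    table.publish(branch)
}
```

The point of the pattern is that readers of main never observe rows that failed the audit, because those rows only ever existed on the branch.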

New Files

  • branch.rs — Low-level branch operations: create_branch(), commit_to_branch(), publish_branch(),
    delete_branch(). Builds snapshot/manifest infrastructure directly and tracks a wap.id in snapshot summaries
    for idempotent retry detection.
  • iceberg_table.rs — High-level facade wrapping Catalog + Table. Implements both DataWriter (direct writes)
    and BranchWriter (branch-based WAP writes) traits. Includes smart idempotent WAP state detection with 4
    states: NotStarted, StaleBranch, WrittenNotPublished, AlreadyPublished — enabling crash recovery.
  • staged_writer.rs — Session-based API using a typestate pattern: StagedWriter (write phase) →
    StagedPublisher (publish phase). Enforces the create→write→publish lifecycle at compile time.
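To illustrate the typestate idea behind staged_writer.rs, here is a minimal sketch. Only the type names `StagedWriter` and `StagedPublisher` come from this PR; the method names (`create`, `write`, `finish`, `publish`), the fields, and the row type are assumptions made for the example and may not match the actual API.

```rust
/// Write phase: rows can be staged, but there is no `publish` method on this type.
/// (Fields and methods are hypothetical; only the type name comes from the PR.)
pub struct StagedWriter {
    branch: String,
    rows: Vec<String>,
}

/// Publish phase: only obtainable by consuming a `StagedWriter`.
pub struct StagedPublisher {
    branch: String,
    rows: Vec<String>,
}

impl StagedWriter {
    /// Start a session on a named branch.
    pub fn create(branch: &str) -> Self {
        StagedWriter {
            branch: branch.to_string(),
            rows: Vec::new(),
        }
    }

    /// Stage one row; takes and returns `self` so calls can be chained.
    pub fn write(mut self, row: &str) -> Self {
        self.rows.push(row.to_string());
        self
    }

    /// End the write phase. The writer is moved, so no further writes are possible.
    pub fn finish(self) -> StagedPublisher {
        StagedPublisher {
            branch: self.branch,
            rows: self.rows,
        }
    }
}

impl StagedPublisher {
    /// Publish the staged rows. Consumes the publisher, so a session publishes at most once.
    /// Returns the branch name and rows as a stand-in for the real commit.
    pub fn publish(self) -> (String, Vec<String>) {
        (self.branch, self.rows)
    }
}
```

With this shape, `StagedWriter::create("wap-42").write("a").finish().publish()` compiles, while calling `publish` on a `StagedWriter` or `write` after `finish` is rejected by the compiler because the method does not exist on that type or the value has already been moved.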

Major Modifications

  • catalog.rs — Added direct REST API access (RestEndpoint) to work around iceberg 0.8's non-public
    TableCommit::builder(). Added OAuth2 client credentials auth with token caching and 401 retry. Added S3
    property passthrough support.
  • writer.rs — Gutted from ~190 lines to ~45 lines. Now contains only two trait definitions: DataWriter and
    BranchWriter. All concrete implementation moved to branch.rs and iceberg_table.rs.
  • memory_writer.rs — Expanded significantly to include MemoryBranchWriter for testing WAP workflows.
  • table_creator.rs — Added support for identifier fields and sort order specification.
  • settings.rs — Added properties: HashMap<String, String> for passing S3 credentials directly to the catalog.
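The token caching and 401 retry added to catalog.rs can be pictured roughly as follows. This is a sketch under assumptions, not the code in this branch: `TokenCache`, `send_with_retry`, and the closure-based stand-ins for the OAuth2 request and the REST call are all hypothetical, and no real HTTP is performed.

```rust
use std::time::{Duration, Instant};

/// A cached bearer token and the instant it stops being valid.
struct Token {
    value: String,
    expires_at: Instant,
}

/// Hypothetical cache around a token source. `fetch` stands in for the
/// OAuth2 client-credentials request and returns (token, time-to-live).
pub struct TokenCache<F: FnMut() -> (String, Duration)> {
    fetch: F,
    cached: Option<Token>,
}

impl<F: FnMut() -> (String, Duration)> TokenCache<F> {
    pub fn new(fetch: F) -> Self {
        TokenCache { fetch, cached: None }
    }

    /// Return the cached token while it is still valid; otherwise fetch a new one.
    pub fn token(&mut self) -> String {
        if let Some(t) = &self.cached {
            if t.expires_at > Instant::now() {
                return t.value.clone();
            }
        }
        self.refresh()
    }

    /// Forget the cached token, for example after the server rejected it.
    pub fn invalidate(&mut self) {
        self.cached = None;
    }

    fn refresh(&mut self) -> String {
        let (value, ttl) = (self.fetch)();
        self.cached = Some(Token {
            value: value.clone(),
            expires_at: Instant::now() + ttl,
        });
        value
    }

    /// Send a request with the current token. On HTTP 401, drop the cached
    /// token, fetch a fresh one, and retry exactly once.
    /// `request` stands in for the REST call and returns a status code.
    pub fn send_with_retry<R: FnMut(&str) -> u16>(&mut self, mut request: R) -> u16 {
        let status = request(&self.token());
        if status != 401 {
            return status;
        }
        self.invalidate();
        request(&self.token())
    }
}
```

Retrying only once keeps a genuinely bad credential from looping forever: if the fresh token is also rejected, the 401 is returned to the caller.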

Infrastructure

  • iceberg-compose.yml — Updated Docker Compose to use Polaris as the Iceberg catalog (replacing the
    previous setup).

Architecture

StagedWriter / StagedPublisher (compile-time safe session API)

IcebergTable (DataWriter + BranchWriter traits)

branch.rs (low-level snapshot/manifest ops)

Catalog → RestEndpoint (direct REST + OAuth2 auth)
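As one example of how the IcebergTable and branch.rs layers might interact, the sketch below classifies a retry into the four WAP states by looking for a wap.id on main and on the branch. The state names come from this PR, but the decision rules, the `detect_wap_state` function, and its inputs are assumptions made for illustration; the real detection logic may differ.

```rust
/// The four states named in this PR.
#[derive(Debug, PartialEq, Eq)]
pub enum WapState {
    NotStarted,
    StaleBranch,
    WrittenNotPublished,
    AlreadyPublished,
}

/// Hypothetical classifier.
/// - `main_wap_ids`: the wap.id values found in snapshot summaries on main.
/// - `branch_wap_ids`: `None` if the branch does not exist, otherwise the
///   wap.id values found in snapshot summaries on the branch.
///
/// Assumed rules (not confirmed by the PR):
/// 1. wap.id already on main            -> AlreadyPublished
/// 2. branch exists and has the wap.id  -> WrittenNotPublished
/// 3. branch exists without the wap.id  -> StaleBranch
/// 4. no branch at all                  -> NotStarted
pub fn detect_wap_state(
    wap_id: &str,
    main_wap_ids: &[&str],
    branch_wap_ids: Option<&[&str]>,
) -> WapState {
    if main_wap_ids.contains(&wap_id) {
        return WapState::AlreadyPublished;
    }
    match branch_wap_ids {
        Some(ids) if ids.contains(&wap_id) => WapState::WrittenNotPublished,
        Some(_) => WapState::StaleBranch,
        None => WapState::NotStarted,
    }
}
```

Under these assumed rules, a process that crashed between write and publish would see `WrittenNotPublished` on restart and could skip straight to publishing, while one that crashed after publishing would see `AlreadyPublished` and do nothing.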
