Implement compaction of sealed fractions #336

@dkharms

As an important milestone for seq-db, we plan to introduce compaction into the database.

We expect several benefits from compaction:

  • It will allow us to reduce the fraction size to very small values (or even seal a fraction for each bulk), significantly lowering the memory footprint. We also anticipate an increase in ingestion throughput, since smaller fractions should reduce contention;
  • We expect lower on-disk usage for fractions. For example, the tokens section of the .index file (which accounts for around 20% of a fraction’s .index size) often contains overlapping (field, value) tuples across multiple fractions. When fractions are merged, each duplicate tuple is stored only once, so the merged section takes less space than the separate sections combined;
  • Compaction will enable us to implement partitioning. We expect partitions (and parts) to be created upon sealing, which may produce many tiny, inefficient sealed fractions. Background compaction will address this by merging small parts into larger, more efficient ones;
  • Also, if we stick with a time-based compaction strategy (for example TWCS or DTCS), we can easily implement time-based retention.
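
To illustrate the expected saving in the tokens section, here is a minimal sketch of merging the token lists of two sealed fractions while dropping duplicates. It assumes each fraction keeps its (field, value) tuples sorted and free of duplicates; the `Token` type and the `mergeTokens` function are hypothetical names for illustration and do not reflect the actual .index layout or the seq-db code.

```go
package main

import "fmt"

// Token is a hypothetical (field, value) tuple; the real on-disk
// representation in the .index file is not described in this issue.
type Token struct {
	Field string
	Value string
}

// less orders tokens by field first, then by value.
func less(a, b Token) bool {
	if a.Field != b.Field {
		return a.Field < b.Field
	}
	return a.Value < b.Value
}

// mergeTokens merges two sorted, duplicate-free token lists into one
// sorted, duplicate-free list. A tuple present in both inputs is
// emitted only once, which is where the space saving comes from.
func mergeTokens(a, b []Token) []Token {
	out := make([]Token, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case less(a[i], b[j]):
			out = append(out, a[i])
			i++
		case less(b[j], a[i]):
			out = append(out, b[j])
			j++
		default: // same tuple in both fractions: keep a single copy
			out = append(out, a[i])
			i++
			j++
		}
	}
	// At most one of the two tails is non-empty at this point.
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	fracA := []Token{{"level", "error"}, {"level", "info"}, {"service", "api"}}
	fracB := []Token{{"level", "info"}, {"service", "api"}, {"service", "db"}}

	merged := mergeTokens(fracA, fracB)
	fmt.Printf("before: %d tokens, after: %d tokens\n", len(fracA)+len(fracB), len(merged))
}
```

In this toy input two of the six tuples overlap, so the merged list holds four. The actual saving depends on how much the token sets of neighbouring fractions overlap in practice.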

Labels: feature (new feature or request), performance (features or improvements that positively affect seq-db performance)
