Merged
10 changes: 10 additions & 0 deletions README.md
@@ -43,6 +43,7 @@ fn main() -> std::io::Result<()> {
MassMapBuilder::default()
.with_hash_seed(42)
.with_bucket_count(1024)
.with_bucket_size_limit(16 << 10)
.build(file, entries.iter())?;

// Read-only lookup phase.
@@ -132,6 +133,15 @@ hexdump -C examples/demo.massmap
#> 00000290
```

## Configuration

- `with_hash_seed(seed)`: choose deterministic sharding.
- `with_bucket_count(count)`: trade memory for faster lookups.
- `with_writer_buffer_size(bytes)`: tune streaming IO throughput.
- `with_field_names(true)`: emit MessagePack maps with named fields for easier debugging.
- `with_bucket_size_limit(bytes)`: guard against oversized buckets.
- Replace the default [`MassMapHashLoader`](https://docs.rs/massmap/latest/massmap/trait.MassMapHashLoader.html) to plug in custom hashers.
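
To make the interaction between the hash seed, the bucket count, and the bucket size limit more concrete, here is a minimal standalone sketch. It does not use massmap's internals: `bucket_index` and `check_bucket_size` are hypothetical helpers, the standard library's `DefaultHasher` stands in for whatever hasher the crate actually uses, and treating a payload of exactly `limit` bytes as acceptable is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io::{Error, ErrorKind, Result};

/// Hypothetical helper (not part of massmap): maps a key to a bucket index
/// using a seeded hash. The same (key, seed, bucket_count) triple always
/// yields the same index within one build of the program.
fn bucket_index<K: Hash>(key: &K, seed: u64, bucket_count: u64) -> u64 {
    assert!(bucket_count > 0, "bucket_count must be non-zero");
    let mut hasher = DefaultHasher::new();
    // Feed the seed first so that it influences the shard assignment.
    seed.hash(&mut hasher);
    key.hash(&mut hasher);
    hasher.finish() % bucket_count
}

/// Hypothetical helper (not part of massmap): rejects a serialized bucket
/// payload that is larger than `limit` bytes.
fn check_bucket_size(payload: &[u8], limit: u32) -> Result<()> {
    if payload.len() > limit as usize {
        return Err(Error::new(
            ErrorKind::InvalidData,
            format!("bucket of {} bytes exceeds limit of {} bytes", payload.len(), limit),
        ));
    }
    Ok(())
}
```

With a fixed seed the index for a given key is reproducible, and an oversized payload surfaces as `ErrorKind::InvalidData`, the error kind that the `with_bucket_size_limit` documentation names.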

## Readers and Writers

`MassMapReader` and `MassMapWriter` abstract over positional IO. The traits are implemented for `std::fs::File` out of the box, but they can also wrap network storage, memory-mapped regions, or custom paged backends. Override `MassMapReader::batch_read_at` to dispatch vectored reads when available.
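
As a rough illustration of the positional-read idea, the sketch below defines a hypothetical `PositionalRead` trait and implements it for an in-memory `Vec<u8>`. The trait, its method signatures, and the `(offset, length)` range type are assumptions made for this example and may differ from the crate's actual `MassMapReader` definition.

```rust
use std::io::{Error, ErrorKind, Result};

/// Hypothetical stand-in for a positional reader; not the crate's trait.
trait PositionalRead {
    /// Reads `length` bytes starting at `offset` and passes them to `f`.
    fn read_at<T>(
        &self,
        offset: u64,
        length: usize,
        f: impl FnOnce(&[u8]) -> Result<T>,
    ) -> Result<T>;

    /// Default batch read: one positional read per range, in order,
    /// stopping at the first error. A backend with vectored or parallel IO
    /// could override this method to issue the reads together.
    fn batch_read_at<T>(
        &self,
        ranges: &[(u64, usize)],
        mut f: impl FnMut(&[u8]) -> Result<T>,
    ) -> Result<Vec<T>> {
        ranges
            .iter()
            .map(|&(offset, length)| self.read_at(offset, length, &mut f))
            .collect()
    }
}

/// In-memory backend: the "file" is simply a byte vector.
impl PositionalRead for Vec<u8> {
    fn read_at<T>(
        &self,
        offset: u64,
        length: usize,
        f: impl FnOnce(&[u8]) -> Result<T>,
    ) -> Result<T> {
        let len = self.len() as u64;
        // Reject ranges that overflow or run past the end of the buffer.
        let end = offset
            .checked_add(length as u64)
            .filter(|&end| end <= len)
            .ok_or_else(|| Error::new(ErrorKind::UnexpectedEof, "read past end of buffer"))?;
        f(&self[offset as usize..end as usize])
    }
}
```

Because reads take `&self` and an explicit offset, no cursor is shared between callers, which is what lets one reader serve concurrent lookups.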
2 changes: 1 addition & 1 deletion examples/massmap.rs
@@ -100,7 +100,7 @@ impl MassMapHashLoader for MassMapTolerableHashLoader {
fn run_info(args: InfoArgs) -> Result<()> {
let file = File::open(&args.input)?;

- let map = MassMap::<String, serde_json::Value, _>::load(file)?;
+ let map = MassMap::<String, serde_json::Value, _, MassMapTolerableHashLoader>::load(file)?;

let json = serde_json::to_string_pretty(&map.info())
.map_err(|e| Error::other(format!("Failed to format JSON: {e}")))?;
25 changes: 16 additions & 9 deletions src/builder.rs
@@ -9,12 +9,14 @@ use crate::{

/// Builder type for emitting massmap files from key-value iterators.
///
- /// The builder owns configuration such as hash seed, bucket sizing and IO
- /// buffering. Use [`build`](Self::build) to stream MessagePack-encoded buckets to
- /// a [`MassMapWriter`] sink (typically a file implementing `FileExt`).
+ /// The builder owns configuration such as the hash seed, bucket sizing, IO
+ /// buffering, field-name emission, and optional bucket size guards. Use
+ /// [`build`](Self::build) to stream MessagePack-encoded buckets to a
+ /// [`MassMapWriter`] sink (typically a file implementing `FileExt`).
///
- /// Cloning is not required; each builder instance is consumed by a single call
- /// to [`build`](Self::build).
+ /// The loader type parameter `H` allows swapping in custom
+ /// [`MassMapHashLoader`] implementations. Each builder instance is consumed by a
+ /// single call to [`build`](Self::build).
#[derive(Debug)]
pub struct MassMapBuilder<H: MassMapHashLoader = MassMapDefaultHashLoader> {
hash_config: MassMapHashConfig,
@@ -75,14 +77,18 @@ impl<H: MassMapHashLoader> MassMapBuilder<H> {

/// Controls whether serialized MessagePack maps include field names.
///
- /// Enabling this makes the output human readable at the cost of slightly
- /// larger files.
+ /// Enabling this makes the serialized buckets human readable at the cost
+ /// of slightly larger files and additional encoding work.
pub fn with_field_names(mut self, value: bool) -> Self {
self.field_names = value;
self
}

/// Sets a hard cap on the number of bytes allowed per bucket payload.
///
/// Buckets that exceed this limit cause [`build`](Self::build) to abort
/// with `ErrorKind::InvalidData`, which can be useful when targeting
/// systems with strict per-request IO ceilings.
pub fn with_bucket_size_limit(mut self, limit: u32) -> Self {
self.bucket_size_limit = limit;
self
@@ -91,8 +97,9 @@ impl<H: MassMapHashLoader> MassMapBuilder<H> {
/// Consumes the builder and writes a massmap to `writer` from `entries`.
///
/// The iterator is hashed according to the configured parameters, buckets
- /// are serialized via `rmp-serde`, and a [`MassMapInfo`] summary is returned
- /// on success.
+ /// are serialized via `rmp-serde`, and a [`MassMapInfo`] summary is
+ /// returned on success. Input ordering does not matter; keys are
+ /// automatically distributed across buckets.
///
/// # Errors
///
8 changes: 6 additions & 2 deletions src/massmap.rs
@@ -19,6 +19,8 @@ use crate::{
/// - `K`: key type stored in the map; must implement `serde::Deserialize`.
/// - `V`: value type stored in the map; must implement `serde::Deserialize` and `Clone`.
/// - `R`: reader that satisfies [`MassMapReader`].
/// - `H`: hash loader used to reconstruct the [`BuildHasher`](BuildHasher) from
/// the persisted [`MassMapHashConfig`](crate::MassMapHashConfig).
#[derive(Debug)]
pub struct MassMap<K, V, R: MassMapReader, H: MassMapHashLoader = MassMapDefaultHashLoader> {
/// Header serialized at the start of the massmap file.
@@ -165,7 +167,8 @@ where
///
/// The iterator reads each bucket sequentially from the backing storage,
/// deserializes all entries in the bucket, and yields them one at a time.
- /// Each bucket is fully loaded into memory before any of its entries are yielded.
+ /// Each bucket is fully loaded into memory before any of its entries are
+ /// yielded. Iteration stops immediately if a bucket fails to deserialize.
///
/// # Examples
///
@@ -230,7 +233,8 @@ where
/// Iterator over all entries in a [`MassMap`].
///
/// This iterator traverses buckets sequentially, loading each bucket fully into
- /// memory before yielding its entries one by one.
+ /// memory before yielding its entries one by one. Items are returned as
+ /// `Result`s so that IO or deserialization failures propagate to the caller.
pub struct MassMapIter<'a, K, V, R: MassMapReader, H: MassMapHashLoader> {
map: &'a MassMap<K, V, R, H>,
bucket_index: usize,
3 changes: 2 additions & 1 deletion src/meta.rs
@@ -72,7 +72,8 @@ pub struct MassMapMeta {
pub bucket_count: u64,
/// Number of empty buckets.
pub empty_buckets: u64,
- /// Hash configuration.
+ /// Hash configuration used to derive the [`BuildHasher`](std::hash::BuildHasher)
+ /// when reopening the map.
pub hash_config: MassMapHashConfig,
}

4 changes: 3 additions & 1 deletion src/reader.rs
@@ -3,7 +3,9 @@ use std::{borrow::Borrow, io::Result};
/// Trait abstracting read access to massmap files.
///
/// Implementations must support positional reads without mutating shared state.
- /// The trait is blanket-implemented for platform-specific `FileExt` handles.
+ /// The trait is blanket-implemented for platform-specific `FileExt` handles,
+ /// but can also wrap memory-mapped regions or networked block stores. Override
+ /// [`batch_read_at`](Self::batch_read_at) to surface vectored IO capabilities.
pub trait MassMapReader {
/// Reads `length` bytes starting at `offset` and forwards them to `f`.
///