Conversation

@RaduBerinde (Contributor)


Run with both 8- and 16-bit fingerprints and with a few different
sizes. Add an `MKeys/s` metric, which is intuitive.

Results on an Apple M1 laptop below. For comparison, on the same
machine a cacheline-blocked bloom filter does ~130 MKeys/s.

```
name                                time/op
BinaryFusePopulate/8/n=10000-10      199µs ± 1%
BinaryFusePopulate/8/n=100000-10    2.55ms ± 1%
BinaryFusePopulate/8/n=1000000-10   27.5ms ± 1%
BinaryFusePopulate/16/n=10000-10     231µs ± 1%
BinaryFusePopulate/16/n=100000-10   2.58ms ± 0%
BinaryFusePopulate/16/n=1000000-10  29.0ms ± 1%

name                                MKeys/s
BinaryFusePopulate/8/n=10000-10       50.4 ± 1%
BinaryFusePopulate/8/n=100000-10      39.2 ± 1%
BinaryFusePopulate/8/n=1000000-10     36.4 ± 1%
BinaryFusePopulate/16/n=10000-10      43.3 ± 1%
BinaryFusePopulate/16/n=100000-10     38.8 ± 0%
BinaryFusePopulate/16/n=1000000-10    34.5 ± 1%

name                                alloc/op
BinaryFusePopulate/8/n=10000-10      283kB ± 0%
BinaryFusePopulate/8/n=100000-10    2.58MB ± 0%
BinaryFusePopulate/8/n=1000000-10   24.8MB ± 0%
BinaryFusePopulate/16/n=10000-10     321kB ± 0%
BinaryFusePopulate/16/n=100000-10   2.91MB ± 0%
BinaryFusePopulate/16/n=1000000-10  28.1MB ± 0%

name                                allocs/op
BinaryFusePopulate/8/n=10000-10       8.00 ± 0%
BinaryFusePopulate/8/n=100000-10      8.00 ± 0%
BinaryFusePopulate/8/n=1000000-10     8.00 ± 0%
BinaryFusePopulate/16/n=10000-10      8.00 ± 0%
BinaryFusePopulate/16/n=100000-10     8.00 ± 0%
BinaryFusePopulate/16/n=1000000-10    8.00 ± 0%
```
@lemire lemire merged commit b6f8966 into FastFilter:master Jan 6, 2026
4 checks passed
@lemire (Member) commented Jan 6, 2026

Merged.

@RaduBerinde deleted the binary-fuse-bench branch January 6, 2026 14:38
@RaduBerinde (Contributor, Author)

Thank you!

RaduBerinde added a commit to RaduBerinde/pebble that referenced this pull request Jan 10, 2026
Binary fuse filters take non-trivial memory, about 24 bits per key
(see FastFilter/xorfilter#48). We thus have to
be more careful with memory usage.

For small-to-medium filters, we reuse builders in a `sync.Pool`. For
large filters, we limit concurrency and keep a very small pool of
builders to reuse. For very large filters (that are unlikely to show
up in practice currently), we further limit concurrency and don't
reuse builders.

Note that Pebble's compaction concurrency is typically much smaller
than the number of CPUs, so the limits should not impact performance
(especially since we only limit concurrency of building the filter
itself, which is a small part of sstable write time).
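The tiered reuse strategy described in that commit message can be sketched as follows. This is a hypothetical illustration: the `builder` type, the size threshold, and the concurrency cap are our assumptions for the sketch, not Pebble's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// builder stands in for a filter builder holding large scratch buffers.
type builder struct{ scratch []uint64 }

// Small-to-medium builders are recycled through a sync.Pool.
var smallPool = sync.Pool{
	New: func() any { return &builder{scratch: make([]uint64, 0, 1<<16)} },
}

// largeSem caps how many large builders can be live at once
// (illustrative limit of 2).
var largeSem = make(chan struct{}, 2)

// getBuilder returns a builder sized for nKeys and a release function
// that must be called once construction is done.
func getBuilder(nKeys int) (*builder, func()) {
	if nKeys <= 1_000_000 {
		b := smallPool.Get().(*builder)
		return b, func() {
			b.scratch = b.scratch[:0] // keep capacity, drop contents
			smallPool.Put(b)
		}
	}
	// Very large filters: limit concurrency and don't reuse the builder.
	largeSem <- struct{}{}
	return &builder{}, func() { <-largeSem }
}

func main() {
	b, release := getBuilder(10_000)
	fmt.Println(cap(b.scratch) >= 1<<16) // pooled builder comes pre-allocated
	release()
}
```

The semaphore bounds peak memory from large builds, while the pool keeps the common small-build path allocation-free after warm-up.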
RaduBerinde added a commit to RaduBerinde/pebble that referenced this pull request Jan 11, 2026 (same commit message as above)
RaduBerinde added a commit to RaduBerinde/pebble that referenced this pull request Jan 13, 2026 (same commit message as above)
RaduBerinde added a commit to RaduBerinde/pebble that referenced this pull request Jan 13, 2026 (same commit message as above)
RaduBerinde added a commit to cockroachdb/pebble that referenced this pull request Jan 13, 2026 (same commit message as above)