bloom

An ultra fast, lightweight, concurrent-safe Bloom filter for Go.

Zero dependencies — only the Go standard library. Zero allocations per operation. Lock-free concurrent reads and writes via atomic.Uint64. Built for production use, small enough to understand (just over 100 LOC).

This library is based on my blog post: The Magic of Bloom Filters, which covers the theory, math, benchmarks, and design decisions in depth.

Features

Zero dependencies — uses only the Go standard library (hash/fnv, sync/atomic, sync, math)
Zero allocations — hashers are recycled via sync.Pool (0 B/op, 0 allocs/op)
Concurrent-safe with no locks — atomic.Uint64 bitset for lock-free Add and Contains
Kirsch-Mitzenmacher optimization — one hash call simulates k independent hashes
Configurable — create from explicit (m, k) or from (n, p) probability parameters
Pluggable hash function — default FNV-1a, swappable via WithHashFunc option
Binary serialization — MarshalBinary / UnmarshalBinary for persistence and network transfer

Install

go get github.com/phrozen/bloom

Usage

Bloom filters answer one question: "Is this item in the set?" The answer is either a definitive No or a probabilistic Probably. There are no false negatives — if the filter says no, the item was never added. This makes them ideal for guarding expensive lookups: check the filter first, skip the database/disk/network call when the answer is no.

Create from probability parameters

Most of the time, you know how many items you expect and what false positive rate you can tolerate. The constructor handles the math for you — it computes the optimal number of bits and hash functions automatically:

// Filter for 1M items with a 1% false positive rate.
// Internally this allocates ~1.14 MB (9.58M bits) and uses 7 hash functions.
f := bloom.NewFilterFromProbability(1_000_000, 0.01)

Create with explicit parameters

If you already know the exact bit size and hash count you want (e.g. reproducing a filter from known parameters), you can specify them directly:

// 10 million bits, 7 hash functions
f := bloom.NewFilter(10_000_000, 7)

Add and query

Both functions are concurrent-safe and cannot fail (no error). Passing an empty []byte is valid — it simply hashes like any other input.

id := "550e8400-e29b-41d4-a716-446655440000"
f.Add([]byte(id))

if !f.Contains([]byte(id)) {
    // Definitively NOT in the set — skip the database entirely
    return ErrNotFound
}
// Probably in the set — check the database to confirm

Custom hash function

The default hasher is FNV-1a from the standard library. If you need a different algorithm, swap it via the WithHashFunc option. Any hash.Hash64 implementation works:

import "github.com/cespare/xxhash/v2"

f := bloom.NewFilterFromProbability(1_000_000, 0.01,
    bloom.WithHashFunc(func() hash.Hash64 { return xxhash.New() }),
)

Serialize / Deserialize

Filters implement encoding.BinaryMarshaler and encoding.BinaryUnmarshaler, so you can persist them to disk, send them over the network, or embed them in any encoding that supports those interfaces:

// Save
data, err := f.MarshalBinary()

// Restore
var restored bloom.Filter
err = restored.UnmarshalBinary(data)

The binary format is compact: 16 bytes of header (m + k) followed by the raw bitset. A 1M-item filter at 1% FPR serializes to ~1.14 MB.

Benchmarks

All benchmarks on AMD Ryzen 7 5800X (16 threads), 16-byte UUID keys, go test -bench=. -benchmem:

Per-operation

Operation	ns/op	B/op	allocs/op
`Add`	~36 ns	0	0
`Contains`	~25 ns	0	0

At scale (1M items, 1% FPR, 16 goroutines)

Benchmark	ns/op	B/op	allocs/op
Concurrent Add	~12 ns	0	0
Concurrent Contains	~9 ns	0	0
Concurrent Add+Contains (50/50)	~14 ns	0	0
Filter memory footprint	1.14 MB
Actual false positive rate	~1.00%

Hash function comparison

The sync.Pool optimization eliminates the interface allocation penalty entirely. All hashers now perform within the same ballpark, despite wildly different internal struct sizes:

Hasher	Add ns/op	Contains ns/op
FNV-1a (default, std lib)	~36 ns	~25 ns
xxHash	~40 ns	~26 ns
Murmur3	~36 ns	~29 ns
XXH3	~36 ns	~29 ns

FNV-1a is the default because it ships with Go — zero dependencies — and with Kirsch-Mitzenmacher splitting a 64-bit hash into two 32-bit halves, all four hashers produce identical false positive rates (~1% for a 1% target). Hash "quality" is a non-factor for this use case.

How it works

Bit array — a []atomic.Uint64 slice aligned to CPU word size for single-instruction bit operations
Kirsch-Mitzenmacher — hash once, split into two 32-bit halves h1 and h2, simulate k hashes via h1 + i*h2 (paper)
sync.Pool — recycles hasher instances so Add/Contains do zero heap allocations
Atomic operations — Or to set bits, Load to read them — fully concurrent, no mutex

API

// Constructors
func NewFilter(m int, k int, opts ...Option) *Filter
func NewFilterFromProbability(n int, p float64, opts ...Option) *Filter

// Options
func WithHashFunc(h HashFunc) Option

// Operations
func (f *Filter) Add(data []byte)
func (f *Filter) Contains(data []byte) bool

// Serialization (encoding.BinaryMarshaler / encoding.BinaryUnmarshaler)
func (f *Filter) MarshalBinary() ([]byte, error)
func (f *Filter) UnmarshalBinary(data []byte) error

License

Apache 2.0

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
bloom.go		bloom.go
bloom_test.go		bloom_test.go
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

bloom

Features

Install

Usage

Create from probability parameters

Create with explicit parameters

Add and query

Custom hash function

Serialize / Deserialize

Benchmarks

Per-operation

At scale (1M items, 1% FPR, 16 goroutines)

Hash function comparison

How it works

API

License

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

bloom

Features

Install

Usage

Create from probability parameters

Create with explicit parameters

Add and query

Custom hash function

Serialize / Deserialize

Benchmarks

Per-operation

At scale (1M items, 1% FPR, 16 goroutines)

Hash function comparison

How it works

API

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages