mutex-benches

Benchmarks for std::sync::Mutex vs parking_lot::Mutex.

Overview

This benchmark suite measures mutex performance under different contention patterns by spawning multiple threads that repeatedly acquire and release a shared lock. The key aspects of our methodology:

Measurement approach:

  • Each thread measures wait-to-acquire time - the duration from calling lock() until the guard is obtained
  • All threads start simultaneously using a barrier synchronization primitive
  • All samples are recorded from the moment threads start until the duration elapses
  • Per-thread operation counts track fairness and starvation issues

Metrics reported:

  • Throughput: total lock acquire/release operations per second across all threads
  • Wait latencies (in nanoseconds): mean, median (p50), p95, p99, and standard deviation
  • Per-thread ops: distribution of work across threads (reveals fairness/starvation)

Scenarios simulated:

  1. Uncontended: Single thread (no contention baseline)
  2. Short-hold: Multiple threads with minimal critical section work
  3. Long-hold: Threads sleep while holding the lock (configurable duration)
  4. Burst: Threads cycle between active periods (acquiring locks rapidly) and idle periods (sleeping); see the sketch after this list
  5. Hog: One thread monopolizes the lock while others compete for it
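
To make the burst pattern concrete, here is a minimal sketch of how a burst worker can alternate between active and idle phases. It is illustrative only: the names (`burst_worker`, `burst_on`, `burst_off`, `stop`) are not taken from this repository.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical burst worker: hammers the lock for `burst_on`,
/// then sleeps for `burst_off`, until `stop` is set.
fn burst_worker(
    lock: Arc<Mutex<u64>>,
    stop: Arc<AtomicBool>,
    burst_on: Duration,
    burst_off: Duration,
) {
    while !stop.load(Ordering::Relaxed) {
        let phase_end = Instant::now() + burst_on;
        // Active phase: acquire and release the lock as fast as possible.
        while Instant::now() < phase_end && !stop.load(Ordering::Relaxed) {
            let mut counter = lock.lock().unwrap();
            *counter += 1;
        }
        // Idle phase: back off completely so contention arrives in waves.
        std::thread::sleep(burst_off);
    }
}
```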

Critical section behavior:

  • By default: increment a counter with black_box() to prevent compiler optimization
  • Optional sleep during lock hold (configured via --hold-us or --hog-hold-us)
  • Wait measurement stops when the lock is acquired; hold time is separate (see the measurement sketch below)
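
Putting the pieces above together, the per-thread measurement loop for the short-hold case looks roughly like the sketch below. Only the shape follows the description above; the names and exact structure are illustrative, not the repository's code.

```rust
use std::hint::black_box;
use std::sync::{Arc, Barrier, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical worker loop (short-hold scenario, std::sync::Mutex):
/// barrier start, wait-to-acquire timing, black_box'd counter increment,
/// and a per-thread operation count.
fn worker(
    lock: Arc<Mutex<u64>>,
    start: Arc<Barrier>,
    run_for: Duration,
) -> (Vec<u64>, u64) {
    let mut wait_ns = Vec::new();
    let mut ops = 0u64;

    start.wait();                        // all threads begin together
    let deadline = Instant::now() + run_for;

    while Instant::now() < deadline {
        let t0 = Instant::now();
        let mut guard = lock.lock().unwrap();
        wait_ns.push(t0.elapsed().as_nanos() as u64); // wait stops at acquire

        *guard = black_box(*guard + 1);  // critical section: counted increment
        drop(guard);                     // hold time is not part of the wait
        ops += 1;
    }
    (wait_ns, ops)
}
```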

Quick Start

Run predefined scenario benchmarks:

# Short hold scenario (4 threads, 10 seconds)
cargo short --kind std          # using std::sync::Mutex
cargo short --kind parking-lot  # using parking_lot::Mutex

# Long hold scenario (8 threads, 10 seconds, 500μs hold time)
cargo long --kind std
cargo long --kind parking-lot

# Burst scenario (8 threads, 15 seconds, 200ms on / 800ms off)
cargo burst --kind std
cargo burst --kind parking-lot

# Hog scenario (6 threads, 15 seconds)
# Add --hog-hold-us to force the hog to sleep *inside* the lock and truly monopolize it
cargo hog --kind std -- --hog-hold-us 500
cargo hog --kind parking-lot -- --hog-hold-us 500

Or use the shorthand aliases:

cargo bstd --scenario short-hold --threads 4    # std::sync::Mutex
cargo bpl --scenario burst --threads 8 --pin    # parking_lot::Mutex with CPU pinning

Custom Benchmarks

Run with custom parameters:

cargo run --release -- --kind std --scenario short-hold --threads 8 --seconds 20 --pin

Available Options

  • --kind: std or parking-lot
  • --scenario: uncontended, short-hold, long-hold, burst, hog
  • --threads: Number of worker threads (default: 4)
  • --seconds: Benchmark duration (default: 10)
  • --hold-us: Microseconds to hold lock in long-hold scenario (default: 200)
  • --burst-on-ms: Active period in burst scenario (default: 200)
  • --burst-off-ms: Idle period in burst scenario (default: 800)
  • --hog-hold-us: Hog scenario only. The hog sleeps this many μs while holding the lock (default: 0)
  • --pin: Pin threads to CPU cores for reduced context switching

Note: In the uncontended scenario, the harness forces threads=1 even if a larger value is passed.
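
The flags above map naturally onto a derive-style argument parser. The sketch below assumes a clap-derive setup, which this README does not confirm; the struct, enum, and field names are hypothetical.

```rust
// Hypothetical clap-derive mirror of the flags listed above.
// The actual harness may parse its arguments differently.
use clap::{Parser, ValueEnum};

#[derive(Clone, ValueEnum)]
enum Kind { Std, ParkingLot }               // --kind std | parking-lot

#[derive(Clone, ValueEnum)]
enum Scenario { Uncontended, ShortHold, LongHold, Burst, Hog }

#[derive(Parser)]
struct Args {
    #[arg(long, value_enum)]
    kind: Kind,
    #[arg(long, value_enum)]
    scenario: Scenario,
    #[arg(long, default_value_t = 4)]
    threads: usize,
    #[arg(long, default_value_t = 10)]
    seconds: u64,
    #[arg(long, default_value_t = 200)]
    hold_us: u64,
    #[arg(long, default_value_t = 200)]
    burst_on_ms: u64,
    #[arg(long, default_value_t = 800)]
    burst_off_ms: u64,
    #[arg(long, default_value_t = 0)]
    hog_hold_us: u64,
    #[arg(long)]
    pin: bool,
}
```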

Configuration

  • Rust toolchain: pinned via rust-toolchain.toml
  • parking_lot: pinned in Cargo.toml
  • Scenarios: uncontended, short hold, long hold, burst, hog
  • Metrics: throughput, median/p95/p99/stdev wait-to-acquire, per-thread ops

Percentile Definition

Percentiles use the inclusive nearest-rank convention: index = min(n * p / 100, n - 1) on the sorted wait list. This makes p99 equal to the maximum when n=100, which is a conservative choice for tail latency.
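
In code, that definition amounts to something like the following sketch (not necessarily the repository's exact implementation):

```rust
/// Nearest-rank percentile as defined above: index = min(n * p / 100, n - 1)
/// on the sorted wait list.
fn percentile(sorted_waits_ns: &[u64], p: usize) -> u64 {
    let n = sorted_waits_ns.len();
    assert!(n > 0, "need at least one sample");
    let idx = (n * p / 100).min(n - 1);
    sorted_waits_ns[idx]
}

// With n = 100 samples, percentile(samples, 99) indexes 100 * 99 / 100 = 99,
// i.e. the largest sample, matching the "p99 equals the maximum" note above.
```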

Run on Linux via Docker (from macOS)

Quick one-liners:

cargo linux           # native-arch Linux (arm64 on Apple Silicon)
cargo linux-amd64     # x86_64 Linux via emulation (slower)

These call scripts/bench_linux_docker.sh, which builds in /work/target-linux and runs scripts/bench_all.sh. Use CPUSET=0-3 cargo linux to limit cores inside the container for steadier numbers.

Benchmark Results: std::Mutex vs parking_lot

Scenario 1: ShortHold (4 threads, 10 seconds)

Configuration: Minimal work in critical section, moderate contention

| Metric | std::Mutex | parking_lot | Winner | Difference |
| --- | --- | --- | --- | --- |
| Throughput | 8.01M ops/s | 7.35M ops/s | std | 9.0% faster |
| Median Wait | 125ns | 125ns | tie | |
| Mean Wait | 308ns | 417ns | std | 35.4% lower |
| P95 Wait | 709ns | 833ns | std | 17.5% lower |
| P99 Wait | 7.71μs | 8.54μs | std | 10.8% lower |
| StdDev Wait | 2.99μs | 8.25μs | std | 63.8% lower |
| Per-Thread Ops | [20.5M, 19.4M, 19.7M, 20.6M] | [18.3M, 18.9M, 18.2M, 18.2M] | | |
| Fairness (Range) | 5.6% | 3.9% | parking_lot | 43.3% more fair |

Summary: std::Mutex wins on throughput and latency in this low-to-moderate contention scenario, but parking_lot shows significantly better fairness with more even operation distribution across threads.


Scenario 2: LongHold (8 threads, 10 seconds, 500μs hold)

Configuration: Long critical sections (500μs sleep), heavy contention

| Metric | std::Mutex | parking_lot | Winner | Difference |
| --- | --- | --- | --- | --- |
| Throughput | 748 ops/s | 696 ops/s | std | 7.5% faster |
| Median Wait | 125ns | 9.25ms | std | 74,005× lower |
| Mean Wait | 9.36ms | 10.14ms | std | 8.3% lower |
| P95 Wait | 291ns | 16.95ms | std | 58,258× lower |
| P99 Wait | 541ns | 21.65ms | std | 40,024× lower |
| StdDev Wait | 188.73ms | 3.67ms | parking_lot | 51.4× more stable |
| Per-Thread Ops | [1026, 815, 666, 66, 1252, 1394, 1026, 1233] | [864, 873, 866, 876, 873, 860, 872, 877] | | |
| Fairness (Range) | 95.3% | 1.9% | parking_lot | 49.1× more fair |

Finding: std::Mutex shows severe thread starvation (one thread only completed 66 operations vs 1394 for another). The extremely low median/P95/P99 for std combined with high mean and massive stddev (188ms) indicates most acquisitions are fast, but occasional extreme delays cause thread starvation.

Summary: While std has higher throughput, it achieves this through unfairness. parking_lot ensures all threads make progress with dramatically better fairness and predictability (49× more fair, 51× more stable wait times).


Scenario 3: Burst (8 threads, 15 seconds, 200ms on / 800ms off)

Configuration: Bursty workload with periodic activity spikes

| Metric | std::Mutex | parking_lot | Winner | Difference |
| --- | --- | --- | --- | --- |
| Throughput | 1.15M ops/s | 1.37M ops/s | parking_lot | 18.5% faster |
| Median Wait | 84ns | 84ns | tie | |
| Mean Wait | 935ns | 966ns | std | 3.3% lower |
| P95 Wait | 1.42μs | 2.00μs | std | 41.2% lower |
| P99 Wait | 15.96μs | 21.00μs | std | 31.6% lower |
| StdDev Wait | 8.68μs | 6.96μs | parking_lot | 24.8% more stable |
| Per-Thread Ops | [2.39M, 2.13M, 2.13M, 2.15M, 2.07M, 2.20M, 2.10M, 2.12M] | [2.52M, 2.52M, 2.55M, 2.79M, 2.51M, 2.51M, 2.53M, 2.56M] | | |
| Fairness (Range) | 13.6% | 9.9% | parking_lot | 37.1% more fair |

Summary: parking_lot wins on throughput (18.5% faster) and fairness in bursty scenarios. The periodic contention spikes favor parking_lot's adaptive spinning and fairness mechanisms. While std has lower tail latencies, parking_lot's better stability and fairness lead to higher overall throughput.


Scenario 4: Hog (6 threads, 15 seconds, 500μs hog hold)

Configuration: One thread monopolizes the lock while others compete

| Metric | std::Mutex | parking_lot | Winner | Difference |
| --- | --- | --- | --- | --- |
| Throughput | 820 ops/s | 2,965 ops/s | parking_lot | 261.6% faster |
| Median Wait | 166ns | 359.88μs | std | 2,168× lower |
| Mean Wait | 6.10ms | 812.30μs | parking_lot | 7.5× lower |
| P95 Wait | 333ns | 2.26ms | std | 6,800× lower |
| P99 Wait | 500ns | 4.78ms | std | 9,550× lower |
| StdDev Wait | 130.76ms | 1.09ms | parking_lot | 120× more stable |
| Per-Thread Ops | [12242, 10, 6, 8, 16, 11] | [9168, 7082, 7063, 7023, 7109, 7037] | | |
| Fairness (Range) | 100.0% | 23.4% | parking_lot | 4.3× more fair |

Finding: The hog thread completed 12,242 operations while other threads completed only 6-16 operations (essentially complete starvation). This demonstrates the worst-case scenario for std::Mutex's unfair locking behavior.

Summary: parking_lot's "eventual fairness" mechanism prevents monopolization, resulting in 261.6% higher overall throughput and dramatically better fairness. All threads make meaningful progress with parking_lot, while std completely starves non-hog threads.

Overall Conclusions

When to Use std::Mutex

  • Low to moderate contention scenarios
  • When you want zero dependencies
  • Short critical sections with symmetric workloads
  • Maximum throughput is more important than fairness

When to Use parking_lot::Mutex

  • Any scenario requiring fairness (prevents thread starvation)
  • Bursty workloads (18.5% faster)
  • Risk of monopolization (261.6% faster, prevents starvation)
  • Need predictable latency (much more stable wait times)
  • Long critical sections with multiple threads
  • When preventing priority inversion matters

Key Insight

The benchmark reveals that std::Mutex on Linux is not fair by default. It uses a "barging" strategy that can lead to severe thread starvation under contention. parking_lot's "eventual fairness" mechanism (forcing fair unlock every 0.5ms) provides dramatically better behavior in contended scenarios while maintaining competitive performance in uncontended cases.
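
For reference, switching between the two implementations in benchmark code like this is largely mechanical; the main API difference is that parking_lot's lock() returns the guard directly, since parking_lot mutexes have no poisoning.

```rust
// std::sync::Mutex: lock() returns a Result because of lock poisoning.
use std::sync::Mutex as StdMutex;

fn with_std(m: &StdMutex<u64>) {
    let mut guard = m.lock().unwrap(); // unwrap handles the poison case
    *guard += 1;
}

// parking_lot::Mutex: no poisoning, lock() returns the guard directly.
use parking_lot::Mutex as PlMutex;

fn with_parking_lot(m: &PlMutex<u64>) {
    let mut guard = m.lock();
    *guard += 1;
}
```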
