Benchmarks for std::sync::Mutex vs parking_lot::Mutex.
Quick Links:
- Quick Start - Run benchmarks immediately
- Benchmark Results - See the data
- Conclusions - TL;DR: When to use which mutex
This benchmark suite measures mutex performance under different contention patterns by spawning multiple threads that repeatedly acquire and release a shared lock. The key aspects of our methodology:
Measurement approach:
- Each thread measures wait-to-acquire time: the duration from calling `lock()` until the guard is obtained (see the sketch after this list)
- All threads start simultaneously using a barrier synchronization primitive
- All samples are recorded from the moment threads start until the duration elapses
- Per-thread operation counts track fairness and starvation issues
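To make the measurement approach concrete, here is a minimal sketch of what such a worker loop could look like. The function shape and names (`run_worker`, the 1-second duration in `main`) are illustrative assumptions, not the harness's actual code.

```rust
use std::hint::black_box;
use std::sync::{Arc, Barrier, Mutex};
use std::time::{Duration, Instant};

// Illustrative worker loop (assumed shape, not the real harness code):
// wait on the barrier, then repeatedly time lock() until the run duration elapses.
fn run_worker(lock: Arc<Mutex<u64>>, barrier: Arc<Barrier>, run_for: Duration) -> (u64, Vec<Duration>) {
    barrier.wait(); // all threads start measuring at the same instant
    let start = Instant::now();
    let (mut ops, mut waits) = (0u64, Vec::new());
    while start.elapsed() < run_for {
        let before = Instant::now();
        let mut guard = lock.lock().unwrap(); // wait-to-acquire ends when the guard is obtained
        waits.push(before.elapsed());         // record only the wait, not the hold
        *guard = black_box(*guard + 1);       // minimal critical section
        drop(guard);                          // hold time ends here
        ops += 1;
    }
    (ops, waits) // per-thread op count + wait samples
}

fn main() {
    let threads = 4;
    let lock = Arc::new(Mutex::new(0u64));
    let barrier = Arc::new(Barrier::new(threads));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let (l, b) = (Arc::clone(&lock), Arc::clone(&barrier));
            std::thread::spawn(move || run_worker(l, b, Duration::from_secs(1)))
        })
        .collect();
    for h in handles {
        let (ops, waits) = h.join().unwrap();
        println!("ops = {ops}, wait samples = {}", waits.len());
    }
}
```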
Metrics reported:
- Throughput: total lock acquire/release operations per second across all threads
- Wait latencies (in nanoseconds): mean, median (p50), p95, p99, and standard deviation
- Per-thread ops: distribution of work across threads (reveals fairness/starvation); a summary sketch follows this list
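As a rough illustration, the reported metrics could be derived from the per-thread (ops, wait samples) results along these lines. This is an assumed shape, not the harness's actual code, and it uses a population standard deviation.

```rust
use std::time::Duration;

// Hypothetical post-processing of per-thread (ops, wait samples) results into
// the reported metrics; naming and structure are assumptions, not the harness's code.
fn summarize(per_thread: &[(u64, Vec<Duration>)], seconds: f64) {
    let total_ops: u64 = per_thread.iter().map(|(ops, _)| *ops).sum();
    let mut waits_ns: Vec<u64> = per_thread
        .iter()
        .flat_map(|(_, w)| w.iter().map(|d| d.as_nanos() as u64))
        .collect();
    waits_ns.sort_unstable(); // the sorted list feeds the percentile rule described later

    let n = waits_ns.len() as f64;
    let mean = waits_ns.iter().map(|&w| w as f64).sum::<f64>() / n;
    let var = waits_ns.iter().map(|&w| (w as f64 - mean).powi(2)).sum::<f64>() / n;

    println!("throughput:     {:.0} ops/s", total_ops as f64 / seconds);
    println!("mean wait:      {:.0} ns", mean);
    println!("stddev wait:    {:.0} ns", var.sqrt()); // population stddev (an assumption)
    println!("per-thread ops: {:?}", per_thread.iter().map(|(o, _)| *o).collect::<Vec<_>>());
}

fn main() {
    // Two fake threads' worth of results, just to exercise the function.
    let fake = vec![
        (3u64, vec![Duration::from_nanos(100), Duration::from_nanos(200), Duration::from_nanos(150)]),
        (2u64, vec![Duration::from_nanos(120), Duration::from_nanos(9_000)]),
    ];
    summarize(&fake, 10.0);
}
```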
Scenarios simulated:
- Uncontended: Single thread (no contention baseline)
- Short-hold: Multiple threads with minimal critical section work
- Long-hold: Threads sleep while holding the lock (configurable duration)
- Burst: Threads cycle between active periods (acquiring locks rapidly) and idle periods (sleeping); a sketch of this cycle follows the list
- Hog: One thread monopolizes the lock while others compete for it
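To illustrate the burst mechanics, a hypothetical worker outer loop might alternate phases like this (function and parameter names are assumptions, not the harness's code):

```rust
use std::time::{Duration, Instant};

// Hypothetical shape of a burst worker: hammer the lock for `on`
// (--burst-on-ms), then sleep for `off` (--burst-off-ms), until `run_for` elapses.
fn burst_cycle(mut acquire_once: impl FnMut(), on: Duration, off: Duration, run_for: Duration) {
    let start = Instant::now();
    while start.elapsed() < run_for {
        let phase = Instant::now();
        while phase.elapsed() < on {
            acquire_once(); // lock, touch the counter, unlock
        }
        std::thread::sleep(off); // idle period: no lock traffic from this thread
    }
}

fn main() {
    let mut count = 0u64;
    burst_cycle(
        || count += 1, // stand-in for "lock, increment, unlock"
        Duration::from_millis(200),
        Duration::from_millis(800),
        Duration::from_secs(2),
    );
    println!("ops during active phases: {count}");
}
```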
Critical section behavior:
- By default: increment a counter with `black_box()` to prevent compiler optimization
- Optional sleep during lock hold (configured via `--hold-us` or `--hog-hold-us`); see the sketch after this list
- Wait measurement stops when the lock is acquired; hold time is tracked separately
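A rough sketch of that critical-section body, with a single `hold` parameter standing in for `--hold-us`/`--hog-hold-us` (an illustrative simplification, not the harness's actual code):

```rust
use std::hint::black_box;
use std::time::Duration;

// Runs while the lock is held; `hold` is zero unless --hold-us / --hog-hold-us was set.
fn critical_section(counter: &mut u64, hold: Duration) {
    *counter = black_box(*counter + 1); // black_box keeps the increment from being optimized out
    if !hold.is_zero() {
        std::thread::sleep(hold); // simulate a long hold while still holding the lock
    }
}

fn main() {
    let mut counter = 0u64;
    critical_section(&mut counter, Duration::ZERO);              // default: no sleep
    critical_section(&mut counter, Duration::from_micros(500));  // e.g. --hold-us 500
    println!("counter = {counter}");
}
```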
Run predefined scenario benchmarks:
# Short hold scenario (4 threads, 10 seconds)
cargo short --kind std # using std::sync::Mutex
cargo short --kind parking-lot # using parking_lot::Mutex
# Long hold scenario (8 threads, 10 seconds, 500μs hold time)
cargo long --kind std
cargo long --kind parking-lot
# Burst scenario (8 threads, 15 seconds, 200ms on / 800ms off)
cargo burst --kind std
cargo burst --kind parking-lot
# Hog scenario (6 threads, 15 seconds)
# Add --hog-hold-us to force the hog to sleep *inside* the lock and truly monopolize
cargo hog --kind std -- --hog-hold-us 500
cargo hog --kind parking-lot -- --hog-hold-us 500
Or use the shorthand aliases:
cargo bstd --scenario short-hold --threads 4 # std::sync::Mutex
cargo bpl --scenario burst --threads 8 --pin # parking_lot::Mutex with CPU pinning
Run with custom parameters:
cargo run --release -- --kind std --scenario short-hold --threads 8 --seconds 20 --pin
Flags:
- `--kind`: `std` or `parking-lot`
- `--scenario`: `uncontended`, `short-hold`, `long-hold`, `burst`, or `hog`
- `--threads`: Number of worker threads (default: 4)
- `--seconds`: Benchmark duration in seconds (default: 10)
- `--hold-us`: Microseconds to hold the lock in the `long-hold` scenario (default: 200)
- `--burst-on-ms`: Active period in the `burst` scenario, in ms (default: 200)
- `--burst-off-ms`: Idle period in the `burst` scenario, in ms (default: 800)
- `--hog-hold-us`: Hog scenario only; the hog sleeps this many µs while holding the lock (default: 0)
- `--pin`: Pin threads to CPU cores for reduced context switching
Note: In uncontended, the harness forces threads=1 even if a larger value is passed.
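The flag list above maps naturally onto a clap-style argument struct. The sketch below is a hypothetical reconstruction with the documented defaults; the real harness may define its CLI differently.

```rust
use clap::{Parser, ValueEnum};

// Hypothetical CLI definition mirroring the documented flags (requires the
// `clap` crate with the `derive` feature); not necessarily the harness's code.
#[derive(Parser)]
struct Args {
    /// Mutex implementation: std or parking-lot
    #[arg(long, value_enum)]
    kind: Kind,
    /// Contention pattern to run
    #[arg(long, value_enum)]
    scenario: ScenarioKind,
    /// Number of worker threads
    #[arg(long, default_value_t = 4)]
    threads: usize,
    /// Benchmark duration in seconds
    #[arg(long, default_value_t = 10)]
    seconds: u64,
    /// Microseconds to hold the lock in the long-hold scenario
    #[arg(long, default_value_t = 200)]
    hold_us: u64,
    /// Active period (ms) in the burst scenario
    #[arg(long, default_value_t = 200)]
    burst_on_ms: u64,
    /// Idle period (ms) in the burst scenario
    #[arg(long, default_value_t = 800)]
    burst_off_ms: u64,
    /// Microseconds the hog sleeps while holding the lock (hog scenario only)
    #[arg(long, default_value_t = 0)]
    hog_hold_us: u64,
    /// Pin threads to CPU cores
    #[arg(long)]
    pin: bool,
}

#[derive(Clone, Copy, ValueEnum)]
enum Kind { Std, ParkingLot }

#[derive(Clone, Copy, ValueEnum)]
enum ScenarioKind { Uncontended, ShortHold, LongHold, Burst, Hog }

fn main() {
    let args = Args::parse();
    println!("threads = {}, seconds = {}", args.threads, args.seconds);
}
```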
- Rust toolchain: pinned via `rust-toolchain.toml`
- parking_lot: pinned in `Cargo.toml`
- Scenarios: uncontended, short hold, long hold, burst, hog
- Metrics: throughput, median/p95/p99/stdev wait-to-acquire, per-thread ops
Percentiles use the inclusive nearest-rank convention: `index = min(n * p / 100, n - 1)` on the sorted wait list.
This makes p99 equal to the maximum when n = 100, which is a conservative choice for tail latency; a small sketch of this rule follows below.
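For reference, a minimal Rust helper reproducing the documented formula (not necessarily how the harness computes it):

```rust
/// Inclusive nearest-rank percentile: index = min(n * p / 100, n - 1).
/// `sorted_waits_ns` must be sorted ascending; `p` is the percentile (0..=100).
fn percentile_ns(sorted_waits_ns: &[u64], p: usize) -> u64 {
    let n = sorted_waits_ns.len();
    assert!(n > 0, "need at least one sample");
    let idx = (n * p / 100).min(n - 1);
    sorted_waits_ns[idx]
}

fn main() {
    // With n = 100 samples, p99 lands on index 99, i.e. the maximum.
    let waits: Vec<u64> = (1..=100).collect();
    assert_eq!(percentile_ns(&waits, 99), 100);
    assert_eq!(percentile_ns(&waits, 50), 51);
}
```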
Quick one-liners:
cargo linux # native-arch Linux (arm64 on Apple Silicon)
cargo linux-amd64 # x86_64 Linux via emulation (slower)
These call scripts/bench_linux_docker.sh, which builds in /work/target-linux and runs scripts/bench_all.sh.
Use CPUSET=0-3 cargo linux to limit cores inside the container for steadier numbers.
Short-hold scenario. Configuration: Minimal work in the critical section, moderate contention
| Metric | std::Mutex | parking_lot | Winner | Difference |
|---|---|---|---|---|
| Throughput | 8.01M ops/s | 7.35M ops/s | std | 9.0% faster |
| Median Wait | 125ns | 125ns | tie | — |
| Mean Wait | 308ns | 417ns | std | 35.4% lower |
| P95 Wait | 709ns | 833ns | std | 17.5% lower |
| P99 Wait | 7.71μs | 8.54μs | std | 10.8% lower |
| StdDev Wait | 2.99μs | 8.25μs | std | 63.8% lower |
| Per-Thread Ops | [20.5M, 19.4M, 19.7M, 20.6M] | [18.3M, 18.9M, 18.2M, 18.2M] | — | — |
| Fairness (Range) | 5.6% | 3.9% | parking_lot | 43.3% more fair |
Summary: std::Mutex wins on throughput and latency in this low-to-moderate contention scenario, but parking_lot shows significantly better fairness with more even operation distribution across threads.
Long-hold scenario. Configuration: Long critical sections (500μs sleep), heavy contention
| Metric | std::Mutex | parking_lot | Winner | Difference |
|---|---|---|---|---|
| Throughput | 748 ops/s | 696 ops/s | std | 7.5% faster |
| Median Wait | 125ns | 9.25ms | std | 74,005× lower |
| Mean Wait | 9.36ms | 10.14ms | std | 8.3% lower |
| P95 Wait | 291ns | 16.95ms | std | 58,258× lower |
| P99 Wait | 541ns | 21.65ms | std | 40,024× lower |
| StdDev Wait | 188.73ms | 3.67ms | parking_lot | 51.4× more stable |
| Per-Thread Ops | [1026, 815, 666, 66, 1252, 1394, 1026, 1233] | [864, 873, 866, 876, 873, 860, 872, 877] | — | — |
| Fairness (Range) | 95.3% | 1.9% | parking_lot | 49.1× more fair |
Finding: std::Mutex shows severe thread starvation (one thread completed only 66 operations vs 1,394 for another). The combination of extremely low median/P95/P99, a high mean, and a massive stddev (188ms) for std indicates that most acquisitions are fast, but occasional extreme delays starve individual threads.
Summary: While std has higher throughput, it achieves this through unfairness. parking_lot ensures all threads make progress with dramatically better fairness and predictability (49× more fair, 51× more stable wait times).
Burst scenario. Configuration: Bursty workload with periodic activity spikes
| Metric | std::Mutex | parking_lot | Winner | Difference |
|---|---|---|---|---|
| Throughput | 1.15M ops/s | 1.37M ops/s | parking_lot | 18.5% faster |
| Median Wait | 84ns | 84ns | tie | — |
| Mean Wait | 935ns | 966ns | std | 3.3% lower |
| P95 Wait | 1.42μs | 2.00μs | std | 41.2% lower |
| P99 Wait | 15.96μs | 21.00μs | std | 31.6% lower |
| StdDev Wait | 8.68μs | 6.96μs | parking_lot | 24.8% more stable |
| Per-Thread Ops | [2.39M, 2.13M, 2.13M, 2.15M, 2.07M, 2.20M, 2.10M, 2.12M] | [2.52M, 2.52M, 2.55M, 2.79M, 2.51M, 2.51M, 2.53M, 2.56M] | — | — |
| Fairness (Range) | 13.6% | 9.9% | parking_lot | 37.1% more fair |
Summary: parking_lot wins on throughput (18.5% faster) and fairness in bursty scenarios. The periodic contention spikes favor parking_lot's adaptive spinning and fairness mechanisms. While std has lower tail latencies, parking_lot's better stability and fairness lead to higher overall throughput.
Hog scenario. Configuration: One thread monopolizes the lock while others compete
| Metric | std::Mutex | parking_lot | Winner | Difference |
|---|---|---|---|---|
| Throughput | 820 ops/s | 2,965 ops/s | parking_lot | 261.6% faster |
| Median Wait | 166ns | 359.88μs | std | 2,168× lower |
| Mean Wait | 6.10ms | 812.30μs | std | 7.5× lower |
| P95 Wait | 333ns | 2.26ms | std | 6,800× lower |
| P99 Wait | 500ns | 4.78ms | std | 9,550× lower |
| StdDev Wait | 130.76ms | 1.09ms | parking_lot | 120× more stable |
| Per-Thread Ops | [12242, 10, 6, 8, 16, 11] | [9168, 7082, 7063, 7023, 7109, 7037] | — | — |
| Fairness (Range) | 100.0% | 23.4% | parking_lot | 4.3× more fair |
Finding: The hog thread completed 12,242 operations while other threads completed only 6-16 operations (essentially complete starvation). This demonstrates the worst-case scenario for std::Mutex's unfair locking behavior.
Summary: parking_lot's "eventual fairness" mechanism prevents monopolization, resulting in 261.6% higher overall throughput and dramatically better fairness. All threads make meaningful progress with parking_lot, while std completely starves non-hog threads.
Use std::sync::Mutex when:
- Low to moderate contention scenarios
- When you want zero dependencies
- Short critical sections with symmetric workloads
- Maximum throughput is more important than fairness
Use parking_lot::Mutex when:
- Any scenario requiring fairness (prevents thread starvation)
- Bursty workloads (18.5% faster)
- Risk of monopolization (261.6% faster, prevents starvation)
- Need predictable latency (much more stable wait times)
- Long critical sections with multiple threads
- When preventing priority inversion matters
The benchmark reveals that std::Mutex on Linux is not fair by default: it uses a "barging" strategy that can lead to severe thread starvation under contention. parking_lot's "eventual fairness" mechanism (forcing a fair unlock roughly every 0.5ms) provides dramatically better behavior in contended scenarios while remaining competitive in uncontended cases.
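If you decide to switch, the change is close to a drop-in replacement; the main API difference is that parking_lot's `lock()` returns the guard directly because there is no lock poisoning, so there is no `Result` to unwrap. A minimal illustration (assumes parking_lot is already a dependency, as it is here):

```rust
use std::sync::Mutex as StdMutex;
use parking_lot::Mutex as PlMutex; // requires the parking_lot crate

fn main() {
    // std::sync::Mutex: lock() returns a LockResult because of poisoning.
    let std_counter = StdMutex::new(0u64);
    *std_counter.lock().unwrap() += 1;

    // parking_lot::Mutex: lock() returns the guard directly.
    let pl_counter = PlMutex::new(0u64);
    *pl_counter.lock() += 1;
}
```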