mutex-benches

Benchmarks for std::sync::Mutex vs parking_lot::Mutex.

Overview

This benchmark suite measures mutex performance under different contention patterns by spawning multiple threads that repeatedly acquire and release a shared lock. The key aspects of our methodology:

Measurement approach:

  • Each thread measures wait-to-acquire time - the duration from calling lock() until the guard is obtained
  • All threads start simultaneously using a barrier synchronization primitive
  • All samples are recorded from the moment threads start until the duration elapses
  • Per-thread operation counts track fairness and starvation issues

Metrics reported:

  • Throughput: total lock acquire/release operations per second across all threads
  • Wait latencies (in nanoseconds): mean, median (p50), p95, p99, and standard deviation
  • Per-thread ops: distribution of work across threads (reveals fairness/starvation)

Scenarios simulated:

  1. Uncontended: Single thread (no contention baseline)
  2. Short-hold: Multiple threads with minimal critical section work
  3. Long-hold: Threads sleep while holding the lock (configurable duration)
  4. Burst: Threads cycle between active periods (acquiring locks rapidly) and idle periods (sleeping); see the sketch after this list
  5. Hog: One thread monopolizes the lock while others compete for it
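
To make the burst pattern concrete, here is a minimal sketch of how a burst worker can alternate between active and idle phases. It is illustrative only: the names (`burst_worker`, `burst_on`, `burst_off`, `stop`) are not taken from this repository.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical burst worker: hammers the lock for `burst_on`,
/// then sleeps for `burst_off`, until `stop` is set.
fn burst_worker(
    lock: Arc<Mutex<u64>>,
    stop: Arc<AtomicBool>,
    burst_on: Duration,
    burst_off: Duration,
) {
    while !stop.load(Ordering::Relaxed) {
        let phase_end = Instant::now() + burst_on;
        // Active phase: acquire and release the lock as fast as possible.
        while Instant::now() < phase_end && !stop.load(Ordering::Relaxed) {
            let mut counter = lock.lock().unwrap();
            *counter += 1;
        }
        // Idle phase: back off completely so contention arrives in waves.
        std::thread::sleep(burst_off);
    }
}
```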

Critical section behavior:

  • By default: increment a counter with black_box() to prevent compiler optimization
  • Optional sleep during lock hold (configured via --hold-us or --hog-hold-us)
  • Wait measurement stops when the lock is acquired; hold time is separate (see the measurement sketch below)
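
Putting the pieces above together, the per-thread measurement loop for the short-hold case looks roughly like the sketch below. Only the shape follows the description above; the names and exact structure are illustrative, not the repository's code.

```rust
use std::hint::black_box;
use std::sync::{Arc, Barrier, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical worker loop (short-hold scenario, std::sync::Mutex):
/// barrier start, wait-to-acquire timing, black_box'd counter increment,
/// and a per-thread operation count.
fn worker(
    lock: Arc<Mutex<u64>>,
    start: Arc<Barrier>,
    run_for: Duration,
) -> (Vec<u64>, u64) {
    let mut wait_ns = Vec::new();
    let mut ops = 0u64;

    start.wait();                        // all threads begin together
    let deadline = Instant::now() + run_for;

    while Instant::now() < deadline {
        let t0 = Instant::now();
        let mut guard = lock.lock().unwrap();
        wait_ns.push(t0.elapsed().as_nanos() as u64); // wait stops at acquire

        *guard = black_box(*guard + 1);  // critical section: counted increment
        drop(guard);                     // hold time is not part of the wait
        ops += 1;
    }
    (wait_ns, ops)
}
```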

Quick Start

Run predefined scenario benchmarks:

# Short hold scenario (4 threads, 10 seconds)
cargo short --kind std          # using std::sync::Mutex
cargo short --kind parking-lot  # using parking_lot::Mutex

# Long hold scenario (8 threads, 10 seconds, 500μs hold time)
cargo long --kind std
cargo long --kind parking-lot

# Burst scenario (8 threads, 15 seconds, 200ms on / 800ms off)
cargo burst --kind std
cargo burst --kind parking-lot

# Hog scenario (6 threads, 15 seconds)
# Add --hog-hold-us to force the hog to sleep *inside* the lock and truly monopolize it
cargo hog --kind std -- --hog-hold-us 500
cargo hog --kind parking-lot -- --hog-hold-us 500

Or use the shorthand aliases:

cargo bstd --scenario short-hold --threads 4    # std::sync::Mutex
cargo bpl --scenario burst --threads 8 --pin    # parking_lot::Mutex with CPU pinning

Custom Benchmarks

Run with custom parameters:

cargo run --release -- --kind std --scenario short-hold --threads 8 --seconds 20 --pin

Available Options

  • --kind: std or parking-lot
  • --scenario: uncontended, short-hold, long-hold, burst, hog
  • --threads: Number of worker threads (default: 4)
  • --seconds: Benchmark duration (default: 10)
  • --hold-us: Microseconds to hold lock in long-hold scenario (default: 200)
  • --burst-on-ms: Active period in burst scenario (default: 200)
  • --burst-off-ms: Idle period in burst scenario (default: 800)
  • --hog-hold-us: Hog scenario only. The hog sleeps this many μs while holding the lock (default: 0)
  • --pin: Pin threads to CPU cores for reduced context switching

Note: In the uncontended scenario, the harness forces threads=1 even if a larger value is passed.
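
The flags above map naturally onto a derive-style argument parser. The sketch below assumes a clap-derive setup, which this README does not confirm; the struct, enum, and field names are hypothetical.

```rust
// Hypothetical clap-derive mirror of the flags listed above.
// The actual harness may parse its arguments differently.
use clap::{Parser, ValueEnum};

#[derive(Clone, ValueEnum)]
enum Kind { Std, ParkingLot }               // --kind std | parking-lot

#[derive(Clone, ValueEnum)]
enum Scenario { Uncontended, ShortHold, LongHold, Burst, Hog }

#[derive(Parser)]
struct Args {
    #[arg(long, value_enum)]
    kind: Kind,
    #[arg(long, value_enum)]
    scenario: Scenario,
    #[arg(long, default_value_t = 4)]
    threads: usize,
    #[arg(long, default_value_t = 10)]
    seconds: u64,
    #[arg(long, default_value_t = 200)]
    hold_us: u64,
    #[arg(long, default_value_t = 200)]
    burst_on_ms: u64,
    #[arg(long, default_value_t = 800)]
    burst_off_ms: u64,
    #[arg(long, default_value_t = 0)]
    hog_hold_us: u64,
    #[arg(long)]
    pin: bool,
}
```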

Configuration

  • Rust toolchain: pinned via rust-toolchain.toml
  • parking_lot: pinned in Cargo.toml
  • Scenarios: uncontended, short hold, long hold, burst, hog
  • Metrics: throughput, median/p95/p99/stdev wait-to-acquire, per-thread ops

Percentile Definition

Percentiles use the inclusive nearest-rank convention: index = min(n * p / 100, n - 1) on the sorted wait list. This makes p99 equal to the maximum when n=100, which is a conservative choice for tail latency.
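
In code, that definition amounts to something like the following sketch (not necessarily the repository's exact implementation):

```rust
/// Nearest-rank percentile as defined above: index = min(n * p / 100, n - 1)
/// on the sorted wait list.
fn percentile(sorted_waits_ns: &[u64], p: usize) -> u64 {
    let n = sorted_waits_ns.len();
    assert!(n > 0, "need at least one sample");
    let idx = (n * p / 100).min(n - 1);
    sorted_waits_ns[idx]
}

// With n = 100 samples, percentile(samples, 99) indexes 100 * 99 / 100 = 99,
// i.e. the largest sample, matching the "p99 equals the maximum" note above.
```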

Run on Linux via Docker (from macOS)

Quick one-liners:

cargo linux           # native-arch Linux (arm64 on Apple Silicon)
cargo linux-amd64     # x86_64 Linux via emulation (slower)

These call scripts/bench_linux_docker.sh, which builds in /work/target-linux and runs scripts/bench_all.sh. Use CPUSET=0-3 cargo linux to limit cores inside the container for steadier numbers.

Benchmark Results: std::Mutex vs parking_lot

Scenario 1: ShortHold (4 threads, 10 seconds)

Configuration: Minimal work in critical section, moderate contention

| Metric | std::Mutex | parking_lot | Winner | Difference |
| --- | --- | --- | --- | --- |
| Throughput | 8.01M ops/s | 7.35M ops/s | std | 9.0% faster |
| Median Wait | 125ns | 125ns | tie | |
| Mean Wait | 308ns | 417ns | std | 35.4% lower |
| P95 Wait | 709ns | 833ns | std | 17.5% lower |
| P99 Wait | 7.71μs | 8.54μs | std | 10.8% lower |
| StdDev Wait | 2.99μs | 8.25μs | std | 63.8% lower |
| Per-Thread Ops | [20.5M, 19.4M, 19.7M, 20.6M] | [18.3M, 18.9M, 18.2M, 18.2M] | | |
| Fairness (Range) | 5.6% | 3.9% | parking_lot | 43.3% more fair |

Summary: std::Mutex wins on throughput and latency in this low-to-moderate contention scenario, but parking_lot shows significantly better fairness with more even operation distribution across threads.


Scenario 2: LongHold (8 threads, 10 seconds, 500μs hold)

Configuration: Long critical sections (500μs sleep), heavy contention

| Metric | std::Mutex | parking_lot | Winner | Difference |
| --- | --- | --- | --- | --- |
| Throughput | 748 ops/s | 696 ops/s | std | 7.5% faster |
| Median Wait | 125ns | 9.25ms | std | 74,005× lower |
| Mean Wait | 9.36ms | 10.14ms | std | 8.3% lower |
| P95 Wait | 291ns | 16.95ms | std | 58,258× lower |
| P99 Wait | 541ns | 21.65ms | std | 40,024× lower |
| StdDev Wait | 188.73ms | 3.67ms | parking_lot | 51.4× more stable |
| Per-Thread Ops | [1026, 815, 666, 66, 1252, 1394, 1026, 1233] | [864, 873, 866, 876, 873, 860, 872, 877] | | |
| Fairness (Range) | 95.3% | 1.9% | parking_lot | 49.1× more fair |

Finding: std::Mutex shows severe thread starvation (one thread only completed 66 operations vs 1394 for another). The extremely low median/P95/P99 for std combined with high mean and massive stddev (188ms) indicates most acquisitions are fast, but occasional extreme delays cause thread starvation.

Summary: While std has higher throughput, it achieves this through unfairness. parking_lot ensures all threads make progress with dramatically better fairness and predictability (49× more fair, 51× more stable wait times).


Scenario 3: Burst (8 threads, 15 seconds, 200ms on / 800ms off)

Configuration: Bursty workload with periodic activity spikes

| Metric | std::Mutex | parking_lot | Winner | Difference |
| --- | --- | --- | --- | --- |
| Throughput | 1.15M ops/s | 1.37M ops/s | parking_lot | 18.5% faster |
| Median Wait | 84ns | 84ns | tie | |
| Mean Wait | 935ns | 966ns | std | 3.3% lower |
| P95 Wait | 1.42μs | 2.00μs | std | 41.2% lower |
| P99 Wait | 15.96μs | 21.00μs | std | 31.6% lower |
| StdDev Wait | 8.68μs | 6.96μs | parking_lot | 24.8% more stable |
| Per-Thread Ops | [2.39M, 2.13M, 2.13M, 2.15M, 2.07M, 2.20M, 2.10M, 2.12M] | [2.52M, 2.52M, 2.55M, 2.79M, 2.51M, 2.51M, 2.53M, 2.56M] | | |
| Fairness (Range) | 13.6% | 9.9% | parking_lot | 37.1% more fair |

Summary: parking_lot wins on throughput (18.5% faster) and fairness in bursty scenarios. The periodic contention spikes favor parking_lot's adaptive spinning and fairness mechanisms. While std has lower tail latencies, parking_lot's better stability and fairness lead to higher overall throughput.


Scenario 4: Hog (6 threads, 15 seconds, 500μs hog hold)

Configuration: One thread monopolizes the lock while others compete

| Metric | std::Mutex | parking_lot | Winner | Difference |
| --- | --- | --- | --- | --- |
| Throughput | 820 ops/s | 2,965 ops/s | parking_lot | 261.6% faster |
| Median Wait | 166ns | 359.88μs | std | 2,168× lower |
| Mean Wait | 6.10ms | 812.30μs | parking_lot | 7.5× lower |
| P95 Wait | 333ns | 2.26ms | std | 6,800× lower |
| P99 Wait | 500ns | 4.78ms | std | 9,550× lower |
| StdDev Wait | 130.76ms | 1.09ms | parking_lot | 120× more stable |
| Per-Thread Ops | [12242, 10, 6, 8, 16, 11] | [9168, 7082, 7063, 7023, 7109, 7037] | | |
| Fairness (Range) | 100.0% | 23.4% | parking_lot | 4.3× more fair |

Finding: The hog thread completed 12,242 operations while other threads completed only 6-16 operations (essentially complete starvation). This demonstrates the worst-case scenario for std::Mutex's unfair locking behavior.

Summary: parking_lot's "eventual fairness" mechanism prevents monopolization, resulting in 261.6% higher overall throughput and dramatically better fairness. All threads make meaningful progress with parking_lot, while std completely starves non-hog threads.

Overall Conclusions

When to Use std::Mutex

  • Low to moderate contention scenarios
  • When you want zero dependencies
  • Short critical sections with symmetric workloads
  • Maximum throughput is more important than fairness

When to Use parking_lot::Mutex

  • Any scenario requiring fairness (prevents thread starvation)
  • Bursty workloads (18.5% faster)
  • Risk of monopolization (261.6% faster, prevents starvation)
  • Need predictable latency (much more stable wait times)
  • Long critical sections with multiple threads
  • When preventing priority inversion matters

Key Insight

The benchmark reveals that std::Mutex on Linux is not fair by default. It uses a "barging" strategy that can lead to severe thread starvation under contention. parking_lot's "eventual fairness" mechanism (forcing fair unlock every 0.5ms) provides dramatically better behavior in contended scenarios while maintaining competitive performance in uncontended cases.
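
For reference, switching between the two implementations in benchmark code like this is largely mechanical; the main API difference is that parking_lot's lock() returns the guard directly, since parking_lot mutexes have no poisoning.

```rust
// std::sync::Mutex: lock() returns a Result because of lock poisoning.
use std::sync::Mutex as StdMutex;

fn with_std(m: &StdMutex<u64>) {
    let mut guard = m.lock().unwrap(); // unwrap handles the poison case
    *guard += 1;
}

// parking_lot::Mutex: no poisoning, lock() returns the guard directly.
use parking_lot::Mutex as PlMutex;

fn with_parking_lot(m: &PlMutex<u64>) {
    let mut guard = m.lock();
    *guard += 1;
}
```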
