Conversation
Force-pushed b9b43a0 to de50ead
Still need to investigate thread-safety of the …
Force-pushed c97a88f to 6fedd7a
Calling …
StephenWakely left a comment:
Looks good. Just a couple of very minor nits.
statsd/aggregator_benchmark_test.go (outdated)

```go
i := 0
for pb.Next() {
	i++
	name := fmt.Sprintf("metric.%d", i%100000)
```
This benchmark is mostly going to be benchmarking the string allocation here. We should instead preallocate a big list of these strings before b.ResetTimer().
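The suggested restructuring could look roughly like this. This is a sketch of the reviewer's idea, not the PR's actual benchmark code; `buildNames` and the constants are illustrative:

```go
package main

import "fmt"

// buildNames preallocates all metric names up front, so the benchmark's
// hot loop measures aggregation rather than fmt.Sprintf allocations.
func buildNames(n int) []string {
	names := make([]string, n)
	for i := range names {
		names[i] = fmt.Sprintf("metric.%d", i)
	}
	return names
}

func main() {
	// In the real benchmark this would run before b.ResetTimer(), and the
	// RunParallel loop would then just index names[i%len(names)].
	names := buildNames(100000)
	fmt.Println(names[0], names[len(names)-1])
}
```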
statsd/options.go (outdated)

```go
defaultOriginDetection           = true
defaultChannelModeErrorsWhenFull = false
defaultErrorHandler              = func(error) {}
defaultTagCardinality            = CardinalityNotSet
```
`defaultTagCardinality` isn't used anywhere, so it should be removed.
Its usage was removed in a previous PR because the value is now 0, but arguably it should be used and set in `resolveOptions`.
Thanks for picking that up. I forgot to do something with that value.
StephenWakely left a comment:
I make this mistake every time too...
statsd/options.go (outdated)

```go
channelModeErrorsWhenFull: defaultChannelModeErrorsWhenFull,
errorHandler:              defaultErrorHandler,
aggregatorShardCount:      defaultAggregatorShardCount,
tagCardinality:            &defaultTagCardinality,
```
Thinking about it, `defaultTagCardinality` should be nil; we use nil to mean it hasn't been set here and should be loaded from the environment.
Force-pushed cff244f to b1f2f9f
Updated version of #343 (fixes git conflicts)
This PR introduces optional sharding for client-side aggregation locks to significantly improve throughput in high-concurrency scenarios.
Problem:

The aggregator previously used a single `sync.RWMutex` for each metric type (`counts`, `gauges`, `sets`). Under high concurrency (many goroutines reporting metrics), this single lock became a major contention point, limiting throughput even when reporting unique metrics.

Solution:

We have introduced a sharding mechanism for these metric maps:

- The shard count is configurable via `WithAggregatorShardCount(int)`.
- The default is 1, preserving the existing behavior and performance characteristics.
- When `shardCount > 1`, metrics are distributed across shards based on a hash of their context (name + tags), reducing lock contention by a factor roughly equal to the shard count.

Performance:
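The described mechanism might be sketched like this. This is illustrative only; `shardedCounts`, `countShard`, and the FNV hash choice are assumptions, not the PR's actual implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// countShard guards one slice of the counts map, so concurrent writers
// to different shards no longer contend on one big RWMutex.
type countShard struct {
	mu     sync.RWMutex
	counts map[string]int64
}

type shardedCounts struct {
	shards []*countShard
}

func newShardedCounts(shardCount int) *shardedCounts {
	s := &shardedCounts{shards: make([]*countShard, shardCount)}
	for i := range s.shards {
		s.shards[i] = &countShard{counts: make(map[string]int64)}
	}
	return s
}

// shardFor hashes the metric context (name + tags) to pick a shard.
// With a single shard the hash is skipped entirely, matching the
// "zero overhead when sharding is disabled" claim.
func (s *shardedCounts) shardFor(context string) *countShard {
	if len(s.shards) == 1 {
		return s.shards[0]
	}
	h := fnv.New32a()
	h.Write([]byte(context))
	return s.shards[h.Sum32()%uint32(len(s.shards))]
}

func (s *shardedCounts) add(context string, v int64) {
	sh := s.shardFor(context)
	sh.mu.Lock()
	sh.counts[context] += v
	sh.mu.Unlock()
}

func main() {
	sc := newShardedCounts(8)
	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				sc.add(fmt.Sprintf("metric.%d|env:prod", i%100), 1)
			}
		}()
	}
	wg.Wait()
	var total int64
	for _, sh := range sc.shards {
		for _, v := range sh.counts {
			total += v
		}
	}
	fmt.Println("total:", total) // 4 goroutines * 1000 increments = 4000
}
```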
Micro-benchmarks demonstrate significant improvements in high-contention scenarios (M4 Max, 14 threads):

- High contention (concurrent updates to unique metrics): ~6x throughput improvement
- Overhead (single thread, no contention): zero overhead introduced for the default configuration
The optimization ensures that the hashing cost is completely bypassed when sharding is disabled (default), guaranteeing no regression for existing users.