Burrow/core/internal/evaluator/caching.go
Lines 226 to 252 in 4a05b20

    for topic, partitions := range topics {
    	for partitionID, partition := range partitions {
    		partitionStatus := evaluatePartitionStatus(partition, module.minimumComplete, module.allowedLag)
    		partitionStatus.Topic = topic
    		partitionStatus.Partition = int32(partitionID)
    		partitionStatus.Owner = partition.Owner
    		partitionStatus.ClientID = partition.ClientID
    		if partitionStatus.Status > status.Status {
    			// If the partition status is greater than StatusError, we just mark it as StatusError
    			if partitionStatus.Status > protocol.StatusError {
    				status.Status = protocol.StatusError
    			} else {
    				status.Status = partitionStatus.Status
    			}
    		}
    		if (status.Maxlag == nil) || (partitionStatus.CurrentLag > status.Maxlag.CurrentLag) {
    			status.Maxlag = partitionStatus
    		}
    		if partitionStatus.Complete == 1.0 {
    			completePartitions++
    		}
    		status.Partitions[count] = partitionStatus
    		count++
    	}
    }

This piece of code loops over a map of topics, and if the last partition of the last topic visited reports OK, the consumer status will be OK.
Because map iteration order in Go is randomized, the resulting consumer status is unpredictable.
The following are the real-world effects of this:
- The metric from Burrow for a consumer, scraped at a 2m interval:

- The metric from burrow-exporter, which requests Burrow at a 30s interval and is then scraped at a 2m interval:

The more frequently we query Burrow (which uses a 30s cache expiration by default), the more likely we are to see a non-OK consumer status.
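
For reference, here is a minimal sketch of one common way to take Go's randomized map order out of the picture: collect the map keys, sort them, and iterate over the sorted slice. The `sortedTopicNames` helper, the `map[string][]int` type, and the topic names are all hypothetical stand-ins, not Burrow's actual types, and this is not a patch for `caching.go`. It only makes the traversal order reproducible between runs.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedTopicNames returns the keys of topics in lexical order.
// Hypothetical helper: the value type []int stands in for a per-topic
// list of partition statuses and is not Burrow's real type.
func sortedTopicNames(topics map[string][]int) []string {
	names := make([]string, 0, len(topics))
	for name := range topics { // map order is unspecified here
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Made-up data: topic name -> status code per partition.
	topics := map[string][]int{
		"orders":   {0, 2},
		"payments": {1},
		"audit":    {0},
	}

	// Ranging over the sorted names visits topics in the same order on
	// every run, unlike ranging over the map directly.
	for _, name := range sortedTopicNames(topics) {
		for partitionID, partitionStatus := range topics[name] {
			fmt.Println(name, partitionID, partitionStatus)
		}
	}
}
```

With a fixed traversal order, repeated evaluations of the same input visit partitions identically, which makes it easier to tell whether a changing consumer status comes from iteration order or from the underlying data.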