Description
This may be my limited understanding, but it's worth writing down to see whether I'm misguided or have found an issue. My setup is a single Burrow instance monitoring a single Kafka cluster. I have a polling script that calls the /lag endpoint periodically and forwards the retrieved JSON payload to a time-series DB for visualisation and alerting.
After a period of ~36 hours I start to see a large increase in the calculated current_lag value per Topic Partition. I can find no natural explanation for this lag in terms of an increase of messages produced, or a slowdown in the Consumers within the Group. What I have found is:
- Restarting the poller makes the current_lag reset to zero.
- Starting a second poller alongside causes different current_lag values to be reported from the first poller.
To be clear: I have no fancy logic in the pollers. They simply perform an HTTP GET on v3/kafka/$CLUSTER/consumer/$CONSUMER_GROUP/lag and report the current_lag per Partition.
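For reference, the poller logic is roughly equivalent to the sketch below. This is an illustration, not my exact script: the function names, the base URL and the timeout are made up for the example, and it assumes the response nests the per-partition entries under status.partitions, each carrying topic, partition and current_lag fields.

```python
import json
from urllib.request import urlopen


def lag_url(base_url, cluster, group):
    """Build the consumer group lag URL described above."""
    return f"{base_url.rstrip('/')}/v3/kafka/{cluster}/consumer/{group}/lag"


def extract_current_lag(payload):
    """Map (topic, partition) -> current_lag from a decoded /lag response.

    Assumes the per-partition entries live under payload["status"]["partitions"].
    """
    partitions = payload.get("status", {}).get("partitions") or []
    return {(p["topic"], p["partition"]): p["current_lag"] for p in partitions}


def poll_once(base_url, cluster, group, timeout=10.0):
    """Perform a single stateless HTTP GET and return the per-partition lag."""
    with urlopen(lag_url(base_url, cluster, group), timeout=timeout) as resp:
        return extract_current_lag(json.load(resp))
```

Each call is an independent GET and the poller keeps nothing between calls, which is why I would not expect restarting it, or running a second copy, to change the values reported.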
Have I misconfigured something?
This is Burrow v1.8.0