Polling the HTTP /lag endpoint causes strange results

This may just be my limited understanding, but it's worth writing down to see whether I'm misguided or have found an issue. My setup is a single Burrow instance monitoring a single Kafka cluster. I have a polling script that calls the `/lag` endpoint periodically and forwards the retrieved JSON payload to a time-series DB for visualisation and alerting.

After roughly 36 hours I start to see a large increase in the calculated `current_lag` value per topic partition. I can find no natural explanation for this lag, such as an increase in messages produced or a slowdown in the consumers within the group. What I have found is:

- Restarting the poller makes the `current_lag` reset to zero.
- Starting a second poller alongside causes different `current_lag` values to be reported from the first poller.

To be clear: I have no fancy logic in the pollers. They simply perform an HTTP GET on `v3/kafka/$CLUSTER/consumer/$CONSUMER_GROUP/lag` and report the `current_lag` per partition.
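For reference, the poller logic amounts to roughly the following. This is a minimal sketch rather than my actual script: the base URL, the function names, and the assumed response shape (a `status` object holding a `partitions` list whose entries carry `topic`, `partition`, and `current_lag`) are illustrative assumptions and should be checked against the payload your Burrow version returns.

```python
import json
from urllib.request import urlopen


def lag_url(base_url, cluster, group):
    # Build the consumer lag URL from the path quoted above.
    # base_url (e.g. "http://localhost:8000") is a hypothetical value.
    return f"{base_url.rstrip('/')}/v3/kafka/{cluster}/consumer/{group}/lag"


def extract_current_lag(payload):
    # Assumed response shape: payload["status"]["partitions"] is a list of
    # objects with "topic", "partition" and "current_lag" keys.
    partitions = payload.get("status", {}).get("partitions") or []
    return {(p["topic"], p["partition"]): p["current_lag"] for p in partitions}


def poll_once(base_url, cluster, group, timeout=10):
    # One plain HTTP GET; no state is kept between polls.
    with urlopen(lag_url(base_url, cluster, group), timeout=timeout) as resp:
        return extract_current_lag(json.load(resp))
```

Each call to `poll_once` is independent and the poller holds no state between requests, which is why the effect of restarting it or running a second one alongside it is surprising to me.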


Have I misconfigured something?

This is Burrow v1.8.0.
