Polling the HTTP /lag endpoint causes strange results #833

@alexchowle

This may be down to my limited understanding, but it's worth writing this down to see whether I'm misguided or have found an issue. My setup is one Burrow instance monitoring a single Kafka cluster. I have a polling script that calls the /lag endpoint periodically and forwards the retrieved JSON payload to a time-series DB for visualisation and alerting.

After roughly 36 hours I start to see a large increase in the calculated current_lag value per topic partition. I can find no natural explanation for this lag, such as an increase in messages produced or a slowdown in the consumers within the group. What I have found is:

  • Restarting the poller makes current_lag reset to zero.
  • Starting a second poller alongside the first causes the two pollers to report different current_lag values.

To be clear: I have no fancy logic in the pollers - they simply perform an HTTP GET on v3/kafka/$CLUSTER/consumer/$CONSUMER_GROUP/lag and report the current_lag per partition.
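
For illustration, a poller of this kind boils down to something like the sketch below. This is not the actual script. It assumes the v3 response nests per-partition entries under status.partitions, each with topic, partition and current_lag fields. The function names, base URL, cluster name and group name are placeholders chosen for the example.

```python
import json
import urllib.request


def lag_url(base_url, cluster, group):
    """Build the consumer lag URL from a Burrow base URL (placeholder values)."""
    return f"{base_url.rstrip('/')}/v3/kafka/{cluster}/consumer/{group}/lag"


def extract_current_lag(payload):
    """Map (topic, partition) -> current_lag from a decoded /lag response.

    Assumes the payload shape {"status": {"partitions": [...]}}; a missing
    or null partition list yields an empty dict.
    """
    partitions = (payload.get("status") or {}).get("partitions") or []
    return {(p["topic"], p["partition"]): p["current_lag"] for p in partitions}


def poll_once(base_url, cluster, group, timeout=10.0):
    """Perform one stateless HTTP GET and return the per-partition lag."""
    with urllib.request.urlopen(lag_url(base_url, cluster, group), timeout=timeout) as resp:
        return extract_current_lag(json.load(resp))
```

Nothing here keeps state between calls, so each poll_once("http://localhost:8000", "mycluster", "mygroup") should depend only on what Burrow returns at that moment.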

Have I misconfigured something?

This is Burrow v1.8.0.
