Gracefully handle zookeeper disconnects in ToozWorkerAdvertiser #34

@jimbobhickville

Description

We've hit an issue running the new Tooz-backed worker setup. When Zookeeper has a connection problem and the heartbeat fails with a connection error, the worker goes offline indefinitely and never recovers. It stops advertising itself as available, so it never gets assigned work, and only restarting the worker process brings it back. The worker should either sit idle until it can reconnect, or exit the process so that monitoring systems notice and restart it.
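
To make the two proposed behaviours concrete, here is a rough sketch of a heartbeat loop that keeps retrying while the connection is down and gives up after a number of consecutive failures so the caller can exit. This is not the actual `ToozWorkerAdvertiser` code. The function name, its parameters, and the use of the built-in `ConnectionError` as a stand-in for whatever the coordination driver raises are all assumptions made for illustration.

```python
import time


def heartbeat_until_stopped(heartbeat, reconnect, should_stop,
                            max_failures=5, interval=1.0,
                            errors=(ConnectionError,), sleep=time.sleep):
    """Hypothetical heartbeat loop that survives connection errors.

    heartbeat, reconnect and should_stop are caller-supplied callables.
    Returns True if should_stop() ended the loop, or False if
    max_failures consecutive attempts failed (the caller can then exit
    with a non-zero status so monitoring restarts the worker).
    """
    failures = 0
    while not should_stop():
        try:
            if failures:
                # The previous attempt failed: rebuild the connection first.
                reconnect()
            heartbeat()
            failures = 0  # A successful heartbeat resets the counter.
        except errors:
            failures += 1
            if failures >= max_failures:
                return False
        # Sit idle until the next attempt, whether or not this one worked.
        sleep(interval)
    return True
```

With something like this, the worker stays idle across short outages, and the process owner can call `sys.exit(1)` when the function returns `False` instead of leaving a silent, unadvertised worker running.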
