Stopping the peer does not interrupt retry loops #6

@sundbry

This plugin can hang when write-batch is stuck in a long retry loop, especially with a long retry timeout. Stopping the instance with (stop) does not interrupt this loop, so it can keep running in the background. This has some negative consequences:

  1. It prevents jobs from checkpointing their output state, and you only notice the error when you try to resume.
  2. It causes issues with leftover background threads when reloading frequently during development.

We could mitigate this by setting a smaller retry timeout, but I prefer a long one (especially in production) so that external services have time to recover from downtime windows.
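To illustrate the behavior I'm after, here is a minimal sketch of a retry loop that can be interrupted by a stop signal while still keeping a long retry timeout. It is written in Python purely for illustration and is not this plugin's code: `write_with_retry`, `RetryInterrupted`, `stop_event`, and the parameter names are all hypothetical, and it assumes the failing write raises `IOError`.

```python
import threading
import time


class RetryInterrupted(Exception):
    """Raised when the retry loop is cancelled before the write succeeds."""


def write_with_retry(write_fn, stop_event, retry_timeout_s=600.0, backoff_s=1.0):
    """Call write_fn until it succeeds, the timeout expires, or stop_event is set.

    All names here are hypothetical; this only sketches the idea.
    """
    deadline = time.monotonic() + retry_timeout_s
    while True:
        # Check for a stop request before every attempt.
        if stop_event.is_set():
            raise RetryInterrupted("stopped during retry")
        try:
            return write_fn()
        except IOError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                # Retry timeout exhausted: surface the original error.
                raise
            # Event.wait acts as an interruptible sleep: it returns True as
            # soon as stop_event is set instead of sleeping the full backoff.
            if stop_event.wait(min(backoff_s, remaining)):
                raise RetryInterrupted("stopped during retry")
```

With something like this, the stop path would only need to call `stop_event.set()`; the loop exits within one backoff interval at most, regardless of how long the overall retry timeout is.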
