Skip to content

Comments

status: use hive jobs for probes#4

Open
MitchLewis930 wants to merge 1 commit intopr_044_beforefrom
pr_044_after
Open

status: use hive jobs for probes#4
MitchLewis930 wants to merge 1 commit intopr_044_beforefrom
pr_044_after

Conversation

@MitchLewis930
Copy link

PR_044

This commit refactores the probe initialization to use a Hive Job instead
of a plain lifecycle start hook. This way we can also get rid of the raw
Go routine to execute the check if every probe successfully executed at
least once before exposing the status.

Note: the kvstore "shutdown check" is still part of its own lifecycle
stop hook. Probably better to eventually move this to the kvstore module.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
@greptile-apps
Copy link

greptile-apps bot commented Jan 30, 2026

Greptile Overview

Greptile Summary

Refactored status probe initialization from lifecycle hooks to hive job pattern. The change migrates probe startup logic from OnStart lifecycle hook + goroutine to a job.OneShot pattern, providing better integration with the hive framework's health reporting system.

Key Changes:

  • Removed Health field from statusParams, added JobGroup field
  • Moved probe initialization logic from startStatusCollector() method into a job.OneShot closure
  • Health reporting now uses the cell.Health parameter provided by the job framework instead of params.Health.NewScope()
  • Added defer collector.statusCollector.Close() in the job to ensure cleanup happens when the job context is cancelled
  • Removed OnStart hook entirely; probe lifecycle now managed by job instead of lifecycle hooks
  • Removed empty line in OnStop hook for cleaner formatting
  • Deleted startStatusCollector() method from status_collector.go as logic moved to cell.go

Behavior Changes:

  • Health reporting: Previously used probeCheckHealth.Degraded() and probeCheckHealth.OK(), now returns error which allows job framework to handle health status
  • Cleanup timing: Close() now called when job context is done (via defer), previously called in OnStop hook
  • The job now blocks on <-ctx.Done() after probes initialize, keeping them running until shutdown

Confidence Score: 4/5

  • This PR is safe to merge with low risk
  • The refactoring follows established patterns in the codebase (job.OneShot is used extensively), maintains equivalent functionality, and improves integration with the hive framework. The change is part of a series of commits modernizing the status collector. One minor concern is the cleanup timing change (Close() now happens earlier via defer in job vs OnStop hook), but this should be acceptable as the job blocks until context cancellation
  • No files require special attention

Important Files Changed

Filename Overview
pkg/status/cell.go Refactored probe initialization from lifecycle OnStart to hive job.OneShot pattern; improved cleanup by deferring Close() call
pkg/status/status_collector.go Removed startStatusCollector method as logic migrated to cell.go job pattern

Sequence Diagram

sequenceDiagram
    participant Hive as Hive Framework
    participant Job as Job Group
    participant SC as StatusCollector
    participant Collector as Collector
    participant Probes as Probes
    participant Health as Health Reporter

    Note over Hive,SC: Initialization Phase
    Hive->>SC: newStatusCollector(params)
    SC->>Collector: newCollector(logger, config)
    
    Note over Job,Health: Job Execution Phase
    Hive->>Job: Start JobGroup
    Job->>Job: Execute OneShot("probes")
    Job->>Health: Provide Health scope
    Job->>SC: Call job function with ctx
    
    SC->>Probes: getProbes()
    Probes-->>SC: Return probe list
    SC->>Collector: StartProbes(probes)
    Note over Collector,Probes: Probes run in goroutines
    
    SC->>SC: Create timeout context
    SC->>Collector: WaitForFirstRun(waitCtx)
    
    alt All probes succeed
        Collector-->>SC: return nil
        SC->>SC: Set allProbesInitialized = true
        SC->>Job: Wait for ctx.Done()
        Job->>SC: Context cancelled
        SC->>Collector: defer Close()
        SC-->>Job: return nil
    else Probes timeout
        Collector-->>SC: return error
        SC->>Collector: defer Close()
        SC-->>Job: return error (health degraded)
    end
    
    Note over Hive,SC: Shutdown Phase
    Hive->>SC: OnStop hook
    SC->>SC: Check KVstore status
    SC-->>Hive: return nil
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants