Skip to content

Comments

operator: fix errors/warnings metric.#8

Open
MitchLewis930 wants to merge 1 commit intopr_048_beforefrom
pr_048_after
Open

operator: fix errors/warnings metric.#8
MitchLewis930 wants to merge 1 commit intopr_048_beforefrom
pr_048_after

Conversation

@MitchLewis930
Copy link

PR_048

This was broken during transition of pkg/metrics to integrate with Hive where relevant operator metrics where never initialized.
This adds a init func specific for operator and cleans up the "flush" logic used as a work around for errors/warnings emitted prior to agent starting (in the case of the operator).

Addresses: cilium#29525

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
@greptile-apps
Copy link

greptile-apps bot commented Jan 30, 2026

Greptile Overview

Greptile Summary

This PR fixes the errors_warnings metric in the Cilium operator by adding proper initialization that was previously missing. The operator doesn't use the pkg/metrics/Cell module (which the agent uses), so it needs to manually initialize the ErrorsWarnings metric before it can be used by the LoggingHook.

Key changes:

  • Added InitOperatorMetrics() to initialize the ErrorsWarnings metric for the operator
  • Extracted newErrorsWarningsMetric() helper to deduplicate metric creation logic
  • Added FlushLoggingMetrics() function using sync.Once for thread-safe, idempotent flushing
  • Operator now calls InitOperatorMetrics(), registers the metric, then calls FlushLoggingMetrics() in correct order
  • Refactored agent path to use the new FlushLoggingMetrics() function for consistency

The implementation is solid with proper ordering (initialize → register → flush) and thread-safety via sync.Once.

Confidence Score: 5/5

  • This PR is safe to merge with no issues found
  • The changes follow established patterns, have proper thread-safety with sync.Once, maintain correct ordering of operations, and improve code organization by extracting reusable functions. The fix addresses a real issue where operator metrics weren't being initialized properly.
  • No files require special attention

Important Files Changed

Filename Overview
operator/metrics/metrics.go Added operator-specific initialization for ErrorsWarnings metric with proper ordering: init, register, then flush
pkg/metrics/cell.go Refactored to use new FlushLoggingMetrics() function for cleaner code
pkg/metrics/logging_hook.go Added FlushLoggingMetrics() function with sync.Once for thread-safe, idempotent metric flushing
pkg/metrics/metrics.go Added InitOperatorMetrics() and extracted newErrorsWarningsMetric() helper to support operator-specific initialization

Sequence Diagram

sequenceDiagram
    participant Init as Operator Init
    participant Hook as LoggingHook
    participant RegMgr as registerMetricsManager
    participant Metrics as pkg/metrics
    participant Prom as Prometheus Registry

    Note over Init,Hook: Early initialization
    Init->>Hook: NewLoggingHook()
    Hook->>Metrics: Create metricsInitialized channel
    Note over Hook: Starts goroutine waiting<br/>on metricsInitialized

    Note over Hook: Errors/warnings logged<br/>accumulate in atomic counters

    Note over RegMgr,Prom: Hive cell initialization
    RegMgr->>Metrics: InitOperatorMetrics()
    Metrics->>Metrics: ErrorsWarnings = newErrorsWarningsMetric()
    
    RegMgr->>Prom: Registry.MustRegister(ErrorsWarnings)
    
    RegMgr->>Metrics: FlushLoggingMetrics()
    Metrics->>Metrics: flushMetrics.Do(close channel)
    Note over Hook: Goroutine unblocks
    Hook->>Metrics: ErrorsWarnings.WithLabelValues().Add()
    Note over Hook: Accumulated errors/warnings<br/>flushed to metric
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants