Skip to content

Comments

operator: fix errors/warnings metric.#8

Open
MitchLewis930 wants to merge 1 commit intopr_048_beforefrom
pr_048_after
Open

operator: fix errors/warnings metric.#8
MitchLewis930 wants to merge 1 commit intopr_048_beforefrom
pr_048_after

Conversation

@MitchLewis930
Copy link

@MitchLewis930 MitchLewis930 commented Jan 30, 2026

User description

PR_048


PR Type

Bug fix


Description

  • Fix errors/warnings metric initialization in operator

  • Extract metric creation into reusable function

  • Implement flush mechanism for pre-initialization metrics

  • Ensure metrics registered before agent startup


Diagram Walkthrough

flowchart LR
  A["Operator Init"] --> B["InitOperatorMetrics"]
  B --> C["Create ErrorsWarnings Metric"]
  C --> D["Register with Prometheus"]
  D --> E["FlushLoggingMetrics"]
  E --> F["Emit Accumulated Logs"]
Loading

File Walkthrough

Relevant files
Bug fix
metrics.go
Initialize operator metrics and flush logs                             

operator/metrics/metrics.go

  • Call InitOperatorMetrics() during operator metric registration
  • Register ErrorsWarnings metric with Prometheus registry
  • Invoke FlushLoggingMetrics() to emit pre-initialization metrics
+4/-0     
Enhancement
cell.go
Refactor metric flush logic                                                           

pkg/metrics/cell.go

  • Replace inline metric flush logic with FlushLoggingMetrics() call
  • Simplify hive cell initialization by delegating to dedicated function
+1/-4     
logging_hook.go
Add flush function for logging metrics                                     

pkg/metrics/logging_hook.go

  • Add sync.Once variable to ensure flush executes only once
  • Create new FlushLoggingMetrics() function to handle metric flushing
  • Move metric initialization signal logic into dedicated function
+17/-1   
metrics.go
Extract and initialize errors/warnings metric                       

pkg/metrics/metrics.go

  • Extract ErrorsWarnings metric creation into newErrorsWarningsMetric()
    helper function
  • Create InitOperatorMetrics() function to initialize operator-specific
    metrics
  • Replace inline metric creation with reusable function call
+15/-6   

This was broken during transition of pkg/metrics to integrate with Hive where relevant operator metrics where never initialized.
This adds a init func specific for operator and cleans up the "flush" logic used as a work around for errors/warnings emitted prior to agent starting (in the case of the operator).

Addresses: cilium#29525

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Avoid reassigning a global metric variable

Avoid reassigning the global metrics.ErrorsWarnings variable in multiple
initialization paths like InitOperatorMetrics and NewLegacyMetrics. Instead,
instantiate the metric once and provide it to dependent components via
dependency injection to create a more robust and less fragile design.

Examples:

pkg/metrics/metrics.go [1412-1414]
func InitOperatorMetrics() {
	ErrorsWarnings = newErrorsWarningsMetric()
}
pkg/metrics/metrics.go [984]
		ErrorsWarnings: newErrorsWarningsMetric(),

Solution Walkthrough:

Before:

// pkg/metrics/metrics.go
var ErrorsWarnings metric.Vec[metric.Counter] // Global variable

func NewLegacyMetrics() *LegacyMetrics {
  lm := &LegacyMetrics{
    // ...
    ErrorsWarnings: newErrorsWarningsMetric(),
  }
  // ...
  ErrorsWarnings = lm.ErrorsWarnings // Global assignment
  return lm
}

func InitOperatorMetrics() {
	ErrorsWarnings = newErrorsWarningsMetric() // Another global assignment
}

// operator/metrics/metrics.go
func registerMetricsManager(...) {
  metrics.InitOperatorMetrics()
  Registry.MustRegister(metrics.ErrorsWarnings)
  //...
}

After:

// pkg/metrics/metrics.go
// The global `ErrorsWarnings` variable is removed.

func newErrorsWarningsMetric() metric.Vec[metric.Counter] {
  // ... returns a new metric instance
}

// The metric is provided via the dependency injection framework (hive)
// pkg/metrics/cell.go
var Cell = cell.Module("metrics", "Metrics",
  cell.Provide(newErrorsWarningsMetric),
  // ...
)

// Components receive the metric via their constructor/parameters
// instead of accessing a global.
// operator/metrics/metrics.go
func registerMetricsManager(p params, errWarnMetric metric.Vec[metric.Counter]) {
  // ...
  Registry.MustRegister(errWarnMetric)
  // ...
}
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a fragile design using a reassigned global variable (ErrorsWarnings) and proposes a robust dependency injection-based solution that would significantly improve maintainability and prevent future initialization bugs.

High
Possible issue
Preserve closed channel semantics

Do not set the metricsInitialized channel to nil after closing it to ensure
future listeners can correctly observe its closed state.

pkg/metrics/logging_hook.go [26-31]

 flushMetrics.Do(func() {
     if metricsInitialized != nil {
         close(metricsInitialized)
-        metricsInitialized = nil
+        // keep the channel closed, do not set to nil
     }
 })
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that setting the metricsInitialized channel to nil after closing it is a bug, as it would cause any subsequent reads to block indefinitely instead of correctly signaling that initialization is complete.

Medium
  • More

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants