Skip to content

Comments

Introduce MAC collision metric and alert#342

Open
RamLavi wants to merge 3 commits intokubevirt:mainfrom
RamLavi:add_alert_KubemacpoolMACCollisionDetected
Open

Introduce MAC collision metric and alert#342
RamLavi wants to merge 3 commits intokubevirt:mainfrom
RamLavi:add_alert_KubemacpoolMACCollisionDetected

Conversation

@RamLavi
Copy link
Contributor

@RamLavi RamLavi commented Jan 13, 2026

What this PR does / why we need it:
This PR introduces:

  • a new gauge metric for kmp_mac_collisions, deployed by kubemacpool.
  • a new alert for running objects MAC collision.
    It also references the new alert on the deprecated alert.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:
Assisted by Claude

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

Introduce kmp_mac_collisions metric 
Introduce KubemacpoolMACCollisionDetected alert Runbook

@kubevirt-bot kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Jan 13, 2026
@RamLavi
Copy link
Contributor Author

RamLavi commented Jan 13, 2026

/hold

until the alert is merged and bumped to CNAO

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2026
@RamLavi
Copy link
Contributor Author

RamLavi commented Jan 13, 2026

example of how the diagnoses should look like:

[root@zeus22 kubemacpool]# kubectl exec -n $KMP_NAMESPACE deployment/kubemacpool-mac-controller-manager \
+      -c manager -- curl -s http://localhost:8080/metrics | grep kmp_mac_collisions
# HELP kmp_mac_collisions Count of running objects sharing the same MAC address (collision when > 1)
# TYPE kmp_mac_collisions gauge
kmp_mac_collisions{mac="02:00:00:aa:bb:cc"} 2
[root@zeus22 kubemacpool]#
[root@zeus22 kubemacpool]#
[root@zeus22 kubemacpool]# kubectl get vmi -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.interfaces[*].mac}{"\n"}{end}' | grep -i "02:00:00:aa:bb:cc"
test-collision-ns1	test-vm-collision-1	02:00:00:aa:bb:cc
test-collision-ns2	test-vm-collision-2	02:00:00:aa:bb:cc

@RamLavi RamLavi force-pushed the add_alert_KubemacpoolMACCollisionDetected branch from d092889 to 0cf2d0f Compare February 1, 2026 08:34
@kubevirt-bot kubevirt-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L and removed size/M labels Feb 1, 2026
@RamLavi RamLavi changed the title Introduce KubemacpoolMACCollisionDetected alert Runbook Introduce MAC collision metric and alert Feb 1, 2026
@RamLavi RamLavi force-pushed the add_alert_KubemacpoolMACCollisionDetected branch from 0cf2d0f to a1da150 Compare February 15, 2026 16:05
@kubevirt-bot kubevirt-bot added size/M and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L labels Feb 15, 2026
@RamLavi
Copy link
Contributor Author

RamLavi commented Feb 15, 2026

/hold cancel

@sradco all feature content has been merged. We can proceed with this PR.
Could you take a look?

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 15, 2026
@RamLavi
Copy link
Contributor Author

RamLavi commented Feb 15, 2026

Hey @avlitman is the failed sanity unrelated? The runbooks there are not related to the one I am adding.

@avlitman
Copy link
Collaborator

Hey @avlitman is the failed sanity unrelated? The runbooks there are not related to the one I am adding.

fixed in #353 you can rebase

@RamLavi RamLavi force-pushed the add_alert_KubemacpoolMACCollisionDetected branch from a1da150 to 1ae0f84 Compare February 16, 2026 13:30
@RamLavi
Copy link
Contributor Author

RamLavi commented Feb 16, 2026

Change: Rebase

avlitman
avlitman previously approved these changes Feb 16, 2026
docs/metrics.md Outdated
| ssp-operator | `kubevirt_ssp_operator_up` | Recording rule | Gauge | The total number of running ssp-operator pods |
| ssp-operator | `kubevirt_ssp_template_validator_rejected_increase` | Recording rule | Gauge | The increase in the number of rejected template validators, over the last hour |
| ssp-operator | `kubevirt_ssp_template_validator_up` | Recording rule | Gauge | The total number of running virt-template-validator pods |
| kubemacpool | `kmp_mac_collisions` | Gauge | Count of running objects sharing the same MAC address (collision when > 1) |
Copy link
Collaborator

@avlitman avlitman Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be added automatically once it's added to the repo, so add kubemacpool here: https://github.com/kubevirt/monitoring/blob/main/tools/metricsdocs/types.go#L26 and run the workflow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

@avlitman avlitman dismissed their stale review February 16, 2026 14:03

had one comment

@avlitman avlitman self-requested a review February 16, 2026 14:05
@avlitman
Copy link
Collaborator

Hi @sradco this runbook and metric is not under kubevirt org it's here: https://github.com/k8snetworkplumbingwg/kubemacpool/blob/main/pkg/monitoring/rules/alerts/alerts.go#L39

I wonder if it's the right place to add this runbook and metric documentation. WDTY?

@RamLavi RamLavi force-pushed the add_alert_KubemacpoolMACCollisionDetected branch from 1ae0f84 to 8910bc3 Compare February 17, 2026 12:06
@RamLavi
Copy link
Contributor Author

RamLavi commented Feb 17, 2026

Hi @sradco this runbook and metric is not under kubevirt org it's here: https://github.com/k8snetworkplumbingwg/kubemacpool/blob/main/pkg/monitoring/rules/alerts/alerts.go#L39

I wonder if it's the right place to add this runbook and metric documentation. WDTY?

IMO since this repo is deployed by HCO (by extention by CNAO) - we should treat it as one deployed by the product...
@sradco what do you say?

Move the GitHub org from a hardcoded global "kubevirt" string to a
per-project field on projectInfo and project structs. All existing
projects use the new defaultOrg constant ("kubevirt"), so there is
no change in behavior.

This prepares for adding projects hosted outside the kubevirt org.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Add kubemacpool as a tracked project in the metrics documentation
generator. Since kubemacpool is hosted under the k8snetworkplumbingwg
GitHub org (not kubevirt), this uses the per-project org support
introduced in the previous commit.

Regenerate docs/metrics.md to include the new KMP metrics.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
This commit introduces a new alert for running objects MAC collision.
It also references the new alert on the deprecated alert.

Signed-off-by: Ram Lavi <ralavi@redhat.com>
@RamLavi RamLavi force-pushed the add_alert_KubemacpoolMACCollisionDetected branch from 8910bc3 to 109eacd Compare February 18, 2026 10:26
@RamLavi
Copy link
Contributor Author

RamLavi commented Feb 18, 2026

Change: Rebase

@sradco
Copy link
Collaborator

sradco commented Feb 24, 2026

@RamLavi please add to the PR description the link to the PR of the new alert and metric. Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants