Skip to content

Comments

Add node neighbor link updater controller#9

Open
MitchLewis930 wants to merge 1 commit intopr_049_beforefrom
pr_049_after
Open

Add node neighbor link updater controller#9
MitchLewis930 wants to merge 1 commit intopr_049_beforefrom
pr_049_after

Conversation

@MitchLewis930
Copy link

@MitchLewis930 MitchLewis930 commented Jan 30, 2026

PR_049


Note

Medium Risk
Touches core datapath node neighbor discovery/refresh flow and changes execution model (async goroutines → queued controller), which could affect neighbor programming timing and error handling in production.

Overview
Adds a new node-manager–owned queue and StartNodeNeighborLinkUpdater controller that drains queued node neighbor updates on a fixed interval and reports per-node health, then wires it into agent startup.

Refactors neighbor handling so datapath/node code enqueues neighbor insert/refresh requests (including ARP ping periodic refresh) instead of spawning ad-hoc goroutines; introduces datapath.NodeNeighborEnqueuer and passes the NodeManager into the Linux datapath/node handler to support this.

Updates the NodeNeighborRefresh interface to accept a refresh flag, adjusts error propagation/formatting in Linux neighbor insertion, and updates tests/fakes plus adds unit tests for the new queue.

Written by Cursor Bugbot for commit 8d525fe. This will update automatically on new commits. Configure here.

This controller is responsible for monitoring nodes that request a neighbor
link update.
It provides a single place in the application where all node neighbor link
are updates are performed as insertNeighbor are sprinkled in the code base.
Additionally given the central location we can provide additional
module health signals should neighbor link updates failed.

Signed-off-by: Fernand Galiana <fernand.galiana@gmail.com>
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

logfields.LogSubsys: "enqueue",
}).Warn("Skipping nodeNeighbor insert: No node given")
}
m.nodeNeighborQueue.push(&nodeQueueEntry{node: n, refresh: refresh})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing return after nil check causes nil node queuing

High Severity

The Enqueue function logs "Skipping nodeNeighbor insert: No node given" when n == nil, but then continues execution and pushes the nil node to the queue anyway. A return statement is missing after the warning log, so the intended skip behavior doesn't actually happen. This causes nil nodes to accumulate in the queue and subsequently trigger errors when processed by StartNodeNeighborLinkUpdater, which breaks out of the processing loop and leaves remaining valid items unprocessed.

Fix in Cursor Fix in Web

if e == nil || e.node == nil {
errs = errors.Join(errs, fmt.Errorf("invalid node spec found in queue: %#v", e))
break
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty queue incorrectly treated as error condition

Medium Severity

In StartNodeNeighborLinkUpdater, after processing all items in the queue, pop() returns nil to indicate the queue is empty. However, the condition e == nil is grouped with e.node == nil and both trigger an error message "invalid node spec found in queue". An empty queue is a normal termination condition, not an error. This causes spurious error accumulation and incorrect health status reporting after every successful queue drain.

Fix in Cursor Fix in Web

hr.Degraded("Failed node neighbor link update", errs)
} else {
hr.OK("Node neighbor link update successful")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accumulated errors incorrectly mark successful nodes as degraded

Medium Severity

In StartNodeNeighborLinkUpdater, the health status check uses accumulated errs rather than the current node's error. When processing multiple nodes, if an earlier node fails, errs retains that error. Subsequent successful nodes then get marked as "Degraded" because errs != nil is true even though their individual NodeNeighborRefresh call succeeded. The per-node health reporter (hr) receives misleading status based on unrelated prior failures.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants