Add node neighbor link updater controller#9
Add node neighbor link updater controller#9MitchLewis930 wants to merge 1 commit intopr_049_beforefrom
Conversation
This controller is responsible for monitoring nodes that request a neighbor link update. It provides a single place in the application where all node neighbor link are updates are performed as insertNeighbor are sprinkled in the code base. Additionally given the central location we can provide additional module health signals should neighbor link updates failed. Signed-off-by: Fernand Galiana <fernand.galiana@gmail.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces a new controller for managing node neighbor link updates in the Cilium datapath. The main purpose is to centralize and provide health monitoring for node neighbor link operations, replacing the previous goroutine-based approach with a queue-based controller system.
Changes:
- Added a new controller
StartNodeNeighborLinkUpdaterthat processes node neighbor link updates from a managed queue - Modified
NodeNeighborRefreshto accept arefreshparameter for timestamp-based rate limiting - Replaced direct
insertNeighborgoroutine calls with an enqueue mechanism for better monitoring and error handling
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/option/config.go | Added blank line for formatting consistency |
| pkg/node/manager/queue_test.go | Added comprehensive test coverage for the new queue data structure |
| pkg/node/manager/manager.go | Implemented the core queue-based controller logic and Enqueue method |
| pkg/node/manager/cell.go | Added interface definitions for the new enqueuer functionality |
| pkg/datapath/types/node.go | Updated NodeNeighborRefresh signature to include refresh parameter |
| pkg/datapath/types/config.go | Added NodeNeighborEnqueuer interface definition |
| pkg/datapath/linux/node_test.go | Updated test instantiations to include mock enqueuer |
| pkg/datapath/linux/node_linux_test.go | Updated test cases to use new signature and mock enqueuer |
| pkg/datapath/linux/node.go | Replaced direct neighbor insertion with enqueue calls and updated error handling |
| pkg/datapath/linux/fuzz_test.go | Added mock enqueuer implementation for fuzzing tests |
| pkg/datapath/linux/datapath.go | Wired up NodeManager dependency for enqueuer |
| pkg/datapath/fake/types/node.go | Updated fake implementation signature |
| pkg/datapath/cells.go | Added NodeManager dependency injection |
| daemon/cmd/status.go | Updated client implementation signature |
| daemon/cmd/daemon_main.go | Initialized the new neighbor link updater controller |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if n == nil { | ||
| log.WithFields(logrus.Fields{ | ||
| logfields.LogSubsys: "enqueue", | ||
| }).Warn("Skipping nodeNeighbor insert: No node given") |
There was a problem hiding this comment.
The function logs a warning when n is nil but continues to push the nil node to the queue anyway. This will cause errors downstream in StartNodeNeighborLinkUpdater where it checks for nil nodes. Add an early return after the warning log.
| }).Warn("Skipping nodeNeighbor insert: No node given") | |
| }).Warn("Skipping nodeNeighbor insert: No node given") | |
| return |
| if e == nil || e.node == nil { | ||
| errs = errors.Join(errs, fmt.Errorf("invalid node spec found in queue: %#v", e)) | ||
| break | ||
| } |
There was a problem hiding this comment.
The loop breaks immediately when encountering a nil entry or nil node, which prevents processing remaining valid entries in the queue. The loop should use continue instead of break to skip invalid entries and process remaining valid ones, or check isEmpty() to determine when to exit.
| if e == nil || e.node == nil { | |
| errs = errors.Join(errs, fmt.Errorf("invalid node spec found in queue: %#v", e)) | |
| break | |
| } | |
| if e == nil { | |
| break | |
| } | |
| if e.node == nil { | |
| errs = errors.Join(errs, fmt.Errorf("invalid node spec found in queue: %#v", e)) | |
| continue | |
| } |
| n.mutex.Lock() | ||
| isInitialized := n.isInitialized | ||
| n.mutex.Unlock() | ||
| if !isInitialized { | ||
| if !n.isInitialized { | ||
| n.mutex.Unlock() | ||
| // Wait until the node is initialized. When it's not, insertNeighbor() | ||
| // is not invoked, so there is nothing to refresh. | ||
| return nil | ||
| } | ||
|
|
||
| return n.insertNeighbor(ctx, &nodeToRefresh, true) | ||
| n.mutex.Unlock() |
There was a problem hiding this comment.
The mutex lock/unlock pattern here is unnecessarily split. Consider using defer n.mutex.Unlock() immediately after the lock to simplify the control flow and prevent potential future errors if additional code paths are added.
PR_049