Skip to content

Conversation

@aliceb-nv
Copy link
Contributor

@aliceb-nv aliceb-nv commented Jan 5, 2026

This PR implements a modular system for checking against solving limits (time, user stop, work limits...) at any point in the code, and replaces previous manual timer_t handling. A user callback is also added to provide a programmatical way to stop the solve at any point. Additional termination checks are also included, to ensure the solver stops promptly if requested.

Furthermore, the user can now gracefully stop the solver at any point using Ctrl-C, so that intermediary solve information is kept and printed out, and the best solution is saved to disk (according to command line parameters). Forceful termination can still be achieved by pressing Ctrl-C 5 times in a row.

closes #186

Summary by CodeRabbit

  • New Features

    • Solvers now support user interrupt (Ctrl-C) for graceful termination.
    • Added termination-check callbacks so users can supply custom stop conditions alongside time limits.
    • LP solvers now support registering solution and termination callbacks, matching MIP callback capabilities.
  • Chores

    • Updated copyright year ranges across the codebase.
  • Tests

    • Updated tests to use the new termination mechanism.

✏️ Tip: You can customize this high-level summary in your review settings.

@aliceb-nv aliceb-nv added this to the 26.02 milestone Jan 5, 2026
@aliceb-nv aliceb-nv added the non-breaking Introduces a non-breaking change label Jan 5, 2026
@aliceb-nv aliceb-nv requested a review from a team as a code owner January 5, 2026 17:47
@aliceb-nv aliceb-nv added the improvement Improves an existing functionality label Jan 5, 2026
@aliceb-nv aliceb-nv requested a review from a team as a code owner January 5, 2026 17:47
@coderabbitai
Copy link

coderabbitai bot commented Jan 5, 2026

📝 Walkthrough

Walkthrough

Introduces centralized termination handling and SIGINT support: a user interrupt singleton, termination_checker_t to unify time/user callbacks, a CHECK_TERMINATION callback type, and wide refactor replacing timer_t usages with termination_checker_t across LP/MIP solvers and Python callback wiring.

Changes

Cohort / File(s) Summary
Termination core & interrupt
cpp/include/cuopt/utilities/user_interrupt_handler.hpp, cpp/src/utilities/termination_checker.hpp
New user_interrupt_handler_t singleton for SIGINT; new termination_checker_t type with callback registration and check()/remaining_time()/elapsed_time() APIs.
Callback types & Python wrappers
cpp/include/cuopt/linear_programming/utilities/internals.hpp, cpp/include/cuopt/linear_programming/utilities/callbacks_implems.hpp, python/cuopt/cuopt/linear_programming/internals/internals.pyx, python/cuopt/cuopt/linear_programming/internals/__init__.py
Added CHECK_TERMINATION callback type and check_termination_callback_t base; provided default_check_termination_callback_t Cython wrapper and exported CheckTerminationCallback.
Solver settings & API surface
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp, cpp/include/cuopt/linear_programming/solver_settings.hpp, cpp/src/math_optimization/solver_settings.cu, python/.../solver_settings/solver_settings.py, python/.../solver/solver.pxd
Added LP callback storage and APIs: set_lp_callback() / get_lp_callbacks(); Python SolverSettings gains lp_callbacks and setters/getters.
Timer → termination propagation (core)
Many files under cpp/src/linear_programming/, cpp/src/mip/, cpp/src/mip/local_search/, cpp/src/mip/presolve/, cpp/src/mip/diversity/, cpp/src/mip/relaxed_lp/, cpp/src/dual_simplex/ (see raw summary for list)
Systematically replaced timer_t / timer.check_time_limit() and related APIs with termination_checker_t / check(); added termination parameters to functions, updated constructors and members to carry termination context.
Dual-simplex concurrent halt handling
cpp/src/dual_simplex/basis_solves.cpp, cpp/src/dual_simplex/basis_updates.cpp, cpp/src/dual_simplex/crossover.cpp, cpp/src/dual_simplex/phase2.cpp, cpp/src/dual_simplex/primal.cpp, cpp/src/dual_simplex/primal.hpp
Introduced -2 return for concurrent halt; propagated CONCURRENT_HALT status and adjusted factorization/repair/error flows to distinguish concurrent halt vs rank errors.
MIP solver context / solver wiring
cpp/src/mip/solver_context.cuh, cpp/src/mip/solver.cu, cpp/src/mip/solver.cuh, cpp/src/mip/solve.cu, cpp/src/mip/solver.cu
Contexts and solver constructors now accept/hold termination_checker_t&; branch-and-bound settings wired to context termination; run_mip/run_solver signatures updated.
PDLP / LP solver integration
cpp/src/linear_programming/pdlp.cu, cpp/src/linear_programming/pdlp.cuh, cpp/src/linear_programming/solve.cu, cpp/src/linear_programming/solve.cuh
PDLP/LP APIs updated to accept termination_checker_t; termination propagated through barrier, dual simplex and PDLP call paths; LP callbacks wired into termination callback where applicable.
Local search, diversity, feasibility, rounding, presolve, probing, relaxed LP
Multiple files under cpp/src/mip/... (local_search, diversity, presolve, relaxed_lp, feasibility_jump, recombiners, population, RINS, bounds_presolve, probing_cache, etc.)
Numerous function signatures and members changed from timer_t to termination_checker_t; internal timers constructed with parent termination; checks now use .check().
Utilities & timer
cpp/src/utilities/timer.hpp
Removed print_debug(std::string) and dropped <string> include.
Tests & benchmarks
benchmarks/linear_programming/cuopt/run_mip.cpp, cpp/tests/...
Tests and benchmark run files updated to construct and pass termination_checker_t (root tag) and to pass termination where relaxed LP APIs changed.
MIP test helper
cpp/tests/mip/mip_utils.cuh
test_mps_file now accepts optional vector of callbacks and applies them to settings.
Problem objective mapping
cpp/src/mip/problem/problem.cu, cpp/src/mip/problem/problem.cuh
Added get_solver_obj_from_user_obj(f_t) to problem_t for inverse objective mapping.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Uniformize preemptions/termination checks and implement graceful SIGINT handler' clearly summarizes the main changes: introducing unified termination handling and a SIGINT handler.
Linked Issues check ✅ Passed The PR implementation meets issue #186 requirements: it adds graceful SIGINT handling, integrates termination checks throughout solvers, returns best partial solutions on interruption, and implements force-quit protection via repeated Ctrl-C.
Out of Scope Changes check ✅ Passed All changes directly support the stated objective of implementing graceful SIGINT handling and uniformizing termination checks across solvers. No unrelated refactoring or feature additions detected outside the termination handling scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Fix all issues with AI Agents 🤖
In @cpp/include/cuopt/linear_programming/solver_settings.hpp:
- Around line 85-86: Add Doxygen-style comments above the public methods
set_lp_callback and get_lp_callbacks describing their purpose and usage,
documenting the parameter for set_lp_callback (that it accepts an
internals::base_solution_callback_t* and that nullptr clears/unsets the
callback), documenting the return semantics for get_lp_callbacks (returns a
vector of pointers, ownership/copy/const guarantees), and including
thread-safety notes (whether these methods are safe to call concurrently or
require external synchronization); ensure comments follow the project's Doxygen
format, include @param, @return and @thread-safety or equivalent tags, and place
them directly above the declarations of set_lp_callback and get_lp_callbacks in
the header.

In @cpp/include/cuopt/linear_programming/utilities/callbacks_implems.hpp:
- Around line 94-100: The function check_termination calls PyObject_CallMethod
on this->pyCallbackClass and immediately uses PyObject_IsTrue and Py_DECREF
without checking for NULL; modify check_termination to test whether
PyObject_CallMethod returned NULL, handle the Python exception (e.g., call
PyErr_Print() or propagate/translate the error) and return a safe default
(false) when NULL is returned, otherwise proceed to call PyObject_IsTrue on res
and Py_DECREF(res); ensure references to pyCallbackClass, PyObject_CallMethod,
PyObject_IsTrue, and Py_DECREF are preserved and only used after the NULL check.

In @cpp/include/cuopt/utilities/user_interrupt_handler.hpp:
- Around line 81-115: The signal handler handle_signal must not take mutex_ or
call callbacks_; change it to only perform async-signal-safe operations: set the
existing std::atomic<bool> terminate_signal_received_ and atomically increment a
new std::atomic<int> interrupt_count_ (or write a single byte to a self-pipe /
pipe_fds to notify the main thread) and return; remove usage of
interrupt_times_, callbacks_, std::lock_guard<std::mutex> and any chrono
operations from handle_signal. Add a new public method like
process_pending_signals() (called from solver/main loop) that acquires mutex_,
updates interrupt_times_, checks force_quit_threshold/window, logs and calls
callbacks_ and calls std::_Exit if needed; or have the main thread read the
self-pipe and perform the same locked processing there. Update any declarations
(add interrupt_count_ or pipe fds and the new processing method) and ensure all
non-async-signal-safe work (locking, timestamping, invoking callbacks_) happens
only in process_pending_signals() on the main thread.

In @cpp/src/dual_simplex/branch_and_bound.cpp:
- Around line 1157-1160: The early return in diving_thread (the block checking
toc(exploration_stats_.start_time) > settings_.time_limit ||
(settings_.termination != nullptr && settings_.termination->check())) fails to
set solver_status_; set solver_status_ = mip_exploration_status_t::TIME_LIMIT
immediately before the return so the timeout is propagated to set_final_solution
and the caller receives the correct status. Locate the check in
branch_and_bound::diving_thread and assign solver_status_ to
mip_exploration_status_t::TIME_LIMIT right before returning (keeping the
existing condition logic intact).

In @cpp/src/mip/feasibility_jump/feasibility_jump.cu:
- Around line 871-872: The if-condition is redundantly calling
context.termination.check() because timer was constructed with
context.termination as its parent and timer.check() already invokes the parent's
check; remove the explicit context.termination.check() from the conditional (the
if that currently reads something like if (timer.check() || steps >=
settings.iteration_limit || context.preempt_heuristic_solver_.load() ||
context.termination.check()) ) so it becomes if (timer.check() || steps >=
settings.iteration_limit || context.preempt_heuristic_solver_.load() ); keep all
other checks and behavior unchanged.
- Line 865: The inner termination_checker_t created at the start of host_loop()
reuses the full settings.time_limit and therefore allows total runtime to exceed
the intended limit; change host_loop() to accept and reuse the outer
termination_checker_t (or accept a reference/pointer to it) instead of
constructing a new termination_checker_t with settings.time_limit, or
alternatively compute elapsed = outer_timer.elapsed() before calling host_loop()
and reduce settings.time_limit by that elapsed amount so the inner timer
enforces the remaining time; update host_loop() signature and all call sites to
accept the outer timer (or remaining_time) and replace the inner
construction/usage of termination_checker_t and check() accordingly so only one
shared timer governs enforcement and effort_rate/logging remain correct.

In @cpp/src/mip/local_search/local_search.cu:
- Around line 273-283: The code sets fj_settings.time_limit to fixed values
(0.5/0.25) and constructs termination_checker_t with those fixed values, which
can exceed the parent in_timer budget; change the construction of timer when
solution.get_feasible() is false (and ls_config.at_least_one_parent_feasible
true/false) to respect remaining time by using the smaller of
in_timer.remaining_time() and the desired limit—i.e., keep
fj_settings.time_limit as 0.5 or 0.25 but build timer with
termination_checker_t(std::min(in_timer.remaining_time(),
fj_settings.time_limit), context.termination) so the new timer never exceeds the
parent timer.

In @cpp/src/mip/local_search/rounding/lb_constraint_prop.cu:
- Line 710: The assignment to max_timer constructs termination_checker_t with
only max_time_for_bounds_prop but termination_checker_t has no single-argument
ctor; update the call in lb_constraint_prop (where max_timer is set) to pass the
second required argument—use termination_checker_t::root_tag_t{} as the second
parameter (i.e., construct termination_checker_t with max_time_for_bounds_prop
and termination_checker_t::root_tag_t{}).

In @cpp/src/mip/solve.cu:
- Around line 186-200: The termination callback currently takes a void* and
casts it back to a non-const mip_solver_settings_t, which obscures intent;
change the argument passed to termination.set_termination_callback to a const
void* (e.g. (const void*)&settings) and inside the lambda static_cast that
parameter to const mip_solver_settings_t<i_t, f_t>* so the lambda only uses
read-only access (keep using termination.set_termination_callback and the
existing lambda body but replace the casts to reflect const-correctness).
🧹 Nitpick comments (16)
cpp/src/mip/diversity/recombiners/sub_mip.cuh (1)

107-107: Consider consolidating time limit settings.

Both line 107 and line 102 specify sub_mip_recombiner_config_t::sub_mip_time_limit. If the termination checker at line 104 now handles all time limit enforcement, the explicit time_limit assignment here might be redundant. However, without visibility into branch_and_bound_t's implementation, it's unclear whether both settings are still needed.

🔎 Verification script to check time_limit usage in branch_and_bound
#!/bin/bash
# Description: Check how branch_and_bound uses time_limit vs termination settings

# Search for time_limit field usage in branch_and_bound implementation
rg -nP --type=cpp --type=cu -C5 '\btime_limit\b' -g '*branch_and_bound*'

# Search for termination field usage in branch_and_bound implementation  
rg -nP --type=cpp --type=cu -C5 '\btermination\b' -g '*branch_and_bound*'
cpp/src/mip/feasibility_jump/feasibility_jump.cu (1)

1050-1050: Consider more accurate effort rate calculation.

The timer created here measures total elapsed time including setup work (lines 1051-1093), but line 1098 calculates effort_rate as iterations / timer.elapsed_time(). Since iterations only counts work done inside host_loop(), the effort rate includes setup overhead and doesn't reflect the true iteration rate.

Consider using the timer from host_loop() (line 865) or returning its elapsed time for a more accurate iteration throughput metric.

python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx (1)

211-218: LP callbacks correctly wired; consider de-duplicating callback registration

The new lp_callbacks = settings.get_lp_callbacks() loop mirrors the existing MIP callback wiring and the cast to base_solution_callback_t* is consistent, so LP solution callbacks should now flow through correctly.

You now have nearly identical MIP/LP callback registration blocks; factoring this into a small helper (e.g., _register_callbacks(c_solver_settings, callbacks, set_fn)) would reduce drift risk if callback handling evolves.

cpp/src/dual_simplex/barrier.cu (1)

3691-3694: Centralized termination checks are correct; consider more generic logging/helper

Switching from ad‑hoc time checks to settings.check_termination(start_time) before initial_point and at key points in the main loop is a solid move: it now respects the unified termination mechanism (time limit, callbacks, SIGINT via the parent checker) and still returns lp_status_t::TIME_LIMIT or CONCURRENT_LIMIT as before.

Two small follow‑ups to consider:

  • All these sites log "Barrier time limit exceeded" and return TIME_LIMIT even when the stop is driven by a user callback or SIGINT rather than the numeric time limit. If callbacks/SIGINT are intentionally folded into TIME_LIMIT, a more neutral message like "Barrier terminated by limit/callback" would avoid misleading logs.
  • The check_termination + log + TIME_LIMIT pattern is repeated several times; a tiny helper (e.g., if (auto s = settings.handle_termination_if_requested(start_time)) return s;) would de‑duplicate this and keep future changes localized.

Also applies to: 3697-3700, 3796-3799, 3832-3835, 3873-3876

cpp/tests/mip/bounds_standardization_test.cu (1)

69-80: Relaxed-LP test termination checker wiring is correct; add <limits> include

Constructing a root termination_checker_t with std::numeric_limits<double>::infinity() and threading it into detail::get_relaxed_lp_solution matches the new API and preserves the intended behavior: a global “no hard wall-clock cap” while lp_settings.time_limit still enforces the 120s limit inside the relaxed LP solve.

Since this file now uses std::numeric_limits, it would be cleaner to explicitly #include <limits> rather than relying on transitive includes from other headers.

If more tests need a root termination checker, consider a small utility (e.g., make_infinite_termination_checker()) in a shared test helper to avoid repeating the construction pattern.

cpp/src/dual_simplex/crossover.cpp (4)

532-535: Log message is misleading after termination logic unification.

Line 533 says "Crossover time exceeded" but check_termination now checks multiple termination conditions (time limit, user callbacks, SIGINT) per the PR objectives, not just time. Consider updating to "Crossover terminated" or "Crossover stopped" to reflect the broader termination logic.

Suggested message update
     if (settings.check_termination(start_time)) {
-      settings.log.printf("Crossover time exceeded\n");
+      settings.log.printf("Crossover terminated\n");
       return -1;
     }

532-539: Redundant concurrent_halt check after check_termination.

Lines 536-539 manually check concurrent_halt after check_termination returns. Per the snippet from simplex_solver_settings.hpp (lines 89-95), check_termination already sets *concurrent_halt = 1 when termination is requested, making this subsequent check redundant.

Remove redundant check
     if (settings.check_termination(start_time)) {
       settings.log.printf("Crossover terminated\n");
       return -1;
     }
-    if (settings.concurrent_halt != nullptr && *settings.concurrent_halt == 1) {
-      settings.log.printf("Concurrent halt\n");
-      return -2;
-    }

Note: This pattern repeats at lines 823-830, 1146-1153, 1233-1240, and 1361-1368. Consider applying the same refactor to all occurrences.


823-826: Update log message to reflect unified termination logic.

Same issue as line 533: the message says "time limit exceeded" but check_termination now checks multiple termination conditions.

Suggested update
     if (settings.check_termination(start_time)) {
-      settings.log.printf("Crossover time limit exceeded\n");
+      settings.log.printf("Crossover terminated\n");
       return -1;
     }

1146-1149: Update log message to reflect unified termination logic.

Same issue: "Time limit exceeded" should reflect that termination can occur due to multiple conditions.

Suggested update
     if (settings.check_termination(start_time)) {
-      settings.log.printf("Time limit exceeded\n");
+      settings.log.printf("Crossover terminated\n");
       return crossover_status_t::TIME_LIMIT;
     }
cpp/include/cuopt/linear_programming/utilities/callbacks_implems.hpp (1)

102-102: Consider initializing pyCallbackClass to avoid undefined behavior.

The pyCallbackClass member is uninitialized. If check_termination() is called before the callback is properly set up, this will cause undefined behavior.

🔎 Suggested initialization
-  PyObject* pyCallbackClass;
+  PyObject* pyCallbackClass = nullptr;
cpp/include/cuopt/linear_programming/utilities/internals.hpp (1)

59-66: LGTM! Clean callback interface following existing patterns.

The new check_termination_callback_t class follows the established pattern of get_solution_callback_t and set_solution_callback_t, providing a consistent API for termination checking.

Consider adding Doxygen documentation for this public API class to describe the expected behavior and return value semantics of check_termination(). Based on coding guidelines for public headers.

cpp/include/cuopt/utilities/user_interrupt_handler.hpp (1)

117-122: Consider making next_id_ atomic for thread-safety.

While register_callback and unregister_callback are protected by mutex_, if you address the signal handler issue above, you may want next_id_ to be std::atomic<size_t> for consistency with thread-safe patterns.

cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (2)

174-182: Add Doxygen documentation for new public API methods.

As per coding guidelines for public headers (cpp/include/cuopt/**/*), new public functions should have documentation comments. These methods are part of the public C++ API and should document:

  • Purpose and usage
  • Lifetime requirements for callback pointers
  • Thread-safety considerations
🔎 Suggested documentation
+  /**
+   * @brief Get the list of registered LP solution callbacks.
+   * @return Vector of pointers to registered callbacks.
+   * @note Callbacks must remain valid for the lifetime of the solver.
+   */
   const std::vector<internals::base_solution_callback_t*> get_lp_callbacks() const
   {
     return lp_callbacks_;
   }

+  /**
+   * @brief Register an LP solution callback.
+   * @param callback Pointer to callback object. Caller retains ownership
+   *                 and must ensure the callback remains valid during solving.
+   */
   void set_lp_callback(internals::base_solution_callback_t* callback)
   {
     lp_callbacks_.push_back(callback);
   }

229-230: Consider making lp_callbacks_ private for encapsulation consistency.

The member lp_callbacks_ is declared public but has getter/setter methods (get_lp_callbacks, set_lp_callback). This breaks encapsulation - callers could bypass the accessors and modify the vector directly.

🔎 Move to private section
- std::vector<internals::base_solution_callback_t*> lp_callbacks_;
-
  private:
   /** Initial primal solution */
   std::shared_ptr<rmm::device_uvector<f_t>> initial_primal_solution_;
   /** Initial dual solution */
   std::shared_ptr<rmm::device_uvector<f_t>> initial_dual_solution_;
+  /** LP solution callbacks */
+  std::vector<internals::base_solution_callback_t*> lp_callbacks_;
cpp/src/utilities/termination_checker.hpp (1)

29-74: Consider making termination_checker_t non-copyable to prevent dangling parent pointers.

The class stores a raw pointer to a parent termination_checker_t. If an instance is copied, both the original and copy will reference the same parent. If the parent is destroyed before the copy, the copy will have a dangling pointer. Since this class represents a scoped termination context tied to a specific solver invocation, non-copyable semantics would be safer.

🔎 Suggested fix
 class termination_checker_t {
  public:
+  // Non-copyable to prevent dangling parent pointers
+  termination_checker_t(const termination_checker_t&) = delete;
+  termination_checker_t& operator=(const termination_checker_t&) = delete;
+  termination_checker_t(termination_checker_t&&) = default;
+  termination_checker_t& operator=(termination_checker_t&&) = default;
+
   // Separate tag to force any declaration of a root termination checker to be explicit
   struct root_tag_t {};
cpp/src/mip/local_search/rounding/constraint_prop.cu (1)

42-43: Clarify the initialization of max_timer with zero time limit.

The constructor initializes max_timer with time_limit=0, which would cause check() to return true immediately. This is overwritten in apply_round() on line 1067 before actual use, so it's functionally correct, but the pattern is confusing.

Consider using a sentinel or making the initial state more explicit.

Comment on lines +85 to +86
void set_lp_callback(internals::base_solution_callback_t* callback = nullptr);
const std::vector<internals::base_solution_callback_t*> get_lp_callbacks() const;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add Doxygen documentation for new public API methods.

The new set_lp_callback and get_lp_callbacks methods lack documentation comments. Public APIs in headers under cpp/include/cuopt/ should include Doxygen-style documentation covering:

  • Purpose and usage
  • Parameter descriptions (e.g., callback purpose, nullptr behavior)
  • Return value semantics
  • Thread-safety considerations

As per coding guidelines for public header files.

📋 Suggested documentation template
+  /**
+   * @brief Set LP callback for monitoring solve progress and enabling user termination
+   * @param callback Pointer to callback implementation, or nullptr to clear callbacks
+   */
   void set_lp_callback(internals::base_solution_callback_t* callback = nullptr);
+  /**
+   * @brief Get all registered LP callbacks
+   * @return Vector of callback pointers currently registered
+   */
   const std::vector<internals::base_solution_callback_t*> get_lp_callbacks() const;
🤖 Prompt for AI Agents
In @cpp/include/cuopt/linear_programming/solver_settings.hpp around lines 85-86,
Add Doxygen-style comments above the public methods set_lp_callback and
get_lp_callbacks describing their purpose and usage, documenting the parameter
for set_lp_callback (that it accepts an internals::base_solution_callback_t* and
that nullptr clears/unsets the callback), documenting the return semantics for
get_lp_callbacks (returns a vector of pointers, ownership/copy/const
guarantees), and including thread-safety notes (whether these methods are safe
to call concurrently or require external synchronization); ensure comments
follow the project's Doxygen format, include @param, @return and @thread-safety
or equivalent tags, and place them directly above the declarations of
set_lp_callback and get_lp_callbacks in the header.

Comment on lines +94 to +100
bool check_termination() override
{
PyObject* res = PyObject_CallMethod(this->pyCallbackClass, "check_termination", nullptr);
bool should_terminate = PyObject_IsTrue(res);
Py_DECREF(res);
return should_terminate;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Missing NULL check on Python API return value will cause crash on exception.

PyObject_CallMethod returns NULL when the Python callback raises an exception. Calling PyObject_IsTrue(NULL) is undefined behavior and Py_DECREF(NULL) will segfault.

🔎 Proposed fix
   bool check_termination() override
   {
     PyObject* res = PyObject_CallMethod(this->pyCallbackClass, "check_termination", nullptr);
+    if (res == nullptr) {
+      // Python exception occurred - treat as termination request
+      PyErr_Print();
+      return true;
+    }
     bool should_terminate = PyObject_IsTrue(res);
     Py_DECREF(res);
     return should_terminate;
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
bool check_termination() override
{
PyObject* res = PyObject_CallMethod(this->pyCallbackClass, "check_termination", nullptr);
bool should_terminate = PyObject_IsTrue(res);
Py_DECREF(res);
return should_terminate;
}
bool check_termination() override
{
PyObject* res = PyObject_CallMethod(this->pyCallbackClass, "check_termination", nullptr);
if (res == nullptr) {
// Python exception occurred - treat as termination request
PyErr_Print();
return true;
}
bool should_terminate = PyObject_IsTrue(res);
Py_DECREF(res);
return should_terminate;
}
🤖 Prompt for AI Agents
In @cpp/include/cuopt/linear_programming/utilities/callbacks_implems.hpp around
lines 94-100, The function check_termination calls PyObject_CallMethod on
this->pyCallbackClass and immediately uses PyObject_IsTrue and Py_DECREF without
checking for NULL; modify check_termination to test whether PyObject_CallMethod
returned NULL, handle the Python exception (e.g., call PyErr_Print() or
propagate/translate the error) and return a safe default (false) when NULL is
returned, otherwise proceed to call PyObject_IsTrue on res and Py_DECREF(res);
ensure references to pyCallbackClass, PyObject_CallMethod, PyObject_IsTrue, and
Py_DECREF are preserved and only used after the NULL check.

Comment on lines +81 to +115
static void handle_signal(int /*sig*/)
{
auto& self = instance();
std::lock_guard<std::mutex> lock(self.mutex_);

self.terminate_signal_received_ = true;

auto now = std::chrono::steady_clock::now();
auto cutoff = now - std::chrono::seconds(force_quit_window_seconds);

// Remove timestamps older than the window
while (!self.interrupt_times_.empty() && self.interrupt_times_.front() < cutoff) {
self.interrupt_times_.pop_front();
}
self.interrupt_times_.push_back(now);

// Force quit if too many interrupts in the window
if (static_cast<int>(self.interrupt_times_.size()) >= force_quit_threshold) {
fprintf(stderr,
"Force quit: %d interrupts in %d seconds.",
force_quit_threshold,
force_quit_window_seconds);
std::_Exit(128 + SIGINT);
}

// Invoke all registered callbacks
for (const auto& [id, callback] : self.callbacks_) {
callback();
}

auto remaining = force_quit_threshold - static_cast<int>(self.interrupt_times_.size());
fprintf(stderr,
"Interrupt received. Stopping solver... (press Ctrl-C %d more time(s) to force quit)",
remaining);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Calling mutex and invoking callbacks in a signal handler is undefined behavior.

Signal handlers have strict requirements on what functions can be called safely. Per POSIX, only async-signal-safe functions are permitted in signal handlers. std::mutex::lock(), std::mutex::unlock(), and arbitrary user callbacks are not async-signal-safe.

This can cause deadlocks if SIGINT arrives while mutex_ is already held (e.g., during register_callback), or undefined behavior in other scenarios.

Consider one of these safer approaches:

  1. Use std::atomic_flag or sig_atomic_t to set a flag in the handler, then check it from normal code paths
  2. Use a self-pipe trick or signalfd (Linux) to handle signals in a dedicated thread
  3. At minimum, use std::atomic<bool> for terminate_signal_received_ (already done) and check it from solver loops rather than invoking callbacks from the handler
🔎 Suggested safer pattern
  static void handle_signal(int /*sig*/)
  {
    auto& self = instance();
-   std::lock_guard<std::mutex> lock(self.mutex_);
-
-   self.terminate_signal_received_ = true;
-
-   auto now    = std::chrono::steady_clock::now();
-   auto cutoff = now - std::chrono::seconds(force_quit_window_seconds);
-
-   // Remove timestamps older than the window
-   while (!self.interrupt_times_.empty() && self.interrupt_times_.front() < cutoff) {
-     self.interrupt_times_.pop_front();
-   }
-   self.interrupt_times_.push_back(now);
-
-   // Force quit if too many interrupts in the window
-   if (static_cast<int>(self.interrupt_times_.size()) >= force_quit_threshold) {
-     fprintf(stderr,
-             "Force quit: %d interrupts in %d seconds.",
-             force_quit_threshold,
-             force_quit_window_seconds);
-     std::_Exit(128 + SIGINT);
-   }
-
-   // Invoke all registered callbacks
-   for (const auto& [id, callback] : self.callbacks_) {
-     callback();
-   }
-
-   auto remaining = force_quit_threshold - static_cast<int>(self.interrupt_times_.size());
-   fprintf(stderr,
-           "Interrupt received. Stopping solver... (press Ctrl-C %d more time(s) to force quit)",
-           remaining);
+   // Only use async-signal-safe operations here
+   self.terminate_signal_received_.store(true, std::memory_order_relaxed);
+   
+   // Increment interrupt count atomically for force-quit detection
+   int count = self.interrupt_count_.fetch_add(1, std::memory_order_relaxed) + 1;
+   
+   if (count >= force_quit_threshold) {
+     // _Exit is async-signal-safe
+     std::_Exit(128 + SIGINT);
+   }
+   
+   // Use write() instead of fprintf - it's async-signal-safe
+   const char msg[] = "Interrupt received. Stopping solver...\n";
+   write(STDERR_FILENO, msg, sizeof(msg) - 1);
  }
+ 
+ // Add atomic counter for simpler force-quit detection
+ std::atomic<int> interrupt_count_{0};

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @cpp/include/cuopt/utilities/user_interrupt_handler.hpp around lines 81-115,
The signal handler handle_signal must not take mutex_ or call callbacks_; change
it to only perform async-signal-safe operations: set the existing
std::atomic<bool> terminate_signal_received_ and atomically increment a new
std::atomic<int> interrupt_count_ (or write a single byte to a self-pipe /
pipe_fds to notify the main thread) and return; remove usage of
interrupt_times_, callbacks_, std::lock_guard<std::mutex> and any chrono
operations from handle_signal. Add a new public method like
process_pending_signals() (called from solver/main loop) that acquires mutex_,
updates interrupt_times_, checks force_quit_threshold/window, logs and calls
callbacks_ and calls std::_Exit if needed; or have the main thread read the
self-pipe and perform the same locked processing there. Update any declarations
(add interrupt_count_ or pipe fds and the new processing method) and ensure all
non-async-signal-safe work (locking, timestamping, invoking callbacks_) happens
only in process_pending_signals() on the main thread.

data.incumbent_quality.set_value_async(obj, handle_ptr->get_stream());

timer_t timer(settings.time_limit);
termination_checker_t timer(settings.time_limit, context.termination);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's look at the file structure and line ranges mentioned
echo "=== Examining feasibility_jump.cu for timer patterns ==="
cd cpp/src/mip/feasibility_jump

# Check file exists and get context around the mentioned lines
echo "=== Code around line 1050 (solve function, first timer) ==="
sed -n '1045,1100p' feasibility_jump.cu

echo ""
echo "=== Code around line 865 (host_loop call and timer) ==="
sed -n '860,870p' feasibility_jump.cu

Repository: NVIDIA/cuopt

Length of output: 3149


🏁 Script executed:

#!/bin/bash
# Search for termination_checker_t instantiations to find all timers
echo "=== All termination_checker_t instantiations ==="
rg -n "termination_checker_t\s+" cpp/src/mip/feasibility_jump/feasibility_jump.cu -A 2

echo ""
echo "=== Check host_loop function signature ==="
rg -n "void host_loop\|termination_checker_t.*host_loop" cpp/src/mip/feasibility_jump/feasibility_jump.cu -B 2 -A 5

Repository: NVIDIA/cuopt

Length of output: 469


🏁 Script executed:

#!/bin/bash
# Search for time_limit usage in settings and how it's modified
echo "=== time_limit adjustments and checks ==="
rg -n "time_limit" cpp/src/mip/feasibility_jump/feasibility_jump.cu -B 2 -A 2 | head -100

Repository: NVIDIA/cuopt

Length of output: 1309


🏁 Script executed:

#!/bin/bash
# Check how the termination_checker_t timer is used in host_loop
echo "=== Timer usage in host_loop (around line 865-870 and beyond) ==="
sed -n '865,900p' cpp/src/mip/feasibility_jump/feasibility_jump.cu

echo ""
echo "=== Check where timer limit is actually enforced in host_loop ==="
rg -n "timer\." cpp/src/mip/feasibility_jump/feasibility_jump.cu | grep -E "host_loop|865|870|880|890"

echo ""
echo "=== Look for termination checks using the timer ==="
sed -n '860,920p' cpp/src/mip/feasibility_jump/feasibility_jump.cu

Repository: NVIDIA/cuopt

Length of output: 4273


🏁 Script executed:

#!/bin/bash
# Understand the termination_checker_t class to see how it enforces limits
echo "=== Find termination_checker_t definition ==="
fd -t f "\.h$|\.hpp$|\.cuh$" cpp | xargs rg -l "class termination_checker_t|struct termination_checker_t"

echo ""
echo "=== Search for termination_checker_t constructor and check methods ==="
rg -A 10 "class termination_checker_t|struct termination_checker_t" --type=hpp --type=h --type=cuh | head -50

Repository: NVIDIA/cuopt

Length of output: 299


🏁 Script executed:

#!/bin/bash
# Check if the outer timer from line 1050 is used to enforce limits in solve()
echo "=== Check the outer timer from solve() and what happens with it ==="
sed -n '1095,1115p' cpp/src/mip/feasibility_jump/feasibility_jump.cu

Repository: NVIDIA/cuopt

Length of output: 993


🏁 Script executed:

#!/bin/bash
# Find and examine termination_checker_t class definition
echo "=== termination_checker_t definition ==="
cat -n cpp/src/utilities/termination_checker.hpp | head -60

echo ""
echo "=== Check the check() method ==="
sed -n '1,100p' cpp/src/utilities/termination_checker.hpp | grep -A 20 "check()"

Repository: NVIDIA/cuopt

Length of output: 3278


🏁 Script executed:

#!/bin/bash
# Examine the full flow to understand if outer timer is ever used for enforcement
echo "=== Full solve() function signature and flow ==="
sed -n '1045,1110p' cpp/src/mip/feasibility_jump/feasibility_jump.cu

Repository: NVIDIA/cuopt

Length of output: 3081


🏁 Script executed:

#!/bin/bash
# Check if there's any mechanism that passes the outer timer to host_loop
echo "=== host_loop function signature ==="
rg -n "i_t host_loop" cpp/src/mip/feasibility_jump/feasibility_jump.cu -A 5

echo ""
echo "=== Check the complete host_loop definition to understand parameter passing ==="
sed -n '840,850p' cpp/src/mip/feasibility_jump/feasibility_jump.cu

Repository: NVIDIA/cuopt

Length of output: 625


🏁 Script executed:

#!/bin/bash
# Let me get the host_loop signature to confirm it doesn't receive the outer timer
echo "=== Complete host_loop signature ==="
sed -n '844,865p' cpp/src/mip/feasibility_jump/feasibility_jump.cu

Repository: NVIDIA/cuopt

Length of output: 914


🏁 Script executed:

#!/bin/bash
# Check if there are any other places where host_loop is called or defined
echo "=== All host_loop calls/definitions ==="
rg -n "host_loop" cpp/src/mip/feasibility_jump/feasibility_jump.cu

Repository: NVIDIA/cuopt

Length of output: 263


🏁 Script executed:

#!/bin/bash
# Verify if the outer timer is ever used to enforce termination
echo "=== Check all uses of outer timer from line 1050 ==="
sed -n '1050,1120p' cpp/src/mip/feasibility_jump/feasibility_jump.cu | grep -n "timer\."

Repository: NVIDIA/cuopt

Length of output: 237


🏁 Script executed:

#!/bin/bash
# Check the initialization times between line 1050 and host_loop call
echo "=== Code between outer timer creation and host_loop call ==="
sed -n '1050,1095p' cpp/src/mip/feasibility_jump/feasibility_jump.cu

Repository: NVIDIA/cuopt

Length of output: 2255


Time limit enforcement creates overlapping measurement and enforcement windows.

The outer timer at line 1050 and inner timer at line 865 are independent timers, each with a full settings.time_limit. The outer timer is used only for elapsed time measurement (effort_rate calculation, logging), while the inner timer at line 865 enforces termination via check(). However, between outer timer creation (line 1050) and inner timer creation (line 865), setup work occurs (test_variable_bounds, check_problem_representation, resize_vectors, device_init, climber_init). Since the inner timer starts fresh with a full time_limit after this setup, the total solver time can exceed the intended limit.

In ROUNDING mode, this is partially mitigated by reducing settings.time_limit at lines 1067-1070 before the host_loop() call. In non-ROUNDING mode, no such adjustment occurs.

Consider adjusting settings.time_limit to account for elapsed setup time before calling host_loop(), or pass the outer timer instance to host_loop() to reuse the same termination context.

🤖 Prompt for AI Agents
In @cpp/src/mip/feasibility_jump/feasibility_jump.cu around line 865, The inner
termination_checker_t created at the start of host_loop() reuses the full
settings.time_limit and therefore allows total runtime to exceed the intended
limit; change host_loop() to accept and reuse the outer termination_checker_t
(or accept a reference/pointer to it) instead of constructing a new
termination_checker_t with settings.time_limit, or alternatively compute elapsed
= outer_timer.elapsed() before calling host_loop() and reduce
settings.time_limit by that elapsed amount so the inner timer enforces the
remaining time; update host_loop() signature and all call sites to accept the
outer timer (or remaining_time) and replace the inner construction/usage of
termination_checker_t and check() accordingly so only one shared timer governs
enforcement and effort_rate/logging remain correct.

Comment on lines +871 to +872
if (timer.check() || steps >= settings.iteration_limit ||
context.preempt_heuristic_solver_.load() || context.termination.check()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for termination_checker_t class definition
rg -A 20 "class termination_checker_t" --type cpp --type h --type hpp --type cu --type cuh

Repository: NVIDIA/cuopt

Length of output: 83


🏁 Script executed:

# Also search for termination_checker_t definition/declaration
fd -e h -e hpp -e cu -e cuh | xargs rg "termination_checker_t" -B 2 -A 10 | head -100

Repository: NVIDIA/cuopt

Length of output: 9215


🏁 Script executed:

# Look for the check method in termination_checker_t
rg -A 10 "termination_checker_t.*{" --type cpp --type h --type hpp --type cu --type cuh -A 30 | grep -A 20 "check()"

Repository: NVIDIA/cuopt

Length of output: 167


🏁 Script executed:

# Search for termination_checker_t definition more carefully
rg "termination_checker_t" --type-list | head -5

Repository: NVIDIA/cuopt

Length of output: 445


🏁 Script executed:

# Search for the class definition
rg -B 2 -A 30 "class termination_checker_t" | head -100

Repository: NVIDIA/cuopt

Length of output: 4962


🏁 Script executed:

# Search for termination_checker_t in header files
fd -e h | xargs rg "termination_checker_t" -B 5 -A 20 | head -150

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Get the full check() method from termination_checker.hpp
sed -n '/bool check() const/,/^  }/p' cpp/src/utilities/termination_checker.hpp

Repository: NVIDIA/cuopt

Length of output: 341


🏁 Script executed:

# Also check the context where termination_checker_t is used in feasibility_jump.cu
sed -n '860,880p' cpp/src/mip/feasibility_jump/feasibility_jump.cu

Repository: NVIDIA/cuopt

Length of output: 786


Remove redundant termination check.

The condition at lines 871-872 checks both timer.check() and context.termination.check(). Since timer is constructed with context.termination as its parent (line 865), timer.check() already internally calls context.termination.check() via its parent pointer. The explicit context.termination.check() call is redundant and can be removed.

Termination checker logic

The termination_checker_t::check() method checks:

  1. Termination callback
  2. Time limit
  3. If parent exists, calls parent->check()

Since context.termination is the parent, timer.check() already includes its check.

🤖 Prompt for AI Agents
In @cpp/src/mip/feasibility_jump/feasibility_jump.cu around lines 871-872, The
if-condition is redundantly calling context.termination.check() because timer
was constructed with context.termination as its parent and timer.check() already
invokes the parent's check; remove the explicit context.termination.check() from
the conditional (the if that currently reads something like if (timer.check() ||
steps >= settings.iteration_limit || context.preempt_heuristic_solver_.load() ||
context.termination.check()) ) so it becomes if (timer.check() || steps >=
settings.iteration_limit || context.preempt_heuristic_solver_.load() ); keep all
other checks and behavior unchanged.

Comment on lines +273 to 283
auto timer = in_timer;
if (timer.check()) return false;
// adjust these time limits
if (!solution.get_feasible()) {
if (ls_config.at_least_one_parent_feasible) {
fj_settings.time_limit = 0.5;
timer = timer_t(fj_settings.time_limit);
timer = termination_checker_t(fj_settings.time_limit, context.termination);
} else {
fj_settings.time_limit = 0.25;
timer = timer_t(fj_settings.time_limit);
timer = termination_checker_t(fj_settings.time_limit, context.termination);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find cpp/src -name "termination_checker.hpp" -o -name "*termination_checker*" | head -20

Repository: NVIDIA/cuopt

Length of output: 97


🏁 Script executed:

rg -n "class termination_checker_t" cpp/src -A 30 -B 2

Repository: NVIDIA/cuopt

Length of output: 5160


🏁 Script executed:

rg -n "timer" cpp/src/mip/local_search/local_search.cu -B 3 -A 3 | head -50

Repository: NVIDIA/cuopt

Length of output: 2167


🏁 Script executed:

cat -n cpp/src/utilities/termination_checker.hpp

Repository: NVIDIA/cuopt

Length of output: 3051


🏁 Script executed:

sed -n '265,300p' cpp/src/mip/local_search/local_search.cu

Repository: NVIDIA/cuopt

Length of output: 1736


🏁 Script executed:

rg -n "context.termination" cpp/src/mip/local_search/local_search.cu -B 2 -A 2 | head -40

Repository: NVIDIA/cuopt

Length of output: 2114


🏁 Script executed:

sed -n '236,265p' cpp/src/mip/local_search/local_search.cu

Repository: NVIDIA/cuopt

Length of output: 1405


🏁 Script executed:

rg -n "class.*context" cpp/src/mip/local_search/local_search.cuh -A 10 | head -50

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

sed -n '269,295p' cpp/src/mip/local_search/local_search.cu

Repository: NVIDIA/cuopt

Length of output: 1227


🏁 Script executed:

grep -n "at_least_one_parent_feasible\|feasible" cpp/src/mip/local_search/local_search.cu | head -30

Repository: NVIDIA/cuopt

Length of output: 1721


Remove hard-coded time limits and respect remaining time from parent timer.

Lines 279 and 282 use fixed time limits (0.5 and 0.25 seconds) instead of respecting the remaining time from in_timer. This is inconsistent with the pattern used throughout the rest of the function, where std::min(timer.remaining_time(), limit) is used (e.g., lines 248, 251, 256). If in_timer.remaining_time() is less than 0.5, this code will exceed the original time budget.

Use std::min(in_timer.remaining_time(), fj_settings.time_limit) for constructing the new termination_checker_t objects, similar to line 402.

🤖 Prompt for AI Agents
In @cpp/src/mip/local_search/local_search.cu around lines 273-283, The code sets
fj_settings.time_limit to fixed values (0.5/0.25) and constructs
termination_checker_t with those fixed values, which can exceed the parent
in_timer budget; change the construction of timer when solution.get_feasible()
is false (and ls_config.at_least_one_parent_feasible true/false) to respect
remaining time by using the smaller of in_timer.remaining_time() and the desired
limit—i.e., keep fj_settings.time_limit as 0.5 or 0.25 but build timer with
termination_checker_t(std::min(in_timer.remaining_time(),
fj_settings.time_limit), context.termination) so the new timer never exceeds the
parent timer.

// this is second timer that can continue but without recovery mode
const f_t max_time_for_bounds_prop = 5.;
max_timer = timer_t{max_time_for_bounds_prop};
max_timer = termination_checker_t{max_time_for_bounds_prop};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find termination_checker.hpp
fd -t f termination_checker.hpp

# Get the file path to examine constructors
find . -name "termination_checker.hpp" -type f

Repository: NVIDIA/cuopt

Length of output: 141


🏁 Script executed:

#!/bin/bash
# Once we have the file, examine the termination_checker_t class definition
cat cpp/src/utilities/termination_checker.hpp | head -100

Repository: NVIDIA/cuopt

Length of output: 2519


🏁 Script executed:

#!/bin/bash
# Also check the specific line 710 in lb_constraint_prop.cu for context
sed -n '705,715p' cpp/src/mip/local_search/rounding/lb_constraint_prop.cu

Repository: NVIDIA/cuopt

Length of output: 494


Fix constructor call: termination_checker_t requires two arguments.

Line 710 attempts to construct termination_checker_t with only max_time_for_bounds_prop:

max_timer = termination_checker_t{max_time_for_bounds_prop};

The class has no single-argument constructor. Both constructors require a second argument: either a parent termination_checker_t& or root_tag_t{}. Use:

max_timer = termination_checker_t{max_time_for_bounds_prop, termination_checker_t::root_tag_t{}};
🤖 Prompt for AI Agents
In @cpp/src/mip/local_search/rounding/lb_constraint_prop.cu around line 710, The
assignment to max_timer constructs termination_checker_t with only
max_time_for_bounds_prop but termination_checker_t has no single-argument ctor;
update the call in lb_constraint_prop (where max_timer is set) to pass the
second required argument—use termination_checker_t::root_tag_t{} as the second
parameter (i.e., construct termination_checker_t with max_time_for_bounds_prop
and termination_checker_t::root_tag_t{}).

Comment on lines +186 to +200
auto termination = termination_checker_t(time_limit, termination_checker_t::root_tag_t{});
termination.set_termination_callback(
[](void* termination_callback_data) {
auto settings = static_cast<mip_solver_settings_t<i_t, f_t>*>(termination_callback_data);
for (auto callback : settings->get_mip_callbacks()) {
if (callback->get_type() != internals::base_solution_callback_type::CHECK_TERMINATION) {
continue;
}
auto check_termination_callback =
static_cast<internals::check_termination_callback_t*>(callback);
if (check_termination_callback->check_termination()) { return true; }
}
return false;
},
(void*)&settings);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find the solve_mip function and understand the scope
cd cpp/src/mip && cat -n solve.cu | head -300

Repository: NVIDIA/cuopt

Length of output: 15975


🏁 Script executed:

# Search for where the callback is actually invoked
rg -n "set_termination_callback|termination_callback" cpp/src/mip/solve.cu -A 5 -B 5

Repository: NVIDIA/cuopt

Length of output: 1032


🏁 Script executed:

# Check the termination_checker_t class to understand lifecycle
find . -name "*.h" -o -name "*.hpp" -o -name "*.cuh" | xargs rg -l "termination_checker_t" | head -5

Repository: NVIDIA/cuopt

Length of output: 247


🏁 Script executed:

# Look for the definition of termination_checker_t
rg -n "class termination_checker_t|struct termination_checker_t" --type=h --type=hpp --type=cuh -A 20

Repository: NVIDIA/cuopt

Length of output: 83


🏁 Script executed:

# Find termination_checker_t definition
rg -n "class termination_checker_t|struct termination_checker_t" -A 30 cpp/src/utilities/termination_checker.hpp

Repository: NVIDIA/cuopt

Length of output: 1185


🏁 Script executed:

# Check where the callback is invoked in termination_checker
rg -n "termination_callback|invoke" cpp/src/utilities/termination_checker.hpp -B 2 -A 2

Repository: NVIDIA/cuopt

Length of output: 758


🏁 Script executed:

# Check if termination object is passed to other functions and if it could outlive solve_mip
rg -n "termination_checker_t" cpp/src/mip/solver.cuh | head -20

Repository: NVIDIA/cuopt

Length of output: 149


🏁 Script executed:

# Check if get_mip_callbacks is a const method
rg -n "get_mip_callbacks" cpp/src -A 2 -B 2 | head -40

Repository: NVIDIA/cuopt

Length of output: 2666


🏁 Script executed:

# Check the full lifetime of termination - does it persist after run_mip?
sed -n '186,320p' cpp/src/mip/solve.cu | grep -n "termination\|run_mip"

Repository: NVIDIA/cuopt

Length of output: 947


🏁 Script executed:

# Verify the callback is only used during run_mip execution
rg -n "check\(\)" cpp/src/utilities/termination_checker.hpp -B 5 -A 5

Repository: NVIDIA/cuopt

Length of output: 653


The lifetime of settings in the termination callback is safe, but const-correctness could be clearer.

The settings reference remains valid throughout solve_mip since it's a function parameter. The callback is only invoked during run_mip (via check() in the termination_checker), and the termination object—being local to solve_mip—persists until after run_mip returns.

However, the cast (void*)&settings followed by the cast back to non-const pointer mip_solver_settings_t<i_t, f_t>* obscures intent. While safe in practice (only const methods like get_mip_callbacks() are called), consider using const_cast explicitly or storing a const void* to clarify that the callback doesn't modify settings:

(const void*)&settings

Then cast to const mip_solver_settings_t<i_t, f_t>* in the lambda for clarity that only read-only access is needed.

🤖 Prompt for AI Agents
In @cpp/src/mip/solve.cu around lines 186-200, The termination callback
currently takes a void* and casts it back to a non-const mip_solver_settings_t,
which obscures intent; change the argument passed to
termination.set_termination_callback to a const void* (e.g. (const
void*)&settings) and inside the lambda static_cast that parameter to const
mip_solver_settings_t<i_t, f_t>* so the lambda only uses read-only access (keep
using termination.set_termination_callback and the existing lambda body but
replace the casts to reflect const-correctness).

… incorrect PDLP objective problem space when passed to B&B from diversity_manager
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
cpp/src/mip/presolve/lb_probing_cache.cu (1)

279-286: Add CUDA error checking for kernel launch.

The compute_min_slack_per_var kernel launch lacks error verification. CUDA errors can silently corrupt results or cause downstream failures if not caught immediately.

As per coding guidelines, every CUDA kernel launch must have error checking with CUDA_CHECK or equivalent verification.

🔍 Proposed fix
   compute_min_slack_per_var<i_t, f_t>
     <<<problem.pb->n_integer_vars, 128, 0, problem.handle_ptr->get_stream()>>>(
       problem.pb->view(),
       make_span_2(bound_presolve.cnst_slack),
       make_span(min_slack_per_var),
       make_span(different_coefficient),
       make_span(max_excess_per_var),
       make_span(max_n_violated_per_constraint));
+  CUDA_CHECK(cudaGetLastError());
cpp/src/mip/presolve/probing_cache.cu (1)

322-330: Add CUDA error checking for kernel launch.

The compute_min_slack_per_var kernel launch lacks error verification. CUDA errors can silently corrupt results or cause downstream failures if not caught immediately.

As per coding guidelines, every CUDA kernel launch must have error checking with CUDA_CHECK or equivalent verification.

🔍 Proposed fix
   compute_min_slack_per_var<i_t, f_t>
     <<<problem.n_integer_vars, 128, 0, problem.handle_ptr->get_stream()>>>(
       problem.view(),
       make_span(bound_presolve.upd.min_activity),
       make_span(bound_presolve.upd.max_activity),
       make_span(min_slack_per_var),
       make_span(different_coefficient),
       make_span(max_excess_per_var),
       make_span(max_n_violated_per_constraint));
+  CUDA_CHECK(cudaGetLastError());
cpp/src/dual_simplex/phase2.cpp (2)

1224-1232: Distinguish TIME_LIMIT vs CONCURRENT_LIMIT when initializing steepest-edge norms

In initialize_steepest_edge_norms:

if (settings.check_termination(start_time)) { return -1; }
if (settings.concurrent_halt != nullptr && *settings.concurrent_halt == 1) { return -1; }

But in dual_phase2_with_advanced_basis the call site interprets:

if (phase2::initialize_steepest_edge_norms(...) == -1) {
  return dual::status_t::TIME_LIMIT;
}

Everywhere else in this file and in crossover, -2 is consistently used for concurrent limits and mapped to CONCURRENT_LIMIT. Here, a concurrent halt during steepest-edge initialization is indistinguishable from a time limit and will be surfaced as TIME_LIMIT, breaking the new uniform semantics.

I’d recommend:

  • Returning -1 for check_termination(start_time) and -2 when *settings.concurrent_halt == 1, and
  • Extending the caller to map -2 to dual::status_t::CONCURRENT_LIMIT.

For example:

Proposed adjustment
 // initialize_steepest_edge_norms(...)
-    if (settings.check_termination(start_time)) { return -1; }
-    if (settings.concurrent_halt != nullptr && *settings.concurrent_halt == 1) { return -1; }
+    if (settings.check_termination(start_time)) { return -1; }  // TIME_LIMIT
+    if (settings.concurrent_halt != nullptr && *settings.concurrent_halt == 1) {
+      return -2;  // CONCURRENT_LIMIT
+    }
 // dual_phase2_with_advanced_basis(...)
-      if (phase2::initialize_steepest_edge_norms(
-            lp, settings, start_time, basic_list, ft, delta_y_steepest_edge) == -1) {
-        return dual::status_t::TIME_LIMIT;
-      }
+      const i_t se_status = phase2::initialize_steepest_edge_norms(
+        lp, settings, start_time, basic_list, ft, delta_y_steepest_edge);
+      if (se_status == -1) { return dual::status_t::TIME_LIMIT; }
+      if (se_status == -2) { return dual::status_t::CONCURRENT_LIMIT; }

Also applies to: 2320-2327


2932-2935: Fix syntax in STEEPEST_EDGE_DEBUG block

Under #ifdef STEEPEST_EDGE_DEBUG you have:

if (iter < 100 || iter % 100 == 0))
{
  phase2::check_steepest_edge_norms(...);
}

The extra ) makes this block ill-formed C++ if STEEPEST_EDGE_DEBUG is ever enabled.

A minimal fix:

-#ifdef STEEPEST_EDGE_DEBUG
-    if (iter < 100 || iter % 100 == 0))
-    {
+#ifdef STEEPEST_EDGE_DEBUG
+    if (iter < 100 || iter % 100 == 0)
+    {
🤖 Fix all issues with AI agents
In @cpp/src/mip/local_search/local_search.cu:
- Around line 251-252: Several local termination_checkers are being constructed
using context.termination as the parent, which bypasses the provided parent
timer; update each construction to chain through the passed-in parent timer
instead. Replace instantiations like
termination_checker_t(std::min(timer.remaining_time(), 2.), context.termination)
with termination_checker_t(std::min(timer.remaining_time(), 2.), timer), and
apply the same change for the other occurrences mentioned (near the uses at the
locations corresponding to the original lines 280, 283, 402-403, 475) so each
sub-timer recursively checks the immediate parent timer rather than
context.termination.

In @cpp/src/mip/problem/problem.cu:
- Around line 1783-1787: The method get_solver_obj_from_user_obj currently
divides by presolve_data.objective_scaling_factor which can be zero; add
validation: either enforce non-zero/near-zero rejection in the public setter
set_objective_scaling_factor (reject values with fabs(value) < kEpsilon and
return error/throw) and document it, and also harden
get_solver_obj_from_user_obj by guarding against tiny scaling factors (check
fabs(presolve_data.objective_scaling_factor) < kEpsilon and either throw a clear
exception or use a safe fallback) so division by zero cannot occur; reference
the same epsilon and error messaging approach used in convergence_information.cu
for consistency.

In @cpp/src/mip/solver.cu:
- Line 186: termination_checker_t currently accesses termination_callback_ and
termination_callback_data_ from different threads without synchronization; add
thread-safety by protecting those members with a mutex (e.g., add std::mutex
mutex_ to termination_checker_t and lock in both set_termination_callback() and
check()) or, if preferred, make the function pointer and data pointer atomic
(std::atomic for the pointer types) and perform atomic loads/stores in
set_termination_callback() and check() so the B&B thread always sees a
consistent pair; ensure branch_and_bound_settings.termination (which points to
context.termination) still calls into the synchronized check() method.

In @cpp/tests/mip/load_balancing_test.cu:
- Around line 141-142: The temporary termination_checker_t(0) is being passed to
the mip_solver_t constructor which stores a reference (timer_) and thus causes a
dangling reference; fix by creating a named local variable like
termination_checker_t timer(0) (or whatever identifier is used in
multi_probe_test.cu) and pass timer to the detail::mip_solver_t<int, double>
constructor instead of the temporary, ensuring the named variable outlives the
solver; also add #include <limits> if that test needs it.
🧹 Nitpick comments (9)
cpp/src/mip/presolve/probing_cache.cu (1)

491-512: Optimize termination handling in parallel loop.

When timer.check() returns true at line 495, the current code uses continue, causing all remaining iterations to execute the termination check unnecessarily. In an OpenMP parallel loop, this wastes cycles across all threads.

♻️ Proposed refactor to use a shared termination flag
   // Atomic variables for tracking progress
   std::atomic<size_t> n_of_implied_singletons(0);
   std::atomic<size_t> n_of_cached_probings(0);
+  std::atomic<bool> should_terminate(false);
 
 // Main parallel loop
 #pragma omp parallel
   {
 #pragma omp for schedule(static, 4)
     for (auto var_idx : priority_indices) {
-      if (timer.check()) { continue; }
+      if (should_terminate.load(std::memory_order_relaxed) || timer.check()) { 
+        should_terminate.store(true, std::memory_order_relaxed);
+        continue; 
+      }
 
       int thread_idx = omp_get_thread_num();
       CUOPT_LOG_TRACE("Computing probing cache for var %d on thread %d", var_idx, thread_idx);

Alternatively, check the flag first to avoid redundant timer.check() calls:

-      if (timer.check()) { continue; }
+      if (should_terminate.load(std::memory_order_relaxed)) { continue; }
+      if (timer.check()) { 
+        should_terminate.store(true, std::memory_order_relaxed);
+        continue; 
+      }
cpp/src/mip/diversity/recombiners/bound_prop_recombiner.cuh (1)

180-181: LGTM! Termination checker correctly integrated.

The replacement of timer_t with termination_checker_t is implemented correctly in both code paths. The timer objects are properly initialized with the time limit and termination context, and passed to constraint_prop.apply_round() for termination checking.

Optional enhancement for more responsive termination:

In the first code path (lines 178-213), several operations occur between timer creation and the first termination check in apply_round(): compute_vars_to_fix, fix_variables, device copies, sync_stream, get_probing_values_for_feasible, and host_copy. If any of these operations are time-consuming, termination response could be delayed. Consider adding an explicit termination check after the most expensive operations to improve responsiveness to SIGINT.

💡 Example of adding explicit termination check
 offspring.handle_ptr->sync_stream();
+if (timer.is_terminated()) {
+  CUOPT_LOG_DEBUG("Termination requested during recombination");
+  return std::make_pair(offspring, false);
+}
 offspring.assignment  = std::move(fixed_assignment);

Note: This assumes termination_checker_t exposes a method like is_terminated() for explicit checking. Adjust based on the actual API.

Also applies to: 215-216

cpp/src/mip/local_search/local_search.cu (2)

670-671: Redundant termination check in loop.

The loop condition at line 670 already includes !timer.check(), making the check at line 671 redundant. Consider removing it to avoid unnecessary duplicate checks.

♻️ Simplify by removing redundant check
 for (i_t i = 0; i < n_fp_iterations && !timer.check(); ++i) {
-  if (timer.check()) {
-    is_feasible = false;
-    break;
-  }
   CUOPT_LOG_DEBUG("fp_loop it %d last_improved_iteration %d", i, last_improved_iteration);

250-256: Redundant termination check in generate_fast_solution.

The loop condition at line 250 already checks !timer.check(), making the check at line 256 redundant. The loop will naturally exit when the timer expires.

♻️ Remove redundant check
 while (!timer.check()) {
   termination_checker_t constr_prop_timer =
     termination_checker_t(std::min(timer.remaining_time(), 2.), context.termination);
   constraint_prop.apply_round(solution, 1., constr_prop_timer);
   if (solution.compute_feasibility()) { return; }
-  if (timer.check()) { return; };
   f_t time_limit = std::min(3., timer.remaining_time());
cpp/tests/mip/mip_utils.cuh (1)

118-120: Consider adding null check for callbacks.

The loop iterates over all callbacks without filtering nulls. If the caller inadvertently passes a vector containing null pointers, this could cause issues when set_mip_callback is invoked.

♻️ Optional defensive check
   for (auto callback : callbacks) {
+    if (callback == nullptr) { continue; }
     settings.set_mip_callback(callback);
   }
cpp/src/dual_simplex/basis_solves.cpp (1)

364-367: Clarify and de-magic the -2 concurrent-halt return from factorize_basis

factorize_basis now uses -2 to signal settings.concurrent_halt == 1, distinct from rank-deficiency -1. Callers in this PR already treat -2 as a concurrent limit, but the raw integer makes it easy for other sites to mis-handle this status.

Consider:

  • Introducing a small enum or constexpr i_t for CONCURRENT_HALT and RANK_DEFICIENT, and
  • Verifying all uses of factorize_basis across the codebase handle -2 explicitly as a concurrent-limit condition.

Also applies to: 583-586

cpp/src/dual_simplex/primal.cpp (1)

298-311: Concurrent-halt propagation from factorize_basis in primal_phase2 looks correct

Mapping rank == -2 and repair_rank == -2 to primal::status_t::CONCURRENT_HALT, and repair_rank == -1 to NUMERICAL, is consistent with the new factorization status semantics.

Minor nit: in the repaired path the log still prints rank instead of repair_rank, which can be misleading when the second factorization fails differently than the first. If you touch this again, consider logging repair_rank there.

cpp/src/dual_simplex/phase2.cpp (1)

2885-2907: Confirm refactor_basis cannot return -1, or handle it explicitly in the repair loop

In the refactor/repair block:

i_t refactor_result = ft.refactor_basis(...);
if (refactor_result == -2) { return dual::status_t::CONCURRENT_LIMIT; }
if (refactor_result > 0) {
  ...
  i_t deficient_size;
  while ((deficient_size = ft.refactor_basis(...)) > 0) {
    ...
  }
  if (deficient_size == -2) { return dual::status_t::CONCURRENT_LIMIT; }
  settings.log.printf("Successfully repaired basis. Iteration %d\n", iter);
}

This assumes ft.refactor_basis only returns:

  • 0 on success,
  • >0 for “deficient columns remaining”, and
  • -2 for concurrent halt.

If refactor_basis can ever return -1 for unrecoverable numerical issues, this code will:

  • Exit the while (since -1 > 0 is false),
  • Skip the deficient_size == -2 check, and
  • Log “Successfully repaired basis” even though refactor failed.

Please confirm the contract of refactor_basis. If -1 is a possible return, it should be handled explicitly (e.g., mapping to dual::status_t::NUMERICAL) before treating the repair as successful.

cpp/src/dual_simplex/crossover.cpp (1)

532-539: Check that check_termination is purely time/user-stop, not concurrent-halt aware

In dual_push’s loop:

if (settings.check_termination(start_time)) {
  settings.log.printf("Crossover time exceeded\n");
  return -1;
}
if (settings.concurrent_halt != nullptr && *settings.concurrent_halt == 1) {
  settings.log.printf("Concurrent halt\n");
  return -2;
}

Callers then convert these to TIME_LIMIT vs CONCURRENT_LIMIT through return_to_status.

This separation is fine if settings.check_termination(start_time) only covers time / user-stop / work limits. If it also interprets settings.concurrent_halt internally, the first branch would fire and you’d report TIME_LIMIT instead of CONCURRENT_LIMIT, making the second branch unreachable.

Please confirm the intended contract of check_termination and whether it is guaranteed not to subsume the concurrent-halt condition.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bac257 and 606a5f8.

📒 Files selected for processing (21)
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/src/mip/diversity/recombiners/bound_prop_recombiner.cuh
  • cpp/src/mip/local_search/local_search.cu
  • cpp/src/mip/presolve/lb_probing_cache.cu
  • cpp/src/mip/presolve/probing_cache.cu
  • cpp/src/mip/problem/problem.cu
  • cpp/src/mip/problem/problem.cuh
  • cpp/src/mip/solve.cu
  • cpp/src/mip/solver.cu
  • cpp/tests/mip/elim_var_remap_test.cu
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/tests/mip/mip_utils.cuh
  • cpp/tests/mip/multi_probe_test.cu
🚧 Files skipped from review as they are similar to previous changes (1)
  • cpp/src/dual_simplex/branch_and_bound.cpp
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{cu,cuh}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Files:

  • cpp/src/mip/problem/problem.cu
  • cpp/tests/mip/mip_utils.cuh
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/src/mip/solver.cu
  • cpp/src/mip/solve.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/src/mip/presolve/probing_cache.cu
  • cpp/src/mip/presolve/lb_probing_cache.cu
  • cpp/src/mip/diversity/recombiners/bound_prop_recombiner.cuh
  • cpp/src/mip/problem/problem.cuh
  • cpp/src/mip/local_search/local_search.cu
  • cpp/tests/mip/elim_var_remap_test.cu
**/*.cu

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Files:

  • cpp/src/mip/problem/problem.cu
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/src/mip/solver.cu
  • cpp/src/mip/solve.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/src/mip/presolve/probing_cache.cu
  • cpp/src/mip/presolve/lb_probing_cache.cu
  • cpp/src/mip/local_search/local_search.cu
  • cpp/tests/mip/elim_var_remap_test.cu
**/*.{cu,cuh,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...

Files:

  • cpp/src/mip/problem/problem.cu
  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/tests/mip/mip_utils.cuh
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/src/mip/solver.cu
  • cpp/src/mip/solve.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/presolve/probing_cache.cu
  • cpp/src/mip/presolve/lb_probing_cache.cu
  • cpp/src/mip/diversity/recombiners/bound_prop_recombiner.cuh
  • cpp/src/mip/problem/problem.cuh
  • cpp/src/mip/local_search/local_search.cu
  • cpp/tests/mip/elim_var_remap_test.cu
**/*.{cu,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Files:

  • cpp/src/mip/problem/problem.cu
  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/src/mip/solver.cu
  • cpp/src/mip/solve.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/presolve/probing_cache.cu
  • cpp/src/mip/presolve/lb_probing_cache.cu
  • cpp/src/mip/local_search/local_search.cu
  • cpp/tests/mip/elim_var_remap_test.cu
**/*.{cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cpp,hpp,h}: Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths
Validate input sanitization to prevent buffer overflows and resource exhaustion attacks; avoid unsafe deserialization of problem files
Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state

Files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/crossover.cpp
**/*.{h,hpp,py}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Verify C API does not break ABI stability (no struct layout changes, field reordering); maintain backward compatibility in Python and server APIs with deprecation warnings

Files:

  • cpp/src/dual_simplex/primal.hpp
**/*test*.{cpp,cu,py}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*test*.{cpp,cu,py}: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)
Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment
Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
Add tests for problem transformations: verify correctness of original→transformed→postsolve mappings and index consistency across problem representations
Test with free variables, singleton problems, and extreme problem dimensions near resource limits to validate edge case handling

Files:

  • cpp/tests/mip/load_balancing_test.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/tests/mip/elim_var_remap_test.cu
🧠 Learnings (25)
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations

Applied to files:

  • cpp/src/mip/problem/problem.cu
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/mip/solver.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/src/mip/presolve/probing_cache.cu
  • cpp/src/mip/presolve/lb_probing_cache.cu
  • cpp/src/mip/problem/problem.cuh
  • cpp/tests/mip/elim_var_remap_test.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication

Applied to files:

  • cpp/src/mip/problem/problem.cu
  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/src/mip/solver.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/problem/problem.cuh
  • cpp/tests/mip/elim_var_remap_test.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms

Applied to files:

  • cpp/src/mip/problem/problem.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/dual_simplex/basis_updates.cpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/mip/solver.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/presolve/lb_probing_cache.cu
  • cpp/src/mip/diversity/recombiners/bound_prop_recombiner.cuh
  • cpp/src/mip/local_search/local_search.cu
  • cpp/tests/mip/elim_var_remap_test.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/src/mip/solver.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/local_search/local_search.cu
  • cpp/tests/mip/elim_var_remap_test.cu
📚 Learning: 2025-12-03T23:29:26.391Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 500
File: cpp/src/dual_simplex/sparse_matrix.cpp:519-524
Timestamp: 2025-12-03T23:29:26.391Z
Learning: In cpp/src/dual_simplex/sparse_matrix.cpp, the check_matrix() function is debug/diagnostic code (wrapped in #ifdef CHECK_MATRIX) that intentionally prints errors without necessarily returning early. The return codes from this debug code are not actively checked; the purpose is to print all validation errors in one pass for better diagnostics.

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/mip/solver.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/presolve/probing_cache.cu
  • cpp/src/mip/presolve/lb_probing_cache.cu
  • cpp/src/mip/problem/problem.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/problem/problem.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/problem/problem.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cpp,hpp,h} : Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/problem/problem.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/problem/problem.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/problem/problem.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/presolve/probing_cache.cu
  • cpp/src/mip/problem/problem.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/mip/problem/problem.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle

Applied to files:

  • cpp/src/dual_simplex/primal.cpp
  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/crossover.cpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)

Applied to files:

  • cpp/src/dual_simplex/basis_solves.cpp
📚 Learning: 2025-12-04T20:09:09.264Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.

Applied to files:

  • cpp/src/dual_simplex/basis_solves.cpp
  • cpp/src/dual_simplex/primal.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/src/mip/solver.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/src/mip/local_search/local_search.cu
  • cpp/tests/mip/elim_var_remap_test.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover

Applied to files:

  • cpp/src/dual_simplex/phase2.cpp
  • cpp/tests/mip/mip_utils.cuh
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/tests/mip/elim_var_remap_test.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment

Applied to files:

  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/src/mip/problem/problem.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Test with free variables, singleton problems, and extreme problem dimensions near resource limits to validate edge case handling

Applied to files:

  • cpp/tests/mip/load_balancing_test.cu
  • cpp/src/mip/diversity/diversity_manager.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/src/mip/presolve/lb_probing_cache.cu
  • cpp/tests/mip/elim_var_remap_test.cu
📚 Learning: 2025-12-04T04:11:12.640Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 500
File: cpp/src/dual_simplex/scaling.cpp:68-76
Timestamp: 2025-12-04T04:11:12.640Z
Learning: In the cuOPT dual simplex solver, CSR/CSC matrices (including the quadratic objective matrix Q) are required to have valid dimensions and indices by construction. Runtime bounds checking in performance-critical paths like matrix scaling is avoided to prevent slowdowns. Validation is performed via debug-only check_matrix() calls wrapped in #ifdef CHECK_MATRIX.

Applied to files:

  • cpp/src/mip/solver.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cpp,hpp,h} : Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths

Applied to files:

  • cpp/src/mip/diversity/diversity_manager.cu
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.

Applied to files:

  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/src/mip/local_search/local_search.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for problem transformations: verify correctness of original→transformed→postsolve mappings and index consistency across problem representations

Applied to files:

  • cpp/tests/mip/elim_var_remap_test.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*benchmark*.{cpp,cu,py} : Include performance benchmarks and regression detection for GPU operations; verify near real-time performance on million-variable problems

Applied to files:

  • cpp/tests/mip/elim_var_remap_test.cu
🧬 Code graph analysis (10)
cpp/src/dual_simplex/primal.cpp (1)
cpp/src/dual_simplex/basis_solves.cpp (6)
  • basis_repair (614-665)
  • basis_repair (614-620)
  • basis_repair (850-856)
  • factorize_basis (157-611)
  • factorize_basis (157-166)
  • factorize_basis (839-848)
cpp/tests/mip/load_balancing_test.cu (1)
cpp/src/utilities/termination_checker.hpp (4)
  • termination_checker_t (38-41)
  • termination_checker_t (38-38)
  • termination_checker_t (42-45)
  • termination_checker_t (42-42)
cpp/src/mip/solver.cu (2)
cpp/src/mip/diversity/diversity_manager.cu (1)
  • lp_timer (362-362)
cpp/src/utilities/termination_checker.hpp (4)
  • termination_checker_t (38-41)
  • termination_checker_t (38-38)
  • termination_checker_t (42-45)
  • termination_checker_t (42-42)
cpp/src/mip/solve.cu (3)
cpp/src/utilities/termination_checker.hpp (4)
  • termination_checker_t (38-41)
  • termination_checker_t (38-38)
  • termination_checker_t (42-45)
  • termination_checker_t (42-42)
cpp/include/cuopt/linear_programming/solver_settings.hpp (6)
  • settings (35-35)
  • settings (35-35)
  • settings (39-39)
  • settings (39-39)
  • callback (84-84)
  • callback (85-85)
cpp/include/cuopt/linear_programming/utilities/internals.hpp (3)
  • CHECK_TERMINATION (62-65)
  • GET_SOLUTION (44-47)
  • SET_SOLUTION (53-56)
cpp/src/mip/diversity/diversity_manager.cu (1)
cpp/src/mip/local_search/local_search.cu (2)
  • run_local_search (267-309)
  • run_local_search (267-270)
cpp/tests/mip/multi_probe_test.cu (1)
cpp/src/utilities/termination_checker.hpp (4)
  • termination_checker_t (38-41)
  • termination_checker_t (38-38)
  • termination_checker_t (42-45)
  • termination_checker_t (42-42)
cpp/src/dual_simplex/basis_updates.cpp (1)
cpp/src/dual_simplex/basis_solves.cpp (3)
  • factorize_basis (157-611)
  • factorize_basis (157-166)
  • factorize_basis (839-848)
cpp/src/mip/diversity/lns/rins.cu (1)
cpp/src/utilities/timer.hpp (1)
  • time_limit (46-46)
cpp/src/mip/local_search/local_search.cu (1)
cpp/src/utilities/termination_checker.hpp (4)
  • termination_checker_t (38-41)
  • termination_checker_t (38-38)
  • termination_checker_t (42-45)
  • termination_checker_t (42-42)
cpp/tests/mip/elim_var_remap_test.cu (2)
cpp/src/utilities/termination_checker.hpp (4)
  • termination_checker_t (38-41)
  • termination_checker_t (38-38)
  • termination_checker_t (42-45)
  • termination_checker_t (42-42)
cpp/src/mip/relaxed_lp/relaxed_lp.cu (4)
  • get_relaxed_lp_solution (27-35)
  • get_relaxed_lp_solution (27-31)
  • get_relaxed_lp_solution (38-114)
  • get_relaxed_lp_solution (38-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.12, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.13, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.11, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.11, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-sh-client / 13.1.0, 3.10, amd64, rockylinux8
🔇 Additional comments (25)
cpp/src/mip/presolve/lb_probing_cache.cu (1)

312-312: LGTM: Termination checker integration is correct.

The function signature correctly accepts const termination_checker_t& as a parameter. The usage of timer.remaining_time() at line 326 and timer.check() at line 330 aligns with the PR's objective to provide unified termination handling across time limits, user callbacks, and SIGINT.

Also applies to: 326-326, 330-330, 403-403

cpp/src/mip/presolve/probing_cache.cu (1)

458-458: LGTM: Termination checker integration is correct.

The function signature correctly accepts const termination_checker_t& as a parameter. The usage of timer.remaining_time() at line 470 and timer.check() at line 495 aligns with the PR's objective to unify termination handling for time limits, user callbacks, and SIGINT.

Also applies to: 470-470, 524-524

cpp/src/dual_simplex/basis_updates.cpp (2)

2058-2071: LGTM! Proper termination handling for first factorization attempt.

The explicit capture of factorize_result and the early exit when -2 (concurrent halt) is returned ensures the solver stops promptly when termination is requested. This aligns with the PR's objective of implementing graceful SIGINT handling.


2093-2103: LGTM! Consistent termination handling for retry after basis repair.

The retry factorization properly checks for -2 (concurrent halt) again, ensuring that termination requests are honored even if they arrive during the basis repair phase. This maintains consistent termination semantics throughout the refactoring process.

cpp/src/mip/problem/problem.cuh (2)

3-3: LGTM: Copyright year updated appropriately.

The copyright year range has been updated to include 2026, which is appropriate for new changes in this file.


94-94: LGTM: Function declaration is correct.

The declaration of get_solver_obj_from_user_obj is properly placed adjacent to its inverse function get_user_obj_from_solver_obj and correctly matches the implementation in the source file.

cpp/src/mip/diversity/lns/rins.cu (1)

221-255: LGTM! Termination checker integration is correct.

The termination checker is properly initialized with both time limit and parent termination context, enabling RINS sub-problems to respect external termination signals (Ctrl-C, user callbacks, time limits). The lifetime of the local rins_termination object is appropriate—it outlives all usages including the synchronous branch_and_bound.solve() call at line 278.

cpp/src/mip/diversity/diversity_manager.cu (4)

55-55: LGTM! Timer initialization updated correctly.

The timer member is properly initialized with both time limit and parent termination context, enabling the diversity manager to respect external termination signals.


181-362: LGTM! Hierarchical termination checker pattern is well-designed.

The local termination checkers at lines 181, 223, 330, and 362 correctly chain to parent timers, enabling hierarchical time budgets for sub-operations (presolve, fast solution generation, probing, LP solve). This ensures external termination signals propagate through the call hierarchy.


248-248: LGTM! Termination checks enable responsive cancellation.

The termination checks at lines 248, 477, 505, 558, 577, and 581 are well-placed at loop boundaries and between major operations, enabling the solver to respond promptly to Ctrl-C or user termination requests per the PR objectives.

Also applies to: 477-477, 505-505, 558-558, 577-577, 581-581


437-444: Objective space conversion verified as correct.

The conversion from PDLP's user-space objective to solver-space is properly implemented. The get_solver_obj_from_user_obj() function correctly applies the inverse transformation (user_obj / objective_scaling_factor) - objective_offset, and the callback parameters are correctly ordered: param4 receives solver_obj (solver-space) and param5 receives user_obj (user-space). The comments accurately document these expectations and the fix addresses the intended issue.

cpp/src/mip/local_search/local_search.cu (2)

237-238: LGTM! Function signatures consistently updated.

All local search function signatures (lines 237-238, 269, 314, 332, 362, 385, 427-428, 447, 640) are consistently updated to accept const termination_checker_t& instead of timer_t. The use of const references is appropriate since these functions query the timer without mutating it.

Also applies to: 269-269, 314-314, 332-332, 362-362, 385-385, 427-428, 447-447, 640-640


652-652: FP timer assignment pattern is consistent.

The fp.timer assignments at lines 652 and 788 correctly propagate termination context to the FP solver, enabling it to respect time limits and external termination signals during feasibility pump iterations.

Also applies to: 788-788

cpp/src/dual_simplex/primal.hpp (1)

26-28: LGTM!

The new CONCURRENT_HALT enum value properly extends the status enumeration for the unified termination handling introduced in this PR. The trailing comma addition is good practice for minimizing future diffs.

cpp/tests/mip/elim_var_remap_test.cu (2)

153-154: LGTM!

Correct usage of termination_checker_t with infinite time limit and root_tag_t{} to establish a root termination context for the test. This aligns with the PR's termination handling pattern.


161-162: Timer reuse is appropriate here.

Sharing the same termination_checker_t across both get_relaxed_lp_solution calls ensures both LP solves operate under the same root termination context, which is correct for this test scenario.

Also applies to: 192-192

cpp/src/mip/solve.cu (3)

186-200: Termination callback wiring looks correct.

The lambda correctly:

  1. Casts the callback data back to the settings pointer
  2. Filters for CHECK_TERMINATION type before casting
  3. Returns true on first termination request, short-circuiting the loop

One note: the cast (void*)&settings is safe since settings is a const reference that outlives the termination object.


207-226: Appropriate presolve gating for callbacks and initial solutions.

The logic correctly disables presolve when:

  1. GET_SOLUTION or SET_SOLUTION callbacks are registered (since presolve would alter variable indices)
  2. Initial solutions are provided (since presolve would invalidate them)

The warning messages appropriately inform users of the automatic behavior change.


252-252: Correct timing source update.

Using termination.elapsed_time() instead of a separate timer properly integrates presolve timing with the unified termination mechanism.

cpp/src/mip/solver.cu (2)

48-56: Constructor properly wires termination checker.

The termination_checker_t& parameter is correctly propagated to the context and stored as timer_. This ensures the solver has access to the unified termination mechanism throughout its lifetime.


136-136: Correct parent-child termination hierarchy.

The lp_timer is properly constructed with context.termination as its parent, establishing the termination chain so that LP termination respects the overall MIP termination conditions.

cpp/tests/mip/multi_probe_test.cu (1)

153-154: LGTM!

Correct integration of termination_checker_t for the test. Using infinite time limit with root_tag_t{} is appropriate for test scenarios where timeout shouldn't interfere with test execution.

Also applies to: 164-164

cpp/src/dual_simplex/crossover.cpp (3)

24-33: Central mapping from internal int codes to crossover_status_t is clean

return_to_status giving:

  • -1 → TIME_LIMIT
  • -2 → CONCURRENT_LIMIT
  • otherwise NUMERICAL_ISSUES

provides a single place to interpret the various int return codes from dual_push / primal_push / local helpers. This avoids repeating switch logic at each call site and keeps status propagation coherent.


784-799: Primal push: concurrent-halt vs numerical failure after basis repair handled correctly

In primal_push, after factorize_basis and optional basis_repair:

i_t rank = factorize_basis(...);
if (rank == -2) { return -2; }  // Concurrent halt
if (rank != m) {
  ...
  i_t repair_rank = factorize_basis(...);
  if (repair_rank == -2) { return -2; }  // Concurrent halt
  if (repair_rank == -1) {
    ... // log
    return -1;
  } else {
    settings.log.debug("Basis repaired\n");
  }
}

This mirrors the semantics used elsewhere:

  • -2 consistently signals concurrent halt,
  • -1 after repair becomes a numerical failure,
  • m (full rank) continues normally.

Given the top-level uses return_to_status to convert these into crossover_status_t, this path now behaves as expected.


1135-1155: Crossover’s initial factorization now cleanly distinguishes concurrent halt, numerical issues, and time limit

For the initial basis factorization:

rank = factorize_basis(...);
if (rank == -2) { return crossover_status_t::CONCURRENT_LIMIT; }
if (rank != m) {
  ...
  i_t repair_rank = factorize_basis(...);
  if (repair_rank == -2) { return crossover_status_t::CONCURRENT_LIMIT; }
  if (repair_rank == -1) {
    ... // log
    return crossover_status_t::NUMERICAL_ISSUES;
  }
  ...
}

if (settings.check_termination(start_time)) {
  settings.log.printf("Time limit exceeded\n");
  return crossover_status_t::TIME_LIMIT;
}
if (settings.concurrent_halt != nullptr && *settings.concurrent_halt == 1) {
  settings.log.printf("Concurrent halt\n");
  return crossover_status_t::CONCURRENT_LIMIT;
}

This matches the new global convention:

  • factorize_basis == -2 → concurrent limit,
  • unrepaired rank deficiency → numerical issue,
  • check_termination(start_time) → time/user stop,
  • explicit concurrent_halt watched once more before proceeding.

The layering looks correct and keeps statuses for “local factorization error” separate from external termination conditions.

Comment on lines +251 to +252
termination_checker_t constr_prop_timer =
termination_checker_t(std::min(timer.remaining_time(), 2.), context.termination);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find termination_checker_t class definition and constructor
find . -name "*.hpp" -o -name "*.h" | xargs grep -l "termination_checker_t" | head -5

Repository: NVIDIA/cuopt

Length of output: 150


🏁 Script executed:

# Search for termination_checker_t constructor definition
rg "class termination_checker_t|struct termination_checker_t" -A 20

Repository: NVIDIA/cuopt

Length of output: 3197


🏁 Script executed:

# Read the specific lines in local_search.cu mentioned in the review
cat -n cpp/src/mip/local_search/local_search.cu | sed -n '245,260p'

Repository: NVIDIA/cuopt

Length of output: 1003


🏁 Script executed:

# Check lines 280, 283, 402-403, 475
cat -n cpp/src/mip/local_search/local_search.cu | sed -n '275,290p'
cat -n cpp/src/mip/local_search/local_search.cu | sed -n '398,410p'
cat -n cpp/src/mip/local_search/local_search.cu | sed -n '470,480p'

Repository: NVIDIA/cuopt

Length of output: 2251


🏁 Script executed:

# Find the function containing lines 251-252
cat -n cpp/src/mip/local_search/local_search.cu | sed -n '140,160p'

Repository: NVIDIA/cuopt

Length of output: 1123


🏁 Script executed:

# Find function signature for lines 280-283 context
cat -n cpp/src/mip/local_search/local_search.cu | sed -n '260,285p'

Repository: NVIDIA/cuopt

Length of output: 1324


🏁 Script executed:

# Find the overall function structure
rg -n "^[a-zA-Z_].*\(.*termination_checker_t.*\)" cpp/src/mip/local_search/local_search.cu -B 5 | head -50

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look at context around line 154 where initial timer is created
cat -n cpp/src/mip/local_search/local_search.cu | sed -n '145,160p'

Repository: NVIDIA/cuopt

Length of output: 958


🏁 Script executed:

# Search for patterns of how sub-task timers are created throughout the codebase
rg "termination_checker_t.*remaining_time.*context\.termination" cpp/src/mip/local_search/local_search.cu

Repository: NVIDIA/cuopt

Length of output: 402


🏁 Script executed:

# Check if there are any sub-task timers that DO chain through parent timer
rg "termination_checker_t.*remaining_time\(" cpp/src/mip/local_search/local_search.cu | head -20

Repository: NVIDIA/cuopt

Length of output: 402


🏁 Script executed:

# Find the check() method implementation
rg -A 10 "bool check\(\)" cpp/src/utilities/termination_checker.hpp

Repository: NVIDIA/cuopt

Length of output: 410


🏁 Script executed:

# Also check if there's any documentation about parent vs direct context.termination usage
rg -B 5 -A 5 "context\.termination" cpp/src/mip/local_search/local_search.cu | head -60

Repository: NVIDIA/cuopt

Length of output: 2714


Fix timer parent chaining in local termination checkers.

Several local termination checkers are initialized with context.termination directly rather than chaining through the parent timer parameter. This bypasses parent time constraints and allows sub-tasks to exceed their allocated time budget.

Since termination_checker_t::check() recursively checks its parent (if set), creating sub-timers with context.termination as parent skips intermediate timer constraints.

Affected lines should chain through the parent timer instead:

Lines 251-252
-termination_checker_t constr_prop_timer = termination_checker_t(std::min(timer.remaining_time(), 2.), context.termination);
+termination_checker_t constr_prop_timer = termination_checker_t(std::min(timer.remaining_time(), 2.), timer);

Also applies to: 280, 283, 402-403, 475

🤖 Prompt for AI Agents
In @cpp/src/mip/local_search/local_search.cu around lines 251 - 252, Several
local termination_checkers are being constructed using context.termination as
the parent, which bypasses the provided parent timer; update each construction
to chain through the passed-in parent timer instead. Replace instantiations like
termination_checker_t(std::min(timer.remaining_time(), 2.), context.termination)
with termination_checker_t(std::min(timer.remaining_time(), 2.), timer), and
apply the same change for the other occurrences mentioned (near the uses at the
locations corresponding to the original lines 280, 283, 402-403, 475) so each
sub-timer recursively checks the immediate parent timer rather than
context.termination.

Comment on lines +1783 to +1787
template <typename i_t, typename f_t>
f_t problem_t<i_t, f_t>::get_solver_obj_from_user_obj(f_t user_obj) const
{
return (user_obj / presolve_data.objective_scaling_factor) - presolve_data.objective_offset;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for objective_scaling_factor initialization and assignments
rg -n "objective_scaling_factor\s*=" --type=cpp --type=cu -B2 -A2

Repository: NVIDIA/cuopt

Length of output: 82


🏁 Script executed:

# Search for any assertions or validation on objective_scaling_factor
rg -n "objective_scaling_factor" --type=cpp --type=cu | head -50

Repository: NVIDIA/cuopt

Length of output: 82


🏁 Script executed:

# Look for presolve_data structure definition
rg -n "struct.*presolve_data|class.*presolve_data" --type=cpp --type=cu --type=hpp --type=h -A10

Repository: NVIDIA/cuopt

Length of output: 82


🏁 Script executed:

# Search for division operations in problem.cu that might indicate patterns for handling scaling factors
rg -n "\/\s*\w*scaling" cpp/src/mip/problem/problem.cu -B1 -A1

Repository: NVIDIA/cuopt

Length of output: 137


🏁 Script executed:

# Search for objective_scaling_factor using glob patterns
rg -n "objective_scaling_factor" -g "*.cu" -g "*.cpp" -g "*.hpp" -g "*.h" -B2 -A2 | head -100

Repository: NVIDIA/cuopt

Length of output: 7661


🏁 Script executed:

# Search for presolve_data structure definition
rg -n "struct presolve_data|objective_scaling_factor\s*=" -g "*.cpp" -g "*.cu" -g "*.hpp" -g "*.h" -A5 | head -80

Repository: NVIDIA/cuopt

Length of output: 430


🏁 Script executed:

# Look for initialization of presolve_data and objective_scaling_factor
rg -n "presolve_data\." -g "*.cpp" -g "*.cu" -g "*.hpp" | grep -i "scaling\|init" | head -30

Repository: NVIDIA/cuopt

Length of output: 1801


🏁 Script executed:

# Find where objective_scaling_factor is set to non-zero or validated
rg -n "objective_scaling_factor.*=" -g "*.cpp" -g "*.cu" -g "*.hpp" -g "*.h" -B1 -A3 | head -100

Repository: NVIDIA/cuopt

Length of output: 2797


🏁 Script executed:

# Search for presolve_data initialization in problem.cu
rg -n "presolve_data\s*=" -g "*.cu" -g "*.cpp" -B3 -A3 | head -80

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Check for validation in set_objective_scaling_factor
rg -n "set_objective_scaling_factor" -g "*.cpp" -g "*.cu" -B2 -A5 | head -120

Repository: NVIDIA/cuopt

Length of output: 6015


🏁 Script executed:

# Look for validation or guards when calling get_solver_obj_from_user_obj
rg -n "get_solver_obj_from_user_obj" -g "*.cpp" -g "*.cu" -B3 -A3

Repository: NVIDIA/cuopt

Length of output: 1261


🏁 Script executed:

# Check if there are any guards or epsilon checks around the division
rg -n "objective_scaling_factor.*\!" -g "*.cpp" -g "*.cu" -B2 -A2

Repository: NVIDIA/cuopt

Length of output: 176


🏁 Script executed:

# Search for where presolve_data.objective_scaling_factor is set (not just read)
rg -n "presolve_data\.objective_scaling_factor\s*=" -g "*.cpp" -g "*.cu" -B5 -A3

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look for any validation before using get_solver_obj_from_user_obj
rg -n "get_solver_obj_from_user_obj" -g "*.cpp" -g "*.cu" -B5 -A5

Repository: NVIDIA/cuopt

Length of output: 1979


🏁 Script executed:

# Check if there's a guard checking scaling_factor before division elsewhere
rg -n "!= 1 \|" -g "*.cpp" -g "*.cu" | grep -i "scaling"

Repository: NVIDIA/cuopt

Length of output: 345


🏁 Script executed:

# Verify the setter is indeed public and part of the API
rg -n "set_objective_scaling_factor" -g "*.hpp" -g "*.h" | head -20

Repository: NVIDIA/cuopt

Length of output: 1003


🏁 Script executed:

# Check if presolve_data structure could have objective_scaling_factor set to zero through initialization
rg -n "struct.*presolve_data|class.*presolve_data" -g "*.cpp" -g "*.cu" -g "*.hpp" -g "*.h" -A15 | head -60

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Check how presolve_data.objective_scaling_factor is populated
rg -n "presolve_data\.objective_scaling_factor\s*=" -g "*.cpp" -g "*.cu" -B3 -A3

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look at the function that initializes presolve_data to understand data flow
rg -n "void initialize_presolve_data|presolve_data\s*\{" -g "*.cpp" -g "*.cu" -B2 -A10

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Search for presolve_data structure definition and initialization
rg -n "presolve_data\s*\{|presolve_data\s*=" -g "*.cpp" -g "*.cu" | head -40

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look for presolve_data member initialization in constructors or functions
rg -n "\.objective_scaling_factor\s*=" -g "*.cpp" -g "*.cu" -B5 -A2 | head -100

Repository: NVIDIA/cuopt

Length of output: 870


🏁 Script executed:

# Search in the problem.cu file for how presolve_data is set up
rg -n "objective_scaling" cpp/src/mip/problem/problem.cu -B3 -A3

Repository: NVIDIA/cuopt

Length of output: 1249


🏁 Script executed:

# Find where presolve_data is initialized and how objective_scaling_factor gets its value
rg -n "presolve_data\." cpp/src/mip/problem/problem.cu | grep -v "presolve_data\.objective_scaling_factor\s*=" | head -30

Repository: NVIDIA/cuopt

Length of output: 2422


🏁 Script executed:

# Look for constructor or initialization of problem_t that sets up presolve_data
rg -n "problem_t::problem_t|problem_t<.*>::problem_t" cpp/src/mip/problem/problem.cu -A30 | head -80

Repository: NVIDIA/cuopt

Length of output: 4862


🏁 Script executed:

# Find where problem reads the objective scaling from user input and stores in presolve_data
rg -n "objective_scaling_factor" cpp/src/mip/problem/problem.cu -B2 -A2 | grep -v "v\.objective_scaling_factor"

Repository: NVIDIA/cuopt

Length of output: 854


🏁 Script executed:

# Find presolve_data constructor in problem.cu or header files
rg -n "presolve_data\s*\(" cpp/src/mip/problem/problem.cu -B2 -A10 | head -50

Repository: NVIDIA/cuopt

Length of output: 2655


🏁 Script executed:

# Search for presolve_data class/struct definition
find . -name "*.hpp" -o -name "*.h" | xargs grep -l "struct presolve_data\|class presolve_data" | head -5

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Find the presolve_data constructor definition
rg -n "presolve_data\s*\(" cpp/src/mip/problem/ -g "*.hpp" -g "*.h" -A15 | head -80

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Search for presolve_data_t definition (note the _t suffix)
rg -n "struct presolve_data_t|class presolve_data_t" -g "*.hpp" -g "*.h" -A30

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Find presolve_data_t constructor that takes optimization_problem_t
rg -n "presolve_data_t.*optimization_problem_t" -g "*.hpp" -g "*.h" -B2 -A15

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Search for presolve_data_t definition more broadly
rg -n "presolve_data_t" -g "*.hpp" -g "*.h" | head -20

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Search for the struct definition in problem.hpp
rg -n "struct presolve_data_t" -g "*.hpp" -g "*.h" -A40

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look in problem.hpp header file
find . -name "problem.hpp" -path "*/mip/problem/*" -exec cat {} \; | head -200

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Find header files in the mip/problem directory
find cpp/src/mip/problem -name "*.hpp" -o -name "*.h"

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# List files in the mip directory
ls -la cpp/src/mip/problem/

Repository: NVIDIA/cuopt

Length of output: 862


🏁 Script executed:

# Search for presolve_data definition in all header files
rg "objective_scaling_factor.*objective_offset" -g "*.hpp" -g "*.h" -B10 -A10

Repository: NVIDIA/cuopt

Length of output: 6130


🏁 Script executed:

# Read the presolve_data.cuh header file to understand the structure
cat cpp/src/mip/problem/presolve_data.cuh

Repository: NVIDIA/cuopt

Length of output: 3140


🏁 Script executed:

# Verify that there's no validation of objective_scaling_factor in the setter
cat -n cpp/src/linear_programming/optimization_problem.cu | sed -n '125,128p'

Repository: NVIDIA/cuopt

Length of output: 241


🏁 Script executed:

# Check if there's any documentation or comments about objective_scaling_factor validation
rg -n "objective_scaling_factor.*validate|objective_scaling_factor.*check\|objective_scaling_factor.*guard" -g "*.cpp" -g "*.cu" -g "*.hpp" -g "*.h"

Repository: NVIDIA/cuopt

Length of output: 38


Add validation to get_solver_obj_from_user_obj() or ensure objective_scaling_factor cannot be zero.

The division by presolve_data.objective_scaling_factor can fail if the scaling factor is zero or very close to zero. While the default value is 1.0, the public set_objective_scaling_factor() setter has no validation, allowing users to set this value to zero. Either validate the setter to reject zero/near-zero values, or add a guard in this function to handle invalid scaling factors safely. Note that the LP solver's equivalent code (in convergence_information.cu) uses explicit guards before applying scaling transformations, suggesting this is a known concern.

🤖 Prompt for AI Agents
In @cpp/src/mip/problem/problem.cu around lines 1783 - 1787, The method
get_solver_obj_from_user_obj currently divides by
presolve_data.objective_scaling_factor which can be zero; add validation: either
enforce non-zero/near-zero rejection in the public setter
set_objective_scaling_factor (reject values with fabs(value) < kEpsilon and
return error/throw) and document it, and also harden
get_solver_obj_from_user_obj by guarding against tiny scaling factors (check
fabs(presolve_data.objective_scaling_factor) < kEpsilon and either throw a clear
exception or use a safe fallback) so division by zero cannot occur; reference
the same epsilon and error messaging approach used in convergence_information.cu
for consistency.

i_t num_diving_threads = std::max(1, num_threads - num_bfs_threads);
branch_and_bound_settings.num_bfs_threads = num_bfs_threads;
branch_and_bound_settings.num_diving_threads = num_diving_threads;
branch_and_bound_settings.termination = &context.termination;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's find and examine the termination_checker_t class definition
rg -n 'class termination_checker_t|struct termination_checker_t' --type cpp --type h -A 20

Repository: NVIDIA/cuopt

Length of output: 1718


🏁 Script executed:

#!/bin/bash
# Next, let's look at the context around lines 186 and 235 in solver.cu
head -n 250 cpp/src/mip/solver.cu | tail -n 70

Repository: NVIDIA/cuopt

Length of output: 3727


🏁 Script executed:

#!/bin/bash
# Search for should_terminate implementation
rg -n 'should_terminate' --type cpp --type h -B 2 -A 5

Repository: NVIDIA/cuopt

Length of output: 1015


🏁 Script executed:

#!/bin/bash
# Get the full termination_checker_t class definition
cat -n cpp/src/utilities/termination_checker.hpp

Repository: NVIDIA/cuopt

Length of output: 3051


🏁 Script executed:

#!/bin/bash
# Search for should_terminate method specifically in termination_checker_t
rg -n 'should_terminate' cpp/src/utilities/termination_checker.hpp -B 2 -A 8

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Find and examine timer_t implementation
rg -n 'class timer_t|struct timer_t' --type cpp --type h -A 30

Repository: NVIDIA/cuopt

Length of output: 1817


🏁 Script executed:

#!/bin/bash
# Check if set_termination_callback is called after the async launch of B&B
rg -n 'set_termination_callback' cpp/src/mip/solver.cu -B 5 -A 5

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Search across entire repo for set_termination_callback usage
rg -n 'set_termination_callback' --type cpp --type h

Repository: NVIDIA/cuopt

Length of output: 169


🏁 Script executed:

#!/bin/bash
# Check if termination_checker_t is accessed by B&B from multiple threads
# Look at how B&B uses the termination pointer
rg -n 'termination->check\|termination_->check\|\.check()' cpp/src/mip/ --type cpp --type h -B 2 -A 2

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Find how branch_and_bound_t uses the termination pointer
rg -n 'settings.termination\|termination_\|termination->' cpp/src/mip/ --type cpp --type h -B 2 -A 2

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Look for the branch_and_bound_t class and how it accesses termination
fd -t f '\.(cu|cpp|h|hpp)$' cpp/src/ -name '*branch_and_bound*' | head -5

Repository: NVIDIA/cuopt

Length of output: 286


🏁 Script executed:

#!/bin/bash
# Find branch_and_bound files
find cpp/src -name '*branch_and_bound*' -type f

Repository: NVIDIA/cuopt

Length of output: 139


🏁 Script executed:

#!/bin/bash
# Search for how branch_and_bound_settings.termination is actually used
rg 'settings\.termination|\.termination' cpp/src --type cpp --type h -B 3 -A 3 | head -50

Repository: NVIDIA/cuopt

Length of output: 2139


Add thread-safety synchronization to termination_checker_t.

The termination_callback_ and termination_callback_data_ members are accessed without synchronization: check() reads them from the B&B thread while set_termination_callback() writes them from the main thread. Add a mutex to protect concurrent access to these members, or use atomics if the callback pointer can be safely swapped.

🤖 Prompt for AI Agents
In @cpp/src/mip/solver.cu at line 186, termination_checker_t currently accesses
termination_callback_ and termination_callback_data_ from different threads
without synchronization; add thread-safety by protecting those members with a
mutex (e.g., add std::mutex mutex_ to termination_checker_t and lock in both
set_termination_callback() and check()) or, if preferred, make the function
pointer and data pointer atomic (std::atomic for the pointer types) and perform
atomic loads/stores in set_termination_callback() and check() so the B&B thread
always sees a consistent pair; ensure branch_and_bound_settings.termination
(which points to context.termination) still calls into the synchronized check()
method.

Comment on lines +141 to +142
detail::mip_solver_t<int, double> solver(
problem, default_settings, scaling, termination_checker_t(0));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Potential dangling reference: temporary passed to non-const reference parameter.

termination_checker_t(0) creates a temporary that is passed to mip_solver_t which expects termination_checker_t&. The temporary will be destroyed after the constructor completes, but mip_solver_t stores timer_ as a reference (see cpp/src/mip/solver.cu line 56). This results in a dangling reference when the solver is subsequently used.

Compare with cpp/tests/mip/multi_probe_test.cu (lines 153-164) which correctly creates a named variable.

🔧 Proposed fix
+  auto timer = termination_checker_t(std::numeric_limits<double>::infinity(),
+                                     termination_checker_t::root_tag_t{});
   detail::mip_solver_t<int, double> solver(
-    problem, default_settings, scaling, termination_checker_t(0));
+    problem, default_settings, scaling, timer);

Note: You'll need to add #include <limits> if not already present.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
detail::mip_solver_t<int, double> solver(
problem, default_settings, scaling, termination_checker_t(0));
auto timer = termination_checker_t(0);
detail::mip_solver_t<int, double> solver(
problem, default_settings, scaling, timer);
🤖 Prompt for AI Agents
In @cpp/tests/mip/load_balancing_test.cu around lines 141 - 142, The temporary
termination_checker_t(0) is being passed to the mip_solver_t constructor which
stores a reference (timer_) and thus causes a dangling reference; fix by
creating a named local variable like termination_checker_t timer(0) (or whatever
identifier is used in multi_probe_test.cu) and pass timer to the
detail::mip_solver_t<int, double> constructor instead of the temporary, ensuring
the named variable outlives the solver; also add #include <limits> if that test
needs it.

@github-actions
Copy link

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] cuOpt should exit the solve gracefully on SIGINT

1 participant