Review the C++ implementation of the binary search function #53
base: develop
Conversation
cxx/isce3/core/Utilities.h (outdated)
```cpp
long long count = 0;
long long MAX_NUM_ITERATIONS = 1e11;
// ...
while (left <= right) {
```
Wow, nice job tracking down this issue! That's a subtle one.
Not sure I understand the logic behind this fix, though. Previously, if array contained NaN values, the code could enter an infinite loop. Now, if array contains NaN values, the code might spin for 100 billion useless iterations and then return the wrong answer? Is that really the behavior that we want?
Since NaN values in array are the underlying issue, why don't we just add a check for NaNs instead?
```diff
 #include <cmath>
 #include <stdexcept>

+const auto x = array[middle];
+if (std::isnan(x)) {
+    throw std::invalid_argument("input array may not contain NaN values");
+}
-if (array[middle] <= value) {
+if (x <= value) {
     left = middle;
-} else if (array[middle] > value) {
+} else if (x > value) {
     right = middle;
 }
```
I think that even if we handle the NaN case, it’s still important for this loop to have a fixed upper bound. But I like your suggestion and will include it. Thanks!
> it’s still important for this loop to have a fixed upper bound

Why?
I think it's a good practice. It's also the second rule in *The Power of 10: Rules for Developing Safety-Critical Code*.
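For illustration, the pattern that rule asks for might look like the sketch below (the helper name and iteration budget are mine, not from the PR):

```cpp
#include <stdexcept>

// Rule 2 pattern: every loop gets a fixed upper bound, and exceeding that
// bound is treated as a detectable error rather than looping forever.
template <typename Condition, typename Body>
void boundedLoop(Condition condition, Body body, int max_iterations)
{
    for (int i = 0; i < max_iterations; ++i) {
        if (!condition()) {
            return; // normal termination within the iteration budget
        }
        body();
    }
    throw std::runtime_error("loop exceeded its fixed upper bound");
}
```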
instead of using the formula above, why not just use `array.size()`? The number of iterations should be log2 of the array size, so you could do this instead of the approach I used above:

```cpp
const auto maxiter = static_cast<int>(std::ceil(std::log2(array.size())));
```

That seems fine too. I just thought my previous approach was a bit simpler, and it's nice that it's a compile-time constant so you don't have to do any additional math at runtime.
If we compute maxiter the way I described in the previous comment, it should always be equal to 64 on 64-bit machines, which isn't a very large number of iterations to try before giving up, so I figured it'd be reasonable even if it's a bit looser than the maximum number of iterations that are strictly necessary for a given array size.
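For concreteness, the two bounds being weighed in this exchange look roughly like this (a sketch; the variable names are mine, and the `constexpr` line is the earlier suggestion spelled out later in the thread):

```cpp
#include <climits>
#include <cmath>
#include <cstddef>

// Loose compile-time bound: the search interval can be halved at most once
// per bit of the index type, i.e. 64 times on a typical 64-bit machine.
constexpr int maxiter_loose = sizeof(std::size_t) * CHAR_BIT;

// Tight runtime bound for an array of size n: ceil(log2(n)) halvings.
inline int maxiter_tight(std::size_t n)
{
    return static_cast<int>(std::ceil(std::log2(static_cast<double>(n))));
}
```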
> I’m still not sure why you don't like the idea of having an upper bound.
I just think it's an unnecessary pessimization of the algorithm. Generally we want to minimize the number of instructions executed in tight loops like this. This change is adding an additional increment and comparison operation to each iteration. If the function is free of bugs, these extra operations shouldn't be necessary since the number of iterations will never exceed the specified maximum.
Looking at the implementations of binary search in libc++ and NumPy, neither one seems to implement an explicit check that the loop doesn't exceed some maximum number of iterations. Presumably these are high-quality, mature implementations that we should seek to emulate.
That said, I'm much happier with it now that the code raises an exception upon hitting the max number of iterations instead of just returning the wrong answer (using an assert would also be fine in this case). Pipelining and branch prediction will probably hide most of the latency, and the way the function is implemented is already probably pretty non-optimal anyway.
So, while I would still prefer to remove the max iterations check, I think my concerns are non-blocking if the team would like to include this check.
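For comparison, the classic `std::lower_bound`-style loop (paraphrased below, not libc++'s actual source) needs no iteration counter because the half-open search interval strictly shrinks on every pass, so termination is structural:

```cpp
#include <cstddef>

// Index of the first element not less than value, in lower_bound style.
// count strictly decreases each iteration, so the loop provably terminates.
std::size_t lowerBoundIndex(const double* data, std::size_t n, double value)
{
    std::size_t first = 0;
    std::size_t count = n;
    while (count > 0) {
        const std::size_t step = count / 2;
        const std::size_t mid = first + step;
        if (data[mid] < value) {
            first = mid + 1;  // discard [first, mid]
            count -= step + 1;
        } else {
            count = step;     // discard (mid, last)
        }
    }
    return first;
}
```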
> That seems fine too. I just thought my previous approach was a bit simpler, and it's nice that it's a compile-time constant so you don't have to do any additional math at runtime.
>
> If we compute `maxiter` the way I described in the previous comment, it should always be equal to 64 on 64-bit machines, which isn't a very large number of iterations to try before giving up, so I figured it'd be reasonable even if it's a bit looser than the maximum number of iterations that are strictly necessary for a given array size.
I also used `array.size()` instead of `std::ceil(std::log2(array.size()))` to be a bit looser on the maximum number of iterations. My goal with this PR was just to detect cases in which the binary search runs unexpectedly long so that we'd catch them. I don't feel very comfortable setting the maximum number of iterations to `std::ceil(std::log2(array.size()))`.
> > I’m still not sure why you don't like the idea of having an upper bound.
>
> I just think it's an unnecessary pessimization of the algorithm. Generally we want to minimize the number of instructions executed in tight loops like this. This change is adding an additional increment and comparison operation to each iteration. If the function is free of bugs, these extra operations shouldn't be necessary since the number of iterations will never exceed the specified maximum.
>
> Looking at the implementations of binary search in libc++ and NumPy, neither one seems to implement an explicit check that the loop doesn't exceed some maximum number of iterations. Presumably these are high-quality, mature implementations that we should seek to emulate.
Well, I’m also sure their code went through much more scrutiny than ours. We don't expect our code to have bugs until we find them =). And thanks for sharing these other implementations!
> That said, I'm much happier with it now that the code raises an exception upon hitting the max number of iterations instead of just returning the wrong answer (using an `assert` would also be fine in this case). Pipelining and branch prediction will probably hide most of the latency, and the way the function is implemented is already probably pretty non-optimal anyway. So, while I would still prefer to remove the max iterations check, I think my concerns are non-blocking if the team would like to include this check.
Thank you, @gmgunter. I just don't want to use `std::ceil(std::log2(array.size()))` because I think it's too strict. Do you think we can use `array.size()`, or do you still prefer your suggestion with `constexpr int maxiter = sizeof(std::size_t) * CHAR_BIT;`?
I think we agree that, at least in theory, the binary search should take at most `std::ceil(std::log2(array.size()))` iterations, not `array.size()`?
If we use the stricter upper bound, and the code does have a bug that causes it to be exceeded, then that will just raise an exception that alerts us to the bug so we can fix it. I'm not sure why that'd be more risky than using a looser tolerance that allows such bugs to go undetected.
Beyond that, my concern is that using `array.size()` instead of `std::ceil(std::log2(array.size()))` would be misleading for someone reading the code. If I was reading this function body for the first time, I think I would be very confused about why the authors expected binary search to take up to `array.size()` iterations.
Yeah, I have my own concerns about this function too. It was not my intent to review its functionality since we've been using it for a while. I'd avoid being too strict in these edge cases since we are so close to the NISAR launch, but it's probably a good idea to take a deeper look into it. Things that I found:
- I don't fully understand why we need to stay in the loop if `left == right` in:

  ```cpp
  while (left <= right) {
      ...
  }
  ```
- Also, if `left` is equal to `right - 1`, we always return `left` in the next skip condition, even if `right` is closer to `value`:

  ```cpp
  if (left == (right - 1)) {
      index = left;
      return index;
  }
  ```
The conditions look a bit strange to me. I tried to improve the function and make it more intuitive in commits 96915ae and 8893ba8. I also added some unit tests in commit 0da209a.
Now, I feel more confident in using `std::ceil(std::log2(array.size()))`. I also added +1 just to avoid being too strict.
I hope the changes make sense. Please let me know your thoughts.
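Putting the pieces of this thread together, a minimal sketch of the bounded, NaN-checked search might look like the following (illustrative only, not the code in the commits above; the function name and structure are mine):

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Sketch combining the ideas from this thread: NaN check, a
// ceil(log2(n)) + 1 iteration budget, and returning the left index.
long binarySearchSketch(const std::vector<double>& array, double value)
{
    if (array.empty()) {
        throw std::invalid_argument("array may not be empty");
    }
    // ceil(log2(n)) halvings suffice; the +1 loosens the bound slightly.
    const int maxiter = static_cast<int>(std::ceil(std::log2(
                                static_cast<double>(array.size())))) + 1;

    long left = 0;
    long right = static_cast<long>(array.size()) - 1;
    for (int iter = 0; iter <= maxiter; ++iter) {
        if (left >= right - 1) {
            return left; // interval has narrowed to adjacent indices
        }
        const long middle = left + (right - left) / 2;
        const auto x = array[middle];
        if (std::isnan(x)) {
            throw std::invalid_argument("array may not contain NaN values");
        }
        if (x <= value) {
            left = middle;  // value lies in [middle, right]
        } else {
            right = middle; // value lies in [left, middle)
        }
    }
    throw std::runtime_error("binary search exceeded its iteration budget");
}
```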
It turns out that `Topo.cpp` doesn't want the closest point (index) to `value` in the binary search. It wants the left index because of the linear interpolation that happens a few lines below (see code here). So, I added an argument to `binarySearch()` to force picking the left index (`always_pick_left`) and updated the unit test accordingly. Without these updates, the computation of the layover-shadow mask fails at some points. See updates in commit 563fea9.
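To illustrate why the caller wants the left bracketing index rather than the nearest one, here is a hypothetical sketch (the names `t`, `y`, and `interpolateAt` are mine, not the actual `Topo.cpp` code):

```cpp
#include <vector>

// Given the left bracketing index i (so that t[i] <= value <= t[i + 1]),
// linear interpolation reads samples i and i + 1. Returning the left index
// from binarySearch() guarantees that i + 1 is still a valid sample.
double interpolateAt(const std::vector<double>& t,
                     const std::vector<double>& y, double value, long i)
{
    const double w = (value - t[i]) / (t[i + 1] - t[i]); // weight in [0, 1]
    return (1.0 - w) * y[i] + w * y[i + 1];
}
```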
My understanding is that this issue could be avoided if the user passed correct data (a DEM without NaNs) to the module. I can imagine that downstream workflows would check the inputs before passing them to this core module. Anyway, for now I removed the R4.1 milestone, as this does not seem NISAR- or OPERA-critical, and it requires MCR approval to be added to the milestone of the next delivery.
This PR sets an upper limit on the number of iterations for the binary search in ISCE3. It addresses an issue reported here, where Topo processing runs indefinitely. The root cause was traced to the presence of NaNs in the DEM, which prevented ISCE3’s binary search from converging to a solution. To prevent such cases from causing infinite loops, this PR adds a maximum iteration threshold to the while loop in the binary search.
(UPDATE) This PR now reviews the binary search function `binarySearch()` entirely and adds unit tests to verify its correctness.