Skip to content

Conversation

@eltongo
Copy link
Contributor

@eltongo eltongo commented Aug 10, 2023

What and why? 🤔

There's a division by zero bug in WeightedRandomStreamRanker when the score for an element is zero. This PR fixes that.

Testing Steps ✍️

A code review, and ensuring tests pass is fine in this case.

You're it! 👑

@Automattic/stream-builders

@eltongo eltongo requested review from a team and mhsdef and removed request for a team August 10, 2023 13:47
@coveralls
Copy link

coveralls commented Aug 10, 2023

Coverage Status

coverage: 79.311% (+0.006%) from 79.305% when pulling c877a55 on elton/weighted-ranker-divbyzero into 94e8bb2 on main.

// calculate sampling score
$r = pow(mt_rand() / $max_rand, (1 / $original_element->get_score()));
}
$H[strval($r)] = $element;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we need to be worried about this line though: $H[strval($r)] = $element;. An element with a score of zero should be a rare occurrence, but is it possible some elements will be lost? It seems that way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel like we should have some logging somewhere to show us definitively whether a score of 0 actually happens... maybe we should add some here, if not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i see that we do have this logging, since the exception is happening in production. in that case, maybe a decision could be made here to just drop the element entirely if the weight ends up being 0. that seems to make sense to me? cc @nicola-barbieri @lucila i guess it's a separate broader question of what we should do here when two elements have the exact same score, because yeah, they'll collide here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might still be possible for two elements to have the same score, even if not zero though

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When score->0 then r->infinity, in which case I think assigning zero to it is far from the actual final score.

It all depends on the requirements of this ranker and what a score=0 means:

  1. if it's an error it means nothing and we can give it indeed a very low score or even make this element an invalid one (you can do this below)
  2. If it's not an error and it's an acceptable value which is just not handled then r should be a high number and not zero or invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what if, theoretically, we get two elements with the same score. Only one of those will end up in $H. Does this mean the other one gets discarded?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another alternative I'm thinking about, to reduce the chances of having more than 1 element with the same score, is to do something like:

$score = $original_element->get_score();
if ($score === 0) {
  $score = 0.001;
}
$r = pow(mt_rand() / $max_rand, (1 / $score));

so that we will still call mt_rand() with a very small value on the exp part.

thoughts? @eltongo

I think this is an alternative @lovemeblender had suggested before (internally on a huddle)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think it is also possible for two elements to have the same non-zero score, no? In that case, what we need is to fix $H, because this would only fix the case for the score being zero.

Copy link

@lovemeblender lovemeblender Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mt_rand() <= $max_rand

if that's true we're set with either zero or adding an very small number to avoid collisions ^ collisions will be handled gracefully I think as the elements will just appear adjacently

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with fixing the division by zero, maybe adding an epsilon to all input scores:
$score = $score + $epsilon (0.00001)

score = 0 shouldn't happen in the first place (unless rounding errors). which element is causing this ?

@eltongo eltongo requested a review from a team as a code owner August 10, 2023 13:51
$original_element = $element->get_original_element();
// calculate sampling score
$r = pow(mt_rand() / $max_rand, (1 / $original_element->get_score()));
if ($original_element->get_score() === 0.0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just as a note, if for any reason this comes as 0 instead, 0===0.0 evaluates to false

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants