-
Notifications
You must be signed in to change notification settings - Fork 9
Fix division by zero in WeightedRandomStreamRanker #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| // calculate sampling score | ||
| $r = pow(mt_rand() / $max_rand, (1 / $original_element->get_score())); | ||
| } | ||
| $H[strval($r)] = $element; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we need to be worried about this line though: $H[strval($r)] = $element;. An element with a score of zero should be a rare occurrence, but is it possible some elements will be lost? It seems that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i feel like we should have some logging somewhere to show us definitively whether a score of 0 actually happens... maybe we should add some here, if not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i see that we do have this logging, since the exception is happening in production. in that case, maybe a decision could be made here to just drop the element entirely if the weight ends up being 0. that seems to make sense to me? cc @nicola-barbieri @lucila i guess it's a separate broader question of what we should do here when two elements have the exact same score, because yeah, they'll collide here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might still be possible for two elements to have the same score, even if not zero though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When score->0 then r->infinity, in which case I think assigning zero to it is far from the actual final score.
It all depends on the requirements of this ranker and what a score=0 means:
- if it's an error it means nothing and we can give it indeed a very low score or even make this element an invalid one (you can do this below)
- If it's not an error and it's an acceptable value which is just not handled then
rshould be a high number and not zero or invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so what if, theoretically, we get two elements with the same score. Only one of those will end up in $H. Does this mean the other one gets discarded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another alternative I'm thinking about, to reduce the chances of having more than 1 element with the same score, is to do something like:
$score = $original_element->get_score();
if ($score === 0) {
$score = 0.001;
}
$r = pow(mt_rand() / $max_rand, (1 / $score));
so that we will still call mt_rand() with a very small value on the exp part.
thoughts? @eltongo
I think this is an alternative @lovemeblender had suggested before (internally on a huddle)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think it is also possible for two elements to have the same non-zero score, no? In that case, what we need is to fix $H, because this would only fix the case for the score being zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mt_rand() <= $max_rand
if that's true we're set with either zero or adding an very small number to avoid collisions ^ collisions will be handled gracefully I think as the elements will just appear adjacently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with fixing the division by zero, maybe adding an epsilon to all input scores:
$score = $score + $epsilon (0.00001)
score = 0 shouldn't happen in the first place (unless rounding errors). which element is causing this ?
| $original_element = $element->get_original_element(); | ||
| // calculate sampling score | ||
| $r = pow(mt_rand() / $max_rand, (1 / $original_element->get_score())); | ||
| if ($original_element->get_score() === 0.0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just as a note, if for any reason this comes as 0 instead, 0===0.0 evaluates to false
What and why? 🤔
There's a division by zero bug in
WeightedRandomStreamRankerwhen the score for an element is zero. This PR fixes that.Testing Steps ✍️
A code review, and ensuring tests pass is fine in this case.
You're it! 👑
@Automattic/stream-builders