Open
Description
Hi team,
I noticed that the accuracy reward calculation seems to involve responses from different images within a batch, which are generated in parallel across multiple processes.
However, the reward computation itself also appears to run independently in each process, with no synchronization between them. Since the reward depends on cross-process data, each process may only see its own share of the batch, which could make the accuracy reward inconsistent.
Is this behavior intentional? Or should there be a mechanism to synchronize the responses before computing the reward in a multi-process setting?
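To make the concern concrete, here is a minimal sketch in plain Python. It does not reproduce the repository's reward code: `centered_accuracy` is a hypothetical stand-in for any reward that depends on other samples in the batch, and `gather` only simulates a cross-process gather by concatenating per-rank lists. In a real multi-process run, something like `torch.distributed.all_gather_object` might play that role, but that is an assumption on my side.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[str, str]  # (model response, reference answer)


def centered_accuracy(samples: Sequence[Sample]) -> List[float]:
    """Hypothetical toy reward: per-sample correctness minus the mean
    correctness of `samples`, so each reward depends on the other samples."""
    correct = [1.0 if resp.strip() == ref.strip() else 0.0 for resp, ref in samples]
    mean = sum(correct) / len(correct)
    return [c - mean for c in correct]


def gather(shards: Sequence[Sequence[Sample]]) -> List[Sample]:
    """Simulated all-gather: concatenate the per-rank shards in rank order.
    (Assumption: a real job would use a distributed collective here.)"""
    return [sample for shard in shards for sample in shard]


def rewards_without_sync(shards: Sequence[Sequence[Sample]], rank: int) -> List[float]:
    """Each rank scores only its own shard, as the issue suspects happens."""
    return centered_accuracy(shards[rank])


def rewards_with_sync(shards: Sequence[Sequence[Sample]], rank: int) -> List[float]:
    """Each rank scores the gathered batch, then keeps the slice it owns."""
    rewards = centered_accuracy(gather(shards))
    start = sum(len(shard) for shard in shards[:rank])
    return rewards[start:start + len(shards[rank])]


# Two simulated processes, each holding responses for different images.
shards = [
    [("cat", "cat"), ("dog", "dog")],  # rank 0: both responses correct
    [("cat", "dog"), ("dog", "dog")],  # rank 1: one response correct
]

for rank in range(len(shards)):
    print(rank, rewards_without_sync(shards, rank), rewards_with_sync(shards, rank))
```

In this toy setup the two variants disagree on every rank, because the per-shard mean differs from the mean over the full batch. If the actual reward only uses samples that live on the same process, no gather would be needed.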
Thanks in advance for the clarification!