Process synchronization in the Accuracy Reward Method

Hi team,

I noticed that the accuracy reward calculation appears to use responses from different images within a batch, and those responses are generated in parallel across multiple processes.

However, the reward computation itself also appears to run in parallel, with no synchronization across processes. If so, each process would compute the reward from only its own local responses, which could make the accuracy reward inconsistent, since it depends on cross-process data.

Is this behavior intentional? Or should there be a mechanism to synchronize the responses before computing the reward in a multi-process setting?
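
To make the concern concrete, here is a minimal, self-contained sketch of what I mean. It is not taken from this repository: `batch_relative_reward` is a hypothetical reward that depends on a batch-level statistic, the two "ranks" are plain Python lists, and `simulated_all_gather` is a stand-in for a real collective (in a `torch.distributed` setup, `torch.distributed.all_gather_object` would usually play that role).

```python
from typing import List, Sequence


def accuracy(responses: Sequence[str], answers: Sequence[str]) -> List[float]:
    """Per-sample exact-match accuracy (1.0 if correct, else 0.0)."""
    return [1.0 if r.strip() == a.strip() else 0.0
            for r, a in zip(responses, answers)]


def batch_relative_reward(responses: Sequence[str],
                          answers: Sequence[str]) -> List[float]:
    """Hypothetical reward: per-sample accuracy minus the batch mean.

    Assumption for illustration only; the point is that the result
    depends on every sample in the batch it is given.
    """
    acc = accuracy(responses, answers)
    mean = sum(acc) / len(acc)
    return [a - mean for a in acc]


def simulated_all_gather(shards: Sequence[Sequence[str]]) -> List[str]:
    """Stand-in for a collective gather: concatenate shards in rank order."""
    return [item for shard in shards for item in shard]


def split_for_rank(values: Sequence[float], rank: int,
                   per_rank: int) -> List[float]:
    """Return the slice of the gathered rewards that belongs to `rank`."""
    return list(values[rank * per_rank:(rank + 1) * per_rank])


# Two simulated processes, each holding its own shard of the batch.
responses_by_rank = [["cat", "dog"], ["car", "bus"]]
answers_by_rank = [["cat", "dog"], ["cat", "dog"]]

# Without synchronization: each rank only sees its local responses.
local_rewards = [batch_relative_reward(r, a)
                 for r, a in zip(responses_by_rank, answers_by_rank)]

# With synchronization: gather first, compute once on the full batch,
# then hand each rank its own slice back.
all_responses = simulated_all_gather(responses_by_rank)
all_answers = simulated_all_gather(answers_by_rank)
global_rewards = batch_relative_reward(all_responses, all_answers)
rank1_rewards = split_for_rank(global_rewards, rank=1, per_rank=2)
```

In this toy case the local computation gives every sample a reward of 0.0 on both ranks, while the gathered computation separates the correct samples (0.5) from the incorrect ones (-0.5). If the actual accuracy reward has any batch-level dependency like this, computing it per process would change the result.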

Thanks in advance for the clarification!
