Process synchronization in the Accuracy Reward Method #3

@saitejatangudu

Hi team,

I noticed that the accuracy reward calculation seems to involve responses from different images within a batch, which are generated in parallel across multiple processes.

However, the reward computation itself also appears to run independently in each process, with no synchronization step in between. If the reward depends on responses held by other processes, each process would compute it from only its local share of the batch, which could make the accuracy reward inconsistent.

Is this behavior intentional? Or should there be a mechanism to synchronize the responses before computing the reward in a multi-process setting?
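To make the question concrete, here is a minimal sketch of the kind of synchronization I have in mind. It is not taken from this repository: `gather_per_rank`, `synced_accuracy_reward`, and `exact_match` are hypothetical names, and I am assuming the processes are set up with `torch.distributed`, so that `all_gather_object` can collect the responses from every rank before the reward is computed. When no process group is initialized, it simply falls back to the local data.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (response, reference answer)
RewardFn = Callable[[List[str], List[str]], List[float]]


def gather_per_rank(local_items: List[Pair]) -> Tuple[int, List[List[Pair]]]:
    """Return (this rank, items from every rank in rank order).

    Hypothetical helper: falls back to a single "rank 0" when torch is
    missing or no process group has been initialized.
    """
    try:
        import torch.distributed as dist
    except ImportError:
        return 0, [list(local_items)]
    if not (dist.is_available() and dist.is_initialized()):
        return 0, [list(local_items)]
    per_rank: List[List[Pair]] = [[] for _ in range(dist.get_world_size())]
    # Collective call: every rank must reach this line.
    dist.all_gather_object(per_rank, list(local_items))
    return dist.get_rank(), per_rank


def synced_accuracy_reward(
    local_responses: List[str],
    local_answers: List[str],
    reward_fn: RewardFn,
) -> List[float]:
    """Compute rewards over the full cross-process batch, then return
    only the entries that belong to this process."""
    rank, per_rank = gather_per_rank(list(zip(local_responses, local_answers)))
    all_pairs = [pair for chunk in per_rank for pair in chunk]
    all_rewards = reward_fn(
        [response for response, _ in all_pairs],
        [answer for _, answer in all_pairs],
    )
    # Offset of this rank's slice inside the flattened global batch.
    start = sum(len(chunk) for chunk in per_rank[:rank])
    return all_rewards[start:start + len(local_responses)]


def exact_match(responses: List[str], answers: List[str]) -> List[float]:
    """Placeholder reward, used only for illustration."""
    return [1.0 if r.strip() == a.strip() else 0.0
            for r, a in zip(responses, answers)]
```

With something like this, every process would see the same global set of responses before the reward is computed, at the cost of one extra collective call per step. If the reward only ever needs each process's own responses, the gather is unnecessary and the current behavior would be fine.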

Thanks in advance for the clarification!
