Thanks. You can fix a global seed and lower the temperature to get a fixed quality score.
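
To illustrate why a fixed seed and a lower temperature make the score repeatable, here is a minimal standard-library sketch of temperature sampling. It is not VisualQuality-R1's actual decoding code: `sample_index`, `sample_run`, and the logit values are hypothetical names and numbers chosen only for this example.

```python
import math
import random


def sample_index(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature).

    A temperature of 0 (or below) is treated as greedy decoding (argmax).
    """
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_run(logits, temperature, seed, steps=20):
    """Draw `steps` samples using a private generator seeded with `seed`."""
    rng = random.Random(seed)
    return [sample_index(logits, temperature, rng) for _ in range(steps)]


# Made-up logits standing in for next-token scores (assumption).
logits = [2.0, 1.5, 0.3]

# Same seed, same temperature: the sampled sequence is identical.
same_seed = sample_run(logits, 1.0, seed=0) == sample_run(logits, 1.0, seed=0)

# Temperature 0 (greedy): the result no longer depends on the seed at all.
greedy_a = sample_run(logits, 0.0, seed=1)
greedy_b = sample_run(logits, 0.0, seed=2)
```

With a real model the same idea applies at the decoding step. If inference goes through Hugging Face `transformers` (an assumption about your setup), calling `transformers.set_seed(...)` fixes the seed and passing `do_sample=False` to `generate` gives greedy decoding.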


________________________________
From: Omntrix ***@***.***>
Sent: January 19, 2026 10:18
To: TianheWu/VisualQuality-R1 ***@***.***>
Cc: Subscribed ***@***.***>
Subject: [Ext] [TianheWu/VisualQuality-R1] the problem of reward model (Issue #10)

Omntrix created an issue (TianheWu/VisualQuality-R1#10)

Hi, this is nice work. I'd like to use VisualQuality-R1 as the reward model for reinforcement learning. However, I've found that VisualQuality-R1 gives different scores for the same image each time. This is unacceptable for GRPO training. Do you have any solutions?

the problem of reward model #10
