tztechno/tz_openr1

tz_openr1


PJ_0509

Check whether GRPO further improves scores when applied to an SFT fine-tuned model.

  • Model: Qwen2.5-1.5B
  • Number of samples: 1000 (sft), 1000 (rl), 1000 (eval)
  • Epochs: 20 (sft), 1by1 (rl)
  • Methods: SFT, GRPO
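
For orientation, here is a minimal sketch of the group-relative advantage that GRPO is generally described as using: the rewards of several completions sampled for the same prompt are centered on the group mean and scaled by the group standard deviation. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration. They are not taken from this repository's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Return one advantage per completion sampled for a single prompt.

    Hypothetical helper: centers each reward on the group mean and
    divides by the group standard deviation (population form assumed).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one prompt, scored 1.0 if correct, else 0.0
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean get a positive advantage and those below get a negative one. A group whose rewards are all equal yields zero advantages, so it contributes no learning signal.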

PJ_0426

Check whether RL methods further improve scores when applied to an SFT fine-tuned model.

  • Model: Qwen2.5-0.5B
  • Number of samples: 400 (sft, rl), 400 (eval)
  • Epochs: 10 (sft), 1 (rl)
  • Methods: SFT, GRPO, CPPO, DRGRPO, DRGRPOCPPO, RAFT, REINFORCE
