Thank you for releasing the source code and dataset.
I have a question about the training budget used in the script stage_one.sh. The training batch size is 128 with 32 samples per prompt, and your dataset (sparkle-reasoning/dsr40k) contains around 40k prompts. With total_epochs=30, does this mean the model is trained on 40k * 30 = 1200k prompts in total? I am training the same Qwen model on the same dataset, and the performance barely changes after 400 training steps with a batch size of 256. Could this be because the amount of training is not yet enough? Should I continue training to improve the in-domain performance? I have attached my training dynamics for the reward:
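For reference, here is a minimal sketch of the budget comparison I am describing, assuming the 40k-prompt dataset and the batch sizes mentioned above (the variable names are mine, not from stage_one.sh):

```python
# Hypothetical back-of-the-envelope budget comparison, assuming
# dataset size ~40k prompts (sparkle-reasoning/dsr40k).
dataset_size = 40_000

# Budget implied by stage_one.sh: 30 epochs over the dataset.
total_epochs = 30
paper_prompts = dataset_size * total_epochs  # 1,200,000 prompts

# My run so far: 400 training steps at batch size 256.
my_steps, my_batch = 400, 256
my_prompts = my_steps * my_batch             # 102,400 prompts

# Fraction of the stage-one budget I have covered.
fraction = my_prompts / paper_prompts        # ~8.5%
epochs_covered = my_prompts / dataset_size   # ~2.56 epochs

print(f"my run: {my_prompts:,} prompts = {epochs_covered:.2f} epochs "
      f"({fraction:.1%} of the 30-epoch budget)")
```

By this count my 400 steps correspond to under 3 epochs, i.e. less than a tenth of the 30-epoch budget, which is why I suspect undertraining.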
