
Questions about the training process #2

@jinhangzhan

Description


Thank you for your source code and dataset.

I'd like to ask about the training budget used in the script stage_one.sh: the training batch size is 128 with 32 samples per prompt, and your dataset (sparkle-reasoning/dsr40k) contains around 40k prompts. With total_epochs=30, does this mean the model is trained on 40k * 30 = 1200k prompts in total?

I am training the same Qwen model on the same dataset, but the performance barely changes after about 400 training steps with a batch size of 256. I was wondering whether this is because the amount of training is not enough; should I continue training in order to improve the in-domain performance? I have attached my training reward dynamics:

[Image: training reward curve]
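For reference, here is the back-of-the-envelope calculation behind my question, as a small Python sketch. It assumes that "batch size" counts prompts per optimizer step and that the 32 samples are rollouts per prompt (my reading of stage_one.sh, not confirmed):

```python
# Back-of-the-envelope training-budget comparison.
# Assumption (not confirmed from the script): batch size counts prompts per
# optimizer step; the 32 samples are rollouts per prompt, so they add
# per-step compute but not distinct prompts.

dataset_prompts = 40_000   # sparkle-reasoning/dsr40k, approximate size
total_epochs = 30          # total_epochs from stage_one.sh

# Budget implied by stage_one.sh (128 prompts per step)
steps_per_epoch = dataset_prompts / 128               # ~312.5 steps
total_steps = steps_per_epoch * total_epochs          # ~9,375 steps
total_prompts_seen = dataset_prompts * total_epochs   # 1,200,000 prompts

# My run so far: batch size 256, stopped at 400 steps
my_prompts_seen = 256 * 400                           # 102,400 prompts
my_epochs = my_prompts_seen / dataset_prompts         # ~2.56 epochs

print(f"script budget: ~{total_steps:.0f} steps, {total_prompts_seen:,} prompts")
print(f"my run so far: {my_prompts_seen:,} prompts (~{my_epochs:.1f} epochs)")
```

If this accounting is right, my 400 steps cover only about 2.5 epochs of the roughly 30 the script budgets for, which is why I suspect undertraining.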
