
[MiniLLM] baseline hyperparameter settings #333

@hyhy892

Description


Thank you for your excellent work. I'm trying to reproduce the KD baseline on GPT-2.
In the appendix of your paper, you mention that the training hyperparameters were obtained through a search.
Did you train for a smaller number of epochs and then select the learning rate based on the validation results? I ask because complete training runs are very costly.
Could you also share the training hyperparameter settings used for the final results reported in the paper?
I tried training gpt2-base with the hyperparameters provided in your code, modifying only the batch size, but the results I obtained are far below those in the paper:
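To make the question concrete, below is a minimal sketch of the kind of short-run learning-rate search I have in mind. The script name, flags, and validation metric here are placeholders of my own, not the actual MiniLLM training entry point:

```python
# Hypothetical short-run LR search: the entry point "finetune.py", its flags,
# and the "rougeL" field are assumptions for illustration, not the MiniLLM code.
import json
import subprocess

candidate_lrs = [5e-4, 1e-4, 5e-5]  # coarse grid of learning rates to try
results = {}

for lr in candidate_lrs:
    # Train each candidate for only a few epochs instead of the full schedule.
    subprocess.run([
        "python", "finetune.py",          # placeholder entry point
        "--model", "gpt2-base",
        "--lr", str(lr),
        "--epochs", "3",                  # short run only
        "--save-dir", f"results/lr_{lr}",
    ], check=True)
    # Read back the validation score (metric name is an assumption).
    with open(f"results/lr_{lr}/eval.json") as f:
        results[lr] = json.load(f)["rougeL"]

best_lr = max(results, key=results.get)
print("Best LR from short runs:", best_lr, results)
```

Is this roughly the procedure you used, or did the search involve full-length training for each candidate?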

[Screenshot of my evaluation results attached]
