
1cycle Policy. Unfamiliar results #4

@karanchahal

Description


Hey,

I was implementing the 1cycle policy as an exercise, and I have a few observations from my experiments.

My setup:
Model: ResNet18
Batch size for training = 128
Batch size for testing = 100

Optimizer: optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
Total number of epochs: 26

1cycle policy: the learning rate goes from 0.01 up to 0.1 and back down over the first 24 epochs.

The model is then trained for 2 more epochs at a learning rate of 0.001.

No cyclic momentum or AdamW is used.
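For concreteness, here is a minimal sketch of the schedule described above, assuming linear ramps and per-epoch updates (the function name and parameter names are mine, not from any library):

```python
def one_cycle_lr(epoch, cycle_epochs=24, lr_min=0.01, lr_max=0.1, lr_final=0.001):
    """Piecewise-linear 1cycle schedule: lr_min -> lr_max -> lr_min over
    `cycle_epochs`, then a flat fine-tuning phase at `lr_final`."""
    if epoch >= cycle_epochs:
        # Final 2 epochs: train at the small fixed learning rate.
        return lr_final
    half = cycle_epochs / 2
    if epoch < half:
        # Ramp up from lr_min to lr_max over the first half of the cycle.
        return lr_min + (lr_max - lr_min) * epoch / half
    # Ramp back down from lr_max to lr_min over the second half.
    return lr_max - (lr_max - lr_min) * (epoch - half) / half
```

In practice the learning rate would usually be stepped per batch rather than per epoch (and PyTorch now ships `torch.optim.lr_scheduler.OneCycleLR`), but this captures the shape of the schedule I used.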

I achieved a test set accuracy of 93.4% in 26 epochs.

This seems like a big difference from the 70 epochs at 512 batch size that is quoted in your blog post.

Am I doing something wrong? And is the number of epochs a good metric to base your results on, given that it depends on the batch size?

The whole point of super-convergence is using high learning rates to converge more quickly, but it seems that training with lower learning rates (0.01–0.1 rather than 0.8–3) is actually faster.
