Description
Hey,
I was implementing the 1 cycle policy as an exercise, and I have a few observations from my experiments.
My setup:
Model: ResNet18
Batch size for training: 128
Batch size for testing: 100
Optimizer: optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
Total number of epochs: 26
1 cycle policy: the learning rate goes from 0.01 to 0.1 and back over the first 24 epochs,
then the model is trained for 2 epochs at a learning rate of 0.001.
No cyclic momentum or AdamW is used.
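For reference, here is a minimal sketch of the schedule I am describing (the `one_cycle_lr` helper and the model setup are illustrative, not my exact training code):

```python
import torch.optim as optim
from torchvision.models import resnet18

net = resnet18(num_classes=10)  # ResNet18, as in the setup above

def one_cycle_lr(epoch, lr_min=0.01, lr_max=0.1, cycle_epochs=24, final_lr=0.001):
    """Learning rate for a 0-indexed epoch: linear ramp up, linear ramp down, then a small constant tail."""
    half = cycle_epochs / 2
    if epoch < half:              # first half of the cycle: 0.01 -> 0.1
        return lr_min + (lr_max - lr_min) * epoch / half
    if epoch < cycle_epochs:      # second half of the cycle: 0.1 -> 0.01
        return lr_max - (lr_max - lr_min) * (epoch - half) / half
    return final_lr               # last 2 epochs at 0.001

optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

for epoch in range(26):
    for group in optimizer.param_groups:
        group['lr'] = one_cycle_lr(epoch)
    # ... run one training epoch and evaluate on the test set here ...
```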
I achieved a test set accuracy of 93.4% in 26 epochs.
This seems like a big difference from the 70 epochs at a batch size of 512 quoted in your blog post.
Am I doing something wrong? Is the number of epochs a good metric to base the results on, given that it depends on the batch size?
The whole point of using super convergence is using high learning rates to converge more quickly, but it seems like training with lower learning rates (0.01-0.1 rather than 0.8-3) is faster.