Bad performance on large vision models

Hello there,

I am doing my best to learn how to use this optimizer, as I would very much like an auto-tuned optimizer that spares me endless days of fiddling with hyperparameters.  I have tried using YellowFin to train large vision models such as [MobileNet](https://arxiv.org/abs/1704.04861), but my results are always very disappointing compared to a traditional optimizer such as `SGD`.  I am less concerned about convergence time than about final loss/accuracy; I have found that YellowFin tends to converge to a much worse loss/accuracy than my SGD runs do.

I am posting here an example of training MobileNet on the ImageNet dataset with a batch size of 64, comparing the training and testing loss (as well as testing accuracy) over a few epochs.  In both cases, I apply a learning rate schedule that sets the learning rate factor to `0.3 ^ (epoch // 10)`, so the learning rate falls to 3/10 of its previous value every 10 epochs.  The effect of this schedule is easy to see in the `sgd` plot; the `yf` plot shows it less clearly.  In these figures, the training loss (per minibatch) is shown in blue and the testing loss (per epoch) is shown in red, both against the left axis.  The top-1 and top-5 accuracies on the testing dataset (per epoch) are shown in green, against the right axis.  Apart from the optimizer choice, all training settings are identical between the two runs, including minibatch size (64), dataset (ImageNet) and model architecture (MobileNet).
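
For reference, the schedule described above can be sketched in plain Python.  This is only an illustration of the formula `0.3 ^ (epoch // 10)`, not my actual training code; the function name `lr_factor` and its parameters `decay` and `step` are names made up for this sketch, and epochs are assumed to be counted from 0.

```python
def lr_factor(epoch, decay=0.3, step=10):
    """Step-decay multiplier: decay ** (epoch // step).

    `decay` and `step` are illustrative parameter names; the values
    0.3 and 10 are the ones quoted in the schedule above.
    Epochs are assumed to start at 0.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # Integer division keeps the factor constant within each 10-epoch block.
    return decay ** (epoch // step)


if __name__ == "__main__":
    # Print the factor at the start and end of the first three blocks.
    for epoch in (0, 9, 10, 19, 20, 29):
        print(epoch, lr_factor(epoch))
```

The factor stays at 1 for epochs 0 through 9, drops to 0.3 for epochs 10 through 19, and so on; it is applied as a multiplier on top of whatever base learning rate the optimizer uses.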

Here is a plot for an SGD optimizer run.  Note that this model is only partially trained, but it has trained long enough to show that it will converge to a significantly better loss than the YellowFin model below did:
![mobilenet SGD](https://user-images.githubusercontent.com/130920/36285968-71d6325e-1262-11e8-8017-06c00dd0918d.png)

Here is a plot for a YellowFin optimizer run:
![mobilenet YellowFin](https://user-images.githubusercontent.com/130920/36250128-bf7e10e2-11f1-11e8-887e-125547144027.png)

If there are any questions about my methodology, I would be happy to explain in greater detail.  There is nothing particularly special going on in my model; I am simply trying to determine why YellowFin converges to such poor results.
