I've updated this PR to include recent changes to the master branch.

EDIT:
@JianGoForIt
I've just updated this pull request to sync with the current master.
This adds functionality to monitor the global gradient norm, both clipped and original. It works for both adaptive and manual clipping. I find it useful to observe gradient norm charts in TensorBoard, along with other metrics.
To summarize it in TF, you can just use

```python
tf.summary.scalar("global_gradient_norm", optimizer.grad_global_norm_monitor)
```

with an initialized

```python
optimizer = YFOptimizer(...)
```

OLD COMMENTS BELOW, NVM
Context
Sometimes it is useful to apply gradient clipping regulated by `clip_thresh`, which is passed to `tf.clip_by_global_norm` as the `clip_norm` parameter. If `tf.global_norm(gradients) > clip_norm`, the gradients are shrunk by the factor `clip_norm / tf.global_norm(gradients)`. See https://www.tensorflow.org/api_docs/python/tf/clip_by_global_norm
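For reference, here is a minimal, self-contained sketch of how that rescaling behaves (the gradient tensors and values below are made up for illustration and are not part of this PR):

```python
import tensorflow as tf

# Two toy "gradient" tensors with global norm sqrt(5**2 + 12**2) = 13.
grads = [tf.constant([3.0, 4.0]), tf.constant([12.0])]
clip_norm = 10.0

# clip_by_global_norm returns the rescaled tensors and the pre-clipping global norm.
clipped, global_norm = tf.clip_by_global_norm(grads, clip_norm)

with tf.Session() as sess:
    print(sess.run(global_norm))   # 13.0
    print(sess.run(clipped[1]))    # ~[9.23], i.e. 12 * (10 / 13); left unchanged when norm <= clip_norm
```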
Problem and proposed changes
It may not be so clear what is the appropriate value for
clip_threshto prevent NaN values. To help with that, I have added code to monitor current gradient global norm. Specifically, ifclip_threshis not None, gradient clipping is possible and the calculatedself._grads_normis reachable byself.grad_global_norm_monitor.I'll post some screenshots to show that everything works okay. I have an ongoing training which compares this branch with the current master. I am tracking internal optimizer values with this setup:
```python
optimizer = YFOptimizer(learning_rate=0.05, momentum=0.0, clip_thresh=10)
tf.summary.scalar("learning_rate", optimizer._lr_var)
tf.summary.scalar("momentum", optimizer._mu_var)
tf.summary.scalar("global_gradient_norm", optimizer.grad_global_norm_monitor)
```
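For a more complete picture, here is a hedged sketch of how these summaries can be written out during training. The toy regression model, the `yellowfin` import path, and the `minimize()` call are assumptions for illustration; the monitored attributes are the ones described above.

```python
import numpy as np
import tensorflow as tf
from yellowfin import YFOptimizer  # assumed import path for this repo's optimizer

# Toy model: fit y = 3x with a single weight, just to have gradients to monitor.
x = tf.placeholder(tf.float32, [None])
y = tf.placeholder(tf.float32, [None])
w = tf.Variable(0.0)
loss = tf.reduce_mean(tf.square(w * x - y))

optimizer = YFOptimizer(learning_rate=0.05, momentum=0.0, clip_thresh=10)
train_op = optimizer.minimize(loss)  # assumes a minimize() entry point; otherwise build grads and call apply_gradients

tf.summary.scalar("learning_rate", optimizer._lr_var)
tf.summary.scalar("momentum", optimizer._mu_var)
tf.summary.scalar("global_gradient_norm", optimizer.grad_global_norm_monitor)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter("./logs", sess.graph)
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        xs = np.random.randn(32).astype(np.float32)
        _, summary = sess.run([train_op, merged], feed_dict={x: xs, y: 3.0 * xs})
        writer.add_summary(summary, step)  # the norm chart then shows up in TensorBoard
    writer.close()
```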