Draft
lukasmolnar commented Jun 24, 2024
sheim reviewed Jul 2, 2024
Owner
sheim left a comment
Overall this looks good. I haven't checked the GePPO implementation in detail; is there something specific you want me to look at closely?
activation="elu",
init_noise_std=1.0,
normalize_obs=True,
store_pik=False,
Owner
Use readable variable names. What is "pik"?
Owner
I think it would be easy enough to create a new actor class that inherits from the vanilla actor. What do you think?
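A rough sketch of how such a subclass could look, assuming "pik" stands for π_k, the policy snapshot taken before an update (as in GePPO's notation). `Actor`, `BehaviorTrackingActor`, and `log_prob_behavior_policy` are hypothetical names for illustration, not the classes in this repository, and the toy 1-D Gaussian stands in for the real network:

```python
import math


class Actor:
    """Stand-in for the vanilla actor (hypothetical, heavily simplified)."""

    def __init__(self, mean=0.0, std=1.0):
        self.mean = mean
        self.std = std

    def log_prob(self, action):
        # Log-density of a 1-D Gaussian policy evaluated at `action`.
        z = (action - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std) - 0.5 * math.log(2 * math.pi)


class BehaviorTrackingActor(Actor):
    """Actor that also remembers the log-prob under the pre-update policy."""

    def __init__(self, mean=0.0, std=1.0):
        super().__init__(mean, std)
        # Replaces the `store_pik` flag: the extra state lives only here.
        self.log_prob_behavior_policy = None

    def store_behavior_log_prob(self, action):
        # Snapshot the current log-prob before the policy parameters change.
        self.log_prob_behavior_policy = self.log_prob(action)
        return self.log_prob_behavior_policy
```

With this split, the base actor needs no extra flag, and only the off-policy algorithm instantiates the subclass.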
learning/modules/utils/normalize.py
Outdated
mean = input.mean(tuple(range(input.dim() - 1)))
var = input.var(tuple(range(input.dim() - 1)))
# TODO: check this, it got rid of NaN values in first iteration
dim = tuple(range(input.dim() - 1))

# Implementation based on GePPO repo: https://github.com/jqueeney/geppo
@torch.no_grad
def compute_gae_vtrace(data, gamma, lam, is_trunc, actor, critic, rec=False):
Owner
Rule of thumb: don't abbreviate (rec --> recursive).
offpol_ratio = torch.exp(log_prob_pik - batch["log_prob"])

advantages = batch["advantages"]
if self.normalize_advantages:
Owner
This is currently set to False, which surprises me; normalizing advantages seemed to be quite important in PPO.
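For reference, the normalization in question is usually a per-batch standardization of the advantages. A minimal sketch in plain Python, assuming a flat list of advantages; `standardize_advantages` and `eps` are illustrative names, not this PR's code, and implementations differ on whether they use the population or sample standard deviation:

```python
import statistics


def standardize_advantages(advantages, eps=1e-8):
    """Rescale a batch of advantages to zero mean and (roughly) unit std."""
    mean = statistics.fmean(advantages)
    # Population standard deviation; eps guards against a zero-variance batch.
    std = statistics.pstdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]
```

The rescaling keeps the surrogate-loss magnitude comparable across batches, which is one reason it is commonly left enabled.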
counter += 1
self.mean_surrogate_loss /= counter

# Compute TV, add to self for logging
Issue ticket number and link
Fixes # (issue)
Describe your changes
Please include a summary of the change, including why you did this, and the desired effect.
Instructions for reviewers
Indicate anything in particular that you would like a code-reviewer to pay particular attention to.
Indicate steps to actually test code, including CLI instructions if different than usual.
Point out the desired behavior, and not just the "check that this appears" (otherwise the code reviewer will be lazy and just verify what you've already verified).
Checklist before requesting a review
`ruff format .` manually