-
-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add centered version of RMSProp #1778
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this! I've left a couple minor thoughts.
Co-authored-by: Dhairya Gandhi <[email protected]>
Co-authored-by: Dhairya Gandhi <[email protected]>
Thanks, Python brings bad habits. |
src/optimise/optimisers.jl
Outdated
centered::Bool | ||
state::IdDict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we consider this an API change? Might have to add a deprecation warning + getproperty overload forwarding acc
to state
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we'll have to overload setproperty! also then, to avoid issues where the optimiser state is saved to properly resume training? Still, if it's saved in a .bson file, there will be problems. The easiest way to avoid the API change is perhaps to just define the centered version as a different optimiser. I like the idea of just having one optimiser with the centered flag though.
Another option is to create the new RMSProp version under a different name, and use a deprication warning for RMSProp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might just leave acc
as it is as the symbol. That would resolve the deprecation. Would there be a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that help? The IdDict doesn't contain the same information anymore, so I don't see that helping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does contain the same information when centered = false
though, which is exactly what we want for backwards compatibility.
My concern was that both renaming and adding fields seem to be SemVer breaks. The former is easier to address, but the latter would not play well with code that uses reflection. Serialization isn't as big of a concern because optimizer state is basically useless once serialized (IdDict references will no longer match up).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could move centered
to a type parameter like RMSProp{true/false}
to avoid adding a field to the optimiser.
src/optimise/optimisers.jl
Outdated
if o.centered | ||
@. Δ_ave = ρ * Δ_ave + (1 - ρ) * Δ | ||
end | ||
@. Δ *= η / (√(acc - Δ_ave^2) + ϵ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@. Δ *= η / (√(acc - Δ_ave^2) + ϵ) | |
@. Δ *= η / (√(acc - Δ_ave * conj(Δ_ave)) + ϵ) |
src/optimise/optimisers.jl
Outdated
@@ -110,39 +110,51 @@ function apply!(o::Nesterov, x, Δ) | |||
end | |||
|
|||
""" | |||
RMSProp(η = 0.001, ρ = 0.9) | |||
RMSProp(η = 0.001, ρ = 0.9, centered = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On master there is RMSProp(η::Real, ρ::Real, ϵ::Real)
.
Probably this option should be controlled by a keyword, matching FluxML/Optimisers.jl#51
This need not hold up v0.13, since it adds a feature without breaking anything. |
@ludvigk are you planning to follow up on this? If not, I'd be willing to open a new PR with this. |
Codecov Report
@@ Coverage Diff @@
## master #1778 +/- ##
==========================================
- Coverage 87.94% 87.84% -0.11%
==========================================
Files 19 19
Lines 1485 1489 +4
==========================================
+ Hits 1306 1308 +2
- Misses 179 181 +2
Continue to review full report at Codecov.
|
I added the suggestions from here to the pull request. Changed "state" back to "acc", and merged with the new epsilon argument for RMSprop. Not sure what's left? |
I think these two signatures are a bit too do-what-I-mean:
4 is a lot of positional arguments, especially if their order isn't fixed. I would prefer it to be only a keyword argument. Ideally such a keyword should match FluxML/Optimisers.jl#51 . |
Added centered version of RMSProp, which exists in other frameworks (pytorch, TF).
Not sure if it makes more sense to have a separate
CenteredRMSProp
(orCRMSProp
) struct for the centered version.PR Checklist