Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add centered version of RMSProp #1778

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

ludvigk
Copy link

@ludvigk ludvigk commented Nov 23, 2021

Added centered version of RMSProp, which exists in other frameworks (pytorch, TF).

Not sure if it makes more sense to have a separate CenteredRMSProp (or CRMSProp) struct for the centered version.

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable
  • API changes require approval from a committer (different from the author, if applicable)

Copy link
Member

@DhairyaLGandhi DhairyaLGandhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this! I've left a couple minor thoughts.

src/optimise/optimisers.jl Outdated Show resolved Hide resolved
src/optimise/optimisers.jl Outdated Show resolved Hide resolved
ludvigk and others added 2 commits November 23, 2021 15:21
@ludvigk
Copy link
Author

ludvigk commented Nov 23, 2021

Thanks for looking into this! I've left a couple minor thoughts.

Thanks, Python brings bad habits.

src/optimise/optimisers.jl Show resolved Hide resolved
Comment on lines 140 to 141
centered::Bool
state::IdDict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we consider this an API change? Might have to add a deprecation warning + getproperty overload forwarding acc to state.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we'll have to overload setproperty! also then, to avoid issues where the optimiser state is saved to properly resume training? Still, if it's saved in a .bson file, there will be problems. The easiest way to avoid the API change is perhaps to just define the centered version as a different optimiser. I like the idea of just having one optimiser with the centered flag though.

Another option is to create the new RMSProp version under a different name, and use a deprication warning for RMSProp.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might just leave acc as it is as the symbol. That would resolve the deprecation. Would there be a problem?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that help? The IdDict doesn't contain the same information anymore, so I don't see that helping.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does contain the same information when centered = false though, which is exactly what we want for backwards compatibility.

My concern was that both renaming and adding fields seem to be SemVer breaks. The former is easier to address, but the latter would not play well with code that uses reflection. Serialization isn't as big of a concern because optimizer state is basically useless once serialized (IdDict references will no longer match up).

Copy link
Member

@DhairyaLGandhi DhairyaLGandhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could move centered to a type parameter like RMSProp{true/false} to avoid adding a field to the optimiser.

if o.centered
@. Δ_ave = ρ * Δ_ave + (1 - ρ) * Δ
end
@. Δ *= η / (√(acc - Δ_ave^2) + ϵ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@. Δ *= η / ((acc - Δ_ave^2) + ϵ)
@. Δ *= η / ((acc - Δ_ave * conj(Δ_ave)) + ϵ)

@ToucheSir ToucheSir added this to the v0.13 milestone Feb 4, 2022
@ToucheSir ToucheSir mentioned this pull request Feb 5, 2022
@@ -110,39 +110,51 @@ function apply!(o::Nesterov, x, Δ)
end

"""
RMSProp(η = 0.001, ρ = 0.9)
RMSProp(η = 0.001, ρ = 0.9, centered = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On master there is RMSProp(η::Real, ρ::Real, ϵ::Real).

Probably this option should be controlled by a keyword, matching FluxML/Optimisers.jl#51

@mcabbott mcabbott removed this from the v0.13 milestone Mar 5, 2022
@mcabbott
Copy link
Member

mcabbott commented Mar 5, 2022

This need not hold up v0.13, since it adds a feature without breaking anything.

@cossio
Copy link
Contributor

cossio commented May 11, 2022

@ludvigk are you planning to follow up on this?

If not, I'd be willing to open a new PR with this.

@ludvigk
Copy link
Author

ludvigk commented May 12, 2022

@cossio

@ludvigk are you planning to follow up on this?

If not, I'd be willing to open a new PR with this.

I forgot about this. Feel free to open a new PR, or I can update this one this weekend.

@codecov-commenter
Copy link

Codecov Report

Merging #1778 (fbffc5c) into master (1f82da4) will decrease coverage by 0.10%.
The diff coverage is 62.50%.

@@            Coverage Diff             @@
##           master    #1778      +/-   ##
==========================================
- Coverage   87.94%   87.84%   -0.11%     
==========================================
  Files          19       19              
  Lines        1485     1489       +4     
==========================================
+ Hits         1306     1308       +2     
- Misses        179      181       +2     
Impacted Files Coverage Δ
src/optimise/optimisers.jl 92.77% <62.50%> (-0.98%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1f82da4...fbffc5c. Read the comment docs.

@ludvigk
Copy link
Author

ludvigk commented May 22, 2022

I added the suggestions from here to the pull request. Changed "state" back to "acc", and merged with the new epsilon argument for RMSprop. Not sure what's left?

@mcabbott
Copy link
Member

mcabbott commented May 23, 2022

I think these two signatures are a bit too do-what-I-mean:

RMSProp(η::Real, ρ::Real, centered::Bool, ϵ::Real)
RMSProp(η::Real, ρ::Real, ϵ::Real)

4 is a lot of positional arguments, especially if their order isn't fixed. I would prefer it to be only a keyword argument. Ideally such a keyword should match FluxML/Optimisers.jl#51 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants