Document subgradient convention #404

Closed
oxinabox opened this issue Jul 20, 2021 · 3 comments · Fixed by #419
Labels
documentation Improvements or additions to documentation

Comments

@oxinabox
Member

The subgradient convention is not written down anywhere in particular.
One write-up of it is here:
FluxML/Zygote.jl#1036 (comment)

@oxinabox oxinabox added the documentation Improvements or additions to documentation label Jul 23, 2021
@mcabbott
Member

We say "If the derivative is not defined, but the subgradient contains zero, then just say the derivative is 0".

See discussion of clamp in JuliaDiff/ForwardDiff.jl#480 (comment) for a possible counterexample. If y = clamp(x,0,1) has zero gradient at the endpoints, your parameter will tend to get stuck there; it's more useful to give the nonzero gradient so that, if moving into the bulk lowers the loss, it can see that.
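To make the "stuck at the endpoint" point concrete, here is a minimal Python sketch (not ChainRules or ForwardDiff code). The loss `(clamp(x, 0, 1) - 0.5)^2`, the step size, and the helper names `clamp_grad` and `descend` are all assumptions chosen for illustration; `endpoint_grad` is the value we choose to report for the derivative of `clamp` exactly at 0 or 1.

```python
def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def clamp_grad(x, lo, hi, endpoint_grad):
    """Derivative of clamp; `endpoint_grad` is the convention used at x == lo or x == hi."""
    if lo < x < hi:
        return 1.0
    if x == lo or x == hi:
        return endpoint_grad  # the derivative is undefined here, so we pick a value
    return 0.0


def descend(x, endpoint_grad, steps=100, lr=0.1):
    """Plain gradient descent on the illustrative loss (clamp(x, 0, 1) - 0.5)^2."""
    for _ in range(steps):
        y = clamp(x, 0.0, 1.0)
        dloss_dy = 2.0 * (y - 0.5)
        x -= lr * dloss_dy * clamp_grad(x, 0.0, 1.0, endpoint_grad)
    return x


# Start exactly on the lower endpoint with each convention.
stuck = descend(0.0, endpoint_grad=0.0)  # zero at the endpoint: x never moves
moved = descend(0.0, endpoint_grad=1.0)  # nonzero at the endpoint: x moves into the bulk
```

In this toy setup the zero convention leaves the parameter at 0 forever, while the nonzero convention lets it move towards the minimiser at 0.5.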

@oxinabox
Member Author

Yeah, the wording I now have in mind is something like

You are free to choose any element of the subgradient. Choose the most useful. This will often mean choosing 0.

That counterexample is a good one to include, since it shows that the most useful choice will not always be zero.
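As a small illustration of "any element is allowed", the following Python sketch uses `abs` at 0, where every slope in [-1, 1] satisfies the subgradient inequality `f(y) >= f(x) + g * (y - x)`. The helper `is_subgradient` and the probe grid are hypothetical names introduced here, and the check is only a numerical spot test on a finite grid, not a proof.

```python
def is_subgradient(f, x, g, probes):
    """Spot-check the subgradient inequality f(y) >= f(x) + g * (y - x) on a grid."""
    return all(f(y) >= f(x) + g * (y - x) for y in probes)


# Probe points from -2.0 to 2.0 in steps of 0.1.
probes = [i / 10.0 for i in range(-20, 21)]

# Candidate slopes for abs at x = 0; all of these lie in [-1, 1].
candidates = (-1.0, -0.5, 0.0, 0.5, 1.0)
valid = [g for g in candidates if is_subgradient(abs, 0.0, g, probes)]

# A slope outside [-1, 1] violates the inequality.
too_steep = is_subgradient(abs, 0.0, 1.5, probes)


def abs_grad(x):
    """One possible convention: report 0 at the kink, the sign of x elsewhere."""
    if x == 0.0:
        return 0.0
    return 1.0 if x > 0.0 else -1.0
```

Here 0 is a natural pick because x = 0 is the minimiser of `abs`, so a zero gradient correctly signals a stationary point.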

Some more comments on the subgradient convention are here:
https://twitter.com/Awfidius/status/1419213506382028801

@mcabbott
Member

When all sub-gradients are finite, their mean is probably the neutral choice. But in the ForwardDiff clamp story, it's the one you can't have, as you can't evaluate both branches.

I think the mean is also the one used by FiniteDifferences, which causes some tests to fail with an implementation of maximum via findmax: JuliaDiff/ChainRules.jl#480.
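A rough Python sketch of why a symmetric finite difference lands on the mean at a kink, and how that can disagree with a rule that routes the whole gradient to a single maximal element. This is a plain central difference, which is an assumption: FiniteDifferences.jl uses more elaborate methods. `findmax_style_grad` is a hypothetical stand-in for a `findmax`-based rule, not the ChainRules implementation.

```python
def central_diff(f, x, h=2.0 ** -10):
    """Symmetric difference quotient; at a kink it averages the one-sided slopes."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def relu(x):
    return max(x, 0.0)


# One-sided slopes of relu at 0 are 0 and 1; the central difference gives their mean.
d_relu = central_diff(relu, 0.0)

# One-sided slopes of abs at 0 are -1 and 1; the mean is 0.
d_abs = central_diff(abs, 0.0)

# maximum of [a, 1.0] with a tie at a = 1.0: differentiate with respect to the first entry.
d_max_tie = central_diff(lambda a: max(a, 1.0), 1.0)


def findmax_style_grad(values):
    """Hypothetical rule: give the whole gradient to the first maximal element."""
    i = values.index(max(values))
    return [1.0 if j == i else 0.0 for j in range(len(values))]


# On a tie this reports 1.0 for the first entry, while the finite difference reports about 0.5.
rule_grad = findmax_style_grad([1.0, 1.0])
```

Both answers are valid elements of the set of subgradients, so a test comparing them at a tie fails even though neither is wrong.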
