Document what to do at nondifferentiable points #419

oxinabox · 2021-07-27T20:15:06Z

Closes #404

This is based on a conversation @awf and I had at RSE-Conf 2018.
To some extent we have been following it in ChainRules.jl since some time before then.

So here it is, written down more formally.

Here is the Docs Preview.
Feedback is appreciated.

codecov-commenter · 2021-07-27T22:32:31Z

Codecov Report

Merging #419 (80be21e) into main (99d56b1) will increase coverage by 0.11%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #419      +/-   ##
==========================================
+ Coverage   92.91%   93.03%   +0.11%     
==========================================
  Files          15       15              
  Lines         819      833      +14     
==========================================
+ Hits          761      775      +14     
  Misses         58       58

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/rule_definition_tools.jl | `96.27% <0.00%> (+0.02%)` | ⬆️ |
| src/projection.jl | `97.47% <0.00%> (+0.07%)` | ⬆️ |
| src/accumulation.jl | `97.22% <0.00%> (+0.07%)` | ⬆️ |
| src/tangent_types/thunks.jl | `95.00% <0.00%> (+0.10%)` | ⬆️ |
| src/tangent_types/tangent.jl | `85.50% <0.00%> (+0.32%)` | ⬆️ |


oxinabox · 2021-11-09T18:07:44Z

I have dropped almost all the sub/super stuff and stuck to lots of practical examples with discussion.
That seems more useful to most people.
IIRC this was @MasonProtter's suggestion.
I think it is better now.

Co-authored-by: Mason Protter <[email protected]>

Co-authored-by: Miha Zgubic <[email protected]>

MasonProtter · 2021-11-09T19:15:38Z

docs/src/nondiff_points.md

+
+This has a number of advantages.
+- It follows the rule that derivatives are zero at local minima (and maxima).
+- If you leave a gradient decent optimizer running it will eventually actually converge absolutely to the point -- where as with it being 1 or -1 it would never outright converge it would always flee.


Suggested change

- If you leave a gradient decent optimizer running it will eventually actually converge absolutely to the point -- where as with it being 1 or -1 it would never outright converge it would always flee.


The word "flee" is evocative, but maybe a little confusing here. Maybe we could say "oscillate" or "wobble" instead.
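To make the behaviour under discussion concrete, here is a minimal sketch of fixed-step gradient descent on `abs`, comparing a derivative of 0 at the kink with a derivative of 1. The helper names `dabs` and `descend` are hypothetical and exist only for this illustration; they are not part of ChainRules.jl.

```julia
# Hypothetical helper: derivative of abs, with a chosen value at x == 0.
dabs(x, at_zero) = x == 0 ? at_zero : sign(x)

# Plain gradient descent with a fixed learning rate; returns every iterate.
function descend(x0, lr, nsteps; at_zero)
    xs = [x0]
    for _ in 1:nsteps
        push!(xs, xs[end] - lr * dabs(xs[end], at_zero))
    end
    return xs
end

# Start at 1.0 with learning rate 0.5 so the iterates land on 0 exactly.
settles = descend(1.0, 0.5, 6; at_zero=0.0)  # [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
wobbles = descend(1.0, 0.5, 6; at_zero=1.0)  # [1.0, 0.5, 0.0, -0.5, 0.0, -0.5, 0.0]
```

With 0 at the kink the iterate stays at the minimum once it arrives; with 1 it is pushed off again on every visit, which looks more like oscillating than fleeing.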

MasonProtter · 2021-11-09T19:17:58Z

docs/src/nondiff_points.md

+```
+
+We do not have to worry about what to return for the side where it is not defined.
+As we will never be asked for the derivative at e.g. `x=-2.5` since the primal function errors.


Just a comment, not sure if it's important but the primal won't error if we make the argument complex. And in that case there's the interesting issue of the branch cut.
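A small sketch of the point about complex arguments. The example function from the docs page is not shown in this thread, so `log` stands in for it here as an assumption: it throws for a negative real argument, accepts the same value as a complex number, and gives different answers on the two sides of the branch cut along the negative real axis.

```julia
# `log` is a stand-in for the primal under discussion (an assumption).
# Real negative input: the primal throws a DomainError.
real_errors = try
    log(-2.5)
    false
catch err
    err isa DomainError
end

# Complex input: no error, but the sign of the zero imaginary part
# selects which side of the branch cut we are on.
above = log(complex(-2.5, 0.0))   # imaginary part is  π
below = log(complex(-2.5, -0.0))  # imaginary part is -π
```

So for complex inputs the "primal errors" argument does not apply, and the value itself jumps across the cut.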

MasonProtter · 2021-11-09T19:20:19Z

docs/src/nondiff_points.md

+ - If the derivative from one side is finite and the other isn't, say it is the derivative taken from finite side.
+ - When derivative from each side is not equal, strongly consider reporting the average
+
+Our goal as always, is to get a pragmatically useful result for everyone, which must by necessity also avoid a pathological result for anyone.


Maybe worth mentioning that we can't always get the result that's best for literally everyone, but we sometimes just have to do our best.

mzgubic

Content looks good, and this is definitely a useful addition.

My only suggestion would be to split this in two: put the short-version paragraph at the top in "writing good rules" and link to the rest of the text, which IMO belongs in the "maths" section.

oxinabox · 2021-11-10T13:02:03Z

The "writing good rules" section is already too long.
I think we can skip having a short summary of this there (for now at least).
People who need to answer this know they need to answer it, and so can look it up.

But I will move this under math.

awf

Sorry, I know it's annoying to add comments after you've already merged. I'm happy to do a PR instead if that's easier.

awf · 2021-11-10T14:47:27Z

docs/src/maths/nondiff_points.md

+
+This has a number of advantages.
+- It follows the rule that derivatives are zero at local minima (and maxima).
+- If you leave a gradient decent optimizer running it will eventually actually converge absolutely to the point -- where as with it being 1 or -1 it would never outright converge it would always flee.


awf · 2021-11-10T14:48:41Z

docs/src/maths/nondiff_points.md

+
+The other option for `x->ceil(x)` would be relax the problem into `x->x`, and thus  say it is 1 everywhere
+But that it too weird, if the use wanted a relaxation of the problem then they would provide one.
+We can not be imposing that relaxation on to `ceil` for everyone is not reasonable.


Either

> We can not be imposing that relaxation on to `ceil` for everyone.

or

> Imposing that relaxation on to `ceil` for everyone is not reasonable.
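For reference, a minimal numerical sketch of the two candidate answers for `ceil` away from the integers. The helper `central_diff` is hypothetical and only probes slopes by finite differences; it is not how ChainRules.jl defines any rule.

```julia
# Hypothetical helper: central finite difference, used only to probe slopes.
central_diff(f, x; h=1e-6) = (f(x + h) - f(x - h)) / (2h)

# Points chosen away from the integers, where ceil is locally constant.
points = [0.25, 0.5, 1.75, -3.4]

slopes_ceil = [central_diff(ceil, x) for x in points]         # exactly 0.0 everywhere
slopes_relaxed = [central_diff(identity, x) for x in points]  # approximately 1.0 everywhere
```

The function itself has slope 0 at these points; the value 1 only appears once `ceil` has been replaced by the relaxation `x -> x`.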

awf · 2021-11-10T14:57:27Z

docs/src/maths/nondiff_points.md

+
+We do not have to worry about what to return for the side where it is not defined.
+As we will never be asked for the derivative at e.g. `x=-2.5` since the primal function errors.
+But we do need to worry about at the boundary -- if that boundary point doesn't error.


Maybe replace with

But we do need to worry about at the boundary. The function is defined for x=0 (because exp is defined at -Inf), but AD will return <what will it return? Is it NaN?>
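A sketch that may help answer the open question. It assumes the example function is something like `x -> exp(2log(x))`, which this thread does not show, and it assembles the derivative mechanically from the chain rule by hand. It says nothing about what any particular AD system actually returns.

```julia
# Assumed example function (not shown in this thread); equals x^2 for x > 0.
f(x) = exp(2 * log(x))

# Derivative assembled mechanically from the chain rule:
# d/dx exp(u) = exp(u) * du/dx, with u = 2log(x) and du/dx = 2/x.
naive_df(x) = exp(2 * log(x)) * (2 / x)

primal_at_zero = f(0.0)        # log(0.0) == -Inf and exp(-Inf) == 0.0
naive_at_zero = naive_df(0.0)  # 0.0 * Inf, which is NaN
right_hand = naive_df(1e-8)    # close to 2e-8: the right-hand derivative tends to 0
```

Under these assumptions the primal is finite at `x = 0`, the mechanically assembled derivative is `NaN`, and the right-hand side derivative tends to 0.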

awf · 2021-11-10T14:58:15Z

docs/src/maths/nondiff_points.md

+As we will never be asked for the derivative at e.g. `x=-2.5` since the primal function errors.
+But we do need to worry about at the boundary -- if that boundary point doesn't error.
+
+Since we will never be asked about the left-hand side (as the primal errors), we can use just the right-hand side derivative.


This repeats lines 95-96.

awf · 2021-11-10T14:59:03Z

docs/src/maths/nondiff_points.md

+But this is more or less the same as choosing some large value -- in this case an extremely large value that will rapidly overflow.
+
+
+### Derivative on-finite and different on both sides


awf · 2021-11-10T14:59:45Z

docs/src/maths/nondiff_points.md

+```
+
+In this example, the primal is defined and finite, so we would like a derivative to defined.
+We are back in the case of a local minimal like we were for `abs`.


awf · 2021-11-10T14:59:55Z

docs/src/maths/nondiff_points.md

+plot(x-> sign(x) * cbrt(x))
+```
+
+In this example, the primal is defined and finite, so we would like a derivative to defined.


to be defined
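A quick numerical look at the quoted example `x -> sign(x) * cbrt(x)` near 0, as a sketch. The name `g` and the step sizes are chosen only for this illustration.

```julia
g(x) = sign(x) * cbrt(x)

# One-sided difference quotients at 0 for shrinking step sizes.
hs = [1e-3, 1e-6, 1e-9]
from_right = [(g(h) - g(0.0)) / h for h in hs]   # positive and growing
from_left = [(g(0.0) - g(-h)) / h for h in hs]   # negative and growing in magnitude
```

The primal is 0 at `x = 0` and positive on both sides, so 0 is a local minimum, while the one-sided slopes head towards `+Inf` and `-Inf`.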

awf · 2021-11-10T15:00:46Z

docs/src/maths/nondiff_points.md

+From the case studies a few general rules can be seen for how to choose a value that is _useful_.
+These rough rules are:
+ - Say the derivative is 0 at local optima
+ - If the derivative from one side is defined and the other isn't, say it is the derivative taken from defined side.


taken from the

awf · 2021-11-10T15:00:53Z

docs/src/maths/nondiff_points.md

+These rough rules are:
+ - Say the derivative is 0 at local optima
+ - If the derivative from one side is defined and the other isn't, say it is the derivative taken from defined side.
+ - If the derivative from one side is finite and the other isn't, say it is the derivative taken from finite side.


taken from the

awf · 2021-11-10T15:01:37Z

docs/src/maths/nondiff_points.md

+ - Say the derivative is 0 at local optima
+ - If the derivative from one side is defined and the other isn't, say it is the derivative taken from defined side.
+ - If the derivative from one side is finite and the other isn't, say it is the derivative taken from finite side.
+ - When derivative from each side is not equal, strongly consider reporting the average


I'm kind of inclined to remove "strongly"
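The rough rules quoted above can be sketched as a small decision function. `pick_derivative` is a hypothetical helper written for this illustration, not a ChainRules.jl API. It takes the two one-sided derivatives, with `nothing` marking a side where the primal is not defined, and it treats the average as a default to consider rather than a requirement.

```julia
# Hypothetical helper: choose a value from the left and right one-sided derivatives.
# `nothing` marks a side where the primal is not defined.
function pick_derivative(left, right; local_optimum=false)
    local_optimum && return 0.0                         # say 0 at local optima
    left === nothing && return right                    # only the right side is defined
    right === nothing && return left                    # only the left side is defined
    isfinite(left) && !isfinite(right) && return left   # take the finite side
    isfinite(right) && !isfinite(left) && return right
    return (left + right) / 2                           # sides differ: consider the average
end

pick_derivative(-1.0, 1.0; local_optimum=true)  # 0.0, the abs case
pick_derivative(0.0, 1.0)                       # 0.5
pick_derivative(nothing, 2.0)                   # 2.0
pick_derivative(3.0, Inf)                       # 3.0
```

For `abs` at 0 the local-optimum rule and the average agree, since the average of -1 and 1 is also 0. Cases where both sides are infinite with opposite signs are not handled by this sketch.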

oxinabox · 2021-11-10T20:18:54Z

@awf your comments look good to me, please do make a PR.

oxinabox added the documentation Improvements or additions to documentation label Jul 27, 2021

JuliaDiff deleted a comment from codecov-commenter Jul 27, 2021

oxinabox commented Jul 27, 2021

oxinabox commented Jul 27, 2021

oxinabox commented Jul 27, 2021

aisopous reviewed Jul 27, 2021

oxinabox commented Jul 27, 2021

MasonProtter reviewed Jul 28, 2021

mzgubic reviewed Jul 28, 2021

oxinabox commented Jul 28, 2021

c42f reviewed Jul 29, 2021

sethaxen self-requested a review July 29, 2021 10:36

oxinabox commented Jul 29, 2021

mzgubic reviewed Aug 25, 2021

oxinabox changed the title ~~Document sub/super-differential convention~~ Document what to do at nondifferentiable points Nov 9, 2021

oxinabox and others added 11 commits November 9, 2021 18:15

Document sub/super-differential convention

4695c69

I would like a markdown compatible spell-checker please

1242e74

add tldr

3a23527

Update docs/src/nondiff_points.md

c190bdf

Co-authored-by: Mason Protter <[email protected]>

Apply suggestions from code review

369b273

Co-authored-by: Miha Zgubic <[email protected]>

add gif

cd400b6

Add case study plots

78e4582

add and fill in some more examples

af3b139

wip

7807bfb

wip

4a7f0b4

wip

4cfd8a1

oxinabox added 3 commits November 9, 2021 18:16

wip

1099b2a

Finish examples

19f3399

Delete sub/super discussion

35f8633

oxinabox force-pushed the ox/subgrad_convention branch from 74e52b0 to 35f8633 Compare November 9, 2021 18:22

MasonProtter reviewed Nov 9, 2021

MasonProtter reviewed Nov 9, 2021

Update docs/src/nondiff_points.md

ca4b1ff

mzgubic approved these changes Nov 10, 2021


oxinabox added 2 commits November 10, 2021 13:16

move to under math docs, and fix plotting on CI

80be21e

update short version

efdb8d0

oxinabox merged commit 4d27d7e into main Nov 10, 2021

oxinabox deleted the ox/subgrad_convention branch November 10, 2021 13:36

awf reviewed Nov 10, 2021
