Draft state.
Closes Gradient Clipping #596.

Adds Storage and Gradient view/mutating methods:

- Added the `dfdx::nn_traits::WithGrads` trait and the `dfdx_derives::WithGrads` proc macro, based on `ZeroGrads` (a rough sketch of the trait's shape follows this list). The `ZeroGrads` trait could be merged into `WithGrads` by mostly just merging their methods.
- Added the `dfdx_core::tensor::WithStorage` trait.
- Changed some methods on `Gradients`: `get_mut` is now `pub`; `get_ref` is now `pub`, with its requirement lowered from `&mut self` to `&self`.
- Added gradient clamping and clipping methods.
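As a rough sketch of the trait's shape (a minimal, self-contained approximation over a flat gradient buffer; in dfdx the trait is derived per-model by the proc macro and visits every tensor in the module, so the method names and signatures below are illustrative assumptions, not the PR's exact definitions):

```rust
// Minimal, self-contained sketch of a `WithGrads`-style API over a flat
// gradient buffer. NOTE: the method names and signatures here are
// illustrative assumptions only, not the PR's exact definitions.
pub trait WithGrads {
    /// Mutable view of this value's gradient buffer.
    fn grads_mut(&mut self) -> &mut [f32];

    /// Clamp every gradient element into `[min, max]`.
    fn grads_clamp(&mut self, min: f32, max: f32) {
        for g in self.grads_mut() {
            *g = g.clamp(min, max);
        }
    }

    /// Clip every gradient element into `[-max_abs, max_abs]`.
    fn grads_clip_value(&mut self, max_abs: f32) {
        self.grads_clamp(-max_abs, max_abs);
    }

    /// Uniformly rescale gradients so their global L2 norm is at most `max_norm`.
    fn grads_clip_norm(&mut self, max_norm: f32) {
        let norm: f32 = self.grads_mut().iter().map(|g| g * g).sum::<f32>().sqrt();
        if norm > max_norm {
            let scale = max_norm / norm;
            for g in self.grads_mut() {
                *g *= scale;
            }
        }
    }
}
```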
Example using `clip_norm`:
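(A minimal, self-contained sketch of the technique, using a plain `Vec<f32>` in place of dfdx's `Gradients`; the free `clip_norm` function is illustrative, not the PR's exact method.)

```rust
/// Uniformly rescale `grads` so its global L2 norm is at most `max_norm`.
fn clip_norm(grads: &mut [f32], max_norm: f32) {
    let norm: f32 = grads.iter().map(|g| g * g).sum::<f32>().sqrt();
    if norm > max_norm {
        let scale = max_norm / norm;
        for g in grads.iter_mut() {
            *g *= scale;
        }
    }
}

fn main() {
    // Pretend these values came out of a backward pass.
    let mut grads = vec![3.0_f32, 4.0]; // L2 norm = 5.0
    clip_norm(&mut grads, 1.0);
    println!("{grads:?}"); // [0.6, 0.8] -- norm is now 1.0, direction unchanged
}
```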
Note that `clip_norm` doesn't change the grads' "direction", because all grad values are scaled by the same factor, while `clip_value` does change the direction (because some values are changed while others are left intact). So for gradient descent, where the grads' direction is supposed to be somewhat followed, my guess is that `clip_norm` is better.
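A quick numeric illustration of that point (self-contained sketch, not dfdx code): scaling every component by a common factor preserves the ratio between components, while per-element clamping does not.

```rust
/// Clamp every element of `grads` into `[-max_abs, max_abs]`.
fn clip_value(grads: &mut [f32], max_abs: f32) {
    for g in grads.iter_mut() {
        *g = g.clamp(-max_abs, max_abs);
    }
}

/// Uniformly rescale `grads` so its global L2 norm is at most `max_norm`.
fn clip_norm(grads: &mut [f32], max_norm: f32) {
    let norm: f32 = grads.iter().map(|g| g * g).sum::<f32>().sqrt();
    if norm > max_norm {
        let scale = max_norm / norm;
        for g in grads.iter_mut() {
            *g *= scale;
        }
    }
}

fn main() {
    let mut by_value = vec![0.5_f32, 4.0];
    let mut by_norm = vec![0.5_f32, 4.0];
    clip_value(&mut by_value, 1.0);
    clip_norm(&mut by_norm, 1.0);
    // Original ratio between components: 0.5 / 4.0 = 0.125.
    println!("{by_value:?}"); // [0.5, 1.0]      -> ratio 0.5 (direction changed)
    println!("{by_norm:?}");  // ~[0.124, 0.992] -> ratio 0.125 (direction kept)
}
```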