- MicroGrad is an autograd (automatic gradient) engine
- A variable with a positive derivative means slightly increasing it will increase the final loss.
- A variable with a negative derivative with respect to the loss means increasing it slightly will decrease the loss.
- Therefore, to decrease the loss:
- A step of learning rate times the derivative, when the derivative is negative, means decreasing the value by a nudge. But a negative derivative means decreasing it further will increase the loss, so we actually need to bump this value up.
- A step of learning rate times the derivative, when the derivative is positive, means increasing the value by a nudge. But a positive derivative means increasing it further will increase the loss, so we actually need to decrease this value.
- SEE HOW YOU NEED A MINUS IN THE UPDATE STEP:
  new_weight = old_weight - (learning_rate * derivative)
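A tiny worked example of that update step (the numbers and names here are illustrative, not from the notes):

```python
lr = 0.05            # learning rate (illustrative value)
weight = 0.8         # current value of some parameter
grad = -2.0          # derivative of the loss w.r.t. this parameter (negative here)

new_weight = weight - lr * grad   # the minus flips the nudge: -0.05 * -2.0 = +0.10
print(new_weight)                 # 0.9 -> the value is bumped UP, which lowers the loss
```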
- `__repr__` -> returns a string representation of the object.
- `__add__` -> returns a new object and defines the addition operation. For c = a + b, object a's `__add__` is called with `self` as a and `other` as b.
- `__mul__` -> returns a new object and defines the multiplication operation. For c = a * b, object a's `__mul__` is called with `self` as a and `other` as b.
- `__rmul__` -> is called when a `__mul__` for the left operand is not found. For example, in c = 2 * Value(2.9), 2 being an int cannot call `__mul__` on a Value, so the reverse mul is called on the second item, where `self` = the second item (the Value) and `other` = the first item.
- `__truediv__` -> function for the / operation with `self` and `other`.
- `__floordiv__` -> function for the // operation with `self` and `other`.
- Reverse versions exist for both: `__rtruediv__` and `__rfloordiv__`.
- `__sub__` -> function to perform the subtraction operation.
- `__neg__` -> function to perform the negation operation.
- `__call__` -> called when an object that has already been initialized (`__init__` already done) is called like a function. Example: x = [2.0, 3.0] (sample inputs), n = Neuron(2) (initialize a Neuron with 2 inputs); n(x) will call the `__call__` function with `self` as n and x as the argument.
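A quick illustration of which dunder Python invokes (assuming a micrograd-style `Value` class like the ones sketched further below):

```python
a = Value(2.0)
b = Value(3.0)

c = a + b      # Python calls a.__add__(b)
d = a * b      # Python calls a.__mul__(b)
e = 2 * a      # int 2 cannot multiply a Value, so Python falls back to a.__rmul__(2)
print(a)       # Python calls a.__repr__() to build the printed string
```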
- `data` -> the value it represents
- `_prev` -> tuple of children
- `grad` -> variable to store the gradient (not required at init)
- `_op` -> string for the operation which resulted in this value ('+', '-', '*', '**') (not required at init)
- `_label` -> string unique identifier
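A minimal sketch of a `Value` constructor with these attributes (a simplified take, not the exact micrograd source):

```python
class Value:
    def __init__(self, data, _children=(), _op='', _label=''):
        self.data = data          # the value this node represents
        self._prev = _children    # tuple of child Values that produced this one
        self.grad = 0.0           # gradient, filled in during backprop (not needed at init)
        self._op = _op            # operation that produced this value ('+', '-', '*', '**')
        self._label = _label      # unique identifier, handy when visualising the graph

    def __repr__(self):
        return f"Value(data={self.data})"
```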
- Base case: the derivative of the loss with respect to the loss will always be 1.0 (linear change).
- Multiplication: if c = a * b, the local derivative of c w.r.t. a is always the value of b, and the local derivative of c w.r.t. b is always the value of a.
- Addition (just passes on gradients): if c = a + b, the local derivative of c w.r.t. a is 1.0 and the local derivative of c w.r.t. b is 1.0. Therefore a plus just lets the gradients pass through locally.
- Tanh: if o = tanh(n), then the derivative of o w.r.t. n is 1 - o**2.
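A sketch of how these local derivatives can be wired into per-node `_backward` closures (a simplified version of the micrograd pattern, extending the `Value` sketch above; the `+=` accumulation is explained in the bullets below):

```python
import math

class Value:
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self._prev = _children
        self.grad = 0.0
        self._op = _op
        self._backward = lambda: None            # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')
        def _backward():
            self.grad += 1.0 * out.grad          # addition just passes the gradient through
            other.grad += 1.0 * out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def __rmul__(self, other):                   # lets 2 * Value(2.9) work
        return self * other

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), 'tanh')
        def _backward():
            self.grad += (1 - t ** 2) * out.grad # d tanh(n)/dn = 1 - tanh(n)**2
        out._backward = _backward
        return out
```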
- The derivatives are basically rates of change. If c = a + b and d = a * b, the derivatives of a and b affect not just c but also d.
- While backpropagating, if I set `self.grad` for a only to whatever comes from the c path (just passing it along) and then overwrite it when doing backprop from the d path, the gradients are only for the d path, since I overwrote them.
- Therefore, gradients should be accumulated: `+=` rather than `=`.
- Doing this, one should remember to reset the gradients to zero every time we run one step of backprop, otherwise the gradients will become very large.
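A tiny illustration of why the accumulation matters (uses the `Value` sketch above; in `b = a + a` the variable a reaches b through two paths):

```python
a = Value(3.0)
b = a + a          # a contributes to b through two paths

b.grad = 1.0       # base case: d(b)/d(b) = 1.0
b._backward()      # each path adds 1.0 to a.grad because of '+='
print(a.grad)      # 2.0 (correct); a plain '=' would have overwritten and left 1.0
```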
- Every Value object (node) has a backward function to execute that will propagate the derivatives to its children.
- While doing the forward pass, we keep track of each node's children and of which math operation produced it.
- The `_backward` attribute stores which `_backward` method to call; this information is also recorded during the forward pass.
- Finally, while doing backprop, all these methods are called in topological ordering, filling in all the gradients.
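A sketch of how the full `backward()` can work on the `Value` above: build a topological ordering of the graph, then call each node's `_backward` from the output back to the leaves (simplified from micrograd):

```python
def backward(self):
    # build a topological ordering of all nodes that feed into self
    topo, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build(child)
            topo.append(v)
    build(self)

    self.grad = 1.0                  # base case: d(loss)/d(loss) = 1.0
    for node in reversed(topo):      # output first, leaves last
        node._backward()

Value.backward = backward            # attach it to the Value class sketched above
```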
- Class Neuron!!
- Defines what a single neuron does.
- Takes `nin` inputs.
- Generates `nin` random weights, and a random bias weight.
- Calculates w*x + b in the `__call__` function.
- Applies the activation function.
- Returns the activation.
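A minimal sketch of such a Neuron (uses the `Value` sketch above and tanh as the activation; a simplification, not the exact micrograd source):

```python
import random

class Neuron:
    def __init__(self, nin):
        # nin random weights plus one random bias, all wrapped as Value objects
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1, 1))

    def __call__(self, x):
        # w*x + b: dot product of weights with the inputs, plus the bias
        act = sum((wi * xi for wi, xi in zip(self.w, x)), self.b)
        return act.tanh()            # apply the activation and return it

    def parameters(self):
        return self.w + [self.b]     # concatenated list of w and b
```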
- Class Layer!!
- Defines how many neurons are in the layer.
- Calls the `__call__` function of each neuron.
- Returns a list of activations from all the neurons in the layer.
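A sketch of a Layer built from the Neuron above (`nout` is an assumed name for the number of neurons in the layer):

```python
class Layer:
    def __init__(self, nin, nout):
        # nout neurons, each taking nin inputs
        self.neurons = [Neuron(nin) for _ in range(nout)]

    def __call__(self, x):
        # one activation per neuron in the layer
        outs = [n(x) for n in self.neurons]
        return outs[0] if len(outs) == 1 else outs   # convenience: unwrap a single output

    def parameters(self):
        return [p for n in self.neurons for p in n.parameters()]
```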
- MLP!!
- Defines the entire Network
- Takes `nins` and `nouts`, which denote the number of inputs and a list of the number of neurons in each layer.
- Outputs the final layer's feedforward result.
- Initializes every layer based on the `nouts` list.
- In the `__call__` function, the input passes through every layer and becomes the new input for the next layer until reaching the end. This is feedforward.
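A sketch of the MLP using the notes' names `nins`/`nouts` (e.g. MLP(3, [4, 4, 1]) would be 3 inputs, two hidden layers of 4 neurons, and 1 output):

```python
class MLP:
    def __init__(self, nins, nouts):
        sz = [nins] + nouts          # layer sizes, input size first
        self.layers = [Layer(sz[i], sz[i + 1]) for i in range(len(nouts))]

    def __call__(self, x):
        # feedforward: each layer's output becomes the next layer's input
        for layer in self.layers:
            x = layer(x)
        return x

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]
```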
- All of these have `parameters` definitions that return a concatenated list of the w's and b's they contain (included in the sketches above).
- Before we train we have input data `xs`, output labels `ys`, and `n`, which is the MLP network object.
- We do a forward pass by passing each x into n using n(x) one by one and storing `ypred`.
- Set all parameter gradients to zero
- The backward pass is just `.backward()` (called on the loss).
- Update weights of all parameters
- Show stats (e.g. print the loss each step).
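A sketch of a training loop tying these bullets together (toy data invented for illustration; assumes the Value/Neuron/Layer/MLP sketches above, with `backward()` attached to `Value`; the loss is written with only `+` and `*` so it stays within the operators those sketches define):

```python
# toy inputs and target labels, invented for illustration
xs = [[2.0, 3.0, -1.0], [3.0, -1.0, 0.5], [0.5, 1.0, 1.0], [1.0, 1.0, -1.0]]
ys = [1.0, -1.0, -1.0, 1.0]

n = MLP(3, [4, 4, 1])                 # the network object

for step in range(20):
    # forward pass: one prediction per input, stored in ypred
    ypred = [n(x) for x in xs]

    # sum of squared errors; yout - ygt is written as yout + (-ygt)
    loss = Value(0.0)
    for ygt, yout in zip(ys, ypred):
        diff = yout + (-ygt)
        loss = loss + diff * diff

    # set all parameter gradients to zero before backprop
    for p in n.parameters():
        p.grad = 0.0

    # backward is just .backward() on the loss
    loss.backward()

    # update weights of all parameters: the minus sign steps against the gradient
    for p in n.parameters():
        p.data -= 0.05 * p.grad

    # stats to show
    print(step, loss.data)
```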