  • MicroGrad is an autograd (automatic gradient) engine

Derivatives

  • A variable with a positive derivative (of the loss with respect to it) means that slightly increasing it will increase the final loss.

  • A variable with a negative derivative means that slightly increasing it will decrease the loss.

  • Therefore, to decrease the loss,

    1. If the derivative is negative, decreasing the value further would increase the loss, so we need to bump this value up. Subtracting learning rate times a negative derivative does exactly that: it nudges the value up.
    2. If the derivative is positive, increasing the value further would increase the loss, so we actually need to decrease this value. Subtracting learning rate times a positive derivative nudges the value down.

SEE HOW YOU NEED A MINUS IN THE UPDATE STEP

new_weight = old_weight - (learning_rate * derivative)
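A tiny numeric check of that minus sign (the numbers here are made up):

```python
# Suppose the derivative of the loss w.r.t. this weight came out negative:
old_weight = 0.5
derivative = -3.0
learning_rate = 0.01

new_weight = old_weight - (learning_rate * derivative)
print(new_weight)   # ~0.53: the minus turns the negative derivative into an upward nudge
```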

Class Special functions

  1. __repr__ -> returns a string representation of the object.

  2. __add__ -> defines the addition operation and returns a new object. If c = a + b, then a's __add__ is called with self as a and other as b.

  3. __mul__ -> defines the multiplication operation and returns a new object. If c = a * b, then a's __mul__ is called with self as a and other as b.

  4. __rmul__ -> is called when __mul__ on the left operand is not found. For example, if c = 2 * Value(2.9), 2 being an int cannot multiply a Value, so the reverse mul is called on the second item, with self = the second item (the Value) and other = the first item.

  5. __truediv__ -> function for the / operation with self and other.

  6. __floordiv__ -> function for the // operation with self and other.

  • Reverse versions exist for both: __rtruediv__ and __rfloordiv__.

  7. __sub__ -> function to perform the subtraction operation.

  8. __neg__ -> function to perform the negation operation.

  9. __call__ -> called when an already-initialized object (init already done) is called again. Example:

     x = [2.0, 3.0]  # sample inputs
     n = Neuron(2)   # initialize a Neuron with 2 inputs
     n(x)            # calls __call__ with self as n and x as the argument
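A stripped-down sketch of how Python dispatches these operators (no gradients here, just the dispatch; the full Value class is described below):

```python
class Value:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):                 # a + b  calls  a.__add__(b)
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data)

    def __mul__(self, other):                 # a * b  calls  a.__mul__(b)
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data)

    def __rmul__(self, other):                # 2 * Value(2.9): int cannot multiply a Value,
        return self * other                   # so Python falls back to Value.__rmul__

    def __neg__(self):                        # -a
        return self * -1

    def __sub__(self, other):                 # a - b  implemented as  a + (-b)
        return self + (-other)


print(2 * Value(2.9))     # Value(data=5.8), dispatched via __rmul__
print(Value(1.0) - 0.5)   # Value(data=0.5), __sub__ wraps the float via __add__
```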

Value object parameters

  1. data -> value it represents
  2. _prev = tuple of children that produced this value
  3. grad = variable to store the gradient (not required at init)
  4. _op = string operation which resulted in this value ('+', '-', '*', '**') (not required at init)
  5. _label = string unique identifier
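A sketch of the constructor with these fields, in the spirit of Karpathy's micrograd (attribute names here are illustrative):

```python
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data                   # the value this node represents
        self.grad = 0.0                    # gradient, starts at zero (not required at init)
        self._prev = set(_children)        # children, passed in as a tuple during the forward pass
        self._op = _op                     # the operation ('+', '*', 'tanh', ...) that produced it
        self.label = label                 # human-readable identifier for debugging/plotting
        self._backward = lambda: None      # closure set during the forward pass (see below)
```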

Local Derivatives

  1. Base case: the derivative of the loss with respect to the loss itself is always 1.0 (linear change).

  2. Multiplication: if c = a*b, the local derivative of c wrt a is always the value of b, and the local derivative of c wrt b is always the value of a.

  3. Addition (just passes on gradients): if c = a + b, the local derivative of c wrt a is 1.0 and the local derivative of c wrt b is 1.0. Therefore a plus just lets the gradients pass through locally.

  4. Tanh: if o = tanh(n), then the derivative of o wrt n is 1 - o**2.
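A sketch of how these local derivatives end up in the forward-pass code. In micrograd these are methods on Value (__add__, __mul__, tanh); they are written here as free functions over the Value sketched above, for brevity. Note the +=, which the next section explains:

```python
import math

def add(a, b):                      # c = a + b
    c = Value(a.data + b.data, (a, b), '+')
    def _backward():
        a.grad += 1.0 * c.grad      # addition just routes the gradient through
        b.grad += 1.0 * c.grad
    c._backward = _backward
    return c

def mul(a, b):                      # c = a * b
    c = Value(a.data * b.data, (a, b), '*')
    def _backward():
        a.grad += b.data * c.grad   # dc/da = b
        b.grad += a.data * c.grad   # dc/db = a
    c._backward = _backward
    return c

def tanh(n):                        # o = tanh(n)
    t = math.tanh(n.data)
    o = Value(t, (n,), 'tanh')
    def _backward():
        n.grad += (1 - t**2) * o.grad   # do/dn = 1 - o**2
    o._backward = _backward
    return o
```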

Potential Bug

  • The derivatives are basically rates of change. If c = a + b and d = a * b, the derivatives of a and b affect not just c but also d.

  • During backpropagation, if I set a's grad to whatever comes along from the c path, and then overwrite it when doing backprop from the d path, the stored gradient only reflects the d path, because I overwrote the c contribution.

  • Therefore, gradients should be accumulated: += rather than =.

  • Having done this, one should remember to reset the gradients to zero every time before running a step of backprop, otherwise the gradients keep accumulating and become very large.
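A quick check of the accumulation behavior, assuming a micrograd-style Value (for example the one from Karpathy's published micrograd package, which is not part of these notes):

```python
from micrograd.engine import Value   # assumes the micrograd package is installed

a = Value(2.0)
b = Value(3.0)
c = a + b          # a feeds into c ...
d = a * b          # ... and also into d
e = c + d
e.backward()

# With '+=' both paths contribute: de/da = 1.0 (via c) + b.data (via d) = 4.0.
# With '=' the d path would overwrite the c path and we would see only 3.0.
print(a.grad)      # 4.0

# Reset before the next backward pass, otherwise gradients keep piling up:
for p in (a, b):
    p.grad = 0.0
```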

Backward function

  • Every Value object (node) has a backward function that propagates its derivative to its children.
  • While doing the forward pass, we keep track of each node's children and which math operation produced it.
  • The _backward attribute stores which closure to call for that node; this information is also stored during the forward pass.
  • Finally, while doing backprop, all these _backward closures are called in reverse topological order (starting from the loss), filling in all the gradients.
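A sketch of that driver, roughly what micrograd's Value.backward() does, written here as a free function over the Value sketched above:

```python
def backward(root):
    # Build a topological ordering of the graph that produced `root`.
    topo, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build(child)
            topo.append(v)
    build(root)

    # Base case: d(loss)/d(loss) = 1.0, then run each node's _backward
    # in reverse topological order, from the loss back to the leaves.
    root.grad = 1.0
    for v in reversed(topo):
        v._backward()
```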

Neural Network Architecture

  1. Class Neuron!!
  • Defines what a single neuron does.
  • Takes nin inputs.
  • Generates nin random weights
  • And a random bias weight
  • Calculates w*x + b in the __call__ function
  • Applies the activation function
  • Returns the activation
  2. Class Layer!!
  • Defines how many neurons are in each layer
  • Calls the __call__ function of each neuron
  • And returns the list of activations from all neurons in the layer
  3. MLP!!
  • Defines the entire network
  • Takes nins and nouts, which denote the number of inputs and a list of the number of neurons in each layer (including the final output layer)
  • Initializes every layer based on the nouts list
  • In the __call__ function, the input passes through every layer and becomes the new input for the next layer until reaching the end. This is feedforward.

All of these define a parameters() method that returns the concatenated list of weights w and biases b, as sketched below.
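A sketch of the three classes in the spirit of these notes, assuming a full Value class with gradient-tracking +, * and a tanh() method (assembled from the pieces above; Karpathy's published micrograd uses relu instead of tanh):

```python
import random

class Neuron:
    def __init__(self, nin):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]  # nin random weights
        self.b = Value(random.uniform(-1, 1))                        # random bias weight

    def __call__(self, x):
        act = sum((wi * xi for wi, xi in zip(self.w, x)), self.b)    # w*x + b
        return act.tanh()                                            # activation

    def parameters(self):
        return self.w + [self.b]

class Layer:
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]

    def __call__(self, x):
        outs = [n(x) for n in self.neurons]        # activations from every neuron in the layer
        return outs[0] if len(outs) == 1 else outs

    def parameters(self):
        return [p for n in self.neurons for p in n.parameters()]

class MLP:
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i + 1]) for i in range(len(nouts))]

    def __call__(self, x):
        for layer in self.layers:   # feedforward: each layer's output is the next layer's input
            x = layer(x)
        return x

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]
```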

Training loop

  • Before we train, we have input data xs, target labels ys, and n, the MLP network object
  • We do a forward pass by passing each x into n using n(x), one by one, and storing ypred
  • Set all parameter gradients to zero
  • The backward pass is just calling .backward() on the loss
  • Update the weights of all parameters
  • Show stats (e.g. the loss at each step) to track progress
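A sketch of the whole loop under the same assumptions as above; the example xs and ys are made up, and the loss also relies on Value supporting -, ** and reverse addition (so that sum() works), as micrograd's does:

```python
xs = [[2.0, 3.0, -1.0], [3.0, -1.0, 0.5], [0.5, 1.0, 1.0], [1.0, 1.0, -1.0]]  # example inputs
ys = [1.0, -1.0, -1.0, 1.0]                                                   # example targets
n = MLP(3, [4, 4, 1])

for step in range(20):
    # forward pass: one prediction per input
    ypred = [n(x) for x in xs]
    loss = sum((yout - ygt) ** 2 for ygt, yout in zip(ys, ypred))

    # set all parameter gradients to zero (see Potential Bug above)
    for p in n.parameters():
        p.grad = 0.0

    # backward is just .backward() on the loss
    loss.backward()

    # update: nudge every parameter against its gradient (note the minus)
    learning_rate = 0.05
    for p in n.parameters():
        p.data += -learning_rate * p.grad

    # stats to show
    print(step, loss.data)
```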