  • MicroGrad is an autograd (automatic gradient) engine

Derivatives

  • A variable with a positive derivative (of the loss with respect to it) means that slightly increasing it will increase the final loss.

  • A variable with a negative derivative means that slightly increasing it will decrease the loss.

  • Therefore, to decrease the loss,

    1. If the derivative is negative, decreasing the value further would increase the loss, so we need to bump this value up. Subtracting learning rate times a negative derivative does exactly that: it nudges the value up.
    2. If the derivative is positive, increasing the value further would increase the loss, so we actually need to decrease this value. Subtracting learning rate times a positive derivative nudges the value down.

SEE HOW YOU NEED A MINUS IN THE UPDATE STEP

new_weight = old_weight - (learning_rate * derivative)
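A tiny numeric check of that minus sign (the numbers here are made up):

```python
# Suppose the derivative of the loss w.r.t. this weight came out negative:
old_weight = 0.5
derivative = -3.0
learning_rate = 0.01

new_weight = old_weight - (learning_rate * derivative)
print(new_weight)   # ~0.53: the minus turns the negative derivative into an upward nudge
```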

Class Special functions

  1. __repr__ -> returns a string representation of the object.

  2. __add__ -> defines the addition operation and returns a new object. If c = a + b, then a's __add__ is called with self as a and other as b.

  3. __mul__ -> defines the multiplication operation and returns a new object. If c = a * b, then a's __mul__ is called with self as a and other as b.

  4. __rmul__ -> is called when __mul__ on the left operand is not found. For example, if c = 2 * Value(2.9), 2 being an int cannot multiply a Value, so the reverse mul is called on the second item, with self = the second item (the Value) and other = the first item.

  5. __truediv__ -> function for the / operation with self and other.

  6. __floordiv__ -> function for the // operation with self and other.

  • Reverse versions exist for both: __rtruediv__ and __rfloordiv__.

  7. __sub__ -> function to perform the subtraction operation.

  8. __neg__ -> function to perform the negation operation.

  9. __call__ -> called when an already-initialized object (init already done) is called again. Example:

     x = [2.0, 3.0]  # sample inputs
     n = Neuron(2)   # initialize a Neuron with 2 inputs
     n(x)            # calls __call__ with self as n and x as the argument
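A stripped-down sketch of how Python dispatches these operators (no gradients here, just the dispatch; the full Value class is described below):

```python
class Value:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):                 # a + b  calls  a.__add__(b)
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data)

    def __mul__(self, other):                 # a * b  calls  a.__mul__(b)
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data)

    def __rmul__(self, other):                # 2 * Value(2.9): int cannot multiply a Value,
        return self * other                   # so Python falls back to Value.__rmul__

    def __neg__(self):                        # -a
        return self * -1

    def __sub__(self, other):                 # a - b  implemented as  a + (-b)
        return self + (-other)


print(2 * Value(2.9))     # Value(data=5.8), dispatched via __rmul__
print(Value(1.0) - 0.5)   # Value(data=0.5), __sub__ wraps the float via __add__
```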

Value object parameters

  1. data -> value it represents
  2. _prev = tuple of children that produced this value
  3. grad = variable to store the gradient (not required at init)
  4. _op = string operation which resulted in this value ('+', '-', '*', '**') (not required at init)
  5. _label = string unique identifier
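A sketch of the constructor with these fields, in the spirit of Karpathy's micrograd (attribute names here are illustrative):

```python
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data                   # the value this node represents
        self.grad = 0.0                    # gradient, starts at zero (not required at init)
        self._prev = set(_children)        # children, passed in as a tuple during the forward pass
        self._op = _op                     # the operation ('+', '*', 'tanh', ...) that produced it
        self.label = label                 # human-readable identifier for debugging/plotting
        self._backward = lambda: None      # closure set during the forward pass (see below)
```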

Local Derivatives

  1. Base case: the derivative of the loss with respect to the loss itself is always 1.0 (linear change).

  2. Multiplication: if c = a*b, the local derivative of c wrt a is always the value of b, and the local derivative of c wrt b is always the value of a.

  3. Addition (just passes on gradients): if c = a + b, the local derivative of c wrt a is 1.0 and the local derivative of c wrt b is 1.0. Therefore a plus just lets the gradients pass through locally.

  4. Tanh: if o = tanh(n), then the derivative of o wrt n is 1 - o**2.
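A sketch of how these local derivatives end up in the forward-pass code. In micrograd these are methods on Value (__add__, __mul__, tanh); they are written here as free functions over the Value sketched above, for brevity. Note the +=, which the next section explains:

```python
import math

def add(a, b):                      # c = a + b
    c = Value(a.data + b.data, (a, b), '+')
    def _backward():
        a.grad += 1.0 * c.grad      # addition just routes the gradient through
        b.grad += 1.0 * c.grad
    c._backward = _backward
    return c

def mul(a, b):                      # c = a * b
    c = Value(a.data * b.data, (a, b), '*')
    def _backward():
        a.grad += b.data * c.grad   # dc/da = b
        b.grad += a.data * c.grad   # dc/db = a
    c._backward = _backward
    return c

def tanh(n):                        # o = tanh(n)
    t = math.tanh(n.data)
    o = Value(t, (n,), 'tanh')
    def _backward():
        n.grad += (1 - t**2) * o.grad   # do/dn = 1 - o**2
    o._backward = _backward
    return o
```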

Potential Bug

  • The derivatives are basically rates of change. If c = a + b and d = a * b, the derivatives of a and b affect not just c but also d.

  • During backpropagation, if I set a's grad to whatever comes along from the c path, and then overwrite it when doing backprop from the d path, the stored gradient only reflects the d path, because I overwrote the c contribution.

  • Therefore, gradients should be accumulated: += rather than =.

  • Having done this, one should remember to reset the gradients to zero every time before running a step of backprop, otherwise the gradients keep accumulating and become very large.
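A quick check of the accumulation behavior, assuming a micrograd-style Value (for example the one from Karpathy's published micrograd package, which is not part of these notes):

```python
from micrograd.engine import Value   # assumes the micrograd package is installed

a = Value(2.0)
b = Value(3.0)
c = a + b          # a feeds into c ...
d = a * b          # ... and also into d
e = c + d
e.backward()

# With '+=' both paths contribute: de/da = 1.0 (via c) + b.data (via d) = 4.0.
# With '=' the d path would overwrite the c path and we would see only 3.0.
print(a.grad)      # 4.0

# Reset before the next backward pass, otherwise gradients keep piling up:
for p in (a, b):
    p.grad = 0.0
```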

Backward function

  • Every Value object (node) has a backward function that propagates its derivative to its children.
  • While doing the forward pass, we keep track of each node's children and which math operation produced it.
  • The _backward attribute stores which closure to call for that node; this information is also stored during the forward pass.
  • Finally, while doing backprop, all these _backward closures are called in reverse topological order (starting from the loss), filling in all the gradients.
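A sketch of that driver, roughly what micrograd's Value.backward() does, written here as a free function over the Value sketched above:

```python
def backward(root):
    # Build a topological ordering of the graph that produced `root`.
    topo, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build(child)
            topo.append(v)
    build(root)

    # Base case: d(loss)/d(loss) = 1.0, then run each node's _backward
    # in reverse topological order, from the loss back to the leaves.
    root.grad = 1.0
    for v in reversed(topo):
        v._backward()
```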

Neural Network Architecture

  1. Class Neuron!!
  • Defines what a single neuron does.
  • Takes nin inputs.
  • Generates nin random weights
  • And a random bias weight
  • Calculates w*x + b in the __call__ function
  • Applies the activation function
  • Returns the activation
  2. Class Layer!!
  • Defines how many neurons are in each layer
  • Calls the __call__ function of each neuron
  • And returns the list of activations from all neurons in the layer
  3. MLP!!
  • Defines the entire network
  • Takes nins and nouts, which denote the number of inputs and a list of the number of neurons in each layer (including the final output layer)
  • Initializes every layer based on the nouts list
  • In the __call__ function, the input passes through every layer and becomes the new input for the next layer until reaching the end. This is feedforward.

All of these define a parameters() method that returns the concatenated list of weights w and biases b, as sketched below.
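A sketch of the three classes in the spirit of these notes, assuming a full Value class with gradient-tracking +, * and a tanh() method (assembled from the pieces above; Karpathy's published micrograd uses relu instead of tanh):

```python
import random

class Neuron:
    def __init__(self, nin):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]  # nin random weights
        self.b = Value(random.uniform(-1, 1))                        # random bias weight

    def __call__(self, x):
        act = sum((wi * xi for wi, xi in zip(self.w, x)), self.b)    # w*x + b
        return act.tanh()                                            # activation

    def parameters(self):
        return self.w + [self.b]

class Layer:
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]

    def __call__(self, x):
        outs = [n(x) for n in self.neurons]        # activations from every neuron in the layer
        return outs[0] if len(outs) == 1 else outs

    def parameters(self):
        return [p for n in self.neurons for p in n.parameters()]

class MLP:
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i + 1]) for i in range(len(nouts))]

    def __call__(self, x):
        for layer in self.layers:   # feedforward: each layer's output is the next layer's input
            x = layer(x)
        return x

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]
```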

Training loop

  • Before we train, we have input data xs, target labels ys, and n, the MLP network object
  • We do a forward pass by passing each x into n using n(x), one by one, and storing ypred
  • Set all parameter gradients to zero
  • The backward pass is just calling .backward() on the loss
  • Update the weights of all parameters
  • Show stats (e.g. the loss at each step) to track progress
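A sketch of the whole loop under the same assumptions as above; the example xs and ys are made up, and the loss also relies on Value supporting -, ** and reverse addition (so that sum() works), as micrograd's does:

```python
xs = [[2.0, 3.0, -1.0], [3.0, -1.0, 0.5], [0.5, 1.0, 1.0], [1.0, 1.0, -1.0]]  # example inputs
ys = [1.0, -1.0, -1.0, 1.0]                                                   # example targets
n = MLP(3, [4, 4, 1])

for step in range(20):
    # forward pass: one prediction per input
    ypred = [n(x) for x in xs]
    loss = sum((yout - ygt) ** 2 for ygt, yout in zip(ys, ypred))

    # set all parameter gradients to zero (see Potential Bug above)
    for p in n.parameters():
        p.grad = 0.0

    # backward is just .backward() on the loss
    loss.backward()

    # update: nudge every parameter against its gradient (note the minus)
    learning_rate = 0.05
    for p in n.parameters():
        p.data += -learning_rate * p.grad

    # stats to show
    print(step, loss.data)
```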