
**Learning objectives:**

- Learn about *modules*, with a focus on `nn_linear()`, `nn_sequential()`, and `nn_module()`

## Built-in modules

**What are modules?**

- an object that encapsulates state
- can be of any complexity (e.g. a single layer, or a model consisting of layers)

Examples of `{torch}` modules:

- linear: `nn_linear()`
- convolutional: `nn_conv1d()`, `nn_conv2d()`, `nn_conv3d()`
- recurrent: `nn_lstm()`, `nn_gru()`
- embedding: `nn_embedding()`
- multi-head attention: `nn_multihead_attention()`
- See [torch documentation](https://torch.mlverse.org/docs/reference/#neural-network-modules) for others
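For a quick illustration of "state encapsulated in an object" (a minimal sketch using `nn_embedding()` from the list above; the sizes here are arbitrary), a module is created by calling its constructor and then applied like a function:

```{r}
library(torch)

# An embedding module that maps 10 discrete ids to 4-dimensional vectors
emb <- nn_embedding(num_embeddings = 10, embedding_dim = 4)

# The state it encapsulates: a learnable 10 x 4 embedding matrix
emb$weight$size()

# Apply the module to a batch of (1-based) integer indices
idx <- torch_tensor(c(1, 5, 9), dtype = torch_long())
emb(idx)$size()
```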

## Linear Layer: `nn_linear()`

Consider the [linear layer](https://torch.mlverse.org/docs/reference/nn_linear):

```{r}
library(torch)

l <- nn_linear(in_features = 5, out_features = 16) # bias = TRUE is the default
l
```

Comment about size: we expect the weight matrix of `l` to be $5 \times 16$ (i.e. for the matrix multiplication $X_{50 \times 5} \, \beta_{5 \times 16}$). We see below that it is $16 \times 5$: for performance reasons, the underlying C++ implementation (`libtorch`) stores the transpose.

```{r}
l$weight$size()
```

Apply the module:

```{r}
# Generate data from a standard normal distribution
x <- torch_randn(50, 5)

# Feed x into layer:
output <- l(x)

output$size()
```

When we use built-in modules, we do [*not*]{.underline} need to set `requires_grad = TRUE` when creating tensors (unlike in previous chapters); the module takes care of that for us.
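We can check this for the layer `l` created above: its parameters come with gradient tracking already enabled:

```{r}
l$weight$requires_grad
l$bias$requires_grad
```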

## Sequential Models: `nn_sequential()`

[`nn_sequential()`](https://torch.mlverse.org/docs/reference/nn_sequential) can be used for models that propagate straight through the layers. A Multi-Layer Perceptron (MLP), i.e. a network consisting only of linear layers and activations, is an example. Below we build an MLP this way:

```{r}
mlp <- nn_sequential( # all arguments must be modules
  nn_linear(10, 32),
  nn_relu(),
  nn_linear(32, 64),
  nn_relu(),
  nn_linear(64, 1)
)
```

Apply this model to random data:

```{r}
output <- mlp(torch_randn(50, 10))
```
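As a quick sanity check (50 input rows through the layers defined above should give a $50 \times 1$ output):

```{r}
output$size()
```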

## General Models: `nn_module()`

[`nn_module()`](https://torch.mlverse.org/docs/reference/nn_module) is a "factory function" for building models of arbitrary complexity. It is more flexible than `nn_sequential()`. Use it to define:

- weight initialization, flagging model parameters with `nn_parameter()` (in `initialize()`)

- model structure, i.e. the forward pass (in `forward()`)

Example:

```{r}
my_linear <- nn_module(
  initialize = function(in_features, out_features) {
    self$w <- nn_parameter(torch_randn(in_features, out_features)) # standard-normal init
    self$b <- nn_parameter(torch_zeros(out_features))              # zero init
  },
  forward = function(input) {
    input$mm(self$w) + self$b
  }
)
```

Next, instantiate the model with input and output dimensions:

```{r}
l <- my_linear(7, 1)
l
```

Apply the model to random data (just like we did in the previous section):

```{r}
output <- l(torch_randn(5, 7))
output
```

That was the forward pass. Let's define a (dummy) loss function and compute the gradient:

```{r}
loss <- output$mean()
loss$backward() # compute gradients
l$w$grad        # inspect the result
```
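The same backward pass also populated the gradient of the bias, and every tensor wrapped in `nn_parameter()` is registered with the module (a quick check on the `l` instance from above):

```{r}
l$b$grad # bias gradient from the same backward pass

# all parameters registered via nn_parameter() are tracked by the module
str(l$parameters)
```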


## Meeting Videos {.unnumbered}

### Cohort 1 {.unnumbered}

`r knitr::include_url("https://www.youtube.com/embed/URL")`

<details>
<summary> Meeting chat log </summary>

```
LOG
```

</details>