diff --git a/docs/index.html b/docs/index.html
index 757584f..64ee72c 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -449,9 +449,10 @@
conf.yml
File
@@ -595,7 +596,13 @@ Use this Google Colab notebook for learning how to train your NMT model with RTG: https://colab.research.google.com/drive/198KbkUcCGXJXnWiM7IyEiO1Mq2hdVq8T?usp=sharing
+Add the root of this repo to PYTHONPATH, or install it via pip install --editable.
Refer to the scripts/rtg-pipeline.sh bash script and the examples/transformer.base.yml file for specific examples.
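For instance (a minimal sketch, not taken from the RTG docs: the clone path ~/work/rtg and the name RTG_ROOT are hypothetical), the PYTHONPATH route can be mimicked from inside a Python session or the Colab notebook above like this:

    # Minimal sketch, assuming an uninstalled clone at ~/work/rtg (hypothetical path).
    # This is the in-process equivalent of "add the root of this repo to PYTHONPATH";
    # the alternative is to run `pip install --editable .` once from the repo root.
    import sys
    from pathlib import Path

    RTG_ROOT = Path.home() / "work" / "rtg"   # hypothetical clone location
    sys.path.insert(0, str(RTG_ROOT))

    import rtg                                # should now resolve from the clone
    print(rtg.__file__)

The editable install does the same job permanently, so the sys.path tweak is only needed when running straight from an uninstalled clone.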
Let’s visualize the total memory required for training a model in the order of a 5D tensor: [Layers x ModelDim x Batch x SequenceLength x Vocabulary]
Let’s visualize the total required memory for training a model in the order of a 4D tensor: [ModelDim x Batch x SequenceLength x Vocabulary] (a rough numeric sketch of this product follows the GPU notes below)
The number of layers is often fixed. [There is something we can do (see Google’s Reformer), but it is beyond our scope at the moment.]
-Model dim is often fixed. We dont do anything fancy here.
If you have GPUs with larger memory, use them. For example, a V100 with 32GB is much better than a 1080 Ti with 11GB.
-If you have larger GPU, but you have many smaller GPUs, use many them by setting CUDA_VISIBLE_DEVICES variable to comma separated list of GPU IDs.
+
+If you don’t have a larger GPU but you have many smaller GPUs, use many of them by setting the CUDA_VISIBLE_DEVICES variable to a comma-separated list of GPU IDs.
The built-in DataParallel module divides batches across multiple GPUs ⇒ reduces the total memory needed on each GPU.
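To make the last two points concrete, here is a rough sketch with made-up dimensions; rel_mem is a hypothetical helper, and the product is a unitless proxy for scaling behaviour, not RTG’s actual byte accounting. Halving any one factor halves the proxy, and splitting the batch across N GPUs leaves each GPU with roughly 1/N of it:

    import os

    def rel_mem(model_dim: int, batch: int, seq_len: int, vocab: int) -> int:
        """Relative proxy from the 4D visualization above:
        [ModelDim x Batch x SequenceLength x Vocabulary].
        Treat the product as a relative number, not literal bytes."""
        return model_dim * batch * seq_len * vocab

    # Made-up example dimensions, for illustration only.
    base = rel_mem(model_dim=512, batch=4096, seq_len=128, vocab=32000)

    # Halving the batch (or sequence length, or vocabulary) halves the proxy.
    print(rel_mem(model_dim=512, batch=2048, seq_len=128, vocab=32000) / base)  # -> 0.5

    # Splitting the same batch across 4 GPUs (the DataParallel idea) leaves each GPU
    # holding only ~1/4 of the batch dimension.
    n_gpus = 4
    print(rel_mem(model_dim=512, batch=4096 // n_gpus, seq_len=128, vocab=32000) / base)  # -> 0.25

    # Restricting which GPUs are visible is usually done in the shell, e.g.
    #   CUDA_VISIBLE_DEVICES=0,1,2,3 <your training command>
    # or from Python before the GPU framework initializes:
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"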
-Since beam decoder is used, let’s visualize [Batch x Beams x Vocabulary x SequenceLength]
+Since a beam decoder is used, let’s visualize memory as [Batch x Beams x Vocabulary x SequenceLength]
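The same kind of back-of-the-envelope works for decoding (again a sketch with made-up numbers; decode_mem is a hypothetical helper, and the product is only a relative proxy): shrinking the beam or the batch shrinks the estimate proportionally.

    def decode_mem(batch: int, beams: int, vocab: int, seq_len: int) -> int:
        """Relative proxy for beam-search decoding:
        [Batch x Beams x Vocabulary x SequenceLength]."""
        return batch * beams * vocab * seq_len

    base = decode_mem(batch=64, beams=4, vocab=32000, seq_len=128)

    # Greedy decoding (beam size 1) needs ~4x less than beam size 4;
    # halving the decoding batch size halves the proxy.
    print(base / decode_mem(batch=64, beams=1, vocab=32000, seq_len=128))  # -> 4.0
    print(base / decode_mem(batch=32, beams=4, vocab=32000, seq_len=128))  # -> 2.0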