Add colab notebook.
thammegowda committed Jun 24, 2020
1 parent d95e500 commit 324379d
Showing 1 changed file with 16 additions and 12 deletions.
28 changes: 16 additions & 12 deletions docs/index.html
@@ -449,9 +449,10 @@ <h1>Reader-Translator-Generator (RTG)</h1>
<li><a href="#_overview">1. Overview</a>
<ul class="sectlevel2">
<li><a href="#_features">1.1. Features</a></li>
<li><a href="#_setup">1.2. Setup</a></li>
<li><a href="#_usage">1.3. Usage</a></li>
<li><a href="#_credits_thanks">1.4. Credits / Thanks</a></li>
<li><a href="#colab-example">1.2. Quick Start using Google Colab</a></li>
<li><a href="#_setup">1.3. Setup</a></li>
<li><a href="#_usage">1.4. Usage</a></li>
<li><a href="#_credits_thanks">1.5. Credits / Thanks</a></li>
</ul>
</li>
<li><a href="#conf">2. RTG <strong><code>conf.yml</code></strong> File</a>
@@ -595,7 +596,13 @@ <h3 id="_features">1.1. Features</h3>
</div>
</div>
<div class="sect2">
<h3 id="_setup">1.2. Setup</h3>
<h3 id="colab-example">1.2. Quick Start using Google Colab</h3>
<div class="paragraph">
<p>Use this Google Colab notebook for learning <em>how to train your NMT model with RTG</em>: <a href="https://colab.research.google.com/drive/198KbkUcCGXJXnWiM7IyEiO1Mq2hdVq8T?usp=sharing" class="bare">https://colab.research.google.com/drive/198KbkUcCGXJXnWiM7IyEiO1Mq2hdVq8T?usp=sharing</a></p>
</div>
</div>
<div class="sect2">
<h3 id="_setup">1.3. Setup</h3>
<div class="paragraph">
<p>Add the root of this repo to <code>PYTHONPATH</code> or install it via <code>pip --editable</code></p>
</div>
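For instance, a minimal sketch of both setup options (the clone path below is a placeholder, and rtg is assumed to be the importable package name):

    # Option 1 (shell, from the repo root):  pip install --editable .
    # Option 2: put the repo root on the import path at runtime.
    import sys
    sys.path.insert(0, "/path/to/rtg")  # placeholder: wherever you cloned the repo
    import rtg                          # should now resolve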
@@ -638,7 +645,7 @@ <h3 id="_setup">1.2. Setup</h3>
</div>
</div>
<div class="sect2">
<h3 id="_usage">1.3. Usage</h3>
<h3 id="_usage">1.4. Usage</h3>
<div class="paragraph">
<p>Refer to the <code>scripts/rtg-pipeline.sh</code> bash script and the <code>examples/transformer.base.yml</code> file for specific examples.</p>
</div>
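A hedged sketch of one way to launch the pipeline from Python; the rtg.pipeline module path and argument order are assumptions here, and scripts/rtg-pipeline.sh has the exact invocation:

    import subprocess

    # Assumed entry point and arguments; verify against scripts/rtg-pipeline.sh
    subprocess.run(
        ["python", "-m", "rtg.pipeline",
         "experiments/sample-exp",           # hypothetical experiment directory
         "examples/transformer.base.yml"],   # example config shipped in the repo
        check=True,
    )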
@@ -704,7 +711,7 @@ <h3 id="_usage">1.3. Usage</h3>
</div>
</div>
<div class="sect2">
<h3 id="_credits_thanks">1.4. Credits / Thanks</h3>
<h3 id="_credits_thanks">1.5. Credits / Thanks</h3>
<div class="ulist">
<ul>
<li>
@@ -1074,14 +1081,11 @@ <h2 id="avoid-oom">4. Avoiding Out-of-Memory</h2>
<div class="sect2">
<h3 id="_trainer_memory">4.1. Trainer Memory</h3>
<div class="paragraph">
-<p>Let&#8217;s visualize the total memory required for training a model in the order of a 5D tensor: <code>[Layers x ModelDim x Batch x SequenceLength x Vocabulary]</code></p>
+<p>Let&#8217;s visualize the total required memory for training a model in the order of a 4D tensor: <code>[ModelDim x Batch x SequenceLength x Vocabulary]</code></p>
</div>
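Before the dimension-by-dimension list below, a back-of-envelope sketch with assumed sizes shows why this product gets large:

    # Assumed sizes; the output logits tensor alone, at float32 (4 bytes each).
    batch, seq_len, vocab = 64, 100, 32000    # sentences, tokens, vocabulary types
    logits_bytes = batch * seq_len * vocab * 4
    print(f"{logits_bytes / 2**30:.2f} GiB")  # ~0.76 GiB, before activations and gradients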
<div class="ulist">
<ul>
-<li>
-<p>Number of layers are often fixed. [There is something we can do (see Google&#8217;s Reformer), but it is beyond our scope at the moment.]</p>
-</li>
<li>
<p>Model dim is often fixed. We don&#8217;t do anything fancy here.</p>
</li>
<li>
@@ -1109,7 +1113,7 @@ <h3 id="_trainer_memory">4.1. Trainer Memory</h3>
<p>If you have GPUs with larger memory, use them. For example, V100 with 32GB is much better than 1080 Ti with 11GB.</p>
</li>
<li>
-<p>If you have a larger GPU, but you have many smaller GPUs, use many of them by setting the <code>CUDA_VISIBLE_DEVICES</code> variable to a comma-separated list of GPU IDs.
+<p>If you don&#8217;t have a larger GPU, but you have many smaller GPUs, use many of them by setting the <code>CUDA_VISIBLE_DEVICES</code> variable to a comma-separated list of GPU IDs.
The built-in <code>DataParallel</code> module divides batches across multiple GPUs &#8658; reduces the total memory needed on each GPU (see the sketch below).</p>
</li>
<li>
@@ -1130,7 +1134,7 @@ <h3 id="_trainer_memory">4.1. Trainer Memory</h3>
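A minimal PyTorch sketch of the multi-GPU point above; the model here is a stand-in for illustration, not RTG's actual network:

    # Run as, e.g.:  CUDA_VISIBLE_DEVICES=0,1,2 python train.py
    import torch
    import torch.nn as nn

    model = nn.Linear(512, 512)           # stand-in for the real network
    if torch.cuda.device_count() > 1:     # counts only the visible devices
        model = nn.DataParallel(model)    # splits each batch across those GPUs
    model = model.to("cuda")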
<div class="sect2">
<h3 id="_decoder_memory">4.2. Decoder Memory</h3>
<div class="paragraph">
-<p>Since beam decoder is used, let&#8217;s visualize <code>[Batch x Beams x Vocabulary x SequenceLength]</code></p>
+<p>Since beam decoder is used, let&#8217;s visualize memory as <code>[Batch x Beams x Vocabulary x SequenceLength]</code></p>
</div>
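With assumed sizes, the same back-of-envelope arithmetic for the decoder:

    # Worst case if scores for every step are kept, at float32 (assumed sizes).
    batch, beams, vocab, seq_len = 32, 4, 32000, 50
    floats = batch * beams * vocab * seq_len
    print(f"{floats * 4 / 2**30:.2f} GiB")  # ~0.76 GiB for the score tensor alone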
<div class="ulist">
<ul>
