diff --git a/docs/index.html b/docs/index.html
index 757584f..64ee72c 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -449,9 +449,10 @@

Reader-Translator-Generator (RTG)

  • 1. Overview
  • 2. RTG conf.yml File

@@ -595,7 +596,13 @@

    1.1. Features

-   1.2. Setup
+   1.2. Quick Start using Google Colab
+
+   Use this Google Colab notebook to learn how to train your NMT model with RTG: https://colab.research.google.com/drive/198KbkUcCGXJXnWiM7IyEiO1Mq2hdVq8T?usp=sharing
+
+   1.3. Setup

    Add the root of this repo to PYTHONPATH or install it via pip --editable
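    A minimal shell sketch of the two options (the clone path is an assumption for illustration; run the pip command from the repo root):

      # Option 1: put the root of the cloned repo on PYTHONPATH (path shown is illustrative)
      export PYTHONPATH=/path/to/rtg:$PYTHONPATH

      # Option 2: editable install from the repo root, so local changes take effect without reinstalling
      pip install --editable .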

    @@ -638,7 +645,7 @@

    1.2. Setup

-   1.3. Usage
+   1.4. Usage

    Refer to the scripts/rtg-pipeline.sh bash script and the examples/transformer.base.yml file for specific examples.
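    As a rough sketch of how these pieces fit together (the experiment directory is hypothetical, and the python -m rtg.pipeline entry point and its arguments are an assumption here; see scripts/rtg-pipeline.sh for the supported invocation):

      # Sketch only: set up an experiment directory with the example config, then run the pipeline.
      mkdir -p runs/transformer-base                      # hypothetical experiment directory
      cp examples/transformer.base.yml runs/transformer-base/conf.yml
      python -m rtg.pipeline runs/transformer-base/       # assumed entry point; not the authoritative command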

    @@ -704,7 +711,7 @@

    1.3. Usage

-   1.4. Credits / Thanks
+   1.5. Credits / Thanks

@@ -1074,14 +1081,11 @@

      4. Avoiding Out-of-Memory

      4.1. Trainer Memory

-     Let’s visualize the total memory required memory for training a model in the order of a 5D tensor: [Layers x ModelDim x Batch x SequenceLength x Vocabulary]
+     Let’s visualize the total required memory for training a model in the order of a 4D tensor: [ModelDim x Batch x SequenceLength x Vocabulary]

-     • Number of layers are often fixed. [There is something we can do (see Google’s Reformer), but it is beyond our scope at the moment.]
-
      • Model dim is often fixed. We don’t do anything fancy here.
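        To make these factors concrete, here is a back-of-the-envelope sketch with purely illustrative sizes (batch of 64 sentences, sequence length 128, vocabulary 32000, model dim 512, fp32 = 4 bytes); with these sizes the vocabulary term dominates:

          # Illustrative numbers only; fp32 = 4 bytes per element.
          # Output logits [Batch x SequenceLength x Vocabulary]:
          echo "$(( 64 * 128 * 32000 * 4 / 1024 / 1024 )) MB"   # prints "1000 MB", before gradients and other activations
          # Per-layer activations are roughly [Batch x SequenceLength x ModelDim]:
          echo "$(( 64 * 128 * 512 * 4 / 1024 / 1024 )) MB"     # prints "16 MB" per layer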

@@ -1109,7 +1113,7 @@

        4.1. Trainer Memory

      • If you have GPUs with larger memory, use them. For example, a V100 with 32GB is much better than a 1080 Ti with 11GB.

-     • If you have larger GPU, but you have many smaller GPUs, use many them by setting CUDA_VISIBLE_DEVICES variable to comma separated list of GPU IDs.
+     • If you don't have a larger GPU but you have many smaller GPUs, use many of them by setting the CUDA_VISIBLE_DEVICES variable to a comma-separated list of GPU IDs. The built-in DataParallel module divides batches across multiple GPUs ⇒ reduces the total memory needed on each GPU.
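        For example, a minimal shell sketch (the GPU IDs are an assumption for illustration):

          # Expose four smaller GPUs to the trainer; the built-in DataParallel module
          # then splits each batch across all visible devices.
          export CUDA_VISIBLE_DEVICES=0,1,2,3
          # ... then launch training as usual, e.g. via scripts/rtg-pipeline.sh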

@@ -1130,7 +1134,7 @@

        4.1. Trainer Memory

        4.2. Decoder Memory

-       Since beam decoder is used, let’s visualize [Batch x Beams x Vocabulary x SequenceLength]
+       Since a beam decoder is used, let’s visualize memory as [Batch x Beams x Vocabulary x SequenceLength]
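        As with the trainer, a back-of-the-envelope sketch with purely illustrative sizes (batch 32, beam size 4, vocabulary 32000, sequence length 128, fp32 = 4 bytes) shows how quickly batch size and beam size add up:

          # Illustrative numbers only; fp32 = 4 bytes per element.
          # Scores per decoding step, roughly [Batch x Beams x Vocabulary]:
          echo "$(( 32 * 4 * 32000 * 4 / 1024 )) KB"                # prints "16000 KB", about 16 MB per step
          # Accumulated over the sequence, [Batch x Beams x Vocabulary x SequenceLength]:
          echo "$(( 32 * 4 * 32000 * 128 * 4 / 1024 / 1024 )) MB"   # prints "2000 MB"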