Author: Archit Vasan ([email protected]), including and adapting materials and discussions over time by Varuni Sastri, Carlo Graziani, Taylor Childers, Venkat Vishwanath, Jay Alammar and Kevin Gimpel.
This tutorial continues the discussion with Carlo Graziani from last week on large language models (LLMs) where he introduced sequential data modeling, tokenization methods and embeddings. Here, we will attempt to demystify aspects of the Transformer model architecture.
We will refer to this notebook:
The discussion will include:
- positional encodings,
- attention mechanisms,
- output layers,
- and training loops.
This will hopefully also provide the necessary background for next week's discussion of distributed training of LLMs.
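As a preview of the attention mechanisms listed above, here is a minimal sketch of single-head scaled dot-product attention. It is written in plain Python for readability and is not taken from the tutorial notebook; the function names `softmax` and `scaled_dot_product_attention` and the toy vectors are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values, causal=False):
    """Single-head attention on lists of vectors (hypothetical helper).

    queries, keys: lists of vectors of dimension d_k
    values: list of vectors, one per key
    causal: if True, position i only attends to positions <= i
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        if causal:
            # Mask future positions so they receive zero weight.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: three "token" vectors attending to themselves (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens, causal=True)
for row in weights:
    print([round(w, 3) for w in row])
```

With `causal=True`, the first token can only attend to itself, which is the masking behavior a GPT-style decoder relies on during training.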
We will first run text generation with the popular GPT-2 model through the Hugging Face pipeline. Then we will code the elements of a simple LLM from scratch and train it ourselves.
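For orientation, text generation with the Hugging Face pipeline might look like the sketch below. It assumes the `transformers` package is installed and that the `gpt2` checkpoint can be downloaded; the helper names `build_generator` and `generate_text` are hypothetical and do not come from the notebook.

```python
def build_generator(model_name="gpt2"):
    """Create a Hugging Face text-generation pipeline.

    The model weights are downloaded the first time this runs.
    """
    # Imported lazily so generate_text() can be used without transformers loaded.
    from transformers import pipeline

    return pipeline("text-generation", model=model_name)


def generate_text(generator, prompt, max_new_tokens=20):
    """Return one generated text for `prompt` from a text-generation pipeline."""
    results = generator(
        prompt, max_new_tokens=max_new_tokens, num_return_sequences=1
    )
    # The pipeline returns a list of dicts, one per returned sequence.
    return results[0]["generated_text"]
```

In the notebook, `generate_text(build_generator(), "Large language models are")` should return the prompt followed by a continuation of up to 20 new tokens.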
Next week, we'll learn about how to train more complicated LLMs using distributed resources.
- If you are using ALCF, first log in. From a terminal run the following command:
- Although we already cloned the repo before, you'll want the updated version. To be reminded of the instructions for syncing your fork, click here.
- We will be downloading data in our Jupyter notebook, which runs on hardware that by default has no Internet access. From the terminal on Polaris, edit the ~/.bash_profile file to have these proxy settings:
export HTTP_PROXY="http://proxy-01.pub.alcf.anl.gov:3128"
export HTTPS_PROXY="http://proxy-01.pub.alcf.anl.gov:3128"
export http_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
export https_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
export ftp_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
export no_proxy="admin,polaris-adminvm-01,localhost,*.cm.polaris.alcf.anl.gov,polaris-*,*.polaris.alcf.anl.gov,*.alcf.anl.gov"
- Now that we have the updated notebooks, we can open them. If you are using ALCF JupyterHub or Google Colab, you can be reminded of the steps here.
- Reminder: Change the notebook's kernel to datascience/conda-2023-01-10 (you may need to change the kernel each time you open a notebook for the first time):
  - select Kernel in the menu bar
  - select Change kernel...
  - select datascience/conda-2023-01-10 from the drop-down menu
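Once the kernel is running, it can help to confirm that the proxy settings from ~/.bash_profile are actually visible inside the notebook before any download is attempted. The following is a minimal sketch of such a check; the helper name `missing_proxy_vars` is hypothetical, and the variable names are the ones from the export lines above.

```python
import os

# Variable names taken from the ~/.bash_profile proxy settings.
PROXY_VARS = (
    "HTTP_PROXY",
    "HTTPS_PROXY",
    "http_proxy",
    "https_proxy",
    "ftp_proxy",
    "no_proxy",
)


def missing_proxy_vars(env=None):
    """Return the proxy variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in PROXY_VARS if not env.get(name)]


missing = missing_proxy_vars()
if missing:
    print("Proxy variables not set in this kernel:", ", ".join(missing))
else:
    print("All proxy variables are set.")
```

If any variables are reported as missing, the notebook session was probably started before ~/.bash_profile was edited, so restarting the session is a reasonable first step.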
Here is an image of GenSLM, described earlier by Arvind Ramanathan. GenSLM is a language model that represents genomic information in a single model. It was shown to model the evolution of SARS-CoV-2 without expensive experiments.
Here are some recommendations for further reading and additional code for review.
- "The Illustrated Transformer" by Jay Alammar
- "Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)"
- "The Illustrated GPT-2 (Visualizing Transformer Language Models)"
- "LLM Tutorial Workshop (Argonne National Laboratory)"
- "LLM Tutorial Workshop Part 2 (Argonne National Laboratory)"