This repository implements the LLM described in Build a Large Language Model (From Scratch) by Sebastian Raschka.
The concepts and most of the code pieces are taken from the book. I made some modifications to improve usability and to have an LLM which runs out of the box and uses all the concepts described in the book.
See also the GitHub repository for the original code of the book.
Most of the code is commented with a few sentences explaining what it does and where to look in the book for more information.
This repository exists for learning purposes.
This repository uses Poetry for managing virtual environments and packages.
- Clone the repository:
  git clone [email protected]:denniskawurek/llm-from-scratch.git
- Install Poetry: see the Installation Guide
- Install dependencies:
  poetry install
- Start a shell:
  poetry shell
Within the shell the following commands can be executed:
Before the LLM can be used, the model needs to be fine-tuned.
The fine-tuning is based on a pretrained gpt2-medium (355M) model, whose weights are loaded into the GPTModel class in gpt.py.
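Conceptually, the weight loading follows chapter 5 of the book. Here is a minimal sketch; the helper names download_and_load_gpt2 and load_weights_into_gpt and the config dictionary are taken from the book's code and may be organized slightly differently in this repository:

```python
from gpt_download import download_and_load_gpt2  # helper from the book (chapter 5)
from gpt import GPTModel, load_weights_into_gpt  # module layout assumed here

# Configuration for gpt2-medium (355M), as given in the book
BASE_CONFIG = {
    "vocab_size": 50257,
    "context_length": 1024,
    "emb_dim": 1024,
    "n_layers": 24,
    "n_heads": 16,
    "drop_rate": 0.0,
    "qkv_bias": True,  # needed to load the OpenAI weights
}

settings, params = download_and_load_gpt2(model_size="355M", models_dir="gpt2")
model = GPTModel(BASE_CONFIG)
load_weights_into_gpt(model, params)  # copy the pretrained weights into the model
model.eval()
```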
Run instruction-finetune.py to kick off the process:
python llm/instruction-finetune.py
This first downloads the instruction data. After that, the fine-tuning process starts, which may take some time depending on the hardware.
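The instruction data is arranged in the Alpaca-style prompt format used in chapter 7 of the book. Roughly like this (the exact helper in this repository may be named differently):

```python
def format_input(entry):
    # Alpaca-style prompt format, as in chapter 7 of the book
    instruction_text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    # Optional input field, appended only if present
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
    return instruction_text + input_text
```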
The fine-tuned model is stored in the models directory.
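Persisting and restoring the model presumably uses PyTorch's standard state-dict mechanism; a sketch, where the file name under models/ is an assumption:

```python
import torch
from gpt import GPTModel  # module layout assumed, see the sketch above

# After fine-tuning: persist the weights
torch.save(model.state_dict(), "models/gpt2-medium-instruction-finetuned.pth")

# Later, e.g. in llm.py: restore them into a freshly constructed GPTModel
model = GPTModel(BASE_CONFIG)
model.load_state_dict(
    torch.load("models/gpt2-medium-instruction-finetuned.pth", map_location="cpu")
)
model.eval()
```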
The fine-tuned model can be evaluated with Ollama.
For this, install Ollama and run:
ollama serve
and then, in a different window:
ollama run llama3
Now uncomment the following line in llm.py and run python llm/llm.py afterwards:
# evaluate_model(model, get_tokenizer(), device)
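Under the hood this follows the evaluation approach from chapter 7 of the book: each response of the fine-tuned model is sent to the locally running Llama 3 model via Ollama's REST API and scored. A minimal sketch of such a query; the helper name and the scoring prompt are illustrative:

```python
import json
import urllib.request

def query_model(prompt, model="llama3", url="http://localhost:11434/api/chat"):
    # Ask the locally running Ollama server to respond to the prompt
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"seed": 123, "temperature": 0},  # deterministic scoring
    }
    request = urllib.request.Request(
        url, data=json.dumps(data).encode("utf-8"), method="POST"
    )
    request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["message"]["content"]

score = query_model("Score the following model response on a scale from 0 to 100: ...")
```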
Use llm.py to run the LLM. Currently it contains an input_text variable, which holds the instruction that can be set for the LLM.
The following command starts an interactive session where the model is loaded and can receive instructions:
python llm/llm.py
The following command loads just the model and processes the given instruction:
python llm/llm.py 'What is the capital of the United Kingdom?'
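Internally, answering an instruction boils down to the generate loop from chapter 5 of the book: encode the prompt, sample new tokens, decode. Roughly like this, where the helper names follow the book's code and may differ slightly here:

```python
import tiktoken
import torch
from gpt import generate, text_to_token_ids, token_ids_to_text  # names as in the book

tokenizer = tiktoken.get_encoding("gpt2")
device = torch.device("cpu")  # or cuda/mps, see the GPU section below

# model and BASE_CONFIG come from the loading step sketched above
input_text = "What is the capital of the United Kingdom?"
token_ids = generate(
    model=model,
    idx=text_to_token_ids(input_text, tokenizer).to(device),
    max_new_tokens=256,
    context_size=BASE_CONFIG["context_length"],
    eos_id=50256,  # stop at the end-of-text token
)
print(token_ids_to_text(token_ids, tokenizer))
```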
--prefer-gpu - use GPU for generating responses.
instruction-finetune.py has a prefer_gpu variable. If this is set to True, cuda will be used to train and run the model. The default is cpu (or mps for MacBooks).
Beware that only a model which was fine-tuned with cuda can be used in llm.py with prefer_gpu=True.
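The device selection behind prefer_gpu presumably boils down to the standard PyTorch availability checks; a sketch:

```python
import torch

def get_device(prefer_gpu: bool) -> torch.device:
    # Use CUDA when requested and available, fall back to Apple's MPS, else CPU
    if prefer_gpu and torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```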