This repository implements the LLM described in Build a Large Language Model (From Scratch) by Sebastian Raschka.
The concepts and most of the code pieces are taken from the book. I made some modifications to improve usability and to have an LLM which runs out of the box and uses all the concepts described in the book.
See also the GitHub repository for the original code of the book.
Most of the code is commented with a few sentences explaining what it does and where to look in the book for more information.
This repository exists for learning purposes.
This repository uses Poetry for managing virtual environments and packages.
- Clone the repository:
  git clone [email protected]:denniskawurek/llm-from-scratch.git
- Install Poetry: see the Installation Guide
- Install dependencies:
  poetry install
- Start a shell:
  poetry shell
Within the shell the following commands can be executed:
Before the LLM can be used, the model needs to be fine-tuned.
The fine-tuning is based on a pretrained gpt2-medium (355M) model, whose weights are loaded into the GPTModel class in gpt.py.
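Conceptually, the weight loading follows chapter 5 of the book. Here is a minimal sketch; the helper names download_and_load_gpt2 and load_weights_into_gpt and the config dictionary are taken from the book's code and may be organized slightly differently in this repository:

```python
from gpt_download import download_and_load_gpt2  # helper from the book (chapter 5)
from gpt import GPTModel, load_weights_into_gpt  # module layout assumed here

# Configuration for gpt2-medium (355M), as given in the book
BASE_CONFIG = {
    "vocab_size": 50257,
    "context_length": 1024,
    "emb_dim": 1024,
    "n_layers": 24,
    "n_heads": 16,
    "drop_rate": 0.0,
    "qkv_bias": True,  # needed to load the OpenAI weights
}

settings, params = download_and_load_gpt2(model_size="355M", models_dir="gpt2")
model = GPTModel(BASE_CONFIG)
load_weights_into_gpt(model, params)  # copy the pretrained weights into the model
model.eval()
```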
Run instruction-finetune.py to kick off the process:
python llm/instruction-finetune.py
This first downloads the instruction data. After that, the fine-tuning process starts, which may take some time depending on the hardware.
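The instruction data is arranged in the Alpaca-style prompt format used in chapter 7 of the book. Roughly like this (the exact helper in this repository may be named differently):

```python
def format_input(entry):
    # Alpaca-style prompt format, as in chapter 7 of the book
    instruction_text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    # Optional input field, appended only if present
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
    return instruction_text + input_text
```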
The fine-tuned model is stored in the models directory.
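Persisting and restoring the model presumably uses PyTorch's standard state-dict mechanism; a sketch, where the file name under models/ is an assumption:

```python
import torch
from gpt import GPTModel  # module layout assumed, see the sketch above

# After fine-tuning: persist the weights
torch.save(model.state_dict(), "models/gpt2-medium-instruction-finetuned.pth")

# Later, e.g. in llm.py: restore them into a freshly constructed GPTModel
model = GPTModel(BASE_CONFIG)
model.load_state_dict(
    torch.load("models/gpt2-medium-instruction-finetuned.pth", map_location="cpu")
)
model.eval()
```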
The fine-tuned model can be evaluated with Ollama.
For this, install Ollama and run:
ollama serve
and then, in a different window:
ollama run llama3
Now uncomment the following line in llm.py and run python llm/llm.py afterwards:
# evaluate_model(model, get_tokenizer(), device)
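Under the hood this follows the evaluation approach from chapter 7 of the book: each response of the fine-tuned model is sent to the locally running Llama 3 model via Ollama's REST API and scored. A minimal sketch of such a query; the helper name and the scoring prompt are illustrative:

```python
import json
import urllib.request

def query_model(prompt, model="llama3", url="http://localhost:11434/api/chat"):
    # Ask the locally running Ollama server to respond to the prompt
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"seed": 123, "temperature": 0},  # deterministic scoring
    }
    request = urllib.request.Request(
        url, data=json.dumps(data).encode("utf-8"), method="POST"
    )
    request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["message"]["content"]

score = query_model("Score the following model response on a scale from 0 to 100: ...")
```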
Use llm.py to run the LLM. Currently it contains an input_text variable, which holds the instruction that can be set for the LLM.
The following command starts an interactive session where the model is loaded and can receive instructions:
python llm/llm.py
The following command loads just the model and processes the given instruction:
python llm/llm.py 'What is the capital of the United Kingdom?'
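Internally, answering an instruction boils down to the generate loop from chapter 5 of the book: encode the prompt, sample new tokens, decode. Roughly like this, where the helper names follow the book's code and may differ slightly here:

```python
import tiktoken
import torch
from gpt import generate, text_to_token_ids, token_ids_to_text  # names as in the book

tokenizer = tiktoken.get_encoding("gpt2")
device = torch.device("cpu")  # or cuda/mps, see the GPU section below

# model and BASE_CONFIG come from the loading step sketched above
input_text = "What is the capital of the United Kingdom?"
token_ids = generate(
    model=model,
    idx=text_to_token_ids(input_text, tokenizer).to(device),
    max_new_tokens=256,
    context_size=BASE_CONFIG["context_length"],
    eos_id=50256,  # stop at the end-of-text token
)
print(token_ids_to_text(token_ids, tokenizer))
```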
--prefer-gpu - use GPU for generating responses.
instruction-finetune.py has a prefer_gpu variable. If this is set to True, cuda will be used to train and run the model. The default is cpu (or mps for MacBooks).
Beware that only a model which was fine-tuned with cuda can be used in llm.py with prefer_gpu=True.
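The device selection behind prefer_gpu presumably boils down to the standard PyTorch availability checks; a sketch:

```python
import torch

def get_device(prefer_gpu: bool) -> torch.device:
    # Use CUDA when requested and available, fall back to Apple's MPS, else CPU
    if prefer_gpu and torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```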