Skip to content

peremartra/Large-Language-Model-Notebooks-Course

Repository files navigation

Large Language Models Course: Learn by Doing LLM Projects.

This is the unofficial repository for the book: Large Language Models: Apply and Implement Strategies for Large Language Models (Apress). The book is based on the content of this repository, but the notebooks are being updated, and I am incorporating new examples and chapters. If you are looking for the official repository for the book, with the original notebooks, you should visit the Apress repository, where you can find all the notebooks in their original format as they appear in the book. Buy it at: [Amazon] [Springer]

Please note that the course on GitHub does not contain all the information that is in the book.

This practical free hands on course about Large Language models and their applications is πŸ‘·πŸΌin permanent developmentπŸ‘·πŸΌ. I will be posting the different lessons and samples as I complete them.

The course provides a hands-on experience using models from OpenAI and the Hugging Face library. We are going to see and use a lot of tools and practice with small projects that will grow as we can apply the new knowledge acquired.

The course is divided into three major sections:

1- Techniques and Libraries:

In this part, we will explore different techniques through small examples that will enable us to build bigger projects in the following section. We will learn how to use the most common libraries in the world of Large Language Models, always with a practical focus, while basing our approach on published papers.

Some of the topics and technologies covered in this section include: Chatbots, Code Generation, OpenAI API, Hugging Face, Vector databases, LangChain, Fine Tuning, PEFT Fine Tuning, Soft Prompt tuning, LoRA, QLoRA, Evaluate Models, Knowledge Distillation.

2- Projects:

We will create projects, explaining design decisions. Each project may have more than one possible implementation, as often there is not just one perfect solution. In this section, we will also delve into LLMOps-related topics, although it is not the primary focus of the course.

3- Enterprise Solutions:

Large Language Models are not a standalone solution. In large corporate environments, they are just one piece of the puzzle. We will explore how to structure solutions capable of transforming organizations with thousands of employees, and how Large Language Models play a main role in these new solutions.

How to use the course.

Under each section you can find different chapters, that are formed by different lessons. The title of the lesson is a link to the lesson page, where you can found all the notebooks and articles of the lesson.

Each Lesson is conformed by notebooks and articles. The notebooks contain sufficient information for understanding the code within them, the article provides more detailed explanations about the code and the topic covered.

My advice is to have the article open alongside the notebook and follow along. Many of the articles offer small tips on variations that you can introduce to the notebooks. I recommend following them to enhance clarity of the concepts.

Most of the notebooks are hosted on Colab, while a few are on Kaggle. Kaggle provides more memory in the free version compared to Colab, but I find that copying and sharing notebooks is simpler in Colab, and not everyone has a Kaggle account.

Some of the notebooks require more memory than what the free version of Colab provides. As we are working with large language models, this is a common situation that will recur if you continue working with them. You can run the notebooks in your own environment or opt for the Pro version of Colab.


πŸš€1- Techniques and Libraries.

Each notebook is supported with a Medium article where the code is explained in detail.

In this first section of the course, we will learn to work with the OpenAI API by creating two small projects. We'll delve into OpenAI's roles and how to provide the necessary instructions to the model through the prompt to make it behave as we desire.

The first project is a restaurant chatbot where the model will take customer orders. Building upon this project, we will construct an SQL statement generator. Here, we'll attempt to create a secure prompt that only accepts SQL creation commands and nothing else.

Create Your First Chatbot Using GPT 3.5, OpenAI, Python and Panel.

We will be utilizing OpenAI GPT-3.5 and Panel to develop a straightforward Chatbot tailored for a fast food restaurant. During the course, we will explore the fundamentals of prompt engineering, including understanding the various OpenAI roles, manipulating temperature settings, and how to avoid Prompt Injections.

article panel / article gradio Notebook panel / notebook gradio

How to Create a Natural Language to SQL Translator Using OpenAI API.

Following the same framework utilized in the previous article to create the ChatBot, we made a few modifications to develop a Natural Language to SQL translator. In this case, the Model needs to be provided with the table structures, and adjustments were made to the prompt to ensure smooth functionality and avoid any potential malfunctions. With these modifications in place, the translator is capable of converting natural language queries into SQL queries. @fmquaglia has created a notebook using DBML to describe the tables that by far is a better aproach than the original.

Article / Article Gradio Notebook / Notebook Gradio / Notebook DBML

Brief Introduction to Prompt Engineering with OpenAI.

We will explore prompt engineering techniques to improve the results we obtain from Models. Like how to format the answer and obtain a structured response using Few Shot Samples.

Article Notebook

A brief introduction to Vector Databases, a technology that will accompany us in many lessons throughout the course. We will work on an example of Retrieval Augmented Generation using information from various news datasets stored in ChromaDB.

Influencing Language Models with Personalized Information using a Vector Database.

If there's one aspect gaining importance in the world of large language models, it's exploring how to leverage proprietary information with them. In this lesson, we explore a possible solution that involves storing information in a vector database, ChromaDB in our case, and using it to create enriched prompts.

Article Notebook

Semantic Cache for RAG systems

We enhanced the RAG system by introducing a semantic cache layer capable of determining if a similar question has been asked before. If affirmative, it retrieves information from a cache system created with Faiss instead of accessing the Vector Database.

The inspiration and base code of the semantic cache present in this notebook exist thanks to the course: https://maven.com/boring-bot/advanced-llm/1/home from Hamza Farooq.

Article Notebook
WIP Notebook

LangChain has been one of the libraries in the universe of large language models that has contributed the most to this revolution. It allows us to chain calls to Models and other systems, allowing us to build applications based on large language models. In the course, we will use it several times, creating increasingly complex projects.

Retrieval Augmented Generation (RAG). Use the Data from your DataFrames with LLMs.

In this lesson, we used LangChain to enhance the notebook from the previous lesson, where we used data from two datasets to create an enriched prompt. This time, with the help of LangChain, we built a pipeline that is responsible for retrieving data from the vector database and passing it to the Language Model. The notebook is set up to work with two different datasets and two different models. One of the models is trained for Text Generation, while the other is trained for Text2Text Generation.

Article Notebook

Create a Moderation system using LangChain.

We will create a comment response system using a two-model pipeline built with LangChain. In this setup, the second model will be responsible for moderating the responses generated by the first model.

One effective way to prevent our system from generating unwanted responses is by using a second model that has no direct interaction with users to handle response generation.

This approach can reduce the risk of undesired responses generated by the first model in response to the user's entry.

I will create separate notebooks for this task. One will involve models from OpenAI, and the others will utilize open-source models provided by Hugging Face. The results obtained in the three notebooks are very different. The system works much better with the OpenAI, and LLAMA2 models.

Article Notebook
OpenAI article OpenAI notebook
Llama2-7B Article Llama2-7B Notebook
No Article GPT-J Notebook

Create a Data Analyst Assistant using a LLM Agent.

Agents are one of the most powerful tools in the world of Large Language Models. The agent is capable of interpreting the user's request and using the tools and libraries at its disposal until it achieves the expected result.

With LangChain Agents, we are going to create in just a few lines one of the simplest yet incredibly powerful agents. The agent will act as a Data Analyst Assistant and help us in analyzing data contained in any Excel file. It will be able to identify trends, use models, make forecasts. In summary, we are going to create a simple agent that we can use in our daily work to analyze our data.

Article Notebook

Create a Medical ChatBot with LangChain and ChromaDB.

In this example, two technologies seen previously are combined: agents and vector databases. Medical information is stored in ChromaDB, and a LangChain Agent is created, which will fetch it only when necessary to create an enriched prompt that will be sent to the model to answer the user's question.

In other words, a RAG system is created to assist a Medical ChatBot.

Attention!!! Use it only as an example. Nobody should take the boot's recommendations as those of a real doctor. I disclaim all responsibility for the use that may be given to the ChatBot. I have built it only as an example of different technologies.

Article Notebook

The metrics used to measure the performance of Large Language Models are quite different from the ones we've been using in more traditional models. We're shifting away from metrics like Accuracy, F1 score, or recall, and moving towards metrics like BLEU, ROUGE, or METEOR.

These metrics are tailored to the specific task assigned to the language model.

In this section, we'll explore examples of several of these metrics and how to use them to determine whether one model is superior to another for a given task. We'll delve into practical scenarios where these metrics help us make informed decisions about the performance of different models.

Evaluating translations with BLEU.

Bleu is one of the first Metrics stablished to evaluate the quality of translations. In the notebook we compare the quality of a translation made by google with other from an Open Source Model from Hugging Face.

Article WIP Notebook

Evaluating Summarization with ROUGE.

We will explore the usage of the ROUGE metric to measure the quality of summaries generated by a language model. We are going to use two T5 models, one of them being the t5-Base model and the other a t5-base fine-tuned specifically designed for creating summaries.

Article Notebook

Monitor an Agent using LangSmith.

In this initial example, you can observe how to use LangSmith to monitor the traffic between the various components that make up the Agent. The agent is a RAG system that utilizes a vectorial database to construct an enriched prompt and pass it to the model. LangSmith captures both the use of the Agent's tools and the decisions made by the model, providing information at all times about the sent/received data, consumed tokens, the duration of the query, and all of this in a truly user-friendly environment.

Article Notebook

Evaluating the quality of summaries using Embedding distance with LangSmith.

Previously in the notebook, Rouge Metrics: Evaluating Summaries, we learned how to use ROUGE to evaluate which summary best approximated the one created by a human. This time, we will use embedding distance and LangSmith to verify which model produces summaries more similar to the reference ones.

Article Notebook

Evaluating a RAG solution using Giskard.

We take the agent that functions as a medical assistant and incorporate Giskard to evaluate if its responses are correct. In this way, not only the model's response is evaluated, but also the information retrieval in the vector database. Giskard is a solution that allows evaluating a complete RAG solution.

Article Notebook

Introduction to the lm-evaluation library from Eluther.ai.

The lm-eval library by EleutherAI provides easy access to academic benchmarks that have become industry standards. It supports the evaluation of both Open Source models and APIs from providers like OpenAI, and even allows for the evaluation of adapters created using techniques such as LoRA.

In this notebook, I will focus on a small but important feature of the library: evaluating models compatible with Hugging Face's Transformers library.

Article - WIP Notebook

In the FineTuning & Optimization section, we will explore different techniques such as Prompt Fine Tuning or LoRA, and we will use the Hugging Face PEFT library to efficiently fine-tune Large Language Models. We will explore techniques like quantization to reduce the weight of the Models.

Prompt tuning using PEFT Library from Hugging Face.

In this notebook, two models are trained using Prompt Tuning from the PEFT library. This technique not only allows us to train by modifying the weights of very few parameters but also enables us to have different specialized models loaded in memory that use the same foundational model.

Prompt tuning is an additive technique, and the weights of the pre-trained model are not modified. The weights that we modify in this case are those of virtual tokens that we add to the prompt.

Article Notebook

Fine-Tuning with LoRA using PEFT from Hugging Face.

After a brief explanation of how the fine-tuning technique LoRA works, we will fine-tune a model from the Bloom family to teach it to construct prompts that can be used to instruct large language models.

Article Notebook

Fine-Tuning a 7B Model in a single 16GB GPU using QLoRA.

We are going to see a brief introduction to quantization, used to reduce the size of big Large Language Models. With quantization, you can load big models reducing the memory resources needed. It also applies to the fine-tuning process, you can fine-tune the model in a single GPU without consuming all the resources. After the brief explanation we see an example about how is possible to fine-tune a Bloom 7B Model ina a t4 16GB GPU on Google Colab.

Article Notebook

This section is still under construction. The goal is to build a curriculum that will take us from the most simple pruning techniques to creating a model using the same techniques employed by leading companies in the field, such as Microsoft, Google, Nvidia, or OpenAI, to build their models.

Prune a distilGPT2 model using l1 norm to determine less important neurons.

In the first notebook, the pruning process will be applied to the feedforward layers of a distilGPT2 model. This means the model will have reduced weights in those specific layers. The neurons to prune are selected based on their importance scores, which we compute using the L1 norm of their weights. It is a simple aproach, for this first example, that can be used when you want to create a Pruned Model that mimics the Base model in all the areas.

By altering the model's structure, a new configuration file must be created to ensure it works correctly with the transformers library.

Notebook: Pruning a distilGPT2 model.

Prune a Llama3.2 model.

In this first notebook, we attempt to replicate the pruning process used with the distilGPT2 model but applied to a Llama model. By not taking the model's characteristics into account, the pruning process results in a completely unusable model. This notebook serves as an exercise to understand how crucial it is to know the structure of the models that will undergo pruning.

Notebook: Pruning a Llama3.2 model INCORRECT APROACH.

The second notebook addresses the issues encountered when applying the same pruning process to the Llama model as was used for distilGPT2.

The correct approach is to treat the MLP layers of the model as pairs rather than individual layers and to calculate neuron importance by considering both layers together. Additionally, we switch to using the maximum absolute weight to decide which neurons remain in the pruned layers.

Pruning Llama3 Article Notebook: Pruning a Llama3.2 model CORRECT APROACH.

Structured Depth Pruning. Eliminating complete blocks from large language models.

Depth pruning in a Llama-3.2 model.

In this notebook, we will look at an example of depth pruning, which involves removing entire layers from the model. The first thing to note is that removing entire layers from a transformer model usually has a significant impact on the model's performance. This is a much more drastic architectural change compared to the simple removal of neurons from the MLP layers, as seen in the previous example.

Notebook: Depth pruning a Llama Model.

Pruning Attention Layers.

This notebook implements the ideas presented in the paper: What Matters in Transformers? Not All Attention is Needed.

In this notebook, the attention layers that contribute the least to the model are marked to be bypassed, improving inference efficiency and reducing the model's resource consumption. To identify the layers that contribute the least, a simple activation with a prompt is used, and the cosine similarity between the layer's input and output is measured. The smaller the difference, the less modification the layer introduces.

The layer selection process implemented in the notebook is iterative. That is, the least contributing layer is selected, and the contribution of the remaining layers is recalculated using the same prompt. This process is repeated until the desired number of layers has been deactivated.

Since this type of pruning does not alter the model's structure, it does not reduce the model's size.

Article: WIP. Notebook: Pruning Attention Layers.

Knowledge distillation.

Knowledge Distillation involves training a smaller "student" model to mimic a larger, well-trained "teacher" model. The student learns not just from the correct labels but also from the probability distributions (soft targets) that the teacher model produces, effectively transferring the teacher's learned knowledge into a more compact form.

When combined with Pruning, you first create a pruned version of your base model by removing less important connections. During this process, some knowledge is inevitably lost. To recover this lost knowledge, you can apply Knowledge Distillation using the original base model as the teacher and the pruned model as the student, helping to restore some of the lost performance.

Both techniques address the same challenge: reducing model size and computational requirements while maintaining performance, making them crucial for deploying AI in resource-constrained environments like mobile devices.

Recovering knowledge from the base model using KD.

In this notebook, we will use Knowledge distillation to recover some of the knowledge lost during the model pruning process. Llama-3.2-1B will be used as the Teacher model, and the 40% pruned version will be used as the Student model. We will specifically improve the performance on the Lambada benchmark.

Notebook: Knowledge Distillation Llama 3.2.

πŸš€2- Projects.

In this straightforward initial project, we are going to develop a SQL generator from natural language. We'll begin by creating the prompt to implement two solutions: one using OpenAI models running on Azure, and the other with an open-source model from Hugging Face.

Article Notebook
Create a NL2SQL prompt for OpenAI Prompt creation for OpenAI
WIP Prompt creation for defog/SQLCoder
Inference Azure Configuration. Using Azure Inference Point

In this small project we will create a new model aligning a microsoft-phi-3-model with DPO and then publish it to Hugging Face.

Article Notebook
WIP Aligning with DPO a phi3-3 model.

πŸš€3- Architecting Enterprise Solutions.

In this initial solution, we design an architecture for an NL2SQL system capable of operating on a large database. The system is intended to be used with two or three different models. In fact, we use three models in the example.

It's an architecture that enables a fast project kickoff, providing service for only a few tables in the database, allowing us to add more tables at our pace.

In this solution, we explore the transformative power of embeddings and large language models (LLMs) in customer risk assessment and product recommendation in the financial industry. We'll be altering the format in which we store customer information, and consequently, we'll also be changing how this information travels within the systems, achieving important advantages.


Contributing to the course:

Please, if you find any problems, open an issue . I will do my best to fix it as soon as possible, and give you credit.

If you'd like to make a contribution or suggest a topic, please don't hesitate to start a Discussion. I'd be delighted to receive any opinions or advice.

Don't be shy, share the course on your social networks with your friends. Connect with me on LinkedIn or Twitter and feel free to share anything you'd like or ask any questions you may have.

Give a Star ⭐️ to the repository. It helps me a lot, and encourages me to continue adding lessons. It's a nice way to support free Open Source courses like this one.


References & Papers used in the Course: