docs: Update README with .env details
1 parent 8cb27d9, commit 2c65a1c
Showing 7 changed files with 247 additions and 87 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,4 @@ | ||
# MongoDB Config | ||
DATABASE_HOST=mongodb://decodingml:[email protected]:27017 | ||
DATABASE_NAME=twin | ||
# --- Required settings even when working locally. --- | ||
|
||
# OpenAI API Config | ||
OPENAI_MODEL_ID=gpt-4o-mini | ||
|
@@ -9,15 +7,23 @@ OPENAI_API_KEY=str | |
# Huggingface API Config | ||
HUGGINGFACE_ACCESS_TOKEN=str | ||
|
||
# RAG | ||
RAG_MODEL_DEVICE=cpu | ||
# Comet ML (during training) | ||
COMET_API_KEY=str | ||
COMET_WORKSPACE=llm-engineers-handbook | ||
|
||
# AWS Credentials | ||
# --- Required settings when deploying the code. --- | ||
# --- Otherwise, default values work fine. --- | ||
|
||
# MongoDB database | ||
DATABASE_HOST="mongodb://decodingml:[email protected]:27017" | ||
|
||
# Qdrant vector database | ||
USE_QDRANT_CLOUD=false | ||
QDRANT_CLOUD_URL="str" | ||
QDRANT_APIKEY="str" | ||
|
||
# AWS Authentication | ||
AWS_ARN_ROLE=str | ||
AWS_REGION=eu-central-1 | ||
AWS_ACCESS_KEY=str | ||
AWS_SECRET_KEY=str | ||
AWS_REGION=eu-central-1 | ||
|
||
# LinkedIn Credentials | ||
LINKEDIN_USERNAME=str | ||
LINKEDIN_PASSWORD=str |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,34 +1,108 @@ | ||
# LLM-Engineering | ||
|
||
## Dependencies | ||
Repository that contains all the code used throughout the [LLM Engineer's Handbook](https://www.amazon.com/LLM-Engineers-Handbook-engineering-production/dp/1836200072/). | ||
|
||
- Python 3.11 | ||
- Poetry 1.8.3 | ||
- Docker 26.0.0 | ||
![Book Cover](/images/book_cover.png) | ||
|
||
## Install | ||
# Dependencies | ||
|
||
## Local dependencies | ||
|
||
To install and run the project locally, you need the following dependencies (the code was tested with the specified versions of the dependencies): | ||
|
||
- [pyenv 2.3.36](https://github.com/pyenv/pyenv) (optional: for installing multiple Python versions on your machine) | ||
- [Python 3.11](https://www.python.org/downloads/) | ||
- [Poetry 1.8.3](https://python-poetry.org/docs/#installation) | ||
- [Docker 27.1.1](https://docs.docker.com/engine/install/) | ||
|
||
## Cloud services | ||
|
||
The code also uses and depends on the following cloud services. For now, you don't have to do anything. We will guide you in the installation and deployment sections on how to use them: | ||
|
||
- [HuggingFace](https://huggingface.co/): Model registry | ||
- [Comet ML](https://www.comet.com/site/): Experiment tracker | ||
- [Opik](https://www.comet.com/site/products/opik/): LLM evaluation and prompt monitoring | ||
- [ZenML](https://www.zenml.io/): Orchestrator | ||
- [AWS](https://aws.amazon.com/): Compute and storage | ||
- [MongoDB](https://www.mongodb.com/): NoSQL database | ||
- [Qdrant](https://qdrant.tech/): Vector database | ||
|
||
In the [LLM Engineer's Handbook](https://www.amazon.com/LLM-Engineers-Handbook-engineering-production/dp/1836200072/), Chapter 2 will walk you through each tool, and in Chapters 10 and 11, you will have step-by-step guides on how to set up everything you need. | ||
|
||
# Install | ||
|
||
## Install Python 3.11 using pyenv (Optional) | ||
|
||
If you have a different global Python version than Python 3.11, you can use pyenv to install Python 3.11 at the project level. Verify your Python version with: | ||
```shell | ||
python --version | ||
``` | ||
|
||
First, verify that you have pyenv installed: | ||
```shell | ||
pyenv --version | ||
# Output: pyenv 2.3.36 | ||
``` | ||
|
||
Install Python 3.11: | ||
```shell | ||
pyenv install 3.11 | ||
``` | ||
|
||
From the root of your repository, run the following to verify that everything works fine: | ||
```shell | ||
pyenv versions | ||
# Output: | ||
# system | ||
# * 3.11.8 (set by <path/to/repo>/LLM-Engineers-Handbook/.python-version) | ||
``` | ||
|
||
Because we defined a `.python-version` file within the repository, pyenv will know to pick up the version from that file and use it locally whenever you are working within that folder. To double-check that, run the following command while you are in the repository: | ||
```shell | ||
python --version | ||
# Output: Python 3.11.8 | ||
``` | ||
|
||
If you move out of this repository, both `pyenv versions` and `python --version` might output different Python versions. | ||
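The same check can be scripted. The following is a minimal sketch, assuming you want to compare only the major and minor version against the 3.11 the README says the code was tested with; `is_supported` is a hypothetical helper, not part of the repository.

```python
import sys

# Version the README says the code was tested with (major, minor).
REQUIRED = (3, 11)


def is_supported(version_info, required=REQUIRED):
    """Return True when the major.minor part of version_info equals required."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_supported(sys.version_info) else "unexpected"
    print(f"Python {current}: {status}")
```

Running this from inside and outside the repository folder should reflect the same difference that `python --version` shows.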
|
||
## Install project dependencies | ||
|
||
The first step is to verify that you have Poetry installed: | ||
```shell | ||
poetry --version | ||
# Output: Poetry (version 1.8.3) | ||
``` | ||
|
||
Use Poetry to install all the requirements needed to run the project locally. Because we only run locally at this point, we skip the AWS dependencies. We also install Poe the Poet as a Poetry plugin to manage our CLI commands, and pre-commit to verify our code before committing changes to git: | ||
```shell | ||
poetry install --without aws | ||
poetry self add 'poethepoet[poetry_plugin]' | ||
poetry self add 'poethepoet[poetry_plugin]==0.29.0' | ||
pre-commit install | ||
``` | ||
|
||
We run all the scripts using [Poe the Poet](https://poethepoet.natn.io/index.html). You don't have to do anything else but install it as a Poetry plugin. | ||
We run all the scripts using [Poe the Poet](https://poethepoet.natn.io/index.html). You don't have to do anything else but install Poe the Poet as a Poetry plugin, as described above: `poetry self add 'poethepoet[poetry_plugin]'` | ||
|
||
To activate the environment created by Poetry, run: | ||
```shell | ||
poetry shell | ||
``` | ||
|
||
## Set up .env settings file (for local development) | ||
|
||
### Configure sensitive information | ||
After you have installed all the dependencies, you must create a `.env` file with sensitive credentials to run the project. | ||
After you have installed all the dependencies, you must create and fill a `.env` file with your credentials to properly interact with other services and run the project. | ||
|
||
First, copy our example by running the following: | ||
```shell | ||
cp .env.example .env # The file has to be at the root of your repository! | ||
cp .env.example .env # The file must be at your repository's root! | ||
``` | ||
|
||
Now, let's understand how to fill in all the variables inside the `.env` file to get you started. | ||
|
||
We will begin by reviewing the mandatory settings we must complete when working locally or in the cloud. | ||
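Since `.env.example` ships with `str` as a placeholder value, it can help to check that the mandatory variables were actually filled in. Below is a minimal sketch using only the standard library; `parse_env`, `unfilled`, and the `REQUIRED_KEYS` list are hypothetical names chosen for illustration, and the parser handles only simple `KEY=VALUE` lines.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unfilled(values, required):
    """Return required keys that are missing, empty, or still set to `str`."""
    return [key for key in required if values.get(key, "") in ("", "str")]


# Assumption: these two keys stand in for the mandatory settings.
REQUIRED_KEYS = ["OPENAI_API_KEY", "HUGGINGFACE_ACCESS_TOKEN"]

sample = (
    "OPENAI_MODEL_ID=gpt-4o-mini\n"
    "OPENAI_API_KEY=str\n"
    'HUGGINGFACE_ACCESS_TOKEN="hf_example"\n'
)
print(unfilled(parse_env(sample), REQUIRED_KEYS))  # ['OPENAI_API_KEY']
```

To check a real file, pass the contents of `.env` to `parse_env` instead of `sample`. Add `COMET_API_KEY` and `COMET_WORKSPACE` to the list when you plan to run training.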
|
||
### OpenAI | ||
|
||
To authenticate to OpenAI, you must fill out the `OPENAI_API_KEY` env var with an authentication token. | ||
To authenticate to OpenAI's API, you must fill out the `OPENAI_API_KEY` env var with an authentication token. | ||
|
||
→ Check out this [tutorial](https://platform.openai.com/docs/quickstart) to learn how to provide one from OpenAI. | ||
|
||
|
@@ -38,32 +112,53 @@ To authenticate to HuggingFace, you must fill out the `HUGGINGFACE_ACCESS_TOKEN` | |
|
||
→ Check out this [tutorial](https://huggingface.co/docs/hub/en/security-tokens) to learn how to provide one from HuggingFace. | ||
|
||
### Comet ML | ||
|
||
### LinkedIn Crawling [Optional] | ||
This step is optional. You can finish the project without this step. | ||
Comet ML is required only during training. | ||
|
||
But in case you want to enable LinkedIn crawling, you have to fill in your username and password: | ||
```shell | ||
LINKEDIN_USERNAME = "str" | ||
LINKEDIN_PASSWORD = "str" | ||
To authenticate to Comet ML, you must fill out the `COMET_API_KEY` and `COMET_WORKSPACE` env vars with an authentication token and workspace name. | ||
|
||
→ Check out this [tutorial](https://www.comet.com/docs/v2/api-and-sdk/rest-api/overview/) to learn how to fill the Comet ML variables from above. | ||
|
||
### Opik | ||
|
||
> Soon | ||
|
||
## Set up .env settings file (for deployment) | ||
|
||
When deploying the project to the cloud, we must set additional settings for Mongo, Qdrant, and AWS. | ||
|
||
If you are just working locally, the default values of these env vars will work out-of-the-box. | ||
|
||
We will just highlight what has to be configured, as in **Chapter 11** of the [LLM Engineer's Handbook](https://www.amazon.com/LLM-Engineers-Handbook-engineering-production/dp/1836200072/) we provide step-by-step details on how to deploy the whole system to the cloud. | ||
|
||
### MongoDB | ||
|
||
We must set the `DATABASE_HOST` env var to the URL pointing to the cloud MongoDB cluster. | ||
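Before deploying, it can be useful to confirm which host a `DATABASE_HOST` value points to without printing its credentials. The sketch below assumes a single-host URI and uses only `urllib.parse`; `describe_mongo_uri` is a hypothetical helper, and the example hostname is a placeholder. Multi-host (replica set) URIs are not handled.

```python
from urllib.parse import urlsplit


def describe_mongo_uri(uri):
    """Return scheme, host, and port of a single-host MongoDB URI, hiding credentials."""
    parts = urlsplit(uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URI does not specify a port
        "has_credentials": parts.username is not None,
    }


# Placeholder URI for illustration only.
print(describe_mongo_uri("mongodb://user:secret@cluster.example.com:27017"))
```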
|
||
### Qdrant | ||
|
||
Set `USE_QDRANT_CLOUD` to `true`, `QDRANT_CLOUD_URL` to the URL, and `QDRANT_APIKEY` to the API key of your cloud Qdrant cluster. | ||
|
||
To work with Qdrant cloud, the env vars will look like this: | ||
```env | ||
USE_QDRANT_CLOUD=true | ||
QDRANT_CLOUD_URL="<your_qdrant_cloud_url>" | ||
QDRANT_APIKEY="<your_qdrant_api_key>" | ||
``` | ||
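How the flag might switch between the two modes can be sketched as follows. This is an illustration, not the repository's actual wiring: `qdrant_connection` is a hypothetical function, and the local fallback assumes the REST port `6333` that the README lists for the Docker setup.

```python
def qdrant_connection(env):
    """Pick cloud or local connection settings from a dict of env values."""
    flag = env.get("USE_QDRANT_CLOUD", "false").strip().lower()
    if flag in ("1", "true", "yes"):
        url = env.get("QDRANT_CLOUD_URL", "")
        api_key = env.get("QDRANT_APIKEY", "")
        if not url or not api_key:
            raise ValueError("QDRANT_CLOUD_URL and QDRANT_APIKEY are required in cloud mode")
        return {"url": url, "api_key": api_key}
    # Local default: the Qdrant REST port exposed by the local Docker setup.
    return {"host": "localhost", "port": 6333}


print(qdrant_connection({"USE_QDRANT_CLOUD": "false"}))
```

Failing early when the cloud flag is on but the URL or key is empty avoids a harder-to-read connection error later.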
|
||
For this to work, you also have to: | ||
- disable 2FA | ||
- disable suspicious activity | ||
### AWS | ||
|
||
We also recommend to: | ||
- create a dummy profile for crawling | ||
- crawl only your data | ||
|
||
|
||
> [!IMPORTANT] | ||
> Find more configuration options in the [settings.py](https://github.com/PacktPublishing/LLM-Engineering/blob/main/llm_engineering/settings.py) file. Every variable from the `Settings` class can be configured through the `.env` file. | ||
> Find more configuration options in the [settings.py](https://github.com/PacktPublishing/LLM-Engineers-Handbook/blob/main/llm_engineering/settings.py) file. Every variable from the `Settings` class can be configured through the `.env` file. | ||
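The idea that every settings field has a default which an env var of the same name can override can be sketched with the standard library. This is a stand-in for illustration only: the repository's actual `Settings` class is not shown in this diff, and the fields below are a small subset with defaults taken from the values in `.env.example` and this README.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    # Defaults mirror values shown in .env.example and the README.
    OPENAI_MODEL_ID: str = "gpt-4o-mini"
    DATABASE_NAME: str = "twin"
    RAG_MODEL_DEVICE: str = "cpu"
    AWS_REGION: str = "eu-central-1"

    @classmethod
    def from_env(cls, env=None):
        """Build settings, letting any matching env var override its default."""
        env = os.environ if env is None else env
        overrides = {f.name: env[f.name] for f in fields(cls) if f.name in env}
        return cls(**overrides)


settings = Settings.from_env({"RAG_MODEL_DEVICE": "cuda"})
print(settings.RAG_MODEL_DEVICE, settings.AWS_REGION)  # cuda eu-central-1
```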
|
||
## Run Locally | ||
# Run the project locally | ||
|
||
### Local Infrastructure | ||
## Local infrastructure | ||
|
||
> [!WARNING] | ||
> You need Docker installed (v27.1.1 or higher) | ||
|
@@ -84,7 +179,7 @@ poetry poe local-infrastructure-down | |
> `export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` | ||
> Otherwise, the connection between the local server and pipeline will break. 🔗 More details in [this issue](https://github.com/zenml-io/zenml/issues/2369). | ||
#### ZenML is now accessible at: | ||
### ZenML is now accessible at: | ||
|
||
Web UI: [localhost:8237](localhost:8237) | ||
|
||
|
@@ -94,15 +189,15 @@ Default credentials: | |
|
||
→🔗 [More on ZenML](https://docs.zenml.io/) | ||
|
||
#### Qdrant is now accessible at: | ||
### Qdrant is now accessible at: | ||
|
||
REST API: [localhost:6333](localhost:6333) | ||
Web UI: [localhost:6333/dashboard](localhost:6333/dashboard) | ||
GRPC API: [localhost:6334](localhost:6334) | ||
|
||
→🔗 [More on Qdrant](https://qdrant.tech/documentation/quick-start/) | ||
|
||
#### MongoDB is now accessible at: | ||
### MongoDB is now accessible at: | ||
|
||
database URI: `mongodb://decodingml:[email protected]:27017` | ||
database name: `twin` | ||
|
@@ -113,7 +208,7 @@ database name: `twin` | |
We will fill this section in the future. So far it is available only in the 11th Chapter of the book. | ||
|
||
|
||
### Run Pipelines | ||
## Run Pipelines | ||
|
||
All the pipelines will be orchestrated behind the scenes by ZenML. | ||
|
||
|
@@ -126,7 +221,7 @@ To see the pipelines running and their results: | |
|
||
**But first, let's understand how we can run all our ML pipelines** ↓ | ||
|
||
#### Data pipelines | ||
### Data pipelines | ||
|
||
Run the data collection ETL: | ||
```shell | ||
|
@@ -155,14 +250,14 @@ poetry poe run-end-to-end-data-pipeline | |
``` | ||
|
||
|
||
#### Utility pipelines | ||
### Utility pipelines | ||
|
||
Export ZenML artifacts to JSON: | ||
```shell | ||
poetry poe run-export-artifact-to-json-pipeline | ||
``` | ||
|
||
#### Training pipelines | ||
### Training pipelines | ||
|
||
```shell | ||
poetry poe run-training-pipeline | ||
|