feat: make llama3.1 as default #2022

Merged · 7 commits · Jul 31, 2024
Changes from 5 commits
2 changes: 1 addition & 1 deletion .github/workflows/actions/install_dependencies/action.yml
@@ -8,7 +8,7 @@ inputs:
  poetry_version:
    required: true
    type: string
-     default: "1.5.1"
+     default: "1.8.3"

runs:
  using: composite
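For context, a hedged sketch of how a workflow step could consume this composite action while pinning Poetry explicitly — the step name and checkout step are illustrative and not part of this PR:

```yaml
# Illustrative workflow excerpt (not from this diff): call the composite action
# with an explicit poetry_version instead of relying on its default.
steps:
  - uses: actions/checkout@v4
  - name: Install dependencies
    uses: ./.github/workflows/actions/install_dependencies
    with:
      poetry_version: "1.8.3"
```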
2 changes: 1 addition & 1 deletion Dockerfile.external
@@ -3,7 +3,7 @@ FROM python:3.11.6-slim-bookworm as base
# Install poetry
RUN pip install pipx
RUN python3 -m pipx ensurepath
- RUN pipx install poetry
+ RUN pipx install poetry==1.8.3
ENV PATH="/root/.local/bin:$PATH"
ENV PATH=".venv/bin/:$PATH"

2 changes: 1 addition & 1 deletion Dockerfile.local
@@ -5,7 +5,7 @@ FROM python:3.11.6-slim-bookworm as base
# Install poetry
RUN pip install pipx
RUN python3 -m pipx ensurepath
- RUN pipx install poetry
+ RUN pipx install poetry==1.8.3
ENV PATH="/root/.local/bin:$PATH"
ENV PATH=".venv/bin/:$PATH"

9 changes: 7 additions & 2 deletions fern/docs/pages/installation/installation.mdx
@@ -28,6 +28,11 @@ pyenv local 3.11
Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management:
Follow the instructions on the official Poetry website to install it.

+ <Callout intent="warning">
+ A bug exists in Poetry versions 1.7.0 and earlier. We strongly recommend upgrading to a tested version.
+ To upgrade Poetry to the latest tested version, run `poetry self update 1.8.3` after installing it.
+ </Callout>

### 4. Optional: Install `make`
To run various scripts, you need to install `make`. Follow the instructions for your operating system:
#### macOS
@@ -135,14 +140,14 @@ Now, start Ollama service (it will start a local inference server, serving both
ollama serve
```

- Install the models to be used, the default settings-ollama.yaml is configured to user mistral 7b LLM (~4GB) and nomic-embed-text Embeddings (~275MB)
+ Install the models to be used. The default settings-ollama.yaml is configured to use the llama3.1 8b LLM (~4GB) and nomic-embed-text embeddings (~275MB).

By default, PGPT will automatically pull models as needed. This behavior can be changed by modifying the `ollama.autopull_models` property.
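If you prefer to manage downloads yourself, that automatic pulling can be switched off; a minimal sketch — only the `ollama.autopull_models` property name comes from the docs above, its placement under the `ollama` section of `settings-ollama.yaml` is an assumption:

```yaml
# Sketch: disable automatic model pulling (assumed placement in settings-ollama.yaml).
ollama:
  llm_model: llama3.1
  embedding_model: nomic-embed-text
  autopull_models: false  # models must then be pulled manually, as shown below
```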

In any case, if you want to manually pull models, run the following commands:

```bash
- ollama pull mistral
+ ollama pull llama3.1
ollama pull nomic-embed-text
```

2 changes: 1 addition & 1 deletion fern/docs/pages/installation/troubleshooting.mdx
@@ -24,7 +24,7 @@ PrivateGPT uses the `AutoTokenizer` library to tokenize input text accurately. I
In your `settings.yaml` file, specify the model you want to use:
```yaml
llm:
-   tokenizer: mistralai/Mistral-7B-Instruct-v0.2
+   tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
```
2. **Set Access Token for Gated Models:**
If you are using a gated model, ensure the `access_token` is set as mentioned in the previous section.
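As a hedged illustration of that access-token step — the `HF_TOKEN` environment-variable name and the exact key are assumptions, check your own `settings.yaml`:

```yaml
huggingface:
  access_token: ${HF_TOKEN:}  # assumed env-var name; needed for gated models such as Meta-Llama-3.1
```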
4,613 changes: 2,207 additions & 2,406 deletions poetry.lock

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion private_gpt/launcher.py
@@ -37,7 +37,8 @@ async def bind_injector_to_request(request: Request) -> None:

# Add LlamaIndex simple observability
global_handler = create_global_handler("simple")
- LlamaIndexSettings.callback_manager = CallbackManager([global_handler])
+ if global_handler is not None:
+     LlamaIndexSettings.callback_manager = CallbackManager([global_handler])

settings = root_injector.get(Settings)
if settings.server.cors.enabled:
6 changes: 3 additions & 3 deletions settings-docker.yaml
@@ -9,8 +9,8 @@ embedding:
  mode: ${PGPT_MODE:sagemaker}

llamacpp:
-   llm_hf_repo_id: ${PGPT_HF_REPO_ID:TheBloke/Mistral-7B-Instruct-v0.1-GGUF}
-   llm_hf_model_file: ${PGPT_HF_MODEL_FILE:mistral-7b-instruct-v0.1.Q4_K_M.gguf}
+   llm_hf_repo_id: ${PGPT_HF_REPO_ID:lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF}
+   llm_hf_model_file: ${PGPT_HF_MODEL_FILE:Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf}

huggingface:
  embedding_hf_model_name: ${PGPT_EMBEDDING_HF_MODEL_NAME:BAAI/bge-small-en-v1.5}
@@ -20,7 +20,7 @@ sagemaker:
  embedding_endpoint_name: ${PGPT_SAGEMAKER_EMBEDDING_ENDPOINT_NAME:}

ollama:
-   llm_model: ${PGPT_OLLAMA_LLM_MODEL:mistral}
+   llm_model: ${PGPT_OLLAMA_LLM_MODEL:llama3.1}
  embedding_model: ${PGPT_OLLAMA_EMBEDDING_MODEL:nomic-embed-text}
  api_base: ${PGPT_OLLAMA_API_BASE:http://ollama:11434}
  embedding_api_base: ${PGPT_OLLAMA_EMBEDDING_API_BASE:http://ollama:11434}
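Because every value above uses the `${VAR:default}` pattern, these Docker defaults can be overridden at runtime through environment variables; a compose-style sketch with a hypothetical service name:

```yaml
# Hypothetical docker-compose override: environment variables take precedence
# over the defaults baked into settings-docker.yaml.
services:
  private-gpt:
    environment:
      PGPT_OLLAMA_LLM_MODEL: llama3.1
      PGPT_OLLAMA_API_BASE: http://ollama:11434
```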
8 changes: 4 additions & 4 deletions settings-local.yaml
@@ -7,12 +7,12 @@ llm:
  # Should be matching the selected model
  max_new_tokens: 512
  context_window: 3900
-   tokenizer: mistralai/Mistral-7B-Instruct-v0.2
-   prompt_style: "mistral"
+   tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
+   prompt_style: "llama3"

llamacpp:
-   llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
-   llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
+   llm_hf_repo_id: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
+   llm_hf_model_file: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

embedding:
  mode: huggingface
2 changes: 1 addition & 1 deletion settings-ollama-pg.yaml
@@ -14,7 +14,7 @@ embedding:
  embed_dim: 768

ollama:
-   llm_model: mistral
+   llm_model: llama3.1
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434

2 changes: 1 addition & 1 deletion settings-ollama.yaml
@@ -11,7 +11,7 @@ embedding:
  mode: ollama

ollama:
-   llm_model: mistral
+   llm_model: llama3.1
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434
  embedding_api_base: http://localhost:11434 # change if your embedding model runs on another ollama
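Per the comment on `embedding_api_base`, embeddings can be served from a separate Ollama instance; a sketch in which the second host address is purely illustrative:

```yaml
ollama:
  llm_model: llama3.1
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434
  embedding_api_base: http://192.168.1.50:11434  # illustrative address of a second Ollama host
```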
2 changes: 1 addition & 1 deletion settings-vllm.yaml
@@ -4,7 +4,7 @@ server:
llm:
  mode: openailike
  max_new_tokens: 512
-   tokenizer: mistralai/Mistral-7B-Instruct-v0.2
+   tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
  temperature: 0.1

embedding:
10 changes: 5 additions & 5 deletions settings.yaml
@@ -36,12 +36,12 @@ ui:

llm:
  mode: llamacpp
-   prompt_style: "mistral"
+   prompt_style: "llama3"
  # Should be matching the selected model
  max_new_tokens: 512
  context_window: 3900
  # Select your tokenizer. Llama-index tokenizer is the default.
-   # tokenizer: mistralai/Mistral-7B-Instruct-v0.2
+   # tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
  temperature: 0.1 # The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)

rag:
@@ -62,8 +62,8 @@ clickhouse:
  database: embeddings

llamacpp:
-   llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
-   llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
+   llm_hf_repo_id: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
+   llm_hf_model_file: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
  tfs_z: 1.0 # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting
  top_k: 40 # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
  top_p: 1.0 # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
@@ -111,7 +111,7 @@ openai:
  embedding_api_key: ${OPENAI_API_KEY:}

ollama:
-   llm_model: llama2
+   llm_model: llama3.1
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434
  embedding_api_base: http://localhost:11434 # change if your embedding model runs on another ollama