From fe0e1782d26fe51b9c7ef2d85a767a6789388c50 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 20 Nov 2024 19:46:27 +0700 Subject: [PATCH 01/17] Add example for AudioQnA deploy in AMD ROCm (#1147) Signed-off-by: artem-astafev Signed-off-by: Artem Astafev Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Liang Lv Signed-off-by: Chingis Yundunov --- .../docker_compose/amd/gpu/rocm/README.md | 170 ++++++++++++++++++ .../docker_compose/amd/gpu/rocm/compose.yaml | 110 ++++++++++++ .../docker_compose/amd/gpu/rocm/set_env.sh | 26 +++ AudioQnA/tests/test_compose_on_rocm.sh | 128 +++++++++++++ 4 files changed, 434 insertions(+) create mode 100644 AudioQnA/docker_compose/amd/gpu/rocm/README.md create mode 100644 AudioQnA/docker_compose/amd/gpu/rocm/compose.yaml create mode 100644 AudioQnA/docker_compose/amd/gpu/rocm/set_env.sh create mode 100644 AudioQnA/tests/test_compose_on_rocm.sh diff --git a/AudioQnA/docker_compose/amd/gpu/rocm/README.md b/AudioQnA/docker_compose/amd/gpu/rocm/README.md new file mode 100644 index 0000000000..3ae8cc8a38 --- /dev/null +++ b/AudioQnA/docker_compose/amd/gpu/rocm/README.md @@ -0,0 +1,170 @@ +# Build Mega Service of AudioQnA on AMD ROCm GPU + +This document outlines the deployment process for a AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice +pipeline on server on AMD ROCm GPU platform. + +## πŸš€ Build Docker images + +### 1. Source Code install GenAIComps + +```bash +git clone https://github.com/opea-project/GenAIComps.git +cd GenAIComps +``` + +### 2. Build ASR Image + +```bash +docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile . + + +docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile . +``` + +### 3. Build LLM Image + +```bash +docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile . +``` + +Note: +For compose for ROCm example AMD optimized image hosted in huggingface repo will be used for TGI service: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm (https://github.com/huggingface/text-generation-inference) + +### 4. Build TTS Image + +```bash +docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/dependency/Dockerfile . + +docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/Dockerfile . +``` + +### 6. Build MegaService Docker Image + +To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below: + +```bash +git clone https://github.com/opea-project/GenAIExamples.git +cd GenAIExamples/AudioQnA/ +docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . +``` + +Then run the command `docker images`, you will have following images ready: + +1. `opea/whisper:latest` +2. `opea/asr:latest` +3. `opea/llm-tgi:latest` +4. `opea/speecht5:latest` +5. `opea/tts:latest` +6. 
`opea/audioqna:latest`
+
+## πŸš€ Set the environment variables
+
+Before starting the services with `docker compose`, you have to recheck the following environment variables.
+
+```bash
+export host_ip= # your external IP address, e.g. export host_ip=$(hostname -I | awk '{print $1}')
+export HUGGINGFACEHUB_API_TOKEN= # your Hugging Face Hub API token
+
+export TGI_LLM_ENDPOINT=http://$host_ip:3006
+export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
+
+export ASR_ENDPOINT=http://$host_ip:7066
+export TTS_ENDPOINT=http://$host_ip:7055
+
+export MEGA_SERVICE_HOST_IP=${host_ip}
+export ASR_SERVICE_HOST_IP=${host_ip}
+export TTS_SERVICE_HOST_IP=${host_ip}
+export LLM_SERVICE_HOST_IP=${host_ip}
+
+export ASR_SERVICE_PORT=3001
+export TTS_SERVICE_PORT=3002
+export LLM_SERVICE_PORT=3007
+```
+
+Alternatively, use the `set_env.sh` file to set up the environment variables.
+
+Note: Please replace `host_ip` with your external IP address; do not use localhost.
+
+Note: To limit access to a subset of GPUs, pass each device individually using one or more `--device /dev/dri/renderD<node>` options, where `<node>` is the card index, starting from 128. See https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus
+
+Example of isolating 1 GPU:
+
+      - /dev/dri/card0:/dev/dri/card0
+      - /dev/dri/renderD128:/dev/dri/renderD128
+
+Example of isolating 2 GPUs:
+
+      - /dev/dri/card0:/dev/dri/card0
+      - /dev/dri/renderD128:/dev/dri/renderD128
+      - /dev/dri/card1:/dev/dri/card1
+      - /dev/dri/renderD129:/dev/dri/renderD129
+
+Please find more information about accessing and restricting AMD GPUs at https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus
+
+## πŸš€ Start the MegaService
+
+```bash
+cd GenAIExamples/AudioQnA/docker_compose/amd/gpu/rocm/
+docker compose up -d
+```
+
+In the following cases, you could build the Docker images from source yourself:
+
+- The Docker image failed to download.
+- You want to use a specific version of the Docker image.
+
+Please refer to the 'Build Docker Images' section above.
+
+## πŸš€ Consume the AudioQnA Service
+
+Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the
+base64 string to the megaservice endpoint. The megaservice will return a spoken response as a base64 string. To listen
+to the response, decode the base64 string and save it as a .wav file.
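+
+For a recording of your own, the .wav file can first be converted to a base64 string, for example with the GNU coreutils `base64` utility (an illustrative sketch that assumes an existing `input.wav` in the current directory):
+
+```bash
+# -w 0 disables line wrapping so the string can be embedded directly in the JSON payload
+B64_STR=$(base64 -w 0 input.wav)
+curl http://${host_ip}:3008/v1/audioqna \
+  -X POST \
+  -d '{"audio": "'"${B64_STR}"'", "max_tokens":64}' \
+  -H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
+```
+
+The same request with a small prebuilt sample string is shown below.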
+ +```bash +curl http://${host_ip}:3008/v1/audioqna \ + -X POST \ + -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' \ + -H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav +``` + +## πŸš€ Test MicroServices + +```bash +# whisper service +curl http://${host_ip}:7066/v1/asr \ + -X POST \ + -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \ + -H 'Content-Type: application/json' + +# asr microservice +curl http://${host_ip}:3001/v1/audio/transcriptions \ + -X POST \ + -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \ + -H 'Content-Type: application/json' + +# tgi service +curl http://${host_ip}:3006/generate \ + -X POST \ + -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \ + -H 'Content-Type: application/json' + +# llm microservice +curl http://${host_ip}:3007/v1/chat/completions\ + -X POST \ + -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \ + -H 'Content-Type: application/json' + +# speecht5 service +curl http://${host_ip}:7055/v1/tts \ + -X POST \ + -d '{"text": "Who are you?"}' \ + -H 'Content-Type: application/json' + +# tts microservice +curl http://${host_ip}:3002/v1/audio/speech \ + -X POST \ + -d '{"text": "Who are you?"}' \ + -H 'Content-Type: application/json' + +``` diff --git a/AudioQnA/docker_compose/amd/gpu/rocm/compose.yaml b/AudioQnA/docker_compose/amd/gpu/rocm/compose.yaml new file mode 100644 index 0000000000..651fd5464b --- /dev/null +++ b/AudioQnA/docker_compose/amd/gpu/rocm/compose.yaml @@ -0,0 +1,110 @@ +# Copyright (C) 2024 Advanced Micro Devices, Inc. 
+# SPDX-License-Identifier: Apache-2.0 + +services: + whisper-service: + image: ${REGISTRY:-opea}/whisper:${TAG:-latest} + container_name: whisper-service + ports: + - "7066:7066" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + restart: unless-stopped + asr: + image: ${REGISTRY:-opea}/asr:${TAG:-latest} + container_name: asr-service + ports: + - "3001:9099" + ipc: host + environment: + ASR_ENDPOINT: ${ASR_ENDPOINT} + speecht5-service: + image: ${REGISTRY:-opea}/speecht5:${TAG:-latest} + container_name: speecht5-service + ports: + - "7055:7055" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + restart: unless-stopped + tts: + image: ${REGISTRY:-opea}/tts:${TAG:-latest} + container_name: tts-service + ports: + - "3002:9088" + ipc: host + environment: + TTS_ENDPOINT: ${TTS_ENDPOINT} + tgi-service: + image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + container_name: tgi-service + ports: + - "3006:80" + volumes: + - "./data:/data" + shm_size: 1g + devices: + - /dev/kfd:/dev/kfd + - /dev/dri/card1:/dev/dri/card1 + - /dev/dri/renderD136:/dev/dri/renderD136 + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 + command: --model-id ${LLM_MODEL_ID} + cap_add: + - SYS_PTRACE + group_add: + - video + security_opt: + - seccomp:unconfined + ipc: host + llm: + image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest} + container_name: llm-tgi-server + depends_on: + - tgi-service + ports: + - "3007:9000" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT} + HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} + restart: unless-stopped + audioqna-backend-server: + image: ${REGISTRY:-opea}/audioqna:${TAG:-latest} + container_name: audioqna-xeon-backend-server + depends_on: + - asr + - llm + - tts + ports: + - "3008:8888" + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP} + - ASR_SERVICE_HOST_IP=${ASR_SERVICE_HOST_IP} + - ASR_SERVICE_PORT=${ASR_SERVICE_PORT} + - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP} + - LLM_SERVICE_PORT=${LLM_SERVICE_PORT} + - TTS_SERVICE_HOST_IP=${TTS_SERVICE_HOST_IP} + - TTS_SERVICE_PORT=${TTS_SERVICE_PORT} + ipc: host + restart: always + +networks: + default: + driver: bridge diff --git a/AudioQnA/docker_compose/amd/gpu/rocm/set_env.sh b/AudioQnA/docker_compose/amd/gpu/rocm/set_env.sh new file mode 100644 index 0000000000..8765b702b3 --- /dev/null +++ b/AudioQnA/docker_compose/amd/gpu/rocm/set_env.sh @@ -0,0 +1,26 @@ +#!/usr/bin/env bash set_env.sh + +# Copyright (C) 2024 Advanced Micro Devices, Inc. 
+# SPDX-License-Identifier: Apache-2.0 + + +# export host_ip= # export host_ip=$(hostname -I | awk '{print $1}') + +export host_ip="192.165.1.21" +export HUGGINGFACEHUB_API_TOKEN=${YOUR_HUGGINGFACEHUB_API_TOKEN} +# + +export TGI_LLM_ENDPOINT=http://$host_ip:3006 +export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3 + +export ASR_ENDPOINT=http://$host_ip:7066 +export TTS_ENDPOINT=http://$host_ip:7055 + +export MEGA_SERVICE_HOST_IP=${host_ip} +export ASR_SERVICE_HOST_IP=${host_ip} +export TTS_SERVICE_HOST_IP=${host_ip} +export LLM_SERVICE_HOST_IP=${host_ip} + +export ASR_SERVICE_PORT=3001 +export TTS_SERVICE_PORT=3002 +export LLM_SERVICE_PORT=3007 diff --git a/AudioQnA/tests/test_compose_on_rocm.sh b/AudioQnA/tests/test_compose_on_rocm.sh new file mode 100644 index 0000000000..86a1484728 --- /dev/null +++ b/AudioQnA/tests/test_compose_on_rocm.sh @@ -0,0 +1,128 @@ +#!/bin/bash +# Copyright (C) 2024 Advanced Micro Devices, Inc. +# SPDX-License-Identifier: Apache-2.0 + +set -ex +IMAGE_REPO=${IMAGE_REPO:-"opea"} +IMAGE_TAG=${IMAGE_TAG:-"latest"} +echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}" +echo "TAG=IMAGE_TAG=${IMAGE_TAG}" +export REGISTRY=${IMAGE_REPO} +export TAG=${IMAGE_TAG} + +WORKPATH=$(dirname "$PWD") +LOG_PATH="$WORKPATH/tests" +ip_address=$(hostname -I | awk '{print $1}') +export PATH="~/miniconda3/bin:$PATH" + +function build_docker_images() { + cd $WORKPATH/docker_image_build + git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../ + + echo "Build all the images with --no-cache, check docker_image_build.log for details..." + service_list="audioqna whisper asr llm-tgi speecht5 tts" + docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log + echo "docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm" + docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + docker images && sleep 1s +} + +function start_services() { + cd $WORKPATH/docker_compose/amd/gpu/rocm/ + export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} + export TGI_LLM_ENDPOINT=http://$ip_address:3006 + export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3 + + export ASR_ENDPOINT=http://$ip_address:7066 + export TTS_ENDPOINT=http://$ip_address:7055 + + export MEGA_SERVICE_HOST_IP=${ip_address} + export ASR_SERVICE_HOST_IP=${ip_address} + export TTS_SERVICE_HOST_IP=${ip_address} + export LLM_SERVICE_HOST_IP=${ip_address} + + export ASR_SERVICE_PORT=3001 + export TTS_SERVICE_PORT=3002 + export LLM_SERVICE_PORT=3007 + + # sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env + + # Start Docker Containers + docker compose up -d > ${LOG_PATH}/start_services_with_compose.log + n=0 + until [[ "$n" -ge 100 ]]; do + docker logs tgi-service > $LOG_PATH/tgi_service_start.log + if grep -q Connected $LOG_PATH/tgi_service_start.log; then + break + fi + sleep 5s + n=$((n+1)) + done +} +function validate_megaservice() { + result=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json') + echo $result + if [[ $result == *"AAA"* ]]; then + echo "Result correct." 
+ else + docker logs whisper-service > $LOG_PATH/whisper-service.log + docker logs asr-service > $LOG_PATH/asr-service.log + docker logs speecht5-service > $LOG_PATH/tts-service.log + docker logs tts-service > $LOG_PATH/tts-service.log + docker logs tgi-service > $LOG_PATH/tgi-service.log + docker logs llm-tgi-server > $LOG_PATH/llm-tgi-server.log + docker logs audioqna-xeon-backend-server > $LOG_PATH/audioqna-xeon-backend-server.log + + echo "Result wrong." + exit 1 + fi + +} + +#function validate_frontend() { +# Frontend tests are currently disabled +# cd $WORKPATH/ui/svelte +# local conda_env_name="OPEA_e2e" +# export PATH=${HOME}/miniforge3/bin/:$PATH +## conda remove -n ${conda_env_name} --all -y +## conda create -n ${conda_env_name} python=3.12 -y +# source activate ${conda_env_name} +# +# sed -i "s/localhost/$ip_address/g" playwright.config.ts +# +## conda install -c conda-forge nodejs -y +# npm install && npm ci && npx playwright install --with-deps +# node -v && npm -v && pip list +# +# exit_status=0 +# npx playwright test || exit_status=$? +# +# if [ $exit_status -ne 0 ]; then +# echo "[TEST INFO]: ---------frontend test failed---------" +# exit $exit_status +# else +# echo "[TEST INFO]: ---------frontend test passed---------" +# fi +#} + +function stop_docker() { + cd $WORKPATH/docker_compose/amd/gpu/rocm/ + docker compose stop && docker compose rm -f +} + +function main() { + + stop_docker + if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi + start_services + + validate_megaservice + # Frontend tests are currently disabled + # validate_frontend + + stop_docker + echo y | docker system prune + +} + +main From 0cfb71d3cfd274ad209a735d5b9d482d2bf4b289 Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Tue, 26 Nov 2024 22:50:10 +0700 Subject: [PATCH 02/17] TranslationApp - add: 1. Docker Compose file 2. Set envs scripts 3. 
Tests script for deploy and tests Translation Application on AMD GPU Signed-off-by: Chingis Yundunov --- .../docker_compose/amd/gpu/rocm/compose.yaml | 99 ++++++++++ .../docker_compose/amd/gpu/rocm/set_env.sh | 18 ++ Translation/tests/test_compose_on_rocm.sh | 178 ++++++++++++++++++ 3 files changed, 295 insertions(+) create mode 100644 Translation/docker_compose/amd/gpu/rocm/compose.yaml create mode 100644 Translation/docker_compose/amd/gpu/rocm/set_env.sh create mode 100644 Translation/tests/test_compose_on_rocm.sh diff --git a/Translation/docker_compose/amd/gpu/rocm/compose.yaml b/Translation/docker_compose/amd/gpu/rocm/compose.yaml new file mode 100644 index 0000000000..8a4d70da28 --- /dev/null +++ b/Translation/docker_compose/amd/gpu/rocm/compose.yaml @@ -0,0 +1,99 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +services: + translation-tgi-service: + image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + container_name: translation-tgi-service + ports: + - "${TRANSLATIONS_TGI_SERVICE_PORT:-8008}:80" + volumes: + - "/var/lib/GenAI/translation/data:/data" + shm_size: 8g + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + TGI_LLM_ENDPOINT: ${TRANSLATIONS_TGI_LLM_ENDPOINT} + HUGGING_FACE_HUB_TOKEN: ${TRANSLATIONS_HUGGINGFACEHUB_API_TOKEN} + HUGGINGFACEHUB_API_TOKEN: ${TRANSLATIONS_HUGGINGFACEHUB_API_TOKEN} + devices: + - /dev/kfd:/dev/kfd + - /dev/dri/:/dev/dri/ + cap_add: + - SYS_PTRACE + group_add: + - video + security_opt: + - seccomp:unconfined + ipc: host + command: --model-id ${TRANSLATIONS_LLM_MODEL_ID} + translation-llm: + image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest} + container_name: translation-llm-tgi-server + depends_on: + - translation-tgi-service + ports: + - "9000:9000" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + TGI_LLM_ENDPOINT: ${TRANSLATION_TGI_LLM_ENDPOINT} + HUGGINGFACEHUB_API_TOKEN: ${TRANSLATION_HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 + restart: unless-stopped + translation-backend-server: + image: ${REGISTRY:-opea}/translation:${TAG:-latest} + container_name: translation-backend-server + depends_on: + - translation-tgi-service + - translation-llm + ports: + - "${TRANSLATION_BACKEND_SERVICE_PORT:-8888}:8888" + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - MEGA_SERVICE_HOST_IP=${TRANSLATION_MEGA_SERVICE_HOST_IP} + - LLM_SERVICE_HOST_IP=${TRANSLATION_LLM_SERVICE_HOST_IP} + ipc: host + restart: always + translation-ui-server: + image: ${REGISTRY:-opea}/translation-ui:${TAG:-latest} + container_name: translation-ui-server + depends_on: + - translation-backend-server + ports: + - "${TRANSLATION_FRONTEND_SERVICE_PORT:-5173}:5173" + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - BASE_URL=${TRANSLATION_BACKEND_SERVICE_ENDPOINT} + ipc: host + restart: always + translation-nginx-server: + image: ${REGISTRY:-opea}/nginx:${TAG:-latest} + container_name: translation-nginx-server + depends_on: + - translation-backend-server + - translation-ui-server + ports: + - "${TRANSLATION_NGINX_PORT:-80}:80" + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - FRONTEND_SERVICE_IP=${TRANSLATION_FRONTEND_SERVICE_IP} + - FRONTEND_SERVICE_PORT=${TRANSLATION_FRONTEND_SERVICE_PORT} + - 
BACKEND_SERVICE_NAME=${TRANSLATION_BACKEND_SERVICE_NAME} + - BACKEND_SERVICE_IP=${TRANSLATION_BACKEND_SERVICE_IP} + - BACKEND_SERVICE_PORT=${TRANSLATION_BACKEND_SERVICE_PORT} + ipc: host + restart: always +networks: + default: + driver: bridge diff --git a/Translation/docker_compose/amd/gpu/rocm/set_env.sh b/Translation/docker_compose/amd/gpu/rocm/set_env.sh new file mode 100644 index 0000000000..76a2397b97 --- /dev/null +++ b/Translation/docker_compose/amd/gpu/rocm/set_env.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash + +# SPDX-License-Identifier: Apache-2.0 + +export TRANSLATION_HOST_IP='192.165.1.21' +export TRANSLATION_EXTERNAL_HOST_IP='direct-supercomputer1.powerml.co' +export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" +export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" +export TRANSLATION_HUGGINGFACEHUB_API_TOKEN='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' +export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} +export TRANSLATION_LLM_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} +export TRANSLATION_FRONTEND_SERVICE_IP=${TRANSLATION_HOST_IP} +export TRANSLATION_FRONTEND_SERVICE_PORT=18122 +export TRANSLATION_BACKEND_SERVICE_NAME=translation +export TRANSLATION_BACKEND_SERVICE_IP=${TRANSLATION_HOST_IP} +export TRANSLATION_BACKEND_SERVICE_PORT=18121 +export TRANSLATION_BACKEND_SERVICE_ENDPOINT="http://${TRANSLATION_EXTERNAL_HOST_IP}:${TRANSLATION_BACKEND_SERVICE_PORT}/v1/translation" +export TRANSLATION_NGINX_PORT=18123 diff --git a/Translation/tests/test_compose_on_rocm.sh b/Translation/tests/test_compose_on_rocm.sh new file mode 100644 index 0000000000..514ddb3ee7 --- /dev/null +++ b/Translation/tests/test_compose_on_rocm.sh @@ -0,0 +1,178 @@ +#!/bin/bash +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +set -xe +IMAGE_REPO=${IMAGE_REPO:-"opea"} +IMAGE_TAG=${IMAGE_TAG:-"latest"} +echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}" +echo "TAG=IMAGE_TAG=${IMAGE_TAG}" +export REGISTRY=${IMAGE_REPO} +export TAG=${IMAGE_TAG} + +WORKPATH=$(dirname "$PWD") +LOG_PATH="$WORKPATH/tests" +ip_address=$(hostname -I | awk '{print $1}') + +function build_docker_images() { + cd $WORKPATH/docker_image_build + git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../ + + echo "Build all the images with --no-cache, check docker_image_build.log for details..." 
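+    # Only the OPEA images in service_list below are built from source; the ROCm-enabled TGI image is pulled prebuilt afterwards.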
+ service_list="translation translation-ui llm-tgi nginx" + docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log + + docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + docker images && sleep 1s +} + +function start_services() { + cd $WORKPATH/docker_compose/amd/gpu/rocm/ + + export TRANSLATION_HOST_IP=${ip_address} + export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" + export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" + export TRANSLATION_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} + export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} + export TRANSLATION_LLM_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} + export TRANSLATION_FRONTEND_SERVICE_IP=${TRANSLATION_HOST_IP} + export TRANSLATION_FRONTEND_SERVICE_PORT=5173 + export TRANSLATION_BACKEND_SERVICE_NAME=translation + export TRANSLATION_BACKEND_SERVICE_IP=${TRANSLATION_HOST_IP} + export TRANSLATION_BACKEND_SERVICE_PORT=8888 + export TRANSLATION_BACKEND_SERVICE_ENDPOINT="http://${TRANSLATION_HOST_IP}:${TRANSLATION_BACKEND_SERVICE_PORT}/v1/translation" + export TRANSLATION_NGINX_PORT=8084 + + sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env + + # Start Docker Containers + docker compose up -d > ${LOG_PATH}/start_services_with_compose.log + + n=0 + # wait long for llm model download + until [[ "$n" -ge 500 ]]; do + docker logs translation-tgi-service > ${LOG_PATH}/translation-tgi-service_start.log + if grep -q Connected ${LOG_PATH}/translation-tgi-service_start.log; then + break + fi + sleep 10s + n=$((n+1)) + done +} + +function validate_services() { + local URL="$1" + local EXPECTED_RESULT="$2" + local SERVICE_NAME="$3" + local DOCKER_NAME="$4" + local INPUT_DATA="$5" + + local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL") + if [ "$HTTP_STATUS" -eq 200 ]; then + echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..." + + local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log) + + if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then + echo "[ $SERVICE_NAME ] Content is as expected." + else + echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT" + docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log + exit 1 + fi + else + echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS" + docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log + exit 1 + fi + sleep 1s +} + +function validate_microservices() { + # Check if the microservices are running correctly. 
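+    # Each validate_services call below posts a sample request and exits non-zero (after dumping the container log) if the HTTP status or response body is not as expected.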
+ + # tgi for llm service + validate_services \ + "${TRANSLATION_HOST_IP}:8008/generate" \ + "generated_text" \ + "translation-tgi-service" \ + "translation-tgi-service" \ + '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' + + # llm microservice + validate_services \ + "${TRANSLATION_HOST_IP}:9000/v1/chat/completions" \ + "data: " \ + "translation-llm" \ + "translation-llm-tgi-server" \ + '{"query":"Translate this from Chinese to English:\nChinese: ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚\nEnglish:"}' +} + +function validate_megaservice() { + # Curl the Mega Service + validate_services \ + "${TRANSLATION_HOST_IP}:8888/v1/translation" \ + "translation" \ + "translation-backend-server" \ + "translation-backend-server" \ + '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' + + # test the megeservice via nginx + validate_services \ + "${TRANSLATION_HOST_IP}:${TRANSLATION_NGINX_PORT}/v1/translation" \ + "translation" \ + "translation-nginx-server" \ + "translation-nginx-server" \ + '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' +} + +function validate_frontend() { + cd $WORKPATH/ui/svelte + local conda_env_name="OPEA_e2e" + export PATH=${HOME}/miniconda3/bin/:$PATH + if conda info --envs | grep -q "$conda_env_name"; then + echo "$conda_env_name exist!" + else + conda create -n ${conda_env_name} python=3.12 -y + fi + source activate ${conda_env_name} + + sed -i "s/localhost/$ip_address/g" playwright.config.ts + + conda install -c conda-forge nodejs=22.6.0 -y + npm install && npm ci && npx playwright install --with-deps + node -v && npm -v && pip list + + exit_status=0 + npx playwright test || exit_status=$? + + if [ $exit_status -ne 0 ]; then + echo "[TEST INFO]: ---------frontend test failed---------" + exit $exit_status + else + echo "[TEST INFO]: ---------frontend test passed---------" + fi +} + +function stop_docker() { + cd $WORKPATH/docker_compose/amd/gpu/rocm/ + docker compose stop && docker compose rm -f +} + +function main() { + + stop_docker + + if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi + start_services + + validate_microservices + validate_megaservice + validate_frontend + + stop_docker + echo y | docker system prune + +} + +main From 100dfbfbe12b7d67e639926c94948e36dfb110e2 Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Tue, 26 Nov 2024 22:56:18 +0700 Subject: [PATCH 03/17] TranslationApp - add README file Signed-off-by: Chingis Yundunov --- .../docker_compose/amd/gpu/rocm/README.md | 128 ++++++++++++++++++ 1 file changed, 128 insertions(+) create mode 100644 Translation/docker_compose/amd/gpu/rocm/README.md diff --git a/Translation/docker_compose/amd/gpu/rocm/README.md b/Translation/docker_compose/amd/gpu/rocm/README.md new file mode 100644 index 0000000000..5cff6dc36e --- /dev/null +++ b/Translation/docker_compose/amd/gpu/rocm/README.md @@ -0,0 +1,128 @@ +# Build and deploy Translation Application on AMD GPU (ROCm) + +## Build images + +### Build the LLM Docker Image + +```bash +### Cloning repo +git clone https://github.com/opea-project/GenAIComps.git +cd GenAIComps + +### Build Docker image +docker build -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile . 
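+
+### Optionally confirm the image was built (illustrative check)
+docker images | grep llm-tgi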
+``` + +### Build the MegaService Docker Image + +```bash +### Cloning repo +git clone https://github.com/opea-project/GenAIExamples +cd GenAIExamples/Translation/ + +### Build Docker image +docker build -t opea/translation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . +``` + +### Build the UI Docker Image + +```bash +cd GenAIExamples/Translation/ui +### Build UI Docker image +docker build -t opea/translation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile . +``` + +## Deploy Translation Application + +### Features of Docker compose for AMD GPUs + +1. Added forwarding of GPU devices to the container TGI service with instructions: + +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri/:/dev/dri/ +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined +``` + +In this case, all GPUs are thrown. To reset a specific GPU, you need to use specific device names cardN and renderN. + +For example: + +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri/card0:/dev/dri/card0 + - /dev/dri/render128:/dev/dri/render128 +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined +``` + +To find out which GPU device IDs cardN and renderN correspond to the same GPU, use the GPU driver utility + +### Go to the directory with the Docker compose file + +```bash +cd GenAIExamples/Translation/docker_compose/amd/gpu/rocm +``` + +### Set environments + +In the file "GenAIExamples/Translation/docker_compose/amd/gpu/rocm/set_env.sh " it is necessary to set the required values. Parameter assignments are specified in the comments for each variable setting command + +```bash +chmod +x set_env.sh +. 
set_env.sh +``` + +### Run services + +``` +docker compose up -d +``` + +# Validate the MicroServices and MegaService + +## Validate TGI service + +```bash +curl http://${TRANSLATION_HOST_IP}:${TRANSLATIONS_TGI_SERVICE_PORT}/generate \ + -X POST \ + -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \ + -H 'Content-Type: application/json' +``` + +## Validate LLM service + +```bash +curl http://${TRANSLATION_HOST_IP}:9000/v1/chat/completions \ + -X POST \ + -d '{"query":"Translate this from Chinese to English:\nChinese: ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚\nEnglish:"}' \ + -H 'Content-Type: application/json' +``` + +## Validate MegaService + +```bash +curl http://${TRANSLATION_HOST_IP}:${TRANSLATION_BACKEND_SERVICE_PORT}/v1/translation -H "Content-Type: application/json" -d '{ + "language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' +``` + +## Validate Nginx service + +```bash +curl http://${TRANSLATION_HOST_IP}:${TRANSLATION_NGINX_PORT}/v1/translation \ + -H "Content-Type: application/json" \ + -d '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' +``` From e458aacfacf018b87487a09c9ae785295417746c Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Tue, 26 Nov 2024 23:10:59 +0700 Subject: [PATCH 04/17] TranslationApp - fix Docker compose file and tests script Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/set_env.sh | 1 + Translation/tests/test_compose_on_rocm.sh | 1 + 2 files changed, 2 insertions(+) diff --git a/Translation/docker_compose/amd/gpu/rocm/set_env.sh b/Translation/docker_compose/amd/gpu/rocm/set_env.sh index 76a2397b97..87c749ea81 100644 --- a/Translation/docker_compose/amd/gpu/rocm/set_env.sh +++ b/Translation/docker_compose/amd/gpu/rocm/set_env.sh @@ -5,6 +5,7 @@ export TRANSLATION_HOST_IP='192.165.1.21' export TRANSLATION_EXTERNAL_HOST_IP='direct-supercomputer1.powerml.co' export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" +export MODEL_ID="haoranxu/ALMA-13B" export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" export TRANSLATION_HUGGINGFACEHUB_API_TOKEN='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' export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} diff --git a/Translation/tests/test_compose_on_rocm.sh b/Translation/tests/test_compose_on_rocm.sh index 514ddb3ee7..eba06b26aa 100644 --- a/Translation/tests/test_compose_on_rocm.sh +++ b/Translation/tests/test_compose_on_rocm.sh @@ -31,6 +31,7 @@ function start_services() { export TRANSLATION_HOST_IP=${ip_address} export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" + export MODEL_ID="haoranxu/ALMA-13B" export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" export TRANSLATION_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} From adfb587a7c4db0d5aa1c6d3e32c14f2a4c138be2 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 26 Nov 2024 16:03:20 +0000 Subject: [PATCH 05/17] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/set_env.sh | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Translation/docker_compose/amd/gpu/rocm/set_env.sh b/Translation/docker_compose/amd/gpu/rocm/set_env.sh index 87c749ea81..f6a2cec913 100644 --- a/Translation/docker_compose/amd/gpu/rocm/set_env.sh +++ 
b/Translation/docker_compose/amd/gpu/rocm/set_env.sh @@ -1,5 +1,8 @@ #!/usr/bin/env bash +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + # SPDX-License-Identifier: Apache-2.0 export TRANSLATION_HOST_IP='192.165.1.21' From 8b0526b5deef31cf11d5adeda4e4b8e650d603f7 Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Tue, 26 Nov 2024 23:16:20 +0700 Subject: [PATCH 06/17] TranslationApp - fix Docker compose file and tests script Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/compose.yaml | 10 +++++----- Translation/docker_compose/amd/gpu/rocm/set_env.sh | 1 - Translation/tests/test_compose_on_rocm.sh | 1 - 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/Translation/docker_compose/amd/gpu/rocm/compose.yaml b/Translation/docker_compose/amd/gpu/rocm/compose.yaml index 8a4d70da28..266f294a9f 100644 --- a/Translation/docker_compose/amd/gpu/rocm/compose.yaml +++ b/Translation/docker_compose/amd/gpu/rocm/compose.yaml @@ -6,7 +6,7 @@ services: image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm container_name: translation-tgi-service ports: - - "${TRANSLATIONS_TGI_SERVICE_PORT:-8008}:80" + - "${TRANSLATION_TGI_SERVICE_PORT:-8008}:80" volumes: - "/var/lib/GenAI/translation/data:/data" shm_size: 8g @@ -14,9 +14,9 @@ services: no_proxy: ${no_proxy} http_proxy: ${http_proxy} https_proxy: ${https_proxy} - TGI_LLM_ENDPOINT: ${TRANSLATIONS_TGI_LLM_ENDPOINT} - HUGGING_FACE_HUB_TOKEN: ${TRANSLATIONS_HUGGINGFACEHUB_API_TOKEN} - HUGGINGFACEHUB_API_TOKEN: ${TRANSLATIONS_HUGGINGFACEHUB_API_TOKEN} + TGI_LLM_ENDPOINT: ${TRANSLATION_TGI_LLM_ENDPOINT} + HUGGING_FACE_HUB_TOKEN: ${TRANSLATION_HUGGINGFACEHUB_API_TOKEN} + HUGGINGFACEHUB_API_TOKEN: ${TRANSLATION_HUGGINGFACEHUB_API_TOKEN} devices: - /dev/kfd:/dev/kfd - /dev/dri/:/dev/dri/ @@ -27,7 +27,7 @@ services: security_opt: - seccomp:unconfined ipc: host - command: --model-id ${TRANSLATIONS_LLM_MODEL_ID} + command: --model-id ${TRANSLATION_LLM_MODEL_ID} translation-llm: image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest} container_name: translation-llm-tgi-server diff --git a/Translation/docker_compose/amd/gpu/rocm/set_env.sh b/Translation/docker_compose/amd/gpu/rocm/set_env.sh index f6a2cec913..8e33e61123 100644 --- a/Translation/docker_compose/amd/gpu/rocm/set_env.sh +++ b/Translation/docker_compose/amd/gpu/rocm/set_env.sh @@ -8,7 +8,6 @@ export TRANSLATION_HOST_IP='192.165.1.21' export TRANSLATION_EXTERNAL_HOST_IP='direct-supercomputer1.powerml.co' export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" -export MODEL_ID="haoranxu/ALMA-13B" export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" export TRANSLATION_HUGGINGFACEHUB_API_TOKEN='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' export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} diff --git a/Translation/tests/test_compose_on_rocm.sh b/Translation/tests/test_compose_on_rocm.sh index eba06b26aa..514ddb3ee7 100644 --- a/Translation/tests/test_compose_on_rocm.sh +++ b/Translation/tests/test_compose_on_rocm.sh @@ -31,7 +31,6 @@ function start_services() { export TRANSLATION_HOST_IP=${ip_address} export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" - export MODEL_ID="haoranxu/ALMA-13B" export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" export TRANSLATION_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} From a32048454a6a53037444dfc729a53712a8074504 Mon Sep 17 00:00:00 2001 From: minmin-intel Date: Wed, 20 Nov 2024 17:30:11 -0800 
Subject: [PATCH 07/17] Fix DocIndexRetriever CI error on Xeon (#1167) Signed-off-by: minmin-intel Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Chingis Yundunov --- DocIndexRetriever/tests/test.py | 7 ++-- .../tests/test_compose_on_gaudi.sh | 4 ++ .../tests/test_compose_on_xeon.sh | 39 ++++++++++++++----- 3 files changed, 36 insertions(+), 14 deletions(-) diff --git a/DocIndexRetriever/tests/test.py b/DocIndexRetriever/tests/test.py index 698f40da30..e26ccd3dbd 100644 --- a/DocIndexRetriever/tests/test.py +++ b/DocIndexRetriever/tests/test.py @@ -6,7 +6,7 @@ import requests -def search_knowledge_base(query: str, url: str, request_type="chat_completion") -> str: +def search_knowledge_base(query: str, url: str, request_type: str) -> str: """Search the knowledge base for a specific query.""" print(url) proxies = {"http": ""} @@ -18,12 +18,13 @@ def search_knowledge_base(query: str, url: str, request_type="chat_completion") "top_n": 2, } else: - print("Sending text request") + print("Sending textdoc request") payload = { "text": query, } response = requests.post(url, json=payload, proxies=proxies) print(response) + print(response.json().keys()) if "documents" in response.json(): docs = response.json()["documents"] context = "" @@ -32,7 +33,6 @@ def search_knowledge_base(query: str, url: str, request_type="chat_completion") context = str(i) + ": " + doc else: context += "\n" + str(i) + ": " + doc - # print(context) return context elif "text" in response.json(): return response.json()["text"] @@ -44,7 +44,6 @@ def search_knowledge_base(query: str, url: str, request_type="chat_completion") context = doc["text"] else: context += "\n" + doc["text"] - # print(context) return context else: return "Error parsing response from the knowledge base." diff --git a/DocIndexRetriever/tests/test_compose_on_gaudi.sh b/DocIndexRetriever/tests/test_compose_on_gaudi.sh index e652ead26b..bea6f8e7a1 100644 --- a/DocIndexRetriever/tests/test_compose_on_gaudi.sh +++ b/DocIndexRetriever/tests/test_compose_on_gaudi.sh @@ -15,6 +15,7 @@ LOG_PATH="$WORKPATH/tests" ip_address=$(hostname -I | awk '{print $1}') function build_docker_images() { + echo "Building Docker Images...." cd $WORKPATH/docker_image_build if [ ! -d "GenAIComps" ] ; then git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../ @@ -26,9 +27,11 @@ function build_docker_images() { docker pull redis/redis-stack:7.2.0-v9 docker pull ghcr.io/huggingface/tei-gaudi:1.5.0 docker images && sleep 1s + echo "Docker images built!" } function start_services() { + echo "Starting Docker Services...." cd $WORKPATH/docker_compose/intel/hpu/gaudi export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" export RERANK_MODEL_ID="BAAI/bge-reranker-base" @@ -47,6 +50,7 @@ function start_services() { # Start Docker Containers docker compose up -d sleep 20 + echo "Docker services started!" } function validate() { diff --git a/DocIndexRetriever/tests/test_compose_on_xeon.sh b/DocIndexRetriever/tests/test_compose_on_xeon.sh index c6ff29e29f..a106301598 100644 --- a/DocIndexRetriever/tests/test_compose_on_xeon.sh +++ b/DocIndexRetriever/tests/test_compose_on_xeon.sh @@ -15,8 +15,10 @@ LOG_PATH="$WORKPATH/tests" ip_address=$(hostname -I | awk '{print $1}') function build_docker_images() { + echo "Building Docker Images...." cd $WORKPATH/docker_image_build if [ ! 
-d "GenAIComps" ] ; then + echo "Cloning GenAIComps repository" git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../ fi service_list="dataprep-redis embedding-tei retriever-redis reranking-tei doc-index-retriever" @@ -25,9 +27,12 @@ function build_docker_images() { docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 docker pull redis/redis-stack:7.2.0-v9 docker images && sleep 1s + + echo "Docker images built!" } function start_services() { + echo "Starting Docker Services...." cd $WORKPATH/docker_compose/intel/cpu/xeon export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" export RERANK_MODEL_ID="BAAI/bge-reranker-base" @@ -45,7 +50,8 @@ function start_services() { # Start Docker Containers docker compose up -d - sleep 20 + sleep 5m + echo "Docker services started!" } function validate() { @@ -66,7 +72,7 @@ function validate_megaservice() { echo "===========Ingest data==================" local CONTENT=$(http_proxy="" curl -X POST "http://${ip_address}:6007/v1/dataprep" \ -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]') + -F 'link_list=["https://opea.dev/"]') local EXIT_CODE=$(validate "$CONTENT" "Data preparation succeeded" "dataprep-redis-service-xeon") echo "$EXIT_CODE" local EXIT_CODE="${EXIT_CODE:0-1}" @@ -77,19 +83,26 @@ function validate_megaservice() { fi # Curl the Mega Service - echo "================Testing retriever service: Default params================" - - local CONTENT=$(curl http://${ip_address}:8889/v1/retrievaltool -X POST -H "Content-Type: application/json" -d '{ - "messages": "Explain the OPEA project?" + echo "================Testing retriever service: Text Request ================" + cd $WORKPATH/tests + local CONTENT=$(http_proxy="" curl http://${ip_address}:8889/v1/retrievaltool -X POST -H "Content-Type: application/json" -d '{ + "text": "Explain the OPEA project?" 
}') + # local CONTENT=$(python test.py --host_ip ${ip_address} --request_type text) local EXIT_CODE=$(validate "$CONTENT" "OPEA" "doc-index-retriever-service-xeon") echo "$EXIT_CODE" local EXIT_CODE="${EXIT_CODE:0-1}" echo "return value is $EXIT_CODE" if [ "$EXIT_CODE" == "1" ]; then - docker logs tei-embedding-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Embedding container log==================" + docker logs embedding-tei-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Retriever container log==================" docker logs retriever-redis-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log - docker logs reranking-tei-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============TEI Reranking log==================" + docker logs tei-reranking-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Reranking container log==================" + docker logs reranking-tei-xeon-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Doc-index-retriever container log==================" docker logs doc-index-retriever-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log exit 1 fi @@ -102,9 +115,15 @@ function validate_megaservice() { local EXIT_CODE="${EXIT_CODE:0-1}" echo "return value is $EXIT_CODE" if [ "$EXIT_CODE" == "1" ]; then - docker logs tei-embedding-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Embedding container log==================" + docker logs embedding-tei-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Retriever container log==================" docker logs retriever-redis-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log - docker logs reranking-tei-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============TEI Reranking log==================" + docker logs tei-reranking-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Reranking container log==================" + docker logs reranking-tei-xeon-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Doc-index-retriever container log==================" docker logs doc-index-retriever-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log exit 1 fi From 78d6293d8ecea7ce4d4ea5caf8253a8d67798757 Mon Sep 17 00:00:00 2001 From: Letong Han <106566639+letonghan@users.noreply.github.com> Date: Thu, 21 Nov 2024 10:48:52 +0800 Subject: [PATCH 08/17] Fix Translation Manifest CI with MODEL_ID (#1169) Signed-off-by: letonghan Signed-off-by: Chingis Yundunov --- Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml | 2 +- .../kubernetes/intel/hpu/gaudi/manifest/translation.yaml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml b/Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml index 9cc8c2798f..f8e2b6e659 100644 --- a/Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml +++ b/Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml @@ -10,7 +10,7 @@ metadata: app.kubernetes.io/instance: translation app.kubernetes.io/version: "2.1.0" data: - LLM_MODEL_ID: "haoranxu/ALMA-13B" + MODEL_ID: "haoranxu/ALMA-13B" PORT: "2080" HF_TOKEN: "insert-your-huggingface-token-here" http_proxy: "" diff --git 
a/Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml b/Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml index 25e39a7002..61a487a0db 100644 --- a/Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml +++ b/Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml @@ -10,7 +10,7 @@ metadata: app.kubernetes.io/instance: translation app.kubernetes.io/version: "2.1.0" data: - LLM_MODEL_ID: "haoranxu/ALMA-13B" + MODEL_ID: "haoranxu/ALMA-13B" PORT: "2080" HF_TOKEN: "insert-your-huggingface-token-here" http_proxy: "" From 122dc7c31368a24a3515cf3951d6dc9384e421eb Mon Sep 17 00:00:00 2001 From: bjzhjing Date: Thu, 21 Nov 2024 14:14:27 +0800 Subject: [PATCH 09/17] Adjustments for helm release change (#1173) Signed-off-by: Cathy Zhang Signed-off-by: Chingis Yundunov --- .../kubernetes/intel/gaudi/README.md | 12 +---- .../kubernetes/intel/gaudi/deploy.py | 52 +++---------------- 2 files changed, 10 insertions(+), 54 deletions(-) diff --git a/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/README.md b/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/README.md index d667727f48..ae0537f8ff 100644 --- a/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/README.md +++ b/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/README.md @@ -69,10 +69,6 @@ Results will be displayed in the terminal and saved as CSV file named `1_stats.c - Persistent Volume Claim (PVC): This is the recommended approach for production setups. For more details on using PVC, refer to [PVC](https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/README.md#using-persistent-volume). - Local Host Path: For simpler testing, ensure that each node involved in the deployment follows the steps above to locally prepare the models. After preparing the models, use `--set global.modelUseHostPath=${MODELDIR}` in the deployment command. 
-- Add OPEA Helm Repository: - ```bash - python deploy.py --add-repo - ``` - Label Nodes ```base python deploy.py --add-label --num-nodes 2 @@ -192,13 +188,9 @@ All the test results will come to the folder `GenAIEval/evals/benchmark/benchmar ## Teardown -After completing the benchmark, use the following commands to clean up the environment: +After completing the benchmark, use the following command to clean up the environment: Remove Node Labels: -```base -python deploy.py --delete-label -``` -Delete the OPEA Helm Repository: ```bash -python deploy.py --delete-repo +python deploy.py --delete-label ``` diff --git a/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/deploy.py b/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/deploy.py index 4632cc79c2..6f1f97cac2 100644 --- a/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/deploy.py +++ b/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/deploy.py @@ -83,26 +83,6 @@ def clear_labels_from_nodes(label, node_names=None): print(f"Label {label_key} not found on node {node_name}, skipping.") -def add_helm_repo(repo_name, repo_url): - # Add the repo if it does not exist - add_command = ["helm", "repo", "add", repo_name, repo_url] - try: - subprocess.run(add_command, check=True) - print(f"Added Helm repo {repo_name} from {repo_url}.") - except subprocess.CalledProcessError as e: - print(f"Failed to add Helm repo {repo_name}: {e}") - - -def delete_helm_repo(repo_name): - """Delete Helm repo if it exists.""" - command = ["helm", "repo", "remove", repo_name] - try: - subprocess.run(command, check=True) - print(f"Deleted Helm repo {repo_name}.") - except subprocess.CalledProcessError: - print(f"Failed to delete Helm repo {repo_name}. It may not exist.") - - def install_helm_release(release_name, chart_name, namespace, values_file, device_type): """Deploy a Helm release with a specified name and chart. @@ -132,14 +112,14 @@ def install_helm_release(release_name, chart_name, namespace, values_file, devic if device_type == "gaudi": print("Device type is gaudi. 
Pulling Helm chart to get gaudi-values.yaml...") - # Pull and untar the chart - subprocess.run(["helm", "pull", chart_name, "--untar"], check=True) + # Combine chart_name with fixed prefix + chart_pull_url = f"oci://ghcr.io/opea-project/charts/{chart_name}" - # Determine the directory name (get the actual chart_name if chart_name is in the format 'repo_name/chart_name', else use chart_name directly) - chart_dir_name = chart_name.split("/")[-1] if "/" in chart_name else chart_name + # Pull and untar the chart + subprocess.run(["helm", "pull", chart_pull_url, "--untar"], check=True) - # Find the untarred directory (assumes only one directory matches chart_dir_name) - untar_dirs = glob.glob(f"{chart_dir_name}*") + # Find the untarred directory + untar_dirs = glob.glob(f"{chart_name}*") if untar_dirs: untar_dir = untar_dirs[0] hw_values_file = os.path.join(untar_dir, "gaudi-values.yaml") @@ -210,20 +190,14 @@ def main(): parser.add_argument( "--chart-name", type=str, - default="opea/chatqna", - help="The chart name to deploy, composed of repo name and chart name (default: opea/chatqna).", + default="chatqna", + help="The chart name to deploy, composed of repo name and chart name (default: chatqna).", ) parser.add_argument("--namespace", default="default", help="Kubernetes namespace (default: default).") parser.add_argument("--hf-token", help="Hugging Face API token.") parser.add_argument( "--model-dir", help="Model directory, mounted as volumes for service access to pre-downloaded models" ) - parser.add_argument("--repo-name", default="opea", help="Helm repo name to add/delete (default: opea).") - parser.add_argument( - "--repo-url", - default="https://opea-project.github.io/GenAIInfra", - help="Helm repository URL (default: https://opea-project.github.io/GenAIInfra).", - ) parser.add_argument("--user-values", help="Path to a user-specified values.yaml file.") parser.add_argument( "--create-values-only", action="store_true", help="Only create the values.yaml file without deploying." @@ -244,8 +218,6 @@ def main(): action="store_true", help="Modify resources for services and change extraCmdArgs when creating values.yaml.", ) - parser.add_argument("--add-repo", action="store_true", help="Add the Helm repo specified by --repo-url.") - parser.add_argument("--delete-repo", action="store_true", help="Delete the Helm repo specified by --repo-name.") parser.add_argument( "--device-type", type=str, @@ -264,14 +236,6 @@ def main(): else: args.num_nodes = num_node_names - # Helm repository management - if args.add_repo: - add_helm_repo(args.repo_name, args.repo_url) - return - elif args.delete_repo: - delete_helm_repo(args.repo_name) - return - # Node labeling management if args.add_label: add_labels_to_nodes(args.num_nodes, args.label, args.node_names) From b0110c7f9fa251d3f741aae6eb2f23cc7f9e3b60 Mon Sep 17 00:00:00 2001 From: Mingyuan Qi Date: Thu, 21 Nov 2024 20:36:28 +0800 Subject: [PATCH 10/17] Fix code scanning alert no. 
21: Uncontrolled data used in path expression (#1171) Signed-off-by: Mingyuan Qi Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: Chingis Yundunov --- EdgeCraftRAG/Dockerfile.server | 5 +++ EdgeCraftRAG/README.md | 44 +++++-------------- .../docker_compose/intel/gpu/arc/compose.yaml | 1 + .../intel/gpu/arc/compose_vllm.yaml | 1 + .../edgecraftrag/components/generator.py | 13 +++--- .../configs/test_pipeline_local_llm.json | 2 +- .../tests/configs/test_pipeline_vllm.json | 2 +- .../tests/test_compose_vllm_on_arc.sh | 3 +- .../tests/test_pipeline_local_llm.json | 2 +- EdgeCraftRAG/ui/gradio/default.yaml | 2 +- EdgeCraftRAG/ui/gradio/ecrag_client.py | 2 +- 11 files changed, 30 insertions(+), 47 deletions(-) diff --git a/EdgeCraftRAG/Dockerfile.server b/EdgeCraftRAG/Dockerfile.server index f076dcd16d..b327544129 100644 --- a/EdgeCraftRAG/Dockerfile.server +++ b/EdgeCraftRAG/Dockerfile.server @@ -23,6 +23,11 @@ RUN useradd -m -s /bin/bash user && \ mkdir -p /home/user && \ chown -R user /home/user/ +RUN mkdir /templates && \ + chown -R user /templates +COPY ./edgecraftrag/prompt_template/default_prompt.txt /templates/ +RUN chown -R user /templates/default_prompt.txt + COPY ./edgecraftrag /home/user/edgecraftrag RUN mkdir -p /home/user/gradio_cache diff --git a/EdgeCraftRAG/README.md b/EdgeCraftRAG/README.md index a248225325..ed828165cc 100644 --- a/EdgeCraftRAG/README.md +++ b/EdgeCraftRAG/README.md @@ -32,14 +32,14 @@ Please follow this link [vLLM with OpenVINO](https://github.com/opea-project/Gen ### Start Edge Craft RAG Services with Docker Compose -If you want to enable vLLM with OpenVINO service, please finish the steps in [Launch vLLM with OpenVINO service](#optional-launch-vllm-with-openvino-service) first. - ```bash cd GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc export MODEL_PATH="your model path for all your models" export DOC_PATH="your doc path for uploading a dir of files" export GRADIO_PATH="your gradio cache path for transferring files" +# If you have a specific prompt template, please uncomment the following line +# export PROMPT_PATH="your prompt path for prompt templates" # Make sure all 3 folders have 1000:1000 permission, otherwise # chown 1000:1000 ${MODEL_PATH} ${DOC_PATH} ${GRADIO_PATH} @@ -70,49 +70,25 @@ optimum-cli export openvino -m BAAI/bge-small-en-v1.5 ${MODEL_PATH}/BAAI/bge-sma optimum-cli export openvino -m BAAI/bge-reranker-large ${MODEL_PATH}/BAAI/bge-reranker-large --task sentence-similarity optimum-cli export openvino -m Qwen/Qwen2-7B-Instruct ${MODEL_PATH}/Qwen/Qwen2-7B-Instruct/INT4_compressed_weights --weight-format int4 -docker compose up -d +``` + +#### Launch services with local inference +```bash +docker compose -f compose.yaml up -d ``` -#### (Optional) Launch vLLM with OpenVINO service +#### Launch services with vLLM + OpenVINO inference service -1. Set up Environment Variables +Set up Additional Environment Variables and start with compose_vllm.yaml ```bash export LLM_MODEL=#your model id export VLLM_SERVICE_PORT=8008 export vLLM_ENDPOINT="http://${HOST_IP}:${VLLM_SERVICE_PORT}" export HUGGINGFACEHUB_API_TOKEN=#your HF token -``` - -2. 
Uncomment below code in 'GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml' -```bash - # vllm-openvino-server: - # container_name: vllm-openvino-server - # image: opea/vllm-arc:latest - # ports: - # - ${VLLM_SERVICE_PORT:-8008}:80 - # environment: - # HTTPS_PROXY: ${https_proxy} - # HTTP_PROXY: ${https_proxy} - # VLLM_OPENVINO_DEVICE: GPU - # HF_ENDPOINT: ${HF_ENDPOINT} - # HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - # volumes: - # - /dev/dri/by-path:/dev/dri/by-path - # - $HOME/.cache/huggingface:/root/.cache/huggingface - # devices: - # - /dev/dri - # entrypoint: /bin/bash -c "\ - # cd / && \ - # export VLLM_CPU_KVCACHE_SPACE=50 && \ - # export VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON && \ - # python3 -m vllm.entrypoints.openai.api_server \ - # --model '${LLM_MODEL}' \ - # --max_model_len=1024 \ - # --host 0.0.0.0 \ - # --port 80" +docker compose -f compose_vllm.yaml up -d ``` ### ChatQnA with LLM Example (Command Line) diff --git a/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml b/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml index a695fbc022..68a5c953c9 100644 --- a/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml +++ b/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml @@ -16,6 +16,7 @@ services: - ${DOC_PATH:-${PWD}}:/home/user/docs - ${GRADIO_PATH:-${PWD}}:/home/user/gradio_cache - ${HF_CACHE:-${HOME}/.cache}:/home/user/.cache + - ${PROMPT_PATH:-${PWD}}:/templates/custom ports: - ${PIPELINE_SERVICE_PORT:-16010}:${PIPELINE_SERVICE_PORT:-16010} devices: diff --git a/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml b/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml index 6ba7c4da27..c1e937fa69 100644 --- a/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml +++ b/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml @@ -16,6 +16,7 @@ services: - ${DOC_PATH:-${PWD}}:/home/user/docs - ${GRADIO_PATH:-${PWD}}:/home/user/gradio_cache - ${HF_CACHE:-${HOME}/.cache}:/home/user/.cache + - ${PROMPT_PATH:-${PWD}}:/templates/custom ports: - ${PIPELINE_SERVICE_PORT:-16010}:${PIPELINE_SERVICE_PORT:-16010} devices: diff --git a/EdgeCraftRAG/edgecraftrag/components/generator.py b/EdgeCraftRAG/edgecraftrag/components/generator.py index a888bf18f6..02c8cec2bb 100644 --- a/EdgeCraftRAG/edgecraftrag/components/generator.py +++ b/EdgeCraftRAG/edgecraftrag/components/generator.py @@ -26,12 +26,13 @@ def __init__(self, llm_model, prompt_template, inference_type, **kwargs): ("\n\n", "\n"), ("\t\n", "\n"), ) - template = prompt_template - self.prompt = ( - DocumentedContextRagPromptTemplate.from_file(template) - if os.path.isfile(template) - else DocumentedContextRagPromptTemplate.from_template(template) - ) + safe_root = "/templates" + template = os.path.normpath(os.path.join(safe_root, prompt_template)) + if not template.startswith(safe_root): + raise ValueError("Invalid template path") + if not os.path.exists(template): + raise ValueError("Template file not exists") + self.prompt = DocumentedContextRagPromptTemplate.from_file(template) self.llm = llm_model if isinstance(llm_model, str): self.model_id = llm_model diff --git a/EdgeCraftRAG/tests/configs/test_pipeline_local_llm.json b/EdgeCraftRAG/tests/configs/test_pipeline_local_llm.json index 261459e835..c657362ec1 100644 --- a/EdgeCraftRAG/tests/configs/test_pipeline_local_llm.json +++ b/EdgeCraftRAG/tests/configs/test_pipeline_local_llm.json @@ -37,7 +37,7 @@ "device": "auto", "weight": "INT4" }, - "prompt_path": "./edgecraftrag/prompt_template/default_prompt.txt", + 
"prompt_path": "./default_prompt.txt", "inference_type": "local" }, "active": "True" diff --git a/EdgeCraftRAG/tests/configs/test_pipeline_vllm.json b/EdgeCraftRAG/tests/configs/test_pipeline_vllm.json index 05809c8e13..60565907ac 100644 --- a/EdgeCraftRAG/tests/configs/test_pipeline_vllm.json +++ b/EdgeCraftRAG/tests/configs/test_pipeline_vllm.json @@ -37,7 +37,7 @@ "device": "auto", "weight": "INT4" }, - "prompt_path": "./edgecraftrag/prompt_template/default_prompt.txt", + "prompt_path": "./default_prompt.txt", "inference_type": "vllm" }, "active": "True" diff --git a/EdgeCraftRAG/tests/test_compose_vllm_on_arc.sh b/EdgeCraftRAG/tests/test_compose_vllm_on_arc.sh index 1d65057be5..4fa7ee92e7 100755 --- a/EdgeCraftRAG/tests/test_compose_vllm_on_arc.sh +++ b/EdgeCraftRAG/tests/test_compose_vllm_on_arc.sh @@ -31,8 +31,7 @@ vLLM_ENDPOINT="http://${HOST_IP}:${VLLM_SERVICE_PORT}" function build_docker_images() { cd $WORKPATH/docker_image_build echo "Build all the images with --no-cache, check docker_image_build.log for details..." - service_list="server ui ecrag" - docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log + docker compose -f build.yaml build --no-cache > ${LOG_PATH}/docker_image_build.log echo "Build vllm_openvino image from GenAIComps..." cd $WORKPATH && git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" diff --git a/EdgeCraftRAG/tests/test_pipeline_local_llm.json b/EdgeCraftRAG/tests/test_pipeline_local_llm.json index 261459e835..c657362ec1 100644 --- a/EdgeCraftRAG/tests/test_pipeline_local_llm.json +++ b/EdgeCraftRAG/tests/test_pipeline_local_llm.json @@ -37,7 +37,7 @@ "device": "auto", "weight": "INT4" }, - "prompt_path": "./edgecraftrag/prompt_template/default_prompt.txt", + "prompt_path": "./default_prompt.txt", "inference_type": "local" }, "active": "True" diff --git a/EdgeCraftRAG/ui/gradio/default.yaml b/EdgeCraftRAG/ui/gradio/default.yaml index 39c3ee92e3..ad3718f0c1 100644 --- a/EdgeCraftRAG/ui/gradio/default.yaml +++ b/EdgeCraftRAG/ui/gradio/default.yaml @@ -29,7 +29,7 @@ postprocessor: "reranker" # Generator generator: "chatqna" -prompt_path: "./edgecraftrag/prompt_template/default_prompt.txt" +prompt_path: "./default_prompt.txt" # Models embedding_model_id: "BAAI/bge-small-en-v1.5" diff --git a/EdgeCraftRAG/ui/gradio/ecrag_client.py b/EdgeCraftRAG/ui/gradio/ecrag_client.py index 6593cbd94f..7a58ff720b 100644 --- a/EdgeCraftRAG/ui/gradio/ecrag_client.py +++ b/EdgeCraftRAG/ui/gradio/ecrag_client.py @@ -78,7 +78,7 @@ def create_update_pipeline( ], generator=api_schema.GeneratorIn( # TODO: remove hardcoding - prompt_path="./edgecraftrag/prompt_template/default_prompt.txt", + prompt_path="./default_prompt.txt", model=api_schema.ModelIn(model_id=llm_id, model_path=llm_path, device=llm_device, weight=llm_weights), inference_type=llm_infertype, ), From b621db2bf4ef8a06733daa9f3397a1dc80e291c4 Mon Sep 17 00:00:00 2001 From: "Wang, Kai Lawrence" <109344418+wangkl2@users.noreply.github.com> Date: Fri, 22 Nov 2024 09:20:09 +0800 Subject: [PATCH 11/17] Update the llm backend ports (#1172) Signed-off-by: Wang, Kai Lawrence Signed-off-by: Chingis Yundunov --- ChatQnA/docker_compose/amd/gpu/rocm/README.md | 2 +- ChatQnA/docker_compose/intel/hpu/gaudi/README.md | 6 +++--- ChatQnA/docker_compose/nvidia/gpu/README.md | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/ChatQnA/docker_compose/amd/gpu/rocm/README.md b/ChatQnA/docker_compose/amd/gpu/rocm/README.md 
index 9e18d0f61e..9ef30d2a16 100644 --- a/ChatQnA/docker_compose/amd/gpu/rocm/README.md +++ b/ChatQnA/docker_compose/amd/gpu/rocm/README.md @@ -290,7 +290,7 @@ docker compose up -d Try the command below to check whether the TGI service is ready. ```bash - docker logs ${CONTAINER_ID} | grep Connected + docker logs chatqna-tgi-server | grep Connected ``` If the service is ready, you will get the response like below. diff --git a/ChatQnA/docker_compose/intel/hpu/gaudi/README.md b/ChatQnA/docker_compose/intel/hpu/gaudi/README.md index b083b3d403..9e2b5b5455 100644 --- a/ChatQnA/docker_compose/intel/hpu/gaudi/README.md +++ b/ChatQnA/docker_compose/intel/hpu/gaudi/README.md @@ -314,7 +314,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid Try the command below to check whether the LLM serving is ready. ```bash - docker logs tgi-service | grep Connected + docker logs tgi-gaudi-server | grep Connected ``` If the service is ready, you will get the response like below. @@ -327,7 +327,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid ```bash # TGI service - curl http://${host_ip}:9009/v1/chat/completions \ + curl http://${host_ip}:8005/v1/chat/completions \ -X POST \ -d '{"model": ${LLM_MODEL_ID}, "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \ -H 'Content-Type: application/json' @@ -335,7 +335,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid ```bash # vLLM Service - curl http://${host_ip}:9009/v1/chat/completions \ + curl http://${host_ip}:8007/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{"model": ${LLM_MODEL_ID}, "messages": [{"role": "user", "content": "What is Deep Learning?"}]}' ``` diff --git a/ChatQnA/docker_compose/nvidia/gpu/README.md b/ChatQnA/docker_compose/nvidia/gpu/README.md index 686ead52db..92b7a26e79 100644 --- a/ChatQnA/docker_compose/nvidia/gpu/README.md +++ b/ChatQnA/docker_compose/nvidia/gpu/README.md @@ -273,7 +273,7 @@ docker compose up -d Try the command below to check whether the TGI service is ready. ```bash - docker logs ${CONTAINER_ID} | grep Connected + docker logs tgi-server | grep Connected ``` If the service is ready, you will get the response like below. @@ -285,7 +285,7 @@ docker compose up -d Then try the `cURL` command below to validate TGI. ```bash - curl http://${host_ip}:9009/v1/chat/completions \ + curl http://${host_ip}:8008/v1/chat/completions \ -X POST \ -d '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \ -H 'Content-Type: application/json' From 280030c4be7dec5f9026fc119e26c07fe0d9dd40 Mon Sep 17 00:00:00 2001 From: ZePan110 Date: Mon, 25 Nov 2024 10:33:33 +0800 Subject: [PATCH 12/17] Limit the version of vllm to avoid dockers build failures. 
(#1183) Signed-off-by: ZePan110 Signed-off-by: Chingis Yundunov --- .github/workflows/_example-workflow.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/_example-workflow.yml b/.github/workflows/_example-workflow.yml index a86ac25929..b05d67eedc 100644 --- a/.github/workflows/_example-workflow.yml +++ b/.github/workflows/_example-workflow.yml @@ -75,7 +75,7 @@ jobs: docker_compose_path=${{ github.workspace }}/${{ inputs.example }}/docker_image_build/build.yaml if [[ $(grep -c "vllm:" ${docker_compose_path}) != 0 ]]; then git clone https://github.com/vllm-project/vllm.git - cd vllm && git rev-parse HEAD && cd ../ + cd vllm && git checkout 446c780 && cd ../ fi if [[ $(grep -c "vllm-gaudi:" ${docker_compose_path}) != 0 ]]; then git clone https://github.com/HabanaAI/vllm-fork.git From f87770c35e65888933ddc703803d72184a65bd35 Mon Sep 17 00:00:00 2001 From: chyundunovDatamonsters Date: Tue, 17 Dec 2024 11:57:55 +0700 Subject: [PATCH 13/17] Update set_env.sh Delete sensitive data from set envs script Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/set_env.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Translation/docker_compose/amd/gpu/rocm/set_env.sh b/Translation/docker_compose/amd/gpu/rocm/set_env.sh index 8e33e61123..9efa6f3ee3 100644 --- a/Translation/docker_compose/amd/gpu/rocm/set_env.sh +++ b/Translation/docker_compose/amd/gpu/rocm/set_env.sh @@ -5,11 +5,11 @@ # SPDX-License-Identifier: Apache-2.0 -export TRANSLATION_HOST_IP='192.165.1.21' -export TRANSLATION_EXTERNAL_HOST_IP='direct-supercomputer1.powerml.co' +export TRANSLATION_HOST_IP='' +export TRANSLATION_EXTERNAL_HOST_IP='' export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" -export TRANSLATION_HUGGINGFACEHUB_API_TOKEN='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' +export TRANSLATION_HUGGINGFACEHUB_API_TOKEN='' export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} export TRANSLATION_LLM_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} export TRANSLATION_FRONTEND_SERVICE_IP=${TRANSLATION_HOST_IP} From 1a860b413aba3f1625249dda71cf226ef5c6335e Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Wed, 18 Dec 2024 10:09:34 +0700 Subject: [PATCH 14/17] DBQnA - fix README for Translation Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/Translation/docker_compose/amd/gpu/rocm/README.md b/Translation/docker_compose/amd/gpu/rocm/README.md index 5cff6dc36e..b9201ba571 100644 --- a/Translation/docker_compose/amd/gpu/rocm/README.md +++ b/Translation/docker_compose/amd/gpu/rocm/README.md @@ -1,5 +1,6 @@ # Build and deploy Translation Application on AMD GPU (ROCm) + ## Build images ### Build the LLM Docker Image From 4b7e8485456b4a9105518e1e07cb20da72bddd3b Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 18 Dec 2024 03:10:15 +0000 Subject: [PATCH 15/17] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- Translation/docker_compose/amd/gpu/rocm/README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/Translation/docker_compose/amd/gpu/rocm/README.md b/Translation/docker_compose/amd/gpu/rocm/README.md index b9201ba571..5cff6dc36e 100644 --- a/Translation/docker_compose/amd/gpu/rocm/README.md +++ b/Translation/docker_compose/amd/gpu/rocm/README.md @@ -1,6 +1,5 @@ # Build and deploy 
Translation Application on AMD GPU (ROCm) - ## Build images ### Build the LLM Docker Image From 47893e7dabcc3d4c520f75335b36747c04b0129f Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Wed, 18 Dec 2024 10:14:17 +0700 Subject: [PATCH 16/17] DBQnA - fix README for Translation Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/Translation/docker_compose/amd/gpu/rocm/README.md b/Translation/docker_compose/amd/gpu/rocm/README.md index b9201ba571..5cff6dc36e 100644 --- a/Translation/docker_compose/amd/gpu/rocm/README.md +++ b/Translation/docker_compose/amd/gpu/rocm/README.md @@ -1,6 +1,5 @@ # Build and deploy Translation Application on AMD GPU (ROCm) - ## Build images ### Build the LLM Docker Image From 1391c25fbb1c4d20ca34eea7631b99b710eb6989 Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Fri, 20 Dec 2024 14:26:16 +0700 Subject: [PATCH 17/17] Translation app - fix deploy on AMD Signed-off-by: Chingis Yundunov --- .github/workflows/_example-workflow.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/_example-workflow.yml b/.github/workflows/_example-workflow.yml index b05d67eedc..a86ac25929 100644 --- a/.github/workflows/_example-workflow.yml +++ b/.github/workflows/_example-workflow.yml @@ -75,7 +75,7 @@ jobs: docker_compose_path=${{ github.workspace }}/${{ inputs.example }}/docker_image_build/build.yaml if [[ $(grep -c "vllm:" ${docker_compose_path}) != 0 ]]; then git clone https://github.com/vllm-project/vllm.git - cd vllm && git checkout 446c780 && cd ../ + cd vllm && git rev-parse HEAD && cd ../ fi if [[ $(grep -c "vllm-gaudi:" ${docker_compose_path}) != 0 ]]; then git clone https://github.com/HabanaAI/vllm-fork.git
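
Note on the last two workflow changes above: [PATCH 12/17] pinned the vLLM checkout to commit 446c780 so Docker image builds would not break on upstream changes, while [PATCH 17/17] reverts to building from the tip of the default branch and only printing the commit hash for traceability. A minimal bash sketch of the two strategies is shown below; it is an illustration only, and the `VLLM_COMMIT` toggle is a hypothetical variable that the actual workflow does not define.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; VLLM_COMMIT is a hypothetical toggle, not a real workflow input.
set -euo pipefail

git clone https://github.com/vllm-project/vllm.git
cd vllm

if [[ -n "${VLLM_COMMIT:-}" ]]; then
    # PATCH 12/17 behaviour: pin to a known-good commit (e.g. 446c780)
    # so image builds stay reproducible when upstream moves.
    git checkout "${VLLM_COMMIT}"
else
    # PATCH 17/17 behaviour: build from the latest default branch and
    # only record which commit was used, for the CI logs.
    git rev-parse HEAD
fi

cd ../
```

Pinning trades freshness for reproducible builds; tracking HEAD picks up upstream fixes but can reintroduce the build failures the pin was added to avoid.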