From fe0e1782d26fe51b9c7ef2d85a767a6789388c50 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 20 Nov 2024 19:46:27 +0700 Subject: [PATCH 01/17] Add example for AudioQnA deploy in AMD ROCm (#1147) Signed-off-by: artem-astafev Signed-off-by: Artem Astafev Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Liang Lv Signed-off-by: Chingis Yundunov --- .../docker_compose/amd/gpu/rocm/README.md | 170 ++++++++++++++++++ .../docker_compose/amd/gpu/rocm/compose.yaml | 110 ++++++++++++ .../docker_compose/amd/gpu/rocm/set_env.sh | 26 +++ AudioQnA/tests/test_compose_on_rocm.sh | 128 +++++++++++++ 4 files changed, 434 insertions(+) create mode 100644 AudioQnA/docker_compose/amd/gpu/rocm/README.md create mode 100644 AudioQnA/docker_compose/amd/gpu/rocm/compose.yaml create mode 100644 AudioQnA/docker_compose/amd/gpu/rocm/set_env.sh create mode 100644 AudioQnA/tests/test_compose_on_rocm.sh diff --git a/AudioQnA/docker_compose/amd/gpu/rocm/README.md b/AudioQnA/docker_compose/amd/gpu/rocm/README.md new file mode 100644 index 0000000000..3ae8cc8a38 --- /dev/null +++ b/AudioQnA/docker_compose/amd/gpu/rocm/README.md @@ -0,0 +1,170 @@ +# Build Mega Service of AudioQnA on AMD ROCm GPU + +This document outlines the deployment process for a AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice +pipeline on server on AMD ROCm GPU platform. + +## πŸš€ Build Docker images + +### 1. Source Code install GenAIComps + +```bash +git clone https://github.com/opea-project/GenAIComps.git +cd GenAIComps +``` + +### 2. Build ASR Image + +```bash +docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile . + + +docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile . +``` + +### 3. Build LLM Image + +```bash +docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile . +``` + +Note: +For compose for ROCm example AMD optimized image hosted in huggingface repo will be used for TGI service: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm (https://github.com/huggingface/text-generation-inference) + +### 4. Build TTS Image + +```bash +docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/dependency/Dockerfile . + +docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/Dockerfile . +``` + +### 6. Build MegaService Docker Image + +To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below: + +```bash +git clone https://github.com/opea-project/GenAIExamples.git +cd GenAIExamples/AudioQnA/ +docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . +``` + +Then run the command `docker images`, you will have following images ready: + +1. `opea/whisper:latest` +2. `opea/asr:latest` +3. `opea/llm-tgi:latest` +4. `opea/speecht5:latest` +5. `opea/tts:latest` +6. 
`opea/audioqna:latest`
+
+## πŸš€ Set the environment variables
+
+Before starting the services with `docker compose`, you have to recheck the following environment variables.
+
+```bash
+export host_ip= # your external IP address, e.g. export host_ip=$(hostname -I | awk '{print $1}')
+export HUGGINGFACEHUB_API_TOKEN= # your Hugging Face Hub API token
+
+export TGI_LLM_ENDPOINT=http://$host_ip:3006
+export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
+
+export ASR_ENDPOINT=http://$host_ip:7066
+export TTS_ENDPOINT=http://$host_ip:7055
+
+export MEGA_SERVICE_HOST_IP=${host_ip}
+export ASR_SERVICE_HOST_IP=${host_ip}
+export TTS_SERVICE_HOST_IP=${host_ip}
+export LLM_SERVICE_HOST_IP=${host_ip}
+
+export ASR_SERVICE_PORT=3001
+export TTS_SERVICE_PORT=3002
+export LLM_SERVICE_PORT=3007
+```
+
+Alternatively, use the `set_env.sh` file to set up the environment variables.
+
+Note: Please replace `host_ip` with your external IP address; do not use localhost.
+
+Note: To limit access to a subset of GPUs, pass each device individually using one or more `--device /dev/dri/renderD<node>` options, where `<node>` is the card index, starting from 128. See https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus
+
+Example of isolating 1 GPU:
+
+      - /dev/dri/card0:/dev/dri/card0
+      - /dev/dri/renderD128:/dev/dri/renderD128
+
+Example of isolating 2 GPUs:
+
+      - /dev/dri/card0:/dev/dri/card0
+      - /dev/dri/renderD128:/dev/dri/renderD128
+      - /dev/dri/card1:/dev/dri/card1
+      - /dev/dri/renderD129:/dev/dri/renderD129
+
+Please find more information about accessing and restricting AMD GPUs at https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus
+
+## πŸš€ Start the MegaService
+
+```bash
+cd GenAIExamples/AudioQnA/docker_compose/amd/gpu/rocm/
+docker compose up -d
+```
+
+In the following cases, you could build the Docker images from source yourself:
+
+- The Docker image failed to download.
+- You want to use a specific version of the Docker image.
+
+Please refer to the 'Build Docker Images' section above.
+
+## πŸš€ Consume the AudioQnA Service
+
+Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the
+base64 string to the megaservice endpoint. The megaservice will return a spoken response as a base64 string. To listen
+to the response, decode the base64 string and save it as a .wav file.
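+
+For a recording of your own, the .wav file can first be converted to a base64 string, for example with the GNU coreutils `base64` utility (an illustrative sketch that assumes an existing `input.wav` in the current directory):
+
+```bash
+# -w 0 disables line wrapping so the string can be embedded directly in the JSON payload
+B64_STR=$(base64 -w 0 input.wav)
+curl http://${host_ip}:3008/v1/audioqna \
+  -X POST \
+  -d '{"audio": "'"${B64_STR}"'", "max_tokens":64}' \
+  -H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
+```
+
+The same request with a small prebuilt sample string is shown below.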
+ +```bash +curl http://${host_ip}:3008/v1/audioqna \ + -X POST \ + -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' \ + -H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav +``` + +## πŸš€ Test MicroServices + +```bash +# whisper service +curl http://${host_ip}:7066/v1/asr \ + -X POST \ + -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \ + -H 'Content-Type: application/json' + +# asr microservice +curl http://${host_ip}:3001/v1/audio/transcriptions \ + -X POST \ + -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \ + -H 'Content-Type: application/json' + +# tgi service +curl http://${host_ip}:3006/generate \ + -X POST \ + -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \ + -H 'Content-Type: application/json' + +# llm microservice +curl http://${host_ip}:3007/v1/chat/completions\ + -X POST \ + -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \ + -H 'Content-Type: application/json' + +# speecht5 service +curl http://${host_ip}:7055/v1/tts \ + -X POST \ + -d '{"text": "Who are you?"}' \ + -H 'Content-Type: application/json' + +# tts microservice +curl http://${host_ip}:3002/v1/audio/speech \ + -X POST \ + -d '{"text": "Who are you?"}' \ + -H 'Content-Type: application/json' + +``` diff --git a/AudioQnA/docker_compose/amd/gpu/rocm/compose.yaml b/AudioQnA/docker_compose/amd/gpu/rocm/compose.yaml new file mode 100644 index 0000000000..651fd5464b --- /dev/null +++ b/AudioQnA/docker_compose/amd/gpu/rocm/compose.yaml @@ -0,0 +1,110 @@ +# Copyright (C) 2024 Advanced Micro Devices, Inc. 
+# SPDX-License-Identifier: Apache-2.0 + +services: + whisper-service: + image: ${REGISTRY:-opea}/whisper:${TAG:-latest} + container_name: whisper-service + ports: + - "7066:7066" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + restart: unless-stopped + asr: + image: ${REGISTRY:-opea}/asr:${TAG:-latest} + container_name: asr-service + ports: + - "3001:9099" + ipc: host + environment: + ASR_ENDPOINT: ${ASR_ENDPOINT} + speecht5-service: + image: ${REGISTRY:-opea}/speecht5:${TAG:-latest} + container_name: speecht5-service + ports: + - "7055:7055" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + restart: unless-stopped + tts: + image: ${REGISTRY:-opea}/tts:${TAG:-latest} + container_name: tts-service + ports: + - "3002:9088" + ipc: host + environment: + TTS_ENDPOINT: ${TTS_ENDPOINT} + tgi-service: + image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + container_name: tgi-service + ports: + - "3006:80" + volumes: + - "./data:/data" + shm_size: 1g + devices: + - /dev/kfd:/dev/kfd + - /dev/dri/card1:/dev/dri/card1 + - /dev/dri/renderD136:/dev/dri/renderD136 + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 + command: --model-id ${LLM_MODEL_ID} + cap_add: + - SYS_PTRACE + group_add: + - video + security_opt: + - seccomp:unconfined + ipc: host + llm: + image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest} + container_name: llm-tgi-server + depends_on: + - tgi-service + ports: + - "3007:9000" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT} + HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} + restart: unless-stopped + audioqna-backend-server: + image: ${REGISTRY:-opea}/audioqna:${TAG:-latest} + container_name: audioqna-xeon-backend-server + depends_on: + - asr + - llm + - tts + ports: + - "3008:8888" + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP} + - ASR_SERVICE_HOST_IP=${ASR_SERVICE_HOST_IP} + - ASR_SERVICE_PORT=${ASR_SERVICE_PORT} + - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP} + - LLM_SERVICE_PORT=${LLM_SERVICE_PORT} + - TTS_SERVICE_HOST_IP=${TTS_SERVICE_HOST_IP} + - TTS_SERVICE_PORT=${TTS_SERVICE_PORT} + ipc: host + restart: always + +networks: + default: + driver: bridge diff --git a/AudioQnA/docker_compose/amd/gpu/rocm/set_env.sh b/AudioQnA/docker_compose/amd/gpu/rocm/set_env.sh new file mode 100644 index 0000000000..8765b702b3 --- /dev/null +++ b/AudioQnA/docker_compose/amd/gpu/rocm/set_env.sh @@ -0,0 +1,26 @@ +#!/usr/bin/env bash set_env.sh + +# Copyright (C) 2024 Advanced Micro Devices, Inc. 
+# SPDX-License-Identifier: Apache-2.0 + + +# export host_ip= # export host_ip=$(hostname -I | awk '{print $1}') + +export host_ip="192.165.1.21" +export HUGGINGFACEHUB_API_TOKEN=${YOUR_HUGGINGFACEHUB_API_TOKEN} +# + +export TGI_LLM_ENDPOINT=http://$host_ip:3006 +export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3 + +export ASR_ENDPOINT=http://$host_ip:7066 +export TTS_ENDPOINT=http://$host_ip:7055 + +export MEGA_SERVICE_HOST_IP=${host_ip} +export ASR_SERVICE_HOST_IP=${host_ip} +export TTS_SERVICE_HOST_IP=${host_ip} +export LLM_SERVICE_HOST_IP=${host_ip} + +export ASR_SERVICE_PORT=3001 +export TTS_SERVICE_PORT=3002 +export LLM_SERVICE_PORT=3007 diff --git a/AudioQnA/tests/test_compose_on_rocm.sh b/AudioQnA/tests/test_compose_on_rocm.sh new file mode 100644 index 0000000000..86a1484728 --- /dev/null +++ b/AudioQnA/tests/test_compose_on_rocm.sh @@ -0,0 +1,128 @@ +#!/bin/bash +# Copyright (C) 2024 Advanced Micro Devices, Inc. +# SPDX-License-Identifier: Apache-2.0 + +set -ex +IMAGE_REPO=${IMAGE_REPO:-"opea"} +IMAGE_TAG=${IMAGE_TAG:-"latest"} +echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}" +echo "TAG=IMAGE_TAG=${IMAGE_TAG}" +export REGISTRY=${IMAGE_REPO} +export TAG=${IMAGE_TAG} + +WORKPATH=$(dirname "$PWD") +LOG_PATH="$WORKPATH/tests" +ip_address=$(hostname -I | awk '{print $1}') +export PATH="~/miniconda3/bin:$PATH" + +function build_docker_images() { + cd $WORKPATH/docker_image_build + git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../ + + echo "Build all the images with --no-cache, check docker_image_build.log for details..." + service_list="audioqna whisper asr llm-tgi speecht5 tts" + docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log + echo "docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm" + docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + docker images && sleep 1s +} + +function start_services() { + cd $WORKPATH/docker_compose/amd/gpu/rocm/ + export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} + export TGI_LLM_ENDPOINT=http://$ip_address:3006 + export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3 + + export ASR_ENDPOINT=http://$ip_address:7066 + export TTS_ENDPOINT=http://$ip_address:7055 + + export MEGA_SERVICE_HOST_IP=${ip_address} + export ASR_SERVICE_HOST_IP=${ip_address} + export TTS_SERVICE_HOST_IP=${ip_address} + export LLM_SERVICE_HOST_IP=${ip_address} + + export ASR_SERVICE_PORT=3001 + export TTS_SERVICE_PORT=3002 + export LLM_SERVICE_PORT=3007 + + # sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env + + # Start Docker Containers + docker compose up -d > ${LOG_PATH}/start_services_with_compose.log + n=0 + until [[ "$n" -ge 100 ]]; do + docker logs tgi-service > $LOG_PATH/tgi_service_start.log + if grep -q Connected $LOG_PATH/tgi_service_start.log; then + break + fi + sleep 5s + n=$((n+1)) + done +} +function validate_megaservice() { + result=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json') + echo $result + if [[ $result == *"AAA"* ]]; then + echo "Result correct." 
+ else + docker logs whisper-service > $LOG_PATH/whisper-service.log + docker logs asr-service > $LOG_PATH/asr-service.log + docker logs speecht5-service > $LOG_PATH/tts-service.log + docker logs tts-service > $LOG_PATH/tts-service.log + docker logs tgi-service > $LOG_PATH/tgi-service.log + docker logs llm-tgi-server > $LOG_PATH/llm-tgi-server.log + docker logs audioqna-xeon-backend-server > $LOG_PATH/audioqna-xeon-backend-server.log + + echo "Result wrong." + exit 1 + fi + +} + +#function validate_frontend() { +# Frontend tests are currently disabled +# cd $WORKPATH/ui/svelte +# local conda_env_name="OPEA_e2e" +# export PATH=${HOME}/miniforge3/bin/:$PATH +## conda remove -n ${conda_env_name} --all -y +## conda create -n ${conda_env_name} python=3.12 -y +# source activate ${conda_env_name} +# +# sed -i "s/localhost/$ip_address/g" playwright.config.ts +# +## conda install -c conda-forge nodejs -y +# npm install && npm ci && npx playwright install --with-deps +# node -v && npm -v && pip list +# +# exit_status=0 +# npx playwright test || exit_status=$? +# +# if [ $exit_status -ne 0 ]; then +# echo "[TEST INFO]: ---------frontend test failed---------" +# exit $exit_status +# else +# echo "[TEST INFO]: ---------frontend test passed---------" +# fi +#} + +function stop_docker() { + cd $WORKPATH/docker_compose/amd/gpu/rocm/ + docker compose stop && docker compose rm -f +} + +function main() { + + stop_docker + if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi + start_services + + validate_megaservice + # Frontend tests are currently disabled + # validate_frontend + + stop_docker + echo y | docker system prune + +} + +main From 0cfb71d3cfd274ad209a735d5b9d482d2bf4b289 Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Tue, 26 Nov 2024 22:50:10 +0700 Subject: [PATCH 02/17] TranslationApp - add: 1. Docker Compose file 2. Set envs scripts 3. 
Tests script for deploy and tests Translation Application on AMD GPU Signed-off-by: Chingis Yundunov --- .../docker_compose/amd/gpu/rocm/compose.yaml | 99 ++++++++++ .../docker_compose/amd/gpu/rocm/set_env.sh | 18 ++ Translation/tests/test_compose_on_rocm.sh | 178 ++++++++++++++++++ 3 files changed, 295 insertions(+) create mode 100644 Translation/docker_compose/amd/gpu/rocm/compose.yaml create mode 100644 Translation/docker_compose/amd/gpu/rocm/set_env.sh create mode 100644 Translation/tests/test_compose_on_rocm.sh diff --git a/Translation/docker_compose/amd/gpu/rocm/compose.yaml b/Translation/docker_compose/amd/gpu/rocm/compose.yaml new file mode 100644 index 0000000000..8a4d70da28 --- /dev/null +++ b/Translation/docker_compose/amd/gpu/rocm/compose.yaml @@ -0,0 +1,99 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +services: + translation-tgi-service: + image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + container_name: translation-tgi-service + ports: + - "${TRANSLATIONS_TGI_SERVICE_PORT:-8008}:80" + volumes: + - "/var/lib/GenAI/translation/data:/data" + shm_size: 8g + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + TGI_LLM_ENDPOINT: ${TRANSLATIONS_TGI_LLM_ENDPOINT} + HUGGING_FACE_HUB_TOKEN: ${TRANSLATIONS_HUGGINGFACEHUB_API_TOKEN} + HUGGINGFACEHUB_API_TOKEN: ${TRANSLATIONS_HUGGINGFACEHUB_API_TOKEN} + devices: + - /dev/kfd:/dev/kfd + - /dev/dri/:/dev/dri/ + cap_add: + - SYS_PTRACE + group_add: + - video + security_opt: + - seccomp:unconfined + ipc: host + command: --model-id ${TRANSLATIONS_LLM_MODEL_ID} + translation-llm: + image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest} + container_name: translation-llm-tgi-server + depends_on: + - translation-tgi-service + ports: + - "9000:9000" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + TGI_LLM_ENDPOINT: ${TRANSLATION_TGI_LLM_ENDPOINT} + HUGGINGFACEHUB_API_TOKEN: ${TRANSLATION_HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 + restart: unless-stopped + translation-backend-server: + image: ${REGISTRY:-opea}/translation:${TAG:-latest} + container_name: translation-backend-server + depends_on: + - translation-tgi-service + - translation-llm + ports: + - "${TRANSLATION_BACKEND_SERVICE_PORT:-8888}:8888" + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - MEGA_SERVICE_HOST_IP=${TRANSLATION_MEGA_SERVICE_HOST_IP} + - LLM_SERVICE_HOST_IP=${TRANSLATION_LLM_SERVICE_HOST_IP} + ipc: host + restart: always + translation-ui-server: + image: ${REGISTRY:-opea}/translation-ui:${TAG:-latest} + container_name: translation-ui-server + depends_on: + - translation-backend-server + ports: + - "${TRANSLATION_FRONTEND_SERVICE_PORT:-5173}:5173" + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - BASE_URL=${TRANSLATION_BACKEND_SERVICE_ENDPOINT} + ipc: host + restart: always + translation-nginx-server: + image: ${REGISTRY:-opea}/nginx:${TAG:-latest} + container_name: translation-nginx-server + depends_on: + - translation-backend-server + - translation-ui-server + ports: + - "${TRANSLATION_NGINX_PORT:-80}:80" + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - FRONTEND_SERVICE_IP=${TRANSLATION_FRONTEND_SERVICE_IP} + - FRONTEND_SERVICE_PORT=${TRANSLATION_FRONTEND_SERVICE_PORT} + - 
BACKEND_SERVICE_NAME=${TRANSLATION_BACKEND_SERVICE_NAME} + - BACKEND_SERVICE_IP=${TRANSLATION_BACKEND_SERVICE_IP} + - BACKEND_SERVICE_PORT=${TRANSLATION_BACKEND_SERVICE_PORT} + ipc: host + restart: always +networks: + default: + driver: bridge diff --git a/Translation/docker_compose/amd/gpu/rocm/set_env.sh b/Translation/docker_compose/amd/gpu/rocm/set_env.sh new file mode 100644 index 0000000000..76a2397b97 --- /dev/null +++ b/Translation/docker_compose/amd/gpu/rocm/set_env.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash + +# SPDX-License-Identifier: Apache-2.0 + +export TRANSLATION_HOST_IP='192.165.1.21' +export TRANSLATION_EXTERNAL_HOST_IP='direct-supercomputer1.powerml.co' +export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" +export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" +export TRANSLATION_HUGGINGFACEHUB_API_TOKEN='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' +export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} +export TRANSLATION_LLM_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} +export TRANSLATION_FRONTEND_SERVICE_IP=${TRANSLATION_HOST_IP} +export TRANSLATION_FRONTEND_SERVICE_PORT=18122 +export TRANSLATION_BACKEND_SERVICE_NAME=translation +export TRANSLATION_BACKEND_SERVICE_IP=${TRANSLATION_HOST_IP} +export TRANSLATION_BACKEND_SERVICE_PORT=18121 +export TRANSLATION_BACKEND_SERVICE_ENDPOINT="http://${TRANSLATION_EXTERNAL_HOST_IP}:${TRANSLATION_BACKEND_SERVICE_PORT}/v1/translation" +export TRANSLATION_NGINX_PORT=18123 diff --git a/Translation/tests/test_compose_on_rocm.sh b/Translation/tests/test_compose_on_rocm.sh new file mode 100644 index 0000000000..514ddb3ee7 --- /dev/null +++ b/Translation/tests/test_compose_on_rocm.sh @@ -0,0 +1,178 @@ +#!/bin/bash +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +set -xe +IMAGE_REPO=${IMAGE_REPO:-"opea"} +IMAGE_TAG=${IMAGE_TAG:-"latest"} +echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}" +echo "TAG=IMAGE_TAG=${IMAGE_TAG}" +export REGISTRY=${IMAGE_REPO} +export TAG=${IMAGE_TAG} + +WORKPATH=$(dirname "$PWD") +LOG_PATH="$WORKPATH/tests" +ip_address=$(hostname -I | awk '{print $1}') + +function build_docker_images() { + cd $WORKPATH/docker_image_build + git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../ + + echo "Build all the images with --no-cache, check docker_image_build.log for details..." 
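+    # Only the OPEA images in service_list below are built from source; the ROCm-enabled TGI image is pulled prebuilt afterwards.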
+ service_list="translation translation-ui llm-tgi nginx" + docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log + + docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + docker images && sleep 1s +} + +function start_services() { + cd $WORKPATH/docker_compose/amd/gpu/rocm/ + + export TRANSLATION_HOST_IP=${ip_address} + export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" + export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" + export TRANSLATION_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} + export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} + export TRANSLATION_LLM_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} + export TRANSLATION_FRONTEND_SERVICE_IP=${TRANSLATION_HOST_IP} + export TRANSLATION_FRONTEND_SERVICE_PORT=5173 + export TRANSLATION_BACKEND_SERVICE_NAME=translation + export TRANSLATION_BACKEND_SERVICE_IP=${TRANSLATION_HOST_IP} + export TRANSLATION_BACKEND_SERVICE_PORT=8888 + export TRANSLATION_BACKEND_SERVICE_ENDPOINT="http://${TRANSLATION_HOST_IP}:${TRANSLATION_BACKEND_SERVICE_PORT}/v1/translation" + export TRANSLATION_NGINX_PORT=8084 + + sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env + + # Start Docker Containers + docker compose up -d > ${LOG_PATH}/start_services_with_compose.log + + n=0 + # wait long for llm model download + until [[ "$n" -ge 500 ]]; do + docker logs translation-tgi-service > ${LOG_PATH}/translation-tgi-service_start.log + if grep -q Connected ${LOG_PATH}/translation-tgi-service_start.log; then + break + fi + sleep 10s + n=$((n+1)) + done +} + +function validate_services() { + local URL="$1" + local EXPECTED_RESULT="$2" + local SERVICE_NAME="$3" + local DOCKER_NAME="$4" + local INPUT_DATA="$5" + + local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL") + if [ "$HTTP_STATUS" -eq 200 ]; then + echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..." + + local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log) + + if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then + echo "[ $SERVICE_NAME ] Content is as expected." + else + echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT" + docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log + exit 1 + fi + else + echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS" + docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log + exit 1 + fi + sleep 1s +} + +function validate_microservices() { + # Check if the microservices are running correctly. 
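+    # Each validate_services call below posts a sample request and exits non-zero (after dumping the container log) if the HTTP status or response body is not as expected.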
+ + # tgi for llm service + validate_services \ + "${TRANSLATION_HOST_IP}:8008/generate" \ + "generated_text" \ + "translation-tgi-service" \ + "translation-tgi-service" \ + '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' + + # llm microservice + validate_services \ + "${TRANSLATION_HOST_IP}:9000/v1/chat/completions" \ + "data: " \ + "translation-llm" \ + "translation-llm-tgi-server" \ + '{"query":"Translate this from Chinese to English:\nChinese: ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚\nEnglish:"}' +} + +function validate_megaservice() { + # Curl the Mega Service + validate_services \ + "${TRANSLATION_HOST_IP}:8888/v1/translation" \ + "translation" \ + "translation-backend-server" \ + "translation-backend-server" \ + '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' + + # test the megeservice via nginx + validate_services \ + "${TRANSLATION_HOST_IP}:${TRANSLATION_NGINX_PORT}/v1/translation" \ + "translation" \ + "translation-nginx-server" \ + "translation-nginx-server" \ + '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' +} + +function validate_frontend() { + cd $WORKPATH/ui/svelte + local conda_env_name="OPEA_e2e" + export PATH=${HOME}/miniconda3/bin/:$PATH + if conda info --envs | grep -q "$conda_env_name"; then + echo "$conda_env_name exist!" + else + conda create -n ${conda_env_name} python=3.12 -y + fi + source activate ${conda_env_name} + + sed -i "s/localhost/$ip_address/g" playwright.config.ts + + conda install -c conda-forge nodejs=22.6.0 -y + npm install && npm ci && npx playwright install --with-deps + node -v && npm -v && pip list + + exit_status=0 + npx playwright test || exit_status=$? + + if [ $exit_status -ne 0 ]; then + echo "[TEST INFO]: ---------frontend test failed---------" + exit $exit_status + else + echo "[TEST INFO]: ---------frontend test passed---------" + fi +} + +function stop_docker() { + cd $WORKPATH/docker_compose/amd/gpu/rocm/ + docker compose stop && docker compose rm -f +} + +function main() { + + stop_docker + + if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi + start_services + + validate_microservices + validate_megaservice + validate_frontend + + stop_docker + echo y | docker system prune + +} + +main From 100dfbfbe12b7d67e639926c94948e36dfb110e2 Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Tue, 26 Nov 2024 22:56:18 +0700 Subject: [PATCH 03/17] TranslationApp - add README file Signed-off-by: Chingis Yundunov --- .../docker_compose/amd/gpu/rocm/README.md | 128 ++++++++++++++++++ 1 file changed, 128 insertions(+) create mode 100644 Translation/docker_compose/amd/gpu/rocm/README.md diff --git a/Translation/docker_compose/amd/gpu/rocm/README.md b/Translation/docker_compose/amd/gpu/rocm/README.md new file mode 100644 index 0000000000..5cff6dc36e --- /dev/null +++ b/Translation/docker_compose/amd/gpu/rocm/README.md @@ -0,0 +1,128 @@ +# Build and deploy Translation Application on AMD GPU (ROCm) + +## Build images + +### Build the LLM Docker Image + +```bash +### Cloning repo +git clone https://github.com/opea-project/GenAIComps.git +cd GenAIComps + +### Build Docker image +docker build -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile . 
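+
+### Optionally confirm the image was built (illustrative check)
+docker images | grep llm-tgi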
+``` + +### Build the MegaService Docker Image + +```bash +### Cloning repo +git clone https://github.com/opea-project/GenAIExamples +cd GenAIExamples/Translation/ + +### Build Docker image +docker build -t opea/translation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . +``` + +### Build the UI Docker Image + +```bash +cd GenAIExamples/Translation/ui +### Build UI Docker image +docker build -t opea/translation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile . +``` + +## Deploy Translation Application + +### Features of Docker compose for AMD GPUs + +1. Added forwarding of GPU devices to the container TGI service with instructions: + +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri/:/dev/dri/ +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined +``` + +In this case, all GPUs are thrown. To reset a specific GPU, you need to use specific device names cardN and renderN. + +For example: + +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri/card0:/dev/dri/card0 + - /dev/dri/render128:/dev/dri/render128 +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined +``` + +To find out which GPU device IDs cardN and renderN correspond to the same GPU, use the GPU driver utility + +### Go to the directory with the Docker compose file + +```bash +cd GenAIExamples/Translation/docker_compose/amd/gpu/rocm +``` + +### Set environments + +In the file "GenAIExamples/Translation/docker_compose/amd/gpu/rocm/set_env.sh " it is necessary to set the required values. Parameter assignments are specified in the comments for each variable setting command + +```bash +chmod +x set_env.sh +. 
set_env.sh +``` + +### Run services + +``` +docker compose up -d +``` + +# Validate the MicroServices and MegaService + +## Validate TGI service + +```bash +curl http://${TRANSLATION_HOST_IP}:${TRANSLATIONS_TGI_SERVICE_PORT}/generate \ + -X POST \ + -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \ + -H 'Content-Type: application/json' +``` + +## Validate LLM service + +```bash +curl http://${TRANSLATION_HOST_IP}:9000/v1/chat/completions \ + -X POST \ + -d '{"query":"Translate this from Chinese to English:\nChinese: ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚\nEnglish:"}' \ + -H 'Content-Type: application/json' +``` + +## Validate MegaService + +```bash +curl http://${TRANSLATION_HOST_IP}:${TRANSLATION_BACKEND_SERVICE_PORT}/v1/translation -H "Content-Type: application/json" -d '{ + "language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' +``` + +## Validate Nginx service + +```bash +curl http://${TRANSLATION_HOST_IP}:${TRANSLATION_NGINX_PORT}/v1/translation \ + -H "Content-Type: application/json" \ + -d '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' +``` From e458aacfacf018b87487a09c9ae785295417746c Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Tue, 26 Nov 2024 23:10:59 +0700 Subject: [PATCH 04/17] TranslationApp - fix Docker compose file and tests script Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/set_env.sh | 1 + Translation/tests/test_compose_on_rocm.sh | 1 + 2 files changed, 2 insertions(+) diff --git a/Translation/docker_compose/amd/gpu/rocm/set_env.sh b/Translation/docker_compose/amd/gpu/rocm/set_env.sh index 76a2397b97..87c749ea81 100644 --- a/Translation/docker_compose/amd/gpu/rocm/set_env.sh +++ b/Translation/docker_compose/amd/gpu/rocm/set_env.sh @@ -5,6 +5,7 @@ export TRANSLATION_HOST_IP='192.165.1.21' export TRANSLATION_EXTERNAL_HOST_IP='direct-supercomputer1.powerml.co' export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" +export MODEL_ID="haoranxu/ALMA-13B" export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" export TRANSLATION_HUGGINGFACEHUB_API_TOKEN='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' export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} diff --git a/Translation/tests/test_compose_on_rocm.sh b/Translation/tests/test_compose_on_rocm.sh index 514ddb3ee7..eba06b26aa 100644 --- a/Translation/tests/test_compose_on_rocm.sh +++ b/Translation/tests/test_compose_on_rocm.sh @@ -31,6 +31,7 @@ function start_services() { export TRANSLATION_HOST_IP=${ip_address} export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" + export MODEL_ID="haoranxu/ALMA-13B" export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" export TRANSLATION_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} From adfb587a7c4db0d5aa1c6d3e32c14f2a4c138be2 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 26 Nov 2024 16:03:20 +0000 Subject: [PATCH 05/17] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/set_env.sh | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Translation/docker_compose/amd/gpu/rocm/set_env.sh b/Translation/docker_compose/amd/gpu/rocm/set_env.sh index 87c749ea81..f6a2cec913 100644 --- a/Translation/docker_compose/amd/gpu/rocm/set_env.sh +++ 
b/Translation/docker_compose/amd/gpu/rocm/set_env.sh @@ -1,5 +1,8 @@ #!/usr/bin/env bash +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + # SPDX-License-Identifier: Apache-2.0 export TRANSLATION_HOST_IP='192.165.1.21' From 8b0526b5deef31cf11d5adeda4e4b8e650d603f7 Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Tue, 26 Nov 2024 23:16:20 +0700 Subject: [PATCH 06/17] TranslationApp - fix Docker compose file and tests script Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/compose.yaml | 10 +++++----- Translation/docker_compose/amd/gpu/rocm/set_env.sh | 1 - Translation/tests/test_compose_on_rocm.sh | 1 - 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/Translation/docker_compose/amd/gpu/rocm/compose.yaml b/Translation/docker_compose/amd/gpu/rocm/compose.yaml index 8a4d70da28..266f294a9f 100644 --- a/Translation/docker_compose/amd/gpu/rocm/compose.yaml +++ b/Translation/docker_compose/amd/gpu/rocm/compose.yaml @@ -6,7 +6,7 @@ services: image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm container_name: translation-tgi-service ports: - - "${TRANSLATIONS_TGI_SERVICE_PORT:-8008}:80" + - "${TRANSLATION_TGI_SERVICE_PORT:-8008}:80" volumes: - "/var/lib/GenAI/translation/data:/data" shm_size: 8g @@ -14,9 +14,9 @@ services: no_proxy: ${no_proxy} http_proxy: ${http_proxy} https_proxy: ${https_proxy} - TGI_LLM_ENDPOINT: ${TRANSLATIONS_TGI_LLM_ENDPOINT} - HUGGING_FACE_HUB_TOKEN: ${TRANSLATIONS_HUGGINGFACEHUB_API_TOKEN} - HUGGINGFACEHUB_API_TOKEN: ${TRANSLATIONS_HUGGINGFACEHUB_API_TOKEN} + TGI_LLM_ENDPOINT: ${TRANSLATION_TGI_LLM_ENDPOINT} + HUGGING_FACE_HUB_TOKEN: ${TRANSLATION_HUGGINGFACEHUB_API_TOKEN} + HUGGINGFACEHUB_API_TOKEN: ${TRANSLATION_HUGGINGFACEHUB_API_TOKEN} devices: - /dev/kfd:/dev/kfd - /dev/dri/:/dev/dri/ @@ -27,7 +27,7 @@ services: security_opt: - seccomp:unconfined ipc: host - command: --model-id ${TRANSLATIONS_LLM_MODEL_ID} + command: --model-id ${TRANSLATION_LLM_MODEL_ID} translation-llm: image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest} container_name: translation-llm-tgi-server diff --git a/Translation/docker_compose/amd/gpu/rocm/set_env.sh b/Translation/docker_compose/amd/gpu/rocm/set_env.sh index f6a2cec913..8e33e61123 100644 --- a/Translation/docker_compose/amd/gpu/rocm/set_env.sh +++ b/Translation/docker_compose/amd/gpu/rocm/set_env.sh @@ -8,7 +8,6 @@ export TRANSLATION_HOST_IP='192.165.1.21' export TRANSLATION_EXTERNAL_HOST_IP='direct-supercomputer1.powerml.co' export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" -export MODEL_ID="haoranxu/ALMA-13B" export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" export TRANSLATION_HUGGINGFACEHUB_API_TOKEN='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' export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} diff --git a/Translation/tests/test_compose_on_rocm.sh b/Translation/tests/test_compose_on_rocm.sh index eba06b26aa..514ddb3ee7 100644 --- a/Translation/tests/test_compose_on_rocm.sh +++ b/Translation/tests/test_compose_on_rocm.sh @@ -31,7 +31,6 @@ function start_services() { export TRANSLATION_HOST_IP=${ip_address} export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" - export MODEL_ID="haoranxu/ALMA-13B" export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" export TRANSLATION_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} From a32048454a6a53037444dfc729a53712a8074504 Mon Sep 17 00:00:00 2001 From: minmin-intel Date: Wed, 20 Nov 2024 17:30:11 -0800 
Subject: [PATCH 07/17] Fix DocIndexRetriever CI error on Xeon (#1167) Signed-off-by: minmin-intel Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Chingis Yundunov --- DocIndexRetriever/tests/test.py | 7 ++-- .../tests/test_compose_on_gaudi.sh | 4 ++ .../tests/test_compose_on_xeon.sh | 39 ++++++++++++++----- 3 files changed, 36 insertions(+), 14 deletions(-) diff --git a/DocIndexRetriever/tests/test.py b/DocIndexRetriever/tests/test.py index 698f40da30..e26ccd3dbd 100644 --- a/DocIndexRetriever/tests/test.py +++ b/DocIndexRetriever/tests/test.py @@ -6,7 +6,7 @@ import requests -def search_knowledge_base(query: str, url: str, request_type="chat_completion") -> str: +def search_knowledge_base(query: str, url: str, request_type: str) -> str: """Search the knowledge base for a specific query.""" print(url) proxies = {"http": ""} @@ -18,12 +18,13 @@ def search_knowledge_base(query: str, url: str, request_type="chat_completion") "top_n": 2, } else: - print("Sending text request") + print("Sending textdoc request") payload = { "text": query, } response = requests.post(url, json=payload, proxies=proxies) print(response) + print(response.json().keys()) if "documents" in response.json(): docs = response.json()["documents"] context = "" @@ -32,7 +33,6 @@ def search_knowledge_base(query: str, url: str, request_type="chat_completion") context = str(i) + ": " + doc else: context += "\n" + str(i) + ": " + doc - # print(context) return context elif "text" in response.json(): return response.json()["text"] @@ -44,7 +44,6 @@ def search_knowledge_base(query: str, url: str, request_type="chat_completion") context = doc["text"] else: context += "\n" + doc["text"] - # print(context) return context else: return "Error parsing response from the knowledge base." diff --git a/DocIndexRetriever/tests/test_compose_on_gaudi.sh b/DocIndexRetriever/tests/test_compose_on_gaudi.sh index e652ead26b..bea6f8e7a1 100644 --- a/DocIndexRetriever/tests/test_compose_on_gaudi.sh +++ b/DocIndexRetriever/tests/test_compose_on_gaudi.sh @@ -15,6 +15,7 @@ LOG_PATH="$WORKPATH/tests" ip_address=$(hostname -I | awk '{print $1}') function build_docker_images() { + echo "Building Docker Images...." cd $WORKPATH/docker_image_build if [ ! -d "GenAIComps" ] ; then git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../ @@ -26,9 +27,11 @@ function build_docker_images() { docker pull redis/redis-stack:7.2.0-v9 docker pull ghcr.io/huggingface/tei-gaudi:1.5.0 docker images && sleep 1s + echo "Docker images built!" } function start_services() { + echo "Starting Docker Services...." cd $WORKPATH/docker_compose/intel/hpu/gaudi export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" export RERANK_MODEL_ID="BAAI/bge-reranker-base" @@ -47,6 +50,7 @@ function start_services() { # Start Docker Containers docker compose up -d sleep 20 + echo "Docker services started!" } function validate() { diff --git a/DocIndexRetriever/tests/test_compose_on_xeon.sh b/DocIndexRetriever/tests/test_compose_on_xeon.sh index c6ff29e29f..a106301598 100644 --- a/DocIndexRetriever/tests/test_compose_on_xeon.sh +++ b/DocIndexRetriever/tests/test_compose_on_xeon.sh @@ -15,8 +15,10 @@ LOG_PATH="$WORKPATH/tests" ip_address=$(hostname -I | awk '{print $1}') function build_docker_images() { + echo "Building Docker Images...." cd $WORKPATH/docker_image_build if [ ! 
-d "GenAIComps" ] ; then + echo "Cloning GenAIComps repository" git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../ fi service_list="dataprep-redis embedding-tei retriever-redis reranking-tei doc-index-retriever" @@ -25,9 +27,12 @@ function build_docker_images() { docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 docker pull redis/redis-stack:7.2.0-v9 docker images && sleep 1s + + echo "Docker images built!" } function start_services() { + echo "Starting Docker Services...." cd $WORKPATH/docker_compose/intel/cpu/xeon export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" export RERANK_MODEL_ID="BAAI/bge-reranker-base" @@ -45,7 +50,8 @@ function start_services() { # Start Docker Containers docker compose up -d - sleep 20 + sleep 5m + echo "Docker services started!" } function validate() { @@ -66,7 +72,7 @@ function validate_megaservice() { echo "===========Ingest data==================" local CONTENT=$(http_proxy="" curl -X POST "http://${ip_address}:6007/v1/dataprep" \ -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]') + -F 'link_list=["https://opea.dev/"]') local EXIT_CODE=$(validate "$CONTENT" "Data preparation succeeded" "dataprep-redis-service-xeon") echo "$EXIT_CODE" local EXIT_CODE="${EXIT_CODE:0-1}" @@ -77,19 +83,26 @@ function validate_megaservice() { fi # Curl the Mega Service - echo "================Testing retriever service: Default params================" - - local CONTENT=$(curl http://${ip_address}:8889/v1/retrievaltool -X POST -H "Content-Type: application/json" -d '{ - "messages": "Explain the OPEA project?" + echo "================Testing retriever service: Text Request ================" + cd $WORKPATH/tests + local CONTENT=$(http_proxy="" curl http://${ip_address}:8889/v1/retrievaltool -X POST -H "Content-Type: application/json" -d '{ + "text": "Explain the OPEA project?" 
}') + # local CONTENT=$(python test.py --host_ip ${ip_address} --request_type text) local EXIT_CODE=$(validate "$CONTENT" "OPEA" "doc-index-retriever-service-xeon") echo "$EXIT_CODE" local EXIT_CODE="${EXIT_CODE:0-1}" echo "return value is $EXIT_CODE" if [ "$EXIT_CODE" == "1" ]; then - docker logs tei-embedding-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Embedding container log==================" + docker logs embedding-tei-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Retriever container log==================" docker logs retriever-redis-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log - docker logs reranking-tei-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============TEI Reranking log==================" + docker logs tei-reranking-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Reranking container log==================" + docker logs reranking-tei-xeon-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Doc-index-retriever container log==================" docker logs doc-index-retriever-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log exit 1 fi @@ -102,9 +115,15 @@ function validate_megaservice() { local EXIT_CODE="${EXIT_CODE:0-1}" echo "return value is $EXIT_CODE" if [ "$EXIT_CODE" == "1" ]; then - docker logs tei-embedding-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Embedding container log==================" + docker logs embedding-tei-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Retriever container log==================" docker logs retriever-redis-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log - docker logs reranking-tei-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============TEI Reranking log==================" + docker logs tei-reranking-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Reranking container log==================" + docker logs reranking-tei-xeon-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log + echo "=============Doc-index-retriever container log==================" docker logs doc-index-retriever-server | tee -a ${LOG_PATH}/doc-index-retriever-service-xeon.log exit 1 fi From 78d6293d8ecea7ce4d4ea5caf8253a8d67798757 Mon Sep 17 00:00:00 2001 From: Letong Han <106566639+letonghan@users.noreply.github.com> Date: Thu, 21 Nov 2024 10:48:52 +0800 Subject: [PATCH 08/17] Fix Translation Manifest CI with MODEL_ID (#1169) Signed-off-by: letonghan Signed-off-by: Chingis Yundunov --- Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml | 2 +- .../kubernetes/intel/hpu/gaudi/manifest/translation.yaml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml b/Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml index 9cc8c2798f..f8e2b6e659 100644 --- a/Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml +++ b/Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml @@ -10,7 +10,7 @@ metadata: app.kubernetes.io/instance: translation app.kubernetes.io/version: "2.1.0" data: - LLM_MODEL_ID: "haoranxu/ALMA-13B" + MODEL_ID: "haoranxu/ALMA-13B" PORT: "2080" HF_TOKEN: "insert-your-huggingface-token-here" http_proxy: "" diff --git 
a/Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml b/Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml index 25e39a7002..61a487a0db 100644 --- a/Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml +++ b/Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml @@ -10,7 +10,7 @@ metadata: app.kubernetes.io/instance: translation app.kubernetes.io/version: "2.1.0" data: - LLM_MODEL_ID: "haoranxu/ALMA-13B" + MODEL_ID: "haoranxu/ALMA-13B" PORT: "2080" HF_TOKEN: "insert-your-huggingface-token-here" http_proxy: "" From 122dc7c31368a24a3515cf3951d6dc9384e421eb Mon Sep 17 00:00:00 2001 From: bjzhjing Date: Thu, 21 Nov 2024 14:14:27 +0800 Subject: [PATCH 09/17] Adjustments for helm release change (#1173) Signed-off-by: Cathy Zhang Signed-off-by: Chingis Yundunov --- .../kubernetes/intel/gaudi/README.md | 12 +---- .../kubernetes/intel/gaudi/deploy.py | 52 +++---------------- 2 files changed, 10 insertions(+), 54 deletions(-) diff --git a/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/README.md b/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/README.md index d667727f48..ae0537f8ff 100644 --- a/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/README.md +++ b/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/README.md @@ -69,10 +69,6 @@ Results will be displayed in the terminal and saved as CSV file named `1_stats.c - Persistent Volume Claim (PVC): This is the recommended approach for production setups. For more details on using PVC, refer to [PVC](https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/README.md#using-persistent-volume). - Local Host Path: For simpler testing, ensure that each node involved in the deployment follows the steps above to locally prepare the models. After preparing the models, use `--set global.modelUseHostPath=${MODELDIR}` in the deployment command. 
-- Add OPEA Helm Repository: - ```bash - python deploy.py --add-repo - ``` - Label Nodes ```base python deploy.py --add-label --num-nodes 2 @@ -192,13 +188,9 @@ All the test results will come to the folder `GenAIEval/evals/benchmark/benchmar ## Teardown -After completing the benchmark, use the following commands to clean up the environment: +After completing the benchmark, use the following command to clean up the environment: Remove Node Labels: -```base -python deploy.py --delete-label -``` -Delete the OPEA Helm Repository: ```bash -python deploy.py --delete-repo +python deploy.py --delete-label ``` diff --git a/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/deploy.py b/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/deploy.py index 4632cc79c2..6f1f97cac2 100644 --- a/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/deploy.py +++ b/ChatQnA/benchmark/performance/kubernetes/intel/gaudi/deploy.py @@ -83,26 +83,6 @@ def clear_labels_from_nodes(label, node_names=None): print(f"Label {label_key} not found on node {node_name}, skipping.") -def add_helm_repo(repo_name, repo_url): - # Add the repo if it does not exist - add_command = ["helm", "repo", "add", repo_name, repo_url] - try: - subprocess.run(add_command, check=True) - print(f"Added Helm repo {repo_name} from {repo_url}.") - except subprocess.CalledProcessError as e: - print(f"Failed to add Helm repo {repo_name}: {e}") - - -def delete_helm_repo(repo_name): - """Delete Helm repo if it exists.""" - command = ["helm", "repo", "remove", repo_name] - try: - subprocess.run(command, check=True) - print(f"Deleted Helm repo {repo_name}.") - except subprocess.CalledProcessError: - print(f"Failed to delete Helm repo {repo_name}. It may not exist.") - - def install_helm_release(release_name, chart_name, namespace, values_file, device_type): """Deploy a Helm release with a specified name and chart. @@ -132,14 +112,14 @@ def install_helm_release(release_name, chart_name, namespace, values_file, devic if device_type == "gaudi": print("Device type is gaudi. 
Pulling Helm chart to get gaudi-values.yaml...") - # Pull and untar the chart - subprocess.run(["helm", "pull", chart_name, "--untar"], check=True) + # Combine chart_name with fixed prefix + chart_pull_url = f"oci://ghcr.io/opea-project/charts/{chart_name}" - # Determine the directory name (get the actual chart_name if chart_name is in the format 'repo_name/chart_name', else use chart_name directly) - chart_dir_name = chart_name.split("/")[-1] if "/" in chart_name else chart_name + # Pull and untar the chart + subprocess.run(["helm", "pull", chart_pull_url, "--untar"], check=True) - # Find the untarred directory (assumes only one directory matches chart_dir_name) - untar_dirs = glob.glob(f"{chart_dir_name}*") + # Find the untarred directory + untar_dirs = glob.glob(f"{chart_name}*") if untar_dirs: untar_dir = untar_dirs[0] hw_values_file = os.path.join(untar_dir, "gaudi-values.yaml") @@ -210,20 +190,14 @@ def main(): parser.add_argument( "--chart-name", type=str, - default="opea/chatqna", - help="The chart name to deploy, composed of repo name and chart name (default: opea/chatqna).", + default="chatqna", + help="The chart name to deploy, composed of repo name and chart name (default: chatqna).", ) parser.add_argument("--namespace", default="default", help="Kubernetes namespace (default: default).") parser.add_argument("--hf-token", help="Hugging Face API token.") parser.add_argument( "--model-dir", help="Model directory, mounted as volumes for service access to pre-downloaded models" ) - parser.add_argument("--repo-name", default="opea", help="Helm repo name to add/delete (default: opea).") - parser.add_argument( - "--repo-url", - default="https://opea-project.github.io/GenAIInfra", - help="Helm repository URL (default: https://opea-project.github.io/GenAIInfra).", - ) parser.add_argument("--user-values", help="Path to a user-specified values.yaml file.") parser.add_argument( "--create-values-only", action="store_true", help="Only create the values.yaml file without deploying." @@ -244,8 +218,6 @@ def main(): action="store_true", help="Modify resources for services and change extraCmdArgs when creating values.yaml.", ) - parser.add_argument("--add-repo", action="store_true", help="Add the Helm repo specified by --repo-url.") - parser.add_argument("--delete-repo", action="store_true", help="Delete the Helm repo specified by --repo-name.") parser.add_argument( "--device-type", type=str, @@ -264,14 +236,6 @@ def main(): else: args.num_nodes = num_node_names - # Helm repository management - if args.add_repo: - add_helm_repo(args.repo_name, args.repo_url) - return - elif args.delete_repo: - delete_helm_repo(args.repo_name) - return - # Node labeling management if args.add_label: add_labels_to_nodes(args.num_nodes, args.label, args.node_names) From b0110c7f9fa251d3f741aae6eb2f23cc7f9e3b60 Mon Sep 17 00:00:00 2001 From: Mingyuan Qi Date: Thu, 21 Nov 2024 20:36:28 +0800 Subject: [PATCH 10/17] Fix code scanning alert no. 
21: Uncontrolled data used in path expression (#1171) Signed-off-by: Mingyuan Qi Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: Chingis Yundunov --- EdgeCraftRAG/Dockerfile.server | 5 +++ EdgeCraftRAG/README.md | 44 +++++-------------- .../docker_compose/intel/gpu/arc/compose.yaml | 1 + .../intel/gpu/arc/compose_vllm.yaml | 1 + .../edgecraftrag/components/generator.py | 13 +++--- .../configs/test_pipeline_local_llm.json | 2 +- .../tests/configs/test_pipeline_vllm.json | 2 +- .../tests/test_compose_vllm_on_arc.sh | 3 +- .../tests/test_pipeline_local_llm.json | 2 +- EdgeCraftRAG/ui/gradio/default.yaml | 2 +- EdgeCraftRAG/ui/gradio/ecrag_client.py | 2 +- 11 files changed, 30 insertions(+), 47 deletions(-) diff --git a/EdgeCraftRAG/Dockerfile.server b/EdgeCraftRAG/Dockerfile.server index f076dcd16d..b327544129 100644 --- a/EdgeCraftRAG/Dockerfile.server +++ b/EdgeCraftRAG/Dockerfile.server @@ -23,6 +23,11 @@ RUN useradd -m -s /bin/bash user && \ mkdir -p /home/user && \ chown -R user /home/user/ +RUN mkdir /templates && \ + chown -R user /templates +COPY ./edgecraftrag/prompt_template/default_prompt.txt /templates/ +RUN chown -R user /templates/default_prompt.txt + COPY ./edgecraftrag /home/user/edgecraftrag RUN mkdir -p /home/user/gradio_cache diff --git a/EdgeCraftRAG/README.md b/EdgeCraftRAG/README.md index a248225325..ed828165cc 100644 --- a/EdgeCraftRAG/README.md +++ b/EdgeCraftRAG/README.md @@ -32,14 +32,14 @@ Please follow this link [vLLM with OpenVINO](https://github.com/opea-project/Gen ### Start Edge Craft RAG Services with Docker Compose -If you want to enable vLLM with OpenVINO service, please finish the steps in [Launch vLLM with OpenVINO service](#optional-launch-vllm-with-openvino-service) first. - ```bash cd GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc export MODEL_PATH="your model path for all your models" export DOC_PATH="your doc path for uploading a dir of files" export GRADIO_PATH="your gradio cache path for transferring files" +# If you have a specific prompt template, please uncomment the following line +# export PROMPT_PATH="your prompt path for prompt templates" # Make sure all 3 folders have 1000:1000 permission, otherwise # chown 1000:1000 ${MODEL_PATH} ${DOC_PATH} ${GRADIO_PATH} @@ -70,49 +70,25 @@ optimum-cli export openvino -m BAAI/bge-small-en-v1.5 ${MODEL_PATH}/BAAI/bge-sma optimum-cli export openvino -m BAAI/bge-reranker-large ${MODEL_PATH}/BAAI/bge-reranker-large --task sentence-similarity optimum-cli export openvino -m Qwen/Qwen2-7B-Instruct ${MODEL_PATH}/Qwen/Qwen2-7B-Instruct/INT4_compressed_weights --weight-format int4 -docker compose up -d +``` + +#### Launch services with local inference +```bash +docker compose -f compose.yaml up -d ``` -#### (Optional) Launch vLLM with OpenVINO service +#### Launch services with vLLM + OpenVINO inference service -1. Set up Environment Variables +Set up Additional Environment Variables and start with compose_vllm.yaml ```bash export LLM_MODEL=#your model id export VLLM_SERVICE_PORT=8008 export vLLM_ENDPOINT="http://${HOST_IP}:${VLLM_SERVICE_PORT}" export HUGGINGFACEHUB_API_TOKEN=#your HF token -``` - -2. 
Uncomment below code in 'GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml' -```bash - # vllm-openvino-server: - # container_name: vllm-openvino-server - # image: opea/vllm-arc:latest - # ports: - # - ${VLLM_SERVICE_PORT:-8008}:80 - # environment: - # HTTPS_PROXY: ${https_proxy} - # HTTP_PROXY: ${https_proxy} - # VLLM_OPENVINO_DEVICE: GPU - # HF_ENDPOINT: ${HF_ENDPOINT} - # HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - # volumes: - # - /dev/dri/by-path:/dev/dri/by-path - # - $HOME/.cache/huggingface:/root/.cache/huggingface - # devices: - # - /dev/dri - # entrypoint: /bin/bash -c "\ - # cd / && \ - # export VLLM_CPU_KVCACHE_SPACE=50 && \ - # export VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON && \ - # python3 -m vllm.entrypoints.openai.api_server \ - # --model '${LLM_MODEL}' \ - # --max_model_len=1024 \ - # --host 0.0.0.0 \ - # --port 80" +docker compose -f compose_vllm.yaml up -d ``` ### ChatQnA with LLM Example (Command Line) diff --git a/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml b/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml index a695fbc022..68a5c953c9 100644 --- a/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml +++ b/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml @@ -16,6 +16,7 @@ services: - ${DOC_PATH:-${PWD}}:/home/user/docs - ${GRADIO_PATH:-${PWD}}:/home/user/gradio_cache - ${HF_CACHE:-${HOME}/.cache}:/home/user/.cache + - ${PROMPT_PATH:-${PWD}}:/templates/custom ports: - ${PIPELINE_SERVICE_PORT:-16010}:${PIPELINE_SERVICE_PORT:-16010} devices: diff --git a/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml b/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml index 6ba7c4da27..c1e937fa69 100644 --- a/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml +++ b/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml @@ -16,6 +16,7 @@ services: - ${DOC_PATH:-${PWD}}:/home/user/docs - ${GRADIO_PATH:-${PWD}}:/home/user/gradio_cache - ${HF_CACHE:-${HOME}/.cache}:/home/user/.cache + - ${PROMPT_PATH:-${PWD}}:/templates/custom ports: - ${PIPELINE_SERVICE_PORT:-16010}:${PIPELINE_SERVICE_PORT:-16010} devices: diff --git a/EdgeCraftRAG/edgecraftrag/components/generator.py b/EdgeCraftRAG/edgecraftrag/components/generator.py index a888bf18f6..02c8cec2bb 100644 --- a/EdgeCraftRAG/edgecraftrag/components/generator.py +++ b/EdgeCraftRAG/edgecraftrag/components/generator.py @@ -26,12 +26,13 @@ def __init__(self, llm_model, prompt_template, inference_type, **kwargs): ("\n\n", "\n"), ("\t\n", "\n"), ) - template = prompt_template - self.prompt = ( - DocumentedContextRagPromptTemplate.from_file(template) - if os.path.isfile(template) - else DocumentedContextRagPromptTemplate.from_template(template) - ) + safe_root = "/templates" + template = os.path.normpath(os.path.join(safe_root, prompt_template)) + if not template.startswith(safe_root): + raise ValueError("Invalid template path") + if not os.path.exists(template): + raise ValueError("Template file not exists") + self.prompt = DocumentedContextRagPromptTemplate.from_file(template) self.llm = llm_model if isinstance(llm_model, str): self.model_id = llm_model diff --git a/EdgeCraftRAG/tests/configs/test_pipeline_local_llm.json b/EdgeCraftRAG/tests/configs/test_pipeline_local_llm.json index 261459e835..c657362ec1 100644 --- a/EdgeCraftRAG/tests/configs/test_pipeline_local_llm.json +++ b/EdgeCraftRAG/tests/configs/test_pipeline_local_llm.json @@ -37,7 +37,7 @@ "device": "auto", "weight": "INT4" }, - "prompt_path": "./edgecraftrag/prompt_template/default_prompt.txt", + 
"prompt_path": "./default_prompt.txt", "inference_type": "local" }, "active": "True" diff --git a/EdgeCraftRAG/tests/configs/test_pipeline_vllm.json b/EdgeCraftRAG/tests/configs/test_pipeline_vllm.json index 05809c8e13..60565907ac 100644 --- a/EdgeCraftRAG/tests/configs/test_pipeline_vllm.json +++ b/EdgeCraftRAG/tests/configs/test_pipeline_vllm.json @@ -37,7 +37,7 @@ "device": "auto", "weight": "INT4" }, - "prompt_path": "./edgecraftrag/prompt_template/default_prompt.txt", + "prompt_path": "./default_prompt.txt", "inference_type": "vllm" }, "active": "True" diff --git a/EdgeCraftRAG/tests/test_compose_vllm_on_arc.sh b/EdgeCraftRAG/tests/test_compose_vllm_on_arc.sh index 1d65057be5..4fa7ee92e7 100755 --- a/EdgeCraftRAG/tests/test_compose_vllm_on_arc.sh +++ b/EdgeCraftRAG/tests/test_compose_vllm_on_arc.sh @@ -31,8 +31,7 @@ vLLM_ENDPOINT="http://${HOST_IP}:${VLLM_SERVICE_PORT}" function build_docker_images() { cd $WORKPATH/docker_image_build echo "Build all the images with --no-cache, check docker_image_build.log for details..." - service_list="server ui ecrag" - docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log + docker compose -f build.yaml build --no-cache > ${LOG_PATH}/docker_image_build.log echo "Build vllm_openvino image from GenAIComps..." cd $WORKPATH && git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" diff --git a/EdgeCraftRAG/tests/test_pipeline_local_llm.json b/EdgeCraftRAG/tests/test_pipeline_local_llm.json index 261459e835..c657362ec1 100644 --- a/EdgeCraftRAG/tests/test_pipeline_local_llm.json +++ b/EdgeCraftRAG/tests/test_pipeline_local_llm.json @@ -37,7 +37,7 @@ "device": "auto", "weight": "INT4" }, - "prompt_path": "./edgecraftrag/prompt_template/default_prompt.txt", + "prompt_path": "./default_prompt.txt", "inference_type": "local" }, "active": "True" diff --git a/EdgeCraftRAG/ui/gradio/default.yaml b/EdgeCraftRAG/ui/gradio/default.yaml index 39c3ee92e3..ad3718f0c1 100644 --- a/EdgeCraftRAG/ui/gradio/default.yaml +++ b/EdgeCraftRAG/ui/gradio/default.yaml @@ -29,7 +29,7 @@ postprocessor: "reranker" # Generator generator: "chatqna" -prompt_path: "./edgecraftrag/prompt_template/default_prompt.txt" +prompt_path: "./default_prompt.txt" # Models embedding_model_id: "BAAI/bge-small-en-v1.5" diff --git a/EdgeCraftRAG/ui/gradio/ecrag_client.py b/EdgeCraftRAG/ui/gradio/ecrag_client.py index 6593cbd94f..7a58ff720b 100644 --- a/EdgeCraftRAG/ui/gradio/ecrag_client.py +++ b/EdgeCraftRAG/ui/gradio/ecrag_client.py @@ -78,7 +78,7 @@ def create_update_pipeline( ], generator=api_schema.GeneratorIn( # TODO: remove hardcoding - prompt_path="./edgecraftrag/prompt_template/default_prompt.txt", + prompt_path="./default_prompt.txt", model=api_schema.ModelIn(model_id=llm_id, model_path=llm_path, device=llm_device, weight=llm_weights), inference_type=llm_infertype, ), From b621db2bf4ef8a06733daa9f3397a1dc80e291c4 Mon Sep 17 00:00:00 2001 From: "Wang, Kai Lawrence" <109344418+wangkl2@users.noreply.github.com> Date: Fri, 22 Nov 2024 09:20:09 +0800 Subject: [PATCH 11/17] Update the llm backend ports (#1172) Signed-off-by: Wang, Kai Lawrence Signed-off-by: Chingis Yundunov --- ChatQnA/docker_compose/amd/gpu/rocm/README.md | 2 +- ChatQnA/docker_compose/intel/hpu/gaudi/README.md | 6 +++--- ChatQnA/docker_compose/nvidia/gpu/README.md | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/ChatQnA/docker_compose/amd/gpu/rocm/README.md b/ChatQnA/docker_compose/amd/gpu/rocm/README.md 
index 9e18d0f61e..9ef30d2a16 100644 --- a/ChatQnA/docker_compose/amd/gpu/rocm/README.md +++ b/ChatQnA/docker_compose/amd/gpu/rocm/README.md @@ -290,7 +290,7 @@ docker compose up -d Try the command below to check whether the TGI service is ready. ```bash - docker logs ${CONTAINER_ID} | grep Connected + docker logs chatqna-tgi-server | grep Connected ``` If the service is ready, you will get the response like below. diff --git a/ChatQnA/docker_compose/intel/hpu/gaudi/README.md b/ChatQnA/docker_compose/intel/hpu/gaudi/README.md index b083b3d403..9e2b5b5455 100644 --- a/ChatQnA/docker_compose/intel/hpu/gaudi/README.md +++ b/ChatQnA/docker_compose/intel/hpu/gaudi/README.md @@ -314,7 +314,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid Try the command below to check whether the LLM serving is ready. ```bash - docker logs tgi-service | grep Connected + docker logs tgi-gaudi-server | grep Connected ``` If the service is ready, you will get the response like below. @@ -327,7 +327,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid ```bash # TGI service - curl http://${host_ip}:9009/v1/chat/completions \ + curl http://${host_ip}:8005/v1/chat/completions \ -X POST \ -d '{"model": ${LLM_MODEL_ID}, "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \ -H 'Content-Type: application/json' @@ -335,7 +335,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid ```bash # vLLM Service - curl http://${host_ip}:9009/v1/chat/completions \ + curl http://${host_ip}:8007/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{"model": ${LLM_MODEL_ID}, "messages": [{"role": "user", "content": "What is Deep Learning?"}]}' ``` diff --git a/ChatQnA/docker_compose/nvidia/gpu/README.md b/ChatQnA/docker_compose/nvidia/gpu/README.md index 686ead52db..92b7a26e79 100644 --- a/ChatQnA/docker_compose/nvidia/gpu/README.md +++ b/ChatQnA/docker_compose/nvidia/gpu/README.md @@ -273,7 +273,7 @@ docker compose up -d Try the command below to check whether the TGI service is ready. ```bash - docker logs ${CONTAINER_ID} | grep Connected + docker logs tgi-server | grep Connected ``` If the service is ready, you will get the response like below. @@ -285,7 +285,7 @@ docker compose up -d Then try the `cURL` command below to validate TGI. ```bash - curl http://${host_ip}:9009/v1/chat/completions \ + curl http://${host_ip}:8008/v1/chat/completions \ -X POST \ -d '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \ -H 'Content-Type: application/json' From 280030c4be7dec5f9026fc119e26c07fe0d9dd40 Mon Sep 17 00:00:00 2001 From: ZePan110 Date: Mon, 25 Nov 2024 10:33:33 +0800 Subject: [PATCH 12/17] Limit the version of vllm to avoid dockers build failures. 
(#1183) Signed-off-by: ZePan110 Signed-off-by: Chingis Yundunov --- .github/workflows/_example-workflow.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/_example-workflow.yml b/.github/workflows/_example-workflow.yml index a86ac25929..b05d67eedc 100644 --- a/.github/workflows/_example-workflow.yml +++ b/.github/workflows/_example-workflow.yml @@ -75,7 +75,7 @@ jobs: docker_compose_path=${{ github.workspace }}/${{ inputs.example }}/docker_image_build/build.yaml if [[ $(grep -c "vllm:" ${docker_compose_path}) != 0 ]]; then git clone https://github.com/vllm-project/vllm.git - cd vllm && git rev-parse HEAD && cd ../ + cd vllm && git checkout 446c780 && cd ../ fi if [[ $(grep -c "vllm-gaudi:" ${docker_compose_path}) != 0 ]]; then git clone https://github.com/HabanaAI/vllm-fork.git From f87770c35e65888933ddc703803d72184a65bd35 Mon Sep 17 00:00:00 2001 From: chyundunovDatamonsters Date: Tue, 17 Dec 2024 11:57:55 +0700 Subject: [PATCH 13/17] Update set_env.sh Delete sensitive data from set envs script Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/set_env.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Translation/docker_compose/amd/gpu/rocm/set_env.sh b/Translation/docker_compose/amd/gpu/rocm/set_env.sh index 8e33e61123..9efa6f3ee3 100644 --- a/Translation/docker_compose/amd/gpu/rocm/set_env.sh +++ b/Translation/docker_compose/amd/gpu/rocm/set_env.sh @@ -5,11 +5,11 @@ # SPDX-License-Identifier: Apache-2.0 -export TRANSLATION_HOST_IP='192.165.1.21' -export TRANSLATION_EXTERNAL_HOST_IP='direct-supercomputer1.powerml.co' +export TRANSLATION_HOST_IP='' +export TRANSLATION_EXTERNAL_HOST_IP='' export TRANSLATION_LLM_MODEL_ID="haoranxu/ALMA-13B" export TRANSLATION_TGI_LLM_ENDPOINT="http://${TRANSLATION_HOST_IP}:8008" -export TRANSLATION_HUGGINGFACEHUB_API_TOKEN='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' +export TRANSLATION_HUGGINGFACEHUB_API_TOKEN='' export TRANSLATION_MEGA_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} export TRANSLATION_LLM_SERVICE_HOST_IP=${TRANSLATION_HOST_IP} export TRANSLATION_FRONTEND_SERVICE_IP=${TRANSLATION_HOST_IP} From 1a860b413aba3f1625249dda71cf226ef5c6335e Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Wed, 18 Dec 2024 10:09:34 +0700 Subject: [PATCH 14/17] DBQnA - fix README for Translation Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/Translation/docker_compose/amd/gpu/rocm/README.md b/Translation/docker_compose/amd/gpu/rocm/README.md index 5cff6dc36e..b9201ba571 100644 --- a/Translation/docker_compose/amd/gpu/rocm/README.md +++ b/Translation/docker_compose/amd/gpu/rocm/README.md @@ -1,5 +1,6 @@ # Build and deploy Translation Application on AMD GPU (ROCm) + ## Build images ### Build the LLM Docker Image From 4b7e8485456b4a9105518e1e07cb20da72bddd3b Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 18 Dec 2024 03:10:15 +0000 Subject: [PATCH 15/17] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- Translation/docker_compose/amd/gpu/rocm/README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/Translation/docker_compose/amd/gpu/rocm/README.md b/Translation/docker_compose/amd/gpu/rocm/README.md index b9201ba571..5cff6dc36e 100644 --- a/Translation/docker_compose/amd/gpu/rocm/README.md +++ b/Translation/docker_compose/amd/gpu/rocm/README.md @@ -1,6 +1,5 @@ # Build and deploy 
Translation Application on AMD GPU (ROCm) - ## Build images ### Build the LLM Docker Image From 47893e7dabcc3d4c520f75335b36747c04b0129f Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Wed, 18 Dec 2024 10:14:17 +0700 Subject: [PATCH 16/17] DBQnA - fix README for Translation Signed-off-by: Chingis Yundunov --- Translation/docker_compose/amd/gpu/rocm/README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/Translation/docker_compose/amd/gpu/rocm/README.md b/Translation/docker_compose/amd/gpu/rocm/README.md index b9201ba571..5cff6dc36e 100644 --- a/Translation/docker_compose/amd/gpu/rocm/README.md +++ b/Translation/docker_compose/amd/gpu/rocm/README.md @@ -1,6 +1,5 @@ # Build and deploy Translation Application on AMD GPU (ROCm) - ## Build images ### Build the LLM Docker Image From 1391c25fbb1c4d20ca34eea7631b99b710eb6989 Mon Sep 17 00:00:00 2001 From: Chingis Yundunov Date: Fri, 20 Dec 2024 14:26:16 +0700 Subject: [PATCH 17/17] Translation app - fix deploy on AMD Signed-off-by: Chingis Yundunov --- .github/workflows/_example-workflow.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/_example-workflow.yml b/.github/workflows/_example-workflow.yml index b05d67eedc..a86ac25929 100644 --- a/.github/workflows/_example-workflow.yml +++ b/.github/workflows/_example-workflow.yml @@ -75,7 +75,7 @@ jobs: docker_compose_path=${{ github.workspace }}/${{ inputs.example }}/docker_image_build/build.yaml if [[ $(grep -c "vllm:" ${docker_compose_path}) != 0 ]]; then git clone https://github.com/vllm-project/vllm.git - cd vllm && git checkout 446c780 && cd ../ + cd vllm && git rev-parse HEAD && cd ../ fi if [[ $(grep -c "vllm-gaudi:" ${docker_compose_path}) != 0 ]]; then git clone https://github.com/HabanaAI/vllm-fork.git
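
Note on the last two workflow changes above: [PATCH 12/17] pinned the vLLM checkout to commit 446c780 so Docker image builds would not break on upstream changes, while [PATCH 17/17] reverts to building from the tip of the default branch and only printing the commit hash for traceability. A minimal bash sketch of the two strategies is shown below; it is an illustration only, and the `VLLM_COMMIT` toggle is a hypothetical variable that the actual workflow does not define.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; VLLM_COMMIT is a hypothetical toggle, not a real workflow input.
set -euo pipefail

git clone https://github.com/vllm-project/vllm.git
cd vllm

if [[ -n "${VLLM_COMMIT:-}" ]]; then
    # PATCH 12/17 behaviour: pin to a known-good commit (e.g. 446c780)
    # so image builds stay reproducible when upstream moves.
    git checkout "${VLLM_COMMIT}"
else
    # PATCH 17/17 behaviour: build from the latest default branch and
    # only record which commit was used, for the CI logs.
    git rev-parse HEAD
fi

cd ../
```

Pinning trades freshness for reproducible builds; tracking HEAD picks up upstream fixes but can reintroduce the build failures the pin was added to avoid.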