Move tei and embedding usvc to GenAIComps #1033

Closed
wants to merge 3 commits into opea-project:refactor_comps from yongfengdu:helm

Conversation

yongfengdu (Collaborator)

Description

Enable CIs equivalent to GenAIInfra.
Add tei/embedding-usvc to GenAIComps, aligned with the new directory design.
README updates.

Issues

GenAIInfra 623

Type of change

List the type of change, as below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

List any newly introduced 3rd-party dependencies, if they exist.

Tests

Describe the tests that you ran to verify your changes.

Collaborator:

This file should NOT be in git.

Collaborator (Author):

These 2 files were in GenAIInfra's .gitignore; will fix it.

Collaborator:

This file should NOT be in git.

dependencies:
  - name: tei
    version: 0-latest
    repository: file://../../../../3rd_parties/tei/deployment/kubernetes/helm-chart
Collaborator:

Should this point to an internal OCI repository?

Commenter:

Yes, OCI references. That way the user can consume the charts without the git tree.
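
For illustration, a Chart.yaml dependency pointing at the public OPEA registry could look like the sketch below; the version number is a placeholder, not taken from this PR:

# Hypothetical Chart.yaml excerpt: pulling tei from the public OCI registry
# instead of a file:// path (the version shown is a placeholder).
dependencies:
  - name: tei
    version: 1.0.0
    repository: oci://ghcr.io/opea-project/charts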

Collaborator (Author):

We shouldn't point this to an internal OCI repo (this chart can be used by external users who don't have access to internal CI).
The option is to use ghcr.io/opea-project/charts, but CI would then need to handle more things to test the latest changes while avoiding pollution of the published 0-latest charts:

  1. Maintain an internal registry.
  2. Push the changed helm chart to the internal registry, with a version number indicating the tested PR, for example 0-hash.
  3. Replace the ghcr.io reference with the internal registry in CI for the chart under test.
  4. Replace version: 0-latest with version: 0-hash.
  5. Clean up the newly pushed helm charts.

Ideally, we should use the same steps for docker images.

Using file:// within the same repo is a workaround to avoid such complex changes (see the sketch after this comment).
I can do this, but it involves more CI infra setup and takes more time.
Let's finalize the other parts first before making this change.
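
As a rough sketch of steps 2-4 above, the CI workflow could push the PR's chart under a 0-hash version and rewrite references before testing. The registry host, chart name, and chart path below are hypothetical placeholders, not taken from this PR:

# Hypothetical CI steps: push the chart under test to an internal registry
# and point dependent charts at it (all hosts and paths are placeholders).
steps:
  - name: Push chart under test to internal registry
    run: |
      VERSION="0-${GITHUB_SHA::8}"    # step 2: version encodes the tested commit
      helm package comps/embeddings/deployment/kubernetes/helm-chart \
        --version "$VERSION" -d /tmp/charts
      helm push /tmp/charts/embedding-usvc-"$VERSION".tgz \
        oci://internal-registry.example.com/charts
  - name: Point dependent charts at the internal registry
    run: |
      # step 3: swap the public ghcr.io reference for the internal registry
      find . -name Chart.yaml -exec sed -i \
        "s#oci://ghcr.io/opea-project/charts#oci://internal-registry.example.com/charts#g" {} \;
      # step 4: pin the dependency to the PR-specific version
      find . -name Chart.yaml -exec sed -i \
        "s#version: 0-latest#version: 0-${GITHUB_SHA::8}#g" {} \;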

Commenter:

What is the internal OCI repo? I only know of oci://ghcr.io/opea-project/charts, where we host the charts, and that is public.

echo "Use internal docker registry"
# insert a prefix before opea/.*, the prefix is OPEA_IMAGE_REPO
# TBD, use global options.(https://github.com/opea-project/GenAIInfra/issues/562)
find . -name '*values.yaml' -type f -exec sed -i "s#repository: opea/*#repository: ${OPEA_IMAGE_REPO}opea/#g" {} \;
Collaborator:

If a PR changes the python code and the helm chart simultaneously, our helm CI can't test the python code changes, because the image won't be in the internal OPEA_IMAGE_REPO until the PR is merged. We should follow the docker compose CI approach to test this.

Collaborator (Author):

Yes, this is an issue.
The docker compose test builds all required images before deployment, and it runs the build and the deploy on the same host.
For the helm chart, however, the runner is dispatched to the k8s master node (k8s-master) while the workload actually runs on the workers (k8s-worker1). So we'll have to either 1) leverage an internal docker registry, or 2) move the runner to k8s-worker1.
With 1), we'll need the setup steps mentioned above.
With 2), there should be one build.yaml for each component to build all required images for that component (e.g. comps/embedding/docker_image_build/build.yaml); see the sketch below. That can't be accomplished in this PR.
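
For reference, a per-component build file in the docker compose style might look roughly like this; the service name, build context, and Dockerfile path are assumptions for illustration, not taken from this PR:

# Hypothetical comps/embedding/docker_image_build/build.yaml sketch,
# following the docker compose build-file convention.
services:
  embedding:
    build:
      context: ../../..                 # repo root (assumed layout)
      dockerfile: comps/embedding/src/Dockerfile
    image: ${REGISTRY:-opea}/embedding:${TAG:-latest}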

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

name: Helm chart test for components
Collaborator:

Better to change it to 'helm test' to save display length in the final Actions GUI.

Collaborator (Author):

Will do

if: always() && ${{ needs.job1.outputs.run_matrix.workload.length }} > 0
uses: ./.github/workflows/_helm-test.yaml
strategy:
  matrix: ${{ fromJSON(needs.job1.outputs.run_matrix) }}
Collaborator:

We can add a name to display a short version of the chart_dir under test; the full path is too long for the reviewer to figure out (see the sketch below). Please check out https://futurestud.io/tutorials/github-actions-customize-the-job-name
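
For instance, something like the following would surface a short per-matrix-entry name in the Actions GUI; the name format is illustrative, and the workload key is assumed from the run_matrix reference above:

# Hypothetical name customization for the matrix job; also uses the shorter
# 'helm test' label suggested earlier in this review.
test:
  name: "helm test (${{ matrix.workload }})"
  if: always() && ${{ needs.job1.outputs.run_matrix.workload.length }} > 0
  uses: ./.github/workflows/_helm-test.yaml
  strategy:
    matrix: ${{ fromJSON(needs.job1.outputs.run_matrix) }}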

Collaborator (Author):

Will do.

type: string

env:
  CHARTS_LOCATION: "GenAIComps"
Collaborator:

Who is using this variable?

Collaborator (Author):

This is inherited from the previous script; I will clean it up.

Include helm lint and e2e test.
Still use images from internal registry.
Local image build needs more effort.

Signed-off-by: Dolpher Du <[email protected]>
poussa commented Dec 13, 2024

Why are we burying the helm charts deep in the directory tree? Typically, the helm-charts directory lives at the top level and collects all the charts under it. We should follow best practices and not invent something new that is foreign to most people.

Something like this.

GenAIComps
├── README.md
├── comps
└── helm-charts
    ├── README.md
    ├── embeddings
    │   ├── Chart.yaml
    │   ├── README.md
    │   ├── templates
    │   └── values.yaml
    ├── nginx
    │   ├── Chart.yaml
    │   ├── README.md
    │   ├── templates
    │   └── values.yaml
    ├── tei
    │   ├── Chart.yaml
    │   ├── README.md
    │   ├── templates
    │   └── values.yaml
    ├── tgi
    │   ├── Chart.yaml
    │   ├── README.md
    │   ├── templates
    │   └── values.yaml
    └── vllm
        ├── Chart.yaml
        ├── README.md
        ├── templates
        └── values.yaml

poussa commented Dec 13, 2024

All the values.yaml files should have sensible defaults for CPU and memory requests. That way the pod will be in the Kubernetes Burstable QoS class rather than BestEffort, and Burstable QoS class pods get better performance.

We can use defaults like this:

  resources:
    requests:
      cpu: 1
      memory: "100Mi"

See more: Configure Quality of Service for Pods
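
A minimal values.yaml sketch along these lines, with a note on why it lands in the Burstable class (the Mi suffix is the Kubernetes spelling of mebibytes; the values are illustrative):

resources:
  requests:
    cpu: 1
    memory: "100Mi"
  # Requests without matching limits put the pod in the Burstable QoS class.
  # Limits equal to requests would make it Guaranteed; no requests or limits
  # at all would leave it BestEffort.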

Signed-off-by: Dolpher Du <[email protected]>
Signed-off-by: Dolpher Du <[email protected]>
yongfengdu (Collaborator Author)

Will track this under this issue: opea-project/GenAIInfra#643

(quoting poussa's resource-defaults comment above)

yongfengdu (Collaborator Author)

This was the same as my original proposal (it also makes CI/release work easier), but it was rejected because it goes against the overall directory design of GenAIComps (helm chart and docker compose are both deployment methods and should live in the same directory hierarchy).
See yongfengdu@044eb9f

I see there are some further discussions about a more general repo layout, but I haven't seen any final decision yet. I can change this accordingly once a decision is made.

(quoting poussa's helm-charts layout comment above)

chensuyue deleted the branch opea-project:refactor_comps on January 2, 2025
chensuyue closed this on January 2, 2025
yongfengdu deleted the helm branch on January 20, 2025