Move tei and embedding usvc to GenAIComps #1033

Closed
wants to merge 3 commits into opea-project:refactor_comps from yongfengdu:helm

Conversation

yongfengdu (Collaborator)

Description

Enable CIs equivalent to GenAIInfra.
Add tei/embedding-usvc to GenAIComps, aligned with the new directory design.
README updates.

Issues

GenAIInfra 623

Type of change

List the type of change, as below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

List any newly introduced 3rd-party dependencies, if they exist.

Tests

Describe the tests that you ran to verify your changes.

Collaborator:

This file should NOT be in git.

Collaborator (Author):

These 2 files were in GenAIInfra's .gitignore; will fix it.

Collaborator:

This file should NOT be in git.

dependencies:
  - name: tei
    version: 0-latest
    repository: file://../../../../3rd_parties/tei/deployment/kubernetes/helm-chart
Collaborator:

Should this point to an internal OCI repository?

Commenter:

Yes, OCI references. That way the user can consume the charts without the git tree.
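
For illustration, a Chart.yaml dependency pointing at the public OPEA registry could look like the sketch below; the version number is a placeholder, not taken from this PR:

# Hypothetical Chart.yaml excerpt: pulling tei from the public OCI registry
# instead of a file:// path (the version shown is a placeholder).
dependencies:
  - name: tei
    version: 1.0.0
    repository: oci://ghcr.io/opea-project/charts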

Collaborator (Author):

We shouldn't point this to an internal OCI repo (this chart can be used by external users who don't have access to internal CI).
The option is to use ghcr.io/opea-project/charts, but CI would then need to handle more things to test the latest changes while avoiding pollution of the published 0-latest charts:

  1. Maintain an internal registry.
  2. Push the changed helm chart to the internal registry, with a version number indicating the tested PR, for example 0-hash.
  3. Replace the ghcr.io reference with the internal registry in CI for the chart under test.
  4. Replace version: 0-latest with version: 0-hash.
  5. Clean up the newly pushed helm charts.

Ideally, we should use the same steps for docker images.

Using file:// within the same repo is a workaround to avoid such complex changes (see the sketch after this comment).
I can do this, but it involves more CI infra setup and takes more time.
Let's finalize the other parts first before making this change.
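
As a rough sketch of steps 2-4 above, the CI workflow could push the PR's chart under a 0-hash version and rewrite references before testing. The registry host, chart name, and chart path below are hypothetical placeholders, not taken from this PR:

# Hypothetical CI steps: push the chart under test to an internal registry
# and point dependent charts at it (all hosts and paths are placeholders).
steps:
  - name: Push chart under test to internal registry
    run: |
      VERSION="0-${GITHUB_SHA::8}"    # step 2: version encodes the tested commit
      helm package comps/embeddings/deployment/kubernetes/helm-chart \
        --version "$VERSION" -d /tmp/charts
      helm push /tmp/charts/embedding-usvc-"$VERSION".tgz \
        oci://internal-registry.example.com/charts
  - name: Point dependent charts at the internal registry
    run: |
      # step 3: swap the public ghcr.io reference for the internal registry
      find . -name Chart.yaml -exec sed -i \
        "s#oci://ghcr.io/opea-project/charts#oci://internal-registry.example.com/charts#g" {} \;
      # step 4: pin the dependency to the PR-specific version
      find . -name Chart.yaml -exec sed -i \
        "s#version: 0-latest#version: 0-${GITHUB_SHA::8}#g" {} \;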

Commenter:

What is the internal OCI repo? I only know of oci://ghcr.io/opea-project/charts, where we host the charts, and that is public.

echo "Use internal docker registry"
# insert a prefix before opea/.*, the prefix is OPEA_IMAGE_REPO
# TBD, use global options.(https://github.com/opea-project/GenAIInfra/issues/562)
find . -name '*values.yaml' -type f -exec sed -i "s#repository: opea/*#repository: ${OPEA_IMAGE_REPO}opea/#g" {} \;
Collaborator:

If a PR changes the python code and the helm chart simultaneously, our helm CI can't test the python code changes, because the image won't be in the internal OPEA_IMAGE_REPO until the PR is merged. We should follow the docker compose CI approach to test this.

Collaborator (Author):

Yes, this is an issue.
The docker compose test builds all required images before deployment, and it runs the build and the deploy on the same host.
For the helm chart, however, the runner is dispatched to the k8s master node (k8s-master) while the workload actually runs on the workers (k8s-worker1). So we'll have to either 1) leverage an internal docker registry, or 2) move the runner to k8s-worker1.
With 1), we'll need the setup steps mentioned above.
With 2), there should be one build.yaml for each component to build all required images for that component (e.g. comps/embedding/docker_image_build/build.yaml); see the sketch below. That can't be accomplished in this PR.
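
For reference, a per-component build file in the docker compose style might look roughly like this; the service name, build context, and Dockerfile path are assumptions for illustration, not taken from this PR:

# Hypothetical comps/embedding/docker_image_build/build.yaml sketch,
# following the docker compose build-file convention.
services:
  embedding:
    build:
      context: ../../..                 # repo root (assumed layout)
      dockerfile: comps/embedding/src/Dockerfile
    image: ${REGISTRY:-opea}/embedding:${TAG:-latest}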

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

name: Helm chart test for components
Collaborator:

Better to change it to 'helm test' to save display length in the final Actions GUI.

Collaborator (Author):

Will do

if: always() && ${{ needs.job1.outputs.run_matrix.workload.length }} > 0
uses: ./.github/workflows/_helm-test.yaml
strategy:
  matrix: ${{ fromJSON(needs.job1.outputs.run_matrix) }}
Collaborator:

We can add a name to display a short version of the chart_dir under test; the full path is too long for the reviewer to figure out (see the sketch below). Please check out https://futurestud.io/tutorials/github-actions-customize-the-job-name
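
For instance, something like the following would surface a short per-matrix-entry name in the Actions GUI; the name format is illustrative, and the workload key is assumed from the run_matrix reference above:

# Hypothetical name customization for the matrix job; also uses the shorter
# 'helm test' label suggested earlier in this review.
test:
  name: "helm test (${{ matrix.workload }})"
  if: always() && ${{ needs.job1.outputs.run_matrix.workload.length }} > 0
  uses: ./.github/workflows/_helm-test.yaml
  strategy:
    matrix: ${{ fromJSON(needs.job1.outputs.run_matrix) }}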

Collaborator (Author):

Will do.

type: string

env:
  CHARTS_LOCATION: "GenAIComps"
Collaborator:

Who is using this variable?

Collaborator (Author):

This is inherited from the previous script; I will clean it up.

Include helm lint and e2e test.
Still use images from internal registry.
Local image build needs more effort.

Signed-off-by: Dolpher Du <[email protected]>
poussa commented Dec 13, 2024

Why are we burying the helm charts deep in the directory tree? Typically, the helm-charts directory lives at the top level and collects all the charts under it. We should follow best practices and not invent something new that is foreign to most people.

Something like this.

GenAIComps
├── README.md
├── comps
└── helm-charts
    ├── README.md
    ├── embeddings
    │   ├── Chart.yaml
    │   ├── README.md
    │   ├── templates
    │   └── values.yaml
    ├── nginx
    │   ├── Chart.yaml
    │   ├── README.md
    │   ├── templates
    │   └── values.yaml
    ├── tei
    │   ├── Chart.yaml
    │   ├── README.md
    │   ├── templates
    │   └── values.yaml
    ├── tgi
    │   ├── Chart.yaml
    │   ├── README.md
    │   ├── templates
    │   └── values.yaml
    └── vllm
        ├── Chart.yaml
        ├── README.md
        ├── templates
        └── values.yaml

poussa commented Dec 13, 2024

All the values.yaml files should have sensible defaults for CPU and memory requests. That way the pod will be in the Kubernetes Burstable QoS class rather than BestEffort, and Burstable QoS class pods get better performance.

We can use defaults like this:

  resources:
    requests:
      cpu: 1
      memory: "100Mi"

See more: Configure Quality of Service for Pods
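
A minimal values.yaml sketch along these lines, with a note on why it lands in the Burstable class (the Mi suffix is the Kubernetes spelling of mebibytes; the values are illustrative):

resources:
  requests:
    cpu: 1
    memory: "100Mi"
  # Requests without matching limits put the pod in the Burstable QoS class.
  # Limits equal to requests would make it Guaranteed; no requests or limits
  # at all would leave it BestEffort.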

Signed-off-by: Dolpher Du <[email protected]>
Signed-off-by: Dolpher Du <[email protected]>
yongfengdu (Collaborator Author)

Will track this under this issue: opea-project/GenAIInfra#643

(quoting poussa's resource-defaults comment above)

yongfengdu (Collaborator Author)

This was the same as my original proposal (it also makes CI/release work easier), but it was rejected because it goes against the overall directory design of GenAIComps (helm chart and docker compose are both deployment methods and should live in the same directory hierarchy).
See yongfengdu@044eb9f

I see there are some further discussions about a more general repo layout, but I haven't seen any final decision yet. I can change this accordingly once a decision is made.

(quoting poussa's helm-charts layout comment above)

chensuyue deleted the branch opea-project:refactor_comps on January 2, 2025
chensuyue closed this on January 2, 2025
yongfengdu deleted the helm branch on January 20, 2025