Copy-paste microservices-connector changes from EnterpriseRAG (only for discussion purposes) #644

Draft

ppalucki wants to merge 1 commit into main from ppalucki-gmcrouter-telemetry-changes

Conversation

@ppalucki commented Dec 16, 2024

Description

This PR is for discussion purposes only: upstreaming changes from EnterpriseRAG to the opea/GenAIInfra microservices-connector.

It is not expected (at the current stage) to compile or work until we decide how much of the available functionality we want to backport, and in which form.

The PR includes changes only from the following directories:

Please note that the PR excludes:

  • all manifest YAMLs from the following directories:

    • /config/manifests/ - changes were made on both sides
    • /config/samples/ - changes were made on both sides
    • /helm/manifests_common/
  • changes in the docs README.md / user_guide.md

  • Go dependencies in go.mod/go.sum

Description of changes:

Note: when following the links, please click "Load diff", because main.go has over 500 changed lines and GitHub hides large diffs by default.


1. Telemetry

Metrics were proposed previously in #296, but this version adds a lot of new functionality (logs and traces included).

Main changes in:

Telemetry known issues:

  • metric names are misleading (they do not follow AI conventions) and "first token latency" doesn't reflect the "real" latency before the user gets output (full description here)
  • enabling log collection through the OpenTelemetry collector steals logs from stdout/stderr (a limitation of the logging adapter used for OpenTelemetry)
  • disabling OTEL_TRACES does not disable the tracing "wrappers" completely (it just doesn't send them anywhere) - a minor optimization opportunity
  • first/next token latency metric/trace attributes are based on "first bytes read" latency, with no logic to parse those bytes coming off the socket from the last component in the pipeline; this can be misleading when deployed for a dataprep pipeline or for a component that does not stream (when an output guardrail is enabled) - the values aren't wrong, but the name misleadingly suggests that a "token" is somehow detected, when it is really just the "packet of bytes" we received (see the sketch below)
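
To make the last point concrete, here is a minimal sketch of the measurement pattern (a hypothetical helper, not the actual router code): the timestamp is taken when the first chunk of bytes arrives on the socket, so nothing guarantees those bytes correspond to a token.

package telemetrysketch

import (
	"context"
	"net/http"
	"time"

	"go.opentelemetry.io/otel/metric"
)

// recordFirstChunk illustrates the known issue: "first token latency" is
// really the time from requestStart until the first bytes are read from
// the upstream response, whether or not those bytes contain a token.
func recordFirstChunk(ctx context.Context, requestStart time.Time, resp *http.Response, hist metric.Float64Histogram) {
	buf := make([]byte, 4096)
	// Read returns as soon as any bytes arrive; no token parsing happens here.
	if n, err := resp.Body.Read(buf); err == nil && n > 0 {
		hist.Record(ctx, float64(time.Since(requestStart).Milliseconds()))
	}
}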

2. Router logic changes

Handling the guard-specific 466 HTTP status code to stop pipeline processing while still returning the result body.

3. GMC operator: a more elaborate "string" description of the GMC object status

4. New EnterpriseRAG components

TODO (Jakub Piasecki)

5. Other

TODO (Jakub Piasecki)

@ppalucki force-pushed the ppalucki-gmcrouter-telemetry-changes branch from 95e9841 to de22d7b on December 16, 2024, 17:34
@mkbhanda (Collaborator) commented:

@irisdingbj if you get a chance, please follow this PR at a high level and comment. Note this PR is meant to give an idea; it is a work in progress and not expected to compile.

@poussa (Collaborator) commented Dec 17, 2024

@ppalucki thanks for the PR. I think the telemetry is a good, albeit large, addition to GMC. I will try the GMC telemetry in the eRAG repo to gain more insights.

@irisdingbj (Collaborator) commented:

> @irisdingbj if you get a chance, please follow this PR at a high level and comment. Note this PR is meant to give an idea; it is a work in progress and not expected to compile.

These are all good improvements for GMC, thanks!

Comment on lines +20 to +30
common_tag: &tag "ts1734346962"
common_repository: &repo "localhost:5000"

llm_model: &cpu_model "Intel/neural-chat-7b-v3-3"
llm_model_gaudi: &hpu_model "mistralai/Mixtral-8x7B-Instruct-v0.1"

images:
  gmcManager:
    image: "opea/gmcmanager"
    repository: *repo
    tag: *tag
Contributor:
Is this &tag and *tag syntax something supported by Helm, or something requiring pre-processing before it's fed to Helm?


Yes, it's Helm's native variable reference syntax

Contributor:

Hm. I did not find any mention of it in the Helm docs. According to this, it's not Helm but YAML syntax: https://stackoverflow.com/questions/75600964/what-does-and-denote-in-a-helm-template
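
For what it's worth, anchors (&) and aliases (*) are resolved by the YAML parser itself, before Helm's templating ever sees the values. A small sketch using gopkg.in/yaml.v3 (chosen here purely for illustration) shows the alias already expanded after parsing:

package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

const values = `
common_tag: &tag "ts1734346962"
images:
  gmcManager:
    image: "opea/gmcmanager"
    tag: *tag
`

func main() {
	var out map[string]any
	if err := yaml.Unmarshal([]byte(values), &out); err != nil {
		panic(err)
	}
	// The alias was already expanded during parsing; no Helm
	// pre-processing is involved.
	images := out["images"].(map[string]any)
	gmc := images["gmcManager"].(map[string]any)
	fmt.Println(gmc["tag"]) // ts1734346962
}

Note that anchors only resolve within a single YAML document, which is why they work inside one values.yaml but cannot reference anchors defined in another file.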

@@ -38,23 +139,22 @@ podSecurityContext: {}
securityContext:
capabilities:
drop:
- ALL
- ALL
Contributor:
typo?

Comment on lines +188 to +200
pvc:
  - name: model-volume-embedding
    accessMode: ReadWriteOnce
    namespace: chatqa
    storage: 20Gi
  - name: model-volume-embedding
    accessMode: ReadWriteOnce
    namespace: dataprep
    storage: 20Gi
  - name: model-volume-llm
    accessMode: ReadWriteOnce
    namespace: chatqa
    storage: 100Gi
Contributor:
PV/PVC setup needs some documentation. Are there also separate PVs for each model, or are all these PVCs using the same PV?

Comment on lines +524 to +530
statusVerbose := "Not ready"
if deployment.Status.AvailableReplicas == *deployment.Spec.Replicas {
	readyCnt += 1
	statusVerbose = "Ready"
}
deploymentStatus.WriteString(fmt.Sprintf("%s; Replicas: %d desired | %d updated | %d total | %d available | %d unavailable\n",
	statusVerbose,
Contributor:
When HPA is used, deployments should not specify replica counts.

I.e. GMC should not be setting replica counts unless it's also responsible for scaling the services.
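
If GMC only needs a readiness signal, one alternative (a minimal sketch, not the actual GMC code) is to derive it from the Deployment's Available condition, which stays valid whether replicas are set by GMC or by an HPA:

package statussketch

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// isAvailable derives readiness from the Available condition, which the
// deployment controller maintains regardless of who owns .spec.replicas.
func isAvailable(d *appsv1.Deployment) bool {
	for _, c := range d.Status.Conditions {
		if c.Type == appsv1.DeploymentAvailable {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}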

provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
otel.SetMeterProvider(provider)

// ppalucki: Own metrics defintion bellow
Contributor:
Suggested change
-// ppalucki: Own metrics defintion bellow
+// ppalucki: Own metrics definition below

Comment on lines +149 to +153
metric.WithDescription("Measures the duration of generating all but first tokens."),
api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register next token histogram measure")
Contributor:
A similar improvement could be applied to all metric descriptions/errors:

Suggested change
-metric.WithDescription("Measures the duration of generating all but first tokens."),
-api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
-)
-if err != nil {
-	log.Error(err, "metrics: cannot register next token histogram measure")
+metric.WithDescription("Duration of generating all but the first tokens."),
+api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
+)
+if err != nil {
+	log.Error(err, "metrics: cannot register next token duration histogram")

Comment on lines +157 to +159
"llm.all.token.latency",
metric.WithUnit("ms"),
metric.WithDescription("Measures the duration to generate response with all tokens."),
@eero-t (Contributor) commented Dec 18, 2024:

I think a more established name for this metric would be "llm.request.latency"?

Author:

I agree completely :) please check my comment here: #296 (comment)
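
For concreteness, a sketch of what the rename could look like at the registration site (mirroring the style of the quoted code, with the meter and logger passed in; this is not the exact source):

package metricsketch

import (
	"github.com/go-logr/logr"
	"go.opentelemetry.io/otel/metric"
)

// registerRequestLatency shows the suggested rename at the registration site.
func registerRequestLatency(meter metric.Meter, log logr.Logger) metric.Float64Histogram {
	hist, err := meter.Float64Histogram(
		"llm.request.latency",
		metric.WithUnit("ms"),
		metric.WithDescription("Duration to generate the complete response, all tokens included."),
	)
	if err != nil {
		log.Error(err, "metrics: cannot register request latency histogram")
	}
	return hist
}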

// Stop the execution of sequence right away if step is a hard dependency and is unsuccessful
otlpr.WithContext(log, ctx).Info("This step is a hard dependency and it is unsuccessful. Stop pipeline execution.", "stepName", step.StepName, "statusCode", statusCode)
// err is nil here, so we cannot record any details about this unsuccesful response without parsing the responseBody.
Contributor:

Suggested change
-// err is nil here, so we cannot record any details about this unsuccesful response without parsing the responseBody.
+// err is nil here, so we cannot record any details about this unsuccessful response without parsing the responseBody

}
return
}

// Close span if there was not err and not guardarils were activated
Contributor:

Suggested change
-// Close span if there was not err and not guardarils were activated
+// Close span if there was not err, and no guardrails were activated

Comment on lines -744 to -747
// create a handler to handle traffic to /ui
// if the payload is empty, redirect to ui service
// if there's payload, format the payload and redirect to backend service
func mcUiHandler(w http.ResponseWriter, req *http.Request) {
Contributor:

> Router logic changes
>
> Handling the guard-specific 466 HTTP status code to stop pipeline processing while still returning the result body

Um, what about the removal of this large UI handling function, and the next asset handling one?

@eero-t (Contributor) commented Dec 18, 2024

In general the changes look IMHO quite OK, but I'm not familiar (at all) with the current GMC code.

> First/next token latency metric/trace attributes are based on "first bytes read" latency, with no logic to parse those bytes coming off the socket from the last component in the pipeline; this can be misleading when deployed for a dataprep pipeline or for a component that does not stream (when an output guardrail is enabled) - the values aren't wrong, but the name misleadingly suggests that a "token" is somehow detected, when it is really just the "packet of bytes" we received

Megaservice handles streaming and non-streaming requests separately. Token metrics are reported only for streaming (LLM) requests. Couldn't GMC do the same / is this some future TODO?
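
For illustration, the gating could be as simple as checking whether the upstream response actually streams; a sketch under the assumption that streamed responses are served as SSE (this is not the Megaservice implementation):

package routersketch

import (
	"net/http"
	"strings"
)

// isStreaming guesses whether an upstream response is a token stream;
// first/next-token metrics would be recorded only when this returns true.
func isStreaming(resp *http.Response) bool {
	return strings.HasPrefix(resp.Header.Get("Content-Type"), "text/event-stream")
}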
