-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Copy-paste microservices-connector changes from EnterpriseRAG (only for discussion purposes) #644
base: main
Are you sure you want to change the base?
Copy-paste microservices-connector changes from EnterpriseRAG (only for discussion purposes) #644
Conversation
95e9841
to
de22d7b
Compare
@irisdingbj if you get a chance please follow this PR at a high level and comment. Note this PR is to give an idea, is a work in progress, not expected to compile. |
@ppalucki thanks for the PR. I think the telemetry is a good, albeit large, addition to gmc. I will try the gmc telemetry in eRAG repo to gain more insights. |
These are all good improvements for GMC, Thanks! |
common_tag: &tag "ts1734346962" | ||
common_repository: &repo "localhost:5000" | ||
|
||
llm_model: &cpu_model "Intel/neural-chat-7b-v3-3" | ||
llm_model_gaudi: &hpu_model "mistralai/Mixtral-8x7B-Instruct-v0.1" | ||
|
||
images: | ||
gmcManager: | ||
image: "opea/gmcmanager" | ||
repository: *repo | ||
tag: *tag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this &tag
and *tag
syntax something supported by Helm, or something requiring pre-processing before it's fed to Helm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's Helm's native variable reference syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Did not find mention of it from Helm docs. According to this, it's not Helm, but YAML syntax: https://stackoverflow.com/questions/75600964/what-does-and-denote-in-a-helm-template
@@ -38,23 +139,22 @@ podSecurityContext: {} | |||
securityContext: | |||
capabilities: | |||
drop: | |||
- ALL | |||
- ALL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo?
pvc: | ||
- name: model-volume-embedding | ||
accessMode: ReadWriteOnce | ||
namespace: chatqa | ||
storage: 20Gi | ||
- name: model-volume-embedding | ||
accessMode: ReadWriteOnce | ||
namespace: dataprep | ||
storage: 20Gi | ||
- name: model-volume-llm | ||
accessMode: ReadWriteOnce | ||
namespace: chatqa | ||
storage: 100Gi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PV/PVC setup needs some documentation. Are there also separate PVs for each model, are are all these PVCs using same PV?
statusVerbose := "Not ready" | ||
if deployment.Status.AvailableReplicas == *deployment.Spec.Replicas { | ||
readyCnt += 1 | ||
statusVerbose = "Ready" | ||
} | ||
deploymentStatus.WriteString(fmt.Sprintf("%s; Replicas: %d desired | %d updated | %d total | %d available | %d unavailable\n", | ||
statusVerbose, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When HPA is used, deployments should not specify replica counts.
I.e. GMC should not be setting replica counts unless it's also responsible for scaling the services.
provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter)) | ||
otel.SetMeterProvider(provider) | ||
|
||
// ppalucki: Own metrics defintion bellow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ppalucki: Own metrics defintion bellow | |
// ppalucki: Own metrics definition below |
metric.WithDescription("Measures the duration of generating all but first tokens."), | ||
api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364), | ||
) | ||
if err != nil { | ||
log.Error(err, "metrics: cannot register next token histogram measure") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar improvement could be applied to all metric descriptions / errors:
metric.WithDescription("Measures the duration of generating all but first tokens."), | |
api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364), | |
) | |
if err != nil { | |
log.Error(err, "metrics: cannot register next token histogram measure") | |
metric.WithDescription("Duration of generating all but the first tokens."), | |
api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364), | |
) | |
if err != nil { | |
log.Error(err, "metrics: cannot register next token duration histogram") |
"llm.all.token.latency", | ||
metric.WithUnit("ms"), | ||
metric.WithDescription("Measures the duration to generate response with all tokens."), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think more established name for this metric would be "llm.request.latency"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I aggree completly :) please check my comment here: #296 (comment)
// Stop the execution of sequence right away if step is a hard dependency and is unsuccessful | ||
otlpr.WithContext(log, ctx).Info("This step is a hard dependency and it is unsuccessful. Stop pipeline execution.", "stepName", step.StepName, "statusCode", statusCode) | ||
// err is nil here, so we cannot record any details about this unsuccesful response without parsing the responseBody. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// err is nil here, so we cannot record any details about this unsuccesful response without parsing the responseBody. | |
// err is nil here, so we cannot record any details about this unsuccessful response without parsing the responseBody |
} | ||
return | ||
} | ||
|
||
// Close span if there was not err and not guardarils were activated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Close span if there was not err and not guardarils were activated | |
// Close span if there was not err, and no guardrails were activated |
// create a handler to handle traffic to /ui | ||
// if the payload is empty, redirect to ui service | ||
// if there's payload, format the payload and redirect to backend service | ||
func mcUiHandler(w http.ResponseWriter, req *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Router logic changes
Hanlding guard 466 specific http code to stop processing pipeline and still result body
Um, what about the removal of this large UI handling function, and the next asset handling one?
In general changes look IMHO quite OK, but I'm not familiar (at all) with the current GMC code.
Megaservice handles streaming and non-streaming requests separately. Token metrics are reported only for streaming (LLM) requests. Couldn't GMC do the same / is this some future TODO? |
Description
This PR is only for discussion purposes of upstreaming changes from EnterpriseRAG to opea/GenAIInfra microservices-connector.
It is not expected (at current stage) to compile/work unless we decide how much of available feature we want to backport and in which form.
PR includes only changes from directories:
Please note that, PR excludes:
all manifests yamls from following directories:
changes in docs README.md / user_guide.md
golang depdencies in go.mod/go.sum
Changes description:
Note when using links please click "load diff" because in main.go there is over 500 lines changed and github diff hides them by default
1. Telemetry
Metrics was proposed previously in #296, but now it has a lot of new things (logs/traces included)
Main changes in:
Telemetry know issues:
2. Router logic changes
3. GMC operator GMC object status more elaborate "string" description
4. New EnterpriseRAG components
TODO (Jakub Piasecki)
4. Other
TODO (Jakub Piasecki)