Respecting knative-serving concurrency in knative-eventing #15792

Open
tikr7 opened this issue Feb 26, 2025 · 0 comments
Labels
kind/feature Well-understood/specified features, ready for coding.

Comments


tikr7 commented Feb 26, 2025

The Knative stack (Eventing and Serving) is pretty awesome. Overall it makes the infrastructure easy to use for our devs and ML engineers. On Kubernetes I don't want to miss it anymore. It has advanced scaling capabilities based on concurrency. It handles scaling very well in the following two scenarios:

  • a microservice that answers very quickly (milliseconds)
  • long-running jobs that take more than a couple of minutes, using the JobSink approach

The scenarios in between, taking multiple seconds up to a minute, seem to be a blind spot, especially when combining Knative Eventing and Serving.

Possible configurations

First of all, here are a couple of (central) configurations and behaviors (if something is wrong or there is anything else, please let me know!):

  • kubectl get cm -n knative-eventing config-kafka-broker-data-plane -o yaml
max.poll.records=50

This is roughly how many messages are pulled in parallel per partition.
It is a central configuration and is set at Kubernetes cluster level (more specifically, at the level of the knative-eventing Kafka installation).
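
For reference, a minimal sketch of what that ConfigMap might contain; the exact data key (e.g. config-kafka-broker-consumer.properties) can differ between releases, so treat the key name as an assumption:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-kafka-broker-data-plane
  namespace: knative-eventing
data:
  # Kafka consumer properties used by the broker data plane (key name may vary by release)
  config-kafka-broker-consumer.properties: |
    max.poll.records=50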

  • kubectl get cm -n knative-eventing kafka-broker-config -o yaml
default.topic.partitions: "10"

The partitions act like a multiplier: max.poll.records x default.topic.partitions = 50 x 10 = 500 messages are pulled more or less in parallel.
This configuration is on Broker level but requires at least one additional kafka-broker-config-* ConfigMap.
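
As a sketch (the ConfigMap name and bootstrap server are placeholders), a per-Broker configuration looks roughly like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-broker-config-example   # an additional kafka-broker-config-* ConfigMap
  namespace: knative-eventing
data:
  default.topic.partitions: "10"
  default.topic.replication.factor: "3"
  bootstrap.servers: "my-cluster-kafka-bootstrap.kafka:9092"
---
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: example-broker
  annotations:
    eventing.knative.dev/broker.class: Kafka
spec:
  config:
    apiVersion: v1
    kind: ConfigMap
    name: kafka-broker-config-example
    namespace: knative-eventing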

  • kubectl get trigger <example-trigger> -o yaml
metadata:
  annotations:
    kafka.eventing.knative.dev/delivery.order: ordered

Another possibility to restrict traffic is ordered delivery. "An ordered consumer is a per-partition blocking consumer that waits for a successful response from the CloudEvent subscriber before it delivers the next message of the partition." This means that with 1 partition it handles only 1 message at a time, and with 10 partitions it handles 10 messages at a time.
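
A full Trigger with ordered delivery would look roughly like this (broker and service names are placeholders):

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: example-trigger
  annotations:
    kafka.eventing.knative.dev/delivery.order: ordered
spec:
  broker: example-broker
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: example-service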

  • Another important variable is DeliverySpec.Timeout, which can be set on Broker and Trigger level (which makes it independent for every Sink/microservice). I could not find the default value, but I assume it is 30s (a sketch follows at the end of this list).
  delivery:
    timeout: PT30S

  • On the knative-serving side, the concurrency is configured per Service:
spec:
  template:
    spec:
      containerConcurrency: 50

Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time.
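
To make the timeout item concrete, a minimal sketch of a Trigger with an explicit delivery timeout (names are placeholders; depending on the Eventing version this field may sit behind a feature flag):

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: example-trigger
spec:
  broker: example-broker
  delivery:
    timeout: PT30S   # per-Trigger delivery timeout
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: example-service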

Background

I am working in a Python world with AI. Python is not good at doing things fast and concurrently. AI (we do a lot in the vision area with segmentation and so forth) needs a lot of processing time and is therefore slow. We also often face a tipping point: when a Python FastAPI/Flask service is overloaded, its throughput drops sharply. Hence, with a lower concurrency it can achieve the highest throughput.

Scenario and possible tweaks

Let's assume I have a microservice which can handle 2 messages in parallel and needs 10s for each message. Hence the throughput is around 12 messages per minute per replica. Keeping the default values of max.poll.records=50, default.topic.partitions: "10" and timeout: PT30S (unordered), the microservice would be overloaded immediately if we get 500 messages, and nothing gets processed correctly.
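
As a rough back-of-the-envelope calculation based on the numbers above:

  throughput per replica ≈ concurrency x (60s / processing time per message) = 2 x (60 / 10) = 12 messages/min
  messages pulled at once ≈ max.poll.records x default.topic.partitions = 50 x 10 = 500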

  • Now, we could set containerConcurrency: 2 (hard limit), which ensures that only 2 messages reach FastAPI/Flask at a time, giving the best performance, but this limit is enforced at "queue-proxy" level. The queue-proxy has to buffer all other messages and also produces timeouts after 30s (I could not even identify how to increase the timeout there). See the sketch after this list.
  • If we decreased max.poll.records to 2, it would throttle all other microservices in our Kubernetes cluster, including those that could handle a super huge throughput.
  • We could set a combination of default.topic.partitions: "10" (another number is possible) and kafka.eventing.knative.dev/delivery.order: ordered. But the partition configuration is on Broker level, which is quite static.
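
For illustration, the hard-limit variant from the first bullet would be a minimal sketch like this (service name and image are placeholders):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
spec:
  template:
    spec:
      containerConcurrency: 2   # hard limit: queue-proxy admits at most 2 concurrent requests per replica
      containers:
        - image: ghcr.io/example/example-service:latest   # placeholder image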

All this leads to timeouts, which trigger retries, which flood the system even more.

I know knative-eventing is in the end an abstraction over Kafka. But would it be possible to implement an intelligent mechanism so that knative-eventing respects and is aware of knative-serving concurrency, and only pulls as many messages out of Kafka as needed to utilize the microservices as well as possible (but not too much), while also scaling up more replicas if the demand requires it?

Expected behavior

So the expected behavior would be:

  • Let's assume we have 500 messages in the queue and only 1 replica which can handle only 2 concurrent requests; then it should not send more than 2 concurrent requests to that replica (hard limit)
  • Then it should scale up a couple of replicas accordingly to deal with the traffic

In my opinion it is the last missing puzzle piece for Knative in terms of scalability.

I am fully open to hop on a call for any discussion =)

tikr7 added the kind/feature label on Feb 26, 2025