We have some flows where we read from a `kafka` input, run the messages through an `avro` processor, and then write to an S3 bucket. We seem to be hitting a flat maximum throughput which we're reasonably sure isn't the S3 request rate limit. Our pipeline is as follows:
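Roughly the following, with the broker, topic, bucket, schema, and switch check shown here as placeholders rather than our exact values:

```yaml
input:
  kafka:
    addresses: [ "kafka:9092" ]   # placeholder broker
    topics: [ "events" ]          # placeholder topic
    batching:
      count: 10000                # doubling this to 20000 doubles our throughput

pipeline:
  processors:
    - avro:
        operator: to_json                     # decodes the binary Avro so the switch check can run
        schema_path: "file://./schema.avsc"   # placeholder schema

output:
  switch:
    cases:
      - check: this.keep == true  # placeholder Bloblang condition
        output:
          aws_s3:
            bucket: my-bucket     # placeholder bucket
            path: '${! uuid_v4() }.json'
      - output:
          drop: {}
```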
With this config we end up hitting an absolute maximum of 222 messages per second. If we double the input batching to 20000, we get a max of 444, which makes me think the limiting factor is the output. If the condition isn't met and the message goes to the drop output instead, it processes extremely quickly. We have also tried tuning the … We are running the latest Docker image that was published under … Is there anything obviously wrong?
Hey @dbason, I think having both input and output batching makes it difficult to reason about what's going on. Please try removing the input batching and moving the `avro` processor …
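In other words, something shaped like this, collapsing the switch for brevity (all values illustrative, not a recommendation of specific numbers):

```yaml
input:
  kafka:
    addresses: [ "kafka:9092" ]
    topics: [ "events" ]
    # no input-level batching; let the output decide batch boundaries

output:
  aws_s3:
    bucket: my-bucket
    max_in_flight: 64     # illustrative
    batching:
      count: 10000
      period: 1s          # flush partial batches rather than waiting indefinitely
```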
I've done some experimentation with this now. We couldn't move the `avro` processor, as we need it earlier in the pipeline to convert the messages from binary so we can run the Bloblang query on them in the switch. The key was the `checkpoint_limit` field: increasing it to `batch size * max_in_flight` increased the performance by several orders of magnitude. Thank you so much for the advice!
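Concretely, the change looked something like this (the batch size matches our config above; the output `max_in_flight` of 64 is an assumed value for illustration):

```yaml
input:
  kafka:
    addresses: [ "kafka:9092" ]
    topics: [ "events" ]
    # checkpoint_limit caps how many messages of the same partition can be
    # in flight (unacknowledged) before the input applies back pressure.
    # With a batch size of 10000 and an output max_in_flight of 64, it needs
    # to be at least 10000 * 64 for the output to stay saturated.
    checkpoint_limit: 640000
    batching:
      count: 10000
```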