Multiple producers, one producer per topic, each with different reliability and performance #1032
Comments
I do? :) Where? Those are global producer config properties, not topic properties, as indicated in CONFIGURATION.md. You typically don't have to adjust batch.num.messages; there are no adverse effects from big batches, and in fact they increase the chance of a good compression ratio (if you're using compression). For high-throughput topics it is likely that a number of messages will accumulate in the local queue even with a low queue.buffering.max.ms, so they will still get the benefit of larger batches. Since each producer instance comes with a bunch of internal threads, you should typically try to reuse an existing instance, but that's obviously not possible if you need different configs for the same topic (which is a pretty slim use-case). librdkafka is generally very fast and should be able to keep up with pretty much anything you throw at it (within reason, whatever that means!).
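For illustration, a minimal sketch of setting those two global properties on the producer conf (the helper name is made up here, and the values are just the Introduction.md examples):

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

/* Sketch only: both properties belong on the global producer conf
 * (rd_kafka_conf_t), not on a topic conf. */
static rd_kafka_conf_t *make_producer_conf(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        if (rd_kafka_conf_set(conf, "batch.num.messages", "10000",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK ||
            rd_kafka_conf_set(conf, "queue.buffering.max.ms", "1000",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK) {
                fprintf(stderr, "conf: %s\n", errstr);
                rd_kafka_conf_destroy(conf);
                return NULL;
        }
        return conf;
}
```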
Thanks for your reply. Let me clarify: I do not require different configs for the same topic. However, I do need different configs for different topics. By configs I mean settings for 'acks' (sync vs. async), latency, and throughput. From your reply, it looks like I need to create a new rd_kafka_t object for every topic, because I might need to configure queue.buffering.max.ms to tweak the latency on a per-topic basis, and this parameter applies to rd_kafka_t and not to rd_kafka_topic_t. Am I right? I understand your comment about not needing to tweak batch.num.messages. Given that I have multiple topics, I maintain a cache of rd_kafka_topic_t objects - one object per topic - so I don't have to recreate the object when a message needs to be sent to that topic. The cache gets cleaned up periodically, or when I am explicitly told that a topic is deleted.
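Purely as an illustration of that caching design, a sketch of a per-topic handle cache (the helper name, fixed size, and lack of locking or eviction are simplifications, not part of librdkafka):

```c
#include <librdkafka/rdkafka.h>
#include <string.h>

#define MAX_TOPICS 128

/* One cached rd_kafka_topic_t handle per topic name. */
struct topic_cache_entry {
        char name[256];
        rd_kafka_topic_t *rkt;
};

static struct topic_cache_entry cache[MAX_TOPICS];
static int cache_cnt;

/* Return the cached handle for `name`, creating it on first use so that
 * rd_kafka_topic_new() is called at most once per topic. */
rd_kafka_topic_t *get_topic(rd_kafka_t *rk, const char *name) {
        for (int i = 0; i < cache_cnt; i++)
                if (!strcmp(cache[i].name, name))
                        return cache[i].rkt;

        if (cache_cnt == MAX_TOPICS)
                return NULL; /* cache full: real code would evict */

        rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, name, NULL);
        if (!rkt)
                return NULL;

        strncpy(cache[cache_cnt].name, name, sizeof(cache[cache_cnt].name) - 1);
        cache[cache_cnt].rkt = rkt;
        cache_cnt++;
        return rkt;
}
```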
I urge you to try a single rd_kafka_t instance with queue.buffering.max.ms set to the lowest value required by any of your topics and see what happens; it should really be okay and save you from having multiple producer instances. Caching rd_kafka_topic_t is good. Topic config, as supplied to (the first call to) rd_kafka_topic_new() for a specific topic, is local to that topic for the remainder of that rd_kafka_t instance's lifetime.
Topics are local to their rd_kafka_t instance and not shared between them in any way.
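A sketch of that layout, assuming a placeholder broker address and two hypothetical topics ("metrics" tolerates loss, "orders" does not): queue.buffering.max.ms lives on the single producer conf, while reliability is set per topic via request.required.acks on each topic conf.

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

int main(void) {
        char errstr[512];

        /* One producer instance: queue.buffering.max.ms set to the lowest
         * latency any topic needs (1 ms here, as a placeholder). */
        rd_kafka_conf_t *conf = rd_kafka_conf_new();
        rd_kafka_conf_set(conf, "bootstrap.servers", "localhost:9092",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "queue.buffering.max.ms", "1",
                          errstr, sizeof(errstr));

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                                      errstr, sizeof(errstr));
        if (!rk) {
                fprintf(stderr, "rd_kafka_new: %s\n", errstr);
                return 1;
        }

        /* Lossy topic: fire-and-forget. */
        rd_kafka_topic_conf_t *tc_lossy = rd_kafka_topic_conf_new();
        rd_kafka_topic_conf_set(tc_lossy, "request.required.acks", "0",
                                errstr, sizeof(errstr));
        rd_kafka_topic_t *t_lossy = rd_kafka_topic_new(rk, "metrics", tc_lossy);

        /* Reliable topic: wait for all in-sync replicas. */
        rd_kafka_topic_conf_t *tc_safe = rd_kafka_topic_conf_new();
        rd_kafka_topic_conf_set(tc_safe, "request.required.acks", "-1",
                                errstr, sizeof(errstr));
        rd_kafka_topic_t *t_safe = rd_kafka_topic_new(rk, "orders", tc_safe);

        /* ... produce with rd_kafka_produce() on either handle, and call
         * rd_kafka_poll() regularly to serve delivery reports ... */

        rd_kafka_topic_destroy(t_lossy);
        rd_kafka_topic_destroy(t_safe);
        rd_kafka_destroy(rk);
        return 0;
}
```

Note that delivery reports (and thus any per-message error handling for the reliable topic) are only delivered through the callbacks served by rd_kafka_poll().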
Excellent! Thanks for your advice and clarity. This clears up a lot of design points.
Hi 👋 @edenhill, would love to get your thoughts on this |
Description
This is a design/architecture question. I have a messaging system where topics are expected to come and go. Some topics produce messages at a high rate, some are relatively slow. Similarly, for some topics, message loss is tolerated, while for others, it is not.
I am confused about some of the configuration options provided for availability and reliability. Some seem to apply to a producer (rd_kafka_t), whereas some apply to a topic (rd_kafka_topic_t).
My understanding is that I can set acks = 0, ..., all for a topic. That takes care of reliability.
For performance, as per your documentation in Introduction.md, you suggest "batch.num.messages=10000" and "queue.buffering.max.ms=1000" for high throughput, and you state that these can be set on a topic_conf basis. However, I am not able to set queue.buffering.max.ms on a topic_conf object; I get a 'rdKafkaErr: No such configuration property: "queue.buffering.max.ms"' error.
Also, for low latency, Introduction.md says that "Setting queue.buffering.max.ms to 1 will make sure messages are sent as soon as possible." Again, I cannot set this property on the topic_conf object.
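To illustrate the scoping that produces this error (a sketch, not taken from Introduction.md): queue.buffering.max.ms is only known to the global conf, so rd_kafka_topic_conf_set() rejects it, whereas a topic-level property such as request.required.acks is accepted on the topic conf.

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

int main(void) {
        char errstr[512];
        rd_kafka_topic_conf_t *tconf = rd_kafka_topic_conf_new();

        /* Global property on a topic conf: expected to fail with
         * "No such configuration property". */
        if (rd_kafka_topic_conf_set(tconf, "queue.buffering.max.ms", "1",
                                    errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                printf("topic conf rejects it: %s\n", errstr);

        /* Topic-level property on a topic conf: accepted. */
        if (rd_kafka_topic_conf_set(tconf, "request.required.acks", "-1",
                                    errstr, sizeof(errstr)) == RD_KAFKA_CONF_OK)
                printf("request.required.acks accepted on topic conf\n");

        rd_kafka_topic_conf_destroy(tconf);
        return 0;
}
```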
Are batch.num.messages and queue.buffering.max.ms properties of a topic, or of a producer?
Is it possible for me to have multiple producers for a topic? How might I design for scalability if there is a firehose of messages for a topic?
How might I design for different reliability and performance profiles on a per-topic basis? Should I have one producer (rd_kafka_t), but different topic_conf objects?
Thanks.
How to reproduce
Checklist
Please provide the following information:
- Provide logs (with debug=.. as necessary) from librdkafka