
Multiple producers, one producer per topic, each with different reliability, performance #1032

Closed
9 tasks
blandings opened this issue Feb 1, 2017 · 5 comments

Comments

@blandings

Description

This is a design/architecture question. I have a messaging system where topics are expected to come and go. Some topics produce messages at a high rate, some are relatively slow. Similarly, for some topics, message loss is tolerated, while for others, it is not.

I am confused about some of the configuration options provided for availability and reliability. Some seem to apply to a producer (rd_kafka_t), whereas some apply to a topic (rd_kafka_topic_t).

My understanding is that I can set acks = 0, ... , all for a topic. That takes care of reliability.

For performance, as per your documentation in Introduction.md, you have settings of "batch.num.messages=10000 and queue.buffering.max.ms=1000" for high throughput. And you state that these can be set on a topic_conf basis. However, I am not able to set the queue.buffering.max.ms on a topic_conf object. I get a 'rdKafkaErr: No such configuration property: "queue.buffering.max.ms"' error.

Also, for low latency, Introduction.md says that "Setting queue.buffering.max.ms to 1 will make sure messages are sent as soon as possible." Again, I cannot set this property on the topic_conf object.

Are batch.num.messages and queue.buffering.max.ms properties of a topic, or of a producer?

Is it possible for me to have multiple producers for a topic? How might I design for scalability if there is a firehose of messages for a topic?

How might I design for different reliability and performance profiles on a per topic basis? Should I have one producer(rd_kafka_t), but different topic_conf objects?

Thanks.

How to reproduce

Checklist

Please provide the following information:

  • librdkafka version (release number or git tag):
  • Apache Kafka version:
  • librdkafka client configuration:
  • Operating system:
  • Using the legacy Consumer
  • Using the high-level KafkaConsumer
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue
@edenhill
Contributor

edenhill commented Feb 1, 2017

"batch.num.messages=10000 and queue.buffering.max.ms=1000" ... And you state that these can be set on a topic_conf basis.

I do? :) Where? Those are global producer config properties, not topic properties, as indicated in CONFIGURATION.md.

You typically don't have to adjust batch.num.messages; there are no adverse effects from big batches, and in fact they increase the chance of a good compression ratio (if you're using compression).
Adjusting queue.buffering.max.ms to your needs should be enough, and yes, it applies to all topics, but I don't see that as a problem.

For high-throughput topics it is likely that a number of messages will accumulate in the local queue even with a low queue.buffering.max.ms, so they will still get the benefit of larger batches.

Since each producer instance comes with a bunch of internal threads, you should typically try to reuse an existing instance, but that's obviously not possible if you need different configs for the same topic (which is a pretty slim use-case).

librdkafka is generally very fast and should be able to keep up with pretty much anything you throw at it (within reason, whatever that means!).
So I suggest starting out with a single producer instance with the default configuration, see what kind of performance, latency, etc. you get, and go from there.
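That starting point might look like the following minimal sketch: one producer handle with a low global queue.buffering.max.ms, and a topic handle with the default topic config. The broker address and topic name are placeholders, and most error handling is trimmed for brevity (this assumes librdkafka is installed).

```c
#include <stdio.h>
#include <string.h>
#include <librdkafka/rdkafka.h>

int main(void) {
    char errstr[512];

    /* Global (rd_kafka_t-level) config: applies to every topic
     * produced through this handle. */
    rd_kafka_conf_t *conf = rd_kafka_conf_new();
    rd_kafka_conf_set(conf, "bootstrap.servers", "localhost:9092",
                      errstr, sizeof(errstr));
    /* Lowest latency any topic needs; high-throughput topics will
     * still accumulate messages and form larger batches naturally. */
    rd_kafka_conf_set(conf, "queue.buffering.max.ms", "1",
                      errstr, sizeof(errstr));

    rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                                  errstr, sizeof(errstr));
    if (!rk) {
        fprintf(stderr, "rd_kafka_new failed: %s\n", errstr);
        return 1;
    }

    /* Topic handle with the default topic config (NULL conf). */
    rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, "my_topic", NULL);

    const char *msg = "hello";
    rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA, RD_KAFKA_MSG_F_COPY,
                     (void *)msg, strlen(msg), NULL, 0, NULL);

    rd_kafka_flush(rk, 1000);      /* wait for outstanding deliveries */
    rd_kafka_topic_destroy(rkt);
    rd_kafka_destroy(rk);
    return 0;
}
```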

@blandings
Author

Thanks for your reply. Let me clarify: I do not require different configs for the same topic.

However, I do need different configs for different topics. By configs I mean settings for 'acks' (sync vs. async), latency, and throughput.

From your reply, it looks like I need to create a new rd_kafka_t object for every topic, because I might need to configure queue.buffering.max.ms to tweak latency on a per-topic basis, and this parameter applies to rd_kafka_t and not to rd_kafka_topic_t. Am I right?

I understand your comment about not needing to tweak the batch.num.messages.

Given that I have multiple topics, I maintain a cache of rd_kafka_topic_t objects - one object per topic - so I don't have to recreate the object when a message needs to be sent to that topic. The cache gets cleaned up periodically, or when I am explicitly told that a topic is deleted.

  1. Does the above workflow sound about right?
  2. Let's say for topic A I set queue.buffering.max.ms=1, and for topic B I set queue.buffering.max.ms=1000. Would the setting for one topic override the setting for the other, even though they derive from distinct rd_kafka_t objects? I think it should not. Am I right?

@edenhill
Contributor

edenhill commented Feb 1, 2017

I urge you to try a single rd_kafka_t instance with queue.buffering.max.ms set to the lowest value required by any of your topics and see what happens; it should really be okay, and it saves you from having multiple producer instances.

Caching rd_kafka_topic_t is good.

Topic config, as supplied to (the first call to) rd_kafka_topic_new() for a specific topic, is local to that topic for the remainder of that rd_kafka_t instance's lifetime.
That means that only the properties from conf1 will be used in the following example:

rd_kafka_topic_new(rk, "topic1", conf1);
rd_kafka_topic_new(rk, "topic1", conf2);

Topics are local to their rd_kafka_t instance and not shared between them in any way.
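Putting the two points together, per-topic reliability on a single producer can be sketched with a distinct topic_conf per topic, using the topic-level property request.required.acks. The topic names here are hypothetical, error handling is trimmed, and rk is assumed to be an existing rd_kafka_t producer handle.

```c
char errstr[512];

/* Topic where message loss is tolerated: don't wait for acks. */
rd_kafka_topic_conf_t *tconf_fast = rd_kafka_topic_conf_new();
rd_kafka_topic_conf_set(tconf_fast, "request.required.acks", "0",
                        errstr, sizeof(errstr));
rd_kafka_topic_t *rkt_fast = rd_kafka_topic_new(rk, "metrics", tconf_fast);

/* Topic where loss is not tolerated: wait for all in-sync replicas. */
rd_kafka_topic_conf_t *tconf_safe = rd_kafka_topic_conf_new();
rd_kafka_topic_conf_set(tconf_safe, "request.required.acks", "-1",
                        errstr, sizeof(errstr));
rd_kafka_topic_t *rkt_safe = rd_kafka_topic_new(rk, "payments", tconf_safe);
```

Both handles share the producer's global latency/batching settings (queue.buffering.max.ms, batch.num.messages) while differing in acks.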

@blandings
Author

Excellent! Thanks for your advice and clarity. This clears up a lot of design points.

@Sam-sad-Sajid

Hi 👋
This is an interesting thread. I opened an issue in confluent-kafka-go where if I use one producer client to publish to multiple topics, I found that per topic configurations are not respected. Details are in this issue: confluentinc/confluent-kafka-go#1310 (comment)

@edenhill, would love to get your thoughts on this
