Kafka: Scaling producers and consumers


Going Deeper

Once you get further into producer tuning, the configurations start to get more interrelated, with some important non-linear and sometimes unexpected effects on performance. It pays to be extra patient – and scientific – about combinations of different parameters. Remember, you should be continually going back to understanding the root bottleneck while keeping an eye on optimizing the rate of records flowing through the producer.

The next questions to ask are: How big are your records (as Kafka sees them, not as you think they are)? Are you making "good" batches?

The maximum size of a batch is set by the batch.size configuration: once a batch reaches that number of bytes, the producer sends it to the brokers, regardless of the linger.ms value. A request sent to a broker can contain multiple batches, one for each partition.
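As a rough mental model of the rule above, the following Python sketch treats a batch as ready to send when it has either filled up to batch.size or waited for linger.ms. The function name and the sample values are made up for illustration, and the real producer's logic is more involved than this.

```python
def batch_ready(batch_bytes: int, waited_ms: float,
                batch_size: int, linger_ms: float) -> bool:
    """Simplified model: send when the batch is full OR has lingered long enough."""
    is_full = batch_bytes >= batch_size
    lingered = waited_ms >= linger_ms
    return is_full or lingered


# Hypothetical settings: batch.size=16384 bytes, linger.ms=50
print(batch_ready(16384, 0, 16384, 50))   # full batch, sent without waiting
print(batch_ready(1000, 10, 16384, 50))   # small batch that is still lingering
print(batch_ready(1000, 50, 16384, 50))   # linger.ms elapsed, sent while small
```

If most of your batches go out through the second condition, they are leaving the producer well below batch.size.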

A few other things you need to check include the number of records per batch and their size. Here is where you can start really digging into the kafka.producer MBean. The batch-size-[avg|max] metrics can give you a good idea of the distribution of the number of bytes per batch, and record-size-[avg|max] can give you a sense of the size of each record. Divide batch-size-avg by record-size-avg and you have a rough number of records per batch. Compare the average batch size with the batch.size configuration to see how full your batches are, and estimate approximately how many records should be flowing through your producer. You should also sanity check this against the record-send-rate, the number of records per second, reported by your producer.
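The arithmetic described above is simple enough to sketch in a few lines of Python. The function names and the metric values here are invented for illustration; plug in whatever your own producer reports. If compression is enabled, the two size metrics may not be measured on the same basis, so treat the result as a rough estimate only.

```python
def records_per_batch(batch_size_avg: float, record_size_avg: float) -> float:
    """Rough records per batch: average batch bytes / average record bytes."""
    if record_size_avg <= 0:
        raise ValueError("record_size_avg must be positive")
    return batch_size_avg / record_size_avg


def batch_fill_ratio(batch_size_avg: float, batch_size_config: int) -> float:
    """How full the average batch is relative to the batch.size setting."""
    return batch_size_avg / batch_size_config


def batches_per_second(record_send_rate: float, per_batch: float) -> float:
    """Sanity check: records per second divided by records per batch."""
    return record_send_rate / per_batch


# Hypothetical readings: batch-size-avg=8192, record-size-avg=512,
# batch.size=16384, record-send-rate=4000
per_batch = records_per_batch(8192, 512)
print(per_batch)                            # 16.0
print(batch_fill_ratio(8192, 16384))        # 0.5
print(batches_per_second(4000, per_batch))  # 250.0
```

A fill ratio well below 1 suggests batches are being sent before they reach batch.size.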

You might be in for a surprise if you occasionally have very large messages, so check record-size-max: The max.request.size configuration limits the maximum size (in bytes) of a request and therefore inherently limits the number of record batches a request can carry, as well.
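To get a feel for how max.request.size caps the batches in a request, here is a back-of-the-envelope Python sketch. It ignores protocol overhead and assumes every batch is completely full, so the result is an upper bound at best; the function names and sample values are illustrative only.

```python
def max_full_batches_per_request(max_request_size: int, batch_size: int) -> int:
    """Upper bound on full batches per request, ignoring protocol overhead."""
    return max_request_size // batch_size


def largest_record_fits(record_size_max: float, max_request_size: int) -> bool:
    """True if the largest observed record is within the request size limit."""
    return record_size_max <= max_request_size


# Hypothetical settings: max.request.size=1048576 (1MB), batch.size=16384
print(max_full_batches_per_request(1048576, 16384))  # 64
# Hypothetical reading: record-size-max of 2,000,000 bytes
print(largest_record_fits(2_000_000, 1048576))       # False
```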

What about the time you are waiting for I/O? Check out the io-wait-ratio metric, the fraction of time the I/O thread spent waiting, to see where you are spending time. Is the I/O thread waiting or are your producers processing?

Next, you need to make sure that the client buffer is not getting filled. Each producer has a fairly large buffer to collect data that then gets batched and sent to the brokers. In practice, I have never seen this to be a problem, but it often pays to be methodical. Here, the metric buffer-available-bytes is your friend, allowing you to ensure that your buffer.memory size is not being exhausted by your record sizes, batching configurations from earlier, or both.
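A quick way to read buffer-available-bytes is to turn it into a utilization figure. The Python sketch below does just that; the function name, the sample reading, and the 90 percent alert threshold are arbitrary choices for illustration, not recommendations.

```python
def buffer_utilization(buffer_available_bytes: float, buffer_memory: int) -> float:
    """Fraction of buffer.memory currently in use (0.0 = empty, 1.0 = exhausted)."""
    if buffer_memory <= 0:
        raise ValueError("buffer_memory must be positive")
    return 1.0 - buffer_available_bytes / buffer_memory


# Hypothetical reading: 8MB still available out of a 32MB buffer.memory
used = buffer_utilization(8 * 1024 * 1024, 32 * 1024 * 1024)
print(used)  # 0.75

# Arbitrary illustrative threshold for flagging a nearly full buffer
if used > 0.9:
    print("buffer.memory is close to exhausted")
```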

Producing to too many different topics can affect the quality of compression, because you can't compress well across topics. In that case, you might need some application changes so that you can batch more aggressively per destination topic, rather than relying on Kafka to just do the right thing. Checking the bytes-per-topic metrics from the producer is an advanced tactic, so you should only consider it after benchmarking and making sure other adjustments are not helping.


The configurations and metrics to tweak on the producer to get high throughput are summarized in Table 1. At this point, you should have all the tools you need to scale up your client instances. You know the most important optimization switches, some guidelines for adjusting garbage collection, and the nice round robin trick for balancing consumer groups when the consumers encounter differently partitioned topics. For slow producers, apply the standard optimizations for compression and linger time, or dive into the depths of the producer configuration to find out what really happens to records and batches.

Table 1: Producer Tuning Summary

| Config/Metric | Comment |
| --- | --- |
| compression.type | Test on your data. |
| linger.ms | Check the average time a batch waits in the send buffer (how long it takes to fill a batch) with record-queue-time-avg. |
| batch.size | Determine records per batch, bytes per batch (batch-size-avg, batch-size-max), and records per topic per second (record-send-rate) and check your bytes per topic. |
| max.request.size | Limit the number and size of batches (record-size-max). |
| Time spent waiting for I/O | Are you really waiting (io-wait-ratio)? |
| buffer.memory + queued requests | 32MB default (roughly total memory by producer) allocated to buffer records for sending (see buffer-available-bytes). |

The tips in this article should give you a bit more guidance beyond the raw documentation in the Kafka manual on how to remove bottlenecks and get the performance you know you should be getting out of every part of your streaming data pipelines.

The Author

Jesse Yates is a Staff Engineer at Tesla and leads the development of real-time stream processing on the Data Platform Team. Before joining Tesla, Jesse created a real-time IoT data startup, helped Salesforce build the first enterprise-grade Apache HBase installation, and consulted on healthcare, defense, and cross-cloud analytic projects. He is also a committer on Apache HBase and a Project Management Committee member of Apache Phoenix, which are big data storage and query projects, respectively. You can follow his writing at http://jesseyates.com
