
Need for speed: 9 tips to supercharge Redpanda
Run your Redpanda at full throttle with our team’s top recommendations

Everyone knows Redpanda is fast, but how can you ensure you’re getting the most out of your clusters? Performance isn’t always as easy as flipping a switch — particularly if your architecture isn’t already set up for it. Much like upgrading a car, you need to tweak and tune different parts to get everything running together as smoothly as possible.
In this blog post, we run through nine checklist items to help ensure you experience the most streamlined real-time platform possible.
1. Build the right infrastructure
This is an obvious one, but always worth stating: infrastructure matters. Redpanda was designed to run on modern hardware, so clusters with older, slower components or that run out of resources (whether that’s CPU, memory, disk throughput, IOPS, or network bandwidth) won’t perform as well.
Recommendations:
- Deploy hardware that meets (and preferably exceeds) our minimum hardware requirements.
- Use local NVMe for storage rather than spinning disks or remote storage.
- Run the brokers on dedicated machines (whether bare metal, VM, or containers in K8s) — no noisy neighbors!
- Give Redpanda 95% of the available resources (always good practice to leave some room for the OS / Kubernetes host).
- Monitor your resource usage and be prepared to scale as your app grows.
Pro-tip: For folks who don’t want to manage infrastructure, there’s an easy button for performance: deploy on Redpanda Cloud! This is a fully managed offering, suitable for the smallest use cases (using our serverless clusters) right through to the largest deployments — without the hassle of managing a cluster yourself.
Now that you have a cluster up and running, performance is largely determined by application design and data architecture. (That one’s on you!)
2. Check if you need write caching
For self-managed clusters with slower storage media (SSDs, spinning disks, SAN, remote storage: anything other than locally attached NVMe), Redpanda's default behavior of performing an fsync after every message batch could be slowing you down. Ideally, you'd always deploy Redpanda on the latest and greatest hardware, but sometimes you have to work with what you already have, and that's where write caching shines.
Rather than the gold standard of performing an fsync after every message batch, write caching holds your data in broker memory until there's either enough data or enough time has passed, and then syncs to disk in a larger write. Your producer will still only receive an acknowledgement once your data is held in memory on a majority of brokers, as long as you use acks=all. However, you're definitely trading durability for performance, so make sure you've considered this.
Recommendations:
- When deploying on hardware that doesn't provide local NVMe storage, consider using write caching for the most demanding topics (see the sketch after this list).
- Always use acks=all when using write caching to ensure data is in memory on multiple brokers.
- Consider multi-AZ deployments when using write caching to reduce the impact of failures.
3. Watch out for skewed partitions
To cope with streaming workloads larger than a single broker can handle, topics are broken into partitions. But how those partitions are then used is down to your producers, rather than Redpanda.
If your application doesn’t use partitions evenly, it could start to bottleneck on skewed partitions — if not at the producers, then perhaps at the consumers, which can then lead to processing imbalances at the broker level.
Think of this as the data equivalent of Amdahl’s Law: data skew is the enemy of parallelization, limiting the benefits of scaling out by using more partitions. If 90% of your data goes through a single partition, then whether you have 10 partitions or 50 won’t really make a difference since that single overworked partition is your limiting factor.
Recommendations:
- Use the uniform-sticky partitioner whenever possible to balance writes over all partitions.
- Only use keyed partitioners when strictly necessary according to the application requirements (CDC use cases are a good example of this).
- If keys are essential, and you have the option, try to pick keys that have the highest cardinality (those with the highest number of distinct values) for a good chance of distributing the keys well over the number of partitions.
Read our guide to partition strategies
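To make the keyed-versus-unkeyed distinction concrete, here's a small sketch with the Kafka Java producer (broker address and topic/key names are placeholders). With recent Kafka Java clients, uniform sticky partitioning is the default behavior for records produced without a key.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BalancedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // With a null key, recent clients fill a batch for one partition, then
            // rotate to the next, spreading load evenly across all partitions.
            producer.send(new ProducerRecord<>("events", null, "payload-without-key"));

            // A keyed record always hashes to the same partition, so reserve keys
            // for cases that truly need per-key ordering (e.g. CDC streams).
            producer.send(new ProducerRecord<>("events", "customer-42", "payload-with-key"));
        }
    }
}
```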
4. Tune your batches
Imagine you work at a widget manufacturer and your boss asks you to send 10,000 widgets to a customer. Would you prefer to send them in 10,000 individually wrapped packages, or simply send them all in one big box?
This is the essence of batching. Rather than sending messages one by one, collating them into batches first can make things much more efficient. If your producers aren’t batching today, it’s likely they’re not as efficient as they could be. Batching does mean intentionally introducing latency into the produce pipeline, but it’s often a worthy tradeoff and can lead to lower latency overall since the broker is more efficient.
Recommendations:
- Increase linger.ms and max batch size to get your producers batching at their best.
- Monitor average batch sizes and batch produce rate per topic to understand what workloads to apply batching to.
Read our blog post series on batch tuning
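As a starting point, the sketch below nudges the two main batching knobs on a Kafka Java producer. The specific values are illustrative rather than recommendations; tune them against your own latency budget and the batch sizes you observe in monitoring.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Wait up to 25 ms for more records before sending a batch...
        props.put(ProducerConfig.LINGER_MS_CONFIG, "25");
        // ...and allow each per-partition batch to grow to 256 KiB.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(256 * 1024));

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10_000; i++) {
                producer.send(new ProducerRecord<>("widgets", "widget-" + i));
            }
        } // close() flushes any in-flight batches
    }
}
```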
5. Tune your consumers
When building out applications, many folks get a data pipeline up and running and leave it alone. But while the default settings of a consumer are a reasonable starting point, one size doesn’t necessarily fit all. Most consumers will have a preference for either low latency or high throughput, and explicitly tuning the configuration towards that preference can have a huge impact on performance.
Recommendations:
- Determine whether a given consumer prioritizes low latency or high throughput.
- Tune the consumer configuration toward that preference; the sketch after this list shows the main knobs (fetch.min.bytes, fetch.max.wait.ms, and max.poll.records).
- Consider multi-threading on the client application.
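Here's a hedged sketch of what that tuning might look like with the Kafka Java consumer. The values are illustrative; the point is that a throughput-oriented consumer asks the broker to accumulate more data per fetch, while a latency-oriented one asks for data as soon as it exists.

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerTuning {
    // Throughput-oriented: fewer, larger fetches and bigger poll batches.
    static Properties forThroughput() {
        Properties props = baseProps("throughput-app");
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, Integer.toString(1024 * 1024)); // wait for ~1 MiB
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500"); // or until 500 ms pass
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "2000");
        return props;
    }

    // Latency-oriented: respond as soon as any data is available.
    static Properties forLatency() {
        Properties props = baseProps("latency-app");
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "5");
        return props;
    }

    static Properties baseProps(String groupId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        return props;
    }
}
```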
6. Don’t over-commit
Remember the good ol’ days of typing on your computer and having to press “Save” every few minutes in case everything crashed? Now imagine pressing “Save” after every <click!> single <click!> word <click!>. You’d take forever getting anything finished.
When consuming a topic, committing your consumer group offsets is exactly like pressing the save button. You record where you’ve read to, and just like that save, each commit takes time and resources. If you commit too frequently, your consumer will be less efficient, but it can also start to impact the broker as your consume workload gradually transforms (somewhat unknowingly) into a consume AND produce workload, since each read is accompanied by a commit write.
Many folks try to commit excessively often to minimize re-reads during an application restart. While that initially sounds plausible, re-reading some amount of data occasionally is expected for most streaming applications, so if your application already has to cope with re-reading a few milliseconds of messages, it can probably cope with a few seconds worth.
Recommendations:
- If using auto commit, set auto.commit.interval.ms to a reasonable value: generally one second or higher (the default is five seconds). Low milliseconds is right out!
- If manually committing in your application code, try to align your implied commit frequency to at least one second (see the sketch after this list).
- In a Disaster Recovery (DR) context, be aware of your Recovery Point Objectives (RPOs) and use those to help define your minimum commit frequency.
- Make sure each application or micro-service uses its own consumer group, otherwise your applications can inadvertently increase the load on a single group coordinator and make rebalances more costly and impactful.
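For manual commits, one common pattern is to decouple the commit cadence from the poll loop, committing on a timer rather than after every batch. A minimal sketch follows; the group ID, topic name, and one-second interval are placeholders to adapt to your own RPO.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TimedCommitter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-service"); // one group per service
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // we commit ourselves

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));
            long lastCommit = System.currentTimeMillis();

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
                // Commit at most once per second, not after every poll or record.
                if (System.currentTimeMillis() - lastCommit >= 1_000) {
                    consumer.commitAsync();
                    lastCommit = System.currentTimeMillis();
                }
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        // Application logic goes here.
    }
}
```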
7. Shorten messages with compression
Producers spend their days sending data, which Redpanda dutifully writes to NVMe devices and sends over the network to other brokers to do the same. Consumers then send requests for data (via the network), so Redpanda retrieves it (from memory or NVMe) and sends it back over the network. Finally, consumers send in their commits. That's a lot of data transfers.
Each of those transfers takes place on a medium (such as a network or disk) that ultimately has a fixed capacity. For better efficiency and to send data more quickly, the only trick we have is to compress it — making it smaller and therefore quicker to send. If you can compress messages at a ratio of 5:1, you can reduce what you would have sent by 80%, which helps every stage of the data lifecycle (ingestion, storage, and retrieval).
There are many choices of compression codecs. Some will compress extremely well, but also require a significant amount of CPU time and memory. Others will compress more moderately, but use far fewer resources. A classic tradeoff.
As long as producers compress the data and consumers decompress it, the choice of codec only affects the clients themselves. While it’s possible to configure Redpanda to compress batches on your behalf, it’s best practice for clients to do this work themselves.
Recommendations:
- Compress on the client, not the broker (topic configuration for compression should be set to producer).
- Clients compress batches, not messages, so increasing batching will also make compression more effective.
- Use ZSTD or LZ4 for a good balance between compression ratio and CPU time.
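On the Kafka Java producer, this is a one-line setting, and it pairs naturally with the batching configuration from earlier, since larger batches give the codec more redundancy to work with. A sketch, with illustrative values:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class CompressionConfig {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Compress whole batches on the client; "zstd" trades more CPU for a better ratio.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        // Bigger batches compress better, so batch tuning helps here too.
        props.put(ProducerConfig.LINGER_MS_CONFIG, "25");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(256 * 1024));
        return props;
    }
}
```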
8. Use compaction wisely
One of the more interesting features of Redpanda is compaction, which allows older values for a given key to be dropped, keeping only the latest value for a key. This is often used when a topic holds change messages that need to be replayed in the event of a service restart. Replaying intermediate values has no benefit, so they can be removed, improving service start-up time.
The compaction process runs in the broker and is actually the only use case where the broker reads message-level details from a topic. Usually, Redpanda treats the data as opaque bytes that need to be sent without reading them in detail. It gets interesting when you combine compaction with compression.
We usually recommend that compression take place in clients (see above) for performance reasons, but compaction takes that option away: both the read and write portions of the compaction process use additional broker CPU to decompress and recompress the data.
As a result, combining compression (particularly with CPU-intensive codecs) with compaction can lead to significant CPU utilization. Again, this is a classic trade-off between space utilization and CPU time.
Recommendations:
- Don’t compress compacted topics unless you’re willing to spend the CPU cycles decompressing and recompressing.
- Use ZSTD or LZ4 for a good balance between compression ratio and CPU time if compression is essential.
See our blog on implementing a last value cache using WASM
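For reference, here's a sketch of creating a compacted topic with the Java AdminClient, leaving compression.type at producer so the broker doesn't compress on its own. The topic name, partition count, and replication factor are placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("latest-values", 6, (short) 3)
                .configs(Map.of(
                    // Keep only the latest value for each key.
                    TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT,
                    // Leave compression decisions to the producer.
                    TopicConfig.COMPRESSION_TYPE_CONFIG, "producer"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```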
9. Use tiered storage
Another great feature of Redpanda is the ability to use object storage for storing older topic data. While this is primarily discussed as a way to store more data in a topic than you have local space for, there are also performance benefits.
Decommissioning and recommissioning a broker can take time, as the data needs to be replicated away from the broker before it goes offline or re-replicated towards a new broker before it can start up and fully participate in the cluster. When tiered storage is in use, decommissioning and recommissioning can both be sped up by orders of magnitude, since a copy of the data already exists out in the object store. This means only the most recent data (that is yet to be written to tiered storage) needs to be moved to or from a broker.
Recommendations:
- Use tiered storage for faster decommission and recommission, since those operations can rely on the data already being in tiered storage. In other words, the data doesn’t need to be replicated to another broker because a highly available copy is already in place.
Check out our documentation on fast decommission and recommission
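Tiered storage can be enabled per topic. A sketch using the Java AdminClient follows; the topic name is a placeholder, and the redpanda.remote.write and redpanda.remote.read property names should be verified against your Redpanda version's documentation.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class EnableTieredStorage {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events");
            List<AlterConfigOp> ops = List.of(
                // Upload closed log segments to object storage...
                new AlterConfigOp(new ConfigEntry("redpanda.remote.write", "true"),
                                  AlterConfigOp.OpType.SET),
                // ...and let consumers read historical data back from it.
                new AlterConfigOp(new ConfigEntry("redpanda.remote.read", "true"),
                                  AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```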
Summary
Redpanda is a highly optimized message broker that delivers incredible performance due to its unique design advantages. However, to achieve the best results, it requires a solid combination of infrastructure, data architecture, and application design. This blog post outlined a handy checklist to help you get the most out of your clusters.
If you’re running Redpanda today and need to talk about performance, come chat with us in the Redpanda Community on Slack — we’re a friendly bunch.