What makes Redpanda fast?

10x more speed through intentional design. C++, DMA, and autotuning, oh my!

August 10, 2022
Last modified on April 20, 2026

In 2021, we wrote a blog on how we made the Kafka API fast. In it, we showed that Redpanda outperforms Apache Kafka® in every benchmark category.

That post remains one of the most popular articles on our site, and it's very technical.

Some people just want to know, "How fast is Redpanda?"

TL;DR: It's fast.

How do we define fast?

For Redpanda, speed is generally measured in comparison to Kafka, and for the purposes of this post, we measure it in the tail latency of produce/consume operations.

We measure tail latency in percentiles, where "p99.99" (the 99.99th percentile) means the latency at or below which 99.99% of requests completed.

Imagine a product that guarantees that 99.99% of all requests will fall below a certain threshold. If there are 100 requests, that seems pretty good.

What happens if there are ten million requests? Then 1,000 of them will be on the downside of that threshold.

That still doesn’t seem bad until you realize the percentile says nothing about how slow those 1,000 requests were. Everything above the threshold can be arbitrarily bad.

What if those 1,000 requests were making trades worth millions of dollars?

Is it still okay?

For Redpanda, p99.99 isn't enough, so we measured ourselves against Kafka at p99.999.

We chose this measurement because as more systems exchange messages, the probability of a single message being affected by latencies above the 99.99th percentile also increases.
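To make the percentile arithmetic concrete, here is a minimal sketch in Python using the nearest-rank method and synthetic latency data (the numbers are illustrative, not benchmark results):

```python
import math
import random

def percentile(latencies_ms, p):
    # Nearest-rank percentile: the latency at or below which
    # p percent of requests completed.
    ordered = sorted(latencies_ms)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

random.seed(42)
# One million synthetic request latencies with a long tail.
samples = [random.expovariate(1 / 5.0) for _ in range(1_000_000)]

print(f"p50    = {percentile(samples, 50):.2f}ms")
print(f"p99.99 = {percentile(samples, 99.99):.2f}ms")
# The p99.99 value lands far above the median: the tail is where
# the pain lives, which is why it's the number worth measuring.
```

Even with a modest average, the tail of a skewed distribution can be an order of magnitude slower than the median, which is exactly the gap the benchmarks below quantify.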

The businesses that rely on Redpanda to process data need those messages delivered swiftly and reliably.

Okay, but how fast is it?

For each test, we used three massive nodes for the brokers and two slightly smaller nodes for the clients.

This ensured any bottlenecks would be on the server side, where we were taking our measurements.

Each test measured end-to-end latency, using acks=all to commit the transactions on all replicas before returning to the client that the message was received.
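As an illustration of that setting (the client library conventions and broker addresses here are our assumptions, not details from the benchmark), this is roughly what acks=all looks like in a Kafka-compatible producer configuration:

```python
# Hypothetical producer settings for an end-to-end latency test.
# The broker addresses are placeholders; "acks": "all" tells the
# cluster to replicate each batch to all in-sync replicas before
# acknowledging the write back to the client.
producer_config = {
    "bootstrap.servers": "broker-0:9092,broker-1:9092,broker-2:9092",
    "acks": "all",
    "enable.idempotence": True,
}
```

Because the same Kafka wire protocol is in play, an identical configuration works whether the brokers behind those addresses are Kafka or Redpanda.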

Across nine different tests, Kafka's p99.999 latency was between roughly 2x and 108x higher than Redpanda's.

| Workload (acks=all + fsync() after every batch) | Kafka p99.999 | Redpanda p99.999 | Percentage change |
| --- | --- | --- | --- |
| (1) 10MB/s (10K msgs/s) | 215.191ms | 12.405ms | +1634.71% |
| (2) 40MB/s (40K msgs/s) | 102.589ms | 52.122ms | +96.82% |
| (3) 50MB/s (50K msgs/s) | 235.675ms | 13.999ms | +1522.66% |
| (4) 75MB/s (75K msgs/s) | 1801.263ms | 16.606ms | +10747.06% |
| (5) 100MB/s (100K msgs/s) | 1725.391ms | 20.552ms | +8295.25% |
| (6) 200MB/s (200K msgs/s) | 1945.039ms | 27.307ms | +7022.86% |
| (7) 0.5GB/s (500K msgs/s) | 3015.295ms | 60.943ms | +4263.23% |
| (8) 1GB/s (1M msgs/s) | 3839.663ms | 174.521ms | +2100.12% |
| (9) 1.25GB/s (1.25M msgs/s) | 3797.167ms | 237.688ms | +1497.54% |


Those are massive numbers.

To put the largest of those numbers into perspective: in test 4, Kafka's p99.999 latency was 1.8 seconds, and Redpanda's was 16.6 milliseconds.

Why is Redpanda fast?

Redpanda's blazing speed comes from its key architectural differences.

Redpanda is written in C++

Kafka is written in Scala, compiled into Java bytecode, and run in the Java Virtual Machine.

Redpanda is written in C++ and makes use of a number of sophisticated performance-enhancing techniques, many of which would be difficult or impossible to use in Java. These include thread-local data structures, pinning memory to threads (with libhwloc), and directly invoking specific Linux-level capabilities like DPDK (more efficient packet processing), io_uring (efficient asynchronous disk I/O), and O_DIRECT (unbuffered file I/O that bypasses the page cache). If you’re interested in the specifics of how we use these capabilities, check out this presentation.

C++ makes Redpanda faster out of the gate, and we build on that with how we interact with the operating system.

Redpanda intelligently handles memory

Modern operating systems use free RAM for caching files that would otherwise have to be read from and written to disk.

This is called the "page cache," and it was created to speed up reads and writes for files located on slow, rotating disk platters.

Solid state drives (SSDs) improve access times, but not by enough to close the gap: working with RAM is still orders of magnitude faster than working with disks.

Partitions in Kafka and Redpanda behave differently from ordinary files.

When Kafka makes a fetch request, the operating system responds by executing functionality that would benefit an application loading a file. It performs locking and other behaviors that introduce latency where we don't want it.

For Kafka clusters that run in Kubernetes, sharing the page cache between multiple applications also means that the amount of data that Kafka can cache is never guaranteed.

Redpanda uses an append-only log with ordered reads, and we know exactly how much data will be read and written in any request.

Instead of using the page cache, we allocate RAM specifically for the Redpanda process. We use it for our own hyper-efficient caching, adjusting buffers according to the actual performance of the underlying hardware.

We use Direct Memory Access (DMA) and build our cache in a way that aligns with the filesystem, so when we flush data to disk, it's fast and efficient.

Furthermore, because our cache is shared by all the open files being read/written by Redpanda, when there are spikes in message volume the most heavily used partitions have immediate access to additional buffer space. This approach helps keep latency down.
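As a rough sketch of the mechanism (an analogy in Python, not Redpanda's actual C++/Seastar implementation; the file name and block size are our assumptions), here is how a Linux process can bypass the page cache with O_DIRECT, which requires block-aligned buffers:

```python
import mmap
import os

BLOCK = 4096  # typical alignment requirement for O_DIRECT I/O

# Anonymous mmap memory is page-aligned, which satisfies
# O_DIRECT's requirement that buffers align to the block size.
buf = mmap.mmap(-1, BLOCK)
buf.write(b"x" * BLOCK)

# O_DIRECT asks the kernel to skip the page cache entirely;
# data moves straight between our buffer and the device.
flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_DIRECT", 0)
try:
    fd = os.open("direct_demo.bin", flags, 0o644)
    os.write(fd, buf)  # must write whole aligned blocks
    os.close(fd)
except OSError:
    # Some filesystems (e.g. tmpfs) reject O_DIRECT.
    pass

# Tidy up the demo file if it was created.
if os.path.exists("direct_demo.bin"):
    os.remove("direct_demo.bin")
```

The alignment constraint is the price of admission: once the kernel stops buffering for you, you own the buffering, which is exactly the trade Redpanda makes in order to run its own cache.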

Automatic kernel tuning

It's tempting to spin up an application on a server and call it done.

Latency-sensitive applications like Redpanda and Kafka benefit from the additional step of tuning the operating system kernel for the best performance.

Tuning comes with its own challenges: namely, how do you know what parameter to tune or what value to set?

Setting an incorrect value can have no effect, or worse, it can make the application perform poorly for what it's trying to do.

If you change too many variables, you don't know which one caused a change.

We've tested Redpanda on many, many architectures and determined which values work best on each platform.

We rolled all of this into rpk redpanda tune all.

This command determines the optimal kernel settings for your platform and sets them so you don't have to figure anything out.

Can I get a "Boom, done?"

Boom. 💥 Done.

PRO TIP: This command is only suited for production servers, and you probably shouldn't run it on your development machine.

Thread-per-core architecture

Threading sits at the core of applications that process data.

It enables parallel processing of work instead of blocking new work while serially processing the current task.

CPUs offer multiple cores for the same reason.

A CPU with multiple cores can execute tasks in parallel, getting more done in the same amount of time.

Threads in an application are assigned to a CPU core by the operating system. They can usually move between cores, but Redpanda intentionally allocates one thread to each CPU core and pins it to that core.

It doesn't get to move.

This is done via the shared-nothing paradigm of ScyllaDB’s Seastar framework. Seastar doesn’t use expensive shared memory across threads. Instead, it avoids slow, unscalable locking semantics and issues with caching by pinning each thread to its own core.
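A minimal illustration of pinning, using Linux's affinity syscall from Python rather than Seastar's C++ machinery (so this is an analogy to the technique, not Redpanda's code):

```python
import os

def pin_to_core(core: int) -> bool:
    # Restrict the calling process/thread to a single CPU core,
    # analogous to how a thread-per-core runtime pins each of its
    # reactor threads. Returns False where unsupported.
    try:
        os.sched_setaffinity(0, {core})
        return True
    except (AttributeError, OSError):
        return False

if pin_to_core(0):
    # From here on the scheduler never migrates this thread: its
    # CPU caches stay warm, and data it owns needs no cross-core
    # locking.
    print("pinned to:", os.sched_getaffinity(0))
```

The payoff is that each core's data never needs to be coordinated with another core, which is the essence of the shared-nothing design.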

Because of Seastar, everything in Redpanda that would use that core runs through that thread.

We then specify that no instruction can block for more than 500 microseconds. If it does, we see it during development and testing. We treat it as a bug, and we fix it.

This forces our developers to optimize their code and how it operates.

Or, said differently, it forces us to write code that executes very, very quickly.
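That 500-microsecond budget can be enforced with nothing more than a timer around each unit of work. A toy version (the reporting mechanics here are our invention; Redpanda's reactor does this inside Seastar):

```python
import time

BUDGET_US = 500  # no task may hold the core longer than this

def run_with_budget(task, *args):
    # Time a unit of work and flag anything that overstays its
    # slice; in Redpanda, such a stall is treated as a bug.
    start = time.perf_counter_ns()
    result = task(*args)
    elapsed_us = (time.perf_counter_ns() - start) / 1_000
    if elapsed_us > BUDGET_US:
        print(f"reactor stall: {task.__name__} ran {elapsed_us:.0f}us")
    return result

run_with_budget(sum, range(1_000))   # fast: silent
run_with_budget(time.sleep, 0.002)   # slow: reported as a stall
```

Because a pinned core runs exactly one thread, any task that blocks past its budget stalls everything scheduled behind it, which is why the budget is enforced so aggressively.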

What's new since we published this post

The original performance story remains our authoritative foundation. C++, DMA, thread-per-core execution, and CPU pinning are still the core drivers of Redpanda's speed. However, we have layered new compiler optimizations, storage options, and sharper hardware guidance on top of that foundation.

  • Profile-guided optimization (PGO): Introduced in Redpanda 26.1, PGO packs the compiled binary's hot code paths tightly together to improve CPU instruction cache locality. This reduces frontend stalls and iTLB pressure, resulting in up to 47% lower p999 latencies and 15% better CPU utilization on the same hardware.
  • Write caching: Available since Redpanda 24.1 for workloads that can trade some durability guarantees for lower latency, write caching can reduce latency by up to 90% on modern storage hardware.
  • rpk iotune: This is a distinct, complementary tool to rpk redpanda tune all. While the latter handles kernel settings, rpk iotune benchmarks specific hardware at startup and writes an io-config.yaml to optimize I/O performance.
  • systemd integration: Redpanda now uses a systemd integration layer to manage software-level limits. This helps ensure settings like memlocks, scheduler affinity, and file handles are properly enforced in production.
  • Updated hardware guidance: Current production guidance emphasizes fast local SSD/NVMe storage, the XFS filesystem, modern Linux kernels, and enough network bandwidth for your workload profile. Follow Redpanda's current sizing and deployment docs for exact recommendations.

Kafka API compatibility: a drop-in replacement

All of this speed would be a lot less useful if it came with a rewrite tax.

It doesn't. Redpanda is compatible with the Apache Kafka API, which means your existing producers, consumers, and much of the surrounding ecosystem can keep doing what they already do. In many cases, that means changing broker endpoints, not rewriting applications.

That matters because the biggest value in Redpanda isn't just lower latency in a benchmark. It's lower latency with familiar Kafka APIs, so you can move faster without turning migration into its own engineering project.

Fast is our normal

For centuries, people could travel from New York to London on a boat. This was the way it was done.

Airplanes changed everything.

With air travel, people could do things that they had never thought of because everything was suddenly faster.

Developers who use Redpanda rely on it to be consistently fast and safe.

When "fast" is the default environment you're working with, your applications are no longer constrained. You can do more, confident that you'll get better performance because Redpanda is a streaming data platform built for today's hardware and tomorrow's applications.

Specifically, some applications our customers told us were not practical with Kafka’s latency characteristics include use cases in algorithmic trading and trade clearing, real-time gaming, SIEM use cases with strict SLAs, oil pipeline jitter monitoring, and various edge-based IoT use cases.

These applications depend in particular on reliable tail latency lower than Kafka can provide, which means that without Redpanda, they would have been implemented in a fully custom solution, or not at all.

The ability to deliver on these use cases with an existing system using the well-known Kafka APIs drives massive value for these customers.

Redpanda. What will you build with it?

Tell us in the Redpanda Community on Slack, follow us on Twitter, and check out our source-available GitHub repo here.

