Data sovereignty is the future of cloud

When to choose Redpanda over Apache Kafka.

Alexander Gallego

January 11, 2023


I started Redpanda in 2019 out of personal experience, wanting a simpler, faster, cost-effective system that didn’t lose data. This was the napkin sketch. Being an engineer is very freeing in that you don’t have to ask for permission; you simply open Emacs and start materializing your ideas.

To be clear, this blog is a response to Confluent’s Field CTO, Kai Waehner, on a blog titled “When to choose Redpanda instead of Apache Kafka®.” For the impatient, every single technical point is in the Appendix.

It bears repeating: I think extremely highly of the engineers, hackers, and doers that actually write and maintain the Apache Kafka code. I was inspired by all of you.

Some will always see the world in black and white. I see the world greenfield (see references 1-12), where multiple companies can innovate and push each other’s products into design space dimensions that would be hard without competition, just like LLVM pushed GCC or Rust pushed the safety of the C++ type system. Whether it is pushing the safety properties of clients acks=ALL, or improving the local-dev container startup experience, together we make real-time streaming simpler for developers and in turn get to change how the world builds applications with streaming as default.

What the engineer cares about, however, is her application working end-to-end. End users do not care about foo, bar, baz, or the color of the bike-shed; they care about their untouched code going faster, with fewer resources and lower costs, and that’s ultimately the promise of Redpanda.

Classically, real-time streaming has not been easy. To the engineer, it still feels like distributed systems whack-a-mole with 4 or 5 systems to manage. This is still true when choosing a SaaS vendor; the axes simply shift. When building systems, one cannot eliminate essential complexity, only move it around. In particular, when the data volume is high, the dominant cost factors in the cloud are network traffic, the latency between your applications and the storage engine (how close the two sit), and, for some, the sovereignty of owning the lifecycle of the hard drives where your customer data resides.

In this post I focus on our fully managed BYOC (Bring Your Own Cloud) offering to demystify what it is and share design tradeoffs after working with some of our larger F500 customers for the past 2 years as we co-designed our cloud. For the uninitiated: yes, it is production ready and SOC2 compliant, with managed connectors and VPC peering.

Data sovereignty is the future of cloud

Data sovereignty is much harder to achieve than data privacy. Privacy can be achieved with policy: delete this, mask that, obfuscate here, index like so. Sovereignty can only be achieved if you, the user, control the hard drive lifecycle where data resides. There are no two ways about it. Data either lives inside the hard drives that you control or it does not.

Immutability is streaming’s strongest property and also its Achilles’ heel for classical vendors. Unlike other systems such as databases, traffic on streaming systems is often 10 or 100 times larger in throughput. In databases it is very common to see a 10-to-1 reduction from writes to reads, while the opposite is common for streaming systems like Redpanda, where in practice we see 10 to 100 times more reads than writes.
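To make the read-heavy ratio concrete, here is a small back-of-the-envelope sketch. The 100 MB/s write rate and the function name `cluster_traffic` are illustrative assumptions, not measurements from any Redpanda deployment; only the 10x and 100x fan-out figures come from the text above.

```python
def cluster_traffic(write_mb_per_s: float, read_fanout: float) -> dict:
    """Estimate client traffic for a streaming cluster.

    read_fanout is the ratio of bytes read to bytes written
    (an illustrative parameter; the post cites 10x to 100x in practice).
    """
    read_mb_per_s = write_mb_per_s * read_fanout
    total_mb_per_s = write_mb_per_s + read_mb_per_s
    return {
        "write_mb_per_s": write_mb_per_s,
        "read_mb_per_s": read_mb_per_s,
        "total_mb_per_s": total_mb_per_s,
        # Fraction of all client traffic that is reads.
        "read_share": read_mb_per_s / total_mb_per_s,
    }


if __name__ == "__main__":
    # Assumed write rate of 100 MB/s, purely for illustration.
    for fanout in (10, 100):
        t = cluster_traffic(write_mb_per_s=100.0, read_fanout=fanout)
        print(
            f"fan-out {fanout}x: reads {t['read_mb_per_s']:.0f} MB/s, "
            f"{t['read_share']:.1%} of client traffic"
        )
```

Under these assumptions, reads account for roughly 91% to 99% of client traffic, which is why the network path between applications and the storage engine tends to dominate cost at high volume.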

Replayability, immutability, and separating producers from consumers is ultimately the value that Redpanda or Kafka bring to the developers. If you produce data with the right schema, something else will pick it up sometime in the future. Your data is the only contract.
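As a rough illustration of these three properties, the toy model below keeps an append-only list and lets each consumer track its own offset. `Log` and `Consumer` are hypothetical names for this sketch; this is not the Redpanda or Kafka client API, just the idea in miniature.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Log:
    """A toy in-memory append-only log (not a Redpanda or Kafka API)."""

    records: List[dict] = field(default_factory=list)

    def append(self, record: dict) -> int:
        """Append a record and return its offset. Records are never modified."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[dict]:
        """Return a copy of all records at or after the given offset."""
        return list(self.records[offset:])


@dataclass
class Consumer:
    """Each consumer owns its offset, so consumers never affect each other."""

    log: Log
    offset: int = 0

    def poll(self) -> List[dict]:
        batch = self.log.read_from(self.offset)
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    log = Log()
    for amount in (10, 20, 30):
        log.append({"type": "payment", "amount": amount})

    billing = Consumer(log)
    print(len(billing.poll()))    # billing reads everything produced so far

    # A consumer written long after the data was produced can still replay it.
    analytics = Consumer(log)
    print(len(analytics.poll()))
```

The producer never learns who reads the data or when; the record layout (here, the `type` and `amount` keys) is the only thing both sides share.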

We invented Redpanda’s BYOC to fix data sovereignty problems. Let there be no mistake about this: companies that don’t offer BYOC hold back not because of the technical ability of their teams but because reselling AWS infrastructure artificially inflates revenue. We choose to lean into the future of how companies want to consume data.

At a high level, the data-plane agent dials into our control plane. The agent is a stateless service that accepts only an exhaustive, fixed list of commands, with a complete list of permissions that we provide to our customers during the security audit. Something has to watch the watchman. The agent is not deployed inside Kubernetes; instead it relies on AWS or GCP autoscaling groups. If it crashes, a new image is started on a clean VM with no interruption of the data-plane Redpanda services.
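The allowlist idea can be sketched in a few lines. Everything named here is hypothetical: the command names `upgrade_cluster` and `scale_cluster`, the `handle_command` function, and the handlers are invented for illustration and are not Redpanda's actual agent commands, which are the ones listed in the security audit.

```python
from typing import Callable, Dict


class CommandRejected(Exception):
    """Raised when the control plane sends a command outside the allowlist."""


# Hypothetical command names and handlers, for illustration only.
ALLOWED_COMMANDS: Dict[str, Callable[[dict], str]] = {
    "upgrade_cluster": lambda args: f"upgrading to {args['version']}",
    "scale_cluster": lambda args: f"scaling to {args['nodes']} nodes",
}


def handle_command(name: str, args: dict) -> str:
    """Run a command only if it is on the fixed allowlist.

    The agent keeps no state between calls, so a crashed agent can be
    replaced by a fresh instance without losing anything.
    """
    handler = ALLOWED_COMMANDS.get(name)
    if handler is None:
        raise CommandRejected(f"command not in allowlist: {name}")
    return handler(args)


if __name__ == "__main__":
    print(handle_command("scale_cluster", {"nodes": 5}))
    try:
        handle_command("read_topic_data", {"topic": "payments"})
    except CommandRejected as err:
        print(f"rejected: {err}")
```

Because the list is closed, the set of things the control plane can ask for is auditable up front: anything not enumerated is refused by default.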

For completeness, Redpanda provides additional Dedicated single-tenant services similar to other SaaS vendors, with an upcoming Serverless offering. But in this post we’re talking about how the future should work for the most demanding workloads. We note that all deployments use the exact same architecture.

Privacy-preserving interactive experiences

One of the nicest things about Redpanda is the fast response times when interacting with data. We spent 2 years and a full cloud re-write until we nailed the user experience. Fundamentally, your browser makes 2 TCP connections, one to the control plane and one to the data plane. This happens after some auto-negotiation and discovery so that ultimately, you can still filter, aggregate, view, explore, send and otherwise slice and dice your streaming data with our UI without Redpanda’s Control Plane ever seeing any of the actual data.

Fully managed cloud via cloud.redpanda.com

We use a technology called module federation, which allows us to ship fully functioning, versioned micro-UIs to the data plane, tied to the specific functionality in the installed software bill of materials. In other words, the future: fast, reliable, privacy-preserving, and it allows users to maintain data sovereignty. Meanwhile, your end users only observe that it works “faster,” especially for filtering, aggregations, etc., as you are not paying WAN network costs between Redpanda and your apps.

Git clone is believing

One thing is certain: there is no substitute for git clone^[14] and seeing for yourself, on your terminal, with your target workload. This is how engineering is done. We wrote step-by-step instructions^[15] on how to run the original Open Messaging Benchmark and reproduce the results and, most importantly, tweak it to run your target workloads so you understand the tail latency, throughput and scalability bottlenecks of Redpanda or Apache Kafka or any other vendor.
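When you look at results from your own runs, tail latency is worth computing explicitly rather than reading off an average. The sketch below uses the nearest-rank percentile method on a made-up sample; it is not part of the Open Messaging Benchmark tooling, and the `percentile` helper and the latency values are assumptions for illustration.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up latencies in milliseconds: mostly fast, with a slow tail.
    latencies_ms = [2.0] * 98 + [50.0] * 2

    mean = sum(latencies_ms) / len(latencies_ms)
    print(f"mean: {mean:.2f} ms")
    for p in (50, 98, 99):
        print(f"p{p}: {percentile(latencies_ms, p):.1f} ms")
```

In this made-up sample the mean and the median both look healthy while the 99th percentile is 25 times the median, which is the kind of gap a tail-latency comparison is meant to expose.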

To experience a fully managed Bring-Your-Own-Cloud (BYOC) where you retain sovereignty, privacy, lower latency and costs, get started here.

Appendix

Below is a list of factual edits, nitpicks, and a boring but complete rebuttal to the technical points. They are in numbered list form so you may refer to them in the future.

1. SLAs - we have them.
2. Disaster Recovery - MirrorMaker 2 (mm2) for async replication, or tiered storage built into the enterprise version.
3. Enterprise Support - yes we have one of these too.
4. Fully managed cloud service - yup dedicated and BYOC are fully managed.
5. Redpanda is Edge friendly - Agreed. Redpanda is a small binary image that can, and in fact does, run on smaller edge devices for some of the largest payment networks in the world.
6. BYOC
- a. How does the vendor access your data center or VPC - see above
- b. Who decides when and how to scale a cluster - users in the UI
- c. When to act on issues - via an SLA sidecar probe of the data plane itself
- d. How much value does the vendor solution bring - incomparable, see data sovereignty article above.
- e. Cost management - extremely cost effective as users get to utilize their discounts and cloud spend
- f. TCO - typically about 50-70% lower than Confluent, depending on scale and commit. More here.
7. How do you guarantee SLAs - via sidecar probe on the data plane
8. Who guarantees SLAs - Redpanda Data, Inc.
9. Regulated industries
- a. Security controls and compliance
  - i. See detailed post above on exhaustive and complete list of commands and kill-switch mechanics (cut firewall traffic)
10. There are reasons cloud vendors only host managed services in the cloud vendor’s environment
- a. This is untrue and misleading.
  - i. Let there be no mistake. Dedicated services are about Revenue maximization. Technically, it is entirely feasible to build BYOC. That's the only way to achieve Data Sovereignty.
11. The author claims that Redpanda is not 100% Apache Kafka, unlike Confluent.
- a. Confluent is a fork of Apache Kafka as well.
- b. We are 100% client compatible, any issues are bugs.
- c. Same is true for MSK and their newly pulled in tiered storage.
- d. This is OK, but let’s not have the pot call the kettle black. It either is or isn’t Apache Kafka and Confluent is a fork.
12. Redpanda has lower latency - yes, we’re lower latency.
13. No use cases where 4 TCP connections can publish 1GB/s - untrue, we have at least 4 customers with this workload. Can we all agree that 1GB/s is an average workload?

14. No Zookeeper Apache Kafka - Kai does mention in his article that it will take some time until the architecture goes into production and is Jepsen tested.
15. No value in Redpanda being written in C++ if the ecosystem is not.
- a. Obviously false.
- b. People do not care if it is C++ or Ada. They care about:
  - i. Faster
  - ii. Easier to use
  - iii. Lower costs
  - iv. Untouched applications continuing to work
  - v. Ecosystem compatibility with tooling like Debezium, TensorFlow, ClickHouse, etc.
16. Disaster Recovery across clusters - see note above on mm2 or tiered storage blogs here.
- a. Tiered storage is especially important if you care about N copies of your data for analytics/sampling purposes. Extremely common.
- b. Yes, people want that: tiered storage for operational efficiency, among other reasons.