Data plane atomicity and the vision of a simpler cloud

Your cloud. Your data. Period.

August 21, 2024

Our engineering culture has always been to tackle the hard things first. We started by building a new storage engine from scratch, without virtual memory and without primitives like the page cache to accelerate write-behind workloads. We then built an industrial-scale consensus layer into the Redpanda binary itself because we didn’t want to push complexity onto users by bringing in external dependencies.

Over the years, we’ve continued our tradition across every part of the stack, whether by introducing co-processors in WebAssembly or creating an entirely new mode of cloud delivery: Bring-Your-Own-Cloud (BYOC).

BYOC is a fully managed service hosted in your own cloud account. Put simply, it is the separation of control and data planes, where the control plane is cloud.redpanda.com and the data plane lives inside the customer’s network.

We coined BYOC as a new term for what had classically been known as “Bring Your Own Account” because we needed to define a new architectural primitive that did not exist when we built it: one with data plane atomicity, an agent-based security model, and a clear data principle.

Data plane atomicity

Diagram of the Redpanda Cloud data plane operating within the customer’s cloud

Data plane atomicity means the application has no external dependencies to run. No external consensus, no external databases, no external secret managers, etc. In other words, the unavailability of any other data plane or control plane doesn’t affect the user's application. At worst, you may have to wait a few minutes to trigger a version upgrade should the control plane require a full reboot.

Architecturally, this is a property we are immensely proud of. An alternative would have been easier for us to implement, but it would have shifted the complexity onto users. You cannot eliminate essential complexity. You can only shift it around.

At a technical level, every data plane has a checksum (SHA-1) of the entire immutable infrastructure. At any point, we can generate an exact copy of any data plane, including transitive dependencies across hundreds of packages. Every change produces a JSON manifest that tells us exactly what is installed where, down to the individual library versions shipped with any piece of software. Any mutation is simply a new fleet deployment, so we got good at ironing out fleet operations on hundreds of clusters at a time.
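As a rough illustration of the idea, the sketch below derives a SHA-1 digest from a JSON manifest of installed packages. The `Package` and `Manifest` types, their field names, and the sample package versions are assumptions made for this example, not Redpanda's actual manifest schema.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// Package is one installed component. The field names are illustrative
// assumptions, not Redpanda's manifest schema.
type Package struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

// Manifest lists everything installed in one data plane (hypothetical shape).
type Manifest struct {
	Packages []Package `json:"packages"`
}

// Checksum returns a hex-encoded SHA-1 digest of the manifest. Packages are
// sorted on a copy first so the digest does not depend on input order.
func Checksum(m Manifest) (string, error) {
	pkgs := append([]Package(nil), m.Packages...)
	sort.Slice(pkgs, func(i, j int) bool {
		if pkgs[i].Name != pkgs[j].Name {
			return pkgs[i].Name < pkgs[j].Name
		}
		return pkgs[i].Version < pkgs[j].Version
	})
	raw, err := json.Marshal(Manifest{Packages: pkgs})
	if err != nil {
		return "", err
	}
	sum := sha1.Sum(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := Manifest{Packages: []Package{{"openssl", "3.0.13"}, {"zlib", "1.3.1"}}}
	b := Manifest{Packages: []Package{{"zlib", "1.3.1"}, {"openssl", "3.0.13"}}}
	ca, _ := Checksum(a)
	cb, _ := Checksum(b)
	// Same contents in a different order yield the same digest.
	fmt.Println(ca == cb)
}
```

Sorting before hashing is the design choice that matters here: two data planes with identical contents should compare equal, and any change to a single library version should produce a different digest.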

A full fleet update within hours is common when a security fix is needed. In practice, we swim-lane customers (10%, 25%, 50%, 100%), as most large cloud companies do, and operate on one swim lane at a time.
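One way to picture the swim lanes is as cumulative thresholds over the fleet. The sketch below is a minimal example under that assumption: the `SwimLanes` function, the use of input order to decide who goes first, and the rounding rule are all hypothetical choices for illustration, and only the percentages come from the text.

```go
package main

import "fmt"

// lanePercents are the cumulative rollout thresholds quoted in the text.
var lanePercents = []int{10, 25, 50, 100}

// SwimLanes splits customers into consecutive groups so that once lane i has
// completed, lanePercents[i] percent of the fleet (rounded up) is updated.
// Ordering customers by input position is an assumption of this sketch.
func SwimLanes(customers []string) [][]string {
	lanes := make([][]string, 0, len(lanePercents))
	done := 0
	for _, pct := range lanePercents {
		upTo := (len(customers)*pct + 99) / 100 // ceiling division
		lanes = append(lanes, customers[done:upTo])
		done = upTo
	}
	return lanes
}

func main() {
	customers := make([]string, 20)
	for i := range customers {
		customers[i] = fmt.Sprintf("customer-%02d", i)
	}
	// Lanes are processed strictly one at a time, in order.
	for i, lane := range SwimLanes(customers) {
		fmt.Printf("lane %d (up to %d%%): %d customers\n", i+1, lanePercents[i], len(lane))
	}
}
```

With 20 customers, the lanes contain 2, 3, 5, and 10 customers respectively, so a problem found in an early lane affects only a small share of the fleet.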

Agent-based security model

There is exactly one agent process per cluster (typically in an autoscaling group of size 1) that bootstraps the bootstrap. It is a single Go binary injected with a unique token during the cloud provisioning phase, which it uses to initiate authentication against cloud.redpanda.com. This bootstrap service is launched with ambient permissions to operate on the user's behalf: scaling compute, issuing certificates, updating DNS entries, and so on. Together, the resources it manages make up the data plane.

Users can revoke the connection to cloud.redpanda.com, and thereby suspend all state changes, by shipping a single firewall block rule. Because the cluster is atomic, in the sense that it has no external dependencies to function, all applications remain operational. The data plane is composed of the Kafka API, Schema Registry, HTTP Proxy, and Redpanda Connect, all deployed as a local-network group with no dependency on internet-level services.

The operations the agent can perform are exhaustively listed (the list of allowed cloud API operations is complete, with no wildcard permissions wherever the cloud provider allows this), audited, and constrained to a single cluster. Importantly, this is all user-controlled.
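To make the three properties concrete, here is a minimal sketch of an allow-list that rejects wildcards, is scoped to one cluster, and records every decision. The `Policy` type, the `NewPolicy` and `Authorize` functions, and the operation names are all hypothetical; they do not reflect the agent's real code or any cloud provider's permission syntax.

```go
package main

import (
	"fmt"
	"strings"
)

// Policy is a hypothetical per-cluster allow-list. The type and the operation
// names used below are illustrative, not Redpanda's or any cloud's API.
type Policy struct {
	ClusterID string
	Allowed   map[string]bool // exact operation names only
}

// NewPolicy rejects any entry containing a wildcard so the list stays exhaustive.
func NewPolicy(clusterID string, ops []string) (*Policy, error) {
	allowed := make(map[string]bool, len(ops))
	for _, op := range ops {
		if strings.Contains(op, "*") {
			return nil, fmt.Errorf("wildcard permission not allowed: %q", op)
		}
		allowed[op] = true
	}
	return &Policy{ClusterID: clusterID, Allowed: allowed}, nil
}

// Authorize permits an operation only if it targets this policy's cluster and
// is explicitly listed. Every decision, allowed or denied, is appended to audit.
func (p *Policy) Authorize(clusterID, op string, audit *[]string) bool {
	ok := clusterID == p.ClusterID && p.Allowed[op]
	*audit = append(*audit, fmt.Sprintf("cluster=%s op=%s allowed=%t", clusterID, op, ok))
	return ok
}

func main() {
	p, err := NewPolicy("cluster-a", []string{"dns:UpdateRecord", "certs:Issue"})
	if err != nil {
		panic(err)
	}
	var audit []string
	p.Authorize("cluster-a", "dns:UpdateRecord", &audit) // listed, same cluster
	p.Authorize("cluster-b", "dns:UpdateRecord", &audit) // wrong cluster
	p.Authorize("cluster-a", "dns:DeleteZone", &audit)   // not listed
	for _, line := range audit {
		fmt.Println(line)
	}
}
```

Denied requests are logged alongside allowed ones, since an audit trail that only records successes hides exactly the attempts a reviewer would want to see.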

A new data principle

Your data, you own it, in your own bucket.

Managed services give us tremendously helpful building blocks to build bigger and better data products. Fundamentally, progress relies on standing on the shoulders of giants: people who have honed their craft and operationalized foundational services to build what hasn’t been built before. 

While we did get infinitely scalable services, we also gave up ownership of our data and, with that, our access to it. In many ways, our data is often jailed by SaaS providers. A single query can cost you hundreds of thousands of dollars, and worse, the only way to access your data is via a single, very use-case-specific API, not because it is the best way to access the data — and therefore pay for it — but because it is the only way. 

The future is no longer about the separation of compute and storage, but about the separation of many compute engines (any service) from data. The future will be about letting the best-of-breed engineering design win in a world where access is not gated but integrated through open formats like Apache Iceberg™ or Delta Lake.

As a matter of principle, you own your data in your own lakehouse (bucket) using your own compliance and security standards; you control access to the query engines that deliver the value you need at a point in time.

As an engineer, while costs will always be part of the solution space, you want to focus on the best possible design without being forced into alternative designs that solve your users’ problems in a subpar manner, all because your computational and dollar budget is spent merely SELECT-ing from a table.

A simple, secure cloud. This is the way

The world considered the main pillar of BYOC to be the prevention of horizontal escalation of privileges for cross-account access, but it was actually a profoundly fundamental shift in how we see the world. Your data? You own it. No other data plane or control plane should be able to affect your applications, and you should always be in control. We built the path we wished existed when consuming services ourselves.  

We wrote our cloud twice — from scratch. We burned through millions of dollars because when we were exploring the design space, it wasn’t immediately obvious how these pillars came together cohesively. So, a massive thank you to our early design partners, without whom we would not be here today. Thank you. We are privileged to partner with you every day, and with the product and engineering teams who built it.

Check out our thoughts on Sovereign AI, which extends the concept of total ownership over your data when building AI applications for the enterprise.

Interesting reads

How BYOC fits into your cloud governance framework 

Bring Your Own Cloud (BYOC): best of both worlds 

Data sovereignty is the future of cloud 

Public cloud security breaches 

Multi-tenancy cloud security: definition & best practices

The risks of multitenancy in cloud computing
