Get the data privacy and sovereignty of self-hosting with the ease and scalability of fully-managed services—all in your own cloud

By Christina Lin on May 4, 2023
Bring Your Own Cloud (BYOC): best of both worlds

As you modernize a legacy system for greater autonomy and faster service delivery, one of the highest-stakes decisions is who manages your streaming data platform, and where. That platform sits at the heart of your data pipelines and keeps data flowing seamlessly throughout your entire system.

Currently, the two most common options on the market for running these platforms are self-hosting and a service managed and hosted in the vendor’s cloud (PaaS). Both have their own pros and cons.

With self-hosted, there’s a hefty upfront investment and you’re on the hook for planning the necessary infrastructure and human resources to keep everything up and running. It’s also less flexible to scale since you’d need to do everything in-house. The upside is that you’d have full control over your data, which makes self-hosting an attractive option for companies focused on data privacy, data security, and data sovereignty.

On the other hand, platform as a service (PaaS) is where everything is managed for you in the cloud. It’s like a one-stop shop where setup, monitoring, maintenance, and scaling are all taken care of. However, not every company trusts third parties with their data, and the lack of transparency, access control, and control over data residency can be a major deal-breaker.

Meet BYOC. Fully-managed cloud services—your way

With today’s companies feeling the heat when it comes to guaranteeing data privacy and security, it’s no surprise they want to provision their clusters within their own virtual private cloud (VPC) and keep their data contained in their own environment. However, they also want to offload operations and maintenance to free up their teams. Basically, they want the best parts of both self-hosted and PaaS.

That’s why we developed Bring Your Own Cloud (BYOC)—a fully managed Redpanda cluster hosted on your cloud while Redpanda takes care of operations, monitoring, and maintenance. That means complete control over your data in the cloud along with the relief of fully-managed services. Best of both worlds, indeed!

Furthermore, BYOC’s privacy-first architecture supports compliance for streaming data and lets you scale on your own infrastructure while meeting data sovereignty requirements. If you’d like a more detailed explanation of how BYOC works, read our post on why we think data sovereignty is the future of cloud.

Now, let’s go over how you can set up your Redpanda BYOC cluster, and a few different ways you can connect to your cluster to stream data.

How to set up a BYOC cluster in your own cloud

Installing a BYOC cluster is pretty simple. We can break it down into three steps.

1. Provision appropriate access credentials in your cloud

  • The data-plane agent provisions and configures infrastructure in your cloud, so you’ll need to grant it the proper permissions to do its job.

2. Choose your preferences

  • Simply provide an estimate of your throughput to help us determine the size of your cluster. Redpanda will do the heavy lifting. (This can be changed later if you need to scale.)
  • Tell us which cloud provider you’re using (AWS or GCP).
  • Determine if you need public or private endpoint access.
  • Decide where you want the cluster to be located (Region, AZ).

3. Initiate deployment

The agent is deployed in your cloud first and kicks off the installation. During the installation, the agent sets up the network across multiple availability zones, but installs the cluster in a single AZ if you choose to do so. It also provisions an Amazon S3 or Google Cloud Storage (GCS) bucket for Tiered Storage, as well as a Kubernetes cluster, a Redpanda cluster, and Redpanda Console.
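To make the step 2 choices concrete, here is a minimal Python sketch that collects them in one place and checks them before deployment. The `ByocPreferences` class, its field names, and the example region and zone values are all hypothetical; this is not a Redpanda API, just one way to picture the inputs you will be asked for.

```python
from dataclasses import dataclass

SUPPORTED_PROVIDERS = {"aws", "gcp"}      # the two providers named in this post
ENDPOINT_ACCESS = {"public", "private"}


@dataclass(frozen=True)
class ByocPreferences:
    """Hypothetical container for the step 2 choices (not a Redpanda API)."""
    throughput_mbps: float   # estimated throughput, used to size the cluster
    provider: str            # "aws" or "gcp"
    endpoint_access: str     # "public" or "private"
    region: str              # for example "us-east-1"
    zones: tuple = ()        # one AZ, or several for high availability

    def validate(self) -> list:
        """Return a list of human-readable problems (empty when valid)."""
        problems = []
        if self.throughput_mbps <= 0:
            problems.append("throughput estimate must be positive")
        if self.provider.lower() not in SUPPORTED_PROVIDERS:
            problems.append(f"unsupported provider: {self.provider}")
        if self.endpoint_access.lower() not in ENDPOINT_ACCESS:
            problems.append(f"unknown endpoint access: {self.endpoint_access}")
        if not self.region:
            problems.append("region is required")
        if not self.zones:
            problems.append("at least one availability zone is required")
        return problems
```

Calling `ByocPreferences(50.0, "aws", "private", "us-east-1", ("use1-az1",)).validate()` returns an empty list, while an unsupported provider or a missing region shows up as a readable message.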

If you’re more of a visual learner, watch this short video explaining what BYOC is and how to install a BYOC cluster:

Example: streaming data with your BYOC cluster

Depending on your preference, you can access the cluster either through the internet or through VPC peering, and then start streaming data into the cluster. We created a quick demo showcasing different ways to connect to your BYOC cluster. Below is a diagram of the setup.


Diagram of how you can connect to your BYOC cluster

Basically, a simulator microservice (Python) is deployed in Kubernetes (K8s) and continuously publishes signal events. The Kubernetes cluster sits in its own VPC and connects to the BYOC cluster via VPC peering.
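The demo’s actual code lives in the GitHub repo, so treat the following as a rough sketch of what such a simulator could look like, not as the demo itself. The event fields, the function names, and the choice of the `kafka-python` client with the `SCRAM-SHA-256` mechanism are assumptions made for illustration. The publishing loop takes an injected `send` callable so it can be tried without a broker.

```python
import json
import random
import time
from typing import Optional


def make_signal_event(device_id: str, now: Optional[float] = None) -> dict:
    """Build one simulated signal event (field names are illustrative)."""
    return {
        "device_id": device_id,
        "signal": round(random.uniform(0.0, 1.0), 3),
        "ts": now if now is not None else time.time(),
    }


def encode_event(event: dict) -> bytes:
    """Serialize an event as UTF-8 JSON, the payload sent to the topic."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def publish_signals(send, topic: str, device_ids, rounds: int = 1) -> int:
    """Publish `rounds` events per device through send(topic, key, value)."""
    sent = 0
    for _ in range(rounds):
        for device_id in device_ids:
            event = make_signal_event(device_id)
            send(topic, device_id.encode("utf-8"), encode_event(event))
            sent += 1
    return sent


def kafka_send_factory(bootstrap_servers: str, username: str, password: str):
    """Return a send callable backed by kafka-python (assumed to be installed)."""
    from kafka import KafkaProducer  # imported lazily; optional dependency

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        security_protocol="SASL_SSL",
        sasl_mechanism="SCRAM-SHA-256",  # assumed; use what your cluster enables
        sasl_plain_username=username,
        sasl_plain_password=password,
    )
    return lambda topic, key, value: producer.send(topic, key=key, value=value)
```

In a real deployment you would pass `kafka_send_factory(...)` the bootstrap address of your BYOC cluster as seen over the peered VPC, then call `publish_signals` in a loop.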

A second client, a consumer built with Quarkus, consumes the events externally. We set up the BYOC cluster in a public subnet so that this client can reach it through the internet gateway. The signal events also trigger a Lambda serverless application, with the BYOC cluster taking the place of MSK or SNS as the event source. The Lambda service sits in its own VPC too, and, as with AWS MSK and AWS Kinesis, establishing a VPC peering connection is all it takes to connect the two.

In the demo, we enabled SASL for authentication and used AWS Secrets Manager to store the credentials for the Lambda trigger. If you do the same, make sure you update the access policy for your Lambda role so it has permission to read the stored credentials.
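As a sketch of the Lambda side, the snippet below shows the kind of IAM statement the Lambda role needs in order to read the stored credentials, plus a handler that decodes incoming records. It assumes the event shape AWS documents for Kafka event sources (records grouped under topic-partition keys, with base64-encoded values) and assumes the payloads are JSON. The function names and the ARN you pass in are placeholders, not values from the demo.

```python
import base64
import json


def secret_access_statement(secret_arn: str) -> dict:
    """IAM policy statement letting the Lambda role read the stored credentials."""
    return {
        "Effect": "Allow",
        "Action": ["secretsmanager:GetSecretValue"],
        "Resource": secret_arn,  # placeholder; use the ARN of your own secret
    }


def handler(event, context):
    """Decode each Kafka record delivered to the function."""
    signals = []
    # Records arrive grouped under "<topic>-<partition>" keys.
    for batch in event.get("records", {}).values():
        for record in batch:
            payload = base64.b64decode(record["value"]).decode("utf-8")
            signals.append(json.loads(payload))  # assumes JSON payloads
    return {"processed": len(signals), "signals": signals}
```

You can exercise `handler` locally by passing it a dictionary with a `records` key and `None` for the context, which is a quick way to check your decoding logic before wiring up the trigger.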

Here’s a video demo on how to connect to your BYOC cluster. To get your hands on the code used in this demo, visit this GitHub repo.

Operate and administer your BYOC cluster—while staying in control of your data

Operation

BYOC is a fully managed service, but that doesn’t mean your clusters depend entirely on the control plane. The separation of the control plane (in Redpanda Cloud) and the data plane (in your own cloud) allows the system to run as usual when the control plane is down.


Diagram showing how cluster operation works with BYOC

The installed agent not only takes care of bootstrapping, but also configures and maintains the cloud infrastructure, K8s resources, and software artifacts. Rather than receiving commands pushed from the control plane, the agent pulls its instructions from it. This ensures the Redpanda control plane doesn’t hold credentials or excessive permissions. Lastly, the agent doesn’t collect or distribute any metrics, since the metrics are all collected via the cloud provider’s API endpoint.
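The pull model is easier to see in a few lines of code. This is a toy Python sketch, not Redpanda’s agent: `ControlPlaneStub`, `Agent`, and the instruction strings are all made up for illustration. The point is that the agent initiates every exchange, and an unreachable control plane simply means nothing changes.

```python
from collections import deque


class ControlPlaneStub:
    """In-memory stand-in for the control plane; it only answers requests."""

    def __init__(self, instructions):
        self._queue = deque(instructions)

    def fetch_instructions(self, agent_id: str) -> list:
        """Return and clear the pending instructions for this agent."""
        pending = list(self._queue)
        self._queue.clear()
        return pending


class Agent:
    """Hypothetical data-plane agent that pulls work and applies it locally."""

    def __init__(self, agent_id: str, control_plane):
        self.agent_id = agent_id
        self.control_plane = control_plane
        self.applied = []

    def poll_once(self) -> int:
        """Pull instructions; if the control plane is down, change nothing."""
        try:
            instructions = self.control_plane.fetch_instructions(self.agent_id)
        except ConnectionError:
            return 0  # the cluster keeps running in its current state
        for instruction in instructions:
            # A real agent would act here using credentials held in your cloud.
            self.applied.append(instruction)
        return len(instructions)
```

Because the control plane never opens a connection into your environment in this design, it has no need for credentials to your cloud account.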

Redpanda is updated through rolling upgrades. You can also do a blue/green deployment or canary release with clusters running different versions, introducing the upgrades at your own pace to ensure zero application downtime.

Administration

With the basics and infrastructure all taken care of, you can focus on important cluster administration tasks, like:

  • User security access

    • Assign permissions for cluster access and set up encryption in transit.
  • Partitioning strategy

    • Balance load based on how your consumers behave.
  • Monitoring

    • Integrate the metrics into your current monitoring tools and dashboard.

These configurations and credentials all reside in your own cloud. To see what’s happening with all your clusters, Redpanda Console is a developer-friendly tool that’s also hosted within your cloud. From the dashboard, you can check in on your topic to see the payload at a glance—a useful ability for developers debugging or generally designing their applications.


Diagram showing how cluster operation works with BYOC

Get started with Redpanda BYOC

Now that you’re up to speed on everything BYOC can do, let’s end with a few considerations to keep in mind before you start spinning up a BYOC cluster.

  • Throughput: This determines the initial size of your cluster. You’ll want to find the right balance between infrastructure cost and the performance level you expect.
  • Network layout: Data is transmitted through VPC peering (or, soon, transit gateway and private link), as is the case with any cloud service. Make sure to set up the cluster as close to the workload as possible (e.g., in the same AZ).
  • High availability: The cost (network latency and resources) of operating in a single availability zone is typically much lower, although it sacrifices the resiliency of the cluster. Production clusters should be reliable, but dev and testing clusters can be combined and run without high availability.

If you’re still on the fence about whether the BYOC deployment model is right for you, we’ll give you a quick cheat sheet. BYOC is the best choice if you’re looking for the following:

  1. Data privacy, compliance, and data sovereignty guarantees
  2. Fully managed, near-zero operations and maintenance
  3. A high-performing, Kafka-compatible streaming platform
  4. Significantly lower latencies and cloud spend

To give you a little extra reassurance, here’s a quote from one of our happy customers.

"Redpanda BYOC gives us a fully managed Kafka service running on our own cloud servers, balancing our internal compliance requirements with ease of use, and without compromising performance or compatibility."

— Kannan D.R., Enterprise Data Architect, LiveRamp

To experience a fully managed Redpanda cluster where your data always stays in your environment, get started with BYOC here. If you get stuck, have a question, or want to chat with our engineers and fellow Redpanda users, join our Redpanda Community on Slack.
