Join us for a live migration from Confluent Cloud to Redpanda

This episode of Real Time with Redpanda features a live migration from Confluent Cloud to Redpanda and an overview of Redpanda’s compatibility with Kafka. We also discuss the upcoming release of our Shadow Indexing tiered storage capabilities.

Featuring:
Alex Gallego, Redpanda CEO & Founder
Alex DeBrie, Host
Michal Maslanka, Redpanda Software Developer

What's in the video

  • How does migration from Confluent Cloud to Redpanda work?
  • Live migration – Step 1
  • Live migration – Step 2
  • Complete migration
  • How MirrorMaker works
  • Shadow Indexing
  • Latency
  • Wrap up

Skim the transcript

Alex (0.02)
Hey everybody, welcome to Real Time with Redpanda. I'm your host, Alex DeBrie. This is episode seven, and today we're going to be talking a lot more about real-time streaming with Redpanda. We've got a great show today. We're doing a live migration, so first I want to bring on our two guests.

A recurring guest, the CEO and founder of Redpanda, Alex G, is with us. And we've also got a new guest today: Michal Maslanka, a software engineer from Krakow, Poland. So, Alex, Michal, thanks for joining us today.

Alexander (0.30)
Thanks for having us, Alex. It's always great to be here. Today is going to be super fun. We're going to be talking about live migration. We are going to be demoing a live migration from Confluent Cloud into Redpanda Cloud, but it works with any Kafka cluster into any Redpanda cluster.

And we have this major milestone that we wanted to share with you guys. Some of the features we are talking about today are still landing, so stay tuned: the official GA of this release, for everyone to consume, will land next week.

But yeah, without further ado, let's get into it. And before we get started, I'd love to introduce everyone on the live stream to Michal Maslanka. Actually, Michal and I worked together at Akamai, not on the same project.

And so he was part of the team in Poland. And so when I started the company, I reached out to engineers that I thought were really good. And so I pinged Michal, and I was like, "Hey, I'm starting this new idea." This is in 2019, and I think Michal, you had just accepted a job at another company. I think you were a month there.

Michal (1.46)
Yeah, that was really cool, and thank you for having me here. Alex was saying that we would be hunting compiler bugs and working on really cool stuff, and that's really what it is.

Alex (2.01)
I'm just gonna say thanks for joining us. I love that you both are swagged up with the fancy Redpanda shirts. For sure. And so yeah, great to be here today. I just want to set out a little bit of what we're doing here.

You know, as Alex mentioned, we're going to be doing some migration today. I think Alex is going to get set up first and just show at a high level what's going on here: what the architecture is going to be and what the process is going to be.

And then Michal's got the demo to show it in practice. So could you start with the run-through here, Alex?

Alexander (02.31)
Yeah, cool, let's do it. I'll turn you on.

How does migration from Confluent Cloud to Redpanda work?

So, what do we have going on today? All right, so let's talk about the live migration. And then let me just expand my screen a bit. We're going to have Alex talking to Michal. This is going to be the first instance of this group: it's a chat application, and it's going to be connected to Confluent Cloud.

So, Alex is going to send messages to a Kafka cluster on Confluent Cloud, and then Michal is going to consume messages from Confluent. But this works with any vendor: MSK, Aiven, it doesn't really matter to us. It's just an easy way to run Apache Kafka, so we connected to it. And then the magic is this thing called Mirror Maker 2, which Ryanne Dolan wrote when he was at LinkedIn.

And so we're going to be starting up a Mirror Maker 2 process, and that was, I think, upstreamed into the main project. It's a really comprehensive tool that mirrors data and read ACLs, for some of you guys in the audience.

So if you start to mirror live data, just note that it won't propagate write ACLs. And then on the other end of the world, imagine a totally different cluster, region, cloud, it really doesn't matter: we're gonna be talking to Redpanda.

Alexander (04.18)
And so now, here's the interesting part. This green box runs on localhost, right? So imagine they're really just connecting to somewhere in your application. And the point of this is to eventually migrate the chat window onto a Redpanda cluster without any downtime, barring some TCP reconnections. And that's basically the gist. This is gonna be a two-step process. First, we are going to migrate Michal.

So, Alex is still going to send messages to Confluent Cloud. Note that Mirror Maker 2 is also going to run on localhost, on Michal's laptop. And so the first migration is we're going to move Michal, the consumer, into Redpanda Cloud, but Alex will still produce to Confluent Cloud, just to show the Mirror Maker bridge actually working and replicating data in near real-time.

And then we're going to move Alex and the result of this is that Alex is going to keep the consumer offset when it connects to Redpanda Cloud. That is key, right?

So imagine that you transfer a terabyte of data; you don't want your consumers to all of a sudden be sent back to the start. And one key thing that we should talk about today, we shouldn't leave this live stream without talking about it, is the difference between Mirror Maker 2 and shadow indexing. You know, when you're talking about terabytes, or actually, for some of our customers, we just crossed the first 100 petabyte per year workload, which is super cool in terms of historical data. So what happens when your scale is so large? And where is Mirror Maker 2 useful? Where is shadow indexing useful?

Alex (6.18)
Yeah, I love it. So, yeah, Mirror Maker is absolutely copying that data over, and not just, you know, the topic data like you're saying, but also those consumer offsets.

So it's basically transparent to that copy of the consumer reading from it, and then once that's sort of transferred over, caught up, then you can stop that consumer, restart that consumer, and it keeps on processing without even knowing what happened there.
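The flow described here (copy the topic data and the committed consumer offsets, then restart the consumer against the new cluster) can be illustrated with a toy model. This is only a sketch in plain Python: `ToyCluster`, `mirror`, and the group name `chat-michal` are hypothetical names invented for illustration, no real Kafka or Mirror Maker API is used, and the offsets line up one-to-one only because the topic is assumed to be mirrored from its first record.

```python
# Toy model only: plain Python lists and dicts stand in for Kafka topics and
# committed consumer-group offsets. No real Kafka or Mirror Maker API is used.

class ToyCluster:
    def __init__(self, name):
        self.name = name
        self.topics = {}      # topic name -> list of messages
        self.committed = {}   # (group, topic) -> next offset to read

    def produce(self, topic, message):
        self.topics.setdefault(topic, []).append(message)

    def consume(self, group, topic):
        """Return the unread messages for a group and commit the new position."""
        start = self.committed.get((group, topic), 0)
        log = self.topics.get(topic, [])
        self.committed[(group, topic)] = len(log)
        return log[start:]


def mirror(source, target):
    """Copy records and committed offsets from source to target."""
    for topic, log in source.topics.items():
        target.topics[topic] = list(log)
    target.committed.update(source.committed)


source = ToyCluster("source")
target = ToyCluster("target")

source.produce("michal", "Hey, Michal")
source.produce("michal", "Let's do the MM2 migration")
first_read = source.consume("chat-michal", "michal")  # reads both, commits offset 2

source.produce("michal", "Still there?")  # produced before the consumer switches
mirror(source, target)

# The consumer restarts against the target and resumes at the committed offset,
# so it only sees the message it has not read yet.
after_switch = target.consume("chat-michal", "michal")
print(after_switch)
```

The point of the sketch is the last step: because the group's position travels with the data, the restarted consumer does not re-read the first two messages.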

Alexander (6.36)
Exactly. And so, Michal, if you want to just take it away and show us the demo.

Michal (6.44)
Of course. So, oh, we have a little inception here, but here's what we have. So basically, I have my terminal here, and I created a simple application, the one that Alex talked about. It creates a Kafka producer and a Kafka consumer, and we have basic security here as well.

So we are able to produce to any topic, but we are only able to consume from the topic that we are actually connecting to. And the whole thing is connected like this. So we have this source Alex user and source Alex pass, "source" because that's in the source cluster. I want to connect here as the user Alex and chat to Michal, and this is connecting to the cluster in Confluent Cloud.

And what is important is that in this Confluent Cloud cluster, we have these two topics here, right? They are empty now, and right now there are no consumer groups here because we are not consuming anything yet.

And that's like a fresh setup, so you will be starting from scratch right now. And let's try to see what will happen. So this is all started basically like that. Alex wrote me, "Hey, Michal." And I responded, "Hello, Alex."

And these offsets, this is just debugging information here, so we know exactly which offset we are reading at and which we are producing to. And Alex said, "Let's do the MM2 migration."

Michal (8.32)
And I said, "Yeah, sounds great." Right. And so this is basically the chat app. So we are able to connect, and this is all working right now in the Confluent Cloud. So let's see if some groups are created here.

Alex (8.50)
And one thing I want to point out here, if you're looking at that UI: that's the Kowl UI, you know, now part of Redpanda. We talked about that on the last episode of Real Time with Redpanda. I'll post that in the chat if you want to go check that out.

We had a great session on what Kowl is doing and how it sort of completes the puzzle in terms of a UI for what's going on in your Redpanda cluster.

Michal (9.10)
Yeah, exactly. It's so convenient and so easy to use that I even forgot to mention that. It's really great. It makes life a lot easier, and it's also good-looking.

So I just wanted to show you here that we have this "gap cup item" consumer, and it's consuming a single partition, and it's at the end of this partition right now with zero lag. So this is all working as expected, right?

So now if we turn this client off and turn it back on, we will not be reading the messages from the start. That's what Alex said.

So we have the consumer group, and we continue where we left off because the offset is committed and stored in the source cluster, right? And now I would like to show you one more thing.

So we switch off the Michal client here. If I would like to read Alex's messages, I would connect here to Alex, right?

I would try to connect to Alex with my credentials, and that should not be possible, because I'm unauthorized to connect. So this app is working with basic authorization here, like this, right?

Alex (10.36)
So just to clarify for folks, so far we're still on that single, you know, existing Kafka cluster. We've got our two clients, both producing and consuming from it, chatting to each other. What's next?

Michal (10.48)
Exactly. And the next part is to run the Mirror Maker, right? Right now the Mirror Maker is set up in such a way that it's gonna connect to the source cluster and transfer over topics, read ACLs (topic read ACLs, exactly), and consumer groups to the target cluster. So now I'm going to do it here. You can see that it's a simple process.
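For readers who want a feel for what such a setup might look like, here is a minimal sketch that renders a Mirror Maker 2 properties file from Python. The property keys are standard Mirror Maker 2 settings, but the values are assumptions: the demo's actual file is not shown in the episode, the hostnames are placeholders, the identity replication policy is a guess based on the topics keeping their names, and the per-cluster SASL and TLS settings are left out entirely. `render_mm2_properties` is a hypothetical helper name.

```python
# Sketch of a Mirror Maker 2 properties file, rendered from Python.
# Hostnames are placeholders; per-cluster credentials and TLS/SASL settings
# are omitted. The demo's real configuration is not shown in the episode.

def render_mm2_properties(source_bootstrap, target_bootstrap,
                          topics=".*", groups=".*"):
    settings = {
        "clusters": "source, target",
        "source.bootstrap.servers": source_bootstrap,
        "target.bootstrap.servers": target_bootstrap,
        # Enable the one-way flow from the source cluster to the target.
        "source->target.enabled": "true",
        "source->target.topics": topics,
        "source->target.groups": groups,
        # Assumption: keep topic names unchanged on the target
        # (this policy class ships with Kafka 3.0 and later).
        "replication.policy.class":
            "org.apache.kafka.connect.mirror.IdentityReplicationPolicy",
        # Sync topic ACLs, and emit checkpoints so that consumer group
        # offsets can be carried over to the target.
        "source->target.sync.topic.acls.enabled": "true",
        "source->target.emit.checkpoints.enabled": "true",
        "source->target.sync.group.offsets.enabled": "true",
    }
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"


mm2_properties = render_mm2_properties(
    "source.example.com:9092", "target.example.com:9092"
)
print(mm2_properties)
```

A file like this is typically passed to the `connect-mirror-maker.sh` script that ships in the Kafka distribution's `bin` directory.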

Alexander (11.21)
One quick thing to note is that Mirror Maker 2 is using the upstream Apache Kafka APIs, and so it should work just like any other Kafka application, just like effectively any of our customers. We take pride in trying to support you know, sort of this seamless migration into Redpanda.

Right, we literally built the TCP parser of the Kafka protocol so you don't have to. And what it gives us is this ecosystem compatibility. We’re showcasing Mirror Maker 2 but it would be true for Clickhouse and Mongo and all of these databases. What version did you download, Michal?

Michal (12.10)
It’s actually 3.0. The only thing you need to remember is to install Java. So that’s running. This is now a source cluster so we can see in the topics menu that some topics are created by the Mirror Maker. Let’s see if we have something in our target cluster.

Yeah so we have Michal and Alex topics here. Let's see if they have some messages. So we can see the messages were mirrored. Consumer groups aren't here because we do not yet consume anything from it. But they will be here in a minute. Sorry for that.

So let's try to run something like this. We will now connect to the target cluster, just to show you that it's actually connecting to a different cluster here. We have different credentials because users, as we said, are pre-created in the clusters, as that's not part of the Kafka protocol.

And users in both Confluent Cloud and Redpanda Cloud are handled by different APIs. This is, as you can see here, connecting to Vectorized cloud, and let's run it.

Alexander (14.10)
Alex, do you want to switch back to my screen real quick so I can showcase what Michal just did? So we just did this migration where Michal is now Michal underscore target.sh.

So in a second we're going to produce from Alex to Confluent Cloud and we're going to consume from Michal connected to Redpanda.

And let me just highlight that real quick here. Let me erase a bunch of lines and then say this black line is no longer the case, and now we just have this proper line here, so this is now the connection.

Michal (15.31)
So exactly as Alex said, we should now be able to see this. It will of course have a little bit of lag because it may take some time. Let's look here into this target topic. There was some connection issue. There we go, yeah, so we have it here.

Alex (16.32)
So just in terms of flow there: Alex is still producing to the existing Kafka cluster, and that's getting mirrored over to Redpanda, which Michal is now consuming from.

So at some point, once Mirror Maker has replicated you over, you just stop your consumers and start them again.

Michal (16.53)
Yeah, and this is what I've done for Michal. I stopped the consumer and switched it over, so it's like that. What is different here is the security setup, because Confluent by default uses SASL PLAIN and we are using the SCRAM-SHA-256 method. The address is obviously different and the credentials are different, but all the other things were mirrored by Mirror Maker, and we can eventually connect Alex to the target here.
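To make the client-side change concrete, the sketch below compares two illustrative client configurations. It assumes librdkafka-style configuration keys and `SASL_SSL` on both sides; the hosts, usernames, passwords, group id, and the `retarget` helper are all placeholders invented for illustration, not values from the demo.

```python
# Illustrative client settings using librdkafka-style keys.
# Hosts, credentials, and the group id are placeholders, not demo values.

SOURCE_CLIENT = {
    "bootstrap.servers": "source.example.com:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": "source-alex-user",
    "sasl.password": "source-alex-pass",
    "group.id": "chat-alex",
}


def retarget(config, bootstrap, mechanism, username, password):
    """Return a copy of config pointed at another cluster; the rest is unchanged."""
    updated = dict(config)
    updated.update({
        "bootstrap.servers": bootstrap,
        "sasl.mechanism": mechanism,
        "sasl.username": username,
        "sasl.password": password,
    })
    return updated


TARGET_CLIENT = retarget(
    SOURCE_CLIENT,
    "target.example.com:9092",
    "SCRAM-SHA-256",
    "target-alex-user",
    "target-alex-pass",
)

# Only the address, the SASL mechanism, and the credentials differ.
changed = sorted(k for k in SOURCE_CLIENT if SOURCE_CLIENT[k] != TARGET_CLIENT[k])
print(changed)
```

Keeping `group.id` the same on both sides is what lets the client pick up the mirrored consumer group rather than start a new one.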

Alexander (17.31)
Alex, did you want to switch over onto my screen? Okay, let's see his demo. Before you hit enter, I just wanna tell people the thing about demos: I feel that when I show one to my wife, it's very anticlimactic to people that aren't, like, in this.

Just because it's like, yo, here's this line, and then it appears on this other terminal, and the nerd in me is just like, this is really cool. And then I showed this to my office, and it's just like, "So what happened?" I was like, "Look, it showed up on this other terminal," and it's, uh, yeah.

But anyways, we should switch over onto the graph so I can keep people tuning into the Twitch stream up to date, because it happened very fast. What we just did is we took this line, and now we're here, so we're now completely migrated. But I think the subtle thing is that we had produced two messages, so offset two, technically, right?

This was offset two, and magically, when Alex migrated over onto Redpanda Cloud, it stayed at offset two. This is the pièce de résistance, this is sort of the salient point.

You don't have to reconsume your entire data and your consumer groups are checkpointed and obviously you can produce and consume and all of these things work right in the same Kafka API.

We use SASL SCRAM, we use different authentication, and it went through the ACL control system and the back pressure and abuse prevention.

There's possibly seven layers of auditing and security before you were actually able to consume that. So when it all works, it's anticlimactic, because you type on one terminal and then it shows on the other terminal, but believe me, it's kind of sophisticated. Back to you.

Alex (19.37)
I agree. One thing I like about the visualness of it is going into Kowl: you can see you're connecting to your Redpanda cluster from Kowl, and you can see all those messages you previously sent and all those new messages as well. So I think that's a nice visual element of that as well.

Alexander (19.53)
Totally! Michal, anything else that you want to showcase on your end, or should we start to talk about the value of MM2 versus shadow indexing?

Michal (20.08)
I think that’s going to be all from the side of this migration itself. So we migrated the cluster successfully and now we have two applications, two clients running on the target cluster and they continue where we left off. I know that was a simplistic demo as you said.

Alex (20.27)
Yeah the great part is how seamless this is and how it works and you can completely switch clusters and take advantage of the Redpanda performance but also the shadow indexing and things like that. Alex one thing I want to say real quick just to the folks that are watching.

If you have any questions feel free to drop them in the chat and we'll bring up an answer here but otherwise we'll just talk a little bit more about issues you might see while migrating or how to think about this stuff or how it all interacts together.

I know we had one point on how Mirror Maker works with the ACLs and things like that. Do you want to talk about that and what to think about there?

Alexander (21.01)
Oh yeah. Michal, do you want to cover the source code of Mirror Maker 2? If you want to bring up the GitHub link. I think it's a detail that is useful for people using Mirror Maker, but anyways, go ahead.

Michal (21.18)
Definitely, that's something we need to remember. So Mirror Maker is not fully replicating all the ACLs that are set up in the cluster.

So to prepare that demo I had to pre-populate some of the ACLs on the target cluster, because, as you can see here, right, this is the code that syncs topic ACLs. You can see that we list all ACLs and then we filter them so that only those whose resource type is topic and whose pattern type is literal remain.

Then there is this small method, shouldReplicateAcl, which checks that the ACL is not one that allows write. So basically, part of your ACLs will not be migrated automatically.

So this is something that is good to remember when you are doing this, whether that's from Redpanda, to Redpanda, or to any other Kafka-compatible system.
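The filtering Michal walks through can be modeled in a few lines. This is a Python paraphrase of the logic as described on the stream, not the Mirror Maker source itself (which is Java); the `Acl` dataclass, `acls_to_sync`, and the sample principals are hypothetical, and the real connector also restricts the result to topics that are actually being replicated, which is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    resource_type: str   # e.g. "TOPIC" or "GROUP"
    pattern_type: str    # e.g. "LITERAL" or "PREFIXED"
    resource_name: str
    principal: str
    operation: str       # e.g. "READ" or "WRITE"
    permission: str      # "ALLOW" or "DENY"


def should_replicate_acl(acl):
    """Skip ACLs that allow WRITE; everything else passes this check."""
    return not (acl.permission == "ALLOW" and acl.operation == "WRITE")


def acls_to_sync(acls):
    """Keep literal topic ACLs only, then drop the ones that allow WRITE."""
    return [
        acl for acl in acls
        if acl.resource_type == "TOPIC"
        and acl.pattern_type == "LITERAL"
        and should_replicate_acl(acl)
    ]


sample = [
    Acl("TOPIC", "LITERAL", "alex", "User:michal", "READ", "ALLOW"),
    Acl("TOPIC", "LITERAL", "alex", "User:alex", "WRITE", "ALLOW"),
    Acl("GROUP", "LITERAL", "chat", "User:michal", "READ", "ALLOW"),
    Acl("TOPIC", "PREFIXED", "chat-", "User:michal", "READ", "ALLOW"),
]

synced = acls_to_sync(sample)
for acl in synced:
    print(acl.principal, acl.operation, acl.resource_name)
```

In this sample only the first entry survives, which is why the write ACLs (and anything that is not a literal topic ACL) have to be created on the target by hand.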

Alex (22.23)
So make sure you're testing those migrations in staging and, you know, setting up your user lists, I believe, when you do, and then also setting up those write ACLs, the ones that get filtered out here, just to make sure everything's going to work well once you switch everything over.

Alexander (22.40)
One thing to note is, obviously, I think it makes sense as a default because a lot of these are done for disaster recovery, right? And so you may want to have separate ACLs on different clusters.

Your front-end proxies might be able to connect to two different clusters. Typically, on an application, we have a proxy of some sort, let's say a clickstream if you're in ad tech, or fraud detection if you're a bank, etc. Those things will translate what your app does into a log that eventually shows up in Redpanda.

So I think it's a safe default. Just something for you: if you want to make sure you have write access to both clusters, you do have to propagate the write access yourself. Here's the thing for most people.

Disaster recovery is like a one-time thing, you know: you spend the time on the infrastructure, you set it up on whatever you're running your clusters or containers on, whether it's Kubernetes or something else.

Once you set up this process, it's not really a thing that you need to worry about too much, right? Someone else does, or maybe if you're a small shop then you do worry about setting it up.

But if you want these kinds of live migrations, it typically is for customers that are a little bit more sophisticated in terms of their failure recovery and they have the budget, which I want to talk about next.

Alex (24.15)
That's right. That was a great explanation. Just a note for other people: we're walking through this demo visually, and Michal did a great job with that. If you want additional details, there is a post on the Redpanda website about how to do this migration. I'm dropping that into the chat right now, so make sure you check that out if you're interested.

Alexander (24.39)
Do you want to bring up my tablet? So while Michal was talking, I just wanted to give a sense of when it is appropriate to use Mirror Maker and when not, and how it relates to shadow indexing.

The way I typically like to think about it, and there is flexibility in both, is that it's a tool, and then you decide how much money you want to pay for the tool. Think of Mirror Maker that way, and the same thing with Redpanda, and so on.

Generally speaking, what we have seen is that if you want low latency synchronization of clusters, like we showed, you know, once all the consumer groups have started to populate and all that, then it's in, like, human real time: you can potentially chat and see things react. Let's say you were building a Slack replacement of sorts; that's where I would say Mirror Maker 2.

It matters the most when you want this really super low latency thing. It's expensive to run if you're running this across multiple clouds, or I think even across multiple regions of any particular cloud, and so for other things we recommend using our shadow indexing, which leverages internal S3 tiered replication.

So instead of using Mirror Maker 2, which uses resources on both ends. Let's think about what this actually means. It means that from the network side you're effectively halving your entire total resources.

And by pushing the complexity around to S3, you get back effectively twice the capacity of both clusters, source and destination. So let's talk about that real quick, and then we're going to wrap up in about five to ten minutes.

Mirror Maker 2 gives you this super low latency. Generally speaking, it's because you care about that sort of instantaneous failover, like it's a hot cluster, usually. Within a second of latency, or whatever it takes MM2 to actually send data over onto the other cluster, is what you want when you really need that instantaneous failover, like you have SLAs. You know, let me give you an example.

Alexander (26.59)
Let's say that you're trading a billion dollars and you're settling a bank account across a bunch of banks. It doesn't really matter if you spent a hundred dollars, or five dollars, or even a couple bucks on sending data across multiple clouds with Mirror Maker 2.

Getting that wrong would be catastrophic at a really large level, right? Now compare that with doing ad tech or clickstream, where you're really just making cents on the dollar, let's say a cent on a dollar or maybe four cents on a dollar.

Then you would still want that, but really what you care about is the cost of doing that live migration.

It’s probably more expensive than if you leverage internal S3 aggregate data translation. So let's talk about that for a second, but that's generally the guidance, and only you know where you are in this spectrum.

You just move this line here, because this is for you. This is really where Mirror Maker is: you want the majority of your data to be real time.

Or maybe you move the aggregate data all the way here because you want to keep a cheap or cost effective way of transferring data so really think about that as a way to reason about the cost of your infrastructure.

Next, I really want to highlight two things. One is the cost of Mirror Maker, computationally speaking; often this translates to the mechanical cost and network bandwidth.

And then really the one that makes out like a bandit is the hypercloud, so you know, whether it's Amazon or Google or Azure, they're the ones that make all the money. Not the vendor, nor the person who's actually replicating data.

But when you have a cluster one and a cluster two and you're mirroring traffic on both ends, you have to take into account the local traffic of both clusters. So if you think about the cost of traffic, you're putting pressure on all of these clusters, right? And so the capacity of your cluster is technically this intersection for both sides.

That really is the true capacity that you can do because you're consuming network resources here which is the same computers that are also consuming network resources here and same thing for consuming and producing here and there.

Now, when you change this paradigm and you add shadow indexing let me give you a mental model for how this changes.

So you still have cluster one and you still have cluster two and you may experience higher latency to be able to consume on this side.

But what you've effectively done is that this transfer between clusters becomes the S3 API, and it could be GCS for Google and Azure Blob Storage for Azure. But the point is that now, if you really do this Venn diagram, your total capacity is largely the sum of all of this and the sum of all of that. Actually, they don't overlap; let me think about that. I guess for real time this is true, and if you are reading from historical data then you will consume some additional resources for reading historical data.

Alexander (30.41)
But by and large, if you're really talking about just disaster recovery, you get the union, and I don't think it's overlapping, rather than the intersection. So when you start to think about cost structures, it matters, basically, from a cost perspective.

So just to wrap up, the mental picture is that Mirror Maker 2 is really about low latency, and only you can determine that, based on what you actually care about.

Is your use case important enough, or is it truly mission critical enough, that you want this low latency, actual real-time data replication? From a mental perspective, that means you're halving the capacity of your clusters. Or would you rather have the union of this, at the expense of having higher latency?
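The "halving" intuition can be put into back-of-the-envelope arithmetic. The sketch below rests on a simplifying assumption that is not stated in the episode: every mirrored byte costs a cluster as much network capacity as a byte of regular client traffic. The function name and the 1000 MB/s figure are made up for illustration.

```python
def usable_client_throughput(cluster_capacity_mb_s, mirrored_fraction):
    """Client throughput left over once mirroring traffic shares the same budget.

    Assumes each mirrored byte costs as much as a client byte, so a cluster
    that mirrors a fraction f of its traffic spends (1 + f) units of capacity
    per unit of client traffic.
    """
    if not 0.0 <= mirrored_fraction <= 1.0:
        raise ValueError("mirrored_fraction must be between 0 and 1")
    return cluster_capacity_mb_s / (1.0 + mirrored_fraction)


# Mirroring everything through the brokers: half of the budget is left.
fully_mirrored = usable_client_throughput(1000.0, 1.0)

# Handing the transfer to object storage instead: the full budget is left.
not_mirrored = usable_client_throughput(1000.0, 0.0)

print(fully_mirrored, not_mirrored)
```

Under that assumption, mirroring all traffic leaves 500 MB/s for clients on a 1000 MB/s cluster, while moving the transfer to object storage leaves the whole 1000 MB/s, at the cost of higher replication latency.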

So if you think about that, and we covered this on the previous live stream, what happens is, let's say you have three Redpanda nodes and they're all talking to each other. When segments get evicted, they get uploaded onto S3; imagine those are files.

So if you have a separate cluster, let's say cluster one, and it also is a set of Redpanda nodes, then what shadow indexing allows you to do is fetch data using the S3 API to basically bring data local to that particular AZ or region, and then download it and render it to whatever application is connected to the Redpanda cluster.

So this is higher latency but much higher throughput, because you're effectively leveraging the S3 internal API. I think there's an S3 API called CopyObject to put it in a different bucket. It is higher latency, but it is also very high throughput.

We haven't run into a single customer that has saturated the throughput of the S3 copy API across regions. Even during a disaster recovery case, most real-time applications are less than a few gigabytes per second, and S3, depending on your contract size and how the sharding strategy is done, which we take care of, really gives you multiple gigabytes per second of download.

I'll stop talking here. This is super technical and we went down into the weeds, but hopefully it gives you an idea of how to think about the difference between the systems.

Alex (33.25)
Just to make sure I'm understanding that: if you had a hot-hot setup with two different regions and you're using tiered storage, are you saying that if I'm pulling up a new hot region, I wouldn't have to run everything through my Redpanda cluster and then pull it down from S3? I could just sort of copy it from S3 and it can adjust from there?

Alexander (33.45)
Yeah, and Redpanda handles that, so it's literally a configuration object: on the destination cluster you say, Redpanda, apply this read replica region. That's coming later this quarter, probably.

What works today is transparent tiered storage; that's something that is part of the enterprise today. The idea here is that Redpanda should just be able to use a bucket as a read replica, so there is no active load for transferring the data. And there are so many layers of security.

There's less CPU utilization because you're no longer decrypting messages, you're no longer validating the ACLs; all of that happens transparently for you at the file handle level.

So it's super efficient in terms of cluster resources. Beyond just the network and disk utilization, there's also a huge value add in terms of CPU load and things like that.

Alex (34.44)
That's amazing. Shadow indexing is a super interesting thing; you know, we talked about it a lot on previous episodes of Real Time with Redpanda. I'm gonna drop a few links. Tomorrow there's gonna be a virtual workshop, so make sure you sign up for it.

I just dropped it in the chat. That'll be tomorrow, so make sure you join, and you'll get to hear about use cases and some of the details that Alex is talking about here.

So if you have questions about how shadow indexing works and how it can work for your situation, I think that'd be a great thing to check out. Alex, Michal, this has been great. Any other things we should hit on before we go?

This is super interesting, to see these migration use cases and the hot use cases with Mirror Maker and things like that.

Alexander (35.28)
One last thing I'll mention is __consumer_offsets. This is an internal topic that is used in Kafka by additional tools. Kowl uses this; there's basically this ecosystem of tools. LinkedIn's Burrow is a project that may be using this.

There's a set of projects that leverage this as a way to detect consumer lag. Even though it wasn't necessarily needed for this live migration, it is needed for ecosystem compatibility, and so that's landing next week. Stay tuned for the official release of consumer offsets.
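Consumer lag, as these tools usually report it, is the distance between the end of a partition's log and the group's committed offset. The sketch below shows only that arithmetic; `consumer_lag` is a hypothetical helper, the offsets are passed in as plain dictionaries, and reading them from `__consumer_offsets` or from a broker is out of scope here.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus committed offset, floored at 0.

    Both arguments map partition number -> offset. A partition with no
    committed offset is treated as if the group had committed offset 0.
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


# A single-partition topic with two messages, fully consumed: zero lag,
# like the consumer group shown earlier in the demo.
caught_up = consumer_lag({0: 2}, {0: 2})

# Two partitions where the group is behind on one and has never read the other.
behind = consumer_lag({0: 10, 1: 4}, {0: 7})

print(caught_up, behind)
```

A monitoring tool would evaluate this repeatedly and alert when the lag keeps growing rather than on any single reading.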

More tools just continue to come. For people tuning in: if your application doesn't work, it's just a bug with Redpanda, and then we go and we fix it. And so this was one of those bugs.

Actually, I think this is the main one that is landing in this official release. So with that, thanks to everyone for joining in. It's always super fun to join and be nerdy on the details of Redpanda.

Alex (36.49)
Absolutely awesome so yeah thanks everybody for joining in. Alex, thanks for being here. Michal, thanks for showing up and wearing your cool Redpanda shirt as well.

And if you have any questions, feel free to hit us up on Twitter, Slack, all that stuff. You can go to redpanda.com to join the community. Get some swag there or anything you want, but otherwise just check back next time.

I think we'll be back in two weeks with a pretty cool use case so we'll see you then. Thanks everybody.

Alexander (37.14)
Bye everyone.

Michal (37.16)
Thank you, bye!
