Kafka Migration: How to Migrate From Kafka to Redpanda

A helpful checklist to set you up for a smooth transition to the simpler streaming data platform

October 24, 2023

In a time of high-performance, high-throughput, and low-latency applications, Apache Kafka® is no longer the streaming data superpower it once was. Companies now need a platform that supports real-time, mission-critical workloads without breaking the bank (or their ops team). This is why more companies are choosing Redpanda for simpler management, better performance, and lower cloud costs.

Why migrate from Kafka to Redpanda?

If you’ve decided to take the leap and migrate from Kafka (or Kafka-compatible platforms) to Redpanda—congratulations! From planning and migrating to testing and troubleshooting, there’s a lot involved in a migration, and the first step is often the trickiest. As always, here at Redpanda we like to make things simple, so we wrote a completely free guide detailing how to migrate from Kafka services to Redpanda.

To give you a taste of what the process entails, this post covers the all-important planning stage. It’ll help you find your footing, check all the right boxes, and prepare your clusters for the smoothest transition possible.

Ready to take the first step toward simpler streaming data? Let’s get started.

Migrate from Kafka platforms to Redpanda – an overview

A well-prepared switch to Redpanda from other Kafka platforms is straightforward and involves just a few steps. There are slight variations to look out for from one Kafka platform to another, depending on the specific implementation details, but here’s what you can generally expect.

  1. Pre-flight evaluation of your existing Kafka platform: Application behavior, data volume and usage, performance characteristics, availability and service level agreements/objectives, and administrative and governance operations.
  2. Pre-flight cluster setup: Set up a Redpanda cluster sized appropriately for your workloads, including items like users, security, access control lists, and topics.
  3. Replicate data: Set up a continuous replication process between the legacy Kafka platform and the Redpanda cluster.
  4. Validate data: Validate completeness and correctness of replicated data.
  5. Reconfigure the application: Move Kafka producers and consumers to point to the Redpanda cluster.
  6. Evaluate application, load, and security: Validate that the application is functional, performant, and adheres to the appropriate security policies once using Redpanda.

For the visual learners, here’s a diagram of everything mentioned above.

The migration process from Kafka to Redpanda at a glance.

Planning your Kafka migration to Redpanda

A solid plan is the first step to a successful migration to Redpanda. This helps you see potential risks that can cause delays down the line, allowing you to mitigate them before they become a problem. At this crucial stage, consider the different stages of work in relation to your development, deployment, and operational practices.

Some companies have a process in place for making changes. This can involve moving design and architecture changes through a development and testing environment before making changes to production workloads. There’s more to consider after development and functional testing. You’ll need to validate the data, test the performance and security, and get approval from relevant stakeholders such as application developers and users who rely on the data.

Before migrating, it’s also important to plan and review each application that uses Kafka. This helps define the functional and performance requirements for each flow of data through the system.

You’ll need to understand how many messages are produced and consumed, topic configuration, log segment rotation rates, and message retention details – to name a few examples. By evaluating these details, you ensure that Redpanda meets the functional and performance requirements for each application.

To get you on the right track, we’ll briefly cover the following:

  1. Common migration questions
  2. A pre-migration checklist
  3. How to evaluate what to migrate
  4. An example Level of Effort timeline

Let’s dig in.

1. Common questions about migrating to Redpanda

Some questions come up during most migration processes. These are the three we get asked the most.

Can I install Redpanda over my existing Kafka installation to shorten the transition?

Unfortunately, no. Redpanda uses a different storage mechanism than your legacy Kafka platform, so it can’t take over an existing installation in place. Instead, the migration is set up as a replication pipeline that transfers data from the source cluster to a new Redpanda cluster using purpose-built systems or containers.

How long does migration take?

Migration time to a new platform depends entirely on the use case. Usually, moving data is the easiest part to define and scope. The most difficult parts tend to be data validation and application updates. We do our best to minimize required application-level changes, but in testing you may find some changes to be necessary. As a general guideline, you can copy approximately 85 terabytes of data per day over a single 10-gigabit network link between two endpoints, and you can scale that up or down depending on the bandwidth of your nodes. Other factors also come into play, such as how much network or CPU capacity you can use without impacting existing consumers or producers. In these cases, data movement needs to be tuned appropriately to avoid overwhelming your production environments.
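
As a rough sanity check on that guideline, here’s a minimal Python sketch of the arithmetic. The function names and the 0.8 utilization default are our own assumptions for illustration: a 10-gigabit link at full line rate moves about 108 TB per day, so 85 TB per day works out to roughly 80% sustained utilization.

```python
SECONDS_PER_DAY = 86_400


def daily_capacity_tb(link_gbps: float, utilization: float = 0.8) -> float:
    """Decimal terabytes a link can move per day at a sustained utilization.

    `utilization` is an assumed fraction of line rate left over for the copy
    after protocol overhead and other traffic; tune it to your environment.
    """
    bytes_per_second = link_gbps * 1e9 / 8  # gigabits/s -> bytes/s
    return bytes_per_second * utilization * SECONDS_PER_DAY / 1e12


def days_to_copy(data_tb: float, link_gbps: float = 10, utilization: float = 0.8) -> float:
    """Estimated days needed to copy `data_tb` terabytes over one link."""
    return data_tb / daily_capacity_tb(link_gbps, utilization)


if __name__ == "__main__":
    # Hypothetical 250 TB cluster over a single 10-gigabit link.
    print(f"{days_to_copy(250):.1f} days")
```

Treat the result as a lower bound. If you throttle replication to protect production traffic, lower `utilization` to match.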

Will I need to change my applications?

Usually very little. Since Redpanda is an API-compatible, drop-in replacement for Kafka, you can expect minimal changes at the application level, and sometimes none at all. In some cases, you may need to update administrative tooling or integrations for monitoring and observability, because Redpanda exposes these interfaces differently. For example, monitoring metrics may have different names.

2. Pre-migration checklist

As you prepare to move from Kafka to Redpanda, here’s the list of requirements you should check off.

  • A destination cluster running a recent Redpanda version
  • Validated network connectivity between source and destination cluster
  • MirrorMaker2 installed and configured near the destination cluster
  • User credentials for MirrorMaker2 to connect to both source and destination cluster
  • User permissions or ACLs for MirrorMaker2 to access source and destination topics on both clusters
  • If you’re using TLS for in-flight encryption, MirrorMaker2 must have the appropriate Certificate Authorities configured to validate TLS connections
  • List of topics to migrate
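
To make the MirrorMaker2 items on this checklist a bit more concrete, here’s a small Python sketch that renders a starting-point properties file from your topic list. The cluster aliases (`source`, `redpanda`), the bootstrap addresses, and the topic names are placeholders we made up. The property keys are standard MirrorMaker 2 settings, but treat the values as assumptions to adapt. Credentials and TLS settings are left out on purpose.

```python
def render_mm2_properties(source_bootstrap: str, target_bootstrap: str, topics: list[str]) -> str:
    """Render a minimal one-way MirrorMaker 2 config (source -> redpanda)."""
    if not topics:
        raise ValueError("list at least one topic to migrate")
    lines = [
        "clusters = source, redpanda",
        f"source.bootstrap.servers = {source_bootstrap}",
        f"redpanda.bootstrap.servers = {target_bootstrap}",
        # Replicate in one direction only.
        "source->redpanda.enabled = true",
        "redpanda->source.enabled = false",
        f"source->redpanda.topics = {','.join(topics)}",
        # Checkpoints and group offset sync support consumer offset translation.
        "source->redpanda.emit.checkpoints.enabled = true",
        "source->redpanda.emit.heartbeats.enabled = true",
        "source->redpanda.sync.group.offsets.enabled = true",
        # Assumption: keep original topic names (Apache Kafka 3.0 or later).
        # Without this, MM2 prefixes replicated topics with the source alias.
        "replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses and topics.
    print(render_mm2_properties("kafka-1:9092", "redpanda-1:9092", ["orders", "payments"]))
```

Generating the file from the same topic list you reviewed during planning helps keep the replication scope and the checklist in sync.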

3. Evaluating what to migrate

Kafka implementations come in all shapes, sizes, performance characteristics, and tenancy designs. It’s critical to catalog yours in as much detail as possible, whether it’s a small cluster hosting a handful of topics or a large, multi-tenant shared cluster with thousands of topics competing for resources.

To make sure you don’t miss anything during migration, consider the entire workflow and its dependencies within the source cluster. Here’s what you’ll need to move:

  • Topics and each of their configuration settings
    • Include items like retention configuration, replication factor, and partition counts
  • Users and passwords for authenticating to the cluster
  • Access control lists (ACLs) to give users access to topics
  • Schemas
  • Consumer offsets or their translations
  • Consumer group details

It’s helpful to go through these details for each step of the change management process, from development to production. To help you stay organized and cover all your bases, check the migration questionnaire in the appendix of our complete guide "How to migrate from Kafka services to Redpanda".

4. Example: Level of Effort (LOE) timeline

Migration timelines vary in length, but most follow the same cadence and order of operations. Multi-environment setups, like those with formal development, QA, and production lifecycles, might have higher levels of effort due to higher data throughput levels or certain operations front-loaded in the process, like initial application testing.

To give you a better idea of your own timeline, let’s lay out an example. We’re following the above migration overview for a small, low-volume cluster with only a handful of topics and clients. This particular timeline includes 62 hours of effort with four hours of actual outage to applications to transition consumers and producers to the new cluster.

Please note that data flow outages to producers and consumers could be as little as two hours each. We recommend moving consumers first, so they can work off the Redpanda cluster while producers flow data to Redpanda via MirrorMaker2.

  • Pre-flight evaluation (16 hours total)
    • List topics to migrate, determine configurations and data volume (4 hours)
    • Review producer/consumer client code (4 hours)
    • Review governance requirements (8 hours)
  • Pre-flight cluster setup (12 hours total)
    • Set up physical or virtual cluster nodes (4 hours)
    • Install and configure Redpanda and Redpanda Console (2 hours)
    • Configure Redpanda Security (2 hours)
    • Provision users and ACLs (4 hours)
  • Data replication (6 hours total)
    • Install and configure MirrorMaker2 (2 hours)
    • Set up Checkpoint Connector (1 hour)
    • Set up Heartbeat Connector (1 hour)
    • Start topic replication (2 hours)
  • Data validation (6 hours total)
    • Verify consumer offset translation (2 hours)
    • Validate completeness of replicated data (2 hours)
    • Validate correctness of replicated data (2 hours)
  • Application reconfiguration (4 hours total)
    • Move Kafka consumers to Redpanda (2 hours)
    • Move Kafka producers to Redpanda (2 hours)
  • Application, load, and security evaluation (16 hours total)
    • Validate the application is functional (8 hours)
    • Compare performance baseline (4 hours)
    • Verify security policies and behavior once using Redpanda (4 hours)
  • Decommission (2 hours total)
    • Shut down legacy cluster (2 hours)
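
If you want to adapt this example to your own environment, the timeline above is easy to keep as data and re-total whenever a number changes. This is only a sketch. The hours are copied from the list above, and the variable and function names are ours.

```python
# Hours per task, grouped by phase, copied from the example timeline.
LOE_HOURS = {
    "Pre-flight evaluation": [4, 4, 8],
    "Pre-flight cluster setup": [4, 2, 2, 4],
    "Data replication": [2, 1, 1, 2],
    "Data validation": [2, 2, 2],
    "Application reconfiguration": [2, 2],
    "Application, load, and security evaluation": [8, 4, 4],
    "Decommission": [2],
}


def phase_totals(loe: dict[str, list[int]]) -> dict[str, int]:
    """Sum the task hours for each phase."""
    return {phase: sum(hours) for phase, hours in loe.items()}


def total_effort(loe: dict[str, list[int]]) -> int:
    """Total hours of effort across all phases."""
    return sum(phase_totals(loe).values())


if __name__ == "__main__":
    for phase, hours in phase_totals(LOE_HOURS).items():
        print(f"{phase}: {hours} hours")
    print(f"Total: {total_effort(LOE_HOURS)} hours")
```

With the numbers from this example, the phases add up to the 62 hours mentioned earlier, and the application reconfiguration phase accounts for the four-hour outage window.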

As you figure out your timeline, keep in mind that an LOE timeline normally only accounts for work effort. This means day-to-day realities can stretch your plan out – like if you’re restricted to work within specific maintenance windows.

Your choice of deployment method can also affect the length of your timeline. For example, deploying Redpanda Dedicated Clusters in the cloud is usually quicker than greenfielding fresh physical infrastructure. If you need a hand getting your timeline right, our Customer Success team is happy to help adjust it to your requirements and environment.

Best practices for a smooth Kafka migration

The following best practices apply before, during, and after a Kafka migration:

Assessment and planning

Proper assessment and planning guarantee a detailed understanding of your current setup and any notable configurations. This allows you to catch any potential compatibility issues and make adjustments as needed before beginning the migration to a new environment. For example, assessing things like API compatibility, performance benchmarks, data architecture, and other specific configurations in Kafka can help in mapping workloads to Redpanda.

Having a solid understanding of how data will be transferred and validated also helps to minimize the risk of data discrepancies and data loss throughout the process, helping to preserve data integrity. Additionally, proactive planning can help you accurately plan and allocate time to the migration project, which will further help to avoid potential interruptions once the migration kicks off.

Choosing the right tools for migration

Having the right supporting tools for a Kafka migration can help the process go smoothly with minimal errors or downtime. There are a number of options available, including monitoring and performance tools that help track the overall health and performance of both Kafka and Redpanda during the migration. Data validation tools can also help maintain data accuracy and consistency to help prevent data corruption throughout the migration.

There are also configuration management tools that can help automate the setup of new environments in Redpanda and keep deployments consistent with minimal errors. Understanding the goal of these tools and the benefits they provide can lead to a simpler, more streamlined Kafka migration.

Testing and validation

Testing and validating throughout the entire Kafka migration process is critical to success. It allows you to catch and address potential issues around things like compatibility and data integrity before they have a chance to grow into larger problems that will be much harder to fix after deployment. Resolving these issues before the full-scale deployment helps you avoid costly problems and saves time on repairs.
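
As an illustration of what a data integrity check can look like, here’s a minimal Python sketch that compares record counts and content digests per partition. It works on records you’ve already read into memory with the Kafka client of your choice, and it assumes replication keeps each record in the same partition and in the same order. The function names are hypothetical, and this isn’t the validation method from our guide.

```python
import hashlib


def partition_digest(records) -> tuple[int, str]:
    """Return (record count, SHA-256 digest) for an ordered run of (key, value) bytes."""
    digest = hashlib.sha256()
    count = 0
    for key, value in records:
        for part in (key, value):
            data = b"" if part is None else part  # note: None and b"" hash the same
            # Length-prefix each field so field boundaries stay unambiguous.
            digest.update(len(data).to_bytes(4, "big"))
            digest.update(data)
        count += 1
    return count, digest.hexdigest()


def mismatched_partitions(source: dict, destination: dict) -> list[int]:
    """Compare {partition: records} maps and return partitions that differ."""
    mismatches = []
    for partition in sorted(set(source) | set(destination)):
        src = partition_digest(source.get(partition, []))
        dst = partition_digest(destination.get(partition, []))
        if src != dst:
            mismatches.append(partition)
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample: partition 1 lost a record on the destination.
    source = {0: [(b"k1", b"v1")], 1: [(None, b"v2"), (None, b"v3")]}
    destination = {0: [(b"k1", b"v1")], 1: [(None, b"v2")]}
    print(mismatched_partitions(source, destination))
```

A mismatch is a prompt to investigate rather than proof of data loss. Replication that is still catching up, or duplicates from retries, will also change the digest.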

Post-migration monitoring and optimization

Once the migration from Kafka to Redpanda is completed, it's important to continue monitoring performance on an ongoing basis to maintain visibility into system health. Maintaining oversight into metrics like message throughput and latency will help confirm that performance is on track, and this visibility will make it easier to optimize settings like topic configurations or replication factors to drive overall efficiency.

Once you have access to real-world performance data post-migration, you can begin to identify and act on any inefficiencies and adjust configurations to help improve performance and operational productivity.
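
To show how a baseline comparison might work in practice, here’s a short Python sketch that flags topics whose 99th-percentile latency got noticeably worse after the move. The 10% tolerance, the function names, and the sample numbers are all assumptions for illustration. Plug in latency samples from your own monitoring stack.

```python
import statistics


def p99(samples_ms: list[float]) -> float:
    """99th-percentile latency; needs at least two samples."""
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[98]


def latency_regressions(baseline: dict, current: dict, tolerance: float = 0.10) -> dict:
    """Return {topic: (baseline_p99, current_p99)} where p99 grew by more than `tolerance`."""
    regressions = {}
    for topic, baseline_samples in baseline.items():
        if topic not in current:
            continue  # no post-migration samples for this topic yet
        before = p99(baseline_samples)
        after = p99(current[topic])
        if after > before * (1 + tolerance):
            regressions[topic] = (before, after)
    return regressions


if __name__ == "__main__":
    # Hypothetical end-to-end latencies in milliseconds.
    baseline = {"orders": [10.0] * 50, "payments": [8.0] * 50}
    current = {"orders": [10.5] * 50, "payments": [12.0] * 50}
    for topic, (before, after) in latency_regressions(baseline, current).items():
        print(f"{topic}: p99 {before:.1f} ms -> {after:.1f} ms")
```

The same comparison works for throughput if you flip the inequality so that a drop, rather than a rise, counts as a regression.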

Ready, set, migrate! Download the full Kafka migration guide

Look at that, you’re a few steps closer to simple, powerful, and cost-efficient streaming data! If you’re ready to steam ahead, download the full guide on migrating from Kafka services to Redpanda to learn:

  • How to plan every step of your migration and map a realistic timeline
  • How to test and validate your data for integrity, security, and performance
  • What common migration challenges to look out for (and how to solve them)
  • A migration questionnaire to make sure you cover all your bases

If you’re still on the fence about switching over, you can always take Redpanda for a test run. Just sign up for a free trial of Redpanda Cloud or grab the Redpanda Community Edition from GitHub. If you have any questions or want to chat directly with our engineers about migrating, join our Redpanda Community on Slack.

Kafka migration resources

More of a practical learner? We’ve got you covered. From interactive workshops to hands-on labs in Killercoda, take your pick and learn how to migrate in whatever way suits you best. Happy migrating!
