How to migrate from Amazon MSK to Redpanda

Your step-by-step guide on switching to an easy, cost-effective Kafka-compatible alternative

January 4, 2024 (last modified April 1, 2026)

Moving to another streaming data platform is not a decision that’s made lightly, but it’s one many companies are now making to achieve higher throughput, lower latency, and a significantly lower total cost of ownership.

There are many flavors of Kafka implementations, and Amazon Managed Streaming for Apache Kafka (MSK) is a popular choice for handling Kafka messaging traffic. While MSK is positioned as a relatively hands-off solution, some teams find that its performance and cost don't quite align with their expectations of a fast, value-friendly Kafka implementation. The Kafka operational tax is real, and people are looking for easier, simpler, and more cost-effective ways to handle data flow.

Redpanda provides a seamless, drop-in replacement for MSK. We've already covered how to migrate from Kafka to Redpanda, where we gave a high-level overview of the moving process from a legacy Apache Kafka® environment into a modern, high-performance solution with Redpanda. In this post, we'll discuss a practical way of moving data out of MSK and into a Redpanda cluster that will work for our Dedicated, Bring Your Own Cloud (BYOC), or Self-hosted deployment models.

Note this is just one part of a comprehensive migration process that covers areas such as planning, data and application movement, functional validation, and scale testing. If you'd like to learn about the full scope, check out our free report detailing how to migrate from Kafka services to Redpanda.


Why migrate from Amazon MSK to Redpanda

Migrating from Amazon MSK to Redpanda eliminates the "operational tax" of legacy Kafka infrastructure. MSK offers managed provisioning, but the ongoing operational burden (scaling, upgrades, monitoring, and DR) still falls on your team. Cloud provider support staff are generalists, not streaming experts, and the MSK SLA includes exclusions for core Kafka components.

By moving to Redpanda's C++ thread-per-core architecture, teams reduce total cost of ownership (TCO) while gaining a predictable, low-latency platform backed by dedicated streaming specialists. Because Redpanda is fully Kafka API-compatible, the move usually requires no application code rewrites; most teams only need to update the broker list and related client authentication settings.

Choosing the right migration tool

Selecting the right tool is the most critical decision in your migration path. We recommend a tiered approach based on your specific infrastructure needs:

  • Redpanda Migrator (The standard): Available via Redpanda Connect, this single Go binary is the recommended path for active takeouts. It handles one-time replication and backfills with minimal infrastructure, eliminating the need for a heavy Kafka Connect cluster.
  • Redpanda Shadowing (The future): Now generally available in Redpanda 25.3 for disaster recovery, this enterprise capability offers byte-for-byte, offset-preserving replication with near-instant recovery. Shadowing will soon be extended to streamline migrations to Redpanda even further, with fewer moving parts.
  • MirrorMaker 2 (The legacy fallback): While functional, MM2 is a complex setup that requires a separate Kafka Connect cluster (or a standalone JVM process) and JVM tuning. We detail this path below solely for teams with legacy requirements that mandate its use.

Planning your migration

A successful migration starts before the first message moves. At a minimum, you should inventory the topics and consumer groups you need to move, confirm how authentication works on both sides, make sure network paths are in place between MSK and Redpanda, and decide how you'll cut producers and consumers over. That planning work matters because this tutorial covers only one part of the overall process: data movement and client cutover.

  • Inventory the scope. Decide which topics, consumer groups, and applications move first.
  • Confirm connectivity. Make sure your migration host can reach both clusters with the right ports and security rules.
  • Review authentication. MSK IAM access and Redpanda client authentication should be sorted before you start replication.
  • Define cutover steps. Know when consumers move, when producers move, and how you'll validate the result.
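As a small illustration of the inventory step, the sketch below diffs a source topic list against a target topic list to show what still needs to move. The topic names and files are invented; in practice you would generate each list with `kafka-topics.sh --bootstrap-server <broker> --list` (or `rpk topic list`) against the respective cluster.

```shell
# Hypothetical topic inventories; in practice, capture these from each
# cluster with `kafka-topics.sh --bootstrap-server <broker> --list`
printf 'orders\npayments\nclickstream\n' | sort > msk-topics.txt
printf 'orders\n' | sort > redpanda-topics.txt

# Topics present on MSK but not yet on Redpanda
comm -23 msk-topics.txt redpanda-topics.txt
```

Running this against real inventories gives you a concrete checklist to track as replication proceeds.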

A practical migration walkthrough

The following guide details the legacy MirrorMaker 2 (MM2) approach for teams who cannot yet use Redpanda Migrator. Our primary goal here is to minimize disruption to your application pipeline during the cutover.

Because Redpanda is a drop-in replacement, your applications do not require functional code changes. You will simply need to update configurations, such as the broker list, to point to the new cluster.

The diagram below illustrates these steps:

  1. Set up a Redpanda cluster. For the best data transfer performance between MSK and Redpanda, try to minimize the network latency between the clusters.
  2. Set up a node running MirrorMaker2 that can access both MSK and Redpanda clusters. Configure access policies and MirrorMaker2 properties file. Start copying data from MSK to Redpanda. Ensure consumers' offsets are being replicated and translated. Note: Producers and consumers will not be touched in this step.
  3. Once the initial replication phase has caught up, shut off consumers on MSK, reconfigure them to point to Redpanda, and start them up. Note: Producers will continue publishing to MSK while consumers now read from Redpanda.
  4. Stop producers and allow MirrorMaker2 replication to catch up with any lagged message flow. Stop MirrorMaker2 to prevent accidental duplicate publishes. Then reconfigure producers to point to Redpanda. Note: Producers and consumers are now connected directly to Redpanda.

Once the producers and consumers flip to Redpanda, confirm they’re operating correctly, and then safely shut down and remove the MSK cluster.

The logical operations for moving data and producer/consumer clients from MSK to Redpanda

We’ll mainly focus on blocks #1 and #2 in the diagram, providing practical direction on how to prepare the Redpanda and MSK environments, as well as configuring MirrorMaker2 for doing the initial and ongoing replication.

1. AWS prep work and caveats

Amazon MSK uses client Identity and Access Management (IAM) roles to give a user, application, or AWS instance access to the topic data in the MSK cluster. These roles are attached to a policy defining what the role is allowed to do. Since MirrorMaker2 must authenticate to the MSK cluster, you need to create appropriate user credentials via these roles. This allows MirrorMaker2 to connect to the MSK cluster and access the topic data (or metadata).

For this tutorial, we’ll assume an MSK cluster is already running and configured for using SASL_SSL with the AWS_MSK_IAM SASL mechanism. MirrorMaker2 will run from a client machine that's separate from both the MSK and Redpanda clusters. This ensures MirrorMaker2 has sufficient resources for your migration.

Check our migration report for more information on MirrorMaker2 sizing.

1.1 Configure AWS security group rules

In many deployments, Amazon MSK and Redpanda are configured in separate VPCs and with separate Security Group rules. You need to configure Security Group rules to allow the MirrorMaker2 client machine to access both clusters.

See the Redpanda deployment guide to understand what network ports you need.

1.2 Configure IAM permissions

Review the AWS documentation for creating an IAM role for use with MSK. This also walks you through creating the access policy.

The policy example provided below requires a correctly configured Amazon Resource Name (ARN) that defines the MSK cluster. Update the region, Account-ID, and cluster/Examplename strings with the ones for your AWS Region, AWS Account, and MSK cluster name. You can find the cluster name in the MSK console.

You may need additional policies defined in your environment, depending on what other IAM roles and policies are in effect. The following allows MirrorMaker2 to replicate from a freshly launched MSK cluster.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect",
                "kafka-cluster:AlterCluster",
                "kafka-cluster:DescribeCluster"
            ],
            "Resource": [
                "arn:aws:kafka:region:Account-ID:cluster/Examplename/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:*Topic*",
                "kafka-cluster:WriteData",
                "kafka-cluster:ReadData"
            ],
            "Resource": [
                "arn:aws:kafka:region:Account-ID:topic/Examplename/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:AlterGroup",
                "kafka-cluster:DescribeGroup"
            ],
            "Resource": [
                "arn:aws:kafka:region:Account-ID:group/Examplename/*"
            ]
        }
    ]
}


Once this is created, you should have a new role and policy defined in AWS. In the next section, we’ll attach that role to the client machine where MirrorMaker2 runs.
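To avoid hand-editing each ARN, you can fill in the placeholders with a quick substitution. The region, account ID, and cluster name below are example values only; you can look up your cluster's real ARN with `aws kafka list-clusters`.

```shell
REGION=us-east-1          # example values only; substitute your own
ACCOUNT_ID=111122223333
CLUSTER=my-msk-cluster

# One resource line from the policy above, with its placeholders intact
cat > msk-policy-template.json <<'EOF'
"arn:aws:kafka:region:Account-ID:cluster/Examplename/*"
EOF

# Substitute the real values into the template
sed -e "s/region/$REGION/" \
    -e "s/Account-ID/$ACCOUNT_ID/" \
    -e "s/Examplename/$CLUSTER/" \
    msk-policy-template.json > msk-policy.json

cat msk-policy.json
# "arn:aws:kafka:us-east-1:111122223333:cluster/my-msk-cluster/*"
```

The same substitutions apply to the topic and group ARNs in the full policy.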

1.3 The MirrorMaker2 client machine

Once you define the IAM role and configure the policy, you can then attach the role to the client machine instance. This will allow Java clients like MirrorMaker2 to connect and authenticate from this machine.

Amazon provides a custom Java JAR for enabling a Java Kafka client to authenticate to MSK. This must be installed separately, and configured as shown below. Because of the JAR installation requirement, we’ll use the standalone MirrorMaker2 process, rather than launching via Kafka Connect.

$ wget https://downloads.apache.org/kafka/3.6.0/kafka_2.13-3.6.0.tgz
$ tar zxf kafka_2.13-3.6.0.tgz
$ sudo apt-get install openjdk-11-jdk

$ cd kafka_2.13-3.6.0/libs
$ wget https://github.com/aws/aws-msk-iam-auth/releases/download/v1.1.1/aws-msk-iam-auth-1.1.1-all.jar

This installation procedure loads the AWS_MSK_IAM SASL handler into MirrorMaker2 for use in configuring authentication details. For more information on this authentication mechanism, review the Amazon MSK IAM Access Control documentation.
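Before starting MirrorMaker2, it can be useful to confirm that IAM authentication works from this machine with a plain Kafka CLI client. A minimal client properties file for that check looks like the following (the filename is our choice; the settings match the SASL_SSL/AWS_MSK_IAM mechanism described above):

```shell
# Client settings matching the MSK IAM SASL mechanism configured above
cat > iam-client.properties <<'EOF'
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
EOF
```

With the file in place, `bin/kafka-topics.sh --bootstrap-server <msk-broker>:9098 --command-config iam-client.properties --list` should return your topic list; if it fails, revisit the IAM role and Security Group rules before moving on.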

2. Deploying Redpanda

You can deploy Redpanda with our Cloud control plane if you're using a Dedicated or BYOC instance. If using Self-hosted on Kubernetes, check out our helm charts and operator. If you opt to deploy straight to a Debian or Red Hat-based system, check out our deployment automation framework based on Ansible.

Check our Self-Managed or Cloud deployment documentation for more setup details.

3. Setting up MirrorMaker2

With the environment prepped, create the mm2.properties file. This file should be placed inside the kafka_2.13-3.6.0 directory you set up earlier. This configuration file enables several connectors.

  1. MirrorHeartbeatConnector evaluates the liveness of the replication flow and detects when there may be connectivity issues or insufficient bandwidth.
  2. MirrorCheckpointConnector handles consumer offsets between the two clusters.
  3. MirrorSourceConnector handles the heavy lifting of topic-level data replication.

Setting the topics property allows you to pick and choose which topics get migrated across. This can take a regular expression to match some or all topics. Additionally, you can set explicit topics here if you want to limit it to just a handful.

You'll note that we explicitly disable topic ACL syncing. We recommend handling ACLs outside the scope of MirrorMaker2, as it tends to handle these in unexpected ways.

# MirrorMaker2 flow name
name = msk-to-rp

# Prevent re-encoding of data
key.converter = org.apache.kafka.connect.converters.ByteArrayConverter
value.converter = org.apache.kafka.connect.converters.ByteArrayConverter

# Cluster aliases
clusters = msk, redpanda

#source.cluster.alias = ""
#target.cluster.alias = ""

# Bootstrap details
msk.bootstrap.servers = msk-1.example.com:9098,msk-2.example.com:9098,msk-3.example.com:9098
redpanda.bootstrap.servers = rp-1.example.com:9092,rp-2.example.com:9092,rp-3.example.com:9092

# Make sure replication goes from msk to redpanda only
msk->redpanda.enabled = true
redpanda->msk.enabled = false

topics = mirror-test
groups = .*

replication.factor = 3

# Make sure the target cluster can accept larger message sizes (30 MB here)
redpanda.producer.max.request.size=31457280

# Make sure topics are created without a cluster name prefix
replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy

# Make sure group syncing is working for offset translation
msk->redpanda.sync.group.offsets.enabled = true

# Emit offset checkpoints frequently (on by default, but be explicit)
emit.checkpoints.enabled = true
emit.checkpoints.interval.seconds = 10

# Replication factor for the heartbeat/checkpoint/offset-syncs topics
checkpoints.topic.replication.factor=3
heartbeats.topic.replication.factor=3
offset-syncs.topic.replication.factor=3

# Disable ACL syncing (handled outside MirrorMaker2)
sync.topic.acls.enabled = false

# Number of parallel transfer tasks. Do not exceed the total number of
# partitions. In dense clusters with many partitions, you may need to make
# this an even divisor of the partition count so as not to overwhelm the
# MirrorMaker2 client machine's resources.
tasks.max=10

# Required for authenticating against an IAM-managed MSK cluster
msk.security.protocol=SASL_SSL
msk.sasl.mechanism=AWS_MSK_IAM
msk.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
msk.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler

# Always go back to the earliest offset to ensure no gaps occur in case of error.
# Note: this can make MM2 publish duplicates to the destination in error scenarios.
auto.offset.reset=earliest
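The `topics` property above names a single topic, but it accepts a regular expression. As a rough illustration of pattern-based selection (MM2 applies Java regex; `grep -E` stands in for it here, and the topic names are invented):

```shell
# Candidate topics on the source cluster (invented names)
printf 'orders-eu\norders-us\npayments\n__consumer_offsets\n' > all-topics.txt

# A topics pattern like "orders-.*" would select only the orders topics
grep -E '^orders-.*$' all-topics.txt
```

Previewing your pattern this way against a real topic list helps avoid accidentally replicating internal or unwanted topics.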


4. Start MirrorMaker2

On the MirrorMaker2 client machine, run this command from the Kafka bin/ directory:

$ ./connect-mirror-maker.sh mm2.properties

Once the process fires up, you should see multiple connectors start up in the logs, including MirrorHeartbeatConnector, MirrorSourceConnector, and MirrorCheckpointConnector.

Offset translation caveats

In most deployments, it's necessary to do consumer group offset translation to allow consumers to seamlessly move from one cluster to the other and pick up where they left off. The MirrorCheckpointConnector is critical for this process. If it's working correctly, you should see log messages emitted from connect-mirror-maker.sh that look something like the following:

[2023-10-26 04:50:56,044] INFO [MirrorCheckpointConnector|worker] Found 1 consumer groups for msk->redpanda. 1 are new. 0 were removed. Previously had 0. (org.apache.kafka.connect.mirror.MirrorCheckpointConnector:174)

[2023-10-26 05:00:55,935] INFO [MirrorCheckpointConnector|worker] refreshing consumer groups took 726 ms (org.apache.kafka.connect.mirror.Scheduler:95)

[2023-10-26 05:00:56,086] INFO [MirrorCheckpointConnector|task-0|offsets] WorkerSourceTask{id=MirrorCheckpointConnector-0} Committing offsets for 1 acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask:233)
and

[2023-10-26 04:50:57,263] INFO [MirrorCheckpointConnector|task-0] task-thread-MirrorCheckpointConnector-0 checkpointing 1 consumer groups msk->redpanda: [kafka-test]. (org.apache.kafka.connect.mirror.MirrorCheckpointTask:114)


You may need to tune emit.checkpoints.interval.seconds up or down depending on your overall load or frequency of offset commits. High-volume clusters with many offset updates may require increasing this interval.
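Conceptually, checkpoint-based offset translation maps a committed source offset onto the target log using the most recent offset sync at or before it. The numbers below are invented, and the real MM2 logic (in MirrorCheckpointTask) handles more edge cases, but the core arithmetic is roughly:

```shell
# Last offset sync recorded by MM2 (invented values):
UPSTREAM_SYNC=1000     # offset on MSK at sync time
DOWNSTREAM_SYNC=400    # matching offset on Redpanda

# A consumer group's committed offset on MSK
COMMITTED=1500

# Approximate translated offset on Redpanda
TRANSLATED=$((DOWNSTREAM_SYNC + COMMITTED - UPSTREAM_SYNC))
echo "$TRANSLATED"   # 900
```

This is why the offset-syncs topic and frequent checkpoints matter: without a recent sync point, translated offsets lag further behind the consumer's true position.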

Evaluating performance and lag

By default, MirrorMaker2 will emit status metrics via Java JMX. If you don't have a JMX query mechanism, you can emit Prometheus metrics using the jmx_exporter from the Prometheus project. Ingest these metrics using a standard Prometheus metrics scraper and visualize in Grafana.

We recommend using the jmx_exporter Java Agent method. You can supply the additional arguments to connect-mirror-maker.sh via setting the EXTRA_ARGS environment variable before running the command. The project README has more details on configuring the exporter.
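For example, attaching the exporter as a Java agent might look like the following. The jar path, port, and config filename are assumptions for illustration; adjust them per the jmx_exporter README.

```shell
# Expose MirrorMaker2's JMX metrics as Prometheus metrics on port 9404.
# The jar path, port, and config file below are examples only.
export EXTRA_ARGS="-javaagent:/opt/jmx_prometheus_javaagent.jar=9404:/opt/jmx-exporter-config.yaml"

# ...then start MirrorMaker2 as before:
# ./connect-mirror-maker.sh mm2.properties
```

Once running, a Prometheus scraper can pull metrics from port 9404 on the client machine for visualization in Grafana.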

Final tips

MirrorMaker 2 uses asynchronous replication, which can introduce latency in high-volume systems. To prevent lag, ensure your client node meets the following capacity requirements:

  • Network Headroom: Your client must handle double your ingress throughput—consuming from MSK and publishing to Redpanda simultaneously.
  • Sufficient Capacity: If you publish 1 GB/sec into MSK, your client machine requires at least 2 GB/sec of available network bandwidth.
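The sizing rule above is simple doubling, since every byte is both consumed and republished. A quick sanity check with an example figure:

```shell
INGRESS_GB_PER_SEC=1    # example: 1 GB/sec published into MSK

# MM2 consumes from MSK and republishes to Redpanda simultaneously
REQUIRED_GB_PER_SEC=$((INGRESS_GB_PER_SEC * 2))

# Note: a "10 Gb/s" NIC moves roughly 1.25 GB/s, so check link capacity too
echo "${REQUIRED_GB_PER_SEC} GB/s required"   # 2 GB/s required
```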

Additionally, sizing tasks.max appropriately is somewhat of an art. You need to experiment with your particular environment during the initial data migration. Too small, and you may not get enough parallelization occurring to copy your partitions on time. Too large, and you end up wasting resources or experiencing process and scheduling contention.
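One way to pick a starting value is the largest divisor of the total partition count that doesn't exceed what your client machine can handle. A small sketch (the partition count and cap are invented; treat the result as a starting point for experimentation, not a final answer):

```shell
PARTITIONS=96   # total partitions being replicated (example)
MAX_TASKS=10    # cap based on client machine resources (example)

# Walk down from the cap to the largest even divisor of the partition count
t=$MAX_TASKS
while [ $((PARTITIONS % t)) -ne 0 ]; do
  t=$((t - 1))
done
echo "tasks.max=$t"   # prints tasks.max=8
```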

For more tips, read our full migration report to learn what else might affect how your data migration runs.


Ready to migrate to simpler data streaming?

Moving platforms doesn't have to be scary, and it's important to show people how to easily and safely get out of their legacy platforms. We’ve helped several customers successfully migrate out of Amazon MSK and into Redpanda, enabling them to experience a better, faster, and more cost-effective Kafka platform.

If you have questions about how to do this in your own environment, contact our team or join the Redpanda Community on Slack to chat with our engineers. You can also get a good preview of what the migration process involves with our on-demand Streamcast and Masterclass.

Wherever you are in your migration journey, we're happy to guide you along the way.

Free report: migrating from Kafka to Redpanda
Get a detailed walkthrough, expert tips, and troubleshooting.

