
Organizations collect vast volumes of data from multiple systems and then need to combine these streams for insights. The challenge is that data is constantly changing, and some real-time decisions, such as those in financial trading, can mean the difference between winning and losing millions.
To keep systems up to date with minimal latency, a popular technique is change data capture (CDC), which tracks changes (inserts, updates, deletes) in a database as they happen, then streams only the changed data instead of scanning the entire database.
CDC is particularly useful for real-time data streaming, which is where Redpanda Connect comes in. With hundreds of configurable connectors, Redpanda Connect is a fresh alternative to Kafka Connect that’s more flexible, scalable, and simpler to deploy so you can easily integrate disparate data systems.
In this post, we introduce you to Redpanda’s CDC inputs for three of the most popular database engines: PostgreSQL, MySQL, and MongoDB.
But first, how is our CDC different?
If you’re just stumbling upon us now, check out our post on Redpanda Connect vs. Kafka Connect to learn how we work differently. In a nutshell, Redpanda Connect easily connects to everything, from operational inputs (CDC) to analytical outputs (Snowflake), so you can build high-throughput, low-latency data pipelines, even for sovereign AI and agentic workloads.
You might already be familiar with CDC platforms like Debezium, so to understand why you’d use Redpanda instead, we have to dig into the concepts of snapshots and parallel snapshots.
- Snapshots capture a copy of the entire database state. Our connector reads the snapshot into Redpanda to capture the database’s current state before transitioning to real-time change streaming (i.e. streaming new data).
- Parallel snapshots allow multiple tables (PostgreSQL) or collections (MongoDB) to be read concurrently, radically reducing the time needed to complete the snapshot phase. Now here’s where we’re different: Redpanda’s PostgreSQL and MongoDB CDC connectors can also parallelize reads within a single large table or collection. That means tables or collections with millions of records can be split into smaller chunks and read in parallel.
Debezium (Kafka Connect) does not do this today.
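To make the chunking idea concrete, here is a minimal Python sketch of a chunked parallel snapshot. It is an illustration only, not Redpanda Connect's implementation: the in-memory `table`, the integer `id` key, and the helpers `chunk_ranges`, `read_chunk`, and `parallel_snapshot` are all hypothetical, and a real connector would issue range queries against the database instead of filtering a list.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(min_id, max_id, chunk_size):
    """Split the inclusive key range [min_id, max_id] into sub-ranges."""
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        yield (start, end)
        start = end + 1


def read_chunk(table, bounds):
    """Stand-in for a range query such as: WHERE id BETWEEN lo AND hi."""
    lo, hi = bounds
    return [row for row in table if lo <= row["id"] <= hi]


def parallel_snapshot(table, chunk_size, workers):
    """Read every chunk on a worker pool and return all rows."""
    if not table:
        return []
    ids = [row["id"] for row in table]
    ranges = list(chunk_ranges(min(ids), max(ids), chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input ranges.
        chunks = pool.map(lambda bounds: read_chunk(table, bounds), ranges)
        return [row for chunk in chunks for row in chunk]


# Hypothetical in-memory "table" standing in for a large database table.
table = [{"id": i, "value": f"row-{i}"} for i in range(1, 1001)]
rows = parallel_snapshot(table, chunk_size=100, workers=4)
```

Because each chunk covers a disjoint key range, workers never read the same row twice, and the snapshot time is bounded by the slowest chunk and not by the size of the whole table.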
Now, let’s get into the CDC connectors currently available in Redpanda Connect.
Postgres CDC
The postgres_cdc connector captures and streams row-level database changes from PostgreSQL’s Write-Ahead Log (WAL). You can also configure it to stream all existing data first, giving you a complete, in-sync view of your database.
How it works:
- Logical replication: PostgreSQL captures changes at the transaction level via logical replication, ensuring data consistency by only streaming fully committed transactions. No need to worry about partial or rolled-back data.
- Snapshot and streaming: When creating a replication slot in PostgreSQL, the connector exports a consistent snapshot of your tables and seamlessly transitions to streaming ongoing changes from that snapshot point.
Key features:
- Parallel snapshotting: Have billions of rows in your snapshot? No problem. Our connector can shard tables dynamically and read from them in parallel.
- Reliable checkpointing: PostgreSQL's replication slots track offsets and checkpoints natively, and the connector integrates them with Redpanda Connect's at-least-once delivery guarantees, so changes aren't dropped (although a change may be delivered more than once after a restart).
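The at-least-once guarantee comes from the order of operations: deliver first, checkpoint second. The Python sketch below illustrates that ordering under simplified assumptions. `ChangeLog`, `run_pipeline`, the `crash_at` parameter, and the dict-based checkpoint are hypothetical names for illustration and are not part of PostgreSQL or Redpanda Connect.

```python
class ChangeLog:
    """Hypothetical change log: an ordered list of (offset, event) pairs."""

    def __init__(self, events):
        self.events = list(enumerate(events))

    def read_from(self, offset):
        return self.events[offset:]


def run_pipeline(log, checkpoint, sink, crash_at=None):
    """Deliver events, advancing the checkpoint only after each delivery."""
    start = checkpoint.get("offset", 0)
    for offset, event in log.read_from(start):
        sink.append(event)  # 1. deliver the event
        if offset == crash_at:
            # Simulated crash after delivery but before the checkpoint.
            raise RuntimeError("simulated crash before checkpoint")
        checkpoint["offset"] = offset + 1  # 2. record progress


log = ChangeLog(["a", "b", "c", "d"])
checkpoint = {}
sink = []

try:
    run_pipeline(log, checkpoint, sink, crash_at=2)
except RuntimeError:
    pass  # the process "restarts" below

# The restart resumes from the last recorded checkpoint.
run_pipeline(log, checkpoint, sink)
```

After the restart, event `c` appears twice in `sink` because it was delivered before the crash but never checkpointed. Nothing is lost, which is the trade-off at-least-once delivery makes: downstream consumers should tolerate duplicates.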
Check out TinyBird’s guide on how to use PostgreSQL CDC with Redpanda Connect.
MySQL CDC
The mysql_cdc connector uses MySQL’s binary log (binlog) to capture changes made to a MySQL database in real time and then streams them to Redpanda Connect.
How it works:
- Binlog streaming: MySQL CDC uses binlog positions to track changes, requiring an external cache (Redis, a SQL database, or another datastore) to store binlog offsets.
- Consistent snapshotting: For consistency, this connector acquires a global read lock at the start of the initial snapshot, records the binlog position, and then releases the lock so it can stream changes from that precise point forward.
- Topology support: Currently supports standard MySQL setups and primary-replica configurations, with plans to extend support for high-availability clusters and Global Transaction ID (GTID) environments.
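The lock-then-record-position handoff described above can be sketched in a few lines of Python. This is a simplified model and not the connector's code: `FakeDatabase`, `begin_snapshot`, and `stream_from` are hypothetical, the list index stands in for a binlog position, and copying the rows while holding the lock stands in for opening a consistent read against the real database.

```python
import threading


class FakeDatabase:
    """Hypothetical in-memory database with an append-only change log."""

    def __init__(self):
        self.rows = {}
        self.binlog = []
        self.lock = threading.Lock()  # stands in for a global read lock

    def write(self, key, value):
        with self.lock:
            self.rows[key] = value
            self.binlog.append((key, value))


def begin_snapshot(db):
    """Capture the current state and the log position under one lock."""
    with db.lock:
        position = len(db.binlog)  # "binlog position" at snapshot time
        snapshot = dict(db.rows)   # consistent view of the data
    return snapshot, position      # lock is released here


def stream_from(db, position):
    """Return every change recorded after the snapshot position."""
    return db.binlog[position:]


db = FakeDatabase()
db.write("a", 1)
db.write("b", 2)

snapshot, position = begin_snapshot(db)
cache = {"binlog_position": position}  # hypothetical external offset cache

# Writes that land after the snapshot are picked up by streaming.
db.write("a", 3)
db.write("c", 4)

replica = dict(snapshot)
for key, value in stream_from(db, cache["binlog_position"]):
    replica[key] = value
```

Because the position is recorded while writes are blocked, every change falls on exactly one side of the handoff: it is either in the snapshot or in the stream that follows, so `replica` ends up matching the source.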
Key features:
- Flexible offset management: Offers external checkpointing options to fit your infrastructure needs.
- Snapshot consistency: Ensures reliable, consistent snapshots through strategic locking.
- Reliable checkpointing: With Redpanda Connect's at-least-once delivery model, you can trust that your data is reaching its destination without missing a beat.
MongoDB CDC
The mongodb_cdc connector streams data changes from a MongoDB replica set, using MongoDB’s change streams to capture data updates.
How it works:
- Oplog-based streaming: Captures updates through change streams, which are built on MongoDB's operations log (oplog), providing an efficient, near-real-time data stream.
- Parallelized snapshots: The connector employs parallel reads during snapshots, significantly boosting performance for large-scale data migrations by splitting collections into manageable chunks.
Key features:
- High-throughput snapshots: Splitting snapshots into chunks within each collection significantly speeds up initial data migration.
- Flexible document modes: Customizable document handling for updates and deletes, supporting full-document lookups and pre/post image capture.
- External checkpointing: Uses external stores for oplog positions, similar to MySQL, giving you control over your checkpointing strategy.
Generate a trial license key to try Redpanda Connect for 30 days.
Game-changing CDC connectors at your fingertips
You're spoiled for choice when looking for practical CDC use cases, like updating reporting dashboards, replicating and migrating a database with minimal downtime, triggering business workflows in real time based on data changes, catching and preventing fraud, and fueling AI/ML models with the freshest data.
Inspired yet? All three connectors mentioned in this blog are currently available in Redpanda Cloud and Self-Managed with an Enterprise license. Redpanda Connect’s ecosystem is growing fast, so get started for free and check out what other cool connectors you can use to make your job easier.