
Learn how to ingest metadata from Redpanda into OpenMetadata to centralize information and prevent data silos.
Imagine entering a library just to find all the books lying on the floor, with no covers, bookshelves, or librarian! By the time you find the title you are looking for, it would be hard to care about its content. Unfortunately, this is an everyday reality when working with data.
Without the correct metadata - the properties that help us understand the data itself - the real work of gaining insights from data can become an unbearable task.
How many times have we seen data analysts using the wrong tables? Or backend teams silently changing schemas only to discover the fire a few weeks later? Can we trust the data powering our company’s core KPIs?
Collecting and storing data is no longer a struggle. The main challenge nowadays is making sense of all topics, tables, pipelines, dashboards, and ML models (to name a few!) that teams have been gathering and creating throughout the years.

Data practitioners have now realized that unlocking the value of data requires creating products that go beyond data and focus on people.
While these ideas might resonate, actually achieving them means integrating multiple tools that may not naturally talk to each other.
In this tutorial, we will show you how to ingest the metadata from Redpanda into OpenMetadata, an open-source metadata management platform powered by a standard metadata language and APIs.
Data platforms have multiple specialized teams - backend, data engineering, data science, and data analytics. Each of them focuses on different aspects of data. However, real value and understanding only come from providing proper context.
Our goal is to break out of knowledge silos and share as much information as possible to bring joy to any data consumer - at any stage - by showing how all the pieces interact. Breaking the barriers among teams is the first step to a healthier, more profitable, and scalable data platform, and that can only happen through transparency and collaboration.

To achieve that, we will use OpenMetadata to ingest the metadata from the services and provide a single place to discover and collaborate. In this blog, we will focus on integrating Redpanda, a Kafka API-compatible streaming data platform that is easy to use and super fast in terms of both latency and throughput.
The easiest way to spin up OpenMetadata on your local computer is by using the metadata CLI, as shown in the Quickstart Guide. In a nutshell, it requires us to run the following steps:
pip3 install --upgrade "openmetadata-ingestion[docker]"
metadata docker --start
This will spin up the OpenMetadata server, a MySQL instance as the metadata store, Elasticsearch for querying capabilities, and the OpenMetadata Ingestion container, which will be used to extract metadata from your sources.
To prepare the source, we will use this repository. Clone it locally and run
docker compose up
The result will be a container with Redpanda brokers and a schema registry, where we have fed some sample topics.
You can follow these steps to configure and deploy the Redpanda metadata ingestion.
Note: In our setup, Redpanda and OpenMetadata run in separate Docker Compose deployments. Therefore, we need to access the sources via the local network, which we can do by configuring host.docker.internal as the hostname.
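For reference, here is roughly what that configuration looks like if you hand it to the Ingestion Framework directly instead of going through the UI. This is a minimal sketch assuming the quickstart defaults (Redpanda on 9092, the schema registry on 8081, OpenMetadata on 8585) and the generic Kafka connector; the exact import path, connector type, and auth settings depend on your OpenMetadata version, so adapt it to the connector docs for your release.

from metadata.ingestion.api.workflow import Workflow  # import path differs in newer releases

config = {
    "source": {
        "type": "kafka",  # Redpanda speaks the Kafka API; newer versions also ship a dedicated Redpanda connector
        "serviceName": "local_redpanda",
        "serviceConnection": {
            "config": {
                "type": "Kafka",
                "bootstrapServers": "host.docker.internal:9092",
                "schemaRegistryURL": "http://host.docker.internal:8081",
            }
        },
        "sourceConfig": {"config": {"type": "MessagingMetadata"}},
    },
    "sink": {"type": "metadata-rest", "config": {}},
    "workflowConfig": {
        "openMetadataServerConfig": {
            "hostPort": "http://localhost:8585/api",
            # Older quickstarts ran without auth; newer servers expect
            # authProvider "openmetadata" plus a securityConfig with a JWT token.
            "authProvider": "openmetadata",
        }
    },
}

workflow = Workflow.create(config)
workflow.execute()
workflow.raise_from_status()
workflow.print_status()
workflow.stop()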

The OpenMetadata UI will walk us through two main steps: configuring the Redpanda service connection, and creating and scheduling the metadata ingestion pipeline.
From the UI, we can directly interact with the service and the pipelines it has deployed without managing any other dependencies.

Moreover, engineers can directly import and use the Ingestion Framework package to configure and host their own ingestion processes. On top of that, any operation happening in the UI or Ingestion Framework is open and supported by the server APIs. This means full automation possibilities for any metadata activity, which can be achieved directly via REST or using the OpenMetadata SDKs.
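As a small illustration of that, the sketch below uses the Python SDK that ships with the openmetadata-ingestion package to read back the topics the workflow just created. Import paths and auth settings can change between releases, so take it as a starting point rather than the definitive API.

from metadata.generated.schema.entity.data.topic import Topic
from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (
    OpenMetadataConnection,
)
from metadata.ingestion.ometa.ometa_api import OpenMetadata

# Quickstart server; add auth settings here if your deployment enforces them
server_config = OpenMetadataConnection(hostPort="http://localhost:8585/api")
metadata = OpenMetadata(server_config)

# List the topics ingested from Redpanda
for topic in metadata.list_entities(entity=Topic).entities:
    print(topic.fullyQualifiedName)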
Data is enjoyed the most in good company, and to help all teams and consumers, we need to have up-to-date and clear descriptions of the assets. If we check the metadata that has been ingested, we will find the list of topics after navigating to the service page.

If we then access any of the topics, we can observe properties such as the partitions, replication factor, and schema definition. This is great, but only if you already know what this topic is about. On the other hand, imagine finding this same entity curated with richer information.

In the updated entity, we have a proper description of the asset's purpose and properties, an assigned owner, a flag marking it as business-critical, and tags calling out PII data.
Applying this same curation process to the rest of the data platform will help existing teams share information and collaborate more effectively while reducing the onboarding time for new members.
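Curation does not have to happen only through the UI either: the same edits can be scripted against the REST API. The snippet below is a sketch assuming the quickstart server without auth; the topic's fully qualified name and description are hypothetical, and the exact shape of the tag payload can vary slightly between versions.

import requests

BASE = "http://localhost:8585/api/v1"
fqn = "local_redpanda.sample_topic"  # hypothetical fully qualified topic name

# Fetch the topic to get its id
topic = requests.get(f"{BASE}/topics/name/{fqn}").json()

# JSON Patch adding a description and a PII tag
patch = [
    {
        "op": "add",
        "path": "/description",
        "value": "Orders produced by the checkout service, one event per purchase.",
    },
    {
        "op": "add",
        "path": "/tags",
        "value": [
            {
                "tagFQN": "PII.Sensitive",
                "labelType": "Manual",
                "state": "Confirmed",
                "source": "Classification",  # older releases use "Tag" here
            }
        ],
    },
]

response = requests.patch(
    f"{BASE}/topics/{topic['id']}",
    json=patch,
    headers={"Content-Type": "application/json-patch+json"},
)
response.raise_for_status()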
How often has a dashboard broken or an ETL pipeline failed because of planned (or surprise) schema changes? Teams and needs evolve, and data has to evolve with them. Unfortunately, this is a reality that won’t change. However, we can improve how we detect and communicate said changes. The goal is to minimize - and, when possible, prevent - any downtime or time-consuming activities such as backfilling.
Deploying and scheduling regular metadata ingestion workflows takes care of flagging differences between versions. A new column has been added? The Table version gets bumped by 0.1. Deleting columns can be scary, so the version increases by 1.0, just as it would in software versioning.
The best part is that all the versioning evolution is stored and explorable. On top of that, all change events can be consumed by setting up a Webhook. Out of the box, OpenMetadata can notify teams through MS Teams or Slack. Moreover, pushing change events to a Redpanda topic is a possible approach for fine-grained control over how different events are handled, as sketched below.
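Here is a minimal sketch of that idea: a tiny webhook receiver that forwards every change event into a Redpanda topic so downstream jobs can react to schema changes. Flask and confluent-kafka are just convenient choices here, and the entityType field is an assumption about the change event payload; any HTTP framework and Kafka-compatible client would work the same way.

import json

from confluent_kafka import Producer
from flask import Flask, request

app = Flask(__name__)
producer = Producer({"bootstrap.servers": "localhost:9092"})

@app.route("/openmetadata/events", methods=["POST"])
def handle_change_event():
    event = request.get_json(force=True)
    # Key by entity type so downstream consumers can filter selectively
    producer.produce(
        "openmetadata-change-events",
        key=str(event.get("entityType", "unknown")),
        value=json.dumps(event),
    )
    producer.flush()
    return {"status": "ok"}, 200

if __name__ == "__main__":
    # Point the OpenMetadata webhook at http://<host>:5000/openmetadata/events
    app.run(port=5000)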
In this post, we have explored the modern challenges of data teams, aiming to close the gap between data and people.
Setting up Redpanda together with OpenMetadata, we have prepared a metadata ingestion process and explored how to curate assets’ information, bringing context to where each piece of the architecture is positioned within the data platform.
Finally, we have also presented how data evolution impacts teams and reduces the value they can generate. With the help of OpenMetadata and Redpanda, engineers can detect changes early and automate business flows based on data evolution.
For any questions, you can visit OpenMetadata's documentation, reach out in Slack, or check out the OpenMetadata repository. If you like our post and want to support our project, give us a star!
Take Redpanda for a test drive here. Check out the documentation to understand the nuts and bolts of how the platform works, or read more Redpanda blogs to see the plethora of ways to integrate with Redpanda. To ask Redpanda Solution Architects and Core Engineers questions and interact with other Redpanda users, join the Redpanda Community on Slack.
