Unleashing the power of real-time analytics with SingleStore and Redpanda

Learn how to build a modern clickstream analytics system using Redpanda and SingleStore

Manish Kumar

Chris Larsen

June 20, 2023

The ability to use real-time data to make business decisions is critical in today’s world. In the modern business landscape, data has become the new oil, fueling growth and innovation across industries. Yet, as crucial as data is, its true value lies not only in the volume accumulated but in the speed at which we can process and analyze it effectively and efficiently to drive business outcomes.

The sheer magnitude of data generated every day requires tools that facilitate high-speed transactions, real-time analytics, and robust data streaming. Recognizing these requirements, two platforms stand out: SingleStore and Redpanda. While each is powerful in its own right, integrating the two can revolutionize your data management strategy by bringing together the best of data streaming and real-time analytics.

Take clickstream data as an example. Real-time clickstream analysis is a pervasive challenge in today's world of online interactions. It involves tracking and analyzing the sequence of clicks that a user makes while navigating a website or application, which is then used to improve user experiences, personalize content, and optimize marketing strategies.
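To make the idea of a "sequence of clicks" concrete, here is a minimal Python sketch of what one clickstream event might look like and how it could be serialized as JSON before being published to a topic. The `ClickEvent` class and every field name in it are hypothetical, chosen only for illustration; your own events will carry whatever your site or app actually records.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ClickEvent:
    """One click in a user's session (all field names are hypothetical)."""
    event_id: str    # unique ID, usable as a key downstream
    user_id: str
    session_id: str
    page_url: str
    element: str     # what was clicked, e.g. a button ID
    ts_ms: int       # event time in epoch milliseconds


def to_message(event: ClickEvent) -> bytes:
    """Serialize an event to compact UTF-8 JSON, a common topic payload format."""
    return json.dumps(asdict(event), separators=(",", ":")).encode("utf-8")


clicks = [
    ClickEvent("e2", "u42", "s7", "/pricing", "btn-signup", 1687250004200),
    ClickEvent("e1", "u42", "s7", "/home", "nav-pricing", 1687250000000),
]
messages = [to_message(c) for c in clicks]

# The click path for a session is recovered by ordering events on their timestamp.
path = [c.page_url for c in sorted(clicks, key=lambda c: c.ts_ms)]
```

Carrying a unique event ID and an event timestamp in every message is a design choice that pays off later: the ID gives the database something to deduplicate on, and the timestamp lets you rebuild the click order even if events arrive out of order.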

The high volume, velocity, and variety of clickstream data make real-time processing especially challenging. Teams typically use an event streaming platform like Kafka to process the clickstream, but have historically struggled with the downstream database where the complex analysis has to run.

Building such a real-time analytics system requires both ingesting large amounts of events and serving the analytical needs of the application as quickly as possible. Such integrations usually mean plumbing together multiple tools and custom solutions, which makes the system hard to manage and scale. This is where Redpanda’s integration with SingleStore provides a best-in-class option for customers building such applications.

What's Redpanda?

If you're new here, Redpanda is a modern streaming platform that acts as a drop-in replacement for Apache Kafka®. Kafka was once a streaming data superpower, but it struggles with modern data-intensive requirements, sparking a need for leaner, faster streaming data alternatives.

Redpanda is designed to offer higher performance with a simpler, more developer-friendly architecture compared to Kafka. It's fully API-compatible with Kafka, so it is well suited to building applications like this one and lets existing Kafka applications run without any code changes.

This makes it easy to integrate with SingleStore pipelines, which can ingest massive amounts of data into SingleStore and make it queryable in real time. Additionally, this can be done in a simple SQL-like interface, which makes using these configurations seamless for developers.

What's SingleStore?

Designed for apps, analytics and gen AI, SingleStore is the world's only real-time data platform that can read, write and reason on petabyte-scale data in a few milliseconds. At its core is a high-performance, SQL database engine that allows you to transact, analyze and contextualize your data. Leading companies like Adobe, Akamai, Comcast, GE, Heap, Hulu, Siemens and Sony trust SingleStore to achieve 50x faster ETL and 100x faster performance at a third of the costs compared to legacy architectures.

In this blog post, we will show how to set up a Redpanda cluster that receives clickstream data from a variety of sources. This data can then be ingested into a SingleStoreDB cluster running on AWS to provide rich insights in real time.

The challenges of building a modern clickstream analytics system

Building a modern clickstream analytics system involves collecting, processing, analyzing, and storing large amounts of data in real time. Consider a user who makes multiple clicks while browsing online. All of that clickstream data needs to be processed to drive business value or predictive analytics. At a high level, it involves four steps:

Collect data from a variety of sources
Ingest data
Store the data
Drive business insights by consuming the data

All of these operations need to happen in real time while ensuring the architecture is robust for enterprise readiness, scalable based on business needs, and at a reasonable cost.

Real-time stream processing systems such as Apache Flink and Apache Spark can provide a low-latency solution. Quick ingestion into a database, followed by quick processing of that data, is where Redpanda's integration with SingleStore provides the best parts of both tools.

Typical databases struggle with the speed of ingestion and have to rely on external tools. SingleStore, however, supports a native capability called Pipelines, which enables very fast ingestion from Kafka.

Build a real-time analytics system in three steps

Now we'll show a simple setup of ingesting a large amount of clickstream data from Redpanda to SingleStoreDB Cloud running on AWS.

Clickstream Kafka pipelines to SingleStore

Redpanda is easy to deploy in the cloud using one of two options: Dedicated Cloud (provisioned in Redpanda’s tenant, AWS in this case) or Bring Your Own Cloud (BYOC - provisioned in your tenant yet still fully managed with Redpanda’s unified control plane). The solution in this tutorial was built using Redpanda’s BYOC model.

To build the connection with SingleStoreDB Cloud, customers can set up a Kafka pipeline. Once you have a SingleStoreDB cluster running, you can create a pipeline that captures the incoming stream of data natively into SingleStore. All you need is three steps:

1 - Set up the actual pipeline using SingleStore Kafka pipelines.

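-- Replace every placeholder with your own values. The Kafka string below is a
-- placeholder; SingleStore's documented form is '<broker_host:port>[,...]/<topic>'.
CREATE OR REPLACE PIPELINE `<Pipeline_name>`
AS LOAD DATA KAFKA 'Redpanda_topic_1,Redpanda_topic_2,Redpanda_topic_3'
CONFIG '{
    "sasl.username": "<user_name>",
    "sasl.mechanism": "SCRAM-SHA-256",
    "security.protocol": "SASL_SSL"
}'
CREDENTIALS '{
    "sasl.password": "REDACTED"
}'
DISABLE OUT_OF_ORDER OPTIMIZATION
INTO TABLE <table_name>
FORMAT JSON
(
    -- <table column> <- <key in the incoming JSON message>
    field_1 <- value_1,
    field_2 <- value_2,
    field_3 <- value_3,
    field_4 <- value_4,
    field_5 <- value_5
)
-- On a key collision, overwrite each column with the incoming value.
-- VALUES() takes the column name, not the JSON key.
ON DUPLICATE KEY UPDATE
    field_1 = VALUES(field_1),
    field_2 = VALUES(field_2),
    field_3 = VALUES(field_3),
    field_4 = VALUES(field_4),
    field_5 = VALUES(field_5);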
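If the `field <- value` mapping and the `ON DUPLICATE KEY UPDATE` clause are new to you, the following Python sketch imitates their effect in memory. It is not how SingleStore implements pipelines; it only illustrates the semantics under stated assumptions. The `MAPPING` dictionary, the column names, and the choice of `event_id` as the unique key are all hypothetical.

```python
import json

# Hypothetical mapping: table column -> key in the incoming JSON message,
# mirroring the `column <- json_key` clauses of the pipeline definition.
MAPPING = {"event_id": "id", "user_id": "user", "page_url": "url", "ts_ms": "ts"}
PRIMARY_KEY = "event_id"  # assumed unique key on the target table


def to_row(message: bytes) -> dict:
    """Extract the mapped JSON keys into a column -> value row."""
    doc = json.loads(message)
    return {column: doc.get(key) for column, key in MAPPING.items()}


def upsert(table: dict, row: dict) -> None:
    """Insert the row, or overwrite the columns of the existing row with the same key."""
    table.setdefault(row[PRIMARY_KEY], {}).update(row)


table = {}
incoming = (
    b'{"id": "e1", "user": "u42", "url": "/home", "ts": 1}',
    b'{"id": "e2", "user": "u42", "url": "/pricing", "ts": 2}',
    b'{"id": "e1", "user": "u42", "url": "/home?ref=ad", "ts": 3}',  # same key as the first
)
for raw in incoming:
    upsert(table, to_row(raw))
```

After the loop, the table holds two rows rather than three: the third message shares a key with the first, so it updates that row in place instead of creating a duplicate. That is the behavior the upsert clause is there to provide when the same event is delivered more than once.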

2 - Once the pipeline is created, test it by pulling a sample of the data.

> TEST PIPELINE `<Pipeline_name>`;

3 - Once the sample data confirms that the pipeline works, start the pipeline and the data should begin flowing.

> START PIPELINE `<Pipeline_name>`;

Just like that, data starts landing in SingleStore, where it is instantly available for queries and quick insights.
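If you would rather script steps 2 and 3 than type them into a SQL client, one possible shape is sketched below in Python. It assumes you already have a cursor that follows the Python DB-API (`execute` and `fetchall`); since SingleStore speaks the MySQL wire protocol, a MySQL-compatible driver would typically provide one. The `verify_and_start` helper, the `RecordingCursor` stand-in, and the pipeline name `clicks_pipeline` are hypothetical and exist only so the sketch runs without a database.

```python
def verify_and_start(cursor, pipeline_name: str) -> list:
    """Run TEST PIPELINE, then START PIPELINE only if sample rows came back."""
    # The name is backtick-quoted, so reject names that would break the quoting.
    if "`" in pipeline_name:
        raise ValueError("pipeline name must not contain backticks")
    cursor.execute(f"TEST PIPELINE `{pipeline_name}`")
    sample = list(cursor.fetchall())
    if not sample:
        raise RuntimeError("TEST PIPELINE returned no rows; pipeline not started")
    cursor.execute(f"START PIPELINE `{pipeline_name}`")
    return sample


class RecordingCursor:
    """Stand-in for a real DB-API cursor: records statements, returns canned rows."""

    def __init__(self, rows):
        self.rows = rows
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)

    def fetchall(self):
        return self.rows


cursor = RecordingCursor(rows=[("e1", "u42", "/home")])
sample = verify_and_start(cursor, "clicks_pipeline")
```

Refusing to start when the test returns nothing is a deliberately cautious choice: an empty sample often points to a wrong topic name or credentials, and it is easier to catch that before the pipeline is running.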

Empower your real-time data strategy

The combination of SingleStore and Redpanda offers a best-in-class solution for organizations seeking real-time analytics and high-speed data processing. By harnessing the power of these platforms, businesses can stay ahead in today’s data-driven landscape. In this step-by-step blog, we demonstrated how to set up the connection between Redpanda and SingleStoreDB Cloud running on AWS.

Redpanda provides a real-time ingestion platform that can be a drop-in replacement for Kafka. SingleStore provides a unique capability to serve both the transactional and analytical needs of your application, and makes data available for query as soon as it's loaded. This makes building real-time applications simple and efficient, and makes SingleStore a strong choice for storing real-time data streamed from Redpanda.

Interested in trying out SingleStore and Redpanda? Learn how to get started building with SingleStore and try Redpanda for free. You can also browse the Redpanda blog for step-by-step tutorials and real-world customer stories. If you have questions or just want to chat with fellow Redpanda users, join the Redpanda Community on Slack.