Ockam and Redpanda launch the world's first zero-trust streaming data platform

Connect, secure, and stream—all in one simple platform

June 25, 2024

We’re proud to launch Redpanda Connect with Ockam: the first zero-trust streaming data platform! Redpanda Connect with Ockam is the only enterprise-scale, zero-trust, streaming data platform that's simultaneously easy to deploy, secure by design, and highly performant.

Redpanda and Ockam are natural partners, as both companies share the same ethos: to enable every developer to build distributed systems, at scale, with simple tools. Both companies' products are also based on popular open-source projects built by experienced, high-performing teams.

“Enterprise executives tell me that their high-value streaming applications take a long time to build and to ship,” said Matthew Gregory, Ockam's CEO.

“Their biggest obstacle is the construction of secure streaming data connections across their siloed enterprise, and between partner companies. Secure data streaming is now simple to set up, trivial to maintain, and almost impossible to mess up. We've unlocked data so application development can move fast.”

Enterprise Kafka product vendors defer zero-trust implementation details to their customers. It's difficult — even for sophisticated engineering organizations — to pull together a team that can choose and assemble the dozens of parts required to build a secure-by-design streaming data platform. Redpanda Connect with Ockam empowers a single developer to easily create end-to-end encrypted streaming pipelines that can connect hundreds of applications across clouds and hybrid infrastructure. It builds trust in data streams, eliminates sources of risk to critical data, and removes the operational complexities inherent in managing keys at scale.

A better way to build streaming data pipelines

Redpanda provides a declarative streaming data platform that enables developers and organizations to build simple, chained, and stateless processing steps for their data pipelines. Our partnership with Redpanda brings a truly zero-trust capability to these pipelines.

Ockam's cryptographic protocols and data integrity guarantees remove the risk of man-in-the-middle (MITM) and supply chain attacks that could otherwise tamper with data in flight and lead to data poisoning, which is highlighted as one of the OWASP Top 10 risks for companies that leverage AI and Large Language Model (LLM) systems.

The mutual authentication and encrypted data protocols that Ockam provides also mean organizations can transmit highly sensitive data with cryptographic guarantees so only the intended recipients will be able to decrypt and process it. This will unlock new high-value applications that businesses may have been unwilling to explore due to privacy and governance risk, or complexity.

“We help the world's largest companies build and scale streaming applications,” said Alex Gallego, Redpanda's CEO. “Redpanda Connect with Ockam was conceived to empower our customers to build streaming data applications with ease.

“Redpanda Connect comes with over 220 pre-built integrations that will quickly connect to and unlock your data silos. Ockam provides the mutual authentication and end-to-end encryption that zero-trust streaming data applications require. Together with Ockam, we've abstracted away all of the complex pieces that cause engineering teams to stumble.”

Redpanda’s State of Streaming Data Report 2023-2024 surveyed 300 streaming data users across various industries.

  • 66% said that their streaming data projects are blocked by high costs and a lack of in-house expertise.
  • 40% said that data security, data privacy, and complexity are their main technical hurdles.

These are alarming numbers! Despite the obvious benefits that a real-time streaming data architecture provides, companies struggle to ship data pipelines to their applications. Often, they're blocked for several quarters. Once they get to production, they realize that their data doesn't have the security guarantees they need or expect from their streaming data infrastructure.

Redpanda Connect with Ockam aims to move these numbers to 0% and 100%. Here’s how.

Connect, secure, and stream—all in one simple platform

Connect with Redpanda Connect

Redpanda Connect (formerly known as Benthos) empowers engineers to build connectors and integrations to over 200 applications, including SQL, Mongo, Snowflake, Cassandra, Redis, Splunk, AWS Lambda, Azure CosmosDB, and GCP Cloud Storage. Redpanda Connect can process streaming data with transforms, mapping, filtering, hydration, and enrichment capabilities.

Secure with Ockam

Zero trust in your data streaming infrastructure allows you to have absolute trust in your data and applications. Ockam adds the guarantees of data confidentiality, integrity, and authenticity, so you can enforce data governance and privacy policies at scale.

Each producer creates its own unique cryptographically provable identity and encryption keys. The producers use their keys to establish trusted secure channels through Redpanda Cloud and to the consumers across the streaming platform. All the data that moves through your streaming data platform is encrypted in motion — even when it's passing through a broker. Keys, enrollments, and credentials are safely created, stored, rotated, and revoked automagically so there's almost nothing to manage.

Ockam makes it simple to build trust across your entire data layer in a way that's almost impossible to mess up.

Stream with Redpanda Cloud

Redpanda Cloud is a complete streaming data platform delivered as a fully managed service with automated upgrades and patching, data and partition balancing, and 24×7 support. It continuously monitors and maintains your clusters along with the underlying infrastructure to meet strict performance, availability, reliability, and security requirements.

Ockam in action

The example below shows how to use Ockam to connect any input source (e.g., MongoDB, PostgreSQL, or a Kafka stream) to any output (e.g., Snowflake, S3, or Splunk) while automatically encrypting all your data in motion. You can run this complete example locally in less than two minutes by copying the relevant sections below.

First, copy the code block below and save it to a file named consumer.yaml:

# consumer.yaml

input:
  ockam_kafka:
    seed_brokers: [rp-node-0:9092]
    topics: [topic_A]
    consumer_group: example_group
    ockam_allow_producer: producer
    ockam_relay: consumer_relay
    ockam_enrollment_ticket: ${OCKAM_ENROLLMENT_TICKET}

pipeline:
  processors:
    - bloblang: |
        root = this
        root.data.message = this.data.message.uppercase()

output:
  stdout: {}
  # snowflake_put:
  #   account: acme
  #   ...

Next, save the code block below to a file named producer.yaml:

# producer.yaml

input:
  generate:
    count: 1000
    interval: "@every 1s"
    mapping: |
      root = {
        "_producer": hostname(),
        "data": { "email": fake("email"), "message": fake("sentence") }
      }
  # mongodb:
  #   url: mongodb://localhost
  #   database: orders
  #   ...

output:
  ockam_kafka:
    seed_brokers: [rp-node-0:9092]
    topic: topic_A
    ockam_route_to_consumer: /project/default/service/forward_to_consumer_relay/secure/api
    ockam_allow_consumer: consumer
    ockam_enrollment_ticket: ${OCKAM_ENROLLMENT_TICKET}

Finally, run the commands below and data will start flowing. They use Homebrew to install tools and Docker to run containers, so please install both on your machine before you begin.
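Before running them, it can help to confirm that both prerequisites are actually on your PATH. The snippet below is a minimal sketch of such a preflight check. require_tools is a hypothetical helper written for this walkthrough, not part of rpk, Ockam Command, or Docker.

```shell
# Hypothetical helper: report any required command-line tools missing from PATH.
# Prints one "missing: <tool>" line per absent tool and returns non-zero
# if at least one tool could not be found.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Check the two prerequisites this example relies on.
require_tools brew docker || echo "Install the tools listed above before continuing."
```

If the check reports nothing, both tools were found and the commands that follow should be able to start.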

# Setup Redpanda
brew install redpanda-data/tap/redpanda
rpk container start

# Setup Ockam
brew install build-trust/ockam/ockam
ockam enroll

# Setup Consumer
ockam project ticket --usage-count 1 --expires-in 10m \
  --attribute consumer --relay consumer_relay > ticket

docker run --rm --network redpanda --name consumer \
  -v "$(pwd)/consumer.yaml:/connect.yaml" \
  -e OCKAM_ENROLLMENT_TICKET="$(cat ticket)" ghcr.io/build-trust/redpanda-connect

# The previous command will block and print the output so you can see 
# the messages being consumed by the consumer.
#
# Run the following commands in a new terminal window.

# Setup Producers
for i in $(seq 1 3); do
  ockam project ticket --usage-count 1 --expires-in 10m \
    --attribute producer > ticket

  docker run --rm -d --network redpanda --name "producer$i" \
    -v "$(pwd)/producer.yaml:/connect.yaml" \
    -e OCKAM_ENROLLMENT_TICKET="$(cat ./ticket)" ghcr.io/build-trust/redpanda-connect
done

# It can take a minute or so for the messages to start flowing.
# Observe that the above consumer can decrypt and transform messages.

# However, if you look at the messages inside Redpanda Console at
# localhost:8080 or using the rpk command below, you'll notice that
# the messages are encrypted.
rpk topic consume topic_A

# Cleanup
docker stop $(docker ps -q --filter "name=producer" --filter "name=consumer")
rpk container purge

Walking through the commands

Walking through the commands one by one shows what each step achieves, starting with getting Redpanda running locally:

brew install redpanda-data/tap/redpanda
rpk container start

These two commands install Redpanda and start a container. Now a broker is running in a container that we will use to stream and process data with Redpanda Connect.

brew install build-trust/ockam/ockam
ockam enroll

The brew command installs Ockam Command in your local environment. ockam enroll opens a browser window and guides you through signing up or signing in to Ockam Orchestrator. Ockam Orchestrator lets you securely manage the enrollment of thousands of processing nodes, or more, without manually pushing identities, keys, or credentials to each node. This initial enrollment establishes your local machine as an Administrator within the project that has been set up for you within Ockam Orchestrator.

ockam project ticket --usage-count 1 --expires-in 10m \
  --attribute consumer --relay consumer_relay > ticket

The ockam project ticket command generates a ticket that will admit another node into your project. The arguments here specify:

  • A usage count of 1, so the ticket can be used only once. Once the intended node has enrolled with it, a leaked copy gives an attacker no way to reuse it to gain access to your project.
  • A time-to-live (TTL) of 10 minutes. This also reduces the risk posed by a leaked ticket, and you can adjust it to any time window that's appropriate for your use case.
  • An attribute of consumer. Any node that enrolls with this ticket receives this attribute, and policy definitions can then use it to apply attribute-based access control (ABAC) and restrict which nodes can communicate with each other.
  • Permission for any node that enrolls with this ticket to set up a relay named consumer_relay. This provides a named route for other nodes to find this node, even if the nodes are on different networks.

The output of that command, the enrollment ticket, is then redirected out to a file named ticket.
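Because the consumer and the producers are enrolled with the same command and differ only in the attribute and the optional relay, the call can be wrapped in a small function. This is only a sketch: issue_ticket, DRY_RUN, and TICKET_TTL are hypothetical names introduced here, and the only Ockam flags used are the ones already shown in this example.

```shell
# Hypothetical helper (not part of Ockam Command): issue a single-use,
# short-lived enrollment ticket for a node with the given attribute.
# Usage: issue_ticket <attribute> [relay_name]
# Set DRY_RUN=1 to print the command instead of running it.
issue_ticket() {
  attribute="$1"
  relay="${2:-}"
  # Build the command from the flags used in this walkthrough.
  set -- ockam project ticket --usage-count 1 \
    --expires-in "${TICKET_TTL:-10m}" --attribute "$attribute"
  # Only nodes that need to be reachable by name get a relay.
  if [ -n "$relay" ]; then
    set -- "$@" --relay "$relay"
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the two commands without contacting Ockam Orchestrator.
DRY_RUN=1 issue_ticket consumer consumer_relay
DRY_RUN=1 issue_ticket producer
```

Without DRY_RUN, the function writes the ticket to standard output, so it can be redirected to a file in the same way as the original command, for example issue_ticket consumer consumer_relay > ticket.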

docker run --rm --network redpanda --name consumer \
  -v "$(pwd)/consumer.yaml:/connect.yaml" \
  -e OCKAM_ENROLLMENT_TICKET="$(cat ticket)" ghcr.io/build-trust/redpanda-connect

This is where you start your consumer with Redpanda Connect. It uses Docker to start a new container named consumer, on an isolated network named redpanda where the broker is available, passing in the consumer.yaml and ticket files.

The consumer.yaml file defines how to receive the data:

input:
  ockam_kafka:
    seed_brokers: [rp-node-0:9092]
    topics: [topic_A]
    consumer_group: example_group
    ockam_allow_producer: producer
    ockam_relay: consumer_relay
    ockam_enrollment_ticket: ${OCKAM_ENROLLMENT_TICKET}

This input block shows the standard Kafka configuration options that specify the broker, topics, and consumer group to use. The following three lines are the Ockam-specific additions that allow this input source to receive and process end-to-end encrypted data streams:

  • ockam_allow_producer: the attribute a producer node must hold before this node will establish a trusted channel with it.
  • ockam_relay: the name of the relay to connect this node to.
  • ockam_enrollment_ticket: the enrollment ticket to use to admit this node into the Ockam Orchestrator project.

pipeline:
  processors:
    - bloblang: |
        root = this
        root.data.message = this.data.message.uppercase()

The pipeline block and its processors apply a basic transformation to the received data, converting the message attribute to upper-case. While not especially useful in production, this transformation lets you independently verify that the consumer was able to decrypt and process the data.

output:
  stdout: {}
  # snowflake_put:
  #   account: acme
  #   ...

The last part of the consumer.yaml configuration is the output block. This example implements a simple stdout output. As the commented-out Snowflake block highlights, other destinations can receive your final processed data stream by providing their configuration settings.

The code snippet for starting data producers shows how Ockam makes enrolling any number of nodes painless:

for i in $(seq 1 3); do
  ockam project ticket --usage-count 1 --expires-in 10m \
    --attribute producer > ticket

  docker run --rm -d --network redpanda --name "producer$i" \
    -v "$(pwd)/producer.yaml:/connect.yaml" \
    -e OCKAM_ENROLLMENT_TICKET="$(cat ./ticket)" ghcr.io/build-trust/redpanda-connect
done

The for i in $(seq 1 3); do line starts a loop that creates three separate producers. Each producer will generate a unique identity, enroll in the Ockam Orchestrator project, and establish a secure channel with the consumer node. If you feel ambitious, you can increase that 3 to a higher number. Just be aware that the script runs a new Docker container for each producer, so you could quickly exhaust your local system resources.
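If you do experiment with larger numbers, a guard around the loop can keep a typo from launching far more containers than intended. The sketch below wraps the same loop in a function. start_producers, MAX_PRODUCERS, and DRY_RUN are hypothetical names chosen for illustration, and the default limit of 10 is arbitrary.

```shell
# Hypothetical wrapper around the producer loop from this walkthrough.
# Usage: start_producers [count]   (defaults to 3)
# Refuses to start more than MAX_PRODUCERS containers (default 10, arbitrary).
# Set DRY_RUN=1 to list the containers that would be started.
start_producers() {
  count="${1:-3}"
  max="${MAX_PRODUCERS:-10}"
  if [ "$count" -gt "$max" ]; then
    echo "refusing to start $count producers (limit: $max)" >&2
    return 1
  fi
  for i in $(seq 1 "$count"); do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would start producer$i"
      continue
    fi
    # Each producer gets its own single-use ticket, as in the original loop.
    ockam project ticket --usage-count 1 --expires-in 10m \
      --attribute producer > ticket
    docker run --rm -d --network redpanda --name "producer$i" \
      -v "$(pwd)/producer.yaml:/connect.yaml" \
      -e OCKAM_ENROLLMENT_TICKET="$(cat ./ticket)" ghcr.io/build-trust/redpanda-connect
  done
}

# Preview what a run with three producers would do.
DRY_RUN=1 start_producers 3
```

Raising the limit is then an explicit decision, for example MAX_PRODUCERS=20 start_producers 15.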

The ockam project ticket command generates an enrollment ticket, just as it did in the earlier section that explained the setup of the consumer. The primary difference with this ticket is that the attribute is set to producer and there is no need to set up a relay for a producer.

The docker run command runs the same image as before, on the same network, giving each container a unique name with a prefix of producer, and passing in producer.yaml and the ticket for each node.

The producer.yaml configuration file has two blocks of importance:

input:
  generate:
    count: 1000
    interval: "@every 1s"
    mapping: |
      root = {
        "_producer": hostname(),
        "data": { "email": fake("email"), "message": fake("sentence") }
      }
  # mongodb:
  #   url: mongodb://localhost
  #   database: orders
  #   ...

The input block here generates random data, which is useful for a self-contained example but is something you would replace in a real-world use case. Every second, it sends a new message to the output, stopping after 1,000 messages per the count setting. Each message includes the hostname (allowing you to verify that there are three separate nodes producing data) and fake data for the email and message attributes in the payload.

If you recall the explanation earlier, the message field is later processed by the consumer to be entirely in upper-case before being output to stdout. For a real-world use case, the commented-out block gives an example of how to use a MongoDB database as the input source for a producer.

output:
  ockam_kafka:
    seed_brokers: [rp-node-0:9092]
    topic: topic_A
    ockam_route_to_consumer: /project/default/service/forward_to_consumer_relay/secure/api
    ockam_allow_consumer: consumer
    ockam_enrollment_ticket: ${OCKAM_ENROLLMENT_TICKET}

The output block in producer.yaml contains the standard Kafka configuration, along with three Ockam-specific options:

  • ockam_route_to_consumer: the full route to the relay that was set up for the consumer.
  • ockam_allow_consumer: the attribute a consumer node must hold before this node will establish a trusted channel with it.
  • ockam_enrollment_ticket: the enrollment ticket to use to admit this node into the Ockam Orchestrator project.

Now everything is up and running! You'll see in the log output that the consumer is receiving data, from different hosts, and that the message in each payload is upper-case. If you open the Redpanda Console or run rpk topic consume topic_A you'll see that the encrypted messages in topic_A are not readable by the broker or anybody else that might have access to the topic.
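To make that last observation a little more systematic, you can search the raw topic records for a field name that only appears in the plaintext payload, such as _producer. The sketch below is a rough heuristic rather than proof of encryption. contains_plaintext is a hypothetical helper, and the two sample records are made up so the snippet runs on its own.

```shell
# Rough check: does any record on stdin expose the plaintext "_producer" field?
# Exit status 0 means the field name was found; 1 means it was not.
contains_plaintext() {
  grep -q '_producer'
}

# Self-contained demonstration with two made-up records.
printf '%s\n' '{"_producer":"host-1","data":{"message":"hi"}}' | contains_plaintext \
  && echo "plaintext visible"
printf '%s\n' 'q83vEjRWeJA=' | contains_plaintext \
  || echo "no plaintext field found"
```

Against the running example, you would pipe the output of rpk topic consume topic_A into contains_plaintext. Since that command keeps streaming, you would need to stop it after a handful of records, for instance with the --num flag if your rpk version provides it.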

By adding or changing a few lines in the consumer.yaml and producer.yaml files it's possible to:

  • Use an existing self-hosted Redpanda broker or Redpanda Cloud instance.
  • Add more inputs to producer.yaml and/or outputs to consumer.yaml to connect, secure, and stream data between any of the hundreds of different integrations supported by Redpanda Connect.

To learn more about Ockam, you can check out our website and follow us on LinkedIn. If you have questions about Redpanda Connect or all things streaming data, drop into the Redpanda Community on Slack and ask away.

Originally published on Ockam.io
