Use rpk to bootstrap the config, start nodes and manage topics.

By David Castillo on October 7, 2020
Getting started with RPK

Introduction

In this post I’m going to introduce rpk, a single tool for managing your entire Redpanda cluster. It handles everything from low-level tuning to node configuration, and Apache Kafka® level management tasks like topic creation.

Prerequisites

For the purpose of this guide, we'll assume a fresh single node. If you have a node or a cluster running already, you can still follow this guide, but you might get different outputs from some of the commands.

If you run into any issues, please let us know in our community Slack workspace!

Installing Redpanda

To install Redpanda on Fedora/RedHat systems:

## Run the setup script to download and install the repo
curl -1sLf 'https://packages.vectorized.io/nzc4ZYQK3WRGd9sy/redpanda/cfg/setup/bash.rpm.sh' | sudo -E bash && \
## Use yum to install redpanda
sudo yum install redpanda -y

For Debian/Ubuntu systems, use:

## Run the setup script to download and install the repo
curl -1sLf 'https://packages.vectorized.io/nzc4ZYQK3WRGd9sy/redpanda/cfg/setup/bash.deb.sh' | sudo -E bash && \
## Use apt to install redpanda
sudo apt install redpanda -y

You can check that it was installed correctly by running

$ rpk version

Note: For more information about installing Redpanda on Linux, macOS, or any other compatible system, see the quickstart documentation.

Bootstrapping the configuration

Redpanda comes with a default configuration for a single-node "cluster" that works out of the box. However, there are times when you need to set certain fields of the node's configuration, like the node's IP address and its ID. Fortunately, there's an rpk command for that!

$ sudo rpk redpanda config bootstrap --id 0

Note: The DEB and RPM packages install the config file in /etc/redpanda/redpanda.yaml by default, which is why these commands need to be run as root.

rpk redpanda config bootstrap will set the node ID to the given ID and try to discover your machine's private IPv4 address to set the config accordingly.

If you need to set other fields, you can do so with rpk redpanda config set. Also make sure to check out our advanced config reference for a complete list of configuration fields.
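To get a feel for what "discovering the private IPv4 address" involves, here is a minimal shell sketch. It is not rpk's actual implementation: the function name pick_private_ipv4 is made up for illustration, and all it does is pick the first address that falls in one of the RFC 1918 private ranges from a list of candidates.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rpk): read candidate IPv4 addresses,
# one per line, and print the first one that falls in an RFC 1918 private
# range (10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16).
pick_private_ipv4() {
  grep -E -m1 '^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)'
}

# Example usage on Linux, where `hostname -I` lists the host's addresses:
#   hostname -I | tr ' ' '\n' | pick_private_ipv4
```

If the address this prints is not the one you expect Redpanda to use, that is a good hint that you may want to set the address fields explicitly instead of relying on discovery.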

Starting Redpanda

The Redpanda packages come with custom systemd units that ensure isolation and resilience and provide a simple way to run Redpanda. If you are familiar with systemd, you’ll feel right at home. Let’s walk through it anyway.

$ sudo systemctl start redpanda

You can verify that it started by running

$ systemctl status redpanda

If everything went well, the output you see should be something like this:

● redpanda.service - Redpanda, the fastest queue in the West.
     Loaded: loaded (/usr/lib/systemd/system/redpanda.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2020-10-12 08:56:39 -05; 2min 8s ago
   Main PID: 21188 (redpanda)
     Status: "redpanda is ready! - release-0.99.7-178-g57e2946c - 57e2946c53f4d89e8446191b98aae1e969727090-dirty"
      Tasks: 32 (limit: 38107)
     Memory: 505.9M
     CGroup: /redpanda.slice/redpanda.service
             └─21188 /opt/redpanda/bin/redpanda --redpanda-cfg /etc/redpanda/redpanda.yaml --lock-memory false --io-properties-file /etc/redpanda/io-config.yaml

Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,414 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:565 - Recovered, log offsets: {start_offset:>
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,425 [shard 0] storage - segment.cc:522 - Creating new segment /var/lib/redpanda/data/redpanda/kvstore/0_0/0-0-v1.log
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,439 [shard 0] cluster - members_manager.cc:46 - starting cluster::members_manager...
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,450 [shard 0] raft - [group_id:0, {redpanda/controller/0}] vote_stm.cc:238 - became the leader term:{1}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,450 [shard 0] storage - segment.cc:522 - Creating new segment /var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,450 [shard 0] cluster - state_machine.cc:18 - Starting state machine
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:448 - Started RPC server listening at {host: 0.0.0.0, port: 33145}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:487 - Started Kafka API server listening at {host: 0.0.0.0, port: 9092}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:489 - Successfully started Redpanda!
Oct 12 08:56:39 localhost.localdomain systemd[1]: Started Redpanda, the fastest queue in the West..

Using systemd is especially useful when running Redpanda in production, since it's a central place to enforce restart policies, resource isolation, and running periodic jobs.
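When you start Redpanda from a script, it can help to wait until systemd reports the unit as active before moving on. The following is a minimal sketch of that idea. The wait_until helper is hypothetical (it is not part of rpk or systemd); it just retries whatever command you give it, such as systemctl is-active --quiet redpanda.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) has elapsed. Returns non-zero on timeout.
wait_until() {
  local timeout=$1
  shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example usage: wait up to 30 seconds for the unit to become active.
#   wait_until 30 systemctl is-active --quiet redpanda && echo "redpanda is up"
```

Note that "active" only means systemd considers the service started. For a stricter check you could pass a different command to the same helper.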

rpk also provides a debug info command which gives you much more information about the current node, such as resource usage and the current configuration (your output might differ):

$ rpk debug info
  Version                                 v21.1.2 (rev 500bf8fa)
  Cloud Provider                          aws
  Machine Type                            i3.large
  OS                                      x86_64 5.4.0-1025-aws #25-Ubuntu SMP Fri Sep 11 09:37:24 UTC 2020 Ubuntu 20.04.1 LTS
  CPU Model                               Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
  CPU Usage %                             23.993
  Free Memory (MB)                        13271.105
  Free Space (MB)                         449520.945
  cluster_id                              test
  config_file                             /etc/redpanda/redpanda.yaml
  license_key                             eyJvIjoidmVjdG9yaXplZC5pbyIsInkiOjIwMjAsIm0iOjEwLCJkIjoxMiwiYyI6MzYyMjkwOTA3NH0=
  node_uuid                               RKdMw6hLru7NxbMHkoFcUkvY42No26xKFDE92bx4so2h2DJtQ
  organization                            vectorized.io
  redpanda.admin                          172.31.20.28:9644
  redpanda.auto_create_topics_enabled     true
  redpanda.data_directory                 /var/lib/redpanda/data
  redpanda.kafka_api                      172.31.20.28:9092
  redpanda.kafka_api_tls.cert_file
  redpanda.kafka_api_tls.enabled          false
  redpanda.kafka_api_tls.key_file
  redpanda.kafka_api_tls.truststore_file
  redpanda.node_id                        0
  redpanda.rpc_server                     172.31.20.28:33145
  rpk.coredump_dir                        /var/lib/redpanda/coredump
  rpk.enable_memory_locking               false
  rpk.enable_usage_stats                  true
  rpk.tls.cert_file
  rpk.tls.key_file
  rpk.tls.truststore_file
  rpk.tune_aio_events                     true
  rpk.tune_clocksource                    true
  rpk.tune_coredump                       false
  rpk.tune_cpu                            true
  rpk.tune_disk_irq                       true
  rpk.tune_disk_nomerges                  true
  rpk.tune_disk_scheduler                 true
  rpk.tune_fstrim                         true
  rpk.tune_network                        true
  rpk.tune_swappiness                     true
  rpk.tune_transparent_hugepages          false

Note: If you've read our docs guide, you might be wondering why we didn't run rpk redpanda tune. The answer is that rpk redpanda tune makes changes to your machine's configuration to ensure the best performance possible, and some of them are persistent. Since they might affect your experience on desktop apps, it's better not to tune your development machine.

Introducing rpk topic

As part of Redpanda, our goal with rpk is for it to have everything you need to do your job. No external tools, no long setup or configuration steps. If there's a common use case, rpk should support it.

That's why we added the rpk topic command namespace, which allows you to interact with Redpanda's Kafka-compatible API without installing anything else, or using task-specific shell scripts.

All of the rpk topic subcommands default to using the IP configured with rpk redpanda config bootstrap, or localhost:9092 if a configuration file isn't found. If you need to override that (e.g. to interact with other remote brokers), you can pass a list of broker addresses (<ip>:<port> pairs) through the --brokers flag.
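If you keep broker addresses in a script, a small helper can check that each one looks like an <ip>:<port> pair before handing the list to --brokers. This is only a sketch: join_brokers is a made-up name, and it assumes the flag accepts the addresses as one comma-separated value, so check rpk topic --help for the exact format your version expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate <host>:<port> pairs and join them with
# commas. Assumption: --brokers takes a comma-separated list.
join_brokers() {
  local re='^[^:,[:space:]]+:[0-9]+$'
  local out="" addr
  for addr in "$@"; do
    # Require a non-empty host part followed by a numeric port.
    if ! [[ $addr =~ $re ]]; then
      echo "invalid broker address: $addr" >&2
      return 1
    fi
    out+="${out:+,}$addr"
  done
  printf '%s\n' "$out"
}

# Example usage (the second address is made up for illustration):
#   rpk topic list --brokers "$(join_brokers 172.31.20.28:9092 172.31.20.29:9092)"
```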

Managing topics

Creating a topic is as simple as running

$ rpk topic create cute-pandas --replicas 1
Created topic 'cute-pandas'. Partitions: 1, replicas: 1, configuration: 'cleanup.policy':'delete'

Because we only have one node, the default number of partitions and a replication factor of 1 are fine. For a production-ready configuration, though, you'll probably want to tune both to your needs with the -p and -r flags.

Let's check our new topic's configuration:

$ rpk topic describe cute-pandas
  Name                cute-pandas
  Internal            false
  Config:
  Name                Value  Read-only  Sensitive
  partition_count     1      false      false
  replication_factor  1      false      false
  Partitions          1 - 1 out of 1
  Partition  Leader  Replicas  In-Sync Replicas  High Watermark
  0          0       [0]       [0]               1

Here we get high-level information about the given topic. In addition to telling us the number of partitions and replicas - which we already knew - there's useful data like which node is the leader for each partition and which other brokers hold its replicas. The High Watermark column shows the latest offset that has been replicated to all replicas for that partition.

You can also use rpk for producing and consuming from topics. Let's try that.

$ echo '{"name": "Red", "website": "vectorized.io"}' | rpk topic produce cute-pandas --key record-key -H header-key:header-value
Reading message... Press CTRL + D to send, CTRL + C to cancel.
Sent record to partition 0 at offset 1 with timestamp 2020-10-07 18:29:08.481278811 +0000 UTC m=+0.248046468.

rpk topic produce reads from standard input, so it works well for scripting or playing around with it on the command line.

Let's see where that message went by consuming from our topic:

$ rpk topic consume cute-pandas
{
 "headers": [
  {
   "key": "header-key",
   "value": "header-value"
  }
 ],
 "key": "record-key",
 "message": "{\"name\": \"Red\", \"website\": \"vectorized.io\"}\n",
 "partition": 0,
 "offset": 1,
 "timestamp": "2020-10-07T18:29:08.481Z"
}

Which is just what we had produced! 🎉

rpk will consume from the "beginning", and block waiting for new records. You can exit by pressing CTRL+C.

If you're planning on piping the incoming records to other commands (e.g. jq, awk), you can turn pretty-printing off by passing --pretty-print false.

When inspecting live traffic, you may also want to set --offset newest to avoid consuming from the topic's first record. --partitions is also a useful flag to further filter the consumed records.
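As a rough illustration of piping consumed records into another command, the sketch below uses awk to print each record's partition and offset. It assumes that with pretty-printing off every record arrives as a single JSON line with the same field names shown in the output above. record_positions is a made-up helper name, and for anything beyond a quick look a real JSON tool such as jq is the more robust choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one JSON record per line on stdin and print
# "<partition> <offset>" for each record. This is a pattern match on the
# numeric fields, not a full JSON parser.
record_positions() {
  awk '
    # Return the numeric value of the named top-level field, or "".
    function field(name,    re, s) {
      re = "\"" name "\": *[0-9]+"
      if (!match($0, re)) return ""
      s = substr($0, RSTART, RLENGTH)   # e.g. "offset": 1
      sub(/.*: */, "", s)               # keep only the number
      return s
    }
    { print field("partition"), field("offset") }
  '
}

# Example usage:
#   rpk topic consume cute-pandas --pretty-print false | record_positions
```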

cute-pandas is a toy topic, with only one partition and a replication factor of 1. For "bigger" topics with thousands of partitions, replicated across multiple nodes, it's useful to have a way to check their health. That's what rpk topic info is for!

$ rpk topic info cute-pandas
  Name                         cute-pandas
  Internal                     false
  Partitions                   1
  Under-replicated partitions  None
  Unavailable partitions       None

If there are any issues with our topic's partitions, we would see them here:

  • Under-replicated partitions: Partitions whose number of in-sync replicas has fallen below their replication factor.
  • Unavailable partitions: Partitions for which the majority of their replicas are unavailable.
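The two definitions above can be sketched as a small classification function. This is an illustration of the definitions only, not how rpk computes them: partition_health is a made-up name, and it simplifies by treating "in sync" and "available" as the same thing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one partition from its replication factor
# and the number of replicas that are currently available and in sync.
partition_health() {
  local replicas=$1 in_sync=$2
  local missing=$((replicas - in_sync))
  if [ $((missing * 2)) -gt "$replicas" ]; then
    echo "unavailable"        # a majority of the replicas is missing
  elif [ "$in_sync" -lt "$replicas" ]; then
    echo "under-replicated"   # some, but not most, replicas are missing
  else
    echo "healthy"
  fi
}

partition_health 3 3   # prints: healthy
partition_health 3 2   # prints: under-replicated
partition_health 3 1   # prints: unavailable
```

An unavailable partition is also under-replicated by definition. The function reports only the more severe of the two states.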

Lastly, we can also delete topics:

$ rpk topic delete cute-pandas
Deleted topic 'cute-pandas'.

We can list all topics to verify that the topic was, in fact, deleted:

$ rpk topic list
No topics found.

Conclusion

I hope this has been useful! If you have used rpk and have any ideas on how to improve it or its UX, please reach out! We're always excited to hear feedback and suggestions, and strive to incorporate them quickly into Redpanda and rpk.

As mentioned before, you can optionally pass --brokers to all of the rpk topic subcommands, which lets you interact with the brokers of a remote cluster. Make sure to also check my previous blog post on configuring TLS in Redpanda and rpk, which could be useful when interacting remotely with a cluster.

Acknowledgments

Thanks to the sarama project contributors. It was a great reference point for rpk topic.


2021-01-25: Edited all the commands to reflect the latest namespacing changes.
