Intro
In this post I’m going to introduce rpk, a single tool for managing your entire Redpanda cluster. It handles everything from low-level tuning and node configuration to Kafka®-level management tasks like topic creation.
Prerequisites
For the purpose of this guide, we’ll assume a fresh single node. If you have a node or a cluster running already, you can still follow this guide, but you might get different outputs from some of the commands.
If you run into any issues, please let us know in our community Slack workspace!
Installing Redpanda
Enter your email on the signup form at https://redpanda.com/ and run the generated command. It should look something like this:
$ curl -s https://<url to setup script> | sudo bash && sudo yum install redpanda -y
You can check that it was installed correctly by running
$ rpk version
Bootstrapping the configuration
Redpanda comes with a default configuration for a single-node “cluster” that works out of the box. However, there are times when you need to set certain fields of the node’s configuration, like the node’s IP address and its ID. Fortunately, there’s an rpk command for that!
$ sudo rpk redpanda config bootstrap --id 0
Note: The DEB and RPM packages install the config file in /etc/redpanda/redpanda.yaml by default, which is why these commands need to be run as root.
rpk redpanda config bootstrap will set the node ID to the given ID and try to discover your machine’s private IPv4 address to set the config accordingly.
If you need to set other fields, you can do so with rpk redpanda config set. Also make sure to check out our advanced config reference for a complete list of configuration fields.
Starting Redpanda
The Redpanda packages come with custom systemd units that ensure isolation and resilience and provide a simple way to run Redpanda. If you’re familiar with systemd, you’ll feel right at home. Let’s walk through it anyway.
$ sudo systemctl start redpanda
You can verify that it started by running
$ systemctl status redpanda
If everything went well, the output you see should be something like this:
● redpanda.service - Redpanda, the fastest queue in the West.
   Loaded: loaded (/usr/lib/systemd/system/redpanda.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-10-12 08:56:39 -05; 2min 8s ago
 Main PID: 21188 (redpanda)
   Status: "redpanda is ready! - release-0.99.7-178-g57e2946c - 57e2946c53f4d89e8446191b98aae1e969727090-dirty"
    Tasks: 32 (limit: 38107)
   Memory: 505.9M
   CGroup: /redpanda.slice/redpanda.service
           └─21188 /opt/redpanda/bin/redpanda --redpanda-cfg /etc/redpanda/redpanda.yaml --lock-memory false --io-properties-file /etc/redpanda/io-config.yaml

Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,414 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:565 - Recovered, log offsets: {start_offset:>
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,425 [shard 0] storage - segment.cc:522 - Creating new segment /var/lib/redpanda/data/redpanda/kvstore/0_0/0-0-v1.log
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,439 [shard 0] cluster - members_manager.cc:46 - starting cluster::members_manager...
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,450 [shard 0] raft - [group_id:0, {redpanda/controller/0}] vote_stm.cc:238 - became the leader term:{1}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,450 [shard 0] storage - segment.cc:522 - Creating new segment /var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,450 [shard 0] cluster - state_machine.cc:18 - Starting state machine
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:448 - Started RPC server listening at {host: 0.0.0.0, port: 33145}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:487 - Started Kafka API server listening at {host: 0.0.0.0, port: 9092}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:489 - Successfully started Redpanda!
Oct 12 08:56:39 localhost.localdomain systemd[1]: Started Redpanda, the fastest queue in the West..
Using systemd is especially useful when running Redpanda in production, since it gives you a central place to enforce restart policies and resource isolation and to run periodic jobs.
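If you start Redpanda from a provisioning script, you may want to wait until systemd reports the unit as active before moving on. Here is a minimal sketch of that check. The wait_for_unit helper is a hypothetical name of my own, not something shipped with rpk or the Redpanda packages, and an “active” unit only means the process is running, not necessarily that the broker is ready to serve traffic.

```shell
#!/bin/sh
# wait_for_unit is a hypothetical helper (not part of rpk or Redpanda).
# It polls systemd until the unit is active, or gives up after N attempts.
wait_for_unit() {
  unit="$1"
  attempts="${2:-30}"   # default: try 30 times, one second apart
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # `systemctl is-active --quiet` exits 0 only when the unit is active
    if systemctl is-active --quiet "$unit"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   sudo systemctl start redpanda
#   wait_for_unit redpanda 30 && echo "redpanda is up"
```

The loop only relies on the exit code of systemctl, so it should work the same for any other unit.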
rpk also provides a debug info command, which gives you much more information about the current node, such as resource usage and the current configuration (your output might differ):
$ rpk debug info
  Version                                 v21.1.2 (rev 500bf8fa)
  Cloud Provider                          aws
  Machine Type                            i3.large
  OS                                      x86_64 5.4.0-1025-aws #25-Ubuntu SMP Fri Sep 11 09:37:24 UTC 2020 Ubuntu 20.04.1LTS
  CPU Model                               Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
  CPU Usage %                             23.993
  Free Memory (MB)                        13271.105
  Free Space (MB)                         449520.945
  cluster_id                              test
  config_file                             /etc/redpanda/redpanda.yaml
  license_key                             eyJvIjoidmVjdG9yaXplZC5pbyIsInkiOjIwMjAsIm0iOjEwLCJkIjoxMiwiYyI6MzYyMjkwOTA3NH0=
  node_uuid                               RKdMw6hLru7NxbMHkoFcUkvY42No26xKFDE92bx4so2h2DJtQ
  organization                            vectorized.io
  redpanda.admin                          172.31.20.28:9644
  redpanda.auto_create_topics_enabled     true
  redpanda.data_directory                 /var/lib/redpanda/data
  redpanda.kafka_api                      172.31.20.28:9092
  redpanda.kafka_api_tls.cert_file
  redpanda.kafka_api_tls.enabled          false
  redpanda.kafka_api_tls.key_file
  redpanda.kafka_api_tls.truststore_file
  redpanda.node_id                        0
  redpanda.rpc_server                     172.31.20.28:33145
  rpk.coredump_dir                        /var/lib/redpanda/coredump
  rpk.enable_memory_locking               false
  rpk.enable_usage_stats                  true
  rpk.tls.cert_file
  rpk.tls.key_file
  rpk.tls.truststore_file
  rpk.tune_aio_events                     true
  rpk.tune_clocksource                    true
  rpk.tune_coredump                       false
  rpk.tune_cpu                            true
  rpk.tune_disk_irq                       true
  rpk.tune_disk_nomerges                  true
  rpk.tune_disk_scheduler                 true
  rpk.tune_fstrim                         true
  rpk.tune_network                        true
  rpk.tune_swappiness                     true
  rpk.tune_transparent_hugepages          false
Note: If you’ve read our docs guide, you might be wondering why we didn’t run rpk redpanda tune. The answer is that rpk redpanda tune makes changes to your machine’s configuration to ensure the best performance possible, and some of them are persistent. Since they might affect your experience on desktop apps, it’s better not to tune your development machine.
Introducing rpk topic
As part of Redpanda, our goal with rpk is for it to have everything you need to do your job. No external tools, no long setup or configuration steps. If there’s a common use-case, rpk should support it.
That’s why we added the rpk topic command namespace, which allows you to interact with Redpanda’s Kafka®-compatible API without installing anything else or using task-specific shell scripts.
All of the rpk topic subcommands default to using the IP configured with rpk redpanda config bootstrap, or localhost:9092 if a configuration file isn’t found. If you need to override that (e.g. to interact with other remote brokers), you can pass a list of broker addresses (<ip>:<port> pairs) through the --brokers flag.
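When the broker addresses come from an inventory file or a loop, it can be handy to assemble the list programmatically. The sketch below assumes that --brokers accepts the <ip>:<port> pairs as a single comma-separated value. The join_brokers helper and the addresses in the usage comment are placeholders of mine, not part of rpk.

```shell
#!/bin/sh
# join_brokers is a hypothetical helper (not part of rpk).
# It joins its arguments with commas, e.g. "a:9092 b:9092" -> "a:9092,b:9092".
# Assumption: --brokers takes a comma-separated list of <ip>:<port> pairs.
join_brokers() {
  out=""
  for addr in "$@"; do
    if [ -z "$out" ]; then
      out="$addr"
    else
      out="$out,$addr"
    fi
  done
  printf '%s\n' "$out"
}

# Usage (placeholder addresses):
#   rpk topic list --brokers "$(join_brokers 10.0.0.1:9092 10.0.0.2:9092)"
```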
Managing topics
Creating a topic is as simple as running
$ rpk topic create cute-pandas --replicas 1
Created topic 'cute-pandas'. Partitions: 1, replicas: 1, configuration:
'cleanup.policy':'delete'
Because we only have one node, the default number of partitions and a replication factor of 1 are fine, but for a production-ready configuration, you’ll probably want to tune them to your needs with the -p & -r flags.
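As a rough illustration of those two flags, here is a small wrapper that creates several topics with the same partition count and replication factor. The create_topics helper, the topic names and the numbers in the usage comment are made up for the example. Only the -p and -r flags come from rpk itself.

```shell
#!/bin/sh
# create_topics is a hypothetical helper (not part of rpk).
# Arguments: <partitions> <replicas> <topic>...
# It stops at the first topic that fails to be created.
create_topics() {
  partitions="$1"
  replicas="$2"
  shift 2
  for topic in "$@"; do
    rpk topic create "$topic" -p "$partitions" -r "$replicas" || return 1
  done
}

# Usage (illustrative values; a replication factor of 3 needs at least 3 nodes):
#   create_topics 6 3 orders payments
```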
Let’s check our new topic’s configuration:
$ rpk topic describe cute-pandas
  Name                cute-pandas
  Internal            false
  Config:
  Name                Value  Read-only  Sensitive
  partition_count     1      false      false
  replication_factor  1      false      false
  Partitions          1 - 1 out of 1
  Partition  Leader  Replicas  In-Sync Replicas  High Watermark
  0          0       [0]       [0]               1
Here we get high-level information about the given topic. In addition to telling us the number of partitions and replicas - which we already knew - there’s useful data like which node is the leader for each partition and which other brokers hold its replicas. The High Watermark column shows the latest offset that has been replicated to all replicas for that partition.
You can also use rpk for producing to and consuming from topics. Let’s try that.
$ echo '{"name": "Red", "website": "vectorized.io"}' | rpk topic produce cute-pandas --key record-key -H header-key:header-value
Reading message... Press CTRL + D to send, CTRL + C to cancel.
Sent record to partition 0 at offset 1 with timestamp 2020-10-07 18:29:08.481278811 +0000 UTC m=+0.248046468.
rpk topic produce reads from standard input, so it works well for scripting or playing around with it on the command line.
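For example, a script could send each line of a file as its own record. This is only a sketch: it assumes each rpk topic produce invocation sends everything it reads on stdin as a single record, as in the example above. The produce_lines helper and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# produce_lines is a hypothetical helper (not part of rpk).
# It reads lines from stdin and runs one `rpk topic produce` per line.
# Assumption: each invocation sends its whole stdin as one record.
produce_lines() {
  topic="$1"
  # `|| [ -n "$line" ]` also handles a last line with no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line" | rpk topic produce "$topic" || return 1
  done
}

# Usage (pandas.jsonl is a placeholder file with one JSON document per line):
#   produce_lines cute-pandas < pandas.jsonl
```

Starting one process per record is slow, so this is better suited to small test fixtures than to bulk loading.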
Let’s see where that message went by consuming from our topic:
$ rpk topic consume cute-pandas
{
 "headers": [
  {
   "key": "header-key",
   "value": "header-value"
  }
 ],
 "key": "record-key",
 "message": "{\"name\": \"Red\", \"website\": \"vectorized.io\"}\n",
 "partition": 0,
 "offset": 1,
 "timestamp": "2020-10-07T18:29:08.481Z"
}
Which is just what we had produced! 🎉
rpk will consume from the “beginning”, and block waiting for new records. You can exit by pressing CTRL+C.
If you’re planning on piping the incoming records to other commands (e.g. jq, awk), you can turn pretty-printing off by passing --pretty-print false.
When inspecting live traffic, you may also want to set --offset newest to avoid consuming from the topic’s first record. --partitions is also a useful flag to further filter the consumed records.
cute-pandas is a toy topic, with only one partition and a replication factor of 1. For “bigger” topics with thousands of partitions, replicated across multiple nodes, it’s useful to have a way to check their health. That’s what rpk topic info is for!
$ rpk topic info cute-pandas
  Name                         cute-pandas
  Internal                     false
  Partitions                   1
  Under-replicated partitions  None
  Unavailable partitions       None
If there are any issues with our topic’s partitions, we would see them here:
- Under-replicated partitions: Partitions whose number of in-sync replicas has fallen below their replication factor.
- Unavailable partitions: Partitions for which the majority of their replicas are unavailable.
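If you want to use this check in a script or a monitoring job, you can turn the output into an exit code. The sketch below assumes the output has one field per line, as in the sample for cute-pandas, and that a healthy topic shows None for both fields. The topic_is_healthy helper is hypothetical, and how rpk lists problem partitions is not shown in this post, so anything other than None is treated as unhealthy.

```shell
#!/bin/sh
# topic_is_healthy is a hypothetical helper (not part of rpk).
# It reads `rpk topic info <topic>` output on stdin and succeeds only if both
# the under-replicated and unavailable lines are present and end in "None".
# Assumption: one "<field>  <value>" pair per line, as in the sample output.
topic_is_healthy() {
  awk '
    /^ *(Under-replicated|Unavailable) partitions/ {
      seen++
      if ($NF != "None") bad = 1
    }
    END { exit ((seen == 2 && !bad) ? 0 : 1) }
  '
}

# Usage:
#   rpk topic info cute-pandas | topic_is_healthy || echo "cute-pandas needs attention"
```

Requiring both lines to be present means an empty or failed rpk invocation counts as unhealthy rather than silently passing.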
Lastly, we can also delete topics:
$ rpk topic delete cute-pandas
Deleted topic 'cute-pandas'.
We can list all topics to verify that the topic was, in fact, deleted:
$ rpk topic list
No topics found.
Outro
I hope this has been useful! If you have used rpk and have any ideas on how to improve it or its UX, please reach out! We’re always excited to hear feedback and suggestions, and strive to incorporate them quickly into Redpanda and rpk.
As mentioned before, you can optionally pass --brokers to all of the rpk topic subcommands, which allows you to interact with a remote cluster’s brokers. Make sure to also check my previous blog post on configuring TLS in Redpanda and rpk, which could be useful when interacting remotely with a cluster.
Acknowledgments
Thanks to the sarama project contributors. It was a great reference point for rpk topic.
2021-01-25: Edited all the commands to reflect the latest namespacing changes.