Prometheus Configuration

Redpanda exports Prometheus metrics on <node ip>:9644/metrics. If you have an existing Prometheus instance, you can generate the relevant configuration with:

rpk generate prometheus-config

The command will output a YAML object you can add to the scrape_configs list in your Prometheus config file:

- job_name: redpanda-node
  static_configs:
  - targets:
    - <node ip>:9644

If you run the command on a node where Redpanda is running, it uses Redpanda's Kafka API to discover the other nodes. Otherwise, pass --seed-addr to specify a remote Redpanda node from which to discover the others, or --node-addrs with a comma-separated list of all known cluster node addresses.
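A scrape_configs entry has the standard Prometheus shape: a job_name plus a static_configs list of targets. The Python sketch below renders such an entry from a list of known hosts, roughly what passing --node-addrs gives you. The scrape_config_yaml helper is hypothetical, it assumes every node serves metrics on port 9644, and it relies on Prometheus's default metrics_path of /metrics; rpk's actual output may differ in detail.

```python
from typing import Iterable


def scrape_config_yaml(hosts: Iterable[str], job_name: str = "redpanda-node",
                       port: int = 9644) -> str:
    """Render a Prometheus scrape_configs entry for a set of Redpanda nodes.

    Hypothetical helper: assumes each node serves metrics on `port` at the
    Prometheus default path /metrics, so metrics_path is left out.
    """
    targets = [f"{host}:{port}" for host in hosts]
    if not targets:
        raise ValueError("at least one node address is required")
    lines = [
        f"- job_name: {job_name}",
        "  static_configs:",
        "  - targets:",
    ]
    # One target per node; Prometheus scrapes each target independently.
    lines += [f"    - {target}" for target in targets]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(scrape_config_yaml(["10.0.0.1", "10.0.0.2", "10.0.0.3"]), end="")
```

Listing every node explicitly keeps the example self-contained; in practice you would paste the generated text under scrape_configs in your Prometheus config file.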

Grafana Configuration

You can generate a comprehensive Grafana dashboard with:

rpk generate grafana-dashboard --datasource <name> --metrics-endpoint <url>

--metrics-endpoint is the address of a Redpanda node's metrics endpoint (<node ip>:9644/metrics by default).

<name> is the name of the Prometheus datasource configured in your Grafana instance.

Out of the box, the dashboard includes panels tracking latency (p50, p95, and p99), throughput, and errors segmented by type.
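Percentile panels like these are typically computed in Prometheus with histogram_quantile() over a latency histogram's cumulative "le" buckets. As a hedged illustration of that calculation, here is a simplified Python version of its linear interpolation. It assumes latency is exported as a Prometheus histogram, uses made-up bucket values, and omits edge cases the real function handles.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound and
    ending with the +Inf bucket. Simplified sketch of Prometheus's approach.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must be the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total  # how many observations lie at or below the quantile
    lower_bound, lower_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Assume observations are spread uniformly inside this bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan  # unreachable when counts are cumulative


if __name__ == "__main__":
    # Hypothetical latency buckets, in seconds, with 100 observations total.
    example = [(0.005, 50), (0.01, 90), (0.05, 99), (math.inf, 100)]
    for q in (0.5, 0.95, 0.99):
        print(q, histogram_quantile(q, example))
```

With these example buckets, p50 is 0.005 s, p95 interpolates to about 0.032 s inside the 0.01 to 0.05 bucket, and p99 is 0.05 s. Because of the interpolation, a percentile is only as precise as the bucket boundaries around it.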

Redirect the command's output to a file and import it into Grafana:

rpk generate grafana-dashboard \
--datasource prometheus \
--metrics-endpoint <node ip>:9644/metrics > redpanda-dashboard.json
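Before importing, you may want to check which panels the generated file contains. The Python sketch below reads a Grafana dashboard JSON model and lists its panel titles. It assumes the standard Grafana layout (a top-level "panels" list, with collapsed rows nesting their own "panels") and also handles the older "rows" layout, since which schema your rpk version emits is not stated here. The helper names are hypothetical.

```python
import json
from typing import Any, Dict, List


def load_dashboard(path: str) -> Dict[str, Any]:
    """Read a dashboard JSON file, such as redpanda-dashboard.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def panel_titles(dashboard: Dict[str, Any]) -> List[str]:
    """Collect non-row panel titles from a Grafana dashboard JSON model."""
    titles: List[str] = []

    def walk(panels: List[Dict[str, Any]]) -> None:
        for panel in panels:
            if panel.get("type") != "row" and panel.get("title"):
                titles.append(panel["title"])
            # Collapsed rows keep their child panels in a nested "panels" list.
            walk(panel.get("panels", []))

    walk(dashboard.get("panels", []))
    # Older dashboard schemas nest panels under a top-level "rows" list instead.
    for row in dashboard.get("rows", []):
        walk(row.get("panels", []))
    return titles
```

For example, panel_titles(load_dashboard("redpanda-dashboard.json")) returns the titles in file order. An empty result usually means the redirect captured an error message rather than a dashboard.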

Stats Reporting

Redpanda ships with an additional systemd service that runs periodically and reports resource usage and configuration data to Vectorized's metrics API. It is enabled by default, and the data is anonymous. If you'd like us to be able to identify your cluster's data so that we can monitor it and alert you of possible issues, set the organization (your company's domain) and cluster_id (usually your team's or project's name) configuration fields. For example:

rpk config set organization ''
rpk config set cluster_id 'us-west-2'

To opt out of all metrics reporting, set rpk.enable_usage_stats to false with rpk:

rpk config set rpk.enable_usage_stats false
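A dotted key such as rpk.enable_usage_stats addresses a nested field (enable_usage_stats inside the rpk section), while organization and cluster_id are single-level keys. The Python sketch below shows that mapping on a plain dictionary. It is a hypothetical illustration of the key path only, not how rpk itself edits the configuration file, and set_dotted is a made-up name.

```python
from typing import Any, Dict


def set_dotted(config: Dict[str, Any], key: str, value: Any) -> Dict[str, Any]:
    """Set a nested value addressed by a dotted key, creating sections as needed.

    Hypothetical helper illustrating how "rpk.enable_usage_stats" maps to a
    field inside an "rpk" section.
    """
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        child = node.setdefault(part, {})
        if not isinstance(child, dict):
            raise ValueError(f"{part!r} is not a section in {key!r}")
        node = child
    node[leaf] = value
    return config


if __name__ == "__main__":
    cfg: Dict[str, Any] = {}
    set_dotted(cfg, "cluster_id", "us-west-2")
    set_dotted(cfg, "rpk.enable_usage_stats", False)
    print(cfg)
```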


Key Metrics

Through Prometheus, you can access many metrics about the Redpanda process. Most are intended for debugging, but the following are useful for measuring system health:

vectorized_application_uptime: Redpanda uptime, in milliseconds.
vectorized_cluster_partition_last_stable_offset: Last stable offset. If this is the last record received by the cluster, the cluster is up to date and ready for maintenance.
vectorized_io_queue_delay: Total delay time in the queue, in seconds. Can indicate latency caused by disk operations.
vectorized_io_queue_queue_length: Number of requests in the queue. Can indicate latency caused by disk operations.
vectorized_kafka_rpc_active_connections: Currently active Kafka API connections. Shows the number of clients actively connected.
vectorized_kafka_rpc_connects: Number of accepted Kafka API connections. Compare with an earlier value to derive the rate of accepted connections.
vectorized_kafka_rpc_received_bytes: Number of bytes received from clients in valid requests. Compare with an earlier value to derive the Kafka-layer receive throughput, in bytes per second.
vectorized_kafka_rpc_requests_completed: Number of successful requests. Compare with an earlier value to derive requests per second per shard.
vectorized_kafka_rpc_requests_pending: Number of requests being processed by the server.
vectorized_kafka_rpc_sent_bytes: Number of bytes sent to clients.
vectorized_kafka_rpc_service_errors: Number of service errors.
vectorized_raft_leadership_changes: Number of leadership changes. A high value can indicate failing nodes causing leadership changes.
vectorized_reactor_utilization: CPU utilization. Shows the true CPU utilization of the Redpanda process.
vectorized_storage_log_compacted_segment: Number of compacted segments.
vectorized_storage_log_log_segments_created: Number of created log segments.
vectorized_storage_log_partition_size: Current size of the partition, in bytes.
vectorized_storage_log_read_bytes: Total number of bytes read.
vectorized_storage_log_written_bytes: Total number of bytes written.
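Several of these metrics are counters that only become meaningful as a rate. In Prometheus you would normally use rate(), for example rate(vectorized_kafka_rpc_received_bytes[1m]). The Python sketch below does the same arithmetic by hand on two scrapes of the metrics endpoint. The sample text and the shard label are illustrative, and the parser is simplified: it ignores timestamps and does not handle "}" inside label values.

```python
import re
from typing import Dict, Tuple

# Matches "name{labels} value" or "name value"; a trailing timestamp is ignored.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_samples(text: str) -> Dict[Tuple[str, str], float]:
    """Parse Prometheus text exposition into {(name, labels): value}."""
    samples: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _SAMPLE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[(name, labels or "")] = float(value)
    return samples


def per_second_rate(prev: float, curr: float, interval_s: float) -> float:
    """Per-second rate of a counter between two scrapes interval_s apart."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # A decrease means the counter was reset (for example, a restart); as
    # Prometheus does, count the current value as the increase since the reset.
    increase = curr - prev if curr >= prev else curr
    return increase / interval_s


if __name__ == "__main__":
    key = ("vectorized_kafka_rpc_received_bytes", '{shard="0"}')
    first = parse_samples('vectorized_kafka_rpc_received_bytes{shard="0"} 1000\n')
    second = parse_samples('vectorized_kafka_rpc_received_bytes{shard="0"} 61000\n')
    print(per_second_rate(first[key], second[key], 60.0), "bytes/sec")
```

Two scrapes 60 seconds apart that read 1000 and 61000 bytes give 1000 bytes per second. Handling resets matters because counters restart from zero when a Redpanda process restarts.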

The following categories of metrics are exported by Seastar, the framework Redpanda is built on: reactor, memory, scheduler, alien, and io_queue.
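When building dashboards, it can help to separate these Seastar-level metrics from Redpanda's own. The minimal Python sketch below does this by assuming names follow the vectorized_<category>_ pattern seen in the list above (for example vectorized_reactor_utilization and vectorized_io_queue_delay). The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

SEASTAR_CATEGORIES = ("reactor", "memory", "scheduler", "alien", "io_queue")


def seastar_category(metric_name: str, prefix: str = "vectorized_") -> Optional[str]:
    """Return the Seastar category of a metric name, or None for other metrics."""
    if not metric_name.startswith(prefix):
        return None
    rest = metric_name[len(prefix):]
    for category in SEASTAR_CATEGORIES:
        # Require the trailing underscore so "memory" does not match "memoryx".
        if rest.startswith(category + "_"):
            return category
    return None


def group_by_category(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group Seastar metric names by category, dropping all other metrics."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        category = seastar_category(name)
        if category is not None:
            groups.setdefault(category, []).append(name)
    return groups
```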