Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
Kafka is an excellent choice for transporting metrics. It is not a simple system, so you will find yourself digging through the official documentation from time to time. It is resilient to the failure of individual nodes, and its log retention can give you breathing room when other components have problems, without risking the permanent loss of metrics.
A Kafka cluster consists of the following parts:

- One or more Kafka brokers, which receive messages from producers and serve them to consumers.
- A ZooKeeper ensemble, used by the brokers and consumers for coordination (note the `zookeeper.connect` option in the consumer configuration below).
Make sure the brokers are configured with a sufficient number of partitions per topic, for example in `server.properties`:

```
# server.properties
num.partitions=100
```
The exact number is not important, but the number of partitions in a topic is the limiting factor when distributing load: if a topic has only two partitions, at most two Heroic consumers in the same group can actively consume it, and any further consumers will sit idle.
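To make that limit concrete, here is a small Python sketch (not Heroic or Kafka code; the assignment logic is simplified) of how the partitions of a topic are divided among the consumers in a group:

```python
# Sketch: partitions are divided among the consumers in a group.
# With more consumers than partitions, the surplus consumers get nothing.

def assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin partition assignment, simplified from Kafka's assignors."""
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# A topic with only two partitions keeps at most two consumers busy:
print(assign(2, ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': []}
```

The third consumer receives no partitions, which is why the partition count caps useful parallelism.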
The following is a complete example configuration for a Kafka consumer.
Take note of the `group.id` option: two consumers belonging to the same `group.id` will balance the responsibility between them.
Therefore you can operate as many consumers as you need to support your desired throughput and redundancy with the same configuration.
```yaml
# heroic.yaml
port: 8080

consumers:
  - type: kafka
    schema: com.spotify.heroic.consumer.schemas.Spotify100
    topics:
      - "metrics-pod1"
    config:
      group.id: heroic-consumer
      zookeeper.connect: zookeeper1.example.com,zookeeper2.example.com,zookeeper3.example.com/heroic
      auto.offset.reset: smallest
      auto.commit.enable: true
```
Using the above configuration as a skeleton, you need to fill in at least the metric and metadata backends. You then have a consumer configuration that can be used to spawn one or more Heroic instances that faithfully consume your metrics.
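The redundancy that comes from sharing a `group.id` can also be sketched in Python (assumed semantics, greatly simplified from Kafka's actual rebalancing protocol): when a group member disappears, its partitions are handed to the surviving members, so no partition is left unconsumed.

```python
# Sketch: rebalancing within a consumer group after a member fails.
# Kafka's real protocol is more involved; this only shows the outcome.

def rebalance(assignment: dict[str, list[int]], dead: str) -> dict[str, list[int]]:
    """Hand the dead member's partitions to the remaining members, round-robin."""
    survivors = [c for c in assignment if c != dead]
    new = {c: list(ps) for c, ps in assignment.items() if c != dead}
    for i, p in enumerate(assignment[dead]):
        new[survivors[i % len(survivors)]].append(p)
    return new

before = {"c1": [0, 2], "c2": [1, 3], "c3": [4]}
print(rebalance(before, "c2"))
# {'c1': [0, 2, 1], 'c3': [4, 3]}
```

Every partition stays covered after the failure, which is what lets you run extra consumers purely for redundancy.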
ffwd-java is a metrics forwarding agent developed at Spotify. It has first-class support for sending metrics into Kafka, and the following will detail how this is configured.
Kafka uses partitions to distribute load; each producer decides which partition a given message should be sent to. ffwd-java supports per-host partitioning (see "Topics and Logs" in the Kafka documentation) through the following output plugin:
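The idea behind a host-based partitioner can be illustrated with a short Python sketch (`partition_for` is a hypothetical helper; the hash used here is not the one ffwd-java actually uses):

```python
# Sketch: map a host name to a stable partition via a deterministic hash,
# so every message from the same host lands on the same partition.
import zlib

def partition_for(host: str, num_partitions: int) -> int:
    """Return a stable partition index for a given host."""
    return zlib.crc32(host.encode("utf-8")) % num_partitions

# Repeated calls for the same host always pick the same partition:
p1 = partition_for("database.example.com", 100)
p2 = partition_for("database.example.com", 100)
assert p1 == p2
```

A deterministic hash (rather than Python's built-in `hash`, which is salted per process) is what makes the host-to-partition mapping stable across producers and restarts.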
```yaml
# ffwd.yaml
attributes:
  host: database.example.com
  pod: pod1

output:
  plugins:
    - type: "kafka"
      flushInterval: 10000
      serializer:
        type: spotify100
      router:
        type: attribute
        attribute: pod
      producer:
        metadata.broker.list: "kafka1.example.com,kafka2.example.com,kafka3.example.com"
        request.required.acks: 1
        request.timeout.ms: 1000
```
The above instructs ffwd-java to send metrics to Kafka. Each metric is routed to the `metrics-<pod>` topic, where `<pod>` is the value of the `pod` attribute on the metric.
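The routing rule can be sketched as follows (`route` is a hypothetical helper that mirrors the `router` section of the configuration above, not ffwd-java's implementation):

```python
# Sketch: an attribute router picks the topic by substituting the value of
# the routing attribute into a "metrics-<attribute>" template.

def route(metric_attributes: dict, attribute: str = "pod") -> str:
    """Return the Kafka topic for a metric, e.g. metrics-pod1."""
    return "metrics-" + metric_attributes[attribute]

print(route({"host": "database.example.com", "pod": "pod1"}))
# metrics-pod1
```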
A host-based partitioner is used by default, so all metrics sent from a given host end up on the same partition.