Consuming metrics over Kafka

Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.

Kafka is an excellent choice for transporting metrics. It is not a simple system, so you will find yourself digging through the official documentation from time to time. It is resilient to the failure of individual nodes, and it supports log retention, which can give you breathing room when other components have problems, without risking the permanent loss of metrics.

A Kafka cluster consists of the following parts: a ZooKeeper cluster, and a cluster of one or more Kafka brokers.

You should follow the Kafka introduction to get started. On top of this, it is important to configure the num.partitions option on the broker to a larger number, such as 100:


# server.properties
num.partitions=100

The exact number is not important, but the number of partitions used for a particular topic is the limiting factor for distributing load. If a topic has only two partitions, at most two Heroic consumers can actively consume it.
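If you manage topics explicitly, the partition count can also be set per topic when it is created. The following is a minimal sketch of doing so programmatically, assuming the ZooKeeper-based AdminUtils API from the Kafka 0.8 era (the same era as the zookeeper.connect-style configuration used in this section); the topic name, ZooKeeper address, and replication factor are placeholders.


// CreateMetricsTopic.java (hypothetical example)
import java.util.Properties;

import org.I0Itec.zkclient.ZkClient;

import kafka.admin.AdminUtils;
import kafka.utils.ZKStringSerializer$;

public class CreateMetricsTopic {
    public static void main(final String[] args) {
        // Connect to the same ZooKeeper chroot that the consumers will use.
        final ZkClient zk = new ZkClient(
            "zookeeper1.example.com:2181/heroic", 30000, 30000, ZKStringSerializer$.MODULE$);

        try {
            // Create the topic with 100 partitions and a replication factor
            // of 2 (placeholder values).
            AdminUtils.createTopic(zk, "metrics-pod1", 100, 2, new Properties());
        } finally {
            zk.close();
        }
    }
}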

Configuring Heroic Consumers

The following is an example configuration for a Kafka consumer. Take note of group.id below: two consumers that belong to the same group.id will balance the work between them. This means you can run as many consumers as you need, all with the same configuration, to reach your desired throughput and redundancy.


# heroic.yaml

port: 8080

consumers:
  - type: kafka
    schema: com.spotify.heroic.consumer.schemas.Spotify100
    topics:
      - "metrics-pod1"
    config:
      group.id: heroic-consumer
      zookeeper.connect: zookeeper1.example.com,zookeeper2.example.com,zookeeper3.example.com/heroic
      auto.offset.reset: smallest
      auto.commit.enable: true

Using the above configuration as a skeleton, you need to fill in at least the metric and metadata backends. At this point you will have a consumer configuration that can be used to spawn one or more Heroic instances that faithfully consume your metrics.
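To make these settings more concrete, the following sketch shows how the same configuration keys drive the stock Kafka 0.8 high-level consumer. This is not Heroic's implementation, only an illustration of the balancing behavior: every connector created with the same group.id is assigned a disjoint subset of the topic's partitions. The example payload in the comment is an assumption based on the Spotify100 schema name.


// MetricsConsumerSketch.java (hypothetical example)
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class MetricsConsumerSketch {
    public static void main(final String[] args) {
        // The same keys as in the 'config' section of heroic.yaml above.
        final Properties props = new Properties();
        props.put("zookeeper.connect",
            "zookeeper1.example.com,zookeeper2.example.com,zookeeper3.example.com/heroic");
        props.put("group.id", "heroic-consumer");
        props.put("auto.offset.reset", "smallest");
        props.put("auto.commit.enable", "true");

        final ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Ask for one stream for the metrics topic. Kafka balances the
        // topic's partitions across all connectors sharing this group.id.
        final Map<String, Integer> topicCount = new HashMap<String, Integer>();
        topicCount.put("metrics-pod1", 1);

        final Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(topicCount);

        for (final MessageAndMetadata<byte[], byte[]> record : streams.get("metrics-pod1").get(0)) {
            // Each message body is a Spotify100-encoded metric, e.g. (assumed):
            // {"version":"1.0.0","key":"cpu-usage","host":"database.example.com",
            //  "time":1412345678000,"attributes":{"pod":"pod1"},"value":0.5}
            System.out.println(new String(record.message()));
        }
    }
}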

Configuring ffwd-java

ffwd-java is a metrics forwarding agent developed at Spotify. It has first-class support for sending metrics into Kafka, and the following will detail how this is configured.

Kafka uses partitions to distribute load; each producer decides which partition a particular message is sent to. ffwd-java supports per-host partitioning (see "Topics and Logs" in the Kafka documentation) using the following output plugin:


# ffwd.yaml

attributes:
  host: database.example.com
  pod: pod1

output:
  plugins:
    - type: "kafka"
      flushInterval: 10000
      serializer:
        type: spotify100
      router:
        type: attribute
        attribute: pod
      producer:
        metadata.broker.list: "kafka1.example.com,kafka2.example.com,kafka3.example.com"
        request.required.acks: 1
        request.timeout.ms: 1000

The above instructs ffwd-java to send metrics to Kafka. Each metric is routed to the metrics-<pod> topic, where <pod> is the value of the pod attribute of the metric. A host-based partitioner is used by default, so all metrics sent from a given host will end up in the same partition.
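For illustration, here is roughly what the output plugin does, expressed against the old Kafka 0.8 producer API (the API that the metadata.broker.list and request.required.acks keys belong to). This is a sketch, not ffwd-java's actual code: the topic name, broker ports, and JSON payload are assumptions.


// MetricsProducerSketch.java (hypothetical example)
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class MetricsProducerSketch {
    public static void main(final String[] args) {
        // The same keys as in the 'producer' section of ffwd.yaml above.
        final Properties props = new Properties();
        props.put("metadata.broker.list",
            "kafka1.example.com:9092,kafka2.example.com:9092,kafka3.example.com:9092");
        props.put("request.required.acks", "1");
        props.put("request.timeout.ms", "1000");
        props.put("serializer.class", "kafka.serializer.StringEncoder");

        final Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));

        // Routing: the 'pod' attribute selects the topic, and the host is
        // used as the message key, so Kafka's default partitioner hashes all
        // metrics from one host onto the same partition.
        final String host = "database.example.com";
        final String topic = "metrics-" + "pod1";
        final String payload = "{\"version\":\"1.0.0\",\"key\":\"cpu-usage\","
            + "\"host\":\"" + host + "\",\"time\":1412345678000,"
            + "\"attributes\":{\"pod\":\"pod1\"},\"value\":0.5}";

        producer.send(new KeyedMessage<String, String>(topic, host, payload));
        producer.close();
    }
}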