Kafka

Kafka is fast: a single broker can handle hundreds of megabytes of reads and writes per second from thousands of clients in real time. Kafka is also distributed and scalable: brokers can be added and removed elastically, without incurring downtime. Data streams are split into partitions and spread over different brokers for scalability and redundancy.
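To see partitioning in action, create a topic with several partitions and then describe it; the output shows which broker leads each partition and where the replicas live. A minimal sketch, assuming a local cluster with at least two brokers and a hypothetical topic named demo:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 3 --topic demo
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic demo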

Here is some useful terminology:

Topic: a feed of messages or packages

Partition: an ordered subset of a topic's messages; each topic is split into partitions for scalability and redundancy

Producer: process that introduces messages into the queue

Consumer: process that subscribes to one or more topics and processes the feed of published messages

Broker: a node that is part of the Kafka cluster

Memory

Kafka should run entirely in RAM: the JVM heap size should not exceed the available physical RAM, so that the OS never has to swap.
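The heap is controlled by the KAFKA_HEAP_OPTS environment variable, which the startup scripts pick up. A sketch, assuming a hypothetical 4 GB heap on a machine with comfortably more physical RAM:

export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
bin/kafka-server-start.sh config/server.properties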

Swap usage

Watch for swap usage, as it will degrade Kafka's performance and lead to operations timing out; set vm.swappiness = 0 to discourage the kernel from swapping. A reasonable alert threshold is used swap greater than 128 MB.
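To apply and persist the setting on most Linux distributions, something like the following should work (run as root):

sysctl vm.swappiness=0                      # apply immediately
echo "vm.swappiness=0" >> /etc/sysctl.conf  # persist across reboots
free -m                                     # verify current swap usage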

Kafka Monitoring Tools

Any monitoring tools with JMX support should be able to monitor a Kafka cluster. Here are 3 monitoring tools we liked:

The first one is check_kafka.pl from Hari Sekhon. It performs a complete end-to-end test, i.e. it inserts a message into Kafka as a producer and then extracts it as a consumer. This makes our life easier when measuring service times.

Another useful tool is KafkaOffsetMonitor, for monitoring Kafka consumers and their position (offset) in the queue. It aids our understanding of how our queue grows and which consumer groups are lagging behind.

Last but not least, the LinkedIn folks have developed what we think is the smartest tool out there: Burrow. It analyzes consumer offsets and lags over a window of time and determines the consumer status. You can retrieve this status over an HTTP endpoint and then plug it into your favourite monitoring tool (Server Density for example).
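Burrow's consumer status is a plain HTTP call away; as a sketch, assuming Burrow listens on its default port 8000, with local and mygroup as hypothetical cluster and consumer group names:

curl http://localhost:8000/v3/kafka/local/consumer/mygroup/lag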

Oh, and we would be remiss if we didn't mention Yahoo's Kafka-Manager. While it does include some basic monitoring, it is more of a management tool. If you are just looking for a Kafka management tool, check out Airbnb's kafkat.

Commands

Start ZooKeeper

bin/zookeeper-server-start.sh config/zookeeper.properties

Start the Kafka broker

bin/kafka-server-start.sh config/server.properties

Create a topic, list topics, then produce to and consume from it

~/dev/git/kafka-demo/kafka_2.11-2.0.0/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic todtest
bin/kafka-topics.sh --list --zookeeper localhost:2181
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic todtest
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic todtest --from-beginning

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test

List topics

./kafka-topics.sh --list --zookeeper localhost:2181

Describe topics

./kafka-topics.sh --describe --zookeeper localhost:2181

Using a connector

bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
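The standalone worker takes a worker config plus one properties file per connector. As a sketch, the bundled file source config looks roughly like this (the file and topic names are the stock quickstart values):

name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test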

Generate a Kafka Streams project skeleton with the Maven archetype:

mvn archetype:generate \
-DarchetypeGroupId=org.apache.kafka \
-DarchetypeArtifactId=streams-quickstart-java \
-DarchetypeVersion=2.0.0 \
-DgroupId=io \
-DartifactId=todzhang \
-Dversion=0.1 \
-Dpackage=todzhangapp
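The archetype generates a skeleton Streams project under the artifactId directory, which can then be built with Maven:

cd todzhang
mvn clean package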

Security

The keystore stores each machine's own identity. The truststore stores all the certificates that the machine should trust; importing a certificate into a truststore also means trusting all certificates signed by that certificate. To borrow an analogy: trusting the government (CA) also means trusting all passports (certificates) that it has issued. This attribute is called the chain of trust, and it is particularly useful when deploying SSL on a large Kafka cluster. You can sign all certificates in the cluster with a single CA and have all machines share the same truststore that trusts the CA. That way all machines can authenticate all other machines.

To deploy SSL, the general steps are:

  • Generate the keys and certificates
  • Create your own Certificate Authority (CA)
  • Sign the certificate

Generate the key and the certificate for each Kafka broker in the cluster. Generate the key into a keystore called kafka.server.keystore so that you can export and sign it later with the CA. The keystore file contains the private key of the certificate; therefore, it must be stored securely.

With user prompts

keytool -keystore kafka.server.keystore.jks -alias localhost -genkey

Without user prompts, pass command-line arguments

keytool -keystore kafka.server.keystore.jks -alias localhost -validity {validity} -genkey -storepass {keystore-pass} -keypass {key-pass} -dname {distinguished-name} -ext SAN=DNS:{hostname}

Ensure that the common name (CN) exactly matches the fully qualified domain name (FQDN) of the server. The client compares the CN with the DNS domain name to ensure that it is indeed connecting to the desired server, not a malicious one. The hostname of the server can also be specified in the Subject Alternative Name (SAN). Since the distinguished name is used as the server principal when SSL is the inter-broker security protocol, it is useful to have the hostname in the SAN rather than the CN.
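For example, a non-interactive invocation for a broker whose FQDN is kafka1.example.com (a hypothetical hostname, with placeholder passwords and a one-year validity) might look like:

keytool -keystore kafka.server.keystore.jks -alias localhost -validity 365 -genkey -keyalg RSA -storepass changeit -keypass changeit -dname "CN=kafka1.example.com, OU=Ops, O=Example, C=US" -ext SAN=DNS:kafka1.example.com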

Create your own Certificate Authority (CA)

Generate a CA, which is simply a public-private key pair plus a certificate, intended to sign other certificates:

openssl req -new -x509 -keyout ca-key -out ca-cert -days {validity}

Add the generated CA to the clients’ truststore so that the clients can trust this CA:

keytool -keystore kafka.client.truststore.jks -alias CARoot -import -file ca-cert

Add the generated CA to the brokers’ truststore so that the brokers can trust this CA:

keytool -keystore kafka.server.truststore.jks -alias CARoot -import -file ca-cert

Sign the certificate

To sign all certificates in the keystore with the CA that you generated:

Export the certificate from the keystore:

keytool -keystore kafka.server.keystore.jks -alias localhost -certreq -file cert-file

Sign it with the CA:

openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days {validity} -CAcreateserial -passin pass:{ca-password}

Import both the certificate of the CA and the signed certificate into the broker keystore:

keytool -keystore kafka.server.keystore.jks -alias CARoot -import -file ca-cert
keytool -keystore kafka.server.keystore.jks -alias localhost -import -file cert-signed
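
With the signed certificate in the keystore and the CA in the truststore, the broker can expose an SSL listener via server.properties. A minimal sketch; the paths and passwords below are placeholders:

listeners=PLAINTEXT://:9092,SSL://:9093
ssl.keystore.location=/var/private/ssl/kafka.server.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
ssl.truststore.location=/var/private/ssl/kafka.server.truststore.jks
ssl.truststore.password=changeit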

SASL

Simple Authentication and Security Layer (SASL) is a framework for authentication and data security in Internet protocols. It decouples authentication mechanisms from application protocols, in theory allowing any authentication mechanism supported by SASL to be used in any application protocol that uses SASL. Authentication mechanisms can also support proxy authorization, a facility allowing one user to assume the identity of another. They can also provide a data security layer offering data integrity and data confidentiality services. DIGEST-MD5 provides an example of mechanisms which can provide a data-security layer. Application protocols that support SASL typically also support Transport Layer Security (TLS) to complement the services offered by SASL.
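As a concrete illustration, a Kafka client authenticating with SASL/PLAIN over TLS would carry client properties along these lines (the username and password are placeholders):

security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="alice" \
  password="alice-secret";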
