Zookeeper with Kafka
In this tutorial, we discuss Apache Zookeeper and its role in Kafka. Kafka has relied on Zookeeper since its earliest releases, but Zookeeper is gradually being phased out and replaced.
ZooKeeper was originally developed at Yahoo to address the bugs that can arise in distributed big data applications, which it does by storing the status of processes running on clusters. Like Kafka, ZooKeeper is an open-source technology under the Apache License.
Zookeeper is used to track cluster state, membership, and leadership.
Zookeeper Being Eliminated from Kafka v4.x
- Kafka 0.x, 1.x & 2.x must use Zookeeper
- Kafka 3.x can work without Zookeeper (KIP-500) but is not production-ready yet
- Kafka 4.x will not have Zookeeper
How do the Kafka brokers and clients keep track of all the Kafka brokers if there is more than one? The Kafka team decided to use Zookeeper for this purpose.
Zookeeper is used for metadata management in the Kafka world. For example:
- Zookeeper keeps track of which brokers are part of the Kafka cluster
- Kafka brokers use Zookeeper to determine which broker is the leader of a given topic partition and to perform leader elections
- Zookeeper stores configurations for topics and permissions
- Zookeeper sends notifications to Kafka in case of changes (e.g. a topic is created or deleted, a broker dies, a broker comes up)
Please note that Zookeeper does not store consumer offsets with Kafka clients >= v0.10.
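The membership and election duties listed above can be illustrated with a small in-memory toy. This is only a sketch: `ToyCoordinator` and its methods are hypothetical names invented for illustration, not a Zookeeper client API, and the "lowest live id wins" rule is a simplification (in a real cluster the first broker to claim the controller role wins).

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (hypothetical, not Zookeeper itself)."""

    def __init__(self):
        self.live_brokers = set()  # plays the role of broker registrations
        self.controller = None     # id of the broker currently acting as controller
        self.events = []           # notifications that would be pushed to brokers

    def register(self, broker_id: int) -> None:
        """A broker comes up and joins the cluster."""
        self.live_brokers.add(broker_id)
        self.events.append(f"broker {broker_id} up")
        self._elect_if_needed()

    def fail(self, broker_id: int) -> None:
        """A broker dies; if it was the controller, a new one is elected."""
        self.live_brokers.discard(broker_id)
        self.events.append(f"broker {broker_id} down")
        if self.controller == broker_id:
            self.controller = None
            self._elect_if_needed()

    def _elect_if_needed(self) -> None:
        # Toy rule (an assumption for this sketch): the lowest live broker id wins.
        if self.controller is None and self.live_brokers:
            self.controller = min(self.live_brokers)
            self.events.append(f"controller is now {self.controller}")


coordinator = ToyCoordinator()
for broker_id in (1, 2, 3):
    coordinator.register(broker_id)
coordinator.fail(1)  # the controller dies, so another broker takes over
print(coordinator.controller)
print(coordinator.events)
```

The point of the sketch is the flow: registrations change the membership set, a failure triggers a new election, and each change produces a notification for the rest of the cluster.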
A Zookeeper cluster is called an ensemble. It is recommended to operate the ensemble with an odd number of servers (e.g., 3, 5, or 7), because a strict majority of ensemble members (a quorum) must be working in order for Zookeeper to respond to requests. Within the ensemble, one server is the leader, which handles writes, and the rest are followers, which handle reads.
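The arithmetic behind the odd-number recommendation can be worked through in a few lines. The function names below are made up for this illustration; the only rule assumed is the strict-majority requirement described above.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a quorum is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5, 6, 7):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

An ensemble of 4 tolerates one failure, the same as an ensemble of 3, and 6 tolerates two, the same as 5. Adding an even-numbered server increases cost without increasing fault tolerance, which is why odd sizes are preferred.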
If you are managing Kafka brokers, you still need Zookeeper. As long as Kafka without Zookeeper is not production-ready, you must use Zookeeper in your production deployments of Apache Kafka. Until Kafka 4.0 is out and ready, you should not run Kafka without Zookeeper in production.
Over time, the Kafka clients and CLI have been migrated to leverage the brokers as a connection endpoint instead of Zookeeper. This means that:
- Since Kafka 0.10, consumers store offsets in Kafka rather than in Zookeeper, and must not connect to Zookeeper, as that option is deprecated.
- Since Kafka 2.2, the kafka-topics.sh CLI command references Kafka brokers and not Zookeeper for topic management (creation, deletion, etc.), and the Zookeeper CLI argument is deprecated.
- All of the APIs and commands that were previously leveraging Zookeeper are migrated to use Kafka instead so that when clusters are migrated to be without Zookeeper, the change is invisible to clients.
- Zookeeper is also less secure than Kafka, and therefore Zookeeper ports should only be opened to allow traffic from Kafka brokers and not Kafka clients.
Therefore, to be a great modern-day Kafka developer, never use Zookeeper as a configuration in your Kafka clients or in other programs that connect to Kafka.
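As a rough illustration of this rule, the sketch below checks a client configuration before it is handed to a Kafka client. `bootstrap.servers` is the standard client setting for broker addresses and `zookeeper.connect` is the legacy Zookeeper setting; the helper `check_client_config`, the host names, and the group id are hypothetical and exist only for this example.

```python
# Settings that would point a client at Zookeeper (illustrative, not exhaustive).
ZOOKEEPER_KEYS = {"zookeeper.connect"}


def check_client_config(config: dict) -> dict:
    """Return the config unchanged if it targets brokers only; raise otherwise."""
    offending = sorted(key for key in config if key in ZOOKEEPER_KEYS)
    if offending:
        raise ValueError(f"client config must not reference Zookeeper: {offending}")
    if "bootstrap.servers" not in config:
        raise ValueError("client config must set bootstrap.servers")
    return config


# Hypothetical broker addresses and group id, for illustration only.
client_config = check_client_config({
    "bootstrap.servers": "broker1:9092,broker2:9092",
    "group.id": "demo-group",
})
print(client_config["bootstrap.servers"])
```

A check like this could run in a test suite or at application startup, so that a stray Zookeeper address in a client configuration is caught early.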
Why remove ZooKeeper from Kafka implementations?
Using ZooKeeper with Kafka adds complexity for tuning, security, and monitoring. Instead of optimizing and maintaining one tool, users need to optimize and maintain two tools. Building out Kafka functionality to also handle traditional ZooKeeper tasks makes implementing and running Kafka simpler.
How does Kafka work without ZooKeeper?
Kafka without ZooKeeper uses a new quorum controller (KRaft mode, introduced by KIP-500). With this quorum controller, all of the metadata responsibilities that have traditionally been split between the Kafka controller and ZooKeeper run inside the Kafka cluster itself.
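To make the quorum idea concrete, the sketch below parses a controller voter list in the `id@host:port` form used by the `controller.quorum.voters` setting of a statically configured KRaft quorum, and computes how many controllers must be available to form a majority. The helper name and the host names are assumptions made for this example, and a real deployment may configure its quorum differently.

```python
def parse_quorum_voters(value: str) -> dict:
    """Parse 'id@host:port,id@host:port,...' into {node_id: (host, port)}."""
    voters = {}
    for entry in value.split(","):
        node_id, _, endpoint = entry.strip().partition("@")
        host, _, port = endpoint.rpartition(":")
        voters[int(node_id)] = (host, int(port))
    return voters


# Hypothetical controller hosts, for illustration only.
voters = parse_quorum_voters(
    "1@controller1:9093,2@controller2:9093,3@controller3:9093"
)
majority = len(voters) // 2 + 1  # a strict majority of controllers must agree
print(voters)
print(f"{majority} of {len(voters)} controllers needed for a quorum")
```

The same strict-majority rule that governs a ZooKeeper ensemble applies here, except that the voters are Kafka nodes rather than a separate system.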