Zookeeper – Workflow
Once a ZooKeeper ensemble starts, it waits for clients to connect. A client connects to one of the nodes in the ZooKeeper ensemble, which may be either a leader or a follower node. Once the client is connected, the node assigns a session ID to that client and sends an acknowledgement back. If the client does not receive an acknowledgment, it simply tries to connect to another node in the ensemble. Once connected, the client sends heartbeats to the node at regular intervals to make sure the connection is not lost.
If a client wants to read a particular znode, it sends a read request with the znode path to the node it is connected to, and that node returns the requested znode from its own database. For this reason, reads are fast in a ZooKeeper ensemble.
If a client wants to store data in the ZooKeeper ensemble, it sends the znode path and the data to the server. The connected server forwards the request to the leader, and the leader then reissues the write request to all the followers. If a majority of the nodes respond successfully, the write request succeeds and a success return code is sent to the client; otherwise, the write request fails. This strict majority of nodes is called a quorum.
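The majority rule above can be sketched in a few lines. This is a hypothetical simplification, not the real ZooKeeper implementation (which uses the Zab protocol): the leader counts acknowledgements and commits a write only when a strict majority of the whole ensemble has acknowledged it.

```java
import java.util.List;

public class QuorumWrite {

    // Minimum number of acknowledgements needed for an ensemble of the given size.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Returns true if the write can be committed given the acks received.
    // 'acks' holds one boolean per ensemble member (true = acknowledged).
    static boolean commitWrite(List<Boolean> acks) {
        long ackCount = acks.stream().filter(a -> a).count();
        return ackCount >= majority(acks.size());
    }

    public static void main(String[] args) {
        // 5-node ensemble: 3 acks out of 5 is a strict majority, so the write succeeds.
        System.out.println(commitWrite(List.of(true, true, true, false, false))); // true
        // Only 2 acks out of 5: below quorum, so the write fails.
        System.out.println(commitWrite(List.of(true, true, false, false, false))); // false
    }
}
```

Note that the quorum is computed over the whole ensemble, not over the followers alone; the leader's own vote counts toward the majority.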
Nodes in a ZooKeeper Ensemble
Let us analyze the effect of having different numbers of nodes in the ZooKeeper ensemble. If we have a single node, the ensemble fails when that node fails. This is a single point of failure and is not recommended in a production environment.
If we have two nodes and one node fails, we do not have a majority either, since one out of two is not a majority. If we have three nodes and one node fails, we still have a majority, so three is the minimum requirement. It is mandatory for a ZooKeeper ensemble to have at least three nodes in a live production environment.
If we have four nodes and two nodes fail, the ensemble fails again, since two out of four is not a majority; it can therefore tolerate only one failure, the same as three nodes. The extra node serves no purpose, so it is better to add nodes in odd numbers, e.g., 3, 5, 7.
We know that a write is more expensive than a read in a ZooKeeper ensemble, since all the nodes need to write the same data to their own databases. So, for a balanced environment, it is better to have a small number of nodes (3, 5 or 7) than a large number.
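The sizing argument above reduces to simple arithmetic: an ensemble of n nodes keeps a majority as long as at most (n − 1) / 2 nodes fail. The following sketch prints this for small ensembles and shows why a fourth node adds nothing.

```java
public class EnsembleSizing {

    // Failures an ensemble of n nodes can tolerate while still keeping
    // a strict majority of nodes alive.
    static int toleratedFailures(int n) {
        return (n - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.println(n + " node(s) tolerate " + toleratedFailures(n) + " failure(s)");
        }
        // 3 and 4 nodes both tolerate exactly 1 failure, so the 4th node
        // adds write cost without adding any fault tolerance.
    }
}
```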
The following diagram depicts the ZooKeeper workflow and the subsequent list explains its different components.

- Write – Handled by the leader node. The leader forwards the write request to all the follower nodes and waits for their answers; once a majority of the nodes reply, the write is considered complete.
- Read – Performed internally by the specific server node the client is connected to, so there is no need to interact with the rest of the cluster.
- Replicated Database – Used to store data in ZooKeeper. Each server node has its own copy of the database, and consistency guarantees that every copy holds the same data at all times.
- Leader – The server node responsible for processing write requests.
- Follower – A server node that receives write requests from clients and forwards them to the leader.
- Request Processor – Present only in the leader node. It governs write requests coming from the follower nodes.
- Atomic Broadcasts – Responsible for broadcasting the changes from the leader node to the follower nodes.
Zookeeper – Leader Election
Let us analyze how a leader node is elected in a ZooKeeper ensemble. Consider a cluster of N nodes. The process of leader election is as follows –
- All the nodes create a sequential, ephemeral znode with the same path prefix, /app/leader_election/guid_.
- The ZooKeeper ensemble appends a 10-digit sequence number to the path, so the znodes created will be /app/leader_election/guid_0000000001, /app/leader_election/guid_0000000002, and so on.
- For a given instance, the node which creates the znode with the smallest sequence number becomes the leader, and all the other nodes are followers.
- Each follower node watches the znode with the next lower sequence number. For example, the node which creates znode /app/leader_election/guid_0000000008 will watch the znode /app/leader_election/guid_0000000007, and the node which creates the znode /app/leader_election/guid_0000000007 will watch the znode /app/leader_election/guid_0000000006.
- If the leader goes down, its znode is deleted automatically, since ephemeral znodes are removed when the session of the node that created them ends.
- The follower node next in line gets a notification of the leader's removal through its watcher.
- That follower then checks whether any other znode has a smaller sequence number. If not, it assumes the role of the leader; otherwise, the node which created the znode with the smallest number is the leader.
- Similarly, all other follower nodes recognize the node which created the znode with the smallest number as the leader.
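The selection rule in the steps above can be sketched without a live cluster. The following is a hypothetical simulation, not real ZooKeeper client code: the znodes are plain strings in a sorted set, which works because the zero-padded 10-digit sequence numbers sort lexicographically in numeric order.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class LeaderElectionSketch {

    // The leader is the participant whose znode has the smallest
    // sequence number, i.e. the first entry in sorted order.
    static String leader(SortedSet<String> znodes) {
        return znodes.first();
    }

    // The znode a given participant should watch: the greatest znode
    // strictly smaller than its own, or null if it is already the leader.
    static String znodeToWatch(SortedSet<String> znodes, String own) {
        SortedSet<String> smaller = znodes.headSet(own);
        return smaller.isEmpty() ? null : smaller.last();
    }

    public static void main(String[] args) {
        SortedSet<String> znodes = new TreeSet<>(List.of(
                "/app/leader_election/guid_0000000006",
                "/app/leader_election/guid_0000000007",
                "/app/leader_election/guid_0000000008"));

        // guid_0000000006 has the smallest sequence number, so it leads.
        System.out.println(leader(znodes));
        // guid_0000000008 watches guid_0000000007, forming the watch chain.
        System.out.println(znodeToWatch(znodes, "/app/leader_election/guid_0000000008"));
    }
}
```

Watching only the next lower znode, rather than the leader itself, avoids a "herd effect": when the leader dies, exactly one follower is notified instead of all of them.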
Leader election is a complex process when implemented from scratch, but the ZooKeeper service makes it very simple. Let us move on to installing ZooKeeper for development purposes in the next chapter.
Implementing Leader Election Algorithm in Java
To implement the leader election algorithm explained in the previous section, we have created the following three classes (the source code for each is available on GitHub) –
ZooKeeperService.java – This class is responsible for interacting with the ZooKeeper cluster: connecting to the ZooKeeper service, creating, deleting and getting znodes, and setting watches on znodes.
ProcessNode.java – This class represents a process participating in the leader election. It is implemented as a Runnable and carries out the leader election algorithm with the help of the ZooKeeperService class.
LeaderElectionLauncher.java – This launcher class is responsible for starting the ProcessNode thread. Since the Apache ZooKeeper client runs daemon threads to deliver watch events, this class uses an ExecutorService to run ProcessNode so that the program does not exit once the ProcessNode main thread finishes executing.
That’s it! This concludes the Apache ZooKeeper tutorial. Let me know your comments and suggestions about this tutorial. Thank you.