In this tutorial, we are going to discuss Apache Kafka brokers. A Kafka cluster is made up of multiple Kafka brokers, and a broker is just a server. In Kafka, servers are called brokers because they receive and send data.
Each Kafka broker is identified by an ID, which is an integer. For example, our cluster is going to have Broker 101, Broker 102, and Broker 103.
Each broker contains only certain topic partitions, which means that your data is distributed across all Kafka brokers. After connecting to any broker (called a bootstrap broker), you are connected to the entire cluster, because Kafka clients have smart mechanics for that. So you don’t need to know all the brokers in your cluster in advance. You just need to know how to connect to one broker, and your clients will automatically connect to the rest.
Your Kafka cluster can be made of as many Kafka brokers as you want. A good number to get started is 3 brokers, but some big clusters have over 100 brokers. In this example, the choice to number brokers starting at 100 is arbitrary. It is just easier to talk about brokers with numbers in the 100s and topic partitions with 0, 1, and 2.
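The idea that a client only needs one broker address can be sketched as client configuration. Kafka clients take this address list under the `bootstrap.servers` setting, as a comma-separated list of `host:port` entries. The host names below are hypothetical, and the small parsing helper is only an illustration, not part of any Kafka library. Listing more than one broker is a common precaution in case the first one is down.

```python
# Hypothetical host names; 9092 is the port Kafka listens on by default.
BOOTSTRAP_SERVERS = "broker101.example.com:9092,broker102.example.com:9092"


def parse_bootstrap_servers(value: str) -> list[tuple[str, int]]:
    """Split a comma-separated "host:port" list into (host, port) pairs."""
    servers = []
    for entry in value.split(","):
        # rpartition splits on the last ":" so the port is always the tail.
        host, _, port = entry.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers


# The kind of settings dictionary a client library would receive.
client_config = {"bootstrap.servers": BOOTSTRAP_SERVERS}

if __name__ == "__main__":
    for host, port in parse_bootstrap_servers(client_config["bootstrap.servers"]):
        print(f"bootstrap candidate: {host}:{port}")
```

Only the listed brokers have to be reachable at startup. The rest of the cluster is learned from whichever one answers.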
Kafka Brokers and Topics
Let’s talk about how brokers, Kafka topics, and partitions are related. Take an example in which we have Topic-A with 3 partitions and Topic-B with 2 partitions, and 3 Kafka brokers: Broker 101, Broker 102, and Broker 103.
Broker 101 has Topic-A, Partition 0. Broker 102 has Topic-A, Partition 2, and this is not a mistake. Broker 103 has Topic-A, Partition 1. As we can see, the topic partitions are spread out across all brokers in no particular order.
For Topic-B, we have Topic-B, Partition 1 on Broker 101 and Topic-B, Partition 0 on Broker 102. In this example the data is distributed, and it’s normal that Broker 103 does not hold any Topic-B partition, because the two partitions have already been placed on the other brokers.
As this example shows, your data and partitions are distributed across all brokers. This is what makes Kafka scale, and it is called horizontal scaling: the more partitions and brokers we add, the more the data is spread out across the entire cluster.
Note that the brokers don’t have all the data. Each broker only has the data it should have.
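The layout described in this section can be written down as a small lookup table, which makes it easy to check which broker holds what. This is only a sketch of the example above in plain Python. The names `PLACEMENT`, `brokers_for_topic`, and `broker_for_partition` are made up for illustration and do not come from Kafka.

```python
# Partition placement from the example: broker ID -> list of (topic, partition).
PLACEMENT = {
    101: [("Topic-A", 0), ("Topic-B", 1)],
    102: [("Topic-A", 2), ("Topic-B", 0)],
    103: [("Topic-A", 1)],
}


def brokers_for_topic(placement, topic):
    """Return the sorted broker IDs that hold at least one partition of `topic`."""
    return sorted(
        broker_id
        for broker_id, partitions in placement.items()
        if any(name == topic for name, _ in partitions)
    )


def broker_for_partition(placement, topic, partition):
    """Return the broker ID holding the given topic partition, or None."""
    for broker_id, partitions in placement.items():
        if (topic, partition) in partitions:
            return broker_id
    return None


if __name__ == "__main__":
    print(brokers_for_topic(PLACEMENT, "Topic-A"))        # [101, 102, 103]
    print(brokers_for_topic(PLACEMENT, "Topic-B"))        # [101, 102]
    print(broker_for_partition(PLACEMENT, "Topic-A", 2))  # 102
```

Topic-B resolves to only two brokers, which matches the point that Broker 103 holds no Topic-B data.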
Kafka Broker Discovery
Let’s talk about the broker discovery mechanism. Each Kafka broker in your cluster is also called a bootstrap server. Take an example of 5 Kafka brokers in your Kafka cluster.
Here I just represented Broker 101 as the bootstrap server, but all of them are actually bootstrap servers. You will see the bootstrap server argument come back when you use the command-line interface or program in Java.
That means that in this cluster, we only need to connect to one broker, and the clients will then know how to connect to the entire cluster. Our Kafka client initiates a connection to Broker 101 and sends a metadata request. If the request is successful, Broker 101 returns the list of all the brokers in the cluster, along with more data, such as which broker has which partition, but more on that later.
The Kafka client is then able to connect to the broker it needs, for example to produce or to consume data. So each broker in your Kafka cluster is pretty smart: it knows about all the other brokers, all the topics, and all the partitions. In other words, each broker has all the metadata of your Kafka cluster, and this is how clients connect to a Kafka cluster.
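The discovery steps above can be sketched as a toy simulation. This is not the real Kafka protocol or any client library. The classes `Broker`, `Client`, and `ClusterMetadata` are hypothetical, and a plain dictionary stands in for the network. Broker IDs 104 and 105 and the reuse of the earlier Topic-A and Topic-B placement are assumptions made to fill out the 5-broker example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterMetadata:
    """What a broker returns for a metadata request in this toy model."""
    brokers: tuple          # all broker IDs in the cluster
    partition_owner: dict   # (topic, partition) -> broker ID


class Broker:
    def __init__(self, broker_id, metadata):
        self.broker_id = broker_id
        self._metadata = metadata  # every broker holds the full metadata

    def handle_metadata_request(self):
        return self._metadata


class Client:
    def __init__(self, cluster, bootstrap_id):
        self._cluster = cluster            # broker ID -> Broker (stands in for the network)
        self._bootstrap_id = bootstrap_id
        self.metadata = None

    def bootstrap(self):
        """Connect to one broker and ask it for the cluster metadata."""
        self.metadata = self._cluster[self._bootstrap_id].handle_metadata_request()
        return self.metadata.brokers

    def broker_for(self, topic, partition):
        """Pick the broker to talk to for a given topic partition."""
        if self.metadata is None:
            self.bootstrap()
        return self.metadata.partition_owner[(topic, partition)]


def build_cluster():
    # Assumed placement: the earlier example, plus two brokers with no partitions.
    owners = {
        ("Topic-A", 0): 101, ("Topic-A", 1): 103, ("Topic-A", 2): 102,
        ("Topic-B", 0): 102, ("Topic-B", 1): 101,
    }
    metadata = ClusterMetadata(brokers=(101, 102, 103, 104, 105), partition_owner=owners)
    return {broker_id: Broker(broker_id, metadata) for broker_id in metadata.brokers}


if __name__ == "__main__":
    client = Client(build_cluster(), bootstrap_id=101)
    print(client.bootstrap())               # (101, 102, 103, 104, 105)
    print(client.broker_for("Topic-A", 2))  # 102
```

Because every broker in the sketch holds the same metadata, passing `bootstrap_id=104` instead of `101` gives the same answers, which is the sense in which every broker is a bootstrap server.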