
Tuesday, 20 November 2018

Apache Kafka – Terminology

     Kafka's architecture has a few key terms, such as topics, producers, consumers, and brokers. To understand Apache Kafka in detail, we must understand these terms first. Below is a list of the most prominent Kafka terms, which should help us build a strong foundation of Kafka knowledge.

1. Kafka Broker
     An Apache Kafka cluster consists of one or more servers, and each of these servers is what we call a broker. A Kafka server, a Kafka broker, and a Kafka node all refer to the same concept and are synonyms.

2. Kafka Topics
     A topic is a category of messages in Kafka. Producers publish messages to topics and consumers read messages from topics. All Kafka messages are organized into topics, which is where the data is stored. A topic is divided into one or more partitions.

3. Kafka Partitions
     Kafka topics are divided into a number of partitions, each of which contains messages in an immutable sequence. Each message in a partition is assigned a unique offset that identifies it. A topic can also have multiple partition logs, as the click-topic has in the image to the right. This allows multiple consumers to read from a topic in parallel.
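
     To make the idea of an append-only partition with offsets concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not Kafka code: the class name PartitionLog and its methods are hypothetical and exist only for illustration.

```python
class PartitionLog:
    """Toy in-memory stand-in for one partition (hypothetical, not Kafka code)."""

    def __init__(self):
        self._records = []  # append-only list; the list index plays the role of the offset

    def append(self, message):
        """Append a message and return the offset it was assigned."""
        self._records.append(message)
        return len(self._records) - 1

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._records[offset]


log = PartitionLog()
first = log.append("click:home")   # gets offset 0
second = log.append("click:cart")  # gets offset 1
print(first, second, log.read(1))
```

     Existing messages are never modified in this model; new ones only go on the end, which is what keeps the sequence stable for readers.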

4. Kafka Producers
     Producers are the publishers of messages to one or more Kafka topics. Producers send data to Kafka brokers. Every time a producer publishes a message to a broker, the broker appends the message to the last segment file of a partition. Producers can also send messages to a partition of their choice.
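
     How a message ends up in a particular partition can be sketched as follows. This is only an illustration under stated assumptions: the function choose_partition is hypothetical, and CRC32 is used as a stand-in hash, which is not the hash a real Kafka client uses.

```python
import zlib


def choose_partition(key, num_partitions, explicit_partition=None):
    """Pick a partition for a message (illustrative sketch, not a Kafka client API)."""
    if explicit_partition is not None:
        # The producer asked for a specific partition, so honor that choice.
        return explicit_partition
    # Otherwise hash the key so that equal keys always map to the same partition.
    # CRC32 is an assumption made for this sketch only.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


same_a = choose_partition("user-42", 3)
same_b = choose_partition("user-42", 3)
pinned = choose_partition("user-42", 3, explicit_partition=2)
print(same_a == same_b, pinned)
```

     Because equal keys land in the same partition, all messages for one key stay in a single ordered log.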

5. Kafka Consumers
     Consumers read data from brokers. Consumers subscribe to one or more topics and consume published messages by pulling data from the brokers.

6. Kafka Offset
     The offset is a unique identifier of a record within a partition. A consumer also tracks its position in a partition as an offset, that is, the offset of the next record it will read.

7. Kafka Consumer Group
     Consumers can join a group called a consumer group. A consumer group includes the set of consumer processes that are subscribing to a specific topic. Each consumer in the group is assigned a set of partitions to consume from, so each one receives messages from a different subset of the partitions in the topic. Kafka guarantees that a message is only read by a single consumer in the group.
     Consumers pull messages from topic partitions. Different consumers can be responsible for different partitions. Kafka can support a large number of consumers and retain large amounts of data with very little overhead. By using consumer groups, consumers can be parallelized so that multiple consumers can read from multiple partitions on a topic, allowing a very high message processing throughput. The number of partitions limits the maximum parallelism of consumers in a group: if a group has more consumers than the topic has partitions, the extra consumers sit idle.
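
     The relationship between group size and partition count can be sketched with a simple round-robin assignment. This is a simplified illustration, not one of Kafka's actual assignment strategies; the function assign_partitions and the consumer names are hypothetical.

```python
def assign_partitions(consumers, partitions):
    """Spread partitions over the consumers of one group, round-robin (toy sketch)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)  # each partition gets exactly one owner
    return assignment


small_group = assign_partitions(["c1", "c2"], [0, 1, 2])
large_group = assign_partitions(["c1", "c2", "c3", "c4"], [0, 1, 2])
print(small_group)  # {'c1': [0, 2], 'c2': [1]}
print(large_group)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

     In the second call the fourth consumer receives no partitions, which is why adding consumers beyond the partition count does not increase throughput.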

8. Kafka Log Anatomy
     Another way to view a partition is as a log. A data source writes messages to the log and one or more consumers read from the log at whatever point they choose. In the diagram below, a data source is writing to the log and consumers A and B are reading from the log at different offsets.
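
     The picture of two consumers reading the same log at different offsets can be sketched like this. It is a toy model: the list named log, the positions dictionary, and the poll function are hypothetical and only mimic the idea of each consumer keeping its own position.

```python
log = ["m0", "m1", "m2", "m3", "m4"]  # one partition, viewed as a log
positions = {"A": 1, "B": 3}          # each consumer keeps its own next offset


def poll(consumer, max_records=2):
    """Return the next batch for a consumer and advance only that consumer's position."""
    start = positions[consumer]
    batch = log[start:start + max_records]
    positions[consumer] = start + len(batch)
    return batch


batch_a = poll("A")
batch_b = poll("B")
print(batch_a, batch_b, positions)
```

     Reading does not remove anything from the log, so consumer A moving forward has no effect on what consumer B sees.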


9. Kafka Message Ordering and Client Acknowledgments
     In Kafka, messages are delivered from a given partition in the same order in which the partition received them. This ordering holds within a single partition, not across the partitions of a topic.


10. Node in Kafka
     In the Apache Kafka cluster, a node is a single computer.

11. Kafka Cluster
     A group of computers acting together to achieve a common purpose is what we call a cluster. In Kafka it has the same meaning, i.e. a group of computers, each running one instance of a Kafka broker.

12. Kafka Replicas
     Here, the word replica refers to a backup. That means a replica of a partition is a “backup” of that partition. We use replicas to prevent data loss. Follower replicas normally do not serve client reads or writes; those go through the partition leader.

13. Kafka Message
     In one line, a message in Kafka is a unit of information that travels from a producer to a consumer through Apache Kafka.

14. Kafka Leader
     The node responsible for all reads and writes for a given partition is what we call a Kafka leader. Every partition has exactly one server that acts as its leader.

15. Follower in Kafka
     Simply put, a node that follows the leader's instructions is what we call a follower. The main purpose of a follower is that, if the leader fails, one of the followers automatically becomes the new leader. In the meantime, a follower acts like a normal consumer: it pulls messages from the leader and updates its own data store.
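
     The failover idea can be sketched as below. This is a deliberately simplified model that assumes the first surviving, caught-up replica is promoted; the function elect_leader and the broker ids are hypothetical, and a real cluster coordinates this election itself.

```python
def elect_leader(replicas, in_sync, failed):
    """Return the first replica that is caught up and still alive, or None (toy sketch)."""
    for broker in replicas:
        if broker in in_sync and broker not in failed:
            return broker
    return None  # no eligible follower: the partition has no leader for now


replicas = [1, 2, 3]  # broker ids holding a copy of the partition; 1 is the current leader
new_leader = elect_leader(replicas, in_sync={1, 2, 3}, failed={1})
no_leader = elect_leader(replicas, in_sync={1}, failed={1})
print(new_leader, no_leader)
```

     The second call shows why followers need to keep pulling from the leader: a follower that has fallen behind is not a safe candidate in this model.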

16. Kafka Data Log
     Kafka preserves messages for a considerable, configurable amount of time. That means consumers can read at their own convenience. However, if Kafka is configured to keep messages for 24 hours and a consumer is down for longer than 24 hours, the consumer will lose messages. If the consumer's downtime is shorter, say just 60 minutes, it can still resume reading from its last known offset.
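
     The 24-hour scenario above can be worked through with a small sketch. It is a toy model that assumes one message per hour and per-message expiry; the functions retained and resume are hypothetical, and real brokers apply retention in coarser units than single messages.

```python
def retained(records, now_hours, retention_hours=24):
    """Keep only (offset, timestamp) pairs that are still inside the retention window."""
    return [(offset, ts) for offset, ts in records if now_hours - ts <= retention_hours]


def resume(last_offset, available):
    """Return (offset to resume from, number of messages lost) for a returning consumer."""
    earliest = available[0][0]
    lost = max(0, earliest - last_offset)
    return max(last_offset, earliest), lost


# One message per hour for 30 hours: offset h was written at hour h.
records = [(hour, hour) for hour in range(30)]
available = retained(records, now_hours=30)

long_outage = resume(3, available)    # consumer was down for more than 24 hours
short_outage = resume(29, available)  # consumer was down for about an hour
print(long_outage, short_outage)      # (6, 3) (29, 0)
```

     The consumer that stopped at offset 3 finds that offsets 3 to 5 have already expired, while the one that stopped at offset 29 continues without any loss.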

17. Kafka Connector API
     The Connector API permits building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems.

Next Tutorial : Apache Kafka - Pros and Cons

Previous Tutorial : Apache Kafka - Features
