How to Use This Material

These are discussion questions created to go along with our Confluent Technical Fundamentals of Apache Kafka® course. They are designed to have you think more deeply about some of the content than can be achieved via the quick quizzes embedded within. Here are a few options on how to use them:

  • In an extended live session, your instructor may facilitate discussions of some questions among participants.

  • The questions are designed to be discussion questions, so if you can connect with a colleague or friend for a live discussion — consider Zoom or other video conferencing — you’ll get the most out of these questions.

  • Whether you’re working alone or discussing with others, review each question and give it serious thought before expanding the Solution to see the intended answer and the commentary that goes with it.

Q1: How Do Producers Connect to Consumers?

Suppose you have a Kafka cluster; one producer, p0, producing to topic t0, which has one partition; and one consumer c0, reading from topic t0. Consumer c0 reads what p0 produces. How exactly are p0 and c0 connected? If you add another producer p1 (that also produces to topic t0), must you also add another consumer (e.g., c1) to read messages from p1?

Solution

This is one of the most basic scenarios possible. You may be playing with Kafka for fun or just getting started, but this setup is not recommended in production. We have a single topic with only one partition, along with a producer and a consumer. Because there is only one partition, every message that producer p0 produces will be written to that partition. And because there is only a single consumer reading from t0 (note that c0 had to subscribe to topic t0), that consumer will be assigned to read from the sole partition.

Figure 3.1: A single producer writing to a partition read by a single consumer

If we add another producer, p1, which produces to the same topic, we do not need to add another consumer. The new producer p1 would write to the sole partition that exists. Consumer c0 is still assigned to consume from topic t0. Thus:

  • Messages from both producers will exist in the same partition (mixed, ordered by when they arrived, not by which producer sent them)

  • Consumer c0 would read messages from both producers in the order that they arrived at that partition

Figure 3.2: Adding a second producer does not require a second consumer

This brings us to the key takeaway: producers and consumers are decoupled. Producers write to logs; consumers read from logs. Consumers don’t know which producers produced the messages they are reading; producers don’t know which consumers (if any) will read the messages they produce. If you want to scale up production or consumption to improve the performance of either, you can simply add more producers or consumers without worrying about what’s happening on the other end.

Sometimes, people think there is a direct relationship between producers and consumers, perhaps because of how other systems and programming paradigms tie input and output together, but in Kafka, these two entities are independent.
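
To make the decoupling concrete, here is a minimal sketch in Java, assuming a broker reachable at localhost:9092, the topic t0 from the question, and the group name g0 (the broker address and group name are placeholders). Notice that the producer and consumer never reference each other; the only thing they share is the topic name.

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class DecoupledClients {
        public static void main(String[] args) {
            // Producer p0: knows only the cluster address and the topic name.
            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092");
            producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> p0 = new KafkaProducer<>(producerProps)) {
                p0.send(new ProducerRecord<>("t0", "hello from p0"));
            }

            // Consumer c0: also knows only the cluster address, the topic name, and its group id.
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "g0");
            consumerProps.put("auto.offset.reset", "earliest");
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> c0 = new KafkaConsumer<>(consumerProps)) {
                c0.subscribe(Collections.singletonList("t0"));
                ConsumerRecords<String, String> records = c0.poll(Duration.ofSeconds(5));
                records.forEach(r -> System.out.printf("partition=%d offset=%d value=%s%n",
                        r.partition(), r.offset(), r.value()));
            }
        }
    }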

Q2: Single Consumption or Multiple Consumption?

Suppose you have a Kafka cluster; one producer, p0, producing to topic t0, with one partition; and one consumer c0, reading from topic t0. Independent of the last question, suppose consumer c0 read message m0. Could another consumer c10, in another consumer group, also consume message m0?

Solution

c0 has read a given message, m0. Let’s back up to dissect how this happened:

  • First, some producer had to have written m0 to our sole partition.

  • Consumer c0 had to have been assigned to read from that sole partition.

  • Consumer c0 keeps track of a consumer offset for the sole partition, which is the offset of the message it will read next. Say that c0’s consumer offset for our lone partition was 17, and m0 was at offset 17.

  • So, c0 reads from offset 17 and gets m0. Since c0 has read from offset 17, it advances its consumer offset for the sole partition to 18.

Figure 3.3: The consumer offset is where to read next

Notice that the message at offset 17 was not removed from the partition. Kafka logs are not queues; messages don’t get removed. It is best to think of consuming as reading messages.

Going back to the question, let’s assume that our other consumer c10 is assigned to consume from the same partition. Message m0 is still there, so whenever c10’s offset reaches 17 and it asks for messages, it will indeed read m0. This gets back to a key distinguishing feature of Kafka: multiple consumption, powered by the fact that events are stored.

c10 was in a different consumer group from c0. Why would c10 consume a message that c0 already consumed? Each consumer group is doing something different with the same data. Maybe c0 consumed the message to accomplish something that demands an immediate response: it receives a mobile food order, prints an order slip that goes to the restaurant’s kitchen, and ensures that your lunch is prepared and waiting for you in 15 minutes on the dot. Maybe c10 is consuming lunch orders and analyzing them to tally what food was ordered during lunch at the restaurant. Perhaps it’s working with other applications or systems to make sure that inventory is in good shape by dinnertime. In the next problem, we dive deeper into consumer groups.
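
As a sketch of multiple consumption, the snippet below creates two consumers against the same assumed local broker and topic t0, but places them in different consumer groups (the group names "orders-fulfillment" and "orders-analytics" are made up for illustration). Because consuming does not remove messages, both consumers read the same records, each tracking its own offsets.

    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class TwoGroupsSameTopic {
        // Build a consumer subscribed to t0 in the given consumer group.
        static KafkaConsumer<String, String> consumerInGroup(String groupId) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", groupId);        // a different group id means independent offsets
            props.put("auto.offset.reset", "earliest");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("t0"));
            return consumer;
        }

        public static void main(String[] args) {
            try (KafkaConsumer<String, String> c0 = consumerInGroup("orders-fulfillment");
                 KafkaConsumer<String, String> c10 = consumerInGroup("orders-analytics")) {
                // m0 stays in the log, so both consumers read it, each for its own purpose.
                c0.poll(Duration.ofSeconds(5)).forEach(r ->
                        System.out.println("fulfillment read offset " + r.offset()));
                c10.poll(Duration.ofSeconds(5)).forEach(r ->
                        System.out.println("analytics read offset " + r.offset()));
            }
        }
    }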

Q3: Grouped Consumers Details

Suppose you have a topic with 3 partitions, p0, p1, and p2. Further, suppose we have consumer group g0 with consumers c0 and c1. No matter what further configuration you have, what is the same about c0 and c1? What are they doing differently?

Solution

In this case, both c0 and c1 are in the same consumer group. This means they are running the same logic; what differs is the data each of them works with. The goal of a consumer group is to read all of the data from all of the topics to which its consumers are subscribed (and since all consumers in a group share the same logic, they share the same topic subscription). To simplify: these consumers are subscribed to one topic, and together they work to consume all of the data from all of that topic’s partitions.

In a non-Kafka scenario, it’s as if we have a hotel with 10 floors of guest rooms. We might have a team of five housekeepers, each of whom must fully clean every assigned guest room alone (no teamwork is allowed within a single room in this scenario). No matter what, by the time it’s time for guests to arrive, all rooms on all 10 floors must be cleaned.

  • Housekeeper 1 on the team takes Floors 1 and 2 and cleans all the dirty rooms on those floors

  • Housekeeper 2 takes Floors 3 and 4 and cleans all the dirty rooms on those floors

  • This continues, i.e., Housekeeper 3 cleans Floors 5 and 6, Housekeeper 4 cleans Floors 7 and 8, and Housekeeper 5 cleans Floors 9 and 10

  • The housekeepers are a team, and they trust each other to perform the work

There is a division of labor wherein everyone on the team is doing the same task but on different “data.” In this metaphor, the consumers are the housekeepers, and their team is the consumer group. The hotel rooms are messages, and each floor is a partition. Each housekeeper (consumer) is assigned two partitions (floors).

Figure 4.1: The hotel metaphor for consumer/partition assignment
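
Here is a rough sketch of how that division of labor looks in code, assuming a local broker and a three-partition topic t0. Run two copies of this program: both join group g0, and Kafka splits the three partitions between them, much as the floors are split among the housekeepers.

    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class GroupMember {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "g0");           // every copy of this program joins the same group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("t0"));
                for (int i = 0; i < 30; i++) {
                    consumer.poll(Duration.ofSeconds(1));   // joining and rebalancing happen inside poll()
                    System.out.println("currently assigned: " + consumer.assignment());
                }
            }
        }
    }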

Q4: Fixing a Broken Consumer/Partition Assignment

Suppose you have a topic with 3 partitions, p0, p1, and p2. Further, suppose you have consumer group g0 with consumers c0 and c1. Suppose at one point in time, we have only the following assignments: c1 is consuming from p0 and c1 is consuming from p2. What is the downside of this situation? Propose a fix.

Solution

Now, we have this assignment of consumers to partitions:

  • c1 is consuming from p0

  • c1 is consuming from p2

But we have three partitions: p0, p1, and p2.

The consumer group as a whole must work together to consume from all partitions. In this setup, p1 isn’t getting any attention. That means the ride requests on p1 aren’t getting matched to drivers, and people are waiting in the rain. Or a bunch of food orders on p1 aren’t getting prepared, and customers with limited lunch breaks will show up to find they have to wait another 15 minutes. Or the hotel rooms on Floor 3 aren’t getting cleaned, and guests are going to find a mess upon check-in. We had better consume from all the partitions. The easiest fix is to have c0 consume from p1.
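
For illustration only, the sketch below expresses the fix using manual partition assignment (consumer.assign()), since a gap like this can really only arise when assignments are managed by hand; with subscribe() and a consumer group, Kafka would make and repair the assignment for you. It assumes a local broker and a topic t0 whose partitions 0, 1, and 2 stand in for p0, p1, and p2.

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.Properties;

    public class ManualFix {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> c1 = new KafkaConsumer<>(props);
                 KafkaConsumer<String, String> c0 = new KafkaConsumer<>(props)) {
                // c1 keeps the two partitions it already had...
                c1.assign(Arrays.asList(new TopicPartition("t0", 0), new TopicPartition("t0", 2)));
                // ...and c0 picks up the neglected partition 1, so all three partitions are covered.
                c0.assign(Collections.singletonList(new TopicPartition("t0", 1)));
            }
        }
    }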

Q5: Understanding Consumer Offsets

Suppose you have a topic with 3 partitions, p0, p1, and p2. Further, suppose you have consumer group g0 with consumers c0 and c1. It’s the same situation as before. Suppose c1 just read the message at offset 12 in p0. What is its consumer offset for this partition? With your fix in mind, are there any other consumer offsets stored?

Solution

We first have the case that c1 just read the message at offset 12 in p0. Remember, consumers just read messages, leave them, and advance their offsets. So, c1 will set its offset in p0 to 13.

Figure 4.2: A consumer offset advancing in a partition

Think of it like a bookmark. If you just read Page 12 of a book and need to put the book down, you’re most likely going to put your bookmark at Page 13 to remind you where to pick up next time. If you’d prefer to put your bookmark where you last read, that’s reasonable, too. But the designers of Kafka had to pick a convention, and they chose to have the consumer offset point to the message to read next.

Are there other consumer offsets? Absolutely. Each consumer knows where it will read next in each partition to which it is assigned.

  • c0 has a consumer offset for where to read next in p1

  • c1 has a consumer offset for where to read next in p2

Figure 4.3: The full consumer group partition assignment

Back to the bookmark analogy: c1 is reading from p0 and p2. Maybe you’ve read two books at once; you’re probably not on the same page in both of them. A consumer needs an offset for each partition it is assigned, just like you need a separate bookmark for each book.
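
The sketch below, again assuming a local broker and topic t0, shows those per-partition bookmarks directly: position() reports the offset of the next message this consumer will read in each of its assigned partitions.

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.time.Duration;
    import java.util.Arrays;
    import java.util.Properties;

    public class OffsetBookmarks {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "g0");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            TopicPartition p0 = new TopicPartition("t0", 0);
            TopicPartition p2 = new TopicPartition("t0", 2);

            try (KafkaConsumer<String, String> c1 = new KafkaConsumer<>(props)) {
                c1.assign(Arrays.asList(p0, p2));
                c1.poll(Duration.ofSeconds(5));  // reading messages advances the in-memory offsets
                // One bookmark per partition, e.g., 13 in p0 after reading the message at offset 12.
                System.out.println("next offset to read in p0: " + c1.position(p0));
                System.out.println("next offset to read in p2: " + c1.position(p2));
            }
        }
    }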

Q6: When Can Two Consumers Consume the Same Partition?

Suppose you have a topic with 3 partitions, p0, p1, and p2. Further, suppose you have consumer group g0 with consumers c0 and c1. You know that c1 is consuming from p0. Can c0 consume from p0? If so, why? If not, how can you change the setup to allow another consumer to consume from p0?

Solution

c1 is consuming from p0, and the question asks whether c0 could also consume from p0.

Before going into the answer, let’s go back to the hotel rooms and housekeepers metaphor. This is like having Housekeeper 1 assigned to clean all of the rooms on Floors 1 and 2 (as stated above), but another housekeeper, say Housekeeper 6, also assigned to clean all of the rooms on Floor 1. Housekeeper 6’s shift starts after Housekeeper 1 is done for the day. What’s going to happen? Housekeeper 6 goes into every room on Floor 1 and finds it already clean. Could Housekeeper 6 re-clean each room? Maybe, but cleaning products, time, effort, and laundry resources would be wasted.

In the Kafka world, the answer would be no. If two consumers in the same group were to consume from the same partition, that would mean reprocessing messages in exactly the same way. At best, that wastes resources and could bother customers with duplicate notifications or ads. At worst, a stakeholder loses money or resources. Simply put, Kafka does not allow more than one consumer in a consumer group to consume from the same partition.

Figure 4.4: Two consumers in a group cannot be assigned to the same partition

Note that this doesn’t go both ways. A single consumer in a group could consume from more than one partition.

How can we change the problem setup to allow c0 also to consume from p0? c0 can be in a different group. This way, it’s processing the data from p0 differently, not reprocessing it in the same way. If this sounds like a lot to manage, the good news is that Kafka takes care of all consumer/partition assignments for you. It won’t let you break the rules, and it will even adjust the assignments when a component goes down.
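
To see Kafka adjusting assignments on its own, a consumer can register a ConsumerRebalanceListener, as in this sketch (local broker, topic t0, and group g0 assumed). Each time another consumer joins or leaves the group, the listener reports which partitions this consumer gave up or took over.

    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.time.Duration;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Properties;

    public class RebalanceWatcher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "g0");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("t0"), new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                        System.out.println("giving up: " + partitions);
                    }

                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        System.out.println("now responsible for: " + partitions);
                    }
                });
                for (int i = 0; i < 60; i++) {
                    consumer.poll(Duration.ofSeconds(1));   // rebalance callbacks fire inside poll()
                }
            }
        }
    }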

Q7: Understanding Compaction

This is one of the logs from the second of the quick quizzes:

Offset        3    6    7    9   12   13   14   15
Key           7    2    4    6    4    9    5    9
Value         a    b    c    d    e    f    g    h
Age (days)   12   11    9    8    6    4    3    1

The log is divided into three segments: seg0, seg1, and the active segment.

This was specifically from a problem about the deletion retention policy. What if, instead of using deletion, the compaction retention policy were turned on? Which records would remain in this log after compaction?

Solution

Whereas the deletion retention policy deals with the ages of records, compaction cares about the keys of records. (Note that you can have both compaction and deletion running as well.)

Compaction removes records for which there is a newer record with the same key. The record at offset 7, with key 4, is removed, as there is a newer instance of a record with key 4 at offset 12.

You might also think that the record at offset 13, with key 9, would be removed due to the record with key 9 at offset 15. This is a reasonable idea. However, these records are part of the active segment, and compaction does not touch the active segment, so the record at offset 13 does remain in the log. We go into more detail on this in both our Administrator and Developer courses; some great discussions related to it often happen there!

To summarize simply, all records in this log except the record at offset 7 would remain after compaction ran.
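
If you want to experiment with this yourself, compaction is enabled per topic with the cleanup.policy configuration. The sketch below uses the Java AdminClient to create a compacted topic, assuming a local broker; the topic name "latest-values" is made up for the example. Setting cleanup.policy to "compact,delete" would run both policies, as mentioned above.

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;

    public class CompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // One partition, replication factor 1: fine for a local sandbox only.
                NewTopic topic = new NewTopic("latest-values", 1, (short) 1)
                        .configs(Map.of("cleanup.policy", "compact"));   // or "compact,delete" for both
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }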


Check out our Training & Certification page to learn more about our other Training offerings and sign up.