Spring Kafka consumer: Is there a way to read from multiple partitions using Kafka 0.8?

This is the scenario:
I know that using the latest APIs related to Spring Kafka (like spring-integration-kafka 2.10) we can do something like:
#KafkaListener(id = "id0", topicPartitions = { #TopicPartition(topic = "SpringKafkaTopic", partitions = { "0" }) })
#KafkaListener(id = "id1", topicPartitions = { #TopicPartition(topic = "SpringKafkaTopic", partitions = { "1" }) })
and with that, read from different partitions of the same Kafka topic.
I'm wondering if we can do the same using, for example, spring-integration-kafka 1.3.1.
I didn't find any tip about how to do that (I'm interested in the XML version).

In Kafka you can decide which topics you want to read from,
but you can't decide which partitions you read from; it's up to Kafka to decide that, in order to avoid reading the same message more than once.
Consumers in the same group don't share partitions for reading purposes, by Kafka's definition.
If you have more consumers than partitions, some consumers will stay idle and won't consume from any partition. For example, with 5 consumers and 4 partitions, 1 consumer will stay idle and won't consume any data from the Kafka broker.
The actual partition assignment is done by a Kafka broker (the group coordinator) together with a leader consumer; we can't control that.

This definition helped me the most:
In Apache Kafka, the consumer group concept is a way of achieving two things:
1) Having consumers as part of the same consumer group means providing the “competing consumers” pattern with whom the messages from topic partitions are spread across the members of the group. Each consumer receives messages from one or more partitions (“automatically” assigned to it) and the same messages won’t be received by the other consumers (assigned to different partitions). In this way, we can scale the number of the consumers up to the number of the partitions (having one consumer reading only one partition); in this case, a new consumer joining the group will be in an idle state without being assigned to any partition.
2) Having consumers as part of different consumer groups means providing the “publish/subscribe” pattern where the messages from topic partitions are sent to all the consumers across the different groups. It means that inside the same consumer group, we’ll have the rules explained above, but across different groups, the consumers will receive the same messages. It’s useful when the messages inside a topic are of interest for different applications that will process them in different ways. We want all the interested applications to receive all the same messages from the topic.
From here Don't Use Apache Kafka Consumer Groups the Wrong Way!

Related

One partition multiple consumers same group, consumer IDs

We have one topic with one partition due to message ordering requirements. We have two consumers running on different servers with the same set of configurations, i.e. groupId, consumerId, consumerGroup. That is:
1 Topic -> 1 Partition -> 2 Consumers
When we deploy the consumers, the same code is deployed on both servers. We noticed that when a message comes in, both consumers consume it rather than only one processing it. The reason for running consumers on two separate servers is that if one server crashes, at least the other can continue processing messages. But it looks like when both are up, both consume the messages. Reading the Kafka docs, it says that if we have more consumers than partitions then some stay idle, but we don't see that happening. Is there anything we are missing on the configuration side apart from consumerId & groupId? Thanks
As @Gary Russell said, as long as the two consumer instances each have their own consumer group, they will both consume every event that is written to the topic. Just put them into the same consumer group. You can provide a consumer group id (group.id) in consumer.properties.
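For illustration, a minimal Spring Kafka sketch of that setup (the topic and group names here are made up): both server deployments run this same listener with the same groupId, so the single partition is assigned to only one instance while the other stays idle as a standby until a rebalance.
@KafkaListener(topics = "orders", groupId = "order-processing-group")
public void onMessage(String message) {
    // Only the instance currently assigned the partition receives messages;
    // if that instance crashes, Kafka rebalances and the other instance takes over.
    log.info("Received: {}", message);
}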

Parallel processing and auto scaling in spring-kafka KafkaListener

I'm using spring-kafka to consume messages from two Kafka topics, which send the same message format, as below.
@KafkaListener(topics = {"topic_country1", "topic_country2"}, groupId = KafkaUtils.MESSAGE_GROUP)
public void onCustomerMessage(String message, Acknowledgment ack) throws Exception {
    log.info("Message : {} is received", message);
    ack.acknowledge();
}
Can @KafkaListener allocate the number of consumer threads according to the number of topics it listens to, on its own, and process messages from the two topics in parallel? Or does it not support parallel processing, so messages have to wait in the topic until the previous message gets processed?
In case the number of messages in the topics gets high, I need to auto-scale my micro-service by starting new instances (up to the number of partitions). What parameters (CPU, memory) can I rely on to find out that the number of messages in the topics is high, from the KafkaListener point of view? (i.e. for an API I can auto-scale the service by monitoring the HTTP latency.)
You can set the concurrency property to run more threads; but each partition can only be processed by one thread. To increase concurrency you must increase the number of partitions in each topic. When listening to multiple topics in the same listener, if those topics only have one partition, you may not get the concurrency you desire unless you change the kafka consumer partition assignor.
See https://docs.spring.io/spring-kafka/docs/2.5.0.RELEASE/reference/html/#using-ConcurrentMessageListenerContainer
When listening to multiple topics, the default partition distribution may not be what you expect. For example, if you have three topics with five partitions each and you want to use concurrency=15, you see only five active consumers, each assigned one partition from each topic, with the other 10 consumers being idle. This is because the default Kafka PartitionAssignor is the RangeAssignor (see its Javadoc). For this scenario, you may want to consider using the RoundRobinAssignor instead, which distributes the partitions across all of the consumers. Then, each consumer is assigned one topic or partition. ...
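To make that concrete, here is a rough sketch (not from the answer above) of a listener container factory that sets concurrency and switches to the RoundRobinAssignor; the bootstrap server, group id and concurrency value are placeholders for the questioner's setup:
@Bean
public ConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");      // placeholder
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "message-group");                // placeholder
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    // Spread partitions from all listened-to topics across the concurrent consumers
    props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, RoundRobinAssignor.class.getName());
    return new DefaultKafkaConsumerFactory<>(props);
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // Manual acks to match the Acknowledgment parameter used in the question's listener
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
    factory.setConcurrency(2); // at most one active thread per assigned partition
    return factory;
}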
If you want to scale horizontally beyond the partition count, and dynamically, consider using something like Parallel Consumer (PC). It can be used within a Spring context.
By using PC, you can process all your keys in parallel, regardless of how long processing takes, and you can be as concurrent as you wish; this can scale dynamically.
PC directly solves for this by sub-partitioning the input partitions by key and processing each key in parallel.
It also tracks per-record acknowledgement. Check out Parallel Consumer on GitHub (it's open source, BTW, and I'm the author).
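For reference, wiring PC up looks roughly like this (adapted from the project's README pattern; exact method names can differ between versions, so treat it as a sketch, and the topic name is a placeholder):
ParallelConsumerOptions<String, String> options = ParallelConsumerOptions.<String, String>builder()
        .ordering(ProcessingOrder.KEY)   // sub-partition by key, preserving per-key ordering
        .maxConcurrency(1000)            // concurrency well beyond the partition count
        .consumer(kafkaConsumer)         // an ordinary KafkaConsumer instance
        .build();

ParallelStreamProcessor<String, String> processor =
        ParallelStreamProcessor.createEosStreamProcessor(options);
processor.subscribe(Collections.singletonList("my-topic"));      // placeholder topic
processor.poll(context -> log.info("Processing: {}", context));  // acknowledged per record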

Spring Boot Kafka - Message management with different consumers

My application is created with Spring Boot and runs in a cluster (two different instances on OpenShift).
Every instance has one consumer that reads messages from a topic with a replication factor.
I would like to find a mechanism to block the reading of a message from the replicated topic if it has already been read by one of the two consumers.
Example:
CONSUMER CLIENT A -- READS MSG_1 --> BROKER_1
- Offset increases
- Commit OK
CONSUMER CLIENT B -- DOES NOT READ MSG_1 --> BROKER_1
- Correct, because already committed
Now BROKER_1 is down and the new leader is BROKER_2.
How can I block the already-read message on BROKER_2?
Thanks all!
Giuseppe.
The replication factor doesn't control if/how consumers read messages. The partition count does. If the topic only has one partition, then only one consumer instance is able to read messages, and all other instances are "blocked". And if the message has already been read and committed, then it doesn't matter which broker is the leader, because committed offsets are maintained per consumer group and partition, not per replica.
If you have more than one partition and you still want to block consumers from being able to read data, then you'll need to implement some external, coordinated lock, via Zookeeper for example.
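If you go down that route, one option is to take a short-lived lock per message before processing it; a rough sketch with Apache Curator (not part of the original answer; the ZooKeeper address, lock path and helper methods are made up):
CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zookeeper:2181", new ExponentialBackoffRetry(1000, 3)); // placeholder address
client.start();

// One lock node per message key: whichever instance acquires it processes the message
InterProcessMutex lock = new InterProcessMutex(client, "/locks/" + messageKey);
if (lock.acquire(1, TimeUnit.SECONDS)) {
    try {
        process(message); // hypothetical processing method
    } finally {
        lock.release();
    }
}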

Avail same messages to multiple RabbitMQ Consumers

Requirement:
1) I need to fetch data stored in MongoDB through a Java application and, using a topic exchange and binding keys, I created 3 queues on RabbitMQ. I have implemented everything up to this point.
The problem starts from the 2nd point onwards.
2) The messages should be available to multiple consumers from all 3 queues. But when the first consumer consumes the messages from the 3 queues, they are no longer available for the rest of the consumers. How can I make the messages available to multiple consumers?
Is there any way to achieve this, or does this requirement have an alternate solution?
All your consumers must provide their own unique queue and bind it to the same exchange.
There is no such Topic abstraction in AMQP as there is in JMS.
Even if we publish a message through a topic or fanout exchange, the message will be placed into the queue as a single entry, so only one consumer will be able to pick it up from that queue.
The config for my proposal may look like this:
<queue id="commandQueue" name="#{node.id}.command"
auto-delete="true"/>
<fanout-exchange name="commandsExchange">
<bindings>
<binding queue="commandQueue"/>
</bindings>
</fanout-exchange>
<amqp:inbound-channel-adapter id="commandConsumer"
queue-names="#{commandQueue.name}"
channel="commandChannel"/>
With that, all my application instances bind their own unique queue (based on the node.id abstraction) to the same commandsExchange, and a message published to commandsExchange will be delivered to all my nodes.
auto-delete="true" helps me avoid extra messages piling up in the queue if a node is dead.
HTH

Load balancing Tibco EMS topic subscribers

I have a Tibco EMS topic subscriber which I need to load balance among different instances. Each published message to the topic needs to be received by one (and only one) instance of each subscriber load balance group.
Just using global topics and balanced EMS connections (tcp://localhost:7222|tcp://localhost:7224) results in the same message being received by all instances of each subscriber load-balance group, producing duplicates.
Do you know any alternative for load balancing topic subscribers?
You can:
A) Bridge the topic to a queue and reconfigure your subscribers to read from the queue. Queues behave differently to topics in that a message is only obtained by one subscriber rather than all.
B) Create a number of durable subscribers on the topic with selectors that divide messages between the durables. E.g. If a message has a property 'id' that is sequentially increasing:
create durable topic DURABLENAME1 selector="(id - 2 * (id / 2)) = 0"
create durable topic DURABLENAME2 selector="(id - 2 * (id / 2)) = 1"
The selector is just a modulo so half the messages will go on one durable, and half on the other.
With EMS 8.0, a new concept called shared subscriptions was added. With these, only one subscriber among those using the same subscription name receives each message. Go through the EMS user guide docs; it may help you.
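With a shared subscription, each instance creates a consumer on the same subscription name and EMS delivers every message to only one of them; a minimal JMS 2.0 sketch (the topic and subscription names are placeholders):
// Each application instance runs the same code; EMS load-balances across the shared consumers
JMSContext context = connectionFactory.createContext();
Topic topic = context.createTopic("orders.topic");                            // placeholder topic
JMSConsumer consumer = context.createSharedConsumer(topic, "orders-workers"); // shared subscription name
consumer.setMessageListener(message ->
        System.out.println("Got message: " + message));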
While both previous answers are valid, the most natural approach would be to not use topics at all.
Using queues instead of topics does the whole job (load balancing in round-robin fashion).
