Multiple Spring Kafka consumers in one class with multiple groups - spring-boot

I am experimenting with Apache Kafka and Spring Boot.
Is there a way to set up multiple listeners in one JVM, each of them associated with a different group (so that they each respond to an event) on a topic?
Is there a good pattern for this?
@KafkaListener(topics = "${slappy.consumer.topic}", groupId = "t1, t2, t3, t4, t5, t6, t7, t8, t9, t10")
public void listenToGroup(@Payload WackyMessage message) {
    log.info("Wacky Consumer with processItemId of received message: '{}'", message);
    process(message, ProcessItemIdentifiers.omni_silo0_deleter);
}
Sounds like this is the sort of thing Akka is for. I was hoping I could just do naked Spring. I am not sure if I want to add Akka into the mix at this point.

Your possible groupId = "t1, t2, t3, t4, t5, t6, t7, t8, t9, t10" solution is logically wrong. You still have the same listener method, so what is the point of calling it so many times for the same record in the Kafka topic?
If you had different @KafkaListener method signatures, I would agree with putting different groups on them: each of them would then really receive the same message, and it could be done in parallel as well, since each @KafkaListener spins up its own listener container.
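For example, a minimal sketch of that variant (the group names are made up for illustration; WackyMessage is the type from your question):
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class WackyMessageListeners {

    // Each @KafkaListener gets its own listener container; with different
    // groupIds, both methods receive every record published to the topic.
    @KafkaListener(topics = "${slappy.consumer.topic}", groupId = "group-one")
    public void listenForGroupOne(WackyMessage message) {
        // handle the record for the first group
    }

    @KafkaListener(topics = "${slappy.consumer.topic}", groupId = "group-two")
    public void listenForGroupTwo(WackyMessage message) {
        // handle the record for the second group
    }
}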
On the other hand, you can leverage the @EventListener abstraction from Spring to distribute a single record from a single @KafkaListener to any number of subscribers to that event. In this case you don't need to think about different groups: a single group is enough to consume from Kafka, and everything else is processed in the target application.
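And a minimal sketch of that single-group approach; the relay and subscriber beans below are made up for illustration:
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.event.EventListener;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class WackyMessageRelay {

    private final ApplicationEventPublisher publisher;

    public WackyMessageRelay(ApplicationEventPublisher publisher) {
        this.publisher = publisher;
    }

    // One consumer group reads the topic and republishes each record
    // as an in-process application event.
    @KafkaListener(topics = "${slappy.consumer.topic}", groupId = "wacky-consumer")
    public void listen(WackyMessage message) {
        publisher.publishEvent(message);
    }
}

@Component
class FirstSubscriber {

    // Any number of @EventListener beans can react to the same record.
    @EventListener
    public void onWackyMessage(WackyMessage message) {
        // process the record for this subscriber
    }
}

@Component
class SecondSubscriber {

    @EventListener
    public void onWackyMessage(WackyMessage message) {
        // process the record for this subscriber
    }
}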

Way to determine Kafka Topic for @KafkaListener on application startup?

We have 5 topics and we want a service that scales to, for example, 5 instances of the same app.
This would mean that I would want to dynamically determine (via, for example, Redis locking or a similar mechanism) which instance should listen to which topic.
I know that we could have 1 topic with 5 partitions - each node in the same consumer group would pick up a partition. Also, if we have a separately deployed service, we can set the topic via properties.
The issue is that those two options are not suitable for our situation, and we want to see if it is possible to do it via what I explained above.
@PostConstruct
private void postConstruct() {
    // Do logic via Redis locking or something to determine the topic
    dynamicallyDeterminedVariable = // SOME LOGIC
}

@KafkaListener(topics = "{dynamicallyDeterminedVariable}")
void listener(String data) {
    LOG.info(data);
}
Yes, you can use SpEL for the topic name:
#{@someOtherBean.whichTopicToUse()}
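For example, a minimal sketch, assuming a hypothetical topicResolver bean whose whichTopicToUse() method holds the Redis-locking logic from your question (both names are made up):
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component("topicResolver")
public class TopicResolver {

    public String whichTopicToUse() {
        // placeholder for the Redis-locking logic that picks this instance's topic
        return "topic-3";
    }
}

@Component
class DynamicTopicListener {

    // The SpEL expression is evaluated when the listener container is created,
    // by calling whichTopicToUse() on the topicResolver bean.
    @KafkaListener(topics = "#{@topicResolver.whichTopicToUse()}", groupId = "my-group")
    public void listen(String data) {
        // process the record
    }
}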

Delay start of consumer forces rebalance of group

We need to delay start of consumer.
Here's what we need:
Start consumer A (reading topic "xyz")
When consumer A has processed all messages, we need to start consumer B (reading topic "zyx")
After reading this:
How to find no more messages in kafka topic/partition & reading only after writing to topic is done
We set idleEventInterval on containerProperties of consumer A:
containerProperties.setIdleEventInterval(30000L);
and on consumer B:
container.setAutoStartup(false);
then we have:
@EventListener
public void handleListenerContainerIdleEvent(ListenerContainerIdleEvent event) {
    if (canStartContainer(event.getListenerId())) {
        Optional.ofNullable(containers.get("container-a"))
                .ifPresent(AbstractMessageListenerContainer::start);
    }
}
We found that it's exactly what we need - it works fine, but we faced one problem: when consumer B starts, it forces a rebalance of all the other consumers.
Can we avoid that?
Request joining group due to: group is already rebalancing
Revoke previously assigned partitions
(Re-)joining group
It's not a big issue, but we use ConsumerSeekAware to reset the offset using seekToBeginning, so the topic is read twice.
You should not use the same group.id with consumers on different topics; it will cause an unnecessary rebalance, as you have found out.
Use different group.ids for consumers on different topics.
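A minimal sketch of what that looks like with annotation-driven listeners (the ids, topics, and group names are illustrative; the autoStartup attribute matches your delayed start of consumer B, assuming a Spring Kafka version that supports it):
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class DelayedConsumers {

    // Consumer A: its own group on topic "xyz".
    @KafkaListener(id = "container-a", topics = "xyz", groupId = "group-a")
    public void consumeA(String data) {
        // process records from "xyz"
    }

    // Consumer B: a different group on topic "zyx", not started automatically,
    // so starting it later does not rebalance consumer A's group.
    @KafkaListener(id = "container-b", topics = "zyx", groupId = "group-b", autoStartup = "false")
    public void consumeB(String data) {
        // process records from "zyx"
    }
}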

Restart listener and continue from latest message

Case
Clients are ReplyingKafkaTemplate instances.
Server is a ConcurrentMessageListenerContainer created using @KafkaListener and @SendTo annotations on a method.
ContainerFactory uses ContainerStoppingErrorHandler.
Request topic has only 1 partition.
Group ids are static, e.g. test-consumer-group.
Requests are sent with timeouts.
Due to an exception being thrown, the server goes down, but the client keeps dispatching requests, which queue up on the request topic.
Current Behavior
When the server comes back up it continues processing old requests which would have timed out.
Desired Behavior
Instead, it would be better to continue from the latest message, thereby skipping even unprocessed messages, since the corresponding requests would time out and be retried.
Questions
What is the recommended approach to achieve this?
From the little that I understand, it looks like I'll have to manually set the initial offset. What's the simplest way to implement this?
Your @KafkaListener class must extend AbstractConsumerSeekAware and do something like this:
@Override
public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
    super.onPartitionsAssigned(assignments, callback);
    callback.seekToEnd(assignments.keySet());
}
So, every time your consumer joins the group, it will seek all the assigned partitions to the end, skipping all the old records.

How to republish events from event log?

I see the notion of "republishing" events from the event log mentioned everywhere, but it's not really described in detail.
The problem I am thinking of is the following. A producer maintains an event log and publishes every event to the queue. A consumer connects to the queue and receives all the events produced.
Consider a case where there are two consumers (C1, C2) and one producer (P1).
Let's say that:
Producer P1 is started
Consumer C1 connects to the queue
P1 produces events E1, E2, E3
C1 consumes E1, E2, E3
Consumer C2 connects to the queue
P1 produces E4, E5, E6
C1 consumes E4, E5, E6
C2 consumes E4, E5, E6
At this point C2 has missed all events that previously happened! How does:
C2 request that the events E1, E2, E3 be republished?
C2 avoid getting events out of order (i.e. getting E4 before getting E1, E2, E3)?
If anyone has some insights, much appreciated.
Here is what I am thinking.
Synchronizing Consumers with Producer
The consumer records the last event sequence number received from a foreign context. It either:
records it in the aggregate table in the latest_event_per_context column (this only works for one foreign context), OR
records it in a context relation that has a foreign key to the aggregate table: Table(aggregate_id, context_name, sequence_number), where the sequence number is the latest event received in a given context - this leaves the concern in the application layer, OR
records it as an event EventReceived(event, context) - this leaves it to the domain to discard duplicates.
The producer sequences all the events it sends out.
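A minimal sketch of that sequence-tracking idea on the consumer side (all names are illustrative; it assumes the producer stamps each event with a per-context, monotonically increasing sequence number):
import java.util.HashMap;
import java.util.Map;

// Tracks the latest sequence number seen per foreign context; rejects anything
// that is not the next expected event, so duplicates and gaps become visible.
public class SequenceTracker {

    private final Map<String, Long> latestPerContext = new HashMap<>();

    // Returns true if the event is the next expected one and was accepted;
    // false means it is a duplicate (ignore it) or out of order (ask for a replay).
    public synchronized boolean accept(String context, long sequenceNumber) {
        long latest = latestPerContext.getOrDefault(context, 0L);
        if (sequenceNumber != latest + 1) {
            return false;
        }
        latestPerContext.put(context, sequenceNumber);
        return true;
    }
}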
Check out https://geteventstore.com/
When a new subscriber connects, it has the ability to replay a stream. GES offloads the duty of managing what has already been received to the clients.

Spring AMQP RabbitMQ: how to make sure two parallel consumers do not grab the same task at the same time?

I have two systems integrated with RabbitMQ.
Background
The client sends multiple request messages from a Spring-AMQP outbound gateway to a RabbitMQ direct exchange, and the direct exchange round-robin dispatches those messages to multiple workers. The workers are independent, located on different desktops, and run the same worker code in parallel, processing different messages from the exchange via a simple message listener container.
Logic Flow
Similar to the RabbitMQ multi-worker / direct-exchange tutorial.
Client ----- sends requests (5 tasks) ----> RabbitMQ direct exchange
Then the direct exchange distributes those 5 tasks to the workers:
PC1 (Worker1), PC2 (Worker2)
ExchangeType & my Bindings
<!-- rabbit connection factory, rabbit template, and rabbit admin -->
<rabbit:connection-factory
id="connectionFactory"
host="local IP address"
username="guest"
password="guest"
channel-cache-size="10" />
<rabbit:template id="amqpTemplate"
connection-factory="connectionFactory"
reply-timeout="600000"
exchange="JobRequestDirectExchange"/>
<rabbit:admin connection-factory="connectionFactory" id="rabbitAdmin" />
<rabbit:direct-exchange name="taskRequests"
auto-delete="false"
durable="true" >
<rabbit:bindings>
<rabbit:binding queue="jobRequests" key="request.doTask" />
</rabbit:bindings>
</rabbit:direct-exchange>
<rabbit:queue name="jobRequests" auto-delete="false" durable="true" />
Worker - the consumer configuration
<rabbit:listener-container id="workerContainer"
acknowledge="auto"
prefetch="1"
connection-factory="connectionFactory">
<rabbit:listener ref="taskWorker" queue-names="jobRequests" />
</rabbit:listener-container>
The worker class is a simple POJO that processes the request and completes the task.
Using: RabbitMQ 3.2.2 with Spring-Integration-AMQP 2.2
What I expect
I expect that Worker1 receives some of the tasks while Worker2 picks up the rest (the other tasks).
I want the workers to process all 5 tasks in parallel. Each worker should only do one task at a time; after it finishes, another task should be dispatched to it, one by one (the rabbit listener has been set to prefetch=1).
Such as
worker1: t2 t3 t5
worker2: t1 t4
But
after lots of runtime tests, sometimes the tasks are processed correctly:
Worker1------task4 task1
Worker2------task3 task2 task5
while sometimes it goes wrong like this:
Worker1------task4 task1
Worker2------task4 task2 task1
Apparently, task4 and task1 are picked up by worker1 and worker2 at the same time.
Runtime test:
I checked that the client correctly sends out the task1 task2 task3 task4 task5 request messages to the exchange, but every time each worker receives different tasks. There is a common case that may trigger the wrong dispatching.
There are 5 tasks (t1, t2, t3, t4, t5) at the RabbitMQ exchange, and they will be sent to 2 parallel workers (w1, w2).
w1 got tasks: t2 t1 t4
w2 got tasks: t3 t1
With the round-robin dispatch method, w1 and w2 get tasks in sequence:
w1 gets t2 and w2 gets t3.
While t2 and t3 are running, the exchange sends t1 to w1 and waits for an ack from w1.
Suppose t2 takes more time to finish than t3 does, so w2 is free while w1 is doing t1.
When w2 finishes t3, it receives t1 dispatched by the exchange, because w2 is not busy and the exchange did not receive an ack that t1 was finished.
My understanding is:
Both w1 and w2 are doing the same task t1. Whichever of them finishes t1 first sends back an ack, and the exchange dequeues one task message. Since t1 is finished twice, the exchange dequeues one more message than it should; in this way the t5 message gets dequeued because t1 has been done twice. All 5 messages end up acked and dequeued, but the two workers miss t5 and do t1 twice.
What should I do to prevent two parallel workers from grabbing the same message from the same Rabbit queue?
I tried the auto-ack way; the messages are correctly acked. But while the server waits for a worker's ack, RabbitMQ may redispatch a message that has not yet been acked but has already been delivered to another worker.
I am also thinking about synchronizing the sent-out messages or giving priority to sent-out messages, but I do not have a clear vision of how to accomplish that.
I am grateful to hear any ideas about this problem. Thanks.
One thing I can think of that causes these duplicated messages for your consumers is a consumer closing the channel before sending an ack message.
In that case, the RabbitMQ broker will requeue the message and set its redelivered flag to true. From the RabbitMQ docs:
If a message is delivered to a consumer and then requeued (because it was not acknowledged before the consumer connection dropped, for example) then RabbitMQ will set the redelivered flag on it when it is delivered again (whether to the same consumer or a different one). This is a hint that a consumer may have seen this message before (although that's not guaranteed, the message may have made it out of the broker but not into a consumer before the connection dropped). Conversely if the redelivered flag is not set then it is guaranteed that the message has not been seen before. Therefore if a consumer finds it more expensive to deduplicate messages or process them in an idempotent manner, it can do this only for messages with the redelivered flag set.
If, while testing, you close one of the worker processes before it sends an ack, or if a worker faults, this is very likely to happen. You can examine the redelivered flag in order to avoid the message being processed again by a different consumer, if that is the case.
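For example, a minimal sketch of checking that flag in a Spring AMQP listener (the hasAlreadyProcessed() idempotency check is made up for illustration; how you implement it is up to you):
import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageListener;

public class TaskWorker implements MessageListener {

    @Override
    public void onMessage(Message message) {
        // The broker sets this flag when the message may have been delivered before.
        Boolean redelivered = message.getMessageProperties().isRedelivered();
        if (Boolean.TRUE.equals(redelivered) && hasAlreadyProcessed(message)) {
            // Duplicate delivery: skip the work; the container still acks the message.
            return;
        }
        processTask(message);
    }

    private boolean hasAlreadyProcessed(Message message) {
        // Hypothetical idempotency check, e.g. look up a task id in a shared store.
        return false;
    }

    private void processTask(Message message) {
        // Do the actual work for the task.
    }
}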
Another thing I've noticed is the prefetch setting in your consumer configuration. You should set this to a higher value (tune it for your needs) instead of leaving it at just 1. You can learn more about prefetch here.
Hope that helps!
I tried for a long time to work out a Spring-configured way to implement this feature, but failed.
In the end I came up with a workable solution using the RabbitMQ Java Client API.
Using a Spring asynchronous gateway with the Quartz scheduler, it always had problems sending messages as needed. I guess the reason is something related to multi-threading.
At the beginning, I thought it was because the Channel instance may be accessed concurrently by multiple threads, in which case the confirms are not handled properly.
An important caveat to this is that confirms are not handled properly when a Channel is shared between multiple threads. In that scenario, it is therefore important to ensure that the Channel instance is not accessed concurrently by multiple threads.
The above is from http://www.rabbitmq.com/javadoc/com/rabbitmq/client/Channel.html
Finally, I decided to give up on the Spring way and go back to the RabbitMQ API (before, I used Spring XML to configure the gateway/channels; now I use the RabbitMQ Java Client to declare the exchange and channels programmatically), and added RabbitMQ RPC for the asynchronous callback. Now everything works fine for the current requirement.
So in summary, the final solution for my requirement is:
Use the RabbitMQ Java Client API to declare the exchange/channels/binding/routing key,
for both the client and server side.
Use RabbitMQ RPC to implement the asynchronous callback feature.
(I followed RabbitMQ's Java tutorial: http://www.rabbitmq.com/tutorials/tutorial-six-java.html)
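For reference, a minimal sketch of that kind of declaration with the plain RabbitMQ Java client (the exchange, queue, and routing key names are the ones from my XML config; the rest is illustrative):
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class WorkerSetup {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Same topology as the XML config: a durable direct exchange bound
        // to a durable queue with the "request.doTask" routing key.
        channel.exchangeDeclare("taskRequests", "direct", true);
        channel.queueDeclare("jobRequests", true, false, false, null);
        channel.queueBind("jobRequests", "taskRequests", "request.doTask");

        // At most one unacknowledged message per worker, matching prefetch="1".
        channel.basicQos(1);

        channel.close();
        connection.close();
    }
}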
Did you try setting concurrentConsumers property on the listener container as discussed here?
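If you want to try that, a minimal sketch of the programmatic equivalent (this assumes the taskWorker bean implements MessageListener; the XML namespace exposes the same setting as a concurrency attribute on listener-container, if I remember correctly):
import org.springframework.amqp.core.MessageListener;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WorkerContainerConfig {

    @Bean
    public SimpleMessageListenerContainer workerContainer(ConnectionFactory connectionFactory,
                                                          MessageListener taskWorker) {
        SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(connectionFactory);
        container.setQueueNames("jobRequests");
        container.setPrefetchCount(1);
        // Several consumer threads inside one container instead of a single one.
        container.setConcurrentConsumers(2);
        container.setMessageListener(taskWorker);
        return container;
    }
}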
