Spring Boot #kafkaListner with blocking queue - spring-boot

I am new to Spring Boot #kafkaListener. My application receiving almost 200K message per second on topic. I want to separate message listener and processing of the message.
How can I use java.util.concurrent.BlockingQueue with #kafkaListener? Can I use it by using CompletableFuture?
Any sample code will help more.

I believe you want to have your consumer with pipelining implemented. Its not uncommon for one to implement this in a scenario like yours. Why? Well, the KafkaConsumer lacks in that decompressing / deserializing can be time consuming without considering the time it takes to do processing. Since these operations are stacked behind one thread, it would be ideal to separate the polling from the processing, which is achieved through a couple of buffers.
One way to do this: your EventReceiver spins up a thread for the polling. That thread would do the same thing you always do, but instead of firing off the listeners for each event, you'd pass the event to a receivedEvents buffer which could be BlockingQueue<RecieveEvent>. So in the for loop, you pass each record to the blocking queue. This thread would leverage another buffer once the for loop is over, like Queue<Map<TopicPartition, OffsetAndMetadata>> -- and it would commit the offsets that the processingThread has successfully processed.
Next, your EventReceiver spins up another thread - processingThread. This would handle pulling records from the buffer, firing the event to all the listeners for this receiver, and then update the Queues state for the pollingThread to commit.
Why doesn't the processingThread just commit the events instead of passing it back to the pollingThread? This is bc KafkaConsumer requires that the same thread that calls .poll() should be the one that calls consumer.commitAsync(...) or else you'll get a concurrency exception.
This approach doesn't work with auto commit enabled.
In terms of how one can do this using Spring Kafka, I'm not completely sure. However, I do know Spring Kafka separates EventReceiver from EventListener (#KafkaListener) which is separating the low-level kafka work from the business logic. In theory, you'd have to tune their implementation, but I think implementing this one without Spring Kafka library would be easier.

Related

What is the most efficient way to know that a Kafka event is visible in a K-Table?

We use Kafka topics as both events and a repository. Using the kafka-streams API we define a simple K-Table that represents all the events in the topic.
In our use case we publish events to the topic and subsequently reference the K-Table as the backing repository. The main issue is that the published events are not immediately visible on the K-Table.
We tried transactions and exactly once semantics as described here (https://kafka.apache.org/26/documentation/streams/core-concepts#streams_processing_guarantee) but there is always a delay we cannot control.
Publish Event
Undetermined amount of time
Published Event is visible in the K-Table
Is there a way to eliminate the delay or otherwise know that a specific event has been consumed by the K-Table.
NOTE: We tried both partition and global tables with similar results.
Thanks
Because Kafka is an asynchronous system the observed delay is expected and you cannot do anything to avoid it.
However, if you publish a message to a topic, the KafkaProducer allows you to pass in a Callback to the send() method and the callback will be executed after the message was written to the topic providing the record's metadata like topic, partition, and offset.
After Kafka Streams processed messages, it will eventually commit the offsets (you can configure the commit interval, too). Thus, you can know if the message is in the KTable after the offset was committed. By default, committing happens every 30 seconds only and it's not recommended to use a very short commit interval because it implies large overhead. Thus, I am not sure if this would help for your case, as it seem you want a more timely "response".
As an alternative, you can also disable caching on the KTable and use a toStream().process() step -- after each update to the KTable, the changelog stream provided by toStream() will contain the record and you can access the record metadata (including its offset) in the Processor via the given ProcessorContext object. Thus should also allow you to figure out, when the record is available in the KTable.

Spring Kafka is Acknowledgement.acknowledge thread safe?

I am implementing an kafka based application where I would like to manually acknowledge incoming messages. Architecture forces me to do it in a separate thread.
The question is: is it possible and safe to execute Acknowledgement.acknowledge() in a different thread than consumer?
Yes it is, as long as you use MANUAL and not MANUAL_IMMEDIATE, but I don't think you'll get what you expect.
Kafka doesn't track each message, just offsets within the partition.
Let's say message 1 arrives and you hand off to another thread. Then message 2 arrives and it is handed off to yet another thread.
When the offset for message 2 is ack'd, you are effectively acking both messages.

Confirm JMS message into subprocess

Is there any way to confirm a JMS message in a subprocess?
I have process A that starts with a JMS Queue Receiver (or JMS Topic Subscriber). It calls process B which has to confirm the message received - I'm using Tibco EMS Explicit acknowledge mode.
This will allow me to reuse some parts. Is it possible to do it?
I'm afraid this is not possible. The confirm always has to be in the same process as the receiver.
In a well-designed architecture you do not want to split the messaging (and confirm) layer but rather push all functional processing into a sub-process having a result parameter indicating if the initial message shall be kept (defer processing to a later time by not confirming) or mark it as "processed" (and confirm it).
By default, all (JMS) messages are auto-confirmed so an explicit confirm is a design choice made by you (based on a particular consumption model) in the config tab of the process starter/step. You should only use this if you know what happens with that message and if a processing deferral is possible. Most loosely coupled messaging is not "transactional" (except you decide to take the extra mile) in a DB sense - so rather stick to the auto-confirm if you have no special handling demand! BW/EMS are quite good in handling (reasonably small) messages so NOT auto-confirming can create re-deliveries within milliseconds and flood your whole system (heap space) if not handled properly.

Spring Integration message processing partitioned by header information

I want to be able to process messages with Spring Integration in parallel. The messages come from multiple devices and we need to process messages from the same device in sequential order but the devices can be processed in multiple threads. There can be thousands of devices so I'm trying to figure out how to assign processor based on mod of the device ID using Spring Integration's semantics as much as possible. What approach should I be looking at?
It's difficult to generalize without knowing other requirements (transaction semantics etc) but probably the simplest approach would be a router sending messages to a number of QueueChannels using some kind of hash algorithm on the device id (so all messages for a particular device go to the same channel).
Then, have a single-threaded poller pulling messages from each queue.
EDIT: (response to comment)
Again, difficult to generalize, but...
See AbstractMessageRouter.determineTargetChannels() - a router actually returns a physical channel object (actually a list, but in most cases a list of 1). So, yes, you can create the QueueChannels programmatically and have the router return the appropriate one, based on the message.
Assuming you want all the messages to then be handled by the same downstream flow, you would also need to create a <bridge/> for each queue channel to bridge it to the input channel of the next component in the flow.
create a QueueChannel
create a BridgeHandler (set the outputChannel to the input channel of the next component)
create a PollingConsumer (constructor takes the channel and handler; set trigger etc)
start() the consumer.
All of this can be done in your custom router initialization and implement determineTargetChannels() to select the queue.
Depending on the processing time for your events, I would generally recommend running the downstream flow on the poller thread rather than setting a taskExecutor to avoid issues with the next poll trying to schedule another task before this one's done. You might need to increase the default taskScheduler's pool size.

Get the most out of high performance MDB

Application server creates a new transaction before calling MDB's onMessage method. Also I am processing database update in onMessage method. Transactions create additional overhead and processing several message in one transaction could increase performance.
Is it possible to make App server to use one transaction for several messages. Or maybe there are other approaches to this problem?
And, by the way, I can't use multiple instances, cause I need to preserve the sequence order.
I guess you can store the messages in a list and depending upon how many messages you want to process in one transaction you can check the size of the list and process the messages.

Resources