Spring Boot manual acknowledgement of Kafka messages is not working

I have a Spring Boot Kafka consumer that consumes data from a topic, stores it in a database, and acknowledges the record once it has been stored.
It works fine, but the problem occurs when the application fails to get a DB connection after consuming a record. In that case we do not send the acknowledgement, yet the message is never redelivered unless we change the group id and restart the consumer.
My consumer looks like this:
@KafkaListener(id = "${group.id}", topics = {"${kafka.edi.topic}"})
public void onMessage(ConsumerRecord record, Acknowledgment acknowledgment) {
    boolean shouldAcknowledge = false;
    try {
        String tNo = getTrackingNumber((String) record.key());
        log.info("Check Duplicate By Comparing With DB records");
        if (!ediRecordService.isDuplicate(tNo)) {           // checks whether the record already exists in my DB
            shouldAcknowledge = insertEDIRecord(record, tNo); // returns true on a successful insert
        } else {
            log.warn("Duplicate record found.");
            shouldAcknowledge = true;
        }
        if (shouldAcknowledge) {
            acknowledgment.acknowledge();
        }
    } catch (Exception e) {
        // a DB failure ends up here: no acknowledgement is sent
        log.error("Failed to process record", e);
    }
}
As you can see in the snippet above, we do not send the acknowledgement in that case.

That is not how Kafka offsets work here.
The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.
For example, suppose the consumer gets the message at offset 300 from its first poll and fails to persist it to the database because of some issue; it therefore does not commit that offset.
On the next poll it gets the record at offset 301. If it persists that record successfully, it commits offset 301, which means all records in that partition up to and including offset 301 are considered processed - including the failed record at offset 300.
Solution: use a retry mechanism with a limited number of attempts until the data is stored successfully, or publish failed records to an error (dead-letter) topic and reprocess them later, or save the offsets of failed records somewhere so you can reprocess them afterwards.
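A minimal sketch of the dead-letter variant, assuming spring-kafka 2.8+ and an existing KafkaTemplate bean (the names here are illustrative): let the listener rethrow on DB failure and configure an error handler that retries a few times before publishing the record to a <topic>.DLT topic, so the consumer can keep moving.

@Bean
public DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
    // after retries are exhausted, publish the failed record to <original-topic>.DLT
    DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(template);
    // retry up to 3 times, 1 second apart, before giving up
    return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 3L));
}

If the error handler is not picked up automatically by your Spring Boot version, set it on the ConcurrentKafkaListenerContainerFactory with setCommonErrorHandler(...).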

Related

How can I define a "global" job variable that each processor can read/update using Spring Batch?

I have a Spring Batch job with a reader/processor/writer that reads a batch of EmailQueue records, processes/sends them, and then writes the results (success, fail) back into the EmailQueue database table. However, if during the job 5+ emails fail to send (e.g. because the email API is down), I would like the processor to stop attempting sends and instead mark the remaining EmailQueue objects as "failed", which the writer then stores back to the database. I would like my processor to look something like the one below, but I can't figure out how to keep a "global" counter for the job that the processor can access.
It may be important to note that my appUserEmailSender.send(emailQueue) method doesn't throw an error if the email fails to send; it only stores the result in the EmailQueue object itself, so I can write the results back into the EmailQueue db table.
public EmailQueue process(@NonNull EmailQueue emailQueue) {
    // can this variable be defined globally for each job somewhere???
    int emailFailSendCount = 0;
    // if the fail count is less than 5, attempt to send the email
    if (emailFailSendCount < 5) {
        // send the email
        EmailQueue result = appUserEmailSender.send(emailQueue);
        // if it failed, increase the fail count
        if (EmailQueueState.FAILED == result.getEmailQueueState()) {
            emailFailSendCount++;
        }
    // if the fail count has reached 5, don't attempt to send, just mark as "failed"
    } else {
        emailQueue.setEmailQueueState(EmailQueueState.FAILED);
    }
    return emailQueue;
}
Clearly the above code wouldn't work, but my question is: can I define a "global" emailFailSendCount variable that every processor invocation can read or update on each processing step?
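One common way to get that kind of per-execution state (a sketch only: it assumes a step-scoped processor bean and a single-threaded step, and the class name EmailQueueProcessor and the context key "emailFailSendCount" are illustrative) is to keep the counter in the step's ExecutionContext, which every process() call within the step can read and update:

@Component
@StepScope
public class EmailQueueProcessor implements ItemProcessor<EmailQueue, EmailQueue> {

    @Autowired
    private AppUserEmailSender appUserEmailSender;

    private ExecutionContext stepContext;

    @BeforeStep
    public void captureStepExecution(StepExecution stepExecution) {
        // the ExecutionContext lives for the whole step, so it can act as "global" job state
        this.stepContext = stepExecution.getExecutionContext();
    }

    @Override
    public EmailQueue process(@NonNull EmailQueue emailQueue) {
        int emailFailSendCount = stepContext.getInt("emailFailSendCount", 0);
        if (emailFailSendCount < 5) {
            EmailQueue result = appUserEmailSender.send(emailQueue);
            if (EmailQueueState.FAILED == result.getEmailQueueState()) {
                stepContext.putInt("emailFailSendCount", emailFailSendCount + 1);
            }
            return result;
        }
        // too many failures already: skip the send and mark the item as failed
        emailQueue.setEmailQueueState(EmailQueueState.FAILED);
        return emailQueue;
    }
}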

Spring Boot JPA save() method trying to insert existing row

I have a simple Kafka consumer that collects events and, based on the data in them, inserts or updates a record in the database - the table has a unique constraint on the ID column, which is also declared on the entity field.
Everything works fine when the table is pre-populated and inserts only happen every now and then. However, when I truncate the table and send a couple of thousand events with a limited number of IDs (I was using 50 unique IDs across 3k events), the events are processed simultaneously and the save() method randomly fails with a unique constraint violation exception. I debugged it and the cause is pretty simple:
event1={id = 1 ... //somedata} gets picked up; the service method saveOrUpdateRecord() looks for a record with ID=1, finds none, and inserts a new record.
event2={id = 1 ... //somedata} gets picked up almost at the same time; saveOrUpdateRecord() looks for a record with ID=1, finds none (the previous one is mid-insert), tries to insert, and fails with the constraint violation exception - it should have found that record and merged it with the input from the event based on my conditions.
How can I make saveOrUpdateRecord() run only after the previous invocation has fully completed, to prevent this behaviour? I really don't want to slow the Kafka consumer down with poll size etc.; I just want my service to execute one transaction at a time.
The service method:
public void saveOrUpdateRecord(Object input) {
    Object output = repository.findById(input.getId());
    if (output == null) {
        repository.save(input);
    } else {
        mergeRecord(input, output);
        repository.save(output);
    }
}
Will a @Transactional annotation on the method do the job?
Make your service thread safe.
Use this:
public synchronized void saveOrUpdateRecord(Object input) {
    Object output = repository.findById(input.getId());
    if (output == null) {
        repository.save(input);
    } else {
        mergeRecord(input, output);
        repository.save(output);
    }
}

Stop KafkaListener (Spring Kafka consumer) after it has read all messages up to some specific time

I am trying to schedule my consumption process from a single-partition topic. I can start it using endpointlistenerregistry.start(), but I want to stop it after I have consumed all the messages currently in the partition, i.e. when I reach the last offset in the partition. Production into the topic only happens after I have finished the consumption and closed the consumer. How can I be sure that I have read all the messages present up to the time the scheduler started, and then stop my consumer? I am using @KafkaListener for the consumer.
Set the idleEventInterval container property and add an #EventListener method to listen for ListenerContainerIdleEvents.
Then stop the container.
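A minimal sketch of that approach, assuming a listener id of "myListener" and a 60-second idle interval (both illustrative):

// when building the container factory: publish an idle event after 60s without records
factory.getContainerProperties().setIdleEventInterval(60_000L);

// elsewhere, in a Spring bean:
@Autowired
private KafkaListenerEndpointRegistry registry;

@EventListener
public void onIdle(ListenerContainerIdleEvent event) {
    // no records came back for the configured interval: the partition is drained, stop the container
    registry.getListenerContainer("myListener").stop();
}

The id passed to getListenerContainer(...) must match the id attribute of the @KafkaListener.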
To read up to the last offset, you simply poll until you get back empty records.
You can invoke kafkaConsumer.pause() at the end of consumption. On the next scheduled run you need to invoke kafkaConsumer.resume() first.
Suspend fetching from the requested partitions. Future calls to poll(Duration) will not return any records from these partitions until they have been resumed using resume(Collection). Note that this method does not affect partition subscription. In particular, it does not cause a group rebalance when automatic assignment is used.
Something like this:
List<TopicPartition> topicPartitions = new ArrayList<>();

void scheduleProcess() {
    topicPartitions = ...; // assign the partition info here
    kafkaConsumer.resume(topicPartitions);
    while (true) {
        ConsumerRecords<String, Object> events = kafkaConsumer.poll(Duration.ofMillis(1000));
        if (!events.isEmpty()) {
            // processing logic
        } else {
            kafkaConsumer.pause(topicPartitions);
            break;
        }
    }
}

Kafka: Efficiently join windowed aggregates to events

I'm prototyping a fraud application. We'll frequently have metrics like "total amount of cash transactions in the last 5 days" that we need to compare against some threshold to determine if we raise an alert.
We're looking to use Kafka Streams to create and maintain the aggregates and then create an enhanced version of the incoming transaction that has the original transaction fields plus the aggregates. This enhanced record gets processed by a downstream rules system.
I'm wondering about the best way to approach this. I've prototyped creating the aggregates with code like this:
TimeWindows twoDayHopping = TimeWindows.of(TimeUnit.DAYS.toMillis(2))
                                       .advanceBy(TimeUnit.DAYS.toMillis(1));

KStream<String, AdditiveStatistics> aggrStream = transactions
    .filter((key, value) -> value.getAccountTypeDesc().equals("P")
                         && value.getPrimaryMediumDesc().equals("CASH"))
    .groupByKey()
    .aggregate(AdditiveStatistics::new,
               (key, value, accumulator) -> AdditiveStatsUtil.advance(value.getCurrencyAmount(), accumulator),
               twoDayHopping,
               metricsSerde,
               "sas10005_store")
    .toStream()
    .map((key, value) -> {
        value.setTransDate(key.window().start());
        return new KeyValue<String, AdditiveStatistics>(key.key(), value);
    })
    .through(Serdes.String(), metricsSerde, datedAggrTopic);
This creates a store-backed stream with one record per key per window. I then join the original transactions stream to this windowed stream to produce the final output topic:
JoinWindows joinWindow = JoinWindows.of(TimeUnit.DAYS.toMillis(1))
.before(TimeUnit.DAYS.toMillis(1))
.after(-1)
.until(TimeUnit.DAYS.toMillis(2)+1);
KStream<String,Transactions10KEnhanced> enhancedTrans = transactions.join(aggrStream,
(left,right)->{
Transactions10KEnhanced out = new Transactions10KEnhanced();
out.setAccountNumber(left.getAccountNumber());
out.setAccountTypeDesc(left.getAccountTypeDesc());
out.setPartyNumber(left.getPartyNumber());
out.setPrimaryMediumDesc(left.getPrimaryMediumDesc());
out.setSecondaryMediumDesc(left.getSecondaryMediumDesc());
out.setTransactionKey(left.getTransactionKey());
out.setCurrencyAmount(left.getCurrencyAmount());
out.setTransDate(left.getTransDate());
if(right != null) {
out.setSum2d(right.getSum());
}
return out;
},
joinWindow);
This produces the correct results, but it seems to run for quite a while, even with a low number of records. I'm wondering if there's a more efficient way to achieve the same result.
It's a config issue: see http://docs.confluent.io/current/streams/developer-guide.html#memory-management
Disabling caching by setting the cache size to zero (parameter cache.max.bytes.buffering in StreamsConfig) will resolve the "delayed" delivery to the output topic.
You might also read this blog post for some background information about Streams design: https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/
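For reference, a sketch of that setting in the streams configuration (it trades more downstream update records for lower latency):

Properties props = new Properties();
// disable the record cache so every windowed aggregate update is forwarded downstream immediately
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);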

How to read a large volume of messages from WebSphere MQ

I want to read 10000 messages from WebSphere MQ in groups, in sequential order. I am using the code below to do this, but it takes a long time to read all the messages. I also tried to use multiple threads, but sometimes two threads consume the same group and a race condition occurs. Below is the code snippet.
I am trying to use 3 threads to read the 10000 messages from MQ sequentially, but two of my threads access the same group at the same time. How can I avoid this? What is the best way to read a large volume of messages sequentially? My requirement is to read 10000 messages in sequence. Please help.
MQConnectionFactory factory = new MQConnectionFactory();
factory.setQueueManager("QM_host");
MQQueue destination = new MQQueue("default");
Connection connection = factory.createConnection();
connection.start();
Session session = connection.createSession(true, Session.AUTO_ACKNOWLEDGE);
MessageConsumer lastMessageConsumer =
        session.createConsumer(destination, "JMS_IBM_Last_Msg_In_Group=TRUE");
TextMessage lastMessage = (TextMessage) lastMessageConsumer.receiveNoWait();
lastMessageConsumer.close();
if (lastMessage != null) {
    int groupSize = lastMessage.getIntProperty("JMSXGroupSeq");
    String groupId = lastMessage.getStringProperty("JMSXGroupID");
    boolean failed = false;
    for (int i = 1; (i < groupSize) && !failed; i++) {
        MessageConsumer consumer = session.createConsumer(destination,
                "JMSXGroupID='" + groupId + "' AND JMSXGroupSeq=" + i);
        TextMessage message = (TextMessage) consumer.receiveNoWait();
        if (message != null) {
            System.out.println(message.getText());
        } else {
            failed = true;
        }
        consumer.close();
    }
    if (failed) {
        session.rollback();
    } else {
        System.out.println(lastMessage.getText());
        session.commit();
    }
}
connection.close();
I think a better way would be to have a coordinator thread in your application that listens for the last message of each group and, for each one, starts a new thread to get the messages belonging to the group assigned to that thread. (This takes care of the race conditions.)
Within the threads getting the messages of a group, you don't need a for loop that fetches each message separately; instead, take any message belonging to the group while maintaining a group counter and buffering out-of-order messages, as in the sketch below. This is safe as long as you commit your session only after receiving and processing all messages of the group. (It also yields more performance, as each group is processed by a separate thread, and that thread touches every message in MQ only once.)
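A sketch of such a per-group worker, assuming the coordinator only browsed the group's last message (so the whole group is still on the queue) and handed over groupId and groupSize; process(...) stands in for the real handling:

MessageConsumer groupConsumer =
        session.createConsumer(destination, "JMSXGroupID='" + groupId + "'");
Map<Integer, TextMessage> outOfOrder = new TreeMap<>();
int nextSeq = 1;
while (nextSeq <= groupSize) {
    TextMessage msg = (TextMessage) groupConsumer.receive(5000);
    if (msg == null) {
        session.rollback();   // group incomplete within the timeout: put everything back
        break;
    }
    // buffer the message, then hand over every message that is now next in sequence
    outOfOrder.put(msg.getIntProperty("JMSXGroupSeq"), msg);
    while (outOfOrder.containsKey(nextSeq)) {
        process(outOfOrder.remove(nextSeq));
        nextSeq++;
    }
}
if (nextSeq > groupSize) {
    session.commit();         // the whole group was received and processed in order
}
groupConsumer.close();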
Please see IBM's documentation on sequential retrieval of messages. In case the page moves or is changed, I'll quote the most relevant part. For sequential processing to be guaranteed, the following conditions must be met:
All the put requests were done from the same application.
All the put requests were either from the same unit of work, or all the put requests were made outside of a unit of work.
The messages all have the same priority.
The messages all have the same persistence.
For remote queuing, the configuration is such that there can only be one path from the application making the put request, through its queue manager, through intercommunication, to the destination queue manager and the target queue.
The messages are not put to a dead-letter queue (for example, if a queue is temporarily full).
The application getting the message does not deliberately change the order of retrieval, for example by specifying a particular MsgId or CorrelId or by using message priorities.
Only one application is doing get operations to retrieve the messages from the destination queue. If there is more than one application, these applications must be designed to get all the messages in each sequence put by a sending application.
Though the page does not state this explicitly, when they say "one application" what is meant is a single thread of that one application. If an application has concurrent threads, the order of processing is not guaranteed.
Furthermore, reading 10,000 messages in a single unit of work as suggested in another response is not recommended as a means to preserve message order! Only do that if the 10,000 messages must succeed or fail as an atomic unit, which has nothing to do with whether they were received in order. In the event that large numbers of messages must be processed in a single unit of work it is absolutely necessary to tune the size of the log files, and quite possibly a few other parameters. Preserving sequence order is torture enough for any threaded async messaging transport without also introducing massive transactions that run for very long periods of time.
You can do what you want with MQ classes for Java (non-JMS), and it may be possible with MQ classes for JMS, but it would be really tricky.
First, read this page from the MQ Knowledge Center.
I converted the pseudo code (from the web page above) to MQ classes for Java and changed it from a browse to a destructive get.
Also, I prefer to process each group of messages under a syncpoint (assuming reasonably sized groups).
First off, you are missing several flags for the 'options' field of the GMO (MQGetMessageOptions), and the matchOptions field needs to be set to MQMO_MATCH_MSG_SEQ_NUMBER so that every thread always grabs the first message in a group as its first message, i.e. it does not grab the 2nd message in the group as its first message, as you described above.
MQGetMessageOptions gmo = new MQGetMessageOptions();
MQMessage rcvMsg = new MQMessage();

/* Get the first message in a group, or a message not in a group */
gmo.options = CMQC.MQGMO_COMPLETE_MSG | CMQC.MQGMO_LOGICAL_ORDER | CMQC.MQGMO_ALL_MSGS_AVAILABLE | CMQC.MQGMO_WAIT | CMQC.MQGMO_SYNCPOINT;
gmo.matchOptions = CMQC.MQMO_MATCH_MSG_SEQ_NUMBER;
rcvMsg.messageSequenceNumber = 1;
inQ.get(rcvMsg, gmo);

/* Examine first or only message */
...

gmo.options = CMQC.MQGMO_COMPLETE_MSG | CMQC.MQGMO_LOGICAL_ORDER | CMQC.MQGMO_SYNCPOINT;
while ((rcvMsg.messageFlags & CMQC.MQMF_MSG_IN_GROUP) == CMQC.MQMF_MSG_IN_GROUP)
{
    rcvMsg.clearMessage();
    inQ.get(rcvMsg, gmo);
    /* Examine each remaining message in the group */
    ...
}
qMgr.commit();
