Spring Integration - Poller does not process next message on MessageHandlingException from JobLaunchingGateway

Looking for suggestions on error handling in my Spring Integration/Spring Batch application.
Context
I am using an inbound file adapter to poll for files in the input directory. The file is passed as a parameter to a Spring Batch job via the JobLaunchingGateway. The job is run on its own thread pool using a TaskExecutor.
The files are moved to a processed directory when the job completes or to the error directory when the job execution fails.
The poller uses the AcceptOnceFileListFilter which does not retain processed files between application restarts due to the cache being in-memory. This is not a concern for my application since Spring Batch would throw an exception if the same file shows up in the input directory at a later point in time.
Issue
Day 1 - The input file abc_success005.csv is processed successfully.
Several days later (the application has been restarted since Day 1) - the same file erroneously arrived in the input directory. The poller thread received an error from the JobLaunchingGateway and did not proceed to process the subsequent messages in the current polling cycle.
Note that the poller resumed duties and continued operations on schedule after the polling interval expired.
The SimpleJobLauncher executes the job in a separate thread pool using the TaskExecutor. However, the creation of the job execution uses the caller's thread, which is the poller in my case.
SimpleJobLauncher - From Spring Batch source code
jobExecution = jobRepository.createJobExecution(job.getName(), jobParameters);
try {
    taskExecutor.execute(new Runnable() {
        @Override
        public void run() {
            try {
                if (logger.isInfoEnabled()) {
                    logger.info("Job: [" + job + "] launched with the following parameters: [" + jobParameters
                            + "]");
                }
                job.execute(jobExecution);
My Integration flow
@Bean
public IntegrationFlow myIntegrationFlow(JobLaunchingGateway jobLaunchingGateway,
        FileMessageToJobRequest fileMessageToJobRequest) {
    return IntegrationFlows.from(Files.inboundAdapter(new File(properties.getInputDir()))
                    .filter(new AcceptOnceFileListFilter<>()),
            c -> c.poller(Pollers.fixedRate(300, TimeUnit.SECONDS)
                    .taskExecutor(taskExecutor())
                    .maxMessagesPerPoll(50)))
            .transform(fileMessageToJobRequest)
            .handle(jobLaunchingGateway)
            .log(LoggingHandler.Level.WARN, "headers.id + ': ' + payload")
            .get();
}
Stack Trace
[ERROR] 2022-08-15 11:59:46,853 o.s.i.h.LoggingHandler error taskExecutor-2id taskExecutor-2 - org.springframework.messaging.MessageHandlingException: nested exception is org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters={input_file_name=/var/tmp/batch/input/abc_success005.csv}. If you want to run this job again, change the parameters., failedMessage=GenericMessage [payload=JobLaunchRequest: processStuffJob, parameters={input_file_name=/var/tmp/batch/input/abc_success005.csv}, headers={file_originalFile=/var/tmp/batch/input/abc_success005.csv, id=298019a3-b548-2a26-1469-78394758d824, file_name=abc_success005.csv, file_relativePath=abc_success005.csv, timestamp=1660579186811}]
at org.springframework.batch.integration.launch.JobLaunchingGateway.handleRequestMessage(JobLaunchingGateway.java:78)
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:136)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:56)
at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:115)
at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:133)
at org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:106)
at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:72)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:317)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:272)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:187)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:166)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:47)
at org.springframework.messaging.core.AbstractMessageSendingTemplate.send(AbstractMessageSendingTemplate.java:109)
at org.springframework.integration.handler.AbstractMessageProducingHandler.sendOutput(AbstractMessageProducingHandler.java:457)
at org.springframework.integration.handler.AbstractMessageProducingHandler.doProduceOutput(AbstractMessageProducingHandler.java:325)
at org.springframework.integration.handler.AbstractMessageProducingHandler.produceOutput(AbstractMessageProducingHandler.java:268)
at org.springframework.integration.handler.AbstractMessageProducingHandler.sendOutputs(AbstractMessageProducingHandler.java:232)
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:142)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:56)
at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:115)
at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:133)
at org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:106)
at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:72)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:317)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:272)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:187)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:166)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:47)
at org.springframework.messaging.core.AbstractMessageSendingTemplate.send(AbstractMessageSendingTemplate.java:109)
at org.springframework.integration.endpoint.SourcePollingChannelAdapter.handleMessage(SourcePollingChannelAdapter.java:196)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.messageReceived(AbstractPollingEndpoint.java:475)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.doPoll(AbstractPollingEndpoint.java:461)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.pollForMessage(AbstractPollingEndpoint.java:413)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.lambda$createPoller$4(AbstractPollingEndpoint.java:348)
at org.springframework.integration.util.ErrorHandlingTaskExecutor.lambda$execute$0(ErrorHandlingTaskExecutor.java:57)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters={input_file_name=/var/tmp/batch/input/abc_success005.csv}. If you want to run this job again, change the parameters.
at org.springframework.batch.core.repository.support.SimpleJobRepository.createJobExecution(SimpleJobRepository.java:139)
The poller thread typically fetches a batch of files/messages and sends them to the JobLaunchingGateway for processing. When a job fails to launch, the poller does not handle the exception, and the processing of the remaining files terminates.
I understand this doesn't stop the poller from resuming on the next cycle, but it is not desirable for one bad file to end the current polling cycle. It can also lead to situations where one bad file (when left as is) causes the issue to repeat across several subsequent polling cycles.
From Spring Integration source code - AbstractPollingEndpoint
private Runnable createPoller() {
    return () ->
            this.taskExecutor.execute(() -> {
                int count = 0;
                while (this.initialized && (this.maxMessagesPerPoll <= 0 || count < this.maxMessagesPerPoll)) {
                    if (this.maxMessagesPerPoll == 0) {
                        logger.info("Polling disabled while 'maxMessagesPerPoll == 0'");
                        break;
                    }
                    if (pollForMessage() == null) {
                        break;
                    }
                    count++;
                }
            });
}
What is the recommended solution in this case?
One option I can think of is to extend the JobLaunchingGateway to deal with the file (move it to the error directory) and to log/swallow the exception. From a functional standpoint, there is not much to be done here other than moving the file out and logging the error.
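Roughly, an untested sketch of that option (the error directory and logging here are placeholders; it relies on the gateway wrapping launch failures in a MessageHandlingException, as the stack trace shows):

public class MoveFileOnErrorJobLaunchingGateway extends JobLaunchingGateway {

    private static final Logger log = LoggerFactory.getLogger(MoveFileOnErrorJobLaunchingGateway.class);

    private final File errorDir = new File("/var/tmp/batch/error"); // placeholder error directory

    public MoveFileOnErrorJobLaunchingGateway(JobLauncher jobLauncher) {
        super(jobLauncher);
    }

    @Override
    protected Object handleRequestMessage(Message<?> requestMessage) {
        try {
            return super.handleRequestMessage(requestMessage);
        }
        catch (MessageHandlingException e) {
            // move the offending file aside and swallow the exception
            File file = requestMessage.getHeaders().get(FileHeaders.ORIGINAL_FILE, File.class);
            if (file != null && !file.renameTo(new File(errorDir, file.getName()))) {
                log.warn("Could not move {} to the error directory", file);
            }
            log.error("Job launch failed, skipping file " + file, e);
            return null; // no reply; the poll loop continues with the next message
        }
    }
}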
I am writing this to seek better solutions. I think it is not desirable for one bad file to starve the execution of the other files (even within a polling cycle).

The poller thread typically fetches a batch of files/messages and sends them to the JobLaunchingGateway for processing.
That's not correct. See FileReadingMessageSource. It only scans the directory on the first request (or when its internal queue is empty):
protected AbstractIntegrationMessageBuilder<File> doReceive() {
    // rescan only if needed or explicitly configured
    if (this.scanEachPoll || this.toBeReceived.isEmpty()) {
        scanInputDirectory();
    }
    File file = this.toBeReceived.poll();
And we produce only one file from that queue per call to the pollForMessage() mentioned above.
It's true that the current polling cycle is cancelled in case of an error, but we don't ignore the other cached files. We simply come back to the next this.toBeReceived.poll() in the next polling cycle.
It was just designed that way from day one: the maxMessagesPerPoll is part of a single polling cycle's unit of work, and all the messages are polled within a single executor task. So a failure somewhere in the middle of the task is like cancelling that task.
If you still find this inappropriate, see if maxMessagesPerPoll = 1 is OK for you, so that a single polling cycle handles a single file.
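Applied to the poller in your flow, that is just:

c -> c.poller(Pollers.fixedRate(300, TimeUnit.SECONDS)
        .taskExecutor(taskExecutor())
        .maxMessagesPerPoll(1))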
Another solution is to catch the exception on that .handle(jobLaunchingGateway) and not let it propagate back to the poller. See the ExpressionEvaluatingRequestHandlerAdvice: https://docs.spring.io/spring-integration/docs/current/reference/html/messaging-endpoints.html#message-handler-advice-chain
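For example, a rough sketch (the failure channel name here is arbitrary):

@Bean
public Advice jobLaunchAdvice() {
    ExpressionEvaluatingRequestHandlerAdvice advice = new ExpressionEvaluatingRequestHandlerAdvice();
    advice.setTrapException(true); // swallow the exception instead of rethrowing it to the poller
    advice.setOnFailureExpressionString("payload"); // evaluated against the failed message
    advice.setFailureChannelName("jobLaunchErrorChannel"); // an ErrorMessage is sent here on failure
    return advice;
}

Then apply it to the handler endpoint, and let a flow on jobLaunchErrorChannel move the file to the error directory:

.handle(jobLaunchingGateway, e -> e.advice(jobLaunchAdvice()))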

Related

MassTransit - How to fault messages in a batch

I am trying to utilize the MassTransit batching technique to process multiple messages and reduce the individual queries to the database (read and write).
If there is an exception while processing one or more of the messages, the expectation is to fault only the required messages and have the ability to process the rest.
This is a common scenario in my use case. What I am trying to establish here is a way to perform batch processing that caters for poisoned messages.
For example, if I have a batch size of 10 messages, 10 in the queue and 1 persistently fails, I still need a means of ensuring the other 9 can be processed successfully. It is fine if all 10 need to be returned to the queue and subset re-consumed - but the poisoned message needs to be eliminated somehow. Does this requirement discount the use of batching?
I have tried the below; however, it did not solve my use case:
catching the exception and raising NotifyFaulted for that specific message.
modified the sample-twitch application to throw an exception, something like below, based on the https://github.com/MassTransit/Sample-Twitch/blob/master/src/Sample.Components/BatchConsumers/RoutingSlipBatchEventConsumer.cs file.
public Task Consume(ConsumeContext<Batch<RoutingSlipCompleted>> context)
{
    if (_logger.IsEnabled(LogLevel.Information))
    {
        _logger.Log(LogLevel.Information, "Routing Slips Completed: {TrackingNumbers}",
            string.Join(", ", context.Message.Select(x => x.Message.TrackingNumber)));
    }
    for (int i = 0; i < context.Message.Length; i++)
    {
        try
        {
            if (i % 2 != 0)
                throw new System.Exception("business error - message failed");
        }
        catch (System.Exception ex)
        {
            context.Message[i].NotifyFaulted(TimeSpan.Zero, "batch routing slip faulted", ex);
        }
    }
    return Task.CompletedTask;
}
I have dug into a few more threads that look similar to this issue; for reference:
Masstransit error handling for batch consumer
If you want to use batch, and have a message in that batch that cannot be processed, you should catch the exception and do something else with the poison message. You could write it someplace else, publish some type of event, or whatever else. But MassTransit does not allow you to partially complete/fault messages of a batch.

Ruby time in milliseconds

I have a Ruby method:
def generate_CurrentDateTime()
  puts "Generating current date and time"
  dateTimeObj = Time.now()
  dateTimeObj.year.to_s + dateTimeObj.month.to_s + dateTimeObj.day.to_s +
    dateTimeObj.hour.to_s + dateTimeObj.min.to_s + dateTimeObj.sec.to_s
end
I would like to add milliseconds to this. I tried millis.to_s and ms.to_s; both are incorrect. Please help.
I already found an alternative solution:
def generate_CurrentDateTime()
  puts "Generating current date and time"
  Time.now.strftime('%Y%m%d%H%M%S%L')
end
But I want to know if any direct method is available.
The default shutdown strategy is to gracefully complete all in-flight tasks, which means that for batch consumers (like the file consumer) it will finish processing the subsequent steps in the route until the end, but will not process more items in the batch, because that could take forever. Only the in-process task message runs to completion.
You can override this behavior if you know it will eventually finish with the ShutdownRunningTask.CompleteAllTasks parameter:
public void configure() throws Exception {
    from(url).routeId("foo").noAutoStartup()
        // let it complete all tasks during shutdown
        .shutdownRunningTask(ShutdownRunningTask.CompleteAllTasks)
        .process(new MyProcessor())
        .to("mock:bar");
}

How to safely write to one file from many verticle instances in vert.x 3.2?

Instead of using a logger or database server I'd like to append information to one file from possibly many verticle instances.
There are versions of methods for writing asynchronously to a file.
Can I assume that Vert.x handles the synchronisation between the writes, so that they don't interfere when using those versions of the methods marked as "async"?
There seems to be a rule that one can rely on Vert.x providing all isolation between concurrent processing out of the box. But is that true in the case of file write access?
Could you please include a code snippet in the answer that shows how to open and write to one file from many verticle instances with the finest possible granularity, e.g. for logging requests.
I wouldn't recommend writing to a single file with many different writers. Regarding concurrent logging, I would stick to the Single Writer principle.
Create a verticle which subscribes to the event bus and listens for messages to be logged. Let's call this verticle Logger; it listens to system.logger.
EventBus eb = vertx.eventBus();
eb.consumer("system.logger", message -> {
    // write to file
});
Verticles which would like to log something need to send a message to the Logger verticle:
eventBus.send("system.logger", "foobar");
Appending to an existing file works something like this (didn't test):
vertx.fileSystem().open("file.log", new OpenOptions().setCreate(true).setAppend(true), result -> {
    if (result.succeeded()) {
        AsyncFile file = result.result();
        Buffer buff = Buffer.buffer(message); // message from the consumer above
        file.exceptionHandler(t -> System.err.println("write failed: " + t));
        // the file was opened with setAppend(true), so writes go to the end of the file
        file.write(buff);
    } else {
        System.err.println("open file failed " + result.cause());
    }
});
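Putting the pieces together, an untested sketch of such a single-writer Logger verticle (assuming Vert.x 3.x APIs):

public class LoggerVerticle extends AbstractVerticle {

    @Override
    public void start(Future<Void> startFuture) {
        vertx.fileSystem().open("file.log",
                new OpenOptions().setCreate(true).setAppend(true), open -> {
            if (open.succeeded()) {
                AsyncFile logFile = open.result();
                // single writer: all log messages funnel through this one consumer
                vertx.eventBus().consumer("system.logger",
                        message -> logFile.write(Buffer.buffer(message.body() + "\n")));
                startFuture.complete();
            } else {
                startFuture.fail(open.cause());
            }
        });
    }
}

Deploy it once with vertx.deployVerticle(new LoggerVerticle()), and any verticle instance can then log via vertx.eventBus().send("system.logger", "foobar").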

Using Spring @Scheduled and @Async together

Here is my use case.
A legacy system updates a database queue table QUEUE.
I want a scheduled recurring job that
- checks the contents of QUEUE
- if there are rows in the table it locks the row and does some work
- deletes the row in QUEUE
If the previous job is still running, then a new thread will be created to do the work. I want to configure the maximum number of concurrent threads.
I am using Spring 3, and my current solution is the following (using a fixedRate of 1 millisecond to get the threads to run basically continuously):
@Scheduled(fixedRate = 1)
@Async
public void doSchedule() throws InterruptedException {
    log.debug("Start schedule");
    publishWorker.start();
    log.debug("End schedule");
}

<task:executor id="workerExecutor" pool-size="4" />
This created 4 threads straight off, and the threads correctly shared the workload from the queue. However, I seem to be getting a memory leak when the threads take a long time to complete.
java.util.concurrent.ThreadPoolExecutor # 0xe097b8f0 | 80 | 373,410,496 | 89.74%
|- java.util.concurrent.LinkedBlockingQueue # 0xe097b940 | 48 | 373,410,136 | 89.74%
| |- java.util.concurrent.LinkedBlockingQueue$Node # 0xe25c9d68
So
1: Should I be using #Async and #Scheduled together?
2: If not then how else can I use spring to achieve my requirements?
3: How can I create the new threads only when the other threads are busy?
Thanks all!
EDIT: I think the queue of jobs was getting infinitely long... Now using
<task:executor id="workerExecutor"
pool-size="1-4"
queue-capacity="10" rejection-policy="DISCARD" />
Will report back with results
You can try:
- Run a scheduler with a one-second delay, which will lock and fetch all QUEUE records that weren't locked so far.
- For each record, call an @Async method, which will process that record and delete it.
- The executor's rejection policy should be ABORT, so that the scheduler can unlock the QUEUEs that weren't handed out for processing yet. That way the scheduler can try processing those QUEUEs again in the next run.
Of course, you'll have to handle the scenario where the scheduler has locked a QUEUE but the handler didn't finish processing it for whatever reason.
Pseudo code:
public class QueueScheduler {

    @Autowired
    private QueueHandler queueHandler;

    @Scheduled(fixedDelay = 1000)
    public void doSchedule() throws InterruptedException {
        log.debug("Start schedule");
        List<Long> queueIds = lockAndFetchAllUnlockedQueues();
        for (long id : queueIds)
            queueHandler.process(id);
        log.debug("End schedule");
    }
}

public class QueueHandler {

    @Async
    public void process(long queueId) {
        // process the QUEUE & delete it from DB
    }
}

<task:executor id="workerExecutor" pool-size="1-4" queue-capacity="10"
    rejection-policy="ABORT"/>
// using a fixedRate of 1 millisecond to get the threads to run basically continuously
@Scheduled(fixedRate = 1)

When you use @Scheduled, a new thread will be created and will invoke the doSchedule method at the specified fixedRate of 1 millisecond. When you run your app you can already see 4 threads competing for the QUEUE table and possibly a deadlock.
Investigate whether there is a deadlock by taking a thread dump:
http://helpx.adobe.com/cq/kb/TakeThreadDump.html
The @Async annotation will not be of any use here.
A better way to implement this is to make your class a thread by implementing Runnable and handing it to a TaskExecutor with the required number of threads (a rough sketch follows below).
Using Spring threading and TaskExecutor, how do I know when a thread is finished?
Also check your design; it doesn't seem to handle the synchronization properly. If a previous job is running and holding a lock on the row, the next job you create will still see that row and will wait to acquire a lock on that particular row.
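A rough sketch of the Runnable-based approach (names are illustrative; lockAndFetchAllUnlockedQueues is the same pseudo method as above):

public class QueueWorker implements Runnable {

    private final long queueId;

    public QueueWorker(long queueId) {
        this.queueId = queueId;
    }

    @Override
    public void run() {
        // lock, process and delete the QUEUE row identified by queueId
    }
}

// in the scheduling bean, hand each fetched row to the executor
@Autowired
private TaskExecutor workerExecutor; // e.g. the <task:executor id="workerExecutor" pool-size="4"/> bean

@Scheduled(fixedDelay = 1000)
public void doSchedule() {
    for (long id : lockAndFetchAllUnlockedQueues()) {
        workerExecutor.execute(new QueueWorker(id));
    }
}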

Issue or confusion with JMS/spring/AMQ not processing messages asynchronously

We have a situation where we set up a component to run batch jobs using Spring Batch remotely. We send a JMS message with the job XML path, name, parameters, etc., and we wait on the calling batch client for a response from the server.
The server reads the queue and calls the appropriate method to run the job and return the result, which our messaging framework does by:
this.jmsTemplate.send(queueName, messageCreator);
this.LOGGER.debug("Message sent to '" + queueName + "'");
try {
    final Destination replyTo = messageCreator.getReplyTo();
    final String correlationId = messageCreator.getMessageId();
    this.LOGGER.debug("Waiting for the response '" + correlationId + "' back on '" + replyTo + "' ...");
    final BytesMessage message = (BytesMessage) this.jmsTemplate.receiveSelected(replyTo, "JMSCorrelationID='"
            + correlationId + "'");
    this.LOGGER.debug("Response received");
Ideally, we want to be able to call our runJobSync method twice and have two jobs operate simultaneously. We have a unit test that does something similar, without jobs. I realize this code isn't great, but here it is:
final List result = Collections.synchronizedList(new ArrayList());
Thread thread1 = new Thread(new Runnable() {
    @Override
    public void run() {
        client.pingWithDelaySync(1000);
        result.add(Thread.currentThread().getName());
    }
}, "thread1");
Thread thread2 = new Thread(new Runnable() {
    @Override
    public void run() {
        client.pingWithDelaySync(500);
        result.add(Thread.currentThread().getName());
    }
}, "thread2");
thread1.start();
Thread.sleep(250);
thread2.start();
thread1.join();
thread2.join();
Assert.assertEquals("both thread finished", 2, result.size());
Assert.assertEquals("thread2 finished first", "thread2", result.get(0));
Assert.assertEquals("thread1 finished second", "thread1", result.get(1));
When we run that test, thread 2 completes first since it just has a 500 millisecond wait, while thread 1 does a 1 second wait:
Thread.sleep(delayInMs);
return result;
That works great.
When we run two remote jobs in the wild, one which takes about 50 seconds to complete and one which is designed to fail immediately and return, this does not happen.
Start the 50-second job, then immediately start the instant-fail job. The client prints that we sent a message requesting that the job run; the server prints that it received the 50-second request, but it waits until that 50-second job is completed before handling the second message at all, even though we use a ThreadPoolExecutor.
We are running transactional with Auto acknowledge.
Doing some remote debugging, the Consumer from AbstractPollingMessageListenerContainer shows no unhandled messages (so consumer.receive() obviously just returns null over and over). The web GUI for the AMQ broker shows 2 enqueued, 1 dequeued, 1 dispatched, and 1 in the dispatched queue. This suggests to me that something is preventing AMQ from letting the consumer "have" the second message. (The prefetch is 1000, btw.)
This shows as the only consumer for the particular queue.
Myself and a few other developers have poked around for the last few days and are pretty much getting nowhere. Any suggestions on either what we have misconfigured (if this is expected behavior), or what could be broken here?
Does the method that is being remotely called matter at all? Currently the job handler method uses an executor to run the job in a different thread and does a future.get() (the extra thread is for reasons related to logging).
Any help is greatly appreciated
Not sure I follow completely, but off the top, you should try the following (a rough sketch follows below):
- set the concurrentConsumers/maxConcurrentConsumers greater than the default (1) on the MessageListenerContainer
- set the prefetch to 0 to better promote balancing messages between consumers, etc.
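For example, a hedged sketch in Java config (the bean and queue names are made up; the prefetch is an ActiveMQ connection URL option):

@Bean
public DefaultMessageListenerContainer listenerContainer(MessageListener jobRequestListener) {
    // prefetch 0: the broker dispatches a message only to a consumer that asks for one
    ActiveMQConnectionFactory cf =
            new ActiveMQConnectionFactory("tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=0");

    DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
    container.setConnectionFactory(cf);
    container.setDestinationName("job.requests"); // hypothetical queue name
    container.setMessageListener(jobRequestListener);
    container.setConcurrentConsumers(2);    // more than the default of 1
    container.setMaxConcurrentConsumers(5); // allow scaling up under load
    container.setSessionTransacted(true);   // matches "transactional with Auto acknowledge"
    return container;
}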
