Spring Batch - retrieve exceptions with async job

I'm configuring a (long-running) job in Spring Batch with an async JobLauncher, and I have two REST endpoints:
/start --> will start the job asynchronously and return the job_execution_id to the client
/status/{job_execution_id} --> will return the status of the job execution based on data stored in JobExecutionContext
In the /status endpoint, I would like to inform the client about any exceptions that occurred during processing.
However, I'm not able to retrieve them the way I did with the sync version of the same job:
jobExecution.getAllFailureExceptions() --> empty list
stepExecution.getFailureExceptions() --> empty list
Is there a way to tell Spring Batch to store the exception stacktrace (or at least the exception message), so I can retrieve it later?
Thanks
Giulio

Failure exceptions are added when the job execution finishes (more precisely, right before the job is about to complete), so they are not available while the job is running. That's why you can't get them if you call the /status endpoint while the job is still running asynchronously in the background.
The same applies to step failure exceptions, but those should be available as soon as the step finishes (while any subsequent steps, and the surrounding job, are still running).
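As far as I know, failure exceptions are only held on the in-memory JobExecution/StepExecution objects and are not written to the job repository tables, so a /status endpoint that reloads the execution (for example through a JobExplorer) may not see them at all. One way to get something you can poll is to copy a short description of each failure into the step's ExecutionContext from a StepExecutionListener's afterStep method, which, as far as I know, runs after the step's failure exceptions have been recorded. The Spring wiring is an assumption left to prose; the block below is only a plain-Java sketch of the summarizing part, and the class name `FailureSummary` and the length cap are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Builds a compact, storable description of a failure and its cause chain. */
public final class FailureSummary {

    private FailureSummary() {}

    /**
     * Returns "Type: message" for the throwable and each of its causes,
     * joined with " <- caused by: " and cut to at most maxLength characters.
     */
    public static String describe(Throwable t, int maxLength) {
        StringBuilder sb = new StringBuilder();
        // Identity-based set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (sb.length() > 0) {
                sb.append(" <- caused by: ");
            }
            sb.append(cur.getClass().getName());
            if (cur.getMessage() != null) {
                sb.append(": ").append(cur.getMessage());
            }
        }
        return sb.length() <= maxLength ? sb.toString() : sb.substring(0, maxLength);
    }

    /** Describes every throwable in the list, e.g. the result of stepExecution.getFailureExceptions(). */
    public static List<String> describeAll(List<Throwable> failures, int maxLength) {
        List<String> out = new ArrayList<>();
        for (Throwable f : failures) {
            out.add(describe(f, maxLength));
        }
        return out;
    }
}
```

Inside a listener you could then call something like `stepExecution.getExecutionContext().putString("failures", String.join("\n", FailureSummary.describeAll(stepExecution.getFailureExceptions(), 1000)))` and have /status read that (hypothetical) key from the step executions of the job execution.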

Related

Restarting Partition Step in Spring Batch

Our application is a Spring Batch application running in OpenShift. It calls another service via REST to fetch records from a database. Both use an nginx sidecar for handling traffic. Both sidecars restarted for some reason and the Spring Batch job terminated suddenly. I already implemented a retry mechanism using @Retryable, but the logic never even reached the retry part. The only log I found in the application is given below:
"Encountered an error executing step myPartitionStep in job myJob","level":"ERROR","thread":"main","logClass":"o.s.batch.core.step.AbstractStep","logMethod":"execute","stack_trace":"o.s.b.core.JobExecutionException: Partition handler returned an unsuccessful step
o.s.b.c.p.support.PartitionStep.doExecute(PartitionStep.java:112)
o.s.batch.core.step.AbstractStep.execute(AbstractStep.java:208)
o.s.b.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:152)
o.s.b.c.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:68)
o.s.b.c.j.f.s.state.StepState.handle(StepState.java:68)
o.s.b.c.j.f.support.SimpleFlow.resume(SimpleFlow.java:169)
o.s.b.c.j.f.support.SimpleFlow.start(SimpleFlow.java:144)
o.s.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:137)
o.s.batch.core.job.AbstractJob.execute(AbstractJob.java:320)
o.s.b.c.l.s.SimpleJobLauncher$1.run(SimpleJobLauncher.java:149)
o.s.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
o.s.b.c.l.s.SimpleJobLauncher.run(SimpleJobLauncher.java:140)
j.i.r.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java)
j.i.r.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
j.i.r.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:566)
o.s.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
o.s.a.f.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
o.s.a.f.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
o.s.b.c.c.a.SimpleBatchConfiguration$PassthruAdvice.invoke(SimpleBatchConfiguration.java:128)
... 13 frames truncated\n"
I am not able to pinpoint the exact reason for this error. It stopped at the partition step, which uses an ItemReader to call another service and fetch the records, and a FlatFileItemWriter to write them. We cannot afford to have duplicates in our file. Is it possible to restart the app exactly where it stopped without creating duplicates?
The stacktrace you shared is truncated, so it is not possible to see the root cause from what you shared.
Spring Batch supports restarting a failed partitioned step, as long as you use a persistent job repository. You need to restart the same job instance, i.e. use the same identifying job parameters that you used in your first (failed) run. Only failed partitions will be rerun, and each failed partition will resume from where it left off.
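To illustrate why resuming does not duplicate lines, here is a toy, in-memory model (plain Java, not Spring Batch code) of what a restartable reader/writer pair does: at every chunk commit, both the number of items read and the writer's output position are saved; after a crash, the reader skips the committed items and the writer truncates anything written after the last commit before continuing. As far as I know, FlatFileItemWriter works along these lines when saveState is left at its default of true. The model assumes your custom REST-calling reader also saves its position (for example by implementing ItemStream); if it does not, a restarted partition would read from the beginning. All names below are hypothetical.

```java
import java.util.List;

/** Toy model of chunk-oriented restart: committed state vs. uncommitted output. */
public final class RestartModel {

    /** Stand-in for the step ExecutionContext persisted at each commit. */
    public static final class SavedState {
        public int itemsRead;     // like a reader's saved read count
        public int outputLength;  // like a file writer's saved position
    }

    /**
     * Processes items in chunks, appending one line per item to out.
     * If failAfterItem >= 0, "crashes" once that many items were written in
     * this run, before the current chunk is committed. Returns true on completion.
     */
    public static boolean run(List<String> items, int chunkSize, StringBuilder out,
                              SavedState state, int failAfterItem) {
        // Restart: discard output written after the last commit.
        out.setLength(state.outputLength);
        int written = 0;
        for (int start = state.itemsRead; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            for (int i = start; i < end; i++) {
                if (written == failAfterItem) {
                    return false; // crash: the partial chunk is never committed
                }
                out.append(items.get(i)).append('\n');
                written++;
            }
            // Commit: persist reader and writer positions together.
            state.itemsRead = end;
            state.outputLength = out.length();
        }
        return true;
    }
}
```

In the real job each partition has its own StepExecution and therefore its own saved state, which is what lets only the failed partitions be rerun.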

Spring batch limit job execution

My Spring Batch application is running on the PCF platform and is connected to a MySQL database (single instance). It runs fine when only one instance is up and running, but when more than one application instance is running, I get org.springframework.dao.DuplicateKeyException. This might be happening because the same batch job is fired at the same time by each instance and tries to update the batch job instance table with the same job ID. Is there any way to prevent this kind of failure? In other words, I want a solution where only one batch job runs at a time, even if there are multiple instances running.
To me, the fact that DuplicateKeyException is thrown is a good sign, because it achieves exactly what you want: Spring Batch already makes sure that the same job instance is not executed in parallel (i.e. only one server instance executes the job successfully while the others fail to execute it).
So I see no harm in your case. If you don't like this exception, you can catch it and rethrow it as an application-level exception saying something like "The job is being executed by another server instance, so skipping it."
If you really want only one server instance to try to trigger the job while the other servers do not try in the meantime, that is not a Spring Batch problem; it is a question of how you ensure that only one server node fires the request in a distributed environment. If the batch job is fired as a scheduled task using @Scheduled, you can consider using a distributed lock such as ShedLock to make sure that it is executed at most once at the same time, on one node only.
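The idea behind a lock like ShedLock, as I understand it, is one lock row per task name holding a `lock_until` timestamp: a node may run the task only if it can atomically insert that row, or take it over once the previous lease has expired, so a crashed node cannot hold the lock forever. The block below is a plain-Java, in-memory sketch of that idea, with a ConcurrentHashMap standing in for the shared table. It is not ShedLock's API; in a real multi-instance deployment the map would have to be a table in the shared MySQL database, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory model of a lease-based "run at most once at a time" lock. */
public final class LeaseLock {

    /** Lease holder and expiry; stands in for one row of a shared lock table. */
    private static final class Lease {
        final String owner;
        final Instant lockUntil;

        Lease(String owner, Instant lockUntil) {
            this.owner = owner;
            this.lockUntil = lockUntil;
        }
    }

    private final ConcurrentMap<String, Lease> table = new ConcurrentHashMap<>();

    /**
     * Tries to take lockName for node until now + leaseTime.
     * Succeeds only if nobody holds it or the previous lease has expired.
     */
    public boolean tryAcquire(String lockName, String node, Instant now, Duration leaseTime) {
        Lease wanted = new Lease(node, now.plus(leaseTime));
        Lease current = table.putIfAbsent(lockName, wanted);
        if (current == null) {
            return true; // first insert wins, like a unique-key insert
        }
        // Take over an expired lease with a compare-and-set, similar to
        // "UPDATE ... WHERE name = ? AND lock_until <= now" in SQL.
        return !current.lockUntil.isAfter(now) && table.replace(lockName, current, wanted);
    }

    /** Releases the lock only if node still holds it. */
    public void release(String lockName, String node) {
        table.computeIfPresent(lockName, (k, v) -> v.owner.equals(node) ? null : v);
    }
}
```

The lease time should be longer than the longest expected job run; otherwise a second node could take over a lock whose job is still running.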

Correct scope for multi threaded batch jobs in spring

I believe I've got a scoping issue here.
Project explanation:
The goal is to process any incoming file (on disk), including its metadata (which is stored in an SQL database). For this I have two tasklets (FileReservation and FileProcessorTask), which are the steps in the overarching "worker" jobs. They wait for an event to start their work. Several threads run jobs concurrently. The FileReservation tasklet sends the fileId to FileProcessorTask using the job context.
A separate job (which runs indefinitely) checks for new file metadata records in the database and, upon discovering new records, "wakes up" the FileReservationTask tasklets using a published event.
With the current configuration the second step in a job can receive a null message when the FileReservation tasklets are awoken.
If you uncomment the code in BatchConfiguration you'll see that it works when we have separate instances of the beans.
Any pointers are greatly appreciated.
Thanks!
Polling a folder for new files is not suitable for a batch job. So using a Spring Batch job (filePollingJob) is not a good idea IMO.
Polling a folder for new files and running a job for each incoming file is a common use case, which can be implemented using a java.nio.file.WatchService or a FileInboundChannelAdapter from Spring Integration. See How do I kickoff a batch job when input file arrives? for more details.
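For the java.nio.file.WatchService route, a minimal sketch could look like the following. It registers the inbox directory for ENTRY_CREATE events and hands each new file to a callback; in your application that callback would launch one job per file (for example with the file path as a job parameter), which is an assumption about your setup. The class name `InboxWatcher` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Watches one directory and reports every newly created file. */
public final class InboxWatcher implements AutoCloseable {

    private final Path dir;
    private final WatchService watchService;

    public InboxWatcher(Path dir) throws IOException {
        this.dir = dir;
        this.watchService = FileSystems.getDefault().newWatchService();
        dir.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
    }

    /**
     * Waits up to timeoutMillis for events and passes each created file,
     * resolved against the watched directory, to onNewFile.
     * Returns the number of files reported.
     */
    public int pollOnce(long timeoutMillis, Consumer<Path> onNewFile) throws InterruptedException {
        WatchKey key = watchService.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return 0; // nothing happened within the timeout
        }
        int count = 0;
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                continue; // events were lost; a real app might rescan the directory
            }
            Path relative = (Path) event.context();
            onNewFile.accept(dir.resolve(relative));
            count++;
        }
        key.reset(); // re-arm the key so further events are delivered
        return count;
    }

    @Override
    public void close() throws IOException {
        watchService.close();
    }
}
```

A long-running loop such as `while (running) watcher.pollOnce(1000, path -> launchJobFor(path));` would then start one job per file, where `launchJobFor` is a hypothetical method wrapping your JobLauncher call.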

How to poll in Mule until all jobs are completed

I have a requirement to build a workflow that triggers a set of batch jobs by calling an API and then polls another API to check when each batch job is completed. Only when all batch jobs are complete can the workflow move on to the next step. What is the best way to do this?
I had thought about using the Poll component, but I am not sure how I could start and stop the poll, as my experience has been to run the poll at a scheduled time or to poll the external source continually. My current train of thought is to use a flag in the expression box that is set to true once all batch jobs are completed.
The other issue is that the batch job IDs are all in a JSON object. What would be the best way to check off each batch job ID as the API starts returning results showing the batch jobs completing?
I am using Anypoint Studio 6.2 and Mule 3.8.3
Thanks
First, assume your API call triggers 5 batch jobs. As each job completes, you need to update its status (success/failure/in progress) in a DB or in the object store (or in any other retrievable way).
Let's say your jobs take at least 1 hour to complete.
Assume you are updating the status in a DB.
Create a poll flow that checks every hour whether your job statuses in the DB are success/failure, and put the flow in the stopped state (initialState="stopped").
<flow name="Job_status_check_flow" initialState="stopped">
<poll doc:name="Poll">
<fixed-frequency-scheduler frequency="1" timeUnit="HOURS"/>
<logger level="INFO" doc:name="Logger"/>
</poll>
<logger message="poll" level="INFO" doc:name="Logger"/>
<!-- db component or object store here -->
</flow>
Since the flow is in the stopped state, the poll won't trigger until the flow is started, so you have control over it.
The flow is always in the stopped state by default. When you call the API to trigger the 5 batch jobs, start 'Job_status_check_flow' at the same time (you can use a Groovy component to start and stop the flow based on the condition). Please check the links below:
Starting a mule flow programmatically using groovy
auto-starting Mule flow
In this case, the poll flow checks the status every hour until the DB shows all 5 jobs as completed. When that happens, have a Groovy component at the end of 'Job_status_check_flow' put the flow back into the stopped state, so that the poll won't trigger again.
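For the second part of the question (checking off each batch job ID as results come back), one approach is to keep the set of IDs that are still pending and remove each one as its status becomes final; the poll flow can be stopped once the set is empty. The sketch below shows that bookkeeping in plain Java rather than in Mule, so the status names (COMPLETED, FAILED) and the `JobTracker` class are assumptions; in the Mule 3.8 flow the same logic could live in the Groovy component or be replaced by a DB query.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Tracks which triggered batch jobs have not reached a final status yet. */
public final class JobTracker {

    private static final Set<String> FINAL_STATUSES = Set.of("COMPLETED", "FAILED");

    private final Set<String> pending;
    private final Map<String, String> finalStatus = new LinkedHashMap<>();

    public JobTracker(Set<String> jobIds) {
        this.pending = new LinkedHashSet<>(jobIds);
    }

    /** Records the latest status of one job; ignores unknown or already finished IDs. */
    public void update(String jobId, String status) {
        if (pending.contains(jobId) && FINAL_STATUSES.contains(status)) {
            pending.remove(jobId);
            finalStatus.put(jobId, status);
        }
    }

    /** True once every job has completed or failed, i.e. the poll can be stopped. */
    public boolean allDone() {
        return pending.isEmpty();
    }

    public Set<String> pending() {
        return Collections.unmodifiableSet(pending);
    }

    public Map<String, String> finalStatuses() {
        return Collections.unmodifiableMap(finalStatus);
    }
}
```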

Spring Batch: Horizontal scaling of Job Repository

I have read a lot about how to enable parallel processing and chunking of an individual job using the Master/Slave paradigm. Consider an already implemented Spring Batch solution that was intended to run on a standalone server. With minimal refactoring, I would like to enable it to scale horizontally and be more resilient in production. Speed and efficiency are not goals.
http://www.mkyong.com/spring-batch/spring-batch-hello-world-example/
In the following example, a Job Repository is used that connects to and initializes a database schema for the Job Repository. Job initiation requests are fed to a message queue that a single server, with a single Java process, listens on via Spring JMS. When a request arrives, it executes a new Java process that runs the Spring Batch job. If the job has not been started according to the Job Repository, it will begin. If the job had failed, it will pick up where it left off. If the job is in progress, the request is ignored.
The single point of failure is the single server and single listening process for job initiation. I would like to increase resiliency by horizontally scaling identical server instances all competing for who can first grab the job initiation message when it first appears in the queue. That server instance will now attempt to run the job.
I was conceiving that all instances of the JobRepository would share the same schema, so they can all query for when the status is currently in process and decide what they will do. I am unsure though if this schema or JobRepository implementation is meant to be utilized by multiple instances.
Is there a risk that this approach could result in deadlocking the database? There are other constraints that prevent the partitioning features of Spring Batch from working for my application.
I decided to build a prototype to test whether the Spring Batch Job Repository schema and SimpleJobRepository can be used in a load-balanced way, with multiple Spring Batch Java processes running concurrently. I was afraid that deadlocks might occur in the database, leaving all running job processes stuck.
My Test
I started with the mkyong Spring Batch HelloWorld example and made some changes to it where it could be packaged into a Jar that can be executed from the command line. I also removed the initialize database step defined in the database.config file and manually established a local MySQL server with the proper schema elements. I added a Job parameter for time to be the current time in millis so that each job instance would be unique.
Next, I wrote a separate Java main class that used the Apache Commons Exec framework to create 50 subprocesses with no wait between them. Each of these processes also has a Thread.sleep of 1 second within its Processor objects, so that a number of processes all kick off at the same time and all attempt to access the database at the same time.
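A stdlib-only equivalent of that harness, using ProcessBuilder instead of Apache Commons Exec, might look like the sketch below. It starts all processes back to back without waiting, then waits for each one and collects the exit codes. The command (for example `java -jar hello-batch.jar`, where the jar name is hypothetical) is passed in, and appending each process's index as an extra argument is also an assumption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Starts N copies of a command at once and waits for all of them. */
public final class ConcurrentLauncher {

    public static List<Integer> launchAll(List<String> command, int copies)
            throws IOException, InterruptedException {
        List<Process> processes = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            List<String> cmd = new ArrayList<>(command);
            cmd.add(Integer.toString(i)); // hypothetical per-process argument
            processes.add(new ProcessBuilder(cmd)
                    .inheritIO() // show each child's output in this console
                    .start());   // no wait between starts
        }
        List<Integer> exitCodes = new ArrayList<>();
        for (Process p : processes) {
            exitCodes.add(p.waitFor()); // blocks until that child exits
        }
        return exitCodes;
    }
}
```

With 50 copies, a result consisting only of zeros would correspond to the "all 50 processes complete successfully" outcome described in the results.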
Results
After running this test a number of times in a row I see that all 50 Spring batch processes consistently complete successfully and update the same database schema correctly. I don't see any indication that if there were multiple Spring Batch job processes running on multiple servers connecting to the same database that they would interfere with each other on the schema nor do I see any indication that a deadlock could happen at this time.
So it sounds as if load balancing of Spring Batch jobs without the use of advanced Master/Slave and Step Partitioning approaches is a valid use case.
If anybody would like to comment on my test or suggest ways to improve it I would appreciate it.
Here is an excerpt from the Spring Batch docs on how Spring Batch handles database updates for its repository:
Spring Batch employs an optimistic locking strategy when dealing with updates to the database. This means that each time a record is 'touched' (updated) the value in the version column is incremented by one. When the repository goes back to save the value, if the version number has changed it throws an OptimisticLockingFailureException, indicating there has been an error with concurrent access. This check is necessary, since, even though different batch jobs may be running in different machines, they all use the same database tables.
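The version-column check described in that excerpt can be modelled in a few lines of plain Java. In the sketch below (an in-memory stand-in, not Spring Batch's DAO code), an update is applied only if the caller's version still matches the stored one, which is what a statement like `UPDATE ... SET ..., VERSION = VERSION + 1 WHERE ID = ? AND VERSION = ?` achieves in SQL; a stale writer gets an exception instead of silently overwriting. Spring Batch throws OptimisticLockingFailureException in that case; the model uses IllegalStateException so that it needs no Spring dependency, and `VersionedRow` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicReference;

/** In-memory model of a row protected by a version column. */
public final class VersionedRow {

    /** Immutable snapshot of the row: its data plus its version. */
    public static final class Snapshot {
        public final String status;
        public final int version;

        Snapshot(String status, int version) {
            this.status = status;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot> row;

    public VersionedRow(String initialStatus) {
        this.row = new AtomicReference<>(new Snapshot(initialStatus, 0));
    }

    public Snapshot read() {
        return row.get();
    }

    /**
     * Writes newStatus if the row is still at expectedVersion, bumping the version by one.
     * Throws if someone else updated the row first (optimistic locking failure).
     */
    public Snapshot update(int expectedVersion, String newStatus) {
        Snapshot current = row.get();
        Snapshot next = new Snapshot(newStatus, expectedVersion + 1);
        // The compare-and-set also catches a concurrent update between get() and here.
        if (current.version != expectedVersion || !row.compareAndSet(current, next)) {
            throw new IllegalStateException(
                    "version " + expectedVersion + " is stale; row is at " + row.get().version);
        }
        return next;
    }
}
```

Because no lock is held between reading and writing, concurrent processes never wait on each other here; the loser of a race fails fast instead, which fits the absence of deadlocks observed in the test above.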
