Spring batch processor crash in between reading file and causing duplicate key exception - spring

I am currently on batch processor and the issue we are facing currently is that when Batch processor is reading files, it could restart unexpectedly in middle of reading files, this will make the full flow not working because when the BP resumes reading file it may be reading file that is already saved in database and causing duplicate key exception.
So, I have been told to implement the solution where when the BP runs into duplicate key exception, it should read the file from bottom to top and when it runs into duplicate key exception again, it should move to next file.
I am looking for advice/guidance on how to implement/code this solution?

A correctly configured Spring Batch job (persistent job repository + chunk-oriented step) would allow you to restart that kind of failed jobs without any issue.
In fact, the read count will be saved in the database and used in a restart scenario. No data would be written to the database in case of a chunk failure (the transaction will rolled back). So upon restart, the job would resume reading from the last save point and save new data.

Related

Restarting Partition Step in Spring Batch

Our application is a Spring batch running in openshift. The application calls another service via REST to fetch records from database. Both use nginx side car for handling the traffic. Both side cars restarted for some reason and the Spring batch job terminated suddenly .I already implemented retry mechanism using #Retryable but the logic has not even reached the retry part. The only log I found in the application is given below
"Encountered an error executing step myPartitionStep in job myJob","level":"ERROR","thread":"main","logClass":"o.s.batch.core.step.AbstractStep","logMethod":"execute","stack_trace":"o.s.b.core.JobExecutionException: Partition handler returned an unsuccessful step
o.s.b.c.p.support.PartitionStep.doExecute(PartitionStep.java:112)
o.s.batch.core.step.AbstractStep.execute(AbstractStep.java:208)
o.s.b.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:152)
o.s.b.c.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:68)
o.s.b.c.j.f.s.state.StepState.handle(StepState.java:68)
o.s.b.c.j.f.support.SimpleFlow.resume(SimpleFlow.java:169)
o.s.b.c.j.f.support.SimpleFlow.start(SimpleFlow.java:144)
o.s.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:137)
o.s.batch.core.job.AbstractJob.execute(AbstractJob.java:320)
o.s.b.c.l.s.SimpleJobLauncher$1.run(SimpleJobLauncher.java:149)
o.s.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
o.s.b.c.l.s.SimpleJobLauncher.run(SimpleJobLauncher.java:140)
j.i.r.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java)
j.i.r.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
j.i.r.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:566)
o.s.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
o.s.a.f.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
o.s.a.f.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
o.s.b.c.c.a.SimpleBatchConfiguration$PassthruAdvice.invoke(SimpleBatchConfiguration.java:128)
... 13 frames truncated\n"
I am not able to point what exactly is the reason for this error. It stopped at partition step which uses itemReader to call another service and fetche the records,FlatFileItemWriter which writes the records. We cannot afford to have duplicates in our file. Is it possible to restart the app exactly where it stopped without having duplicates?
The stacktrace you shared is truncated, so it is not possible to see the root cause from what you shared.
Spring Batch supports restarting a failed paritioned step, as long as you use a persistent job repository. You need to restart the same job instance, ie use the same job parameter that you used in your first run (that failed). Only failed partitions will be rerun. Any failed partition will resume from where it left off.

Step Failure not reported by Composed Task Runner or reflected in Spring Cloud Dataflow Tables

Currently we are using Spring Cloud Dataflow to run a sequence of apps we have created based on a definition. Each of the apps we have made are spring batch jobs, with individual steps. The current issue we are having is that when one of these steps inside the app's batch job fails, it is reflected as expected in the step_execution, job_execution, and task_execution tables in the scdf database. However, we are not able to rerun any scdf job that has failed in an app from the top scdf level because it seems the row entry in the step_execution table for SCDF's step related to the overall app never propagates to FAILED in the status column, instead always being COMPLETED no matter what happens. Below I have included a picture which gets across what I am saying. test-simple8-test-app is the app we have created, while check-step, sleep-step, and should-error-step are steps inside the job for that app. You can see in the should-error-step that it has FAILED for both ExitCode and Status, while the entry for the app itself has COMPLETED for status and FAILED for ExitCode.
Relevant Table
We have tried altering what we report in the task_execution table since we saw CTR is looking for certain fields there, but it still seems it does not affect the Status column in step_executions. If we manually change the entry in the db to FAILED for that value, it proceeds as we would expect and as is normal for spring batch, in that it resumes the job from that app and re executes it.
Is there a good way to relieve this problem, or is it a problem with the way we are approaching it?
Edit: Added Flow Diagram for better clarity

Spring batch integration file lock access

I have a spring batch integration where multiple servers are polling a single file directory. This causes a problem where a file can be processed up by more than one. I have attempted to add a nio-lock onto the file once a server has got it but this locks the file for processing so it can't read the contents of the file.
Is there a spring batch/integration solution to this problem or is there a way to rename the file as soon as it is picked up by a node?
Consider to use FileSystemPersistentAcceptOnceFileListFilter with the shared MetadataStore: http://docs.spring.io/spring-integration/reference/html/system-management-chapter.html#metadata-store
So, only one instance of your application will be able to pick up a file.
Even if we find a solution for nio-lock, you should understand that lock means "do not touch until freed". Therefore when one instance has done its work, another one is ready to pick up the file. I guess that isn't your goal.

Spring batch - One transaction over whole Job

I am using Spring-Batch to execute a batch that creates some objects in the database, creates a file from these objects and then sends the file to a FTP server.
Thus, I have 2 steps : One that reads conf from DB, insert into the DB and creates the file ; the second sends the file to the FTP server.
The problem is when there is a problem with the FTP server, I can't rollback the transaction (to cancel the new inserts into the DB).
How can I configure my Job to use just one transaction over the different steps?
This is a bad idea due to transactional nature of spring-batch.
IMHO a simple solution should be to mark data saved in step 1 with a token generated when job starts and, if your FTP upload will fail, move to a cleanup step to delete all data with token.
A agree with bellabax: this is a bad idea.
But I wouldn't do a 3rd cleanup step because this step may also fail, letting the transaction not rollbacked.
You could mark the inserted entries with a flag that indicates the entries has not yet been sent to the FTP.
The 3rd step would switch the flag to indicate that these entries has been sent to the FTP.
Then you just need a cron/batch/4th cleaning step/whatever that would remove all entries that haven't been sent to the FTP

VB6 application keeps lock on Access (.mdb) database after creation, causing an error 3028

Our application builds an Access database (.mdb) and then starts a different application with the Shell command which needs Read/Write Access to this very database. The problem is that on some systems our application seems erratically to retain an exclusive lock on the database, preventing the other application from accessing it. Only after closing down the first application can the other application proceed.
The specific Error that is raised is Error 3028, which seems to be specific for DAO 3.51 (Access '97) which we indeed employ. I cannot understand why some systems are affected (and then not consistently) and others never. I thought that it might be a timing issue and built in a Sleep period between building the database and launching the other application, but that does not help.
What is going on?
EDIT:
I now created a workaround by creating the database in a separate file and then copying it. Now the second program should always be able to access it and any remaining lock problems will surface in the first program, which I maintain. I will follow up later when our users have been able to test this.
Are you closing the connection to the DB before passing control to another EXE?
I had a similar issue previously which wasn't quite the same but from what you have described this is the approach I would try:
Before lauching the secondary application with the shell command.
Alongside the sleep period you have already employed you will also need to close the original program which generated the .mdb file.
I achieved this by shelling a windows batch file, and then immediately exiting the original program.
Batch file makeup as follows:
ping -n 5 localhost >NUL
start MSAccess.exe "C:\DB.mdb"
exit
This allows 5 seconds for the mdb file to be freed-up before launching, you could replace my Ms Access call with your secondary program.

Resources