Step Failure not reported by Composed Task Runner or reflected in Spring Cloud Dataflow Tables - spring

Currently we are using Spring Cloud Dataflow to run a sequence of apps we have created based on a definition. Each of the apps we have made are spring batch jobs, with individual steps. The current issue we are having is that when one of these steps inside the app's batch job fails, it is reflected as expected in the step_execution, job_execution, and task_execution tables in the scdf database. However, we are not able to rerun any scdf job that has failed in an app from the top scdf level because it seems the row entry in the step_execution table for SCDF's step related to the overall app never propagates to FAILED in the status column, instead always being COMPLETED no matter what happens. Below I have included a picture which gets across what I am saying. test-simple8-test-app is the app we have created, while check-step, sleep-step, and should-error-step are steps inside the job for that app. You can see in the should-error-step that it has FAILED for both ExitCode and Status, while the entry for the app itself has COMPLETED for status and FAILED for ExitCode.
Relevant Table
We have tried altering what we report in the task_execution table since we saw CTR is looking for certain fields there, but it still seems it does not affect the Status column in step_executions. If we manually change the entry in the db to FAILED for that value, it proceeds as we would expect and as is normal for spring batch, in that it resumes the job from that app and re executes it.
Is there a good way to relieve this problem, or is it a problem with the way we are approaching it?
Edit: Added Flow Diagram for better clarity

Related

Restarting Partition Step in Spring Batch

Our application is a Spring batch running in openshift. The application calls another service via REST to fetch records from database. Both use nginx side car for handling the traffic. Both side cars restarted for some reason and the Spring batch job terminated suddenly .I already implemented retry mechanism using #Retryable but the logic has not even reached the retry part. The only log I found in the application is given below
"Encountered an error executing step myPartitionStep in job myJob","level":"ERROR","thread":"main","logClass":"o.s.batch.core.step.AbstractStep","logMethod":"execute","stack_trace":"o.s.b.core.JobExecutionException: Partition handler returned an unsuccessful step
o.s.b.c.p.support.PartitionStep.doExecute(PartitionStep.java:112)
o.s.batch.core.step.AbstractStep.execute(AbstractStep.java:208)
o.s.b.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:152)
o.s.b.c.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:68)
o.s.b.c.j.f.s.state.StepState.handle(StepState.java:68)
o.s.b.c.j.f.support.SimpleFlow.resume(SimpleFlow.java:169)
o.s.b.c.j.f.support.SimpleFlow.start(SimpleFlow.java:144)
o.s.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:137)
o.s.batch.core.job.AbstractJob.execute(AbstractJob.java:320)
o.s.b.c.l.s.SimpleJobLauncher$1.run(SimpleJobLauncher.java:149)
o.s.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
o.s.b.c.l.s.SimpleJobLauncher.run(SimpleJobLauncher.java:140)
j.i.r.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java)
j.i.r.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
j.i.r.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:566)
o.s.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
o.s.a.f.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
o.s.a.f.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
o.s.b.c.c.a.SimpleBatchConfiguration$PassthruAdvice.invoke(SimpleBatchConfiguration.java:128)
... 13 frames truncated\n"
I am not able to point what exactly is the reason for this error. It stopped at partition step which uses itemReader to call another service and fetche the records,FlatFileItemWriter which writes the records. We cannot afford to have duplicates in our file. Is it possible to restart the app exactly where it stopped without having duplicates?
The stacktrace you shared is truncated, so it is not possible to see the root cause from what you shared.
Spring Batch supports restarting a failed paritioned step, as long as you use a persistent job repository. You need to restart the same job instance, ie use the same job parameter that you used in your first run (that failed). Only failed partitions will be rerun. Any failed partition will resume from where it left off.

Terraform and OCI : "The existing Db System with ID <OCID> has a conflicting state of UPDATING" when creating multiple databases

I am trying to create 30 databases (oci_database_database resource) under 5 existing db_homes. All of these resources are under a single DB System :
When applying my code, a first database is successfully created then when terraform attempts to create the second one I get the following error message : "Error: Service error:IncorrectState. The existing Db System with ID has a conflicting state of UPDATING", which causes the execution to stop.
If I re-apply my code, the second database is created then I get the same previous error when terraform attempts to create the third one.
I am assuming I get this message because terraform starts creating the following database as soon as the first one is created, but the DB System status is not up to date yet (still 'UPDATING' instead of 'AVAILABLE').
A good way for the OCI provider to avoid this issue would be to consider a database creation as completed when the creation is indeed completed AND the associated db home and db system's status are back to 'AVAILABLE'.
Any suggestion on how to adress the issue I am encountering ?
Feel free to ask if you need any additional information.
Thank you.
As mentioned above, it looks like you have opened a ticket regarding this via github. What you are experiencing should not happen, as terraform should retry after seeing the error. As per your github post, the person helping you is in need of your log with timestamp so they can better troubleshoot. At this stage I would recommend following up there and sharing the requested info.

Spring Batch: Horizontal scaling of Job Repository

I read a lot about how to enable parallel processing and chunking of an individual job, using Master/Slave paradigm. Consider an already implemented Spring Batch solution that was intended to run on a standalone server. With minimal refactoring I would like to enable this to horizontally scale and be more resilient in production operation. Speed and efficiency is not a goal.
http://www.mkyong.com/spring-batch/spring-batch-hello-world-example/
In the following example a Job Repository is used that connects to an initializes a database schema for the Job Repository. Job initiation requests are fed to a message queue, that a single server, with a single Java process is listening on via Spring JMS. When encountering this it executes a new Java process that is the Spring Batch job. If the job has not been started according to the Job Repository it will begin. If the job had failed it will pick up where the job left off. If the job is in process it will ignore.
The single point of failure is the single server and single listening process for job initiation. I would like to increase resiliency by horizontally scaling identical server instances all competing for who can first grab the job initiation message when it first appears in the queue. That server instance will now attempt to run the job.
I was conceiving that all instances of the JobRepository would share the same schema, so they can all query for when the status is currently in process and decide what they will do. I am unsure though if this schema or JobRepository implementation is meant to be utilized by multiple instances.
Is there a risk in pursuing this that this approach could result in deadlocking the database? There are other constraints to where the Partition features of Spring Batch will not work for my application.
I decided to build a prototype to test if the condition that the Spring Batch Job Repository schema and SimpleJobRepository can be used in a load balanced way with multiple Spring Batch Java processes running concurrently. I was afraid that deadlock scenarios might have occurred at the database to where all running job processes get stuck.
My Test
I started with the mkyong Spring Batch HelloWorld example and made some changes to it where it could be packaged into a Jar that can be executed from the command line. I also removed the initialize database step defined in the database.config file and manually established a local MySQL server with the proper schema elements. I added a Job parameter for time to be the current time in millis so that each job instance would be unique.
Next, I wrote a separate Java main class that used Apache Commons Exec framework to create 50 sub processes with no wait between them. Each of these processes have a Thread.sleep for 1 second within their Processor objects as well so that a number of processes will all kick off at the same time and all attempt to access the database at the same time.
Results
After running this test a number of times in a row I see that all 50 Spring batch processes consistently complete successfully and update the same database schema correctly. I don't see any indication that if there were multiple Spring Batch job processes running on multiple servers connecting to the same database that they would interfere with each other on the schema nor do I see any indication that a deadlock could happen at this time.
So it sounds as if load balancing of Spring Batch jobs without the use of advanced Master/Slave and Step Partitioning approaches is a valid use case.
If anybody would like to comment on my test or suggest ways to improve it I would appreciate it.
Here is excerpt from
Spring Batch docs on how Spring Batch handles database updates for its repository:
Spring Batch employs an optimistic locking strategy when dealing with updates to the database. This means that each time a record is 'touched' (updated) the value in the version column is incremented by one. When the repository goes back to save the value, if the version number has changed it throws an OptimisticLockingFailureException, indicating there has been an error with concurrent access. This check is necessary, since, even though different batch jobs may be running in different machines, they all use the same database tables.

Expired job executions in Spring Batch

Problem
Users can submit data to generate a report, which triggers a spring-batch job. If the same data is submitted (by the same user or another user) the same report should be generated such that Spring Batch doesn't start a new job, under the premise that the report has already been generated.
To make matters a little more complicated, generated reports expire after 90 days. The idea behind this is that the data gleaned from various web services used to build the report is likely out of date. Therefore, after 90 days the report should be re-generated using new data from those web services.
Questions
When a job has already run, how can I discover the job execution id for that job? This id is used in the URL to uniquely identify a report. JobExplorer is severely limited in querying Spring Batch data.
How can I trigger another instance of the job only after 90 days? The issue is that given duplicate job parameters, a JobInstanceAlreadyCompleteException will be thrown. Must I encode the 90 days has an extra identifying parameter, or is there an easier way?
Clean up old jobs must be done using business methods as well as for expired reports.
After this premise you can try a different path to solve your problem:
Every user launch a different job, with same report properties but
an extra job-parameters to make every jobs different
First step is to check - using business method - if there is a running job for that report ; in this case notify user he have to wait or retry later (use a decider)
Second step is to check if there is a completed - not expired - report using a business method;if so retrieve it and show to user (use a decider as for step 2)
Generate report (delete old, if necessary)
Show report to user.
Of course, generated report metadata tables are different from SB tables and should be accessed using DAO related to domain context (the report in your case).
Can this a valid alternative?

Spring Batch Admin: Schedule new jobs through web GUI

A newbie question on Sprint Batch Admin.
My requirement is that the user should be able to schedule new jobs (passing some parameters for the job functionality) through a web UI. These jobs should be persistent, will be repetitive and could be cancelled or deleted. Also, a report could be generated for last run jobs and to list all the existing jobs with their next run dates.
Perhaps my most important requirement is that this should be possible "on the fly", not requiring redeploying the web-application or a server re-start.
Can this be done using Spring Batch Admin (I see that the guide talks about uploading an XML for adding a job but that seems tedious, if there is an API why shouldn't we be able to create a job on the fly through the Batch Admin Web UI)? Or does JDK Timer or Quartz support it?
Once a job has been created, it can't be deleted, but it can be stopped. Allowing deletion from DB is a risky operation, as Spring Batch might have already been started the job execution, but the DB has not been updated yet. If one removes the job at this moment, you have inconsistency.
Scheduling a new job is described in Launch Job. It is not possible to create new types of jobs, as jobs can generally have complicated configuration which is parsed only once when Spring Context is loaded.
Dynamic deployment (on the fly) of jobs and configurations, without requiring server restart, is a feature we implemented in Trooper Batch Profile - it is not exactly Spring Batch admin but builds on it. You continue to write your jobs using Spring batch, just the container changes for in Trooper you would use its Batch profile runtime. Screen shots and features are here : https://github.com/regunathb/Trooper/wiki/Writing-Batch-jobs-in-Trooper
I think we can deploy the each spring batch job by a SBA. I mean each batch job will be compiled as a war file. We deploy them together in server. In this way, we have the following visiting urls to monitor each jobs:
h t t p://bactchjobserver/job1
h t t p://bactchjobserver/job2
h t t p://bactchjobserver/job3
h t t p://bactchjobserver/job4
But the downside is that each war fill surely contains lib files, which make each war file like 10MB size.
At the same time, I tried to manually add new-job.xml to war-file\WEB-INF\classes\META-INF\spring\batch\jobs, and new-job.jar to war-file\WEB-INF\lib without stopping JBoss. It works. The new job can be showed in SBA UI and runnable.
But obviously this would lead much maintenance and trouble shooting. It is not implementable.

Resources