Add decider as a first step for a Spring Batch job (in JavaConfig style) - spring-boot

We are migrating XML-style Spring Batch jobs to JavaConfig and found that it does not seem possible to use a decider as the first element of the job flow, right after start(). We have to add a dummy step before the decider can be invoked.
In XML, however, this configuration works perfectly fine:
<batch:decision decider="exitsDecider" id="exitsInstance">
<batch:next on="CONTINUE" to="jobStartStep" />
<batch:end on="COMPLETED" exit-code="COMPLETED" />
<batch:fail on="FAILED" exit-code="FAILED" />
</batch:decision>
<batch:step id="jobStartStep" next="validateStep">
<batch:tasklet ref="jobStartTasklet" />
</batch:step>
I don't know whether this is an undocumented feature or just some sort of corner case. What would be the equivalent in Java Config?

Related

Spring Batch Job Chaining execution not waiting for previous job to complete in Jboss

I have chained a set of Spring batch jobs in an order.
<batch:job id="rootJob">
<batch:step id="rootJob.step1">
<batch:job ref="externalJob1"/>
<batch:next on="COMPLETE" to="rootJob.step2"/>
</batch:step>
<batch:split id="rootJob.step2">
<batch:flow>
<batch:step id="splitStep1">
<batch:job ref="externalJob2"/>
</batch:step>
</batch:flow>
<batch:flow>
<batch:step id="splitStep2">
<batch:job ref="externalJob3"/>
</batch:step>
</batch:flow>
<batch:next on="COMPLETE" to="rootJob.step3"/>
</batch:split>
<batch:step id="rootJob.step3">
<batch:job ref="externalJob4"/>
</batch:step>
</batch:job>
The expected job flow is:
1. On completion of rootJob.step1, execute rootJob.step2.
2. Execute splitStep1 and splitStep2 in parallel.
3. On completion of rootJob.step2, execute rootJob.step3.
But when the job is deployed and triggered in JBoss, the flow does not execute as expected. The steps are all triggered in a single stretch: execution does not wait for the previous step to complete, and each step is launched instantly.
I suspect the TaskExecutor. In standalone mode we do not specify any task executor (it defaults to SyncTaskExecutor) and the job flow works fine. But when deployed in JBoss we use SimpleAsyncTaskExecutor, because with SyncTaskExecutor the job is not even triggered in JBoss.
What am I missing, or what am I doing wrong here? Please suggest.
I resolved the issue.
I had set the job-launcher="jobLauncher" attribute as shown below, so separate threads were launched and the jobs were triggered in parallel.
<batch:job ref="externalJob1" job-launcher="jobLauncher"/>
I have now removed the job-launcher reference from all the jobs, and they are triggered as designed.
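The behaviour described in this resolution can be reproduced without Spring at all. The sketch below is plain Java and only an analogy: the class and method names are hypothetical, and it does not show Spring Batch's launcher code. It submits three "steps" through an Executor. With a caller-runs executor each step finishes before the next one is submitted, whereas with a thread pool execute() returns at once, so the submitting thread does not wait for the work it started.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: plain Java, not Spring Batch code.
class LaunchOrderSketch {

    // Submits three "steps" in order and records what happens when.
    static List<String> runSteps(Executor executor) {
        List<String> events = new CopyOnWriteArrayList<>();
        for (String step : Arrays.asList("step1", "step2", "step3")) {
            events.add("submit " + step);
            executor.execute(() -> {
                pause(50); // simulated work
                events.add("finish " + step);
            });
        }
        return events;
    }

    // Same submission loop on a thread pool. The pool is drained afterwards
    // only so that the returned list is complete.
    static List<String> runStepsAsync() {
        ExecutorService pool = Executors.newCachedThreadPool();
        List<String> events = runSteps(pool);
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return events;
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Caller-runs executor: each step finishes before the next is submitted.
        System.out.println(runSteps(Runnable::run));
        // Thread pool: submissions do not wait for earlier steps to finish.
        System.out.println(runStepsAsync());
    }
}
```

In the synchronous case the recorded order is strictly submit/finish per step. In the asynchronous case the order of the finish events depends on thread scheduling.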

Defining 2 splits to run a set of steps in parallel

I have a job configuration where I load a set of files in parallel. Once that set is loaded, I want to load another set of files in parallel, but only after the first set is completely loaded, because the second set has fields that reference the first set. I thought I could use a second split but never got it working. According to the XSD you can define more than one split, and a plain flow obviously does not meet my requirement.
So how do I define two sets of parallel flows that run in sequence?
<job>
<split>
<flow>
<step next="step2"/>
<step id="step2"/>
</flow>
<flow>
<step ...>
</flow>
</split>
<split ../>
Asoub was right: it is indeed possible. I tried a simple config and it worked, so the problem I originally ran into when defining two splits must have some other cause.
The simple config I used:
<batch:job id="batchJob" restartable="true">
<batch:split id="x" next="y">
<batch:flow>
<batch:step id="a">
<batch:tasklet allow-start-if-complete="true">
<batch:chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
</batch:tasklet>
</batch:step>
</batch:flow>
<batch:flow>
<batch:step id="b">
<batch:tasklet allow-start-if-complete="true">
<batch:chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
</batch:tasklet>
</batch:step>
</batch:flow>
</batch:split>
<batch:split id="y" next="e">
<batch:flow>
<batch:step id="c">
<batch:tasklet allow-start-if-complete="true">
<batch:chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
</batch:tasklet>
</batch:step>
</batch:flow>
<batch:flow>
<batch:step id="d">
<batch:tasklet allow-start-if-complete="true">
<batch:chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
</batch:tasklet>
</batch:step>
</batch:flow>
</batch:split>
<batch:step id="e">
<batch:tasklet allow-start-if-complete="true">
<batch:chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
</batch:tasklet>
</batch:step>
</batch:job>
INFO: Job: [FlowJob: [name=batchJob]] launched with the following parameters: [{random=994444}]
Nov 23, 2016 11:33:24 PM org.springframework.batch.core.job.SimpleStepHandler handleStep
INFO: Executing step: [a]
Nov 23, 2016 11:33:24 PM org.springframework.batch.core.job.SimpleStepHandler handleStep
INFO: Executing step: [b]
Nov 23, 2016 11:33:24 PM org.springframework.batch.core.job.SimpleStepHandler handleStep
INFO: Executing step: [c]
Nov 23, 2016 11:33:24 PM org.springframework.batch.core.job.SimpleStepHandler handleStep
INFO: Executing step: [d]
Nov 23, 2016 11:33:24 PM org.springframework.batch.core.job.SimpleStepHandler handleStep
INFO: Executing step: [e]
Nov 23, 2016 11:33:25 PM org.springframework.batch.core.launch.support.SimpleJobLauncher run
INFO: Job: [FlowJob: [name=batchJob]] completed with the following parameters: [{random=994444}] and the following status: [COMPLETED]
As I said in the comments, "So how do I define 2 sets of parallel flows which run in sequence to each?" doesn't make sense per se: you can't start two steps both in parallel and sequentially.
Still, I think you want to "start loading file2 in step2 when file1 in step1 has finished loading", which means the point where a file finishes loading occurs in the middle of a step. I see two ways of solving this.
Let's say this is your configuration:
<job id="job1">
<split id="split1" task-executor="taskExecutor" next="step3">
<flow>
<step id="step1" parent="s1"/>
</flow>
<flow>
<step id="step2" parent="s2"/>
</flow>
</split>
<step id="step3" parent="s4"/> <!-- not important here -->
</job>
<beans:bean id="taskExecutor" class="org.spr...SimpleAsyncTaskExecutor"/>
But this will start both of your steps in parallel immediately. You need to prevent step2 from starting. To do that, use a delegate in step2's reader that holds off loading file2 and waits for a signal before it starts reading. Then, at the point in step1's code where you consider loading to be done, send a signal to step2's delegate reader to start loading file2.
The second solution: create your own SimpleAsyncTaskExecutor that starts step1 and waits for the signal from step1 before starting step2. It's basically the first solution, but you wait for the signal in your custom executor rather than in a delegate reader. (You can copy the source code of SimpleAsyncTaskExecutor to get an idea.)
This comes at a cost: if step1 never reaches the part where it signals step2 to start loading, your batch will hang forever. An exception during loading could cause this. As for signal mechanisms, Java has many ways to do this (wait() and notify(), locks, semaphores, or perhaps a third-party library).
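One way to sketch such a signal with the JDK alone is a CountDownLatch, which is swapped in here as one option among those listed. The names LoadSignal, markLoaded and awaitLoaded are hypothetical and not part of Spring Batch. The timeout is an assumption added to address the hang described above: if step1 never signals, the waiting side gives up instead of blocking forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper shared by step1 and step2's reader; not a Spring Batch class.
class LoadSignal {
    private final CountDownLatch loaded = new CountDownLatch(1);

    // Called by step1 at the point where it considers file1 loaded.
    void markLoaded() {
        loaded.countDown();
    }

    // Called by step2's delegate reader before its first read.
    // Returns false if the signal did not arrive within the timeout.
    boolean awaitLoaded(long timeoutMillis) {
        try {
            return loaded.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class LoadSignalDemo {
    // Simulates step1 on its own thread and step2 on the calling thread.
    static boolean step2MayStart(boolean step1Signals) {
        LoadSignal signal = new LoadSignal();
        Thread step1 = new Thread(() -> {
            if (step1Signals) {
                signal.markLoaded(); // file1 is loaded
            }
            // otherwise step1 ends without signalling, e.g. after an exception
        });
        step1.start();
        return signal.awaitLoaded(500);
    }

    public static void main(String[] args) {
        System.out.println(step2MayStart(true));  // signal arrives
        System.out.println(step2MayStart(false)); // timeout instead of hanging
    }
}
```

A reader that gets false back would typically fail its step rather than start reading, so that the job ends with an error instead of loading file2 against incomplete data.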
I don't think there is any kind of parallel step trigger in Spring Batch (but if there is, I hope someone posts it).
As I already answered while asking about your question, you need two splits: the first one loads file set A, and the second loads file set B.
<job id="job1">
<split id="splitForSet_A" task-executor="taskExecutor" next="splitForSet_B">
<flow><step id="step1" parent="s1"/></flow>
<flow><step id="step2" parent="s2"/></flow>
<flow><step id="step3" parent="s3"/></flow>
</split>
<split id="splitForSet_B" task-executor="taskExecutor" next="stepWhatever">
<flow><step id="step4" parent="s4"/></flow>
<flow><step id="step5" parent="s5"/></flow>
<flow><step id="step6" parent="s6"/></flow>
</split>
<step id="stepWhatever" parent="sx"/>
</job>
Steps 1, 2 and 3 will run in parallel (and load file set A); then, once they are all finished, the second split (splitForSet_B) will start and run steps 4, 5 and 6 in parallel. A split is basically a step that contains steps running in parallel.
You just need to specify in each step which file it will use (so the steps in the first split use different files from the steps in the second split).
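The ordering that two consecutive splits give you can be sketched in plain Java, outside Spring Batch. The class and method names below are hypothetical, and the file names are placeholders. Each phase hands its tasks to a thread pool with invokeAll, which returns only after every task in that phase has finished, so set B cannot start before set A is complete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of "parallel within a set, sequential between sets".
class TwoPhaseLoadSketch {

    static List<String> loadInTwoPhases(List<String> setA, List<String> setB) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            runPhase(pool, setA, log); // blocks until all of set A is loaded
            log.add("--- set A complete ---");
            runPhase(pool, setB, log); // starts only after set A
        } finally {
            pool.shutdown();
        }
        return log;
    }

    private static void runPhase(ExecutorService pool, List<String> files, List<String> log) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String file : files) {
            tasks.add(() -> {
                log.add("loaded " + file); // stand-in for the real load
                return null;
            });
        }
        try {
            pool.invokeAll(tasks); // returns when every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadInTwoPhases(
                Arrays.asList("a1", "a2", "a3"),
                Arrays.asList("b1", "b2", "b3")));
    }
}
```

The order of entries within a phase depends on scheduling, but every set A entry always appears before the marker and every set B entry after it.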
I'd use two partitioned steps. Each partitioner would be responsible for identifying the files in its respective set for the concurrent child steps to process:
<job>
<step name="loadFirstSet">
<partition partitioner="firstSetPartitioner">
<handler task-executor="asyncTaskExecutor" />
<step name="loadFileFromSetOne">
<tasklet>
<chunk reader="someReader" writer="someWriter" commit-interval="#{jobParameters['commit.interval']}" />
</tasklet>
</step>
</partition>
</step>
<step name="loadSecondSet">
<partition partitioner="secondSetPartitioner">
<handler task-executor="asyncTaskExecutor" />
<step name="loadFileFromSecondSet">
<tasklet>
<chunk reader="someOtherReader" writer="someOtherWriter" commit-interval="#{jobParameters['another.commit.interval']}" />
</tasklet>
</step>
</partition>
</step>
</job>

how to read data from multiple tables in db using spring batch

I tried reading data from one table and writing to another table using Spring Batch, but now my requirement is to read data from multiple tables and write to a file. We could achieve this by defining multiple jobs, but I want to do it with a single job, meaning a single reader, a single writer and a single processor.
Please point me to some references for this scenario.
This is not possible with the classes provided by Spring Batch alone, but you can work around it.
Add a step just before the chunk processing: a custom tasklet that assigns a different SQL statement and a different output file, and run the steps in a loop as long as there are SQL statements left to execute.
It might sound difficult, but I have worked on the same situation. Here is an idea of how you can do it:
<flow id="databaseReadWriteJob">
<step id="step1_setReaderWriter">
<tasklet ref="setReaderWriter" />
<next on="FAILED" to="" />
<next on="*" to="dataExtractionStep" />
</step>
<step id="dataExtractionStep">
<tasklet>
<chunk reader="dbReader" writer="flatFileWriter" commit-interval="${commit-interval}" />
</tasklet>
<next on="FAILED" to="" />
<next on="*" to="step3_removeProcessedSql" />
</step>
<step id="step3_removeProcessedSql">
<tasklet ref="removeProcessedSql" />
<next on="NOOP" to="step1_setReaderWriter" />
<next on="*" to="step4_validateNumOfSuccessfulSteps" />
</step>
</flow>
and here is the bean for setReaderWriter
<beans:bean id="setReaderWriter" class="SetReaderWriter">
<beans:property name="reader" ref="dbReader" />
<beans:property name="flatFileWriter" ref="flatFileWriter" />
<beans:property name="fileSqlMap" ref="jobSqlFileMap" />
<beans:property name="fileNameBuilder" ref="localFileNameBuilder" />
<beans:property name="sourceFolder" value="${dataDir}" />
<beans:property name="dateDiff" value="${dateDiff}" />
</beans:bean>
Set anything here that you need to add dynamically to the reader or writer. The fileSqlMap above is a map with the SQL as key and the output file as value.
I hope this helps.
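The control flow of that loop can be sketched in plain Java, leaving Spring Batch out entirely. This is only a model of the step transitions under stated assumptions: the class name is hypothetical, the three steps are reduced to trace entries, and COMPLETED stands in for whatever exit code the real removeProcessedSql tasklet returns once the map is empty. NOOP is the exit code the flow above uses to loop back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the step1 -> extraction -> step3 loop; not Spring Batch code.
class SqlFileLoopSketch {

    // fileSqlMap: SQL statement as key, output file as value.
    static List<String> run(Map<String, String> fileSqlMap) {
        Deque<Map.Entry<String, String>> pending = new ArrayDeque<>(fileSqlMap.entrySet());
        List<String> trace = new ArrayList<>();
        String exit = "NOOP";
        while (!pending.isEmpty() && "NOOP".equals(exit)) {
            // step1: point the shared reader and writer at the next pair
            Map.Entry<String, String> current = pending.peek();
            trace.add("step1_setReaderWriter: " + current.getKey() + " -> " + current.getValue());
            // chunk step: read with that SQL, write to that file
            trace.add("dataExtractionStep");
            // step3: drop the processed pair; NOOP means "loop again"
            pending.remove();
            exit = pending.isEmpty() ? "COMPLETED" : "NOOP";
            trace.add("step3_removeProcessedSql: " + exit);
        }
        return trace;
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("SELECT * FROM table_a", "a.csv"); // placeholder SQL and file names
        map.put("SELECT * FROM table_b", "b.csv");
        run(map).forEach(System.out::println);
    }
}
```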

How to use chunk processing with Spring Batch?

I'm using Spring Batch for the first time. I tried some examples and read through the documentation, but I still have questions:
Can I skip one phase in chunk-oriented processing? For example: I fetch data from the database, process it and determine that I need more. Can I skip the write phase and execute the next step's read phase? Should I use a Tasklet instead?
How do I implement a conditional flow?
Thank you very much,
Florian
Skip an item simply by throwing an exception that has been declared as a "skippable exception". You can do it as follows:
<step id="step1">
<tasklet>
<chunk reader="reader" writer="writer"
commit-interval="10" skip-limit="10">
<skippable-exception-classes>
<include class="com.myapp.batch.MyException"/>
</skippable-exception-classes>
</chunk>
</tasklet>
</step>
Conditional flow can easily be implemented by deciding on the ExitStatus of a step execution:
<job id="job">
<step id="step1" parent="s1">
<next on="*" to="stepB" />
<next on="FAILED" to="stepC" />
</step>
<step id="stepB" parent="s2" next="stepC" />
<step id="stepC" parent="s3" />
</job>
Read the documentation to gain deeper knowledge on these topics: http://docs.spring.io/spring-batch/reference/html/configureStep.html

Asynchronous Spring Batch Job multiple steps flow control

I have a Spring Batch job configured to run asynchronously (it is started from a web service, and annotations configure the methods to be asynchronous). My first step runs successfully.
My issue is that I have multiple steps configured and the flow is determined by the status of each step: on completion it moves to step 2, but on a failure it moves to a failure-handling step which sends a mail. When I remove the annotations, the flow appears to work as expected. However, when I use the annotations to run the job asynchronously, whichever step is configured to execute on completion gets executed.
Flow configuration sample:
<batch:job id="batchJob" restartable="true">
<batch:step id="step1">
<batch:tasklet ref="task1">
<batch:listeners>
<batch:listener ref="failureHandler" />
</batch:listeners>
</batch:tasklet>
<batch:next on="HAS_ERRORS" to="reportError" />
<batch:end on="*" />
<batch:next on="COMPLETED" to="step2" />
</batch:step>
<batch:step id="step2">
<batch:tasklet ref="task2">
<batch:listeners>
<batch:listener ref="failureHandler" />
</batch:listeners>
</batch:tasklet>
<batch:next on="HAS_ERRORS" to="reportError" />
<batch:end on="*" />
</batch:step>
<batch:step id="reportError">
<batch:tasklet ref="failError">
<batch:listeners>
<batch:listener ref="failureHandler" />
</batch:listeners>
</batch:tasklet>
<batch:end on="*" />
</batch:step>
</batch:job>
I have attempted to return an ExitStatus and a BatchStatus, but both were ignored.
I have implemented a step execution listener, but I have not yet implemented a messaging mechanism to communicate across steps, and I do not see anything in the step execution context that indicates the outcome of the step.
My question is whether there is a method or mechanism I may have overlooked for getting the status of a step once it has completed, or whether a messaging mechanism outside the batch process is an accepted way of proceeding.
It feels wrong that I cannot see the status of the batch step once it has completed when it is asynchronous (I get the expected results/failures when I remove the @Async annotation). I think there might be something missing in my understanding. I've spent some time looking into it, so a pointer in the right direction would be appreciated.
I do not have access to this particular code any more.
I believe the issue is caused by the annotations overriding the XML configuration that defined the expected flow.
Overriding it changes the flow from the one we expected.

Resources