We are migrating XML-style Spring Batch jobs to JavaConfig, and it seems it is not possible to use a decider as the first element of the job flow, right after start(). We have to insert a dummy step just to be able to invoke the decider.
However, in XML, this configuration works perfectly fine:
<batch:decision decider="exitsDecider" id="exitsInstance">
<batch:next on="CONTINUE" to="jobStartStep" />
<batch:end on="COMPLETED" exit-code="COMPLETED" />
<batch:fail on="FAILED" exit-code="FAILED" />
</batch:decision>
<batch:step id="jobStartStep" next="validateStep">
<batch:tasklet ref="jobStartTasklet" />
</batch:step>
I don't know whether this is an undocumented limitation or just some corner case. What would be the equivalent in JavaConfig?
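One way around this in JavaConfig is to build a standalone Flow that starts with the decider and hand that flow to the job builder. This is only a sketch, assuming the Spring Batch 3.x builder API (JobBuilderFactory, FlowBuilder); the exitsDecider, jobStartStep and validateStep beans are assumed to correspond to the ones from the XML above:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.JobExecutionDecider;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ExitsJobConfig {

    @Bean
    public Job exitsJob(JobBuilderFactory jobs, JobExecutionDecider exitsDecider,
                        Step jobStartStep, Step validateStep) {
        // A Flow may start with a decider even though JobBuilder.start() cannot.
        Flow flow = new FlowBuilder<SimpleFlow>("exitsFlow")
                .start(exitsDecider)
                    .on("CONTINUE").to(jobStartStep).next(validateStep)
                .from(exitsDecider).on("COMPLETED").end("COMPLETED")
                .from(exitsDecider).on("FAILED").fail()
                .build();
        return jobs.get("exitsJob").start(flow).end().build();
    }
}
```

The trick is that FlowBuilder accepts a decider as its starting state, so no dummy step is needed.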
I have chained a set of Spring Batch jobs in order.
<batch:job id="rootJob">
<batch:step id="rootJob.step1">
<batch:job ref="externalJob1" />
<batch:next on="COMPLETE" to="rootJob.step2" />
</batch:step>
<batch:split id="rootJob.step2">
<batch:flow>
<batch:step id="splitStep1">
<batch:job ref="externalJob2" />
</batch:step>
</batch:flow>
<batch:flow>
<batch:step id="splitStep2">
<batch:job ref="externalJob3" />
</batch:step>
</batch:flow>
<batch:next on="COMPLETE" to="rootJob.step3" />
</batch:split>
<batch:step id="rootJob.step3">
<batch:job ref="externalJob4" />
</batch:step>
</batch:job>
The expected job flow execution:
1. On completion of rootJob.step1, execute rootJob.step2.
2. Execute splitStep1 and splitStep2 in parallel.
3. On completion of rootJob.step2, execute rootJob.step3.
But when deployed and triggered in JBoss, the flow does not execute as expected. The steps are triggered in a single stretch: execution does not wait for the previous step to complete, and each step is launched instantly.
I suspect the TaskExecutor. In standalone mode we do not specify any task executor (it defaults to SyncTaskExecutor) and the job flow works fine. But when deployed in JBoss we use SimpleAsyncTaskExecutor, as using SyncTaskExecutor does not even trigger the job in JBoss.
What am I missing here, or am I doing something wrong? Please suggest.
Resolved the issue.
I had provided the job-launcher="jobLauncher" attribute as below, so separate threads were launched and the jobs were triggered in parallel.
<batch:job ref="externalJob1" job-launcher="jobLauncher" />
Now I have removed the job-launcher reference from all the jobs, and they are triggered as designed.
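For reference, a job launcher only becomes asynchronous when it is given an async task executor. A hedged sketch of an explicitly synchronous launcher bean (the bean names here are illustrative, not from the original config):

```java
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SyncTaskExecutor;

@Configuration
public class LauncherConfig {

    // Synchronous launcher: run() blocks until the launched job finishes,
    // so a <batch:job> step cannot race ahead of its child job.
    @Bean
    public SimpleJobLauncher syncJobLauncher(JobRepository jobRepository) throws Exception {
        SimpleJobLauncher launcher = new SimpleJobLauncher();
        launcher.setJobRepository(jobRepository);
        launcher.setTaskExecutor(new SyncTaskExecutor());
        launcher.afterPropertiesSet();
        return launcher;
    }
}
```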
I have a job configuration where I load a set of files in parallel. After that set is loaded, I want to load a second set of files, also in parallel, but only after the first set is completely loaded, because the second set has referential fields pointing to the first set. I thought I could use a second split but never got it working; the XSD seems to allow defining more than one split, and obviously a single flow does not cover my requirement.
So how do I define two sets of parallel flows that run in sequence with each other?
<job>
<split>
<flow>
<step next="step2"/>
<step id="step2"/>
</flow>
<flow>
<step .../>
</flow>
</split>
<split .../>
</job>
Asoub was right: it is indeed possible. I did a simple config and it worked, so the original issue I hit must have some other cause that creates problems when defining two splits.
Simple config I used:
<batch:job id="batchJob" restartable="true">
<batch:split id="x" next="y">
<batch:flow>
<batch:step id="a">
<batch:tasklet allow-start-if-complete="true">
<batch:chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
</batch:tasklet>
</batch:step>
</batch:flow>
<batch:flow>
<batch:step id="b">
<batch:tasklet allow-start-if-complete="true">
<batch:chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
</batch:tasklet>
</batch:step>
</batch:flow>
</batch:split>
<batch:split id="y" next="e">
<batch:flow>
<batch:step id="c">
<batch:tasklet allow-start-if-complete="true">
<batch:chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
</batch:tasklet>
</batch:step>
</batch:flow>
<batch:flow>
<batch:step id="d">
<batch:tasklet allow-start-if-complete="true">
<batch:chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
</batch:tasklet>
</batch:step>
</batch:flow>
</batch:split>
<batch:step id="e">
<batch:tasklet allow-start-if-complete="true">
<batch:chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
</batch:tasklet>
</batch:step>
</batch:job>
INFO: Job: [FlowJob: [name=batchJob]] launched with the following parameters: [{random=994444}]
Nov 23, 2016 11:33:24 PM org.springframework.batch.core.job.SimpleStepHandler handleStep
INFO: Executing step: [a]
Nov 23, 2016 11:33:24 PM org.springframework.batch.core.job.SimpleStepHandler handleStep
INFO: Executing step: [b]
Nov 23, 2016 11:33:24 PM org.springframework.batch.core.job.SimpleStepHandler handleStep
INFO: Executing step: [c]
Nov 23, 2016 11:33:24 PM org.springframework.batch.core.job.SimpleStepHandler handleStep
INFO: Executing step: [d]
Nov 23, 2016 11:33:24 PM org.springframework.batch.core.job.SimpleStepHandler handleStep
INFO: Executing step: [e]
Nov 23, 2016 11:33:25 PM org.springframework.batch.core.launch.support.SimpleJobLauncher run
INFO: Job: [FlowJob: [name=batchJob]] completed with the following parameters: [{random=994444}] and the following status: [COMPLETED]
As I said in the comments, "So how do I define 2 sets of parallel flows which run in sequence to each?" doesn't make sense per se: you can't start two steps both in parallel and sequentially.
Still, I think you want to "start loading file2 in step2 when file1 in step1 has finished loading", which means that loading a file completes in the middle of a step. I see two ways of solving this.
Let's say this is your configuration:
<job id="job1">
<split id="split1" task-executor="taskExecutor" next="step3">
<flow>
<step id="step1" parent="s1"/>
</flow>
<flow>
<step id="step2" parent="s2"/>
</flow>
</split>
<step id="step3" parent="s4"/> <!-- not important here -->
</job>
<beans:bean id="taskExecutor" class="org.spr...SimpleAsyncTaskExecutor"/>
But this will start both of your steps in parallel immediately. You need to prevent step2 from starting: use a delegate in step2's reader that immediately stops loading file2 and waits for a signal to start reading. Then, somewhere in the code of step1, at the point where you consider loading to be done, you send a signal to step2's delegate reader to start loading file2.
The second solution: create your own SimpleAsyncTaskExecutor, which starts step1 and waits for the signal from step1 before starting step2. It is basically the first solution, but you wait for the signal in your custom executor rather than in a delegate reader. (You can copy the source code of SimpleAsyncTaskExecutor to get an idea.)
This comes at a cost: if step1 never reaches the point where it signals step2 to start loading, your batch will hang forever (an exception during loading could cause this, for example). As for signalling mechanisms, Java has many ways to do this: wait() and notify(), locks, semaphores, and perhaps non-standard libraries.
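A minimal sketch of such a signal, using a CountDownLatch from the JDK (the "load" bodies here are stand-ins for the real readers, not Spring Batch code):

```java
import java.util.concurrent.CountDownLatch;

public class LoadGate {

    // Step2's delegate reader waits on this latch; step1 counts it down
    // once file1 is fully loaded.
    private static final CountDownLatch fileOneLoaded = new CountDownLatch(1);
    private static final StringBuilder order = new StringBuilder();

    static void step1Load() {
        order.append("file1;");    // stand-in for "load file1"
        fileOneLoaded.countDown(); // signal: file1 is done
    }

    static void step2Load() throws InterruptedException {
        fileOneLoaded.await();     // block until step1 signals
        order.append("file2;");    // now it is safe to load file2
    }

    public static void main(String[] args) throws Exception {
        Thread step2 = new Thread(() -> {
            try {
                step2Load();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        step2.start();   // step2 starts first but blocks on the latch
        step1Load();
        step2.join();
        System.out.println(order); // file1;file2;
    }
}
```

The latch also gives you the happens-before guarantee, so step2 sees everything step1 wrote before counting down.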
I don't think there is some kind of parallel step trigger in Spring Batch (but if there is, someone will post it).
I already answered this a while ago in response to your question: you need 2 splits. The first one loads the set of files A, and the second, the set of files B.
<job id="job1">
<split id="splitForSet_A" task-executor="taskExecutor" next="splitForSet_B">
<flow><step id="step1" parent="s1"/></flow>
<flow><step id="step2" parent="s2"/></flow>
<flow><step id="step3" parent="s3"/></flow>
</split>
<split id="splitForSet_B" task-executor="taskExecutor" next="stepWhatever">
<flow><step id="step4" parent="s4"/></flow>
<flow><step id="step5" parent="s5"/></flow>
<flow><step id="step6" parent="s6"/></flow>
</split>
<step id="stepWhatever" parent="sx"/>
</job>
Steps 1, 2 and 3 will run in parallel (and load file set A); then, once they are all finished, the second split (splitForSet_B) starts and runs steps 4, 5 and 6 in parallel. A split is basically a step that contains steps running in parallel.
You just need to specify in each step which file it will be using (so it will differ between the steps in the first split and those in the second split).
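The sequencing a split gives you is the same "run a batch in parallel, wait for all of it, then run the next batch" pattern you could sketch with plain JDK executors (an illustrative stand-in, not Spring Batch itself):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TwoSplits {

    private static final List<String> loaded = new CopyOnWriteArrayList<>();

    // Build one "flow" per file: each task pretends to load its file.
    private static List<Callable<Void>> flows(String set, int files) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 1; i <= files; i++) {
            String file = set + i;
            tasks.add(() -> {
                loaded.add(file);
                return null;
            });
        }
        return tasks;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        pool.invokeAll(flows("A", 3)); // first split: blocks until all of set A is loaded
        pool.invokeAll(flows("B", 3)); // second split starts only after that
        pool.shutdown();

        boolean aBeforeB = loaded.subList(0, 3).stream().allMatch(f -> f.startsWith("A"))
                && loaded.subList(3, 6).stream().allMatch(f -> f.startsWith("B"));
        System.out.println(aBeforeB); // true: every A file loaded before any B file
    }
}
```

invokeAll blocks until every submitted task completes, which is exactly the barrier a split places between itself and its `next` element.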
I'd use two partitioned steps. Each partitioner would be responsible for identifying the files in its respective set for the concurrent child steps to process.
<job>
<step name="loadFirstSet">
<partition partitioner="firstSetPartitioner">
<handler task-executor="asyncTaskExecutor" />
<step name="loadFileFromSetOne">
<tasklet>
<chunk reader="someReader" writer="someWriter" commit-interval="#{jobParameters['commit.interval']}" />
</tasklet>
</step>
</partition>
</step>
<step name="loadSecondSet">
<partition partitioner="secondSetPartitioner">
<handler task-executor="asyncTaskExecutor" />
<step name="loadFileFromSecondSet">
<tasklet>
<chunk reader="someOtherReader" writer="someOtherWriter" commit-interval="#{jobParameters['another.commit.interval']}" />
</tasklet>
</step>
</partition>
</step>
</job>
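A hedged sketch of what such a partitioner could look like (Spring Batch also ships MultiResourcePartitioner, which does essentially this for a Resource array; firstSetFiles() is a hypothetical helper, not from the original config):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class FirstSetPartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        int i = 0;
        // firstSetFiles() is a hypothetical helper returning the set-A file paths.
        for (String file : firstSetFiles()) {
            ExecutionContext context = new ExecutionContext();
            // Each child step can then bind #{stepExecutionContext['fileName']}.
            context.putString("fileName", file);
            partitions.put("partition" + i++, context);
        }
        return partitions;
    }

    private List<String> firstSetFiles() {
        return List.of("set-a/file1.csv", "set-a/file2.csv");
    }
}
```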
We have a small cluster in which the Spring XD distributed runtime architecture is the component used for ETL.
We have scheduled a batch job in it using cron. But when the job fails or is interrupted, we are not notified over a named channel or by email. Also, is it possible to trigger batch jobs by sending messages to named channels?
Currently running in the following environment:
Spring XD Distributed Runtime - 1.2.1
Hadoop Distribution - PHD3.0
Any help on it would be much appreciated.
You may need to write your own step that handles the notification for you. In your batch flow you would then configure some steps that only get executed when other steps fail.
Something like...
<job id="job">
<step id="stepA">
<next on="FAILED" to="NotifyErrorEmail" />
<next on="*" to="stepB" />
</step>
<step id="stepB" ... />
<step id="NotifyErrorEmail" />
</job>
You can read more in the Spring Batch documentation on configuring steps.
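The notification step itself could be a simple custom Tasklet. A hedged sketch (the actual mail sending is only indicated, not implemented):

```java
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class NotifyErrorEmailTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        String jobName = chunkContext.getStepContext().getJobName();
        // Send the notification here, e.g. via JavaMailSender (omitted).
        System.out.println("Job " + jobName + " failed, sending notification email");
        return RepeatStatus.FINISHED;
    }
}
```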
I have a Spring Batch job configured to run asynchronously (it is started from a web service, with annotations marking the methods as asynchronous), and my first step runs successfully.
My issue is that I have multiple steps configured and the flow is determined by the status of each step, i.e. on COMPLETED it moves to step 2, but on a failure it moves to a failure-handling step which sends a mail. When I remove the annotations, the flow works as expected. However, when I use the annotations to run the job asynchronously, whichever step is configured to execute on completion gets executed regardless.
flow configuration sample:
<batch:job id="batchJob" restartable="true">
<batch:step id="step1">
<batch:tasklet ref="task1">
<batch:listeners>
<batch:listener ref="failureHandler" />
</batch:listeners>
</batch:tasklet>
<batch:next on="HAS_ERRORS" to="reportError" />
<batch:end on="*" />
<batch:next on="COMPLETED" to="step2" />
</batch:step>
<batch:step id="step2">
<batch:tasklet ref="task2">
<batch:listeners>
<batch:listener ref="failureHandler" />
</batch:listeners>
</batch:tasklet>
<batch:next on="HAS_ERRORS" to="reportError" />
<batch:end on="*" />
</batch:step>
<batch:step id="reportError">
<batch:tasklet ref="failError">
<batch:listeners>
<batch:listener ref="failureHandler" />
</batch:listeners>
</batch:tasklet>
<batch:end on="*" />
</batch:step>
</batch:job>
I have attempted to return an ExitStatus and a BatchStatus, both of which were ignored.
I have implemented a step execution listener, but I have not yet implemented a messaging mechanism to communicate across steps, and I do not see anything in the step execution context which indicates the outcome of the step.
The question I have is whether there is a method or mechanism that I may have overlooked to get the status of a step once it has completed, or whether a messaging mechanism outside of the batch process is an accepted way of proceeding.
It feels wrong that I cannot see the status of the batch step once it has completed when it is asynchronous (I get the expected results and failures when I remove the @Async annotation). I think there might be something missing in my understanding; I've spent some time looking into it, so a pointer in the right direction would be appreciated.
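For reference, the usual hook for inspecting or overriding a step's outcome is StepExecutionListener.afterStep, whose return value is what the flow transitions are matched against. A hedged sketch of such a listener:

```java
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

public class StatusLoggingListener implements StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // nothing to do before the step
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // The ExitStatus returned here is what the <batch:next on="..."/>
        // transitions are matched against.
        System.out.println("Step " + stepExecution.getStepName()
                + " finished with exit status " + stepExecution.getExitStatus());
        return stepExecution.getExitStatus();
    }
}
```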
I do not have access to this particular code any more, but I believe the issue was caused by the annotations overriding the XML configuration which defined the expected flow.
By overriding it, the actual flow changed from what we expected.