Spring Batch: Restart a job and then start the next job automatically

I need to create a recovery pattern.
In my pattern I can launch a job only in a given time window.
If the job fails, it will only be restarted in the next time window, and when it finishes I would like to start the scheduled job that was planned in advance for that window.
The only difference between the jobs is the time window parameters.
I thought about a JobExecutionDecider in conjunction with a JobExplorer, or overriding a JobLauncher, but both seem too intrusive.
I failed to find an example that matches my needs; any ideas would be most welcome.

Just to recap what was actually done based on the advice provided by incomplete-co.de.
I created a recovery flow similar to the one below. The recovery flow wraps my actual batch job and is responsible only for serving the correct job parameters to the internal job: the initial parameters on the first execution, new parameters on a normal execution, or the old parameters in case the last execution failed.
<batch:job id="recoveryWrapper"
incrementer="wrapperRunIdIncrementer"
restartable="true">
<batch:decision id="recoveryFlowDecision" decider="recoveryFlowDecider">
<batch:next on="FIRST_RUN" to="defineParametersOnFirstRun" />
<batch:next on="RECOVER" to="recover.batchJob " />
<batch:next on="CURRENT" to="current.batchJob " />
</batch:decision>
<batch:step id="defineParametersOnFirstRun" next="current.batchJob">
<batch:tasklet ref="defineParametersOnFirstRunTasklet"/>
</batch:step>
<batch:step id="recover.batchJob " next="current.batchJob">
<batch:job ref="batchJob" job-launcher="jobLauncher"
job-parameters-extractor="jobParametersExtractor" />
</batch:step>
<batch:step id="current.batchJob" >
<batch:job ref="batchJob" job-launcher="jobLauncher"
job-parameters-extractor="jobParametersExtractor" />
</batch:step>
</batch:job>
The heart of the solution is the RecoveryFlowDecider and the JobParametersExtractor, used together with Spring Batch's restart mechanism.
The RecoveryFlowDecider will query the JobExplorer and JobRepository to find out if we had a failure in the last run. It will place the last execution on the execution context of the wrapper to use later in the JobParametersExtractor.
Note the use of a run-id incrementer to allow re-execution of the wrapper job.
@Component
public class RecoveryFlowDecider implements JobExecutionDecider {

    private static final String FIRST_RUN = "FIRST_RUN";
    private static final String CURRENT = "CURRENT";
    private static final String RECOVER = "RECOVER";

    @Autowired
    private JobExplorer jobExplorer;

    @Autowired
    private JobRepository jobRepository;

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // the wrapper is named as the wrapped job + WRAPPER
        String wrapperJobName = jobExecution.getJobInstance().getJobName();
        String jobName = wrapperJobName.substring(0, wrapperJobName.indexOf(EtlConstants.WRAPPER));

        List<JobInstance> instances = jobExplorer.getJobInstances(jobName, 0, 1);
        JobInstance internalJobInstance = instances.size() > 0 ? instances.get(0) : null;
        if (null == internalJobInstance) {
            return new FlowExecutionStatus(FIRST_RUN);
        }

        JobExecution lastExecution = jobRepository.getLastJobExecution(internalJobInstance.getJobName(),
                internalJobInstance.getJobParameters());

        // place the last execution on the wrapper's execution context to use later
        jobExecution.getExecutionContext().put(EtlConstants.LAST_EXECUTION, lastExecution);

        ExitStatus exitStatus = lastExecution.getExitStatus();
        if (ExitStatus.FAILED.equals(exitStatus) || ExitStatus.UNKNOWN.equals(exitStatus)) {
            return new FlowExecutionStatus(RECOVER);
        } else if (ExitStatus.COMPLETED.equals(exitStatus)) {
            return new FlowExecutionStatus(CURRENT);
        }

        // We should never get here unless we have a defect
        throw new RuntimeException("Unexpected batch status: " + exitStatus + " in decider!");
    }
}
Then the JobParametersExtractor will test again for the outcome of the last execution. In case of a failed job it will serve the original parameters used to execute the failed job, triggering Spring Batch's restart mechanism. Otherwise it will create a new set of parameters and the job will execute on its normal course.
@Component
public class JobExecutionWindowParametersExtractor implements JobParametersExtractor {

    @Override
    public JobParameters getJobParameters(Job job, StepExecution stepExecution) {
        // Read the last execution from the wrapping job
        // in order to build the next execution window
        JobExecution lastExecution = (JobExecution) stepExecution.getJobExecution()
                .getExecutionContext().get(EtlConstants.LAST_EXECUTION);
        if (null != lastExecution) {
            if (ExitStatus.FAILED.equals(lastExecution.getExitStatus())) {
                // Serving the failed execution's original parameters triggers a restart
                JobInstance instance = lastExecution.getJobInstance();
                return instance.getJobParameters();
            }
        }
        // No failed execution (or no execution at all): create a new execution window
        return buildJobParameters(lastExecution, stepExecution);
    }

    ...
}
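The body of buildJobParameters is elided above. For illustration only, here is a hedged sketch of what such a window-building method might look like; the windowStart/windowEnd parameter names and the fixed window length are assumptions, not from the original:
private static final long WINDOW_MILLIS = 60 * 60 * 1000L; // assumed one-hour windows

private JobParameters buildJobParameters(JobExecution lastExecution, StepExecution stepExecution) {
    // Continue where the last completed window ended, or start from "now" on the first run.
    Date windowStart = (lastExecution != null)
            ? lastExecution.getJobParameters().getDate("windowEnd")
            : new Date();
    Date windowEnd = new Date(windowStart.getTime() + WINDOW_MILLIS);
    return new JobParametersBuilder()
            .addDate("windowStart", windowStart)
            .addDate("windowEnd", windowEnd)
            .toJobParameters();
}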

Have you considered a JobStep? That is, a step determines whether there are any additional jobs to be run; this value is set into the StepExecutionContext. A JobExecutionDecider then checks for this value; if it exists, it directs the flow to a JobStep which launches the Job.
Here's the doc on it: http://docs.spring.io/spring-batch/reference/htmlsingle/#external-flows
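A hedged Java-config sketch of that suggestion; the step, job, and status names are hypothetical, and jobBuilderFactory/stepBuilderFactory are assumed to be injected fields of a configuration class:
@Bean
public Step runPendingJobStep(Job pendingJob, JobLauncher jobLauncher) {
    return stepBuilderFactory.get("runPendingJobStep")
            .job(pendingJob)          // a JobStep: runs a whole Job as one step
            .launcher(jobLauncher)
            .build();
}

@Bean
public Job wrapperJob(Step checkForPendingJobStep, Step runPendingJobStep,
                      JobExecutionDecider pendingJobDecider) {
    return jobBuilderFactory.get("wrapperJob")
            .start(checkForPendingJobStep)           // sets the flag in the context
            .next(pendingJobDecider)
            .on("PENDING").to(runPendingJobStep)     // a job is pending: launch it
            .from(pendingJobDecider).on("*").end()   // nothing pending: finish
            .end()
            .build();
}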

Is it possible to do it in the opposite manner?
In every time window, submit the job intended for that time window.
However, the very first step of the job should check whether the job in the previous time window completed successfully or not. If it failed before, then submit the previous job and wait for completion before going into its own logic.
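A hedged sketch of that idea: make the first step of every windowed job a tasklet like the one below, which relaunches the previous window's job if it failed. The previousWindowParameters helper is hypothetical; its real logic depends on how the time-window parameters are defined.
import java.util.Date;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;

public class RecoverPreviousWindowTasklet implements Tasklet {

    @Autowired private JobRepository jobRepository;
    @Autowired private JobLauncher jobLauncher;
    @Autowired private Job windowedJob; // obtain lazily (e.g. @Lazy or a JobRegistry) to avoid a cycle

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        JobParameters previous = previousWindowParameters();
        JobExecution last = jobRepository.getLastJobExecution(windowedJob.getName(), previous);
        if (last != null && last.getStatus() == BatchStatus.FAILED) {
            // Same job name + same parameters = a restart of the failed instance.
            // run(..) blocks here as long as the launcher's task executor is synchronous.
            jobLauncher.run(windowedJob, previous);
        }
        return RepeatStatus.FINISHED;
    }

    // Hypothetical helper: rebuild the previous window's parameters; the real
    // logic depends on how the time-window parameters are defined.
    private JobParameters previousWindowParameters() {
        Date previousWindowStart = new Date(System.currentTimeMillis() - 2 * 60 * 60 * 1000L);
        return new JobParametersBuilder()
                .addDate("windowStart", previousWindowStart)
                .toJobParameters();
    }
}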

Related

Spring batch, check jobExecution status after running the job directly?

I have two jobs where I have to launch job2 based on the status of job1.
Is it right to make the following call:
JobExecution jobExecution1 = jobLauncher.run(job1, jobParameters1);
if (jobExecution1.getStatus() == BatchStatus.COMPLETED) {
    JobExecution jobExecution2 = jobLauncher.run(job2, jobParameters2);
}
I have a doubt that the initial jobExecution1 status might not be final when the condition is checked.
If anyone could explain more about this process it would be great.
Thanks in advance.
It depends on the TaskExecutor implementation that you configure in your JobLauncher:
If the task executor is synchronous, then the first job will be run until completion (either success or failure) before launching the second job. In this case, your code is correct.
If the task executor is asynchronous, then the call JobExecution jobExecution1 = jobLauncher.run(job1, jobParameters1); will return immediately and the status of jobExecution1 would be unknown at that point. This means it is incorrect to check the status of job1 right after launching it as a condition to launch job2 (job1 will be run in a separate thread and could still be running at that point).
By default, Spring Batch configures a synchronous task executor in the JobLauncher.
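To make that default explicit, here is a hedged configuration sketch (not from the original question):
@Bean
public JobLauncher jobLauncher(JobRepository jobRepository) throws Exception {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    // SyncTaskExecutor is the default: run(..) blocks until the job ends,
    // so checking jobExecution1.getStatus() right after it is safe.
    jobLauncher.setTaskExecutor(new SyncTaskExecutor());
    // With an async executor the same check would race the running job:
    // jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}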
There are multiple ways. We are using the option below, chaining jobs with a decider. Here is the code snippet; please let me know if you face any issues.
public class MyDecider implements JobExecutionDecider {
    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        String exitCode = new Random().nextFloat() < .70f ? "CORRECT" : "INCORRECT";
        return new FlowExecutionStatus(exitCode);
    }
}

@Bean
public JobExecutionDecider myDecider() {
    return new MyDecider();
}

@Bean
public Job myTestJob() {
    return this.jobBuilderFactory.get("myTestJob")
            .start(step1())
            .next(step2())
            .on("FAILED").to(step3())
            .from(step4())
            .on("*").to(myDecider())
            .on("PRESENT").to(myNextStepOrJob())
            .next(myDecider()).on("CORRECT").to(myNextStepOrJob())
            .from(myDecider()).on("INCORRECT").to(myNextStepOrJob())
            .from(myDecider())
            .on("NOT_PRESENT").to(myNextStepOrJob())
            .end()
            .build();
}

How to launch a Job

I'm currently trying to launch a Job programmatically in my Spring application.
Here is my current code:
public void launchJob(String[] args)
        throws JobParametersInvalidException, JobExecutionAlreadyRunningException,
        JobRestartException, JobInstanceAlreadyCompleteException {
    String jobName = args[0];
    JobLauncher jobLauncher = context.getBean(JobLauncher.class);
    Job job = context.getBean(jobName, Job.class);
    JobParameters jobParameters = getJobParameters(args);
    JobExecution jobExecution = jobLauncher.run(job, jobParameters);
    BatchStatus batchStatus = jobExecution.getStatus();
}
And how I launch it:
String[] args = {"transformXML2CSV", "myFile.xml", "myFile.csv"};
springBatchJobRunner.launchJob(args);
But I have some trouble during the launch. The first is retrieving the app context: my first try was to annotate my class with @Service and use @Autowired like this:
@Autowired
private ApplicationContext context;
But this way my context is always null.
My second try was to get the context this way:
ApplicationContext context = new AnnotationConfigApplicationContext(SpringBatchNoDatabaseConfiguration.class);
The SpringBatchNoDatabaseConfiguration is my @Configuration and the Job is inside it.
Using this, my context is not null, but I have a strange behavior and I can't understand why or how to prevent it:
I run the launchJob function from my processor class; then, when I get the context via AnnotationConfigApplicationContext, my processor class is rebuilt and I get a NullPointerException in it, stopping the whole process.
I really don't understand the last part: why is my processor class relaunched when I get the context?
As indicated in the comments above, you run a parent batch (with spring-batch) which at some point needs your job to process an XML file.
I suggest you keep the same spring-batch context and run the XML-processing job as a nested job of the parent job. You can do that using the JobStep class and Spring Batch's controlling-step-flow feature. As an example, here is what your parent job could look like:
public Job parentJob() {
    JobParameters processXmlFileJobParameters = getJobParameters(
            new String[] {"transformXML2CSV", "myFile.xml", "myFile.csv"});
    return this.jobBuilderFactory.get("parentJob")
            .start(firstStepOfParentJob())
            .on("PROCESS_XML_FILE").to(processXmlFileStep(processXmlFileJobParameters))
            .from(firstStepOfParentJob()).on("*").to(lastStepOfParentJob())
            .from(processXmlFileStep(processXmlFileJobParameters))
            .next(lastStepOfParentJob())
            .end().build();
}

public Step firstStepOfParentJob() {
    return stepBuilderFactory.get("firstStepOfParentJob")
            // ... depends on your parent job's business
            .build();
}

public Step lastStepOfParentJob() {
    return stepBuilderFactory.get("lastStepOfParentJob")
            // ... depends on your parent job's business
            .build();
}

public Step processXmlFileStep(JobParameters processXmlFileJobParameters) {
    return stepBuilderFactory.get("processXmlFileStep")
            .job(processXmlFileJob())
            .parametersExtractor((job, exec) -> processXmlFileJobParameters)
            .build();
}

public Job processXmlFileJob() {
    return jobBuilderFactory.get("processXmlFileJob")
            // ... describe here your xml process job
            .build();
}
It seems a second Spring context is initialized by the instruction new AnnotationConfigApplicationContext(SpringBatchNoDatabaseConfiguration.class), so Spring initializes your beans a second time.
I recommend using Spring Boot to launch your jobs automatically at startup.
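A minimal sketch of such a Boot entry point (a hedged example, assuming a single Job bean in the context; the class name is hypothetical):
@SpringBootApplication
@EnableBatchProcessing
public class BatchApplication {
    public static void main(String[] args) {
        // Spring Boot launches every Job bean at startup by default;
        // key=value program arguments become job parameters.
        System.exit(SpringApplication.exit(SpringApplication.run(BatchApplication.class, args)));
    }
}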
If you don't want to use Spring Boot and prefer to launch your job manually, your main method should look like this:
public static void main(String[] args) throws Exception {
    String[] localArgs = {"transformXML2CSV", "myFile.xml", "myFile.csv"};
    ApplicationContext context = new AnnotationConfigApplicationContext(SpringBatchNoDatabaseConfiguration.class);
    Job job = context.getBean(Job.class);
    JobLauncher jobLauncher = context.getBean(JobLauncher.class);
    JobParameters jobParameters = getJobParameters(localArgs);
    JobExecution jobExecution = jobLauncher.run(job, jobParameters);
    // Other code to do stuff after the job ends...
}
Notice that this code should be completed according to your own needs. The class SpringBatchNoDatabaseConfiguration has to be annotated with @Configuration and @EnableBatchProcessing, and should define the beans of your job.
You can also use the Spring Batch class CommandLineJobRunner with your Java config, as explained here: how to launch Spring Batch Job using CommandLineJobRunner having Java configuration
It saves writing the code above.
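For reference, a hedged sketch of the equivalent programmatic invocation (the key=value parameter names below are assumptions, not from the original post):
public static void main(String[] args) throws Exception {
    // The job "path" is the fully qualified Java config class name, followed
    // by the Job bean name and key=value job parameters.
    CommandLineJobRunner.main(new String[] {
            SpringBatchNoDatabaseConfiguration.class.getName(),
            "transformXML2CSV",
            "input.file=myFile.xml",
            "output.file=myFile.csv"
    });
}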

Spring-batch exiting with Exit Status : COMPLETED before actual job is finished?

In my Spring Batch application I have written a CustomItemWriter which internally writes items to DynamoDB using DynamoDBAsyncClient; this client returns a Future object. I have an input file with millions of records. Since the CustomItemWriter returns the Future objects immediately, my batch job exits within 5 seconds with status COMPLETED, but it actually takes 3-4 minutes to write all the items to the DB. I want the batch job to finish only after all the items are written to the database. How can I do that?
The job is defined as below:
<bean id="report" class="com.solution.model.Report" scope="prototype" />
<batch:job id="job" restartable="true">
<batch:step id="step1">
<batch:tasklet>
<batch:chunk reader="cvsFileItemReader" processor="filterReportProcessor" writer="customItemWriter"
commit-interval="20">
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
<bean id="customItemWriter" class="com.solution.writer.CustomeWriter"></bean>
The CustomItemWriter is defined as below:
public class CustomeWriter implements ItemWriter<Report> {
    public void write(List<? extends Report> items) throws Exception {
        List<Future<PutItemResult>> list = new LinkedList<>();
        AmazonDynamoDBAsyncClient client = new AmazonDynamoDBAsyncClient();
        for (Report report : items) {
            PutItemRequest req = new PutItemRequest();
            req.setTableName("MyTable");
            req.setReturnValues(ReturnValue.ALL_OLD);
            req.addItemEntry("customerId", new AttributeValue(report.getCustomeId()));
            Future<PutItemResult> res = client.putItemAsync(req);
            list.add(res);
        }
        // note: the futures are collected but never awaited -- this is the problem
    }
}
The main class contains:
JobExecution execution = jobLauncher.run(job, new JobParameters());
System.out.println("Exit Status : " + execution.getStatus());
Since the ItemWriter returns Future objects, it doesn't wait for the operation to complete. And from the main, since all the items have been submitted for writing, the batch status shows COMPLETED and the job terminates.
I want this job to terminate only after the actual writes are performed in DynamoDB.
Can we have some other step to wait on this, or is some listener available?
Here is one approach. Since ItemWriter::write doesn't return anything, you can make use of the listener feature.
@Component
@JobScope
public class YourWriteListener implements ItemWriteListener<WhatEverYourTypeIs> {

    @Value("#{jobExecution.executionContext}")
    private ExecutionContext executionContext;

    @Override
    public void afterWrite(final List<? extends WhatEverYourTypeIs> paramList) {
        Future<?> future = (Future<?>) this.executionContext.get("FutureKey");
        // wait till the write is done using the future object
    }

    @Override
    public void beforeWrite(final List<? extends WhatEverYourTypeIs> paramList) {
    }

    @Override
    public void onWriteError(final Exception paramException, final List<? extends WhatEverYourTypeIs> paramList) {
    }
}
In your writer class, everything remains the same except adding the future object to the ExecutionContext.
public class YourItemWriter implements ItemWriter<WhatEverYourTypeIs> {

    @Value("#{jobExecution.executionContext}")
    private ExecutionContext executionContext;

    @Override
    public void write(final List<? extends WhatEverYourTypeIs> youritems) {
        // write to DynamoDB, obtain the Future object, then stash it:
        executionContext.put("FutureKey", future);
    }
}
And you can register the listener in your configuration. Here is the Java code; you need to do the same in your XML:
@Bean
public Step initStep() {
    return this.stepBuilders.get("someStepName").<YourTypeX, YourTypeY>chunk(10)
            .reader(yourReader).processor(yourProcessor)
            .writer(yourWriter).listener(yourWriteListener)
            .build();
}

Controlling Spring-Batch Step Flow in a java-config manner

According to the Spring Batch documentation (http://docs.spring.io/spring-batch/2.2.x/reference/html/configureStep.html#controllingStepFlow), controlling step flow in an XML config file is very simple:
e.g. I could write the following job configuration:
<job id="myJob">
<step id="step1">
<fail on="CUSTOM_EXIT_STATUS"/>
<next on="*" to="step2"/>
</step>
<step id="step2">
<end on="1ST_EXIT_STATUS"/>
<next on="2ND_EXIT_STATUS" to="step10"/>
<next on="*" to="step20"/>
</step>
<step id="step10" next="step11" />
<step id="step11" />
<step id="step20" next="step21" />
<step id="step21" next="step22" />
<step id="step22" />
</job>
Is there a simple way of defining such a job configuration in a java-config manner (using JobBuilderFactory and so on)?
As the documentation also mentions, we can only branch the flow based on the exit status of a step. To be able to report a custom exit status (possibly different from the one automatically mapped from the batch status), we must provide an afterStep method in a StepExecutionListener.
Suppose we have an initial step step1 (an instance of a Tasklet class Step1), and we want to do the following:
if step1 fails (e.g. by throwing a runtime exception), then the entire job should be considered as FAILED.
if step1 completes with an exit-status of COMPLETED-WITH-A, then we want to branch to some step step2a which supposedly handles this specific case.
otherwise, we stay on the main truck of the job and continue with step step2.
Now, provide an afterStep method inside the Step1 class (also implementing StepExecutionListener):
private static class Step1 implements Tasklet, StepExecutionListener
{
    @Override
    public ExitStatus afterStep(StepExecution stepExecution)
    {
        logger.info("*after-step1* step-execution={}", stepExecution.toString());
        // Report a different exit-status in a random manner (just a demo!).
        // Some of these exit statuses (COMPLETED-WITH-A) are Step1-specific
        // and are used to base a conditional flow on them.
        ExitStatus exitStatus = stepExecution.getExitStatus();
        if (!"FAILED".equals(exitStatus.getExitCode())) {
            double r = Math.random();
            if (r < 0.50)
                exitStatus = null; // i.e. COMPLETED
            else
                exitStatus = new ExitStatus(
                        "COMPLETED-WITH-A",
                        "Completed with some special condition A");
        }
        logger.info("*after-step1* reporting exit-status of {}", exitStatus);
        return exitStatus;
    }

    // .... other methods of Step1
}
Finally, build the job flow inside the createJob method of our JobFactory implementation:
@Override
public Job createJob()
{
    // Assume some factories returning instances of our Tasklets
    Step step1 = step1();
    Step step2a = step2a();
    Step step2 = step2();

    JobBuilder jobBuilder = jobBuilderFactory.get(JOB_NAME)
            .incrementer(new RunIdIncrementer())
            .listener(listener); // a job-level listener

    // Build the job flow
    return jobBuilder
            .start(step1)
            .on("FAILED").fail()
            .from(step1)
            .on("COMPLETED-WITH-A").to(step2a)
            .from(step1)
            .next(step2)
            .end()
            .build();
}
Maybe. If your intention is to write something similar to a flow decider "programmatically" (using SB's framework interfaces, I mean), the built-in implementation is there and is enough for most use cases.
As opposed to XML config, you can use Java-config annotations if you are familiar with them; personally I prefer the XML definition, but that's only a personal opinion.

Spring Batch resume after server's failure

I am using Spring Batch to parse files and I have the following scenario:
I am running a job. This job has to parse a given file. For an unexpected reason (say, a power cut) the server fails and I have to restart the machine. Now, after restarting the server, I want to resume the job from the point at which it stopped before the power cut. This means that if the system had read 1,300 rows out of 10,000, it now has to start reading from row 1,301.
How can I achieve this scenario using Spring Batch?
About the configuration: I use spring-integration, which polls a directory for new files. When a file arrives, spring-integration creates the Spring Batch job. Also, spring-batch uses a FlatFileItemReader to parse the file.
Here is the complete solution to restart a job after a JVM crash.
1. Make the job restartable by setting restartable="true":
<job id="jobName" xmlns="http://www.springframework.org/schema/batch" restartable="true">
2. Code to restart the job:
import java.util.Date;
import java.util.List;
import org.apache.commons.collections.CollectionUtils;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.JobOperator;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.beans.factory.annotation.Autowired;
public class RestartJob {

    @Autowired
    private JobExplorer jobExplorer;

    @Autowired
    private JobRepository jobRepository;

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private JobOperator jobOperator;

    public void restart() {
        try {
            // this will get the single latest job instance from the database
            List<JobInstance> jobInstances = jobExplorer.getJobInstances("jobName", 0, 1);
            if (CollectionUtils.isNotEmpty(jobInstances)) {
                JobInstance jobInstance = jobInstances.get(0);
                List<JobExecution> jobExecutions = jobExplorer.getJobExecutions(jobInstance);
                if (CollectionUtils.isNotEmpty(jobExecutions)) {
                    for (JobExecution execution : jobExecutions) {
                        // If the job status is STARTED then update the status to FAILED
                        // and restart the job using JobOperator
                        if (execution.getStatus().equals(BatchStatus.STARTED)) {
                            execution.setEndTime(new Date());
                            execution.setStatus(BatchStatus.FAILED);
                            execution.setExitStatus(ExitStatus.FAILED);
                            jobRepository.update(execution);
                            jobOperator.restart(execution.getId());
                        }
                    }
                }
            }
        } catch (Exception e1) {
            e1.printStackTrace();
        }
    }
}
3. The infrastructure beans:
<bean id="jobRepository" class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean"
      p:dataSource-ref="dataSource" p:transactionManager-ref="transactionManager" p:lobHandler-ref="oracleLobHandler"/>
<bean id="oracleLobHandler" class="org.springframework.jdbc.support.lob.DefaultLobHandler"/>
<bean id="jobExplorer" class="org.springframework.batch.core.explore.support.JobExplorerFactoryBean"
      p:dataSource-ref="dataSource" />
<bean id="jobRegistry" class="org.springframework.batch.core.configuration.support.MapJobRegistry" />
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="jobRepository" />
    <property name="taskExecutor" ref="jobLauncherTaskExecutor" />
</bean>
<task:executor id="jobLauncherTaskExecutor" pool-size="6" rejection-policy="ABORT" />
<bean id="jobOperator" class="org.springframework.batch.core.launch.support.SimpleJobOperator"
      p:jobLauncher-ref="jobLauncher" p:jobExplorer-ref="jobExplorer"
      p:jobRepository-ref="jobRepository" p:jobRegistry-ref="jobRegistry"/>
An updated workaround for Spring Batch 4. It takes the JVM start-up time into account for broken-job detection. Please note that this will not work in a clustered environment where multiple servers start jobs.
@Bean
public ApplicationListener<ContextRefreshedEvent> resumeJobsListener(JobOperator jobOperator, JobRepository jobRepository,
        JobExplorer jobExplorer) {
    // restart jobs that failed due to a JVM crash
    return event -> {
        Date jvmStartTime = new Date(ManagementFactory.getRuntimeMXBean().getStartTime());
        // for each job
        for (String jobName : jobExplorer.getJobNames()) {
            // get the latest job instance
            for (JobInstance instance : jobExplorer.getJobInstances(jobName, 0, 1)) {
                // for each of the executions
                for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
                    if (execution.getStatus().equals(BatchStatus.STARTED) && execution.getCreateTime().before(jvmStartTime)) {
                        // this job is broken and must be restarted
                        execution.setEndTime(new Date());
                        execution.setStatus(BatchStatus.FAILED);
                        execution.setExitStatus(ExitStatus.FAILED);
                        for (StepExecution se : execution.getStepExecutions()) {
                            if (se.getStatus().equals(BatchStatus.STARTED)) {
                                se.setEndTime(new Date());
                                se.setStatus(BatchStatus.FAILED);
                                se.setExitStatus(ExitStatus.FAILED);
                                jobRepository.update(se);
                            }
                        }
                        jobRepository.update(execution);
                        try {
                            jobOperator.restart(execution.getId());
                        }
                        catch (JobExecutionException e) {
                            LOG.warn("Couldn't resume job execution {}", execution, e);
                        }
                    }
                }
            }
        }
    };
}
What I would do in your situation is create a step to log the last processed row in a file, then create a second job that would read this file and start the processing from a specific row number.
So if the job stops for whatever reason, you will be able to run the new job that resumes the processing.
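A hedged sketch of the first half of that idea: a StepExecutionListener that records the read count in a side file, which the second job could use to skip ahead (the file location is hypothetical):
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

public class SaveProgressListener implements StepExecutionListener {

    private static final Path PROGRESS_FILE = Paths.get("progress.txt"); // hypothetical location

    @Override
    public void beforeStep(StepExecution stepExecution) {
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        try {
            // The second job can feed this number into FlatFileItemReader.setLinesToSkip(..)
            Files.write(PROGRESS_FILE,
                    String.valueOf(stepExecution.getReadCount()).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new IllegalStateException("Could not record progress", e);
        }
        return stepExecution.getExitStatus();
    }
}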
You can also write it like below:
@RequestMapping(value = "/updateStatusAndRestart/{jobId}/{stepId}", method = GET)
public ResponseEntity<String> updateBatchStatus(@PathVariable("jobId") Long jobExecutionId,
        @PathVariable("stepId") Long stepExecutionId) throws Exception {

    StepExecution stepExecution = jobExplorer.getStepExecution(jobExecutionId, stepExecutionId);
    stepExecution.setEndTime(new Date(System.currentTimeMillis()));
    stepExecution.setStatus(BatchStatus.FAILED);
    stepExecution.setExitStatus(ExitStatus.FAILED);
    jobRepository.update(stepExecution);

    JobExecution jobExecution = stepExecution.getJobExecution();
    jobExecution.setEndTime(new Date(System.currentTimeMillis()));
    jobExecution.setStatus(BatchStatus.FAILED);
    jobExecution.setExitStatus(ExitStatus.FAILED);
    jobRepository.update(jobExecution);

    jobOperator.restart(jobExecution.getId());
    return new ResponseEntity<String>("<h1> Batch Status Updated !! </h1>", HttpStatus.OK);
}
Here I have used a REST API endpoint to pass the jobExecutionId and stepExecutionId, set the status of both the job execution and the step execution to FAILED, and then restart via the JobOperator.
