I am reading the Spring Batch documentation and am stuck on the following part, where this example is provided:
@Bean
public Job job() {
    Flow flow1 = new FlowBuilder<SimpleFlow>("flow1")
            .start(step1())
            .next(step2())
            .build();
    Flow flow2 = new FlowBuilder<SimpleFlow>("flow2")
            .start(step3())
            .build();
    return this.jobBuilderFactory.get("job")
            .start(flow1)
            .split(new SimpleAsyncTaskExecutor())
            .add(flow2)
            .next(step4())
            .end()
            .build();
}
But it is not explained what is happening. As far as I understand, flow1 and flow2 are executed in parallel, but what about step4?
step4() is executed sequentially, after flow1 and flow2 have returned.
Look at the FlowBuilder.SplitBuilder.add() Javadoc:
public FlowBuilder<Q> add(Flow... flows)
Add flows to the split, in addition to the current state already present in the parent builder.
Parameters:
flows - more flows to add to the split
Returns: the parent builder
It returns the parent builder, not the current SplitBuilder object. So step4() is not included in the split and is therefore executed sequentially.
To run the three flows in parallel, wrap step4() in a flow of its own, since add() only accepts Flow instances:

Flow flow3 = new FlowBuilder<SimpleFlow>("flow3")
        .start(step4())
        .build();

return this.jobBuilderFactory.get("job")
        .start(flow1)
        .split(new SimpleAsyncTaskExecutor())
        .add(flow2, flow3)
        .end()
        .build();
I defined a very simple Spring Batch job like the one below. I want to change its registered name using a received parameter (which is added to the job's Spring Batch parameter list as jobName).
@Bean
@JobScope // this doesn't work, throws exception 'No context holder available for job scope'
Job genericJob(JobNotifierListener listener,
        Step genericStep1, Step genericStep2,
        @Value("#{jobParameters['jobName']}") String jobName) {
    return jobBuilderFactory.get(jobName + "GenericJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(genericStep1)
            .next(genericStep2)
            .build();
}
How can I configure the job so that its name is dynamically set from the input batch parameter jobName? (Adding @JobScope to access the Spring Batch context doesn't work; it throws the error above.)
The job name should not be a job parameter. Job parameters are designed for "business" runtime parameters, not technical configuration parameters. An application property or a system property is better suited for your case:
@Bean
// @JobScope // no need for this
Job genericJob(JobNotifierListener listener,
        Step genericStep1, Step genericStep2,
        @Value("#{systemProperties['jobName']}") String jobName) {
    return jobBuilderFactory.get(jobName + "GenericJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(genericStep1)
            .next(genericStep2)
            .build();
}
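For example, the property can be supplied with -DjobName=... when starting the JVM, or set programmatically before the Spring context starts. A minimal sketch, assuming a Spring Boot entry point (the class name BatchApplication and the value "nightly" are hypothetical):

public static void main(String[] args) {
    System.setProperty("jobName", "nightly"); // normally passed as -DjobName=nightly on the command line
    SpringApplication.run(BatchApplication.class, args);
}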
I have created a Spring Batch app and I'm struggling to implement a simple flow with a condition: after step1, a decider decides whether to run steps 2 through 5 or to end the job right away. I tried to achieve this with the following code:
@Bean
public Job job(JobCompletionNotificationListener listener) {
    return jobs.get(Constants.JOB_SIARD_FILES_PROCESSOR + new Date().getTime())
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(step1())
            .next(decider()).on("yes").to(step2345Flow())
            .end()
            .build();
}

@Bean
public Flow step2345Flow() {
    return new FlowBuilder<SimpleFlow>("yes_flow")
            .start(step2())
            .next(step3())
            .next(step4())
            .next(step5())
            .build();
}
When the condition is "yes" the flow works just fine, but when the condition is "no" the job always ends with the execution status "FAILED". I want it to be "COMPLETED", just like the first flow, but without executing steps 2, 3, 4 and 5.
Hope anyone can help me with this.
Spring Batch does not allow alternative branches in the flow to be implicit. In other words, you need an on(...) for each case.
Assuming decider() yields a proxied bean, it should work fine with
@Bean
public Job job(JobCompletionNotificationListener listener) {
    return jobs.get(Constants.JOB_SIARD_FILES_PROCESSOR + new Date().getTime())
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(step1())
            .next(decider()).on("yes").to(step2345Flow())
            .from(decider()).on("no").end()
            .end()
            .build();
}
To cover really all cases, you can also use on("*") instead of on("no").
Please also have a second look at the official documentation: https://docs.spring.io/spring-batch/docs/4.3.x/reference/html/index-single.html#controllingStepFlow
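Since decider() is not shown in the question, here is a minimal sketch of what it could look like; the condition (shouldRunRemainingSteps()) is a hypothetical placeholder:

@Bean
public JobExecutionDecider decider() {
    return (jobExecution, stepExecution) ->
            shouldRunRemainingSteps() // placeholder for your actual condition
                    ? new FlowExecutionStatus("yes")
                    : new FlowExecutionStatus("no");
}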
I have been playing with a Spring Batch job that reads a sample CSV file and dumps the records into a table.
My question concerns restarts: I introduced a data issue (a value too long to insert) on the 3rd line of the file.
On the first run, the first two lines get inserted and the third line fails (as expected).
When I restart, the fourth line is picked up and the rest of the file is processed.
All the documentation seems to suggest that Spring Batch picks up where it left off. Does that mean the 3rd (problem) record is considered 'attempted' and hence won't be tried again? I was expecting all restarts to fail until I fixed the file.
@Bean
public FlatFileItemReader<Person> reader() {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new ClassPathResource("sample-data.csv"))
            .delimited()
            .names(new String[]{"firstName", "lastName"})
            .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
                setTargetType(Person.class);
            }})
            .build();
}

@Bean
public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Person>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
            .dataSource(dataSource)
            .build();
}

@Bean
public Step step1(JdbcBatchItemWriter<Person> writer) {
    return stepBuilderFactory.get("step1")
            .<Person, Person>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer)
            .taskExecutor(taskExecutor())
            .throttleLimit(1)
            .build();
}

@Bean
public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
    return jobBuilderFactory.get("importUserJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(step1)
            .build();
}
Please let me know if you have gone through the posts below. If it's not clear, I can share the same sample project on GitHub:
Spring Batch restart uncompleted jobs from the same execution and step
Spring Batch correctly restart uncompleted jobs in clustered environment
In production we always use a "fault-tolerant" step so that the job rejects the bad data and continues. Later operations correct the data and run the job again. The advantage is that a huge volume of data can be processed continuously, with no need to wait for data correction.
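For reference, a minimal sketch of what such a fault-tolerant step could look like with the reader and writer from the question (the skip limit of 10 is an arbitrary example value):

@Bean
public Step step1(JdbcBatchItemWriter<Person> writer) {
    return stepBuilderFactory.get("step1")
            .<Person, Person>chunk(1)
            .reader(reader())
            .writer(writer)
            .faultTolerant()
            .skip(FlatFileParseException.class)          // skip lines that fail to parse
            .skip(DataIntegrityViolationException.class) // skip records the database rejects (e.g. value too long)
            .skipLimit(10)                               // fail the step once more than 10 items are skipped
            .build();
}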
Please compare your code with below
https://github.com/ngecom/stackoverflow-springbatchRestart
You have set a RunIdIncrementer on your job, so you will have a new job instance on each run. You need to remove that incrementer and pass the file as a job parameter to have the same job instance on each run. With this approach, all restarts will fail until you fix the file.
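To illustrate, a minimal sketch of a step-scoped reader that takes the file from a job parameter (the parameter name inputFile is an assumption):

@Bean
@StepScope
public FlatFileItemReader<Person> reader(@Value("#{jobParameters['inputFile']}") String inputFile) {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new FileSystemResource(inputFile)) // same file => same job instance on restart
            .delimited()
            .names(new String[]{"firstName", "lastName"})
            .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
                setTargetType(Person.class);
            }})
            .build();
}

Because the file is now part of the identifying job parameters, restarting with the same file resumes the failed job instance instead of creating a new one.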
As a side note, you can't have restartability if you use a multi-threaded step, because the state would not be consistent when using multiple threads. So you need to use a single-threaded step (remove the task executor). This is explained in the documentation here: Multi-threaded step.
I have tried to find the solution but I cannot... ㅠㅠ
I want to separate the steps in a job like below:
step1.class -> step2.class -> step3.class -> done
The reason I split it this way is that I have to run queries in each step.
@Bean
public Job bundleJob() {
    return jobBuilderFactory.get(JOB_NAME)
            .start(step1)   // bean
            .next(step2)    // bean
            .next(step3())  // and here is the code, e.g. reader, processor, writer
            .build();
}
My goal is to use the data returned by step1 and step2 in the later steps.
But the jpaItemReader seems to behave asynchronously, so the processing doesn't follow the order above.
Debugging shows a flow like this:
readerStep1 -> writerStep1 -> readerStep2 -> writerStep2 -> readerStep3 -> writerStep3
and
-> processorStep1 -> processorStep2 -> processorStep3
That is the big problem for me...
How can I make the job wait for each step to finish, including the queries?
Aha! I got it.
The point is how the beans are created in the configuration: I had annotated every step component as a bean, so they were all created eagerly by Spring. The solution is late binding with @JobScope or @StepScope:
@Bean
@StepScope // late bean creation
public ListItemReader<Dto> itemReader() {
    // business logic
    return new ListItemReader<>(dto);
}
To have separate steps in your job you can use a Flow with a TaskletStep. Sharing a snippet for your reference:
@Bean
public Job processJob() throws Exception {
    Flow fetchData = new FlowBuilder<Flow>("fetchData")
            .start(fetchDataStep()).build();
    Flow transformData = new FlowBuilder<Flow>("transformData")
            .start(transformDataStep()).build();
    Job job = jobBuilderFactory.get("processTenantLifeCycleJob")
            .incrementer(new RunIdIncrementer())
            .start(fetchData).next(transformData).next(processData()).end()
            .listener(jobCompletionListener()).build();
    ReferenceJobFactory referenceJobFactory = new ReferenceJobFactory(job);
    registry.register(referenceJobFactory);
    return job;
}

@Bean
public TaskletStep fetchDataStep() {
    return stepBuilderFactory.get("fetchData")
            .tasklet(fetchDataValue()).listener(fetchDataStepListener()).build();
}

@Bean
@StepScope
public FetchDataValue fetchDataValue() {
    return new FetchDataValue();
}

@Bean
public TaskletStep transformDataStep() {
    return stepBuilderFactory.get("transformData")
            .tasklet(transformValue()).listener(sendReportDataCompletionListener()).build();
}

@Bean
@StepScope
public TransformValue transformValue() {
    return new TransformValue();
}

@Bean
public Step processData() {
    return stepBuilderFactory.get("processData").<String, Data>chunk(chunkSize)
            .reader(processDataReader()).processor(dataProcessor()).writer(processDataWriter())
            .listener(processDataListener())
            .taskExecutor(backupTaskExecutor()).build();
}
In this example I used two flows, fetchData and transformData, each executing a tasklet class. In order to return values from steps 1 and 2, you can store them in the job execution context and retrieve them in the processData step, which has a reader, processor and writer.
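As an illustration (not from the original snippet), the tasklet can promote a value to the job execution context and a later step-scoped bean can read it back through late binding; the key "fetchedValue" is a hypothetical name:

// inside the FetchDataValue tasklet: store the result in the job execution context
@Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
    chunkContext.getStepContext().getStepExecution()
            .getJobExecution().getExecutionContext()
            .put("fetchedValue", "some-result");
    return RepeatStatus.FINISHED;
}

// in the processData step: read the value back through late binding
@Bean
@StepScope
public ItemReader<String> processDataReader(
        @Value("#{jobExecutionContext['fetchedValue']}") String fetchedValue) {
    return new ListItemReader<>(Collections.singletonList(fetchedValue));
}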
I have a requirement to execute a job in 2 modes, using a parameter to distinguish between them. For example, if the user passes parameter X, the job must read the data from the database and export it (all the records) to a single XML file. If the user passes parameter Y, the job must write each record to a separate XML file using the same header.
Use a job parameter to distinguish the modes:
@StepScope
@Bean
public Tasklet task(@Value("#{jobParameters['mode']}") String mode) {
    return (contribution, chunkContext) -> {
        // branch on the mode parameter here
        return RepeatStatus.FINISHED;
    };
}
If the modes are so different that they can't be done in the same step, use a decider:
FlowBuilder<Flow> flowBuilder = new FlowBuilder<>("modesFlow");
Flow flow = flowBuilder
        .start(modesDecider)
        .on("X")
        .to(step1)
        .from(modesDecider)
        .on("Y")
        .to(step2)
        .end()
        .build();

jobBuilderFactory.get("modesJob")
        .incrementer(new RunIdIncrementer())
        .start(flow)
        .end()
        .build();
where:
ModesDecider implements JobExecutionDecider
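For completeness, a minimal sketch of such a decider, assuming the mode is passed as the job parameter mode:

public class ModesDecider implements JobExecutionDecider {
    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        String mode = jobExecution.getJobParameters().getString("mode");
        return new FlowExecutionStatus(mode); // "X" or "Y", matched by the on(...) transitions above
    }
}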