I have been playing with a Spring Batch job that reads a sample CSV file and dumps the records into a table.
My question is about restarts. I introduced a data issue into the file (a value too long to insert) on the 3rd line.
On the first run:
The first two lines are inserted and the third line fails (as expected).
When I restart:
The fourth line is picked up and the rest of the file is processed.
All the documentation seems to suggest that Spring Batch picks up where it left off. Does that mean the 3rd (problem) record is considered 'attempted' and hence won't be tried again? I was expecting all restarts to fail until I fixed the file.
@Bean
public FlatFileItemReader<Person> reader() {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new ClassPathResource("sample-data.csv"))
            .delimited()
            .names(new String[]{"firstName", "lastName"})
            .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
                setTargetType(Person.class);
            }})
            .build();
}

@Bean
public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Person>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
            .dataSource(dataSource)
            .build();
}

@Bean
public Step step1(JdbcBatchItemWriter<Person> writer) {
    return stepBuilderFactory.get("step1")
            .<Person, Person>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer)
            .taskExecutor(taskExecutor())
            .throttleLimit(1)
            .build();
}

@Bean
public Job importUserJob(JobCompletionNotificationListener listener) {
    return jobBuilderFactory.get("importUserJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(step1)
            .build();
}
Please let me know whether you have gone through the posts below. If it's not clear, I can share the same sample project on GitHub.
Spring Batch restart uncompleted jobs from the same execution and step
Spring Batch correctly restart uncompleted jobs in clustered environment
In production we always use a "fault-tolerant" step so that the job rejects the wrong data and continues. Operations can later correct the data and re-execute the job. The advantage is that huge volumes of data can be processed continuously without waiting for data correction.
Please compare your code with the one below:
https://github.com/ngecom/stackoverflow-springbatchRestart
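For illustration, a fault-tolerant step could look roughly like the sketch below (the exception types and skip limit are assumptions and would need to match your actual failure):
@Bean
public Step step1(JdbcBatchItemWriter<Person> writer) {
    return stepBuilderFactory.get("step1")
            .<Person, Person>chunk(10)
            .reader(reader())
            .writer(writer)
            .faultTolerant()
            // skip bad records (up to 10) instead of failing the whole job
            .skip(FlatFileParseException.class)
            .skip(DataIntegrityViolationException.class)
            .skipLimit(10)
            .build();
}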
You have set a RunIdIncrementer on your job, so you will have a new job instance on each run. You need to remove that incrementer and pass the file as a job parameter to have the same job instance on each run. With this approach, all restarts will fail until you fix the file.
As a side note, you can't have restartability if you use a multi-threaded step. This is because the state would not be consistent when using multiple threads. So you need to use a single-threaded step (remove the task executor). This is explained in the documentation here: Multi-threaded step. A sketch of the adjusted configuration is shown below.
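A minimal sketch of what the adjusted configuration could look like, assuming the file is passed as an 'inputFile' job parameter (the parameter name is an assumption):
@Bean
@StepScope
public FlatFileItemReader<Person> reader(@Value("#{jobParameters['inputFile']}") String inputFile) {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new ClassPathResource(inputFile))
            .delimited()
            .names(new String[]{"firstName", "lastName"})
            .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
                setTargetType(Person.class);
            }})
            .build();
}

@Bean
public Step step1(FlatFileItemReader<Person> reader, JdbcBatchItemWriter<Person> writer) {
    return stepBuilderFactory.get("step1")
            .<Person, Person>chunk(1)
            .reader(reader)
            .processor(processor())
            .writer(writer)
            // no taskExecutor/throttleLimit: keep the step single-threaded for restartability
            .build();
}

@Bean
public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
    return jobBuilderFactory.get("importUserJob")
            // no RunIdIncrementer: the same parameters yield the same job instance,
            // so a failed execution can be restarted
            .listener(listener)
            .start(step1)
            .build();
}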
How can I do the deletion of the entities that I just persisted?
@Bean
public Job job() {
    return this.jobBuilderFactory.get("job")
            .start(this.syncStep())
            .build();
}

@Bean
public Step syncStep() {
    // read
    RepositoryItemReader<Element1> reader = new RepositoryItemReader<>();
    reader.setRepository(repository);
    reader.setMethodName("findElements");
    reader.setArguments(new ArrayList<>(Arrays.asList(ZonedDateTime.now())));
    final HashMap<String, Sort.Direction> sorts = new HashMap<>();
    sorts.put("uid", Sort.Direction.ASC);
    reader.setSort(sorts);
    // write
    RepositoryItemWriter<Element1> writer = new RepositoryItemWriter<>();
    writer.setRepository(otherrepository);
    writer.setMethodName("save");
    return stepBuilderFactory.get("syncStep")
            .<Element1, Element2>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
It is a process of dumping elements. We pass the elements from one table to another.
You can do that in two steps. The first step copies items from one table to another. The second step deletes the items from the source table. The second step should be executed only if the first step succeeds.
There are a few options:
Using a CompositeItemWriter
You could create a second ItemWriter that does the delete logic, for example:
RepositoryItemWriter<Element1> deleteWriter = new RepositoryItemWriter<>();
deleteWriter.setRepository(repository);
deleteWriter.setMethodName("delete");
To execute both writers you can use a CompositeItemWriter:
CompositeItemWriter<Element1> writer = new CompositeItemWriter<>();
// 'saveWriter' would be the writer you currently have
writer.setDelegates(List.of(saveWriter, deleteWriter));
This however won't work if your ItemProcessor transforms the original entity to something completely new. In that case I suggest using PropertyExtractingDelegatingItemWriter.
(Note, according to this question the writers run sequentially and the second writer should not be executed if the first one fails, but I'm not 100% sure on that.)
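A rough sketch of what that could look like, assuming the transformed item still exposes the source entity's id as a 'uid' property and the source repository has a matching deleteById method (both names are assumptions):
// Hypothetical: for each written item, extract the 'uid' property and
// call repository.deleteById(uid) on the source repository.
PropertyExtractingDelegatingItemWriter<Element2> deleteByIdWriter =
        new PropertyExtractingDelegatingItemWriter<>();
deleteByIdWriter.setTargetObject(repository);
deleteByIdWriter.setTargetMethod("deleteById");
deleteByIdWriter.setFieldsUsedAsTargetMethodArguments(new String[] {"uid"});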
Using a separate Step
Alternatively, you could put the new writer in an entirely separate Step:
@Bean
public Step cleanupStep() {
    // Same reader as before (might want to put this in a separate @Bean)
    RepositoryItemReader<Element1> reader = new RepositoryItemReader<>();
    // ...
    // The 'deleteWriter' from before
    RepositoryItemWriter<Element1> deleteWriter = new RepositoryItemWriter<>();
    // ...
    return stepBuilderFactory.get("cleanupStep")
            .<Element1, Element1>chunk(10)
            .reader(reader)
            .writer(deleteWriter)
            .build();
}
Now you can schedule the two steps individually:
@Bean
public Job job() {
    return this.jobBuilderFactory.get("job")
            .start(this.syncStep())
            .next(this.cleanupStep())
            .build();
}
Using a Tasklet
If you're using a separate step and depending on the amount of data, it might be more interesting to offload it entirely to the database and execute a single delete ... where ... query.
public class CleanupRepositoryTasklet implements Tasklet {

    private final Repository repository;

    public CleanupRepositoryTasklet(Repository repository) {
        this.repository = repository;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        repository.customDeleteMethod();
        return RepeatStatus.FINISHED;
    }
}
This Tasklet can then be registered in the same way as before, by declaring a new Step in your configuration:
return this.stepBuilderFactory.get("cleanupStep")
.tasklet(myTasklet())
.build();
I have created a Spring Batch app and I'm struggling to implement a simple conditional flow. Here's what I want to implement: after step 1, a decider decides whether to run steps 2 to 5 or to end the job.
I tried to achieve this by implementing the following code:
@Bean
public Job job(JobCompletionNotificationListener listener) {
    return jobs.get(Constants.JOB_SIARD_FILES_PROCESSOR + new Date().getTime())
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(step1())
            .next(decider()).on("yes").to(step2345Flow())
            .end()
            .build();
}
@Bean
public Flow step2345Flow() {
    return new FlowBuilder<SimpleFlow>("yes_flow")
            .start(step2())
            .next(step3())
            .next(step4())
            .next(step5())
            .build();
}
When the condition is "yes" the flow is working just fine, but when the condition is "no" the flow always ends with an execution status "FAILED". I want it to be "COMPLETED" just like the first flow but without executing the steps 2, 3, 4 and 5.
Hope anyone can help me with this.
Spring Batch does not allow alternative branches in the flow to be implicit. In other words, you need an on(...) for each case.
Assuming decider() yields a proxied bean, it should work fine with
@Bean
public Job job(JobCompletionNotificationListener listener) {
    return jobs.get(Constants.JOB_SIARD_FILES_PROCESSOR + new Date().getTime())
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(step1())
            .next(decider()).on("yes").to(step2345Flow())
            .from(decider()).on("no").end()
            .end()
            .build();
}
To cover really all cases, you can also use on("*") instead of on("no").
Please also have a second look at the official documentation: https://docs.spring.io/spring-batch/docs/4.3.x/reference/html/index-single.html#controllingStepFlow
I am reading the Spring Batch documentation and am stuck on the following part:
The following example is provided:
@Bean
public Job job() {
    Flow flow1 = new FlowBuilder<SimpleFlow>("flow1")
            .start(step1())
            .next(step2())
            .build();
    Flow flow2 = new FlowBuilder<SimpleFlow>("flow2")
            .start(step3())
            .build();

    return this.jobBuilderFactory.get("job")
            .start(flow1)
            .split(new SimpleAsyncTaskExecutor())
            .add(flow2)
            .next(step4())
            .end()
            .build();
}
But it is not explained what is happening.
As far as I understand, flow1 and flow2 are executed in parallel, but what about step4?
step4() is executed sequentially after flow1 and flow2 have completed.
Look at the FlowBuilder.SplitBuilder.add() javadoc:
public FlowBuilder<Q> add(Flow... flows)
Add flows to the split, in addition to the current state already present in the parent builder.
Parameters: flows - more flows to add to the split
Returns: the parent builder
It returns the parent builder and not the current SplitBuilder object.
So step4() is not included in the split and is therefore executed sequentially.
To run the 3 flows in parallel, wrap step4() in its own flow and add it to the split:
Flow flow3 = new FlowBuilder<SimpleFlow>("flow3")
        .start(step4())
        .build();

return this.jobBuilderFactory.get("job")
        .start(flow1)
        .split(new SimpleAsyncTaskExecutor())
        .add(flow2, flow3)
        .end()
        .build();
I am new to Spring Batch. From a CSV file, I am trying to read about 2000 records every 10 seconds using a Quartz scheduler and write them into a database.
The problem is that every time it starts reading the file from the beginning, and hence writes the same set of records into the database.
I've tried dynamically changing the parameter setLinesToSkip, but to no avail, which is probably because it is included in my default bean definition.
Is there some way by which I can resume processing from the same spot, or maybe update the value in setLinesToSkip?
@Bean
public Step stepOne() {
    return stepBuilderFactory
            .get("stepOne")
            .<Stock, Stock>chunk(5)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}

@Bean
public Job readCSVFileJob1() {
    return jobBuilderFactory
            .get("readCSVFileJob1")
            .incrementer(new RunIdIncrementer())
            .start(stepOne())
            .build();
}

@Bean
public ItemProcessor<Stock, Stock> processor() {
    return new DBLogProcessor();
}

@Bean
public FlatFileItemReader<Stock> reader() {
    FlatFileItemReader<Stock> itemReader = new FlatFileItemReader<Stock>();
    itemReader.setLineMapper(lineMapper());
    itemReader.setLinesToSkip(1);
    itemReader.setMaxItemCount(2000);
    itemReader.setResource(new FileSystemResource("example.csv"));
    return itemReader;
}
I have a requirement to execute a job in 2 modes, using a parameter to distinguish between them. For example, if the user uses parameter X, the job must read the data from the database and export all the records to a single XML file. Otherwise, if the user uses parameter Y, the job must write each record to a separate XML file using the same header.
Use jobParameter to distinguish the modes:
@StepScope
@Bean
public Tasklet task(@Value("#{jobParameters['mode']}") String mode) {
    return (contribution, chunkContext) -> {
        // branch on the 'mode' parameter here
        return RepeatStatus.FINISHED;
    };
}
If the modes are so different that you can't handle them in the same step, use a decider:
FlowBuilder<Flow> flowBuilder = new FlowBuilder<>("modesFlow");
Flow flow = flowBuilder
        .start(modesDecider)
        .on("X").to(step1)
        .from(modesDecider)
        .on("Y").to(step2)
        .build();

jobBuilderFactory.get("modesJob")
        .incrementer(new RunIdIncrementer())
        .start(flow)
        .end()
        .build();
where:
ModesDecider implements JobExecutionDecider
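A minimal sketch of such a decider (the 'mode' parameter name follows the tasklet example above; the returned statuses "X" and "Y" are assumptions and must match the transitions in the flow):
public class ModesDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // read the 'mode' job parameter and map it to a flow status
        String mode = jobExecution.getJobParameters().getString("mode");
        return new FlowExecutionStatus("Y".equalsIgnoreCase(mode) ? "Y" : "X");
    }
}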