I have a batch job that uses Spring Batch 3.0.x.
My use case is to retry the job in case of intermediate failures.
I am using chunk-based processing and StepBuilderFactory for the job, but I could not see any difference after adding retry to it.
return stepBuilderFactory.get("ValidationStepName")
.<Long, Info> chunk(10)
.reader(.....)
.processor(.....)
// .faultTolerant()
// .retryLimit(5)
// .retryLimit(5).retry(Exception.class)
.writer(......)
.faultTolerant()
.retryLimit(5)
//.retryLimit(5).retry(Exception.class)
.transactionManager(jpaTransactionManager())
.listener(new ChunkNotificationListener())
.build();
I'm not sure whether I am missing something here. I am expecting that adding retryLimit() will retry the same chunk n times on getting any exception.
I am expecting that adding retryLimit() will retry the same chunk n times on getting any exception
If you specify a retry limit, you also need to specify which exceptions to retry. Otherwise, building the step fails with an IllegalStateException and the message "If a retry limit is provided then retryable exceptions must also be specified".
EDIT:
Point 1: The following test passes with version 3.0.9:
import java.util.Arrays;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.ExpectedException;
import org.junit.runner.RunWith;
import org.mockito.Mock;
import org.mockito.junit.MockitoJUnitRunner;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.tasklet.TaskletStep;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.batch.item.support.ListItemWriter;
import org.springframework.transaction.PlatformTransactionManager;
@RunWith(MockitoJUnitRunner.class)
public class TestRetryConfig {
@Rule
public ExpectedException expectedException = ExpectedException.none();
@Mock
private JobRepository jobRepository;
@Mock
PlatformTransactionManager transactionManager;
@Test
public void testRetryLimitWithoutException() {
expectedException.expect(IllegalStateException.class);
expectedException.expectMessage("If a retry limit is provided then retryable exceptions must also be specified");
StepBuilderFactory stepBuilderFactory = new StepBuilderFactory(jobRepository, transactionManager);
TaskletStep step = stepBuilderFactory.get("step")
.<Integer, Integer>chunk(2)
.reader(new ListItemReader<>(Arrays.asList(1, 2, 3)))
.writer(new ListItemWriter<>())
.faultTolerant()
.retryLimit(3)
.build();
}
}
This shows that if you specify a retry limit without the exception type(s) to retry, the step configuration fails.
Point 2: The following sample shows that the declared exception type is retried as expected (tested with version 3.0.9 too):
import java.util.Arrays;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class MyJob {
@Autowired
private JobBuilderFactory jobs;
@Autowired
private StepBuilderFactory steps;
@Bean
public ItemReader<Integer> itemReader() {
return new ListItemReader<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
}
@Bean
public ItemWriter<Integer> itemWriter() {
return items -> {
for (Integer item : items) {
System.out.println("item = " + item);
if (item.equals(7)) {
throw new Exception("Sevens are sometimes nasty, let's retry them");
}
}
};
}
@Bean
public Step step() {
return steps.get("step")
.<Integer, Integer>chunk(5)
.reader(itemReader())
.writer(itemWriter())
.faultTolerant()
.retryLimit(3)
.retry(Exception.class)
.build();
}
@Bean
public Job job() {
return jobs.get("job")
.start(step())
.build();
}
public static void main(String[] args) throws Exception {
ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job job = context.getBean(Job.class);
jobLauncher.run(job, new JobParameters());
}
}
It prints:
item = 1
item = 2
item = 3
item = 4
item = 5
item = 6
item = 7
item = 6
item = 7
item = 6
item = 7
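Item 7 is attempted 3 times in total (on each retry, the whole chunk is passed to the writer again, which is why item 6 is printed again too), and then the step fails as expected.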
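To make these retry semantics concrete, here is a simplified plain-Java model. It is not Spring Batch code, and the class and method names are made up for illustration. It assumes, as the output above suggests, that on a retryable write failure the whole chunk is handed to the writer again and that the retry limit caps the total number of attempts. Running it records the same item sequence as the sample:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of chunk-level retry, not the Spring Batch implementation.
public class ChunkRetryModel {

    /** Mimics the sample writer: records every item it sees and fails on item 7. */
    static void write(List<Integer> chunk, List<Integer> log) throws Exception {
        for (Integer item : chunk) {
            log.add(item);
            if (item.equals(7)) {
                throw new Exception("Sevens are sometimes nasty, let's retry them");
            }
        }
    }

    /** Re-invokes the writer with the whole chunk until it succeeds or maxAttempts is reached. */
    static boolean writeWithRetry(List<Integer> chunk, int maxAttempts, List<Integer> log) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write(chunk, log);
                return true;
            } catch (Exception e) {
                // retryable failure: loop around and write the same chunk again
            }
        }
        return false; // attempts exhausted: this is where the step would fail
    }

    public static void main(String[] args) {
        List<Integer> log = new ArrayList<>();
        boolean first = writeWithRetry(List.of(1, 2, 3, 4, 5), 3, log);
        boolean second = writeWithRetry(List.of(6, 7, 8, 9, 10), 3, log);
        System.out.println(log);                  // [1, 2, 3, 4, 5, 6, 7, 6, 7, 6, 7]
        System.out.println(first + " " + second); // true false
    }
}
```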
I hope this helps.
Related
I am new to Spring Batch, and I would like to understand how it should be used to process a List<String> as fast as possible, in parallel using multiple threads, and then return just a subset of the items based on some condition.
For example, I was thinking of using it to check which IPs are up within a subnet.
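import java.net.InetAddress;
import java.util.List;
import org.apache.commons.net.util.SubnetUtils;
String subnet = "192.168.8.0/24";
SubnetUtils utils = new SubnetUtils(subnet);
List<String> addresses = List.of(utils.getInfo().getAllAddresses());
for (String address : addresses) {
// isReachable throws IOException, so the enclosing method must handle or declare it
if (InetAddress.getByName(address).isReachable(100)) {
// Consider this address for the final list
}
}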
My code is as follows:
import it.eng.cysec.discoverer.service.NetworkService;
import lombok.RequiredArgsConstructor;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.JobScope;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.net.InetAddress;
import java.util.Arrays;
import java.util.Date;
@Configuration
@EnableBatchProcessing
@RequiredArgsConstructor
public class BatchConfiguration {
private final JobBuilderFactory jobBuilderFactory;
private final StepBuilderFactory stepBuilderFactory;
private final NetworkService networkService;
@Bean
public Job checkSubnetJob(Step checkIPStep){
return this.jobBuilderFactory.get("check-subnet-job")
.incrementer(new RunIdIncrementer())
.start(checkIPStep)
.build();
}
@Bean
@JobScope
public Step checkIPStep(@Value("#{jobParameters['subnet']}") String subnet) {
System.out.println("Subnet parameter: " + subnet);
return this.stepBuilderFactory.get("check-ip-step")
.<String, String>chunk(10)
.reader(reader(null))
.processor(processor())
.writer(writer())
.allowStartIfComplete(true)
.build();
}
@Bean
@JobScope
public ItemReader<String> reader(@Value("#{jobParameters['subnet']}") String subnet) {
return new ListItemReader<>(this.networkService.getAllSubnetAddresses(subnet));
}
@Bean
public ItemProcessor<String, String> processor() {
return ip -> {
System.out.println("Processor IP: " + ip + " " + new Date());
try {
InetAddress address = InetAddress.getByName(ip);
if(address.isReachable(5000)){
return ip;
}else {
return null;
}
}catch (Exception e){
return null;
}
};
}
@Bean
public ItemWriter<String> writer() {
// TODO How to pass the list of up IPs back to the calling function?
return list -> {
System.out.println("Arrays to String" + Arrays.toString(list.toArray()));
};
}
}
import lombok.RequiredArgsConstructor;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import java.util.List;
@RestController
@RequestMapping("test")
@RequiredArgsConstructor
public class TestController {
private final Job job;
private final JobLauncher jobLauncher;
@GetMapping()
public List<String> test(){
JobParameters parameters = new JobParametersBuilder()
.addString("subnet", "192.168.8.0/24", false)
.toJobParameters();
try {
this.jobLauncher.run(this.job, parameters);
} catch (Exception e) {
throw new RuntimeException(e);
}
// TODO How to return the IP that are up based on the previous object?
return List.of("OK");
}
}
So my main questions are:
How can I make different chunks (of 10 IPs each) be processed in parallel? Right now they are not.
What is the fastest approach Spring Batch provides to process all the IPs of a local network? Is it enough to keep them in memory, or would it be better to persist them while processing the remaining IPs? If so, how?
How can I pass the computed list of IPs back to the calling method?
You can create a custom partitioner that partitions the input list based on indexes. Here is a quick example:
/*
* Copyright 2022 the original author or authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.springframework.batch.sample;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Stream;
import javax.sql.DataSource;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;
/**
* Example of a partitioned step where the input is a list, and partitions
* are sublists that are processed in parallel with local worker threads.
*
* @author Mahmoud Ben Hassine
*/
@Configuration
@EnableBatchProcessing
public class ListPartitioningSample {
@Bean
public Step managerStep(StepBuilderFactory stepBuilderFactory) {
List<String> items = Arrays.asList("foo1", "foo2", "foo3", "foo4", "foo5", "foo6", "foo7", "foo8"); // retrieved with this.networkService.getAllSubnetAddresses(subnet)
return stepBuilderFactory.get("managerStep")
.partitioner("workerStep", new ListPartitioner(items.size()))
.gridSize(2)
.taskExecutor(new SimpleAsyncTaskExecutor())
.step(workerStep(stepBuilderFactory))
.build();
}
@Bean
public Step workerStep(StepBuilderFactory stepBuilderFactory) {
return stepBuilderFactory.get("workerStep")
.<String, String>chunk(2)
.reader(itemReader(null))
.processor(itemProcessor())
.writer(itemWriter())
.build();
}
@Bean
@StepScope
public ListItemReader<String> itemReader(@Value("#{stepExecutionContext['range']}") Range partition) {
List<String> items = Arrays.asList("foo1", "foo2", "foo3", "foo4", "foo5", "foo6", "foo7", "foo8"); // retrieved with this.networkService.getAllSubnetAddresses(subnet)
return new ListItemReader<>(items.subList(partition.start, partition.end));
}
@Bean
public ItemProcessor<String, String> itemProcessor() {
return new ItemProcessor<String, String>() {
@Override
public String process(String item) throws Exception {
return item; // filter items as needed here
}
};
}
@Bean
public ItemWriter<String> itemWriter() {
return new ItemWriter<String>() {
@Override
public void write(List<? extends String> items) throws Exception {
items.forEach(new Consumer<String>() {
@Override
public void accept(String item) {
System.out.println(Thread.currentThread().getName() + ": " + item);
}
});
}
};
}
@Bean
public Job job(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory) {
return jobBuilderFactory.get("job")
.start(managerStep(stepBuilderFactory))
.build();
}
@Bean
public DataSource dataSource() {
return new EmbeddedDatabaseBuilder()
.setType(EmbeddedDatabaseType.HSQL)
.addScript("/org/springframework/batch/core/schema-hsqldb.sql")
.build();
}
// TODO quick and dirty implementation, please add sanity checks and verify edge cases
public static class ListPartitioner implements Partitioner {
private int listSize;
public ListPartitioner(int listSize) {
this.listSize = listSize;
}
@Override
public Map<String, ExecutionContext> partition(int gridSize) {
// calculate ranges
int partitionSize = listSize / gridSize;
Range[] ranges = new Range[gridSize];
for (int i = 0, j = 0; i < gridSize; i++, j+= partitionSize) {
ranges[i] = new Range(j, j + partitionSize);
System.out.println("range = " + ranges[i]);
}
// prepare partition meta-data
Map<String, ExecutionContext> partitions = new HashMap<>(gridSize);
for (int i = 0; i < gridSize; i++) {
ExecutionContext context = new ExecutionContext();
context.put("range", ranges[i]);
partitions.put("partition" + i, context);
}
return partitions;
}
}
/**
* Represents an index range (i.e., a partition) in a list.
* Ex: List = ["foo1", "foo2", "bar1", "bar2"]
* Range1 = [0, 2] => sublist1 = ["foo1", "foo2"]
* Range2 = [2, 4] => sublist2 = ["bar1", "bar2"]
* @param start of sublist, inclusive
* @param end of sublist, exclusive
*/
record Range(int start, int end) {};
public static void main(String[] args) throws Exception {
ApplicationContext context = new AnnotationConfigApplicationContext(ListPartitioningSample.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job job = context.getBean(Job.class);
JobExecution jobExecution = jobLauncher.run(job, new JobParameters());
System.out.println("jobExecution = " + jobExecution);
}
}
The idea is to create sublists and have each worker step work on a distinct sublist. Note that the list is not duplicated: it could be shared, and each worker thread reads its own distinct partition.
The sample above prints:
[main] INFO org.springframework.batch.core.launch.support.SimpleJobLauncher - Job: [SimpleJob: [name=job]] launched with the following parameters: [{}]
[main] INFO org.springframework.batch.core.job.SimpleStepHandler - Executing step: [managerStep]
range = Range[start=0, end=4]
range = Range[start=4, end=8]
SimpleAsyncTaskExecutor-1: foo1
SimpleAsyncTaskExecutor-1: foo2
SimpleAsyncTaskExecutor-2: foo5
SimpleAsyncTaskExecutor-2: foo6
SimpleAsyncTaskExecutor-1: foo3
SimpleAsyncTaskExecutor-1: foo4
SimpleAsyncTaskExecutor-2: foo7
SimpleAsyncTaskExecutor-2: foo8
[SimpleAsyncTaskExecutor-1] INFO org.springframework.batch.core.step.AbstractStep - Step: [workerStep:partition0] executed in 82ms
[SimpleAsyncTaskExecutor-2] INFO org.springframework.batch.core.step.AbstractStep - Step: [workerStep:partition1] executed in 82ms
[main] INFO org.springframework.batch.core.step.AbstractStep - Step: [managerStep] executed in 137ms
[main] INFO org.springframework.batch.core.launch.support.SimpleJobLauncher - Job: [SimpleJob: [name=job]] completed with the following parameters: [{}] and the following status: [COMPLETED] in 162ms
jobExecution = JobExecution: id=0, version=2, startTime=Wed Aug 17 12:21:00 CEST 2022, endTime=Wed Aug 17 12:21:00 CEST 2022, lastUpdated=Wed Aug 17 12:21:00 CEST 2022, status=COMPLETED, exitStatus=exitCode=COMPLETED;exitDescription=, job=[JobInstance: id=0, version=0, Job=[job]], jobParameters=[{}]
This shows that partitions (i.e., sublists) are processed in parallel by different threads.
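The ListPartitioner above computes partitionSize = listSize / gridSize, so when the list size is not a multiple of the grid size, the trailing items fall outside every range. For example, 10 items with a grid size of 3 give [0, 3), [3, 6) and [6, 9), and the tenth item is never read. One way to handle that edge case, sketched here in plain Java with a hypothetical RangeSplitter class, is to spread the remainder over the first ranges so that the ranges always cover the whole list:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits [0, listSize) into gridSize contiguous, non-overlapping ranges.
public class RangeSplitter {

    /** Start inclusive, end exclusive, same convention as the sample's Range record. */
    record Range(int start, int end) {}

    static List<Range> split(int listSize, int gridSize) {
        if (listSize < 0 || gridSize <= 0) {
            throw new IllegalArgumentException("listSize must be >= 0 and gridSize > 0");
        }
        int baseSize = listSize / gridSize;  // every range gets at least this many items
        int remainder = listSize % gridSize; // the first 'remainder' ranges get one extra item
        List<Range> ranges = new ArrayList<>(gridSize);
        int start = 0;
        for (int i = 0; i < gridSize; i++) {
            int size = baseSize + (i < remainder ? 1 : 0);
            ranges.add(new Range(start, start + size));
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(split(8, 2));  // [Range[start=0, end=4], Range[start=4, end=8]]
        System.out.println(split(10, 3)); // [Range[start=0, end=4], Range[start=4, end=7], Range[start=7, end=10]]
    }
}
```

The resulting ranges could be put into the partitions' ExecutionContexts the same way ListPartitioner.partition does. When gridSize exceeds the list size, some ranges are empty (start == end), and subList then simply returns an empty list.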
Now, to answer your question about how to gather the written elements (the retained IPs in your case): you can put item indexes (not the items themselves) in the execution context, and grab them from the execution context with a StepExecutionAggregator. You can find an example of how to do that in the word count fork/join sample that I shared here:
EDIT: show how to access the subnet job parameter from the item reader
You are already passing the subnet as a job parameter in your controller method. So you can access it in the item reader bean definition with a SpEL expression as follows:
@Bean
@StepScope
public ListItemReader<String> itemReader(@Value("#{stepExecutionContext['range']}") Range partition, @Value("#{jobParameters['subnet']}") String subnet) {
// use subnet parameter as needed here
List<String> items = Arrays.asList("foo1", "foo2", "foo3", "foo4", "foo5", "foo6", "foo7", "foo8"); // retrieved with this.networkService.getAllSubnetAddresses(subnet)
return new ListItemReader<>(items.subList(partition.start, partition.end));
}
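For the other part of the question (returning the retained IPs), the suggestion in this answer is to store item indexes in the execution context and merge them with a StepExecutionAggregator. The merge itself boils down to the plain-Java logic below. The IndexAggregation class and the map of per-partition index lists are hypothetical stand-ins for the workers' ExecutionContexts, not Spring Batch API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model, not Spring Batch API: each map entry stands in for one worker's
// ExecutionContext holding the indexes of the items that worker decided to keep.
public class IndexAggregation {

    /** Merges per-partition index lists and maps them back to items of the shared input list. */
    static List<String> aggregate(List<String> items, Map<String, List<Integer>> keptIndexesByPartition) {
        List<Integer> allIndexes = new ArrayList<>();
        for (List<Integer> indexes : keptIndexesByPartition.values()) {
            allIndexes.addAll(indexes);
        }
        // partitions can finish in any order, so sort to restore the input order
        allIndexes.sort(Comparator.naturalOrder());
        List<String> kept = new ArrayList<>(allIndexes.size());
        for (int index : allIndexes) {
            kept.add(items.get(index));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> items = List.of("foo1", "foo2", "foo3", "foo4", "foo5", "foo6", "foo7", "foo8");
        Map<String, List<Integer>> kept = new HashMap<>();
        kept.put("partition1", List.of(4));    // hypothetical: worker for range [4, 8) kept index 4
        kept.put("partition0", List.of(1, 3)); // hypothetical: worker for range [0, 4) kept indexes 1 and 3
        System.out.println(aggregate(items, kept)); // [foo2, foo4, foo5]
    }
}
```

Storing indexes keeps the execution contexts small, since the items themselves stay in the shared input list.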
I am currently reading CSV files with Spring Batch into objects, and I have to save the total number of lines as well as the rejected/skipped lines of the current file. Using a StepExecutionListener didn't work, since I need to get these counts before the step ends, not after it. Is there a way to save them in the ItemProcessor or ItemWriter without having to add another step?
I need to get these counts before the step ends, not after it
You can't get the total number of lines without going to the end of the step (i.e., reading the entire file).
using StepExecutionListener didn't work
Using a step execution listener is the way to go. You did not share your code to see why this didn't work for you, but here is a quick example:
import java.util.Arrays;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class MyJobConfiguration {
@Bean
public Job job(JobBuilderFactory jobs, StepBuilderFactory steps) {
return jobs.get("myJob")
.start(steps.get("myStep")
.<Integer, Integer>chunk(2)
.reader(new ListItemReader<>(Arrays.asList(1, 2, 3, 4)))
.processor((ItemProcessor<Integer, Integer>) item -> {
if (item % 2 != 0) {
throw new Exception("No odd numbers here!");
}
return item;
})
.writer(items -> items.forEach(System.out::println))
.faultTolerant()
.skip(Exception.class)
.skipLimit(5)
.listener(new StepExecutionListener() {
@Override
public void beforeStep(StepExecution stepExecution) {
System.out.println("Starting step " + stepExecution.getStepName());
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
System.out.println("Step "+ stepExecution.getStepName() + " is complete");
System.out.println("read.count = " + stepExecution.getReadCount());
System.out.println("write.count = " + stepExecution.getWriteCount());
System.out.println("skip.count = " + stepExecution.getSkipCount());
return stepExecution.getExitStatus();
}
})
.build())
.build();
}
public static void main(String[] args) throws Exception {
ApplicationContext context = new AnnotationConfigApplicationContext(MyJobConfiguration.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job job = context.getBean(Job.class);
jobLauncher.run(job, new JobParameters());
}
}
This prints:
Starting step myStep
2
4
Step myStep is complete
read.count = 4
write.count = 2
skip.count = 2
I am extending this question: Identify which chunk has failed in chunk based step in Spring Batch.
Can you show me code for the following?
How can I find out which chunk has failed?
How can I make a counter that assigns auto-incremented values to a field that is not the primary key, and save it to the DB?
In a non fault-tolerant step, the first error in any chunk will fail the step and you could get the chunk number with a counter in your ChunkListener implementation as mentioned previously. Here is a quick example:
import java.util.Arrays;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.listener.ChunkListenerSupport;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class MyJobConfiguration {
@Bean
public Job job(JobBuilderFactory jobs, StepBuilderFactory steps) {
return jobs.get("job")
.start(steps.get("step")
.<Integer, Integer>chunk(5)
.reader(new ListItemReader<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)))
.writer(items -> {
System.out.println("About to write items " + items);
if (items.contains(8)) {
throw new Exception("No 8 here!");
}
})
.listener(new MyChunkListener())
.build())
.build();
}
static class MyChunkListener extends ChunkListenerSupport {
private int counter;
@Override
public void beforeChunk(ChunkContext context) {
counter++;
}
@Override
public void afterChunkError(ChunkContext context) {
System.out.println("Chunk number " + counter + " failed");
}
}
public static void main(String[] args) throws Exception {
ApplicationContext context = new AnnotationConfigApplicationContext(MyJobConfiguration.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job job = context.getBean(Job.class);
jobLauncher.run(job, new JobParameters());
}
}
This prints:
About to write items [1, 2, 3, 4, 5]
About to write items [6, 7, 8, 9, 10]
Chunk number 2 failed
However, in a fault-tolerant step, the items of a failed chunk are retried one by one, and Spring Batch creates a single-item chunk for each of them. In this case, the ChunkListener is called for each of those single-item chunks as well, so the counter must be interpreted with that in mind. Here is a fault-tolerant version of the previous example:
import java.util.Arrays;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.listener.ChunkListenerSupport;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class MyJobConfiguration {
@Bean
public Job job(JobBuilderFactory jobs, StepBuilderFactory steps) {
return jobs.get("job")
.start(steps.get("step")
.<Integer, Integer>chunk(5)
.faultTolerant()
.skip(Exception.class)
.skipLimit(10)
.reader(new ListItemReader<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)))
.writer(items -> {
System.out.println("About to write items " + items);
if (items.contains(8)) {
throw new Exception("No 8 here!");
}
})
.listener(new MyChunkListener())
.build())
.build();
}
static class MyChunkListener extends ChunkListenerSupport {
private int counter;
@Override
public void beforeChunk(ChunkContext context) {
counter++;
}
@Override
public void afterChunkError(ChunkContext context) {
System.out.println("Chunk number " + counter + " failed");
}
}
public static void main(String[] args) throws Exception {
ApplicationContext context = new AnnotationConfigApplicationContext(MyJobConfiguration.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job job = context.getBean(Job.class);
jobLauncher.run(job, new JobParameters());
}
}
which prints:
About to write items [1, 2, 3, 4, 5]
About to write items [6, 7, 8, 9, 10]
Chunk number 2 failed
About to write items [6]
About to write items [7]
About to write items [8]
Chunk number 5 failed
About to write items [9]
About to write items [10]
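The counter values in the two outputs above can be reproduced with a small plain-Java model of when the listener methods fire. This is a simplification for illustration (a hypothetical ChunkCounterModel class, no processor, failed items skipped), not how Spring Batch is implemented internally: in fault-tolerant mode, the failed chunk is rewritten item by item, and each single-item chunk increments the counter again, which is why the second failure is reported as chunk number 5.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the two samples above; not Spring Batch code.
public class ChunkCounterModel {

    static List<String> run(List<Integer> items, int chunkSize, boolean faultTolerant) {
        List<String> log = new ArrayList<>();
        int counter = 0; // incremented before each chunk, like MyChunkListener.beforeChunk
        for (int from = 0; from < items.size(); from += chunkSize) {
            List<Integer> chunk = items.subList(from, Math.min(from + chunkSize, items.size()));
            counter++;
            if (write(chunk, log)) {
                continue;
            }
            log.add("Chunk number " + counter + " failed");
            if (!faultTolerant) {
                return log; // the first error fails the step
            }
            // scan: rewrite each item of the failed chunk as its own single-item chunk
            for (Integer item : chunk) {
                counter++;
                if (!write(List.of(item), log)) {
                    log.add("Chunk number " + counter + " failed"); // this item is skipped
                }
            }
        }
        return log;
    }

    /** Mimics the sample writer: logs the chunk and fails if it contains 8. */
    static boolean write(List<Integer> chunk, List<String> log) {
        log.add("About to write items " + chunk);
        return !chunk.contains(8);
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        run(items, 5, false).forEach(System.out::println);
        System.out.println("---");
        run(items, 5, true).forEach(System.out::println);
    }
}
```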
I am trying to scale a Spring Batch job. The job reads 600,000 rows from a MySQL database, processes them, and updates the status of each processed row to 'Completed'.
I am using AsyncItemProcessor and AsyncItemWriter to scale the job.
Problem:
If I run the job synchronously, it takes the same time as, or less time than, running it asynchronously. I am not getting any benefit from multi-threading.
package com.example.batchprocessing;
import javax.sql.DataSource;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.integration.async.AsyncItemProcessor;
import org.springframework.batch.integration.async.AsyncItemWriter;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.support.CompositeItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.core.task.TaskExecutor;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.transaction.annotation.Transactional;
import java.util.Arrays;
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
@Autowired
public JobBuilderFactory jobBuilderFactory;
@Autowired
public StepBuilderFactory stepBuilderFactory;
@Bean
public JdbcCursorItemReader<Person> reader(DataSource dataSource) {
JdbcCursorItemReader<Person> reader = new JdbcCursorItemReader<>();
reader.setDataSource(dataSource);
reader.setSql("SELECT * from people");
reader.setRowMapper(new UserRowMapper());
return reader;
}
@Bean
public PersonItemProcessor processor() {
return new PersonItemProcessor();
}
@Bean
public AsyncItemProcessor<Person, Person> asyncItemProcessor() throws Exception {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(30);
executor.setMaxPoolSize(50);
executor.setQueueCapacity(10000);
executor.setThreadNamePrefix("BatchProcessing-");
executor.afterPropertiesSet();
AsyncItemProcessor<Person, Person> asyncProcessor = new AsyncItemProcessor<>();
asyncProcessor.setDelegate(processor());
asyncProcessor.setTaskExecutor(executor);
asyncProcessor.afterPropertiesSet();
return asyncProcessor;
}
@Bean
public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
return new JdbcBatchItemWriterBuilder<Person>()
.itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
.sql("UPDATE people set status= 'completed' where person_id= :id")
.dataSource(dataSource)
.build();
}
@Bean
public AsyncItemWriter<Person> asyncItemWriter() {
AsyncItemWriter<Person> asyncWriter = new AsyncItemWriter<>();
asyncWriter.setDelegate(writer(null));
return asyncWriter;
}
@Bean
public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
return jobBuilderFactory.get("importUserJob")
.incrementer(new RunIdIncrementer())
.listener(listener)
.flow(step1)
.end()
.build();
}
@Bean
public Step step1(JdbcBatchItemWriter<Person> writer) throws Exception {
return stepBuilderFactory.get("step1")
.<Person, Person> chunk(10000)
.reader(reader(null))
//.processor(processor())
//.writer(writer)
.processor((ItemProcessor) asyncItemProcessor())
.writer(asyncItemWriter())
//.throttleLimit(30)
.build();
}
}
What am I doing wrong? Why is the AsyncItemProcessor taking more/same time as synchronous processing? Usually it should take less time. I can see in the log multiple thread is working but eventually the end time is the same as synchronous processing.
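I agree with @5019386: AsyncItemProcessor only helps when processing an item is expensive. It runs the delegate processor for the items of a chunk in parallel, but the reader and the delegate writer behind AsyncItemWriter still run on the step's own thread, so if reading and writing dominate the run time, the total barely changes. One further suggestion: move the code that creates the ThreadPoolTaskExecutor into a separate @Bean method, so that the thread pool is created and shut down by the Spring container instead of being built inside asyncItemProcessor().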
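A minimal sketch of that suggestion, meant to replace the asyncItemProcessor() method from the question. It assumes AsyncItemProcessor from Spring Batch Integration (as in the question) and reuses the question's Person and PersonItemProcessor types; the class name AsyncProcessingConfig and the bean name batchTaskExecutor are hypothetical.

```java
import org.springframework.batch.integration.async.AsyncItemProcessor;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class AsyncProcessingConfig { // hypothetical class name

    // As a bean, the pool is initialized by the container (afterPropertiesSet)
    // and shut down when the application context closes (destroy).
    @Bean
    public ThreadPoolTaskExecutor batchTaskExecutor() { // hypothetical bean name
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(30);
        executor.setMaxPoolSize(50);
        executor.setQueueCapacity(10000);
        executor.setThreadNamePrefix("BatchProcessing-");
        return executor;
    }

    // The async processor only wraps the delegate: each item is processed on
    // the pool and a Future is handed on to the AsyncItemWriter.
    @Bean
    public AsyncItemProcessor<Person, Person> asyncItemProcessor(
            PersonItemProcessor processor,
            @Qualifier("batchTaskExecutor") TaskExecutor taskExecutor) {
        AsyncItemProcessor<Person, Person> asyncProcessor = new AsyncItemProcessor<>();
        asyncProcessor.setDelegate(processor);
        asyncProcessor.setTaskExecutor(taskExecutor);
        // afterPropertiesSet() is invoked by the container for InitializingBean beans
        return asyncProcessor;
    }
}
```

The @Qualifier keeps the injection unambiguous if the context also contains other TaskExecutor beans, for instance one used for scheduling.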
I am working on an application and have been asked to implement a scheduled Spring Batch job. I set up a configuration class that declares a ResourcelessTransactionManager @Bean, but it seems to conflict with the persistence.xml.
There is already a persistence.xml in another module, and the project compiles without errors. However, I get a NoUniqueBeanDefinitionException when I request a page that returns a view item.
This is the error:
Caused by: org.springframework.beans.factory.NoUniqueBeanDefinitionException: No qualifying bean of type [org.springframework.transaction.PlatformTransactionManager] is defined: expected single matching bean but found 2: txManager,transactionManager
at org.springframework.beans.factory.support.DefaultListableBeanFactory.getBean(DefaultListableBeanFactory.java:365)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.getBean(DefaultListableBeanFactory.java:331)
at org.springframework.transaction.interceptor.TransactionAspectSupport.determineTransactionManager(TransactionAspectSupport.java:366)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:271)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:653)
at com.mypackage.services.MyClassService$$EnhancerBySpringCGLIB$$9e8bf16f.registryEvents(<generated>)
at com.mypackage.controllers.MyClassSearchView.init(MyClassSearchView.java:75)
... 168 more
Is there a way to tell Spring Batch to use the data source defined in the persistence.xml of the other module, or is this caused by something else?
I created a separate BatchScheduler Java class, shown below, and included it in the BatchConfiguration class. I am sharing both classes.
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean;
import org.springframework.batch.support.transaction.ResourcelessTransactionManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
@Configuration
@EnableScheduling
public class BatchScheduler {
@Bean
public ResourcelessTransactionManager resourcelessTransactionManager() {
return new ResourcelessTransactionManager();
}
@Bean
public MapJobRepositoryFactoryBean mapJobRepositoryFactory(
ResourcelessTransactionManager resourcelessTransactionManager) throws Exception {
MapJobRepositoryFactoryBean factory = new MapJobRepositoryFactoryBean(resourcelessTransactionManager);
factory.afterPropertiesSet();
return factory;
}
@Bean
public JobRepository jobRepository(
MapJobRepositoryFactoryBean factory) throws Exception {
return factory.getObject();
}
@Bean
public SimpleJobLauncher jobLauncher(JobRepository jobRepository) {
SimpleJobLauncher launcher = new SimpleJobLauncher();
launcher.setJobRepository(jobRepository);
return launcher;
}
}
And here is BatchConfiguration, which defines another transaction manager, jpaTransactionManager:
import java.io.IOException;
import java.util.Date;
import java.util.Properties;
import javax.sql.DataSource;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.database.JpaItemWriter;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Import;
import org.springframework.context.support.PropertySourcesPlaceholderConfigurer;
import org.springframework.core.env.Environment;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.jdbc.datasource.DriverManagerDataSource;
import org.springframework.orm.jpa.JpaTransactionManager;
import org.springframework.orm.jpa.JpaVendorAdapter;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;
import org.springframework.orm.jpa.vendor.Database;
import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter;
import org.springframework.scheduling.TaskScheduler;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.scheduling.concurrent.ConcurrentTaskScheduler;
import org.springframework.transaction.PlatformTransactionManager;
import trade.api.common.constants.Constants;
import trade.api.entity.SecurityEntity;
import trade.api.trade.batch.item.processor.SecurityItemProcessor;
import trade.api.trade.batch.item.reader.NseSecurityReader;
import trade.api.trade.batch.notification.listener.SecurityJobCompletionNotificationListener;
import trade.api.trade.batch.tasklet.SecurityReaderTasklet;
import trade.api.vo.SecurityVO;
@Configuration
@EnableBatchProcessing
@EnableScheduling
@Import({OhlcMonthBatchConfiguration.class, OhlcWeekBatchConfiguration.class, OhlcDayBatchConfiguration.class, OhlcMinuteBatchConfiguration.class})
public class BatchConfiguration {
private static final String OVERRIDDEN_BY_EXPRESSION = null;
/*
Load the properties
*/
@Value("${database.driver}")
private String databaseDriver;
@Value("${database.url}")
private String databaseUrl;
@Value("${database.username}")
private String databaseUsername;
@Value("${database.password}")
private String databasePassword;
@Autowired
public JobBuilderFactory jobBuilderFactory;
@Autowired
public StepBuilderFactory stepBuilderFactory;
@Autowired
private JobLauncher jobLauncher;
@Bean
public TaskScheduler taskScheduler() {
return new ConcurrentTaskScheduler();
}
//second, minute, hour, day of month, month, day(s) of week
//@Scheduled(cron = "0 0 21 * * 1-5") runs at 21:00 on weekdays
@Scheduled(cron="${schedule.insert.security}")
public void importSecuritySchedule() throws Exception {
System.out.println("Job Started at :" + new Date());
JobParameters param = new JobParametersBuilder().addString("JobID",
String.valueOf(System.currentTimeMillis())).toJobParameters();
JobExecution execution = jobLauncher.run(importSecuritesJob(), param);
System.out.println("Job finished with status :" + execution.getStatus());
}
@Bean SecurityJobCompletionNotificationListener securityJobCompletionNotificationListener() {
return new SecurityJobCompletionNotificationListener();
}
//Import Equity OHLC End
//Import Equity Start
// tag::readerwriterprocessor[]
@Bean
public SecurityReaderTasklet securityReaderTasklet() {
return new SecurityReaderTasklet();
}
@Bean
@StepScope
public NseSecurityReader<SecurityVO> nseSecurityReader(@Value("#{jobExecutionContext["+Constants.SECURITY_DOWNLOAD_FILE+"]}") String pathToFile) throws IOException {
NseSecurityReader<SecurityVO> reader = new NseSecurityReader<SecurityVO>();
reader.setLinesToSkip(1);
reader.setResource(new FileSystemResource(pathToFile));
reader.setLineMapper(new DefaultLineMapper<SecurityVO>() {{
setLineTokenizer(new DelimitedLineTokenizer() {{
setNames(new String[] { "symbol", "nameOfCompany", "series", "dateOfListing", "paidUpValue", "marketLot", "isinNumber", "faceValue" });
}});
setFieldSetMapper(new BeanWrapperFieldSetMapper<SecurityVO>() {{
setTargetType(SecurityVO.class);
}});
}});
return reader;
}
@Bean
public SecurityItemProcessor processor() {
return new SecurityItemProcessor();
}
@Bean
public JpaItemWriter<SecurityEntity> writer() {
JpaItemWriter<SecurityEntity> writer = new JpaItemWriter<SecurityEntity>();
writer.setEntityManagerFactory(entityManagerFactory().getObject());
return writer;
}
// end::readerwriterprocessor[]
// tag::jobstep[]
@Bean
public Job importSecuritesJob() throws IOException {
return jobBuilderFactory.get("importSecuritesJob")
.incrementer(new RunIdIncrementer())
.listener(securityJobCompletionNotificationListener())
.start(downloadSecurityStep())
.next(insertSecurityStep())
.build();
}
@Bean
public Step downloadSecurityStep() throws IOException {
return stepBuilderFactory.get("downloadSecurityStep")
.tasklet(securityReaderTasklet())
.build();
}
@Bean
public Step insertSecurityStep() throws IOException {
return stepBuilderFactory.get("insertSecurityStep")
.transactionManager(jpaTransactionManager())
.<SecurityVO, SecurityEntity> chunk(100)
.reader(nseSecurityReader(OVERRIDDEN_BY_EXPRESSION))
.processor(processor())
.writer(writer())
.build();
}
// end::jobstep[]
//Import Equity End
@Bean
public DataSource dataSource() {
DriverManagerDataSource dataSource = new DriverManagerDataSource();
dataSource.setDriverClassName(databaseDriver);
dataSource.setUrl(databaseUrl);
dataSource.setUsername(databaseUsername);
dataSource.setPassword(databasePassword);
return dataSource;
}
@Bean
public LocalContainerEntityManagerFactoryBean entityManagerFactory() {
LocalContainerEntityManagerFactoryBean lef = new LocalContainerEntityManagerFactoryBean();
lef.setPackagesToScan("trade.api.entity");
lef.setDataSource(dataSource());
lef.setJpaVendorAdapter(jpaVendorAdapter());
lef.setJpaProperties(new Properties());
return lef;
}
@Bean
public JpaVendorAdapter jpaVendorAdapter() {
HibernateJpaVendorAdapter jpaVendorAdapter = new HibernateJpaVendorAdapter();
jpaVendorAdapter.setDatabase(Database.MYSQL);
jpaVendorAdapter.setGenerateDdl(true);
jpaVendorAdapter.setShowSql(false);
jpaVendorAdapter.setDatabasePlatform("org.hibernate.dialect.MySQLDialect");
return jpaVendorAdapter;
}
@Bean
@Qualifier("jpaTransactionManager")
public PlatformTransactionManager jpaTransactionManager() {
return new JpaTransactionManager(entityManagerFactory().getObject());
}
@Bean
public static PropertySourcesPlaceholderConfigurer dataProperties(Environment environment) throws IOException {
String[] activeProfiles = environment.getActiveProfiles();
final PropertySourcesPlaceholderConfigurer ppc = new PropertySourcesPlaceholderConfigurer();
ppc.setLocations(new PathMatchingResourcePatternResolver().getResources("classpath*:application-"+activeProfiles[0]+".properties"));
return ppc;
}
//// Import Security End
}
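Problem solved. There was a PlatformTransactionManager bean in another configuration file. I marked it as @Primary, and now the problem is fixed: when a single PlatformTransactionManager is requested by type, as happens for a @Transactional method that names no transaction manager, Spring picks the @Primary bean instead of throwing NoUniqueBeanDefinitionException. Thanks everyone for the help.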
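A minimal sketch of the @Primary fix, assuming the bean in the other configuration file is the txManager named in the stack trace and that it is a JpaTransactionManager; the class name WebTransactionConfig and the JPA type are assumptions, so substitute the real ones.

```java
import javax.persistence.EntityManagerFactory;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.orm.jpa.JpaTransactionManager;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class WebTransactionConfig { // hypothetical name for "the other configuration file"

    // @Primary: when exactly one PlatformTransactionManager is requested by type,
    // this bean wins over the other candidates instead of causing ambiguity.
    @Primary
    @Bean
    public PlatformTransactionManager txManager(EntityManagerFactory entityManagerFactory) {
        // Assumption: the web module uses JPA; replace with its actual manager type.
        return new JpaTransactionManager(entityManagerFactory);
    }
}
```

The other transaction managers keep their bean names, so code that needs one of them can still select it explicitly, for example with @Transactional("transactionManager"), or at step level as insertSecurityStep() already does with .transactionManager(jpaTransactionManager()).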