Spring Batch AsyncItemProcessor not processing in parallel - spring

I have an AsyncItemProcessor which I want to run in parallel with the following config, but the processing is not happening in parallel.
@Configuration
@EnableBatchProcessing
@EnableAsync
public class JobConfig {

    @Autowired
    private JobBuilderFactory jobBuilder;

    @Autowired
    private StepBuilderFactory stepBuilder;

    @Autowired
    @Qualifier("writer")
    private ItemWriter writer;

    @Bean
    @JobScope
    public ItemProcessor itemProcessor() {
        ItemProcessor itemProcessor = new ItemProcessor();
        return itemProcessor;
    }

    @Bean
    @JobScope
    public AsyncItemProcessor asyncItemProcessor() throws IOException {
        AsyncItemProcessor asyncItemProcessor = new AsyncItemProcessor();
        asyncItemProcessor.setDelegate(itemProcessor());
        asyncItemProcessor.setTaskExecutor(getAsyncExecutor());
        asyncItemProcessor.afterPropertiesSet();
        return asyncItemProcessor;
    }

    @Bean(name = "asyncExecutor")
    public TaskExecutor getAsyncExecutor() {
        SimpleAsyncTaskExecutor simpleAsyncTaskExecutor = new SimpleAsyncTaskExecutor() {
            @Override
            protected void doExecute(Runnable task) {
                final JobExecution jobExecution = JobSynchronizationManager.getContext().getJobExecution();
                super.doExecute(() -> {
                    JobSynchronizationManager.register(jobExecution);
                    try {
                        task.run();
                    } finally {
                        JobSynchronizationManager.close();
                    }
                });
            }
        };
        simpleAsyncTaskExecutor.setThreadNamePrefix("processing 1-");
        simpleAsyncTaskExecutor.setConcurrencyLimit(100);
        return simpleAsyncTaskExecutor;
    }

    @Bean
    @JobScope
    public AsyncItemWriter asyncItemWriter() throws IOException {
        AsyncItemWriter asyncItemWriter = new AsyncItemWriter<>();
        asyncItemWriter.setDelegate(writer);
        asyncItemWriter.afterPropertiesSet();
        return asyncItemWriter;
    }

    @Bean
    @JobScope
    public FlatFileItemReader<Result> requestFileReader() {
        DefaultLineMapper lineMapper = new DefaultLineMapper();
        ......
        FlatFileItemReader<Result> itemReader = new FlatFileItemReader<>();
        itemReader.setLineMapper(lineMapper);
        return itemReader;
    }

    @Bean
    public Step simpleFileStep() throws IOException {
        return stepBuilder.get("simpleFileStep").chunk(100).reader(fileReader).processor(asyncItemProcessor())
                .writer(asyncItemWriter()).build();
    }

    @Bean(name = "customExecutionContext")
    @JobScope
    public ExecutionContext customExecutionContext() {
        ExecutionContext executionContext = new ExecutionContext();
        return executionContext;
    }
}
The processor class:
@JobScope
public class RequestProcessor implements ItemProcessor<Result, List<Item>> {

    @Value("#{jobExecution}")
    private JobExecution jobExecution;

    @Autowired
    @Qualifier("customExecutionContext")
    private ExecutionContext storedContext;

    @Override
    public List<Item> process(Result result) throws Exception {
        Date start = new Date();
        // Processing logic
        Date end = new Date();
        long diff = end.getTime() - start.getTime();
        log.info("Time taken to process the items:" + TimeUnit.SECONDS.convert(diff, TimeUnit.MILLISECONDS));
        return items;
    }
}
I want to process a file with 1000 records in parallel in this scenario, but only 100 items are processed and written at a time.
Please let me know if there is some issue with the config.
Also, after processing each chunk of 100 items, there is a delay of about 2 minutes before the next chunk is processed. During that time, I can only see the following logs:
[GC concurrent-string-deduplication, 16.2K->0.0B(16.2K), avg 88.7%, 0.0000820 secs]
[GC pause (G1 Evacuation Pause) (young) 687M->302M(768M), 0.0138859 secs]

Concurrency is different than parallelism. The AsyncItemProcessor is designed to work hand in hand with the AsyncItemWriter to process items concurrently. In your case, a single chunk of 100 items will be processed concurrently, but chunks will not be processed in parallel. It is still a serial execution of chunks, but every chunk is processed concurrently by different threads from the task executor.
There is no way to process chunks in parallel in Spring Batch. What Spring Batch provides though is partitioning, where partitions can be processed in parallel (either with local threads, or with remote JVMs).
So what you can do is partition your input (for example with 1000 items in each partition) and configure a partitioned step to process partitions in parallel. Note that each partition can further process chunks concurrently as well. You can find more details and a code example in the reference documentation here: Scaling and Parallel Processing.
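To make that concrete, here is a minimal, hedged sketch of a locally partitioned step for a file-based input. It assumes the input has already been split into several part files (the file pattern, bean names, and grid size below are illustrative, not taken from your config); each partition gets its own reader and runs on its own thread, while chunks inside a partition can still be processed concurrently by the async processor/writer:

// Sketch only: these @Bean methods would live inside the existing JobConfig class.
@Bean
public Partitioner filePartitioner() throws IOException {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    // each matching resource becomes one partition (stored under "fileName" in the step execution context)
    partitioner.setResources(new PathMatchingResourcePatternResolver()
            .getResources("file:/data/requests-*.csv")); // hypothetical location/pattern
    return partitioner;
}

@Bean
@StepScope
public FlatFileItemReader<Result> partitionedReader(
        @Value("#{stepExecutionContext['fileName']}") Resource file) {
    FlatFileItemReader<Result> reader = new FlatFileItemReader<>();
    reader.setResource(file);
    DefaultLineMapper lineMapper = new DefaultLineMapper();
    // ... same tokenizer / field-set mapper setup as in requestFileReader()
    reader.setLineMapper(lineMapper);
    return reader;
}

@Bean
public Step workerStep() throws IOException {
    // each partition still processes its own chunks, concurrently via the async processor/writer
    return stepBuilder.get("workerStep").chunk(100)
            .reader(partitionedReader(null))
            .processor(asyncItemProcessor())
            .writer(asyncItemWriter())
            .build();
}

@Bean
public Step partitionedFileStep() throws IOException {
    // partitions run in parallel on the task executor; chunks within a partition remain serial
    return stepBuilder.get("partitionedFileStep")
            .partitioner("workerStep", filePartitioner())
            .step(workerStep())
            .gridSize(4)
            .taskExecutor(new SimpleAsyncTaskExecutor("partition-"))
            .build();
}

The job would then start partitionedFileStep instead of simpleFileStep; each worker reads only its own part file, so partitions can genuinely run in parallel.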

Related

How to restart a Spring Batch job automatically when abnormal exceptions arise?

Before raising this question I went through many links like: How can you restart a failed spring batch job and let it pick up where it left off?, Spring Batch restart uncompleted jobs from the same execution and step, and https://learning.oreilly.com/library/view/the-definitive-guide/9781484237243/html/215885_2_En_6_Chapter.xhtml, but they haven't solved my query yet.
I am using a Spring Boot Batch application. In my project I have 3 jobs which run sequentially on a schedule (daily at 2 PM), wrapped up in a single method, and each job has 5 steps which perform chunk-based processing (no tasklets).
I often run into issues like network fluctuations or the database being down; on such abnormal exceptions Spring Batch stops the whole job, and I end up with data loss since there is no way to automatically restart from where it failed.
I want to develop the ability to automatically restart the batch jobs when any kind of abnormal exception arises. Is there any way to do that?
I've configured the batch jobs as shown below.
MyApplication.java
@SpringBootApplication
@EnableBatchProcessing
@EnableScheduling
public class MyApplication {

    public static void main(String[] args) {
        SpringApplication.run(MyApplication.class, args);
    }
}
MyJob.java
@Configuration
public class MyJob {

    @Value("${skip.limit}")
    private Integer skipLimit;

    @Value("${chunk.size}")
    private Integer chunkSize;

    @Bean(name = "myJobCache")
    public CacheManager myJobCache() {
        return new ConcurrentMapCacheManager();
    }

    @Bean("customerJob")
    public Job customerJob(JobBuilderFactory jobBuilderFactory,
                           StepBuilderFactory stepBuilderFactory,
                           JdbcCursorItemReader<Customer> customerReader,
                           JdbcCursorItemReader<Department> departmentReader,
                           JdbcCursorItemReader<Stock> stockItemReader,
                           JdbcCursorItemReader<Advisory> advisoryItemReder) throws Exception {
        return jobBuilderFactory.get("customerJob")
                .incrementer(new RunIdIncrementer())
                .start(customerStep(stepBuilderFactory, customerReader))
                .next(departmentStep(stepBuilderFactory, departmentReader))
                .next(stackStep(stepBuilderFactory))
                .......
                .......
                .......
                .listener(customerListener())
                .build();
    }

    @Bean
    public Step customerStep(StepBuilderFactory stepBuilderFactory,
                             JdbcCursorItemReader<Customer> customerReader) {
        return stepBuilderFactory.get("customerStep")
                .<Customer, NewCustomer>chunk(chunkSize)
                .reader(customerReader)
                .processor(customerProcessor())
                .writer(customerWriter())
                .faultTolerant()
                .skip(Exception.class)
                .skipLimit(skipLimit)
                .listener(customerSkipListener())
                .listener(customerStepListener())
                .build();
    }

    @Bean
    public CustomerProcessor customerProcessor() {
        return new CustomerProcessor(myJobCache());
    }

    @Bean
    public CustomerWriter customerWriter() {
        return new CustomerWriter();
    }

    // Like this all other jobs are configured
}
MyScheduler.java
public class MyScheduler {

    @Autowired
    private JobLauncher customerJobLauncher;

    @Autowired
    private JobLauncher abcJobLauncher;

    @Autowired
    private JobLauncher xyzJobLauncher;

    @Autowired
    @Qualifier(value = "customerJob")
    private Job customerJob;

    @Autowired
    @Qualifier(value = "abcJob")
    private Job abcJob;

    @Autowired
    @Qualifier(value = "xyzJob")
    private Job xyzJob;

    @Scheduled(cron = "0 0 */1 * * *") // run at every hour for testing
    public void handle() {
        JobParameters params = new JobParametersBuilder()
                .addString("cust.job.id", String.valueOf(System.currentTimeMillis()))
                .addDate("cust.job.date", new Date()).toJobParameters();
        long diff = 0;
        try {
            JobExecution jobExecution = customerJobLauncher.run(customerJob, params);
            Date start = jobExecution.getCreateTime();

            JobParameters job2Params = new JobParametersBuilder()
                    .addString("abc.job.id", String.valueOf(System.currentTimeMillis()))
                    .addDate("abc.job.date", new Date()).toJobParameters();
            JobExecution job2Execution = abcJobLauncher.run(abcJob, job2Params);

            JobParameters job3Params = new JobParametersBuilder()
                    .addString("xyz.job.id", String.valueOf(System.currentTimeMillis()))
                    .addDate("xyx.job.date", new Date()).toJobParameters();
            JobExecution job3Execution = xyzJobLauncher.run(xyzJob, job3Params);
            Date end = job3Execution.getEndTime();

            diff = end.getTime() - start.getTime();
            log.info(JobExecutionTimeCalculate.getJobExecutionTime(diff));
        } catch (JobExecutionAlreadyRunningException | JobRestartException | JobInstanceAlreadyCompleteException
                | JobParametersInvalidException e) {
            log.error("Job Failed : " + e.getMessage());
        }
    }
}
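One direction that is often suggested for this kind of requirement, shown here only as a hedged sketch (the component, schedule, and look-back window are assumptions, not code from the question), is to periodically look up FAILED executions via the JobExplorer and restart them with JobOperator.restart, which resumes a restartable job from its last committed point:

// Sketch only: assumes a JobExplorer and JobOperator are available in the context
// (Spring Boot typically auto-configures both when batch is on the classpath).
@Component
public class FailedJobRestarter {

    @Autowired
    private JobExplorer jobExplorer;

    @Autowired
    private JobOperator jobOperator;

    // hypothetical schedule: check for failed executions every 10 minutes
    @Scheduled(fixedDelay = 600000)
    public void restartFailedJobs() throws Exception {
        for (String jobName : jobExplorer.getJobNames()) {
            // look at the 10 most recent instances of each job (arbitrary window)
            for (JobInstance instance : jobExplorer.getJobInstances(jobName, 0, 10)) {
                for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
                    if (execution.getStatus() == BatchStatus.FAILED) {
                        // resumes the failed execution from the last committed chunk/step
                        jobOperator.restart(execution.getId());
                    }
                }
            }
        }
    }
}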

How to use spring transaction support with Spring Batch

I am trying to use Spring Batch to read data from a .dat file and persist it into a database. My requirement says to either insert all of the data or insert none of it into the table, i.e. atomicity. However, using Spring Batch I'm not able to achieve this: it reads the data in chunks and inserts records as long as they are fine. If at some point a record is inappropriate and a DB exception is thrown, I want a complete rollback, which is not happening. Let's say we get an error at the 2051st record: my code saves 2050 records, but I want a complete rollback, and if all the data is good then all N records should be persisted. Thanks in advance for any help or relevant approach that may solve my issue.
NOTE: I have already used Spring's @Transactional annotation on the caller method but it's not working, and I'm reading data in a chunk size of 10 items.
MyConfiguration.java
@Configuration
public class MyConfiguration {

    @Autowired
    JobBuilderFactory jobBuilderFactory;

    @Autowired
    StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Qualifier("MyCompletionListener")
    JobCompletionNotificationListener jobCompletionNotificationListener;

    @StepScope
    @Bean(name = "MyReader")
    public FlatFileItemReader<InputMapperDTO> reader(@Value("#{jobParameters['fileName']}") String fileName) throws IOException {
        FlatFileItemReader<InputMapperDTO> newBean = new FlatFileItemReader<>();
        newBean.setName("MyReader");
        newBean.setResource(new InputStreamResource(FileUtils.openInputStream(new File(fileName))));
        newBean.setLineMapper(lineMapper());
        newBean.setLinesToSkip(1);
        return newBean;
    }

    @Bean(name = "MyLineMapper")
    public DefaultLineMapper<InputMapperDTO> lineMapper() {
        DefaultLineMapper<InputMapperDTO> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(lineTokenizer());
        Reader reader = new Reader();
        lineMapper.setFieldSetMapper(reader);
        return lineMapper;
    }

    @Bean(name = "MyTokenizer")
    public DelimitedLineTokenizer lineTokenizer() {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setDelimiter("|");
        tokenizer.setNames("InvestmentAccountUniqueIdentifier", "BaseCurrencyUniqueIdentifier",
                "OperatingCurrencyUniqueIdentifier", "PricingHierarchyUniqueIdentifier", "InvestmentAccountNumber",
                "DummyAccountIndicator", "InvestmentAdvisorCompanyNumberLegacy", "HighNetWorthAccountTypeCode");
        tokenizer.setIncludedFields(0, 5, 7, 13, 29, 40, 49, 75);
        return tokenizer;
    }

    @Bean(name = "MyBatchProcessor")
    public ItemProcessor<InputMapperDTO, FinalDTO> processor() {
        return new Processor();
    }

    @Bean(name = "MyWriter")
    public ItemWriter<FinalDTO> writer() {
        return new Writer();
    }

    @Bean(name = "MyStep")
    public Step step1() throws IOException {
        return stepBuilderFactory.get("MyStep")
                .<InputMapperDTO, FinalDTO>chunk(10)
                .reader(this.reader(null))
                .processor(this.processor())
                .writer(this.writer())
                .build();
    }

    @Bean(name = "MyJob")
    public Job importUserJob(@Autowired @Qualifier("MyStep") Step step1) {
        return jobBuilderFactory
                .get("MyJob" + new Date())
                .incrementer(new RunIdIncrementer())
                .listener(jobCompletionNotificationListener)
                .flow(step1)
                .end()
                .build();
    }
}
Writer.java
public class Writer implements ItemWriter<FinalDTO> {

    @Autowired
    SomeRepository someRepository;

    @Override
    public void write(List<? extends FinalDTO> listOfObjects) throws Exception {
        someRepository.saveAll(listOfObjects);
    }
}
JobCompletionNotificationListener.java
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            System.err.println("****************************************");
            System.err.println("*****     Batch Job Completed     ******");
            System.err.println("****************************************");
        } else {
            System.err.println("****************************************");
            System.err.println("*****       Batch Job Failed      ******");
            System.err.println("****************************************");
        }
    }
}
MyCallerMethod
@Transactional
public String processFile(String datFile) throws JobExecutionAlreadyRunningException, JobRestartException,
        JobInstanceAlreadyCompleteException, JobParametersInvalidException {
    long st = System.currentTimeMillis();
    JobParametersBuilder builder = new JobParametersBuilder();
    builder.addString("fileName", datFile);
    builder.addDate("date", new Date());
    jobLauncher.run(job, builder.toJobParameters());
    System.err.println("****************************************");
    System.err.println("***** Total time consumed = " + (System.currentTimeMillis() - st) + " ******");
    System.err.println("****************************************");
    return response;
}
The operation I was looking for is not provided by Spring Batch out of the box. For my requirement, I implemented a custom delete which cleans up the database upon failure in any step.
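For illustration, a minimal sketch of what such a cleanup could look like (the deleteByFileName repository method and the idea of tagging written rows with the fileName job parameter are assumptions, not part of the original answer):

public class RollbackOnFailureListener extends JobExecutionListenerSupport {

    @Autowired
    SomeRepository someRepository;

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.FAILED) {
            // assumes each row written by the Writer was tagged with the fileName job parameter,
            // so everything inserted by a failed run can be wiped out in one delete
            String fileName = jobExecution.getJobParameters().getString("fileName");
            someRepository.deleteByFileName(fileName); // hypothetical repository method
        }
    }
}

Registered as an additional job listener, this keeps the chunk-based commits for the happy path but removes all partially inserted data when the job ends in a FAILED state.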

Spring batch JdbcPagingItemReader not able to read all events

I have a Spring Batch application like below (the table name and query are edited to generic names).
When I execute this program, it is able to read only 7500 events, i.e. 3 times the chunk size, and is not able to read the remaining records from the Oracle database. The table contains 50 million records which I need to copy to another NoSQL database.
@EnableBatchProcessing
@SpringBootApplication
@EnableAutoConfiguration
public class MultiThreadPagingApp extends DefaultBatchConfigurer {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    public DataSource dataSource;

    @Bean
    public DataSource dataSource() {
        final DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("oracle.jdbc.OracleDriver");
        dataSource.setUrl("jdbc:oracle:thin:@***********");
        dataSource.setUsername("user");
        dataSource.setPassword("password");
        return dataSource;
    }

    @Override
    public void setDataSource(DataSource dataSource) {}

    @Bean
    @StepScope
    ItemReader<UserModel> dbReader() throws Exception {
        JdbcPagingItemReader<UserModel> reader = new JdbcPagingItemReader<UserModel>();
        final SqlPagingQueryProviderFactoryBean sqlPagingQueryProviderFactoryBean = new SqlPagingQueryProviderFactoryBean();
        sqlPagingQueryProviderFactoryBean.setDataSource(dataSource);
        sqlPagingQueryProviderFactoryBean.setSelectClause("select * ");
        sqlPagingQueryProviderFactoryBean.setFromClause("from user");
        sqlPagingQueryProviderFactoryBean.setWhereClause("where id>0");
        sqlPagingQueryProviderFactoryBean.setSortKey("name");
        reader.setQueryProvider(sqlPagingQueryProviderFactoryBean.getObject());
        reader.setDataSource(dataSource);
        reader.setPageSize(2500);
        reader.setRowMapper(new BeanPropertyRowMapper<>(UserModel.class));
        reader.afterPropertiesSet();
        reader.setSaveState(true);
        System.out.println("Reading users anonymized in chunks of {}" + 2500);
        return reader;
    }

    @Bean
    public Dbwriter writer() {
        return new Dbwriter(); // I had another class for this
    }

    @Bean
    public Step step1() throws Exception {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setCorePoolSize(4);
        taskExecutor.setMaxPoolSize(10);
        taskExecutor.afterPropertiesSet();
        return this.stepBuilderFactory.get("step1")
                .<UserModel, UserModel>chunk(2500)
                .reader(dbReader())
                .writer(writer())
                .taskExecutor(taskExecutor)
                .build();
    }

    @Bean
    public Job multithreadedJob() throws Exception {
        return this.jobBuilderFactory.get("multithreadedJob")
                .start(step1())
                .build();
    }

    @Bean
    public PlatformTransactionManager getTransactionManager() {
        return new ResourcelessTransactionManager();
    }

    @Bean
    public JobRepository getJobRepo() throws Exception {
        return new MapJobRepositoryFactoryBean(getTransactionManager()).getObject();
    }

    public static void main(String[] args) {
        SpringApplication.run(MultiThreadPagingApp.class, args);
    }
}
Can you help me understand how I can efficiently read all the records using Spring Batch, or suggest any other approach to handle this? I tried the approach mentioned here: http://techdive.in/java/jdbc-handling-huge-resultset
It took 120 minutes to read and save all the records with a single-threaded application. Since Spring Batch is a good fit for this, I assume we can handle this scenario in much less time.
You are setting the saveState flag to true (by the way, it should be set before calling afterPropertiesSet) on a JdbcPagingItemReader and using this reader in a multi-threaded step. However, the documentation says this flag should be set to false in a multi-threaded context.
Multi-threading with database readers is usually not the best option; I would recommend using partitioning in your case.
I had the same problem and fixed it by changing my sort key. I realized that the previous one wasn't unique for every record, so I replaced it with the ID column, which is different for every record in the database.
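Putting both suggestions together, a minimal sketch of the corrected reader (a unique id sort key taken from the question's own where clause, and saveState disabled before afterPropertiesSet for the multi-threaded step):

@Bean
@StepScope
ItemReader<UserModel> dbReader() throws Exception {
    JdbcPagingItemReader<UserModel> reader = new JdbcPagingItemReader<>();
    SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
    provider.setDataSource(dataSource);
    provider.setSelectClause("select *");
    provider.setFromClause("from user");
    provider.setWhereClause("where id > 0");
    provider.setSortKey("id");            // unique sort key, so pages never overlap or skip rows
    reader.setQueryProvider(provider.getObject());
    reader.setDataSource(dataSource);
    reader.setPageSize(2500);
    reader.setRowMapper(new BeanPropertyRowMapper<>(UserModel.class));
    reader.setSaveState(false);           // required for a multi-threaded step, set before afterPropertiesSet()
    reader.afterPropertiesSet();
    return reader;
}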

Spring Batch: Reading from a JMS queue, step does not end

I have a simple batch job which reads from a JMS queue (ActiveMQ) and writes to a file. The batch job runs as expected and writes to the file, honoring the commit interval which has been set to 10,000.
There are 2 observations in this regard:
The batch job reading the queue does not end.
I see that all the messages from the queue have been consumed, but the last chunk gets written to the file only when new messages are pushed to the JMS queue and the commit interval is met.
Is this the expected behavior? I would like to schedule the batch job and consume and write all the messages present in the queue at that point in time. Any advice?
@Autowired
private JobBuilderFactory jobBuilderFactory;

@Bean
public TransactionAwareConnectionFactoryProxy activeMQConnectionFactory() {
    ActiveMQConnectionFactory amqConnectionFactory = new ActiveMQConnectionFactory(ActiveMQConnection.DEFAULT_BROKER_URL);
    TransactionAwareConnectionFactoryProxy activeMQConnectionFactory = new TransactionAwareConnectionFactoryProxy(amqConnectionFactory);
    return activeMQConnectionFactory;
}

@Bean
public ActiveMQQueue defaultQueue() {
    return new ActiveMQQueue("firstQueue");
}

@Bean
public PlatformTransactionManager transactionManager() {
    return new ResourcelessTransactionManager();
}

@Bean
public JobRepository jobRepository(PlatformTransactionManager transactionManager) throws Exception {
    return new MapJobRepositoryFactoryBean(transactionManager).getObject();
}

@Bean
@DependsOn("jobRepository")
public SimpleJobLauncher simpleJobLauncher(JobRepository jobRepository) {
    SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
    simpleJobLauncher.setJobRepository(jobRepository);
    return simpleJobLauncher;
}
If I set the receiveTimeout to a smaller number, not all messages are consumed, so it is set to the upper limit.
@Bean
@DependsOn(value = { "activeMQConnectionFactory", "defaultQueue" })
public JmsTemplate firstQueueTemplate(ActiveMQQueue defaultQueue, TransactionAwareConnectionFactoryProxy activeMQConnectionFactory) {
    JmsTemplate firstQueueTemplate = new JmsTemplate(activeMQConnectionFactory);
    firstQueueTemplate.setDefaultDestination(defaultQueue);
    firstQueueTemplate.setSessionTransacted(true);
    firstQueueTemplate.setReceiveTimeout(Long.MAX_VALUE);
    return firstQueueTemplate;
}
Config for the batch job.
@Bean
public JmsItemReader<String> jmsItemReader(JmsTemplate firstQueueTemplate) {
    JmsItemReader<String> jmsItemReader = new JmsItemReader<>();
    jmsItemReader.setJmsTemplate(firstQueueTemplate);
    jmsItemReader.setItemType(String.class);
    return jmsItemReader;
}

@Bean
public ItemWriter<String> flatFileItemWriter() {
    FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource("/mypath/output.csv"));
    writer.setLineAggregator(new PassThroughLineAggregator<String>());
    return writer;
}

@Bean
@DependsOn(value = { "jmsItemReader", "jmsItemWriter", "jobRepository", "transactionManager" })
public Step queueReaderStep(JmsItemReader<String> jmsItemReader, ItemWriter<String> flatFileItemWriter, JobRepository jobRepository,
        PlatformTransactionManager transactionManager) throws Exception {
    StepBuilderFactory stepBuilderFactory = new StepBuilderFactory(jobRepository, transactionManager);
    AbstractTaskletStepBuilder<SimpleStepBuilder<String, String>> step = stepBuilderFactory.get("queueReaderStep").<String, String> chunk(10000)
            .reader(jmsItemReader).writer(flatFileItemWriter);
    return step.build();
}

@Bean
@DependsOn(value = { "jobRepository", "queueReaderStep" })
public Job jsmReaderJob(JobRepository jobRepository, Step queueReaderStep) {
    return this.jobBuilderFactory.get("jsmReaderJob").repository(jobRepository).incrementer(new RunIdIncrementer())
            .flow(queueReaderStep).end().build();
}
The JmsItemReader provided by Spring Batch is really meant as more of a template or example since, as you note, it never returns null, so the step never ends. You'd need to write something that returns null (or otherwise signals completion) when a given message, or the absence of one, indicates that the step is complete.
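For example, a minimal sketch of such a reader: it delegates to the JmsTemplate with a short receive timeout (the 1-second value and the class name are assumptions) and returns null once the queue is drained, which lets the step finish:

public class DrainingJmsItemReader implements ItemReader<String> {

    private final JmsTemplate jmsTemplate;

    public DrainingJmsItemReader(JmsTemplate jmsTemplate) {
        this.jmsTemplate = jmsTemplate;
        // a short timeout instead of Long.MAX_VALUE, so an empty queue ends the step
        this.jmsTemplate.setReceiveTimeout(1000L);
    }

    @Override
    public String read() {
        // receiveAndConvert() returns null when no message arrives within the timeout,
        // which Spring Batch interprets as "no more input" and ends the step
        Object message = jmsTemplate.receiveAndConvert();
        return (String) message;
    }
}

The trade-off is choosing a timeout long enough to tolerate broker latency but short enough that the scheduled job terminates promptly once the backlog is consumed.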

Spring boot batch partitioning JdbcCursorItemReader error

I have been unable to get this to work even after following Victor Jabor's very comprehensive blog example. I have followed his configuration as he described and used all the latest dependencies. Like Victor, I am trying to read from one DB and write to another. I have this working without partitioning, but I need partitioning to improve performance, since I need to be able to read 5 to 10 million rows within 5 minutes.
The following seems to work:
1) ColumnRangePartitioner
2) TaskExecutorPartitionHandler builds the correct number of step tasks based on the gridsize and spawns the correct number of threads
3) setPreparedStatementSetter from the stepExecution set by the ColumnRangePartitioner.
But when I run the application I get errors from JdbcCursorItemReader which are not consistent and which I don't understand. As a last resort I will have to debug the JdbcCursorItemReader. I am hoping to get some help before that, and hopefully it will turn out to be a configuration issue.
ERROR:
Caused by: java.sql.SQLException: Exhausted Resultset
at oracle.jdbc.driver.OracleResultSetImpl.getInt(OracleResultSetImpl.java:901) ~[ojdbc6-11.2.0.2.0.jar:11.2.0.2.0]
at org.springframework.jdbc.support.JdbcUtils.getResultSetValue(JdbcUtils.java:160) ~[spring-jdbc-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.jdbc.core.BeanPropertyRowMapper.getColumnValue(BeanPropertyRowMapper.java:370) ~[spring-jdbc-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.jdbc.core.BeanPropertyRowMapper.mapRow(BeanPropertyRowMapper.java:291) ~[spring-jdbc-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.batch.item.database.JdbcCursorItemReader.readCursor(JdbcCursorItemReader.java:139) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar:3.0.7.RELEASE]
Configuration classes:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Bean
    public ItemProcessor<Archive, Archive> processor(@Value("${etl.region}") String region) {
        return new ArchiveProcessor(region);
    }

    @Bean
    public ItemWriter<Archive> writer(@Qualifier(value = "postgres") DataSource dataSource) {
        JdbcBatchItemWriter<Archive> writer = new JdbcBatchItemWriter<>();
        writer.setSql("insert into tdw_src.archive (id) " +
                "values (:id)");
        writer.setDataSource(dataSource);
        writer.setItemSqlParameterSourceProvider(new org.springframework.batch.item.database.
                BeanPropertyItemSqlParameterSourceProvider<>());
        return writer;
    }

    @Bean
    public Partitioner archivePartitioner(@Qualifier(value = "gmDataSource") DataSource dataSource,
                                          @Value("ROWNUM") String column,
                                          @Value("archive") String table,
                                          @Value("${gm.datasource.username}") String schema) {
        return new ColumnRangePartitioner(dataSource, column, schema + "." + table);
    }

    @Bean
    public Job archiveJob(JobBuilderFactory jobs, Step partitionerStep, JobExecutionListener listener) {
        return jobs.get("archiveJob")
                .preventRestart()
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .start(partitionerStep)
                .build();
    }

    @Bean
    public Step partitionerStep(StepBuilderFactory stepBuilderFactory,
                                Partitioner archivePartitioner,
                                Step step1,
                                @Value("${spring.batch.gridsize}") int gridSize) {
        return stepBuilderFactory.get("partitionerStep")
                .partitioner(step1)
                .partitioner("step1", archivePartitioner)
                .gridSize(gridSize)
                .taskExecutor(taskExecutor())
                .build();
    }

    @Bean(name = "step1")
    public Step step1(StepBuilderFactory stepBuilderFactory, ItemReader<Archive> customReader,
                      ItemWriter<Archive> writer, ItemProcessor<Archive, Archive> processor) {
        return stepBuilderFactory.get("step1")
                .listener(customReader)
                .<Archive, Archive>chunk(5)
                .reader(customReader)
                .processor(processor)
                .writer(writer)
                .build();
    }

    @Bean
    public TaskExecutor taskExecutor() {
        return new SimpleAsyncTaskExecutor();
    }

    @Bean
    public SimpleJobLauncher getJobLauncher(JobRepository jobRepository) {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(jobRepository);
        return jobLauncher;
    }
}
Custom Reader:-
public class CustomReader extends JdbcCursorItemReader<Archive> implements StepExecutionListener {

    private StepExecution stepExecution;

    @Autowired
    public CustomReader(@Qualifier(value = "gmDataSource") DataSource geomangerDataSource,
                        @Value("${gm.datasource.username}") String schema) throws Exception {
        super();
        this.setSql("SELECT TMP.* FROM (SELECT ROWNUM AS ID_PAGINATION, id FROM " + schema + ".archive) TMP " +
                "WHERE TMP.ID_PAGINATION >= ? AND TMP.ID_PAGINATION <= ?");
        this.setDataSource(geomangerDataSource);
        BeanPropertyRowMapper<Archive> rowMapper = new BeanPropertyRowMapper<>(Archive.class);
        this.setRowMapper(rowMapper);
        this.setFetchSize(5);
        this.setSaveState(false);
        this.setVerifyCursorPosition(false);
        // not sure if this is needed? this.afterPropertiesSet();
    }

    @Override
    public synchronized void beforeStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
        this.setPreparedStatementSetter(getPreparedStatementSetter());
    }

    private PreparedStatementSetter getPreparedStatementSetter() {
        ListPreparedStatementSetter listPreparedStatementSetter = new ListPreparedStatementSetter();
        List<Integer> list = new ArrayList<>();
        list.add(stepExecution.getExecutionContext().getInt("minValue"));
        list.add(stepExecution.getExecutionContext().getInt("maxValue"));
        listPreparedStatementSetter.setParameters(list);
        LOGGER.debug("getPreparedStatementSetter list: " + list);
        return listPreparedStatementSetter;
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return null;
    }
}
I've got this all working.
First, I needed to order the select statement in my CustomReader so the ROWNUM values remain the same for all threads, and lastly I had to scope the beans by using @StepScope on each bean used in the step.
In reality I won't be using ROWNUM, since requiring an ordering reduces performance; I will use a PK column instead to get the best performance.
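For illustration, a rough sketch of those two changes (the ORDER BY column id comes from the question's own query; the bean signature and wiring are otherwise illustrative, not the poster's exact code):

@Bean
@StepScope  // each partition thread gets its own reader instance with its own cursor
public CustomReader customReader(@Qualifier("gmDataSource") DataSource dataSource,
                                 @Value("${gm.datasource.username}") String schema) throws Exception {
    CustomReader reader = new CustomReader(dataSource, schema);
    // ROWNUM is assigned over an ordered inner query, so every thread sees the same
    // ID_PAGINATION -> row mapping and the partition ranges never overlap
    reader.setSql("SELECT TMP.* FROM ("
            + " SELECT ROWNUM AS ID_PAGINATION, T.id FROM"
            + " (SELECT id FROM " + schema + ".archive ORDER BY id) T"
            + ") TMP WHERE TMP.ID_PAGINATION >= ? AND TMP.ID_PAGINATION <= ?");
    return reader;
}

The processor and writer beans used inside step1 would get @StepScope in the same way, which is what the answer means by scoping each bean used in the step.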
