Is there a way to make a Spring Batch step process the next page? - spring

I set both the chunk size and the page size to 15, and scheduled the batch job to run once a minute (at second 1). The result I want is, for example, when I have 30 rows, to process the first 15 and then immediately process the next page of 15 in the same run. But the way it works now, it processes 15 rows in the first run, another 15 in the next run, and so on. Is there a way to process through to the last page in a single execution?
@Bean
public Job ConfirmJob() throws Exception {
    Job exampleJob = jobBuilderFactory.get("ConfirmJob")
            .start(Step())
            .build();
    return exampleJob;
}

@Bean
@JobScope
public Step Step() throws Exception {
    return stepBuilderFactory.get("Step")
            .<UserOrder, UserOrder>chunk(15)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}
@Bean
@StepScope
public JpaPagingItemReader<UserOrder> reader() throws Exception {
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("beforeDay", LocalDateTime.now().minusDays(14));
    parameterValues.put("od", OrderStatus.ship);
    return new JpaPagingItemReaderBuilder<UserOrder>()
            .pageSize(15)
            .parameterValues(parameterValues)
            .queryString("SELECT uo FROM UserOrder uo WHERE uo.standardfinishAt < :beforeDay AND uo.orderStatus = :od ORDER BY uo.id ASC")
            .entityManagerFactory(entityManagerFactory)
            .name("JpaPagingItemReader")
            .build();
}
@Bean
@StepScope
public ItemProcessor<UserOrder, UserOrder> processor() {
    return new ItemProcessor<UserOrder, UserOrder>() {
        @Override
        public UserOrder process(UserOrder us) throws Exception {
            us.batchConfirm(LocalDateTime.now());
            return us;
        }
    };
}
@Bean
@StepScope
public JpaItemWriter<UserOrder> writer() {
    return new JpaItemWriterBuilder<UserOrder>()
            .entityManagerFactory(entityManagerFactory)
            .build();
}
}
This is the scheduler code:
@Autowired
private ResignConfiguration resignConfiguration;

@Scheduled(cron = "1 * * * * ?")
public void runConfirmJob() {
    Map<String, JobParameter> confMap = new HashMap<>();
    confMap.put("time", new JobParameter(System.currentTimeMillis()));
    JobParameters jobParameters = new JobParameters(confMap);
    try {
        jobLauncher.run(resignConfiguration.ConfirmJob(), jobParameters);
    } catch (JobExecutionAlreadyRunningException | JobInstanceAlreadyCompleteException
            | JobParametersInvalidException | org.springframework.batch.core.repository.JobRestartException e) {
        log.error(e.getMessage());
    } catch (Exception e) {
        e.printStackTrace();
    }
}

It's a paging problem. Your processor appears to change the very field the query filters on (the order status), so after the first chunk is written, those 15 rows no longer match the WHERE clause. The reader then asks for page 1, but the remaining unprocessed rows have shifted into page 0, so the reader finds nothing and the step ends. Keeping the reader pinned to page 0 makes it always read the next batch of still-unprocessed rows:
JpaPagingItemReader<UserOrder> reader = new JpaPagingItemReader<UserOrder>() {
    @Override
    public int getPage() {
        return 0;
    }
};
reader.setParameterValues(parameterValues);
reader.setQueryString("SELECT uo FROM UserOrder uo WHERE uo.createdAt < :limitDay AND uo.orderStatus = :od ORDER BY uo.id ASC");
reader.setPageSize(100);
reader.setEntityManagerFactory(entityManagerFactory);
reader.setName("JpaPagingItemReader");
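If you are on Spring Batch 4.3 or later, another option worth considering is a cursor-based JPA reader: it streams a single result set in one pass, so there are no pages to shift while the processor updates rows. A minimal sketch, assuming the same entity, query, and entityManagerFactory as above:
// Sketch, assuming Spring Batch 4.3+. JpaCursorItemReader iterates one result
// stream, so rows updated mid-run cannot cause page drift under the reader.
@Bean
@StepScope
public JpaCursorItemReader<UserOrder> cursorReader() {
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("beforeDay", LocalDateTime.now().minusDays(14));
    parameterValues.put("od", OrderStatus.ship);
    return new JpaCursorItemReaderBuilder<UserOrder>()
            .name("userOrderCursorReader")
            .entityManagerFactory(entityManagerFactory)
            .queryString("SELECT uo FROM UserOrder uo WHERE uo.standardfinishAt < :beforeDay AND uo.orderStatus = :od ORDER BY uo.id ASC")
            .parameterValues(parameterValues)
            .build();
}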

Related

Spring Batch exception handling sent as ResponseEntity

I'm new to Spring Boot and I'm training on a small project with Spring Batch to gain experience. Here is my context: I have 2 CSV files, one holding employees, the other containing all the managers of the company. I have to read the files, then add each record to a database. To keep it simple, I just need to call an endpoint from my controller, upload my CSV file (MultipartFile), and the job will start. I was actually able to do that; my problem is the following.
I have to manage multiple kinds of validation (I'm using JSR 380 validation for my entities, and I also have to check business rules). One such business rule is the following: an employee is supervised by a manager of his department (the employee can't be supervised by a manager who is not in the same department, otherwise an exception should be thrown). For mistaken records with invalid or illogical input, I have to skip them (not save them to the database) but store them in a Map or List that should be sent as a ResponseEntity to the client. Hence the client would know which rows need to be fixed. I suppose I have to take a look at Listeners, but I really can't figure out how to store exceptions in a map or list and then send it as a ResponseEntity. Below is an example of what I want to achieve.
(screenshots of my CSV files omitted)
EmployeeBatchConfig.java
@Configuration
@EnableBatchProcessing
@AllArgsConstructor
public class EmployeeBatchConfig {
    private JobBuilderFactory jobBuilderFactory;
    private StepBuilderFactory stepBuilderFactory;
    private EmployeeRepository employeeRepository;
    private EmployeeItemWriter employeeItemWriter;

    @Bean
    @StepScope
    public FlatFileItemReader<EmployeeDto> itemReader(
            @Value("#{jobParameters[fullPathFileName]}") final String pathFile) {
        FlatFileItemReader<EmployeeDto> flatFileItemReader = new FlatFileItemReader<>();
        flatFileItemReader.setResource(new FileSystemResource(new File(pathFile)));
        flatFileItemReader.setName("CSV-Reader");
        flatFileItemReader.setLinesToSkip(1);
        flatFileItemReader.setLineMapper(lineMapper());
        return flatFileItemReader;
    }

    private LineMapper<EmployeeDto> lineMapper() {
        DefaultLineMapper<EmployeeDto> lineMapper = new DefaultLineMapper<>();
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setDelimiter(",");
        lineTokenizer.setStrict(false);
        lineTokenizer.setNames("Username", "lastName", "firstName", "departement", "supervisor");
        BeanWrapperFieldSetMapper<EmployeeDto> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(EmployeeDto.class);
        lineMapper.setLineTokenizer(lineTokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);
        return lineMapper;
    }

    @Bean
    public EmployeeProcessor processor() {
        return new EmployeeProcessor(); /* a processor bean to skip invalid rows */
    }

    @Bean
    public RepositoryItemWriter<Employee> writer() {
        RepositoryItemWriter<Employee> writer = new RepositoryItemWriter<>();
        writer.setRepository(employeeRepository);
        writer.setMethodName("save");
        return writer;
    }

    @Bean
    public Step step1(FlatFileItemReader<EmployeeDto> itemReader) {
        return stepBuilderFactory.get("slaveStep")
                .<EmployeeDto, Employee>chunk(5)
                .reader(itemReader)
                .processor(processor())
                .writer(employeeItemWriter)
                .faultTolerant()
                .listener(skipListener())
                .skip(SkipException.class)
                .skipLimit(10)
                .skipPolicy(skipPolicy())
                .build();
    }

    @Bean
    @Qualifier("executeJobEmployee")
    public Job runJob(FlatFileItemReader<EmployeeDto> itemReader) {
        return jobBuilderFactory
                .get("importEmployee")
                .flow(step1(itemReader))
                .end()
                .build();
    }

    @Bean
    public SkipPolicy skipPolicy() {
        return new ExceptionSkipPolicy();
    }

    @Bean
    public SkipListener<EmployeeDto, Employee> skipListener() {
        return new StepSkipListener();
    }

    /*@Bean
    public ExecutionContext executionContext(){
        return new ExecutionContext();
    }*/
}
EmployeeProcessor.java
public class EmployeeProcessor implements ItemProcessor<EmployeeDto,
Employee>{
#Autowired
private SupervisorService managerService;
#Override
public Employee process(#Valid EmployeeDto item) throws Exception,
SkipException {
ManagerDto manager =
SupervisorService.findSupervisorById(item.getSupervisor());
//retrieve the manager of the employee and compare departement
if(!(manager.getDepartement().equals(item.getDepartement()))) {
throw new SkipException("Manager Invalid", item);
//return null;
}
return ObjectMapperUtils.map(item, Employee.class);
}
}
MySkipPolicy.java
public class MySkipPolicy implements SkipPolicy {
    @Override
    public boolean shouldSkip(Throwable throwable, int skipCount) throws SkipLimitExceededException {
        return true;
    }
}
StepSkipListenerPolicy.java
public class StepSkipListener implements SkipListener<EmployeeDto, Employee> {
    @Override // item reader
    public void onSkipInRead(Throwable throwable) {
        System.out.println("In OnSkipReader");
    }

    @Override // item writer
    public void onSkipInWrite(Employee item, Throwable throwable) {
        System.out.println("Nooooooooo ");
    }

    //@SneakyThrows
    @Override // item processor
    public void onSkipInProcess(@Valid EmployeeDto employee, Throwable throwable) {
        System.out.println("Process... ");
        /* I guess this is where I should work, but how do I deal with the
           exception that occurs? How do I know which exception I would get? */
    }
}
SkipException.java
public class SkipException extends Exception {
    private Map<String, EmployeeDto> errors = new HashMap<>();

    public SkipException(String errorMessage, EmployeeDto employee) {
        super();
        this.errors.put(errorMessage, employee);
    }

    public Map<String, EmployeeDto> getErrors() {
        return this.errors;
    }
}
JobController.java
@RestController
@RequestMapping("/upload")
public class JobController {
    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    @Qualifier("executeJobEmployee")
    private Job job;

    private final String EMPLOYEE_FOLDER = "C:/Users/Project/Employee/";

    @PostMapping("/employee")
    public ResponseEntity<Object> importEmployee(@RequestParam("file") MultipartFile multipartFile)
            throws JobInterruptedException, SkipException, IllegalStateException, IOException, FlatFileParseException {
        try {
            String fileName = multipartFile.getOriginalFilename();
            File fileToImport = new File(EMPLOYEE_FOLDER + fileName);
            multipartFile.transferTo(fileToImport);
            JobParameters jobParameters = new JobParametersBuilder()
                    .addString("fullPathFileName", EMPLOYEE_FOLDER + fileName)
                    .addLong("startAt", System.currentTimeMillis())
                    .toJobParameters();
            JobExecution jobExecution = this.jobLauncher.run(job, jobParameters);
            ExecutionContext executionContext = jobExecution.getExecutionContext();
            System.out.println("My Skipped items : " + executionContext.toString());
        } catch (ConstraintViolationException | FlatFileParseException | JobRestartException
                | JobInstanceAlreadyCompleteException | JobParametersInvalidException
                | JobExecutionAlreadyRunningException e) {
            e.printStackTrace();
            return new ResponseEntity<>(e.getMessage(), HttpStatus.BAD_REQUEST);
        }
        return new ResponseEntity<>("Employee inserted successfully", HttpStatus.OK);
    }
}
That requirement forces your implementation to wait for the job to finish before returning the web response, which is not the typical way of launching batch jobs from web requests. Typically, since batch jobs can run for several minutes/hours, they are launched in the background and a job ID is returned back to the client for later status check.
In Spring Batch, the SkipListener is the extension point that allows you to add custom code when a skippable exception happens when reading, processing or writing an item. I would add the business validation in an item processor and throw an exception with the skipped item and the reason for that skip (both encapsulated in the exception class that should be declared as skippable).
Skipped items are usually stored somewhere for later analysis (like a table or a file or the job execution context). In your case, you need to send them back in the web response, so you can read them from the store of your choice before returning them attached in the web response. In pseudo code in your controller, this should be something like the following:
- run the job and wait for its termination (the skip listener would write skipped items in the storage of your choice)
- get skipped items from storage
- return web response
For example, if you choose to store skipped items in the job execution context, you can do something like this in your controller:
JobExecution jobExecution = jobLauncher.run(job, jobParameters);
ExecutionContext executionContext = jobExecution.getExecutionContext();
// get skipped items from the execution context
// return the web response
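As a concrete illustration of that pseudo code, here is a minimal sketch of the listener side. It is an assumption-based example rather than the only way to do it: the key name "skippedItems" is arbitrary, and it reuses the SkipException from the question to recover the skipped item and the reason:
// Sketch: collect skipped items in the step execution context so they can be
// promoted to the job execution context and read back in the controller.
public class StoringSkipListener
        implements SkipListener<EmployeeDto, Employee>, StepExecutionListener {

    private StepExecution stepExecution;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return stepExecution.getExitStatus();
    }

    @Override
    public void onSkipInProcess(EmployeeDto item, Throwable t) {
        @SuppressWarnings("unchecked")
        Map<String, EmployeeDto> skipped = (Map<String, EmployeeDto>)
                stepExecution.getExecutionContext().get("skippedItems");
        if (skipped == null) {
            skipped = new HashMap<>();
            stepExecution.getExecutionContext().put("skippedItems", skipped);
        }
        if (t instanceof SkipException) {
            // the SkipException thrown by the processor carries the item and reason
            skipped.putAll(((SkipException) t).getErrors());
        } else {
            skipped.put(t.getMessage(), item);
        }
    }

    @Override
    public void onSkipInRead(Throwable t) {
    }

    @Override
    public void onSkipInWrite(Employee item, Throwable t) {
    }
}
Register this listener on the step together with Spring Batch's ExecutionContextPromotionListener, which copies the "skippedItems" key from the step execution context to the job execution context when the step completes:
@Bean
public ExecutionContextPromotionListener promotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    listener.setKeys(new String[] {"skippedItems"});
    return listener;
}
After jobLauncher.run(job, jobParameters) returns, jobExecution.getExecutionContext().get("skippedItems") holds the map you can attach to the ResponseEntity.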

Spring Batch Reader is reading alternate records

I have created a sample Spring Batch application which tries to read records from a DB and, in the writer, displays those records. However, I can see that only even-numbered (alternate) records are printed.
It's not a problem with the database, as the behavior is consistent with both the H2 database and the Oracle database.
There are 100 records in total in my DB.
With JdbcCursorItemReader, only 50 records are read, and those are alternate ones, as can be seen from the log snapshot.
With JdbcPagingItemReader, only 5 records are read, and those are alternate ones, as can be seen from the log snapshot.
My code configurations are given below. Why is the reader skipping odd-numbered records?
@Bean
public ItemWriter<Safety> safetyWriter() {
    return items -> {
        for (Safety item : items) {
            log.info(item.toString());
        }
    };
}

@Bean
public JdbcCursorItemReader<Safety> cursorItemReader() throws Exception {
    JdbcCursorItemReader<Safety> reader = new JdbcCursorItemReader<>();
    reader.setSql("select * from safety");
    reader.setDataSource(dataSource);
    reader.setRowMapper(new SafetyRowMapper());
    reader.setVerifyCursorPosition(false);
    reader.afterPropertiesSet();
    return reader;
}

@Bean
JdbcPagingItemReader<Safety> safetyPagingItemReader() throws Exception {
    JdbcPagingItemReader<Safety> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(dataSource);
    reader.setFetchSize(10);
    reader.setRowMapper(new SafetyRowMapper());
    H2PagingQueryProvider queryProvider = new H2PagingQueryProvider();
    queryProvider.setSelectClause("*");
    queryProvider.setFromClause("safety");
    Map<String, Order> sortKeys = new HashMap<>(1);
    sortKeys.put("id", Order.ASCENDING);
    queryProvider.setSortKeys(sortKeys);
    reader.setQueryProvider(queryProvider);
    return reader;
}

@Bean
public Step importSafetyDetails() throws Exception {
    return stepBuilderFactory.get("importSafetyDetails")
            .<Safety, Safety>chunk(chunkSize)
            //.reader(cursorItemReader())
            .reader(safetyPagingItemReader())
            .writer(safetyWriter())
            .listener(new StepListener())
            .listener(new ChunkListener())
            .build();
}

@Bean
public Job job() throws Exception {
    return jobBuilderFactory.get("job")
            .start(importSafetyDetails())
            .build();
}
The domain classes look like below:
@NoArgsConstructor
@AllArgsConstructor
@Data
public class Safety {
    private int id;
}

public class SafetyRowMapper implements RowMapper<Safety> {
    @Override
    public Safety mapRow(ResultSet resultSet, int i) throws SQLException {
        if (resultSet.next()) {
            Safety safety = new Safety();
            safety.setId(resultSet.getInt("id"));
            return safety;
        }
        return null;
    }
}

@SpringBootApplication
@EnableBatchProcessing
public class SpringBatchSamplesApplication {
    public static void main(String[] args) {
        SpringApplication.run(SpringBatchSamplesApplication.class, args);
    }
}
application.yml configuration is as below:
spring:
  application:
    name: spring-batch-samples
  main:
    allow-bean-definition-overriding: true
  datasource:
    url: jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE
    username: sa
    password:
    driver-class-name: org.h2.Driver
    hikari:
      connection-timeout: 20000
      maximum-pool-size: 10
  h2:
    console:
      enabled: true
  batch:
    initialize-schema: never

server:
  port: 9090
The SQL scripts are as below:
CREATE TABLE safety (
id int NOT NULL,
CONSTRAINT PK_ID PRIMARY KEY (id)
);
INSERT INTO safety (id) VALUES (1);
...100 records are inserted
The listener classes are as below:
@Slf4j
public class StepListener {
    @AfterStep
    public ExitStatus afterStep(StepExecution stepExecution) {
        log.info("In step {} ,Exit Status: {} ,Read Records: {} ,Committed Records: {} ,Skipped Read Records: {} ,Skipped Write Records: {}",
                stepExecution.getStepName(),
                stepExecution.getExitStatus().getExitCode(),
                stepExecution.getReadCount(),
                stepExecution.getCommitCount(),
                stepExecution.getReadSkipCount(),
                stepExecution.getWriteSkipCount());
        return stepExecution.getExitStatus();
    }
}

@Slf4j
public class ChunkListener {
    @BeforeChunk
    public void beforeChunk(ChunkContext context) {
        log.info("<< Before the chunk");
    }

    @AfterChunk
    public void afterChunk(ChunkContext context) {
        log.info("<< After the chunk");
    }
}
I tried to reproduce your problem, but I couldn't. Maybe it would be great if you could share more code.
Meanwhile I created a simple job to read 100 records from the "safety" table and print them to the console, and it is working fine.
@SpringBootApplication
@EnableBatchProcessing
public class ReaderWriterProblem implements CommandLineRunner {

    @Autowired
    DataSource dataSource;
    @Autowired
    StepBuilderFactory stepBuilderFactory;
    @Autowired
    JobBuilderFactory jobBuilderFactory;
    @Autowired
    private JobLauncher jobLauncher;
    @Autowired
    private ApplicationContext context;

    public static void main(String[] args) {
        String[] arguments = new String[]{LocalDateTime.now().toString()};
        SpringApplication.run(ReaderWriterProblem.class, arguments);
    }

    @Bean
    public ItemWriter<Safety> safetyWriter() {
        return new ItemWriter<Safety>() {
            @Override
            public void write(List<? extends Safety> items) throws Exception {
                for (Safety item : items) {
                    //log.info(item.toString());
                    System.out.println(item);
                }
            }
        };
    }

//    @Bean
//    public JdbcCursorItemReader<Safety> cursorItemReader() throws Exception {
//        JdbcCursorItemReader<Safety> reader = new JdbcCursorItemReader<>();
//        reader.setSql("select * from safety ");
//        reader.setDataSource(dataSource);
//        reader.setRowMapper(new SafetyRowMapper());
//        reader.setVerifyCursorPosition(false);
//        reader.afterPropertiesSet();
//        return reader;
//    }

    @Bean
    JdbcPagingItemReader<Safety> safetyPagingItemReader() throws Exception {
        JdbcPagingItemReader<Safety> reader = new JdbcPagingItemReader<>();
        reader.setDataSource(dataSource);
        reader.setFetchSize(10);
        reader.setRowMapper(new SafetyRowMapper());
        PostgresPagingQueryProvider queryProvider = new PostgresPagingQueryProvider();
        queryProvider.setSelectClause("*");
        queryProvider.setFromClause("safety");
        Map<String, Order> sortKeys = new HashMap<>(1);
        sortKeys.put("id", Order.ASCENDING);
        queryProvider.setSortKeys(sortKeys);
        reader.setQueryProvider(queryProvider);
        return reader;
    }

    @Bean
    public Step importSafetyDetails() throws Exception {
        return stepBuilderFactory.get("importSafetyDetails")
                .<Safety, Safety>chunk(5)
                //.reader(cursorItemReader())
                .reader(safetyPagingItemReader())
                .writer(safetyWriter())
                .listener(new MyStepListener())
                .listener(new MyChunkListener())
                .build();
    }

    @Bean
    public Job job() throws Exception {
        return jobBuilderFactory.get("job")
                .listener(new JobListener())
                .start(importSafetyDetails())
                .build();
    }

    @Override
    public void run(String... args) throws Exception {
        JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
        jobParametersBuilder.addString("date", LocalDateTime.now().toString());
        try {
            Job job = (Job) context.getBean("job");
            jobLauncher.run(job, jobParametersBuilder.toJobParameters());
        } catch (JobExecutionAlreadyRunningException | JobRestartException | JobInstanceAlreadyCompleteException | JobParametersInvalidException e) {
            e.printStackTrace();
        }
    }

    public static class JobListener implements JobExecutionListener {
        @Override
        public void beforeJob(JobExecution jobExecution) {
            System.out.println("Before job");
        }

        @Override
        public void afterJob(JobExecution jobExecution) {
            System.out.println("After job");
        }
    }

    private static class SafetyRowMapper implements RowMapper<Safety> {
        @Override
        public Safety mapRow(ResultSet resultSet, int i) throws SQLException {
            Safety safety = new Safety();
            safety.setId(resultSet.getLong("ID"));
            return safety;
        }
    }

    public static class MyStepListener implements StepExecutionListener {
        @Override
        public void beforeStep(StepExecution stepExecution) {
            System.out.println("Before Step");
        }

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            System.out.println("After Step");
            return ExitStatus.COMPLETED;
        }
    }

    private static class MyChunkListener implements ChunkListener {
        @Override
        public void beforeChunk(ChunkContext context) {
            System.out.println("Before Chunk");
        }

        @Override
        public void afterChunk(ChunkContext context) {
            System.out.println("After Chunk");
        }

        @Override
        public void afterChunkError(ChunkContext context) {
        }
    }
}
One difference worth pointing out: this SafetyRowMapper does not call resultSet.next(). Spring Batch advances the cursor itself before invoking mapRow, so calling next() inside the mapper (as your SafetyRowMapper does) consumes an extra row per item, which would produce exactly the alternate-records behavior you are seeing.
Hope this helps

Spring Batch partitioning taking too much time for processing 100000 records

I want to read 10 billion records using a Spring Batch program. I implemented partitioning to process one big file.
How can I increase the performance?
Below is my configuration.
@Bean
public FlatFileItemReader<RequestDTO> reader() throws MalformedURLException {
    FlatFileItemReader<RequestDTO> itemReader = new FlatFileItemReader<>();
    itemReader.setLineMapper(lineMapper());
    itemReader.setResource(new FileSystemResource(fileLocation));
    itemReader.setLinesToSkip(1);
    return itemReader;
}

/**
 * This is used for mapping values.
 *
 * @return LineMapper will be returned
 */
@Bean
public LineMapper<RequestDTO> lineMapper() {
    DefaultLineMapper<RequestDTO> lineMapper = new DefaultLineMapper<>();
    DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
    lineTokenizer.setNames("name", "salary", "company");
    lineTokenizer.setIncludedFields(0, 1, 2);
    lineTokenizer.setDelimiter(ProcessorConstants.FILE_SEPERATOR);
    BeanWrapperFieldSetMapper<RequestDTO> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(RequestDTO.class);
    lineMapper.setLineTokenizer(lineTokenizer);
    lineMapper.setFieldSetMapper(fieldSetMapper);
    return lineMapper;
}

@Bean
public CRSItemProcessor processor() {
    return new CRSItemProcessor();
}

@Bean
public RepositoryItemWriter<CRSAdministrationRequest> writer(DataSource dataSource) {
    return new RepositoryItemWriterBuilder<CRSAdministrationRequest>().methodName("saveAndFlush")
            .repository(processorBatchDAO).build();
}

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setMaxPoolSize(20);
    taskExecutor.setCorePoolSize(10);
    taskExecutor.setQueueCapacity(5);
    taskExecutor.afterPropertiesSet();
    return taskExecutor;
}

@Bean(name = "partitionerJob")
public Job partitionerJob(JobCompletionNotificationListener listener) throws UnexpectedInputException, MalformedURLException, ParseException {
    return jobBuilderFactory.get("partitionerJob").incrementer(new RunIdIncrementer()).listener(listener)
            .start(partitionStep())
            .build();
}

@Bean
public Step partitionStep() throws UnexpectedInputException, MalformedURLException, ParseException {
    return stepBuilderFactory.get("partitionStep")
            .partitioner(slaveStep(null))
            .partitioner("slaveStep", new CustomMultiResourcePartitioner())
            .gridSize(20)
            .taskExecutor(taskExecutor())
            .build();
}

@Bean
public Step slaveStep(RepositoryItemWriter<CRSAdministrationRequest> writer) throws UnexpectedInputException, MalformedURLException, ParseException {
    return stepBuilderFactory.get("slaveStep")
            .<RequestDTO, CRSAdministrationRequest>chunk(1000)
            .reader(reader())
            .processor(processor())
            .writer(writer)
            .build();
}
Below is the custom partitioning class.
public class CustomMultiResourcePartitioner implements Partitioner {

    /**
     * Assign the filename of each of the injected resources to an
     * {@link ExecutionContext}.
     *
     * @see Partitioner#partition(int)
     */
    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();
        int range = 1000;
        int fromId = 1;
        int toId = range;
        for (int i = 1; i <= gridSize; i++) {
            ExecutionContext value = new ExecutionContext();
            System.out.println("\nStarting : Thread" + i);
            System.out.println("fromId : " + fromId);
            System.out.println("toId : " + toId);
            value.putInt("fromId", fromId);
            value.putInt("toId", toId);
            // give each thread a name: thread 1, 2, 3...
            value.putString("name", "Thread" + i);
            result.put("partition" + i, value);
            fromId = toId + 1;
            toId += range;
        }
        return result;
    }
}
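One thing that stands out: slaveStep never consumes the fromId/toId values the partitioner stores in each execution context, so every partition reads the whole file. A step-scoped reader can late-bind those values; a minimal sketch, assuming each id range is meant to map to a line range in the file:
// Sketch: one reader instance per partition, bound to that partition's range.
// setCurrentItemCount/setMaxItemCount make the reader start and stop at those rows.
@Bean
@StepScope
public FlatFileItemReader<RequestDTO> partitionedReader(
        @Value("#{stepExecutionContext['fromId']}") int fromId,
        @Value("#{stepExecutionContext['toId']}") int toId) {
    FlatFileItemReader<RequestDTO> itemReader = new FlatFileItemReader<>();
    itemReader.setLineMapper(lineMapper());
    itemReader.setResource(new FileSystemResource(fileLocation));
    itemReader.setLinesToSkip(1);
    itemReader.setCurrentItemCount(fromId - 1); // skip items before this partition
    itemReader.setMaxItemCount(toId);           // stop after this partition's last item
    return itemReader;
}
slaveStep would then use this reader in place of reader(), so each of the 20 partitions only processes its own slice.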

Spring Batch MultiLineItemReader with MultiResourcePartitioner

I have a file with multi-line data like this. A DataID line is the start of a new record; one record is the combination of the ID line and the concatenation of the lines below it, until the start of the next record.
>DataID1
Line1asdfsafsdgdsfghfghfghjfgjghjgxcvmcxnvm
Line2asdfsafsdgdsfghfghfghjfgjghjgxcvmcxnvm
Line3asdfsafsdgdsfghfghfghjfgjghjgxcvmcxnvm
>DataID2
DataID2asdfsafsdgdsfghfghfghjfgjghjgxcvmcxnvm
>DataID3
DataID2asdfsafsdgdsfghfghfghjfgjghjgxcvmcxnvm
I was able to implement this using SingleItemPeekableItemReader and it's working fine.
I am now trying to implement partitioning, as we need to process multiple files. I am not sure how the partitioner is passing file info to my custom reader, and how to make my SingleItemPeekableItemReader thread safe, as it is not working correctly.
Need some inputs as I am stuck at this point.
java-config
@Bean
public Partitioner partitioner() {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    partitioner.setResources(resources);
    partitioner.partition(10);
    return partitioner;
}

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setMaxPoolSize(4);
    taskExecutor.setCorePoolSize(4);
    taskExecutor.setQueueCapacity(8);
    taskExecutor.afterPropertiesSet();
    return taskExecutor;
}

@Bean
@Qualifier("masterStep")
public Step masterStep() {
    return stepBuilderFactory.get("masterStep")
            .partitioner("step1", partitioner())
            .step(step1())
            .taskExecutor(taskExecutor())
            .build();
}

@Bean
public MultiResourceItemReader<FieldSet> multiResourceItemReader() {
    log.info("Total Number of Files to be processed {}", resources.length);
    report.setFileCount(resources.length);
    MultiResourceItemReader<FieldSet> resourceItemReader = new MultiResourceItemReader<FieldSet>();
    resourceItemReader.setResources(resources);
    resourceItemReader.setDelegate(reader());
    return resourceItemReader;
}

@Bean
public FlatFileItemReader<FieldSet> reader() {
    FlatFileItemReader<FieldSet> build = new FlatFileItemReaderBuilder<FieldSet>().name("fileReader")
            .lineTokenizer(orderFileTokenizer())
            .fieldSetMapper(new FastFieldSetMapper())
            .recordSeparatorPolicy(new BlankLineRecordSeparatorPolicy())
            .build();
    build.setBufferedReaderFactory(gzipBufferedReaderFactory);
    return build;
}

@Bean
public SingleItemPeekableItemReader<FieldSet> readerPeek() {
    SingleItemPeekableItemReader<FieldSet> reader = new SingleItemPeekableItemReader<>();
    reader.setDelegate(multiResourceItemReader());
    return reader;
}

@Bean
public MultiLineFastaItemReader itemReader() {
    MultiLineFastaItemReader itemReader = new MultiLineFastaItemReader(multiResourceItemReader());
    itemReader.setSingalPeekable(readerPeek());
    return itemReader;
}

@Bean
public PatternMatchingCompositeLineTokenizer orderFileTokenizer() {
    PatternMatchingCompositeLineTokenizer tokenizer = new PatternMatchingCompositeLineTokenizer();
    Map<String, LineTokenizer> tokenizers = new HashMap<>(2);
    tokenizers.put(">*", head());
    tokenizers.put("*", tail());
    tokenizer.setTokenizers(tokenizers);
    return tokenizer;
}

public DelimitedLineTokenizer head() {
    DelimitedLineTokenizer token = new DelimitedLineTokenizer();
    token.setNames("sequenceIdentifier");
    token.setDelimiter(" ");
    token.setStrict(false);
    return token;
}

public DelimitedLineTokenizer tail() {
    DelimitedLineTokenizer token = new DelimitedLineTokenizer();
    token.setNames("sequences");
    token.setDelimiter(" ");
    return token;
}

@Bean
public FastReportWriter writer() {
    return new FastReportWriter();
}

@Bean
public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
    return jobBuilderFactory.get("importUserJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .flow(masterStep())
            //.flow(step1)
            .next(step2())
            .end()
            .build();
}

@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .<Fasta, Fasta>chunk(5000)
            .reader(itemReader())
            .processor(new FastaIteamProcessor())
            //.processor(new PassThroughItemProcessor<>())
            .writer(writer())
            .build();
}
public class MultiLineFastaItemReader implements ItemReader<Fasta>, ItemStream {

    private static final Logger log = LoggerFactory.getLogger(MultiLineFastaItemReader.class);

    private SingleItemPeekableItemReader<FieldSet> singalPeekable;

    AtomicInteger iteamCounter = new AtomicInteger(0);

    ConcurrentHashMap<String, AtomicInteger> fileNameAndCounterMap = new ConcurrentHashMap<>();

    @Autowired
    private SequenceFastaReport sequenceFastaReport;

    private MultiResourceItemReader<FieldSet> resourceItemReader;

    public MultiLineFastaItemReader(MultiResourceItemReader<FieldSet> multiResourceItemReader) {
        this.resourceItemReader = multiResourceItemReader;
    }

    public SingleItemPeekableItemReader<FieldSet> getSingalPeekable() {
        return singalPeekable;
    }

    public void setSingalPeekable(SingleItemPeekableItemReader<FieldSet> singalPeekable) {
        this.singalPeekable = singalPeekable;
    }

    @Override
    public Fasta read() throws Exception {
        FieldSet item = singalPeekable.read();
        if (item == null) {
            return null;
        }
        Fasta fastaObject = new Fasta();
        log.info("ID {} fileName {}", item.readString(0), resourceItemReader.getCurrentResource());
        fastaObject.setSequenceIdentifier(item.readString(0).toUpperCase());
        fastaObject.setFileName(resourceItemReader.getCurrentResource().getFilename());
        if (!fileNameAndCounterMap.containsKey(fastaObject.getFileName())) {
            fileNameAndCounterMap.put(fastaObject.getFileName(), new AtomicInteger(0));
        }
        while (true) {
            FieldSet possibleRelatedObject = singalPeekable.peek();
            if (possibleRelatedObject == null) {
                if (fastaObject.getSequenceIdentifier().length() < 1)
                    throw new InvalidParameterException("Something wrong in file");
                sequenceFastaReport.addToReport(fileNameAndCounterMap.get(fastaObject.getFileName())
                        .incrementAndGet(), fastaObject.getSequences());
                return fastaObject;
            }
            if (possibleRelatedObject.readString(0).startsWith(">")) {
                if (fastaObject.getSequenceIdentifier().length() < 1)
                    throw new InvalidParameterException("Something wrong in file");
                sequenceFastaReport.addToReport(fileNameAndCounterMap.get(fastaObject.getFileName())
                        .incrementAndGet(), fastaObject.getSequences());
                return fastaObject;
            }
            String data = fastaObject.getSequences().toUpperCase();
            fastaObject.setSequences(data + singalPeekable.read().readString(0).toUpperCase());
        }
    }

    @Override
    public void close() {
        this.singalPeekable.close();
    }

    @Override
    public void open(ExecutionContext executionContext) {
        this.singalPeekable.open(executionContext);
    }

    @Override
    public void update(ExecutionContext executionContext) {
        this.singalPeekable.update(executionContext);
    }
}
I am not sure how the partitioner is passing file info to my custom reader
The partitioner will create partition meta-data in step execution contexts and your reader should read that meta-data from it. In your example, you don't need to call partition on the partitioner, Spring Batch will do it. You need instead to set the partition key on the partitioner, for example:
@Bean
public Partitioner partitioner() {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    partitioner.setResources(resources);
    partitioner.setKeyName("file");
    return partitioner;
}
This will create a partition for each file with the key file that you can get in your reader from the step execution context:
@Bean
@StepScope
public FlatFileItemReader reader(@Value("#{stepExecutionContext['file']}") String file) {
    // define your reader
}
Note that the reader should be step scoped to use this feature. More details here: https://docs.spring.io/spring-batch/4.0.x/reference/html/step.html#late-binding
As for the thread-safety part of your question: SingleItemPeekableItemReader is not thread-safe (it buffers the peeked item as internal state), so rather than sharing one instance across partitions, make the whole reader chain step scoped so that each partition gets its own instance.

Spring boot batch partitioning JdbcCursorItemReader error

I have been unable to get this to work, even after following Victor Jabor's very comprehensive blog example. I have followed his configuration as he described and used all the latest dependencies. Like Victor, I am trying to read from one db and write to another. I have this working without partitioning, but need partitioning to improve performance, as I need to be able to read 5 to 10 million rows within 5 minutes.
The following seems to work:
1) ColumnRangePartitioner
2) TaskExecutorPartitionHandler builds the correct number of step tasks based on the gridsize and spawns the correct number of threads
3) setPreparedStatementSetter from the stepExecution set by the ColumnRangePartitioner.
But when I run the application I get errors from JdbcCursorItemReader which are not consistent and which I don't understand. As a last resort I will have to debug JdbcCursorItemReader. I am hoping to get some help before that, and hopefully it will turn out to be a configuration issue.
ERROR:
Caused by: java.sql.SQLException: Exhausted Resultset
at oracle.jdbc.driver.OracleResultSetImpl.getInt(OracleResultSetImpl.java:901) ~[ojdbc6-11.2.0.2.0.jar:11.2.0.2.0]
at org.springframework.jdbc.support.JdbcUtils.getResultSetValue(JdbcUtils.java:160) ~[spring-jdbc-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.jdbc.core.BeanPropertyRowMapper.getColumnValue(BeanPropertyRowMapper.java:370) ~[spring-jdbc-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.jdbc.core.BeanPropertyRowMapper.mapRow(BeanPropertyRowMapper.java:291) ~[spring-jdbc-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.batch.item.database.JdbcCursorItemReader.readCursor(JdbcCursorItemReader.java:139) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar:3.0.7.RELEASE]
Configuration classes:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Bean
    public ItemProcessor<Archive, Archive> processor(@Value("${etl.region}") String region) {
        return new ArchiveProcessor(region);
    }

    @Bean
    public ItemWriter<Archive> writer(@Qualifier(value = "postgres") DataSource dataSource) {
        JdbcBatchItemWriter<Archive> writer = new JdbcBatchItemWriter<>();
        writer.setSql("insert into tdw_src.archive (id) values (:id)");
        writer.setDataSource(dataSource);
        writer.setItemSqlParameterSourceProvider(
                new org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider<>());
        return writer;
    }

    @Bean
    public Partitioner archivePartitioner(@Qualifier(value = "gmDataSource") DataSource dataSource,
                                          @Value("ROWNUM") String column,
                                          @Value("archive") String table,
                                          @Value("${gm.datasource.username}") String schema) {
        return new ColumnRangePartitioner(dataSource, column, schema + "." + table);
    }

    @Bean
    public Job archiveJob(JobBuilderFactory jobs, Step partitionerStep, JobExecutionListener listener) {
        return jobs.get("archiveJob")
                .preventRestart()
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .start(partitionerStep)
                .build();
    }

    @Bean
    public Step partitionerStep(StepBuilderFactory stepBuilderFactory,
                                Partitioner archivePartitioner,
                                Step step1,
                                @Value("${spring.batch.gridsize}") int gridSize) {
        return stepBuilderFactory.get("partitionerStep")
                .partitioner(step1)
                .partitioner("step1", archivePartitioner)
                .gridSize(gridSize)
                .taskExecutor(taskExecutor())
                .build();
    }

    @Bean(name = "step1")
    public Step step1(StepBuilderFactory stepBuilderFactory, ItemReader<Archive> customReader,
                      ItemWriter<Archive> writer, ItemProcessor<Archive, Archive> processor) {
        return stepBuilderFactory.get("step1")
                .listener(customReader)
                .<Archive, Archive>chunk(5)
                .reader(customReader)
                .processor(processor)
                .writer(writer)
                .build();
    }

    @Bean
    public TaskExecutor taskExecutor() {
        return new SimpleAsyncTaskExecutor();
    }

    @Bean
    public SimpleJobLauncher getJobLauncher(JobRepository jobRepository) {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(jobRepository);
        return jobLauncher;
    }
Custom reader:
public class CustomReader extends JdbcCursorItemReader<Archive> implements StepExecutionListener {

    private StepExecution stepExecution;

    @Autowired
    public CustomReader(@Qualifier(value = "gmDataSource") DataSource geomangerDataSource,
                        @Value("${gm.datasource.username}") String schema) throws Exception {
        super();
        this.setSql("SELECT TMP.* FROM (SELECT ROWNUM AS ID_PAGINATION, id FROM " + schema + ".archive) TMP " +
                "WHERE TMP.ID_PAGINATION >= ? AND TMP.ID_PAGINATION <= ?");
        this.setDataSource(geomangerDataSource);
        BeanPropertyRowMapper<Archive> rowMapper = new BeanPropertyRowMapper<>(Archive.class);
        this.setRowMapper(rowMapper);
        this.setFetchSize(5);
        this.setSaveState(false);
        this.setVerifyCursorPosition(false);
        // not sure if this is needed? this.afterPropertiesSet();
    }

    @Override
    public synchronized void beforeStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
        this.setPreparedStatementSetter(getPreparedStatementSetter());
    }

    private PreparedStatementSetter getPreparedStatementSetter() {
        ListPreparedStatementSetter listPreparedStatementSetter = new ListPreparedStatementSetter();
        List<Integer> list = new ArrayList<>();
        list.add(stepExecution.getExecutionContext().getInt("minValue"));
        list.add(stepExecution.getExecutionContext().getInt("maxValue"));
        listPreparedStatementSetter.setParameters(list);
        LOGGER.debug("getPreparedStatementSetter list: " + list);
        return listPreparedStatementSetter;
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return null;
    }
}
I've got this all working.
First I needed to order my select statement in my CustomReader so the ROWNUM remains the same for all threads, and lastly I had to scope each bean used in the step with @StepScope.
In reality I won't be using ROWNUM, since it needs an ordered result set, which costs performance; I will use a pk column instead to get the best performance.
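For reference, a sketch of those two fixes combined, assuming the same Archive table: the ORDER BY inside the subquery makes the ROWNUM assignment deterministic across threads, and @StepScope gives every partition its own reader bound to its minValue/maxValue range:
// Sketch: step-scoped reader with a stable ROWNUM pagination query.
@Bean
@StepScope
public JdbcCursorItemReader<Archive> customReader(
        @Qualifier("gmDataSource") DataSource dataSource,
        @Value("${gm.datasource.username}") String schema,
        @Value("#{stepExecutionContext['minValue']}") Integer minValue,
        @Value("#{stepExecutionContext['maxValue']}") Integer maxValue) {
    JdbcCursorItemReader<Archive> reader = new JdbcCursorItemReader<>();
    reader.setSql("SELECT TMP.* FROM (SELECT ROWNUM AS ID_PAGINATION, id FROM " +
            "(SELECT id FROM " + schema + ".archive ORDER BY id)) TMP " +
            "WHERE TMP.ID_PAGINATION >= ? AND TMP.ID_PAGINATION <= ?");
    reader.setDataSource(dataSource);
    reader.setRowMapper(new BeanPropertyRowMapper<>(Archive.class));
    ListPreparedStatementSetter pss = new ListPreparedStatementSetter();
    pss.setParameters(Arrays.asList(minValue, maxValue));
    reader.setPreparedStatementSetter(pss);
    reader.setSaveState(false);
    return reader;
}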
