Spring Batch - get information about files in a directory

So I'm toying around with Spring Batch for the first time and trying to understand how to do things other than process a CSV file.
Attempting to read every music file in a directory, for example, I have the following code, but I'm not sure how to handle the delegate part.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Bean
    public MusicItemProcessor processor() {
        return new MusicItemProcessor();
    }

    @Bean
    public Job readFiles() {
        return jobBuilderFactory.get("readFiles")
                .incrementer(new RunIdIncrementer())
                .flow(step1()).end().build();
    }

    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1").<String, String>chunk(10)
                .reader(reader())
                .processor(processor()).build();
    }

    @Bean
    public ItemReader<String> reader() {
        Resource[] resources = null;
        ResourcePatternResolver patternResolver = new PathMatchingResourcePatternResolver();
        try {
            resources = patternResolver.getResources("file:/music/*.flac");
        } catch (IOException e) {
            e.printStackTrace();
        }
        MultiResourceItemReader<String> reader = new MultiResourceItemReader<>();
        reader.setResources(resources);
        reader.setDelegate(new FlatFileItemReader<>()); // ??
        return reader;
    }
}
At the moment I can see that resources holds a list of music files, but from the stack trace I get back it looks like new FlatFileItemReader<>() is trying to read the actual content of the files (I'll want to do that at some point, just not right now).
Right now I just want information about each file (absolute path, size, filename, etc.), not what's inside.
Have I gone completely wrong with this, or do I just need to configure something a little differently?
Any examples of code that does more than process CSV lines would also be awesome.

After scouring the internet I've managed to pull together something that I think works... Some feedback would be welcome.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Bean
    public VideoItemProcessor processor() {
        return new VideoItemProcessor();
    }

    @Bean
    public Job readFiles() {
        return jobBuilderFactory.get("readFiles")
                .start(step())
                .build();
    }

    @Bean
    public Step step() {
        try {
            return stepBuilderFactory.get("step").<File, Video>chunk(500)
                    .reader(directoryItemReader())
                    .processor(processor())
                    .build();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Bean
    public DirectoryItemReader directoryItemReader() throws IOException {
        return new DirectoryItemReader("file:/media/media/Music/**/*.flac");
    }
}
The part that had me stuck was creating a custom reader for files. If anyone else comes across this, this is how I've done it. I'm sure there are better ways, but this works for me.
public class DirectoryItemReader implements ItemReader<File>, InitializingBean {

    private final String directoryPath;
    private final List<File> foundFiles = Collections.synchronizedList(new ArrayList<>());

    public DirectoryItemReader(final String directoryPath) {
        this.directoryPath = directoryPath;
    }

    @Override
    public File read() {
        if (!foundFiles.isEmpty()) {
            return foundFiles.remove(0);
        }
        synchronized (foundFiles) {
            final Iterator<File> files = foundFiles.iterator();
            if (files.hasNext()) {
                return foundFiles.remove(0);
            }
        }
        return null;
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        for (final Resource file : getFiles()) {
            this.foundFiles.add(file.getFile());
        }
    }

    private Resource[] getFiles() throws IOException {
        ResourcePatternResolver patternResolver = new PathMatchingResourcePatternResolver();
        return patternResolver.getResources(directoryPath);
    }
}
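(As an aside, not from the original answer: a similar result could probably be achieved without a custom class by loading the matched files into Spring Batch's built-in ListItemReader. The bean below is only a minimal sketch, assuming the same file pattern as above.)
// Sketch only: build the file list up front and hand it to ListItemReader.
@Bean
public ListItemReader<File> directoryItemReader() throws IOException {
    Resource[] resources = new PathMatchingResourcePatternResolver()
            .getResources("file:/media/media/Music/**/*.flac");
    List<File> files = new ArrayList<>();
    for (Resource resource : resources) {
        files.add(resource.getFile()); // file handle only; the content is not read here
    }
    return new ListItemReader<>(files); // hands out items one by one, then returns null
}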
The only thing you'd need to do is implement your own processor. I've used videos in this example, so I have a VideoItemProcessor:
@Slf4j
public class VideoItemProcessor implements ItemProcessor<File, Video> {

    @Override
    public Video process(final File item) throws Exception {
        Video video = Video.builder()
                .filename(item.getAbsoluteFile().getName())
                .absolutePath(item.getAbsolutePath())
                .fileSize(item.length()) // file size in bytes; getTotalSpace() would report the partition size instead
                .build();
        log.info("Created {}", video);
        return video;
    }
}
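The Video class itself isn't shown in the post; judging from the builder calls above, a minimal Lombok-based sketch of it (field names assumed, not from the original) could look like this:
// Sketch of the DTO assumed by VideoItemProcessor (not part of the original post).
@Value
@Builder
public class Video {
    String filename;
    String absolutePath;
    long fileSize;
}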

Related

Spring Batch SkipListener not printing logs

I am not sure why the SkipListener is not printing the simple logs. I want to log the skipped records into a separate log file, and I'm thinking I can use MDC.put(). But I am not even able to print a log to the console, so I'm not sure what is happening. I think I am missing something; any help would be really appreciated. I even tried the generic Exception.class for testing, but it still doesn't do anything. Here is my code:
BatchConfiguration
@EnableBatchProcessing
public class BatchConfiguration {

    static final Logger LOG = LogManager.getLogger(BatchConfiguration.class);

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    public DataSource dataSource;

    @Bean
    public Job loadUserJob() {
        return jobBuilderFactory.get("loadUserJob")
                .incrementer(new RunIdIncrementer())
                .start(loadUsersStep())
                .listener(new JobLoggerListener())
                .build();
    }

    @Bean
    public Step loadUsersStep() {
        return stepBuilderFactory.get("loadUsersStep")
                .<UserInfo, UserInfoDTO>chunk(10)
                .reader(reader())
                .processor(UserInfoItemProcessor())
                .writer(writer())
                .faultTolerant()
                .skipLimit(3)
                .skip(Exception.class)
                //.skip(UserNotFoundException.class)
                .listener(new StepStartStopListener())
                .listener(new LoadDataSkipListener())
                .build();
    }

    @Bean
    public FlatFileItemReader<UserInfo> reader() {
        return new FlatFileItemReaderBuilder<UserInfo>()
                .name("reader")
                .delimited()
                .names(new String[] {
                        "first_name",
                        "last_name",
                        "email"})
                .targetType(UserInfo.class)
                .resource(new ClassPathResource("userinfo_file.csv"))
                .build();
    }

    @Bean
    public UserInfoItemProcessor UserInfoItemProcessor() {
        return new UserInfoItemProcessor();
    }

    @Bean
    public JdbcBatchItemWriter<UserInfoDTO> writer() {
        JdbcBatchItemWriter<UserInfoDTO> writer = new JdbcBatchItemWriter<UserInfoDTO>();
        writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
        writer.setSql("INSERT INTO db1.userinfo "
                + "(first_name,last_name,email) " +
                "VALUES (:firstName, :lastName,:email)");
        writer.setDataSource(dataSource);
        return writer;
    }
}
SkipListener Class
public class LoadDataSkipListener implements SkipListener<UserInfoDTO, UserInfoDTO> {

    public static final Logger LOG = LogManager.getLogger(LoadDataSkipListener.class);

    @Override
    public void onSkipInRead(Throwable t) {
        LOG.info("ItemWriter: ");
    }

    @Override
    public void onSkipInWrite(UserInfoDTO item, Throwable t) {
        LOG.info(">>> onSkipInWrite <<<");
        // MDC.put("skipped_users", String.valueOf(item.getUserId())); // I want to log skipped user ids on write
    }

    @Override
    public void onSkipInProcess(UserInfoDTO item, Throwable t) {
        LOG.info(">>> onSkipInProcess <<<");
    }
}
application.properties
spring.datasource.url=jdbc:mysql://localhost:3306/db1?useSSL=false
spring.datasource.username=username
spring.datasource.password=password
spring.datasource.schema=classpath:/org/springframework/batch/core/schema-mysql.sql
spring.batch.initialize-schema=always
spring.datasource.initialize=true
#logging.file={log-name}.log --> Will this write into a new file?
I just want to know:
Why is it not even printing to the console?
Can the commented-out MDC call be used to write to a new file?
Thanks in advance!

Saving file information in Spring Batch MultiResourceItemReader

I have a directory containing text files. I want to process the files and write the data into a DB, which I did using MultiResourceItemReader.
The scenario is that whenever a file comes in, the first step is to save the file info (filename, record count in the file) in a log table (a custom table).
Since I used MultiResourceItemReader, it loads all the files at once and the code I wrote executes only once at server startup. I tried the getCurrentResource() method, but it returns null.
Please refer to the code below.
NetFileProcessController.java
@Slf4j
@RestController
@RequestMapping("/netProcess")
public class NetFileProcessController {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    @Qualifier("netFileParseJob")
    private Job job;

    @GetMapping(path = "/process")
    public @ResponseBody StatusResponse process() throws ServiceException {
        try {
            Map<String, JobParameter> parameters = new HashMap<>();
            parameters.put("date", new JobParameter(new Date()));
            jobLauncher.run(job, new JobParameters(parameters));
            return new StatusResponse(true);
        } catch (Exception e) {
            log.error("Exception", e);
            Throwable rootException = ExceptionUtils.getRootCause(e);
            String errMessage = rootException.getMessage();
            log.info("Root cause is instance of JobInstanceAlreadyCompleteException --> " + (rootException instanceof JobInstanceAlreadyCompleteException));
            if (rootException instanceof JobInstanceAlreadyCompleteException) {
                log.info(errMessage);
                return new StatusResponse(false, "This job has been completed already!");
            } else {
                throw new ServiceException(errMessage);
            }
        }
    }
}
BatchConfig.java
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    public void setJobBuilderFactory(JobBuilderFactory jobBuilderFactory) {
        this.jobBuilderFactory = jobBuilderFactory;
    }

    @Autowired
    StepBuilderFactory stepBuilderFactory;

    @Value("file:${input.files.location}${input.file.pattern}")
    private Resource[] netFileInputs;

    @Value("${net.file.column.names}")
    private String netFilecolumnNames;

    @Value("${net.file.column.lengths}")
    private String netFileColumnLengths;

    @Autowired
    NetFileInfoTasklet netFileInfoTasklet;

    @Autowired
    NetFlatFileProcessor netFlatFileProcessor;

    @Autowired
    NetFlatFileWriter netFlatFileWriter;

    @Bean
    public Job netFileParseJob() {
        return jobBuilderFactory.get("netFileParseJob")
                .incrementer(new RunIdIncrementer())
                .start(netFileStep())
                .build();
    }

    public Step netFileStep() {
        return stepBuilderFactory.get("netFileStep")
                .<NetDetailsDTO, NetDetailsDTO>chunk(1)
                .reader(new NetFlatFileReader(netFileInputs, netFilecolumnNames, netFileColumnLengths))
                .processor(netFlatFileProcessor)
                .writer(netFlatFileWriter)
                .build();
    }
}
NetFlatFileReader.java
@Slf4j
public class NetFlatFileReader extends MultiResourceItemReader<NetDetailsDTO> {

    public NetFlatFileReader(Resource[] netFileInputs, String netFilecolumnNames, String netFileColumnLengths) {
        setResources(netFileInputs);
        setDelegate(reader(netFilecolumnNames, netFileColumnLengths));
    }

    private FlatFileItemReader<NetDetailsDTO> reader(String netFilecolumnNames, String netFileColumnLengths) {
        FlatFileItemReader<NetDetailsDTO> flatFileItemReader = new FlatFileItemReader<>();
        FixedLengthTokenizer tokenizer = CommonUtil.fixedLengthTokenizer(netFilecolumnNames, netFileColumnLengths);
        FieldSetMapper<NetDetailsDTO> mapper = createMapper();
        DefaultLineMapper<NetDetailsDTO> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(mapper);
        flatFileItemReader.setLineMapper(lineMapper);
        return flatFileItemReader;
    }

    /*
     * Mapping column data to DTO
     */
    private FieldSetMapper<NetDetailsDTO> createMapper() {
        BeanWrapperFieldSetMapper<NetDetailsDTO> mapper = new BeanWrapperFieldSetMapper<>();
        try {
            mapper.setTargetType(NetDetailsDTO.class);
        } catch (Exception e) {
            log.error("Exception in mapping column data to dto ", e);
        }
        return mapper;
    }
}
I am stuck on this scenario; any help is appreciated.
I don't think MultiResourceItemReader is appropriate in your case. I would run a job per file for all the reasons of making one thing do one thing and do it well:
Your preparatory step will work by design
It would be easier to run multiple jobs in parallel and improve your file ingestion throughput
In case of failure, you would only restart the job for the failed file
EDIT: add an example
Resource[] netFileInputs = ... // same code that looks for files as currently in your reader
for (Resource netFileInput : netFileInputs) {
    Map<String, JobParameter> parameters = new HashMap<>();
    parameters.put("netFileInput", new JobParameter(netFileInput.getFilename()));
    jobLauncher.run(job, new JobParameters(parameters));
}
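With one job execution per file, the reader can then be bound to that single file via a step-scoped bean. The following is only a sketch of that wiring, not code from the original answer; it assumes the netFileInput parameter passed above and a lineMapper() helper that builds the same fixed-length mapping as the custom reader:
// Sketch: a step-scoped FlatFileItemReader bound to the "netFileInput" job parameter.
@Bean
@StepScope
public FlatFileItemReader<NetDetailsDTO> netFileReader(
        @Value("#{jobParameters['netFileInput']}") String netFileInput) {
    FlatFileItemReader<NetDetailsDTO> reader = new FlatFileItemReader<>();
    // note: the loop above passes only getFilename(); a full path may be needed here
    reader.setResource(new FileSystemResource(netFileInput));
    reader.setLineMapper(lineMapper()); // assumed helper building the same DefaultLineMapper as before
    return reader;
}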

ElasticsearchItemReader keeps reading the same records

I am a real beginner in Spring and I have to develop an application using Spring Batch. This application must read from an Elasticsearch index and write all the records to a file.
When I run the program I don't get any errors, and the application reads the records and writes them to the file correctly. The thing is, the application never stops: it keeps reading, processing and writing the data without ever ending, and the same records end up being processed many times.
I think there must be some problem in my code or in my design, so I attach the most important parts of my code below.
I developed the following ElasticsearchItemReader:
public class ElasticsearchItemReader<T> extends AbstractPaginatedDataItemReader<T> implements InitializingBean {

    private final Logger logger;
    private final ElasticsearchOperations elasticsearchOperations;
    private final SearchQuery query;
    private final Class<? extends T> targetType;

    public ElasticsearchItemReader(ElasticsearchOperations elasticsearchOperations, SearchQuery query, Class<? extends T> targetType) {
        setName(getShortName(getClass()));
        logger = getLogger(getClass());
        this.elasticsearchOperations = elasticsearchOperations;
        this.query = query;
        this.targetType = targetType;
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        state(elasticsearchOperations != null, "An ElasticsearchOperations implementation is required.");
        state(query != null, "A query is required.");
        state(targetType != null, "A target type to convert the input into is required.");
    }

    @Override
    @SuppressWarnings("unchecked")
    protected Iterator<T> doPageRead() {
        logger.debug("executing query {}", query.getQuery());
        return (Iterator<T>) elasticsearchOperations.queryForList(query, targetType).iterator();
    }
}
Also I wrote the following ReadWriterConfig:
@Configuration
public class ReadWriterConfig {

    @Bean
    public ElasticsearchItemReader<AnotherElement> elasticsearchItemReader() {
        return new ElasticsearchItemReader<>(elasticsearchOperations(), query(), AnotherElement.class);
    }

    @Bean
    public SearchQuery query() {
        NativeSearchQueryBuilder builder = new NativeSearchQueryBuilder()
                .withQuery(matchAllQuery());
        return builder.build();
    }

    @Bean
    public ElasticsearchOperations elasticsearchOperations() {
        Client client = null;
        try {
            Settings settings = Settings.builder()
                    .build();
            client = new PreBuiltTransportClient(settings)
                    .addTransportAddress(new TransportAddress(InetAddress.getByName("localhost"), 9300));
            return new ElasticsearchTemplate(client);
        } catch (UnknownHostException e) {
            e.printStackTrace();
            return null;
        }
    }
}
And here is the batch configuration where I wire up the reader, processor and writer:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    // tag::readerwriterprocessor[]
    @Bean
    public ElasticsearchItemReader<AnotherElement> reader() {
        return new ReadWriterConfig().elasticsearchItemReader();
    }

    @Bean
    public PersonItemProcessor processor() {
        return new PersonItemProcessor();
    }

    @Bean
    public FlatFileItemWriter itemWriter() {
        return new FlatFileItemWriterBuilder<AnotherElement>()
                .name("itemWriter")
                .resource(new FileSystemResource("target/output.txt"))
                .lineAggregator(new PassThroughLineAggregator<>())
                .build();
    }
    // end::readerwriterprocessor[]

    // tag::jobstep[]
    @Bean
    public Job importUserJob(JobCompletionNotificationListener listener, Step stepA) {
        return jobBuilderFactory.get("importUserJob")
                .flow(stepA)
                .end()
                .build();
    }

    @Bean
    public Step stepA(FlatFileItemWriter<AnotherElement> writer) {
        return stepBuilderFactory.get("stepA")
                .<AnotherElement, AnotherElement>chunk(10)
                .reader(reader())
                .processor(processor())
                .writer(itemWriter())
                .build();
    }
    // end::jobstep[]
}
Here are some of the websites I was following when writing this code:
https://github.com/spring-projects/spring-batch-extensions/blob/master/spring-batch-elasticsearch/README.md
https://spring.io/guides/gs/batch-processing/
Your reader should return, for every call of doPageRead(), an Iterator over one page of the dataset. Since you are not splitting the Elasticsearch result into pages but querying the whole set in one go, the first call to doPageRead() returns an iterator over the whole result set, and the next call returns an iterator over the very same result set again.
So you have to keep track of whether you have already returned the iterator, something like:
public class ElasticsearchItemReader<T> extends AbstractPaginatedDataItemReader<T> implements InitializingBean {

    // leaving out irrelevant parts

    boolean doPageReadCalled = false;

    @Override
    @SuppressWarnings("unchecked")
    protected Iterator<T> doPageRead() {
        if (doPageReadCalled) {
            return null;
        }
        doPageReadCalled = true;
        return (Iterator<T>) elasticsearchOperations.queryForList(query, targetType).iterator();
    }
}
On the first call you set the flag to true and return the iterator; on the next call you see that you already returned the data and return null.
This is a very basic solution; depending on the amount of data you get from Elasticsearch, it might be better to query with the scroll API, for example, and return pages until all are processed.
You need to make sure your item reader returns null at some point to signal that there is no more data to process and end the job.
As requested in the comments, here is an example of how to import the reader:
@Configuration
@org.springframework.context.annotation.Import(ReadWriterConfig.class)
@EnableBatchProcessing
public class BatchConfiguration {

    // other bean definitions

    @Bean
    public Step stepA(ElasticsearchItemReader<AnotherElement> reader, FlatFileItemWriter<AnotherElement> writer) {
        return stepBuilderFactory.get("stepA")
                .<AnotherElement, AnotherElement>chunk(10)
                .reader(reader)
                .processor(processor())
                .writer(writer)
                .build();
    }
}
Very late to answer this, but I faced the same issue yesterday.
I'm not sure if the issue is with queryForList, but the following worked for me.
I changed the queryForList call to a startScroll call followed by subsequent continueScroll calls:
protected Iterator<T> doPageRead() {
    if (isFirstCall) { // isFirstCall is a boolean indicating if this is the first call to doPageRead
        ScrolledPage<T> scrolledPage = (ScrolledPage<T>) elasticsearchOperations.startScroll(1 * 60 * 1000, query, targetType);
        scrollId = scrolledPage.getScrollId();
        iterator = (Iterator<T>) scrolledPage.iterator();
        isFirstCall = false;
    } else {
        iterator = (Iterator<T>) elasticsearchOperations.continueScroll(scrollId, 1 * 60 * 1000, targetType).iterator();
    }
    return iterator;
}
You might need to use different scroll related methods based on the version of elasticsearchOperations.
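For completeness, the snippet above relies on a few fields that aren't shown in the answer; something along these lines is assumed (a sketch, not from the original post):
// Fields assumed by the doPageRead() snippet above.
private boolean isFirstCall = true; // true until the first scroll request has been issued
private String scrollId;            // scroll id returned by startScroll, reused by continueScroll
private Iterator<T> iterator;       // the current page of hits handed back to the framework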

Is there a bug in Spring Batch Step flow function?

In the piece of code below, when StepA fails only StepB and StepC should execute, but what actually happens is that all the steps get executed! I want to branch a Spring Batch job depending on whether a step passes or not. I know there are other ways of doing this (using a JobExecutionDecider, setting a job parameter, etc.), but I want to know what I am doing wrong here.
@Configuration
@EnableBatchProcessing
public class JobConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public PlatformTransactionManager transactionManager() {
        return new ResourcelessTransactionManager();
    }

    @Bean
    public JobRepository jobRepository() {
        try {
            return new MapJobRepositoryFactoryBean(transactionManager())
                    .getJobRepository();
        } catch (Exception e) {
            return null;
        }
    }

    @Bean
    public JobLauncher jobLauncher() {
        final SimpleJobLauncher launcher = new SimpleJobLauncher();
        launcher.setJobRepository(jobRepository());
        return launcher;
    }

    @Bean
    public Job job() {
        return jobBuilderFactory.get("job")
                .flow(stepA()).on("FAILED").to(stepC()).next(stepD())
                .from(stepA()).on("*").to(stepB()).next(stepC())
                .end().build();
    }

    @Bean
    public Step stepA() {
        return stepBuilderFactory.get("stepA")
                .tasklet(new RandomFailTasket("stepA")).build();
    }

    @Bean
    public Step stepB() {
        return stepBuilderFactory.get("stepB")
                .tasklet(new PrintTextTasklet("stepB")).build();
    }

    @Bean
    public Step stepC() {
        return stepBuilderFactory.get("stepC")
                .tasklet(new PrintTextTasklet("stepC")).build();
    }

    @Bean
    public Step stepD() {
        return stepBuilderFactory.get("stepD")
                .tasklet(new PrintTextTasklet("stepD")).build();
    }

    @SuppressWarnings("resource")
    public static void main(String[] args) {
        // create spring application context
        final ApplicationContext appContext = new AnnotationConfigApplicationContext(
                JobConfig.class);
        // get the job config bean (i.e. this bean)
        final JobConfig jobConfig = appContext.getBean(JobConfig.class);
        // get the job launcher
        JobLauncher launcher = jobConfig.jobLauncher();
        try {
            // launch the job
            JobExecution execution = launcher.run(jobConfig.job(), new JobParameters());
            System.out.println(execution.getJobInstance().toString());
        } catch (JobExecutionAlreadyRunningException e) {
            e.printStackTrace();
        } catch (JobRestartException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (JobInstanceAlreadyCompleteException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (JobParametersInvalidException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
StepA is a dummy tasklet that fails randomly, i.e. it throws an exception about half the time:
public class RandomFailTasket extends PrintTextTasklet {

    public RandomFailTasket(String text) {
        super(text);
    }

    public RepeatStatus execute(StepContribution arg0, ChunkContext arg1)
            throws Exception {
        if (Math.random() < 0.5) {
            throw new Exception("fail");
        }
        return RepeatStatus.FINISHED;
    }
}
StepB, StepC, StepD are also dummy tasklets:
public class PrintTextTasklet implements Tasklet {

    private final String text;

    public PrintTextTasklet(String text) {
        this.text = text;
    }

    public RepeatStatus execute(StepContribution arg0, ChunkContext arg1)
            throws Exception {
        System.out.println(text);
        return RepeatStatus.FINISHED;
    }
}
We'd need to have a look at the XML structure that you are using.
Try using a step listener: in the afterStep method you can check the step's status and then implement your logic to decide whether to call the next step or not.
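A minimal sketch of that listener idea (class and method bodies assumed, not from the original answer); the afterStep callback inspects the outcome and returns an exit status that the flow's .on(...) transitions can match against:
// Sketch: a StepExecutionListener that inspects the step outcome after stepA runs.
public class StepResultListener implements StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // nothing to do before the step
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        if (stepExecution.getStatus() == BatchStatus.FAILED) {
            // the exit status returned here feeds into what .on("FAILED") matches against
            return ExitStatus.FAILED;
        }
        return ExitStatus.COMPLETED;
    }
}
It would then be registered on stepA with .listener(new StepResultListener()) in the step definition.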

Spring Batch with Spring Boot terminates before children finish with AsyncItemProcessor

I'm using Spring Batch with an AsyncItemProcessor and things are behaving unexpectedly. Let me show the code first; I followed a simple example as shown in the Spring Batch project:
@EnableBatchProcessing
@SpringBootApplication
@Import({HttpClientConfigurer.class, BatchJobConfigurer.class})
public class PerfilEletricoApp {

    public static void main(String[] args) throws Exception { // NOSONAR
        System.exit(SpringApplication.exit(SpringApplication.run(PerfilEletricoApp.class, args)));
        //SpringApplication.run(PerfilEletricoApp.class, args);
    }
}
-- EDIT
If I just sleep the main thread to give slf4j a few seconds to flush the logs, everything works as expected.
@EnableBatchProcessing
@SpringBootApplication
@Import({HttpClientConfigurer.class, BatchJobConfigurer.class})
public class PerfilEletricoApp {

    public static void main(String[] args) throws Exception { // NOSONAR
        //System.exit(SpringApplication.exit(SpringApplication.run(PerfilEletricoApp.class, args)));
        ConfigurableApplicationContext context = SpringApplication.run(PerfilEletricoApp.class, args);
        Thread.sleep(1000 * 5);
        System.exit(SpringApplication.exit(context));
    }
}
-- END OF EDIT
I'm reading a text file with a single field and then using an AsyncItemProcessor to get multithreaded processing, which consists of an HTTP GET on a URL to fetch some data. I'm also using a NoOpItemWriter so nothing happens on the write part. I'm logging the results of the GET in the processor part of the job (using log.trace / log.warn).
@Configuration
public class HttpClientConfigurer {

    // [... properties and configs omitted]

    @Bean
    public CloseableHttpClient createHttpClient() {
        // ... creates and returns a poolable http client etc.
    }
}
As for the Job:
@Configuration
public class BatchJobConfigurer {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Value("${async.tps:10}")
    private Integer tps;

    @Value("${com.bemobi.perfilelerico.sourcedir:/AppServer/perfil-eletrico/source-dir/}")
    private String sourceDir;

    @Bean
    public ItemReader<String> reader() {
        MultiResourceItemReader<String> reader = new MultiResourceItemReader<>();
        reader.setResources(new Resource[] { new FileSystemResource(sourceDir) });
        reader.setDelegate((ResourceAwareItemReaderItemStream<? extends String>) flatItemReader());
        return reader;
    }

    @Bean
    public ItemReader<String> flatItemReader() {
        FlatFileItemReader<String> itemReader = new FlatFileItemReader<>();
        itemReader.setLineMapper(new DefaultLineMapper<String>() {{
            setLineTokenizer(new DelimitedLineTokenizer() {{
                setNames(new String[] { "sample-field-001" });
            }});
            setFieldSetMapper(new SimpleStringFieldSetMapper<>());
        }});
        return itemReader;
    }

    @Bean
    public ItemProcessor asyncItemProcessor() {
        AsyncItemProcessor<String, OiPaggoResponse> asyncItemProcessor = new AsyncItemProcessor<>();
        asyncItemProcessor.setDelegate(processor());
        asyncItemProcessor.setTaskExecutor(getAsyncExecutor());
        return asyncItemProcessor;
    }

    @Bean
    public ItemProcessor<String, OiPaggoResponse> processor() {
        return new PerfilEletricoItemProcessor();
    }

    /**
     * Using a NoOpItemWriter<T> so we satisfy the Spring Batch flow but don't use the writer for anything else.
     * @return a NoOpItemWriter<OiPaggoResponse>
     */
    @Bean
    public ItemWriter<OiPaggoResponse> writer() {
        return new NoOpItemWriter<>();
    }

    @Bean
    protected Step step1() throws Exception {
        /*
         * The problem starts here: if I use processor(), everything ends nicely,
         * but if I insist on asyncItemProcessor(), the job ends and the logs from
         * the processor are not written to disk.
         */
        return this.steps.get("step1").<String, OiPaggoResponse>chunk(10)
                .reader(reader())
                .processor(asyncItemProcessor())
                .build();
    }

    @Bean
    public Job job() throws Exception {
        return this.jobs.get("consulta-perfil-eletrico").start(step1()).build();
    }

    @Bean(name = "asyncExecutor")
    public TaskExecutor getAsyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(tps);
        executor.setMaxPoolSize(tps);
        executor.setQueueCapacity(tps * 1000);
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.setThreadNamePrefix("AsyncExecutor-");
        return executor;
    }
}
-- UPDATED WITH AsyncItemWriter (Working version)
/* Wrapped writer */
@Bean
public ItemWriter asyncItemWriter() {
    AsyncItemWriter<OiPaggoResponse> asyncItemWriter = new AsyncItemWriter<>();
    asyncItemWriter.setDelegate(writer());
    return asyncItemWriter;
}

/* AsyncItemWriter defined on the step */
@Bean
protected Step step1() throws Exception {
    return this.steps.get("step1").<String, OiPaggoResponse>chunk(10)
            .reader(reader())
            .processor(asyncItemProcessor())
            .writer(asyncItemWriter())
            .build();
}
--
Any thoughts on why the AsyncItemProcessor doesn't wait for all the children to complete before sending an OK/completed signal to the context?
The issue is that the AsyncItemProcessor is creating Futures that no one is waiting for. Wrap your NoOpItemWriter in the AsyncItemWriter so that someone is waiting for the Futures. That will cause the job to complete as expected.
