So I'm toying around with Spring Batch for the first time and trying to understand how to do things other than process a CSV file.
For example, I'm attempting to read every music file in a directory. I have the following code, but I'm not sure how to handle the delegate part.
public class BatchConfiguration {
public JobBuilderFactory jobBuilderFactory;
public StepBuilderFactory stepBuilderFactory;
public MusicItemProcessor processor() {
return new MusicItemProcessor();
public Job readFiles() {
return jobBuilderFactory.get("readFiles").incrementer(new RunIdIncrementer()).
public Step step1() {
return stepBuilderFactory.get("step1").<String, String>chunk(10)
public ItemReader<String> reader() {
Resource[] resources = null;
ResourcePatternResolver patternResolver = new PathMatchingResourcePatternResolver();
try {
resources = patternResolver.getResources("file:/music/*.flac");
} catch (IOException e) {
MultiResourceItemReader<String> reader = new MultiResourceItemReader<>();
reader.setDelegate(new FlatFileItemReader<>()); // ??
return reader;
I can see that resources contains the list of music files, but judging by the stack trace I get back, new FlatFileItemReader<>() is trying to read the actual content of the files (I'll want to do that at some point, just not right now).
For now I just want information about each file (absolute path, size, file name, etc.), not what's inside.
Have I gone completely wrong with this, or do I just need to configure something a little differently?
Any examples of code that does more than process CSV lines would also be awesome.

After scouring the internet, I've managed to pull together something that I think works. Some feedback would be welcome.
public class BatchConfiguration {
public JobBuilderFactory jobBuilderFactory;
public StepBuilderFactory stepBuilderFactory;
public VideoItemProcessor processor() {
return new VideoItemProcessor();
public Job readFiles() {
return jobBuilderFactory.get("readFiles")
public Step step() {
try {
return stepBuilderFactory.get("step").<File, Video>chunk(500)
} catch (IOException e) {
return null;
public DirectoryItemReader directoryItemReader() throws IOException {
return new DirectoryItemReader("file:/media/media/Music/**/*.flac");
The part that had me stuck was creating a custom reader for files. If anyone else comes across this, this is how I've done it. I'm sure there are better ways, but this works for me:
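import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.springframework.batch.item.ItemReader;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.core.io.support.ResourcePatternResolver;
public class DirectoryItemReader implements ItemReader<File>, InitializingBean {
    private final String directoryPath;
    private final List<File> foundFiles = Collections.synchronizedList(new ArrayList<>());
    public DirectoryItemReader(final String directoryPath) {
        this.directoryPath = directoryPath;
    }
    @Override
    public File read() {
        // One file per call; returning null tells Spring Batch there is no more input.
        synchronized (foundFiles) {
            return foundFiles.isEmpty() ? null : foundFiles.remove(0);
        }
    }
    @Override
    public void afterPropertiesSet() throws Exception {
        // Resolve the pattern once, when the bean is initialised, and cache the matching files.
        for (final Resource file : getFiles()) {
            foundFiles.add(file.getFile());
        }
    }
    private Resource[] getFiles() throws IOException {
        ResourcePatternResolver patternResolver = new PathMatchingResourcePatternResolver();
        return patternResolver.getResources(directoryPath);
    }
}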
The only other thing you'd need to do is implement your own processor. I've used videos in this example, so I have a video processor:
public class VideoItemProcessor implements ItemProcessor<File, Video> {
public Video process(final File item) throws Exception {
Video video = Video.builder()
log.info("Created {}", video);
return video;
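If all you need is metadata (absolute path, file name, size) and not the contents, the reader does not have to involve FlatFileItemReader at all. Below is a minimal plain-Java sketch of the same idea as DirectoryItemReader, kept free of Spring so it runs on its own. FileInfoReader and FileInfo are hypothetical names, and read() only mimics the ItemReader contract: one item per call, null when nothing is left. In a real job you would implement ItemReader<FileInfo> (or keep ItemReader<File>) and let the processor build the domain object.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.stream.Stream;

class FileInfoReader {

    /** Metadata only: the file contents are never opened. */
    static final class FileInfo {
        final String absolutePath;
        final String fileName;
        final long size;

        FileInfo(Path path) throws IOException {
            this.absolutePath = path.toAbsolutePath().toString();
            this.fileName = path.getFileName().toString();
            this.size = Files.size(path);
        }

        @Override
        public String toString() {
            return fileName + " (" + size + " bytes) at " + absolutePath;
        }
    }

    private final Deque<FileInfo> pending = new ArrayDeque<>();

    /** Walks root recursively and queues every regular file whose name matches the glob, e.g. "*.flac". */
    FileInfoReader(Path root, String fileNameGlob) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + fileNameGlob);
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> matcher.matches(p.getFileName()))
                 .sorted()
                 .forEach(p -> {
                     try {
                         pending.add(new FileInfo(p));
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
    }

    /** Same contract as an ItemReader: one item per call, null once everything has been handed out. */
    synchronized FileInfo read() {
        return pending.poll();
    }
}
```

Resolving the file list up front keeps read() trivial; for very large directories you could instead iterate lazily over a DirectoryStream.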


Spring Batch SkipListener not printing logs

I am not sure why the SkipListener is not printing even simple log messages. I want to log the skipped records into a separate log file, and I am thinking I can use MDC.put(). But I am not even able to print a log message to the console, and I am not sure what is happening; I think I am missing something. Any help would be really appreciated. I even tried with the generic Exception.class for testing, but it still does nothing. Here is my code:
public class BatchConfiguration {
static final Logger LOG = LogManager.getLogger(BatchConfiguration.class);
public JobBuilderFactory jobBuilderFactory;
public StepBuilderFactory stepBuilderFactory;
public DataSource dataSource;
public Job loadUserJob() {
return jobBuilderFactory.get("loadUserJob")
.incrementer(new RunIdIncrementer())
.listener(new JobLoggerListener())
public Step loadUsersStep() {
return stepBuilderFactory.get("loadUsersStep")
.<UserInfo, UserInfoDTO> chunk(10)
.listener(new StepStartStopListener())
.listener(new LoadDataSkipListener())
public FlatFileItemReader<UserInfo> reader() {
return new FlatFileItemReaderBuilder<UserInfo>()
.names(new String[] {
.resource(new ClassPathResource("userinfo_file.csv"))
public UserInfoItemProcessor UserInfoItemProcessor() {
return new UserInfoItemProcessor();
public JdbcBatchItemWriter<UserInfoDTO> writer() {
JdbcBatchItemWriter<UserInfoDTO> writer = new JdbcBatchItemWriter<UserInfoDTO>();
writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
writer.setSql("INSERT INTO db1.userinfo "
+ "(first_name,last_name,email) " +
"VALUES (:firstName, :lastName,:email)");
return writer;
SkipListener class:
public class LoadDataSkipListener implements SkipListener<UserInfo, UserInfoDTO> {
    // SkipListener<T, S>: T is the step's input type (UserInfo), S is its output type (UserInfoDTO)
    public static final Logger LOG = LogManager.getLogger(LoadDataSkipListener.class);
    @Override
    public void onSkipInRead(Throwable t) {
        LOG.info("ItemWriter: ");
    }
    @Override
    public void onSkipInWrite(UserInfoDTO item, Throwable t) {
        LOG.info(">>> onSkipInWrite <<<");
        // MDC.put("skipped_users", String.valueOf(item.getUserId())); // I want to log skipped user ids on write
    }
    @Override
    public void onSkipInProcess(UserInfo item, Throwable t) {
        LOG.info(">>> onSkipInProcess <<<");
    }
}
#logging.file={log-name}.log --> Will this write into a new file?
I just want to know:
Why is it not even printing to the console?
Can the commented-out MDC call be used to write to a new file?
Thanks in advance!
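On the separate-file question: as far as I know, Spring Boot's logging.file (logging.file.name in newer Boot versions) sets the single log file for the whole application, so it will not split skipped records out by itself, and MDC only attaches key/value context to each log event. The usual approach is a dedicated logger with its own file appender (in Log4j 2, a named Logger with additivity="false" pointing at a File appender). Below is a minimal sketch of that idea using java.util.logging instead of Log4j 2 so that it is self-contained; SkippedUsersLog and the "skipped-users" logger name are hypothetical. On the console issue, one thing worth checking (not a confirmed diagnosis, since the step definition above is truncated): skip callbacks only fire in a fault-tolerant step (.faultTolerant().skip(...).skipLimit(...)), and registering the listener before .faultTolerant() is a commonly reported reason for a SkipListener being ignored.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

final class SkippedUsersLog {

    private final Logger logger; // strong reference: java.util.logging holds loggers only weakly
    private final FileHandler handler;

    SkippedUsersLog(Path file) throws IOException {
        logger = Logger.getLogger("skipped-users");
        logger.setUseParentHandlers(false); // keep these lines out of the console / main log
        logger.setLevel(Level.INFO);
        handler = new FileHandler(file.toString(), true); // true = append to an existing file
        handler.setFormatter(new SimpleFormatter());
        logger.addHandler(handler);
    }

    /** What onSkipInWrite / onSkipInProcess could call with the skipped item's id. */
    void skipped(String phase, Object userId, Throwable cause) {
        logger.log(Level.INFO, "skipped in {0}: user {1} ({2})",
                new Object[] {phase, String.valueOf(userId), cause.getMessage()});
    }

    /** Flushes and releases the file; call once, e.g. when the job ends. */
    void close() {
        logger.removeHandler(handler);
        handler.close();
    }
}
```

The id is converted with String.valueOf before formatting so that numeric ids are not rendered with locale grouping separators.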

Saving file information in Spring batch MultiResourceItemReader

I have a directory containing text files. I want to process the files and write their data into a DB, which I did using MultiResourceItemReader.
Whenever a file arrives, the first step should be to save information about it, such as the file name and the record count, in a custom log table.
Since I used MultiResourceItemReader, it loads all the files at once, and the code I wrote runs only once, at server startup. I tried the getCurrentResource() method, but it returns null.
Please see the code below.
public class NetFileProcessController {
private JobLauncher jobLauncher;
private Job job;
@GetMapping(path = "/process")
public @ResponseBody StatusResponse process() throws ServiceException {
try {
Map<String, JobParameter> parameters = new HashMap<>();
parameters.put("date", new JobParameter(new Date()));
jobLauncher.run(job, new JobParameters(parameters));
return new StatusResponse(true);
} catch (Exception e) {
log.error("Exception", e);
Throwable rootException = ExceptionUtils.getRootCause(e);
String errMessage = rootException.getMessage();
log.info("Root cause is instance of JobInstanceAlreadyCompleteException --> "+(rootException instanceof JobInstanceAlreadyCompleteException));
if(rootException instanceof JobInstanceAlreadyCompleteException){
return new StatusResponse(false, "This job has been completed already!");
} else{
throw new ServiceException(errMessage);
public class BatchConfig {
private JobBuilderFactory jobBuilderFactory;
public void setJobBuilderFactory(JobBuilderFactory jobBuilderFactory) {
this.jobBuilderFactory = jobBuilderFactory;
StepBuilderFactory stepBuilderFactory;
private Resource[] netFileInputs;
private String netFilecolumnNames;
private String netFileColumnLengths;
NetFileInfoTasklet netFileInfoTasklet;
NetFlatFileProcessor netFlatFileProcessor;
NetFlatFileWriter netFlatFileWriter;
public Job netFileParseJob() {
return jobBuilderFactory.get("netFileParseJob")
.incrementer(new RunIdIncrementer())
public Step netFileStep() {
return stepBuilderFactory.get("netFileStep")
.<NetDetailsDTO, NetDetailsDTO>chunk(1)
.reader(new NetFlatFileReader(netFileInputs, netFilecolumnNames, netFileColumnLengths))
public class NetFlatFileReader extends MultiResourceItemReader<NetDetailsDTO> {
public NetFlatFileReader(Resource[] netFileInputs, String netFilecolumnNames, String netFileColumnLengths) {
setDelegate(reader(netFilecolumnNames, netFileColumnLengths));
private FlatFileItemReader<NetDetailsDTO> reader(String netFilecolumnNames, String netFileColumnLengths) {
FlatFileItemReader<NetDetailsDTO> flatFileItemReader = new FlatFileItemReader<>();
FixedLengthTokenizer tokenizer = CommonUtil.fixedLengthTokenizer(netFilecolumnNames, netFileColumnLengths);
FieldSetMapper<NetDetailsDTO> mapper = createMapper();
DefaultLineMapper<NetDetailsDTO> lineMapper = new DefaultLineMapper<>();
return flatFileItemReader;
/** Mapping column data to DTO */
private FieldSetMapper<NetDetailsDTO> createMapper() {
BeanWrapperFieldSetMapper<NetDetailsDTO> mapper = new BeanWrapperFieldSetMapper<>();
try {
} catch(Exception e) {
log.error("Exception in mapping column data to dto ", e);
return mapper;
I am stuck on this scenario; any help is appreciated.
I don't think MultiResourceItemReader is appropriate in your case. I would run one job per file, for all the reasons behind making one thing do one thing and do it well:
Your preparatory step will work by design
It would be easier to run multiple jobs in parallel and improve your file ingestion throughput
In case of failure, you would only restart the job for the failed file
EDIT: added an example:
Resource[] netFileInputs = ... // same code that looks for files as currently in your reader
for (Resource netFileInput : netFileInputs) {
    // one job instance per file: the file name is an identifying job parameter
    Map<String, JobParameter> parameters = new HashMap<>();
    parameters.put("netFileInput", new JobParameter(netFileInput.getFilename()));
    jobLauncher.run(job, new JobParameters(parameters));
}
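With one job per file, the first step (for example the NetFileInfoTasklet from the question) can work out the file details before the chunk step runs. Here is a minimal plain-Java sketch of that computation, assuming one record per non-blank line; NetFileInfo is a hypothetical name, and inserting the row into the log table is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical value object for one row of the custom log table. */
final class NetFileInfo {
    final String fileName;
    final long recordCount;

    private NetFileInfo(String fileName, long recordCount) {
        this.fileName = fileName;
        this.recordCount = recordCount;
    }

    /** Counts records, assuming one record per non-blank line. */
    static NetFileInfo of(Path file) throws IOException {
        // ISO-8859-1 maps every byte to a char, so decoding cannot fail; enough for counting lines.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
            long count = lines.filter(line -> !line.trim().isEmpty()).count();
            return new NetFileInfo(file.getFileName().toString(), count);
        }
    }

    @Override
    public String toString() {
        return fileName + ": " + recordCount + " records";
    }
}
```

If your files carry header or trailer lines, subtract them (or skip them in the filter) so the count matches what the chunk step will actually read.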

ElasticsearchItemReader keeps reading same records

I am a real beginner in Spring and I have to develop an application using Spring Batch. This application must read from an Elasticsearch index and write all the records to a file.
When I run the program, I don't get any error, and the application reads the records and writes them to the file correctly. The problem is that the application never stops: it keeps reading, processing and writing the data without ending. In the following picture, you can see the same records being processed many times.
I think there must be some problem in my code or in the design of the software, so I've attached the most important parts of my code below.
I developed the following ElasticsearchItemReader:
public class ElasticsearchItemReader<T> extends AbstractPaginatedDataItemReader<T> implements InitializingBean {
private final Logger logger;
private final ElasticsearchOperations elasticsearchOperations;
private final SearchQuery query;
private final Class<? extends T> targetType;
public ElasticsearchItemReader(ElasticsearchOperations elasticsearchOperations, SearchQuery query, Class<? extends T> targetType) {
logger = getLogger(getClass());
this.elasticsearchOperations = elasticsearchOperations;
this.query = query;
this.targetType = targetType;
public void afterPropertiesSet() throws Exception {
state(elasticsearchOperations != null, "An ElasticsearchOperations implementation is required.");
state(query != null, "A query is required.");
state(targetType != null, "A target type to convert the input into is required.");
protected Iterator<T> doPageRead() {
logger.debug("executing query {}", query.getQuery());
return (Iterator<T>)elasticsearchOperations.queryForList(query, targetType).iterator();
Also I wrote the following ReadWriterConfig:
public class ReadWriterConfig {
public ElasticsearchItemReader<AnotherElement> elasticsearchItemReader() {
return new ElasticsearchItemReader<>(elasticsearchOperations(), query(), AnotherElement.class);
public SearchQuery query() {
NativeSearchQueryBuilder builder = new NativeSearchQueryBuilder()
return builder.build();
public ElasticsearchOperations elasticsearchOperations() {
Client client = null;
try {
Settings settings = Settings.builder()
client = new PreBuiltTransportClient(settings)
.addTransportAddress(new TransportAddress(InetAddress.getByName("localhost"), 9300));
return new ElasticsearchTemplate(client);
} catch (UnknownHostException e) {
return null;
And I wrote the BatchConfiguration class, where I call the reader, writer and processor:
public class BatchConfiguration {
public JobBuilderFactory jobBuilderFactory;
public StepBuilderFactory stepBuilderFactory;
// tag::readerwriterprocessor[]
public ElasticsearchItemReader<AnotherElement> reader() {
return new ReadWriterConfig().elasticsearchItemReader();
public PersonItemProcessor processor() {
return new PersonItemProcessor();
public FlatFileItemWriter itemWriter() {
return new FlatFileItemWriterBuilder<AnotherElement>()
.resource(new FileSystemResource("target/output.txt"))
.lineAggregator(new PassThroughLineAggregator<>())
// end::readerwriterprocessor[]
// tag::jobstep[]
public Job importUserJob(JobCompletionNotificationListener listener, Step stepA) {
return jobBuilderFactory.get("importUserJob")
public Step stepA(FlatFileItemWriter<AnotherElement> writer) {
return stepBuilderFactory.get("stepA")
.<AnotherElement, AnotherElement> chunk(10)
// end::jobstep[]
Here are some of the websites I was following to write this code:
On every call of doPageRead(), your reader should return an Iterator over one page of the dataset. As you are not splitting the result of the Elasticsearch query into pages but querying the whole set in one step, the first call to doPageRead() returns an iterator over the whole result set. Then the next call returns another iterator over the very same result set.
So you have to keep track of whether you already returned the iterator, something like:
public class ElasticsearchItemReader<T> extends AbstractPaginatedDataItemReader<T> implements InitializingBean {
    // leaving out irrelevant parts
    private boolean doPageReadCalled = false;
    @Override
    protected Iterator<T> doPageRead() {
        if (doPageReadCalled) {
            return null;
        }
        doPageReadCalled = true;
        return (Iterator<T>) elasticsearchOperations.queryForList(query, targetType).iterator();
    }
}
On the first call you set the flag to true and then return the iterator, on the next call you then see that you already returned the data and return null.
This is a very basic solution. Depending on the amount of data you get from Elasticsearch, it might be better to query, for example, with the scroll API and return pages until all are processed.
You need to make sure your item reader returns null at some point to signal that there is no more data to process and end the job.
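The contract can be seen in a small plain-Java model of the paging loop that AbstractPaginatedDataItemReader runs (simplified, based on my reading of it): when the current page's iterator is exhausted, doPageRead() is called again and page is incremented, and a null or empty iterator ends the read. PagingReaderModel and InMemoryPagedReader are hypothetical names, and the in-memory list stands in for the Elasticsearch index. In the real reader, doPageRead() would ask Elasticsearch only for the slice described by the inherited page and pageSize fields (for example through a Pageable built with PageRequest.of(page, pageSize)) instead of returning the whole result on every call.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Simplified model of the paging loop: fetch the next page whenever the current one runs out. */
abstract class PagingReaderModel<T> {
    protected int page = 0;
    protected final int pageSize;
    private Iterator<T> current;

    PagingReaderModel(int pageSize) {
        this.pageSize = pageSize;
    }

    /** Must return only the items of page `page`, and an empty iterator (or null) past the end. */
    protected abstract Iterator<T> doPageRead();

    /** Returns the next item, or null once a page comes back empty; null is what ends the step. */
    public synchronized T read() {
        if (current == null || !current.hasNext()) {
            current = doPageRead();
            page++;
            if (current == null || !current.hasNext()) {
                return null;
            }
        }
        return current.next();
    }
}

/** Stand-in for the Elasticsearch query: slices an in-memory "index" by page and pageSize. */
class InMemoryPagedReader<T> extends PagingReaderModel<T> {
    private final List<T> index;

    InMemoryPagedReader(List<T> index, int pageSize) {
        super(pageSize);
        this.index = index;
    }

    @Override
    protected Iterator<T> doPageRead() {
        int from = page * pageSize;
        if (from >= index.size()) {
            return Collections.emptyIterator(); // past the last page: ends the read
        }
        int to = Math.min(from + pageSize, index.size());
        return index.subList(from, to).iterator();
    }
}
```

With 7 items and a page size of 3, the reader fetches pages 0, 1 and 2, then gets an empty page 3 and returns null, so each item is read exactly once.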
As requested in the comments, here is an example of how to inject the reader into the step:
public class BatchConfiguration {
// other bean definitions
public Step stepA(ElasticsearchItemReader<AnotherElement> reader, FlatFileItemWriter<AnotherElement> writer) {
return stepBuilderFactory.get("stepA")
.<AnotherElement, AnotherElement> chunk(10)
Very late to answer this, but I faced the same issue yesterday.
I'm not sure whether the issue is with queryForList, but the following worked for me:
I replaced queryForList with a startScroll call followed by continueScroll calls.
@Override
protected Iterator<T> doPageRead() {
    if (isFirstCall) { // isFirstCall is a boolean field indicating whether this is the first call to doPageRead
        ScrolledPage<T> scrolledPage = (ScrolledPage<T>) elasticsearchOperations.startScroll(1 * 60 * 1000, query, targetType);
        scrollId = scrolledPage.getScrollId();
        iterator = (Iterator<T>) scrolledPage.iterator();
        isFirstCall = false;
    } else {
        // once the scroll is exhausted, the returned page is empty, which ends the read
        iterator = (Iterator<T>) elasticsearchOperations.continueScroll(scrollId, 1 * 60 * 1000, targetType).iterator();
    }
    return iterator;
}
You might need to use different scroll-related methods depending on your Spring Data Elasticsearch version.

Is there a bug in Spring Batch Step flow function?

In the code below, when StepA fails, only StepB and StepC should execute, but what actually happens is that all 3 steps get executed! I want to split a Spring Batch job depending on whether a step passes or not. I know there are other ways of doing this, such as using a JobExecutionDecider or setting some job parameter, but I wanted to know what I was doing wrong here.
public class JobConfig {
private JobBuilderFactory jobBuilderFactory;
private StepBuilderFactory stepBuilderFactory;
public PlatformTransactionManager transactionManager() {
return new ResourcelessTransactionManager();
public JobRepository jobRepository() {
try {
return new MapJobRepositoryFactoryBean(transactionManager())
} catch (Exception e) {
return null;
public JobLauncher jobLauncher() {
final SimpleJobLauncher launcher = new SimpleJobLauncher();
return launcher;
public Job job() {
return jobBuilderFactory.get("job").
public Step stepA() {
return stepBuilderFactory.get("stepA")
.tasklet(new RandomFailTasket("stepA")).build();
public Step stepB() {
return stepBuilderFactory.get("stepB")
.tasklet(new PrintTextTasklet("stepB")).build();
public Step stepC() {
return stepBuilderFactory.get("stepC")
.tasklet(new PrintTextTasklet("stepC")).build();
public Step stepD() {
return stepBuilderFactory.get("stepD")
.tasklet(new PrintTextTasklet("stepD")).build();
public static void main(String[] args) {
// create spring application context
final ApplicationContext appContext = new AnnotationConfigApplicationContext(
// get the job config bean (i.e this bean)
final JobConfig jobConfig = appContext.getBean(JobConfig.class);
// get the job launcher
JobLauncher launcher = jobConfig.jobLauncher();
try {
// launch the job
JobExecution execution = launcher.run(jobConfig.job(), new JobParameters());
} catch (JobExecutionAlreadyRunningException e) {
} catch (JobRestartException e) {
// TODO Auto-generated catch block
} catch (JobInstanceAlreadyCompleteException e) {
// TODO Auto-generated catch block
} catch (JobParametersInvalidException e) {
// TODO Auto-generated catch block
StepA is a dummy step that fails randomly, i.e. it throws an exception roughly half of the time:
public class RandomFailTasket extends PrintTextTasklet {
    public RandomFailTasket(String text) {
        super(text);
    }
    @Override
    public RepeatStatus execute(StepContribution arg0, ChunkContext arg1)
            throws Exception {
        if (Math.random() < 0.5) {
            throw new Exception("fail");
        }
        return RepeatStatus.FINISHED;
    }
}
StepB, StepC and StepD are also dummy tasklets:
public class PrintTextTasklet implements Tasklet {
    private final String text;
    public PrintTextTasklet(String text) {
        this.text = text;
    }
    @Override
    public RepeatStatus execute(StepContribution arg0, ChunkContext arg1)
            throws Exception {
        System.out.println(text); // shows which steps actually ran
        return RepeatStatus.FINISHED;
    }
}
I would need to have a look at the XML structure you are using.
Try using a step listener: in its afterStep method you can check the step status and then implement your logic to decide whether to call the next step.

Spring Batch with Spring Boot terminates before AsyncItemProcessor child tasks complete

I'm using Spring Batch with an AsyncItemProcessor and things are behaving unexpectedly. Let me show the code first:
I followed a simple example from the Spring Batch project:
@Import({HttpClientConfigurer.class, BatchJobConfigurer.class})
public class PerfilEletricoApp {
public static void main(String[] args) throws Exception {// NOSONAR
System.exit(SpringApplication.exit(SpringApplication.run(PerfilEletricoApp.class, args)));
//SpringApplication.run(PerfilEletricoApp.class, args);
If I just make the main thread sleep to give SLF4J a few seconds to flush the logs, everything works as expected:
@Import({HttpClientConfigurer.class, BatchJobConfigurer.class})
public class PerfilEletricoApp {
public static void main(String[] args) throws Exception {// NOSONAR
//System.exit(SpringApplication.exit(SpringApplication.run(PerfilEletricoApp.class, args)));
ConfigurableApplicationContext context = SpringApplication.run(PerfilEletricoApp.class, args);
Thread.sleep(1000 * 5);
I'm reading a text file with a single field and then using an AsyncItemProcessor to get multithreaded processing, which consists of an HTTP GET on a URL to fetch some data. I'm also using a NoOpWriter so that the write part does nothing. I'm saving the results of the GET in the processor part of the job (using log.trace / log.warn).
public class HttpClientConfigurer {
// [... property and configs omitted]
public CloseableHttpClient createHttpClient() {
// ... creates and returns a poolable http client etc
As for the Job:
public class BatchJobConfigurer {
private JobBuilderFactory jobs;
private StepBuilderFactory steps;
private Integer tps;
private String sourceDir;
public ItemReader<String> reader() {
MultiResourceItemReader<String> reader = new MultiResourceItemReader<>();
reader.setResources( new Resource[] { new FileSystemResource(sourceDir)});
reader.setDelegate((ResourceAwareItemReaderItemStream<? extends String>) flatItemReader());
return reader;
public ItemReader<String> flatItemReader() {
FlatFileItemReader<String> itemReader = new FlatFileItemReader<>();
itemReader.setLineMapper(new DefaultLineMapper<String>() {{
setLineTokenizer(new DelimitedLineTokenizer() {{
setNames(new String[] { "sample-field-001"});
setFieldSetMapper(new SimpleStringFieldSetMapper<>());
return itemReader;
public ItemProcessor asyncItemProcessor(){
AsyncItemProcessor<String, OiPaggoResponse> asyncItemProcessor = new AsyncItemProcessor<>();
return asyncItemProcessor;
public ItemProcessor<String,OiPaggoResponse> processor(){
return new PerfilEletricoItemProcessor();
/**
 * Using a NoOpItemWriter<T> so we satisfy the Spring Batch flow but don't use the writer for anything else.
 * @return a NoOpItemWriter<OiPaggoResponse>
 */
public ItemWriter<OiPaggoResponse> writer() {
return new NoOpItemWriter<>();
protected Step step1() throws Exception {
// Problem starts here: if I use processor(), everything ends nicely, but if I insist on asyncItemProcessor(), the job ends and the logs from the processor are not written to disk.
return this.steps.get("step1").<String, OiPaggoResponse> chunk(10)
public Job job() throws Exception {
return this.jobs.get("consulta-perfil-eletrico").start(step1()).build();
@Bean(name = "asyncExecutor")
public TaskExecutor getAsyncExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setQueueCapacity(tps * 1000);
executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
return executor;
-- UPDATED WITH AsyncItemWriter (Working version)
/*Wrapped Writer*/
public ItemWriter asyncItemWriter(){
AsyncItemWriter<OiPaggoResponse> asyncItemWriter = new AsyncItemWriter<>();
return asyncItemWriter;
/*AsyncItemWriter defined on the steps*/
protected Step step1() throws Exception {
return this.steps.get("step1").<String, OiPaggoResponse> chunk(10)
Any thoughts on why the AsyncItemProcessor doesn't wait for all the child tasks to complete before sending an OK/COMPLETED signal to the context?
The issue is that the AsyncItemProcessor is creating Futures that no one is waiting for. Wrap your NoOpItemWriter in the AsyncItemWriter so that someone is waiting for the Futures. That will cause the job to complete as expected.
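A plain-Java model of why the Futures matter: the "processor" half only submits work and returns Futures, and it is the "writer" half calling get() on each one that blocks until every result exists. That is the role AsyncItemWriter plays before delegating to the wrapped writer. AsyncChunkModel is a hypothetical name; this is not the Spring Batch implementation, just the same pattern built on an ExecutorService.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

final class AsyncChunkModel {

    private AsyncChunkModel() {
    }

    /** "Processor" half: hands each item to the executor and returns at once with Futures. */
    static <I, O> List<Future<O>> process(List<I> chunk, Function<I, O> work, ExecutorService executor) {
        List<Future<O>> futures = new ArrayList<>();
        for (I item : chunk) {
            futures.add(executor.submit(() -> work.apply(item)));
        }
        return futures;
    }

    /** "Writer" half: blocks on every Future, so nothing is written until all the work is done. */
    static <O> List<O> write(List<Future<O>> futures) throws InterruptedException, ExecutionException {
        List<O> results = new ArrayList<>();
        for (Future<O> future : futures) {
            results.add(future.get()); // a failure in a worker surfaces here as ExecutionException
        }
        return results;
    }
}
```

If nothing ever calls write(), process() returns immediately and the caller can report success while the submitted tasks are still running, which matches the symptom in the question.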
