ElasticsearchItemReader keeps reading same records - spring

I am a real beginner in Spring and I have to develop an application using Spring Batch. This application must read from an Elasticsearch index and write all the records to a file.
When I run the program, I don't get any error, and the application reads the records and writes them to the file correctly. The thing is, the application never stops and keeps reading, processing and writing the data without ending. In the following picture, you can see the same records being processed many times.
I think there must be some problem in my code or in the design of the software, so I attach the most important parts of my code below.
I developed the following ElasticsearchItemReader:
public class ElasticsearchItemReader<T> extends AbstractPaginatedDataItemReader<T> implements InitializingBean {

    private final Logger logger;
    private final ElasticsearchOperations elasticsearchOperations;
    private final SearchQuery query;
    private final Class<? extends T> targetType;

    public ElasticsearchItemReader(ElasticsearchOperations elasticsearchOperations, SearchQuery query, Class<? extends T> targetType) {
        setName(getShortName(getClass()));
        logger = getLogger(getClass());
        this.elasticsearchOperations = elasticsearchOperations;
        this.query = query;
        this.targetType = targetType;
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        state(elasticsearchOperations != null, "An ElasticsearchOperations implementation is required.");
        state(query != null, "A query is required.");
        state(targetType != null, "A target type to convert the input into is required.");
    }

    @Override
    @SuppressWarnings("unchecked")
    protected Iterator<T> doPageRead() {
        logger.debug("executing query {}", query.getQuery());
        return (Iterator<T>) elasticsearchOperations.queryForList(query, targetType).iterator();
    }
}
Also I wrote the following ReadWriterConfig:
@Configuration
public class ReadWriterConfig {

    @Bean
    public ElasticsearchItemReader<AnotherElement> elasticsearchItemReader() {
        return new ElasticsearchItemReader<>(elasticsearchOperations(), query(), AnotherElement.class);
    }

    @Bean
    public SearchQuery query() {
        NativeSearchQueryBuilder builder = new NativeSearchQueryBuilder()
                .withQuery(matchAllQuery());
        return builder.build();
    }

    @Bean
    public ElasticsearchOperations elasticsearchOperations() {
        Client client = null;
        try {
            Settings settings = Settings.builder()
                    .build();
            client = new PreBuiltTransportClient(settings)
                    .addTransportAddress(new TransportAddress(InetAddress.getByName("localhost"), 9300));
            return new ElasticsearchTemplate(client);
        } catch (UnknownHostException e) {
            e.printStackTrace();
            return null;
        }
    }
}
And I wrote the BatchConfiguration, where I wire the reader, writer and processor:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    // tag::readerwriterprocessor[]
    @Bean
    public ElasticsearchItemReader<AnotherElement> reader() {
        return new ReadWriterConfig().elasticsearchItemReader();
    }

    @Bean
    public PersonItemProcessor processor() {
        return new PersonItemProcessor();
    }

    @Bean
    public FlatFileItemWriter itemWriter() {
        return new FlatFileItemWriterBuilder<AnotherElement>()
                .name("itemWriter")
                .resource(new FileSystemResource("target/output.txt"))
                .lineAggregator(new PassThroughLineAggregator<>())
                .build();
    }
    // end::readerwriterprocessor[]

    // tag::jobstep[]
    @Bean
    public Job importUserJob(JobCompletionNotificationListener listener, Step stepA) {
        return jobBuilderFactory.get("importUserJob")
                .flow(stepA)
                .end()
                .build();
    }

    @Bean
    public Step stepA(FlatFileItemWriter<AnotherElement> writer) {
        return stepBuilderFactory.get("stepA")
                .<AnotherElement, AnotherElement> chunk(10)
                .reader(reader())
                .processor(processor())
                .writer(itemWriter())
                .build();
    }
    // end::jobstep[]
}
I attach some of the websites I was following to write this code:
https://github.com/spring-projects/spring-batch-extensions/blob/master/spring-batch-elasticsearch/README.md
https://spring.io/guides/gs/batch-processing/

Your reader should return an Iterator for every call of doPageRead(), with which it is possible to iterate over one page of a dataset. Since you are not splitting the result of the Elasticsearch query into pages but querying the whole set in one go, the first call to doPageRead() returns an iterator over the whole result set, and the next call returns an iterator over the very same result set again.
So you have to keep track of whether you already returned the iterator, something like:
public class ElasticsearchItemReader<T> extends AbstractPaginatedDataItemReader<T> implements InitializingBean {

    // leaving out irrelevant parts

    boolean doPageReadCalled = false;

    @Override
    @SuppressWarnings("unchecked")
    protected Iterator<T> doPageRead() {
        if (doPageReadCalled) {
            return null;
        }
        doPageReadCalled = true;
        return (Iterator<T>) elasticsearchOperations.queryForList(query, targetType).iterator();
    }
}
On the first call you set the flag to true and then return the iterator; on the next call you see that you already returned the data and return null.
This is a very basic solution. Depending on the amount of data you get from Elasticsearch, it might be better to query with the scroll API, for example, and return pages until all of them are processed.
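If you want to stay with queryForList but actually page through the index, a variant is to drive the query from the page and pageSize fields that AbstractPaginatedDataItemReader exposes to subclasses. This is only a sketch, assuming the pre-4.x Spring Data Elasticsearch API used in the question (SearchQuery.setPageable, PageRequest.of); it would replace the doPageRead() shown above:
@Override
@SuppressWarnings("unchecked")
protected Iterator<T> doPageRead() {
    // 'page' and 'pageSize' are protected fields of AbstractPaginatedDataItemReader;
    // 'page' is incremented by the base class after each call to doPageRead()
    query.setPageable(PageRequest.of(page, pageSize));
    // once a page past the end comes back empty, the empty iterator ends the reading
    return (Iterator<T>) elasticsearchOperations.queryForList(query, targetType).iterator();
}
Deep pagination gets expensive on the Elasticsearch side, which is why the scroll-based approach shown in a later answer scales better for large result sets.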

You need to make sure your item reader returns null at some point to signal that there is no more data to process and end the job.
As requested in the comments, here is an example of how to import the reader:
@Configuration
@org.springframework.context.annotation.Import(ReadWriterConfig.class)
@EnableBatchProcessing
public class BatchConfiguration {

    // other bean definitions

    @Bean
    public Step stepA(ElasticsearchItemReader<AnotherElement> reader, FlatFileItemWriter<AnotherElement> writer) {
        return stepBuilderFactory.get("stepA")
                .<AnotherElement, AnotherElement> chunk(10)
                .reader(reader)
                .processor(processor())
                .writer(writer)
                .build();
    }
}

Very late to answer this, but I faced the same issue yesterday.
Not sure if the issue is with queryForList, but the following worked for me.
I changed queryForList to a startScroll call with subsequent continueScroll calls.
protected Iterator<T> doPageRead() {
    if (isFirstCall) { // isFirstCall is a boolean indicating whether this is the first call to doPageRead
        ScrolledPage<T> scrolledPage = (ScrolledPage<T>) elasticsearchOperations.startScroll(1 * 60 * 1000, query, targetType);
        scrollId = scrolledPage.getScrollId();
        iterator = (Iterator<T>) scrolledPage.iterator();
        isFirstCall = false;
    } else {
        iterator = (Iterator<T>) elasticsearchOperations.continueScroll(scrollId, 1 * 60 * 1000, targetType).iterator();
    }
    return iterator;
}
You might need to use different scroll-related methods depending on the version of Spring Data Elasticsearch behind elasticsearchOperations.
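One hedged addition to this approach: the scroll keeps a search context open on the Elasticsearch side, so it is worth releasing it when the reader is closed. Assuming the same Spring Data Elasticsearch 3.x scroll API, which also exposes clearScroll, something like this inside the reader:
@Override
protected void doClose() throws Exception {
    if (scrollId != null) {
        // release the server-side scroll context once the step is done with the reader
        elasticsearchOperations.clearScroll(scrollId);
        scrollId = null;
    }
    super.doClose();
}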

Related

Spring batch exception handling sended as ResponseEntity

I'm new to Spring Boot and I'm training on a small project with Spring Batch to get experience. Here is my context: I have 2 CSV files, one holding employees and the other containing all the managers of the company. I have to read the files and then add each record to the database. To make it simple, I just need to call an endpoint from my controller, upload my CSV file (MultipartFile), and then the job starts. I was actually able to do that; my problem is the following.
I have to manage multiple kinds of validation (I'm using JSR 380 validation for my entities and I also have to check business exceptions). An example of a business rule is the following: an employee must be supervised by a manager of his department (if the manager is not in the same department, an exception should be thrown). For the mistaken records with invalid or "illogical" input, I have to skip them (not save them in the database) but store them in a Map or List that should be sent as a ResponseEntity to the client, so that the client knows which rows need to be fixed. I suppose I have to take a look at Listeners, but I really can't figure out how to store the exceptions in a map or list and then send them as a ResponseEntity. Below is an example of what I want to achieve.
My csv files screenshots
EmployeeBatchConfig.java
@Configuration
@EnableBatchProcessing
@AllArgsConstructor
public class EmployeeBatchConfig {

    private JobBuilderFactory jobBuilderFactory;
    private StepBuilderFactory stepBuilderFactory;
    private EmployeeRepository employeeRepository;
    private EmployeeItemWriter employeeItemWriter;

    @Bean
    @StepScope
    public FlatFileItemReader<EmployeeDto> itemReader(@Value("#{jobParameters[fullPathFileName]}") final String pathFile) {
        FlatFileItemReader<EmployeeDto> flatFileItemReader = new FlatFileItemReader<>();
        flatFileItemReader.setResource(new FileSystemResource(new File(pathFile)));
        flatFileItemReader.setName("CSV-Reader");
        flatFileItemReader.setLinesToSkip(1);
        flatFileItemReader.setLineMapper(lineMapper());
        return flatFileItemReader;
    }

    private LineMapper<EmployeeDto> lineMapper() {
        DefaultLineMapper<EmployeeDto> lineMapper = new DefaultLineMapper<>();
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setDelimiter(",");
        lineTokenizer.setStrict(false);
        lineTokenizer.setNames("Username", "lastName", "firstName", "departement", "supervisor");
        BeanWrapperFieldSetMapper<EmployeeDto> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(EmployeeDto.class);
        lineMapper.setLineTokenizer(lineTokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);
        return lineMapper;
    }

    @Bean
    public EmployeeProcessor processor() {
        return new EmployeeProcessor(); /* Create a bean processor to skip invalid rows */
    }

    @Bean
    public RepositoryItemWriter<Employee> writer() {
        RepositoryItemWriter<Employee> writer = new RepositoryItemWriter<>();
        writer.setRepository(employeeRepository);
        writer.setMethodName("save");
        return writer;
    }

    @Bean
    public Step step1(FlatFileItemReader<EmployeeDto> itemReader) {
        return stepBuilderFactory.get("slaveStep")
                .<EmployeeDto, Employee>chunk(5)
                .reader(itemReader)
                .processor(processor())
                .writer(employeeItemWriter)
                .faultTolerant()
                .listener(skipListener())
                .skip(SkipException.class)
                .skipLimit(10)
                .skipPolicy(skipPolicy())
                .build();
    }

    @Bean
    @Qualifier("executeJobEmployee")
    public Job runJob(FlatFileItemReader<EmployeeDto> itemReader) {
        return jobBuilderFactory
                .get("importEmployee")
                .flow(step1(itemReader))
                .end()
                .build();
    }

    @Bean
    public SkipPolicy skipPolicy() {
        return new ExceptionSkipPolicy();
    }

    @Bean
    public SkipListener<EmployeeDto, Employee> skipListener() {
        return new StepSkipListener();
    }

    /*
    @Bean
    public ExecutionContext executionContext() {
        return new ExecutionContext();
    }
    */
}
EmployeeProcessor.java
public class EmployeeProcessor implements ItemProcessor<EmployeeDto, Employee> {

    @Autowired
    private SupervisorService managerService;

    @Override
    public Employee process(@Valid EmployeeDto item) throws Exception, SkipException {
        // retrieve the manager of the employee and compare departments
        ManagerDto manager = managerService.findSupervisorById(item.getSupervisor());
        if (!(manager.getDepartement().equals(item.getDepartement()))) {
            throw new SkipException("Manager Invalid", item);
            //return null;
        }
        return ObjectMapperUtils.map(item, Employee.class);
    }
}
MySkipPolicy.java
public class MySkipPolicy implements SkipPolicy {

    @Override
    public boolean shouldSkip(Throwable throwable, int i) throws SkipLimitExceededException {
        return true;
    }
}
StepSkipListenerPolicy.java
public class StepSkipListener implements SkipListener<EmployeeDto, Number> {

    @Override // item reader
    public void onSkipInRead(Throwable throwable) {
        System.out.println("In OnSkipReader");
    }

    @Override // item writer
    public void onSkipInWrite(Number item, Throwable throwable) {
        System.out.println("Nooooooooo ");
    }

    //@SneakyThrows
    @Override // item processor
    public void onSkipInProcess(@Valid EmployeeDto employee, Throwable throwable) {
        System.out.println("Process... ");
        /* I guess this is where I should work, but how do I deal with the
           exception that occurs? How do I know which exception I would get? */
    }
}
SkipException.java
public class SkipException extends Exception {

    private Map<String, EmployeeDto> errors = new HashMap<>();

    public SkipException(String errorMessage, EmployeeDto employee) {
        super();
        this.errors.put(errorMessage, employee);
    }

    public Map<String, EmployeeDto> getErrors() {
        return this.errors;
    }
}
JobController.java
@RestController
@RequestMapping("/upload")
public class JobController {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    @Qualifier("executeJobEmployee")
    private Job job;

    private final String EMPLOYEE_FOLDER = "C:/Users/Project/Employee/";

    @PostMapping("/employee")
    public ResponseEntity<Object> importEmployee(@RequestParam("file") MultipartFile multipartFile)
            throws JobInterruptedException, SkipException, IllegalStateException, IOException, FlatFileParseException {
        try {
            String fileName = multipartFile.getOriginalFilename();
            File fileToImport = new File(EMPLOYEE_FOLDER + fileName);
            multipartFile.transferTo(fileToImport);
            JobParameters jobParameters = new JobParametersBuilder()
                    .addString("fullPathFileName", EMPLOYEE_FOLDER + fileName)
                    .addLong("startAt", System.currentTimeMillis())
                    .toJobParameters();
            JobExecution jobExecution = this.jobLauncher.run(job, jobParameters);
            ExecutionContext executionContext = jobExecution.getExecutionContext();
            System.out.println("My Skiped items : " + executionContext.toString());
        } catch (ConstraintViolationException | FlatFileParseException | JobRestartException
                | JobInstanceAlreadyCompleteException | JobParametersInvalidException
                | JobExecutionAlreadyRunningException e) {
            e.printStackTrace();
            return new ResponseEntity<>(e.getMessage(), HttpStatus.BAD_REQUEST);
        }
        return new ResponseEntity<>("Employee inserted succesfully", HttpStatus.OK);
    }
}
That requirement forces your implementation to wait for the job to finish before returning the web response, which is not the typical way of launching batch jobs from web requests. Typically, since batch jobs can run for several minutes/hours, they are launched in the background and a job ID is returned back to the client for later status check.
In Spring Batch, the SkipListener is the extension point that allows you to add custom code when a skippable exception happens when reading, processing or writing an item. I would add the business validation in an item processor and throw an exception with the skipped item and the reason for that skip (both encapsulated in the exception class that should be declared as skippable).
Skipped items are usually stored somewhere for later analysis (like a table or a file or the job execution context). In your case, you need to send them back in the web response, so you can read them from the store of your choice before returning them attached in the web response. In pseudo code in your controller, this should be something like the following:
- run the job and wait for its termination (the skip listener would write skipped items in the storage of your choice)
- get skipped items from storage
- return web response
For example, if you choose to store skipped items in the job execution context, you can do something like this in your controller:
JobExecution jobExecution = jobLauncher.run(job, jobParameters);
ExecutionContext executionContext = jobExecution.getExecutionContext();
// get skipped items from the execution context
// return the web response
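For example, a minimal sketch of that idea, reusing the EmployeeDto/Employee/SkipException names from the question and assuming the stored items are serializable: the listener collects skipped items in the step execution context, and an ExecutionContextPromotionListener copies the "errors" entry to the job execution context so the controller above can read it from jobExecution.getExecutionContext().
public class StepSkipListener implements SkipListener<EmployeeDto, Employee>, StepExecutionListener {

    private StepExecution stepExecution;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return stepExecution.getExitStatus();
    }

    @Override
    @SuppressWarnings("unchecked")
    public void onSkipInProcess(EmployeeDto employee, Throwable throwable) {
        // accumulate the skipped item and the reason in the STEP execution context
        Map<String, EmployeeDto> errors = (Map<String, EmployeeDto>) stepExecution.getExecutionContext().get("errors");
        if (errors == null) {
            errors = new HashMap<>();
            stepExecution.getExecutionContext().put("errors", errors);
        }
        String reason = (throwable instanceof SkipException)
                ? ((SkipException) throwable).getErrors().keySet().iterator().next()
                : throwable.getMessage();
        errors.put(reason, employee);
    }

    @Override
    public void onSkipInRead(Throwable throwable) { }

    @Override
    public void onSkipInWrite(Employee item, Throwable throwable) { }
}

// Registered on the step next to the skip listener, so that "errors" is promoted to the job execution context.
// Note: the listener above must be registered both as a SkipListener and as a StepExecutionListener on the step.
@Bean
public ExecutionContextPromotionListener promotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    listener.setKeys(new String[] { "errors" });
    return listener;
}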

Saving file information in Spring batch MultiResourceItemReader

I have a directory containing text files. I want to process the files and write the data into the DB. I did that by using a MultiResourceItemReader.
I have a scenario where, whenever a file comes in, the first step is to save the file info, like filename and record count, in a log table (a custom table).
Since I used MultiResourceItemReader, it loads all the files at once and the code I wrote executes only once at server startup. I tried the getCurrentResource() method but it returns null.
Please refer to the code below.
NetFileProcessController.java
@Slf4j
@RestController
@RequestMapping("/netProcess")
public class NetFileProcessController {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    @Qualifier("netFileParseJob")
    private Job job;

    @GetMapping(path = "/process")
    public @ResponseBody StatusResponse process() throws ServiceException {
        try {
            Map<String, JobParameter> parameters = new HashMap<>();
            parameters.put("date", new JobParameter(new Date()));
            jobLauncher.run(job, new JobParameters(parameters));
            return new StatusResponse(true);
        } catch (Exception e) {
            log.error("Exception", e);
            Throwable rootException = ExceptionUtils.getRootCause(e);
            String errMessage = rootException.getMessage();
            log.info("Root cause is instance of JobInstanceAlreadyCompleteException --> " + (rootException instanceof JobInstanceAlreadyCompleteException));
            if (rootException instanceof JobInstanceAlreadyCompleteException) {
                log.info(errMessage);
                return new StatusResponse(false, "This job has been completed already!");
            } else {
                throw new ServiceException(errMessage);
            }
        }
    }
}
BatchConfig.java
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    public void setJobBuilderFactory(JobBuilderFactory jobBuilderFactory) {
        this.jobBuilderFactory = jobBuilderFactory;
    }

    @Autowired
    StepBuilderFactory stepBuilderFactory;

    @Value("file:${input.files.location}${input.file.pattern}")
    private Resource[] netFileInputs;

    @Value("${net.file.column.names}")
    private String netFilecolumnNames;

    @Value("${net.file.column.lengths}")
    private String netFileColumnLengths;

    @Autowired
    NetFileInfoTasklet netFileInfoTasklet;

    @Autowired
    NetFlatFileProcessor netFlatFileProcessor;

    @Autowired
    NetFlatFileWriter netFlatFileWriter;

    @Bean
    public Job netFileParseJob() {
        return jobBuilderFactory.get("netFileParseJob")
                .incrementer(new RunIdIncrementer())
                .start(netFileStep())
                .build();
    }

    public Step netFileStep() {
        return stepBuilderFactory.get("netFileStep")
                .<NetDetailsDTO, NetDetailsDTO>chunk(1)
                .reader(new NetFlatFileReader(netFileInputs, netFilecolumnNames, netFileColumnLengths))
                .processor(netFlatFileProcessor)
                .writer(netFlatFileWriter)
                .build();
    }
}
NetFlatFileReader.java
@Slf4j
public class NetFlatFileReader extends MultiResourceItemReader<NetDetailsDTO> {

    public NetFlatFileReader(Resource[] netFileInputs, String netFilecolumnNames, String netFileColumnLengths) {
        setResources(netFileInputs);
        setDelegate(reader(netFilecolumnNames, netFileColumnLengths));
    }

    private FlatFileItemReader<NetDetailsDTO> reader(String netFilecolumnNames, String netFileColumnLengths) {
        FlatFileItemReader<NetDetailsDTO> flatFileItemReader = new FlatFileItemReader<>();
        FixedLengthTokenizer tokenizer = CommonUtil.fixedLengthTokenizer(netFilecolumnNames, netFileColumnLengths);
        FieldSetMapper<NetDetailsDTO> mapper = createMapper();
        DefaultLineMapper<NetDetailsDTO> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(mapper);
        flatFileItemReader.setLineMapper(lineMapper);
        return flatFileItemReader;
    }

    /*
     * Mapping column data to DTO
     */
    private FieldSetMapper<NetDetailsDTO> createMapper() {
        BeanWrapperFieldSetMapper<NetDetailsDTO> mapper = new BeanWrapperFieldSetMapper<>();
        try {
            mapper.setTargetType(NetDetailsDTO.class);
        } catch (Exception e) {
            log.error("Exception in mapping column data to dto ", e);
        }
        return mapper;
    }
}
I am stuck on this scenario; any help is appreciated.
I don't think MultiResourceItemReader is appropriate in your case. I would run a job per file for all the reasons of making one thing do one thing and do it well:
Your preparatory step will work by design
It would be easier to run multiple jobs in parallel and improve your file ingestion throughput
In case of failure, you would only restart the job for the failed file
EDIT: add an example
Resource[] netFileInputs = ... // same code that looks for files as currently in your reader
for (Resource netFileInput : netFileInputs) {
    Map<String, JobParameter> parameters = new HashMap<>();
    parameters.put("netFileInput", new JobParameter(netFileInput.getFilename()));
    jobLauncher.run(job, new JobParameters(parameters));
}
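The reader of that per-file job can then be a plain step-scoped FlatFileItemReader bound to the netFileInput job parameter instead of a MultiResourceItemReader. A rough sketch, reusing the names from the question and assumed to live in BatchConfig so the netFilecolumnNames/netFileColumnLengths fields are available (note that the job parameter must then carry something resolvable to a path, e.g. the absolute path rather than just getFilename()):
@Bean
@StepScope
public FlatFileItemReader<NetDetailsDTO> netFlatFileReader(@Value("#{jobParameters['netFileInput']}") String netFileInput) {
    FlatFileItemReader<NetDetailsDTO> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource(netFileInput));
    // same fixed-length setup as in the question's NetFlatFileReader
    DefaultLineMapper<NetDetailsDTO> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(CommonUtil.fixedLengthTokenizer(netFilecolumnNames, netFileColumnLengths));
    BeanWrapperFieldSetMapper<NetDetailsDTO> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(NetDetailsDTO.class);
    lineMapper.setFieldSetMapper(fieldSetMapper);
    reader.setLineMapper(lineMapper);
    return reader;
}
The preparatory step that logs the filename and record count can then be a simple tasklet at the start of the same job, since each job execution now deals with exactly one file.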

How to use spring transaction support with Spring Batch

I am trying to use Spring Batch to read data from a .dat file and persist it into the database. My requirement says to either insert all of the data or insert none of it into the table, i.e. atomicity. However, with Spring Batch I'm not able to achieve that: it reads the data in chunks and inserts it as long as the records are fine. If at some point a record is inappropriate and a DB exception is thrown, I want a complete rollback, which is not happening. Let's say we get the error at the 2051st record; then my code saves 2050 records, but I want a complete rollback, and if all the data is good then all N records should be persisted. Thanks in advance for any help or relevant approach that may solve my issue.
NOTE: I have already used the Spring Transactional annotation on the caller method, but it's not working, and I'm reading data with a chunk size of 10 items.
MyConfiguration.java
@Configuration
public class MyConfiguration {

    @Autowired
    JobBuilderFactory jobBuilderFactory;

    @Autowired
    StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Qualifier("MyCompletionListener")
    JobCompletionNotificationListener jobCompletionNotificationListener;

    @StepScope
    @Bean(name="MyReader")
    public FlatFileItemReader<InputMapperDTO> reader(@Value("#{jobParameters['fileName']}") String fileName) throws IOException {
        FlatFileItemReader<InputMapperDTO> newBean = new FlatFileItemReader<>();
        newBean.setName("MyReader");
        newBean.setResource(new InputStreamResource(FileUtils.openInputStream(new File(fileName))));
        newBean.setLineMapper(lineMapper());
        newBean.setLinesToSkip(1);
        return newBean;
    }

    @Bean(name="MyLineMapper")
    public DefaultLineMapper<InputMapperDTO> lineMapper() {
        DefaultLineMapper<InputMapperDTO> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(lineTokenizer());
        Reader reader = new Reader();
        lineMapper.setFieldSetMapper(reader);
        return lineMapper;
    }

    @Bean(name="MyTokenizer")
    public DelimitedLineTokenizer lineTokenizer() {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setDelimiter("|");
        tokenizer.setNames("InvestmentAccountUniqueIdentifier", "BaseCurrencyUniqueIdentifier",
                "OperatingCurrencyUniqueIdentifier", "PricingHierarchyUniqueIdentifier", "InvestmentAccountNumber",
                "DummyAccountIndicator", "InvestmentAdvisorCompanyNumberLegacy", "HighNetWorthAccountTypeCode");
        tokenizer.setIncludedFields(0, 5, 7, 13, 29, 40, 49, 75);
        return tokenizer;
    }

    @Bean(name="MyBatchProcessor")
    public ItemProcessor<InputMapperDTO, FinalDTO> processor() {
        return new Processor();
    }

    @Bean(name="MyWriter")
    public ItemWriter<FinalDTO> writer() {
        return new Writer();
    }

    @Bean(name="MyStep")
    public Step step1() throws IOException {
        return stepBuilderFactory.get("MyStep")
                .<InputMapperDTO, FinalDTO>chunk(10)
                .reader(this.reader(null))
                .processor(this.processor())
                .writer(this.writer())
                .build();
    }

    @Bean(name="MyJob")
    public Job importUserJob(@Autowired @Qualifier("MyStep") Step step1) {
        return jobBuilderFactory
                .get("MyJob" + new Date())
                .incrementer(new RunIdIncrementer())
                .listener(jobCompletionNotificationListener)
                .flow(step1)
                .end()
                .build();
    }
}
Writer.java
public class Writer implements ItemWriter<FinalDTO> {

    @Autowired
    SomeRepository someRepository;

    @Override
    public void write(List<? extends FinalDTO> listOfObjects) throws Exception {
        someRepository.saveAll(listOfObjects);
    }
}
JobCompletionNotificationListener.java
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            System.err.println("****************************************");
            System.err.println("***** Batch Job Completed ******");
            System.err.println("****************************************");
        } else {
            System.err.println("****************************************");
            System.err.println("***** Batch Job Failed ******");
            System.err.println("****************************************");
        }
    }
}
MyCallerMethod
@Transactional
public String processFile(String datFile) throws JobExecutionAlreadyRunningException, JobRestartException,
        JobInstanceAlreadyCompleteException, JobParametersInvalidException {
    long st = System.currentTimeMillis();
    JobParametersBuilder builder = new JobParametersBuilder();
    builder.addString("fileName", datFile);
    builder.addDate("date", new Date());
    jobLauncher.run(job, builder.toJobParameters());
    System.err.println("****************************************");
    System.err.println("***** Total time consumed = " + (System.currentTimeMillis() - st) + " ******");
    System.err.println("****************************************");
    return response;
}
The operation I was trying is not provided by Spring Batch out of the box. For my requirement, I implemented a custom delete that cleans up the database upon failure in any step.
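For reference, a sketch of that kind of cleanup as a job listener; the runId job parameter and the deleteByRunId repository method are assumptions for the sake of the example (each inserted row would have to record the run id it was written by):
public class CleanupOnFailureListener extends JobExecutionListenerSupport {

    @Autowired
    private SomeRepository someRepository; // the repository used by the writer

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.FAILED) {
            // hypothetical derived query: remove everything written by this run so the table is left untouched
            Long runId = jobExecution.getJobParameters().getLong("runId");
            someRepository.deleteByRunId(runId);
        }
    }
}
It would be registered on the job with .listener(...) next to the existing JobCompletionNotificationListener. The other, non-scalable option is to make the chunk size at least as large as the number of records in the file, so that everything is written in a single chunk transaction.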

Spring batch - get information about files in a directory

So I'm toying around with Spring Batch for the first time and trying to understand how to do things other than process a CSV file.
Attempting to read every music file in a directory for example, I have the following code but I'm not sure how to handle the Delegate part.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Bean
    public MusicItemProcessor processor() {
        return new MusicItemProcessor();
    }

    @Bean
    public Job readFiles() {
        return jobBuilderFactory.get("readFiles").incrementer(new RunIdIncrementer())
                .flow(step1()).end().build();
    }

    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1").<String, String>chunk(10)
                .reader(reader())
                .processor(processor()).build();
    }

    @Bean
    public ItemReader<String> reader() {
        Resource[] resources = null;
        ResourcePatternResolver patternResolver = new PathMatchingResourcePatternResolver();
        try {
            resources = patternResolver.getResources("file:/music/*.flac");
        } catch (IOException e) {
            e.printStackTrace();
        }
        MultiResourceItemReader<String> reader = new MultiResourceItemReader<>();
        reader.setResources(resources);
        reader.setDelegate(new FlatFileItemReader<>()); // ??
        return reader;
    }
}
At the moment I can see that resources has a list of music files, but looking at the stacktrace I get back, it looks to me like new FlatFileItemReader<>() is trying to read the actual content of the files (I'll want to do that at some point, just not right now).
At the moment I just want the information about the file (absolute path, size, filename etc), not what's inside.
Have I gone completely wrong with this? Or do I just need to configure something a little different?
Any examples of code that does more than process CSV lines would also be awesome
After scouring the internet I've managed to pull together something that I think works... Some feedback would be welcome.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Bean
    public VideoItemProcessor processor() {
        return new VideoItemProcessor();
    }

    @Bean
    public Job readFiles() {
        return jobBuilderFactory.get("readFiles")
                .start(step())
                .build();
    }

    @Bean
    public Step step() {
        try {
            return stepBuilderFactory.get("step").<File, Video>chunk(500)
                    .reader(directoryItemReader())
                    .processor(processor())
                    .build();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Bean
    public DirectoryItemReader directoryItemReader() throws IOException {
        return new DirectoryItemReader("file:/media/media/Music/**/*.flac");
    }
}
The part that had me stuck was creating a custom reader for the files. If anyone else comes across this, this is how I've done it. I'm sure there are better ways, but this works for me.
public class DirectoryItemReader implements ItemReader<File>, InitializingBean {

    private final String directoryPath;
    private final List<File> foundFiles = Collections.synchronizedList(new ArrayList<>());

    public DirectoryItemReader(final String directoryPath) {
        this.directoryPath = directoryPath;
    }

    @Override
    public File read() {
        if (!foundFiles.isEmpty()) {
            return foundFiles.remove(0);
        }
        synchronized (foundFiles) {
            final Iterator files = foundFiles.iterator();
            if (files.hasNext()) {
                return foundFiles.remove(0);
            }
        }
        return null;
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        for (final Resource file : getFiles()) {
            this.foundFiles.add(file.getFile());
        }
    }

    private Resource[] getFiles() throws IOException {
        ResourcePatternResolver patternResolver = new PathMatchingResourcePatternResolver();
        return patternResolver.getResources(directoryPath);
    }
}
The only thing you'd need to do is implement your own processor. I've used Videos in this example, so I have a video processor
@Slf4j
public class VideoItemProcessor implements ItemProcessor<File, Video> {

    @Override
    public Video process(final File item) throws Exception {
        Video video = Video.builder()
                .filename(item.getAbsoluteFile().getName())
                .absolutePath(item.getAbsolutePath())
                .fileSize(item.getTotalSpace())
                .build();
        log.info("Created {}", video);
        return video;
    }
}

Spring Batch - Read query from file and execute it on database

I know I can simply read the file straight from step1, a moment before setting the sql query into the reader, but I want to keep the process of reading the query separate from database reading.
Here is my job configuration.
@Configuration
public class BatchConfiguration {

    [...]

    @Bean
    @StepScope
    public JdbcCursorItemReader<Map<String, Object>> dynamicSqlItemReader() {
        JdbcCursorItemReader<Map<String, Object>> jir = new JdbcCursorItemReader<>();
        jir.setSql((String) contextHolder.getContext().get("fileContent"));
        jir.setDataSource(dataSource);
        jir.setRowMapper(new ColumnMapRowMapper());
        return jir;
    }

    private FlatFileItemReader<String> flatFileItemReader() {
        [...]
    }

    private ItemWriter<? super String> sysoItemWriter() {
        return (ItemWriter<String>) list -> {
            for (String element : list) {
                System.out.println(element);
            }
            contextHolder.getContext().put("fileContent", list.get(0));
        };
    }

    @Bean
    public ItemWriter<Map<String, Object>> customerItemWriter() {
        return list -> {
            for (Map<String, Object> stringObjectMap : list) {
                System.out.println(stringObjectMap);
            }
        };
    }

    @Bean
    public Step step0() {
        return stepBuilderFactory.get("step0")
                .<String, String>chunk(1)
                .reader(flatFileItemReader())
                .writer(sysoItemWriter())
                .build();
    }

    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1")
                .<Map<String, Object>, Map<String, Object>>chunk(10)
                .reader(dynamicSqlItemReader())
                .writer(customerItemWriter())
                .build();
    }

    @Bean
    public Job job() throws Exception {
        return jobBuilderFactory.get("job")
                .incrementer(new RunIdIncrementer())
                .start(step0())
                .next(step1())
                .build();
    }
}
This throws a java.lang.IllegalArgumentException: The SQL query must be provided, because contextHolder.getContext().get("fileContent") is still null at the time the query is set.
Before step1, you could write a tasklet that builds the query and puts it into the context, so that it stays separate and also becomes available to step1. See more about tasklets here: Tasklet to delete a table in spring batch
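A sketch of that idea, using the job execution context instead of the custom contextHolder (the key name "sql" and the query.sql path are assumptions for the example; you would keep your existing flatFileItemReader() logic for locating the file):
@Bean
public Step buildQueryStep() {
    return stepBuilderFactory.get("buildQueryStep")
            .tasklet((contribution, chunkContext) -> {
                String sql = new String(Files.readAllBytes(Paths.get("query.sql")));
                // store the query in the JOB execution context so that later steps can see it
                chunkContext.getStepContext().getStepExecution()
                        .getJobExecution().getExecutionContext().putString("sql", sql);
                return RepeatStatus.FINISHED;
            })
            .build();
}

@Bean
@StepScope
public JdbcCursorItemReader<Map<String, Object>> dynamicSqlItemReader(@Value("#{jobExecutionContext['sql']}") String sql) {
    JdbcCursorItemReader<Map<String, Object>> jir = new JdbcCursorItemReader<>();
    jir.setSql(sql); // late-bound: resolved when step1 starts, after buildQueryStep has run
    jir.setDataSource(dataSource);
    jir.setRowMapper(new ColumnMapRowMapper());
    return jir;
}
The job then becomes .start(buildQueryStep()).next(step1()), and the contextHolder is no longer needed.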
You are not using the contextHolder you created properly; that's why the value there is null.
Make sure you are putting your data into the contextHolder directly as a map entry when it is produced in step0, because when you are getting the value you are calling contextHolder.getContext(). Since it is a simple map and not an ApplicationContext, the method you are using does not exist there.
