Multi-Line Records Reader (when start prefix = end prefix) - spring

I'm implementing a Multi-Line Records Reader solution based on https://docs.spring.io/spring-batch/reference/html/patterns.html#multiLineRecords
I have the following flat file:
HEA;0013100345;2007-02-15
NCU;Smith;Peter;;T;20014539;F
BAD;;Oak Street 31/A;;Small Town;00235;IL;US
HEA;0013100345;2007-02-15
NCU;Smith;Peter;;T;20014539;F
HEA;0013100345;2007-02-15
A HEA line (and optionally the following NCU and BAD lines) must be converted into a single object.
However, in my case I don't have an "end" line, so "HEA" is both the start of a new item and the end of the previous one.
Thanks to Dean Clark for the good suggestion below. This is the Java config of the solution:
@Bean
public FlatFileItemReader<FieldSet> readerFlat() {
    FlatFileItemReader<FieldSet> reader = new FlatFileItemReader<>();
    reader.setResource(new ClassPathResource("multirecord.txt"));
    reader.setLineMapper(compositeLineMapper());
    return reader;
}

@Bean
public SingleItemPeekableItemReader<FieldSet> readerPeek() {
    // Wraps the flat file reader so the next line can be peeked without consuming it
    SingleItemPeekableItemReader<FieldSet> reader = new SingleItemPeekableItemReader<>();
    reader.setDelegate(readerFlat());
    return reader;
}

@Bean
public MultiLineCaseItemReader readerMultirecord() {
    MultiLineCaseItemReader multiReader = new MultiLineCaseItemReader();
    multiReader.setDelegate(readerPeek());
    return multiReader;
}
Then, in the custom MultiLineCaseItemReader, you can do both read() and peek().
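For illustration, here is a minimal sketch of what that custom reader might look like. CaseItem and its addHeader/addDetail methods are hypothetical stand-ins for the aggregate type (they are not from the original post):

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamReader;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.item.support.SingleItemPeekableItemReader;

public class MultiLineCaseItemReader implements ItemStreamReader<CaseItem> {

    private SingleItemPeekableItemReader<FieldSet> delegate;

    public void setDelegate(SingleItemPeekableItemReader<FieldSet> delegate) {
        this.delegate = delegate;
    }

    @Override
    public CaseItem read() throws Exception {
        FieldSet line = delegate.read();
        if (line == null) {
            return null; // end of input
        }
        CaseItem item = new CaseItem();
        item.addHeader(line); // the current line is always a HEA record here

        // Consume NCU/BAD lines until the next HEA (or end of file) is peeked.
        for (FieldSet next = delegate.peek(); next != null; next = delegate.peek()) {
            if ("HEA".equals(next.readString(0))) {
                break; // the next item starts here; leave the line for the next read()
            }
            item.addDetail(delegate.read());
        }
        return item;
    }

    // Delegate the ItemStream callbacks so restartability is preserved.
    @Override
    public void open(ExecutionContext executionContext) {
        delegate.open(executionContext);
    }

    @Override
    public void update(ExecutionContext executionContext) {
        delegate.update(executionContext);
    }

    @Override
    public void close() {
        delegate.close();
    }
}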

As the reference docs mention, you should create a custom implementation of ItemReader to wrap the FlatFileItemReader.
More specifically, you may want to extend SingleItemPeekableItemReader and use FlatFileItemReader as your delegate.
You'd peek() ahead to the next item. If it's part of your current item, great, go ahead and augment your item. If it's the next "header" line, then you've finished the item you're working on and can return the current item.
Then, the next read() will start on the line you just peeked at without losing your place in the file or messing up restartability.

Spring Batch - use JpaPagingItemReader to read lists instead of individual items

Spring Batch is designed to read and process one item at a time, then write all the items processed in a chunk as a list. I want my item to be a List<T> as well: read and processed as a list, and then written as a List<List<T>>. My data source is a standard Spring JpaRepository<T, ID>.
My question is whether there is a standard solution for this "aggregated" approach. I see that there are some, but they don't read from a JpaRepository, like:
https://github.com/spring-projects/spring-batch/blob/main/spring-batch-samples/src/main/java/org/springframework/batch/sample/domain/multiline/AggregateItemReader.java
Spring Batch - Item Reader and ItemProcessor with a list
Spring Batch- how to pass list of multiple items from input to ItemReader, ItemProcessor and ItemWriter
Update:
I'm looking for a solution that would work for a rapidly changing dataset and in a multithreading environment.
I want my item to be a List as well, to be thus read and processed, and then write a List<List>.
Spring Batch is not (and should not be) aware of what an "item" is. It is up to you to design what an "item" is and how it is implemented (a single value, a list, a stream, etc). In your case, you can encapsulate the List<T> in a type that could be used as an item, and process data as needed. You would need a custom item reader though.
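As a minimal sketch of that encapsulation idea (the type name is illustrative, not from the answer):

import java.util.List;

// Thin wrapper so that one Spring Batch "item" carries a whole list of T values.
public class Batch<T> {

    private final List<T> values;

    public Batch(List<T> values) {
        this.values = values;
    }

    public List<T> getValues() {
        return values;
    }
}

A custom ItemReader<Batch<T>> (or ItemReader<List<T>>, as below) then accumulates delegate reads into one such wrapper per item.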
The solution we found is to use a custom aggregate reader, as suggested here, which accumulates the read data into a list of a given size and then passes it along. For our specific use case, we read data using a JpaPagingItemReader. The relevant part is:
public List<T> read() throws Exception {
    ResultHolder holder = new ResultHolder();
    // read until no more results are available or the aggregated size is reached
    while (!itemReaderExhausted && holder.getResults().size() < aggregationSize) {
        process(itemReader.read(), holder);
    }
    if (CollectionUtils.isEmpty(holder.getResults())) {
        return null;
    }
    return holder.getResults();
}

private void process(T readValue, ResultHolder resultHolder) {
    if (readValue == null) {
        itemReaderExhausted = true;
        return;
    }
    resultHolder.addResult(readValue);
}
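For context, here is a hedged sketch of the surrounding class these two methods imply; the field names are taken from the snippet, while the class shape and the ResultHolder type are assumptions:

import java.util.ArrayList;
import java.util.List;
import org.springframework.batch.item.ItemReader;

public class AggregatingItemReader<T> implements ItemReader<List<T>> {

    private final ItemReader<T> itemReader;   // e.g. the JpaPagingItemReader<T>
    private final int aggregationSize;        // maximum items per aggregated list
    private boolean itemReaderExhausted = false;

    public AggregatingItemReader(ItemReader<T> itemReader, int aggregationSize) {
        this.itemReader = itemReader;
        this.aggregationSize = aggregationSize;
    }

    // Simple accumulator used by read() and process() above.
    private class ResultHolder {
        private final List<T> results = new ArrayList<>();

        List<T> getResults() {
            return results;
        }

        void addResult(T value) {
            results.add(value);
        }
    }

    // read() and process() as shown above go here.
}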
In order to account for the volatility of the dataset, we extended the JPA reader and overrode the getPage() method to always return 0, and controlled the dataset through the processor and writer so that the next batch of fresh data is always fetched on the first page. The hint was given here and in some other SO answers.
@Override
public int getPage() {
    return 0;
}
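In context, the override would live in a small subclass along these lines (the class name is illustrative):

import org.springframework.batch.item.database.JpaPagingItemReader;

// Always read page 0: the writer removes or updates processed rows, so the
// next unprocessed rows keep sliding into the first page of the query.
public class FirstPageJpaPagingItemReader<T> extends JpaPagingItemReader<T> {

    @Override
    public int getPage() {
        return 0;
    }
}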

Spring Batch - How to store lines read from CSV into a execution context for Restartability

I am not sure if the FlatFileItemReader has the capability to store the last line read when an exception occurs, so that when I re-run the batch application it can continue from that line.
Any example implementing the following use case would be helpful. Thanks!
public static FlatFileItemReader<Employee> reader(String path) {
    FlatFileItemReader<Employee> reader = new FlatFileItemReader<>();
    reader.setResource(new ClassPathResource(path));
    reader.setLineMapper(new DefaultLineMapper<Employee>() {{
        setLineTokenizer(new DelimitedLineTokenizer() {{
            setNames(new String[] {"firstName", "lastName", "emailId"});
        }});
        setFieldSetMapper(new BeanWrapperFieldSetMapper<Employee>() {{
            setTargetType(Employee.class);
        }});
    }});
    return reader;
}
I think it's a built-in capability of the FlatFileItemReader. The reference documentation for DefaultLineMapper says:
Now that the basic interfaces for reading in flat files have been defined, it becomes clear that three basic steps are required:
Read one line from the file.
This means that if you set the chunk size to 1 in your job and the reader fails on line 195, the next run of the job picks up chunk no. 195 and continues from line no. 195.
Also, at application start Spring checks for failed executions and restarts them automatically.
Some docs:
Retry
Restartability
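To make the restartability point concrete: FlatFileItemReader implements ItemStream and saves its current line count into the step's ExecutionContext at each chunk commit, provided saveState is left enabled. A minimal sketch (the resource name is illustrative):

// FlatFileItemReader persists its position via the ItemStream contract, so a
// restarted execution resumes from the last committed line.
FlatFileItemReader<Employee> reader = new FlatFileItemReader<>();
reader.setResource(new ClassPathResource("employees.csv"));
reader.setSaveState(true); // the default; shown here only for emphasis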

trying to optimise the code and increasing performance by reading same text file from a method to different methods in java

I am trying to reduce the code and improve performance by reading the same text file once instead of in several methods.
Sample code that reads the text file in each method:
class FileProcessor {

    public static void main(String[] args) throws IOException {
        method1();
        method2();
        method3();
        // ...
    }

    static void method1() throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
        // ...
    }

    static void method2() throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
        // ...
    }

    static void method3() throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
        // ...
    }
}
What I want to know: is there any way to read the text file once in one method and use its content in the other methods in Java?
If the content of the file is immutable, you can:
store its content, line by line, via a dedicated method
call this method from the constructor
keep the returned data in a List field of the class
and refer to that field from the other methods (method1(), method2(), method3()), as in the sketch below.
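A minimal sketch of that approach, refactoring the class above (class and method names are illustrative):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class FileProcessor {

    private final List<String> lines; // file content cached once

    FileProcessor(String path) throws IOException {
        this.lines = Files.readAllLines(Paths.get(path)); // single read
    }

    void method1() {
        // work with this.lines instead of re-opening the file
    }

    void method2() {
        // work with this.lines
    }

    void method3() {
        // work with this.lines
    }
}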

Spring Batch multiple readers for different DB's

I have an existing Spring Batch project which reads data from MySQL or ArangoDB (a NoSQL database), based on a feature-toggle decision at startup, does some processing, and writes back to MySQL/ArangoDB.
The reader configuration for MySQL is something like below:
@Bean
@Primary
@StepScope
public HibernatePagingItemReader reader(
        @Value("#{jobParameters[oldMetadataDefinitionId]}") Long oldMetadataDefinitionId) {
    Map<String, Object> queryParameters = new HashMap<>();
    queryParameters.put(Constants.OLD_METADATA_DEFINITION_ID, oldMetadataDefinitionId);
    HibernatePagingItemReader<Long> reader = new HibernatePagingItemReader<>();
    reader.setUseStatelessSession(false);
    reader.setPageSize(250);
    reader.setParameterValues(queryParameters);
    reader.setSessionFactory(((HibernateEntityManagerFactory) entityManagerFactory.getObject()).getSessionFactory());
    return reader;
}
and I have another Arango reader like below:
@Bean
@StepScope
public ListItemReader arangoReader(
        @Value("#{jobParameters[oldMetadataDefinitionId]}") Long oldMetadataDefinitionId) {
    List<InstanceDTO> instanceList = new ArrayList<InstanceDTO>();
    PersistenceService arangoPersistence = arangoConfiguration.getPersistenceService();
    List<Long> instanceIds = arangoPersistence.getDefinitionInstanceIds(oldMetadataDefinitionId);
    instanceIds.forEach((instanceId) -> {
        InstanceDTO instanceDto = new InstanceDTO();
        instanceDto.setDefinitionID(oldMetadataDefinitionId);
        instanceDto.setInstanceID(instanceId);
        instanceList.add(instanceDto);
    });
    return new ListItemReader(instanceList);
}
and my step configuration is below,
@Bean
@SuppressWarnings("unchecked")
public Step InstanceMergeStep(ListItemReader arangoReader, ItemWriter<MetadataInstanceDTO> arangoWriter,
        ItemReader<Long> mysqlReader, ItemWriter<Long> mysqlWriter) {
    Step step = null;
    if (arangoUsage) {
        step = steps.get("arangoInstanceMergeStep")
                .<Long, Long>chunk(1)
                .reader(arangoReader)
                .writer(arangoWriter)
                .faultTolerant()
                .skip(Exception.class)
                .skipLimit(10)
                .taskExecutor(stepTaskExecutor())
                .build();
        ((TaskletStep) step).registerChunkListener(chunkListener);
    } else {
        step = steps.get("mysqlInstanceMergeStep")
                .<Long, Long>chunk(1)
                .reader(mysqlReader)
                .writer(mysqlWriter)
                .faultTolerant()
                .skip(Exception.class)
                .skipLimit(failedSkipLimit)
                .taskExecutor(stepTaskExecutor())
                .build();
        ((TaskletStep) step).registerChunkListener(chunkListener);
    }
    return step;
}
The MySQL reader has pagination support through HibernatePagingItemReader, so it can handle millions of items without memory issues.
I want to implement the same pagination support in the Arango reader, fetching only 250 documents per iteration. How can I modify the Arango reader code to achieve this?
First of all, the documentation of ListItemReader says that it is useful for testing, so don't use it in production. Also, return ItemReader from all your reader beans instead of the actual concrete types.
Having said that, neither the Spring Batch API nor Spring Data seems to support ArangoDB. The closest that I could find is this
(I have not worked with ArangoDB before).
So in my opinion, you have to write your own custom Arango reader that implements paging, possibly by extending the abstract class org.springframework.batch.item.database.AbstractPagingItemReader; a sketch is given below.
If it's not doable by extending the above class, you might have to implement everything from scratch. All paging readers in the Spring Batch API extend this abstract class, including HibernatePagingItemReader.
Also, remember that the Arango record set should have some kind of ordering to implement pagination, so we can distinguish between page 0, page 1, etc. (similar to the ORDER BY clause, the BETWEEN operator, and less-than/greater-than operators in SQL; a FETCH FIRST n ROWS or LIMIT clause would be needed too).
Implementing it on your own is not a very tough task: you have to calculate the total possible items, order them, divide them into pages, and fetch only one page at a time.
Look at implementations like HibernatePagingItemReader to get ideas.
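A hedged sketch of such a reader against the Spring Batch 4.x AbstractPagingItemReader contract. The paged query method getDefinitionInstanceIds(definitionId, offset, limit) is a hypothetical variant of the service call used in the question, and the ordering requirement is assumed to be handled inside that query:

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import org.springframework.batch.item.database.AbstractPagingItemReader;

public class ArangoPagingItemReader extends AbstractPagingItemReader<InstanceDTO> {

    private final PersistenceService arangoPersistence;
    private final Long definitionId;

    public ArangoPagingItemReader(PersistenceService arangoPersistence, Long definitionId) {
        this.arangoPersistence = arangoPersistence;
        this.definitionId = definitionId;
        setPageSize(250); // fetch 250 documents per iteration, as required
    }

    @Override
    protected void doReadPage() {
        if (results == null) {
            results = new CopyOnWriteArrayList<>();
        } else {
            results.clear();
        }
        // Hypothetical paged query; it must apply a stable ordering (e.g. by id)
        // so that offset/limit slices are consistent between pages.
        List<Long> instanceIds = arangoPersistence.getDefinitionInstanceIds(
                definitionId, getPage() * getPageSize(), getPageSize());
        for (Long instanceId : instanceIds) {
            InstanceDTO dto = new InstanceDTO();
            dto.setDefinitionID(definitionId);
            dto.setInstanceID(instanceId);
            results.add(dto);
        }
    }

    @Override
    protected void doJumpToPage(int itemIndex) {
        // No cursor state to move: doReadPage() recomputes the offset from getPage().
    }
}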
Hope it helps !!

New Output file for each Item passed into FlatFileItemWriter

I have the following domain object. This is the object being passed from my processor to my writer.
public class DivisionIdPromoCompStartDtEndDtGrouping {
    private int divisionId;
    private Date rpmPromoCompDetailStartDate;
    private Date rpmPromoCompDetailEndDate;
    private List<MasterList> detailRecords = new ArrayList<MasterList>();
    // getters and setters omitted
}
I would like a new file per DivisionIdPromoCompStartDtEndDtGrouping. Each file would have a line for each of the detailRecords in the list. The output files would all have the same format, just logically separated based on the data (divisionId, rpmPromoCompDetailStartDate, and rpmPromoCompDetailEndDate).
How can I create a FlatFileItemWriter that outputs a new file for each DivisionIdPromoCompStartDtEndDtGrouping with the content of detailRecords?
I think the answer might be a CompositeItemWriter. Is that right? Could someone help me with an example of this?
thanks in advance
You're close. Instead of just a CompositeItemWriter, use a ClassifierCompositeItemWriter. This, coupled with a Classifier implementation that chooses a writer by grouping, will allow you to have one file per group. You can read more about this ItemWriter in the javadoc here: http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/item/support/ClassifierCompositeItemWriter.html
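A hedged sketch of that approach. The getters on the grouping class are assumed from its field names, the output path and detailLineAggregator() are illustrative, and delegate lifecycle handling is reduced to a comment (the composite does not open or close its delegates for you):

@Bean
public ClassifierCompositeItemWriter<DivisionIdPromoCompStartDtEndDtGrouping> classifierWriter() {
    Map<String, FlatFileItemWriter<DivisionIdPromoCompStartDtEndDtGrouping>> writers = new HashMap<>();
    ClassifierCompositeItemWriter<DivisionIdPromoCompStartDtEndDtGrouping> composite =
            new ClassifierCompositeItemWriter<>();
    composite.setClassifier(group -> writers.computeIfAbsent(
            group.getDivisionId() + "_" + group.getRpmPromoCompDetailStartDate()
                    + "_" + group.getRpmPromoCompDetailEndDate(),
            key -> {
                // One writer (and file) per group key; these writers still need
                // to be closed when the step finishes.
                FlatFileItemWriter<DivisionIdPromoCompStartDtEndDtGrouping> writer =
                        new FlatFileItemWriter<>();
                writer.setResource(new FileSystemResource("output/" + key + ".txt"));
                writer.setLineAggregator(detailLineAggregator()); // hypothetical aggregator
                writer.open(new ExecutionContext());
                return writer;
            }));
    return composite;
}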
No, the answer is not a composite writer. A composite writer simply forwards all items it receives to all defined child writers.
The problem with FlatFileItemWriter is that you have to open and close it, which is normally handled by the framework itself.
A simple approach would be to implement your own writer and use a FlatFileItemWriter inside its write method.
public class MyWriter implements ItemWriter<DivisionIdPromoCompStartDtEndDtGrouping> {

    @Override
    public void write(List<? extends DivisionIdPromoCompStartDtEndDtGrouping> items) throws Exception {
        for (DivisionIdPromoCompStartDtEndDtGrouping item : items) {
            FlatFileItemWriter<DivisionIdPromoCompStartDtEndDtGrouping> fileWriter = new FlatFileItemWriter<>();
            fileWriter.setResource(...); // unique file name per item
            fileWriter.setLineAggregator(...);
            fileWriter....; // do other settings if necessary
            fileWriter.afterPropertiesSet();
            fileWriter.open(new ExecutionContext());
            fileWriter.write(Collections.singletonList(item));
            fileWriter.close();
        }
    }
}
The LineAggregator has to create an appropriate String, including the line breaks, so that every detail is written on its own line in the file.
Of course, you don't have to use a FlatFileItemWriter: you could simply open a file yourself, use the LineAggregator to create each line, and write the lines to the file.
