New Output file for each Item passed into FlatFileItemWriter - spring

I have the following domain object. This is the object being passed from my processor to my writer.
public class DivisionIdPromoCompStartDtEndDtGrouping {
    private int divisionId;
    private Date rpmPromoCompDetailStartDate;
    private Date rpmPromoCompDetailEndDate;
    private List<MasterList> detailRecords = new ArrayList<MasterList>();
    // getters and setters omitted
}
I would like a new file per DivisionIdPromoCompStartDtEndDtGrouping. Each file would have one line for each of the detailRecords in the list. The output files would all have the same format, just logically separated based on data (divisionId, rpmPromoCompDetailStartDate and rpmPromoCompDetailEndDate).
How can I create a FlatFileItemWriter that outputs a new file for each DivisionIdPromoCompStartDtEndDtGrouping, containing its detailRecords?
I think the answer might be a compositeItemWriter. Is that right? Could someone help me with an example of this.
thanks in advance

You're close. Instead of just a CompositeItemWriter, use a ClassifierCompositeItemWriter. Coupled with a Classifier implementation that chooses a writer per grouping, this will allow you to have one file per group. You can read more about this ItemWriter in the javadoc here: http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/item/support/ClassifierCompositeItemWriter.html

No, the answer is not a composite writer. A composite writer simply forwards all items it receives to all defined child writers.
The problem with FlatFileItemWriter is that you have to open and close it, which is normally handled by the framework itself.
A simple approach would be to implement your own writer and use a FlatFileItemWriter in its write method.
public class MyWriter implements ItemWriter<..> {
    public void write(List<..> items) throws Exception {
        for (.. item : items) {
            // a fresh writer per item, so each item ends up in its own file
            FlatFileItemWriter<..> fileWriter = new FlatFileItemWriter<>();
            fileWriter.setResource(...); // unique file name
            fileWriter.setLineAggregator(...);
            fileWriter.... ; // do other settings if necessary
            fileWriter.afterPropertiesSet();
            fileWriter.open(new ExecutionContext());
            try {
                fileWriter.write(Collections.singletonList(item));
            } finally {
                fileWriter.close(); // close even if the write fails
            }
        }
    }
}
The lineAggregator has to create an appropriate String including all the line breaks, so that every detail record is written on its own line in the file.
Of course, you don't have to use a FlatFileItemWriter at all: you can just open a file, use the lineAggregator to create each line, and save the lines to the file.
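The last option (skipping FlatFileItemWriter and writing each file directly) might look roughly like the sketch below. It is plain Java, not Spring Batch API: the Group class, the file-name pattern and PerGroupFileWriter are hypothetical names, and the detail lines are assumed to be already aggregated into strings. In a real job this logic would sit inside an ItemWriter's write method.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DivisionIdPromoCompStartDtEndDtGrouping:
// a grouping key plus the already-aggregated detail lines.
final class Group {
    final int divisionId;
    final String startDate; // plain strings keep the sketch short
    final String endDate;
    final List<String> detailLines;

    Group(int divisionId, String startDate, String endDate, List<String> detailLines) {
        this.divisionId = divisionId;
        this.startDate = startDate;
        this.endDate = endDate;
        this.detailLines = detailLines;
    }
}

final class PerGroupFileWriter {
    private final Path outputDir;

    PerGroupFileWriter(Path outputDir) {
        this.outputDir = outputDir;
    }

    // Writes one file per group, one line per detail record; returns the files written.
    List<Path> writeGroups(List<Group> groups) {
        List<Path> written = new ArrayList<>();
        for (Group g : groups) {
            // Assumed naming scheme: the file name is derived from the grouping key.
            String name = "division-" + g.divisionId + "_" + g.startDate + "_" + g.endDate + ".txt";
            Path target = outputDir.resolve(name);
            try {
                Files.write(target, g.detailLines, StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            written.add(target);
        }
        return written;
    }
}
```

Unlike FlatFileItemWriter, a hand-rolled writer like this keeps no restart state in the ExecutionContext, so a restarted job would simply rewrite the files.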

Related

Spring Batch - use JpaPagingItemReader to read lists instead of individual items

Spring Batch is designed to read and process one item at a time, then write the list of all items processed in a chunk. I want my item to be a List<T> as well, to be thus read and processed, and then write a List<List<T>>. My data source is a standard Spring JpaRepository<T, ID>.
My question is whether there are some standard solutions for this "aggregated" approach. I see that there are some, but they don't read from a JpaRepository, like:
https://github.com/spring-projects/spring-batch/blob/main/spring-batch-samples/src/main/java/org/springframework/batch/sample/domain/multiline/AggregateItemReader.java
Spring Batch - Item Reader and ItemProcessor with a list
Spring Batch- how to pass list of multiple items from input to ItemReader, ItemProcessor and ItemWriter
Update:
I'm looking for a solution that would work for a rapidly changing dataset and in a multithreading environment.
"I want my item to be a List<T> as well, to be thus read and processed, and then write a List<List<T>>."
Spring Batch is not (and should not be) aware of what an "item" is. It is up to you to design what an "item" is and how it is implemented (a single value, a list, a stream, etc.). In your case, you can encapsulate the List<T> in a type that can be used as an item, and process data as needed. You would need a custom item reader, though.
The solution we found is to use a custom aggregate reader as suggested here, which accumulates the read data into a list of a given size then passes it along. For our specific use case, we read data using a JpaPagingItemReader. The relevant part is:
public List<T> read() throws Exception {
    ResultHolder holder = new ResultHolder();
    // read until no more results available or aggregated size is reached
    while (!itemReaderExhausted && holder.getResults().size() < aggregationSize) {
        process(itemReader.read(), holder);
    }
    if (CollectionUtils.isEmpty(holder.getResults())) {
        return null;
    }
    return holder.getResults();
}

private void process(T readValue, ResultHolder resultHolder) {
    if (readValue == null) {
        itemReaderExhausted = true;
        return;
    }
    resultHolder.addResult(readValue);
}
To account for the volatility of the dataset, we extended the JPA reader and overrode the getPage() method to always return 0, and controlled the dataset through the processor and writer so that the next fresh data to be fetched is always on the first page. The hint was given here and in some other SO answers.
public int getPage() {
    return 0;
}

In Spring Batch, linked with a ItemReader call I want to call a static util method to populate a string

I have a Spring Batch reader with following configurations.
This reader reads from the database, one page of records at a time.
@Autowired
private SomeCreditRepot someCreditRepo;

public RepositoryItemReader<SomeCreditModel> reader() {
    RepositoryItemReader<SomeCreditModel> reader = new RepositoryItemReader<>();
    reader.setRepository(someCreditRepo);
    reader.setMethodName("someCreditTransfer");
    .
    .
    ..
    return reader;
}
I want to call utils method,
refValue = BatchProcessingUtil.generateSomeRefValue();
before the processor runs, so that every entity fetched by the reader is given the same value (the one returned by the above call) in the processor.
And then this refValue will be written to another table StoreRefValue(table).
What is the right way to do this in Spring Batch?
Should I fire the query to write the refValue, to the table StoreRefValue in the processor?
You can let your processor implement the interface StepExecutionListener. You'll then have to implement the methods afterStep and beforeStep. The first should simply return null, and in beforeStep you can call the utility method and save its return value.
Alternatively, you can use the annotation @BeforeStep. If you use the usual Java DSL, it's not required to explicitly add the processor as a listener to the step. Adding it as a processor should suffice.
There are more details in the reference documentation:
https://docs.spring.io/spring-batch/docs/current/reference/html/step.html#interceptingStepExecution

trying to optimise the code and increasing performance by reading same text file from a method to different methods in java

I am trying to reduce the code and improve performance by reading a text file once instead of re-reading the same file in several different methods in Java.
Here is sample code in which every method reads the text file itself, as required:
class {
    main() {
        method1();
        method2();
        method3();
        ....
    }
    method1() {
        BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
        ...
    }
    method2() {
        BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
        ...
    }
    method3() {
        BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
        .....
    }
}
What I want to know: is there a way to read the text file once in one method and use its content in the other methods in Java?
If the content of the file is immutable, you can:
read its content, line by line, in a dedicated method
call this method from the constructor
store the returned data in a List attribute of the class
and refer to this attribute from the other methods: method1(), method2(), method3()
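A minimal sketch of that idea, assuming the file fits in memory and does not change while the program runs. The class name CachedFileLines and its three methods are hypothetical placeholders for method1() to method3():

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

final class CachedFileLines {
    // Read once in the constructor, then shared by every method.
    private final List<String> lines;

    CachedFileLines(Path file) {
        try {
            this.lines = Collections.unmodifiableList(Files.readAllLines(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical "method1": counts the lines.
    int countLines() {
        return lines.size();
    }

    // Hypothetical "method2": counts the lines containing a given word.
    long countLinesContaining(String word) {
        return lines.stream().filter(line -> line.contains(word)).count();
    }

    // Hypothetical "method3": returns the first line, or an empty string for an empty file.
    String firstLine() {
        return lines.isEmpty() ? "" : lines.get(0);
    }
}
```

The file is opened exactly once, in the constructor; every later call works on the in-memory list.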

Multi-Line Records Reader (when start prefix = end prefix)

I'm implementing Multi-Line Records Reader solution based on https://docs.spring.io/spring-batch/reference/html/patterns.html#multiLineRecords
I have the following flat file:
HEA;0013100345;2007-02-15
NCU;Smith;Peter;;T;20014539;F
BAD;;Oak Street 31/A;;Small Town;00235;IL;US
HEA;0013100345;2007-02-15
NCU;Smith;Peter;;T;20014539;F
HEA;0013100345;2007-02-15
Each HEA line (and optionally the NCU and BAD lines that follow it) must be converted to a single object.
However, in my case I don't have an "end" line, so "HEA" is the start of a new item and the end of the previous one at the same time.
Thanks to Dean Clark for the good suggestion below. This is java config of the solution:
@Bean
public FlatFileItemReader<FieldSet> readerFlat() {
    FlatFileItemReader<FieldSet> reader = new FlatFileItemReader<>();
    reader.setResource(new ClassPathResource("multirecord.txt"));
    reader.setLineMapper(compositeLineMapper());
    return reader;
}

@Bean
public SingleItemPeekableItemReader<FieldSet> readerPeek() {
    SingleItemPeekableItemReader<FieldSet> reader = new SingleItemPeekableItemReader<FieldSet>() {{
        setDelegate(readerFlat());
    }};
    return reader;
}

@Bean
public MultiLineCaseItemReader readerMultirecord() {
    MultiLineCaseItemReader multiReader = new MultiLineCaseItemReader() {{
        setDelegate(readerPeek());
    }};
    return multiReader;
}
Then, in the custom MultiLineCaseItemReader, you can call both read() and peek().
As the reference docs mention, you should create a custom implementation of ItemReader to wrap the FlatFileItemReader.
More specifically, you may want to extend SingleItemPeekableItemReader and use FlatFileItemReader as your delegate.
You'd peek() ahead to the next item. If it's part of your current item, great, go ahead and augment your item. If it's the next "header" line, then you've finished the item you're working on and can return the current item.
Then, the next read() will start on the line you just peeked at without losing your place in the file or messing up restartability.
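The peek-ahead grouping itself can be sketched in plain Java, independent of Spring Batch. HeaderGroupingReader below is a hypothetical class that works on raw lines from an Iterator rather than on FieldSets; in the real reader, its peek() and next() would be the delegate's peek() and read():

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class HeaderGroupingReader {
    private final Iterator<String> lines;
    private String peeked; // one-line look-ahead buffer, null when empty

    HeaderGroupingReader(Iterator<String> lines) {
        this.lines = lines;
    }

    // Looks at the next line without consuming it; null at end of input.
    private String peek() {
        if (peeked == null && lines.hasNext()) {
            peeked = lines.next();
        }
        return peeked;
    }

    // Consumes and returns the next line; null at end of input.
    private String next() {
        String line = peek();
        peeked = null;
        return line;
    }

    // Returns the next record (a HEA line plus its follow-up lines), or null at end of input.
    List<String> read() {
        String first = next();
        if (first == null) {
            return null;
        }
        List<String> record = new ArrayList<>();
        record.add(first);
        // Keep consuming until the look-ahead shows the next header or the input ends.
        while (peek() != null && !peek().startsWith("HEA")) {
            record.add(next());
        }
        return record;
    }
}
```

Fed the six sample lines from the question, successive read() calls return records of three, two and one lines, and then null.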

Reading and writing multiple files simultaneously using Spring batch

We are developing one application which will read multiple files & write multiple files i.e. one output file for one input file (name of output file must be same as input file).
MultiResourceItemReader can read multiple files but not simultaneously, which is a performance bottleneck for us. Spring batch provides multithreading support for this but again many threads will read the same file & try to write it. Since output file name must be same as Input file name, we can't use that option too.
Now I am looking for one more possibility, if I can create 'n' threads to read & write 'n' files. But I am not sure how to integrate this logic with Spring Batch framework.
Advance thanks for any help.
Since MultiResourceItemReader doesn't meet your performance needs you may take a closer look at parallel processing, which you already mentioned is a desirable option. I don't think many threads will read the same file and try to write it when running multi-threaded, if configured correctly.
Rather than taking the typical chunk-oriented approach, you could create a tasklet-oriented step that is partitioned (multi-threaded). The tasklet class would be the main driver, delegating calls to a reader and a writer.
The general flow would be something like this:
Retrieve the names of all the files that need to be read in/written out (via some service class) and save them to the execution context within an implementation of Partitioner.
public class filePartitioner implements Partitioner {
    private ServiceClass service;
    private String directory; // assumed to be injected as well

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        // this is just pseudo-ish code, but maybe you inject the directory you'll be targeting into this class
        Map<String, Path> filesToProcess = this.service.getFilesToProcess(directory);
        Map<String, ExecutionContext> execCtxs = new HashMap<>();
        for (Entry<String, Path> entry : filesToProcess.entrySet()) {
            // one execution context (and therefore one partition) per file
            ExecutionContext execCtx = new ExecutionContext();
            execCtx.put("file", entry.getValue());
            execCtxs.put(entry.getKey(), execCtx);
        }
        return execCtxs;
    }

    // injected
    public void setServiceClass(ServiceClass service) {
        this.service = service;
    }
}
a. For the .getFilesToProcess() method you just need something that returns all of the files in the designated directory because you need to eventually know what is to be read and the name of the file that is to be written. Obviously there are several ways to go about this, such as...
public Map<String, Path> getFilesToProcess(String directory) {
    Map<String, Path> filesToProcess = new HashMap<String, Path>();
    File directoryFile = new File(directory); // where directory is where you intend to read from
    this.generateFileList(filesToProcess, directoryFile, directory);
    return filesToProcess;
}

private void generateFileList(Map<String, Path> fileList, File node, String directory) {
    // traverse directory and get files, adding to file list.
    if (node.isFile()) {
        // key: the file's path relative to the root directory (assumes directory is an absolute path)
        String file = node.getAbsolutePath().substring(directory.length() + 1);
        fileList.put(file, node.toPath()); // the map's value type is Path, so store the file's path
    }
    if (node.isDirectory()) {
        String[] files = node.list();
        for (String filename : files) {
            this.generateFileList(fileList, new File(node, filename), directory);
        }
    }
}
You'll need to create a tasklet, which will pull file names from the execution context and pass them to some injected class that will read in the file and write it out (custom ItemReaders and ItemWriters may be necessary).
The rest of the work would be in configuration, which should be fairly straightforward. It is in the configuration of the Partitioner where you can set your grid size, which could even be done dynamically using SpEL if you really intend to create n threads for n files. I would bet a fixed number of threads running across n files would show significant improvement in performance, but you'll be able to determine that for yourself.
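To get a feel for the "fixed number of threads across n files" idea outside the framework, here is a rough plain-Java sketch. It is not Spring Batch configuration: ParallelFileProcessor is a hypothetical class, and the upper-casing step is only a placeholder for the real read/transform/write logic. Each task owns exactly one input file and writes an output file of the same name, which is the same guarantee a one-partition-per-file setup is meant to give.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

final class ParallelFileProcessor {
    // Processes each input file on a fixed-size pool; each task owns exactly one file,
    // so no two threads ever read or write the same file.
    static List<Path> processAll(List<Path> inputs, Path outputDir, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (Path input : inputs) {
                futures.add(pool.submit(() -> processOne(input, outputDir)));
            }
            List<Path> outputs = new ArrayList<>();
            for (Future<Path> future : futures) {
                outputs.add(future.get()); // waits for the task; a task failure surfaces here
            }
            return outputs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder transformation (upper-casing); the output keeps the input's file name.
    private static Path processOne(Path input, Path outputDir) throws IOException {
        List<String> transformed = Files.readAllLines(input).stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        return Files.write(outputDir.resolve(input.getFileName()), transformed);
    }
}
```

Timing processAll with different thread counts on a representative set of files is a cheap way to check whether a fixed pool is enough before wiring up partitions.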
Hope this helps.
