Wring to multiple files dynamically in Spring Batch - spring-boot

In Spring batch I configure a file write as such:
#Bean
public FlatFileItemWriter<MyObject> flatFileItemWriter() throws Exception{
FlatFileItemWriter<MyObject> itemWriter = new FlatFileItemWriter();
// pass through aggregator just calls toString on any item pass in.
itemWriter.setLineAggregator(new PassThroughLineAggregator<>());
String outputPath = File.createTempFile("output", ".out").getAbsolutePath();
System.out.println(">>output path=" + outputPath);
itemWriter.setResource(new FileSystemResource(outputPath));
itemWriter.afterPropertiesSet();
return itemWriter;
}
What happens if MyObject is a complex structure that can vary depending on configuration settings etc and I want to generate different parts of that structure to different files.
How do I do this?

Have you looked at CompositeItemWriter? You may need to have CompositeLineMapper in your reader as well as ClassifierCompositeItemProcessor depending on your needs.
Below is example of a CompositeItemWriter
#Bean
public ItemWriter fileWriter() {
CompositeItemWriter compWriter = new CompositeItemWriter();
FlatFileItemWriter<MyObject_data> dataWriter = new FlatFileItemWriter<MyObject_data>();
FlatFileItemWriter<MyObject_otherdata> otherWriter = new FlatFileItemWriter<MyObject_otherdata>();
List<ItemWriter> iList = new ArrayList<ItemWriter>();
iList.add(dataWriter);
iList.add(otherWriter);
compWriter.setDelegates(iList);
return compWriter;
}

Related

Spring Batch Partitioning At Runtime

I want to read a large file using spring batch. I want to split into multiple files and process each of them in a different thread using partitions. I am using the below code:
#Bean
#StepScope
public MultiResourcePartitioner partitioner() {
MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
partitioner.setKeyName("file");
partitioner.setResources(splitFiles());
return partitioner;
}
private Resource[] splitFiles() {
// Read the large File available in the specified folder
// split the file to smaller files and return them as resource list
}
#Bean
public TaskExecutorPartitionHandler partitionHandler() {
TaskExecutorPartitionHandler partitionHandler = new TaskExecutorPartitionHandler();
partitionHandler.setStep(step1());
partitionHandler.setTaskExecutor(new SimpleAsyncTaskExecutor());
return partitionHandler;
}
#Bean
public Step partitionedMaster() {
return this.stepBuilderFactory.get("step1")
.partitioner(step1().getName(), partitioner(null))
.partitionHandler(partitionHandler())
.build();
}
#Bean
public Job partitionedJob() {
return this.jobBuilderFactory.get("partitionedJob")
.start(partitionedMaster())
.build();
}
#Bean
#StepScope
public FlatFileItemReader<Transaction> fileTransactionReader(#Value("#{stepExecutionContext['file']}") Resource resource) {
return new FlatFileItemReaderBuilder<Transaction>()
.name("flatFileTransactionReader")
.resource(resource)
.fieldSetMapper(fsm)
.build();
}
My issue is that the partitioner is partitioning the files which are only available in the folder at the start of the application. Once the application is up and running, if a new file is available in the same folder, the job couldn't read them/partition them.
I used #StepScope, still i'm having the issue.
How do I read and partition the files dynamically at runtime?
Editing it after the first answer:
Hi, Thanks for the inputs.
I can modify the code as below to send the files as parameters and invoke the job, but still the control is not going inside partitioner method, hence could not leverage partitioning.
Any inputs on this?
public JobParameters getJobParameters() {
Resource[] resources = //getFileToProcessResource
return new JobParametersBuilder()
.addLong(TIME, System.currentTimeMillis())
.addString("inputFiles", resources)
.toJobParameters();
}
JobParameters jobParameters = getJobParameters();
jobLauncher.run(partitionedJob(), jobParameters);
#Bean
#StepScope
public MultiResourcePartitioner partitioner(#Value("#{jobParameters['inputFiles']}") Resource[] resources) {
MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
partitioner.setKeyName("file");
partitioner.setResources(resources);
return partitioner;
}
Once the application is up and running, if a new file is available in the same folder, the job couldn't read them/partition them
Batch processing is about fixed data sets. In your case, you start a job but its input data changes in the meantime, so that's not going to work as you expect. A fixed data set is required for restartability in order to work on the same data set in case of failure.
Since the input of your job is a file, you can use the file as a job parameter and configure a watch service (or similar mechanism) to launch a new job instance for each new file in the folder.
EDIT: Add example to make the partitioner aware of the job parameter
#Bean
#StepScope
public MultiResourcePartitioner partitioner(#Value("#{jobParameters['fileName']}") String fileName) {
MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
partitioner.setKeyName("file");
partitioner.setResources(splitFiles(fileName));
return partitioner;
}
private Resource[] splitFiles(String fileName) {
// Read the large File available in the specified folder
// split the file to smaller files and return them as resource list
return null;
}

How to call appropriate Item Processor for different records?

I have a flat file containing different records(header, record and footer)
HR,...
RD,...
FR,...
ItemReader
#Bean
#StepScope
public FlatFileItemReader reader(#Value("#{jobParameters['inputFileName']}") String inputFileName) {
FlatFileItemReader reader = new FlatFileItemReader();
reader.setResource(new FileSystemResource(inputFileName));
reader.setLineMapper(patternLineMapper());
return reader;
}
#Bean
public LineMapper patternLineMapper() {
PatternMatchingCompositeLineMapper patternLineMapper = new PatternMatchingCompositeLineMapper<>();
tokenizers = new HashMap<String, LineTokenizer>();
try {
tokenizers.put("HR*", headerLineTokenizer());
tokenizers.put("RD*", recordLineTokenizer());
tokenizers.put("FR*", footerLineTokenizer());
} catch (Exception e) {
e.printStackTrace();
}
fieldSetMappers = new HashMap<String, FieldSetMapper>();
fieldSetMappers.put("HR*", new HeaderFieldSetMapper());
fieldSetMappers.put("RD*", new RecordFieldSetMapper());
fieldSetMappers.put("FR*", new FooterFieldSetMapper());
patternLineMapper.setTokenizers(tokenizers);
patternLineMapper.setFieldSetMappers(fieldSetMappers);
return patternLineMapper;
}
They are working fine and spring batch calls the appropriate reader for each record the problem is when it comes to item processor I want to use the same approach I get java.lang.ClassCastException cuz spring batch try to map domain object [returned from reader] to java.lang.String
ItemProcessor
#Bean
#StepScope
public ItemProcessor processor() {
ClassifierCompositeItemProcessor processor = new ClassifierCompositeItemProcessor();
PatternMatchingClassifier<ItemProcessor> classifier = new PatternMatchingClassifier<>();
Map<String, ItemProcessor> patternMap = new HashMap<>();
patternMap.put("HR*", new HeaderItemProcessor());
patternMap.put("RD*", new RecordItemProcessor());
patternMap.put("FR*", new FooterItemProcessor());
classifier.setPatternMap(patternMap);
processor.setClassifier(classifier);
return processor;
}
I also used BackToBackPatternClassifier but it turns out it has a bug and when I use generics like ItemWriter<Object> I get an exception Couldn't Open File. the question is
How can I make ItemProcessor that handles different record types returned from Reader??
Your issue is that the classifier you use in the ClassifierCompositeItemProcessor is based on a String pattern and not a type. What really should happen is something like:
The reader returns a specific type of items based on the input pattern, something like:
HR* -> HRType
RD* -> RDType
FR* -> FRType
This is what you have basically done on the reader side. Now on the processing side, the processor will receive objects of type HRType, RDType and FRType. So the classifier should not be based on String as input type, but on the item type, something like:
Map<Object, ItemProcessor> patternMap = new HashMap<>();
patternMap.put(HRType.class, new HeaderItemProcessor());
patternMap.put(RDType.class, new RecordItemProcessor());
patternMap.put(FRType.class, new FooterItemProcessor());
This classifier uses Object type because your ItemReader returns a raw type. I would not recommend using raw types and Object type in the classifier. What you should do is:
create a base class of your items and a specific class for each type
Make the reader return items of type <? extends BaseClass>
Use a org.springframework.classify.SubclassClassifier in your ClassifierCompositeItemProcessor

Spring Batch: How to setup a FlatFileItemReader to read a json file?

My approach so far:
#Bean
FlatFileItemReader<Blub> flatFileItemReader() {
FlatFileItemReader<Blub> reader = new FlatFileItemReader<>();
reader.setResource(new FileSystemResource("test.json"));
JsonLineMapper lineMapper = new JsonLineMapper();
reader.setLineMapper(lineMapper);
return reader;
}
The challenge is: reader.setLineMapper() cannot use the JsonLineMapper. How to use the JsonLineMapper properly?
create a class BlubJsonLineMapper
public class BlubJsonLineMapper implements LineMapper<Blub> {
private ObjectMapper mapper = new ObjectMapper();
/**
* Interpret the line as a Json object and create a Blub Entity from it.
*
* #see LineMapper#mapLine(String, int)
*/
#Override
public Blub mapLine(String line, int lineNumber) throws Exception {
return mapper.readValue(line, Blub.class);
}
}
then you can set in the FlatFileItemReader
#Bean
FlatFileItemReader<Blub> flatFileItemReader() {
FlatFileItemReader<Blub> reader = new FlatFileItemReader<>();
reader.setResource(new FileSystemResource("test.json"));
BlubJsonLineMapper lineMapper = new BlubJsonLineMapper();
reader.setLineMapper(lineMapper);
return reader;
}
How to setup a FlatFileItemReader to read a json file?
It depends on the format of your json file:
1. Each line is a json object (known as NDJson)
For example:
{object1}
{object2}
then you have two options:
1.1 Use the JsonLineMapper which returns a Map<String, Object>. In this case, your reader should also return Map<String, Object> and you can use an item processor to transform items from Map<String, Object> to Blub (BTW, transforming data from one type to another is a typical use case for an item processor)
1.2 Use a custom implementation of LineMapper<Blub> based on Jackson or Gson or any other library (as shown in the answer by #clevertension)
2. Lines are wrapped in a json array
For example:
[
{object1},
{object2}
]
then you can use the new JsonItemReader that we introduced in version 4.1.0.M1 (See example in the blog post here: https://spring.io/blog/2018/05/31/spring-batch-4-1-0-m1-released#add-a-new-json-item-reader).
There are similar questions to this one, I'm adding them here for reference:
How to read a complex JSON in spring batch?
Json Array reader file with spring batch
Is there a bug in the new Spring JSON reader or am I doing something wrong?
I have build a small demo for Json. If you need any more than it, let me know I can build another example for you
https://github.com/bigzidane/spring-batch-jsonListItem-reader

ItemWriter not outputting rows how I would like

I've written a spring batch job to read from a database and then write to a csv.
The job works but unfortunately in my output CSV file it just puts whatever is in the toString method of my Domain Object.
What I am really after is all the values in the bean separated by a comma. Which is why in my ItemWriter below I put in a DelimitedLineAggregator.
But I think my understanding of that DelimitedLineAggregator is wrong. I thought that the LineAggregator was used for the output but now I think it's used for the input data.
#Bean
#StepScope
public ItemWriter<MasterList> masterListFileWriter(
FileSystemResource masterListFile,
#Value("#{stepExecutionContext}")Map<String, Object> executionContext) {
FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
writer.setResource(masterListFile);
DelimitedLineAggregator<MasterList> lineAggregator = new DelimitedLineAggregator<>();
lineAggregator.setDelimiter(";");
writer.setLineAggregator(lineAggregator);
writer.setForceSync(true);
writer.open(new ExecutionContext(executionContext));
return writer;
}
Two things.
What can I change to output all the values of my MasterList domain object separated by a comma? Is changing the toString method the only way?
Also can someone clarify the use of the LineAggregator in the writer. I'm now thinking it's used to specify how you want to aggregate lines coming from your Reader. Is that right?
Thanks in advance
I worked this out by adding a BeanWrapperFieldExtractor to the writer.
#Bean
#StepScope
public ItemWriter<MasterList> masterListFileWriter(
FileSystemResource masterListFile,
#Value("#{stepExecutionContext}")Map<String, Object> executionContext) {
FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
writer.setResource(masterListFile);
DelimitedLineAggregator<MasterList> lineAggregator = new DelimitedLineAggregator<>();
lineAggregator.setDelimiter(",");
BeanWrapperFieldExtractor<MasterList> extractor = new BeanWrapperFieldExtractor<MasterList>();
extractor.setNames(new String[] { "l2", "l2Name"});
lineAggregator.setFieldExtractor(extractor);
writer.setLineAggregator(lineAggregator);
writer.setForceSync(true);
writer.open(new ExecutionContext(executionContext));
return writer;
}

How to change my job configuration to add file name dynamically

I have a spring batch job which reads from a db then outputs to a multiple csv's. Inside my db I have a special column named divisionId. A CSV file should exist for every distinct value of divisionId. I split out the data using a ClassifierCompositeItemWriter.
At the moment I have an ItemWriter bean defined for every distinct value of divisionId. The beans are the same, it's only the file name that is different.
How can I change the configuration below to create a file with the divisionId automatically pre-pended to the file name without having to register a new ItemWriter for each divisionId?
I've been playing around with #JobScope and #StepScope annotations but can't get it right.
Thanks in advance.
#Bean
public Step readStgDbAndExportMasterListStep() {
return commonJobConfig.stepBuilderFactory
.get("readStgDbAndExportMasterListStep")
.<MasterList,MasterList>chunk(commonJobConfig.chunkSize)
.reader(commonJobConfig.queryStagingDbReader())
.processor(masterListOutputProcessor())
.writer(masterListFileWriter())
.stream((ItemStream) divisionMasterListFileWriter45())
.stream((ItemStream) divisionMasterListFileWriter90())
.build();
}
#Bean
public ItemWriter<MasterList> masterListFileWriter() {
BackToBackPatternClassifier classifier = new BackToBackPatternClassifier();
classifier.setRouterDelegate(new DivisionClassifier());
classifier.setMatcherMap(new HashMap<String, ItemWriter<? extends MasterList>>() {{
put("45", divisionMasterListFileWriter45());
put("90", divisionMasterListFileWriter90());
}});
ClassifierCompositeItemWriter<MasterList> writer = new ClassifierCompositeItemWriter<MasterList>();
writer.setClassifier(classifier);
return writer;
}
#Bean
public ItemWriter<MasterList> divisionMasterListFileWriter45() {
FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource(new File(commonJobConfig.outDir, "45_masterList" + "" + ".csv")));
writer.setHeaderCallback(masterListFlatFileHeaderCallback());
writer.setLineAggregator(masterListFormatterLineAggregator());
return writer;
}
#Bean
public ItemWriter<MasterList> divisionMasterListFileWriter90() {
FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource(new File(commonJobConfig.outDir, "90_masterList" + "" + ".csv")));
writer.setHeaderCallback(masterListFlatFileHeaderCallback());
writer.setLineAggregator(masterListFormatterLineAggregator());
return writer;
}
I came up with a pretty complex way of doing this. I followed a tutorial at https://github.com/langmi/spring-batch-examples/wiki/Rename-Files.
The premise is to use the step execution context to place the file name in it.

Resources