I've written a Spring Batch job that reads from a database and writes to a CSV file.
The job works, but each line of the output CSV is just whatever the toString method of my domain object returns.
What I am really after is all the values in the bean separated by a comma, which is why I put a DelimitedLineAggregator in my ItemWriter below.
But I think my understanding of the DelimitedLineAggregator is wrong. I thought the LineAggregator was used for the output, but now I think it's used for the input data.
@Bean
@StepScope
public ItemWriter<MasterList> masterListFileWriter(
        FileSystemResource masterListFile,
        @Value("#{stepExecutionContext}") Map<String, Object> executionContext) {
    FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
    writer.setResource(masterListFile);
    DelimitedLineAggregator<MasterList> lineAggregator = new DelimitedLineAggregator<>();
    lineAggregator.setDelimiter(";");
    writer.setLineAggregator(lineAggregator);
    writer.setForceSync(true);
    writer.open(new ExecutionContext(executionContext));
    return writer;
}
Two things.
What can I change to output all the values of my MasterList domain object separated by a comma? Is changing the toString method the only way?
Also, can someone clarify the use of the LineAggregator in the writer? I'm now thinking it's used to specify how you want to aggregate lines coming from your reader. Is that right?
Thanks in advance
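The toString() output appears because, without a field extractor, the DelimitedLineAggregator treats the whole item as a single field. I worked this out by setting a BeanWrapperFieldExtractor on the line aggregator: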
@Bean
@StepScope
public ItemWriter<MasterList> masterListFileWriter(
        FileSystemResource masterListFile,
        @Value("#{stepExecutionContext}") Map<String, Object> executionContext) {
    FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
    writer.setResource(masterListFile);
    DelimitedLineAggregator<MasterList> lineAggregator = new DelimitedLineAggregator<>();
    lineAggregator.setDelimiter(",");
    // Names the bean properties to write, in column order.
    BeanWrapperFieldExtractor<MasterList> extractor = new BeanWrapperFieldExtractor<>();
    extractor.setNames(new String[] { "l2", "l2Name" });
    lineAggregator.setFieldExtractor(extractor);
    writer.setLineAggregator(lineAggregator);
    writer.setForceSync(true);
    writer.open(new ExecutionContext(executionContext));
    return writer;
}
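On the second question: the LineAggregator sits on the output side and turns one item into one line of text. Roughly, it does two things: extract an array of field values from the item, then join them with the delimiter. Here is a minimal plain-Java sketch of that idea. It does not use the Spring Batch classes; the class name, the MasterList stand-in and the sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java illustration of the idea; these are NOT Spring Batch classes.
final class DelimitedAggregationSketch {

    // Hypothetical stand-in for the domain object, with the two properties named above.
    static final class MasterList {
        final String l2;
        final String l2Name;

        MasterList(String l2, String l2Name) {
            this.l2 = l2;
            this.l2Name = l2Name;
        }
    }

    // Turns one item into one output line: extract the fields, then join them.
    static <T> String aggregate(T item, Function<T, Object[]> fieldExtractor, String delimiter) {
        return Arrays.stream(fieldExtractor.apply(item))
                .map(String::valueOf)
                .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        MasterList item = new MasterList("45", "North");
        // An extractor that names the properties makes each value its own column.
        System.out.println(aggregate(item, m -> new Object[] { m.l2, m.l2Name }, ","));
        // A pass-through extractor yields a single field, so the line is just the item's toString().
        System.out.println(aggregate(item, m -> new Object[] { m }, ","));
    }
}
```

The second call in main mirrors what happened in the question: with nothing to split the item into fields, the only thing left to write is its toString().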
Related
I have a flat file containing different record types (header, record and footer):
HR,...
RD,...
FR,...
ItemReader
@Bean
@StepScope
public FlatFileItemReader reader(@Value("#{jobParameters['inputFileName']}") String inputFileName) {
    FlatFileItemReader reader = new FlatFileItemReader();
    reader.setResource(new FileSystemResource(inputFileName));
    reader.setLineMapper(patternLineMapper());
    return reader;
}

@Bean
public LineMapper patternLineMapper() {
    PatternMatchingCompositeLineMapper patternLineMapper = new PatternMatchingCompositeLineMapper<>();
    // One tokenizer per line prefix.
    Map<String, LineTokenizer> tokenizers = new HashMap<String, LineTokenizer>();
    try {
        tokenizers.put("HR*", headerLineTokenizer());
        tokenizers.put("RD*", recordLineTokenizer());
        tokenizers.put("FR*", footerLineTokenizer());
    } catch (Exception e) {
        e.printStackTrace();
    }
    // One field set mapper per line prefix, each producing its own domain type.
    Map<String, FieldSetMapper> fieldSetMappers = new HashMap<String, FieldSetMapper>();
    fieldSetMappers.put("HR*", new HeaderFieldSetMapper());
    fieldSetMappers.put("RD*", new RecordFieldSetMapper());
    fieldSetMappers.put("FR*", new FooterFieldSetMapper());
    patternLineMapper.setTokenizers(tokenizers);
    patternLineMapper.setFieldSetMappers(fieldSetMappers);
    return patternLineMapper;
}
This works fine, and Spring Batch picks the appropriate tokenizer and mapper for each record. The problem is the item processor: when I use the same pattern-based approach there, I get a java.lang.ClassCastException because Spring Batch tries to cast the domain object [returned from the reader] to java.lang.String.
ItemProcessor
@Bean
@StepScope
public ItemProcessor processor() {
    ClassifierCompositeItemProcessor processor = new ClassifierCompositeItemProcessor();
    PatternMatchingClassifier<ItemProcessor> classifier = new PatternMatchingClassifier<>();
    Map<String, ItemProcessor> patternMap = new HashMap<>();
    patternMap.put("HR*", new HeaderItemProcessor());
    patternMap.put("RD*", new RecordItemProcessor());
    patternMap.put("FR*", new FooterItemProcessor());
    classifier.setPatternMap(patternMap);
    processor.setClassifier(classifier);
    return processor;
}
I also tried BackToBackPatternClassifier, but it appears to have a bug, and when I use generics like ItemWriter<Object> I get a "Couldn't Open File" exception. The question is:
How can I write an ItemProcessor that handles the different record types returned from the reader?
Your issue is that the classifier you use in the ClassifierCompositeItemProcessor is based on a String pattern and not a type. What really should happen is something like:
The reader returns a specific type of items based on the input pattern, something like:
HR* -> HRType
RD* -> RDType
FR* -> FRType
This is what you have basically done on the reader side. Now on the processing side, the processor will receive objects of type HRType, RDType and FRType. So the classifier should not be based on String as input type, but on the item type, something like:
Map<Object, ItemProcessor> patternMap = new HashMap<>();
patternMap.put(HRType.class, new HeaderItemProcessor());
patternMap.put(RDType.class, new RecordItemProcessor());
patternMap.put(FRType.class, new FooterItemProcessor());
This classifier uses Object as the key type because your ItemReader returns a raw type. I would not recommend using raw types or Object in the classifier. What you should do instead is:
1. Create a base class for your items and a specific subclass for each record type.
2. Make the reader return items of type <? extends BaseClass>.
3. Use an org.springframework.classify.SubclassClassifier in your ClassifierCompositeItemProcessor.
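To make the difference between pattern-based and type-based routing concrete, here is a plain-Java sketch of choosing a processor by the item's class. It is a simplified illustration, not the Spring SubclassClassifier itself: it only walks the superclass chain, and every name in it (TypeRoutingSketch, BaseRecord, HeaderRecord and so on) is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java illustration of type-based routing; NOT the Spring classes themselves.
final class TypeRoutingSketch {

    // Hypothetical item hierarchy: one base class, one subclass per record type.
    static class BaseRecord { }
    static final class HeaderRecord extends BaseRecord { }
    static final class DataRecord extends BaseRecord { }
    static final class FooterRecord extends BaseRecord { }

    private final Map<Class<?>, Function<BaseRecord, String>> typeMap = new HashMap<>();
    private final Function<BaseRecord, String> fallback;

    TypeRoutingSketch(Function<BaseRecord, String> fallback) {
        this.fallback = fallback;
    }

    void register(Class<? extends BaseRecord> type, Function<BaseRecord, String> processor) {
        typeMap.put(type, processor);
    }

    // Simplified lookup: walk up the superclass chain until a registered type is found.
    Function<BaseRecord, String> classify(BaseRecord item) {
        for (Class<?> c = item.getClass(); c != null; c = c.getSuperclass()) {
            Function<BaseRecord, String> processor = typeMap.get(c);
            if (processor != null) {
                return processor;
            }
        }
        return fallback;
    }

    String process(BaseRecord item) {
        return classify(item).apply(item);
    }

    // Registers one processor per record type, keyed by class instead of by string pattern.
    static TypeRoutingSketch example() {
        TypeRoutingSketch router = new TypeRoutingSketch(item -> "unhandled");
        router.register(HeaderRecord.class, item -> "header");
        router.register(DataRecord.class, item -> "record");
        router.register(FooterRecord.class, item -> "footer");
        return router;
    }

    public static void main(String[] args) {
        TypeRoutingSketch router = example();
        System.out.println(router.process(new HeaderRecord()));
        System.out.println(router.process(new DataRecord()));
    }
}
```

Because the key is the item's class, the lookup never touches a String, so the ClassCastException from the pattern-based classifier cannot occur.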
My approach so far:
@Bean
FlatFileItemReader<Blub> flatFileItemReader() {
    FlatFileItemReader<Blub> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource("test.json"));
    JsonLineMapper lineMapper = new JsonLineMapper();
    reader.setLineMapper(lineMapper);
    return reader;
}
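The challenge is that reader.setLineMapper() does not accept the JsonLineMapper here: the reader expects a LineMapper<Blub>, while JsonLineMapper maps each line to a Map<String, Object>. How do I use the JsonLineMapper properly?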
Create a class BlubJsonLineMapper:
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.batch.item.file.LineMapper;

public class BlubJsonLineMapper implements LineMapper<Blub> {

    // ObjectMapper can be shared once configured, so one instance is enough.
    private final ObjectMapper mapper = new ObjectMapper();

    /**
     * Interpret the line as a JSON object and create a Blub entity from it.
     *
     * @see LineMapper#mapLine(String, int)
     */
    @Override
    public Blub mapLine(String line, int lineNumber) throws Exception {
        return mapper.readValue(line, Blub.class);
    }
}
Then set it on the FlatFileItemReader:
@Bean
FlatFileItemReader<Blub> flatFileItemReader() {
    FlatFileItemReader<Blub> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource("test.json"));
    BlubJsonLineMapper lineMapper = new BlubJsonLineMapper();
    reader.setLineMapper(lineMapper);
    return reader;
}
How to set up a FlatFileItemReader to read a JSON file?
It depends on the format of your JSON file:
1. Each line is a JSON object (known as NDJSON)
For example:
{object1}
{object2}
then you have two options:
1.1 Use the JsonLineMapper, which returns a Map<String, Object>. In this case, your reader should also return Map<String, Object>, and you can use an item processor to transform items from Map<String, Object> to Blub (BTW, transforming data from one type to another is a typical use case for an item processor).
1.2 Use a custom implementation of LineMapper<Blub> based on Jackson, Gson or any other library (as shown in the answer by @clevertension).
2. Lines are wrapped in a JSON array
For example:
[
{object1},
{object2}
]
then you can use the JsonItemReader that we introduced in version 4.1.0.M1 (see the example in the blog post: https://spring.io/blog/2018/05/31/spring-batch-4-1-0-m1-released#add-a-new-json-item-reader).
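For option 1.1, the transformation step from Map<String, Object> to Blub could look roughly like the sketch below. It is plain Java so that it stands on its own; in a real job the same logic would sit in the process method of an ItemProcessor<Map<String, Object>, Blub>. The fields id and name are assumptions, since the question does not show what a Blub contains.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the Map -> Blub conversion; the Blub fields are hypothetical.
final class MapToBlubSketch {

    // Hypothetical target type: the real Blub is not shown in the question.
    static final class Blub {
        final long id;
        final String name;

        Blub(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Validates the expected keys and builds a Blub from one parsed JSON line.
    static Blub toBlub(Map<String, Object> json) {
        Object id = json.get("id");
        Object name = json.get("name");
        if (!(id instanceof Number) || !(name instanceof String)) {
            throw new IllegalArgumentException("Unexpected JSON shape: " + json);
        }
        return new Blub(((Number) id).longValue(), (String) name);
    }

    public static void main(String[] args) {
        // Stands in for the map a JSON line mapper would produce for one line.
        Map<String, Object> line = new HashMap<>();
        line.put("id", 7);
        line.put("name", "first");
        Blub blub = toBlub(line);
        System.out.println(blub.id + " " + blub.name);
    }
}
```

Keeping the conversion in its own step means malformed lines fail with a clear message in one place instead of deep inside the writer.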
There are similar questions to this one; I'm adding them here for reference:
How to read a complex JSON in spring batch?
Json Array reader file with spring batch
Is there a bug in the new Spring JSON reader or am I doing something wrong?
I have built a small demo for JSON. If you need more than that, let me know and I can build another example for you:
https://github.com/bigzidane/spring-batch-jsonListItem-reader
In Spring Batch I configure a file writer like this:
@Bean
public FlatFileItemWriter<MyObject> flatFileItemWriter() throws Exception {
    FlatFileItemWriter<MyObject> itemWriter = new FlatFileItemWriter<>();
    // The pass-through aggregator just calls toString on any item passed in.
    itemWriter.setLineAggregator(new PassThroughLineAggregator<>());
    String outputPath = File.createTempFile("output", ".out").getAbsolutePath();
    System.out.println(">>output path=" + outputPath);
    itemWriter.setResource(new FileSystemResource(outputPath));
    itemWriter.afterPropertiesSet();
    return itemWriter;
}
What happens if MyObject is a complex structure that can vary depending on configuration settings etc., and I want to write different parts of that structure to different files?
How do I do this?
Have you looked at CompositeItemWriter? Depending on your needs, you may also need a PatternMatchingCompositeLineMapper in your reader and a ClassifierCompositeItemProcessor.
Below is an example of a CompositeItemWriter:
@Bean
public ItemWriter fileWriter() {
    CompositeItemWriter compWriter = new CompositeItemWriter();
    // Each delegate still needs its own resource and line aggregator before it can be used.
    FlatFileItemWriter<MyObject_data> dataWriter = new FlatFileItemWriter<MyObject_data>();
    FlatFileItemWriter<MyObject_otherdata> otherWriter = new FlatFileItemWriter<MyObject_otherdata>();
    List<ItemWriter> iList = new ArrayList<ItemWriter>();
    iList.add(dataWriter);
    iList.add(otherWriter);
    compWriter.setDelegates(iList);
    return compWriter;
}
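A composite writer hands every chunk to each of its delegates in order, so "different parts to different files" comes down to giving each delegate its own way of picking a part out of the item. The plain-Java sketch below shows only that fan-out idea; it does not use Spring Batch, and MyObject's two fields, the partWriter helper and the in-memory sinks are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java illustration of fan-out writing; NOT the Spring Batch CompositeItemWriter.
final class CompositeFanOutSketch {

    // Hypothetical item with two parts destined for different files.
    static final class MyObject {
        final String data;
        final String otherData;

        MyObject(String data, String otherData) {
            this.data = data;
            this.otherData = otherData;
        }
    }

    // A "writer" that extracts one part of each item and appends it to its own sink.
    static Consumer<List<MyObject>> partWriter(Function<MyObject, String> part, StringBuilder sink) {
        return items -> {
            for (MyObject item : items) {
                sink.append(part.apply(item)).append('\n');
            }
        };
    }

    // The composite passes the same chunk to every delegate, in list order.
    static Consumer<List<MyObject>> composite(List<Consumer<List<MyObject>>> delegates) {
        return items -> delegates.forEach(delegate -> delegate.accept(items));
    }

    public static void main(String[] args) {
        StringBuilder dataFile = new StringBuilder();   // stands in for the first output file
        StringBuilder otherFile = new StringBuilder();  // stands in for the second output file
        Consumer<List<MyObject>> writer = composite(Arrays.asList(
                partWriter(o -> o.data, dataFile),
                partWriter(o -> o.otherData, otherFile)));
        writer.accept(Arrays.asList(new MyObject("d1", "o1"), new MyObject("d2", "o2")));
        System.out.print(dataFile);
        System.out.print(otherFile);
    }
}
```

In Spring Batch terms, the part-picking function would correspond to each delegate's line aggregator or field extractor, with all delegates typed on the same item class.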
I'm using local partitioning in Spring Batch to write XML files to the database. I have already split the original file into smaller files and used a MultiResourcePartitioner so that each file is processed by its own thread. I'm getting a primary key constraint violation and I don't know how to deal with it.
List of files
The partitioner
@Bean
public Partitioner partitioner1() {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    Resource[] resources;
    try {
        resources = resourcePatternResolver.getResources("file:src/main/resources/data/*.xml");
    } catch (IOException e) {
        throw new RuntimeException("I/O problems when resolving the input file pattern.", e);
    }
    partitioner.setResources(resources);
    return partitioner;
}
The StaxEventItemReader using XML file as an input for the reader
@Bean
@StepScope
public StaxEventItemReader<Customer> CustomerItemReader() {
    XStreamMarshaller unmarshaller = new XStreamMarshaller();
    Map<String, Class> aliases = new HashMap<>();
    aliases.put("customer", Customer.class);
    unmarshaller.setAliases(aliases);
    StaxEventItemReader<Customer> reader = new StaxEventItemReader<>();
    reader.setResource(new ClassPathResource("data/customerOutput1-25000.xml"));
    reader.setFragmentRootElementName("customer");
    reader.setUnmarshaller(unmarshaller);
    return reader;
}
The JdbcBatchItemWriter (writing to the database)
@Bean
@StepScope
public JdbcBatchItemWriter<Customer> customerItemWriter() {
    JdbcBatchItemWriter<Customer> itemWriter = new JdbcBatchItemWriter<>();
    itemWriter.setDataSource(this.dataSource);
    itemWriter.setSql("INSERT INTO NEW_CUSTOMER VALUES (:id, :firstName, :lastName, :birthdate)");
    itemWriter.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
    itemWriter.afterPropertiesSet();
    return itemWriter;
}
Thanks for any help
Your reader has this line, which causes all the partitions to load the same file:
reader.setResource(new ClassPathResource("data/customerOutput1-25000.xml"));
It should instead take the resource from the step execution context. You can access the execution context either in the open() method of the ItemStream interface or in the beforeStep() method of the StepExecutionListener interface. A bit of personal preference here, but I generally think using ItemStream is the "better" solution.
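To see why the hard-coded resource leads to the primary key violation, here is a small plain-Java model of what partitioning produces: one context per input file, each carrying its own file under a key. By default MultiResourcePartitioner uses the key "fileName", so in Spring Batch a step-scoped reader would typically pick it up with something like @Value("#{stepExecutionContext['fileName']}"). The sketch itself does not use Spring Batch, and its class name, method names and sample file names are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of per-partition contexts; NOT the Spring Batch classes.
final class PartitionContextSketch {

    // Builds one context per input file, each holding its own file under "fileName".
    static Map<String, Map<String, String>> partition(List<String> files) {
        Map<String, Map<String, String>> contexts = new LinkedHashMap<>();
        for (int i = 0; i < files.size(); i++) {
            contexts.put("partition" + i, Collections.singletonMap("fileName", files.get(i)));
        }
        return contexts;
    }

    // The bug: the context is ignored, so every partition reads the same file
    // and tries to insert the same rows.
    static String hardCodedResource(Map<String, String> stepContext) {
        return "data/customerOutput1-25000.xml";
    }

    // The fix: each partition reads the file named in its own context.
    static String resourceFromContext(Map<String, String> stepContext) {
        return stepContext.get("fileName");
    }

    public static void main(String[] args) {
        // Hypothetical file names standing in for the split XML files.
        Map<String, Map<String, String>> contexts = partition(Arrays.asList("a.xml", "b.xml"));
        contexts.forEach((name, context) -> System.out.println(
                name + ": hard-coded=" + hardCodedResource(context)
                        + ", from context=" + resourceFromContext(context)));
    }
}
```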
I have a Spring Batch job which reads from a database and then writes to multiple CSV files. My table has a special column named divisionId, and a CSV file should exist for every distinct value of divisionId. I split out the data using a ClassifierCompositeItemWriter.
At the moment I have an ItemWriter bean defined for every distinct value of divisionId. The beans are identical except for the file name.
How can I change the configuration below to create a file with the divisionId automatically prepended to the file name, without having to register a new ItemWriter for each divisionId?
I've been playing around with the @JobScope and @StepScope annotations but can't get it right.
Thanks in advance.
@Bean
public Step readStgDbAndExportMasterListStep() {
    return commonJobConfig.stepBuilderFactory
            .get("readStgDbAndExportMasterListStep")
            .<MasterList, MasterList>chunk(commonJobConfig.chunkSize)
            .reader(commonJobConfig.queryStagingDbReader())
            .processor(masterListOutputProcessor())
            .writer(masterListFileWriter())
            .stream((ItemStream) divisionMasterListFileWriter45())
            .stream((ItemStream) divisionMasterListFileWriter90())
            .build();
}

@Bean
public ItemWriter<MasterList> masterListFileWriter() {
    BackToBackPatternClassifier classifier = new BackToBackPatternClassifier();
    classifier.setRouterDelegate(new DivisionClassifier());
    classifier.setMatcherMap(new HashMap<String, ItemWriter<? extends MasterList>>() {{
        put("45", divisionMasterListFileWriter45());
        put("90", divisionMasterListFileWriter90());
    }});
    ClassifierCompositeItemWriter<MasterList> writer = new ClassifierCompositeItemWriter<MasterList>();
    writer.setClassifier(classifier);
    return writer;
}

@Bean
public ItemWriter<MasterList> divisionMasterListFileWriter45() {
    FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource(new File(commonJobConfig.outDir, "45_masterList" + "" + ".csv")));
    writer.setHeaderCallback(masterListFlatFileHeaderCallback());
    writer.setLineAggregator(masterListFormatterLineAggregator());
    return writer;
}

@Bean
public ItemWriter<MasterList> divisionMasterListFileWriter90() {
    FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource(new File(commonJobConfig.outDir, "90_masterList" + "" + ".csv")));
    writer.setHeaderCallback(masterListFlatFileHeaderCallback());
    writer.setLineAggregator(masterListFormatterLineAggregator());
    return writer;
}
I came up with a pretty complex way of doing this. I followed a tutorial at https://github.com/langmi/spring-batch-examples/wiki/Rename-Files.
The premise is to place the file name in the step execution context.