How to write a dynamic number of output files in Spring Batch?

I'm new to Spring Batch and really need your help.
How can I write output files in Spring Batch when the number of files is dynamic?
This is what the data looks like (see the attached data):
I need to generate files classified by "DayNo", and the set of "DayNo" values is dynamic.
I can't pass the streams dynamically in the step configuration below:
return stepBuilderFactory.get("step")
        .<Subscriber, Subscriber>chunk(50)
        .reader(reader)
        .processor(processor)
        .writer(classifierCompositeItemWriter)
        .stream(singleWriter) // need to pass all the writers here
        .build();
In my case I don't know in advance how many ItemWriter beans I need to create, and if I don't pass the writers to stream() I get "org.springframework.batch.item.WriterNotOpenException: Writer must be open before it can be written to".
What is the best way to solve this?
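To illustrate, here is roughly what I mean by classifying on "DayNo" (a sketch only: getDayNo() and dayNoWriter(...) are simplified placeholders, not my real code):

@Bean
public ClassifierCompositeItemWriter<Subscriber> classifierCompositeItemWriter() {
    ClassifierCompositeItemWriter<Subscriber> writer = new ClassifierCompositeItemWriter<>();
    Map<Integer, ItemWriter<? super Subscriber>> writersByDayNo = new HashMap<>();
    writer.setClassifier(subscriber ->
            // one FlatFileItemWriter per "DayNo", but the DayNo values are not known up front,
            // so I cannot declare the writers as beans and pass them all to .stream(...)
            writersByDayNo.computeIfAbsent(subscriber.getDayNo(), dayNo -> dayNoWriter(dayNo)));
    return writer;
}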

Related

Spring integration SFTP - issue with filters and number of messages emitted

I started using Spring Integration SFTP and I have some questions.
The filters are not working. Here is my example configuration:
Sftp.inboundAdapter(ftpFileSessionFactory())
        .preserveTimestamp(true)
        .deleteRemoteFiles(false)
        .remoteDirectory(integrationProperties.getRemoteDirectory())
        .filter(sftpFileListFilter()) // doesn't work
        .patternFilter("*.xlsx") // doesn't work
And my ChainFileListFilter:
private ChainFileListFilter<ChannelSftp.LsEntry> sftpFileListFilter() {
    ChainFileListFilter<ChannelSftp.LsEntry> chainFileListFilter = new ChainFileListFilter<>();
    chainFileListFilter.addFilter(new SftpPersistentAcceptOnceFileListFilter(metadataStore(), "INT"));
    chainFileListFilter.addFilter(new SftpSimplePatternFileListFilter("*.xlsx"));
    return chainFileListFilter;
}
If I understand correctly, only XLSX files should be saved in the local directory. If so, it doesn't work with this configuration. Am I doing something wrong, or have I misunderstood?
How can I configure SFTP so that each downloaded file emits a message? I see two parameters in the docs, max-messages-per-poll and max-fetch-size, but I don't know how to set them up so that every file emits a message. I would like to sync files once every 24 hours and produce a queue of batch jobs. Maybe there is a workaround?
Is there a built-in filter that allows me to fetch only files whose content has changed? The best solution would be to check the checksums of the files.
I will be grateful for your help and explanations.
You cannot combine filter() and patternFilter(). Only one of them can be used: the last one overrides whatever you set before. In other words, use either filter() or patternFilter(), not both. By default the logic is like this:
public SftpInboundChannelAdapterSpec patternFilter(String pattern) {
    return filter(composeFilters(new SftpSimplePatternFileListFilter(pattern)));
}

private CompositeFileListFilter<ChannelSftp.LsEntry> composeFilters(FileListFilter<ChannelSftp.LsEntry> fileListFilter) {
    CompositeFileListFilter<ChannelSftp.LsEntry> compositeFileListFilter = new CompositeFileListFilter<>();
    compositeFileListFilter.addFilters(fileListFilter,
            new SftpPersistentAcceptOnceFileListFilter(new SimpleMetadataStore(), "sftpMessageSource"));
    return compositeFileListFilter;
}
So, technically you don't need your custom one if you don't use an external persistent MetadataStore. But if you do, consider swapping SftpSimplePatternFileListFilter and SftpPersistentAcceptOnceFileListFilter, since it is better to check the pattern before storing the file in the MetadataStore.
It is a fact that every synched remote file that passes those filters is stored in the local directory, and the message for that local file is emitted as soon as the poller makes a request.
The maxFetchSize plays its role when remote files are loaded into the local directory. The maxMessagesPerPoll is used by the poller, but by then the messages are already built from the local files. A message is emitted per local file, not as one batch for all of them; that's not what messaging is designed for.
Please share more info about what exactly does not work with the files. The SftpPersistentAcceptOnceFileListFilter checks not only the file name but also the mtime of the file, so it is not about a checksum, but rather the last-modified timestamp of the file.
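For clarity, a sketch of the flipped chain described above, registered through filter() only (reusing the metadataStore() and ftpFileSessionFactory() beans from the question; adjust the pattern as needed):

private ChainFileListFilter<ChannelSftp.LsEntry> sftpFileListFilter() {
    ChainFileListFilter<ChannelSftp.LsEntry> chain = new ChainFileListFilter<>();
    // check the file name pattern first ...
    chain.addFilter(new SftpSimplePatternFileListFilter("*.xlsx"));
    // ... and only then record accepted files in the persistent MetadataStore
    chain.addFilter(new SftpPersistentAcceptOnceFileListFilter(metadataStore(), "INT"));
    return chain;
}

Sftp.inboundAdapter(ftpFileSessionFactory())
        .preserveTimestamp(true)
        .deleteRemoteFiles(false)
        .remoteDirectory(integrationProperties.getRemoteDirectory())
        .filter(sftpFileListFilter()) // filter() only, no patternFilter()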

How to read multiple files, process and write separately using spring batch

I want to read multiple files, name*.txt, and process them.
For that I am using a MultiResourceItemReader.
It reads all the files and processes and writes them in one go. I want to read, process and write each file separately.
The code:
@Bean
public MultiResourceItemReader<POJO> multiResourceItemReader() throws IOException {
    MultiResourceItemReader<POJO> resourceItemReader = new MultiResourceItemReader<POJO>();
    ClassLoader cl = this.getClass().getClassLoader();
    ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver(cl);
    Resource[] resources = resolver.getResources("file:" + filePath);
    resourceItemReader.setResources(resources);
    resourceItemReader.setDelegate(reader());
    return resourceItemReader;
}
That's how the MultiResourceItemReader is designed to work. In your case, you can create a job instance per file.
There are many advantages to making one thing do one thing and do it well; one of them in your use case is restartability: if one of the jobs fails, you only restart the failed one.
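A rough sketch of what "a job instance per file" could look like, assuming a single-file job whose reader is bound to an "input.file" job parameter (the parameter name and the runPerFile method are just illustrative):

public void runPerFile(JobLauncher jobLauncher, Job singleFileJob, String filePath) throws Exception {
    Resource[] resources = new PathMatchingResourcePatternResolver().getResources("file:" + filePath);
    for (Resource resource : resources) {
        JobParameters params = new JobParametersBuilder()
                .addString("input.file", resource.getURL().toExternalForm())
                .toJobParameters();
        // each file gets its own job instance, so a failure only requires restarting that file's job
        jobLauncher.run(singleFileJob, params);
    }
}

The reader would then be a @StepScope FlatFileItemReader whose resource is set from #{jobParameters['input.file']}.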

SPRING BATCH : dynamic commit-interval

I need to know how to set a commit-interval programmatically (in a Java class, not XML) in my batch job. My program looks like the following:
// loop on lines information from flat file
// treatement on line
// commit
Is there a method in a library which permits doing the commit in a Java class?
Thank you for your help.
You would need to define your own custom CompletionPolicy. Then you set that as your chunk-completion-policy in your chunked step.
This old forum has an example implementation.
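A rough sketch of wiring a CompletionPolicy into a Java-config chunked step; SimpleCompletionPolicy is the built-in count-based policy, and the item type Line is just a placeholder for your line object:

@Bean
public Step step(StepBuilderFactory stepBuilderFactory,
                 ItemReader<Line> reader, ItemWriter<Line> writer) {
    return stepBuilderFactory.get("step")
            // chunk(CompletionPolicy) replaces the fixed commit-interval;
            // pass your own CompletionPolicy implementation here if the chunk
            // boundary has to be decided dynamically from the data
            .<Line, Line>chunk(new SimpleCompletionPolicy(100))
            .reader(reader)
            .writer(writer)
            .build();
}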

How do I make the mapper process the entire file from HDFS

This is the code where I read a file that contains HL7 messages and iterate through them using the HAPI iterator (from http://hl7api.sourceforge.net):
File file = new File("/home/training/Documents/msgs.txt");
InputStream is = new FileInputStream(file);
is = new BufferedInputStream(is);
Hl7InputStreamMessageStringIterator iter = new Hl7InputStreamMessageStringIterator(is);
I want to do this inside the map function. Obviously I need to prevent splitting in the InputFormat so that the entire file is read at once as a single value and converted to a String (the file size is 7 KB), because, as you know, HAPI can only parse complete messages.
I am a newbie to all of this, so please bear with me.
You will need to implement your own FileInputFormat subclass:
It must override the isSplitable() method to return false, which means the number of mappers will be equal to the number of input files: one input file per mapper.
You also need to implement the getRecordReader() method. That is exactly the class where you need to put your parsing logic from above.
If you do not want your data file to be split, or you want a single mapper to process your entire file (so that one file is processed by only one mapper), then extending the map/reduce InputFormat and overriding the isSplitable() method to return false will help you.
For reference (not based on your code):
https://gist.github.com/sritchie/808035
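Putting that together, a minimal sketch of such a non-splittable FileInputFormat with a whole-file record reader (new org.apache.hadoop.mapreduce API; the HL7 parsing itself would still live in your mapper):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;

// one mapper per file: the file is never split and is delivered as a single Text value
public class WholeFileInputFormat extends FileInputFormat<NullWritable, Text> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }

    @Override
    public RecordReader<NullWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    public static class WholeFileRecordReader extends RecordReader<NullWritable, Text> {

        private FileSplit split;
        private Configuration conf;
        private final Text value = new Text();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.split = (FileSplit) split;
            this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) {
                return false;
            }
            // read the whole file into one value (fine for small files such as a 7 KB HL7 batch)
            byte[] contents = new byte[(int) split.getLength()];
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(conf);
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.readFully(in, contents, 0, contents.length);
            }
            value.set(contents, 0, contents.length);
            processed = true;
            return true;
        }

        @Override
        public NullWritable getCurrentKey() {
            return NullWritable.get();
        }

        @Override
        public Text getCurrentValue() {
            return value;
        }

        @Override
        public float getProgress() {
            return processed ? 1.0f : 0.0f;
        }

        @Override
        public void close() {
        }
    }
}

Then set it on the job with job.setInputFormatClass(WholeFileInputFormat.class) and iterate over the HL7 messages from the single value inside your map() method.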
As the input is coming from a text file, you can override the isSplitable() method of FileInputFormat. With this, one mapper will process the whole file.
@Override
protected boolean isSplitable(JobContext context, Path file) {
    return false;
}

How to simulate hdfs operations using spring data

I'm new to Spring Data Hadoop and would like to ask one general question. I have files in different formats and would like to extract the useful content with Apache Tika and store it as text files in HDFS. I've gone through the Spring Data Hadoop reference documentation (http://docs.spring.io/spring-hadoop/docs/2.0.0.RELEASE/reference/html/store.html) but didn't understand how to do it, and I didn't find any other useful resources for this.
Are there any sample projects or sources for writing data to HDFS using Spring Data Hadoop?
One useful example from Risberg's comment:
https://github.com/trisberg/springone-2015/tree/master/boot-ingest
Another code snippet using TextFileWriter, an implementation of the DataStoreWriter interface:
// build the naming strategy
ChainedFileNamingStrategy namingStrategy =
        new ChainedFileNamingStrategy(
                Arrays.asList(new FileNamingStrategy[] {
                        new StaticFileNamingStrategy("document"),
                        new UuidFileNamingStrategy(someUUID),
                        new StaticFileNamingStrategy("txt", ".") }));

// set the naming strategy
textFileWriter.setFileNamingStrategy(namingStrategy);
textFileWriter.write("this is a test content");

// flush and close the writer
textFileWriter.flush();
textFileWriter.close();
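For completeness, a rough sketch of how the textFileWriter above might be constructed (assuming the TextFileWriter(Configuration, Path, CodecInfo) constructor; the HDFS URI and base path are placeholders, and a null codec means plain, uncompressed text):

Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://localhost:8020"); // placeholder: point at your cluster

// base HDFS directory for the generated files
TextFileWriter textFileWriter =
        new TextFileWriter(configuration, new Path("/tmp/tika-output"), null);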
