How to read multiple files, process and write separately using spring batch - spring-boot

I want to read multiple files matching name*.txt and process them.
For that I am using a MultiResourceItemReader.
It reads all the files and processes and writes them in one go. I want to read, process, and write each file separately.
The code:
@Bean
public MultiResourceItemReader<POJO> multiResourceItemReader() throws IOException {
    MultiResourceItemReader<POJO> resourceItemReader = new MultiResourceItemReader<>();
    ClassLoader cl = this.getClass().getClassLoader();
    ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver(cl);
    Resource[] resources = resolver.getResources("file:" + filePath);
    resourceItemReader.setResources(resources);
    resourceItemReader.setDelegate(reader());
    return resourceItemReader;
}

That's how the MultiResourceItemReader is designed to work. In your case, you can create a job instance per file.
There are many advantages to making one thing do one thing and do it well. One of them in your use case is restartability: if one of the jobs fails, you only restart the failed one.
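For illustration, a minimal sketch of launching one job instance per file, reusing the filePath from the question and assuming a single-file job that takes the file location as an "input.file" job parameter (the job and jobLauncher beans are placeholders):

public void launchJobPerFile(JobLauncher jobLauncher, Job job) throws Exception {
    ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    Resource[] resources = resolver.getResources("file:" + filePath);  // same pattern as in the question
    for (Resource resource : resources) {
        JobParameters params = new JobParametersBuilder()
                .addString("input.file", resource.getFile().getAbsolutePath())
                .toJobParameters();
        // each file becomes its own job instance, so a failed file can be restarted on its own
        jobLauncher.run(job, params);
    }
}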

Related

How to write file with dynamic number of files in spring batch?

I'm new to Spring Batch and really need your help.
How can I write files in Spring Batch when the number of output files is dynamic?
Here is what the data looks like (see the attached sample):
So I need to generate files classified by "DayNo", and the set of "DayNo" values is dynamic.
I can't register the streams dynamically in the step configuration below:
return stepBuilderFactory.get("step")
        .<Subscriber, Subscriber>chunk(50)
        .reader(reader)
        .processor(processor)
        .writer(classifierCompositeItemWriter)
        .stream(singleWriter) // need to pass all the writers here
        .build();
In my case I don't know in advance how many ItemWriter beans I need to create. If I don't register the writers via stream(), I get "org.springframework.batch.item.WriterNotOpenException: Writer must be open before it can be written to".
What is the best way to solve this?
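A minimal sketch of one possible workaround, assuming the delegate writers can be created and opened lazily inside the Classifier instead of being registered as streams (the Subscriber fields, the getDayNo() accessor, and the output paths are placeholders):

@Bean
public ClassifierCompositeItemWriter<Subscriber> classifierCompositeItemWriter() {
    Map<Integer, FlatFileItemWriter<Subscriber>> writers = new HashMap<>();
    ClassifierCompositeItemWriter<Subscriber> writer = new ClassifierCompositeItemWriter<>();
    writer.setClassifier(subscriber -> writers.computeIfAbsent(subscriber.getDayNo(), dayNo -> {
        FlatFileItemWriter<Subscriber> delegate = new FlatFileItemWriterBuilder<Subscriber>()
                .name("writer-day-" + dayNo)
                .resource(new FileSystemResource("output/day-" + dayNo + ".txt"))
                .delimited()
                .names(new String[] {"id", "name", "dayNo"}) // assumed Subscriber fields
                .build();
        // opened manually because the delegate is not registered as a step stream
        delegate.open(new ExecutionContext());
        return delegate;
    }));
    return writer;
}

Delegates opened this way are not closed by the step, so they need to be closed explicitly, for example from a StepExecutionListener once the step finishes.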

Spring integration SFTP - issue with filters and number of messages emits

I started using Spring Integration SFTP and I have some questions.
Filters are not working. Here is my example configuration:
Sftp.inboundAdapter(ftpFileSessionFactory())
        .preserveTimestamp(true)
        .deleteRemoteFiles(false)
        .remoteDirectory(integrationProperties.getRemoteDirectory())
        .filter(sftpFileListFilter())  // doesn't work
        .patternFilter("*.xlsx")       // doesn't work
And my ChainFileListFilter:
private ChainFileListFilter<ChannelSftp.LsEntry> sftpFileListFilter() {
    ChainFileListFilter<ChannelSftp.LsEntry> chainFileListFilter = new ChainFileListFilter<>();
    chainFileListFilter.addFilter(new SftpPersistentAcceptOnceFileListFilter(metadataStore(), "INT"));
    chainFileListFilter.addFilter(new SftpSimplePatternFileListFilter("*.xlsx"));
    return chainFileListFilter;
}
If I understand correctly, only XLSX files should be saved to the local directory. That is not what happens with this configuration. Am I doing something wrong, or have I misunderstood?
How can I configure SFTP so that each downloaded file emits a message? I see two parameters in the docs, max-messages-per-poll and max-fetch-size, but I don't know how to set them so that every file emits a message. I would like to sync files once every 24 hours and produce a batch job queue. Maybe there is a workaround?
Is there a built-in filter which would let me fetch only files whose content has changed? The best solution would be to check the files' checksums.
I will be grateful for your help and explanations.
You cannot combine filter() and patternFilter(). Only one of them can be used: the last one overrides whatever you set before. In other words: either filter() or patternFilter(), not both. By default the logic is like this:
public SftpInboundChannelAdapterSpec patternFilter(String pattern) {
    return filter(composeFilters(new SftpSimplePatternFileListFilter(pattern)));
}

private CompositeFileListFilter<ChannelSftp.LsEntry> composeFilters(FileListFilter<ChannelSftp.LsEntry> fileListFilter) {
    CompositeFileListFilter<ChannelSftp.LsEntry> compositeFileListFilter = new CompositeFileListFilter<>();
    compositeFileListFilter.addFilters(fileListFilter,
            new SftpPersistentAcceptOnceFileListFilter(new SimpleMetadataStore(), "sftpMessageSource"));
    return compositeFileListFilter;
}
So, technically you don't need your custom filter if you don't use an external persistent MetadataStore. But if you do, consider swapping the order of SftpSimplePatternFileListFilter and SftpPersistentAcceptOnceFileListFilter, since it is better to check the pattern before storing the file in the MetadataStore.
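A minimal sketch of the reordered chain, reusing the metadataStore() bean and the "INT" prefix from the question:

private ChainFileListFilter<ChannelSftp.LsEntry> sftpFileListFilter() {
    ChainFileListFilter<ChannelSftp.LsEntry> chain = new ChainFileListFilter<>();
    // check the file name pattern first, so only *.xlsx entries ever reach the persistent filter
    chain.addFilter(new SftpSimplePatternFileListFilter("*.xlsx"));
    // then record the accepted files in the external MetadataStore so they are processed only once
    chain.addFilter(new SftpPersistentAcceptOnceFileListFilter(metadataStore(), "INT"));
    return chain;
}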
Every synced remote file that passes those filters is stored in the local directory, and the message for that local file is emitted as soon as the poller makes a request.
maxFetchSize plays its role when remote files are loaded into the local directory. maxMessagesPerPoll is used by the poller, but those messages are already built from the local files. A message is emitted per local file, not as a batch for all of them; a batch of files is not what messaging is designed for.
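For illustration, a possible Java DSL configuration along those lines (the local directory, channel name, and 24-hour poll interval are assumptions, not from the question):

@Bean
public IntegrationFlow sftpInboundFlow() {
    return IntegrationFlows
            .from(Sftp.inboundAdapter(ftpFileSessionFactory())
                            .preserveTimestamp(true)
                            .remoteDirectory(integrationProperties.getRemoteDirectory())
                            .filter(sftpFileListFilter())
                            .localDirectory(new File("sftp-local")),        // assumed local target
                    e -> e.poller(Pollers.fixedDelay(Duration.ofHours(24))  // sync once a day
                            .maxMessagesPerPoll(-1)))                       // emit a message for every synced local file
            .channel("sftpFilesChannel")                                    // assumed downstream channel
            .get();
}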
Please share more info about what exactly does not work with the files. The SftpPersistentAcceptOnceFileListFilter checks not only the file name but also the file's mtime. So it is not about a checksum, but rather the last-modified timestamp of the file.

Spring batch to upload a CSV file and insert into database

My project has a requirement where a user uploads a CSV file which has to be pushed to a SQL Server database.
I know we can use Spring Batch to process a large number of records, but I'm not able to find any tutorial/sample code for this requirement of mine.
All the tutorials which I came across just hardcoded the CSV file name and used in-memory databases, like this one:
https://spring.io/guides/gs/batch-processing/
The user input file arrives on a shared drive location at a scheduled time, with a file name prefix such as stack_overflow_dd-MM-yyyy HH:mm, on a daily basis. How can I poll the network shared drive every 5-10 minutes, for at least one hour daily, and upload the file to the database if its name matches a regex?
How can I take the CSV file from the shared location first, store it in memory or somewhere, and then configure Spring Batch to read that as input?
Any help here would be appreciated. Thanks in advance.
All the tutorials which I came across just hardcoded the CSV file name and in-memory databases
You can find samples in the official repo here. Here is an example where the input file name is not hardcoded but passed as a job parameter.
How can I take the csv file first from shared location and store it in memory or somewhere and then configure spring batch to read that as input.
You can proceed in two steps: download the file locally then read/process/write it to the database (See https://stackoverflow.com/a/52110781/5019386).
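A minimal two-step sketch of that approach (the copy paths, the Person item type, and the reader/writer beans are placeholders, not from the question):

@Bean
public Job csvToDatabaseJob(JobBuilderFactory jobs, Step downloadStep, Step loadStep) {
    return jobs.get("csvToDatabaseJob")
            .start(downloadStep)   // step 1: copy the file from the share to a local working directory
            .next(loadStep)        // step 2: read/process/write the local copy into the database
            .build();
}

@Bean
public Step downloadStep(StepBuilderFactory steps) {
    return steps.get("downloadStep")
            .tasklet((contribution, chunkContext) -> {
                // placeholder paths; in practice take them from job parameters
                Files.copy(Paths.get("//shared-drive/input/stack_overflow.csv"),
                        Paths.get("work/input.csv"), StandardCopyOption.REPLACE_EXISTING);
                return RepeatStatus.FINISHED;
            })
            .build();
}

@Bean
public Step loadStep(StepBuilderFactory steps,
                     FlatFileItemReader<Person> reader,       // assumed item type and beans
                     JdbcBatchItemWriter<Person> writer) {
    return steps.get("loadStep")
            .<Person, Person>chunk(100)
            .reader(reader)
            .writer(writer)
            .build();
}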
how can I poll the Network shared drive for every 5-10 minutes atleast for one hour daily if its matches with regex then upload to database.
Once you have defined your job, you can schedule it to run when you want using:
a scheduler like Quartz
or using Spring's task scheduling features (see the sketch after this list).
or using a combination of Spring Integration and Spring Batch: Spring Integration would poll the directory and then launch a Spring Batch job when appropriate. This approach is described here.
More details on job scheduling here.
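As an illustration of the task-scheduling option, a minimal sketch using Spring's @Scheduled with the file path passed as a job parameter (the share path, the file name regex, and the csvToDatabaseJob bean are placeholders):

@Component
public class CsvJobScheduler {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job csvToDatabaseJob;   // the job defined elsewhere

    // poll every 5 minutes; narrow the window with a cron expression if only one hour a day is needed
    @Scheduled(fixedDelay = 300_000)
    public void launchJobForNewFiles() throws Exception {
        File dir = new File("//shared-drive/input");   // assumed share mount
        File[] matches = dir.listFiles((d, name) -> name.matches("stack_overflow_.*"));
        if (matches == null) {
            return;
        }
        for (File file : matches) {
            JobParameters params = new JobParametersBuilder()
                    .addString("input.file", file.getAbsolutePath())
                    .addLong("run.time", System.currentTimeMillis())  // makes each launch a new job instance
                    .toJobParameters();
            jobLauncher.run(csvToDatabaseJob, params);
        }
    }
}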
You can create a service layer that processes the Excel file, reads the data from it, and constructs Java objects to save into the DB. Here I have used Apache POI to parse the Excel data and read from the sheet.
public class FileUploadService {

    @Autowired
    FileUploadDao fileUploadDao;

    public String uploadFileData(String inputFilePath) {
        Workbook workbook = null;
        Sheet sheet = null;
        try {
            workbook = getWorkBook(new File(inputFilePath));
            sheet = workbook.getSheetAt(0);
            /* Build the header portion of the output file */
            String headerDetails = "EmployeeId,EmployeeName,Address,Country";
            String[] headerNames = headerDetails.split(",");
            /* Read and process each row */
            ArrayList<ExcelTemplateVO> employeeList = new ArrayList<>();
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext()) {
                Row row = rowIterator.next();
                // Read and process each column in the row
                ExcelTemplateVO excelTemplateVO = new ExcelTemplateVO();
                int count = 0;
                while (count < headerNames.length) {
                    String methodName = "set" + headerNames[count];
                    String inputCellValue = getCellValueBasedOnCellType(row, count++);
                    setValueIntoObject(excelTemplateVO, ExcelTemplateVO.class, methodName, "java.lang.String", inputCellValue);
                }
                employeeList.add(excelTemplateVO);
            }
            fileUploadDao.saveFileDataInDB(employeeList);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return "Success";
    }

    // getWorkBook, getCellValueBasedOnCellType and setValueIntoObject are helper methods not shown here
}
I believe your question has already been answered here.
The author of that question even uploaded a repository with his working result:
https://github.com/PriyankaBolisetty/SpringBatchUploadCSVFileToDatabase/tree/master/src/main/java/springbatch_example
You can retrieve and filter file lists on a shared drive using the JCIFS API method SmbFile.listFiles(String wildcard).
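A minimal sketch of listing matching files on the share with JCIFS (the share URL, domain, and credentials are assumptions):

import jcifs.smb.NtlmPasswordAuthentication;
import jcifs.smb.SmbFile;

public class SharedDriveLister {

    public static void main(String[] args) throws Exception {
        // assumed domain, user, password and share URL
        NtlmPasswordAuthentication auth = new NtlmPasswordAuthentication("DOMAIN", "user", "password");
        SmbFile share = new SmbFile("smb://fileserver/shared/input/", auth);
        // listFiles accepts a DOS-style wildcard rather than a full regex
        for (SmbFile csv : share.listFiles("stack_overflow_*.csv")) {
            System.out.println(csv.getName());
        }
    }
}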

spring-integration-aws dynamic file download

I have a requirement to download a file from S3 based on message content. In other words, the file to download is not known in advance; I have to search for and find it at runtime. S3StreamingMessageSource doesn't seem to be a good fit because:
It relies on polling, whereas I need to wait for the message.
I can't find any way to create an S3StreamingMessageSource dynamically in the middle of a flow. gateway(IntegrationFlow) looks interesting, but what I need is a gateway(Function<Message<?>, IntegrationFlow>), which doesn't exist.
Another candidate is S3MessageHandler, but it has no support for listing files, which I need in order to find the desired file.
I can implement my own message handler using the AWS API directly; I'm just wondering if I'm missing something, because this doesn't seem like an unusual requirement. After all, not every app just sits there and keeps polling S3 for new files.
There is an S3RemoteFileTemplate with a list() method which you can use in a handle(). Then split() the result and call an S3MessageHandler for each remote file to download it.
Although the latter also has functionality to download a whole remote directory.
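A rough sketch of that idea, assuming an AmazonS3 client bean, a bucket named my-bucket, an input channel s3SearchRequests, and a downstream s3DownloadHandler bean (all of these names are placeholders):

@Bean
public IntegrationFlow s3SearchAndDownloadFlow(AmazonS3 amazonS3) {
    S3RemoteFileTemplate s3Template = new S3RemoteFileTemplate(new S3SessionFactory(amazonS3));

    return IntegrationFlows.from("s3SearchRequests")
            // the incoming payload is treated as a key prefix to search under;
            // list() returns the S3ObjectSummary array for "bucket/prefix"
            .handle(String.class, (prefix, headers) -> s3Template.list("my-bucket/" + prefix))
            // one message per matching remote object
            .split()
            // hand each summary to a downstream handler, e.g. an S3MessageHandler
            // configured for download, or a custom handler using the AWS SDK
            .handle("s3DownloadHandler", "handleMessage")
            .get();
}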
For anyone coming across this question, this is what I did. The trick is to:
Set the filters later, not at construction time. Note that there is no addFilters or getFilters method, so filters can only be set once and can't be added later. @artem-bilan, this is inconvenient.
Call S3StreamingMessageSource.receive manually.
.handle(String.class, (fileName, h) -> {
    if (messageSource instanceof S3StreamingMessageSource) {
        S3StreamingMessageSource s3StreamingMessageSource = (S3StreamingMessageSource) messageSource;
        ChainFileListFilter<S3ObjectSummary> chainFileListFilter = new ChainFileListFilter<>();
        chainFileListFilter.addFilters(
                new S3SimplePatternFileListFilter("**/*/*.json.gz"),
                new S3PersistentAcceptOnceFileListFilter(metadataStore, ""),
                new S3FileListFilter(fileName)
        );
        s3StreamingMessageSource.setFilter(chainFileListFilter);
        return s3StreamingMessageSource.receive();
    }
    log.warn("Expected: {} but got: {}.",
            S3StreamingMessageSource.class.getName(), messageSource.getClass().getName());
    return messageSource.receive();
}, spec -> spec
        .requiresReply(false) // in case all messages got filtered out
)

Reload rb scripts from different locations in JRuby

BACKGROUND:
I am using JRuby in an eclipse plugin for my product. I have a bunch of scripts that define a DSL and perform operations for me. I want to be able to dynamically reload these scripts whenever required. The scripts could change themselves on file system and moreover the location of the scripts could also change. I could even have multiple copies on file system of slightly modified/changed scripts. Each time I want scripts from a specific location to be utilized.
As I have understood so far, using "load" instead of "require" should do the job. So now if, before calling any Ruby methods/functions, I use "load 'XXX.rb'", it reloads XXX.rb and picks up the new changes.
PROBLEM:
In my code I am using a ScriptingContainer to run scriptlets that access Ruby functions. I set load paths on this scripting container to indicate from which locations the scripts should be loaded. However, the problem is that on subsequent calls, and even with different instances of ScriptingContainer, the scripts that were loaded the first time are used every time. "load" reloads them, but after loading those scripts once, the next time I might need to load similar scripts from a different location, and that is not happening.
My assumption was that using a different scripting container instance should do the job, but it seems that the load paths are set globally somewhere, and calling "setLoadPaths" on new ScriptingContainer instances either does not modify the existing paths or only appends to them. If the latter is true, then scripts are probably always found on the oldest paths set, and newer load paths get ignored.
Any ideas???
The solution is to specify a scope for the ScriptingContainer instance when creating it. One of the ScriptingContainer constructors takes a parameter of type LocalContextScope; use one of its constants to define the scope. See LocalContextScope.java.
To test this defect and the solution, I have written a small snippet. You may try it out:
import java.util.Arrays;

import org.jruby.embed.EvalUnit;
import org.jruby.embed.LocalContextScope;
import org.jruby.embed.ScriptingContainer;
import org.jruby.javasupport.JavaEmbedUtils;
import org.jruby.runtime.builtin.IRubyObject;

public class LoadPathProblem {

    public static void main(String[] args) {
        // Create the first container
        ScriptingContainer c1 = new ScriptingContainer();
        // FIX: ScriptingContainer c1 = new ScriptingContainer(LocalContextScope.SINGLETHREAD);

        // Setting a load path for scripts
        String[] path1 = new String[] { ".\\scripts\\one" };
        c1.getProvider().setLoadPaths(Arrays.asList(path1));

        // Run a script that requires loading scripts in the load path above
        EvalUnit unit1 = c1.parse("load 'test.rb'\n" + "testCall");
        IRubyObject ret1 = unit1.run();
        System.out.println(JavaEmbedUtils.rubyToJava(ret1));

        // Create the second container, completely independent of the first one
        ScriptingContainer c2 = new ScriptingContainer();
        // FIX: ScriptingContainer c2 = new ScriptingContainer(LocalContextScope.SINGLETHREAD);

        // Setting a different load path for this container as compared to the first container
        String[] path2 = new String[] { ".\\Scripts\\two" };
        c2.getProvider().setLoadPaths(Arrays.asList(path2));

        // Run a script that requires loading scripts in the different path
        EvalUnit unit2 = c2.parse("load 'test.rb'\n" + "testCall");
        IRubyObject ret2 = unit2.run();
        /*
         * PROBLEM: Expected here that the function testCall will be called from
         * .\scripts\two\test.rb, but as you can see the output says that
         * the function was still called from .\scripts\one\test.rb.
         */
        System.out.println(JavaEmbedUtils.rubyToJava(ret2));
    }
}
The test scripts to try out the above code can be in different folders, as long as they have the same file name ("test.rb" for the above example):
./scripts/one/test.rb
def testCall
"Called one"
end
./scripts/two/test.rb
def testCall
"Called two"
end
