How to read and process multiple files concurrently in Spring?

I am new to the Spring framework. I am building a simple project with Spring and have got stuck.
In my project I read files from a directory using a Spring poller, process each file through various channels, and send it to a queue. The problem is that the "file-inbound-channel-adapter" I'm using reads only one file at a time.
So I need a solution that reads and processes multiple files at a time.
Is there any way to implement multithreading in Spring Integration?
Thank you.

Add a task-executor to the poller; see the documentation.
You can control the concurrency with max-messages-per-poll and the task executor's pool size. See the complete poller configuration details for more information.
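To make the interplay of those two settings concrete, here is a minimal plain-JDK sketch. It is not Spring Integration code. It assumes that each poll runs as one task on the executor and handles at most max-messages-per-poll messages on that thread, so several polls can work on different files at the same time. The class name PollerSketch, the method submitPoll, and the in-memory queue standing in for the directory are all hypothetical.

```java
import java.nio.file.Path;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Plain-JDK illustration only; none of these names come from Spring Integration.
public class PollerSketch {

    /**
     * Submits one poll to the executor. The poll takes files from the shared
     * source until it has handled maxMessagesPerPoll of them or the source is
     * empty, and reports how many files it handled.
     */
    static Future<Integer> submitPoll(Queue<Path> source,
                                      int maxMessagesPerPoll,
                                      ExecutorService taskExecutor,
                                      Consumer<Path> handler) {
        return taskExecutor.submit(() -> {
            int handled = 0;
            while (handled < maxMessagesPerPoll) {
                Path next = source.poll();   // null when nothing is left
                if (next == null) {
                    break;
                }
                handler.accept(next);        // runs on the executor thread
                handled++;
            }
            return handled;
        });
    }
}
```

With a pool of two threads and a limit of two files per poll, up to two polls run in parallel, so up to two files are being processed at any moment. The source passed in should be a thread-safe queue such as ConcurrentLinkedQueue.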

Related

Spring Batch and Apache Kafka

I am in the learning phase with Kafka. I came across a video on this topic.
It confuses me a lot. I understand the Kafka consumer and producer, and I can find plenty of reference material on them. We already have batch listeners, so why do we need Spring Batch support here? Is there any specific advantage of using Spring Batch with Kafka over using normal batch listeners? Please help me understand, as I can't find any reference material comparing the two. My impression is that we have more freedom and customisation with a normal consumer and producer. Please correct me if I am wrong.
Spring Batch is a batch processing framework (fixed data sets) while Kafka is a streaming platform (infinite data streams). Those tools address two different types of requirements and use cases.
However, there are many cases where you want to have a "bridge" between these two worlds. Here are a couple examples:
Replay a stream of events to create an application state up to a certain time: Here you can use a Spring Batch job that reads a Kafka topic from the beginning and replays all events (The KafkaItemReader can be helpful here)
Inject a set of events from a file or a database table in a live stream. The KafkaItemWriter can be used in this case.
etc
The advantage of using a Spring Batch job over a regular batch listener is everything Spring Batch offers in terms of transaction management, state management for restartability, fault-tolerance features, etc.

Spring batch or Spring core libraries for building file operation process

I'm dipping my toes into microservices. Is Spring Boot with Spring Batch applicable to the following requirements?
One or more files are read from a specific directory on Linux.
Several operations are performed, such as regex matching, building new files, writing the files, and sending them by FTP to a location.
An email is sent when a process fails.
Using spring boot is confirmed, now the question is
Should I use spring batch or just core spring framework?
I need to integrate with Control-M to trigger the job. Can the Control-M be completely removed by using Spring batch library? As we don't know when to expect the files in the directory.
I've not seen a POC with these requirements. Would someone provide an example POC or an affirmation this could be achieved with Spring batch?
I would use Spring Batch for that use case. Not only does it provide out of the box components for reading, processing, and writing files, it adds a lot more for error handling, scalability, etc. All of those things you'd probably end up wiring up by yourself if you go without Spring Batch.
As for being launched via Control-M, yes MANY large customers use Control-M to launch their jobs. Unfortunately, I've never done it myself so I cannot provide any details on the mechanics, but if Control-M can either launch a script or call a REST API, you can launch a job with it.
I would suggest going for Spring Batch, as it has a lot of built-in functionality for reading files and writing them to your required location. You will also be able to handle a record-skipping requirement. Your mail-triggering requirement can be handled by Control-M: you just need to decide on one exit code for your handled exception, and on the basis of that exit code you can trigger the mail to the respective members. There are many other features that will be helpful if you go for Spring Batch.
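The exit-code idea from the last answer can be sketched in plain Java as follows. This is only an illustration under assumptions: the class ExitCodeSketch, the exception HandledJobException, and the specific code values are hypothetical and are not part of Spring Batch or Control-M. The scheduler is assumed to react to the process exit code.

```java
import java.util.concurrent.Callable;

// Hypothetical wrapper that turns the outcome of a job into a process exit code.
public class ExitCodeSketch {

    static final int EXIT_OK = 0;
    static final int EXIT_UNEXPECTED = 1;
    // Assumed value: the code the scheduler is configured to answer with an email.
    static final int EXIT_HANDLED_FAILURE = 10;

    /** Hypothetical marker for failures the job itself has recognised. */
    static class HandledJobException extends RuntimeException {
        HandledJobException(String message) {
            super(message);
        }
    }

    /** Runs the job and maps its outcome to one of the exit codes above. */
    static int run(Callable<Void> job) {
        try {
            job.call();
            return EXIT_OK;
        } catch (HandledJobException e) {
            return EXIT_HANDLED_FAILURE;
        } catch (Exception e) {
            return EXIT_UNEXPECTED;
        }
    }
}
```

A main method would then end with System.exit(ExitCodeSketch.run(job)), and the scheduler would be configured to send the mail when it sees the handled-failure code.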

How do I kickoff a batch job when input file arrives?

We have Spring 4 and Spring Batch 3, and our app consumes CSV files as input. Currently we kick off the jobs manually from the command line, using CommandLineJobRunner with parameters, including the name of the file to process.
I want to kick off a job to process asynchronously just as soon as the input file arrives in a monitored directory. How can we do that?
You may use java.nio.file.WatchService to monitor a directory for new files.
Once a file appears, you can start the actual processing, or kick off a job to process it asynchronously.
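A minimal sketch of the WatchService approach, using only JDK classes. The class name DirectoryWatchSketch and its two helper methods are hypothetical; what to do with each returned path (for example, launching the batch job with the file name as a parameter) is left to the caller.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class DirectoryWatchSketch {

    /** Registers the directory for "entry created" events and returns the watcher. */
    static WatchService watch(Path dir) throws IOException {
        WatchService watcher = dir.getFileSystem().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
        return watcher;
    }

    /**
     * Waits up to the given timeout for the next batch of events and returns
     * the full paths of the entries created in the directory. The list is
     * empty if the timeout expires first.
     */
    static List<Path> nextCreatedFiles(WatchService watcher, Path dir,
                                       long timeout, TimeUnit unit)
            throws InterruptedException {
        List<Path> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeout, unit);
        if (key == null) {
            return created;
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                // The event context is the file name relative to the watched directory.
                created.add(dir.resolve((Path) event.context()));
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return created;
    }
}
```

In a long-running process, nextCreatedFiles would be called in a loop. Note that a create event can arrive while the file is still being written, so the processing step may need to wait until the file is complete.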
You may also use FileReadingMessageSource.WatchServiceDirectoryScanner from Spring Integration (https://docs.spring.io/spring-integration/reference/html/files.html#watch-service-directory-scanner)
Comparing the release notes of Spring Batch https://github.com/spring-projects/spring-batch/releases
with those of Spring Integration https://github.com/spring-projects/spring-integration/releases, it looks like Spring Integration is released more often. It also has more features and integration points.
In this case, though, it looks like overkill to bring in Spring Integration if you just need to watch a directory for a file.
I would recommend using the powerful combination of Spring Batch with Spring Integration. For example, you can use a FileInboundChannelAdapter from Spring Integration to monitor a directory and start a Spring Batch Job as soon as the input file arrives.
There is a code example for this typical use case in the reference documentation of Spring Batch here: https://docs.spring.io/spring-batch/4.0.x/reference/html/spring-batch-integration.html#launching-batch-jobs-through-messages
I hope this helps.

Can I use Spring Integration as a daemon in order to poll a directory?

I am new to Spring Integration and I am considering using it in order to poll a directory for new files in order to process those files.
My question is: is Spring Integration some sort of daemon one can launch and that one can use in order to poll a directory?
If this is possible, can someone please direct me to the relevant section of the official documentation on how to launch Spring Integration?
All you need is to have a main method (or a WAR file if you want to deploy to Tomcat or another servlet container) that creates a Spring ApplicationContext (e.g. new ClassPathXmlApplicationContext("file-poller.xml"))
It can run with a cron trigger, fixed-rate or fixed-delay trigger.
JMX operations can be exposed on Spring Integration's File adapter (or any adapter) by simply adding a single config element (e.g. <mbean-export>).
Bottom line: you REALLY do not need an ESB if you simply want a File poller to run continuously. You can have a single small config file and one line of code in a main method.
Visit the samples for more info: https://github.com/springsource/spring-integration-samples (look under basic/file specifically)
Hope that helps,
Mark
Spring Integration is part of the Spring framework; it is not a program or daemon.
What you can do is configure Spring Integration to poll a directory and launch a JVM with Spring on board, and the poller will do what you want.
You can start with this blog post.
More samples
Relevant section of documentation

Spring Batch for File Processing

Is Spring Batch a good fit for processing a large number of individual files?
Spring Batch seems to be geared towards data-centric jobs. I've got a requirement to pull down several million files from an S3 bucket, unzip them, perform some logic based on the contents, then call a web service.
Implementing this by hand is trivial, but I don't much fancy re-inventing the wheel when it comes to tracking job executions, and how far a job got along before it failed. Spring Batch seems to be an ideal fit for this job-monitoring, but I'm not sure whether subverting it to do file processing is a step too far.
Short answer is Yes, you can use spring batch for this. I had done a small POC where we had to migrate millions of images from source system to target system in a batch process and it works well IMHO.
Adding on to the comment by @Prasanna Talakanti, I would suggest using a combination of Spring Integration and Spring Batch. While Spring Batch will provide you with infrastructure for batch processing (commit at intervals, restart a job if it failed, etc.), Spring Integration will provide you with things around web service gateways.
In Spring batch, you can define reader for reading data from S3 and writer for writing to your destination with processor in between if needed. You could also fine tune the commit interval so if the job fails in between, you have a point of rollback.
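The commit-interval idea can be illustrated with a small plain-Java sketch. This is not the Spring Batch API: the class ChunkSketch and the method processFrom are hypothetical, and a thrown exception stands in for a failed (rolled back) chunk. The point is only that progress is recorded per chunk, so a restart resumes at the first uncommitted item rather than at the beginning.

```java
import java.util.List;
import java.util.function.Consumer;

// Plain-Java illustration of chunked processing with a commit interval.
public class ChunkSketch {

    /**
     * Writes items in chunks of commitInterval, starting at startIndex.
     * Returns the index of the first item that has not been committed:
     * items.size() if everything succeeded, otherwise the start of the
     * chunk that failed, which is where a restart should resume.
     */
    static <T> int processFrom(List<T> items, int startIndex, int commitInterval,
                               Consumer<List<T>> writer) {
        int committed = startIndex;
        while (committed < items.size()) {
            int end = Math.min(committed + commitInterval, items.size());
            try {
                writer.accept(items.subList(committed, end));
            } catch (RuntimeException e) {
                return committed; // the failed chunk counts as rolled back
            }
            committed = end;      // "commit" point
        }
        return committed;
    }
}
```

A smaller commit interval means less work is repeated after a failure, at the cost of more frequent commits.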
