How to process multiple CSV files using Spring Batch

I am using Spring Batch to process my inbound files. Below is my use case:
I will be receiving a zip that contains 15 files in CSV format.
I need to process them in parallel.
After all files are processed, some calculations need to be done and a report should be sent out.
Could anyone suggest how to implement this using Spring Batch?

I would suggest the following approach:
Partitioner
Unzip the zip file.
For each CSV file, create an ExecutionContext and add it to a queue for parallel processing (a sketch of such a partitioner is shown below).
The reader will be the CSV reader provided by Spring Batch.
A listener will be used to send the report when all partitions are done.
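Here is a minimal sketch of such a partitioner, assuming the zip has already been extracted into a working directory; the class name, the workDir field, and the "inputFile" context key are illustrative, not part of the original answer.

import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class CsvFilePartitioner implements Partitioner {

    private final File workDir; // directory holding the unzipped CSV files

    public CsvFilePartitioner(File workDir) {
        this.workDir = workDir;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        File[] csvFiles = workDir.listFiles((dir, name) -> name.endsWith(".csv"));
        int i = 0;
        for (File csv : csvFiles) {
            // One ExecutionContext per CSV file; a step-scoped reader can
            // resolve it via @Value("#{stepExecutionContext['inputFile']}").
            ExecutionContext context = new ExecutionContext();
            context.putString("inputFile", csv.getAbsolutePath());
            partitions.put("partition" + i++, context);
        }
        return partitions;
    }
}

Running the partitioned step with a TaskExecutor-backed PartitionHandler gives you the parallel processing, and a JobExecutionListener can send the report once all partitions have finished.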
Please refer to this one as an example.
If you want one that matches your requirement exactly, let me know and I can post it for you.
Nghia

Related

Can I delete a file in NiFi after sending messages to Kafka?

Hi, I'm using NiFi as an ETL tool.
This is my current process: I use TailFile to detect CSV files and then send messages to Kafka.
It works fine so far, but I want to delete each CSV file after I send its contents to Kafka.
Is there any way?
Thanks
This depends on why you are using TailFile. From the docs:
"Tails" a file, or a list of files, ingesting data from the file as it is written to the file
TailFile is used to pick up new lines as they are appended to the same file. If you need to tail a file that is being written to, what condition determines that it is no longer being written to?
However, if you are just consuming complete files from the local file system, you could use GetFile, which gives you the option to delete the file after it is consumed.
From a remote file system, you could use ListSFTP and FetchSFTP, where FetchSFTP has a Completion Strategy to move or delete the file.

Correct scope for multi-threaded batch jobs in Spring

I believe I've got a scoping issue here.
Project explanation:
The goal is to process any incoming file (on disk), including its metadata (which is stored in an SQL database). For this I have two tasklets (FileReservation and FileProcessorTask), which are the steps in the overarching "worker" jobs. They wait for an event to start their work. There are several threads dealing with jobs for concurrency. The FileReservation tasklet sends the fileId to FileProcessorTask using the job context.
A separate job (which runs indefinitely) checks for new file metadata records in the database and, upon discovering new records, "wakes up" the FileReservationTask tasklets using a published event.
With the current configuration, the second step in a job can receive a null message when the FileReservation tasklets are awoken.
If you uncomment the code in BatchConfiguration you'll see that it works when we have separate instances of the beans.
Any pointers are greatly appreciated.
Thanks!
Polling a folder for new files is not a good fit for a batch job, so using a Spring Batch job (filePollingJob) for that is not a good idea IMO.
Polling a folder for new files and running a job for each incoming file is a common use case, which can be implemented with a java.nio.file.WatchService or a FileInboundChannelAdapter from Spring Integration. See How do I kickoff a batch job when input file arrives? for more details.
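As a rough illustration of the WatchService option (a sketch only: the jobLauncher and fileProcessingJob beans and the "input.file" parameter name are assumptions for the example):

import java.nio.file.*;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class IncomingFileWatcher {

    private final JobLauncher jobLauncher;
    private final Job fileProcessingJob;

    public IncomingFileWatcher(JobLauncher jobLauncher, Job fileProcessingJob) {
        this.jobLauncher = jobLauncher;
        this.fileProcessingJob = fileProcessingJob;
    }

    public void watch(Path dir) throws Exception {
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
        while (true) {
            WatchKey key = watcher.take(); // blocks until a file system event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                Path file = dir.resolve((Path) event.context());
                // Launch one job instance per incoming file, keyed by its path.
                jobLauncher.run(fileProcessingJob, new JobParametersBuilder()
                        .addString("input.file", file.toString())
                        .toJobParameters());
            }
            key.reset();
        }
    }
}

This keeps the polling outside of Spring Batch, so each job has a clear start and end, as the answer above suggests.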

Reading multiple files in a folder, parsing them, and writing to another folder

I am new to Spring Batch. My requirement is: I have a folder, say D:\xyzfolder\source, which contains 25 flat files. Using Spring Batch I need to read them, apply some business logic, and write all 25 files with the same names into a different folder, say D:\xyzfolder\destination.
Currently I am using MultiResourceItemReader to read all 25 files from the source folder, and I am able to write into a single file using FlatFileItemWriter with setResource(outputResource), but my requirement is to write 25 separate files. Please suggest how to achieve this.
For a similar use case, this answer https://stackoverflow.com/a/20356050/4767829 suggests using a MultiResourceItemWriter in combination with an ItemWriteListener that dynamically sets the output resource for each item.
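Alternatively, if each input file is processed in its own step execution (for example via partitioning, as in the first answer in this document), a step-scoped writer can mirror the input file name onto the destination folder. This is a sketch under those assumptions, not the linked answer's approach; the "inputFile" context key and the paths are illustrative:

import java.io.File;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class DestinationWriterConfig {

    @Bean
    @StepScope
    public FlatFileItemWriter<String> destinationWriter(
            @Value("#{stepExecutionContext['inputFile']}") String inputFile) {
        // Reuse the source file's name so the destination mirrors the source.
        String fileName = new File(inputFile).getName();
        FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
        writer.setName("destinationWriter");
        writer.setResource(new FileSystemResource("D:/xyzfolder/destination/" + fileName));
        writer.setLineAggregator(new PassThroughLineAggregator<>());
        return writer;
    }
}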

Processing a huge CSV file uploaded using Spring controller

Let's imagine the following situation: a user, using the admin panel, uploads a CSV file, which is transformed into a new one enriched with additional data retrieved from the DB. This CSV must be stored somewhere on our server, and we want to perform the transformation asynchronously.
I know about Spring Batch, so I've tried to figure out whether it is possible to set the input file of the batch process dynamically. I've made some tests and managed to launch a Spring Batch job, but only with a hardcoded file set in the bean constructor.
We are using Grails and the spring-batch plugin. The thing is... is there any better way to process a huge CSV asynchronously without memory errors? I was reviewing this post, Spring batch to upload a CSV file and insert into database accordingly, but I don't know if it is the best approach.
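One common way to avoid the hardcoded file (a sketch, not from this thread: the "input.file" parameter name, the column names, and the FieldSet pass-through are assumptions) is to store the upload on disk, pass its path as a job parameter, and bind it into a step-scoped reader:

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class UploadJobConfig {

    // The controller launches the job after saving the upload, e.g.:
    // jobLauncher.run(job, new JobParametersBuilder()
    //         .addString("input.file", storedPath).toJobParameters());
    @Bean
    @StepScope
    public FlatFileItemReader<FieldSet> uploadedCsvReader(
            @Value("#{jobParameters['input.file']}") String inputFile) {
        return new FlatFileItemReaderBuilder<FieldSet>()
                .name("uploadedCsvReader")
                .resource(new FileSystemResource(inputFile))
                .delimited()
                .names("col1", "col2") // hypothetical column names
                .fieldSetMapper(new PassThroughFieldSetMapper())
                .build();
    }
}

Because a chunk-oriented step streams the file item by item, the whole CSV never has to fit in memory.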

Spring Batch: make the next run start from the last read page

I'm using Spring Batch to move data between two different databases, and I would like each run of the item reader job to start from the latest page it reached (I'm using the JpaPagingItemReader).
Is there a way to do it?
Using the processed-indicator flag pattern, you can configure your queries to pick up only items that have not yet been processed. You can read more about this pattern in the Spring Batch documentation here: https://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#process-indicator
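A minimal sketch of that pattern, assuming a hypothetical SourceRecord JPA entity with a boolean processed column (both names are made up for the example):

import javax.persistence.EntityManagerFactory;
import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.batch.item.database.builder.JpaPagingItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ProcessedIndicatorConfig {

    @Bean
    public JpaPagingItemReader<SourceRecord> sourceRecordReader(EntityManagerFactory emf) {
        // SourceRecord is a hypothetical entity with a boolean "processed" flag.
        return new JpaPagingItemReaderBuilder<SourceRecord>()
                .name("sourceRecordReader")
                .entityManagerFactory(emf)
                // Only unprocessed rows are selected, so a new run naturally
                // resumes where the previous one left off.
                .queryString("select r from SourceRecord r where r.processed = false order by r.id")
                .pageSize(100)
                .build();
    }
}

The writer (for example a CompositeItemWriter with a second JPA or JDBC writer) then sets processed = true on every item it copies, which is what turns the flag into a progress indicator across runs.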
