Processing a huge CSV file uploaded using Spring controller - spring

Let's imagine the following situation: a user uploads a CSV file through the admin panel, and we transform it into a new CSV with additional data retrieved from the DB. The resulting CSV must be stored somewhere on our server, and we want to perform this transformation asynchronously.
I know about Spring Batch, so I tried to figure out whether the input file of a batch job can be set dynamically. I ran some tests and managed to launch a Spring Batch job, but only with a hardcoded file set in the bean constructor.
We are using Grails and the spring-batch plugin. The question is: is there a better way to process a huge CSV asynchronously without memory errors? I was reviewing the post "Spring batch to upload a CSV file and insert into database accordingly", but I don't know if it is the best approach.
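Whatever framework ends up running the job, memory errors are usually avoided by streaming the file row by row instead of loading it whole. The following is a minimal plain-JDK sketch of that idea, not Spring Batch or Grails plugin code. The class name CsvEnricher, the lookup function (a stand-in for the DB query) and the assumption that the first column is the lookup key are all hypothetical. On the Spring Batch side, the usual way to avoid a hardcoded file is to pass the path as a job parameter and bind it late in a step-scoped reader.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

class CsvEnricher {

    /**
     * Copies input to output one line at a time and appends one extra column per row.
     * Only the current line is held in memory, so the size of the upload does not matter.
     * The lookup function is a hypothetical stand-in for the database query.
     * Returns the number of data rows written.
     */
    static long enrich(Path input, Path output, String extraHeader,
                       Function<String, String> lookup) throws IOException {
        long rows = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            if (header == null) {
                return 0; // empty upload, nothing to transform
            }
            writer.write(header + "," + extraHeader);
            writer.newLine();
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Naive split: assumes the key is the first column and fields contain no quoted commas.
                String key = line.split(",", 2)[0];
                writer.write(line + "," + lookup.apply(key));
                writer.newLine();
                rows++;
            }
        }
        return rows;
    }

    /** Runs the same transformation on the given executor so the upload request can return at once. */
    static CompletableFuture<Long> enrichAsync(Path input, Path output, String extraHeader,
                                               Function<String, String> lookup, Executor executor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return enrich(input, output, extraHeader, lookup);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, executor);
    }
}
```

A controller would save the upload to a temporary file, call enrichAsync with a bounded thread pool, and respond immediately. A real CSV parser would be needed if fields can contain quoted commas or line breaks.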

Related

Which approach should I choose for testing my Spring Batch Job?

Currently I'm working on some integration tests for a Spring Batch application. The application reads from a SQL table, writes to another table and, at the end, generates a report as a .txt file.
Initially I thought of simply keeping a file with the expected output, comparing it with the generated report, and checking the table content.
(For some context, I'm not very experienced with Spring.)
But after reading some articles on Baeldung, I'm having doubts about my initial methodology.
Should I manipulate the table content in my code to ensure that I have the expected input? Should I use the Spring Test framework tools? Without them, am I able to run the job from my test?
The correct approach for batch job integration testing is to test the job as a black box. If the job reads data from a table and writes to another table or a file, you can proceed as follows:
Put some test data in the input table (Given)
Run your job (When)
Assert on the output table/file (Then)
You can find more details in the End-To-End Testing of Batch Jobs section of the reference documentation. Spring Batch provides some test utilities that might help in testing your jobs (like mocking batch domain objects, asserting on file content, etc). Please refer to the org.springframework.batch.test package.

Spring batch unit testing job with external input and db output

If I understand it correctly, the normal way of testing Spring Batch is basically to run my application and let JobLauncherTestUtils run my normal jobs. However, my application reads input from an external service and writes it to my database. I don't want my tests to write to my production database, and I'd like the test input to be read from files I provide rather than from the external service.
Can anyone point me to an example of how I could do it? I'd like to feed a job with a file and then, when the job has finished, check that the database contains what I expect. I guess I could specify an H2 database in application-test.properties, but I have no clue about the input.
The docs at https://docs.spring.io/spring-batch/4.1.x/reference/html/testing.html#testing don't really cover it for me.
Are you reading input files from disk? If so, you can change the input file source directory, for tests only, so that it points inside src/test/resources, for example src/test/resources/input_dir/your_test_file.xml.
If the input file directory is configured with properties, you could create a properties file only for tests with something like classpath:input_dir/your_test_file.xml (which would be in your project as src/test/resources/input_dir/your_test_file.xml).
If the input file directory is configured within the execution context, you can provide it through the jobExecutionContext parameter of JobLauncherTestUtils.launchStep.

Spring Batch next run start from last read page

I'm using Spring Batch to move data between two different databases, and I would like the item reader to start from the latest page on every run (I'm using the JpaPagingItemReader).
Is there a way to do it?
Using the process indicator pattern (a "processed" flag on each record), you can configure your queries to pick up only the items that have not been processed yet. You can read more about this pattern in the Spring Batch documentation here: https://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#process-indicator
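To make the idea concrete, here is a small in-memory simulation of that pattern in plain Java. It is only a sketch: the Row class, the processed field and the runOnce method are hypothetical names, and the list stands in for the source table. In a real job the filter would live in the reader's query (something along the lines of "where processed = false" on a hypothetical flag column) and the flag would be updated when the item is written.

```java
import java.util.ArrayList;
import java.util.List;

class ProcessIndicatorDemo {

    /** One record of the (simulated) source table. */
    static final class Row {
        final long id;
        boolean processed; // the indicator column, false until the row has been handled

        Row(long id) {
            this.id = id;
        }
    }

    /**
     * One job run: picks up at most maxItems rows that are not flagged yet,
     * handles them, and flags them so that the next run skips them.
     * Returns the ids handled by this run.
     */
    static List<Long> runOnce(List<Row> table, int maxItems) {
        List<Long> handled = new ArrayList<>();
        for (Row row : table) {
            if (handled.size() == maxItems) {
                break; // this run is full, the rest waits for the next run
            }
            if (!row.processed) {      // equivalent of filtering on the flag in the query
                handled.add(row.id);   // "move" the row to the target database
                row.processed = true;  // mark it so it is never read again
            }
        }
        return handled;
    }
}
```

With five rows and a limit of two items per run, the first run handles ids 1 and 2, the second 3 and 4, and the third only 5. No page number has to be remembered between runs, because the flag itself records the progress.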

Moving a file to another location in Apache NiFi

I am trying to load data into a MySQL database using LOCAL INFILE. However, I am having difficulties moving the files to a new location once a file has been successfully imported into MySQL.
Below is a screenshot of the process flow.
My problem is:
I managed to load the database using MySQL's LOAD DATA LOCAL INFILE, but the issue comes when I try to move the successfully imported files to the correct directory; I fail to do so. The PutFile_sucess and PutFile_fail processors do not work as expected, so I decided to use FetchFile instead, but then I get an empty file: FetchFile just creates the file instead of moving the whole file.
I hope I have made myself clear. I would appreciate any input.
If your issue is to remove the file once it has been imported, you could just add a FetchFile processor somewhere after your success branch and set its Completion Strategy to Delete File.
However, a better approach would be to load the content of the file into NiFi, then parse/split/process it, and then (possibly regrouped into batches) ingest the content into MySQL.
Could you improve your question with information such as the format/structure/content of the file you're trying to load?

How to process multiple CSV format files using Spring batch

I am using Spring Batch to process my inbound files. Below is my use case:
I will be receiving a zip containing 15 files in CSV format
I need to process them in parallel
After all files have been processed, I need to do some calculation and a report should be sent out.
Could anyone suggest how to implement this using Spring Batch?
I would like to follow the approach below:
Partitioner
Unzip the zip file
For each CSV file, create an ExecutionContext and add it to a queue for parallel processing.
Reader will be the CSV reader provided by Spring Batch.
Listener will be used to send the report when all processes are done.
Please refer to this one as an example.
If you want something that exactly matches your requirement, please let me know and I can post one for you.
Nghia
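The fan-out and final aggregation described above can be illustrated with plain JDK concurrency. This is a sketch of the control flow only, not Spring Batch code: ParallelCsvProcessor, countRows and processAll are hypothetical names, the zip is assumed to be already extracted, and counting rows stands in for the real per-file processing. In Spring Batch itself the same shape is typically obtained with a partitioned step (for files, MultiResourcePartitioner creates one partition per resource) followed by a step or listener that sends the report.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

class ParallelCsvProcessor {

    /** Per-file work: counts the non-empty data rows (everything after the header line). */
    static long countRows(Path csv) {
        try (Stream<String> lines = Files.lines(csv)) {
            return lines.skip(1).filter(line -> !line.isEmpty()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Submits one task per file to a fixed pool, waits for all of them,
     * and only then aggregates the results (the place where the report would be built).
     */
    static long processAll(List<Path> csvFiles, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path csv : csvFiles) {
                Callable<Long> task = () -> countRows(csv);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // blocks until that file has been processed
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the aggregation loop waits on every Future, the final calculation cannot start before the last file is done, which is the guarantee the listener gives in the Spring Batch version.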
