Writing multiple files (different content) using Spring Batch

I have a requirement to write multiple files using Spring Batch. The first file will be written based on the data from a database table. The second file will contain just the number of records written to the first file. How can I create the second file? I am not sure whether org.springframework.batch.item.file.MultiResourceItemWriter is an option for me, as I think it is meant to split one output across multiple files, writing chunks of the data into each of them. Correct me if I am wrong here.
Please do suggest some options with sample code if possible.

You have a couple of options:
You can use CompositeItemWriter, which calls a collection of item writers in a defined order, so you can define one item writer that writes records based on the data from the DB and a second that counts the records and writes the count to another file.
You can write the data file in the first step and finish the whole file; if the record count is all you need, you can save it to the step's ExecutionContext (see the common batch patterns documentation, section 11.8, Passing Data to Future Steps), then read the counter in a new Tasklet and save it to a new file.
If you want to go with option 1, which I think is the right choice, you can check this example of a batch job configuration with CompositeItemWriter.
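For option 1, a minimal sketch (assuming the Spring Batch 4.x ItemWriter signature; Person, dataFileWriter, and counts.txt are placeholder names, not from the question):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.CompositeItemWriter;

// Second delegate: counts the items it sees instead of writing them.
public class CountingItemWriter<T> implements ItemWriter<T> {

    private final AtomicLong count = new AtomicLong();
    private final Path countFile;

    public CountingItemWriter(Path countFile) {
        this.countFile = countFile;
    }

    @Override
    public void write(List<? extends T> items) {
        count.addAndGet(items.size());
    }

    // Call this from a StepExecutionListener's afterStep method so the
    // total is written once the first file is complete.
    public void writeCount() throws IOException {
        Files.write(countFile, String.valueOf(count.get()).getBytes(StandardCharsets.UTF_8));
    }
}

Wiring the two delegates together (dataFileWriter is your FlatFileItemWriter for the main file):

CountingItemWriter<Person> countingWriter = new CountingItemWriter<>(Paths.get("counts.txt"));
CompositeItemWriter<Person> composite = new CompositeItemWriter<>();
// Delegates are invoked in order for every chunk.
List<ItemWriter<? super Person>> delegates = Arrays.asList(dataFileWriter, countingWriter);
composite.setDelegates(delegates);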

Related

How to read and perform batch processing using spring batch annotation config

I have 2 different files with different data. Each file contains 10K records per day.
Ex:
Productname price date
T shirt,500,051221
Pant,1000,051221
Productname price date
T shirt,800,061221
Pant,1800,061221
I want to create a final output file by checking the price difference between today's and yesterday's files.
Ex:
Productname price
T shirt,300
Pant,800
I have to do this using Spring Batch.
I have tried a batch configuration with two different steps, but it is only able to read the data, not do the processing, because I need the data of both files for the processing, and in my case the steps read one file after the other.
Could anyone help me on this with some sample code?
I would suggest saving the FlatFile data into the database for yesterday's and today's dates (maybe into two separate tables, or into the same table if you can easily tell the two sets of records apart). Read this stored data using JdbcCursorItemReader or a PagingItemReader, perform the calculation/logic/massaging of data at the processor level, and create a new FlatFile or save to the DB as convenient. Out of the box, Spring Batch does not provide a facility to read data from two files and perform a calculation across them.
Suggestion - read the data from both FlatFiles, keep it in a cache, and read from the cache for the further processing.
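As a rough sketch of the cache suggestion (Product, PriceDiff, and the pre-loaded map are hypothetical names, not from the question): load yesterday's file into a map in an earlier step, then compute the difference in the processor.

import java.math.BigDecimal;
import java.util.Map;
import org.springframework.batch.item.ItemProcessor;

// Assumes an earlier step parsed yesterday's file into a name -> price map.
public class PriceDiffProcessor implements ItemProcessor<Product, PriceDiff> {

    private final Map<String, BigDecimal> yesterdayPrices;

    public PriceDiffProcessor(Map<String, BigDecimal> yesterdayPrices) {
        this.yesterdayPrices = yesterdayPrices;
    }

    @Override
    public PriceDiff process(Product today) {
        BigDecimal yesterday = yesterdayPrices.get(today.getName());
        if (yesterday == null) {
            return null; // returning null filters out items with no counterpart
        }
        return new PriceDiff(today.getName(), today.getPrice().subtract(yesterday));
    }
}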

NiFi: how to get maximum timestamp from first column?

NiFi version 1.5
I have a CSV file that arrives the first time like:
datetime,a.DLG,b.DLG,c.DLG
2019/02/04 00:00,86667,98.5,0
2019/02/04 01:00,86567,96.5,0
I used ListFile -> FetchFile to get the CSV file.
Ten minutes later, I get the appended CSV file:
datetime,a.DLG,b.DLG,c.DLG
2019/02/04 00:00,86667,98.5,0
2019/02/04 01:00,86567,96.5,0
2019/02/04 02:00,86787,99.5,0
2019/02/04 03:00,86117,91.5,0
How do I get only the new records (the last two)? I do not want to process the first two records, which have already been processed.
My thought process is that we need to get the maximum datetime, store it in an attribute, and use QueryRecord. But I do not know which processor can get the maximum datetime.
Is there any better solution?
This is currently an open issue (NIFI-6047) but there has been a community contribution to address it, so you may see the DetectDuplicateRecord processor in an upcoming release of NiFi.
There may be a workaround: split up the CSV rows, create a compound key using ExtractText, and then use DetectDuplicate.
This does not seem like a problem best solved in NiFi, as you need to keep state about what you have already processed. An alternative would be to delete what you have already processed; then you can assume that whatever is in the file has not been processed yet.
How do I get only the new records (the last two)? I do not want to process the first two records, which have already been processed.
From my understanding, the actual question is 'how to process/ingest CSV rows as they are written to the file?'.
Description of the TailFile processor from the NiFi documentation:
"Tails" a file, or a list of files, ingesting data from the file as it
is written to the file. The file is expected to be textual. Data is
ingested only when a new line is encountered (carriage return or
new-line character or combination)
This solution is appropriate when you don't want to move/delete the actual file.

read data through spring batch and return data outside the job

I have read everywhere how to read data with a Spring Batch ItemReader and write it to a database using an ItemWriter, but I want to just read data using Spring Batch and then somehow access that list of items outside the job. I need to perform the remaining processing after the job has finished.
The reason I want to do this is that I need to perform a lot of validations on every item. I have to validate each item's variable xyz against a list (which is not available within the job). After performing a lot of processing I have to insert the information into different tables using JPA. Please help me out!
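One possible sketch (not from an answer in the thread): Spring Batch ships org.springframework.batch.item.support.ListItemWriter, which simply accumulates written items in memory, so you can read them back after the job completes. MyItem, jobLauncher, job, and jobParameters are placeholder names.

import java.util.List;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.item.support.ListItemWriter;

// Use this as the step's writer; it keeps every item in memory.
ListItemWriter<MyItem> listWriter = new ListItemWriter<>();

// ... build the step with your ItemReader and listWriter, then launch the
// job synchronously (run declares checked launcher exceptions):
JobExecution execution = jobLauncher.run(job, jobParameters);

// After the job finishes, the read items are available outside the job.
List<? extends MyItem> items = listWriter.getWrittenItems();
for (MyItem item : items) {
    // validate against your external list, then persist via JPA
}

Note this only works when the job is launched synchronously in the same JVM, and the whole dataset is held in memory.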

How to leverage Spring Batch without using a POJO?

I know the BeanWrapperFieldSetMapper class depends on a POJO.
But here is the thing: if I want to take advantage of Spring Batch features but do not want to create separate jobs (i.e. do not want to write POJOs and separate readers, writers, or mappers), how do I do this?
My requirement is to read a *.csv file which will have headers, so I should be able to supply the header names in a Map or String[] and create my SQL statement based on them, instead of writing a RowMapper.
This would help me upload various files to different tables.
Is it possible to change BeanWrapperFieldSetMapper to make it suitable for mapping the values from a Map or String[]?
Also, even if I do not have headers in the *.csv file, I could construct the update statement and still load using the chunk settings and other advantages of Spring Batch.
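One way to sketch this without touching BeanWrapperFieldSetMapper (an assumption-based example, not an existing Spring class): implement a FieldSetMapper that returns a Map keyed by the column names configured on the tokenizer.

import java.util.LinkedHashMap;
import java.util.Map;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;

// Generic mapper: one Map per CSV row, keyed by column name,
// so no per-file POJO is required.
public class MapFieldSetMapper implements FieldSetMapper<Map<String, String>> {

    @Override
    public Map<String, String> mapFieldSet(FieldSet fieldSet) {
        Map<String, String> row = new LinkedHashMap<>();
        String[] names = fieldSet.getNames(); // requires names set on the tokenizer
        for (int i = 0; i < names.length; i++) {
            row.put(names[i], fieldSet.readString(i));
        }
        return row;
    }
}

The same header array you supply to the DelimitedLineTokenizer can then be used to build the INSERT/UPDATE statement for a JdbcBatchItemWriter, keeping one generic job for all files.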

Spring Batch Add Custom Fields

I've never used Spring Batch before, but it seems like a viable option for what I am attempting to accomplish. I have about 15 CSV files for 10 institutions that I need to process nightly. I am stashing the CSVs into staging tables in an Oracle database.
A CSV file may look something like this:
DEPARTMENT_ID,DEPARTMENT_NAME,DEPARTMENT_CODE
100,Computer Science & Engineering,C5321
101,Math,M333
...
However, when I process a row and add it to the database, I need to fill in an institution ID, which is determined based on the folder being processed at that time.
The database table would look like this:
INSTITUTION_ID,DEPARTMENT_ID,DEPARTMENT_NAME,DEPARTMENT_CODE
1100,100,Computer Science & Engineering,C5321
There is also validation that needs to be done on each row in the CSV files. Is that something Spring Batch can handle as well?
I've seen references to a CustomItemReader and CustomItemWriter, but I am not sure if that is what I need. The examples I've seen seem basic, just dumping a CSV exactly as it is into a matching table.
Yes, all the tasks that you have reported can be done with Spring Batch:
For the reader you may use MultiResourceItemReader with a wildcard pattern matching your file names.
To validate the rows from the file you can use an item processor and handle the validation there (a sketch follows below).
And for your case you need not write a custom item writer; you can configure the item writer as a DB item writer in your XML file.
I suggest you use the XML-based approach for the Spring Batch implementation.
The XML is used to configure the whole architecture of your batch, as in
job -- step -- chunk -- reader -- processor -- writer
and to track errors and exceptions you can implement listeners at each stage:
-- Step Execution Listener
-- Item Reader Listener
-- Item Processor Listener
-- Item Writer Listener
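As a rough sketch of the processor piece (Department and its getters/setters are hypothetical names; the institution ID is assumed to be resolved from the folder being processed and passed in, e.g. as a job parameter):

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.validator.ValidationException;

// Stamps the institution id onto each row and validates it
// before it reaches the database writer.
public class DepartmentProcessor implements ItemProcessor<Department, Department> {

    private final long institutionId;

    public DepartmentProcessor(long institutionId) {
        this.institutionId = institutionId;
    }

    @Override
    public Department process(Department item) {
        // Example validation rule; replace with your real checks.
        if (item.getDepartmentCode() == null || item.getDepartmentCode().isEmpty()) {
            throw new ValidationException("Missing department code for id " + item.getDepartmentId());
        }
        item.setInstitutionId(institutionId);
        return item;
    }
}

Throwing ValidationException fails the item; alternatively, return null to silently skip invalid rows.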
