Need help understanding what constitutes a step and a job, and how to configure them in a Spring Batch program. The scenario is:
<pre>
<datasource name="xyz">
<searchcriteria name="ab" parafield1="content_i" parafield2="supplier-1"/>
<searchcriteria name="ab" parafield1="content_i" parafield2="supplier-1"/>
<searchcriteria .../>
</datasource>
</pre>
1. Read a set of search parameters from an external XML file, as above.
2. For every search criterion, steps 3 and 4 have to be done.
3. The system has to hit a SOAP service, which might return around 10K records.
4. For every 250 records (an endpoint capacity constraint) out of the ~10K results, I have to hit another SOAP service, and the results should be written to 3 CSV files: 2 consolidated files and 1 file per record (250 files). Writing the 2 consolidated files and the per-record files can happen in parallel.
Design decisions
I cannot have one job launcher per search, as there are capacity constraints at the source, so no parallel searches.
No DB involvement, hence no need for a metadata DB.
No restartability required; the job always runs from the beginning.
Question
I would like to have an XML reader in the first step (no processor, no writer). For every search read in the first step, how should I repeat step 2, where I again read (call the services) and generate the CSV files (split between 2 writers)?
Using Spring Boot 2.0.4.
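Here is a rough sketch of the kind of two-step layout I have in mind. It is only a sketch: the item types, the parseXml helper, and the soapReader / csvWriters / moreCriteriaDecider beans are placeholders or assumed to be defined elsewhere.
<pre>
import java.util.Collections;
import java.util.List;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.job.flow.JobExecutionDecider;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class SearchJobConfig {

    private final JobBuilderFactory jobs;
    private final StepBuilderFactory steps;

    public SearchJobConfig(JobBuilderFactory jobs, StepBuilderFactory steps) {
        this.jobs = jobs;
        this.steps = steps;
    }

    // Step 1: tasklet that parses the search criteria XML and puts the list into
    // the job execution context (no processor, no writer).
    @Bean
    public Step readCriteriaStep() {
        return steps.get("readCriteriaStep")
                .tasklet((contribution, chunkContext) -> {
                    List<String> criteria = parseXml("criteria.xml"); // placeholder parsing helper
                    chunkContext.getStepContext().getStepExecution()
                            .getJobExecution().getExecutionContext()
                            .put("criteria", criteria);
                    return RepeatStatus.FINISHED;
                })
                .build();
    }

    // Step 2: chunk-oriented step with a commit interval of 250 (the endpoint limit).
    // The reader would call the first SOAP service for the "current" criteria; the
    // writer would be a CompositeItemWriter fanning out to the CSV writers.
    @Bean
    public Step searchAndWriteStep(ItemReader<String> soapReader, ItemWriter<String> csvWriters) {
        return steps.get("searchAndWriteStep")
                .<String, String>chunk(250)
                .reader(soapReader)
                .writer(csvWriters)
                .allowStartIfComplete(true) // so the same step can run once per search criteria
                .build();
    }

    // The job runs step 1 once, then loops over step 2 via a decider that returns
    // "MORE" while unprocessed criteria remain in the job execution context.
    @Bean
    public Job searchJob(JobExecutionDecider moreCriteriaDecider, Step searchAndWriteStep) {
        return jobs.get("searchJob")
                .start(readCriteriaStep())
                .next(searchAndWriteStep)
                .next(moreCriteriaDecider)
                .on("MORE").to(searchAndWriteStep)
                .from(moreCriteriaDecider).on("*").end()
                .end()
                .build();
    }

    private List<String> parseXml(String file) {
        // stub: the real code would parse the datasource/searchcriteria XML
        return Collections.emptyList();
    }
}
</pre>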
Thanks in advance
Related
I am having some trouble with the MergeRecord processor in NiFi. You can see the whole NiFi flow below: I get a JSON array from an API, then I split it, apply some filters, and then I want to build the JSON array again.
Nifi workflow
I am able to build the correct JSON array from all the chunks, but the problem is that the processor keeps generating data indefinitely. When I execute the flow step by step (by starting/stopping the processors one by one) everything is fine, but when MergeRecord is running it keeps generating the same data even if I stop the beginning of the flow (so there are no more inputs).
You can see a screenshot below of the data stacking up in the "merged" box.
data stacked
I scheduled this processor to run every 10 seconds, and after 30 seconds you can see that it executed 3 times and generated the same file 3 times while there is no more data upstream. It is odd, because when I look at the "original" box of the processor I can see the correct original amount of data (18.43 KB), but the merged part keeps increasing...
Here is the configuration of the MergeRecord:
configuration
I suppose I am missing something, but I don't know what!
Thank you for your help,
Regards,
Thomas
I am quite new to Spring Batch and am stuck with a problem for which I could not find a solution.
I have created a job which has a step and two flows:
Step 1:
Retrieves a list of contract numbers (for simplification, a unique number which will be used to search further records). Using an ItemReader with a single chunk, it passes a single contract number to the next step.
Flow 1:
This flow has a Step (Reader, Processor, Writer) whose Reader picks up this contract number and retrieves a list of member IDs. These IDs are passed in chunks (of 10) to the processor.
The processor then performs several query calls to finally create a participant details list for the writer. The writer writes this data in chunks to the Workbook object.
Flow 2: Once all the data is written to the workbook, the object is sent as a file to a remote location. This is done using a tasklet which has all the necessary details to send the file to the proposed destination.
Now, once this entire process is completed (Step 1 -> Flow 1 -> Flow 2), it checks whether any more contract details are to be written to the remote location.
If yes, another contract number is retrieved from the list, which is then passed to the flows (Flow 1 and Flow 2).
Once all the contract numbers are processed, the code completes with RepeatStatus.FINISHED.
Adding a diagram for better understanding:
Diagrammatic representation of the above explanation
It looks something like this:
Job
-> Step 1 (retrieve Id number list but send a single contract number)
-> Flow 1
-> Reader
-> Processor
-> Writer
-> Flow 2
-> Tasklet (Send file to remote location)
(If not all contract numbers have been processed, go back to Step 1 and iterate to the next contract number; otherwise finish the job)
My problems start here:
How do I jump back from Flow 2 to Step 1 based on a condition? I have found several suggestions where people add a decider loop, but that only lets you go back to the previous step (in this case, if the condition is not satisfied in Flow 2, Flow 2 itself gets re-triggered). So how do I jump from Flow 2 back to Step 1?
How do I pass data between all the steps and flows throughout the job (without using the execution context)?
If you think there is a better way to do this please do suggest.
How do I jump back from Flow 2 to Step 1 based on a condition?
Use a JobExecutionDecider for that. You need to make sure that Step 1 is allowed to start even if it is already complete (set allowStartIfComplete=true on the step).
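For example, a rough sketch of those @Bean methods (step1(), flow1Step(), flow2Step(), nextContractTasklet() and contractsRemaining(...) are placeholders for the beans and logic you already have in your configuration):
<pre>
@Bean
public JobExecutionDecider moreContractsDecider() {
    // "CONTINUE" while unprocessed contract numbers remain, "FINISHED" otherwise
    return (jobExecution, stepExecution) ->
            contractsRemaining(jobExecution)
                    ? new FlowExecutionStatus("CONTINUE")
                    : new FlowExecutionStatus("FINISHED");
}

@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .tasklet(nextContractTasklet())   // picks the next contract number
            .allowStartIfComplete(true)       // allows the loop to re-enter this completed step
            .build();
}

@Bean
public Job job() {
    return jobBuilderFactory.get("job")
            .start(step1())
            .next(flow1Step())
            .next(flow2Step())
            // the other steps in the loop may need allowStartIfComplete(true) as well
            .next(moreContractsDecider())
            .on("CONTINUE").to(step1())                    // jump back to Step 1
            .from(moreContractsDecider()).on("*").end()    // otherwise end the job
            .end()
            .build();
}
</pre>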
How do I pass data between all the steps and flows throughout the job (without using the execution context)?
If you don't want to share data through the execution context, you can use a shared object between the steps.
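For example, just a sketch (names made up): a singleton holder bean that one step fills and a later step reads.
<pre>
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.stereotype.Component;

// Simple shared holder; inject it into the reader/processor/writer/tasklet beans
// of the steps that need to exchange data.
@Component
public class JobDataHolder {

    private final Map<String, Object> data = new ConcurrentHashMap<>();

    public void put(String key, Object value) {
        data.put(key, value);
    }

    @SuppressWarnings("unchecked")
    public <T> T get(String key) {
        return (T) data.get(key);
    }
}
</pre>
Keep in mind that, unlike the execution context, such a bean is not persisted in the job repository, so its contents will not survive a restart of the JVM.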
Please let me know: when I put, say, 5 files in a directory, 5 messages get generated by the poller. I want the Spring Batch job to be triggered only once, not five times, if the files arrive together within, say, a 1-minute window. Is that possible?
You may consider using an Aggregator for this kind of task. That way you collect several files together, by expected group size or within some time window. You need to use a static correlationKey to let the component group the files.
When the group is ready, a single message is emitted and you are good to trigger a Batch job for this set of files.
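With the Spring Integration Java DSL it could look roughly like this (the directory, the poller delay and the job-launching handler body are placeholders):
<pre>
import java.io.File;

import org.springframework.context.annotation.Bean;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.file.dsl.Files;

@Bean
public IntegrationFlow batchTriggerFlow() {
    return IntegrationFlows
            .from(Files.inboundAdapter(new File("/path/to/inbound")),
                    e -> e.poller(Pollers.fixedDelay(5000)))
            .aggregate(a -> a
                    .correlationExpression("'batch-files'")   // static key: all files in one group
                    .groupTimeout(60_000)                     // close the group after 1 minute
                    .sendPartialResultOnExpiry(true)          // emit whatever arrived in that window
                    .expireGroupsUponCompletion(true))        // allow a fresh group next time
            .handle((payload, headers) -> {
                // payload is the list of files collected in the window:
                // build the JobParameters from it and launch the batch job exactly once
                return null;
            })
            .get();
}
</pre>
The groupTimeout/sendPartialResultOnExpiry pair is what gives you the "whatever arrived within 1 minute" behaviour; you could use a size-based release strategy instead if the number of files is known in advance.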
I am using the Spring Batch module to read a complex file with multi-line records. The first 3 lines in the file always contain a header with a few common fields.
These common fields will be used in the processing of subsequent records in the file. The job is restartable.
Suppose the input file has 10 records (note that the number of records may not be the same as the number of lines, since records can span multiple lines).
Suppose the job runs for the first time, starts reading the file from line 1, processes the first 5 records, and fails while processing the 6th record.
During this first run, since the job has also parsed the header part (the first 3 lines in the file), the application can successfully process the first 5 records.
Now, when the failed job is restarted, it starts from the 6th record and hence does not read the header part this time. Since the application requires certain values contained in the header record, the job fails. I would like suggestions so that the restarted job always reads the header part and then starts from where it left off (the 6th record in the above scenario).
Thanks in advance.
I guess the file in question does not change between runs? Then it's not necessary to re-read it; my solution builds on this assumption.
If you use one step you can (see the sketch after this list):
implement a LineCallbackHandler
give it access to the stepExecutionContext (it's easy with annotations, but works with interfaces too; just extend StepExecutionListenerSupport)
save the header values into the ExecutionContext
extract them from the context and use them wherever you need to
It should work for restart as well, because Spring Batch saves the values from the first run and provides the complete ExecutionContext for subsequent runs.
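A sketch of such a handler (names made up):
<pre>
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.listener.StepExecutionListenerSupport;
import org.springframework.batch.item.file.LineCallbackHandler;

// Receives each skipped header line and stores it in the step execution context,
// so the values are available again after a restart.
public class HeaderLineHandler extends StepExecutionListenerSupport implements LineCallbackHandler {

    private StepExecution stepExecution;
    private int headerLine = 0;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public void handleLine(String line) {
        stepExecution.getExecutionContext().putString("header.line." + headerLine++, line);
    }
}
</pre>
It would be plugged into the reader roughly like this (MyRecord and recordLineMapper() are placeholders for your existing types), and the same object also has to be registered as a listener on the step so that beforeStep() is called:
<pre>
return new FlatFileItemReaderBuilder<MyRecord>()
        .name("recordReader")
        .resource(new FileSystemResource("input.txt"))
        .linesToSkip(3)                              // the 3 header lines
        .skippedLinesCallback(headerLineHandler)     // the handler above
        .lineMapper(recordLineMapper())              // existing multi-line mapping
        .build();
</pre>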
You can make a 2-step job where:
The first step reads the first 3 lines as header information and puts everything you need into the job context (which is therefore saved in the DB for future executions if the job fails). If this step fails, the header info will be read again; once it passes, you are sure the header info will always be in the job context.
The second step can use the same file as input, but this time you tell it to skip the first 3 lines and read the rest as is. This way you get restartability on that step, and each time the job fails it will resume where it left off.
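A rough sketch of the two steps inside your batch @Configuration (the file name, item type and line mapper are placeholders):
<pre>
// Step 1: read just the 3 header lines and promote them to the *job* execution
// context so they are stored in the job repository and survive a restart.
@Bean
public Step readHeaderStep() {
    return stepBuilderFactory.get("readHeaderStep")
            .tasklet((contribution, chunkContext) -> {
                List<String> header = new ArrayList<>();
                try (BufferedReader reader = Files.newBufferedReader(Paths.get("input.txt"))) {
                    for (int i = 0; i < 3; i++) {
                        header.add(reader.readLine());
                    }
                }
                chunkContext.getStepContext().getStepExecution()
                        .getJobExecution().getExecutionContext()
                        .put("header", header);
                return RepeatStatus.FINISHED;
            })
            .build();
}

// Step 2: normal chunk step whose reader skips the header; on restart it resumes
// from the last committed record while the header stays available in the job context.
@Bean
public FlatFileItemReader<MyRecord> recordReader() {
    return new FlatFileItemReaderBuilder<MyRecord>()
            .name("recordReader")
            .resource(new FileSystemResource("input.txt"))
            .linesToSkip(3)                       // skip the header this time
            .lineMapper(multiLineRecordMapper())  // existing multi-line mapping logic
            .build();
}
</pre>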
I have a requirement where I have to deal with multiple files (say, 300 CSV files).
I need to read --> process --> write each individual file, as I need to apply some transformation logic to the data.
For each input file there would be a corresponding transformed file, so for 300 input files we would have 300 output files.
At the end, all 300 output files need to be merged into a single file, which would be compressed and then transferred to a remote location over FTP/SFTP.
Say every hour we have to deal with a new set of 300 files to which the above processing must be applied, so we would schedule the job to run hourly.
How do I handle multi-file processing in the above scenario using Spring Batch?
How do I make the above processing happen in multiple threads?
Please suggest.
Thanks in advance.
You can use Spring task execution and scheduling and then use a Java ThreadPoolExecutor.
Check this answer here at SO for a very simple example.
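A minimal sketch of that combination, using Spring's ThreadPoolTaskExecutor wrapper around a ThreadPoolExecutor (the directory, pool size and the per-file/merge logic are placeholders):
<pre>
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableScheduling
public class HourlyFileProcessing {

    private final ThreadPoolTaskExecutor executor;

    public HourlyFileProcessing() {
        this.executor = new ThreadPoolTaskExecutor();
        this.executor.setCorePoolSize(8);   // tune to the number of files/CPUs
        this.executor.setMaxPoolSize(8);
        this.executor.initialize();
    }

    @Scheduled(cron = "0 0 * * * *")   // top of every hour
    public void processHourlyBatch() throws Exception {
        File[] inputs = new File("/data/in").listFiles((dir, name) -> name.endsWith(".csv"));
        if (inputs == null) {
            return;
        }

        // read --> process --> write each file on the pool, one task per file
        List<Future<?>> futures = new ArrayList<>();
        for (File file : inputs) {
            futures.add(executor.submit(() -> transformFile(file)));
        }
        for (Future<?> future : futures) {
            future.get();   // wait until all output files have been written
        }

        mergeCompressAndTransfer();   // merge the 300 outputs, compress, send over FTP/SFTP
    }

    private void transformFile(File input) {
        // placeholder: per-file read/process/write logic
    }

    private void mergeCompressAndTransfer() {
        // placeholder: merge outputs, compress, transfer to the remote location
    }
}
</pre>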