I want to read multiple XML files in Spring Batch, and before reading each one I should validate its file name and put it in the context. How can I proceed?
Is it possible to have this scenario using a tasklet and a reader/processor/writer?:
folder: file1.xml, file2.xml, file3.xml
validate filename (file1.xml) -- read file1 -- process -- write
then
validate filename (file2.xml) -- read file2 -- process -- write
validate filename (file3.xml) -- read file3 -- process -- write
......
Or is there any other way?
There are three approaches you can take with this. Each has its benefits and weaknesses.
Use a step to validate
You can set your job up with a validateFileName step that precedes the step that processes the files (say processFiles). The validateFileName step would perform any needed validation on the file names and then provide the files to process to the next step. How it communicates this could be as simple as moving the valid files to a new directory, or as complex as using the job's ExecutionContext to store the names of the files to process.
The advantage of this is that it decouples validation from processing. The disadvantage is that it makes the job slightly more complex given that you'd have an extra step.
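A minimal sketch of such a validation step as a Tasklet; the /in directory and the file1.xml-style naming rule are assumptions for illustration, not prescribed:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class ValidateFileNamesTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        List<String> validFiles = new ArrayList<>();
        File[] files = new File("/in").listFiles();
        if (files != null) {
            for (File file : files) {
                // Hypothetical naming rule; replace with your real validation.
                if (file.getName().matches("file\\d+\\.xml")) {
                    validFiles.add(file.getAbsolutePath());
                }
            }
        }
        // Hand the valid file names to the next step via the job's ExecutionContext.
        chunkContext.getStepContext().getStepExecution()
                .getJobExecution().getExecutionContext().put("validFiles", validFiles);
        return RepeatStatus.FINISHED;
    }
}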
Use a StepExecutionListener to do the validation
You could use a StepExecutionListener#beforeStep() call to do the validation. The same concepts as before apply with regard to communicating which files validate and which don't.
This may be a less complex option, but it more tightly couples (albeit marginally) the processing and validation.
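A rough sketch of the same check in beforeStep(); the "inputFile" job parameter name and the naming rule are again assumptions:

import java.io.File;

import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.listener.StepExecutionListenerSupport;

public class FileNameValidationListener extends StepExecutionListenerSupport {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        String path = stepExecution.getJobParameters().getString("inputFile");
        String name = new File(path).getName();
        if (!name.matches("file\\d+\\.xml")) {
            // Failing fast here fails the step before any reading happens.
            throw new IllegalStateException("Invalid input file name: " + name);
        }
        // Record the validated name for later use in the step.
        stepExecution.getExecutionContext().putString("validatedFileName", name);
    }
}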
Use an ItemReader that validates before it reads
This last option is to write an ItemReader implementation that is similar to the MultiResourceItemReader but provides a hook for validating each file before reading it. If a file doesn't validate, you would skip it.
This option again couples validation with the processing, but may provide a nice reusable abstraction for this particular use case.
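Short of writing a full custom reader, one pragmatic approximation is to filter the resources up front and hand only the valid ones to a standard MultiResourceItemReader; the naming rule below is a placeholder:

import java.util.Arrays;

import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.batch.item.file.ResourceAwareItemReaderItemStream;
import org.springframework.core.io.Resource;

public class ValidatingReaderFactory {

    public static <T> MultiResourceItemReader<T> create(Resource[] resources,
            ResourceAwareItemReaderItemStream<T> delegate) {
        MultiResourceItemReader<T> reader = new MultiResourceItemReader<>();
        reader.setDelegate(delegate);
        // Keep only files whose names pass validation; invalid ones are skipped.
        reader.setResources(Arrays.stream(resources)
                .filter(r -> r.getFilename() != null && r.getFilename().matches("file\\d+\\.xml"))
                .toArray(Resource[]::new));
        return reader;
    }
}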
I hope this helps!
Related
I need to read from a DB, and based on that result I need to fetch data from another DB which is on another server, and afterwards write it to a file. The solution that came to mind is to use a Spring Batch reader to read from the first DB, and then read from the second DB in the processor.
But I feel that reading in the processor is not a good idea, because it processes a single item at a time. (Please correct me if I am wrong.)
Is there any other way to do this so that we can perform the task efficiently?
Thanks in advance.
Please suggest what the options could be.
I have to write a Spring Batch job as follows:
Step 1: Load an XML file from the file system and write its contents to a database staging table
Step 2: Call Oracle PL/SQL procedure to process the staging table.
(Comments on that job structure are welcome, but not the question).
In Step 1, I want to move the XML file to another directory after I have loaded it. I want this, as much as possible, to be "transactional" with the write to the staging table. That is, either both the writes to staging and the file move succeed, or neither does.
I feel this necessary because if (A) the staging writes happen but the file does not move, the next run will pick up the file again and process it again and (B) if the file gets moved but the staging writes do not happen, then we will have missed that file's processing.
This interface's requirements are all about robustness. I know I could just put a step execution listener to move all the files at the end, but I want the approach that is going to guarantee that we never miss processing data and never process the same file twice.
Part of the difficulty is that I am using a MultiResourceItemReader. I read that ChunkListener.beforeChunk() happens as part of the chunk transaction, so I tried to make a custom chunk CompletionPolicy to force chunks to complete after each change of resource (file) name, but I could not get it to work. In any case, I would have needed an afterChunk() listener, which is not part of the transaction anyway.
I'll take any guidance on my specific questions or an expert explanation of how to robustly process files in Spring Batch (which I am only just learning). Thanks!
I have a pretty similar Spring Batch process right now.
Spring Batch fits your requirement well.
I would recommend starting to use Spring Integration here.
In Spring Integration you can configure monitoring of your folder and have it trigger the batch job. There is a good example in the official documentation.
Then you should use a powerful concept of Spring Batch: identifying parameters. A Spring Batch job runs with unique parameters, and if you mark a parameter as identifying, then no other job instance can be spawned with the same parameters (though you can restart your original job).
/**
* Add a new String parameter for the given key.
*
* @param key - parameter accessor.
* @param parameter - runtime parameter
* @param identifying - indicates if the parameter is used as part of identifying a job instance
* @return a reference to this object.
*/
public JobParametersBuilder addString(String key, String parameter, boolean identifying) {
parameterMap.put(key, new JobParameter(parameter, identifying));
return this;
}
So here you need to ask yourself: what is the uniquely identifying constraint for your batch job? I would suggest the full file path. But then you need to be sure that nobody provides different files with the same file name.
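For example, launching the job with the full path as the identifying parameter could look roughly like this ("input.file.path" is an illustrative name; jobLauncher and job are assumed to be wired elsewhere):

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;

// Same path => same JobInstance: a duplicate launch is rejected,
// but a failed run can still be restarted.
JobParameters params = new JobParametersBuilder()
        .addString("input.file.path", "/in/file1.xml", true) // identifying = true
        .toJobParameters();
jobLauncher.run(job, params);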
Spring Integration can also detect whether a file has already been seen by the application and ignore it. Please check the documentation on AcceptOnceFileListFilter.
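A minimal sketch of that filter on a polled directory source (the directory is illustrative):

import java.io.File;

import org.springframework.integration.file.FileReadingMessageSource;
import org.springframework.integration.file.filters.AcceptOnceFileListFilter;

FileReadingMessageSource source = new FileReadingMessageSource();
source.setDirectory(new File("/in"));
// Each file is handed to the flow only once for the lifetime of the filter.
source.setFilter(new AcceptOnceFileListFilter<>());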
If you want guaranteed 'transactional-like' logic in the batch, then don't put it into listeners; create a specific step which moves the file. Listeners are good for supplemental logic.
That way, if this step fails for any reason, you will still be able to fix the issue and retry the job.
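A sketch of such a dedicated move step; the parameter name and target directory are assumptions for illustration:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class MoveFileTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        Path source = Paths.get((String) chunkContext.getStepContext()
                .getJobParameters().get("input.file.path"));
        Path target = Paths.get("/processed").resolve(source.getFileName());
        // If the move fails, the step (and job) fails and can be restarted from here.
        Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
        return RepeatStatus.FINISHED;
    }
}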
This kind of process can easily be done with a job with 2 steps and 1 listener:
A standard (read from XML -> process? -> write to DB) step; you don't care about restartability because Spring Batch is smart enough to avoid re-reading the same data
a listener attached to step 1 to move the file after successful step execution (example 1, example 2 or example 3); a sketch follows after this list
A second step with the data processing
Item 3 may be folded into step 1 as its process phase
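The listener in point 2 could look roughly like this (the paths are illustrative only):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.AfterStep;

public class MoveFileListener {

    @AfterStep
    public ExitStatus afterStep(StepExecution stepExecution) throws Exception {
        // Move the file only if the step finished successfully.
        if (ExitStatus.COMPLETED.getExitCode().equals(stepExecution.getExitStatus().getExitCode())) {
            Files.move(Paths.get("/in/file1.xml"), Paths.get("/processed/file1.xml"),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return stepExecution.getExitStatus();
    }
}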
We are developing a project using Spring Batch partitioning. Our requirement is that we upload a file and validate each record from the file; only if all the records are valid will we store them in the database. For that,
we used Spring Batch partitioning with an ItemReader, a CustomItemProcessor and a CustomItemWriter. The ItemReader reads the data, the CustomItemProcessor validates it, and finally the CustomItemWriter persists all the data via a PreparedStatement. Once all processing is done, we commit the data. How can this be done on one connection with more than one thread?
So you need to read the CSV file. You can probably do it in one of two ways:
Using parallel streams: map each record into an object; with the Java 8 parallel stream API you can achieve this (a sketch follows after this list)
NIO: using non-blocking IO you can achieve this much faster.
This post might be helpful: How to read all lines of a file in parallel in Java 8
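A minimal sketch of the parallel-stream variant; "input.csv" and the comma split are placeholders:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Note: Files.lines still reads the file sequentially; parallel() spreads the
// per-line mapping/validation work across cores, which is where the gain is.
try (Stream<String> lines = Files.lines(Paths.get("input.csv"))) {
    List<String[]> records = lines.parallel()
            .map(line -> line.split(","))  // map each CSV line to a record
            .collect(Collectors.toList());
}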
I am a newbie at Spring Batch and have recently started using it.
I have a requirement where I need to post/write the messages read from each DB record to different queues using a single job. I have to use a reader to read the messages from the DB and a processor to decide which queue to post each one to.
So my question is: can I use a single JmsItemWriter to post the messages to different queues, given that I have to use a single job and DB reader?
Thanks in advance.
As far as I know, JmsItemWriter does not support this (it writes to the default destination of its JmsTemplate).
But you may just implement your own ItemWriter, inject all the JmsTemplates into it, and write custom decision logic to select the appropriate destination and write to it (a sketch follows below).
Another way: use a ClassifierCompositeItemWriter, give it a set of JmsItemWriters, and select one via your classifier.
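A sketch of the hand-rolled option; MyMessage and getTargetQueue() are hypothetical placeholders for your item type and routing rule:

import java.util.List;
import java.util.Map;

import org.springframework.batch.item.ItemWriter;
import org.springframework.jms.core.JmsTemplate;

public class RoutingJmsItemWriter implements ItemWriter<MyMessage> {

    // One JmsTemplate per destination queue, keyed by a routing name.
    private final Map<String, JmsTemplate> templatesByQueue;

    public RoutingJmsItemWriter(Map<String, JmsTemplate> templatesByQueue) {
        this.templatesByQueue = templatesByQueue;
    }

    @Override
    public void write(List<? extends MyMessage> items) throws Exception {
        for (MyMessage item : items) {
            // Route each item to the template configured for its target queue.
            templatesByQueue.get(item.getTargetQueue()).convertAndSend(item);
        }
    }
}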
I am writing a Spring Batch application to do the following: There is an input table (PostgreSQL DB) to which someone continually adds rows - that is basically work items being added. For each of these rows, I need to fetch more data from another DB, do some processing, and then do an output transaction which can be multiple SQL queries touching multiple tables (this needs to be one transaction for consistency reasons).
Now, the part between the input and output should be modular - it already has 3-4 logically separated things, and in the future there will be more. This flow need not be linear - what processing is done next can depend on the result of the previous. In short, this is basically like the flow you can set up using steps inside a job.
My main problem is this: Normally a single chunk processing step has both ItemReader and ItemWriter, i.e., input to output in a single step. So, should I include all the processing steps as part of a single ItemProcessor? How would I make a single ItemProcessor a stateful workflow in itself?
The other option is to make each step a Tasklet implementation, and write two tasklets myself to behave as ItemReader and ItemWriter.
Any suggestions?
Found an answer - yes you are effectively limited to a single step. But:
1) For linear workflows, you can "chain" ItemProcessors - that is, create a composite ItemProcessor to which you provide, through applicationContext.xml, all the ItemProcessors that do the actual work. The composite processor just runs them one by one (a sketch follows after this list). This is what I'm doing right now.
2) You can always create the internal subflow as a separate Spring Batch workflow and call it through code in an ItemProcessor, similar to the composite ItemProcessor above. I might move to this in the future.
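A minimal sketch of the chaining in option 1, shown in Java for brevity (the XML wiring is equivalent); WorkItem, processorA and processorB are hypothetical:

import java.util.Arrays;

import org.springframework.batch.item.support.CompositeItemProcessor;

// Runs processorA, then feeds its output to processorB; each delegate's
// output type must match the next delegate's input type.
CompositeItemProcessor<WorkItem, WorkItem> composite = new CompositeItemProcessor<>();
composite.setDelegates(Arrays.asList(processorA, processorB));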