I'm using Spring Batch to move data between two different databases, and I would like the item reader to start from the latest page at every run of the job (I'm using the JpaPagingItemReader).
Is there a way to do it?
Using the process indicator pattern, you can configure your queries to pick up only items that have not been processed yet. You can read more about this pattern in the Spring Batch documentation here: https://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#process-indicator
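To make the idea concrete, here is a minimal plain-Java sketch of the pattern, with no Spring dependency. Everything in it is an assumption for illustration: `SourceRow`, its `processed` field and the in-memory list stand in for your input table, and `readUnprocessed` plays the role of a reader query along the lines of `select r from SourceRow r where r.processed = false`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type standing in for an input-table entity with a "processed" column.
final class SourceRow {
    final long id;
    final String payload;
    boolean processed; // the process indicator

    SourceRow(long id, String payload, boolean processed) {
        this.id = id;
        this.payload = payload;
        this.processed = processed;
    }
}

final class ProcessIndicatorSketch {

    // Plays the role of the reader query: only rows whose flag is still false are returned.
    static List<SourceRow> readUnprocessed(List<SourceRow> table, int pageSize) {
        List<SourceRow> page = new ArrayList<>();
        for (SourceRow row : table) {
            if (!row.processed) {
                page.add(row);
                if (page.size() == pageSize) {
                    break;
                }
            }
        }
        return page;
    }

    // One "job run": copy every pending row to the target, flag it, and return how many were moved.
    static int runOnce(List<SourceRow> table, List<String> target, int pageSize) {
        int moved = 0;
        List<SourceRow> page;
        while (!(page = readUnprocessed(table, pageSize)).isEmpty()) {
            for (SourceRow row : page) {
                target.add(row.payload); // "write" to the second database
                row.processed = true;    // mark the row so the next run skips it
                moved++;
            }
        }
        return moved;
    }
}
```

Because each run only sees rows that are still unflagged, a second run right after the first moves nothing, and a later run picks up only the rows added in the meantime. There is no need to remember which page the previous run stopped at.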
I believe I've got a scoping issue here.
Project explanation:
The goal is to process any incoming file (on disk), including its metadata (which is stored in an SQL database). For this I have two tasklets (FileReservation and FileProcessorTask), which are the steps in the overarching "worker" jobs. They wait for an event to start their work. Several threads run these jobs concurrently. The FileReservation tasklet sends the fileId to FileProcessorTask using the job context.
A separate job (which runs indefinitely) checks for new file metadata records in the database and, upon discovering new records, "wakes up" the FileReservation tasklets using a published event.
With the current configuration the second step in a job can receive a null message when the FileReservation tasklets are awoken.
If you uncomment the code in BatchConfiguration you'll see that it works when we have separate instances of the beans.
Any pointers are greatly appreciated.
Thanks!
Polling a folder for new files is not suitable for a batch job. So using a Spring Batch job (filePollingJob) is not a good idea IMO.
Polling a folder for new files and running a job for each incoming file is a common use case, which can be implemented using a java.nio.file.WatchService or a file inbound channel adapter from Spring Integration. See "How do I kickoff a batch job when input file arrives?" for more details.
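As a rough illustration of the WatchService route, here is a small sketch using only the JDK. The class name `IncomingFileWatcher` and the `onFile` callback are hypothetical; in a real application the callback is where you would launch the job for the new file (for example with the file path as a job parameter).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class IncomingFileWatcher {

    // Registers the directory for "file created" events and returns the watch service.
    static WatchService register(Path dir) throws IOException {
        WatchService watchService = dir.getFileSystem().newWatchService();
        dir.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
        return watchService;
    }

    // Waits up to the timeout for the next batch of events and hands each new file to onFile.
    // Returns the files seen in this batch (empty if the timeout elapsed).
    static List<Path> handleNext(WatchService watchService, Path dir, long timeout, TimeUnit unit,
                                 Consumer<Path> onFile) throws InterruptedException {
        List<Path> seen = new ArrayList<>();
        WatchKey key = watchService.poll(timeout, unit);
        if (key == null) {
            return seen;
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                continue; // events were lost; a real application might rescan the directory here
            }
            Path file = dir.resolve((Path) event.context());
            onFile.accept(file); // hypothetical hook: launch one job per incoming file
            seen.add(file);
        }
        key.reset(); // re-arm the key so further events are delivered
        return seen;
    }
}
```

You would typically call `handleNext` in a loop on a dedicated thread. Note that a create event can fire before the file has been fully written, so the callback may need to wait or check for completeness before starting the job.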
Currently I'm working on some integration tests for a Spring Batch application. The application reads from a SQL table, writes to another table and, at the end, generates a report as a .txt file.
Initially I thought of just keeping another file with the expected output, comparing it with the report file, and checking the table content.
(For some context, I'm not very experienced with Spring.)
But, after reading some articles on Baeldung, I'm having doubts about my initial methodology.
Should I manipulate the table content in my code to ensure that I have the expected input? Should I use the Spring Test framework tools? Can I run the job from my test without them?
The correct approach for batch job integration testing is to test the job as a black box. If the job reads data from a table and writes to another table or a file, you can proceed as follows:
Put some test data in the input table (Given)
Run your job (When)
Assert on the output table/file (Then)
You can find more details in the End-To-End Testing of Batch Jobs section of the reference documentation. Spring Batch provides some test utilities that might help in testing your jobs (like mocking batch domain objects, asserting on file content, etc). Please refer to the org.springframework.batch.test package.
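The Given/When/Then shape described above can be sketched in plain Java as follows. This is only an outline under assumptions: `runJob` is a hypothetical stand-in for launching your real job (in a Spring Batch test you would more likely go through the utilities in org.springframework.batch.test), and the lists stand in for the input and output tables.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BlackBoxJobTestSketch {

    // Hypothetical stand-in for the job under test: reads input rows, writes output rows
    // and produces a one-line report file.
    static List<String> runJob(List<String> inputTable, Path reportFile) throws IOException {
        List<String> outputTable = new ArrayList<>();
        for (String row : inputTable) {
            outputTable.add(row.toUpperCase(Locale.ROOT));
        }
        Files.write(reportFile, List.of("rows written: " + outputTable.size()));
        return outputTable;
    }

    // Then-step helper: the generated report must match the expected file line by line.
    static boolean sameContent(Path expected, Path actual) throws IOException {
        return Files.readAllLines(expected).equals(Files.readAllLines(actual));
    }

    public static void main(String[] args) throws IOException {
        // Given: known input data and an expected report
        List<String> input = List.of("a", "b");
        Path expected = Files.createTempFile("expected", ".txt");
        Files.write(expected, List.of("rows written: 2"));

        // When: the job runs as a black box
        Path report = Files.createTempFile("report", ".txt");
        List<String> output = runJob(input, report);

        // Then: assert on the output table and on the report file
        if (!output.equals(List.of("A", "B")) || !sameContent(expected, report)) {
            throw new AssertionError("job output does not match expectations");
        }
    }
}
```

So yes, your initial idea (an expected file compared with the generated report, plus a check on the table content) fits this approach, as long as the test also controls the input data.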
Let's imagine the following situation: a user uploads a CSV file through the admin panel, and that CSV is transformed into a new one with additional data retrieved from the DB. The new CSV must be stored somewhere on our server, and we want to perform this transformation asynchronously.
I know about Spring Batch, so I've tried to figure out whether there is any possibility of setting the file of the batch process dynamically. I've made some tests and I've managed to launch a Spring Batch job, but using a hardcoded file set in the bean constructor.
We are using Grails and the spring-batch plugin. The thing is... Is there a better way to process a huge CSV asynchronously without memory errors? I was reviewing the post "Spring batch to upload a CSV file and insert into database accordingly", but I don't know if it is the best approach.
I have a Spring Batch/Integration setup where multiple servers are polling a single file directory. This causes a problem where a file can be picked up by more than one server. I have attempted to put an NIO lock on the file once a server has got it, but this locks the file for processing, so the server can't read the contents of the file.
Is there a spring batch/integration solution to this problem or is there a way to rename the file as soon as it is picked up by a node?
Consider using FileSystemPersistentAcceptOnceFileListFilter with a shared MetadataStore: http://docs.spring.io/spring-integration/reference/html/system-management-chapter.html#metadata-store
So, only one instance of your application will be able to pick up a file.
Even if we find a solution for nio-lock, you should understand that lock means "do not touch until freed". Therefore when one instance has done its work, another one is ready to pick up the file. I guess that isn't your goal.
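The reason a shared store solves this is that claiming a file becomes a single atomic "put if absent" on a key every node can see. Here is a minimal plain-Java sketch of that idea. It is not the Spring Integration implementation: the class `AcceptOnceSketch` is hypothetical, and the in-memory `ConcurrentMap` stands in for a store that is really shared between servers (a database or similar).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class AcceptOnceSketch {

    // Stand-in for the shared metadata store; in production this must live outside the JVM.
    private final ConcurrentMap<String, String> sharedStore;
    private final String nodeName;

    AcceptOnceSketch(ConcurrentMap<String, String> sharedStore, String nodeName) {
        this.sharedStore = sharedStore;
        this.nodeName = nodeName;
    }

    // Returns true only for the first node that records the file; everyone else gets false.
    boolean tryClaim(String fileName) {
        return sharedStore.putIfAbsent(fileName, nodeName) == null;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> store = new ConcurrentHashMap<>();
        AcceptOnceSketch nodeA = new AcceptOnceSketch(store, "node-a");
        AcceptOnceSketch nodeB = new AcceptOnceSketch(store, "node-b");

        System.out.println(nodeA.tryClaim("orders.csv")); // true
        System.out.println(nodeB.tryClaim("orders.csv")); // false
    }
}
```

Unlike a file lock, the entry stays in the store after processing finishes, so the other instances never pick the file up later.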
I am using Spring Batch to process my inbound files. Below is my use case:
I will receive a zip file containing 15 files in CSV format.
I need to process them in parallel.
After all files have been processed, I need to do some calculations and send out a report.
Could anyone suggest how to implement this using Spring Batch?
I would like to follow the below approach
Partitioner
Unzip the zip file
For each CSV file, create an ExecutionContext and add it to a queue for parallel processing.
Reader will be CSV Reader provided by Spring Batch.
Listener will be used to send Report when all processes are done.
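The flow of the approach above (unzip, fan out one unit of work per CSV, then report once everything is done) can be sketched with the JDK alone. This is an illustration of the shape, not Spring Batch code: the class `ZipFanOutSketch` is hypothetical, each map entry plays the role of one partition's ExecutionContext, `countRecords` stands in for the real reader/processor/writer, and `total` stands in for the final calculation done by the listener. In Spring Batch itself, a MultiResourcePartitioner might cover the "one partition per file" part.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipFanOutSketch {

    // Step 1: unzip; each entry becomes one unit of work (one "partition").
    static Map<String, byte[]> unzip(InputStream zipped) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    files.put(entry.getName(), zin.readAllBytes());
                }
            }
        }
        return files;
    }

    // Stand-in for the per-file processing: here it only counts the lines in the CSV.
    static int countRecords(byte[] csv) {
        String text = new String(csv, StandardCharsets.UTF_8);
        return text.isEmpty() ? 0 : text.split("\\R").length;
    }

    // Step 2: process all files in parallel and wait until every one has finished.
    static Map<String, Integer> processAll(Map<String, byte[]> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Integer>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                Callable<Integer> task = () -> countRecords(file.getValue());
                pending.put(file.getKey(), pool.submit(task));
            }
            Map<String, Integer> perFile = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Integer>> p : pending.entrySet()) {
                perFile.put(p.getKey(), p.getValue().get()); // blocks until that file is done
            }
            return perFile;
        } finally {
            pool.shutdown();
        }
    }

    // Step 3: the final calculation, run only after all files have been processed.
    static int total(Map<String, Integer> perFile) {
        int sum = 0;
        for (int count : perFile.values()) {
            sum += count;
        }
        return sum;
    }
}
```

The key point is the barrier before the report: the results are only aggregated after every per-file task has completed, which in your design corresponds to the listener firing once the partitioned step is finished.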
Please refer to this one as an example.
If you want an example that matches your requirement exactly, please let me know and I can post one for you.
Nghia