Part of my job is to poll the DB for a specific result status, and only after that can I continue with the next steps of the job.
Is it recommended to stop the job's process while polling in one of the steps (a tasklet, I guess)?
Polling the DB for a specific result sounds like a situation where you need a scheduler.
Spring Batch assumes the scheduling of the job is done from outside its scope.
You can use the @Scheduled Spring annotation if you want to keep everything inside the Spring configuration, or use an external tool like this.
If you have more complex situations, have a look at Spring Batch Integration.
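For illustration, here is a minimal sketch of that approach, assuming a hypothetical RESULT_STATUS table and an already-configured Spring Batch job bean (names are made up): the polling lives outside the batch job, in a @Scheduled method that only launches the job once the status it is waiting for shows up.

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ResultStatusPoller {

    private final JdbcTemplate jdbcTemplate;
    private final JobLauncher jobLauncher;
    private final Job nextJob;   // hypothetical job holding the remaining steps

    public ResultStatusPoller(JdbcTemplate jdbcTemplate, JobLauncher jobLauncher, Job nextJob) {
        this.jdbcTemplate = jdbcTemplate;
        this.jobLauncher = jobLauncher;
        this.nextJob = nextJob;
    }

    // Poll every 30 seconds instead of blocking a tasklet inside the job.
    @Scheduled(fixedDelay = 30_000)
    public void pollAndLaunch() throws Exception {
        Integer ready = jdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM RESULT_STATUS WHERE STATUS = 'DONE'", Integer.class);
        if (ready != null && ready > 0) {
            JobParameters params = new JobParametersBuilder()
                    .addLong("run.ts", System.currentTimeMillis())   // unique parameters per launch
                    .toJobParameters();
            jobLauncher.run(nextJob, params);
        }
    }
}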
Related
I have a @Scheduled job which runs on multiple servers in a clustered environment. However, I want to restrict the job to run on only one server; the other servers should not run the same job once any server has started it.
I have explored Spring Batch's lock mechanism using database tables, but I am looking for a solution using only Spring's task scheduler.
I had the same problem, and the solution I implemented was a lock mechanism with Hazelcast. To make it easy to use, I also added a dedicated annotation and a bit of Spring AOP. With this trick I was able to enforce a single schedule over the cluster with a single annotation.
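A rough sketch of that trick (the annotation and aspect names here are made up, and it assumes Hazelcast 4+ with its CP subsystem lock plus Spring AOP on the classpath): only the node that wins the distributed lock actually executes the advised method.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.lock.FencedLock;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@interface ClusterSingleton {
    String name();
}

@Aspect
@Component
class ClusterSingletonAspect {

    private final HazelcastInstance hazelcast;

    ClusterSingletonAspect(HazelcastInstance hazelcast) {
        this.hazelcast = hazelcast;
    }

    @Around("@annotation(clusterSingleton)")
    public Object runOnce(ProceedingJoinPoint pjp, ClusterSingleton clusterSingleton) throws Throwable {
        FencedLock lock = hazelcast.getCPSubsystem().getLock(clusterSingleton.name());
        // Only the node that acquires the lock executes the scheduled method.
        if (lock.tryLock()) {
            try {
                return pjp.proceed();
            } finally {
                lock.unlock();
            }
        }
        return null;   // another node already ran this schedule
    }
}

A scheduled method then carries both annotations, e.g. @Scheduled(cron = "0 0 2 * * *") together with @ClusterSingleton(name = "nightly-job").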
Spring Batch has a nice feature: it will not run a job with the same job parameters twice.
You can use this so that when the Spring Batch job kicks off on another server, it does not run again.
Usually people pass a timestamp as an argument to bypass this logic, but you can change that.
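For example (a sketch, assuming all nodes share the same Spring Batch job repository database; the job bean name is illustrative): leaving the timestamp out and passing only a stable value like the business date means every node builds identical JobParameters, so only the first launch wins.

import java.time.LocalDate;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DailyJobTrigger {

    private final JobLauncher jobLauncher;
    private final Job dailyJob;   // some existing batch job, name assumed

    public DailyJobTrigger(JobLauncher jobLauncher, Job dailyJob) {
        this.jobLauncher = jobLauncher;
        this.dailyJob = dailyJob;
    }

    @Scheduled(cron = "0 0 1 * * *")
    public void launch() throws Exception {
        // Only the business date, no timestamp: every node builds the exact same
        // JobParameters, so whichever node launches second is rejected with a
        // JobExecutionAlreadyRunningException (or JobInstanceAlreadyCompleteException
        // if the first run has already finished).
        JobParameters params = new JobParametersBuilder()
                .addString("businessDate", LocalDate.now().toString())
                .toJobParameters();
        jobLauncher.run(dailyJob, params);
    }
}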
I use Spring's task scheduling to handle a simple sync job. But when I deploy to multiple nodes, how do I make sure the cron job runs only once?
Maybe you will suggest:
1. Use a distributed lock to control a flag before the cron job runs.
2. Integrate Quartz's cluster feature.
But I was hoping Spring's @EnableScheduling could take a flag argument, so that we could set a flag when launching the app.
We are using https://github.com/lukas-krecan/ShedLock with success, the ZooKeeper provider in particular.
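In case it helps, a minimal sketch of that setup, assuming ShedLock's Spring annotation support and the Curator-based ZooKeeper provider (bean and task names are illustrative):

import net.javacrumbs.shedlock.core.LockProvider;
import net.javacrumbs.shedlock.provider.zookeeper.curator.ZookeeperCuratorLockProvider;
import net.javacrumbs.shedlock.spring.annotation.EnableSchedulerLock;
import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.apache.curator.framework.CuratorFramework;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
@EnableScheduling
@EnableSchedulerLock(defaultLockAtMostFor = "PT10M")
public class SchedulerConfig {

    // ZooKeeper-backed lock provider; the CuratorFramework client is configured elsewhere.
    @Bean
    public LockProvider lockProvider(CuratorFramework client) {
        return new ZookeeperCuratorLockProvider(client);
    }
}

@Component
class SyncTask {

    @Scheduled(cron = "0 */5 * * * *")
    @SchedulerLock(name = "syncTask", lockAtLeastFor = "PT30S", lockAtMostFor = "PT5M")
    public void run() {
        // only the node holding the "syncTask" lock executes this
    }
}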
Spring Boot, in a nutshell, doesn't provide any kind of coordination between multiple instances of the same microservice.
All the work of such coordination is done by the third parties that Spring Boot integrates with.
One example of this is indeed the @Scheduled annotation.
Another is DB migration support via Flyway.
When many nodes start and the migration has to be done, Flyway is responsible for locking the migration table by itself; Spring Boot has nothing to do with it.
So, bottom line, there is no such built-in support, and all the options you've raised can work.
I want to build a synchronization service in Java. The use case is that I'm fetching data from an Exchange service (via Exchange Web Services), normalizing the data a bit (processing, probably), and then writing it to a backend via GraphQL. I already had a look around the Spring modules, but am not quite sure which ones to use. I found Spring Batch and Quartz.
The synchronization has to trigger every X seconds, fetch information from the Exchange, check what's in the backend already, and update what's needed.
Do you have any suggestions? I started implementing this whole thing in Node.js, but as it has to run on both Windows Servers and Docker/Linux, it has been a real pain to keep it running smoothly (mostly because bundling Node.js into an application for Windows is painful).
Difference between Spring Batch & Quartz:
Spring Batch and Quartz have different goals: Spring Batch provides functionality for processing large volumes of data, while Quartz provides functionality for scheduling tasks.
So Quartz can complement Spring Batch; a common combination is to use Quartz as a trigger for a Spring Batch job, via a cron expression.
Conclusion: basically, Spring Batch defines what should be done, and Quartz defines when it should be done.
Quartz is a scheduling framework, as in "execute something every hour or every last Friday of the month".
Spring Batch is a framework that defines the "something" that will be executed.
You can define a job that consists of steps. Usually a step consists of an item reader, an optional item processor and an item writer, but you can define a custom step. You can also tell Spring Batch to commit every 10 items, and a lot of other things.
You can use Quartz to start Spring Batch jobs.
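For example, a small sketch of that combination, assuming the Quartz scheduler is configured with a Spring-aware job factory (as Spring Boot's Quartz starter does) so the launcher and job beans can be injected; the job bean name is illustrative:

import org.quartz.JobExecutionContext;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.quartz.QuartzJobBean;

public class BatchLaunchingQuartzJob extends QuartzJobBean {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job importJob;   // the Spring Batch job to run, name assumed

    @Override
    protected void executeInternal(JobExecutionContext context) {
        try {
            // Quartz decides "when" (e.g. a cron trigger), Spring Batch defines "what".
            jobLauncher.run(importJob, new JobParametersBuilder()
                    .addLong("run.ts", System.currentTimeMillis())
                    .toJobParameters());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

A Quartz cron trigger pointing at this job class then decides when it fires.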
Recommended for your use case :
Quartz scheduling, as you want to trigger after a specific interval.
Reference: https://projects.spring.io/spring-batch/faq.html
I don't understand the difference between scheduled tasks and batch jobs in Spring.
By scheduled tasks I mean the ones that are configured like this:
@EnableScheduling
public class AppConfig {
    ...
and used like this:
@Scheduled(fixedRate = 550)
public void doSomething() {
    ...
By batch jobs I mean these:
@EnableBatchProcessing
public class AppConfig {
    ...
and lots of implementations like:
Job, JobLauncher, Step, ItemReader, ItemWriter... etc.
I would like to know the main difference between them, beyond the implementation differences. I am also curious why one would use batch jobs, with all their lengthy implementations, when we can use simple scheduled tasks. I mean, implementing scheduled tasks is quite easy, but maybe they have disadvantages compared to batch jobs?
Spring Scheduler is for orchestrating something based on a schedule. Spring Batch is a robust batch processing framework designed for building solutions to complex compute problems. Spring Batch does not handle the orchestration of jobs, just the building of them. You can orchestrate Spring Batch jobs with Spring Scheduler if you want.
Two aspects I can think of: First, AFAIK when a job run fails, the second run will use the same job parameters (at least you can configure this, I think), and these kinds of error situations are easier to configure than writing it all by hand in one place (your scheduled method). Second, Batch gives a structure to your code when you also have to read your data from somewhere and write it somewhere else: Batch has its reader, processor, writer scheme, plus automatically created database tables (e.g. BATCH_JOB_INSTANCE) holding batch job results, such as when the job started, etc.
Edit: more reasons for Batch: large amounts of data, transaction management, chunk-based processing, declarative I/O, start/stop/restart, retry/skip, and a web-based administration interface.
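To make the reader/processor/writer structure and the chunk commit interval concrete, here is a minimal sketch assuming Spring Batch 4.x-style builder factories (job, step and item names are illustrative):

import java.util.Arrays;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchConfig {

    @Bean
    public Step exampleStep(StepBuilderFactory steps) {
        return steps.get("exampleStep")
                .<String, String>chunk(10)                           // commit every 10 items
                .reader(new ListItemReader<>(Arrays.asList("a", "b", "c")))
                .processor(item -> item.toUpperCase())               // optional processor
                .writer(items -> items.forEach(System.out::println)) // writer
                .build();
    }

    @Bean
    public Job exampleJob(JobBuilderFactory jobs, Step exampleStep) {
        return jobs.get("exampleJob").start(exampleStep).build();
    }
}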
I have a situation where data is to be read from 4 different web services, processed, and the results stored in a database table, with a notification sent once the task is complete. The trigger for this process is a web service call.
Should I write my job as a Spring Batch job, or write the whole read/process code as an async method (using @Async) that is called from the REST controller?
Kindly suggest.
In my opinion your choice should be @Async, because Spring Batch was designed for large data processing and isn't really meant for on-demand processing; typically you create your batch job and then launch it on a schedule. The benefit of that kind of architecture is the reliability of your job, which can be restarted in case of failure, and so on. In your case you have a data integration problem, and I suggest looking at Spring Integration. You could have a Spring Integration pipeline that you start through a REST call.
I hope this helps.
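A rough sketch of the @Async variant mentioned above (names are illustrative; the four web service calls, the persistence and the notification are just placeholder comments):

import java.util.concurrent.CompletableFuture;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.ResponseEntity;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@Configuration
@EnableAsync
class AsyncConfig {
}

@Service
class SyncService {

    @Async
    public CompletableFuture<Void> readProcessStoreAndNotify() {
        // 1. call the four web services
        // 2. process / merge the results
        // 3. store them in the database table
        // 4. send the notification
        return CompletableFuture.completedFuture(null);
    }
}

@RestController
class SyncController {

    private final SyncService syncService;

    SyncController(SyncService syncService) {
        this.syncService = syncService;
    }

    @PostMapping("/sync")
    public ResponseEntity<String> trigger() {
        syncService.readProcessStoreAndNotify();   // returns immediately, work runs async
        return ResponseEntity.accepted().body("sync started");
    }
}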
If there are a large number of services to be executed, Spring Batch would be the choice. Otherwise, I guess there is no need to bring in Spring Batch.
In my opinion, the @Async annotation is the easier way.
If both approaches work, of course the simpler the better.
In the end, if there will be more and more services, not only 4, Spring Batch would be the better solution, because that is exactly what Spring Batch is built for.