Spring Boot AWS cluster instance scheduler - spring-boot

I have a Spring Boot application which takes requests from users and saves data in the DB.
Certain integration calls need to be made with the saved data, so I thought of a scheduled task running every 15 minutes that picks up this data and makes the necessary calls.
But my application is deployed on AWS EC2 across 2 instances, so this scheduled process will run on both instances, which will cause duplicate integration calls.
Any suggestions on how this can be achieved while avoiding duplicate calls?
I don't have any code to share as of now.
Please share your thoughts... Thanks.

It seems a similar question was answered here: Spring Scheduled Task running in clustered environment
My take:
1) Easy - you can move the scheduled process to a separate instance from the ones that service request traffic, and only run it on one instance, a "job server" if you will.
2) Most scalable - have the scheduled task on both instances, but they will somehow have to coordinate who is active and who is standby (perhaps with a cache such as AWS ElastiCache). Or you can switch to the Quartz job scheduler with JDBCJobStore persistence, which can coordinate which of the 2 instances gets to run the job. http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/tutorial-lesson-09.html
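For option 2, a minimal quartz.properties sketch for a clustered JDBC job store, assuming both instances share the standard QRTZ_ tables in your existing database (the myDS data source name is a placeholder):

```properties
# Same scheduler name on both instances; instance IDs are generated automatically
org.quartz.scheduler.instanceName = integrationScheduler
org.quartz.scheduler.instanceId = AUTO

# Persist jobs and triggers in the shared database
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.tablePrefix = QRTZ_
org.quartz.jobStore.dataSource = myDS

# Clustering: each trigger fires on exactly one of the instances
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 20000
```

With isClustered=true both instances still touch the QRTZ_LOCKS row when acquiring triggers, but only one of them will actually fire any given trigger.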

Related

High availability in a web application with Spring Boot

We are developing a web server which allows users to submit Spark jobs to run on a Hadoop cluster; the web server creates a new cluster and keeps monitoring the job.
We deployed the web server on 3 nodes and put a load balancer in front of them.
The high-availability requirement is that once a user has submitted a job, there must always be one server monitoring it; if that server goes down, another server should take over the task and keep monitoring the job, so that there is no impact on the user.
Is there any suggested way to do that? What I can think of is putting all job information into some central storage (a table in a database), with all servers polling the job info from that table and using a distributed lock to ensure that there is only one, and always one, server holding the lock on each row and hence monitoring that job.
Looks like the Hazelcast solution sounds OK:
high availability singleton processor in Tomcat
I am still checking whether this is the best option when running on AWS.
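A rough sketch of that row-claiming idea with Spring's JdbcTemplate; the jobs table, its owner/heartbeat columns, and the PostgreSQL-flavoured SQL are all assumptions, and lease handling is only hinted at:

```java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.util.List;

@Component
public class JobMonitorPoller {

    private final JdbcTemplate jdbc;
    private final String serverId = System.getenv("HOSTNAME"); // any stable node id works

    public JobMonitorPoller(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Scheduled(fixedDelay = 10_000)
    public void claimAndMonitorJobs() {
        // Claim unowned jobs (or jobs whose owner stopped heart-beating) with a single
        // atomic UPDATE; for any given row, only one server's UPDATE wins.
        jdbc.update(
            "UPDATE jobs SET owner = ?, heartbeat = now() " +
            "WHERE owner IS NULL OR heartbeat < now() - interval '60 seconds'",
            serverId);

        // Monitor only the jobs this server owns, refreshing the heartbeat as we go.
        List<Long> mine = jdbc.queryForList(
            "SELECT id FROM jobs WHERE owner = ?", Long.class, serverId);
        for (Long jobId : mine) {
            jdbc.update("UPDATE jobs SET heartbeat = now() WHERE id = ?", jobId);
            // ... check the Spark job status and update the row accordingly ...
        }
    }
}
```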

Quartz with centralized scheduling and monitoring

We are trying to revamp our batch job scheduling and monitoring process over the entire enterprise. Currently all our batch jobs are scheduled using Unix crontab and are monitored using log files generated by shell scripts.
This process has a lot of disadvantages, and as the number of applications grows it gets really complicated.
Two copies of each application need to be deployed, one to the app server and one standalone (since business logic is shared between both). This complicates our build process too.
There is no easy-to-use web UI for us to see the status of jobs and manually rerun failed jobs remotely without getting onto the Unix box.
There is no failover or load-balanced batch processing.
So I was thinking of using Quartz (with our existing Spring apps), deploying them to the app servers, and no longer relying on Unix crontab.
Is there a way I can write a centralized web application from where I can schedule and monitor jobs running on different quartz schedulers on different app servers?
P.S: I know quartzdesk.com is one solution, but I don't want to enable RMI on my JVM.
You could use the Spring Boot scheduler as an orchestrator and call REST APIs for remote (or local, if you are small) execution. This way, as your app grows, you could easily leverage a load balancer.
If you have the possibility of using cloud services (such as Amazon, Azure or Google Cloud), this can be done easily using their load balancers. They also support Docker and can absorb any peaks in utilization.
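A minimal sketch of the orchestrator idea, assuming Spring's @Scheduled with @EnableScheduling on the orchestrator node; the app.worker.url property and the /internal/run-integration endpoint are made-up placeholders:

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class IntegrationOrchestrator {

    private final RestTemplate rest = new RestTemplate();

    // e.g. http://internal-lb.example.com (hypothetical property pointing at the LB)
    @Value("${app.worker.url}")
    private String workerUrl;

    // Runs only on the orchestrator node; the load balancer picks one worker per call.
    @Scheduled(cron = "0 */15 * * * *")
    public void triggerIntegrationRun() {
        rest.postForEntity(workerUrl + "/internal/run-integration", null, Void.class);
    }
}
```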

Scheduling jobs to clustered Quartz 1.8.6 from a non-cluster-configured Quartz scheduler instance

I'm using Quartz 1.8.6 in clustered mode with 4 instances and have observed high contention on the QRTZ_LOCKS table. My application also provides web services for online clients, and these web services also schedule new jobs. Now I see timeout exceptions on those web services, because when they want to schedule a new job they wait too long to obtain the lock on the QRTZ_LOCKS table. It's important for me to establish 100% reliable operation for the web services (more important than Quartz job operations). Is it possible to start the Quartz job runner on 1 instance only and configure the other 3 instances with org.quartz.jobStore.isClustered=false, so they can schedule jobs WITHOUT acquiring the lock on QRTZ_LOCKS?
Update: Actually, if I plan to run only one instance with the job runner and all the others are just allowed to add new jobs, this won't be a cluster anymore. So the actual question would be: is it possible to set org.quartz.jobStore.isClustered=false on all 4 instances, have only 1 instance run jobs, but allow all 4 to schedule new jobs into the same JDBC store?
Try turning batch mode on and setting the maximum batch count to the number of threads available to the Quartz scheduler.
http://www.ebaytechblog.com/2016/01/14/performance-tuning-on-quartz-scheduler/
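The relevant properties are sketched below (these batch-acquisition settings use Quartz 2.x names per its configuration reference, so moving off 1.8.6 may be needed first); the values are only guesses to tune against your own load:

```properties
# Threads available to run jobs
org.quartz.threadPool.threadCount = 10

# Acquire up to threadCount triggers per QRTZ_LOCKS acquisition instead of one at a time
org.quartz.scheduler.batchTriggerAcquisitionMaxCount = 10

# Triggers due within this window (ms) may be picked up in the same batch
org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow = 1000
```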

AWS - Load Balanced Instances & Cron Jobs

I have a Laravel application whose application servers are behind a load balancer. On these application servers I have cron jobs running, some of which should only be run once (or on one instance).
I did some research and found that people seem to favor a lock system, where you keep all the cron jobs active on each application box, and when one goes to process a job, you create some sort of lock so the others know not to process the same job.
I was wondering if anyone had more details on this procedure in regards to AWS, or if there's a better solution for this problem?
You can build distributed locking mechanisms on AWS using DynamoDB with strongly consistent reads. You can also do something similar using Redis (ElastiCache).
Alternatively, you could use Lambda scheduled events to send a request to your load balancer on a cron schedule. Since only one back-end server would receive the request, that server could execute the cron job.
These solutions tend to break when your autoscaling group experiences a scale-in event and the server processing the task gets deleted. I prefer to have a small server, like a t2.nano, that isn't part of the cluster, and schedule cron jobs on that.
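For the DynamoDB option, the lock usually comes down to a single conditional write. A sketch using the AWS SDK for Java v2 is below (Java only for illustration; the same conditional PutItem exists in the PHP SDK), with the cron-locks table and its lock_id key as assumptions and lease expiry left out:

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

import java.util.Map;

public class DynamoCronLock {

    private final DynamoDbClient dynamo = DynamoDbClient.create();

    /**
     * Returns true if this node won the lock for the given job name.
     * The condition expression makes the write succeed on exactly one node.
     */
    public boolean tryAcquire(String jobName) {
        try {
            dynamo.putItem(PutItemRequest.builder()
                    .tableName("cron-locks")   // assumed table name
                    .item(Map.of("lock_id", AttributeValue.builder().s(jobName).build()))
                    .conditionExpression("attribute_not_exists(lock_id)")
                    .build());
            return true;   // we created the item, so we hold the lock
        } catch (ConditionalCheckFailedException alreadyLocked) {
            return false;  // another node got there first
        }
    }
}
```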
Check out this package for a Laravel implementation of the lock system (DB implementation):
https://packagist.org/packages/jdavidbakr/multi-server-event
Also, this pull request solves this problem using the lock system (cache implementation):
https://github.com/laravel/framework/pull/10965
If you need to run something only once globally (so not once on every server) and 'lock' the thing that needs to be run, I highly recommend AWS SQS, because it offers exactly that: run a cron to fetch a ticket. If you get one, process it; otherwise, do nothing. All crons stay active on all machines, but once a machine requests a ticket it goes 'in flight' and cannot be received by another machine.
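A rough sketch of the ticket consumer (AWS SDK for Java v2), assuming something such as a scheduled Lambda drops one ticket per run onto the queue; the queue URL is a placeholder, and SQS's visibility timeout is what keeps an in-flight ticket hidden from the other machines:

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

import java.util.List;

public class TicketConsumer {

    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/cron-tickets"; // placeholder

    private final SqsClient sqs = SqsClient.create();

    /** Called from cron on every machine; only the machine that receives the ticket does the work. */
    public void poll() {
        List<Message> tickets = sqs.receiveMessage(ReceiveMessageRequest.builder()
                .queueUrl(QUEUE_URL)
                .maxNumberOfMessages(1)
                .visibilityTimeout(300)   // ticket stays in flight while we process it
                .build()).messages();

        for (Message ticket : tickets) {
            // ... parse the ticket and run the job ...
            sqs.deleteMessage(DeleteMessageRequest.builder()
                    .queueUrl(QUEUE_URL)
                    .receiptHandle(ticket.receiptHandle())
                    .build());
        }
    }
}
```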

Scheduled tasks with multiple servers - single point of responsibility

We have a Spring + JPA web application.
We use two Tomcat servers that both run the application and use the same DB.
One of our application requirements is to perform cron / scheduled tasks.
After a little research we found that the Spring Framework delivers a very straightforward solution for cron jobs
(an annotation-based solution).
However, since both Tomcats run the same webapp, using Spring's solution creates a very problematic scenario where 2 crons run at the same time (each on a different Tomcat).
Is there any way to solve this issue? Maybe this alternative is not suitable for our purpose?
Thanks!
As a general rule, you're going to want to save a setting to indicate that a job is running. Similar to how Spring Batch does it, you might want to create a table in your database simply for storing job executions. You can implement this however you'd like, but ultimately your scheduled tasks should check the database to see if an identical task is already running, and if not, proceed with the task execution. Once the task has completed, update the database appropriately so that a future execution can proceed.
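One way to make that check atomic is a unique constraint, for example on (job_name, run_window): the first server to insert the execution row wins, the other hits a duplicate-key error and skips. A sketch, assuming a hypothetical job_execution table and roughly synchronized server clocks:

```java
import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

@Component
public class SyncTask {

    private final JdbcTemplate jdbc;

    public SyncTask(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Scheduled(cron = "0 0 * * * *")   // hourly, active on both Tomcats
    public void run() {
        // Truncate to the hour so both servers compute the same run_window value.
        LocalDateTime window = LocalDateTime.now().truncatedTo(ChronoUnit.HOURS);
        try {
            // UNIQUE(job_name, run_window) ensures exactly one insert succeeds.
            jdbc.update("INSERT INTO job_execution (job_name, run_window, status) VALUES (?, ?, 'RUNNING')",
                        "syncTask", window);
        } catch (DuplicateKeyException alreadyClaimed) {
            return; // the other Tomcat is handling this run
        }

        doActualWork();

        jdbc.update("UPDATE job_execution SET status = 'DONE' WHERE job_name = ? AND run_window = ?",
                    "syncTask", window);
    }

    private void doActualWork() { /* ... */ }
}
```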
#kungfuters' solution is certainly a better end goal, but as a simple first implementation you could use a property to enable/disable the tasks and only have the tasks run on one of the servers.
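A minimal sketch of that property switch, assuming Spring Boot's @ConditionalOnProperty and a made-up scheduling.enabled flag (with plain Spring, a @Profile or a simple property check inside the task does the same job):

```java
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
@ConditionalOnProperty(name = "scheduling.enabled", havingValue = "true")
public class CronTasks {

    // This bean (and its schedule) only exists on the server started with
    // -Dscheduling.enabled=true; the other Tomcat never registers it.
    @Scheduled(cron = "0 0/30 * * * *")
    public void scheduledTask() {
        // ... task body ...
    }
}
```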
