High availability in a web application with Spring Boot

We are developing a web server that lets users submit Spark jobs to run on a Hadoop cluster. The web server creates a new cluster and keeps monitoring the job.
We deployed the web server on 3 nodes and put a load balancer in front of them.
The high-availability requirement is that once a user has submitted a job, one server must always be monitoring it. If that server goes down, another server should take over the task and continue monitoring the job, so that there is no impact on the user.
Is there a suggested way to do that? What I could think of is to put all job information in some central storage (a table in a database) and have all servers keep polling the job info from that table, using a distributed lock to ensure that exactly one server at a time holds the lock on each row and therefore monitors that job.
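A minimal sketch of the polling-plus-lock idea described above, assuming each row carries an owner and a lease expiry time. The class JobLeaseTable and its method names are hypothetical, and the ConcurrentHashMap only stands in for the database table; with a real database the same check would typically be a conditional UPDATE on the row.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for the central job table (hypothetical; a real setup would use a database). */
class JobLeaseTable {

    /** One row of the table: the owning server and when its lease expires. */
    static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> rows = new ConcurrentHashMap<>();
    private final long leaseMillis;

    JobLeaseTable(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /**
     * Claims or renews the lease on a job. Succeeds when the job is unowned,
     * already owned by this server, or the previous owner's lease has expired.
     * compute() runs atomically per key, much like a conditional UPDATE on one row.
     */
    boolean tryAcquire(String jobId, String serverId, long nowMillis) {
        Lease result = rows.compute(jobId, (id, current) -> {
            boolean free = current == null || current.expiresAtMillis <= nowMillis;
            boolean mine = current != null && current.owner.equals(serverId);
            return (free || mine) ? new Lease(serverId, nowMillis + leaseMillis) : current;
        });
        return result.owner.equals(serverId);
    }
}
```

Each server would call tryAcquire for every job on each poll and monitor only the jobs for which it returns true. A server that crashes stops renewing, so another server picks the job up once the lease expires; the lease should therefore be longer than the polling interval.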

The Hazelcast solution described in "high availability singleton processor in Tomcat" looks OK.
I am still checking whether it is the best option when running on AWS.

Related

Spring boot aws cluster instance scheduler

I have a Spring Boot application that takes requests from users and saves data in a DB.
Certain integration calls need to be made with the saved data, so I thought of a scheduled task that runs every 15 minutes, picks up this data, and makes the necessary calls.
But my application is deployed on AWS EC2 on 2 instances, so the scheduled task will run on both instances, which will cause duplicate integration calls.
Any suggestions on how to avoid the duplicate calls?
I don't have any code to share yet.
Please share your thoughts. Thanks.
It seems a similar question was answered here: Spring Scheduled Task running in clustered environment
My take:
1) Easy - you can move the scheduled process to a separate instance from the ones that service request traffic, and only run it on one instance, a "job server" if you will.
2) Most scalable - run the scheduled task on both instances, but have them coordinate which one is active and which is standby (perhaps through a cache such as AWS ElastiCache). Or you can switch to the Quartz job scheduler with JDBCJobStore persistence, which can coordinate which of the 2 instances gets to run the job. http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/tutorial-lesson-09.html
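One way to picture the coordination in option 2 is a "claim the time slot" check: every instance fires the task, but only the first one to record the current 15-minute slot does the work. This is only a sketch under assumptions. SlotGuard is a hypothetical name, the in-memory set stands in for shared storage (for example a table with a unique constraint on task name and slot), and instance clocks are assumed to differ by much less than one period.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Lets only the first caller per schedule slot run a task (in-memory stand-in for shared storage). */
class SlotGuard {
    private final Set<String> claimedSlots = ConcurrentHashMap.newKeySet();
    private final long periodMillis;

    SlotGuard(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    /**
     * Returns true for exactly one caller per task and slot.
     * Set.add is atomic here, like an INSERT that fails on a duplicate key.
     */
    boolean claim(String taskName, long nowMillis) {
        long slot = nowMillis / periodMillis; // index of the current period
        return claimedSlots.add(taskName + "#" + slot);
    }
}
```

Inside the scheduled method, an instance would call claim first and return immediately when it gets false, so the integration calls are made once per period no matter how many instances are running.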

Ignite Web Session Clustering design dilemma

I have a design question about Ignite web session clustering.
I have a Spring Boot app with a UI. It is a clustered app, i.e. multiple instances of the Spring Boot app behind a load balancer. I am using org.apache.ignite.cache.websession.WebSessionFilter() to intercept requests and create/manage a session for every incoming request.
I have 2 options:
1) Embed the Ignite node inside the Spring Boot app, so that the embedded Ignite nodes (one in each Spring Boot JVM) form the cluster. This way the request session is replicated across the entire Spring Boot cluster, and I don't have to maintain sticky connections on the load balancer. A request can go to any app instance using a round-robin or least-load algorithm.
A few considerations:
The architecture is simple. I don't have to worry about the cache being down, etc.
Because the cache is embedded, it uses CPU and memory from the app JVM. It could starve my app of resources.
2) Run the Ignite cluster outside the app JVM. I then run a client node in the Spring Boot app and connect to the main Ignite cluster.
A few considerations:
If for any reason the client node cannot connect to the main Ignite cluster, do I have to manage the sessions manually and push them to the Ignite cluster later?
If I manage sessions locally, I will need sticky connections on the load balancer, which I want to avoid if possible.
I am leaning toward approach 2, but want to keep it simple: if the client node cannot create a session (by overriding org.apache.ignite.cache.websession.WebSessionFilter()), it redirects the user to a page indicating that the app is down, or to another app node in the cluster.
Are there any other design approaches I can take?
Am I overlooking anything in either approach?
If you have dealt with this, please share your thoughts.
Thanks in advance.
Shri
If you have a local cache for sessions and sticky sessions, why do you need Ignite at all?
However, it is better to go with Ignite: your app will have HA, and if some node fails, the app as a whole will keep working.
I agree that you should split the app cluster and the Ignite cluster; however, I don't think you should worry about server/client connection problems.
Such problems should simply result in a 500 error. Would you emulate your main storage if your DB went down or you couldn't connect to it?

AWS - Load Balanced Instances & Cron Jobs

I have a Laravel application where the Application servers are behind a Load Balancer. On these Application servers, I have cron jobs running, some of which should only be run once (or run on one instance).
I did some research and found that people seem to favor a lock-system, where you keep all the cron jobs active on each application box, and when one goes to process a job, you create some sort of lock so the others know not to process the same job.
I was wondering if anyone had more details on this procedure in regards to AWS, or if there's a better solution for this problem?
You can build distributed locking mechanisms on AWS using DynamoDB with strongly consistent reads. You can also do something similar using Redis (ElastiCache).
Alternatively, you could use Lambda scheduled events to send a request to your load balancer on a cron schedule. Since only one back-end server would receive the request that server could execute the cron job.
These solutions tend to break when your autoscaling group experiences a scale-in event and the server processing the task gets deleted. I prefer to have a small server, like a t2.nano, that isn't part of the cluster and schedule cron jobs on that.
Check out this package for Laravel implementation of the lock system (DB implementation):
https://packagist.org/packages/jdavidbakr/multi-server-event
Also, this pull request solves this problem using the lock system (cache implementation):
https://github.com/laravel/framework/pull/10965
If you need to run stuff only once globally (so not once on every server) and 'lock' the thing that needs to be run, I highly recommend using AWS SQS because it offers exactly that: run a cron to fetch a ticket. If you get one, parse it. Otherwise, do nothing. So all crons are active on all machines, but tickets are 'in flight' when some machine requests a ticket and that specific ticket cannot be requested by another machine.
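The "in flight" behaviour that makes this work can be illustrated with a small in-memory simulation. This is not the AWS SDK: TicketQueue and its methods are hypothetical, and tickets are identified by their body here, whereas SQS uses receipt handles. The sketch only shows the visibility-timeout idea, where a received ticket is hidden from other machines for a while and reappears if it is never deleted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** In-memory simulation of a queue with a visibility timeout (not the real SQS API). */
class TicketQueue {

    private static final class Ticket {
        final String body;
        long invisibleUntilMillis; // 0 means visible right away

        Ticket(String body) {
            this.body = body;
        }
    }

    private final List<Ticket> tickets = new ArrayList<>();
    private final long visibilityTimeoutMillis;

    TicketQueue(long visibilityTimeoutMillis) {
        this.visibilityTimeoutMillis = visibilityTimeoutMillis;
    }

    synchronized void send(String body) {
        tickets.add(new Ticket(body));
    }

    /** Hands out the first visible ticket and hides it from other callers until the timeout passes. */
    synchronized Optional<String> receive(long nowMillis) {
        for (Ticket ticket : tickets) {
            if (ticket.invisibleUntilMillis <= nowMillis) {
                ticket.invisibleUntilMillis = nowMillis + visibilityTimeoutMillis;
                return Optional.of(ticket.body);
            }
        }
        return Optional.empty();
    }

    /** Called by the machine that finished the work; the ticket never reappears afterwards. */
    synchronized boolean delete(String body) {
        return tickets.removeIf(ticket -> ticket.body.equals(body));
    }
}
```

If the machine holding a ticket is terminated before it calls delete, the ticket becomes visible again after the timeout and another machine's cron run can pick it up.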

Java web application short lived caching

I need to develop a Spring web application that queries a legacy system based on certain criteria (location). To reduce the load on the legacy system, we want to extract the data for all locations in a single query every 30 seconds and keep it in memory to serve client requests. Clients are refreshed periodically (every minute). The web application does not write anything to the database.
The application is deployed to a tomcat cluster with at least two nodes.
In the above scenario what is the best way to implement in-memory data-store? We want to execute the query in only one tomcat node (say primary) and synchronize data to the other node (say secondary). When the primary node goes down, the secondary node should start executing the query to serve clients.
In the above scenario what is the best way to implement in-memory data-store?
You could use any distributed cache, such as Ehcache or Terracotta. With the right configuration, the cached data will be replicated to all the servers in the Tomcat cluster.
We want to execute the query in only one tomcat node.
Since you are using a Tomcat cluster, the clustered servers are most likely already behind a load balancer of some sort and your application is likely accessed as http://www.domain.com. This means, every request to a URL on www.domain.com is being routed to one of the clustered servers automatically by the load balancer.
A simple strategy would be to refresh the cache using an HTTP call, such as, curl http://www.domain.com/cache/refresh. Since this call will go through the load balancer, it will be automatically routed to one of the servers in the Tomcat cluster whenever invoked.
Now, just configure a cronjob to hit the cache refresh URL at your desired frequency. The cronjob can be configured on one of your servers, or use one of the many available web-based cron services.
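As a rough illustration, the handler behind the refresh URL could do something like the following. LocationCache and legacyQuery are hypothetical names, the data is simplified to a map from location to a string, and the AtomicReference only stands in for the replicated cache mentioned earlier in this answer; this is a sketch under those assumptions rather than a full implementation.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Holds the latest snapshot of the legacy data and replaces it wholesale on refresh. */
class LocationCache {
    private final AtomicReference<Map<String, String>> snapshot = new AtomicReference<>(Map.of());
    private final Supplier<Map<String, String>> legacyQuery;

    LocationCache(Supplier<Map<String, String>> legacyQuery) {
        this.legacyQuery = legacyQuery;
    }

    /** Runs the single all-locations query and swaps the snapshot in one atomic step. */
    void refresh() {
        snapshot.set(Map.copyOf(legacyQuery.get()));
    }

    /** Serves client requests from memory; returns null when the location is unknown. */
    String get(String location) {
        return snapshot.get().get(location);
    }
}
```

Because readers always see either the old or the new snapshot, client requests are never served from a half-updated map while the cron-triggered refresh is running.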

What's best practice for HA gearman job servers

From gearman's main page, they mention running with multiple job servers so if a job server dies, the clients can pick up a new job server. Given the statement and diagram below, it seems that the job servers do not communicate with each other.
Our question is what happens to those jobs that are queued in the job server that died? What is the best practice to have high-availability for these servers to make sure jobs aren't interrupted in a failure?
You are able to run multiple job servers and have the clients and workers connect to the first available job server they are configured with. This way if one job server dies, clients and workers automatically fail over to another job server. You probably don't want to run too many job servers, but having two or three is a good idea for redundancy.
Source
As far as I know there is no proper way to handle this at the moment, but as long as you run both job servers with permanent queues (using MySQL or another datastore - just don't use the same actual queue for both servers), you can simply restart the job server and it'll load its queue from the database. This will allow all the queued tasks to be submitted to available workers, even after the server has died.
There is, however, no automatic way of doing this when a job server goes down, so if both the job server and the datastore go down (for example, a server running both locally goes down), the tasks are left in limbo until it is back online.
The permanent queue is only read on startup (and inserted / deleted from as tasks are submitted and completed).
I'm not sure how much complexity it would take to add such functionality to gearmand, or whether it's actually wanted, but simple "task added, task handed out, task completed" notifications between servers shouldn't be too complicated to handle.

Resources