Java web application short-lived caching - performance

I need to develop a Spring web application that queries a legacy system based on certain criteria (location). To reduce the load on the legacy system, we want to extract data for all locations in a single query every 30 seconds and keep it in memory to serve client requests. Clients refresh periodically (every minute). The web application does not write anything to the database.
The application is deployed to a Tomcat cluster with at least two nodes.
In the above scenario, what is the best way to implement an in-memory data store? We want to execute the query on only one Tomcat node (say, the primary) and synchronize the data to the other node (say, the secondary). When the primary node goes down, the secondary node should start executing the query to serve clients.

In the above scenario, what is the best way to implement an in-memory data store?
You could use any distributed cache, such as Ehcache or Terracotta. With the right configuration, the cached data will be replicated to all the servers in the Tomcat cluster.
We want to execute the query on only one Tomcat node.
Since you are using a Tomcat cluster, the clustered servers are most likely already behind a load balancer of some sort, and your application is likely accessed as http://www.domain.com. This means every request to a URL on www.domain.com is routed automatically to one of the clustered servers by the load balancer.
A simple strategy would be to refresh the cache using an HTTP call, such as curl http://www.domain.com/cache/refresh. Since this call goes through the load balancer, it will automatically be routed to one of the servers in the Tomcat cluster whenever it is invoked.
Now, just configure a cron job to hit the cache refresh URL at your desired frequency. The cron job can be configured on one of your servers, or you can use one of the many available web-based cron services.
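As a rough illustration of what the cache refresh URL could do on whichever node receives the call, here is a minimal sketch in plain Java. The class name LocationSnapshotCache, the String-to-String map, and the legacyQuery supplier are assumptions made for the example, not part of the original setup. In a real cluster the snapshot would be put into the replicated cache (Ehcache, Terracotta, and so on) rather than held in a local field.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder for the "all locations" snapshot; names are illustrative.
class LocationSnapshotCache {
    // The current snapshot; replaced as a whole on every refresh.
    private final AtomicReference<Map<String, String>> snapshot =
            new AtomicReference<>(Map.of());

    // Stand-in for the single bulk query against the legacy system.
    private final Supplier<Map<String, String>> legacyQuery;

    LocationSnapshotCache(Supplier<Map<String, String>> legacyQuery) {
        this.legacyQuery = legacyQuery;
    }

    // Runs the bulk query once and swaps in an immutable copy of the result.
    // This is what a handler for the cache refresh URL would call.
    void refresh() {
        snapshot.set(Map.copyOf(legacyQuery.get()));
    }

    // Served to clients; reads the snapshot and never touches the legacy system.
    String get(String location) {
        return snapshot.get().get(location);
    }
}
```

A controller mapped to the refresh URL would simply call refresh(), and request handlers would call get(location). Swapping the whole map in one step means readers never see a half-updated snapshot.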

Related

High availability in web application with Spring boot

We are developing a web server that lets users submit Spark jobs to run on a Hadoop cluster; the web server creates a new cluster and keeps monitoring the job.
We deployed the web server on 3 nodes and put a load balancer in front of them.
The high-availability requirement is that once a user has submitted a job, there must always be one server monitoring it. If that server goes down, another server should take over the task and monitor the job, so that there is no impact on the user.
Is there a suggested way to do that? What I can think of is to put all job information in some central storage (a table in a database) and have all servers poll the job info from the table, using a distributed lock to ensure that exactly one server at a time locks each row in the table and hence monitors that job.
It looks like the Hazelcast solution would work:
high availability singleton processor in Tomcat
I am still checking whether this is the best approach on AWS.
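The polling-with-lock idea from the question can be sketched with one lease per job. This is only an in-memory illustration under stated assumptions: JobLeaseTable, tryAcquire, and the lease duration are hypothetical names, and the ConcurrentHashMap stands in for the central table or a distributed map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the central job table; all names are hypothetical.
class JobLeaseTable {
    // A lease records which server monitors a job and until when (epoch millis).
    static final class Lease {
        final String owner;
        final long expiresAt;

        Lease(String owner, long expiresAt) {
            this.owner = owner;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Lease> leases = new ConcurrentHashMap<>();
    private final long leaseMillis;

    JobLeaseTable(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    // Returns true if `server` owns the job after this call: either it took a
    // free or expired lease, or it renewed its own. compute() makes the
    // check-and-set atomic per job within this JVM.
    boolean tryAcquire(String jobId, String server, long now) {
        Lease result = leases.compute(jobId, (id, current) -> {
            boolean free = current == null || current.expiresAt <= now;
            boolean mine = current != null && current.owner.equals(server);
            return (free || mine) ? new Lease(server, now + leaseMillis) : current;
        });
        return result.owner.equals(server);
    }
}
```

Each server would call tryAcquire for every job on each polling round, passing System.currentTimeMillis(), and monitor only the jobs for which it returns true. If the owning server dies, it stops renewing, the lease expires, and the next server to poll takes over.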

I am new to distributed cache, and confused about choosing the better option and how to handle the cluster

My application receives requests (varying from 1 to 500) from multiple clients over socket connections. These requests are XML files that are to be stored in a distributed cache for further consumption by an external application. There is no database in this application. The client requests can also be simultaneous, so the cache must handle a multi-threaded application. Could someone advise me on the questions below?
• How do I decide whether an internal cache within the application is enough, or whether to set up an external cache cluster? (Does it cause any overhead? If the application is destroyed, do we lose the cache?)
• Which distributed cache should be chosen? (The application is completely Java-based, using Spring Boot.)
• The application will not be deployed in the cloud, but can we still choose a cloud cache cluster, like Cache as a Service (CaaS)?
• How many nodes does the cluster require? How do we decide this?

Using Azure load balancer to reboot/update server with zero downtime

I have a really simple setup: an Azure load balancer for HTTP(S) traffic, two application servers running Windows, and one database, which also contains session data.
The goal is to be able to reboot or update the software on the servers without a single request being dropped. The problem is that the health probe runs a test every 5 seconds and needs to fail 2 times in a row. This means that when I kill the application server, a lot of requests during those 10 seconds will time out. How can I avoid this?
I have already tried running the health probe on a different port and then denying all traffic to that port using Windows Firewall. The load balancer then thinks the application is down on that node and no longer sends new traffic to it. However, Azure LB does hash-based load balancing, so traffic that was already going to the now-killed node keeps going there for a few seconds!
First of all, could you give us additional details: is your database load balanced as well? Are you performing reads and writes on this database, or only reads?
For your information, you can change the Azure Load Balancer distribution mode; please refer to this article for details: https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-distribution-mode
I would suggest disabling the server you are updating at the load balancer level. Wait a couple of minutes (depending on your application) before starting your updates. This should "purge" your endpoint. When the update is done, update your load balancer again and put the server back in it.
The cloud concept is infrastructure as code: this can easily be scripted and included in your deployment / update procedure.
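On the application side, the drain step could look roughly like the sketch below. It assumes a hypothetical DrainGate class that both the health-probe endpoint and the request pipeline consult; the probe interval (5 seconds) and failure count (2) come from the question, and everything else is illustrative. The probe window is only a lower bound, since flows that are already hashed to the node may keep arriving briefly after the probe has failed.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical drain switch shared by the health endpoint and request handling.
class DrainGate {
    private final AtomicBoolean draining = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger(0);

    // Status code the health-probe endpoint would return to the load balancer.
    int healthStatus() {
        return draining.get() ? 503 : 200;
    }

    // Called by the deployment script before the node is updated.
    void startDraining() {
        draining.set(true);
    }

    // Called at the start and at the end of every real request.
    void requestStarted() {
        inFlight.incrementAndGet();
    }

    void requestFinished() {
        inFlight.decrementAndGet();
    }

    // True once the probe has had time to fail the required number of times
    // and no request is still being processed on this node.
    boolean safeToStop(long millisSinceDrainStarted, long probeIntervalMillis, int failuresNeeded) {
        long probeWindow = probeIntervalMillis * failuresNeeded;
        return draining.get()
                && millisSinceDrainStarted >= probeWindow
                && inFlight.get() == 0;
    }
}
```

A deployment script would trigger startDraining(), poll until safeToStop(elapsed, 5000, 2) returns true (plus whatever safety margin the application needs), and only then stop the service.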
Another solution would be to use Traffic Manager. It could give you additional options to manage your endpoints (it might be a bit oversized for 2 VMs / endpoints).
A last solution is to migrate to a PaaS offering where these kinds of features are already available (deployment slots).
Hoping this will help.
Best regards

Ignite Web Session Clustering design dilemma

I have a design question about Ignite web session clustering.
I have a Spring Boot app with a UI. It is a clustered app, i.e. multiple instances of the Spring Boot app behind a load balancer. I am using org.apache.ignite.cache.websession.WebSessionFilter() to intercept requests and create/manage the session for any incoming request.
I have 2 options:
1. Embed the Ignite node inside the Spring Boot app, so that these embedded Ignite nodes (one in each Spring Boot JVM) form the cluster. This way the request session is replicated across the entire Spring Boot cluster. On the load balancer I don't have to maintain sticky connections; a request can go to any app instance using a round-robin or least-load algorithm.
A few considerations:
• The architecture is simple. I don't have to worry about the cache being down, etc.
• With the cache embedded, it uses CPU and memory from the app JVM. It has the potential to starve my app of resources.
2. Run the Ignite cluster outside of the app JVM. I then run a client node in the Spring Boot app and connect to the main Ignite cluster.
A few considerations:
• If for any reason the client node cannot connect to the main Ignite cluster, do I have to manage the sessions manually and then push them to the Ignite cluster at a later point?
• If I manage sessions locally, I will need sticky connections on the load balancer, which I want to avoid if possible.
I am leaning towards approach 2, but want to keep it simple: if the client node cannot create a session (override org.apache.ignite.cache.websession.WebSessionFilter()), it redirects the user to a page indicating the app is down, or to another app node in the cluster.
Are there any other design approach I can take?
Am I overlooking anything in either approach?
If you have dealt with it, please share your thoughts.
Thanks in advance.
Shri
If you have a local cache for sessions and sticky sessions, why do you need to use Ignite at all?
However, it's better to go with Ignite: your app will have HA, and if a node fails, the app as a whole will still work fine.
I agree that you should split the app cluster and the Ignite cluster; however, I don't think you should worry about connection problems between the server and client nodes.
This kind of problem should lead to a 500 error. Would you emulate your main storage if your DB went down or you couldn't connect to it?

mod_jk vs mod_cluster

Can someone please tell me the pros and cons of mod_jk vs mod_cluster?
We are looking to do very simple load balancing. We are going to be using sticky sessions and just need something to route new requests to a new server if one server goes down. I feel that mod_jk does this and does a good job, so why do I need mod_cluster?
If your JBoss version is 5.x or above, you should use mod_cluster; it will give you better performance and reliability than mod_jk. Here are some reasons:
• Better load balancing between app servers: the load-balancing logic is calculated from information and metrics provided directly by the application servers (bear in mind they have first-hand information about their own load), in contrast with mod_jk, where the logic is calculated by the proxy itself. For that, mod_cluster uses an extra connection between the servers and the proxy (apart from the data one), which is used to send this load information.
• Better integration with the lifecycle of the applications deployed on the servers: the servers keep the proxy informed about changes to the application on each node. For example, if you undeploy the application on one of the nodes, the node informs the proxy (mod_cluster) immediately, avoiding inconvenient 404 errors.
• It doesn't require AJP: you can also use it with HTTP or HTTPS.
• Better management of server lifecycle events: when a server shuts down or is restarted, it informs the proxy about its state, so that the proxy can reconfigure itself automatically.
You can use sticky sessions with mod_cluster as well, though of course, if one of the nodes fails, mod_cluster won't help to keep the user sessions (as would also happen with other balancers, unless you have the JBoss nodes in a cluster). But for the reasons given above (mainly keeping track of server lifecycle events and better load balancing), if one of the servers goes down, mod_cluster will handle it better and more transparently to the user (the proxy will be informed immediately, so it will not send requests to that node until it is informed that it has restarted).
Remember that you can use mod_cluster with JBoss AS/EAP 5.x or JBoss Web 2.1.1 or above (in the case of Tomcat, I think it's version 6 or above).
To sum up, though your load-balancing use case is simple, mod_cluster offers better performance and scalability.
You can look for more information in the JBoss site for mod_cluster, and in its documentation page.

Resources