Session Persistence Hazelcast client initialization when server is offline - session

We are trying to replicate the WebSphere Traditional (5/6/7/8/9) behaviour about session persistance for servlets and http, but with Hazelcast and Tomcat. Let me explain...
WebSphere, even when configured as client to a replication domain, keeps a local register of session data. And this local register works fine even if the server processes that should keep replicated data are shutdown from the very first moment. That is, you start the client, and session persistence works within the servlet container. Obviously, you cannot expect to recover your session in another servlet container if the first one crashes, but your applications work anyway.
On the other hand, Hazelcast client on Tomcat containers expect the Hazelcast server (at least one member of the cluster) to be up and running to initialize. If no cluster member is available, initialization fails, and ... web applications in the Tomcat servlet container do not start right. They won't answer any request.
Furthermore, once initialization fails, only way to recover is to shutdown and re-start the tomcat web containers (once a hazelcast cluster member is online).
This behaviour is a bit harsh on system administrators: no one can guarantee that a backup service as distributed session persistence is online all time. That means that launching a Tomcat client becomes a risky task, with a single point of failure by design, which is undesirable.
Now, maybe I overlooked something, maybe I got something wrong. So, ¿Did someone ever managed to start a Hazelcast client without servers, and how? For us, the difference is decisive: if we cannot make the web container start with the hazelcast server offline, then we must keep going on with WebSphere.
We have been trying it on a CentOS 7.5 on Virtual Box 5.2.22, and our Tomcat version is 8.5. Hazelcast client and server is 3.11.1/2.
<group>
<name>Integracion</name>
<password></password>
</group>
<network>
<cluster-members>
<address>hazelcastsrv1/address>
<address>hazelcastsrv2</address>
</cluster-members>
</network>
Sadly, we expect exactly what we get: the reading of the Hazelcast manual suggest that offline servers won't allow tomcat to serve applications. But we cannot beleive what we read, because it makes the library unsafe in a distributed context. We expect to be wrong, and that there are good news around the corner.

Hazelcast is not "a single point of failure by design". The design is to avoid a single point of failure. Data is mirrored across the nodes by default.
It's a data grid, you run as many nodes as capacity and resilience requires, and they cluster together.
If you need 3 nodes to be up for successful operations, and also anticipate that 1 might go down, then you need to run 4 in total. Should that 1 failure happen, you have a cluster surviving that is big enough.

Power-on/Power-off order is not relevant in Hazelcast, as long as you are providing remaining nodes, during power-off, enough time to let repartitioning complete. For example, in a 4 nodes cluster, if you take out 1 node and give the other 3 room to complete repartitioning then you dont loose the data. If you take out 2 nodes together then the cluster will be without the data whose backup was stored on 1 of the 2 nodes you took out.
For starting up, the startup sequence is not relevant as each node owns certain set of partitions that are determined based on consistent hashing. And this ownership continues to change even if there are nodes leaving/joining a running cluster.

Related

HazelCast Member with/without Client is ok for standalone web application

I am new to caching mechanism and just started learning about Hazelcast. I gone through couple of tutorials and hazelcast site but still I am not clear.
I am trying to build a caching for my springboot & angular application. It is a single standalone application.
So in my case, since my application single and no plan in running as multiple instance can I just go with Hazelcast member without client. Is client is needed?
No, the client is not mandatory, and for your case it would seem unnecessary.
The idea is around abstraction, you ask Hazelcast for item X and it is returned if it exists. Hazelcast works out where that item is held, and mostly this is hidden from you.
X could be found in your process:
Your process is a client, has near-caching active, and has a copy.
Your process is one of 1 or more servers, and happens to be the server responsible for storing item X.
X could be found in another process:
Your process is a client, has no near-caching, so is not storing anything
Your process is one of several servers, and it happens that one of the other servers is responsible for item X.
"Mostly this is hidden from you" == There will be a retrieval time difference between data found in the same process and data retrieved from another process, as it has to pass across the network. If this is a significant difference at low volumes, it's time to upgrade the network.

Alternative to session replication \ tomcat clustering

We have 3 tomcats with the same web app, using the same DB.
We want to use non-stickey session.
this means we will have to share the session (replicate) between the tomcats (cluster?)
We dont like the idea of the delta-manger since it is an all-to-all replication with preformance cost.
However we dont really like the backup-manager as well (still multiple copies)
My question is:
Is it possible to define a single tomcat that will be a "session manager" and all other tomcats will not keep sessions by themselves?
this way no broadcasting of sessions is needed...
My reading of the Tomcat docs finds:
... when using the delta manager it will replicate to all nodes, even
nodes that don't have the application deployed.
exactly as you say, but then says:
To get around this problem, you'll want to use the BackupManager. This
manager only replicates the session data to one backup node
You seem to object to "multiple copies", but this doesn't seem very different from your proposed suggestion, the BackupManager is, so far as I can see, acting as a Session Manager.
When you don't have sticky sessions you are pretty much guaranteeing that 2 of every 3 requests will need to get a copy of the session data from somewhere else, with only 3 tomcats how much performance cost would all-to-all replication impose?
I suspect that tuning your session sizes is more important. Large sessions tend to be a problem for any sort of replication.

mod_jk vs mod_cluster

Can someone please tell me the pro's and con's of mod_jk vs mod_cluster.
We are looking to do very simple load balancing.. We are going to be using sticky sessions and just need something to route new requests to a new server if one server goes down. I feel that mod_jk does this and does a good job so why do I need mod_cluster?
If your JBoss version is 5.x or above, you should use mod_cluster, it will give you a better performance and reliability than mod_jk. Here you've some reasons:
better load balacing between app servers: the load balancing logic is calculated based on information and metrics provided directly by the applications servers (bear in mind they have first hand information about its load), in contrast with mod_jk with which the logic is calculated by the proxy itself. For that, mod_cluster uses an extra connection between the servers and the proxy (a part from the data one), used to send this load information.
better integration with the lifecycle of the applications deployed in the servers: the servers keep the proxy informed about the changes of the application in each respective node (for example if you undeploy the application in one of the nodes, the node will inform the proxy (mod_cluster) immediately, avoiding this way the inconvenient 404 errors.
it doesn't require ajp: you can also use it with http or https.
better management of the servers lifecycle events: when a server shutdowns or it's restarted, it informs the proxy about its state, so that the proxy can reconfigure itself automatically.
You can use sticky sessions as well with mod cluster, though of course, if one of the nodes fails, mod cluster won't help to keep the user sessions (as it would happen as well with other balancers, unless you've the JBoss nodes in cluster). But due to the reasons given above (keeping track of the server lifecycle events, and better load balancing mainly), in case one of the servers goes down, mod cluster will manage it better and more transparently to the user (the proxy will be informed immediately, and so it will never send requests to that node, until it's informed that it's restarted).
Remember that you can use mod_cluster with JBoss AS/EAP 5.x or JBoss Web 2.1.1 or above (in the case of Tomcat I think it's version 6 or above).
To sum up, though your use case of load balancing is simple, mod_cluster offers a better performance and scalability.
You can look for more information in the JBoss site for mod_cluster, and in its documentation page.

Spring + Load balancing/Clustering

I am working on a webapp project and we are considering deploying it on multiple servers.
What solution do you advise for clustering/load-balancing with Spring?
What are the issues to take into account?
For example: How do singletons behave in a cluster of machines? What about session replication? Are there any other issues to take into account?
Here is the list of possible issues (not necessarily Spring-related):
stateful beans - if your beans have state, like collections accumulating something or counters, you need to think whether this state should be replicated or not. E.g. should this counter be local to one JVM or global in the whole cluster? In the latter case consider terracotta and hazelcast
filesystem - as long as all instances use the same database, everything is fine. But if one node writes to disk, other instance can't read it. Solutions? Either use database for all storage or distributed file system
HTTP sessions - either use sticky session or replicate sessions. If you go for replication, keep sessions as small as possible.
asynchronous jobs - if you have a job running every hour, should it run on every machine, or just on a dedicated one (or maybe on random)?

AppFabric Redundancy

We just tested an AppFabric cluster of 2 servers where we removed the "lead" server. The second server timeouts on any request to it with the error:
Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:
There is a temporary failure. Please retry later.
(One or more specified Cache servers are unavailable, which could be caused by busy network or servers. Ensure that security permission has been granted for this client account on the cluster and that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Retry later.)
In practive this means that if one server in the cluster goes down then they all go down. (Note we are not using Windows cluster, only linking multiple AppFabric cache servers to each other.)
I need the cluster to continue operating even if a single server goes down. How do I do this?
(I realize this question is borderlining Serverfault, but imho developers should know this.)
You'll have to install the AppFabric cache on at least three lead servers for the cache to survive a single server crash. The docs state that the cluster will only go down if the "majority" of the lead servers go down, but in the fine print, they explain that 1 out of 2 constitutes a majority. I've verified that removing a server from a three lead-node cluster works as advertised.
Typical distributed systems concept. For a write or read quorum to occur in an ensemble you need to have 2f + 1 servers up where f is number of servers failing. I think appfabric or any CP (as in CAP theorem) consensus based systems need this to happen for working of the cluster.
--Sai
Thats actually a problem with the Appfabric architecture and it is rather confusing in terms of the "lead-host" concept. The idea is that the majority of lead hosts should be running so that the cluster remains up and running. So if you had three servers you'd have to have at least two lead hosts constantly communicating with each other and eating up server resources and if both go down then the whole cluster fails. The idea is to have a peer-to-peer architecture where all servers act as peers meaning that even if two servers go down the cluster remains functioning with no application downtimes. Try NCache:
http://www.alachisoft.com/ncache/

Resources