I have integrated Apache Geode into a web application to store HTTP session data in it. This web application is run load-balanced, i.e. there are multiple instances of it sharing session data. Each web application instance has its own locale Geode cache (locator and server) and the data is distributed by use of a replicated region to other Geode nodes in the cluster. All instances are in the same network, no multi-site usage. The number of GET operations per second are around 5000 per second; the number of PUT operations are approximatley half of it.
Testing this setup with only one web application instance the performarnce is very promising (in the area of 20-30 ms). However, when adding an instance there is a significatn performance drop up to a few seconds.
It has shown that disabling TCP syn cookies lead to an improvement of processing time up to 50%. Though the performance is still not acceptable.
I ask myself how an eventual bottleneck (e.g. by the communication between Geode nodes) could be identified? Mainly I think of getting out metrics/statistics from Geode, although I could not find anything helpful yet in that regard. I'd appreciate any hint on how to investigate and eliminate performance problems with Apache Geode.
Related
I was reading about Redis and Apache ignite, both of them are in-memory cache and also act as distributed cache. I was wondering what is in-memory cache? Where is the data stored ? In the memory of local system on which an application is being used or in the memory of server where the application is hosted? How does in-memory caching works?
Example:
An application with ignite cache is running on x IP address and I am using the application on y IP address so cache will stored in memory of x IP address system or y IP address system?
Also What does it mean when we say distributed cache?
The in-memory cache can be thought of as a cache that has performance critical information/data of a database that is shared across requests in an application. There is direct access to data/memory rather than through other mechanisms that enables database related operations to operate with high efficiency inturn increasing the throughput, responsiveness of the system.
In general, in the case of distributed cache based on deployment model, the cache memory can be between database and application in a distributed manner. This cache memory can be distributed between the nodes and shall operate based on the distributed hash table and the type of data. The access of data from cache in respective nodes can in turn shall apply the in-memory cache logic to bring in performance optimization.
Here is an example of how it is achieved with Amazon Elastic Cache
As you can see, the Amazaon Elastic Cache solution has a Cache Engine running in each node which implements the caching protocol/algorithm and the Amazon Elastic Cache can support cache sizes from 6 to 67 GB in a particular node. A DNS name is assigned to each Cache Node while it is created and you need to configure the DNS names of the nodes into the client library that is being used. Once your application invokes the Put or Get requests to the cluster, the library shall algorithmically choose a particular node using the hash function that shall spread the data out across the nodes and also help in fetching the same from the nodes.
A distributed cache partitions/shards your data across multiple cluster nodes.This allows utilizing memory and CPU resources of the entire cluster, and load balance requests. A node is a process that can be running on your physical server, virtual machine, or just be a Kubernetes pod. This article might be helpful to understand the basics.
Usually, an application needs to know the IP address of at least one cluster node to open a connection. Once the connection is opened, you would work with the cluster in a way similar to relational databases - just issue your SQL requests, compute tasks and perform other operations.
Also, watch the In-Memory Computing Essentials for Software Engineers recording that covers most of your questions and introduces you to the essential capabilities of Ignite. There is free instructor-led training that is scheduled from time to time on this topic.
I have a really simple setup: An azure load balancer for http(s) traffic, two application servers running windows and one database, which also contains session data.
The goal is being able to reboot or update the software on the servers, without a single request being dropped. The problem is that the health probe will do a test every 5 seconds and needs to fail 2 times in a row. This means when I kill the application server, a lot of requests during those 10 seconds will time out. How can I avoid this?
I have already tried running the health probe on a different port, then denying all traffic to the different port, using windows firewall. Load balancer will think the application is down on that node, and therefore no longer send new traffic to that specific node. However... Azure LB does hash-based load balancing. So the traffic which was already going to the now killed node, will keep going there for a few seconds!
First of all, could you give us additional details: is your database load balanced as well ? Are you performing read and write on this database or only read ?
For your information, you have the possibility to change Azure Load Balancer distribution mode, please refer to this article for details: https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-distribution-mode
I would suggest you to disable the server you are updating at load balancer level. Wait a couple of minutes (depending of your application) before starting your updates. This should "purge" your endpoint. When update is done, update your load balancer again and put back the server in it.
Cloud concept is infrastructure as code: this could be easily scripted and included in you deployment / update procedure.
Another solution would be to use Traffic Manager. It could give you additional option to manage your endpoints (It might be a bit oversized for 2 VM / endpoints).
Last solution is to migrate to a PaaS solution where all this kind of features are already available (Deployment Slot).
Hoping this will help.
Best regards
What is the most efficient way of having Infinispan/JBoss Data Grid in library mode with several applications using same caches?
I currently setup JBoss Data Grid in library mode in EAP 6.3, have about 10 applications and 6 different caches configured.
Cache mode is Replication.
Each application has a cache manager which instantiates the caches that are required by the application. Each cache is used by at least 2 applications.
I hooked up hawtio and can see from JMX beans that multiple cache managers are created with duplicated cache instances.
From the logs, I see:
ISPN000094: Received new cluster view: [pcu-18926|10] (12) [pcu-18926, pcu-24741, pcu-57265, pcu-18397, pcu-26495, pcu-56892, pcu-59913, pcu-53108, pcu-34661, pcu-43165, pcu-32195, pcu-28641]
Does it have a lot of overhead in cache managers talking to each other all the time?
I eventually want to setup 4 cluster nodes with JBoss data grid in library mode so how can I configure so that all applications in one node share same cache manager hence reducing noise?
I can't use JBoss data grid in Server mode which I am aware will fulfil my requirements.
Thanks for any advice.
First of all, I may misunderstand your setup: this log says that there are 10 'nodes'. How many servers do you actually use? If you use the cache to communicate with 10 applications on the same machine, it's a very suboptimal approach; you keep 10 copies of all data and use many RPC to propagate writes between the caches. You should have single local-mode cache and just retrieve a reference to it (probably through JNDI).
Cache managers don't talk to each other, and caches do only when there is an executed operation, or when a node is joining/leaving/crashing (then the caches have to rebalance).
It's JGroups channel that keeps the view and exchanges some messages to detect if the other nodes are alive or other synchronizing messages, but this kind of messages is send once every few seconds, so this has a very low overhead.
On the other hand, each channel keeps several threadpools, and cache manager has a threadpool as well, so there is some memory overhead. From CPU point of view, there is a thread that iterates through the cache and purges expired entries (the task is started every minute), so even with idle cache full of entries some cycles are consumed. If the cache is empty, this has very low consumption (there's not much to iterate through).
There is a new Redis cluster setup, one team I know in my company is working on, in order to improve the application data caching based out on Redis. The setup is as follows, a Redis cluster with a Redis master and many slaves, say 40-50 (but can grow more when the application is scaled), one Redis instance per one virtual machine. I was told this setup helps the applications deployed in servers on every virtual machines query the data present in the local Redis instance than querying an instance in the network in order to avoid network latency. Periodically, the Redis master is updated only with whatever data are modified or newly created or deleted (data backed by a relational database), say every 5 seconds or so. This will initiate the data sync operation with all the Redis slave instances. The data-consumers (the application deployed on all the virtual machines) of the Redis (slaves) reads updated values to do processing. Is this approach a correct one to the network latency problem faced by the applications in querying from a Redis instance that is within a data center network? Will this setup not create lots of network traffic when Redis master syncing the data with all its slave nodes?
I couldn't find much answers on this from the internet. Your opinions on this are much appreciated.
The relevance of this kind of architecture depends a lot about the workload. Here are the important criteria:
the ratio between the write and read operations. Obviously, the more read operations, the more relevant the architecture. The main benefit IMO, is not necessarily the latency gains, but the scalability, the extra reliability it brings, and the network resource consumption.
the ratio between the cost of a local Redis access against the cost of a remote Redis access. Do not assume that the only cost of a remote Redis access is the network latency. It is not. On my systems, a local Redis access costs about 50 us (in average, very low workload), while a remote access costs 120 us (in average, very low workload). The network latency is about 60 us. Measure the same kind of figures on your own system/network, with your own data.
Here are a few advices:
do not use a single Redis master against many slave instances. It will limit the scalability of the system. If you want to scale, you need to build a hierarchy of slaves. For instance, have the master replicates to 8 slaves. Each slave replicates to 8 other slaves locally running on your 64 application servers. If you need to add more nodes, you can tune the replication factor at the master or slave level, or add one more layer in this tree for extreme scalability. It brings you flexibility.
consider using unix socket between the application and the local slaves, rather than TCP sockets. If it good for both latency and throughput.
Regarding your last questions, you really need to evaluate the average local and remote latencies to decide whether this is worth it. Note that the protocol used by Redis to synchronize master and slaves is close to the normal client server traffic. Every SET commands applied on the master, will be also applied on the slave. The network bandwidth consumption is therefore similar. So in the end, it is really a matter of how many reads and how many writes you expect.
I've to move a Windows based multi-threaded application (which uses global variables as well as an RDBMS for storage) to an NLB (i.e., network load balancer) cluster. The common architectural issues that immediately come to mind are
Global variables (which are both read/ written) will have to be moved to a shared storage. What are the best practices here? Is there anything available in Windows Clustering API to manage such things?
My application uses sockets, and persistent connections is a norm in the field I work. I believe persistent connections cannot be load balanced. Again, what are the architectural recommendations in this regard?
I'll answer the persistent connection part of the question first since it's easier. All good network load-balancing solutions (including Microsoft's NLB service built into Windows Server, but also including load balancing devices like F5 BigIP) have the ability to "stick" individual connections from clients to particular cluster nodes for the duration of the connection. In Microsoft's NLB this is called "Single Affinity", while other load balancers call it "Sticky Sessions". Sometimes there are caveats (for example, Microsoft's NLB will break connections if a new member is added to the cluster, although a single connection is never moved from one host to another).
re: global variables, they are the bane of load-balanced systems. Most designers of load-balanced apps will do a lot of re-architecture to minimize dependence on shared state since it impedes the scalabilty and availability of a load-balanced application. Most of these approaches come down to a two-step strategy: first, move shared state to a highly-available location, and second, change the app to minimize the number of times that shared state must be accessed.
Most clustered apps I've seen will store shared state (even shared, volatile state like global variables) in an RDBMS. This is mostly out of convenience. You can also use an in-memory database for maximum performance. But the simplicity of using an RDBMS for all shared state (transient and durable), plus the use of existing database tools for high-availability, tends to work out for many services. Perf of an RDBMS is of course orders of magnitude slower than global variables in memory, but if shared state is small you'll be reading out of the RDBMS's cache anyways, and if you're making a network hop to read/write the data the difference is relatively less. You can also make a big difference by optimizing your database schema for fast reading/writing, for example by removing unneeded indexes and using NOLOCK for all read queries where exact, up-to-the-millisecond accuracy is not required.
I'm not saying an RDBMS will always be the best solution for shared state, only that improving shared-state access times are usually not the way that load-balanced apps get their performance-- instead, they get performance by removing the need to synchronously access (and, especially, write to) shared state on every request. That's the second thing I noted above: changing your app to reduce dependence on shared state.
For example, for simple "counters" and similar metrics, apps will often queue up their updates and have a single thread in charge of updating shared state asynchronously from the queue.
For more complex cases, apps may swtich from Pessimistic Concurrency (checking that a resource is available beforehand) to Optimistic Concurrency (assuming it's available, and then backing out the work later if you ended up, for example, selling the same item to two different clients!).
Net-net, in load-balanced situations, brute force solutions often don't work as well as thinking creatively about your dependency on shared state and coming up with inventive ways to prevent having to wait for synchronous reading or writing shared state on every request.
I would not bother with using MSCS (Microsoft Cluster Service) in your scenario. MSCS is a failover solution, meaning it's good at keeping a one-server app highly available even if one of the cluster nodes goes down, but you won't get the scalability and simplicity you'll get from a true load-balanced service. I suspect MSCS does have ways to share state (on a shared disk) but they require setting up an MSCS cluster which involves setting up failover, using a shared disk, and other complexity which isn't appropriate for most load-balanced apps. You're better off using a database or a specialized in-memory solution to store your shared state.
Regarding persistent connection look into the port rules, because port rules determine which tcpip port is handled and how.
MSDN:
When a port rule uses multiple-host
load balancing, one of three client
affinity modes is selected. When no
client affinity mode is selected,
Network Load Balancing load-balances
client traffic from one IP address and
different source ports on
multiple-cluster hosts. This maximizes
the granularity of load balancing and
minimizes response time to clients. To
assist in managing client sessions,
the default single-client affinity
mode load-balances all network traffic
from a given client's IP address on a
single-cluster host. The class C
affinity mode further constrains this
to load-balance all client traffic
from a single class C address space.
In an asp.net app what allows session state to be persistent is when the clients affinity parameter setting is enabled; the NLB directs all TCP connections from one client IP address to the same cluster host. This allows session state to be maintained in host memory;
The client affinity parameter makes sure that a connection would always route on the server it was landed initially; thereby maintaining the application state.
Therefore I believe, same would happen for your windows based multi threaded app, if you utilize the affinity parameter.
Network Load Balancing Best practices
Web Farming with the
Network Load Balancing Service
in Windows Server 2003 might help you give an insight
Concurrency (Check out Apache Cassandra, et al)
Speed of light issues (if going cross-country or international you'll want heavy use of transactions)
Backups and deduplication (Companies like FalconStor or EMC can help here in a distributed system. I wouldn't underestimate the need for consulting here)