I was reading about Redis and Apache Ignite; both of them are in-memory caches and can also act as distributed caches. I was wondering: what is an in-memory cache? Where is the data stored? In the memory of the local system on which an application is being used, or in the memory of the server where the application is hosted? How does in-memory caching work?
Example:
An application with an Ignite cache is running at IP address x and I am using the application from IP address y. Will the cache be stored in the memory of the system at IP address x or the system at IP address y?
Also, what does it mean when we say distributed cache?
An in-memory cache can be thought of as a cache that holds performance-critical information/data from a database and is shared across requests in an application. Data is accessed directly in memory rather than through other mechanisms, which lets database-related operations run with high efficiency, in turn increasing the throughput and responsiveness of the system.
In general, in a distributed cache deployment model, the cache sits between the database and the application in a distributed manner. The cache memory is distributed between the nodes and operates based on a distributed hash table and the type of data. Access to data in the cache on the respective nodes then applies the same in-memory cache logic to bring in the performance optimization.
Here is an example of how it is achieved with Amazon ElastiCache.
As you can see, the Amazon ElastiCache solution has a cache engine running in each node which implements the caching protocol/algorithm, and ElastiCache can support cache sizes from 6 to 67 GB in a particular node. A DNS name is assigned to each cache node when it is created, and you need to configure the DNS names of the nodes in the client library you are using. Once your application issues Put or Get requests to the cluster, the library algorithmically chooses a particular node using a hash function that spreads the data out across the nodes and also helps fetch it back from them.
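The client-side node selection described above can be sketched in a few lines. This is an illustrative modulo-hash scheme, not ElastiCache's actual algorithm, and the node DNS names are made up for the example:

```python
import hashlib

# Hypothetical node endpoints; with ElastiCache these would be the DNS
# names assigned to each cache node at creation time.
NODES = [
    "cache-node-1.example.com",
    "cache-node-2.example.com",
    "cache-node-3.example.com",
]

def node_for_key(key: str) -> str:
    """Hash the key and map it onto one of the configured nodes, so
    Put and Get for the same key always go to the same node."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]
```

Note that real client libraries typically use consistent hashing instead of a plain modulo, so that adding or removing a node remaps only a fraction of the keys rather than nearly all of them.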
A distributed cache partitions/shards your data across multiple cluster nodes. This allows utilizing the memory and CPU resources of the entire cluster and load-balancing requests. A node is a process that can run on a physical server, a virtual machine, or just be a Kubernetes pod. This article might be helpful to understand the basics.
Usually, an application needs to know the IP address of at least one cluster node to open a connection. Once the connection is opened, you would work with the cluster in a way similar to relational databases - just issue your SQL requests, compute tasks and perform other operations.
Also, watch the In-Memory Computing Essentials for Software Engineers recording that covers most of your questions and introduces you to the essential capabilities of Ignite. There is free instructor-led training that is scheduled from time to time on this topic.
Related
I have integrated Apache Geode into a web application to store HTTP session data. This web application runs load-balanced, i.e. there are multiple instances of it sharing session data. Each web application instance has its own local Geode cache (locator and server), and the data is distributed to the other Geode nodes in the cluster via a replicated region. All instances are in the same network; there is no multi-site usage. There are around 5000 GET operations per second; the number of PUT operations is approximately half of that.
Testing this setup with only one web application instance, the performance is very promising (in the area of 20-30 ms). However, when adding an instance there is a significant performance drop, up to a few seconds.
It turned out that disabling TCP SYN cookies improved processing time by up to 50%, though the performance is still not acceptable.
I ask myself how a potential bottleneck (e.g. in the communication between Geode nodes) could be identified. Mainly I think of getting metrics/statistics out of Geode, although I could not find anything helpful in that regard yet. I'd appreciate any hint on how to investigate and eliminate performance problems with Apache Geode.
According to the following video, a dedicated cache is a cache process hosted on a separate server, while a colocated cache is a cache process hosted directly on the service hosts. Is that the standard definition? I cannot find anything more on this topic online.
In the colocated cache scenario, would the service always reference the cache on its own host, or would it need to query other hosts as well? Is it possible to route requests only to hosts that have a colocated cache for that partition of the data, in order to avoid the extra network hop to a cache server that would be needed in the dedicated cache host scenario?
In a colocated cache, querying only the service instance hosting the data can be accomplished by sharding the data (typically by some key). Given a key, a requestor can resolve which shard owns the data, then resolve which instance owns that shard, and direct the request at that instance. As long as everyone agrees on both levels of ownership, this works really well. It's also possible to have each instance of the service forward requests, so that even if a request gets misdirected it will, with some likelihood, eventually reach the correct instance.
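The two levels of ownership can be sketched like this. The shard count, the hash choice, and the shard-to-instance map are all illustrative assumptions; in a real cluster the ownership map would come from the membership mechanism (e.g. Akka Cluster Sharding, mentioned below):

```python
import hashlib

NUM_SHARDS = 16  # fixed shard count that every instance agrees on

# Hypothetical shard -> instance assignment; in a real system this map
# is maintained by the cluster's membership/ownership mechanism.
SHARD_OWNERS = {shard: f"instance-{shard % 4}" for shard in range(NUM_SHARDS)}

def shard_for_key(key: str) -> int:
    """First level of ownership: which shard owns this key."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16) % NUM_SHARDS

def instance_for_key(key: str) -> str:
    """Second level: which service instance currently owns that shard."""
    return SHARD_OWNERS[shard_for_key(key)]
```

Because the shard function is fixed and deterministic, only the shard-to-instance map has to change when instances come and go, which keeps rebalancing cheap.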
Adya (2019) describes the broad approach, calling it a LInK store, which can go as far as embedding the cache into the service itself (which doesn't even require local interprocess communication). When this is done, there's basically no line between your service and the cache: the service is the cache, and databases/object stores only exist to let cold data be evicted and to provide durability. One benefit of this approach is that in the sunny-day case your database/object store mostly handles writes, which gives it a lot of mechanical sympathy with CQRS.
I personally have had a lot of success using Akka to implement services following this approach (Akka Cluster manages cluster membership and failure detection, Akka Cluster Sharding handles shard distribution and resolution, and Akka Persistence provides durability).
A team I know in my company is working on a new Redis cluster setup to improve the application's data caching. The setup is as follows: a Redis cluster with one Redis master and many slaves, say 40-50 (which can grow as the application scales), with one Redis instance per virtual machine. I was told this setup lets the applications deployed on each virtual machine query the data in the local Redis instance rather than querying an instance over the network, in order to avoid network latency. Periodically, say every 5 seconds or so, the Redis master is updated with whatever data was modified, newly created, or deleted (the data is backed by a relational database). This initiates a data sync operation with all the Redis slave instances. The data consumers (the applications deployed on the virtual machines) read the updated values from the Redis slaves to do their processing. Is this approach a correct answer to the network latency problem faced by applications querying a Redis instance within a data center network? Will this setup not create lots of network traffic when the Redis master syncs the data with all its slave nodes?
I couldn't find many answers on this on the internet. Your opinions on this are much appreciated.
The relevance of this kind of architecture depends a lot on the workload. Here are the important criteria:
the ratio between write and read operations. Obviously, the more read operations, the more relevant the architecture. The main benefit, IMO, is not necessarily the latency gain, but the scalability, the extra reliability it brings, and the reduced network resource consumption.
the ratio between the cost of a local Redis access and the cost of a remote Redis access. Do not assume that the only cost of a remote Redis access is the network latency. It is not. On my systems, a local Redis access costs about 50 us (on average, at very low workload), while a remote access costs 120 us (on average, at very low workload). The network latency is about 60 us. Measure the same kind of figures on your own system/network, with your own data.
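Measuring those figures yourself is straightforward with a small timing helper like the sketch below. The helper is generic: pass it any zero-argument callable, e.g. a lambda wrapping a get against a local client and one against a remote client, and compare the two means:

```python
import statistics
import time

def mean_latency_us(op, iterations=1000):
    """Time repeated calls to op() and return the mean latency in
    microseconds. For a Redis comparison, op would be something like
    lambda: client.get("some-key"), run once with a client connected
    locally and once with a client connected over the network."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1_000_000)
    return statistics.mean(samples)
```

Run it warm (after a few throwaway iterations) and at a realistic payload size, since both connection setup and value size can dominate the numbers.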
Here are a few pieces of advice:
do not use a single Redis master with many slave instances; it will limit the scalability of the system. If you want to scale, build a hierarchy of slaves. For instance, have the master replicate to 8 slaves, and have each of those slaves replicate to 8 more slaves running locally on your 64 application servers. If you need to add more nodes, you can tune the replication factor at the master or slave level, or add one more layer to this tree for extreme scalability. It gives you flexibility.
consider using unix sockets between the application and the local slaves, rather than TCP sockets. It is good for both latency and throughput.
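The capacity of the slave hierarchy suggested above follows a simple geometric sum, sketched here (the fan-out of 8 and two layers are just the example numbers from that advice):

```python
def slaves_in_tree(fanout: int, layers: int) -> int:
    """Total slaves in a replication tree where the master feeds
    `fanout` slaves and each intermediate slave feeds `fanout` more,
    down to `layers` levels below the master:
    fanout + fanout**2 + ... + fanout**layers."""
    return sum(fanout ** level for level in range(1, layers + 1))
```

With a fan-out of 8 and two layers, `slaves_in_tree(8, 2)` gives 72 slaves (8 intermediate plus 64 leaves on the application servers), while the master itself only ever streams to 8 connections; adding a third layer would reach 512 leaf slaves without increasing the master's load.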
Regarding your last questions, you really need to evaluate the average local and remote latencies to decide whether this is worth it. Note that the protocol Redis uses to synchronize master and slaves is close to the normal client/server traffic: every SET command applied on the master will also be applied on each slave, so the network bandwidth consumption is similar. In the end, it is really a matter of how many reads and how many writes you expect.
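As a very rough way to frame that read/write trade-off, you can compare the number of network messages the replication adds against the number of remote reads it avoids. This is a deliberately crude model (it ignores message sizes, pipelining, and partial resyncs), but it makes the break-even point concrete:

```python
def replication_cheaper_than_remote_reads(reads_per_s, writes_per_s, n_slaves):
    """Crude break-even sketch: with local slaves, every write is
    shipped once per slave; without them, every read crosses the
    network. Returns True when the remote reads avoided outnumber
    the replication messages added."""
    replication_msgs = writes_per_s * n_slaves
    remote_reads_avoided = reads_per_s
    return remote_reads_avoided > replication_msgs
```

For example, 5000 reads/s against 50 writes/s fanned out to 40 slaves (2000 replication messages/s) still comes out ahead, whereas a write-heavy workload quickly does not.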
I've read the Ehcache documentation and know that all the data in a distributed cache is distributed across all the nodes, which means no single node has all the cache data. So what happens when one of the nodes dies? Are the cache objects on that dead node gone?
Unless you use Terracotta as the clustering mechanism, all the other options mean that any data present only on the failing node is lost.
With Terracotta clustering, the data is "owned" by a Terracotta Server stripe, which can be backed by a passive standby for HA. In such setups, data is never lost.
Distributed caching should technically entail that the data is distributed across all cache servers, so that memory is pooled, while at the same time being made highly available. This really calls for a peer-to-peer architecture in which all servers act as peers, and if one goes down there is a replica of its data on other servers in the cache cluster, so that there is business continuity. One such product is NCache, which provides a true peer-to-peer architecture.
http://www.alachisoft.com/ncache/
I have to move a Windows-based multi-threaded application (which uses global variables as well as an RDBMS for storage) to an NLB (i.e., network load balancer) cluster. The common architectural issues that immediately come to mind are:
Global variables (which are both read and written) will have to be moved to shared storage. What are the best practices here? Is there anything available in the Windows Clustering API to manage such things?
My application uses sockets, and persistent connections are the norm in the field I work in. I believe persistent connections cannot be load-balanced. Again, what are the architectural recommendations in this regard?
I'll answer the persistent connection part of the question first since it's easier. All good network load-balancing solutions (including Microsoft's NLB service built into Windows Server, but also including load balancing devices like F5 BigIP) have the ability to "stick" individual connections from clients to particular cluster nodes for the duration of the connection. In Microsoft's NLB this is called "Single Affinity", while other load balancers call it "Sticky Sessions". Sometimes there are caveats (for example, Microsoft's NLB will break connections if a new member is added to the cluster, although a single connection is never moved from one host to another).
re: global variables, they are the bane of load-balanced systems. Most designers of load-balanced apps will do a lot of re-architecting to minimize dependence on shared state, since it impedes the scalability and availability of a load-balanced application. Most of these approaches come down to a two-step strategy: first, move shared state to a highly-available location, and second, change the app to minimize the number of times that shared state must be accessed.
Most clustered apps I've seen will store shared state (even shared, volatile state like global variables) in an RDBMS. This is mostly out of convenience. You can also use an in-memory database for maximum performance. But the simplicity of using an RDBMS for all shared state (transient and durable), plus the use of existing database tools for high-availability, tends to work out for many services. Perf of an RDBMS is of course orders of magnitude slower than global variables in memory, but if shared state is small you'll be reading out of the RDBMS's cache anyways, and if you're making a network hop to read/write the data the difference is relatively less. You can also make a big difference by optimizing your database schema for fast reading/writing, for example by removing unneeded indexes and using NOLOCK for all read queries where exact, up-to-the-millisecond accuracy is not required.
I'm not saying an RDBMS will always be the best solution for shared state, only that improving shared-state access times is usually not how load-balanced apps get their performance; instead, they get performance by removing the need to synchronously access (and, especially, write to) shared state on every request. That's the second thing I noted above: changing your app to reduce dependence on shared state.
For example, for simple "counters" and similar metrics, apps will often queue up their updates and have a single thread in charge of updating shared state asynchronously from the queue.
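The queued-counter pattern just described can be sketched as follows. The class name and shape are illustrative; the point is simply that request threads enqueue deltas and return immediately, while a single worker thread is the only one that ever touches the shared value:

```python
import queue
import threading

class AsyncCounter:
    """Shared counter updated asynchronously by a single background
    thread, so request threads never block on the shared state."""

    def __init__(self):
        self.value = 0
        self._updates = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def incr(self, delta=1):
        self._updates.put(delta)  # cheap and non-blocking for the caller

    def _drain(self):
        while True:
            delta = self._updates.get()
            if delta is None:     # shutdown sentinel
                return
            self.value += delta   # only this thread mutates value

    def close(self):
        """Flush all queued updates, then stop the worker."""
        self._updates.put(None)
        self._worker.join()
```

Because the queue is FIFO, `close()` is guaranteed to apply every increment enqueued before it; readers that can tolerate slightly stale counts can read `value` at any time without locking.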
For more complex cases, apps may switch from pessimistic concurrency (checking that a resource is available beforehand) to optimistic concurrency (assuming it's available, and then backing out the work later if it turns out, for example, you sold the same item to two different clients!).
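A minimal version-number sketch of that optimistic approach, with hypothetical names (real systems usually put the version check in the database update itself, e.g. `UPDATE ... WHERE version = ?`):

```python
class InventoryItem:
    """Optimistic concurrency: callers read a version, do their work,
    and commit only if the version is unchanged. A failed commit means
    another caller got there first and the work must be backed out."""

    def __init__(self, quantity):
        self.quantity = quantity
        self.version = 0

    def read(self):
        return self.quantity, self.version

    def try_sell(self, amount, expected_version):
        if self.version != expected_version:
            return False  # lost the race: back out and retry
        if amount > self.quantity:
            return False  # not enough stock
        self.quantity -= amount
        self.version += 1  # bump so concurrent stale commits fail
        return True
```

Two clients who both read the last item will both attempt the sale, but only the first commit succeeds; the second sees a stale version and backs out, which is exactly the "selling the same item twice" scenario above.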
Net-net, in load-balanced situations, brute-force solutions often don't work as well as thinking creatively about your dependency on shared state and coming up with inventive ways to avoid waiting for synchronous reads or writes of shared state on every request.
I would not bother with using MSCS (Microsoft Cluster Service) in your scenario. MSCS is a failover solution, meaning it's good at keeping a one-server app highly available even if one of the cluster nodes goes down, but you won't get the scalability and simplicity you'll get from a true load-balanced service. I suspect MSCS does have ways to share state (on a shared disk) but they require setting up an MSCS cluster which involves setting up failover, using a shared disk, and other complexity which isn't appropriate for most load-balanced apps. You're better off using a database or a specialized in-memory solution to store your shared state.
Regarding persistent connections, look into the port rules, because port rules determine which TCP/IP port is handled and how.
MSDN:
When a port rule uses multiple-host load balancing, one of three client affinity modes is selected. When no client affinity mode is selected, Network Load Balancing load-balances client traffic from one IP address and different source ports on multiple cluster hosts. This maximizes the granularity of load balancing and minimizes response time to clients. To assist in managing client sessions, the default single-client affinity mode load-balances all network traffic from a given client's IP address on a single cluster host. The class C affinity mode further constrains this to load-balance all client traffic from a single class C address space.
In an ASP.NET app, what allows session state to be persistent is the client affinity parameter setting: when it is enabled, the NLB directs all TCP connections from one client IP address to the same cluster host. This allows session state to be maintained in host memory.
The client affinity parameter makes sure that a connection is always routed to the server on which it initially landed, thereby maintaining the application state.
Therefore I believe the same would happen for your Windows-based multi-threaded app if you utilize the affinity parameter.
Network Load Balancing Best Practices and Web Farming with the Network Load Balancing Service in Windows Server 2003 might give you some insight.
Concurrency (Check out Apache Cassandra, et al)
Speed of light issues (if going cross-country or international, you'll want heavy use of transactions)
Backups and deduplication (Companies like FalconStor or EMC can help here in a distributed system. I wouldn't underestimate the need for consulting here)