Load Balancing to Maximize Local Server Cache

I have a single-server system that runs all kinds of computations on user data, accessible via REST API. The computations require that large chunks of the user data are in memory during the computation. To do this efficiently, the system includes an in-memory cache, so that multiple requests on the same data chunks will not need to re-read the chunks from storage.
I'm now trying to scale the system out, since one large server is not enough, and I also want to achieve active/active high availability. I'm looking for the best practice to load balance between the multiple servers, while maximizing the efficiency of the local cache already implemented.
Each REST call includes a parameter that identifies which chunk of data should be accessed. I'm looking for a way to tell a load balancer to route the request to a server that has that chunk in cache, if such a server exists - otherwise just use a regular algorithm like round robin (and update the routing table such that the next requests for the same chunk will be routed to the selected server).
A bit more input to consider:
The number of data chunks is in the thousands, potentially tens of thousands. The number of servers is in the low dozens.
I'd rather not move to a centralized cache on another server, e.g. Redis. I have a lot of spare memory on the existing machines that I'd like to utilize, since the computations are mostly CPU-bound. Also, I'd prefer not to re-implement another custom caching layer.
My servers are on AWS, so a way to implement this in ELB is fine with me, but I'm open to other cloud-agnostic solutions. I could in theory implement a system that updates rules on an AWS Application Load Balancer, but it could potentially grow to thousands of rules (one per chunk) and I'm not sure that will be efficient.
Since requests using the same data chunk can come from multiple sources, session-based stickiness is not enough. Some of these operations are write operations, and I'd really rather not deal with cross-server synchronization. All operations on a single chunk should be routed to the one server that has that chunk in memory.
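To make the behaviour concrete, here is a rough sketch of the routing logic I'm after (server addresses are made up, and in reality this state would have to live in or next to the load balancer):

    import itertools

    servers = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]   # hypothetical back-end addresses
    round_robin = itertools.cycle(servers)
    routing_table = {}                                   # chunk_id -> server that has it cached

    def route(chunk_id):
        """Send a chunk to the server that already caches it, else assign one round-robin."""
        server = routing_table.get(chunk_id)
        if server is None:
            server = next(round_robin)
            routing_table[chunk_id] = server             # later requests for this chunk stick here
        return server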
Any ideas are welcome! Thanks!

Related

Azure Redis cache latency

I am working on an application that has a web job and an Azure function app. The web job generates the Redis cache for the function app to consume. The cache size is around 10 MB. I am using lazy loading and the other recommended practices, but I still find that the overall cache operation is slow. Depending on the size of the file I am processing, I may end up calling the Redis cache up to 100,000 times. I'm wondering if I need to hold the cache data in a local variable instead of reading it from Redis every time. Has anyone experienced latency in accessing Redis? Does it make sense to create a singleton object in the C# function app and refresh it based on a timer or other logic?
Could you consider these points in your usage? These are some good practices for Azure Redis Cache (a connection-handling sketch follows the list):
Redis works best with smaller values, so consider chopping up bigger data into multiple keys. In this Redis discussion, 100kb is considered "large". Read this article for an example problem that can be caused by large values.
Use Standard or Premium Tier for Production systems. The Basic Tier is a single node system with no data replication and no SLA. Also, use at least a C1 cache. C0 caches are really meant for simple dev/test scenarios since they have a shared CPU core, very little memory, are prone to "noisy neighbor", etc.
Remember that Redis is an in-memory data store, so be aware of scenarios where data loss can occur.
Reuse connections - Creating new connections is expensive and increases latency, so reuse connections as much as possible. If you choose to create new connections, make sure to close the old connections before you release them (even in managed memory languages like .NET or Java).
Locate your cache instance and your application in the same region. Connecting to a cache in a different region can significantly increase latency and reduce reliability. Connecting from outside of Azure is supported, but not recommended especially when using Redis as a cache (as opposed to a key/value store where latency may not be the primary concern).
Configure your maxmemory-reserved setting to improve system responsiveness under memory pressure conditions, especially for write-heavy workloads or if you are storing larger values (100KB or more) in Redis. I would recommend starting with 10% of the size of your cache, then increase if you have write-heavy loads. See some considerations when selecting a value.
Avoid Expensive Commands - Some redis operations, like the "KEYS" command, are VERY expensive and should be avoided.
Configure your client library to use a "connect timeout" of at least 10 to 15 seconds, giving the system time to connect even under higher CPU conditions. If your client or server tend to be under high load, use an even larger value. If you use a large number of connections in a single application, consider adding some type of staggered reconnect logic to prevent a flood of connections hitting the server at the same time.
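To illustrate the connection-reuse and timeout points above, here is a minimal sketch using the Python redis client purely as an example (the host name and key are placeholders; the same ideas apply to StackExchange.Redis in C#):

    import redis

    # One client for the lifetime of the process; redis-py keeps an internal
    # connection pool, so reusing this object reuses connections instead of
    # reconnecting on every call.
    client = redis.Redis(
        host="example.redis.cache.windows.net",  # placeholder cache host
        port=6380,                               # Azure Cache for Redis uses TLS on 6380
        password="<access-key>",
        ssl=True,
        socket_connect_timeout=15,               # allow time to connect under high CPU (seconds)
        socket_timeout=10,                       # per-command timeout
    )

    def get_cached(key):
        return client.get(key)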

Redis: using two instances or just one (caching and storage)?

We need to perform rate limiting for requests to our API. We have a lot of web servers, and the rate limit should be shared between all of them. The rate limiting also demands a certain amount of ephemeral storage (we want to store each user's quota for a certain period of time).
We have a great rate limiting implementation that works with Redis by using SETEX. In this use case we need Redis to also be used as storage (for a short while, according to the expiration set on the SETEX calls). The cache also needs to be shared across all servers, and there is no way we could use something like an in-memory cache on each web server for the rate limiting, since the limit is per user - so we expect a lot of memory to be consumed for this purpose. So this process is a great use case for a Redis cluster.
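(To give an idea of the shape of it - not our exact code - the limiter is essentially a per-user counter with a TTL; a rough Python sketch with made-up limits:)

    import redis

    r = redis.Redis(host="localhost", port=6379)

    def allow_request(user_id, limit=100, window_s=60):
        """Fixed window: allow at most `limit` requests per `window_s` seconds per user."""
        key = "ratelimit:" + user_id
        # The first hit in a window creates the counter with an expiry (SETEX-style);
        # the key simply disappears when the window ends.
        r.set(key, 0, ex=window_s, nx=True)
        return r.incr(key) <= limit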
Thing is - the same web server that performs the rate limiting also has some other caching needs. It fetches some stuff from a DB and then caches the results in two layers: first in an in-memory LRU cache (on the actual server), and the second layer is Redis again - this time used as cache-only (no storage). When an item gets evicted from the in-memory LRU cache, it is passed on to be saved in Redis, so that even when a cache miss occurs in memory, there can still be a cache hit thanks to Redis.
Should we use the same Redis instance for both needs (a rate limiter that needs storage on one hand, and a cache layer that does not on the other)? I guess we could use a single Redis instance that includes storage (not the cache-only option) and just use that for both needs? Or would it be better, performance-wise, for each of our servers to talk to two Redis instances - one used as cache-only and one that also features the storage option?
I always recommend dividing your setup into distinct data roles. Combining them sounds neat, but in practice can be a real pain. In your case you have two distinct data roles: cached data and stored data. Those are two major classes of distinction, which means: use two different instances.
In your particular case isolating them will be easier from an operational standpoint when things go wrong or need upgrading. You'll avoid intermingling services such that an issue in caching causes issues in your "storage" layer - or the inverse.
Redis usage tends to grow into more areas. If you get in the habit of dedicated Redis endpoints now you'll be better able to grow your usage in the future, as opposed to having to refactor and restructure into it when things get a bit rough.

Balancing Redis queries and in-process memory?

I am a software developer (and wannabe architect) who is new to the world of server scalability.
The context is multiple services working with the same data set, aiming to scale for redundancy and load balancing.
The question is: in an ideal system, should services try to optimize their internal processing to reduce the number of queries to the remote server cache (for better performance and less bandwidth, at the cost of some local memory and code), or is it better to just go all-in and query the remote cache as the single transaction point every time a transaction needs to process the data?
When I read about Redis, and about general database usage online, the latter seems to be the common option: every node of the scaled application holds no local state and reads and writes directly to the remote cache on every transaction.
But as a developer, I ask: isn't this a tremendous waste of resources? Whether you are designing at the chip level, or at the inter-thread, inter-process, or inter-machine level, I believe it's the responsibility of each sub-system to do whatever it can to optimize its processing without depending on the external world where possible, and hence reduce overall operation time.
I mean, if the same data is read hundreds of times by the same service without changes (writes), isn't it more logical to keep a local cache, wait for notifications of changes (pub/sub), and read only those changes to update the cache, instead of reading the bigger portion of data every time a transaction requires it? On the other hand, I understand that this method implies that the same data will be duplicated in multiple places (more RAM usage) and requires some sort of expiration system to keep the cache from filling up.
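(As a sketch of what I mean by the notification approach - a local dictionary that gets invalidated through a Redis pub/sub channel; the channel name and structure are invented:)

    import redis

    r = redis.Redis(host="localhost", port=6379)
    local_cache = {}                         # in-process copy of hot data

    def on_change(message):
        """Another service published a change; drop our stale local copy."""
        key = message["data"].decode()
        local_cache.pop(key, None)

    pubsub = r.pubsub(ignore_subscribe_messages=True)
    pubsub.subscribe(**{"data-changed": on_change})
    listener = pubsub.run_in_thread(sleep_time=0.1)   # background invalidation listener

    def read(key):
        value = local_cache.get(key)
        if value is None:                    # local miss: fall back to Redis and remember it
            value = r.get(key)
            local_cache[key] = value
        return value

    # A writer would do: r.set(key, value); r.publish("data-changed", key)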
I know Redis is built to be fast. But however fast it is, in my opinion there's still a massive difference between reading directly from local memory versus querying an external service, transferring data over the network, allocating memory, deserializing it into proper objects, and garbage collecting it when you are finished. Does anyone have benchmark numbers comparing an in-process dictionary lookup with a Redis query on localhost? Is that time negligible in the bigger scheme of things, or is it an important factor?
Now, I believe the real answer to my question until now is "it depends on your usage scenario", so let's elaborate:
Some of our services trigger actions on conditions of data change, others periodically crunch data, others periodically read new data from an external network source, and finally others are responsible for presenting data to users and letting them trigger actions and bring in new data. So it's a bit more complex than a single page-serving web service. We already have a cache system codebase in most services, and we have a message broker system to notify data changes and trigger actions. Currently only one service of each type exists (not scaled). They transfer small volatile data over messages and bigger, more persistent (less frequently changing) data over SQL. We are in the process of moving pretty much all data to Redis to ease scalability and improve performance. Now some colleagues are having a heated discussion about whether we should abandon our cache system altogether and use Redis as the common global cache, or keep our notification/refresh system. We were wondering what the external world thinks about it. Thanks
(damn that's a lot of text)
I would favor utilizing in-process memory as much as possible. Any remote query introduces latency. You can use a hybrid approach and utilize in-process cache for speed (and it is MUCH faster) but put a significantly shorter TTL on it, and then once expired, reach further back to Redis.
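A minimal sketch of that hybrid, assuming string values and arbitrary TTLs:

    import time
    import redis

    r = redis.Redis(host="localhost", port=6379)

    LOCAL_TTL = 5          # seconds - the in-process copy is deliberately short-lived
    REDIS_TTL = 300        # seconds - the "further back" layer lives longer

    _local = {}            # key -> (expires_at, value)

    def get(key, loader):
        """Try in-process memory first, then Redis, then the real data source."""
        entry = _local.get(key)
        if entry is not None and entry[0] > time.time():
            return entry[1]                   # local hit: no network round trip

        value = r.get(key)                    # local miss or expired: reach back to Redis
        if value is None:
            value = loader(key)               # Redis miss: hit the database/source
            r.setex(key, REDIS_TTL, value)

        _local[key] = (time.time() + LOCAL_TTL, value)
        return value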

Is performance worse when putting the database on a dedicated server?

I heard that one way to scale your system is to use different machines for the web server and the database server, and even to use multiple instances of each type of server.
I wonder how this could improve performance over the one-server-for-everything model? Aren't there bottlenecks in the connections between those servers? Moreover, you have to take care of synchronization when accessing the database server from different web servers.
If your infrastructure is small enough then yes, one server for everything is (probably) the best way to do things. However, when your size starts to require more than one server, scaling up your single box can become much more expensive than having multiple cheaper servers. It also gives you more failure tolerance (if one server goes down, the other(s) can take over). As for synchronizing data: on the database side that is usually achieved with clustering or replication; on the application side it can be achieved with the likes of memcached or saving to disk; and web servers themselves don't really need to be synchronized. Network bottlenecks on a local network (as your servers would be to one another) are negligible.
Having numerous servers may appear to be an attractive solution. One problem which often occurs is the latency that arises from communication between the servers. Even with fiber inter-connects it will be slower than if they reside on the same server. Of course, in a single server-solution, if one server application does a lot of work it may starve the DB application of needed CPU resources.
Another issue which may turn up is that of SANs. Proponents of SANs will say that they are just as fast as locally attached storage. The purpose of SANs is to cut costs on storage. Even if the SAN were to use the same high-performance disks as the local solution (wiping out the cost savings) you still have a slower connection and more simultaneous users to contend with on the SAN.
Conventional wisdom has it that a DB should be SQL-based with normalized data. It is worthwhile to spend some time weighing the pros and cons (yes, SQL has cons) against each other.
Since "time-immemorial" (at least the last twenty years) indifferent programmers have overloaded servers with stuff they are too lazy to implement in the client. Indifferent (or ignorant) architects allow this practice to continue. End result: sluggish c/s implementations which are close to useless. Tripling the server park is a desperate "week-before-delivery" measure which - at best - results in a marginal performance increase. Often you lose performance instead.
DBs should not be bothered with complex requests involving multiple tables. Simple requests, filtered by the client, are the way to go.
One thing to try might be to put the framework/SOAP handling on one server and let it send binary requests to the DB server, which answers with binary responses (making sense of a SOAP request is very CPU-intensive, and not something you want to leave to the DB application, which will be more or less choked anyway). This way SOAP throttles only one part of the environment (the interface to users/other framework users) and the rest of the interfaces will be as efficient as they can be (binary).
Another thing - if the application allows it - is to put a cache front-end on the DB-application. The purpose of this cache is to do as much repetitive stuff as possible without involving the DB itself. This way the DB is left with handling fewer but (perhaps) more complicated requests instead of doing everything.
Oh, and don't let clients send SQL statements directly to the DB. You'd be surprised at the junk a DB has to contend with.

Most efficient way to cache in a fastcgi app

For fun I am writing a FastCGI app. Right now all I do is generate a GUID and display it at the top of the page, then make a DB query based on the URL which pulls data from one of my existing sites.
I would like to attempt to cache everything on the page except for the GUID. What is a good way of doing that? I have heard of Redis but never used it. It appears to be a server, which means it runs in a separate process. Perhaps an in-process solution would be faster? (Unless it's not?)
What is a good solution for page caching? (I'm using C++)
Your implementation sounds like you need a simple key-value caching mechanism, and you could possibly use a container like std::unordered_map from C++11, or its boost cousin, boost::unordered_map. unordered_map provides a hash table implementation. If you needed even higher performance at some point, you could also look at Boost.Intrusive which provides high performance, standard library-compatible containers.
If you roll your own cache with the suggestions mentioned, a second concern will be expiring cache entries, because of the possibility that your cached data will grow stale. I don't know what your data is like, but you could choose to implement a caching strategy like any of these (a rough sketch of the last option follows below):
after a certain time/number of uses, expire a cached entry
after a certain time/number of uses, expire the entire cache (extreme)
least-recently used - there's a stack overflow question concerning this: LRU cache design
Multithreaded/concurrent access may also be a concern, though as suggested in the link above, a possibility would be to lock the cache on access rather than worry about granular locking.
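As a rough, language-agnostic sketch of the map-plus-eviction idea (shown here in Python for brevity; in C++ the same shape maps onto std::unordered_map plus a list that tracks recency), with the single coarse lock suggested above:

    from collections import OrderedDict
    from threading import Lock

    class LRUCache:
        """Bounded cache that evicts the least-recently-used entry; one coarse lock."""

        def __init__(self, max_entries=1024):
            self._entries = OrderedDict()
            self._max = max_entries
            self._lock = Lock()

        def get(self, key):
            with self._lock:
                if key not in self._entries:
                    return None
                self._entries.move_to_end(key)      # mark as most recently used
                return self._entries[key]

        def put(self, key, value):
            with self._lock:
                self._entries[key] = value
                self._entries.move_to_end(key)
                if len(self._entries) > self._max:   # over capacity: drop the oldest entry
                    self._entries.popitem(last=False)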
Now if you're talking about scaling, moving up to multiple processes, and distributing server processes across multiple physical machines, simple in-process caching might not be the way to go anymore (every process could have a different copy of the data at any given time, and performance becomes inconsistent when some servers have the data cached but others don't).
That's where Redis/Memcached/Membase/etc. shine - they are built for scaling and for offloading work from a database. They can be beaten on raw performance by a database plus an in-memory cache (there is network latency, after all, and a host of other factors), but when it comes to scaling they are very useful: they save load on the database and can serve requests quickly. They also come with features like cache expiration (implementations differ between them).
Best of all? They're easy to use and drop in. You don't have to choose Redis/Memcached from the outset, since caching itself is just an optimization; you can quickly switch the caching code from using, say, an in-memory cache of your own to using Redis or something else.
There are still some differences between the caching servers though - membase and memcache distribute their data, while redis has master-slave replication.
For the record: I work in a company where we use memcached servers - we have several of them in the data center with the rest of our servers each having something like 16 GB of RAM allocated completely to cache.
edit:
And for speed comparisons, I'll adapt something from a Herb Sutter presentation I watched long ago:
in-process memory -> really fast
getting data from a local process (in-memory data) -> still really fast
data from local disk -> depends on your I/O device, SSD can be fast, but mechanical drives are glacial
getting data from remote process (in-memory data) -> fast-ish, and your cache servers better be close
getting data from remote process (disk) -> iceberg
