Cache eviction in Mondrian

It is not clear from the docs how Mondrian behaves regarding cache eviction.
The "Out of memory" section of the configuration docs is very vague. Is it correct to say that Mondrian never evicts anything from its cache? And that if the user runs sufficiently diverse queries, the cache eventually grows without bound?

Related

Redis CRDB Eviction Policy

I have read in the Redis documentation that the cache eviction policy for a CRDB should be set to "No Eviction":
"Note: Geo-Distributed CRDBs always operate in noeviction mode."
https://docs.redislabs.com/latest/rs/administering/database-operations/eviction-policy/
The reasoning given is that eviction might cause inconsistencies, because the two data centers sync bidirectionally.
I don't quite get this point. Can someone explain, with a real-world problem that might occur, what would go wrong if we used an LRU eviction policy?
After doing some research, I learned that eviction is often troublesome to handle with active-active replication. For example, if one of the masters runs out of memory and the cache evicts keys to make room for the latest data, it will also delete those keys from the other master, even if there are no memory issues there. So unless there is a really good way to handle this, eviction is not supported.
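To make that failure mode concrete, here is a toy simulation of two masters with bidirectional sync. This is not Redis/CRDB code; the classes and capacities are invented purely to illustrate why an LRU eviction on the memory-pressured node also removes the key from the healthy one:

```python
# Hypothetical toy model of two active-active masters with bidirectional sync.
# Illustration only; not real Redis code.
from collections import OrderedDict

class Master:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity      # max number of keys this node can hold
        self.data = OrderedDict()     # recency order for LRU
        self.peer = None              # the other master we replicate to

    def put(self, key, value, replicated=False):
        self.data[key] = value
        self.data.move_to_end(key)
        # Evict locally if over capacity...
        while len(self.data) > self.capacity:
            victim, _ = self.data.popitem(last=False)   # LRU victim
            print(f"{self.name}: evicted {victim}")
            # ...and the deletion is synced to the peer, which may not be full.
            self.peer.delete(victim)
        if not replicated:
            self.peer.put(key, value, replicated=True)

    def delete(self, key):
        if self.data.pop(key, None) is not None:
            print(f"{self.name}: lost {key} because of the peer's eviction")

# Master A is small (memory pressure), master B is large (no pressure).
a, b = Master("A", capacity=2), Master("B", capacity=100)
a.peer, b.peer = b, a

a.put("k1", 1)
a.put("k2", 2)
a.put("k3", 3)   # A evicts k1; the sync also removes k1 from B
```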

Can cache admission strategy be useful to prune distributed cache writes

Assume some distributed CRUD service that uses a distributed cache which is not read-through (just some key-value store, agnostic of the DB). So there are n server nodes connected to m cache nodes (with round-robin routing). The cache is supposed to cache data stored in a DB layer.
So the default retrieval sequence seems to be:
check if data is in cache, if so return data
else fetch from DB
send data to cache (cache does eviction)
return data
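As a minimal sketch of that cache-aside read path (the cache and db clients, key scheme, and TTL are placeholders, not part of the question):

```python
# Minimal cache-aside read path (hypothetical cache/db clients).
def get_entity(entity_id, cache, db, ttl_seconds=300):
    key = f"entity:{entity_id}"             # illustrative key scheme
    value = cache.get(key)                  # 1. check if data is in cache
    if value is not None:
        return value                        #    if so, return it
    value = db.load(entity_id)              # 2. else fetch from DB
    if value is not None:
        cache.set(key, value, ttl_seconds)  # 3. send data to cache (cache evicts)
    return value                            # 4. return data
```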
The question is whether the individual service nodes can be smarter about what data to send to the cache, to reduce cache capacity costs (achieve similar hit ratio with less required cache storage space).
Given recent benchmarks on eviction/admission strategies (in particular LFU), some newer caches might not even store data if it is deemed too infrequently used; maybe application nodes could make a similar best-effort guess.
So my idea is that the individual service nodes could evaluate whether data fetched from the DB should be sent to the distributed cache or not, based on an algorithm like LFU, thus reducing the network traffic between service and cache. I am thinking of local checks (which would be less effective on cold starts), but checks against a shared list of cached keys could also be considered.
So the sequence would be
check if data is in cache, if so return data
else fetch from DB
check if data key is frequently used
if yes, send data to cache (cache does eviction); else skip
return data
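To make the proposal concrete, a rough sketch of what such a node-local, frequency-based admission check might look like (the class, threshold, and key scheme are invented for illustration; a real implementation would age counts, e.g. TinyLFU-style):

```python
from collections import Counter

class LocalAdmissionFilter:
    """Very rough, node-local frequency filter (illustrative only)."""
    def __init__(self, threshold=2, max_tracked=100_000):
        self.counts = Counter()
        self.threshold = threshold
        self.max_tracked = max_tracked

    def record_and_check(self, key):
        # Periodic reset keeps the counter bounded; real admission filters
        # decay counts instead of clearing them outright.
        if len(self.counts) > self.max_tracked:
            self.counts.clear()
        self.counts[key] += 1
        return self.counts[key] >= self.threshold

def get_entity(entity_id, cache, db, admission, ttl_seconds=300):
    key = f"entity:{entity_id}"
    value = cache.get(key)                   # 1. check cache
    if value is not None:
        return value
    value = db.load(entity_id)               # 2. else fetch from DB
    if value is not None and admission.record_and_check(key):
        cache.set(key, value, ttl_seconds)   # 3. only admit "popular" keys
    return value                             # 4. return data
```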
Is this possible, reasonable, has it already been done?
It is common in databases, search, and analytical products to guard their LRU caches with filters to avoid pollution caused by scans. For example see Postgres' Buffer Ring Replacement Strategy and ElasticSearch's filter cache. These are admission policies detached from the cache itself, which could be replaced if their caching algorithm was more intelligent. It sounds like your idea is similar, except a distributed version.
Most remote / distributed caches use classic eviction policies (LRU, LFU). That is okay because they are often excessively large, e.g. Twitter requires a 99.9% hit rate for their SLA targets. This means the cache is oversized so that eviction victims are ancient and recent items are unlikely to be dropped, because the penalty of a miss is too high.
However, that breaks down when batch jobs run and pollute the remote caching tier. In those cases, it's not uncommon to see cache population disabled to avoid impacting user requests. This is then a distributed variant of Postgres' problem described above.
The largest drawback of your idea is checking the item's popularity. This check might be local only, which has a frequent cold-start problem, or a remote call, which adds a network hop. That remote call would be cheaper than the traffic of shipping the item, but you are unlikely to be bandwidth limited. Likely your goal would be to reduce capacity costs through a higher hit rate, but if your SLA requires a nearly perfect hit rate then you'll over-provision anyway. It all depends on whether the gains from reducing cache-aside population operations are worth the implementation effort. I suspect that for most it hasn't been.
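If the popularity check were made against a shared store rather than per-node counters, it could be one small remote call instead of shipping the whole item. A hypothetical sketch using a shared Redis counter (the key names, threshold, and window are invented; this only illustrates the extra hop mentioned above):

```python
import redis  # assumes the redis-py client is available

r = redis.Redis(host="cache-host", port=6379)

def is_popular(key, threshold=2, window_seconds=600):
    # Increment a shared counter for the key and let it expire,
    # so the popularity signal decays over time.
    counter_key = f"freq:{key}"
    count = r.incr(counter_key)
    if count == 1:
        r.expire(counter_key, window_seconds)
    return count >= threshold
```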

Data consistency for NoSQL + Distributed Cache in very concurrent environment

On the slide you can see a very rough architecture of a booking system. It's a highly concurrent environment, where many users may try to book the same hotel/room at once.
At the bottom we have a NoSQL database; on top of it sits a distributed cache for quick responses, and an application which requests the data.
The point of the slide is that when you use NoSQL + a distributed cache you get synchronization problems, meaning data consistency problems. You need to keep the distributed cache in sync with the NoSQL db.
Question: What solutions/techniques already exist for such a case besides an IMDG? They could be frameworks and/or best practices. Are there any specific distributed caches that solve this problem?
Question 2 [updated]: What are the reasons we write to the NoSQL db instead of the cache? Is it transactions, the possibility of node failure, or something else?
P.S. That's not my slide, and the author claimed that this is a great use case for an IMDG.
Do you really need the distributed cache? NoSQL solutions are by nature very performant, approaching the performance of stand-alone caches (like memcached).
I can get ~10ms access times out of Cassandra, which is not much slower than most caches.
I'll bet that by the time you add cache validation overhead and the network overhead of cache misses, you are going to be better off going straight to your database.
You can still use caches for things that are less transient, like room types, prices, etc.
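If a cache is kept in front of the NoSQL store, one common technique (besides an IMDG) is to treat the database as the source of truth and invalidate or overwrite the cached entry after each write, so the next read repopulates it. A minimal, hypothetical sketch of the "write to DB, then invalidate" variant (the try_reserve method and key scheme are invented for illustration):

```python
# Hypothetical write path: DB is the source of truth, cache is invalidated.
def book_room(booking, db, cache):
    # 1. Apply the write to the NoSQL store first, ideally with a conditional
    #    update / lightweight transaction so two users can't book the same room.
    ok = db.try_reserve(booking.room_id, booking.dates, booking.user_id)
    if not ok:
        return False                      # someone else booked it first
    # 2. Drop the cached availability for that room so the next read
    #    repopulates it from the database.
    cache.delete(f"availability:{booking.room_id}")
    return True
```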

Distributed caching algorithms/tutorials

What is the best way to understand how caching frameworks and caching algorithms work? Is there any book which covers the following topics in detail?
cache hits
cache miss
LFU
LRU
LRU2
Two Queues
ARC
MRU
FIFO
Second Chance
Distributed caching
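As a starting point for several of the topics above, here is a minimal LRU cache sketch in Python built on an ordered dictionary (illustrative only, not production code):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None                       # cache miss
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]              # cache hit

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```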

MySQL query caching: limited to a maximum cache size of 128 MB?

My application is very database intensive so I've tried really hard to make sure the application and the MySQL database are working as efficiently as possible together.
Currently I'm tuning the MySQL query cache to get it in line with the characteristics of queries being run on the server.
query_cache_size is the maximum amount of data that may be stored in the cache and query_cache_limit is the maximum size of a single resultset in the cache.
My current MySQL query cache is configured as follows:
query_cache_size=128M
query_cache_limit=1M
tuning-primer.sh gives me the following tuning hints about the running system:
QUERY CACHE
Query cache is enabled
Current query_cache_size = 128 M
Current query_cache_used = 127 M
Current query_cache_limit = 1 M
Current Query cache Memory fill ratio = 99.95 %
Current query_cache_min_res_unit = 4 K
However, 21278 queries have been removed from the query cache due to lack of memory
Perhaps you should raise query_cache_size
MySQL won't cache query results that are larger than query_cache_limit in size
And mysqltuner.pl gives the following tuning hints:
[OK] Query cache efficiency: 31.3% (39K cached / 125K selects)
[!!] Query cache prunes per day: 2300654
Variables to adjust:
query_cache_size (> 128M)
Both tuning scripts suggest that I should raise the query_cache_size. However, increasing query_cache_size over 128M may reduce performance, according to mysqltuner.pl (see http://mysqltuner.pl/).
How would you tackle this problem? Would you increase the query_cache_size despite mysqltuner.pl's warning or try to adjust the querying logic in some way? Most of the data access is handled by Hibernate, but quite a lot of hand-coded SQL is used in the application as well.
The warning issued by mysqltuner.pl is actually relevant even if your cache is at no risk of being swapped.
It is well-explained in the following:
http://blogs.oracle.com/dlutz/entry/mysql_query_cache_sizing
Basically, the bigger the cache, the more time MySQL spends grooming it, and since the cache is very volatile under even moderate write loads (entries get cleared often), making it too large will have an adverse effect on your application's performance. Tweak query_cache_size and query_cache_limit for your application: try to find the breaking point where you get the most hits per insert and a low number of lowmem_prunes, and keep a close eye on your database server's load while doing so.
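To watch those numbers while experimenting, you can read the Qcache_* status counters. A small sketch, assuming the query cache is enabled and the mysql-connector-python package is available (host and credentials are placeholders):

```python
import mysql.connector  # assumes mysql-connector-python is installed

# Placeholder connection details for illustration only.
conn = mysql.connector.connect(host="localhost", user="monitor", password="secret")
cur = conn.cursor()
cur.execute("SHOW GLOBAL STATUS LIKE 'Qcache%'")
status = {name: int(value) for name, value in cur.fetchall()}

hits = status["Qcache_hits"]
inserts = status["Qcache_inserts"]
prunes = status["Qcache_lowmem_prunes"]

print(f"hits per insert:      {hits / max(inserts, 1):.2f}")
print(f"lowmem prunes so far: {prunes}")
```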
Usually "too big cache size" warnings are issued under assumption that you have few physical memory and the cache itself well need to be swapped or will take resources that are required by the OS (like file cache).
If you have enough memory, it's safe to increase query_cache size (I've seen installations with 1GB query cache).
But are you sure you are using the query cache right? Do have lots of verbatim repeating queries? Could you please post the example of a typical query?
You should go easy on increasing your cache; it is not only a question of how much memory is available!
Reading the manual, for instance, you get this quote:
Be cautious about sizing the query cache excessively large, which increases the overhead required to maintain the cache, possibly beyond the benefit of enabling it. Sizes in tens of megabytes are usually beneficial. Sizes in the hundreds of megabytes might not be.
There are various other sources you can check out!
A non-zero prune rate may be an indication that you should increase the size of your query cache. However, keep in mind that the overhead of maintaining the cache is likely to increase with its size, so do this in small increments and monitor the result. If you need to dramatically increase the size of the cache to eliminate prunes, there is a good chance that your workload is not a good match for the query cache.
So don't just put as much as you can in that query cache!
The best thing would be to gradually increase the query cache and measure performance on your site. It's something of a default answer for performance questions, but in cases like this, testing is one of the best things you can do.
Be careful with setting query_cache_size and query_cache_limit too high. MySQL only uses a single thread to read from the query cache.
With query_cache_size set to 4G and query_cache_limit at 12M, we had a query cache hit rate of 85% but noticed recurring spikes in connections.
After changing query_cache_size to 256M with a 64K query_cache_limit, the query cache hit ratio dropped to 50% but overall performance increased.
The overhead of the query cache is around 10%, so I would disable query caching. Usually, if you can't get your hit rate over 40 or 50%, the query cache probably isn't right for your database.
I've blogged about this topic... MySQL query_cache_size performance here.
The query cache gets invalidated/flushed every time there is an insert. Use InnoDB and its buffer cache, and either avoid the query cache or set it to a very small value.

Resources