Ehcache data persistence based on timeToIdleSeconds and timeToLiveSeconds?

I am aware of the difference between timeToIdleSeconds and timeToLiveSeconds. But if my cache also persists data on disk, will these parameters remove data from disk after expiration as well, or do they remove data from memory only?
If I want these parameters to remove data from disk after expiration, how can I configure that?
Please suggest.

TTI and TTL apply to all storage tiers of Ehcache, which includes the disk tier.
Note that Ehcache does not use a background process to remove expired entries, so an expired entry is removed:
when it is accessed after its expiration time, in which case get removes the mapping from the cache and does not return it,
or through eviction.
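For instance, with the Ehcache 3 builder API, a single expiry policy covers heap and disk alike (a minimal sketch; the cache name, sizes, and the 10-minute TTL are placeholder choices, and in Ehcache 2 the equivalent would be the timeToLiveSeconds attribute on the cache element):

import java.io.File;
import java.time.Duration;
import org.ehcache.PersistentCacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ExpiryPolicyBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.EntryUnit;
import org.ehcache.config.units.MemoryUnit;

public class TtlOnDiskTier {
    public static void main(String[] args) {
        PersistentCacheManager manager = CacheManagerBuilder.newCacheManagerBuilder()
                .with(CacheManagerBuilder.persistence(new File("cache-data")))
                .withCache("quotes", CacheConfigurationBuilder
                        .newCacheConfigurationBuilder(Long.class, String.class,
                                ResourcePoolsBuilder.newResourcePoolsBuilder()
                                        .heap(100, EntryUnit.ENTRIES)
                                        .disk(1, MemoryUnit.GB, true))  // persistent disk tier
                        .withExpiry(ExpiryPolicyBuilder.timeToLiveExpiration(Duration.ofMinutes(10))))
                .build(true);

        // A mapping older than 10 minutes is expired on every tier, disk included;
        // it is dropped lazily on the next access, or through eviction.
        manager.close();
    }
}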

Related

Redis cache tags flush memory leak

We are using Laravel's Redis tagged cache to cache query results for models in this way:
cache()->tags([model1, model2...])->remember(sql_query_hash, result_callback)
If, for example, after some time and a peak in users one of the tags (say model1) has 500k unique cached queries, and an update comes in and we need to do:
cache()->tags([model1])->flush();
my job hits the allowed-memory-exhausted error; our workers have 128 MB of memory. Yes, I know that if I increased the workers' memory I could flush 1M keys and so on, but that is not the right way, because our user base is growing exponentially and the project will grow with it, so some tags may hold 10M keys at a user peak. How am I supposed to flush the cache for such a tag?
https://github.com/laravel/framework/blob/5.7/src/Illuminate/Cache/RedisTaggedCache.php#L174
This is how Laravel flushes a tag's keys: it retrieves them all into memory, then chunks them in memory again (the array_chunk doubles the memory usage after fetching all the keys), and then runs Redis::del operations to remove the cached keys for the tag.
I don't know whether to call this a bug or not, but I need some options. Is anyone else dealing with this problem, and does anyone have solutions?
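One way around the bulk retrieval is to walk the tag's member set incrementally and delete in fixed-size batches, so memory stays flat no matter how many keys the tag holds. A sketch in Java with Jedis rather than PHP (flushTag and the tag-set key are assumptions modeled on how RedisTaggedCache keeps a tag's member keys in a Redis set):

import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;

public class TagFlush {
    // Incrementally flush every cache key referenced by a tag set,
    // without ever holding the full member list in memory.
    static void flushTag(Jedis jedis, String tagSetKey) {
        String cursor = ScanParams.SCAN_POINTER_START;
        ScanParams params = new ScanParams().count(1000);   // batch size
        do {
            ScanResult<String> page = jedis.sscan(tagSetKey, cursor, params);
            List<String> members = page.getResult();
            if (!members.isEmpty()) {
                String[] batch = members.toArray(new String[0]);
                jedis.del(batch);              // drop the cached values
                jedis.srem(tagSetKey, batch);  // shrink the tag set as we go
            }
            cursor = page.getCursor();
        } while (!cursor.equals(ScanParams.SCAN_POINTER_START));
    }
}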

how to keep caching up to date

When memcached or Redis is used for caching data from a data store, how is the cache updated when the value changes?
For example, say I read key1 from the cache the first time and it misses; I then pull value1 from the database and put key1=value1 into the cache.
After that, the value of key1 changes to value2.
How is the value in the cache updated or invalidated?
Does that mean that whenever key1's value changes, either the application or the database needs to check whether key1 is in the cache and update it?
Since you are using a cache, you have to tolerate the data inconsistency problem: at some point in time, the data in the cache differs from the data in the database.
You don't need to update the value in the cache whenever the value changes. Otherwise, the whole cache system becomes very complicated (e.g. you have to maintain the list of keys that have been cached), and it might be unnecessary anyway (e.g. the key-value pair might be used only once and never need updating again).
How can we update the data in the cache and keep the cache system simple?
Normally, besides setting or updating a key-value pair in the cache, we also set a TIMEOUT for each key. After that, clients can get the key-value pair from the cache. However, once a key reaches its timeout, the cache system removes the key-value pair from the cache. This is called THE KEY HAS EXPIRED. The next time a client tries to get that key from the cache, it will get nothing. This is called a CACHE MISS. In this case, the client has to get the key-value pair from the database and put it into the cache with a new timeout.
If the data is updated in the database while the key has NOT yet expired in the cache, clients will get inconsistent data. However, once the key has expired, its value will be retrieved from the database and inserted into the cache by some client. After that, other clients will get up-to-date data until the data changes again.
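As a concrete sketch of that read path, in Java with Jedis (loadFromDatabase and the 300-second timeout are illustrative assumptions):

import redis.clients.jedis.Jedis;

public class CacheAside {
    // On a miss (key absent or expired), load from the database and
    // re-populate the cache with a fresh timeout.
    static String get(Jedis jedis, String key) {
        String value = jedis.get(key);
        if (value == null) {                   // CACHE MISS
            value = loadFromDatabase(key);     // assumed DB lookup
            jedis.setex(key, 300, value);      // new timeout: 300 seconds
        }
        return value;
    }

    static String loadFromDatabase(String key) { /* ... */ return "value"; }
}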
How to set the timeout?
Normally, there are two kinds of expiration policy:
Expire in N seconds/minutes/hours...
Expire at some future point in time, e.g. expire at 2017/7/30 00:00:00
A large timeout can greatly reduce the load on the database, but the data might be out of date for a long time. A small timeout keeps the data as up to date as possible, but puts a heavy load on the database. You have to balance this trade-off when choosing the timeout.
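With Redis, for example, the two policies map onto EXPIRE and EXPIREAT (a minimal Jedis sketch; the key names are arbitrary):

import java.time.Instant;
import redis.clients.jedis.Jedis;

public class ExpiryStyles {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Expire in N seconds: relative to now.
            jedis.expire("session:42", 3600);

            // Expire at a future point in time, e.g. 2017/7/30 00:00:00 UTC.
            jedis.expireAt("report:july",
                    Instant.parse("2017-07-30T00:00:00Z").getEpochSecond());
        }
    }
}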
How does Redis expire keys?
Redis has two ways of expiring keys:
When a client tries to operate on a key, Redis checks whether the key has reached its timeout. If it has, Redis removes the key and acts as if the key doesn't exist. This way, Redis ensures that clients never get expired data.
Redis also has an expiration thread that samples keys at a configured frequency and removes those that have reached their timeout. This way, Redis speeds up the key expiration process.
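The lazy side of this is easy to observe from a client (a small Jedis sketch; the key name and 1-second timeout are arbitrary):

import redis.clients.jedis.Jedis;

public class LazyExpiry {
    public static void main(String[] args) throws InterruptedException {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.setex("temp", 1, "v");             // key lives 1 second
            System.out.println(jedis.ttl("temp"));   // remaining lifetime in seconds
            Thread.sleep(1500);
            System.out.println(jedis.get("temp"));   // null: the expired key is gone
        }
    }
}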
You can simply empty the particular cache value in the API function where the insertion or update of that particular value is performed. This way the server will fetch the updated value on the next request, because you already emptied the stale cache value.
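In sketch form, with Jedis (updateUser, saveToDatabase, and the key scheme are hypothetical):

import redis.clients.jedis.Jedis;

public class InvalidateOnWrite {
    // Write to the database first, then drop the cached copy;
    // the next read misses and repopulates with the fresh value.
    static void updateUser(Jedis jedis, long userId, String json) {
        saveToDatabase(userId, json);    // assumed persistence call
        jedis.del("user:" + userId);     // empty the cached value
    }

    static void saveToDatabase(long userId, String json) { /* ... */ }
}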
I had a similar issue related to stale data, especially in two cases:
When I get bulk messages/events
In this use case, I am writing a score to the Redis cache and reading it back in a subsequent call. With bulk messages, due to Redis's weak consistency, the data might not yet have been replicated to all replicas when I read the same key again (which generally happens within a few ms, 1-2 ms).
Remediation:
In this case, I was getting stale data. To address it, I used a cache on top of the cache, i.e. a loading TTL cache in front of the Redis cache: the data is checked in the loading cache first, and only on a miss is the Redis cache consulted. On a write, both caches are updated. (A sketch of this two-level arrangement follows at the end of this answer.)
In a distributed system (k8s) where I have multiple pods
(Kafka is being used as the messaging broker.)
With the above strategy we hit another problem: what if data for a key previously served by, say, pod1 reaches pod2? This has a bigger impact, as it leads to data inconsistencies.
Remediation:
Here the Kafka partition key was set to the same "key" that is used in Redis. This way, all subsequent messages for a key go to one particular pod. If pods restart, the cache is simply built up again.
This solved our problem.
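A sketch of the two-level arrangement from the first remediation, in Java with Caffeine as the loading TTL cache (the library choice and the 50 ms local TTL are assumptions; the answer above doesn't name either):

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.time.Duration;
import redis.clients.jedis.Jedis;

public class TwoLevelCache {
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Short-lived local tier in front of Redis: reads served locally
    // bridge the brief replication window during bulk writes.
    private final LoadingCache<String, String> local = Caffeine.newBuilder()
            .expireAfterWrite(Duration.ofMillis(50))   // assumed local TTL
            .build(this::loadFromRedis);

    private String loadFromRedis(String key) {
        return jedis.get(key);           // consulted only on a local miss
    }

    public void put(String key, String value) {
        jedis.set(key, value);           // write through to Redis
        local.put(key, value);           // keep both caches updated
    }

    public String get(String key) {
        return local.get(key);           // local first, Redis on miss
    }
}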

Can you evict Ignite cache backups to disk?

We would like to keep primary keys in memory and backup keys on disk. So on a re-shuffle, we will accept the performance cost of reading key/values from disk.
From my research in the Ignite documentation, I don't see that option out of the box. Is there any way to do this via configuration?
If this feature doesn't exist, I had the following idea as a workaround. If we know our cache takes 1 terabyte, we know that with backups it will be (approximately) 2 terabytes. If we allocate a little over 1 terabyte in memory and set the eviction policy to evict to disk, will this effectively get us the functionality we want? That is, will it evict backup copies to disk and leave primaries in memory?
This feature doesn't exist, and your workaround won't work because it will evict primary and backup copies indiscriminately. However, you could probably implement your own eviction policy that immediately evicts any created backup, and configure swap space to store these backups.
Note that this only makes sense if you're running SQL queries and/or don't have a persistence store. If you only use key-based access, any lost entry will be reloaded from the persistence store when needed.
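A rough sketch of such a policy, assuming an Ignite version that still supports on-heap eviction with swap space (the class and its wiring into the cache configuration are hypothetical and untested):

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.eviction.EvictableEntry;
import org.apache.ignite.cache.eviction.EvictionPolicy;

// Evicts every entry for which the local node is not the primary,
// so backup copies fall through to the configured swap space.
public class BackupEvictionPolicy implements EvictionPolicy<Object, Object> {
    private final Ignite ignite;
    private final String cacheName;

    public BackupEvictionPolicy(Ignite ignite, String cacheName) {
        this.ignite = ignite;
        this.cacheName = cacheName;
    }

    @Override
    public void onEntryAccessed(boolean removed, EvictableEntry<Object, Object> entry) {
        if (removed)
            return;
        boolean primary = ignite.affinity(cacheName)
                .isPrimary(ignite.cluster().localNode(), entry.getKey());
        if (!primary)
            entry.evict();   // backup copy: push it out of memory immediately
    }
}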

How is memcached updated?

I have never used memcached before and I am confused about the following basic question.
Memcached is a cache, right? And I assume we cache data from a DB for faster access. So when the DB is updated, who is responsible for updating the cache? Our code? Or does memcached "understand" when the DB has been updated?
Memcached is a cache, right? And I assume we cache data from a DB for faster access.
Yes, it is a cache, but you have to understand that a cache speeds up access when you often access the same data. If you access thousands of data items/objects that are all different from each other, a cache doesn't help.
To answer your question:
So when the DB is updated, who is responsible for updating the cache?
Always you, but you don't have to worry about whether you are doing the right thing.
Our code? Or does memcached "understand" when the DB has been updated?
memcached doesn't know about your database (actually, the client doesn't even know about the servers). So when you use an object from your database, you should check whether it is present in the cache; if not, you put it in the cache, and otherwise you are fine. That is all. When the moment comes, memcached will free the memory used by old data, or you can tell memcached to free data after a time you choose (read the API for details).
You are responsible for updating the cache (you, or some plugin).
What happens is that the query is reduced to some key features and these are hashed. The hash is looked up in the cache. If the value is in the cache, the data is returned directly from the cache. Otherwise, the query is executed, the results are stored in the cache, and then returned to the user.
In pseudo code:
def cached_query(your_sql_query):
    key = query_key(your_sql_query)        # hash of the query's key features
    if key in cache:
        return cache.get(key)
    results = execute(your_sql_query)
    cache.set(key, results, time_to_live)  # store with a time to live
    return results
The cache is cleared once in a while: you can give a key a time to live, after which your cached results are refreshed.
This is the simplest model, but it can cause some inconsistencies.
One strategy is that if your code is also the only app that updates data, then your code can also refresh memcached as a second step after it has updated the database. Or at least evict the stale data from memcached, so the next time an app wants to read it, it will be forced to re-query the current data from the database and restore that latest data to memcached.
Another strategy is to store data in memcached with an expiration time, so memcached automatically purges that data element after a certain time. You pick the expiration time, based on your knowledge of how frequently the data might be updated, and how tolerant your app is of reading stale data.
So ultimately, you are the one responsible for putting data into memcached. Only you know what data is worth storing in the cache, what format you want to store it in, how frequently you expect to query it, and when to refresh it. You make this judgment on a case-by-case basis, because you know better than any automatic system the likely behavior of your data and your app.
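Both strategies are one-liners with a typical client, e.g. spymemcached in Java (the key, value, and 600-second expiration are illustrative):

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class MemcachedUpdate {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));

        // Expiration strategy: store with a lifetime in seconds;
        // memcached purges the entry on its own after that.
        client.set("user:42", 600, "{\"name\":\"Ann\"}");

        // Refresh/evict strategy: after updating the database, delete the
        // stale entry so the next reader re-queries and re-caches it.
        client.delete("user:42");

        client.shutdown();
    }
}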

Terracotta L2 cache invalidation by messaging

I am trying to evaluate the Terracotta distributed cache with Ehcache. I have the following query. There are 20+ apps that will use a TAS distributed cache. As I understand it, there will be an L1 cache in each of these apps and an L2 in the cluster. The clustered cache data fronts a database that is updated by a different app to which we do not have access, so we only read from this DB. But the DB updates need to flow into the cache.
By way of DB triggers, the updated keys (keys alone) are stored in a temp table. At specific intervals, a job monitors this table and collects the keys in the cache that need to be expired. This is a separate batch job.
From here I need help. How do I tell the TAS L2 cache to expire/evict these keys? What options does Terracotta offer? Will this expiry event flow from the L2 to all the individual apps? What is the time lag? I do not want to send the expiry keys to each individual app myself. Can this be accomplished?
Thanks for the help!
Maybe I am missing something, but I am not sure why you would want to expire/evict those keys instead of simply calling cache.removeAll(keys). This removal will be automatically propagated to all L1 nodes that have those entries in their local cache.
The time lag depends on the consistency settings of the distributed cache.
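For the batch job's cache step, a minimal Ehcache 2 sketch (the cache name and how the keys are read from the temp table are assumptions):

import java.util.List;
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;

public class ExpireChangedKeys {
    // Remove the changed keys collected from the temp table; with a
    // Terracotta-clustered cache the removal propagates to every app's L1.
    static void removeChangedKeys(List<String> changedKeys) {
        Cache cache = CacheManager.getInstance().getCache("sharedCache");
        cache.removeAll(changedKeys);
    }
}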
