ehCache eviction on timeout and successful refresh - ehcache

Is it possible to configure or extend ehCache to satisfy the following requirements?
Cache an element with a time to live.
When an element is requested from the cache and the time to live has been exceeded, attempt to refresh that value, however if the look up fails use the previous value
The first of these is fairly obvious, but I don't see a way of satisfying the second condition.

Not without overriding Cache.searchInStoreWith/WithoutStats methods. Thing is that currently it's implemented the way that firstly element is evicted from underlying Store and only then, if it's configured, CacheListener is called and even then only the key is passed, not the value.
Of course to it would be possible to configure CacheWriter and not to evict expired elements(and even update them), but without overriding Chache.get your call would still return null. So it's kind of possible to hack around and get hold of expired element
Though to me it seems that it would be easy to change implementation so that expired element wouldn't be evicted, but instead CacheLoaded could be invoked. I'm planning on doing just that asynchronously since I'm better off with stale data than waiting too long for response from SOR or having no data at all if SOR wouldn't be reachable.
Seems like something similar is implemented to comply with JSR 107, but it does not distinguish between non existent and expired elements and if look up fails - expired element is gone.

Related

Is there a Caffeine feature that will purge a particular item from the cache after defined time and recreates it at the same time?

ExpireAfter will only purge the item but will not re-create the item. So what I need to do is, after a predefined interval, I need to purge a particular item from the cache and at the same time I need to recreate it. It might recreate with same data if there is no change in the data. Assuming the data was changed, the recreating will give the latest object.
My idea was to retrieve latest item form the cache all the time. In contrast, the Refresh feature (https://github.com/ben-manes/caffeine/wiki/Refresh) will provide the stale item for the first request and does an asynchronous loading. So for the second request the cache will provide the latest object.
Asynchronous removal listener that re-fetches the expired entry
should work in my case. Can you please provide me some information on
how to achieve this?
I'm also curious to know how the scheduled task can do it?
Assuming cache can address the following two cases:
Subsequent requests case:
I understand the refreshAfterWrite will provide the stale entry for
the first time but for the second request, what happens if the cache
hasn't yet completed loading the expired entry?
Does cache blocks the second request, completes the re-fetch, and
then provide the latest value to the second request?.
The idea is to make the cache provides the latest data after the
defined entry expiry time.
In the case where the cache has to load values equal to its capacity at one shot:
Let say the cache size is 100 and the time to load all the 100 items
is 2 minutes.
Assuming the first request would load 100 items into the cache at the
same time, after the defined expiry time, the cache should evict and
re-fetch all the 100 elements.
For the second request to access items from those 100 items, how can
I make the cache smart enough so that it returns the entries that
have been re-loaded and asynchronously re-loads the other entries?.
The idea is not to block any request for an existing entry. Serve the
request for an existing entry and do the re-load for the remaining
expired entries.
Asynchronous removal listener that re-fetches the expired entry should work in my case. Can you please provide me some information on how to achieve this?
The removal listener requires a reference to the cache, but that is not available during construction. If it calls a private method instead then the uninitialized field isn't captured and it can be resolved at runtime.
Cache<K, V> cache = Caffeine.newBuilder()
.expireAfterWrite(1, TimeUnit.HOURS)
.removalListener((K key, V value, RemovalCause cause) -> {
if (cause == RemovalCause.EXPIRED) {
reload(key);
}
}).build();
private void reload(K key) {
cache.get(key, k -> /* load */);
}
I'm also curious to know how the scheduled task can do it?
If you are reloading all entries then you might not even need a key-value cache. In that case the simplest approach would be to reload an immutable map.
volatile ImmutableMap<K, V> data = load();
scheduledExecutorService.scheduleAtFixedRate(() -> data = load(),
/* initial */ 1, /* period */ 1, TimeUnit.HOURS);
I understand the refreshAfterWrite will provide the stale entry for the first time but for the second request, what happens if the cache hasn't yet completed loading the expired entry?
The subsequent requests obtain the stale entry until either (a) the refresh completes and updates the mappings or (b) the entry was removed and the caller must reload. The case of (b) can occur if the entry expired while the refresh was in progress, where returning the stale value is no longer an option.
Does cache blocks the second request, completes the re-fetch, and then provide the latest value to the second request?.
No, the stale but valid value is returned. This is to let the refresh hide the latency of reloading a popular entry. For example an application configuration that is used by all requests would block when expired, causing periodic delays. The refresh would be triggered early, reload, and the callers would never observe it absent. This hides latencies, while also allowing idle entries to expire and fade away.
In the case where the cache has to load values equal to its capacity at one shot... after the defined expiry time, the cache should evict and re-fetch all the 100 elements.
The unclear part of your description is if the cache reloads only the entries being accessed within the refresh period or if it reloads the entire contents. The former is what Caffeine offers, while the latter is better served with an explicit scheduling thread.

Enabling "grace mode" for spring cache (returning old result while calculating new result asyncronously)

We would like to define our caches in our (spring boot) application similar to the concept varnish is doing with it's "grace mode". Right now we are using spring's #Cacheable annotation together with EhCache (but we are not bound to EhCache, if there's another cache backend able to to this would be fine, too).
Ideally I am searching for something like:
#Cacheable(cacheNames = "name", evictionAfter = "5 minutes", graceTimeAfterEviction = "1 minute")
public myCachedMethod() { ... }
What I mean is: If a value in the cache is already evicted (in this example 5+ minutes old) and a new request is asking for it, I want to return the already evicted value (to this request and all following until the new value is stored -- or the grace time passed, too), but still run the method asynchronously filling the cache with a new value.
Most of the approaches I have seen so far solve this by refreshing the cache programmatically with an asynchronous, scheduled task. But that's not what I want, as my cache entries depend on the arguments, and I don't want to keep old values if no one asks for them any more.
Of course, I know that this is not possible in this way, as spring is not aware of the cache management itself. And it can't be configured in the cache backend neither, because the cache backend has no idea of how to refresh the cache.
But maybe there is an approach out there, I did not find (I searched a lot for it already for a few years/months).
According to the documentation of caffeine cache this seems to be supported (https://github.com/ben-manes/caffeine, "https://github.com/ben-manes/caffeine"), but the documentation is, again, only about scheduled refreshes, no grace period. And it does not work with spring's annoations.
There is a similar question here: Spring cacheable asynchronous update while returns old cache, but more focussed on the sync=true feature.

What is it called when two requests are being served from the same cache?

I'm trying to find the technical term for the following (and potential solutions), in a distributed system with a shared cache:
request A comes in, cache miss, so we begin to generate the response
for A
request B comes in with the same cache key, since A is not
completed yet and hasn't written the result to cache, B is also a
cache miss and begins to generate a response as well
request A completes and stores value in cache
request B completes and stores value in cache (over-writing request A's cache value)
You can see how this can be a problem at scale, if instead of two requests, you have many that all get a cache miss and attempt to generate a cache value as soon as the cache entry expires. Ideally, there would be a way for request B to know that request A is generating a value for the cache, and wait until that is complete and use that value.
I'd like to know the technical term for this phenomenon, it's a cache race of sorts.
It's a kind of Thundering Herd
Solution: when first request A comes and fills a flag, if request B comes and finds the flag then wait... After A loaded the data into the cache, remove flag.
If all other request are waked up by the cache loaded event, would trigger all thread "Thundering Herd". So also need to care about the solution.
For example in Linux kernel, only one process would be waked up, even several process depends on the event.

Redis cache updating

EDIT2: Clarification: The code ALREADY has refresh cache on miss logic. What I'm trying to do is reducing the number of missed cache hits.
I'm using Redis as a cache for an API. The idea is that when the API receives a call it first checks the cache and if the data isn't in cache the API will fetch it and cache it afterwards for next time.
At the moment the configuration is the following:
maxmemory 50mb
maxmemory-policy allkeys-lru
That is, use at most 50mb memory, keep trying keys in there and when memory is full start by deleting the least recently used keys (lru).
Now I want to introduce a second category of keys. For this second category I'm going to set a certain expiry time. Now I would like to set up a mechanism such that when these keys expiry this mechanism kicks in and refreshes them (and sets new expiry).
How do I do this?
EDIT:
Some progress. It turns out that Redis has a pub/sub messaging system which in particular can dispatch messages on event. One of them is expiring keys, which can be enabled as such:
notify-keyspace-events Ex
I found this code can describes a blocking python process subscribing to Redis' messaging system. It can easily be changed to detect keys expiring and make a call to the API when a key expires, and the API will then refresh the key.
def work(self, item):
requests.get('http://apiurl/?q={param}'.format(param=item['data']))
So this does precisely what I was asking about.
Often, this feels way too dangerous and out of control. I can imagine a bunch of different situations under which this will very quickly fail.
So, what's a better solution?
http://redis.io/topics/notifications
Keyspace notifications allows clients to subscribe to Pub/Sub channels
in order to receive events affecting the Redis data set in some way.
Examples of the events that is possible to receive are the following:
All the keys expiring in the database 0. (e.g)
...
EXPIRE generates an expire event when an expire is set to the key, or
a expired event every time setting an expire results into the key
being deleted (see EXPIRE documentation for more info).
To expire keys, just use Redis' built-in expiry mechanism. You don't need to refresh the cache contents on expiry, the simplest is to do it when the code experiences a cache miss.

Guava Cache: How to access without it counting for the eviction policy?

I have a Guava cache which I would like to expire after X minutes have passed from the last access on a key. However, I also periodically do an action on all the current key-vals (much more frequently than the X minutes), and I wouldn't like this to count as an access to the key-value pair, because then the keys will never expire.
Is there some way to read the value of the keys without this influencing the internal state of the cache? ie cache._secretvalues.get(key) where I could conceivably subclass Cache to StealthCache and do getStealth(key)? I know relying on internal stuff is non-ideal, just wondering if it's possible at all. I think when I do cache.asMap.get() it still counts as an access internally.
From the official Guava tutorials:
Access time is reset by all cache read and write operations (including
Cache.asMap().get(Object) and Cache.asMap().put(K, V)), but not by
containsKey(Object), nor by operations on the collection-views of
Cache.asMap(). So, for example, iterating through cache.entrySet()
does not reset access time for the entries you retrieve.
So, what I would have to do is iterate through the entrySet instead to do my stealth operations.

Resources