I want to remove a few cache entries if the key in the cache matches some pattern.
For example, I have the following key-value pairs in the cache,
("key-1", "value-1"), ("key-2", "value-2"), ("key-3", "value-3"), ("key-4", "value-4")
Since the cache implements a map interface, I can do something like this:
cache.entrySet().removeIf(entry -> entry.getKey().startsWith("key-"));
Is there a better way to do this in Infinispan (maybe using the functional or cache stream API)?
The removeIf method on entrySet should work just fine. It will be pretty slow for a distributed cache, though, as it will pull down every entry in the cache, evaluate the predicate locally, and then perform a remove for each entry that matches. Even in a replicated cache it still has to do all of the remove calls (at least the iterator will be local, though). This method may be changed in the future, as we are already updating the Map methods [a].
Another option is to use the functional API instead, as you said [1]. Unfortunately, the way this is currently implemented, it will still pull all the entries locally first. This may change at a later point if the Functional Map APIs become more popular.
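For reference, a rough sketch of what that could look like. This assumes the 9.x impl classes FunctionalMapImpl and ReadWriteMapImpl are still the way to bootstrap a ReadWriteMap from an embedded cache, and that cache is the same cache as above; check the javadoc in [1] for the exact signatures before relying on this.

import org.infinispan.AdvancedCache;
import org.infinispan.commons.api.functional.FunctionalMap.ReadWriteMap;
import org.infinispan.functional.impl.FunctionalMapImpl;
import org.infinispan.functional.impl.ReadWriteMapImpl;

// Assumption: these impl classes are the usual way to obtain a ReadWriteMap
// over an existing embedded cache; package names may differ between versions.
AdvancedCache<String, String> advancedCache = cache.getAdvancedCache();
ReadWriteMap<String, String> rwMap =
    ReadWriteMapImpl.create(FunctionalMapImpl.create(advancedCache));

// Evaluate a function against every entry and remove the ones whose key matches.
// The returned Traversable is consumed so that the removals are actually applied.
rwMap.evalAll(view -> {
    if (view.key().startsWith("key-")) {
        view.remove();
    }
    return null;
}).forEach(ignored -> { });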
Yet another choice is the cache stream API, which may be a little more cumbersome to use, but will give you the best performance of all the options. Glad you mentioned it :) What I would recommend is to apply any intermediate operations first (luckily, in your case you can use filter, since your keys won't change concurrently). Then use the forEach terminal operation, which passes the Cache on that node [2] (note this is an override). Inside the forEach consumer you can call remove just as you wanted.
cache.entrySet().parallelStream() // stream() if you want a single thread per node
    .filter(e -> e.getKey().startsWith("key-"))
    .forEach((c, e) -> c.remove(e.getKey())); // c is the Cache on the local node
You could also use indexing to avoid iterating the container, but I won't go into that here. Indexing is a whole different beast.
[a] https://issues.jboss.org/browse/ISPN-5728
[1] https://docs.jboss.org/infinispan/9.0/apidocs/org/infinispan/commons/api/functional/FunctionalMap.ReadWriteMap.html#evalAll-java.util.function.Function-
[2] https://docs.jboss.org/infinispan/9.0/apidocs/org/infinispan/CacheStream.html#forEach-org.infinispan.util.function.SerializableBiConsumer-
I am trying to implement a gem5 version of HybCache as described in HYBCACHE: Hybrid Side-Channel-Resilient Caches for Trusted Execution Environments (which can be found at https://www.usenix.org/system/files/sec20spring_dessouky_prepub.pdf).
A brief summary of HybCache is that a subset of the cache is reserved for use by secure processes and is isolated. This is achieved by using a limited subset of cache ways when the process is in 'isolated' mode. Non-isolated processes use the cache normally, having access to the entire cache and using the replacement policy and associativity given in the configuration. The isolated subset of cache ways uses a random replacement policy and is fully associative. Here is a picture demonstrating the idea.
Ways 6 and 7 are grey and represent the isolated cache ways.
So, I need to manipulate the placement of data into these ways. My question is: since I have found no mention of cache ways in the gem5 code, does that mean that cache ways only exist logically? That is, do I have to manually calculate the location of each cache way? If cache ways are used in gem5, where are they used, and in which file?
Any help would be greatly appreciated.
This answer is only valid for the Classic cache model (src/mem/cache/).
In gem5 the number of cache ways is determined automatically from the cache size and the associativity. Check the files in src/mem/cache/tags/indexing_policies/ for the relevant code (specifically, the constructor of base.cc).
There are two ways you could tackle this implementation:
1 - Create a new class that inherits from BaseTags (e.g., HybCacheTags). This class will contain the decision of whether it should work in secure mode or not, and how to do so (i.e., when to call which indexing and replacement policy). Depending on whatever else is proposed in the paper, you may also need to derive from Cache to create a HybCache.
The new tags need one indexing policy per operation mode. One is the conventional one (SetAssociative), and the other is derived from SetAssociative, with the assoc parameter set so that numSets becomes 1 (making it fully associative). The derived one will also have to override at least one function, getPossibleEntries(), to only allow selecting the ways that you want. You can check skewed_assoc.cc for an example of a more complex location selection.
The new tags need one replacement policy per operation mode. You will likely just use the ones in the replacement_policies folder.
2 - You could create a HybCache based on the Cache class that has two tags, one conventional (i.e., BaseSetAssoc), and the other based on the FALRU class (rewritten to work as a, e.g., FARandom).
I believe the first option is easier and less hardcoded. FALRU has not been split into an indexing policy and a replacement policy, so if you need to change one of them, you will have to reimplement it.
While implementing this you may encounter coherence faults. If that happens, it is most likely a problem in the indexing logic, and I wouldn't look for issues in the coherence model.
I have some Lua code embedded in nginx. In this code I get some small piece of data from a Redis cache. Now I wonder: is it good practice to cache this data (already cached in some sense) in nginx, using the ngx.shared construct? Are there any pros and cons of doing it this way? In pseudo-code I expect to have something like:
local cache = ngx.shared.cache
local value = cache:get("cached_key")
if value == nil then
    value = ... -- get the data from Redis
    cache:set("cached_key", value)
end
As stated in the documentation, ngx.shared is a space shared among all the workers of the nginx server.
All the listed operations are atomic, so you only have to worry about race conditions if you use two operations on ngx.shared one after the other. In that case, they should be protected using ngx.semaphore.
The pros:
Using ngx.shared provides faster access to the data, because you avoid a request/response round trip to the Redis server.
Even if you need an ngx.semaphore, you can expect faster access to the data (but I have no benchmark to back this up).
The cons:
The ngx.shared cache can serve stale data, as your local cache does not necessarily reflect the current Redis value. This is not always a crucial point, as there can always be a delta between the value used in the worker and the value stored in Redis.
Data stored in ngx.shared can be inconsistent, which is more important. For instance, it can store x=true and y=false whereas in Redis x and y always have the same value. It depends on how you update your local cache.
You have to manage the cache yourself, updating the values in your cache whenever they are sent to Redis. This can easily be done by wrapping the Redis functions. Expect bugs if you handle updates by putting them after each call to redis.set, because you (or someone) will forget one.
You also have to handle reads: whenever a value is not found in your ngx.shared cache, you have to automatically read it from Redis. Expect bugs if you handle reads by scattering them after each call to cache:get, because you (or someone) will forget one.
For the last two points, you can easily write a small wrapper module.
As a conclusion:
If your server runs only one instance, with one or several workers, using ngx.shared is interesting, as you can keep a cache of your Redis data that is always up to date.
If your server runs several instances and an always up-to-date cache is mandatory, or if you could have consistency problems, then you should avoid caching with ngx.shared.
In all cases, if your data can grow large, make sure to provide a way to clean it up before memory consumption gets too high. If you cannot provide cleaning, then you should not use ngx.shared.
Also, do not forget to store the cached value in a local variable, to avoid getting it again and again, and thus improve efficiency.
I'm trying to find a good way to handle memcache keys for storing, retrieving, and updating data to/from the cache layer in a more civilized way.
Found this pattern, which looks great, but how do I turn it into a functional part of a PHP application?
The Identity Map pattern: http://martinfowler.com/eaaCatalog/identityMap.html
Thanks!
Update: I have been told about the modified memcache (memcache-tag) that apparently does a lot of this, but I can't install Linux software on my Windows development box...
Well, memcache use IS an identity map pattern: you check your cache, then you hit your database (or whatever else you're using). You can track information about the source by storing objects instead of just values, but you'll take a performance hit for that.
You effectively cannot ask the cache for a list of what it contains. To mass-invalidate, you'll have to keep a list of what you put in and iterate it, or you'll have to iterate every possible key that could fit the pattern of concern. The resource you point out, memcache-tag, can simplify this, but it doesn't appear to be maintained in line with the memcache project.
So your options for now are iterative deletes, or flushing everything that is cached. The question you should really be asking, then, is a design one; to give you a useful answer, I ask: why do you want to do this?
As part of a system I am working on, we have put a layer of caching in a proxy which calls another system. The key for this cache is built up from the key-value pairs used in the proxy call, so if the proxy is called with the same values, the item is retrieved from the cache rather than from the other service. This works and is fairly simple.
It gets more complicated when it comes to clearing the cache, as it is not obvious which items to clear when an item is changed. If object A is contained in nodeset B and object A is changed, how do we know that nodeset B is stale?
We have gotten around the problem by having the service that we call return the nodesets to clear when objects are changed. However, this breaks encapsulation and adds a layer of complexity, in that we have to look in the responses to see what needs clearing.
Is there a better/standard way to deal with such situations?
Isn't this the sort of thing that could (and should) be handled with the Observer pattern? Namely, B should listen for events that affect its liveness, in this case changes to the state of A.
A Map is a pretty natural abstraction for a cache, and this is how Oracle Coherence and Terracotta do it. Coherence, with which I'm far more familiar, has mechanisms to listen to cache events, either in general or for specific nodes. That's probably what you should emulate.
You might also want to look at the documentation for either of those, even if it's just as a guide or source of ideas.
You don't say what platform you're running on, but perhaps we can suggest some alternatives to rolling your own, which is always going to be fraught with problems, particularly with something as complicated as a cache (make no mistake: caches are complicated).
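That said, if you do end up rolling your own, a minimal, library-free Java sketch of the Observer idea above might look like this. All the names are illustrative; a real implementation would subscribe onObjectChanged to whatever change notifications the downstream service can emit.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Observer-style invalidation: each cached nodeset registers the objects it
// depends on, and is evicted when any of those objects changes.
class InvalidatingCache {
    private final Map<String, Object> entries = new ConcurrentHashMap<>();
    // objectId -> cache keys that become stale when that object changes
    private final Map<String, List<String>> dependents = new ConcurrentHashMap<>();

    void put(String key, Object value, Set<String> dependsOn) {
        entries.put(key, value);
        for (String objectId : dependsOn) {
            dependents.computeIfAbsent(objectId, id -> new CopyOnWriteArrayList<>()).add(key);
        }
    }

    Object get(String key) {
        return entries.get(key);
    }

    // Subscribed to (or called by) the downstream service's change events.
    void onObjectChanged(String objectId) {
        List<String> stale = dependents.remove(objectId);
        if (stale != null) {
            stale.forEach(entries::remove);
        }
    }
}

The point is that the cache, not the callers, owns the mapping from "object changed" to "which nodesets are stale", so the downstream service no longer has to return the nodesets to clear.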
I need some guidance on the following scenario in Infinispan:
1) I created two nodes and started them successfully in Infinispan using client-server mode.
2) In the Hot Rod client I created a RemoteCacheManager and then obtained a RemoteCache.
3) In the remote cache I put an entry like this: cache.put(key, new HashMap()); it is successfully added.
4) Now when I try to clear this value using cache.remove(key), I see that it is not getting removed, and the HashMap is still there every time I try to remove it.
How can I clear the value so that it is cleared from all nodes of the cluster?
How can I also propagate changes such as adding to or removing from the HashMap value above?
Does it have anything to do with implementing the DeltaAware and Delta interfaces?
Please suggest how to approach this, or point me to where I can learn more.
Thank you
Removal of the HashMap should work as long as you use the same key and have equals() and hashCode() correctly implemented on the key. I assume you're using distributed or replicated mode.
EDIT: I've realized that equals() and hashCode() are not that important for RemoteCache, since the key is serialized anyway and all comparisons are executed on the underlying byte[].
Remote cache does not directly support DeltaAware. Generally, using these is quite tricky even in library mode.
If you want to use the cache with maps, I suggest using a composite key like cache-key#map-key rather than storing a complex HashMap.
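For example, something along these lines (a sketch only; the key names and values are made up, and the RemoteCacheManager setup is whatever you already have):

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;

// Assumes the default Hot Rod configuration; reuse your existing manager/cache.
RemoteCacheManager manager = new RemoteCacheManager();
RemoteCache<String, String> cache = manager.getCache();

String cacheKey = "user-42"; // the key you previously mapped to a HashMap

// Each former map entry becomes its own cache entry under a composite key.
cache.put(cacheKey + "#email", "someone@example.com");
cache.put(cacheKey + "#name", "Someone");

// Adding, updating or removing a single "map entry" is now a plain cache
// operation that replicates across the cluster, with no need for DeltaAware.
cache.remove(cacheKey + "#email");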