Caching and clearing in a proxy with relations between proxy calls - caching

As part of a system I am working on we have put a layer of caching in a proxy which calls another system. The key to this cache is built up of the key value pairs which are used in the proxy call. So if the proxy is called with the same values the item will be retrieved from the cache rather than from the other service. This works and is fairly simple.
It gets more complicated when it comes to clearing the cache, as it is not obvious which items to clear when an item is changed. If object A is contained in nodeset B and object A is changed, how do we know that nodeset B is stale?
We have got round the problem by having the service that we call return the nodesets to clear when objects are changed. However, this breaks encapsulation and adds a layer of complexity, in that we have to inspect the responses to see what needs clearing.
Is there a better or standard way to deal with such situations?

Isn't this the sort of thing that could (and should) be handled with the Observer pattern? Namely, B should listen to events that affect its liveness, in this case the state of A.
A Map is a pretty natural abstraction for a cache and this is how Oracle Coherence and Terracotta do it. Coherence, with which I'm far more familiar, has mechanisms to listen to cache events either in general or for specific nodes. That's probably what you should emulate.
You might also want to look at the documentation for either of those, even if it's just as a guide or source of ideas.
You don't say what platform you're running in but perhaps we can suggest some alternatives to rolling your own, which is always going to be fraught with problems, particularly with something as complicated as a cache (make no mistake: caches are complicated).
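As a rough illustration of the Observer idea above, here is a minimal sketch in Python (the question does not name a platform, so the language is an assumption). The class name `ObservableCache`, the `depends_on` argument, and the `on_changed` handler are all hypothetical names invented for this example; they are not the Coherence or Terracotta APIs. Each cached entry records which source objects it was built from, and a change event for one of those objects drops every dependent entry.

```python
from collections import defaultdict


class ObservableCache:
    """A dict-backed cache whose entries depend on named source objects."""

    def __init__(self):
        self._entries = {}                   # cache key -> cached value
        self._dependents = defaultdict(set)  # source object id -> dependent cache keys

    def put(self, key, value, depends_on=()):
        # Store the value and register it as a listener on each source object.
        self._entries[key] = value
        for source_id in depends_on:
            self._dependents[source_id].add(key)

    def get(self, key):
        return self._entries.get(key)

    def on_changed(self, source_id):
        # Event handler: drop every entry that was built from the changed object.
        for key in self._dependents.pop(source_id, set()):
            self._entries.pop(key, None)


cache = ObservableCache()
cache.put("object:A", {"name": "a"}, depends_on=["A"])
cache.put("nodeset:B", ["A", "C"], depends_on=["A", "C"])
cache.put("nodeset:D", ["C"], depends_on=["C"])

cache.on_changed("A")  # object A changed, so nodeset B is dropped as stale
```

With this arrangement the proxy only needs to know which objects went into each cached response; the called service no longer has to say which nodesets to clear.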

Related

Configuration of Level 1 and Level 2 cache in JPA

I have read the following pages and I have several doubts.
About the persistence context type for Level 1 cache
What is the difference between Transaction-scoped Persistence context and Extended Persistence context?
About the Level 2 cache
http://www.objectdb.com/java/jpa/persistence/cache
Now, my questions are:
In a normal situation what is the best PersistenceContextType to
choose for L1 cache, TRANSACTION or EXTENDED? I suppose the answer
is TRANSACTION as it is the default. However I would like to know when
should I use EXTENDED.
In a normal situation what are the best values to choose for the
following properties of the L2 cache?
javax.persistence.sharedCache.mode (I suppose the answer is ALL as it is the default and caches all the entities)
javax.persistence.cache.retrieveMode (I suppose the answer is USE as it is the default and uses the cache on retrieval)
javax.persistence.cache.storeMode (I suppose the answer is USE as it is the default, however I still don't understand the difference with REFRESH which seems better for me)
Can someone explain how to set these L1 and L2 properties correctly, and when to use some values rather than others?
NOTE: this answer is not yet complete, I will update with details WRT cache modes
When working with Java EE, the default persistence context (PC) setting is TRANSACTION. This is also the optimal mode for almost all tasks. Because of its relatively short lifespan, it has the benefit of being low or zero maintenance.
I can think of three main reasons to prefer an extended EM over a transactional one:
communication with external systems or the UI. You can manipulate managed entities and save them with the least possible moving parts - no merging and even no explicit saving is necessary. See this example by Adam Bien.
mimicking a conversation scope - using a single transaction spanning multiple HTTP requests is not practical, so an extended PC can be used instead. Examples here and here
an application where data is rarely written, but read very frequently. If you have reason to believe that the data is not going to change, you can have the benefits of caching the entities for frequent reads instead of fetching them from DB each time.
There are some downsides to using an extended EM:
if a transaction is rolled back, all managed entities are detached. Restoring the PC to a consistent usable state may be quite hard to accomplish.
when used without caution, an extended PC can get cluttered with entities you no longer need. A long-living cache can contain large amounts of stale data.
You may need a strategy for refreshing/refetching the managed entities and a strategy for evicting entities, classes or clearing the cache altogether. Failure to design appropriate strategies can result in bugs that are hard to spot and harder to reproduce. Proper cache invalidation is not trivial.
So if using an extended EM, use it for a single purpose, so you can reason about the contents of the cache more easily.
I am not sure about the appropriate storeMode and retrieveMode settings yet. As for the storeMode values, I have some doubts about their exact function.

How to keep your distributed cache clean?

In an N-tier architecture, what would be the best patterns to use so that you can keep your cache clean?
I know it's easy to just set an absolute/sliding timeout, but is there a better mechanism available that lets you mark your cache as dirty after you update the underlying persistence?
The difficulty I'm trying to wrap my head around is that cache entries are usually stored as key-value pairs (KVP), but a query is usually a fair bit more complex than that. So how can the gateway service tell the cache store that, for such and such query, it needs to refetch from persistence?
I also can't afford to hand-code the cache update per query. I'm looking for a more systematic approach.
Is this just a pipe dream, or is there some way to do this elegantly?
Link/Guide/Post appreciated.
I have worked with AppFabric and I think I tried to do what you are asking about. I was working on an auction site and I wanted to pro-actively invalidate items in the cache.
For example, we had listings (things for sale) and they would be present all over the cache (AppFabric). The data that represented a listing was in 10 different places. What I initially wanted was a way to say, "Ok, my listing has changed. Let me go find everywhere it exists in cache, and then update." (I think you say "mark as dirty" in your question)
I found doing this was incredibly difficult. There are tags in AppFabric that I tried to use, so I would mark a given object (or collection of objects) with a tag and that would let me query the cache and remove items. In other words, if an object had a LISTING tag, I would find it and invalidate it.
Eventually I settled on a two-pronged attack.
For 95% of the data I let it expire. It was a happy day when I decided this because everything got much easier to develop. I had to make some concessions in the UI etc., but it was well worth it.
For the last 5% of the data I resolved to only ever store it once. For example, a bid on a listing. Whenever a new bid came in, we'd pro-actively invalidate that object, and then everything that needed that information would be updated as well.
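The two-pronged approach above can be sketched roughly as follows. This is a minimal Python illustration, not AppFabric code: the class `ExpiringCache`, its `invalidate` method, the key names, and the injectable `clock` parameter (used here only so the example does not have to sleep) are all assumptions made for the example. Most entries simply age out after a time-to-live, while the few that must be fresh are stored under a single key and removed explicitly when they change.

```python
import time


class ExpiringCache:
    """Entries expire after ttl_seconds; selected keys can also be dropped on demand."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def set(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily drop the expired entry
            return None
        return value

    def invalidate(self, key):
        # Pro-active invalidation for data that is stored exactly once.
        self._entries.pop(key, None)


now = [0.0]  # fake clock so the example runs instantly
cache = ExpiringCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("listing:42:summary", "Vintage lamp")  # the 95%: allowed to expire
cache.set("listing:42:bid", 100)                 # the 5%: stored in one place only

cache.invalidate("listing:42:bid")               # a new bid came in
bid_after_invalidate = cache.get("listing:42:bid")
summary_before_expiry = cache.get("listing:42:summary")

now[0] = 61.0                                    # more than 60 seconds later
summary_after_expiry = cache.get("listing:42:summary")
```

Because the bid lives under one key, a single `invalidate` call is enough; nothing has to search the cache for copies.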

memcached usage patterns

I'm planning to introduce a caching system into my website and will use it in different layers (data, presentation and maybe elsewhere). Since my stack is LAMP and my infrastructure is 100% cloud on AWS, I thought the natural choice would be Amazon ElastiCache (a managed installation of memcached). But...
Surprisingly, for me, I discovered that memcached completely lacks dependency management. I don't need "advanced" features like the ASP.NET cache SqlDependency or FileDependency, but memcached doesn't offer a simple other-key dependency either, something pretty useful for building a dependency tree that greatly simplifies the invalidation process.
So, as I know memcached is used in many complex systems, am I missing something? Are there usage patterns that make this lack irrelevant?
thanks
UPDATE
as asked, I add some pseudo code to clarify what I mean
dependency = 'ROOT_KEY';
cache:set(dependency, 0, NEVER_EXPIRE);
expire = 600;
cache:set('key1', obj1, expire, dependency);
cache:set('key2', obj2, expire, dependency);
...
cache:set('keyN', objN, expire, dependency);
//later, when I have to invalidate
cache:remove(dependency); //this will cause all keyX to be invalidated too
Based on the example in your question, memcached (and thus ElastiCache) does not support the kind of key metadata you are looking for, by which you could relate such keys and operate on them as a group.
I suppose if you had only a handful of different "dependencies" you could simply use multiple ElastiCache instances, which would allow you to invalidate all items within each instance/dependency simultaneously. Of course, this might end up costing you more in AWS hardware costs than you would like, since you can only increase your cache sizes in discrete amounts. It would also eliminate the ability to do a cache lookup without knowing the dependency/instance on which the lookup is to occur.
For what you are trying to do, you might be able to use something like memory tables in MySQL/RDS if you are looking for more of a works-out-of-the-box type of solution. Of course you would not want to use RDS high-availability features or point-in-time restoration, as these will break, since they require writing to disk. You would basically need to have a standalone RDS instance doing nothing but these memory tables.
It seems none of these options however is really an exact fit for what you are looking to do, so you might need to look into either adjusting your approach (if you want to use basic AWS components), or deploying an alternate caching system on EC2.
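One way of "adjusting your approach" that is sometimes used with plain key-value caches is versioned key prefixes (often called namespace versioning). It is not mentioned in the answer above, so treat the following as a hedged sketch of a different technique rather than the answerer's recommendation. The dependency key holds a version number that is baked into every dependent key; bumping the version makes all old keys unreachable, and they then age out on their own. `FakeMemcached` is an in-memory stand-in written for this example, not a real client library, and the helper names are hypothetical.

```python
class FakeMemcached:
    """In-memory stand-in offering only get/set/incr, for illustration."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key):
        self._data[key] = self._data[key] + 1
        return self._data[key]


def namespaced_key(cache, namespace, key):
    # Build the real cache key from the namespace's current version.
    version = cache.get(namespace)
    if version is None:
        version = 1
        cache.set(namespace, version)
    return f"{namespace}:v{version}:{key}"


def invalidate_namespace(cache, namespace):
    # Bump the version: every key built with the old version becomes unreachable.
    if cache.get(namespace) is None:
        cache.set(namespace, 1)
    cache.incr(namespace)


cache = FakeMemcached()
cache.set(namespaced_key(cache, "ROOT_KEY", "key1"), "obj1")
cache.set(namespaced_key(cache, "ROOT_KEY", "key2"), "obj2")
before = cache.get(namespaced_key(cache, "ROOT_KEY", "key1"))

invalidate_namespace(cache, "ROOT_KEY")  # plays the role of cache:remove(dependency)
after = cache.get(namespaced_key(cache, "ROOT_KEY", "key1"))
```

The cost is one extra lookup per read to fetch the version, and the orphaned entries keep occupying memory until they expire or are evicted. If the version key itself could be evicted, a real deployment would need a safer seed than 1 (for example a timestamp) so that old entries are not accidentally revived.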

Cache vs HashMap for simple usecase

This must be a very basic question, but I'm curious: if I don't need distributed or cache-as-SoR models, why do we need third-party cache libraries (ehcache, memcached) when all you need (for a simple use case) is a key-value pair holder, something like a HashMap?
A lot of thought goes into producing software, and the more thought, testing, and fixes contributed by others, the more valuable the software becomes; that scrutiny also validates the code as a model (I didn't say a good model).
For the example above, how would you handle the deletion of "old" cache items? You would have to add more code/features to ensure that the cache could be emptied.
Using memcache may be overkill for a simple program, but it's already solved many of the problems that you will have and gives you a bit of extra ability.
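To make the point about "old" items concrete, here is a small hedged sketch of the extra code a plain map needs before it behaves like a cache. It is written in Python rather than Java, and `LruCache` and `max_entries` are names invented for the example. It adds just one feature, a size bound with least-recently-used eviction; expiry, statistics and thread safety would each need more code again, which is what the libraries already provide.

```python
from collections import OrderedDict


class LruCache:
    """A map with a size bound that evicts the least recently used entry."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._entries = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # drop the least recently used entry

    def __len__(self):
        return len(self._entries)


cache = LruCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used entry
cache.put("c", 3)  # over the limit, so "b" is evicted
```

A bare map given the same three writes would simply keep growing.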
I would also use Redis as an example. You can DO a lot of stuff in your own language, but sometimes, Redis would make other items easier.
YMMV!
-daniel

Organizing memcache keys

I'm trying to find a good way to handle memcache keys for storing, retrieving and updating data to/from the cache layer in a more civilized way.
Found this pattern, which looks great, but how do I turn it into a functional part of a PHP application?
The Identity Map pattern: http://martinfowler.com/eaaCatalog/identityMap.html
Thanks!
Update: I have been told about the modified memcache (memcache-tag) that apparently does do a lot of this, but I can't install Linux software on my Windows development box...
Well, memcache use IS an identity map pattern. You check your cache, then you hit your database (or whatever else you're using). You can go about finding information about the source by storing objects instead of just values, but you'll take a performance hit for that.
You effectively cannot ask the cache what it contains as a list. To mass invalidate, you'll have to keep a list of what you put in and iterate over it, or you'll have to iterate over every possible key that could fit the pattern of concern. The resource you point out, memcache-tag, can simplify this, but it doesn't appear to be maintained in line with the memcache project.
So your options now are iterative deletes or flushing everything that is cached. The more useful question is therefore one of design: why do you want to do this?
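The "keep a list of what you put in and iterate it" option can be sketched as below. The question is about PHP, but the sketch is in Python for brevity, and `TrackedCache`, the `group` argument and `delete_group` are hypothetical names. The backend is an ordinary dict standing in for a memcache client.

```python
from collections import defaultdict


class TrackedCache:
    """Remembers which keys were stored under each group so they can be deleted together."""

    def __init__(self, backend):
        self._backend = backend                 # dict-like stand-in for the cache server
        self._keys_by_group = defaultdict(set)  # group name -> keys written under it

    def set(self, group, key, value):
        self._backend[key] = value
        self._keys_by_group[group].add(key)

    def get(self, key):
        return self._backend.get(key)

    def delete_group(self, group):
        # Iterative delete: walk the remembered keys and remove each one.
        keys = self._keys_by_group.pop(group, set())
        for key in keys:
            self._backend.pop(key, None)
        return len(keys)


store = {}
cache = TrackedCache(store)
cache.set("user:7", "user:7:profile", {"name": "Ann"})
cache.set("user:7", "user:7:orders", [101, 102])
cache.set("user:8", "user:8:profile", {"name": "Bob"})

deleted = cache.delete_group("user:7")
```

In this sketch the key list lives in process memory. With several web processes sharing one cache, the list itself would have to be kept somewhere shared (for instance in the cache under a well-known key), which brings its own consistency concerns.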

Resources