I'm planning to add a caching system to my website and will use it in several layers (data, presentation, and maybe elsewhere). Since my stack is LAMP and my infrastructure is 100% cloud on AWS, I thought the natural choice would be Amazon ElastiCache (a managed installation of memcached). But...
To my surprise, I discovered that memcached completely lacks dependency management. I don't need "advanced" features like the ASP.NET cache's SqlDependency or FileDependency, but memcached doesn't offer a simple dependency on another key either, which would be pretty useful for building a dependency tree that greatly simplifies invalidation.
So, as I know memcached is used in many complex systems, am I missing something? Are there usage patterns that make this lack irrelevant?
thanks
UPDATE
As asked, here is some pseudocode to clarify what I mean:
dependency = 'ROOT_KEY';
cache:set(dependency, 0, NEVER_EXPIRE);
expire = 600;
cache:set('key1', obj1, expire, dependency);
cache:set('key2', obj2, expire, dependency);
...
cache:set('keyN', objN, expire, dependency);
//later, when I have to invalidate
cache:remove(dependency); //this will cause all keyX to be invalidated too
Based on the example in your question: memcached (and thus ElastiCache) does not support the kind of key metadata you are looking for, so there is no built-in way to relate keys to each other and operate on them as a group.
I suppose if you had only a handful of different "dependencies" you could simply use multiple ElastiCache instances, which would allow you to invalidate all items within each instance/dependency at once. Of course, this might end up costing you more in AWS hardware than you would like, since you can only increase your cache sizes in discrete amounts. It would also eliminate your ability to do a cache lookup without knowing the dependency/instance on which the lookup is to occur.
For what you are trying to do, you might be able to use something like memory tables in MySQL/RDS if you are looking for more of a works-out-of-the-box solution. Of course, you would not want to use RDS high-availability features or point-in-time restoration, as these will break, since they require writing to disk. You would basically need a standalone RDS instance doing nothing but hosting these memory tables.
It seems none of these options however is really an exact fit for what you are looking to do, so you might need to look into either adjusting your approach (if you want to use basic AWS components), or deploying an alternate caching system on EC2.
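One way of "adjusting your approach" that stays within plain memcached is to fold a generation number into every dependent key, so that bumping the generation makes the whole group unreachable at once. The following is only a minimal sketch of that idea, not something ElastiCache provides out of the box: FakeCache is a hypothetical dict-backed stand-in for a real client, the function names are made up for illustration, and expiry times are left out for brevity.

```python
class FakeCache:
    """Dict-backed stand-in for a memcached client (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key):
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


def namespaced_key(cache, dependency, key):
    """Build the real cache key from the dependency's current generation."""
    generation = cache.get(dependency)
    if generation is None:
        generation = 1
        cache.set(dependency, generation)
    return f"{dependency}:{generation}:{key}"


def set_dependent(cache, dependency, key, value):
    cache.set(namespaced_key(cache, dependency, key), value)


def get_dependent(cache, dependency, key):
    return cache.get(namespaced_key(cache, dependency, key))


def invalidate(cache, dependency):
    """Bump the generation; old entries are never read again and simply age out."""
    if cache.get(dependency) is None:
        cache.set(dependency, 1)
    cache.incr(dependency)
```

The cost is one extra lookup per read (to fetch the generation), and the stale entries stay in memory until they expire or are evicted.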
Related
In a N-Tier architecture, what would be the best patterns to use so that you can keep your cache clean?
I know it's easy to just set an absolute/sliding timeout, but is there a better mechanism that lets you mark your cache as dirty after you update the underlying persistence?
The difficulty I'm trying to wrap my head around is that caches usually store key-value pairs (KVP), but a query is usually a fair bit more complex than that. So how can the gateway service tell the cache store that, for such and such a query, it needs to refetch from persistence?
I also can't afford to hand-code the cache update per query. I'm looking for a more systematic approach.
Is this just a pipe dream, or is there some way to do this elegantly?
Link/Guide/Post appreciated.
I have worked with AppFabric, and I think I tried to do what you are asking about. I was working on an auction site and I wanted to proactively invalidate items in the cache.
For example, we had listings (things for sale) and they would be present all over the cache (AppFabric). The data that represented a listing was in 10 different places. What I initially wanted was a way to say, "Ok, my listing has changed. Let me go find everywhere it exists in cache, and then update." (I think you say "mark as dirty" in your question)
I found doing this was incredibly difficult. There are tags in AppFabric that I tried to use, so I would mark a given object (or collection of objects) with a tag and that would let me query the cache and remove items. In other words, if an object had a LISTING tag, I would find it and invalidate it.
Eventually I settled on a two-pronged attack.
For 95% of the data I let it expire. It was a happy day when I decided this because everything got much easier to develop. I had to make some concessions in the UI etc., but it was well worth it.
For the last 5% of the data I resolved to only ever store it once. For example, a bid on a listing. Whenever a new bid came in, we'd pro-actively invalidate that object, and then everything that needed that information would be updated as well.
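The two prongs can be sketched roughly as follows. This is only an illustration under stated assumptions, not AppFabric code: TTLCache is a hypothetical in-memory stand-in for the real cache, and the key names and the current_bid/on_new_bid helpers are made up. Listing data gets a TTL and is simply allowed to go stale, while the bid is stored under exactly one key and removed whenever a new bid arrives.

```python
import time


class TTLCache:
    """Tiny in-memory cache with optional per-entry expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def delete(self, key):
        self._items.pop(key, None)


def current_bid(cache, listing_id, load_bid):
    """The 5% case: one canonical entry, no TTL, invalidated explicitly."""
    key = f"bid:{listing_id}"
    bid = cache.get(key)
    if bid is None:
        bid = load_bid(listing_id)
        cache.set(key, bid)
    return bid


def on_new_bid(cache, listing_id):
    """Called when a bid is written; the next read reloads from the source."""
    cache.delete(f"bid:{listing_id}")
```

Everything else would be written with something like cache.set("listing:42:summary", data, ttl=300) and never touched again until it expires.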
Is it possible to change all the key/value pairs in memcache instances with a command line?
Say, I have 10 memcache servers and they have key value pairs, and they all have the objects with 30 days expiration. But they don't expire at the same time, and I don't want all of them to expire at the same time. I want to change the objects to expire in 10 days. How can I make this change?
Is this even possible?
Can this be done from the command line, or do I have to write a program for it?
You can accomplish this by touching values periodically. The FAQ describes a way to do this.
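As a rough sketch of what touching looks like in practice: memcached has a touch command that resets a key's expiry without rewriting its value, and common client libraries expose it as a touch method. The code below assumes you already have the list of keys (memcached will not list them for you) and a client object whose touch(key, expire) returns a falsy value when the key is missing; how a real client reports a miss varies, so treat that part as an assumption. RecordingClient is a hypothetical stand-in so the example runs without a server.

```python
def shorten_expiry(client, keys, new_ttl_seconds):
    """Call touch on each key; return the keys the server did not have."""
    missing = []
    for key in keys:
        if not client.touch(key, new_ttl_seconds):
            missing.append(key)
    return missing


class RecordingClient:
    """Stand-in client (hypothetical) that records touch calls instead of sending them."""

    def __init__(self, present):
        self.present = set(present)
        self.touched = {}

    def touch(self, key, expire):
        if key not in self.present:
            return False
        self.touched[key] = expire
        return True
```

You would run this once per server (or once through a client that knows all ten servers), passing 10 * 24 * 3600 as the new TTL.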
However, memcache isn't designed for this. What you're doing seems to be more like a persistent cache scenario. If you love memcache semantics, Membase and MemcacheDB provide solutions that may better fit your needs. There are many different persistent cache systems that do this just as well.
Depending on your specs, sometimes speeding up your data source may deliver better performance than memcache. Modern DBMSs cache heavily with sensible access protocols. This is entirely dependent on what your data sources look like and how much flexibility you have in your system design.
Memcache has a text protocol that you can use over telnet. There you can issue flush_all or flush_all <seconds_to_wait>, if that's what you mean...
This must be very basic, but I'm just curious: if I don't need distributed or cache-as-SoR models, why do we need third-party cache libraries (Ehcache, memcached) when all you need (for a simple use case) is a key-value pair holder, something like a HashMap?
A lot of thought goes into producing software, and more thought, testing, and fixes by others improve the value of the software and also validate the code as a model (I didn't say a good model).
For the example above, how would you handle deleting "old" cache items? You would have to add more code/features to ensure that the cache could be emptied.
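To give a feel for how much "more code" that is, here is a minimal sketch of a map-based cache that handles old items with both a size cap (least recently used entries are evicted first) and a time-to-live. The class name and parameters are made up for illustration, and it is not thread-safe, which is one more thing a real library would handle for you.

```python
import time
from collections import OrderedDict


class BoundedCache:
    """Key-value holder with LRU eviction and a fixed TTL (illustrative sketch)."""

    def __init__(self, max_items, ttl_seconds, clock=time.monotonic):
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = OrderedDict()  # key -> (value, expires_at), least recent first

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)
        self._items.move_to_end(key)
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict the least recently used entry

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # too old: drop it and report a miss
            return default
        self._items.move_to_end(key)  # mark as recently used
        return value

    def __len__(self):
        return len(self._items)
```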
Using memcache may be overkill for a simple program, but it's already solved many of the problems that you will have and gives you a bit of extra ability.
I would also use Redis as an example. You can DO a lot of stuff in your own language, but sometimes, Redis would make other items easier.
YMMV!
-daniel
I'm trying to find a good way to handle memcache keys for storing, retrieving and updating data to/from the cache layer in a more civilized way.
Found this pattern, which looks great, but how do I turn it into a functional part of a PHP application?
The Identity Map pattern: http://martinfowler.com/eaaCatalog/identityMap.html
Thanks!
Update: I have been told about the modified memcache (memcache-tag) that apparently does a lot of this, but I can't install Linux software on my Windows development box...
Well, memcache use IS an identity map pattern. You check your cache, then you hit your database (or whatever else you're using). You can go about finding information about the source by storing objects instead of just values, but you'll take a performance hit for that.
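For reference, the pattern itself is small. This is a minimal in-process sketch (written in Python rather than PHP to keep it short, with a made-up loader argument standing in for the database call): look in the map first, load on a miss, and hand back the same object afterwards. With memcache in between, values are serialized, so you would get back equal copies rather than the identical object.

```python
class IdentityMap:
    """Loads each record at most once per map and returns the same object after that."""

    def __init__(self, loader):
        self._loader = loader  # function: record_id -> object (e.g. a database query)
        self._loaded = {}

    def get(self, record_id):
        if record_id not in self._loaded:
            self._loaded[record_id] = self._loader(record_id)
        return self._loaded[record_id]

    def forget(self, record_id):
        """Drop one record so the next get() reloads it."""
        self._loaded.pop(record_id, None)
```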
You effectively cannot ask the cache for a list of what it contains. To mass invalidate, you'll have to keep a list of what you put in and iterate over it, or you'll have to iterate over every possible key that could fit the pattern of concern. The resource you point out, memcache-tag, can simplify this, but it doesn't appear to be maintained in line with the memcache project.
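The "keep a list of what you put in" option might look roughly like this. It is a sketch under assumptions: TrackedCache and DictBackend are hypothetical names, the backend only needs set and delete, and the index lives in process memory, so with several web servers it would have to be stored somewhere shared instead.

```python
class DictBackend:
    """Dict-backed stand-in for the real cache client (hypothetical)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        self.data.pop(key, None)


class TrackedCache:
    """Remembers which keys were stored under each group so they can be deleted together."""

    def __init__(self, backend):
        self._backend = backend
        self._keys_by_group = {}

    def set(self, group, key, value):
        self._backend.set(key, value)
        self._keys_by_group.setdefault(group, set()).add(key)

    def invalidate_group(self, group):
        """Delete every tracked key in the group; return how many were deleted."""
        keys = self._keys_by_group.pop(group, set())
        for key in keys:
            self._backend.delete(key)
        return len(keys)
```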
So your options now are iterative deletes or totally flushing everything that is cached. The question you should really be asking is therefore one of design. To give you a useful answer, I have to ask: why do you want to do this?
As part of a system I am working on we have put a layer of caching in a proxy which calls another system. The key to this cache is built up of the key value pairs which are used in the proxy call. So if the proxy is called with the same values the item will be retrieved from the cache rather than from the other service. This works and is fairly simple.
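To make the key construction concrete, it works along these lines. This is a simplified sketch rather than our actual code: the function name and the use of JSON plus SHA-256 are illustrative assumptions. Sorting the parameters means the same values always produce the same key regardless of the order they were supplied in.

```python
import hashlib
import json


def cache_key(operation, params):
    """Derive a deterministic cache key from the proxy call's key-value pairs."""
    # sort_keys makes the serialization independent of parameter order
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{operation}:{digest}"
```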
It gets more complicated when it comes to clearing the cache, as it is not obvious which items to clear when an item is changed. If object A is contained in nodeset B and object A is changed, how do we know that nodeset B is stale?
We have got round the problem by having the service that we call return the nodesets to clear when objects are changed. However this breaks encapsulation and adds a layer of complexity in that we have to look in the responses to see what needs clearing.
Is there a better/standard way to deal with such situations?
Isn't this the sort of thing that could be (and should be) handled with the Observer pattern? Namely, B should listen to events that affect its liveness, in this case the state of A.
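In code, the idea is roughly the following. It is only a sketch with made-up names (ChangeNotifier, NodesetCache), not how Coherence or any other product does it: each cached nodeset subscribes to the objects it contains and drops itself when any of them changes. A real version would also unsubscribe listeners when entries are removed.

```python
from collections import defaultdict


class ChangeNotifier:
    """Subject: lets listeners register interest in individual objects."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, object_id, callback):
        self._listeners[object_id].append(callback)

    def notify_changed(self, object_id):
        for callback in list(self._listeners.get(object_id, [])):
            callback(object_id)


class NodesetCache:
    """Observer: each cached nodeset is dropped when one of its members changes."""

    def __init__(self, notifier):
        self._notifier = notifier
        self._entries = {}

    def put(self, cache_key, nodeset, member_ids):
        self._entries[cache_key] = nodeset
        for object_id in member_ids:
            # bind cache_key now so each listener removes its own entry
            self._notifier.subscribe(
                object_id, lambda _id, k=cache_key: self._entries.pop(k, None)
            )

    def get(self, cache_key):
        return self._entries.get(cache_key)
```

The service that changes A then only has to announce "A changed"; it no longer needs to know which nodesets contain A.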
A Map is a pretty natural abstraction for a cache and this is how Oracle Coherence and Terracotta do it. Coherence, with which I'm far more familiar, has mechanisms to listen to cache events either in general or for specific nodes. That's probably what you should emulate.
You also might want to look at the documentation for either of those, even if it's just as a guide or source of ideas.
You don't say what platform you're running in but perhaps we can suggest some alternatives to rolling your own, which is always going to be fraught with problems, particularly with something as complicated as a cache (make no mistake: caches are complicated).