Let's say I have a Widget in a package version 1.0.
I use Ehcache and store tons of these objects in the disk cache.
Two weeks later, someone adds a field to Widget, bumps the package to 1.1, and redeploys. Can Ehcache handle loading the out-of-date objects from the persistent cache? Will it be able to ignore missing or extra fields?
Actually, I just read that Ehcache uses the Serializable interface, so I guess as long as your implementation accounts for potentially missing fields, it would be all right. Hmm.
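For what it's worth, standard Java serialization does tolerate compatible class changes as long as serialVersionUID stays fixed; a minimal sketch of what that looks like (the Widget fields here are illustrative, not from any real package):

```java
import java.io.Serializable;

// Hypothetical Widget; version 1.1 added the "color" field.
// Keeping serialVersionUID fixed tells Java serialization that 1.0 and 1.1
// streams are compatible: a field missing from an old stream deserializes to
// its default (null), and fields present in the stream but no longer declared
// on the class are ignored.
public class Widget implements Serializable {
    private static final long serialVersionUID = 1L;

    private String name;   // present since 1.0
    private String color;  // added in 1.1

    // Called during deserialization; a good place to supply defaults for
    // fields that old (1.0) streams simply do not contain.
    private void readObject(java.io.ObjectInputStream in)
            throws java.io.IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (color == null) {
            color = "unspecified";
        }
    }
}
```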
Related
I am loading asset bundles from a server at runtime with LoadFromCacheOrDownload().
I wonder how long the asset bundles are stored there (how long exactly, and are they still there after a restart?).
Should I also save them to the filesystem or is the cache enough?
Thank you.
By default, cached data sticks around for 150 days before being deleted as unused, so if you don't clean it before then, it will most likely still be there. Caching behaviour also depends on the cache size, which is 50 MiB on the web and 4 GiB on other platforms.
With this in mind, it's up to you to decide whether the cache (and its behaviour) suffices for you, or whether you would be better off storing the data yourself as well.
I have read the following pages and I have several doubts.
About the persistence context type for Level 1 cache
What is the difference between Transaction-scoped Persistence context and Extended Persistence context?
About the Level 2 cache
http://www.objectdb.com/java/jpa/persistence/cache
Now, my questions are:
In a normal situation, what is the best PersistenceContextType to choose for the L1 cache, TRANSACTION or EXTENDED? I suppose the answer is TRANSACTION, as it is the default; however, I would like to know when I should use EXTENDED.
In a normal situation, what are the best values to choose for the following properties of the L2 cache?
javax.persistence.sharedCache.mode (I suppose the answer is ALL as it is the default and caches all the entities)
javax.persistence.cache.retrieveMode (I suppose the answer is USE as it is the default and uses the cache on retrieval)
javax.persistence.cache.storeMode (I suppose the answer is USE as it is the default; however, I still don't understand the difference from REFRESH, which seems better to me)
Can someone explain how to set these L1 and L2 properties correctly and when to use one value rather than another?
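To make the question concrete, this is roughly how I understand these properties get applied: the shared-cache mode is chosen in persistence.xml via <shared-cache-mode> (ALL, NONE, ENABLE_SELECTIVE, DISABLE_SELECTIVE, UNSPECIFIED), while retrieveMode and storeMode can be set per EntityManager or per operation. A sketch of the standard JPA calls (the helper names are my own):

```java
import javax.persistence.CacheRetrieveMode;
import javax.persistence.CacheStoreMode;
import javax.persistence.EntityManager;
import java.util.HashMap;
import java.util.Map;

public class CacheModeExamples {

    // Per-operation override: bypass the L2 cache for this read, but refresh
    // the cached copy with whatever the database returns.
    static Object findFresh(EntityManager em, Class<?> type, Object id) {
        Map<String, Object> props = new HashMap<>();
        props.put("javax.persistence.cache.retrieveMode", CacheRetrieveMode.BYPASS);
        props.put("javax.persistence.cache.storeMode", CacheStoreMode.REFRESH);
        return em.find(type, id, props);
    }

    // Whole-EntityManager defaults: retrieveMode USE reads from the cache when
    // possible; storeMode USE stores entities read from the database but does
    // not force a refresh of entries that are already cached (forcing that
    // refresh is exactly what REFRESH adds on top of USE).
    static void applyDefaults(EntityManager em) {
        em.setProperty("javax.persistence.cache.retrieveMode", CacheRetrieveMode.USE);
        em.setProperty("javax.persistence.cache.storeMode", CacheStoreMode.USE);
    }
}
```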
NOTE: this answer is not yet complete; I will update it with details WRT cache modes.
When working with Java EE, the default persistence context (PC) setting is TRANSACTION. This is also the optimal mode for almost all tasks. Because of its relatively short lifespan, it has the benefit of being low- or zero-maintenance.
I can think of three main reasons to prefer an extended EM over a transactional one (a declaration sketch follows the list):
communication with external systems or the UI. You can manipulate managed entities and save them with the least possible moving parts - no merging and even no explicit saving is necessary. See this example by Adam Bien.
mimicking a conversation scope - using a single transaction spanning multiple HTTP requests is not practical, so an extended PC can be used instead. Examples here and here
an application where data is rarely written, but read very frequently. If you have reason to believe that the data is not going to change, you can have the benefits of caching the entities for frequent reads instead of fetching them from DB each time.
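To make the TRANSACTION/EXTENDED distinction concrete, here is a minimal sketch of how an extended persistence context is typically declared in Java EE; the bean, its methods, and the Cart entity are all illustrative:

```java
import javax.ejb.Stateful;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.PersistenceContext;
import javax.persistence.PersistenceContextType;

// Hypothetical entity, only here so the sketch is self-contained.
@Entity
class Cart {
    @Id Long id;
    int quantity;
}

// A stateful bean holding a "conversation" that spans several calls.
// The extended persistence context keeps Cart managed between invocations,
// so no merge() is needed and changes are flushed once a transaction joins.
@Stateful
public class CartConversation {

    @PersistenceContext(type = PersistenceContextType.EXTENDED)
    private EntityManager em;

    private Cart cart;   // stays managed across requests

    public void begin(Long cartId) {
        cart = em.find(Cart.class, cartId);
    }

    public void changeQuantity(int quantity) {
        cart.quantity = quantity;   // still managed: no explicit save needed
    }

    public void checkout() {
        // the first transactional method call flushes the accumulated changes
    }
}
```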
There are some downsides to using an extended EM:
if a transaction is rolled back, all managed entities are detached. Restoring the PC to a consistent usable state may be quite hard to accomplish.
when used without caution, an extended PC can get cluttered with entities you no longer need. A long-living cache can contain large amounts of stale data.
You may need a strategy for refreshing/refetching the managed entities and a strategy for evicting entities or classes, or clearing the cache altogether (the standard JPA calls for this are sketched below). Failure to design appropriate strategies can result in bugs that are hard to spot and harder to reproduce. Proper cache invalidation is not trivial.
So if using an extended EM, use it for a single purpose, so you can reason about the contents of the cache more easily.
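The refresh/eviction strategies mentioned above boil down to a handful of standard JPA calls; a quick sketch (the helper names are my own):

```java
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

public class CacheMaintenance {

    // Re-read one managed entity's state from the database (L1 refresh).
    static void refreshEntity(EntityManager em, Object entity) {
        em.refresh(entity);
    }

    // Detach everything in the persistence context (clears the L1 cache).
    static void clearFirstLevel(EntityManager em) {
        em.clear();
    }

    // Evict from the shared L2 cache: one entity, a whole class, or everything.
    static void evictSecondLevel(EntityManagerFactory emf, Class<?> type, Object id) {
        emf.getCache().evict(type, id);
        emf.getCache().evict(type);
        emf.getCache().evictAll();
    }
}
```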
I am not sure about the appropriate storeMode and retrieveMode settings yet. As for storeMode, I still have some doubts about its exact function.
In an N-tier architecture, what would be the best patterns to use so that you can keep your cache clean?
I know it's easy to just set an absolute/sliding timeout, but is there a better mechanism available that lets you mark your cache as dirty after you update the underlying persistence?
The difficulty I'm trying to wrap my head around is that caches are usually stored as key-value pairs, while a query is usually a fair bit more complex than that. So how can the gateway service tell the cache store that, for such-and-such a query, it needs to refetch from persistence?
I also can't afford to hand-code the cache update per query. I'm looking for a more systematic approach.
Is this just a pipe dream, or is there some way to do this elegantly?
Link/Guide/Post appreciated.
I have worked with AppFabric, and I think I tried to do what you are asking about. I was working on an auction site and wanted to pro-actively invalidate items in the cache.
For example, we had listings (things for sale) and they would be present all over the cache (AppFabric). The data that represented a listing was in 10 different places. What I initially wanted was a way to say, "Ok, my listing has changed. Let me go find everywhere it exists in cache, and then update." (I think you say "mark as dirty" in your question)
I found doing this was incredibly difficult. There are tags in AppFabric that I tried to use, so I would mark a given object (or collection of objects) with a tag and that would let me query the cache and remove items. In other words, if an object had a LISTING tag, I would find it and invalidate it.
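Outside of AppFabric specifically, the tag idea can be sketched generically: keep a secondary index from tag to cache keys, and evict everything carrying a tag when the underlying listing changes. All names here are illustrative, not any particular cache API:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Generic sketch of tag-based invalidation: every cache entry can be
// registered under one or more tags (e.g. "LISTING:42"); invalidating a
// tag removes every entry that was stored under it.
public class TaggedCache {

    private final Map<String, Object> entries = new ConcurrentHashMap<>();
    private final Map<String, Set<String>> keysByTag = new ConcurrentHashMap<>();

    public void put(String key, Object value, String... tags) {
        entries.put(key, value);
        for (String tag : tags) {
            keysByTag.computeIfAbsent(tag, t -> ConcurrentHashMap.newKeySet()).add(key);
        }
    }

    public Object get(String key) {
        return entries.get(key);
    }

    // "My listing changed" -> drop every cached view of it, wherever it lives.
    public void invalidateTag(String tag) {
        Set<String> keys = keysByTag.remove(tag);
        if (keys != null) {
            keys.forEach(entries::remove);
        }
    }
}
```

With something like this, a "listing changed" event becomes a single invalidateTag("LISTING:" + id) call, no matter how many places the listing's data appears in the cache.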
Eventually I settled on a two-pronged attack.
For 95% of the data I let it expire. It was a happy day when I decided this because everything got much easier to develop. I had to make some concessions in the UI etc., but it was well worth it.
For the last 5% of the data I resolved to only ever store it once. For example, a bid on a listing. Whenever a new bid came in, we'd pro-actively invalidate that object, and then everything that needed that information would be updated as well.
I'm reading this article:
http://37signals.com/svn/posts/3113-how-key-based-cache-expiration-works
I'm not using Rails, so I don't really understand their example.
It says in #3:
When the key changes, you simply write the new content to this new key. So if you update the todo, the key changes from todos/5-20110218104500 to todos/5-20110218105545, and thus the new content is written based on the updated object.
How does the view know to read from the new todos/5-20110218105545 instead of the old one?
I was confused about that too at first -- how does this save a trip to the database if you have to read from the database anyway to see if the cache is valid? However, see Jesse's comments (1, 2) from Feb 12th:
How do you know what the cache key is? You would have to fetch it from the database to know the mtime right? If you’re pulling the record from the database already, I would expect that to be the greatest hit, no?
Am I missing something?
and then
Please remove my brain-dead comment. I just realized why this doesn’t matter: the caching is cascaded, so yes a full depth regeneration incurs a DB hit. The next cache hit will incur one DB query for the top-level object—all the descendant objects are not queried because the cache for the parent object includes cached versions for the children (thus, no query necessary).
And Paul Leader's comment 2 below that:
Bingo. That’s why it works soooo well. If you do it right it doesn’t just eliminate the need to generate the HTML but any need to hit the db. With this caching system in place, our data-vis app is almost instantaneous, it’s actually useable and the code is much nicer.
So, given the models that DHH lists in step 5 of the article and the views he lists in step 6, and given that you've properly set up your relationships to touch the parent objects on update, and given that your partials access your child data as parent.children (or even child.children in nested partials), this caching system should be a net gain: as long as the parent's cache key is still valid, the parent.children lookup never happens and is served from cache as well, and so on.
However, this method may be pointless if your partials reference lots of instance variables from the controller since those queries will already have been performed by the time Rails sees the calls to cache in the view templates. In that case you would probably be better off using other caching patterns.
Or at least this is my understanding of how it works. HTH
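Since the article's example is Rails-specific, here is the same key-based idea as a rough Java sketch (not from the article; the names are illustrative): the cache key embeds the record's id and updated-at timestamp, so an update produces a brand-new key and the stale entry simply stops being read.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch of key-based expiration: the key is derived from the object's id and
// its updated-at timestamp, so an update yields a new key and the old entry is
// left behind to be evicted by normal LRU/size pressure.
public class KeyBasedCache {

    private final Map<String, String> cache = new ConcurrentHashMap<>();

    static String cacheKey(long id, Instant updatedAt) {
        return "todos/" + id + "-" + updatedAt.toEpochMilli();
    }

    // Render (or fetch) only when the versioned key is missing.
    public String fetch(long id, Instant updatedAt, Supplier<String> render) {
        return cache.computeIfAbsent(cacheKey(id, updatedAt), k -> render.get());
    }
}
```

You still pay one query to learn the top-level object's updated-at timestamp, but, as the quoted comments point out, the nested children never need to be queried while the parent's key still hits.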
We have our ORM pretty tightly coupled with our cache, so all of our object gets are cached. Currently we invalidate our objects both before and after an insert/update/delete of the object. What's your experience?
Why before AND after i/u/d?
If you don't want to update your cache directly then it's enough to invalidate an object after i/u/d assuming you load it into cache on every cache miss. If your object space is big enough that your cache could use up too much memory, you'll need some expiration mechanism too (invalidate after X minutes or after X minutes w/o being accessed).
Or you could go for LRU (Least Recently Used), but this is not easy to implement on your own if your ORM doesn't support it natively.
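For what it's worth, a plain LRU cache is easy to sketch on the JVM with LinkedHashMap in access order, and invalidate-after-write is then just a remove() once the insert/update/delete commits; the capacity and names here are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: LinkedHashMap in access order evicts the least recently
// used entry once the capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder = true -> LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    // Invalidate-after-write: call this once the insert/update/delete commits,
    // and let the next cache miss reload the fresh row.
    public void invalidate(K key) {
        remove(key);
    }
}
```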