What's the best strategy to invalidate ORM cache?

We have our ORM coupled fairly tightly with our cache, so all object reads are cached. Currently we invalidate our objects both before and after each insert/update/delete of an object. What's your experience?

Why before AND after i/u/d?
If you don't want to update your cache directly, it's enough to invalidate an object after insert/update/delete, assuming you load it into the cache on every cache miss. If your object space is large enough that the cache could use up too much memory, you'll also need an expiration mechanism (invalidate after X minutes, or after X minutes without being accessed).
Or you could go for LRU (Least Recently Used) eviction, but this is not easy to implement on your own if your ORM doesn't support it natively.
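As a minimal sketch of the invalidate-after-write plus load-on-miss pattern described above (the entity, the map-based cache, and the loadUserFromDb/updateUserInDb helpers are all hypothetical stand-ins for your ORM and cache store):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CachedUserRepository {
    record User(long id, String name) {}

    private final Map<Long, User> cache = new ConcurrentHashMap<>();

    User get(long id) {
        User u = cache.get(id);
        if (u == null) {              // cache miss: load and populate
            u = loadUserFromDb(id);
            cache.put(id, u);
        }
        return u;
    }

    void update(User u) {
        updateUserInDb(u);            // 1. write to the database
        cache.remove(u.id());         // 2. invalidate once, after the write
    }

    private User loadUserFromDb(long id) { return new User(id, "from db"); }
    private void updateUserInDb(User u) { /* ORM update goes here */ }
}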

Related

When to use Update vs Invalidate Cache Protocols

In what scenarios would it be better to use an update protocol vs an invalidate protocol, and in what scenarios the reverse?
I'm not able to think of any scenario in which one would be preferred over the other. If you're going to invalidate a cache line, why not just update it at the same time?
Cache invalidation can happen on several bases: based on time, on a sliding window, on other items within the cache, or on changes in the underlying data source.
Updating a cache entry is a comparatively expensive operation. Depending on what your data source is, it might cost you precious resources to refresh something that will not be needed for some time.
So the question becomes: why invalidate items, and why/when should you update them?
It depends entirely on your use case. Do you want your items to expire automatically, or to have a dependency on another item?
When and why you want to update them is also use-case dependent. Would you still need an item that has not been accessed for the last 15 minutes or several hours? Why not update it only once it has been invalidated or has expired?
Caches also offer the concept of read-through: the cache itself loads an item from your data source when it does not exist in the cache, as sketched below.
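As a rough illustration of read-through, here is a minimal Java sketch; the map-based store and the loadFromSource callback are hypothetical stand-ins for a real cache and data source:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class ReadThroughCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loadFromSource;  // assumed data-source loader

    ReadThroughCache(Function<K, V> loadFromSource) {
        this.loadFromSource = loadFromSource;
    }

    V get(K key) {
        // computeIfAbsent gives read-through semantics in one call:
        // on a miss the loader runs and its result is cached
        return store.computeIfAbsent(key, loadFromSource);
    }

    void invalidate(K key) {
        store.remove(key);  // cheap: no data-source hit until the next read
    }
}

Invalidation here is nearly free, while an update protocol would call loadFromSource eagerly on every write, whether or not the item is ever read again.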

How to keep key in redis if it's being used, and expire if not?

I have a specific cache system in Redis.
The content of this system is quite volatile, and values get added and removed all the time. I want to keep the "used" keys in memory as much as possible, while letting the old ones expire.
Each request can require hundreds of keys from the cache.
I'm aware that I could set a "long enough" expiry time and just deal with the cache misses, but I'd like to have as few misses as possible.
Currently I'm doing something like this when writing to / reading from the cache (pseudo-code):
# write: set the value and its TTL in one command
SET key value EX ttl
# read: fetch the value, then refresh its TTL
GET key
EXPIRE key ttl
I can optimise the read by using pipelining.
Now this still seems like it's not the best way of doing it.
Could someone give me a better strategy?
If you can live with the (current) resolution of 10 seconds, then the OBJECT IDLETIME command would let you get a better sense of what has not been used for a while (in blocks of 10 seconds):
> SET X 10
OK
> OBJECT IDLETIME X
10
I would create a script (https://redis.io/commands/script-load) that does this atomically and faster, directly on the server side, and then invoke it with EVALSHA (https://redis.io/commands/evalsha).
This saves the extra round trip for each of the commands.
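A sketch of that approach using the Jedis Java client, one option among several (the host/port, key name, and TTL are arbitrary assumptions):

import redis.clients.jedis.Jedis;

public class TouchOnRead {
    // Lua: read the key and refresh its TTL in one atomic server-side step
    private static final String SCRIPT =
        "local v = redis.call('GET', KEYS[1]) " +
        "if v then redis.call('EXPIRE', KEYS[1], ARGV[1]) end " +
        "return v";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String sha = jedis.scriptLoad(SCRIPT);  // load once, reuse the SHA
            Object value = jedis.evalsha(sha, 1, "mykey", "3600");  // 1 key, then the TTL
            System.out.println(value);  // null on a cache miss
        }
    }
}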
Alternatively, you can implement an algorithm similar to the LRU eviction that Redis runs when it's out of space (https://redis.io/topics/lru-cache): every once in a while, get random keys and remove them if they're too old for you, optionally looping until you hit a long run of fresh keys. A sketch follows.
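A rough version of that sampling loop with Jedis (the idle threshold and sample size are arbitrary assumptions; OBJECT IDLETIME is exposed as objectIdletime, and RANDOMKEY may return the same key more than once):

import redis.clients.jedis.Jedis;

public class IdleEvictor {
    public static void main(String[] args) {
        long maxIdleSeconds = 3600;  // assumed threshold: one hour
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            for (int i = 0; i < 100; i++) {  // sample a bounded batch of keys
                String key = jedis.randomKey();
                if (key == null) break;  // empty database
                Long idle = jedis.objectIdletime(key);
                if (idle != null && idle > maxIdleSeconds) {
                    jedis.del(key);  // evict keys that have been idle too long
                }
            }
        }
    }
}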
If what you are trying to achieve is a true LRU (Least Recently Used) cache, you can tune Redis to behave like this globally. Here is a link about using Redis as an LRU cache:
http://oldblog.antirez.com/post/redis-as-LRU-cache.html
Note that this uses the maxmemory property in Redis, and the eviction rule is global unless you look at volatile-lru: How to make Redis choose LRU eviction policy for only some of the keys?
Your manual eviction solution with custom expiration/TTLs is the most powerful one, but maybe you can simplify your configuration and get a more predictable in-memory cache size with the global approach.
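For example, the whole instance can be flipped into LRU mode with two settings; a minimal sketch using Jedis (the 100mb limit is an arbitrary assumption, and the same values can be set in redis.conf instead):

import redis.clients.jedis.Jedis;

public class LruConfig {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.configSet("maxmemory", "100mb");  // cap memory use
            jedis.configSet("maxmemory-policy", "allkeys-lru");  // evict least recently used keys first
        }
    }
}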

Hibernate - how to return objects not tracked by session?

In Entity Framework there is an option called AutoDetectChangesEnabled which, when disabled, significantly improves performance when performing bulk operations.
Is there any equivalent in Hibernate, which could improve performance while selecting/inserting many records to the database?
Or maybe the question should be: is such an option really needed?
There are many options:
Session.setDefaultReadOnly() - looks like a direct equivalent of AutoDetectChangesEnabled. However, it only disables change detection but keeps the session cache enabled, because that cache is needed for other features. So it only affects performance, not memory consumption.
StatelessSession - has no session cache (doesn't keep references to entities at all), and because of that lacks many features of the regular Session.
Another common approach to this problem is to clear() the session periodically (say, after each 100 entities) during processing, or to evict() individual entities manually. This approach combines the advantages of the previous options, because it keeps the normal semantics of Session while discarding entities once they are no longer needed; see the sketch below.
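A minimal sketch of that periodic flush/clear pattern (MyEntity is a hypothetical mapped entity, and the batch size of 100 is an assumption to tune):

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

class MyEntity { /* stand-in for a mapped @Entity class */ }

public class BulkInsert {
    public static void insertAll(SessionFactory factory, Iterable<MyEntity> entities) {
        Session session = factory.openSession();
        Transaction tx = session.beginTransaction();
        int count = 0;
        for (MyEntity e : entities) {
            session.save(e);
            if (++count % 100 == 0) {  // after each 100 entities:
                session.flush();   // push the pending inserts to the database
                session.clear();   // drop all entities from the session cache
            }
        }
        tx.commit();
        session.close();
    }
}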

OLAP Saiku Cache expires

I'm using Saiku and PHPAnalytics to run MDX queries on my cube.
It seems that if I run queries, all is good and caching works fine. But if I come back after 2 hours and run those queries again, the cache is not used! Why? I need the cache to be kept for a long time. What should I do? I tried adding this to mondrian.properties:
mondrian.rolap.CachePool.costLimit = 2147483647
But it didn't help. What can I do?
The default in-memory cache of Mondrian stores things in a WeakHashMap. This means that it can be cleared at the discretion of the JVM's garbage collector. Most application servers are set up to force a periodic garbage collection sweep (usually every hour or so). One option is to tweak your JVM configuration so that this doesn't happen:
-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
You can also write your own implementation of the SegmentCache SPI. If your implementation uses hard references, they will never be collected. This is trickier to do and will require quite a bit of studying to get right. You can start by taking a look at the default implementation and work from there; a rough outline follows.
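The outline below shows only the core idea of a hard-reference store; to plug it into Mondrian you would implement mondrian.spi.SegmentCache around a map like this one. The real SPI has additional methods (listeners, header enumeration), so treat the method set here as an assumption and check the javadoc of your Mondrian version:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import mondrian.spi.SegmentBody;
import mondrian.spi.SegmentHeader;

public class HardRefSegmentStore {
    // Hard references: unlike a WeakHashMap, the GC never clears these entries
    private final Map<SegmentHeader, SegmentBody> cache = new ConcurrentHashMap<>();

    public SegmentBody get(SegmentHeader header) { return cache.get(header); }

    public boolean put(SegmentHeader header, SegmentBody body) {
        cache.put(header, body);
        return true;
    }

    public boolean remove(SegmentHeader header) { return cache.remove(header) != null; }

    public void tearDown() { cache.clear(); }  // release everything on shutdown
}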
The Mondrian cache should keep data until the cache is deliberately flushed. That said, it uses an aging system to determine what should stay cached: should it run out of memory to store the data, the oldest query gets pushed out of the cache and replaced.
I've not tried the PHPAnalytics stuff, but maybe it makes some call into the Saiku server to flush the cache on a regular basis; otherwise this shouldn't happen.

What should be stored in cache for web app?

I realize that this might be a vague question that begets a vague answer, but I'm in need of some real-world examples, thoughts, and/or best practices for caching data for a web app. All of the examples I've read are more technical in nature (how to add or remove cache data from the respective cache store), but I've not been able to find a higher-level strategy for caching.
For example, my web app has an inbox/mail feature for each user. What I've been doing to date is storing typical session data in the cache: when the user logs in, I go to the database, retrieve the user's mail messages, and store them in the cache. I'm beginning to wonder if I should instead maintain a copy of all users' messages in the cache all the time, and just retrieve them from the cache when needed, rather than loading from the database upon login. I load a bunch of other data on login as well (product catalogs and related entities), and login is starting to slow down.
So I guess my question to the community, is what would you do/recommend as an approach in this scenario?
Thanks.
This might be better suited to https://softwareengineering.stackexchange.com/, but generally you want to cache:
Metadata/configuration data that does not change frequently. E.g. country/state lists, external resource addresses, logic/branching settings, product/price/tax definitions, etc.
Data that is costly to retrieve or generate and that does not need to frequently change. E.g. historical data sets for reports.
Data that is unique to the current user's session.
The last item above is where you need to be careful, as you can drastically increase your app's memory usage by adding a few megabytes of data for every active session. It also implies different levels of caching: application-wide, per user session, etc.
Generally you should NOT cache data that is under active change.
In larger systems you also need to think about where the cache(s) will sit. Is it possible to have one central cache server, or is it good enough for each server/process to handle its own caching?
Also: you should have some method to quickly reset/invalidate the cached data. For a smaller or less mission-critical app, this could be as simple as restarting the web server. For the large system that I work on, we use a 12-hour absolute expiration window for most cached data, but we have a way of forcing immediate expiration if we need it; a sketch of such a cache follows.
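A minimal sketch of that kind of absolute-expiration cache (the class and its constructor argument are hypothetical illustrations, not a specific library):

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ExpiringCache<K, V> {
    private record Entry<T>(T value, Instant storedAt) {}

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final Duration maxAge;

    ExpiringCache(Duration maxAge) { this.maxAge = maxAge; }

    V get(K key) {
        Entry<V> e = store.get(key);
        if (e == null || e.storedAt().plus(maxAge).isBefore(Instant.now())) {
            store.remove(key);  // absent, or past the absolute expiration window
            return null;
        }
        return e.value();
    }

    void put(K key, V value) { store.put(key, new Entry<>(value, Instant.now())); }

    void flush() { store.clear(); }  // force immediate expiration of everything
}

// usage: ExpiringCache<String, Object> cache = new ExpiringCache<>(Duration.ofHours(12));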
This is a really broad question, and the answer depends heavily on the specific application/system you are building. I don't know enough about your specific scenario to say whether you should cache all the users' messages, but instinctively it seems like a bad idea, since you would effectively be caching your entire data set. This could lead to problems if new messages come in or get deleted. Would you then update them in the cache? Would that not simply duplicate the backing store?
Caching is only a performance optimization technique, and as with any optimization, measure first before making substantial changes, to avoid wasting time optimizing the wrong thing. Maybe you don't need much caching, and it would only complicate your app. Maybe the data you are thinking of caching can be retrieved in a faster way, or less of it can be retrieved at once.
Cache anything that causes duplicate database queries.
Client-side file caching is important as well. Assuming files are marked with an id in your database, cache them after the first network request to avoid repeated requests for the same file. A resource for doing this can be found here (https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API). If you don't need to cache files, web storage (localStorage/sessionStorage) and cookies are good for smaller pieces of data.
// cache-first lookup, shown here with the browser Cache API (simpler than IndexedDB for files); run inside an async function
const cache = await caches.open('file-cache');
let res = await cache.match(fileUrl);       // if file is in cache, refer to cache
if (!res) {                                 // else: make network request...
  res = await fetch(fileUrl);
  await cache.put(fileUrl, res.clone());    // ...and push file to cache
}
