OLAP Saiku Cache expires - caching

I'm using Saiku and PHPAnalytics to run MDX queries on my cube.
It seems that if I run queries, everything is fine and the cache is used. But if I come back after about two hours and run the same queries again, the cache is not used any more. Why? I need the cache to be kept for a long time. I tried adding this to mondrian.properties:
mondrian.rolap.CachePool.costLimit = 2147483647
But it didn't help. What should I do?

Mondrian's default in-memory cache stores its data in a WeakHashMap. This means it can be cleared at the discretion of the JVM's garbage collector. Most application servers are set up to do a periodic garbage-collection sweep (usually every hour or so). You either have to tweak your JVM's configuration so it doesn't do this, for example:
-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
Or you can implement your own cache through the SegmentCache SPI. If your implementation uses hard references, they will never be collected. This is trickier to do and will require quite a bit of studying to get right. You can start by taking a look at the default implementation and work from there.
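To see why the WeakHashMap matters, here is a small illustrative sketch (plain Java, not Mondrian code) showing how weakly referenced entries can disappear after a garbage-collection pass while hard references survive:

import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCacheDemo {
    public static void main(String[] args) {
        Map<Object, String> weakCache = new WeakHashMap<>();
        Map<Object, String> hardCache = new HashMap<>();

        Object weakKey = new Object();
        Object hardKey = new Object();
        weakCache.put(weakKey, "cached segment");
        hardCache.put(hardKey, "cached segment");

        // Drop our only reference to the weak key and suggest a GC run.
        weakKey = null;
        System.gc();

        // The WeakHashMap entry may already be gone; the HashMap entry never is.
        System.out.println("weak cache size: " + weakCache.size());
        System.out.println("hard cache size: " + hardCache.size());
    }
}

A SegmentCache implementation backed by hard references (like the HashMap above) is what keeps cached segments safe from the collector.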

The Mondrian cache should keep data until the cache is deliberately flushed. That said, it uses an aging system to decide what to keep: should it run out of memory to store the data, the oldest query gets pushed out of the cache and replaced.
I've not tried the PHPAnalytics stuff, but maybe it makes some call to the Saiku server that flushes the cache on a regular basis; otherwise this shouldn't happen.

Related

Solr Caching Update on Writes

I've been looking at potential ways to speed up Solr queries for an application I'm working on. I've read about Solr caching (https://wiki.apache.org/solr/SolrCaching), and I think the filter and query caches may be of some help. The application's config does set up these caches, but it looks like they use default settings that were never experimented with, and our cache hit rate is relatively low.
One detail I've not been able to determine is how the caches deal with updates. If I update records in a way that would add entries to or remove them from the query or filter cache, do the caches update in a performant way? The application is fairly write-heavy, so whether the caches update efficiently will probably determine whether trying to tune them helps much.
The short answer is that an update (add, edit, or delete) on your index followed by a commit operation produces a new version of the index that replaces the current one. Since caches are associated with a specific index version, they are discarded when the index is replaced. If autowarming is enabled, the caches for the new index will be primed with recent queries or with queries that you specify.
However, this is Solr that we're talking about and there are usually multiple ways to handle any situation. That is definitely the case here. The commit operation mentioned above is known as a hard commit and may or may not be happening depending on your Solr configuration and how your applications interact with it. There's another option known as a soft commit that I believe would be a good choice for your index. Here's the difference...
A hard commit means that the index is rebuilt and then persisted to disk. This ensures that changes are not lost, but is an expensive operation.
A soft commit means that the index is updated in memory and not persisted to disk. This is a far less expensive operation, but data could conceivably be lost if Solr is halted unexpectedly.
Going a step further, Solr has two nifty settings known as autoCommit and autoSoftCommit which I highly recommend. You should disable all hard commit operations in your application code if you enable auto commit. The autoCommit setting can specify a period of time to queue up document changes (maxTime) and/or the number of changes to allow in the queue (maxDocs). When either of these limits is reached, a hard commit is performed. The autoSoftCommit setting works the same way, but results in (you guessed it) a soft commit. Solr's documentation on UpdateHandlers is a good starting point to learn about this.
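As a rough sketch (the time and document limits below are placeholder values to tune, not recommendations), these settings live in the updateHandler section of solrconfig.xml:

<autoCommit>
  <maxTime>60000</maxTime>              <!-- hard commit at most every 60 seconds... -->
  <maxDocs>10000</maxDocs>              <!-- ...or after 10,000 queued changes -->
  <openSearcher>false</openSearcher>    <!-- persist to disk without opening a new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>               <!-- soft commit every 5 seconds so changes become searchable -->
</autoSoftCommit>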
These settings effectively make it possible to do batch updates instead of one at a time. In a write-heavy application such as yours, this is definitely a good idea. The optimal settings will depend upon the frequency of reads vs writes and, of course, the business requirements of the application. If near-real-time (NRT) search is a requirement, you may want autoSoftCommit set to a few seconds. If it's acceptable for search results to be a bit stale, then you should consider setting autoSoftCommit to a minute or even a few minutes. The autoCommit setting is usually set much higher as its primary function is data integrity and persistence.
I recommend a lot of testing in a non-production environment to decide upon reasonable caching and commit settings for your application. Given that your application is write-heavy, I would lean toward conservative cache settings and you may want to disable autowarming completely. You should also monitor cache statistics in production and reduce the size of caches with low hit rates. And, of course, keep in mind that your optimal settings will be a moving target, so you should review them periodically and make adjustments when needed.
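For reference, the caches being discussed are also defined in solrconfig.xml; a conservative, hypothetical starting point (sizes, autowarm counts, and even the cache classes vary by Solr version and workload) might look like:

<filterCache class="solr.FastLRUCache" size="256" initialSize="256" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="256" initialSize="256" autowarmCount="0"/>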
On a related note, the Seven Deadly Sins of Solr is a great read and relevant to the topic at hand. Best of luck and have fun with Solr!

How long does the Cache in Unity hold objects?

I am loading asset bundles from a server at runtime with "LoadFromCacheOrDownload()".
I wonder how long the asset bundles are stored there (how long exactly, and are they still there after a restart?).
Should I also save them to the filesystem or is the cache enough?
Thank you.
By default, cached data sticks around for 150 days before being deleted for being unused. So if you don't clear the cache before then, it will most likely still be there. Caching behaviour does depend on the cache size as well, which is 50 MiB for the Web platform and 4 GiB for other platforms.
With this in mind, it's up to you to decide whether the cache (and its behaviour) is enough for you, or whether you would be better off storing the data yourself as well.

What should be stored in cache for web app?

I realize that this might be a vague question that begets a vague answer, but I'm in need of some real-world examples, thoughts, and/or best practices for caching data for a web app. All of the examples I've read are more technical in nature (how to add or remove cache data from the respective cache store), but I've not been able to find a higher-level strategy for caching.
For example, my web app has an inbox/mail feature for each user. What I've been doing to date is storing typical session data in the cache. In this example, when the user logs in I go to the database and retrieve the user's mail messages and store them in cache. I'm beginning to wonder if I should just maintain a copy of all users' messages in the cache, all the time, and just retrieve them from cache when needed, instead of loading from the database upon login. I have a bunch of other data that's loaded on login (product catalogs and related entities) and login is starting to slow down.
So I guess my question to the community, is what would you do/recommend as an approach in this scenario?
Thanks.
This might be better suited to https://softwareengineering.stackexchange.com/, but generally you want to cache:
Metadata/configuration data that does not change frequently. E.g. country/state lists, external resource addresses, logic/branching settings, product/price/tax definitions, etc.
Data that is costly to retrieve or generate and that does not need to frequently change. E.g. historical data sets for reports.
Data that is unique to the current user's session.
The last item above is where you need to be careful, as you can drastically increase your app's memory usage by adding a few megabytes of data for every active session. It also implies different levels of caching -- application-wide, per user session, etc.
Generally you should NOT cache data that is under active change.
In larger systems you also need to think about where the cache(s) will sit. Is it possible to have one central cache server, or is it good enough for each server/process to handle its own caching?
Also: you should have some method to quickly reset/invalidate the cached data. For a smaller or less mission-critical app, this could be as simple as restarting the web server. For the large system that I work on, we use a 12 hour absolute expiration window for most cached data, but we have a way of forcing immediate expiration if we need it.
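As a minimal sketch of that expiration idea (plain Java with hypothetical names; the 12-hour window is just an example value):

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A tiny application-wide cache with absolute expiration and manual invalidation (Java 16+ for records).
public class ExpiringCache<K, V> {
    private record Entry<V>(V value, Instant expiresAt) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;

    public ExpiringCache(Duration ttl) { this.ttl = ttl; }

    public void put(K key, V value) {
        entries.put(key, new Entry<>(value, Instant.now().plus(ttl)));
    }

    public V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null || Instant.now().isAfter(e.expiresAt())) {
            entries.remove(key);   // expired or missing
            return null;
        }
        return e.value();
    }

    public void invalidate(K key) { entries.remove(key); }   // force immediate expiration of one entry
    public void clear() { entries.clear(); }                 // the "reset everything" lever
}

Usage might look like new ExpiringCache<String, List<String>>(Duration.ofHours(12)) for, say, a country/state list; a central cache server gives you the same idea shared across multiple processes.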
This is a really broad question, and the answer depends heavily on the specific application/system you are building. I don't know enough about your specific scenario to say if you should cache all the users' messages, but instinctively it seems like a bad idea since you would seem to be effectively caching your entire data set. This could lead to problems if new messages come in or get deleted. Would you then update them in the cache? Would that not simply duplicate the backing store?
Caching is only a performance optimization technique, and as with any optimization, measure first before making substantial changes, to avoid wasting time optimizing the wrong thing. Maybe you don't need much caching, and it would only complicate your app. Maybe the data you are thinking of caching can be retrieved in a faster way, or less of it can be retrieved at once.
Cache anything that causes duplicate database queries.
Client-side file caching is important as well. Assuming files are marked with an id in your database, cache each file the first time it is fetched so that repeated requests for the same file don't all hit the network. A resource for doing this with IndexedDB can be found here (https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API). If you don't need to cache whole files, web storage (local storage, session storage) and cookies are good for smaller pieces of data. In outline (the sketch below uses the browser Cache API as one simple way to do it):
async function getFile(url) {
  const cache = await caches.open('file-cache');
  const cached = await cache.match(url);      // if file is in cache, refer to cache
  if (cached) return cached;
  const response = await fetch(url);          // else make network request
  await cache.put(url, response.clone());     // and push file to cache
  return response;
}

How safe is it to store sessions with Redis?

I'm currently using MySQL to store my sessions. It works great, but it is a bit slow.
I've been asked to use Redis, but I'm wondering if it is a good idea because I've heard that Redis delays write operations. I'm a bit afraid because sessions need to be real-time.
Has anyone experienced such problems?
Redis is perfect for storing sessions. All operations are performed in memory, and so reads and writes will be fast.
The second aspect is persistence of session state. Redis gives you a lot of flexibility in how you want to persist session state to your hard disk. You can go through http://redis.io/topics/persistence to learn more, but at a high level, here are your options:
If you cannot afford to lose any sessions, set appendfsync always in your configuration file. With this, Redis guarantees that any write operations are saved to the disk. The disadvantage is that write operations will be slower.
If you are okay with losing about 1s worth of data, use appendfsync everysec. This will give you great performance with reasonable data guarantees.
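For illustration, the relevant directives live in redis.conf and could look like this (pick one of the appendfsync lines depending on the trade-off you choose):

appendonly yes          # enable append-only file (AOF) persistence
appendfsync always      # fsync on every write: safest, but slowest writes
# appendfsync everysec  # fsync once per second: fast, lose at most ~1s on a crash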
This question is really about real-time sessions, and it seems to have arisen partly from a misunderstanding of the phrase 'delayed write operations'. While the details were eventually teased out in the comments, I just wanted to make it super-duper clear...
You will have no problems implementing real-time sessions.
Redis is an in-memory key-value store with optional persistence to disk. 'Delayed write operations' refers to writes to disk, not to the database in general, which lives in memory. If you SET a key/value pair, you can GET it immediately (i.e. in real time). The persistence policy you select (how much you delay the writes to disk) determines the upper bound on how much data could be lost in a crash.
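For example, in redis-cli (the key name and 30-minute TTL are just illustrative; session stores usually set an expiry like this):

redis> SET session:abc123 "user=42"
OK
redis> EXPIRE session:abc123 1800
(integer) 1
redis> GET session:abc123
"user=42"

The GET sees the value immediately after the SET; persistence to disk happens in the background according to whichever policy you configured.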
Basically there are two main types of persistence available: async snapshots and an fsync'd append-only log. They're called RDB and AOF respectively. There is more on the persistence modes on the official page.
The signal handling of the daemonized process syncs to disk when it receives a SIGTERM, for instance, so the data will still be there after a reboot. I think the daemon or the OS has to crash before you'll see any corruption, even with the default settings (RDB snapshots).
The AOF setting uses an Append Only File that logs the commands the server receives, and recreates the DB from scratch on cold start, from the saved file. The default disk-sync policy is to flush once every second (IIRC) but can be set to lock and write on every command.
Using both the snapshots and the incremental log seems to combine a long-term, don't-mind-if-I-miss-a-few-seconds-of-data approach with a more secure, but more costly, incremental log. Redis supports clustering out of the box, so replication can be done too, it seems.
I'm using the default RDB setting myself and saving the snapshots to a remote FTP server. I haven't seen a failure that caused data loss yet. Acute hardware failure or a power outage would be the most likely cause, but I'm hosted on a VPS, so there's only a slim chance of that happening :)

How do I load the Oracle schema into memory instead of the hard drive?

I have a certain web application that makes upwards of ~100K updates to an Oracle database in succession. This can take anywhere from 3-5 minutes, which sometimes causes the webpage to time out. A re-design of the application is scheduled soon, but someone told me that there is a way to configure a "loader file" which loads the schema into memory and runs the transactions there instead of on the hard drive, supposedly improving speed by several orders of magnitude. I have tried to research this "loader file", but all I can find is information about SQL*Loader, the bulk data loader. Does anyone know what he's talking about? Is this really possible, and is it a feasible quick fix, or should I just wait until the application is re-designed?
Oracle already does its work in memory - disk I/O is managed behind the scenes. Frequently accessed data stays in memory in the buffer cache. Perhaps your informant was referring to "pinning" an object in memory, but that's really not effective in modern releases of Oracle (since V8), particularly for table data. Let Oracle do its job - it's actually very good at it (probably better than we are). Face it - 100K updates is going to take a while.
