How safe is it to store sessions with Redis?

I'm currently using MySQL to store my sessions. It works great, but it is a bit slow.
I've been asked to use Redis, but I'm wondering if it is a good idea because I've heard that Redis delays write operations. I'm a bit afraid because sessions need to be real-time.
Has anyone experienced such problems?

Redis is perfect for storing sessions. All operations are performed in memory, and so reads and writes will be fast.
The second aspect is persistence of session state. Redis gives you a lot of flexibility in how you persist session state to disk. You can go through http://redis.io/topics/persistence to learn more, but at a high level, here are your options:
If you cannot afford to lose any sessions, enable the append-only file (appendonly yes) and set appendfsync always in your configuration file. With this, Redis fsyncs every write to disk as it is appended to the log. The disadvantage is that write operations will be slower.
If you are okay with losing about 1s worth of data, use appendfsync everysec. This gives great performance with reasonable data guarantees.
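To make the trade-off concrete, here is a minimal Python sketch of an append-only log with a configurable fsync policy, loosely modeled on the appendfsync options. It is a toy, not Redis code: the class name AppendOnlyLog, the one-command-per-line text format and the policy strings are assumptions for illustration. Redis does its periodic fsync in the background; this sketch simply checks the clock on each append.

```python
import os
import time


class AppendOnlyLog:
    """Toy append-only log mimicking appendfsync always / everysec / no."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.file = open(path, "a", encoding="utf-8")
        self.last_fsync = time.monotonic()

    def append(self, command):
        # The write lands in the OS page cache first; it is not durable yet.
        self.file.write(command + "\n")
        self.file.flush()
        if self.policy == "always":
            # Durable before append() returns: safest, slowest.
            os.fsync(self.file.fileno())
        elif self.policy == "everysec":
            # At most about one second of writes can be lost in a crash.
            now = time.monotonic()
            if now - self.last_fsync >= 1.0:
                os.fsync(self.file.fileno())
                self.last_fsync = now
        # policy "no": leave flushing to disk entirely to the OS.

    def close(self):
        self.file.flush()
        os.fsync(self.file.fileno())
        self.file.close()
```

Whatever the policy, the data is readable from the file as soon as it is written; the policy only bounds how much could disappear if the machine crashes.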

This question is really about real-time sessions, and seems to have arisen partly from a misunderstanding of the phrase 'delayed write operations'. While the details were eventually teased out in the comments, I just wanted to make it super-duper clear...
You will have no problems implementing real-time sessions.
Redis is an in-memory key-value store with optional persistence to disk. 'Delayed write operations' refers to writes to disk, not to the database in general, which lives in memory. If you SET a key/value pair, you can GET it immediately (i.e. in real time). The persistence policy you select (how much you delay the disk writes) determines the upper bound on how much data could be lost in a crash.
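A toy Python model of that distinction, assuming nothing about Redis internals: the hypothetical MemoryFirstStore updates its in-memory dict immediately and only queues the disk write. Reads never wait for the disk; the pending queue is what a crash could lose.

```python
import json


class MemoryFirstStore:
    """Toy store: reads and writes hit memory; disk persistence is deferred."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self.pending = []  # writes not yet persisted to disk

    def set(self, key, value):
        self.data[key] = value  # visible to readers immediately
        self.pending.append((key, value))

    def get(self, key):
        return self.data.get(key)  # never waits for the disk

    def persist(self):
        # The "delayed write": whatever is still pending when the process
        # crashes is exactly the data that would be lost.
        with open(self.path, "a", encoding="utf-8") as f:
            for key, value in self.pending:
                f.write(json.dumps([key, value]) + "\n")
        self.pending.clear()
```

A session written with set() is returned by get() right away, before persist() has ever run.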

Basically, there are two main types available: asynchronous snapshots and an fsync()-ed append-only log. They're called RDB and AOF respectively. More on persistence modes on the official page.
The signal handling of the daemonized process syncs to disk when it receives a SIGTERM, for instance, so the data will still be there after a reboot. I think the daemon or the OS has to crash before you'll lose data, even with the default settings (RDB snapshots); in that case you lose whatever was written since the last snapshot.
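The snapshot approach can be sketched in Python: serialize the whole dataset to a temporary file, fsync it, then atomically rename it over the previous dump, so a crash mid-write never leaves a half-written snapshot behind. Redis's RDB save also writes to a temporary file and renames it, but the JSON format, the function names and the SIGTERM hook here are assumptions of this sketch.

```python
import json
import os
import signal
import sys
import tempfile


def save_snapshot(data, path):
    """Write `data` to `path` atomically (temp file + fsync + rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)  # never leave a partial temp file around
        raise


def load_snapshot(path):
    """Return the last complete snapshot, or an empty dataset."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def install_sigterm_handler(data, path):
    """On SIGTERM, save a final snapshot before exiting."""
    def handler(signum, frame):
        save_snapshot(data, path)
        sys.exit(0)
    signal.signal(signal.SIGTERM, handler)
```

Because the rename is atomic, a reader (or a restart) sees either the old snapshot or the new one, never a mix.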
The AOF setting uses an Append Only File that logs the write commands the server receives and, on a cold start, recreates the DB from scratch by replaying that file. The default disk-sync policy is to fsync once every second (appendfsync everysec), but it can be set to fsync on every command (appendfsync always).
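The replay step is simple to picture. The real AOF stores commands in the Redis protocol (RESP) format; this Python sketch assumes a simplified one-command-per-line text format and supports only SET and DEL.

```python
def replay_aof(lines):
    """Rebuild an in-memory dataset by replaying logged write commands."""
    data = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        cmd = parts[0].upper()
        if cmd == "SET" and len(parts) == 3:
            data[parts[1]] = parts[2]
        elif cmd == "DEL":
            for key in parts[1:]:
                data.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unsupported command: {line!r}")
    return data
```

Replaying ["SET a 1", "SET b 2", "DEL a"] yields {"b": "2"}: only the final state survives, which is also why the log keeps growing until it is rewritten.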
Using both the snapshots and the incremental log seems to combine a long-term, don't-mind-if-I-miss-a-few-seconds-of-data approach with a more secure, but costlier, incremental log. Redis also supports replication out of the box, so that can be done too.
I'm using the default RDB setting myself and saving the snapshots to a remote FTP server. I haven't seen a failure that caused data loss yet. Acute hardware failures or power outages would most likely cause one, but I'm hosted on a VPS. Slim chance of that happening :)

Related

Is it good practice to cache Redis data in ngx.shared?

I have some Lua code embedded in nginx. In this code I fetch some small pieces of data from a Redis cache. Now I wonder whether it is good practice to cache this data (already cached, in a sense) in nginx, using the ngx.shared construct. Are there any pros and cons of doing it this way? In pseudo-code, I expect to have something like:
local cache = ngx.shared.cache  -- assumes e.g. "lua_shared_dict cache 10m;" in nginx.conf
local value = cache:get("cached_key")
if value == nil then
    -- ... get data from Redis into value
    cache:set("cached_key", value)
end
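As stated in the documentation, ngx.shared is a memory zone shared among all the worker processes of the nginx server.
Each individual operation is atomic, so you only have to worry about race conditions if you use two operations on ngx.shared one after the other (e.g. a get followed by a set). In that case, they should be protected by a lock. Note that ngx.semaphore only synchronizes light threads within a single worker process; to lock across workers you need a shared-memory lock such as lua-resty-lock.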
The pros:
Using ngx.shared provides faster access to the data, because you avoid a request/response loop to the Redis server.
Even if you need a lock, you can expect faster access to the data (but I have no benchmark to provide).
The cons:
The ngx.shared cache can serve stale data, as your local cache does not necessarily reflect the current Redis value. This is not always a crucial point, as there can always be a delta between the values used in the worker and the value stored in Redis.
Data stored in ngx.shared can be inconsistent, which is more important. For instance, it can store x=true and y=false whereas in Redis x and y always have the same value. It depends on how you update your local cache.
You have to manage the cache yourself, updating the values in your cache whenever they are sent to Redis. This can easily be done by wrapping the Redis functions. Expect bugs if you handle updates by adding code after each write call to Redis, because you (or someone else) will forget it.
You also have to handle reads: whenever a value is not found in your ngx.shared cache, you have to automatically read it from Redis. Expect bugs if you handle reads by adding code after each call to cache:get, because you (or someone else) will forget it.
For the two last points, you can easily write a small wrapper module.
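The shape of such a wrapper, sketched in Python rather than Lua. LocalCacheWrapper is a hypothetical name, and plain dicts stand in for ngx.shared and for the Redis client. Reads fall back to the backing store on a miss (read-through) and writes update both stores (write-through), so callers cannot forget either step.

```python
class LocalCacheWrapper:
    """Local cache in front of a slower backing store (read/write-through)."""

    def __init__(self, backend, local=None):
        self.backend = backend  # stands in for the Redis client
        self.local = {} if local is None else local  # stands in for ngx.shared
        self.backend_reads = 0

    def get(self, key):
        value = self.local.get(key)
        if value is None:
            # Miss: read from the backend and populate the local cache.
            self.backend_reads += 1
            value = self.backend.get(key)
            if value is not None:
                self.local[key] = value
        return value

    def set(self, key, value):
        # Write to the backend first, then refresh the local copy, so the
        # local cache never holds a value the backend has not seen.
        self.backend[key] = value
        self.local[key] = value
```

Note that this only keeps one instance's cache consistent: writes made to Redis by another server never reach this local copy.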
As a conclusion:
If your server runs only one instance, with one or several workers, using ngx.shared is worthwhile, as you can keep a cache of your Redis data that is always up to date (provided all writes to Redis go through that instance).
If your server runs several instances and having an always up-to-date cache is mandatory, or if you could have consistency problems, then you should avoid caching using ngx.shared.
In all cases, if the size of your data can be huge, make sure to provide a way to clean it before memory consumption is too high. If you cannot provide cleaning, then you should not use ngx.shared.
Also, do not forget to store the cached value in a local variable, to avoid getting it again and again and thus improve efficiency.
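One common way to keep a local cache from growing without bound is to cap it and evict the least recently used entry. A minimal Python sketch with collections.OrderedDict; the entry-count limit is an assumption of the sketch. ngx.shared dicts are instead sized in bytes by the lua_shared_dict directive, and according to the lua-nginx-module documentation their set() evicts least-recently-used items when the zone is full.

```python
from collections import OrderedDict


class LRUCache:
    """Cache holding at most `capacity` entries; evicts least recently used."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self.entries:
            return default
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def set(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop least recently used
```

With a capacity of 2, setting a and b, reading a, then setting c evicts b, because a was used more recently.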

2 instances of Redis: as a cache and as a persistent datastore

I want to set up 2 instances of Redis because I have different requirements for the data I want to store in Redis. While I sometimes do not mind losing some data that is used primarily as cached data, I want to avoid losing data in other cases, such as when I use Python RQ, which stores the jobs to execute in Redis.
I have listed below the main settings to achieve such a goal.
What do you think?
Did I forget anything important?
1) Redis as a cache
# Snapshotting to not rebuild the whole cache if it has to restart
# Be reasonable so as not to degrade performance
save 900 1
save 300 10
save 60 10000
# Define a max memory and evict the least recently used keys
# Set X according to your needs (redis.conf does not allow a comment
# on the same line as a directive)
maxmemory X
maxmemory-policy allkeys-lru
maxmemory-samples 5
# The rdb file name
dbfilename dump.rdb
# The working directory.
dir ./
# Make sure appendonly is disabled
appendonly no
2) Redis as a persistent datastore
# Disable snapshotting since we will save each request, see appendonly
save ""
# No limit in memory
# How to disable it? By not defining it in the config file?
# maxmemory
# Enable appendonly
appendonly yes
appendfilename redis-aof.aof
# Save on each request to not lose any data
appendfsync always
no-appendfsync-on-rewrite no
# Rewrite the AOF file; choose a good min size based on the approximate size of the DB?
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 32mb
aof-rewrite-incremental-fsync yes
aof-load-truncated yes
Sources:
http://redis.io/topics/persistence
https://raw.githubusercontent.com/antirez/redis/2.8/redis.conf
http://fr.slideshare.net/eugef/redis-persistence-in-practice-1
http://oldblog.antirez.com/post/redis-persistence-demystified.html
How to perform Persistence Store in Redis?
https://www.packtpub.com/books/content/implementing-persistence-redis-intermediate
I think your persistence options are too aggressive - but it mostly depends on the nature and the volume of your data.
For the cache, using RDB is a good idea, but keep in mind that, depending on the volume of data, dumping the content of memory to disk has a cost. On my system, Redis can write memory data at 400 MB/s, but note that data may (or may not) be compressed and may (or may not) use dense data structures, so your mileage will vary. With your settings, a cache under heavy writes will generate a dump every minute. You have to check that, with the volume you have, the dump duration is well below that minute (something like 6-10 seconds would be fine). Actually, I would recommend keeping only save 900 1 and removing the other save lines. Even a dump every 15 min could be considered too frequent, especially if you have SSD hardware that will progressively wear out.
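The arithmetic behind that advice, as a small Python sketch. snapshot_due approximates how a list of "save <seconds> <changes>" rules triggers a dump (at least that many changes and at least that many seconds since the last save), and dump_seconds estimates dump time as dataset size divided by write throughput. Both are simplified models, with 400 MB/s taken from the figure above as a default, not Redis code.

```python
# Save rules from the question, as (seconds, min_changes) pairs.
SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]


def snapshot_due(rules, seconds_since_save, changes_since_save):
    """True if any `save <seconds> <changes>` rule would trigger a dump."""
    return any(
        seconds_since_save >= seconds and changes_since_save >= changes
        for seconds, changes in rules
    )


def dump_seconds(dataset_mb, write_mb_per_s=400.0):
    """Rough dump duration: dataset size / disk write throughput."""
    return dataset_mb / write_mb_per_s
```

Under heavy writes (20000 changes per minute), snapshot_due(SAVE_RULES, 60, 20000) is true, so a dump starts every minute; keeping only (900, 1) makes it false. A 4000 MB dataset gives dump_seconds(4000) == 10.0, already at the top of the 6-10 s budget.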
For the persistent store, you also need to define the dir parameter (since it also controls the location of the AOF file). The appendfsync always option is overkill and too slow for most purposes, unless you have very low throughput. You should set it to everysec. If you cannot afford to lose a single bit of data even in case of a system crash, then using Redis as a storage backend is not a good idea. Finally, you will probably have to adjust auto-aof-rewrite-percentage and auto-aof-rewrite-min-size to the write throughput the Redis instance has to sustain.
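How those two settings interact, as a Python sketch modeled on the behaviour described in redis.conf: Redis remembers the AOF size after the last rewrite (the base size) and triggers a rewrite when the file is larger than auto-aof-rewrite-min-size and has grown by at least auto-aof-rewrite-percentage over the base. A percentage of 0 disables automatic rewrites. The function name and the integer growth computation are choices of this sketch.

```python
def aof_rewrite_due(current_size, base_size, percentage=100,
                    min_size=32 * 1024 * 1024):
    """Approximate auto-aof-rewrite-percentage / -min-size check (bytes)."""
    if percentage == 0:
        return False  # 0 disables automatic rewrites
    if current_size <= min_size:
        return False  # too small to bother rewriting
    base = base_size or 1  # avoid dividing by zero for a fresh log
    growth = current_size * 100 // base - 100  # growth over base, in percent
    return growth >= percentage
```

With the question's values (100, 32mb), a log rewritten down to 32 MB is rewritten again once it reaches 64 MB; at a high write rate, raising min_size reduces how often that happens.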
I totally agree with @Didier; this is more of a supplement than a full answer.
First note that Redis offers tunable persistence: you can use RDB and/or AOF. While your choice of using RDB for a persistent cache makes perfect sense, I would recommend considering both for your persistent store. This will give you both point-in-time recovery based on the snapshots (i.e. backups) and post-crash recovery up to the last recorded operation with the AOF.
For the persistent store, you don't want to set maxmemory to 0 (which is the default if it is commented out in the conf file). When set to 0, Redis will use as much memory as the OS will give it so eventually, as your dataset grows, you will run into a situation where the OS will kill it to free memory (this often happens when you least expect it ;)). You should, instead, use a real value that's based on the amount of RAM that your server has with enough padding for the OS. For example, if your server has 16GB of RAM, as a rule of thumb I'd restrict Redis from using more than 14GB.
But there's a catch. Since you've read everything about Redis persistence, you probably remember that Redis forks to write the data to disk. Forking can more than double the memory consumption (forked copy + changes) during the child process' execution, so you need to make sure that your server has enough free memory to accommodate that if you use data persistence. Also note that your maxmemory calculation should account for other potential memory-consuming thingies such as replication and client buffers, depending on what/how you and the app use Redis.
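A rough sizing helper in Python that turns the two rules of thumb above into numbers. The 2 GB OS reserve matches the 16 GB to 14 GB example; halving the remainder when persistence is on is a deliberately pessimistic assumption (every page copied during the fork), and the helper ignores replication and client buffers.

```python
def suggest_maxmemory_gb(total_ram_gb, os_reserve_gb=2.0, persistence=True):
    """Conservative maxmemory budget, in GB, for a dedicated Redis host.

    With RDB snapshots or AOF rewrites, assume the worst case in which the
    forked child's copy-on-write pages double the dataset's footprint.
    """
    usable = total_ram_gb - os_reserve_gb
    if usable <= 0:
        raise ValueError("not enough RAM left after the OS reserve")
    return usable / 2 if persistence else usable
```

For a 16 GB server this gives 14.0 GB without persistence and 7.0 GB with it. In practice copy-on-write usually copies only the pages modified while the child runs, so the real figure lies somewhere in between.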

OLAP Saiku Cache expires

I'm using Saiku and PHPAnalytics to run MDX queries on my cube.
It seems that if I run queries, it's all good and caching works fine. But if I wait 2 hours and run those queries again, it does not use the cache! Why? I need the cache to be kept for a long time! What can I do? I tried adding this to mondrian.properties:
mondrian.rolap.CachePool.costLimit = 2147483647
But it didn't help. What should I do?
The default in-memory cache of Mondrian stores things in a WeakHashMap. This means that it could be cleared at the discretion of the JVM's garbage collector. Most application servers are set up to do a periodic garbage collection sweep (usually every hour or so). One option is to tweak your JVM's configuration so it does not do this as often. These properties set how often RMI's distributed garbage collector triggers a full collection, in milliseconds (3600000 = 1 hour), so raise them to make the sweeps less frequent:
-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
Alternatively, you can write your own implementation of the SegmentCache SPI. If your implementation uses hard references, they will never be collected. This is trickier to do and will require quite a bit of studying to get right. You can start by looking at the default implementation and work from there.
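The weak versus hard reference difference can be shown with a Python analogue (not Mondrian code). Python's weakref.WeakValueDictionary drops an entry once nothing else references its value; Java's WeakHashMap holds its keys weakly instead, but the effect on a cache is similar. Segment is a hypothetical stand-in for a cached query result.

```python
import gc
import weakref


class Segment:
    """Hypothetical stand-in for a cached query result."""

    def __init__(self, rows):
        self.rows = rows


def demo():
    weak_cache = weakref.WeakValueDictionary()
    strong_cache = {}
    weak_cache["q1"] = Segment([1, 2, 3])    # only weakly referenced
    strong_cache["q1"] = Segment([1, 2, 3])  # hard reference keeps it alive
    gc.collect()  # give the collector a chance to run
    return "q1" in weak_cache, "q1" in strong_cache
```

demo() returns (False, True): the weakly held segment is gone after collection, while the hard reference survives until the cache itself removes it.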
The Mondrian cache should cache until the cache is deliberately flushed. That said, it uses an aging system to decide what stays cached: should it run out of memory to store the data, the oldest query gets pushed out of the cache and replaced.
I've not tried the PHPAnalytics stuff, but maybe they've put some call into the Saiku server to flush the cache on a regular basis; otherwise this shouldn't happen.

What should be stored in cache for web app?

I realize that this might be a vague question that begets a vague answer, but I'm in need of some real-world examples, thoughts and/or best practices for caching data for a web app. All of the examples I've read are more technical in nature (how to add or remove data from the respective cache store), but I've not been able to find a higher-level strategy for caching.
For example, my web app has an inbox/mail feature for each user. What I've been doing to date is storing typical session data in the cache. In this example, when the user logs in I go to the database and retrieve the user's mail messages and store them in cache. I'm beginning to wonder if I should just maintain a copy of all users' messages in the cache, all the time, and just retrieve them from cache when needed, instead of loading from the database upon login. I have a bunch of other data that's loaded on login (product catalogs and related entities) and login is starting to slow down.
So I guess my question to the community, is what would you do/recommend as an approach in this scenario?
Thanks.
This might be better suited to https://softwareengineering.stackexchange.com/, but generally you want to cache:
Metadata/configuration data that does not change frequently. E.g. country/state lists, external resource addresses, logic/branching settings, product/price/tax definitions, etc.
Data that is costly to retrieve or generate and that does not need to frequently change. E.g. historical data sets for reports.
Data that is unique to the current user's session.
The last item above is where you need to be careful, as you can drastically increase your app's memory usage by adding a few megabytes of data for every active session. It also implies different levels of caching: application-wide, user session, etc.
Generally you should NOT cache data that is under active change.
In larger systems you also need to think about where the cache(s) will sit. Is it possible to have one central cache server, or is it good enough for each server/process to handle its own caching?
Also: you should have some method to quickly reset/invalidate the cached data. For a smaller or less mission-critical app, this could be as simple as restarting the web server. For the large system that I work on, we use a 12 hour absolute expiration window for most cached data, but we have a way of forcing immediate expiration if we need it.
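A minimal Python sketch of that policy: each entry expires a fixed time after it is stored (absolute expiration, 12 hours by default here), and invalidate() forces immediate expiration of one key or of everything. ExpiringCache and its injectable clock are assumptions of this sketch; the clock parameter just makes expiry testable.

```python
import time


class ExpiringCache:
    """Entries expire a fixed time after being stored (absolute expiration)."""

    def __init__(self, ttl_seconds=12 * 3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        item = self.entries.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self.clock() >= expires_at:
            del self.entries[key]  # expired: drop it and report a miss
            return default
        return value

    def invalidate(self, key=None):
        """Force expiration of one key, or of everything if key is None."""
        if key is None:
            self.entries.clear()
        else:
            self.entries.pop(key, None)
```

Absolute expiration bounds how stale any entry can get, regardless of how often it is read; invalidate() covers the cases where you cannot wait.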
This is a really broad question, and the answer depends heavily on the specific application/system you are building. I don't know enough about your specific scenario to say if you should cache all the users' messages, but instinctively it seems like a bad idea since you would seem to be effectively caching your entire data set. This could lead to problems if new messages come in or get deleted. Would you then update them in the cache? Would that not simply duplicate the backing store?
Caching is only a performance optimization technique, and as with any optimization, measure first before making substantial changes, to avoid wasting time optimizing the wrong thing. Maybe you don't need much caching, and it would only complicate your app. Maybe the data you are thinking of caching can be retrieved in a faster way, or less of it can be retrieved at once.
Cache anything that causes duplicate database queries.
Client-side file caching is important as well. Assuming files are marked with an id in your database, cache them after the first network request to avoid making many network requests for the same file. The IndexedDB API is one way to do this (https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API). If you don't need to cache files, Web Storage (localStorage/sessionStorage) and cookies are fine for smaller pieces of data.
//if file is in cache
//refer to cache
//else
//make network request and push file to cache

How do I load the Oracle schema into memory instead of the hard drive?

I have a web application that makes upwards of ~100 updates to an Oracle database in succession. This can take anywhere from 3-5 minutes, which sometimes causes the webpage to time out. A redesign of the application is scheduled soon, but someone told me that there is a way to configure a "loader file" which loads the schema into memory and runs the transactions there instead of on the hard drive, supposedly improving speed by several orders of magnitude. I have tried to research this "loader file", but all I can find is information about SQL*Loader, the bulk data loader. Does anyone know what he's talking about? Is this really possible, and is it a feasible quick fix, or should I just wait until the application is redesigned?
Oracle already does its work in memory; disk I/O is managed behind the scenes. Frequently accessed data stays in memory in the buffer cache. Perhaps your informant was referring to "pinning" an object in memory, but that's really not effective in modern releases of Oracle (since V8), particularly for table data. Let Oracle do its job; it's actually very good at it (probably better than we are). Face it: 100K updates is going to take a while.
