Configuring redis to consistently evict older data first - caching

I'm storing a bunch of realtime data in redis. I'm setting a TTL of 14400 seconds (4 hours) on all of the keys. I've set maxmemory to 10G, which currently is not enough space to fit 4 hours of data in memory, and I'm not using virtual memory, so redis is evicting data before it expires.
I'm okay with redis evicting the data, but I would like it to evict the oldest data first. So even if I don't have a full 4 hours of data, at least I can have some range of data (3 hours, 2 hours, etc) with no gaps in it. I tried to accomplish this by setting maxmemory-policy=volatile-ttl, thinking that the oldest keys would be evicted first since they all have the same TTL, but it's not working that way. It appears that redis is evicting data somewhat arbitrarily, so I end up with gaps in my data. For example, today the data from 2012-01-25T13:00 was evicted before the data from 2012-01-25T12:00.
Is it possible to configure redis to consistently evict the older data first?
Here are the relevant lines from my redis.conf file. Let me know if you want to see any more of the configuration:
maxmemory 10gb
maxmemory-policy volatile-ttl
vm-enabled no

AFAIK, it is not possible to configure Redis to consistently evict the older data first.
When the *-ttl or *-lru options are chosen in maxmemory-policy, Redis does not use an exact algorithm to pick the keys to be removed. An exact algorithm would require an extra list (for *-lru) or an extra heap (for *-ttl) in memory, cross-referenced with the normal Redis dictionary data structure. It would be expensive in terms of memory consumption.
With the current mechanism, evictions occur in the main event loop (i.e. potential evictions are checked at each loop iteration, before each command is executed). Until memory is back under the maxmemory limit, Redis randomly picks a sample of n keys and selects for eviction the most idle one (for *-lru) or the one closest to its expiration (for *-ttl). By default only 3 samples are considered. The result is non-deterministic.
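To illustrate, the selection step looks roughly like this (a Python sketch of the algorithm described above, not Redis's actual C code):

import random

def pick_victim(volatile_keys, maxmemory_samples=3):
    # volatile_keys: list of (key, seconds_until_expiry) pairs.
    # Redis inspects only a small random sample, not the whole keyspace...
    sample = random.sample(volatile_keys, min(maxmemory_samples, len(volatile_keys)))
    # ...and evicts the sampled key closest to expiration (volatile-ttl).
    # With 3 samples, an older key can easily survive while a newer one
    # is evicted, which produces the gaps described in the question.
    return min(sample, key=lambda pair: pair[1])[0]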
One way to increase the accuracy of this algorithm and mitigate the problem is to increase the number of considered samples (maxmemory-samples parameter in the configuration file).
Do not set it too high, since it will consume some CPU. It is a tradeoff between eviction accuracy and CPU consumption.
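For example, in redis.conf (10 is only an illustrative value; it trades CPU for accuracy as described above):

maxmemory-samples 10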
Now if you really require a consistent behavior, one solution is to implement your own eviction mechanism on top of Redis. For instance, you could add a list (for non updatable keys) or a sorted set (for updatable keys) in order to track the keys that should be evicted first. Then, you add a daemon whose purpose is to periodically check (using INFO) the memory consumption and query the items of the list/sorted set to remove the relevant keys.
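As a rough sketch of that approach, using Python and the redis-py client (the index key name, memory threshold, and batch size here are all hypothetical):

import time
import redis

r = redis.Redis()
MEMORY_LIMIT = 9 * 1024 ** 3        # evict before Redis's own 10gb limit kicks in
EVICTION_INDEX = "eviction:index"   # sorted set: member = data key, score = insertion time

def store(key, value, ttl=14400):
    # Write the data and record the key in the index, scored by
    # insertion time so the oldest keys always sort first.
    r.setex(key, ttl, value)
    r.zadd(EVICTION_INDEX, {key: time.time()})

def eviction_daemon(interval=1.0, batch=100):
    while True:
        if r.info("memory")["used_memory"] > MEMORY_LIMIT:
            # Deterministically evict the oldest keys first.
            oldest = r.zrange(EVICTION_INDEX, 0, batch - 1)
            if oldest:
                r.delete(*oldest)   # deleting an already-expired key is a no-op
                r.zrem(EVICTION_INDEX, *oldest)
        time.sleep(interval)

Keep the daemon's threshold comfortably below maxmemory so that your deterministic eviction runs before Redis's own sampling-based eviction ever triggers.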
Please note other caching systems have their own way to deal with this problem. For instance with memcached, there is one LRU structure per slab (which depends on the object size), so the eviction order is also not accurate (although more deterministic than with Redis in practice).

Related

Cassandra client code with high read throughput with row_cache optimization

Can someone point me to cassandra client code that can achieve a read throughput of at least hundreds of thousands of reads/s if I keep reading the same record (or even a small number of records) over and over? I believe row_cache_size_in_mb is supposed to cache frequently used records in memory, but setting it to say 10MB seems to make no difference.
I tried cassandra-stress of course, but the highest read throughput it achieves with 1KB records (-col size=UNIFORM\(1000..1000\)) is ~15K/s.
With low numbers like above, I can easily write an in-memory hashmap based cache that will give me at least a million reads per second for a small working set size. How do I make cassandra do this automatically for me? Or is it not supposed to achieve performance close to an in-memory map even for a tiny working set size?
Can someone point me to cassandra client code that can achieve a read throughput of at least hundreds of thousands of reads/s if I keep reading the same record (or even a small number of records) over and over?
There are some solutions for this scenario.
One idea is to use the row cache, but be careful: any update/delete to a single column will invalidate the whole partition in the cache, so you lose all the benefit. The row cache is best used for small datasets that are frequently read but almost never modified.
Are you sure that your cassandra-stress scenario never updates or writes to the same partition over and over again?
Here are my findings: when I enable row_cache, counter_cache, and key_cache all to sizable values, I am able to verify using "top" that cassandra does no disk I/O at all; all three seem necessary to ensure no disk activity. Yet, despite zero disk I/O, the throughput is <20K/s even for reading a single record over and over. This likely confirms (as also alluded to in my comment) that cassandra incurs the cost of serialization and deserialization even if its operations are completely in-memory, i.e., it is not designed to compete with native hashmap performance. So, if you want to get native hashmap speeds for a small-working-set workload but expand to disk if the map grows big, you would need to write your own cache on top of cassandra (or any of the other key-value stores like mongo, redis, etc. for that matter).
For those interested, I also verified that redis is the fastest among cassandra, mongo, and redis for a simple get/put small-working-set workload, but even redis gets at best ~35K/s read throughput (largely independent, by design, of the request size), which hardly comes anywhere close to native hashmap performance that simply returns pointers and can do so comfortably at over 2 million/s.

What values to use for a Redis memory budget?

I'm working on an app that will be using Redis to store some end-user session state details. There will be many tens to hundreds of millions of key/value pairs (unordered sets) with expiries set (to the second). To gauge whether or not my existing Redis installation will run into memory exhaustion problems, I need to create a budget for my app's expected and worst-case Redis memory usage.
The size of the raw data, the number of keys, key/value pair lifespans, etc. are well known and in the calculation spreadsheet. For Redis-specific things, like expiry value size, I just have wild guesses.
For my memory budget calculation, what values should I use for:
per key redis overhead (bytes? percent?)
per key expiry size (seconds, not ms)
per key set size/overhead
per set item size/overhead
(Other per key or data type information might be helpful to others.)
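Since these overheads vary with the Redis version and build (e.g. 32- vs 64-bit pointer sizes), one pragmatic way to fill in those spreadsheet cells is to measure them empirically: insert a representative batch of keys on a scratch instance and diff used_memory from INFO. A minimal sketch with redis-py (the key shape, set members, and TTL are placeholders for your real data):

import redis

r = redis.Redis()   # point this at a scratch instance: flushdb() is destructive
N = 100000

r.flushdb()
base = r.info("memory")["used_memory"]

pipe = r.pipeline(transaction=False)
for i in range(N):
    key = "session:%012d" % i
    pipe.sadd(key, "member-a", "member-b", "member-c")  # small unordered set
    pipe.expire(key, 3600)                              # per-key expiry, in seconds
pipe.execute()   # for very large N, execute the pipeline in chunks

used = r.info("memory")["used_memory"]
print("approx. bytes per key (set + expiry + overhead): %.1f" % ((used - base) / float(N)))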

How slow is Redis when full and evicting keys? (LRU algorithm)

I am using Redis in a Java application, where I am reading log files and storing/retrieving some info in Redis for each log entry. Keys are IP addresses from my log file, which means that new keys keep arriving, even if the same ones appear regularly.
At some point, Redis reaches its maxmemory size (3gb in my case), and starts evicting some keys. I use the "allkeys-lru" setting as I want to keep the youngest keys.
The whole application then slows a lot, taking 5 times longer than at the beginning.
So I have three questions:
is it normal to have such a dramatic slowdown (5 times longer)? Did anybody experience such slowdown? If not, I may have another issue in my code (improbable as the slowdown appears exactly when Redis reaches its limit)
can I improve my config? I tried to change the maxmemory-samples setting without much success
should I consider an alternative for my particular problem? Is there an in-memory DB that could handle evicting keys with better performance? I may consider a pure Java object (HashMap...), even if it doesn't look like a good design.
edit 1:
we use 2 DBs in Redis
edit 2:
We use Redis 2.2.12 (Ubuntu 12.04 LTS). Further investigation explained the issue: we are using db0 and db1 in Redis. db1 is used much less than db0, and its keys are totally different. When Redis reaches maxmemory (and the LRU algorithm starts evicting keys), Redis removes almost all the db1 keys, which drastically slows all calls. This is strange behavior, probably unusual and maybe linked to our application. We fixed the issue by moving to another (better) memory mechanism for the keys that were loaded in db1.
thanks!
I'm not convinced Redis is the best option for your use case.
Redis "LRU" is only a best-effort algorithm (i.e. quite far from an exact LRU). Redis tracks memory allocations and knows when it has to free some memory. This is checked before the execution of each command. The mechanism to evict a key in "allkeys-lru" mode consists in choosing maxmemory-samples random keys, comparing their idle times, and selecting the most idle one. Redis repeats these operations until the used memory is back below maxmemory.
The higher maxmemory-samples, the more CPU is consumed, but the more accurate the result.
Provided you do not explicitly use the EXPIRE command, there is no other overhead to be associated with key eviction.
Running a quick test with Redis benchmark on my machine results in a throughput of:
145 Kops/s when no eviction occurs
125 Kops/s when 50% eviction occurs (i.e. 1 key out of 2 is evicted).
I cannot reproduce the 5 times factor you experienced.
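For reference, a comparable test can be driven with the bundled redis-benchmark tool against an instance configured with a small maxmemory and allkeys-lru; something along these lines (sizes are illustrative):

redis-benchmark -t set -n 1000000 -r 10000000 -d 100 -q

Here -r makes the random keyspace far larger than what fits under maxmemory, so a large fraction of the SETs trigger evictions.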
The obvious recommendation to reduce the overhead of eviction is to decrease maxmemory-samples, but it also means a dramatic decrease of the accuracy.
My suggestion would be to give memcached a try. The LRU mechanism is different. It is still not exact (it applies only on a per-slab basis), but it will likely give better results than Redis on this use case.
Which version of Redis are you using? The 2.8 version (quite recent) improved the expiration algorithm and if you are using 2.6 you might give it a try.
http://download.redis.io/redis-stable/00-RELEASENOTES

Redis CPU performance on sorted sets

We are running Redis and doing hundreds of increments per second of keys in a sorted set, and at the same time doing thousands of reads on the sorted set every second as well.
This seems to be working well, but during peak load CPU usage gets pretty high, 80% of a single core. The sorted set itself has a small memory footprint, with a few thousand keys.
Is the CPU usage increase likely to be due to the hundreds of increments per second or the thousands of reads? I understand both impact performance, but which has the larger impact?
Given this, what are some of the best metrics to monitor on my production instance to review these bottlenecks?
One point to check is whether the sorted sets are small enough to be serialized by Redis or not. For instance, DEBUG OBJECT could be applied to a sample of sorted sets to check whether they are encoded as ziplists or not.
Ziplist usage trades memory against CPU, especially when the size of the sorted set is close to the thresholds (zset-max-ziplist-entries and zset-max-ziplist-value in the configuration file).
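For example, the encoding can be checked directly with redis-py (the key name "myzset" is a placeholder):

import redis

r = redis.Redis()
# "ziplist" = compact encoding (memory-cheap, CPU-heavier near the thresholds);
# "skiplist" = the regular zset encoding.
print(r.object("encoding", "myzset"))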
Supposing the sorted sets are not ziplist-encoded, I would say the CPU usage is likely due to the thousands of reads per second rather than the hundreds of updates per second. An update of a zset is an O(log n) operation. It is very fast, and there is no locking-related latency with Redis. A read of the zset items is an O(n) operation, and may result in a large buffer to build and return to the client.
To be sure, you may want to generate the read-only traffic, check the CPU, then stop it, generate the update traffic, and check the CPU again to compare.
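A crude sketch of that comparison with redis-py (the key name, sizes, and durations are arbitrary); watch the redis-server CPU in top while each loop runs:

import time
import redis

r = redis.Redis()
r.zadd("hot:zset", {"member:%d" % i: float(i) for i in range(2000)})

def ops_per_sec(op, seconds=10):
    n, deadline = 0, time.time() + seconds
    while time.time() < deadline:
        op()
        n += 1
    return n / float(seconds)

# The loops run one after the other, so each phase's CPU can be observed separately.
reads = ops_per_sec(lambda: r.zrange("hot:zset", 0, -1, withscores=True))  # O(n) read
writes = ops_per_sec(lambda: r.zincrby("hot:zset", 1.0, "member:42"))      # O(log n) update
print("reads/s: %.0f, updates/s: %.0f" % (reads, writes))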
The zset read operation performance should be close to the LRANGE performance you can find in the Redis benchmark. A few thousand TPS for zsets featuring a thousand items seems to be in line with typical Redis performance.

Best way to persist only a subset of Redis keys to disk

Is it possible to persist only certain keys to disk using Redis? Is the best solution for this, as of right now, to run separate Redis servers, where one server can have throwaway caches and the other has more important data that we need to flush to disk periodically (such as counters of visits to a web page)?
You can set expirations on a subset of your keys. They will be persisted to disk, but only until they expire. This may be sufficient for your use case.
You can then use the redis maxmemory and maxmemory-policy configuration options to cap memory usage and tell redis what to do when it hits the max memory. If you use the volatile-lru or volatile-ttl options Redis will discard only those keys that have an expiration when it runs out of memory, throwing out either the Least Recently Used or the one with the nearest expiration (Time To Live), respectively.
However, as stated, these values are still put to disk until expiration. If you really need to avoid this then your assumption is correct and another server looks to be the only option.
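For the single-server variant, the relevant redis.conf lines would look something like this (values are illustrative):

maxmemory 2gb
maxmemory-policy volatile-lru
# snapshots still include volatile keys until they expire
save 900 1

For the two-server variant, the throwaway-cache instance would instead disable snapshots entirely with save "".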
