I'm working on an app that will use Redis to store some end-user session state details. There will be many tens to hundreds of millions of key/value pairs (the values are unordered sets), each with an expiry set (to the second). To gauge whether my existing Redis installation will run into memory exhaustion problems, I need to build a budget for my app's expected and worst-case Redis memory usage.
The size of the raw data, the number of keys, the key/value pair lifespan, etc. are well known and already in the calculation spreadsheet. For Redis-specific things, like the size of an expiry entry, I only have wild guesses.
For my memory budget calculation, what values should I use for:
per-key Redis overhead (bytes? a percentage?)
per-key expiry size (seconds, not ms)
per-key set size/overhead
per-set-item size/overhead
(Other per key or data type information might be helpful to others.)
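While waiting for measured numbers, the budget arithmetic itself can be sketched as below. Every `*_overhead` default is a placeholder assumption, not a measured Redis figure; the point is only to make the calculation explicit so the guesses can be swapped for values measured on your own build.

```python
# Rough worst-case memory budget for N keys, each holding a set with an
# expiry. All *_overhead defaults are placeholder guesses to be replaced
# with numbers measured on your own Redis version and allocator.

def budget_bytes(num_keys, key_name_len, items_per_set, item_len,
                 per_key_overhead=64,    # assumed dict entry + object header
                 expiry_overhead=32,     # assumed expires-dict entry cost
                 per_set_overhead=64,    # assumed set header cost
                 per_item_overhead=48):  # assumed per-member entry cost
    per_key = (key_name_len + per_key_overhead + expiry_overhead +
               per_set_overhead +
               items_per_set * (item_len + per_item_overhead))
    return num_keys * per_key

# Example: 100 million keys, 20-byte names, 10 items of 16 bytes each.
total = budget_bytes(100_000_000, 20, 10, 16)
print(total / 2**30, "GiB")  # prints the total in GiB (~76 GiB here)
```

Running the worst-case and expected-case parameters through the same function keeps the two budgets comparable as the overhead guesses get refined.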
My use case:
I am using Redis to store a high volume of data.
Every second I write around 24k keys to Redis with a TTL of 30 minutes, and I want the keys deleted once their TTL has expired.
Redis's current implementation of expired-key eviction runs in periodic tasks: each task picks 20 random keys, checks whether their TTLs have expired, and deletes the ones that have. Redis recommends no more than 100 such tasks per second. So even if I set hz (the number of tasks) to 100, Redis can clear at most about 20 * 100 = 2000 keys/sec, which is far too slow for my insertion rate and eventually results in an out-of-memory exception when memory fills up.
The alternatives I have are:
1/ Touch random keys, or keys we know have expired; accessing them triggers deletion in Redis.
2/ Set an eviction policy for when maxmemory is reached. This will aggressively delete keys once memory is full.
3/ Set hz (frequency) to a higher value. This initiates more expired-key purge tasks per second.
1/ Doesn't seem feasible.
For 2/ & 3/
Based on the current cache timer of 30 minutes, and given insertion rate, we can use
maxmemory 12gb
maxmemory-samples 10
maxmemory-policy volatile-ttl
hz 100
But using 2/ would mean Redis is constantly deleting keys and then inserting, since I assume that in my case memory will always sit at the 12 GB limit.
So is this a good strategy to use, or should we write our own key-eviction service on top of Redis?
Are you using Azure Redis Cache? If so, you could consider clustering. You can have up to 10 shards in the cluster, which helps spread the load of your keys across shards for the different operations.
I think the only way to get a definitive answer to your question is to write a test. A synthetic test similar to your actual workload should not be hard to create, and it will tell you whether Redis can expire keys as quickly as you insert them and what impact changing the hz value has on performance.
Using maxmemory should also be a workable strategy. It will mean the allocated memory may always be full, but it should work.
Another option is to reduce the number of keys you write to. If the keys you are writing contain string values, you can instead write them as fields of a Redis hash.
For example, if your inserts look something like this:
redis.set "aaa", value_a
redis.set "bbb", value_b
You could instead use a hash:
# current_second is just a timestamp in seconds
redis.hset current_second, "aaa", value_a
redis.hset current_second, "bbb", value_b
# one TTL covers the entire hash
redis.expire current_second, 30 * 60
By writing to a hash keyed by the current timestamp and setting the TTL on the entire hash, Redis only has to evict one key per second.
Given some of the advantages of using hashes in Redis, I expect the hash approach to perform best if your use case is compatible with it.
It might be worth testing before deciding.
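To make the bucketing idea concrete, here is a minimal in-process sketch (plain dicts stand in for Redis hashes, and times are passed explicitly so the behavior is easy to follow): writes land in one bucket per second, and expiry removes whole buckets, so the store only ever drops one bucket per second no matter how many fields were written.

```python
import time

TTL_SECONDS = 30 * 60  # matches the 30-minute TTL in the question

buckets = {}  # bucket timestamp (seconds) -> {field: value}

def hset_bucketed(field, value, now=None):
    """Write a field into the bucket for the current second."""
    now = int(now if now is not None else time.time())
    buckets.setdefault(now, {})[field] = value

def evict_expired(now=None):
    """Drop every bucket older than the TTL; one deletion per second
    of data, regardless of how many fields each bucket holds."""
    now = int(now if now is not None else time.time())
    expired = [t for t in buckets if now - t >= TTL_SECONDS]
    for t in expired:
        del buckets[t]
    return len(expired)

# 24k writes in one second all land in a single bucket...
for i in range(24_000):
    hset_bucketed(f"key:{i}", "v", now=1_000)

# ...and 30 minutes later they are all removed with one eviction.
print(evict_expired(now=1_000 + TTL_SECONDS))  # -> 1
```

With real Redis the two functions would map to HSET on a timestamp-named key plus a single EXPIRE per bucket, as in the snippet above.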
Is there any recommendation for how to serve a large number of customers with Cassandra?
Imagine that all of them read rows (specific to each customer) in Cassandra. If we have one big column family (CF), we correspondingly have a single memtable for that CF, so customers who read data more frequently will displace the cache entries of customers who read less frequently, and quality of service (i.e. read speed) will differ between users. This is not fair: all customers should experience the same performance.
Is it normal to allocate a separate CF per customer (e.g. 5000 CFs or more)? As I understand it, this would create 5000 memtables, which would lead to fair caching because each customer would be served from a separate cache (memtable). Am I correct?
And on the other hand, would creating a large number of CFs decrease performance compared to having a single big CF?
Memtables are not caches; they exist to ensure writes in Cassandra are sequential on disk. They are read from during queries, but they are flushed when they grow too big rather than managed with an eviction policy appropriate for a cache.
Having a separate column family for each customer would be very inefficient: thousands of CFs is too many. Better to make sure your customer CF stays in cache. If you have enough memory, it will (assuming you don't have other CFs on your Cassandra cluster). Alternatively, you could use the row cache and set its size large enough to hold all the data in your customer CF.
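If you go the row-cache route, the knob is a one-line setting, along the lines of the following fragment. The size is purely illustrative, and the option name and location vary across Cassandra versions (older releases configured the row cache per column family instead), so check your version's documentation.

```yaml
# cassandra.yaml -- size is illustrative; sized to hold the customer CF
row_cache_size_in_mb: 2048
```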
We are running Redis, doing hundreds of increments per second of keys in a sorted set, while at the same time doing thousands of reads on the sorted set every second.
This seems to be working well, but during peak load CPU usage gets pretty high, around 80% of a single core. The sorted set itself has a small memory footprint of a few thousand keys.
Is the CPU usage increase likely due to the hundreds of increments per second or the thousands of reads per second? I understand both impact performance, but which has the larger impact?
Given this, what are the best metrics to monitor on my production instance to find these bottlenecks?
One point to check is whether the sorted sets are small enough to be stored in Redis's compact serialized form. For instance, the DEBUG OBJECT command could be applied to a sample of sorted sets to check whether they are encoded as ziplists.
Ziplist encoding trades memory against CPU, especially when the size of the sorted set is close to the thresholds (zset-max-ziplist-entries and zset-max-ziplist-value in the configuration file).
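For reference, these thresholds live in redis.conf; the values below are the common defaults, so a zset stops being ziplist-encoded once it exceeds 128 entries or any member exceeds 64 bytes:

```
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
```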
Supposing the sorted sets are not ziplist-encoded, I would say the CPU usage is likely due to the thousands of reads per second rather than the hundreds of updates per second. An update of a zset is an O(log n) operation; it is very fast, and there is no locking-related latency with Redis. A read of the zset items is an O(n) operation, and it may require building and returning a large buffer to the client.
To be sure, you may want to generate the read only traffic, check the CPU, then stop it, generate the update traffic, check the CPU again and compare.
The zset read performance should be close to the LRANGE numbers you can find in the Redis benchmark. A few thousand TPS for zsets containing a thousand items each seems in line with typical Redis performance.
I'm storing a bunch of realtime data in Redis. I'm setting a TTL of 14400 seconds (4 hours) on all of the keys. I've set maxmemory to 10gb, which currently is not enough space to fit 4 hours of data in memory, and I'm not using virtual memory, so Redis is evicting data before it expires.
I'm okay with Redis evicting the data, but I would like it to evict the oldest data first. Then even if I don't have a full 4 hours of data, at least I would have some contiguous range of data (3 hours, 2 hours, etc.) with no gaps in it. I tried to accomplish this by setting maxmemory-policy=volatile-ttl, thinking that the oldest keys would be evicted first since they all have the same TTL, but it isn't working that way. Redis appears to evict data somewhat arbitrarily, so I end up with gaps in my data. For example, today the data from 2012-01-25T13:00 was evicted before the data from 2012-01-25T12:00.
Is it possible to configure redis to consistently evict the older data first?
Here are the relevant lines from my redis.cnf file. Let me know if you want to see any more of the configuration:
maxmemory 10gb
maxmemory-policy volatile-ttl
vm-enabled no
AFAIK, it is not possible to configure Redis to consistently evict the older data first.
When a *-ttl or *-lru option is chosen for maxmemory-policy, Redis does not use an exact algorithm to pick the keys to remove. An exact algorithm would require an extra list (for *-lru) or an extra heap (for *-ttl) in memory, cross-referenced with the normal Redis dictionary data structure, which would be expensive in terms of memory consumption.
With the current mechanism, evictions occur in the main event loop (i.e. potential evictions are checked at each loop iteration, before each command is executed). Until memory is back under the maxmemory limit, Redis randomly picks a sample of n keys and selects for eviction the most idle one (for *-lru) or the one closest to its expiration (for *-ttl). By default only 3 samples are considered, so the result is non-deterministic.
One way to increase the accuracy of this algorithm and mitigate the problem is to increase the number of considered samples (maxmemory-samples parameter in the configuration file).
Do not set it too high, since it will consume some CPU. It is a tradeoff between eviction accuracy and CPU consumption.
Now if you really require a consistent behavior, one solution is to implement your own eviction mechanism on top of Redis. For instance, you could add a list (for non updatable keys) or a sorted set (for updatable keys) in order to track the keys that should be evicted first. Then, you add a daemon whose purpose is to periodically check (using INFO) the memory consumption and query the items of the list/sorted set to remove the relevant keys.
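The bookkeeping such a mechanism needs can be sketched in a few lines. In the sketch below a plain dict stands in for Redis and a min-heap stands in for the tracking list/sorted set; in practice the heap would live in Redis itself and the eviction function would run in a small daemon that polls INFO for memory usage.

```python
import heapq

# Sketch of "evict oldest first" on top of a key/value store:
# alongside each write, push (insert_time, key) onto a heap, and let a
# periodic job pop and delete the oldest keys while over budget.

store = {}
insertion_order = []  # min-heap of (insert_time, key)

def tracked_set(key, value, insert_time):
    store[key] = value
    heapq.heappush(insertion_order, (insert_time, key))

def evict_oldest(max_keys):
    """Delete oldest-inserted keys until at most max_keys remain."""
    evicted = []
    while len(store) > max_keys and insertion_order:
        _, key = heapq.heappop(insertion_order)
        if key in store:  # skip entries already deleted elsewhere
            del store[key]
            evicted.append(key)
    return evicted

tracked_set("12:00", "data_a", insert_time=1)
tracked_set("13:00", "data_b", insert_time=2)
tracked_set("14:00", "data_c", insert_time=3)

# With a budget of 2 keys, the oldest key goes first, deterministically.
print(evict_oldest(max_keys=2))  # -> ['12:00']
```

Triggering eviction on a memory threshold instead of a key count is the same loop with a different stopping condition.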
Please note other caching systems have their own way to deal with this problem. For instance with memcached, there is one LRU structure per slab (which depends on the object size), so the eviction order is also not accurate (although more deterministic than with Redis in practice).
Is it possible to persist only certain keys to disk using Redis? Or is the best solution for this, as of right now, to run separate Redis servers, where one server can hold throwaway caches and the other holds more important data that we need to flush to disk periodically (such as counters of visits to a web page)?
You can set expirations on a subset of your keys. They will be persisted to disk, but only until they expire. This may be sufficient for your use case.
You can then use the redis maxmemory and maxmemory-policy configuration options to cap memory usage and tell redis what to do when it hits the max memory. If you use the volatile-lru or volatile-ttl options Redis will discard only those keys that have an expiration when it runs out of memory, throwing out either the Least Recently Used or the one with the nearest expiration (Time To Live), respectively.
However, as stated, these values are still written to disk until they expire. If you really need to avoid that, then your assumption is correct and a separate server looks to be the only option.