How to organize fair cache in Cassandra Cluster for large number of customers using it? - caching

Is there any recommendation of how to serve large number of customers with Cassandra?
Imagine that all of them read rows (specific per customer) in Cassandra. If we have one big Column Family (CF) we will correspondingly have single memtable per this CF. So, that customers who read data more frequently will displace cache entries of less frequent-in-read customers. And quality of service (i.e. read speed) will differ for different users. This is not fair (all customers must experience the same performance).
Is it normal to allocate separate CF's per customer (e.g. 5000 CF's or more)? As I understand this will cause creation of 5000 memtables what will lead to fair caching because each customer will be served in separate cache (memtable). Am I correct?
And on the other hand, will creation of large number of CF's decrease performance rather than having single big CF?

Memtables are not caches, they are to ensure writes in Cassandra are sequential on disk. They are read from when doing queries, but are flushed when they are too big rather than using an eviction policy that is appropriate for a cache.
Having separate column families for each customer will be very inefficient - 1000s of CFs is too many. Better would be to make sure your customer CF remains in cache. If you have enough memory then it will (assuming you don't have other CFs on your Cassandra cluster). Or you could use the row cache and set the size to be big enough to hold all the data in your customer CF.

Related

Azure Redis cache latency

I am working on an application having web job and azure function app. Web job generates the redis cache for function app to consume. Cache size is around 10 Mega Bytes. I am using lazy loading and all as per the recommendation. I still find that the overall cache operation is slow. Depending upon the size of the file i am processing, i may end up calling Redis cache upto 100,000 times . Wondering if I need to hold the cache data in a local variabke instead of reading it every time from redis. Has anyone experienced any latency in accessing Redis? Does it makes sense to create a singletone object in c# function app and refresh it based on some timer or other logic?
could you consider this points in your usage this is some good practices of azure redis cashe
Redis works best with smaller values, so consider chopping up bigger data into multiple keys. In this Redis discussion, 100kb is considered "large". Read this article for an example problem that can be caused by large values.
Use Standard or Premium Tier for Production systems. The Basic Tier is a single node system with no data replication and no SLA. Also, use at least a C1 cache. C0 caches are really meant for simple dev/test scenarios since they have a shared CPU core, very little memory, are prone to "noisy neighbor", etc.
Remember that Redis is an In-Memory data store. so that you are aware of scenarios where data loss can occur.
Reuse connections - Creating new connections is expensive and increases latency, so reuse connections as much as possible. If you choose to create new connections, make sure to close the old connections before you release them (even in managed memory languages like .NET or Java).
Locate your cache instance and your application in the same region. Connecting to a cache in a different region can significantly increase latency and reduce reliability. Connecting from outside of Azure is supported, but not recommended especially when using Redis as a cache (as opposed to a key/value store where latency may not be the primary concern).
Redis works best with smaller values, so consider chopping up bigger data into multiple keys.
Configure your maxmemory-reserved setting to improve system responsiveness under memory pressure conditions, especially for write-heavy workloads or if you are storing larger values (100KB or more) in Redis. I would recommend starting with 10% of the size of your cache, then increase if you have write-heavy loads. See some considerations when selecting a value.
Avoid Expensive Commands - Some redis operations, like the "KEYS" command, are VERY expensive and should be avoided.
Configure your client library to use a "connect timeout" of at least 10 to 15 seconds, giving the system time to connect even under higher CPU conditions. If your client or server tend to be under high load, use an even larger value. If you use a large number of connections in a single application, consider adding some type of staggered reconnect logic to prevent a flood of connections hitting the server at the same time.

Can cache admission strategy be useful to prune distributed cache writes

Assume some distributed CRUD Service that uses a distributed cache that is not read-through (just some Key-Value store agnostic of DB). So there are n server nodes connected to m cache nodes (round-robin as routing). The cache is supposed to cache data stored in a DB layer.
So the default retrieval sequence seems to be:
check if data is in cache, if so return data
else fetch from DB
send data to cache (cache does eviction)
return data
The question is whether the individual service nodes can be smarter about what data to send to the cache, to reduce cache capacity costs (achieve similar hit ratio with less required cache storage space).
Given recent benchmarks on optimal eviction/admission strategies (in particular LFU), some new caches might not even store data if it is deemed too infrequently used, maybe application nodes can do some best-effort guess.
So my idea is that the individual service nodes could evaluate whether data that was fetched from a DB should be send to the distributed cache or not based on an algorithm like LFU, thus reducing the network traffic between service and cache. I am thinking about local checks (suffering a lack of effectivity on cold startups), but checks against a shared list of cached keys may also be considered.
So the sequence would be
check if data is in cache, if so return data
else fetch from DB
check if data key is frequently used
if yes, send data to cache (cache does eviction). Else not.
return data
Is this possible, reasonable, has it already been done?
It is common in databases, search, and analytical products to guard their LRU caches with filters to avoid pollution caused by scans. For example see Postgres' Buffer Ring Replacement Strategy and ElasticSearch's filter cache. These are admission policies detached from the cache itself, which could be replaced if their caching algorithm was more intelligent. It sounds like your idea is similar, except a distributed version.
Most remote / distributed caches use classic eviction policies (LRU, LFU). That is okay because they are often excessively large, e.g. Twitter requires a 99.9% hit rate for their SLA targets. This means they likely won't drop recent items because the penalty is too high and oversize so that the victim is ancient.
However, that breaks down when batch jobs run and pollute the remote caching tier. In those cases, its not uncommon to see the cache population disabled to avoid impacting user requests. This is then a distributed variant of Postgres' problem described above.
The largest drawback with your idea is checking the item's popularity. This might be local only, which has a frequent cold start problem, or remote call which adds a network hop. That remote call would be cheaper than the traffic of shipping the item, but you are unlikely to be bandwidth limited. Likely you're goal would be to reduce capacity costs by a higher hit rate, but if your SLA requires a nearly perfect hit rate then you'll over provision anyway. It all depends on whether the gains by reducing cache-aside population operations are worth the implementation effort. I suspect that for most it hasn't been.

Cassandra client code with high read throughput with row_cache optimization

Can someone point me to cassandra client code that can achieve a read throughput of at least hundreds of thousands of reads/s if I keep reading the same record (or even a small number of records) over and over? I believe row_cache_size_in_mb is supposed to cache frequently used records in memory, but setting it to say 10MB seems to make no difference.
I tried cassandra-stress of course, but the highest read throughput it achieves with 1KB records (-col size=UNIFORM\(1000..1000\)) is ~15K/s.
With low numbers like above, I can easily write an in-memory hashmap based cache that will give me at least a million reads per second for a small working set size. How do I make cassandra do this automatically for me? Or is it not supposed to achieve performance close to an in-memory map even for a tiny working set size?
Can someone point me to cassandra client code that can achieve a read throughput of at least hundreds of thousands of reads/s if I keep reading the same record (or even a small number of records) over and over?
There are some solution for this scenario
One idea is to use row cache but be careful, any update/delete to a single column will invalidate the whole partition from the cache so you loose all the benefit. Row cache best usage is for small dataset and are frequently read but almost never modified.
Are you sure that your cassandra-stress scenario never update or write to the same partition over and over again ?
Here are my findings: when I enable row_cache, counter_cache, and key_cache all to sizable values, I am able to verify using "top" that cassandra does no disk I/O at all; all three seem necessary to ensure no disk activity. Yet, despite zero disk I/O, the throughput is <20K/s even for reading a single record over and over. This likely confirms (as also alluded to in my comment) that cassandra incurs the cost of serialization and deserialization even if its operations are completely in-memory, i.e., it is not designed to compete with native hashmap performance. So, if you want get native hashmap speeds for a small-working-set workload but expand to disk if the map grows big, you would need to write your own cache on top of cassandra (or any of the other key-value stores like mongo, redis, etc. for that matter).
For those interested, I also verified that redis is the fastest among cassandra, mongo, and redis for a simple get/put small-working-set workload, but even redis gets at best ~35K/s read throughput (largely independent, by design, of the request size), which hardly comes anywhere close to native hashmap performance that simply returns pointers and can do so comfortably at over 2 million/s.

Strategy for "user data" in couchbase

I know that a big part of the performance from Couchbase comes from serving in-memory documents and for many of my data types that seems like an entirely reasonable aspiration but considering how user-data scales and is used I'm wondering if it's reasonable to plan for only a small percentage of the user documents to be in memory all of the time. I'm thinking maybe only 10-15% at any given time. Is this a reasonable assumption considering:
At any given time period there will be a only a fractional number of users will be using the system.
In this case, users only access there own data (or predominantly so)
Recently entered data is exponentially more likely to be viewed than historical user documents
UPDATE:
Some additional context:
Let's assume there's a user base of a 1 million customers, that 20% rarely if ever access the site, 40% access it once a week, and 40% access it every day.
At any given moment, only 5-10% of the user population would be logged in
When a user logs in they are like to re-query for certain documents in a single session (although the client does do some object caching to minimise this)
For any user, the most recent records are very active, the very old records very inactive
In summary, I would say of a majority of user-triggered transactional documents are queried quite infrequently but there are a core set -- records produced in the last 24-48 hours and relevant to the currently "logged in" group -- that would have significant benefits to being in-memory.
Two sub-questions are:
Is there a way to indicate a timestamp on a per-document basis to indicate it's need to be kept in memory?
How does couchbase overcome the growing list of document id's in-memory. It is my understanding that all ID's must always be in memory? isn't this too memory intensive for some apps?
First,one of the major benefits to CB is the fact that it is spread across multiple nodes. This also means your queries are spread across multiple nodes and you have a performance gain as a result (I know several other similar nosql spread across nodes - so maybe not relevant for your comparison?).
Next, I believe this question is a little bit too broad as I believe the answer will really depend on your usage. Does a given user only query his data one time, at random? If so, then according to you there will only be an in-memory benefit 10-15% of the time. If instead, once a user is on the site, they might query their data multiple times, there is a definite performance benefit.
Regardless, Couchbase has pretty fast disk-access performance, particularly on SSDs, so it probably doesn't make much difference either way, but again without specifics there is no way to be sure. If it's a relatively small document size, and if it involves a user waiting for one of them to load, then the user certainly will not notice a difference whether the document is loaded from RAM or disk.
Here is an interesting article on benchmarks for CB against similar nosql platforms.
Edit:
After reading your additional context, I think your scenario lines up pretty much exactly how Couchbase was designed to operate. From an eviction standpoint, CB keeps the newest and most-frequently accessed items in RAM. As RAM fills up with new and/or old items, oldest and least-frequently accessed are "evicted" to disk. This link from the Couchbase Manual explains more about how this works.
I think you are on the right track with Couchbase - in any regard, it's flexibility with scaling will easily allow you to tune the database to your application. I really don't think you can go wrong here.
Regarding your two questions:
Not in Couchbase 2.2
You should use relatively small document IDs. While it is true they are stored in RAM, if your document ids are small, your deployment is not "right-sized" if you are using a significant percentage of the available cluster RAM to store keys. This link talks about keys and gives details relevant to key size (e.g. 250-byte limit on size, metadata, etc.).
Basically what you are making a decision point on is sizing the Couchbase cluster for bucket RAM, and allowing a reduced residency ratio (% of document values in RAM), and using Cache Misses to pull from disk.
However, there are caveats in this scenario as well. You will basically also have relatively constant "cache eviction" where "not recently used" values are being removed from RAM cache as you pull cache missed documents from disk into RAM. This is because you will always be floating at the high water mark for the Bucket RAM quota. If you also simultaneously have a high write velocity (new/updated data) they will also need to be persisted. These two processes can compete for Disk I/O if the write velocity exceeds your capacity to evict/retrieve, and your SDK client will receive a Temporary OOM error if you actually cannot evict fast enough to open up RAM for new writes. As you scale horizontally, this becomes less likely as you have more Disk I/O capacity spread across more machines all simultaneously doing this process.
If when you say "queried" you mean querying indexes (i.e. Views), this is a separate data structure on disk that you would be querying and of course getting results back is not subject to eviction/NRU, but if you follow the View Query with a multi-get the above still applies. (Don't emit entire documents into your Index!)

Configuring redis to consistently evict older data first

I'm storing a bunch of realtime data in redis. I'm setting a TTL of 14400 seconds (4 hours) on all of the keys. I've set maxmemory to 10G, which currently is not enough space to fit 4 hours of data in memory, and I'm not using virtual memory, so redis is evicting data before it expires.
I'm okay with redis evicting the data, but I would like it to evict the oldest data first. So even if I don't have a full 4 hours of data, at least I can have some range of data (3 hours, 2 hours, etc) with no gaps in it. I tried to accomplish this by setting maxmemory-policy=volatile-ttl, thinking that the oldest keys would be evicted first since they all have the same TTL, but it's not working that way. It appears that redis is evicting data somewhat arbitrarily, so I end up with gaps in my data. For example, today the data from 2012-01-25T13:00 was evicted before the data from 2012-01-25T12:00.
Is it possible to configure redis to consistently evict the older data first?
Here are the relevant lines from my redis.cnf file. Let me know if you want to see any more of the cofiguration:
maxmemory 10gb
maxmemory-policy volatile-ttl
vm-enabled no
AFAIK, it is not possible to configure Redis to consistently evict the older data first.
When the *-ttl or *-lru options are chosen in maxmemory-policy, Redis does not use an exact algorithm to pick the keys to be removed. An exact algorithm would require an extra list (for *-lru) or an extra heap (for *-ttl) in memory, and cross-reference it with the normal Redis dictionary data structure. It would be expensive in term of memory consumption.
With the current mechanism, evictions occur in the main event loop (i.e. potential evictions are checked at each loop iteration before each command is executed). Until memory is back under the maxmemory limit, Redis randomly picks a sample of n keys, and selects for expiration the most idle one (for *-lru) or the one which is the closest to its expiration limit (for *-ttl). By default only 3 samples are considered. The result is non deterministic.
One way to increase the accuracy of this algorithm and mitigate the problem is to increase the number of considered samples (maxmemory-samples parameter in the configuration file).
Do not set it too high, since it will consume some CPU. It is a tradeoff between eviction accuracy and CPU consumption.
Now if you really require a consistent behavior, one solution is to implement your own eviction mechanism on top of Redis. For instance, you could add a list (for non updatable keys) or a sorted set (for updatable keys) in order to track the keys that should be evicted first. Then, you add a daemon whose purpose is to periodically check (using INFO) the memory consumption and query the items of the list/sorted set to remove the relevant keys.
Please note other caching systems have their own way to deal with this problem. For instance with memcached, there is one LRU structure per slab (which depends on the object size), so the eviction order is also not accurate (although more deterministic than with Redis in practice).

Resources