We would like to keep primary keys in memory and backup keys on disks. So on re-shuffle, we will accept the performance of reading key/values from disks.
From my research on the Ignite documentation, I don't see that option out of the box. Is there any way to do this via configuration?
If this feature doesn't exist, I had the following idea as a workaround. If we know our cache takes 1 terabyte, we know that with backups it will take approximately 2 terabytes. If we allocate a little over 1 terabyte in memory and set the eviction policy to disk, will this effectively get us the functionality we want? That is, will it evict backup copies to disk and leave primaries in memory?
This feature doesn't exist, and your workaround won't work because it will evict primary and backup copies at random. However, you can probably implement your own eviction policy that immediately evicts any newly created backup, and configure swap space to store these backups.
Note that I see this making sense only if you're running SQL queries and/or you don't have a persistence store. If you only use key-based access, any lost entry will be reloaded from the persistence store when needed.
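To make the suggested idea concrete, here is a minimal sketch in plain Python. It is not Ignite code and does not use any Ignite API; the class name `PrimaryInMemoryStore`, the `is_primary` flag, and the `promote` method are all hypothetical, and a temporary directory stands in for swap space. It only illustrates the behaviour described above: primaries stay in memory, and a backup is pushed to disk as soon as it is created.

```python
import os
import pickle
import tempfile


class PrimaryInMemoryStore:
    """Toy store: primary entries live in a dict, backups go straight to disk."""

    def __init__(self, swap_dir):
        self.memory = {}          # primary copies only
        self.swap_dir = swap_dir  # stand-in for swap space

    def _swap_path(self, key):
        # Assumes keys are short strings that are safe to use as file names.
        return os.path.join(self.swap_dir, f"{key}.bin")

    def put(self, key, value, is_primary):
        if is_primary:
            self.memory[key] = value
        else:
            # "Evict" the backup immediately: it is never kept in memory.
            with open(self._swap_path(key), "wb") as f:
                pickle.dump(value, f)

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        path = self._swap_path(key)
        if os.path.exists(path):
            # Slow path: the entry is a backup and must be read from disk.
            with open(path, "rb") as f:
                return pickle.load(f)
        return None

    def promote(self, key):
        """A backup becomes primary (e.g. after re-shuffle): load it into memory."""
        value = self.get(key)
        if value is not None:
            self.memory[key] = value
            path = self._swap_path(key)
            if os.path.exists(path):
                os.remove(path)
        return value


store = PrimaryInMemoryStore(tempfile.mkdtemp())
store.put("a", 1, is_primary=True)
store.put("b", 2, is_primary=False)
```

After these calls, `"a"` is in `store.memory` while `"b"` exists only on disk until `store.promote("b")` is called, which is the disk-read cost on re-shuffle that the question says is acceptable.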
I have a cluster with two Redis Docker instances (v3.2.5) that I use for caching responses from Spring Boot microservices.
I've disabled all persistence and the number of keys is stable over time, all of them expiring between 5 minutes and 1 day.
Despite this, I can see the memory usage creeping up. It looks like once a day (around midnight) it uses a lot of memory and then releases some of it.
Does anyone have any idea what this process may be, and whether there's any way to configure Redis to avoid using that much memory?
The number of keys I have doesn't justify this amount of memory.
UPDATE
After taking a snapshot of the database and loading the data on a fresh new Redis instance (same version, same config), the used_memory_human value is 10 times lower than on the original one.
Is it possible that key expiration doesn't really delete keys from memory?
We are considering using Couchbase as a persistent cache layer. Since Couchbase writes cache items to memory first and syncs to disk asynchronously, one concern we have is crash consistency. If some cache items were updated in memory and Couchbase crashes before committing them to disk, those items will be stale when Couchbase restarts.
My question is:
Will Couchbase detect and report those items are stale? If so, we can just discard those items since they are cache.
Are there any other Couchbase-specific ways to deal with the stale cache problem?
I don't think there would be a way to detect if a document is stale, since (in your scenario) they weren't written to disk before a crash.
However, you can specify durability requirements when creating a document. By default, a write is considered successful if it makes it into memory. You can add additional constraints like "PersistTo" (the document must be persisted to disk on N nodes before the write is considered successful).
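The effect of such a durability requirement can be illustrated with a small simulation. This is plain Python, not the Couchbase SDK: `Node`, `upsert`, and the `persist_to` parameter are hypothetical names that loosely mirror the "PersistTo" idea above, and the disk write queue is flushed synchronously to keep the sketch short.

```python
class Node:
    """Toy node: writes land in memory first and reach disk only on flush()."""

    def __init__(self):
        self.memory = {}
        self.disk = {}
        self.write_queue = []

    def write(self, key, value):
        self.memory[key] = value
        self.write_queue.append(key)

    def flush(self):
        # Drain the disk write queue.
        for key in self.write_queue:
            self.disk[key] = self.memory[key]
        self.write_queue.clear()

    def crash_and_restart(self):
        # Anything still queued is lost; memory is rebuilt from disk.
        self.write_queue.clear()
        self.memory = dict(self.disk)


def upsert(nodes, key, value, persist_to=0):
    """Write to every node; report success once `persist_to` nodes have it on disk."""
    for node in nodes:
        node.write(key, value)
    for node in nodes[:persist_to]:
        node.flush()
    persisted = sum(1 for node in nodes if node.disk.get(key) == value)
    return persisted >= persist_to


nodes = [Node(), Node()]
upsert(nodes, "k", "v1", persist_to=1)  # acknowledged only after reaching disk
upsert(nodes, "k", "v2")                # acknowledged from memory only
nodes[0].crash_and_restart()
stale_value = nodes[0].memory["k"]      # the second write never reached disk
```

In this model the second write is acknowledged while it exists only in memory, so the crash brings back the older value without any marker that it is stale, which matches the point above that staleness cannot be detected after the fact.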
I was wondering if I could get an explanation of the differences between in-memory caches (Redis, Memcached), in-memory data grids (GemFire) and in-memory databases (VoltDB). I'm having a hard time distinguishing the key characteristics of the three.
Cache - By definition means it is stored in memory. Any data stored in memory (RAM) for faster access is called cache. Examples: Ehcache, Memcached. Typically you put an object in the cache with a String as the key and access the cache using that key. It is very straightforward. It is up to the application when to access the cache vs. the database, and no complex processing happens in the cache. If the cache spans multiple machines, then it is called a distributed cache. For example, Netflix uses EVCache, which is built on top of Memcached, to store the users' movie recommendations that you see on the home screen.
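The "application decides when to go to the cache vs. the database" point is often implemented as a cache-aside lookup. A minimal sketch in plain Python follows; the class `CacheAside`, the `load_from_db` callback, and the `db_reads` counter are hypothetical names for illustration, and a dict stands in for both the cache and the database.

```python
class CacheAside:
    """Toy cache-aside lookup: check the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        self.cache = {}                   # String key -> cached object
        self.load_from_db = load_from_db  # called only on a cache miss
        self.db_reads = 0                 # counts how often the database is hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: no database access
        self.db_reads += 1
        value = self.load_from_db(key)    # cache miss: go to the database
        self.cache[key] = value
        return value


database = {"user:1": "Alice"}
lookup = CacheAside(database.get)
first = lookup.get("user:1")   # miss, reads the database
second = lookup.get("user:1")  # hit, served from memory
```

Both calls return the same object, but only the first one touches the database.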
In Memory Database - It has all the features of a cache plus some processing/querying capabilities. Redis falls under this category. Redis supports multiple data structures and you can query the data in Redis (for example, get the last 10 accessed items, get the most used item, etc.). It can span multiple machines, is usually very high-performing, and also supports persistence to disk if needed. For example, Twitter uses Redis to store the timeline information.
I don't know about GemFire and VoltDB, but even Memcached and Redis are very different. Memcached is really simple caching, a place to store variables in a very uncomplicated fashion, and then retrieve them so you don't have to go to a file or do a database lookup every time you need that data. The types of variable are very simple. Redis, on the other hand, is actually an in-memory database, with a very interesting selection of data types. It has a wonderful data type for doing sorted lists (the sorted set), which works great for applications such as leaderboards. You add your new record to the data, and it gets sorted automagically.
So I wouldn't get too hung up on the categories. You really need to examine each tool differently to see what it can do for you, and the application you're building. It's kind of like trying to draw comparisons on nosql databases - they are all very different, and do different things well.
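The leaderboard use case described above can be sketched with the Python standard library. This is not Redis and does not use its commands; `Leaderboard`, `add`, and `top` are hypothetical names, and `bisect` keeps a list ordered on insert to imitate the "add a record and it stays sorted" behaviour.

```python
import bisect


class Leaderboard:
    """Toy sorted collection: members are kept ordered by score on every insert."""

    def __init__(self):
        self._entries = []  # sorted list of (score, member) tuples
        self._scores = {}   # member -> current score

    def add(self, member, score):
        old = self._scores.get(member)
        if old is not None:
            # Remove the member's previous entry before re-inserting it.
            index = bisect.bisect_left(self._entries, (old, member))
            del self._entries[index]
        bisect.insort(self._entries, (score, member))
        self._scores[member] = score

    def top(self, n):
        """Return up to n (member, score) pairs, highest score first."""
        if n <= 0:
            return []
        return [(member, score) for score, member in reversed(self._entries[-n:])]


board = Leaderboard()
board.add("ann", 50)
board.add("bob", 70)
board.add("cy", 60)
board.add("ann", 80)  # updating a score moves the member to its new position
leaders = board.top(2)
```

The caller never sorts anything explicitly; ordering is maintained by the structure itself, which is the property that makes this kind of data type convenient for leaderboards.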
I would add that things in the "database" category tend to have more features to protect and replicate your data than a simple "cache". Cache is (usually) temporary, whereas database data should be persistent. Many cache solutions I've seen do not persist to disk, so if you lost power to your whole cluster, you'd lose everything in the cache.
But there are some cache solutions that have persistence and replication features too, so the line is blurry.
An in-memory cache is a common query store and therefore relieves the DB of read workloads. A common example of an in-memory cache is Redis. For instance, a web site could store popular searches made by clients, thereby relieving the DB of some load.
In-memory Cache provides query functionality on top of caching (storing session data in RAM (temporary storage)).
Memcache falls in the temp store caching category.
Does Couchbase store documents in memory first before moving the data to the file store? Is there any configuration available to specify how long the data has to be stored in memory before it can be flushed to the file store?
Couchbase's architecture is memory-first / cache-through.
You can't choose whether memory is used, and it writes the data to disk as soon as possible.
Part of that is that you need to have enough memory for the amount of data you have.
You do have some eviction policies, such as Full or Value eviction, but again you don't have control over the timing.
What you can do, in the SDK, is wait until the data has been replicated or persisted to disk.
Couchbase stores data both on disk and in RAM. The default behavior is to write the document to disk at some arbitrary time (usually quickly) after storing in RAM. This leaves a short window where node failure can result in loss of data. I can't find anything in the documentation for the current version of Couchbase, but it used to be that you could request the "set" method to only complete once the data has been persisted to disk (default is to RAM only).
In any case, after writing to RAM, the document will eventually be written to disk. Couchbase keeps a disk write queue, which you can check on the metrics report page in the management console. Now, CB does synchronize writes across the cluster, and I believe a write will be synchronized across the cluster before Couchbase acknowledges that the write happened (e.g. before the write method returns to the caller). Again, this is hard to determine from the documentation, as the documentation for prior versions was much more detailed.
If you have more documents than available RAM, only the most-frequently accessed documents will be stored in RAM for quick retrieval, with all others being "evicted" to disk.
I am trying to evaluate Terracotta Distributed Cache with Ehcache. I have the following query. There are 20+ apps which will use a TAS distributed cache. As I understand it, there will be an L1 cache in each of these apps and an L2 in the cluster. The cluster cache fronts a database which is updated by a different app that we do not have access to, so we only read from this DB. But the DB updates need to flow to the cache.
By way of DB triggers, the updated keys (keys alone) are stored in a temp table. At specific intervals, a separate batch job monitors this table and collects the keys that need to be expired in the cache.
From here I need help. How do I inform the TAS L2 cache to expire/evict these keys? What options are there in Terracotta? Will this expiry event flow from L2 to all the individual apps? What is the time lag? I do not want to send the keys to be expired to all the individual apps. Can this be accomplished?
Thanks for the help!
Maybe I am missing something, but I am not sure why you would want to expire/evict those keys instead of simply calling cache.removeAll(keys). This removal will be automatically propagated to all L1 nodes which have those entries in their local cache.
The time lag depends on the consistency settings of the distributed cache.
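The propagation described in this answer can be pictured with a small model. This is plain Python rather than the Ehcache/Terracotta API: `L1Cache`, `L2Cache`, `register`, `invalidate`, and `remove_all` are hypothetical names, a dict stands in for the database, and propagation here is immediate, whereas the real lag depends on the consistency settings as noted above.

```python
class L2Cache:
    """Toy clustered cache that fronts a database and tracks its L1 clients."""

    def __init__(self, load_from_db):
        self.store = {}
        self.clients = []
        self.load_from_db = load_from_db

    def register(self, client):
        self.clients.append(client)

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.load_from_db(key)
        return self.store[key]

    def remove_all(self, keys):
        keys = list(keys)
        for key in keys:
            self.store.pop(key, None)
        # Propagate the removal to every L1 cache; each drops only what it holds.
        for client in self.clients:
            client.invalidate(keys)


class L1Cache:
    """Toy per-application local cache backed by the shared L2 cache."""

    def __init__(self, l2):
        self.local = {}
        self.l2 = l2
        l2.register(self)

    def get(self, key):
        if key not in self.local:
            self.local[key] = self.l2.get(key)
        return self.local[key]

    def invalidate(self, keys):
        for key in keys:
            self.local.pop(key, None)


database = {"k": "old"}
l2 = L2Cache(database.get)
app_a, app_b = L1Cache(l2), L1Cache(l2)
app_a.get("k")
app_b.get("k")
database["k"] = "new"        # updated by the other application
before = app_a.get("k")      # still the cached value
l2.remove_all(["k"])         # the batch job calls this once, on the cluster cache
after = (app_a.get("k"), app_b.get("k"))
```

The batch job talks only to the cluster-level cache; it never needs to contact the individual applications, which is the property the question asks for.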