Memcached, Redis, or Couchbase [closed] - caching

I have a Debian server with about 16GB of RAM running nginx, several heavy MySQL databases, and some custom PHP apps. I'd like to implement a memory cache between MySQL and PHP, but the databases are too large to store everything in RAM. From my research so far, an LRU cache seems like the better fit. Does that rule out Redis? Couchbase is also under consideration.

Assuming a single server running nginx + php + mysql with some free RAM left over, the easiest way to use that RAM to cache data is simply to increase the buffer caches of the MySQL instances. Databases already manage their buffers with LRU-like mechanisms.
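For instance, with InnoDB tables most of the spare RAM can be handed to the buffer pool in my.cnf; the sizes below are only an illustration for a 16GB box, leaving headroom for nginx, PHP, and the OS:

    # /etc/mysql/my.cnf -- illustrative sizing for a 16GB server
    [mysqld]
    innodb_buffer_pool_size      = 10G   # main InnoDB data/index cache, LRU-managed
    innodb_buffer_pool_instances = 4     # reduce contention on a large pool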
Now, if you need to move part of the processing away from the databases, then pre-caching may be an option. Before reaching for memcached/redis, note that a shared memory cache integrated with PHP, such as APC, will be efficient as long as only one server is involved (actually more efficient than redis/memcached, since it avoids a network round trip).
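As a rough sketch, APC is sized through php.ini; the values here are placeholders to adapt to your workload:

    ; php.ini -- illustrative APC settings
    apc.enabled  = 1
    apc.shm_size = 256M   ; shared memory segment visible to all PHP workers
    apc.ttl      = 3600   ; seconds before a cached slot may be reclaimed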
Both memcached and redis can be considered for remote caching (i.e. sharing the cache between various nodes). I would not rule out redis for this: it can easily be configured for the purpose. Both let you define a memory limit and handle the cache with LRU-like eviction.
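Configuring that limit takes one or two lines in both cases; a minimal sketch, with the 2GB figure as a placeholder:

    # redis.conf -- cap memory and evict the least recently used keys
    maxmemory 2gb
    maxmemory-policy allkeys-lru

    # memcached takes its limit (in MB) on the command line
    memcached -m 2048 -d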
However, I would not use couchbase here: it is an elastic NoSQL key/value store (i.e. designed to run on several nodes), not a cache. You could probably move some data from your mysql instances to a couchbase cluster, but using it just for caching is over-engineering IMO.

As Matt Ingenthron pointed out and Hari noted, Couchbase can work as a drop-in Memcached replacement. Couchbase offers memcached-style buckets, which are non-elastic: each node participating in the cluster is discrete, with no persistence, i.e. just a cache. But Couchbase also offers "Couchbase" bucket types, which do provide persistence. The Membase lineage is part of the codebase as well, so Couchbase serves data from RAM while replicating to other nodes and persisting changes to disk as they are applied. I would highly recommend Couchbase 3.x for both caching and persistence in one footprint, or in multiple footprints if you want a caching layer separate from your persistence layer.

We used memcached initially to cache data. In memcached, partitioning data for different applications into different buckets was a real problem, and we also needed to flush the data of a single bucket on its own. Monitoring the cached data was another requirement. We moved to Couchbase and use its memcached-style buckets. In my experience, Couchbase's memcached-style buckets are much more flexible and efficient for caching than plain memcached.

Have you considered moving your databases entirely to RAM using one of the in-memory NoSQL solutions with persistence? It could take less storage than your original MySQL database, because NoSQL solutions usually have a smaller footprint than SQL databases. Besides, if server-side logic is important to you, try Tarantool: it has Lua scripting on board and a fairly small memory footprint. In my case the same data occupied about half the space in Tarantool that it took in MySQL, because it has small per-row and per-field overhead and stores data as MessagePack.

Related

What is the fundamental difference between ElasticSearch and a cache?

Theoretically speaking, couldn't you just cache the results of a SQL query made to the database, making it similar to Elasticsearch? I understand you would run into invalidation issues, but what are the fundamental differences between Elasticsearch and a cache like Redis?
Elasticsearch is primarily a search engine optimized to store and retrieve structured or semi-structured data. It takes care of processing that data, builds indexes, and provides a nice DSL for querying. Oh, and it happens to be super fast :)
A distributed cache like Memcached or Redis (BTW Redis is not just a cache, but a data structure store) primarily stores key-value pairs for fast lookup. Think of your local hash table distributed across a bunch of machines.
Two different use cases. If all you need is a cache, Elasticsearch is probably not the right choice.
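That said, the "cache the SQL results" idea from the question is just the classic cache-aside pattern, and Redis handles it well. A minimal Python sketch using the redis-py client; query_db and the key scheme are hypothetical stand-ins:

    import json
    import redis

    r = redis.Redis(host="localhost", port=6379)

    def get_report(user_id, query_db):
        """Cache-aside: try Redis first, fall back to the database."""
        key = f"report:{user_id}"              # hypothetical key scheme
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)          # hit: no DB round trip
        result = query_db(user_id)             # miss: run the SQL query
        r.setex(key, 300, json.dumps(result))  # expire after 5 minutes
        return result

Invalidation is the hard part the question mentions: the TTL here only bounds staleness, it does not eliminate it.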

Mongodb - make inmemory or use cache

I will be creating a 5-node MongoDB cluster. It will be more read-heavy than write-heavy, and I'm wondering which design would bring better performance. These nodes will be dedicated to MongoDB only. For the sake of an example, say each node has 64GB of RAM.
From the mongodb docs it states:
MongoDB automatically uses all free memory on the machine as its cache
Does this mean as long as my data is smaller than the available ram it will be like having an in-memory database?
I also read that it is possible to run MongoDB purely in memory:
http://edgystuff.tumblr.com/post/49304254688/how-to-use-mongodb-as-a-pure-in-memory-db-redis
If my data is quite dynamic (it can range from 50GB to 75GB every few hours), would it theoretically perform better to let MongoDB manage itself with its cache (the default setup), or to put MongoDB into memory initially and, if the data grows beyond RAM, fall back to swap space (SSD)?
MongoDB's default storage engine maps its files into memory. This provides efficient access to the data while avoiding double caching (i.e. MongoDB's cache is actually the OS page cache).
Does this mean as long as my data is smaller than the available ram it will be like having an in-memory database?
For read traffic, yes. For write traffic it is different, since MongoDB may have to journal the write operation (depending on the configuration) and maintain the oplog.
Is it better to run MongoDB from memory only (leveraging tmpfs)?
For read traffic, it should not be better. Putting the files on tmpfs also avoids double caching (which is good), but the data can still be paged out. A regular filesystem will be just as fast once the data has been paged in.
For write traffic, it is faster, provided the journal and oplog are also put on tmpfs. Note that in that case a system crash will result in total data loss. Usually the performance gain is not worth the risk.
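For reference, the tmpfs setup only takes a mount and a dbpath switch; a sketch with placeholder paths and size, and with the total-data-loss caveat above still applying:

    # create a RAM-backed filesystem and point mongod at it (all data is volatile)
    mount -t tmpfs -o size=16g tmpfs /mnt/mongo-inmem
    mongod --dbpath /mnt/mongo-inmem --smallfiles --noprealloc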

Can Redis use disk as part of a LRU cache?

We need a distributed LRU cache, but one that can use both memory and disk. We have a large dataset which is stored permanently on disk. From that dataset we create other, calculated datasets, but only when clients need them.
Since these secondary datasets are derived from data which is persistent, we never need to permanently save this derived data.
I thought Redis would be able to use the disk as a secondary LRU cache, but I have not been able to find any documentation pointing to that. It seems that Redis only uses the disk to persist the entire cache. I had envisioned that we'd be able to scale out horizontally with a bunch of Redis instances.
If Redis can not do this, is there another system that does?
If the data does not fit into memory, it can be swapped out to disk. Redis offered this through its virtual memory feature, explained here: http://redis.io/topics/virtual-memory (note that the feature has since been deprecated).
Remark: you want to retrieve some data, do stuff with it, and keep some intermediate results. Consider whether you should distribute the processing, not only the data. Take a look at Apache Hadoop, and especially Apache Spark.
The way to solve this without changing how your clients work is, in fact, not to use Redis, but a Redis-compatible database such as Ardb, which in turn can be configured to use LevelDB under the hood, and LevelDB supports LRU-type on-disk caches.

What is the difference between Cassandra vs Oracle Coherence?

Assume that Oracle Coherence is free :)
Which one do you prefer?
What are the architectural and feature differences between Oracle Coherence (Tangosol) and Cassandra?
Oracle Coherence is a pure in-memory cache that can be distributed across nodes. Depending on its configuration it can provide strong consistency or eventual consistency for inserts and updates. Coherence is object-based, with a consistent data model.
Since you buy Coherence from Oracle, you can get commercial support from Oracle.
Cassandra is a Bigtable-style data store distributed across nodes, with no single point of failure. Its Bigtable implementation uses some caching to improve performance before committing the data to disk. Cassandra requires some structure in its tuples (key/value/timestamp) but otherwise supports flexible data structures.
Preferences should be determined by your use case. They are both pretty cool in their own right.
You might also want to check out
- Terracotta in the in-memory space
- CouchDB and HBase as other players in the big table space.
Let's not forget GemFire from Gemstone Systems, now owned by VMware (http://www.vmware.com/products/vfabric-gemfire/overview.html). GemFire is an in-memory distributed data fabric similar to Coherence and Terracotta, but different in certain key ways. Each has its pros and cons, but GemFire has lately been getting more support in a Spring sub-project called spring-gemfire.
Both are NoSQL databases. Currently three types of NoSQL databases exist: key-value stores, tabular stores, and document-oriented stores. Coherence is a key-value store, Cassandra is more like a tabular store, and MongoDB is a document-oriented NoSQL DB.

Cache systems - Hypertable vs Memcached

I want to implement a cache system for our application, and we've started integrating with Memcached. Recently I started hearing about Hypertable and saw some great benchmarks done with it..
However, I couldn't find good comparison between the two.
Just to get things straight: I know that Hypertable is considered closer to a DB than to a cache. On the other hand, it's not exactly an RDBMS - in fact, it's exactly not an RDBMS. It has its own benefits, but the question is whether they're worth the performance cost (if any)?
Hypertable is an implementation of the concepts in Google's Bigtable, namely a column-oriented DB that is highly denormalized, which means it doesn't need joins.
Memcached is an in-memory caching layer which acts like a distributed hashtable, keeping your app from having to hit the actual DB.
Both lend themselves well to being distributed and work well with MapReduce-style topologies, but they serve different purposes. Memcached/DHT serves to speed access to data held in memory, while Hypertable/Bigtable is an actual mechanism for permanent data storage on disk.
Memcached is used for speeding things up, e.g. caching the results of SQL queries without going to the DB, by storing everything in memory (RAM).
Hypertable (like HBase, Cassandra, MongoDB, etc.) is a permanent-storage NoSQL DB, with data stored on and retrieved from hard drives. These cannot give you the read/write performance of RAM (e.g. memcached), so the two are not really comparable to one another.
A better use case is to use a NoSQL DB for permanent storage, and memcached as a fast front-side cache between the web application and the (NoSQL or any other) DB.
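A minimal Python sketch of that front-side pattern, using the pymemcache client; fetch_from_db is a hypothetical stand-in for the real DB call:

    import json
    from pymemcache.client.base import Client

    mc = Client(("localhost", 11211))

    def get_user(user_id, fetch_from_db):
        # Check the RAM cache before hitting the permanent store.
        key = f"user:{user_id}"
        cached = mc.get(key)
        if cached is not None:
            return json.loads(cached)                # served from memory
        record = fetch_from_db(user_id)              # slow path: disk-backed DB
        mc.set(key, json.dumps(record), expire=600)  # keep hot for 10 minutes
        return record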
