mantain session between 2 instgances of tomcat - session

How to mantain session b/w 2 different instances of tomcat.
not asking about sticky session
do not want to use any token machenisum like fedrated login/o-outh

Well, the formal answer to this should be:
Use the session replication mechanism. For example, for Tomcat 7 is described here.
So if you know that the application is very small (say, less than 100 users, although it really depends on the data you're planning to store there) you can safely use it, and stop reading my answer here :)
However, in my opinion, small applications tend to grow, the number or users tends to increase, so the best would be not to maintain the session at all :). Because if you need session, this means that you store on server the information about the client. So when the number of client grows much, you won't be able to maintain that.
An alternative is:
Store the session information in some very-fast storage, for example Redis
Maintain some identifier of the user (the chances are that you already have one) and just query redis for the data. Redis has TTLs, so the data can be removed from Redis automatically.
The advantage is that the solution like this is much more scalable, so it can easily handle literally millions of records, while maintaining this at the level of JVM will make it impossible to scale.

Related

Which caching mechanism to use in my spring application in below scenarios

We are using Spring boot application with Maria DB database. We are getting data from difference services and storing in our database. And while calling other service we need to fetch data from db (based on mapping) and call the service.
So to avoid database hit, we want to cache all mapping data in cache and use it to retrieve data and call service API.
So our ask is - Add data in Cache when it gets created in database (could add up-to millions records) and remove from cache when status of one of column value is "xyz" (for example) or based on eviction policy.
Should we use in-memory cache using Hazelcast/ehCache or Redis/Couch base?
Please suggest.
Thanks
I mostly agree with Rick in terms of don't build it until you need it, however it is important these days to think early of where this caching layer would fit later and how to integrate it (for example using interfaces). Adding it into a non-prepared system is always possible but much more expensive (in terms of hours) and complicated.
Ok to the actual question; disclaimer: Hazelcast employee
In general for caching Hazelcast, ehcache, Redis and others are all good candidates. The first question you want to ask yourself though is, "can I hold all necessary records in the memory of a single machine. Especially in terms for ehcache you get replication (all machines hold all information) which means every single node needs to keep them in memory. Depending on the size you want to cache, maybe not optimal. In this case Hazelcast might be the better option as we partition data in a cluster and optimize the access to a single network hop which minimal overhead over network latency.
Second question would be around serialization. Do you want to store information in a highly optimized serialization (which needs code to transform to human readable) or do you want to store as JSON?
Third question is about the number of clients and threads that'll access the data storage. Obviously a local cache like ehcache is always the fastest option, for the tradeoff of lots and lots of memory. Apart from that the most important fact is the treading model the in-memory store uses. It's either multithreaded and nicely scaling or a single-thread concept which becomes a bottleneck when you exhaust this thread. It is to overcome with more processes but it's a workaround to utilize todays systems to the fullest.
In more general terms, each of your mentioned systems would do the job. The best tool however should be selected by a POC / prototype and your real world use case. The important bit is real world, as a single thread behaves amazing under low pressure (obviously way faster) but when exhausted will become a major bottleneck (again obviously delaying responses).
I hope this helps a bit since, at least to me, every answer like "yes we are the best option" would be an immediate no-go for the person who said it.
Build InnoDB with the memcached Plugin
https://dev.mysql.com/doc/refman/5.7/en/innodb-memcached.html

Redis: using two instances or just one (caching and storage)?

We need to perform rate limiting for requests to our API. We have a lot of web servers, and the rate limit should be shared between all of them. Also, the rate limit demands a certain amount of ephemeral storage (we want to store the users quota for a certain period of time).
We have a great rate limiting implementation that works with Redis by using SETEX. In this use case we need Redis to also be used a storage (for a short while, according to the expiration set on the SETEX calls). Also, the cache needs to be shared across all servers, and there is no way we could use something like an in-memory cache on each web server for dealing with the rate limiting since the rate limiting is per user - so we expect to have a lot of memory consumed for this purpose. So this process is a great use case for a Redis cluster.
Thing is - the same web server that performs the rate limit, also has some other caching needs. It fetches some stuff from a DB, and then caches the results in two layers: first, in an in-memory LRU-cache (on the actual server) and the second layer is Redis again - this time used as cache-only (no storage). In case the item gets evicted from the in-memory LRU-cache, it is passed on to be saved in Redis (so that even when a cache miss occurs in-memory, there would still be a cache-hit because thanks to Redis).
Should we use the same Redis instance for both needs (rate limiter that needs storage on one hand and cache layer that does not on the other)? I guess we could use a single Redis instance that includes storage (not the cache only option) and just use that for both needs? Would it be better, performance wise, for each server of ours to talk to two Redis instances - one that's used as cache-only and one that also features the storage option?
I always recommend dividing your setup into distinct data roles. Combining them sounds neat but in practice can be a real pain. In your case you ave two distinct "data roles": cached data and stored data. That is two major classes of distinction which means use two different instances.
In your particular case isolating them will be easier from an operational standpoint when things go wrong or need upgrading. You'll avoid intermingling services such that an issue in caching causes issues in your "storage" layer - or the inverse.
Redis usage tends to grow into more areas. If you get in the habit of dedicated Redis endpoints now you'll be better able to grow your usage in the future, as opposed to having to refactor and restructure into it when things get a bit rough.

Growing hash-of-queues beyond main memory limits

I have a cluster application, which is divided into a controller and a bunch of workers. The controller runs on a dedicated host, the workers phone in over the network and get handed jobs, so far so normal. (Basically the "divide-and-conquer pipeline" from the zeromq manual, with job-specific wrinkles. That's not important right now.)
The controller's core data structure is unordered_map<string, queue<string>> in pseudo-C++ (the controller is actually implemented in Python, but I am open to the possibility of rewriting it in something else). The strings in the queues define jobs, and the keys of the map are a categorization of the jobs. The controller is seeded with a set of jobs; when a worker starts up, the controller removes one string from one of the queues and hands it out as the worker's first job. The worker may crash during the run, in which case the job gets put back on the appropriate queue (there is an ancillary table of outstanding jobs). If it completes the job successfully, it will send back a list of new job-strings, which the controller will sort into the appropriate queues. Then it will pull another string off some queue and send it to the worker as its next job; usually, but not always, it will pick the same queue as the previous job for that worker.
Now, the question. This data structure currently sits entirely in main memory, which was fine for small-scale test runs, but at full scale is eating all available RAM on the controller, all by itself. And the controller has several other tasks to accomplish, so that's no good.
What approach should I take? So far, I have considered:
a) to convert this to a primarily-on-disk data structure. It could be cached in RAM to some extent for efficiency, but jobs take tens of seconds to complete, so it's okay if it's not that efficient,
b) using a relational database - e.g. SQLite, (but SQL schemas are a very poor fit AFAICT),
c) using a NoSQL database with persistency support, e.g. Redis (data structure maps over trivially, but this still appears very RAM-centric to make me feel confident that the memory-hog problem will actually go away)
Concrete numbers: For a full-scale run, there will be between one and ten million keys in the hash, and less than 100 entries in each queue. String length varies wildly but is unlikely to be more than 250-ish bytes. So, a hypothetical (impossible) zero-overhead data structure would require 234 – 237 bytes of storage.
Ultimately, it all boils down on how you define efficiency needed on part of the controller -- e.g. response times, throughput, memory consumption, disk consumption, scalability... These properties are directly or indirectly related to:
number of requests the controller needs to handle per second (throughput)
acceptable response times
future growth expectations
From your options, here's how I'd evaluate each option:
a) to convert this to a primarily-on-disk data structure. It could be
cached in RAM to some extent for efficiency, but jobs take tens of
seconds to complete, so it's okay if it's not that efficient,
Given the current memory hog requirement, some form of persistent storage seems a reaonsable choice. Caching comes into play if there is a repeatable access pattern, say the same queue is accessed over and over again -- otherwise, caching is likely not to help.
This option makes sense if 1) you cannot find a database that maps trivially to your data structure (unlikely), 2) for some other reason you want to have your own on-disk format, e.g. you find that converting to a database is too much overhead (again, unlikely).
One alternative to databases is to look at persistent queues (e.g. using a RabbitMQ backing store), but I'm not sure what the per-queue or overall size limits are.
b) using a relational database - e.g. SQLite, (but SQL schemas are a
very poor fit AFAICT),
As you mention, SQL is probably not a good fit for your requirements, even though you could surely map your data structure to a relational model somehow.
However, NoSQL databases like MongoDB or CouchDB seem much more appropriate. Either way, a database of some sort seems viable as long as they can meet your throughput requirement. Many if not most NoSQL databases are also a good choice from a scalability perspective, as they include support for sharding data across multiple machines.
c) using a NoSQL database with persistency support, e.g. Redis (data
structure maps over trivially, but this still appears very RAM-centric
to make me feel confident that the memory-hog problem will actually go
away)
An in-memory database like Redis doesn't solve the memory hog problem, unless you set up a cluster of machines that each holds a part of the overall data. This makes sense only if keeping all data in-memory is needed due to low response times requirements. Yet, given the nature of your jobs, taking tens of seconds to complete, response times, respective to workers, hardly matter.
If you find, however, that response times do matter, Redis would be a good choice, as it handles partitioning trivially using either client-side consistent-hashing or at the cluster level, thus also supporting scalability scenarios.
In any case
Before you choose a solution, be sure to clarify your requirements. You mention you want an efficient solution. Since efficiency can only be gauged against some set of requirements, here's the list of questions I would try to answer first:
*Requirements
how many jobs are expected to complete, say per minute or per hour?
how many workers are needed to do so?
concluding from that:
what is the expected load in requestes/per second, and
what response times are expected on part of the controller (handing out jobs, receiving results)?
And looking into the future:
will the workload increase, i.e. does your solution need to scale up (more jobs per time unit, more more data per job?)
will there be a need for persistency of jobs and results, e.g. for auditing purposes?
Again, concluding from that,
how will this influence the number of workers?
what effect will it have on the number of requests/second on part of the controller?
With these answers, you will find yourself in a better position to choose a solution.
I would look into a message queue like RabbitMQ. This way it will first fill up the RAM and then use the disk. I have up to 500,000,000 objects in queues on a single server and it's just plugging away.
RabbitMQ works on Windows and Linux and has simple connectors/SDKs to about any kind of language.
https://www.rabbitmq.com/

Memcached vs. Redis? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
We're using a Ruby web-app with Redis server for caching. Is there a point to test Memcached instead?
What will give us better performance? Any pros or cons between Redis and Memcached?
Points to consider:
Read/write speed.
Memory usage.
Disk I/O dumping.
Scaling.
Summary (TL;DR)
Updated June 3rd, 2017
Redis is more powerful, more popular, and better supported than memcached. Memcached can only do a small fraction of the things Redis can do. Redis is better even where their features overlap.
For anything new, use Redis.
Memcached vs Redis: Direct Comparison
Both tools are powerful, fast, in-memory data stores that are useful as a cache. Both can help speed up your application by caching database results, HTML fragments, or anything else that might be expensive to generate.
Points to Consider
When used for the same thing, here is how they compare using the original question's "Points to Consider":
Read/write speed: Both are extremely fast. Benchmarks vary by workload, versions, and many other factors but generally show redis to be as fast or almost as fast as memcached. I recommend redis, but not because memcached is slow. It's not.
Memory usage: Redis is better.
memcached: You specify the cache size and as you insert items the daemon quickly grows to a little more than this size. There is never really a way to reclaim any of that space, short of restarting memcached. All your keys could be expired, you could flush the database, and it would still use the full chunk of RAM you configured it with.
redis: Setting a max size is up to you. Redis will never use more than it has to and will give you back memory it is no longer using.
I stored 100,000 ~2KB strings (~200MB) of random sentences into both. Memcached RAM usage grew to ~225MB. Redis RAM usage grew to ~228MB. After flushing both, redis dropped to ~29MB and memcached stayed at ~225MB. They are similarly efficient in how they store data, but only one is capable of reclaiming it.
Disk I/O dumping: A clear win for redis since it does this by default and has very configurable persistence. Memcached has no mechanisms for dumping to disk without 3rd party tools.
Scaling: Both give you tons of headroom before you need more than a single instance as a cache. Redis includes tools to help you go beyond that while memcached does not.
memcached
Memcached is a simple volatile cache server. It allows you to store key/value pairs where the value is limited to being a string up to 1MB.
It's good at this, but that's all it does. You can access those values by their key at extremely high speed, often saturating available network or even memory bandwidth.
When you restart memcached your data is gone. This is fine for a cache. You shouldn't store anything important there.
If you need high performance or high availability there are 3rd party tools, products, and services available.
redis
Redis can do the same jobs as memcached can, and can do them better.
Redis can act as a cache as well. It can store key/value pairs too. In redis they can even be up to 512MB.
You can turn off persistence and it will happily lose your data on restart too. If you want your cache to survive restarts it lets you do that as well. In fact, that's the default.
It's super fast too, often limited by network or memory bandwidth.
If one instance of redis/memcached isn't enough performance for your workload, redis is the clear choice. Redis includes cluster support and comes with high availability tools (redis-sentinel) right "in the box". Over the past few years redis has also emerged as the clear leader in 3rd party tooling. Companies like Redis Labs, Amazon, and others offer many useful redis tools and services. The ecosystem around redis is much larger. The number of large scale deployments is now likely greater than for memcached.
The Redis Superset
Redis is more than a cache. It is an in-memory data structure server. Below you will find a quick overview of things Redis can do beyond being a simple key/value cache like memcached. Most of redis' features are things memcached cannot do.
Documentation
Redis is better documented than memcached. While this can be subjective, it seems to be more and more true all the time.
redis.io is a fantastic easily navigated resource. It lets you try redis in the browser and even gives you live interactive examples with each command in the docs.
There are now 2x as many stackoverflow results for redis as memcached. 2x as many Google results. More readily accessible examples in more languages. More active development. More active client development. These measurements might not mean much individually, but in combination they paint a clear picture that support and documentation for redis is greater and much more up-to-date.
Persistence
By default redis persists your data to disk using a mechanism called snapshotting. If you have enough RAM available it's able to write all of your data to disk with almost no performance degradation. It's almost free!
In snapshot mode there is a chance that a sudden crash could result in a small amount of lost data. If you absolutely need to make sure no data is ever lost, don't worry, redis has your back there too with AOF (Append Only File) mode. In this persistence mode data can be synced to disk as it is written. This can reduce maximum write throughput to however fast your disk can write, but should still be quite fast.
There are many configuration options to fine tune persistence if you need, but the defaults are very sensible. These options make it easy to setup redis as a safe, redundant place to store data. It is a real database.
Many Data Types
Memcached is limited to strings, but Redis is a data structure server that can serve up many different data types. It also provides the commands you need to make the most of those data types.
Strings (commands)
Simple text or binary values that can be up to 512MB in size. This is the only data type redis and memcached share, though memcached strings are limited to 1MB.
Redis gives you more tools for leveraging this datatype by offering commands for bitwise operations, bit-level manipulation, floating point increment/decrement support, range queries, and multi-key operations. Memcached doesn't support any of that.
Strings are useful for all sorts of use cases, which is why memcached is fairly useful with this data type alone.
Hashes (commands)
Hashes are sort of like a key value store within a key value store. They map between string fields and string values. Field->value maps using a hash are slightly more space efficient than key->value maps using regular strings.
Hashes are useful as a namespace, or when you want to logically group many keys. With a hash you can grab all the members efficiently, expire all the members together, delete all the members together, etc. Great for any use case where you have several key/value pairs that need to grouped.
One example use of a hash is for storing user profiles between applications. A redis hash stored with the user ID as the key will allow you to store as many bits of data about a user as needed while keeping them stored under a single key. The advantage of using a hash instead of serializing the profile into a string is that you can have different applications read/write different fields within the user profile without having to worry about one app overriding changes made by others (which can happen if you serialize stale data).
Lists (commands)
Redis lists are ordered collections of strings. They are optimized for inserting, reading, or removing values from the top or bottom (aka: left or right) of the list.
Redis provides many commands for leveraging lists, including commands to push/pop items, push/pop between lists, truncate lists, perform range queries, etc.
Lists make great durable, atomic, queues. These work great for job queues, logs, buffers, and many other use cases.
Sets (commands)
Sets are unordered collections of unique values. They are optimized to let you quickly check if a value is in the set, quickly add/remove values, and to measure overlap with other sets.
These are great for things like access control lists, unique visitor trackers, and many other things. Most programming languages have something similar (usually called a Set). This is like that, only distributed.
Redis provides several commands to manage sets. Obvious ones like adding, removing, and checking the set are present. So are less obvious commands like popping/reading a random item and commands for performing unions and intersections with other sets.
Sorted Sets (commands)
Sorted Sets are also collections of unique values. These ones, as the name implies, are ordered. They are ordered by a score, then lexicographically.
This data type is optimized for quick lookups by score. Getting the highest, lowest, or any range of values in between is extremely fast.
If you add users to a sorted set along with their high score, you have yourself a perfect leader-board. As new high scores come in, just add them to the set again with their high score and it will re-order your leader-board. Also great for keeping track of the last time users visited and who is active in your application.
Storing values with the same score causes them to be ordered lexicographically (think alphabetically). This can be useful for things like auto-complete features.
Many of the sorted set commands are similar to commands for sets, sometimes with an additional score parameter. Also included are commands for managing scores and querying by score.
Geo
Redis has several commands for storing, retrieving, and measuring geographic data. This includes radius queries and measuring distances between points.
Technically geographic data in redis is stored within sorted sets, so this isn't a truly separate data type. It is more of an extension on top of sorted sets.
Bitmap and HyperLogLog
Like geo, these aren't completely separate data types. These are commands that allow you to treat string data as if it's either a bitmap or a hyperloglog.
Bitmaps are what the bit-level operators I referenced under Strings are for. This data type was the basic building block for reddit's recent collaborative art project: r/Place.
HyperLogLog allows you to use a constant extremely small amount of space to count almost unlimited unique values with shocking accuracy. Using only ~16KB you could efficiently count the number of unique visitors to your site, even if that number is in the millions.
Transactions and Atomicity
Commands in redis are atomic, meaning you can be sure that as soon as you write a value to redis that value is visible to all clients connected to redis. There is no wait for that value to propagate. Technically memcached is atomic as well, but with redis adding all this functionality beyond memcached it is worth noting and somewhat impressive that all these additional data types and features are also atomic.
While not quite the same as transactions in relational databases, redis also has transactions that use "optimistic locking" (WATCH/MULTI/EXEC).
Pipelining
Redis provides a feature called 'pipelining'. If you have many redis commands you want to execute you can use pipelining to send them to redis all-at-once instead of one-at-a-time.
Normally when you execute a command to either redis or memcached, each command is a separate request/response cycle. With pipelining, redis can buffer several commands and execute them all at once, responding with all of the responses to all of your commands in a single reply.
This can allow you to achieve even greater throughput on bulk importing or other actions that involve lots of commands.
Pub/Sub
Redis has commands dedicated to pub/sub functionality, allowing redis to act as a high speed message broadcaster. This allows a single client to publish messages to many other clients connected to a channel.
Redis does pub/sub as well as almost any tool. Dedicated message brokers like RabbitMQ may have advantages in certain areas, but the fact that the same server can also give you persistent durable queues and other data structures your pub/sub workloads likely need, Redis will often prove to be the best and most simple tool for the job.
Lua Scripting
You can kind of think of lua scripts like redis's own SQL or stored procedures. It's both more and less than that, but the analogy mostly works.
Maybe you have complex calculations you want redis to perform. Maybe you can't afford to have your transactions roll back and need guarantees every step of a complex process will happen atomically. These problems and many more can be solved with lua scripting.
The entire script is executed atomically, so if you can fit your logic into a lua script you can often avoid messing with optimistic locking transactions.
Scaling
As mentioned above, redis includes built in support for clustering and is bundled with its own high availability tool called redis-sentinel.
Conclusion
Without hesitation I would recommend redis over memcached for any new projects, or existing projects that don't already use memcached.
The above may sound like I don't like memcached. On the contrary: it is a powerful, simple, stable, mature, and hardened tool. There are even some use cases where it's a little faster than redis. I love memcached. I just don't think it makes much sense for future development.
Redis does everything memcached does, often better. Any performance advantage for memcached is minor and workload specific. There are also workloads for which redis will be faster, and many more workloads that redis can do which memcached simply can't. The tiny performance differences seem minor in the face of the giant gulf in functionality and the fact that both tools are so fast and efficient they may very well be the last piece of your infrastructure you'll ever have to worry about scaling.
There is only one scenario where memcached makes more sense: where memcached is already in use as a cache. If you are already caching with memcached then keep using it, if it meets your needs. It is likely not worth the effort to move to redis and if you are going to use redis just for caching it may not offer enough benefit to be worth your time. If memcached isn't meeting your needs, then you should probably move to redis. This is true whether you need to scale beyond memcached or you need additional functionality.
Use Redis if
You require selectively deleting/expiring items in the cache. (You need this)
You require the ability to query keys of a particular type. eq. 'blog1:posts:*', 'blog2:categories:xyz:posts:*'. oh yeah! this is very important. Use this to invalidate certain types of cached items selectively. You can also use this to invalidate fragment cache, page cache, only AR objects of a given type, etc.
Persistence (You will need this too, unless you are okay with your cache having to warm up after every restart. Very essential for objects that seldom change)
Use memcached if
Memcached gives you headached!
umm... clustering? meh. if you gonna go that far, use Varnish and Redis for caching fragments and AR Objects.
From my experience I've had much better stability with Redis than Memcached
Memcached is multithreaded and fast.
Redis has lots of features and is very fast, but completely limited to one core as it is based on an event loop.
We use both. Memcached is used for caching objects, primarily reducing read load on the databases. Redis is used for things like sorted sets which are handy for rolling up time-series data.
This is too long to be posted as a comment to already accepted answer, so I put it as a separate answer
One thing also to consider is whether you expect to have a hard upper memory limit on your cache instance.
Since redis is an nosql database with tons of features and caching is only one option it can be used for, it allocates memory as it needs it — the more objects you put in it, the more memory it uses. The maxmemory option does not strictly enforces upper memory limit usage. As you work with cache, keys are evicted and expired; chances are your keys are not all the same size, so internal memory fragmentation occurs.
By default redis uses jemalloc memory allocator, which tries its best to be both memory-compact and fast, but it is a general purpose memory allocator and it cannot keep up with lots of allocations and object purging occuring at a high rate. Because of this, on some load patterns redis process can apparently leak memory because of internal fragmentation. For example, if you have a server with 7 Gb RAM and you want to use redis as non-persistent LRU cache, you may find that redis process with maxmemory set to 5Gb over time would use more and more memory, eventually hitting total RAM limit until out-of-memory killer interferes.
memcached is a better fit to scenario described above, as it manages its memory in a completely different way. memcached allocates one big chunk of memory — everything it will ever need — and then manages this memory by itself, using its own implemented slab allocator. Moreover, memcached tries hard to keep internal fragmentation low, as it actually uses per-slab LRU algorithm, when LRU evictions are done with object size considered.
With that said, memcached still has a strong position in environments, where memory usage has to be enforced and/or be predictable. We've tried to use latest stable redis (2.8.19) as a drop-in non-persistent LRU-based memcached replacement in workload of 10-15k op/s, and it leaked memory A LOT; the same workload was crashing Amazon's ElastiCache redis instances in a day or so because of the same reasons.
Memcached is good at being a simple key/value store and is good at doing key => STRING. This makes it really good for session storage.
Redis is good at doing key => SOME_OBJECT.
It really depends on what you are going to be putting in there. My understanding is that in terms of performance they are pretty even.
Also good luck finding any objective benchmarks, if you do find some kindly send them my way.
If you don't mind a crass writing style, Redis vs Memcached on the Systoilet blog is worth a read from a usability standpoint, but be sure to read the back & forth in the comments before drawing any conclusions on performance; there are some methodological problems (single-threaded busy-loop tests), and Redis has made some improvements since the article was written as well.
And no benchmark link is complete without confusing things a bit, so also check out some conflicting benchmarks at Dormondo's LiveJournal and the Antirez Weblog.
Edit -- as Antirez points out, the Systoilet analysis is rather ill-conceived. Even beyond the single-threading shortfall, much of the performance disparity in those benchmarks can be attributed to the client libraries rather than server throughput. The benchmarks at the Antirez Weblog do indeed present a much more apples-to-apples (with the same mouth) comparison.
I got the opportunity to use both memcached and redis together in the caching proxy that i have worked on , let me share you where exactly i have used what and reason behind same....
Redis >
1) Used for indexing the cache content , over the cluster . I have more than billion keys in spread over redis clusters , redis response times is quite less and stable .
2) Basically , its a key/value store , so where ever in you application you have something similar, one can use redis with bothering much.
3) Redis persistency, failover and backup (AOF ) will make your job easier .
Memcache >
1) yes , an optimized memory that can be used as cache . I used it for storing cache content getting accessed very frequently (with 50 hits/second)with size less than 1 MB .
2) I allocated only 2GB out of 16 GB for memcached that too when my single content size was >1MB .
3) As the content grows near the limits , occasionally i have observed higher response times in the stats(not the case with redis) .
If you ask for overall experience Redis is much green as it is easy to configure, much flexible with stable robust features.
Further , there is a benchmarking result available at this link , below are few higlight from same,
Hope this helps!!
Test. Run some simple benchmarks. For a long while I considered myself an old school rhino since I used mostly memcached and considered Redis the new kid.
With my current company Redis was used as the main cache. When I dug into some performance stats and simply started testing, Redis was, in terms of performance, comparable or minimally slower than MySQL.
Memcached, though simplistic, blew Redis out of water totally. It scaled much better:
for bigger values (required change in slab size, but worked)
for multiple concurrent requests
Also, memcached eviction policy is in my view, much better implemented, resulting in overall more stable average response time while handling more data than the cache can handle.
Some benchmarking revealed that Redis, in our case, performs very poorly. This I believe has to do with many variables:
type of hardware you run Redis on
types of data you store
amount of gets and sets
how concurrent your app is
do you need data structure storage
Personally, I don't share the view Redis authors have on concurrency and multithreading.
Another bonus is that it can be very clear how memcache is going to behave in a caching scenario, while redis is generally used as a persistent datastore, though it can be configured to behave just like memcached aka evicting Least Recently Used items when it reaches max capacity.
Some apps I've worked on use both just to make it clear how we intend the data to behave - stuff in memcache, we write code to handle the cases where it isn't there - stuff in redis, we rely on it being there.
Other than that Redis is generally regarded as superior for most use cases being more feature-rich and thus flexible.
It would not be wrong, if we say that redis is combination of (cache + data structure) while memcached is just a cache.
A very simple test to set and get 100k unique keys and values against redis-2.2.2 and memcached. Both are running on linux VM(CentOS) and my client code(pasted below) runs on windows desktop.
Redis
Time taken to store 100000 values is = 18954ms
Time taken to load 100000 values is = 18328ms
Memcached
Time taken to store 100000 values is = 797ms
Time taken to retrieve 100000 values is = 38984ms
Jedis jed = new Jedis("localhost", 6379);
int count = 100000;
long startTime = System.currentTimeMillis();
for (int i=0; i<count; i++) {
jed.set("u112-"+i, "v51"+i);
}
long endTime = System.currentTimeMillis();
System.out.println("Time taken to store "+ count + " values is ="+(endTime-startTime)+"ms");
startTime = System.currentTimeMillis();
for (int i=0; i<count; i++) {
client.get("u112-"+i);
}
endTime = System.currentTimeMillis();
System.out.println("Time taken to retrieve "+ count + " values is ="+(endTime-startTime)+"ms");
One major difference that hasn't been pointed out here is that Memcache has an upper memory limit at all times, while Redis does not by default (but can be configured to). If you would always like to store a key/value for certain amount of time (and never evict it because of low memory) you want to go with Redis. Of course, you also risk the issue of running out of memory...
Memcached will be faster if you are interested in performance, just even because Redis involves networking (TCP calls). Also internally Memcache is faster.
Redis has more features as it was mentioned by other answers.
The biggest remaining reason is specialization.
Redis can do a lot of different things and one side effect of that is developers may start using a lot of those different feature sets on the same instance. If you're using the LRU feature of Redis for a cache along side hard data storage that is NOT LRU it's entirely possible to run out of memory.
If you're going to setup a dedicated Redis instance to be used ONLY as an LRU instance to avoid that particular scenario then there's not really any compelling reason to use Redis over Memcached.
If you need a reliable "never goes down" LRU cache...Memcached will fit the bill since it's impossible for it to run out of memory by design and the specialize functionality prevents developers from trying to make it so something that could endanger that. Simple separation of concerns.
We thought of Redis as a load-takeoff for our project at work. We thought that by using a module in nginx called HttpRedis2Module or something similar we would have awesome speed but when testing with AB-test we're proven wrong.
Maybe the module was bad or our layout but it was a very simple task and it was even faster to take data with php and then stuff it into MongoDB. We're using APC as caching-system and with that php and MongoDB. It was much much faster then nginx Redis module.
My tip is to test it yourself, doing it will show you the results for your environment. We decided that using Redis was unnecessary in our project as it would not make any sense.
Redis is better.
The Pros of Redis are ,
It has a lot of data storage options such as string , sets , sorted sets , hashes , bitmaps
Disk Persistence of records
Stored Procedure (LUA scripting) support
Can act as a Message Broker using PUB/SUB
Whereas Memcache is an in-memory key value cache type system.
No support for various data type storages like lists , sets as in
redis.
The major con is Memcache has no disk persistence .
Here is the really great article/differences provided by Amazon
Redis is a clear winner comparing with memcached.
Only one plus point for Memcached
It is multithreaded and fast. Redis has lots of great features and is very fast, but limited to one core.
Great points about Redis, which are not supported in Memcached
Snapshots - User can take a snapshot of Redis cache and persist on
secondary storage any point of time.
Inbuilt support for many data structures like Set, Map, SortedSet,
List, BitMaps etc.
Support for Lua scripting in redis

Most efficient way to cache in a fastcgi app

For fun i am writing a fastcgi app. Right now all i do is generate a GUID and display it at the top of the page then make a db query based on the url which pulls data from one of my existing sites.
I would like to attempt to cache everything on the page except for the GUID. What is a good way of doing that? I heard of but never used redis. But it appears its a server which means its in a seperate process. Perhaps an in process solution would be faster? (unless its not?)
What is a good solution for page caching? (i'm using C++)
Your implementation sounds like you need a simple key-value caching mechanism, and you could possibly use a container like std::unordered_map from C++11, or its boost cousin, boost::unordered_map. unordered_map provides a hash table implementation. If you needed even higher performance at some point, you could also look at Boost.Intrusive which provides high performance, standard library-compatible containers.
If you roll your cache with the suggestions mentioned, a second concern will be expiring cache entries, because of the possibility your cached data will grow stale. I don't know what your data is like, but you can choose to implement a caching strategy like any of these:
after a certain time/number of uses, expire a cached entry
after a certain time/number of uses, expire the entire cache (extreme)
least-recently used - there's a stack overflow question concerning this: LRU cache design
Multithreaded/concurrent access may also be a concern, though as suggested in the link above, a possibility would be to lock the cache on access rather than worry about granular locking.
Now if you're talking about scaling, and moving up to multiple processes, and distributing server processes across multiple physical machines, the simple in-process caching might not be the way to go anymore (everyone could have different copies of data at any given time, inconsistency of performance if some server has cached data but others don't).
That's where Redis/Memcached/Membase/etc. shine - they are built for scaling and for offloading work from a database. They could be beaten out by a database and in-memory cache in performance (there is latency, after all, and a host of other factors), but when it comes to scaling, they are very useful and save load from a database, and can quickly serve requests. They also come with features cache expiration (implementations differ between them).
Best of all? They're easy to use and drop in. You don't have to choose redis/memcache from the outset, as caching itself is just an optimization and you can quickly replace the caching code with using, say, an in-memory cache of your own to using redis or something else.
There are still some differences between the caching servers though - membase and memcache distribute their data, while redis has master-slave replication.
For the record: I work in a company where we use memcached servers - we have several of them in the data center with the rest of our servers each having something like 16 GB of RAM allocated completely to cache.
edit:
And for speed comparisons, I'll adapt something from a Herb Sutter presentation I watched long ago:
process in-memory -> really fast
getting data from a local process in-memory data -> still really fast
data from local disk -> depends on your I/O device, SSD can be fast, but mechanical drives are glacial
getting data from remote process (in-memory data) -> fast-ish, and your cache servers better be close
getting data from remote process (disk) -> iceberg

Resources