Reliable memcached pool - caching

I'm currently having the following issue: I run multiple memcached instances across various servers, and if any of those servers goes down, my application suffers (hits to the backend increase). Should I use Moxi or Couchbase? I don't want to change my code.

If you use memcached today, install Moxi on each application server and point it at a Couchbase cluster with a Couchbase bucket. That is the whole idea behind Moxi: it is meant to be the bridge between memcached and Couchbase. Just do not make the mistake some people make with memcached and install Couchbase on the same nodes as your application servers.
Just know that you will not get the full feature set Couchbase offers, as some of it requires the SDKs, but you will get the pieces it sounds like you need initially.
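Concretely, the application keeps speaking plain memcached, just pointed at the local Moxi port instead of the old server pool. A minimal sketch using Python's pymemcache (the port and key are illustrative; 11211 is simply the conventional memcached port Moxi is often configured to listen on):

```python
from pymemcache.client.base import Client

# The app talks to the local Moxi proxy; Moxi forwards the
# traffic to the Couchbase cluster behind it.
client = Client(("127.0.0.1", 11211))

client.set("user:42:profile", b"serialized-profile", expire=300)
print(client.get("user:42:profile"))
```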

Related

Couchbase warmup strategies

We have a memcached cluster running in production. Now we are replacing memcached with a Couchbase cluster as a persistent cache layer. The question is how to implement this cut-over and how to warm up the Couchbase bucket. Obviously we can't simply switch over to the cold Couchbase cluster, since starting with a cold cache would bring the whole site down.
One option I was thinking of is to warm up Couchbase as a memcached node first. That means Couchbase would use a (non-persistent) memcached bucket and receive cache set/get traffic like any other memcached node. The good thing about this is that it requires minimal code changes (all that's needed is to configure the Moxi proxy to take memcached traffic and register that node as a memcached node). Later we would convert all memcached buckets to Couchbase buckets. But I'm not sure Couchbase supports conversion between these two bucket types.
The second option is to set up the persistent Couchbase bucket (as opposed to a non-persistent memcached bucket) from the beginning. We change the production cache client to replicate all traffic to both the memcached and Couchbase clusters. We monitor the Couchbase bucket, and once the cached items reach a certain size, we complete the cut-over. A small drawback is the extra complexity of changing the cache client.
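Roughly, the dual-write client I have in mind would be a thin wrapper like the sketch below (the two underlying client objects are placeholders for however we would construct them):

```python
class DualWriteCache:
    """Writes go to both clusters; reads stay on memcached until
    the Couchbase bucket is warm enough to complete the cut-over."""

    def __init__(self, memcached_client, couchbase_client):
        self.mc = memcached_client
        self.cb = couchbase_client

    def set(self, key, value, ttl=0):
        self.mc.set(key, value, ttl)
        try:
            self.cb.set(key, value, ttl)  # warm the new cluster
        except Exception:
            pass  # the new cluster must never break production traffic

    def get(self, key):
        return self.mc.get(key)  # reads served from the old cluster
```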
Thoughts?
EDIT on Aug 9, 2016
As I later found out, converting a memcached bucket to a Couchbase bucket is not supported, so the first option is not feasible.
In the end we decided to set up the client-side (standalone) Moxi proxy on each application host. We are doing this incrementally, host by host, to ramp up the cache traffic; that way each change to the site is small enough.
If you want something easy, less work, and proven to work well, do the following:
Set up a Moxi client on each application server.
Point Moxi at a Couchbase bucket on the Couchbase cluster.
Change your web application servers to point at the local Moxi install.
For your next code revision, start converting your code to use the Couchbase SDK instead of memcached.
Yes, there will be a period where things are not hot in the cache, but it will not take long for Couchbase to get populated. This method is used all the time to switch over; it is easy and nearly foolproof. One thing I have seen people do is try to copy things from their existing memcached servers over to Couchbase before cutting over, but what I am not sure of is how they knew the key of each value in memcached.
Also note that Moxi is an interim step to get off of regular memcached easily, and it is great, but in the long run it is much better to switch to the SDK. The SDK has many more features than pure memcached.
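For a sense of scale, the basic operations through the SDK are only a few lines. A sketch with the 2.x Python SDK, assuming a bucket named default (check the docs for the SDK version you actually install, as the API has changed across major versions):

```python
from couchbase.bucket import Bucket

# Connect directly to the cluster -- no Moxi in the path.
bucket = Bucket("couchbase://127.0.0.1/default")

bucket.upsert("greeting", {"msg": "hello"})  # a JSON document, not just bytes
print(bucket.get("greeting").value)
```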
Do not use memcached buckets, as they have none of the HA, persistence, or other features that Couchbase buckets offer.

Is it necessary for memcached to replicate its data?

I understand that memcached is a distributed caching system. However, is it entirely necessary for memcached to replicate? The objective is to persist sessions in a clustered environment.
For example, if we have memcached running on, say, 2 servers, both with data on them, and server #1 goes down, could we potentially lose the session data that was stored on it? In other words, what should we expect to happen should any memcached server (storing data) go down, and how would it affect our sessions in a clustered environment?
At the end of the day, will it be up to us to add some fault tolerance to our application? For example, if a key doesn't exist, possibly because one of the servers it was on went down, re-query and store it back in memcached?
From what I'm reading, it appears to lean in this direction, but I would like confirmation: https://developers.google.com/appengine/articles/scaling/memcache#transient
Thanks in advance!
Memcached has its own fault tolerance built in, so you don't need to add it to your application. I think an example will show why this is the case. Let's say you have 2 memcached servers set up in front of your database (say it's MySQL). Initially, when you start your application, there will be nothing in memcached. When your application needs data, it will first check memcached, and if the data isn't there, it will read it from the database and insert it into memcached before returning it to the user. For writes, you make sure to insert the data into both your database and memcached. As your application continues to run, it will populate the memcached servers with data and take load off of your database.
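That read-through (cache-aside) pattern is only a few lines. A sketch in Python with pymemcache, where query_database stands in for your real data access:

```python
from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))

def query_database(key):
    raise NotImplementedError  # placeholder for the real DB lookup

def get_with_cache(key, ttl=300):
    value = cache.get(key)                 # 1. try the cache first
    if value is None:
        value = query_database(key)        # 2. miss: go to the database
        cache.set(key, value, expire=ttl)  # 3. populate for next time
    return value
```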
Now suppose one of your memcached servers crashes and you lose half of your cached data. Right after the crash your application will go to the database more frequently, and your application logic will continue to insert data into memcached, except that everything will now go to the server that didn't crash. The only consequence is that your cache is smaller, and your database might need to do a bit more work if not everything fits into the cache. Your memcached client should also be able to handle the crash, since it can figure out where the remaining healthy memcached servers are and automatically hash values onto them. So, in short, you don't need any extra logic for failure situations, since the memcached client should take care of it for you. You just need to understand that memcached servers going down may mean your database has to do a lot of extra work. I also wouldn't recommend re-populating the cache after a failure; just let the cache warm itself back up, since there's no point in loading items that you aren't going to use in the near future.
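Most client libraries expose exactly this behavior once you hand them the full server list. For example, pymemcache's HashClient can treat a dead node's errors as plain cache misses (the addresses are placeholders):

```python
from pymemcache.client.hash import HashClient

# Keys are hashed across the pool. With ignore_exc=True, a dead
# node's errors look like cache misses, so the app falls through
# to the database; the node is retried after dead_timeout seconds.
client = HashClient(
    [("10.0.0.1", 11211), ("10.0.0.2", 11211)],
    ignore_exc=True,
    dead_timeout=60,
)
```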
m03geek also made a post where he mentioned that you could use Couchbase, and this is true, but I want to add a few things to his response about the pros and cons. First off, Couchbase has two bucket (database) types: the memcached bucket and the Couchbase bucket. The memcached bucket is plain memcached, and everything I wrote above is valid for it. The only reasons to go with Couchbase if you are going to use the memcached bucket are that you get a nice web UI, which provides stats about your memcached cluster along with easy adding and removing of servers. You can also get paid support for Couchbase down the road.
The Couchbase bucket is totally different in that it is not just a cache, but an actual database. You can completely drop your backend database and use this bucket type alone. One nice thing about the Couchbase bucket is that it provides replication and therefore prevents the cold-cache problem that memcached has. I would suggest reading the Couchbase documentation if this sounds interesting to you, since there are a lot of features you get with the Couchbase bucket.
This paper about how Facebook uses memcached might be interesting too.
https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final170_update.pdf
Couchbase's embedded memcached and "vanilla" memcached have some differences. One of them, as far as I know, is that Couchbase's memcached servers act as one: if you store your key-value pair on one server, you'll be able to retrieve it from another server in the cluster. Vanilla memcached "clusters" are usually built with a sharding technique, which means the app side must know which server contains the desired key.
My opinion is that replicating memcached data is unnecessary. Modern datacenters provide roughly 99% uptime or better, so if one of your memcached servers goes down someday, only some of your online users will need to log in again.
Also, many websites have a "Remember me" checkbox that sets a cookie which can be used to restore the session. If your users have that cookie, they won't even notice that one of your servers was down. (That's the answer to your question about adding some fault tolerance to your application.)
But you can always use something like HAProxy and replicate all your session data to 2 or more independent servers. In this case, storing 1 user session requires N times more RAM, where N is the number of replicas.
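If you do go that route, the replication can be as blunt as writing every session to N independent memcached servers and reading with fallback; a rough sketch using pymemcache (the addresses are placeholders):

```python
from pymemcache.client.base import Client

replicas = [Client(("10.0.0.1", 11211)), Client(("10.0.0.2", 11211))]

def set_session(session_id, data, ttl=1800):
    for node in replicas:        # N copies => N times the RAM
        try:
            node.set(session_id, data, expire=ttl)
        except Exception:
            pass                 # a down replica must not fail the write

def get_session(session_id):
    for node in replicas:        # first replica that answers wins
        try:
            value = node.get(session_id)
            if value is not None:
                return value
        except Exception:
            continue
    return None
```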
Another way is to use Couchbase to store sessions. A Couchbase cluster supports replicas out of the box, and it also stores data on disk, so if your node (or all nodes) suddenly shuts down or reboots, session data will not be lost.
Short answer: memcached with a "Remember me" cookie and without replication should be enough.

How do you distribute your app across multiple servers using EC2?

For the first time I am developing an app that requires quite a bit of scaling; I have never had an application that needed to run on multiple instances before.
How is this normally achieved? Do I cluster SQL servers and then mirror the application across all servers, using load balancing?
Or do I separate out the functionality, running some of it on one server and some on another?
Also, how do I push out code to all my EC2 Windows instances?
This will depend on your requirements, but as a general guideline (I am assuming a website) I would separate the db, web server, caching server, etc. onto different instances and use S3 (+ CloudFront) for static assets. I would also make sure that proper rate limiting is in place so that only legitimate load reaches the infrastructure.
For the RDBMS server I might set up a master-slave db setup (RDS makes this easier), use db sharding, etc. DB cluster solutions also exist; they are more complex to set up but simplify database access for the application programmer. I would also review all the db queries and tune the db/SQL queries accordingly. In some cases pure NoSQL databases might be better than an RDBMS, or a mix of both where the application switches between them depending on the data required.
For the web server I would set up a load balancer and then use autoscaling on the web server instance(s) behind it. Something similar applies to the app server, if any. I would also tune the web server's settings.
The caching server would also be separated into its own cluster of instance(s). ElastiCache seems like a nice service here. Redis has performance comparable to memcached but more features (like lists, sets, etc.), which might come in handy when scaling.
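Those extra structures map directly onto client calls; a quick sketch with redis-py (the host and key names are made up for illustration):

```python
import redis

r = redis.Redis(host="my-cache.example.com", port=6379)

r.lpush("recent:views", "page-123")   # list: newest-first activity feed
r.ltrim("recent:views", 0, 99)        # keep only the last 100 entries
r.sadd("online:users", "user-42")     # set: O(1) membership checks
print(r.sismember("online:users", "user-42"))
```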
Disclaimer - I'm not going to mention any Windows specifics because I have always worked on Unix machines. These guidelines are fairly generic.
This is a subjective question, and everyone tailors their own system in a unique style. Here are a few guidelines I follow.
If it's a web application, separate the presentation (front-end), middleware (APIs) and database layers. A sliced architecture scales much better than a monolithic application.
Database - Amazon provides excellent and highly available services (unless you are in the us-east availability zone) for SQL and NoSQL data stores. You might want to check out RDS for relational databases and DynamoDB for NoSQL. Both scale well, and you need not worry about managing and sharding/clustering your data stores once you launch them.
Middleware APIs - This is a crucial part. It is important to have a set of APIs (preferably REST, but you could use pretty much anything here) which expose your back-end functionality as a service. A service-oriented architecture can be scaled very easily to cater to multiple front-facing clients such as web, mobile, desktop, third-party widgets, etc. Middleware APIs should typically NOT be where your business logic is processed; most of it (or all of it) should be translated into database lookups/queries for higher performance. These services can be load balanced for high availability. Amazon's Elastic Load Balancers (ELB) are good for starters. If you want more customization, like blocking traffic for a certain set of IP addresses or performing blue/green deployments, then maybe you should consider HAProxy load balancers deployed on separate instances.
Front-end - This is where your presentation layer should reside. It should avoid any direct database queries except for the ones limited to the scope of the front-end, e.g. a simple Redis call to get the latest cache keys for front-end fragments. This is where you could perform a lot of caching, right from the service calls down to the front-end fragments. You could use AWS CloudFront for static asset delivery and AWS ElastiCache for your cache store. ElastiCache is essentially a managed memcached cluster. You should also consider load balancing the front-end nodes behind an ELB.
All of this can be bundled and deployed with autoscaling using AWS Elastic Beanstalk. It currently supports ASP.NET, PHP, Python, Java and Ruby containers. AWS Elastic Beanstalk has its own limitations, but it is a very easy way to manage your infrastructure with the least hassle for monitoring, scaling and load balancing.
Tip: Identifying the read- and write-intensive areas of your application helps a lot. You could then slice your infrastructure accordingly and perform the required optimizations with a read or write focus, one at a time.
To sum it all up, Amazon AWS has pretty much everything you could possibly use to craft your server topology. It's up to you to choose the components.
Hope this helps!
The way I would do it: have 1 server as the DB server with MySQL running on it, keep all my data in memcached (which can span multiple servers), and have my clients use a simple "if not in memcached, read from db, put it in memcached and return" pattern.
Memcached is very easy to scale compared to a DB. Scaling a db takes a lot of administrative effort; it's a pain to get it right and working. So I choose memcached. In fact, I keep extra memcached servers up just to handle downtime if any of my memcached servers goes down.
My data is mostly reads with few writes, and when writes happen, I push the data to memcached too. All in all this works better for me in terms of code, administration, fallback, failover and load balancing. You just need to code a "little" bit better.
Clustering MySQL may seem more tempting, as it looks easier to code, deploy, maintain and keep up and performing. But remember that MySQL is disk-based and memcached is memory-based, so by nature memcached is much faster (at least 10 times). And since it takes all the read load off the db, your db config can be REALLY simple.
I really hope someone points to a contrary argument here, I would love to hear it.

Amazon EC2 + Windows Server 2008 + Memcached = how?

We are building a system that would benefit greatly from a distributed caching mechanism like memcached. But I can't get my head around how memcached daemons and clients are configured to find each other in an Amazon data center. Do we manually set up the IP addresses of each memcached instance (they won't be dedicated; they will run on web servers or worker boxes), or is there an automagic way of getting them to talk to each other? I was looking at Microsoft Windows Server AppFabric Caching, but it seems to need either a file share or a domain to work correctly, and I have neither at the moment... Given that internal IP addresses are transient on Amazon, I am wondering how you get around this...
I haven't set up a cluster of memcached servers before, but Membase is a solution that could take away all of the pain you are experiencing with memcached. Membase is basically memcached with a persistence layer underneath, and it comes with great cluster management software. Clustering servers together is easy, since all you need to do is tell the cluster the IP address of the new node. If you already have an application written for memcached, it will also work with Membase, since Membase speaks the memcached protocol. It might be worth a look.
I believe you could create an Elastic IP in EC2 for each of the boxes that hold your memcached servers. These Elastic IPs can be dynamically mapped to any EC2 instance. Your memcached clients then just use the Elastic IPs as if they were static IP addresses.
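Remapping is a single API call, so a replacement instance can take over a failed node's address. A sketch with today's boto3 (newer tooling than was around at the time, and the IDs are placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Point the existing Elastic IP at the replacement memcached box;
# clients configured with that IP never need to change.
ec2.associate_address(
    AllocationId="eipalloc-0123456789abcdef0",  # the Elastic IP
    InstanceId="i-0123456789abcdef0",           # the new instance
)
```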
http://alestic.com/2009/06/ec2-elastic-ip-internal
As you seem to have discovered, Route 53 is commonly used for these discovery purposes. For your specific use case, however, I would just use Amazon ElastiCache. Amazon offers both memcached- and Redis-compatible versions of ElastiCache and manages the infrastructure for you, including providing a DNS entry point. Also, for managing things like ASP.NET session state, you might consider this article on the DynamoDB session state provider.
General rule of thumb: if you are developing a new app, try to leverage what the cloud provides rather than building it yourself; it'll make your life way simpler.

Setting up MongoDB and hosting

Recently I stumbled across MongoDB, CouchDB, etc.
I am hoping to have a play with this type of database and was wondering how much access to the hosting server one needs to get it running.
If anyone has any knowledge of this, I would love to know whether it can be set up to work when your app is hosted via a 'normal' hosting company.
I use Mongo, and so I'm really only speaking for Mongo, but your typical web hosting environment wouldn't allow you to set up your own database. You'd want root-level (admin) access to the server to set up Mongo. To get that, you'd want something like a VPS or a dedicated server.
However, to just play around with Mongo, I'd recommend downloading the binary for your OS and giving it a run. Their JavaScript shell interface is very easy to use.
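If you'd rather poke at it from code than from the shell, the Python driver is just as minimal; a sketch with pymongo against a local instance:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.playground  # databases and collections are created lazily

db.posts.insert_one({"title": "hello", "tags": ["test"]})
print(db.posts.find_one({"title": "hello"}))
```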
Hope that helps!
Tim
Various ways:
1) There are many free MongoDB hosting services available. Try DotCloud.com; many others are listed at http://www.cloudhostingguru.com/mongoDB-server-hosting.php
2) If you are asking specifically about shared hosting, the answer is mostly no. But if you could run MongoDB somewhere else (like from the above link) and want to connect from your website, it is probably possible if your host allows your own extensions (for PHP).
3) A VPS.
How about virtual private server hosting? The host gives you what looks like an entire machine... hard drive, CPU, memory. You get to install whatever you want, since it's your (virtual) machine.
In terms of MongoDB, as others have said, you need the ability to install the MongoDB software and run it (normally as a daemon). However, hosted services are just beginning to appear, such as MongoHQ. Perhaps something like this might be appropriate once it's out of beta (or if you request an invite).
It appears hosted CouchDB services are also popping up, such as couch.io or Cloudant. I personally have no experience with Couch, so I can be less certain than with Mongo, but I'd imagine that, again, to run it yourself you'd need to install the software (and thus require root access).
If you don't currently have a VPS or dedicated server (or the cloud-based versions of the aforementioned), perhaps moving your data out to a dedicated hosted service would be an ideal way to avoid the pain and expense of changing your hosting setup.
You can host your application and your database on different hosting servers.
For MongoDB you can use MongoHQ or MongoLab, which offer 0.5 GB of space for free.
