Single instance Amazon EC2 - amazon-ec2

We're running a lightweight web app on a single EC2 server instance, which is fine for our needs, but we're wondering about monitoring and restarting it if it goes down.
We have a separate non-Amazon server we'd like to use to monitor the EC2 and start a fresh instance if necessary and shut down the old one. All our user data is on Elastic Storage, so we're not too worried about losing anything.
I was wondering if anyone has any experience of using EC2 in this way, and in particular of automating the process of starting the new instance? We have no problem creating something from scratch, but it seems like it should be a solved problem, so I was wondering if anyone has any tips, links, scripts, tutorials, etc to share.
Thanks.

You should have a look at puppet and its support for AWS. I would also look at the RightScale AWS library as well as this post about starting a server with the RightScale scripts. You may also find this article on web serving with EC2 useful. I have done something similar to this but without the external monitoring, the node monitored itself and shut down when it was no longer needed then a new one would start up later when there was more work to do.

Couple of points:
You MUST MUST MUST back up your Amazon EBS volume.
They claim "better" reliability, but not 100%, and it's SEVERAL orders of magnitude off of S3's "12 9's" of durability. S3 durability >> EBS durability. That's a fact. EBS supports a "snapshots" feature which backs up your storage efficiently and incrementally to S3. Also, with EBS snapshots, you only pay for the compressed deltas, which is typically far far less than the allocated volume size. In another life, I've sent lost-volume emails to smaller customers like you who "thought" that EBS was "durable" and trusted it with the only copy of a mission-critical database... it's heartbreaking.
Your Q: automating start-up of a new instance
The design path you mention is relatively untraveled; here's why... Lots of companies run redundant "hot-spare" instances where the second instance is booted and running. This allows rapid failover (seconds) in the event of "failure" (could be hardware or software). The issue with a "cold-spare" is that it's harder to keep the machine up to date and ready to pick up where the old box left off. More important, it's tricky to VALIDATE that the spare is capable of successfully recovering your production service. Hardware is more reliable than untested software systems. TEST TEST TEST. If you haven't tested your fail-over, it doesn't work.
The simple automation of starting a new EBS instance is easy, bordering on trivial. It's just a one-line bash script calling the EC2 command-line tools. What's tricky is everything on top of that. Such a solution pretty much implies a fully 100% automated deployment process. And this is all specific to your application. Can your app pull down all the data it needs to run (maybe it's stored in S3?). Can you kill you instance today and boot a new instance with 0.000 manual setup/install steps?
Or, you may be talking about a scenario I'll call "re-instancing an EBS volume":
EC2 box dies (root volume is EBS)
Force detach EBS volume
Boot new EC2 instance with the EBS volume
... That mostly works. The gotchas:
Doesn't protect against EBS failures, either total volume loss or an availability loss
Recovery time is O(minutes) assuming everything works just right
Your services need to be configured to restart automatically. It does no good to bring the box back if Nginx isn't running.
Your DNS routes or other services or whatever need to be ok with the IP-address changing. This can be worked around with ElasticIP.
How are your host SSH keys handled? Same name, new host key can break SSH-based automation when it gets the strong-warning for host-key-changed.
I don't have proof of this (other than seeing it happen once), but I believe that EC2/EBS _already_does_this_ automatically for boot-from-EBS instances
Again, the hard part here is on your plate. Can you stop your production service today and bring it up RELIABLY on a new instance? If so, the EC2 part of the story is really really easy.

As a side point:
All our user data is on Elastic Storage, so we're not too worried about losing anything.
I'd strongly suggest to regularly snapshot your EBS (Elastic Block Storage) to S3 if you are not doing that already.

You can use an autoscale group with a min/max/desired quantity of 1. Place the instance behind an ELB and have the autoscale group be triggered by the ELB healthy node count. This allows you to have built in monitoring by cloudwatch and the ELB health check. Anytime there is an issue the instance be replaced by the autoscale service.

If you have not checked 'Protect against accidental termination' you might want to do so.
Even if you have disabled 'Detailed Monitoring' for your instance you should still see the 'StatusCheckFailed' metric for your instance over which you can configure an alarm (In the CloudWatch dashboard)
Your application (hosted in a different server) should receive the alarm and start the instance using the AWS API (or CLI)
Since you have protected against accidental termination you would never need to spawn a new instance.

Related

How can i scale up an amazon ec2 instance quickly, in 2-3 mins?

I'm using AWS cloudwatch for scaling my application. I created launch configuration, autoscaling group, upscaling and downscaling alarm and policies. The problem is it is taking 5 mins to launch an instance from an AMI. Is there a way to reduce the start-up time from 5 to 2-3 mins?
No, it isn't possible to speed up the provisioning of a new EC2 instance by an AutoScaling scale up action.
I think it's important to appreciate all that EC2 is doing in those 5 minutes. It's building a new virtual machine, installing an image of an operating system on it, hooking it up to a network and bringing it into service. That's pretty impressive for 5 minutes work if you ask me.
If you're needing to scale up that quickly, then quite frankly you're doing it wrong. Even with autoscaling, you should always be a little over provisioned for your expected load. If you start getting close to that over scaled limit, then it's time to autoscale up. Don't provision exactly what you need, it won't work well.
The start up time depends on a few things:
The availability of resources for your instance type within the
availability zone.
The size of the AMI. In the case of a custom AMI image, it may need to be copied to the correct internal storage for the VM.
Initialization procedures. For windows, some images with user-data scripts can require a reboot to join the domain.
This is an old question, and as I have seen. Start times have improved for EC2 over the past years. Some providers like Google Cloud can provide servers in under a minute. So if your workload is that demanding, you may research the available providers and their operational differences.

Amazon EC2 - Fast AMI Creation in a Production Environment

We run a server architecture where we have an X number of base servers which are always on. Our servers process jobs sent to them and the vast majority of our job requests come in during the workday. To facilitate this particular spike in volume, we use EC2 auto-scaling.
I prefer to launch servers through auto-scaling with as much of a configured AMI as possible as opposed to launching from a base AMI and installing packages through long Chef or Puppet scripts.
In our current build process, we implement changes to our code base late at night when only our base servers are needed and no servers are launched through auto-scaling. But every once in a while, we'll have a critical bug fix that needs to be implemented immediately during the day.
We have a rather large EBS hard drive associated with our servers (app. 400 GB) and AMI creation of a base server with the latest changes usually takes upwards of one hour. This isn't a problem for late night deployments when no additional servers need to be launched, but causes us to lose valuable time during the day because it prevents us from launching additional servers when the latest AMI isn't ready.
Is there anything out there which can speed up the AMI creation process here? I've heard of Netflix's Aminator and Boxfuse, but are there any other alternatives? Also, how do these services stack up against one another?

EC2 for handling demand spikes

I'm writing the backend for a mobile app that does some cpu intensive work. We anticipate the app will not have heavy usage most of the time, but will have occasional spikes of high demand. I was thinking what we should do is reserve a couple of 24/7 servers to handle the steady-state of low demand traffic and then add and remove EC2 instances as needed to handle the spikes. The mobile app will first hit a simple load balancing server that does a simple round-robin user distribution among all the available processing servers. The load balancer will handle bringing new EC2 instances up and turning them back off as needed.
Some questions:
I've never written something like this before, does this sound like a good strategy?
What's the best way to handle bringing new EC2 instances up and back down? I was thinking I could just create X instances ahead of time, set them up as needed (install software, etc), and then stop each instance. The load balancer will then start and stop the instances as needed (eg through boto). I think this should be a lot faster and easier than trying to create new instances and install everything through a script or something. Good idea?
One thing I'm concerned about here is the cost of turning EC2 instances off and back on again. I looked at the AWS Usage Report and had difficulty interpreting it. I could see starting a stopped instance being a potentially costly operation. But it seems like since I'm just starting a stopped instance rather than provisioning a new one from scratch it shouldn't be too bad. Does that sound right?
This is a very reasonable strategy. I used it successfully before.
You may want to look at Elastic Load Balancing (ELB) in combination with Auto Scaling. Conceptually the two should solve this exact problem.
Back when I did this around 2010, ELB had some problems with certain types of HTTP requests that prevented us from using it. I understand those issues are resolved.
Since ELB was not an option, we manually launched instances from EBS snapshots as needed and manually added them to an NGinX load balancer. That certainly could have been automated using the AWS APIs, but our peaks were so predictable (end of month) that we just tasked someone to spin up the new instances and didn't get around to automating the task.
When an instance is stopped, I believe the only cost that you pay is for the EBS storage backing the instance and its data. Unless your instances have a huge amount of data associated, the EBS storage charge should be minimal. Perhaps things have changed since I last used AWS, but I would be surprised if this changed much if at all.
First with regards to costs, whether an instance is started from scratch or from a stopped state has no impact on cost. You are billed for the amount of compute units you use over time, period.
Second, what you are looking to do is called autoscaling. What you do is setup up a launch config that specifies an AMI you are going to use (along with any user-data configs you are using, the ELB and availiabilty zones you are going to use, min and max number of instances, etc. You set up a scaling group using that launch config. Then you set up scaling policies to determine what scaling actions are going to be attached to the group. You then attach cloud watch alarms to each of those policies to trigger the scaling actions.
You don't have servers in reserve that you attach to the ELB or anything like that. Everything is based on creating a single AMI that is used as the template for the servers you need.
You should read up on autoscaling at the link below:
http://aws.amazon.com/autoscaling/

Haproxy Load Balancer, EC2, writing my own availability script

I've been looking at high availability solutions such as heartbeat, and keepalived to failover when an haproxy load balancer goes down. I realised that although we would like high availability it's not really a requirement at this point in time to do it to the extent of the expenditure on having 2 load balancer instances running at any one time so that we get instant failover (particularly as one lb is going to be redundant in our setup).
My alternate solution is to fire up a new load balancer EC2 instance from an AMI if the current load balancer has stopped working and associate it to the elastic ip that our domain name points to. This should ensure that downtime is limited to the time it takes to fire up the new instance and associate the elastic ip, which given our current circumstance seems like a reasonably cost effective solution to high availability, particularly as we can easily do it multi-av zone. I am looking to do this using the following steps:
Prepare an AMI of the load balancer
Fire up a single ec2 instance acting as the load balancer and assign the Elastic IP to it
Have a micro server ping the current load balancer at regular intervals (we always have an extra micro server running anyway)
If the ping times out, fire up a new EC2 instance using the load balancer AMI
Associate the elastic ip to the new instance
Shut down the old load balancer instance
Repeat step 3 onwards with the new instance
I know how to run the commands in my script to start up and shut down EC2 instances, associate the elastic IP address to an instance, and ping the server.
My question is what would be a suitable ping here? Would a standard ping suffice at regular intervals, and what would be a good interval? Or is this a rather simplistic approach and there is a smarter health check that I should be doing?
Also if anyone foresees any problems with this approach please feel free to comment
I understand exactly where you're coming from, my company is in the same position. We care about having a highly available fault tolerant system however the overhead cost simply isn't viable for the traffic we get.
One problem I have with your solution is that you're assuming the micro instance and load balancer wont both die at the same time. With my experience with amazon I can tell you it's defiantly possible that this could happen, however unlikely, its possible that whatever causes your load balancer to die also takes down the micro instance.
Another potential problem is you also assume that you will always be able to start another replacement instance during downtime. This is simply not the case, take for example an outage amazon had in their us-east-1 region a few days ago. A power outage caused one of their zones to loose power. When they restored power and began to recover the instances their API's were not working properly because of the sheer load. During this time it took almost 1 hour before they were available. If an outage like this knocks out your load balancer and you're unable to start another you'll be down.
That being said. I find the ELB's provided by amazon are a better solution for me. I'm not sure what the reasoning is behind using HAProxy but I recommend investigating the ELB's as they will allow you to do things such as auto-scaling etc.
For each ELB you create amazon creates one load balancer in each zone that has an instance registered. These are still vulnerable to certain problems during severe outages at amazon like the one described above. For example during this downtime I could not add new instances to the load balancers but my current instances ( the ones not affected by the power outage ) were still serving requests.
UPDATE 2013-09-30
Recently we've changed our infrastructure to use a combination of ELB and HAProxy. I find that ELB gives the best availability but the fact that it uses DNS load balancing doesn't work well for my application. So our setup is ELB in front of a 2 node HAProxy cluster. Using this tool HAProxyCloud I created for AWS I can easily add auto scaling groups to the HAProxy servers.
I know this is a little old, but the solution you suggest is overcomplicated, there's a much simpler method that does exactly what you're trying to accomplish...
Just put your HAProxy machine, with your custom AMI in an auto-scaling group with a minimum AND maximum of 1 instance. That way when your instance goes down the ASG will bring it right back up, EIP and all. No external monitoring necessary, same if not faster response to downed instances.

How should I approach my EC2 contingency plan?

I've browsed through other questions on SO concerning backing up EC2, and it has provided me with a good basis, but I'm still a bit confused as to how I should approach my solution and develop a contingency plan. Most questions are fairly specific, but I have a pretty vanilla setup, and I think this information will be beneficial to future users. Let me provide my basic setup:
Basic small instance
Pushing files to S3
Running MongoDB
Running nginx
Now, due to the ephemeral nature of EC2, it's apparent that I need to bind my EC2 instance to EBS to ensure persistent storage. The reason I'm attempting to develop a contingency plan is that I'm worried my instance may disappear at any time (due to outages, etc). If my instance were to disappear, I'm concerned that I would have to spin up a new instance and reinstall all my applications before getting everything up an running again. A few questions:
How do I backup my instance to ensure that if it were to disappear, I could quickly bring it back up (preferably without having to reinstall all previous software)? I don't need a series of backups, just the previous days (or weeks) backup to ensure that a previous working version exists that can be started quickly.
If I use EBS instead of instance storage, it essentially acts in place of my hard drive right? So if I have MongoDB installed, I'm assuming the database it is writing to cab placed on EBS?
If I go with the small instance with 160 GB of storage and I use EBS, would I need to allocate 160 GB of EBS out of the gate, or is the 160 GB for instance storage only?
So, in summary, I would like a solution (either manual or automatic) that can create a snapshot of my EC2 instance to ensure that if it were to disappear, it could be reconstructed without having to spend the time to manually reconstruct everything.
In an ideal world, if my instance were to disappear, I could spin up an version of my instance with everything intact (to a point where things were backed up). Any resources or suggestions? Thanks in advance.
OK, here we go:
For backups:
Create your instance from one of the stock AWS images. Make sure it is an EBS-backed VM - depending on the size of the VM you pick, you'll get a volume assigned with 'n' GB of space, attached as the boot volume (/dev/sda1).
Configure the VM with whatever software you need, apply patches, tune for disk fragmentation, CPU consumption (task priorities, etc), and any other configuration you need that makes the VM tailored to your requirements.
Stop the VM and take a snapshot of the EBS volume, then restart it (re-assigning the Elastic IP is there is one). This is your backup snapshot - repeat as desired at whatever frequency you like. Remember to stop the VM when you snap it to prevent the OS writing to the volume whilst you're taking a copy of it.
For recovery:
Your VM will fail, eventually. You'll break something and render it damaged or inoperative, or the hardware it's running on will suffer a fault. It will happen.
When it does, terminate it (if it didn't self-terminate) and spin-up a new VM of the same type from the AWS stock list. Wait until it shows as 'Running', and then stop it.
Detach its EBS volume and delete it.
Create a new EBS volume from whichever backup snapshot you last created, and attach that new volume to the VM as /dev/sda1.
Start the VM and assign your EIP if appropriate.
About EBS storage:
It's a chunk of storage space. If you format it to look like a standard disk, you can use it exactly as you would a physical disk. Install stuff on it, point software at it for use as storage space, whatever.
you have two options: (BUT NOT EXACTLY AS YOU WANT ;( )
1- Have an 'external' EBS attached into your EC2 instance, and manually (you can do automatically through cronjobs), make snapshots from it ! But it's not what you want, why ? If your EC2 instance disappears, you will need to re-create all your environment again and re-attach your EBS... So it's a nice way of having backups of HUGE datas on your EC2, but your enviroment is destructed...
2- The best way, but not so perfect, is after you finished configuring your EC2, make a private AMI from it, so anytime you want you can launch more instances like that , from that AMI, so everything is cloned.... BUT the worst part of it, is that every time you change a configuration from the instance, you still need to make a new AMI, and every time you make a new AMI, you need to reboot your instance to grant data integrity on your new private AMI !
I recommend you to take a closer look into RESERVED EC2 instances, that have a better stability from normal ones. BUT YOU STILL CAN HAVE HARDWARE DISASTERS as normal instances too...

Resources