Amazon EC2 Capacity & Workflow Questions - Windows

I’m hoping some of you with experience using Amazon EC2 could offer some advice. Of course it’ll be subjective, which is fine; I’m pretty sure your guesstimate would be better than mine.
I am planning on moving all my clients’ websites from shared hosting environments to Amazon EC2. They’re all fairly low-traffic sites (the busiest receives around 50 unique visitors a day). There are about 8 sites, but I may expand this as I take on more projects and host more sites; current capacity planning is for, say, 12 sites.
Each site runs on ASP.Net (Umbraco CMS), and requires a SQL Server database.
My thoughts are one of the following:
Set up a Small instance (1.7 GB RAM, 1 EC2 Compute Unit), and run IIS and SQL Server Express on that server.
Set up 2 Micro instances (613 MB RAM each, up to 2 EC2 Compute Units): one for IIS, the other for SQL Server.
Which arrangement do you think would work best for my requirements? I’ve started setting up a Micro instance with Server 2008, SQL Server Express, etc., and am finding it doesn’t cope with the memory requirements, hence considering expanding. I could always configure on a Small instance, then export the AMI and fire it up in a Micro instance afterwards, and do the same every time any serious changes to the server are required. I guess I could even do all updates etc. on a spare Small Spot instance, then load that AMI up in a Micro and transfer the IP address across, so I don’t need to do too much work on the production servers. I figure if I store all my website data files on EBS volumes, it should be fairly easy to move hosting between servers with minimal downtime, while never working on a production server.
I’m interested to know what you all think, and what strategies you employ for activities such as upgrades, Windows updates, software installations, etc.
And what capacity do you think I’d need for my requirements.
Cheers
Greg

Well, first up, Server 2008 doesn't play well in the 613 MB of RAM the Micro instance gives you. It runs, but it's a dog, and it barks louder the more services (IIS, SSE, etc.) you layer on top. We use nothing smaller than a Small for Server 2008, and in fact typically do the environment config in a Medium and scale down to Small once the heavy lifting is complete and the OS is ready to use. Server 2003, however, seems to breathe easier on a Micro, but we still do the config on a larger instance and scale down.
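The memory question in this thread comes down to simple addition. A minimal sketch, assuming you have measured the resident memory of each service on your own build: the instance sizes are the ones quoted in the question, while the service names, footprints and the 20% headroom default are hypothetical placeholders.

```python
from typing import Dict

# RAM per instance type, as quoted in the question above.
INSTANCE_RAM_MB: Dict[str, int] = {
    "micro": 613,
    "small": 1700,  # 1.7 GB
}


def free_mb(instance: str, footprints_mb: Dict[str, int]) -> int:
    """RAM left over after the given services are loaded (may be negative)."""
    return INSTANCE_RAM_MB[instance] - sum(footprints_mb.values())


def fits(instance: str, footprints_mb: Dict[str, int], headroom: float = 0.2) -> bool:
    """True if the services leave at least `headroom` (a fraction of total RAM)
    free, as a margin for spikes. The 0.2 default is an arbitrary assumption."""
    total = INSTANCE_RAM_MB[instance]
    return sum(footprints_mb.values()) <= total * (1 - headroom)


# Hypothetical footprints: replace with numbers measured on your own server.
services = {"os": 400, "iis": 150, "sql_express": 300}
print(fits("micro", services), fits("small", services))  # False True
```

With these made-up figures the Micro is already 237 MB short before any headroom, which is the kind of result the answer describes from experience.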
We're running low-traffic websites on Server 2003/IIS6 in a Micro, with a Server 2008/SS install on a shared, separate, Small instance. We do also have one Server 2008/IIS7 Micro build running, but only to remind ourselves why we don't use it more widely. ;)
Larger websites run Server 2008/IIS7 in either Small or Medium instances, but almost always still using that shared separate SS instance for database services. We try not to deploy multiple SS installations, since it makes maintenance and backups more complex.
Stashing content and config on EBS volumes is of course good practice, unless you like rebuilding the entire system whenever an instance disappears. Snapshotting your instances periodically is also good practice, since you can spin up a new instance from a baseline AMI and swap the snapshot in as a boot volume for fast recovery in the event of disaster.

Related

Serving up webpage from Amazon EC2 instance

If I'm serving up a website using Apache from an Amazon EC2 instance, does it ever make sense for me to stop the machine? Also, I'm extremely new to EC2, so I'm not entirely sure how EBS works. It looks like Amazon gave me 8 GB of storage for free, but am I actually being charged for that storage 24/7? Thanks
If you stop the server, it is down. If you're in a development stage, and you want to limit your costs to the bare minimum, yes, you can stop the server at the end of each day. This is one of the advantages to an EBS backed instance.
EBS is basically external network-attached storage. For most people, EBS-backed servers are the way to go, since you can easily clone them, stop and start them, etc. You can also take snapshots of an EBS volume, so it's a great way to get low-cost backups of your server.
As for EBS storage, yes you pay for it, but it is relatively inexpensive. The real cost of EC2 ends up being CPU/runtime for the most part, although EBS certainly makes it easy to use up large amounts of storage.
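To see why runtime dominates, here is a back-of-the-envelope sketch. The hourly and per-GB prices are hypothetical placeholders (check current AWS pricing), and it assumes a 30-day month. It illustrates that stopping an EBS-backed instance cuts the compute term while the storage term keeps accruing.

```python
def monthly_cost(instance_hourly: float, hours_per_day: float,
                 ebs_gb: float, ebs_gb_month: float, days: int = 30) -> float:
    """Rough monthly bill for one EBS-backed instance.

    Compute is billed only while the instance runs; the EBS volume is billed
    for as long as it exists, whether the instance is running or stopped.
    """
    compute = instance_hourly * hours_per_day * days
    storage = ebs_gb * ebs_gb_month
    return compute + storage


# Hypothetical prices: $0.10/hour for the instance, $0.10 per GB-month for EBS.
always_on = monthly_cost(0.10, 24, 8, 0.10)
office_hours = monthly_cost(0.10, 8, 8, 0.10)
print(round(always_on, 2), round(office_hours, 2))  # 72.8 24.8
```

In both cases the 8 GB volume contributes the same $0.80; only the compute part shrinks when the instance is stopped outside working hours.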
does it ever make sense for me to stop the machine?
For production machines, no. I haven't had to stop a production machine in the last couple of years. We launch new machines from our AMI when required and kill them when they're not.
However, for load testing and some research work on a clustered environment, we did need to pause machines for a while, and we used the stop feature then.
...I'm not entirely sure how EBS works.
Quoting from official doc:
Amazon EBS volumes are off-instance storage that persists independently from the life of an instance. Amazon Elastic Block Store provides highly available, highly reliable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance.
So, in highly simplified terms, it's like an external HDD or NAS.
It looks like Amazon gave me 8 GB of storage for free, but am I actually being charged for that storage 24/7?
If you're paying for the instance, that price should include the cost of the storage AWS provides. For a given instance type, EBS-backed instances cost more than instance-store ones, so I guess that includes the EBS cost, but that's their pricing policy; I can't really comment.
Side Note
Being network storage, EBS-backed images have their pros and cons. The biggest benefit is that if the instance ever crashes, your root device does not vanish (make sure the root device is set not to be deleted on termination when you create the instance). This comes in handy in case of hardware failure or accidental termination.
However, being on the network, it has all the issues any networked device can have. For applications with very fast, heavy I/O (like Cassandra), EBS seemed to be a bad idea.
If you have a free instance (it has to be a micro instance running their brand of EC2 Linux, not, say, CentOS), then there is no reason to turn it off.
If you are paying per hour, then yeah, it makes sense to shutdown when not in use.
If you need more computing power, you could run the instance on a larger (more expensive) instance type for the hours when the site is going to be accessed heavily, and afterwards change the instance type back to a smaller one. Just don't mess with the volumes.
If you don't want to be offline for the couple of minutes the change takes, you could set up a free (micro) instance and assign the Elastic IP to it, or even redirect to a static web page on S3...
example: redirect to a s3 static page with "Maintenance in progress" message displayed.
Also, watch out when stopping/terminating your instance. I don't know whether this happens only on Windows instances, but after starting my instance again, my non-root drives (volumes for non-system partitions) were offline (the console showed the volumes as attached), so I had to mount them again.

Basic AWS questions

I'm a newbie on AWS, and it has so many products (EC2, Load Balancer, EBS, S3, SimpleDB, etc.) and so many docs that I can't figure out where to start.
My goal is to be ready for scalability.
Suppose I want to set up a simple webserver that accesses a database on MongoLab. I suppose I need one EC2 instance to run it. At this point, do I need anything more (EBS, S3, etc.)?
At some point, my app reaches enough traffic that I must scale it. I was thinking of starting a new copy (instance) of my EC2 machine, but then it will have another IP. So how is traffic distributed between both EC2 instances? Is that done automatically? Must I hire a Load Balancer service to distribute the traffic? And will I then have to pay for 2 EC2 instances and 1 LB? At this point, do I need anything more (e.g. Elastic IP)?
Welcome to the club Sony Santos,
AWS is a very powerful architecture, but with this power comes responsibility. I, and presumably many others, have learned the hard way building applications with AWS's services.
You ask where to start. This is actually a very good question, but you probably won't like my answer. You need to read about and research all the technologies offered by Amazon and even by other providers such as Rackspace, GoGrid, Google's Cloud and Azure. Amazon is not easy to get going with, but it's not really meant to be; its focus is on being very customizable and having a very extensive API. But let's get back to your question.
To run a simple webserver you need to start an EC2 instance. By default this instance runs on a disk drive called EBS. Essentially an EBS drive is a normal hard drive, except that you can do lots of other useful things with it, like detaching it from one server and attaching it to another. S3 is more of a file storage system; it's useful if you have a lot of images or want to store many backups of your databases, but it's not a requirement for a simple webserver. Just running an EC2 instance is all you need; everything else happens behind the scenes.
If your app gets a lot of traffic you have two options. You can scale your machine up by shutting it off and starting it as a larger instance type. Generally speaking this is the easiest thing to do, but you'll reach a point where either you cannot handle all the traffic with one instance, even at the larger size, and decide you need two, OR you want a more fault-tolerant application that stays online in the event of a failure or update.
If you create a second instance you will need some form of load balancing. I recommend Amazon's Elastic Load Balancer, as it's easy to configure and its integration with the cloud is better than Round Robin DNS or an application like HAProxy. Elastic Load Balancers are not expensive; I believe they cost around $18/month plus the data passed through the load balancer.
But no, you don't need anything else to scale out your site. 2 EC2 instances and an ELB will do the trick.
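ELB does the distribution for you, but the idea behind it is easy to picture. A toy sketch of round-robin balancing with health checks, using hypothetical instance IDs; it is not ELB's actual routing algorithm, only the concept of sending each request to the next healthy instance.

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Toy round-robin balancer that skips instances marked unhealthy."""

    def __init__(self, instances: List[str]) -> None:
        self.instances = list(instances)
        self.healthy: Dict[str, bool] = {i: True for i in self.instances}
        self._cycle = itertools.cycle(self.instances)

    def mark(self, instance: str, healthy: bool) -> None:
        """Record the result of a health check."""
        self.healthy[instance] = healthy

    def next_instance(self) -> str:
        """Return the next healthy instance, trying each one at most once."""
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy instances")


lb = RoundRobinBalancer(["i-web1", "i-web2"])  # hypothetical instance IDs
print([lb.next_instance() for _ in range(4)])  # ['i-web1', 'i-web2', 'i-web1', 'i-web2']
lb.mark("i-web2", False)                       # i-web2 fails its health check
print([lb.next_instance() for _ in range(2)])  # ['i-web1', 'i-web1']
```

This is also why the two instances having different IPs does not matter to visitors: they only ever see the balancer's address.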
Additional questions you didn't ask but probably should have.
How often does an EC2 instance experience hardware failure and crash my server? What can I do if this happens?
It happens frequently, usually in batches. Sometimes I go months without any problems, then I get a few servers crashing at once. It's definitely something you should plan for; I didn't in the beginning and I paid for it. Make sure you create scripts and have backups and a backup plan ready in case your server fails. Either be OK with it being down, or have a load-balanced solution from day 1.
What's the hardest part about scalability?
Testing, testing, testing, testing... Don't ever assume anything. Also be prepared for sudden spikes in your traffic. You have to be prepared for anything: if your page goes from 1 to 1,000 people overnight, are you prepared to handle it? Have you tested what you "think" will happen?
Best of luck and have fun... I know I have :)

Scaling Tigase XMPP server on Amazon EC2

Does anyone have an experience running clustered Tigase XMPP servers on Amazon's EC2, primarily I wish to know about anything that might trip me up that is non-obvious. (For example apparently running Ejabberd on EC2 can cause issues due to Mnesia.)
Or if you have any general advice to installing and running Tigase on Ubuntu.
Extra information:
The system I’m developing uses XMPP just to communicate (in near real-time) between a mobile app and the server(s).
The number of users will initially be small, but hopefully will grow, which is why the system needs to be scalable. Presumably for just a few thousand users you wouldn’t need a cc1.4xlarge EC2 instance? (Otherwise this is going to be very expensive to run!)
I plan on using a MySQL database hosted in Amazon RDS for the XMPP server database.
I also plan on creating an external XMPP component written in Python, using SleekXMPP. It will be this external component that does all the ‘work’ of the server, as the application I’m making is quite different from instant messaging. For this part I have not worked out how to connect an external XMPP component written in Python to a Tigase server. The documentation seems to suggest that components are written specifically for Tigase - and not for a general XMPP server, using XEP-0114: Jabber Component Protocol, as I expected.
With this extra information, if you can think of anything else I should know about I’d be glad to know.
Thank you :)
I have lots of experience, and I think there are a lot of non-obvious problems. For example, the only reliable instance type for running an application like Tigase is cc1.4xlarge. Others have problems with CPU availability, and it is just a lottery whether you are lucky enough to run your service on a server that is not busy with other people's work.
Also, you need an instance with the highest possible I/O to make sure it can cope with the network traffic. The need for high I/O applies especially to database instances.
Not sure if this is obvious or not, but there is a problem with hostnames on EC2: every time you start an instance, both the hostname and the IP address change. A Tigase cluster is quite sensitive to hostnames. There is a way to force/change the hostname for the instance, so this might be a way around the problem.
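One common way around changing hostnames (not necessarily the one the answer has in mind) is to give each cluster node a fixed name and regenerate /etc/hosts on every boot with whatever private IPs the instances received. A minimal sketch of the rendering step only; the node names and IPs are hypothetical, and how each node discovers its peers' current IPs is an assumption left out here.

```python
from typing import Dict


def hosts_entries(cluster: Dict[str, str]) -> str:
    """Render /etc/hosts lines mapping stable node names (keys) to the
    private IPs the instances were given on this boot (values)."""
    lines = [f"{ip}\t{name}" for name, ip in sorted(cluster.items())]
    return "\n".join(lines) + "\n"


# Hypothetical cluster after a restart: the names stay fixed, the IPs changed.
cluster = {
    "tigase2.cluster.internal": "10.0.1.27",
    "tigase1.cluster.internal": "10.0.1.14",
}
print(hosts_entries(cluster), end="")
```

The Tigase cluster configuration would then refer only to the fixed names, so a restart changes the hosts file rather than the cluster config.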
Of course, I am talking about a cluster for millions of online users and really high traffic: 100k XMPP packets per second or more. For large installations it is generally much cheaper and more efficient to use dedicated servers.
Generally Tigase runs very well on Amazon EC2 but you really need the latest SVN code as it has lots of optimizations added especially after tests on the cloud. If you provide some more details about your service I may have some more suggestions.
More comments:
When it comes to cost, a dedicated server is always the cheaper option for a constantly running service. Unless you plan to switch servers on/off on an hourly basis, I would recommend going for a dedicated service. Costs are lower and performance is much more predictable.
However, if you really want/need to stick with Amazon EC2, let me give you some concrete numbers. Below is a list of instances and how many online users the cluster was able to handle reliably:
5*cc1.4xlarge - 1.7 million online users
1*c1.xlarge - 118k online users
2*c1.xlarge - 127k online users
2*m2.4xlarge (with 5GB RAM for Tigase) - 236k online users
2*m2.4xlarge (with 20GB RAM for Tigase) - 315k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 400k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 312k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 327k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 280k online users
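Dividing the figures above by the instance count makes them easier to compare. A small sketch using only the numbers reported in the list (1.7 million read as 1,700,000):

```python
from statistics import mean
from typing import List, Tuple

# (instance count, instance type, online users), from the list above.
RESULTS: List[Tuple[int, str, int]] = [
    (5, "cc1.4xlarge", 1_700_000),
    (1, "c1.xlarge", 118_000),
    (2, "c1.xlarge", 127_000),
    (2, "m2.4xlarge", 236_000),  # 5GB RAM for Tigase
    (2, "m2.4xlarge", 315_000),  # 20GB RAM for Tigase
    (5, "m2.4xlarge", 400_000),  # the four repeated 60GB runs
    (5, "m2.4xlarge", 312_000),
    (5, "m2.4xlarge", 327_000),
    (5, "m2.4xlarge", 280_000),
]


def per_instance(count: int, users: int) -> float:
    """Online users carried by each instance in the cluster."""
    return users / count


def repeat_spread(count: int, itype: str) -> Tuple[int, int, float]:
    """Min, max and mean users across repeated runs of the same setup."""
    runs = [u for c, t, u in RESULTS if c == count and t == itype]
    return min(runs), max(runs), float(mean(runs))


print(per_instance(5, 1_700_000))      # 340000.0
print(repeat_spread(5, "m2.4xlarge"))  # (280000, 400000, 329750.0)
```

Per instance, the cc1.4xlarge cluster carried about 340k users each, and the four identical 5*m2.4xlarge runs ranged from 280k to 400k: the day-to-day variability discussed below.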
A few more comments:
Why does the amount of memory matter so much? Because CPU power is very unreliable and inconsistent on all but cc1.4xlarge instances. You have 8 virtual CPUs, but if you look at the top command you often see one CPU working and the rest idle. This insufficient CPU power causes internal queues in Tigase to grow. When the CPU power comes back, Tigase can process the waiting packets. The more memory Tigase has, the more packets can be queued and the better it handles CPU deficiencies.
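The memory argument can be seen in a toy model: packets arrive at a steady rate, the CPU stalls for a few ticks and then catches up, and in this model anything that does not fit in the queue is dropped. This is a hypothetical bounded-buffer simulation, not Tigase's actual queueing code; the rates and limits are made up for illustration.

```python
from typing import List


def dropped_packets(arrivals_per_tick: int, cpu_per_tick: List[int],
                    queue_limit: int) -> int:
    """Toy bounded-queue model: each tick packets arrive, anything beyond
    queue_limit is dropped, then the CPU drains up to cpu_per_tick[t] packets."""
    queued = 0
    dropped = 0
    for capacity in cpu_per_tick:
        queued += arrivals_per_tick
        if queued > queue_limit:
            dropped += queued - queue_limit
            queued = queue_limit
        queued -= min(queued, capacity)
    return dropped


# CPU stalls for three ticks, then catches up; average capacity (12) exceeds load (10).
stall = [0, 0, 0, 30, 30]
print(dropped_packets(10, stall, queue_limit=15))  # 25: small buffer overflows
print(dropped_packets(10, stall, queue_limit=40))  # 0: larger buffer rides it out
```

Total CPU capacity is sufficient in both runs; the only difference is how much backlog the queue (memory) can hold while the CPU is unavailable.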
Why is 5*m2.4xlarge listed 4 times? Because I repeated the tests many times on different days and at different times of day. As you can see, depending on the time and date, the system could handle a different load. I guess this is because the Tigase instances shared CPU power with other services; when those were busy, Tigase suffered from insufficient CPU.
That said I think with installation of up to 10k online users you should be fine. However, other factors like roster size greatly matter as they affect traffic, and load. Also if you have other elements which generate a significant traffic this will put load on your system.
In any case, without some tests it is impossible to tell how your system really behaves or whether it can handle the load.
And the last question regarding component:
Of course Tigase does support XEP-0114 and XEP-0225 for connecting external components. So this should not be a problem with components written in different languages. On the other hand I recommend using Tigase's API for writing component. They can be deployed either as internal Tigase components or as external components and this is transparent for the developer, you do not have to worry about this at development time. This is part of the API and framework.
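For reference, the XEP-0114 connection is simple on the component side: after opening a stream in the jabber:component:accept namespace, the component authenticates by sending the lowercase hex SHA-1 of the server's stream ID concatenated with the shared secret. A minimal sketch of that one step; the stream ID and secret are hypothetical, and in practice a component library would normally perform the handshake for you.

```python
import hashlib


def component_handshake(stream_id: str, secret: str) -> str:
    """Build the XEP-0114 <handshake/> element: the lowercase hex SHA-1 of the
    stream ID sent by the server followed by the shared secret."""
    digest = hashlib.sha1((stream_id + secret).encode("utf-8")).hexdigest()
    return f"<handshake>{digest}</handshake>"


# Hypothetical values: the stream ID comes from the server's opening <stream:stream>.
print(component_handshake("3BF96D32", "s3cr3t"))
```

An empty <handshake/> reply from the server means the component is accepted; a stream error means the secret did not match.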
Also, you can use all the goods from Tigase framework, scripting capabilities, monitoring, statistics, much easier development as you can easily deploy your code as internal component for tests.
You really do not have to worry about any XMPP specific stuff, you just fill body of processPacket(...) method and that's it.
There should be enough online documentation for all of this on the Tigase website.
Also, I would suggest reading about Python support for multi-threading and how it behaves under a very high load. It used to be not so great.

Amazon EC2 consideration - redundancy and elastic IPs

I've been tasked with determining if Amazon EC2 is something we should move our ecommerce site to. We currently use Amazon S3 for a lot of images and files. The cost would go up by about $20/mo for our host costs, but we could sell our server for a few thousand dollars. This all came up because right now there are no procedures in place if something happened to our server.
How reliable is Amazon EC2? Is the redundancy good? I don't see anything about this in the FAQ, and it's a problem on our current system that I'm looking to solve.
Are elastic IPs beneficial? It sounds like you could point DNS to that IP and then on Amazon's end, reroute that IP address to any EC2 instance so you could easily get another instance up and running if the first one failed.
I'm aware of scalability, it's the redundancy and reliability that I'm asking about.
At work, I've had something like 20-40 instances running at all times for over a year. I think we've had 1-3 alert emails come from amazon suggesting that we terminate and boot another instance (presumably because they are detecting possible failure in the underlying hardware). We've never had an instance go down suddenly, which seems rather good.
Elastic IPs are amazing and are part of the solution. The other part is being able to bring up new instances rapidly. I've learned that you shouldn't worry about individual instances going down; it's more important to use proper load balancing and be able to bring up commodity instances quickly.
Yes, it's very good. If you aren't able to put together a concurrent redundancy (where you have multiple servers fulfilling requests simultaneously), using the elastic IP to quickly redirect to another EC2 instance would be a way to minimize downtime.
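The Elastic IP approach described here amounts to a check-and-repoint step. A minimal sketch under stated assumptions: the in-memory dict stands in for AWS's Elastic IP associations, and the instance IDs, the documentation-range address and the health function are hypothetical. In AWS itself the re-pointing is done with the EC2 AssociateAddress action.

```python
from typing import Callable, Dict, List


def failover(eip_map: Dict[str, str], eip: str, standby: List[str],
             is_healthy: Callable[[str], bool]) -> str:
    """Keep `eip` pointed at a healthy instance and return that instance.

    eip_map maps each Elastic IP to the instance it is associated with.
    """
    current = eip_map[eip]
    if is_healthy(current):
        return current
    for candidate in standby:
        if is_healthy(candidate):
            eip_map[eip] = candidate  # in AWS: an AssociateAddress call
            return candidate
    raise RuntimeError("no healthy standby instance")


# Hypothetical state: the primary has failed, a standby is ready.
eips = {"203.0.113.10": "i-primary"}
health = {"i-primary": False, "i-standby": True}
print(failover(eips, "203.0.113.10", ["i-standby"], health.get))  # i-standby
```

Because DNS points at the Elastic IP rather than at an instance, nothing outside AWS has to change when the address moves.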
Yes, I think moving from an in-house server to Amazon will definitely make a lot of sense economically. EBS-backed instances ensure that even if the machine is stopped or crashes, the data on its EBS volumes is not lost. And if you have a clear separation between your application and data layers and can put them on different machines, you can build even better redundancy for your data.
For example, if you use MySQL, you can consider Amazon's RDS service, which gives you a highly available and reliable MySQL instance, fully managed (patches and all). The application layer can then be made more resilient through load balancing, by running several smaller instances rather than one larger instance.
The cost you will save on is really hardware maintenance and the cost you would have to incur to build in disaster recovery.

Benefits of EBS vs. instance-store (and vice-versa) [closed]

I'm unclear as to what benefits I get from EBS vs. instance-store for my instances on Amazon EC2. If anything, it seems that EBS is way more useful (stop, start, persist + better speed) at relatively little difference in cost...? Also, is there any metric as to whether more people are using EBS now that it's available, considering it is still relatively new?
The bottom line is you should almost always use EBS backed instances.
Here's why
EBS backed instances can be set so that they cannot be (accidentally) terminated through the API.
EBS backed instances can be stopped when you're not using them and resumed when you need them again (like pausing a Virtual PC), at least with my usage patterns saving much more money than I spend on a few dozen GB of EBS storage.
EBS backed instances don't lose their instance storage when they crash (not a requirement for all users, but makes recovery much faster)
You can dynamically resize EBS instance storage.
You can transfer the EBS instance storage to a brand new instance (useful if the hardware at Amazon you were running on gets flaky or dies, which does happen from time to time)
It is faster to launch an EBS backed instance because the image does not have to be fetched from S3.
If the hardware your EBS-backed instance runs on is scheduled for maintenance, stopping and starting the instance automatically migrates it to new hardware. I was also able to move an EBS-backed instance off failed hardware by force-stopping the instance and starting it again (your mileage may vary on failed hardware).
I'm a heavy user of Amazon and switched all of my instances to EBS backed storage as soon as the technology came out of beta. I've been very happy with the result.
EBS can still fail - not a silver bullet
Keep in mind that any piece of cloud-based infrastructure can fail at any time. Plan your infrastructure accordingly. While EBS-backed instances provide a certain level of durability compared to ephemeral-storage instances, they can and do fail. Have an AMI from which you can launch new instances as needed in any availability zone, back up your important data (e.g. databases), and if your budget allows it, run multiple instances of servers for load balancing and redundancy (ideally in multiple availability zones).
When Not To
At some points in time, it may be cheaper to achieve faster IO on Instance Store instances. There was a time when it was certainly true. Now there are many options for EBS storage, catering to many needs. The options and their pricing evolve constantly as technology changes. If you have a significant amount of instances that are truly disposable (they don't affect your business much if they just go away), do the math on cost vs. performance. EBS-backed instances can also die at any point in time, but my practical experience is that EBS is more durable.
99% of our AWS setup is recyclable. So for me it doesn't really matter if I terminate an instance -- nothing is lost ever. E.g. my application is automatically deployed on an instance from SVN, our logs are written to a central syslog server.
The only benefit of instance storage that I see are cost-savings. Otherwise EBS-backed instances win. Eric mentioned all the advantages.
[2012-07-16] I would phrase this answer a lot differently today.
I haven't had any good experience with EBS-backed instances in the past year or so. The last downtimes on AWS pretty much wrecked EBS as well.
I am guessing that a service like RDS uses some kind of EBS as well, and that seems to work for the most part. On the instances we manage ourselves, we have gotten rid of EBS where possible.
We went so far as to move a database cluster back to iron (= real hardware). The only remaining piece of EBS in our infrastructure is a DB server where we stripe multiple EBS volumes into a software RAID and back up twice a day. Whatever would be lost between backups, we can live with.
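Striping several EBS volumes into a software RAID 0 spreads consecutive chunks across the volumes so their I/O adds up. A minimal sketch of the address mapping RAID 0 uses, assuming equal-sized volumes and a chunk size measured in blocks; it is an illustration of the layout, not of any particular RAID tool. RAID 0 has no redundancy, so losing any one volume loses the whole array, which is why the twice-daily backups matter.

```python
from typing import Tuple


def stripe_location(logical_block: int, n_volumes: int,
                    chunk_blocks: int = 1) -> Tuple[int, int]:
    """Map a logical block of a RAID 0 array to (volume index, block on that volume).

    Consecutive chunks go to consecutive volumes, wrapping around, so a large
    sequential read touches every volume in turn.
    """
    chunk, offset_in_chunk = divmod(logical_block, chunk_blocks)
    volume = chunk % n_volumes
    stripe = chunk // n_volumes
    return volume, stripe * chunk_blocks + offset_in_chunk


# Four striped volumes, one block per chunk: blocks 0-3 land on different volumes.
print([stripe_location(b, 4) for b in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```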
EBS is a somewhat flaky technology, since it's essentially a network volume: a volume attached to your server remotely. I am not dismissing the work done on it; it is an amazing product, since essentially unlimited persistent storage is just an API call away. But it's hardly fit for scenarios where I/O performance is key.
And in addition to how network storage behaves, all networking is shared on EC2 instances. The smaller the instance (e.g. t1.micro, m1.small), the worse it gets, because the network interfaces on the actual host system are shared among the multiple VMs (= your EC2 instances) running on top of it.
The larger the instance, the better it gets, of course. Better here means within reason.
When persistence is required, I would always advise people to use something like S3 to centralize data between instances. S3 is a very stable service. Then automate your instance setup to the point where you can boot a new server and it gets ready by itself. Then there is no need for network storage that lives longer than the instance.
So all in all, I see no benefit to EBS-backed instances whatsoever. I would rather add a minute to bootstrapping than run with a potential SPOF.
We like instance-store. It forces us to make our instances completely recyclable, and we can easily automate the process of building a server from scratch on a given AMI. This also means we can easily swap out AMIs. Also, EBS still has performance problems from time to time.
Eric pretty much nailed it. We (Bitnami) are a popular provider of free AMIs for popular applications and development frameworks (PHP, Joomla, Drupal, you get the idea). I can tell you that EBS-backed AMIs are significantly more popular than S3-backed ones. In general I think S3-backed instances are used for distributed, time-limited jobs (for example, large-scale processing of data) where, if one machine fails, another one is simply spun up. EBS-backed AMIs tend to be used for 'traditional' server tasks, such as web or database servers that keep state locally and thus require the data to be available in case of a crash.
One aspect I did not see mentioned is that you can take snapshots of an EBS-backed instance while it is running, effectively giving you very cost-effective backups of your infrastructure (the snapshots are block-based and incremental).
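"Block-based and incremental" means each snapshot only has to store the blocks that changed since the previous one. A toy sketch of the idea, assuming a volume modelled as a dict of block number to contents; EBS's real snapshot storage is internal to AWS and is not what this code implements.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # block number -> block contents


def restore(chain: List[Blocks]) -> Blocks:
    """Rebuild a volume by applying snapshots oldest-first."""
    volume: Blocks = {}
    for snap in chain:
        volume.update(snap)
    return volume


def take_snapshot(volume: Blocks, chain: List[Blocks]) -> Blocks:
    """Store only the blocks that differ from what the earlier snapshots hold."""
    known = restore(chain)
    return {n: data for n, data in volume.items() if known.get(n) != data}


disk = {0: b"boot", 1: b"app", 2: b"db-v1"}
chain = [take_snapshot(disk, [])]         # first snapshot: every block
disk[2] = b"db-v2"                        # only the database block changes
chain.append(take_snapshot(disk, chain))
print(chain[1])                           # {2: b'db-v2'}
```

Restoring any point in time means replaying the chain up to that snapshot, which is why frequent snapshots stay cheap when little data changes.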
I've had the exact same experience as Eric at my last position. Now, in my new job, I'm going through the same process I performed at my last job: rebuilding all their AMIs for EBS-backed instances, and possibly as 32-bit machines (cheaper, but you can't use the same AMI on 32- and 64-bit machines).
EBS backed instances launch quickly enough that you can begin to make use of the Amazon AutoScaling API which lets you use CloudWatch metrics to trigger the launch of additional instances and register them to the ELB (Elastic Load Balancer), and also to shut them down when no longer required.
This kind of dynamic autoscaling is what AWS is all about - where the real savings in IT infrastructure can come into play. It's pretty much impossible to do autoscaling right with the old s3 "InstanceStore"-backed instances.
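With Auto Scaling you configure a policy rather than write code, but the decision such a policy encodes looks roughly like this. A hypothetical step rule: the CPU thresholds and group limits are placeholders, and real policies also involve cooldowns and CloudWatch alarm evaluation periods that are not modelled here.

```python
def desired_capacity(current: int, avg_cpu: float,
                     scale_out_above: float = 70.0,
                     scale_in_below: float = 30.0,
                     min_size: int = 1, max_size: int = 10) -> int:
    """One step of a simple CPU-based scaling rule: add an instance when the
    group is busy, remove one when it is idle, and stay within the limits."""
    if avg_cpu > scale_out_above:
        target = current + 1
    elif avg_cpu < scale_in_below:
        target = current - 1
    else:
        target = current
    return max(min_size, min(max_size, target))


print(desired_capacity(2, 85.0), desired_capacity(2, 12.0))  # 3 1
```

The instances added this way only help if they boot and register with the ELB quickly, which is the point made above about EBS-backed launch times.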
I'm just starting to use EC2 myself so not an expert, but Amazon's own documentation says:
we recommend that you use the local instance store for temporary data and, for data requiring a higher level of durability, we recommend using Amazon EBS volumes or backing up the data to Amazon S3.
Emphasis mine.
I do more data analysis than web hosting, so persistence doesn't matter as much to me as it might for a web site. Given the distinction made by Amazon itself, I wouldn't assume that EBS is right for everyone.
I'll try to remember to weigh in again after I've used both.
EBS is like the virtual disk of a VM:
Durable, instances backed by EBS can be freely started and stopped (saving money)
Can be snapshotted at any point in time, to get point-in-time backups
AMIs can be created from EBS snapshots, so the EBS volume becomes a template for new systems
Instance storage is:
Local, so generally faster
Non-networked, in normal cases EBS I/O comes at the cost of network bandwidth (except for EBS-optimized instances, which have separate EBS bandwidth)
Not subject to EBS's IOPS limits: EBS has limited I/O operations per second (IOPS), and even provisioned IOPS maxes out at a few thousand
Fragile. As soon as the instance is stopped, you lose everything in instance storage.
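The difference in lifecycle can be modelled in a few lines. A toy sketch using a hypothetical Instance class: data on EBS survives a stop/start, data on instance storage does not, and an instance-store-backed instance cannot be stopped at all (it can only be rebooted or terminated).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Instance:
    """Toy model of one EC2 instance and where its data lives."""
    root: str  # "ebs" or "instance-store"
    ebs_data: Dict[str, str] = field(default_factory=dict)        # persistent
    ephemeral_data: Dict[str, str] = field(default_factory=dict)  # instance storage
    running: bool = True

    def stop(self) -> None:
        if self.root == "instance-store":
            raise RuntimeError("instance-store-backed instances cannot be stopped")
        self.running = False
        self.ephemeral_data.clear()  # instance storage is wiped on stop

    def start(self) -> None:
        self.running = True  # EBS data is still there; instance storage starts empty


web = Instance("ebs")
web.ebs_data["mysql"] = "orders table"
web.ephemeral_data["tmp"] = "external sort run"
web.stop()
web.start()
print(web.ebs_data, web.ephemeral_data)  # {'mysql': 'orders table'} {}
```

This matches the split below: durable data on EBS, scratch data on instance storage.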
Here's where to use each:
Use EBS for the backing OS partition and permanent storage (DB data, critical logs, application config)
Use instance storage for in-process data, noncritical logs, and transient application state. Example: external sort storage, tempfiles, etc.
Instance storage can also be used for performance-critical data, when there's replication between instances (NoSQL DBs, distributed queue/message systems, and DBs with replication)
Use S3 for data shared between systems: input datasets and processed results, or static data used by each system when launched.
Use AMIs for prebaked, launchable servers
Most people choose EBS-backed instances because they are stateful. They are safer because everything you have running and installed inside them will survive a stop/start or an instance failure.
Instance store is stateless: you lose it, along with all the data inside, in any instance-failure situation. However, it is free and faster, because the instance volume is tied to the physical server where the VM is running.
For anyone new to all this who has landed here by accident:
As of now, all AMIs in the Quick Start section are EBS-backed.
There's also a good explanation in the official documentation of the difference between EBS and instance store.
If you run multiple instances and rely on scheduled stopping of instances as one of your priorities for avoiding unexpected charges, I would recommend not using instance store.
As explained in the documentation on EBS volumes, and in the answers from j2d3 and Siddharth Sharma, an instance-store instance can run for as long as you want, but it cannot be stopped. This means it cannot be managed by automatic start/stop scheduling or Instance Recovery.
Moreover, for this kind of scheme there is no benefit in using EBS-backed instances on Elastic Beanstalk, as Beanstalk is designed to keep all the resources you need running: it will automatically relaunch any instances that you stop.
Reviewing the rest, among the charges for using VPC, EBS and ELB on top of EC2-Classic, EC2-VPC with ELB is mostly the best choice since, unlike on EC2-Classic, a stopped instance retains its associated Elastic IP addresses and its EBS volume is kept automatically.
In conclusion, taking the main part of your question:
it seems that EBS is way more useful (stop, start, persist + better speed) at relatively little difference in cost...?
The answer is yes. If your instance is EBS-backed, it can be stopped: it remains in your account and you are not charged instance hours while it is stopped, only for the volume, which is still billed by the hour. You also have the flexibility to resize the EBS volume.
Besides the benefits Eric already listed, be aware that in terms of cost, S3 may or may not be cheaper than EBS. I agree there is relatively little difference in cost if you keep both types of instance running all the time on the same platform and application architecture.
However, if your scenario is to run the application on a lower-cost service and hand all unhandled tasks over to the VPC/EBS instance via a pipeline or Lambda for a short window, say under 1 hour a day (which is impossible with instance store), it is a different story.
