Best way to create a Website/Server with expandable storage - amazon-ec2

At the moment I am creating a website where users can upload files.
It is running on a root server with Apache, PHP, and MariaDB, and 400 GB of storage.
But as you can imagine, if users are able to upload files, the storage will fill up in no time.
What is the common way to set up such environments?
I know there are Amazon EC2 and Elastic File System (EFS), but I would like to know if someone (maybe a professional) could explain some common alternatives.
So the requirements are: a root server or VPS (full access to the system is mandatory) and some kind of expandable storage.
Thanks

Keeping user files on an EC2 instance is neither scalable nor fault tolerant. If your instance or its availability zone goes down, you can lose all the user data. You are already aware of the limited-storage issue.
Thus, a good practice is to design your application to be stateless, which means it can run on any instance in any availability zone at any time. This requires user files to be stored outside of your instance. The common choices are S3 and EFS. Using either of them will make your application's storage highly available, fault tolerant, and scalable.
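A minimal sketch of what "stateless" means for uploads: the web tier computes an object key, hands the bytes to an external object store, and keeps only the key in the database. The `ObjectStore` below is an in-memory stand-in for S3 or EFS (in production you would call e.g. boto3's `put_object`); all names and the key scheme are illustrative, not a prescribed layout.

```python
import hashlib

class ObjectStore:
    """In-memory stand-in for an external object store such as S3 or EFS."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

def object_key(user_id, filename, data):
    # Content-addressed key: uploads never collide, and re-uploads of the
    # same bytes deduplicate automatically.
    digest = hashlib.sha256(data).hexdigest()[:16]
    return f"uploads/{user_id}/{digest}/{filename}"

def handle_upload(store, user_id, filename, data):
    key = object_key(user_id, filename, data)
    store.put(key, data)
    return key  # persist only the key in MariaDB, never the file itself
```

Because no file ever lands on the instance's own disk, any instance in any availability zone can serve any request, which is exactly what lets you replace or multiply instances freely.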

Related

Is Amazon S3 ever unavailable independent of EC2?

Currently, we upload all of our user-generated content to a medium-size EC2 instance, and from there we run a cron job to sync all of the uploaded content to S3. We have some code that runs on the backend (every time you need to access any uploaded file) that checks whether the resource has been moved to S3 or is only available on our uploads instance.
This seems a little wasteful, but it does provide redundancy -- if S3 is down, we have some JavaScript code in place that forces the files to be served from our upload box. The actual file uploads are stored in EBS, not on the instance.
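The "has it synced yet?" check described above can be sketched as a small resolver. The base URLs are hypothetical, and `synced_keys` stands in for however the backend records the sync state (e.g. a flag the cron job writes to the database):

```python
S3_BASE = "https://my-bucket.s3.amazonaws.com"   # hypothetical bucket URL
LOCAL_BASE = "https://uploads.example.com"       # the uploads instance

def resolve_url(key, synced_keys, s3_available=True):
    """Decide where to serve an uploaded file from.

    Prefer S3 once the cron sync has copied the file there; fall back to
    the uploads box if the file has not synced yet or S3 is down.
    """
    if s3_available and key in synced_keys:
        return f"{S3_BASE}/{key}"
    return f"{LOCAL_BASE}/{key}"
```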
We've got about 150 GB worth of files in the S3 bucket right now, which makes performing a separate backup of the S3 bucket extremely time consuming and nearly impossible to run on any sort of regular basis.
So, my question is: is this even necessary? Can anyone point me to some uptime statistics for S3 versus EC2? Does it ever happen that S3 is down but EC2 is available? It seems like it might be simpler to just upload everything directly to S3 and trust that it is up... On the other hand, we could just store everything in EBS and forget S3 completely, which seems like it makes more sense.
It's much more likely that your EC2 instance will be down than S3 will be down. For one, you have a single instance running on a single host with a single network connection in a single availability zone. Past that, on a platform level, EC2 (particularly involving EBS) has had several protracted outages, whereas S3 has not had a significant availability event since 2008.
S3 is a distributed system spread all across your region of choice. Operating at the object level with eventual consistency guarantees is frankly a lot simpler than the problems addressed by EBS and EC2, all of which add additional consistency guarantees (and thus ways to fail) by design.
I generally make upload processes treat S3 as a backing store -- upload to S3 directly, or upload via an EC2 instance in a write-through fashion -- and accept that if S3 is down, then I can't handle uploads. Doing it this way introduces a failure mode where your app is running but S3 is not, but it significantly reduces the potential for data loss, which is usually a more serious problem than unavailability. This also allows you to simultaneously handle uploads via different EC2 instances in different availability zones, hedging against EC2 failures, as well as via instance-store instances, hedging against EBS failures.
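The write-through idea above can be sketched in a few lines (names assumed; `s3_put` stands in for whatever client call actually performs the upload): the handler pushes straight to the backing store, and if that fails the upload is refused rather than parked on local disk.

```python
class S3Unavailable(Exception):
    """Raised when the backing store rejects a write-through upload."""

def write_through_upload(s3_put, key, data):
    """Upload directly to the backing store; refuse the upload if it fails.

    This trades availability for durability: a refused upload can be
    retried by the user, but a file stranded on a dead instance cannot.
    """
    try:
        s3_put(key, data)
    except Exception as exc:
        raise S3Unavailable(f"upload refused, please retry later: {exc}") from exc
    return key
```

The caller surfaces `S3Unavailable` to the user as a temporary error, which is the explicit failure mode the answer describes accepting.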

Serving up webpage from Amazon EC2 instance

If I'm serving up a website using Apache from an Amazon EC2 instance, does it ever make sense for me to stop the machine? Also, I'm extremely new to EC2, so I'm not entirely sure how EBS works. It looks like Amazon gives me 8 GB of storage for free, but am I actually being charged for that storage 24/7? Thanks
If you stop the server, it is down. If you're in a development stage and want to limit your costs to the bare minimum, then yes, you can stop the server at the end of each day. This is one of the advantages of an EBS-backed instance.
EBS is basically external network-attached storage. For most people, EBS-backed servers are the way to go, since you can easily clone them, stop and start them, etc. You can also make snapshots of an EBS volume, so it's a great way to have low-cost backups of your server.
As for EBS storage, yes, you pay for it, but it is relatively inexpensive. The real cost of EC2 ends up being CPU/runtime for the most part, although EBS certainly makes it easy to use up large amounts of storage.
does it ever make sense for me to stop the machine?
For a production machine, no. I haven't had to stop a prod machine in the last couple of years. We launch new machines from our AMI when required and kill them when not.
However, for load testing or some research work in a clustered environment, we had to pause machines for a while. We use the stop feature at those times.
...I'm not entirely sure how EBS works.
Quoting from official doc:
Amazon EBS volumes are off-instance storage that persists independently from the life of an instance. Amazon Elastic Block Store provides highly available, highly reliable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance.
So, in highly simplified terms, it's like an external HDD or NAS.
It looks like Amazon gives me 8 GB of storage for free, but am I actually being charged for that storage 24/7?
If you're paying for the instance, it should include the cost of the storage that AWS provides. For a given instance type, EBS-backed instances cost more than instance-store ones, so I guess it includes the EBS cost, but it's their pricing policy -- I can't really comment.
Side Note
Being network storage, EBS-backed images have their pros and cons. The biggest benefit is that if the instance ever crashes, your root device does not vanish (please make sure you have checked 'do not delete root device on termination' while creating the instance). It comes in handy in times of hardware failure or accidental termination.
However, being on the network, it has all the issues that any networked device can have. For some applications with really fast and heavy I/O (like Cassandra), EBS has seemed to be a bad idea.
If you have a free instance (it has to be a micro instance running their brand of EC2 Linux, not, say, CentOS), then there is no reason to turn it off.
If you are paying per hour, then yeah, it makes sense to shut it down when not in use.
If you need more computing power and have a bigger instance, you could run the instance at a higher CPU rate (a more expensive instance type) for those hours the site is going to be accessed a lot, and after that just change the instance type back to a smaller one. Just don't mess with the volumes.
If you don't want to be offline for the couple of minutes this takes, you could set up a free (micro) instance and assign the elastic IP to that instance, or even redirect to a static web page on S3...
Example: redirect to an S3 static page with a "Maintenance in progress" message displayed.
Also, watch out while stopping/terminating your instance. I don't know if this is the case only on Windows instances, but after starting the instance again my non-root drives (volumes for non-system partitions) went offline (when checking Volumes they showed as attached), so I had to mount them again.

Basic AWS questions

I'm newbie on AWS, and it has so many products (EC2, Load Balancer, EBS, S3, SimpleDB etc.), and so many docs, that I can't figure out where I must start from.
My goal is to be ready for scalability.
Suppose I want to set up a simple webserver, which access a database in mongolab. I suppose I need one EC2 instance to run it. At this point, do I need something more (EBS, S3, etc.)?
At some point of time, my app has reached enough traffic and I must scale it. I was thinking of starting a new copy (instance) of my EC2 machine. But then it will have another IP. So, how traffic is distributed between both EC2 instances? Is that did automatically? Must I hire a Load Balancer service to distribute the traffic? And then will I have to pay for 2 EC2 instances and 1 LB? At this point, do I need something more (e.g.: Elastic IP)?
Welcome to the club, Sony Santos.
AWS is a very powerful architecture, but with this power comes responsibility. I, and presumably many others, have learned the hard way while building applications on AWS's services.
You ask, where do I start? This is actually a very good question, but you probably won't like my answer. You need to read and do research about all the technologies offered by Amazon, and even other providers such as Rackspace, GoGrid, Google's Cloud, and Azure. Amazon is not easy to get going with, but it's not really meant to be; its focus is on being very customizable and having a very extensive API. But let's get back to your question.
To run a simple webserver you would need to start an EC2 instance; by default this instance runs on a disk drive called EBS. Essentially, an EBS drive is a normal hard drive, except that you can do lots of other cool stuff with it, like detaching it from one server and moving it to another. S3 is really more of a file storage system; it's more useful if you have a bunch of images or want to store a lot of backups of your databases, etc., but it's not a requirement for a simple webserver. Just running an EC2 instance is all you need; everything else will happen behind the scenes.
If your app reaches a lot of traffic, you have two options. You can scale your machine up by shutting it off and starting it as a larger instance. Generally speaking this is the easiest thing to do, but you'll get to a point where you either cannot handle all the traffic with one instance even at the larger size and decide you need two, OR you'll want a more fault-tolerant application that will still be online in the event of a failure or update.
If you create a second instance, you will need some form of load balancing. I recommend using Amazon's Elastic Load Balancer, as it's easy to configure and its integration with the cloud is better than Round Robin DNS or an application like HAProxy. Elastic Load Balancers are not expensive; I believe they cost around $18/month plus the data that passes through the load balancer.
But no, you don't need anything else to scale up your site. Two EC2 instances and an ELB will do the trick.
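As a rough sanity check on that figure, here is a back-of-the-envelope calculation. The rates below are assumptions matching the era's published pricing, not current numbers; always check the AWS pricing page.

```python
# Assumed illustrative rates -- verify against the current AWS pricing page.
HOURLY_RATE = 0.025      # USD per load-balancer hour
PER_GB_RATE = 0.008      # USD per GB of data processed
HOURS_PER_MONTH = 730    # average hours in a month

def monthly_elb_cost(gb_processed):
    """Estimated monthly ELB bill: a fixed hourly charge plus per-GB data."""
    return HOURLY_RATE * HOURS_PER_MONTH + PER_GB_RATE * gb_processed
```

At these rates an idle load balancer alone comes to about $18.25/month, which is where the "around $18/month" figure comes from; data transfer is added on top.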
Additional questions you didn't ask but probably should have.
How often does an EC2 instance experience hardware failure and crash my server? What can I do if this happens?
It happens frequently, usually in batches. Sometimes I go months without any problems, then I'll get a few servers crashing at a time. But it's definitely something you should plan for; I didn't in the beginning, and I paid for it. Make sure you create scripts, have backups, and have a backup plan ready in case your server fails. Be OK with it being down, or have a load-balanced solution from day 1.
What's the hardest part about scalability?
Testing, testing, testing, testing... Don't ever assume anything. Also be prepared for sudden spikes in your traffic. You have to be prepared for anything: if your page goes from 1 to 1,000 people overnight, are you prepared to handle it? Have you tested what you "think" will happen?
Best of luck and have fun... I know I have :)

Amazon EC2 consideration - redundancy and elastic IPs

I've been tasked with determining if Amazon EC2 is something we should move our ecommerce site to. We currently use Amazon S3 for a lot of images and files. The cost would go up by about $20/mo for our host costs, but we could sell our server for a few thousand dollars. This all came up because right now there are no procedures in place if something happened to our server.
How reliable is Amazon EC2? Is the redundancy good? I don't see anything about this in the FAQ, and it's a problem on our current system that I'm looking to solve.
Are elastic IPs beneficial? It sounds like you could point DNS to that IP and then, on Amazon's end, reroute that IP address to any EC2 instance, so you could easily get another instance up and running if the first one failed.
I'm aware of scalability, it's the redundancy and reliability that I'm asking about.
At work, I've had something like 20-40 instances running at all times for over a year. I think we've had 1-3 alert emails from Amazon suggesting that we terminate an instance and boot another (presumably because they detected possible failure in the underlying hardware). We've never had an instance go down suddenly, which seems rather good.
Elastic IPs are amazing and are part of the solution. The other part is being able to rapidly bring up new instances. I've learned that you shouldn't care about instances going down; it's more important to use proper load balancing and be able to bring up commodity instances quickly.
Yes, it's very good. If you aren't able to put together a concurrent redundancy (where you have multiple servers fulfilling requests simultaneously), using the elastic IP to quickly redirect to another EC2 instance would be a way to minimize downtime.
Yeah, I think moving from an in-house server to Amazon will definitely make a lot of sense economically. EBS-backed instances ensure that even if the machine gets stopped or rebooted, your data on disk is not lost. And if you have a clear separation between your application and data layers and can put them on different machines, then you can build even better redundancy for your data.
For example, if you use MySQL, then you can consider using the Amazon RDS service, which gives you a highly available and reliable MySQL instance, fully managed (patches and all). The application layer can then be made more resilient by having more, smaller instances rather than one larger instance, behind load balancing.
The cost you will save on is really hardware maintenance, plus the cost you would otherwise have to incur to build in disaster recovery.

What is the point of instance storage on EC2?

I'm building some AMIs from one of the basic ones on EC2. One of the instance types is running Tomcat and contains a lot of Lucene indexes; another instance will be running MySQL and have correspondingly large data requirements with it.
I'm trying to define the best way to include those in the AMIs that I'm authoring. If I mount /mnt/lucene and /mnt/mysql, those don't get included in the generated AMI. So it seems to me like the preferred way to deal with those is to have an EBS volume for each one, take snapshots, and spin up instances that get their own EBS volumes based on the most recent snapshots. Is that the best way to proceed?
What is the point of instance storage? It seems like it will only work as a temporary storage area -- what am I missing? Presumably there is a reason Amazon offers up to 800 GB of storage on standard large instances...
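The snapshot-and-restore workflow described here relies on snapshots being immutable point-in-time copies, with each restore producing a fresh, independent volume. A toy model of that semantics (plain Python, not AWS API calls):

```python
import copy

class Volume:
    """Toy stand-in for an EBS volume: just a mutable blob of data."""
    def __init__(self, data=None):
        self.data = dict(data or {})

class Snapshot:
    """Point-in-time, immutable copy of a volume's contents."""
    def __init__(self, volume):
        self._data = copy.deepcopy(volume.data)

    def restore(self):
        # Each restore yields a fresh, independent volume -- which is how
        # every new instance can get its own EBS volume from the latest
        # snapshot without sharing state with siblings.
        return Volume(self._data)
```

Writes to the original volume after the snapshot, or to any restored copy, never affect the others, which is what makes "snapshot, then fan out instances" a safe pattern.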
Instance storage is faster than EBS. You don't mention what you will be doing with your instances, but for some applications speed might be more important than durability. For an application that is primarily doing data mining on a large database, having a few hundred gigs of local, fast storage to host the DB might be beneficial. Worker nodes in a MapReduce cluster might also be great candidates for instance storage, depending on what type of job it was.
Another point of instance storage is that it's independent. There have been many EBS outages (google e.g. "site:aws.amazon.com ebs outage"). If the instance runs at all, it has the instance storage available. Obviously if you rely on instance storage, you need to run multiple instances (on multiple availability zones) and tolerate single failing instances.
I know this is late to the game, but one other little considered factoid...
EBS storage makes it exceedingly easy to create AMIs, whereas instance-store-based storage requires that AMI creation be done locally on the machine itself, with a whole bunch of work to prep, store, and register the AMI.