Currently, we are uploading all of our user-generated-content to a medium-size EC2 Instance, and then from there we run a cron job to sync all of the uploaded content to S3. We have some code that runs on the backend (every time you need to access any uploaded file) that checks to see whether or not the resource has been moved to S3, or if it is just available on our uploads instance.
This seems a little wasteful, but it does provide redundency -- if S3 is down, we have some javascript code in place that forces the files to be served from our upload box. The actual file uploads are stored in EBS, not on the instance.
We've got about 150GB worth of files in the S3 bucket right now; which makes performing a separate backup of the S3 Bucket extremely time consuming and nearly impossible to run on any sort of regular basis.
So, my question is, is this even necessary? Can anyone point me to some uptime statistics between S3 and EC2? Does it ever happen that S3 is down, but EC2 is available? It seems like it might be simpler to just upload everything directly to S3 and trust that it is up.... On the other hand, we could just store everything in EBS and forget S3 completely, which seems like it makes more sense.
It's much more likely that your EC2 instance will be down than S3 will be down. For one, you have a single instance running on a single host with a single network connection in a single availability zone. Past that, on a platform level, EC2 (particularly involving EBS) has had several protracted outages, whereas S3 has not had a significant availability event since 2008.
S3 is a distributed system spread all across your region of choice. Operating at the object level with eventual consistency guarantees is frankly a lot simpler than the problems addressed by EBS and EC2, all of which add additional consistency guarantees (and thus ways to fail) by design.
I generally make upload processes treat S3 as a backing store -- upload to S3 directly, or upload via an EC2 instance in a write-through fashion -- and accept that if S3 is down, then I can't handle uploads. Doing it this way introduces a failure mode where your app is running but S3 is not, but it significantly reduces the potential for data loss, which is usually a more serious problem than unavailability. This also allows you to simultaneously handle uploads via different EC2 instances in different availability zones, hedging against EC2 failures, as well as via instance-store instances, hedging against EBS failures.
Related
At the moment I am creating a website where Users can upload files.
It is running on an Apache, php MariaDB root Server with 400GB of Storage.
But as you can imagine, if Users are able to upload files the storage will be full in no time.
What is the common way to setup such environments?
I know there is Amazon EC2 and Elastic File Systems but I would like to know if someone (maybe someone professional) could explain some common alternatives.
So the requirements are: Root Server or VPS (full access to the system is mandatory) and some kind of expandable storage.
Thanks
Keeping user files on an EC2 instance is not scalable nor fault tolerant. In case your instance or its availability zone goes down, you can loose all the users data. The issue with limited storage you are already aware of.
Thus, a good practice is to design your application to be stateless, which means that it can run on any instance in any availability zone at any time. This requires user files to be stored outside of your instance. The common choices are S3 and EFS. The use of any of them will make your application's storage highly available, fault tolerant and scalable.
I will make a project in the not too distant future, a project where we will be storing thousands of thousands of images in the course of time. I'm on a hard decision whether to use Amazon S3 or EFS to store those images. Both I think are a very good option, but my question goes to what would be the best service or what would be the best practice?
My application will be done with Laravel and I already did the integration of both services.
Most of the characteristics of the project are:
Most of the files I will store will be photos about 95%.
Approximately 1.5k photos would be stored daily.
The photos are very large (professional cameras).
Traffic to the application will not be much, approx. 100 users at a time.
Each user would consult about 100 photos per day.
What do you recommend?
S3 is absolutely the right answer and practice. I have built numerous applications like you describe, some with 100s of millions of images, and S3 is superior. It also allows for flexibility such as your API returning the images as pre-signed URLs which will reduce load to your servers, images can be linked directly via static web hosting, and it provides lifecycle policies to archive less used data. Additionally, further integration with other AWS services is easy using event triggers.
As for storing/uploading, S3 multi-part upload is very useful to both increase performance and increase reliability.
EFS would make sense for your type of scenario if you were doing some intensive processing where you had a cluster of severs that needed lower latency with a shared file system - think HPC. EFS would also come at a higher cost and doesn't provide as many extensibility options or built-in features as S3. Your scenario doesn't sound like it requires EFS.
http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
For the scenario you proposed AWS S3 is the choice. Why?
Since images are more often added, it costs roughly 1/10 th of EFS.
Less overhead on your web servers since files can be directly uploaded and downloaded with S3.
You can leverage event driven processing with Lambda e.g Generating thumbnail, Image processing filters by S3 Lambda trigger.
Higher level of SLA for availability and durability.
Supporting for inbuilt lifecycle management to archival and reduce cost.
AWS EFS can also be an option if it happens to frequently modify the images (Where EBS is also an option)
You can also consider using AWS CloudFront with either the option to cache images.
Note: At the end its not about using a single service. Based on your upcoming requrements you can choose either one of them or both.
If I'm serving up a website using apache from an Amazon EC2 instance, does it ever make sense for me to stop the machine? Also, I'm extremely new to EC2 so I'm not entirely sure how EBS works. It looks like Amazon does gave me 8gb of storage for free, but am I actually being charged for that storage 24/7? Thanks
If you stop the server, it is down. If you're in a development stage, and you want to limit your costs to the bare minimum, yes, you can stop the server at the end of each day. This is one of the advantages to an EBS backed instance.
EBS is basically external network attached storage. For most people, EBS backed servers are the way to go, since you can easily clone them, stop and start them, etc. You can also make snapshots of an ebs volume, so it's a great way to have low cost backups of your server.
As for EBS storage, yes you pay for it, but it is relatively inexpensive. The real cost of EC2 ends up being CPU/runtime for the most part, although EBS certainly makes it easy to use up large amounts of storage.
does it ever make sense for me to stop the machine?
For production machine, no. I never had to stop prod machines in last couple of years. We launch new machines from our AMI when required and kill them when not.
However, for load testing or some research work for clustered environment -- we had to pause machines for a while. We use stop feature at that time.
...I'm not entirely sure how EBS works.
Quoting from official doc:
Amazon EBS volumes are off-instance storage that persists independently from the life of an instance. Amazon Elastic Bionlock Store provides highly available, highly reliable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance.
So, in highly simplified terms, it's like an external HDD or NAS
It looks like Amazon does gave me 8gb of storage for free, but am I actually being charged for that storage 24/7?
If you're paying for the instance... it should include the cost of the storage that AWS provides. For a given instance type, EBS backed instances cost more than Instance Store ones, so I guess it would include the EBS cost, but it's their pricing policy -- I can't really comment.
Side Note
Being a network storage, EBS backed images have its pros and cons. The biggest benefit is, if instance ever crashes, your root device would not vanish (please make sure you have checked 'do not delete root device on termination' while creating the instance). It comes handy in times of hardware failures or accidental termination.
However, being on network, it has all the issue that any networked device can have. For some applications that has really really fast and excessive IO (like for Cassandra), EBS seemed to be a bad idea.
If you have a free instance (it has to be a micro instance running their brand of EC2 Linux, not, say CentOS) then there is no reason to turn it off.
If you are paying per hour, then yeah, it makes sense to shutdown when not in use.
If you need more computing power and have a bigger instance, you could have the instance running on a higher CPU rate (more expensive instance type) for those hours the site is going to be accessed a lot and after that just change the instance type back to some minor. Just don't mess with the volumes.
If you don't want to be offline for those couple of minutes you're going to need, you could set up a free (micro) instance and assign the elastic IP to that instance or even redirect to a static web page on s3...
example: redirect to a s3 static page with "Maintenance in progress" message displayed.
Also, watch out while stoping/terminating your instance, I dont know is that the case just on windows instances but after starting instance again my non-root drive (volumes for non system partitions) went offline (when checking "volumes" they were attached) so I had to mount them again.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
I'm unclear as to what benefits I get from EBS vs. instance-store for my instances on Amazon EC2. If anything, it seems that EBS is way more useful (stop, start, persist + better speed) at relatively little difference in cost...? Also, is there any metric as to whether more people are using EBS now that it's available, considering it is still relatively new?
The bottom line is you should almost always use EBS backed instances.
Here's why
EBS backed instances can be set so that they cannot be (accidentally) terminated through the API.
EBS backed instances can be stopped when you're not using them and resumed when you need them again (like pausing a Virtual PC), at least with my usage patterns saving much more money than I spend on a few dozen GB of EBS storage.
EBS backed instances don't lose their instance storage when they crash (not a requirement for all users, but makes recovery much faster)
You can dynamically resize EBS instance storage.
You can transfer the EBS instance storage to a brand new instance (useful if the hardware at Amazon you were running on gets flaky or dies, which does happen from time to time)
It is faster to launch an EBS backed instance because the image does not have to be fetched from S3.
If the hardware your EBS-backed instance is scheduled for maintenance, stopping and starting the instance automatically migrates to new hardware. I was also able to move an EBS-backed instance on failed hardware by force-stopping the instance and launching it again (your mileage may vary on failed hardware).
I'm a heavy user of Amazon and switched all of my instances to EBS backed storage as soon as the technology came out of beta. I've been very happy with the result.
EBS can still fail - not a silver bullet
Keep in mind that any piece of cloud-based infrastructure can fail at any time. Plan your infrastructure accordingly. While EBS-backed instances provide certain level of durability compared to ephemeral storage instances, they can and do fail. Have an AMI from which you can launch new instances as needed in any availability zone, back up your important data (e.g. databases), and if your budget allows it, run multiple instances of servers for load balancing and redundancy (ideally in multiple availability zones).
When Not To
At some points in time, it may be cheaper to achieve faster IO on Instance Store instances. There was a time when it was certainly true. Now there are many options for EBS storage, catering to many needs. The options and their pricing evolve constantly as technology changes. If you have a significant amount of instances that are truly disposable (they don't affect your business much if they just go away), do the math on cost vs. performance. EBS-backed instances can also die at any point in time, but my practical experience is that EBS is more durable.
99% of our AWS setup is recyclable. So for me it doesn't really matter if I terminate an instance -- nothing is lost ever. E.g. my application is automatically deployed on an instance from SVN, our logs are written to a central syslog server.
The only benefit of instance storage that I see are cost-savings. Otherwise EBS-backed instances win. Eric mentioned all the advantages.
[2012-07-16] I would phrase this answer a lot different today.
I haven't had any good experience with EBS-backed instances in the past year or so. The last downtimes on AWS pretty much wrecked EBS as well.
I am guessing that a service like RDS uses some kind of EBS as well and that seems to work for the most part. On the instances we manage ourselves, we have got rid off EBS where possible.
Getting rid to an extend where we moved a database cluster back to iron (= real hardware). The only remaining piece in our infrastructure is a DB server where we stripe multiple EBS volumes into a software RAID and backup twice a day. Whatever would be lost in between backups, we can live with.
EBS is a somewhat flakey technology since it's essentially a network volume: a volume attached to your server from remote. I am not negating the work done with it – it is an amazing product since essentially unlimited persistent storage is just an API call away. But it's hardly fit for scenarios where I/O performance is key.
And in addition to how network storage behaves, all network is shared on EC2 instances. The smaller an instance (e.g. t1.micro, m1.small) the worse it gets because your network interfaces on the actual host system are shared among multiple VMs (= your EC2 instance) which run on top of it.
The larger instance you get, the better it gets of course. Better here means within reason.
When persistence is required, I would always advice people to use something like S3 to centralize between instances. S3 is a very stable service. Then automate your instance setup to a point where you can boot a new server and it gets ready by itself. Then there is no need to have network storage which lives longer than the instance.
So all in all, I see no benefit to EBS-backed instances what so ever. I rather add a minute to bootstrap, then run with a potential SPOF.
We like instance-store. It forces us to make our instances completely recyclable, and we can easily automate the process of building a server from scratch on a given AMI. This also means we can easily swap out AMIs. Also, EBS still has performance problems from time to time.
Eric pretty much nailed it. We (Bitnami) are a popular provider of free AMIs for popular applications and development frameworks (PHP, Joomla, Drupal, you get the idea). I can tell you that EBS-backed AMIs are significantly more popular than S3-backed. In general I think s3-backed instances are used for distributed, time-limited jobs (for example, large scale processing of data) where if one machine fails, another one is simply spinned up. EBS-backed AMIS tend to be used for 'traditional' server tasks, such as web or database servers that keep state locally and thus require the data to be available in the case of crashing.
One aspect I did not see mentioned is the fact that you can take snapshots of an EBS-backed instance while running, effectively allowing you to have very cost-effective backups of your infrastructure (the snapshots are block-based and incremental)
I've had the exact same experience as Eric at my last position. Now in my new job, I'm going through the same process I performed at my last job... rebuilding all their AMIs for EBS backed instances - and possibly as 32bit machines (cheaper - but can't use same AMI on 32 and 64 machines).
EBS backed instances launch quickly enough that you can begin to make use of the Amazon AutoScaling API which lets you use CloudWatch metrics to trigger the launch of additional instances and register them to the ELB (Elastic Load Balancer), and also to shut them down when no longer required.
This kind of dynamic autoscaling is what AWS is all about - where the real savings in IT infrastructure can come into play. It's pretty much impossible to do autoscaling right with the old s3 "InstanceStore"-backed instances.
I'm just starting to use EC2 myself so not an expert, but Amazon's own documentation says:
we recommend that you use the local instance store for temporary data and, for data requiring a higher level of durability, we recommend using Amazon EBS volumes or backing up the data to Amazon S3.
Emphasis mine.
I do more data analysis than web hosting, so persistence doesn't matter as much to me as it might for a web site. Given the distinction made by Amazon itself, I wouldn't assume that EBS is right for everyone.
I'll try to remember to weigh in again after I've used both.
EBS is like the virtual disk of a VM:
Durable, instances backed by EBS can be freely started and stopped (saving money)
Can be snapshotted at any point in time, to get point-in-time backups
AMIs can be created from EBS snapshots, so the EBS volume becomes a template for new systems
Instance storage is:
Local, so generally faster
Non-networked, in normal cases EBS I/O comes at the cost of network bandwidth (except for EBS-optimized instances, which have separate EBS bandwidth)
Has limited I/O per second IOPS. Even provisioned I/O maxes out at a few thousand IOPS
Fragile. As soon as the instance is stopped, you lose everything in instance storage.
Here's where to use each:
Use EBS for the backing OS partition and permanent storage (DB data, critical logs, application config)
Use instance storage for in-process data, noncritical logs, and transient application state. Example: external sort storage, tempfiles, etc.
Instance storage can also be used for performance-critical data, when there's replication between instances (NoSQL DBs, distributed queue/message systems, and DBs with replication)
Use S3 for data shared between systems: input dataset and processed results, or for static data used by each system when lauched.
Use AMIs for prebaked, launchable servers
Most people choose to use EBS backed instance as it is stateful. It is to safer because everything you have running and installed inside it, will survive stop/stop or any instance failure.
Instance store is stateless, you loose it with all the data inside in case of any instance failure situation. However, it is free and faster because the instance volume is tied to the physical server where the VM is running.
For someone new to all this and if accidentally landed here
As of now all AMI's in quickstart section are EBS backed
Also there's a good explanation at official doc for difference between EBS and Instance store
& this image pretty much sums it up
If you run multiple instance and assign a scheduled service of AWS Instance as one of your priority on Avoiding Unexpected Charges, I would recommend not to use the instance-store.
As explained on documentation of EBS
Volumes
and the answer from j2d3 and Siddharth Sharma the
instance-store can run for as long as you want, but it cannot be
stopped. Means that the service cannot be scheduled by an Automatic
Start/Stop or Instance
Recovery.
Moreover, for this kind of scheme there is also no benefit to use EBS Backed on Elastic Beanstalk as it is designed to ensure that all the resources you need are keep running. It will always do an automatically relaunches any services that you stop.
Reviewing all the rest, out of the total charges on using the VPC, EBS and ELB that added to EC2-Classic, the EC2-VPC with ELB is mostly the best choice where unlike on EC2-Classic, a stopped instance retains its associated Elastic IP addresses and the EBS volume is stored automatically.
As conclusion, taking the main part of your question:
it seems that EBS is way more useful (stop, start, persist + better
speed) at relatively little difference in cost...?
The answer is yes but if your instance is EBS-based, it can be stopped. It will remain in your account, you will not be charged for it. You will be charge only the volume but EBS is charged hourly. You may also consider that among all available types you have a flexibility to Resize the EBS Volume.
Beside the benefits that already listed by Eric, it shall also be aware that in term of cost S3 may or may not be cheaper than EBS. I agree that it relatively little difference in cost if you keep running both types of instance within the same platform and architecture of the application all the time.
However if there a scenario to run the application on a lower cost service, pull all unhandled task and role them to the VPC/EBS via a pipeline or lambda within a short time basis say <1 hour a day, which impossible to do when you use an instance-store, then it will be a different story.
I'm building some AMIs from one of the basic ones on EC2. One of the instance types is running Tomcat and contains a lot of Lucene indexes; another instance will be running MySQL and have correspondingly large data requirements with it.
I'm trying to define the best way to include those in the AMIs that I'm authoring. If I mount /mnt/lucene and /mnt/mysql, those don't get included in the AMI generated. So it seems to me like the preferred way to deal with those is to have an EBS for each one, take snapshots and spin up instances which have their own EBS based on the most recent snapshots. Is that the best way to proceed?
What is the point of instance storage? It seems like it will only work as a temporary storage area - what am I missing? Presumably there is a reason Amazon offer up to 800GB of storage on standard large instances...
Instance storage is faster than EBS. You don't mention what you will be doing with your instances, but for some applications speed might be more important than durability. For an application that is primarily doing data mining on a large database, having a few hundred gigs of local, fast storage to host the DB might be beneficial. Worker nodes in a MapReduce cluster might also be great candidates for instance storage, depending on what type of job it was.
Another point of instance storage is that it's independent. There have been many EBS outages (google e.g. "site:aws.amazon.com ebs outage"). If the instance runs at all, it has the instance storage available. Obviously if you rely on instance storage, you need to run multiple instances (on multiple availability zones) and tolerate single failing instances.
I know this is late to the game, but one other little considered factoid...
EBS storage makes it exceedingly easy to create AMI's from, whereas, instance-store based storage requires that creation of AMI's be done locally on the machine itself with a whole bunch of work to prep, store, and register the AMI.