EBS for storing databases vs. website files - amazon-ec2

I spent the day experimenting with AWS for the first time. I've got an EC2 instance running and I mounted an Elastic Block Store (EBS) to keep the MySQL databases.
Does it make sense to also put my web application files on the EBS, or should I just deploy them to the normal EC2 file system?

When you say your web application files, I'm not sure what exactly you are referring to.
If you are referring to your deployed code, it probably doesn't make sense to use EBS. What you want to do is create an AMI with your prerequisites, then have a script to create an instance of that AMI and deploy your latest code. I highly recommend you automate and test this process as it's easy to forget about some setting you have to manually change somewhere.
If you are storing data files that are modified by the running application, EBS may make sense. If this is something like user-uploaded images or similar, you will likely find that S3 gives you a much simpler model.
EBS would be good for: databases, lucene indexes, file based CMS, SVN repository, or anything similar to that.

EBS gives you persistent storage, so if your EC2 instance fails the files still exist. Apparently there is also increased I/O performance, but I would test it to be sure.

If your files are going to change frequently (like a DB does) and you don't want to keep syncing them to S3 (or somewhere else), then an EBS volume is a good way to go. If you make infrequent changes and can sync the files as necessary, manually or with a script, then store them in S3. If you need to shut down, or you lose your instance for whatever reason, you can just pull them down when you start up the new instance.
This is also assuming that you care about cost. If cost is not an issue, using the EBS is less complicated.
I'm not sure if you plan on having a separate EBS for your DB and your web files but if you only plan on having one EBS and you have enough empty space on it for your web files, then again, the EBS is less complicated.
If it's performance you are worried about, as mentioned, it's best to test your particular app.
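For the infrequent-change case, the sync to and from S3 could be scripted roughly as below. This is only a sketch: it assumes the AWS CLI is installed and configured with credentials, and the bucket URL and local directory are made-up placeholders. By default it only prints the commands it would run (RUN defaults to echo); set RUN to an empty value to actually execute them.

```shell
#!/bin/sh
# Hypothetical locations; replace both with your own.
BUCKET_URL="${BUCKET_URL:-s3://example-bucket/web-files}"
LOCAL_DIR="${LOCAL_DIR:-/var/www/files}"

# Dry-run switch: unset means "echo" (print only), empty means "execute".
RUN="${RUN-echo}"

# Upload local changes to S3 after you modify the files.
push_files() {
    $RUN aws s3 sync "$LOCAL_DIR" "$BUCKET_URL"
}

# Download the files onto a freshly started instance.
pull_files() {
    $RUN aws s3 sync "$BUCKET_URL" "$LOCAL_DIR"
}
```

You would call push_files after changing the files and pull_files from a startup script on a new instance.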

Our approach is to have a script pre-deployed on our AMI that fetches the latest and greatest version of the code from source control. That makes it very straightforward to launch new instances quickly, or update all running instances (we take them out of the load balancing rotation one at a time, run the script, and put them back in the rotation).
UPDATE:
Reading between the lines it looks like you're mounting a separate EBS volume to an instance-store backed instance. AWS recently introduced EBS backed instances that have a ton of benefits vs. the old instance-store ones. I still mount my MySQL data on a separate EBS partition, though, so that I can easily mount it to a different server if needed.
I strongly suggest an EBS backed instance with a separate EBS volume for the MySQL data.

Related

How to update a laravel project in aws Elastic beanstalk, while keeping the same storage

I want to update my Laravel project on AWS Elastic Beanstalk, but the storage on the Elastic Beanstalk instance now differs from what is in my project, and I want to keep it. I don't know how: my project contains a storage folder, but it is empty, so if I deploy the update I will lose all the files.
How can I update the code but keep the storage?
Your application should be designed to be stateless. The reason is that your EB instances always run in an Auto Scaling group.
This means that they can be terminated and replaced at any time, without your knowledge or involvement. There are many scenarios in which that may happen. Examples include Availability Zone rebalancing, migration to new physical hardware, scale-in and scale-out activities, or instance health degradation.
Consequently, you are always at risk of losing your storage, whether you like it or not.
Therefore your application should be designed to be stateless, which means that it does not store any data on the instance. This is usually achieved by storing the data in external storage such as EFS:
How can I mount an Amazon EFS volume to an instance in my Elastic Beanstalk environment?
But if you still want to keep your design, you can use .ebextensions scripts to help you carry over the storage folder. Specifically, in Commands you would make a copy of your storage folder in a safe location at the start of the new deployment. Then in Container commands you would copy the files back into your new application folder, just before the new deployment completes.
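As a rough sketch of that copy-out and copy-back step, the two shell functions below could be called from the Commands and Container commands sections respectively. The function names are made up for illustration, and the paths you pass in (live storage folder, safe location, new application folder) are assumptions that depend on your platform version, so check them before relying on this.

```shell
#!/bin/sh
# Sketch only: all paths are supplied by the caller and are hypothetical.

# Run early in the deployment: copy the live storage folder to a safe place.
# usage: backup_storage LIVE_STORAGE_DIR SAFE_DIR
backup_storage() {
    live="$1"; safe="$2"
    [ -d "$live" ] || return 0   # first deployment: nothing to preserve
    rm -rf "$safe"
    mkdir -p "$safe"
    cp -a "$live/." "$safe/"     # "/." copies the contents, hidden files included
}

# Run late in the deployment: copy the saved files into the new app folder.
# usage: restore_storage SAFE_DIR NEW_STORAGE_DIR
restore_storage() {
    safe="$1"; new="$2"
    [ -d "$safe" ] || return 0   # no backup was taken
    mkdir -p "$new"
    cp -a "$safe/." "$new/"
}
```

Note that this only protects against a redeployment on the same instance; it does not help when the instance itself is replaced, which is why external storage remains the safer design.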

Do Amazon EBS snapshots retain deleted data?

Consider the following scenario:
I have a file with sensitive information stored on an EBS-backed EC2 instance. I delete this file in the standard non-secure way (rm -f my_secret_file). Once the file is deleted, I immediately shut down the instance and take an EBS snapshot. (Or create an AMI... either one, really.)
If a malicious party was able to gain access to the snapshot and mount/boot it, could they undelete any portion of my_secret_file using the various filesystem tools available? Put another way, do the EBS snapshots retain the data that existed in "unallocated"/deleted blocks at the time?
Yes, I would be extremely surprised if they didn't. EBS snapshots are block-level snapshots, so they capture everything regardless of the logical state of the file system, much like a hard disk image.

Back up the whole Alfresco installation directory, hosted on Amazon EC2 instance

I have an Alfresco Community installation, hosted on Amazon Web Services, which I am using as a personal repository. I am starting to store quite important documents in it (roughly 2 GB), so I am thinking about how to implement a solid backup/restore strategy.
I have seen many tutorials and official docs showing how to back up Alfresco by backing up two directories: alf_data and the PostgreSQL (or whatever database is used) directory.
The question: in the case of a default Alfresco installation, which means one with an embedded database, I wonder whether the following scenario is enough to be considered a good cold backup strategy. The starting point is of course stopping Alfresco, then one (or both) of the following.
Tar and gzip the whole Alfresco installation directory and store it in a safe place (at the moment Amazon S3).
Create an EBS snapshot with the Amazon EC2 console.
If both your alf_data and postgres directories are on the EBS volume, then a snapshot is sufficient.
You just need to know that a hot backup (done while Alfresco is running) could be inconsistent: the database and alf_data may be out of sync, or the backup may be incomplete in the middle of a transaction.
A cold backup is best; take a look at the Alfresco Wiki for more info.
Still, a hot backup at night, when no jobs (LDAP sync, cleanup, etc.) are running, is doable.
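The tar.gz variant of the cold backup could look something like the sketch below. The function name and the archive naming scheme are made up for illustration, and it assumes Alfresco has already been stopped before the function is called; stopping and restarting it is left out because the exact command depends on how it was installed.

```shell
#!/bin/sh
# Sketch: archive a whole installation directory into a timestamped tar.gz.
# usage: make_cold_backup INSTALL_DIR BACKUP_DIR   (prints the archive path)
make_cold_backup() {
    src="$1"; dest="$2"
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/alfresco-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C keeps paths in the archive relative to the parent of INSTALL_DIR.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Cheap sanity check: the archive must at least be listable.
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}
```

The printed archive path can then be handed to whatever uploads the file to S3. Keep the backup directory outside the installation directory so the archive does not end up inside itself.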

Swap disappears when stopping an ebs backed instance.

My instance swap file is disappearing when I start my instance.
I have an Ubuntu ec2 instance, and I follow the "Four-step Process to Add Swap File" instructions at https://help.ubuntu.com/community/SwapFaq:
sudo dd if=/dev/zero of=/mnt/512MiB.swap bs=1024 count=524288
sudo chmod 600 /mnt/512MiB.swap
sudo mkswap /mnt/512MiB.swap
sudo swapon /mnt/512MiB.swap
I then changed my /etc/fstab to include:
/mnt/512MiB.swap none swap sw 0 0
Since I am using a much bigger swap, this process takes some time, and I don't want to do it every time I start. I would rather pay for the storage. However, when I start my instance, the swap has disappeared. If I type 'top', the instance does not have a swap file in use.
What should I do?
While the Amazon EC2 instance you are using has EBS-backed root device storage, all EC2 instance types still have EC2 instance storage (also known as the ephemeral store) available for use as well, and the smaller instance types (e.g. m1.small and c1.medium) even have it attached and mounted at /mnt by default (the larger ones do not).
The most important characteristic of this storage type to be aware of is that the data on the instance store volumes persists only during the life of the associated Amazon EC2 instance.
This statement is nowadays slightly misleading, insofar as it also applies to stopping an EBS-backed instance (though not to rebooting it): the moment you stop that instance, the ephemeral volume mounted at /mnt is detached and deleted, and all data stored there is lost, including your swap file. Once you start the instance again, a new ephemeral volume is attached and mounted at /mnt.
Solution
You can still use the EC2 instance storage (which is plentiful and free of charge) if you know exactly what you are doing (see the Background section below); it is a perfect option for strictly temporary data or anything that can be recreated easily on demand, such as a cache.
A swap file meets these requirements as well, so you simply need to create a script with the commands outlined in your question and execute it on instance start to recreate the swap file. You should put a guard in place, though, because the instance storage survives reboots: you neither need to nor should recreate the swap file on reboots, only after real stop/start cycles.
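A guarded version of the commands from the question might look like the sketch below. The path and size default to the values used in the question; the function names and the --run switch are made up for illustration, and how you hook the script into instance start (rc.local, an init script, etc.) is up to you.

```shell
#!/bin/sh
# Sketch: recreate the swap file only when it is missing, i.e. after a real
# stop/start cycle. On a plain reboot the file still exists and is reused.
SWAP_FILE="${SWAP_FILE:-/mnt/512MiB.swap}"
SWAP_KIB="${SWAP_KIB:-524288}"   # size in 1 KiB blocks, as in the question

# Guard 1: true when the swap file has to be rebuilt.
swap_file_missing() {
    [ ! -f "$1" ]
}

# Guard 2: true when the file is already listed as active swap
# (for example because /etc/fstab enabled it during boot).
swap_is_active() {
    awk -v f="$1" '$1 == f { found = 1 } END { exit !found }' /proc/swaps 2>/dev/null
}

ensure_swap() {
    if swap_file_missing "$SWAP_FILE"; then
        dd if=/dev/zero of="$SWAP_FILE" bs=1024 count="$SWAP_KIB" || return 1
        chmod 600 "$SWAP_FILE"
        mkswap "$SWAP_FILE" || return 1
    fi
    swap_is_active "$SWAP_FILE" || swapon "$SWAP_FILE"
}

# Only touch the system when explicitly asked to, e.g.:
#   sudo sh ensure-swap.sh --run
if [ "${1:-}" = "--run" ]; then
    ensure_swap
fi
```

The script has to run as root and before anything needs the swap. With a much bigger swap file the dd step still takes a while, but it is then only paid after a stop/start, not on every reboot.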
Background
Instance storage was the only storage option when Amazon EC2 was first introduced, but the resulting severe limitations for everyday usage have fortunately been remedied by the Amazon Elastic Block Store (EBS), which you are already using. Eric Hammond has recently provided a great summary of why You Should Use EBS Boot Instances on Amazon EC2, addressing this very topic:
If you are just getting started with Amazon EC2, then use EBS boot
instances and stop reading this article. Forget that you ever heard
about instance-store and accept my apology that I just mentioned it.
Once you are completely comfortable with using EBS boot instances on
EC2, you may (or may not) want to come back here and read why you made
a good decision.

How do multiple EC2 instances (scaling) work with one EBS volume for data storage?

So, in a simple situation, if there is only one instance, then I can store the data on an EBS volume mounted on that instance, e.g. at /mnt/db.
However, how does it work if I scale and have multiple instances (with either static or dynamic scaling)?
Because one EBS volume can only attach to one instance, if I have multiple instances, does it mean that I have to attach an EBS volume to each instance? If that's the case, the data on each instance's EBS volume will be different.
Obviously I want all instances to access (read and write) a single volume as the data store. The data in the volume will grow constantly, and there can be no downtime.
What is the solution? Is there a way to access the data without mounting the device (EBS)?
Here is what I can think of:
1) If each instance has its own EBS volume, then at each time interval (e.g. 1 hour), all instances unmount and detach their EBS volume and attach a new one. Then one powerful instance mounts all the EBS volumes that were just detached and aggregates all the data.
2) Or, similar to 1), instead of detaching and attaching, I just take a snapshot of the volumes of all instances. Then the powerful instance aggregates the data from the snapshots and saves the result to either another EBS volume or S3.
These two approaches seem workable, but they require a lot of work. Is there a smarter way to approach this problem? Thanks.
By the way, because of performance issues, I cannot have the instances write data to S3. :)
Oh, how about this:
3) First, all instances have their own EBS volume and write data to it. Then each hour, the data is sent to S3, and another instance aggregates it.
How about having an NFS instance that the other instances can mount?
It seems that you need to create an EBS snapshot of your most up-to-date EC2 instance. This will create an EBS-backed AMI. You would then need to terminate all your EC2 instances that are not up to date and launch a new stack of instances from your newly created AMI. If you have a load balancer running, you would also have to attach these new instances to it.
It seems a little long-winded, but it can all be done programmatically. At least this is how I think scaling in the cloud with Amazon works, as far as propagating changes across multiple instances goes. Could somebody with more experience verify this? I plan to test it out myself later on.

Resources