Create an AWS volume/snapshot from a tarball - amazon-ec2

I have a large tarball (expands to 13G of stuff). I want an EBS snapshot of a volume with the contents of the tarball as the contents of the volume.
I have an existing rig that involves using Linux loopback mounts to create an ext2 file system, fill it up, unmount it, push it to S3, and tell Amazon to make a snapshot from it. I don't like this rig, because it can only be run on servers where I've set up the apparatus -- but mostly because I suspect that I've reinvented a wheel.
Is there a common technique that makes more use of Amazon tools? I'm imagining something like a chef recipe that creates an instance with the necessary big empty volume, pushes the content to S3, pulls it from S3 to the instance and unpacks it.

Packer from Hashicorp is probably what you are looking for.

Related

Do Amazon EBS snapshots retain deleted data?

Consider the following scenario:
I have a file with sensitive information stored on an EBS-backed EC2 instance. I delete this file in the standard non-secure way (rm -f my_secret_file). Once the file is deleted, I immediately shut down the instance and take an EBS snapshot. (Or create an AMI... either one, really.)
If a malicious party was able to gain access to the snapshot and mount/boot it, could they undelete any portion of my_secret_file using the various filesystem tools available? Put another way, do the EBS snapshots retain the data that existed in "unallocated"/deleted blocks at the time?
Yes - I would be extremely surprised if they didn't. The EBS snapshots are block-level snapshots so they will capture everything, regardless of the logical state of the file system similar to a hard disk image.

sync EBS volumes via S3

I am looking to have multiple Amazon EC2 instances use the same data store. Amazon does not condone mounting an S3 Bucket as a file system, so I am trying to avoid this solution. Is there a way to synchronize an EBS volume with S3 or would it be best to use rsync and cron?
Do you really have to have the files locally available from within EBS? What if instead you served them to yourself via CloudFront, and restricted the permissions so that only your instances (or only your Security Group) could see the files?
Come Fall 2015 you'll be able to use Elastic File Storage (EFS) for this. But until then, I would suppose the next best thing is to use the AWS command-line to sync down from S3 to your volume:
aws s3 sync s3://my-bucket/my/folder /mnt/my-ebs/
After the initial run, that sync command is surprisingly fast. So from there you could just cron it to run hourly or so?

Magento on Amazon ec2 and var /common files directory

I want to put on EC2 a medium Magento Installation.
I have used a Load Balancer, CDN for images, RDS mysql.
But now I have to choose where and how save Magento files( /app and /var directory).
I Think that use the local EBS file system is better, and with GIT I don't have any problem to update the code. But i have a problem with the /var directory, in this I have a directory with many csv used for some integration with other system (that read it with an FTP).
How I can manage those files?
I'm researching how to do something similar and so far this is what I'm exploring but won't be testing it for quite a while, maybe it can help you.
Sign up for Amazon S3. Install Fuse over Amazon. Create a bucket for each folder in S3. Map your /var and /media and whatever other directory to the appropriate bucket. You obviously won't want to store the xml cache that magento does since it will likely be kinda slow. You can mount s3 with a local cache -ouse_cache=/tmp which might be work playing around with.
You should report back if you figure something out.

How to rsync to all Amazon EC2 servers?

I have a Scalr EC2 cluster, and want an easy way to synchronize files across all instances.
For example, I have a bunch of files in /var/www on one instance, I want to be able to identify all of the other hosts, and then rsync to each of those hosts to update their files.
ls /etc/aws/hosts/app/
returns the IP addresses of all of the other instances
10.1.2.3
10.1.33.2
10.166.23.1
Ideas?
As Zach said you could use S3.
You could download one of many clients out there for mapping drives to S3. (search for S3 and webdav).
If I was going to go this route I would setup an S3 bucket with all my shared files and use jetS3 in a cronJob to sync each node's local drive to the bucket (pulling down S3 bucket updates). Then since I normally use eclipse & ant for building, I would create a ANT job for deploying updates to the S3 bucket (pushing updates up to the S3 bucket).
From http://jets3t.s3.amazonaws.com/applications/synchronize.html
Usage: Synchronize [options] UP <S3Path> <File/Directory>
(...)
or: Synchronize [options] DOWN
UP : Synchronize the contents of the Local Directory with S3.
DOWN : Synchronize the contents of S3 with the Local Directory
...
I would recommend the above solution, if you don't need cross-node file locking. It's easy and every system can just pull data from a central location.
If you need more cross-node locking:
An ideal solution would be to use IBM's GPFS, but IBM doesn't just give it away (at least not yet). Even though it's designed for High Performance interconnects it also has the ability to be used over slower connections. We used it as a replacement for NFS and it was amazingly fast ( about 3 times faster than NFS ). There maybe something similar that is open source, but I don't know. EDIT: OpenAFS may work well for building a clustered filesystem over many EC2 instances.
Have you evaluated using NFS? Maybe you could dedicate one instance as an NFS host.

EBS for storing databases vs. website files

I spent the day experimenting with AWS for the first time. I've got an EC2 instance running and I mounted an Elastic Block Store (EBS) to keep the MySQL databases.
Does it make sense to also put my web application files on the EBS, or should I just deploy them to the normal EC2 file system?
When you say your web application files, I'm not sure what exactly you are referring to.
If you are referring to your deployed code, it probably doesn't make sense to use EBS. What you want to do is create an AMI with your prerequisites, then have a script to create an instance of that AMI and deploy your latest code. I highly recommend you automate and test this process as it's easy to forget about some setting you have to manually change somewhere.
If you are storing data files, that are modified by the running application, EBS may make sense. If this is something like user-uploaded images or similar, you will likely find that S3 gives you a much simpler model.
EBS would be good for: databases, lucene indexes, file based CMS, SVN repository, or anything similar to that.
EBS gives you persistent storage so if you EC2 instance fails the files still exist. Apparently their is increased IO performance but I would test it to be sure.
If your files are going to change frequently (like a DB does) and you don't want to keep syncing them to S3 (or somewhere else), then an EBS is a good way to go. If you make infrequent changes and you can manually (or scripted) sync the files as necessary then store them in S3. If you need to shutdown or you lose your instance for whatever reason, you can just pull them down when you start up the new instance.
This is also assuming that you care about cost. If cost is not an issue, using the EBS is less complicated.
I'm not sure if you plan on having a separate EBS for your DB and your web files but if you only plan on having one EBS and you have enough empty space on it for your web files, then again, the EBS is less complicated.
If it's performance you are worried about, as mentioned, it's best to test your particular app.
Our approach is to have a script pre-deployed on our AMI that fetches the latest and greatest version of the code from source control. That makes it very straightforward to launch new instances quickly, or update all running instances (we take them out of the load balancing rotation one at a time, run the script, and put them back in the rotation).
UPDATE:
Reading between the lines it looks like you're mounting a separate EBS volume to an instance-store backed instance. AWS recently introduced EBS backed instances that have a ton of benefits vs. the old instance-store ones. I still mount my MySQL data on a separate EBS partition, though, so that I can easily mount it to a different server if needed.
I strongly suggest an EBS backed instance with a separate EBS volume for the MySQL data.

Resources