I want to host a medium-sized Magento installation on EC2.
I am already using a load balancer, a CDN for images, and RDS for MySQL.
But now I have to choose where and how to store the Magento files (the /app and /var directories).
I think using the local EBS file system is the better option, and with Git I have no problem updating the code. But I do have a problem with the /var directory: it contains a subdirectory with many CSV files used for integration with another system (which reads them over FTP).
How can I manage those files?
I'm researching how to do something similar. This is what I'm exploring so far, though I won't be testing it for quite a while; maybe it can help you.
Sign up for Amazon S3 and install s3fs (a FUSE filesystem for S3). Create a bucket for each folder, then map /var, /media, and whatever other directories you need to the appropriate bucket. You obviously won't want to store Magento's XML cache there, since it will likely be rather slow. You can mount S3 with a local cache (-o use_cache=/tmp), which might be worth playing around with.
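For example, mounting the buckets with s3fs might look roughly like this (the bucket names and mount points are made up, and it assumes s3fs-fuse is installed with credentials in ~/.passwd-s3fs):
# mount the media and var buckets with a local disk cache in /tmp
s3fs magento-media /var/www/magento/media -o use_cache=/tmp -o passwd_file=$HOME/.passwd-s3fs
s3fs magento-var   /var/www/magento/var   -o use_cache=/tmp -o passwd_file=$HOME/.passwd-s3fs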
You should report back if you figure something out.
My Laravel app allows users to upload images. Currently, when the user uploads their images, they are stored in a temporary location on the server. A cron job then modifies the uploaded images (compresses them, etc.), and uploads them to S3. Any temporary files older than 48 hours that failed to upload to S3 are deleted by another cron job.
I've set up an Elastic Beanstalk environment, but it's occurred to me that storing uploaded images in a temporary directory on an instance is risky because instances can be created and destroyed when necessary.
How and where, then, should I store these temporary files so that they're not at risk of being lost when an instance is destroyed?
As discussed in the comments, I think uploading the files directly to S3 is the best option. As far as I know, it's not possible to stop Elastic Beanstalk from destroying an EC2 instance unless you want to give up all of the scaling and instance failure/auto-replacement features.
One option I don't know much about is AWS EBS: "Amazon Elastic Block Store (Amazon EBS) provides persistent block storage volumes for use with Amazon EC2 instances in the AWS Cloud." I don't have any direct experience with EBS, and the overriding question, of course, is whether an EBS volume truly persists after an EC2 instance is destroyed. Since EBS has its own costs and you are already using S3, S3 seems like the way to go.
S3 has a feature called object lifecycle management that you can use to have files deleted automatically by setting them to expire two days after they're uploaded.
You can either:
A) Prefix the temporary files to put them in an S3 pseudo-folder (e.g., Temp/), apply the object lifecycle expiration rule to that specific prefix (or "folder"), as sketched below, and use the files in there as the source of truth for the new files derived from them after manipulation.
or
B) Create an S3 bucket specifically for temporary files, manipulate the files there, and copy the results to the production bucket.
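As a rough illustration of option A using the AWS CLI (the bucket name and the Temp/ prefix are placeholders):
# lifecycle rule: expire anything under the Temp/ prefix two days after upload
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-temp-uploads",
      "Filter": { "Prefix": "Temp/" },
      "Status": "Enabled",
      "Expiration": { "Days": 2 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration --bucket my-upload-bucket --lifecycle-configuration file://lifecycle.json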
I have a large tarball (it expands to 13G of stuff). I want an EBS snapshot of a volume whose contents are the contents of the tarball.
I have an existing rig that involves using Linux loopback mounts to create an ext2 file system, fill it up, unmount it, push it to S3, and tell Amazon to make a snapshot from it. I don't like this rig, because it can only be run on servers where I've set up the apparatus -- but mostly because I suspect that I've reinvented the wheel.
Is there a common technique that makes more use of Amazon tools? I'm imagining something like a chef recipe that creates an instance with the necessary big empty volume, pushes the content to S3, pulls it from S3 to the instance and unpacks it.
Packer from HashiCorp is probably what you are looking for.
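If you'd rather stay with plain AWS tooling, a rough, untested aws-cli sketch of the same idea (the volume size, availability zone, instance ID, device names, bucket, and paths are all placeholders) would be:
# create an empty volume in the instance's availability zone and attach it
VOL=$(aws ec2 create-volume --size 20 --availability-zone us-east-1a --query VolumeId --output text)
aws ec2 attach-volume --volume-id "$VOL" --instance-id i-0123456789abcdef0 --device /dev/sdf
# format it, mount it, and unpack the tarball streamed straight from S3
sudo mkfs.ext4 /dev/xvdf
sudo mkdir -p /mnt/target
sudo mount /dev/xvdf /mnt/target
aws s3 cp s3://my-bucket/content.tar.gz - | sudo tar -xz -C /mnt/target
# unmount and snapshot the result
sudo umount /mnt/target
aws ec2 create-snapshot --volume-id "$VOL" --description "contents of content.tar.gz"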
I've been trying to get to grips with Amazon's AWS services for a client. As is evidenced by the very n00bish question(s) I'm about to ask, I'm having a little trouble wrapping my head around some very basic things:
a) I've played around with a few instances and managed to get LAMP working just fine, but the problem I'm having is that the code I place in /var/www doesn't seem to be shared across those machines. What do I have to do to achieve this? I was thinking of a shared EBS volume and changing Apache's document root?
b) Furthermore, what is the best way to upload code and assets to an EBS/S3 volume? Should I set up an instance to handle FTP to the aforementioned shared volume?
c) Finally, I have a basic plan for the setup that I wanted to run by someone who actually knows what they're talking about:
DNS pointing to Load Balancer (AWS Elastic Beanstalk)
Load Balancer managing multiple AWS EC2 instances.
EC2 instances sharing code from a single EBS store.
An RDS instance to handle database queries.
CloudFront to serve assets directly to the user.
Thanks,
Rich.
Edit: my solution, for anyone who comes across this on Google.
Please note that my setup is not finished yet, and the bash scripts I'm providing in this explanation are probably not very good: even though I'm very comfortable with the command line, I have no experience of scripting in bash. However, this should at least show you how my setup works in theory.
All AMIs are Ubuntu Maverick i386 from Alestic.
I have two AMI Snapshots:
Master
Users
git - very limited access; its login shell is git-shell, so there is no interactive SSH access, but it hosts a Git repository which can be pushed to and pulled from.
ubuntu - default SSH account, used to administer the server and deploy code.
Services
Simple Git repository hosting via SSH.
Apache and PHP; databases are hosted on Amazon RDS.
Slave
Services
Apache and PHP; databases are hosted on Amazon RDS.
Right now (this will change), this is how I deploy code to my servers:
Merge changes to master branch on local machine.
Stop all slave instances.
Use Git to push the master branch to the master server.
Log in to the ubuntu user via SSH on the master server and run a script which does the following:
Exports (git archive) the code from the local repository to a folder.
Compresses the folder and uploads a backup of the code to S3 with a timestamp attached to the file name.
Replaces the code in /var/www/ with the folder's contents and sets appropriate permissions.
Removes the exported folder from the home directory but leaves the compressed file containing the latest code intact.
Start all slave instances. On startup they run a script that does the following:
Apache does not start until it is triggered by this script.
Copies the latest compressed code from the master to /tmp/www using scp (secure copy).
Extracts the code, replaces /var/www/, and sets appropriate permissions.
Starts Apache.
My full scripts are still very incomplete and I need more time to finish them, but a rough sketch of the master-side deploy script is below. I also want to get all my assets (CSS/JS/images) automatically pushed to S3 so they can be distributed to clients via CloudFront.
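Here is a very rough, untested sketch of that master-side script (the bare repository path, S3 bucket, and web root are placeholders; I'm using the AWS CLI for the upload, but any S3 client would do):
#!/bin/bash
set -e
STAMP=$(date +%Y%m%d%H%M%S)
# export (git archive) the latest code from the bare repository
mkdir -p ~/export
git --git-dir=/home/git/site.git archive master | tar -x -C ~/export
# compress it and push a timestamped backup to S3
tar -czf ~/code-latest.tar.gz -C ~/export .
aws s3 cp ~/code-latest.tar.gz s3://my-deploy-bucket/code-$STAMP.tar.gz
# replace the web root, fix permissions, then clean up the export folder
sudo rsync -a --delete ~/export/ /var/www/
sudo chown -R www-data:www-data /var/www/
rm -rf ~/export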
EBS is like a hard drive you can attach to one instance at a time, basically a 1:1 mapping. S3 is the only shared storage offering in AWS; otherwise you will need to set up an NFS server or something similar.
What you can do is put all your PHP files on S3 and then sync them down to a new instance when you start it.
I would recommend bundling a custom AMI with everything you need installed (Apache, PHP, etc.) and setting up a cron job to sync the PHP files from S3 to your document root. Your workflow would then be: upload files to S3, and let each server's cron job sync them down.
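For example, a crontab entry along these lines on each instance would handle the pull (the bucket name and paths are placeholders, and it assumes an S3 sync tool such as the AWS CLI is installed and on cron's PATH):
# pull the latest PHP files from S3 every five minutes
*/5 * * * * aws s3 sync s3://my-code-bucket/www /var/www --delete >> /var/log/s3-sync.log 2>&1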
The rest of your setup seems pretty standard.
I have a Scalr EC2 cluster, and want an easy way to synchronize files across all instances.
For example, I have a bunch of files in /var/www on one instance; I want to be able to identify all of the other hosts and then rsync to each of them to update their files.
ls /etc/aws/hosts/app/
returns the IP addresses of all of the other instances:
10.1.2.3
10.1.33.2
10.166.23.1
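Something like this rough sketch is what I have in mind (the SSH user and paths are assumptions):
# push /var/www from this instance to every other app host
for host in $(ls /etc/aws/hosts/app/); do
    rsync -az --delete /var/www/ "ubuntu@${host}:/var/www/"
done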
Ideas?
As Zach said, you could use S3.
You could download one of the many clients out there for mapping drives to S3 (search for S3 and WebDAV).
If I were going to go this route, I would set up an S3 bucket with all my shared files and use JetS3t in a cron job to sync each node's local drive with the bucket (pulling down S3 bucket updates). Then, since I normally use Eclipse and Ant for building, I would create an Ant job for deploying updates to the S3 bucket (pushing updates up).
From http://jets3t.s3.amazonaws.com/applications/synchronize.html
Usage: Synchronize [options] UP <S3Path> <File/Directory>
(...)
or: Synchronize [options] DOWN
UP : Synchronize the contents of the Local Directory with S3.
DOWN : Synchronize the contents of S3 with the Local Directory
...
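A cron entry on each node along these lines would handle the downward sync (the bucket name, JetS3t install path, and schedule are just assumptions):
# sync the local web root down from the shared bucket every five minutes
*/5 * * * * /opt/jets3t/bin/synchronize.sh DOWN my-shared-bucket /var/www >> /var/log/jets3t-sync.log 2>&1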
I would recommend the above solution if you don't need cross-node file locking. It's easy, and every system can just pull data from a central location.
If you need more cross-node locking:
An ideal solution would be to use IBM's GPFS, but IBM doesn't just give it away (at least not yet). Even though it's designed for high-performance interconnects, it can also be used over slower connections. We used it as a replacement for NFS and it was amazingly fast (about 3 times faster than NFS). There may be something similar that is open source, but I don't know of one. EDIT: OpenAFS may work well for building a clustered filesystem over many EC2 instances.
Have you evaluated using NFS? Maybe you could dedicate one instance as an NFS host.
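A minimal sketch, assuming one instance is dedicated as the NFS host (the export path, client network range, and hostname are placeholders):
# on the NFS host: export /var/www to the other instances
echo "/var/www 10.0.0.0/16(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -ra
# on each web instance: mount the shared directory
sudo mount -t nfs nfs-host.internal:/var/www /var/www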
I spent the day experimenting with AWS for the first time. I've got an EC2 instance running, and I mounted an Elastic Block Store (EBS) volume to keep the MySQL databases on.
Does it make sense to also put my web application files on the EBS, or should I just deploy them to the normal EC2 file system?
When you say your web application files, I'm not sure what exactly you are referring to.
If you are referring to your deployed code, it probably doesn't make sense to use EBS. What you want to do is create an AMI with your prerequisites, then have a script that creates an instance of that AMI and deploys your latest code. I highly recommend you automate and test this process, as it's easy to forget about some setting you have to change manually somewhere.
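As a rough sketch of that launch step using the AWS CLI (the AMI ID, instance type, key name, and bootstrap script name are all placeholders):
# launch an instance from the pre-baked AMI; the user-data script pulls and deploys the latest code on boot
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t3.micro --key-name my-key --user-data file://deploy-latest-code.sh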
If you are storing data files that are modified by the running application, EBS may make sense. If this is something like user-uploaded images or similar, you will likely find that S3 gives you a much simpler model.
EBS would be good for databases, Lucene indexes, a file-based CMS, an SVN repository, or anything similar.
EBS gives you persistent storage, so if your EC2 instance fails, the files still exist. Apparently there is also increased I/O performance, but I would test that to be sure.
If your files are going to change frequently (like a database does) and you don't want to keep syncing them to S3 (or somewhere else), then an EBS volume is a good way to go. If you make infrequent changes and can sync the files manually (or with a script) as necessary, then store them in S3. If you need to shut down, or you lose your instance for whatever reason, you can just pull them down when you start up the new instance.
This also assumes that you care about cost. If cost is not an issue, using EBS is less complicated.
I'm not sure if you plan on having separate EBS volumes for your DB and your web files, but if you only plan on having one EBS volume and it has enough empty space for your web files, then again, EBS is less complicated.
If it's performance you are worried about, as mentioned, it's best to test your particular app.
Our approach is to have a script pre-deployed on our AMI that fetches the latest and greatest version of the code from source control. That makes it very straightforward to launch new instances quickly, or update all running instances (we take them out of the load balancing rotation one at a time, run the script, and put them back in the rotation).
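A minimal sketch of such a script, assuming the web root is a Git checkout and Apache is the web server (paths, branch, and service name are assumptions):
#!/bin/bash
# pull the latest code and reload Apache
set -e
cd /var/www
git fetch origin
git reset --hard origin/master
sudo service apache2 reload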
UPDATE:
Reading between the lines, it looks like you're mounting a separate EBS volume to an instance-store-backed instance. AWS recently introduced EBS-backed instances, which have a ton of benefits compared to the old instance-store ones. I still keep my MySQL data on a separate EBS volume, though, so that I can easily attach it to a different server if needed.
I strongly suggest an EBS-backed instance with a separate EBS volume for the MySQL data.
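As a rough sketch of that layout (the device name, mount point, and data directory are assumptions):
# format and mount the extra EBS volume, and make the mount survive reboots
sudo mkfs.ext4 /dev/xvdf
sudo mkdir -p /data
sudo mount /dev/xvdf /data
echo "/dev/xvdf /data ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
# stop MySQL, copy the data directory onto the volume, then point datadir=/data/mysql in my.cnf and restart
sudo service mysql stop
sudo rsync -a /var/lib/mysql/ /data/mysql/
sudo service mysql start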