sync EBS volumes via S3

sync EBS volumes via S3 - amazon-ec2

I am looking to have multiple Amazon EC2 instances use the same data store. Amazon does not condone mounting an S3 Bucket as a file system, so I am trying to avoid this solution. Is there a way to synchronize an EBS volume with S3 or would it be best to use rsync and cron?

Do you really have to have the files locally available from within EBS? What if instead you served them to yourself via CloudFront, and restricted the permissions so that only your instances (or only your Security Group) could see the files?
Come Fall 2015 you'll be able to use Elastic File Storage (EFS) for this. But until then, I would suppose the next best thing is to use the AWS command-line to sync down from S3 to your volume:
aws s3 sync s3://my-bucket/my/folder /mnt/my-ebs/
After the initial run, that sync command is surprisingly fast. So from there you could just cron it to run hourly or so?

Related

How can a folder on windows Sync to AWS S3 when ever a new file or folder created inside

My purpose is to keep a local folder inside my computer synced with an AWS S3 bucket. I need to establish this from a windows server where AWS CLI installed.

The right solution depends on how fast you need the bucket to be synced with your local folder.
A very simple solution is to use the Windows task scheduler with the batch "aws s3 sync s3://mybucket/dir localdir" triggered on a regular basis.

You can sync your window drive and S3 bucket in sync with AWS CLI command aws s3 sync
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

AWS temporary files before uploading to S3?

My Laravel app allows users to upload images. Currently, when the user uploads their images, they are stored in a temporary location on the server. A cron job then modifies the uploaded images (compresses them, etc.), and uploads them to S3. Any temporary files older than 48 hours that failed to upload to S3 are deleted by another cron job.
I've set up an Elastic Beanstalk environment, but it's occurred to me that storing uploaded images in a temporary directory on an instance is risky because instances can be created and destroyed when necessary.
How and where, then, would I store these temporary files so that they're not at risk of being deleted by an instance?

As discussed in the comments, I think that uploading the file to S3 is the best option. As far as I know, it's not possible to stop Elastic Beanstalk from destroying an ec2 instance, unless you want to get rid of all of the scaling and instance failure/autoreplacement features.
One option I don't know much about may be AWS EBS. "Amazon Elastic Block Store (Amazon EBS) provides persistent block storage volumes for use with Amazon EC2 instances in the AWS Cloud." I don't have any direct experience with EBS, the overriding question of course would be if EBS is truly persistent, even after an ec2 instance is destroyed. As EBS has costs associated with it, it seems like since you are already using S3, S3 would be the way to go.

S3 has a feature called object lifecycle management you can use to have files deleted automatically by setting them them to expire 2 days after they're uploaded.
You can either:
A) Prefix the temporary files to put them in an S3 psuedo-folder (i.e., Temp/), apply the object lifecycle expire rule to that specific prefix (or "folder"), and use the files in there as a source of truth for the new files derived from it post-manipulation.
or
B) Create an S3 bucket specifically for temporary files. Manipulate the files from there and copy to the production bucket.

How to save intermidiate results on Amazon EC2 when spot instance used?

I do some scientific calculations and I have some intermidiate results on each iteration, so I think I can use spot instance reduce cost of processing.
How can I save intermidiate results on each iteration?
How can I automatically rerun instance from last checkpoint when it's terminated?

When the spot price of an Amazon EC2 instance rises above your bid price, your Amazon EC2 instance is terminated. A 2-minute notice is provided via the metadata interface. You can use this notice as a trigger for saving your work, or you could simply save work at regular intervals regardless of the notice period.
Do not save your work "locally", since the Amazon EBS volumes will either be deleted (eg boot volume) or disconnected (eg data volumes). I would recommend that you save your work in a persistent datastore, such as a database or Amazon S3.
One option would be to save files to your local disk, but use the AWS Command-Line Interface (CLI) to copy the files to Amazon S3 using the aws s3 sync command.
Then, if you have configured a persistent spot instance, simply copy the files from Amazon S3 when the new Amazon EC2 spot instance is started.
See:
Spot Instance Interruptions

Syncing between Amazon EBS Devices

I have 2 EC2 instances, each with their own EBS attached. Sitting infront of the EC2s is a load balancer.
These instances run CMS driven sites, where uses can upload files.
What would be the best solution to the problem of a file getting uploaded to one EBS and the load balancer sending a visitor to the EC2 instance whose EBS does not have the file? Some sort of cron which runs an rsync?
Suggestions very welcome!
Thanks
S

I believe the best solution would be to use single shared storage like Amazon S3. It's better to use some plugin for your CMS to store users' files on S3. But if there is no such plugin you can use Fuse s3fs adapter to mount the file system on both instances and configure your CMS to store those files in that specified directory.

there are several solutions to this problem from top of my head i think
nfs/samba shared dir between instances
svn deploy
cluster file systems - OCFS/GFS
cloud management such as capistrano and trriger a deploy when you need
and of course cron jobs when you can do ftp, scp, rsync, s3sync/copy etc

Or possibly, create one EC2 instance as NFS and share it's directories with your other instances.

There are multiple solutions to keep data in both EC2 in sync with or without using EBS volumes.
Can use AWS EFS service instead of using EBS volumes. EFS volume can be shared between EC2 instances within a VPC, and both instances will have data in sync on the mountpath where EFS is mounted on instances.
Another solution is using Gluster File Storage. This can also work between EBS volumes in different AWS region. Refer this link: http://sanketdangi.com/post/5601762671/gluster-config-aws-multi-az
Can mount S3 bucket on your EC2 instances using S3 Fuse. Refer this link: https://github.com/s3fs-fuse/s3fs-fuse/wiki/Fuse-Over-Amazon
May be you can also use "s3 sync" on both ebs volumes. This way both ebs will be in sync via S3. Refer this link: https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

How to rsync to all Amazon EC2 servers?

I have a Scalr EC2 cluster, and want an easy way to synchronize files across all instances.
For example, I have a bunch of files in /var/www on one instance, I want to be able to identify all of the other hosts, and then rsync to each of those hosts to update their files.
ls /etc/aws/hosts/app/
returns the IP addresses of all of the other instances
10.1.2.3
10.1.33.2
10.166.23.1
Ideas?

As Zach said you could use S3.
You could download one of many clients out there for mapping drives to S3. (search for S3 and webdav).
If I was going to go this route I would setup an S3 bucket with all my shared files and use jetS3 in a cronJob to sync each node's local drive to the bucket (pulling down S3 bucket updates). Then since I normally use eclipse & ant for building, I would create a ANT job for deploying updates to the S3 bucket (pushing updates up to the S3 bucket).
From http://jets3t.s3.amazonaws.com/applications/synchronize.html
Usage: Synchronize [options] UP <S3Path> <File/Directory>
(...)
or: Synchronize [options] DOWN
UP : Synchronize the contents of the Local Directory with S3.
DOWN : Synchronize the contents of S3 with the Local Directory
...
I would recommend the above solution, if you don't need cross-node file locking. It's easy and every system can just pull data from a central location.
If you need more cross-node locking:
An ideal solution would be to use IBM's GPFS, but IBM doesn't just give it away (at least not yet). Even though it's designed for High Performance interconnects it also has the ability to be used over slower connections. We used it as a replacement for NFS and it was amazingly fast ( about 3 times faster than NFS ). There maybe something similar that is open source, but I don't know. EDIT: OpenAFS may work well for building a clustered filesystem over many EC2 instances.
Have you evaluated using NFS? Maybe you could dedicate one instance as an NFS host.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio