What is the recommended EC2 instance type for running an AWS Storage Gateway (File Gateway)?
Compute-, memory-, or storage-optimized?
Also, what criteria should be considered when sizing the EBS volume for the cache? Say I have about 100k files of around 5 MB each.
This is for creating an SMB/NFS file share backed by S3.
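The sizing part of the question comes down to simple arithmetic, which the sketch below works through. The `hot_fraction` (share of the data that is accessed regularly) and the `headroom` factor are hypothetical inputs chosen for illustration, and the 150 GB floor is an assumed gateway minimum that should be checked against the current Storage Gateway documentation.

```python
def dataset_size_gb(file_count: int, avg_file_mb: float) -> float:
    """Total data size in decimal GB (1 GB = 1000 MB)."""
    return file_count * avg_file_mb / 1000


def cache_size_gb(total_gb: float, hot_fraction: float,
                  headroom: float = 1.2, minimum_gb: float = 150.0) -> float:
    """Rough cache estimate: the regularly accessed share of the data plus headroom.

    hot_fraction and headroom are illustrative assumptions, not AWS guidance.
    minimum_gb is an assumed lower bound for the cache disk; verify it in the docs.
    """
    return max(minimum_gb, total_gb * hot_fraction * headroom)


total = dataset_size_gb(100_000, 5)           # about 500 GB for the numbers in the question
whole_set = cache_size_gb(total, 1.0)         # cache large enough for every file
working_set = cache_size_gb(total, 0.2)       # cache for a 20% working set (hypothetical)
print(f"data: {total:.0f} GB, full cache: {whole_set:.0f} GB, working set cache: {working_set:.0f} GB")
```

The point of the calculation is that the cache only needs to hold the data that is read or written frequently, not necessarily the whole bucket.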
We are trying to use the d2.xlarge instance type for worker nodes in a Hadoop cluster.
When I look at the AWS site
https://aws.amazon.com/ec2/instance-types/
it says "HDD-based local storage".
Is that EBS storage, or is it attached to the EC2 instance?
You can attach Elastic Block Store (EBS) volumes to any Amazon EC2 instance. EBS is persistent disk storage.
Some EC2 instances also have Instance Store, which is locally-attached disk storage. If the instance is stopped or terminated, the contents of the Instance Store are lost.
Instance Store is popular for use with Amazon EMR because it provides very large amounts of storage for HDFS. However, please be aware that the data is lost if the cluster is terminated.
The d2.xlarge instance type has 3 x 2000 GB instance store drives, which are magnetic disks. This is in addition to any EBS volumes you attach.
"Local" means attached storage. "HDD" means spinning disks (not SSDs).
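To see what those drives mean for a Hadoop cluster, here is a back-of-the-envelope sketch. The 3 x 2000 GB figure comes from the answer above; the replication factor of 3 is the usual HDFS default, and the 25% reserve for temporary and non-HDFS data is purely a hypothetical planning number.

```python
def usable_hdfs_gb(nodes: int, drives_per_node: int = 3, drive_gb: int = 2000,
                   replication: int = 3, reserved_fraction: float = 0.25) -> float:
    """Estimate how much unique data HDFS can hold on instance store drives.

    replication=3 is the common HDFS default; reserved_fraction is an
    illustrative assumption for shuffle space, logs, and other non-HDFS use.
    """
    raw_gb = nodes * drives_per_node * drive_gb
    return raw_gb * (1 - reserved_fraction) / replication


# Four d2.xlarge workers: 24000 GB raw, far less once replication is accounted for.
print(f"{usable_hdfs_gb(4):.0f} GB of unique data")
```

Because all of this capacity is instance store, it disappears with the instances, so anything that must outlive the cluster still has to be copied elsewhere.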
I do some scientific calculations that produce intermediate results on each iteration, so I think I can use spot instances to reduce the cost of processing.
How can I save the intermediate results on each iteration?
How can I automatically restart the work from the last checkpoint when the instance is terminated?
When the spot price of an Amazon EC2 instance rises above your bid price, your Amazon EC2 instance is terminated. A 2-minute notice is provided via the metadata interface. You can use this notice as a trigger for saving your work, or you could simply save work at regular intervals regardless of the notice period.
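As a rough sketch of reading that notice from Python: the instance metadata service exposes it at `/latest/meta-data/spot/instance-action`, which returns HTTP 404 until an interruption is scheduled. The function names below are made up for this example, the token request assumes IMDSv2 is enabled, and the JSON shape in `parse_instance_action` follows the documented `action`/`time` format.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional, Tuple

METADATA = "http://169.254.169.254/latest"


def parse_instance_action(body: str) -> Tuple[str, datetime]:
    """Parse the notice body, e.g. {"action": "terminate", "time": "2017-09-18T08:22:00Z"}."""
    doc = json.loads(body)
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return doc["action"], when


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the raw notice, or None when no interruption is scheduled (HTTP 404).

    Only works on an EC2 instance; elsewhere the request fails with a URLError.
    """
    token_request = urllib.request.Request(
        f"{METADATA}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    try:
        with urllib.request.urlopen(token_request, timeout=timeout) as response:
            token = response.read().decode()
        notice_request = urllib.request.Request(
            f"{METADATA}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token})
        with urllib.request.urlopen(notice_request, timeout=timeout) as response:
            return response.read().decode()
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise
```

A worker would call `fetch_instance_action()` every few seconds and, when it returns something other than `None`, save its state before the time given in the notice.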
Do not save your work only "locally", since the Amazon EBS volumes will either be deleted (e.g. the boot volume) or disconnected (e.g. data volumes). I would recommend that you save your work in a persistent datastore, such as a database or Amazon S3.
One option would be to save files to your local disk, but use the AWS Command-Line Interface (CLI) to copy the files to Amazon S3 using the aws s3 sync command.
Then, if you have configured a persistent spot instance, simply copy the files from Amazon S3 when the new Amazon EC2 spot instance is started.
See:
Spot Instance Interruptions
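A minimal sketch of the "save at regular intervals" approach described above, assuming the state fits in a small JSON document. The function names, the `iteration`/`value` state layout, and the `step` callback are all hypothetical; the checkpoint is written to a temporary file and renamed so that an interruption in the middle of a write never leaves a half-written checkpoint.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the last saved state, or default when no checkpoint exists yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


def run(path, total_iterations, step):
    """Resume from the last checkpoint and save after every iteration."""
    state = load_checkpoint(path, {"iteration": 0, "value": 0.0})
    while state["iteration"] < total_iterations:
        state["value"] = step(state["value"])
        state["iteration"] += 1
        save_checkpoint(path, state)
    return state
```

The directory holding the checkpoint can then be copied to Amazon S3 with `aws s3 sync` as described above, and copied back before calling `run` on the replacement instance.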
I have a large set of data to be analyzed and I am planning to use Amazon EC2 for the computation. So I am wondering where I can store the data for computing.
There is a lot of lingo in the Amazon world.
You can either store the data on an EBS drive connected to your EC2 instance, or if it is in MySQL format or a simple format, you could consider storing it on Amazon's managed MySQL service called RDS.
EC2 instances can be backed either by S3 storage or by EBS volumes. If you want rapid access to your data, you will need to choose an EC2 instance backed by Amazon Elastic Block Store (EBS). EBS gives you the flexibility to use any database or data structure you want.
I am about to migrate a large web project (many sites using common data) to EC2 and I wondered what the best setup would be (I am very much a newbie with AWS).
The site pages are rebuilt by scripts once a week and the resulting static pages are served (currently about 7 to 10k views a day). In between the weekly builds I would like to access the database to add/edit data.
I am thinking either EC2 + RDS or EC2 and S3 (S3 having the advantage of keeping a copy of the static pages too). Do these options sound reasonable, based on what I have mentioned?
Thanks in advance
We're using EC2 (we experimented with a few instance types and learned that the CPU extra large worked best for our type of application), and rather than using RDS we use EBS extensively:
one EBS volume for running code, and one EBS volume that holds the MySQL database files.
S3 is used mostly for incremental backups, as an EBS volume can easily be mounted on any other instance.
I have 2 EC2 instances, each with its own EBS volume attached. Sitting in front of the EC2 instances is a load balancer.
These instances run CMS-driven sites, where users can upload files.
What would be the best solution to the problem of a file getting uploaded to one EBS and the load balancer sending a visitor to the EC2 instance whose EBS does not have the file? Some sort of cron which runs an rsync?
Suggestions very welcome!
Thanks
S
I believe the best solution would be to use a single shared store such as Amazon S3. Ideally, use a plugin for your CMS that stores users' files on S3. If there is no such plugin, you can use the s3fs FUSE adapter to mount the bucket on both instances and configure your CMS to store uploaded files in the mounted directory.
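If no plugin exists and you control the upload handler, the first option can be sketched roughly as follows. This assumes the boto3 S3 client, whose `upload_file(Filename, Bucket, Key)` method does the transfer; the `uploads/<site>/<filename>` key layout, the function names, and the bucket name in the usage comment are hypothetical.

```python
import os


def upload_key(site: str, filename: str) -> str:
    """Build the S3 object key for an uploaded file (hypothetical layout)."""
    return f"uploads/{site}/{os.path.basename(filename)}"


def store_upload(client, bucket: str, site: str, local_path: str) -> str:
    """Copy an uploaded file to S3 so every instance behind the load balancer can read it.

    client is expected to behave like boto3.client("s3").
    """
    key = upload_key(site, local_path)
    client.upload_file(local_path, bucket, key)
    return key


# Usage on an instance with boto3 installed and credentials configured:
#   import boto3
#   store_upload(boto3.client("s3"), "my-cms-uploads", "site-a", "/tmp/photo.jpg")
```

Passing the client in as an argument keeps the handler easy to test without touching AWS.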
There are several solutions to this problem. Off the top of my head:
an NFS/Samba directory shared between the instances
an SVN deploy
cluster file systems such as OCFS or GFS
a deployment tool such as Capistrano, triggering a deploy when you need one
and, of course, cron jobs that run ftp, scp, rsync, s3sync/copy, etc.
Or possibly, set up one EC2 instance as an NFS server and share its directories with your other instances.
There are multiple solutions for keeping data on both EC2 instances in sync, with or without EBS volumes.
You can use Amazon EFS instead of EBS volumes. An EFS file system can be shared between EC2 instances within a VPC, and both instances will see the same data under the path where EFS is mounted.
Another solution is GlusterFS. This can also work between EBS volumes in different AWS regions. See: http://sanketdangi.com/post/5601762671/gluster-config-aws-multi-az
You can mount an S3 bucket on your EC2 instances using s3fs-fuse. See: https://github.com/s3fs-fuse/s3fs-fuse/wiki/Fuse-Over-Amazon
You may also be able to run "aws s3 sync" on both instances, so that both EBS volumes stay in sync via S3. See: https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
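The last option could be scripted roughly like this and run from cron on each instance. It is only a sketch: the directory, bucket, and prefix in the usage comment are placeholders, the function names are made up, and the AWS CLI is assumed to be installed with credentials available. Each `aws s3 sync` call copies in one direction only, and without the `--delete` flag a file removed on one instance is not removed anywhere else.

```python
import subprocess
from typing import List


def push_command(local_dir: str, bucket: str, prefix: str) -> List[str]:
    """Command that copies new or changed local files up to S3."""
    return ["aws", "s3", "sync", local_dir, f"s3://{bucket}/{prefix}"]


def pull_command(local_dir: str, bucket: str, prefix: str) -> List[str]:
    """Command that copies new or changed objects from S3 down to local disk."""
    return ["aws", "s3", "sync", f"s3://{bucket}/{prefix}", local_dir]


def sync_both_ways(local_dir: str, bucket: str, prefix: str, runner=subprocess.run) -> None:
    """Push local changes first, then pull whatever the other instance pushed."""
    for command in (push_command(local_dir, bucket, prefix),
                    pull_command(local_dir, bucket, prefix)):
        runner(command, check=True)


# Usage, e.g. from a cron job on each instance (placeholder names):
#   sync_both_ways("/var/www/uploads", "my-cms-uploads", "uploads")
```

Since the sync only happens when the job runs, a freshly uploaded file can still be missing on the other instance for up to one cron interval, which is the main drawback compared with a shared file system.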