How to save intermidiate results on Amazon EC2 when spot instance used? - amazon-ec2

I do some scientific calculations and I have some intermidiate results on each iteration, so I think I can use spot instance reduce cost of processing.
How can I save intermidiate results on each iteration?
How can I automatically rerun instance from last checkpoint when it's terminated?

When the spot price of an Amazon EC2 instance rises above your bid price, your Amazon EC2 instance is terminated. A 2-minute notice is provided via the metadata interface. You can use this notice as a trigger for saving your work, or you could simply save work at regular intervals regardless of the notice period.
Do not save your work "locally", since the Amazon EBS volumes will either be deleted (eg boot volume) or disconnected (eg data volumes). I would recommend that you save your work in a persistent datastore, such as a database or Amazon S3.
One option would be to save files to your local disk, but use the AWS Command-Line Interface (CLI) to copy the files to Amazon S3 using the aws s3 sync command.
Then, if you have configured a persistent spot instance, simply copy the files from Amazon S3 when the new Amazon EC2 spot instance is started.
See:
Spot Instance Interruptions

Related

AWS temporary files before uploading to S3?

My Laravel app allows users to upload images. Currently, when the user uploads their images, they are stored in a temporary location on the server. A cron job then modifies the uploaded images (compresses them, etc.), and uploads them to S3. Any temporary files older than 48 hours that failed to upload to S3 are deleted by another cron job.
I've set up an Elastic Beanstalk environment, but it's occurred to me that storing uploaded images in a temporary directory on an instance is risky because instances can be created and destroyed when necessary.
How and where, then, would I store these temporary files so that they're not at risk of being deleted by an instance?
As discussed in the comments, I think that uploading the file to S3 is the best option. As far as I know, it's not possible to stop Elastic Beanstalk from destroying an ec2 instance, unless you want to get rid of all of the scaling and instance failure/autoreplacement features.
One option I don't know much about may be AWS EBS. "Amazon Elastic Block Store (Amazon EBS) provides persistent block storage volumes for use with Amazon EC2 instances in the AWS Cloud." I don't have any direct experience with EBS, the overriding question of course would be if EBS is truly persistent, even after an ec2 instance is destroyed. As EBS has costs associated with it, it seems like since you are already using S3, S3 would be the way to go.
S3 has a feature called object lifecycle management you can use to have files deleted automatically by setting them them to expire 2 days after they're uploaded.
You can either:
A) Prefix the temporary files to put them in an S3 psuedo-folder (i.e., Temp/), apply the object lifecycle expire rule to that specific prefix (or "folder"), and use the files in there as a source of truth for the new files derived from it post-manipulation.
or
B) Create an S3 bucket specifically for temporary files. Manipulate the files from there and copy to the production bucket.

Amazon EC2 - Automatic Restore Snapshot

I'm looking to setup a demo environment in Amazon that consists of a pre-configured EC2 image that resets itself back to a snapshot configuration every hour, this is would be a Linux VM.
What would be the best way to go about doing this in EC2? Does Amazon offer any tools for scheduling and reverting to the snapshot or would this need to be done from a third party VM or software?
There is no VMWare-like 'snapshot' functionality in Amazon EC2 (where you can roll-back to a point-in-time).
The network-attached disk storage system used with Amazon EC2 is called Amazon Elastic Block Store (EBS). While EBS does have a 'snapshot' function, this actually takes a backup of an EBS Volume and stores it in Amazon S3. The snapshot can then be used to create a new EBS volume, which will contain the same contents as the original disk at the time the snapshot was created.
One option would be to launch a new Amazon EC2 instance, which will automatically create a new boot disk from the indicated Amazon Machine Image (AMI). This is the way to launch new machines with the same disk content. However, this might not lend itself well to your "revert every half hour" since it requires a new machine to be started, which will also trigger a new hourly billing cycle.
You might be able to script the deletion of files or the reload of some database tables, but this will depend upon your particular system and applications.

During hardware failure, do EBS-based EC2 instances terminate or stop?

Amazon's new EBS-based EC2 instances have two options to shutdown: terminate or stop. Stopped instances can be later started again, automatically continuing from the same EBS root disk state they had when they were stopped.
But what happens when an Amazon datacenter has a hardware failure, and the EC2 instance is forced to shutdown. Does it terminate or stop? If the instance has been configured to stop by default on shutdown, can I rely on it being stopped also in this situation, and being able to start it again later?
An EC2 instance can be terminated at any time and one must account for this indeed, as already mentioned in David's answer (+1). You can arrange for a failed instance's Elastic Block Store (EBS) to remain available regardless though, see e.g. the respective FAQ What happens to my data when a system terminates?:
The data stored on a local instance store will persist only as long as
that instance is alive. However, data that is stored on an Amazon EBS
volume will persist independently of the life of the instance. If you
are using an Amazon EBS volume as a root partition, then you have set
the Delete On Terminate flag to "N" for your Amazon EBS volume to
persist outside the life of the instance. [emphasis mine]
This is explained in more detail in section 2. Delete on Termination within Eric Hammond's recommended article Three Ways to Protect EC2 Instances from Accidental Termination and Loss of Data:
Though EBS volumes created and attached to an instance at
instantiation are preserved through a “stop”/”start” cycle, by default
they are destroyed and lost when an EC2 instance is terminated. This
behavior can be changed with a delete-on-termination boolean value
buried in the documentation for the --block-device-mapping option of
ec2-run-instances.
He is referring to the ec2-run-instances documentation, and all this is meanwhile illustrated in more detail within Amazon EC2 Root Device Storage Concepts as well:
By default, the root device volume and the other volumes created when
an Amazon EBS-backed instance is launched are automatically deleted
when the instance terminates [...]. You can change
the default behavior by setting the DeleteOnTermination flag to the
value you want when you launch the instance. For an example of how to
change the flag at launch time, see Using Amazon EC2 Root Device Storage.
I assume you mean the CPU related hardware fails, rather than the network disk. The way I treat EC2 is to create a system that can go up and down without data loss. Anything important you should use an S3 bucket, not EBS.

how does multiple EC2 instances (scaling) works on one EBS for data storage?

So, in a simple situation, if there is only one instance, then I can store the data into a EBS volume mounted on that instance. e.g. /mnt/db
However, how does it work if I scale and have multiple instance (either static or dynamic scaling)?
Because one EBS can only attach to one instance, if I have multiple instance, does it mean that I have to attach an EBS volume for each instance? If that's the case, the data on each Instance's EBS volume will be different.
It is obvious that I want all instances to access (R & W) a single volume (as data-storage). and the data in the volume will constantly grow and there is no downtime.
What is the solution? Is there a way that I don't mount the device (EBS), and just call it for accessing the data?
Here is what I can think of:
1) if each instance has its own EBS volume, then each time interval (e.g. 1 hour), all instances will unmount & detach the EBS volume,and attach a new one. Then there is one powerful instance that mount all the EBS volumes just detached, and aggregate all the data.
2) or similar to 1), instead of detach and attach, I just take a snapshot on all volumes for all instances. Then the powerful instance aggregateness the data from the snapshot. And save the result into either another EBS or S3.
These two approach seem to be working.. but require a lot of work. is there a smarter way to approach this problem? thanks.
by the way, because of performance issue, I cannot have the instance writes data to S3. :)
OH how about this
3) First, all instances have their own EBS and write data into the EBS. and then each hour, data will be sent to S3. Then another instance will aggregate them.
how about having ang NFS instance which can be mounted to the other instances?
It seems that you need to create an EBS snapshot of your most up to date EC2 instance. This will create an EBS backed AMI. You would then need to terminate all your EC2 instances that are not up to date and launch a new stack of instances from your newly created AMI. If you had a load balancer running then you would have to attach these new instances to your load balancer also.
It seems a little long-winded but it can all be done programmatically. At least this is how I think scaling in the cloud with Amazon works and far as propagating changes across multiple instances goes. Somebody else with more experience verify this. I plan to test it out myself later on.

Terminating Amazon EC2 - what happens to persistant data

Its a pretty quick question - I have setup a pretty simple LAMP based website on EC2. I created an EBS and mounted it to the instance where I'm saving all the mysql data and other backups.
Now in order to connect to the instance - I use WINSCP and use the Elastic IP from where I can view all the data.
Now my question is - say I terminate the instance - the backup data and mysql data which resides on the EBS will still be available right. So how can I access this data.
I mean using WINSCP and the same Elastic IP, I wont be able to connect anymore as the instance is terminated - so how can access the data stored on EBS.
Sorry for the ignorant question but just starting to play with EC2
Thanks
I'm assuming you've created an EBS-backed instance and added to that (attached) a further EBS volume as a chunk of extra storage. In which case, when you terminate the instance, the boot EBS volume is released and deleted, but attached EBS storage is only released - it remains in the 'Available' state after the instance has been destroyed and its' data contents are left intact. You can then access whatever is on it by simply attaching it to another running instance.

Resources