Scheduling A Job on AWS EC2 - amazon-ec2

I have a website running on AWS EC2. I need to create a nightly job that generates a sitemap file and uploads the files to the various search engines. I'm looking for a utility on AWS that allows this functionality. I've considered the following:
1) Generate a request to the web server that triggers it to do this task
I don't like this approach because it ties up a server thread and uses cpu cycles on the host
2) Create a cron job on the machine the web server is running on to execute this task
Again, I don't like this approach because it takes cpu cycles away from the web server
3) Create another EC2 instance and set up a cron job to run the task
This solves the web server resource issues, but why pay for an additional EC2 instance to run a job for <5 minutes? Waste of money!
Are there any other options? Is this a job for ElasticMapReduce?

If I were in your shoes, I'd probably start by trying to run the cron job on the web server each night at low tide and monitor the resource usage to make sure it doesn't interfere with the web server.
If you find that it doesn't play nicely, or you have high standards for the elegance of your architecture (I can admire that), then you'll probably need to run a separate instance.
I agree that it seems like a waste to run an instance 24 hours a day for a job you only need to run once a night.
Here's one approach: The cron job on your primary machine (currently a web server) could fire up a new instance to run the task. It could pass in a user-data script that gets run when the instance starts, and the instance could shut itself down when it completes the task (with instance-initiated-shutdown-behavior set to "terminate").
Unfortunately, this misses your desire to enforce separation of concerns, it gets complicated when you start scaling to multiple web servers, and it requires your web server to be alive in order for the job to run.
A couple months ago, I came up with a different approach to run an instance on a cron schedule, relying entirely on existing AWS features and with no requirement to have other servers running.
The basic idea is to use Amazon's Auto Scaling with a recurring action that scales the group from "0" to "1" at a specific time each night. The instance can terminate itself when the job is done, and the Auto Scaling can clean up much later to make sure it's terminated.
I've provided more details and a working example in this article:
Running EC2 Instances on a Recurring Schedule with Auto Scaling
http://alestic.com/2011/11/ec2-schedule-instance
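The scheduled scale-up described above can be sketched roughly as follows. This is not the code from the article, only a minimal illustration: the group name, action names, and times are hypothetical, and the dictionaries are meant to mirror the keyword arguments of boto3's put_scheduled_update_group_action call (an assumption about how you would submit them). The Recurrence value is a five-field cron expression.

```python
def nightly_schedule(group_name, start_cron, stop_cron):
    """Build two scheduled actions: scale 0 -> 1 at start, back to 0 later."""
    def action(name, cron, size):
        return {
            "AutoScalingGroupName": group_name,
            "ScheduledActionName": name,
            "Recurrence": cron,          # cron syntax: min hour dom month dow
            "MinSize": size,
            "MaxSize": size,
            "DesiredCapacity": size,
        }
    return [
        action("nightly-job-start", start_cron, 1),
        # Safety net: force the group back to 0 well after the job should be done.
        action("nightly-job-stop", stop_cron, 0),
    ]

# Hypothetical group name and times (03:00 start, 03:55 cleanup).
actions = nightly_schedule("sitemap-job", "0 3 * * *", "55 3 * * *")
for a in actions:
    print(a["ScheduledActionName"], a["Recurrence"], a["DesiredCapacity"])
```

With boto3 installed and credentials configured, each dictionary could presumably be submitted with boto3.client("autoscaling").put_scheduled_update_group_action(**a); that step is left out here so the sketch runs without an AWS account.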

Amazon has just released[1] new features for Elastic Beanstalk. You can now create a worker environment containing a cron.yaml file that schedules tasks, each of which calls a URL on a schedule written in cron syntax: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html#worker-periodictasks
[1] http://aws.amazon.com/about-aws/whats-new/2015/02/17/aws-elastic-beanstalk-supports-environment-cloning-periodic-tasks-and-1-click-iam-role-creation/

Assuming you are running on a *nix version of EC2, I would suggest that you run it in cron using the nice command.
nice changes the priority of the job. You can make it a much lower priority, so if your webserver is busy, the cron job will have to wait for the CPU.
The higher the nice number, the lower the priority.
Nicenesses range from -20 (most favorable scheduling) to 19 (least favorable).
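As a small illustration of the above, here is a sketch that builds a crontab entry running a job at the lowest priority. The script path and the 03:00 run time are hypothetical; only the nice -n syntax and the -20 to 19 range come from the answer.

```python
def niced_cron_line(minute, hour, command, niceness=19):
    """Return a crontab line that runs `command` daily under `nice`."""
    if not -20 <= niceness <= 19:
        raise ValueError("niceness must be between -20 and 19")
    return f"{minute} {hour} * * * nice -n {niceness} {command}"

# Hypothetical job script; 19 is the least favorable scheduling priority.
line = niced_cron_line(0, 3, "/usr/local/bin/generate_sitemap.sh")
print(line)  # 0 3 * * * nice -n 19 /usr/local/bin/generate_sitemap.sh
```

Note that nice only affects CPU scheduling; a job that is heavy on disk or memory can still compete with the web server.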

AWS DataPipeline
You can use AWS Data Pipeline to schedule a task with a given period. The action can be any command when you configure your Pipeline with the ShellCommandActivity.
You can even use your existing EC2 instance to run the command: set up Task Runner on your EC2 instance and set the workerGroup field on the ShellCommandActivity in your Data Pipeline definition:
{
  "pipelineId": "df-0937003356ZJEXAMPLE",
  "pipelineObjects": [
    {
      "id": "Schedule",
      "name": "Schedule",
      "fields": [
        { "key": "startDateTime", "stringValue": "2012-12-12T00:00:00" },
        { "key": "type", "stringValue": "Schedule" },
        { "key": "period", "stringValue": "1 hour" },
        { "key": "endDateTime", "stringValue": "2012-12-21T18:00:00" }
      ]
    },
    {
      "id": "DoSomething",
      "name": "DoSomething",
      "fields": [
        { "key": "type", "stringValue": "ShellCommandActivity" },
        { "key": "command", "stringValue": "echo hello" },
        { "key": "schedule", "refValue": "Schedule" },
        { "key": "workerGroup", "stringValue": "yourWorkerGroup" }
      ]
    }
  ]
}
Limits: Minimum scheduling interval is 15 minutes.
Pricing: About $1.00 per month.
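The key/stringValue/refValue layout of the definition above is verbose to write by hand. A minimal sketch of generating it from plain dictionaries follows; the helper name pipeline_object is hypothetical, and the field values are simply copied from the example above.

```python
import json

def pipeline_object(obj_id, string_fields, ref_fields=None):
    """Build one pipeline object in the key/stringValue/refValue layout."""
    fields = [{"key": k, "stringValue": v} for k, v in string_fields.items()]
    # References to other pipeline objects use refValue instead of stringValue.
    fields += [{"key": k, "refValue": v} for k, v in (ref_fields or {}).items()]
    return {"id": obj_id, "name": obj_id, "fields": fields}

schedule = pipeline_object("Schedule", {
    "type": "Schedule",
    "startDateTime": "2012-12-12T00:00:00",
    "period": "1 hour",
})
activity = pipeline_object(
    "DoSomething",
    {"type": "ShellCommandActivity", "command": "echo hello",
     "workerGroup": "yourWorkerGroup"},
    {"schedule": "Schedule"},
)
print(json.dumps({"pipelineObjects": [schedule, activity]}, indent=2))
```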

You should consider CloudWatch Events and Lambda (http://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html). You only pay for the actual runs. I assume the workers maintained by Elastic Beanstalk still cost some money even when they are idle.
Update: found this nice article (http://brianstempin.com/2016/02/29/replacing-the-cron-in-aws/)
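For the sitemap job in the question, a scheduled Lambda function might look roughly like the sketch below. The page list, the handler name, and the return value are all assumptions for illustration, and the upload step is deliberately omitted. A CloudWatch Events rule with a schedule expression such as cron(0 3 * * ? *) would invoke the handler once a night.

```python
from xml.sax.saxutils import escape

# Hypothetical page list; a real job would read this from a database or crawl.
PAGES = ["https://example.com/", "https://example.com/search?q=a&page=2"]

def build_sitemap(urls):
    """Render a minimal sitemap.xml document for the given URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n"
    )

def lambda_handler(event, context):
    """Entry point invoked by the scheduled rule; the event payload is unused."""
    xml = build_sitemap(PAGES)
    # A real function would store or submit `xml` here (omitted in this sketch).
    return {"url_count": len(PAGES), "bytes": len(xml.encode("utf-8"))}

# Local smoke test with a stand-in event; no AWS account is needed.
print(lambda_handler({"source": "aws.events"}, None))
```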

If this task can be accomplished with one machine, I recommend booting up an instance programmatically using fog, a Ruby gem.
After you start an instance, you can run a command via ssh. Once it completes, you can shut the instance down with fog as well.
Amazon EMR is also a good solution if your task can be written in a map-reduce manner. EMR will take care of starting and stopping instances. The elastic-mapreduce-ruby CLI tool can help you automate it.

You can use AWS OpsWorks to set up cron jobs for your application. For more information, read the AWS OpsWorks user guide. I found a page explaining how to set up cron jobs: http://docs.aws.amazon.com/opsworks/latest/userguide/workingcookbook-extend-cron.html

Related

AWS EC2 Scheduling Tasks with Windows Scheduler

If I have an Amazon Redshift instance and an Amazon EC2 instance (running Windows, amongst other things), can I set up Windows scheduled jobs in the EC2 instance that connect to Redshift and run COPY commands?
Really what I am asking is 'is EC2 just a VM on the cloud' and can I do anything I like in it (like set up windows scheduled jobs and be guaranteed they will run on a scheduled time)
It seems that AWS Data Pipeline is the recommended way to have scheduled jobs load data into Redshift, but this starts to get pricey with frequent jobs.
I spun up a Redshift instance and an EC2 Windows 2012 instance
I installed the ODBC redshift driver
I ran a VBScript that incremented a counter in a table
I scheduled that script in task scheduler
I logged out of EC2 and came back and the data was updated.
So it seems that using Windows Task Scheduler on EC2 is a valid alternative to AWS Data Pipeline if you want to do it that way.
I haven't yet tried the copy command but I will come back and document that also if I have time

Running multiple environments on one AWS EC2 instance (Elastic Beanstalk)

I am very new to the Amazon AWS services. I was wondering if there is a way to run an instance of EC2 (say, Amazon Linux AMI) and then connect two environments to this instance.
Particularly, I'd like to run a PHP and a Tomcat environment on a single EC2 instance.
The problem is, every time I create a new environment in Elastic Beanstalk, it seems to create a new EC2 instance as well. Am I missing something here?
I'd appreciate any hint on this.
AWS Elastic Beanstalk is designed to deploy your apps in a way that scales from the ground up. Because of this, Elastic Beanstalk will launch one or more EC2 instances, connect them to an Elastic Load Balancer, and configure CloudWatch monitoring and Auto Scaling triggers.
Also, because of this fundamental design for scalability, Elastic Beanstalk is built around a one-app-per-environment model (where by "environment" I mean one of these EC2 + ELB + CloudWatch + Auto Scaling clusters).
Since running two separate web servers with two separate apps (PHP & Java) is not a fundamentally scalable design, it's not a use-case that Elastic Beanstalk is optimized for.
You are free to spin up a standalone EC2 instance and install whatever you'd like on it, but you're right: git aws.push support has not been made available for standalone EC2 instances. If the git support is important to you, you'll need to weigh the pros and cons of each approach.
I would also like to be able to do this, basically from a cost perspective for demos etc.
For example, a single instance with one PHP app and one Java app. Or, a single instance with two Java apps.
However, from what I have read so far in the Elastic Beanstalk developer guide, I have not found anything explicitly stating that multiple applications per environment are supported (or even multiple environments per EC2 instance, if that even makes sense).
It makes me wonder if this is a feature that is often requested and planned for the future, or alternatively if the single-app-per-environment model is 'by design' for some reason.

Running a script on an AWS server

I have a script that I need to run once a day that requires a lot of memory. I would like to run it on a dedicated amazon box.
Is there some automated way to build a box, download all required software (like Ruby), and then run my script? After the script has run, I would like to shut down the box.
The two options I can think of are:
I am thinking about hacking EMR to do this. (My script is a mapper against an empty directory)
Chef - This seemed like too much for one simple script.
You can set up a new EC2 instance at startup using the official Ubuntu AMIs, the official Amazon Linux AMIs, or any other AMI that supports the concept of a user-data script.
Create a script (bash, Perl, Python, whatever) that starts with #!
Pass this script as the user-data when running the EC2 instance.
The script will automatically be run as root on the first boot.
Here's the article where I introduced the concept of a user-data script:
Automate EC2 Instance Setup with user-data Scripts
http://alestic.com/2009/06/ec2-user-data-scripts
Your user-data script can install the required software, configure it, install your work script, and set up a cron job that runs the work script once a day.
ENHANCEMENT:
If the installation script doesn't take a long time to run (e.g., under an hour or a few hours) then you don't even have to run a single dedicated instance 24 hours a day. You can instead use an approach that lets AWS start an instance for you on a regular schedule.
Here's an article I wrote that provides details on this approach with sample commands:
Running EC2 Instances on a Recurring Schedule with Auto Scaling
http://alestic.com/2011/11/ec2-schedule-instance
The general approach is to use Auto Scaling to start an instance with your user-data script on a regular schedule. Your job will terminate the instance when it has completed. The key is to suspend Auto Scaling's normal desire to restart instances that terminate so that you don't pay for a running instance until the next time your job starts.
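A user-data script of the kind described in this answer could be generated along these lines. This is only a sketch under stated assumptions: the package name and job command are hypothetical, yum assumes an Amazon Linux style AMI (Ubuntu would use apt-get), and powering off only terminates the instance if it was launched with instance-initiated-shutdown-behavior set to "terminate".

```python
def build_user_data(packages, job_command):
    """Return a bash user-data script that installs packages, runs the job,
    and powers the instance off afterwards."""
    lines = [
        "#!/bin/bash",
        # Power off on exit even if a step fails, so the instance stops billing.
        "trap 'shutdown -h now' EXIT",
        f"yum install -y {' '.join(packages)}",
        job_command,
    ]
    return "\n".join(lines) + "\n"

# Hypothetical package and job command.
script = build_user_data(["ruby"], "ruby /opt/job/run.rb")
print(script)
```

The resulting string is what you would pass as the user-data when launching the instance or defining the launch configuration.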

Celery, Resque, or custom solution for processing jobs on machines in my cloud?

My company has thousands of server instances running application code - some instances run databases, others are serving web apps, still others run APIs or Hadoop jobs. All servers run Linux.
In this cloud, developers typically want to do one of two things to an instance:
Upgrade the version of the application running on that instance. Typically this involves a) tagging the code in the relevant subversion repository, b) building an RPM from that tag, and c) installing that RPM on the relevant application server. Note that this operation would touch four instances: the SVN server, the build host (where the build occurs), the YUM host (where the RPM is stored), and the instance running the application.
Today, a rollout of a new application version might be to 500 instances.
Run an arbitrary script on the instance. The script can be written in any language provided the interpreter exists on that instance. E.g. The UI developer wants to run his "check_memory.php" script which does x, y, z on the 10 UI instances and then restarts the webserver if some conditions are met.
What tools should I look at to help build this system? I've seen Celery and Resque and delayed_job, but they seem like they're built for moving through a lot of tasks. This system is under much less load - maybe on a big day a thousand hundred upgrade jobs might run, and a couple hundred executions of arbitrary scripts. Also, they don't support tasks written in any language.
How should the central "job processor" communicate with the instances? SSH, message queues (which one), something else?
Thank you for your help.
NOTE: this cloud is proprietary, so EC2 tools are not an option.
I can think of two approaches:
Set up password-less SSH on the servers, have a file that contains the list of all machines in the cluster, and run your scripts directly using SSH. For example: ssh user@foo.com "ls -la". This is the same approach used by Hadoop's cluster startup and shutdown scripts. If you want to assign tasks dynamically, you can pick nodes at random.
Use something like Torque or Sun Grid Engine to manage your cluster.
The package installation can be wrapped inside a script, so you just need to solve the second problem, and use that solution to solve the first one :)
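The first approach can be sketched as a small loop over a host list. The user name, host names, and remote command below are hypothetical, and the sketch assumes password-less (key-based) SSH is already in place; the demo at the bottom only prints the commands so it runs without any servers.

```python
import subprocess

def ssh_command(user, host, remote_command):
    """Build the argument list for running one command on one host."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_command]

def run_everywhere(user, hosts, remote_command, runner=subprocess.run):
    """Run the command on each host in turn; return exit codes keyed by host."""
    results = {}
    for host in hosts:
        proc = runner(ssh_command(user, host, remote_command),
                      capture_output=True, text=True)
        results[host] = proc.returncode
    return results

# Hypothetical host list; in practice this would be read from a file.
hosts = ["ui01.example.com", "ui02.example.com"]
for h in hosts:
    print(" ".join(ssh_command("deploy", h, "php check_memory.php")))
```

Calling run_everywhere("deploy", hosts, "php check_memory.php") would execute the commands for real; the sequential loop is the simplest choice and could be swapped for a thread pool when rolling out to hundreds of instances.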

Why is my Amazon EC2 instance "pending"?

I've been evaluating several cloud compute providers, Amazon EC2 among them. I started an instance with a Windows image, and ever since then it's been "pending", for more than 30 minutes now.
Is this a typical amount of wait for an instance to start? This would be highly undesirable for my purpose. Perhaps I started it incorrectly? I couldn't find any info on what "pending" means on Amazon - does anyone here know?
Pending means the instance is being created. If it has been like that for 30 minutes, something went wrong; typically I wait about 3 minutes.
I would just create another instance and, once the pending one finishes starting, terminate it.
You would probably waste 12.5 cents, though...
I recently faced the same issue. After reaching out to AWS support, they provided a workaround which worked well.
You can use the AWS CLI to stop the instance instead of the AWS console. Although an instance stuck in the Pending state cannot be managed through the AWS console, the AWS CLI allows you to stop it and start it again. The following command should stop the instance and force it to move to the "Stopped" state:
aws ec2 stop-instances --instance-ids <your-instance-id>
You can find more information about how to install and use AWS CLI here:
https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html
