I am designing an Auto Scaling system for my application which runs on Amazon EC2 instances. The application reads messages from SQS and processes them.
The Auto Scaling system will monitor two things:
The number of messages in the SQS queue, and
The total number of processes running across all EC2 instances.
For example, if the number of messages in SQS exceeds 3000, I want the system to scale out: create a new EC2 instance and deploy code on it. Whenever the number of messages drops below 2000, I want the system to terminate an EC2 instance.
I am doing this with Ruby and Capistrano.
My question is:
I am unable to find a way to determine the number of processes running across all EC2 machines and save that number in a variable. Could you help me?
You might want to use cron and the CloudWatch API to push the numbers to CloudWatch manually as part of an Auto Scaling group policy. By numbers I mean the number of processes on each instance: ps aux | grep your_process | wc -l
CloudWatch will let you set an alarm on that manual metric, aggregated as the SUM of the number of processes across either all running instances or the Auto Scaling group.
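For instance, once the custom metric is flowing, the alarm on its Sum might be created like this (a sketch using the modern AWS CLI; the names, threshold, and scaling-policy ARN are placeholders):

aws cloudwatch put-metric-alarm \
  --alarm-name worker-process-count-high \
  --namespace "Custom/Workers" \
  --metric-name ProcessCount \
  --statistic Sum \
  --period 60 \
  --evaluation-periods 1 \
  --threshold 50 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:autoscaling:REGION:ACCOUNT:scalingPolicy:EXAMPLE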
Something to let you get started:
Pushing memory metrics manually:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts-perl.html
One more:
http://aws.typepad.com/aws/2011/05/amazon-cloudwatch-user-defined-metrics.html
For memory it looks simple, as Amazon already provides scripts for this. For processes you might need to dig into these scripts or read the official API docs.
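As a sketch of the cron-driven approach (assuming the AWS CLI is installed and the instance has credentials; the namespace, metric name, and Auto Scaling group name are made up):

#!/bin/sh
# Count worker processes; the [y] trick keeps grep from matching itself.
COUNT=$(ps aux | grep '[y]our_process' | wc -l)
# Push the count as a custom metric, tagged with the Auto Scaling group
# so an alarm can aggregate it by SUM across the group.
aws cloudwatch put-metric-data \
  --namespace "Custom/Workers" \
  --metric-name ProcessCount \
  --dimensions AutoScalingGroupName=my-asg \
  --value "$COUNT" \
  --unit Count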
EDIT:
If you are now worried about a single point of failure in the watching system and you have a list of servers, it may be preferable to examine them in parallel from a remote server:
> ~/count.log   # truncate/create the log
# SSH in parallel; ~/ListofIP.txt holds one IP per line
while read -r IP; do
  # -n stops ssh from swallowing the loop's stdin
  ssh -n -i /path/to/keyfile "root@${IP}" "ps -ef | grep process_name.rb | grep -v grep | wc -l" >> ~/count.log &
done < ~/ListofIP.txt
# Wait until every host has reported its count
while [ "$(wc -l < ~/ListofIP.txt)" -ne "$(wc -l < ~/count.log)" ]; do
  sleep 1
done
# Sum up the numbers from ~/count.log
TOTAL=$(awk '{s+=$1} END {print s}' ~/count.log)
# Push to CloudWatch (one option; the namespace and metric name are examples)
aws cloudwatch put-metric-data --namespace "Custom/Workers" \
  --metric-name ProcessCount --value "$TOTAL" --unit Count
I am running a java TCP/IP server/client set up, and need to automate multiple instances of the client in a bash file like so:
javac *.java
java Server1 &
java Client &
java Client &
java Client &
java Client &
java Client &
etc.
How do I get them all to terminate when complete?
If you want proper, safe job control, you need to keep track of the process IDs of the backgrounded applications as you background them.
Instead of relying on the output of ps, you should consider using something like this as a start/stop script:
#!/usr/bin/env bash
numclients=5
case "$1" in
  start)
    # Start the server first...
    java Server1 &
    pid=$!
    echo "$pid" > /var/run/Server1.pid
    # Then start the clients.
    for clid in $(seq 1 $numclients); do
      java Client &
      pid=$!
      echo "$pid" > /var/run/client-${clid}.pid
    done
    ;;
  stop)
    # Kill the clients first
    for clid in $(seq 1 $numclients); do
      if [ -s /var/run/client-${clid}.pid ]; then
        kill -TERM $(< /var/run/client-${clid}.pid)
      fi
    done
    # Then, kill the server
    if [ -s /var/run/Server1.pid ]; then
      kill -TERM $(< /var/run/Server1.pid)
    fi
    ;;
esac
I just wrote this and haven't tested it. If there are typos or incompatibilities with your environment, feel free to fix them, and consider the script above an example of what you should do.
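Assuming you save it as, say, clients.sh and make it executable, usage would look like:

./clients.sh start
./clients.sh stop

Note that /var/run is usually writable only by root, so either run the script with sudo or point the pid files at a directory your user can write to.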
Note that in particular, the seq command is available in FreeBSD and many Linux distros, but not in older versions of OS X. There are easy alternatives if you need them: jot can function as a replacement on OS X or FreeBSD, and if you don't need the $numclients variable, you could use a "sequence expression" like {1..5} instead.
Also, there are a number of other factors you might want to consider when launching and killing your application. For example:
What should happen if everything is already running?
What should happen if only the server or only the clients are already running?
What should happen if the wrong number of clients are already running?
What happens if clients (or even the server) die? (Hint: look at tools like daemontools.)
What happens if pid files are stale? (See the check below.)
All of these conditions may be covered by tools that your operating system already uses. You might want to look into building your application startup and teardown scripts using your system scripts as examples.
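For the stale pid file case in particular, a common check is kill -0, which tests whether a process exists without actually signaling it (a sketch, independent of the script above):

# Succeeds only if the pid file names a live process.
if [ -s /var/run/Server1.pid ] && kill -0 "$(< /var/run/Server1.pid)" 2>/dev/null; then
  echo "Server1 is running"
else
  echo "Server1 is not running; the pid file may be stale"
fi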
Almost all Linux and OS X machines have pkill, which accepts a pattern and kills any processes whose names match it.
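For example, since all of these processes are named java, pkill's -f flag is useful for matching the full command line instead:

pkill -f 'java Client'   # -f matches against the full argument list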
You could use kill and pass in a list of pids from grep like so:
kill `ps -ef | grep 'java Client' | grep -v grep | awk '{print $2}'`
I need a shell script that will create a loop to start parallel tasks read in from a file...
Something along the lines of:
#!/bin/bash
mylist=/home/mylist.txt
for i in $(cat $mylist)
do
  # do something like:
  cp -rp $i /destination &
done
wait
So what I am trying to do is send a bunch of tasks to the background with "&" for each line in $mylist and wait for them to finish before exiting.
However, there may be a lot of lines in there, so I want to control how many parallel background processes get started; I want to be able to cap it at, say, 5 or 10.
Any ideas?
Thank you
Your task manager will make it seem like you can run many parallel jobs; how many you can actually run with maximum efficiency depends on your processor. Overall you don't have to worry much about starting too many processes, because the system's scheduler will time-share them for you. If you still want to limit them, because the number could get absurdly high, you could use something like this (provided you execute a cp command every time):
...
while ...; do
  jobs=$(pgrep 'cp' | wc -l)
  # Use { ...; } instead of a subshell here: `continue` inside ( ... )
  # would only exit the subshell, not restart this loop.
  [[ $jobs -gt 50 ]] && { sleep 100; continue; }
  ...
done
The number of running cp commands will be stored in the jobs variable, and before starting a new iteration the loop checks whether there are too many already. Note that we jump to a new iteration, so you'd have to keep track of how many commands you have already executed. Alternatively you could use wait.
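If your bash is 4.3 or newer, wait -n blocks until any single background job exits, which gives a cleaner cap (a sketch; the limit of 5 is arbitrary):

max_jobs=5
while read -r src; do
  cp -rp "$src" /destination &
  # At the limit, wait for any one copy to finish before launching more.
  while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
    wait -n
  done
done < /home/mylist.txt
wait   # let the last few copies finish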
Edit:
On a side note, you can pin a process to a specific CPU core using taskset; it may come in handy when you have a smaller number of heavier commands.
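For example (Linux-only; the commands and core numbers are just an illustration):

taskset -c 0 some_heavy_command &   # pin to core 0
taskset -c 1 another_command &      # pin to core 1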
You are probably looking for something like this using GNU Parallel:
parallel -j10 cp -rp {} /destination :::: /home/mylist.txt
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
Have searched on SO and GNU parallel tutorial and gone through examples here, but still don't quite see what I need solved. Any tips appreciated on how I could accomplish the following:
I need to invoke the same script on several remote servers with a different argument passed to each one (the argument is a string), then wait until all those jobs are done... Then, run that same script some more times on those same remote servers, but this time try to keep the remote servers as busy as possible (i.e. when they finish their job, send them another job). Ideally the strings could be read in from a file on the "master" machine that is sending the jobs to the remote servers.
To diagram this, I'm trying to run myscript like this:
server A: myscript fee
server B: myscript fi
When both jobs are done I then want to do something like:
server A: myscript fo
server B: myscript fum
... and supposing A finished its work before server B, immediately sending it the next job like :
server A: myscript englishmun
... etc
Again, hugely appreciate any ideas people might have about whether this is easy/hard with GNU parallel (or if something else like pdsh, cluster ssh, might be better suited).
TIA!
It seems we can split the problem into two parts: an initialization part that needs to be run on all servers, and a job-processing part that does not care which server it runs on.
The last part is GNU Parallel's specialty:
cat argfile | parallel -S serverA,serverB myscript
The first part is a bit more tricky: you want the first k arguments to go onto k servers.
head -n 2 argfile | parallel -j1 -S serverA,serverB myscript
The problem here is that if there are loads of servers, serverA may finish before you get to the last server. It is much easier to run the same job on all servers:
head -n 1 argfile | parallel --onall -S serverA,serverB myscript
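Putting the two phases together (a sketch assuming the first line of argfile is the shared initialization argument and the rest are ordinary jobs):

head -n 1 argfile | parallel --onall -S serverA,serverB myscript   # init on every server
tail -n +2 argfile | parallel -S serverA,serverB myscript          # then keep the servers busy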
I want to run a job on all the active nodes of a 64-node Sun Grid Engine cluster, scheduled using qsub. I am currently using an array job for this, but sometimes the program is scheduled multiple times on the same node.
qsub -t 1-64:1 -S /home/user/.local/bin/bash program.sh
Is it possible to schedule only one job per node, on all nodes in parallel?
You could use a parallel environment. Create a parallel environment with:
qconf -ap "parallel_environment_name"
and set "allocation_rule" to 1, which means that all processes will have to reside on different hosts. Then when submitting your array job, specify your the number of nodes you want to use with your parallel environment. In your case :
qsub -t 1-64:1 -pe "parallel_environment_name" 64 -S /home/user/.local/bin/bash program.sh
For more information, check these links: http://linux.die.net/man/5/sge_pe and "Configuring a new parallel environment" at DanT's Grid Blog (the link no longer works; there are copies on the Wayback Machine and softpanorama).
If you have a bash terminal, you can run
for host in $(qhost | tail -n +4 | cut -d " " -f 1); do qsub -l hostname=$host program.sh; done
"-l hostname=" specifies on which host to run the job.
The for loop iterates over the result returned by qhost to take each node and call qsub specifying the host to use.
Is there a way that Amazon Web Services EC2 instances can be self-terminating? Does Amazon have anything that allows an instance to terminate itself ("hara-kiri") after running for more than, say, an hour? I could change the scripts on the running instance to do this itself, but that might fail, and I don't want to edit the image, so I would like Amazon to kill the instance.
To have an instance terminate itself, do both of these steps:
Start the instance with --instance-initiated-shutdown-behavior terminate or the equivalent on the AWS console or API call.
Run shutdown -h now as root. On Ubuntu, you could set this up to happen in 55 minutes using:
echo "sudo halt" | at now + 55 minutes
I wrote an article a while back on other options to accomplish this same "terminate in an hour" goal:
Automatic Termination of Temporary Instances on Amazon EC2
http://alestic.com/2010/09/ec2-instance-termination
The article was originally written before instance-initiated-shutdown-behavior was available, but you'll find updates and other gems in the comments.
You can do this:
ec2-terminate-instances $(curl -s http://169.254.169.254/latest/meta-data/instance-id)
The instance will look up its own instance ID from the metadata service and terminate itself.
Hopefully this will work:
instanceId=$(curl http://169.254.169.254/latest/meta-data/instance-id/)
region=$(curl http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk '{print $3}' | sed 's/"//g'|sed 's/,//g')
/usr/bin/aws ec2 terminate-instances --instance-ids $instanceId --region $region
Hope this helps!
Here is my script for self-termination:
$ EC2_INSTANCE_ID="`wget -q -O - http://instance-data/latest/meta-data/instance-id || die \"wget instance-id has failed: $?\"`"
$ echo "ec2-terminate-instances $EC2_INSTANCE_ID" | at now + 55 min || die 'cannot obtain instance-id'
If you want the instance to stop itself instead of terminating, you can set this up one time only.
In your EC2 Console go to Instance Settings, change Shutdown Behavior to Stop.
Configure /etc/cloud/cloud.cfg, you may refer to how to run a boot script using cloud-init.
Following the answer from Eric Hammond, put the command in a file and place it in the per-boot scripts path:
$ echo '#!/bin/sh' > per-boot.sh
$ echo 'echo "halt" | at now + 55 min' >> per-boot.sh
$ echo 'echo per-boot: `date` >> /tmp/per-boot.txt' >> per-boot.sh
$ chmod +x per-boot.sh
$ sudo chown -R root per-boot.sh
$ sudo mv -viu per-boot.sh /var/lib/cloud/scripts/per-boot
Reboot your instance and check that the script was executed:
$ cat /tmp/per-boot.txt
per-boot: Mon Jul 4 15:35:42 UTC 2016
If so, then in case you forget to stop your instance, you are assured that the instance will stop itself after it has run for 55 minutes, or whatever time you set in the script.
Broadcast message from root@ip-10-0-0-32
(unknown) at 16:30 ...
The system is going down for halt NOW!
PS: For everyone who wants to use self-stopping, note that not all EC2 instance types recover on restart. I recommend using EC2-VPC/EBS with an on/off schedule.
I had a similar need, where I had web applications firing up EC2 instances. I could not trust the web application to stop/terminate the instances, so I created a script to run in a separate process, called the "feeder". The feeder owns the responsibility of stopping/terminating the instance. The web application must periodically request that the feeder "feed" the instance. If an instance "starves" (is not fed within a timeout period), the feeder will stop/terminate it. Two feeders can be run simultaneously on different machines to protect against issues with one feeder process. In other words, the instance runs on a pressure switch. When pressure is released, the instance is stopped/terminated. This also allows me to share an instance among multiple users.
To the original question: the feeder, which could be running in the EC2 instance itself, eliminates the need to know a priori how long the task will be running, but it places a burden on the application to provide periodic feedings. If the laptop is closed, the instance will go down.
The feeder lives here: https://github.com/alessandrocomodi/fpga-webserver and has a permissive open-source license.
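A minimal sketch of the watchdog idea, assuming the application "feeds" the instance by touching a file (the path and the 10-minute timeout are invented):

FEED_FILE=/tmp/feed
TIMEOUT=600   # seconds the instance may go unfed
touch "$FEED_FILE"
while true; do
  last_fed=$(stat -c %Y "$FEED_FILE")   # GNU stat: mtime as epoch seconds
  if [ $(( $(date +%s) - last_fed )) -gt "$TIMEOUT" ]; then
    # With shutdown behavior set to terminate, halting kills the instance.
    sudo shutdown -h now
  fi
  sleep 60
done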