Monitoring StarCluster / Sun Grid Engine Cluster Performance

I am a bit new to using StarCluster and SGE. I was wondering what the best practice is for monitoring "cluster performance", that is, determining how many of a certain job the cluster can run in some unit of time. I am familiar with the qstat command, but that just shows the status of each job. I guess my use case is to submit X jobs and to know how long it takes for all X to complete. Is there an easy out-of-the-box way to do this, or must I write a script to do it?
Right now I am using Ubuntu 12.04 for each instance.
Thanks Much!

A simple bash script like this one + a time command should suffice then.
lines=999
while [ $lines -ne 0 ]; do
    sleep 1
    lines=$(qstat -u "*" | wc -l)
done
This script will loop as long as the queue is not empty. If you call your script "queue_watch.sh", then start your jobs and then run the command
time bash queue_watch.sh
And that should do it.
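For example, a minimal end-to-end timing run might look like this (job.sh is only a placeholder for your own SGE job script, and 100 stands in for your X):

# submit X copies of the job (job.sh is a hypothetical job script)
for i in $(seq 1 100); do
    qsub job.sh
done
# then time how long the queue takes to drain
time bash queue_watch.sh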

Related

How to free up memory after running a process in a shell

I have a script that runs a Java process that loads data into a database every 10 seconds using a loop. This script seems to work perfectly, but after a couple of days I start getting memory issues. If I stop the script everything frees up; I can start it again and it will run happily for another couple of days.
RUNME=Y
PROPERTIES=someproprties.properties
CHECKFILE=somelockfile.lock
touch $CHECKFILE
while [ "$RUNME" = "Y" ]; do
    if [ -f $CHECKFILE ]
    then
        # Run Process
        $DR_HOME/bin/dr -cp $CP_PLUGIN -Xmx64g --engine parallelism=1 --runjson $HOME_DIR/workflows/some_dataflow.dr --overridefile $PROPERTIES 1> /dev/null 2>> $LOG_FILE
        # Give Process a little time to finish up before moving on
        sleep 10s
    else
        RUNME=N
    fi
done
I had assumed that once the process had run, it would make any memory it had allocated available again, so that the next iteration of the loop could use it. Given that this does not seem to be the case, is there a way I can force the release of memory after running the process? I appreciate that this may be something I need to address in the actual Java process rather than in the shell, but as this is the area I have more control over, I thought I would at least ask.
To check which processes are running and how much memory they use, grab the session id of your shell and list the processes in that session from inside a monitoring loop, for example:
sid=$(ps -p $$ -o sid=)
while true; do
    # list the processes in this session along with their memory usage
    ps --sid "$sid" -o pid,tty,pcpu,vsz,etime,command
    sleep 10
done
vsz shows the virtual memory size used by each process.
If it is really the bash process itself that is growing, it may be the environment which is growing, but from the script shown that should not be the case.
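As a rough sketch of how you might watch the memory over time (this assumes the loader shows up in ps as a java process; adjust the name to whatever your process is actually called):

# log resident and virtual size once a minute, so you can see
# whether the memory really is released between iterations
while true; do
    date >> mem_watch.log
    ps -C java -o pid,rss,vsz,etime,command >> mem_watch.log
    sleep 60
done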

Write a timer in shell script to trigger/run another prepared script

Can anyone give me a hint on how to write a timer shell script so that, for example, once it reaches midnight, it will run other scripts that are already prepared in the same directory? Thanks a lot.
If you don't have the ability to use cron or at, you can do this with a script. The key commands are sleep to kill time and date to get the current time. Something like (untested sh script):
while sleep 600; do
    time=$(date +%H)
    if [ ${time} = '00' ]; then
        echo Now is the time
        break
    fi
done
This technique allows running scripts periodically on systems where cron and at access is disabled. The break is for one-time use. Adjust the sleep time to meet your needs. date can return any number of fields that can be used to decide whether the desired time has arrived. For simple periodic runs the if statement can be removed.
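For example, a sketch that runs every prepared script in the same directory once midnight arrives (the run_*.sh pattern is only an assumption; use whatever naming your scripts follow):

#!/bin/bash
while sleep 600; do
    if [ "$(date +%H)" = '00' ]; then
        # run all prepared scripts in this directory
        for script in ./run_*.sh; do
            bash "$script"
        done
        break
    fi
done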

shell script to loop and start processes in parallel?

I need a shell script that will create a loop to start parallel tasks read in from a file...
Something along the lines of:
#!/bin/bash
mylist=/home/mylist.txt
while read -r i; do
    # do something like:
    cp -rp "$i" /destination &
done < "$mylist"
wait
So what I am trying to do is send a bunch of tasks into the background with the "&" for each line in $mylist and wait for them to finish before exiting.
However, there may be a lot of lines in there, so I want to control how many parallel background processes get started; I want to be able to cap it at, say, 5 or 10.
Any ideas?
Thank you
The operating system's scheduler will happily let you start far more parallel jobs than you have cores; how many can actually run efficiently at once depends on your processor. In general you don't have to worry about starting too many processes, because the system will juggle them for you. If you want to limit them anyway, because the number could get absurdly high, you could use something like this (provided you execute a cp command every time):
...
while ...; do
    jobs=$(pgrep 'cp' | wc -l)
    # pause and retry when too many cp processes are already running
    if [[ $jobs -gt 50 ]]; then
        sleep 100
        continue
    fi
    ...
done
The number of running cp commands is stored in the jobs variable, and before starting a new iteration the loop checks whether there are too many already. Note that we jump to a new iteration, so you would have to keep track of how many commands you have already executed. Alternatively you could use wait.
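A rough sketch of that wait-based alternative, using the cp workload and list file from the question and an assumed cap of 5 concurrent copies:

#!/bin/bash
max_jobs=5
while read -r src; do
    cp -rp "$src" /destination &
    # once max_jobs copies are in flight, wait for the whole batch to finish
    if [ "$(jobs -r | wc -l)" -ge "$max_jobs" ]; then
        wait
    fi
done < /home/mylist.txt
wait    # wait for the final batch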
Edit:
On a side note, you can assign a specific CPU core to a process using taskset; it may come in handy when you have fewer, more complex commands.
You are probably looking for something like this using GNU Parallel:
parallel -j10 cp -rp {} /destination :::: /home/mylist.txt
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Small Scale load levelling

I have a series of jobs which need to be done, with no dependencies between them. I'm looking for a tool which will help me distribute these jobs to machines. The only restriction is that each machine should run only one job at a time. I'm trying to maximize throughput, because the jobs are not very well balanced. My current hacked-together shell scripts are less than efficient, as I pre-build the per-machine job queue and can't move jobs from the queue of a heavily loaded machine to one that is sitting idle, having already finished everything.
Previous suggestions have included SLURM, which seems like overkill, and the even more overkill LoadLeveller.
GNU Parallel looks like almost exactly what I want, but the remote machines don't speak SSH; there's a custom job launcher used instead (which has no queueing capabilities). What I'd like is GNU Parallel where the machine name can just be substituted into a shell script on the fly just before the job is dispatched.
So, in summary:
A list of jobs plus a list of machines which can accept them: maximize throughput. Staying as close to the shell as possible is preferred.
Worst case, something can be hacked together with bash's lockfile, but I feel as if a better solution must exist somewhere.
Assuming your jobs are in a text file jobs.tab looking like
/path/to/job1
/path/to/job2
...
Create dispatcher.sh as something like
mkfifo /tmp/jobs.fifo
while true; do
    read JOB
    if test -z "$JOB"; then
        break
    fi
    echo -n "Dispatching job $JOB .."
    echo $JOB >> /tmp/jobs.fifo
    echo ".. taken!"
done
rm /tmp/jobs.fifo
and run one instance of
dispatcher.sh < jobs.tab
Now create launcher.sh as
while true; do
    read JOB < /tmp/jobs.fifo
    if test -z "$JOB"; then
        break
    fi
    # launch job $JOB on machine $1 from your custom launcher
done
and run one instance of launcher.sh per target machine (giving the machine name as its first and only argument).
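For example, starting the launchers could look like this (nodeA, nodeB and nodeC are placeholders for your own machine names):

# one launcher per target machine, each pulling jobs from the shared FIFO
for machine in nodeA nodeB nodeC; do
    ./launcher.sh "$machine" &
done
wait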
GNU Parallel supports your own ssh command. So this should work:
function my_submit { echo On host $1 run command $3; }
export -f my_submit
parallel -j1 -S "my_submit server1,my_submit server2" my_command ::: arg1 arg2

How do I write a bash script to restart a service if it dies?

I have a program that runs as a daemon, using the C fork() call. It creates a new instance that runs in the background; the main instance exits after that.
What would be the best option to check if the service is running? I'm considering:
Create a file with the process id of the program and check if it's running with a script.
Use ps | grep to find the program in the running process list.
Thanks.
I think it would be better to manage your process with supervisord or another process control system.
Create a cron job that runs every few minutes (or whatever you're comfortable with) and does something like this:
/path/to/is_script_stopped.sh && /path/to/script.sh
Write is_script_stopped.sh using any of the methods you suggested. If your script is stopped, the && lets cron start it again; if it is still running, it won't.
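A minimal sketch of such a check, assuming the daemon shows up in the process list under the name mydaemon (a placeholder):

#!/bin/bash
# is_script_stopped.sh - succeed (exit 0) only when the daemon is NOT running,
# so that the && in the cron line starts it again
if pgrep -x mydaemon > /dev/null; then
    exit 1    # still running, nothing to do
else
    exit 0    # stopped, let cron restart it
fi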
To the question you asked in the headline:
This simple endless loop will restart yourProgram as soon as it fails:
#!/bin/bash
for ((;;))
do
    yourProgram
done
If your program depends on a resource which might fail, it would be wise to insert a short pause, to avoid it grabbing all system resources by failing millions of times per second:
#!/bin/bash
for ((;;))
do
    yourProgram
    sleep 1
done
To the question from the body of your post:
What would be the best option to check if the service is running?
If your ps has a -C option (like the Linux ps), you should prefer that over a ps ax | grep combination:
ps -C yourProgram
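Used in a check, that could look like this (yourProgram is the placeholder name from above):

# restart only when ps -C does not find the daemon
if ! ps -C yourProgram > /dev/null; then
    yourProgram
fi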
