Remote task queue using bash & ssh for variable number of live workers - bash

I want to distribute the work from a master server to multiple worker servers using batches.
Ideally I would have a tasks.txt file with the list of tasks to execute
cmd args 1
cmd args 2
cmd args 3
cmd args 4
cmd args 5
cmd args 6
cmd args 7
...
cmd args n
and each worker server will connect using ssh, read the file and mark each line as in progress or done
#cmd args 1 #worker1 - done
#cmd args 2 #worker2 - in progress
#cmd args 3 #worker3 - in progress
#cmd args 4 #worker1 - in progress
cmd args 5
cmd args 6
cmd args 7
...
cmd args n
I know how to make the ssh connection, read the file, and execute remotely but don't know how to make the read and write an atomic operation, in order to not have cases where 2 servers start the same task, and how to update the line.
I would like for each worker to go to the list of tasks and lock the next available task in the list rather than the server actively commanding the workers, as I will have a flexible number of workers clones that I will start or close according to how fast I will need the tasks to complete.
UPDATE:
and my ideea for the worker script would be :
#!/bin/bash
taskCmd=""
taskLine=0
masterSSH="ssh usr#masterhost"
tasksFile="/path/to/tasks.txt"
function getTask(){
while [[ $taskCmd == "" ]]
do
sleep 1;
taskCmd_and_taskLine=$($masterSSH "#read_and_lock_next_available_line $tasksFile;")
taskCmd=${taskCmd_and_taskLine[0]}
taskLine=${taskCmd_and_taskLine[1]}
done
}
function updateTask(){
message=$1
$masterSSH "#update_currentTask $tasksFile $taskLine $message;"
}
function doTask(){
return $taskCmd;
}
while [[ 1 -eq 1 ]]
do
getTask
updateTask "in progress"
doTask
taskErrCode=$?
if [[ $taskErrCode -eq 0 ]]
then
updateTask "done, finished successfully"
else
updateTask "done, error $taskErrCode"
fi
taskCmd="";
taskLine=0;
done

You can use flock to concurrently access the file:
exec 200>>/some/any/file ## create a file descriptor
flock -w 30 200 ## concurrently access /some/any/file, timeout of 30 sec.
You can point the file descriptor to your tasks list or any other file, but of course the same file in order to flock work. The lock will me removed as soon as the process that created it is done or fail. You can also remove the lock by yourself when you don't need it anymore:
flock -u 200
An usage sample:
ssh user#x.x.x.x '
set -e
exec 200>>f
echo locking...
flock -w 10 200
echo working...
sleep 5
'
set -e fails the script if any step fails. Play with the sleep time and execute this script in parallel. Just one sleep will execute at a time.

Check if you are reinventing GNU Parallel:
parallel -S worker1 -S worker2 command ::: arg1 arg2 arg3
GNU Parallel is a general parallelizer and makes is easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straight forward way to parallelize is to run 8 jobs on each CPU:
GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

try to implement something like
while read line; do
echo $line
#check if the line contains the # char, if not execute the ssh, else nothing to do
checkAlreadyDone=$(grep "^#" $line)
if [ -z "${checkAlreadyDone}" ];then
<insert here the command to execute ssh call>
<here, if everything has been executed without issue, you should
add a commad to update the file taskList.txt
one option could be to insert a sed command but it should be tested>
else
echo "nothing to do for $line"
fi
done < taskList.txt
Regards
Claudio

I think I have successfully implemented one: https://github.com/guo-yong-zhi/DistributedTaskQueue
It is mainly based on bash, ssh and flock, and python3 is required for string processing.

Related

How to wait in bash till a shell script is finished?

right now I'm using this script for a program:
export FREESURFER_HOME=$HOME/freesurfer
source $FREESURFER_HOME/SetUpFreeSurfer.sh
cd /home/ubuntu/fastsurfer
datadir=/home/ubuntu/moya/data
fastsurferdir=/home/ubuntu/moya/output
mkdir -p $fastsurferdir/logs # create log dir for storing nohup output log (optional)
while read p ; do
echo $p
nohup ./run_fastsurfer.sh --t1 $datadir/$p/orig.nii \
--parallel --threads 16 --sid $p --sd $fastsurferdir > $fastsurferdir/logs/out-${p}.log &
sleep 3600s
done < /home/ubuntu/moya/data/subjects-list.txt
Instead of using sleep 3600s, as the program needs around an hour, I'd like to use wait until all processes (several PIDS) are finished.
If this is the right way, can you tell me how to do that?
BR Alex
wait will wait for all background processes to finish (see help wait). So all you need is to run wait after creating all of the background processes.
This may be more than what you are asking for but I figured I would provide some methods for controlling the number of threads you want to have running at once. I find that I always want to limit the number for various reasons.
Explaination
The following will limit concurrent threads to max_threads running at one time. I am also using the main design pattern so we have a main that runs the script with a function run_jobs that handles the calling and waiting. I read all of $p into an array, then traverse that array as we launch threads. It will either launch a thread up to 4 or wait 5 seconds, once there are at least one less than four it will start another thread. When finished it waits for any remaining to be done. If you want something more simplistic I can do that as well.
#!/usr/bin/env bash
export FREESURFER_HOME=$HOME/freesurfer
source $FREESURFER_HOME/SetUpFreeSurfer.sh
typeset max_threads=4
typeset subjects_list="/home/ubuntu/moya/data/subjects-list.txt"
typeset subjectsArray
run_jobs() {
local child="$$"
local num_children=0
local i=0
while [[ 1 ]] ; do
num_children=$(ps --no-headers -o pid --ppid=$child | wc -w) ; ((num_children-=1))
echo "Children: $num_children"
if [[ ${num_children} -lt ${max_threads} ]] ;then
if [ $i -lt ${#subjectsArray[#]} ] ;then
((i+=1))
# RUN COMMAND HERE &
./run_fastsurfer.sh --t1 $datadir/${subjectsArray[$i]}/orig.nii \
--parallel --threads 16 --sid ${subjectsArray[$i]} --sd $fastsurferdir
fi
fi
sleep 10
done
wait
}
main() {
cd /home/ubuntu/fastsurfer
datadir=/home/ubuntu/moya/data
fastsurferdir=/home/ubuntu/moya/output
mkdir -p $fastsurferdir/logs # create log dir for storing nohup output log (optional)
mapfile -t subjectsArray < ${subjects_list}
run_jobs
}
main
Note: I did not run this code since you have not provided enough information to actually do so.

question on using bwait to wait for multiple bsub jobs to finish

I am new to using LSF (been using PBS/Torque all along).
I need to write code/logic to make sure all bsub jobs finish before other commands/jobs can be fired.
Here is what I have done: I have a master shell script which calls multiple other shell scripts via bsub commands. I capture the job ids from bsub in a log file and I need to ensure that all jobs get finished before the downstream shell script should execute its other commands.
Master shell script
#!/bin/bash
...Code not shown for brevity..
"Command 1 invoked with multiple bsubs" > log_cmd_1.txt
Need Code logic to use bwait before downstream Commands can be used
"Command 2 will be invoked with multiple bsubs" > log_cmd_2.txt
and so on
stdout captured from Command 1 within the Master Shell script is stored in log_cmd_1.txt which looks like this
Submitting Sample 101
Job <545> is submitted to .
Submitting Sample 102
Job <546> is submitted to .
Submitting Sample 103
Job <547> is submitted to .
Submitting Sample 104
Job <548> is submitted to .
I have used the codeblock shown below after Command 1 in the master shell script.
However, it does not seem to work for my situation. Looks like I would have gotten the whole thing wrong below.
while sleep 30m;
do
#the below gets the JobId from the log_cmd_1.txt and tries bwait
grep '^Job' <path_to>/log_cmd_1.txt | perl -pe 's!.*?<(\d+)>.*!$1!' | while read -r line; do res=$(bwait -w "done($line)");echo $res; done 1>
<path_to>/running.txt;
# the below sed command deletes lines that start with Space
sed '/^\s*$/d' running.txt > running2.txt;
# -s file check operator means "file is not zero size"
if [ -s $WORK_DIR/logs/running2.txt ]
then
echo "Jobs still running";
else
echo "Jobs complete";
break;
fi
done
The question: What's the correct way to do this using bwait within the master shell script.
Thanks in advance.
bwait will block until the condition is satisfied, so the loops are probably not neecessary. Note that since you're using done, if the job fails then bwait will exit and inform you that the condition can never be satisfied. Make sure to check that case.
What you have should work. At least the following test worked for me.
#!/bin/bash
# "Command 1 invoked with multiple bsubs" > log_cmd_1.txt
( bsub sleep 0; bsub sleep 0 ) > log_cmd_1.txt
# Need Code logic to use bwait before downstream Commands can be used
while sleep 1
do
#the below gets the JobId from the log_cmd_1.txt and tries bwait
grep '^Job' log_cmd_1.txt | perl -pe 's!.*?<(\d+)>.*!$1!' | while read -r line; do res=$(bwait -w "done($line)");echo "$res"; done 1> running.txt;
# the below sed command deletes lines that start with Space
sed '/^\s*$/d' running.txt > running2.txt;
# -s file check operator means "file is not zero size"
if [ -s running2.txt ]
then
echo "Jobs still running";
else
echo "Jobs complete";
break;
fi
done
Another way to do it. Which may is a little cleaner, is to use job arrays and job dependencies. Job arrays will combine several pieces of work that can be managed as a single job. So your
"Command 1 invoked with multiple bsubs" > log_cmd_1.txt
could be submitted as a single job array. You'll need a driver script that can launch the individual jobs. Here's an example driver script.
$ cat runbatch1.sh
#!/bin/bash
# $LSB_JOBINDEX goes from 1 to 10
if [ "$LSB_JOBINDEX" -eq 1 ]; then
# do the work for job batch 1, job 1
...
elif [ "$LSB_JOBINDEX" -eq 2 ]; then
# etc
...
fi
Then you can submit the job array like this.
bsub -J 'batch1[1-10]' sh runbatch1.sh
This command will run 10 job array elements. The driver script's environment will use the variable LSB_JOB_INDEX to let you know which element the driver is running. Since the array has a name, batch, it's easier to manage. You can submit a second job array that won't start until all elements of the first have completed successfully. The second array is submitted with this command.
bsub -w 'done(batch1)' -J 'batch2[1-10]' sh runbatch2.sh
I hope that this helps.

How to issue shell commands to slave machines from master and wait until all are finished?

I have 4 shell commands I need to run and they do not depend on each other.
I have 4 slave machines. So, I want to run one of the 4 commands on each of the 4 machines, and then I want to wait until all 4 of them are finished.
How do I distribute this processing? This is what I tried:
$1 is a list of ip addresses to the slave machines.
for host in $(cat $1)
do
echo $host
# ssh into each machine and launch command
ssh username#$host <command>;
done
But this seems as if it is waiting for the command to finish before moving on to the next host and launching the next command.
How do I accomplish this distributed processing that doesn't depend on each other?
I would use GNU Parallel like this - running hostname in parallel on each of 4 servers:
parallel -j 4 --nonall -S 192.168.0.1,192.168.0.2,192.168.0.3,192.168.0.4 hostname
If you need to pass parameters, use --onall and put arguments after :::
parallel -j 4 --onall -S 192.168.0.1,192.168.0.2,192.168.0.3,192.168.0.4 echo ::: hello
Add --tag if you want the output lines tagged by the hostname/IP.
Add -k if you want to keep the output in order.
Add : to the server list to run on local host too.
If you aren't concerned about how many commands run concurrently, just put each one in the background with &, then wait on them as a group.
while IFS= read -r host; do
ssh username#$host <command> &
done < "$1"
wait
Note the use of a while loop instead of a for loop; see Bash FAQ 001.
The ssh part of your script needs to be like:
$ ssh -f user#host "sh -c 'sleep 30 ; nohup ls > foo 2>&1 &'"
This one sleeps for 30 secs and writes the output of ls to file foo. 30 secs is enough for you to go and see it yourself. Just build your loop around that.

Terminal Application to Keep Web Server Process Alive

Is there an app that can, given a command and options, execute for the lifetime of the process and ping a given URL indefinitely on a specific interval?
If not, could this be done on the terminal as a bash script? I'm almost positive it's doable through terminal, but am not fluent enough to whip it up within a few minutes.
Found this post that has a portion of the solution, minus the ping bits. ping runs on linux, indefinitely; until it's actively killed. How would I kill it from bash after say, two pings?
General Script
As others have suggested, use this in pseudo code:
execute command and save PID
while PID is active, ping and sleep
exit
This results in following script:
#!/bin/bash
# execute command, use '&' at the end to run in background
<command here> &
# store pid
pid=$!
while ps | awk '{ print $1 }' | grep $pid; do
ping <address here>
sleep <timeout here in seconds>
done
Note that the stuff inside <> should be replaces with actual stuff. Be it a command or an ip address.
Break from Loop
To answer your second question, that depends in the loop. In the loop above, simply track the loop count using a variable. To do that, add a ((count++)) inside the loop. And do this: [[ $count -eq 2 ]] && break. Now the loop will break when we're pinging for a second time.
Something like this:
...
while ...; do
...
((count++))
[[ $count -eq 2 ]] && break
done
ping twice
To ping only a few times, use the -c option:
ping -c <count here> <address here>
Example:
ping -c 2 www.google.com
Use man ping for more information.
Better practice
As hek2mgl noted in a comment below, the current solution may not suffice to solve the problem. While answering the question, the core problem will still persist. To aid to that problem, a cron job is suggested in which a simple wget or curl http request is sent periodically. This results in a fairly easy script containing but one line:
#!/bin/bash
curl <address here> > /dev/null 2>&1
This script can be added as a cron job. Leave a comment if you desire more information how to set such a scheduled job. Special thanks to hek2mgl for analyzing the problem and suggesting a sound solution.
Say you want to start a download with wget and while it is running, ping the url:
wget http://example.com/large_file.tgz & #put in background
pid=$!
while kill -s 0 $pid #test if process is running
do
ping -c 1 127.0.0.1 #ping your adress once
sleep 5 #and sleep for 5 seconds
done
A nice little generic utility for this is Daemonize. Its relevant options:
Usage: daemonize [OPTIONS] path [arg] ...
-c <dir> # Set daemon's working directory to <dir>.
-E var=value # Pass environment setting to daemon. May appear multiple times.
-p <pidfile> # Save PID to <pidfile>.
-u <user> # Run daemon as user <user>. Requires invocation as root.
-l <lockfile> # Single-instance checking using lockfile <lockfile>.
Here's an example of starting/killing in use: flickd
To get more sophisticated, you could turn your ping script into a systemd service, now standard on many recent Linuxes.

shell script, for loop, does loop wait for execution of the command to iterate

I have a shell script with a for loop. Does loop wait for execution of the command in its body before iterating?
Thanks in Advance
Here is my code. Will the commands execute sequentially or parallel?
for m in "${mode[#]}"
do
cmd="exec $perlExecutablePath $perlScriptFilePath --owner $j -rel $i -m $m"
$cmd
eval "$cmd"
done
Assuming that you haven't background-ed the command, then yes.
For example:
for i in {1..10}; do cmd; done
waits for cmd to complete before continuing the loop, whereas:
for i in {1..10}; do cmd &; done
doesn't.
If you want to run your commands in parallel, I would suggest changing your loop to something like this:
for m in "${mode[#]}"
do
"$perlExecutablePath" "$perlScriptFilePath" --owner "$j" -rel "$i" -m "$m" &
done
This runs each command in the background, so it doesn't wait for one command to finish before the next one starts.
An alternative would be to look at GNU Parallel, which is designed for this purpose.
Using GNU Parallel it looks like this:
parallel $perlExecutablePath $perlScriptFilePath --owner $j -rel $i -m {} ::: "${mode[#]}"
GNU Parallel is a general parallelizer and makes is easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straight forward way to parallelize is to run 8 jobs on each CPU:
GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Resources