How to issue shell commands to slave machines from master and wait until all are finished? - shell

I have 4 shell commands I need to run and they do not depend on each other.
I have 4 slave machines. So, I want to run one of the 4 commands on each of the 4 machines, and then I want to wait until all 4 of them are finished.
How do I distribute this processing? This is what I tried:
$1 is a file containing the IP addresses of the slave machines.
for host in $(cat $1)
do
    echo $host
    # ssh into each machine and launch the command
    ssh username@$host <command>;
done
But this seems as if it is waiting for the command to finish before moving on to the next host and launching the next command.
Since the commands don't depend on each other, how do I distribute this processing so that they run concurrently?

I would use GNU Parallel like this - running hostname in parallel on each of 4 servers:
parallel -j 4 --nonall -S 192.168.0.1,192.168.0.2,192.168.0.3,192.168.0.4 hostname
If you need to pass parameters, use --onall and put arguments after :::
parallel -j 4 --onall -S 192.168.0.1,192.168.0.2,192.168.0.3,192.168.0.4 echo ::: hello
Add --tag if you want the output lines tagged by the hostname/IP.
Add -k if you want to keep the output in order.
Add : to the server list to run on local host too.
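For example, combining those options with the same example IPs (a sketch):
# run hostname on the local machine and the 4 servers, tag each output line, keep order
parallel --tag -k --nonall -S :,192.168.0.1,192.168.0.2,192.168.0.3,192.168.0.4 hostname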

If you aren't concerned about how many commands run concurrently, just put each one in the background with &, then wait on them as a group.
while IFS= read -r host; do
    ssh username@$host <command> &
done < "$1"
wait
Note the use of a while loop instead of a for loop; see Bash FAQ 001.
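If you also need each remote command's exit status, you can collect the PIDs and wait on each one individually; a sketch, with <command> still a placeholder:
pids=()
while IFS= read -r host; do
    ssh username@$host <command> &
    pids+=($!)
done < "$1"

failed=0
for pid in "${pids[@]}"; do
    wait "$pid" || ((failed++))   # wait <pid> returns that job's exit status
done
echo "$failed job(s) failed"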

The ssh part of your script should look like this:
$ ssh -f user@host "sh -c 'sleep 30 ; nohup ls > foo 2>&1 &'"
This one sleeps for 30 seconds and writes the output of ls to the file foo. 30 seconds is long enough for you to go and check it yourself. Just build your loop around that.
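Applied to the question's loop, that could look like the following sketch, with <command> and the remote output path as placeholders:
while IFS= read -r host; do
    # -f backgrounds ssh after authentication; nohup keeps the remote command
    # running even after the ssh session ends
    ssh -f username@$host "sh -c 'nohup <command> > /tmp/out.log 2>&1 &'"
done < "$1"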

Related

Run jobs in sequence rather than consecutively using bash

So I work a lot with Gaussian 09 (the computational chemistry software) on a supercomputer.
To submit a job I use the following command line
g09sub input.com -n 2 -m 4gb -t 200:00:00
Where n is the number of processors used, m is the memory requested, and t is the time requested.
I was wondering if there was a way to write a script that will submit the first 10 .com files in the folder and then submit another .com file as each finishes.
I have a script that will submit all the .com files in a folder at once, but I have a limit to how many jobs I can queue on the supercomputer I use.
The current script looks like
#!/bin/bash
#SBATCH --partition=shared
for i in *.com
do g09sub $i -n 2 -m 4gb -t 200:00:00
done
So 1.com, 2.com, 3.com, etc would be submitted all at the same time.
What I want is to have 1.com, 2.com, 3.com, 4.com, 5.com, 6.com, 7.com, 8.com, 9.com, and 10.com all start at the same time and then as each of those finishes have another .com file start. So that no more than 10 jobs from any one folder will be running at the same time.
If it would be useful, each job creates a .log file when it is finished.
Though I am unsure if it is important, the supercomputer uses a PBS queuing system.
Try xargs or GNU parallel
xargs
ls *.com | xargs -P 10 -I {} g09sub {} -n 2 -m 4gb -t 200:00:00
Explanation:
-I {} tells xargs to substitute {} with each input file name
-P 10 runs at most 10 jobs at once (note that -P is an xargs option, so it must appear before the command)
parallel
ls *.com | parallel -P 10 g09sub {} -n 2 -m 4gb -t 200:00:00 # GNU parallel supports -P too
ls *.com | parallel --jobs 10 g09sub {} -n 2 -m 4gb -t 200:00:00
Explanation:
{} represents the input file name
--jobs 10 runs at most 10 jobs at once
Not sure about the availability on your supercomputer, but the GNU bash manual offers a parallel example under 3.2.6 GNU Parallel, at the bottom.
There are ways to run commands in parallel that are not built into Bash. GNU Parallel is a tool to do just that.
...
Finally, Parallel can be used to run a sequence of shell commands in parallel, similar to ‘cat file | bash’. It is not uncommon to take a list of filenames, create a series of shell commands to operate on them, and feed that list of commands to a shell. Parallel can speed this up. Assuming that file contains a list of shell commands, one per line,
parallel -j 10 < file
will evaluate the commands using the shell (since no explicit command is supplied as an argument), in blocks of ten shell jobs at a time.
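For instance, such a command file could be generated from the .com inputs and handed to parallel like this (a sketch using the submission command from the question; the name jobs.txt is arbitrary):
# write one g09sub command per input file, then run at most 10 at a time
for f in *.com; do
    printf 'g09sub %s -n 2 -m 4gb -t 200:00:00\n' "$f"
done > jobs.txt
parallel -j 10 < jobs.txt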
Where that option was not available to me, using the jobs builtin worked, rather crudely, e.g.:
for entry in *.com; do
    while [ $(jobs | wc -l) -gt 9 ]; do
        sleep 1   # seconds; your sleep may support an arbitrary floating point number
    done
    g09sub ${entry} -n 2 -m 4gb -t 200:00:00 &
done
$(jobs | wc -l) counts the number of jobs that have been spawned in the background with &.
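On bash 4.3 or newer, wait -n (wait for any one background job to finish) can replace the sleep polling; a sketch under that assumption:
for entry in *.com; do
    while (( $(jobs -rp | wc -l) >= 10 )); do
        wait -n                   # block until any one background job exits (bash 4.3+)
    done
    g09sub "${entry}" -n 2 -m 4gb -t 200:00:00 &
done
wait                              # wait for the remaining jobs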

Remote task queue using bash & ssh for variable number of live workers

I want to distribute the work from a master server to multiple worker servers using batches.
Ideally I would have a tasks.txt file with the list of tasks to execute
cmd args 1
cmd args 2
cmd args 3
cmd args 4
cmd args 5
cmd args 6
cmd args 7
...
cmd args n
and each worker server will connect using ssh, read the file and mark each line as in progress or done
#cmd args 1 #worker1 - done
#cmd args 2 #worker2 - in progress
#cmd args 3 #worker3 - in progress
#cmd args 4 #worker1 - in progress
cmd args 5
cmd args 6
cmd args 7
...
cmd args n
I know how to make the ssh connection, read the file, and execute remotely, but I don't know how to make the read and write atomic, so that no two servers start the same task, or how to update the line.
I would like each worker to go to the list of tasks and lock the next available task, rather than have the master actively command the workers, because I will have a flexible number of worker clones that I will start or stop according to how fast I need the tasks to complete.
UPDATE:
My idea for the worker script is:
#!/bin/bash
taskCmd=""
taskLine=0
masterSSH="ssh usr@masterhost"
tasksFile="/path/to/tasks.txt"
function getTask(){
while [[ $taskCmd == "" ]]
do
sleep 1;
taskCmd_and_taskLine=$($masterSSH "#read_and_lock_next_available_line $tasksFile;")
taskCmd=${taskCmd_and_taskLine[0]}
taskLine=${taskCmd_and_taskLine[1]}
done
}
function updateTask(){
message=$1
$masterSSH "#update_currentTask $tasksFile $taskLine $message;"
}
function doTask(){
    eval "$taskCmd";   # actually run the task; 'return $taskCmd' would not execute it
}
while [[ 1 -eq 1 ]]
do
getTask
updateTask "in progress"
doTask
taskErrCode=$?
if [[ $taskErrCode -eq 0 ]]
then
updateTask "done, finished successfully"
else
updateTask "done, error $taskErrCode"
fi
taskCmd="";
taskLine=0;
done
You can use flock to serialize concurrent access to the file:
exec 200>>/some/any/file ## open a file descriptor on the file
flock -w 30 200 ## lock it through that descriptor, waiting up to 30 sec
You can point the file descriptor at your task list or at any other file, as long as every process uses the same file so that flock can work. The lock is removed as soon as the process that created it finishes or fails. You can also release the lock yourself when you no longer need it:
flock -u 200
A usage example:
ssh user@x.x.x.x '
set -e
exec 200>>f
echo locking...
flock -w 10 200
echo working...
sleep 5
'
set -e aborts the script if any step fails. Play with the sleep time and run this script several times in parallel: only one sleep will execute at a time.
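Building on that, a hypothetical sketch of what the asker's #read_and_lock_next_available_line placeholder could look like on the master (the script name, paths and marking convention are assumptions, not part of the question):
#!/bin/bash
# next_task.sh <worker-name> -- hypothetical helper run on the master; prints
# "<line-number> <command>" for the next unclaimed task, or exits non-zero.
tasksFile="/path/to/tasks.txt"
worker="$1"

exec 200>>"$tasksFile.lock"              # dedicated lock file
flock -w 30 200 || exit 1                # serialize access; give up after 30 seconds

# number of the first line that is not yet claimed (no leading '#')
lineNo=$(grep -n -m 1 -v '^#' "$tasksFile" | cut -d: -f1)
[ -z "$lineNo" ] && exit 2               # nothing left to do

cmd=$(sed -n "${lineNo}p" "$tasksFile")
# mark the line as claimed and tag it with the worker name
# (assumes $worker contains no characters special to sed)
sed -i "${lineNo}s/^/#/;${lineNo}s/\$/ #$worker - in progress/" "$tasksFile"

echo "$lineNo $cmd"
flock -u 200                             # release the lock (also released when the script exits)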
Check if you are reinventing GNU Parallel:
parallel -S worker1 -S worker2 command ::: arg1 arg2 arg3
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
Try to implement something like this:
while IFS= read -r line; do
    echo "$line"
    # check whether the line is already marked with a leading #; if not, execute the ssh call, else there is nothing to do
    checkAlreadyDone=$(grep "^#" <<< "$line")
    if [ -z "${checkAlreadyDone}" ]; then
        <insert here the command to execute the ssh call>
        # if everything has been executed without issue, you should add a command
        # here to update taskList.txt; one option could be a sed command, but it should be tested
    else
        echo "nothing to do for $line"
    fi
done < taskList.txt
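As for the sed command that the comment above mentions, here is a hedged sketch; it assumes a finished task is marked by prefixing the line with #, and that $line contains none of the characters |, & or \ that are special to sed:
# hypothetical: mark the just-finished task as done in taskList.txt (GNU sed, in-place edit)
sed -i "s|^$line\$|#$line #done|" taskList.txt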
Regards
Claudio
I think I have successfully implemented one: https://github.com/guo-yong-zhi/DistributedTaskQueue
It is mainly based on bash, ssh and flock, and python3 is required for string processing.

Terminal Application to Keep Web Server Process Alive

Is there an app that can, given a command and options, execute for the lifetime of the process and ping a given URL indefinitely on a specific interval?
If not, could this be done on the terminal as a bash script? I'm almost positive it's doable through terminal, but am not fluent enough to whip it up within a few minutes.
I found this post that has a portion of the solution, minus the ping bits. On Linux, ping runs indefinitely until it is actively killed. How would I kill it from bash after, say, two pings?
General Script
As others have suggested, use this in pseudo code:
execute command and save PID
while PID is active, ping and sleep
exit
This results in the following script:
#!/bin/bash
# execute command, use '&' at the end to run in background
<command here> &
# store pid
pid=$!
while ps | awk '{ print $1 }' | grep -qx "$pid"; do
    ping -c 1 <address here>          # -c 1 makes ping return instead of running forever (see "ping twice" below)
    sleep <timeout here in seconds>
done
Note that the placeholders inside <> should be replaced with actual values, be it a command or an IP address.
Break from Loop
To answer your second question: that depends on the loop. In the loop above, simply track the iteration count in a variable by adding ((count++)) inside the loop, then add [[ $count -eq 2 ]] && break. Now the loop will break when we are pinging for the second time.
Something like this:
...
while ...; do
...
((count++))
[[ $count -eq 2 ]] && break
done
ping twice
To ping only a few times, use the -c option:
ping -c <count here> <address here>
Example:
ping -c 2 www.google.com
Use man ping for more information.
Better practice
As hek2mgl noted in a comment, the current solution may not suffice to solve the problem: it answers the question, but the core problem still persists. To address that, a cron job is suggested in which a simple wget or curl HTTP request is sent periodically. This results in a fairly simple script containing but one line:
#!/bin/bash
curl <address here> > /dev/null 2>&1
This script can be added as a cron job. Leave a comment if you would like more information on how to set up such a scheduled job. Special thanks to hek2mgl for analyzing the problem and suggesting a sound solution.
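For reference, a crontab entry that fires the request every five minutes could look like this (the interval and <address here> are placeholders):
# m h dom mon dow command - run every 5 minutes
*/5 * * * * curl <address here> > /dev/null 2>&1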
Say you want to start a download with wget and while it is running, ping the url:
wget http://example.com/large_file.tgz & #put in background
pid=$!
while kill -s 0 $pid 2>/dev/null    # test if the process is still running
do
    ping -c 1 127.0.0.1             # ping your address once
    sleep 5                         # and sleep for 5 seconds
done
A nice little generic utility for this is Daemonize. Its relevant options:
Usage: daemonize [OPTIONS] path [arg] ...
-c <dir> # Set daemon's working directory to <dir>.
-E var=value # Pass environment setting to daemon. May appear multiple times.
-p <pidfile> # Save PID to <pidfile>.
-u <user> # Run daemon as user <user>. Requires invocation as root.
-l <lockfile> # Single-instance checking using lockfile <lockfile>.
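A hypothetical invocation, assuming the ping loop above has been saved as /usr/local/bin/keepalive.sh (every path here is a placeholder):
# run the keep-alive script as a daemon, recording its PID and holding a single-instance lock
daemonize -c /tmp -p /tmp/keepalive.pid -l /tmp/keepalive.lock /usr/local/bin/keepalive.sh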
Here's an example of starting/killing in use: flickd
To get more sophisticated, you could turn your ping script into a systemd service, now standard on many recent Linuxes.

shell script, for loop, does loop wait for execution of the command to iterate

I have a shell script with a for loop. Does the loop wait for the command in its body to finish executing before iterating?
Thanks in advance.
Here is my code. Will the commands execute sequentially or in parallel?
for m in "${mode[@]}"
do
cmd="exec $perlExecutablePath $perlScriptFilePath --owner $j -rel $i -m $m"
$cmd
eval "$cmd"
done
Assuming that you haven't background-ed the command, then yes.
For example:
for i in {1..10}; do cmd; done
waits for cmd to complete before continuing the loop, whereas:
for i in {1..10}; do cmd & done
doesn't.
If you want to run your commands in parallel, I would suggest changing your loop to something like this:
for m in "${mode[@]}"
do
"$perlExecutablePath" "$perlScriptFilePath" --owner "$j" -rel "$i" -m "$m" &
done
This runs each command in the background, so it doesn't wait for one command to finish before the next one starts.
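If the script should still block until every background job has completed, add a wait after the loop, as in this sketch:
for m in "${mode[@]}"; do
    "$perlExecutablePath" "$perlScriptFilePath" --owner "$j" -rel "$i" -m "$m" &
done
wait    # returns once all background jobs have finished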
An alternative would be to look at GNU Parallel, which is designed for this purpose.
Using GNU Parallel it looks like this:
parallel $perlExecutablePath $perlScriptFilePath --owner $j -rel $i -m {} ::: "${mode[@]}"

bash script parallel ssh remote command

I have a script that fires remote commands on several different machines through an ssh connection. The script goes something like this:
for server in list; do
echo "output from $server"
ssh to server execute some command
done
The problem with this is evidently the time: it has to establish the ssh connection, fire the command, wait for the answer, and print it. What I would like is a script that tries to establish all the connections at once and prints the "output from $server" line and the command's output as soon as it gets them, not necessarily in list order.
I've been googling this for a while but didn't find an answer. I cannot cancel the ssh session after the command runs, as one thread suggested, because I need the output, and I cannot use GNU parallel as suggested in other threads. I also cannot use any other tool; I cannot bring or install anything on this machine. The only usable tool is GNU bash, version 4.1.2(1)-release.
Another question: how are ssh sessions like this limited? If I simply paste 5 or more lines of "ssh connect, do some command", it actually doesn't do anything, or executes only the first one from the list (it works if I paste 3-4 lines). Thank you.
Have you tried this?
for server in list; do
    ssh user@server "command" &
done
wait
echo finished
Update: Start subshells:
for server in list; do
    (echo "output from $server"; ssh user@server "command"; echo End $server) &
done
wait
echo All subshells finished
There are several parallel SSH tools that can handle that for you:
http://code.google.com/p/pdsh/
http://sourceforge.net/projects/clusterssh/
http://code.google.com/p/sshpt/
http://code.google.com/p/parallel-ssh/
Also, you might be interested in configuration deployment solutions such as Chef, Puppet, Ansible, Fabric, etc. (see this summary).
A third option is to use a terminal broadcast such as pconsole.
If you can only use GNU commands, you can write your script like this:
for server in $servers ; do
    ( { echo "output from $server" ; ssh user@$server "command" ; } | \
        sed -e "s/^/$server:/" ) &
done
wait
and then sort the output to reconcile the lines.
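For example, once each line carries the "$server:" prefix, the interleaved output can be grouped per host afterwards (a sketch; the script and file names are arbitrary):
./run_all.sh > all.log          # the loop above, saved as a script
sort -t: -k1,1 -s all.log       # stable sort on the host prefix groups each server's lines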
I started with the shell hacks mentioned in this thread, then proceeded to something somewhat more robust: https://github.com/bearstech/pussh
It's my daily workhorse, and I basically run anything against 250 servers in 20 seconds (it's actually rate limited, otherwise the connection rate kills my ssh-agent). I've been using this for years.
See for yourself from the man page (clone it and run 'man ./pussh.1'): https://github.com/bearstech/pussh/blob/master/pussh.1
Examples
Show all servers' rootfs usage in descending order:
pussh -f servers df -h / |grep /dev |sort -rn -k5
Count the number of processors in a cluster:
pussh -f servers grep ^processor /proc/cpuinfo |wc -l
Show the processor models, sorted by occurrence:
pussh -f servers sed -ne "s/^model name.*: //p" /proc/cpuinfo |sort |uniq -c
Fetch a list of installed packages, in one file per host:
pussh -f servers -o packages-for-%h dpkg --get-selections
Mass copy a file tree (broadcast):
tar czf files.tar.gz ... && pussh -f servers -i files.tar.gz tar -xzC /to/dest
Mass copy several remote file trees (gather):
pussh -f servers -o '|(mkdir -p %h && tar -xzC %h)' tar -czC /src/path .
Note that the pussh -u feature (upload and execute) was the main reason why I programmed this; no other tool seemed to be able to do it. I still wonder if that's the case today.
You may like the parallel-ssh project with the pssh command:
pssh -h servers.txt -l user command
It will output one line per server when the command is successfully executed. With the -P option you can also see the output of the command.

Resources