How to wait in a bash script for several subprocesses spawned from that script to finish, and then return exit code !=0 when any of the subprocesses ends with code !=0?
Simple script:
#!/bin/bash
for i in `seq 0 9`; do
doCalculations $i &
done
wait
The above script will wait for all 10 spawned subprocesses, but it will always give exit status 0 (see help wait). How can I modify this script so it will discover exit statuses of spawned subprocesses and return exit code 1 when any of subprocesses ends with code !=0?
Is there a better solution than collecting the PIDs of the subprocesses, waiting for them in order, and summing the exit statuses?
wait also (optionally) takes the PID of the process to wait for, and with $! you get the PID of the last command launched in the background.
Modify the loop to store the PID of each spawned sub-process into an array, and then loop again waiting on each PID.
# run processes and store pids in array
for i in $n_procs; do
./procs[${i}] &
pids[${i}]=$!
done
# wait for all pids
for pid in ${pids[*]}; do
wait $pid
done
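A minimal runnable sketch of the same idea, assuming a stand-in `job` function (hypothetical; it only sleeps and then returns the status it is given) in place of the real commands. It stores each `$!` in an array, waits on each PID in turn, and records whether any of them failed:

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for real work: sleep briefly, then return the given status.
job() { sleep 0.2; return "$1"; }

pids=()
for status in 0 3 0; do
    job "$status" &      # run in the background
    pids+=("$!")         # $! is the PID of the most recent background command
done

failed=0
for pid in "${pids[@]}"; do
    # "wait PID" returns the exit status of that particular process
    wait "$pid" || failed=1
done

echo "failed=$failed"
```

In a real script, finishing with `exit "$failed"` would propagate the result to the caller.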
http://jeremy.zawodny.com/blog/archives/010717.html :
#!/bin/bash
FAIL=0
echo "starting"
./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &
for job in `jobs -p`
do
echo $job
wait $job || let "FAIL+=1"
done
echo $FAIL
if [ "$FAIL" == "0" ];
then
echo "YAY!"
else
echo "FAIL! ($FAIL)"
fi
Here is a simple example using wait.
Run some processes:
$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &
Then wait for them with the wait command:
$ wait $(jobs -p)
Or just wait (without arguments) for all.
This will wait until all background jobs have completed.
If the -n option is supplied, waits for the next job to terminate and returns its exit status.
See: help wait and help jobs for syntax.
However, the downside is that this returns only the status of the last ID, so you need to check the status of each subprocess and store it in a variable.
Or make your calculation function create a file on failure (empty or containing a failure log), then check whether that file exists, e.g.
$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait $(jobs -p)
$ test -f fail && echo Calculation failed.
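The point above about wait returning only the status of the last ID can be seen in a small sketch. The two subshells and the variable names are made up for illustration; the first job fails, but because it is not the last ID listed, its failure is not reflected in the status of wait:

```bash
#!/usr/bin/env bash
(sleep 0.2; exit 1) & first=$!    # this job fails
(sleep 0.1; exit 0) & second=$!   # this job succeeds

# With several IDs, wait reports only the status of the last ID listed,
# so the failure of $first goes unnoticed here.
combined=0
wait "$first" "$second" || combined=$?
echo "combined status: $combined"
```

This is why the per-PID loops shown in other answers call wait once per process.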
How about simply:
#!/bin/bash
pids=""
for i in `seq 0 9`; do
doCalculations $i &
pids="$pids $!"
done
wait $pids
...code continued here ...
Update:
As pointed out by multiple commenters, the above waits for all processes to complete before continuing, but it does not exit with a failure if one of them fails. It can be made to do so with the following modification, suggested by @Bryan, @SamBrightman, and others:
#!/bin/bash
pids=""
RESULT=0
for i in `seq 0 9`; do
doCalculations $i &
pids="$pids $!"
done
for pid in $pids; do
wait $pid || let "RESULT=1"
done
if [ "$RESULT" == "1" ];
then
exit 1
fi
...code continued here ...
If you have GNU Parallel installed you can do:
# If doCalculations is a function
export -f doCalculations
seq 0 9 | parallel doCalculations {}
GNU Parallel will give you exit code:
0 - All jobs ran without error.
1-253 - Some of the jobs failed. The exit status gives the number of failed jobs
254 - More than 253 jobs failed.
255 - Other error.
Watch the intro videos to learn more: http://pi.dk/1
Here's what I've come up with so far. I would like to see how to interrupt the sleep command if a child terminates, so that one would not have to tune WAITALL_DELAY to one's usage.
waitall() { # PID...
## Wait for children to exit and indicate whether all exited with 0 status.
local errors=0
while :; do
debug "Processes remaining: $*"
for pid in "$@"; do
shift
if kill -0 "$pid" 2>/dev/null; then
debug "$pid is still alive."
set -- "$@" "$pid"
elif wait "$pid"; then
debug "$pid exited with zero exit status."
else
debug "$pid exited with non-zero exit status."
((++errors))
fi
done
(("$#" > 0)) || break
# TODO: how to interrupt this sleep when a child terminates?
sleep ${WAITALL_DELAY:-1}
done
((errors == 0))
}
debug() { echo "DEBUG: $*" >&2; }
pids=""
for t in 3 5 4; do
sleep "$t" &
pids="$pids $!"
done
waitall $pids
To parallelize this...
for i in $(whatever_list) ; do
do_something $i
done
Translate it to this...
for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
(
export -f do_something ## export functions (if needed)
export PATH ## export any variables that are required
xargs -I{} --max-procs 0 bash -c ' ## process in batches...
{
echo "processing {}" ## optional
do_something {}
}'
)
If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code from the sequence as a whole.
Exporting functions and variables may or may not be necessary, in any particular case.
You can set --max-procs based on how much parallelism you want (0 means "all at once").
GNU Parallel offers some additional features when used in place of xargs -- but it isn't always installed by default.
The for loop isn't strictly necessary in this example since echo $i is basically just regenerating the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on.
Bash string handling can be confusing -- I have found that using single quotes works best for wrapping non-trivial scripts.
You can easily interrupt the entire operation (using ^C or similar), unlike the more direct approach to Bash parallelism.
Here's a simplified working example...
for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
{
echo sleep {}
sleep 2s
}'
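The claim that a failure in one process yields a non-zero exit code for the whole sequence can be checked with a short sketch. It assumes an xargs that supports -P (GNU and BSD both do); the input values are arbitrary and each one simply becomes the exit status of one bash invocation:

```bash
#!/usr/bin/env bash
# Each input line becomes the exit status of one bash invocation;
# up to 3 invocations run at the same time.
rc=0
printf '%s\n' 0 1 0 | xargs -I{} -P 3 bash -c 'sleep 0.1; exit {}' || rc=$?

# Non-zero because one invocation failed (GNU xargs documents 123 for this case).
echo "xargs exit status: $rc"
```

The exact non-zero value depends on the xargs implementation, so it is safer to test for "not 0" than for a specific number.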
This is something that I use:
#wait for jobs
for job in `jobs -p`; do wait ${job}; done
This is an expansion on the most-upvoted answer, by @Luca Tettamanti, to make a fully-runnable example.
That answer left me wondering:
What type of variable is n_procs, and what does it contain? What type of variable is procs, and what does it contain? Can someone please update this answer to make it runnable by adding definitions for those variables? I don't understand how.
...and also:
How do you get the return code from the subprocess when it has completed (which is the whole crux of this question)?
Anyway, I figured it out, so here is a fully-runnable example.
Notes:
$! is how to obtain the PID (Process ID) of the last-executed sub-process.
Running any command with the & after it, like cmd &, for example, causes it to run in the background as a subprocess in parallel with the main process.
myarray=() is how to create an array in bash.
To learn a tiny bit more about the wait built-in command, see help wait. See also, and especially, the official Bash user manual on Job Control built-ins, such as wait and jobs, here: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait.
Full, runnable program: wait for all processes to end
multi_process_program.sh (from my eRCaGuy_hello_world repo):
#!/usr/bin/env bash
# This is a special sleep function which returns the number of seconds slept as
# the "error code" or "return code" so that we can easily see that we are in
# fact actually obtaining the return code of each process as it finishes.
my_sleep() {
seconds_to_sleep="$1"
sleep "$seconds_to_sleep"
return "$seconds_to_sleep"
}
# Create an array of whatever commands you want to run as subprocesses
procs=() # bash array
procs+=("my_sleep 5")
procs+=("my_sleep 2")
procs+=("my_sleep 3")
procs+=("my_sleep 4")
num_procs=${#procs[@]} # number of processes
echo "num_procs = $num_procs"
# run commands as subprocesses and store pids in an array
pids=() # bash array
for (( i=0; i<"$num_procs"; i++ )); do
echo "cmd = ${procs[$i]}"
${procs[$i]} & # run the cmd as a subprocess
# store pid of last subprocess started; see:
# https://unix.stackexchange.com/a/30371/114401
pids+=("$!")
echo " pid = ${pids[$i]}"
done
# OPTION 1 (comment this option out if using Option 2 below): wait for all pids
for pid in "${pids[@]}"; do
wait "$pid"
return_code="$?"
echo "PID = $pid; return_code = $return_code"
done
echo "All $num_procs processes have ended."
Change the file above to be executable by running chmod +x multi_process_program.sh, then run it like this:
time ./multi_process_program.sh
Sample output. See how the output of the time command in the call shows it took 5.084sec to run. We were also able to successfully retrieve the return code from each subprocess.
eRCaGuy_hello_world/bash$ time ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
pid = 21694
cmd = my_sleep 2
pid = 21695
cmd = my_sleep 3
pid = 21697
cmd = my_sleep 4
pid = 21699
PID = 21694; return_code = 5
PID = 21695; return_code = 2
PID = 21697; return_code = 3
PID = 21699; return_code = 4
All 4 processes have ended.
PID 21694 is done; return_code = 5; 3 PIDs remaining.
PID 21695 is done; return_code = 2; 2 PIDs remaining.
PID 21697 is done; return_code = 3; 1 PIDs remaining.
PID 21699 is done; return_code = 4; 0 PIDs remaining.
real 0m5.084s
user 0m0.025s
sys 0m0.061s
Going further: determine live when each individual process ends
If you'd like to do some action as each process finishes, and you don't know when they will finish, you can poll in an infinite while loop to see when each process terminates, then do whatever action you want.
Simply comment out the "OPTION 1" block of code above, and replace it with this "OPTION 2" block instead:
# OR OPTION 2 (comment out Option 1 above if using Option 2): poll to detect
# when each process terminates, and print out when each process finishes!
while true; do
for i in "${!pids[@]}"; do
pid="${pids[$i]}"
# echo "pid = $pid" # debugging
# See if PID is still running; see my answer here:
# https://stackoverflow.com/a/71134379/4561887
ps --pid "$pid" > /dev/null
if [ "$?" -ne 0 ]; then
# PID doesn't exist anymore, meaning it terminated
# 1st, read its return code
wait "$pid"
return_code="$?"
# 2nd, remove this PID from the `pids` array by `unset`ting the
# element at this index; NB: due to how bash arrays work, this does
# NOT actually remove this element from the array. Rather, it
# removes its index from the `"${!pids[@]}"` list of indices,
# adjusts the array count(`"${#pids[@]}"`) accordingly, and it sets
# the value at this index to either a null value of some sort, or
# an empty string (I'm not exactly sure).
unset "pids[$i]"
num_pids="${#pids[@]}"
echo "PID $pid is done; return_code = $return_code;" \
"$num_pids PIDs remaining."
fi
done
# exit the while loop if the `pids` array is empty
if [ "${#pids[@]}" -eq 0 ]; then
break
fi
# Do some small sleep here to keep your polling loop from sucking up
# 100% of one of your CPUs unnecessarily. Sleeping allows other processes
# to run during this time.
sleep 0.1
done
Sample run and output of the full program with Option 1 commented out and Option 2 in-use:
eRCaGuy_hello_world/bash$ ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
pid = 22275
cmd = my_sleep 2
pid = 22276
cmd = my_sleep 3
pid = 22277
cmd = my_sleep 4
pid = 22280
PID 22276 is done; return_code = 2; 3 PIDs remaining.
PID 22277 is done; return_code = 3; 2 PIDs remaining.
PID 22280 is done; return_code = 4; 1 PIDs remaining.
PID 22275 is done; return_code = 5; 0 PIDs remaining.
Each of those PID XXXXX is done lines prints out live right after that process has terminated! Notice that even though the process for sleep 5 (PID 22275 in this case) was run first, it finished last, and we successfully detected each process right after it terminated. We also successfully detected each return code, just like in Option 1.
Other References:
*****+ [VERY HELPFUL] Get exit code of a background process - this answer taught me the key principle that (emphasis added):
wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
In other words, it helped me know that even after the process is complete, you can still call wait on it to get its return code!
How to check if a process id (PID) exists
my answer
Remove an element from a Bash array - note that elements in a bash array aren't actually deleted, they are just "unset". See my comments in the code above for what that means.
How to use the command-line executable true to make an infinite while loop in bash: https://www.cyberciti.biz/faq/bash-infinite-loop/
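The key principle quoted above, that wait can still return the exit code of a child that has already finished, can be tried with a short sketch. The subshell and the half-second delay are arbitrary choices for illustration:

```bash
#!/usr/bin/env bash
(exit 7) &          # child that finishes almost immediately
pid=$!
sleep 0.5           # give the child ample time to exit before we wait on it

# The shell has remembered the child's status, so wait still returns it.
rc=0
wait "$pid" || rc=$?
echo "exit status of $pid: $rc"
```

This only applies to children of the current shell; for a PID the shell did not start, wait has no status to report.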
I see lots of good examples listed on here, wanted to throw mine in as well.
#! /bin/bash
items="1 2 3 4 5 6"
pids=""
for item in $items; do
sleep $item &
pids+="$! "
done
for pid in $pids; do
wait $pid
status=$? # save it: $? is overwritten by the [ test below
if [ $status -eq 0 ]; then
echo "SUCCESS - Job $pid exited with a status of $status"
else
echo "FAILED - Job $pid exited with a status of $status"
fi
done
I use something very similar to start/stop servers/services in parallel and check each exit status. Works great for me. Hope this helps someone out!
I don't believe it's possible with Bash's builtin functionality.
You can get notification when a child exits:
#!/bin/sh
set -o monitor # enable script job control
trap 'echo "child died"' CHLD
However there's no apparent way to get the child's exit status in the signal handler.
Getting that child status is usually the job of the wait family of functions in the lower level POSIX APIs. Unfortunately Bash's support for that is limited - you can wait for one specific child process (and get its exit status) or you can wait for all of them, and always get a 0 result.
What it appears impossible to do is the equivalent of waitpid(-1), which blocks until any child process returns.
The following code will wait for completion of all calculations and return exit status 1 if any of doCalculations fails.
#!/bin/bash
for i in $(seq 0 9); do
(doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qv 0 && exit 1
Here's my version that works for multiple pids, logs warnings if execution takes too long, and stops the subprocesses if execution takes longer than a given value.
[EDIT] I have uploaded my newer implementation of WaitForTaskCompletion, called ExecTasks at https://github.com/deajan/ofunctions.
There's also a compat layer for WaitForTaskCompletion
[/EDIT]
function WaitForTaskCompletion {
local pids="${1}" # pids to wait for, separated by semi-colon
local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
local caller_name="${4}" # Who called this function
local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors
Logger "${FUNCNAME[0]} called by [$caller_name]."
local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once
local log_ttime=0 # local time instance for comparison
local seconds_begin=$SECONDS # Seconds since the beginning of the script
local exec_time=0 # Seconds since the beginning of this function
local retval=0 # return value of monitored pid process
local errorcount=0 # Number of pids that finished with errors
local pidCount # number of given pids
IFS=';' read -a pidsArray <<< "$pids"
pidCount=${#pidsArray[@]}
while [ ${#pidsArray[@]} -gt 0 ]; do
newPidsArray=()
for pid in "${pidsArray[@]}"; do
if kill -0 $pid > /dev/null 2>&1; then
newPidsArray+=($pid)
else
wait $pid
result=$?
if [ $result -ne 0 ]; then
errorcount=$((errorcount+1))
Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
fi
fi
done
## Log a standby message every hour
exec_time=$(($SECONDS - $seconds_begin))
if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
if [ $log_ttime -ne $exec_time ]; then
log_ttime=$exec_time
Logger "Current tasks still running with pids [${pidsArray[@]}]."
fi
fi
if [ $exec_time -gt $soft_max_time ]; then
if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
soft_alert=1
SendAlert
fi
if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
kill -SIGTERM $pid
if [ $? == 0 ]; then
Logger "Task stopped successfully"
else
errorcount=$((errorcount+1))
fi
fi
fi
pidsArray=("${newPidsArray[@]}")
sleep 1
done
Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
Logger "Stopping execution."
exit 1337
else
return $errorcount
fi
}
# Just a plain stupid logging function to be replaced by yours
function Logger {
local value="${1}"
echo $value
}
Example: wait for all three processes to finish, log a warning if execution takes longer than 5 seconds, and stop all processes if execution takes longer than 120 seconds. Don't exit the program on failures.
function something {
sleep 10 &
pids="$!"
sleep 12 &
pids="$pids;$!"
sleep 9 &
pids="$pids;$!"
WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
something
If you have bash 4.2 or later available, the following might be useful to you. It uses associative arrays to store task names and their "code" as well as task names and their pids. I have also built in a simple rate-limiting method, which might come in handy if your tasks consume a lot of CPU or I/O time and you want to limit the number of concurrent tasks.
The script launches all tasks in the first loop and consumes the results in the second one.
This is a bit overkill for simple cases but it allows for pretty neat stuff. For example one can store error messages for each task in another associative array and print them after everything has settled down.
#! /bin/bash
main () {
local -A pids=()
local -A tasks=([task1]="echo 1"
[task2]="echo 2"
[task3]="echo 3"
[task4]="false"
[task5]="echo 5"
[task6]="false")
local max_concurrent_tasks=2
for key in "${!tasks[@]}"; do
while [ $(jobs 2>&1 | grep -c Running) -ge "$max_concurrent_tasks" ]; do
sleep 1 # gnu sleep allows floating point here...
done
${tasks[$key]} &
pids+=(["$key"]="$!")
done
errors=0
for key in "${!tasks[@]}"; do
pid=${pids[$key]}
local cur_ret=0
if [ -z "$pid" ]; then
echo "No Job ID known for the $key process" # should never happen
cur_ret=1
else
wait $pid
cur_ret=$?
fi
if [ "$cur_ret" -ne 0 ]; then
errors=$(($errors + 1))
echo "$key (${tasks[$key]}) failed."
fi
done
return $errors
}
main
I've had a go at this and combined all the best parts from the other examples here. This script will execute the checkpids function when any background process exits, and output the exit status without resorting to polling.
#!/bin/bash
set -o monitor
sleep 2 &
sleep 4 && exit 1 &
sleep 6 &
pids=`jobs -p`
checkpids() {
for pid in $pids; do
if kill -0 $pid 2>/dev/null; then
echo $pid is still alive.
elif wait $pid; then
echo $pid exited with zero exit status.
else
echo $pid exited with non-zero exit status.
fi
done
echo
}
trap checkpids CHLD
wait
#!/bin/bash
set -m
for i in `seq 0 9`; do
doCalculations $i &
done
while fg; do true; done
set -m allows you to use fg & bg in a script
fg, in addition to putting the last process in the foreground, has the same exit status as the process it foregrounds
while fg will stop looping when any fg exits with a non-zero exit status
Unfortunately, this won't handle the case where a process in the background exits with a non-zero exit status: the loop won't terminate immediately, because it first waits for the previous processes to complete.
Wait for all jobs and return the exit code of the last failing job. Unlike solutions above, this does not require pid saving, or modifying inner loops of scripts. Just bg away, and wait.
function wait_ex {
# this waits for all jobs and returns the exit code of the last failing job
ecode=0
while true; do
[ -z "$(jobs)" ] && break
wait -n
err="$?"
[ "$err" != "0" ] && ecode="$err"
done
return $ecode
}
EDIT: Fixed the bug where this could be fooled by a script that ran a command that didn't exist.
Just store the results out of the shell, e.g. in a file.
#!/bin/bash
tmp=/tmp/results
: > $tmp #clean the file
for i in `seq 0 9`; do
(doCalculations $i; echo $i:$?>>$tmp)&
done #iterate
wait #wait until all ready
sort $tmp | grep -v ':0' #... handle as required
I've just been modifying a script to background and parallelise a process.
I did some experimenting (on Solaris with both bash and ksh) and discovered that 'wait' outputs the exit status if it's not zero, or a list of jobs that returned a non-zero exit status when no PID argument is provided. E.g.
Bash:
$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]- Exit 2 sleep 20 && exit 2
[2]+ Exit 1 sleep 10 && exit 1
Ksh:
$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]+ Done(2) sleep 20 && exit 2
[2]+ Done(1) sleep 10 && exit 1
This output is written to stderr, so a simple solution to the OPs example could be:
#!/bin/bash
trap "rm -f /tmp/x.$$" EXIT
for i in `seq 0 9`; do
doCalculations $i &
done
wait 2> /tmp/x.$$
if [ `wc -l < /tmp/x.$$` -gt 0 ] ; then
exit 1
fi
While this:
wait 2> >(wc -l)
will also return a count but without the tmp file. This might also be used this way, for example:
wait 2> >(if [ `wc -l` -gt 0 ] ; then echo "ERROR"; fi)
But this isn't very much more useful than the tmp file IMO. I couldn't find a useful way to avoid the tmp file whilst also avoiding running the "wait" in a subshell, which won't work at all.
I needed this, but the target process wasn't a child of current shell, in which case wait $PID doesn't work. I did find the following alternative instead:
while [ -e /proc/$PID ]; do sleep 0.1 ; done
That relies on the presence of procfs, which may not be available (Mac doesn't provide it for example). So for portability, you could use this instead:
while ps -p $PID >/dev/null ; do sleep 0.1 ; done
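A sketch of the same polling idea wrapped in a function with a timeout, so the loop cannot spin forever. The name `wait_for_pid` and its arguments are hypothetical, and it uses the shell's built-in `kill -0` as the liveness check instead of ps; note that `kill -0` also fails for processes you are not permitted to signal, so this variant assumes the target belongs to the same user. The demo target is a child of the script only to keep the example self-contained:

```bash
#!/usr/bin/env bash
# wait_for_pid PID [TIMEOUT_SECONDS]  (hypothetical helper)
# Polls until PID no longer exists; returns 1 if it is still there after the timeout.
wait_for_pid() {
    local pid=$1 timeout=${2:-10}
    local deadline=$((SECONDS + timeout))
    while kill -0 "$pid" 2>/dev/null; do
        (( SECONDS < deadline )) || return 1   # give up after the timeout
        sleep 0.1
    done
    return 0
}

sleep 0.3 &
target=$!
gone=no
wait_for_pid "$target" 5 && gone=yes
echo "gone=$gone"
```

As with the loops above, this only tells you that the process has ended; it cannot recover the exit status of a process that is not a child of the shell.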
There are already a lot of answers here, but I am surprised no one seems to have suggested using arrays... So here's what I did - this might be useful to some in the future.
n=10 # run 10 jobs
c=0
PIDS=()
while true
my_function_or_command &
PID=$!
echo "Launched job as PID=$PID"
PIDS+=($PID)
(( c+=1 ))
# required to prevent any exit due to error
# caused by additional commands run which you
# may add when modifying this example
true
do
if (( c < n ))
then
continue
else
break
fi
done
# collect launched jobs
for pid in "${PIDS[@]}"
do
wait $pid || echo "failed job PID=$pid"
done
This works, and should be just as good as, if not better than, @HoverHell's answer!
#!/usr/bin/env bash
set -m # allow for job control
EXIT_CODE=0; # exit code of overall script
function foo() {
echo "CHLD exit code is $1"
echo "CHLD pid is $2"
echo $(jobs -l)
for job in `jobs -p`; do
echo "PID => ${job}"
wait ${job} || { echo "At least one test failed with exit code => $?"; EXIT_CODE=1; }
done
}
trap 'foo $? $$' CHLD
DIRN=$(dirname "$0");
commands=(
"{ echo "foo" && exit 4; }"
"{ echo "bar" && exit 3; }"
"{ echo "baz" && exit 5; }"
)
clen=`expr "${#commands[@]}" - 1` # get length of commands - 1
for i in `seq 0 "$clen"`; do
(echo "${commands[$i]}" | bash) & # run the command via bash in subshell
echo "$i ith command has been issued as a background job"
done
# wait for all to finish
wait;
echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end
and of course, I have immortalized this script, in an NPM project which allows you to run bash commands in parallel, useful for testing:
https://github.com/ORESoftware/generic-subshell
Exactly for this purpose I wrote a bash function called :for.
Note: :for not only preserves and returns the exit code of the failing function, but also terminates all instances running in parallel, which might not be needed in this case.
#!/usr/bin/env bash
# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
local pids=("$@")
[ ${#pids} -eq 0 ] && return $?
trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM
for pid in "${pids[@]}"; do
wait "${pid}" || return $?
done
trap - INT RETURN TERM
}
# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
local f="${1}" && shift
local i=0
local pids=()
for arg in "$@"; do
( ${f} "${arg}" ) &
pids+=("$!")
if [ ! -z ${FOR_PARALLEL+x} ]; then
(( i=(i+1)%${FOR_PARALLEL} ))
if (( i==0 )) ;then
:wait "${pids[@]}" || return $?
pids=()
fi
fi
done && [ ${#pids} -eq 0 ] || :wait "${pids[@]}" || return $?
}
usage
for.sh:
#!/usr/bin/env bash
set -e
# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)
msg="You should see this three times"
:(){
i="${1}" && shift
echo "${msg}"
sleep 1
if [ "$i" == "1" ]; then sleep 1
elif [ "$i" == "2" ]; then false
elif [ "$i" == "3" ]; then
sleep 3
echo "You should never see this"
fi
} && :for : 1 2 3 || exit $?
echo "You should never see this"
$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times
1
set -e
fail () {
touch .failure
}
expect () {
wait
if [ -f .failure ]; then
rm -f .failure
exit 1
fi
}
sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail
expect
The set -e at top makes your script stop on failure.
expect exits the script with status 1 if any subjob failed.
There can be a case where the process completes before the script waits for it. If wait is called for a PID that the shell does not know as one of its children (for example, a process started elsewhere), it fails with an error like pid is not a child of this shell. To avoid such cases, the following function can be used to find whether the process is complete or not:
isProcessComplete(){
PID=$1
while [ -e /proc/$PID ]
do
echo "Process: $PID is still running"
sleep 5
done
echo "Process $PID has finished"
}
Starting with Bash 5.1, there is a nice new way of waiting for and handling the results of multiple background jobs thanks to the introduction of wait -p:
#!/usr/bin/env bash
# Spawn background jobs
for ((i=0; i < 10; i++)); do
secs=$((RANDOM % 10)); code=$((RANDOM % 256))
(sleep ${secs}; exit ${code}) &
echo "Started background job (pid: $!, sleep: ${secs}, code: ${code})"
done
# Wait for background jobs, print individual results, determine overall result
result=0
while true; do
wait -n -p pid; code=$?
[[ -z "${pid}" ]] && break
echo "Background job ${pid} finished with code ${code}"
(( ${code} != 0 )) && result=1
done
# Return overall result
exit ${result}
I used this recently (thanks to Alnitak):
#!/bin/bash
# activate child monitoring
set -o monitor
# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!
# count, and kill when all done
c=0
function kill_on_count() {
# you could kill on whatever criterion you wish for
# I just counted to simulate bash's wait with no args
[ $c -eq 9 ] && kill $pid
c=$((c+1))
echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD
function save_status() {
local i=$1;
local rc=$2;
# do whatever, and here you know which one stopped
# but remember, you're called from a subshell
# so vars have their values at fork time
}
# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
(doCalculations $i; save_status $i $?) &
done
# wait for locking subprocess to be killed
wait $pid
echo
From there one can easily extrapolate, and have a trigger (touch a file, send a signal) and change the counting criteria (count files touched, or whatever) to respond to that trigger. Or if you just want 'any' non zero rc, just kill the lock from save_status.
Trapping the CHLD signal may not work, because some signals can be lost if they arrive simultaneously.
#!/bin/bash
trap 'rm -f $tmpfile' EXIT
tmpfile=$(mktemp)
doCalculations() {
echo start job $i...
sleep $((RANDOM % 5))
echo ...end job $i
exit $((RANDOM % 10))
}
number_of_jobs=10
for i in $( seq 1 $number_of_jobs )
do
( trap "echo job$i : exit value : \$? >> $tmpfile" EXIT; doCalculations ) &
done
wait
i=0
while read res; do
echo "$res"
let i++
done < "$tmpfile"
echo $i jobs done !!!
One solution for waiting on several subprocesses and exiting as soon as any one of them exits with a non-zero status code is to use 'wait -n':
#!/bin/bash
wait_for_pids()
{
for (( i = 1; i <= $#; i++ )) do
wait -n $@
status=$?
echo "received status: "$status
if [ $status -ne 0 ] && [ $status -ne 127 ]; then
exit 1
fi
done
}
sleep_for_10()
{
sleep 10
exit 10
}
sleep_for_20()
{
sleep 20
}
sleep_for_10 &
pid1=$!
sleep_for_20 &
pid2=$!
wait_for_pids $pid2 $pid1
Status code '127' is returned for a non-existing process, which means the child might have already exited.
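The 127 status can be observed directly with a short sketch. It waits on PID 1, chosen only because that process is never a child of the current shell:

```bash
#!/usr/bin/env bash
# PID 1 is never a child of this shell, so wait has no status to report for it.
rc=0
wait 1 2>/dev/null || rc=$?
echo "wait on a non-child returned: $rc"
```

One consequence is that a child which itself exits with 127 (for example, because its command was not found) cannot be told apart from this case by the status alone.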
I almost fell into the trap of using jobs -p to collect PIDs, which does not work if the child has already exited, as shown in the script below. The solution I picked was simply calling wait -n N times, where N is the number of children I have, which I happen to know deterministically.
#!/usr/bin/env bash
sleeper() {
echo "Sleeper $1"
sleep $2
echo "Exiting $1"
return $3
}
start_sleepers() {
sleeper 1 1 0 &
sleeper 2 2 $1 &
sleeper 3 5 0 &
sleeper 4 6 0 &
sleep 4
}
echo "Using jobs"
start_sleepers 1
pids=( $(jobs -p) )
echo "PIDS: ${pids[*]}"
for pid in "${pids[@]}"; do
wait "$pid"
echo "Exit code $?"
done
echo "Clearing other children"
wait -n; echo "Exit code $?"
wait -n; echo "Exit code $?"
echo "Waiting for N processes"
start_sleepers 2
for ignored in $(seq 1 4); do
wait -n
echo "Exit code $?"
done
Output:
Using jobs
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
PIDS: 56496 56497
Exiting 3
Exit code 0
Exiting 4
Exit code 0
Clearing other children
Exit code 0
Exit code 1
Waiting for N processes
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
Exit code 0
Exit code 2
Exiting 3
Exit code 0
Exiting 4
Exit code 0
How to wait in a bash script for several subprocesses spawned from that script to finish, and then return exit code !=0 when any of the subprocesses ends with code !=0?
Simple script:
#!/bin/bash
for i in `seq 0 9`; do
doCalculations $i &
done
wait
The above script will wait for all 10 spawned subprocesses, but it will always give exit status 0 (see help wait). How can I modify this script so it will discover exit statuses of spawned subprocesses and return exit code 1 when any of subprocesses ends with code !=0?
Is there any better solution for that than collecting PIDs of the subprocesses, wait for them in order and sum exit statuses?
wait also (optionally) takes the PID of the process to wait for, and with $! you get the PID of the last command launched in the background.
Modify the loop to store the PID of each spawned sub-process into an array, and then loop again waiting on each PID.
# run processes and store pids in array
for i in $n_procs; do
./procs[${i}] &
pids[${i}]=$!
done
# wait for all pids
for pid in ${pids[*]}; do
wait $pid
done
http://jeremy.zawodny.com/blog/archives/010717.html :
#!/bin/bash
FAIL=0
echo "starting"
./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &
for job in `jobs -p`
do
echo $job
wait $job || let "FAIL+=1"
done
echo $FAIL
if [ "$FAIL" == "0" ];
then
echo "YAY!"
else
echo "FAIL! ($FAIL)"
fi
Here is simple example using wait.
Run some processes:
$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &
Then wait for them with wait command:
$ wait < <(jobs -p)
Or just wait (without arguments) for all.
This will wait for all jobs in the background are completed.
If the -n option is supplied, waits for the next job to terminate and returns its exit status.
See: help wait and help jobs for syntax.
However the downside is that this will return on only the status of the last ID, so you need to check the status for each subprocess and store it in the variable.
Or make your calculation function to create some file on failure (empty or with fail log), then check of that file if exists, e.g.
$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait < <(jobs -p)
$ test -f fail && echo Calculation failed.
How about simply:
#!/bin/bash
pids=""
for i in `seq 0 9`; do
doCalculations $i &
pids="$pids $!"
done
wait $pids
...code continued here ...
Update:
As pointed out by multiple commenters, the above waits for all processes to complete before continuing, but does not exit and fail if one of them fails. It can be made to do so with the following modification suggested by @Bryan, @SamBrightman, and others:
#!/bin/bash
pids=""
RESULT=0
for i in `seq 0 9`; do
doCalculations $i &
pids="$pids $!"
done
for pid in $pids; do
wait $pid || let "RESULT=1"
done
if [ "$RESULT" == "1" ];
then
exit 1
fi
...code continued here ...
If you have GNU Parallel installed you can do:
# If doCalculations is a function
export -f doCalculations
seq 0 9 | parallel doCalculations {}
GNU Parallel will give you exit code:
0 - All jobs ran without error.
1-253 - Some of the jobs failed. The exit status gives the number of failed jobs
254 - More than 253 jobs failed.
255 - Other error.
Watch the intro videos to learn more: http://pi.dk/1
Here's what I've come up with so far. I would like to see how to interrupt the sleep command if a child terminates, so that one would not have to tune WAITALL_DELAY to one's usage.
waitall() { # PID...
## Wait for children to exit and indicate whether all exited with 0 status.
local errors=0
while :; do
debug "Processes remaining: $*"
for pid in "$@"; do
shift
if kill -0 "$pid" 2>/dev/null; then
debug "$pid is still alive."
set -- "$@" "$pid"
elif wait "$pid"; then
debug "$pid exited with zero exit status."
else
debug "$pid exited with non-zero exit status."
((++errors))
fi
done
(("$#" > 0)) || break
# TODO: how to interrupt this sleep when a child terminates?
sleep ${WAITALL_DELAY:-1}
done
((errors == 0))
}
debug() { echo "DEBUG: $*" >&2; }
pids=""
for t in 3 5 4; do
sleep "$t" &
pids="$pids $!"
done
waitall $pids
To parallelize this...
for i in $(whatever_list) ; do
do_something $i
done
Translate it to this...
for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
(
export -f do_something ## export functions (if needed)
export PATH ## export any variables that are required
xargs -I{} --max-procs 0 bash -c ' ## process in batches...
{
echo "processing {}" ## optional
do_something {}
}'
)
If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code from the sequence as a whole.
Exporting functions and variables may or may not be necessary, in any particular case.
You can set --max-procs based on how much parallelism you want (0 means "all at once").
GNU Parallel offers some additional features when used in place of xargs -- but it isn't always installed by default.
The for loop isn't strictly necessary in this example since echo $i is basically just regenerating the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on.
Bash string handling can be confusing -- I have found that using single quotes works best for wrapping non-trivial scripts.
You can easily interrupt the entire operation (using ^C or similar), unlike the more direct approach to Bash parallelism.
Here's a simplified working example...
for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
{
echo sleep {}
sleep 2s
}'
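To act on the non-zero exit code mentioned above, capture the status of xargs itself. This is a small sketch, assuming an xargs that supports -P (the short form of --max-procs, available in GNU and BSD/macOS xargs); the failing job is contrived for the demo.

```bash
#!/usr/bin/env bash
# Run four small jobs, two at a time; the job that receives "2" fails.
status=0
printf '%s\n' 0 1 2 3 |
    xargs -I{} -P 2 sh -c 'sleep 0.1; [ "{}" != 2 ]' || status=$?

# xargs reports a non-zero status if any invocation failed
if [ "$status" -ne 0 ]; then
    echo "at least one job failed (xargs exit status $status)"
fi
```

The exact non-zero value differs between xargs implementations, so the sketch only tests for "not zero".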
This is something that I use:
#wait for jobs
for job in `jobs -p`; do wait ${job}; done
This is an expansion on the most-upvoted answer, by @Luca Tettamanti, to make a fully-runnable example.
That answer left me wondering:
What type of variable is n_procs, and what does it contain? What type of variable is procs, and what does it contain? Can someone please update this answer to make it runnable by adding definitions for those variables? I don't understand how.
...and also:
How do you get the return code from the subprocess when it has completed (which is the whole crux of this question)?
Anyway, I figured it out, so here is a fully-runnable example.
Notes:
$! is how to obtain the PID (Process ID) of the last-executed sub-process.
Running any command with the & after it, like cmd &, for example, causes it to run in the background as a parallel subprocess alongside the main process.
myarray=() is how to create an array in bash.
To learn a tiny bit more about the wait built-in command, see help wait. See also, and especially, the official Bash user manual on Job Control built-ins, such as wait and jobs, here: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait.
Full, runnable program: wait for all processes to end
multi_process_program.sh (from my eRCaGuy_hello_world repo):
#!/usr/bin/env bash
# This is a special sleep function which returns the number of seconds slept as
# the "error code" or "return code" so that we can easily see that we are in
# fact actually obtaining the return code of each process as it finishes.
my_sleep() {
seconds_to_sleep="$1"
sleep "$seconds_to_sleep"
return "$seconds_to_sleep"
}
# Create an array of whatever commands you want to run as subprocesses
procs=() # bash array
procs+=("my_sleep 5")
procs+=("my_sleep 2")
procs+=("my_sleep 3")
procs+=("my_sleep 4")
num_procs=${#procs[@]} # number of processes
echo "num_procs = $num_procs"
# run commands as subprocesses and store pids in an array
pids=() # bash array
for (( i=0; i<"$num_procs"; i++ )); do
echo "cmd = ${procs[$i]}"
${procs[$i]} & # run the cmd as a subprocess
# store pid of last subprocess started; see:
# https://unix.stackexchange.com/a/30371/114401
pids+=("$!")
echo " pid = ${pids[$i]}"
done
# OPTION 1 (comment this option out if using Option 2 below): wait for all pids
for pid in "${pids[@]}"; do
wait "$pid"
return_code="$?"
echo "PID = $pid; return_code = $return_code"
done
echo "All $num_procs processes have ended."
Change the file above to be executable by running chmod +x multi_process_program.sh, then run it like this:
time ./multi_process_program.sh
Sample output. See how the output of the time command in the call shows it took 5.084sec to run. We were also able to successfully retrieve the return code from each subprocess.
eRCaGuy_hello_world/bash$ time ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
pid = 21694
cmd = my_sleep 2
pid = 21695
cmd = my_sleep 3
pid = 21697
cmd = my_sleep 4
pid = 21699
PID = 21694; return_code = 5
PID = 21695; return_code = 2
PID = 21697; return_code = 3
PID = 21699; return_code = 4
All 4 processes have ended.
PID 21694 is done; return_code = 5; 3 PIDs remaining.
PID 21695 is done; return_code = 2; 2 PIDs remaining.
PID 21697 is done; return_code = 3; 1 PIDs remaining.
PID 21699 is done; return_code = 4; 0 PIDs remaining.
real 0m5.084s
user 0m0.025s
sys 0m0.061s
Going further: determine live when each individual process ends
If you'd like to do some action as each process finishes, and you don't know when they will finish, you can poll in an infinite while loop to see when each process terminates, then do whatever action you want.
Simply comment out the "OPTION 1" block of code above, and replace it with this "OPTION 2" block instead:
# OR OPTION 2 (comment out Option 1 above if using Option 2): poll to detect
# when each process terminates, and print out when each process finishes!
while true; do
for i in "${!pids[@]}"; do
pid="${pids[$i]}"
# echo "pid = $pid" # debugging
# See if PID is still running; see my answer here:
# https://stackoverflow.com/a/71134379/4561887
ps --pid "$pid" > /dev/null
if [ "$?" -ne 0 ]; then
# PID doesn't exist anymore, meaning it terminated
# 1st, read its return code
wait "$pid"
return_code="$?"
# 2nd, remove this PID from the `pids` array by `unset`ting the
# element at this index; NB: due to how bash arrays work, this does
# NOT actually remove this element from the array. Rather, it
# removes its index from the `"${!pids[@]}"` list of indices,
# adjusts the array count (`"${#pids[@]}"`) accordingly, and it sets
# the value at this index to either a null value of some sort, or
# an empty string (I'm not exactly sure).
unset "pids[$i]"
num_pids="${#pids[@]}"
echo "PID $pid is done; return_code = $return_code;" \
"$num_pids PIDs remaining."
fi
done
# exit the while loop if the `pids` array is empty
if [ "${#pids[@]}" -eq 0 ]; then
break
fi
# Do some small sleep here to keep your polling loop from sucking up
# 100% of one of your CPUs unnecessarily. Sleeping allows other processes
# to run during this time.
sleep 0.1
done
Sample run and output of the full program with Option 1 commented out and Option 2 in-use:
eRCaGuy_hello_world/bash$ ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
pid = 22275
cmd = my_sleep 2
pid = 22276
cmd = my_sleep 3
pid = 22277
cmd = my_sleep 4
pid = 22280
PID 22276 is done; return_code = 2; 3 PIDs remaining.
PID 22277 is done; return_code = 3; 2 PIDs remaining.
PID 22280 is done; return_code = 4; 1 PIDs remaining.
PID 22275 is done; return_code = 5; 0 PIDs remaining.
Each of those PID XXXXX is done lines prints out live right after that process has terminated! Notice that even though the process for sleep 5 (PID 22275 in this case) was run first, it finished last, and we successfully detected each process right after it terminated. We also successfully detected each return code, just like in Option 1.
Other References:
*****+ [VERY HELPFUL] Get exit code of a background process - this answer taught me the key principle that (emphasis added):
wait <n> waits until the process with PID is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
In other words, it helped me know that even after the process is complete, you can still call wait on it to get its return code!
How to check if a process id (PID) exists
my answer
Remove an element from a Bash array - note that elements in a bash array aren't actually deleted, they are just "unset". See my comments in the code above for what that means.
How to use the command-line executable true to make an infinite while loop in bash: https://www.cyberciti.biz/faq/bash-infinite-loop/
I see lots of good examples listed on here, wanted to throw mine in as well.
#! /bin/bash
items="1 2 3 4 5 6"
pids=""
for item in $items; do
sleep $item &
pids+="$! "
done
for pid in $pids; do
wait $pid
status=$? # save it now: the [ ] test below overwrites $?
if [ $status -eq 0 ]; then
echo "SUCCESS - Job $pid exited with a status of $status"
else
echo "FAILED - Job $pid exited with a status of $status"
fi
done
I use something very similar to start/stop servers/services in parallel and check each exit status. Works great for me. Hope this helps someone out!
I don't believe it's possible with Bash's builtin functionality.
You can get notification when a child exits:
#!/bin/sh
set -o monitor # enable script job control
trap 'echo "child died"' CHLD
However there's no apparent way to get the child's exit status in the signal handler.
Getting that child status is usually the job of the wait family of functions in the lower level POSIX APIs. Unfortunately Bash's support for that is limited - you can wait for one specific child process (and get its exit status) or you can wait for all of them, and always get a 0 result.
What it appears impossible to do is the equivalent of waitpid(-1), which blocks until any child process returns.
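Newer Bash releases narrow this gap: since Bash 4.3, `wait -n` blocks until any one background job finishes and returns that job's exit status, which is close to what waitpid(-1) offers. A minimal sketch, assuming Bash 4.3 or later; `job` is a hypothetical helper that sleeps and then returns a chosen status.

```bash
#!/usr/bin/env bash
# Requires Bash 4.3 or later for `wait -n`.
job() { sleep "$1"; return "$2"; }   # hypothetical: sleep, then return a status

job 0.3 0 &
job 0.1 3 &
job 0.2 0 &

n=3            # number of jobs started above
failed=0
for ((k = 0; k < n; k++)); do
    # Blocks until any one background job ends and yields its status
    wait -n || failed=1
done
echo "failed=$failed"
```

Note that plain `wait -n` does not say which job finished, only how it finished.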
The following code will wait for completion of all calculations and return exit status 1 if any of doCalculations fails.
#!/bin/bash
for i in $(seq 0 9); do
(doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qv 0 && exit 1
Here's my version that works for multiple pids, logs warnings if execution takes too long, and stops the subprocesses if execution takes longer than a given value.
[EDIT] I have uploaded my newer implementation of WaitForTaskCompletion, called ExecTasks at https://github.com/deajan/ofunctions.
There's also a compat layer for WaitForTaskCompletion
[/EDIT]
function WaitForTaskCompletion {
local pids="${1}" # pids to wait for, separated by semicolons
local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
local caller_name="${4}" # Who called this function
local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors
Logger "${FUNCNAME[0]} called by [$caller_name]."
local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once
local log_ttime=0 # local time instance for comparison
local seconds_begin=$SECONDS # Seconds since the beginning of the script
local exec_time=0 # Seconds since the beginning of this function
local retval=0 # return value of monitored pid process
local errorcount=0 # Number of pids that finished with errors
local pidCount # number of given pids
IFS=';' read -a pidsArray <<< "$pids"
pidCount=${#pidsArray[@]}
while [ ${#pidsArray[@]} -gt 0 ]; do
newPidsArray=()
for pid in "${pidsArray[@]}"; do
if kill -0 $pid > /dev/null 2>&1; then
newPidsArray+=($pid)
else
wait $pid
result=$?
if [ $result -ne 0 ]; then
errorcount=$((errorcount+1))
Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
fi
fi
done
## Log a standby message every hour
exec_time=$(($SECONDS - $seconds_begin))
if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
if [ $log_ttime -ne $exec_time ]; then
log_ttime=$exec_time
Logger "Current tasks still running with pids [${pidsArray[@]}]."
fi
fi
if [ $exec_time -gt $soft_max_time ]; then
if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
soft_alert=1
SendAlert
fi
if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
kill -SIGTERM $pid
if [ $? == 0 ]; then
Logger "Task stopped successfully"
else
errorcount=$((errorcount+1))
fi
fi
fi
pidsArray=("${newPidsArray[@]}")
sleep 1
done
Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
Logger "Stopping execution."
exit 1337
else
return $errorcount
fi
}
# Just a plain stupid logging function to be replaced by yours
function Logger {
local value="${1}"
echo $value
}
Example: wait for all three processes to finish, log a warning if execution takes longer than 5 seconds, and stop all processes if execution takes longer than 120 seconds. Don't exit the program on failures.
function something {
sleep 10 &
pids="$!"
sleep 12 &
pids="$pids;$!"
sleep 9 &
pids="$pids;$!"
WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
something
If you have bash 4.2 or later available the following might be useful to you. It uses associative arrays to store task names and their "code" as well as task names and their pids. I have also built in a simple rate-limiting method which might come handy if your tasks consume a lot of CPU or I/O time and you want to limit the number of concurrent tasks.
The script launches all tasks in the first loop and consumes the results in the second one.
This is a bit overkill for simple cases but it allows for pretty neat stuff. For example one can store error messages for each task in another associative array and print them after everything has settled down.
#! /bin/bash
main () {
local -A pids=()
local -A tasks=([task1]="echo 1"
[task2]="echo 2"
[task3]="echo 3"
[task4]="false"
[task5]="echo 5"
[task6]="false")
local max_concurrent_tasks=2
for key in "${!tasks[@]}"; do
while [ $(jobs 2>&1 | grep -c Running) -ge "$max_concurrent_tasks" ]; do
sleep 1 # gnu sleep allows floating point here...
done
${tasks[$key]} &
pids+=(["$key"]="$!")
done
errors=0
for key in "${!tasks[@]}"; do
pid=${pids[$key]}
local cur_ret=0
if [ -z "$pid" ]; then
echo "No Job ID known for the $key process" # should never happen
cur_ret=1
else
wait $pid
cur_ret=$?
fi
if [ "$cur_ret" -ne 0 ]; then
errors=$(($errors + 1))
echo "$key (${tasks[$key]}) failed."
fi
done
return $errors
}
main
I've had a go at this and combined all the best parts from the other examples here. This script will execute the checkpids function when any background process exits, and output the exit status without resorting to polling.
#!/bin/bash
set -o monitor
sleep 2 &
sleep 4 && exit 1 &
sleep 6 &
pids=`jobs -p`
checkpids() {
for pid in $pids; do
if kill -0 $pid 2>/dev/null; then
echo $pid is still alive.
elif wait $pid; then
echo $pid exited with zero exit status.
else
echo $pid exited with non-zero exit status.
fi
done
echo
}
trap checkpids CHLD
wait
#!/bin/bash
set -m
for i in `seq 0 9`; do
doCalculations $i &
done
while fg; do true; done
set -m allows you to use fg & bg in a script
fg, in addition to putting the last process in the foreground, has the same exit status as the process it foregrounds
while fg will stop looping when any fg exits with a non-zero exit status
unfortunately this won't handle the case when a process in the background exits with a non-zero exit status. (the loop won't terminate immediately. it will wait for the previous processes to complete.)
Wait for all jobs and return the exit code of the last failing job. Unlike solutions above, this does not require pid saving, or modifying inner loops of scripts. Just bg away, and wait.
function wait_ex {
# this waits for all jobs and returns the exit code of the last failing job
ecode=0
while true; do
[ -z "$(jobs)" ] && break
wait -n
err="$?"
[ "$err" != "0" ] && ecode="$err"
done
return $ecode
}
EDIT: Fixed the bug where this could be fooled by a script that ran a command that didn't exist.
Just store the results out of the shell, e.g. in a file.
#!/bin/bash
tmp=/tmp/results
: > $tmp #clean the file
for i in `seq 0 9`; do
(doCalculations $i; echo $i:$?>>$tmp)&
done #iterate
wait #wait until all ready
sort $tmp | grep -v ':0' #... handle as required
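A slightly more defensive sketch of the same file-based idea follows. It assumes `mktemp` is available and uses a hypothetical `calc` function in place of `doCalculations`; the failing job is contrived.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for doCalculations: job 3 fails with status 4.
calc() { [ "$1" -ne 3 ] || return 4; }

results=$(mktemp)                  # private file instead of a fixed /tmp path

for i in 0 1 2 3 4; do
    # Each subshell appends one short "index:status" line
    ( status=0; calc "$i" || status=$?; echo "$i:$status" >> "$results" ) &
done
wait

# Keep only the lines whose status is not exactly 0
failures=$(grep -v ':0$' "$results" | sort)
rm -f "$results"
echo "failed jobs: ${failures:-none}"
```

Anchoring the pattern with `$` keeps the filter from depending on where `:0` happens to appear in a line.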
I've just been modifying a script to background and parallelise a process.
I did some experimenting (on Solaris with both bash and ksh) and discovered that 'wait' outputs the exit status if it's not zero, or a list of jobs that returned a non-zero exit status when no PID argument is provided. E.g.
Bash:
$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]- Exit 2 sleep 20 && exit 2
[2]+ Exit 1 sleep 10 && exit 1
Ksh:
$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]+ Done(2) sleep 20 && exit 2
[2]+ Done(1) sleep 10 && exit 1
This output is written to stderr, so a simple solution to the OPs example could be:
#!/bin/bash
trap "rm -f /tmp/x.$$" EXIT
for i in `seq 0 9`; do
doCalculations $i &
done
wait 2> /tmp/x.$$
if [ `wc -l < /tmp/x.$$` -gt 0 ] ; then
exit 1
fi
While this:
wait 2> >(wc -l)
will also return a count but without the tmp file. This might also be used this way, for example:
wait 2> >(if [ `wc -l` -gt 0 ] ; then echo "ERROR"; fi)
But this isn't very much more useful than the tmp file IMO. I couldn't find a useful way to avoid the tmp file whilst also avoiding running the "wait" in a subshell, which won't work at all.
I needed this, but the target process wasn't a child of current shell, in which case wait $PID doesn't work. I did find the following alternative instead:
while [ -e /proc/$PID ]; do sleep 0.1 ; done
That relies on the presence of procfs, which may not be available (Mac doesn't provide it for example). So for portability, you could use this instead:
while ps -p $PID >/dev/null ; do sleep 0.1 ; done
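Either loop spins forever if the process never exits. The sketch below adds a timeout; `wait_for_pid` is a hypothetical helper name, and it swaps `ps -p` for `kill -0`, which sends no signal and only tests whether the PID can be signalled (so it assumes you have permission to signal the target). The demo uses a child process purely for convenience; the function itself does not rely on that.

```bash
#!/usr/bin/env bash
# wait_for_pid PID [TIMEOUT_SECONDS]
# Polls until PID no longer exists; returns 1 if the timeout expires first.
wait_for_pid() {
    local pid=$1 timeout=${2:-10} waited=0
    while kill -0 "$pid" 2>/dev/null; do
        # waited counts 0.1 s ticks, hence the factor of 10
        [ "$waited" -ge "$((timeout * 10))" ] && return 1
        sleep 0.1
        waited=$((waited + 1))
    done
    return 0
}

# Demo: poll a short-lived process
sleep 1 &
pid=$!
wait_for_pid "$pid" 5 && echo "process $pid has exited"
```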
There are already a lot of answers here, but I am surprised no one seems to have suggested using arrays... So here's what I did - this might be useful to some in the future.
n=10 # run 10 jobs
c=0
PIDS=()
while true
my_function_or_command &
PID=$!
echo "Launched job as PID=$PID"
PIDS+=($PID)
(( c+=1 ))
# required to prevent any exit due to error
# caused by additional commands run which you
# may add when modifying this example
true
do
if (( c < n ))
then
continue
else
break
fi
done
# collect launched jobs
for pid in "${PIDS[@]}"
do
wait $pid || echo "failed job PID=$pid"
done
This works, and should be just as good as, if not better than, @HoverHell's answer!
#!/usr/bin/env bash
set -m # allow for job control
EXIT_CODE=0; # exit code of overall script
function foo() {
echo "CHLD exit code is $1"
echo "CHLD pid is $2"
echo $(jobs -l)
for job in `jobs -p`; do
echo "PID => ${job}"
wait ${job} || { echo "At least one test failed with exit code => $?"; EXIT_CODE=1; }
done
}
trap 'foo $? $$' CHLD
DIRN=$(dirname "$0");
commands=(
"{ echo "foo" && exit 4; }"
"{ echo "bar" && exit 3; }"
"{ echo "baz" && exit 5; }"
)
clen=`expr "${#commands[@]}" - 1` # get length of commands - 1
for i in `seq 0 "$clen"`; do
(echo "${commands[$i]}" | bash) & # run the command via bash in subshell
echo "$i ith command has been issued as a background job"
done
# wait for all to finish
wait;
echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end
and of course, I have immortalized this script, in an NPM project which allows you to run bash commands in parallel, useful for testing:
https://github.com/ORESoftware/generic-subshell
Exactly for this purpose I wrote a bash function called :for.
Note: :for not only preserves and returns the exit code of the failing function, but also terminates all parallel running instance. Which might not be needed in this case.
#!/usr/bin/env bash
# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
local pids=("$@")
[ ${#pids} -eq 0 ] && return $?
trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM
for pid in "${pids[@]}"; do
wait "${pid}" || return $?
done
trap - INT RETURN TERM
}
# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
local f="${1}" && shift
local i=0
local pids=()
for arg in "$@"; do
( ${f} "${arg}" ) &
pids+=("$!")
if [ ! -z ${FOR_PARALLEL+x} ]; then
(( i=(i+1)%${FOR_PARALLEL} ))
if (( i==0 )) ;then
:wait "${pids[@]}" || return $?
pids=()
fi
fi
done && [ ${#pids} -eq 0 ] || :wait "${pids[@]}" || return $?
}
usage
for.sh:
#!/usr/bin/env bash
set -e
# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)
msg="You should see this three times"
:(){
i="${1}" && shift
echo "${msg}"
sleep 1
if [ "$i" == "1" ]; then sleep 1
elif [ "$i" == "2" ]; then false
elif [ "$i" == "3" ]; then
sleep 3
echo "You should never see this"
fi
} && :for : 1 2 3 || exit $?
echo "You should never see this"
$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times
1
set -e
fail () {
touch .failure
}
expect () {
wait
if [ -f .failure ]; then
rm -f .failure
exit 1
fi
}
sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail
expect
The set -e at top makes your script stop on failure.
expect will return 1 if any subjob failed.
There can be a case where the process completes before the script waits for it. If wait is called for a PID that the shell no longer tracks (for example, one that has already been waited for, or one that is not a child of this shell), it fails with an error like "pid is not a child of this shell". To avoid such cases, the following function can be used to find out whether the process is complete or not:
isProcessComplete(){
PID=$1
while [ -e /proc/$PID ]
do
echo "Process: $PID is still running"
sleep 5
done
echo "Process $PID has finished"
}
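For a direct child that has not been waited for yet, Bash normally keeps the exit status after the child ends, so `wait` can still be called late. A small sketch of that behaviour, with a contrived child that exits with status 7:

```bash
#!/usr/bin/env bash
( exit 7 ) &        # child that finishes almost immediately
pid=$!
sleep 0.5           # by now the child has almost certainly exited

status=0
wait "$pid" || status=$?   # the shell still has the child's status on record
echo "status of $pid: $status"
```

The /proc polling loop above is mainly useful for processes that the current shell did not start.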
Starting with Bash 5.1, there is a nice new way of waiting for and handling the results of multiple background jobs thanks to the introduction of wait -p:
#!/usr/bin/env bash
# Spawn background jobs
for ((i=0; i < 10; i++)); do
secs=$((RANDOM % 10)); code=$((RANDOM % 256))
(sleep ${secs}; exit ${code}) &
echo "Started background job (pid: $!, sleep: ${secs}, code: ${code})"
done
# Wait for background jobs, print individual results, determine overall result
result=0
while true; do
wait -n -p pid; code=$?
[[ -z "${pid}" ]] && break
echo "Background job ${pid} finished with code ${code}"
(( ${code} != 0 )) && result=1
done
# Return overall result
exit ${result}
I used this recently (thanks to Alnitak):
#!/bin/bash
# activate child monitoring
set -o monitor
# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!
# count, and kill when all done
c=0
function kill_on_count() {
# you could kill on whatever criterion you wish for
# I just counted to simulate bash's wait with no args
[ $c -eq 9 ] && kill $pid
c=$((c+1))
echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD
function save_status() {
local i=$1;
local rc=$2;
# do whatever, and here you know which one stopped
# but remember, you're called from a subshell
# so vars have their values at fork time
}
# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
(doCalculations $i; save_status $i $?) &
done
# wait for locking subprocess to be killed
wait $pid
echo
From there one can easily extrapolate, and have a trigger (touch a file, send a signal) and change the counting criteria (count files touched, or whatever) to respond to that trigger. Or if you just want 'any' non zero rc, just kill the lock from save_status.
Trapping CHLD signal may not work because you can lose some signals if they arrived simultaneously.
#!/bin/bash
trap 'rm -f $tmpfile' EXIT
tmpfile=$(mktemp)
doCalculations() {
echo start job $i...
sleep $((RANDOM % 5))
echo ...end job $i
exit $((RANDOM % 10))
}
number_of_jobs=10
for i in $( seq 1 $number_of_jobs )
do
( trap "echo job$i : exit value : \$? >> $tmpfile" EXIT; doCalculations ) &
done
wait
i=0
while read res; do
echo "$res"
let i++
done < "$tmpfile"
echo $i jobs done !!!
A solution that waits for several subprocesses and exits when any one of them exits with a non-zero status code is to use 'wait -n':
#!/bin/bash
wait_for_pids()
{
for (( i = 1; i <= $#; i++ )) do
wait -n "$@"
status=$?
echo "received status: "$status
if [ $status -ne 0 ] && [ $status -ne 127 ]; then
exit 1
fi
done
}
sleep_for_10()
{
sleep 10
exit 10
}
sleep_for_20()
{
sleep 20
}
sleep_for_10 &
pid1=$!
sleep_for_20 &
pid2=$!
wait_for_pids $pid2 $pid1
Status code 127 indicates a non-existent process, which means the child might have already exited.
I almost fell into the trap of using jobs -p to collect PIDs, which does not work if the child has already exited, as shown in the script below. The solution I picked was simply calling wait -n N times, where N is the number of children I have, which I happen to know deterministically.
#!/usr/bin/env bash
sleeper() {
echo "Sleeper $1"
sleep $2
echo "Exiting $1"
return $3
}
start_sleepers() {
sleeper 1 1 0 &
sleeper 2 2 $1 &
sleeper 3 5 0 &
sleeper 4 6 0 &
sleep 4
}
echo "Using jobs"
start_sleepers 1
pids=( $(jobs -p) )
echo "PIDS: ${pids[*]}"
for pid in "${pids[@]}"; do
wait "$pid"
echo "Exit code $?"
done
echo "Clearing other children"
wait -n; echo "Exit code $?"
wait -n; echo "Exit code $?"
echo "Waiting for N processes"
start_sleepers 2
for ignored in $(seq 1 4); do
wait -n
echo "Exit code $?"
done
Output:
Using jobs
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
PIDS: 56496 56497
Exiting 3
Exit code 0
Exiting 4
Exit code 0
Clearing other children
Exit code 0
Exit code 1
Waiting for N processes
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
Exit code 0
Exit code 2
Exiting 3
Exit code 0
Exiting 4
Exit code 0
I have a CI script that I want to speed up by running several things in the background. I want the script to wait for all processes and check each one to see if it failed.
Here is a simplification:
#!/bin/bash
set -e
bg()
{
sleep .$[ ( $RANDOM % 10 ) + 1 ]s
}
bg2()
{
sleep .$[ ( $RANDOM % 10 ) + 1 ]s
exit 1
}
bg & # will pass after a random delay
bg2 & # will fail after a random delay
# I want the output of the program to be a failure since bg2 fails
Yes.
You can use the wait command in bash to wait for one or more sub-processes to terminate, in which case you provide the PIDs to wait on. wait can also take no arguments, in which case it waits for all background processes to terminate.
Example:-
#!/bin/bash
sleep 3 &
wait "$!" # Feeding the non-zero process-id as argument to wait command.
# Can also be stored in a variable as pid=$(echo $!)
# Waits until the process 'sleep 3' is completed. Here the wait
# on a single process is done by capturing its process id
echo "I am waking up"
sleep 4 &
sleep 5 &
wait # Without specifying the id, just 'wait' waits until all jobs
# started on the background is complete.
# (or) simply
# wait < <(jobs -p) # To wait on all background jobs started with (job &)
echo "I woke up again"
Update:-
To identify the jobs when they fail, it is best to loop over the list of background jobs and log their exit codes for visibility. Thanks to chepner for the wonderful suggestion. It goes like this:
#!/bin/bash
for p in $(jobs -p)
do
wait "$p" || { echo "job $p failed" >&2; exit 1; }
done
#!/bin/bash
set -e
bg()
{
sleep .$[ ( $RANDOM % 10 ) + 1 ]s
}
bg2()
{
sleep .$[ ( $RANDOM % 10 ) + 1 ]s
exit 1
}
export -f bg
export -f bg2
parallel ::: bg bg2 || echo $? of the jobs failed
I am trying to write a .sh file that runs many programs simultaneously
I tried this
prog1
prog2
But that runs prog1 then waits until prog1 ends and then starts prog2...
So how can I run them in parallel?
How about:
prog1 & prog2 && fg
This will:
Start prog1.
Send it to background, but keep printing its output.
Start prog2, and keep it in foreground, so you can close it with ctrl-c.
When you close prog2, you'll return to prog1's foreground, so you can also close it with ctrl-c.
To run multiple programs in parallel:
prog1 &
prog2 &
If you need your script to wait for the programs to finish, you can add:
wait
at the point where you want the script to wait for them.
If you want to be able to easily run and kill multiple process with ctrl-c, this is my favorite method: spawn multiple background processes in a (…) subshell, and trap SIGINT to execute kill 0, which will kill everything spawned in the subshell group:
(trap 'kill 0' SIGINT; prog1 & prog2 & prog3)
You can have complex process execution structures, and everything will close with a single ctrl-c (just make sure the last process is run in the foreground, i.e., don't include a & after prog1.3):
(trap 'kill 0' SIGINT; prog1.1 && prog1.2 & (prog2.1 | prog2.2 || prog2.3) & prog1.3)
If there is a chance the last command might exit early and you want to keep everything else running, add wait as the last command. In the following example, sleep 2 would have exited first, killing sleep 4 before it finished; adding wait allows both to run to completion:
(trap 'kill 0' SIGINT; sleep 4 & sleep 2 & wait)
You can use wait:
some_command &
P1=$!
other_command &
P2=$!
wait $P1 $P2
It assigns the background program PIDs to variables ($! is the last launched process' PID), then the wait command waits for them. It is nice because if you kill the script, it kills the processes too!
With GNU Parallel http://www.gnu.org/software/parallel/ it is as easy as:
(echo prog1; echo prog2) | parallel
Or if you prefer:
parallel ::: prog1 prog2
Learn more:
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line
will love you for it.
Read: Ole Tange, GNU Parallel 2018 (Ole Tange, 2018).
xargs -P <n> allows you to run <n> commands in parallel.
While -P is a nonstandard option, both the GNU (Linux) and macOS/BSD implementations support it.
The following example:
runs at most 3 commands in parallel at a time,
with additional commands starting only when a previously launched process terminates.
time xargs -P 3 -I {} sh -c 'eval "$1"' - {} <<'EOF'
sleep 1; echo 1
sleep 2; echo 2
sleep 3; echo 3
echo 4
EOF
The output looks something like:
1 # output from 1st command
4 # output from *last* command, which started as soon as the count dropped below 3
2 # output from 2nd command
3 # output from 3rd command
real 0m3.012s
user 0m0.011s
sys 0m0.008s
The timing shows that the commands were run in parallel (the last command was launched only after the first of the original 3 terminated, but executed very quickly).
The xargs command itself won't return until all commands have finished, but you can execute it in the background by terminating it with control operator & and then using the wait builtin to wait for the entire xargs command to finish.
{
xargs -P 3 -I {} sh -c 'eval "$1"' - {} <<'EOF'
sleep 1; echo 1
sleep 2; echo 2
sleep 3; echo 3
echo 4
EOF
} &
# Script execution continues here while `xargs` is running
# in the background.
echo "Waiting for commands to finish..."
# Wait for `xargs` to finish, via special variable $!, which contains
# the PID of the most recently started background process.
wait $!
Note:
BSD/macOS xargs requires you to specify the count of commands to run in parallel explicitly, whereas GNU xargs allows you to specify -P 0 to run as many as possible in parallel.
Output from the processes run in parallel arrives as it is being generated, so it will be unpredictably interleaved.
GNU parallel, as mentioned in Ole's answer (does not come standard with most platforms), conveniently serializes (groups) the output on a per-process basis and offers many more advanced features.
#!/bin/bash
prog1 2> .errorprog1.log & prog2 2> .errorprog2.log &
Redirect errors to separate logs.
Here is a function I use in order to run at max n process in parallel (n=4 in the example):
max_children=4
function parallel {
local time1=$(date +"%H:%M:%S")
local time2=""
# for the sake of the example, I'm using $2 as a description, you may be interested in other description
echo "starting $2 ($time1)..."
"$@" && time2=$(date +"%H:%M:%S") && echo "finishing $2 ($time1 -- $time2)..." &
local my_pid=$$
local children=$(ps -eo ppid | grep -w $my_pid | wc -w)
children=$((children-1))
if [[ $children -ge $max_children ]]; then
wait -n
fi
}
parallel sleep 5
parallel sleep 6
parallel sleep 7
parallel sleep 8
parallel sleep 9
wait
If max_children is set to the number of cores, this function will try to avoid idle cores.
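A possible variation on the same idea, sketched under the assumption of bash 4.3 or later (needed for wait -n): count the shell's own running jobs with `jobs -rp` instead of parsing ps output. The name run_limited and the scratch file are made up for this example.

```bash
#!/usr/bin/env bash
max_children=2
results=$(mktemp)   # scratch file, only for this demonstration

run_limited() {
    # Block while the number of running background jobs is at the limit.
    while (( $(jobs -rp | wc -l) >= max_children )); do
        wait -n     # returns as soon as any one job finishes (bash 4.3+)
    done
    "$@" &
}

for n in 1 2 3 4 5; do
    run_limited sh -c "sleep 0.2; echo $n >> '$results'"
done
wait                # wait for the jobs that are still running

completed=$(wc -l < "$results")
echo "completed: $completed"
rm -f "$results"
```

Because a new job starts as soon as any slot frees up, no slot sits idle waiting for a slower sibling.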
There is a very useful program called nohup.
nohup - run a command immune to hangups, with output to a non-tty
This works beautifully for me (found here):
sh -c 'command1 & command2 & command3 & wait'
It outputs all the logs of each command intermingled (which is what I wanted), and all are killed with ctrl+c.
I had a similar situation recently where I needed to run multiple programs at the same time, redirect their outputs to separated log files and wait for them to finish and I ended up with something like that:
#!/bin/bash
# Add the full path processes to run to the array
PROCESSES_TO_RUN=("/home/joao/Code/test/prog_1/prog1" \
"/home/joao/Code/test/prog_2/prog2")
# You can keep adding processes to the array...
for i in ${PROCESSES_TO_RUN[@]}; do
${i%/*}/./${i##*/} > ${i}.log 2>&1 &
# ${i%/*} -> Get folder name until the /
# ${i##*/} -> Get the filename after the /
done
# Wait for the processes to finish
wait
Source: http://joaoperibeiro.com/execute-multiple-programs-and-redirect-their-outputs-linux/
You can try ppss (abandoned). ppss is rather powerful - you can even create a mini-cluster.
xargs -P can also be useful if you've got a batch of embarrassingly parallel processing to do.
Process Spawning Manager
Sure, technically these are processes, and this program should really be called a process spawning manager, but this is only due to the way that BASH works when it forks using the ampersand, it uses the fork() or perhaps clone() system call which clones into a separate memory space, rather than something like pthread_create() which would share memory. If BASH supported the latter, each "sequence of execution" would operate just the same and could be termed to be traditional threads whilst gaining a more efficient memory footprint. Functionally however it works the same, though a bit more difficult since GLOBAL variables are not available in each worker clone hence the use of the inter-process communication file and the rudimentary flock semaphore to manage critical sections. Forking from BASH of course is the basic answer here but I feel as if people know that but are really looking to manage what is spawned rather than just fork it and forget it. This demonstrates a way to manage up to 200 instances of forked processes all accessing a single resource. Clearly this is overkill but I enjoyed writing it so I kept on. Increase the size of your terminal accordingly. I hope you find this useful.
ME=$(basename $0)
IPC="/tmp/$ME.ipc" #interprocess communication file (global thread accounting stats)
DBG=/tmp/$ME.log
echo 0 > $IPC #initalize counter
F1=thread
SPAWNED=0
COMPLETE=0
SPAWN=1000 #number of jobs to process
SPEEDFACTOR=1 #dynamically compensates for execution time
THREADLIMIT=50 #maximum concurrent threads
TPS=1 #threads per second delay
THREADCOUNT=0 #number of running threads
SCALE="scale=5" #controls bc's precision
START=$(date +%s) #whence we began
MAXTHREADDUR=6 #maximum thread life span - demo mode
LOWER=$[$THREADLIMIT*100*90/10000] #90% worker utilization threshold
UPPER=$[$THREADLIMIT*100*95/10000] #95% worker utilization threshold
DELTA=10 #initial percent speed change
threadspeed() #dynamically adjust spawn rate based on worker utilization
{
#vaguely assumes thread execution average will be consistent
THREADCOUNT=$(threadcount)
if [ $THREADCOUNT -ge $LOWER ] && [ $THREADCOUNT -le $UPPER ] ;then
echo SPEED HOLD >> $DBG
return
elif [ $THREADCOUNT -lt $LOWER ] ;then
#if maxthread is free speed up
SPEEDFACTOR=$(echo "$SCALE;$SPEEDFACTOR*(1-($DELTA/100))"|bc)
echo SPEED UP $DELTA%>> $DBG
elif [ $THREADCOUNT -gt $UPPER ];then
#if maxthread is active then slow down
SPEEDFACTOR=$(echo "$SCALE;$SPEEDFACTOR*(1+($DELTA/100))"|bc)
DELTA=1 #begin fine grain control
echo SLOW DOWN $DELTA%>> $DBG
fi
echo SPEEDFACTOR $SPEEDFACTOR >> $DBG
#average thread duration (total elapsed time / number of threads completed)
#if threads completed is zero (less than 100), default to maxdelay/2 maxthreads
COMPLETE=$(cat $IPC)
if [ -z $COMPLETE ];then
echo BAD IPC READ ============================================== >> $DBG
return
fi
#echo Threads COMPLETE $COMPLETE >> $DBG
if [ $COMPLETE -lt 100 ];then
AVGTHREAD=$(echo "$SCALE;$MAXTHREADDUR/2"|bc)
else
ELAPSED=$[$(date +%s)-$START]
#echo Elapsed Time $ELAPSED >> $DBG
AVGTHREAD=$(echo "$SCALE;$ELAPSED/$COMPLETE*$THREADLIMIT"|bc)
fi
echo AVGTHREAD Duration is $AVGTHREAD >> $DBG
#calculate timing to achieve spawning each workers fast enough
# to utilize threadlimit - average time it takes to complete one thread / max number of threads
TPS=$(echo "$SCALE;($AVGTHREAD/$THREADLIMIT)*$SPEEDFACTOR"|bc)
#TPS=$(echo "$SCALE;$AVGTHREAD/$THREADLIMIT"|bc) # maintains pretty good
#echo TPS $TPS >> $DBG
}
function plot()
{
echo -en \\033[${2}\;${1}H
if [ -n "$3" ];then
if [[ $4 = "good" ]];then
echo -en "\\033[1;32m"
elif [[ $4 = "warn" ]];then
echo -en "\\033[1;33m"
elif [[ $4 = "fail" ]];then
echo -en "\\033[1;31m"
elif [[ $4 = "crit" ]];then
echo -en "\\033[1;31;4m"
fi
fi
echo -n "$3"
echo -en "\\033[0;39m"
}
trackthread() #displays thread status
{
WORKERID=$1
THREADID=$2
ACTION=$3 #setactive | setfree | update
AGE=$4
TS=$(date +%s)
COL=$[(($WORKERID-1)/50)*40]
ROW=$[(($WORKERID-1)%50)+1]
case $ACTION in
"setactive" )
touch /tmp/$ME.$F1$WORKERID #redundant - see main loop
#echo created file $ME.$F1$WORKERID >> $DBG
plot $COL $ROW "Worker$WORKERID: ACTIVE-TID:$THREADID INIT " good
;;
"update" )
plot $COL $ROW "Worker$WORKERID: ACTIVE-TID:$THREADID AGE:$AGE" warn
;;
"setfree" )
plot $COL $ROW "Worker$WORKERID: FREE " fail
rm /tmp/$ME.$F1$WORKERID
;;
* )
;;
esac
}
getfreeworkerid()
{
for i in $(seq 1 $[$THREADLIMIT+1])
do
if [ ! -e /tmp/$ME.$F1$i ];then
#echo "getfreeworkerid returned $i" >> $DBG
break
fi
done
if [ $i -eq $[$THREADLIMIT+1] ];then
#echo "no free threads" >> $DBG
echo 0
#exit
else
echo $i
fi
}
updateIPC()
{
COMPLETE=$(cat $IPC) #read IPC
COMPLETE=$[$COMPLETE+1] #increment IPC
echo $COMPLETE > $IPC #write back to IPC
}
worker()
{
WORKERID=$1
THREADID=$2
#echo "new worker WORKERID:$WORKERID THREADID:$THREADID" >> $DBG
#accessing common terminal requires critical blocking section
(flock -x -w 10 201
trackthread $WORKERID $THREADID setactive
)201>/tmp/$ME.lock
let "RND = $RANDOM % $MAXTHREADDUR +1"
for s in $(seq 1 $RND) #simulate random lifespan
do
sleep 1;
(flock -x -w 10 201
trackthread $WORKERID $THREADID update $s
)201>/tmp/$ME.lock
done
(flock -x -w 10 201
trackthread $WORKERID $THREADID setfree
)201>/tmp/$ME.lock
(flock -x -w 10 201
updateIPC
)201>/tmp/$ME.lock
}
threadcount()
{
TC=$(ls /tmp/$ME.$F1* 2> /dev/null | wc -l)
#echo threadcount is $TC >> $DBG
THREADCOUNT=$TC
echo $TC
}
status()
{
#summary status line
COMPLETE=$(cat $IPC)
plot 1 $[$THREADLIMIT+2] "WORKERS $(threadcount)/$THREADLIMIT SPAWNED $SPAWNED/$SPAWN COMPLETE $COMPLETE/$SPAWN SF=$SPEEDFACTOR TIMING=$TPS"
echo -en '\033[K' #clear to end of line
}
function main()
{
while [ $SPAWNED -lt $SPAWN ]
do
while [ $(threadcount) -lt $THREADLIMIT ] && [ $SPAWNED -lt $SPAWN ]
do
WID=$(getfreeworkerid)
worker $WID $SPAWNED &
touch /tmp/$ME.$F1$WID #if this loops faster than file creation in the worker thread it steps on itself, thread tracking is best in main loop
SPAWNED=$[$SPAWNED+1]
(flock -x -w 10 201
status
)201>/tmp/$ME.lock
sleep $TPS
if ((! $[$SPAWNED%100]));then
#rethink thread timing every 100 threads
threadspeed
fi
done
sleep $TPS
done
while [ "$(threadcount)" -gt 0 ]
do
(flock -x -w 10 201
status
)201>/tmp/$ME.lock
sleep 1;
done
status
}
clear
threadspeed
main
wait
status
echo
Since for some reason I can't use wait, I came up with this solution:
# create a hashmap of the tasks name -> its command
declare -A tasks=(
["Sleep 3 seconds"]="sleep 3"
["Check network"]="ping imdb.com"
["List dir"]="ls -la"
)
# execute each task in the background, redirecting their output to a custom file descriptor
fd=10
for task in "${!tasks[@]}"; do
script="${tasks[${task}]}"
eval "exec $fd< <(${script} 2>&1 || (echo $task failed with exit code \${?}! && touch tasks_failed))"
((fd+=1))
done
# print the outputs of the tasks and wait for them to finish
fd=10
for task in "${!tasks[@]}"; do
cat <&$fd
((fd+=1))
done
# determine the exit status
# by checking whether the file "tasks_failed" has been created
if [ -e tasks_failed ]; then
echo "Task(s) failed!"
exit 1
else
echo "All tasks finished without an error!"
exit 0
fi
Your script should look like:
prog1 &
prog2 &
.
.
progn &
wait
progn+1 &
progn+2 &
.
.
This assumes your system can take n jobs at a time. Use wait to run only n jobs at a time.
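The same batching can be written as a loop rather than spelled out by hand. This is only a sketch: do_job is a hypothetical stand-in for prog1 ... progn, and the scratch file exists just to show that every job ran.

```bash
#!/usr/bin/env bash
batch_size=3
log=$(mktemp)                 # scratch file, only for this demonstration

do_job() {                    # hypothetical stand-in for prog1 ... progn
    sleep 0.1
    echo "job $1 done" >> "$log"
}

count=0
for job in 1 2 3 4 5 6 7; do
    do_job "$job" &
    count=$((count + 1))
    # After every batch_size launches, pause until the whole batch is done.
    if [ $((count % batch_size)) -eq 0 ]; then
        wait
    fi
done
wait                          # pick up the final, possibly partial, batch

finished=$(wc -l < "$log")
echo "finished: $finished"
rm -f "$log"
```

Note that each batch takes as long as its slowest job, so some slots stay idle while the batch finishes.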
If you're:
On Mac and have iTerm
Want to start various processes that stay open long-term / until Ctrl+C
Want to be able to easily see the output from each process
Want to be able to easily stop a specific process with Ctrl+C
One option is scripting the terminal itself if your use case is more app monitoring / management.
For example I recently did the following. Granted it's Mac specific, iTerm specific, and relies on a deprecated Apple Script API (iTerm has a newer Python option). It doesn't win any elegance awards but gets the job done.
#!/bin/sh
root_path="~/root-path"
auth_api_script="$root_path/auth-path/auth-script.sh"
admin_api_proj="$root_path/admin-path/admin.csproj"
agent_proj="$root_path/agent-path/agent.csproj"
dashboard_path="$root_path/dashboard-web"
osascript <<THEEND
tell application "iTerm"
set newWindow to (create window with default profile)
tell current session of newWindow
set name to "Auth API"
write text "pushd $root_path && $auth_api_script"
end tell
tell newWindow
set newTab to (create tab with default profile)
tell current session of newTab
set name to "Admin API"
write text "dotnet run --debug -p $admin_api_proj"
end tell
end tell
tell newWindow
set newTab to (create tab with default profile)
tell current session of newTab
set name to "Agent"
write text "dotnet run --debug -p $agent_proj"
end tell
end tell
tell newWindow
set newTab to (create tab with default profile)
tell current session of newTab
set name to "Dashboard"
write text "pushd $dashboard_path; ng serve -o"
end tell
end tell
end tell
THEEND
If you have a GUI terminal, you could spawn a new tabbed terminal instance for each process you want to run in parallel.
This has the benefit that each program runs in its own tab where it can be interacted with and managed independently of the other running programs.
For example, on Ubuntu 20.04:
gnome-terminal --tab -- bash -c 'prog1'
gnome-terminal --tab -- bash -c 'prog2'
To run certain programs or other commands sequentially, you can add ;
gnome-terminal --tab -- bash -c 'prog1_1; prog1_2'
gnome-terminal --tab -- bash -c 'prog2'
I've found that for some programs, the terminal closes before they start up. For these programs I append the terminal command with ; wait or ; sleep 1
gnome-terminal --tab -- bash -c 'prog1; wait'
For Mac OS, you would have to find an equivalent command for the terminal you are using - I haven't tested on Mac OS since I don't own a Mac.
There're a lot of interesting answers here, but I took inspiration from this answer and put together a simple script that runs multiple processes in parallel and handles the results once they're done. You can find it in this gist, or below:
#!/usr/bin/env bash
# inspired by https://stackoverflow.com/a/29535256/2860309
pids=""
failures=0
function my_process() {
seconds_to_sleep=$1
exit_code=$2
sleep "$seconds_to_sleep"
return "$exit_code"
}
(my_process 1 0) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 1 second to success"
(my_process 1 1) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 1 second to failure"
(my_process 2 0) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 2 seconds to success"
(my_process 2 1) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 2 seconds to failure"
echo "..."
for pid in $pids; do
if wait "$pid"; then
echo "Process $pid succeeded"
else
echo "Process $pid failed"
failures=$((failures+1))
fi
done
echo
echo "${failures} failures detected"
This results in:
86400: 1 second to success
86401: 1 second to failure
86402: 2 seconds to success
86404: 2 seconds to failure
...
Process 86400 succeeded
Process 86401 failed
Process 86402 succeeded
Process 86404 failed
2 failures detected
With bashj ( https://sourceforge.net/projects/bashj/ ) , you should be able to run not only multiple processes (the way others suggested) but also multiple Threads in one JVM controlled from your script. But of course this requires a java JDK. Threads consume less resource than processes.
Here is a working code:
#!/usr/bin/bashj
#!java
public static int cnt=0;
private static void loop() {u.p("java says cnt= "+(cnt++));u.sleep(1.0);}
public static void startThread()
{(new Thread(() -> {while (true) {loop();}})).start();}
#!bashj
j.startThread()
while [ j.cnt -lt 4 ]
do
echo "bash views cnt=" j.cnt
sleep 0.5
done
How to wait in a bash script for several subprocesses spawned from that script to finish, and then return exit code !=0 when any of the subprocesses ends with code !=0?
Simple script:
#!/bin/bash
for i in `seq 0 9`; do
doCalculations $i &
done
wait
The above script will wait for all 10 spawned subprocesses, but it will always give exit status 0 (see help wait). How can I modify this script so it will discover exit statuses of spawned subprocesses and return exit code 1 when any of subprocesses ends with code !=0?
Is there any better solution for that than collecting PIDs of the subprocesses, wait for them in order and sum exit statuses?
wait also (optionally) takes the PID of the process to wait for, and with $! you get the PID of the last command launched in the background.
Modify the loop to store the PID of each spawned sub-process into an array, and then loop again waiting on each PID.
# run processes and store pids in array
for i in $n_procs; do
./procs[${i}] &
pids[${i}]=$!
done
# wait for all pids
for pid in ${pids[*]}; do
wait $pid
done
http://jeremy.zawodny.com/blog/archives/010717.html :
#!/bin/bash
FAIL=0
echo "starting"
./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &
for job in `jobs -p`
do
echo $job
wait $job || let "FAIL+=1"
done
echo $FAIL
if [ "$FAIL" == "0" ];
then
echo "YAY!"
else
echo "FAIL! ($FAIL)"
fi
Here is simple example using wait.
Run some processes:
$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &
Then wait for them with the wait command:
$ wait $(jobs -p)
Or just wait (without arguments) for all.
This waits until all background jobs have completed.
If the -n option is supplied, waits for the next job to terminate and returns its exit status.
See: help wait and help jobs for syntax.
However, the downside is that this returns only the status of the last ID, so you need to check the status of each subprocess and store it in a variable.
Or make your calculation function create a file on failure (empty, or containing a failure log), then check whether that file exists, e.g.
$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait $(jobs -p)
$ test -f fail && echo Calculation failed.
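The -n option mentioned above (bash 4.3 or later) can also collect the exit status of each job as it finishes, without tracking PIDs or marker files. A minimal sketch, where do_calc is a hypothetical stand-in for the real work and is made to fail for one input:

```bash
#!/usr/bin/env bash
do_calc() {                 # hypothetical: sleeps, then fails only for input 3
    sleep "0.$(( $1 * 2 ))"
    [ "$1" -ne 3 ]
}

jobs_started=0
for i in 1 2 3 4; do
    do_calc "$i" &
    jobs_started=$((jobs_started + 1))
done

failed=0
for (( n = 0; n < jobs_started; n++ )); do
    # Each call returns the exit status of the next job to finish.
    wait -n || failed=$((failed + 1))
done

echo "failed jobs: $failed"
```

A real script would probably end with something like `[ "$failed" -eq 0 ] || exit 1`.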
How about simply:
#!/bin/bash
pids=""
for i in `seq 0 9`; do
doCalculations $i &
pids="$pids $!"
done
wait $pids
...code continued here ...
Update:
As pointed out by multiple commenters, the above waits for all processes to complete before continuing, but does not exit and fail if one of them fails. It can be made to do so with the following modification suggested by @Bryan, @SamBrightman, and others:
#!/bin/bash
pids=""
RESULT=0
for i in `seq 0 9`; do
doCalculations $i &
pids="$pids $!"
done
for pid in $pids; do
wait $pid || let "RESULT=1"
done
if [ "$RESULT" == "1" ];
then
exit 1
fi
...code continued here ...
If you have GNU Parallel installed you can do:
# If doCalculations is a function
export -f doCalculations
seq 0 9 | parallel doCalculations {}
GNU Parallel will give you exit code:
0 - All jobs ran without error.
1-253 - Some of the jobs failed. The exit status gives the number of failed jobs
254 - More than 253 jobs failed.
255 - Other error.
Watch the intro videos to learn more: http://pi.dk/1
Here's what I've come up with so far. I would like to see how to interrupt the sleep command if a child terminates, so that one would not have to tune WAITALL_DELAY to one's usage.
waitall() { # PID...
## Wait for children to exit and indicate whether all exited with 0 status.
local errors=0
while :; do
debug "Processes remaining: $*"
for pid in "$@"; do
shift
if kill -0 "$pid" 2>/dev/null; then
debug "$pid is still alive."
set -- "$@" "$pid"
elif wait "$pid"; then
debug "$pid exited with zero exit status."
else
debug "$pid exited with non-zero exit status."
((++errors))
fi
done
(("$#" > 0)) || break
# TODO: how to interrupt this sleep when a child terminates?
sleep ${WAITALL_DELAY:-1}
done
((errors == 0))
}
debug() { echo "DEBUG: $*" >&2; }
pids=""
for t in 3 5 4; do
sleep "$t" &
pids="$pids $!"
done
waitall $pids
To parallelize this...
for i in $(whatever_list) ; do
do_something $i
done
Translate it to this...
for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
(
export -f do_something ## export functions (if needed)
export PATH ## export any variables that are required
xargs -I{} --max-procs 0 bash -c ' ## process in batches...
{
echo "processing {}" ## optional
do_something {}
}'
)
If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code from the sequence as a whole.
Exporting functions and variables may or may not be necessary, in any particular case.
You can set --max-procs based on how much parallelism you want (0 means "all at once").
GNU Parallel offers some additional features when used in place of xargs -- but it isn't always installed by default.
The for loop isn't strictly necessary in this example since echo $i is basically just regenerating the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on.
Bash string handling can be confusing -- I have found that using single quotes works best for wrapping non-trivial scripts.
You can easily interrupt the entire operation (using ^C or similar), unlike the more direct approach to Bash parallelism.
Here's a simplified working example...
for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
{
echo sleep {}
sleep 2s
}'
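To see the non-zero exit code mentioned above in action, here is a small sketch. It assumes an xargs that supports -P (the short form of --max-procs). Each input line is used as the exit status of a child sh, purely for illustration. GNU xargs reports 123 in this situation; other implementations may use a different non-zero value, so the sketch only distinguishes zero from non-zero.

```bash
#!/usr/bin/env bash
# All children succeed: xargs itself exits 0.
printf '%s\n' 0 0 0 | xargs -P 2 -I{} sh -c 'exit {}'
all_ok=$?

# One child fails: the others still run, but xargs exits non-zero.
printf '%s\n' 0 1 0 | xargs -P 2 -I{} sh -c 'exit {}'
one_failed=$?

echo "all_ok=$all_ok one_failed=$one_failed"
```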
This is something that I use:
#wait for jobs
for job in `jobs -p`; do wait ${job}; done
This is an expansion on the most-upvoted answer, by @Luca Tettamanti, to make a fully-runnable example.
That answer left me wondering:
What type of variable is n_procs, and what does it contain? What type of variable is procs, and what does it contain? Can someone please update this answer to make it runnable by adding definitions for those variables? I don't understand how.
...and also:
How do you get the return code from the subprocess when it has completed (which is the whole crux of this question)?
Anyway, I figured it out, so here is a fully-runnable example.
Notes:
$! is how to obtain the PID (Process ID) of the last-executed sub-process.
Running any command with the & after it, like cmd &, for example, causes it to run in the background as a parallel subprocess with the main process.
myarray=() is how to create an array in bash.
To learn a tiny bit more about the wait built-in command, see help wait. See also, and especially, the official Bash user manual on Job Control built-ins, such as wait and jobs, here: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait.
Full, runnable program: wait for all processes to end
multi_process_program.sh (from my eRCaGuy_hello_world repo):
#!/usr/bin/env bash
# This is a special sleep function which returns the number of seconds slept as
# the "error code" or return code" so that we can easily see that we are in
# fact actually obtaining the return code of each process as it finishes.
my_sleep() {
seconds_to_sleep="$1"
sleep "$seconds_to_sleep"
return "$seconds_to_sleep"
}
# Create an array of whatever commands you want to run as subprocesses
procs=() # bash array
procs+=("my_sleep 5")
procs+=("my_sleep 2")
procs+=("my_sleep 3")
procs+=("my_sleep 4")
num_procs=${#procs[@]} # number of processes
echo "num_procs = $num_procs"
# run commands as subprocesses and store pids in an array
pids=() # bash array
for (( i=0; i<"$num_procs"; i++ )); do
echo "cmd = ${procs[$i]}"
${procs[$i]} & # run the cmd as a subprocess
# store pid of last subprocess started; see:
# https://unix.stackexchange.com/a/30371/114401
pids+=("$!")
echo " pid = ${pids[$i]}"
done
# OPTION 1 (comment this option out if using Option 2 below): wait for all pids
for pid in "${pids[@]}"; do
wait "$pid"
return_code="$?"
echo "PID = $pid; return_code = $return_code"
done
echo "All $num_procs processes have ended."
Change the file above to be executable by running chmod +x multi_process_program.sh, then run it like this:
time ./multi_process_program.sh
Sample output. See how the output of the time command in the call shows it took 5.084sec to run. We were also able to successfully retrieve the return code from each subprocess.
eRCaGuy_hello_world/bash$ time ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
pid = 21694
cmd = my_sleep 2
pid = 21695
cmd = my_sleep 3
pid = 21697
cmd = my_sleep 4
pid = 21699
PID = 21694; return_code = 5
PID = 21695; return_code = 2
PID = 21697; return_code = 3
PID = 21699; return_code = 4
All 4 processes have ended.
PID 21694 is done; return_code = 5; 3 PIDs remaining.
PID 21695 is done; return_code = 2; 2 PIDs remaining.
PID 21697 is done; return_code = 3; 1 PIDs remaining.
PID 21699 is done; return_code = 4; 0 PIDs remaining.
real 0m5.084s
user 0m0.025s
sys 0m0.061s
Going further: determine live when each individual process ends
If you'd like to do some action as each process finishes, and you don't know when they will finish, you can poll in an infinite while loop to see when each process terminates, then do whatever action you want.
Simply comment out the "OPTION 1" block of code above, and replace it with this "OPTION 2" block instead:
# OR OPTION 2 (comment out Option 1 above if using Option 2): poll to detect
# when each process terminates, and print out when each process finishes!
while true; do
for i in "${!pids[@]}"; do
pid="${pids[$i]}"
# echo "pid = $pid" # debugging
# See if PID is still running; see my answer here:
# https://stackoverflow.com/a/71134379/4561887
ps --pid "$pid" > /dev/null
if [ "$?" -ne 0 ]; then
# PID doesn't exist anymore, meaning it terminated
# 1st, read its return code
wait "$pid"
return_code="$?"
# 2nd, remove this PID from the `pids` array by `unset`ting the
# element at this index; NB: due to how bash arrays work, this does
# NOT actually remove this element from the array. Rather, it
# removes its index from the `"${!pids[@]}"` list of indices,
# adjusts the array count(`"${#pids[@]}"`) accordingly, and it sets
# the value at this index to either a null value of some sort, or
# an empty string (I'm not exactly sure).
unset "pids[$i]"
num_pids="${#pids[@]}"
echo "PID $pid is done; return_code = $return_code;" \
"$num_pids PIDs remaining."
fi
done
# exit the while loop if the `pids` array is empty
if [ "${#pids[@]}" -eq 0 ]; then
break
fi
# Do some small sleep here to keep your polling loop from sucking up
# 100% of one of your CPUs unnecessarily. Sleeping allows other processes
# to run during this time.
sleep 0.1
done
Sample run and output of the full program with Option 1 commented out and Option 2 in-use:
eRCaGuy_hello_world/bash$ ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
pid = 22275
cmd = my_sleep 2
pid = 22276
cmd = my_sleep 3
pid = 22277
cmd = my_sleep 4
pid = 22280
PID 22276 is done; return_code = 2; 3 PIDs remaining.
PID 22277 is done; return_code = 3; 2 PIDs remaining.
PID 22280 is done; return_code = 4; 1 PIDs remaining.
PID 22275 is done; return_code = 5; 0 PIDs remaining.
Each of those PID XXXXX is done lines prints out live right after that process has terminated! Notice that even though the process for sleep 5 (PID 22275 in this case) was run first, it finished last, and we successfully detected each process right after it terminated. We also successfully detected each return code, just like in Option 1.
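One portability note on the polling check: `ps --pid` is a procps (Linux) option and may not be available elsewhere, for example with the BSD ps shipped on macOS. A hedged alternative is `kill -0`, which sends no signal and only reports whether the PID can be signalled. The helper name is_running is made up for this sketch:

```bash
#!/usr/bin/env bash
is_running() {              # hypothetical helper: succeeds while PID $1 exists
    kill -0 "$1" 2>/dev/null
}

sleep 0.3 &
pid=$!

if is_running "$pid"; then before=running; else before=gone; fi
wait "$pid"; return_code=$?
if is_running "$pid"; then after=running; else after=gone; fi

echo "before=$before after=$after return_code=$return_code"
```

In the Option 2 loop, `is_running "$pid"` would take the place of the `ps --pid "$pid"` call and the `$?` test that follows it.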
Other References:
*****+ [VERY HELPFUL] Get exit code of a background process - this answer taught me the key principle that (emphasis added):
wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
In other words, it helped me know that even after the process is complete, you can still call wait on it to get its return code!
How to check if a process id (PID) exists
my answer
Remove an element from a Bash array - note that elements in a bash array aren't actually deleted, they are just "unset". See my comments in the code above for what that means.
How to use the command-line executable true to make an infinite while loop in bash: https://www.cyberciti.biz/faq/bash-infinite-loop/
I see lots of good examples listed on here, wanted to throw mine in as well.
#! /bin/bash
items="1 2 3 4 5 6"
pids=""
for item in $items; do
sleep $item &
pids+="$! "
done
for pid in $pids; do
wait $pid
status=$? # save it now: $? is overwritten by the [ ] test below
if [ $status -eq 0 ]; then
echo "SUCCESS - Job $pid exited with a status of $status"
else
echo "FAILED - Job $pid exited with a status of $status"
fi
done
I use something very similar to start/stop servers/services in parallel and check each exit status. Works great for me. Hope this helps someone out!
I don't believe it's possible with Bash's builtin functionality.
You can get notification when a child exits:
#!/bin/sh
set -o monitor # enable script job control
trap 'echo "child died"' CHLD
However there's no apparent way to get the child's exit status in the signal handler.
Getting that child status is usually the job of the wait family of functions in the lower level POSIX APIs. Unfortunately Bash's support for that is limited - you can wait for one specific child process (and get its exit status) or you can wait for all of them, and always get a 0 result.
What it appears impossible to do is the equivalent of waitpid(-1), which blocks until any child process returns.
The following code will wait for completion of all calculations and return exit status 1 if any of doCalculations fails.
#!/bin/bash
for i in $(seq 0 9); do
(doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qv 0 && exit 1
Here's my version that works for multiple pids, logs warnings if execution takes too long, and stops the subprocesses if execution takes longer than a given value.
[EDIT] I have uploaded my newer implementation of WaitForTaskCompletion, called ExecTasks at https://github.com/deajan/ofunctions.
There's also a compat layer for WaitForTaskCompletion
[/EDIT]
function WaitForTaskCompletion {
local pids="${1}" # pids to wait for, separated by semi-colon
local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
local caller_name="${4}" # Who called this function
local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors
Logger "${FUNCNAME[0]} called by [$caller_name]."
local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once
local log_ttime=0 # local time instance for comparison
local seconds_begin=$SECONDS # Seconds since the beginning of the script
local exec_time=0 # Seconds since the beginning of this function
local retval=0 # return value of monitored pid process
local errorcount=0 # Number of pids that finished with errors
local pidCount # number of given pids
IFS=';' read -a pidsArray <<< "$pids"
pidCount=${#pidsArray[@]}
while [ ${#pidsArray[@]} -gt 0 ]; do
newPidsArray=()
for pid in "${pidsArray[@]}"; do
if kill -0 $pid > /dev/null 2>&1; then
newPidsArray+=($pid)
else
wait $pid
result=$?
if [ $result -ne 0 ]; then
errorcount=$((errorcount+1))
Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
fi
fi
done
## Log a standby message every hour
exec_time=$(($SECONDS - $seconds_begin))
if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
if [ $log_ttime -ne $exec_time ]; then
log_ttime=$exec_time
Logger "Current tasks still running with pids [${pidsArray[@]}]."
fi
fi
if [ $exec_time -gt $soft_max_time ]; then
if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
soft_alert=1
SendAlert
fi
if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
kill -SIGTERM $pid
if [ $? == 0 ]; then
Logger "Task stopped successfully"
else
errorcount=$((errorcount+1))
fi
fi
fi
pidsArray=("${newPidsArray[@]}")
sleep 1
done
Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
Logger "Stopping execution."
exit 1337
else
return $errorcount
fi
}
# Just a plain stupid logging function to be replaced by yours
function Logger {
local value="${1}"
echo "$value"
}
Example: wait for all three processes to finish, log a warning if execution takes longer than 5 seconds, and stop all processes if execution takes longer than 120 seconds. Don't exit the program on failures.
function something {
sleep 10 &
pids="$!"
sleep 12 &
pids="$pids;$!"
sleep 9 &
pids="$pids;$!"
WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
something
If you have bash 4.2 or later available, the following might be useful to you. It uses associative arrays to store task names and their "code" as well as task names and their pids. I have also built in a simple rate-limiting method, which might come in handy if your tasks consume a lot of CPU or I/O time and you want to limit the number of concurrent tasks.
The script launches all tasks in the first loop and consumes the results in the second one.
This is a bit overkill for simple cases but it allows for pretty neat stuff. For example one can store error messages for each task in another associative array and print them after everything has settled down.
#! /bin/bash
main () {
local -A pids=()
local -A tasks=([task1]="echo 1"
[task2]="echo 2"
[task3]="echo 3"
[task4]="false"
[task5]="echo 5"
[task6]="false")
local max_concurrent_tasks=2
for key in "${!tasks[@]}"; do
while [ $(jobs 2>&1 | grep -c Running) -ge "$max_concurrent_tasks" ]; do
sleep 1 # gnu sleep allows floating point here...
done
${tasks[$key]} &
pids+=(["$key"]="$!")
done
errors=0
for key in "${!tasks[@]}"; do
pid=${pids[$key]}
local cur_ret=0
if [ -z "$pid" ]; then
echo "No Job ID known for the $key process" # should never happen
cur_ret=1
else
wait $pid
cur_ret=$?
fi
if [ "$cur_ret" -ne 0 ]; then
errors=$(($errors + 1))
echo "$key (${tasks[$key]}) failed."
fi
done
return $errors
}
main
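The idea of keeping error messages per task could look roughly like the sketch below. It is not part of the script above: the task names and commands are made up, and it assumes each task writes its error message to stderr, which is captured in a temporary file and loaded into a second associative array after the task has been waited for.

```shell
#!/usr/bin/env bash
# Sketch only (bash 4+): capture each task's stderr in a temp file and
# collect the messages of failed tasks in an associative array.
# Task names and commands are hypothetical.
declare -A tasks=(
    [ok]="true"
    [missing]="ls /nonexistent-path-for-demo"
)
declare -A pids=() logs=() messages=()

# Launch every task in the background, one log file per task
for key in "${!tasks[@]}"; do
    logs[$key]=$(mktemp)
    ${tasks[$key]} >/dev/null 2>"${logs[$key]}" &
    pids[$key]=$!
done

# Collect exit statuses; keep the stderr text of failed tasks only
errors=0
for key in "${!tasks[@]}"; do
    if ! wait "${pids[$key]}"; then
        errors=$((errors + 1))
        messages[$key]=$(<"${logs[$key]}")
    fi
    rm -f "${logs[$key]}"
done

# Print the stored messages once everything has settled down
for key in "${!messages[@]}"; do
    echo "$key failed: ${messages[$key]}"
done
```

Going through temporary files is a design choice here: a background task runs in its own process, so it cannot write into the parent's array directly.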
I've had a go at this and combined all the best parts from the other examples here. This script will execute the checkpids function when any background process exits, and output the exit status without resorting to polling.
#!/bin/bash
set -o monitor
sleep 2 &
sleep 4 && exit 1 &
sleep 6 &
pids=`jobs -p`
checkpids() {
for pid in $pids; do
if kill -0 $pid 2>/dev/null; then
echo $pid is still alive.
elif wait $pid; then
echo $pid exited with zero exit status.
else
echo $pid exited with non-zero exit status.
fi
done
echo
}
trap checkpids CHLD
wait
#!/bin/bash
set -m
for i in `seq 0 9`; do
doCalculations $i &
done
while fg; do true; done
set -m allows you to use fg and bg in a script.
fg, in addition to putting the last process in the foreground, has the same exit status as the process it foregrounds.
while fg stops looping as soon as any fg exits with a non-zero exit status.
Unfortunately, this does not react immediately when a process that is still in the background exits with a non-zero exit status: the loop only terminates after the previously foregrounded processes have completed.
Wait for all jobs and return the exit code of the last failing job. Unlike solutions above, this does not require pid saving, or modifying inner loops of scripts. Just bg away, and wait.
function wait_ex {
# this waits for all jobs and returns the exit code of the last failing job
ecode=0
while true; do
[ -z "$(jobs)" ] && break
wait -n
err="$?"
[ "$err" != "0" ] && ecode="$err"
done
return $ecode
}
EDIT: Fixed the bug where this could be fooled by a script that ran a command that didn't exist.
Just store the results out of the shell, e.g. in a file.
#!/bin/bash
tmp=/tmp/results
: > $tmp #clean the file
for i in `seq 0 9`; do
(doCalculations $i; echo $i:$?>>$tmp)&
done #iterate
wait #wait until all ready
sort $tmp | grep -v ':0$' #... handle as required
I've just been modifying a script to background and parallelise a process.
I did some experimenting (on Solaris with both bash and ksh) and discovered that 'wait' outputs the exit status if it's not zero, or a list of jobs that returned a non-zero exit status when no PID argument is provided. E.g.
Bash:
$ sleep 20 && exit 2 &
$ sleep 10 && exit 1 &
$ wait
[1]- Exit 2 sleep 20 && exit 2
[2]+ Exit 1 sleep 10 && exit 1
Ksh:
$ sleep 20 && exit 2 &
$ sleep 10 && exit 1 &
$ wait
[1]+ Done(2) sleep 20 && exit 2
[2]+ Done(1) sleep 10 && exit 1
This output is written to stderr, so a simple solution to the OPs example could be:
#!/bin/bash
trap "rm -f /tmp/x.$$" EXIT
for i in `seq 0 9`; do
doCalculations $i &
done
wait 2> /tmp/x.$$
if [ `wc -l < /tmp/x.$$` -gt 0 ] ; then
exit 1
fi
While this:
wait 2> >(wc -l)
will also return a count but without the tmp file. This might also be used this way, for example:
wait 2> >(if [ `wc -l` -gt 0 ] ; then echo "ERROR"; fi)
But this isn't much more useful than the tmp file IMO. I couldn't find a useful way to avoid the tmp file whilst also avoiding running the "wait" in a subshell, which won't work at all.
I needed this, but the target process wasn't a child of current shell, in which case wait $PID doesn't work. I did find the following alternative instead:
while [ -e /proc/$PID ]; do sleep 0.1 ; done
That relies on the presence of procfs, which may not be available (Mac doesn't provide it for example). So for portability, you could use this instead:
while ps -p $PID >/dev/null ; do sleep 0.1 ; done
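A variant of the same polling idea with a timeout might look like the sketch below. wait_for_pid is a hypothetical helper name, and kill -0 is swapped in for ps -p: it sends no signal and only checks whether the PID can be signalled, so it also fails for processes owned by another user. Like the loops above, it only tells you that the process is gone, not its exit status.

```shell
#!/usr/bin/env bash
# Sketch only: poll a PID that need not be a child of this shell and
# give up after a timeout (in seconds, default 10).
# wait_for_pid is a hypothetical name, not an existing command.
wait_for_pid() {
    local pid=$1 timeout=${2:-10} waited=0
    while kill -0 "$pid" 2>/dev/null; do
        (( waited >= timeout )) && return 1   # still running after timeout
        sleep 1
        waited=$((waited + 1))
    done
    return 0                                  # process is gone
}

# Usage: wait_for_pid "$PID" 30 || echo "PID $PID still running after 30s"
```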
There are already a lot of answers here, but I am surprised no one seems to have suggested using arrays... So here's what I did - this might be useful to some in the future.
n=10 # run 10 jobs
c=0
PIDS=()
while true
my_function_or_command &
PID=$!
echo "Launched job as PID=$PID"
PIDS+=($PID)
(( c+=1 ))
# required to prevent any exit due to error
# caused by additional commands run which you
# may add when modifying this example
true
do
if (( c < n ))
then
continue
else
break
fi
done
# collect launched jobs
for pid in "${PIDS[@]}"
do
wait $pid || echo "failed job PID=$pid"
done
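The same array-based approach can probably be written more compactly with a counting for loop. The sketch below is an illustration, not the answer's code: job is a hypothetical stand-in for my_function_or_command, and every third job is made to fail so there is something to count.

```shell
#!/usr/bin/env bash
# Sketch only: launch n jobs, remember each PID in an array,
# then wait for each PID and count the failures.
# job() is a hypothetical placeholder; jobs 3 and 6 fail on purpose.
job() {
    sleep 0.1
    (( $1 % 3 != 0 ))   # exit status 1 when $1 is a multiple of 3
}

n=6
pids=()
for (( i = 1; i <= n; i++ )); do
    job "$i" &
    pids+=("$!")
done

failed=0
for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))
done
echo "$failed of $n jobs failed"
# A real script would typically end with: (( failed == 0 )) || exit 1
```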
This works, and should be just as good as, if not better than, @HoverHell's answer!
#!/usr/bin/env bash
set -m # allow for job control
EXIT_CODE=0; # exit code of overall script
function foo() {
echo "CHLD exit code is $1"
echo "CHLD pid is $2"
echo $(jobs -l)
for job in `jobs -p`; do
echo "PID => ${job}"
wait ${job} || { echo "At least one test failed with exit code => $?"; EXIT_CODE=1; }
done
}
trap 'foo $? $$' CHLD
DIRN=$(dirname "$0");
commands=(
"{ echo "foo" && exit 4; }"
"{ echo "bar" && exit 3; }"
"{ echo "baz" && exit 5; }"
)
clen=`expr "${#commands[@]}" - 1` # get length of commands - 1
for i in `seq 0 "$clen"`; do
(echo "${commands[$i]}" | bash) & # run the command via bash in subshell
echo "$i ith command has been issued as a background job"
done
# wait for all to finish
wait;
echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end
and of course, I have immortalized this script, in an NPM project which allows you to run bash commands in parallel, useful for testing:
https://github.com/ORESoftware/generic-subshell
Exactly for this purpose I wrote a bash function called :for.
Note: :for not only preserves and returns the exit code of the failing function, but also terminates all instances running in parallel, which might not be needed in this case.
#!/usr/bin/env bash
# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
local pids=("$@")
[ ${#pids} -eq 0 ] && return $?
trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM
for pid in "${pids[@]}"; do
wait "${pid}" || return $?
done
trap - INT RETURN TERM
}
# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
local f="${1}" && shift
local i=0
local pids=()
for arg in "$@"; do
( ${f} "${arg}" ) &
pids+=("$!")
if [ ! -z ${FOR_PARALLEL+x} ]; then
(( i=(i+1)%${FOR_PARALLEL} ))
if (( i==0 )) ;then
:wait "${pids[@]}" || return $?
pids=()
fi
fi
done && [ ${#pids} -eq 0 ] || :wait "${pids[@]}" || return $?
}
usage
for.sh:
#!/usr/bin/env bash
set -e
# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)
msg="You should see this three times"
:(){
i="${1}" && shift
echo "${msg}"
sleep 1
if [ "$i" == "1" ]; then sleep 1
elif [ "$i" == "2" ]; then false
elif [ "$i" == "3" ]; then
sleep 3
echo "You should never see this"
fi
} && :for : 1 2 3 || exit $?
echo "You should never see this"
$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times
1
set -e
fail () {
touch .failure
}
expect () {
wait
if [ -f .failure ]; then
rm -f .failure
exit 1
fi
}
sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail
expect
The set -e at top makes your script stop on failure.
expect makes the script exit with status 1 if any subjob failed.
There can be a case where the process completes before you start waiting for it. If we call wait for a process that has already finished, it can trigger an error such as "pid is not a child of this shell". To avoid such cases, the following function can be used to check whether the process has completed:
isProcessComplete(){
PID=$1
while [ -e /proc/$PID ]
do
echo "Process: $PID is still running"
sleep 5
done
echo "Process $PID has finished"
}
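For direct children of the current shell, bash normally keeps the exit status of a finished background job until it is collected, so the error mostly concerns PIDs the shell did not start itself. A minimal check, assuming a non-interactive bash script:

```shell
#!/usr/bin/env bash
# Sketch only: a direct child that exits well before wait is called.
( exit 3 ) &
pid=$!
sleep 1            # by now the child has finished
wait "$pid"        # still returns the child's exit status
rc=$?
echo "child $pid exited with status $rc"
```

The /proc polling above remains useful when the PID belongs to a process that is not a child of the shell.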
Starting with Bash 5.1, there is a nice new way of waiting for and handling the results of multiple background jobs thanks to the introduction of wait -p:
#!/usr/bin/env bash
# Spawn background jobs
for ((i=0; i < 10; i++)); do
secs=$((RANDOM % 10)); code=$((RANDOM % 256))
(sleep ${secs}; exit ${code}) &
echo "Started background job (pid: $!, sleep: ${secs}, code: ${code})"
done
# Wait for background jobs, print individual results, determine overall result
result=0
while true; do
wait -n -p pid; code=$?
[[ -z "${pid}" ]] && break
echo "Background job ${pid} finished with code ${code}"
(( ${code} != 0 )) && result=1
done
# Return overall result
exit ${result}
I used this recently (thanks to Alnitak):
#!/bin/bash
# activate child monitoring
set -o monitor
# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!
# count, and kill when all done
c=0
function kill_on_count() {
# you could kill on whatever criterion you wish for
# I just counted to simulate bash's wait with no args
[ $c -eq 9 ] && kill $pid
c=$((c+1))
echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD
function save_status() {
local i=$1;
local rc=$2;
# do whatever, and here you know which one stopped
# but remember, you're called from a subshell
# so vars have their values at fork time
}
# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
(doCalculations $i; save_status $i $?) &
done
# wait for locking subprocess to be killed
wait $pid
echo
From there one can easily extrapolate, and have a trigger (touch a file, send a signal) and change the counting criteria (count files touched, or whatever) to respond to that trigger. Or if you just want 'any' non zero rc, just kill the lock from save_status.
Trapping the CHLD signal may not work, because you can lose some signals if they arrive simultaneously.
#!/bin/bash
trap 'rm -f $tmpfile' EXIT
tmpfile=$(mktemp)
doCalculations() {
echo start job $i...
sleep $((RANDOM % 5))
echo ...end job $i
exit $((RANDOM % 10))
}
number_of_jobs=10
for i in $( seq 1 $number_of_jobs )
do
( trap "echo job$i : exit value : \$? >> $tmpfile" EXIT; doCalculations ) &
done
wait
i=0
while read res; do
echo "$res"
let i++
done < "$tmpfile"
echo $i jobs done !!!
One solution for waiting on several subprocesses and exiting as soon as any one of them exits with a non-zero status code is to use 'wait -n':
#!/bin/bash
wait_for_pids()
{
for (( i = 1; i <= $#; i++ )) do
wait -n "$@"
status=$?
echo "received status: "$status
if [ $status -ne 0 ] && [ $status -ne 127 ]; then
exit 1
fi
done
}
sleep_for_10()
{
sleep 10
exit 10
}
sleep_for_20()
{
sleep 20
}
sleep_for_10 &
pid1=$!
sleep_for_20 &
pid2=$!
wait_for_pids $pid2 $pid1
Status code '127' indicates a non-existing process, which means the child might already have exited.
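The 127 status is easy to reproduce. As a small sketch, the shell's own PID is used below simply because it is guaranteed not to be one of the shell's children:

```shell
#!/usr/bin/env bash
# Sketch only: wait on a PID that is not a child of this shell.
# bash prints "not a child of this shell" on stderr and returns 127.
wait "$$" 2>/dev/null
rc=$?
echo "wait returned $rc"
```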
I almost fell into the trap of using jobs -p to collect PIDs, which does not work if the child has already exited, as shown in the script below. The solution I picked was simply calling wait -n N times, where N is the number of children I have, which I happen to know deterministically.
#!/usr/bin/env bash
sleeper() {
echo "Sleeper $1"
sleep $2
echo "Exiting $1"
return $3
}
start_sleepers() {
sleeper 1 1 0 &
sleeper 2 2 $1 &
sleeper 3 5 0 &
sleeper 4 6 0 &
sleep 4
}
echo "Using jobs"
start_sleepers 1
pids=( $(jobs -p) )
echo "PIDS: ${pids[*]}"
for pid in "${pids[@]}"; do
wait "$pid"
echo "Exit code $?"
done
echo "Clearing other children"
wait -n; echo "Exit code $?"
wait -n; echo "Exit code $?"
echo "Waiting for N processes"
start_sleepers 2
for ignored in $(seq 1 4); do
wait -n
echo "Exit code $?"
done
Output:
Using jobs
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
PIDS: 56496 56497
Exiting 3
Exit code 0
Exiting 4
Exit code 0
Clearing other children
Exit code 0
Exit code 1
Waiting for N processes
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
Exit code 0
Exit code 2
Exiting 3
Exit code 0
Exiting 4
Exit code 0
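To turn the "call wait -n N times" approach into a single overall exit status, something like the sketch below might do. wait_n_children is a hypothetical helper name, the three subshells are placeholders for real work, and wait -n needs a reasonably recent bash (4.3 or later).

```shell
#!/usr/bin/env bash
# Sketch only: call wait -n once per started child and report
# failure if any of them returned a non-zero status.
# wait_n_children is a hypothetical helper; $1 is the number of children.
wait_n_children() {
    local n=$1 failed=0 i
    for (( i = 0; i < n; i++ )); do
        wait -n || failed=1
    done
    return "$failed"
}

# Placeholder children: the second one fails
( sleep 0.2; exit 0 ) &
( sleep 0.1; exit 2 ) &
( sleep 0.3; exit 0 ) &

wait_n_children 3
overall=$?
echo "overall status: $overall"
```

As in the answer above, this relies on knowing the number of children up front rather than asking jobs -p for their PIDs.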