I'm trying to run 3 commands in parallel in bash shell:
$ (first command) & (second command) & (third command) & wait
The problem with this is that if, say, the first command fails, the overall exit code is still 0 (I guess because the final wait succeeds).
The desired behavior is that if one of the commands fails, the exit code will be non-zero (and ideally, the other running commands will be stopped).
How could I achieve this?
Please note that I want to run the commands in parallel!
The best I can think of is:
first & p1=$!
second & p2=$!
...
wait $p1 && wait $p2 && ..
or
wait $p1 || { kill $p2 $p3; exit 1; }
...
However, this still enforces an order in which the processes are checked: if the third fails immediately, you won't notice until the first and second finish.
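If your bash is 4.3 or newer, wait -n sidesteps the ordering problem entirely, since it returns as soon as any one job finishes. A minimal sketch (first/second/third are placeholders for your commands):
#!/bin/bash
# Requires bash >= 4.3 for wait -n.
first & second & third &
for _ in 1 2 3; do
    # wait -n returns the exit status of whichever job finishes next
    wait -n || { code=$?; kill $(jobs -p) 2>/dev/null; exit "$code"; }
done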
This might work for you:
parallel -j3 --halt 2 <list_of_commands.txt
This will run 3 commands in parallel.
If any running job fails it will kill the remaining running jobs and then stop, returning the exit code of the failing job.
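For instance, with a list_of_commands.txt like this (contents made up for illustration):
sleep 2; echo job1 done
sleep 1; exit 7
sleep 3; echo job3 done
the second job fails after a second, the other two get killed, and parallel itself should exit with code 7, per the --halt 2 behavior described above.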
You should use && instead of &, e.g.:
first command && second command && third command && wait
However this will NOT run your command in parallel as every subsequent command's execution will depend on exit code 0 of the previous command.
The shell function below will wait for all PIDs passed as arguments to finish, returning 0 if all PIDs executed without error.
The first PID that exits with an error will cause the PIDs that come after it to be killed, and the exit code that caused the error will be returned by the function.
wait_and_fail_on_first() {
    local piderr=0
    while test $# -gt 0; do
        # dpid is deliberately not local, so the caller can see which PID failed
        dpid="$1"; shift
        # on failure, kill all remaining PIDs and return the failing exit code
        wait "$dpid" || { piderr=$?; kill "$@" 2>/dev/null; return "$piderr"; }
    done
}
Here's how to use it:
(first command) & pid1=$!
(second command) & pid2=$!
(third command) & pid3=$!
wait_and_fail_on_first $pid1 $pid2 $pid3 || {
echo "PID $dpid failed with code $?"
echo "Other PIDs were killed"
}
I have a bash script where I would like to run two processes in parallel, and have the script fail if either of the processes return non-zero. A minimal example of my initial attempt is:
#!/bin/bash
set -e
(sleep 3 ; true ) &
(sleep 4 ; false ) &
wait %1 && wait %2
echo "Still here, exit code: $?"
As expected, this doesn't print the message, because wait %1 && wait %2 fails and the script exits due to set -e. However, if the waits are reversed so that the first one has the non-zero status (wait %2 && wait %1), the message is printed:
$ bash wait_test.sh
Still here, exit code: 1
Putting each wait on its own line works as I want and exits the script if either of the processes fail, but the fact that it doesn't work with && makes me suspect that I'm misunderstanding something here.
Can anyone explain what's going on?
You can achieve what you want quite elegantly with GNU Parallel and its "fail handling".
In general, it will run as many jobs in parallel as you have CPU cores.
In your case, try this, which says "exit with failed status if one or more jobs failed":
#!/bin/bash
cat <<EOF | parallel --halt soon,fail=1
echo Job 1; exit 0
echo Job 2; exit 1
EOF
echo GNU Parallel exit status: $?
Sample Output
Job 1
Job 2
parallel: This job failed:
echo Job 2; exit 1
GNU Parallel exit status: 1
Now run it such that no job fails:
#!/bin/bash
cat <<EOF | parallel --halt soon,fail=1
echo Job 1; exit 0
echo Job 2; exit 0
EOF
echo GNU Parallel exit status: $?
Sample Output
Job 1
Job 2
GNU Parallel exit status: 0
If you dislike the heredoc syntax, you can put the list of jobs in a file called jobs.txt like this:
echo Job 1; exit 0
echo Job 2; exit 0
Then run with:
parallel --halt soon,fail=1 < jobs.txt
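Or skip the file entirely and pass the jobs as arguments with ::: (each argument is run as its own command):
parallel --halt soon,fail=1 ::: 'echo Job 1; exit 0' 'echo Job 2; exit 1'
echo GNU Parallel exit status: $?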
From bash manual section about usage of set
-e Exit immediately if a pipeline (which may consist of a single simple command), a list, or a compound command (see SHELL GRAMMAR above), exits with a non-zero status. The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test following the if or elif reserved words, part of any command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return value is being inverted with !. If a compound command other than a subshell returns a non-zero status because a command failed while -e was being ignored, the shell does not exit. A trap on ERR, if set, is executed before the shell exits. This option applies to the shell environment and each subshell environment separately (see COMMAND EXECUTION ENVIRONMENT above), and may cause subshells to exit before executing all the commands in the subshell.
tl;dr
In a bash script with set -e, for a command list like this
command1 && command2
a failure of command1 does not make the shell exit, because command1 is not the command following the final &&. Only a failure of command2, the final command in the list, triggers the set -e exit. That is why wait %2 && wait %1 continues past a failing wait %2, while wait %1 && wait %2 exits when wait %2 fails.
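A minimal script to see the rule in action:
#!/bin/bash
set -e
false && true     # false fails, but it does not follow the final &&: no exit
echo "still here" # this prints
true && false     # false now follows the final &&: set -e exits here
echo "never printed"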
In a CI setting, I'd like to run multiple jobs in the background, and use set -e to exit on the first error.
This requires using wait -n instead of wait, but for increasing throughput I'd then want to move the for i in {1..20}; do wait -n; done to the end of the script.
Unfortunately, this means that it is hard to track the errors.
Rather, what I would want is to do the equivalent to a non-blocking wait -n often, and exit as soon as possible.
Is this possible or do I have to write my bash scripts as a Makefile?
Alternative Approach: Emulate set -e for background jobs
Instead of checking the jobs all the time, it could be easier and more efficient to exit the script directly when a job fails. To this end, append ... || kill $$ to every job you start:
# before
myCommand &
myProgram arg1 arg2 &
# after
myCommand || kill $$ &
myProgram arg1 arg2 || kill $$ &
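Putting it together, a minimal sketch of the whole pattern (the sleeps stand in for real jobs):
#!/bin/bash
(sleep 1; echo "job 1 ok") || kill $$ &
(sleep 2; false) || kill $$ &   # fails after ~2s and terminates the whole script
wait                            # a plain wait by itself would still return 0
echo "only printed if every job succeeded"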
Non-Blocking wait -n
If you really have to, you can write your own non-blocking wait -n with a little trick:
nextJobExitCode() {
    sleep 0.1 & local sleepPid=$!  # dummy job: guarantees wait -n returns within 0.1s
    wait -n                        # returns when any job, real or dummy, finishes
    local exitCode="$?"
    kill "$sleepPid" 2>/dev/null   # remove the dummy if a real job finished first
    return "$exitCode"
}
The function nextJobExitCode waits at most 0.1 seconds for one of your jobs. If no job has already finished and none finishes within those 0.1 seconds, the dummy sleep is the job that wait -n picks up, and nextJobExitCode returns 0.
Example usage
set -e
sleep 1 & # job 1
(sleep 3; false) & # job 2
nextJobExitCode # won't exit. No jobs finished yet
sleep 2
nextJobExitCode # won't exit. Job 1 finished with 0
sleep 2
nextJobExitCode # will exit! Job 2 finished with 1
I have a bash script that launches a child process that crashes (actually, hangs) from time to time and with no apparent reason (closed source, so there isn't much I can do about it). As a result, I would like to be able to launch this process for a given amount of time, and kill it if it did not return successfully after a given amount of time.
Is there a simple and robust way to achieve that using bash?
P.S.: tell me if this question is better suited to serverfault or superuser.
(As seen in:
BASH FAQ entry #68: "How do I run a command, and have it abort (timeout) after N seconds?")
If you don't mind installing something, use timeout; most systems already have it as part of coreutils, otherwise use sudo apt-get install coreutils. Use it like:
timeout 10 ping www.goooooogle.com
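Its exit status is also useful: timeout exits with 124 when the command timed out, and otherwise passes through the command's own exit status:
timeout 2 sleep 5; echo $?   # 124: sleep was killed after 2 seconds
timeout 9 sleep 1; echo $?   # 0: sleep finished within the limit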
If you don't want to download anything, do what timeout does internally:
( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec ping www.goooooogle.com )
(The exec replaces the subshell with ping itself, so the $cmdpid captured from $BASHPID beforehand is ping's PID and the delayed kill hits ping directly.)
In case you want a timeout for a longer piece of bash code, use the second option as such:
( cmdpid=$BASHPID
  (sleep 10; kill $cmdpid) &
  while ! ping -w 1 www.goooooogle.com
  do
      echo crap
  done )
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) &
or to get the exit codes as well:
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) & waiter=$!
# wait on our worker process and capture its exit code;
# wait must run in the current shell, not in a $(...) subshell,
# because a subshell cannot wait for a process it didn't spawn
wait $pid
exitcode=$?
# kill the waiter subshell, if it still runs
kill -9 $waiter 2>/dev/null
# 0 if we killed the waiter, because that means the process finished before the waiter
finished_gracefully=$?
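To see the pieces in action, here is a quick check with dosmth replaced by a sleep that outlives the waiter (the values in the comments are what I would expect):
(sleep 30) & pid=$!
(sleep 2 && kill -9 $pid) & waiter=$!
wait $pid
exitcode=$?                 # expect 137 (128 + SIGKILL): the waiter fired first
kill -9 $waiter 2>/dev/null
finished_gracefully=$?      # expect non-zero: the waiter had already exited
echo "exitcode=$exitcode finished_gracefully=$finished_gracefully"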
sleep 999 &
t=$!
sleep 10
kill $t
I also had this question and found two more things very useful:
The SECONDS variable in bash.
The command "pgrep".
So I use something like this on the command line (OSX 10.9):
ping www.goooooogle.com & PING_PID=$(pgrep 'ping'); SECONDS=0; while pgrep -q 'ping'; do sleep 0.2; if [ $SECONDS = 10 ]; then kill $PING_PID; fi; done
As this is a loop I included a "sleep 0.2" to keep the CPU cool. ;-)
(BTW: ping is a bad example anyway; you would just use its built-in "-t" (timeout) option.)
Assuming you have (or can easily make) a pid file for tracking the child's pid, you could then create a script that checks the modtime of the pid file and kills/respawns the process as needed. Then just put the script in crontab to run at approximately the period you need.
Let me know if you need more details. If that doesn't sound like it'd suit your needs, what about upstart?
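A rough sketch of such a watchdog, to be run from crontab every few minutes (the paths, the MAXAGE threshold, and the mychild binary are all made up; it assumes the child touches its pid file while healthy, and stat -c %Y is the GNU coreutils way to read the modtime):
#!/bin/sh
PIDFILE=/var/run/mychild.pid   # hypothetical pid file
MAXAGE=300                     # seconds without a touch before we call it hung
if [ -f "$PIDFILE" ]; then
    age=$(( $(date +%s) - $(stat -c %Y "$PIDFILE") ))
    if [ "$age" -gt "$MAXAGE" ]; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
        /usr/local/bin/mychild &   # hypothetical respawn
        echo $! > "$PIDFILE"
    fi
fi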
One way is to run the program in a subshell, and communicate with the subshell through a named pipe with the read command. This way you can check the exit status of the process being run and communicate this back through the pipe.
Here's an example of timing out the yes command after 3 seconds. It gets the PID of the process using pgrep (possibly only works on Linux). There is also a problem with using a pipe: a process opening a pipe for read will hang until it is also opened for write, and vice versa. So to prevent the read command hanging, I've "wedged" open the pipe for read with a background subshell. (Another way to prevent a freeze is to open the pipe read-write, i.e. read -t 5 <>finished.pipe; however, that also may not work except on Linux.)
rm -f finished.pipe
mkfifo finished.pipe
{ yes >/dev/null; echo finished >finished.pipe ; } &
SUBSHELL=$!
# Get command PID
while : ; do
    PID=$( pgrep -P $SUBSHELL yes )
    test "$PID" = "" || break
    sleep 1
done
# Open pipe for writing
{ exec 4>finished.pipe ; while : ; do sleep 1000; done ; } &
read -t 3 FINISHED <finished.pipe
if [ "$FINISHED" = finished ] ; then
echo 'Subprocess finished'
else
echo 'Subprocess timed out'
kill $PID
fi
rm finished.pipe
Here's an attempt which tries to avoid killing a process after it has already exited, which reduces the chance of killing another process with the same process ID (although it's probably impossible to avoid this kind of error completely).
run_with_timeout ()
{
    t=$1
    shift
    echo "running \"$*\" with timeout $t"
    (
        # first, run process in background
        (exec sh -c "$*") &
        pid=$!
        echo $pid
        # the timeout shell
        (sleep $t ; echo timeout) &
        waiter=$!
        echo $waiter
        # finally, allow process to end naturally
        wait $pid
        echo $?
    ) \
    | (
        read pid
        read waiter
        if test $waiter != timeout ; then
            read status
        else
            status=timeout
        fi
        # if we timed out, kill the process
        if test $status = timeout ; then
            kill $pid
            exit 99
        else
            # if the program exited normally, kill the waiting shell
            kill $waiter
            exit $status
        fi
    )
}
Use like run_with_timeout 3 sleep 10000, which runs sleep 10000 but ends it after 3 seconds.
This is like other answers which use a background timeout process to kill the child process after a delay. I think this is almost the same as Dan's extended answer (https://stackoverflow.com/a/5161274/1351983), except the timeout shell will not be killed if it has already ended.
After this program has ended, there will still be a few lingering "sleep" processes running, but they should be harmless.
This may be a better solution than my other answer because it does not use the non-portable shell feature read -t and does not use pgrep.
Here's the third answer I've submitted here. This one handles signal interrupts and cleans up background processes when SIGINT is received. It uses the $BASHPID and exec trick used in the top answer to get the PID of a process (in this case $$ in a sh invocation). It uses a FIFO to communicate with a subshell that is responsible for killing and cleanup. (This is like the pipe in my second answer, but having a named pipe means that the signal handler can write into it too.)
run_with_timeout ()
{
t=$1 ; shift
trap cleanup 2
F=$$.fifo ; rm -f $F ; mkfifo $F
# first, run main process in background
"$#" & pid=$!
# sleeper process to time out
( sh -c "echo \$\$ >$F ; exec sleep $t" ; echo timeout >$F ) &
read sleeper <$F
# control shell. read from fifo.
# final input is "finished". after that
# we clean up. we can get a timeout or a
# signal first.
( exec 0<$F
  while : ; do
      read input
      case $input in
          finished)
              test $sleeper != 0 && kill $sleeper
              rm -f $F
              exit 0
              ;;
          timeout)
              test $pid != 0 && kill $pid
              sleeper=0
              ;;
          signal)
              test $pid != 0 && kill $pid
              ;;
      esac
  done
) &
# wait for process to end
wait $pid
status=$?
echo finished >$F
return $status
}
cleanup ()
{
echo signal >$$.fifo
}
I've tried to avoid race conditions as far as I can. However, one source of error I couldn't remove is when the process ends near the same time as the timeout. For example, run_with_timeout 2 sleep 2 or run_with_timeout 0 sleep 0. For me, the latter gives an error:
timeout.sh: line 250: kill: (23248) - No such process
as it is trying to kill a process that has already exited by itself.
# Kill command after 10 seconds
timeout 10 command
# If you don't have timeout installed, this is almost the same:
sh -c '(sleep 10; kill "$$") & command'
# The same as above, with muted duplicate messages:
sh -c '(sleep 10; kill "$$" 2>/dev/null) & command'
I have 4 shell scripts; I want to execute the first 3 in parallel. After successful completion of all 3 scripts, I want to execute the 4th script.
Parallel execution:
sh script1.sh
sh script2.sh
sh script3.sh
script4.sh should execute after all 3 have finished.
bash 4.3 added a -n flag to wait that lets it wait for any one background job to complete. For a fixed number of background jobs, you could use something like
script1.sh &
script2.sh &
script3.sh &
wait -n && wait -n && wait -n && script4.sh
For a large or variable number of background jobs, Kurt's answer is better.
In bash you can do:
pids=
for s in script1.sh script2.sh script3.sh; do
$s &
pids="$pids $!"
done
JOBS_FAILED=false
for pid in $pids; do
if ! wait $pid; then
# script didn't exit successfully
JOBS_FAILED=true
fi
done
if [[ $JOBS_FAILED == false ]]; then
script4.sh
fi
First it starts the first 3 scripts in the background and collects their pids. Then it runs through each pid, waiting for it to exit and checking its return value. If any of the first three scripts fail, $JOBS_FAILED is set to the string true, but all the processes are still waited on. Once all of the first 3 scripts finish, the script checks whether any jobs failed. If not, script4.sh is run.
I need a bash script that does the following:
Starts a background process with all output directed to a file
Writes the process's exit code to a file
Returns the process's pid (right away, not when process exits).
The script must exit
I can get the pid but not the exit code:
$ executable >>$log 2>&1 &
pid=`jobs -p`
Or, I can capture the exit code but not the pid:
$ executable >>$log;
# blocked on previous line until process exits
echo $? >>$log;
How can I do all of these at the same time?
The pid is in $!, no need to run jobs. And the return status is returned by wait:
$executable >> $log 2>&1 &
pid=$!
wait $pid
echo $?   # return status of $executable
EDIT 1
If I understand the additional requirement as stated in a comment, and you want the script to return immediately (without waiting for the command to finish), then it will not be possible to have the initial script write the exit status of the command. But it is easy enough to have an intermediary write the exit status as soon as the child finishes. Something like:
sh -c "$executable"' & echo pid=$! > pidfile; wait $!; echo $? > exit-status' &
should work.
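Then whoever needs the result can pick it up later, for example (file names as in the snippet above):
pid=$(cut -d= -f2 pidfile)   # pidfile contains a line like "pid=12345"
if [ -f exit-status ]; then
    echo "process $pid exited with status $(cat exit-status)"
else
    echo "process $pid is still running"
fi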
EDIT 2
As pointed out in the comments, that solution has a race condition: the main script terminates before the pidfile is written. The OP solves this by doing a polling sleep loop, which is an abomination and I fear I will have trouble sleeping at night knowing that I may have motivated such a travesty. IMO, the correct thing to do is to wait until the child is done. Since that is unacceptable, here is a solution that blocks on a read until the pid file exists instead of doing the looping sleep:
{ sh -c "$executable > $log 2>&1 &"'
    echo $! > pidfile
    echo    # alert the parent that the pidfile has been written
    wait $!
    echo $? > exit-status
' & } | read
The final | read blocks the main script only until that bare echo arrives, i.e. exactly until pidfile has been written, so the script can return immediately without the race.