I have 4 shell scripts. I want to run the first 3 in parallel and then, only after all 3 have completed successfully, run the 4th script.
Parallel execution:
sh script1.sh,
sh script2.sh,
sh script3.sh
script4.sh should run only after all 3 have finished.
bash 4.3 added a -n flag to wait that lets it wait for any one background job to complete. For a fixed number of background jobs, you could use something like
script1.sh &
script2.sh &
script3.sh &
wait -n && wait -n && wait -n && script4.sh
For a large or variable number of background jobs, Kurt's answer is better.
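For a variable number of jobs, the same wait -n idea can be put in a loop. This is only a minimal sketch, assuming bash 4.3 or later; wait_for_jobs is a hypothetical helper name, and the sleep commands are stand-ins for script1.sh to script3.sh.

```bash
#!/usr/bin/env bash
# Sketch, assuming bash >= 4.3 (needed for wait -n).

# Wait for N background jobs; return 1 if any of them failed.
# wait_for_jobs is a hypothetical helper, not a bash builtin.
wait_for_jobs() {
    local remaining=$1 failed=0
    while [ "$remaining" -gt 0 ]; do
        wait -n || failed=1              # status of whichever job finished next
        remaining=$((remaining - 1))
    done
    return "$failed"
}

# Hypothetical stand-ins for script1.sh .. script3.sh.
scripts=("sleep 0.3" "sleep 0.1" "sleep 0.2")
for cmd in "${scripts[@]}"; do
    $cmd &                               # unquoted on purpose: split into command + args
done

if wait_for_jobs "${#scripts[@]}"; then
    result="all jobs succeeded"          # script4.sh would run here
else
    result="at least one job failed"
fi
echo "$result"
```

Unlike the chained wait -n && wait -n version, the loop keeps waiting for the remaining jobs after a failure, so no job is left unreaped.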
In bash you can do:
pids=
for s in script1.sh script2.sh script3.sh; do
    $s &
    pids="$pids $!"
done
JOBS_FAILED=false
for pid in $pids; do
    if ! wait $pid; then
        # script didn't exit successfully
        JOBS_FAILED=true
    fi
done
if [[ $JOBS_FAILED == false ]]; then
    script4.sh
fi
First it starts all the first 3 scripts in background and collects their pids. Then it runs through each pid waiting for it to exit and checking its return value. If any of the first three scripts fail, $JOBS_FAILED is set to the string true but all the processes are still waited on. Once all the first 3 scripts finish, the script checks if any jobs failed. If not, script4.sh is run.
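If you also want to know which job failed, a variation on the same approach could record a label for every PID. This is a rough sketch, assuming bash 4 or later for associative arrays; job_ok, job_bad, name_of and failed are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Sketch, assuming bash >= 4 (associative arrays).

# Hypothetical stand-ins for the real scripts.
job_ok()  { sleep 0.1; }
job_bad() { sleep 0.1; return 3; }

declare -A name_of        # pid -> job label
failed=()                 # descriptions of the jobs that failed

for j in job_ok job_bad job_ok; do
    "$j" &
    name_of[$!]=$j
done

for pid in "${!name_of[@]}"; do
    # $? on the right-hand side of || is the exit status returned by wait
    wait "$pid" || failed+=("${name_of[$pid]} (pid $pid, status $?)")
done

if [ "${#failed[@]}" -eq 0 ]; then
    echo "all jobs succeeded; script4.sh would run here"
else
    printf 'failed: %s\n' "${failed[@]}"
fi
```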
I have a bash script where I would like to run two processes in parallel, and have the script fail if either of the processes return non-zero. A minimal example of my initial attempt is:
#!/bin/bash
set -e
(sleep 3 ; true ) &
(sleep 4 ; false ) &
wait %1 && wait %2
echo "Still here, exit code: $?"
As expected this doesn't print the message because wait %1 && wait %2 fails and the script exits due to the set -e. However, if the waits are reversed such that the first one has the non-zero status (wait %2 && wait %1), the message is printed:
$ bash wait_test.sh
Still here, exit code: 1
Putting each wait on its own line works as I want and exits the script if either of the processes fail, but the fact that it doesn't work with && makes me suspect that I'm misunderstanding something here.
Can anyone explain what's going on?
You can achieve what you want quite elegantly with GNU Parallel and its "fail handling".
In general, it will run as many jobs in parallel as you have CPU cores.
In your case, try this, which says "exit with failed status if one or more jobs failed":
#!/bin/bash
cat <<EOF | parallel --halt soon,fail=1
echo Job 1; exit 0
echo Job 2; exit 1
EOF
echo GNU Parallel exit status: $?
Sample Output
Job 1
Job 2
parallel: This job failed:
echo Job 2; exit 1
GNU Parallel exit status: 1
Now run it such that no job fails:
#!/bin/bash
cat <<EOF | parallel --halt soon,fail=1
echo Job 1; exit 0
echo Job 2; exit 0
EOF
echo GNU Parallel exit status: $?
Sample Output
Job 1
Job 2
GNU Parallel exit status: 0
If you dislike the heredoc syntax, you can put the list of jobs in a file called jobs.txt like this:
echo Job 1; exit 0
echo Job 2; exit 0
Then run with:
parallel --halt soon,fail=1 < jobs.txt
From bash manual section about usage of set
-e Exit immediately if a pipeline (which may consist of a single simple command), a list, or a compound command (see SHELL GRAMMAR above), exits with a non-zero status. The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test following the if or elif reserved words, part of any command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return value is being inverted with !. If a compound command other than a subshell returns a non-zero status because a command failed while -e was being ignored, the shell does not exit. A trap on ERR, if set, is executed before the shell exits. This option applies to the shell environment and each subshell environment separately (see COMMAND EXECUTION ENVIRONMENT above), and may cause subshells to exit before executing all the commands in the subshell.
tl;dr
In a bash script, for a command list like this
command1 && command2
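with set -e in effect, a failure of command1 is ignored because it is not the command following the final &&: the list just returns non-zero and the script carries on. Only a failure of command2, the last command in the list, makes the script exit. That is why wait %1 && wait %2 exits when job 2 fails, but wait %2 && wait %1 does not.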
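The rule quoted from the manual can be checked with two one-liners. This is just a small demonstration; each case runs in its own bash -c child so that set -e ending the child does not end the demo, and out1 and out2 are names made up for this sketch.

```bash
#!/usr/bin/env bash
# Case 1: the failing command is NOT the last one in the && list, so set -e ignores it.
out1=$(bash -c 'set -e; false && true; echo "reached"')

# Case 2: the failing command IS the last one in the && list, so set -e exits the child.
out2=$(bash -c 'set -e; true && false; echo "reached"') || true

echo "non-final failure: ${out1:-<script exited>}"   # prints: reached
echo "final failure:     ${out2:-<script exited>}"   # prints: <script exited>
```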
In a CI setting, I'd like to run multiple jobs in the background, and use set -e to exit on the first error.
This requires using wait -n instead of wait, but for increasing throughput I'd then want to move the for i in {1..20}; do wait -n; done to the end of the script.
Unfortunately, this means that it is hard to track the errors.
Rather, what I would want is to do the equivalent to a non-blocking wait -n often, and exit as soon as possible.
Is this possible or do I have to write my bash scripts as a Makefile?
Alternative Approach: Emulate set -e for background jobs
Instead of checking the jobs all the time it could be easier and more efficient to exit the script directly when a job fails. To this end, append ... || kill $$ to every job you start:
# before
myCommand &
myProgram arg1 arg2 &
# after
myCommand || kill $$ &
myProgram arg1 arg2 || kill $$ &
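Since kill $$ sends SIGTERM to the script itself, you may also want a TERM trap that stops the other jobs and sets a non-zero exit status. The trap is my own addition, not part of the pattern above, so treat this as a sketch. The whole thing is run in a child bash here so that the demo itself survives; the two jobs are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: the "|| kill $$" pattern plus a TERM trap, run inside a child bash.
demo='
trap "kill \$(jobs -p) 2>/dev/null; exit 1" TERM   # stop siblings, exit non-zero
(sleep 0.2; false) || kill $$ &                     # hypothetical failing job
sleep 5 &                                           # hypothetical long-running job
wait
echo "not reached"
'
bash -c "$demo"
status=$?
echo "child script exit status: $status"
```

Without the trap, the script would simply be terminated by SIGTERM and the remaining background jobs would keep running.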
Non-Blocking wait -n
If you really have to, you can write your own non-blocking wait -n with a little trick:
nextJobExitCode() {
sleep 0.1 &
wait -n
exitCode="$?"
kill %%
return "$exitCode"
}
The function nextJobExitCode waits at most 0.1 seconds for your jobs. If no job had already finished and none finishes within those 0.1 seconds, nextJobExitCode returns exit code 0.
Example usage
set -e
sleep 1 & # job 1
(sleep 3; false) & # job 2
nextJobExitCode # won't exit. No jobs finished yet
sleep 2
nextJobExitCode # won't exit. Job 1 finished with 0
sleep 2
nextJobExitCode # will exit! Job 2 finished with 1
I know I can run my bash script in the background by using bash script.sh & disown or alternatively, by using nohup. However, I want to run my script in the background by default, so when I run bash script.sh or after making it executable, by running ./script.sh it should run in the background by default. How can I achieve this?
Self-contained solution:
#!/bin/bash
# Re-spawn as a background process, if we haven't already.
# ([[ ]] and {0..10} are bash features, hence the bash shebang.)
if [[ "$1" != "-n" ]]; then
    nohup "$0" -n &
    exit $?
fi
# Rest of the script follows. This is just an example.
for i in {0..10}; do
    sleep 2
    echo $i
done
The if statement checks whether the -n flag has been passed. If not, the script calls itself with nohup (to detach it from the calling terminal, so that closing the terminal doesn't kill the script) and & (to put the process in the background and return to the prompt). The parent then exits, leaving the background version to run. The background version is explicitly called with the -n flag, so it won't cause an infinite loop (which is hell to debug!).
The for loop is just an example. Use tail -f nohup.out to see the script's progress.
Note that I pieced this answer together with this and this but neither were succinct or complete enough to be a duplicate.
Simply write a wrapper that calls your actual script with nohup actualScript.sh &.
Wrapper script wrapper.sh
#! /bin/bash
nohup ./actualScript.sh &
Actual script in actualScript.sh
#! /bin/bash
for i in {0..10}
do
sleep 10 #script is running, test with ps -eaf|grep actualScript
echo $i
done
tail -n 10 -f nohup.out
0
1
2
3
4
...
Adding to Heath Raftery's answer, what worked for me is a variation of what he suggested such as this:
if [[ "$1" != "-n" ]]; then
$0 -n & disown
exit $?
fi
I'm running some tests in parallel by calling a process from a script. Each process prints only to stdout > a file, and exits 0 iff successful (otherwise -1).
If and when a process exits with -1, I print something to its (or a related) output file (namely, the arguments it was called with), kill all other processes, and exit.
I have written a script using trap "..." CHLD to run some code when a subprocess exits, and this works under certain conditions, but I find my script is not very robust. If I send a keyboard interrupt, sometimes the subprocesses keep going, and sometimes the sheer number of subprocesses overwhelms the machine(s) and none of them seem to advance.
I am using this on my quad-core laptop as well as on a cluster of 128 CPUs, over which subprocesses are distributed automatically. How do I run a large number of background subprocesses in a bash script, limited to some number of them running concurrently, and do something + exit if one of them returns with a bad code? I would also like the script to clean up after a keyboard interrupt. Should I use GNU Parallel, and if so, how?
Here is a MWE of my script so far, which spawns subprocesses unhindered, annotated with what I think each part means. I got the idea to use trap from shell - get exit code of background process
$ cat parallel_tests.sh
#!/bin/bash
# some help from https://stackoverflow.com/questions/1570262/shell-get-exit-code-of-background-process
handle_chld() {
#echo pids are ${pids[@]}
local tmp=() ###temporary storage for pids that haven't finished
#for each pid that hadn't finished since the last trap
for((i=0;i<${#pids[@]};++i)); do
#if this pid is still running
if [[ $(ps -p ${pids[i]} -o pid=) ]]
then
tmp+=(${pids[i]}) ### add pid to list of pids that are running
else
wait ${pids[i]} ### put the exit code of this pid into $?
if [ "$?" != "0" ] ### if the exit code $? is non-zero
then
#kill all remaining processes
for((j=0;j<${#pids[@]};++j))
do
if [[ $(ps -p ${pids[j]} -o pid=) ]]
then
echo killing child processes of ${pids[j]}
pkill -P ${pids[j]}
fi
done
cat _tmp${pids[i]}
#print things to the terminal here
echo "FAILED process ${pids[i]} args: `cat _tmpargs${pids[i]}`"
exit 1
else
echo "FINISHED process ${pids[i]} args: `cat _tmpargs${pids[i]}`"
fi
fi
done
#update list of running pids
pids=(${tmp[@]})
}
# set this to monitor SIGCHLD
set -o monitor
# call handle_chld() when SIGCHLD signal is triggered
trap "handle_chld" CHLD
ALL_ARGS="2 32 87" ### ad nauseam
for A in $ALL_ARGS; do
(sleep $A; false) > _tmp$! &
pids+=($!)
echo $A > _tmpargs${pids[${#pids[@]}-1]}
echo "STARTED process ${pids[${#pids[@]}-1]} args: `cat _tmpargs${pids[${#pids[@]}-1]}`"
done
echo "Every process started. Now waiting on PIDS:"
echo ${pids[@]}
wait ${pids[@]} ###wait until every process is finished (or exit in the trap)
The output of this version after 2+epsilon seconds is:
$ ./parallel_tests.sh
STARTED process 66369 args: 2
STARTED process 66374 args: 32
STARTED process 66381 args: 87
Every process started. Now waiting on PIDS:
66369 66374 66381
killing child processes of 66374
./parallel_tests.sh: line 43: 66376 Terminated: 15 sleep $A
killing child processes of 66381
./parallel_tests.sh: line 43: 66383 Terminated: 15 sleep $A
FAILED process 66369 args: 2
Essentially, pid 66369 fails first, and the other two processes are dealt with in the trap. I have simplified the construction of the test processes here, so we can't assume that I'll manually insert waits before spawning new ones. Additionally, some of the test processes can be nearly instant. Essentially, I have a whole mess of test processes, long and short, starting as soon as resources can be allotted.
I'm not sure what's causing the problems I mentioned above, as this script uses several features that are new to me. General pointers are welcomed!
(I have seen this question and it does not answer my question)
cat arguments | parallel --halt now,fail=1 my_prg
Alternatively:
parallel --halt now,fail=1 my_prg ::: $ALL_ARGS
GNU Parallel is designed so it will also kill remote jobs. It does that using process groups and heavy perl scripting on the remote server: https://www.gnu.org/software/parallel/parallel_design.html#The-remote-system-wrapper
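If GNU Parallel is not available, something similar can be approximated in plain bash with wait -n. This is a different technique from the commands above and only a local, single-machine sketch, assuming bash 4.3 or later. MAX_JOBS, run_test, stop_all and run_limited are all hypothetical names, and run_test is a stand-in for my_prg.

```bash
#!/usr/bin/env bash
# Sketch, assuming bash >= 4.3 (wait -n). All names here are hypothetical.
MAX_JOBS=2                                         # concurrency limit

run_test() { sleep 0.1; [ "$1" != "bad" ]; }       # stand-in for my_prg; fails on "bad"

stop_all() { kill $(jobs -p) 2>/dev/null; wait; }  # kill and reap whatever is still running

run_limited() {
    local running=0 arg
    for arg in "$@"; do
        if [ "$running" -ge "$MAX_JOBS" ]; then    # pool is full: wait for a free slot
            wait -n || { stop_all; return 1; }
            running=$((running - 1))
        fi
        run_test "$arg" &
        running=$((running + 1))
    done
    while [ "$running" -gt 0 ]; do                 # drain the jobs still running
        wait -n || { stop_all; return 1; }
        running=$((running - 1))
    done
}

trap 'stop_all; exit 130' INT                      # clean up on keyboard interrupt

run_limited 1 2 3 4 && echo "all tests passed"
run_limited 1 bad 3 4 || echo "a test failed; remaining jobs were stopped"
```

Because wait -n returns as soon as any job finishes, a failure is noticed regardless of the order in which the jobs were started. Note that stop_all only signals the direct children; jobs that spawn their own subprocesses would need process groups, which is one of the things GNU Parallel handles for you.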
I'm trying to run 3 commands in parallel in bash shell:
$ (first command) & (second command) & (third command) & wait
The problem with this is that if first command fails, for example, the exit code is 0 (I guess because wait succeeds).
The desired behavior is that if one of the commands fails, the exit code will be non-zero (and ideally, the other running commands will be stopped).
How could I achieve this?
Please note that I want to run the commands in parallel!
The best I can think of is:
first & p1=$!
second & p2=$!
...
wait $p1 && wait $p2 && ..
or
wait $p1 || ( kill $p2 $p3 && exit 1 )
...
However, this still enforces an order in which the processes are checked, so if the third fails immediately you won't notice it until the first and second have finished.
This might work for you:
parallel -j3 --halt 2 <list_of_commands.txt
This will run 3 commands in parallel.
If any running job fails it will kill the remaining running jobs and then stop, returning the exit code of the failing job.
You could use && instead of &, e.g.:
first command && second command && third command && wait
However, this will NOT run your commands in parallel, because each command only starts once the previous one has exited with code 0.
The shell function below will wait for all PIDs passed as arguments to finish, returning 0 if all PIDs executed without error.
The first PID that exits with an error will cause the PIDs that come after it to be killed, and the exit code that caused the error will be returned by the function.
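wait_and_fail_on_first() {
    local piderr=0 i
    while test $# -gt 0; do {
        dpid="$1"; shift    # dpid is deliberately global so the caller can report it
        # on failure, kill the PIDs still left in the argument list
        wait $dpid || { piderr=$?; kill "$@"; return $piderr ;}
    } done
}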
Here's how to use it:
(first command) & pid1=$!
(second command) & pid2=$!
(third command) & pid3=$!
wait_and_fail_on_first $pid1 $pid2 $pid3 || {
echo "PID $dpid failed with code $?"
echo "Other PIDs were killed"
}