Waiting for multiple processes in bash with set -e

I have a bash script where I would like to run two processes in parallel, and have the script fail if either of the processes return non-zero. A minimal example of my initial attempt is:
#!/bin/bash
set -e
(sleep 3 ; true ) &
(sleep 4 ; false ) &
wait %1 && wait %2
echo "Still here, exit code: $?"
As expected, this doesn't print the message: wait %1 && wait %2 fails and the script exits due to set -e. However, if the waits are reversed so that the first one has the non-zero status (wait %2 && wait %1), the message is printed:
$ bash wait_test.sh
Still here, exit code: 1
Putting each wait on its own line works as I want and exits the script if either of the processes fail, but the fact that it doesn't work with && makes me suspect that I'm misunderstanding something here.
Can anyone explain what's going on?

You can achieve what you want quite elegantly with GNU Parallel and its "fail handling".
In general, it will run as many jobs in parallel as you have CPU cores.
In your case, try this, which says "exit with failed status if one or more jobs failed":
#!/bin/bash
cat <<EOF | parallel --halt soon,fail=1
echo Job 1; exit 0
echo Job 2; exit 1
EOF
echo GNU Parallel exit status: $?
Sample Output
Job 1
Job 2
parallel: This job failed:
echo Job 2; exit 1
GNU Parallel exit status: 1
Now run it such that no job fails:
#!/bin/bash
cat <<EOF | parallel --halt soon,fail=1
echo Job 1; exit 0
echo Job 2; exit 0
EOF
echo GNU Parallel exit status: $?
Sample Output
Job 1
Job 2
GNU Parallel exit status: 0
If you dislike the heredoc syntax, you can put the list of jobs in a file called jobs.txt like this:
echo Job 1; exit 0
echo Job 2; exit 0
Then run with:
parallel --halt soon,fail=1 < jobs.txt
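If you don't want a separate file either, the same jobs can go directly on the command line using parallel's ::: syntax (a sketch, reusing the jobs from above):
parallel --halt soon,fail=1 ::: 'echo Job 1; exit 0' 'echo Job 2; exit 1'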

From the bash manual's description of set:
-e Exit immediately if a pipeline (which may consist of a single simple command), a list, or a compound command (see SHELL GRAMMAR above), exits with a non-zero status. The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test following the if or elif reserved words, part of any command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return value is being inverted with !. If a compound command other than a subshell returns a non-zero status because a command failed while -e was being ignored, the shell does not exit. A trap on ERR, if set, is executed before the shell exits. This option applies to the shell environment and each subshell environment separately (see COMMAND EXECUTION ENVIRONMENT above), and may cause subshells to exit before executing all the commands in the subshell.
tl;dr
In a bash script under set -e, for a command list like this
command1 && command2
a failure of command1 does not exit the script, because command1 is not the command following the final && (its failure merely short-circuits the list), whereas a failure of command2, the command after the final &&, does exit the script.
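A minimal sketch of that rule in action, using true/false in place of the original waits:
#!/bin/bash
set -e
false && true                     # false is not the final command: its failure is ignored
echo "after list, status was $?"  # prints "after list, status was 1"
true && false                     # false follows the final &&: set -e exits here
echo "never reached"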

Related

bash's behavior when piping exit into exit: `exit 1 | exit 2`

I am curious about bash's behavior and the exit status when I enter a command of the form
exit [exit status] | exit [exit status] | .. [repetition of exit and exit status]
It gives me the output below, and then doesn't exit.
Is this undefined behavior?
bash-3.2$ exit 1 | exit 2
bash-3.2$ echo $?
2
From the bash man page:
Each command in a pipeline is executed as a separate process (i.e., in a subshell).
So, even the first exit will not exit your shell, as it only exits the subshell.
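A quick illustration of the same point with an explicit subshell:
$ (exit 3); echo "still here, status $?"
still here, status 3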
As for the exit codes:
The return status of a pipeline is the exit status of the last command, unless the pipefail option is enabled. If pipefail is enabled, the pipeline's return status is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully.
You can activate pipefail like this:
$ set -o pipefail
$ exit 1 | exit 2 | exit 0
$ echo $?
2
exit 1 | exit 2 is not sequential but concurrent, even though the last command reads the STDOUT of the first command.
What is a simple explanation for how pipes work in Bash?
Moreover, each of those commands is executed in a subshell,
so your main shell, where you type commands, is not exited.
A pipeline is a whole composition of commands rather than one command after another.
If you want to exit, make it sequential: exit 1 || exit 2.
Finally, by default, $? is the exit status of the most recent foreground pipeline.
What are the special dollar sign shell variables?
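A short demonstration of both points with the default settings (no pipefail):
$ false | true; echo $?
0
$ true | false; echo $?
1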

What does ": exit 0" mean in a bash script?

I found in some bash scripts, usually at the end of the file, there is one line of code, like below:
: exit 0
What's the meaning of it? Can I remove it directly?
The bash builtin : is basically a command that returns zero (success) after all its arguments are expanded by the shell (a). In this case, the expansion doesn't really do anything, so it's effectively a null operation. I suspect it's there just to indicate the effect of the : (b).
And the effect of that : has to do with what bash scripts return. They basically return the exit status of the last command that was run in the script. The : will therefore force the exit status of the script as a whole to be zero regardless of what the command before it returned.
You can see the effect with the following script:
ls /tmp/nosuchfile 2> /dev/null
If you run that followed by echo $?, you'll see an error code:
pax> ./script.sh ; echo $?
2
If you then change the script to:
ls /tmp/nosuchfile 2> /dev/null
: some arbitrary text
then you will see a success code from the script:
pax> ./script.sh ; echo $?
0
(a) I often use it for infinite loops, such as:
while : ; do somePeriodicThing ; sleep 60 ; done
(b) Of course, it's not quite the same as exit 0, since exit will exit your current shell, which acts differently depending on whether you ran the script or sourced it:
./script.sh # runs it, exit will exit that script.
. ./script.sh # sources it, exit will exit your shell.
The : will not exit your current shell in either of those two cases.
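A quick way to see the difference (demo.sh here is a hypothetical two-line script):
$ printf 'false\n: exit 0\n' > demo.sh
$ . ./demo.sh; echo "still here, status $?"
still here, status 0
If demo.sh ended with exit 0 instead of : exit 0, sourcing it would terminate your current shell before the echo could run.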

Can bash spawn multiple processes and report how many ran successfully?

Is it possible in Bash to spawn multiple processes and after the last process finishes, report how many of the processes terminated correctly/didn't core dump?
Or would it be better to do this in Python?
(I'd ideally like to report which command failed, if any)
You can hopefully leverage GNU Parallel and its failure handling. General example:
parallel ::: ./processA ./processB ./processC
Specific example... here I run 3 simple jobs, each surrounded by single quotes and set it up to stop once all jobs are complete or failed:
parallel --halt soon,fail=100% ::: 'echo 0 && exit 0' 'echo 1 && exit 1' 'echo 2 && exit 2'
Output
0
1
parallel: This job failed:
echo 1 && exit 1
2
parallel: This job failed:
echo 2 && exit 2
By default, it will run N jobs in parallel, where N is the number of cores your CPU has. If you just want the jobs to run sequentially, use:
parallel -j 1 ...
Obviously you could pipe the output through grep -c "This job failed" to count the failures.
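For example (a sketch: parallel's failure messages go to stderr, so redirect them into the pipe and discard the jobs' own stdout):
parallel --halt soon,fail=100% ::: 'exit 0' 'exit 1' 'exit 2' 2>&1 >/dev/null | grep -c "This job failed"
# prints: 2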
Assuming you have a file with the commands:
cmd1
cmd2
cmd3
Then, since parallel's exit status is the number of failed jobs (accurate as long as at most 100 fail), this prints the number of jobs that ran successfully:
cat file | parallel
a=$?; echo $(( $(wc -l <file) - a ))
To get exactly which jobs failed, use --joblog.
cat file | parallel --joblog my.log
# Find non-zero column 7
grep -v -P '(.*\t){6}0\t.*\t' my.log
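An equivalent check with awk, assuming the default tab-separated joblog layout in which column 7 is Exitval (NR > 1 skips the header row):
awk -F'\t' 'NR > 1 && $7 != 0' my.log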
It's easy.
First run your jobs in the background and remember their PIDs.
Then for each child execute wait $pid and check wait's exit status, which equals the exit status of the child whose PID you passed.
If the exit status is zero, the child terminated successfully.
#!/bin/bash
exit 0 &
childs+=($!)
exit 1 &
childs+=($!)
exit 2 &
childs+=($!)
echo 1 &
childs+=($!)
successes=0
for i in "${childs[@]}"; do
  wait "$i"
  if (($? == 0)); then
    ((successes++))
  fi
done
# will print that 2 processes (exit 0 and echo 1) terminated successfully
printf '%d processes terminated correctly and did not core dump\n' "$successes"

tee resets the exit status: it is always 0

I have a short script like this:
#!/bin/bash
<some_process> | tee -a /tmp/some.log &
wait $(pidof <some_process_name>)
echo $?
The result is always 0, irrespective of the exit status of some_process.
I know PIPESTATUS can be used here, but why does tee break wait?
Well, this is something that, for some peculiar reason, the docs don't mention. The code, however, does:
int wait_for (pid) { /* ... */
  /* If this child is part of a job, then we are really waiting for the
     job to finish.  Otherwise, we are waiting for the child to finish. [...] */
  if (job == NO_JOB)
    job = find_job (pid, 0, NULL);
So it's actually waiting for the whole job, which, as we know, normally yields the exit status of the last command in the chain.
To make matters worse, $PIPESTATUS can only be used for the most recent foreground pipeline.
You can, however, utilize $PIPESTATUS in a subshell job, like this:
(<some_process> | tee -a /tmp/some.log; exit ${PIPESTATUS[0]}) &
# somewhere down the line:
wait %%   # %% refers to the most recently started background job
The trick here is to use $PIPESTATUS but also wait -n:
Check this example:
#!/bin/bash
# start background task
(sleep 5 | false | tee -a /tmp/some.log ; exit ${PIPESTATUS[1]}) &
# foreground code comes here
echo "foo"
# wait for background task to finish
wait -n
echo $?
The reason you always get an exit status of 0 is that the exit status returned is that of the last command in the pipeline, which is tee. By using a pipe, you eliminate the normal detection of exit status of <some_command>.
From the bash man page:
The return status of a pipeline is the exit status of the last command, unless the pipefail option is enabled. If pipefail is enabled, the pipeline's return status is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully.
So... The following might be what you want:
#!/usr/bin/env bash
set -o pipefail
<some_command> | tee -a /tmp/some.log &
wait %1
echo $?

Bash & (ampersand) operator

I'm trying to run 3 commands in parallel in bash shell:
$ (first command) & (second command) & (third command) & wait
The problem with this is that if, for example, the first command fails, the exit code is still 0 (I guess because wait succeeds).
The desired behavior is that if one of the commands fails, the exit code will be non-zero (and ideally, the other running commands will be stopped).
How could I achieve this?
Please note that I want to run the commands in parallel!
The best I can think of is:
first & p1=$!
second & p2=$!
...
wait $p1 && wait $p2 && ..
or
wait $p1 || { kill $p2 $p3; exit 1 ;}
...
However, this still enforces an order on the checks, so if the third process fails immediately you won't notice until the first and second finish.
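If the ordering is the problem, here is a sketch using wait -n (bash 4.3+), which returns as soon as any background job finishes, so the first failure is noticed immediately:
first & second & third &
for _ in 1 2 3; do
  # wait -n returns the status of whichever job finished next
  wait -n || { kill $(jobs -p) 2>/dev/null; exit 1; }
done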
This might work for you:
parallel -j3 --halt 2 <list_of_commands.txt
This will run 3 commands in parallel.
If any running job fails it will kill the remaining running jobs and then stop, returning the exit code of the failing job.
You should use && instead of &, e.g.:
first command && second command && third command && wait
However, this will NOT run your commands in parallel, since each subsequent command only runs if the previous one exits with code 0.
The shell function below will wait for all PIDs passed as arguments to finish, returning 0 if all of them exited without error.
The first PID that exits with an error causes the PIDs after it to be killed, and the exit code that caused the error is returned by the function.
wait_and_fail_on_first() {
  local piderr=0
  while test $# -gt 0; do
    dpid="$1"; shift
    # on failure, kill the remaining PIDs and return the failing status;
    # dpid is deliberately not local so the caller can see which PID failed
    wait $dpid || { piderr=$?; kill "$@"; return $piderr ;}
  done
}
Here's how to use it:
(first command) & pid1=$!
(second command) & pid2=$!
(third command) & pid3=$!
wait_and_fail_on_first $pid1 $pid2 $pid3 || {
  echo "PID $dpid failed with code $?"
  echo "Other PIDs were killed"
}
