How to run flakey commands within set -e context? - bash

I want to protect most of my bash script with set -e, in order to fail early and loudly when an error is detected during the script's processing. However, I still want to be able to run some commands that are actually expected to fail, such as using grep to check for the presence or absence of some file content that directs the control flow of the rest of the script. How can I run grep within a set -e context, such that A) grep is allowed to fail and B) grep's exit status is recorded for access by the rest of the script?
In ordinary POSIX sh, I would do something like:
grep 'needle' haystack >/dev/null
if [ "$?" -eq 0 ]; then
  handle_grep_results
else
  handle_grep_no_results
fi
However, when set -e is specified before this section, then the script exits early whenever grep fails to find the needle. One way to work around this is to temporarily disable the protections with set +e, and then re-enable them after the section, but I would prefer to leave the protections on, if that makes sense. Is this possible with bash?

You can simply check the return status of grep:
grep -q luck myfile.txt || no_luck=1
Shell utilities use the return status to communicate with the shell; what they are communicating is not necessarily an error condition. As the grep example shows, it can be a simple boolean. In fact, the [[ builtin (and its friends [ and test) do precisely that: use the status code to return a boolean result. That doesn't make those utilities "flakey".
set -e ignores non-zero exit statuses from commands executed within a conditional or on the left-hand side of a || or && connector; that exemption is what makes grep's status usable under set -e.
Having said that, -e is a very blunt tool and its use in production code is not generally recommended. You can always explicitly fail using ||:
important_setup || exit 1
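Here is a minimal sketch of points A) and B) from the question under set -e. A throwaway temp file stands in for haystack, and the variable name grep_status is just for illustration. The failing grep sits on the left of ||, so the script keeps going, and $? on the right-hand side still holds grep's status.

```shell
#!/bin/bash
set -e

# Hypothetical stand-in for the question's "haystack" file.
haystack=$(mktemp)
trap 'rm -f "$haystack"' EXIT
printf '%s\n' 'hay' 'more hay' > "$haystack"

# A failing grep on the left of || does not trigger set -e, and on the
# right-hand side $? still holds grep's exit status (1 = no match, 2 = error).
grep_status=0
grep -q 'needle' "$haystack" || grep_status=$?
echo "grep status: $grep_status"

# Later control flow can branch on the recorded status; an if test is exempt too.
if [ "$grep_status" -eq 0 ]; then
  echo "needle found"
else
  echo "no needle"
fi
echo "script still running"
```

Keeping the numeric status, rather than a yes/no flag, lets you treat a missing or unreadable file (status 2) differently from a plain non-match (status 1).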

Your if command seems to contain a couple of typos (an extra # and a missing space before ]), but more generally, you should understand that the very purpose of if is to run a command and check its exit code. Anything which looks like
command
if [ $? = 0 ]; then
is more compactly and idiomatically written
if command; then
and in this context, a failure exit status from command is not a condition which causes set -e to terminate the script (because then you couldn't have else clauses in scripts with set -e!)
In this particular example where both the then and else blocks contain simple commands, you can simplify to the shorthand
grep -q 'needle' haystack && handle_grep_results || handle_grep_no_results
which also suggests
command || true
when you simply don't care whether command succeeded or not in a script with set -e.
(Notice also grep -q instead of grep >/dev/null: with -q, grep can stop reading and return success as soon as it finds the first match, much as if -m 1 were given.)
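The A && B || C shorthand has one caveat: it is not a true if/else. If handle_grep_results itself fails, handle_grep_no_results runs as well, because the || applies to the whole grep && handle_grep_results list. The sketch below shows this with hypothetical handler functions, where the first one fails on purpose.

```shell
#!/bin/bash
set -e

# Hypothetical handlers; the first one fails on purpose.
handle_grep_results()    { echo "results"; return 1; }
handle_grep_no_results() { echo "no results"; }

# "needle" is present, yet both handlers run. set -e does not stop the
# script either, because the failing command is not the last one in the list.
grep -q 'needle' <<< 'needle' && handle_grep_results || handle_grep_no_results
echo "still running"
```

If the then-branch can fail, the explicit if ...; then ...; else ...; fi form is safer, because it runs exactly one branch.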

Related

Bash set -e doesn't work as expected (I understand)

#!/bin/bash
set -exuo pipefail
# Run delorean to update the namespaces folder
main() {
  if [ !$(yq -r '.random' file_that_doesnt_exist.yaml) = "true" ]; then
    echo "yes"
  else
    echo "no"
  fi
}
# shellcheck disable=SC2068
main $#
My understanding is that set -e (together with pipefail) should exit the bash script at the first occurrence of an error. However, I get "no" on stdout even though echo "no" runs after the error. How does that happen?
set -e stops the script on error, but not always:
The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test following the if or elif reserved words, part of any command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return value is being inverted with !.
(Quote from man bash).
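In your script, the failing yq runs inside a command substitution that is part of the test following if, so its failure never stops the script. The substitution expands to an empty string, the test becomes [ ! = "true" ], which is false, and the else branch prints "no".
Basically, don't use -e (or at least don't rely on it alone).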
Use a trap, and carefully design your tests with it in mind.
Cf. this whole page for good discussions on alternatives and explanations, and this one for some other examples and elaborations that should give you good inspiration for a better solution.
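Here is a rough sketch of the trap approach, assuming you only want a diagnostic before the script exits. An ERR trap reports the failing command. The if test shows that the trap obeys the same exemptions as -e. The demo runs in a separate bash process only so that its exit can be observed from outside; the variable names demo and status are illustrative.

```shell
#!/bin/bash
# Child script: strict mode plus an ERR trap that reports the failing command.
demo=$(cat <<'EOF'
set -Eeuo pipefail   # -E lets functions and subshells inherit the ERR trap
trap 'echo "error: \"$BASH_COMMAND\" exited with status $? (line $LINENO)" >&2' ERR

# Exempt: a failing test after if triggers neither the ERR trap nor set -e.
if grep -q 'pattern' /dev/null; then echo "matched"; else echo "no match"; fi

# Not exempt: the ERR trap runs, then set -e ends the script.
false
echo "never printed"
EOF
)

# Run the child in its own bash process and record how it ended.
status=0
bash -c "$demo" || status=$?
echo "demo exited with status $status"
```

The trap does not widen what counts as an error. It only makes the failures that -e already catches visible. The exempt cases still have to be handled explicitly.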

Bash script doesn't continue when condition fulfilled

To check the validity of lines in a file I'm using a condition which is met when egrep -v does NOT return an empty result. When there are invalid lines, then this works fine (i.e. the conditional block is executed), but when every line is valid then the script ends without further processing.
Script:
INVALID_HOSTS=$(egrep -v ${IP_REGEX} hosts)
if [[ ! -z "${INVALID_HOSTS}" ]]; then
  echo "Invalid hosts:"
  for entry in ${INVALID_HOSTS}
  do echo ${entry}
  done
  exit_with_error_msg "hosts file contains invalid hosts (Pattern must be: \"\d+.\d+.\d+.\d+:\d+\"), exiting"
else
  echo "all cool"
fi
echo "after if-else"
So when there are no invalid lines then neither the echo "all cool" nor echo "after if-else" get executed. The script just stops and returns to the shell.
When set -x is enabled, then it prints:
++ egrep -v '^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):([1-9]|[1-5]?[0-9]{2,4}|6[1-4][0-9]{3}|65[1-4][0-9]{2}|655[1-2][0-9]|6553[1-5])$' hosts
+ INVALID_HOSTS=
Playing around with it I'm sure that it's about the if [[ ! -z "${INVALID_HOSTS}" ]]; then, but my bash wizardry is not strong enough to overcome this magical barrier.
Thanks for any help!
This is a bit long for a comment. I'll start it as an answer and we can work our way through further details or I can scrap it entirely if not helpful. I'll make some assumptions and let us see if it hits the spot.
For starters, you do use the value later, so capturing the output with command substitution is not entirely useless. Otherwise, though, it's much easier to detect a match (or the lack of one) through grep's return value. If anything matched (the output would be non-empty), grep returns 0 (shell true); otherwise it returns false (in this case 1). Not to mention that the ! -z test should really be written -n, if used at all.
And this is where I start assuming a bit. I suspect this is not your entire script and that you have the errexit option turned on in that shell session (or in general, through an rc file), either by means of set -o errexit or set -e, or by running bash with the -e option. Since grep returns a failure status when nothing matches, your shell (script execution) terminates as soon as it encounters that failing command.
Observe the difference between:
$ bash -ec 'grep "BOGUS" /etc/fstab ; echo "$?"'
$ bash -c 'grep "BOGUS" /etc/fstab ; echo "$?"'
1
With errexit, bash terminates after grep has "failed" and we never even reach the echo.
Since the assumption proved correct, here is a small extension. If errexit is what you want, you can turn the option off before, and back on after, a command that is allowed to fail (return a non-zero value) without affecting your script:
set +o errexit
grep THIS_COULD_NOT_MATCH...
set -o errexit
Or you can ignore return value of individual commands by ensuring their success:
grep THIS_COULD_NOT_MATCH... || true
You can also still use potentially "failing" commands safely in conditionals (such as if) without terminating your shell.
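Here is a sketch of the conditional approach for the hosts check. It assumes a simplified IP_REGEX and a temporary file in place of the real hosts file; both are hypothetical. grep's own exit status drives the if, so set -e never sees a failure. The else branch then separates "no invalid lines" (status 1) from a real grep error (status 2 or more).

```shell
#!/bin/bash
set -e

# Hypothetical, simplified pattern; the question's IP_REGEX is stricter.
IP_REGEX='^[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]+$'

hosts_file=$(mktemp)
trap 'rm -f "$hosts_file"' EXIT
printf '%s\n' '10.0.0.1:8080' '192.168.1.20:22' > "$hosts_file"

# grep -v succeeds only if it selects at least one non-matching line.
# Used as the if test, its non-zero status does not trigger set -e.
if invalid_hosts=$(grep -Ev "$IP_REGEX" "$hosts_file"); then
  echo "Invalid hosts:"
  printf '%s\n' "$invalid_hosts"
  exit 1
else
  rc=$?   # status of the if test, i.e. of grep
  if [ "$rc" -ne 1 ]; then
    echo "grep failed with status $rc" >&2
    exit "$rc"
  fi
  echo "all cool"
fi
echo "after if-else"
```

The exit status of an assignment like invalid_hosts=$(...) is the status of the command substitution. That is why the captured output and grep's verdict can be obtained in a single step.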

What is the meaning of grep with stdout redirection to /dev/null in job script?

I have a bash script that is submitted as a bash job. It creates some files, executes some computations, moves the output files somewhere else and cleans up. For moving the output files, it contains these lines:
set -e
mv $tmp/stdout.txt $current/tmp.stdout.txt
grep Report $current/tmp.stdout.txt >/dev/null 2>&1
mv $current/tmp.stdout.txt $current/stdout.txt
set +e
If the computation was successful, the output file stdout.txt contains several lines that start with Report; otherwise it contains none. Further processing checks that the $current/stdout.txt file exists (and resubmits the job otherwise).
The first mv moves the output file from the temporary directory to the final directory under a temporary name, and the second mv renames it to its final name. But what is the purpose of the grep in between? If the output file contains lines with Report, they are redirected to /dev/null and nothing happens. If it contains no lines with Report, grep outputs nothing, neither to the redirected stdout nor to the redirected stderr. So my impression is that this line does nothing and that I should replace mv+grep+mv with a single mv. What functionality am I overlooking here?
The set -e is important here.
grep sets its exit status to 0 if the input file is successfully processed and at least one match is found, to 1 if no match is found, and to a higher value (2 with GNU grep) if an error occurs.
set -e tells the shell to exit if any checked command has a nonzero exit status. (It has a bunch of gotchas and caveats, and generally shouldn't be used; see BashFAQ #105).
Thus -- unless this code is embedded in a context that triggers one of the several scenarios where set -e has no effect -- your script terminates before the second mv if the grep has no matches.
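The three exit statuses can be seen in a small sketch. It runs without set -e so that every status can be printed, and it uses a hypothetical temporary file as the job output. POSIX only promises a value greater than 1 for errors; GNU grep uses 2.

```shell
#!/bin/bash
# Hypothetical sample output file.
report=$(mktemp)
trap 'rm -f "$report"' EXIT
printf '%s\n' 'Report: ok' 'other line' > "$report"

grep -q Report "$report"; match=$?
grep -q Missing "$report"; no_match=$?
grep -q Report /no/such/file 2>/dev/null; error=$?

echo "match: $match, no match: $no_match, error: $error"
```

Under set -e, the second and third grep calls would each end the script before their status could be stored. This is exactly what happens to the second mv in the job script when there is no Report line.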
A better way to write this section of your script would be:
mv "$tmp/stdout.txt" "$current/tmp.stdout.txt" || exit
grep -q Report "$current/tmp.stdout.txt" || exit
mv "$current/tmp.stdout.txt" "$current/stdout.txt" || exit
grep -q is more efficient than grep >/dev/null, since it can exit immediately when a match is seen, whereas otherwise grep needs to read all the way to the end of the input file. (2>/dev/null is just generally bad practice, since it hides errors you'd need to know about to debug misbehavior; that is why it was removed here.)
Quotes make variables with whitespace or glob characters safe, which they wouldn't be otherwise.
Putting || exit on individual commands you want to be fatal on errors is considerably more reliable than depending on set -e for the reasons given in BashFAQ #105 (skip the allegory for the exercises below if in a hurry, or see https://www.in-ulm.de/~mascheck/various/set-e/ for a list of cases where set -e's behavior is known to differ across different shells and/or shell releases).
Grep will return an error code if no matches are found.
set -e means the error will stop the script.
grep also has options, such as -q, that suppress its output, so you don't need to redirect it at all.
set -e configures bash to abort at the first error it encounters. If grep fails (finds nothing), bash terminates right after the grep.
Most grep versions, however, support the -q option, which makes them quiet (suppresses all output), so the redirection is no longer needed. Also, code relying on set -e isn't easy to maintain; an explicit grep ... || exit 1 would be clearer.

Bash fail fast functions

There are recommendations to use the following options to make Bash fail fast:
set -o errexit
set -o nounset
set -o pipefail
These options, however, do not work as expected for Bash functions whose call is chained with ||.
E.g. in script
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail
my_function() {
  test -z "$1"
  echo "this SHOULD NOT be printed"
}
my_function 1 || echo "???" # 1
my_function 1 # 2
echo "this will not be printed"
Line # 2 will cause the script to terminate with code 1 without any output. This is what I expect.
Line # 1 confuses me actually: my_function will successfully be completed, printing "this SHOULD NOT be printed" and returning code 0, thus the "???" will not be printed.
How can I make Bash process my_function on line # 1 in the same fail-fast way as on line # 2?
There are also better recommendations not to use set -e/errexit:
Clearly we don't want to abort when the conditional (in if [ -d /foo ]; then), returns non-zero. [...] The implementors decided to make a bunch of special rules, like "commands that are part of an if test are immune", or "commands in a pipeline, other than the last one, are immune".
These rules are extremely convoluted, and they still fail to catch even some remarkably simple cases. Even worse, the rules change from one Bash version to another, as Bash attempts to track the extremely slippery POSIX definition of this "feature". When a SubShell is involved, it gets worse still -- the behavior changes depending on whether Bash is invoked in POSIX mode.
If the script were to stop on a non-zero return, foo || bar would also be useless because bar would never run (the script would instead exit). Therefore, one of these convoluted rules is that non-zero returns in commands on the left-hand side of && or || will not cause the script to exit. This is the effect you're seeing.
There is no way to make set -e work properly, and there is no automatic replacement. You cannot make it work the way you want without writing manual error handling.
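Here is a minimal sketch of that manual error handling for the question's my_function. The step that must be fatal propagates its own status with || return. This keeps working even when the call sits on the left of ||, where set -e is ignored.

```shell
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail

my_function() {
  # Propagate the failure explicitly instead of relying on errexit;
  # a bare return reuses the status of the failed test (1).
  test -z "$1" || return
  echo "this SHOULD NOT be printed"
}

my_function 1 || echo "???"   # now prints "???"
echo "script continues"
```

The same pattern (|| return inside functions, || exit at top level) behaves identically in every calling context. Bare set -e does not.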
Looking at the bash manual, here is what it says for the trap command:
trap [-lp] [arg] [sigspec …]
If a sigspec is ERR, the command arg is executed whenever a pipeline (which may consist of a single simple command), a list, or a compound command returns a non-zero exit status, subject to the following conditions. The ERR trap is not executed if the failed command is part of the command list immediately following an until or while keyword, part of the test following the if or elif reserved words, part of a command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command’s return status is being inverted using !. These are the same conditions obeyed by the errexit (-e) option.
Look at the last sentence especially. That explains it!

Bash: Checking for exit status of multi-pipe command chain

I have a problem checking whether a certain command in a multi-pipe command chain threw an error. Usually this is not hard to check, but neither set -o pipefail nor checking ${PIPESTATUS[@]} works in my case. The setup is like this:
cmd="$snmpcmd $snmpargs $agent $oid | grep <grepoptions> for_stuff | cut -d',' -f$fields | sed 's/ubstitute/some_other_stuff/g'"
Note-1: The command was tested thoroughly and works perfectly.
Now, I want to store the output of that command in an array called procdata. Thus, I did:
declare -a procdata
procdata=( $(eval $cmd) )
Note-2: eval is necessary because otherwise $snmpcmd throws up with an invalid option -- <grepoption> error which makes no sense because <grepoption> is not an $snmpcmd option obviously. At this stage I consider this a bug with $snmpcmd but that's another show...
If an error occurs, procdata will be empty. However, it might be empty for two different reasons: either because an error occurred while executing $snmpcmd (e.g. a timeout) or because grep couldn't find what it was looking for. The problem is that I need to be able to distinguish between these two cases and handle them separately.
Thus, set -o pipefail is not an option, since it propagates any error and I can't tell which part of the pipe failed. On the other hand, echo ${PIPESTATUS[@]} is always 0 after procdata=( $(eval $cmd) ), even though I have many pipes!? Yet if I execute the whole command directly at the prompt and call echo ${PIPESTATUS[@]} immediately afterwards, it returns the exit status of all the pipes correctly.
I know I could redirect the error stream to stdout, but then I would have to use heuristics to check whether the elements in procdata are valid data or error messages, and I'd risk false positives. I could also send stdout to /dev/null, capture only the error stream, and check whether ${#procdata[@]} -eq 0, but then I'd have to repeat the call to get the actual data, and the whole command is time-consuming (ca. 3-5 s). I wouldn't want to call it twice. Or I could write errors to a temporary file, but I'd rather avoid the overhead of creating and deleting files.
Any ideas how I can make this work in bash?
Thanks
P.S.:
$ echo $BASH_VERSION
4.2.37(1)-release
A number of things here:
(1) When you say eval $cmd and attempt to get the exit values of the processes in the pipeline contained in the command $cmd, echo "${PIPESTATUS[@]}" would contain only the exit status for eval. Instead of eval, you'd need to supply the complete command line.
(2) You need to get the PIPESTATUS while assigning the output of the pipeline to the variable. Attempting to do that later wouldn't work.
As an example, you can say:
foo=$(command | grep something | command2; echo "${PIPESTATUS[@]}")
This captures the output of the pipeline and the PIPESTATUS array into the variable foo.
You could get the command output into an array by saying:
result=($(head -n -1 <<< "$foo"))
and the PIPESTATUS array by saying
tail -1 <<< "$foo"
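Putting the pieces together, the sketch below tells "the source command failed" apart from "grep found nothing". It assumes the snmp command can be replaced by a stand-in function; good_source, bad_source, and classify are hypothetical names. It drops the status line with sed '$d' instead of GNU head -n -1, since sed '$d' also works with non-GNU tools.

```shell
#!/bin/bash
# Hypothetical stand-ins for the question's $snmpcmd: one succeeds, one fails.
good_source() { printf '%s\n' 'a,1,x' 'b,2,y' 'c,3,x'; }
bad_source()  { echo "timeout" >&2; return 3; }

# classify SOURCE PATTERN: run SOURCE | grep PATTERN | cut, then report
# which stage (if any) caused the empty output.
classify() {
  local src=$1 pattern=$2 out statuses
  local -a status rows
  # PIPESTATUS must be read inside the same command substitution, right
  # after the pipeline; it is appended as the last line of the output.
  out=$("$src" | grep -- "$pattern" | cut -d',' -f2; echo "${PIPESTATUS[*]}")
  statuses=$(tail -n 1 <<< "$out")
  read -r -a status <<< "$statuses"
  # Everything except the last line is real data.
  mapfile -t rows < <(sed '$d' <<< "$out")

  if (( status[0] != 0 )); then
    echo "source failed (status ${status[0]})"
  elif (( status[1] == 1 )); then
    echo "no match"
  else
    echo "rows: ${rows[*]}"
  fi
}

classify good_source ',x$'
classify good_source ',z$'
classify bad_source ',x$'
```

The pipeline is written out directly rather than passed through eval, in line with point (1) above. The rows array plays the role of procdata.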
