Bash: Checking for exit status of multi-pipe command chain

I have a problem checking whether a certain command in a multi-pipe command chain threw an error. Usually this is not hard to check, but neither set -o pipefail nor checking ${PIPESTATUS[@]} works in my case. The setup is like this:
cmd="$snmpcmd $snmpargs $agent $oid | grep <grepoptions> for_stuff | cut -d',' -f$fields | sed 's/substitute/some_other_stuff/g'"
Note-1: The command was tested thoroughly and works perfectly.
Now, I want to store the output of that command in an array called procdata. Thus, I did:
declare -a procdata
procdata=( $(eval $cmd) )
Note-2: eval is necessary because otherwise $snmpcmd throws an invalid option -- <grepoption> error, which makes no sense because <grepoption> is obviously not a $snmpcmd option. At this stage I consider this a bug with $snmpcmd, but that's another story...
If an error occurs, procdata will be empty. However, it might be empty for two different reasons: either because an error occurred while executing $snmpcmd (e.g. a timeout) or because grep couldn't find what it was looking for. The problem is, I need to be able to distinguish between these two cases and handle them separately.
Thus, set -o pipefail is not an option, since it will propagate any error and I can't distinguish which part of the pipe failed. On the other hand, echo ${PIPESTATUS[@]} is always 0 after procdata=( $(eval $cmd) ), even though I have many pipes!? Yet if I execute the whole command directly at the prompt and call echo ${PIPESTATUS[@]} immediately after, it returns the exit status of all the pipes correctly.
I know I could redirect stderr to stdout, but then I would have to use heuristics to check whether the elements in procdata are valid data or error messages, and I'd run the risk of false positives. I could also pipe stdout to /dev/null, capture only the error stream, and check whether ${#procdata[@]} -eq 0. But I'd have to repeat the call to get the actual data, and the whole command is costly (ca. 3-5 s); I wouldn't want to call it twice. Or I could write errors to a temporary file, but I'd rather avoid the overhead of creating/deleting files.
Any ideas how I can make this work in bash?
Thanks
P.S.:
$ echo $BASH_VERSION
4.2.37(1)-release

A number of things here:
(1) When you say eval $cmd and then look at ${PIPESTATUS[@]}, it contains only the exit status of eval itself, because from the shell's point of view the last pipeline executed was the single command eval. To get the exit values of the processes in the pipeline contained in $cmd, you'd need to supply the complete command line instead of eval.
(2) You need to capture PIPESTATUS while assigning the output of the pipeline to the variable; PIPESTATUS is overwritten by every subsequent pipeline, so attempting to read it later wouldn't work.
As an example, you can say:
foo=$(command | grep something | command2; echo "${PIPESTATUS[@]}")
This captures the output of the pipeline, followed by the PIPESTATUS values on a final line, into the variable foo.
You could get the command output into an array by saying:
result=($(head -n -1 <<< "$foo"))
and the PIPESTATUS array by saying
tail -1 <<< "$foo"
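Putting it all together for the pipeline in the question, a sketch might look like this (the pipeline is written out literally instead of via eval, per point (1); the <...> parts are the question's own placeholders, and head -n -1 assumes GNU head):
out=$($snmpcmd $snmpargs $agent $oid | grep <grepoptions> for_stuff | cut -d',' -f$fields | sed 's/substitute/some_other_stuff/g'; echo "${PIPESTATUS[@]}")
status=($(tail -n 1 <<< "$out"))     # e.g. (0 1 0 0), one entry per pipeline stage
procdata=($(head -n -1 <<< "$out"))  # the pipeline's actual output (GNU head)
if (( status[0] != 0 )); then
    echo "snmp command failed (e.g. timeout)" >&2
elif (( status[1] != 0 )); then
    echo "grep found no matches" >&2
fi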


How can I create timestamped logs and error handle in BASH at the same time?

I am writing a BASH script and two of the things I need it to do are:
Provide a timestamped log file.
Handle errors.
I am finding that these two objectives are clashing.
First of all, I am using the ts command to timestamp log entries, e.g. <a command/subscript> 2>&1 | ts '%H:%M:%S ' >> log. Note that I need all lines output by any subscripts to be timestamped too. This works great... until I try to handle errors using exit codes.
Any command that fails (exits with a nonzero code) is immediately followed by the ts command, which executes successfully (exits with a code of 0). This means I am unable to use the commands' exit codes for error handling via the $? variable, because ts is always the last command to run and always has an exit code of 0.
Here is the case statement I am using:
<command> 2>&1 | ts '%H:%M:%S ' >> log
case $? in
    0)
        echo "Success"
        ;;
    *)
        echo "Failure"
        ;;
esac
When a foreground pipeline returns, bash saves the exit status values of its components to an array variable named PIPESTATUS. In this case, you can use ${PIPESTATUS[0]} (or just $PIPESTATUS, since you're interested in the first component) instead of $? to get <command>'s exit status value.
Proof of concept:
$ false | true | false | true
$ declare -p PIPESTATUS
declare -a PIPESTATUS=([0]="1" [1]="0" [2]="1" [3]="0")
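Applied to the case statement from the question, that becomes (a sketch; note that PIPESTATUS must be read immediately, since every subsequent pipeline overwrites it):
<command> 2>&1 | ts '%H:%M:%S ' >> log
case ${PIPESTATUS[0]} in
    0)
        echo "Success"
        ;;
    *)
        echo "Failure"
        ;;
esac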

BASH stops without error, but works if copied in terminal

I am trying to write a script to slice a 13 GB file into smaller parts to launch a split computation on a cluster. What I wrote so far works if I copy and paste it into a terminal, but stops at the first cycle of the for loop when run as a script.
set -ueo pipefail
NODES=8
READS=0days_rep2.fasta
Ntot=$(cat $READS | grep 'read' | wc -l)
Ndiv=$(($Ntot/$NODES))
for i in $(seq 0 $NODES)
do
echo $i
start_read=$(cat $READS | grep 'read' | head -n $(($Ndiv*${i}+1)) | tail -n 1)
echo ${start_read}
end_read=$(cat $READS | grep 'read' | head -n $(($Ndiv*${i}+$Ndiv)) | tail -n 1)
echo ${end_read}
done
If I run the script:
(base) [andrea@andrea-xps data]$ bash cluster.sh
0
>baa12ba1-4dc2-4fae-a989-c5817d5e487a runid=314af0bb142c280148f1ff034cc5b458c7575ff1 sampleid=0days_rep2 read=280855 ch=289 start_time=2019-10-26T02:42:02Z
(base) [andrea@andrea-xps data]$
it seems to stop abruptly after the command "echo ${start_read}" without raising any sort of error. If I copy and paste the script in terminal it runs without problems.
I am using Manjaro linux.
Andrea
The problem:
The problem here (as @Jens suggested in a comment) has to do with the use of the -e and pipefail options; -e makes the shell exit immediately if any simple command gets an error, and pipefail makes a pipeline fail if any command in it fails.
But what's failing? Take a look at the command here:
start_read=$(cat $READS | grep 'read' | head -n $(($Ndiv*${i}+1)) | tail -n 1)
Which, clearly, runs the cat, grep, head, and tail commands in a pipeline (which runs in a subshell so the output can be captured and put in the start_read variable). So cat starts up, and starts reading from the file and shoving it down the pipe to grep. grep reads that, picks out the lines containing 'read', and feeds them on toward head. head reads the first line of that (note that on the first pass, Ndiv is 0, so it's running head -n 1) from its input, feeds that on toward the tail command, and then exits. tail passes on the one line it got, then exits as well.
The problem is that when head exited, it hadn't read everything grep had to give it; that left grep trying to shove data into a pipe with nothing on the other end, so the system sent it a SIGPIPE signal to tell it that wasn't going to work, and that caused grep to exit with an error status. And since grep exited, cat was similarly trying to stuff data into an orphaned pipe, so it got a SIGPIPE as well and also exited with an error status.
Since both cat and grep exited with errors, and pipefail is set, that subshell will also exit with an error status, which means the parent shell considers the whole assignment command to have failed, and it aborts the script on the spot.
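You can reproduce the effect in isolation with yes, which writes until its reader goes away; on most systems the pipeline's status under pipefail becomes 141, i.e. 128 plus SIGPIPE's signal number 13:
$ set -o pipefail
$ yes | head -n 1
y
$ echo $?
141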
Solutions:
So, one possible solution is to remove the -e option from the set command. -e is kind of janky in what it considers an exit-worthy error and what it doesn't, so I don't generally like it anyway (see BashFAQ #105 for details).
Another problem with -e is that (as we've seen here) it doesn't give much of any indication of what went wrong, or even that something went wrong! Error checking is important, but so's error reporting.
(Note: the danger in removing -e is that your script might get a serious error partway through... and then blindly keep running, in a situation that doesn't make sense, possibly damaging things in the process. So you should think about what might go wrong as the script runs, and add manual error checking as needed. I'll add some examples to my script suggestion below.)
Anyway, removing -e just papers over the fact that this isn't a really good approach to the problem. You're reading (or trying to read) over the entire file multiple times, and processing it through multiple commands each time. You really should only be reading through the thing twice: once to figure out how many reads there are, and once to break it into chunks. You might be able to write a program to do the splitting in awk, but most unix-like systems already have a program specifically for this task: split. There's also no need for cat everywhere, since the other commands are perfectly capable of reading directly from files (again, @Jens pointed this out in a comment).
So I think something like this would work:
#!/bin/bash
set -uo pipefail # I removed the -e 'cause I don't trust it
nodes=8 # Note: lower- or mixed-case variables are safer to avoid conflicts
reads=0days_rep2.fasta
splitprefix=0days_split_
Ntot=$(grep -c 'read' "$reads") || { # grep can both read & count in a single step
# The || means this'll run if there was an error in that command.
# A normal thing to do is print an error message to stderr
# (with >&2), then exit the script with a nonzero (error) status
echo "$0: Error counting reads in $reads" >&2
exit 1
}
Ndiv=$((($Ntot+$nodes-1)/$nodes)) # Force it to round *up*, not down
grep 'read' "$reads" | split -l $Ndiv -a1 - "$splitprefix" || {
echo "$0: Error splitting fasta file" >&2
exit 1
}
This'll create files named "0days_split_a" through "0days_split_h". If you have the GNU version of split, you could add its -d option (use numeric suffixes instead of letters) and/or --additional-suffix=.fasta (to add the .fasta extension to the split files).
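With GNU split, that line might look like this (an untested sketch producing 0days_split_0.fasta, 0days_split_1.fasta, and so on):
grep 'read' "$reads" | split -l "$Ndiv" -a1 -d --additional-suffix=.fasta - "$splitprefix"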
Another note: if only a small fraction of that big file consists of 'read' lines, it might be faster to run grep 'read' "$reads" >sometempfile first, and then run the rest of the script on the temp file, so you don't have to read & thin it twice. But if most of the file is 'read' lines, this won't help much.
Alright, we have found the troublemaker: set -e in combination with set -o pipefail.
Gordon Davisson's answer provides all the details. I provide this answer for the sole purpose of reaping an upvote for my debugging efforts in the comments to your answer :-)

Bash script doesn't continue when condition fulfilled

To check the validity of lines in a file, I'm using a condition which is met when egrep -v does NOT return an empty result. When there are invalid lines this works fine (i.e. the conditional block is executed), but when every line is valid the script ends without further processing.
Script:
INVALID_HOSTS=$(egrep -v ${IP_REGEX} hosts)
if [[ ! -z "${INVALID_HOSTS}" ]]; then
    echo "Invalid hosts:"
    for entry in ${INVALID_HOSTS}; do
        echo ${entry}
    done
    exit_with_error_msg "hosts file contains invalid hosts (Pattern must be: \"\d+.\d+.\d+.\d+:\d+\"), exiting"
else
    echo "all cool"
fi
echo "after if-else"
So when there are no invalid lines then neither the echo "all cool" nor echo "after if-else" get executed. The script just stops and returns to the shell.
When set -x is enabled, then it prints:
++ egrep -v '^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):([1-9]|[1-5]?[0-9]{2,4}|6[1-4][0-9]{3}|65[1-4][0-9]{2}|655[1-2][0-9]|6553[1-5])$' hosts
+ INVALID_HOSTS=
Playing around with it I'm sure that it's about the if [[ ! -z "${INVALID_HOSTS}" ]]; then, but my bash wizardry is not strong enough to overcome this magical barrier.
Thanks for any help!
This is a bit long for a comment. I'll start it as an answer and we can work our way through further details or I can scrap it entirely if not helpful. I'll make some assumptions and let us see if it hits the spot.
For starters, you do indeed use the value further on, so command substitution into a variable is not entirely useless; but otherwise it's much easier to determine a match (or lack thereof) from grep's return value. If anything matched (the output would be non-empty), it returns the (shell true) value 0; otherwise it returns false (in this case 1). Not to mention the ! -z test notation should really be -n, if used at all.
And this is where I'd start assuming a bit. I suspect this is not your entire script and you have the errexit option turned on in that shell session (or through an rc file in general), either by means of set -o errexit, set -e, or running bash with the -e option. Since grep not matching anything returns a failure status, your shell (script execution) would terminate after having encountered a failing command.
Observe the difference between:
$ bash -ec 'grep "BOGUS" /etc/fstab ; echo "$?"'
$ bash -c 'grep "BOGUS" /etc/fstab ; echo "$?"'
1
With errexit, bash terminates after grep has "failed" and we never even reach the echo.
Since the assumption proved correct, a small extension. If errexit is what you want, you'd need to either change the option value before/after a command that you want to be able to fail (return a non-zero value) without affecting your script:
set +o errexit
grep THIS_COULD_NOT_MATCH...
set -o errexit
Or you can ignore return value of individual commands by ensuring their success:
grep THIS_COULD_NOT_MATCH... || true
You can also still use potentially "failing" commands safely in conditionals (such as if) without terminating your shell.
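For the script in the question, that means grep's exit status can drive the if directly, with errexit left enabled throughout. A sketch (exit_with_error_msg is the question's own helper):
set -o errexit
if INVALID_HOSTS=$(egrep -v "${IP_REGEX}" hosts); then
    # egrep -v returned 0: at least one invalid line was found
    echo "Invalid hosts:"
    echo "${INVALID_HOSTS}"
    exit_with_error_msg "hosts file contains invalid hosts, exiting"
else
    # egrep -v returned 1 (no match); safe inside the if condition under errexit
    echo "all cool"
fi
echo "after if-else"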

What is the meaning of grep with stdout redirection to /dev/null in job script?

I have a bash script that is submitted as a bash job. It creates some files, executes some computations, moves the output files somewhere else and cleans up. For moving the output files, it contains these lines:
set -e
mv $tmp/stdout.txt $current/tmp.stdout.txt
grep Report $current/tmp.stdout.txt >/dev/null 2>&1
mv $current/tmp.stdout.txt $current/stdout.txt
set +e
If the computation was successful, the output file stdout.txt contains several lines that start with Report; if not, it contains none. Further processing checks that the $current/stdout.txt file exists (and resubmits the job otherwise).
The first mv moves the output file from the temporary directory to the final directory under a temporary name; and the second mv renames the output file to its final name. But what is the purpose of the grep in between? If the output file contains lines with Report, they are redirected to /dev/null and nothing happens. If the output file contains no lines with Report, it doesn't output anything, neither to the redirected stdout nor to the redirected stderr. So my impression is that this line does nothing, and I should replace mv+grep+mv with a single mv. What functionality am I overlooking here?
The set -e is important here.
grep sets its exit status to 0 if the input file is successfully processed and any matches are found, and to a nonzero value otherwise.
set -e tells the shell to exit if any checked command has a nonzero exit status. (It has a bunch of gotchas and caveats, and generally shouldn't be used; see BashFAQ #105).
Thus -- unless this code is embedded in a context that triggers one of the several scenarios where set -e has no effect -- your script terminates before the second mv if the grep has no matches.
A better way to write this section of your script would be:
mv "$tmp/stdout.txt" "$current/tmp.stdout.txt" || exit
grep -q Report "$current/tmp.stdout.txt" || exit
mv "$current/tmp.stdout.txt" "$current/stdout.txt" || exit
grep -q is more efficient than grep >/dev/null, since it can exit immediately when a match is seen, whereas otherwise grep needs to read all the way to the end of the input file. (2>/dev/null is just generally bad practice, since it hides errors you'd need to know about to debug misbehavior; hence its removal here.)
Quotes make variables with whitespace or glob characters safe, which they wouldn't be otherwise.
Putting || exit on individual commands you want to be fatal on errors is considerably more reliable than depending on set -e for the reasons given in BashFAQ #105 (skip the allegory for the exercises below if in a hurry, or see https://www.in-ulm.de/~mascheck/various/set-e/ for a list of cases where set -e's behavior is known to differ across different shells and/or shell releases).
Grep will return an error code if no matches are found.
set -e means the error will stop the script.
There are also options for grep that suppress its output entirely, rather than capturing or redirecting it.
set -e configures bash to abort at the first error it encounters. If the grep fails (finds nothing), bash will terminate after the grep.
Most grep versions, however, support the -q option, which makes them quiet (suppresses all output), so the redirection is no longer needed. Also, code relying on set -e isn't easy to maintain. A proper grep ... || exit 1 would be more explicit.

How to run flakey commands within set -e context?

I want to protect most of my bash script with set -e, in order to fail early and loudly when an error is detected during the script's processing. However, I still want to be able to run some commands that are actually expected to fail, such as using grep to evaluate the presence/absence of some file content that is used to direct the control flow of the rest of the script. How can I run grep within a set -e context, such that A) grep is allowed to fail and B) grep's exit status is recorded for access by the rest of the script?
In ordinary POSIX sh, I would do something like:
grep 'needle' haystack >/dev/null
if [ "$?" -eq 0 ]; then
handle_grep_results
else
handle_grep_no_results
fi
However, when set -e is specified before this section, then the script exits early whenever grep fails to find the needle. One way to work around this is to temporarily disable the protections with set +e, and then re-enable them after the section, but I would prefer to leave the protections on, if that makes sense. Is this possible with bash?
You can simply check the return status of grep:
grep -q luck myfile.txt || no_luck=1
Shell utilities use the return status to communicate with the shell; what they are communicating is not necessarily an error condition. As the grep example shows, it can be a simple boolean. In fact, the [[ builtin (and its friends [ and test) do precisely that: use the status code to return a boolean result. That doesn't make those utilities "flakey".
set -e ignores non-zero statuses from commands executed within a conditional or on the left-hand side of a || or && connector, which is what makes patterns like the one above possible under set -e.
Having said that, -e is a very blunt tool and its use in production code is not generally recommended. You can always explicitly fail using ||:
important_setup || exit 1
More generally, you should understand that the very purpose of if is to run a command and check its exit code. Anything which looks like
command
if [ $? = 0 ]; then
is more compactly and idiomatically written
if command; then
and in this context, a failure exit status from command is not a condition which causes set -e to terminate the script (because then you couldn't have else clauses in scripts with set -e!)
In this particular example where both the then and else blocks contain simple commands, you can simplify to the shorthand
grep -q 'needle' haystack && handle_grep_results || handle_grep_no_results
which also suggests
command || true
when you simply don't care whether command succeeded or not in a script with set -e.
(Notice also grep -q over grep >/dev/null - the former implies -m 1, i.e. grep can close the file and return success as soon as it finds the first match.)
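And if the rest of the script needs the exact status later (for instance to tell "no match", status 1, apart from a genuine grep error, status 2 or higher), a common sketch is:
set -e
grep -q 'needle' haystack && found=0 || found=$?
# The && || list is never "failing" as a whole, so set -e stays happy
if [ "$found" -eq 0 ]; then
    handle_grep_results
elif [ "$found" -eq 1 ]; then
    handle_grep_no_results
else
    echo "grep itself failed with status $found" >&2
    exit 1
fi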
