Bash source doesn't bail out on errexit, any possible reason?

I can't explain what happens in my scripts; could anyone shed some light, please?
I am doing some pretty standard stuff: setting errexit, sourcing one script from another, catching errors, and eventually bailing out if there are any.
s1.sh
#!/bin/bash
num=1
if [ $num -eq 1 ]; then
FOO="$(set -o | grep -e "errexit" -e "nounset" | grep off >&2)"
VAR="SOME/TEXT/$(basename "$UNBOUND_VARIABLE")"
RET="$(echo $?)"
#ERR="$UNBOUND_VARIABLE" # this will be trapped and source will exit at this line
BAR="LAST_IS_GOOD"
fi
s2.sh
function source_all
{
local __f
set -exu
for __f in ${@}; do
case "$__f" in
"s1.sh" ) set -o posix; (source "$(pwd)/$__f") || return 1; echo "$$ $?" >&2 ;;
esac
done
set +eux +o posix
}
function main
{
source_all s1.sh || return 1
}
main
output
+ for __f in ${@}
+ case "$__f" in
+ set -o posix
++ pwd
+ source (blah/blah)/s1.sh
++ num=1
++ '[' 1 -eq 1 ']'
+++ set -o
+++ grep -e errexit -e nounset
+++ grep off
++ FOO=
(blah/blah)/s1.sh: line 6: UNBOUND_VARIABLE: unbound variable # should exit
++ VAR=SOME/TEXT/
+++ echo 1
++ RET=1
++ BAR=LAST_IS_GOOD
+ echo '9568 0'
9568 0
+ set +eux +o posix
source --help
Exit Status:
Returns the status of the last command executed in FILENAME; fails if
FILENAME cannot be read.
The question is: why doesn't the source invoked in s2.sh return 1? Why does it keep processing s1.sh after UNBOUND_VARIABLE?
Thanks for your input.

The "unbound variable" error appears because set -u is in effect and $UNBOUND_VARIABLE refers to a variable that has never been assigned, in the statement
VAR="SOME/TEXT/$(basename "$UNBOUND_VARIABLE")"
The error is raised inside the command substitution, which runs in its own subshell, so only that subshell is aborted. The assignment then completes with the substitution's non-zero exit status (1, as RET shows). Normally set -e would stop the script at that point, but the subshell
(source "$(pwd)/$__f")
is the left-hand operand of || return 1, and bash ignores errexit for every command executed in that position, including the commands of the sourced file. So s1.sh runs to the end, source returns the status of its last command (0), and return 1 is never reached. For the same reason, the command
false || echo x
would not terminate the execution, even though a single
false
would.
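
A minimal sketch of this behaviour, assuming bash is available on PATH. The two child scripts are made up for illustration: in the first, the subshell is the left-hand operand of ||, so the failing false inside it does not stop it; in the second, the same subshell stands alone and errexit applies.

```shell
# Case 1: subshell on the left of ||, so errexit is ignored inside it.
guarded=$(bash -c 'set -e; (false; echo "still running") || echo "fallback"')

# Case 2: the same subshell on its own; errexit stops it at false and
# then stops the parent, so nothing is printed.
status=0
unguarded=$(bash -c 'set -e; (false; echo "still running"); echo "not reached"') || status=$?

printf 'guarded:   "%s"\n' "$guarded"
printf 'unguarded: "%s" (exit status %s)\n' "$unguarded" "$status"
```

In the first case neither the errexit exit nor the fallback branch is triggered, which matches the trace in the question where s1.sh runs to completion and the subshell reports status 0.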

Related

Why chain commands with "and" operator ("&&") don't stop on non-zero result with enabled "errexit"?

I have default bash v4.4.12 set up on Debian:
$ bash --version
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
I prefer to use these options in my scripts:
set -o pipefail
set -o errexit
set -o nounset
They make a pipeline fail when any command in it fails (pipefail), exit the script on a non-zero status (errexit), and treat references to unset variables as errors (nounset).
I have test script:
set -o pipefail
set -o errexit
set -o nounset
set -o xtrace
return_1() { return 22; }
test_1 () {
return_1;
echo 'after';
}
success_1() {
echo success
}
if [ "${1:-}" == 1 ]; then
# make "return 1" in the root level
return_1
fi
if [ "${1:-}" == 2 ]; then
# run test_1, we will get "return 1" within a function
test_1
fi
if [ "${1:-}" == 3 ]; then
# run test_1 and success_1 in chain
# success_1 MUST NOT be ran because test_1 makes non-zero status
# in fact, success_1 will be ran =(
test_1 && success_1
fi
Testing.
$ bash /test.sh 1; echo "status: ${?}"
+ '[' 1 == 1 ']'
+ return_1
+ return 22
status: 22
Works as expected.
$ bash /test.sh 2; echo "status: ${?}"
+ '[' 2 == 1 ']'
+ '[' 2 == 2 ']'
+ test_1
+ return_1
+ return 22
status: 22
Everything is right. The line "echo 'after';" was not executed.
$ bash /test.sh 3; echo "status: ${?}"
+ '[' 3 == 1 ']'
+ '[' 3 == 2 ']'
+ '[' 3 == 3 ']'
+ test_1
+ return_1
+ return 22
+ echo after
after
+ success_1
+ echo success
success
status: 0
Completely NOT right. :(
1. The line "echo 'after';" was executed.
2. The function "success_1" was called as well.
Really, what's going on in this case?
UPD: Manual reference:
http://man7.org/linux/man-pages/man1/bash.1.html

You've fallen into the classic trap of using set -e. Do carefully read through Why doesn't set -e (or set -o errexit, or trap ERR) do what I expected?
From the GNU bash documentation for set -e
The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test in an if statement, part of any command executed in a && or || list except the command following the final && or ||
What do you think happens for this code?
#!/usr/bin/env bash
set -e
test -d nosuchdir && echo no dir
echo survived
Run it for yourself and observe that set -e has no effect on commands run as part of a && or || list. The same thing happened in the third case you reported. Even though return_1 fails inside test_1, calling test_1 on the left-hand side of && makes the shell ignore the errexit option for everything executed inside the function, so test_1 carries on to echo 'after' and returns 0.
It is better to avoid set -e and add your own error checks. In this case, use the function's return code in an if condition and negate the result
if ! test_1; then
success_1
fi
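
Another way to make the failure inside test_1 visible, whatever the calling context, is to propagate it explicitly. The sketch below reuses the function names from the question; the `|| return` line is an addition for illustration, not part of the original script.

```shell
return_1() { return 22; }

test_1() {
    # Propagate the failure explicitly; a bare "return" reuses the status
    # of the last command executed, here 22.
    return_1 || return
    echo 'after'
}

success_1() { echo success; }

status=0
output=$(test_1 && success_1) || status=$?
printf 'output: "%s", status: %s\n' "$output" "$status"
```

With the explicit return, neither 'after' nor 'success' is printed, and the && list reports the status of return_1.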
Also read through Raise error in a Bash script, which has some well-written answers on how best to handle errors in shell scripts.

Why is grep "-c" with 0 count exits program with status code -1

Why does the script fail with exit code '-1' when grep -c reports a count of 0? This happens only when set -o errexit is set.
Copy and paste the following into a bash shell:
cat <<'EOT' > /tmp/foo.sh
#!/usr/bin/env bash
function bash_traceback() {
local lasterr="$?"
set +o xtrace
local code="-1"
local bash_command=${BASH_COMMAND}
echo "Error in ${BASH_SOURCE[1]}:${BASH_LINENO[0]} ('$bash_command' exited with status $lasterr)"
if [ ${#FUNCNAME[@]} -gt 2 ]; then
# Print out the stack trace described by $function_stack
echo "Traceback of ${BASH_SOURCE[1]} (most recent call last):"
for ((i=0; i < ${#FUNCNAME[@]} - 1; i++)); do
local funcname="${FUNCNAME[$i]}"
[ "$i" -eq "0" ] && funcname=$bash_command
echo -e " $i: ${BASH_SOURCE[$i+1]}:${BASH_LINENO[$i]}\t$funcname"
done
fi
echo "Exiting with status ${code}"
exit "${code}"
}
test_redirecting_of_stdout_stderr() {
# Exit program when error found
set -o errexit
# Exit program when undefined variable is being used
set -o nounset
local up_count
up_count=$(ls | grep -c NOTHING_MATCHED)
echo "up_count: $up_count"
}
# provide an error handler whenever a command exits nonzero
trap 'bash_traceback' ERR
set -o errtrace
test_redirecting_of_stdout_stderr
EOT
bash /tmp/foo.sh
Output
debian:~/my-mediawiki-docker$ bash /tmp/foo.sh
Error in /tmp/foo.sh:31 ('up_count=$(ls | grep -c NOTHING_MATCHED)' exited with status 255)
Traceback of /tmp/foo.sh (most recent call last):
0: /tmp/foo.sh:31 up_count=$(ls | grep -c NOTHING_MATCHED)
1: /tmp/foo.sh:40 test_redirecting_of_stdout_stderr
Exiting with status -1
debian:~/my-mediawiki-docker$

grep reports "failure" if it fails to find any matching lines. Here's man grep:
EXIT STATUS
The exit status is 0 if selected lines are found, and 1 if not found.
If you want to allow a command to exit with non-zero without terminating the script during errexit, use || true:
up_count=$(ls | grep -c NOTHING_MATCHED) || true

Setting errexit assumes that any non-zero exit status from a command is a fatal error. That is not the case with grep, which uses a non-zero exit status simply to indicate that nothing matched. This allows you to write code like
if grep "$pattern" file.txt; then
echo "Found a match"
else
echo "Found no match"
fi
errexit specifically ignores the exit status of a command used in an if condition like the above, but cannot know whether a line like
up_count=$(ls | grep -c NOTHING_MATCHED)
is "allowed" to have a non-zero exit status. The workaround is to protect such commands with
# As an aside, see http://mywiki.wooledge.org/ParsingLs
up_count=$(ls | grep -c NOTHING_MATCHED) || :
In general, it is better to do your own error checking than to rely on errexit; see http://mywiki.wooledge.org/BashFAQ/105 for more information.
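
A small sketch of the guard in action. The input lines and the pattern are placeholders, and the whole thing runs in a command substitution so that errexit does not leak into the calling shell.

```shell
out=$(
    set -o errexit
    # grep -c prints 0 and exits with status 1 when nothing matches;
    # the || true guard keeps that status from triggering errexit.
    count=$(printf 'alpha\nbeta\n' | grep -c NOTHING_MATCHED) || true
    echo "count: $count"
)
printf '%s\n' "$out"
```

The count is still captured because grep writes 0 to standard output before exiting with status 1; only the exit status is discarded.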

Abort bash script using a function

I am trying to write a function that can interrupt the script execution (on a fatal error):
quit() {
echo -e "[ERROR]" | log
exit 1
}
Call example:
if [ "$#" -eq 1 ]; then
# Do stuff
else
echo -e "function getValue: wrong parameters" | log
quit
fi
The quit function is called (its echo shows up in the logfile) but the script keeps going. I've read that exit only terminates the subshell (is that true?), which would mean that it terminates the quit function but not the entire script.
For now, I prefer not to use a return code in the quit function, as that implies a lot of code refactoring.
Is there a way to stop the script from the quit function?
EDIT:
full example of a case where the error appears:
#!/bin/bash
logfile="./testQuit_log"
quit() {
echo "quit" | log
exit 1
}
log() {
read data
echo -e "$data" | tee -a "$logfile"
}
foo() {
if [ "$#" -ne 1 ]; then
echo "foo error" | log
quit
fi
echo "res"
}
rm $logfile
var=`foo p1 p2`
var2=`foo p1`
echo "never echo that!" | log
EDIT2:
it works correctly when I switch these lines:
var=`foo p1 p2`
var2=`foo p1`
with
var= foo p1 p2
var2= foo p1
Any explanation? Is that because of the subshell?

As outlined in the question's comment section, using exit in a subshell only exits that subshell, and it is not easy to work around this limitation. Luckily, exiting from a subshell, or even from a function in the same shell, is not the best idea anyway.
A good pattern for handling an error at a lower level (such as a function or subshell) in a language without exceptions is to return the error instead of terminating the program directly from that level:
foo() {
if [ "$#" -ne 1 ]; then
echo "foo error" | log
return 1
else
echo "res"
# return 0 is the default
fi
}
This allows control flow to return to the highest level even on error, which is generally considered a good thing (and greatly eases debugging complex programs). You can use your function like this:
var=$( foo p1 p2 ) || exit 1
var2=$( foo p1 ) || exit 1
Just to be clear, the || branch is not entered because the assignment fails (it won't), but because the command inside the command substitution ($( )) returns a non-zero exit code.
Note that $( ) should be used for command substitution instead of backticks, see this related question.
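
The pattern can be tried out with a sketch like the following. The foo here is a stand-in for the function in the question (it writes its error to stderr instead of the log helper), and the parentheses around the failing call are only there so that the demo does not terminate the shell that runs it.

```shell
# Stand-in for foo: report the error by returning 1 instead of calling exit.
foo() {
    if [ "$#" -ne 1 ]; then
        echo "foo error" >&2
        return 1
    fi
    echo "res"
}

# Failing call: the caller decides to stop. Wrapped in a subshell so only
# the subshell exits.
demo_status=0
(
    var=$(foo p1 p2) || exit 1
    echo "never reached"
) || demo_status=$?

# Successful call: the || branch is not taken.
var2=$(foo p1) || exit 1
printf 'demo_status=%s var2=%s\n' "$demo_status" "$var2"
```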

A debug trace of the script shows the problem. var=`foo p1 p2` forces foo to execute in a subshell (note the increase in level from + to ++ at the time of the call below). Execution proceeds in the subshell until exit 1 is reached. exit 1 then exits only the subshell and returns control to the primary script.
$ bash -x exitstuck.sh
+ logfile=./testQuit_log
+ rm ./testQuit_log
++ foo p1 p2 # var=`foo p1 p2` enters subshell '+ -> ++'
++ '[' 2 -ne 1 ']'
++ echo 'foo error'
++ log # log() called
++ read data
++ echo -e 'foo error'
++ tee -a ./testQuit_log
++ quit # quit() called
++ echo quit
++ log
++ read data
++ echo -e quit
++ tee -a ./testQuit_log
++ exit 1 # exit 1 exits subshell, note: '++ -> +'
+ var='foo error
quit'
++ foo p1
++ '[' 1 -ne 1 ']'
++ echo res
+ var2=res
+ log
+ read data
+ echo 'never echo that!'
+ echo -e 'never echo that!'
+ tee -a ./testQuit_log
never echo that!
You can use this to your advantage to accomplish what it is you are trying to do. How? When exit 1 exits the subshell, it does so returning the exit code 1. You can test the exit code in your main script and exit as you intend:
var=`foo p1 p2`
if [ $? -eq 1 ]; then
exit
fi
var2=`foo p1`
Running in debug again shows the intended operation:
$ bash -x exitstuck.sh
+ logfile=./testQuit_log
+ rm ./testQuit_log
++ foo p1 p2
++ '[' 2 -ne 1 ']'
++ echo 'foo error'
++ log
++ read data
++ echo -e 'foo error'
++ tee -a ./testQuit_log
++ quit
++ echo quit
++ log
++ read data
++ echo -e quit
++ tee -a ./testQuit_log
++ exit 1
+ var='foo error
quit'
+ '[' 1 -eq 1 ']'
+ exit
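
The same effect can be reproduced in a few lines. This is a reduced sketch, not the original script: quit is cut down to a bare exit 1, and the echoed strings are placeholders.

```shell
quit() { exit 1; }

# exit inside the command substitution ends only the subshell; its status
# becomes the status of the assignment, which the caller can inspect.
sub_status=0
var=$(echo "partial output"; quit; echo "not printed") || sub_status=$?

echo "still running: var='$var' status=$sub_status"
```

Anything the subshell printed before exiting is still captured in var, which is why the trace above shows 'foo error' and 'quit' assigned to it.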

Bash get exit status of command when 'set -e' is active?

I generally have -e set in my Bash scripts, but occasionally I would like to run a command and get the return value.
Without doing the set +e; some-command; res=$?; set -e dance, how can I do that?

From the bash manual:
The shell does not exit if the command that fails is [...] part of any command executed in a && or || list [...].
So, just do:
#!/bin/bash
set -eu
foo() {
# exit code will be 0, 1, or 2
return $(( RANDOM % 3 ))
}
ret=0
foo || ret=$?
echo "foo() exited with: $ret"
Example runs:
$ ./foo.sh
foo() exited with: 1
$ ./foo.sh
foo() exited with: 0
$ ./foo.sh
foo() exited with: 2
This is the canonical way of doing it.

as an alternative
ans=0
some-command || ans=$?

Maybe try running the commands in question in a subshell, like this?
res=$(some-command > /dev/null; echo $?)

Given the behavior of the shell described in this question, it's possible to use the following construct:
#!/bin/sh
set -e
{ custom_command; rc=$?; } || :
echo $rc

Another option is to use a simple if. It is a bit longer, but fully supported by bash: the command can return a non-zero value without the script exiting, even with set -e. See this simple script:
#! /bin/bash -eu
f () {
return 2
}
if f;then
echo Command succeeded
else
echo Command failed, returned: $?
fi
echo Script still continues.
When we run it, we can see that script still continues after non-zero return code:
$ ./test.sh
Command failed, returned: 2
Script still continues.

Use a wrapper function to execute your commands:
function __e {
set +e
"$@"
__r=$?
set -e
}
__e yourcommand arg1 arg2
And use $__r instead of $?:
if [[ __r -eq 0 ]]; then
echo "success"
else
echo "failed"
fi
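
One caveat with this wrapper is that it always turns errexit back on, even if the caller had it off. A possible variant, sketched here under the hypothetical name run_capture, checks $- first and only restores errexit when it was set. The result variable __r follows the answer.

```shell
# Hypothetical variant of the wrapper: run a command, store its status in
# __r, and leave errexit in whatever state the caller had it.
run_capture() {
    case $- in
        *e*) set +e; "$@"; __r=$?; set -e ;;
        *)   "$@"; __r=$? ;;
    esac
}

run_capture false
echo "false -> $__r"
run_capture sh -c 'exit 3'
echo "sh -c 'exit 3' -> $__r"
```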
Another method is to call commands in a pipe, with the caveat that you have to quote the pipe. This does a safe eval.
function __p {
set +e
local __A=() __I
for (( __I = 1; __I <= $#; ++__I )); do
if [[ "${!__I}" == '|' ]]; then
__A+=('|')
else
__A+=("\"\$$__I\"")
fi
done
eval "${__A[@]}"
__r=$?
set -e
}
Example:
__p echo abc '|' grep abc
And I actually prefer this syntax:
__p echo abc :: grep abc
Which I could do with
...
if [[ ${!__I} == '::' ]]; then
...

BASH getopt command returns its own parameters instead of command line parameters

I am creating a Bash script and using the getopt command to parse command line arguments. Instead of returning the arguments provided on the command line, getopt returns the arguments that were supplied to the getopt command itself. I am not sure what could be going on, as the code was working perfectly and then, seemingly out of nowhere, stopped working correctly (and no, I haven't updated anything or changed any code or environment settings). I can't use getopts (with the extra 's') because it is, for some unknown reason, not installed on the machine that will be running this script.
Even though the script is supplied with zero command line arguments, the getopt command is for some reason returning all of the arguments that I have supplied, minus the -o flag, instead of the expected -- value indicating the end of the options. The code that I have is as follows:
SHORT_OPTS=":hvso:l:d:n:p:t:"
LONG_OPTS="help,version,submit-job,output:,library:,job-dir:"
LONG_OPTS="${LONG_OPTS},num-nodes:,num-procs:,max-time:"
OPTS=$(getopt -o "${SHORT_OPTS}" -l "${LONG_OPTS}" -a -n "${PROG_NAME}" -- "${@}")
# Check for invalid command line options and arguments
if [[ ${?} -ne ${SUCCESS} ]] ; then
echo -e "${PROG_NAME}: error: Invalid option or argument\n" >&2
usage ; exit ${FAILURE}
else
echo "BEFORE $@"
eval set -- ${OPTS}
echo "AFTER $@"
fi
# Process command line options and their arguments
while true ; do
case "${1}" in
-h | --help)
# Display script usage information and exit
usage ; exit ${SUCCESS} ;;
-v | --version)
# Display script version information and exit
echo "${PROG_NAME} v${PROG_VERSION}" ; exit ${SUCCESS} ;;
-s | --submit-job)
# Enable automatic submission of the Moab job
JOB_AUTO_SUBMIT="${PREF_YES}" ; shift 1 ;;
-o | --output)
# Set the base name for output file names
TARGET="${2}" ; shift 2 ;;
-l | --library)
# Set the library to use for NWChem atomic configurations
NW_LIB="${2}" ; shift 2 ;;
-d | --job-dir)
# Ensure the specified directory for the Moab job exists
if [[ -e "${2}" ]] ; then
JOB_WORK_DIR=$(resolvePath "${2}") ; shift 2
else
echo -e "${PROG_NAME}: error: -d ${2}: No such directory\n"
usage ; exit ${FAILURE}
fi ;;
-n | --num-nodes)
# Ensure the number of compute nodes is greater than zero
if positiveInt "${2}" ; then
JOB_NODES="${2}" ; shift 2
else
echo -n "${PROG_NAME}: error: -n ${1}: Number of "
echo -e "job nodes must be a positive integer\n"
usage ; exit ${FAILURE}
fi ;;
-p | --num-procs)
# Ensure the number of processors per node is greater than zero
if positiveInt "${2}" ; then
JOB_PROCS="${2}" ; shift 2
else
echo -n "${PROG_NAME}: error: -p ${2}: Number of "
echo -e "processors per node must be a positive integer\n"
usage ; exit ${FAILURE}
fi ;;
-t | --max-time)
# Ensure the maximum job runtime is in the correct format
if [[ "${2}" == [0-9][0-9]:[0-9][0-9]:[0-9][0-9] ]] ; then
JOB_MAX_TIME="${2}" ; shift 2
else
echo -n "${PROG_NAME}: error: -t ${2}: Invalid time "
echo -e "format, please use hh:mm:ss format\n"
usage ; exit ${FAILURE}
fi ;;
--)
# No more options to process
shift ; break ;;
esac
done
# Check to see if POTCAR and CONTCAR locations were specified
if [[ ${#} -eq 2 ]] ; then
# Regular expressions for identifying POTCAR and CONTCAR files
PCAR_REGEX="[Pp][Oo][Tt][Cc][Aa][Rr]"
CCAR_REGEX="[Cc][Oo][Nn][Tt][Cc][Aa][Rr]"
# Attempt to identify POTCAR and CONTCAR argument ordering
if [[ ${1} =~ ${PCAR_REGEX} && ${2} =~ ${CCAR_REGEX} ]] ; then
POTCAR="${1}" ; CONTCAR="${2}" ; shift 2
else
POTCAR="${2}" ; CONTCAR="${1}" ; shift 2
fi
# Accept exactly two or zero command line arguments
elif [[ ${#} -ne 0 ]] ; then
echo "${PROG_NAME}: error: ${#}: Invalid argument count, expected [2|0]"
echo "$@"
exit ${FAILURE}
fi
Given this code, and running the application, I get the following output:
BEFORE
AFTER -- :hvso:l:d:n:p:t: -l help,version,submit-job,output:,library:,job-dir:,num-nodes:,num-procs:,max-time: -a -n vasp2nwchem --
vasp2nwchem: error: 7: Invalid argument count, expected [2|0]
:hvso:l:d:n:p:t: -l help,version,submit-job,output:,library:,job-dir:,num-nodes:,num-procs:,max-time: -a -n vasp2nwchem --
So, the code enters the while loop portion of the code, jumps to the last case, and shifts off the first --, leaving me with all of the arguments that I supplied to getopt, minus the -o flag.
Any light that anyone could shed on this conundrum would be immensely appreciated, because it is seriously about to send me over the edge, especially since this code was functional no less than thirty minutes ago, and has now stopped working entirely!

I don't see anything wrong. I have GNU getopt installed as /usr/gnu/bin/getopt (and BSD getopt in /usr/bin), so this script (chk.getopt.sh) is almost equivalent to the start of yours, though I do set the PROG_NAME variable. This is more or less the SSCCE (Short, Self-Contained, Complete Example) for your rather substantial script.
#!/bin/bash
PROG_NAME=$(basename $0 .sh)
SHORT_OPTS=":hvso:l:d:n:p:t:"
LONG_OPTS="help,version,submit-job,output:,library:,job-dir:"
LONG_OPTS="${LONG_OPTS},num-nodes:,num-procs:,max-time:"
OPTS=$(/usr/gnu/bin/getopt -o "${SHORT_OPTS}" -l "${LONG_OPTS}" -a -n "${PROG_NAME}" -- "$@")
# Check for invalid command line options and arguments
if [[ ${?} -ne ${SUCCESS} ]] ; then
echo -e "${PROG_NAME}: error: Invalid option or argument\n" >&2
usage ; exit ${FAILURE}
else
echo "BEFORE $@"
eval set -- ${OPTS}
echo "AFTER $@"
fi
When I run it, this is the output:
$ bash -x chk.getopt.sh -ooutput -nnumber -pperhaps -ppotato -- -o oliphaunt
++ basename chk.getopt.sh .sh
+ PROG_NAME=chk.getopt
+ SHORT_OPTS=:hvso:l:d:n:p:t:
+ LONG_OPTS=help,version,submit-job,output:,library:,job-dir:
+ LONG_OPTS=help,version,submit-job,output:,library:,job-dir:,num-nodes:,num-procs:,max-time:
++ /usr/gnu/bin/getopt -o :hvso:l:d:n:p:t: -l help,version,submit-job,output:,library:,job-dir:,num-nodes:,num-procs:,max-time: -a -n chk.getopt -- -ooutput -nnumber -pperhaps -ppotato -- -o oliphaunt
+ OPTS=' -o '\''output'\'' -n '\''number'\'' -p '\''perhaps'\'' -p '\''potato'\'' -- '\''-o'\'' '\''oliphaunt'\'''
+ [[ 0 -ne '' ]]
+ echo 'BEFORE -ooutput' -nnumber -pperhaps -ppotato -- -o oliphaunt
BEFORE -ooutput -nnumber -pperhaps -ppotato -- -o oliphaunt
+ eval set -- -o ''\''output'\''' -n ''\''number'\''' -p ''\''perhaps'\''' -p ''\''potato'\''' -- ''\''-o'\''' ''\''oliphaunt'\'''
++ set -- -o output -n number -p perhaps -p potato -- -o oliphaunt
+ echo 'AFTER -o' output -n number -p perhaps -p potato -- -o oliphaunt
AFTER -o output -n number -p perhaps -p potato -- -o oliphaunt
$
This all looks correct; the options before the double dash have been split from their arguments, and the ones after the double dash are OK.
So, it is not obvious that there's a problem with your code. Even with an empty string as the program name, it worked OK for me.
Maybe you should show the output of this much of the script on your machine?
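
Since more than one getopt implementation can be installed side by side, one thing that might be worth checking is which one the script actually picks up from PATH. The sketch below relies on the self-test documented for the util-linux (enhanced) getopt, where getopt -T prints nothing and exits with status 4. The variable names are made up for illustration, and this is a diagnostic aid rather than a confirmed explanation of the problem in the question.

```shell
# getopt -T is the documented self-test of the enhanced (util-linux)
# getopt: it exits with status 4. Other implementations, or a missing
# getopt, produce a different status.
getopt_status=0
getopt -T >/dev/null 2>&1 || getopt_status=$?

if [ "$getopt_status" -eq 4 ]; then
    getopt_kind="enhanced (util-linux)"
else
    getopt_kind="traditional or missing"
fi
printf 'getopt on PATH: %s (getopt -T exited with %s)\n' "$getopt_kind" "$getopt_status"
```

Running `command -v getopt` alongside this shows the path of the binary that was tested.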
