Trap ERR only works once? - bash

I'm writing a script that waits until a bunch of directories exist before starting a service. It basically consists of an infinite loop that breaks at the end, or continues if any of the needed directories aren't found. Simplified, the algorithm itself looks like
loop_while_false() {
trap continue ERR
while true; do
printf .
sleep 1
false
break
done
trap ERR
echo
}
(I'm aware I could accomplish this particular behavior with until or while !, but that's tangential to the question.)
The first time I run this, I get the expected output of a long series of dots until I hit ^c. But if I run it again, I just get one dot. If I don't hit ^c, but redefine the loop to be finite, then, in a new shell, the trap works multiple times. But why is ^c breaking the trap for the life of the shell? Even weirder (I spent extra time on this while StackExchange was upgrading hardware) if you write the function this way, it doesn't break:
loop_while_noread() {
trap continue ERR
while true; do
printf .
read -t1 -n1
break
done
trap ERR
echo
}
Unless you run loop_while_false first, and kill it with ^c. Here's an example session:
$ trap -p
trap -- 'shell_session_update' EXIT
$ loop_while_noread
...q
$ loop_while_noread
...r
$ loop_while_noread
....^C
$ loop_while_noread
..q
$ trap -p
trap -- 'shell_session_update' EXIT
trap -- 'continue' ERR
$ loop_while_false
.....^C
$ trap -p
trap -- 'shell_session_update' EXIT
trap -- 'continue' ERR
$ loop_while_false
.
$ loop_while_noread
.
It's as if there's a weird relationship between sleep or false and trap. Is this expected behavior?
I'm using bash 3.2.57(1)-release on OS X El Capitan.

It's certainly a bug. You can work around it by changing the sleep command to:
sleep 1||:
I can't find any bug reports, but I did a little poking against 4.3.30(1) with gdb, and established that after the sleep 1 returns with an error (because it was interrupted), something fails in the execution of the trap ERR command, with the result that the SIG_INPROGRESS flag is never reset for ERR. That flag suppresses future execution of trap ERR, even though it is still enabled.
I didn't get into the part where "something fails in the execution"; when gdb steps over parse_and_execute (trap_command, tag, flags);, the function never returns and I end up back at the bash prompt, so I suppose that a longjmp happens at some point. (The SIG_INPROGRESS flag would be reset after parse_and_execute returns, so the fact that the function doesn't return explains why the flag is not reset.)
All this action is in trap.c inside _run_trap_internal.
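For the underlying task (waiting until directories exist), the loop can also be written with until, as the question itself notes, which avoids the ERR trap and therefore this bug altogether. A minimal sketch, assuming a hypothetical helper named wait_for_dirs that is not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper: block until every directory given as an argument exists.
wait_for_dirs() {
    local dir
    for dir in "$@"; do
        until [ -d "$dir" ]; do
            printf .
            # "|| :" keeps an interrupted sleep from tripping any ERR trap
            # that the caller may have installed.
            sleep 1 || :
        done
    done
    echo
}
```

Called as wait_for_dirs /some/dir /other/dir, it prints one dot per second of waiting and returns once all the directories are present.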

Related

Make 'trap ERR' working inside bash functions with 'return' (or in any subshells)

I'm trying to use trap ERR in my scripts. But:
function hmmm() {
trap 'exit 10' ERR
echo 12>/SOME/NONEXISTING/FILE
# some commands that must not be done if previous has failed
echo "THAT MUST NOT BE PRINTED" >&2
return 5
}
echo ok1
a=$(hmmm) || status="$?"
echo "function returns: $status"
Prints
ok1
test2.sh: line 3: /SOME/NONEXISTING/FILE: No such file or directory
THAT MUST NOT BE PRINTED
function returns: 5
The behavior is the same with any combination of set -e, set -E, a trap at top level, etc. I always need to handle the return code of the function, so, as I understand it, I can't use trap ERR in my scripts at all: it will never fire. Am I right, or is there a working method to enable trap ERR inside functions, subshells and sourced libraries and keep constructions like
a=$( ...somecode ... ) || result="$?"
working? Or, more importantly, to make bash in the example above ALWAYS exit on an error inside the function, regardless of how the function is called?
Added:
In fact, I want to know whether there is any reliable way to BE SURE that errors are trapped inside functions, subshells and sourced code. My functions and libraries can be used by other people, so I can't control how these functions are called; and, to be honest, even if I could, I would never rely on behavior inside a function that can be accidentally and silently changed from outside.
The trap isn't executed because the failing command runs as part of a || list.
Change:
a=$(hmmm) || status="$?"
to:
a=$(hmmm)
status="$?"
From bash manual:
The ERR trap is not executed if the failed command is part of the
command list immediately following a while or until keyword, part of
the test in an if statement, part of a command executed in a && or ||
list except the command following the final && or ||, any command in a
pipeline but the last, or if the command's return value is being
inverted using !. These are the same conditions obeyed by the errexit
(-e) option.
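The difference between the two calling styles can be reproduced in isolation. The sketch below uses hypothetical function names (fails, called_in_or_list, called_standalone); each demo body runs in a subshell so that the trap and the errtrace option do not leak into the calling shell:

```shell
#!/bin/bash
# Hypothetical demo functions, not taken from the question.

fails() {
    false                 # the command that should trigger the ERR trap
    echo "after false"
}

called_in_or_list() (
    set -o errtrace       # let fails() inherit the ERR trap
    trap 'echo "ERR trapped"' ERR
    fails || true         # fails() runs on the left of ||: trap is suppressed
)

called_standalone() (
    set -o errtrace
    trap 'echo "ERR trapped"' ERR
    fails                 # plain call: the trap fires for the false inside
)
```

Running called_in_or_list prints only "after false", while called_standalone prints "ERR trapped" before "after false". The suppression applies to everything executed on the left-hand side of the ||, including the body of the function.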

Bash ERR signal not trapped in procedures?

Consider the following code:
#!/bin/bash
trap 'echo "ERROR" && exit 2' ERR
proc(){
false
return 0
}
echo START
proc
echo END
The above shows output
START
END
but I would expect the false command to trigger the trap procedure for the ERR signal.
If I put false instead of the call to proc the signal is triggered and output
START
ERROR
is shown as expected. If I put the trap command again at the beginning of proc procedure, it is again being correctly trapped.
Why does trapping only work outside of procedures, unless the trap command is repeated inside the procedure? I could not find any documentation on that.
I got the same behavior on bash versions 3.1.0, 3.2.25, 4.1.17 .
Quoting man bash on FUNCTIONS:
the ERR trap is
not inherited unless the -o errtrace shell option has been enabled.
So, just add
set -o errtrace
to the script and it starts working.
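A small side-by-side comparison, as a sketch: the wrapper names run_without_errtrace and run_with_errtrace are hypothetical, and the handler here only echoes (the exit 2 from the question is left out so that both lines of output stay visible).

```shell
#!/bin/bash
proc() {
    false
    return 0
}

# Each wrapper body is a subshell, so options and traps stay local to it.
run_without_errtrace() (
    set +o errtrace
    trap 'echo "ERROR"' ERR
    proc                  # ERR trap is not inherited: nothing is printed
    echo END
)

run_with_errtrace() (
    set -o errtrace       # same as set -E
    trap 'echo "ERROR"' ERR
    proc                  # the false inside proc now fires the trap
    echo END
)
```

run_without_errtrace prints only END, whereas run_with_errtrace prints ERROR and then END.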

Trap syntax issue in bash

I intend to use trap to execute some cleanup code in case of a failure. I have the following code, but it seems to have some syntactical issues.
#!/bin/bash
set -e
function handle_error {
umount /mnt/chroot
losetup -d $LOOP_DEV1 $LOOP_DEV2
}
trap "{ echo \"$BASH_COMMAND failed with status code $?\"; handle_error; }" ERR
Does anyone see an issue with the way the trap has been written? In case of an error the trap does get executed fine, but it also throws another unwanted error message, shown below.
/root/myscript.sh: line 60: } ERR with status code 0: command not found
##line 60 is that line of code that exited with a non zero status
How do I write it correctly to avoid the error message? Also what if I had to send arguments $LOOP_DEV1 and $LOOP_DEV2 from the main script to the trap and then to the handle_error function? Right now they are exported as environment variables in the main script. I did some search for trap examples but I couldn't get something similar.
EDIT
I changed the shebang from /bin/sh to /bin/bash. As /bin/sh was already symlinked to bash I did not expect unicorns nor did I see any.
That trap call creates an interesting recursion, because $BASH_COMMAND (and $?) are expanded when the trap command itself executes. At that point, however, $BASH_COMMAND is the trap command, textually including $BASH_COMMAND (and some quotes and semicolons). Working out exactly what command gets executed when the trap fires is an interesting study, but it's not necessary to fix the problem, which you can do like this:
trap '{ echo "$BASH_COMMAND failed with status code $?"; handle_error; }' ERR
Note that replacing " with ' not only avoids immediate parameter expansion, it also avoids having to escape the inner "s.
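The timing of the expansion can be seen with $? alone. A minimal sketch, using a hypothetical function quoting_demo whose body runs in a subshell so the traps stay local:

```shell
#!/bin/bash
quoting_demo() (
    true                                           # make $? equal 0 right now
    trap "echo \"double quotes: status $?\"" ERR   # $? is expanded here, once
    false
    trap 'echo "single quotes: status $?"' ERR     # $? is expanded when the trap fires
    false
    true
)
```

The first handler reports status 0 (the value $? had when the trap was defined), the second reports status 1 (the status of the failed command). Because the single-quoted handler is evaluated later in the same shell, ordinary shell variables such as LOOP_DEV1 and LOOP_DEV2 should also be visible inside handle_error without being exported.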

Terminating a shell function non-interactively

Is there a way to terminate a shell function non-interactively without killing the shell that's running it?
I know that the shell can be told how to respond to a signal (e.g. USR1), but I can't figure out how the signal handler would terminate the function.
If necessary you may assume that the function to be terminated has been written in such a way that it is "terminable" (i.e. by declaring some suitable options).
(My immediate interest is in how to do this for zsh, but I'm also interested in knowing how to do it for bash and for /bin/sh.)
EDIT: In response to Rob Watt's suggestion:
% donothing () { echo $$; sleep 1000000 }
% donothing
47139
If at this point I hit Ctrl-C at the same terminal that is running the shell, then the function donothing does indeed terminate, and I get the command prompt back. But if instead, from a different shell session, I run
% kill -s INT 47139
...the donothing function does not terminate.
Maybe I don't fully understand what you want, but maybe something like this?
trap "stopme=1" 2
function longcycle() {
last=$1
for i in 1 2 3 4 5
do
[ ! -z "$stopme" ] && return
echo $i
sleep 1
done
}
stopme=""
echo "Start 1st cycle"
longcycle
echo "1st cycle end"
echo "2nd cycle"
stopme=""
longcycle
echo "2nd cycle end"
The above is for bash. Run it, and try pressing CTRL-C.
Or, to do it non-interactively, save the above as, for example, my_command, then try:
$ ./my_command & #into background
$ kill -2 $! #send CTRL-C to the bg process
EDIT:
Solution for your sleep example in the bash:
$ donothing() { trap '[[ $mypid ]] && trap - 2 && kill $mypid' 0 2; sleep 1000000 & mypid=$!;wait; }
$ donothing
When you send the signal from another terminal, it will terminate. Remember, signal '0' is the "normal end of the process". Symbolic names: 0=EXIT, 2=INT, etc.
And remember too that signals are sent to processes, not to functions. In your example the process is the current (interactive) shell, so you must use the wait trick to get something interruptible. Not a nice solution, but it's the only way to interrupt something that is running in the interactive shell itself (not a forked one) from another terminal.
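The same background-plus-wait idea can be written as a reusable function. This is only a sketch under assumptions: the name interruptible_sleep is hypothetical, and it listens for USR1 rather than INT so that it can be tried from a script without interfering with CTRL-C handling.

```shell
#!/bin/bash
# Hypothetical helper: sleep for $1 seconds, but return early on SIGUSR1.
interruptible_sleep() {
    local pid='' status
    # Install the handler first so an early signal cannot kill the shell.
    trap '[ -n "$pid" ] && kill "$pid" 2>/dev/null' USR1
    sleep "$1" &          # run the long command in the background
    pid=$!
    wait "$pid"           # wait returns as soon as a trapped signal arrives
    status=$?
    trap - USR1           # restore the default disposition
    return "$status"
}
```

From another terminal, kill -USR1 followed by the PID of the shell running interruptible_sleep should make the function return early with a non-zero status, while the shell itself keeps running.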

Concurrent logging in bash scripts

I am currently trying to figure out why a shell script fails at concurrent logging every once in a while.
I have a shell function like the following:
log()
{
local l_text=$1
local l_file="/path/to/logs/$(date +%Y%m%d)_script.log"
local l_line="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) ${l_text}"
echo ${l_line} >> ${l_file}
}
Now every once in a while this fails with a syntax error:
/path/to/script.sh: command substitution: line 163: syntax error near unexpected token `)'
/path/to/script.sh: command substitution: line 163: `hostname -s) ${l_text}'
The problem is that I have multiple sub-processes, each of which wants to log as well as send traps (during which logging is performed as well). I have debugged the problem and found that this happens when the function is entered three times simultaneously. First the main process enters, then the child. After the date part of l_text is executed, main gets interrupted by a trap caused by the child, and in this trap tries to log something. The child and the trap finish their logging nicely, but then main is resumed after the trap, tries to execute the hostname part (presumably), and fails with this error.
So it seems like main does not like being put to sleep while it is producing the $(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) ${l_text} part of the log statement and cannot resume nicely. I was assuming this should work fine, because I am just using local variables and thread safe output methods.
Is this a general concurrency problem I am running into here? Or is this very specific for the trap mechanism in bash scripts? I know about the commodities of SIGNAL handling in C, so I am aware that only certain operations are allowed in SIGNAL handlers. However I am not aware if the same precautions also apply when handling SIGNALs in a bash script. I tried to find documentation on this, but none of the documents I could find gave any indications of problems with SIGNAL handling in scripts.
EDIT:
Here is an actual simple script that can be used to replicate the problem:
#!/bin/bash
log() {
local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1"
echo $text >> /dev/null
}
sub_process() {
while true; do
log "Thread is running"
kill -ALRM $$
sleep 1
done
}
trap "log 'received ALRM'" ALRM
sub_process &
sub_process_pid=$!
trap "kill ${sub_process_pid}; exit 0" INT TERM
while true; do
log "Main is running"
sleep 1
done
Every once in a while this script will get killed because of a syntax error in line 5. Line 5 is echo $text >> /dev/null, but since the syntax error also mentions the hostname command, similar to the one I posted above, I am assuming there is an off-by-one error as well and the actual error is in line 4, which is local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1".
Does anybody know what to do with the above script to correct it? I already tried moving the construction of the string into some temporary variables:
log() {
local thedate=$(date +'%Y-%m-%d %H:%M:%S')
local thehostname=$(hostname -s)
local text="${thedate} ${thehostname} $1"
echo $text >> /dev/null
}
This way the error appears less frequently, but it still is present, so this is not a real fix.
I would say that this is definitely a bug in bash and I would encourage you to report it to the bash developers. At the very least, you should never get a syntax error for what is syntactically correct code.
For the record, I get the same results as you with GNU bash, version 4.2.10(1)-release (x86_64-pc-linux-gnu).
I found that you can work around the problem by not calling a function in your trap handler, e.g. by replacing
trap "log 'received ALRM'" ALRM
with
trap "echo $(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) received ALRM" ALRM
makes the script stable for me.
I know about the commodities of SIGNAL handling in C, so I am aware
that only certain operations are allowed in SIGNAL handlers. However I
am not aware if the same precautions also apply when handling SIGNALs
in a bash script.
I guess you shouldn't have to take special precautions, but apparently in practice you do. Given that the problem seems to go away without the function call, I'm guessing that something in bash either isn't re-entrant where it should be or fails to prevent re-entry in the first place.
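If re-entrancy is indeed the cause, another way to keep the handler trivial is to have it only set a flag and do the actual logging from the main loop. This is a different technique from the workaround above and was not tested against the script in the question; the names alarm_pending and drain_pending_alarm are hypothetical, and log is simplified (no hostname) to keep the sketch short.

```shell
#!/bin/bash
alarm_pending=0
trap 'alarm_pending=1' ALRM   # the handler only records that the signal arrived

log() {
    printf '%s %s\n' "$(date +'%Y-%m-%d %H:%M:%S')" "$1"
}

# Called from the main loop, outside any handler, so log is never re-entered.
drain_pending_alarm() {
    if [ "$alarm_pending" -eq 1 ]; then
        alarm_pending=0
        log "received ALRM"
    fi
}
```

The main loop would then call drain_pending_alarm once per iteration, next to its own log "Main is running" call.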
