How to get the real line number of a failing Bash command?

While working out a way to catch errors in my Bash scripts, I've been experimenting with "set -e", "set -E", and the "trap" command. Along the way, I've discovered some strange behavior in how $LINENO is evaluated in the context of functions. First, here's a stripped down version of how I'm trying to log errors:
#!/bin/bash

set -E
trap 'echo Failed on line: $LINENO at command: $BASH_COMMAND && exit $?' ERR
Now, the behavior is different based on where the failure occurs. For example, if I follow the above with:
echo "Should fail at: $((LINENO + 1))"
false
I get the following output:
Should fail at: 6
Failed on line: 6 at command: false
Everything is as expected. Line 6 is the line containing the single command "false". But if I wrap up my failing command in a function and call it like this:
function failure {
    echo "Should fail at $((LINENO + 1))"
    false
}
failure
Then I get the following output:
Should fail at 7
Failed on line: 5 at command: false
As you can see, $BASH_COMMAND contains the correct failing command, "false", but $LINENO reports the first line of the "failure" function definition as the location of the current command. That makes no sense to me. Is there a way to get the line number of the command referenced in $BASH_COMMAND?
It's possible this behavior is specific to older versions of Bash. I'm stuck on 3.2.51 for the time being. If the behavior has changed in later releases, it would still be nice to know if there's a workaround to get the value I want on 3.2.51.
EDIT: I'm afraid some people are confused because I broke up my example into chunks. Let me try to clarify what I have, what I'm getting, and what I want.
This is my script:
#!/bin/bash

set -E

function handle_error {
    local retval=$?
    local line=$1
    echo "Failed at $line: $BASH_COMMAND"
    exit $retval
}
trap 'handle_error $LINENO' ERR
function fail {
    echo "I expect the next line to be the failing line: $((LINENO + 1))"
    command_that_fails
}
fail
Now, what I expect is the following output:
I expect the next line to be the failing line: 14
Failed at 14: command_that_fails
Now, what I get is the following output:
I expect the next line to be the failing line: 14
Failed at 12: command_that_fails
But line 12 is not command_that_fails. Line 12 is function fail {, which is somewhat less helpful. I have also examined the ${BASH_LINENO[@]} array, and it does not have an entry for line 14.

For bash releases prior to 4.1, a special level of awful, hacky, performance-killing hell is needed to work around an issue wherein, on errors, the system jumps back to the function definition point before invoking an error handler.
#!/bin/bash

set -E
set -o functrace

function handle_error {
    local retval=$?
    local line=${last_lineno:-$1}
    echo "Failed at $line: $BASH_COMMAND"
    echo "Trace: " "$@"
    exit $retval
}

if (( ${BASH_VERSION%%.*} <= 3 )) || [[ ${BASH_VERSION%.*} = 4.0 ]]; then
    trap '[[ $FUNCNAME = handle_error ]] || { last_lineno=$real_lineno; real_lineno=$LINENO; }' DEBUG
fi

trap 'handle_error $LINENO ${BASH_LINENO[@]}' ERR

fail() {
    echo "I expect the next line to be the failing line: $((LINENO + 1))"
    command_that_fails
}

fail
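Run on an affected release, this reports the line of command_that_fails itself: the DEBUG trap records each line just before it executes, and one more DEBUG firing happens between the failure and the handler's invocation, so the handler reads the one-step-behind last_lineno.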

BASH_LINENO is an array. You can refer to different values in it: ${BASH_LINENO[1]}, ${BASH_LINENO[2]}, etc. to back up the stack. (Positions in this array line up with those in the BASH_SOURCE array, if you want to get fancy and actually print a stack trace).
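For instance, here's a minimal sketch of such a helper (print_stack is a hypothetical name; the pairing is that BASH_LINENO[i-1] holds the line inside BASH_SOURCE[i] from which frame i made its call):
print_stack() {
    local -i i
    # Start at 1 to skip print_stack's own frame.
    for (( i = 1; i < ${#FUNCNAME[@]}; i++ )); do
        echo "  at ${FUNCNAME[i]}, ${BASH_SOURCE[i]}, line ${BASH_LINENO[i-1]}"
    done
}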
Even better, though, you can just inject the correct line number in your trap:
failure() {
    local lineno=$1
    echo "Failed at $lineno"
}
trap 'failure ${LINENO}' ERR
You might also find my prior answer at https://stackoverflow.com/a/185900/14122 (with a more complete error-handling example) interesting.

That behaviour is very reasonable.
The whole picture of the call stack provides comprehensive information whenever an error occurs. Your example demonstrates a good error message; you can see where the error actually occurred and which line triggered the function, etc.
If the interpreter/compiler couldn't precisely indicate where the error actually occurs, you would be more easily confused.

Related

Variables comparison

I want to write a script with several commands and get the combined result of all of them:
#!/bin/bash
command1; RET_CMD1=$(echo $?)
command2; RET_CMD2=$(echo $?)
command3; RET_CMD3=$(echo $?)
# result is error if any of them fails
# could I do something like:
RET=RET_CMD1 && RET_CMD2 && RET_CMD3   # <- this is the part that I can't remember how I did in the past..
echo $RET
Thanks for your help!
I think you're just looking for this:
if ! { command1 && command2 && command3; }; then
    echo "one of the commands failed"
fi
The result of the block { command1 && command2 && command3; } will be 0 (success) only if all of the commands exited successfully. The semicolon is needed if the block is all written on one line.
There is no need to save the return codes to variables, or even to refer to $?, since if works based on the return code of a command (or list of commands).
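That said, if you do want the combined status in a variable, as in the question, a minimal sketch of the same idea:
{ command1 && command2 && command3; }
RET=$?
echo $RET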
So, to think about this: we want to return 0 on success, or some other positive integer if an error occurred with one of the commands.
If no error occurred in any of the 3, they would all return 0, which means your script would also return 0. Some simple addition can resolve this:
RET=$((RET_CMD1 + RET_CMD2 + RET_CMD3)) # !
echo $RET
You can also replace the addition in the line marked (!) with the bitwise OR operator, which is closer to the "or" you mentioned:
RET=$((RET_CMD1 | RET_CMD2 | RET_CMD3))
Note that addition and bitwise OR combine the codes differently, but both are zero only when every command returned 0.
Disadvantage of this setup: you can't tell from the return value alone which command failed; tracing errors from any of the 3 commands will need to rely on other error output they generate. (This is just a forewarning.)
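If knowing which step failed matters, here's a minimal sketch that keeps the individual statuses around (using $? directly; the $(echo $?) indirection in the question isn't needed):
command1; RET_CMD1=$?
command2; RET_CMD2=$?
command3; RET_CMD3=$?
RET=$((RET_CMD1 | RET_CMD2 | RET_CMD3))
if (( RET != 0 )); then
    # Report every status so the failing step is identifiable.
    echo "failed: cmd1=$RET_CMD1 cmd2=$RET_CMD2 cmd3=$RET_CMD3" >&2
fi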

Bash - Log all commands and exit codes in a script

I have a long (~2,000 lines) script that I'm trying to log for future debugging. Right now I have:
function log_with_time()
{
    while read a; do
        echo `date +'%H:%M:%S.%4N '` " $a" >> $LOGFILE
    done
}
exec 7> >(log_with_time)
BASH_XTRACEFD=7
PS4=' exit($?)ln:$LINENO: '
set -x
echo "helloWorld 1"
which gives me very nice logging for any and all commands that are run:
15:18:03.6359 exit(0)ln:28: echo 'helloWorld 1'
The issue that I'm running into is that xtrace seems to be asynchronous: with longer scripts, the log times fall behind the actual time the commands are called, and the exit code doesn't match the logged command.
There has to be a better way to do this, but I'd be happy if I could just synchronize xtrace.
...
tldr: How can I generally log the time, command and exit code for all commands in a script?
...
(First time posting, feedback appreciated)
UPDATE:
exec {BASH_XTRACEFD}>>$LOGFILE
PS4=' time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
set -x
fail()
{
    echo "fail" >> $LOGFILE
    return 1
}
trap 'echo exit:$? >> $LOGFILE' DEBUG
fail
This solves all of my synchronization issues; exit codes and timestamps are working beautifully. My only issue now is one of formatting: the trap itself gets reported by xtrace.
time:18:30:07.6080 ln:27: fail
time:18:30:07.6089 ln:12: echo fail
fail
time:18:30:07.6126 ln:13: return 1
time:18:30:07.6134 ln:28: echo exit:1
exit:1
I've tried setting +x in the trap but then set +x gets logged. If I could find a way to omit one line from xtrace, this log would be perfect.
The async behavior is coming from the process substitution -- anything in >(...) is running in its own subshell on the other end of a FIFO. Since it's a separate process, it's inherently unsynchronized.
You don't need log_with_time here at all, though, and so you don't need BASH_XTRACEFD redirecting to a process substitution in the first place. Consider:
# aside: $(date ...) has a *huge* amount of performance overhead here. Personally, I'd
# advise against using it, unless you really need all that precision; $SECONDS will
# be orders-of-magnitude cheaper.
PS4=' prior-exit:$? time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
...thereafter:
$ true
prior-exit:0 time:16:01:17.2509 ln:28: true
$ false
prior-exit:0 time:16:01:18.4242 ln:29: false
$ false
prior-exit:1 time:16:01:19.2963 ln:30: false
$ true
prior-exit:1 time:16:01:20.2159 ln:31: true
$ true
prior-exit:0 time:16:01:20.8650 ln:32: true
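A cheaper variant along those lines, as a sketch, assuming whole-second resolution is acceptable ($SECONDS is expanded by the shell itself, so no date process is forked per traced command):
PS4=' prior-exit:$? t+${SECONDS}s ln:$LINENO: '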
Per conversation with Charles Duffy in the comments, to whom all credit is given:
Process substitution >(...) is asynchronous, allowing the log writing to fall behind and out of sync with the xtrace.
Instead use:
exec {BASH_XTRACEFD}>>$LOGFILE
PS4=' time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
for synchronously logging the time and line.
Furthermore, xtrace is triggered before running the command, making it a bad candidate for capturing exit codes. Instead use:
trap 'echo exit:$? >> $LOGFILE' DEBUG
to log the exit codes. The DEBUG trap fires just before each command, at which point $? still holds the previous command's exit status, so each command's code is logged by the trap run for the command that follows it. Note that this won't report every step inside a function call the way xtrace will.
No solution yet for omitting the trap from xtrace, but it's good enough:
LOGFILE="SomeFile.log"
exec {BASH_XTRACEFD}>>$LOGFILE
PS4=' time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
set -x
fail() # test function that returns 1
{
    echo "fail" >> $LOGFILE
    return 1
}
success() # test function that returns 0
{
    echo "success" >> $LOGFILE
    return 0
}
trap 'echo $? >> $LOGFILE' DEBUG
fail
success
echo "complete"
yields:
time:14:10:22.2686 ln:21: trap 'echo $? >> $LOGFILE' DEBUG
time:14:10:22.2693 ln:23: echo 0
0
time:14:10:22.2736 ln:23: fail
time:14:10:22.2741 ln:12: echo fail
fail
time:14:10:22.2775 ln:13: return 1
time:14:10:22.2782 ln:24: echo 1
1
time:14:10:22.2830 ln:24: success
time:14:10:22.2836 ln:17: echo success
success
time:14:10:22.2873 ln:18: return 0
time:14:10:22.2881 ln:26: echo 0
0
time:14:10:22.2912 ln:26: echo complete

Bash: How to get the call chain on errors?

I have a backtrace function in bash that works well enough (code below), but the problem is that when bash itself hits an error, it doesn't give a backtrace or any other information that would help determine the caller, which would help in debugging the issue.
e.g.:
./c.sh: line 23: urgh: command not found
function backtrace () {
    local deptn=${#FUNCNAME[@]}
    for ((i=1; i<$deptn; i++)); do
        local func="${FUNCNAME[$i]}"
        local line="${BASH_LINENO[$((i-1))]}"
        local src="${BASH_SOURCE[$((i-1))]}"
        printf '%*s' $i '' # indent
        echo "at: $func(), $src, line $line"
    done
}
Is it possible to trap bash on such errors so I could call my own function to get output like this?
at: c(), ./c.sh, line 22
at: b(), ./c.sh, line 11
at: main(), ./b.sh, line 5
Update: the final working version, from the suggestions, with a backtrace trap on error:
function backtrace () {
    local deptn=${#FUNCNAME[@]}
    for ((i=1; i<$deptn; i++)); do
        local func="${FUNCNAME[$i]}"
        local line="${BASH_LINENO[$((i-1))]}"
        local src="${BASH_SOURCE[$((i-1))]}"
        printf '%*s' $i '' # indent
        echo "at: $func(), $src, line $line"
    done
}
function trace_top_caller () {
    local func="${FUNCNAME[1]}"
    local line="${BASH_LINENO[0]}"
    local src="${BASH_SOURCE[0]}"
    echo " called from: $func(), $src, line $line"
}
set -o errtrace
trap 'trace_top_caller' ERR
Absolutely -- this is exactly what error traps are for:
trap backtrace ERR
In the past, I vaguely recall finding it necessary to make that something more like trap 'backtrace "${#BASH_SOURCE[@]}" "${BASH_SOURCE[@]}" "${#BASH_LINENO[@]}" "${BASH_LINENO[@]}"' ERR to work around a bug (reading the array values off the function's argv); however, I don't remember at present just what that bug was or which versions it impacted.
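A sketch of what that workaround might look like, assuming the bug described (backtrace_argv is a hypothetical name; the counts are passed first so the handler can split its argv back into the two arrays):
backtrace_argv() {
    # Split argv back into the two arrays: count, values, count, values.
    local -i n_src=$1; shift
    local -a src=( "${@:1:n_src}" ); shift "$n_src"
    local -i n_line=$1; shift
    local -a line=( "${@:1:n_line}" )
    local -i i
    for (( i = 0; i < n_line; i++ )); do
        # Per the manual, BASH_LINENO[i] is the call-site line in BASH_SOURCE[i+1].
        echo "at: ${src[i+1]:-$0}, line ${line[i]}"
    done
}
trap 'backtrace_argv "${#BASH_SOURCE[@]}" "${BASH_SOURCE[@]}" "${#BASH_LINENO[@]}" "${BASH_LINENO[@]}"' ERR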

bash exit status always 0

I'm experiencing this weird issue where my exit status always returns 0, even when the command didn't execute successfully.
I want to output the exit status on my prompt with the following code:
function status() {
    echo $?
}
export PS1="\$(status)>"
When I run this, I get the following output
0❯ pwd
/Users/tringuyen
0❯ ad
bash: ad: command not found
0❯ echo $?
127
Clearly the second-to-last command, ad, didn't return a 0 status code, yet 0 is what I got from the prompt.
Does anyone know what might be going on here?
EDIT 6/20 11:57AM: The issue seems to be that $? is always 0 no matter what, except when there was an error within the .bashrc file itself, which causes it to return a nonzero value.
Does the following work for you with your bash version?
export PS1="\$?>"
I use the following in my $PS1:
PS1="\`if [ \$? = 0 ]; then echo \[\e[33m\]^_^\[\e[0m\]; else echo \[\e[31m\]\$? O_O\[\e[0m\]; fi\`"
Src: https://github.com/sanmiguel/dotfiles/blob/master/bash/bash_functions.symlink#L63
I also had a similar problem, but my function looked different. The problem was that I was missing a semicolon (";") after VAR=$?.
OLD:
function status() {
    VAR=$?
    echo $VAR
}
This always returned zero, no matter what.
NEW:
function status() {
    VAR=$?;
    echo $VAR;
}
This returned the proper return value.
export PS1="\$(status)>"

Why do I get different bash script results when invoked with 'set -x', and how do I fix it?

I've found that the results of my bash script change depending upon whether I execute it with debugging or not (i.e. invoking set -x). I don't mean that I get more output, but that the result of the program itself differs.
I'm assuming this isn't the desired behavior, and I'm hoping that you can teach me how to correct this.
The bash script below is a contrived example; I reduced the logic from the script I'm investigating so that the problem is easily reproducible and obvious.
#!/bin/bash
# Base function: executes a command (print working directory), stores the value
# in the destination, and returns the status.
function get_cur_dir {
    local dest=$1
    local result
    result=$((pwd) 2>&1)
    status=$?
    eval $dest="\"$result\""
    return $status
}
# 2nd level function uses the base function to execute the command and store
# the result in the desired location. However, if the base function fails, it
# terminates the script. Yes, I know 'pwd' won't fail -- this is a contrived
# example to illustrate the types of problems I am seeing.
function get_cur_dir_nofail {
    local dest=$1
    local gcdnf_result
    local status
    get_cur_dir gcdnf_result
    status=$?
    if [ $status -ne 0 ]; then
        echo "ERROR: Command failure"
        exit 1
    fi
    eval dest="\"$gcdnf_result\""
}
# Cause blarg to be loaded with the current directory, use the results to
# create a flag_file name, and do logic with the flag_file name.
function main {
    get_cur_dir blarg
    echo "Current directory is:$blarg"
    local flag_file=/tmp/$blarg.flag
    echo -e ">>>>>>>> $flag_file"
    if [ "/tmp//root.flag" = "$flag_file" ]; then
        echo "Match"
    else
        echo "No Match"
    fi
}
main
When I execute it without set -x, it works as I expect, as illustrated below:
Current directory is:/root
>>>>>>>> /tmp//root.flag
Match
However, when I add the debugging output with -x, it doesn't work, as illustrated below:
root#psbu-jrr-lnx:# bash -x /tmp/example.sh
+ main
+ get_cur_dir blarg
+ local dest=blarg
+ local result
+ result='++ pwd
/root'
+ status=0
+ eval 'blarg="++ pwd
/root"'
++ blarg='++ pwd
/root'
+ return 0
+ echo 'Current directory is:++ pwd
/root'
Current directory is:++ pwd
/root
+ local 'flag_file=/tmp/++ pwd
/root.flag'
+ echo -e '>>>>>>>> /tmp/++ pwd
/root.flag'
>>>>>>>> /tmp/++ pwd
/root.flag
+ '[' /tmp//root.flag = '/tmp/++ pwd
/root.flag' ']'
+ echo 'No Match'
No Match
root#psbu-jrr-lnx:#
I think what happens is that you capture the debug logging output produced by the shell when you run it with set -x; this line, for example, does it:
result=$((pwd) 2>&1)
In the above line, you shouldn't really need to redirect standard error to standard output, so remove the 2>&1.
Changing...
result=$((pwd) 2>&1)
...into...
result=$(pwd 2>&1)
...will allow you to capture the output of pwd without capturing the debug info generated by set -x.
The reason the $PWD variable exists is to free your script from having to run a separate process or interpret its output (which in this case has been modified by -x). Use $PWD instead.
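For instance, the question's helper could shrink to something like this sketch (printf -v, available in bash 3.1+, assigns to the named variable without eval or a subshell):
function get_cur_dir {
    printf -v "$1" '%s' "$PWD"
}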
