Bash - Log all commands and exit codes in a script

I have a long (~2,000 lines) script that I'm trying to log for future debugging. Right now I have:
function log_with_time()
{
    while read a; do
        echo "$(date +'%H:%M:%S.%4N')  $a" >> $LOGFILE
    done
}
exec 7> >(log_with_time)
BASH_XTRACEFD=7
PS4=' exit($?)ln:$LINENO: '
set -x
echo "helloWorld 1"
which gives me very nice logging for any and all commands that are run:
15:18:03.6359 exit(0)ln:28: echo 'helloWorld 1'
The issue that I'm running into is that xtrace seems to be asynchronous. With longer scripts, the log times fall behind the actual time the commands are called, and the exit code doesn't match the logged command.
There has to be a better way to do this but I'd be happy if I could just synchronize xtrace.
...
tldr: How can I generally log the time, command and exit code for all commands in a script?
...
(First time posting, feedback appreciated)
UPDATE:
exec {BASH_XTRACEFD}>>$LOGFILE
PS4=' time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
set -x
fail()
{
    echo "fail" >> $LOGFILE
    return 1
}
trap 'echo exit:$? >> $LOGFILE' DEBUG
fail
solves all of my synchronization issues; exit codes and timestamps are working beautifully. My only issue now is one of formatting: the trap itself is getting reported by xtrace.
time:18:30:07.6080 ln:27: fail
time:18:30:07.6089 ln:12: echo fail
fail
time:18:30:07.6126 ln:13: return 1
time:18:30:07.6134 ln:28: echo exit:1
exit:1
I've tried setting +x in the trap but then set +x gets logged. If I could find a way to omit one line from xtrace, this log would be perfect.

The async behavior is coming from the process substitution -- anything in >(...) is running in its own subshell on the other end of a FIFO. Since it's a separate process, it's inherently unsynchronized.
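You can watch the race directly with a toy logger (a throwaway sketch of mine, not part of the fix; the sleep exaggerates the lag):
exec 7> >(while read -r line; do sleep 0.1; printf 'log: %s\n' "$line"; done)
echo "to the log" >&7
echo "to the terminal"   # usually prints before 'log: to the log' arrives
sleep 1                  # give the logger subshell time to flush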
You don't need log_with_time here at all, though, and so you don't need BASH_XTRACEFD redirecting to a process substitution in the first place. Consider:
# aside: $(date ...) has a *huge* amount of performance overhead here. Personally, I'd
# advise against using it, unless you really need all that precision; $SECONDS will
# be orders-of-magnitude cheaper.
PS4=' prior-exit:$? time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
...thereafter:
$ true
prior-exit:0 time:16:01:17.2509 ln:28: true
$ false
prior-exit:0 time:16:01:18.4242 ln:29: false
$ false
prior-exit:1 time:16:01:19.2963 ln:30: false
$ true
prior-exit:1 time:16:01:20.2159 ln:31: true
$ true
prior-exit:0 time:16:01:20.8650 ln:32: true
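If the $(date ...) overhead matters in practice, a cheaper variant along the lines of the comment above (a sketch; $SECONDS counts whole seconds since the script started, so you trade sub-second precision for speed):
PS4=' prior-exit:$? t+${SECONDS}s ln:$LINENO: '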

Per conversation with Charles Duffy in the comments, to whom all credit is given:
Process substitution >(...) is asynchronous, allowing the log writing to fall behind and out of sync with the xtrace.
Instead use:
exec {BASH_XTRACEFD}>>$LOGFILE
PS4=' time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
for synchronously logging the time and line.
Furthermore, xtrace is triggered before running the command, making it a bad candidate for capturing exit codes. Instead use:
trap 'echo exit:$? >> $LOGFILE' DEBUG
to log the exit codes. The DEBUG trap runs just before each simple command, at which point $? still holds the previous command's exit status, so each command's result gets logged as the next one starts. Note that this won't report on every step in a function call the way xtrace will.
No solution yet for omitting the trap from xtrace, but it's good enough:
LOGFILE="SomeFile.log"
exec {BASH_XTRACEFD}>>$LOGFILE
PS4=' time:$(date +%H:%M:%S.%4N) ln:$LINENO: '
set -x
fail() # test function that returns 1
{
    echo "fail" >> $LOGFILE
    return 1
}
success() # test function that returns 0
{
    echo "success" >> $LOGFILE
    return 0
}
trap 'echo $? >> $LOGFILE' DEBUG
fail
success
echo "complete"
yields:
time:14:10:22.2686 ln:21: trap 'echo $? >> $LOGFILE' DEBUG
time:14:10:22.2693 ln:23: echo 0
0
time:14:10:22.2736 ln:23: fail
time:14:10:22.2741 ln:12: echo fail
fail
time:14:10:22.2775 ln:13: return 1
time:14:10:22.2782 ln:24: echo 1
1
time:14:10:22.2830 ln:24: success
time:14:10:22.2836 ln:17: echo success
success
time:14:10:22.2873 ln:18: return 0
time:14:10:22.2881 ln:26: echo 0
0
time:14:10:22.2912 ln:26: echo complete

Related

Tcsh Script Last Exit Code ($?) value is resetting

I am running the following script using tcsh. In my while loop, I'm running a C++ program that I created and will return a different exit code depending on certain things. While it returns an exit code of 0, I want the script to increment counter and run the program again.
#!/bin/tcsh
echo "Starting the script."
set counter = 0
while ($? == 0)
    @ counter ++
    ./auto $counter
end
I have verified that my program is definitely returning with exit code = 1 after a certain point. However, the condition in the while loop keeps evaluating to true for some reason, and the loop keeps running.
I found that if I stick the following line at the end of my loop and then replace the condition check in the while loop with this new variable, it works fine.
while ($return_code == 0)
    @ counter ++
    ./auto $counter
    set return_code = $?
end
Why is it that I can't just use $? directly? Is another operation underneath the hood performed in between running my custom program and checking the loop condition that's causing $? to change value?
That is peculiar.
I've altered your example to something that I think illustrates the issue more clearly. (Note that $? is an alias for $status.)
#!/bin/tcsh -f
foreach i (1 2 3)
    false
    # echo false status=$status
end
echo Done status=$status
The output is
Done status=0
If I uncomment the echo command in the loop, the output is:
false status=1
false status=1
false status=1
Done status=0
(Of course the echo in the loop would break the logic anyway, because the echo command completes successfully and sets $status to zero.)
I think what's happening is that the end that terminates the loop is executed as a statement, and it sets $status ($?) to 0.
I see the same behavior with both tcsh and bsd-csh.
Saving the value of $status in another variable immediately after the command is a good workaround -- and arguably just a better way of doing it, since $status is extremely fragile, and will almost literally be clobbered if you look at it.
Note that I've added a -f option to the #! line. This prevents tcsh from sourcing your init file(s) (.cshrc or .tcshrc) and is considered good practice. (That's not the case for sh/bash/ksh/zsh, which assign a completely different meaning to -f.)
A digression: I used tcsh regularly for many years, both as my interactive login shell and for scripting. I would not have anticipated that end would set $status. This is not the first time I've had to find out how tcsh or csh behaves by trial and error and been surprised by the result. It is one of the reasons I switched to bash for interactive and scripting use. I won't tell you to do the same, but you might want to read Tom Christiansen's classic "csh.whynot".
Slightly shorter/simpler explanation:
Recall that with tcsh/csh EVERY command (including shell builtins) returns a status. Therefore $? (an alias for $status) is updated by if statements, loops, assignments, ...
From a practical point of view, it's better to limit direct use of $? to an if statement immediately after the command:
do-something
if ( $status == 0 ) then
    ...
endif
In all other cases, capture the status in a variable, and use only that variable:
do-something
set something_status = $status
if ( $something_status == 0 ) then
    ...
endif
To expand on the fragility of $status: even the condition test in an if statement modifies it, so the repeated tests on $status below will never hit '$status == 5', even when do-something returns status 5; the first test has already reset $status by the time the second one runs.
do-something
if ( $status == 2 ) then
    echo FOO
else if ( $status == 5 ) then
    echo BAR
endif

Checking if current file is being sourced using return yields reversed result after command returning 1

I know the title is a bit weird, but I'm not sure how to better word it.
I'm using $(return &> /dev/null) to detect if a file is sourced or not.
I know that this method is not 100% reliable, but it has worked for me without issues before now. I've figured out how to work around the problem, but I can't figure out why this is happening.
Edit: This is for an internal company project and is not intended to be POSIX compliant or portable to other shells.
I've tried this on three different systems and had the same results:
Redhat 6.9; $BASH_VERSION=4.1.2(1)-release
Mint 18.2; $BASH_VERSION=4.3.48(1)-release
Arch; $BASH_VERSION=4.4.23(1)-release
If $? is 1 (false) when $(return &> /dev/null) runs, the return code is reversed.
$ hr | cat test_0source - test_0source.sh; hr; ./test_0source
#!/bin/bash
# test_0source
false
test_0source.sh
false
source test_0source.sh
true
test_0source.sh
true
source test_0source.sh
========================================
#!/bin/bash
# test_0source.sh
# shellcheck disable=SC2091
$(return &> /dev/null)
echo "$?"
========================================
1
1
1
0
I expected to be seeing
1
0
1
0
My workaround is to add true right before the return check. While this gives me the correct results, adding false instead of true causes the return check to always return 1.
I realize I'm missing something basic here, but I'm not seeing it. Why is this doing this?
Edit: I had an incorrect value for $? in my initial explanation above.
Update:
First, using $(return 0 &> /dev/null) resolves the problem, thanks to @ondre-k.
Ondre also pointed out that comparing BASH_SOURCE against $0 is a more idiomatic way to do this:
$ hr | cat test_0bs - test_0bs.sh; hr; test_0bs | less
#!/bin/bash
# test_0bs
false
test_0bs.sh
hr
false
source test_0bs.sh
hr
true
test_0bs.sh
hr
true
source test_0bs.sh
========================================
#!/bin/bash
# test_0bs.sh
if [[ ${BASH_SOURCE[0]} == "$0" ]]; then
echo "We are not being sourced."
else
echo "We are being sourced."
fi
========================================
We are not being sourced.
========================================
We are being sourced.
========================================
We are not being sourced.
========================================
We are being sourced.
I suspect there may be an edge case or two around this, but it also seems to me that these cases will be less of a concern.
What you are doing is basically "abusing" the fact that return can only happen from a function or a sourced script, assuming it yields 0 when a return was possible (we're sourced) and 1 if not, while suppressing the error output. You also do that in a subshell. Oddly enough, bash is still OK with the placement of the return, but does not return from the sourced script and keeps going. So far, so good.
The problem is that a "naked" return (one without an explicit value), just like exit, propagates the last seen return code (the exit code of the command immediately preceding it).
In other words, false; return is the same thing as return 1. You can also try this out by replacing true and false with (exit 255) and see what happens. The 1 you are seeing for a return following a false is not the "error: you cannot return now" your test is looking for, but just an ordinary return passing along the status of the command preceding it (false). This difference would also have become obvious if you'd dropped the stderr redirection.
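You can see the propagation in isolation at any bash prompt (a throwaway sketch):
f() { false; return; };   f; echo $?   # 1: the naked return forwards false's status
g() { false; return 0; }; g; echo $?   # 0: an explicit value overrides it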
TL;DR for this construct to work as you expected, change it to return 0.
I hope I have not missed some corner case, but this should work as an alternative with bash, comparing the name of the executed script against the source file: [[ "${BASH_SOURCE}" = "${0}" ]] evaluates to 0 (success) if the file has not been sourced and 1 if it has. Or replace = with != to get the same meaning of the values as in the return case above.

How to get the real line number of a failing Bash command?

In the process of coming up with a way to catch errors in my Bash scripts, I've been experimenting with "set -e", "set -E", and the "trap" command. In the process, I've discovered some strange behavior in how $LINENO is evaluated in the context of functions. First, here's a stripped down version of how I'm trying to log errors:
#!/bin/bash
set -E
trap 'echo Failed on line: $LINENO at command: $BASH_COMMAND && exit $?' ERR
Now, the behavior is different based on where the failure occurs. For example, if I follow the above with:
echo "Should fail at: $((LINENO + 1))"
false
I get the following output:
Should fail at: 6
Failed on line: 6 at command: false
Everything is as expected. Line 6 is the line containing the single command "false". But if I wrap up my failing command in a function and call it like this:
function failure {
    echo "Should fail at $((LINENO + 1))"
    false
}
failure
Then I get the following output:
Should fail at 7
Failed on line: 5 at command: false
As you can see, $BASH_COMMAND contains the correct failing command: "false", but $LINENO is reporting the first line of the "failure" function definition as the current command. That makes no sense to me. Is there a way to get the line number of the line referenced in $BASH_COMMAND?
It's possible this behavior is specific to older versions of Bash. I'm stuck on 3.2.51 for the time being. If the behavior has changed in later releases, it would still be nice to know if there's a workaround to get the value I want on 3.2.51.
EDIT: I'm afraid some people are confused because I broke up my example into chunks. Let me try to clarify what I have, what I'm getting, and what I want.
This is my script:
#!/bin/bash
set -E
function handle_error {
    local retval=$?
    local line=$1
    echo "Failed at $line: $BASH_COMMAND"
    exit $retval
}
trap 'handle_error $LINENO' ERR
function fail {
    echo "I expect the next line to be the failing line: $((LINENO + 1))"
    command_that_fails
}
fail
Now, what I expect is the following output:
I expect the next line to be the failing line: 14
Failed at 14: command_that_fails
Now, what I get is the following output:
I expect the next line to be the failing line: 14
Failed at 12: command_that_fails
BUT line 12 is not command_that_fails. Line 12 is function fail {, which is somewhat less helpful. I have also examined the ${BASH_LINENO[@]} array, and it does not have an entry for line 14.
For bash releases prior to 4.1, a special level of awful, hacky, performance-killing hell is needed to work around an issue wherein, on errors, the system jumps back to the function definition point before invoking an error handler.
#!/bin/bash
set -E
set -o functrace
function handle_error {
    local retval=$?
    local line=${last_lineno:-$1}
    echo "Failed at $line: $BASH_COMMAND"
    echo "Trace: " "$@"
    exit $retval
}
if (( ${BASH_VERSION%%.*} <= 3 )) || [[ ${BASH_VERSION%.*} = 4.0 ]]; then
    trap '[[ $FUNCNAME = handle_error ]] || { last_lineno=$real_lineno; real_lineno=$LINENO; }' DEBUG
fi
trap 'handle_error $LINENO ${BASH_LINENO[@]}' ERR
fail() {
    echo "I expect the next line to be the failing line: $((LINENO + 1))"
    command_that_fails
}
fail
BASH_LINENO is an array. You can refer to different values in it: ${BASH_LINENO[1]}, ${BASH_LINENO[2]}, etc. to back up the stack. (Positions in this array line up with those in the BASH_SOURCE array, if you want to get fancy and actually print a stack trace).
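For illustration, a minimal stack-trace printer built from those arrays could look like this (my own sketch, not from the original answer):
print_stack_trace() {
    local i
    # frame 0 is print_stack_trace itself, so start at 1
    for (( i=1; i<${#FUNCNAME[@]}; i++ )); do
        echo "  at ${FUNCNAME[$i]} (${BASH_SOURCE[$i]}:${BASH_LINENO[$i-1]})"
    done
}
Calling it from an ERR trap (or from handle_error above) prints one line per active frame, innermost first.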
Even better, though, you can just inject the correct line number in your trap:
failure() {
    local lineno=$1
    echo "Failed at $lineno"
}
trap 'failure ${LINENO}' ERR
You might also find my prior answer at https://stackoverflow.com/a/185900/14122 (with a more complete error-handling example) interesting.
That behaviour is very reasonable.
The whole picture of the call stack provides comprehensive information whenever an error occurs. Your example demonstrates a good error message: you can see where the error actually occurred and which line triggered the function, etc.
If the interpreter/compiler couldn't precisely indicate where the error actually occurs, you would be more easily confused.

Is there an elegant way to store and evaluate return values in bash scripts?

I have a rather complex series of commands in bash that ends up returning a meaningful exit code. Various places later in the script need to branch conditionally on whether the command set succeeded or not.
Currently I am storing the exit code and testing it numerically, something like this:
long_running_command | grep -q trigger_word
status=$?
if [ $status -eq 0 ]; then
: stuff
else
: more code
if [ $status -eq 0 ]; then
: stuff
else
For some reason it feels like this should be simpler. We have a simple exit code stored, and now we are repeatedly typing out numerical test operations to run on it. For example, I can cheat and use the string output instead of the return code, which is simpler to test for:
status=$(long_running_command | grep trigger_word)
if [ $status ]; then
: stuff
else
: more code
if [ $status ]; then
: stuff
else
On the surface this looks more straightforward, but I realize it's dirty.
If the other logic wasn't so complex and I was only running this once, I realize I could embed it in place of the test operator, but this is not ideal when you need to reuse the results in other locations without re-running the test:
if long_running_command | grep -q trigger_word; then
: stuff
else
The only thing I've found so far is assigning the code as part of command substitution:
status=$(long_running_command | grep -q trigger_word; echo $?)
if [ $status -eq 0 ]; then
: stuff
else
Even this is not technically a one-shot assignment (although some may argue the readability is better), and the necessary numerical test syntax still seems cumbersome to me. Maybe I'm just being OCD.
Am I missing a more elegant way to assign an exit code to a variable then branch on it later?
The simple solution:
output=$(complex_command)
status=$?
if (( status == 0 )); then
    : stuff with "$output"
fi
: more code
if (( status == 0 )); then
    : stuff with "$output"
fi
Or more eleganter-ish
do_complex_command () {
    # side effects: global variables
    # store the output in $g_output and the status in $g_status
    # (grep without -q, so the matching output is actually captured;
    # with -q, $g_output would always be empty)
    g_output=$(
        command -args | commands | grep trigger_word
    )
    g_status=$?
}
complex_command_succeeded () {
    test $g_status -eq 0
}
complex_command_output () {
    echo "$g_output"
}
do_complex_command
if complex_command_succeeded; then
    : stuff with "$(complex_command_output)"
fi
: more code
if complex_command_succeeded; then
    : stuff with "$(complex_command_output)"
fi
Or
do_complex_command () {
    # side effects: global variables
    # store the output in $g_output and the status in $g_status
    g_output=$(
        command -args | commands
    )
    g_status=$?
}
complex_command_output () {
    echo "$g_output"
}
complex_command_contains_keyword () {
    complex_command_output | grep -q "$1"
}
if complex_command_contains_keyword "trigger_word"; then
    : stuff with "$(complex_command_output)"
fi
If you don't need to store the specific exit status, just whether the command succeeded or failed (e.g. whether grep found a match), I'd use a fake-boolean variable to store the result:
if long_running_command | grep trigger_word; then
    found_trigger=true
else
    found_trigger=false
fi
# ...later...
if ! $found_trigger; then
    # stuff to do if the trigger word WASN'T found
fi
#...
if $found_trigger; then
    # stuff to do if the trigger WAS found
fi
Notes:
The shell doesn't really have boolean (true/false) variables. What's actually happening here is that "true" and "false" are stored as strings in the found_trigger variable; when if $found_trigger; then executes, it runs the value of $found_trigger as a command, and it just happens that the true command always succeeds and the false command always fails, thus causing "the right thing" to happen. In if ! $found_trigger; then, the "!" toggles the success/failure status, effectively acting as a boolean "not".
if long_running_command | grep trigger_word; then is equivalent to running the command, then using if [ $? -eq 0 ]; then to check its exit status. I find it a little cleaner, but you have to get used to thinking of if as checking the success/failure of a command, not just testing boolean conditions. If "active" if commands aren't intuitive to you, use a separate test instead.
As Charles Duffy pointed out in a comment, this trick executes data as a command, and if you don't have full control over that data... you don't have control over what your script is going to do. So never set a fake-boolean variable to anything other than the fixed strings "true" and "false", and be sure to set the variable before using it. If you have any nontrivial execution flow in the script, set all fake-boolean variables to sane default values (i.e. "true" or "false") before the execution flow gets complicated.
Failure to follow these rules can lead to security holes large enough to drive a freight train through.
Why don't you set flags for the stuff that needs to happen later?
cheeseballs=false
nachos=false
guppies=false

command
case $? in
    42) cheeseballs=true ;;
    17 | 31) cheeseballs=true; nachos=true; guppies=true ;;
    66) guppies=true; echo "Bingo!" ;;
esac

$cheeseballs && java -crash -burn
$nachos && python ./tex.py --mex
if $guppies; then
    aquarium --light=blue --door=hidden --decor=squid
else
    echo SRY
fi
As pointed out by @CharlesDuffy in the comments, storing an actual command in a variable is slightly dubious, and vaguely triggers Bash FAQ #50 warnings; the code reads (slightly, and IMHO) more naturally like this, but you have to be really careful that you have total control over the variables at all times. If you have the slightest doubt, perhaps just use string values and compare against the expected value at each junction.
[ "$cheeseballs" = "true" ] && java -crash -burn
etc etc; or you could refactor to some other implementation structure for the booleans (an associative array of options would make sense, but isn't portable to POSIX sh; a PATH-like string is flexible, but perhaps too unstructured).
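For completeness, here is a sketch of the associative-array variant mentioned above (bash 4+ only; the flag names are borrowed from the example, everything else is mine):
declare -A flag=([cheeseballs]=0 [nachos]=0 [guppies]=0)

command
case $? in
    42) flag[cheeseballs]=1 ;;
    17 | 31) flag[cheeseballs]=1; flag[nachos]=1; flag[guppies]=1 ;;
esac

(( ${flag[cheeseballs]} )) && java -crash -burn
Since the values are plain 0/1 integers rather than command names, nothing ever gets executed as data.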
Based on the OP's clarification that it's only about success v. failure (as opposed to the specific exit codes):
long_running_command | grep -q trigger_word || failed=1
if ((!failed)); then
: stuff
else
: more code
if ((!failed)); then
: stuff
else
Sets the success-indicator variable only on failure (via ||, i.e., if a non-zero exit code is returned).
Relies on the fact that variables that aren't defined evaluate to false in an arithmetic conditional (( ... )), as the one-liner below illustrates.
Care must be taken that the variable ($failed, in this example) hasn't accidentally been initialized elsewhere.
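For example:
unset failed
((!failed)) && echo "an unset variable evaluates to 0 here, so !failed is true"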
(On a side note, as @nos has already mentioned in a comment, you need to be careful with commands involving a pipeline; from man bash (emphasis mine):
The return status of a pipeline is the exit status of the last command,
unless the pipefail option is enabled. If pipefail is enabled, the
pipeline's return status is the value of the last (rightmost) command
to exit with a non-zero status, or zero if all commands exit successfully.
To set pipefail (which is OFF by default), use set -o pipefail; to turn it back off, use set +o pipefail.)
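A quick illustration of the difference (a throwaway sketch):
false | true; echo $?   # 0: only the last command's status counts
set -o pipefail
false | true; echo $?   # 1: the failing component now determines the status
set +o pipefail         # back to the default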
If you don't care about the exact error code, you could do:
if long_running_command | grep -q trigger_word; then
    success=1
    : success
else
    success=0
    : failure
fi
if ((success)); then
    : success
else
    : failure
fi
Using 0 for false and 1 for true is my preferred way of storing booleans in scripts. if ((flag)) mimics C nicely.
If you do care about the exit code, then you could do:
if long_running_command | grep -q trigger_word; then
    status=0
    : success
else
    status=$?
    : failure
fi
if ((status == 0)); then
    : success
else
    : failure
fi
I prefer an explicit test against 0 rather than using !, which doesn't read right.
(And yes, $? does yield the correct value here.)
Hmm, the problem is a bit vague -- if possible, I suggest considering refactoring/simplifying, i.e.:
function check_your_codes {
    # ... run all 'checks' and store the results in an array
}
###
function process_results {
    # do your 'stuff' based on array values
}
###
create_My_array
check_your_codes
process_results
Also, unless you really need to save the exit code, there is no need to store-and-test -- just test-and-do, i.e. use a case statement as suggested above, or something like:
run_some_commands_and_return_EXIT_CODE_FROM_THE_LAST_ONE
if [[ $? -eq 0 ]]; then do_stuff; else do_other_stuff; fi
:)
Dale

Why do I get different bash script results when invoked with 'set -x', and how do I fix it?

I've found that the results of my bash script will change depending upon if I execute it with debugging or not (i.e. invoking set -x). I don't mean that I get more output, but that the result of the program itself differs.
I'm assuming this isn't the desired behavior, and I'm hoping that you can teach me how to correct this.
The bash script below is a contrived example; I reduced the logic from the script I'm investigating so that the problem is easily reproducible and obvious.
#!/bin/bash
# Base function executes command (print working directory), stores the value in
# the destination and returns the status.
function get_cur_dir {
    local dest=$1
    local result
    result=$((pwd) 2>&1)
    status=$?
    eval $dest="\"$result\""
    return $status
}
# 2nd level function uses the base function to execute the command and store
# the result in the desired location. However if the base function fails it
# terminates the script. Yes, I know 'pwd' won't fail -- this is a contrived
# example to illustrate the types of problems I am seeing.
function get_cur_dir_nofail {
    local dest=$1
    local gcdnf_result
    local status
    get_cur_dir gcdnf_result
    status=$?
    if [ $status -ne 0 ]; then
        echo "ERROR: Command failure"
        exit 1
    fi
    eval $dest="\"$gcdnf_result\""
}
# Cause blarg to be loaded with the current directory, use the results to
# create a flag_file name, and do logic with the flag_file name.
function main {
    get_cur_dir blarg
    echo "Current diregtory is:$blarg"
    local flag_file=/tmp/$blarg.flag
    echo -e ">>>>>>>> $flag_file"
    if [ "/tmp//root.flag" = "$flag_file" ]; then
        echo "Match"
    else
        echo "No Match"
    fi
}
main
When I execute without the set -x it works as I expect as illustrated below:
Current diregtory is:/root
>>>>>>>> /tmp//root.flag
Match
However, when I add the debugging output with -x it doesn't work, as illustrated below:
root#psbu-jrr-lnx:# bash -x /tmp/example.sh
+ main
+ get_cur_dir blarg
+ local dest=blarg
+ local result
+ result='++ pwd
/root'
+ status=0
+ eval 'blarg="++ pwd
/root"'
++ blarg='++ pwd
/root'
+ return 0
+ echo 'Current diregtory is:++ pwd
/root'
Current diregtory is:++ pwd
/root
+ local 'flag_file=/tmp/++ pwd
/root.flag'
+ echo -e '>>>>>>>> /tmp/++ pwd
/root.flag'
>>>>>>>> /tmp/++ pwd
/root.flag
+ '[' /tmp//root.flag = '/tmp/++ pwd
/root.flag' ']'
+ echo 'No Match'
No Match
root#psbu-jrr-lnx:#
I think what happens is that you capture the debug logging output produced by the shell when you run it with set -x; this line, for example, does it:
result=$((pwd) 2>&1)
In the above line you shouldn't really need to redirect standard error to standard output, so remove 2>&1.
Changing...
result=$((pwd) 2>&1)
...into...
result=$(pwd 2>&1)
...will allow you to capture the output of pwd without capturing the debug info generated by set -x.
The reason the $PWD variable exists is to free your script from having to run a separate process or interpret its output (which in this case has been modified by -x). Use $PWD instead.
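For instance, the helper could be rewritten without any subshell at all (a sketch assuming the same calling convention as the original get_cur_dir; printf -v also sidesteps the eval):
get_cur_dir() {
    local dest=$1
    # $PWD is maintained by the shell itself: no child process, nothing for xtrace to intercept
    printf -v "$dest" '%s' "$PWD"
}
get_cur_dir blarg then behaves identically with or without set -x.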
