I am currently trying to figure out why a shell script fails at concurrent logging every once in a while.
I have a shell function like the following:
log()
{
    local l_text=$1
    local l_file="/path/to/logs/$(date +%Y%m%d)_script.log"
    local l_line="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) ${l_text}"
    echo "${l_line}" >> "${l_file}"
}
Now every once in a while this fails with a syntax error:
/path/to/script.sh: command substitution: line 163: syntax error near unexpected token `)'
/path/to/script.sh: command substitution: line 163: `hostname -s) ${l_text}'
The problem is that I have multiple subprocesses, each of which wants to log as well as send signals that trigger traps (and logging is performed inside those traps as well). I have debugged the problem and found that it happens when the function is entered three times simultaneously. First the main process enters, then the child. After the date part of l_line is executed, main gets interrupted by a trap caused by the child, and in this trap it tries to log something. The child and the trap finish their logging nicely, but then main is resumed after the trap, tries to execute the hostname part (presumably), and fails with this error.
So it seems that main does not cope with being interrupted while it is producing the $(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) ${l_text} part of the log statement, and it cannot resume cleanly. I assumed this would work fine, because I am only using local variables and thread-safe output methods.
Is this a general concurrency problem I am running into here, or is it specific to the trap mechanism in bash scripts? I know about the peculiarities of signal handling in C, so I am aware that only certain operations are allowed in signal handlers. However, I don't know whether the same precautions also apply when handling signals in a bash script. I tried to find documentation on this, but none of the documents I found gave any indication of problems with signal handling in scripts.
EDIT:
Here is an actual simple script that can be used to reproduce the problem:
#!/bin/bash

log() {
    local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1"
    echo $text >> /dev/null
}
sub_process() {
    while true; do
        log "Thread is running"
        kill -ALRM $$
        sleep 1
    done
}
trap "log 'received ALRM'" ALRM
sub_process &
sub_process_pid=$!
trap "kill ${sub_process_pid}; exit 0" INT TERM
while true; do
    log "Main is running"
    sleep 1
done
Every once in a while this script gets killed because of a syntax error in line 5. Line 5 is echo $text >> /dev/null, but since the syntax error also mentions the hostname command, similar to the one I posted above, I assume there is an off-by-one error as well and the actual error is in line 4, which is local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1".
Does anybody know how to change the above script to fix this? I already tried moving the construction of the string into some temporary variables:
log() {
    local thedate=$(date +'%Y-%m-%d %H:%M:%S')
    local thehostname=$(hostname -s)
    local text="${thedate} ${thehostname} $1"
    echo $text >> /dev/null
}
This way the error appears less frequently, but it still is present, so this is not a real fix.
I would say that this is definitely a bug in bash and I would encourage you to report it to the bash developers. At the very least, you should never get a syntax error for what is syntactically correct code.
For the record, I get the same results as you with GNU bash, version 4.2.10(1)-release (x86_64-pc-linux-gnu).
I found that you can work around the problem by not calling a function in your trap handler, e.g. by replacing
trap "log 'received ALRM'" ALRM
with
trap "echo $(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) received ALRM" ALRM
makes the script stable for me.
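A detail worth knowing about this workaround: because the trap string is in double quotes, $(date ...) and $(hostname -s) are expanded once, when the trap command itself runs, not each time ALRM arrives. Every message from the handler therefore carries the same timestamp, and the handler no longer performs any command substitution at signal time. The minimal sketch below shows the difference between double- and single-quoted trap strings; it uses a demo variable, counter, instead of date so the result is deterministic, and assumes bash 4 or later for $BASHPID.

```bash
#!/bin/bash
# Shows WHEN a trap string is expanded; `counter` is only a demo variable.
counter=1

# Double quotes: $counter is expanded now, while the trap is installed,
# so the stored handler text is literally: seen_early=1
trap "seen_early=$counter" USR1
counter=2
kill -USR1 "$BASHPID"    # $BASHPID: this shell's own PID (bash 4+)

# Single quotes: the text is stored verbatim and $counter is expanded
# only when the signal is actually handled.
trap 'seen_late=$counter' USR1
kill -USR1 "$BASHPID"

trap - USR1              # restore the default USR1 disposition
echo "early=$seen_early late=$seen_late"
```

The double-quoted handler still sees 1 even though counter was already 2 when the signal arrived.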
"I know about the peculiarities of signal handling in C, so I am aware that only certain operations are allowed in signal handlers. However, I don't know whether the same precautions also apply when handling signals in a bash script."
I guess you shouldn't have to take special precautions, but apparently in practice you do. Given that the problem seems to go away without the function call, I'm guessing that something in bash either isn't re-entrant where it should be or fails to prevent re-entry in the first place.
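If the trouble really lies in forking command substitutions while a trap may interrupt them, one further idea (untested against this particular bug, so treat it as a sketch) is to build the log line without forking at all. This assumes bash 4.2 or newer, whose printf builtin supports -v and the %(...)T time format; $HOSTNAME is set by bash itself, and ${HOSTNAME%%.*} approximates hostname -s. LOG_FILE is a hypothetical variable standing in for the dated log path from the question.

```bash
#!/bin/bash
# Hypothetical fork-free variant of log(); needs bash >= 4.2 for %(...)T.
log() {
    local stamp host
    # -1 means "the current time"; no $(date ...) subshell is created.
    printf -v stamp '%(%Y-%m-%d %H:%M:%S)T' -1
    # $HOSTNAME is maintained by bash; strip the domain like `hostname -s`.
    host=${HOSTNAME%%.*}
    # LOG_FILE is a placeholder for the real log path (defaults to /dev/null).
    printf '%s %s %s\n' "$stamp" "$host" "$1" >> "${LOG_FILE:-/dev/null}"
}
```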
I am using Bash 5 and have a long-running loop that needs to check occasionally for various keys the user may have hit. I know how to do this using stty (see my answer below), but it's uglier than it ought to be.
Essentially, I'm looking for a clean way to do this:
keyhit-p() {
    if "read -n1 would block"; then   # pseudocode: the test I'm looking for
        return 1                      # false: no key waiting
    else
        return 0                      # true: a key has been hit
    fi
}
Non-solution: read -t 0
I have read the bash manual and know about read -t 0. That does not do what I want, which is to detect if any input is available. Instead, it only returns true if the user hits ENTER (a complete line of input).
For example:
while true; do
    if read -n1 -t0; then
        echo "This only works if you hit enter"
        break
    fi
done
A working answer, albeit ugly
While the following works, I am hoping someone has a better answer.
#!/bin/bash
# Save the terminal's stty settings now and restore them on exit.
orig_stty=$(stty --save)
trap 'stty "$orig_stty"' EXIT
keyhit-p() {
    # Return true if input is available on stdin (any key has been hit).
    local sttysave=$(stty --save)
    stty -icanon min 1 time 0   # ⎫
    read -t0                    # ⎬ Ugly: This ought to be atomic so the
    local status=$?             # ⎪ terminal's stty is always restored.
    stty "$sttysave"            # ⎭
    return $status
}
while true; do
    echo -n .
    if ! keyhit-p; then
        continue
    else
        while keyhit-p; do
            read -n1
            echo Key: "$REPLY"
        done
        break
    fi
done
This alters the user's terminal settings (stty) before the read and attempts to restore them afterward, but does so non-atomically. It's possible for the script to be interrupted and leave the user's terminal in an incorrect state. I'd like to see an answer that solves that problem, ideally using only the tools built into bash.
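One bash-only alternative, offered as a sketch rather than a drop-in replacement, is to let the read builtin manage the terminal mode: when stdin is a terminal, read -n1 has to leave canonical mode to return without Enter, and bash restores the previous settings itself when the read finishes, so the script never calls stty. The trade-off is that this is not a pure peek. A key that is waiting gets consumed (here it lands in a global KEY, a name chosen for illustration), and each call may wait up to its timeout. Fractional timeouts such as -t 0.01 need bash 4.0 or later.

```bash
#!/bin/bash
# Hypothetical keyhit-p variant: no stty calls, but it CONSUMES the key.
keyhit-p() {
    # -r: keep backslashes, -s: don't echo the key,
    # -n1: return after one character (bash handles the terminal mode),
    # -t 0.01: give up after 10 ms (status > 128 on timeout).
    # Pressing Enter succeeds with an empty KEY.
    IFS= read -rsn1 -t 0.01 KEY
}

# Usage, e.g. inside the long-running loop:
#   if keyhit-p; then echo "Key: $KEY"; fi
```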
A faster, even uglier answer
Another flaw in the above routine is that it takes a lot of CPU time trying to get everything right. It requires calling an external program (stty) three times just to check that nothing has happened. Forks can be expensive in loops. If we dispense with correctness, we can get a routine that runs two orders of magnitude (256×) faster.
#!/bin/bash
# Save the terminal's stty settings now and restore them on exit.
orig_stty=$(stty --save)
trap 'stty "$orig_stty"' EXIT
# Set one character at a time input for the whole script.
stty -icanon min 1 time 0
while true; do
    echo -n .
    # We save time by presuming `read -t0` no longer waits for lines.
    # This may cause problems and can be wrong, for example, with ^Z.
    if ! read -t0; then
        continue
    else
        while read -t0; do
            read -n1
            echo Key: "$REPLY"
        done
        break
    fi
done
Instead of switching to non-canonical mode only during the read test, this script sets it once at the beginning and uses an EXIT trap to undo it when the script exits.
While I like that the code looks cleaner, the atomicity flaw of the original version is exacerbated because the suspend signal (SIGTSTP, sent by ^Z) isn't handled. If the user's shell is bash, icanon is enabled when the process is suspended, but NOT disabled again when the process is brought back to the foreground. That makes read -t0 return FALSE even when keys (other than Enter) are hit. Other user shells may not enable icanon on ^Z as bash does, but that's even worse, as typing commands will no longer work as usual.
Additionally, requiring non-canonical mode to be left on all the time may cause other problems as the script gets longer than this trivial example. It is not documented how non-canonical mode is supposed to affect read and other bash built-ins. It seems to work in my tests, but will it always? Chances of running into problems would multiply when calling — or being called by — external programs. Maybe there would be no issues, but it would require tedious testing.
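For the ^Z problem specifically, one approach, similar to how curses programs handle SIGTSTP, is to trap TSTP and CONT: restore the saved settings just before actually stopping, and re-enter non-canonical mode when continued. This is an untested sketch: it assumes stdin is the controlling terminal, the function names are hypothetical, and it narrows rather than closes the atomicity gap, since bash runs trap handlers only between commands.

```bash
#!/bin/bash
# Sketch: keep -icanon for the whole script, but hand the terminal back
# in its original state while the script is suspended with ^Z.
# Assumes stdin is the controlling terminal; names are hypothetical.

enter_char_mode() {
    stty -icanon min 1 time 0    # one character at a time
}

restore_tty() {
    stty "$orig_stty"            # settings saved at startup
}

on_tstp() {
    restore_tty                  # leave a sane terminal behind
    trap - TSTP                  # reset TSTP to its default action (stop)
    kill -TSTP $$                # and actually suspend the script
}

on_cont() {
    enter_char_mode              # resumed with fg: switch back
    trap on_tstp TSTP            # re-arm the suspend handler
}

# Only touch the terminal when there is one.
if [ -t 0 ]; then
    orig_stty=$(stty --save)
    trap restore_tty EXIT
    trap on_tstp TSTP
    trap on_cont CONT
    enter_char_mode
fi
```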
This question already has answers here:
SIGINT to cancel read in bash script?
(2 answers)
Closed 2 years ago.
I'm playing around with bash's read builtin. I like what I have so far as a simple layer on top of my current shell: read -e gives me tab completion and previous commands, and sending EOF with Ctrl+D gets me back to my original shell. Here's my reference:
Bash (or other shell): wrap all commands with function/script
I'd like some help handling SIGINT, ctrl+c. In a normal shell, if you start typing and hit ^C halfway, it immediately ends the line. For this simple example, after ^C, I still have to hit return before it's registered.
How do I keep the nice things that readline does, but still handle SIGINT correctly? Ideally, it would send a continue statement to the while read loop, or somehow send a \n to STDIN where my read is waiting.
Example code:
#!/bin/bash
# Emulate bash shell
gtg=1
function handleCtrl-C {
    # What do I do here?
    gtg=0
    return
}
trap handleCtrl-C INT
while read -e -p "> " line
do
    if [[ $gtg == 1 ]] ; then
        eval "$line"
    fi
    gtg=1
done
I think I finally came up with something I like. See SIGINT to cancel read in bash script? for that answer.
Reading man 7 signal shows that some system calls are restartable: if a signal arrives while they are executing, they are resumed after the handler runs instead of returning to the command:
For some system calls, if a signal is caught while the call is executing and the call is prematurely terminated, the call is automatically restarted. Any handler installed with signal(3) will have the SA_RESTART flag set, meaning that any restartable system call will not return on receipt of a signal. The affected system calls include read(2), write(2), sendto(2), recvfrom(2), sendmsg(2), and recvmsg(2) on a communications channel or a low speed device and during a ioctl(2) or wait(2). However, calls that have already committed are not restarted, but instead return a partial success (for example, a short read count). These semantics could be changed with siginterrupt(3).
You can print the value read into line to verify that read is resumed after Ctrl-C and keeps going until a newline is entered. Type something like "exit", press Ctrl-C, then type "exit" again and press Enter: the output comes out as "exitexit". Make the following change and run the test case above:
    echo ">$line<"
    if [[ $gtg == 1 ]] ; then
You'll see output like this:
>exitexit<
You can verify this with a C program as well.
Sorry, I can't come up with a clear title for what's happening, but here is the simplified problem code.
#!/bin/bash
# get the absolute path of .conf directory
get_conf_dir() {
    local path=$(some_command) || { echo "please install some_command first."; exit 100; }
    echo "$path"
}
# process the configuration
read_conf() {
    local conf_path="$(get_conf_dir)/foo.conf"
    [ -r "$conf_path" ] || { echo "conf file not found"; exit 200; }
    # more code ...
}
read_conf
Basically, what I am trying to do here is read a simple configuration file in a bash script, and I am having trouble with error handling.
some_command is a command from a third-party package (e.g. greadlink from coreutils) that is required to obtain the path.
When running the code above, I expect it to output "command not found", because that's where the FIRST error occurs, but it always prints "conf file not found".
I am very confused by this behavior. I think bash probably does this on purpose, but I don't know why. Most importantly, how do I fix it?
Any idea would be greatly appreciated.
Do you see your please install some_command first message anywhere? Is it in $conf_path from the local conf_path="$(get_conf_dir)/foo.conf" line? Do you have a $conf_path value of please install some_command first/foo.conf? Which then fails the -r test?
No, you don't. (But feel free to echo the value of $conf_path in that exit 200 block to confirm this.) (Also, error messages should in general be sent to standard error, not standard output, so they should be echo "..." >&2. That way they won't be caught by the command substitution at all.)
You don't because that exit 100 block never runs.
You can see this with set -x at the top of your script also. Go try it.
See what I mean?
The reason it isn't happening is that the failure return of some_command is being swallowed by the local path=$(some_command) assignment statement.
Try running this command:
f() { local a=$(false); echo "Returned: $?"; }; f
Do you expect to see Returned: 1? You might, but you won't see that.
What you will see is Returned: 0.
Now try either of these versions:
f() { a=$(false); echo "Returned: $?"; }; f
f() { local a; a=$(false); echo "Returned: $?"; }; f
Get the output you expected in the first place?
Right. local, export, declare and typeset are commands in their own right. They have their own return values, and they ignore (and replace) the return value of the commands that execute in their context.
The solution to your problem is to split the local path and path=$(some_command) statements.
http://www.shellcheck.net/ catches this (and many other common errors). You should make it your friend.
In addition to the above (if you've managed to follow along this far), even with the changes mentioned so far your exit 100 won't exit the main script, since it will only exit the subshell spawned by the command substitution in the assignment.
If you want that exit 100 to exit your script then you either need to notice and re-exit with it (check for get_conf_dir failure after the conf_path assignment and exit with the previous exit code) or drop the get_conf_dir function itself and just do that inline in read_conf.
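Putting those changes together, a corrected version might look like the sketch below. some_command is still the question's placeholder. get_conf_dir now writes its message to standard error and returns a status instead of calling exit, and read_conf checks that status and re-exits with it, which is the first of the two options described above.

```bash
#!/bin/bash
# get the absolute path of .conf directory
get_conf_dir() {
    local path
    # Separate assignment, so || sees some_command's status, not local's.
    path=$(some_command) || {
        echo "please install some_command first." >&2
        return 100
    }
    echo "$path"
}

# process the configuration
read_conf() {
    local conf_dir conf_path
    # get_conf_dir runs in a command-substitution subshell, so an exit
    # there would not stop the script; propagate its status from here.
    conf_dir=$(get_conf_dir) || exit $?
    conf_path="$conf_dir/foo.conf"
    [ -r "$conf_path" ] || { echo "conf file not found" >&2; exit 200; }
    # more code ...
}

# Call read_conf as in the original script.
```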
I intend to use trap to run some clean-up code in case of a failure. I have the following code, but it seems to have some syntax issues.
#!/bin/bash
set -e
function handle_error {
    umount /mnt/chroot
    losetup -d $LOOP_DEV1 $LOOP_DEV2
}
trap "{ echo \"$BASH_COMMAND failed with status code $?\"; handle_error; }" ERR
Does anyone see an issue with the way the trap is written? In case of an error the trap does get executed, but it also prints another, unwanted error message:
/root/myscript.sh: line 60: } ERR with status code 0: command not found
## line 60 is the line of code that exited with a non-zero status
How do I write it correctly to avoid the error message? Also, what if I had to pass $LOOP_DEV1 and $LOOP_DEV2 as arguments from the main script to the trap and then on to the handle_error function? Right now they are exported as environment variables in the main script. I searched for trap examples but couldn't find anything similar.
EDIT
I changed the shebang from /bin/sh to /bin/bash. As /bin/sh was already symlinked to bash I did not expect unicorns nor did I see any.
That trap call creates an interesting recursion, because $BASH_COMMAND (and $?) are expanded when the trap command executes. However, $BASH_COMMAND at that point is the trap command itself, textually including $BASH_COMMAND (and some quotes and semicolons). Working out exactly what command runs when the trap fires is an interesting study, but it's not necessary to fix the problem, which you can do like this:
trap '{ echo "$BASH_COMMAND failed with status code $?"; handle_error; }' ERR
Note that replacing " with ' not only avoids immediate parameter expansion, it also avoids having to escape the inner "s.
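A small self-contained check of the corrected trap follows, with handle_error replaced by a hypothetical stand-in that only records what it would detach. It also bears on the second question: the trap runs in the same shell process as the rest of the script, so handle_error can read LOOP_DEV1 and LOOP_DEV2 as ordinary shell variables, without exporting them or passing them as arguments. set -e is left out so the demo keeps running after the failing command, and the message is also kept in err_msg so it can be checked.

```bash
#!/bin/bash
# Hypothetical device names; plain (unexported) shell variables.
LOOP_DEV1=/dev/loop-demo1
LOOP_DEV2=/dev/loop-demo2

# Stand-in for the real clean-up: records the devices it would detach.
handle_error() {
    cleaned_up="$LOOP_DEV1 $LOOP_DEV2"
}

# Single quotes: $BASH_COMMAND and $? are expanded when the trap fires.
trap 'err_msg="$BASH_COMMAND failed with status code $?"; echo "$err_msg" >&2; handle_error' ERR

false          # fails, so the ERR trap runs
trap - ERR     # remove the trap again
```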
I'm adding some custom logging functionality to a bash script, and can't figure out why it won't take the output from one named pipe and feed it back into another named pipe.
Here is a basic version of the script (http://pastebin.com/RMt1FYPc):
#!/bin/bash
PROGNAME=$(basename $(readlink -f $0))
LOG="$PROGNAME.log"
PIPE_LOG="$PROGNAME-$$-log"
PIPE_ECHO="$PROGNAME-$$-echo"
# program output to log file and optionally echo to screen (if $1 is "-e")
log () {
    if [ "$1" = '-e' ]; then
        shift
        "$@" > $PIPE_ECHO 2>&1
    else
        "$@" > $PIPE_LOG 2>&1
    fi
}
# create named pipes if not exist
if [[ ! -p $PIPE_LOG ]]; then
    mkfifo -m 600 $PIPE_LOG
fi
if [[ ! -p $PIPE_ECHO ]]; then
    mkfifo -m 600 $PIPE_ECHO
fi
# cat pipe data to log file
while read data; do
    echo -e "$PROGNAME: $data" >> $LOG
done < $PIPE_LOG &
# cat pipe data to log file & echo output to screen
while read data; do
    echo -e "$PROGNAME: $data"
    log echo $data # this doesn't work
    echo -e $data > $PIPE_LOG 2>&1 # and neither does this
    echo -e "$PROGNAME: $data" >> $LOG # so I have to do this
done < $PIPE_ECHO &
# clean up temp files & pipes
clean_up () {
    # remove named pipes
    rm -f $PIPE_LOG
    rm -f $PIPE_ECHO
}
#execute "clean_up" on exit
trap "clean_up" EXIT
log echo "Log File Only"
log -e echo "Echo & Log File"
I thought the two commands marked "this doesn't work" and "and neither does this" would take the $data from $PIPE_ECHO and write it to $PIPE_LOG. But they don't. Instead, I have to send that output directly to the log file, without going through $PIPE_LOG.
Why is this not working as I expect?
EDIT: I changed the shebang to "bash". The problem is the same, though.
SOLUTION: A.H.'s answer helped me understand that I wasn't using named pipes correctly. I have since solved my problem by not even using named pipes. That solution is here: http://pastebin.com/VFLjZpC3
It seems to me that you don't understand what a named pipe really is. A named pipe is not one stream like a normal pipe. It is a series of normal pipes, because a named pipe can be closed, and a close on the producer side might be seen as a close on the consumer side.
The "might be" part is this: the consumer will read data until there is no more data. No more data means that at the time of the read call no producer has the named pipe open. This means that multiple producers can feed one consumer only if there is no point in time without at least one producer. Think of it as a door that closes automatically: if there is a steady stream of people keeping the door open, either by handing the doorknob to the next one or by squeezing several people through it at the same time, the door stays open. But once the door is closed, it stays closed.
A little demonstration should make the difference a little clearer:
Open three shells. First shell:
1> mkfifo xxx
1> cat xxx
No output is shown, because cat has opened the named pipe and is waiting for data.
Second shell:
2> cat > xxx
No output, because this cat is a producer that keeps the named pipe open until we explicitly tell it to close.
Third shell:
3> echo Hello > xxx
3>
This producer immediately returns.
First shell:
Hello
The consumer received the data, wrote it and, since one more producer keeps the door open, continues to wait.
Third shell
3> echo World > xxx
3>
First shell:
World
The consumer received the data, wrote it and, since one more producer keeps the door open, continues to wait.
Second Shell: write into the cat > xxx window:
And good bye!
(control-d key)
2>
First shell
And good bye!
1>
The ^D key closed the last producer, the cat > xxx, and hence the consumer exits also.
In your case this means:
Your log function will try to open and close the pipes multiple times. Not a good idea.
Both your while loops exit earlier than you think. (Check this with (while ... done < $PIPE_X; echo FINISHED; ) &.)
Depending on the scheduling of your various producers and consumers, the door might slam shut sometimes and sometimes not: you have a race condition built in. (For testing you can add a sleep 1 at the end of the log function.)
Your "test cases" only try each possibility once. Try to use them multiple times (you will block, especially with the sleeps), because your producer might not find any consumer.
So I can explain the problems in your code but I cannot tell you a solution because it is unclear what the edges of your requirements are.
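One common way to hold that door open, sketched here under the assumption that a single long-running reader is what you want, is for the script itself to keep a write descriptor on the FIFO for its whole lifetime (fd 3 below, opened with exec). Individual writers can then open and close the FIFO freely without the reader seeing end-of-file, and the reader stops only when that last descriptor is closed. The temporary file names are illustrative.

```bash
#!/bin/bash
log_dir=$(mktemp -d)               # illustrative temporary location
fifo="$log_dir/log.fifo"
logfile="$log_dir/out.log"
mkfifo -m 600 "$fifo"

# Background consumer: ends only when the LAST writer closes the FIFO.
while IFS= read -r data; do
    printf 'demo: %s\n' "$data" >> "$logfile"
done < "$fifo" &
reader_pid=$!

# Long-lived writer on fd 3 "holds the door open" for the whole run.
exec 3> "$fifo"

# These writers each open and close the FIFO; fd 3 keeps it open,
# so the consumer does not see end-of-file in between.
echo "first"  > "$fifo"
echo "second" > "$fifo"
echo "third" >&3

# Closing the last writer lets the consumer hit EOF and finish.
exec 3>&-
wait "$reader_pid"
rm -f "$fifo"
```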
It seems the problem is in the "cat pipe data to log file" part.
Let's see: you use a "&" to put the loop in the background; I guess you mean it must run in parallel with the second loop.
But the problem is that you don't even need the "&", because as soon as no more data is available in the FIFO, the while..read loop stops. (You still need some data at first for the first read to work.) The next read doesn't hang if no more data is available (which would pose another problem: how would your program stop?).
I guess the while read checks if more data is available in the file before doing the read and stops if it's not the case.
You can check with this sample:
mkfifo foo
while read data; do echo $data; done < foo
This script hangs until you write something from another shell (or background the first one), but it ends as soon as a read succeeds.
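A non-interactive version of this check, which also borrows the echo FINISHED idea from the other answer, could look like this (the temporary paths are illustrative). The loop reads the single line written by echo, sees end-of-file as soon as that writer closes the FIFO, and finishes.

```bash
#!/bin/bash
demo_dir=$(mktemp -d)          # illustrative temporary location
mkfifo "$demo_dir/foo"

# Reader in the background, printing a marker once its loop has ended.
( while IFS= read -r data; do echo "$data"; done < "$demo_dir/foo"
  echo FINISHED ) > "$demo_dir/out" &
demo_pid=$!

# A single writer: opens the FIFO, writes one line, closes it again.
echo hello > "$demo_dir/foo"

# After that writer closes, the reader sees EOF and its loop ends.
wait "$demo_pid"
cat "$demo_dir/out"
```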
Edit:
I've tested on RHEL 6.2 and it behaves as you say (i.e. badly!).
The problem is that after running the script (let's say script "a"), you've got an "a" process remaining. So, yes, in some way the script hangs, as I wrote before (not such a stupid answer as I thought then :) ). That is, unless you write only one log line (either log-file-only or echo); in that case it works.
(It's the read loop from PIPE_ECHO that hangs when writing to PIPE_LOG, and it leaves a process running each time.)
I've added a few debug messages, and here is what I see:
only one line is read from PIPE_LOG and after that, the loop ends
then a second message is sent to PIPE_LOG (after being received from PIPE_ECHO), but no process reads from PIPE_LOG any more => the write hangs.
When you ls -l /proc/[pid]/fd, you can see that the fifo is still open (but deleted).
In fact, the script exits and removes the FIFOs, but there is still one process using one of them.
If you don't remove the log FIFO during clean-up and then cat it, that frees the hanging process.
Hope it will help...