What does `flock -u` actually do? - shell

I'm playing around with the command flock, which obtains and releases locks on files. For example, if I run
flock /tmp/mylock true
then it immediately exits, presumably obtaining and then releasing the lock. If I run
flock /tmp/mylock sleep 100
then it delays 100 seconds, again obtaining and releasing the lock. And, if I run the following in two separate shells:
flock /tmp/mylock sleep 100
and
flock /tmp/mylock true
then the second command is blocked, because it can't obtain the lock while the first command runs. Once the sleep 100 completes and the lock is released, the second command runs and exits. All good.
Here's the problem. If, during that 100 second delay, I run the following in a third shell:
flock -u /tmp/mylock true
then what happens? The man page for flock says:
-u, --unlock
       Drop a lock. This is usually not required, since a lock is
       automatically dropped when the file is closed. However, it
       may be required in special cases, for example if the enclosed
       command group may have forked a background process which
       should not be holding the lock.
So, this should drop the lock, which should allow flock /tmp/mylock true to run, right? (I would also guess that the flock /tmp/mylock sleep 100 would immediately exit, but that's speculation.)
What happens? Nothing. flock -u /tmp/mylock true immediately exits, but flock /tmp/mylock true continues to be blocked, and flock /tmp/mylock sleep 100 continues to run.
What does flock -u /tmp/mylock <command> actually do?
(All examples tested on Ubuntu 18.04.)

Here's an example with -u working with file descriptor 9 open on a file mylock, successfully unlocking 9 so that a backgrounded flock mylock can proceed.
Note that flock 9 cannot also take a command, as in that case the "9" would be treated as a filename, not a file descriptor.
bash -s <<\! 9>mylock 2>&1 |
flock 9; echo gotlock1
flock 9; echo gotlock2
9>&- flock mylock bash -c 'echo start_sleep;sleep 8; echo end_sleep' &
sleep 2
flock -u 9; echo unlock; sleep .1
flock 9; echo gotlock3
!
awk '{t2=systime(); if(t1==0)t1=t2; printf "%2d %s\n",t2-t1,$0; t1=t2}'
The first line makes bash run the heredoc lines that follow after opening fd 9 on mylock, while piping stdout and stderr through the awk script at the end (note that systime() is a GNU awk extension). The awk script just annotates each line of output with the number of seconds elapsed since the previous line. The result is:
0 gotlock1
0 gotlock2
2 unlock
0 start_sleep
8 end_sleep
0 gotlock3
This shows the first 2 flock 9 commands run immediately. Then a flock mylock command is run in the background, after closing fd 9 just for this line. This command could have been run from a second window, for example. The output shows that it hangs, as we do not see start_sleep. This means that the preceding flock 9 did actually get an exclusive lock.
The output then shows that after sleep 2 and flock -u 9 we get the unlock echo, and only then does the background command get the lock and start its sleep 8.
The main script immediately does a flock 9, but the output shows that this does not proceed until the background script ends with end_sleep 8 seconds later, and the main script outputs gotlock3.
The lslocks command sometimes shows 2 processes interested in the lock. The * means a wait:
COMMAND   PID  TYPE  SIZE MODE   M START END PATH
flock   23671 FLOCK    0B WRITE* 0     0   0 /tmp/mylock
flock   23655 FLOCK    0B WRITE  0     0   0 /tmp/mylock
But it does not show the result of the first flock 9 on its own, presumably because there is no process with the lock, even though the file truly is locked, as we see when the background job cannot proceed.
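For comparison, here is the canonical pattern from the flock(1) man page for serializing part of a shell script through a numbered file descriptor (the lock file path is an arbitrary choice):
(
    flock -n 9 || exit 1   # fail instead of blocking if the lock is already held
    # ... commands executed under the lock ...
) 9>/var/lock/mylockfile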

Related

Identify whether a process was killed by a signal in bash

Consider these two C programs:
#include <signal.h>

int main(void) {
    raise(SIGTERM);
}

int main(void) {
    return 143;
}
If I run either one, the value of $? in bash will be 143. The wait syscall lets you distinguish them, though:
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11148
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 143}], 0, NULL) = 11214
And bash clearly uses this knowledge, since the first one results in Terminated being printed to the terminal (oddly, this happens even if I redirect both stdout and stderr elsewhere), and the second one doesn't. How can I differentiate these two cases from a bash script?
I believe getting the full exit status from pure bash/shell is not possible.
The answers on Unix & Linux Stack Exchange are very comprehensive.
What's common between all shells is that $? contains the lowest 8 bits of the exit code (the number passed to exit()) if the process terminated normally.
Where it differs is when the process is killed by a signal. In all cases, and that's required by POSIX, the number will be greater than 128, but POSIX doesn't specify what the value may be. In practice, though, in all Bourne-like shells that I know of, the lowest 7 bits of $? contain the signal number.
In ash, zsh, pdksh, bash and the Bourne shell, $? is 128 + n, where n is the signal number. That means that in those shells, if you get a $? of 129, you don't know whether the process exited with exit(129) or whether it was killed by signal 1 (HUP on most systems). The rationale is that shells, when they exit themselves, by default return the exit status of the last exited command; by making sure $? is never greater than 255, that allows a consistent exit status:
$ bash -c 'sh -c "kill \$\$"; printf "%x\n" "$?"'
bash: line 1: 16720 Terminated sh -c "kill \$\$"
8f # 128 + 15
$ bash -c 'sh -c "kill \$\$"; exit'; printf '%x\n' "$?"
bash: line 1: 16726 Terminated sh -c "kill \$\$"
8f # here that 0x8f is from an exit(143) done by bash. Though it's
   # not from a killed process, it does tell us that probably
   # something was killed by a SIGTERM
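If the 128 + n convention is good enough for your purposes, here is a small sketch (mine, not from the answer above) that maps such a status back to a signal name; keep in mind it is only a heuristic, since exit(143) is indistinguishable from death by SIGTERM:
status=$?
if [ "$status" -gt 128 ]; then
    sig=$(kill -l $((status - 128)))   # e.g. prints TERM for 143
    echo "probably killed by SIG$sig"
else
    echo "exited normally with status $status"
fi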
For this reason, I believe you would need to run a helper outside of bash to catch the full exit status.
A similar question has been asked regarding unbuffer, which is a small script written in Tcl. To be more precise, unbuffer uses the expect library (libexpect) with a Tcl/Tk wrapper.
From the source of unbuffer I extracted the relevant code to derive a workaround:
#!/bin/bash
expectStat() {
    expect <(cat << EOT
set stty_init "-opost"
set timeout -1
eval [list spawn -noecho ] $@ ;# spawn the command passed to expectStat
expect
send_user "[wait]\n"
EOT
)
}
expectStat sleep 5 &
wait
which returns approximately the following line if sleep exits normally:
18383 exp4 0 0
If sleep is killed before it's exiting itself, the above script will approximately return:
18383 exp4 0 0 CHILDKILLED SIGTERM {software termination signal}
If a script is terminated with exit 143, the script will approximately return:
18383 exp4 0 143
The meaning of these strings can be extracted from the expect manual; the lines above are returned by expect's built-in wait function.
The first two values are the pid and expect's name for the process.
The fourth is the exit status. If a signal occurred, more information is printed; the sixth value is then the signal sent to the process on its termination.
wait
normally returns a list of four integers. The first integer is the pid of the process that was waited upon. The second integer is the corresponding spawn id. The third integer is -1 if an operating system error occurred, or 0 otherwise. If the third integer was 0, the fourth integer is the status returned by the spawned process. If the third integer was -1, the fourth integer is the value of errno set by the operating system. The global variable errorCode is also set.
Additional elements may appear at the end of the return value from wait. An optional fifth element identifies a class of information. Currently, the only possible value for this element is CHILDKILLED in which case the next two values are the C-style signal name and a short textual description.
This means the fourth value and, if present, the sixth value are the ones you are looking for. Store the whole line and extract the signal and exit status, for example with the following code:
RET=$(expectStat script.sh)
# Filter status: exit value is field 4, signal name (if any) is field 6
EXITVALUE=$(echo "$RET" | cut -d' ' -f4)
SIGNAL=$(echo "$RET" | cut -d' ' -f6)
#echo "Exit value: $EXITVALUE, Signal: $SIGNAL"
if [ -n "$SIGNAL" ]; then
    echo "Likely killed by signal $SIGNAL"
else
    echo "$EXITVALUE"
fi
Admittedly, this workaround is quite inelegant. Perhaps there is another tool that brings its own C-based helpers to detect the occurrence of a signal.
wait is a syscall and also a bash builtin.
To differentiate the two cases from bash, run the program in the background and use the builtin wait to report the outcome.
Following are examples of both a non-zero exit code and an uncaught signal. These examples use the exit and kill bash builtins in a child bash shell; in place of the child bash shell you would run your own program.
$ bash -c 'kill -s SIGTERM $$' & wait
[1] 36068
[1]+ Terminated: 15 bash -c 'kill -s SIGTERM $$'
$ bash -c 'exit 143' & wait
[1] 36079
[1]+ Exit 143 bash -c 'exit 143'
$
As to why you see Terminated printed to the terminal even when you redirect stdout and stderr: the message is printed by bash itself, not by the program.
Update:
By explicitly using the wait builtin you can now redirect its stderr (with the exit status of the program) to a separate file.
The following examples show the three types of termination: normal exit 0, non-zero exit, and uncaught signal. The results reported by wait are stored in files tagged with the PID of the corresponding program.
$ bash -c 'exit 0' & wait 2> exit_status_pid_$!
[1] 40279
$ bash -c 'exit 143' & wait 2> exit_status_pid_$!
[1] 40291
$ bash -c 'kill -s SIGTERM $$' & wait 2> exit_status_pid_$!
[1] 40303
$ for f in exit_status_pid*; do echo $f: $(cat $f); done
exit_status_pid_40279: [1]+ Done bash -c 'exit 0'
exit_status_pid_40291: [1]+ Exit 143 bash -c 'exit 143'
exit_status_pid_40303: [1]+ Terminated: 15 bash -c 'kill -s SIGTERM $$'
$
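As a follow-up sketch (my own, not part of the answer above; the exact job-status wording can vary between bash versions and locales), the saved files can then be classified like this:
for f in exit_status_pid_*; do
    case "$(cat "$f")" in
        *Terminated*) echo "$f: killed by SIGTERM" ;;
        *Exit*)       echo "$f: non-zero exit" ;;
        *Done*)       echo "$f: exited 0" ;;
    esac
done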
This is straying farther from bash, but bcc offers exitsnoop. Using the examples from the description, on Debian Sid:
root@vsid:~# apt install bpfcc-tools linux-headers-amd64
root@vsid:~# exitsnoop-bpfcc
PCOMM PID PPID TID AGE(s) EXIT_CODE
example1 1041 948 1041 0.00 signal 15 (TERM)
example2 1042 948 1042 0.00 code 143
^C
See the install guide for other distributions.
strace can capture most of the signals, but might not cover every case (e.g. kill -9); therefore, as mentioned in this article:
Auditd is a daemon process or service that does as the name implies and produces audit logs of system-level activities. It is installed from the usual repository as the audit package, configured in /etc/audit/auditd.conf, with the rules in /etc/audit/audit.rules.
The article provides examples of audit output, which can help you determine whether it is useful for you:
The usual output will look like this:
time->Wed Jun 3 16:34:08 2015
type=SYSCALL msg=audit(1433363648.091:6342): arch=c000003e syscall=62 success=no exit=-3 a0=1e06 a1=0 a2=1e06 a3=fffffffffffffff0 items=0 ppid=10044 pid=10140 auid=500 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm=4174746163682041504920696E6974 exe="/opt/ibm/WebSphere/AppServer/java/jre/bin/java" subj=unconfined_u:unconfined_r:unconfined_java_t:s0-s0:c0.c1023 key="kill_signals"
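For illustration, here is a hedged sketch of an audit rule that could produce entries like the one above (the key name kill_signals is taken from the log's key= field; exact flags may vary with your auditctl version):
auditctl -a always,exit -F arch=b64 -S kill -k kill_signals   # log every kill(2) on 64-bit
ausearch -k kill_signals                                      # later, search for matching events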
There is also mention of SystemTap, with a pointer to a guide.

In bash: processing every command line without using the debug trap?

I have a complicated mechanism built into my bash environment that requires the execution of a couple of scripts when the prompt is generated, but also when the user hits enter to begin processing a command. An oversimplified description of the problem follows.
The debug trap does this in a fairly limited way: it fires every time a statement is executed.
trap 'echo $BASH_COMMAND' DEBUG # example
Unfortunately, this means that when I type this:
sleep 1; sleep 2; sleep 3
rather than processing a $BASH_COMMAND that contains the entire line, I get the three sleeps in three different traps. Worse yet:
sleep 1 | sleep 2 | sleep 3
fires all three traps as the pipeline is set up, before sleep 1 even starts executing; the output might lead you to believe that sleep 3 is running.
I need a way to execute a script right at the beginning, processing the entire command, and I'd rather it not fire when the prompt command is run, but I can deal with that if I must.
THERE'S A MAJOR PROBLEM WITH THIS SOLUTION. COMMANDS WITH PIPES (|) WILL FINISH EXECUTING THE TRAP, BUT BACKGROUNDING A PROCESS DURING THE TRAP WILL CAUSE THE PROCESSING OF THE COMMAND TO FREEZE - YOU'LL NEVER GET A PROMPT BACK WITHOUT HITTING ^C. THE TRAP COMPLETES, BUT $PROMPT_COMMAND NEVER RUNS. THIS PROBLEM PERSISTS EVEN IF YOU DISOWN THE PROCESS IMMEDIATELY AFTER BACKGROUNDING IT.
This wound up being a little more interesting than I expected:
LOGFILE=~/logfiles/$BASHPID

start_timer() {
    if [ ! -e "$LOGFILE" ]; then
        # You may have to adjust this to fit with your history output format:
        CMD=$(history | tail -1 | tr -s " " | cut -f2-1000 -d" ")
        # timer2 keeps updating the status line with how long the cmd has been running
        timer2 -p "$PROMPT_BNW $CMD" -u -q & echo $! > "$LOGFILE"
    fi
}

stop_timer() {
    # Unfortunately, killing a process always prints that nasty confirmation line,
    # and you can't silence it by redirecting stdout and stderr to /dev/null, so you
    # have to disown the process before killing it.
    disown "$(cat "$LOGFILE")"
    kill -9 "$(cat "$LOGFILE")"
    rm -f "$LOGFILE"
}
trap 'start_timer' DEBUG
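For reference, a common workaround for the once-per-statement firing is to combine the DEBUG trap with PROMPT_COMMAND as a re-arming guard. This is a minimal sketch of the idiom (the names are mine; this is not the author's code):
preexec() {
    # $1 is only the first simple command of the line; use `history 1`
    # (as the author's code does) if you need the whole line.
    echo "about to run: $1"
}
debug_hook() {
    [ -n "$IN_COMMAND" ] && return    # already fired for this command line
    IN_COMMAND=1
    preexec "$1"
}
trap 'debug_hook "$BASH_COMMAND"' DEBUG
PROMPT_COMMAND='IN_COMMAND='          # re-arm when the prompt is drawn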

Quit less when pipe closes

As part of a bash script, I want to run a program repeatedly, and redirect the output to less. The program has an interactive element, so the goal is that when you exit the program via the window's X button, it is restarted via the script. This part works great, but when I use a pipe to less, the program does not automatically restart until I go to the console and press q. The relevant part of the script:
while :
do
program | less
done
I want to make less quit itself when the pipe closes, so that the program restarts without any user intervention. (That way it behaves just as if the pipe was not there, except while the program is running you can consult the console to view the output of the current run.)
Alternative solutions to this problem are also welcome.
Instead of exiting less, could you simply aggregate the output of each run of program?
while :
do
program
done | less
Having less exit when program exits would be at odds with one useful feature of less, which is that it can buffer the output of a program that exits before you finish reading its output.
UPDATE: Here's an attempt at using a background process to kill less when it is time. It assumes that the only program reading the output file is the less to kill.
while :
do
    ( program > /tmp/$$-program-output; kill $(lsof -Fp /tmp/$$-program-output | cut -c2-) ) &
    less /tmp/$$-program-output
done
program writes its output to a file. Once it exits, the kill command uses lsof to find out what process is reading the file, then kills it. Note that there is a race condition: less needs to start before program exits. If that's a problem, it can probably be worked around, but I'll avoid cluttering the answer otherwise.
You may try to kill the process group that program and less belong to, instead of using kill and lsof.
#!/bin/bash
trap 'kill 0' EXIT
while :
do
# script command gives sh -c own process group id (only sh -c cmd gets killed, not entire script!)
# FreeBSD script command
script -q /dev/null sh -c '(trap "kill -HUP -- -$$" EXIT; echo hello; sleep 5; echo world) | less -E -c'
# GNU script command
#script -q -c 'sh -c "(trap \"kill -HUP -- -$$\" EXIT; echo hello; sleep 5; echo world) | less -E -c"' /dev/null
printf '\n%s\n\n' "you now may ctrl-c the program: $0" 1>&2
sleep 3
done
While I agree with chepner's suggestion, if you really want individual less instances, I think this item from the man page will help you:
-e or --quit-at-eof
Causes less to automatically exit the second time it reaches end-of-file. By default,
the only way to exit less is via the "q" command.
-E or --QUIT-AT-EOF
Causes less to automatically exit the first time it reaches end-of-file.
You would make this option visible to less via the LESS environment variable:
export LESS="-E"
while : ; do
    program | less
done
IHTH

How do I terminate all the subshell processes?

I have a bash script to test how a server performs under load.
num=1
if [ $# -gt 0 ]; then
    num=$1
fi

for i in {1 .. $num}; do
    (while true; do
        { time curl --silent 'http://localhost'; } 2>&1 | grep real
    done) &
done
wait
When I hit Ctrl-C, the main process exits, but the background loops keep running. How do I make them all exit? Or is there a better way of spawning a configurable number of logic loops executing in parallel?
Here's a simpler solution -- just add the following line at the top of your script:
trap "kill 0" SIGINT
kill 0 sends the signal to all processes in the current process group, the script itself included.
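A minimal self-contained sketch of that fix (with sleep standing in for the curl loop):
#!/bin/bash
# Ctrl-C now signals the whole process group, background loops included.
trap "kill 0" SIGINT
for i in 1 2 3; do
    (while true; do sleep 1; done) &
done
wait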
One way to kill subshells, but not self:
kill $(jobs -p)
Bit of a late answer, but for me solutions like kill 0 or kill $(jobs -p) go too far (kill all child processes).
If you just want to make sure one specific child-process (and its own children) are tidied up then a better solution is to kill by process group (PGID) using the sub-process' PID, like so:
set -m
./some_child_script.sh &
some_pid=$!
kill -- -${some_pid}
Firstly, the set -m command will enable job control (if it isn't already enabled). This is important: otherwise all commands, sub-shells etc. are assigned to the same process group as your parent script (unlike when you run the commands manually in a terminal), and kill will just give a "no such process" error. This needs to be called before you run the background command you wish to manage as a group (or just call it at script start if you have several).
Secondly, note that the argument to kill is negative; this indicates that you want to kill an entire process group. By default the process group ID is the same as the PID of the first command in the group, so we can get it by simply adding a minus sign in front of the PID we fetched with $!. If you need to get the process group ID in a more complex case, you will need to use ps -o pgid= ${some_pid}, then add the minus sign to that, as in the sketch below.
Lastly, note the use of the explicit end-of-options marker --. This is important, as otherwise the process group argument would be treated as an option (a signal number) and kill would complain it doesn't have enough arguments. You only need this if the process group argument is the first one you wish to terminate.
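A minimal sketch of that more complex case (the variable name is taken from the example above):
# Look up the process group ID of an arbitrary PID, then signal the group.
pgid=$(ps -o pgid= -p "${some_pid}" | tr -d ' ')   # tr strips ps's leading padding
kill -- "-${pgid}"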
Here is a simplified example of a background timeout process, and how to cleanup as much as possible:
#!/bin/bash
# Use the overkill method in case we're terminated ourselves
trap 'kill $(jobs -p | xargs)' SIGINT SIGHUP SIGTERM EXIT
# Setup a simple timeout command (an echo)
set -m
{ sleep 3600; echo "Operation took longer than an hour"; } &
timeout_pid=$!
# Run our actual operation here
do_something
# Cancel our timeout
kill -- -${timeout_pid} >/dev/null 2>&1
wait -- -${timeout_pid} >/dev/null 2>&1
printf '' 2>&1
This should cleanly handle cancelling this simplistic timeout in all reasonable cases; the only case that can't be handled is the script being terminated immediately (kill -9), as it won't get a chance to clean up.
I've also added a wait followed by a no-op (printf ''); this is to suppress "terminated" messages that can be caused by the kill command. It's a bit of a hack, but reliable enough in my experience.
You need to use job control, which, unfortunately, is a bit complicated. If these are the only background jobs that you expect will be running, you can run a command like this one:
jobs \
| perl -ne 'print "$1\n" if m/^\[(\d+)\][+-]? +Running/;' \
| while read -r ; do kill %"$REPLY" ; done
jobs prints a list of all active jobs (running jobs, plus recently finished or terminated jobs), in a format like this:
[1] Running sleep 10 &
[2] Running sleep 10 &
[3] Running sleep 10 &
[4] Running sleep 10 &
[5] Running sleep 10 &
[6] Running sleep 10 &
[7] Running sleep 10 &
[8] Running sleep 10 &
[9]- Running sleep 10 &
[10]+ Running sleep 10 &
(Those are jobs that I launched by running for i in {1..10} ; do sleep 10 & done.)
perl -ne ... is me using Perl to extract the job numbers of the running jobs; you can obviously use a different tool if you prefer. You may need to modify this script if your jobs has a different output format; but the above output is from bash on Cygwin, so it's very likely identical to yours.
read -r reads a "raw" line from standard input, and saves it into the variable $REPLY. kill %"$REPLY" will be something like kill %1, which "kills" (sends SIGTERM, the default termination signal, to) job number 1. (Not to be confused with kill 1, which would kill process number 1.) Together, while read -r ; do kill %"$REPLY" ; done goes through each job number printed by the Perl script, and kills it.
By the way, your for i in {1 .. $num} won't do what you expect, since brace expansion is handled before parameter expansion, so what you have is equivalent to for i in "{1" .. "$num}". (And you can't have white-space inside the brace expansion, anyway.) Unfortunately, brace expansion can't help here; use something like for i in $(seq 1 "$num"), or else switch to an arithmetic for-loop.
Also by the way, you don't need to wrap your while-loop in parentheses; & already causes the job to be run in a subshell.
Here's my eventual solution. I'm keeping track of the subshell process IDs using an array variable, and trapping the Ctrl-C signal to kill them.
declare -a subs # array of subshell pids

function kill_subs() {
    for pid in "${subs[@]}"; do
        kill "$pid"
    done
    exit 0
}

num=1
if [ $# -gt 0 ]; then
    num=$1
fi

for ((i = 0; i < num; i++)); do
    while true; do
        { time curl --silent 'http://localhost'; } 2>&1 | grep real
    done &
    subs[$i]=$! # grab the pid of the subshell
done

trap kill_subs 1 2 15

wait
While this is not an answer, I would just like to point out something that invalidates the selected one: using jobs or kill 0 might have unexpected results; in my case it killed unintended processes, which was not an option for me.
This has been highlighted in some of the answers, but I am afraid not with enough emphasis:
"Bit of a late answer, but for me solutions like kill 0 or kill $(jobs -p) go too far (kill all child processes)."
"If these are the only background jobs that you expect will be running, you can run a command like this one:"

Shell Script (bash/ksh): 20 seconds to read a variable

I need to wait for input for 20 seconds, after which myscript should continue execution.
I've tried using read -t20 var; however, this works only in bash. I'm using ksh on Solaris 10.
Can someone help me please?
EDIT: 20 seconds is only an example. Let's pretend it needs to wait for 1 hour. The user may or may not be in front of the PC to type the input; he shouldn't have to wait the full hour to enter it, but if he's not there, the shell should continue execution after waiting for some time.
Thanks!
From man ksh:
TMOUT
If set to a value greater than zero, the shell terminates if a command is not entered within the prescribed number of seconds after issuing the PS1 prompt. The shell can be compiled with a maximum bound for this value which cannot be exceeded.
I'm not sure whether this works with read in ksh on Solaris. It does work with ksh93, but that version also has read -t.
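A minimal sketch of the TMOUT approach (untested on Solaris 10's ksh; whether read honors TMOUT there is exactly the open question above):
# Assumes this ksh applies TMOUT to read on a terminal.
TMOUT=20                            # give up after 20 seconds
if read answer?"Enter value: "; then
    echo "you typed: $answer"
else
    echo "timed out, continuing"    # read fails when the timeout expires
fi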
This script includes this approach:
# Note: Tmp is defined elsewhere in the original script; something like
# this (my assumption) is needed to make the snippet self-contained:
Tmp=${TMPDIR:-/tmp}/read.$$

# Start the (potentially blocking) read process in the background
(read -p && print "$REPLY" > "$Tmp") & readpid=$!

# Now start a "watchdog" process that will kill the reader after
# some time:
(
    sleep 2; kill $readpid >/dev/null 2>&1 ||
        { sleep 1; kill -1 $readpid >/dev/null 2>&1; } ||
        { sleep 1; kill -9 $readpid; }
) & watchdogpid=$!

# Now wait for the reading process to terminate. It will terminate
# reliably, either because the read terminated, or because the
# "watchdog" process made it terminate.
wait $readpid

# Now stop the watchdog:
kill -9 $watchdogpid >/dev/null 2>&1

REPLY=TERMINATED # Assume the worst
[[ -s $Tmp ]] && read < "$Tmp"
Look at this forum thread; it has the answer in the third post.
