Why is the KILL signal handler not executing when my child process dies - bash

I have migrated some scripts from ksh to bash and observed some very strange behavior in bash. I was able to reduce it to a very short snippet:
echo first test
LC_ALL=C xclock &
Active_pid=$!
sleep 1
kill -9 $Active_pid
sleep 1
echo second test
LC_ALL=C xclock &
Active_pid=$!
sleep 1
trap "echo Signal SIGKILL caught" 9
kill -9 $Active_pid
sleep 1
The output is
first test
./mig_bash.sh: line 15: 4471 Killed LC_ALL=C xclock
second test
My original problem was the message printed in the first test. To see whether a signal was being received, I wrote the "second test" by trial and error, and it solves my problem, but I do not understand why. How does the trap remove the message seen in the first test without ever executing echo Signal SIGKILL caught?
I am completely lost.

I couldn't find anything in the bash documentation that would explain the observed behavior, so I turned to the source code. Debugging led to the function notify_of_job_status(). The line that prints the message about a killed subprocess can be reached only if all of the following conditions hold:
the subprocess is registered in the job table (i.e. has not been disown-ed)
the shell was NOT started in interactive mode
the signal that terminated the child process is NOT trapped in the parent shell (see the signal_is_trapped (termsig) == 0 check)
Demonstration:
$ cat test.sh
echo Starting a subprocess
LC_ALL=C sleep 100 &
Active_pid=$!
case "$1" in
disown) disown ;;
trapsigkill) trap "echo Signal SIGKILL caught" 9 ;;
esac
sleep 1
kill -9 $Active_pid
sleep 1
echo End of script
$ # Demonstrate the undesired message
$ bash test.sh
Starting a subprocess
test.sh: line 14: 15269 Killed LC_ALL=C sleep 100
End of script
$ # Suppress the undesired message by disowning the child process
$ bash test.sh disown
Starting a subprocess
End of script
$ # Suppress the undesired message by trapping SIGKILL in the parent shell
$ bash test.sh trapsigkill
Starting a subprocess
End of script
$ # Suppress the undesired message by using an interactive shell
$ bash -i test.sh
Starting a subprocess
End of script
How this removes the trace of the first test without executing echo Signal SIGKILL ?
The trap is not executed since the KILL signal is received by the sub-process rather than the shell process for which the trap has been set. The effect of the trap on the diagnostics is in the (somewhat arguable) logic in the notify_of_job_status() function.
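Since the signal is delivered to the child, the parent learns about it only through the child's wait status. The following is a minimal sketch of that, not code from the answer above; the variable names are illustrative.

```shell
#!/bin/sh
# Start a child, kill it with SIGKILL, and inspect how it died.
sleep 100 &
child_pid=$!
kill -9 "$child_pid"

# wait returns 128 + signal number when the child was killed by a signal.
# stderr is redirected because the shell may report the job's status here.
wait "$child_pid" 2>/dev/null
status=$?

if [ "$status" -gt 128 ]; then
    # kill -l maps such an exit status back to a signal name.
    sig_name=$(kill -l "$status")
    echo "child $child_pid was terminated by SIG$sig_name (status $status)"
else
    echo "child $child_pid exited with status $status"
fi
```

No trap in the parent is involved at any point: the parent only observes the outcome of the child's death.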

Related

limit spawned parallel processes and exit all upon failure of any

I'm running some tests in parallel by calling a process from a script. Each process prints only to stdout > a file, and exits 0 iff successful (otherwise -1).
If and when a process exits with -1, I print something to its (or a related) output file (namely, the arguments it was called with), kill all other processes, and exit.
I have written a script using trap "..." CHLD to run some code when a subprocess exits, and this works under certain conditions, but I find my script is not very robust. If I send a keyboard interrupt, the subprocesses sometimes keep going, and sometimes the number of subprocesses simply overwhelms the machine(s) and none of them seem to advance.
I am using this on my quad core laptop as well as a cluster of 128 CPUs, over which subprocesses are distributed automatically. How do I run a large number of background subprocesses in a bash script, limited to some number of them running concurrently, and do something + exit if one of them returns with a bad code? I would also like the script to clean up after keyboard interrupt. Should I use GNU-parallel? how?
Here is a MWE of my script so far, which spawns subprocesses unhindered, annotated with what I think each part means. I got the idea to use trap from shell - get exit code of background process
$ cat parallel_tests.sh
#!/bin/bash
# some help from https://stackoverflow.com/questions/1570262/shell-get-exit-code-of-background-process
handle_chld() {
#echo pids are ${pids[@]}
local tmp=() ###temporary storage for pids that haven't finished
#for each pid that hadn't finished since the last trap
for((i=0;i<${#pids[@]};++i)); do
#if this pid is still running
if [[ $(ps -p ${pids[i]} -o pid=) ]]
then
tmp+=(${pids[i]}) ### add pid to list of pids that are running
else
wait ${pids[i]} ### put the exit code of this pid into $?
if [ "$?" != "0" ] ### if the exit code $? is non-zero
then
#kill all remaining processes
for((j=0;j<${#pids[@]};++j))
do
if [[ $(ps -p ${pids[j]} -o pid=) ]]
then
echo killing child processes of ${pids[j]}
pkill -P ${pids[j]}
fi
done
cat _tmp${pids[i]}
#print things to the terminal here
echo "FAILED process ${pids[i]} args: `cat _tmpargs${pids[i]}`"
exit 1
else
echo "FINISHED process ${pids[i]} args: `cat _tmpargs${pids[i]}`"
fi
fi
done
#update list of running pids
pids=(${tmp[@]})
}
# set this to monitor SIGCHLD
set -o monitor
# call handle_chld() when SIGCHLD signal is triggered
trap "handle_chld" CHLD
ALL_ARGS="2 32 87" ### ad nauseam
for A in $ALL_ARGS; do
(sleep $A; false) > _tmp$! &
pids+=($!)
echo $A > _tmpargs${pids[${#pids[@]}-1]}
echo "STARTED process ${pids[${#pids[@]}-1]} args: `cat _tmpargs${pids[${#pids[@]}-1]}`"
done
echo "Every process started. Now waiting on PIDS:"
echo ${pids[@]}
wait ${pids[@]} ###wait until every process is finished (or exit in the trap)
The output of this version after 2+epsilon seconds is:
$ ./parallel_tests.sh
STARTED process 66369 args: 2
STARTED process 66374 args: 32
STARTED process 66381 args: 87
Every process started. Now waiting on PIDS:
66369 66374 66381
killing child processes of 66374
./parallel_tests.sh: line 43: 66376 Terminated: 15 sleep $A
killing child processes of 66381
./parallel_tests.sh: line 43: 66383 Terminated: 15 sleep $A
FAILED process 66369 args: 2
Essentially, pid 66369 fails first, and the other two processes are dealt with in the trap. I have simplified the construction of the test processes here, so we can't assume that I'll manually insert waits before spawning new ones. Additionally, some of the test processes can be nearly instant. Essentially, I have a whole mess of test processes, long and short, starting as soon as resources can be allotted.
I'm not sure what's causing the problems I mentioned above, as this script uses several features that are new to me. General pointers are welcomed!
(I have seen this question and it does not answer my question)
cat arguments | parallel --halt now,fail=1 my_prg
Alternatively:
parallel --halt now,fail=1 my_prg ::: $ALL_ARGS
GNU Parallel is designed so it will also kill remote jobs. It does that using process groups and heavy perl scripting on the remote server: https://www.gnu.org/software/parallel/parallel_design.html#The-remote-system-wrapper
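If GNU Parallel is not available, the same idea can be approximated in plain bash. The sketch below is an assumption-laden alternative, not what the answer above does: my_prg is a hypothetical stand-in for the real test program, max_jobs is an assumed limit, and the runner simply waits for the oldest job whenever the limit is reached.

```shell
#!/bin/bash
# Hypothetical stand-in for the real test program:
# it succeeds unless its argument is 3.
my_prg() { sleep 0.2; [ "$1" -ne 3 ]; }

max_jobs=2    # assumed limit on concurrently running jobs
pids=()       # PIDs of started jobs, oldest first

# Wait for the oldest job; if it failed, kill and reap the remaining jobs.
wait_oldest() {
    local oldest=${pids[0]}
    pids=("${pids[@]:1}")
    if ! wait "$oldest"; then
        if [ "${#pids[@]}" -gt 0 ]; then
            kill "${pids[@]}" 2>/dev/null
            wait "${pids[@]}" 2>/dev/null
        fi
        pids=()
        return 1
    fi
}

# Run my_prg once per argument, never more than max_jobs at a time.
run_limited() {
    local arg
    pids=()
    for arg in "$@"; do
        if [ "${#pids[@]}" -ge "$max_jobs" ]; then
            wait_oldest || return 1
        fi
        my_prg "$arg" &
        pids+=("$!")
    done
    while [ "${#pids[@]}" -gt 0 ]; do
        wait_oldest || return 1
    done
}

# Stop the running jobs on a keyboard interrupt as well.
trap 'kill "${pids[@]}" 2>/dev/null; exit 130' INT

if run_limited 1 2 3 4 5; then
    echo "all jobs succeeded"
else
    echo "a job failed; the remaining jobs were stopped"
fi
```

Waiting for the oldest job keeps the logic simple, but a failure in a newer job is noticed only once the older ones have finished. With GNU Parallel itself, the number of simultaneous jobs is set with -j.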

Sending SIGINT to foreground process works but not background

I have two scripts. script1 spawns script2 and then sends a SIGINT signal to it. However the trap in script2 doesn't seem to work?!
script1:
#!/bin/bash
./script2 &
sleep 1
kill -SIGINT $!
sleep 2
script2:
#!/bin/bash
echo "~~ENTRY"
trap 'echo you hit ctrl-c, waking up...' SIGINT
sleep infinity
echo "~~EXIT"
If change ./script2 & to ./script2 and press CTRL+C the whole things works fine. So what am I doing wrong?
There are several issues in your examples; a solution follows at the end:
Your first script is missing a wait statement, so it exits after roughly 3 seconds. script2, however, stays in memory and keeps running.
How is bash supposed to figure out automatically which process it should send SIGINT to?
Bash actually ignores SIGINT (and SIGQUIT) in background processes, and they cannot be re-enabled there (run the trap command with no arguments to see which traps are currently set). See How to send a signal SIGINT from script to script ? BASH
So your script2 is NOT setting a trap on SIGINT: because it is a background process, both SIGINT and SIGQUIT are ignored and can no longer be trapped or reset.
For reference, here is the bash documentation related to your issue:
Process group id effect on background process (in Job Control section of doc):
[...] processes whose process group ID is equal to the current terminal
process group ID [..] receive keyboard-generated signals such as
SIGINT. These processes are said to be in the foreground.
Background processes are those whose process group ID differs from
the terminal's; such processes are immune to keyboard-generated
signals.
Default handler for SIGINT and SIGQUIT (in Signals section of doc):
Non-builtin commands run by bash have signal handlers set to the values inherited by the shell from its parent. When job control is not in effect, asynchronous commands ignore SIGINT and SIGQUIT in addition to these inherited handlers.
and about modification of traps (in trap builtin doc):
Signals ignored upon entry to the shell cannot be trapped or reset.
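The effect described in these excerpts can be observed with a small probe. This is only a sketch: the probe string and variable names are made up for illustration, and it assumes the script itself was started with the default SIGINT disposition.

```shell
#!/bin/bash
# Probe run in a child bash: install a SIGINT trap, signal itself, report.
probe='trap "echo trapped" INT; kill -INT $$; echo finished'

# Foreground child: SIGINT keeps the disposition this script inherited,
# so the trap can normally be installed and runs before "finished".
fg_out=$(bash -c "$probe")

# Asynchronous child with job control off (the default in scripts and in
# command substitutions): bash starts it with SIGINT ignored, so the trap
# has no effect and the signal is discarded.
bg_out=$(bash -c "$probe" & wait)

printf 'foreground child printed:\n%s\n' "$fg_out"
printf 'background child printed:\n%s\n' "$bg_out"
```

The background child prints only finished, which is the situation script2 is in when script1 starts it with &.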
SOLUTION 1
modify your script1 to be:
#!/bin/bash
{ ./script2; } &
sleep 1
subshell_pid=$!
pid=$(ps -ax -o ppid,pid --no-headers | sed -r 's/^ +//g;s/ +/ /g' |
grep "^$subshell_pid " | cut -f 2 -d " ")
kill -SIGINT $pid
sleep 2
wait ## Don't forget this.
How does this work? Using { and } followed by & creates a subshell, and because that subshell is a background process it is subject to the SIGINT limitation explained above. However, the subshell's own subprocesses are foreground processes, NOT background processes (from the subshell's point of view), so they can trap or reset SIGINT and SIGQUIT.
The trick is then to find the pid of that subprocess; here I use ps to find the only process whose parent pid is the subshell's pid.
SOLUTION 2
Actually, only new processes started directly as jobs get their SIGINT and SIGQUIT ignored. A simple bash function does not. So if the script2 code lived in a function sourced into script1, this would be your new script1, with nothing else needed:
#!/bin/bash
script2() {
## script2 code
echo "~~ENTRY"
trap 'echo you hit ctrl-c, waking up...' SIGINT
sleep infinity
echo "~~EXIT"
}
## script1 code
script2 &
sleep 1
kill -SIGINT $!
sleep 2
This also works. Behind the scenes, the same mechanism as in SOLUTION 1 is at work: a bash function is very close to the { } construct.
I guess what you are trying to achieve is that when script2 receives the SIGINT it continues and prints the message. Then, you need
#!/bin/bash
echo "~~ENTRY"
trap 'echo you hit ctrl-c, waking up...; CONT=true' SIGINT
CONT=false
while ! $CONT
do
sleep 1
done
echo "~~EXIT"

Why subshell can't catch signal from parent shell?

I've got 2 shell scripts:
# subshell.sh
trap "echo Caught SIGTERM" 15
echo $$
sleep 100000
# parent.sh
setsid sh subshell.sh &
pid=$!
echo "sid=$pid"
sleep 2
# This won't work!
kill -15 -$pid
The main purpose is to send SIGTERM to the subshell and all its children. After googling for a while (how bash handles signals is a tricky problem), I chose setsid to create a new session and send the signal to -pid. However, the message is not printed even though pid is correct. If I manually execute kill -15 -$pid, it works. So how can I send a signal to the subshell?
Well, I finally managed to make this work by creating another subshell and calling kill -15 -$pid inside that subshell. I still don't know why the parent shell can't do this.

SIGALRM waits for subshell processes?

Here is the unexpected situation: in the following script, SIGALRM doesn't invoke the function alarm() at the expected time.
#!/bin/sh -x
alarm() {
echo "alarmed!!!"
}
trap alarm 14
OUTER=$(exec sh -c 'echo $PPID')
#for arg in `ls $0`; do
ls $0 | while read arg; do
INNER=$(exec sh -c 'echo $PPID')
# child A, the timer
sleep 1 && kill -s 14 $$ &
# child B, some other scripts
sleep 60 &
wait $!
done
Expectation:
After 1 second, the function alarm() should be called.
Actually:
alarm() is not called until the 60s have elapsed, or until we hit Ctrl+C.
We know that in the script $$ refers to the OUTER process, so I would expect the string to be printed after 1 second. However, alarm() is not called until child B exits.
When the trap line is commented out, the whole program simply terminates after 1 second. So I suppose SIGALRM is at least received, but why doesn't it trigger the action?
And, as a side question, is the default action for SIGALRM termination? From here I am told that by default it is ignored, so why does OUTER exit after receiving it?
From the bash man page:
If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will
not be executed until the command completes. When bash is waiting for an asynchronous command via the wait
builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately
with an exit status greater than 128, immediately after which the trap is executed.
Your original script is in the first scenario. The subshell (the while read loop) has called wait, but the top level script is just waiting for the subshell, so when it receives the signal, the trap is not executed until the subshell completes. If you send the signal to the subshell with kill -s 14 $INNER, you get the behavior you expect.
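The two scenarios from the man page excerpt can be compared side by side. The following is a rough sketch with made-up timings (a 1 second timer, a 3 second sleep) and illustrative variable names; BASHPID is used instead of $$ only so that the signal reaches the shell that set the trap even when the snippet runs inside a subshell.

```shell
#!/bin/bash
self=$BASHPID                          # the shell that owns the trap
trap 'trap_ran_at=$SECONDS' ALRM       # record when the trap actually runs

# Scenario 1: the shell waits for a foreground command.
# The trap is deferred until sleep has finished.
SECONDS=0; trap_ran_at=
( sleep 1; kill -s ALRM "$self" ) &
sleep 3
fg_trap_at=$trap_ran_at

# Scenario 2: the shell waits with the wait builtin.
# wait returns early with a status above 128, then the trap runs.
SECONDS=0; trap_ran_at=
( sleep 1; kill -s ALRM "$self" ) &
sleep 3 &
sleeper=$!
wait "$sleeper"
wait_status=$?
bg_trap_at=$trap_ran_at

# Stop the leftover sleep and reap it quietly.
kill "$sleeper" 2>/dev/null
wait "$sleeper" 2>/dev/null

echo "foreground sleep: trap ran after ${fg_trap_at}s"
echo "wait builtin:     trap ran after ${bg_trap_at}s (wait status $wait_status)"
```

In the question's script the top-level shell is in scenario 1 with respect to the while read subshell, which is why alarm() appears late.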

How to suppress Terminated message after killing in bash?

How can you suppress the Terminated message that comes up after you kill a
process in a bash script?
I tried set +bm, but that doesn't work.
I know another solution involves calling exec 2> /dev/null, but is that
reliable? How do I reset it back so that I can continue to see stderr?
In order to silence the message, you must be redirecting stderr at the time the message is generated. Because the kill command sends a signal and doesn't wait for the target process to respond, redirecting stderr of the kill command does you no good. The bash builtin wait was made specifically for this purpose.
Here is a very simple example that kills the most recent background command. (Learn more about $! here.)
kill $!
wait $! 2>/dev/null
Because both kill and wait accept multiple pids, you can also do batch kills. Here is an example that kills all background processes (of the current process/script of course).
kill $(jobs -rp)
wait $(jobs -rp) 2>/dev/null
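The kill-then-wait pattern can be wrapped in a small helper. This is a sketch only; kill_quietly is a hypothetical name, not a bash builtin.

```shell
#!/bin/sh
# kill_quietly PID...: signal the given background jobs, then reap them
# with stderr redirected so the shell's status message is discarded.
kill_quietly() {
    kill "$@" 2>/dev/null
    wait "$@" 2>/dev/null
}

sleep 100 &
kill_quietly "$!"
status=$?
echo "wait returned $status"
```

The helper returns the wait status of the last PID, so the caller can still tell that the job died from a signal (128 plus the signal number) without seeing the message.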
I was led here from bash: silently kill background function process.
The short answer is that you can't. Bash always prints the status of foreground jobs. The monitoring flag only applies for background jobs, and only for interactive shells, not scripts.
see notify_of_job_status() in jobs.c.
As you say, you can redirect so standard error is pointing to /dev/null but then you miss any other error messages. You can make it temporary by doing the redirection in a subshell which runs the script. This leaves the original environment alone.
(script 2> /dev/null)
which will lose all error messages, but just from that script, not from anything else run in that shell.
You can save and restore standard error, by redirecting a new filedescriptor to point there:
exec 3>&2 # 3 is now a copy of 2
exec 2> /dev/null # 2 now points to /dev/null
script # run script with redirected stderr
exec 2>&3 # restore stderr to saved
exec 3>&- # close saved version
But I wouldn't recommend this: its only advantage over the first approach is that it saves a subshell invocation, while it is more complicated and may even alter the behavior of the script if the script itself manipulates file descriptors.
EDIT:
For a more appropriate answer, see the answer given by Mark Edgar.
Solution: use SIGINT (works only in non-interactive shells)
Demo:
cat > silent.sh <<"EOF"
sleep 100 &
kill -INT $!
sleep 1
EOF
sh silent.sh
http://thread.gmane.org/gmane.comp.shells.bash.bugs/15798
Maybe detach the process from the current shell process by calling disown?
The Terminated message is logged by the default signal handler of bash 3.x and 4.x. Just trap the TERM signal at the very start of the child process:
#!/bin/sh
## assume script name is test.sh
foo() {
trap 'exit 0' TERM ## here is the key
while true; do sleep 1; done
}
echo before child
ps aux | grep 'test\.s[h]\|slee[p]'
foo &
pid=$!
sleep 1 # wait trap is done
echo before kill
ps aux | grep 'test\.s[h]\|slee[p]'
kill $pid ## no need to redirect stdin/stderr
sleep 1 # wait kill is done
echo after kill
ps aux | grep 'test\.s[h]\|slee[p]'
Is this what we are all looking for?
Not wanted:
$ sleep 3 &
[1] 234
<pressing enter a few times....>
$
$
[1]+ Done sleep 3
$
Wanted:
$ (set +m; sleep 3 &)
<again, pressing enter several times....>
$
$
$
$
$
As you can see, no job end message. Works for me in bash scripts as well, also for killed background processes.
'set +m' disables job control (see 'help set') for the current shell. So if you enter your command in a subshell (as done here in brackets) you will not influence the job control settings of the current shell. Only disadvantage is that you need to get the pid of your background process back to the current shell if you want to check whether it has terminated, or evaluate the return code.
This also works for killall (for those who prefer it):
killall -s SIGINT (yourprogram)
suppresses the message... I was running mpg123 in background mode.
It could only silently be killed by sending a ctrl-c (SIGINT) instead of a SIGTERM (default).
disown did exactly the right thing for me -- the exec 3>&2 is risky for a lot of reasons -- set +bm didn't seem to work inside a script, only at the command prompt
Had success with adding 'jobs 2>&1 >/dev/null' to the script, not certain if it will help anyone else's script, but here is a sample.
while true; do echo $RANDOM; done | while read line
do
echo Random is $line the last jobid is $(jobs -lp)
jobs 2>&1 >/dev/null
sleep 3
done
Another way to disable job notifications is to place your command to be backgrounded in a sh -c 'cmd &' construct.
#!/bin/bash
# ...
pid="`sh -c 'sleep 30 & echo ${!}' | head -1`"
kill "$pid"
# ...
# or put several cmds in sh -c '...' construct
sh -c '
sleep 30 &
pid="${!}"
sleep 5
kill "${pid}"
'
I found that putting the kill command in a function and then backgrounding the function suppresses the termination output
function killCmd() {
kill $1
}
killCmd $somePID &
Simple:
{ kill $!; } 2>/dev/null
Advantage? can use any signal
ex:
{ kill -9 $PID; } 2>/dev/null

Resources