Orphan Child (ksh Shell Script not terminating first upon CTRL-X) - bash

I have a program (C++ Executable) on AIX 5.3 that launches a Shell Script (ksh).
When I launch the program and the shell script, I see two processes:
AIX:>ps -ef | grep 3657892
u001 **3657892** 3670248 0 18:16:34 pts/11 0:00 /u0012006/bin/Launcher
u001 3723398 **3657892** 0 18:16:41 pts/11 0:00 /usr/bin/ksh /u0012006/shell/Trjt_Slds.sh -m
Now, when I press CTRL-X on the keyboard to leave the shell script, the main launching program (C++ Executable) process gets killed while the shell script continues to execute.
AIX:>ps -ef | grep 3723398
u001 3723398 1 106 18:16:41 pts/11 0:01 /usr/bin/ksh /u0012006/shell/Trjt_Slds.sh -m
u001 3731504 3723398 0 0:00 <defunct>
u001 3735612 3723398 0 0:00 <defunct>
u001 3739838 3723398 0 0:00 <defunct>
This drives CPU consumption to 100% and leaves a lot of defunct processes behind.
Is there a way to have the AIX Shell Script terminate first when I do a CTRL-X?

Note: Launcher is broken and should be fixed. Thus, any "solution" will be a hack.
One thought is to check $PPID in various places in the script. If it is set to 1 (init), then exit the script.
I don't understand the use of control-X. That is not going to generate any tty signal. I guess that is what you want. Perhaps the tty is also in raw mode. But you might consider hooking control-X up to one of the various tty signals like SIGINT. e.g. stty intr ^X but you will also need to remember to unset it with stty intr ^C
Last, you could wrap the script in a script and use the technique to kill the child and exit. e.g. (untested)
#!/bin/ksh
# launch original program in background
/path/to/real/program "$@" &
# get child's pid
child=$!
while : ; do
# when we become an orphan
if [[ $PPID -eq 1 ]] ; then
# kill the child and exit
kill $child
exit
fi
# poll once a second
sleep 1
done
Update
./s1 is:
#!/bin/ksh
./s2 &
sleep 10
exit
./s2 is:
#!/bin/ksh
while : ; do
if kill -0 $PPID ; then
echo still good
else
echo orphaned
exit
fi
sleep 1
done
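The kill -0 probe from the update can be combined with the wrapper idea above. A minimal sketch, assuming a hypothetical helper named run_guarded (not part of the original answer) that takes the PID to watch followed by the command to run. A small background watcher polls the watched PID and kills the command once that PID is gone, while the wrapper itself simply waits for the command:

```shell
# run_guarded is a hypothetical helper name, used only for illustration.
# Usage: run_guarded <pid-to-watch> <command> [args...]
run_guarded() {
    watched=$1
    shift
    # launch the real program in the background
    "$@" &
    child=$!
    # watcher: poll once a second until the watched PID disappears
    (
        while kill -0 "$watched" 2>/dev/null; do
            sleep 1
        done
        kill "$child" 2>/dev/null
    ) &
    watcher=$!
    # wait for the program and keep its exit status
    status=0
    wait "$child" || status=$?
    # the watcher is no longer needed
    kill "$watcher" 2>/dev/null || true
    wait "$watcher" 2>/dev/null || true
    return "$status"
}
```

In a wrapper script this would probably be called as run_guarded "$PPID" /path/to/real/program "$@", so that the program is killed once Launcher has gone away.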

ksh always does this. I just got bitten by it: unlike bash, ksh does not forward HUP signals to its children when it exits. If you can find the child PIDs, you can send them HUP yourself.

Related

How to prevent bash subshell from waiting for child process

I have a bash script that among other things, launches a background process. I use a function that setups some configuration for the process, launches it, checks it started correctly, and returns its PID, which is used later to kill the subprocess. The sample code below has the same structure but simplified logic:
function launcher(){
sleep 30 &
echo $!
}
PID=$(launcher)
echo $PID
kill $PID
The issue I'm facing is that the subshell that executes the launcher function does not return until the sleep command ends. Therefore the echo $PID statement is not executed until the subshell ends.
What surprises me is that if I check the sleep command, it does not have the script as its parent:
UID PID PPID C STIME TTY STAT TIME CMD
user 20135 1 0 18:39 pts/8 S+ 0:00 sleep 30
How can I start the sleep & in the background to allow the subshell to end before it ends?
Note: In my case the background process will never end until I kill it, so I need the subshell to end in order to get the PID. Also, in my real code the logic of the launcher function is quite complex, and I'm running it as a subshell to isolate the main process from it.
Thanks in advance
It turns out the problem was about stdout: the main shell reads from the subshell's stdout, which is inherited by the background process, so the command substitution cannot finish until that process closes it. Just redirecting stdout when invoking the background process makes it work as expected.
sleep 100 > /dev/null &
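The effect is easy to reproduce. A minimal sketch, reusing the launcher function from the question: because the background job no longer holds the pipe that $(...) reads from, the command substitution should return as soon as launcher has printed the PID.

```shell
launcher() {
    # stdout goes to /dev/null, so the background job does not
    # inherit the pipe that $(...) reads from
    sleep 30 >/dev/null &
    echo "$!"
}

PID=$(launcher)      # returns right away instead of after 30 seconds
echo "$PID"
kill "$PID"
```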
I'm not sure if this gets it done but
function launcher(){
echo "start launching"
sleep 100 &
echo "end launching"
}
launcher
PID=$!
# Here $PID is the process id of `sleep`
echo $PID
kill $PID
Without the kill, this forks the sleep command and the shell script ends, leaving sleep running. Its PID is stored in PID, so you can kill it later or leave it alone.
Is this what you need? If not, can you clarify what you're expecting?
I also noticed that if the parent script stays alive, the PPID of the sleep process is correct and stays intact.
# sleeper_test.sh
#!/bin/bash
function launcher(){
echo "start launching"
sleep 100 &
echo "end launching"
}
launcher
PID=$!
# Here $PID is the process id of `sleep`
echo $PID
sleep 10
#kill $PID
$ ps -ef | grep sleep
501 13748 5471 0 1:54PM ttys000 0:00.00 /bin/bash ./sleeper_test.sh
501 13749 13748 0 1:54PM ttys000 0:00.00 sleep 100 <- child correctly tied to the parent sh script
501 13750 13748 0 1:54PM ttys000 0:00.00 sleep 10
$ ps -ef | grep sleep
501 13749 1 0 1:54PM ttys000 0:00.00 sleep 100 <- since the parent ended, its parent becomes the root process (PID 1)

bash getting background process id gives parent pid

Creating a bash script with this command:
cat <<"END"> z
#! /bin/bash
sleep 20 && exit 1 &
ret=$!
ps $ret | grep $ret
END
and then running it gives:
7230 pts/39 S+ 0:00 /bin/bash ./z
I was expecting to see sleep 20 ... which is the child process. If I remove the && exit 1 it does return the child process.
What's the reason? How can I get the child process id in the above statement?
You already get the right information about the child process. In your case, however, ps doesn't know or want to show a proper COMMAND name for the chained sub-process you start in the background, which is probably what confused you.
This seems to be the case with chained commands (.. && ...); it has nothing to do with exit 1, which could just as well be echo 5 etc. The name of the process group leader is shown as the command name instead.
From the (ps man page)
`cmd | COMMAND`: simple name of executable
# Process state codes
`S`: interruptible sleep (waiting for an event to complete)
`+`: is in the foreground process group
See the S+ in your ps | grep output.
So, you can adapt your script a bit to confirm that you actually capture(d) the right information about the child process, like so:
cat <<"END"> z
#! /bin/bash
sleep 20 && exit 1 &
ret=$!
echo $ret
jobs -l
# display parent and child process info
# -j Jobs format
ps -j $$ $ret
END
Output of echo $ret:
30274
Output of jobs -l:
[1]+ 30274 Running sleep 20 && exit 1 &
Output of ps -j $$ $ret:
PID PGID SID TTY STAT TIME COMMAND
30273 30273 21804 pts/0 S+ 0:00 /bin/bash ./z
30274 30273 21804 pts/0 S+ 0:00 /bin/bash ./z
Note that both the parent and the child have the same PGID, and that the PID 30274 of the child process shown by jobs -l matches the one shown by ps.
Further, if you change sleep 20 && exit 1 & as bash -c 'sleep 20 && exit 1' & you would get a proper command name for the child this time, as follows (cf. output order above):
30384
[1]+ 30384 Running bash -c 'sleep 20 && exit 1' &
PID PGID SID TTY STAT TIME COMMAND
30383 30383 21804 pts/0 S+ 0:00 /bin/bash ./z
30384 30383 21804 pts/0 S+ 0:00 bash -c sleep 20 && exit 1
Last but not least, in your original version instead of ps $ret | grep $ret you could also try
pstree -s $ret
From pstree man page
-s: Show parent processes of the specified process.
Which will provide you with an output similar to that one below, which would also confirm that you get the right process info for sleep 20 && exit 1 &:
systemd───systemd───gnome-terminal-───bash───bash───sleep
What you see is not parent pid, but sub-shell pid
When you run :
sleep 20 && exit 1 &
The processes tree is like :
current-shell ---> sub-shell ---> 'sleep 20 && exit 1'
When you run :
sleep 20 &
The processes tree is like :
current-shell ---> 'sleep 20'
Reason why you see pid for 'sleep 20'
What's the reason?
The reason is that some entity has to do &&. It can't be sleep, because sleep only sleeps, and after sleep terminates (so there is no longer a sleep to make any decision), some "entity" needs to check the exit status of sleep, decide, and then execute exit 1. That "entity" is the shell, which has to be "above" sleep to do the action. So the "real" background process is the shell, and sleep is its child process.
With only sleep 20 & there is an optimization in bash. Bash scans the whole command (command bla bla &) and sees that there is only a single command to run. In that case the forked subshell does not do the standard fork+exec for the command; it only calls exec and becomes sleep itself instead of running a child process. Because of the exec the subshell becomes sleep, so you see sleep as the process name. It's a resource optimization done by bash.
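One way to observe this without ps is to let a command inside the list report its parent. A minimal sketch, assuming a temporary file from mktemp to carry the value: the sh child started by the list prints its $PPID, which is the process that evaluates &&, and that should be the same PID that $! reports.

```shell
pidfile=$(mktemp)

# The whole AND-list runs in one background subshell; the sh child
# records the PID of its parent, i.e. of that subshell.
sh -c 'echo "$PPID"' > "$pidfile" && sleep 1 && exit 1 &
bg=$!

# exit 1 is executed by the subshell, so wait reports its status
status=0
wait "$bg" || status=$?

list_shell=$(cat "$pidfile")
rm -f "$pidfile"
echo "\$! = $bg, shell running the list = $list_shell, status = $status"
```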

Identify whether a process was killed by a signal in bash

Consider these two C programs:
#include <signal.h>
int main(void) {
raise(SIGTERM);
}
int main(void) {
return 143;
}
If I run either one, the value of $? in bash will be 143. The wait syscall lets you distinguish them, though:
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11148
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 143}], 0, NULL) = 11214
And bash clearly uses this knowledge, since the first one results in Terminated being printed to the terminal (oddly, this happens even if I redirect both stdout and stderr elsewhere), and the second one doesn't. How can I differentiate these two cases from a bash script?
I believe getting the full exit codes from pure bash/shell is not possible.
The answers on Unix' StackExchange are very comprehensive.
What's common between all shells is that $? contains the lowest 8 bits of the exit code (the number passed to exit()) if the process terminated normally.
Where it differs is when the process is terminated by a signal. In all cases, and that's required by POSIX, the number will be greater than 128. POSIX doesn't specify what the value may be. In practice though, in all Bourne-like shells that I know, the lowest 7 bits of $? will contain the signal number. But, where n is the signal number,
in ash, zsh, pdksh, bash, the Bourne shell, $? is 128 + n. What that means is that in those shells, if you get a $? of 129, you don't know whether it's because the process exited with exit(129) or whether it was killed by the signal 1 (HUP on most systems). But the rationale is that shells, when they do exit themselves, by default return the exit status of the last exited command. By making sure $? is never greater than 255, that allows to have a consistent exit status:
$ bash -c 'sh -c "kill \$\$"; printf "%x\n" "$?"'
bash: line 1: 16720 Terminated sh -c "kill \$\$"
8f # 128 + 15
$ bash -c 'sh -c "kill \$\$"; exit'; printf '%x\n' "$?"
bash: line 1: 16726 Terminated sh -c "kill \$\$"
8f # here that 0x8f is from a exit(143) done by bash. Though it's
# not from a killed process, that does tell us that probably
# something was killed by a SIGTERM
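The 128 + n convention can at least be decoded from a script. A minimal sketch, assuming a hypothetical helper named describe_status: kill -l accepts an exit status and prints the matching signal name. As explained above, the result is only a guess, since exit(143) and death by SIGTERM look identical in $?.

```shell
# describe_status is a hypothetical helper name, used only for illustration.
describe_status() {
    if [ "$1" -gt 128 ] && name=$(kill -l "$1" 2>/dev/null); then
        # ambiguous: the shell cannot tell these two cases apart
        echo "status $1: exited with $1 or killed by SIG$name"
    else
        echo "status $1: normal exit"
    fi
}

status=0
sh -c 'kill -TERM $$' || status=$?
describe_status "$status"

status=0
sh -c 'exit 143' || status=$?
describe_status "$status"
```

Both calls produce the same line in bash, which is exactly the ambiguity the question is about.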
For this reason, I believe you would need to run a command outside of bash to catch the exit code.
With some abstraction, a similar question has been asked regarding unbuffer which is a small script written in tcl. To be more precise, unbuffer uses the library libexpect with a tcl/tk wrapper.
From the source of unbuffer I extracted the relevant code to derive a workaround:
#!/bin/bash
expectStat() {
expect <(cat << EOT
set stty_init "-opost"
set timeout -1
eval [list spawn -noecho ] $@
expect
send_user "[wait]\n"
EOT
)
}
expectStat sleep 5 &
wait
which returns approximately the following line if sleep exits normally:
18383 exp4 0 0
If sleep is killed before it exits on its own, the above script will return approximately:
18383 exp4 0 0 CHILDKILLED SIGTERM {software termination signal}
If a script is terminated with exit 143, the script will approximately return:
18383 exp4 0 143
The meaning of these strings can be found in the manual for expect. The lines shown above are produced by expect's built-in wait command.
The first two values are the pid, and expect's name for the process.
The fourth is the exit status. If a signal occurs, more information is printed. The sixth value is the signal sent to the process on its termination.
wait
normally returns a list of four integers. The first integer is the pid of the process that was waited upon. The second integer is the corresponding spawn id. The third integer is -1 if an operating system error occurred, or 0 otherwise. If the third integer was 0, the fourth integer is the status returned by the spawned process. If the third integer was -1, the fourth integer is the value of errno set by the operating system. The global variable errorCode is also set.
Additional elements may appear at the end of the return value from wait. An optional fifth element identifies a class of information. Currently, the only possible value for this element is CHILDKILLED in which case the next two values are the C-style signal name and a short textual description.
This means the fourth value and if present the sixth value are the values you are looking for. Store the whole line and extract the signal and exit code, for example with the following code:
RET=$(expectStat script.sh 1>&1)
# Filter status
EXITVALUE="$(echo "$RET" | cut -d' ' -f4)"
SIGNAL=$(echo "$RET" | cut -d' ' -f6)
#echo "Exit value: $EXITVALUE, Signal: $SIGNAL"
if [ -n "$SIGNAL" ]; then
echo "Likely killed by signal"
else
echo "$EXITVALUE"
fi
In conclusion, this workaround is very inelegant. Maybe there is another tool, with its own C-based helpers, that can report the occurrence of a signal.
wait is a syscall and also a bash builtin.
To differentiate the two cases from bash run the program in the background and use the builtin wait to report the outcome.
Following are examples of both a non-zero exit code and an uncaught signal. These examples use the exit and kill bash builtins in a child bash shell; instead of a child bash shell you would run your program.
$ bash -c 'kill -s SIGTERM $$' & wait
[1] 36068
[1]+ Terminated: 15 bash -c 'kill -s SIGTERM $$'
$ bash -c 'exit 143' & wait
[1] 36079
[1]+ Exit 143 bash -c 'exit 143'
$
As to why you see Terminated printed to the terminal even when you redirect stdout and stderr: it is printed by bash, not by the program.
Update:
By explicitly using the wait builtin you can now redirect its stderr (with the exit status of the program) to a separate file.
The following examples show the three types of termination: normal exit 0, non-zero exit, and uncaught signal. The results reported by wait are stored in files tagged with the PID of the corresponding program.
$ bash -c 'exit 0' & wait 2> exit_status_pid_$!
[1] 40279
$ bash -c 'exit 143' & wait 2> exit_status_pid_$!
[1] 40291
$ bash -c 'kill -s SIGTERM $$' & wait 2> exit_status_pid_$!
[1] 40303
$ for f in exit_status_pid*; do echo $f: $(cat $f); done
exit_status_pid_40279: [1]+ Done bash -c 'exit 0'
exit_status_pid_40291: [1]+ Exit 143 bash -c 'exit 143'
exit_status_pid_40303: [1]+ Terminated: 15 bash -c 'kill -s SIGTERM $$'
$
This is straying farther from bash but bcc offers exitsnoop. Using the
examples from the description, on Debian Sid:
root@vsid:~# apt install bpfcc-tools linux-headers-amd64
root@vsid:~# exitsnoop-bpfcc
PCOMM PID PPID TID AGE(s) EXIT_CODE
example1 1041 948 1041 0.00 signal 15 (TERM)
example2 1042 948 1042 0.00 code 143
^C
See the install guide for other distributions.
Strace can capture most of the signals, but might not work for syscalls (e.g. kill -9 ), therefore, as mentioned in this article:
Auditd is a daemon process or service that does as the name implies and produces audit logs of System level activities. It is installed from the usual repository as the audit package and then is configured in /etc/audit/auditd.conf and the rules are in /etc/audit/audit.rules.
The article provides examples for Audit output, which can help determining if it's helpful for you:
The usual output will look like this:
time->Wed Jun 3 16:34:08 2015
type=SYSCALL msg=audit(1433363648.091:6342): arch=c000003e syscall=62 success=no exit=-3 a0=1e06 a1=0 a2=1e06 a3=fffffffffffffff0 items=0 ppid=10044 pid=10140 auid=500 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm=4174746163682041504920696E6974 exe="/opt/ibm/WebSphere/AppServer/java/jre/bin/java" subj=unconfined_u:unconfined_r:unconfined_java_t:s0-s0:c0.c1023 key="kill_signals"
The article also mentions SystemTap and links to a guide.

How to kill a background process created in a script

Suppose I input the following in a shell
(while true; do echo hahaha; sleep 1; done)&
Then I know I can kill it by
fg; CTRL-C
However, if the command above is in a script e.g. tmp.sh and I'm running that script, how to kill it?
(while true; do echo hahaha; sleep 1; done)&
RUNNING_PID=$!
kill ${RUNNING_PID}
$! holds the PID of the most recently started background process, so you can do with it as you wish.
Let's suppose that you have a bash script named tmp.sh with the following content:
#!/bin/bash
(while true; do echo hahaha; sleep 1; done)&
And you execute it! Of course, it will print hahaha to the stdout every 1 second. You can't list it with the jobs command. But... it's still a process! And it's a child in the forest of the current terminal! So:
1- Get the file name of the terminal connected to standard input:
$ tty
/dev/pts/2
2- List the processes associated with the terminal (In the example we are using pts/2), and show the status with S and display in a forest format f:
$ ps --tty pts/2 Sf
PID TTY STAT TIME COMMAND
3691 pts/2 Ss+ 0:00 /bin/bash
3787 pts/2 S 0:00 /bin/bash
4879 pts/2 S 0:00 \_ sleep 1
3- Now, you can see that the example lists a sleep 1 command that is a child of the /bin/bash process with PID 3787. Now kill it!
kill -9 3787
Note: Don't kill the bash process that has the s+ statuses; it is the bash process that gives you the prompt! From man ps:
s is a session leader
+ is in the foreground process group
Recommendations:
In a case like this, you should save the PID in a file:
#!/bin/bash
(while true; do echo hahaha; sleep 1; done)&
echo $! > /path/to/my_script.pid
Then, you could just do some script to shut it down:
#!/bin/bash
kill -9 $(cat /path/to/my_script.pid)
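The two scripts can also be folded into a pair of functions. A minimal sketch, assuming hypothetical names start_loop and stop_loop and a PID file under /tmp. It sends the default SIGTERM rather than -9, so the loop could still run a trap handler if it had one:

```shell
# hypothetical location; the original uses /path/to/my_script.pid
pidfile=${TMPDIR:-/tmp}/my_script.$$.pid

start_loop() {
    # output is discarded here only to keep the example quiet
    (while true; do echo hahaha; sleep 1; done) >/dev/null &
    echo "$!" > "$pidfile"
}

stop_loop() {
    # nothing to do if the loop was never started
    [ -f "$pidfile" ] || return 1
    kill "$(cat "$pidfile")" 2>/dev/null
    rm -f "$pidfile"
}

start_loop
sleep 2
stop_loop
```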

How to get pid of piped command?

(or How to kill the child process)?
inotifywait -mqr --format '%w %f %e' $feedDir | while read dir file event
do
#something
done &
echo $! #5431
ps eg:
>$ ps
PID TTY TIME CMD
2867 pts/3 00:00:02 bash
5430 pts/3 00:00:00 inotifywait
5431 pts/3 00:00:00 bash
5454 pts/3 00:00:00 ps
It seems if I kill 5431 then 5430 (inotifywait) will be left running, but if I kill 5430 then both processes die. I don't suppose I can reliably assume that the pid of inotifywait will always be 1 less than $!?
When we run a pipe, each command is executed in a separate process. The interpreter waits for the last one, unless we use an ampersand (&).
cmd1 | cmd2 &
The PIDs of the processes will probably be close, but we cannot rely on that. When the last command is a bash reserved word such as while, bash creates a dedicated bash process for it (that's why your 'dir' and 'file' variables won't exist after the done keyword). Example:
ps # shows one bash process
echo "azerty" | while read line; do ps; done # shows one more bash
When the first command exits, the second one will terminate because the read on the pipe returns EOF.
When the second command exits, the first command will be terminated by the signal SIGPIPE (write on a pipe with no reader) when it tries to write to the pipe. But if the command waits indefinitely... it is not terminated.
echo "$!" prints the pid of the last command executed in background. In your case, the bash process that is executing the while loop.
You can find the pid of "inotifywait" with the following syntax. But it's ugly:
(inotifywait ... & echo "$!">inotifywait.pid) | \
while read dir file event
do
#something
done &
cat inotifywait.pid # prints pid of inotifywait
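A variant that avoids the PID file: since exec replaces the shell without changing its PID, a small sh -c wrapper can print its own PID as the first line on the pipe and then become the real command. A minimal sketch, with sleep 30 standing in for the elided inotifywait ... invocation and a hypothetical variable name producer_pid:

```shell
# sh prints its PID, then exec turns that same process into the command
sh -c 'echo "$$"; exec sleep 30' | {
    # the first line on the pipe is the PID of the producer
    read -r producer_pid
    echo "producer runs as PID $producer_pid"
    # ... the usual "while read dir file event" loop would go here ...
    # stop the producer once the consumer is done
    kill "$producer_pid"
}
```

The consumer has to read the PID line before its normal loop starts, and the variable only exists inside the braces, because the right-hand side of the pipe normally runs in its own process, as described above.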
If you don't want the pid, but just be sure the process will be terminated, you can use the -t option of inotifywait:
(while true; do inotifywait -t 10 ...; done)| \
while read dir file event
do
#something
done &
kill "$!" # kill the while loop
None of these solutions is nice. What are you really trying to achieve? Maybe we can find a more elegant solution.
If your goal is to make sure all of the children can be killed or interrupted cleanly, and you're using BusyBox's ash (which has no process substitution) and don't want to use an fd either, check out this solution.
#!/bin/sh
pid=$$
terminate() {
pkill -9 -P "$pid"
}
trap terminate SIGHUP SIGINT SIGQUIT SIGTERM
# do your stuff here, note: should be run in the background {{{
inotifywait -mqr --format '%w %f %e' $feedDir | while read dir file event
do
#something
done &
# }}}
# Either pkill -9 -P "$pid" here
wait
# or pkill -9 -P "$pid" here
Or in another shell:
kill <pid ($$)>

Resources