I have a bash script that, among other things, launches a background process. I use a function that sets up some configuration for the process, launches it, checks that it started correctly, and returns its PID, which is used later to kill the subprocess. The sample code below has the same structure but simplified logic:
function launcher(){
sleep 30 &
echo $!
}
PID=$(launcher)
echo $PID
kill $PID
The issue I'm facing is that the subshell that executes the launcher function does not return until the sleep command ends. Therefore the echo $PID statement is not executed until the subshell ends.
What surprises me is that if I check the sleep command, its parent PID is not the script's:
UID PID PPID C STIME TTY STAT TIME CMD
user 20135 1 0 18:39 pts/8 S+ 0:00 sleep 30
How can I start sleep in the background so that the subshell can return before sleep ends?
Note: in my case the background process never ends until I kill it, so I need the subshell to end so that I can get the PID. Also, in my real code the logic of the launcher function is quite complex, and I run it in a subshell to isolate the main process from it.
It turns out the problem was stdout, not stdin: the main shell reads the subshell's stdout through the command substitution until end-of-file, and the background process inherits that stdout, so end-of-file does not arrive until the background process exits. Just redirecting stdout when launching the background process makes it work as expected:
sleep 100 > /dev/null &
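A minimal sketch of that fix applied to the simplified launcher from the question, assuming bash: the only functional change is the `> /dev/null` on the background command, so the pipe read by `$(...)` is closed as soon as the function returns. The `date +%s` timing is only there to show that `PID=$(launcher)` no longer blocks.

```shell
#!/bin/bash
# Launch a long-running process and print its PID on stdout.
launcher() {
  # Send the child's stdout elsewhere; otherwise it keeps the pipe
  # read by $(...) open and the command substitution waits for it.
  sleep 30 > /dev/null &
  echo "$!"
}

start=$(date +%s)
PID=$(launcher)                       # returns right away now
elapsed=$(( $(date +%s) - start ))
echo "launched PID $PID in ${elapsed}s"
kill "$PID"
```

If whoever runs your script captures its stderr through a pipe as well, the same reasoning applies there, and you may want `2> /dev/null` (or a log file) on the background command too.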
I'm not sure whether this gets it done, but:
function launcher(){
echo "start launching"
sleep 100 &
echo "end launching"
}
launcher
PID=$!
# Here $PID is the process id of `sleep`
echo $PID
kill $PID
Without the kill, this forks the sleep command, and the shell script ends while sleep keeps running. Its PID is stored in PID, so you can kill it later or not.
Is this what you need? If not, can you clarify what you're expecting?
I also noticed that while the parent script stays alive, the PPID of the sleep process is correct and stays intact.
# sleeper_test.sh
#!/bin/bash
function launcher(){
echo "start launching"
sleep 100 &
echo "end launching"
}
launcher
PID=$!
# Here $PID is the process id of `sleep`
echo $PID
sleep 10
#kill $PID
$ ps -ef | grep sleep
501 13748 5471 0 1:54PM ttys000 0:00.00 /bin/bash ./sleeper_test.sh
501 13749 13748 0 1:54PM ttys000 0:00.00 sleep 100 <- child correctly tied to the parent sh script
501 13750 13748 0 1:54PM ttys000 0:00.00 sleep 10
$ ps -ef | grep sleep
501 13749 1 0 1:54PM ttys000 0:00.00 sleep 100 <- since the parent ended, its parent becomes the root process (PID 1)
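To check the parent relationship from a script rather than by eye, the PPID can be read with `ps -o ppid= -p PID`. A sketch, assuming bash and a `ps` that supports `-o` and `-p` (Linux procps and macOS both do): it compares a direct child's PPID with `$$`, then orphans a second sleep by starting it from a short-lived subshell. Which process adopts the orphan differs between systems (PID 1, or a "subreaper" process on some setups), so the sketch only prints it.

```shell
#!/bin/bash
# While the script is alive, a background child's parent is the script.
sleep 100 &
child=$!
parent_now=$(ps -o ppid= -p "$child" | tr -d ' ')

# A sleep started from a subshell that exits right away is orphaned and
# re-parented (to PID 1, or to a subreaper on some systems).
orphan=$( (sleep 100 > /dev/null & echo "$!") )
sleep 1
parent_later=$(ps -o ppid= -p "$orphan" | tr -d ' ')

echo "script $$: child $child has parent $parent_now"
echo "orphan $orphan now has parent $parent_later"
kill "$child" "$orphan"
```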
Creating a bash script with this command:
cat <<"END"> z
#! /bin/bash
sleep 20 && exit 1 &
ret=$!
ps $ret | grep $ret
END
and then running it gives:
7230 pts/39 S+ 0:00 /bin/bash ./z
I was expecting to see sleep 20 ... which is the child process. If I remove the && exit 1 it does return the child process.
What's the reason? How can I get the child process ID in the above statement?
You already get the right information about the child process. It's only that, in your case, ps doesn't show a distinct COMMAND name for the chained sub-process you start in the background, which is probably what confused you.
This appears to be the case for chained commands (... && ..., so it has nothing to do with exit 1; it could just as well be echo 5, etc.): the background child is a forked copy of the script's shell, so ps shows the same command name as the parent.
From the ps man page:
`cmd | COMMAND`: simple name of executable
Process state codes:
`S`: interruptible sleep (waiting for an event to complete)
`+`: is in the foreground process group
See the S+ in your ps | grep output.
So you can adapt your script a bit to confirm that you actually captured the right information about the child process, like so:
cat <<"END"> z
#! /bin/bash
sleep 20 && exit 1 &
ret=$!
echo $ret
jobs -l
# display parent and child process info
# -j Jobs format
ps -j $$ $ret
END
Output of echo $ret:
30274
Output of jobs -l:
[1]+ 30274 Running sleep 20 && exit 1 &
Output of ps -j $$ $ret:
PID PGID SID TTY STAT TIME COMMAND
30273 30273 21804 pts/0 S+ 0:00 /bin/bash ./z
30274 30273 21804 pts/0 S+ 0:00 /bin/bash ./z
Note that the parent and child have the same PGID, and that the child's PID, 30274, matches between the jobs -l and ps ... output.
Further, if you change sleep 20 && exit 1 & to bash -c 'sleep 20 && exit 1' &, you get a proper command name for the child, as follows (same output order as above):
30384
[1]+ 30384 Running bash -c 'sleep 20 && exit 1' &
PID PGID SID TTY STAT TIME COMMAND
30383 30383 21804 pts/0 S+ 0:00 /bin/bash ./z
30384 30383 21804 pts/0 S+ 0:00 bash -c sleep 20 && exit 1
Last but not least, in your original version instead of ps $ret | grep $ret you could also try
pstree -s $ret
From pstree man page
-s: Show parent processes of the specified process.
This gives output similar to the one below, which also confirms that you get the right process information for sleep 20 && exit 1 &:
systemd───systemd───gnome-terminal-───bash───bash───sleep
What you see is not the parent's PID but the subshell's PID.
When you run :
sleep 20 && exit 1 &
The processes tree is like :
current-shell ---> sub-shell ---> 'sleep 20 && exit 1'
When you run :
sleep 20 &
The processes tree is like :
current-shell ---> 'sleep 20'
That is why you see the PID of 'sleep 20' itself in the second case.
What's the reason?
The reason is that some entity has to perform the &&. It can't be sleep, because sleep only sleeps; after sleep terminates (so there is no longer a sleep to make any decision), some "entity" needs to check the exit status of sleep, decide, and then execute exit 1. That "entity" is the shell, which has to sit "above" sleep to do this. So the "real" background process is the shell, and sleep is its child process.
In the case of sleep 20 & alone, bash applies an optimization: it scans the whole background command and sees there is only a single simple command to run. Because of that, the forked subshell calls exec directly instead of doing the usual fork+exec, so it becomes sleep itself instead of running sleep as a child process. Because of the exec, you see sleep as the process name. It's a resource optimization done by bash.
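The two process trees can be observed directly. A sketch, assuming bash and a `ps` that supports `-o comm=` and `-p`: it starts both forms in the background and prints the executable name `ps` reports for each `$!`. The one-second pause is an arbitrary allowance for the first child to finish its exec.

```shell
#!/bin/bash
# Single simple command: the forked subshell execs sleep directly.
sleep 20 &
simple=$!
# Chained command: the forked subshell must stay around to evaluate "&&",
# so it runs sleep as its own child.
sleep 20 && exit 1 &
chained=$!

sleep 1   # give the first child time to reach its exec
simple_comm=$(ps -o comm= -p "$simple" | tr -d ' ')
chained_comm=$(ps -o comm= -p "$chained" | tr -d ' ')
echo "sleep 20 &           -> $simple_comm"
echo "sleep 20 && exit 1 & -> $chained_comm"

# Clean up: the chained subshell's sleep is its child, so kill that first.
pkill -P "$chained" sleep
kill "$simple" "$chained" 2>/dev/null
```

The first line should report `sleep` and the second the shell's own name, matching the explanation above.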
What is the shortest way to make a bash script sleep at a certain point until another script wakes it up to continue its job?
Maybe using flock -u .. or a blocking read on a pipe?
Say scriptA sleeps and waits to be woken up by scriptB.
One way is, in A, before you sleep, write the PID to a file, say scriptA.pid, and then go to sleep.
When B runs, at the right moment it can read the scriptA.pid file to get the PID of A and run pkill -P pidofA sleep. That kills the sleep subprocess, and A continues its execution.
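A sketch of this pid-file approach, with both halves in one bash script for the demonstration (in real use, scriptA and scriptB would be separate files). `/tmp/scriptA.pid` and `/tmp/scriptA.out` are hypothetical paths, and `pkill`/`pgrep` from procps are assumed to be available. The polling loop only makes sure B does not act before A has started its `sleep`.

```shell
#!/bin/bash
# Hypothetical shared location for scriptA's PID.
pidfile=/tmp/scriptA.pid
rm -f "$pidfile"

# scriptA, run here as a separate bash process: publish PID, then sleep.
bash -c '
  echo "$$" > "$1"
  sleep 1000
  echo "A: woken up, continuing"
' scriptA "$pidfile" > /tmp/scriptA.out 2>/dev/null &
a=$!

# scriptB: wait (at most ~5 s) until A is sleeping, then kill only A's sleep.
for i in $(seq 20); do
  [ -s "$pidfile" ] && pgrep -P "$(cat "$pidfile")" sleep > /dev/null && break
  sleep 0.25
done
pkill -P "$(cat "$pidfile")" sleep

wait "$a"
cat /tmp/scriptA.out
```

`-P` restricts `pkill` to children of A, so other `sleep` processes on the machine are left alone.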
I'm a fan of named pipes (fifo). scriptA.sh:
pipe='/tmp/mypipe'
mkfifo "$pipe"
echo "$0 going to sleep..."
# Should block
read < "$pipe"
echo "$0 continuing"
scriptB.sh
pipe='/tmp/mypipe'
mkfifo "$pipe"
echo "$0 waking other process"
# might block
echo > "$pipe"
echo "$0 exiting"
The second mkfifo will print mkfifo: /tmp/mypipe: File exists; if that bothers you, test for existence first (-e "$pipe"). This also does not tidy up (rm) the FIFO. I'm not sure where that should go, because the right place depends critically on the timing of the application.
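A self-contained sketch of the FIFO approach that also addresses the two loose ends above, assuming bash and `mktemp -d` (which replaces the fixed `/tmp/mypipe` path just for the demo): it creates the FIFO only if `[ -p ]` says no named pipe exists yet, and the script that created it removes it on exit with `trap ... EXIT`. Both halves run from one script here; in real use scriptA and scriptB would be separate files sharing an agreed-upon path.

```shell
#!/bin/bash
# Sketch: scriptA blocks on a FIFO until scriptB writes to it.
dir=$(mktemp -d)
pipe="$dir/mypipe"
[ -p "$pipe" ] || mkfifo "$pipe"   # create once; no "File exists" error
trap 'rm -rf "$dir"' EXIT          # remove the FIFO when this script exits

# scriptA: blocks in read until a writer opens the FIFO
(
  echo "A going to sleep..."
  read -r msg < "$pipe"
  echo "A continuing, got: $msg"
) > "$dir/a.out" &
a=$!

# scriptB: opening the FIFO for writing blocks until A has it open for reading
echo "wake up" > "$pipe"
wait "$a"
result=$(cat "$dir/a.out")
echo "$result"
```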
You could use inter-process signals: use the kill command to send a signal to a process by its PID.
The SIGSTOP signal stops the execution of the process.
The SIGCONT signal resumes the process execution.
The example script below:
stores the pid of the process in a file.
the script sends to its own process the SIGSTOP signal ($$ is the pid of the current bash process).
Hopefully, another process will resume the execution.
Give this a try:
#!/bin/bash --
printf "%s" $$ > /tmp/aScript.pid
kill -STOP $$ # STOP the execution here
# execution continues here when the SIGCONT signal is received
printf "script %s: received the SIGCONT signal\n" $$
Test in a terminal:
$ ./aScript.sh &
[1] 26444
$ kill -CONT $(cat /tmp/aScript.pid)
script 26444: received the SIGCONT signal
1st method
The running script can stop itself -
$: cat flagfile
#!/usr/bin/bash
echo $$ > /tmp/flagfile.pid
kill -STOP $$
date
$: ./flagfile &
[1] 24679
$: ps -fu $LOGNAME | grep 'flagfile$'
P2759474 24679 24521 0 13:29 pts/0 00:00:00 /usr/bin/bash ./flagfile
[1]+ Stopped ./flagfile
Then any other script can restart it.
$: kill -CONT $(</tmp/flagfile.pid)
$: Wed Dec 12 13:36:01 CST 2018
That last line gave me back a prompt before the background process managed to output the date. :)
2nd method
If a delay is ok, you can have a trap break it out.
This doesn't stop the script completely, but you can tune the delay: the longer you can afford to wait for it to wake up, the friendlier (less busy) the loop is.
$: cat flagfile
#!/usr/bin/bash
trap 'loop=0' USR1
loop=1
delay=2
echo $$ > /tmp/flagfile.pid
while (( loop )); do sleep $delay; done
date
$: ./flagfile &
[1] 25018
$: ps -fu $LOGNAME | grep 'flagfile$'
P2759474 25018 24521 0 13:42 pts/0 00:00:00 /usr/bin/bash ./flagfile
Wait as long as you like....
$: kill -USR1 $(</tmp/flagfile.pid)
$: Wed Dec 12 13:42:43 CST 2018
[1]+ Done ./flagfile
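If even the polling delay is too much, the `trap` can be combined with a background `sleep` plus `wait`, the same trick used in the SIGQUIT answer further down: `wait` returns as soon as a trapped signal arrives. A sketch, assuming bash, `pgrep`, and the pid file `/tmp/flagfile.pid` from the answer (`/tmp/flagfile.out` is a hypothetical output file); the waiting loop only exists so the demo does not signal the sleeper before it is parked in `wait`.

```shell
#!/bin/bash
pidfile=/tmp/flagfile.pid
rm -f "$pidfile"

# The sleeper: instead of polling, park in the wait builtin, which a
# trapped signal interrupts at once; the trap then kills the idle sleep.
bash -c '
  trap "kill \$sleep_pid 2>/dev/null" USR1
  echo "$$" > "$1"
  sleep 1000 &
  sleep_pid=$!
  wait "$sleep_pid"
  echo "woken at $(date +%s)"
' flagfile "$pidfile" > /tmp/flagfile.out &
s=$!

# The waker: make sure the sleeper is parked (at most ~5 s), then signal it.
for i in $(seq 20); do
  [ -s "$pidfile" ] && pgrep -P "$(cat "$pidfile")" sleep > /dev/null && break
  sleep 0.25
done
sleep 0.5
sent=$(date +%s)
kill -USR1 "$(cat "$pidfile")"
wait "$s"
woke=$(sed 's/^woken at //' /tmp/flagfile.out)
echo "woke $(( woke - sent ))s after the signal"
```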
Suppose I input the following in a shell
(while true; do echo hahaha; sleep 1; done)&
Then I know I can kill it by
fg; CTRL-C
However, if the command above is in a script e.g. tmp.sh and I'm running that script, how to kill it?
(while true; do echo hahaha; sleep 1; done)&
RUNNING_PID=$!
kill ${RUNNING_PID}
$! holds the PID of the most recently started background process, so you can do with it as you wish.
Let's suppose you have a bash script named tmp.sh with the following content:
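A sketch of that answer as a complete script, assuming bash: it keeps `$!` in `RUNNING_PID` and installs a `trap ... EXIT` so the loop is killed when the script itself ends, which covers the situation in the question where nobody is around to type `fg` and Ctrl-C. The `kill -0` check (signal 0 only tests whether the PID exists) is there to show the loop is running before the trap fires, and the `> /dev/null` just keeps the demo quiet.

```shell
#!/bin/bash
# Start the loop from the question and remember its PID.
(while true; do echo hahaha; sleep 1; done) > /dev/null &
RUNNING_PID=$!

# Stop the loop when this script exits.
trap 'kill "$RUNNING_PID" 2>/dev/null' EXIT

sleep 2                                  # stand-in for the script's real work
if kill -0 "$RUNNING_PID" 2>/dev/null; then
  alive=yes                              # the loop is still running here
else
  alive=no
fi
echo "loop $RUNNING_PID alive: $alive"
```

Killing the subshell does not kill the `sleep 1` it is running at that moment; that child simply finishes within a second.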
#!/bin/bash
(while true; do echo hahaha; sleep 1; done)&
When you execute it, it prints hahaha to stdout every second. You can't list the loop with the jobs command, but it's still a process, and it shows up in the process forest of the current terminal. So:
1- Get the file name of the terminal connected to standard input:
$ tty
/dev/pts/2
2- List the processes associated with the terminal (pts/2 in this example) in forest format (f). The STAT column shows each process's status; the S option sums up information such as CPU time from dead children into their parents:
$ ps --tty pts/2 Sf
PID TTY STAT TIME COMMAND
3691 pts/2 Ss+ 0:00 /bin/bash
3787 pts/2 S 0:00 /bin/bash
4879 pts/2 S 0:00 \_ sleep 1
3- Now, you can see that the example lists a sleep 1 command that is a child of the /bin/bash process with PID 3787. Now kill it!
kill -9 3787
Note: Don't kill the bash process whose status includes s+ (Ss+ above): that is the bash process giving you the prompt! From man ps:
s is a session leader
+ is in the foreground process group
Recommendations:
In a case like this, you should save the PID in a file:
#!/bin/bash
(while true; do echo hahaha; sleep 1; done)&
echo $! > /path/to/my_script.pid
Then, you could just do some script to shut it down:
#!/bin/bash
kill -9 $(cat /path/to/my_script.pid)
I have a Bash script that runs a long running process in the foreground. When it receives a SIGQUIT signal, it should perform various cleanup operations such as killing itself and all of its child processes (via kill of process group etc.). A minimal script, that should catch the signal, is shown below (called test_trap.sh):
#!/bin/bash
trap 'echo "TRAP CAUGHT"; exit 1' QUIT # other required signals are omitted for brevity
echo starting sleep
sleep 11666
echo ending sleep
echo done
I would like to send the SIGQUIT signal to the process of the test_trap.sh script. However, sending SIGQUIT to test_trap.sh does not trigger the trap expression; the trap fires only when I send the signal to the child sleep 11666 process. Below is a bash session demonstrating this:
bash-4.1$ test_trap.sh &
[1] 19633
bash-4.1$ starting sleep
bash-4.1$ kill -s SIGQUIT 19633
bash-4.1$ jobs
[1]+ Running test_trap.sh &
bash-4.1$ ps -ef --forest --cols=10000 | grep '11666\|test_trap.sh' | grep -v grep
theuser 19633 12227 0 07:40 pts/4 00:00:00 \_ /bin/bash ./test_trap.sh
theuser 19634 19633 0 07:40 pts/4 00:00:00 | \_ sleep 11666
bash-4.1$ kill -s SIGQUIT 19634
bash-4.1$ Quit (core dumped)
TRAP CAUGHT
[1]+ Exit 1 test_trap.sh
bash-4.1$ ps -ef --forest --cols=10000 | grep '11666\|test_trap.sh' | grep -v grep
bash-4.1$
Note that the "sleep 11666" is just a representative process. That process can actually be an interactive subshell (e.g., bash -i).
Why doesn't the parent test_trap.sh process catch the SIGQUIT signal? Why does the trap fire only when the sleep 11666 process is signaled?
I do not want to use uncatchable SIGKILL as I do need to do an assortment of cleanup operations in the trap expression.
This script is intended to run on any fairly recent Linux distribution that includes Bash (e.g., not Cygwin).
References:
killing Parent process along with child process using SIGKILL
Kill bash and child process
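bash must wait for sleep to complete before it can execute the handler: while a foreground command is running, bash only records the signal and runs the trap after the command finishes. A good workaround is to run sleep in the background and then immediately wait for it. The wait builtin, unlike a foreground command, returns as soon as a trapped signal arrives, and the trap runs right after.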
trap 'kill $sleep_pid; echo "TRAP CAUGHT"; exit 1' QUIT
echo starting sleep
sleep 11666 &
sleep_pid=$!
wait
echo ending sleep
echo done
The recording of sleep_pid and using it to kill sleep from the handler are optional.
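A sketch that checks this end to end, assuming bash, `pgrep`, and `mktemp`: it writes a variant of test_trap.sh, runs it in the background, signals the script's own PID (not the sleep), and measures how long the trap takes. It traps TERM rather than QUIT because a non-interactive shell starts background commands with SIGINT and SIGQUIT ignored, and a signal that is ignored on entry to a shell cannot be trapped; launched from an interactive shell with job control, as in the question, QUIT works the same way.

```shell
#!/bin/bash
tmp=$(mktemp -d)

# Variant of test_trap.sh: sleep in the background, then wait for it.
cat > "$tmp/test_trap.sh" <<'END'
#!/bin/bash
trap 'kill "$sleep_pid" 2>/dev/null; echo "TRAP CAUGHT"; exit 1' TERM
echo starting sleep
sleep 11666 &
sleep_pid=$!
wait "$sleep_pid"
echo ending sleep
END

bash "$tmp/test_trap.sh" > "$tmp/out" &
pid=$!

# Give the script (at most ~5 s) to start its background sleep.
for i in $(seq 20); do
  pgrep -P "$pid" sleep > /dev/null && break
  sleep 0.25
done
sleep 0.5

start=$(date +%s)
kill -TERM "$pid"          # signal the script itself, not the sleep
wait "$pid"
status=$?
elapsed=$(( $(date +%s) - start ))
out=$(cat "$tmp/out")
rm -rf "$tmp"
echo "status=$status elapsed=${elapsed}s"
echo "$out"
```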
Actually, bash does receive the signal, but it is blocked waiting for the foreground sleep command to end. Only when sleep ends does bash react to the signal and execute the trap.
You can replace the long sleep command with a loop of short sleep commands:
while true
do
sleep 1
done
With that, if you send the signal to the bash process, it will react as soon as the currently executing sleep command ends, that is, at most 1 second after it was sent.
Try the signal SIGINT (the same one sent by pressing Ctrl+C) instead of SIGKILL. Other signals only take effect once bash can process I/O or some other condition is met.
I have a program (C++ Executable) on AIX 5.3 that launches a Shell Script (ksh).
When I launch the program and the shell script, I see two processes:
AIX:>ps -ef | grep 3657892
u001 3657892 3670248 0 18:16:34 pts/11 0:00 /u0012006/bin/Launcher
u001 3723398 3657892 0 18:16:41 pts/11 0:00 /usr/bin/ksh /u0012006/shell/Trjt_Slds.sh -m
Now, When I do a CTRL-X key combination on the Keyboard to end and go out of the Shell Script, the main launching program (C++ Executable) process gets killed while the shell script continues to execute.
AIX:>ps -ef | grep 3723398
u001 3723398 1 106 18:16:41 pts/11 0:01 /usr/bin/ksh /u0012006/shell/Trjt_Slds.sh -m
u001 3731504 3723398 0 0:00 <defunct>
u001 3735612 3723398 0 0:00 <defunct>
u001 3739838 3723398 0 0:00 <defunct>
This drives CPU consumption to 100%, and a lot of defunct (zombie) processes accumulate.
Is there a way to have the AIX Shell Script terminate first when I do a CTRL-X?
Note: Launcher is broken and should be fixed. Thus, any "solution" will be a hack.
One thought is to check in various places in the script whether it has been orphaned and, if so, exit. Note that $PPID is set once when the shell starts and is not updated afterwards, so rather than testing whether it has become 1 (init), test whether the original parent still exists, e.g. with kill -0 $PPID.
I don't understand the use of Ctrl-X: it does not generate any tty signal, and I guess a signal is what you want. Perhaps the tty is also in raw mode. You might consider mapping Ctrl-X to one of the tty signals such as SIGINT, e.g. stty intr ^X, but you will also need to remember to restore it with stty intr ^C.
Last, you could wrap the script in a script and use the technique to kill the child and exit. e.g. (untested)
#!/bin/ksh
# launch original program in background
/path/to/real/program "$@" &
# get child's pid
child=$!
while : ; do
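# when we become an orphan ($PPID still holds the original parent's PID,
# so check whether that parent is still alive)
if ! kill -0 $PPID 2>/dev/null ; then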
# kill the child and exit
kill $child
exit
fi
# poll once a second
sleep 1
done
Update
./s1 is:
#!/bin/ksh
./s2 &
sleep 10
exit
./s2 is:
#!/bin/ksh
while : ; do
if kill -0 $PPID ; then
echo still good
else
echo orphaned
exit
fi
sleep 1
done
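The s1/s2 pair can be tried out in one go with a harness like the following. It is written for POSIX `sh` so that it should also run under ksh or bash, though that is worth checking on AIX itself; `mktemp -d` is a GNU/Linux convenience used only for the demo, and the timings are shortened. It relies on `$PPID` keeping the original parent's PID after the process is orphaned, so `kill -0 $PPID` starts failing once that parent has exited (barring PID reuse).

```shell
#!/bin/sh
tmp=$(mktemp -d)

# s2: poll once a second; exit as soon as the original parent is gone.
cat > "$tmp/s2" <<'END'
while : ; do
  if kill -0 "$PPID" 2>/dev/null ; then
    echo "still good"
  else
    echo "orphaned"
    exit
  fi
  sleep 1
done
END

# s1: start s2 in the background, then exit after a while.
cat > "$tmp/s1" <<'END'
sh "$1/s2" > "$1/s2.out" &
sleep 2
exit
END

sh "$tmp/s1" "$tmp"   # returns after ~2 s, leaving s2 orphaned
sleep 3               # give s2 time to notice
out=$(cat "$tmp/s2.out")
rm -rf "$tmp"
echo "$out"
```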
ksh always does this. I just got bitten by it: unlike bash, ksh does not forward HUP signals to its children when it exits. If you can find the child PIDs, you can send them SIGHUP yourself.