How to be notified when a script's background job completes? - bash

My question is very similar to this one except that my background process was started from a script. I could be doing something wrong but when I try this simple example:
#!/bin/bash
set -mb # enable job control and notification
sleep 5 &
I never receive notification when the sleep background command finishes. However, if I execute the same directly in the terminal,
$ set -mb
$ sleep 5 &
$
[1]+ Done sleep 5
I see the output that I expect.
I'm using bash on cygwin. I'm guessing that it might have something to do with where the output is directed, but trying various output redirection, I'm not getting any closer.
EDIT: So I have more information about the why (thx to jkramer), but still looking for the how. How do I get "push" notification that a background process started from a script has terminated? Saving a PID to a file and polling is not what I'm looking for.

If you source the file (with .) instead of executing it, the job runs in your interactive shell, which can then notify you. For example:
cat foo.sh
#!/bin/bash
set -mb # enable job control and notification
sleep 5 &
. foo.sh
[1]+ Done sleep 5

The job control of your shell only affects processes controlled by your terminal, that is tasks started directly from the shell.
When the parent process (your script) exits, the init process automatically becomes the parent of the child process (your sleep command), so no shell is left to report its completion. Try this example:
[jkramer/sgi5k:~]# cat foo.sh
#!/bin/bash
sleep 20 &
echo $!
[jkramer/sgi5k:~]# bash foo.sh
19638
[jkramer/sgi5k:~]# ps -ef | grep 19638
jkramer 19638 1 0 23:08 pts/3 00:00:00 sleep 20
jkramer 19643 19500 0 23:08 pts/3 00:00:00 grep 19638
As you can see, the parent process of sleep after the script terminates is 1, the init process. If you need to be notified about the termination of your child process, you can save the content of the $! variable (the PID of the last background process) in a variable or a file and use the `wait` command to wait for its termination. Note that wait only works from the shell that started the process, so the script has to stay alive until then.
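One way to get a notification without polling is to wrap the command in a background subshell that reports as soon as the command exits. The following is only a minimal sketch: run_and_notify is a hypothetical helper name, not a bash builtin, and the message format is made up for illustration.

```bash
#!/bin/bash
# run_and_notify is a hypothetical helper, not part of bash.
# It runs the given command in a background subshell and prints a
# message as soon as that command exits.
run_and_notify() {
    (
        status=0
        "$@" || status=$?
        echo "[done] '$*' exited with status $status"
    ) &
}

run_and_notify sleep 2
echo "script continues while sleep runs"
wait    # keep the script alive so the notification can be printed
```

The final wait matters: if the script exits first, the subshell is re-parented to init, as shown above, although it would still print its message to the terminal if one is attached.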

Related

How to run the a shell script as background process and move on with next script without waiting for completion of first

I have below scripts ready with me -
1.sh:
echo "Good"
sleep 10
echo "Morning"
2.sh:
echo "Whats"
sleep 30
echo "Up"
script1.sh:
sh 1.sh &
sh 2.sh &
script2.sh:
echo "Hello world"
Requirement:
Execute script1.sh and do not wait for its completion or failure, i.e., let the script run in the background. As soon as script1.sh is triggered, execute script2.sh the very next second.
./script1.sh
./script2.sh
Challenge:
./script2.sh keeps on waiting for completion of ./script1.sh.
Like ./script2.sh, I have a lot of scripts to be run one after another, but they should never wait for completion of ./script1.sh
Thanks,
B.J.
Just as you did for 1.sh inside script1.sh, you should append & after script1.sh:
#! /bin/bash
./script1.sh &
./script2.sh
exit 0
This runs script1.sh as a background process and continues in the main script with script2.sh.
Usually, it is good practice not to leave background processes behind (unless they are long-running servers, daemons, etc.). It is better to make the parent script wait for all its children. Otherwise, you might end up with a lot of orphan processes, which may use resources and have unintended consequences (e.g., open files, logging, ...).
Consider
#! /bin/bash
script1.sh &
script2.sh
script3.sh
wait # wait for all backgrounded processes
One immediate advantage is that killing the main script will also kill the running script1 and script2. If for some reason the main script exits before all background children have terminated, they cannot be easily stopped (other than by killing them by PID).
Also, ps/pstree will show the system status in a clear way.
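If the parent also needs to know whether each background job succeeded, it can record every PID and wait for them one by one. This is a sketch only; the sleep commands are stand-ins for the real scripts, and the variable names are chosen for illustration.

```bash
#!/bin/bash
# Start several jobs in parallel, then wait for each and count failures.
pids=()
for seconds in 2 1; do
    sleep "$seconds" &      # stand-in for ./script1.sh, ./script2.sh, ...
    pids+=("$!")            # remember the PID of the job just started
done

echo "jobs started: ${#pids[@]}"   # printed immediately, nothing has been waited for yet

failures=0
for pid in "${pids[@]}"; do
    # wait PID returns the exit status of that particular job
    wait "$pid" || failures=$((failures + 1))
done
echo "all jobs finished, failures: $failures"
```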

Trying to close all child processes when I interrupt my bash script

I have written a bash script to carry out some tests on my system. The tests run in the background and in parallel. The tests can take a long time and sometimes I may wish to abort the tests part way through.
If I press Control+C then it aborts the parent script, but leaves the various children running. I wish to make it so that I can hit Control+C (or quit some other way) and then kill all child processes running in the background. I have a bit of code that does the job if I'm running the background jobs directly from the terminal, but it doesn't work in my script.
I have a minimal working example.
I have tried using trap in combination with pgrep -P $$.
#!/bin/bash
trap 'kill -n 2 $(pgrep -P $$)' 2
sleep 10 &
wait
I was hoping that hitting Control+C (SIGINT) would kill everything that the script started, but it actually says:
./breakTest.sh: line 1: kill: (3220) - No such process
This number changes, but doesn't seem to apply to any running processes, so I don't know where it is coming from.
I guess if the contents of the trap command get evaluated where the trap command occurs then it might explain the outcome. The 3220 pid might be for pgrep itself.
I'd appreciate some insight here
Thanks
I have found a solution using pkill. This example also deals with many child processes.
#!/bin/bash
trap 'pkill -P $$' SIGINT SIGTERM
for i in {1..10}; do
sleep 10 &
done
wait
This appears to kill all the child processes elegantly. Though I don't properly understand what the issue was with my original code, apart from sending the correct signal.
In bash, whenever you append & to a command, it runs as a background job. Each job gets a job number, counted up from 1 within the shell session. You can use the jobs command to list the background jobs. To refer to a job, prefix its job number with % (this form is called a jobspec). The jobs command also accepts options such as jobs -p to list the process IDs of all jobs, and jobs -p %JOB_SPEC to show the process ID of one particular job.
#!/usr/bin/env bash
trap 'kill -9 %1' 2
sleep 10 &
wait
or
#!/usr/bin/env bash
trap 'kill -9 $(jobs -p %1)' 2
sleep 10 &
wait
I implemented something like this a few years back; you can take a look at it: async bash
You can try something like the following:
pkill -TERM -P <your_parent_id_here>
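The jobs -p idea from the answers above can be combined with a trap so that every background job is signalled, however many there are. A minimal sketch, assuming the script is run with bash; kill_children is a hypothetical helper name and the sleep commands stand in for the real tests.

```bash
#!/bin/bash
# kill_children is a hypothetical helper name.
kill_children() {
    local pids
    pids=$(jobs -p)            # PIDs of this shell's background jobs
    if [ -n "$pids" ]; then
        # $pids is left unquoted on purpose: one PID per word
        kill $pids 2>/dev/null || true
    fi
}

# On Ctrl+C or kill: stop the children, then exit with the usual 128+SIGINT code
trap 'kill_children; exit 130' INT TERM

for i in 1 2 3; do
    sleep 3 &                  # stand-ins for long-running tests
done
wait                           # a signal arriving here runs the trap
echo "all tests finished"
```

Unlike pgrep -P $$, this asks the shell for its own job table, so no helper process started by the trap can show up in the list.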

How to signal orphaned background process?

I am executing a shell script in background from my tcl script. The tcl script ends execution after some time. At this point I assume the background shell script becomes orphan and is adopted by init.
set res [catch { exec sudo $script &}]
Now the problem is I am not able to signal my (orphaned) background script. But why? Ok it now belongs to init but why can't I signal it. Only sigkill seems to work and that kills it - I need to trigger the signal handler I've written to handle SIGUSR2
trap 'process' SIGUSR2
Why can't I signal my orphan background process? Is there no way this can be done? Or is there some workaround?
EDIT: Seems to work fine when the sleep is not involved. See sample code below:
trap 'kill `cat /var/run/sleep.pid`; foo' SIGUSR2;
foo(){ echo test; }
while true; do
echo -n .
sleep 100 &
echo ${!} > /var/run/sleep.pid
wait ${!}
done
Works fine when not orphaned - but in the case of orphan process I think the problem is the true pid of sleep gets overwritten and I'm not able to kill it when the trap arrives.
Let's run a small script like this:
bash -c '(trap foo SIGUSR2;foo(){ echo test; };while true; do echo -n .;sleep 1;done) & echo $!'; read
It will fork a background process which just runs and outputs some dots. It will also output the PID of the process, which you can use to check and signal it.
$ ps -f 19489
UID PID PPID C STIME TTY STAT TIME CMD
michas 19489 1 0 23:45 pts/8 S 0:00 bash -c (trap foo SIGUS...
Because the forking shell died directly after running the command in background, the process is now owned by init (PPID=1).
Now you can signal the process to call the handler:
kill -USR2 19489
If you do, you will notice the "test" output at the terminal printing the dots.
There should be no difference, whether you start a background process from shell or tcl. If it runs you can send it a signal and if there is a handler, it will be called.
If it really does not respond to signals, it might be blocked waiting for something, for example in a sleep or waiting for some I/O. Bash runs a trap handler only after the foreground command it is waiting for has completed.
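A common way around a handler that is delayed by a long foreground command is to run that command in the background and block in wait instead, because wait returns as soon as a trapped signal arrives. The sketch below signals itself after one second purely for demonstration; the handler name and the durations are assumptions.

```bash
#!/bin/bash
got_signal=0
on_usr2() {
    got_signal=1
    echo "handler ran"
}
trap on_usr2 USR2

# For demonstration only: send USR2 to this script after one second.
( sleep 1; kill -USR2 "$$" ) &

sleep 5 &                       # run the long command in the background...
sleep_pid=$!
status=0
wait "$sleep_pid" || status=$?  # ...so that wait can be interrupted by the signal
kill "$sleep_pid" 2>/dev/null || true   # the sleep itself is still running, stop it
echo "wait returned $status"
```

When wait is interrupted by a trapped signal it returns a status greater than 128, which lets the script tell an interruption apart from normal completion.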

Terminate running commands when shell script is killed [duplicate]

For testing purposes I have this shell script
#!/bin/bash
echo $$
find / >/dev/null 2>&1
Running this from an interactive terminal, ctrl+c will terminate bash, and the find command.
$ ./test-k.sh
13227
<Ctrl+C>
$ ps -ef |grep find
$
Running it in the background, and killing the shell only will orphan the commands running in the script.
$ ./test-k.sh &
[1] 13231
13231
$ kill 13231
$ ps -ef |grep find
nos 13232 1 3 17:09 pts/5 00:00:00 find /
$
I want this shell script to terminate all its child processes when it exits regardless of how it's called. It'll eventually be started from a python and java application - and some form of cleanup is needed when the script exits - any options I should look into or any way to rewrite the script to clean itself up on exit?
I would do something like this:
#!/bin/bash
trap : SIGTERM SIGINT
echo $$
find / >/dev/null 2>&1 &
FIND_PID=$!
wait $FIND_PID
if [[ $? -gt 128 ]]
then
kill $FIND_PID
fi
Some explanation is in order, I guess. First, we need to change some of the default signal handling. We pass : (a no-op command) to trap rather than an empty string, because an empty string causes the shell to ignore the signal instead of doing something about it (the opposite of what we want).
Then, the find command is run in the background (from the script's perspective) and we call the wait builtin for it to finish. Since we gave a real command to trap above, when a signal is handled, wait will exit with a status greater than 128. If the process waited for completes, wait will return the exit status of that process.
Last, if the wait returns that error status, we want to kill the child process. Luckily we saved its PID. The advantage of this approach is that you can log some error message or otherwise identify that a signal caused the script to exit.
As others have mentioned, putting kill -- -$$ as your argument to trap is another option if you don't care about leaving any information around post-exit.
For trap to work the way you want, you do need to pair it up with wait - the bash man page says "If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes." wait is the way around this hiccup.
You can extend it to more child processes if you want, as well. I didn't really exhaustively test this one out, but it seems to work here.
$ ./test-k.sh &
[1] 12810
12810
$ kill 12810
$ ps -ef | grep find
$
Was looking for an elegant solution to this issue and found the following solution elsewhere.
trap 'kill -HUP 0' EXIT
My own man pages say nothing about what 0 means, but from digging around, it seems to mean the current process group. Since the script gets its own process group, this ends up sending SIGHUP to all the script's children, foreground and background.
Send a signal to the group.
So instead of kill 13231 do:
kill -- -13231
If you're starting from python then have a look at:
http://www.pixelbeat.org/libs/subProcess.py
which shows how to mimic the shell in starting and killing a group.
@Patrick's answer almost did the trick, but it doesn't work if the parent process of your current shell is in the same group (it kills the parent too).
I found this to be better:
trap 'pkill -P $$' EXIT
See here for more info.
Just add a line like this to your script:
trap "kill $$" SIGINT
You might need to change 'SIGINT' to 'INT' on your setup, but this will basically kill your process and all child processes when you hit Ctrl-C.
The thing you would need to do is trap the kill signal, kill the find command and exit.

How do I put an already-running process under nohup?

I have a process that is already running for a long time and don't want to end it.
How do I put it under nohup (that is, how do I cause it to continue running even if I close the terminal?)
Using the Job Control of bash to send the process into the background:
Ctrl+Z to stop (pause) the program and get back to the shell.
bg to run it in the background.
disown -h [job-spec] where [job-spec] is the job number (like %1 for the first running job; find your job number with the jobs command) so that the job isn't killed when the terminal closes.
Suppose for some reason Ctrl+Z is also not working, go to another terminal, find the process id (using ps) and run:
kill -SIGSTOP PID
kill -SIGCONT PID
SIGSTOP will suspend the process and SIGCONT will resume the process, in background. So now, closing both your terminals won't stop your process.
The command to separate a running job from the shell (i.e. to make it behave as if started with nohup) is disown, a shell builtin.
From bash-manpage (man bash):
disown [-ar] [-h] [jobspec ...]
Without options, each jobspec is removed from the table of active jobs. If the -h option is given, each jobspec is not
removed from the table, but is marked so that SIGHUP is not sent to the job if the shell receives a SIGHUP. If no jobspec is
present, and neither the -a nor the -r option is supplied, the current job is used. If no jobspec is supplied, the -a option
means to remove or mark all jobs; the -r option without a jobspec argument restricts operation to running jobs. The return
value is 0 unless a jobspec does not specify a valid job.
That means that a simple
disown -a
will remove all jobs from the job table, so the shell no longer sends them SIGHUP.
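The effect of disown on the job table can be observed in a small script. This is only an illustration of the builtin, assuming bash; it does not reproduce closing a terminal, and sleep is a stand-in for the long-running command.

```bash
#!/bin/bash
sleep 3 &                 # stand-in for a long-running command
pid=$!

disown                    # no jobspec given: acts on the current job
remaining=$(jobs -p)      # PIDs still listed in the job table

if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is still running"
fi
echo "jobs left in the table: ${remaining:-none}"
```

The process keeps running, but the shell no longer tracks it as a job, so it can no longer be referred to with a jobspec such as %1.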
These are good answers above, I just wanted to add a clarification:
You can't disown a pid or process, you disown a job, and that is an important distinction.
A job is the shell's notion of a process that it started and still tracks, so you have to put the job into the background (not leave it suspended) and then disown it.
Issue:
% jobs
[1] running java
[2] suspended vi
% disown %1
See http://www.quantprinciple.com/invest/index.php/docs/tipsandtricks/unix/jobcontrol/
for a more detailed discussion of Unix Job Control.
Unfortunately disown is not part of POSIX and is not available in all shells (bash, zsh and ksh93 provide it).
Certain flavours of Unix (e.g. AIX and Solaris) have an option on the nohup command itself which can be applied to a running process:
nohup -p pid
See http://en.wikipedia.org/wiki/Nohup
Node's answer is really great, but it left open the question of how to get stdout and stderr redirected. I found a solution on Unix & Linux, but it is also not complete. I would like to merge these two solutions. Here it is:
For my test I made a small bash script called loop.sh, which prints the pid of itself with a minute sleep in an infinite loop.
$./loop.sh
Now get the PID of this process somehow. Usually ps -C loop.sh is good enough, but it is printed in my case.
Now we can switch to another terminal (or press ^Z and stay in the same terminal). Now gdb should be attached to this process.
$ gdb -p <PID>
This stops the script (if running). Its state can be checked by ps -f <PID>, where the STAT field is 'T+' (or in case of ^Z 'T'), which means (man ps(1))
T Stopped, either by a job control signal or because it is being traced
+ is in the foreground process group
(gdb) call close(1)
$1 = 0
Close(1) returns zero on success.
(gdb) call open("loop.out", 01102, 0600)
$6 = 1
open() returns the new file descriptor if successful.
This open is equivalent to open(path, O_TRUNC|O_CREAT|O_RDWR, S_IRUSR|S_IWUSR).
Instead of O_RDWR O_WRONLY could be applied, but /usr/sbin/lsof says 'u' for all std* file handlers (FD column), which is O_RDWR.
I checked the values in /usr/include/bits/fcntl.h header file.
The output file could be opened with O_APPEND, as nohup would do, but this is not suggested by man open(2), because of possible NFS problems.
If we get -1 as a return value, then call perror("") prints the error message. If we need the errno, use the p errno gdb command.
Now we can check the newly redirected file. /usr/sbin/lsof -p <PID> prints:
loop.sh <PID> truey 1u REG 0,26 0 15008411 /home/truey/loop.out
If we want, we can redirect stderr to another file by using call close(2) and call open(...) again with a different file name.
Now the attached bash has to be released and we can quit gdb:
(gdb) detach
Detaching from program: /bin/bash, process <PID>
(gdb) q
If the script was stopped by gdb from an other terminal it continues to run. We can switch back to loop.sh's terminal. Now it does not write anything to the screen, but running and writing into the file. We have to put it into the background. So press ^Z.
^Z
[1]+ Stopped ./loop.sh
(Now we are in the same state as if ^Z was pressed at the beginning.)
Now we can check the state of the job:
$ ps -f 24522
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID><PPID> 0 11:16 pts/36 S 0:00 /bin/bash ./loop.sh
$ jobs
[1]+ Stopped ./loop.sh
So the process should now be put into the background and detached from the terminal. The number in square brackets in the output of the jobs command identifies the job inside bash. We can use it in the following bash builtin commands by putting a '%' sign before the job number:
$ bg %1
[1]+ ./loop.sh &
$ disown -h %1
$ ps -f <PID>
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID><PPID> 0 11:16 pts/36 S 0:00 /bin/bash ./loop.sh
And now we can quit the calling bash. The process continues running in the background. After we quit, its PPID becomes 1 (the init(1) process) and its controlling terminal becomes unknown.
$ ps -f <PID>
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID> 1 0 11:16 ? S 0:00 /bin/bash ./loop.sh
$ /usr/bin/lsof -p <PID>
...
loop.sh <PID> truey 0u CHR 136,36 38 /dev/pts/36 (deleted)
loop.sh <PID> truey 1u REG 0,26 1127 15008411 /home/truey/loop.out
loop.sh <PID> truey 2u CHR 136,36 38 /dev/pts/36 (deleted)
COMMENT
The gdb steps can be automated by creating a file (e.g. loop.gdb) containing the commands and running gdb -q -x loop.gdb -p <PID>. My loop.gdb looks like this:
call close(1)
call open("loop.out", 01102, 0600)
# call close(2)
# call open("loop.err", 01102, 0600)
detach
quit
Or one can use the following one liner instead:
gdb -q -ex 'call close(1)' -ex 'call open("loop.out", 01102, 0600)' -ex detach -ex quit -p <PID>
I hope this is a fairly complete description of the solution.
Simple and easiest steps
Ctrl + Z ----------> Suspends the process
bg --------------> Resumes and runs background
disown %1 -------------> required only if you need to detach from the terminal
To put a running process under nohup (http://en.wikipedia.org/wiki/Nohup) I tried
nohup -p pid
but it did not work for me. The following commands, however, worked fine:
Run some SOMECOMMAND,
say /usr/bin/python /vol/scripts/python_scripts/retention_all_properties.py 1.
Ctrl+Z to stop (pause) the program and get back to the shell.
bg to run it in the background.
disown -h so that the process isn't killed when the terminal closes.
Type exit to get out of the shell because now you're good to go as the operation will run in the background in its own process, so it's not tied to a shell.
This process is the equivalent of running nohup SOMECOMMAND.
ctrl + z - this will pause the job (not going to cancel!)
bg - this will put the job in background and return in running process
disown -a - this will cut all the attachment with job (so you can close the terminal and it will still run)
These simple steps will allow you to close the terminal while keeping process running.
This won't put the process under nohup (based on my understanding of your question, you don't need that here).
On my AIX system, I tried
nohup -p <processid>
This worked well. It continued to run my process even after closing terminal windows. We have ksh as default shell so the bg and disown commands didn't work.

Resources