I have a process that is already running for a long time and don't want to end it.
How do I put it under nohup (that is, how do I cause it to continue running even if I close the terminal?)
Using the Job Control of bash to send the process into the background:
Ctrl+Z to stop (pause) the program and get back to the shell.
bg to run it in the background.
disown -h [job-spec] where [job-spec] is the job number (like %1 for the first running job; find about your number with the jobs command) so that the job isn't killed when the terminal closes.
Suppose for some reason Ctrl+Z is also not working, go to another terminal, find the process id (using ps) and run:
kill -SIGSTOP PID
kill -SIGCONT PID
SIGSTOP will suspend the process and SIGCONT will resume the process, in background. So now, closing both your terminals won't stop your process.
The command to separate a running job from the shell ( = makes it nohup) is disown and a basic shell-command.
From bash-manpage (man bash):
disown [-ar] [-h] [jobspec ...]
Without options, each jobspec is removed from the table of active jobs. If the -h option is given, each jobspec is not
removed from the table, but is marked so that SIGHUP is not sent to the job if the shell receives a SIGHUP. If no jobspec is
present, and neither the -a nor the -r option is supplied, the current job is used. If no jobspec is supplied, the -a option
means to remove or mark all jobs; the -r option without a jobspec argument restricts operation to running jobs. The return
value is 0 unless a jobspec does not specify a valid job.
That means, that a simple
disown -a
will remove all jobs from the job-table and makes them nohup
These are good answers above, I just wanted to add a clarification:
You can't disown a pid or process, you disown a job, and that is an important distinction.
A job is something that is a notion of a process that is attached to a shell, therefore you have to throw the job into the background (not suspend it) and then disown it.
Issue:
% jobs
[1] running java
[2] suspended vi
% disown %1
See http://www.quantprinciple.com/invest/index.php/docs/tipsandtricks/unix/jobcontrol/
for a more detailed discussion of Unix Job Control.
Unfortunately disown is specific to bash and not available in all shells.
Certain flavours of Unix (e.g. AIX and Solaris) have an option on the nohup command itself which can be applied to a running process:
nohup -p pid
See http://en.wikipedia.org/wiki/Nohup
Node's answer is really great, but it left open the question how can get stdout and stderr redirected. I found a solution on Unix & Linux, but it is also not complete. I would like to merge these two solutions. Here it is:
For my test I made a small bash script called loop.sh, which prints the pid of itself with a minute sleep in an infinite loop.
$./loop.sh
Now get the PID of this process somehow. Usually ps -C loop.sh is good enough, but it is printed in my case.
Now we can switch to another terminal (or press ^Z and in the same terminal). Now gdb should be attached to this process.
$ gdb -p <PID>
This stops the script (if running). Its state can be checked by ps -f <PID>, where the STAT field is 'T+' (or in case of ^Z 'T'), which means (man ps(1))
T Stopped, either by a job control signal or because it is being traced
+ is in the foreground process group
(gdb) call close(1)
$1 = 0
Close(1) returns zero on success.
(gdb) call open("loop.out", 01102, 0600)
$6 = 1
Open(1) returns the new file descriptor if successful.
This open is equal with open(path, O_TRUNC|O_CREAT|O_RDWR, S_IRUSR|S_IWUSR).
Instead of O_RDWR O_WRONLY could be applied, but /usr/sbin/lsof says 'u' for all std* file handlers (FD column), which is O_RDWR.
I checked the values in /usr/include/bits/fcntl.h header file.
The output file could be opened with O_APPEND, as nohup would do, but this is not suggested by man open(2), because of possible NFS problems.
If we get -1 as a return value, then call perror("") prints the error message. If we need the errno, use p errno gdb comand.
Now we can check the newly redirected file. /usr/sbin/lsof -p <PID> prints:
loop.sh <PID> truey 1u REG 0,26 0 15008411 /home/truey/loop.out
If we want, we can redirect stderr to another file, if we want to using call close(2) and call open(...) again using a different file name.
Now the attached bash has to be released and we can quit gdb:
(gdb) detach
Detaching from program: /bin/bash, process <PID>
(gdb) q
If the script was stopped by gdb from an other terminal it continues to run. We can switch back to loop.sh's terminal. Now it does not write anything to the screen, but running and writing into the file. We have to put it into the background. So press ^Z.
^Z
[1]+ Stopped ./loop.sh
(Now we are in the same state as if ^Z was pressed at the beginning.)
Now we can check the state of the job:
$ ps -f 24522
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID><PPID> 0 11:16 pts/36 S 0:00 /bin/bash ./loop.sh
$ jobs
[1]+ Stopped ./loop.sh
So process should be running in the background and detached from the terminal. The number in the jobs command's output in square brackets identifies the job inside bash. We can use in the following built in bash commands applying a '%' sign before the job number :
$ bg %1
[1]+ ./loop.sh &
$ disown -h %1
$ ps -f <PID>
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID><PPID> 0 11:16 pts/36 S 0:00 /bin/bash ./loop.sh
And now we can quit from the calling bash. The process continues running in the background. If we quit its PPID become 1 (init(1) process) and the control terminal become unknown.
$ ps -f <PID>
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID> 1 0 11:16 ? S 0:00 /bin/bash ./loop.sh
$ /usr/bin/lsof -p <PID>
...
loop.sh <PID> truey 0u CHR 136,36 38 /dev/pts/36 (deleted)
loop.sh <PID> truey 1u REG 0,26 1127 15008411 /home/truey/loop.out
loop.sh <PID> truey 2u CHR 136,36 38 /dev/pts/36 (deleted)
COMMENT
The gdb stuff can be automatized creating a file (e.g. loop.gdb) containing the commands and run gdb -q -x loop.gdb -p <PID>. My loop.gdb looks like this:
call close(1)
call open("loop.out", 01102, 0600)
# call close(2)
# call open("loop.err", 01102, 0600)
detach
quit
Or one can use the following one liner instead:
gdb -q -ex 'call close(1)' -ex 'call open("loop.out", 01102, 0600)' -ex detach -ex quit -p <PID>
I hope this is a fairly complete description of the solution.
Simple and easiest steps
Ctrl + Z ----------> Suspends the process
bg --------------> Resumes and runs background
disown %1 -------------> required only if you need to detach from the terminal
To send running process to nohup (http://en.wikipedia.org/wiki/Nohup)
nohup -p pid , it did not worked for me
Then I tried the following commands and it worked very fine
Run some SOMECOMMAND,
say /usr/bin/python /vol/scripts/python_scripts/retention_all_properties.py 1.
Ctrl+Z to stop (pause) the program and get back to the shell.
bg to run it in the background.
disown -h so that the process isn't killed when the terminal closes.
Type exit to get out of the shell because now you're good to go as the operation will run in the background in its own process, so it's not tied to a shell.
This process is the equivalent of running nohup SOMECOMMAND.
ctrl + z - this will pause the job (not going to cancel!)
bg - this will put the job in background and return in running process
disown -a - this will cut all the attachment with job (so you can close the terminal and it will still run)
These simple steps will allow you to close the terminal while keeping process running.
It wont put on nohup (based on my understanding of your question, you don't need it here).
On my AIX system, I tried
nohup -p processid>
This worked well. It continued to run my process even after closing terminal windows. We have ksh as default shell so the bg and disown commands didn't work.
Related
Just switched from bash to zsh.
In bash, background tasks continue running when the shell exits. For example here, dolphin continues running after the exit:
$ dolphin .
^Z
[1]+ Stopped dolphin .
$ bg
[1]+ dolphin . &
$ exit
This is what I want as the default behavior.
In contrast, zsh's behavior is to warn about running jobs on exit, then close them if you exit again. For example here, dolphin is closed when the second exit-command actually exits the shell:
% dolphin .
^Z
zsh: suspended dolphin .
% bg
[1] + continued dolphin .
% exit
zsh: you have running jobs.
% exit
How do I make zsh's default behavior here like bash's?
Start the program with &!:
dolphin &!
The &! (or equivalently, &|) is a zsh-specific shortcut to both background and disown the process, such that exiting the shell will leave it running.
From the zsh documentation:
HUP
... In zsh, if you have a background job running when the shell exits, the shell will assume you want that to be killed; in this case it is sent a particular signal called SIGHUP... If you often start jobs that should go on even when the shell has exited, then you can set the option NO_HUP, and background jobs will be left alone.
So just set the NO_HUP option:
% setopt NO_HUP
I have found that using a combination of nohup, &, and disown works for me, as I don't want to permanently cause jobs to run when the shell has exited.
nohup <command> & disown
While just & has worked for me in bash, I found when using only nohup, &, or disown on running commands, like a script that calls a java run command, the process would still stop when the shell is exited.
nohup makes the command ignore NOHUP and SIGHUP signals from the shell
& makes the process run in the background in a subterminal
disown followed by an argument (the index of the job number in your jobs list) prevents the shell from sending a SIGHUP signal to child processes. Using disown without an argument causes it to default to the most recent job.
I found the nohup and disown information at this page, and the & information in this SO answer.
Update
When I originally wrote this, I was using it for data processing scripts/programs. For those kinds of use cases, something like ts (task-spooler), works nicely.
I typically use screen for keeping background jobs running.
1) Create a screen session:
screen -S myScreenName
2) Launch your scripts,services,daemons or whatever
3) Exit (detach) screen-session with
screen -d
or
shortcut ALT+A then d
After few hundreds of years - if you want to resume your session (reattach):
screen -r myScreenName
If you want to know if there's a screen-session, its name and its status (attached or detached):
screen -ls
This solution works on all terminal interpreters like bash, zsh etc.
See also man screen
I want to write a bash script that will continue to run if the user is disconnected, but can be aborted if the user presses Ctrl+C.
I can solve the first part of it like this:
#!/bin/bash
cmd='
#commands here, avoiding single quotes...
'
nohup bash -c "$cmd" &
tail -f nohup.out
But pressing Ctrl+C obviously just kills the tail process, not the main body. Can I have both? Maybe using Screen?
I want to write a bash script that will continue to run if the user is disconnected, but can be aborted if the user presses Ctrl+C.
I think this is exactly the answer on the question you formulated, this one without screen:
#!/bin/bash
cmd=`cat <<EOF
# commands here
EOF
`
nohup bash -c "$cmd" &
# store the process id of the nohup process in a variable
CHPID=$!
# whenever ctrl-c is pressed, kill the nohup process before exiting
trap "kill -9 $CHPID" INT
tail -f nohup.out
Note however that nohup is not reliable. When the invoking user logs out, chances are that nohup also quits immediately. In that case disown works better.
bash -c "$cmd" &
CHPID=$!
disown
This is probably the simplest form using screen:
screen -S SOMENAME script.sh
Then, if you get disconnected, on reconnection simply run:
screen -r SOMENAME
Ctrl+C should continue to work as expected
Fact 1: When a terminal (xterm for example) gets closed, the shell is supposed to send a SIGHUP ("hangup") to any processes running in it. This harkens back to the days of analog modems, when a program needed to clean up after itself if mom happened to pick up the phone while you were online. The signal could be trapped, so that a special function could do the cleanup (close files, remove temporary junk, etc). The concept of "losing your connection" still exists even though we use sockets and SSH tunnels instead of analog modems. (Concepts don't change; all that changes is the technology we use to implement them.)
Fact 2: The effect of Ctrl-C depends on your terminal settings. Normally, it will send a SIGINT, but you can check by running stty -a in your shell and looking for "intr".
You can use these facts to your advantage, using bash's trap command. For example try running this in a window, then press Ctrl-C and check the contents of /tmp/trapped. Then run it again, close the window, and again check the contents of /tmp/trapped:
#!/bin/bash
trap "echo 'one' > /tmp/trapped" 1
trap "echo 'two' > /tmp/trapped" 2
echo "Waiting..."
sleep 300000
For information on signals, you should be able to man signal (FreeBSD or OSX) or man 7 signal (Linux).
(For bonus points: See how I numbered my facts? Do you understand why?)
So ... to your question. To "survive" disconnection, you want to specify behaviour that will be run when your script traps SIGHUP.
(Bonus question #2: Now do you understand where nohup gets its name?)
Just switched from bash to zsh.
In bash, background tasks continue running when the shell exits. For example here, dolphin continues running after the exit:
$ dolphin .
^Z
[1]+ Stopped dolphin .
$ bg
[1]+ dolphin . &
$ exit
This is what I want as the default behavior.
In contrast, zsh's behavior is to warn about running jobs on exit, then close them if you exit again. For example here, dolphin is closed when the second exit-command actually exits the shell:
% dolphin .
^Z
zsh: suspended dolphin .
% bg
[1] + continued dolphin .
% exit
zsh: you have running jobs.
% exit
How do I make zsh's default behavior here like bash's?
Start the program with &!:
dolphin &!
The &! (or equivalently, &|) is a zsh-specific shortcut to both background and disown the process, such that exiting the shell will leave it running.
From the zsh documentation:
HUP
... In zsh, if you have a background job running when the shell exits, the shell will assume you want that to be killed; in this case it is sent a particular signal called SIGHUP... If you often start jobs that should go on even when the shell has exited, then you can set the option NO_HUP, and background jobs will be left alone.
So just set the NO_HUP option:
% setopt NO_HUP
I have found that using a combination of nohup, &, and disown works for me, as I don't want to permanently cause jobs to run when the shell has exited.
nohup <command> & disown
While just & has worked for me in bash, I found when using only nohup, &, or disown on running commands, like a script that calls a java run command, the process would still stop when the shell is exited.
nohup makes the command ignore NOHUP and SIGHUP signals from the shell
& makes the process run in the background in a subterminal
disown followed by an argument (the index of the job number in your jobs list) prevents the shell from sending a SIGHUP signal to child processes. Using disown without an argument causes it to default to the most recent job.
I found the nohup and disown information at this page, and the & information in this SO answer.
Update
When I originally wrote this, I was using it for data processing scripts/programs. For those kinds of use cases, something like ts (task-spooler), works nicely.
I typically use screen for keeping background jobs running.
1) Create a screen session:
screen -S myScreenName
2) Launch your scripts,services,daemons or whatever
3) Exit (detach) screen-session with
screen -d
or
shortcut ALT+A then d
After few hundreds of years - if you want to resume your session (reattach):
screen -r myScreenName
If you want to know if there's a screen-session, its name and its status (attached or detached):
screen -ls
This solution works on all terminal interpreters like bash, zsh etc.
See also man screen
My question is very similar to this one except that my background process was started from a script. I could be doing something wrong but when I try this simple example:
#!/bin/bash
set -mb # enable job control and notification
sleep 5 &
I never receive notification when the sleep background command finishes. However, if I execute the same directly in the terminal,
$ set -mb
$ sleep 5 &
$
[1]+ Done sleep 5
I see the output that I expect.
I'm using bash on cygwin. I'm guessing that it might have something to do with where the output is directed, but trying various output redirection, I'm not getting any closer.
EDIT: So I have more information about the why (thx to jkramer), but still looking for the how. How do I get "push" notification that a background process started from a script has terminated? Saving a PID to a file and polling is not what I'm looking for.
if you run it as source file . then it can notify. e.g.
cat foo.sh
#!/bin/bash
set -mb # enable job control and notification
sleep 5 &
. foo.sh
[1]+ Done sleep 5
The job control of your shell only affects processes controlled by your terminal, that is tasks started directly from the shell.
When the parent process (your script) dies, the init process automatically becomes the parent of the child process (your sleep command), effectively killing all output. Try this example:
[jkramer/sgi5k:~]# cat foo.sh
#!/bin/bash
sleep 20 &
echo $!
[jkramer/sgi5k:~]# bash foo.sh
19638
[jkramer/sgi5k:~]# ps -ef | grep 19638
jkramer 19638 1 0 23:08 pts/3 00:00:00 sleep 20
jkramer 19643 19500 0 23:08 pts/3 00:00:00 grep 19638
As you can see, the parent process of sleep after the script terminates is 1, the init process. If you need to get notified about the termination of your child process, you can save the content of the $! variable (PID of the last created process) in a file or somewhere and use the `wait´ command to wait for its termination.
This question already has answers here:
What's the best way to send a signal to all members of a process group?
(34 answers)
Closed 6 years ago.
For testing purposes I have this shell script
#!/bin/bash
echo $$
find / >/dev/null 2>&1
Running this from an interactive terminal, ctrl+c will terminate bash, and the find command.
$ ./test-k.sh
13227
<Ctrl+C>
$ ps -ef |grep find
$
Running it in the background, and killing the shell only will orphan the commands running in the script.
$ ./test-k.sh &
[1] 13231
13231
$ kill 13231
$ ps -ef |grep find
nos 13232 1 3 17:09 pts/5 00:00:00 find /
$
I want this shell script to terminate all its child processes when it exits regardless of how it's called. It'll eventually be started from a python and java application - and some form of cleanup is needed when the script exits - any options I should look into or any way to rewrite the script to clean itself up on exit?
I would do something like this:
#!/bin/bash
trap : SIGTERM SIGINT
echo $$
find / >/dev/null 2>&1 &
FIND_PID=$!
wait $FIND_PID
if [[ $? -gt 128 ]]
then
kill $FIND_PID
fi
Some explanation is in order, I guess. Out the gate, we need to change some of the default signal handling. : is a no-op command, since passing an empty string causes the shell to ignore the signal instead of doing something about it (the opposite of what we want to do).
Then, the find command is run in the background (from the script's perspective) and we call the wait builtin for it to finish. Since we gave a real command to trap above, when a signal is handled, wait will exit with a status greater than 128. If the process waited for completes, wait will return the exit status of that process.
Last, if the wait returns that error status, we want to kill the child process. Luckily we saved its PID. The advantage of this approach is that you can log some error message or otherwise identify that a signal caused the script to exit.
As others have mentioned, putting kill -- -$$ as your argument to trap is another option if you don't care about leaving any information around post-exit.
For trap to work the way you want, you do need to pair it up with wait - the bash man page says "If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes." wait is the way around this hiccup.
You can extend it to more child processes if you want, as well. I didn't really exhaustively test this one out, but it seems to work here.
$ ./test-k.sh &
[1] 12810
12810
$ kill 12810
$ ps -ef | grep find
$
Was looking for an elegant solution to this issue and found the following solution elsewhere.
trap 'kill -HUP 0' EXIT
My own man pages say nothing about what 0 means, but from digging around, it seems to mean the current process group. Since the script get's it's own process group, this ends up sending SIGHUP to all the script's children, foreground and background.
Send a signal to the group.
So instead of kill 13231 do:
kill -- -13231
If you're starting from python then have a look at:
http://www.pixelbeat.org/libs/subProcess.py
which shows how to mimic the shell in starting
and killing a group
#Patrick's answer almost did the trick, but it doesn't work if the parent process of your current shell is in the same group (it kills the parent too).
I found this to be better:
trap 'pkill -P $$' EXIT
See here for more info.
Just add a line like this to your script:
trap "kill $$" SIGINT
You might need to change 'SIGINT' to 'INT' on your setup, but this will basically kill your process and all child processes when you hit Ctrl-C.
The thing you would need to do is trap the kill signal, kill the find command and exit.