How can a process be killed with all its children - ruby

I have a situation where one xterm runs another xterm which runs a process.
I would like to set some kind of watchdog and kill all those windows in case of a stuck process.
I'm using Ruby to do this. When the first xterm is opened I get its PID, and later, when a SIGTERM is sent to it, only the first xterm window dies, but not the second.
It can be easily reproduced in irb:
irb(main):002:0> cmd = "xterm -e xterm -e sleep 1000"
=> "xterm -e xterm -e sleep 1000"
irb(main):003:0> pid = Process.spawn(cmd)
=> 669
irb(main):004:0> Process.kill(15, 669)
=> 1
This leaves the second xterm window open. How can the whole process chain be killed?
Thanks

Using either ps xf or pstree, you can see the running process hierarchy as it changes.
I used strace to track the behaviour of those programs. It turned out that they check the contents of /proc/$pid/stat and /proc/$pid/status, which tell them who a process's parent is.
You will probably have to look for the processes whose parent PID is equal to the PID you got (or to that process's own PPID).
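For example, here is a minimal shell sketch of that idea, assuming pgrep is available (the kill_tree function and the FIRST_XTERM_PID variable are placeholders, not something from the question): it walks the children listed by PPID and signals them from the bottom up.
#!/bin/bash
# Sketch: recursively kill a process and all of its descendants by
# following the parent/child links that /proc (via pgrep -P) exposes.
kill_tree() {
    local pid=$1 child
    for child in $(pgrep -P "$pid"); do   # every process whose PPID == $pid
        kill_tree "$child"
    done
    kill -TERM "$pid" 2>/dev/null
}
kill_tree "$FIRST_XTERM_PID"              # e.g. the pid returned by Process.spawn
Alternatively, in Ruby you could start the chain in its own process group (Process.spawn(cmd, pgroup: true)) and signal the whole group with Process.kill(15, -pid), provided none of the children change their process group.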

Related

No such process when killing a command after some time

I want to run an application from bash and kill it after some time. I found this answer:
xmessage "Hello World" & pidsave=$! sleep 10; kill $pidsave
But this is the result:
[4] 23034
[3] Terminated xmessage "Hello World"
bash: kill: (22985) - No such process
As you can see, xmessage did not stop and its window remains. Of course this works:
your_command & sleep 20; kill $!
What is wrong with the first command? And what would be its advantage compared with the second one?
The xmessage process had already terminated by the time you ran the kill command to kill it.
This can be seen here:
[3] Terminated xmessage "Hello World"
so there is no point in killing it afterwards.
To answer why the window is still there:
Many programs spawn another process and then exit. That is what happens here: the process you spawned by running xmessage from the terminal spawns another one (the one with the window) and exits after that. The child keeps running and becomes an orphan once its parent has died (init becomes its new parent, since init inherits all orphans).
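If you want to check whether that is what happens on your system, a quick (hedged) way to watch the re-parenting could be:
xmessage "Hello World" &
sleep 1                             # give it a moment to fork, if it does
ps -o pid,ppid,args -C xmessage     # an orphaned child shows up with PPID 1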

Getting pid of a background gnome-terminal process

I can easily start a background process, find its pid and search it in the list of running processes.
$ gedit &
$ PID=$!
$ ps -e | grep $PID
This works for me. But if I start gnome-terminal as a background process
$ gnome-terminal &
$ PID=$!
$ ps -e | grep $PID
then it is not found in the list of all running processes.
Am I missing something here?
If you use the --disable-factory option, it is possible to use gnome-terminal in the way you want. By default it attempts to reuse an already active terminal process, so this option is what allows you to grab the PID of the one you launch. The following script opens a window for 5 seconds, then kills it:
#!/bin/bash
echo "opening a new terminal"
gnome-terminal --disable-factory &
pid=$!
echo "sleeping"
sleep 5;
echo "closing gnome-terminal"
kill -SIGHUP $pid
This appears to be because the gnome-terminal process you start spawns another process itself and then exits. So the PID you capture is the PID of the "stub" process which starts up and then forks the real terminal. It does this so that it can be completely detached from the calling terminal.
Unfortunately I do not know of any way of capturing the PID of the "grandchild" gnome-terminal process, which is the one left running. If you do a ps you will see the gnome-terminal "grandchild" process running with a parent PID of 1.
(This is just a footnote.) As @Sodved said, gnome-terminal spawns a process itself and then exits, so there is no way to get the grandchild PID. (See also APUE Chapter 7 for why a child process is not re-attached to its grandparent when its parent terminates.)
I found that gnome-terminal instantiates only once, so here is just a short script for your specific task:
GNOME_TERMINAL_PID=`pidof gnome-terminal`
If you don't have pidof:
GNOME_TERMINAL_PID=`grep Name: */status | grep gnome-terminal | cut -d/ -f1`
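Note that the grep line above assumes your current directory is /proc. Spelled out with explicit paths, and with pgrep as an alternative (both are just sketches; on newer systems the process may be named gnome-terminal-server):
GNOME_TERMINAL_PID=$(grep -l '^Name:.*gnome-terminal' /proc/[0-9]*/status | cut -d/ -f3)
# or, if pgrep is installed:
GNOME_TERMINAL_PID=$(pgrep -x gnome-terminal)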

Kill all processes launched inside an xterm when exit

I'm using Cygwin to start some servers.
Each server is launched inside an xterm with a bunch of command like this one:
xterm -e $my_cmd /C &
Is there an easy way to kill all the launched children (the xterms and their running commands) in one go?
I would also like to be able to kill a particular launched command when I close its parent xterm.
Does anyone know how to do that?
killall xterm? That command is in the psmisc package. Xterm will notify its child process with a SIGHUP ("hangup") before it exits. Normally that will cause the child process to exit too, although some servers interpret that signal differently.
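For example (a rough sketch; killall and pkill come from the psmisc/procps packages and may need to be installed under Cygwin):
killall xterm              # terminate every xterm you own; each one HUPs its -e child on the way out
# or only the xterms started from the current script:
pkill -TERM -P $$ xterm    # processes named xterm whose parent is this shell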

Why do unix background processes sometimes die when I exit my shell?

I wanted to know why I am seeing different behaviour for background processes in the Bash shell.
Case 1: Logged in to a Unix server using PuTTY (SSH).
By default it uses the csh shell.
I changed to the bash shell,
typed sleep 2000 &
and pressed Enter.
It gave me the job number. Then I killed my session by clicking the X in the PuTTY window.
I opened another session and tried to look up the process: the process had died.
Case 2: Logged in to a Unix server using PuTTY (SSH).
By default it uses the csh shell.
I changed to the bash shell,
vi mysleep.sh
put sleep 2000 & in it and saved mysleep.sh, then
./mysleep.sh
The difference here is that instead of executing the sleep command directly, I store the sleep command in a file and execute the file.
Then I killed my session by clicking the X in the PuTTY window.
I opened another session and tried to look up the process: the process is still there.
I am not sure why this is happening. I thought I needed to use disown in bash to keep the process running even after logging out.
One difference I see is in the parent process ID: in the second case, the parent process ID of sleep 2000 becomes 1. It looks like as soon as the process for mysleep.sh died, the kernel reassigned the parent to PID 1.
The difference here is indeed the intervening process.
When you close the terminal window, a HUP signal (related to "nohup" as an0nymo0usc0ward mentioned) is sent to the processes running in it. The default action on receiving HUP is to die - from the signal(3) manpage,
No  Name     Default Action       Description
1   SIGHUP   terminate process    terminal line hangup
In your first example, the sleep process directly receives this HUP signal and dies because it isn't set to do anything else. (Some processes catch HUP and use it to perform some action, e.g. reread some configuration files)
In the second example, the shell process running your shell script has already died, so the sleep process never gets the signal. In UNIX, every process must have a parent process due to the internals of how the wait(2) family of calls works and indeed processes in general. So when the parent process dies, the kernel gives it to init (pid 1, as you note) as a foster child.
Orphan process (on wikipedia) has some more information available about it, also see Zombie process for some additional technical details.
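You can reproduce the second case and watch the re-parenting yourself; a small sketch (mysleep.sh here is just the script from the question):
printf '#!/bin/bash\nsleep 2000 &\n' > mysleep.sh
chmod +x mysleep.sh
./mysleep.sh                     # the script exits immediately, orphaning sleep
ps -o pid,ppid,args -C sleep     # PPID is now 1 (or a per-session "subreaper" on newer systems)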
Already running process?
^z
bg
disown %<jobid>
New process/script (on local machine's console)?
nohup script.sh &
New process/script (on remote machine's console)?
Depending on your need,
there are two options [ there will be more ;-) ]
ssh remotehost 'nohup /path/to/script.sh </dev/null > nohup.out 2>&1 &'
OR
use 'screen'
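For the screen option, one possible form (the session name myjob and the script path are placeholders):
ssh remotehost "screen -dmS myjob /path/to/script.sh"    # start it detached on the remote host
# later: ssh -t remotehost "screen -r myjob"             # reattach to check on it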
Try "nohup cmd args..."
Steven's answer is correct, but I'd like to highlight the tricky part here again:
=> Using a bash script that just executes sleep in the background
The effect of this is that the "script" exits almost immediately (since it has run all its commands). However, it did create a child process (sleep) during its lifetime. This means that:
The "script" cannot be the parent anymore, and sleep is orphaned to init (which shows nicely in a pstree)
The bash shell where you started the script from has no underlying jobs anymore
Note that this stuff all happens when you executed the script, and has nothing to do with any ssh logout/putty closing.
When you then finally close your putty session, bash receives a "SIGHUP", but doesn't forward it to any other process (since there are no jobs left)
In the other case, bash did still have a job left, which it then sent the SIGHUP to, causing it to end (as you noticed)
Hope this helps

How do I put an already-running process under nohup?

I have a process that is already running for a long time and don't want to end it.
How do I put it under nohup (that is, how do I cause it to continue running even if I close the terminal?)
Use the job control of bash to send the process into the background:
Ctrl+Z to stop (pause) the program and get back to the shell.
bg to run it in the background.
disown -h [job-spec] where [job-spec] is the job number (like %1 for the first running job; find your number with the jobs command) so that the job isn't killed when the terminal closes.
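Put together, a typical session might look like this (mylongjob stands for whatever program you are running):
# mylongjob is running in the foreground; press Ctrl+Z to stop it:
#   [1]+  Stopped   mylongjob
jobs           # confirm the job number, e.g. [1]
bg %1          # resume job 1 in the background
disown -h %1   # don't let this shell send it SIGHUP on exit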
If for some reason Ctrl+Z is also not working, go to another terminal, find the process ID (using ps) and run:
kill -SIGSTOP PID
kill -SIGCONT PID
SIGSTOP will suspend the process and SIGCONT will resume the process, in background. So now, closing both your terminals won't stop your process.
The command that separates a running job from the shell (i.e. makes it behave as if started with nohup) is disown, and it is a basic shell builtin.
From bash-manpage (man bash):
disown [-ar] [-h] [jobspec ...]
Without options, each jobspec is removed from the table of active jobs. If the -h option is given, each jobspec is not removed from the table, but is marked so that SIGHUP is not sent to the job if the shell receives a SIGHUP. If no jobspec is present, and neither the -a nor the -r option is supplied, the current job is used. If no jobspec is supplied, the -a option means to remove or mark all jobs; the -r option without a jobspec argument restricts operation to running jobs. The return value is 0 unless a jobspec does not specify a valid job.
That means that a simple
disown -a
will remove all jobs from the job table and make them behave as if run with nohup.
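A quick illustration of disown -a with two throwaway jobs:
sleep 1000 &
sleep 2000 &
disown -a      # both jobs leave the job table; the shell won't HUP them on exit
jobs           # prints nothing any more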
These are good answers above, I just wanted to add a clarification:
You can't disown a pid or process, you disown a job, and that is an important distinction.
A job is the shell's notion of a process that is attached to it; therefore you have to throw the job into the background (not merely suspend it) and then disown it.
Issue:
% jobs
[1] running java
[2] suspended vi
% disown %1
See http://www.quantprinciple.com/invest/index.php/docs/tipsandtricks/unix/jobcontrol/
for a more detailed discussion of Unix Job Control.
Unfortunately disown is specific to bash and not available in all shells.
Certain flavours of Unix (e.g. AIX and Solaris) have an option on the nohup command itself which can be applied to a running process:
nohup -p pid
See http://en.wikipedia.org/wiki/Nohup
Node's answer is really great, but it left open the question of how stdout and stderr can be redirected. I found a solution on Unix & Linux, but it was also not complete. I would like to merge these two solutions. Here it is:
For my test I made a small bash script called loop.sh, which prints its own PID and sleeps a minute in an infinite loop.
$ ./loop.sh
Now get the PID of this process somehow. Usually ps -C loop.sh is good enough, but in my case the script prints it itself.
Now we can switch to another terminal (or press ^Z and stay in the same terminal). Then gdb should be attached to the process:
$ gdb -p <PID>
This stops the script (if running). Its state can be checked by ps -f <PID>, where the STAT field is 'T+' (or in case of ^Z 'T'), which means (man ps(1))
T Stopped, either by a job control signal or because it is being traced
+ is in the foreground process group
(gdb) call close(1)
$1 = 0
close() returns zero on success.
(gdb) call open("loop.out", 01102, 0600)
$6 = 1
open() returns the new file descriptor on success.
This open call is equivalent to open(path, O_TRUNC|O_CREAT|O_RDWR, S_IRUSR|S_IWUSR).
Instead of O_RDWR, O_WRONLY could be used, but /usr/sbin/lsof shows 'u' for all the std* file descriptors (FD column), which means O_RDWR.
I checked the values in /usr/include/bits/fcntl.h header file.
The output file could be opened with O_APPEND, as nohup would do, but this is not suggested by man open(2), because of possible NFS problems.
If we get -1 as a return value, then call perror("") prints the error message. If we need the errno value, use the p errno gdb command.
Now we can check the newly redirected file. /usr/sbin/lsof -p <PID> prints:
loop.sh <PID> truey 1u REG 0,26 0 15008411 /home/truey/loop.out
If we want, we can also redirect stderr to another file by running call close(2) and call open(...) again with a different file name.
Now the attached bash has to be released and we can quit gdb:
(gdb) detach
Detaching from program: /bin/bash, process <PID>
(gdb) q
If the script was stopped by attaching gdb from another terminal, it continues to run after the detach. We can switch back to loop.sh's terminal. Now it does not write anything to the screen any more, but it keeps running and writes into the file. We have to put it into the background, so press ^Z.
^Z
[1]+ Stopped ./loop.sh
(Now we are in the same state as if ^Z was pressed at the beginning.)
Now we can check the state of the job:
$ ps -f 24522
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID><PPID> 0 11:16 pts/36 S 0:00 /bin/bash ./loop.sh
$ jobs
[1]+ Stopped ./loop.sh
So the process should be running in the background and detached from the terminal. The number in square brackets in the jobs command's output identifies the job inside bash. We can use it in the following built-in bash commands by putting a '%' sign before the job number:
$ bg %1
[1]+ ./loop.sh &
$ disown -h %1
$ ps -f <PID>
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID><PPID> 0 11:16 pts/36 S 0:00 /bin/bash ./loop.sh
And now we can quit the calling bash. The process continues running in the background. When we quit, its PPID becomes 1 (the init(1) process) and its controlling terminal becomes unknown.
$ ps -f <PID>
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID> 1 0 11:16 ? S 0:00 /bin/bash ./loop.sh
$ /usr/bin/lsof -p <PID>
...
loop.sh <PID> truey 0u CHR 136,36 38 /dev/pts/36 (deleted)
loop.sh <PID> truey 1u REG 0,26 1127 15008411 /home/truey/loop.out
loop.sh <PID> truey 2u CHR 136,36 38 /dev/pts/36 (deleted)
Comment:
The gdb steps can be automated by creating a file (e.g. loop.gdb) containing the commands and running gdb -q -x loop.gdb -p <PID>. My loop.gdb looks like this:
call close(1)
call open("loop.out", 01102, 0600)
# call close(2)
# call open("loop.err", 01102, 0600)
detach
quit
Or one can use the following one-liner instead:
gdb -q -ex 'call close(1)' -ex 'call open("loop.out", 01102, 0600)' -ex detach -ex quit -p <PID>
I hope this is a fairly complete description of the solution.
Simplest steps:
Ctrl + Z  -> suspends the process
bg        -> resumes it and runs it in the background
disown %1 -> required only if you need to detach it from the terminal
To send a running process to nohup (http://en.wikipedia.org/wiki/Nohup):
nohup -p pid did not work for me.
Then I tried the following steps and they worked fine.
Run SOMECOMMAND,
say /usr/bin/python /vol/scripts/python_scripts/retention_all_properties.py 1.
Ctrl+Z to stop (pause) the program and get back to the shell.
bg to run it in the background.
disown -h so that the process isn't killed when the terminal closes.
Type exit to get out of the shell because now you're good to go as the operation will run in the background in its own process, so it's not tied to a shell.
This process is the equivalent of running nohup SOMECOMMAND.
Ctrl + Z - this will pause the job (not cancel it!)
bg - this will put the job in the background and resume it as a running process
disown -a - this will cut all ties between the shell and the job (so you can close the terminal and it will still run)
These simple steps will allow you to close the terminal while keeping the process running.
It won't put the process under nohup (based on my understanding of your question, you don't need it here).
On my AIX system, I tried
nohup -p <processid>
This worked well. It kept my process running even after I closed the terminal window. We have ksh as the default shell, so the bg and disown commands didn't work.
