I have a script (script.sh) that spawns a whole lot of child processes. If I run the script from the shell via ./script.sh, I can kill the whole process tree via
kill -- -<PID>
where PID is the process ID of the script.sh process (this apparently equals the group ID).
However, if I spawn the script from Ruby via
pid = Process.spawn(script.sh)
I cannot manage to kill the process tree.
Process.kill(9,pid)
only kills the parent process. And even worst, the following
Process.kill(9,-Process.getpgid(pid)) ### Don't try this line at home
terminates my computer.
Trying to kill the processes via
system("kill -- -#{pid}")
also fails.
How am I supposed to kill this process tree from Ruby?
I think I have found the solution. Spawning the process as
pid = Process.spawn(script.sh, :pgroup => true)
makes me able to kill the process group via
Process.kill(9,-Process.getpgid(pid))
It looks like bash groups processes by default, while Spawn doesn't enable this by default.
Related
Motivation:
In a Java program, I'm setting a bash script to be executed on -XX:OnOutOfMemoryError. This script is responsible for uploading the heap-dump to HDFS. However, quite often only a part of the file gets uploaded.
I'm suspecting the JVM gets killed by cluster manager before the upload script completes. My guess is the JVM receives a process group kill signal and takes the bash script, i.e. its child process, down too.
The Question:
Is there a way in unix to run a sub-process in such a way that it does not die when it's parent receives a group kill signal?
You can use disown. Start the process in the background and then disown it, and any kill signals to the process parent will no longer be propagated to the child.
Script would look something like:
./handler_script &
disown
On a linux box, I have at most 3 java jars files running. How do I quickly kill all 3 with one command?
Usually I would:
ps ex - get the processes running
then find the process ids then do:
kill -9 #### #### ####
Any way to shorten this process? My eyes hurts from squinting to find the process ids.
My script does the following:
nohup ./start-gossip &
nohup ./start &
nohup ./start-admin &
Is there a way to get the process ids of each without looking it up?
Short answer:
pkill java
This looks up a process (or processes) to kill by name. This will find any other java processes too, so be careful. It also accepts -9, but you should avoid using -9 unless something is really broken.
EDIT:
Based on updates, you may be able to specify the script names to pkill as well (I'm not positive). But, the more traditional way to handle this issue is to leave pid files around. After you start a new process and background it, its pid is available in $!. If you write that pid to a file, then it's easy to check if the process is still running and kill just the processes you mean to. There is some chance that the pid will be reused, however.
You can save the PIDs when you start the processes so you can use them later:
nohup ./start-gossip &
START_GOSSIP_PID=$!
nohup ./start &
START_PID=$!
nohup ./start-admin &
START_ADMIN_PID=$!
...
kill -9 $START_GOSSIP_PID
kill -9 $START_PID
kill -9 $START_ADMIN_PID
This has the advantage (over pkill) of not killing off any other processes that coincidentally have similar names. If you don't want to perform the kill operation from the script itself, but just want to have the PIDs handy, write them to a file (from the script):
echo $START_GOSSIP_PID > /some/path/start_gossip.pid
Or even just do this when you launch the process, rather than saving the PID to a variable:
nohup ./start-gossip &
echo $! > /some/path/start_gossip.pid
To get the process id of that java process run
netstat -tuplen
Process ID (PID) of that process whom you want to kill and run
kill -9 PID
In Ruby, I'm running a system("command here") that is constantly watching changes for files, similar to tail. I'd like my program to continue to run and not halt at the system() call. Is there a way in Ruby to create another process so both can run independently, output results to the terminal, and then when you exit the program all processes the application created are removed?
Just combine spawn and waitall:
spawn 'sleep 6'
spawn 'sleep 8'
Process.waitall
You don't want to use system as that waits for the process to complete. You could use spawn instead and then wait for the processes (to avoid zombies). Then, when you want to exit, send a SIGTERM to your spawned processes. You could also use fork to launch your child processes but spawn is probably easier if you're using external programs.
You could also use process groups instead of tracking all the process IDs, then a single Process.kill('TERM', -process_group_id) call would take care of things. Your child processes should end up in the same process group but there is Process.setpgid if you need it.
Here's an example that uses fork (easier to get it all wrapped in one package that way).
def launch(id, sleep_for)
pid = Process.fork do
while(true)
puts "#{id}, pgid = #{Process.getpgid(Process.pid())}, pid = #{Process.pid()}"
sleep(sleep_for)
end
end
# No zombie processes please.
Process.wait(pid, Process::WNOHANG)
pid
end
# These just forward the signals to the whole process group and
# then immediately exit.
pgid = Process.getpgid(Process.pid())
Signal.trap('TERM') { Process.kill('TERM', -pgid); exit }
Signal.trap('INT' ) { Process.kill('INT', -pgid); exit }
launch('a', 5)
launch('b', 3)
while(true)
puts "p, pgid = #{Process.getpgid(Process.pid())}, pid = #{Process.pid()}"
sleep 2
end
If you run that in one terminal and then kill it from another (using the shell's kill command)you'll see that the children are also killed. If you remove the "forward this signal to the whole process group" Signal.trap stuff, then a simple SIGTERM will leave the children still running.
All of this assumes that you're working on some sort of Unixy system (such as Linux or OSX), YMMV anywhere else.
One more vote for using Spawn. We use it in Production a lot and it's very stable.
My scripts cdist-deploy-to and cdist-mass-deploy (from cdist configuration management) run interactively (i.e. are called by a user).
These scripts call a lot of scripts, which again call some scripts:
cdist-mass-deploy ...
cdist-deploy-to ...
cdist-explorer-run-global ...
cdist-dir ....
What I want is to exit / kill all scripts, as soon as cdist-mass-deploy is either stopped by control C (SIGINT) or killed with SIGTERM.
cdist-deploy-to can also be called interactively and should exhibit the same behaviour.
Using ps -ef... and co variants to find out all processes with the ppid looks like it could be quite unportable. Using $! does not work as in the deeper levels the children are no background processes.
I tried using the following code:
__cdist_kill_on_interrupt()
{
__cdist_tmp_removal
kill 0
exit 1
}
trap __cdist_kill_on_interrupt INT TERM
But this leads to ugly Terminated messages as well as to a segfault in the shells (dash, bash, zsh) and seems not to stop everything instantly anyway:
# cdist-mass-deploy -p ikq04.ethz.ch ikq05.ethz.ch
core: Waiting for cdist-deploy-to jobs to finish
^CTerminated
Terminated
Terminated
Terminated
Segmentation fault
So the question is, how to cleanly exit including all (sub-)children in a portable manner (bourne shell, no csh support needed)?
You don't need to handle ^C, that will result in a signal being sent to the whole process group, which will kill all the processes that are not in the background. So you don't need to catch INT.
The only reason you get a Terminated when you kill them is that kill sends TERM by default, but that's reasonable if you are handling a TERM in the first place. You could use kill -INT 0 if you want to avoid the messages.
(responding with extra info)
If the child processes are run in the background, you can get their process ids just after you start them, using the $! special shell variable. Gather these together in a variable and just kill them all when you need to terminate.
I wanted to know why i am seeing a different behaviour in the background process in Bash shell
Case 1: Logged in to Unix server using Putty(SSH)
By default it uses csh shell
I changed to bash shell
typed sleep 2000 &
press enter
It gave me the job number. Now i killed my session by clicking the x in the putty window
Now open another session and tried to lookup the process..the process died.
Case 2:Case 1: Logged in to Unix server using Putty(SSH)
By default it uses csh shell
I changed to bash shell
vi mysleep.sh
sleep 2000 & Saved mysleep.sh
./mysleep.sh
Diff here is..instead of executing the sleep command directly i am storing the sleep command in a file and executing the file.
Now i killed my session by clicking the x in the putty window
Now open another session and tried to lookup the process..the process is still there
Not sure why this is happening. I thought i need to do disown in bash to run the process even after logging out.
One diff i see in the parent process id..In the second case..the parent process id for the sleep 2000 becomes 1. Looks like as soon as process for mysleep.sh died the kernel assigned the parent process to 1.
The difference here is indeed the intervening process.
When you close the terminal window, a HUP signal (related to "nohup" as an0nymo0usc0ward mentioned) is sent to the processes running in it. The default action on receiving HUP is to die - from the signal(3) manpage,
No Name Default Action Description
1 SIGHUP terminate process terminal line hangup
In your first example, the sleep process directly receives this HUP signal and dies because it isn't set to do anything else. (Some processes catch HUP and use it to perform some action, e.g. reread some configuration files)
In the second example, the shell process running your shell script has already died, so the sleep process never gets the signal. In UNIX, every process must have a parent process due to the internals of how the wait(2) family of calls works and indeed processes in general. So when the parent process dies, the kernel gives it to init (pid 1, as you note) as a foster child.
Orphan process (on wikipedia) has some more information available about it, also see Zombie process for some additional technical details.
Already running process?
^z
bg
disown %<jobid>
New process/script (on local machine's console)?
nohup script.sh &
New process/script (on remote machine's console)?
Depending on your need,
there are two options [ there will be more ;-) ]
ssh remotehost 'nohup /path/to/script.sh </dev/null > nohup.out 2>&1 &'
OR
use 'screen'
Try "nohup cmd args..."
Steven's answer is correct, but I'd like to highlight the tricky part here again:
=> Using a bash script that just executes sleep in the background
The effect of this is that the "script" exits almost immediately (since it's done all its commands). However, it did create a child process (sleep) during its lifetime. The effect of this is that:
The "script" cannot be the parent anymore, and sleep is orphaned to init (which shows nicely in a pstree)
The bash shell where you started the script from has no underlying jobs anymore
Note that this stuff all happens when you executed the script, and has nothing to do with any ssh logout/putty closing.
When you then finally close your putty session, bash receives a "SIGHUP", but doesn't forward it to any other process (since there are no jobs left)
In the other case, bash did still have a job left, which it then sent the SIGHUP to, causing it to end (as you noticed)
Hope this helps