Linux - Child process to survive parent process tree kill - bash

Motivation:
In a Java program, I'm setting a bash script to be executed on -XX:OnOutOfMemoryError. This script is responsible for uploading the heap dump to HDFS. However, quite often only part of the file gets uploaded.
I suspect the JVM gets killed by the cluster manager before the upload script completes. My guess is that the JVM receives a process group kill signal, which takes down the bash script, i.e. its child process, too.
The Question:
Is there a way in Unix to run a sub-process so that it does not die when its parent receives a group kill signal?

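You can use disown. Start the process in the background and then disown it: bash removes it from its job table and will no longer forward SIGHUP to it when the shell exits. Note that disown does not move the child to a different process group, so a signal sent to the whole group still reaches it; for that case the child has to be started in its own session or process group, for example with setsid.
The script would look something like: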
./handler_script &
disown
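If the kill really is aimed at the whole process group, a variant using setsid (from util-linux) may be worth trying. The following is a minimal sketch, not a verified fix for the JVM case: launch_detached and pgid_of are hypothetical helper names, sleep 30 stands in for ./handler_script, and reading /proc is Linux-specific.

```shell
#!/bin/sh
# launch_detached is a hypothetical helper, not a standard command.
# It starts "$@" in a new session (and so a new process group) and prints its PID.
launch_detached() {
    setsid sh -c '"$@" </dev/null >/dev/null 2>&1 & echo "$!"' sh "$@"
}

# pgid_of prints the process group ID of a PID (Linux-specific: reads /proc).
pgid_of() {
    stat=$(cat "/proc/$1/stat") || return 1
    set -- ${stat##*) }     # fields after "(comm) ": state, ppid, pgrp, ...
    echo "$3"
}

handler_pid=$(launch_detached sleep 30)   # sleep 30 stands in for ./handler_script
own_pgid=$(pgid_of "$$")
handler_pgid=$(pgid_of "$handler_pid")
echo "script pgid: $own_pgid, handler pgid: $handler_pgid"

kill "$handler_pid"                       # clean up the stand-in process
```

The two process group IDs differ, so a signal addressed to the script's group (kill -- -<PGID>) is not delivered to the handler.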

Related

Killing process group from Ruby kills my whole computer

I have a script (script.sh) that spawns a whole lot of child processes. If I run the script from the shell via ./script.sh, I can kill the whole process tree via
kill -- -<PID>
where PID is the process ID of the script.sh process (this apparently equals the group ID).
However, if I spawn the script from Ruby via
pid = Process.spawn("./script.sh")
I cannot manage to kill the process tree.
Process.kill(9,pid)
only kills the parent process. Even worse, the following
Process.kill(9,-Process.getpgid(pid)) ### Don't try this line at home
terminates my computer.
Trying to kill the processes via
system("kill -- -#{pid}")
also fails.
How am I supposed to kill this process tree from Ruby?
I think I have found the solution. Spawning the process as
pid = Process.spawn("./script.sh", :pgroup => true)
makes me able to kill the process group via
Process.kill(9,-Process.getpgid(pid))
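It looks like an interactive bash puts each command it starts into its own process group by default, while Process.spawn does not unless :pgroup is set. Without it, the child stays in the Ruby process's own group, so killing that group also kills Ruby itself and everything else that shares the group.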
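The same idea can be tried from a plain shell script. This is a rough sketch under a few assumptions: setsid (util-linux) is available, the two sleeps stand in for the children of script.sh, and the temporary file is only there so that the group leader can report its PID.

```shell
#!/bin/sh
# Start a small process tree in its own process group, then signal the group.
# setsid makes the new shell a session and process-group leader,
# so its PID doubles as the process group ID.
pidfile=$(mktemp)
setsid sh -c 'echo "$$" > "$1"; sleep 30 & sleep 30 & wait' sh "$pidfile" \
    </dev/null >/dev/null 2>&1 &

# Wait until the leader has reported its PID.
while [ ! -s "$pidfile" ]; do sleep 1; done
pgid=$(cat "$pidfile")
rm -f "$pidfile"

# A negative PID addresses the whole group; "--" stops option parsing.
kill -- -"$pgid"
kill_status=$?
echo "signalled group $pgid, kill exit status $kill_status"
```

Because the tree has its own group, the negative PID only reaches the leader and its two children, not the script that launched them.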

If I run a script with nohup, which in turn calls another script, is the other script affected by nohup?

So let's say I have a script called script1 which somewhere in the code calls script2:
...
./script2
...
And let's say I run script1 as such:
nohup ./script1
Will script2 be affected by the nohup?
Yes. nohup runs the command with SIGHUP set to be ignored, and an ignored signal stays ignored in child processes across fork and exec, so script2 is covered as well unless it explicitly resets its SIGHUP handling.
The name of the command comes from "NO Hang-UP", referring to the SIGHUP signal. The signal notifies processes that the terminal has been closed and no more input/output is possible; it is sent to the processes attached to that terminal. Besides ignoring SIGHUP, nohup redirects standard output and standard error away from the terminal (to nohup.out by default) when they point at it, so the command does not fail writing to a terminal that is gone. On Unix-like OSs, child processes inherit this I/O redirection from the parent as well.
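A quick way to check the inheritance is to let a nested shell send SIGHUP to itself. This is only a sketch: script1 and script2 are modelled here as nested sh -c calls rather than real files.

```shell
#!/bin/sh
# The outer "sh -c" plays script1, the inner one plays script2.
# The inner shell sends itself SIGHUP and then tries to print a message.
with_nohup=$(nohup sh -c 'sh -c "kill -HUP \$\$; echo survived"' 2>/dev/null)
without_nohup=$(sh -c 'sh -c "kill -HUP \$\$; echo survived"' 2>/dev/null)

echo "with nohup:    '${with_nohup}'"
echo "without nohup: '${without_nohup}'"
```

Under nohup the inner shell ignores the signal and prints its message. Without nohup it is normally terminated before the echo, unless the environment running the test already ignores SIGHUP.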

What purpose does using exec in docker entrypoint scripts serve?

For example in the redis official image:
https://github.com/docker-library/redis/blob/master/2.8/docker-entrypoint.sh
#!/bin/bash
set -e
if [ "$1" = 'redis-server' ]; then
    chown -R redis .
    exec gosu redis "$@"
fi
exec "$@"
Why not just run the commands as usual without exec preceding them?
As @Peter Lyons says, using exec will replace the parent process, rather than have two processes running.
This is important in Docker for signals to be proxied correctly. For example, if Redis was started without exec, it will not receive a SIGTERM upon docker stop and will not get a chance to shut down cleanly. In some cases, this can lead to data loss or zombie processes.
If you do start child processes (i.e. don't use exec), the parent process becomes responsible for handling and forwarding signals as appropriate. This is one of the reasons it's best to use supervisord or similar when running multiple processes in a container, as it will forward signals appropriately.
Without exec, the parent shell process survives and waits for the child to exit. With exec, the child process replaces the parent process entirely, so when there's nothing left for the parent to do after starting the child, I would consider exec slightly more precise/correct/efficient. In the grand scheme of things, I think it's probably safe to classify it as a minor optimization.
Without exec:
parent shell starts
parent shell forks child
child runs
child exits
parent shell exits
With exec:
parent shell starts
parent shell replaces itself with the child program (no fork)
child program runs, taking over the shell's process
child exits
Think of it as an optimization like tail recursion.
If running another program is the final act of the shell script, there's not much of a need to have the shell run the program in a new process and wait for it. Using exec, the shell process replaces itself with the program.
In either case, the exit value of the shell script will be identical [1]. Whatever program originally called the shell script will see an exit value that is equal to the exit value of the exec'ed program (or 127 if the program cannot be found).
[1] Modulo corner cases such as a program doing something different depending on the name of its parent.
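The difference is easy to observe by printing process IDs. A small sketch, using nested sh -c calls as stand-ins for an entrypoint script and the program it starts:

```shell
#!/bin/sh
# Each outer shell prints its own PID, then runs an inner shell that prints its PID.
# The trailing ":" keeps the plain call from being the last command,
# since some shells quietly exec a final command on their own.
without_exec=$(sh -c 'echo "$$"; sh -c "echo \$\$"; :')
with_exec=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')

echo "without exec:" $without_exec   # two different PIDs
echo "with exec:   " $with_exec      # the same PID twice

# The caller sees the exit status of the exec'ed program.
exec_status=0
sh -c 'exec sh -c "exit 3"' || exec_status=$?
echo "exit status seen by the caller: $exec_status"
```

With exec there is only ever one process, so a signal sent to that PID (as docker stop does for PID 1 in the container) lands directly on the program rather than on a shell sitting in front of it.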

Script which launches another application will bring it down on exit

I have a script which launches another application using nohup my_app &, but when the initial script dies the launched process also goes down. As I understand it, since it was run with nohup that should not happen. The original script is also called with nohup.
What went wrong there?
A very reliable script that has been used successfully for years, and has always terminated after invoking a nohup uses this construct:
nohup ${BinDir}/${Watcher} >${DataDir}/${Watcher}.nohup.out 2>&1 &
Perhaps the problem is that output is not being managed?
nohup does not guarantee that a (child) process keeps running when the (parent) process is killed. nohup is used, for example, when you connect to a server over ssh and start a process there. If you log out, the process will terminate (logging out sends SIGHUP to the process, causing it to terminate); using nohup avoids this behaviour, and your process keeps running after you have logged out.
If you need a program that keeps running in the background even after its parent process has terminated, try running it as a daemon.
It depends on what my_app does: it might set its own signal dispositions. You probably know that nohup ignores the hang-up signal SIGHUP, and that this is inherited by the target program. If the target program does its own signal handling, it might be setting SIGHUP to, for example, SIG_DFL, the default action (which is to die).
To check, run strace -f -o out or truss -f -o out on the command. This writes all the system calls to a file called 'out'. You should be able to spot the SIGHUP disposition being changed, if it is.

Why do unix background processes sometimes die when I exit my shell?

I wanted to know why I am seeing different behaviour for a background process in the Bash shell.
Case 1: Logged in to Unix server using Putty(SSH)
By default it uses csh shell
I changed to bash shell
typed sleep 2000 &
pressed enter
It gave me the job number. Then I killed my session by clicking the x in the PuTTY window.
I opened another session and tried to look up the process: the process had died.
Case 2: Logged in to Unix server using Putty(SSH)
By default it uses csh shell
I changed to bash shell
vi mysleep.sh
sleep 2000 &
(saved mysleep.sh)
./mysleep.sh
The difference here is that instead of executing the sleep command directly, I am storing the sleep command in a file and executing the file.
Then I killed my session by clicking the x in the PuTTY window.
I opened another session and tried to look up the process: the process was still there.
I'm not sure why this is happening. I thought I needed to use disown in bash to keep the process running even after logging out.
One difference I see is in the parent process ID. In the second case, the parent process ID of the sleep 2000 becomes 1. It looks like as soon as the process for mysleep.sh died, the kernel assigned the parent process to 1.
The difference here is indeed the intervening process.
When you close the terminal window, a HUP signal (related to "nohup" as an0nymo0usc0ward mentioned) is sent to the processes running in it. The default action on receiving HUP is to die - from the signal(3) manpage,
No    Name      Default Action       Description
1     SIGHUP    terminate process    terminal line hangup
In your first example, the sleep process directly receives this HUP signal and dies because it isn't set to do anything else. (Some processes catch HUP and use it to perform some action, e.g. reread some configuration files)
In the second example, the shell process running your shell script has already died, so the sleep process never gets the signal. In UNIX, every process must have a parent process due to the internals of how the wait(2) family of calls works and indeed processes in general. So when the parent process dies, the kernel hands the orphaned child to init (pid 1, as you note) as a foster child.
Orphan process (on wikipedia) has some more information available about it, also see Zombie process for some additional technical details.
Already running process?
^z
bg
disown %<jobid>
New process/script (on local machine's console)?
nohup script.sh &
New process/script (on remote machine's console)?
Depending on your need,
there are two options [ there will be more ;-) ]
ssh remotehost 'nohup /path/to/script.sh </dev/null > nohup.out 2>&1 &'
OR
use 'screen'
Try "nohup cmd args..."
Steven's answer is correct, but I'd like to highlight the tricky part here again:
=> Using a bash script that just executes sleep in the background
The effect of this is that the "script" exits almost immediately (since it's done all its commands). However, it did create a child process (sleep) during its lifetime. The effect of this is that:
The "script" cannot be the parent anymore, and sleep is orphaned to init (which shows nicely in a pstree)
The bash shell where you started the script from has no underlying jobs anymore
Note that this stuff all happens when you executed the script, and has nothing to do with any ssh logout/putty closing.
When you then finally close your PuTTY session, bash receives a SIGHUP but doesn't forward it to any other process (since there are no jobs left).
In the other case, bash did still have a job left, to which it then sent the SIGHUP, causing it to end (as you noticed).
Hope this helps
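The re-parenting described above can be reproduced without logging out. A minimal sketch, assuming Linux (it reads /proc); the inner sh -c stands in for mysleep.sh and ppid_of is a hypothetical helper name:

```shell
#!/bin/sh
# ppid_of prints the parent PID of a process (Linux-specific: reads /proc).
ppid_of() {
    stat=$(cat "/proc/$1/stat") || return 1
    set -- ${stat##*) }     # fields after "(comm) ": state, ppid, pgrp, ...
    echo "$2"
}

# The inner shell backgrounds a sleep, prints its own PID and the sleep's PID,
# then exits immediately, just like the script in case 2.
set -- $(sh -c 'sleep 30 >/dev/null 2>&1 & echo "$$ $!"')
script_pid=$1
sleep_pid=$2
new_parent=$(ppid_of "$sleep_pid")

echo "script was PID $script_pid; sleep ($sleep_pid) now has parent $new_parent"
kill "$sleep_pid"    # clean up the stand-in process
```

The new parent is typically 1, although on some systems a sub-reaper process adopts orphans instead. Either way, the sleep is no longer a job of any shell, so nothing forwards SIGHUP to it.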
