How does launching processes with & (in bash) and killing them work? - bash

I have a script that launches another script in the background, and then terminates it. I was expecting the child script to be gone, but in the end it still manages to print some output. Here is the example:
in script one.sh:
echo "this is one"
./two.sh &
sleep 1
pid=$!
kill $pid
echo "this was one"
in script two.sh:
echo "this is two"
./three.sh
echo "this was two"
in script three.sh:
echo "this is three"
sleep 5
echo "this was three"
I ran ./one.sh, which is supposed to run two.sh in the background, which in turn runs three.sh, but not in the background! The output I get is:
this is one
this is two
this is three
this was one
this was three
Shouldn't "this was three" be missing from the output, since three.sh was not run in the background and two.sh was terminated by one.sh? Could you also point me towards any documentation that describes how processes behave when (not) in the background and what happens when they are terminated?
Thank you very much for all your help!

When you start a new process from a bash script, this is basically done via fork().
The new process, referred to as the child, is an exact duplicate of the calling process, referred to as the parent (except for a number of points that can be found in man fork).
If a parent dies the child becomes a child of the init process.
Then it is the role of the init process to collect the child's return code (reaping) after it has exited. So when you kill two.sh, three.sh isn't killed but simply gets a different parent, and that is the reason for the trailing "this was three".
The question is discussed from a C point of view here: How to make child process die after parent exits?
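One way to watch the reparenting happen is a small variation of one.sh (a sketch only; the second sleep just gives the kill time to take effect, and the ps/grep line assumes a Linux-style ps):
#!/bin/bash
./two.sh &
pid=$!
sleep 1
kill "$pid"                  # terminates two.sh only
sleep 1
# three.sh is still running; its parent is now init (PPID 1, or a per-user
# init/systemd process on modern systems)
ps -ef | grep '[t]hree.sh'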

You're killing the backgrounded process two.sh, but not its child process three.sh.
This question:
Best way to kill all child processes
has more info on killing child processes.
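If the intent is for the kill in one.sh to take down three.sh as well, one common approach (a sketch, assuming the util-linux setsid command is available) is to start two.sh in its own process group and signal the whole group by passing a negative PID to kill:
#!/bin/bash
echo "this is one"
setsid ./two.sh &      # two.sh becomes the leader of a new process group
pid=$!
sleep 1
kill -- -"$pid"        # negative PID: signal every process in that group,
                       # so three.sh gets the SIGTERM too
echo "this was one"
Note that this relies on setsid exec'ing two.sh directly (the usual case in a non-interactive script); if setsid has to fork, $! is setsid's PID rather than the group leader's.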

The reason this may seem surprising is that one might expect the TERM signal (the default from "kill") to be propagated to child processes; in other words, that the SIGTERM signal (signal #15) received by two.sh would be propagated to three.sh as well. However, this is not actually the case. Killing two.sh simply leaves three.sh to be fostered out to the "init" process (process ID 1) as its new parent, and init will clean up after three.sh when it exits.
The situation gets more complicated with process groups, and the bash documentation talks about how keyboard-generated signals get sent to all processes within the foreground process group, often a pipeline being run without an "&" on the end. However, these issues don't apply to the example scripts.
Note: in Unix, you shouldn't use ".sh" extensions on executable scripts; focus on putting the right "#!/bin/bash" or "#!/bin/sh" on the first line instead. Commands should not expose their implementation language in their names, lest you end up stuck with a now-wrong extension when the implementation language changes but other code has come to rely on the original name.

Related

Why does bash "forget" about my background processes?

I have this code:
#!/bin/bash
pids=()
for i in $(seq 1 999); do
    sleep 1 &
    pids+=( "$!" )
done
for pid in "${pids[@]}"; do
    wait "$pid"
done
I expect the following behavior:
spin through the first loop
wait about a second on the first pid
spin through the second loop
Instead, I get this error:
./foo.sh: line 8: wait: pid 24752 is not a child of this shell
(repeated 171 times with different pids)
If I run the script with shorter loop (50 instead of 999), then I get no errors.
What's going on?
Edit: I am using GNU bash 4.4.23 on Windows.
POSIX says:
The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.
{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:
$ getconf CHILD_MAX
13195
Bash stores the statuses of at most twice that many exited background processes in a circular buffer, and says not a child of this shell when you call wait on the PID of an old one that has been overwritten. You can see how it's implemented here.
The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:
sleep is executed in the background via a fork+exec.
At some point, sleep exits leaving behind a zombie.
That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.
However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that they can deallocate the system resources those processes were using. Then when you call wait, the shell just hands you whatever value is stored in memory, but the zombie could be long gone by then.
Now, because all of these exit statuses are being stored in memory, there is a practical limit to how many background processes can exit without you calling wait before you've filled up all the memory you have available for this in the shell. I expect that you're hitting this limit somewhere in the several hundreds of processes in your environment, while other users manage to make it into the several thousands in theirs. Regardless, the outcome is the same - eventually there's nowhere to store information about your children and so that information is lost.
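One practical way to stay clear of that limit, if all you need is to know that every job has finished rather than each job's individual exit status, is a bare wait (a sketch of the same loop):
#!/bin/bash
for i in $(seq 1 999); do
    sleep 1 &
done
# A bare `wait` blocks until every live child has exited; it never has to look
# up a saved status by PID, so it cannot hit "pid ... is not a child of this shell".
wait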
I can reproduce on ArchLinux with docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids' and any earlier version. I can't reproduce with bash:5.1.0.
What's going on?
It looks like a bug in your version of Bash. There were a couple of improvements in jobs.c and wait.def in Bash 5.1, and "Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler" is mentioned in the changelog. It looks like an issue with handling a SIGCHLD signal while already handling another SIGCHLD signal.

How to write to a coprocess from a child process of the parent that opened the coprocess

I am using a coprocess inside my main parent process to send commands to a shell that I cannot drive any other way (the shell I open in the coprocess is not maintained by me and executes "newgrp" and "exec" commands that stop me from sending commands to it directly from my script, so I need the coprocess to execute commands in that shell on my behalf). So far I have been using one thread, the parent process, to push commands to the coprocess, but now I also need to send commands from several child processes because of an optimization step. The bash documentation says the coprocess's file descriptors are not available in subshells, and this is in fact true; when I opened a subshell I got the following error message from bash:
[...]/automated_integration/clif_ai_common.sh: line 396: ${!clifAi_sendCmdToCoproc_varName}: Bad file descriptor
The code that makes this message appear is as follows:
if [[ ${PARAM_NO_MOVING_VERIF_TB_TAGS} != true ]]; then
    (
        clifAi_log ${CLIFAI_LOGLEVEL_INFO} "" "clifAi_sanityRegression_callbackRunning" "Populating moving VERIF and TB tags in the background..."
        clifAi_popVerifTags "${clifAi_sanityRegression_callbackRunning_coproc}" "${clifAi_sanityRegression_callbackRunning_wslogfile}" "${PARAM_OPTLEVEL}" "${CONST_EXCLUDE_FILTER}" "${CONST_DIR_TO_OPT}" ${clifAi_sanityRegression_callbackRunning_excludeList}
        clifAi_popTbTags "${clifAi_sanityRegression_callbackRunning_coproc}" "${clifAi_sanityRegression_callbackRunning_wslogfile}"
        rm -rf ${VAR_VERIFTBTAG_SEMAPHORE_FILE}
    ) &
fi
Bash reports the same error if I move this piece of code into a function and call it with & but without the ( ), so with no explicit subshell. This is also understandable; it will still spawn a child process, whether or not it runs in a subshell.
My question is, how can I write to the coprocess owned by the parent process from child processes, too? What is the best practice?
Many thanks in advance,
Geza Balazs
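One workaround I have seen for exactly this situation (a sketch only; the coproc command, fd number 3, and the echoed command are placeholders, not the real shell) is to duplicate the coprocess's write descriptor onto a fixed fd with exec before forking, since an explicitly duplicated fd remains usable in subshells:
#!/bin/bash
coproc MYSH { bash --norc; }        # placeholder for the real coprocess shell

# Copy the coprocess's stdin pipe to fd 3; unlike ${MYSH[1]} itself,
# this copy is still open in subshells and backgrounded children.
exec 3>&"${MYSH[1]}"

(
    # a child process can now talk to the coprocess through fd 3
    echo 'echo hello-from-child' >&3
) &
wait $!          # wait for the child only; the coprocess keeps running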

killing a background process with shell in Ubuntu

I tried killing a process from a shell script on Ubuntu; the process is created like this:
#!/bin/bash
# There should be code here that can kill my app (run in the background)
echo "app will be run."
java -jar path/to/my/jar/file.jar /arguman/of/myApp.txt &
disown
I know how to kill an app manually, like this:
ps -ax -u| grep appName
and find processid then,
kill [processId]
Is it possible to do this from the script? If yes, how?
Thank you.
You can actually find examples of how to do this right here, on this site, by doing a simple search. (Or Google it: "bash shell wait.") When you execute any background process, you can get the PID ("process id") of the new child. You can wait on the child to finish. You can also kill it.
Shell commands that show you executing jobs also provide their PID.
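A minimal sketch of that approach applied to the script from the question (the sleep stands in for whatever work the script does before it decides to stop the app):
#!/bin/bash
echo "app will be run."
java -jar path/to/my/jar/file.jar /arguman/of/myApp.txt &
app_pid=$!              # PID of the backgrounded java process

sleep 10                # placeholder for the script's real work

kill "$app_pid"         # send SIGTERM to the app
wait "$app_pid"         # reap it and collect its exit status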
However, bear in mind that "killing a child" ... while it won't land you in prison in this case ;-) ... "is generally not a good thing to do." You have no idea what the child was doing, what it had or had not finished doing, what data might now be in an inconsistent state, when you put that bullet through its brain. It is impossible to reliably debug any process that relies on murdering its children.
You can "send a different signal," such as SIGHUP or SIGUSR1, to a process, using the same kill command, and design the child process to be listening for that signal as an indication that it must "shut itself down, quickly." Always give a process a signal to "put its own affairs in order, and then to leave 'this mortal coil' ..."
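For the "design the child to listen" part, a bash child could look something like this sketch (the cleanup body is a placeholder; a Java app would do the equivalent with a shutdown hook). The parent then sends kill -USR1 "$pid" instead of a plain kill:
#!/bin/bash
# child.sh -- shuts itself down cleanly when asked to

cleanup() {
    echo "putting affairs in order..."   # flush buffers, remove temp files, ...
    exit 0
}
trap cleanup USR1 TERM    # handle SIGUSR1 and SIGTERM ourselves

while true; do            # stand-in for the child's real work
    sleep 1
done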

What purpose does using exec in docker entrypoint scripts serve?

For example in the redis official image:
https://github.com/docker-library/redis/blob/master/2.8/docker-entrypoint.sh
#!/bin/bash
set -e
if [ "$1" = 'redis-server' ]; then
chown -R redis .
exec gosu redis "$@"
fi
exec "$@"
Why not just run the commands as usual without exec preceding them?
As @Peter Lyons says, using exec will replace the parent process, rather than have two processes running.
This is important in Docker for signals to be proxied correctly. For example, if Redis was started without exec, it will not receive a SIGTERM upon docker stop and will not get a chance to shut down cleanly. In some cases, this can lead to data loss or zombie processes.
If you do start child processes (i.e. don't use exec), the parent process becomes responsible for handling and forwarding signals as appropriate. This is one of the reasons it's best to use supervisord or similar when running multiple processes in a container, as it will forward signals appropriately.
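As a small illustration of the signal point (a sketch; myapp is a placeholder binary, not part of the redis image):
#!/bin/bash
# docker-entrypoint.sh (sketch)

# Without exec: this shell stays around as PID 1 and itself receives the
# SIGTERM from `docker stop`; it does not forward it, so myapp never sees it.
#   myapp "$@"

# With exec: myapp replaces the shell as PID 1 and receives SIGTERM directly.
exec myapp "$@"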
Without exec, the parent shell process survives and waits for the child to exit. With exec, the child process replaces the parent process entirely so when there's nothing for the parent to do after forking the child, I would consider exec slightly more precise/correct/efficient. In the grand scheme of things, I think it's probably safe to classify it as a minor optimization.
without exec
parent shell starts
parent shell forks child
child runs
child exits
parent shell exits
with exec
parent shell starts
parent shell forks child, replaces itself with child
child program runs taking over the shell's process
child exits
Think of it as an optimization like tail recursion.
If running another program is the final act of the shell script, there's not much of a need to have the shell run the program in a new process and wait for it. Using exec, the shell process replaces itself with the program.
In either case, the exit value of the shell script will be identical[1]. Whatever program originally called the shell script will see an exit value equal to the exit value of the exec'ed program (or 127 if the program cannot be found).
[1] Modulo corner cases such as a program doing something different depending on the name of its parent.

How to kill all children of the current shell on interrupt?

My scripts cdist-deploy-to and cdist-mass-deploy (from cdist configuration management) run interactively (i.e. are called by a user).
These scripts call a lot of scripts, which again call some scripts:
cdist-mass-deploy ...
cdist-deploy-to ...
cdist-explorer-run-global ...
cdist-dir ....
What I want is to exit / kill all scripts, as soon as cdist-mass-deploy is either stopped by control C (SIGINT) or killed with SIGTERM.
cdist-deploy-to can also be called interactively and should exhibit the same behaviour.
Using ps -ef ... and its variants to find all processes with a given PPID looks like it could be quite unportable. Using $! does not work, as at the deeper levels the children are not background processes.
I tried using the following code:
__cdist_kill_on_interrupt()
{
    __cdist_tmp_removal
    kill 0        # signal every process in the current process group
    exit 1
}
trap __cdist_kill_on_interrupt INT TERM
But this leads to ugly Terminated messages as well as to a segfault in the shells (dash, bash, zsh) and seems not to stop everything instantly anyway:
# cdist-mass-deploy -p ikq04.ethz.ch ikq05.ethz.ch
core: Waiting for cdist-deploy-to jobs to finish
^CTerminated
Terminated
Terminated
Terminated
Segmentation fault
So the question is, how to cleanly exit including all (sub-)children in a portable manner (bourne shell, no csh support needed)?
You don't need to handle ^C: that results in a signal being sent to the whole foreground process group, which kills all the processes that are not in the background. So you don't need to catch INT.
The only reason you get a Terminated message when you kill them is that kill sends TERM by default, but that's reasonable if you are handling a TERM in the first place. You could use kill -INT 0 if you want to avoid the messages.
(responding with extra info)
If the child processes are run in the background, you can get their process ids just after you start them, using the $! special shell variable. Gather these together in a variable and just kill them all when you need to terminate.
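A sketch of that pattern, keeping to plain Bourne-shell constructs (worker_one and worker_two are placeholders for the real sub-scripts):
#!/bin/sh
pids=""

worker_one &  pids="$pids $!"      # remember each background child's PID
worker_two &  pids="$pids $!"

cleanup() {
    kill $pids 2>/dev/null         # terminate any children still running
    exit 1
}
trap cleanup INT TERM

wait                               # wait for the children (or the trap)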
