Why does the "yes | sleep 10" pipe not fail - bash

While thinking about how to implement a certain feature in one of my own programs, I've been wondering how bash internally handles pipelines like this one:
yes | sleep 10
This obviously does nothing, but I don't understand why it doesn't result in an error. I would have thought that either:
because sleep does not read from stdin, the pipe connecting both processes would fill up and cause yes to block indefinitely when it attempts to write to the now full pipe
if non-blocking IO is used, errors should occur if yes is executed first and writes to the pipe before the sleep process is even run and thus no process is connected to the read end of the pipe
I guess this is some major misunderstanding on my part. I've tried looking at the bash source code, but it went over my head.

Here's what actually happens when you run the shell command yes | sleep 10.
First the shell creates an anonymous pipe using the pipe system call. The pipe system call opens two file descriptors which are the read end and the write end of the pipe. Whatever gets written to the write end becomes available for reading from the read end.
After this, the shell creates two child processes using the fork system call. The two children run in parallel.
In one child, the shell connects the write end of the pipe to standard output and closes the read end. Then the shell calls the execve system call to replace the code image in this process by the code image of yes.
The program yes writes to the pipe as long as it can. Its writes first fill a small kernel buffer (typically 64 KiB on Linux); once that buffer is full and nothing is reading from the read end, the write call blocks. (The exact buffer size doesn't matter here.)
In the other child, the shell connects the read end of the pipe to standard input and closes the write end. Then the shell calls the execve system call to replace the code image in this process by the code image of sleep.
The program sleep does nothing for 10 seconds.
The original shell process closes both ends of the pipe and waits for both of its children to exit (using the wait system call).
Once the 10 seconds are up, the process running sleep exits. At this point, the read end of the pipe is no longer open in any process. When a process tries to write to a pipe whose read end is not open in any process, the kernel sends a SIGPIPE signal to the writing process. Thus the process running yes is killed by the SIGPIPE signal.
At this point, the shell detects that the child processes on both sides of the pipeline have exited. The pipeline's exit status is that of the right-hand side, which is 0 because sleep exits successfully (unless set -o pipefail is in effect).
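The steps above can be sketched in Python with the same system calls (os.pipe, os.fork, os.dup2, os.execvp, os.waitpid). This is only an illustration of the mechanism, not bash's code, and it assumes a Unix system with yes and sleep on the PATH; the function names spawn and run_pipeline are made up for this example. One Python-specific wrinkle: CPython ignores SIGPIPE at startup and ignored signals survive exec, so each child restores the default action that a child of bash would normally have.

```python
import os
import signal


def spawn(argv, pipe_end, std_fd, other_end):
    """Fork a child whose std_fd (0 or 1) is pipe_end, then exec argv."""
    pid = os.fork()
    if pid == 0:  # in the child
        try:
            # CPython starts with SIGPIPE ignored; restore the default action
            # so that writing to a pipe with no reader kills the process.
            signal.signal(signal.SIGPIPE, signal.SIG_DFL)
            os.dup2(pipe_end, std_fd)  # connect the pipe end to stdin/stdout
            os.close(pipe_end)
            os.close(other_end)        # close the end this child doesn't use
            os.execvp(argv[0], argv)   # replace the code image
        finally:
            os._exit(127)              # only reached if something above failed
    return pid


def run_pipeline(left, right):
    """Run `left | right` roughly the way a shell does; return raw wait statuses."""
    read_fd, write_fd = os.pipe()  # both ends are open from here on
    left_pid = spawn(left, write_fd, 1, read_fd)
    right_pid = spawn(right, read_fd, 0, write_fd)

    # The parent must close both ends too; if it kept the read end open,
    # `left` would block forever instead of receiving SIGPIPE.
    os.close(read_fd)
    os.close(write_fd)

    _, left_status = os.waitpid(left_pid, 0)
    _, right_status = os.waitpid(right_pid, 0)
    return left_status, right_status


if __name__ == "__main__":
    yes_status, sleep_status = run_pipeline(["yes"], ["sleep", "1"])
    print("yes killed by SIGPIPE:",
          os.WIFSIGNALED(yes_status) and os.WTERMSIG(yes_status) == signal.SIGPIPE)
    print("sleep exit status:", os.WEXITSTATUS(sleep_status))
```

Running it with sleep 1 instead of sleep 10 should show yes killed by SIGPIPE and sleep exiting with status 0. Leaving out os.close(read_fd) in the parent is an easy way to see yes block forever instead.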
because sleep does not read from stdin, the pipe connecting both processes would fill up and cause yes to block indefinitely when it attempts to write to the now full pipe
This is correct.
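The buffer that yes fills before blocking can be measured. This Python sketch (an illustration, not from the original answer; pipe_capacity is a made-up name) switches the write end to non-blocking mode so that a full pipe raises BlockingIOError (EAGAIN) instead of putting the caller to sleep, as happens to yes.

```python
import os


def pipe_capacity(chunk_size=1024):
    """Count the bytes a fresh pipe accepts before a write would block."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # EAGAIN instead of sleeping when full
    chunk = b"y" * chunk_size
    total = 0
    try:
        while True:
            # Nobody reads read_fd, just like sleep never reads its stdin.
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        pass  # the buffer is full; a blocking writer would sleep here forever
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print(pipe_capacity(), "bytes fit before the writer would block")
```

On x86-64 Linux this typically reports 65536 bytes, the default capacity described in pipe(7). Note that a full pipe in non-blocking mode only yields EAGAIN; it is not an error about a missing reader.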
if non-blocking IO is used, errors should occur if yes is executed first and writes to the pipe before the sleep process is even run and thus no process is connected to the read end of the pipe
This is not correct in several places. yes does not use non-blocking IO. It's executed in parallel with sleep, not first. Until sleep exits, there is never any point in time when no process has the read end of the pipe open. Depending on the timing, yes may start writing before sleep starts executing, maybe even before the child process for the sleep program is forked, but the read end became open when the pipe call returned, at the same time as the write end.
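A small Python check of the last point, assuming a Unix system (the function name is made up for this example): bytes written right after pipe() returns are accepted even though no reading process has started yet, and the write only fails once every copy of the read end is closed.

```python
import os


def write_before_any_reader_runs():
    """Show that the read end is open as soon as pipe() returns."""
    read_fd, write_fd = os.pipe()
    try:
        # No other process exists yet, but the read end is already open
        # (held by this process), so the bytes simply wait in the buffer.
        first = os.write(write_fd, b"y\n")

        # Once no process holds the read end, the write fails with EPIPE.
        # CPython ignores SIGPIPE, so this surfaces as BrokenPipeError
        # instead of killing the process the way SIGPIPE kills yes.
        os.close(read_fd)
        try:
            os.write(write_fd, b"y\n")
            broken = False
        except BrokenPipeError:
            broken = True
        return first, broken
    finally:
        os.close(write_fd)


if __name__ == "__main__":
    print(write_before_any_reader_runs())  # (2, True)
```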

Related

Why does bash "forget" about my background processes?

I have this code:
#!/bin/bash
pids=()
for i in $(seq 1 999); do
    sleep 1 &
    pids+=( "$!" )
done
for pid in "${pids[@]}"; do
    wait "$pid"
done
I expect the following behavior:
spin through the first loop
wait about a second on the first pid
spin through the second loop
Instead, I get this error:
./foo.sh: line 8: wait: pid 24752 is not a child of this shell
(repeated 171 times with different pids)
If I run the script with shorter loop (50 instead of 999), then I get no errors.
What's going on?
Edit: I am using GNU bash 4.4.23 on Windows.
POSIX says:
The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.
{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:
$ getconf CHILD_MAX
13195
Bash stores the statuses of at most twice that many exited background processes in a circular buffer, and reports "not a child of this shell" when you call wait on the PID of an old entry that has been overwritten. You can see how it's implemented here.
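To make the eviction concrete, here is a toy model in Python. It is a hypothetical illustration of the behavior described above, not bash's data structure: the class BgStatusTable and its capacity parameter are invented, whereas bash's real buffer is written in C and sized from CHILD_MAX.

```python
from collections import OrderedDict


class BgStatusTable:
    """Toy model of a shell remembering statuses of reaped background jobs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._statuses = OrderedDict()  # pid -> exit status, oldest first

    def record(self, pid, status):
        """Called when the shell reaps an exited background child."""
        self._statuses.pop(pid, None)   # a reused PID counts as a new entry
        self._statuses[pid] = status
        if len(self._statuses) > self.capacity:
            self._statuses.popitem(last=False)  # overwrite the oldest entry

    def wait(self, pid):
        """Mimic `wait PID`: only remembered PIDs can be waited for."""
        try:
            return self._statuses[pid]
        except KeyError:
            raise LookupError(
                f"wait: pid {pid} is not a child of this shell") from None


if __name__ == "__main__":
    table = BgStatusTable(capacity=4)
    for pid in range(1000, 1010):  # ten background jobs exit before any wait
        table.record(pid, 0)
    print(table.wait(1009))  # 0
    try:
        table.wait(1000)
    except LookupError as err:
        print(err)  # wait: pid 1000 is not a child of this shell
```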
The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:
sleep is executed in the background via a fork+exec.
At some point, sleep exits leaving behind a zombie.
That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.
However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that they can deallocate the system resources those processes were using. Then when you wait the shell just hands you whatever value is stored in memory, but the zombie could be long gone by then.
Now, because all of these exit statuses are stored in memory, there is a practical limit to how many background processes can exit without you calling wait before you've filled up all the space the shell sets aside for them. I expect that you're hitting this limit somewhere in the several hundred processes in your environment, while other users get into the several thousand in theirs. Regardless, the outcome is the same: eventually there's nowhere to store information about your children, so that information is lost.
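The proactive reaping described above can be imitated in Python with a SIGCHLD handler. This is a sketch of the idea assuming a Unix system, not how bash is written; exit_statuses, reap_children and start_background are names made up for the example. A handler like this, which reaps any child, would interfere with other code (such as subprocess) that expects to wait for its own children.

```python
import os
import signal
import time

exit_statuses = {}  # pid -> exit code, filled in as soon as children die


def reap_children(signum, frame):
    """SIGCHLD handler: collect every exited child without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children at all
        if pid == 0:
            return  # children remain, but none has exited yet
        exit_statuses[pid] = os.waitstatus_to_exitcode(status)


def start_background(exit_code):
    """Fork a child that exits immediately (a stand-in for `cmd &`)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)
    return pid


signal.signal(signal.SIGCHLD, reap_children)
pids = [start_background(code) for code in (0, 1, 2)]
time.sleep(0.5)  # the children exit and are reaped by the handler

# "wait PID" is now just a lookup: the zombies are already gone, so the
# kernel itself has nothing left to report for these PIDs.
for pid in pids:
    print(pid, exit_statuses.get(pid))
```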
I can reproduce this on ArchLinux with docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids' and with any earlier version. I can't reproduce it with bash:5.1.0.
What's going on?
It looks like a bug in your version of Bash. There were a couple of improvements to jobs.c and wait.def in Bash 5.1, and the changelog mentions "Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler". From the look of it, it's an issue with handling a SIGCHLD signal while another SIGCHLD signal is already being handled.

Ignore HUP signal in Bash script with pipe commands

I have the following script, which monitors the /tmp directory indefinitely. Whenever there is an operation on a file in this directory, the file name is read by the while loop, the first a character in the file name is replaced with a b character, and the modified file name is logged to the test.log file:
#!/bin/bash
trap ':' HUP
trap 'kill $(jobs -p)' EXIT
/usr/local/bin/inotifywait -q -m /tmp --format %f |
    while IFS= read -r filename; do
        echo "$filename" | sed 's/a/b/' > test.log
    done
This is a simplified version of the actual script. I also have a Sys-V style init script for the script above, and since I would like to stay LSB compliant, my init script has a force-reload option (Causes the configuration to be reloaded if the service supports this. Otherwise, the service is restarted.), which sends the HUP signal to the script. Before executing force-reload, which runs killproc -HUP test.sh, the output of pstree is the following:
# pstree -Ap 4424
test.sh(4424)-+-inotifywait(4425)
              `-test.sh(4426)
#
After executing strace killproc -HUP test.sh, the child shell is terminated:
# pstree -Ap 4424
test.sh(4424)---inotifywait(4425)
#
According to strace, killproc sent SIGHUP to processes 4424 and 4426, but only the latter was terminated.
What is the point of this child shell with PID 4426 in my example, i.e., why is it created in the first place? In addition, is there a way to ignore the HUP signal?
Pipeline commands are run in a subshell
The first part of your question is explained by the mechanism through which a shell (in this case Bash) runs commands in a pipeline.
A pipe is a FIFO (first in, first out) one-way inter-process communication (IPC) channel: it allows bytes to be written at one end (the write-only end) and read from the other (read-only end) without needing to read from or write to a physical filesystem.
A pipeline allows two different commands to communicate with each other through an anonymous or unnamed (i.e., has no entry in the filesystem) pipe.
When a simple command is executed by a shell, the command is run in a child process of the shell. If no job control is used, control of the terminal is regained by the shell when the child process terminates.
When two commands are run in a pipeline, both commands in the pipeline are executed as two separate child processes which run concurrently.
In Unix systems, pipes are created using the pipe(2) system call, which creates a new pipe and returns a pair of file descriptors with one referring to the read end and the other to the write end of the pipe.
With Bash on a GNU/Linux system, the subprocesses are created with fork(), which glibc implements on top of the clone(2) system call. Each child gets a copy of its parent's table of file descriptors, so both child subprocesses inherit the file descriptors of the anonymous pipe: one can write to it and the other can read from it.
In your case, the inotifywait command gets PID 4425, and its stdout is connected to the write end of the pipe, so that is where it writes.
At the same time, the right-hand side of the pipeline gets PID 4426, and its stdin file descriptor is set to the read end of the pipe. Since the subshell for the right-hand side of the pipeline isn’t an external command, the child process is shown with the same name as its parent, test.sh.
For more info, see man 7 pipe and the following links:
Anonymous pipe, Wikipedia article
Unix Pipeline, Wikipedia article
Signal handling
It took me ages (a couple of hours of research, in fact) to figure out why the trap for the SIGHUP signal wasn’t being ignored.
All my research indicated that a child process created by a clone(2) system call should also inherit the table of signal handlers of its parent process.
The Bash man page also states that
Command substitution, commands grouped with parentheses, and asynchronous commands are invoked in a subshell environment that is a duplicate of the shell environment, except that traps caught by the shell are reset to the values that the shell inherited from its parent at invocation.
It later states that
Signals ignored upon entry to the shell cannot be trapped or reset. Trapped signals that are not being ignored are reset to their original values in a subshell or subshell environment when one is created.
This indicates that subshells do not inherit signal handlers that are not ignored. As I understood it, your trap ':' HUP line meant that the SIGHUP signal was (effectively) being ignored (since the : builtin does nothing except return success) – and should in turn be ignored by the pipeline’s subshell.
However, I eventually came across the description of the trap builtin in the Bash man page which defines what Bash means by ignore:
If arg is the null string the signal specified by each sigspec is ignored by the shell and by the commands it invokes.
Simply changing the trap command to trap '' HUP ensures that the SIGHUP signal is ignored, for the script itself – and any subshells.
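The difference between trap ':' HUP and trap '' HUP comes down to how signal dispositions survive exec: an ignored signal stays ignored in the new program, while a caught signal is reset to its default action. A Python sketch that checks this, assuming a POSIX sh is available (CHILD and run_child_with are made up for the example):

```python
import signal
import subprocess

# The child shell sends SIGHUP to itself, then reports whether it survived.
CHILD = ["sh", "-c", "kill -HUP $$; echo survived"]


def run_child_with(disposition):
    """Run CHILD while this process's SIGHUP disposition is `disposition`."""
    previous = signal.signal(signal.SIGHUP, disposition)
    try:
        return subprocess.run(CHILD, capture_output=True, text=True)
    finally:
        signal.signal(signal.SIGHUP, previous)


# Like `trap '' HUP`: SIG_IGN is inherited across fork and exec.
ignored = run_child_with(signal.SIG_IGN)

# Like `trap ':' HUP`: a handler function cannot survive exec, so the
# child starts with the default action for SIGHUP (terminate).
handled = run_child_with(lambda signum, frame: None)

print(ignored.returncode, ignored.stdout.strip())
print(handled.returncode, handled.stdout.strip())
```

With SIG_IGN the child should print survived and exit with status 0; with the Python handler installed it should be killed by SIGHUP, which subprocess reports as a returncode of -1.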

Resource leaking of available PIDs by long running bash scripts

I am currently reading up on Bash scripting in more detail, especially process management, here. In the section on "PIDs and Parents" I found the following statement:
A process's PID will NEVER be freed up for use after the process dies UNTIL the parent process waits for the PID to see whether it ended and retrieve its exit code.
So if I understand this correctly, if I start a process in a bash script and the process terminates, its PID cannot be used by any other process. Wouldn't this mean that if I have a long-running script which repeatedly starts other subprocesses but never waits on them, I'll eventually have a resource leak, because the used PIDs are never returned to the system?
What if I actually wait for the other process, but the wait gets interrupted by a trap? Would this wait somehow still free up the PID, or do I have to wait again after the trap has been caught?
Luckily, you won't. I can't tell you exactly why, but you can easily test this. Run the following script (stop it with Ctrl+C):
#!/bin/bash
while true; do
    sleep 5 &
    sleep 1
done
You'll see that no zombies (leaked PIDs) accumulate, even after 6+ seconds. To see some zombies, use the following Python code (again, stop it with Ctrl+C):
#!/usr/bin/python
import subprocess, time
pl = []
while True:
    pl.append(subprocess.Popen(["sleep", "5"]))
    time.sleep(1)
After 6 seconds you'll see one zombie:
ps xaw | grep 'sleep'
...
26470 pts/2 Z+ 0:00 [sleep] <defunct>
...
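On Linux, the <defunct> state that ps shows can also be observed through /proc, which confirms the statement quoted in the question: the PID stays allocated until the parent waits. A Linux-only Python sketch (proc_state is a made-up helper):

```python
import os
import time


def proc_state(pid):
    """Return the state letter from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        # Format: "pid (comm) state ..."; comm may contain spaces, so split
        # after the last closing parenthesis.
        return f.read().rsplit(")", 1)[1].split()[0]


pid = os.fork()
if pid == 0:
    os._exit(0)  # the child exits at once

# Give the child up to two seconds to exit and become a zombie.
deadline = time.monotonic() + 2
while proc_state(pid) != "Z" and time.monotonic() < deadline:
    time.sleep(0.01)

state_before = proc_state(pid)   # "Z": dead, but its PID is still held
os.waitpid(pid, 0)               # the parent collects the exit status...
pid_released = not os.path.exists(f"/proc/{pid}")  # ...and the PID is freed
print(state_before, pid_released)  # Z True
```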
My guess is that bash waits anyway and stores the results, reaping the zombie processes with or without the builtin wait command. For the Python script, if you remove the pl.append part, garbage collection releases the objects and does its magic, reaping the zombies again. Also note that a child may never become a zombie at all (from Wikipedia, Zombie process):
...if the parent explicitly ignores SIGCHLD by setting its handler to SIG_IGN (rather than simply ignoring the signal by default) or has the SA_NOCLDWAIT flag set, all child exit status information will be discarded and no zombie processes will be left.
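The SIG_IGN case from the quote can be checked in Python on Linux (an illustrative sketch; the variable names are made up). With SIGCHLD explicitly ignored, the kernel reaps the child itself, so waitpid has nothing to collect: it waits until the child is gone and then fails with ECHILD, which Python raises as ChildProcessError.

```python
import os
import signal

# Explicitly set SIGCHLD to SIG_IGN (not merely the default action).
previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
try:
    pid = os.fork()
    if pid == 0:
        os._exit(3)  # this exit status will be thrown away
    try:
        # Blocks until the child has terminated, then fails with ECHILD.
        os.waitpid(pid, 0)
        status_discarded = False
    except ChildProcessError:
        status_discarded = True  # no zombie was left behind to collect
finally:
    signal.signal(signal.SIGCHLD, previous)

print(status_discarded)  # True
```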
You don't have to explicitly wait on foreground processes because the shell in which your script is running waits on them. The next process won't start until the previous one finishes.
If you start many long running background processes, you could use all available PIDs, but that's subject to the limit of ulimit -u (which could be unlimited).
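To see which limit applies on a given machine: ulimit -u corresponds to the RLIMIT_NPROC resource limit, which Python exposes through the resource module. A small Unix-only sketch (describe is a made-up helper). As far as I know, on glibc the value of getconf CHILD_MAX is derived from the same soft limit, but treat that as an assumption.

```python
import os
import resource

# `ulimit -u` is the soft RLIMIT_NPROC limit: processes per user.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)


def describe(limit):
    """Render a resource limit, spelling out RLIM_INFINITY."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


print("ulimit -u (soft):", describe(soft))
print("hard limit:", describe(hard))

# CHILD_MAX as reported by sysconf (the value `getconf CHILD_MAX` prints);
# a result of -1 means the system reports no definite limit.
print("sysconf CHILD_MAX:", os.sysconf("SC_CHILD_MAX"))
```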

How to kill all children of the current shell on interrupt?

My scripts cdist-deploy-to and cdist-mass-deploy (from cdist configuration management) run interactively (i.e. are called by a user).
These scripts call a lot of scripts, which again call some scripts:
cdist-mass-deploy ...
    cdist-deploy-to ...
        cdist-explorer-run-global ...
            cdist-dir ....
What I want is to exit / kill all scripts, as soon as cdist-mass-deploy is either stopped by control C (SIGINT) or killed with SIGTERM.
cdist-deploy-to can also be called interactively and should exhibit the same behaviour.
Using ps -ef... and similar variants to find all processes with a given parent PID (PPID) looks like it could be quite unportable. Using $! does not work, because at the deeper levels the children are not background processes.
I tried using the following code:
__cdist_kill_on_interrupt()
{
    __cdist_tmp_removal
    kill 0
    exit 1
}
trap __cdist_kill_on_interrupt INT TERM
But this leads to ugly Terminated messages as well as a segfault in the shells (dash, bash, zsh), and it doesn't seem to stop everything instantly anyway:
# cdist-mass-deploy -p ikq04.ethz.ch ikq05.ethz.ch
core: Waiting for cdist-deploy-to jobs to finish
^CTerminated
Terminated
Terminated
Terminated
Segmentation fault
So the question is, how to cleanly exit including all (sub-)children in a portable manner (bourne shell, no csh support needed)?
You don't need to handle ^C: it results in SIGINT being sent to the whole foreground process group, which kills all the processes that are not in the background. So you don't need to catch INT.
The only reason you get a Terminated when you kill them is that kill sends TERM by default, but that's reasonable if you are handling a TERM in the first place. You could use kill -INT 0 if you want to avoid the messages.
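Both ^C and kill 0 rely on process groups: one signal reaches every process in the group. The Python sketch below imitates that from outside, using a hypothetical sh -c command as a stand-in for the cdist scripts. start_new_session=True puts the stand-in into its own process group so the signal cannot reach the Python process itself; os.killpg(pgid, sig) is the same as kill(-pgid, sig).

```python
import os
import signal
import subprocess
import time

# Hypothetical stand-in for a script tree: a shell with two background
# children, all in a fresh process group led by the shell.
tree = subprocess.Popen(["sh", "-c", "sleep 30 & sleep 30 & wait"],
                        start_new_session=True)
time.sleep(0.5)  # give the shell time to start its children

# One signal to every process in the group, like `kill 0` (TERM by
# default) would do when run from inside the tree.
os.killpg(os.getpgid(tree.pid), signal.SIGTERM)
tree.wait()
print(tree.returncode)
```

The shell should report a returncode of -15 (killed by SIGTERM), and its two background sleep processes should receive the same signal.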
(responding with extra info)
If the child processes are run in the background, you can get their process ids just after you start them, using the $! special shell variable. Gather these together in a variable and just kill them all when you need to terminate.

Where does a signal sent to a process that called system go?

Given a very simple Ruby script:
child = fork do
  system 'sleep 10000'
end
5.times do
  sleep 1
  puts "send kill to #{child}"
  Process.kill("QUIT", child)
end
The QUIT signal is just lost. Where does it go? Is it something with a default handler that just ignores it?
How can I send a signal to all processes created by that fork? Is it possible to do that without searching for all child processes?
The problem is that the system call creates yet another child process running the given command in a subshell, so there are actually three processes running in your example. Additionally, the Ruby Kernel#system command is implemented via the standard C function system(3), which calls fork and exec to create the new process and (on most systems) ignores SIGINT and SIGQUIT, and blocks SIGCHLD.
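The system(3) behavior is easy to reproduce outside Ruby. This Python sketch assumes a Unix libc and says nothing about how a particular Ruby version implements Kernel#system; it uses os.system, which calls the C library's system(). The forked child blocks in system() while the parent sends SIGQUIT, just like the question's Process.kill("QUIT", child).

```python
import os
import signal
import time

child = os.fork()
if child == 0:
    # os.system() calls the C library's system(3). While it waits for
    # `sleep 2`, the calling process ignores SIGINT and SIGQUIT.
    os.system("sleep 2")
    os._exit(0)  # the child gets here because the SIGQUIT was ignored

time.sleep(0.5)                 # let the child get into system()
os.kill(child, signal.SIGQUIT)  # like Process.kill("QUIT", child)
_, status = os.waitpid(child, 0)
print(os.WIFEXITED(status), os.WEXITSTATUS(status))
```

On glibc or musl the child should survive and exit normally with status 0: the QUIT was discarded because system() had it ignored, which matches the "lost" signal in the question.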
If you simply call sleep(10000) instead of system("sleep 10000") then things should work as you expect. You can also trap SIGQUIT in the child to handle it gracefully:
child = fork do
  Signal.trap("QUIT") { puts "CHILD: ok, quitting time!"; exit }
  sleep(10000)
end
If you really need to use a "system" call from the child process then you might be better off using an explicit fork/exec pair (instead of the implicit ones in the system call), so that you can perform your own signal handling in the third forked child.
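A sketch of that explicit fork/exec approach in Python (child_main and forward are made-up names, and this is an illustration, not a drop-in Ruby solution). The child forks and execs sleep itself, installs its own SIGQUIT handler, and passes the request on as SIGTERM to the third process; SIGTERM is used rather than forwarding SIGQUIT so that sleep doesn't leave a core dump.

```python
import os
import signal
import time


def child_main():
    """Body of the forked child: explicit fork/exec instead of system()."""
    grandchild = os.fork()
    if grandchild == 0:
        try:
            os.execvp("sleep", ["sleep", "30"])  # stand-in for `sleep 10000`
        finally:
            os._exit(127)  # only reached if exec failed

    def forward(signum, frame):
        # Our own signal handling: terminate the command we started.
        os.kill(grandchild, signal.SIGTERM)

    signal.signal(signal.SIGQUIT, forward)
    # waitpid is retried automatically after the handler runs (PEP 475).
    _, status = os.waitpid(grandchild, 0)
    # Shell-style status: 128 + signal number if the command was killed.
    if os.WIFSIGNALED(status):
        os._exit(128 + os.WTERMSIG(status))
    os._exit(os.WEXITSTATUS(status))


child = os.fork()
if child == 0:
    child_main()

time.sleep(0.5)                 # let the child install its handler
os.kill(child, signal.SIGQUIT)  # like Process.kill("QUIT", child)
_, status = os.waitpid(child, 0)
print(os.WEXITSTATUS(status))
```

The parent should see exit status 143 (128 + 15), the shell convention for a command killed by SIGTERM, showing that the signal reached sleep through the child's handler.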
I think that you are sending the signal to the forked process correctly. I think the problem is with the system command. The system command creates a new fork and waits until it ends, and I think this waiting is blocking your QUIT signal. If you run your example as test.rb, you'll see three processes:
test.rb
test.rb
sleep 10000
If you send "TERM" or "KILL" instead of "QUIT", the second test.rb will die, but sleep 10000 will keep running!

Resources