Does '&' in bash cause Zombie Processes? - bash

A colleague insists that I need to call wait() after using & in a Bash script to spawn multiple child processes. I believe that the concern is that because the parent process is exiting before the child processes do, they'll be orphaned and left in zombie states.
I understand that fork() requires wait() or waitpid() to properly delete the file descriptors it creates. However, is this really needed for Bash? Isn't this something the bash subshell each child process is running in takes care of? In my own experimentation, I have not been able to recreate a situation in which a Bash child process I've created is left in a zombie state.

Processes whose parents die are reparented to init, which should eventually reap them when they exit. What causes zombie processes is when the parent process keeps living, but never gets around to reaping the child for some reason.

Related

Killing all descendent processes robustly on macOS

On Linux, Pid namespaces can be used to robustly kill all descendent (including orphaned & zombie) processes – see this answer for example.
What's the closest to a "robust" way to do the same on macOS? I can't rely on process groups unfortunately as some of the descendent processes alter them.
It's a gross kludge, but it might work: The first process would open a file descriptor so that, by default, all descendant processes inherit it. When it wants to kill them all, it runs lsof to find all processes with that file open and kills them.
It won't work for processes which have detached themselves, but you could walk the child process tree using proc_listchildpids() and send signals to each PID you obtain. There are probably some timing edge cases between checking a process's children and killing it - it could spawn more processes in this time. You could perhaps suspend all processes before listing their children and killing them. Processes whose parent has died should I think be reattached to their grandparent anyway though (I may be wrong on this) so in that case, as long as you keep calling proc_listchildpids() after sending each round of signals you should eventually end up in a steady state. (Ideally with no child processes left. But if they get into a really bad state [due to a kernel bug], some processes may be completely unkillable.)
proc_listchildpids() is declared in <libproc/libproc.h>.

Does Windows 7 recycle process id (PID) numbers?

I have this little test program that tracks PID's as they are created and shut down.
I am investigating a problem that my program has found and would like to ask you about this
in order to have a better idea on what's going on.
When a windows process is started, it gets a PID but when the process is shut down, does the PID
become retired (like a star basketballer's jersey number) or is it possible for a new, entirely
unrelated, process to be created under that released PID?
Thanks
Yes, process IDs may be recycled by the system. They become available for this as soon as the last handle to the process has been closed.
Raymond Chen discussed this matter here: When does a process ID become available for reuse?
The process ID is a value associated with the process object, and as
long as the process object is still around, so too will its process
ID. The process object remains as long as the process is still running
(the process implicitly retains a reference to itself) or as long as
somebody still has a handle to the process object.
If you think about it, this makes sense, because as long as there is
still a handle to the process, somebody can call WaitForSingleObject
to wait for the process to exit, or they can call GetExitCodeProcess
to retrieve the exit code, and that exit code has to be stored
somewhere for later retrieval.
When all handles are closed, then the kernel knows that nobody is
going to ask whether the process is still running or what its exit
code is (because you need a handle to ask those questions). At which
point the process object can be destroyed, which in turn destroys the
process ID.
I ran a test for about an hour and in that time 302 processes exits and 70 of them had PIDs in common (same PID was used for a new process). So that would say they are reused frequently.
Evidently, if the process is terminated, its PID is available for reuse.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683215%28v=vs.85%29.aspx
Remarks
Until a process terminates, its process identifier uniquely identifies it on the system. For more information about access rights, see Process Security and Access Rights.

Are Process::detach and Process::wait mutually exclusive (Ruby)?

I'm refactoring a bit of concurrent processing in my Ruby on Rails server (running on Linux) to use Spawn. Spawn::fork_it documentation claims that forked processes can still be waited on after being detached: https://github.com/tra/spawn/blob/master/lib/spawn.rb (line 186):
# detach from child process (parent may still wait for detached process if they wish)
Process.detach(child)
However, the Ruby Process::detach documentation says you should not do this: http://www.ruby-doc.org/core/classes/Process.html
Some operating systems retain the status of terminated child processes until the parent collects that status (normally using some variant of wait(). If the parent never collects this status, the child stays around as a zombie process. Process::detach prevents this by setting up a separate Ruby thread whose sole job is to reap the status of the process pid when it terminates. Use detach only when you do not intent to explicitly wait for the child to terminate.
Yet Spawn::wait effectively allows you to do just that by wrapping Process::wait. On a side note, I specifically want to use the Process::waitpid2 method to wait on the child processes, instead of using the Spawn::wait method.
Will detach-and-wait not work correctly on Linux? I'm concerned that this may cause a race condition between the detached reaper thread and the waiting parent process, as to who collects the child status first.
The answer to this question is there in the documentation. Are you writing code for your own use in a controlled environment? Or to be used widely by third parties? Ruby is written to be widely used by third parties, so their recommendation is to not do something that could fail on "some operating systems". Perhaps the Spawn library is designed primarily for use on Linux machines and tested only on a small subset thereof where this tactic works.
If you're distributing the code you're writing to be used by anyone and everyone, I would take Ruby's approach.
If you control the environment where this code will be run, I would write two tests:
A test that spawns a process, detaches it and then waits for it.
A test that spawns a process and then just waits for it.
Count the failure rate for both and if they are equal (within a margin that you feel is acceptable), go for it!

How are PIDs generated on Ubuntu?

I've just wrote a program that forks one process. The child process just displays "HI" 200 times. The father process just says he's the father.
I've printed out both pids.
When I run my program multiple times, I see that the parent's pid stays the same, which is normal. What I don't understand is why the child's pid keeps getting incremented by 2, and exactly 2.
My question: Is this the standard method of pid generation in Ubuntu? Incrementing by 2?
PIDs happen to be handed out monotonically increasing in Linux 2.6, but why does it matter which you get? Don't rely on any specific behavior. If there is a skip of +2 it might simply be because another process happened to spawn a child. Or because +1 would have reached a PID that is already in use.
Found a reference here saying that vfork() consumes a pid as a byproduct of its operation. As well, in some cases, if you're forking from a shell script, the fork might spawn a new shell before your actual script gets involved, which would also consume a pid.
I'd suggest suspending your program between a couple forks, and see if there's another process occupying those "missing" pids.

How to fire and forget a subprocess?

I have a long running process and I need it to launch another process (that will run for a good while too). I need to only start it, and then completely forget about it.
I managed to do what I needed by scooping some code from the Programming Ruby book, but I'd like to find the best/right way, and understand what is going on. Here's what I got initially:
exec("whatever --take-very-long") if fork.nil?
Process.detach($$)
So, is this the way, or how else should I do it?
After checking the answers below I ended up with this code, which seems to make more sense:
(pid = fork) ? Process.detach(pid) : exec("foo")
I'd appreciate some explanation on how fork works. [got that already]
Was detaching $$ right? I don't know why this works, and I'd really love to have a better grasp of the situation.
Alnitak is right. Here's a more explicit way to write it, without $$
pid = Process.fork
if pid.nil? then
# In child
exec "whatever --take-very-long"
else
# In parent
Process.detach(pid)
end
The purpose of detach is just to say, "I don't care when the child terminates" to avoid zombie processes.
The fork function separates your process in two.
Both processes then receive the result of the function. The child receives a value of zero/nil (and hence knows that it's the child) and the parent receives the PID of the child.
Hence:
exec("something") if fork.nil?
will make the child process start "something", and the parent process will carry on with where it was.
Note that exec() replaces the current process with "something", so the child process will never execute any subsequent Ruby code.
The call to Process.detach() looks like it might be incorrect. I would have expected it to have the child's PID in it, but if I read your code right it's actually detaching the parent process.
Detaching $$ wasn't right. From p. 348 of the Pickaxe (2nd Ed):
$$ Fixnum The process number of the program being executed. [r/o]
This section, "Variables and Constants" in the "Ruby Language" chapter, is very handy for decoding various ruby short $ constants - however the online edition (the first
So what you were actually doing was detaching the program from itself, not from its child.
Like others have said, the proper way to detach from the child is to use the child's pid returned from fork().
The other answers are good if you're sure you want to detach the child process. However, if you either don't mind, or would prefer to keep the child process attached (e.g. you are launching sub-servers/services for a web app), then you can take advantage of the following shorthand
fork do
exec('whatever --option-flag')
end
Providing a block tells fork to execute that block (and only that block) in the child process, while continuing on in the parent.
i found the answers above broke my terminal and messed up the output. this is the solution i found.
system("nohup ./test.sh &")
just in case anyone has the same issue, my goal was to log into a ssh server and then keep that process running indefinitely. so test.sh is this
#!/usr/bin/expect -f
spawn ssh host -l admin -i cloudkey
expect "pass"
send "superpass\r"
sleep 1000000

Resources