I have a little test program that tracks PIDs as they are created and shut down.
I am investigating a problem that my program has found and would like to ask about it
in order to get a better idea of what's going on.
When a Windows process is started, it gets a PID. When the process is shut down, does the PID
become retired (like a star basketballer's jersey number), or is it possible for a new, entirely
unrelated process to be created under that released PID?
Thanks
Yes, process IDs may be recycled by the system. A PID becomes available for reuse once the process has exited and the last handle to it has been closed.
Raymond Chen discussed this matter here: When does a process ID become available for reuse?
The process ID is a value associated with the process object, and as
long as the process object is still around, so too will its process
ID. The process object remains as long as the process is still running
(the process implicitly retains a reference to itself) or as long as
somebody still has a handle to the process object.
If you think about it, this makes sense, because as long as there is
still a handle to the process, somebody can call WaitForSingleObject
to wait for the process to exit, or they can call GetExitCodeProcess
to retrieve the exit code, and that exit code has to be stored
somewhere for later retrieval.
When all handles are closed, then the kernel knows that nobody is
going to ask whether the process is still running or what its exit
code is (because you need a handle to ask those questions). At which
point the process object can be destroyed, which in turn destroys the
process ID.
I ran a test for about an hour, and in that time 302 processes exited and 70 of them had PIDs in common (the same PID was used for a new process). So that suggests PIDs are reused frequently.
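For what it's worth, a count like that can be sketched in a few lines. This is only an illustration in Python: the event format (a list of `(pid, start_time)` pairs, one per exited process) and the name `count_reused_pids` are assumptions of mine, not something taken from the test program above. The point is that the pair, not the bare PID, identifies a process instance.

```python
from collections import Counter

def count_reused_pids(exit_events):
    """Return how many distinct PIDs were seen on more than one process.

    exit_events: iterable of (pid, start_time) pairs, one per exited
    process (hypothetical log format).
    """
    instances = set(exit_events)  # drop duplicate reports of the same instance
    per_pid = Counter(pid for pid, _ in instances)
    return sum(1 for n in per_pid.values() if n > 1)

# Hypothetical log: PID 4120 was handed to two different processes.
events = [(4120, 100.0), (5336, 101.5), (4120, 250.0), (7004, 300.2)]
reused = count_reused_pids(events)
print(reused)
```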
Evidently, if the process is terminated, its PID is available for reuse.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683215%28v=vs.85%29.aspx
Remarks
Until a process terminates, its process identifier uniquely identifies it on the system. For more information about access rights, see Process Security and Access Rights.
On Linux, PID namespaces can be used to robustly kill all descendant (including orphaned and zombie) processes; see this answer for example.
What's the closest to a "robust" way to do the same on macOS? Unfortunately I can't rely on process groups, as some of the descendant processes alter them.
It's a gross kludge, but it might work: The first process would open a file descriptor so that, by default, all descendant processes inherit it. When it wants to kill them all, it runs lsof to find all processes with that file open and kills them.
It won't work for processes which have detached themselves, but you could walk the child process tree using proc_listchildpids() and send signals to each PID you obtain. There are probably some timing edge cases between checking a process's children and killing it: it could spawn more processes in that window. You could perhaps suspend all processes before listing their children and killing them. Processes whose parent has died should, I think, be reattached to their grandparent anyway (I may be wrong on this), so in that case, as long as you keep calling proc_listchildpids() after sending each round of signals, you should eventually end up in a steady state. (Ideally with no child processes left. But if they get into a really bad state [due to a kernel bug], some processes may be completely unkillable.)
proc_listchildpids() is declared in <libproc/libproc.h>.
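The "keep going round until nothing is left" loop could look roughly like the sketch below. It is written in Python with two injected callbacks, both hypothetical: `list_children(pid)` stands in for a wrapper around proc_listchildpids(), and `send_kill(pid)` for whatever delivers the signal. It only reaches processes that are still reported somewhere under the root, so the caveats above still apply.

```python
def kill_descendants(root_pid, list_children, send_kill, max_rounds=50):
    """Signal every descendant of root_pid until none are reported.

    list_children(pid) -> iterable of child PIDs (hypothetical callback).
    send_kill(pid) delivers the signal (hypothetical callback).
    Returns the number of rounds in which signals were sent.
    """
    for round_no in range(1, max_rounds + 1):
        # Walk the whole tree currently reported below root_pid.
        found, seen = [], set()
        frontier = list(list_children(root_pid))
        while frontier:
            pid = frontier.pop()
            if pid in seen:
                continue
            seen.add(pid)
            found.append(pid)
            frontier.extend(list_children(pid))
        if not found:
            return round_no - 1  # steady state: nothing left to signal
        for pid in found:
            send_kill(pid)
    raise RuntimeError("descendants still present after max_rounds")

# Fake process table so the sketch can run anywhere.
tree = {1: [2, 3], 2: [4], 3: [], 4: []}
killed = []

def fake_children(pid):
    return list(tree.get(pid, []))

def fake_kill(pid):
    killed.append(pid)
    tree.pop(pid, None)
    for kids in tree.values():
        if pid in kids:
            kids.remove(pid)

rounds = kill_descendants(1, fake_children, fake_kill)
```

The `max_rounds` cap is there for the unkillable case: the loop gives up loudly instead of spinning forever.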
I have a user-level process which is currently sleeping in the sleep() function. I am trying to write a kernel module which first obtains the task_struct of the user process from its PID and then wakes the process. So far I have implemented the code for getting the task_struct from the PID, but I don't know of any function that can wake the process. I tried wake_up_process(task_struct); it returns 1, i.e. success in waking the process, but the printf() statement just after the sleep() call in the user process is not executed. Will changing the state of the task_struct help? Or is there another approach? Please guide me further.
It is possible, but you might be going about it the wrong way. sleep() waits on a delay, and although you could signal the process from within the kernel (essentially like kill(2) in user mode, with some harmless signal that will still "kick you out" of the system call), the correct way of doing this is to have the sleeping process block on a device which your kernel module exports. This way, the kernel module has control: the process will be stuck in a read(2) call until your read implementation in the module returns.
This is preferable, because the whole idea of sleeping is that you are waiting for something. When you simply sleep(xxxx) indefinitely, you're basically waiting on a timeout. What's more, with the device approach you can also add the file descriptor to a select(2)/poll(2) loop, which makes for very elegant synchronization with other input/output descriptors.
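To give a feel for the user-space side, here is a minimal sketch in Python. Note the big assumption: a pipe stands in for the device node your module would export, and a timer thread plays the part of the module deciding to release the reader. With a real device you would open its path and pass that descriptor to select instead.

```python
import os
import select
import threading

# Stand-in for the module's device: a pipe (assumption, not a real device).
read_fd, write_fd = os.pipe()

def waker():
    # Plays the role of the kernel module releasing the blocked reader.
    os.write(write_fd, b"wake")

timer = threading.Timer(0.1, waker)
timer.start()

# Block until the descriptor becomes readable (or 5 seconds pass).
# Other descriptors could be added to the same list.
ready, _, _ = select.select([read_fd], [], [], 5.0)
message = os.read(read_fd, 16) if ready else b""

timer.join()
os.close(read_fd)
os.close(write_fd)
print(message)
```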
Is there any way to find out the exit code of an application from the last time it ran?
I want to check whether the application did not exit with a zero exit code last time (which means abnormal termination in my case) and, if so, do some checking and maybe fix or clean up previously generated data.
Since some applications do this (they give a warning and ask if you want to run in Safe Mode this time) I think maybe Windows can tell me this.
And if not, what is the best practice for doing this? Setting a flag in a file or something similar when the application terminates correctly, and checking it the next time it runs?
No, there's no permanent record of the exit code. It exists only as long as a handle to the process is kept open, and GetExitCodeProcess(), which returns it, needs that handle. As soon as the last handle is closed, the exit code is gone for good. One technique is a little bootstrapper app that starts the process and keeps the handle. It can then also do other handy things like send alerts, keep a log, clean up partial files or record minidumps of crashes. Use WaitForSingleObject() to detect the process exit.
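As a rough illustration of the bootstrapper idea, here is a sketch in Python rather than raw Win32. The `subprocess.Popen` object keeps a reference to the child for as long as it lives, and `wait()` blocks until the child exits and returns its exit code. The function name `run_and_record` and the toy child command are mine, chosen only so the example runs anywhere.

```python
import subprocess
import sys

def run_and_record(cmd):
    """Start cmd, wait for it to exit, and return its exit code."""
    child = subprocess.Popen(cmd)  # keeps the child referenced until we are done
    return child.wait()            # blocks until the child has exited

# Toy child that exits with code 3, standing in for the real application.
exit_code = run_and_record([sys.executable, "-c", "import sys; sys.exit(3)"])
if exit_code != 0:
    print(f"child ended abnormally with exit code {exit_code}")
```

A real bootstrapper would write the code to a log or act on it at this point instead of printing.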
Btw, you definitely want the exit code to mean the opposite thing. Zero is always the "normal exit" value. This helps you detect hard crashes: the exit code is always non-zero when Windows terminates the app forcibly, and is set to the exception code.
There are other ways. You can indeed create a file or registry key that indicates the process is running and check for it when the program starts back up. The only real complication is that you need to do something meaningful when the user starts the program twice, which is a hard problem to solve; such apps are usually single-instance apps. You use a named mutex to detect that an instance of the program is already running. Imprinting the evidence with the process ID and start time is workable.
There is no standard way to do this on the Windows Platform.
The easiest way to handle this case is to put a value in the registry when the program starts and to clear it when the program exits.
If the value is still present when the program starts, then it terminated unexpectedly.
Put the value under HKCU/Software// to be sure you have sufficient rights (the value will be per user in this case).
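The same set-on-start, clear-on-exit idea can be sketched with a marker file instead of a registry value. This is only an illustration in Python: the function names, the file name `running.marker` and the JSON layout (PID plus start time, as suggested in the other answer) are all assumptions of mine.

```python
import json
import os
import tempfile
import time

def start_session(marker_path):
    """Return True if the previous run left its marker behind (unclean exit)."""
    crashed = os.path.exists(marker_path)
    with open(marker_path, "w") as fh:
        # Stamp the marker with the PID and start time of this run.
        json.dump({"pid": os.getpid(), "started": time.time()}, fh)
    return crashed

def end_session(marker_path):
    """Call on a clean shutdown so the next start finds no marker."""
    if os.path.exists(marker_path):
        os.remove(marker_path)

marker = os.path.join(tempfile.mkdtemp(), "running.marker")
first = start_session(marker)   # nothing left over from an earlier run
end_session(marker)             # clean exit
second = start_session(marker)  # previous run cleaned up after itself
third = start_session(marker)   # previous run never reached end_session
end_session(marker)
print(first, second, third)
```

This does not handle two instances running at once; that still needs something like the named mutex mentioned above.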
I'm refactoring a bit of concurrent processing in my Ruby on Rails server (running on Linux) to use Spawn. Spawn::fork_it documentation claims that forked processes can still be waited on after being detached: https://github.com/tra/spawn/blob/master/lib/spawn.rb (line 186):
# detach from child process (parent may still wait for detached process if they wish)
Process.detach(child)
However, the Ruby Process::detach documentation says you should not do this: http://www.ruby-doc.org/core/classes/Process.html
Some operating systems retain the status of terminated child processes until the parent collects that status (normally using some variant of wait()). If the parent never collects this status, the child stays around as a zombie process. Process::detach prevents this by setting up a separate Ruby thread whose sole job is to reap the status of the process pid when it terminates. Use detach only when you do not intend to explicitly wait for the child to terminate.
Yet Spawn::wait effectively allows you to do just that by wrapping Process::wait. On a side note, I specifically want to use the Process::waitpid2 method to wait on the child processes, instead of using the Spawn::wait method.
Will detach-and-wait not work correctly on Linux? I'm concerned that this may cause a race condition between the detached reaper thread and the waiting parent process, as to who collects the child status first.
The answer to this question is there in the documentation. Are you writing code for your own use in a controlled environment? Or to be used widely by third parties? Ruby is written to be widely used by third parties, so their recommendation is to not do something that could fail on "some operating systems". Perhaps the Spawn library is designed primarily for use on Linux machines and tested only on a small subset thereof where this tactic works.
If you're distributing the code you're writing to be used by anyone and everyone, I would take Ruby's approach.
If you control the environment where this code will be run, I would write two tests:
A test that spawns a process, detaches it and then waits for it.
A test that spawns a process and then just waits for it.
Count the failure rate for both and if they are equal (within a margin that you feel is acceptable), go for it!
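The underlying reason for the concern is that a child's status can be collected only once. The sketch below shows just that, in Python rather than Ruby and without any reaper thread, so it illustrates the POSIX behaviour and not Process.detach itself. It assumes a POSIX system where `os.fork` is available.

```python
import os

pid = os.fork()
if pid == 0:
    os._exit(7)  # child: exit immediately with status 7

# The first collector receives the child's status.
reaped_pid, status = os.waitpid(pid, 0)
first_code = os.WEXITSTATUS(status)

# A second collector finds nothing left to reap.
try:
    os.waitpid(pid, 0)
    second_failed = False
except ChildProcessError:
    second_failed = True

print(first_code, second_failed)
```

So whichever of the two waiters loses the race would have to cope with an error like this one.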
I've just written a program that forks one process. The child process just displays "HI" 200 times. The parent process just says it's the parent.
I've printed out both pids.
When I run my program multiple times, I see that the parent's pid stays the same, which is normal. What I don't understand is why the child's pid keeps getting incremented by 2, and exactly 2.
My question: Is this the standard method of pid generation in Ubuntu? Incrementing by 2?
PIDs happen to be handed out monotonically increasing in Linux 2.6, but why does it matter which you get? Don't rely on any specific behavior. If there is a skip of +2 it might simply be because another process happened to spawn a child. Or because +1 would have reached a PID that is already in use.
Found a reference here saying that vfork() consumes a pid as a byproduct of its operation. As well, in some cases, if you're forking from a shell script, the fork might spawn a new shell before your actual script gets involved, which would also consume a pid.
I'd suggest suspending your program between a couple of forks and seeing whether another process occupies those "missing" pids.
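If you want to look at the gaps directly, something like this rough Python sketch forks a few short-lived children in a row and prints the difference between consecutive PIDs. It assumes a POSIX system; the gaps depend entirely on what else is running, and can even be negative if the PID counter wraps around, so don't read anything into a particular value.

```python
import os

def fork_pids(count):
    """Fork count short-lived children and return their PIDs in order."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)     # child: exit straight away
        os.waitpid(pid, 0)  # parent: reap the child before forking again
        pids.append(pid)
    return pids

pids = fork_pids(5)
gaps = [b - a for a, b in zip(pids, pids[1:])]
print("pids:", pids, "gaps:", gaps)
```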