I'm reading Windows Internals (7th Edition), and they write about processes in Chapter 1:
Processes
[...] a Windows process comprises the following:
[...]
At least one thread of execution Although an "empty" process is possible, it is (mostly) not useful.
What does "mostly" mean in this context? What could a process with no threads do, and how would that be useful?
EDIT: Also, in a 2015 talk, Mark Russinovich says that a process has "at least one thread" (19:12). Was that a generalization?
Disclaimer: I work for Microsoft.
I think the answer has come out in the comments. There seem to be at least two scenarios where a threadless process would be useful.
Scenario 1: capturing process snapshots
This is probably the most straightforward one. As RbMm commented, PssCaptureSnapshot can be called with the PSS_CAPTURE_VA_CLONE option to create a threadless (or "empty") process (using ZwCreateProcessEx, presumably to duplicate the target process's memory in kernel mode).
The primary use here would be for debugging, if a developer wanted to inspect a process's memory at a certain point in time.
Notably, Eryk Sun points out that an empty process is not necessary for inspecting handles (even though an empty process holds both its own memory space and handles), since there is already a way to inspect a process's handles without creating a new process or duplicating memory.
Scenario 2: forking processes with specific inherited handles---safely
Raymond Chen explains another use for a threadless process: creating new "real" processes with inherited handles safely.
When a thread wants to create a new process (CreateProcess), there are several ways for it to pass handles to the new process:
Make a handle inheritable and CreateProcess with bInheritHandles = true.
Make a handle inheritable, add it to a PROC_THREAD_ATTRIBUTE_LIST, and pass that list to the CreateProcess call.
However, they offer conflicting guarantees that can cause problems when callers want to create two threads with different handles concurrently. As Raymond puts it in Why do people take a lock around CreateProcess calls?:
In order for a handle to be inherited, you not only have to put it in the PROC_THREAD_ATTRIBUTE_LIST, but you also must make the handle inheritable. This means that if another thread is not on board with the PROC_THREAD_ATTRIBUTE_LIST trick and does a straight CreateProcess with bInheritHandles = true, it will inadvertently inherit your handles.
You can use a threadless process to mitigate this. In general:
Create a threadless process.
DuplicateHandle all of the handles you want to capture into this new threadless process.
CreateProcess your new, real forked process, using the PROC_THREAD_ATTRIBUTE_LIST, but set the nominal parent process of this process to be the threadless process (with PROC_THREAD_ATTRIBUTE_PARENT_PROCESS).
You can now CreateProcess concurrently without worrying about other callers, and you can now close the duplicate handles and the empty process.
Related
So, in a feature request I filed against Node.js, I was looking for a way to replace the current Node process with another. In Linux and friends (really, any POSIX-compliant system), this is easy: use execve and friends and call it a day. But obviously, that won't work on Windows, since it only has CreateProcess (which execve and friends delegate to, complete with async behavior). And it's not like people haven't wanted to do similar, leading to numerous duplicate questions on this site. (This isn't a duplicate because it's explicitly seeking a workaround given certain constraints, not just asking for direct replacement.)
Process replacement has several facets that have to addressed:
All console I/O streams have to be forwarded to the new process.
All signals need transparently forwarded to the new process.
The data from the old process have to be destroyed, with as many resources reclaimed as possible.
All pre-existing threads and child processes should be destroyed.
All pre-existing handles should be destroyed apart from open file descriptors and named pipes/etc.
Optimally, the old process's memory should be kept to a minimum after the process is created.
For my particular use case, retaining the process ID is not important.
And for my particular case, there are a few constraints:
I can control the initial process's startup as well as the location of my "process replacement" function.
I could load arbitrary native code via add-ons at potentially any stack offset.
Implication: I can't even dream of tracking malloc calls, handles, thread manipulation, or process manipulation to track and free them all, since DLL rewriting isn't exactly practical.
I have no control over when my "process replacement" is called. It could be called through an add-on, which could've been called through either interpreted code via FFI or even another add-on recursively. It could even be called during add-on initialization.
Implication: I would have no ability to know what's in the stack, even if I perfectly instrumented my side. And rewriting all their calls and pushes is far from practical, and would just be all-around slow for obvious reasons.
So, here's the gist of what I was thinking: use something similar to a pseudo-trampoline.
Statically allocate the following:
A single pointer for the stack pointer.
MAX_PATH + 1 chars for the application path + '\0'.
MAX_PATH + 1 chars for the current working directory path + '\0'.
32768 chars for the arguments + '\0'.
32768 chars for the environment + '\0'.
On entry, set the global stack pointer reference to the stack pointer.
On "replacement":
Do relevant process cleanup and lock/release everything you can.
Set the stack pointer to the stored original global one.
Terminate each child thread.
Kill each child process.
Free each open handle.
If possible (i.e. not in a UWP program), For each heap, destroy it if it's not the default heap or the temporary heap (if it exists).
If possible, close each open handle.
If possible, walk the default heap and free each segment associated with it.
Create a new process with the statically allocated file/arguments/environment/etc. with no new window created.
Proxy all future received signals, exceptions, etc. without modification to this process somehow. The standard signals are easy, but not so much with the exceptions.
Wait for the process to end.
Return with the process's exit code.
The idea here is to use a process-based trampoline and drop the current process size to an absolute minimum while the newly created one is started.
But where I'm not very familiar with Windows, I probably made quite a few mistakes here. Also, the above seems extremely inefficient and to an extent it just feels horribly wrong for something a kernel could just release a few memory pages, deallocate a bunch of memory handles, and move some memory around for the next process.
So, to summarize, what's the ideal way to emulate process replacement on Windows with the fewest limitations?
Given that I don't understand what is actually being requested and I certainly look at things like 'execve' with a "who the hell would ever call that anyway, nothing but madness can ever result" sentiment, I nonetheless look at this problem by asking myself:
if process-a was killed and replaced by an near identical process-b - who or what would notice?
Anything that held the process id, or a handle to the process would certainly notice. This can be handled by writing a wrapper app which loads the first node process, and when prodded, kills it and loads the next. External observers see the wrapping process handles and id's unchanged.
Obviously this would cut off the stdin and stdout streams being fed into the node applications. But again, the wrapper process could get around this by passing the same set of inheritable handles to each node process launched by filling in the STARTUPINFO structure passed to CreateProcess properly.
Windows doesn't support signals, and the ones that the MS C runtime fake all deal with internal errors except one, which deals with an interactive console window being closed via ctrl-C, which the active Node.js app is sure to get anyway - or can be passed on from the wrapper as the node apps would not actually be running on the interactive console with this approach.
Other than that, everything else seems to be an internal detail of the Node.js application so shouldn't effect any 3rd party app communicating with what it thinks is a single node app via its stdin/stdout streams.
In Windows, what is the formal way of identifying a process uniquely? I am not talking about PID, which is allocated dynamically, but a unique ID or a name which is permanent to that process. I know that every program/process has a security descriptor but it seems to hold SIDs for loggedin user and group (not the process). We cannot use the path and name of executable from where the process starts as that can change.
My aim is to identify a process in the kernel mode and allow it to perform certain operation. What is the easiest and best way of doing this?
Your question is too vague to answer properly. For example how could the path possibly change (without poking around in kernel memory) after creation of a process? And yes, I am aware that one could hook into the memory-mapping process during process creation to replace the image originally destined to be loaded with another. Point is that a process is merely one instance of running a given executable. And it's not clear what exact tampering attempts you want to counter here.
But from kernel mode you do have the ability to simply use the pointer to the EPROCESS structure. No need to use the PID, although that will be unique while the process is still alive.
So assuming your process uses an IRP to communicate to the driver (whether it be WriteFile, ReadFile, DeviceIoControl or something more exotic), in order to register itself, you can use IoGetCurrentProcess to get the PEPROCESS value which will be unique to the process.
While the structure itself is not officially documented, hints can be gleaned from the "Windows Internals" book (in its various incarnations), the dt (Display Type) command in WinDbg (and friends) as well as from third-party resources on the internet (e.g. here, specific to Vista).
The process objects are kept in several linked lists. So if you know the (officially undocumented!!!) layout for a particular OS version, you may traverse the lists to get from one to the next process object (i.e. EPROCESS structure).
Cautionary notes
Make sure to reference the object of the process, by using the respective object manager routines. Otherwise you cannot be certain it's safe to both reach into these structures (which is anyway unsafe, since you cannot rely on their layout across OS versions) or to pass it to functions that expect a PEPROCESS.
As a side-note: Harry Johnston is of course right to assert that a privileged user can insert arbitrary (well almost arbitrary) code into the TCB in order to thwart your protective measures. In the end it is going to be an arms race.
Also keep in mind that similar to PIDs, theoretically the value of the PEPROCESS may be recycled. But in both cases you can simply counter this by invalidating whatever internal state you keep in your driver that allows the process to do its magic, whenever the process goes down. Using something like PsSetCreateProcessNotifyRoutine would seem to be a good method here. In order to translate your process handle from the callback to a PEPROCESS value, use ObReferenceObjectByHandle.
An alternative of countering recycling of the PID/PEPROCESS is by keeping a reference to the process object and thus keeping it in a kind of undead state (similar to not closing a handle in user mode), although the main thread may have finished.
if I have a handle to some windows process which has stopped (killed or just ended):
Will the handle (or better the memory behind it) be re-used for another process?
Or will GetExitCodeProcess() for example get the correct result forever from now on?
If 1. is true: How "long" would GetExitCodeProcess() work?
If 2. is true: Wouldn't that mean that I can bring down the OS with starting/killing new processes, since I create more and more handles (and the OS reserves memory for them)?
I'm a bit confused about the concept of handles.
Thank you in advance!
The handle indirectly points to an kernel object. As long as there are open handles, the object will be kept alive.
Will the handle (or better the memory behind it) be re-used for another process?
The numeric value of the handle (or however it is implemented) might get reused, but that doesn't mean it'll always point to the same thing. Just like process IDs.
Or will GetExitCodeProcess() for example get the correct result forever from now on?
No. When all handles to the process are closed, the process object is freed (along with its exit code). Note that running process holds an implicit handle to itself. You can hold an open handle, though, as long as you need it.
If 2. is true: Wouldn't that mean that I can bring down the OS with starting/killing new processes, since I create more and more handles (and the OS reserves memory for them)?
There are many ways to starve the system. It will either start heavily swapping or just fail to spawn a new process at some point.
Short answer:
GetExitCodeProcess works until you call CloseHandle, after what the process object will be released and may be reused.
Long answer:
See Cat Plus Plus's answer.
What is the basic difference between a pthread and fork w.r.t. linux in terms of
implementation differences and how the scheduling varies (does it vary ?)
I ran strace on two similar programs , one using pthreads and another using fork,
both in the end make clone() syscall with different arguments, so I am guessing
the two are essentially the same on a linux system but with pthreads being easier
to handle in code.
Can someone give a deep explanation?
EDIT : see also a related question
In C there are some differences however:
fork()
Purpose is to create a new process, which becomes the child process of the caller
Both processes will execute the next instruction following the fork() system call
Two identical copies of the computer's address space,code, and stack are created one for parent and child.
Thinking of the fork as it was a person; Forking causes a clone of your program (process), that is running the code it copied.
pthread_create()
Purpose is to create a new thread in the program which is given the same process of the caller
Threads within the same process can communicate using shared memory. (Be careful!)
The second thread will share data,open files, signal handlers and signal dispositions, current working directory, user and group ID's. The new thread will get its own stack, thread ID, and registers though.
Continuing the analogy; your program (process) grows a second arm when it creates a new thread, connected to the same brain.
On Linux, the system call clone clones a task, with a configurable level of sharing.
fork() calls clone(least sharing) and pthread_create() calls clone(most sharing).
forking costs a tiny bit more than pthread_createing because of copying tables and creating COW mappings for memory.
You should look at the clone manpage.
In particular, it lists all the possible clone modes and how they affect the process/thread, virtual memory space etc...
You say "threads easier to handle in code": that's very debatable. Writing bug-free, deadlock-free multi-thread code can be quite a challenge. Sometimes having two separate processes makes things much simpler.
The SetParent function takes a child and new parent window handle.
This also seems to work when the child window is in a different Windows process.
I have seen a post that claims this is not officially supported, but the current docs don't mention this any more. Is this a flaw in the current docs, or did this behavior change?
HWND WINAPI SetParent(
__in HWND hWndChild,
__in_opt HWND hWndNewParent
);
You can have a parent-child relationship with the windows in different processes. It's tricky to get it to work right in all cases. You may have to debug various strange symptoms.
Normally, windows in separate processes would get their messages from separate input queues using separate message pumps. When you use SendMessage to a window in another process, it's actually posted to the other window's queue, processed there, and the return is effectively marshaled back to the original process. So if one of the processes stops handling messages, you can effectively lock up the other as well. (That's true even within a process when the windows are created on different threads and the thread queues haven't been attached.)
But when you set up the parent/child relationship among windows in different threads, Windows attaches those input queues together, forcing the message processing to be synchronous. You're no longer in the normal case, but you face the same kinds of problems: a hang in the processing for one window effectively hangs the other process.
Watch out for messages that pass pointers in the params. The pointers will not be valid in the receiving process. (There are a couple of exceptions, like WM_COPYDATA, which recreates the data in the receiving process for you. But even those have limitations.)
You have to be especially careful when the windows are being destroyed. If possible, disconnect the parent-child relationship before destroying either window. If it's not possible, then it's probably best to manually destroy the child window before the parent is destroyed. Normally, destroying a parent will cause the children to be destroyed automatically, but it's easy to get hangs when the child is in another process (or un-attached thread).
In newer versions of Windows (Vista+), you can also hit some security speedbumps if the processes run at different integrity levels.
Thanks to IInspectable who pointed out an error in my earlier answer.
Just remove WS_CHILDWINDOW from the child window. It avoid locks.
Sorry, that has not been helped