Good or evil - SetParent() win32 API between different processes - windows

The SetParent function takes a child and new parent window handle.
This also seems to work when the child window is in a different Windows process.
I have seen a post that claims this is not officially supported, but the current docs don't mention this any more. Is this a flaw in the current docs, or did this behavior change?
HWND WINAPI SetParent(
__in HWND hWndChild,
__in_opt HWND hWndNewParent
);

You can have a parent-child relationship with the windows in different processes. It's tricky to get it to work right in all cases. You may have to debug various strange symptoms.
Normally, windows in separate processes would get their messages from separate input queues using separate message pumps. When you use SendMessage to a window in another process, it's actually posted to the other window's queue, processed there, and the return is effectively marshaled back to the original process. So if one of the processes stops handling messages, you can effectively lock up the other as well. (That's true even within a process when the windows are created on different threads and the thread queues haven't been attached.)
But when you set up the parent/child relationship among windows in different threads, Windows attaches those input queues together, forcing the message processing to be synchronous. You're no longer in the normal case, but you face the same kinds of problems: a hang in the processing for one window effectively hangs the other process.
Watch out for messages that pass pointers in the params. The pointers will not be valid in the receiving process. (There are a couple of exceptions, like WM_COPYDATA, which recreates the data in the receiving process for you. But even those have limitations.)
You have to be especially careful when the windows are being destroyed. If possible, disconnect the parent-child relationship before destroying either window. If it's not possible, then it's probably best to manually destroy the child window before the parent is destroyed. Normally, destroying a parent will cause the children to be destroyed automatically, but it's easy to get hangs when the child is in another process (or un-attached thread).
In newer versions of Windows (Vista+), you can also hit some security speedbumps if the processes run at different integrity levels.
Thanks to IInspectable who pointed out an error in my earlier answer.

Just remove WS_CHILDWINDOW from the child window. It avoid locks.
Sorry, that has not been helped

Related

What is the use of a process with no threads in Windows?

I'm reading Windows Internals (7th Edition), and they write about processes in Chapter 1:
Processes
[...] a Windows process comprises the following:
[...]
At least one thread of execution Although an "empty" process is possible, it is (mostly) not useful.
What does "mostly" mean in this context? What could a process with no threads do, and how would that be useful?
EDIT: Also, in a 2015 talk, Mark Russinovich says that a process has "at least one thread" (19:12). Was that a generalization?
Disclaimer: I work for Microsoft.
I think the answer has come out in the comments. There seem to be at least two scenarios where a threadless process would be useful.
Scenario 1: capturing process snapshots
This is probably the most straightforward one. As RbMm commented, PssCaptureSnapshot can be called with the PSS_CAPTURE_VA_CLONE option to create a threadless (or "empty") process (using ZwCreateProcessEx, presumably to duplicate the target process's memory in kernel mode).
The primary use here would be for debugging, if a developer wanted to inspect a process's memory at a certain point in time.
Notably, Eryk Sun points out that an empty process is not necessary for inspecting handles (even though an empty process holds both its own memory space and handles), since there is already a way to inspect a process's handles without creating a new process or duplicating memory.
Scenario 2: forking processes with specific inherited handles---safely
Raymond Chen explains another use for a threadless process: creating new "real" processes with inherited handles safely.
When a thread wants to create a new process (CreateProcess), there are several ways for it to pass handles to the new process:
Make a handle inheritable and CreateProcess with bInheritHandles = true.
Make a handle inheritable, add it to a PROC_THREAD_ATTRIBUTE_LIST, and pass that list to the CreateProcess call.
However, they offer conflicting guarantees that can cause problems when callers want to create two threads with different handles concurrently. As Raymond puts it in Why do people take a lock around CreateProcess calls?:
In order for a handle to be inherited, you not only have to put it in the PROC_THREAD_ATTRIBUTE_LIST, but you also must make the handle inheritable. This means that if another thread is not on board with the PROC_THREAD_ATTRIBUTE_LIST trick and does a straight Create­Process with bInheritHandles = true, it will inadvertently inherit your handles.
You can use a threadless process to mitigate this. In general:
Create a threadless process.
DuplicateHandle all of the handles you want to capture into this new threadless process.
CreateProcess your new, real forked process, using the PROC_THREAD_ATTRIBUTE_LIST, but set the nominal parent process of this process to be the threadless process (with PROC_THREAD_ATTRIBUTE_PARENT_PROCESS).
You can now CreateProcess concurrently without worrying about other callers, and you can now close the duplicate handles and the empty process.

What is the ideal way to emulate process replacement on Windows?

So, in a feature request I filed against Node.js, I was looking for a way to replace the current Node process with another. In Linux and friends (really, any POSIX-compliant system), this is easy: use execve and friends and call it a day. But obviously, that won't work on Windows, since it only has CreateProcess (which execve and friends delegate to, complete with async behavior). And it's not like people haven't wanted to do similar, leading to numerous duplicate questions on this site. (This isn't a duplicate because it's explicitly seeking a workaround given certain constraints, not just asking for direct replacement.)
Process replacement has several facets that have to addressed:
All console I/O streams have to be forwarded to the new process.
All signals need transparently forwarded to the new process.
The data from the old process have to be destroyed, with as many resources reclaimed as possible.
All pre-existing threads and child processes should be destroyed.
All pre-existing handles should be destroyed apart from open file descriptors and named pipes/etc.
Optimally, the old process's memory should be kept to a minimum after the process is created.
For my particular use case, retaining the process ID is not important.
And for my particular case, there are a few constraints:
I can control the initial process's startup as well as the location of my "process replacement" function.
I could load arbitrary native code via add-ons at potentially any stack offset.
Implication: I can't even dream of tracking malloc calls, handles, thread manipulation, or process manipulation to track and free them all, since DLL rewriting isn't exactly practical.
I have no control over when my "process replacement" is called. It could be called through an add-on, which could've been called through either interpreted code via FFI or even another add-on recursively. It could even be called during add-on initialization.
Implication: I would have no ability to know what's in the stack, even if I perfectly instrumented my side. And rewriting all their calls and pushes is far from practical, and would just be all-around slow for obvious reasons.
So, here's the gist of what I was thinking: use something similar to a pseudo-trampoline.
Statically allocate the following:
A single pointer for the stack pointer.
MAX_PATH + 1 chars for the application path + '\0'.
MAX_PATH + 1 chars for the current working directory path + '\0'.
32768 chars for the arguments + '\0'.
32768 chars for the environment + '\0'.
On entry, set the global stack pointer reference to the stack pointer.
On "replacement":
Do relevant process cleanup and lock/release everything you can.
Set the stack pointer to the stored original global one.
Terminate each child thread.
Kill each child process.
Free each open handle.
If possible (i.e. not in a UWP program), For each heap, destroy it if it's not the default heap or the temporary heap (if it exists).
If possible, close each open handle.
If possible, walk the default heap and free each segment associated with it.
Create a new process with the statically allocated file/arguments/environment/etc. with no new window created.
Proxy all future received signals, exceptions, etc. without modification to this process somehow. The standard signals are easy, but not so much with the exceptions.
Wait for the process to end.
Return with the process's exit code.
The idea here is to use a process-based trampoline and drop the current process size to an absolute minimum while the newly created one is started.
But where I'm not very familiar with Windows, I probably made quite a few mistakes here. Also, the above seems extremely inefficient and to an extent it just feels horribly wrong for something a kernel could just release a few memory pages, deallocate a bunch of memory handles, and move some memory around for the next process.
So, to summarize, what's the ideal way to emulate process replacement on Windows with the fewest limitations?
Given that I don't understand what is actually being requested and I certainly look at things like 'execve' with a "who the hell would ever call that anyway, nothing but madness can ever result" sentiment, I nonetheless look at this problem by asking myself:
if process-a was killed and replaced by an near identical process-b - who or what would notice?
Anything that held the process id, or a handle to the process would certainly notice. This can be handled by writing a wrapper app which loads the first node process, and when prodded, kills it and loads the next. External observers see the wrapping process handles and id's unchanged.
Obviously this would cut off the stdin and stdout streams being fed into the node applications. But again, the wrapper process could get around this by passing the same set of inheritable handles to each node process launched by filling in the STARTUPINFO structure passed to CreateProcess properly.
Windows doesn't support signals, and the ones that the MS C runtime fake all deal with internal errors except one, which deals with an interactive console window being closed via ctrl-C, which the active Node.js app is sure to get anyway - or can be passed on from the wrapper as the node apps would not actually be running on the interactive console with this approach.
Other than that, everything else seems to be an internal detail of the Node.js application so shouldn't effect any 3rd party app communicating with what it thinks is a single node app via its stdin/stdout streams.

How to identify a process in Windows? Kernel and User mode

In Windows, what is the formal way of identifying a process uniquely? I am not talking about PID, which is allocated dynamically, but a unique ID or a name which is permanent to that process. I know that every program/process has a security descriptor but it seems to hold SIDs for loggedin user and group (not the process). We cannot use the path and name of executable from where the process starts as that can change.
My aim is to identify a process in the kernel mode and allow it to perform certain operation. What is the easiest and best way of doing this?
Your question is too vague to answer properly. For example how could the path possibly change (without poking around in kernel memory) after creation of a process? And yes, I am aware that one could hook into the memory-mapping process during process creation to replace the image originally destined to be loaded with another. Point is that a process is merely one instance of running a given executable. And it's not clear what exact tampering attempts you want to counter here.
But from kernel mode you do have the ability to simply use the pointer to the EPROCESS structure. No need to use the PID, although that will be unique while the process is still alive.
So assuming your process uses an IRP to communicate to the driver (whether it be WriteFile, ReadFile, DeviceIoControl or something more exotic), in order to register itself, you can use IoGetCurrentProcess to get the PEPROCESS value which will be unique to the process.
While the structure itself is not officially documented, hints can be gleaned from the "Windows Internals" book (in its various incarnations), the dt (Display Type) command in WinDbg (and friends) as well as from third-party resources on the internet (e.g. here, specific to Vista).
The process objects are kept in several linked lists. So if you know the (officially undocumented!!!) layout for a particular OS version, you may traverse the lists to get from one to the next process object (i.e. EPROCESS structure).
Cautionary notes
Make sure to reference the object of the process, by using the respective object manager routines. Otherwise you cannot be certain it's safe to both reach into these structures (which is anyway unsafe, since you cannot rely on their layout across OS versions) or to pass it to functions that expect a PEPROCESS.
As a side-note: Harry Johnston is of course right to assert that a privileged user can insert arbitrary (well almost arbitrary) code into the TCB in order to thwart your protective measures. In the end it is going to be an arms race.
Also keep in mind that similar to PIDs, theoretically the value of the PEPROCESS may be recycled. But in both cases you can simply counter this by invalidating whatever internal state you keep in your driver that allows the process to do its magic, whenever the process goes down. Using something like PsSetCreateProcessNotifyRoutine would seem to be a good method here. In order to translate your process handle from the callback to a PEPROCESS value, use ObReferenceObjectByHandle.
An alternative of countering recycling of the PID/PEPROCESS is by keeping a reference to the process object and thus keeping it in a kind of undead state (similar to not closing a handle in user mode), although the main thread may have finished.

What happens to a process handle once the process was ended?

if I have a handle to some windows process which has stopped (killed or just ended):
Will the handle (or better the memory behind it) be re-used for another process?
Or will GetExitCodeProcess() for example get the correct result forever from now on?
If 1. is true: How "long" would GetExitCodeProcess() work?
If 2. is true: Wouldn't that mean that I can bring down the OS with starting/killing new processes, since I create more and more handles (and the OS reserves memory for them)?
I'm a bit confused about the concept of handles.
Thank you in advance!
The handle indirectly points to an kernel object. As long as there are open handles, the object will be kept alive.
Will the handle (or better the memory behind it) be re-used for another process?
The numeric value of the handle (or however it is implemented) might get reused, but that doesn't mean it'll always point to the same thing. Just like process IDs.
Or will GetExitCodeProcess() for example get the correct result forever from now on?
No. When all handles to the process are closed, the process object is freed (along with its exit code). Note that running process holds an implicit handle to itself. You can hold an open handle, though, as long as you need it.
If 2. is true: Wouldn't that mean that I can bring down the OS with starting/killing new processes, since I create more and more handles (and the OS reserves memory for them)?
There are many ways to starve the system. It will either start heavily swapping or just fail to spawn a new process at some point.
Short answer:
GetExitCodeProcess works until you call CloseHandle, after what the process object will be released and may be reused.
Long answer:
See Cat Plus Plus's answer.

How advisable is not having a message loop in WinMain?

This may be the simplest win32 program ever ..
#include <windows.h>
int WINAPI WinMain(HINSTANCE hInst, HINSTANCE hPrev, LPSTR cmdLine, int show)
{
MessageBox(0, "Hello world..", "Salutations!", MB_OK);
return 0;
}
.. which makes no calls whatsoever to the usual GetMessage() call. My question is this: if my program doesn't process any window messages, can the OS cope with that? I.e., does it cause memory leaks? Or some other resource that wouldn't be apparent unless I ran it 16K times?
In a broader sense, exactly how 'dependent' is Win32 on applications taking care of their messages? I would hope that when the compiler links the executable as a windows program, that the run-time would be capable of cleaning up any kind of message queue, be it emptied or not.
Just a technicality, but you do have a window, and you do have a message loop, just not in your code.
The call to MessageBox() creates a window (of class #32770) and runs a local message loop, not returning to your code till the message loop drops out, presumably when WM_NCDESTROY is sent. I think it's the same message loop that runs in response to DialogBox().
But you could substitute your call to MessageBox() with anything else that really doesn't create a message loop, and you'll still be fine. Windows doesn't care if you have a message loop, although some functionality (primarily UI related) is difficult or impossible to use without it. In fact, you don't have to link to user32 at all, and some apps that have no user interface don't.
Now if you create a window and don't process messages for it in some way, Windows XP and up will replace your window with a "ghost" window that has a white client area and Task Manager will tell the user that the application is not responding.
Although it seems so at first, the message loop is not magic or a strictly required part of Windows boilerplate. It is highly ingrained as a standard in most Windows applications, though, because it's the best way to handle the dispatching of window messages. The "event-driven" nature of most Windows applications makes us forget sometimes that Windows applications were originally designed to be single-threaded, and in this model, it is code running within that single thread, not some unseen force within the operating system, that must make every function call within our code. The addition of multithreading changed that somewhat, but the basic model still remains the same.
EDIT
A note about message queues:
As is mentioned elsewhere, a message queue is only created (and on a per-thread basis) when a window is created by that thread. Your example program, upon creating a message box, does create a message queue. But this queue need not be empty when your application exits. This queue is just a memory structure. It's a block of memory that can hold a certain number of message objects (specifying destination hWnd, message id, wParam, lParam, system time when message was posted, mouse position when message was posted, and some data that allows the derivation of keyboard and mouse button state when the message was posted), along with pointers to the head and tail of the queue (I assume it's a circular queue). When the application exits, this memory, like all memory belonging to the process, is summarily freed.
There are, of course, other things that must be cleaned up outside your process. The OS must keep a table of all existing windows, for example, along with the thread and process that created them. Of course, these are all cleaned up automatically as well.
Since you don't have a window, you don't need a message loop. In Win32 the messages are sent to windows, not to applications.
You do have a message loop - MessageBox is a modal dialog and therefore contains a message loop within.
You don't have to create windows. But still, there are some kind of messages, like
WM_TIMER
WM_TIMECHANGE
WM_CLIPBOARDUPDATE
WM_COPYDATA
WM_POWER
that you may need. So, a ghost window hanging around wouldn't be bad.
If you don't have a window, then that is fine, but if you do then you need to make sure that you pump messages for it. Otherwise the system can hang on broadcast messages waiting for you to respond. This is important for things like COM which create hidden windows for message processing. If your main thread does not pump messages (e.g., by calling WaitForSingleObject) then calls to your COM objects will not be processed, and any programs which send broadcasts will appear to hang.
I have read somewhere (and can't find the reference) is that Windows will create a message queue on demand. If you never call a function that looks for a message queue, one will never be created. And this occurs on a per-thread basis.

Resources