What is the ideal way to emulate process replacement on Windows?

So, in a feature request I filed against Node.js, I was looking for a way to replace the current Node process with another. In Linux and friends (really, any POSIX-compliant system), this is easy: use execve and friends and call it a day. But obviously, that won't work on Windows, since it only has CreateProcess (which the Windows versions of execve and friends delegate to, complete with async behavior). And it's not like people haven't wanted to do something similar, leading to numerous duplicate questions on this site. (This isn't a duplicate because it's explicitly seeking a workaround given certain constraints, not just asking for direct replacement.)
Process replacement has several facets that have to be addressed:
All console I/O streams have to be forwarded to the new process.
All signals need to be transparently forwarded to the new process.
The data from the old process have to be destroyed, with as many resources reclaimed as possible.
All pre-existing threads and child processes should be destroyed.
All pre-existing handles should be destroyed apart from open file descriptors and named pipes/etc.
Optimally, the old process's memory should be kept to a minimum after the process is created.
For my particular use case, retaining the process ID is not important.
And for my particular case, there are a few constraints:
I can control the initial process's startup as well as the location of my "process replacement" function.
I could load arbitrary native code via add-ons at potentially any stack offset.
Implication: I can't even dream of tracking malloc calls, handles, thread manipulation, or process manipulation to track and free them all, since DLL rewriting isn't exactly practical.
I have no control over when my "process replacement" is called. It could be called through an add-on, which could've been called through either interpreted code via FFI or even another add-on recursively. It could even be called during add-on initialization.
Implication: I would have no ability to know what's in the stack, even if I perfectly instrumented my side. And rewriting all their calls and pushes is far from practical, and would just be all-around slow for obvious reasons.
So, here's the gist of what I was thinking: use something similar to a pseudo-trampoline.
Statically allocate the following:
A single pointer for the stack pointer.
MAX_PATH + 1 chars for the application path + '\0'.
MAX_PATH + 1 chars for the current working directory path + '\0'.
32768 chars for the arguments + '\0'.
32768 chars for the environment + '\0'.
On entry, set the global stack pointer reference to the stack pointer.
On "replacement":
Do relevant process cleanup and lock/release everything you can.
Set the stack pointer to the stored original global one.
Terminate each child thread.
Kill each child process.
Free each open handle.
If possible (i.e. not in a UWP program), for each heap, destroy it if it's not the default heap or the temporary heap (if it exists).
If possible, close each open handle.
If possible, walk the default heap and free each segment associated with it.
Create a new process with the statically allocated file/arguments/environment/etc. with no new window created.
Proxy all future received signals, exceptions, etc. without modification to this process somehow. The standard signals are easy, but not so much with the exceptions.
Wait for the process to end.
Return with the process's exit code.
The idea here is to use a process-based trampoline and drop the current process size to an absolute minimum while the newly created one is started.
But since I'm not very familiar with Windows, I probably made quite a few mistakes here. Also, the above seems extremely inefficient, and to an extent it just feels horribly wrong for something a kernel could do by just releasing a few memory pages, deallocating a bunch of memory handles, and moving some memory around for the next process.
So, to summarize, what's the ideal way to emulate process replacement on Windows with the fewest limitations?

Given that I don't understand what is actually being requested and I certainly look at things like 'execve' with a "who the hell would ever call that anyway, nothing but madness can ever result" sentiment, I nonetheless look at this problem by asking myself:
If process-a was killed and replaced by a near-identical process-b, who or what would notice?
Anything that held the process ID, or a handle to the process, would certainly notice. This can be handled by writing a wrapper app which loads the first node process and, when prodded, kills it and loads the next. External observers see the wrapping process's handles and IDs unchanged.
Obviously this would cut off the stdin and stdout streams being fed into the node applications. But again, the wrapper process could get around this by passing the same set of inheritable handles to each node process launched by filling in the STARTUPINFO structure passed to CreateProcess properly.
Windows doesn't support signals, and the ones that the MS C runtime fakes all deal with internal errors except one, which deals with an interactive console window being closed via Ctrl-C. The active Node.js app is sure to get that one anyway, or it can be passed on from the wrapper, as the node apps would not actually be running on the interactive console with this approach.
Other than that, everything else seems to be an internal detail of the Node.js application, so it shouldn't affect any third-party app communicating with what it thinks is a single node app via its stdin/stdout streams.
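The wrapper idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Node.js implementation: it is written in Python rather than C for brevity, the name replace_process is hypothetical, and it relies on the fact that subprocess.Popen leaves the child's standard streams attached to the wrapper's when no redirection is requested. The wrapper stays alive, waits, and hands back the child's exit code, so the process ID is not preserved and the old process's memory is not reclaimed.

```python
import subprocess
import sys


def replace_process(argv, env=None, cwd=None):
    """Approximate exec: run argv as a child, wait for it, return its exit code.

    No stdin/stdout/stderr redirection is requested, so the child shares
    the wrapper's standard streams.
    """
    child = subprocess.Popen(argv, env=env, cwd=cwd)
    while True:
        try:
            return child.wait()
        except KeyboardInterrupt:
            # Ctrl-C reaches every process attached to the console, so the
            # child receives it as well; keep waiting until it has exited.
            continue


# Usage: run a throwaway Python child and pick up its exit code.
exit_code = replace_process([sys.executable, "-c", "import sys; sys.exit(7)"])
print(exit_code)
```

A real wrapper would finish with sys.exit(exit_code) so that whoever is waiting on the wrapper sees the replacement's exit status.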

Related

What is the use of a process with no threads in Windows?

I'm reading Windows Internals (7th Edition), and they write about processes in Chapter 1:
Processes
[...] a Windows process comprises the following:
[...]
At least one thread of execution. Although an "empty" process is possible, it is (mostly) not useful.
What does "mostly" mean in this context? What could a process with no threads do, and how would that be useful?
EDIT: Also, in a 2015 talk, Mark Russinovich says that a process has "at least one thread" (19:12). Was that a generalization?
Disclaimer: I work for Microsoft.
I think the answer has come out in the comments. There seem to be at least two scenarios where a threadless process would be useful.
Scenario 1: capturing process snapshots
This is probably the most straightforward one. As RbMm commented, PssCaptureSnapshot can be called with the PSS_CAPTURE_VA_CLONE option to create a threadless (or "empty") process (using ZwCreateProcessEx, presumably to duplicate the target process's memory in kernel mode).
The primary use here would be for debugging, if a developer wanted to inspect a process's memory at a certain point in time.
Notably, Eryk Sun points out that an empty process is not necessary for inspecting handles (even though an empty process holds both its own memory space and handles), since there is already a way to inspect a process's handles without creating a new process or duplicating memory.
Scenario 2: safely forking processes with specific inherited handles
Raymond Chen explains another use for a threadless process: creating new "real" processes with inherited handles safely.
When a thread wants to create a new process (CreateProcess), there are several ways for it to pass handles to the new process:
Make a handle inheritable and CreateProcess with bInheritHandles = true.
Make a handle inheritable, add it to a PROC_THREAD_ATTRIBUTE_LIST, and pass that list to the CreateProcess call.
However, they offer conflicting guarantees that can cause problems when callers want to create two processes with different handles concurrently. As Raymond puts it in Why do people take a lock around CreateProcess calls?:
In order for a handle to be inherited, you not only have to put it in the PROC_THREAD_ATTRIBUTE_LIST, but you also must make the handle inheritable. This means that if another thread is not on board with the PROC_THREAD_ATTRIBUTE_LIST trick and does a straight CreateProcess with bInheritHandles = true, it will inadvertently inherit your handles.
You can use a threadless process to mitigate this. In general:
Create a threadless process.
DuplicateHandle all of the handles you want to capture into this new threadless process.
CreateProcess your new, real forked process, using the PROC_THREAD_ATTRIBUTE_LIST, but set the nominal parent process of this process to be the threadless process (with PROC_THREAD_ATTRIBUTE_PARENT_PROCESS).
You can now CreateProcess concurrently without worrying about other callers, and you can now close the duplicate handles and the empty process.

How to identify a process in Windows? Kernel and User mode

In Windows, what is the formal way of identifying a process uniquely? I am not talking about the PID, which is allocated dynamically, but a unique ID or a name which is permanent to that process. I know that every program/process has a security descriptor, but it seems to hold SIDs for the logged-in user and group (not the process). We cannot use the path and name of the executable from which the process starts, as that can change.
My aim is to identify a process in the kernel mode and allow it to perform certain operation. What is the easiest and best way of doing this?
Your question is too vague to answer properly. For example how could the path possibly change (without poking around in kernel memory) after creation of a process? And yes, I am aware that one could hook into the memory-mapping process during process creation to replace the image originally destined to be loaded with another. Point is that a process is merely one instance of running a given executable. And it's not clear what exact tampering attempts you want to counter here.
But from kernel mode you do have the ability to simply use the pointer to the EPROCESS structure. No need to use the PID, although that will be unique while the process is still alive.
So assuming your process uses an IRP to communicate to the driver (whether it be WriteFile, ReadFile, DeviceIoControl or something more exotic), in order to register itself, you can use IoGetCurrentProcess to get the PEPROCESS value which will be unique to the process.
While the structure itself is not officially documented, hints can be gleaned from the "Windows Internals" book (in its various incarnations), the dt (Display Type) command in WinDbg (and friends) as well as from third-party resources on the internet (e.g. here, specific to Vista).
The process objects are kept in several linked lists. So if you know the (officially undocumented!!!) layout for a particular OS version, you may traverse the lists to get from one to the next process object (i.e. EPROCESS structure).
Cautionary notes
Make sure to reference the process object by using the respective object manager routines. Otherwise you cannot be certain it's safe either to reach into these structures (which is unsafe anyway, since you cannot rely on their layout across OS versions) or to pass the pointer to functions that expect a PEPROCESS.
As a side-note: Harry Johnston is of course right to assert that a privileged user can insert arbitrary (well almost arbitrary) code into the TCB in order to thwart your protective measures. In the end it is going to be an arms race.
Also keep in mind that, similar to PIDs, the value of the PEPROCESS may theoretically be recycled. But in both cases you can simply counter this by invalidating whatever internal state you keep in your driver that allows the process to do its magic, whenever the process goes down. Using something like PsSetCreateProcessNotifyRoutine would seem to be a good method here. That callback receives the process ID (typed as a HANDLE); to translate it to a PEPROCESS value, use PsLookupProcessByProcessId. ObReferenceObjectByHandle is the routine to use when you hold an actual process handle.
An alternative way of countering recycling of the PID/PEPROCESS is to keep a reference to the process object, thus keeping it in a kind of undead state (similar to not closing a handle in user mode), although the main thread may have finished.

Is it possible to associate data with a running process?

As the title says, I want to associate a random bit of data (ULONG) with a running process on the local machine. I want that data persisted with the process it's associated with, not the process that's reading & writing the data. Is this possible in Win32?
Yes but it can be tricky. You can't access an arbitrary memory address of another process and you can't count on shared memory because you want to do it with an arbitrary process.
The tricky way
What you can do is to create a window (with a special and known name) inside the process you want to decorate. See the end of the post for an alternative solution without windows.
First of all you have to get a handle to the process with OpenProcess.
Allocate memory with VirtualAllocEx in the other process to hold a short function that will create a (hidden) window with a special known name.
Copy that function from your own code with WriteProcessMemory.
Execute it with CreateRemoteThread.
Now you need a way to identify and read back this memory from a process other than the one that created it. For this you can simply find the window with that known name, and you have your holder for a small chunk of data.
Please note that this technique may also be used to inject code into another process, so some antivirus software may warn about it.
Final notes
If Address Space Layout Randomization is disabled you may not need to inject code into the process memory: you can call CreateRemoteThread with the address of a kernel32.dll function that takes the same parameters as a thread start routine (for example LoadLibrary). You can't do this with native applications (not linked to kernel32.dll).
You can't inject into system processes unless you have debug privileges for your process (with AdjustTokenPrivileges).
As an alternative to the fake window you may create a suspended thread with a local variable, a TLS slot or a stack entry used as the data chunk. To find this thread you have to give it a name using, for example, this (but it's seldom applicable).
The naive way
A poor man's solution (but probably much easier to implement and somehow even more robust) is to use an ADS (NTFS alternate data stream) to hide a small data file for each process you want to monitor. Of course the ADS is attached to the process's image file, so this is not applicable to services and rundll'ed processes unless you make it much more complicated.
Iterate all processes and for each one create an ADS with a known name (and the process ID).
Inside it you have to store the system startup time and all the data you need.
To read that information back:
Iterate all processes and check for that ADS, read it and compare the system startup time (if they mismatch, you have found a widow ADS and it should be deleted).
Of course you have to take care of these widows, so you may need to check for them periodically. You can avoid this by storing all these small chunks of data in a well-known location; your "reader" can then check them all each time, deleting files no longer associated with a running process.
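A minimal sketch of the "well-known location" variant might look like the following. Everything here is an assumption made for illustration: the function names, the one-JSON-file-per-PID layout, and the choice to pass the boot time and the set of live PIDs in as parameters instead of querying the OS. It does not use ADS at all, only ordinary files.

```python
import json
import os
import tempfile


def save_tag(store_dir, pid, boot_time, value):
    """Record value for process pid, stamped with the system boot time."""
    os.makedirs(store_dir, exist_ok=True)
    with open(os.path.join(store_dir, f"{pid}.json"), "w") as fh:
        json.dump({"boot_time": boot_time, "value": value}, fh)


def load_tag(store_dir, pid, boot_time):
    """Return the stored value, or None if absent or left from another boot."""
    try:
        with open(os.path.join(store_dir, f"{pid}.json")) as fh:
            record = json.load(fh)
    except FileNotFoundError:
        return None
    return record["value"] if record["boot_time"] == boot_time else None


def sweep(store_dir, live_pids, boot_time):
    """Delete records whose process is gone or whose boot time is stale."""
    removed = []
    for name in os.listdir(store_dir):
        pid_text, ext = os.path.splitext(name)
        if ext != ".json" or not pid_text.isdigit():
            continue
        pid = int(pid_text)
        if pid not in live_pids or load_tag(store_dir, pid, boot_time) is None:
            os.remove(os.path.join(store_dir, name))
            removed.append(pid)
    return sorted(removed)


# Usage with made-up PIDs and a made-up boot timestamp.
store = tempfile.mkdtemp()
save_tag(store, 100, 1700000000, 42)
save_tag(store, 200, 1700000000, 7)
stale = sweep(store, live_pids={100}, boot_time=1700000000)
```

The boot-time stamp only guards against records surviving a reboot; a PID reused within the same boot is caught only if the sweep runs between the old process exiting and the new one starting.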

Pipe output(stdout) from running process Win32Api

I need to get (or pipe) the output from a process that is already running, using the windows api.
Basically my application should allow the user to select a window to pipe the input from, and all input will be displayed in a console. I would also be looking on how to get a pipe on stderr later on.
Important: I did not start the process using CreateProcess() or otherwise. The process is already running, and all I have is the ID of the process (returned from GetWindowThreadProcessId()).
The cleanest way of doing this without causing any ill effects, such as may occur if you used the method Adam implied of swapping the existing stdout handle with your own, is to use hooking.
You inject a thread into the existing application and swap calls to WriteFile with an intercepted version that first gives you a copy of what's being written (filtered by handle, source, whatever) and then passes it along to the real ::WriteFile, with no harm done. Or you can intercept the call higher up by only swapping out printf or whichever call it is that the software is using (some experimentation needed, obviously).
HOWEVER, Adam is spot-on when he says this isn't what you want to do. This is a last resort, so think very, very carefully before going down this line!
Came across this article from MS while searching on the topic.
http://support.microsoft.com/kb/190351
The concept of piping input and output on Unix is trivial; there seems no great reason for it to be so complex on Windows. - Karl
Whatever you're trying to do, you're doing it wrong. If you're interacting with a program for which you have the source code, create a defined interface for your IPC: create a socket, a named pipe, windows messaging, shared memory segment, COM server, or whatever your preferred IPC mechanism is. Do not try to graft IPC onto a program that wasn't intending to do IPC.
You have no control over how that process's stdout was set up, and it is not yours to mess with. It was created by its parent process and handed off to the child, and from there on out, it's in control of the child. You don't go in and change the carpets in somebody else's house.
Do not even think of going into that process, trying to CloseHandle its stdout, and CreateFile a new stdout pointing to your pipe. That's a recipe for disaster and will result in quirky behavior and "impossible" crashes.
Even if you could do what you wanted to do, what would happen if two programs did this?
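For contrast, when the program can be launched by your own application, redirection is set up at creation time and nobody else's handles are touched. The following is only a sketch under that assumption, and it does not help with a process that is already running; the helper name run_and_capture is hypothetical, and it uses Python's subprocess module instead of raw CreateProcess calls.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Start argv with stdout and stderr redirected to pipes owned by the caller.

    Returns (exit code, list of stdout lines, stderr text).
    """
    child = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() drains both pipes to the end, then waits for the child.
    out, err = child.communicate()
    return child.returncode, out.splitlines(), err


# Usage: capture the output of a throwaway Python child.
code, lines, err = run_and_capture(
    [sys.executable, "-c", "print('hello'); print('world')"]
)
```

Because the launcher creates the pipes itself, the child's standard handles are the launcher's to read, which is exactly the ownership the answer above says you lack for an already running process.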

What happens to a process handle once the process was ended?

If I have a handle to some Windows process which has stopped (killed or just ended):
1. Will the handle (or better the memory behind it) be re-used for another process?
2. Or will GetExitCodeProcess() for example get the correct result forever from now on?
If 1. is true: How "long" would GetExitCodeProcess() work?
If 2. is true: Wouldn't that mean that I can bring down the OS with starting/killing new processes, since I create more and more handles (and the OS reserves memory for them)?
I'm a bit confused about the concept of handles.
Thank you in advance!
The handle indirectly points to a kernel object. As long as there are open handles, the object will be kept alive.
Will the handle (or better the memory behind it) be re-used for another process?
The numeric value of the handle (or however it is implemented) might get reused, but that doesn't mean it'll always point to the same thing. Just like process IDs.
Or will GetExitCodeProcess() for example get the correct result forever from now on?
No. When all handles to the process are closed, the process object is freed (along with its exit code). Note that a running process holds an implicit handle to itself. You can hold an open handle, though, for as long as you need it.
If 2. is true: Wouldn't that mean that I can bring down the OS with starting/killing new processes, since I create more and more handles (and the OS reserves memory for them)?
There are many ways to starve the system. It will either start heavily swapping or just fail to spawn a new process at some point.
Short answer:
GetExitCodeProcess works until you call CloseHandle, after which the process object will be released and the handle value may be reused.
Long answer:
See Cat Plus Plus's answer.

Resources