MiniDumpWriteDump from another process - windows

I want to use MiniDumpWriteDump to generate crash logs for an application of mine. Microsoft recommends performing the dump from another process, which is what I'm trying to do. The problem is passing the PEXCEPTION_POINTERS structure from the parent to the child process: the parent process owns the memory for this structure, and I need to give it to the child. I found this post:
How do I get at the exception information when using MiniDumpWriteDump out-of-process?
The accepted answer said "It doesn't matter that the pointer is not valid in the context of the watchdog process.", which led me to believe I could simply pass the PEXCEPTION_POINTERS pointer that my unhandled exception filter receives to the child process, and Windows would read it from the parent. This isn't happening, so I don't really know what to do. At the moment the child process crashes, presumably because Windows tries to access this memory as if it belonged to the child. I'm obviously missing something here, but I'm not sure what. I use pipes to send data to the child process, and the answer to the above question says using memory-mapped files works, but I'm not really sure why, or whether I'm understanding the answer correctly.

Debug the process you want to dump. The structure you need to fill in is:
typedef struct _EXCEPTION_POINTERS {
    PEXCEPTION_RECORD ExceptionRecord;
    PCONTEXT ContextRecord;
} EXCEPTION_POINTERS, *PEXCEPTION_POINTERS;
ExceptionRecord can be obtained from the EXCEPTION_DEBUG_EVENT returned by WaitForDebugEventEx.
ContextRecord can be obtained by calling OpenThread and GetThreadContext with the thread ID from the debug event.
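For the pipe-based setup described in the question, one possible shape for the message is sketched below. It assumes the crashing process sends its process ID, the faulting thread ID, and the numeric value of the pointer its exception filter received. CrashMessage, encode and decode are hypothetical names. The watchdog side is shown only as comments because it is Windows-only. It relies on the ClientPointers member of MINIDUMP_EXCEPTION_INFORMATION, which tells MiniDumpWriteDump whether the pointers are addresses in the target process rather than in the caller.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical message written to the pipe by the crashing process.
struct CrashMessage {
    std::uint32_t process_id;          // GetCurrentProcessId() in the crashing process
    std::uint32_t thread_id;           // GetCurrentThreadId() of the faulting thread
    std::uint64_t exception_pointers;  // numeric value of the filter's EXCEPTION_POINTERS*
};
static_assert(sizeof(CrashMessage) == 16, "layout assumed to have no padding");

using CrashBytes = std::array<unsigned char, sizeof(CrashMessage)>;

// Serialize the message for the pipe (both ends run on the same machine).
CrashBytes encode(const CrashMessage& m) {
    assert(m.exception_pointers != 0);  // a null address carries no exception info
    CrashBytes bytes{};
    std::memcpy(bytes.data(), &m, sizeof m);
    return bytes;
}

// Rebuild the message in the watchdog process.
CrashMessage decode(const CrashBytes& bytes) {
    CrashMessage m{};
    std::memcpy(&m, bytes.data(), sizeof m);
    return m;
}

// Watchdog side, Windows-only and untested here:
//   MINIDUMP_EXCEPTION_INFORMATION info;
//   info.ThreadId          = msg.thread_id;
//   info.ExceptionPointers = reinterpret_cast<PEXCEPTION_POINTERS>(
//                                static_cast<std::uintptr_t>(msg.exception_pointers));
//   info.ClientPointers    = TRUE;  // the address is valid in the crashing process
//   MiniDumpWriteDump(hProcess, msg.process_id, hFile, MiniDumpNormal,
//                     &info, NULL, NULL);
```

The crashing process would need to stay blocked inside its filter until the watchdog reports that the dump has been written, since the exception record and context live in its address space.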

Related

What is the ideal way to emulate process replacement on Windows?

So, in a feature request I filed against Node.js, I was looking for a way to replace the current Node process with another. In Linux and friends (really, any POSIX-compliant system), this is easy: use execve and friends and call it a day. But obviously, that won't work on Windows, since it only has CreateProcess (which execve and friends delegate to, complete with async behavior). And it's not like people haven't wanted to do similar, leading to numerous duplicate questions on this site. (This isn't a duplicate because it's explicitly seeking a workaround given certain constraints, not just asking for direct replacement.)
Process replacement has several facets that have to be addressed:
All console I/O streams have to be forwarded to the new process.
All signals need to be transparently forwarded to the new process.
The data from the old process have to be destroyed, with as many resources reclaimed as possible.
All pre-existing threads and child processes should be destroyed.
All pre-existing handles should be destroyed apart from open file descriptors and named pipes/etc.
Optimally, the old process's memory should be kept to a minimum after the process is created.
For my particular use case, retaining the process ID is not important.
And for my particular case, there are a few constraints:
I can control the initial process's startup as well as the location of my "process replacement" function.
I could load arbitrary native code via add-ons at potentially any stack offset.
Implication: I can't even dream of tracking malloc calls, handles, thread manipulation, or process manipulation to track and free them all, since DLL rewriting isn't exactly practical.
I have no control over when my "process replacement" is called. It could be called through an add-on, which could've been called through either interpreted code via FFI or even another add-on recursively. It could even be called during add-on initialization.
Implication: I would have no ability to know what's in the stack, even if I perfectly instrumented my side. And rewriting all their calls and pushes is far from practical, and would just be all-around slow for obvious reasons.
So, here's the gist of what I was thinking: use something similar to a pseudo-trampoline.
Statically allocate the following:
A single pointer for the stack pointer.
MAX_PATH + 1 chars for the application path + '\0'.
MAX_PATH + 1 chars for the current working directory path + '\0'.
32768 chars for the arguments + '\0'.
32768 chars for the environment + '\0'.
On entry, set the global stack pointer reference to the stack pointer.
On "replacement":
Do relevant process cleanup and lock/release everything you can.
Set the stack pointer to the stored original global one.
Terminate each child thread.
Kill each child process.
Free each open handle.
If possible (i.e. not in a UWP program), destroy each heap that is not the default heap or the temporary heap (if it exists).
If possible, close each open handle.
If possible, walk the default heap and free each segment associated with it.
Create a new process with the statically allocated file/arguments/environment/etc. with no new window created.
Proxy all future received signals, exceptions, etc. without modification to this process somehow. The standard signals are easy, but not so much with the exceptions.
Wait for the process to end.
Return with the process's exit code.
The idea here is to use a process-based trampoline and drop the current process size to an absolute minimum while the newly created one is started.
But since I'm not very familiar with Windows, I probably made quite a few mistakes here. Also, the above seems extremely inefficient, and to an extent it just feels horribly wrong for something where a kernel could just release a few memory pages, deallocate a bunch of memory handles, and move some memory around for the next process.
So, to summarize, what's the ideal way to emulate process replacement on Windows with the fewest limitations?
Given that I don't understand what is actually being requested and I certainly look at things like 'execve' with a "who the hell would ever call that anyway, nothing but madness can ever result" sentiment, I nonetheless look at this problem by asking myself:
If process-a was killed and replaced by a near-identical process-b, who or what would notice?
Anything that held the process ID, or a handle to the process, would certainly notice. This can be handled by writing a wrapper app which loads the first node process and, when prodded, kills it and loads the next. External observers see the wrapping process's handles and IDs unchanged.
Obviously this would cut off the stdin and stdout streams being fed into the node applications. But again, the wrapper process could get around this by passing the same set of inheritable handles to each node process launched by filling in the STARTUPINFO structure passed to CreateProcess properly.
Windows doesn't support signals, and the ones that the MS C runtime fakes all deal with internal errors except one, which deals with an interactive console window being closed via Ctrl-C. The active Node.js app is sure to get that one anyway, or it can be passed on from the wrapper, as the node apps would not actually be running on the interactive console with this approach.
Other than that, everything else seems to be an internal detail of the Node.js application, so it shouldn't affect any third-party app communicating with what it thinks is a single node app via its stdin/stdout streams.

How to identify a process in Windows? Kernel and User mode

In Windows, what is the formal way of identifying a process uniquely? I am not talking about the PID, which is allocated dynamically, but a unique ID or a name which is permanent to that process. I know that every program/process has a security descriptor, but it seems to hold SIDs for the logged-in user and group (not the process). We cannot use the path and name of the executable from which the process starts, as that can change.
My aim is to identify a process in kernel mode and allow it to perform certain operations. What is the easiest and best way of doing this?
Your question is too vague to answer properly. For example how could the path possibly change (without poking around in kernel memory) after creation of a process? And yes, I am aware that one could hook into the memory-mapping process during process creation to replace the image originally destined to be loaded with another. Point is that a process is merely one instance of running a given executable. And it's not clear what exact tampering attempts you want to counter here.
But from kernel mode you do have the ability to simply use the pointer to the EPROCESS structure. No need to use the PID, although that will be unique while the process is still alive.
So assuming your process uses an IRP to communicate to the driver (whether it be WriteFile, ReadFile, DeviceIoControl or something more exotic), in order to register itself, you can use IoGetCurrentProcess to get the PEPROCESS value which will be unique to the process.
While the structure itself is not officially documented, hints can be gleaned from the "Windows Internals" book (in its various incarnations), the dt (Display Type) command in WinDbg (and friends) as well as from third-party resources on the internet (e.g. here, specific to Vista).
The process objects are kept in several linked lists. So if you know the (officially undocumented!!!) layout for a particular OS version, you may traverse the lists to get from one to the next process object (i.e. EPROCESS structure).
Cautionary notes
Make sure to reference the process object by using the respective object manager routines. Otherwise you cannot be certain it's safe either to reach into these structures (which is unsafe anyway, since you cannot rely on their layout across OS versions) or to pass the pointer to functions that expect a PEPROCESS.
As a side-note: Harry Johnston is of course right to assert that a privileged user can insert arbitrary (well almost arbitrary) code into the TCB in order to thwart your protective measures. In the end it is going to be an arms race.
Also keep in mind that, similar to PIDs, the value of the PEPROCESS may theoretically be recycled. But in both cases you can simply counter this by invalidating whatever internal state you keep in your driver that allows the process to do its magic, whenever the process goes down. Using something like PsSetCreateProcessNotifyRoutine would seem to be a good method here. The callback receives process IDs (typed as HANDLE) rather than real handles, so use PsLookupProcessByProcessId to translate one into a PEPROCESS value.
An alternative way of countering recycling of the PID/PEPROCESS is to keep a reference to the process object, thus keeping it in a kind of undead state (similar to not closing a handle in user mode), although the main thread may have finished.

What happens when kernel delayed_work is rescheduled

I am using the kernel shared workqueue, and I have a delayed_work struct that I want to reschedule to run immediately.
Will the following code guarantee that the delayed_work will run as soon as possible?
cancel_delayed_work(work);
schedule_delayed_work(work, 0);
What happens in a situation where the work is already running? cancel_delayed_work will return 0, but I'm not sure what schedule_delayed_work will do if the work is currently running or is unscheduled.
Well, you know what they say about necessity being the mother of all invention (or research in this case). I really needed this answer and got it by digging through kernel/workqueue.c. Although the answer is mostly contained in the doc comments combined with Documentation/workqueue.txt, it isn't clearly spelled out without reading the whole spec on the Concurrency Managed Workqueue (cmwq) subsystem and even then, some of the information is out of date!
Short Answer
Will [your code] guarantee that the delayed_work will run as soon as possible?
Yes (with the below caveat)
What happens in a situation where the work is already running?
It will run at some point after the currently running delayed_work function exits and on the same CPU as the last one, although any other work already queued on that workqueue (or delayed work that is due) will be run first. This is presuming that you have not re-initialized your delayed_work or work_struct object and that you have not changed the work->func pointer.
Long Answer
So first off, struct delayed_work uses pseudo-inheritance to derive from struct work_struct by embedding a struct work_struct as its first member. This subsystem uses some amazing atomic bit-frigging to have some serious concurrency. A work_struct is "owned" when its data field has the WORK_STRUCT_PENDING bit set. When a worker executes your work, it releases ownership and records the last work pool via the private set_work_pool_and_clear_pending() function -- this is the last time the API modifies the work_struct object (until you re-schedule it, of course). Calling cancel_delayed_work() does the exact same thing.
So if you call cancel_delayed_work() when your work function has already begun executing, it returns false (as advertised) since it is no longer owned by anybody, even though it may still be running. However, when you try to re-add it with schedule_delayed_work(), it will examine the work to discover the last pool_workqueue and then find out if any of that pool_workqueue's workers are currently running your work. If they are (and you haven't changed the work->func pointer), it simply appends the work to the queue of that pool_workqueue and that's how it avoids re-entrancy! Otherwise, it will queue it on the pool for the current CPU. (The reason for the work->func pointer check is to allow for reuse of the work_struct object.)
Note however that simply calling schedule_delayed_work() without cancelling it first will result in no change if the work is still queued, so you definitely must cancel it first.
EDIT: Oh yeah, if you are confused by the discussion in Documentation/workqueue.txt about WQ_NON_REENTRANT, ignore it. This flag is deprecated and ignored, and all workqueues are now non-reentrant.
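To make the ownership rule concrete, here is a deliberately simplified user-space model of the PENDING bit. It is not kernel code. ModelWork and its methods are hypothetical names, and timers, pools and worker threads are left out entirely. It only illustrates why scheduling an already queued item changes nothing, and why cancel followed by schedule takes effect.

```cpp
#include <atomic>
#include <cassert>

// Toy stand-in for a delayed_work item; only the PENDING bit is modelled.
struct ModelWork {
    std::atomic<bool> pending{false};  // plays the role of WORK_STRUCT_PENDING
    unsigned long delay = 0;           // delay that is currently armed

    // Like schedule_delayed_work(): only the caller that sets PENDING wins.
    bool schedule(unsigned long d) {
        if (pending.exchange(true)) {
            return false;  // already queued: the existing delay is left alone
        }
        delay = d;
        return true;
    }

    // Like cancel_delayed_work(): true only if a pending item was cancelled.
    bool cancel() { return pending.exchange(false); }

    // A worker clears PENDING just before it runs the work function.
    void worker_starts() {
        assert(pending.load());  // a worker only picks up queued items
        pending.store(false);
    }
};
```

In this model, schedule(100) followed by schedule(0) leaves the delay at 100, while cancel() followed by schedule(0) re-arms it at 0. After worker_starts(), cancel() returns false but schedule(0) succeeds, which mirrors the "already running" case discussed above.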

What happens to a process handle once the process was ended?

If I have a handle to some Windows process which has stopped (killed or just ended):
1. Will the handle (or better the memory behind it) be re-used for another process?
2. Or will GetExitCodeProcess() for example get the correct result forever from now on?
If 1. is true: How "long" would GetExitCodeProcess() work?
If 2. is true: Wouldn't that mean that I can bring down the OS with starting/killing new processes, since I create more and more handles (and the OS reserves memory for them)?
I'm a bit confused about the concept of handles.
Thank you in advance!
The handle indirectly points to a kernel object. As long as there are open handles, the object will be kept alive.
Will the handle (or better the memory behind it) be re-used for another process?
The numeric value of the handle (or however it is implemented) might get reused, but that doesn't mean it'll always point to the same thing. Just like process IDs.
Or will GetExitCodeProcess() for example get the correct result forever from now on?
No. When all handles to the process are closed, the process object is freed (along with its exit code). Note that a running process holds an implicit handle to itself. You can hold an open handle, though, for as long as you need it.
If 2. is true: Wouldn't that mean that I can bring down the OS with starting/killing new processes, since I create more and more handles (and the OS reserves memory for them)?
There are many ways to starve the system. It will either start heavily swapping or just fail to spawn a new process at some point.
Short answer:
GetExitCodeProcess works until you call CloseHandle, after which the process object will be released and the handle value may be reused.
Long answer:
See Cat Plus Plus's answer.
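The lifetime rule in both answers can be pictured with a toy reference-counting model. This is only an analogy, assuming std::shared_ptr as a stand-in for open handles. ProcessObject, create_process, end_process, query_exit_code and close_handle are hypothetical names and not Windows APIs.

```cpp
#include <cassert>
#include <memory>

// Toy stand-in for the kernel's process object.
struct ProcessObject {
    bool running = true;
    int exit_code = 0;
};

// Each open handle is modelled as one shared reference to the object.
using Handle = std::shared_ptr<ProcessObject>;

// Creating the process yields the first handle.
Handle create_process() { return std::make_shared<ProcessObject>(); }

// The process ends, but the object survives while handles still refer to it.
void end_process(const Handle& h, int code) {
    assert(h);
    h->running = false;
    h->exit_code = code;
}

// Rough analogue of GetExitCodeProcess(): usable while the handle is open.
int query_exit_code(const Handle& h) {
    assert(h);
    return h->exit_code;
}

// Rough analogue of CloseHandle(): the object is freed with the last handle.
void close_handle(Handle& h) { h.reset(); }
```

Copying a Handle plays the role of a second open handle. The exit code stays readable through any remaining copy after the process has ended, and the object goes away only when the last copy is closed.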

Shutdown exception handling for Win32/C++

I have a process that handles exceptions great. It calls:
_set_se_translator(exception_trans_func);
SetUnhandledExceptionFilter(UnhandledExceptionFilterHandler);
_set_purecall_handler(purecallHandler);
set_terminate(terminateHandler);
set_unexpected(unexpectedHandler);
_set_invalid_parameter_handler(InvalidParameterHandler);
atexit(exitHandler); //ignored during an expected exit
_onexit(onexitHandler); //ignored during an expected exit
Anytime an exception happens, one of the handlers is called which creates a crash dump for me. Life is good.
Except at one customer site. When they shut down the process, there is an exception that isn't routed through these calls for some reason, and they get the error:
The instruction at "0x101ba9df" referenced memory at "0x00000004". The memory could not be "read". Click OK to terminate....
The memory reference of 0x00000004 looks like it's probably a null pointer dereference. The code at that address appears to be a global STL object's destructor (probably in the CRT's initterm call where globals are cleaned up).
Right now I'm kind of stuck though since I can't get a diagnostic dump and call stack and see exactly what is going on. So....
Why isn't the exception being routed through the above handlers, and instead being shown to the user?
Is there any way to hide that dialog (since no harm is being done at that point)?
And is there a way to track down the root error?
Thanks for any ideas.
What operating system are they running?
I assume you're setting the error mode using something like
::SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX | SEM_NOOPENFILEERRORBOX);
to make sure that Windows isn't jumping in with its own error handling?
This sounds like the CRT has put an SEH try/catch block (can't write it properly, Markdown kicks in) around some piece of code, and is catching the exception to display the message, so you never end up calling the unhandled exception code path. You might have to do some CRT hacking to figure out what's happening.
It could be that STL code is being executed during the destruction of global variables at program shutdown time and perhaps (depending on the version of STL that you're using) some global variables that it requires have already been destroyed.
I've seen this with VS2008's STL. There are some STL lock objects that are created via a file level static during start up.
Are you using STL in your error handler functions? It could be that one of these is going off late in program shutdown and causing the problem.
