Windows stack guard page not triggering in _chkstk - winapi

I have seen a few crashes "in the wild" where the crash dump shows the code throwing an access violation inside _chkstk when attempting to expand the stack. Windbg shows that _chkstk is touching the guard page, however rather than expanding the stack as it should, it just throws an access violation.
I suspected this might be due to user mode structured exception handlers in the code, however my testing shows that under normal conditions the _chkstk guard page exceptions happen in kernel mode and never even reach the user mode exception handlers.
Hence in this case it looks like the kernel mode guard page exceptions are not being handled for some reason, and instead user mode access violations are triggered.
What could cause this to happen?

This turned out to be an XP/Server 2003 kernel specific issue. On those OS if one thread reads another threads stack then the guard page and TIB state get messed up and any subsequent attempts to grow the stack (_chkstk) result in an access violation. This does not happen on later OS.
In our case we were writing out an in-process minidump containing thread stacks, which would corrupt the stack state as described when the dbghelp library read each threads stack.
The moral of the story is that it is not safe to generate in-process minidumps, they should always be generated by an external process.

Related

Detect UI operation which will "hang" the application if running in service mode

Fellow experts!
I have faced the following dilemma: some of our tools (executables) are started as scheduled tasks, some are started as services and others as usual desktop apps with interactive Windows user. We are using the code sharing strategy for source management (this is not debatable for this question).
So the solution I want to find is the following:
Detect UI operation at run-time which leads to hanging service/background task (such as say call to Application.ShowException, ShowMessage, MessageDialog, TForm.Show etc.). And when such an action detected I want to raise the exception instead. Then the operation will fail, we will have stack trace etc. but the process will not hang up! The most problematic hang up is when some event processing is done in transaction and then in some of the code used to process event suddenly (because of error in code, design, whatever) there is UI code executed then the process hangs and the DB parts can be locked!
What I think I need to do is: Use DDetours library to intercept WinAPI calls to a certain routines and raise exception instead (so that the process does not hang, but just fail in some method). Also I know that the creation of forms and windows does not hang the app, but only the tries to show them to the user.
Is there some known method of handling this problem? Or maybe there is some list of WinAPI routine set which hangs in service mode?
Thank you in advance.

Silent Process Exit due to KiSchedulerApcTerminate

I have a Windows 10 machine where some UI process runs happily for days until it silently exits without any visible exception.
I have enabled ETW tracing but pretty much all call stacks come up with ???? marks. I suppose there was no proper ETW provider rundown happening in this process. After enabling Process Destruction call stacks I have found that one
[Root]
ntoskrnl.exe!KiPageFault
ntoskrnl.exe!KiInitiateUserApc
ntoskrnl.exe!KiDeliverApc
ntoskrnl.exe!KiSchedulerApcTerminate
ntoskrnl.exe!PsExitCurrentUserThread
ntoskrnl.exe!PspExitThread
ntoskrnl.exe!PspExitProcess
Does this indicate a page in error so I should suspect a hardware failure of the hard disc or memory?
I have found a failed file io request at that time:
The specified request is not a valid operation for the target device.
(0xc0000010) = Event SubType, FSCTL = Event Type, C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.ServiceModel\v4.0_4.0.0.0__b77a5c561934e089\System.ServiceModel.dll
which makes me think that the hard disk has an issue.

Win32: Any way to allocate memory range and have traps on specific pages?

I was wondering if Windows exposes an API that allows a user-mode process to allocate a chunk of address space and then install "traps" on specific pages of this space, so that if these pages are accessed (read/written) by the process, then Windows will call a handler in the program, for instance by invoking a callback function or throwing an exception that can be handled by the program? I think this should be possible for Windows to implement by setting up the page tables to trigger page faults on the relevant pages. Then, if the memory is accessed this will trigger a page fault by the CPU, which the page fault handler can reflect back to the program. But I don't know if Windows actually offers this functionality.
By the way, is the functionality supported in Linux?
Quoting Raymond's answer/comment:
"Use `VirtualProtect' to revoke access to the page, and then install a vectored exception handler."

What errors / exceptions trigger Windows Error Reporting?

When running a Delphi application outside the debugger most exceptions that occur seem to be silently ignored (like an access violation). Sometimes however there appears the Windows error reporting dialog (send or not send, you probably know what I mean). What exactly does this mean? What errors trigger this behaviour?
Additional info: I have a global exception handler for my application that should log all unhandled exceptions. So, no exceptions should leave the application unhandled.
Thanks.
Most exceptions are not silently ignored when running outside the debugger. They are normally caught by the event loop in VCL applications, or fall through to the main begin/end in console applications, etc. The default aciton of the VCL event loop is to display a dialog containing the message associated with the exception.
It's if the exception escapes the application, either by reaching the main begin/end without being caught, or not being caught by the event loop, that the Windows error reporting steps in - functionally, it is an exception handler just like any other except at the very base of the stack.
It covers exceptions that are not handled by the application - if an exception propagates outside of the main entry point of the app, then WER will step in. This covers things like AVs, divide by zero, invalid handle access and other out of band or "chip" exceptions. Sometimes your code can attempt to handle those things, but if memory is corrupted too badly or what have you, then your code will die.
You will generally get problems if you have exceptions in threads that are not handled in the Execute method. The program will mostly be killed, but the behaviour is unpredictable and seems to depend on many things (like the number and state of other threads). Often the main window vanishes immediately, and any further exceptions will thus not be handled by the program, and this is probably what causes WER to catch them.
I made it a habit to have an outer exception handler in Execute that logs any unhandled exceptions and allows the thread to terminate cleanly.
Exceptions occurring in initialization and finalization sections would escape your global exception handler and trigger WER.

Windows processes in kernel vs system

I have a few questions related to Windows processes in kernel and usermode.
If I have a hello world application, and a hello world driver that exposes a new system call, foo(), I am curious about what I can and can't do once I am in kernel mode.
For starters, when I write my new hello world app, I am given a new process, which means I have my own user mode VM space (lets keep it simple, 32 bit windows). So I have 2GB of space that I "own", I can poke and peek until my hearts content. However, I am bound by my process. I can't (lets not bring shared memory into this yet) touch anyone elses memory.
If, I write this hello world driver, and call it from my user app, I (the driver code) is now in kernel mode.
First clarification/questions:
I am STILL in the same process as the user mode app, correct? Still have the same PID?
Memory Questions:
Memory is presented to my process as VM, that is even if I have 1GB of RAM, I can still access 4GB of memory (2GB user / 2GB of kernel - not minding details of switches on servers, or specifics, just a general assumption here).
As a user process, I cannot peek at any kernel mode memory address, but I can do whatever I want to the user space, correct?
If I call into my hello world driver, from the driver code, do I still have the same view of the usermode memory? But now I also have access to any memory in kernel mode?
Is this kernel mode memory SHARED (unlike User mode, which is my own processes copy)? That is, writing a driver is more like writing a threaded application for a single process that is the OS (scheduling aside?)
Next question. As a driver, could I change the process that I am running. Say, I knew another app (say, a usermode webserver), and load the VM for that process, change it's instruction pointer, stack, or even load different code into the process, and then switch back to my own app? (I am not trying to do anything nefarious here, I am just curious what it really means to be in kernel mode)?
Also, once in kernel mode, can I prevent the OS from preempting me? I think (in Windows) you can set your IRQL level to do this, but I don't fully understand this, even after reading Solomons book (Inside Windows...). I will ask another question, directly related to IRQL/DPCs but, for now, I would love to know if a kernel driver has the power to set an IRQL to High and take over the system.
More to come, but answers to these questions would help.
Each process has a "context" that, among other things, contains the VM mappings specific to that process (<2 GB normally in 32bit mode). When thread executing in user mode enteres kernel mode (e.g. from a system call or IO request), the same thread is still executing, in the process, with the same context. PsGetCurrentProcessId will return the same thing at this point as GetCurrentProcessID would have just before in user mode (same with thread IDs).
The user memory mappings that came with the context are still in place upon entering kernel mode: you can access user memory from kernel mode directly. There are special things that need to be done for this to be safe though: Using Neither Buffered Nor Direct I/O. In particular, an invalid address access attempt in the user space range will raise a SEH exception that needs to be caught, and the contents of user memory can change at any time due to the action of another thread in that process. Accessing an invalid address in the kernel address range causes a bugcheck. A thread executing in user mode cannot access any kernel memory.
Kernel address space is not part of a process's context, so is mapped the same between all of them. However, any number of threads may be active in kernel mode at any one time, so it is not like a single threaded application. In general, threads service their own system calls upon entering kernel mode (as opposed to having dedicated kernel worker threads to handle all requests).
The underlying structures that save thread and process state is all available in kernel mode. Mapping the VM of another process is best done ahead of time from the other process by creating an MDL from that process and mapping it into system address space. If you just want to alter the context of another thread, this can be done entirely from user mode. Note that a thread must be suspended to change its context without having a race condition. Loading a module into a process from kernel mode is ill advised; all of the loader APIs are designed for use from user mode only.
Each CPU has a current IRQL that it is running at. It determines what things can interrupt what the CPU is currently doing. Only an event from a higher IRQL can preempt the CPU's current activity.
PASSIVE_LEVEL is where all user code and most kernel code executes. Many kernel APIs require the IRQL to be PASSIVE_LEVEL
APC_LEVEL is used for kernel APCs
DISPATCH_LEVEL is for scheduler events (known as the dispatcher in NT terminology). Running at this level will prevent you from being preempted by the scheduler. Note that it is not safe to have any kind of page fault at this level; there would be a deadlock possibility with the memory manager trying to retrieve pages. The kernel will bugcheck immediately if it has a page fault at DISPATCH_LEVEL or higher. This means that you can't safely access paged pool, paged code segments or any user memory that hasn't been locked (i.e. by an MDL).
Above this are levels connected to hardware device interrupt levels, known as DIRQL.
The highest level is HIGH_LEVEL. Nothing can preempt this level. It's used by the kernel during a bugcheck to halt the system.
I recommend reading Scheduling, Thread Context, and IRQL
A good primer for this topic would be found at: http://www.codinghorror.com/blog/archives/001029.html
As Jeff points out for the user mode memory space:
"In User mode, the executing code has no ability to directly access hardware or reference memory. Code running in user mode must delegate to system APIs to access hardware or memory. Due to the protection afforded by this sort of isolation, crashes in user mode are always recoverable. Most of the code running on your computer will execute in user mode."
So your app will have no access to the Kernel Mode memory, infact your communication with the driver is probably through IOCTLs (i.e. IRPs).
The kernel however has access to everything, including to mappings for your user mode processes. This is a one way street, user mode cannot map into kernel mode for security and stability reasons. Even through kernel mode drivers can map into user mode memory I would advise against it.
At least that's the way it was back before WDF. I am not sure of the capabilities of memory mapping with user mode drivers.
See also: http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fdownload.microsoft.com%2Fdownload%2Fe%2Fb%2Fa%2Feba1050f-a31d-436b-9281-92cdfeae4b45%2FKM-UMGuide.doc&ei=eAygSvfuAt7gnQe01P3gDQ&rct=j&q=user+mode+mapping+into+kernel+mode&usg=AFQjCNG1QYQMcIpcokMoQSWJlGSEodaBHQ

Resources