Obtaining stack trace after an access violation on Windows

I'm trying to use the StackWalk64 function in DbgHelp.dll to get a stack trace when I receive a SIGSEGV, but the stack trace obtained is unrelated to the actual site of the access violation:
[0] sigsegv_handler() e:\hudson\jobs\ide-nightly-trunk\workspace\ide-nightly-trunk\core\ide\cspyserver\src\stackwalker\cssstackwalker.cpp:31
[1] XcptFilter() C:\Windows\WinSxS\x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4148_none_5090ab56bcba71c2\MSVCR90.dll
[2] __tmainCRTStartup() f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c:603
[3] seh_longjmp_unwind4() C:\Windows\WinSxS\x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4148_none_5090ab56bcba71c2\MSVCR90.dll
[4] BaseThreadInitThunk() C:\Windows\syswow64\kernel32.dll
[5] RtlCreateUserProcess() C:\Windows\SysWOW64\ntdll.dll
[6] RtlCreateProcessParameters() C:\Windows\SysWOW64\ntdll.dll
I suspect that weird Windows exception handling and setjmp/longjmp are involved, but I'm not really sure what I should be looking for.

Please note that it's always going to be challenging to get a reliable stack trace after an access violation. By definition the process is corrupt when the AV occurs, so it may be impossible to retrieve the actual stack trace afterwards (for instance, what happens if the error which caused the exception also corrupted some of the structures used by your stack walking logic?).
In this case, it appears that you're trying to capture the stack trace in your exception filter, which will never work: the exception filter is run on a partially unwound stack. You can find the exception record and the context record for the failure with the GetExceptionInformation API. This API only works from within the filter expression, so you need to do something like:
__try
{
    <stuff>
}
__except (MyExceptionFilter(GetExceptionInformation()))
{
    <stuff>
}
You should be able to retrieve the accurate stack trace with the context record and exception information.
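As a rough sketch of what the filter could do with that information, assuming x86 or x64, DbgHelp linked in, and SymInitialize already called at startup: the filter below copies the context record and seeds StackWalk64 from it, so the walk starts at the faulting instruction rather than inside the filter. MyExceptionFilter matches the name used above; describe_access_violation and the printf logging are made up for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper. For EXCEPTION_ACCESS_VIOLATION, ExceptionInformation[0]
// is 0 for a read, 1 for a write and 8 for a DEP (execute) violation, and
// ExceptionInformation[1] is the inaccessible address.
std::string describe_access_violation(std::uint64_t kind, std::uint64_t address) {
    const char* what = kind == 0 ? "read"
                     : kind == 1 ? "write"
                     : kind == 8 ? "execute" : "unknown";
    char buf[64];
    std::snprintf(buf, sizeof buf, "%s at 0x%llx", what,
                  static_cast<unsigned long long>(address));
    return buf;
}

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>  // link with Dbghelp.lib

int MyExceptionFilter(EXCEPTION_POINTERS* info) {
    const EXCEPTION_RECORD* rec = info->ExceptionRecord;
    if (rec->ExceptionCode == EXCEPTION_ACCESS_VIOLATION && rec->NumberParameters >= 2) {
        std::printf("access violation: %s\n",
                    describe_access_violation(rec->ExceptionInformation[0],
                                              rec->ExceptionInformation[1]).c_str());
    }

    CONTEXT ctx = *info->ContextRecord;  // StackWalk64 may modify it, so walk a copy
    STACKFRAME64 frame = {};
#if defined(_M_IX86) || defined(__i386__)
    DWORD machine = IMAGE_FILE_MACHINE_I386;
    frame.AddrPC.Offset    = ctx.Eip;
    frame.AddrFrame.Offset = ctx.Ebp;
    frame.AddrStack.Offset = ctx.Esp;
#elif defined(_M_X64) || defined(__x86_64__)
    DWORD machine = IMAGE_FILE_MACHINE_AMD64;
    frame.AddrPC.Offset    = ctx.Rip;
    frame.AddrFrame.Offset = ctx.Rbp;
    frame.AddrStack.Offset = ctx.Rsp;
#else
#error "this sketch only covers x86 and x64"
#endif
    frame.AddrPC.Mode = frame.AddrFrame.Mode = frame.AddrStack.Mode = AddrModeFlat;

    for (int i = 0; i < 64; ++i) {  // arbitrary depth limit
        if (!StackWalk64(machine, GetCurrentProcess(), GetCurrentThread(), &frame, &ctx,
                         nullptr, SymFunctionTableAccess64, SymGetModuleBase64, nullptr))
            break;
        if (frame.AddrPC.Offset == 0) break;
        std::printf("[%d] 0x%llx\n", i,
                    static_cast<unsigned long long>(frame.AddrPC.Offset));
    }
    return EXCEPTION_EXECUTE_HANDLER;
}
#endif
```

Resolving the addresses to symbols and line numbers is left out; the point is only that the walk is seeded from the exception's context record instead of the current one.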

I don't have any experience on Windows using the C-runtime support in this area. But I have had good success using the vectored exception handler feature (see MSDN AddVectoredExceptionHandler). The EXCEPTION_POINTERS structure passed to the handler can be used with the MiniDumpWriteDump API to produce a user mode dump file that you can open with WinDbg to inspect the exception.
Notes:
- You need to run .ecxr after opening the dump to switch to the exception context.
- Vectored exception handlers get called for all exceptions, so be sure to act only on those you are interested in by looking at the EXCEPTION_RECORD::ExceptionCode passed to the handler.
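A minimal sketch of that approach, assuming DbgHelp is linked in. The set of codes in should_write_dump, the handler name and the "crash.dmp" file name are arbitrary choices for illustration, not anything the API prescribes.

```cpp
#include <cstdint>

// Values of the corresponding EXCEPTION_* constants from the Windows SDK.
constexpr std::uint32_t kAccessViolation    = 0xC0000005u;  // EXCEPTION_ACCESS_VIOLATION
constexpr std::uint32_t kIllegalInstruction = 0xC000001Du;  // EXCEPTION_ILLEGAL_INSTRUCTION
constexpr std::uint32_t kIntDivideByZero    = 0xC0000094u;  // EXCEPTION_INT_DIVIDE_BY_ZERO

// Hypothetical policy: only these codes are worth a dump. Everything else
// (C++ exceptions, breakpoints, debug output, ...) is passed through.
constexpr bool should_write_dump(std::uint32_t code) {
    return code == kAccessViolation ||
           code == kIllegalInstruction ||
           code == kIntDivideByZero;
}

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>  // link with Dbghelp.lib

LONG CALLBACK DumpOnFault(PEXCEPTION_POINTERS info) {
    if (should_write_dump(info->ExceptionRecord->ExceptionCode)) {
        HANDLE file = CreateFileA("crash.dmp", GENERIC_WRITE, 0, nullptr,
                                  CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file != INVALID_HANDLE_VALUE) {
            MINIDUMP_EXCEPTION_INFORMATION mei = {};
            mei.ThreadId          = GetCurrentThreadId();
            mei.ExceptionPointers = info;   // this is what .ecxr reads back later
            mei.ClientPointers    = FALSE;
            MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                              MiniDumpNormal, &mei, nullptr, nullptr);
            CloseHandle(file);
        }
    }
    return EXCEPTION_CONTINUE_SEARCH;  // let the normal handlers run afterwards
}

void install_dump_handler() {
    AddVectoredExceptionHandler(1 /* call this handler first */, DumpOnFault);
}
#endif
```

Calling install_dump_handler() once at startup would be enough. Writing the dump from inside the faulting process is itself a compromise, since that process may already be corrupt.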


What can cause an ARM MemManage exception with all bits in the MMFSR register zero?

I'm working on Ethernet code on an STM32F429 ARM Cortex-M4 device and running into a situation where I'm getting a MemManage exception whose cause is proving very difficult to track down. From what I understand, the MemManage exception is caused by some violation of the MPU, such as trying to execute code in the protected register space at 0xE0000000 and above. The Cortex-M4 documentation I've read indicates that the reason for the exception should be captured in the MMFSR register bits and that the address of the error may be captured in the MMFAR register in certain circumstances.
What's frustrating me is that the MemManage exception is being generated with all bits in the MMFSR register zero. I'm executing a breakpoint instruction just as the exception handler is entered, so I'm pretty sure the MMFSR is not being accidentally cleared. Furthermore, nowhere in my code am I even using the MPU, and it should be in its default state on power-up. Finally, I can purposely create a MemManage exception elsewhere in my code and the MMFSR bits correctly identify the issue I triggered. Unwinding the stack from the exception, the only unusual thing about the PC is that it's in the middle of code that is called early on to initialize the RTOS, but should not be executing later when the exception occurs. I'm trying to determine how the PC got to the value it did, but it's proving difficult to isolate.
Does someone have some ideas as to why the MemManage exception might occur without the MMFSR bits being set? Or suggestions for techniques to better understand the circumstances that occur in my code just before the exception occurs?
My instinct (not necessarily accurate!) is that something's not right here. There's no reason that the MemManage exception should not accurately log the reason for its invocation, and your mention of the PC having been somewhere it shouldn't have been suggests that whatever's wrong went wrong well before the exception entry. On that basis I think you'll learn more by identifying where the exception takes place than by trying to deduce the cause from the exception type.
I'd start by checking the value in LR at the point you've identified that the exception takes place. This won't necessarily tell you where the PC corruption took place, but it'll tell you where the last BL was issued prior to the problem, so it might help put bounds on where the problem might be. You might also find it helpful to check the exception state bits in the PSR ([8-0]) to confirm the type of the fault. (MemManage is 0x004.)
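To make the PSR check concrete, here is a small host-side sketch that decodes bits [8:0] of a captured xPSR value into an exception name. The function names are made up; the numbering follows the standard Cortex-M exception numbers, with 0 meaning Thread mode.

```cpp
#include <cstdint>
#include <string>

// Bits [8:0] of xPSR (the IPSR field) hold the number of the exception
// currently being serviced; 0 means Thread mode, i.e. no active exception.
constexpr std::uint32_t active_exception(std::uint32_t xpsr) {
    return xpsr & 0x1FFu;
}

std::string exception_name(std::uint32_t number) {
    switch (number) {
        case 0:  return "Thread mode (no active exception)";
        case 1:  return "Reset";
        case 2:  return "NMI";
        case 3:  return "HardFault";
        case 4:  return "MemManage";
        case 5:  return "BusFault";
        case 6:  return "UsageFault";
        case 11: return "SVCall";
        case 12: return "DebugMonitor";
        case 14: return "PendSV";
        case 15: return "SysTick";
        default:
            // Numbers from 16 upwards are device interrupts: IRQ0, IRQ1, ...
            if (number >= 16) return "IRQ" + std::to_string(number - 16);
            return "Reserved";
    }
}
```

On the target, the value would typically come from the debugger's register view or, assuming CMSIS is in use, from the __get_IPSR() intrinsic read inside the handler.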
I finally tracked down the issue. It was code executing a callback function within a structure, but the structure pointer was a null pointer. The offset of the callback function within the structure corresponded to the offset of the MemManage exception handler in the vector table from address zero. Thus, the MemManage handler was not being called via an exception, but rather by a simple function call. This was why the stack looked confusing to me: I was expecting to see an exception stack frame rather than a simple function call stack frame.
The clue for me was the exception state bits in the PSR ([8-0]) being all zeros (thanks to the suggestion from cooperised), which indicates my MemManage handler was not actually being called as an exception. I then backtracked from there to understand what code was responsible for calling the handler as a function call. My flawed assumption was that the only way the MemManage handler could be reached was via an exception, with the PSR value and non-exception stack frame being the major clues that I was ignoring.
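The aliasing can be illustrated with a little arithmetic. The structure below is entirely hypothetical (the real one isn't shown); all that matters is that its callback member sits at offset 0x10, which is where the MemManage vector lives when the vector table is at address zero.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 32-bit layout of the driver object. Pointers are modelled as
// uint32_t so the offsets match a Cortex-M target even on a 64-bit host.
struct Driver32 {
    std::uint32_t state;
    std::uint32_t rx_buffer;
    std::uint32_t tx_buffer;
    std::uint32_t length;
    std::uint32_t callback;  // a function pointer on the target
};

// Each vector table entry is one 32-bit word: entry 0 is the initial stack
// pointer and entry N holds the handler address for exception number N.
constexpr std::uint32_t vector_offset(std::uint32_t exception_number) {
    return exception_number * 4u;
}

// Address the CPU loads the function pointer from when executing drv->callback().
constexpr std::uint32_t callback_load_address(std::uint32_t drv) {
    return drv + static_cast<std::uint32_t>(offsetof(Driver32, callback));
}

// With a null driver pointer, the load lands exactly on the MemManage vector
// (exception number 4), so the "callback" that gets called is the handler.
static_assert(callback_load_address(0) == vector_offset(4),
              "null driver: callback slot aliases the MemManage vector");
```

Because the handler is reached through an ordinary branch, no exception frame is stacked and IPSR stays zero, which matches the symptoms described above.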
Double-check that the exception you're getting is actually MemManage and not something else (e.g. if you're using a shared handler for several exception types). Another possibility is that you're getting an imprecise fault and the information about the original fault has been discarded. From the FreeRTOS debugging guide:
ARM Cortex-M faults can be precise or imprecise. If the IMPRECISERR
bit (bit 2) is set in the BusFault Status Register (or BFSR, which
is byte accessible at address 0xE000ED29) is set then the fault is
imprecise.
...
In the above example, turning off write buffering by setting the
DISDEFWBUF bit (bit 1) in the Auxiliary Control Register (or
ACTLR) will result in the imprecise fault becoming a precise fault,
which makes the fault easier to debug, albeit at the cost of slower
program execution.
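For reference, a small sketch of how the fault status bits could be decoded from a captured CFSR value (the 32-bit register at 0xE000ED28, of which the BFSR is the second byte). The struct and function names are invented for this example.

```cpp
#include <cstdint>

struct FaultStatus {
    std::uint8_t  mmfsr;  // MemManage Fault Status, CFSR bits [7:0]
    std::uint8_t  bfsr;   // BusFault Status, CFSR bits [15:8] (byte at 0xE000ED29)
    std::uint16_t ufsr;   // UsageFault Status, CFSR bits [31:16]
};

// Split a raw CFSR value into its three sub-registers.
constexpr FaultStatus split_cfsr(std::uint32_t cfsr) {
    return FaultStatus{ static_cast<std::uint8_t>(cfsr & 0xFFu),
                        static_cast<std::uint8_t>((cfsr >> 8) & 0xFFu),
                        static_cast<std::uint16_t>(cfsr >> 16) };
}

// PRECISERR is bit 1 and IMPRECISERR is bit 2 of the BFSR.
constexpr bool is_precise_bus_fault(std::uint8_t bfsr) {
    return (bfsr & (1u << 1)) != 0;
}
constexpr bool is_imprecise_bus_fault(std::uint8_t bfsr) {
    return (bfsr & (1u << 2)) != 0;
}
```

For example, a CFSR reading of 0x00000400 splits into BFSR = 0x04, i.e. an imprecise bus fault with MMFSR and UFSR both zero.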

Under what conditions do I need to set up SEH unwind info for an x86-64 assembly function?

The 64-bit Windows ABI defines a generalized exception handling mechanism, which I believe is shared across C++ exceptions and structured exceptions available even in other languages such as C.
If I'm writing an x86-64 assembly routine to be assembled with nasm and linked into a C or C++ library, what accommodations do I need to make on Windows in terms of generating unwind info and so on?
I'm not planning on generating any exceptions directly in the assembly code, although I suppose it is possible that the code may get an access violation if a user-supplied buffer is invalid, etc.
I'd like to write the minimum possible to get this to work, especially since it seems that nasm has poor support for generating unwind info and using MASM is not an option for this cross-platform project. I do need to use (hence save and restore) non-volatile registers.
As a general rule, Windows x64 requires all functions to provide unwind information. The only exception is for leaf functions which do not modify rsp and do not modify any nonvolatile registers.
Judging by the context of your question, what you really want to know is the practical consequences of not providing unwind information for your non-leaf assembly functions on x64 Windows. Since C++ exceptions are implemented on top of SEH exceptions, when I talk about exceptions below, I mean both "native" exceptions (an access violation, something raised using RaiseException, etc.) and C++ exceptions. Here's a list off the top of my head:
- Exceptions won't be able to pass through your function. It's important to note that this point is not about throwing an exception, or an access violation happening, directly in your function. Let's say your assembly code calls into a C++ function, which throws an exception. Even if the caller of your assembly function has a matching catch block, it will never be able to catch the exception, as unwinding will stop at your function without the unwind data.
- When walking the stack, the stack walk will stop at the function without unwind data (or go astray; the point is, you will get an invalid call stack). Basically, anything that walks the stack is screwed if your function is present on the call stack (debuggers when displaying the call stack, profilers, etc.).
- Registered Unhandled Exception Filters will not be called if an exception gets thrown while your assembly function is on the call stack. This interferes with anything that relies on UEFs, such as custom crash handlers. Or something potentially more relevant: std::terminate won't be called if your program throws a C++ exception that goes unhandled (as the C++ standard dictates). The MSVC runtime uses a UEF to implement this, so it won't work either.
Are you developing a 3rd party library? If that's the case, the importance of the above points will depend on the use case of your clients.

Why does the Eclipse debugger only show 1 or 2 lines of the stack followed by 0x0?

On Linux I get nice, healthy, full stack traces. On Windows, however, when something crashes (like a segfault), I only get the top one or two lines of the stack, followed by the entry 0x0 (which I cannot expand). This makes it very hard to debug.
You should probably start using WinDbg to debug your program instead of an IDE like Eclipse. It is a very powerful command-line tool, and its functionality is very similar to GDB's.
On Windows, the UnhandledExceptionFilter function is called when no exception handler is defined to handle the exception that is raised. The function typically passes the exception up to Ntdll.dll, which catches it and tries to handle it.
The EXCEPTION_POINTERS structure, which is passed as a parameter to that function, contains the most useful information about what the exception is and where it occurred. The .exr and .cxr commands in WinDbg use this information to produce the complete stack trace.
typedef struct _EXCEPTION_POINTERS {
    PEXCEPTION_RECORD ExceptionRecord;
    PCONTEXT          ContextRecord;
} EXCEPTION_POINTERS, *PEXCEPTION_POINTERS;
ExceptionRecord A pointer to an EXCEPTION_RECORD structure that
contains a machine-independent description of the exception.
ContextRecord A pointer to a CONTEXT structure that contains a
processor-specific description of the state of the processor at the
time of the exception.
For complete steps on how to get a full backtrace and analysis from a dump file (as with GDB) or a debug session, you may want to read and follow the steps in the following link:
http://support.microsoft.com/kb/313109

C++/CX caught exception - how to print the full stack?

I have a Windows Store application (for Windows 8) written in C++/CX and I have wrapped a chunk of my code in a try/catch block.
The catch block is working and catches an exception, but so far I only seem to be able to print out the "message" part of the exception and not the full exception stack:
try
{
    ...
}
catch (Exception^ e)
{
    LogMessage("Exception caught: " + e->ToString());
}
When the exception is caught, the LogMessage outputs only the following text:
"Exception caught: The object already exists"
I've tried e->ToString() and e->Message, but both result in the same output and that does not include the full exception stack.
In C# it seems to be really easy to output the full exception stack, so I am not sure why it seems to be difficult in C++/CX?
This is difficult in C++/CX because determining what functions would be in the stack requires code to parse debugging symbols. In C#, the CLR does work at runtime to remember which methods are in the stack, but in C++/CX, the names of functions are not recorded in the resulting binary. Put another way, the stack trace you get in C# depends on a C# feature: reflection.
Moreover, an exception may result from a call into code which is a plain COM API, rather than a C++/CX API. In such cases, the exception is generated from an error HRESULT return code underneath, not at the time where the exception is thrown. (Indeed, this is what happens whenever crossing component boundaries; this is handled with plain COM even if both sides of the operation are C++/CX) As such, the stack you would need for a trace is no longer available.
C++ exceptions do not record a stack trace. On the plus side, from native programs you can collect a minidump when an unhandled exception occurs, which lets you view the stack using a debugger if you need to.
Keep in mind that a C++/CX program is pure unmanaged C++ code. The CX language extension only makes it easy to consume WinRT types in your C++ code, it hides the COM implementation details. So it gets the full treatment of the code optimizer. Which does not try to ensure that stack walks can be safely performed. Particularly so in leaf functions that don't throw exceptions. It will readily omit setting the EBP register, the important one that indicates the base of a stack activation frame.
This is not the case in managed code, like C#. Stack walks are very important in a garbage collected runtime environment. The garbage collector must perform them to find object references when it collects garbage. Code Access Security also depends on stack walks. A happy side effect is that it now also becomes very easy to generate a stack trace for an exception. It is even exposed in the framework api, the StackTrace class lets you walk the stack in your own code.
No simple fix for this, you need debugging symbols to have a shot at it. And StackWalk64 from the DbgHelp api. With odds that you still don't get anywhere because the program crashed somewhere in the bowels of a Windows function. Speed trumps convenience in C++.

Useless stack trace in SetUnhandledExceptionFilter handler

I've been using SetUnhandledExceptionFilter for a long time, and my handler walks the stack and uses dbghelp.dll to convert the addresses into File/Line references. It then writes that to a log file and puts up a dialog with the same information for the user. This USED to work just fine. These days however I'm getting a completely useless stack:
1004bbaa: Lgid.dll, C:\Data\Code\Lgi\trunk\src\win32\Lgi\LgiException.cpp:175
10057de0: Lgid.dll, C:\Data\Code\Lgi\trunk\src\win32\Lgi\GApp.cpp:107
7c864191: kernel32.dll, UnhandledExceptionFilter+0x1c7
102158ed: MSVCRTD.dll, winxfltr.c:228
006dc1a7: Scribe.exe, crtexe.c:345
7c817077: kernel32.dll, RegisterWaitForInputIdle+0x49
00000000: Scribe.exe
Where 'Scribe.exe' is my application. Now if I walk the debugger from the exception handler back up the stack several frames I eventually get to a completely different temporary stack that actually includes all the calls that led up to the crash. Which is the information I actually want to log for the user. It's as if the exception handler is executing on a separate stack from the main application.
What I need is the stack information for the actual application stack, that includes all the calls leading up to the crash. Is there some easy way to get that from inside the exception handler?
According to http://www.eptacom.net/pubblicazioni/pub_eng/except.html I can get the exception's EIP and EBP out of the EXCEPTION_POINTERS 'Context' member. So I tried passing that EBP to my stack walker as its initial point, and it could then walk the application stack correctly. As long as I put the EIP as the first point in the stack walk, I get the whole thing.
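The idea can be modelled without any Windows API at all. The sketch below is not the actual walker from the question; it assumes a classic x86 frame chain (the saved EBP at [ebp] and the return address at [ebp + 4]) and uses a std::map as a hypothetical stand-in for reading the crashed thread's stack memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for reading 32-bit words from the thread's stack.
using Memory = std::map<std::uint32_t, std::uint32_t>;

// Walk an x86 frame chain starting from the exception context's EIP and EBP.
std::vector<std::uint32_t> walk_from_context(const Memory& mem,
                                             std::uint32_t eip,
                                             std::uint32_t ebp,
                                             std::size_t max_frames = 64) {
    std::vector<std::uint32_t> trace;
    trace.push_back(eip);  // the faulting instruction is the first entry
    while (trace.size() < max_frames) {
        auto saved_ebp = mem.find(ebp);
        auto ret_addr  = mem.find(ebp + 4);
        if (saved_ebp == mem.end() || ret_addr == mem.end()) break;  // unreadable
        if (ret_addr->second == 0) break;                            // end of chain
        trace.push_back(ret_addr->second);
        // The stack grows downwards, so a caller's frame must sit at a
        // higher address; anything else means the chain is corrupt.
        if (saved_ebp->second <= ebp) break;
        ebp = saved_ebp->second;
    }
    return trace;
}
```

In a real handler, eip and ebp would come from ContextRecord->Eip and ContextRecord->Ebp, and the memory reads would be direct pointer dereferences (ideally guarded against bad addresses). Functions compiled with frame pointer omission break the chain, which is one reason to prefer StackWalk64 with symbols when they are available.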
Are you using x64? Could you be hitting http://blog.paulbetts.org/index.php/2010/07/20/the-case-of-the-disappearing-onload-exception-user-mode-callback-exceptions-in-x64/ ?
