What does the "Unloaded" prefix mean in a Windows stack trace? - windows

I've the hellish problem of a third party DLL appearing to cause a recursive stack overflow crash when it gets unloaded. I wind up with this pattern on the stack (using windbg):
<Unloaded_ThirdParty.dll>+0xdd01
ntdll!ExecuteHandler2+0x26
ntdll!ExecuteHandler+0x24
ntdll!KiUserExceptionDispatcher+0xf
<Unloaded_ThirdParty.dll>+0xdd01
ntdll!ExecuteHandler2+0x26
ntdll!ExecuteHandler+0x24
ntdll!KiUserExceptionDispatcher+0xf
...
As you would guess, I don't have the source code to ThirdParty.dll.
Q: What does the prefix "Unloaded_" mean in the stack dump. I haven't run across this before.

This means that ThirdParty.dll was no longer being referenced and has already been removed from memory at the time that the crash occurs. To find out the actual stack trace, you need to reload the .dll at its original place in memory with the following command:
.reload /f ThirdParty.dll=0xaaaaaaaa
Of course you need to replace 0xaaaaaaaa with the original base address of the module. This may be somewhat hard to figure out if the module has already been unloaded, but if you have an HMODULE lying around that refers to the dll, the value of that HMODULE is the base address. Worst case, you can add a debugger trace statement to your code that logs the HMODULE of the dll just before you unload it.

I've had a crash just like this before, and as JS points out it means that the dll has been unloaded prior to the crash. However, having the stack trace into that dll may not necessarily give you the information you need to diagnose the problem.
Something in your code is unloading the library because it thinks it's finished with it, but you still have a pointer to it (or to a function inside it) somewhere. My guess would be a callback, perhaps from another thread. I'd suggest searching through your source for any calls to FreeLibrary() and also putting a breakpoint on the FreeLibrary symbol. Find out where the library is being unloaded, and then at that point, ensure that all data that references the dll has been reset. Use a mutex if you have multiple threads.
A tool that may be very useful for this is the excellent Process Monitor which I think shows you dll load and unload events, and will give you a stack trace for each one.

Related

How do symbols solve walking the stack with FPO in x86 debugging?

In this answer: https://stackoverflow.com/a/8646611/192359 , it is explained that when debugging x86 code, symbols allow the debugger to display the callstack even when FPO (Frame Pointer Omission) is used.
The given explanation is:
On the x86 PDBs contain FPO information, which allows the debugger to reliably unwind a call stack.
My question is what's this information? As far as I understand, just knowing whether a function has FPO or not does not help you finding the original value of the stack pointer, since that depends on runtime information.
What am I missing here?
Fundamentally, it is always possible to walk the stack with enough information1, except in cases where the stack or execution context has been irrecoverably corrupted.
For example, even if rbp isn't used as the frame pointer, the return address is still on the stack somewhere, and you just need to know where. For a function that doesn't modify rsp (indirectly or directly) in the body of the function it would be at a simple fixed offset from rsp. For functions that modify rsp in the body of the function (i.e., that have a variable stack size), the offset from rsp might depend on the exact location in the function.
The PDB file simply contains this "side band" information which allows someone to determine the return address for any instruction in the function. Hans linked a relevant in-memory structure above - you can see that since it knows the size of the local variables and so on it can calculate the offset between rsp and the base of the frame, and hence get at the return address. It also knows how many instruction bytes are part of the "prolog" which is important because if the IP is still in that region, different rules apply (i.e., the stack hasn't been adjusted to reflect the locals in this function yet).
In 64-bit Windows, the exact function call ABI has been made a bit more concrete, and all functions generally have to provide unwind information: not in a .pdb but directly in a section included in the binary. So even without .pdb files you should be able to unwind a properly structured 64-bit Windows program. It allows any register to be used as the frame pointer, and still allows frame-pointer omission (with some restrictions). For details, start here.
1 If this weren't true, ask yourself how the currently running function could ever return? Now, technically you could design a program which clobbers or forgets the stack in a way that it cannot return, and either never exits or uses a method like exit() or abort() to terminate. This is highly unusual and not possibly outside of assembly.

Are Win32 applications automatically linked against ntdll.dll?

I've just found out by accident that doing this GetModuleHandle("ntdll.dll") works without a previous call to LoadLibrary("ntdll.dll").
This means ntdll.dll is already loaded in my process.
Is it safe to assume that ntdll.dll will always be loaded on Win32 applications, so that a call to LoadLibrary is not necessary?
From MSDN on LoadLibrary() (emphasis mine):
The system maintains a per-process reference count on all loaded
modules. Calling LoadLibrary increments the reference count. Calling
the FreeLibrary or FreeLibraryAndExitThread function decrements the
reference count. The system unloads a module when its reference count
reaches zero or when the process terminates (regardless of the
reference count).
In other words, continue to call LoadLibrary() and ensure you get your handle to ntdll.dll to be safe -- but the system will almost certainly be bumping a reference count as it should already be loaded.
As for "is it really always loaded?", see Windows Internals on the Image Loader (the short answer is yes, ntdll.dll is part of the loader itself and is always present).
The relevant paragraph is:
The image loader lives in the user-mode system DLL Ntdll.dll and not in the kernel library. Therefore, it behaves just like standard code that is part of a DLL, and it is subject to the same restrictions in terms of memory access and security rights. What makes this code special is the guaranty that it will always be present in the running process (Ntdll.dll is always loaded) and that it is the first piece of code to run in user mode as part of a new application. (When the system builds the initial context, the program counter, or instruction pointer is set to an initialization function inside Ntdll.dll.)

What is RUST_BACKTRACE supposed to tell me?

My program is panicking so I followed its advice to run RUST_BACKTRACE=1 and I get this (just a little snippet).
1: 0x800c05b5 - std::sys::imp::backtrace::tracing::imp::write::hf33ae72d0baa11ed
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
2: 0x800c22ed - std::panicking::default_hook::{{closure}}::h59672b733cc6a455
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:351
If the program panics it stops the whole program, so where can I figure out at which line it's panicking on?
Is this line telling me there is a problem at line 42 and line 351?
The whole backtrace is on this image, I felt it would be to messy to copy and paste it here.
I've never heard of a stack trace or a back trace. I'm compiling with warnings, but I don't know what debugging symbols are.
What is a stack trace?
If your program panics, you encountered a bug and would like to fix it; a stack trace wants to help you here. When the panic happens, you would like to know the cause of the panic (the function in which the panic was triggered). But the function directly triggering the panic is usually not enough to really see what's going on. Therefore we also print the function that called the previous function... and so on. We trace back all function calls leading to the panic up to main() which is (pretty much) the first function being called.
What are debug symbols?
When the compiler generates the machine code, it pretty much only needs to emit instructions for the CPU. The problem is that it's virtually impossible to quickly see from which Rust-function a set of instructions came. Therefore the compiler can insert additional information into the executable that is ignored by the CPU, but is used by debugging tools.
One important part are file locations: the compiler annotates which instruction came from which file at which line. This also means that we can later see where a specific function is defined. If we don't have debug symbols, we can't.
In your stack trace you can see a few file locations:
1: 0x800c05b5 - std::sys::imp::backtrace::tracing::imp::write::hf33ae72d0baa11ed
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
The Rust standard library is shipped with debug symbols. As such, we can see where the function is defined (gcc_s.rs line 42).
If you compile in debug mode (rustc or cargo build), debug symbols are activated by default. If you, however, compile in release mode (rustc -O or cargo build --release), debug symbols are disabled by default as they increase the executable size and... usually aren't important for the end user. You can tweak whether or not you want debug symbols in your Cargo.toml in a specific profile section with the debug key.
What are all these strange functions?!
When you first look at a stack trace you might be confused by all the strange function names you're seeing. Don't worry, this is normal! You are interested in what part of your code triggered the panic, but the stack trace shows all functions somehow involved. In your example, you can ignore the first 9 entries: those are just functions handling the panic and generating the exact message you are seeing.
Entry 10 is still not your code, but might be interesting as well: the panic was triggered in the index() function of Vec<T> which is called when you use the [] operator. And finally, entry 11 shows a function you defined. But you might have noticed that this entry is missing a file location... the above section describes how to fix that.
What do to with a stack trace? (tl;dr)
Activate debug symbols if you haven't already (e.g. just compile in debug mode).
Ignore any functions from std and core at the top of the stack trace.
Look at the first function you defined, find the corresponding location in your file and fix the bug.
If you haven't already, change all camelCase function and method names to snake_case to stick to the community wide style guide.

Useless stack trace in SetUnhandledExceptionFilter handler

I've been using SetUnhandledExceptionFilter for a long time, and my handler walks the stack and uses dbghelp.dll to convert the addresses into File/Line references. It then writes that to a log file and puts up a dialog with the same information for the user. This USED to work just fine. These days however I'm getting a completely useless stack:
1004bbaa: Lgid.dll, C:\Data\Code\Lgi\trunk\src\win32\Lgi\LgiException.cpp:175
10057de0: Lgid.dll, C:\Data\Code\Lgi\trunk\src\win32\Lgi\GApp.cpp:107
7c864191: kernel32.dll, UnhandledExceptionFilter+0x1c7
102158ed: MSVCRTD.dll, winxfltr.c:228
006dc1a7: Scribe.exe, crtexe.c:345
7c817077: kernel32.dll, RegisterWaitForInputIdle+0x49
00000000: Scribe.exe
Where 'Scribe.exe' is my application. Now if I walk the debugger from the exception handler back up the stack several frames I eventually get to a completely different temporary stack that actually includes all the calls that led up to the crash. Which is the information I actually want to log for the user. It's as if the exception handler is executing on a separate stack from the main application.
What I need is the stack information for the actual application stack, that includes all the calls leading up to the crash. Is there some easy way to get that from inside the exception handler?
According to http://www.eptacom.net/pubblicazioni/pub_eng/except.html I can get the exception's EIP and EBP out of the EXCEPTION_POINTERS 'Context' member. So I tried passing that EBP to my stack walker as it's initial point and it could then walk the application stack correctly. As long as I put the EIP as the first point in the stack walk I get the whole thing.
Are you using x64? Could you be hitting http://blog.paulbetts.org/index.php/2010/07/20/the-case-of-the-disappearing-onload-exception-user-mode-callback-exceptions-in-x64/ ?

Viewing registers in a crash dump

Is there a way to view the register contents in each stack frame in a crash dump?
The registers window seems to contain the registers when the exception occurred but it would be useful to be able to see their contents in each stack frame.
Depending on the calling convention, you can get some of the registers which are saved on the stack. For example, in the cdecl calling convention, all of the registers except for EAX, ECX, and EDX are required to be saved, either by the caller or the callee. Those three registers are clobberable, so you generally won't be able to get their values from higher up in the call stack. If a function doesn't use a register that must be saved, then it won't save it, but since it doesn't use it, that register has the same value in the next higher stack frame.
After doing some research and thinking about this a bit, I realized that it is probably not possible. A crash minidump saves certain areas of process memory (depending on the flags passed to the MiniDumpWriteDump() function) and enough state information to re-create the environment where the crash happened in a debugger. It does not have the processor state at each instruction or even at each stack frame, it only knows about the processor state when the exception occurred.
In optimized builds, it's true that some information down the stack may get tossed, however, you can ask the debugger to try and show you the information for a given stack frame. First do "kn" to see the stack with frame numbers, then try ".frame /c [frame]" or ".frame /r [frame]".
Check out the help (".hh") for more information.
I don't think you can get it either when debugging. The only value you can get from registers is their value at the current instruction.

Resources