WinDbg: APPLICATION_HANG_WRONG_SYMBOLS - debugging

I'm pretty new to WinDbg and I'm trying to find a bug which has my application hanging for no appearent reason. I'm not sure I'm doing things right, but I understand that I need both symbols for the system dlls aswell as the .exe I'm debugging. Thus, I set up my symbol path like this:
srv*c:\websymbols*http://msdl.microsoft.com/download/symbols;S:\MY\PATH
The second path pointing to a folder where I placed the .pdb that was generated by VS. I'm positive that's the correct .pdb file, but it was built on a different architecture (not sure if this is an issue). I would like to see a complete stack trace for a start, so I ran !analyze-v. The output looks like this. As you can see it lists APPLICATION_HANG_WRONG_SYMBOLS as primary problem. So I ran .reload /f, giving me this output. I have no symbols for dnAnalytics or Vertec.Interop, so these errors make sense, but there are some missing checksums and iphlpapi.pdb was not found.
So my questions would be: Why does WinDBG lists wrong symbols as primary problem, even though I'm positive I do have the right .pdb file available? (I'm running WinDBG on the same machine the dump was generated). To what extent can I trust the stack trace even thought my symbols are wrong, appearently? Does anyone see a obvious problem that could cause a hang in my application from the stack trace already? Any pointers appreciated!

The "wrong symbols" here is probably because you are using 64-bit CLR with version less than 4.0 and the !analyze extension has some trouble decoding the mixed native / managed stack.
Why is WinDBG looking for the BJM.exe
symbols on the microsoft server in
this case?
This is because you put the symbol server before you local path in the symbol path. Windbg does not know which module is yours and which is Microsoft's. It just looks for the PDB file for the module in the order speicified by the symbol path.
To what extent can I trust the stack trace even thought my symbols are wrong, appearently?
Stacks on x64 are very reliable since the stack-walk does not require symbols. Symbols are reliable (that is you do not have wrong symbols) unless you forced windbg to ignore wrong timestamp / checksum with .reload /f /i
In some cases the address -> symbol may seem wrong. This is usually due to small functions that have the same code (very common in C++ code if the functions are virtual or the code is not optimized)

Try using !sym noisy to get some more information about what it's actually looking at (docs). !itoldyouso may also be useful here (link)

Related

WinDbg slow when debugging local process (step over)

This is really driving me crazy. I am using WinDbg as my primary debugger.
It is used to debug local service (WinDbg running locally, debugging service on the same machine).
The PDB files are stored on local hard drive.
Source code is accessed via SMB share.
Debugging works in bursts, sometimes it flow well, most of the time I keep seeing the unbelievably annoying "*BUSY*" message, this happens almost every time when I perform a "step-over".
Any ideas what I could do to speed things up?
Thanks
This may happens if you have many unqualified breakpoints that track module load events (created via bu) saved in your workspace.
Also it is worth to check network connection and local symbols cache size
I was having the exact same problem, and was able to see a big improvement by adjusting the symbol options. Specifically, the SYMOPT_NO_PUBLICS option seemed to be the most important, but I adjusted a few other related options. I did the following...
.symopt-0x4
.symopt+0x100
.symopt+0x8000
.symopt-0x10000
...which is:
-SYMOPT_DEFERRED_LOADS
+SYMOPT_NO_UNQUALIFIED_LOADS
+SYMOPT_NO_PUBLICS
-SYMOPT_AUTO_PUBLICS
After all that, I had a symopt bit mask value of 0x80028333, which I now use as a WinDbg command line option, as in:
windbg.exe -sflags 0x80028333
I haven't yet discovered if there is any downside to this approach. Perhaps there are some cases where using SYMOPT_NO_PUBLICS results in missing information, but it's been working well for me thus far, and it's definitely way faster.
WinDbg Symbol Options MS Documentation
Usually this happened to me when multiple tool windows (locals, watches, call stack ) were opened. If those windows are opened, WinDbg will spend cycles to update all of them using slow symbols lookups on each 'step-over' command.
In this regard using just a command line ( ntsd ) debugger is much faster.
If it's slowly pulling Windows symbol files from the internet, consider downloading all of them to your hard drive and pointing WinDbg to their location on it. It would be best to have your service's symbol files and source files on the local drive too.
A couple of things to try:
Use the !sym noisy command to show more detailed information on the images and pdbs that windbg is referencing. You may see that it's accessing the network more than you're expecting, or that a network server is unavailable and requests are timing out
Ensure that your symbol path has got a cache in it
Take a copy of the sources locally and add to the source path
Instead of using 'step-over', set a breakpoint on the next command, or even a hardware breakpoint (using ba)
When your symbol path contains a large file store, WinDbg can take a long time searching for symbols that don't exist. You'll can diagnose this behavior by, as user eran suggested, by running Process Monitor and observing the file operations being made by WinDbg.
This was an extreme pain point for our organization, but I found a workaround detailed in this question: "WinDbg takes extremely long time to loading symbols; is searching every directory in large network UNC symbol store"

Debugging a minidump in Visual Studio where the call stack is null

I have a customer who is getting a 100% reproduceable crash that I can't replicate in my program compiled in Visual Studio 2005. I sent them a debug build of my program and kept all the PDB and DLL files handy. They sent me the minidump file, but when I open it I get:
"Unhandled exception at 0x00000000 in MiniDump.dmp: 0xC0000005: Access violation reading location 0x00000000."
Then the call stack shows only "0x00000000()" and the disassembly shows me a dump of the memory at 0x0. I've set up the symbol server, loaded my PDB symbols, etc. But I can't see any way of knowing which of the many DLLs actually caused the jump to null. This is a large project with many dependencies, and some of them are binaries that I don't have the source or PDBs for, as I am using an API as a 3rd party.
So how on earth is this minidump useful? How do I see which DLL caused the crash? I've never really used minidumps for debugging before, but all the tutorials I have read seem to at least display a function name or something else that gives you a clue in the call stack. I just get the one line pointing to null.
I also tried using "Depends" to see if there was some DLL dependency that was unresolved; however on my three test machines with various Windows OS's, I seem to get three different sets of OS DLL dependencies (and yet can't replicate the crash); so this doesn't seem a particularly reliable method either to diagnose the problem.
What other methods are available to determine the cause of this problem? Is there some way to step back one instruction to see which DLL jumped to null?
Well it looks like the answer in this instance was "Use WinDbg instead of Visual Studio for debugging minidumps". I couldn't get any useful info out of VS, but WinDbg gave me a wealth of info on the chain of function calls that led to the crash.
In this instance it still didn't help solve my problem, as all of the functions were in the 3rd party library I am using, so it looks like the only definitive answer to my specific problem is to use log files to trace the state of my application that leads to the crash.
I guess if anyone else sees a similar problem with an unhelpful call stack when debugging a minidump, the best practice is to open it with WinDgb rather than Visual Studio. Seems odd that the best tool for the job is the free Microsoft product, not the commerical one.
The other lesson here is probably "any program that uses a third party library needs to write a log file".
The whole idea behind all 'simple' ways of post mortem debugging is the capture of a stack trace. If your application overwrites the stack there is no way for such analysis. Only very sophisticated methods, that record the whole program execution in dedicated hardware could help.
The way to go in such a case are log files. Spread some log statements very wide around the area where the fault occurs and transmit that version to the customer. After the crash you'll see the last log statement in your log file. Add more log statements between that point and the next log statement that has not been recorded in the log file, ship that version again. Repeat until you found the line causing the problem.
I wrote a two part article about this at ddj.com:
About Log Files Part 1
About Log Files Part 2
Just an observation, but the the stack is getting truncated or over-written, might this be a simple case of using an uninitialised field, or perhaps a buffer overrun ?
That might be fairly easy to locate.
Have you tried to set WinDbg on a customer's computer and use it as a default debugger for any application that causes a crash? You just need to add pdb files to the folder where your application resides. When a crush happens WinDbg starts and you can try to get call stack.
Possibly you already know this, but here are some points about minidump debugging:
1. You need to have exactly the same executables and PDB files, as on the client computer where minidump was created, and they should be placed exactly in the same directories. Just rebuilding the same version doesn't help.
2. Debugger must be connected to MS Symbols server.
3. When debugger starts, it prints process loading log in the Output window. Generally, all libraries should be successfully loaded with debug information. Libraries without debug information are loaded as well, but "no debug info" is printed. Learn this log - it can give you some information.
If executable stack contains frames from a library without debug information, it may be not shown. This happens, for example, if your code is running as third-party library callback.
Try to create minidump on your own computer, by adding some code which creates unhandled exception, and debug it immediately. Does this work? Compare loading log in successful and unsuccessful debugging sessions.
You may have called null function pointer. Current executing function information is needed to show call stack information. Force set instruction pointer to start of any simple function, then you'll see call stack information again.
void SimpleFunc()
{ // <- set next statement here
}

Symbols, in Microsoft Debugging Tools for Windows?

What is the need/use of 'symbols' in the Microsoft debugger?
I spent some time trying to figure out the debugger a while back and never was able to get it making any sense (I was trying to debug a server hang...). Part of my problem was not having the proper 'symbols'.
What are they? And why would I need them? Aren't I just looking for text?
Are there any better links out there to using it than How to solve Windows system crashes in minutes ?
You need symbols in order to translate addresses into meaningful names. For example, you have locations on your stack at every function call:
0x00003791
0x00004a42
Symbols allows the debugger to map these addresses to methods
0x00003791 myprog!methodnamea
0x00004a42 myprog!methodnameb
When you build a debug version of a program, the compiler emits symbols with the extension .PDB. It also contains line information so you can do source code debugging, etc..
You need to set your symbol search path correctly for the debugger to pick this up. IN the command window you can do
.sympath c:\symbols;c:\temp\symbols
in order to have it search for the .PDB in these directories. It will also look in the same directory that the executable is ran from.
It also might be helpful to use the Microsoft public symbols server so that you can resolve OS binaries such as NTDLL, GDI, etc.. with this path at the beginning:
.sympath SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols;c:\symbols
You will need to create c:\websymbols first.
On the Windows binary architecture, the information needed for debugging (function names, file and line numbers, etc.) aren't present in the binary itself. Rather, they're collected into a PDB file (Program DataBase, file extension .pdb), which the debugger uses to correlate binary instructions with the sorts of information you probably use while debugging.
So in order to debug a server hang, you need the PDB file both for the server application itself, and optionally for the Windows binaries that your server is calling into.
As a general note, my experience with WinDbg is that it was much, much harder to learn how to use compared to GDB, but that it had much greater power once you understood how to use it. (The opposite of the usual case with Windows/Linux tools, interestingly.)
If you just have the binary file, the only info you can typically get is the stack trace, and maybe the binary or IL(in .NET) instructions. Having the symbols lets you actually match that binary/IL instruction up with a corresponding line in the source code. If you have the source code, it also lets you hook up the debugger in Visual Studio and step through the source code.

How do I analyse a BSOD and the error information it will provide me?

Well, fortunately I haven't written many applications that cause a BSOD but I just wonder about the usefullness of the information on this screen. Does it contain any useful information that could help me to find the error in my code? If so, what do I need, exactly?
And then, the system restarts and probably has written some error log or other information to the system somewhere. Where is it, what does it contain and how do I use it to improve my code?
I did get a BSOD regularly in the past when I was interacting with a PBX system where the amount of documentation of it's drivers were just absent, so I had to do some trial-and-error coding. Fortunately, I now work for a different company and don't see any BSOD's as a result of my code.
If you want a fairly easy way to find out what caused an OS crash that will work ~90% of the time - assuming you have a crash dump available - then try the following:
Download WinDbg as part of the Debugging tools for Windows package. Note, you only need to install the component called Debugging Tools for Windows.
Run WinDbg
Select "Open Crash Dump" from the file menu
When the dump file has loaded type analyze -v and press enter
WinDbg will do an automated analysis of the crash and will provide a huge amount of information on the system state at the time of the crash. It will usually be able to tell you which module was at fault and what type of error caused the crash. You should also get a stack trace that may or may not be helpful to you.
Another useful command is kbwhich prints out a stack trace. In that list, look for a line contains .sys. This is normally the driver which caused the crash.
Note that you will have to configure symbols in WinDbg if you want the stack trace to give you function names. To do this:
Create a folder such as C:\symbols
In WinDbg, open File -> Symbol File Path
Add: SRV*C:\symbols*http://msdl.microsoft.com/download/symbols
This will cache symbol files from Microsoft's servers.
If the automated analysis is not sufficient then there are a variety of commands that WinDbg provides to enable you to work out exactly what was happening at the time of the crash. The help file is a good place to start in this scenario.
Generally speaking, you cannot cause a OS crash or bug check from within your application code. That said, if you are looking for general tips and stuff, I recommend the NTDebugging blog. Most of the stuff is way over my head.
What happens when the OS crashes is it will write a kernel dump file, depending on the current flags and so on, you get more or less info in it. You can load up the dump file in windbg or some other debugger. Windbg has the useful !analyze command, which will examine the dump file and give you hints on the bucket the crash fell into, and the possible culprits. Also check the windbg documentation on the general cause of the bug check, and what you can do to resolve it.

Post-mortem crash-dump debugging without having the exact version of a Windows DLL in the Symbol Server

Within my application, I use the MiniDumpWriteDump function (see dbghelp.dll) to write a crash dump file whenever my application crashes.
I also use a symbol server to store all my executables and pdb files, so that whenever a customer sends me a crash-dump file, the debugger automatically picks up the correct version of the executable and the debug information.
I also store Windows DLL's (ntdll.dll, kernel32.dll, ...) and their debug information in the symbol server (using SymChk). The debug information is fetched from Microsoft's public symbol server.
Most of the time this works perfect, except when:
a customer crashes in one of the Windows DLL's
and the customer uses DLL's that I haven't put in the symbol server
This is because it is quite undoable to store every flavor of every Windows DLL in the Symbol Server (especially with the weekly patches).
So, if a customer crashes in, let's say, version 5.2.123.456 of NTDLL.DLL, and I didn't put this exact version of the DLL in my Symbol Server, then I'm stuck. Even Microsoft's public symbol server doesn't help because it only provides the debug information, not the DLL's itself.
My current solution is to ask the customer for his DLL's, but that's not always easy. Therefore I'm looking for a better solution.
Is there a way to get the debugger showing a correct call stack, or loading the debug information of a specific DLL, even if you don't have the exact version of the DLL?
Alternatively, is there a way to get all versions of all (or the important) Windows DLL's (from Microsoft)?
EDIT:
In the mean time I found a really easy way to solve this problem.
With the utility ModuleRescue (see http://www.debuginfo.com/tools/modulerescue.html) you can generate dummy DLL's from a minidump file. With these dummy DLL's, the debugger is satisfied, and correctly starts loading the debug symbols from the Microsoft servers.
It is possible to relax WinDbg's symbol resolution; see my answer to a similar question. On the other hand, the solution that I propose here relies on the fact that the DLLs are identical other than having different GUIDs identifying their debug symbols. A different version of a DLL is likely going to have a different binary, so the symbols are probably not going to match properly even if you can get them to load.
I'm pretty sure Microsoft's symbol server also provides the binaries. I'm looking in my store and I see a ton of Microsoft .dll files. I have my _NT_SYMBOL_PATH defined as
SRV*F:\Symbols\Microsoft*http://msdl.microsoft.com/download/symbols
This way it will search my local store first before trying to copy them down from Microsoft's public server.
You can have multiple symbol servers in your symbol path. So simply set up the symbol path to point to your own server for your own private modules, and to the public MS server for OS modules, see Symbol Path:
This can be easily combined with the
Microsoft public symbol store by using
the following initial setting:
_NT_SYMBOL_PATH=srv*c:\mysymbols*http://msdl.microsoft.com/download/symbols;cache*c:\mysymbols
The Microsoft Public Symbol Store is documented as http://msdl.microsoft.com/download/symbols.
? What part isn't working?
I've never been in your situation, exactly, but I would expect the debugger to give you the correct portion of the call stack that was in your code, up to the call into the dark dll. Of course, from there down to the actual crash symbols would be unavailable, but can't you see which NTDLL API was being called and which arguments were passed to that call?
You don't say which tool you're using for minidump debugging: WinDBG or VS.

Resources