Why don't Minidumps give good call stacks? - debugging

I've used minidumps on many game projects over the years and they seem to have about a 50% chance of having a valid call stack. What can I do to make them have better call stacks?
I've tried putting the latest dbghelp.dll in the exe directory. That seems to help some.
Is Visual Studio 2008 or 2010 any better? (I'm still on VS 2005).
The code I use looks like this sample.

One thing you can do to improve the accuracy of call stacks found in dumps is to use a debugger other than Visual Studio -- specifically, use WinDbg or another tool that uses the "Windows Debugger" debugging engine found in dbgeng.dll (as opposed to the "Visual Studio Debugger" debugging engine that Visual Studio uses).
In our experience, WinDbg is 100% reliable in producing good call stacks from the same dumps where Visual Studio produces unusable or wildly inaccurate call stacks. From what I can tell, in cases where an unhandled exception is the source of the crash WinDbg automatically performs the tricky process of reconstructing/recovering the exception callstack, but Visual Studio does not (or cannot?). The two debuggers use different heuristics for interpreting stacks
WinDbg can be daunting at first, so here's my quick guide on how to make it easier or even avoid having to use it directly.
A Mere Mortal's Guide To Extracting Good Callstacks
These are ordered from "fastest/easiest" to "slowest/most cryptic to interpret".
Easiest Option: use DbgDiag from Microsoft
This is a little-known tool that automates a lot of analysis of common problems, and it's simple enough to give to non-programmers or even customers. It's fast and nearly foolproof, and has become my "go to" tool for quickly analyzing an incoming crash dump.
Launch the "DebugDiag Analysis" application
Select the "CrashHangAnalysis" checkbox on the main page
Drag-and-drop your dump into the "Data files" pane on the main page
Click "Start Analysis"
After a few seconds to a few minutes it will spit out a nice .mhtml file containing an analysis of the problem, info about all the related thread, complete call stacks, etc. All hyperlinked and easy to use.
DebugDiag even automates some of the more complicated analysis that is possible but painful in WinDbg (like tracking down which of the 350 threads in your application is responsible for a deadlock).
Note: Chrome will not download or open .mhtml files for security reasons, so you must open in Internet Explorer or Microsoft Edge for it to be usable. This is annoying, and I've filed a request with the DebugDiag team (dbgdiag#microsoft.com) to change the format to plain HTML
Middle option: Install WinDbg as an alternate debugging engine for Visual Studio
Install Visual Studio if it's not yet installed. This needs to be done before the next step.
Install the Windows Driver Kit (WDK)
Launch Visual Studio, and (this part is important!) use the new "File -> Open -> Crash Dump..." option to open the dump. This will debug the crash dump using the Windows Debugger (if you instead drag-and-drop the dump on Visual Studio or use the standard "File -> Open -> File..." option to open the dump, it will debug it using the old Visual Studio debugging engine... so be careful to use the right option).
You should now be able to see the correct call stack and navigate around using the Visual Studio GUI, although some things work differently (the watch windows require using the unfamiliar WinDbg syntax, thread IDs are different, etc). Note: the Visual Studio UI may be very sluggish, especially if many threads are involved and the 'threads' or 'parallel stacks' windows are open.
Hardcore option: Use WinDbg directly
Launch WinDbg.exe
Drag-and-drop your dump into the WinDbg window
Type !analyze -v and press Enter. After a little bit of time WinDbg will spit out a crash call stack, and also its estimation of what the source of the problem is. If you're analyzing a deadlock, you can try !analyze -v -hang and WinDbg will often show you the dependency chain involved.
At this point you may have all the info you need! However, if you then want to examine the process state in the Visual Studio debugger you can take the following additional steps:
Open the crash dump in Visual Studio
Right-click in the callstack window and choose "Go to Disassembly"
Paste the hex address from the top line of WinDbg's output callstack into the "Address" bar of the Disassembly window and press enter. You're now at the location of the crash, looking at the disassembled code.
Right-click in the disassembly window and choose "Go To Source Code" to go to the source code for the location. Now you're looking at the source code at the crash site.
Note: all of the above require having correct symbol server paths configured, otherwise you won't be able to resolve the symbols in the call stacks. I recommend setting the _NT_SYMBOL_PATH environment variable so that it's automatically available to Visual Studio, WinDbg, and DebugDiag.

What's missing from your callstack? Do you have a bunch of addresses that don't resolve to valid function names (ie, 0x8732ae00 instead of CFoo:Bar())? If so, then what you need is to put your .PDBs where your debugger can find them, or set up a symbol server and set the "Symbol Paths" in the right-click context menu of the Modules pane.
We store every .PDB from every binary every time someone checks in a new Perforce changelist, so that when a dump comes back from anyone inside the office or any customer at retail, we have the .PDB corresponding to the version of the game they were running. With the symbol server and paths set, all I have to do is just double-click the .mdmp and it works every time.
Or do you have a call stack that appears to only have one function in it? Like, 0x8538cf00 without anything else above it in the stack? If so, then your crash is actually the stack itself being corrupted. If the return addresses in the backchain have been overwritten, naturally the debugger will be unable to resolve them.
Sometimes also you'll find that the thread that actually emits the minidump is not the one that threw the exception that caused the crash. Look in the Threads window to see if one of the other threads has the offending code in it.
If you are debugging a "Release" build -- that is to say, one compiled with all optimization flags turned on -- you will have to live with the fact that the debugger will have trouble finding local variables and some other data. This is because turning on optimizations means allowing the compiler to keep data on registers, collapse calculations, and generally do a variety of things that prevents data from ever actually being written to the stack. If this is your problem then you'll need to open up the disassembly window and chase the data by hand, or rebuild a debug binary and reproduce the problem where you can look at it.

Turn off Frame Pointer Optimization, if you need stack dumps. Frame pointers are used to explicitly define stack frames. Without them, the debugger has to deduce the location of each frame.

The code to record the minidump is unlikely to be relevant. The main things that a minidump records are module information (for getting symbols) and the full content of all thread stacks. Beyond that basic information (which is always recorded) nothing else matters.
Getting good symbols (including PE files) is crucial for stack walking. More details can be found here: https://randomascii.wordpress.com/2013/03/09/symbols-the-microsoft-way/
I find that Visual Studio is usually reliable at displaying call stacks. It automatically displays the relevant call stack from the exception record, and it makes changing threads easy so that you can see the call stacks of all threads. It does sometimes try to 'hide' details that it thinks might confuse you - whether that is good or bad depends on your skill level.
Windbg defaults to showing the call stack of the code that recorded the crash dump rather than the crashing call stack. Windbg requires that you go ".ecxr" or "!analyze -v" in order to see the crash stack. I find this annoying. Windbg also requires more configuration in order to be useful.
The two debuggers do have different stack walking heuristics. These heuristics are needed, for instance, if you call or return to address zero since there is no unwind information for that address. For 'clean' crashes where the failing instruction is in normal code these heuristics are less important.
The stack walking has almost certainly improved in the last ten years. VS 2015 Community Edition is very capable and is free so you might as well try it.
If you use windbg then you can try some experiments:
!vc7fpo - toggles some of the windbg heuristics.
!stackdbg d, 7, f - turns on windbg stack walk
k1 - walks one level of the stack, spitting diagnostics as controlled by !stackdbg
dds esp - dumps the raw contents of the stack, doing a symbol lookup on each pointer
If you upgrade to VS 2015 and still have problems then it is likely that the stack walking failures are specific to the crashes you are seeing. If a buffer overrun tromps the stack before crashing then the call stack will be irrevocably damaged. Your question has too little information about what failures you are seeing to give a definitive diagnosis. I find the stack displays of both debuggers fairly reliable, but I also usually understand why they sometimes fail and when that happens I can still extract the information that I need.

I dont use minidumps, but rather dump teh stack by "hand" into a logfile
(see www.ddj.com/cpp/185300443 and
How to Log Stack Frames with Windows x64).
I encounter a similar behavior like you do: Sometimes there is a valid call stack, sometimes there is not. In a minor number of cases the stack might be really corrupted. In maybe 1/3 of all cases the installed Exception handler is not called at all! I guess that its somehow a problem of the windows structured exception handling.

Related

natvisreload using Visual Studio Professional 2017 : syntax error

I'm trying to do dump analysis using Visual Studio Professional 2017, but when entering the command .natvisreload in the watch-window I get syntax error and there is nothing in the output window. (This seems to mean that the command is not understood)
In order to get me on track of the real problem, I'd like to know an example of another command I can launch in the Watch window: does anybody know another command, starting with a dot, I can launch in the Watch window (in order to distinguish whether the issue is related to the specific command .natvisreload or to the general Watch window)?
"natvis" is an abbreviation for "native visualizer". Used by the unmanaged debugging engine to provide a customized view of a native object. The .natvisreload command is one that only the unmanaged debugging engine can understand. From the comment it is somewhat obvious how this went wrong:
An example of the slug you see when you use File > Open > File to open a minidump for a process that uses managed code. Note the 3 options you have at the upper right to get the debugging started. "Managed Only" only enables the managed debugging engine, "Native Only" for the unmanaged engine, "Mixed" enables both.
You used "Mixed". While that enables both engines, there can be only one active at the same time. Unfortunately it is not always obvious which particular one is in control. Other than the debugger being able to display source code. And a side effect like you discovered here, the ".natvisreload" command goes "huh?" since that is not a command that the managed debugging engine understands.
So one workaround is to use "Native Only".
You can however switch between engines on-the-fly. This normally happens automagically when the debugger lands on a breakpoint. Not an option for dump debugging. The non-intuitive other way is to use the Debug > Windows > Call Stack debugger window. In mixed-mode debugging you see both managed and unmanaged stack frames in the stack, starting with RtlUserThreadStart at the bottom. Double-click one of these frames, like the bottom one, and the debugger switches engines. Note that you don't necessarily have something decent to look at, especially if this was a managed program, and you merely get a machine code dump for the native code. The ".natvisreload" command will however now work as intended.

Does Visual Studio have "break where you are" functionality?

We have started encountering endless loops in our engine recently and I have no idea how to effectively fight them. The app simply freezes for eternity and I am unable to stop it's execution to understand what is going on. Placing normal breakpoint in major places (update loop) does nothing. I am almost certain that the problem is in a continuously running loop somwhere, but due to the code's size I cannot even start guessing where to look for it.
So, my question is how do you break app's execution in Visual Studio at some arbitrary place in the code, where the app happens to be at that time? Something akin to "stay where you are". Is it even theoretically possible?
Sure, use Debug + Break All.
This of course doesn't necessarily break your program at a nice address that happens to match one of the statements in your program. Pretty likely if setting breakpoints didn't invoke a break. You are likely to see a notification from the debugger that it cannot display source code. Or for that matter, it might not have selected the correct thread.
So first thing you want to do is use Debug + Windows + Threads and make sure that the correct thread is selected. Double-click the one you want to debug. Next thing you want to do is look at the Debug + WIndows + Call Stack window. It should at least display some of your methods, giving you a hint how it ended up in never-never land.
And it isn't unlikely that it got deadlocked on native code or an operating system call. To see that, you'll need to enable unmanaged debugging. Project + Properties, Debugging, tick the "Enable native code debugging". And make sure you've got the symbol server enabled so you'll get debugging info for the operating system DLLs. Tools + Options, Debugging, Symbols.

Why does my windows program die with its frozen (bluish gray) Forms or windows?

My delphi program (NOT for .NET) on windows 7 seems to be running for couple of days straight and then the program sort of freezes with all of its windows painted with blueish grey color as if its windows are disabled. You simply don't have control over the program anymore but has to kill its process and start it up again. You don't need to reboot the system itself.
Has anyone experience this or anything similar? If so, what did you do to resolve or try to resolve it?
Thanks,
Your question context is very vague. We do not have any information about your application, even its design and architecture.
Nethertheless, my (general-purpose) suggestions are the following:
If your application is not multi-threaded, do the process in background threads, then leave the main thread ready to process GDI messages;
If your application is multi-threaded, take care that all VCL access from background threads are made via a Synchronize call;
If your application is multi-threaded or use timers, take care that no method is re-entrant (in some circonstances, you may come into a race condition);
Hunt any memory leak;
Use a detailed logging of the program execution, logging all exceptions risen, to guess the context of the program hang (it may be used on the customer side also to hunt race conditions);
Download the great free tool named ProcessExplorer (now hosted by Microsoft), and check out the state of your frozen program: you will see detailed information about threads, CPU use, memory, network, libraries, handles - this is a must have for any serious debugging - track especially the GDI handles leaks (number of those should remain stable);
If you did not check it already, take a look at the global Windows system event log: there may be some information here;
Perhaps a third party component or library is responsible of the process hang: try to isolate the part of your code which may be responsible of this hang.
I've Delphi application running for months without any problem. Issue is definitively in application code, not in the Delphi architecture (its RTL and VCL are very stable).
The bluish grey color is probably the default window color, meaning the window is no longer painting itself. This is consistent with the other symptom that the program no longer responds to any input. This means it isn't processing any window messages.
The easiest way to debug is to run the program in a debugger, and when it's hung just stop it and see where it's at.
If you have a memory leak you may eventually run out of memory in your process space, and it's possible that the program doesn't properly respond to that condition. Check Task Manager to see the amount of memory it's using.
Yes I fixed several hangs and other problems in the past years.
I used ProcessExplorer before (to view the stack), but it needs Microsoft debug symbols. And with Delphi you can only create a .map file. With map2dbg I could convert the .map to a .dbg, but this does not always work (note: .dbg is deprecated, newer versions of Microsoft debugging tools do not use them anymore).
So I made my own tool :-)
It is part of "AsmProfiler Sampling" tool:
http://code.google.com/p/asmprofiler/downloads/detailname=AsmProfiler_Sampling%20v1.0.7.13.zip
Click on the "Stack view of Process" button in the first screen.
Then select your process from the list and double click on it:
http://code.google.com/p/asmprofiler/wiki/ProcessStackViewer
Now you can view the stack trace of each thread. If the GUI does not respond, the main thread hangs, so check the first thread. (note: sometimes you see an "emtpy" stack because a function misaligned the stack for calculation etc, use the raw strack tracing algoritm to get more the full stack again (with a lot of false positives, because every pointer on the stack which is possible a function is shown!)).
Please post the stack here if you can't solve it, so we can take a look at it.
Note: it uses the jclDebug.pas unit of the JEDI library, so it can read .map and .jdbg files (also .dbg and .pdb debug files of Windows dlls) and also internal JCLDEBUG sections (embedded .jdbg file in one .exe). So you must at least build an .exe with detailed (!) map file, see Project Options -> Compiler -> Linking.

Windows Debugger Road-Map

It seems there are dozens of debugger and debugging tools that Microsoft produces, which creates a maze of choices and questions concerning which tool to apply, and when. For example, there is windbg - and the debugger built into Visual Studio. Both can access minidumps. Why would I choose one over the other?
Dr. Watson was the default post-mortem crash analysis tool of the past. It has now been replaced with "Problem Reports & Solutions". Which is in turn replaced with IIS Exception Monitor on servers? And perhaps all of this is built on top of "Microsoft CDB Debugger," or perhaps that is a another duplicate tool? ADPlus, yet another one, is built on CDB Debugger. The maze seems to go on endlessly.
Can someone provide a link to a taxonomy or roadmap of all these tools, with comments of which are being deprecated (Dr. Watson?) and what "tool direction" debug students should absorb? I'm sure there are a number of tools and base libraries I've not mentioned here. it would be nice to know the dependencies between them too (such as ADPlus using the CDB Debugger).
I've found this link to be helpful, since it answers some of the questions I'm asking - though the material is dated. Any other resources that give a similarly simple compare / contrast run-down?
There is no difference between CDB and NTSD, other than how they spawn new windows. Choosing when to use Visual Studio over the command line debuggers is sometimes a matter of personal choice, but sometimes the command line is a better tool for the job. Once you get good at using the command line debuggers, you can get things done much more quickly. I suspect there are a few scenarios that remain where you can only debug a specific problem with the command line debugger, but I can't think of any off hand. The third debugger you've missed is kd, which is the kernal debugger. If you want to debug kernal mode stuff (i.e. your device drivers you've written) it's really your only choice.
CDB, NTSD and KD are all part of the debugging tools for Windows, itself part of the DDK. Visual Studio does not depend on the other debugging package and vice-versa.
Watson and the like are not debuggers. They merely observe and report. I suspect the best advice there is use whichever one is appropriate to your problem. I mean, there are lots of tools for all sorts of different MS technologies. E.g. Orca for MSI databases. All of these products are unrelated, often released and maintained by different divisions, etc. As a result, I doubt you'll find a chart showing their relationships since they are so diverse.

How to understand .pdb files of visual studio?

I opened it with an editor,totally messy.
BTW, in the "Disassembly" view,is it possible to dump all the assembly code? I tried but can only grab a screen of lines
I've been keeping track of your questions. There's context that you should have put in your question, I think you're trying to debug a DirectShow plug-in that you don't have the source code for. Some kind of camera gizmo.
No, opening a .pdb file in a text editor isn't going to show you anything useful. It is binary data. I know you have a relevant .pdb for the plug-in you're working with, you get decent stack traces with named functions. You probably got the .pdb from the Microsoft Symbol server. Reading a .pdb file is the job of the debugger. There are several APIs available to read it yourself, the dbghelp API is the core one.
But it will not show you anything you don't already know from the debugger. The .pdb file is just a database of functions. You got the stripped one, it will never show more than what you see in the call stack window.
Ultimately, this is a chain of XY questions. You keep asking about Y without ever revealing what the real X problem is all about. You'll just get useless answers, like this one, until you tell us about X.
CodeProject: How to Inspect the Contents of a Program Database (PDB) File
Keep in mind though, that those files are for the debugger and not directly for you. At least I don't have the urgent wish of being able to read every possible file format in a text or hex editor.
Bearing in mind #Hans Passant's remarks - I think the question as is deserves an answer, if nothing else then for sake of those arriving here by search.
The shipped tool for PDB inspection is DBH. It is bundled with debugging tools for windows. While admittedly a command line tool (I'm from the GUI generation personally) , it is tremendously rich in features and you'd be hard pressed to find anything it can't do in half a line.
It sounds like what you want is the dumpbin command. Note that this requires the VC tools to be in your PATH; you could either use the appropriate shortcut in the start menu or run %VSxxCOMNTOOLS%\vsvars32.bat, where xx is a mnemonic for the actual version of VC you are using: the easiest way to figure out exactly what environment variable to use is probably to run set and see what variables with such names are actually set in your environment. For instance, on a system with MSVC 2005 Express and MSVC 2008 Express installed, I get this:
C:\code\xemacs-beta\nt>set | grep "VS.*COMNTOOLS"
VS80COMNTOOLS=C:\Program Files\Microsoft Visual Studio 8\Common7\Tools\
VS90COMNTOOLS=C:\Program Files\Microsoft Visual Studio 9.0\Common7\Tools\
So, to create a new shell window with the VC 9 (2008) tools added to the path, I can run start "VC 9" "%VS90COMNTOOLS%\vsvars32.bat". (The "VC 9" argument is the title for the new window.)
Anyway, with the VC tools in PATH, we can run such commands as dumpbin /disasm ..\src\temacs.exe > temacs.disasm (except you probably aren't in the nt subdirectory of an xemacs tree, so stick whatever path you want there). This will disassemble the file given, using any appropriate PDB it can find for the symbol names.
What MS calls documentation can be found in the dumpbin MSDN entry (actually, it's a lot better than I remembered it as being -- perhaps I was sore at it for being so uninformative about the /disasm flag in particular), but don't underestimate the usefulness of dumpbin /? -- it may not give as much information, but at least you can see it all on the screen at one time.

Resources