does user.dmp contain only the most recent crash - debugging

I wanted to know if the "user.dmp" created by drwtsn32 has only the most recent crash.
I tried to capture a few crashes. But when i tried to analyse it, I just see one crash.
Thanks for the help,
Arun

Just one crash dump, although the log file may contain information on previous crashes.
Run drwtsn32.exe (without any arguments) to bring up the configuration dialog.

Yep. I recommend setting WinDbg as the default post-mortem debugger. You can save your dumps from there.

Related

Visual Studio debugger doesn't stop on breakpoints

From time to time I see that debugger doesn't honor breakpoints at all, mostly when there are threads involved. Is this a known problem? If not I'll try to create a repro.
FYI a locked MDB file is causing this behaviour. Filled a bug report:
https://bugzilla.xamarin.com/show_bug.cgi?id=5608
I guess that your IDE is just not attached to those process which works at the moment when you see no stop on breakpoints. Look at this Walkthrough: Debugging a Multithreaded Application to see how to do that safely.
Hope that helps,

Difficult to debug embedded application

I'm trying to debug an application on an embedded device running an old version of Linux/Qtopia. I asked for help on QT forums but the people there don't know about old software and embedded systems. I'd really like some help with debug strategies.
My program will crash after the main window has been constructed, i.e. some time into the event loop. But depending on the order of functions in the constructor, sometimes it will run only from the console and sometimes it will only run from the icon. Despite my best efforts I can't narrow down what is causing the problem.
There is no seg fault or signal but my program does not continue and the destructor does not get called. It seems to me that one of the first things that would happen in the event loop is a resize event and when this is called could vary if you ran from the console or icon. Also, the various widgets in my GUI would be initialised and drawn so that is also a potential source of error, if I haven't set up something properly.
My debugging options are limited as the area where the crash actually occurs is not under my control. I tried logging to a file and printing to stderr but this was no help. When I got to the state where it runs from the icon but not console, I tried running in gdb and strace but it ran OK - the classic problem of debug software initialising differently.
My next thought is to try to force a core dump and then analyse that. How do I force a core dump ? Is there a better strategy ?
Logging to a file or to a communication port (serial port, etc.) is probably the simplest way to see what is happening and maintaining the normal runtime (i.e. not in a debugger).
You say that logging to a file and printing to stderr was no help. Why not? Are you printing relevant debugging information to the file? Are you using the Linux/Qtopia sources and adding debug logging?
Assuming you have sources for all of the code you are running, it should be just a matter of adding debug logging in the right places to pinpoint where the problem is occurring.

Debugging a minidump in Visual Studio where the call stack is null

I have a customer who is getting a 100% reproduceable crash that I can't replicate in my program compiled in Visual Studio 2005. I sent them a debug build of my program and kept all the PDB and DLL files handy. They sent me the minidump file, but when I open it I get:
"Unhandled exception at 0x00000000 in MiniDump.dmp: 0xC0000005: Access violation reading location 0x00000000."
Then the call stack shows only "0x00000000()" and the disassembly shows me a dump of the memory at 0x0. I've set up the symbol server, loaded my PDB symbols, etc. But I can't see any way of knowing which of the many DLLs actually caused the jump to null. This is a large project with many dependencies, and some of them are binaries that I don't have the source or PDBs for, as I am using an API as a 3rd party.
So how on earth is this minidump useful? How do I see which DLL caused the crash? I've never really used minidumps for debugging before, but all the tutorials I have read seem to at least display a function name or something else that gives you a clue in the call stack. I just get the one line pointing to null.
I also tried using "Depends" to see if there was some DLL dependency that was unresolved; however on my three test machines with various Windows OS's, I seem to get three different sets of OS DLL dependencies (and yet can't replicate the crash); so this doesn't seem a particularly reliable method either to diagnose the problem.
What other methods are available to determine the cause of this problem? Is there some way to step back one instruction to see which DLL jumped to null?
Well it looks like the answer in this instance was "Use WinDbg instead of Visual Studio for debugging minidumps". I couldn't get any useful info out of VS, but WinDbg gave me a wealth of info on the chain of function calls that led to the crash.
In this instance it still didn't help solve my problem, as all of the functions were in the 3rd party library I am using, so it looks like the only definitive answer to my specific problem is to use log files to trace the state of my application that leads to the crash.
I guess if anyone else sees a similar problem with an unhelpful call stack when debugging a minidump, the best practice is to open it with WinDgb rather than Visual Studio. Seems odd that the best tool for the job is the free Microsoft product, not the commerical one.
The other lesson here is probably "any program that uses a third party library needs to write a log file".
The whole idea behind all 'simple' ways of post mortem debugging is the capture of a stack trace. If your application overwrites the stack there is no way for such analysis. Only very sophisticated methods, that record the whole program execution in dedicated hardware could help.
The way to go in such a case are log files. Spread some log statements very wide around the area where the fault occurs and transmit that version to the customer. After the crash you'll see the last log statement in your log file. Add more log statements between that point and the next log statement that has not been recorded in the log file, ship that version again. Repeat until you found the line causing the problem.
I wrote a two part article about this at ddj.com:
About Log Files Part 1
About Log Files Part 2
Just an observation, but the the stack is getting truncated or over-written, might this be a simple case of using an uninitialised field, or perhaps a buffer overrun ?
That might be fairly easy to locate.
Have you tried to set WinDbg on a customer's computer and use it as a default debugger for any application that causes a crash? You just need to add pdb files to the folder where your application resides. When a crush happens WinDbg starts and you can try to get call stack.
Possibly you already know this, but here are some points about minidump debugging:
1. You need to have exactly the same executables and PDB files, as on the client computer where minidump was created, and they should be placed exactly in the same directories. Just rebuilding the same version doesn't help.
2. Debugger must be connected to MS Symbols server.
3. When debugger starts, it prints process loading log in the Output window. Generally, all libraries should be successfully loaded with debug information. Libraries without debug information are loaded as well, but "no debug info" is printed. Learn this log - it can give you some information.
If executable stack contains frames from a library without debug information, it may be not shown. This happens, for example, if your code is running as third-party library callback.
Try to create minidump on your own computer, by adding some code which creates unhandled exception, and debug it immediately. Does this work? Compare loading log in successful and unsuccessful debugging sessions.
You may have called null function pointer. Current executing function information is needed to show call stack information. Force set instruction pointer to start of any simple function, then you'll see call stack information again.
void SimpleFunc()
{ // <- set next statement here
}

How do I analyse a BSOD and the error information it will provide me?

Well, fortunately I haven't written many applications that cause a BSOD but I just wonder about the usefullness of the information on this screen. Does it contain any useful information that could help me to find the error in my code? If so, what do I need, exactly?
And then, the system restarts and probably has written some error log or other information to the system somewhere. Where is it, what does it contain and how do I use it to improve my code?
I did get a BSOD regularly in the past when I was interacting with a PBX system where the amount of documentation of it's drivers were just absent, so I had to do some trial-and-error coding. Fortunately, I now work for a different company and don't see any BSOD's as a result of my code.
If you want a fairly easy way to find out what caused an OS crash that will work ~90% of the time - assuming you have a crash dump available - then try the following:
Download WinDbg as part of the Debugging tools for Windows package. Note, you only need to install the component called Debugging Tools for Windows.
Run WinDbg
Select "Open Crash Dump" from the file menu
When the dump file has loaded type analyze -v and press enter
WinDbg will do an automated analysis of the crash and will provide a huge amount of information on the system state at the time of the crash. It will usually be able to tell you which module was at fault and what type of error caused the crash. You should also get a stack trace that may or may not be helpful to you.
Another useful command is kbwhich prints out a stack trace. In that list, look for a line contains .sys. This is normally the driver which caused the crash.
Note that you will have to configure symbols in WinDbg if you want the stack trace to give you function names. To do this:
Create a folder such as C:\symbols
In WinDbg, open File -> Symbol File Path
Add: SRV*C:\symbols*http://msdl.microsoft.com/download/symbols
This will cache symbol files from Microsoft's servers.
If the automated analysis is not sufficient then there are a variety of commands that WinDbg provides to enable you to work out exactly what was happening at the time of the crash. The help file is a good place to start in this scenario.
Generally speaking, you cannot cause a OS crash or bug check from within your application code. That said, if you are looking for general tips and stuff, I recommend the NTDebugging blog. Most of the stuff is way over my head.
What happens when the OS crashes is it will write a kernel dump file, depending on the current flags and so on, you get more or less info in it. You can load up the dump file in windbg or some other debugger. Windbg has the useful !analyze command, which will examine the dump file and give you hints on the bucket the crash fell into, and the possible culprits. Also check the windbg documentation on the general cause of the bug check, and what you can do to resolve it.

I need to find the point in my userland code that crash my kernel

I have big system that make my system crash hard. When I boot up, I don't even have
a coredump. If I log every line that
get executed until my system goes down. I will find that evil code.
Can I log every source code line in GDB to a file?
UPDATE:
ok, I found the bug. It was nasty. The application I started did not
take the system down. After learning about coredump inspection with mdb, and some gdb stepping I found out that the systemcall causing the dump, was not implemented. Updating the system to latest kernel will fix my problem. Thanks to all of you.
MY LESSON:
make sure you know what process causes the coredump. It's not always the one you started.
Sounds like a tricky little problem.
I often try to eliminate as many possible suspects as I can by commenting out large chunks of code, configuring the system to not run certain pieces (if it allows you to do that) etc. This amounts to doing an ad-hoc binary search on the problem, and is a surprisingly effective way of zooming in on offending code relatively quickly.
A potential problem with logging is that the log might not hit the disk before the system locks up - if you don't get a core dump, you might not get the log.
Speaking of core dumps, make sure you don't have a limit on your core dump size (man ulimit.)
You could try to obtain a list of all the functions in your code using objdump, process it a little bit and create a bunch of GDB trace statements on those functions - basically creating a GDB script automatically. If that turns out to be overkill, then a binary search on the code using tracepoints can also help you zoom in on the problem.
And don't panic. You're smarter than the bug - you'll find it.
You can not reasonably track every line of your source using GDB (too slow). Besides, a system crash is most likely a result of a system call, and libc is probably doing the system call on your behalf. Even if you find the line of the application that caused OS crash, you still don't really know anything.
You should start by clarifying which OS is crashing. For Linux, you can try the following approaches:
strace -fo trace.out /path/to/app
After reboot, trace.out will contain syscalls the application was doing just before the crash. If you are lucky, you'll see the last syscall-of-death, but I wouldn't count on it.
Alternatively, try to reproduce the crash on the user-mode Linux, or on kernel with KGDB compiled in.
These will tell you where the problem in the kernel is. Finding the matching system call in your application will likely be trivial.
Please clarify your problem: What part of the system is crashing?
Is it an application?
If so, which application? Is this an application which you have written yourself? Is this an application you have obtained from elsewhere? Can you obtain a clean interrupt if you use a debugger? Can you obtain a backtrace showing which functions are calling the section of code which crashes?
Is it a new hardware driver?
Is it based on an older driver? If so, what has changed? Is it based on a manufacturer's data sheet? Is that data sheet the latest and most correct?
Is it somewhere in the kernel? Which kernel?
What is the OS? I assume it is linux, seeing that you are using the GNU debugger. But of course, that is not necessarily so.
You say you have no coredump. Have you enabled coredumps on your machine? Most systems these days do not have coredumps enabled by default.
Regarding logging GDB output, you may have some success, but it depends where the problem is whether or not you will have the right output logged before the system crashes. There is plenty of delay in writing to disk. You may not catch it in time.
I'm not familiar with the gdb way of doing this, but with windbg the way to go is to have a debugger attached to the kernel and control the debugger remotely over a serial cable (or firewire) from a second debugger. I'm pretty sure gdb has similar capabilities, I could quickly find some hints here: http://www.digipedia.pl/man/gdb.4.html

Resources