LLDB and OS X crash reports - osx-mavericks

People,
I have just started working on OS X 10.9 and have a crash at hand to debug.
I see OS X ships LLVM and LLDB, which replace the well-known and well-documented GDB.
Anyway, I see the precise stack trace in the crash report, and that's pretty impressive from Apple.
I can also run image lookup in LLDB and print the name of the function at the crash address.
When I use the verbose option with image lookup it prints a little extra information; however, I am still not able to view the local variables of that function.
I tried image dump, image dump symtab, and other LLDB options.
None of them seems to help. I scanned Stack Overflow to see if the answer is there, but no luck yet.
Therefore I have these questions:
Can we not get a stack trace with local variable/argument values from an OS X crash report?
How do we see a function's arguments and local variables using LLDB when all we have is an OS X crash report?
I see that frame variable and the like work fine when attached to a running process; however, they don't work when the process has crashed and all I have is the report.
Please guide me.
Thank you.

The CrashReporter output consists (mainly) of a backtrace of all the threads in the program, the list of libraries/frameworks that were loaded at the time (with their load addresses and Mach-O UUIDs to identify which build of each binary was being used) and the register context for the frame that crashed.
image lookup -va in lldb will show you where the arguments and local variables were located at that given pc. If a variable is stored in register rbx at that point, image lookup will say it's in rbx. Oftentimes a variable is stored on the stack, and the location will say something like rbp-40, where rbp is the frame pointer register on x86_64. In that case, it means the variable lives on the stack at the value of rbp minus 40.
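For example, with the crashed binary (and ideally its dSYM) on hand, symbolication looks roughly like this (a hedged sketch: the binary name, address, and output are invented, and the exact output format varies across lldb versions):

$ lldb /path/to/launcher
(lldb) image lookup -va 0x00009598
...
Variable: name = "count", type = "int", location = rbp-40, decl = main.c:10

If the crash report shows the module loaded at a non-default address, first subtract that load address from the crashing pc to get the offset into the binary.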
Crash reports do not include any of the stack contents -- they intentionally don't include things that might be sensitive. For instance, your program might have a password in plaintext in a stack local variable.
If a variable you're interested in is stored in a register at the point of the crash, you can use the register context section of the crash report to figure out what value was in there.
Often it's not quite that simple. If you can read your way through assembly language, the best way to figure out what really happened is to disassemble the crashed function and use the register context information to understand why the final instruction resulted in the crash.
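In lldb you can do that against the binary named in the crash report, e.g. (the function name here is invented):

(lldb) disassemble --name the_crashed_function

Then compare the registers used by the faulting instruction with the register context from the report to see which value was bad.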

Related

How to extract stack traces from minidumps?

I've got a whole bunch of minidumps which were recorded during the runtime of an application through MiniDumpWriteDump. The minidumps were created on a machine with a different OS version than my development machine.
Now I'm trying to write a program to extract stack traces from the minidumps, using dbghelp.dll. I'm walking the MINIDUMP_MODULE_LIST and calling SymLoadModule64, but this fails to download the PDBs (kernel32 etc.) from the public symbol server. If I add "C:\Windows\System32" to the symbol path it finds the DLLs and downloads the symbols, but of course they don't match the DLLs from the minidump, so the results are useless.
So how do I tell dbghelp.dll to download and use the proper pdbs?
[edit]
I forgot to state that SymLoadModule64 only takes a filename and no version/checksum information, so obviously with SymLoadModule64 alone it's impossible for dbghelp to figure out which pdb to download.
The information is actually available in the MINIDUMP_MODULE_LIST but I don't know how to pass it back to the dbghelp API.
There is SymLoadModuleEx which takes additional parameters, but I have no idea if that's what I need or what I should pass for the additional parameters.
[edit]
No luck so far, though I've noticed there's also dbgeng.dll, distributed together with dbghelp.dll in the debugging SDK. It looks quite well documented on MSDN, which says it's the same engine WinDbg uses. Maybe I can use that to extract the stack traces.
If anyone can point me to some introduction to using dbgeng.dll to process minidumps, that would probably help too, as MSDN documents the individual components but not how they work together.
Just in case anyone else wants to automate extracting stack traces from dumps, here's what I ended up doing:
Like I mentioned in the update, it's possible to use dbgeng.dll instead of dbghelp.dll; it seems to be the same engine WinDbg uses. After some trial and error, here's how to get a good stack trace with the same symbol loading mechanism as WinDbg (a code sketch follows the list):
call DebugCreate to get an instance of the debug engine
query for IDebugClient4, IDebugControl4, IDebugSymbols3
use IDebugSymbols3.SetSymbolOptions to configure how symbols are loaded (see MSDN for the options WinDbg uses)
use IDebugSymbols3.SetSymbolPath to set the symbol path like you would do in WinDbg
use IDebugClient4.OpenDumpFileWide to open the dump
use IDebugControl4.WaitForEvent to wait until the dump is loaded
use IDebugSymbols3.SetScopeFromStoredEvent to select the exception stored in the dump
use IDebugControl4.GetStackTrace to fetch the last few stack frames
use IDebugClient4.SetOutputCallbacks to register a listener receiving the decoded stack trace
use IDebugControl4.OutputStackTrace to process the stack frames
use IDebugClient4.SetOutputCallbacks to unregister the callback
release the interfaces
The call to WaitForEvent seems to be important; without it the following calls fail to extract the stack trace.
Also, there still seems to be a memory leak in there. I can't tell whether it's me not cleaning up properly or something internal to dbgeng.dll, but I can just restart the process every 20 dumps or so, so I didn't investigate further.
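For reference, here is roughly what those steps look like in C++ (a sketch under my own assumptions, not a drop-in tool: error handling is mostly omitted, the dump file name and symbol path are placeholders, and you need to link against dbgeng.lib):

#include <windows.h>
#include <dbghelp.h>   // for the SYMOPT_* flags
#include <dbgeng.h>
#include <cstdio>
#pragma comment(lib, "dbgeng.lib")

// Output callback that simply forwards the engine's text to stdout.
class StdoutCallbacks : public IDebugOutputCallbacks {
public:
    STDMETHODIMP QueryInterface(REFIID riid, PVOID* ppv) {
        if (IsEqualIID(riid, __uuidof(IUnknown)) ||
            IsEqualIID(riid, __uuidof(IDebugOutputCallbacks))) {
            *ppv = this;
            return S_OK;
        }
        *ppv = nullptr;
        return E_NOINTERFACE;
    }
    // Stack-allocated below, so no real reference counting is needed.
    STDMETHODIMP_(ULONG) AddRef() { return 1; }
    STDMETHODIMP_(ULONG) Release() { return 1; }
    STDMETHODIMP Output(ULONG /*Mask*/, PCSTR Text) {
        fputs(Text, stdout);
        return S_OK;
    }
};

int main() {
    IDebugClient4* client = nullptr;
    IDebugControl4* control = nullptr;
    IDebugSymbols3* symbols = nullptr;

    // Create the engine and query the interfaces we need.
    if (DebugCreate(__uuidof(IDebugClient4), (void**)&client) != S_OK)
        return 1;
    client->QueryInterface(__uuidof(IDebugControl4), (void**)&control);
    client->QueryInterface(__uuidof(IDebugSymbols3), (void**)&symbols);

    // Symbol options and path, roughly what WinDbg uses by default.
    symbols->SetSymbolOptions(SYMOPT_UNDNAME | SYMOPT_DEFERRED_LOADS |
                              SYMOPT_LOAD_LINES | SYMOPT_FAIL_CRITICAL_ERRORS);
    symbols->SetSymbolPathWide(
        L"srv*c:\\symcache*https://msdl.microsoft.com/download/symbols");

    // Open the dump and wait until the engine has finished loading it.
    client->OpenDumpFileWide(L"crash.dmp", 0);
    control->WaitForEvent(DEBUG_WAIT_DEFAULT, INFINITE);

    // Select the exception stored in the dump as the current scope.
    symbols->SetScopeFromStoredEvent();

    // Fetch the last few stack frames of the faulting thread.
    DEBUG_STACK_FRAME frames[32];
    ULONG filled = 0;
    control->GetStackTrace(0, 0, 0, frames, ARRAYSIZE(frames), &filled);

    // Register a listener, print the decoded trace, then unregister.
    StdoutCallbacks callbacks;
    client->SetOutputCallbacks(&callbacks);
    control->OutputStackTrace(DEBUG_OUTCTL_THIS_CLIENT, frames, filled,
                              DEBUG_STACK_FRAME_NUMBERS |
                              DEBUG_STACK_FRAME_ADDRESSES |
                              DEBUG_STACK_SOURCE_LINE);
    client->SetOutputCallbacks(nullptr);

    // Release the interfaces.
    symbols->Release();
    control->Release();
    client->Release();
    return 0;
}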
An easy way to automate the analysis of multiple minidump files is to use the scripts written by John Robbins in his article "Automating Analyzing Tons Of Minidump Files With WinDBG And PowerShell" (you can grab the code on GitHub).
This is easy to tweak to have it perform whatever WinDbg commands you'd like it to, if the default setup is not sufficient.

Debugging a minidump in Visual Studio where the call stack is null

I have a customer who is getting a 100% reproducible crash that I can't replicate in my program compiled in Visual Studio 2005. I sent them a debug build of my program and kept all the PDB and DLL files handy. They sent me the minidump file, but when I open it I get:
"Unhandled exception at 0x00000000 in MiniDump.dmp: 0xC0000005: Access violation reading location 0x00000000."
Then the call stack shows only "0x00000000()" and the disassembly shows me a dump of the memory at 0x0. I've set up the symbol server, loaded my PDB symbols, etc. But I can't see any way of knowing which of the many DLLs actually caused the jump to null. This is a large project with many dependencies, and some of them are third-party binaries for which I don't have the source or PDBs.
So how on earth is this minidump useful? How do I see which DLL caused the crash? I've never really used minidumps for debugging before, but all the tutorials I have read seem to at least display a function name or something else that gives you a clue in the call stack. I just get the one line pointing to null.
I also tried using Dependency Walker ("Depends") to see if there was some unresolved DLL dependency; however, on my three test machines with various Windows versions I get three different sets of OS DLL dependencies (and yet can't replicate the crash), so this doesn't seem a particularly reliable way to diagnose the problem either.
What other methods are available to determine the cause of this problem? Is there some way to step back one instruction to see which DLL jumped to null?
Well it looks like the answer in this instance was "Use WinDbg instead of Visual Studio for debugging minidumps". I couldn't get any useful info out of VS, but WinDbg gave me a wealth of info on the chain of function calls that led to the crash.
In this instance it still didn't help solve my problem, as all of the functions were in the 3rd party library I am using, so it looks like the only definitive answer to my specific problem is to use log files to trace the state of my application that leads to the crash.
I guess if anyone else sees a similar problem with an unhelpful call stack when debugging a minidump, the best practice is to open it with WinDbg rather than Visual Studio. Seems odd that the best tool for the job is the free Microsoft product, not the commercial one.
The other lesson here is probably "any program that uses a third party library needs to write a log file".
The whole idea behind all 'simple' ways of post-mortem debugging is the capture of a stack trace. If your application overwrites the stack, no such analysis is possible. Only very sophisticated methods that record the whole program execution in dedicated hardware could help.
The way to go in such a case are log files. Spread some log statements very wide around the area where the fault occurs and transmit that version to the customer. After the crash you'll see the last log statement in your log file. Add more log statements between that point and the next log statement that has not been recorded in the log file, ship that version again. Repeat until you found the line causing the problem.
I wrote a two part article about this at ddj.com:
About Log Files Part 1
About Log Files Part 2
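A minimal sketch of the log-statement bisection idea in C++ (the helper, macro, and file name are my own, purely illustrative): append and flush one line per call, so the last line in the file marks how far execution got before the crash.

#include <cstdio>

// Append one line and close the file immediately, so the line survives a
// process crash. Slow, but acceptable for targeted fault hunting.
inline void TraceLine(const char* file, int line, const char* msg) {
    if (std::FILE* f = std::fopen("app_trace.log", "a")) {
        std::fprintf(f, "%s:%d %s\n", file, line, msg);
        std::fclose(f);
    }
}
#define TRACE(msg) TraceLine(__FILE__, __LINE__, (msg))

// Usage: TRACE("before ThirdPartyInit"); ... TRACE("after ThirdPartyInit");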
Just an observation, but since the stack is getting truncated or overwritten, might this be a simple case of using an uninitialised field, or perhaps a buffer overrun? That might be fairly easy to locate.
Have you tried setting up WinDbg on the customer's computer and using it as the default debugger for any application that crashes? You just need to add the PDB files to the folder where your application resides. When a crash happens, WinDbg starts and you can try to get a call stack.
Possibly you already know this, but here are some points about minidump debugging:
1. You need to have exactly the same executables and PDB files as on the client computer where the minidump was created, and they should be placed in exactly the same directories. Just rebuilding the same version doesn't help.
2. The debugger must be connected to the MS symbol server.
3. When the debugger starts, it prints the module loading log in the Output window. Generally, all libraries should be successfully loaded with debug information. Libraries without debug information are loaded as well, but "no debug info" is printed. Learn this log - it can give you some information.
If the stack contains frames from a library without debug information, it may not be shown. This happens, for example, if your code is running as a callback from a third-party library.
Try to create a minidump on your own computer by adding some code which raises an unhandled exception, and debug it immediately. Does this work? Compare the loading log in the successful and unsuccessful debugging sessions.
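For reference, a minimal sketch of such code (my own illustrative example, not from the answer; the dump file name is a placeholder, and you need to link with dbghelp.lib):

#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

// Write a minidump for the current process when an unhandled exception hits.
LONG WINAPI WriteDumpFilter(EXCEPTION_POINTERS* info) {
    HANDLE file = CreateFileW(L"crash.dmp", GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION mei = {};
        mei.ThreadId = GetCurrentThreadId();
        mei.ExceptionPointers = info;
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &mei, nullptr, nullptr);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;
}

int main() {
    SetUnhandledExceptionFilter(WriteDumpFilter);
    volatile int* p = nullptr;
    *p = 42;  // deliberate access violation to exercise the filter
    return 0;
}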
You may have called a null function pointer. Information about the currently executing function is needed to show the call stack. Force-set the instruction pointer to the start of any simple function, and you'll see the call stack again.
void SimpleFunc()
{ // <- set next statement here
}

Analysing a crash dump in WinDbg

I am using a third party closed source API which throws an exception stating that "all named pipes are busy".
I would like to debug this further (rather than just stepping through) so I can actually learn what is happening under the covers.
I have taken a dump of this process using WinDbg. What commands should I now use to analyse this dump?
Thanks
You could start as follows to get an overview of the exception:
!analyze -v
Now you could load the exception context record:
.ecxr
And now... just take a look at the stack, registers, threads,...
kb ; will show you the stack trace of the crash
dv ; will show local variables
Depending on the clues you get, you should follow a different direction. If you want a quick reference to WinDbg I'd recommend you this link.
I hope you find some of these commands and info useful.
In postmortem debugging with Windbg, it can be useful to run some general diagnostic commands before deciding where to dig deeper. These should be your first steps:
.logopen <filename> (See also .logappend)
.lastevent See why the process halted and on what thread
u List disassembly near $eip on offending thread
~ Status of all threads
kb List call stack, including parameters
.logclose
These commands typically give you an overview of what happened so you can dig further. In the case of dealing with libraries where you don't have source, sending the resulting log file to the vendor along with the build # of the binary library should be sufficient for them to trace it to a known issue if there is one.
This generally happens when a client calls CreateFile for an existing pipe and all the existing pipe instances are busy. At this point CreateFile returns an error, and the error code is ERROR_PIPE_BUSY. The right thing to do at this point is to call WaitNamedPipe with a timeout value to wait for a pipe instance to become available.
The problem generally happens when more than one client tries to connect to the named pipe at the same time.
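A minimal sketch of that client-side retry pattern (the pipe name and the 5-second timeout are illustrative):

#include <windows.h>

// Connect to a named pipe, waiting for a free instance when all are busy.
HANDLE ConnectToPipe(const wchar_t* name) {
    for (;;) {
        HANDLE h = CreateFileW(name, GENERIC_READ | GENERIC_WRITE,
                               0, nullptr, OPEN_EXISTING, 0, nullptr);
        if (h != INVALID_HANDLE_VALUE)
            return h;                        // connected
        if (GetLastError() != ERROR_PIPE_BUSY)
            return INVALID_HANDLE_VALUE;     // some other failure: give up
        // All instances busy: wait up to 5 s for one to free up, then retry.
        if (!WaitNamedPipeW(name, 5000))
            return INVALID_HANDLE_VALUE;     // timed out
    }
}
// Usage: HANDLE h = ConnectToPipe(L"\\\\.\\pipe\\MyPipe");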
I assume that the third-party DLL is native (otherwise, just use Reflector).
Before using WinDbg to analyze the dump, try using Process Monitor (Sysinternals, freeware) to monitor your process's activity. If it fails because of a file-system-related issue, you can see exactly what caused the problem and what it tried to do before failing.
If Process Monitor isn't enough, then you can try to debug your process. But in order to see some meaningful information about the third-party DLL you'll need its PDBs.
After setting the correct debug symbols, you can view the call stack using the k command or one of its variations (again, I assume you're talking about native code). If your process is indeed crashing because of this DLL, then examine the parameters that you pass to its functions to ensure that the problem is not on your side. I guess that further down the call stack you reach some Win32 API - examine the parameters that the DLL's function is passing, and try to see if something "smells". If you have the DLL's private symbols you can examine its functions' local variables as well (dv), which can give you some more information.
I hope I gave you a good starting point.
This is an excellent resource for using WinDbg to analyze crashes that may be of some use: http://www.networkworld.com/article/3100370/windows/how-to-solve-windows-10-crashes-in-less-than-a-minute.html
The article is for Windows 10, but it contains links to similar information for earlier versions of Windows.

I need to find the point in my userland code that crashes my kernel

I have a big system that makes my machine crash hard. When I boot back up, I don't even have a core dump. If I could log every line that gets executed until the system goes down, I would find that evil code.
Can I log every executed source line in GDB to a file?
UPDATE:
OK, I found the bug. It was nasty. The application I started did not take the system down. After learning about core dump inspection with mdb, and some GDB stepping, I found out that the system call causing the dump was not implemented. Updating the system to the latest kernel will fix my problem. Thanks to all of you.
MY LESSON:
Make sure you know which process causes the core dump. It's not always the one you started.
Sounds like a tricky little problem.
I often try to eliminate as many possible suspects as I can by commenting out large chunks of code, configuring the system to not run certain pieces (if it allows you to do that) etc. This amounts to doing an ad-hoc binary search on the problem, and is a surprisingly effective way of zooming in on offending code relatively quickly.
A potential problem with logging is that the log might not hit the disk before the system locks up - if you don't get a core dump, you might not get the log.
Speaking of core dumps, make sure you don't have a limit on your core dump size (man ulimit).
You could try to obtain a list of all the functions in your code using objdump or nm, process it a little bit, and create a bunch of GDB trace statements on those functions - basically generating a GDB script automatically, as sketched below. If that turns out to be overkill, then a binary search on the code using tracepoints can also help you zoom in on the problem.
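For instance, something along these lines could generate such a script (a hedged sketch; the binary and file names are invented):

nm --defined-only ./myapp | awk '$2 == "T" { print "break " $3 }' > funcs.gdb

Source funcs.gdb from within GDB, or swap "break" for "trace" if you're using tracepoints.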
And don't panic. You're smarter than the bug - you'll find it.
You cannot reasonably track every line of your source using GDB (too slow). Besides, a system crash is most likely the result of a system call, and libc is probably doing the system call on your behalf. Even if you find the line of the application that caused the OS crash, you still don't really know anything.
You should start by clarifying which OS is crashing. For Linux, you can try the following approaches:
strace -fo trace.out /path/to/app
After reboot, trace.out will contain syscalls the application was doing just before the crash. If you are lucky, you'll see the last syscall-of-death, but I wouldn't count on it.
Alternatively, try to reproduce the crash on user-mode Linux, or on a kernel with KGDB compiled in.
These will tell you where the problem in the kernel is. Finding the matching system call in your application will likely be trivial.
Please clarify your problem: What part of the system is crashing?
Is it an application?
If so, which application? Is this an application which you have written yourself? Is this an application you have obtained from elsewhere? Can you obtain a clean interrupt if you use a debugger? Can you obtain a backtrace showing which functions are calling the section of code which crashes?
Is it a new hardware driver?
Is it based on an older driver? If so, what has changed? Is it based on a manufacturer's data sheet? Is that data sheet the latest and most correct?
Is it somewhere in the kernel? Which kernel?
What is the OS? I assume it is Linux, seeing that you are using the GNU debugger. But of course, that is not necessarily so.
You say you have no coredump. Have you enabled coredumps on your machine? Most systems these days do not have coredumps enabled by default.
Regarding logging GDB output, you may have some success, but whether the right output gets logged before the system crashes depends on where the problem is. There is plenty of delay in writing to disk, so you may not catch it in time.
I'm not familiar with the GDB way of doing this, but with WinDbg the way to go is to have a debugger attached to the kernel and control it remotely over a serial cable (or FireWire) from a second debugger. I'm pretty sure GDB has similar capabilities; I could quickly find some hints here: http://www.digipedia.pl/man/gdb.4.html
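On Linux with KGDB, the GDB side of such a session looks roughly like this (a sketch; the kernel image path, serial device, and baud rate are assumptions):

gdb ./vmlinux
(gdb) set remotebaud 115200
(gdb) target remote /dev/ttyS0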

Meaning of hex number in Windows crash dialog

Every now and then (ahem...) my code crashes on some system; quite often, my users send screenshots of Windows crash dialogs. For instance, I recently received this:
Unhandled win32 exception # 0x3a009598 in launcher2g.exe:
0xC0000005: Access violation writing location 0x00000000.
It's clear to me (due to the 0xc0000005 code as well as the written out error message) that I'm following a null pointer somewhere in my launcher2g.exe process. What's not clear to me is the significance of the '0x3a009598' number. Is this the code offset in the process' address space where the assembler instruction is stored which triggered the problem?
Under the assumption that 0x3a000000 is the position where the launcher2g.exe module was loaded into the process, I used the Visual Studio debugger to check the assembler code at 0x3a009598 but unfortunately that was just lots of 'int 3' instructions (this was a debug build, so there's lots of int 3 padding).
I always wondered how to make the most of these # 0x12345678 numbers - it would be great if somebody here could shed some light on it, or share some pointers to further explanations.
UPDATE: In case anybody finds this question in the future, here's a very interesting read I found which explains how to make sense of error messages as the one I quoted above: Finding crash information using the MAP file.
0x3a009598 would be the address of the x86 instruction that caused the crash.
The EXE typically gets loaded at its preferred load address - usually 0x00400000, IIRC. So it's probably bloody far away from 0x3a009598. Some DLL loaded by the process is probably located at this address.
Crash dumps are usually the most useful way to debug this kind of thing if you can get your users to generate and send them. You can load them with Visual Studio 2005 and up and get automatic symbol resolution of system dlls.
Next up, the .map files produced by your build process should help you determine the offending function - assuming you do manage to figure out which exe/dll module the crash was inside, and what its actual load address was.
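For example (with invented numbers): if the module containing the crash turns out to be loaded at 0x3a000000, the faulting instruction sits at offset 0x3a009598 - 0x3a000000 = 0x9598 into that module; in the module's .map file, find the function whose start address (minus the map's preferred base) is the closest one at or below that offset.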
On XP, users can use Dr. Watson (drwtsn32) to produce and send you crash dumps. On Vista and up, Windows Error Reporting writes the crash dumps to c:\users\<username>\AppData\Local\Temp\*.mdmp
