Windbg with xp embedded, ntdll.dll symbols fail, other symbols affected? - debugging

I am using WinDbg with XP Embedded. Attempting to fetch the operating system symbols fails with the message "Symbol file could not be found. Defaulted to export symbols for ntdll.dll". (Is this typical for XP Embedded?)
I have no problem locating and loading symbols and source for my own code. However, stepping through the code suggests a severe mismatch between the code and the symbol file: the memory locations that dv reports for variables do not agree with the actual memory contents (e.g. after assigning to a variable, the address that dv claims corresponds to it appears unchanged).
My sympath lists the symbol directory first, then the cache, then the server so cached symbol files shouldn't be interfering.
Is this a latent effect of not finding the ntdll symbol files and using another one that doesn't match correctly or is there something else that could be causing this?
Example:
.sympath D:/Symbols
.symfix+
.srcpath D:/Symbols ** Yes, currently the source is in with the symbols
.reload
** (defaults to export symbols for ntdll.dll since symbol file can't be found)
bp 00401000 (break at a constructor)
g
(program runs till it hits constructor)
l+t
dv /i /t /V ** look up this pointer memory location to check constructor
** We bring up a memory window at the location the this pointer refers to and
** step through the code, but no changes appear in that memory window
** moreover a local LARGE_INTEGER whose value is set with QueryPerformanceCounter
** also appears unchanged after the call
** when the constructor returns we assign the memory address returned by
** new to a global pointer, whose memory address we look up with dt, but
** after the call that address still has 0 in it
Can anyone tell me how to actually fix this?
As a side note we actually run cdb as a server on the xp embedded machine and use the "connect to remote session" option of windbg. The above commands are all executed through windbg.

Executing !sym noisy before the .reload will let you know why it's not finding symbols for ntdll.dll. It's entirely possible that they're simply not indexed on the symbol server, which generally means you are out of luck (there really isn't anyone to contact to get this fixed unfortunately).
As for your other symbol issues:
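1) Is this the release build of your code? If so, this is entirely expected: the optimizer may keep variables in registers or eliminate them, so the locations dv reports are unreliable.
2) If it is the debug build, are you 100% sure that the source you're pointing to matches what is running on the target machine? Make sure you're 100% sure before answering :)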
-scott

Related

Breakpoint on the .init Section of a Shared Library

I tried to run inkscape-0.92.3 in gdb. Precisely, I tried to set a breakpoint on the first address of the .init section in its main shared library (i.e., /usr/lib/inkscape/libinkscape_base.so). The address is 0x7ffff6ebd9d0 based on the information returned by info files. But when I set the breakpoint on this address using b *0x7ffff6ebd9d0, I receive the following error:
Cannot insert breakpoint 1.
Cannot access memory at address 0x7ffff6ebd9d0
This address is the address of the _init function of this library. The same symbol, also, exists in other shared libraries. So I can put a breakpoint on this symbol using b _init, which leads to a lot of sub-breakpoints. This time all breakpoints work fine and I can c(ontinue) until I reach the _init symbol for the libinkscape shared library. Does anybody know the reason for the error in the raw address case?
The reason: this address isn't mapped yet (the library hasn't been loaded yet).
It works in the break _init case because GDB can check whether any newly loaded shared library defines that symbol. But it isn't smart enough to re-check whether address 0x7ffff6ebd9d0 has become breakpoint-able.
You can work around this by using (gdb) set stop-on-solib-events 1. GDB will then stop every time new shared libraries are loaded, before running their initializers.
Once libinkscape_base.so shows up, you will be able to use the address breakpoint as desired.
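If you would rather not stop at every library load by hand, the same idea can probably be scripted with GDB's Python API. This is only a sketch under a few assumptions: the address from the question stays valid across runs (GDB disables address randomization by default), and the new_objfile event fires before the library's initializers run. The names is_target and on_new_objfile are made up for this example.

```python
import os

try:
    import gdb  # only importable inside a GDB process
except ImportError:
    gdb = None  # lets the helper below be exercised outside GDB

# Values taken from the question; adjust for your own session.
TARGET_LIBRARY = "libinkscape_base.so"
INIT_ADDRESS = 0x7ffff6ebd9d0


def is_target(objfile_name, wanted=TARGET_LIBRARY):
    """Return True if the objfile path names the library we are waiting for."""
    return bool(objfile_name) and os.path.basename(objfile_name) == wanted


def on_new_objfile(event):
    """Arm the raw-address breakpoint once the library has been mapped."""
    if is_target(event.new_objfile.filename):
        gdb.Breakpoint("*0x{:x}".format(INIT_ADDRESS))


if gdb is not None:
    gdb.events.new_objfile.connect(on_new_objfile)
```

Load it with the source command before starting the program. If the handler fires more than once for the same library you may end up with duplicate breakpoints, which are harmless but noisy.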

Where is the Initial Breakpoint located?

I am writing a simple debugger for learning purposes. I need to know where the Initial Breakpoint set by Windows is located to handle it properly. I read somewhere that it should be at the function DbgBreakPoint() in ntdll.dll; however, that function resolves to address 0x77ab0a60, and in my tests the Initial Breakpoint is always raised at address 0x77aedbcf. Is this a function or just some random address with an INT 3 instruction? If I am not mistaken, ntdll.dll is always loaded at the same address. If so, do programs always break at this exact address, or is there variation?
A user-mode process begins execution in LdrInitializeThunk, which calls LdrpInitializeProcess. After loading all static dependencies, but before calling their initialization routines, this routine checks whether a debugger is present (the BeingDebugged member of the PEB) and, if so, calls LdrpDoDebuggerBreak, which contains the int 3 instruction. In a WOW64 process, LdrpDoDebuggerBreak is called twice: once from the 64-bit DLL and once from the 32-bit DLL. As a result, a 64-bit debugger receives two breakpoints: STATUS_BREAKPOINT and STATUS_WX86_BREAKPOINT.
How to handle this is up to the debugger itself. An interactive debugger simply stops here. Other debugging tools usually just skip (handle) the first STATUS_BREAKPOINT (and STATUS_WX86_BREAKPOINT) by returning DBG_CONTINUE.
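To make the "skip the first one" policy concrete, here is a minimal sketch of the decision a non-interactive debugger might make for each exception event. The numeric values are the ones from the Windows SDK headers; the class InitialBreakpointFilter and its method name are hypothetical, and a real debugger would dispatch its own breakpoints before falling through to the "not handled" case.

```python
# Exception codes and continue statuses as defined in the Windows SDK headers.
STATUS_BREAKPOINT = 0x80000003
STATUS_WX86_BREAKPOINT = 0x4000001F
DBG_CONTINUE = 0x00010002
DBG_EXCEPTION_NOT_HANDLED = 0x80010001

LOADER_BREAKPOINT_CODES = (STATUS_BREAKPOINT, STATUS_WX86_BREAKPOINT)


class InitialBreakpointFilter:
    """Swallow the first occurrence of each loader breakpoint code."""

    def __init__(self):
        self._seen = set()

    def continue_status(self, exception_code):
        """Return the status to pass to ContinueDebugEvent for this exception."""
        first_time = exception_code not in self._seen
        if exception_code in LOADER_BREAKPOINT_CODES and first_time:
            self._seen.add(exception_code)
            return DBG_CONTINUE  # initial breakpoint: resume silently
        # Everything else is left to the debuggee's own handlers in this sketch.
        return DBG_EXCEPTION_NOT_HANDLED
```

Keeping one "seen" flag per code rather than a single boolean covers the WOW64 case, where both codes arrive once each.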

WinDbg shows some variables but not others, shows some variables in same location

I'm trying to use WinDbg 6.2.9200.16384 x64 over a serial cable to debug a driver I'm writing. WinDbg connects to the target machine (Windows 8) just fine, and I see all the dbgprints as the system boots and loads everything. I can load the symbols for my driver just fine and can set breakpoints, and when my driver hits those breakpoints, the system halts as expected. This is where things get weird: when I hit a breakpoint, I can only see some of the local variables in my function in both the locals window and when using the 'dv' command. I created a variable to test with:
int myInt = 8;
When I use a dbgprint to show the value of myInt, it works fine and I see it as 8. However, the variable doesn't even appear at all in the locals window or with the 'dv' command. Other variables do, such as
ULONG rcb = 0;
and I can see its value just fine in the locals window. These variables are literally declared one after the other.
Another symptom of this strange problem is this. I have a function
ULONG someFunction(UINT16 offset) {
ULONG rcb, tempAddr, temp, temp1;
ULONG writeAddr, readAddr;
UINT16 dev;
dev = 15;
...
}
I call this function like so:
someFunction(0x777);
When I set a breakpoint in this function and inspect the variable values with WinDbg, nothing makes any sense. First, it only sees 4 of my 8 variables, just offset, rcb, writeAddr, and readAddr. It tells me the value of offset is not 0x777 as I would expect, but 0xE061 (this changes each time I run the code). When I look closer at the locals window (same information is shown via 'dv' and '? varname' commands) I notice that the location of offset and the location of rcb are the exact same address. Likewise, writeAddr and readAddr are stored at the same address as well. None of the other variables are detected by the debugger.
I am convinced that I've loaded the symbols properly, the source and symbols paths are set correctly, I've run '.reload /f' a million times with no errors loading my driver's symbols. I'm still able to break and step through other lines of code, but the locals just don't make any sense. When I dbgprint, the correct values are shown so it seems like this is a problem with the debugger itself, not with my driver. Any ideas?
Modern compilers optimize aggressively. As a result, the compiler stores some variables on the stack as locals (visible via the 'dv /v' command) and keeps others in registers. That is why int myInt does not show up in the dv output. You can find out which registers are used for which variables by disassembling the function with 'uf binary!functionname' or by viewing the disassembly in WinDbg under View -> Disassembly.
Note that the driver may behave a little differently with and without compiler optimization in terms of performance, memory usage, and so on. It is therefore recommended to debug the build produced with the default optimizations, since that is the one used in real-world scenarios.
I fixed the problem. To anyone else who runs into this same thing: I was working with a free build of the driver, so the compiler had optimized out a lot of my variables. To fix it, either compile a checked version of the driver, or add the line
MSC_OPTIMIZATION=/Od /Oi
to your sources file to disable optimizations for the free build. Hope this helps anyone with the same problem.

Is there a cap on the number of modules WinDbg can see?

Does anyone know if there is a cap on the number of DLLs WinDbg can see ? I believe Visual Studio was once capped at 500 but I can't find a source for this claim outside of some second hand accounts at work.
I'm trying to debug a hairy scenario and WinDbg's stack trace is incomplete. According to Process Explorer, the module I'm interested in is loaded but it doesn't show up in the output of 'lm' in WinDbg.
Suspiciously, said output is exactly 500 modules long, even though I know there are many more than that loaded, leading me to believe WinDbg isn't seeing DLLs beyond the first 500. Can anyone confirm? Or suggest some other reason why a loaded module might not show up in 'lm'?
Edit: upon further investigation, I was able to get WinDbg to see the module I needed by attaching the debugger earlier, before that module was loaded.
It seems to me that, upon attaching to a process, the debugger engine will only see the first 500 DLLs but will process subsequent loads correctly. I would still love confirmation from a WinDbg expert though, or better yet, a way to process more than 500 modules when attaching!
There is a registry key controlling the number of debugger messages a debugger can see.
When you increase the value to e.g. 2048 you can see all loaded dlls.
Here is the relevant key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager
DWORD DebuggerMaxModuleMsgs = e.g. 2048
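If you want to script this rather than click through regedit, one option is to generate a .reg file and import it. The following is a minimal sketch that only builds the file text; the value name and the example value 2048 are taken from the answer above, and the function name reg_file_text is made up for illustration. Writing the file and importing it with administrative rights is left out.

```python
# Key path as quoted in the answer above.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager"


def reg_file_text(max_module_msgs=2048):
    """Build the text of a .reg file that sets DebuggerMaxModuleMsgs."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[{}]".format(KEY),
        # .reg files encode DWORD values as eight hexadecimal digits.
        '"DebuggerMaxModuleMsgs"=dword:{:08x}'.format(max_module_msgs),
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(reg_file_text())
```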
I have seen cases where, due to corruption in the module list, WinDbg did not display all modules.
Here is a script (found in the WinDbg help file) which I have used on 32-bit XP user dumps when hunting for modules not found in the lm output.
You can also try the !dlls command in WinDbg.
$$ run with: $$>< C:\DbgScripts\walkLdr.txt
$$
$$ Get module list LIST_ENTRY in $t0.
r? $t0 = &@$peb->Ldr->InLoadOrderModuleList
$$ Iterate over all modules in list.
.for (r? $t1 = *(ntdll!_LDR_DATA_TABLE_ENTRY**)@$t0;
      (@$t1 != 0) & (@$t1 != @$t0);
      r? $t1 = (ntdll!_LDR_DATA_TABLE_ENTRY*)@$t1->InLoadOrderLinks.Flink)
{
    $$ Get base address in $Base.
    as /x ${/v:$Base} @@c++(@$t1->DllBase)
    $$ Get full name into $Mod.
    as /msu ${/v:$Mod} @@c++(&@$t1->FullDllName)
    .block
    {
        .echo ${$Mod} at ${$Base}
    }
    ad ${/v:$Base}
    ad ${/v:$Mod}
}
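For anyone puzzled by the loop condition in the script, the loader's module list is a circular list whose head lives in the PEB loader data, so the walk ends when the forward link comes back to the head (or hits a null pointer in a corrupted list). The following is only a toy model of that traversal in Python; the Entry class, the module names, and the base addresses are invented for illustration and do not read any real process memory.

```python
class Entry:
    """Stand-in for a list node; only the forward link is modelled."""

    def __init__(self, name=None, base=None):
        self.name = name
        self.base = base
        self.flink = None


def build_list(modules):
    """Build a circular list with a sentinel head, like InLoadOrderModuleList."""
    head = Entry()
    prev = head
    for name, base in modules:
        node = Entry(name, base)
        prev.flink = node
        prev = node
    prev.flink = head  # last entry points back at the head
    return head


def walk(head):
    """Mirror the script's loop: stop on a null link or on reaching the head."""
    found = []
    node = head.flink
    while node is not None and node is not head:
        found.append((node.name, node.base))
        node = node.flink
    return found


if __name__ == "__main__":
    demo = build_list([("app.exe", 0x00400000), ("ntdll.dll", 0x7C900000)])
    for name, base in walk(demo):
        print("{} at {:#x}".format(name, base))
```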

What does the "Unloaded" prefix mean in a Windows stack trace?

I've the hellish problem of a third party DLL appearing to cause a recursive stack overflow crash when it gets unloaded. I wind up with this pattern on the stack (using windbg):
<Unloaded_ThirdParty.dll>+0xdd01
ntdll!ExecuteHandler2+0x26
ntdll!ExecuteHandler+0x24
ntdll!KiUserExceptionDispatcher+0xf
<Unloaded_ThirdParty.dll>+0xdd01
ntdll!ExecuteHandler2+0x26
ntdll!ExecuteHandler+0x24
ntdll!KiUserExceptionDispatcher+0xf
...
As you would guess, I don't have the source code to ThirdParty.dll.
Q: What does the prefix "Unloaded_" mean in the stack dump? I haven't run across this before.
This means that ThirdParty.dll was no longer being referenced and has already been removed from memory at the time that the crash occurs. To find out the actual stack trace, you need to reload the .dll at its original place in memory with the following command:
.reload /f ThirdParty.dll=0xaaaaaaaa
Of course you need to replace 0xaaaaaaaa with the original base address of the module. This may be somewhat hard to figure out if the module has already been unloaded, but if you have an HMODULE lying around that refers to the dll, the value of that HMODULE is the base address. Worst case, you can add a debugger trace statement to your code that logs the HMODULE of the dll just before you unload it.
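As a small illustration of how the pieces fit together, here is a sketch that pulls the module name and offset out of a frame like the ones in the question and builds the corresponding .reload command. The helper names are hypothetical, and the base address 0x10000000 used below is a placeholder, not a value from the question; you still have to obtain the real base (for example from a logged HMODULE).

```python
import re

# Matches frames such as "<Unloaded_ThirdParty.dll>+0xdd01".
_UNLOADED_FRAME = re.compile(
    r"<Unloaded_(?P<module>[^>]+)>\+0x(?P<offset>[0-9a-fA-F]+)"
)


def parse_unloaded_frame(frame):
    """Return (module, offset) for an unloaded-module frame, else None."""
    match = _UNLOADED_FRAME.search(frame)
    if match is None:
        return None
    return match.group("module"), int(match.group("offset"), 16)


def reload_command(module, base):
    """Build the .reload command that maps the module back at its old base."""
    return ".reload /f {}=0x{:x}".format(module, base)


def absolute_address(base, offset):
    """Address inside the module that the frame refers to."""
    return base + offset


if __name__ == "__main__":
    module, offset = parse_unloaded_frame("<Unloaded_ThirdParty.dll>+0xdd01")
    base = 0x10000000  # placeholder: substitute the logged HMODULE value
    print(reload_command(module, base))
    print(hex(absolute_address(base, offset)))
```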
I've had a crash just like this before, and as JS points out it means that the dll has been unloaded prior to the crash. However, having the stack trace into that dll may not necessarily give you the information you need to diagnose the problem.
Something in your code is unloading the library because it thinks it's finished with it, but you still have a pointer to it (or to a function inside it) somewhere. My guess would be a callback, perhaps from another thread. I'd suggest searching through your source for any calls to FreeLibrary() and also putting a breakpoint on the FreeLibrary symbol. Find out where the library is being unloaded, and then at that point, ensure that all data that references the dll has been reset. Use a mutex if you have multiple threads.
A tool that may be very useful for this is the excellent Process Monitor which I think shows you dll load and unload events, and will give you a stack trace for each one.
