How do I debug a process that starts at boot time? - windows

I am trying to set a breakpoint into a Windows service that starts at boot time. Because of an unfortunate mistake on my end, the service forces the machine into a reboot loop: this means that I can't get to a stable state from which I could deploy a fix, and obviously I can't try to debug the service at a more convenient time.
I can use windbg in kernel mode. I'd very much like to break when the service hits the wmain function, but I'm having issues with that.
Up to now, I found that I can stop when the image is loaded by using the following commands:
!gflag +ksl
sxe ld MyServiceExecutable.exe
The problem is that once it breaks, I find myself in an empty process, in which I am apparently unable to set breakpoints. bm MyServiceExecutable!wmain says that it can't find the symbol and that the breakpoint will be "deferred", but it is effectively never set or reached. Setting a breakpoint on KERNEL32!BaseThreadInitThunk seems to work more or less at random across all the processes running and I didn't have a lot of luck with it to stop in my service so far.

Alright, this might not be the best way to do it, but it worked. MSFTs, please correct me if I'm doing something dumb!
The first part was good:
kd> !gflag +ksl
New NtGlobalFlag contents: 0x00440000
ksl - Enable loading of kernel debugger symbols
ece - Enable close exception
kd> sxe ld MyServiceExecutable.exe
kd> g
In kernel mode, sxe ld will stop the first time the executable is loaded only.
When the debugger stops again, we're inside the freshly created process. We don't need the gflag anymore:
kd> !gflag -ksl
New NtGlobalFlag contents: 0x00400000
ece - Enable close exception
Though we're going to need the EPROCESS pointer. We can get it with .process or !process -1 0, but it is already in the $proc pseudo-register:
kd> r $proc
$proc=0011223344556677
kd> .process
Implicit process is now 00112233`44556677
From this point it's possible to set breakpoints on nt symbols, so let's use NtMapViewOfSection as it's called for each dll loaded.
kd> bp /p @$proc nt!NtMapViewOfSection
kd> g
On the next stop, ntdll should be loaded (check with kn whether it's on the stack, and run .reload /user if necessary), so you can set a breakpoint on RtlUserThreadStart. We also overwrite breakpoint 0, since we no longer need to break on NtMapViewOfSection (it would just be a nuisance).
kd> bp0 /p @$proc ntdll!RtlUserThreadStart
kd> g
All symbols should have been loaded by the time the first user thread starts, so you're free to set your breakpoint wherever you want.
kd> .reload /user
kd> bp /p @$proc MyServiceExecutable!wmain
kd> g
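For reference, here is the whole sequence above collected in one place. This is only a sketch: the helper name boot_break_commands and the stop labels are made up for illustration, and the commands themselves are the ones from this answer. It assumes the module name is the image name without its extension.

```python
def boot_break_commands(image_name, entry_symbol="wmain"):
    """Return (stop, command) pairs for the kernel-debugger session above.

    Hypothetical helper: it only builds strings, it does not talk to a debugger.
    """
    # Assumption: the debugger's module name is the file name minus ".exe".
    module = image_name.rsplit(".", 1)[0]
    return [
        ("initial break", "!gflag +ksl"),
        ("initial break", f"sxe ld {image_name}"),
        ("initial break", "g"),
        ("image load break", "!gflag -ksl"),
        ("image load break", "bp /p @$proc nt!NtMapViewOfSection"),
        ("image load break", "g"),
        # Breakpoint 0 is reused because the first one is no longer needed.
        ("NtMapViewOfSection break", "bp0 /p @$proc ntdll!RtlUserThreadStart"),
        ("NtMapViewOfSection break", "g"),
        ("RtlUserThreadStart break", ".reload /user"),
        ("RtlUserThreadStart break", f"bp /p @$proc {module}!{entry_symbol}"),
        ("RtlUserThreadStart break", "g"),
    ]


for stop, command in boot_break_commands("MyServiceExecutable.exe"):
    print(f"{stop:26} kd> {command}")
```

Each group of commands is typed at a different stop, so this is a checklist rather than a script you can paste in one go.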

Use the technique that MS describes for debugging winlogon which involves using the kernel mode and user mode debuggers in tandem. See "Debugging WinLogon" in the debugger.chm file that comes with the "Debugging Tools for Windows" download.

Related

Trying to Analyze Dump File with WinDbg: PEB is Paged Out, Won't load symbols

Hi I'm trying to use WinDbg to look at a memory.dmp kernel dump file with the aim of diagnosing a crash. When I open the crash file and get the symbols I get the message
BugCheck A, {2, ff, 4e, fffff801a42ebff2}
CompressedPageDataReader warning: failed to get _SM_PAGE_KEY symbol.
CompressedPageDataReader warning: failed to get _SM_PAGE_KEY symbol.
Probably caused by : ntkrnlmp.exe ( nt!KxWaitForLockOwnerShipWithIrql+12 )
Followup: MachineOwner
---------
0: kd> .reload
Loading Kernel Symbols
..................................CompressedPageDataReader warning: failed to get _SM_PAGE_KEY symbol.
Loading User Symbols
PEB is paged out (Peb.Ldr = 000000e1`114f4018). Type ".hh dbgerr001" for details
Which I assume means it can't load some of the symbols. When I try the !vad process to fix the PEB page error I get
0: kd> !vad 000000e1114f4018 1
VAD # ffffca0f084164e0
Start VPN e111400 End VPN e1115ff Control Area 0000000000000000
FirstProtoPte 0000000000000000 LastPte f943916c00000002 Commit Charge 21 (0n33)
Secured.Flink 0 Blink 0 Banked/Extend 0
File Offset 50005
ViewUnmap NoChange PrivateMemory READWRITE
which doesn't correspond to what the internet tells me the result should be.
When I try the !process method I get
0: kd> !process 000000e1114f4018 1
Searching for Process with Cid == e1114f4018
Invalid Handle: 0x114f4018
***Could not retrieve process handle from the Cid table. Searching...
which is also an error and doesn't load the symbols either. What is wrong, either with the symbol loading or with the crash itself, if there is enough info to tell?
NOTE: I've tried the solutions from the MSDN page and they don't work as noted. Part of the problem is I don't know if I'm using the 000000e1`114f4018 address I'm given in the PEB paged out error message correctly in the command.
NOTE 2: Here is a link to the crash report from WinDBG. If someone can figure out the cause and explain how they figured it out that would be dandy.
https://www.scribd.com/document/326672131/Crash-Archive
The PEB being paged out is normal. In order for the PEB to be present, the dump must be a full memory dump and the corresponding pages must be resident at the time of the crash.
This mostly doesn't matter because the PEB contains user mode state (user loaded modules, command line, environment variables, etc.) which generally isn't interesting for a kernel mode crash.
What IS interesting is the !analyze -v output, including the kernel mode stack of the faulting thread. Based on what you have provided, we can at least see the crash code:
BugCheck A, {2, ff, 4e, fffff801a42ebff2}
Bugcheck A is an IRQL_NOT_LESS_OR_EQUAL, which means you have an invalid pointer dereference at an elevated IRQL (>= DISPATCH_LEVEL). The first argument is the bad address ("2") and the second argument is the IRQL ("0xFF" - this is WinDbg speak for "interrupts disabled on the processor").
In summary, this means that someone has dereferenced address "2", which clearly isn't a good thing. It happened while interrupts were disabled on the processor, so you get an IRQL_NOT_LESS_OR_EQUAL. The trick then is to look at the call stack and faulting instruction and figure out where the "2" came from.
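To make the argument reading above concrete, here is a small sketch that splits the BugCheck line into its code and four arguments. The function name parse_bugcheck is made up for illustration. The labels for the third and fourth arguments are not from this answer; they are my reading of the documented parameters for bug check 0xA, so treat them as an assumption and confirm with !analyze -v.

```python
import re


def parse_bugcheck(line):
    """Split a 'BugCheck CODE, {a, b, c, d}' line into (code, [args]) integers."""
    match = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line.strip())
    if match is None:
        raise ValueError(f"not a BugCheck line: {line!r}")
    code = int(match.group(1), 16)
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, args


# Labels for bug check 0xA. The first two are discussed above; the last two
# are assumptions taken from the bug check reference, not from this thread.
LABELS_0A = [
    "memory referenced",
    "IRQL at time of fault",
    "access type bitfield",
    "address of the instruction that referenced memory",
]

code, args = parse_bugcheck("BugCheck A, {2, ff, 4e, fffff801a42ebff2}")
print(f"bug check 0x{code:X}")
for label, value in zip(LABELS_0A, args):
    print(f"  {label}: 0x{value:x}")
```

The fourth value is the address to feed to ln or u in the debugger when looking for where the "2" came from.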

!clrstack never reports anything

I know I am dealing with a managed thread but I have never managed to get !clrstack to work. I always get:
0:000> !clrstack
OS Thread Id: 0xaabb (0)
Child SP IP Call Site
GetFrameContext failed: 1
00000000 00000000
Admittedly I could use !dumpstack but I can't figure out how to make it show the arguments. It only shows ChildEBP, Return Address and the function name. Besides it mixes managed and unmanaged calls and I'd like to focus only on the managed portions.
UPDATE
As requested by Thomas, !clrstack -i returns:
0:000> !clrstack -i
Loaded c:\cache\mscordbi.dll\53489464110000\mscordbi.dll
Loaded c:\cache\mscordacwks_x86_x86_4.0.30319.34209.dll\5348961E69d000\mscordacwks_x86_x86_4.0.30319.34209.dll
Dumping managed stack and managed variables using ICorDebug.
=================================================================
Child SP IP Call Site
003ad0bc 77d1f8e1 [NativeStackFrame]
Stack walk complete.
It's progress :-)
Please post the output from !dumpstack or k so we can double-check the call stack. !clrstack only displays the managed call stack. Sometimes, once a managed thread has finished its work (for example a thread pool thread), it waits inside CLR code (on a semaphore), and the remaining call stack is entirely unmanaged, so !clrstack displays nothing for it.

Why I see ntdll disassembly, not my assembly code in WinDbg?

I want to analyze my first assembler program. Look at registers, execute step by step etc. It's for learning purposes.
I have a problem: the disassembly is strange and I can't find my code. I step through some ntdll functions. There is no sign of my MessageBoxA, ExitProcess, etc.
I have used OllyDbg (32-bit) before and OllyDbg starts executing from normal entry of the program. Disassembly in OllyDbg was very similar to my MASM code.
What am I doing wrong? Why is the disassembly different? How do I step through my code, not ntdll?
OllyDbg reads the Address of Entry Point from the PE header, sets a temporary breakpoint there, and continues execution until that address. That is normally main for an assembler program, or WinMainCRTStartup for a compiler-generated binary.
WinDbg does not do that. It stops at the system breakpoint (the default behavior of the debug APIs).
You have two options.
If it is your own assembler program and you have symbols for it, do:
.reload /f
bp [your exe]![Your EntryPoint]
bl to confirm that it is right
g to continue execution and break in your code
If it is a third-party program for which you do not have source:
lmm [name of third party binary]
!dh [start address of the third party binary] (see the lm results for the address)
look for Address of Entry Point
bp [start address of the third party binary] + [Address of entrypoint]
g
WinDbg will stop in the user code.
A sample of method 2 on calc.exe under Win7 SP1:
0:000> lm m calc
start end module name
00210000 002d0000 calc (deferred)
0:000> !dh 210000
---------------------
12D6C address of entry point
0:000> bp 210000+12d6c
0:000> bl
0 e 00222d6c 0001 (0001) 0:**** calc!WinMainCRTStartup
0:000> g
calc!WinMainCRTStartup:
00222d6c e84bfdffff call calc!__security_init_cookie (00222abc)
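The "look for Address of Entry Point" step is just a fixed-offset read from the PE headers, which the sketch below reproduces outside the debugger. The function names and the synthetic header are made up for illustration; the offsets (e_lfanew at 0x3C, entry point 16 bytes into the optional header) follow the PE format as I understand it. The final addition uses the calc numbers from the sample above.

```python
import struct


def entry_point_rva(image):
    """Return AddressOfEntryPoint (an RVA) from the raw bytes of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Skip the 4-byte signature and the 20-byte file header; the entry point
    # field sits 16 bytes into the optional header for both PE32 and PE32+.
    (rva,) = struct.unpack_from("<I", image, e_lfanew + 4 + 20 + 16)
    return rva


def entry_breakpoint_address(module_base, rva):
    """Address to hand to bp: module base plus the entry point RVA."""
    return module_base + rva


# Synthetic header for demonstration only (not a loadable executable).
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\0\0"
struct.pack_into("<I", fake, 0x80 + 4 + 20 + 16, 0x12D6C)

rva = entry_point_rva(bytes(fake))
print(hex(entry_breakpoint_address(0x00210000, rva)))  # 0x222d6c
```

On a real file you would pass open(path, "rb").read() instead of the synthetic buffer, and take the base from lm since the module may be relocated.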
Short answer: bu $exentry; g
Long answer:
A process starts its user mode execution in OS code found in ntdll which does all sorts of initialization and finally calls the entry point stored in the PE header.
WinDbg breaks in what's called the initial breakpoint (or ibp in WinDbg parlance). OllyDbg continues running after this breakpoint and sets a breakpoint on the PE entry point. That's nice, except when it screws you. For example, a well-known anti-debugging trick against olly was creating a binary with custom TLS callbacks, a rather esoteric feature. These callbacks are called before the PE entry point and thus go undetected by olly.
What you want to do is indeed set a breakpoint on the PE entry point.
The hard way is doing it like blabb did - using !dh to dump the module header and setting a breakpoint on the address noted there. Even the hard way can be a little easier, as there's no need to lm before running !dh. In WinDbg, module names are interpreted as their base addresses. For example:
0:000> !dh -f notepad
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
14C machine (i386)
5 number of sections
55BEBE90 time date stamp Sun Aug 02 18:06:24 2015
...
OPTIONAL HEADER VALUES
10B magic #
12.10 linker version
15400 size of code
1F000 size of initialized data
0 size of uninitialized data
159F0 address of entry point
1000 base of code
And then it's simply a matter of:
0:000> bu notepad+159F0
0:000> bl
0 e 002259f0 0001 (0001) 0:**** notepad!WinMainCRTStartup
But the easy way is to use one of WinDbg's builtin pseudo-registers ("automatic pseudo-registers" is their term, in contrast to user-defined pseudo-registers), namely the aforementioned $exentry.
0:000> ? notepad + 159F0 == $exentry
Evaluate expression: 1 = 00000001
0:000> ? $exentry
Evaluate expression: 2251248 = 002259f0
You can see that 002259f0 is the same address as the one we calculated by hand.
For completeness I note that there's the $iment(address) operator that gives the entry point of any module, not just the main module (the EXE):
0:000> ? $iment(winspool)
Evaluate expression: 1889096272 = 70995250
0:000> ln $iment(winspool)
Browse module
Set bu breakpoint
(70995250) WINSPOOL!_DllMainCRTStartup | (7099526b) WINSPOOL!__CppXcptFilter
Exact matches:
WINSPOOL!_DllMainCRTStartup (<no parameter info>)

Pinning a DLL in memory (increase reference count)

I am trying to run an application, but the application exits due to an access violation. Running the application in the debugger I can see that this is caused by an unloaded library. I can not wait for the next release of the application, so I'm trying to workaround the problem.
I wonder whether WinDbg provides a way of increasing the reference count of a loaded module, similar to the C++ LoadLibrary() call. I could then break on module loads and increase the reference count on the affected DLL to see if I can use the application then.
I have already looked for commands starting with .load, !load, .lock, !lock, .mod and !mod in WinDbg help. .load will load the DLL as an extension into the debugger process, not into the target process.
Update
Forgot to mention that I have no source code, so I can't simply implement a LoadLibrary() call as a workaround and recompile.
The comment by Hans Passant leads me to .call and I tried to use it like
.call /v kernel32!LoadLibraryA("....dll")
but it gives the error message
Symbol not a function in '.call /v kernel32!LoadLibraryA("....dll")'
Update 2
Probably the string for the file name in .call should be a pointer to some memory in the target process instead of a string which resides in WinDbg.exe where I type the command. That again means I would probably need to allocate some memory to store the string inside, so this might become more complex.
Using .call in windbg has always been finicky for me. I believe you are having trouble with it because kernel32 only has public symbols, so the debugger doesn't know what its arguments look like.
So let's look at some alternatives...
The easy way
You can go grab a tool like Process Hacker, which I think is a wonderful addition to any debugger's tool chest. It has an option to inject a DLL into a process.
Behind the scenes, it calls CreateRemoteThread to spawn a thread in the target process which calls LoadLibrary on the chosen DLL. With any luck, this will increase the module reference count. You can verify that the LoadCount has been increased in windbg by running the !dlls command before and after the dll injection.
The hard way
You can also dig into the internal data structures Windows uses to keep track of a process's loaded modules and play with the LoadCount. This changes between versions of Windows and is a serious no-no. But, we're debugging, so, what the hell? Let's do this.
Start by getting a list of loaded modules with !dlls. Suppose we care about your.dll; we might see something like:
0x002772a8: C:\path\to\your.dll
Base 0x06b80000 EntryPoint 0x06b81000 Size 0x000cb000 DdagNode 0x002b3a10
Flags 0x800822cc TlsIndex 0x00000000 LoadCount 0x00000001 NodeRefCount 0x00000001
We can see that the load count is currently 1. To modify it, we could use the address printed before the module path. It is the address of the ntdll!_LDR_DATA_TABLE_ENTRY the process holds for that module.
r? @$t0 = (ntdll!_LDR_DATA_TABLE_ENTRY*) 0x002772a8
And now you can change the LoadCount member to something larger, like so:
?? @$t0->LoadCount = 2
But, as I said, this stuff changes with new versions of Windows. On Windows 8, the LoadCount member was moved out of _LDR_DATA_TABLE_ENTRY and into a new ntdll!_LDR_DDAG_NODE structure. In place of it, there is now an ObsoleteNodeCount which is not what we want.
On Windows 8, we would run the following command instead:
?? @$t0->DdagNode->LoadCount = 2
And, time to check our work...
0x002772a8: C:\path\to\your.dll
Base 0x06b80000 EntryPoint 0x06b81000 Size 0x000cb000 DdagNode 0x002b3a10
Flags 0x800822cc TlsIndex 0x00000000 LoadCount 0x00000002 NodeRefCount 0x00000001
Awesome. It's 2 now. That'll teach FreeLibrary a lesson about unloading our DLLs before we say it can.
The takeaway
Try the easy way first. If that doesn't work, you can start looking at the internal data structures Windows uses to keep track of this kind of stuff. I don't provide the hard way hoping you'll actually try it, but that it might make you more comfortable around the !dlls command and those data structures in the future.
Still, all modifying the LoadCount will afford you is confirmation that you are seeing a DLL get unloaded before it should have. If the problem goes away after artificially increasing the LoadCount, meaning that you've confirmed your theory, you'll have to take a different approach to debugging it -- figuring out when and why it got unloaded.
A DLL that is linked at compile time will normally have a LoadCount of -1 (0xffff) and cannot be unloaded via FreeLibrary, so you can use the load-module event to break on a dynamically loaded module and increase its LoadCount during the event.
Blink of InLoadOrderModuleList (the last DLL loaded in the process) at the initial break (ntdll!DbgBreakPoint) on XP SP3, for an arbitrary console app that uses a DLL:
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName LoadCount @@((( @$peb)->Ldr)->InLoadOrderModuleList.Blink)
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\system32\GDI32.dll"
+0x038 LoadCount : 0xffff <----------- not unloadable via FreeLibrary
Setting up a break on a specific module load:
0:000> sxe ld skeleton
0:000> g
ModLoad: 10000000 10005000 C:\skeleton.dll
ntdll!KiFastSystemCallRet:
7c90e514 c3 ret
The load-module event breaks on MapSection, so Ldr isn't updated yet:
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName LoadCount @@((( @$peb)->Ldr)->InLoadOrderModuleList.Blink)
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\system32\GDI32.dll"
+0x038 LoadCount : 0xffff
Go up until the Ldr is updated:
0:000> gu;gu;gu
ntdll!LdrpLoadDll+0x1e9:
7c91626a 8985c4fdffff mov dword ptr [ebp-23Ch],eax ss:0023:0013fa3c=00000000
Blink now shows the last loaded module; notice that LoadCount is 0, not updated yet:
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName LoadCount @@((( @$peb)->Ldr)->InLoadOrderModuleList.Blink)
+0x024 FullDllName : _UNICODE_STRING "C:\skeleton.dll"
+0x038 LoadCount : 0
Dump the loader entry of the module:
0:000> !dlls -c skeleton
Dump dll containing 0x10000000:
0x00252840: C:\skeleton.dll
Base 0x10000000 EntryPoint 0x10001000 Size 0x00005000
Flags 0x00000004 LoadCount 0x00000000 TlsIndex 0x00000000
LDRP_IMAGE_DLL
Increase the load count arbitrarily and redump (process attach hasn't been called yet):
0:000> ed 0x252840+0x38 4
0:000> !dlls -c skeleton
Dump dll containing 0x10000000:
0x00252840: C:\skeleton.dll
Base 0x10000000 EntryPoint 0x10001000 Size 0x00005000
Flags 0x00000004 LoadCount 0x00000004 TlsIndex 0x00000000
LDRP_IMAGE_DLL
Run the binary:
0:000> g
The DLL is loaded into the process; break in with Ctrl+Break:
Break-in sent, waiting 30 seconds...
(aa0.77c): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
7c90120e cc int 3
Dump again and see that the system has updated the load count to our count + 1; process attach has also been called:
0:001> !dlls -c skeleton
Dump dll containing 0x10000000:
0x00252840: C:\skeleton.dll
Base 0x10000000 EntryPoint 0x10001000 Size 0x00005000
Flags 0x80084004 LoadCount 0x00000005 TlsIndex 0x00000000
LDRP_IMAGE_DLL
LDRP_ENTRY_PROCESSED
LDRP_PROCESS_ATTACH_CALLED
By the way, use Ken Johnson's (Skywing) sdbgext !remotecall instead of .call.
It doesn't require private symbols.
.load sdbgext
!remotecall kernel32!LoadLibraryA 0 "c:\skeleton.dll" ; g
should load the DLL into the process.
Or use
!loaddll "c:\\skeleton.dll" from the same extension.
kernel32!LoadLibraryA() will be run when execution is resumed:
0:002> g
kernel32!LoadLibraryA() [conv=0 argc=4 argv=00AC0488]
kernel32!LoadLibraryA() returned 10000000
Simplest way - get .dll path and LoadLibrary it.
It will increase .dll reference count and .dll will not be released.
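If the reference counting idea behind all of these answers is unclear, the toy model below may help. It is not the Windows loader: the class ToyLoader and its methods are invented for illustration, and the only things taken from this thread are that LoadLibrary raises the count, FreeLibrary lowers it, and a count of 0xffff marks a DLL that FreeLibrary will not unload.

```python
PINNED = 0xFFFF  # load count shown above for DLLs linked at compile time


class ToyLoader:
    """Toy model of per-DLL load counts (illustration only, not ntdll)."""

    def __init__(self):
        self.load_count = {}

    def load_library(self, name):
        count = self.load_count.get(name, 0)
        if count != PINNED:
            self.load_count[name] = count + 1

    def free_library(self, name):
        count = self.load_count.get(name, 0)
        if count == 0 or count == PINNED:
            return  # not loaded, or pinned: nothing changes
        if count == 1:
            del self.load_count[name]  # last reference gone: unload
        else:
            self.load_count[name] = count - 1

    def is_loaded(self, name):
        return name in self.load_count


loader = ToyLoader()
loader.load_library("your.dll")   # the application's own load
loader.load_library("your.dll")   # the extra reference we add as a workaround
loader.free_library("your.dll")   # the application's premature free
print(loader.is_loaded("your.dll"))  # True
```

The extra reference is what the injected LoadLibrary call or the manual LoadCount edit gives you; without the second load_library call the same free_library would unload the DLL.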

Can I set a breakpoint in ntdll.dll!_LdrpInitializeProcess?

When debugging a Windows process, it would sometimes be convenient to break as early as possible.
The initial call stack looks like this (you get it, for example, when you set a breakpoint in a DllMain function on DLL_PROCESS_ATTACH):
...
ntdll.dll!_LdrpCallInitRoutine@16() + 0x14 bytes
ntdll.dll!_LdrpRunInitializeRoutines@4() + 0x205 bytes
> ntdll.dll!_LdrpInitializeProcess@20() - 0x96d bytes
ntdll.dll!__LdrpInitialize@12() + 0x6269 bytes
ntdll.dll!_KiUserApcDispatcher@20() + 0x7 bytes
so setting a breakpoint in one of these ntdll routines should really break the process very early.
However, I can't figure out how to set a breakpoint there prior to starting the process in the debugger. Is it possible in Visual Studio (2005)? How? Can it be done in WinDbg?
I would use something like GFlags to launch the debugger when the process starts.
Here is a sample gflags settings for test.exe
And here is the debugger output. Notice the call-stack with ntdll!LdrpInitializeProcess
CommandLine: "C:\temp\test.exe"
Symbol search path is: srv*;srv*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00000000`00d20000 00000000`00d28000   image00000000`00d20000
(1b40.464): Break instruction exception - code 80000003 (first chance)
ntdll!LdrpDoDebuggerBreak+0x30:
00000000`77c7cb60 cc              int     3
0:000> k
Child-SP          RetAddr           Call Site
00000000`0012ed70 00000000`77c32ef5 ntdll!LdrpDoDebuggerBreak+0x30
00000000`0012edb0 00000000`77c11a17 ntdll!LdrpInitializeProcess+0x1b4f
00000000`0012f2a0 00000000`77bfc32e ntdll! ?? ::FNODOBFM::`string'+0x29220
00000000`0012f310 00000000`00000000 ntdll!LdrInitializeThunk+0xe
Or you could launch the process from within a debugger such as WinDbg, which breaks inside ntdll!LdrpInitializeProcess by default.
HTH
I have found out how to do it in Visual Studio.
The problem here is that a breakpoint set in any assembly function is remembered as a "Data Breakpoint". These breakpoints are disabled as soon as the process stops, so even if I set one in this function (I can do this because I have the function on the stack if I set a breakpoint in any DllMain function), it will be disabled for a new process run.
However, for ntdll.dll (and kernel32.dll) the load addresses are pretty much fixed and won't change (at least not until reboot).
So, before starting the process, I just have to re-enable the Data Breakpoint for the address that corresponds to this NtDll function and the debugger will then stop there.
