I have the following scenario:
I am attached with a kernel debugger (WinDbg) to a Hyper-V virtual machine running 64-bit Windows 10.
The process I want to debug is a 32-bit user-mode process that sometimes hangs the machine (it communicates with a minifilter), so I cannot use a user-mode or remote debugger.
I have a symbol server, and I know the process and the thread I want to investigate. How do I:
View the callstack for this thread only
Load symbols for my modules
Bonus question: For some reason, I have many instances of my program. Apart from the "active" one, the rest are not visible in Process Explorer, have no threads, and have a handle count of 0. What could cause this?
Things I tried:
!process ffffe08620a30800 7
(view all process threads)
...
THREAD **ffffe0862212f800** Cid 08a0.1cfc Teb: 0000000000d6e000 Win32Thread: 0000000000000000 RUNNING on processor 0
Not impersonating
DeviceMap ffffcb0a55817c30
Owning Process ffffe08620a30800 Image: avguard.exe
Attached Process N/A Image: N/A
Wait Start TickCount 27338460 Ticks: 1 (0:00:00:00.015)
Context Switch Count 2999214 IdealProcessor: 0
UserTime **00:17:51.125**
KernelTime 00:06:14.671
Win32 Start Address 0x00000000741bbfb4
Stack Init ffffb481db842dd0 Current ffffb481db842a10
Base ffffb481db843000 Limit ffffb481db83d000 Call 0
Priority 8 BasePriority 8 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
ffffb481`db842c40 00000000`77b1222c : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x2f (TrapFrame # ffffb481`db842c40)
00000000`0333ed18 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : wow64cpu!CpupSyscallStub+0xc
...
This thread is the one I want to investigate. It has a high processor usage, as visible in bold: User Time: 17 mins. However the stack is not helpful.
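When a process has many threads, picking the busy one out of the `!process <addr> 7` listing by eye gets tedious. The following is a minimal sketch, assuming the listing has been saved as text; the second thread in the sample is made up for illustration, and the regular expressions only cover the `THREAD`, `UserTime` and `KernelTime` lines in the shape shown above.

```python
import re

# Sample shaped like the `!process <addr> 7` output above (trimmed).
# The second THREAD entry is hypothetical, added only for comparison.
SAMPLE = """\
THREAD ffffe0862212f800 Cid 08a0.1cfc Teb: 0000000000d6e000 Win32Thread: 0000000000000000 RUNNING on processor 0
UserTime 00:17:51.125
KernelTime 00:06:14.671
THREAD ffffe08622130080 Cid 08a0.1d00 Teb: 0000000000d70000 Win32Thread: 0000000000000000 WAIT
UserTime 00:00:00.015
KernelTime 00:00:00.000
"""

THREAD_RE = re.compile(r"^\s*THREAD\s+([0-9a-f]+)", re.IGNORECASE)
TIME_RE = re.compile(r"^\s*(UserTime|KernelTime)\s+(\d+):(\d+):(\d+(?:\.\d+)?)")


def to_seconds(hours, minutes, seconds):
    """Convert the hh:mm:ss.mmm fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def cpu_times(text):
    """Return {thread_address: {"UserTime": secs, "KernelTime": secs}}."""
    result = {}
    current = None
    for line in text.splitlines():
        match = THREAD_RE.match(line)
        if match:
            current = match.group(1)
            result[current] = {}
            continue
        match = TIME_RE.match(line)
        if match and current is not None:
            result[current][match.group(1)] = to_seconds(*match.group(2, 3, 4))
    return result


def busiest_thread(text):
    """Return the address of the thread with the largest user time."""
    times = cpu_times(text)
    return max(times, key=lambda addr: times[addr].get("UserTime", 0.0))


print(busiest_thread(SAMPLE))
```

The returned address is the value to pass to `.thread`.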
Then I did:
.thread /p /r /w ffffe0862212f800
Implicit thread is now ffffe086`2212f800
Implicit process is now ffffe086`20a30800
.cache forcedecodeuser done
Loading User Symbols
.ModLoad: 00000000`009b0000 00000000`00a25000 C:\Program Files (x86)\my process.exe
.ModLoad: 00007ffb`67e20000 00007ffb`67ff1000 C:\WINDOWS\SYSTEM32\ntdll.dll
.ModLoad: 00000000`778d0000 00000000`77922000 C:\WINDOWS\System32\wow64.dll
.ModLoad: 00000000`77850000 00000000`778c7000 C:\WINDOWS\System32\wow64win.dll
.ModLoad: 00000000`77b10000 00000000`77b1a000 C:\WINDOWS\System32\wow64cpu.dll
The context is partially valid. Only x86 user-mode context is available.
x86 context set
1: kd:x86> kb
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
db842cc0 00000000 00000000 00000000 00000000 **0x819613ca**
What is this address, 0x819613ca?
How can I find the module it belongs to, or extract a meaningful call stack?
How can I proceed with my investigation?
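One way to check whether an address belongs to any loaded module is to compare it against the ranges printed in the `.ModLoad` lines. This is only a sketch that parses the lines pasted above; it does not explain where 0x819613ca comes from, it just confirms the debugger's warning that no listed module contains it, while the return address 0x77b1222c from the 64-bit stack does fall inside wow64cpu.dll.

```python
import re

# The .ModLoad lines exactly as shown in the debugger output above.
MODLOAD = r"""
.ModLoad: 00000000`009b0000 00000000`00a25000 C:\Program Files (x86)\my process.exe
.ModLoad: 00007ffb`67e20000 00007ffb`67ff1000 C:\WINDOWS\SYSTEM32\ntdll.dll
.ModLoad: 00000000`778d0000 00000000`77922000 C:\WINDOWS\System32\wow64.dll
.ModLoad: 00000000`77850000 00000000`778c7000 C:\WINDOWS\System32\wow64win.dll
.ModLoad: 00000000`77b10000 00000000`77b1a000 C:\WINDOWS\System32\wow64cpu.dll
"""

MODLOAD_RE = re.compile(r"\.ModLoad:\s+([0-9a-f`]+)\s+([0-9a-f`]+)\s+(.+)")


def parse_address(text):
    """Parse a WinDbg-style address such as 00007ffb`67e20000."""
    return int(text.replace("`", ""), 16)


def parse_modules(text):
    """Return a list of (start, end, path) tuples."""
    modules = []
    for line in text.splitlines():
        match = MODLOAD_RE.match(line.strip())
        if match:
            start, end, path = match.groups()
            modules.append((parse_address(start), parse_address(end), path))
    return modules


def find_module(address, modules):
    """Return the path of the module containing address, or None."""
    for start, end, path in modules:
        if start <= address < end:
            return path
    return None


modules = parse_modules(MODLOAD)
print(find_module(0x77B1222C, modules))
print(find_module(0x819613CA, modules))
```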
Related
I have a memory crash dump, and I can list processes with !process 0 0
What I want to do is find the Image Base Address of calc.exe and get its contents from the memory. Potentially saving it into a file.
What do I need to do to achieve that?
Edit: the dump I have is an "automatic dump", but I would also like to know the technique for other types, such as a full core dump.
A dump can be of several types.
Which type is this one: user mode or kernel mode?
Is it a minidump or a full dump?
In many cases the pages may not be present, either because they were paged out or because they were intentionally discarded (for example, the INIT sections of modules).
Anyway:
If it is a user-mode dump, try !vadump or !address to locate the module of interest, find its start and end addresses, and dump it in page-sized increments (0x1000 bytes) using .writemem.
In kernel mode, use !vad.
Follow either command with lm or !dh to get the module information, in both user mode and kernel mode.
Here is the !address output from a user-mode dump:
F:\caldump>cdb -c "!address calculator;q" -z calc.dmp | awk "/Reading/,/quit/"
0:023> cdb: Reading initial command '!address calculator;q'
Usage: Image
Base Address: 00007ff7`04a30000
End Address: 00007ff7`04a31000
Region Size: 00000000`00001000 ( 4.000 kB)
State: 00001000 MEM_COMMIT
Protect: 00000002 PAGE_READONLY
Type: 01000000 MEM_IMAGE
Allocation Base: 00007ff7`04a30000
Allocation Protect: 00000080 PAGE_EXECUTE_WRITECOPY
Image Path: C:\Program Files\WindowsApps\Microsoft.WindowsCalculator_10.1906.55.0_x64__8wekyb3d8bbwe\Calculator.exe
Module Name: Calculator
Loaded Image Name:
Mapped Image Name:
More info: lmv m Calculator
More info: !lmi Calculator
More info: ln 0x7ff704a30000
More info: !dh 0x7ff704a30000
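The page-by-page dumping suggested above can be scripted by generating one `.writemem` command per page, so that a page missing from the dump only costs that one file rather than the whole write. The sketch below only builds the command text; the output file prefix is a made-up name, and the end address of the whole image would have to come from `lm` or `!dh` (the region printed above covers just the first page).

```python
PAGE_SIZE = 0x1000


def fmt(address):
    """Format an address the way WinDbg prints 64-bit values."""
    return f"{address >> 32:08x}`{address & 0xFFFFFFFF:08x}"


def writemem_commands(base, end, out_prefix):
    """Yield one .writemem command per page in the range [base, end)."""
    for index, address in enumerate(range(base, end, PAGE_SIZE)):
        size = min(PAGE_SIZE, end - address)
        yield f".writemem {out_prefix}_{index:04d}.bin {fmt(address)} L0x{size:x}"


# Region taken from the !address output above; "calc" is a hypothetical prefix.
for command in writemem_commands(0x7FF704A30000, 0x7FF704A31000, "calc"):
    print(command)
```

The per-page files can be concatenated afterwards, padding the pages that failed to write with zeros to keep the offsets aligned.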
Since !process 0 0 works, it is a kernel dump. Try inspecting the PEB:
https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/-peb
I am familiar with using WinDbg or IDA for remote kernel debugging. Right now I have extracted a kernel driver from an executable, done static analysis on its IDB, and renamed a lot of variables. What is the easiest way to use my IDB file to debug the driver on the remote debuggee when it gets loaded by the executable?
I know how to attach to a remote kernel using IDA, but how can I use my current IDB file and put breakpoints on some of its functions so that they get hit when the driver is loaded? (I don't have the corresponding PDB file for the driver, so I can't use symbols for breakpoints.)
This is a vanilla WinDbg answer that breaks on any driver's DriverInit.
Once you have broken on DriverInit, you can look up the MajorFunction entries and set breakpoints on all of them.
Assuming you have a regular kd connection,
use sxe ibp;.reboot to reboot the target.
On reconnection the target will break very early, as shown below:
kd> sxe ibp;.reboot
Shutdown occurred at (Sun Oct 18 02:58:09.077 2020 )...unloading all symbol tables.
Waiting to reconnect...
Connected to Windows 7 7601 x86 compatible target at (xxx), ptr64 FALSE
Kernel Debugger connection established. (Initial Breakpoint requested)
Once broken, set a breakpoint on nt!IopLoadDriver.
Inside this function, search for the indirect call
that calls _DRIVER_OBJECT->DriverInit:
kd> ?? #FIELD_OFFSET(nt!_DRIVER_OBJECT , DriverInit)
long 0x2c
It looks like this:
nt!IopLoadDriver+0x7ea:
829d5355 ff562c call dword ptr[esi+2Ch] ds:84f2928c={cdrom!FxDriverEntry (87eb53cf)}
Set a breakpoint here too.
You are now set to enter almost every driver that is loaded.
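The instruction bytes `ff 56 2c` shown above can be decoded by hand to confirm that the displacement really is the DriverInit offset (0x2c). Here is a small sketch that handles only this one encoding (opcode FF with /2 in the ModRM reg field, an 8-bit displacement, and no SIB byte); it is not a general x86 decoder.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_indirect_call(code):
    """Decode the simple `call dword ptr [reg+disp8]` form (FF /2, mod=01).

    Returns (register, displacement), or None for any other encoding.
    """
    if len(code) < 3 or code[0] != 0xFF:
        return None
    modrm = code[1]
    mod = modrm >> 6          # addressing mode: 01 means [reg + disp8]
    reg = (modrm >> 3) & 7    # opcode extension: 2 means near indirect call
    rm = modrm & 7            # base register: 4 would require a SIB byte
    if mod != 1 or reg != 2 or rm == 4:
        return None
    disp = code[2] - 0x100 if code[2] >= 0x80 else code[2]  # sign-extend
    return REG32[rm], disp


print(decode_indirect_call(bytes.fromhex("ff562c")))
```

Searching the disassembly of nt!IopLoadDriver for an FF /2 call whose displacement equals the DriverInit offset is one way to locate the call site on a build where the +0x7ea offset differs.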
Once you are at the entry point of any driver,
use the DriverObject (an argument that DriverEntry takes)
and set breakpoints on each major function.
kd> bp . "du poi(@esi+1c+4);gc"
kd> bl
0 e Disable Clear 829d4b6a 0001 (0001) nt!IopLoadDriver
1 e Disable Clear 829d5355 0001 (0001) nt!IopLoadDriver+0x7ea "du poi(@esi+1c+4);gc"
kd> bd 0
kd> bl
0 d Enable Clear 829d4b6a 0001 (0001) nt!IopLoadDriver
1 e Disable Clear 829d5355 0001 (0001) nt!IopLoadDriver+0x7ea "du poi(@esi+1c+4);gc"
kd> g
841bd1d0 "\Driver\Null.Ѕ捁印䍁䥐停偎〰〰"
84f18718 "\Driver\Beep.Б浍摌䂈蓶䈸蓶...."
84eef210 "\Driver\VgaSave"
84eb2860 "\Driver\RDPCDDᛛ..В浍慃憠褎.蓫菌蓲"
84e903c0 "\Driver\RDPENCDD..浍摌읨蓤潤獷獜獹整.尲牤癩牥"
84e90400 "屳摲数据摤献獹"
84ef15c0 "\Driver\RDPREFMP..牉..蓧"
84ef4a78 "\FileSystem\Msfs.В浍慃冀褘蝴蓳荤蓶"
84f191f0 "\FileSystem\Npfs.З獍䑆.°"
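The stray characters after each driver name in the output above are most likely there because du prints until it finds a terminating NUL, whereas a UNICODE_STRING buffer is not guaranteed to be NUL-terminated; the valid length is stored in the Length field. The sketch below shows the idea for the 32-bit layout (USHORT Length, USHORT MaximumLength, 32-bit Buffer pointer), using a fake memory reader with made-up contents in place of a real target.

```python
import struct


def read_unicode_string(header, read_memory):
    """Decode an x86 UNICODE_STRING header and return exactly Length bytes of text.

    header      -- the 8 raw bytes of the structure
    read_memory -- callable (address, size) -> bytes, supplied by the caller
    """
    length, _maximum_length, buffer = struct.unpack("<HHI", header)
    return read_memory(buffer, length).decode("utf-16-le")


# Hypothetical memory contents: the name followed by unrelated bytes.
FAKE_BASE = 0x841BD1D0
FAKE_MEMORY = "\\Driver\\Null".encode("utf-16-le") + b"\x2e\x00\x05\x04"


def read_fake(address, size):
    """Stand-in for a debugger memory read."""
    offset = address - FAKE_BASE
    return FAKE_MEMORY[offset:offset + size]


header = struct.pack("<HHI", 24, 26, FAKE_BASE)  # Length is in bytes, not characters
print(read_unicode_string(header, read_fake))
```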
I have a full dump of a VM with Windows 10 installed. The dump was taken from a hard-hung system: frozen mouse and keyboard, totally unresponsive.
While analyzing it I found that there are no threads in the running or ready state, and no deadlocks. The only suspicious things are that many threads are waiting for a reply from ALPC, and that a lot of threads show a page fault pattern that looks like this:
ffffbc0f`151380f0 fffff805`1e4e081c Ntfs!NtfsNonCachedIo+0x4ea
ffffbc0f`151383b0 fffff805`1e4df8bc Ntfs!NtfsCommonRead+0xd2c
ffffbc0f`151385b0 fffff805`19687d3a Ntfs!NtfsFsdRead+0x1fc
ffffbc0f`15138680 fffff805`19687ce7 nt!IopfCallDriver+0x46
ffffbc0f`151386c0 fffff805`1d926ccf nt!IofCallDriver+0x17
ffffbc0f`151386f0 fffff805`1d9248d3 FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x28f
ffffbc0f`15138760 fffff805`19687d3a FLTMGR!FltpDispatch+0xa3
ffffbc0f`151387c0 fffff805`19687ce7 nt!IopfCallDriver+0x46
ffffbc0f`15138800 fffff805`196215b2 nt!IofCallDriver+0x17
ffffbc0f`15138830 fffff805`196221e2 nt!IoPageReadEx+0x1e6
ffffbc0f`151388a0 fffff805`19622eee nt!MiIssueHardFaultIo+0xb6
ffffbc0f`151388f0 fffff805`19666566 nt!MiIssueHardFault+0x48e
ffffbc0f`151389f0 fffff805`197aba1e nt!MmAccessFault+0x276
ffffbc0f`15138b00 00007ffd`2e42ec10 nt!KiPageFault+0x35e (TrapFrame # ffffbc0f`15138b00)
Also, almost every thread (I have seen maybe 1 or 2 that don't) in every process ends with this:
ffffbc0f`15d08df0 fffff805`1966aad4 nt!KiSwapContext+0x76
ffffbc0f`15d08f30 fffff805`196657ca nt!KiSwapThread+0x190
ffffbc0f`15d08fa0 fffff805`19666fb0 nt!KiCommitThreadWait+0x13a
ffffbc0f`15d09050 fffff805`1e4e261a nt!KeWaitForSingleObject+0x140
I have one particular example of a thread with a page fault, belonging to prl_tools_service.exe (a Parallels VM related service), that has the same pattern. Looking at the trap frame at the moment of KiPageFault, there was an attempt to read a value from the address in eax, and the trap frame has rax=0000000000000001, which cannot be a valid address, so I cannot see how this page fault could be resolved.
The IRQLs of both processors are LOW_LEVEL.
The question, basically, is: where should I look for faults, since there must be a kernel problem (hence the mouse and keyboard freeze), and how do I find out whether this page fault pattern could stall the kernel?
Since the question is pretty vague, any kind of response (a direction to look in, a hint) will be much appreciated.
UPD: as requested by Lieven Keersmaekers and blabb, here is the !analyze -hang output:
0: kd> !analyze -hang
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
Unknown bugcheck code (0)
Unknown bugcheck description
Arguments:
Arg1: 0000000000000000
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000
Debugging Details:
------------------
Scanning for threads blocked on locks ...
Cannot get _ERESOURCE type
BUGCHECK_CODE: 0
BUGCHECK_P1: 0
BUGCHECK_P2: 0
BUGCHECK_P3: 0
BUGCHECK_P4: 0
PROCESS_NAME: System
ERROR_CODE: (NTSTATUS) 0x45474150 - <Unable to get error code text>
SYMBOL_NAME: nt!PpmIdleGuestExecute+1d
MODULE_NAME: nt
IMAGE_NAME: ntkrnlmp.exe
FAILURE_BUCKET_ID: 0x0_STACKPTR_ERROR_nt!PpmIdleGuestExecute
FAILURE_ID_HASH: {94784d45-ed21-c95f-fc42-87fec626bbee}
Followup: MachineOwner
---------
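One observation on the output above, offered tentatively: the ERROR_CODE value 0x45474150 consists entirely of printable ASCII bytes, which suggests the analyzer picked up a piece of text rather than a real NTSTATUS. This is an interpretation of the number, not documented !analyze behaviour. A quick check:

```python
def as_ascii_tag(value):
    """Return the 4 little-endian bytes of value as text if all are printable ASCII."""
    raw = value.to_bytes(4, "little")
    if all(0x20 <= byte < 0x7F for byte in raw):
        return raw.decode("ascii")
    return None


print(as_ascii_tag(0x45474150))   # the ERROR_CODE from the output above
print(as_ascii_tag(0xC0000005))   # a genuine NTSTATUS for comparison
```

For the value above the bytes spell "PAGE", while a genuine status such as 0xC0000005 yields None.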
One of the users of my command line application has reported what appears to be an infinite loop. They helpfully took a dump of the process (via Task Manager) while it was in this state and sent it to me.
I'm not sure how to get useful information out of this dump. My normal technique of windbg -z the-dump-file.dmp -y releases\v5.0.0 -i releases\v5.0.0 doesn't give me much information that I know how to interpret. Are there ghc-specific tools I can use instead?
Moving forward, are there build options I should add, or other things I should do in my release process, to make this kind of post-mortem debugging more fruitful?
Here's an example of the stacks that I'm seeing. Not much useful info, especially for someone used to debugging C/C++ code in WinDbg. :-)
0 Id: 112dc.cc18 Suspend: 1 Teb: 00000000`00341000 Unfrozen
*** ERROR: Module load completed but symbols could not be loaded for gbc.exe
# Child-SP RetAddr Call Site
00 00000000`01b7d8d0 00000000`01049f71 gbc+0xc5676e
01 00000000`01b7d930 00000000`0104b5b4 gbc+0xc49f71
02 00000000`01b7d9a0 00000000`0104c644 gbc+0xc4b5b4
03 00000000`01b7da60 00000000`0104c1fa gbc+0xc4c644
04 00000000`01b7dab0 00000000`0042545b gbc+0xc4c1fa
05 00000000`01b7db30 00000000`011c40a0 gbc+0x2545b
06 00000000`01b7db38 00000000`0535bee1 gbc+0xdc40a0
07 00000000`01b7db40 00000000`010ffd80 0x535bee1
08 00000000`01b7db48 00000000`0535bee1 gbc+0xcffd80
09 00000000`01b7db50 00007ffb`3581fb01 0x535bee1
0a 00000000`01b7db58 00007ffb`3581b850 imm32!?MSCTF_NULL_THUNK_DATA_DLB+0x2e9
0b 00000000`01b7db60 00000000`00000010 imm32!CtfImmGetCompatibleKeyboardLayout
0c 00000000`01b7db68 00000000`00000000 0x10
1 Id: 112dc.d324 Suspend: 1 Teb: 00000000`00349000 Unfrozen
# Child-SP RetAddr Call Site
00 00000000`05c2fc48 00007ffb`36441563 ntdll!ZwWaitForWorkViaWorkerFactory+0x14
01 00000000`05c2fc50 00007ffb`34172774 ntdll!TppWorkerThread+0x293
02 00000000`05c2ff60 00007ffb`36470d61 kernel32!BaseThreadInitThunk+0x14
03 00000000`05c2ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x21
2 Id: 112dc.11b48 Suspend: 1 Teb: 00000000`0034b000 Unfrozen
# Child-SP RetAddr Call Site
00 00000000`0642dd38 00007ffb`32f2988f ntdll!ZwWaitForSingleObject+0x14
01 00000000`0642dd40 00000000`00ffca15 KERNELBASE!WaitForSingleObjectEx+0x9f
02 00000000`0642dde0 00000000`00000000 gbc+0xbfca15
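Even without symbols, the stack above gives the load address of gbc.exe: the return address of frame N is the call site of frame N+1, so subtracting the `gbc+0x...` offset of the next frame from the current frame's RetAddr yields the image base. The sketch below does this for the first three frames of thread 0. With the base known, the offsets can be compared against a symbol listing of exactly the same binary, assuming such a listing is available.

```python
# First three frames of thread 0, copied from the stack above.
FRAMES = """\
00 00000000`01b7d8d0 00000000`01049f71 gbc+0xc5676e
01 00000000`01b7d930 00000000`0104b5b4 gbc+0xc49f71
02 00000000`01b7d9a0 00000000`0104c644 gbc+0xc4b5b4
"""


def parse_frames(text):
    """Return a list of (return_address, module, offset) tuples."""
    frames = []
    for line in text.splitlines():
        _number, _child_sp, ret_addr, call_site = line.split()
        module, _, offset = call_site.partition("+0x")
        frames.append((int(ret_addr.replace("`", ""), 16), module, int(offset, 16)))
    return frames


def image_bases(frames):
    """Frame N's return address minus frame N+1's module offset is the image base."""
    return {
        ret_addr - next_offset
        for (ret_addr, _, _), (_, _, next_offset) in zip(frames, frames[1:])
    }


for base in image_bases(parse_frames(FRAMES)):
    print(hex(base))
```

All frame pairs agreeing on a single base is also a cheap sanity check that this part of the unwind is plausible.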
Some resources that might be useful. (If there are more up-to-date ones, I would like to see them myself.)
https://ghc.haskell.org/trac/ghc/wiki/Debugging/CompiledCode
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/debug-info.html
https://wiki.haskell.org/Debugging
A few important nuggets:
The runtime flag +RTS -? will tell you which runtime flags add debugging information. These start with +RTS -D. For example, +RTS -DS turns on a number of runtime assertions and sanity checks.
The strange names you see are encoded in something called Z-encoding. This is defined at https://ghc.haskell.org/trac/ghc/browser/ghc/compiler/cmm/CLabel.hs.
If you can recompile the code with debugging symbols on and threading off, and still reproduce the bug, you can set breakpoints (or hit control-C) inside the debugger and backtrace from there. You can examine memory with a command like print/a 0x006eb0c0 (although you seem to be using 64-bit pointers). You can see the assembly-language instruction that crashed with disassemble.
You need to use the -ddump-stg compile flag to see what the variable names mean, because that is the last phase of the transformation before the program is assembled, and the variable names you see in the debugger correspond to the ones here.
You can instrument the code with Debug.Trace.
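To make the Z-encoding point above concrete, here is a minimal decoder sketch. It covers only the two-character escapes; the numeric forms (tuple names and hexadecimal character escapes) are deliberately left out and pass through unchanged, and the symbol name in the example is only illustrative.

```python
# Two-character Z-encoding escapes. Numeric escapes are not handled here.
Z_TABLE = {
    "zi": ".", "zm": "-", "zu": "_", "zd": "$", "zh": "#", "zq": "'",
    "ze": "=", "zg": ">", "zl": "<", "za": "&", "zb": "|", "zc": "^",
    "zn": "!", "zp": "+", "zr": "\\", "zs": "/", "zt": "*", "zv": "%",
    "zz": "z", "ZZ": "Z", "ZC": ":", "ZL": "(", "ZR": ")", "ZM": "[", "ZN": "]",
}


def z_decode(name):
    """Replace known two-character escapes; copy everything else as-is."""
    out = []
    i = 0
    while i < len(name):
        pair = name[i:i + 2]
        if pair in Z_TABLE:
            out.append(Z_TABLE[pair])
            i += 2
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


# Illustrative symbol name; real names in a binary will differ.
print(z_decode("base_GHCziBase_zdfMonadIO_closure"))
```

A literal underscore is itself encoded as zu, so the unencoded underscores that remain after decoding are the separators between package, module, and entity name.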
The question may be a bit awkward, but here's my detailed problem:
Currently I'm looking into setting up SysInternals' procdump.exe to monitor an application of ours that exhibits spurious disappearances -- that is, the user reports that the application is simply "gone" without any trace after a short visible hang of the application's window.
My first idea was to run procdump -e -x . MyApp.exe which would record a crash dump when the application encounters an unhandled exception, but then I saw that there is also a -t switch, that --
-t - Write a dump when the process terminates.
automatically generates a dump when the process terminates.
Now the problem
I have tested the -t switch with our app by inserting an ExitProcess or TerminateProcess call at a defined location where I can trigger it.
While the app behaves as expected, i.e. TerminateProcess immediately "kills" the running app and ExitProcess takes a while because global cleanup is run, the dump generated this way is useless in both cases.
The dumps I get for -t always contain only a single thread (whereas the app was running over 20 threads at termination time), and the call stack isn't even at a useful location. (It just seems to be one random thread from the terminated app.)
Am I doing something wrong? Can I usefully use procdump -t to track down unexpected calls of process exit functions at all?
Can I usefully use procdump -t to track down unexpected calls of
process exit functions at all?
I think not and here's why:
test process calc.exe
CommandLine: "C:\Program Files\Sysinternals\procdump.exe" -t calc.exe
I would cautiously suggest that procdump is simply waiting on the calc.exe process handle.
0:000> kb
ChildEBP RetAddr Args to Child
0017f2e0 77135e6c 75336872 00000002 0017f334 ntdll!KiFastSystemCallRet
0017f2e4 75336872 00000002 0017f334 00000001 ntdll!NtWaitForMultipleObjects+0xc
0017f380 76cbf14a 0017f334 0017f3a8 00000000 KERNELBASE!WaitForMultipleObjectsEx+0x100
0017f3c8 76cbf2c2 00000002 7ffdb000 00000000 kernel32!WaitForMultipleObjectsExImplementation+0xe0
0017f3e4 011c6135 00000002 0017f46c 00000000 kernel32!WaitForMultipleObjects+0x18
WARNING: Stack unwind information not available. Following frames may be wrong.
0017fc30 011c999e 00000003 013d1de0 013d1e78 procdump+0x6135
0017fc78 76cc1194 7ffdb000 0017fcc4 7714b495 procdump+0x999e
0017fc84 7714b495 7ffdb000 77ad79b5 00000000 kernel32!BaseThreadInitThunk+0xe
0017fcc4 7714b468 011c99f5 7ffdb000 00000000 ntdll!__RtlUserThreadStart+0x70
0017fcdc 00000000 011c99f5 7ffdb000 00000000 ntdll!_RtlUserThreadStart+0x1b
0:000> dd 17f46c
0017f46c 00000238 00000268
0:000> !handle 238 f
Handle 238
Type Process
Attributes 0
GrantedAccess 0x1fffff:
Delete,ReadControl,WriteDac,WriteOwner,Synch
Terminate,CreateThread,,VMOp,VMRead,VMWrite,DupHandle,CreateProcess,SetQuota,SetInfo,QueryInfo,SetPort
HandleCount 5
PointerCount 52
Name <none>
Object Specific Information
Process Id 1580
Parent Process 2476
Base Priority 8
The crash dump file contains the stack of the last thread of the process to finish (TID 3136), captured just before the end of the process.
0:000> ~
. 0 Id: dc8.c40 Suspend: -1 Teb: 7ffdd000 Unfrozen
0:000> .formats c40
Evaluate expression:
Hex: 00000c40
Decimal: 3136
The crash dump file is created after the last thread has completed and before the process ends.
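The `Id: dc8.c40` field in the `~` output is the process ID and thread ID in hexadecimal, which is why `.formats c40` was needed to get the decimal TID. A tiny sketch of the same conversion:

```python
def parse_cid(cid):
    """Split a WinDbg 'pid.tid' pair (both hexadecimal) into decimal integers."""
    pid_hex, tid_hex = cid.split(".")
    return int(pid_hex, 16), int(tid_hex, 16)


pid, tid = parse_cid("dc8.c40")
print(pid, tid)
```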