what is nt!PsActiveProcessHead? - windows

Background:
When using Volatility, PsActiveProcessHead can be located by a KDBG scan of a memory image (a dead system), or read from the _DMP_HEADER of a Windows crash dump (again, a dead system).
On a live system, the address of this symbol can be found with:
lkd> x nt!PsActiveProcessHead
Question:
Which Windows kernel object or structure does the nt!PsActiveProcessHead variable belong to or refer to? (In other words, which object or structure does this symbol point to?)
For example, ActiveProcessLinks, which is also a _LIST_ENTRY structure (the same as PsActiveProcessHead), belongs to the _EPROCESS object. Is there such an object for PsActiveProcessHead as well?

Yes, it is the head of a doubly linked list (a _LIST_ENTRY), and its links point, more precisely, to _EPROCESS.ActiveProcessLinks in each process.
Checking the doubly linked list pointed to by nt!PsActiveProcessHead:
0: kd> dt nt!_list_entry poi(nt!PsActiveProcessHead)
[ 0xffffc582`ca5c3328 - 0xfffff804`40c10680 ]
+0x000 Flink : 0xffffc582`ca5c3328 _LIST_ENTRY [ 0xffffc582`d11d1328 - 0xffffc582`ca4b15e8 ]
+0x008 Blink : 0xfffff804`40c10680 _LIST_ENTRY [ 0xffffc582`ca4b15e8 - 0xffffc582`edada368 ]
Next entry:
0: kd> dt nt!_list_entry poi(0xffffc582`ca5c3328)
[ 0xffffc582`d0023428 - 0xffffc582`ca5c3328 ]
+0x000 Flink : 0xffffc582`d0023428 _LIST_ENTRY [ 0xffffc582`d54243a8 - 0xffffc582`d11d1328 ]
+0x008 Blink : 0xffffc582`ca5c3328 _LIST_ENTRY [ 0xffffc582`d11d1328 - 0xffffc582`ca4b15e8 ]
Getting the offset of ActiveProcessLinks in the _EPROCESS structure:
0: kd> ? ##c++(#FIELD_OFFSET(nt!_eprocess, ActiveProcessLinks))
Evaluate expression: 744 = 00000000`000002e8
Confirming with the first two Flinks from the outputs above (note: we subtract the offset of ActiveProcessLinks from each address, then dump ImageFileName from the resulting _EPROCESS structure). This shows that the links really do point to ActiveProcessLinks in _EPROCESS:
0: kd> dt nt!_eprocess 0xffffc582`ca5c3328-##c++(#FIELD_OFFSET(nt!_eprocess , ActiveProcessLinks)) ImageFileName
+0x450 ImageFileName : [15] "Registry"
0: kd> dt nt!_eprocess 0xffffc582`d0023428-##c++(#FIELD_OFFSET(nt!_eprocess , ActiveProcessLinks)) ImageFileName
+0x450 ImageFileName : [15] "csrss.exe"
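The subtraction in the two commands above is the same arithmetic that the kernel's CONTAINING_RECORD macro performs. As a minimal sketch in Python (plain integer arithmetic, no debugger involved; the function name is hypothetical), using the 0x2e8 offset and the two Flink values from this session:

```python
# Offset of ActiveProcessLinks inside _EPROCESS, as reported by the
# debugger for this particular build (it differs between Windows versions).
ACTIVE_PROCESS_LINKS_OFFSET = 0x2E8


def containing_eprocess(list_entry_addr: int,
                        offset: int = ACTIVE_PROCESS_LINKS_OFFSET) -> int:
    """Return the _EPROCESS base for a pointer to its embedded LIST_ENTRY."""
    return list_entry_addr - offset


# The two Flink values followed in the session above.
for flink in (0xFFFFC582CA5C3328, 0xFFFFC582D0023428):
    print(f"{flink:#018x} -> _EPROCESS at {containing_eprocess(flink):#018x}")
```

The two results are the same _EPROCESS addresses that !list prints for Registry and csrss.exe when the whole list is dumped.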
Dumping the whole list:
0: kd> !list "-t nt!_eprocess.ActiveProcessLinks.Flink -e -x \"dt nt!_eprocess ImageFileName\"(poi(nt!PsActiveProcessHead) - ##c++(#FIELD_OFFSET(nt!_eprocess, ActiveProcessLinks)))"
dt nt!_EPROCESS ImageFileName 0xffffc582ca4b1300
+0x450 ImageFileName : [15] "System"
dt nt!_EPROCESS ImageFileName 0xffffc582ca5c3040
+0x450 ImageFileName : [15] "Registry"
dt nt!_EPROCESS ImageFileName 0xffffc582d11d1040
+0x450 ImageFileName : [15] "smss.exe"
dt nt!_EPROCESS ImageFileName 0xffffc582d0023140
+0x450 ImageFileName : [15] "csrss.exe"
[...snip....]
So basically it is meant to be a list of the currently active processes. It points to the doubly linked list in _EPROCESS.ActiveProcessLinks.
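To make the traversal concrete, here is a small simulation of what !list does. It assumes the list head lives at 0xfffff80440c10680 (the Blink of the first entry in the dt output) and uses the Flink values visible in this session. The list is artificially closed after csrss.exe because the real output is snipped there; that closing link is an assumption made only so the sketch terminates.

```python
OFFSET = 0x2E8                      # ActiveProcessLinks offset in _EPROCESS
HEAD = 0xFFFFF80440C10680           # assumed address of nt!PsActiveProcessHead

# Flink of each LIST_ENTRY, keyed by the address of that LIST_ENTRY.
FLINK = {
    HEAD: 0xFFFFC582CA4B15E8,                # head     -> System
    0xFFFFC582CA4B15E8: 0xFFFFC582CA5C3328,  # System   -> Registry
    0xFFFFC582CA5C3328: 0xFFFFC582D11D1328,  # Registry -> smss.exe
    0xFFFFC582D11D1328: 0xFFFFC582D0023428,  # smss.exe -> csrss.exe
    0xFFFFC582D0023428: HEAD,                # ASSUMPTION: list cut short here
}


def walk_processes(head, flink, offset):
    """Yield the _EPROCESS base of every entry until the walk returns to head."""
    entry = flink[head]
    while entry != head:            # the head itself is not inside an _EPROCESS
        yield entry - offset
        entry = flink[entry]


bases = list(walk_processes(HEAD, FLINK, OFFSET))
for base in bases:
    print(f"_EPROCESS at {base:#018x}")
```

The important detail is the stop condition: the list is circular, so the walk ends when Flink leads back to the head rather than when it reaches NULL.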


DPC_WATCHDOG_VIOLATION (133/1) Potentially related to NdisFIndicateReceiveNetBufferLists?

We have an NDIS LWF driver, and on a single machine we get a DPC_WATCHDOG_VIOLATION 133/1 bugcheck when the user connects to their VPN to reach the internet. This could be related to our call to NdisFIndicateReceiveNetBufferLists, because we raise the IRQL to DISPATCH_LEVEL before calling it (and lower it back to its previous value afterwards), and that call does appear in the !dpcwatchdog output shown below. We raise the IRQL as a workaround for another bug, explained here:
IRQL_UNEXPECTED_VALUE BSOD after NdisFIndicateReceiveNetBufferLists?
Now this is the bugcheck:
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000001, The system cumulatively spent an extended period of time at
DISPATCH_LEVEL or above. The offending component can usually be
identified with a stack trace.
Arg2: 0000000000001e00, The watchdog period.
Arg3: fffff805422fb320, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains
additional information regarding the cumulative timeout
Arg4: 0000000000000000
STACK_TEXT:
nt!KeBugCheckEx
nt!KeAccumulateTicks+0x1846b2
nt!KiUpdateRunTime+0x5d
nt!KiUpdateTime+0x4a1
nt!KeClockInterruptNotify+0x2e3
nt!HalpTimerClockInterrupt+0xe2
nt!KiCallInterruptServiceRoutine+0xa5
nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
nt!KiInterruptDispatchNoLockNoEtw+0x37
nt!KxWaitForSpinLockAndAcquire+0x2c
nt!KeAcquireSpinLockAtDpcLevel+0x5c
wanarp!WanNdisReceivePackets+0x4bb
ndis!ndisMIndicateNetBufferListsToOpen+0x141
ndis!ndisMTopReceiveNetBufferLists+0x3f0e4
ndis!ndisCallReceiveHandler+0x61
ndis!ndisInvokeNextReceiveHandler+0x1df
ndis!NdisMIndicateReceiveNetBufferLists+0x104
ndiswan!IndicateRecvPacket+0x596
ndiswan!ApplyQoSAndIndicateRecvPacket+0x20b
ndiswan!ProcessPPPFrame+0x16f
ndiswan!ReceivePPP+0xb3
ndiswan!ProtoCoReceiveNetBufferListChain+0x442
ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xf6
ndis!NdisMCoIndicateReceiveNetBufferLists+0x11
raspptp!CallIndicateReceived+0x210
raspptp!CallProcessRxNBLs+0x199
ndis!ndisDispatchIoWorkItem+0x12
nt!IopProcessWorkItem+0x135
nt!ExpWorkerThread+0x105
nt!PspSystemThreadStartup+0x55
nt!KiStartSystemThread+0x28
SYMBOL_NAME: wanarp!WanNdisReceivePackets+4bb
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: wanarp
IMAGE_NAME: wanarp.sys
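For a sense of scale, Arg2 can be converted to wall-clock time. This sketch assumes the watchdog period is expressed in clock ticks, and it uses the 15.625 ms tick length that !dpcwatchdog reports for this machine. The function name is hypothetical and the unit is an assumption; the dump itself does not state it.

```python
TICK_MS = 15.625            # "1 System tick = 15.625000 milliseconds"
WATCHDOG_PERIOD = 0x1E00    # Arg2 of the bugcheck, assumed to be in ticks


def ticks_to_seconds(ticks: int, tick_ms: float = TICK_MS) -> float:
    """Convert a tick count to seconds for a given tick length."""
    return ticks * tick_ms / 1000.0


print(f"{WATCHDOG_PERIOD} ticks = {ticks_to_seconds(WATCHDOG_PERIOD)} s")
```

Under that assumption the processors spent about two minutes cumulatively at DISPATCH_LEVEL or above, which fits a processor spinning on a lock that is never released better than a merely slow routine.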
Below is the output of !dpcwatchdog. I still can't tell what is causing this bugcheck, or which function is spending too long at DISPATCH_LEVEL. I suspect it is related to some spin locking done by wanarp; could this be a bug in wanarp? Note that we don't use any spin locks in our driver, and raising the IRQL ourselves should not cause a problem, since NDIS receive indications are very commonly made at DISPATCH_LEVEL.
So how can I find the root cause of this bugcheck? There are no other third-party LWFs in the NDIS stack.
3: kd> !dpcwatchdog
All durations are in seconds (1 System tick = 15.625000 milliseconds)
Circular Kernel Context Logger history: !logdump 0x2
DPC and ISR stats: !intstats /d
--------------------------------------------------
CPU#0
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
dpcs: no pending DPCs found
--------------------------------------------------
CPU#1
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
1: Normal : 0xfffff80542220e00 0xfffff805418dbf10 nt!PpmCheckPeriodicStart
1: Normal : 0xfffff80542231d40 0xfffff8054192c730 nt!KiBalanceSetManagerDeferredRoutine
1: Normal : 0xffffbd0146590868 0xfffff80541953200 nt!KiEntropyDpcRoutine
DPC Watchdog Captures Analysis for CPU #1.
DPC Watchdog capture size: 641 stacks.
Number of unique stacks: 1.
No common functions detected!
The captured stacks seem to indicate that only a single DPC or generic function is the culprit.
Try to analyse what other processors were doing at the time of the following reference capture:
CPU #1 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec
# RetAddr Call Site
00 fffff805418d8991 nt!KiUpdateRunTime+0x5D
01 fffff805418d2803 nt!KiUpdateTime+0x4A1
02 fffff805418db1c2 nt!KeClockInterruptNotify+0x2E3
03 fffff80541808a45 nt!HalpTimerClockInterrupt+0xE2
04 fffff805419fab9a nt!KiCallInterruptServiceRoutine+0xA5
05 fffff805419fb107 nt!KiInterruptSubDispatchNoLockNoEtw+0xFA
06 fffff805418a9a9c nt!KiInterruptDispatchNoLockNoEtw+0x37
07 fffff805418da3cc nt!KxWaitForSpinLockAndAcquire+0x2C
08 fffff8054fa614cb nt!KeAcquireSpinLockAtDpcLevel+0x5C
09 fffff80546ba1eb1 wanarp!WanNdisReceivePackets+0x4BB
0a fffff80546be0b84 ndis!ndisMIndicateNetBufferListsToOpen+0x141
0b fffff80546ba7ef1 ndis!ndisMTopReceiveNetBufferLists+0x3F0E4
0c fffff80546bddfef ndis!ndisCallReceiveHandler+0x61
0d fffff80546ba4a94 ndis!ndisInvokeNextReceiveHandler+0x1DF
0e fffff8057c32d17e ndis!NdisMIndicateReceiveNetBufferLists+0x104
0f fffff8057c30d6c7 ndiswan!IndicateRecvPacket+0x596
10 fffff8057c32d56b ndiswan!ApplyQoSAndIndicateRecvPacket+0x20B
11 fffff8057c32d823 ndiswan!ProcessPPPFrame+0x16F
12 fffff8057c308e62 ndiswan!ReceivePPP+0xB3
13 fffff80546c5c006 ndiswan!ProtoCoReceiveNetBufferListChain+0x442
14 fffff80546c5c2d1 ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xF6
15 fffff8057c2b0064 ndis!NdisMCoIndicateReceiveNetBufferLists+0x11
16 fffff8057c2b06a9 raspptp!CallIndicateReceived+0x210
17 fffff80546bd9dc2 raspptp!CallProcessRxNBLs+0x199
18 fffff80541899645 ndis!ndisDispatchIoWorkItem+0x12
19 fffff80541852b65 nt!IopProcessWorkItem+0x135
1a fffff80541871d25 nt!ExpWorkerThread+0x105
1b fffff80541a00778 nt!PspSystemThreadStartup+0x55
1c ---------------- nt!KiStartSystemThread+0x28
--------------------------------------------------
CPU#2
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
2: Normal : 0xffffbd01467f0868 0xfffff80541953200 nt!KiEntropyDpcRoutine
DPC Watchdog Captures Analysis for CPU #2.
DPC Watchdog capture size: 641 stacks.
Number of unique stacks: 1.
No common functions detected!
The captured stacks seem to indicate that only a single DPC or generic function is the culprit.
Try to analyse what other processors were doing at the time of the following reference capture:
CPU #2 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec
# RetAddr Call Site
00 fffff805418d245a nt!KeClockInterruptNotify+0x453
01 fffff80541808a45 nt!HalpTimerClockIpiRoutine+0x1A
02 fffff805419fab9a nt!KiCallInterruptServiceRoutine+0xA5
03 fffff805419fb107 nt!KiInterruptSubDispatchNoLockNoEtw+0xFA
04 fffff805418a9a9c nt!KiInterruptDispatchNoLockNoEtw+0x37
05 fffff805418a9a68 nt!KxWaitForSpinLockAndAcquire+0x2C
06 fffff8054fa611cb nt!KeAcquireSpinLockRaiseToDpc+0x88
07 fffff80546ba1eb1 wanarp!WanNdisReceivePackets+0x1BB
08 fffff80546be0b84 ndis!ndisMIndicateNetBufferListsToOpen+0x141
09 fffff80546ba7ef1 ndis!ndisMTopReceiveNetBufferLists+0x3F0E4
0a fffff80546bddfef ndis!ndisCallReceiveHandler+0x61
0b fffff80546be3a81 ndis!ndisInvokeNextReceiveHandler+0x1DF
0c fffff80546ba804e ndis!ndisFilterIndicateReceiveNetBufferLists+0x3C611
0d fffff8054e384d77 ndis!NdisFIndicateReceiveNetBufferLists+0x6E
0e fffff8054e3811a9 ourdriver+0x4D70
0f fffff80546ba7d40 ourdriver+0x11A0
10 fffff8054182a6b5 ndis!ndisDummyIrpHandler+0x100
11 fffff80541c164c8 nt!IofCallDriver+0x55
12 fffff80541c162c7 nt!IopSynchronousServiceTail+0x1A8
13 fffff80541c15646 nt!IopXxxControlFile+0xC67
14 fffff80541a0aab5 nt!NtDeviceIoControlFile+0x56
15 ---------------- nt!KiSystemServiceCopyEnd+0x25
--------------------------------------------------
CPU#3
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
dpcs: no pending DPCs found
Target machine version: Windows 10 Kernel Version 19041 MP (4 procs)
Also note that we pass the NDIS_RECEIVE_FLAGS_DISPATCH_LEVEL flag to NdisFIndicateReceiveNetBufferLists if the current IRQL is DISPATCH_LEVEL.
Edit1:
Here is also the output of !locks, !qlocks and !ready. The contention count on one of the resources is 49135; is this normal or too high? Could it be related to our issue? The threads that own it or are waiting on it belong to normal processes such as chrome, csrss, etc.
3: kd> !kdexts.locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks.
Resource # nt!ExpTimeRefreshLock (0xfffff80542219440) Exclusively owned
Contention Count = 17
Threads: ffffcf8ce9dee640-01<*>
KD: Scanning for held locks.....
Resource # 0xffffcf8cde7f59f8 Shared 1 owning threads
Contention Count = 62
Threads: ffffcf8ce84ec080-01<*>
KD: Scanning for held locks...............................................................................................
Resource # 0xffffcf8ce08d0890 Exclusively owned
Contention Count = 49135
NumberOfSharedWaiters = 1
NumberOfExclusiveWaiters = 6
Threads: ffffcf8cf18e3080-01<*> ffffcf8ce3faf080-01
Threads Waiting On Exclusive Access:
ffffcf8ceb6ce080 ffffcf8ce1d20080 ffffcf8ce77f1080 ffffcf8ce92f4080
ffffcf8ce1d1f0c0 ffffcf8ced7c6080
KD: Scanning for held locks.
Resource # 0xffffcf8ce08d0990 Shared 1 owning threads
Threads: ffffcf8cf18e3080-01<*>
KD: Scanning for held locks.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Resource # 0xffffcf8ceff46350 Shared 1 owning threads
Threads: ffffcf8ce6de8080-01<*>
KD: Scanning for held locks......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Resource # 0xffffcf8cf0cade50 Exclusively owned
Contention Count = 3
Threads: ffffcf8ce84ec080-01<*>
KD: Scanning for held locks.........................
Resource # 0xffffcf8cf0f76180 Shared 1 owning threads
Threads: ffffcf8ce83dc080-02<*>
KD: Scanning for held locks.......................................................................................................................................................................................................................................................
Resource # 0xffffcf8cf1875cb0 Shared 1 owning threads
Contention Count = 3
Threads: ffffcf8ce89db040-02<*>
KD: Scanning for held locks.
Resource # 0xffffcf8cf18742d0 Shared 1 owning threads
Threads: ffffcf8cee5e1080-02<*>
KD: Scanning for held locks....................................................................................
Resource # 0xffffcf8cdceeece0 Shared 2 owning threads
Contention Count = 4
Threads: ffffcf8ce3a1c080-01<*> ffffcf8ce5625040-01<*>
Resource # 0xffffcf8cdceeed48 Shared 1 owning threads
Threads: ffffcf8ce5625043-02<*> *** Actual Thread ffffcf8ce5625040
KD: Scanning for held locks...
Resource # 0xffffcf8cf1d377d0 Exclusively owned
Threads: ffffcf8cf0ff3080-02<*>
KD: Scanning for held locks....
Resource # 0xffffcf8cf1807050 Exclusively owned
Threads: ffffcf8ce84ec080-01<*>
KD: Scanning for held locks......
245594 total locks, 13 locks currently held
3: kd> !qlocks
Key: O = Owner, 1-n = Wait order, blank = not owned/waiting, C = Corrupt
Processor Number
Lock Name 0 1 2 3
KE - Unused Spare
MM - Unused Spare
MM - Unused Spare
MM - Unused Spare
CC - Vacb
CC - Master
EX - NonPagedPool
IO - Cancel
CC - Unused Spare
IO - Vpb
IO - Database
IO - Completion
NTFS - Struct
AFD - WorkQueue
CC - Bcb
MM - NonPagedPool
3: kd> !ready
KSHARED_READY_QUEUE fffff8053f1ada00: (00) ****------------------------------------------------------------
SharedReadyQueue fffff8053f1ada00: No threads in READY state
Processor 0: No threads in READY state
Processor 1: Ready Threads at priority 15
THREAD ffffcf8ce9dee640 Cid 2054.2100 Teb: 000000fab7bca000 Win32Thread: 0000000000000000 READY on processor 1
Processor 2: No threads in READY state
Processor 3: No threads in READY state
3: kd> dt nt!_ERESOURCE 0xffffcf8ce08d0890
+0x000 SystemResourcesList : _LIST_ENTRY [ 0xffffcf8c`e08d0610 - 0xffffcf8c`e08cf710 ]
+0x010 OwnerTable : 0xffffcf8c`ee6e8210 _OWNER_ENTRY
+0x018 ActiveCount : 0n1
+0x01a Flag : 0xf86
+0x01a ReservedLowFlags : 0x86 ''
+0x01b WaiterPriority : 0xf ''
+0x020 SharedWaiters : 0xffffae09`adcae8e0 Void
+0x028 ExclusiveWaiters : 0xffffae09`a9aabea0 Void
+0x030 OwnerEntry : _OWNER_ENTRY
+0x040 ActiveEntries : 1
+0x044 ContentionCount : 0xbfef
+0x048 NumberOfSharedWaiters : 1
+0x04c NumberOfExclusiveWaiters : 6
+0x050 Reserved2 : (null)
+0x058 Address : (null)
+0x058 CreatorBackTraceIndex : 0
+0x060 SpinLock : 0
3: kd> dx -id 0,0,ffffcf8cdcc92040 -r1 (*((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8ce08d08c0))
(*((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8ce08d08c0)) [Type: _OWNER_ENTRY]
[+0x000] OwnerThread : 0xffffcf8cf18e3080 [Type: unsigned __int64]
[+0x008 ( 0: 0)] IoPriorityBoosted : 0x0 [Type: unsigned long]
[+0x008 ( 1: 1)] OwnerReferenced : 0x0 [Type: unsigned long]
[+0x008 ( 2: 2)] IoQoSPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 (31: 3)] OwnerCount : 0x1 [Type: unsigned long]
[+0x008] TableSize : 0xc [Type: unsigned long]
3: kd> dx -id 0,0,ffffcf8cdcc92040 -r1 ((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8cee6e8210)
((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8cee6e8210) : 0xffffcf8cee6e8210 [Type: _OWNER_ENTRY *]
[+0x000] OwnerThread : 0x0 [Type: unsigned __int64]
[+0x008 ( 0: 0)] IoPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 ( 1: 1)] OwnerReferenced : 0x1 [Type: unsigned long]
[+0x008 ( 2: 2)] IoQoSPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 (31: 3)] OwnerCount : 0x0 [Type: unsigned long]
[+0x008] TableSize : 0x7 [Type: unsigned long]
Thanks for reporting this. I've tracked this down to an OS bug: there's a deadlock in wanarp. This issue appears to affect every version of the OS going back to Windows Vista.
I've filed internal issue task.ms/42393356 to track this: if you have a Microsoft support contract, your rep can get you status updates on that issue.
Meanwhile, you can partially work around this issue by either:
Indicating 1 packet at a time (NumberOfNetBufferLists==1); or
Indicating on a single CPU at a time
The bug in wanarp is exposed when 2 or more CPUs collectively process 3 or more NBLs at the same time. So either workaround would avoid the trigger conditions.
Depending on how much bandwidth you're pushing through this network interface, those options could be rather bad for CPU/battery/throughput. So please try to avoid pessimizing batching unless it's really necessary. (For example, you could make this an option that's off-by-default, unless the customer specifically uses wanarp.)
Note that you cannot fully prevent the issue yourself. Other drivers in the stack, including NDIS itself, have the right to group packets together, which would have the side effect of re-batching the packets that you carefully un-batched. However, I believe that you can make a statistically significant dent in the crashes if you just indicate 1 NBL at a time, or indicate multiple NBLs on 1 CPU at a time.
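The first workaround (one NBL per indication) amounts to cutting the chain apart before indicating. The control flow is sketched below in Python purely for illustration. Nbl is a hypothetical stand-in for NET_BUFFER_LIST, and the indicate callback stands in for a call to NdisFIndicateReceiveNetBufferLists with NumberOfNetBufferLists set to 1. A real driver would do this in C, would walk the chain with NET_BUFFER_LIST_NEXT_NBL, and would also need to handle the returned NBLs accordingly.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Nbl:
    """Hypothetical stand-in for a NET_BUFFER_LIST; only the Next link is modelled."""
    tag: str
    next: Optional["Nbl"] = None


def split_chain(head: Optional[Nbl]) -> Iterator[Nbl]:
    """Detach each NBL from the chain and yield it as a chain of length one."""
    while head is not None:
        following = head.next   # remember the rest before unlinking
        head.next = None        # the indicated NBL must not drag the rest along
        yield head
        head = following


def indicate_one_at_a_time(head: Optional[Nbl],
                           indicate: Callable[[Nbl, int], None]) -> int:
    """Indicate every NBL separately; returns how many indications were made."""
    count = 0
    for nbl in split_chain(head):
        indicate(nbl, 1)        # NumberOfNetBufferLists == 1
        count += 1
    return count


# Usage: a chain of three NBLs becomes three separate indications.
chain = Nbl("a", Nbl("b", Nbl("c")))
seen: List[str] = []
made = indicate_one_at_a_time(chain, lambda nbl, n: seen.append(f"{nbl.tag}:{n}"))
print(made, seen)
```

As the answer points out, this only reduces the odds of hitting the bug: a lower layer may still regroup the NBLs, and the extra calls cost throughput.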
Sorry this is happening to you again! wanarp is... a very old codebase.

How should I apply add-symbol-file command during u-boot linux boot debug?

I'm following the Linux boot process under u-boot (using SPL falcon mode, where u-boot-spl launches Linux directly) on a qemu virtual machine. The code has now jumped to the Linux kernel, and because I ran add-symbol-file vmlinux 0x80081000 I can follow the kernel code step by step using gdb connected to the virtual machine. I actually loaded the kernel image at 0x80080000, but I had to set the address to 0x80081000 to make gdb show the source that matches the PC value (I don't know why this difference of 0x1000 is needed).
Later I found that the kernel sets up the page tables (the identity mapping and the swapper table) and jumps to __primary_switched, and this is where a pure kernel virtual address is used for the PC for the first time. This is the call, made at the end of the head.S file:
ldr x8, =__primary_switched
adrp x0, __PHYS_OFFSET
br x8
In the symbol file (vmlinux, an ELF file), the symbols before __primary_switched are all mapped at virtual addresses (high addresses starting with 0xffffffc0.....), but gdb could follow the source even while the PC held a physical address. (The PC was initially loaded with the physical address of the kernel start, and PC-relative jumps were used until the jump to __primary_switched, with the MMU disabled or using the identity mapping.) So does this mean that, for add-symbol-file, only the offset of the symbols from the start of the text matters?
Another question: I can follow the kernel source with gdb, but after __primary_switched I cannot see the source. The debugger doesn't show the correct source location for the PC, which is now a kernel virtual address. Should I tell the debugger to use the correct offset by running add-symbol-file again? If so, how?
ADD (8:32 AM Wednesday, January 12, 2022, UTC)
I found from gdb manual,
"add-symbol-file filename [ -readnow | -readnever ] [ -o offset ] [
textaddress ] [ -s section address ... ] The add-symbol-file command
reads additional symbol table information from the file filename. You
would use this command when filename has been dynamically loaded (by
some other means) into the program that is running. The textaddress
parameter gives the memory address at which the file's text section
has been loaded. You can additionally specify the base address of
other sections using an arbitrary number of '-s section address'
pairs. If a section is omitted, gdb will use its default addresses as
found in filename. Any address or textaddress can be given as an
expression. ..."
I changed my program a little to fix a problem. readelf shows the .text section starting at ffffffc010080800.
So I adjusted the command to "add-symbol-file vmlinux 0x80000800", and gdb shows the kernel source correctly after the jump to Linux.
It still doesn't show me the source code after __primary_switched.
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .head.text PROGBITS ffffffc010080000 00010000
0000000000000040 0000000000000000 AX 0 0 4
[ 2] .text PROGBITS ffffffc010080800 00010800
0000000000304370 0000000000000000 AX 0 0 2048
[ 3] .rodata PROGBITS ffffffc010390000 00320000
.... (skip) ...
[12] .notes NOTE ffffffc01045be18 003ebe18
000000000000003c 0000000000000000 A 0 0 4
[13] .init.text PROGBITS ffffffc010470000 003f0000
0000000000027ec8 0000000000000000 AX 0 0 4
[14] .exit.text PROGBITS ffffffc010497ec8 00417ec8
000000000000046c 0000000000000000 AX 0 0 4
Since '__primary_switched' resides in section .init.text, I tried adding "-s .init.text 0xffffffc010470000" or "-s .init_text 0x803ef800" (the physical address) to the add-symbol-file command, to no avail. Is my command wrong? Or could this be a page table (virtual -> physical) problem? I see a synchronous exception right after I enter __primary_switched: the PC value becomes 0x200, and if the exception vectors are located at 0x0, that is the vector entry for a synchronous exception such as an undefined instruction. I should also check whether the vector base address has been set correctly.
I found that my kernel load address was wrong (__PHYS_OFFSET was below the start of physical DDR).
After fixing it, the PC advances normally through kernel virtual addresses, and I just need to apply the add-symbol-file command using the virtual addresses.
These were the new section addresses.
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .head.text PROGBITS ffffffc010080000 00010000
0000000000000040 0000000000000000 AX 0 0 4
[ 2] .text PROGBITS ffffffc010080800 00010800
0000000000304370 0000000000000000 AX 0 0 2048
[ 3] .rodata PROGBITS ffffffc010390000 00320000
00000000000a6385 0000000000000000 WA 0 0 4096
[ 4] .modinfo PROGBITS ffffffc010436385 003c6385
00000000000018ff 0000000000000000 A 0 0 1
[ 5] .pci_fixup PROGBITS ffffffc010437c90 003c7c90
00000000000020f0 0000000000000000 A 0 0 16
[ 6] __ksymtab PROGBITS ffffffc010439d80 003c9d80
0000000000006d20 0000000000000000 A 0 0 4
[ 7] __ksymtab_gpl PROGBITS ffffffc010440aa0 003d0aa0
0000000000005808 0000000000000000 A 0 0 4
[ 8] __ksymtab_strings PROGBITS ffffffc0104462a8 003d62a8
00000000000134f2 0000000000000000 A 0 0 1
[ 9] __param PROGBITS ffffffc0104597a0 003e97a0
0000000000000b68 0000000000000000 A 0 0 8
[10] __modver PROGBITS ffffffc01045a308 003ea308
0000000000000cf8 0000000000000000 A 0 0 8
[11] __ex_table PROGBITS ffffffc01045b000 003eb000
0000000000000e18 0000000000000000 A 0 0 8
[12] .notes NOTE ffffffc01045be18 003ebe18
000000000000003c 0000000000000000 A 0 0 4
[13] .init.text PROGBITS ffffffc010470000 003f0000
0000000000027ec8 0000000000000000 AX 0 0 4
[14] .exit.text PROGBITS ffffffc010497ec8 00417ec8
The final kernel image is loaded at 0x80080000. Then __PHYS_OFFSET becomes 0x80000000. (TEXT_OFFSET is 0x80000 by default). Now I can debug the kernel source before __primary_switch using this command.
add-symbol-file images/vmlinux 0x80080800 -s .head.text 0x80080000 -s .init.text 0x803f7800
After the kernel entered __primary_switched (where kernel virtual addresses are now used), I added this command to see the source, and I can follow the code step by step using qemu and gdb.
add-symbol-file images/vmlinux 0xffffffc010080800 -s .head.text 0xffffffc010080000 -s .init.text 0xffffffc010470000
Hope this helps someone later.
After some days, though, I think I could just use add-symbol-file images/vmlinux 0xffffffc010080800 (which applies all the section info).
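The pre-MMU addresses can be derived mechanically from the readelf table. The sketch below assumes the image is loaded contiguously, so every section keeps its distance from .head.text, and that .head.text lands exactly at the 0x80080000 load address. The helper names are hypothetical.

```python
HEAD_VADDR = 0xFFFFFFC010080000   # link address of .head.text (start of image)
LOAD_ADDR = 0x80080000            # physical address the image was loaded at

# Link-time (virtual) section addresses taken from the readelf output.
SECTIONS = {
    ".head.text": 0xFFFFFFC010080000,
    ".text": 0xFFFFFFC010080800,
    ".init.text": 0xFFFFFFC010470000,
}


def physical_address(vaddr: int) -> int:
    """Where a link-time address sits before the MMU is enabled."""
    return LOAD_ADDR + (vaddr - HEAD_VADDR)


def add_symbol_file_command(path: str) -> str:
    """Build the gdb command: .text address first, other sections via -s."""
    text = physical_address(SECTIONS[".text"])
    extra = " ".join(
        f"-s {name} {physical_address(vaddr):#x}"
        for name, vaddr in SECTIONS.items()
        if name != ".text"
    )
    return f"add-symbol-file {path} {text:#x} {extra}"


command = add_symbol_file_command("images/vmlinux")
print(command)
```

By this arithmetic .text and .head.text come out at 0x80080800 and 0x80080000, matching the physical-address command above, but .init.text comes out at 0x80470000 rather than 0x803f7800. That value may be worth double-checking, although it would only matter when stepping through .init.text code before the MMU is on.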

Analysing dump: how to interpret Windbg command "!heap -srch <address>" output?

I'm analysing a dump of a Windows process; the dump was created using procdump.
I'd like to see what some collections (CMap) of CAlarm objects look like (CAlarm is a class from my own program).
To do this, I'm running the following commands:
1) What is the address of the type I'm interested in?
x /2 *!*CMap*CAlarm*`vftable*
00000001`3fc7b840 <application_name>!CMap<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >,wchar_t const * __ptr64,CAlarm * __ptr64,CAlarm * __ptr64>::`vftable'
2) Which objects are of this particular type?
!heap -srch 00000001`3fc7b840
skipping searching 00000000014d6660 allocation of size 000000000002ee08 greater than 0000000000010000
skipping searching 00000000003fdf00 allocation of size 0000000000025ff0 greater than 0000000000010000
skipping searching 000000000042bf00 allocation of size 000000000001fff0 greater than 0000000000010000
=> no results, so let's do something else:
!heap -srch `3fc7b840
This gives a lot of objects, something like:
_HEAP # 920000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
00000000009214a0 0002 0002 [00] 00000000009214b0 00010 - (busy)
...
_HEAP # 860000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
00000000008612e0 00ce 00ae [00] 00000000008612f0 00cd0 - (free)
Let's now use the UserPtr entry (let's start with the last one):
0:000> dt <application_name>!CMap<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >,wchar_t const *,CAlarm *,CAlarm *> 00000000008612f0
+0x000 __VFN_table : 0x00000000`00860158
+0x008 m_pHashTable : 0x00000000`00860158 -> 0x00000000`008612f0 CMap<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >,wchar_t const *,CAlarm *,CAlarm *>::CAssoc
+0x010 m_nHashTableSize : 0
+0x018 m_nCount : 0n0
+0x020 m_pFreeList : (null)
+0x028 m_pBlocks : (null)
+0x030 m_nBlockSize : 0n0
This looks reasonable, but is it correct? That's quite doubtful, when looking at another entry:
0:000> dt <application_name>!CMap<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >,wchar_t const *,CAlarm *,CAlarm *> 00000000009214b0
+0x000 __VFN_table : (null)
+0x008 m_pHashTable : (null)
+0x010 m_nHashTableSize : 0
+0x018 m_nCount : 0n1153117455832760832
+0x020 m_pFreeList : (null)
+0x028 m_pBlocks : (null)
+0x030 m_nBlockSize : 0n0
=> No way my CMap has so many elements.
It looks like I'm using the x /2 result in the wrong way as input for the !heap -srch command. Does anybody know the right way to do this?
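One way to sanity-check candidate hits, independent of how the debugger interprets the shortened pattern: in a 64-bit process an object with virtual functions begins with the full 8-byte vftable pointer, stored little-endian. The Python sketch below uses a made-up three-qword "heap" (the buffer contents and function names are hypothetical) to show why matching only the low 32 bits turns up blocks that are not CMap objects.

```python
import struct
from typing import List

VFTABLE = 0x000000013FC7B840   # address reported by "x /2" above


def find_qword(buf: bytes, value: int) -> List[int]:
    """8-byte-aligned offsets where buf holds value as a little-endian qword."""
    pattern = struct.pack("<Q", value)
    return [off for off in range(0, len(buf) - 7, 8)
            if buf[off:off + 8] == pattern]


def find_low_dword(buf: bytes, value: int) -> List[int]:
    """4-byte-aligned offsets matching only the low 32 bits of value."""
    pattern = struct.pack("<I", value & 0xFFFFFFFF)
    return [off for off in range(0, len(buf) - 3, 4)
            if buf[off:off + 4] == pattern]


# Made-up memory: unrelated data, one real vftable pointer, and one
# unrelated qword that merely shares the same low 32 bits.
heap = struct.pack("<QQQ", 0x1122334455667788, VFTABLE, 0x000000073FC7B840)

full_hits = find_qword(heap, VFTABLE)
partial_hits = find_low_dword(heap, VFTABLE)
print(full_hits, partial_hits)
```

By the same reasoning, neither of the two blocks dumped above looks like a live CMap: their __VFN_table fields are 0x00000000`00860158 and null, not 00000001`3fc7b840.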

How can I work out which process/thread owns the resource that my program is hanging on

I have a user mode process which is hanging when calling NtClose. That NtClose is hanging while trying to acquire a lock in the kernel. I believe it's the lock to the handle table. Here's the kernel part of the stack:
THREAD fffffa800bd4fb50 Cid 277c.21d8 Teb: 000007fffff80000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable
fffffa80047bad20 SynchronizationEvent
IRP List:
fffffa80049f49c0: (0006,0430) Flags: 00000404 Mdl: 00000000
Not impersonating
DeviceMap fffff8a000008bc0
Owning Process fffffa800c195060 Image: My_Service.exe
Attached Process N/A Image: N/A
Wait Start TickCount 455527 Ticks: 223 (0:00:00:03.478)
Context Switch Count 1703
UserTime 00:00:00.015
KernelTime 00:00:00.109
Win32 Start Address 0x000000013f509190
Stack Init fffff8800c3e0fb0 Current fffff8800c3e0790
Base fffff8800c3e1000 Limit fffff8800c3db000 Call 0
Priority 10 BasePriority 8 UnusualBoost 2 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
fffff880`0c3e07d0 fffff800`02ccc972 : fffffa80`0bd4fb50 fffffa80`0bd4fb50 fffff880`00000000 00000000`00000003 : nt!KiSwapContext+0x7a
fffff880`0c3e0910 fffff800`02cddd8f : 00000000`00000000 fffff880`0af2d400 fffff880`00000068 fffff880`0af2d408 : nt!KiCommitThreadWait+0x1d2
fffff880`0c3e09a0 fffff800`02cb7086 : 00000000`00000000 fffffa80`0000001b 00000000`00000000 fffff880`009eb100 : nt!KeWaitForSingleObject+0x19f
fffff880`0c3e0a40 fffff800`02cdc1ac : ffffffff`fd9da600 fffffa80`047bad20 fffffa80`03e1d238 00000000`00000200 : nt!ExpWaitForResource+0xae
fffff880`0c3e0ab0 fffff880`016e6f88 : 00000000`00000000 fffff8a0`0d555010 fffff880`0af2d840 fffff8a0`0a71e576 : nt!ExAcquireResourceExclusiveLite+0x14f
fffff880`0c3e0b20 fffff880`01652929 : fffffa80`06fc72c0 fffffa80`049f49c0 fffff880`0af2d550 fffffa80`0bd4fb50 : Ntfs!NtfsCommonCleanup+0x2705
fffff880`0c3e0f30 fffff800`02ccea37 : fffff880`0af2d550 00000000`00000000 00000000`00000000 00000000`00000000 : Ntfs!NtfsCommonCleanupCallout+0x19
fffff880`0c3e0f60 fffff800`02cce9f8 : 00000000`00000000 00000000`00000000 fffff880`0c3e1000 fffff800`02ce2e42 : nt!KySwitchKernelStackCallout+0x27 (TrapFrame # fffff880`0c3e0e20)
fffff880`0af2d420 fffff800`02ce2e42 : 00000000`0000277c 00000000`00000002 00000000`00000002 fffff880`042f8965 : nt!KiSwitchKernelStackContinue
fffff880`0af2d440 fffff880`016529a2 : fffff880`01652910 00000000`00000000 fffff880`0af2d800 00000000`00000000 : nt!KeExpandKernelStackAndCalloutEx+0x2a2
fffff880`0af2d520 fffff880`016f3894 : fffff880`0af2d5f0 fffff880`0af2d5f0 fffff880`0af2d5f0 fffff880`0af2d760 : Ntfs!NtfsCommonCleanupOnNewStack+0x42
fffff880`0af2d590 fffff880`01145bcf : fffff880`0af2d5f0 fffffa80`049f49c0 fffffa80`049f4da8 fffffa80`03ef5010 : Ntfs!NtfsFsdCleanup+0x144
fffff880`0af2d800 fffff880`011446df : fffffa80`04e239a0 00000000`00000000 fffffa80`048cb100 fffffa80`049f49c0 : fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
fffff880`0af2d890 fffff800`02fe3fef : fffffa80`049f49c0 fffffa80`0c195060 00000000`00000000 fffffa80`04aa93d0 : fltmgr!FltpDispatch+0xcf
fffff880`0af2d8f0 fffff800`02fd1fe4 : 00000000`00000000 fffffa80`0c195060 fffff880`01165cb0 fffff800`02c64000 : nt!IopCloseFile+0x11f
fffff880`0af2d980 fffff800`02fd1da1 : fffffa80`0c195060 fffffa80`00000001 fffff8a0`18385220 00000000`00000000 : nt!ObpDecrementHandleCount+0xb4
fffff880`0af2da00 fffff800`02fd2364 : 00000000`0000cae8 fffffa80`0c195060 fffff8a0`18385220 00000000`0000cae8 : nt!ObpCloseHandleTableEntry+0xb1
fffff880`0af2da90 fffff800`02cd61d3 : fffffa80`0bd4fb50 fffff880`0af2db60 00000001`3f64afd8 00000000`00000000 : nt!ObpCloseHandle+0x94
My question is, how can I work out which other process/thread on the system has acquired this kernel resource using windbg? (By the way I'm looking at a full system dump from a customer, I don't have this reproduced in a debugger)
So the answer was to use !kdext*.locks. This showed that the thread above was deadlocked with a System thread that belonged to one of Symantec's antivirus drivers.
The locks causing the problem here were kernel ERESOURCE locks. I've discovered there are two versions of !locks: one for user-mode critical sections (!ntsdexts.locks) and the other for kernel-mode locks (!kdext*.locks).
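The reasoning applied to the !locks output can be sketched in code: every resource has an owner and possibly waiters, and a deadlock shows up as a cycle in which each thread waits on a resource owned by the next. This is a minimal Python sketch, assuming a single exclusive owner per resource; the thread and resource names are hypothetical and are not taken from the dump.

```python
def find_deadlock(owners, waits_on):
    """Return a list of threads that form a wait cycle, or None.

    owners:   resource -> thread that currently owns it (exclusive owner assumed)
    waits_on: thread   -> resource the thread is blocked on
    """
    for start in waits_on:
        chain = [start]
        thread = start
        while thread in waits_on:
            owner = owners.get(waits_on[thread])
            if owner is None:
                break  # resource is free, so this chain cannot deadlock
            if owner in chain:
                return chain[chain.index(owner):]  # cycle found
            chain.append(owner)
            thread = owner
    return None


# Hypothetical data shaped like what one would read off the !locks output.
owners = {"resource_A": "system_thread", "resource_B": "app_thread"}
waits_on = {"app_thread": "resource_A", "system_thread": "resource_B"}

print(find_deadlock(owners, waits_on))
```

A shared ERESOURCE can have several owners at once, so a fuller version would map each resource to a set of owners.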

Windows singly linked list (_SINGLE_LIST_ENTRY)

I'm just doing some debugging on a Windows 7 crash dump, and I've come across a singly-linked list that I'm not able to fully understand.
Here's the output from WinDBG:
dt _GENERAL_LOOKASIDE_POOL fffff80002a14800 -b
....
0x000 SingleListHead: _SINGLE_LIST_ENTRY
+0x000 Next: 0x0000000000220001
....
From what I've been reading, it seems that each singly linked list begins with a list head, which contains a pointer to the first element in the list, or null if the list is empty.
Microsoft states (MSDN article):
For a SINGLE_LIST_ENTRY that serves as a list entry, the Next member
points to the next entry in the list, or NULL if there is no next
entry in the list. For a SINGLE_LIST_ENTRY that serves as the list
header, the Next member points to the first entry in the list, or NULL
if the list is empty.
I'm 99% sure this list contains some entries, but I don't understand how the value of 0x0000000000220001 is supposed to be pointing to anything. This value certainly doesn't resolve to a valid page mapping, so I can only assume it's some kind of offset. However, I'm not sure.
If anyone could help shine some light on this, I'd appreciate it.
Thanks
UPDATE
I've just found a document (translated from Chinese) that seems to explain the structure a little more. If anyone could offer some input on it, I'd appreciate it.
Lookaside List article
What I'm actually looking at is a lookaside list that Windows should be using for the allocation of IRPs, here's the full output from WinDBG (values changed from original question):
lkd> !lookaside iopsmallirplookasidelist
Lookaside "" @ fffff80002a14800 "Irps"
Type = 0000 NonPagedPool
Current Depth = 0 Max Depth = 4
Size = 280 Max Alloc = 1120
AllocateMisses = 127 FreeMisses = 26
TotalAllocates = 190 TotalFrees = 90
Hit Rate = 33% Hit Rate = 71%
lkd> dt _general_lookaside fffff80002a14800 -b
ntdll!_GENERAL_LOOKASIDE
+0x000 ListHead : _SLIST_HEADER
+0x000 Alignment : 0x400001
+0x008 Region : 0xfffffa80`01e83b11
+0x000 Header8 : <unnamed-tag>
+0x000 Depth : 0y0000000000000001 (0x1)
+0x000 Sequence : 0y001000000 (0x40)
+0x000 NextEntry : 0y000000000000000000000000000000000000000 (0)
+0x008 HeaderType : 0y1
+0x008 Init : 0y0
+0x008 Reserved : 0y11111111111111111101010000000000000011110100000111011000100 (0x7fffea0007a0ec4)
+0x008 Region : 0y111
+0x000 Header16 : <unnamed-tag>
+0x000 Depth : 0y0000000000000001 (0x1)
+0x000 Sequence : 0y000000000000000000000000000000000000000001000000 (0x40)
+0x008 HeaderType : 0y1
+0x008 Init : 0y0
+0x008 Reserved : 0y00
+0x008 NextEntry : 0y111111111111111111111010100000000000000111101000001110110001 (0xfffffa8001e83b1)
+0x000 HeaderX64 : <unnamed-tag>
+0x000 Depth : 0y0000000000000001 (0x1)
+0x000 Sequence : 0y000000000000000000000000000000000000000001000000 (0x40)
+0x008 HeaderType : 0y1
+0x008 Reserved : 0y000
+0x008 NextEntry : 0y111111111111111111111010100000000000000111101000001110110001 (0xfffffa8001e83b1)
+0x000 SingleListHead : _SINGLE_LIST_ENTRY
+0x000 Next : 0x00000000`00400001
+0x010 Depth : 4
+0x012 MaximumDepth : 0x20
+0x014 TotalAllocates : 0xbe
+0x018 AllocateMisses : 0x7f
+0x018 AllocateHits : 0x7f
+0x01c TotalFrees : 0x5a
+0x020 FreeMisses : 0x1a
+0x020 FreeHits : 0x1a
+0x024 Type : 0 ( NonPagedPool )
+0x028 Tag : 0x73707249
+0x02c Size : 0x118
+0x030 AllocateEx : 0xfffff800`029c30e0
+0x030 Allocate : 0xfffff800`029c30e0
+0x038 FreeEx : 0xfffff800`029c30d0
+0x038 Free : 0xfffff800`029c30d0
+0x040 ListEntry : _LIST_ENTRY [ 0xfffff800`02a147c0 - 0xfffff800`02a148c0 ]
+0x000 Flink : 0xfffff800`02a147c0
+0x008 Blink : 0xfffff800`02a148c0
+0x050 LastTotalAllocates : 0xbe
+0x054 LastAllocateMisses : 0x7f
+0x054 LastAllocateHits : 0x7f
+0x058 Future :
[00] 0
[01] 0
lkd> !slist fffff80002a14800
SLIST HEADER:
+0x000 Header16.Sequence : 40
+0x000 Header16.Depth : 1
SLIST CONTENTS:
fffffa8001e83b10 0000000000000000 0000000000000000
0000000000000404 0000000000000000
Sorry if some of the formatting is lost. Essentially, this should be a lookaside list containing chunks that are all the same size, 0x118 (sizeof(_IRP) + sizeof(_IO_STACK_LOCATION)).
However, I'm not entirely sure how the list is actually put together. I'm not sure whether this should be a singly linked list of memory chunks, or whether I'm reading all of it incorrectly.
In the case of the small IRP list on Windows 7 x86 RTM:
lkd> !lookaside iopsmallirplookasidelist
Lookaside "" @ 82d5ffc0 "Irps"
....
lkd> dt _SINGLE_LIST_ENTRY 82d5ffc0
nt!_SINGLE_LIST_ENTRY
+0x000 Next : 0x86737e30 _SINGLE_LIST_ENTRY
....
lkd> !pool 0x86737e30
Pool page 86737e30 region is Nonpaged pool
*86737e28 size: a0 previous size: 48 (Allocated) *Irp
Pooltag Irp : Io, IRP packets
The size of the memory chunk is 0xa0 bytes:
lkd> ?? sizeof(_pool_header)+sizeof(_single_list_entry)+sizeof(_irp)+sizeof(_io_stack_location)
unsigned int 0xa0
which includes the pool header, the list pointer, the IRP and one stack location.
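The arithmetic can be checked by hand. In the sketch below the addresses come from the dt and !pool output above, while the four structure sizes are assumptions for 32-bit Windows 7 (they are not printed in the output, only their sum is).

```python
# Structure sizes in bytes. Assumed values for 32-bit Windows 7;
# the debugger output above only shows their sum.
SIZEOF_POOL_HEADER = 0x08
SIZEOF_SINGLE_LIST_ENTRY = 0x04
SIZEOF_IRP = 0x70
SIZEOF_IO_STACK_LOCATION = 0x24

chunk_size = (SIZEOF_POOL_HEADER + SIZEOF_SINGLE_LIST_ENTRY
              + SIZEOF_IRP + SIZEOF_IO_STACK_LOCATION)

# Addresses taken from the !pool and dt output above.
chunk_start = 0x86737E28      # start of the pool chunk, including its header
list_head_next = 0x86737E30   # SingleListHead.Next

# The list pointer lands just past the pool header of the chunk.
first_entry = chunk_start + SIZEOF_POOL_HEADER

print(hex(chunk_size))
print(hex(first_entry), first_entry == list_head_next)
```

Under these assumptions the sum matches the size reported by !pool, and the Next pointer in the list head is the address immediately after the pool header of that chunk.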
Minor update:
Author Tarjei Mandt aka @kernelpool
In the _GENERAL_LOOKASIDE structure, SingleListHead.Next points to the first free pool chunk on the singly-linked lookaside list. The size of the lookaside list is limited by the value of Depth, periodically adjusted by the balance set manager according to the number of hits and misses on the lookaside list. Hence, a frequently used lookaside list will have a larger Depth value than an infrequently used list. The initial Depth is 4 (nt!ExMinimumLookasideDepth), with the maximum being MaximumDepth (256)...
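On the x64 dump in the question, SingleListHead sits at the same offset (+0x000) as ListHead, which is an _SLIST_HEADER. The value shown as Next is therefore the first quadword of that header, not a pointer. Here is a minimal sketch of decoding the two quadwords, assuming the Header16 layout printed by dt (Depth in the low 16 bits and Sequence in the remaining bits of the first quadword; HeaderType, Init and two reserved bits followed by a 60-bit NextEntry in the second). The function name is made up for illustration.

```python
def decode_slist_header16(alignment, region):
    """Split the two quadwords of an x64 SLIST_HEADER (Header16 layout, assumed)."""
    return {
        "depth": alignment & 0xFFFF,        # low 16 bits of the first quadword
        "sequence": alignment >> 16,        # remaining 48 bits
        "header_type": region & 1,          # bit 0 of the second quadword
        "init": (region >> 1) & 1,          # bit 1
        # NextEntry is stored without its low 4 bits; entries are 16-byte aligned.
        "next_entry": (region >> 4) << 4,
    }


# Alignment and Region values from the dt output in the question.
header = decode_slist_header16(0x400001, 0xFFFFFA8001E83B11)
print({name: hex(value) for name, value in header.items()})

# The value from the first dump in the question, 0x0000000000220001,
# splits the same way: only the first quadword is known there.
first_quadword = 0x0000000000220001
print(first_quadword & 0xFFFF, hex(first_quadword >> 16))
```

The decoded next_entry agrees with the address listed under SLIST CONTENTS by !slist, and the depth and sequence agree with the SLIST HEADER lines.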
SINGLE_LIST_ENTRY implements intrusive linked lists. Look at struct list_head, which offers similar functionality within the Linux kernel.
As for the .Next member, it really is a pointer to a SINGLE_LIST_ENTRY that is most likely embedded inside another struct.
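The idea of an intrusive list can be sketched with ctypes: the link is a field inside the payload structure, and the containing structure is recovered by subtracting the field's offset from the link's address, which is what the CONTAINING_RECORD macro does in C. The Packet structure and the helper names below are hypothetical and only stand in for a real kernel structure.

```python
import ctypes


class SingleListEntry(ctypes.Structure):
    pass


# Self-referential: Next points to another SingleListEntry.
SingleListEntry._fields_ = [("Next", ctypes.POINTER(SingleListEntry))]


class Packet(ctypes.Structure):
    # Hypothetical container; the link is deliberately not the first field.
    _fields_ = [("payload", ctypes.c_int), ("link", SingleListEntry)]


def containing_record(entry_ptr, struct_type, field_name):
    """Recover the structure that embeds the list entry entry_ptr points to."""
    offset = getattr(struct_type, field_name).offset
    entry_address = ctypes.addressof(entry_ptr.contents)
    return struct_type.from_address(entry_address - offset)


def push(head, packet):
    """Insert packet at the front of the list anchored at head."""
    packet.link.Next = head.Next
    head.Next = ctypes.pointer(packet.link)


def walk(head):
    """Follow Next pointers and collect the payload of each container."""
    payloads = []
    entry_ptr = head.Next
    while entry_ptr:  # a NULL pointer is falsy
        packet = containing_record(entry_ptr, Packet, "link")
        payloads.append(packet.payload)
        entry_ptr = entry_ptr.contents.Next
    return payloads


head = SingleListEntry()                           # list header, Next is NULL
packets = [Packet(payload=n) for n in (1, 2, 3)]   # kept alive by this list
for packet in packets:
    push(head, packet)

print(walk(head))
```

Because entries are pushed at the head, the walk returns the payloads in reverse insertion order.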

Resources