Windows singly linked list (_SINGLE_LIST_ENTRY)

I'm just doing some debugging on a Windows 7 crash dump, and I've come across a singly-linked list that I'm not able to fully understand.
Here's the output from WinDBG:
dt _GENERAL_LOOKASIDE_POOL fffff80002a14800 -b
....
0x000 SingleListHead: _SINGLE_LIST_ENTRY
+0x000 Next: 0x0000000000220001
....
From what I've been reading, it seems that each singly linked list begins with a list head, which contains a pointer to the first element in the list, or null if the list is empty.
Microsoft states (MSDN article):
For a SINGLE_LIST_ENTRY that serves as a list entry, the Next member
points to the next entry in the list, or NULL if there is no next
entry in the list. For a SINGLE_LIST_ENTRY that serves as the list
header, the Next member points to the first entry in the list, or NULL
if the list is empty.
I'm 99% sure this list contains some entries, but I don't understand how the value of 0x0000000000220001 is supposed to be pointing to anything. This value certainly doesn't resolve to a valid page mapping, so I can only assume it's some kind of offset. However, I'm not sure.
If anyone could help shine some light on this, I'd appreciate it.
Thanks
UPDATE
I've just found a document (translated from Chinese) that seems to explain the structure a little more. If anyone could offer some input on it, I'd appreciate it.
Lookaside List article
What I'm actually looking at is a lookaside list that Windows should be using for the allocation of IRPs; here's the full output from WinDbg (values changed from the original question):
lkd> !lookaside iopsmallirplookasidelist
Lookaside "" # fffff80002a14800 "Irps"
Type = 0000 NonPagedPool
Current Depth = 0 Max Depth = 4
Size = 280 Max Alloc = 1120
AllocateMisses = 127 FreeMisses = 26
TotalAllocates = 190 TotalFrees = 90
Hit Rate = 33% Hit Rate = 71%
lkd> dt _general_lookaside fffff80002a14800 -b
ntdll!_GENERAL_LOOKASIDE
+0x000 ListHead : _SLIST_HEADER
+0x000 Alignment : 0x400001
+0x008 Region : 0xfffffa80`01e83b11
+0x000 Header8 : <unnamed-tag>
+0x000 Depth : 0y0000000000000001 (0x1)
+0x000 Sequence : 0y001000000 (0x40)
+0x000 NextEntry : 0y000000000000000000000000000000000000000 (0)
+0x008 HeaderType : 0y1
+0x008 Init : 0y0
+0x008 Reserved : 0y11111111111111111101010000000000000011110100000111011000100 (0x7fffea0007a0ec4)
+0x008 Region : 0y111
+0x000 Header16 : <unnamed-tag>
+0x000 Depth : 0y0000000000000001 (0x1)
+0x000 Sequence : 0y000000000000000000000000000000000000000001000000 (0x40)
+0x008 HeaderType : 0y1
+0x008 Init : 0y0
+0x008 Reserved : 0y00
+0x008 NextEntry : 0y111111111111111111111010100000000000000111101000001110110001 (0xfffffa8001e83b1)
+0x000 HeaderX64 : <unnamed-tag>
+0x000 Depth : 0y0000000000000001 (0x1)
+0x000 Sequence : 0y000000000000000000000000000000000000000001000000 (0x40)
+0x008 HeaderType : 0y1
+0x008 Reserved : 0y000
+0x008 NextEntry : 0y111111111111111111111010100000000000000111101000001110110001 (0xfffffa8001e83b1)
+0x000 SingleListHead : _SINGLE_LIST_ENTRY
+0x000 Next : 0x00000000`00400001
+0x010 Depth : 4
+0x012 MaximumDepth : 0x20
+0x014 TotalAllocates : 0xbe
+0x018 AllocateMisses : 0x7f
+0x018 AllocateHits : 0x7f
+0x01c TotalFrees : 0x5a
+0x020 FreeMisses : 0x1a
+0x020 FreeHits : 0x1a
+0x024 Type : 0 ( NonPagedPool )
+0x028 Tag : 0x73707249
+0x02c Size : 0x118
+0x030 AllocateEx : 0xfffff800`029c30e0
+0x030 Allocate : 0xfffff800`029c30e0
+0x038 FreeEx : 0xfffff800`029c30d0
+0x038 Free : 0xfffff800`029c30d0
+0x040 ListEntry : _LIST_ENTRY [ 0xfffff800`02a147c0 - 0xfffff800`02a148c0 ]
+0x000 Flink : 0xfffff800`02a147c0
+0x008 Blink : 0xfffff800`02a148c0
+0x050 LastTotalAllocates : 0xbe
+0x054 LastAllocateMisses : 0x7f
+0x054 LastAllocateHits : 0x7f
+0x058 Future :
[00] 0
[01] 0
lkd> !slist fffff80002a14800
SLIST HEADER:
+0x000 Header16.Sequence : 40
+0x000 Header16.Depth : 1
SLIST CONTENTS:
fffffa8001e83b10 0000000000000000 0000000000000000
0000000000000404 0000000000000000
Sorry if some of the formatting is lost. Essentially, this should be a lookaside list containing chunks that are all the same size, 0x118 (sizeof(_IRP) + sizeof(_IO_STACK_LOCATION)).
However, I'm not entirely sure how the list is actually put together: should this be a singly linked list of memory chunks, or am I reading all of it incorrectly?

In the case of the small IRP lookaside list on Win7 x86 RTM:
lkd> !lookaside iopsmallirplookasidelist
Lookaside "" # 82d5ffc0 "Irps"
....
lkd> dt _SINGLE_LIST_ENTRY 82d5ffc0
nt!_SINGLE_LIST_ENTRY
+0x000 Next : 0x86737e30 _SINGLE_LIST_ENTRY
....
lkd> !pool 0x86737e30
Pool page 86737e30 region is Nonpaged pool
*86737e28 size: a0 previous size: 48 (Allocated) *Irp
Pooltag Irp : Io, IRP packets
The size of the memory chunk is 0xa0 bytes:
lkd> ?? sizeof(_pool_header)+sizeof(_single_list_entry)+sizeof(_irp)+sizeof(_io_stack_location)
unsigned int 0xa0
which includes the pool header, the list pointer, the IRP, and the I/O stack location.
Minor update:
Author: Tarjei Mandt (aka @kernelpool)
In the _GENERAL_LOOKASIDE structure, SingleListHead.Next points to the first free pool chunk on the singly linked lookaside list. The size of the lookaside list is limited by the value of Depth, periodically adjusted by the balance set manager according to the number of hits and misses on the lookaside list. Hence, a frequently used lookaside list will have a larger Depth value than an infrequently used list. The initial Depth is 4 (nt!ExMinimumLookasideDepth), with the maximum being MaximumDepth (256)...more
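Putting the two dumps together: on x64, SingleListHead at +0x000 overlays the 16-byte _SLIST_HEADER, so the raw value 0x00000000`00400001 (or 0x220001 in the first dump) is not a pointer at all. It is the packed Depth/Sequence word, and the address of the first free entry lives in Header16.NextEntry, stored shifted right by 4 bits. Here is a minimal user-mode sketch of the decoding, using the values from the dump above; the field layout is taken from the dt output and the structure name is made up for illustration:
#include <stdio.h>
#include <stdint.h>

/* Sketch: decode the two 64-bit words of the x64 16-byte SLIST header,
 * following the Header16 layout shown in the dt output above. */
typedef struct {
    uint64_t lo;   /* +0x000: Depth (bits 0-15), Sequence (bits 16-63) */
    uint64_t hi;   /* +0x008: HeaderType (0), Init (1), Reserved (2-3), NextEntry (4-63) */
} SLIST_HEADER_X64_SKETCH;

int main(void)
{
    SLIST_HEADER_X64_SKETCH h = { 0x0000000000400001ULL,    /* Alignment from the dump */
                                  0xfffffa8001e83b11ULL };  /* Region from the dump    */

    unsigned depth     = (unsigned)(h.lo & 0xFFFF);   /* 0x1  */
    uint64_t sequence  = h.lo >> 16;                  /* 0x40 */
    uint64_t nextEntry = h.hi >> 4;                   /* pointer is stored >> 4 */
    uint64_t first     = nextEntry << 4;              /* address of the first entry */

    printf("Depth=%#x Sequence=%#llx First=%#llx\n",
           depth,
           (unsigned long long)sequence,
           (unsigned long long)first);
    /* Prints First=0xfffffa8001e83b10, which matches the !slist output above,
     * so 0x400001 is just Depth/Sequence, not a broken pointer. */
    return 0;
}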

SINGLE_LIST_ENTRY implements intrusive linked lists. Look at struct list_head, which offers similar functionality in the Linux kernel.
As for the .Next member, it really is a pointer to a SINGLE_LIST_ENTRY that is most likely embedded inside another struct.
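To make the intrusive idea concrete, here is a small user-mode sketch; MY_PACKET and its fields are made up for illustration. The SINGLE_LIST_ENTRY lives inside the payload, Next always points at another embedded entry, and CONTAINING_RECORD recovers the owning structure from the entry address:
#include <stdio.h>
#include <stddef.h>

/* The link field is embedded in the payload ("intrusive"); CONTAINING_RECORD
 * turns a pointer to the embedded entry back into a pointer to the payload. */
typedef struct _SINGLE_LIST_ENTRY {
    struct _SINGLE_LIST_ENTRY *Next;
} SINGLE_LIST_ENTRY;

#define CONTAINING_RECORD(addr, type, field) \
    ((type *)((char *)(addr) - offsetof(type, field)))

typedef struct _MY_PACKET {          /* made-up payload type */
    int Id;
    SINGLE_LIST_ENTRY Link;          /* embedded list entry */
} MY_PACKET;

int main(void)
{
    SINGLE_LIST_ENTRY head = { NULL };   /* list header: Next == NULL means empty */
    MY_PACKET a = { 1, { NULL } };
    MY_PACKET b = { 2, { NULL } };

    /* Push at the front: the header's Next ends up pointing at the entry
     * that lives inside the payload, not at the payload itself. */
    a.Link.Next = head.Next; head.Next = &a.Link;
    b.Link.Next = head.Next; head.Next = &b.Link;

    for (SINGLE_LIST_ENTRY *e = head.Next; e != NULL; e = e->Next) {
        MY_PACKET *p = CONTAINING_RECORD(e, MY_PACKET, Link);
        printf("packet %d\n", p->Id);
    }
    return 0;
}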

Related

DPC_WATCHDOG_VIOLATION (133/1) Potentially related to NdisFIndicateReceiveNetBufferLists?

We have an NDIS LWF driver, and on a single machine we get a DPC_WATCHDOG_VIOLATION 133/1 bugcheck when the user tries to connect to their VPN to reach the internet. This could be related to our call to NdisFIndicateReceiveNetBufferLists, as we raise the IRQL to DISPATCH_LEVEL before calling it (and lower it to its previous value afterward), and that call does appear in the !dpcwatchdog output shown below. We do this as a workaround for another bug, explained here:
IRQL_UNEXPECTED_VALUE BSOD after NdisFIndicateReceiveNetBufferLists?
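For reference, the indication path in question looks roughly like this; this is a minimal sketch of the pattern described above, not the actual driver code, and the function and parameter names are placeholders:
#include <ndis.h>

/* Sketch: raise to DISPATCH_LEVEL around the indication and tell NDIS
 * about it via NDIS_RECEIVE_FLAGS_DISPATCH_LEVEL. */
static VOID
IndicateAtDispatch(
    _In_ NDIS_HANDLE FilterHandle,
    _In_ PNET_BUFFER_LIST NetBufferLists,
    _In_ NDIS_PORT_NUMBER PortNumber,
    _In_ ULONG NumberOfNetBufferLists,
    _In_ ULONG ReceiveFlags)
{
    KIRQL oldIrql;

    KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);

    NdisFIndicateReceiveNetBufferLists(
        FilterHandle,
        NetBufferLists,
        PortNumber,
        NumberOfNetBufferLists,
        ReceiveFlags | NDIS_RECEIVE_FLAGS_DISPATCH_LEVEL);

    KeLowerIrql(oldIrql);   /* restore whatever IRQL we were called at */
}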
Now this is the bugcheck:
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000001, The system cumulatively spent an extended period of time at
DISPATCH_LEVEL or above. The offending component can usually be
identified with a stack trace.
Arg2: 0000000000001e00, The watchdog period.
Arg3: fffff805422fb320, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains
additional information regarding the cumulative timeout
Arg4: 0000000000000000
STACK_TEXT:
nt!KeBugCheckEx
nt!KeAccumulateTicks+0x1846b2
nt!KiUpdateRunTime+0x5d
nt!KiUpdateTime+0x4a1
nt!KeClockInterruptNotify+0x2e3
nt!HalpTimerClockInterrupt+0xe2
nt!KiCallInterruptServiceRoutine+0xa5
nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
nt!KiInterruptDispatchNoLockNoEtw+0x37
nt!KxWaitForSpinLockAndAcquire+0x2c
nt!KeAcquireSpinLockAtDpcLevel+0x5c
wanarp!WanNdisReceivePackets+0x4bb
ndis!ndisMIndicateNetBufferListsToOpen+0x141
ndis!ndisMTopReceiveNetBufferLists+0x3f0e4
ndis!ndisCallReceiveHandler+0x61
ndis!ndisInvokeNextReceiveHandler+0x1df
ndis!NdisMIndicateReceiveNetBufferLists+0x104
ndiswan!IndicateRecvPacket+0x596
ndiswan!ApplyQoSAndIndicateRecvPacket+0x20b
ndiswan!ProcessPPPFrame+0x16f
ndiswan!ReceivePPP+0xb3
ndiswan!ProtoCoReceiveNetBufferListChain+0x442
ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xf6
ndis!NdisMCoIndicateReceiveNetBufferLists+0x11
raspptp!CallIndicateReceived+0x210
raspptp!CallProcessRxNBLs+0x199
ndis!ndisDispatchIoWorkItem+0x12
nt!IopProcessWorkItem+0x135
nt!ExpWorkerThread+0x105
nt!PspSystemThreadStartup+0x55
nt!KiStartSystemThread+0x28
SYMBOL_NAME: wanarp!WanNdisReceivePackets+4bb
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: wanarp
IMAGE_NAME: wanarp.sys
The following is the output of !dpcwatchdog, but I still can't find what is causing this bugcheck, or which function is spending too much time at DISPATCH_LEVEL. Could it be related to the spin locking done by wanarp, i.e. a bug in wanarp? Note that we don't use any spin locks in our driver, and raising the IRQL ourselves should not cause any issue, as it is very common for NDIS receive indications to be made at DISPATCH_LEVEL.
So how can I find the root cause of this bugcheck? There are no other third-party LWF drivers in the NDIS stack.
3: kd> !dpcwatchdog
All durations are in seconds (1 System tick = 15.625000 milliseconds)
Circular Kernel Context Logger history: !logdump 0x2
DPC and ISR stats: !intstats /d
--------------------------------------------------
CPU#0
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
dpcs: no pending DPCs found
--------------------------------------------------
CPU#1
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
1: Normal : 0xfffff80542220e00 0xfffff805418dbf10 nt!PpmCheckPeriodicStart
1: Normal : 0xfffff80542231d40 0xfffff8054192c730 nt!KiBalanceSetManagerDeferredRoutine
1: Normal : 0xffffbd0146590868 0xfffff80541953200 nt!KiEntropyDpcRoutine
DPC Watchdog Captures Analysis for CPU #1.
DPC Watchdog capture size: 641 stacks.
Number of unique stacks: 1.
No common functions detected!
The captured stacks seem to indicate that only a single DPC or generic function is the culprit.
Try to analyse what other processors were doing at the time of the following reference capture:
CPU #1 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec
# RetAddr Call Site
00 fffff805418d8991 nt!KiUpdateRunTime+0x5D
01 fffff805418d2803 nt!KiUpdateTime+0x4A1
02 fffff805418db1c2 nt!KeClockInterruptNotify+0x2E3
03 fffff80541808a45 nt!HalpTimerClockInterrupt+0xE2
04 fffff805419fab9a nt!KiCallInterruptServiceRoutine+0xA5
05 fffff805419fb107 nt!KiInterruptSubDispatchNoLockNoEtw+0xFA
06 fffff805418a9a9c nt!KiInterruptDispatchNoLockNoEtw+0x37
07 fffff805418da3cc nt!KxWaitForSpinLockAndAcquire+0x2C
08 fffff8054fa614cb nt!KeAcquireSpinLockAtDpcLevel+0x5C
09 fffff80546ba1eb1 wanarp!WanNdisReceivePackets+0x4BB
0a fffff80546be0b84 ndis!ndisMIndicateNetBufferListsToOpen+0x141
0b fffff80546ba7ef1 ndis!ndisMTopReceiveNetBufferLists+0x3F0E4
0c fffff80546bddfef ndis!ndisCallReceiveHandler+0x61
0d fffff80546ba4a94 ndis!ndisInvokeNextReceiveHandler+0x1DF
0e fffff8057c32d17e ndis!NdisMIndicateReceiveNetBufferLists+0x104
0f fffff8057c30d6c7 ndiswan!IndicateRecvPacket+0x596
10 fffff8057c32d56b ndiswan!ApplyQoSAndIndicateRecvPacket+0x20B
11 fffff8057c32d823 ndiswan!ProcessPPPFrame+0x16F
12 fffff8057c308e62 ndiswan!ReceivePPP+0xB3
13 fffff80546c5c006 ndiswan!ProtoCoReceiveNetBufferListChain+0x442
14 fffff80546c5c2d1 ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xF6
15 fffff8057c2b0064 ndis!NdisMCoIndicateReceiveNetBufferLists+0x11
16 fffff8057c2b06a9 raspptp!CallIndicateReceived+0x210
17 fffff80546bd9dc2 raspptp!CallProcessRxNBLs+0x199
18 fffff80541899645 ndis!ndisDispatchIoWorkItem+0x12
19 fffff80541852b65 nt!IopProcessWorkItem+0x135
1a fffff80541871d25 nt!ExpWorkerThread+0x105
1b fffff80541a00778 nt!PspSystemThreadStartup+0x55
1c ---------------- nt!KiStartSystemThread+0x28
--------------------------------------------------
CPU#2
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
2: Normal : 0xffffbd01467f0868 0xfffff80541953200 nt!KiEntropyDpcRoutine
DPC Watchdog Captures Analysis for CPU #2.
DPC Watchdog capture size: 641 stacks.
Number of unique stacks: 1.
No common functions detected!
The captured stacks seem to indicate that only a single DPC or generic function is the culprit.
Try to analyse what other processors were doing at the time of the following reference capture:
CPU #2 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec
# RetAddr Call Site
00 fffff805418d245a nt!KeClockInterruptNotify+0x453
01 fffff80541808a45 nt!HalpTimerClockIpiRoutine+0x1A
02 fffff805419fab9a nt!KiCallInterruptServiceRoutine+0xA5
03 fffff805419fb107 nt!KiInterruptSubDispatchNoLockNoEtw+0xFA
04 fffff805418a9a9c nt!KiInterruptDispatchNoLockNoEtw+0x37
05 fffff805418a9a68 nt!KxWaitForSpinLockAndAcquire+0x2C
06 fffff8054fa611cb nt!KeAcquireSpinLockRaiseToDpc+0x88
07 fffff80546ba1eb1 wanarp!WanNdisReceivePackets+0x1BB
08 fffff80546be0b84 ndis!ndisMIndicateNetBufferListsToOpen+0x141
09 fffff80546ba7ef1 ndis!ndisMTopReceiveNetBufferLists+0x3F0E4
0a fffff80546bddfef ndis!ndisCallReceiveHandler+0x61
0b fffff80546be3a81 ndis!ndisInvokeNextReceiveHandler+0x1DF
0c fffff80546ba804e ndis!ndisFilterIndicateReceiveNetBufferLists+0x3C611
0d fffff8054e384d77 ndis!NdisFIndicateReceiveNetBufferLists+0x6E
0e fffff8054e3811a9 ourdriver+0x4D70
0f fffff80546ba7d40 ourdriver+0x11A0
10 fffff8054182a6b5 ndis!ndisDummyIrpHandler+0x100
11 fffff80541c164c8 nt!IofCallDriver+0x55
12 fffff80541c162c7 nt!IopSynchronousServiceTail+0x1A8
13 fffff80541c15646 nt!IopXxxControlFile+0xC67
14 fffff80541a0aab5 nt!NtDeviceIoControlFile+0x56
15 ---------------- nt!KiSystemServiceCopyEnd+0x25
--------------------------------------------------
CPU#3
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
dpcs: no pending DPCs found
Target machine version: Windows 10 Kernel Version 19041 MP (4 procs)
Also note that we pass the NDIS_RECEIVE_FLAGS_DISPATCH_LEVEL flag to NdisFIndicateReceiveNetBufferLists if the current IRQL is DISPATCH_LEVEL.
Edit1:
Below is also the output of !locks, !qlocks, and !ready. The contention count on one of the resources is 49135; is this normal or too high? Could it be related to our issue? The threads waiting on it or owning it belong to normal processes such as chrome, csrss, etc.
3: kd> !kdexts.locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks.
Resource # nt!ExpTimeRefreshLock (0xfffff80542219440) Exclusively owned
Contention Count = 17
Threads: ffffcf8ce9dee640-01<*>
KD: Scanning for held locks.....
Resource # 0xffffcf8cde7f59f8 Shared 1 owning threads
Contention Count = 62
Threads: ffffcf8ce84ec080-01<*>
KD: Scanning for held locks...............................................................................................
Resource # 0xffffcf8ce08d0890 Exclusively owned
Contention Count = 49135
NumberOfSharedWaiters = 1
NumberOfExclusiveWaiters = 6
Threads: ffffcf8cf18e3080-01<*> ffffcf8ce3faf080-01
Threads Waiting On Exclusive Access:
ffffcf8ceb6ce080 ffffcf8ce1d20080 ffffcf8ce77f1080 ffffcf8ce92f4080
ffffcf8ce1d1f0c0 ffffcf8ced7c6080
KD: Scanning for held locks.
Resource # 0xffffcf8ce08d0990 Shared 1 owning threads
Threads: ffffcf8cf18e3080-01<*>
KD: Scanning for held locks.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Resource # 0xffffcf8ceff46350 Shared 1 owning threads
Threads: ffffcf8ce6de8080-01<*>
KD: Scanning for held locks......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Resource # 0xffffcf8cf0cade50 Exclusively owned
Contention Count = 3
Threads: ffffcf8ce84ec080-01<*>
KD: Scanning for held locks.........................
Resource # 0xffffcf8cf0f76180 Shared 1 owning threads
Threads: ffffcf8ce83dc080-02<*>
KD: Scanning for held locks.......................................................................................................................................................................................................................................................
Resource # 0xffffcf8cf1875cb0 Shared 1 owning threads
Contention Count = 3
Threads: ffffcf8ce89db040-02<*>
KD: Scanning for held locks.
Resource # 0xffffcf8cf18742d0 Shared 1 owning threads
Threads: ffffcf8cee5e1080-02<*>
KD: Scanning for held locks....................................................................................
Resource # 0xffffcf8cdceeece0 Shared 2 owning threads
Contention Count = 4
Threads: ffffcf8ce3a1c080-01<*> ffffcf8ce5625040-01<*>
Resource # 0xffffcf8cdceeed48 Shared 1 owning threads
Threads: ffffcf8ce5625043-02<*> *** Actual Thread ffffcf8ce5625040
KD: Scanning for held locks...
Resource # 0xffffcf8cf1d377d0 Exclusively owned
Threads: ffffcf8cf0ff3080-02<*>
KD: Scanning for held locks....
Resource # 0xffffcf8cf1807050 Exclusively owned
Threads: ffffcf8ce84ec080-01<*>
KD: Scanning for held locks......
245594 total locks, 13 locks currently held
3: kd> !qlocks
Key: O = Owner, 1-n = Wait order, blank = not owned/waiting, C = Corrupt
Processor Number
Lock Name 0 1 2 3
KE - Unused Spare
MM - Unused Spare
MM - Unused Spare
MM - Unused Spare
CC - Vacb
CC - Master
EX - NonPagedPool
IO - Cancel
CC - Unused Spare
IO - Vpb
IO - Database
IO - Completion
NTFS - Struct
AFD - WorkQueue
CC - Bcb
MM - NonPagedPool
3: kd> !ready
KSHARED_READY_QUEUE fffff8053f1ada00: (00) ****------------------------------------------------------------
SharedReadyQueue fffff8053f1ada00: No threads in READY state
Processor 0: No threads in READY state
Processor 1: Ready Threads at priority 15
THREAD ffffcf8ce9dee640 Cid 2054.2100 Teb: 000000fab7bca000 Win32Thread: 0000000000000000 READY on processor 1
Processor 2: No threads in READY state
Processor 3: No threads in READY state
3: kd> dt nt!_ERESOURCE 0xffffcf8ce08d0890
+0x000 SystemResourcesList : _LIST_ENTRY [ 0xffffcf8c`e08d0610 - 0xffffcf8c`e08cf710 ]
+0x010 OwnerTable : 0xffffcf8c`ee6e8210 _OWNER_ENTRY
+0x018 ActiveCount : 0n1
+0x01a Flag : 0xf86
+0x01a ReservedLowFlags : 0x86 ''
+0x01b WaiterPriority : 0xf ''
+0x020 SharedWaiters : 0xffffae09`adcae8e0 Void
+0x028 ExclusiveWaiters : 0xffffae09`a9aabea0 Void
+0x030 OwnerEntry : _OWNER_ENTRY
+0x040 ActiveEntries : 1
+0x044 ContentionCount : 0xbfef
+0x048 NumberOfSharedWaiters : 1
+0x04c NumberOfExclusiveWaiters : 6
+0x050 Reserved2 : (null)
+0x058 Address : (null)
+0x058 CreatorBackTraceIndex : 0
+0x060 SpinLock : 0
3: kd> dx -id 0,0,ffffcf8cdcc92040 -r1 (*((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8ce08d08c0))
(*((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8ce08d08c0)) [Type: _OWNER_ENTRY]
[+0x000] OwnerThread : 0xffffcf8cf18e3080 [Type: unsigned __int64]
[+0x008 ( 0: 0)] IoPriorityBoosted : 0x0 [Type: unsigned long]
[+0x008 ( 1: 1)] OwnerReferenced : 0x0 [Type: unsigned long]
[+0x008 ( 2: 2)] IoQoSPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 (31: 3)] OwnerCount : 0x1 [Type: unsigned long]
[+0x008] TableSize : 0xc [Type: unsigned long]
3: kd> dx -id 0,0,ffffcf8cdcc92040 -r1 ((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8cee6e8210)
((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8cee6e8210) : 0xffffcf8cee6e8210 [Type: _OWNER_ENTRY *]
[+0x000] OwnerThread : 0x0 [Type: unsigned __int64]
[+0x008 ( 0: 0)] IoPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 ( 1: 1)] OwnerReferenced : 0x1 [Type: unsigned long]
[+0x008 ( 2: 2)] IoQoSPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 (31: 3)] OwnerCount : 0x0 [Type: unsigned long]
[+0x008] TableSize : 0x7 [Type: unsigned long]
Thanks for reporting this. I've tracked this down to an OS bug: there's a deadlock in wanarp. This issue appears to affect every version of the OS going back to Windows Vista.
I've filed internal issue task.ms/42393356 to track this: if you have a Microsoft support contract, your rep can get you status updates on that issue.
Meanwhile, you can partially work around this issue by either:
Indicating 1 packet at a time (NumberOfNetBufferLists==1); or
Indicating on a single CPU at a time
The bug in wanarp is exposed when 2 or more CPUs collectively process 3 or more NBLs at the same time. So either workaround would avoid the trigger conditions.
Depending on how much bandwidth you're pushing through this network interface, those options could be rather bad for CPU/battery/throughput. So please try to avoid pessimizing batching unless it's really necessary. (For example, you could make this an option that's off-by-default, unless the customer specifically uses wanarp.)
Note that you cannot fully prevent the issue yourself. Other drivers in the stack, including NDIS itself, have the right to group packets together, which would have the side effect of re-batching the packets that you carefully un-batched. However, I believe that you can make a statistically significant dent in the crashes if you just indicate 1 NBL at a time, or indicate multiple NBLs on 1 CPU at a time.
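A minimal sketch of the first workaround, splitting the received chain so that each call indicates exactly one NBL; the function and parameter names are placeholders and error handling is omitted:
#include <ndis.h>

/* Sketch of workaround 1: indicate one NET_BUFFER_LIST per call
 * (NumberOfNetBufferLists == 1). */
static VOID
IndicateOneNblAtATime(
    _In_ NDIS_HANDLE FilterHandle,
    _In_ PNET_BUFFER_LIST NblChain,
    _In_ NDIS_PORT_NUMBER PortNumber,
    _In_ ULONG ReceiveFlags)
{
    while (NblChain != NULL) {
        PNET_BUFFER_LIST next = NET_BUFFER_LIST_NEXT_NBL(NblChain);

        NET_BUFFER_LIST_NEXT_NBL(NblChain) = NULL;   /* detach a single NBL */

        NdisFIndicateReceiveNetBufferLists(
            FilterHandle, NblChain, PortNumber,
            1,                      /* NumberOfNetBufferLists */
            ReceiveFlags);

        /* Unless NDIS_RECEIVE_FLAGS_RESOURCES is set, each NBL indicated here
         * comes back later through FilterReturnNetBufferLists. */
        NblChain = next;
    }
}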
Sorry this is happening to you again! wanarp is... a very old codebase.

PowerShell + MegaCLI - Making the output more readable

Looking for some help with making the output of a MegaCLI command a bit more readable.
The output is:
PS C:\Users\Administrator> C:\Users\Administrator\Downloads\8-04-07_MegaCLI\Win_CliKL_8.04.07\MegaCliKL -LDInfo -Lall -aAll
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :OS
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 558.375 GB
Mirror Data : 558.375 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only
Virtual Drive: 1 (Target Id: 1)
Name :Storage
RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0
Size : 7.275 TB
Parity Size : 0
State : Optimal
Strip Size : 64 KB
Number Of Drives : 4
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only
Exit Code: 0x00
The command I'm using is:
C:\Users\Administrator\Downloads\8-04-07_MegaCLI\Win_CliKL_8.04.07\MegaCliKL -LDInfo -Lall -aAll
How can I make that information a bit more readable?
I only actually need: Name, Raid Level, Size, Number of drives, State, and Span Depth.
It has to be doable in just PowerShell.
Thanks in advance for any help!
Zack
If "a bit more readable" means "reduce output merely to lines starting with listed items":
$MegaCliKL = & C:\Users\Administrator\Downloads\8-04-07_MegaCLI\Win_CliKL_8.04.07\MegaCliKL -LDInfo -Lall -aAll
$listedItems = '^\s*Name',
               'Raid Level',
               'Size',
               'Number of drives',
               'State',
               'Span Depth' -join '|^\s*'
$MegaCliKL -match $listedItems |
    ForEach-Object {
        if ( $_ -match '^\s*Name' ) {''} # line separator
        $_
    }
Output:
Name :OS
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 558.375 GB
State : Optimal
Number Of Drives : 2
Span Depth : 1
Name :Storage
RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0
Size : 7.275 TB
State : Optimal
Number Of Drives : 4
Span Depth : 1

What is nt!PsActiveProcessHead?

Background:
When using Volatility, the variable PsActiveProcessHead can be found by a KDBG scan (of a dead system), or in a Windows crash dump (again, a dead system) via the _DMP_HEADER.
On a live system, the address of this symbol can be found with:
lkd> x nt!PsActiveProcessHead
Question:
Which Windows kernel object/structure does the nt!PsActiveProcessHead variable belong to or refer to? (Which object/structure does this symbol point to?)
For example, ActiveProcessLinks, which is also a _LIST_ENTRY structure (like PsActiveProcessHead), belongs to the _EPROCESS object. Is there such an object for PsActiveProcessHead as well?
Yes, it is itself a _LIST_ENTRY (the list head), and it points into a doubly linked list, more precisely to _EPROCESS.ActiveProcessLinks.
Checking the doubly linked list pointed to by nt!PsActiveProcessHead:
0: kd> dt nt!_list_entry poi(nt!PsActiveProcessHead)
[ 0xffffc582`ca5c3328 - 0xfffff804`40c10680 ]
+0x000 Flink : 0xffffc582`ca5c3328 _LIST_ENTRY [ 0xffffc582`d11d1328 - 0xffffc582`ca4b15e8 ]
+0x008 Blink : 0xfffff804`40c10680 _LIST_ENTRY [ 0xffffc582`ca4b15e8 - 0xffffc582`edada368 ]
Next entry:
0: kd> dt nt!_list_entry poi(0xffffc582`ca5c3328)
[ 0xffffc582`d0023428 - 0xffffc582`ca5c3328 ]
+0x000 Flink : 0xffffc582`d0023428 _LIST_ENTRY [ 0xffffc582`d54243a8 - 0xffffc582`d11d1328 ]
+0x008 Blink : 0xffffc582`ca5c3328 _LIST_ENTRY [ 0xffffc582`d11d1328 - 0xffffc582`ca4b15e8 ]
Getting the offset at which ActiveProcessLinks is located in the _EPROCESS structure:
0: kd> ? @@c++(#FIELD_OFFSET(nt!_eprocess, ActiveProcessLinks))
Evaluate expression: 744 = 00000000`000002e8
Just confirming with the first two Flinks from the outputs above (note: we subtract the offset of ActiveProcessLinks from the address that we have, and then dump ImageFileName from the _EPROCESS structure). This proves that it really is pointing to ActiveProcessLinks in _EPROCESS:
0: kd> dt nt!_eprocess 0xffffc582`ca5c3328-@@c++(#FIELD_OFFSET(nt!_eprocess , ActiveProcessLinks)) ImageFileName
+0x450 ImageFileName : [15] "Registry"
0: kd> dt nt!_eprocess 0xffffc582`d0023428-@@c++(#FIELD_OFFSET(nt!_eprocess , ActiveProcessLinks)) ImageFileName
+0x450 ImageFileName : [15] "csrss.exe"
Dumping the whole list:
0: kd> !list "-t nt!_eprocess.ActiveProcessLinks.Flink -e -x \"dt nt!_eprocess ImageFileName\"(poi(nt!PsActiveProcessHead) - @@c++(#FIELD_OFFSET(nt!_eprocess, ActiveProcessLinks)))"
dt nt!_EPROCESS ImageFileName 0xffffc582ca4b1300
+0x450 ImageFileName : [15] "System"
dt nt!_EPROCESS ImageFileName 0xffffc582ca5c3040
+0x450 ImageFileName : [15] "Registry"
dt nt!_EPROCESS ImageFileName 0xffffc582d11d1040
+0x450 ImageFileName : [15] "smss.exe"
dt nt!_EPROCESS ImageFileName 0xffffc582d0023140
+0x450 ImageFileName : [15] "csrss.exe"
[...snip....]
So basically it is meant to be the list of currently active processes. It points to the doubly linked list formed by the _EPROCESS.ActiveProcessLinks entries.
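For comparison, this is roughly what the !list command above does, expressed as a user-mode C sketch over a stripped-down, hypothetical stand-in for _EPROCESS; real kernel code should not walk this list by hand, and the reduced structure plus the demo list in main are purely illustrative:
#include <stdio.h>
#include <stddef.h>

/* Stand-in for EPROCESS with only the two fields used here. In the real
 * structure on this build they sit at +0x2e8 (ActiveProcessLinks) and
 * +0x450 (ImageFileName). */
typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink;
    struct _LIST_ENTRY *Blink;
} LIST_ENTRY;

#define CONTAINING_RECORD(addr, type, field) \
    ((type *)((char *)(addr) - offsetof(type, field)))

typedef struct _EPROCESS_SKETCH {
    LIST_ENTRY ActiveProcessLinks;
    char ImageFileName[15];
} EPROCESS_SKETCH;

/* Same idea as the !list command: follow Flink from the head; each Flink
 * points at the ActiveProcessLinks field embedded in an EPROCESS, so
 * subtracting the field offset (the #FIELD_OFFSET step above) recovers
 * the EPROCESS itself. Stop when we are back at the head. */
static void DumpActiveProcesses(LIST_ENTRY *PsActiveProcessHead)
{
    for (LIST_ENTRY *e = PsActiveProcessHead->Flink;
         e != PsActiveProcessHead;
         e = e->Flink) {
        EPROCESS_SKETCH *p = CONTAINING_RECORD(e, EPROCESS_SKETCH, ActiveProcessLinks);
        printf("%.15s\n", p->ImageFileName);
    }
}

int main(void)
{
    /* Build a tiny fake list just so the sketch runs. */
    LIST_ENTRY head;
    EPROCESS_SKETCH p1 = { { NULL, NULL }, "System" };
    EPROCESS_SKETCH p2 = { { NULL, NULL }, "smss.exe" };

    head.Flink = &p1.ActiveProcessLinks;
    p1.ActiveProcessLinks.Flink = &p2.ActiveProcessLinks;
    p2.ActiveProcessLinks.Flink = &head;        /* circular list */
    DumpActiveProcesses(&head);
    return 0;
}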

Analysing dump: how to interpret Windbg command "!heap -srch <address>" output?

I'm analysing a dump of a Windows process; the dump has been created using procdump.
I'm wondering what some collections (CMap) of CAlarm objects look like (a CAlarm object is created by my own program).
In order to do this, I'm running the following commands:
1) What is the address of the type I'm interested in?
x /2 *!*CMap*CAlarm*`vftable*
00000001`3fc7b840 <application_name>!CMap<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >,wchar_t const * __ptr64,CAlarm * __ptr64,CAlarm * __ptr64>::`vftable'
2) Which objects are of this particular type?
!heap -srch 00000001`3fc7b840
skipping searching 00000000014d6660 allocation of size 000000000002ee08 greater than 0000000000010000
skipping searching 00000000003fdf00 allocation of size 0000000000025ff0 greater than 0000000000010000
skipping searching 000000000042bf00 allocation of size 000000000001fff0 greater than 0000000000010000
=> no results, so let's do something else:
!heap -srch `3fc7b840
This gives a lot of objects, something like:
_HEAP # 920000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
00000000009214a0 0002 0002 [00] 00000000009214b0 00010 - (busy)
...
_HEAP # 860000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
00000000008612e0 00ce 00ae [00] 00000000008612f0 00cd0 - (free)
Let's now use the UserPtr entry (let's start with the last one):
0:000> dt <application_name>!CMap<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >,wchar_t const *,CAlarm *,CAlarm *> 00000000008612f0
+0x000 __VFN_table : 0x00000000`00860158
+0x008 m_pHashTable : 0x00000000`00860158 -> 0x00000000`008612f0 CMap<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >,wchar_t const *,CAlarm *,CAlarm *>::CAssoc
+0x010 m_nHashTableSize : 0
+0x018 m_nCount : 0n0
+0x020 m_pFreeList : (null)
+0x028 m_pBlocks : (null)
+0x030 m_nBlockSize : 0n0
This looks reasonable, but is it correct? That's quite doubtful when looking at another entry:
0:000> dt <application_name>!CMap<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >,wchar_t const *,CAlarm *,CAlarm *> 00000000009214b0
+0x000 __VFN_table : (null)
+0x008 m_pHashTable : (null)
+0x010 m_nHashTableSize : 0
+0x018 m_nCount : 0n1153117455832760832
+0x020 m_pFreeList : (null)
+0x028 m_pBlocks : (null)
+0x030 m_nBlockSize : 0n0
=> No way my CMap has that many elements.
It looks like I'm using the x /2 result the wrong way as input to the !heap -srch command. Does anybody know the right way to do this?

How can I work out which process/thread owns the resource that my program is hanging on

I have a user-mode process which hangs when calling NtClose. That NtClose call hangs in the kernel while trying to acquire a lock; I believe it's the handle table lock. Here's the kernel part of the stack:
THREAD fffffa800bd4fb50 Cid 277c.21d8 Teb: 000007fffff80000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable
fffffa80047bad20 SynchronizationEvent
IRP List:
fffffa80049f49c0: (0006,0430) Flags: 00000404 Mdl: 00000000
Not impersonating
DeviceMap fffff8a000008bc0
Owning Process fffffa800c195060 Image: My_Service.exe
Attached Process N/A Image: N/A
Wait Start TickCount 455527 Ticks: 223 (0:00:00:03.478)
Context Switch Count 1703
UserTime 00:00:00.015
KernelTime 00:00:00.109
Win32 Start Address 0x000000013f509190
Stack Init fffff8800c3e0fb0 Current fffff8800c3e0790
Base fffff8800c3e1000 Limit fffff8800c3db000 Call 0
Priority 10 BasePriority 8 UnusualBoost 2 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
fffff880`0c3e07d0 fffff800`02ccc972 : fffffa80`0bd4fb50 fffffa80`0bd4fb50 fffff880`00000000 00000000`00000003 : nt!KiSwapContext+0x7a
fffff880`0c3e0910 fffff800`02cddd8f : 00000000`00000000 fffff880`0af2d400 fffff880`00000068 fffff880`0af2d408 : nt!KiCommitThreadWait+0x1d2
fffff880`0c3e09a0 fffff800`02cb7086 : 00000000`00000000 fffffa80`0000001b 00000000`00000000 fffff880`009eb100 : nt!KeWaitForSingleObject+0x19f
fffff880`0c3e0a40 fffff800`02cdc1ac : ffffffff`fd9da600 fffffa80`047bad20 fffffa80`03e1d238 00000000`00000200 : nt!ExpWaitForResource+0xae
fffff880`0c3e0ab0 fffff880`016e6f88 : 00000000`00000000 fffff8a0`0d555010 fffff880`0af2d840 fffff8a0`0a71e576 : nt!ExAcquireResourceExclusiveLite+0x14f
fffff880`0c3e0b20 fffff880`01652929 : fffffa80`06fc72c0 fffffa80`049f49c0 fffff880`0af2d550 fffffa80`0bd4fb50 : Ntfs!NtfsCommonCleanup+0x2705
fffff880`0c3e0f30 fffff800`02ccea37 : fffff880`0af2d550 00000000`00000000 00000000`00000000 00000000`00000000 : Ntfs!NtfsCommonCleanupCallout+0x19
fffff880`0c3e0f60 fffff800`02cce9f8 : 00000000`00000000 00000000`00000000 fffff880`0c3e1000 fffff800`02ce2e42 : nt!KySwitchKernelStackCallout+0x27 (TrapFrame # fffff880`0c3e0e20)
fffff880`0af2d420 fffff800`02ce2e42 : 00000000`0000277c 00000000`00000002 00000000`00000002 fffff880`042f8965 : nt!KiSwitchKernelStackContinue
fffff880`0af2d440 fffff880`016529a2 : fffff880`01652910 00000000`00000000 fffff880`0af2d800 00000000`00000000 : nt!KeExpandKernelStackAndCalloutEx+0x2a2
fffff880`0af2d520 fffff880`016f3894 : fffff880`0af2d5f0 fffff880`0af2d5f0 fffff880`0af2d5f0 fffff880`0af2d760 : Ntfs!NtfsCommonCleanupOnNewStack+0x42
fffff880`0af2d590 fffff880`01145bcf : fffff880`0af2d5f0 fffffa80`049f49c0 fffffa80`049f4da8 fffffa80`03ef5010 : Ntfs!NtfsFsdCleanup+0x144
fffff880`0af2d800 fffff880`011446df : fffffa80`04e239a0 00000000`00000000 fffffa80`048cb100 fffffa80`049f49c0 : fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
fffff880`0af2d890 fffff800`02fe3fef : fffffa80`049f49c0 fffffa80`0c195060 00000000`00000000 fffffa80`04aa93d0 : fltmgr!FltpDispatch+0xcf
fffff880`0af2d8f0 fffff800`02fd1fe4 : 00000000`00000000 fffffa80`0c195060 fffff880`01165cb0 fffff800`02c64000 : nt!IopCloseFile+0x11f
fffff880`0af2d980 fffff800`02fd1da1 : fffffa80`0c195060 fffffa80`00000001 fffff8a0`18385220 00000000`00000000 : nt!ObpDecrementHandleCount+0xb4
fffff880`0af2da00 fffff800`02fd2364 : 00000000`0000cae8 fffffa80`0c195060 fffff8a0`18385220 00000000`0000cae8 : nt!ObpCloseHandleTableEntry+0xb1
fffff880`0af2da90 fffff800`02cd61d3 : fffffa80`0bd4fb50 fffff880`0af2db60 00000001`3f64afd8 00000000`00000000 : nt!ObpCloseHandle+0x94
My question is: how can I work out, using WinDbg, which other process/thread on the system has acquired this kernel resource? (By the way, I'm looking at a full system dump from a customer; I don't have this reproduced in a debugger.)
So the answer was to use !kdexts.locks; this shows that the thread above was deadlocked with a System thread belonging to one of Symantec's antivirus drivers.
The locks causing the problem here were kernel ERESOURCE locks. There are two versions of !locks, I've discovered: one for user-mode critical sections and the other for kernel-mode locks.
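For context, an ERESOURCE is the kernel reader/writer lock that !kdexts.locks reports on, and a thread stuck in ExAcquireResourceExclusiveLite (as in the stack above) is waiting for the listed owner to call ExReleaseResourceLite. A minimal sketch of the usual acquire/release pattern, purely illustrative and not tied to any driver in this dump:
#include <ntddk.h>

/* The ERESOURCE must live in nonpaged memory (a driver global is fine). */
static ERESOURCE g_Resource;

VOID ExampleInit(VOID)
{
    /* Check the returned NTSTATUS in real code. */
    ExInitializeResourceLite(&g_Resource);
}

VOID ExampleUse(VOID)
{
    /* Resource APIs must be called from within a critical region. */
    KeEnterCriticalRegion();
    ExAcquireResourceExclusiveLite(&g_Resource, TRUE);   /* TRUE: block until acquired */

    /* ... touch whatever the resource protects ... */

    ExReleaseResourceLite(&g_Resource);
    KeLeaveCriticalRegion();
}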
