PCI driver failed: Detected PCI bus error on device - linux-kernel

I'm trying to do reset on a specific pci device using my own customize driver on ppc64(power pc) machine.
This driver works on another ppc64 machine.
This is the function that responsible to to do this action.
I removed several code lines to emphasize the important flow.
int reset_device(void)
{
pdev = g_reset_info.devs[ix];
err = pci_enable_device(pdev);
if (err) {
return err;
}
pci_set_master(pdev);
err = pci_save_state(pdev);
if (err) {
return err;
}
pdev = g_reset_info.devs[ix];
err = pci_set_pcie_reset_state(pdev, pcie_hot_reset);
if (err) {
return err;
}
msleep(jiffies_to_msecs(HZ/2));
msleep(jiffies_to_msecs(HZ/2));
pdev = g_reset_info.devs[ix];
err = pci_set_pcie_reset_state(pdev, pcie_deassert_reset);
if (err) {
return err;
}
pdev = g_reset_info.devs[ix];
pci_restore_state(pdev);
msleep(jiffies_to_msecs(HZ/2));
msleep(jiffies_to_msecs(HZ/2));
return 0;
}
This is the output which came from the dmesg:
mst_ppc_pci_reset_driver reset_device 63 Send hot reset to device: 0000:50:00.0
mst_ppc_pci_reset_driver reset_device 81 Deassert device: 0000:50:00.0
Call Trace:
[c000000186f92fe0] [c0000000000155ac] .show_stack+0x6c/0x198 (unreliable)
[c000000186f93090] [c000000000076a8c] .eeh_dn_check_failure+0x354/0x3f0
[c000000186f93150] [c000000000029b7c] .rtas_read_config+0x13c/0x198
[c000000186f931f0] [c00000000039c8d0] .pci_bus_read_config_word+0xa0/0xf8
[c000000186f932b0] [c0000000003a2730] .pci_find_capability+0x40/0xd0
[c000000186f93360] [c0000000003a2b6c] .pci_restore_pcie_state+0x54/0x2e8
[c000000186f93410] [c0000000003a501c] .pci_restore_state+0x84/0x1b8
[c000000186f934d0] [d000000003810384] .reset_device+0x184/0x430 [mst_ppc_pci_reset]
[c000000186f93590] [c0000000003a6254] .local_pci_probe+0x7c/0xf8
[c000000186f93620] [c0000000003a63a8] .__pci_device_probe+0xd8/0x128
[c000000186f936d0] [c0000000003a72a8] .pci_device_probe+0x38/0x68
[c000000186f93760] [c0000000004d0bd8] .really_probe+0xb0/0x288
[c000000186f93810] [c0000000004d0e4c] .driver_probe_device+0x9c/0x110
[c000000186f938a0] [c0000000004d0fbc] .__driver_attach+0xfc/0x100
[c000000186f93930] [c0000000004cfee4] .bus_for_each_dev+0xc4/0x118
[c000000186f939e0] [c0000000004d08a8] .driver_attach+0x28/0x40
[c000000186f93a60] [c0000000004cf3b0] .bus_add_driver+0x190/0x340
[c000000186f93b10] [c0000000004d1950] .driver_register+0x98/0x1b8
[c000000186f93bb0] [c0000000003a760c] .__pci_register_driver+0x64/0x140
[c000000186f93c50] [d0000000038107c0] .init+0x28/0x400 [mst_ppc_pci_reset]
[c000000186f93cd0] [c00000000000ab68] .do_one_initcall+0x68/0x1e0
[c000000186f93d90] [c00000000010893c] .SyS_init_module+0xcc/0x218
[c000000186f93e30] [c0000000000098ec] syscall_exit+0x0/0x40
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0xf (was 0xffffffff, writing 0xff)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0xe (was 0xffffffff, writing 0x0)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0xd (was 0xffffffff, writing 0x40)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0xc (was 0xffffffff, writing 0x0)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0xb (was 0xffffffff, writing 0x6115b3)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0xa (was 0xffffffff, writing 0x0)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0x9 (was 0xffffffff, writing 0x0)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0x8 (was 0xffffffff, writing 0x0)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0x7 (was 0xffffffff, writing 0x0)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0x6 (was 0xffffffff, writing 0x0)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0x5 (was 0xffffffff, writing 0x0)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0x4 (was 0xffffffff, writing 0x9e00000c)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0x3 (was 0xffffffff, writing 0x20)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0x2 (was 0xffffffff, writing 0x2070000)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0x1 (was 0xffffffff, writing 0x100146)
mst_ppc_pci_reset_driver 0000:50:00.0: restoring config space at offset 0x0 (was 0xffffffff, writing 0x101115b3)
EEH: Detected PCI bus error on device <null>
EEH: This PCI device has failed 1 times in the last hour:
EEH: Bus location=U78CB.001.WZS02VY-P1-C11-T1 driver=mst_ppc_pci_reset_driver pci addr=0000:50:00.0
EEH: Device location=U78CB.001.WZS02VY-P1-C11-T1 driver= pci addr=<null>
EEH: of node=/pci#800000020000013/pci15b3,61#0
EEH: PCI device/vendor: 101115b3
EEH: PCI cmd/status register: 00100140
EEH: PCI-E capabilities and status follow:
EEH: PCI-E 00: 0002c010
EEH: PCI-E 01: 19008fe2
EEH: PCI-E 02: 0000595e
EEH: PCI-E 03: 0043f103
EEH: PCI-E 04: 10830000
EEH: PCI-E 05: 00000000
EEH: PCI-E 06: 00000000
EEH: PCI-E 07: 00000000
EEH: PCI-E 08: 00000000
EEH: PCI-E AER capability register set follows:
EEH: PCI-E AER 00: 00010001
EEH: PCI-E AER 01: 00000000
EEH: PCI-E AER 02: 00000000
EEH: PCI-E AER 03: 00062010
EEH: PCI-E AER 04: 00000000
EEH: PCI-E AER 05: 00002000
EEH: PCI-E AER 06: 000001e4
EEH: PCI-E AER 07: 00000000
EEH: PCI-E AER 08: 00000000
EEH: PCI-E AER 09: 00000000
EEH: PCI-E AER 0a: 00000000
EEH: PCI-E AER 0b: 00000000
EEH: PCI-E AER 0c: 00000000
EEH: PCI-E AER 0d: 00000000
RTAS: event: 2736, Type: Platform Error, Severity: 2
mst_ppc_pci_reset_driver 0000:50:00.0: PME# disabled

When debugging this sort of issue it's a very good idea to track what kernel versions you are using and to provide specific details about the HW you are testing with. From the fact your kernel has eeh_dn_check_failure() rather than eeh_check_dev_failure() I can gather this is a very old kernel. Does the other system you tested with have the same kernel? Same firmware? All this is relevant to your problem.
Anyway, I'd say you need a one second wait between the de-asserting the reset and restoring config space. The PCI spec requires that system software give the device one second to initialise after a reset before attempting IO, config cycles included. In 2015 commit 26833a5029b7 ("powerpc/eeh: Make the delay for PE reset unified") added a delay after de-assert (on powerpc at least) so that would be handled for you. Considering your kernel is old enough to still have eeh_dn_check_failure() (renamed in 2012, see f8f7d63fd96e) you probably don't have that patch and need to do the wait yourself.
What's probably happening is that the device isn't ready to respond to config accesses and drops them. The hypervisor will detect a timeout and assumes the device is malfunctioning so it isolates the device (freezes it) using the EEH mechanism that IBM's Power hardware has. Normally the OS will try to un-freeze and reset the device after that happens, but that can fail for a lot of reasons, especially on older kernels.

Related

How to debug IACCVIOL where all registers have been cleared?

I'm developing a complex and intricate app for STM32F746. I stumbled upon the following hard fault and I'm not sure how to find the origin of the problem :
16:05:51.832 HardFault : ExceptionFrame { r0: 0xffffffff, r1: 0xffffffff, r2: 0xffffffff, r3: 0xffffffff, r12: 0xffffffff, lr: 0xffffffff, pc: 0xffffffff, xpsr: 0xffffffff }
16:05:51.832 UFSR : 0000000000000000
16:05:51.832 BFSR : 00000000
16:05:51.832 MMFSR : 00000001
16:05:51.832 HFSR : 01000000000000000000000000000000
The MMFSR part of the CFSR clearly indicates the error is IACCVIOL. Unfortunately, MMARVALID is not set, so I can't use the MMFAR to find the root of the issue.
Stepping through with GDB takes a huge amount of time for very little progress, as I need to start over every time the fault appears. I couldn't find a way to record/replay the session to quickly track down the issue in GDB.
Is there an approach that could help me pinpoint where the code fails ?

How can I see the full backtrace using kgdb to debug an ARM Linux module?

I worked my way through all of the free Linux training materials created by Free Electrons. In the last lab, we learn to use kgdb to remotely debug a simple crash in a loadable module. The crash is caused by a null pointer dereference in a memzero function call.
I am using Linux kernel 4.9 and a BeagleBone Black as the target, all according to the recommendations for the labs, and I've had no problems up to this point. My host is Ubuntu xenial and I am using standard packages for the ARM toolchain (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) and gdb (7.11.1-0ubuntu1~16.04) debugger.
gdb is able to read the symbol tables from vmlinux and from the module with the bug in it, which is called drvbroken.ko. The module has a bug in its init function, so it crashes immediately when I insmod it.
gdb output:
(gdb) backtrace
#0 __memzero () at arch/arm/lib/memzero.S:69
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) list 69
64 ldmeqfd sp!, {pc} # 1/2 quick exit
65 /*
66 * No need to correct the count; we're only testing bits from now on
67 */
68 tst r1, #32 # 1
69 stmneia r0!, {r2, r3, ip, lr} # 4
70 stmneia r0!, {r2, r3, ip, lr} # 4
71 tst r1, #16 # 1 16 bytes or more?
72 stmneia r0!, {r2, r3, ip, lr} # 4
73 ldr lr, [sp], #4 # 1
The result is the same whether I build the kernel with CONFIG_ARM_UNWIND (the default) or disable that and use CONFIG_FRAME_POINTER (the old method recommended by the lab notes).
I tried the same procedure in kdb, and here I see a very long backtrace that includes the calling functions. The caller of memzero is cdev_init.
kdb output:
Entering kdb (current=0xde616240, pid 106) on processor 0 Oops: (null)
due to oops # 0xc04c2be0
CPU: 0 PID: 106 Comm: insmod Tainted: G O 4.9.0-dirty #1
Hardware name: Generic AM33XX (Flattened Device Tree)
task: de616240 task.stack: de676000
PC is at __memzero+0x40/0x7c
LR is at 0x0
pc : [<c04c2be0>] lr : [<00000000>] psr: 00000013
sp : de677da4 ip : 00000000 fp : de677dbc
r10: bf000240 r9 : 219a3868 r8 : 00000000
r7 : de65c7c0 r6 : de6420c0 r5 : bf0000b4 r4 : 00000000
r3 : 00000000 r2 : 00000000 r1 : fffffffc r0 : 00000000
Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 9e69c019 DAC: 00000051
CPU: 0 PID: 106 Comm: insmod Tainted: G O 4.9.0-dirty #1
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
... pruned function calls related to kdb itself ...
[<c08326cc>] (do_page_fault) from [<c010138c>] (do_DataAbort+0x3c/0xbc)
r10:bf000240 r9:de676000 r8:de677d50 r7:00000000 r6:c08326cc r5:00000817
r4:c0d0bb2c
[<c0101350>] (do_DataAbort) from [<c0831d04>] (__dabt_svc+0x64/0xa0)
Exception stack(0xde677d50 to 0xde677d98)
7d40: 00000000 fffffffc 00000000 00000000
7d60: 00000000 bf0000b4 de6420c0 de65c7c0 00000000 219a3868 bf000240 de677dbc
7d80: 00000000 de677da4 00000000 c04c2be0 00000013 ffffffff
r8:00000000 r7:de677d84 r6:ffffffff r5:00000013 r4:c04c2be0
[<c02bf44c>] (cdev_init) from [<bf002048>] (init_module+0x48/0xb4 [drvbroken])
r5:bf002000 r4:bf000480
[<bf002000>] (init_module [drvbroken]) from [<c01018d4>] (do_one_initcall+0x44/0x180)
r5:bf002000 r4:ffffe000
[<c0101890>] (do_one_initcall) from [<c024fa2c>] (do_init_module+0x64/0x1d8)
r8:00000001 r7:de65c7c0 r6:de6420c0 r5:c0dbfa84 r4:bf000240
[<c024f9c8>] (do_init_module) from [<c01e10e8>] (load_module+0x1d6c/0x23d8)
r6:c0d0512c r5:c0dbfa84 r4:c0d4c70f
[<c01df37c>] (load_module) from [<c01e18ac>] (SyS_init_module+0x158/0x17c)
r10:00000051 r9:de676000 r8:e0a95100 r7:00000000 r6:000ac118 r5:00004100
It is pretty easy to figure out where to look for the bug with this information, but alas, it is not possible to get a line number or list the source directly from kdb. This is much easier in gdb, assuming that I can get a full backtrace.

How to read a Windows 10 BSOD mini dump analysis

I'm hoping someone here can help.
I have a new Windows 10 machine (all parts by EVGA).
I get random BSOD, so I've grabbed a mini dump, installed the SDK and looked into it. I just don't understand what it is reporting.
Can someone point me in the direction of a guide, or decode this mini dump.
Note : Each dump looks very similar. e.g. almost the same report from 'irp'
Here is the dump....
Microsoft (R) Windows Debugger Version 10.0.10586.567 X86
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Windows\Minidump\033016-4718-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available
Symbol search path is: srv*
Executable search path is:
Windows 10 Kernel Version 10586 MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 10586.162.amd64fre.th2_release_sec.160223-1728
Machine Name:
Kernel base = 0xfffff8018d674000 PsLoadedModuleList = 0xfffff8018d952cd0
Debug session time: Wed Mar 30 18:15:33.639 2016 (UTC + 1:00)
System Uptime: 0 days 2:47:26.264
Loading Kernel Symbols
.
Press ctrl-c (cdb, kd, ntsd) or ctrl-break (windbg) to abort symbol loads that take too long.
Run !sym noisy before .reload to track down problems loading symbols.
..............................................................
................................................................
................................
Loading User Symbols
Loading unloaded module list
.............
*
Bugcheck Analysis *
*
Use !analyze -v to get detailed debugging information.
BugCheck 9F, {3, ffffe000935ea880, fffff8018f25a890, ffffe00092718bd0}
Probably caused by : ACPI.sys
Followup: MachineOwner
0: kd> !analyze -v
*
Bugcheck Analysis *
*
DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time.
Arguments:
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: ffffe000935ea880, Physical Device Object of the stack
Arg3: fffff8018f25a890, nt!TRIAGE_9F_POWER on Win7 and higher, otherwise the Functional Device Object of the stack
Arg4: ffffe00092718bd0, The blocked IRP
Debugging Details:
DUMP_CLASS: 1
DUMP_QUALIFIER: 400
BUILD_VERSION_STRING: 10586.162.amd64fre.th2_release_sec.160223-1728
SYSTEM_MANUFACTURER: EVGA INTERNATIONAL CO.,LTD
SYSTEM_PRODUCT_NAME: Default string
SYSTEM_SKU: Default string
SYSTEM_VERSION: Default string
BIOS_VENDOR: American Megatrends Inc.
BIOS_VERSION: 1.07
BIOS_DATE: 01/04/2016
BASEBOARD_MANUFACTURER: EVGA INTERNATIONAL CO.,LTD
BASEBOARD_PRODUCT: 111-SS-E172
BASEBOARD_VERSION: 1.0
DUMP_TYPE: 2
DUMP_FILE_ATTRIBUTES: 0x8
Kernel Generated Triage Dump
BUGCHECK_P1: 3
BUGCHECK_P2: ffffe000935ea880
BUGCHECK_P3: fffff8018f25a890
BUGCHECK_P4: ffffe00092718bd0
DRVPOWERSTATE_SUBCODE: 3
IMAGE_NAME: ACPI.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 56cbf9c9
MODULE_NAME: ACPI
FAULTING_MODULE: fffff800d5de0000 ACPI
CPU_COUNT: 8
CPU_MHZ: d50
CPU_VENDOR: GenuineIntel
CPU_FAMILY: 6
CPU_MODEL: 5e
CPU_STEPPING: 3
CPU_MICROCODE: 6,5e,3,0 (F,M,S,R) SIG: 33'00000000 (cache) 33'00000000 (init)
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT
BUGCHECK_STR: 0x9F
PROCESS_NAME: System
CURRENT_IRQL: 2
ANALYSIS_SESSION_HOST: Q-PC
ANALYSIS_SESSION_TIME: 03-30-2016 20:04:47.0460
ANALYSIS_VERSION: 10.0.10586.567 x86fre
STACK_TEXT:
fffff8018f25a858 fffff8018d854e42 : 000000000000009f 0000000000000003 ffffe000935ea880 fffff8018f25a890 : nt!KeBugCheckEx
fffff8018f25a860 fffff8018d854d62 : ffffe00096133010 fffff8018f252070 0000000000000000 fffff8018d73e0a6 : nt!PopIrpWatchdogBugcheck+0xde
fffff8018f25a8c0 fffff8018d6e22c6 : ffffe00096133048 fffff8018f25aa10 0000000000000001 0000000000000002 : nt!PopIrpWatchdog+0x32
fffff8018f25a910 fffff8018d7b951a : 0000000000000000 fffff8018d991180 fffff8018da07740 ffffe00096723800 : nt!KiRetireDpcList+0x5f6
fffff8018f25ab60 0000000000000000 : fffff8018f25b000 fffff8018f254000 0000000000000000 0000000000000000 : nt!KiIdleLoop+0x5a
STACK_COMMAND: kb
THREAD_SHA1_HASH_MOD_FUNC: 81a7ba75a791115b4f55c8910c64a260d525502e
THREAD_SHA1_HASH_MOD_FUNC_OFFSET: 936d5c51c0ad2157bf4c85af575dd55cea2c0947
THREAD_SHA1_HASH_MOD: f08ac56120cad14894587db086f77ce277bfae84
FOLLOWUP_NAME: MachineOwner
IMAGE_VERSION: 10.0.10586.122
FAILURE_BUCKET_ID: 0x9F_3_POWER_DOWN_i8042prt_IMAGE_ACPI.sys
BUCKET_ID: 0x9F_3_POWER_DOWN_i8042prt_IMAGE_ACPI.sys
PRIMARY_PROBLEM_CLASS: 0x9F_3_POWER_DOWN_i8042prt_IMAGE_ACPI.sys
TARGET_TIME: 2016-03-30T17:15:33.000Z
OSBUILD: 10586
OSSERVICEPACK: 0
SERVICEPACK_NUMBER: 0
OS_REVISION: 0
SUITE_MASK: 272
PRODUCT_TYPE: 1
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
OSEDITION: Windows 10 WinNt TerminalServer SingleUserTS
OS_LOCALE:
USER_LCID: 0
OSBUILD_TIMESTAMP: 2016-02-24 05:48:00
BUILDDATESTAMP_STR: 160223-1728
BUILDLAB_STR: th2_release_sec
BUILDOSVER_STR: 10.0.10586.162.amd64fre.th2_release_sec.160223-1728
ANALYSIS_SESSION_ELAPSED_TIME: 3d7
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:0x9f_3_power_down_i8042prt_image_acpi.sys
FAILURE_ID_HASH: {22a3ff34-49ca-8d37-715b-ae023b6cc9fb}
Followup: MachineOwner
0: kd> !irp ffffe00092718bd0
Irp is active with 8 stacks 6 is current (= 0xffffe00092718e08)
No Mdl: No System Buffer: Thread 00000000: Irp stack trace. Pending has been returned
cmd flg cl Device File Completion-Context
[N/A(0), N/A(0)]
0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[N/A(0), N/A(0)]
0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[N/A(0), N/A(0)]
0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[N/A(0), N/A(0)]
0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[IRP_MJ_POWER(16), IRP_MN_WAIT_WAKE(0)]
0 0 ffffe000935ea880 00000000 fffff800d6a81ec0-00000000
\Driver\ACPI i8042prt!I8xPowerUpToD0Complete
Args: 00000000 00000000 00000000 00000002
[IRP_MJ_POWER(16), IRP_MN_SET_POWER(2)]
0 e1 ffffe00093f936f0 00000000 fffff800d6ab1060-00000000 Success Error Cancel pending
\Driver\i8042prt kbdclass!KeyboardClassPowerComplete
Args: 00051100 00000001 00000001 00000002
[IRP_MJ_POWER(16), IRP_MN_SET_POWER(2)]
0 e1 ffffe00093dc95f0 00000000 fffff8018d7840b8-ffffe00096133010 Success Error Cancel pending
\Driver\kbdclass nt!PopRequestCompletion
Args: 00051100 00000001 00000001 00000002
[N/A(0), N/A(0)]
0 0 00000000 00000000 00000000-ffffe00096133010
Args: 00000000 00000000 00000000 00000000
I'm also adding a BlueScreen screen shot, incase that helps.
Now adding output from some extra commands after Martins comments...
0: kd> !devstack ffffe000935ea880
!DevObj !DrvObj !DevExt ObjectName
ffffe00093dc95f0 \Driver\kbdclass ffffe00093dc9740 InfoMask field not found for _OBJECT_HEADER at ffffe00093dc95c0
ffffe00093f936f0 \Driver\i8042prt ffffe00093f93840 InfoMask field not found for _OBJECT_HEADER at ffffe00093f936c0
> ffffe000935ea880 \Driver\ACPI ffffe000923fa8d0 Cannot read info offset from nt!ObpInfoMaskToOffset
!DevNode ffffe000935d6af0 :
DeviceInst is "ACPI\PNP0303\0"
ServiceName is "i8042prt"
!process 0 7
**** NT ACTIVE PROCESS DUMP ****
GetPointerFromAddress: unable to read from fffff8018d9f3200
Error in reading nt!_EPROCESS at 0000000000000000
0: kd> !poaction
PopAction: fffff8018d94efe0
State..........: 0 - Idle
Updates........: 0
Action.........: None
Lightest State.: Unspecified
Flags..........: 10000003 QueryApps|UIAllowed
Irp minor......: ??
System State...: Unspecified
Hiber Context..: 0000000000000000
Allocated power irps (PopIrpList - fffff8018d94f4f0)
IRP: ffffe00092718bd0 (set/D0,), PDO: ffffe000935ea880, CURRENT: ffffe00093f936f0
IRP: ffffe000971aa990
Irp worker threads (PopIrpThreadList - fffff8018d94e100)
THREAD: ffffe00091515040 (static)
THREAD: ffffe00091501800 (static)
Error resolving nt!_POP_CURRENT_BROADCAST...
Summary: Error was caused by my 10 year old Razor mouse with Windows 10.
The driver when entering power save state was freaking out and causing the blue screen.
I purchased a new mouse, removed the driver & 2 months in no more BSOD.
I usually use BlueScreenView by Nirsoft. It will get you a list of last BSOD and will show a nice view of the components. "Normally" the first mentioned component could be the reason.
Not sure, if you are looking for a solution on a specific problem or the minidump usage in general.
Some driver got problems with power state change. Make sure, you have the current Drivers installed.

Need help to understand kernel debugging error

Need help to understand kernel debugging error.
When I put my driver for Whck test for windows 8(32/64 bit), it fails CHAOS in RUN TEST.
So I did kernel debugging and got following debug message.But I don't understand where is the error in my ioctl.c file.Same driver has cleared the test for windows 7 32 bit.
*** Fatal System Error: 0x0000000a
(0x00000031,0x00000002,0x00000000,0x81CB1194)
Break instruction exception - code 80000003 (first chance)
A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.
A fatal system error has occurred.
nt!RtlpBreakWithStatusInstruction:
818d6ca4 cc int 3
2: kd> !analyze -v
Connected to Windows 8 9200 x86 compatible target at (Tue May 27 11:56:02.788 2014 (UTC - 7:00)), ptr64 FALSE
Loading Kernel Symbols
...............................................................
.............................................
Press ctrl-c (cdb, kd, ntsd) or ctrl-break (windbg) to abort symbol loads that take too long.
Run !sym noisy before .reload to track down problems loading symbols.
...................
........................
Loading User Symbols
Loading unloaded module list
.........Unable to enumerate user-mode unloaded modules, Win32 error 0n30
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 00000031, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: 81cb1194, address which referenced memory
Debugging Details:
------------------
READ_ADDRESS: 00000031
CURRENT_IRQL: 2
FAULTING_IP:
nt!VerifierKeSynchronizeExecution+26
81cb1194 0fb64631 movzx eax,byte ptr [esi+31h]
DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT
BUGCHECK_STR: AV
PROCESS_NAME: System
TRAP_FRAME: b74a594c -- (.trap 0xffffffffb74a594c)
ErrCode = 00000000
eax=9132b7d8 ebx=b23f4a38 ecx=b74a59d8 edx=9184c628 esi=00000000 edi=9184c570
eip=81cb1194 esp=b74a59c0 ebp=b74a59c4 iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
nt!VerifierKeSynchronizeExecution+0x26:
81cb1194 0fb64631 movzx eax,byte ptr [esi+31h] ds:0023:00000031=??
Resetting default scope
LAST_CONTROL_TRANSFER: from 818fefc7 to 818d6ca4
STACK_TEXT:
b74a54e4 818fefc7 00000003 d565b8ac 00000031 nt!RtlpBreakWithStatusInstruction
b74a5534 818fe861 00000003 8286a340 b74a5934 nt!KiBugCheckDebugBreak+0x1c
b74a5908 818d56a6 0000000a 00000031 00000002 nt!KeBugCheck2+0x655
b74a592c 8194ed9b 0000000a 00000031 00000002 nt!KiBugCheck2+0xc6
b74a592c 81cb1194 0000000a 00000031 00000002 nt!KiTrap0E+0x1b3
b74a59c4 9132b7d8 00000000 9132ef20 b74a59d8 nt!VerifierKeSynchronizeExecution+0x26
b74a5a30 81ca1f9b 9184c570 adcf4f00 adcf4f00 OxSer!OxserInternalIoControl+0x328 [c:\users\admin\desktop\trunk\uart_v7.0\source\uart\driver\wdm\ioctl.c # 2570]
b74a5a50 81830066 81cb97fd adcf4fd4 adcf4ff8 nt!IovCallDriver+0x2e3
b74a5a64 81cb97fd b74a5a8c 81cb98f4 9184c570 nt!IofCallDriver+0x73
b74a5a6c 81cb98f4 9184c570 adcf4f00 ace85a30 nt!ViFilterIoCallDriver+0x10
b74a5a8c 81ca1f9b ace85ae8 adcf4f00 81ca27c1 nt!ViFilterDispatchGeneric+0x5e
b74a5aac 81830066 8f7eab44 ace85a30 8ad0c710 nt!IovCallDriver+0x2e3
b74a5ac0 8f7eab44 b74a5b0c b74a5b0c b74a5c14 nt!IofCallDriver+0x73
b74a5ad0 8f7ea625 001b0010 00000001 ace85a30 serenum!Serenum_IoSyncIoctlEx+0x48
b74a5c14 8f7e537d b7196ed8 b74a5c33 b7a84340 serenum!Serenum_ReenumerateDevices+0x259
b74a5c34 81866b1b b7196ed8 d565b1e8 00000000 serenum!SerenumEnumThread+0x57
b74a5c70 81950579 8f7e5326 8ad0c710 00000000 nt!PspSystemThreadStartup+0x4a
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19
STACK_COMMAND: kb
FOLLOWUP_IP:
OxSer!OxserInternalIoControl+328 [c:\users\admin\desktop\trunk\uart_v7.0\source\uart\driver\wdm\ioctl.c # 2570]
9132b7d8 8b4dcc mov ecx,dword ptr [ebp-34h]
FAULTING_SOURCE_LINE: c:\users\admin\desktop\trunk\uart_v7.0\source\uart\driver\wdm\ioctl.c
FAULTING_SOURCE_FILE: c:\users\admin\desktop\trunk\uart_v7.0\source\uart\driver\wdm\ioctl.c
FAULTING_SOURCE_LINE_NUMBER: 2570
FAULTING_SOURCE_CODE:
No source found for 'c:\users\admin\desktop\trunk\uart_v7.0\source\uart\driver\wdm\ioctl.c'
SYMBOL_STACK_INDEX: 6
SYMBOL_NAME: OxSer!OxserInternalIoControl+328
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: OxSer
IMAGE_NAME: OxSer.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 53802c0f
BUCKET_ID_FUNC_OFFSET: 328
FAILURE_BUCKET_ID: AV_VRF_OxSer!OxserInternalIoControl
BUCKET_ID: AV_VRF_OxSer!OxserInternalIoControl
Followup: MachineOwner
---------
The routine that crashed was actually in the OS verifier. This is a set of function that perform additional validation on driver calls when driver development is performed in order to find driver bugs.
You are probably not crashing on Win7 because either the verifier is not turned on or the verifier was not detecting this problem in Win7. While your code is not crashing, it is probably still doing something that will cause OS instability at some point.
You should view this as Win8 helping you identify a real bug much more easily, rather than under weird circumstances after you shipped your driver.

What caused my NDIS miniport driver crashed on XP OS

I wrote a simple packet filter driver based on the example 'passthru' of the Windows DDK, when I turned on the filter function, the OS is crashed and I got the following message from the WinDbg:
Microsoft (R) Windows Debugger Version 6.12.0002.633 X86 Copyright (c)
Microsoft Corporation. All rights reserved.
Loading Dump File [D:\iCheckTool\dump\MEMORY.DMP] Kernel Summary Dump
File: Only kernel address space is available
WARNING: Whitespace at start of path element Symbol search path is:
D:\iCheckTool\dump;
SRV*E:\DebuggingSymbols*http://msdl.microsoft.com/download/symbols;SRV*C:\MyLocalSymbols*http://192.168.20.25/zfprisymbols/
Executable search path is: Windows XP Kernel Version 2600 (Service
Pack 3) MP (2 procs) Free x86 compatible Product: WinNt, suite:
TerminalServer SingleUserTS Built by: 2600.xpsp_sp3_qfe.120504-1617
Machine Name: Kernel base = 0x804d8000 PsLoadedModuleList = 0x8055e720
Debug session time: Tue Sep 11 09:41:02.828 2012 (UTC + 8:00) System
Uptime: 0 days 0:02:30.578 Loading Kernel Symbols
...............................................................
............................................................. Loading
User Symbols PEB is paged out (Peb.Ldr = 7ffd800c). Type ".hh
dbgerr001" for details Loading unloaded module list ........
*
Bugcheck Analysis *
*
Use !analyze -v to get detailed debugging information.
BugCheck C5, {4, 2, 1, 8054c10f}
Probably caused by : Pool_Corruption ( nt!ExDeferredFreePool+109 )
Followup: Pool_corruption
1: kd> !analyze -v
*
Bugcheck Analysis *
*
DRIVER_CORRUPTED_EXPOOL (c5) An attempt was made to access a pageable
(or completely invalid) address at an interrupt request level (IRQL)
that is too high. This is caused by drivers that have corrupted the
system pool. Run the driver verifier against any new (or suspect)
drivers, and if that doesn't turn up the culprit, then use gflags to
enable special pool. Arguments: Arg1: 00000004, memory referenced
Arg2: 00000002, IRQL Arg3: 00000001, value 0 = read operation, 1 =
write operation Arg4: 8054c10f, address which referenced memory
Debugging Details:
BUGCHECK_STR: 0xC5_2
CURRENT_IRQL: 2
FAULTING_IP: nt!ExDeferredFreePool+109 8054c10f 895f04 mov
dword ptr [edi+4],ebx
DEFAULT_BUCKET_ID: DRIVER_FAULT
PROCESS_NAME: explorer.exe
TRAP_FRAME: b42555dc -- (.trap 0xffffffffb42555dc) ErrCode = 00000002
eax=89cc1c60 ebx=89e4ded8 ecx=000001ff edx=89cc2a78 esi=80565d20
edi=00000000 eip=8054c10f esp=b4255650 ebp=b4255690 iopl=0 nv
up ei ng nz ac pe cy cs=0008 ss=0010 ds=0023 es=0023 fs=0030
gs=0000 efl=00010297 nt!ExDeferredFreePool+0x109: 8054c10f
895f04 mov dword ptr [edi+4],ebx
ds:0023:00000004=???????? Resetting default scope
LOCK_ADDRESS: 8055c4e0 -- (!locks 8055c4e0)
Resource # nt!PiEngineLock (0x8055c4e0) Available
Contention Count = 1 1 total locks
PNP_TRIAGE: Lock address : 0x8055c4e0 Thread Count : 0 Thread
address: 0x00000000 Thread wait : 0x0
LAST_CONTROL_TRANSFER: from 8054c10f to 80545768
STACK_TEXT: b42555dc 8054c10f badb0d00 89cc2a78 b8338538
nt!KiTrap0E+0x238 b4255690 8054c75f 00000001 8055c100 00020019
nt!ExDeferredFreePool+0x109 b42556d0 8058635e 899522e8 00000000
b42557d8 nt!ExFreePoolWithTag+0x47f b42556fc 805878b8 c0000023
00000007 8058758c nt!PiGetDeviceRegistryProperty+0x108 b425578c
bf879f40 8a523030 00000001 00000100 nt!IoGetDeviceProperty+0x25e
b42558f8 bf879735 00000000 e1b5e008 00000000
win32k!DrvEnumDisplayDevices+0x33b b425591c 8054268c 00000000 00000000
0007ecc4 win32k!NtUserEnumDisplayDevices+0x7c b425591c 7c92e514
00000000 00000000 0007ecc4 nt!KiFastCallEntry+0xfc WARNING: Frame IP
not in any known module. Following frames may be wrong. 0007f010
00000000 00000000 00000000 00000000 0x7c92e514
STACK_COMMAND: kb
FOLLOWUP_IP: nt!ExDeferredFreePool+109 8054c10f 895f04 mov
dword ptr [edi+4],ebx
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: nt!ExDeferredFreePool+109
FOLLOWUP_NAME: Pool_corruption
IMAGE_NAME: Pool_Corruption
DEBUG_FLR_IMAGE_TIMESTAMP: 0
MODULE_NAME: Pool_Corruption
FAILURE_BUCKET_ID: 0xC5_2_nt!ExDeferredFreePool+109
BUCKET_ID: 0xC5_2_nt!ExDeferredFreePool+109
Followup: Pool_corruption
Can someone tell me what caused this problem and how to fix it?
Thanks.
Apparently, you tried to write into invalid memory region (address = 0x4). Beyond this the debugger analysis you posted isn't too helpful. You can try finding your driver stack (which is not present in your posted debug output) in the debugger to get the failing code, but it's not guaranteed. Other methods to attack this include adding debug prints to your code and capturing it with DbgView (you can later extract them from the memory dump). And you can also connect kernel debugger and catch the error when it happens.

Resources