I am working on my embedded application on Ubuntu 16.04. I have written a fan monitoring driver for my board. The driver crashes the kernel and the system has to be rebooted.
I enabled kdump and am using the crash utility to analyze the stack trace.
The trace looks like this:
crash> bt
PID: 2935 TASK: c01a8000 CPU: 3 COMMAND: "mk7i"
#0 [e309ddbc] crash_kexec at c10fa4ce
#1 [e309de1c] path_openat at c11e812b
#2 [e309de94] do_page_fault at c105de55
#3 [e309dea4] error_code (via page_fault) at c17a9185
EAX: 00000400 EBX: 037ea1e0 ECX: f0a746c4 EDX: 00000000 EBP: e309df00
DS: 007b ESI: 00000000 ES: 007b EDI: e3f47000 GS: 00e0
CS: 0060 EIP: f0a718f4 ERR: ffffffff EFLAGS: 00210212
#4 [e309ded8] fmon_read_value at f0a718f4 [fmon]
#5 [e309dee8] security_file_permission at c12f73ae
#6 [e309df04] proc_reg_read at c1239c2b
#7 [e309df24] __vfs_read at c11da81d
#8 [e309df38] vfs_read at c11dae8a
#9 [e309df5c] sys_read at c11db92c
#10 [e309df84] do_fast_syscall_32 at c1003936
#11 [e309dfb0] sysenter_past_esp at c17a8093
EAX: ffffffda EBX: 00000012 ECX: 037ea1e0 EDX: 00000400
DS: 007b ESI: 037f6360 ES: 007b EDI: ffffff98
SS: 007b ESP: a802a834 EBP: a802a888 GS: 0033
CS: 0073 EIP: b76f8c31 ERR: 00000003 EFLAGS: 00200292
The function "fmon_read_value", which causes the fault, is in my driver module.
How can I map an address in that function/module to a source line number? In this case, the address is f0a718f4.
You could use gdb to find the line, for example by inspecting the faulting frame.
This is probably good documentation on how to do that:
https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks#Using_GDB_to_find_the_location_where_your_kernel_panicked_or_oopsed.
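The -l option of crash's dis (disassemble) command will do the trick: it prints source line numbers alongside the disassembly, and -r disassembles from the start of the function up to and including the given address. For example: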
crash> dis -lr f0a718f4
I am trying to debug embedded Linux kernel 4.12.28, which crashes with a kernel panic. I have put some prints inside block/genhd.c to print the disk_name. I see that it crashes inside bdget_disk() while getting the block device for disk_name "ram0". I am using the PowerPC architecture.
I am a bit puzzled about how to approach or debug this issue, and I am unable to understand the root cause. My understanding is that ram0 is a RAM disk the system needs for initial bootup: the initrd contents are copied to ram0 for bootup. I don't understand why it crashes. I can tell that it is related to a bad address, but what is the real reason, and how do I debug this?
My defconfig has:
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=32768
The kernel panic log is:
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
serial8250.0: ttyS0 at MMIO 0xe0004500 (irq = 16, base_baud = 19531250) is a 16550A
serial8250.0: ttyS1 at MMIO 0xe0004600 (irq = 17, base_baud = 19531250) is a 16550A
console [ttyS1] enabled
console [ttyS1] enabled
bootconsole [udbg0] disabled
bootconsole [udbg0] disabled
Custom Debug..DEBUG: Passed bdget_disk 765
Custom Debug.. the disc name is ram0
Unable to handle kernel paging request for data at address 0x00005484
Faulting instruction address: 0xc0100154
Oops: Kernel access of bad area, sig: 11 [#1]
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.12.28-standard #1
task: df416a60 task.stack: df42a000
NIP: c0100154 LR: c011d158 CTR: c02bd990
REGS: df42bcb0 TRAP: 0300 Not tainted (4.12.28-standard)
MSR: 00009032 <EE,ME,IR,DR,RI>
CR: 242c0484 XER: 00000000
DAR: 00005484 DSISR: 20000000
GPR00: c02406bc df42bd60 df416a60 df407800 00000001 c011cb4c c011cb64 df42bd68
GPR08: 00000005 00000001 c0660000 00000000 222c0824 00000000 c00040f0 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0660000 c05fa2dc
GPR24: 00000007 00000093 df50d00c df50d060 00000000 df50d058 00000000 df50d040
NIP [c0100154] iget5_locked+0xc/0x250
LR [c011d158] bdget+0x40/0xf4
Call Trace:
[df42bd60] [c0652708] log_wait+0x0/0x8 (unreliable)
[df42bd80] [c02406bc] bdget_disk+0xac/0xf8
[df42bda0] [c0241780] device_add_disk+0x3f4/0x43c
[df42bdf0] [c060ed5c] brd_init+0xa8/0x184
[df42be20] [c0003a5c] do_one_initcall+0x48/0x18c
[df42be90] [c05faafc] kernel_init_freeable+0x130/0x228
[df42bf20] [c0004108] kernel_init+0x18/0x110
[df42bf40] [c00103f0] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
741d7e44 3f090d7e ea9463ef 3a7ebecd fc607969 24b8044d a251c1c7 2c91258b
242aaa92 9887d4e0 2f4a22b5 8b2ef93c <8b9c5484> 7ecf225d 6a9c4a5b 1a5791d4
---[ end trace 47ca8dc77d8de71b ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Rebooting in 180 seconds..
In simple words, the kernel was searching for /dev/ram0, which it couldn't find.
Probable root cause: you didn't provide the proper command-line argument to initialize the initrd. Please post your kernel command line so we can sort it out.
When opening any .sln file, my Visual Studio 2013 crashes with the following error:
An unhandled Microsoft .NET Framework exception occurred in devenv.exe
Possible Debuggers:
New instance of Microsoft Visual Studio 2015
(I am using 2013.)
This error suddenly started occurring without my doing anything, at least not that I am aware of. I have seen many similar problems, but none of them has been solved yet.
EDIT: analysis of a local dump
0:000> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
*** WARNING: Unable to verify checksum for WindowsBase.ni.dll
GetUrlPageData2 (WinHttp) failed: 12002.
DUMP_CLASS: 2
DUMP_QUALIFIER: 400
CONTEXT: (.ecxr)
eax=168124f8 ebx=00000001 ecx=07c44ef8 edx=00f3f15c esi=168124f0 edi=00f3f154
eip=e8000000 esp=00f3ef80 ebp=00f3f00c iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00210202
e8000000 ?? ???
Resetting default scope
FAULTING_IP:
+0
e8000000 ?? ???
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: e8000000
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000008
Parameter[1]: e8000000
Attempt to execute non-executable address e8000000
DEFAULT_BUCKET_ID: SOFTWARE_NX_FAULT_NOSOS
PROCESS_NAME: devenv.exe
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.
EXCEPTION_CODE_STR: c0000005
EXCEPTION_PARAMETER1: 00000008
EXCEPTION_PARAMETER2: e8000000
FOLLOWUP_IP:
vcpkg!EnvUtils::ValidateFile+9c
60ddf268 8b45c8 mov eax,dword ptr [ebp-38h]
EXECUTE_ADDRESS: ffffffffe8000000
FAILED_INSTRUCTION_ADDRESS:
+0
e8000000 ?? ???
WATSON_BKT_PROCSTAMP: 524fcb34
WATSON_BKT_PROCVER: 12.0.21005.1
PROCESS_VER_PRODUCT: Microsoft® Visual Studio® 2013
WATSON_BKT_MODULE: unknown
WATSON_BKT_MODVER: 0.0.0.0
WATSON_BKT_MODOFFSET: e8000000
WATSON_BKT_MODSTAMP: bbbbbbb4
BUILD_VERSION_STRING: 10.0.15063.296 (WinBuild.160101.0800)
MODLIST_WITH_TSCHKSUM_HASH: fb08b3e0d26f59b745effd61c5c16cb11b294362
MODLIST_SHA1_HASH: e077fef6b924063dd9adb146ae617873baf70a07
NTGLOBALFLAG: 0
PROCESS_BAM_CURRENT_THROTTLED: 0
PROCESS_BAM_PREVIOUS_THROTTLED: 0
APPLICATION_VERIFIER_FLAGS: 0
PRODUCT_TYPE: 1
SUITE_MASK: 272
DUMP_FLAGS: 8000c07
DUMP_TYPE: 3
MISSING_CLR_SYMBOL: 0
ANALYSIS_SESSION_HOST: DESKTOP-BS5SBSD
ANALYSIS_SESSION_TIME: 07-20-2017 16:11:36.0410
ANALYSIS_VERSION: 10.0.15063.468 x86fre
MANAGED_CODE: 1
MANAGED_ENGINE_MODULE: clr
MANAGED_ANALYSIS_PROVIDER: SOS
MANAGED_THREAD_ID: 8e8
THREAD_ATTRIBUTES:
ADDITIONAL_DEBUG_TEXT: SOS.DLL is not loaded for managed code. Analysis might be incomplete
OS_LOCALE: DEU
PROBLEM_CLASSES:
ID: [0n292]
Type: [#ACCESS_VIOLATION]
Class: Addendum
Scope: BUCKET_ID
Name: Omit
Data: Omit
PID: [Unspecified]
TID: [0x8e8]
Frame: [0] : unknown!unknown
ID: [0n266]
Type: [INVALID_POINTER_EXECUTE]
Class: Primary
Scope: BUCKET_ID
Name: Add
Data: Omit
PID: [Unspecified]
TID: [0x8e8]
Frame: [0] : unknown!unknown
ID: [0n274]
Type: [SOFTWARE_NX_FAULT]
Class: Primary
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
Name: Add
Data: Omit
PID: [0xcc8]
TID: [0x8e8]
Frame: [0] : unknown!unknown
ID: [0n272]
Type: [INVALID_POINTER]
Class: Primary
Scope: BUCKET_ID
Name: Add
Data: Omit
PID: [0xcc8]
TID: [0x8e8]
Frame: [0] : unknown!unknown
ID: [0n234]
Type: [NOSOS]
Class: Addendum
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
Name: Add
Data: Omit
PID: [Unspecified]
TID: [Unspecified]
Frame: [0]
BUGCHECK_STR: APPLICATION_FAULT_SOFTWARE_NX_FAULT_INVALID_POINTER_INVALID_POINTER_EXECUTE_NOSOS
PRIMARY_PROBLEM_CLASS: APPLICATION_FAULT
LAST_CONTROL_TRANSFER: from 60ddf268 to e8000000
STACK_TEXT:
WARNING: Frame IP not in any known module. Following frames may be wrong.
00f3ef7c 60ddf268 168124f8 00f3f15c 3cc66705 0xe8000000
00f3f00c 60ddf66a 00f3f07c 00f3f144 00f3f154 vcpkg!EnvUtils::ValidateFile+0x9c
00f3f180 60ddfc26 00000001 3cc66699 00000000 vcpkg!CInitializeConfigurationWorkItem::ProcessFiles+0x166
00f3f2e8 60d7c573 00000001 3cc6642d 146e3ae8 vcpkg!CInitializeConfigurationWorkItem::Initialize+0xe3f
00f3f324 60d7a609 00000001 3cc66471 07a5417c vcpkg!CMultiItemWorkItem::Initialize+0x86
00f3f378 60d385c4 07ad6e1c 00f3f410 77155da0 vcpkg!CWorkItem::ProcessPendingInitializeCalls+0xad
00f3f3b8 60d382c3 07a5417c fffffffe 14073550 vcpkg!CParserManager::OnIdle+0x3a3
00f3f474 71b84ce7 07ad6e1c fffffffe ffffffff vcpkg!CVCPackage::FDoIdle+0x1d0
00f3f4a4 71b84e0f 00000000 0104e204 00000002 msenv!SCM::FDoIdleLoop+0x122
00f3f4c8 71b84e5a ffffffff 00f3f4f8 71b849df msenv!SCM::FDoIdle+0xd5
00f3f4d4 71b849df 0104e1b8 ffffffff 066313f8 msenv!SCM_MsoStdCompMgr::FDoIdle+0x11
00f3f4f8 71b84479 066313f8 ffffffff ffffffff msenv!MainMessageLoop::DoIdle+0x1a
00f3f534 71c83083 0835d33f 00000000 0104e1b0 msenv!CMsoCMHandler::EnvironmentMsgLoop+0x12e
00f3f56c 71c82fb3 066313f8 ffffffff 0104e1b0 msenv!CMsoCMHandler::FPushMessageLoop+0x132
00f3f594 71c82f12 06614bd0 ffffffff 00000cc8 msenv!SCM::FPushMessageLoop+0xae
00f3f5b4 71c82ed9 0104e1b4 06614bd0 ffffffff msenv!SCM_MsoCompMgr::FPushMessageLoop+0x2a
00f3f5e0 71c82e1d ffffffff 0835d38f 00000000 msenv!CMsoComponent::PushMsgLoop+0x2e
00f3f670 71baf730 0835d0e7 00fa1c70 71b10000 msenv!VStudioMainLogged+0x525
00f3f698 2f73f1e2 00fa16d0 280e5cc0 00fa1c70 msenv!VStudioMain+0x7c
00f3f6d8 2f73ee26 280e53b8 77154cc0 2f74b56c devenv!util_CallVsMain+0xde
00f3f9a0 2f748734 00000000 00f74865 00000001 devenv!CDevEnvAppId::Run+0x9bc
00f3f9c8 2f748799 2f730000 00000000 00f74865 devenv!WinMain+0xbd
00f3fa14 77158744 00d00000 77158720 a9dba18e devenv!WinMainCRTStartup+0x12f
00f3fa28 778b582d 00d00000 fd9eefff 00000000 kernel32!BaseThreadInitThunk+0x24
00f3fa70 778b57fd ffffffff 778d6386 00000000 ntdll!__RtlUserThreadStart+0x2f
00f3fa80 00000000 2f74c3e1 00d00000 00000000 ntdll!_RtlUserThreadStart+0x1b
THREAD_SHA1_HASH_MOD_FUNC: 0136bd6660b54be9c6ec0b5c346ba7b7017c80cb
THREAD_SHA1_HASH_MOD_FUNC_OFFSET: 43ea5823f76b450da54d29b14be5db6eb9d88bc3
THREAD_SHA1_HASH_MOD: 22f3d87dbea0d43cb2ca58d96819da8f26bffe9e
FAULT_INSTR_CODE: 8bc8458b
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: vcpkg!EnvUtils::ValidateFile+9c
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: vcpkg
IMAGE_NAME: vcpkg.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 5590c8c5
STACK_COMMAND: .ecxr ; kb
FAILURE_BUCKET_ID: SOFTWARE_NX_FAULT_NOSOS_c0000005_vcpkg.dll!EnvUtils::ValidateFile
BUCKET_ID: APPLICATION_FAULT_SOFTWARE_NX_FAULT_INVALID_POINTER_INVALID_POINTER_EXECUTE_NOSOS_BAD_IP_vcpkg!EnvUtils::ValidateFile+9c
FAILURE_EXCEPTION_CODE: c0000005
FAILURE_IMAGE_NAME: vcpkg.dll
BUCKET_ID_IMAGE_STR: vcpkg.dll
FAILURE_MODULE_NAME: vcpkg
BUCKET_ID_MODULE_STR: vcpkg
FAILURE_FUNCTION_NAME: EnvUtils::ValidateFile
BUCKET_ID_FUNCTION_STR: EnvUtils::ValidateFile
BUCKET_ID_OFFSET: 9c
BUCKET_ID_MODTIMEDATESTAMP: 5590c8c5
BUCKET_ID_MODCHECKSUM: 44caac
BUCKET_ID_MODVER_STR: 12.0.40629.0
BUCKET_ID_PREFIX_STR: APPLICATION_FAULT_SOFTWARE_NX_FAULT_INVALID_POINTER_INVALID_POINTER_EXECUTE_NOSOS_BAD_IP_
FAILURE_PROBLEM_CLASS: APPLICATION_FAULT
FAILURE_SYMBOL_NAME: vcpkg.dll!EnvUtils::ValidateFile
WATSON_STAGEONE_URL: http://watson.microsoft.com/StageOne/devenv.exe/12.0.21005.1/524fcb34/unknown/0.0.0.0/bbbbbbb4/c0000005/e8000000.htm?Retriage=1
TARGET_TIME: 2017-07-20T14:57:52.000Z
OSBUILD: 15063
OSSERVICEPACK: 296
SERVICEPACK_NUMBER: 0
OS_REVISION: 0
OSPLATFORM_TYPE: x86
OSNAME: Windows 10
OSEDITION: Windows 10 WinNt SingleUserTS
USER_LCID: 0
OSBUILD_TIMESTAMP: unknown_date
BUILDDATESTAMP_STR: 160101.0800
BUILDLAB_STR: WinBuild
BUILDOSVER_STR: 10.0.15063.296
ANALYSIS_SESSION_ELAPSED_TIME: 6ee0
ANALYSIS_SOURCE: UM
FAILURE_ID_HASH_STRING: um:software_nx_fault_nosos_c0000005_vcpkg.dll!envutils::validatefile
FAILURE_ID_HASH: {653be37d-7dca-4334-85f0-5ab76235b00d}
Followup: MachineOwner
I also had this problem. Every time I opened Visual Studio 2013, with or without a solution, it crashed and showed:
Visual Studio 2013 has stopped working
It was solved by:
1. Disconnect your PC from the internet.
2. Open Visual Studio 2013. In this state it opens without any issue, but the problem returns as soon as you reconnect to the internet.
3. Log out of your account in Visual Studio 2013.
4. Connect to the internet again.
In my case, it doesn't really matter whether I am logged in to VS 2013, so this fixed the issue.
1. Close Visual Studio (ensure devenv.exe is no longer listed in Task Manager).
2. Delete the %USERPROFILE%\AppData\Local\Microsoft\VisualStudio\12.0\ComponentModelCache directory (12.0 is Visual Studio 2013; the 14.0 folder belongs to Visual Studio 2015).
3. Restart Visual Studio.
If Visual Studio 2013 fails at startup with "Microsoft Visual Studio 2013 has stopped working":
NOTE: Check whether VS 2013 works when you disconnect from the internet. If it does, and the same problem returns once you reconnect,
log out of the Microsoft account you are signed in with in Visual Studio 2013.
That resolves the problem.
I'm analyzing a post-mortem kernel dump and trying to identify all processes and filter drivers that may be referencing a USB storage drive or have handles open to it. I've tried examining all the open handles, but even when limiting the search to File objects the data isn't manageable. So I navigated through the !object \ list to find the volume I'm looking for:
3: kd> !devobj fffffa8007169cd0
Device object (fffffa8007169cd0) is for:
HarddiskVolume6 \Driver\volmgr DriverObject fffffa8006af2060
Current Irp 00000000 RefCount 34 Type 00000007 Flags 00001050
Vpb fffffa8007168940 Dacl fffff9a10033a3c0 DevExt fffffa8007169e20 DevObjExt fffffa8007169f88 Dope fffffa80071688d0 DevNode fffffa800716b890
3: kd> !vpb fffffa8007168940
Vpb at 0xfffffa8007168940
Flags: 0x1 mounted
DeviceObject: 0xfffffa8008880030
RealDevice: 0xfffffa8007169cd0
RefCount: 34
Volume Label:
Is it possible to find what all of these 34 references are?
Is there a simple method of identifying what is using any given volume from a memory dump?
Doesn't !devhandles on the device object give you any details?
kd> .shell -ci "!object \Device" grep -i harddisk
xxxxxxxxxx
20 849a8e20 Device HarddiskVolume8
xxxxxxxx
kd> !devobj 849a8e20
Device object (849a8e20) is for:
HarddiskVolume8 \Driver\volmgr DriverObject 851708b0
Current Irp 00000000 RefCount 5 Type 00000007 Flags 00003050
Vpb 8594de78 Dacl b0c8b8a4 DevExt 849a8ed8 DevObjExt 849a8fc0 Dope 8493ee10 DevNode 86643708
ExtensionFlags (0000000000)
Characteristics (0x00000001) FILE_REMOVABLE_MEDIA <--------
AttachedDevice (Upper) 866f04c8 \Driver\fvevol
Device queue is not busy.
kd> !devhandles 849a8e20
Checking handle table for process 0x84830ae8
Kernel handle table at 89601b80 with 636 entries in use
xxxxxxxxxxxxxxxxxxxxxxxx
PROCESS 86479210 SessionId: 1 Cid: 05e8 Peb: 7ffdf000 ParentCid: 05b0
DirBase: 7e28f2c0 ObjectTable: 94dcc900 HandleCount: 923.
Image: explorer.exe
121c: Object: 84a03550 GrantedAccess: 00100081 Entry: adac3438
Object: 84a03550 Type: (848adde8) File
ObjectHeader: 84a03538 (new version)
HandleCount: 1 PointerCount: 2
Directory Object: 00000000 Name: \ {HarddiskVolume8} <----
PROCESS 86479210 SessionId: 1 Cid: 05e8 Peb: 7ffdf000 ParentCid: 05b0
DirBase: 7e28f2c0 ObjectTable: 94dcc900 HandleCount: 923.
Image: explorer.exe
12ac: Object: 84a0a038 GrantedAccess: 00100081 Entry: adac3558
Object: 84a0a038 Type: (848adde8) File
ObjectHeader: 84a0a020 (new version)
HandleCount: 1 PointerCount: 2
Directory Object: 00000000 Name: \ {HarddiskVolume8} <-----
I'm receiving an error when I try to run any Cygwin functionality (bash/sh/ash/dash), which says:
1 [main] ash 5008 D:\DevStudio\cygwin\bin\ash.exe: *** fatal error - couldn't allocate heap, Win32 error 487, base 0xC80000, top 0xCD0000, reserve_size 323584, allocsize 327680, page_const 4096
Stack trace:
Frame Function Args
0028E4EC 6102796B (0028E4EC, 00000000, 00000000, 00640000)
0028E7DC 6102796B (6117EC60, 00008000, 00000000, 61180977)
0028F80C 61004F1B (611B66CC, 00C80000, 00CD0000, 0004F000)
0028F83C 6106E8C3 (7FFEFFFF, 000000FF, 00000008, 77C2FEA2)
0028F92C 610C133B (00000078, 02000000, 6116A724, 6116A720)
0028F95C 610064C0 (00000000, 00000000, 00000000, 00000000)
0028FA1C 6106FC15 (61000000, 00000001, 0028FD24, 00000001)
0028FA3C 77C4B990 (6106F960, 61000000, 00000001, 0028FD24)
0028FB30 77C50389 (0028FD24, 7EFDD000, 7EFDE000, 77D1206C)
0028FCB0 77C56C5C (0028FD24, 77C10000, 597A2CBD, 00000000)
0028FD00 77C55717 (0028FD24, 77C10000, 00000000, 00000000)
0028FD10 77C4BEB9 (0028FD24, 77C10000, 00000000, 0001002F)
End of stack trace
0 [main] ash 6536 fork: child -1 - died waiting for longjmp before initialization, retry 0, exit code 0x100, errno 11
I've looked at suggestions, which all recommend rebasing; however, rebaseall itself fails with the above error, plus this:
/usr/bin/rebaseall: 21: Cannot fork
Any help would be great! I've tried changing the virtual memory size etc., to no avail. Sophos is also running on this machine.
I'm trying to test a crash scenario (in an isolated test-app) with normal page heap (not full).
I have set up the flags with
gflags /p /enable Test.exe
and I'm overrunning an integer buffer by one element:
...
const size_t s = 100;
vector<int> v1(s, 0);                  // 100 ints = 400 (0x190) bytes
int* v1_base = &v1[0];
write_to_memory_int(v1_base, s+1);     // writes s+1 ints: one element past the end
...
and indeed, when the block is freed in the vector's destructor, I get a break. The call stack for the break is reported correctly:
0:005> kp
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr
0785faa4 11229df2 verifier!VerifierStopMessage+0x1f8
0785fb08 1122a22a verifier!AVrfpDphReportCorruptedBlock+0x1c2
0785fb64 1122a742 verifier!AVrfpDphCheckNormalHeapBlock+0x11a
0785fb84 112290d3 verifier!AVrfpDphNormalHeapFree+0x22
0785fba8 77951564 verifier!AVrfDebugPageHeapFree+0xe3
0785fbf0 7790ac29 ntdll!RtlDebugFreeHeap+0x2f
0785fce4 778b34a2 ntdll!RtlpFreeHeap+0x5d
0785fd04 750c14dd ntdll!RtlFreeHeap+0x142
0785fd18 71fc4c39 kernel32!HeapFree+0x14
0785fd64 00404b0a msvcr80!free(void * pBlock = 0x0726f7b8)+0xcd [f:\dd\vctools\crt_bld\self_x86\crt\src\free.c # 110]
0785fd90 00402ac7 Test!std::vector<int,std::allocator<int> >::_Tidy
...
However, when I look at the faulting allocation, I only get this:
0:005> !heap -p -a 0x0726f7b8
address 0726f7b8 found in
_HEAP # 30000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
0726f790 0039 0000 [00] 0726f7b8 00190 - (busy)
1122a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
11228f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
77950d96 ntdll!RtlDebugAllocateHeap+0x00000030
7790af0d ntdll!RtlpAllocateHeap+0x000000c4
778b3cfe ntdll!RtlAllocateHeap+0x0000023a
That is, there is an allocation stack trace, but it stops at RtlAllocateHeap, which is obviously useless.
Looking at the stack trace in memory:
dt _DPH_BLOCK_INFORMATION ....-0x20
=>
0:005> dds 0x03e556f4
03e556f4 00000000
03e556f8 00002050
03e556fc 00050000
03e55700 1122a6a7 verifier!AVrfpDphNormalHeapAllocate+0xd7
03e55704 11228f6e verifier!AVrfDebugPageHeapAllocate+0x30e
03e55708 77950d96 ntdll!RtlDebugAllocateHeap+0x30
03e5570c 7790af0d ntdll!RtlpAllocateHeap+0xc4
03e55710 778b3cfe ntdll!RtlAllocateHeap+0x23a
03e55714 00000000
03e55718 00003001
03e5571c 0004005e
It appears that there isn't in fact anything more recorded.
How can I fix Page Heap to record useful stack traces?
Note that the Test project is not compiled with FPO (/Oy), and I would not have expected RtlAllocateHeap to be affected by FPO.
Update: I checked whether the calls in question use FPO by stepping into the allocation manually (see below), and it would appear that both malloc and operator new of the VC80 (VS2005) runtime libraries have some form of FPO enabled, so maybe that's what breaks the stack trace in page heap's stack database.
0:004> kv
ChildEBP RetAddr Args to Child
077efa7c 77c8af0d 05290000 01001002 00000190 ntdll!RtlDebugAllocateHeap+0x16 (FPO: [Non-Fpo])
077efb60 77c33cfe 00000190 00000000 00000000 ntdll!RtlpAllocateHeap+0xc4 (FPO: [Non-Fpo])
077efbe4 72344d83 05290000 01001002 00000190 ntdll!RtlAllocateHeap+0x23a (FPO: [Non-Fpo])
077efc04 62f595ee 00000190 00000000 00000000 MSVCR80!malloc+0x7a (FPO: [1,0,0]) (CONV: cdecl)
077efc1c 00406a44 00000190 ebecf74f 00000001 MFC80U!operator new+0x2f (FPO: [Uses EBP] [1,0,0]) (CONV: cdecl)
077efc48 00405479 00000064 00000000 3fffffff Test!std::_Allocate<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > > >+0x84 (FPO: [Non-Fpo]) (CONV: cdecl)
077efcb8 004049f4 00000064 ebecf68f 00000000 Test!std::vector<unsigned int,std::allocator<unsigned int> >::_Buy+0x69 (FPO: [Non-Fpo]) (CONV: thiscall)
077efd88 00402a4f 00000064 077efdc0 ebecf44b Test!std::vector<int,std::allocator<int> >::_Construct_n+0x44 (FPO: [Non-Fpo]) (CONV: thiscall)
077eff4c 72342848 00000000 ebec8474 00000000 Test!crashFN+0x35f (FPO: [Non-Fpo]) (CONV: cdecl)
077eff84 723428c8 75da33aa 072ab3d8 077effd4 MSVCR80!_callthreadstart+0x1b (FPO: [Non-Fpo]) (CONV: cdecl)
077eff88 75da33aa 072ab3d8 077effd4 77c39f72 MSVCR80!_threadstart+0x5a (FPO: [1,0,0]) (CONV: stdcall)
077eff94 77c39f72 072ab3d8 70fca8b2 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
077effd4 77c39f45 7234286e 072ab3d8 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
077effec 00000000 7234286e 072ab3d8 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
Thanks to @Marc Sherman for pointing out in the comments that I should check the real allocation stack trace.
As already edited into the question, VC80 (VS2005) is the problem here, as its CRT has FPO enabled, as seen in the stack trace:
MSVCR80!malloc+0x7a (FPO: [1,0,0]) (CONV: cdecl)
MFC80U!operator new+0x2f (FPO: [Uses EBP] [1,0,0]) (CONV: cdecl)
Now, having an anchor to search, we find the following:
Why does every heap trace in UMDH get stuck at “malloc”?
Adding a few quotes:
In particular, it would appear that the default malloc implementation
on the static link CRT on Visual C++ 2005 not only doesn’t use a frame
pointer, but it trashes ebp as a scratch register ...
What does this all mean? Well, anything using malloc that’s built with
Visual C++ 2005 won’t be diagnosable with UMDH or anything else that
relies on ebp-based stack traces, at least not on x86 builds.
There is also a reply in the comments with useful information:
Mark Roberts [MSFT] says: February 25, 2008 at 3:03 pm
Hello,
Enabling FPO for the 8.0 CRT was not deliberate. The Visual Studio
2008 CRT (9.0) does NOT have FPO enabled, and UMDH should function
normally.
For 8.0, an alternative to UMDH would be to use LeakDiag. LeakDiag
will actually instrument memory allocators to obtain stack traces.
This makes it more versatile than UMDH as it can hook several
different allocator types at different granularities (Ranging from the
c runtime to raw virtual memory allocations).
By default, LeakDiag simply walks the stack base pointers, but it can
be modified to use the Dbghlp StackWalkAPI to resolve FPO data. This
will produce full stacks, though the performance penalty is higher. On
the flip side, you can customize the stack walking behavior to only go
to a certain depth, etc to minimize the perf penalty.
Please find LeakDiag here:
ftp://ftp.microsoft.com/PSS/Tools/Developer%20Support%20Tools/LeakDiag/leakdiag125.msi