My app crashed at startup in an MSHTML worker thread. The EXCEPTION_RECORD gives:
0:066> .exr 0e11f668
ExceptionAddress: 732019ab (rtutils!AcquireWriteLock+0x00000010)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000008
Parameter[1]: 732019ab
Attempt to execute non-executable address 732019ab
But !address shows that the address 732019ab is indeed executable:
0:066> !address 732019ab
Usage: Image
Base Address: 73201000
End Address: 7320a000
Region Size: 00009000
State: 00001000 MEM_COMMIT
Protect: 00000020 PAGE_EXECUTE_READ
Type: 01000000 MEM_IMAGE
Allocation Base: 73200000
Allocation Protect: 00000080 PAGE_EXECUTE_WRITECOPY
Image Path: C:\Windows\SysWOW64\rtutils.dll
Module Name: rtutils
Loaded Image Name: rtutils.dll
Mapped Image Name:
More info: lmv m rtutils
More info: !lmi rtutils
More info: ln 0x732019ab
More info: !dh 0x73200000
The instruction at 732019ab is:
0:066> u 732019ab l1
rtutils!AcquireWriteLock+0x10:
732019ab 8d4618 lea eax,[esi+18h]
Why is a DEP violation being reported at an address whose page is marked PAGE_EXECUTE_READ (with an allocation protection of PAGE_EXECUTE_WRITECOPY)?
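To make the contradiction explicit, here is a minimal Python sketch that decodes the two values involved. It assumes the documented Win32 meaning of the first access-violation parameter (0 = read, 1 = write, 8 = DEP execute) and the documented PAGE_EXECUTE* constants; the function names are made up for illustration.

```python
# First parameter of an access violation (EXCEPTION_RECORD.ExceptionInformation[0]).
AV_ACCESS_TYPES = {0: "read", 1: "write", 8: "execute (DEP)"}

# Page protection values that permit instruction fetches.
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
EXECUTABLE_MASK = (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                   PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


def describe_access_violation(access_type, address):
    """Hypothetical helper: turn Parameter[0]/Parameter[1] into text."""
    kind = AV_ACCESS_TYPES.get(access_type, "unknown")
    return f"{kind} at {address:08x}"


def is_executable(protect):
    """Hypothetical helper: True if the protection value allows execution."""
    return bool(protect & EXECUTABLE_MASK)


# Values taken from the .exr and !address output in the question.
print(describe_access_violation(0x8, 0x732019AB))
print(is_executable(0x20))  # Protect
print(is_executable(0x80))  # Allocation Protect
```

Both protection values decode as executable, which is why the reported DEP fault does not match what !address shows after the fact.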
Yep, that seems pretty impossible. I don't have an answer, but the list of possibilities is too long for a comment.
If I were to guess, I'd say something is playing with the protection flags on that page, but putting it back to PAGE_EXECUTE_READ after (or while) the exception is being raised. Start by seeing if your code (or any libraries you use) plays with VirtualProtect.
If that doesn't reveal anything, we can move on to some other possibilities:
Malware
Some malware likes to play with hooking/hotpatching and has been known to cause similar problems.
Faulty Antivirus
Antivirus applications employ a lot of the same tricks as malware. If issues stop after disabling it, you've found your culprit and can look at updating/replacing it.
A Bad Kernel Driver
In kernel mode, you can achieve the impossible accidentally, but never on purpose. :)
A Faulty CPU
An overclocked or poorly cooled CPU can cause many unpredictable things to happen. Not likely, but possible.
I'm studying the internals of the Windows kernel, and one of the things I'm looking into is how paging and virtual addresses work in Windows. I was experimenting with WinDbg's !vtop command when I noticed something strange: I was getting a seemingly impossible physical address.
For example, here is the output of a !process 0 0 command:
PROCESS fffffa8005319b30
SessionId: none Cid: 0104 Peb: 7fffffd8000 ParentCid: 0004
DirBase: a8df3000 ObjectTable: fffff8a0002f6df0 HandleCount: 29.
Image: smss.exe
When I run !vtop a8df3000 fffffa8005319b30, I get the following result:
lkd> !vtop a8df3000 fffffa8005319b30
Amd64VtoP: Virt fffffa80`05319b30, pagedir a8df3000
Amd64VtoP: PML4E a8df3fa8
Amd64VtoP: PDPE 2e54000
Amd64VtoP: PDE 2e55148
Amd64VtoP: Large page mapped phys 1`3eb19b30
Virtual address fffffa8001f07310 translates to physical address 13eb19b30
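The table-entry addresses in this output can be reproduced by hand. The sketch below assumes the standard x64 4-level paging layout (9 index bits per level, 8-byte entries, 2 MB large pages); the helper name is made up, and the input values are the ones from the !vtop output above.

```python
def split_va(va):
    """Hypothetical helper: split an x64 virtual address into paging indices."""
    return {
        "pml4": (va >> 39) & 0x1FF,
        "pdpt": (va >> 30) & 0x1FF,
        "pd": (va >> 21) & 0x1FF,
        "pt": (va >> 12) & 0x1FF,
        "offset_2m": va & 0x1FFFFF,  # byte offset inside a 2 MB large page
    }


dirbase = 0xA8DF3000
idx = split_va(0xFFFFFA8005319B30)

# Each table entry is 8 bytes wide.
pml4e_address = dirbase + 8 * idx["pml4"]
pde_offset = 8 * idx["pd"]
large_page_base = 0x13EB19B30 - idx["offset_2m"]

print(hex(pml4e_address))    # compare with "PML4E a8df3fa8"
print(hex(pde_offset))       # compare with the low bits of "PDE 2e55148"
print(hex(large_page_base))  # 2 MB aligned base of the large page
```

So the translation itself is internally consistent; the open question is only why the resulting physical address lies above 4 GB.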
The problem I have with this is that the VM I'm running this test on only has 4 GB of RAM, and 13eb19b30 is 5,346,794,288, which is beyond the 4 GB mark.
When I run !dd 13eb19b30 and dd fffffa8001f07310, I get the same result, so Windows seems to be able to access this physical address somehow. Does anyone know how this is done?
I found a post on Cheat Engine from someone who seems to have had a similar problem, but no solution was found in that case either.
I see you have also posted this on RESE. I saw it there, but didn't understand exactly what you are trying to do.
I see a few discrepancies:
You seem to have used a PFN of a8df3000, but WinDbg seems to be using a PFN of 187000 instead.
By the way, the PFN, IIRC, should be dirbase & 0xfffff000.
Also, for the virtual address you seem to be using the EPROCESS address of your process. Are you sure that this is the virtual address you want to use?
Also, you appear to be using lkd, the local kernel debugging prompt, and I hope you understand that lkd is not real kernel debugging.
So I think I was finally able to come up with a reasonable answer to the problem. It turns out that VMware doesn't actually expose contiguous physical memory to the VM, but instead segments it into different memory "runs". I was able to confirm this by using Volatility:
$ python vol.py -f ~/Desktop/Win7SP1x64-d8737a34.vmss vmwareinfo --verbose | less
Magic: 0xbad1bad1 (Version 1)
Group count: 0x5c
File Offset PhysMem Offset Size
----------- -------------- ----------
0x000010000 0x000000000000 0xc0000000
0x0c0010000 0x000100000000 0xc0000000
The Volatility GitHub wiki has an article that goes into more detail about this.
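To see how the runs explain the address from the question, here is a small Python sketch that looks up a guest physical address in the two runs listed in the vmwareinfo output. The run table is copied from that output; the function name is invented for this example.

```python
# (file offset, guest physical offset, size), copied from the vmwareinfo output.
RUNS = [
    (0x000010000, 0x000000000000, 0xC0000000),
    (0x0C0010000, 0x000100000000, 0xC0000000),
]


def phys_to_file_offset(phys, runs=RUNS):
    """Hypothetical helper: map a guest physical address to a .vmss file offset.

    Returns None when the address falls into a gap between runs.
    """
    for file_off, phys_off, size in runs:
        if phys_off <= phys < phys_off + size:
            return file_off + (phys - phys_off)
    return None


hit = phys_to_file_offset(0x13EB19B30)
gap = phys_to_file_offset(0xD0000000)
print(hex(hit) if hit is not None else None)
print(gap)
```

The address 13eb19b30 lands in the second run, which starts at guest physical address 0x100000000, while an address such as d0000000 falls into the gap between the two runs. That is how a valid physical address can be numerically larger than the amount of RAM.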
I have crash dumps that have WerpReportFault() in their stack and they really don't look the way I expect them to.
My expectation
I have seen WerpReportFault() along with 0x80000003 breakpoints, and I was able to use WinDbg to re-dump with different exception pointers, taken from the second argument passed to WerpReportFault().
I'm very sure that has worked before, since I even recommended that in my answer over there. There are also other sites suggesting this technique, e.g. James Ross
My current observations
The dumps I'm analyzing have an "ordinary exception" inside, e.g. an access violation:
0:000> .exr -1
ExceptionAddress: 53ec8b55
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 53ec8b55
Attempt to read from address 53ec8b55
But they still have WerpReportFault() as the stack:
0:000> k
ChildEBP RetAddr
0018f25c 74c4171a ntdll!NtWaitForMultipleObjects+0x15
0018f2f8 75181a08 KERNELBASE!WaitForMultipleObjectsEx+0x100
0018f340 75184200 kernel32!WaitForMultipleObjectsExImplementation+0xe0
0018f35c 751a80ec kernel32!WaitForMultipleObjects+0x18
0018f3c8 751a7fab kernel32!WerpReportFaultInternal+0x186
0018f3dc 751a78a0 kernel32!WerpReportFault+0x70
0018f3ec 751a781f kernel32!BasepReportFault+0x20
0018f478 7295fa2e kernel32!UnhandledExceptionFilter+0x1af
Argument 2 does not seem to be a good exception pointer to be used in the .dump command.
0:000> kb
ChildEBP RetAddr Args to Child
[...]
0018f3dc 751a78a0 0018f4a0 00000001 0018f478 kernel32!WerpReportFault+0x70
[...]
Question
What causes the problem I have, and how do I get around it? I know it must be possible, because !analyze -v can tell me the real call stack.
Is it due to Visual Basic 6 and the unhandled exception filter?
0018f478 7295fa2e 00000000 72a2bd04 0018f4a8 kernel32!UnhandledExceptionFilter+0x1af
0018ff80 00440fe2 00443860 7518338a 7efde000 msvbvm60!Zombie_Release+0x10fd5
I really want to have a nice call stack, since all my manual debugging and all my scripts that rely on k, !clrstack and similar are broken. They can't deal with WerpReportFault() on the stack.
All the dumps are 32 bit, as you can imagine from the VB6 dependency.
This problem is caused by looking at the wrong context: the debugger seems to be set to the normal context record. To switch to the exception context, use .ecxr. To switch back to the normal context (the one you are seeing now), use .cxr.
My heap buffer of interest was allocated as follows:
0:047> !heap -p -a 1d7cd1f0
address 1d7cd1f0 found in
_DPH_HEAP_ROOT # 5251000
in busy allocation ( DPH_HEAP_BLOCK: UserAddr UserSize - VirtAddr VirtSize)
1cf8f5b0: 1d7cc008 3ff8 - 1d7cb000 6000
68448e89 verifier!AVrfDebugPageHeapAllocate+0x00000229
76e465ee ntdll!RtlDebugAllocateHeap+0x00000030
76e0a793 ntdll!RtlpAllocateHeap+0x000000c4
76dd5dd0 ntdll!RtlAllocateHeap+0x0000023a
000ca342 TEST+0x0002a342
000be639 TEST+0x0001e639
As you can see, it was allocated using HeapAlloc(). When I run the !address command on the pointer of this heap I get:
ProcessParametrs 01699928 in range 01699000 0169a000
Environment 016976e8 in range 01697000 01698000
1d790000 : 1d7cb000 - 00005000
Type 00020000 MEM_PRIVATE
Protect 00000004 PAGE_READWRITE
State 00001000 MEM_COMMIT
Usage RegionUsageIsVAD
It claims to be in RegionUsageIsVAD. According to this Stack Overflow answer, RegionUsageIsVAD generally means one of two things:
This is a .NET application, in which case the CLR allocates this block of memory.
The application calls VirtualAlloc to allocate a block of memory.
My scenario does not fit either one of these cases. I confirmed that CLR wasn't used by running .cordll -ve -u -l to which I got:
CLR DLL status: No load attempts
What does RegionUsageIsVAD mean in this case?
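For reference, the numbers in the two outputs above do describe the same memory. The Python sketch below only does the arithmetic on the values from the !heap -p -a and !address output; reading the leftover page as the page heap guard page is an assumption based on how full page heap normally lays out allocations.

```python
PAGE = 0x1000

# From the DPH_HEAP_BLOCK line: UserAddr UserSize - VirtAddr VirtSize
user_addr, user_size = 0x1D7CC008, 0x3FF8
virt_addr, virt_size = 0x1D7CB000, 0x6000

# From the !address output: "1d7cb000 - 00005000"
region_base, region_size = 0x1D7CB000, 0x5000

pointer = 0x1D7CD1F0

user_end = user_addr + user_size
region_end = region_base + region_size
virt_end = virt_addr + virt_size

print(hex(user_end), hex(region_end), hex(virt_end))
print(user_addr <= pointer < user_end)  # pointer lies inside the user buffer
print(hex(virt_end - region_end))       # presumably the guard page (assumption)
```

The user buffer ends exactly at the end of the committed read/write region, and the block's virtual range extends one more page beyond it.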
I reread your question, thinking I would expand on what I commented, but on a closer look it seems there are a lot of holes. It appears you copied things and didn't paste them correctly.
Where is that pointer on the heap? 01699928? Which version of WinDbg are you using?
Since I couldn't confirm it, I cooked up a simple program, enabled hpa in gflags and executed the exe under WinDbg.
Below is the screenshot.
Except for what you pasted as RegionUsageIsVAD (that line is output by the kernel-mode !address, not by the user-mode !address), everything else appears to be similar in the screenshot.
I'm writing a framebuffer for an SPI LCD display on ARM. Before I complete that, I've written a memory only driver and trialled it under Ubuntu (Intel, Virtualbox). The driver works fine - I've allocated a block of memory using kmalloc, page aligned it (it's page aligned anyway actually), and used the framebuffer system to create a /dev/fb1. I have my own mmap function if that's relevant (deferred_io ignores it and uses its own by the look of it).
I have set:
info->screen_base = (u8 __iomem *)kmemptr;
info->fix.smem_len = kmem_size;
When I open /dev/fb1 with a test program and mmap it, it works correctly. I can see what is happening by using x11vnc to "share" fb1 out:
x11vnc -rawfb map:/dev/fb1#320x240x16
And view with a vnc viewer:
gvncviewer strontium:0
I've made sure I've no overflows by writing to the entire mmapped buffer and that seems to be fine.
The problem arises when I add in deferred_io. As a test of it, I have a delay of 1 second and the called deferred_io function does nothing except a pr_devel() print. I followed the docs.
Now, the test program opens /dev/fb1 fine, mmap returns ok but as soon as I write to that pointer, I get a kernel panic. The following dump is from the ARM machine actually but it panics on the Ubuntu VM as well:
root#duovero:~/testdrv# ./fbtest1 /dev/fb1
Device opened: /dev/fb3
Screen is: 320 x 240, 16 bpp
Screen size = 153600 bytes
mmap on device succeeded
Unable to handle kernel paging request at virtual address bf81e020
pgd = edbec000
[bf81e020] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in: hhlcd28a(O) sysimgblt sysfillrect syscopyarea fb_sys_fops bnep ipv6 mwifiex_sdio mwifiex btmrvl_sdio firmware_class btmrvl cfg80211 bluetooth rfkill
CPU: 0 Tainted: G O (3.6.0-hh04 #1)
PC is at fb_deferred_io_fault+0x34/0xb0
LR is at fb_deferred_io_fault+0x2c/0xb0
pc : [<c0271b7c>] lr : [<c0271b74>] psr: a0000113
sp : edbdfdb8 ip : 00000000 fp : edbeedb8
r10: edbeedb8 r9 : 00000029 r8 : edbeedb8
r7 : 00000029 r6 : bf81e020 r5 : eda99128 r4 : edbdfdd8
r3 : c081e000 r2 : f0000000 r1 : 00001000 r0 : bf81e020
Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c5387d Table: adbec04a DAC: 00000015
Process fbtest1 (pid: 485, stack limit = 0xedbde2f8)
Stack: (0xedbdfdb8 to 0xedbe0000)
[snipped out hexdump]
[<c0271b7c>] (fb_deferred_io_fault+0x34/0xb0) from [<c00db0c4>] (__do_fault+0xbc/0x470)
[<c00db0c4>] (__do_fault+0xbc/0x470) from [<c00dde0c>] (handle_pte_fault+0x2c4/0x790)
[<c00dde0c>] (handle_pte_fault+0x2c4/0x790) from [<c00de398>] (handle_mm_fault+0xc0/0xd4)
[<c00de398>] (handle_mm_fault+0xc0/0xd4) from [<c049a038>] (do_page_fault+0x140/0x37c)
[<c049a038>] (do_page_fault+0x140/0x37c) from [<c0008348>] (do_DataAbort+0x34/0x98)
[<c0008348>] (do_DataAbort+0x34/0x98) from [<c0498af4>] (__dabt_usr+0x34/0x40)
Exception stack(0xedbdffb0 to 0xedbdfff8)
ffa0: 00000280 0000ffff b6f5c900 00000000
ffc0: 00000003 00000000 00025800 b6f5c900 bea6dc1c 00011048 00000032 b6f5b000
ffe0: 00006450 bea6db70 00000000 000085d6 40000030 ffffffff
Code: 28bd8070 ebffff37 e2506000 0a00001b (e5963000)
---[ end trace 7e5ca57bebd433f5 ]---
Segmentation fault
root#duovero:~/testdrv#
I'm totally stumped - other drivers look more or less the same as mine but I assume they work. Most use vmalloc actually - is there a difference between kmalloc and vmalloc for this purpose?
Confirmed the fix so I'll answer my own question:
deferred_io replaces the info mmap with its own, which sets up fault handlers for writes to the video memory pages. In the fault handler, it:
checks bounds against info->fix.smem_len, so you must set that;
gets the page that was written to.
For the latter case, it treats vmalloc differently from kmalloc (by checking info->screen_base to see if it's vmalloced). If you have vmalloced, it uses screen_base as the virtual address. If you have not used vmalloc, it assumes that the address of interest is the physical address in info->fix.smem_start.
So, to use deferred_io correctly:
set info->screen_base (char __iomem *) to the video buffer's virtual address.
set info->fix.smem_len to the video buffer size.
if you are not using vmalloc, you must also set info->fix.smem_start to the video buffer's physical address, using virt_to_phys(vid_buffer).
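Why a missing smem_start is fatal can be illustrated with a toy Python model of the non-vmalloc lookup described above. This is not kernel code: the function name is invented, a 4 KiB page size is assumed, and the physical address 0x8DA00000 is a made-up example value. Only the 153600-byte buffer size comes from the test program's output.

```python
PAGE_SHIFT = 12  # assumes 4 KiB pages


def kmalloc_fault_pfn(offset, smem_start, smem_len):
    """Toy model: page frame number the fault handler would use for a
    buffer that was NOT allocated with vmalloc."""
    if offset >= smem_len:
        raise ValueError("offset is outside the framebuffer")
    return (smem_start + offset) >> PAGE_SHIFT


FB_SIZE = 153600  # 320 x 240 at 16 bpp

# smem_start left at 0: the handler ends up at frame 1, unrelated to the buffer.
print(hex(kmalloc_fault_pfn(0x1000, 0, FB_SIZE)))

# smem_start set from virt_to_phys(): hypothetical physical address.
print(hex(kmalloc_fault_pfn(0x1000, 0x8DA00000, FB_SIZE)))
```

With smem_start unset, the computed frame depends only on the offset into the buffer, so the handler returns a page that has nothing to do with the video memory.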
Confirmed on Ubuntu as fixing the issue.
Really interesting. I'm currently implementing an SPI-based display FB driver too (for a Sharp Memory LCD display and my VFDHack32 host driver). I'm also facing a similar problem where it crashes in deferred_io. Can you share your source code? Mine is in my GitHub repo. P.S. The Memory LCD display is monochrome, so I just pretend it is a color display and check whether each pixel byte is empty (dot off) or not empty (dot on).
Here's the nm dump of my program.
00000000 T __ctors_end
00000000 T __ctors_start
00000000 T __dtors_end
00000000 T __dtors_start
00000000 a __tmp_reg__
00000000 T __trampolines_end
00000000 T __trampolines_start
00000000 T setup
00000001 a __zero_reg__
0000003d a __SP_L__
0000003e a __SP_H__
0000003f a __SREG__
00000072 T __vector_15
00000086 T main
000000a8 A __data_load_end
000000a8 A __data_load_start
000000a8 T _etext
00800100 D _edata
00800100 T _end
00810000 T __eeprom_end
The architecture is AVR, and I need to get main() back up to 0x00000000 in order for the chip that I'm running this code on to execute properly. It should be as simple as a linker script, shouldn't it?
It doesn't matter where main() is in memory. Simply put a jump instruction to its address at the reset vector, or 0x0000 in application memory.
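As a rough illustration of what that jump looks like in flash, the Python sketch below encodes an AVR JMP instruction to the address of main from the nm dump (0x86). It assumes the device supports the 32-bit JMP opcode (smaller AVRs only have RJMP) and that nm reports byte addresses; the function name is invented, and the sketch deliberately handles only targets below 128 KiB.

```python
def encode_avr_jmp(byte_address):
    """Hypothetical helper: encode 'jmp byte_address' as 4 flash bytes.

    JMP takes a word address; for targets below 128 KiB the first opcode
    word is 0x940C and the second word is the word address itself.
    """
    if byte_address % 2:
        raise ValueError("AVR instructions are word aligned")
    word_address = byte_address // 2
    if word_address >= 0x10000:
        raise ValueError("this sketch only handles targets below 128 KiB")
    # Both words are stored little-endian in flash.
    return (0x940C).to_bytes(2, "little") + word_address.to_bytes(2, "little")


reset_vector = encode_avr_jmp(0x86)  # address of main in the nm dump
print(reset_vector.hex())
```

In practice the assembler or the C runtime's startup code emits this instruction; the sketch only shows the bytes that end up at address 0x0000.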
I used to program for AVR, and as far as I know the only way to change the entry point is through the fuse bits, and that only lets you move it to the bootloader section at the end of flash. Depending on the chip, main starts in different places; I'm not sure, but on AVR it should be somewhere around 0x20 to 0x100.
That is because the RESET vector, registers and interrupt vectors come first.
This structure is very helpful: I once had a project where I wasn't able to use the watchdog, so the only way to trigger a reset was an overflow.
Also, I've read your comment. You don't need to put 256 bytes of 0x00 there. That area is for some registers (AVR registers are divided into two places, one in SRAM and the other in flash) and the interrupt vectors, so if you used, say, a timer or the UART and your code started at 0x00, initializing them would destroy your code.
It is designed to work this way, and I think redesigning it would spoil that. But if you really want this, you can try adding the -Ttext=0x0000 flag. This may build it the way you want, but I do not recommend doing that.