LoadLibraryA with User32.dll crash in ntdll.dll (x64 assembly) - windows

So I have a block of assembly code that initializes a program, resolves kernel32, finds GetProcAddress, then finds LoadLibarayA to load User32.dll. It works up to the point of LoadLibraryA. It crashes in the function call but I can see User32.dll loaded in the debugger. If I try to use LoadLibraryA on a different module such as Kernel32.dll it returns and succeeds.
Here is the full source if you want to look it over
https://gist.github.com/mojobojo/921a5af897e86bb940a2
Exception thrown at 0x00007FFAFAE8E91C (ntdll.dll) in Small.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.
Here is the snippet that trys to load user32.
mov rcx, ActualAddress + User32DllStr ; ActualAddress is the program address in memory
call rax ; LoadLibararyA
cmp rax, 0
je EndFunction ; Failed to open user32.dll
LoadLibraryAStr:
db "LoadLibraryA", 0
Here is a look at the call stack.
ntdll.dll!RtlDosPathNameToRelativeNtPathName() Unknown
ntdll.dll!LdrpResolveDllName() Unknown
ntdll.dll!LdrpFindLoadedDll() Unknown
ntdll.dll!LdrGetDllHandleEx() Unknown
ntdll.dll!LdrGetDllHandle() Unknown
KernelBase.dll!00007ffaf82d2984() Unknown
KernelBase.dll!00007ffaf82d29ef() Unknown
user32.dll!00007ffaf934e7e8() Unknown
user32.dll!00007ffaf934dc92() Unknown
ntdll.dll!LdrpCallInitRoutine() Unknown
ntdll.dll!LdrpInitializeNode() Unknown
ntdll.dll!LdrpInitializeGraph() Unknown
ntdll.dll!LdrpPrepareModuleForExecution() Unknown
ntdll.dll!LdrpLoadDll() Unknown
ntdll.dll!LdrLoadDll() Unknown
KernelBase.dll!00007ffaf82d8e4a() Unknown
KernelBase.dll!00007ffaf82d97e5() Unknown
kernel32.dll!00007ffaf8b5499a() Unknown
Small.exe!0000000140010253() Unknown

I figured it out. My stack wasn't 16 byte aligned.

Related

breakpoint setting in nt!KiSystemCall64 not working

I want having a deep dive into ntdll!NtQueueApcThread seeing what happen after syscall instruction is executed. According to the document(Intel® 64 and IA-32 Architectures Software Developer’s Manual), the syscall instruction using the msr's LSTAR (0xC0000082) as the rip, so I set breakpoint in the nt!KiSystemCall64, which is the address I got by "rdmsr c0000082". But it's not working, my debugger doesn't break. It seems that the syscall inst does not jump to the address stored in the msr.
This is what I have done:
0: kd>
ntdll!NtQueueApcThread+0x10:
0033:00007ffb`3c32c640 7503 jne ntdll!NtQueueApcThread+0x15 (00007ffb`3c32c645)
0: kd>
ntdll!NtQueueApcThread+0x12:
0033:00007ffb`3c32c642 0f05 syscall
0: kd> rdmsr c0000082
msr[c0000082] = fffff802`3681e6c0
0: kd> u fffff802`3681e6c0
nt!KiSystemCall64:
fffff802`3681e6c0 0f01f8 swapgs
fffff802`3681e6c3 654889242510000000 mov qword ptr gs:[10h],rsp
fffff802`3681e6cc 65488b2425a8010000 mov rsp,qword ptr gs:[1A8h]
fffff802`3681e6d5 6a2b push 2Bh
fffff802`3681e6d7 65ff342510000000 push qword ptr gs:[10h]
fffff802`3681e6df 4153 push r11
fffff802`3681e6e1 6a33 push 33h
fffff802`3681e6e3 51 push rcx
0: kd> bp ffff802`3681e6c0
0: kd> t
ntdll!NtQueueApcThread+0x14:
0033:00007ffb`3c32c644 c3 ret
the target system version is Windows10 build 19042
After a few day study, I got to know what the problem is. In normal situation, setting a breakpoint in nt!KiSystemCall64 will cause a BSOD, this is because the first three instruction in nt!KiSystemCall64 is setting the kernel stack, which is required by windows kernel debugging mechanism. So the solution is setting the breakoint after the kernel stack is setup
bp nt!KiSystemCall64+0x15
In my situation, setting breakpoint does nothing happend, seems like the breakpoint is not working, I guess it is due to the anti-virus software I installed, which may have hooked some thing. After I uninstalled it, breakpoint is worked.

How to interpret exception codes shown in WinDbg?

I am just debugging a Windows application which is crashing. After starting the app, attaching to it with WinDbg, and then letting it crash, the following appeared in the WinDbg command window:
(119c.1794): Unknown exception - code 0000071a (first chance)
I've been searching the web but haven't found any explanation of how to interpret those exception codes.
If it makes a difference, it's a 32-bit .NET application running on 64-bit Windows 8 (via WoW64).
WinDbg already displays the name of exceptions when it knows it:
(15c0.1370): Break instruction exception - code 80000003 (first chance)
You get more details with .exr -1:
0:009> .exr -1
ExceptionAddress: 77d5000c (ntdll!DbgBreakPoint)
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 1
Parameter[0]: 00000000
You can also display NTSTATUS codes as proposed by #rrirower:
0:009> !gle
LastErrorValue: (Win32) 0 (0) - The operation completed successfully.
LastStatusValue: (NTSTATUS) 0 - STATUS_WAIT_0
And these status codes can be decoded with !error. It will consider Win32, Winsock, NTSTATUS and NetApi errors:
0:009> !error 0000071a
Error code: (Win32) 0x71a (1818) - The remote procedure call was cancelled.

MEM_FREE pages identification

I want to allocate a specific 512MB page which is MEM_FREE, and I want
to change that page to MEM_RESERVE and PAGE_NOACCESS.
Hence, with Windbg, I found a page and I called to
NtAllocateVirtualMemory on that page address with PAGE_RESERVED and
PAGE_NOACCESS.
After this call - I didn't notice that the page got PAGE_NOACCESS
flag(with !address command), but I couldn't change a memory address
inside that page - (I got memory access error for eb command).
So the operation succeeded because I couldn't change the memory.
Do you have an idea why I don't see in windbg PAGE_NOACCESS for
that page after change it's permissions?
Next Step, I called to VirtualQuery on that free page, and the
function failed with error 998 (Invalid access to memory location)
Finally I want to identify free pages and reveal their size.
Do you have an idea how can I get this information it VirtualQuery fails?
Thanks in Advance!
MEM_RESERVE reserves a range of addresses in the VAD tree. It doesn't commit pages that can have protection applied to them. Read the description of the PAGE_* constants. The phrase "committed region of pages" is written repeatedly. The Protect parameter of NtAllocateVirtualMemory says it applies to a "committed region of pages". Similarly flProtect of VirtualAllocEx is valid when "the pages are being committed".
Here's a Python script the commits a page of memory that's protected with PAGE_NOACCESS. Then it attempts to read the first byte, which of course raises an access violation exception.
test.py:
from ctypes import *
MEM_COMMIT = 0x1000
PAGE_NOACCESS = 1
VirtualAlloc = WinDLL('kernel32').VirtualAlloc
VirtualAlloc.restype = c_void_p
addr = VirtualAlloc(None, 4096, MEM_COMMIT, PAGE_NOACCESS)
array = (c_char * 4096).from_address(addr)
array[0] # access violation
demo:
(test) C:\>cdb -xi ld python test.py
Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
CommandLine: python test.py
Symbol search path is: symsrv*symsrv.dll*C:\Symbols*
http://msdl.microsoft.com/download/symbols
Executable search path is:
(d70.d08): Break instruction exception - code 80000003 (first chance)
ntdll!LdrpDoDebuggerBreak+0x30:
00000000`77b78700 cc int 3
0:000> g
(d70.d08): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
python34!PyBytes_FromStringAndSize+0x70:
00000000`64bd5cc0 0fb601 movzx eax,byte ptr [rcx]
ds:00000000`00190000=??
0:000> !address 190000
Usage: <unclassified>
Allocation Base: 00000000`00190000
Base Address: 00000000`00190000
End Address: 00000000`00191000
Region Size: 00000000`00001000
Type: 00020000 MEM_PRIVATE
State: 00001000 MEM_COMMIT
Protect: 00000001 PAGE_NOACCESS

User32.dll!NtUserWindowFromPoint getting corrupt when loaded by Mumble on Windows RT

I'm working on porting Mumble over to Windows RT (using the jailbreak), and I've hit an issue where this one function is getting corrupted when Mumble loads.
Mumble (Corrupt function):
0:000> dq user32.dll+0x023918
77a63918 47c3004244696841 4770df010c16f241
77a63928 4770df010c17f241 4770df010c18f241
77a63938 4770df010c19f241 4770df010c1af241
77a63948 4770df010c1bf241 4770df010c1cf241
77a63958 4770df010c1df241 4770df010c1ef241
77a63968 4770df010c1ff241 4770df015c81f44f
77a63978 4770df010c21f241 4770df010c22f241
77a63988 4770df010c23f241 4770df010c24f241
0:000> u user32.dll+0x023918
* ERROR: Symbol file could not be found. Defaulted to export symbols for
C:\windows\system32\user32.dll -
user32!WindowFromPoint:
77a63918 6841 ldr r1,[r0,#4]
77a6391a 4469 add r1,r1,sp
77a6391c 0042 lsls r2,r0,#1
77a6391e 47c3 ?blx r8
77a63920 f2410c16 mov r12,#0x1016
77a63924 df01 svc #1
TeXworks (Expected output):
0:000> dq user32.dll+0x23918
77a63918 4770df010c15f241 4770df010c16f241
77a63928 4770df010c17f241 4770df010c18f241
77a63938 4770df010c19f241 4770df010c1af241
77a63948 4770df010c1bf241 4770df010c1cf241
77a63958 4770df010c1df241 4770df010c1ef241
77a63968 4770df010c1ff241 4770df015c81f44f
77a63978 4770df010c21f241 4770df010c22f241
77a63988 4770df010c23f241 4770df010c24f241
0:000> u user32.dll+0x23918
* ERROR: Symbol file could not be found. Defaulted to export symbols for C:\windows\system32\USER32.dll -
USER32!WindowFromPoint:
77a63918 f2410c15 mov r12,#0x1015
77a6391c df01 svc #1
77a6391e 4770 bx lr
77a63920 f2410c16 mov r12,#0x1016
77a63924 df01 svc #1
77a63926 4770 bx lr
77a63928 f2410c17 mov r12,#0x1017
77a6392c df01 svc #1
(Apologies for the less than stellar formatting of the code, a screenshot of the windows can be found here: http://i.imgur.com/M6mLHN1.png )
Mumble uses Qt (customized by the Mumble team, to my understanding), Protobuf, Boost, and OpenSSL
TeXworks uses Qt
What I've tried so far:
Disabling the application compatibility engine
Unloading user32.dll at load, then reloading it (calling FreeLibrary 100 times, then calling LoadLibrary)
Removing anything that might look even remotely suspect from the manifests (from Qt and Mumble)
Removing the entire manifests (from Qt and Mumble)
If I patch this one function using cdb after Mumble launches it all works awesomely, but if I don't patch it the first action performed that calls that function ends in a crash. Opening/closing windows and dragging all call that function, so it's rather critical to the program that it's there.
Any help or pointers on this would be more than appreciated.
Edit: I've verified that it's something inside mainCRTStartup that's mucking around with it, trying to figure out what exactly it is now.
Edit 2: Found a platform-specific hook hidden in the Mumble code that was causing my troubles. Solved.
Since I can finally answer this now, Mumble had some hooks hidden away that I didn't know about. I defined a custom entrypoint that called mainCRTStartup so I could step through that and find exactly where that memory was getting changed, it led me straight to the hook.
Here's the code I used for that:
EXTERN_C int WINAPI mainCRTStartup();
void __stdcall EntryPoint()
{
MessageBox(HWND_DESKTOP,L"Pause(Before mainCRTStartup)",L"Pause(Before mainCRTStartup",MB_OK);
mainCRTStartup();
ExitProcess(0);
}
That allowed me to attach a debugger at the messagebox and step through mainCRTStartup until I found the static initializer that was getting called to load the hook.

WinDbg not showing useful information

First let me say I am a total WinDbg noob, so this might be an easy question...
I have an application ("MyApp" - name changed to protect the innocent!) that I am trying to debug because it is throwing an exception. This only happens on user machines - I have not been able to reproduce it on my development machine. So I set up DebugDiag on the users machine and captured a Full Dump. Then I loaded the dump in WinDbg and did an analyze -v and a kp to try to figure out what was going on... but neither of these seem to give me the information that I'm looking for - the function (and hopefully the line number) of the line that is causing the problem... I think I have the symbol file loaded by specifying the path to 'MyApp.pdb' in the Symbol File Path:
srv*c:\symcache*http://msdl.microsoft.com/download/symbols;srv*c:\symcache*C:\dev\Customer\MyAppSln\MyApp\Debug
First, here's the output from kp:
0:004> kp
ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
0502f474 7c347966 MyApp!DllMain+0x3e8a6
0502f4bc 7c3a2448 msvcr71!_nh_malloc(unsigned int size = <Memory access error>, int nhFlag = <Memory access error>)+0x24 [f:\vs70builds\3052\vc\crtbld\crt\src\malloc.c # 117]
0502f57c 7c3416b3 msvcp71!std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >::_Tidy(bool _Built = <Memory access error>, unsigned int _Newsize = <Memory access error>)+0x45 [f:\vs70builds\3077\vc\crtbld\crt\src\xstring # 1520]
0502f610 7c3a32de msvcr71!_heap_alloc(unsigned int size = <Memory access error>)+0xe0 [f:\vs70builds\3052\vc\crtbld\crt\src\malloc.c # 212]
0502f620 7c3b3f63 msvcp71!wmemcpy(wchar_t * _S1 = 0x04e463b9 "Ҹ???", wchar_t * _S2 = 0xffffffff "--- memory read error at address 0xffffffff ---", unsigned int _N = 0x4e25212)+0x14 [f:\vs70builds\3077\vc\crtbld\crt\src\wchar.h # 843]
0502f640 04e463b9 msvcp71!std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >::assign(class std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > * _Right = 0xffffffff, unsigned int _Roff = 0x4e25212, unsigned int _Count = 2)+0x7c [f:\vs70builds\3077\vc\crtbld\crt\src\xstring # 601]
0502f770 04df1077 MyApp!DllMain+0x65329
0502f824 04e01b35 MyApp!DllMain+0xffe7
0502ff08 04dfe034 MyApp!DllMain+0x20aa5
0502ff48 04dfde4f MyApp!DllMain+0x1cfa4
0502ff88 7648d0e9 MyApp!DllMain+0x1cdbf
0502ffc4 773499f9 kernel32!BaseThreadInitThunk+0xe
0502ffd4 7738198e ntdll!RtlQueryInformationAcl+0x8b
0502ffec 00000000 ntdll!_RtlUserThreadStart+0x1b
the line I'm specifically trying to decode is the 'MyApp!DllMain+0x65329' as this is the last line that seems to be executing, and the error is occurring within the malloc call, which is apparently where the exception is being thrown from. What am I doing wrong that makes it only display the module and offset instead of source file and line number?
I'm also not sure why the line above the malloc call is back in MyApp again - maybe someone can explain that too.
Just in case, here's the output from 'analyze -v':
0:004> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
*** WARNING: Unable to verify checksum for MyApp.exe
*** ERROR: Module load completed but symbols could not be loaded for MyApp.exe
*** WARNING: Unable to verify checksum for ThirdPartyDll.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for ThirdPartyDll.dll -
*** WARNING: Unable to verify checksum for mdnsNSP.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for mdnsNSP.dll -
*** ERROR: Symbol file could not be found. Defaulted to export symbols for SLC.dll -
FAULTING_IP:
MyApp!DllMain+3e8a6
04e1f936 8b16 mov edx,dword ptr [esi]
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 04e1f936 (MyApp!DllMain+0x0003e8a6)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 00000000
Attempt to read from address 00000000
PROCESS_NAME: MyApp.exe
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
EXCEPTION_PARAMETER1: 00000000
EXCEPTION_PARAMETER2: 00000000
READ_ADDRESS: 00000000
FOLLOWUP_IP:
msvcr71!_heap_alloc+e0 [f:\vs70builds\3052\vc\crtbld\crt\src\malloc.c # 212]
7c3416b3 e88e0c0000 call msvcr71!__SEH_epilog (7c342346)
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
LAST_CONTROL_TRANSFER: from 00000000 to 773bbb33
FAULTING_THREAD: ffffffff
BUGCHECK_STR: APPLICATION_FAULT_ACTIONABLE_HEAP_CORRUPTION_heap_failure_freelists_corruption_NULL_POINTER_READ_SHUTDOWN
PRIMARY_PROBLEM_CLASS: ACTIONABLE_HEAP_CORRUPTION_heap_failure_freelists_corruption_SHUTDOWN
DEFAULT_BUCKET_ID: ACTIONABLE_HEAP_CORRUPTION_heap_failure_freelists_corruption_SHUTDOWN
STACK_TEXT:
773bbb33 ntdll!RtlpAllocateHeap+0x7ad
773a6e0c ntdll!RtlAllocateHeap+0x1e3
7c3416b3 msvcr71!_heap_alloc+0xe0
FAULTING_SOURCE_CODE:
No source found for 'f:\vs70builds\3052\vc\crtbld\crt\src\malloc.c'
SYMBOL_STACK_INDEX: 2
SYMBOL_NAME: msvcr71!_heap_alloc+e0
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: msvcr71
IMAGE_NAME: msvcr71.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 3e561eac
STACK_COMMAND: dds 7740c078 ; kb
FAILURE_BUCKET_ID: ACTIONABLE_HEAP_CORRUPTION_heap_failure_freelists_corruption_SHUTDOWN_c0000005_msvcr71.dll!_heap_alloc
BUCKET_ID: APPLICATION_FAULT_ACTIONABLE_HEAP_CORRUPTION_heap_failure_freelists_corruption_NULL_POINTER_READ_SHUTDOWN_msvcr71!_heap_alloc+e0
If you believe the PDB should be in your symbol path, you should run something like this:
!sym noisy
.reload MyApp.dll
kp
!sym noisy causes the debugger to give out more detailed information on why it couldn't load symbols - no MyApp.pdb found, found but does not match, etc. This will help you find out why it is not loading symbols. !sym noisy again turns off the verbose symbol output.
When you set the path for symbols, did you reload them?
.reload
I'm not sure your adding
srv*c:\symcache*C:\dev\Customer\MyAppSln\MyApp\Debug
to the symbol path has the desired effect.
I usually list all local paths in the .sympath first, and as the last step, I do .symfix+ to configure the public symbols using the microsoft symbol server:
.sympath C:\dev\Customer\MyAppSln\MyApp\Debug
.symfix+ c:\symcache
the rationale behind listing local paths first being that the debugger would not have to check the remote server for pdbs (that are not there anyways) as opposed to simply retrieving them locally.
Anyways, your problem is that the symbols for MyApp are not loaded therefore stack walking does not quite work.
Debugger walks the stack backwards, starting from the top, that's why you're seeing MyApp - this is where the access violation occurred.
Now, since debugger does not have the symbols at this point, it can only guess what invocation chain has led to the function on top.
And it guesses wrong by following a misleading path.

Resources