Pinning a DLL in memory (increase reference count) - debugging

I am trying to run an application, but the application exits due to an access violation. Running the application in the debugger I can see that this is caused by an unloaded library. I can not wait for the next release of the application, so I'm trying to workaround the problem.
I wonder whether WinDbg provides a way of increasing the reference count of a loaded module, similar to the C++ LoadLibrary() call. I could then break on module loads and increase the reference count on the affected DLL to see if I can use the application then.
I have already looked for commands starting with .load, !load, .lock, !lock, .mod and !mod in WinDbg help. .load will load the DLL as an extension into the debugger process, not into the target process.
Update
Forgot to mention that I have no source code, so I can't simply implement a LoadLibrary() call as a workaround and recompile.
The comment by Hans Passant leads me to .call and I tried to use it like
.call /v kernel32!LoadLibraryA("....dll")
but it gives the error message
Symbol not a function in '.call /v kernel32!LoadLibraryA("....dll")'
Update 2
Probably the string for the file name in .call should be a pointer to some memory in the target process instead of a string which resides in WinDbg.exe where I type the command. That again means I would probably mean to allocate some memory to store the string inside, so this might become more complex.

Using .call in windbg as always been finicky to me. I believe you are having trouble with it because kernel32 only has public symbols so the debugger doesn't know what it's arguments look like.
So let's look at some alternatives...
The easy way
You can go grab a tool like Process Hacker, which I think is a wonderful addition to any debugger's tool chest. It has an option to inject a DLL into a process.
Behind the scenes, it calls CreateRemoteThread to spawn a thread in the target process which calls LoadLibrary on the chosen DLL. With any luck, this will increase the module reference count. You can verify that the LoadCount has been increased in windbg by running the !dlls command before and after the dll injection.
The hard way
You can also dig into the internal data structures Windows uses to keep track of a process's loaded modules and play with the LoadCount. This changes between versions of Windows and is a serious no-no. But, we're debugging, so, what the hell? Let's do this.
Start by getting a list of loaded modules with !dlls. Suppose we care about your.dll; we might see something like:
0x002772a8: C:\path\to\your.dll
Base 0x06b80000 EntryPoint 0x06b81000 Size 0x000cb000 DdagNode 0x002b3a10
Flags 0x800822cc TlsIndex 0x00000000 LoadCount 0x00000001 NodeRefCount 0x00000001
We can see that the load count is currently 1. To modify it, we could use the address printed before the module path. It is the address of the the ntdll!_LDR_DATA_TABLE_ENTRY the process holds for that module.
r? #$t0 = (ntdll!_LDR_DATA_TABLE_ENTRY*) 0x002772a8
And, now you can change the LoadCount member to something larger as so:
?? #$t0->LoadCount = 2
But, as I said, this stuff changes with new versions of Windows. On Windows 8, the LoadCount member was moved out of _LDR_DATA_TABLE_ENTRY and into a new ntdll!_LDR_DDAG_NODE structure. In place of it, there is now an ObsoleteNodeCount which is not what we want.
On Windows 8, we would run the following command instead:
?? #$t0->DdagNode->LoadCount = 2
And, time to check our work...
0x002772a8: C:\path\to\your.dll
Base 0x06b80000 EntryPoint 0x06b81000 Size 0x000cb000 DdagNode 0x002b3a10
Flags 0x800822cc TlsIndex 0x00000000 LoadCount 0x00000002 NodeRefCount 0x00000001
Awesome. It's 2 now. That'll teach FreeLibrary a lesson about unloading our DLLs before we say it can.
The takeaway
Try the easy way first. If that doesn't work, you can start looking at the internal data structures Windows uses to keep track of this kind of stuff. I don't provide the hard way hoping you'll actually try it, but that it might make you more comfortable around the !dlls command and those data structures in the future.
Still, all modifying the LoadCount will afford you is confirmation that you are seeing a DLL get unloaded before it should have. If the problem goes away after artificially increasing the LoadCount, meaning that you've confirmed your theory, you'll have to take a different approach to debugging it -- figuring out when and why it got unloaded.

A dll that is linked while compiling will normally have a LoadCount of -1 (0xffff) and it is not Unloadable Via FreeLibrary
so you can utilize the loadModule Event to break on a Dynamically Loaded Module and increase the LoadCount during the Event
Blink of InLoadOrderModuleList (last dll Loaded in the process) when on initial break at ntdll!Dbgbreak() xp-sp3 for an arbitrary console app which uses a dll
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName LoadCount ##((( #$peb)->Ldr)->InLoadOrderModuleList.Blink)
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\system32\GDI32.dll"
+0x038 LoadCount : 0xffff <----------- not unloadable via FreeLibrary
setting up break on Specific Module Load
0:000> sxe ld skeleton
0:000> g
ModLoad: 10000000 10005000 C:\skeleton.dll
ntdll!KiFastSystemCallRet:
7c90e514 c3 ret
the LoadModule Breaks on MapSection so Ldr isnt yet updated
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName LoadCount ##((( #$peb)->Ldr)->InLoadOrderModuleList.Blink)
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\system32\GDI32.dll"
+0x038 LoadCount : 0xffff
go up until the Ldr is updated
0:000> gu;gu;gu
ntdll!LdrpLoadDll+0x1e9:
7c91626a 8985c4fdffff mov dword ptr [ebp-23Ch],eax ss:0023:0013fa3c=00000000
blink showing the last loaded Module notice loadCount 0 not updated yet
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName LoadCount ##((( #$peb)->Ldr)->InLoadOrderModuleList.Blink)
+0x024 FullDllName : _UNICODE_STRING "C:\skeleton.dll"
+0x038 LoadCount : 0
dump the LoadEntry of the module
0:000> !dlls -c skeleton
Dump dll containing 0x10000000:
**0x00252840:** C:\skeleton.dll
Base 0x10000000 EntryPoint 0x10001000 Size 0x00005000
Flags 0x00000004 LoadCount 0x00000000 TlsIndex 0x00000000
LDRP_IMAGE_DLL
increase load count arbitrarily and redump (process attach hasnt been called yet)
0:000> ed 0x252840+0x38 4
0:000> !dlls -c skeleton
Dump dll containing 0x10000000:
0x00252840: C:\skeleton.dll
Base 0x10000000 EntryPoint 0x10001000 Size 0x00005000
Flags 0x00000004 LoadCount 0x00000004 TlsIndex 0x00000000
LDRP_IMAGE_DLL
run the binary
0:000> g
dll is loaded into the process break with ctrl+break
Break-in sent, waiting 30 seconds...
(aa0.77c): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
7c90120e cc int 3
dump and see system has updated the loadcount to our count+1 also process attach has been called
0:001> !dlls -c skeleton
Dump dll containing 0x10000000:
0x00252840: C:\skeleton.dll
Base 0x10000000 EntryPoint 0x10001000 Size 0x00005000
Flags 0x80084004 LoadCount 0x00000005 TlsIndex 0x00000000
LDRP_IMAGE_DLL
LDRP_ENTRY_PROCESSED
LDRP_PROCESS_ATTACH_CALLED
btw use ken johnsons (skywing) sdbgext !remotecall instead of .call
it doesnt require Private Symbols
.load sdbgext
!remotecall kernel32!LoadLibraryA 0 "c:\skeleton.dll" ; g
should load the dll in the process
or use
!loaddll "c:\\skeleton.dll" from the same extension
kernel32!LoadLibraryA() will be run when execution is resumed
0:002> g
kernel32!LoadLibraryA() [conv=0 argc=4 argv=00AC0488]
kernel32!LoadLibraryA() returned 10000000

Simplest way - get .dll path and LoadLibrary it.
It will increase .dll reference count and .dll will not be released.

Related

writing a !address equivalent in WinAPI

Implementing !address feature of Windbg...
I am using VirtualQueryEx to query another Process memory and using getModuleFileName on the base addresses returned from VirtualQueryEx gives the module name.
What is left are the other non-module regions of a Process. How do I determine if a file is mapped to a region, or if the region represents the stack or the heap or PEB/TEB etc.
Basically, How do I figure out if a region represents Heap, the stack or PEB. How does Windbg do it?
One approach is to disassemble the code in the debugger extension DLL that implements !address. There is documentation within the Windbg help file on writing an extension. You could use that documentation to reverse engineer where the handler of !address is located. Then browsing through the disassembly you can see what functions it calls.
Windbg has support for debugging another instance of Windbg, specifically to debug an extension DLL. You can use this facility to better delve into the implementation of !address.
While the reverse engineering approach may be tedious, it will be more deterministic than theorizing how !address is implemented and trying out each theory.
To add to #Χpẘ answer, the reverse of the command shouldn't be really hard as debugger extensions DLLs come with symbols (I already reversed one to explain the internal flag of the !heap command).
Note that it is just a quick overview, I haven't perused inside it too much.
According to the !address documentation the command is located in exts.dll library. The command itself is located in Extension::address.
There are two commands handled there, a kernel mode (KmAnalyzeAddress) and a user mode one (UmAnalyzeAddress).
Inside UmAnalyzeAddress, the code:
Parse the command line: UmParseCommandLine(CmdArgs &,UmFilterData &)
Check if the process PEB is available IsTypeAvailable(char const *,ulong *) with "${$ntdllsym}!_PEB"
Allocate a std::list of user mode ranges: std::list<UmRange,std::allocator<UmRange>>::list<UmRange,std::allocator<UmRange>>(void)
Starts a loop to gather the required information:
UmRangeData::GetWowState(void)
UmMapBuild
UmMapFileMappings
UmMapModules
UmMapPebs
UmMapTebsAndStacks
UmMapHeaps
UmMapPageHeaps
UmMapCLR
UmMapOthers
Finally the results are finally output to screen using UmPrintResults.
Each of the above function can be simplfied to basic components, e.g. UmFileMappingshas the following central code:
.text:101119E0 push edi ; hFile
.text:101119E1 push offset LibFileName ; "psapi.dll"
.text:101119E6 call ds:LoadLibraryExW(x,x,x)
.text:101119EC mov [ebp+hLibModule], eax
.text:101119F2 test eax, eax
.text:101119F4 jz loc_10111BC3
.text:101119FA push offset ProcName ; "GetMappedFileNameW"
.text:101119FF push eax ; hModule
.text:10111A00 mov byte ptr [ebp+var_4], 1
.text:10111A04 call ds:GetProcAddress(x,x)
Another example, to find each stacks, the code just loops trhough all threads, get their TEB and call:
.text:1010F44C push offset aNttib_stackbas ; "NtTib.StackBase"
.text:1010F451 lea edx, [ebp+var_17C]
.text:1010F457 lea ecx, [ebp+var_CC]
.text:1010F45D call ExtRemoteTyped::Field(char const *)
There is a lot of fetching from _PEB, _TEB, _HEAP and other internal structures so it's not probably doable without going directly through those structures. So, I guess that some of the information returned by !address are not accessible through usual / common APIs.
You need to determine if the address you are interested in lies within a memory mapped file. Check out --> GetMappedFileName. Getting the heap and stack addresses of a process will be a little more problematic as the ranges are dynamic and don't always lie sequentially.
Lol, I don't know, I would start with a handle to the heap. If you can spawn/inherit a process then you more than likely can access the handle to the heap. This function looks promising: GetProcessHeap . That debug app runs as admin, it can walk the process chain and spy on any user level process. I don't think you will be able to access protected memory of kernel mode apps such as File System Filters, however, as they are dug down a little lower by policy.

Why I see ntdll disassembly, not my assembly code in WinDbg?

I want to analyze my first assembler program. Look at registers, execute step by step etc. It's for learning purposes.
I have a problem. Disassembly is strange. I can't find my code. I step through some ntdll functions. There is no my MessageBoxA, no ExitProcess etc.
I have used OllyDbg (32-bit) before and OllyDbg starts executing from normal entry of the program. Disassembly in OllyDbg was very similar to my MASM code.
What I am doing wrong? Why there is different disassembly? How to step through my code, not ntdll?
ollydbg interpreted the Address of Entry Point in the Pe Header and set a temporary BreakPoint and continued execution until that address Which Normally would be main if it was an assembler program or WinMainCrtStartup if it was compiler generated binary
Windbg will not do that it will stop at the system BreakPoint ( the default behavior of the Debug Apis)
you have two options
if it is your own assembler program and you have symbols for it
do
.reload /f
bp [your exe]![Your EntryPoint]
bl to confirm if it is right
g to continue execution and break in your code
if it is a third party program for which you do not have source
lmm [name of third party binary]
!dh [start address of the third party binary] ( see lm results to know the address)
look for Address of Entry Point
bp [start address of the third party binary] + [Address of entrypoint]
g
windbg will stop in the user code
a sample for method 2 on calc.exe under win7 sp1
0:000> lm m calc
start end module name
00210000 002d0000 calc (deferred)
0:000> !dh 210000
---------------------
12D6C address of entry point
0:000> bp 210000+12d6c
0:000> bl
0 e 00222d6c 0001 (0001) 0:**** calc!WinMainCRTStartup
0:000> g
calc!WinMainCRTStartup:
00222d6c e84bfdffff call calc!__security_init_cookie (00222abc)
Short answer: bu $exentry; g
Long answer:
A process starts its user mode execution in OS code found in ntdll which does all sorts of initialization and finally calls the entry point stored in the PE header.
WinDbg breaks in what's called the initial breakpoint (or ibp in WinDbg parlance). OllyDbg continues running after this breakpoint and sets a breakpoint on the PE entry point. That's nice, except when it screws you. For example, a well-known anti-debugging trick against olly was creating a binary with custom TLS callbacks, a rather esoteric feature. These callbacks are called before the PE entry point and thus go undetected by olly.
What you want to do is indeed set a breakpoint on the PE entry point.
The hard way is doing it like blabb did - using !dh to dump the module header and setting a breakpoint on the address noted there. Even the hard way can be a little easier, as there's no need to lm before running !dh. In WinDbg module names are interpreted at their base addresses. For example:
0:000> !dh -f notepad
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
14C machine (i386)
5 number of sections
55BEBE90 time date stamp Sun Aug 02 18:06:24 2015
...
OPTIONAL HEADER VALUES
10B magic #
12.10 linker version
15400 size of code
1F000 size of initialized data
0 size of uninitialized data
159F0 address of entry point
1000 base of code
And then it's simply a matter of:
0:000> bu notepad+159F0
0:000> bl
0 e 002259f0 0001 (0001) 0:**** notepad!WinMainCRTStartup
But the easy way is to use one of WinDbg's builtin pseudo-registers ("automatic pseudo-registers" is their term, in contrast to user-defined pseudo-registers), namely the aformentioned $exentry.
0:000> ? notepad + 159F0 == $exentry
Evaluate expression: 1 = 00000001
0:000> ? $exentry
Evaluate expression: 2251248 = 002259f0
You can see that 002259f0 is the same address as the one we calculated by hand.
For completeness I note that there's the $iment(address) operator that gives the entry point of any module, not just the main module (the EXE):
0:000> ? $iment(winspool)
Evaluate expression: 1889096272 = 70995250
0:000> ln $iment(winspool)
Browse module
Set bu breakpoint
(70995250) WINSPOOL!_DllMainCRTStartup | (7099526b) WINSPOOL!__CppXcptFilter
Exact matches:
WINSPOOL!_DllMainCRTStartup (<no parameter info>)

How much system-provided, user-space process initialization can be bypassed?

On most x86-based Unix systems you can construct a "static" executable that does not load any system-provided DLL(-equivalent)s, and runs a bare minimum of instructions before terminating itself normally. For instance, this works on x86/Linux (32-bit). Technically I might not even need the second mov instruction, as IIRC the ABI guarantees all registers are cleared to zero at the program entrypoint.
$ cat > test.s
.text
.globl start
start:
movl $1,%eax # _exit
movl $0,%ebx
int $0x80
$ as -32 test.s -o test.o
$ ld -m elf_i386 -e start test.o -o test
My question is how close you can get on Windows to this bare minimum of instructions executed in user space between process creation and termination. I have heard rumors that the kernelside process creation logic will load ntdll.dll and possibly also kernel32.dll into every process whether or not the PE file references them, and that both of these have nontrivial startup code that may be unavoidable. I have also heard rumors that system call numbers are not part of the stable ABI, so you have to call through ntdll for cross-version compatibility, even if you're bypassing Win32. I would like to know to what extent these rumors are true, and to what extent their implications can be worked around.
This is an exercise in what is possible in an experiment, rather than what is a good idea in a product shipped to end-users. A concrete motivation for asking this question is that if it were possible to cut the "mandatory" system DLLs completely out of the loop then it would be straightforward to measure what proportion of process startup time is due to their self-initialization.
I'm not very experienced with low-level Windows programming, so if you can give a step-by-step recipe like the above for constructing the "minimal" executable you propose as your answer, that would be appreciated.
I might be able to answer part of your question, but I don't know (and I doubt) that you can bypass them.
I have also heard rumors that system call numbers are not part of the
stable ABI, so you have to call through ntdll for cross-version
compatibility, even if you're bypassing Win32
This is true, each major kernel version comes with newer system calls numbers.
The reason why the syscalls number are not permanent is that the syscall table is generated by name (not by number). So each time you insert a new syscall the older ones get "pushed" farther (and the other way around if a syscall gets removed, although this is quite rare).
The syscall table name (kernel side) is KiServiceTable (part of KeServiceDescriptorTable and KeServiceDescriptorTableShadow).
kd> dps nt!KeServiceDescriptorTable L4
fffff800`1236ba80 fffff800`1215f700 nt!KiServiceTable
fffff800`1236ba88 00000000`00000000
fffff800`1236ba90 00000000`000001b1
fffff800`1236ba98 fffff800`1216048c nt!KiArgumentTable
There are 0x1B1 system calls (windows 8.1) and the system calls pointers are located in the KiServiceTable.
An userland syscall stub look like this (Windows 10):
0:004> u ntdll!ntcreatefile
ntdll!NtCreateFile:
00007fff`1d913ac0 4c8bd1 mov r10,rcx ; args
00007fff`1d913ac3 b855000000 mov eax,55h ; syscall number
00007fff`1d913ac8 0f05 syscall ; x64 instruction, perform ring3 -> ring0 transition
00007fff`1d913aca c3 ret
00007fff`1d913acb 0f1f440000 nop dword ptr [rax+rax]
The same one from Windows 8.1 x64:
0:003> u ntdll!ntcreatefile
ntdll!NtCreateFile:
00007ff8`62071720 4c8bd1 mov r10,rcx
00007ff8`62071723 b854000000 mov eax,54h
00007ff8`62071728 0f05 syscall
00007ff8`6207172a c3 ret
00007ff8`6207172b 0f1f440000 nop dword ptr [rax+rax]
As you can see the same function leads to different syscall numbers (0x55 for Windows 10 and 0x54 for Windows 8.1)
Pointers in the syscall table (inside the kernel) are now "encoded" in a simple way (they were plain pointers before). Let's take a look at index 0x54:
kd> ? nt!KiServiceTable+(dwo(nt!KiServiceTable + 0x54 * 4) >> 4)
Evaluate expression: -8795786429460 = fffff800`12463bec
What symbols is at this address?
kd> ln fffff800`12463bec
Browse module
Set bu breakpoint
(fffff800`12463bec) nt!NtCreateFile | (fffff800`12463c70) nt!IopCreateFile
Exact matches:
nt!NtCreateFile (<no parameter info>)
So ntdll!ntcreatefile leads to kernel function nt!NtCreateFile (not a big surprise :)
You can find a syscall table for major Windows systems at this URL.
Actually, the leaked source from the windows XP kernel (in fact the WRK) shows how the service table is generated (in an assembly file).
I have heard rumors that the kernelside process creation logic will
load ntdll.dll and possibly also kernel32.dll into every process
whether or not the PE file references them, and that both of these
have nontrivial startup code that may be unavoidable
That's true. I'll not go through the whole process which is very complicated and discussed to great length in the Windows Internals books .
ntdll is loaded because a big part of the user-land windows loader is located there (if you have symbolic information, look at all the function starting with Ldr).
The kernel32.dll is also loaded inside process address space because part of the main thread initialization is located there. It is also needed because a part of exception handling is done there.
I could have gone with an executable that execute just a single instruction (namely RET on x86 / x64), but the result is the same with notepad.
Put a breakpoint at entry point:
0:000> bp $exentry
0:000> bl
0 e 00007ff6`275c4030 0001 (0001) 0:**** notepad!WinMainCRTStartup
0:000> g
Breakpoint 0 hit
notepad!WinMainCRTStartup:
00007ff6`275c4030 4883ec28 sub rsp,28h
Stack trace at entry:
0:000> kb
# RetAddr : Args to Child : Call Site
00 00007fff`1ce62d92 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : notepad!WinMainCRTStartup
01 00007fff`1d889f64 : 00007fff`1ce62d70 00000000`00000000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x22
02 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x34
So we have ntdll!RtlUserThreadStart which calls KERNEL32!BaseThreadInitThunkwhich calls the entry point of the executable.
0:000> u KERNEL32!BaseThreadInitThunk L 10
KERNEL32!BaseThreadInitThunk:
00007fff`1ce62d70 48895c2408 mov qword ptr [rsp+8],rbx
00007fff`1ce62d75 57 push rdi
00007fff`1ce62d76 4883ec20 sub rsp,20h
00007fff`1ce62d7a 498bf8 mov rdi,r8
00007fff`1ce62d7d 488bda mov rbx,rdx
00007fff`1ce62d80 85c9 test ecx,ecx
00007fff`1ce62d82 7517 jne KERNEL32!BaseThreadInitThunk+0x2b (00007fff`1ce62d9b)
00007fff`1ce62d84 488bca mov rcx,rdx
00007fff`1ce62d87 ff15d3390600 call qword ptr [KERNEL32!_guard_check_icall_fptr (00007fff`1cec6760)]
00007fff`1ce62d8d 488bcf mov rcx,rdi
00007fff`1ce62d90 ffd3 call rbx ; call entry point
00007fff`1ce62d92 8bc8 mov ecx,eax
00007fff`1ce62d94 ff15be2f0600 call qword ptr [KERNEL32!_imp_RtlExitUserThread (00007fff`1cec5d58)]
00007fff`1ce62d9a cc int 3
As you can see, returning from the entry point calls KERNEL32!_imp_RtlExitUserThread (which calls ExitProcess() for the main thread).
The closest you can get to the initialization itself is with TLS callbacks as far as i'm aware, here is some explanation on how things work; TLS callbacks are execute before the entry point of the application and they do have some limitations (that can be worked around with some effort).
As to measure startup time you should avoid trying to do it inside your own aplication; a separated process would be best for that (A debugger could do the trick in a much more reliable way).
Regarding a minimal executable, you can build an executable with only RET (as mentioned by #Neitsa); windows will load the program on memory but will not execute anything, it will basically only map things to memory and that's all.
With FASM you can build an exe that does literally nothing, like the following:
include '%fasm%\win32ax.inc'
section 'a' code readable executable
start:
retn
.end start

GDB doesn't disassemble program running in RAM correctly

I have an application compiled using GCC for an STM32F407 ARM processor. The linker stores it in Flash, but is executed in RAM. A small bootstrap program copies the application from Flash to RAM and then branches to the application's ResetHandler.
memcpy(appRamStart, appFlashStart, appRamSize);
// run the application
__asm volatile (
"ldr r1, =_app_ram_start\n\t" // load a pointer to the application's vectors
"add r1, #4\n\t" // increment vector pointer to the second entry (ResetHandler pointer)
"ldr r2, [r1, #0x0]\n\t" // load the ResetHandler address via the vector pointer
// bit[0] must be 1 for THUMB instructions otherwise a bus error will occur.
"bx r2" // jump to the ResetHandler - does not return from here
);
This all works ok, except when I try to debug the application from RAM (using GDB from Eclipse) the disassembly is incorrect. The curious thing is the debugger gets the source code correct, and will accept and halt on breakpoints that I have set. I can single step the source code lines. However, when I single step the assembly instructions, they make no sense at all. It also contains numerous undefined instructions. I'm assuming it is some kind of alignment problem, but it all looks correct to me. Any suggestions?
It is possible that GDB relies on symbol table to check instruction set mode which can be Thumb(2)/ARM. When you move code to RAM it probably can't find this information and opts back to ARM mode.
You can use set arm force-mode thumb in gdb to force Thumb mode instruction.
As a side note, if you get illegal instruction when you debugging an ARM binary this is generally the problem if it is not complete nonsense like trying to disassembly data parts.
I personally find it strange that tools doesn't try a heuristic approach when disassembling ARM binaries. In case of auto it shouldn't be hard to try both modes and do an error count to decide which mode to use as a last resort.

Can I set a breakpoint in ntdll.dll!_LdrpInitializeProcess?

When debugging a Windows process, it would sometimes be convenient to break as early as possible.
Inital Callstack looks like this: (you get this e.g. when you set a breakpoint in a DllMain function on DLL_PROCESS_ATTACH)
...
ntdll.dll!_LdrpCallInitRoutine#16() + 0x14 bytes
ntdll.dll!_LdrpRunInitializeRoutines#4() + 0x205 bytes
> ntdll.dll!_LdrpInitializeProcess#20() - 0x96d bytes
ntdll.dll!__LdrpInitialize#12() + 0x6269 bytes
ntdll.dll!_KiUserApcDispatcher#20() + 0x7 bytes
so setting a breakpoint in one of these ntdll routines should really break the process very early.
However, I can't figure out how to set a breakpoint there prior to starting the process in the debugger. Is it possible in Visual Studio (2005)? How? Can it be done in WinDbg?
I would use something like GFlags to launch the debugger when the process starts.
Here is a sample gflags settings for test.exe
And here is the debugger output. Notice the call-stack with ntdll!LdrpInitializeProcess
CommandLine: "C:\temp\test.exe"
Symbol search path is:
srv*;srvc:\symbolshttp://msdl.microsoft.com/download/symbols
Executable search path is: ModLoad:
0000000000d20000 0000000000d28000
image0000000000d20000 (1b40.464):
Break instruction exception - code
80000003 (first chance)
ntdll!LdrpDoDebuggerBreak+0x30:
0000000077c7cb60 cc int
3 0:000> k Child-SP RetAddr
Call Site 000000000012ed70
0000000077c32ef5
ntdll!LdrpDoDebuggerBreak+0x30
000000000012edb0 0000000077c11a17
ntdll!LdrpInitializeProcess+0x1b4f
000000000012f2a0 0000000077bfc32e
ntdll! ?? ::FNODOBFM::string'+0x29220
000000000012f310 00000000`00000000
ntdll!LdrInitializeThunk+0xe
Or you could open the process within the debugger like Windbg which would break into ntdll!LdrpInitializeProcess by default.
HTH
I have found out how to do it in Visual Studio.
The problem here is, that setting a breakpoint in any assembly function will be remembered as a "Data Breakpoint". These breakpoints are disabled as soon as the process stops, so even if I set one in this function (I can do this because I have the function on the stack if I set a breakpoint in any DllMain function) this breakpoint will be disabled for a new process run.
However for ntdll.dll (and kernel32.dll) the load addresses are pretty much fixed and won't change (and least not until reboot).
So, before starting the process, I just have to re-enable the Data Breakpoint for the address that corresponds to this NtDll function and the debugger will then stop there.

Resources