Where is located the Initial Breakpoint? - windows

I am writing a simple debugger for learning purposes. I need to know where the Initial Breakpoint set by Windows is located to handle it properly. Read somewhere that is should be at the function DbgBreakPoint() from ntdll.dll, however that function resolves to address 0x77ab0a60 and from my tests the Initial Breakpoint always raises at address 0x77aedbcf. Is this a function or just some random address with an INT 3 instruction? If I am not mistaken ntdll.dll is always loaded at the same address, if so do programs always break at this exact address, or is there a variation?

process in user mode begin execute from LdrInitializeThunk, it call LdrpInitializeProcess. this routine, after load all static dependencies but before call it initialization routines - check are debugger present (BeingDebugged member of PEB) and if yes - call LdrpDoDebuggerBreak where exist int 3 instruction. in case wow64 process the LdrpDoDebuggerBreak will be called 2 time - from 64 and 32 bit dll. as result 64-bit debugger got 2 breakpoints - STATUS_BREAKPOINT and STATUS_WX86_BREAKPOINT.
how handle this - already debugger must select yourself. interactive debugger simply stop here. another debugger tools, usually simply skip(handle) first STATUS_BREAKPOINT (and STATUS_WX86_BREAKPOINT) by returning DBG_CONTINUE

Related

Are Win32 applications automatically linked against ntdll.dll?

I've just found out by accident that doing this GetModuleHandle("ntdll.dll") works without a previous call to LoadLibrary("ntdll.dll").
This means ntdll.dll is already loaded in my process.
Is it safe to assume that ntdll.dll will always be loaded on Win32 applications, so that a call to LoadLibrary is not necessary?
From MSDN on LoadLibrary() (emphasis mine):
The system maintains a per-process reference count on all loaded
modules. Calling LoadLibrary increments the reference count. Calling
the FreeLibrary or FreeLibraryAndExitThread function decrements the
reference count. The system unloads a module when its reference count
reaches zero or when the process terminates (regardless of the
reference count).
In other words, continue to call LoadLibrary() and ensure you get your handle to ntdll.dll to be safe -- but the system will almost certainly be bumping a reference count as it should already be loaded.
As for "is it really always loaded?", see Windows Internals on the Image Loader (the short answer is yes, ntdll.dll is part of the loader itself and is always present).
The relevant paragraph is:
The image loader lives in the user-mode system DLL Ntdll.dll and not in the kernel library. Therefore, it behaves just like standard code that is part of a DLL, and it is subject to the same restrictions in terms of memory access and security rights. What makes this code special is the guaranty that it will always be present in the running process (Ntdll.dll is always loaded) and that it is the first piece of code to run in user mode as part of a new application. (When the system builds the initial context, the program counter, or instruction pointer is set to an initialization function inside Ntdll.dll.)

What Does Windows Do Before Main() is Called?

Windows must do something to parse the PE header, load the executable in memory, and pass command line arguments to main().
Using OllyDbg I have set the debugger to break on main() so I could view the call stack:
It seems as if symbols are missing so we can't get the function name, just its memory address as seen. However we can see the caller of main is kernel32.767262C4, which is the callee of ntdll.77A90FD9. Towards the bottom of the stack we see RETURN to ntdll.77A90FA4 which I assume to be the first function to ever be called to run an executable. It seems like the notable arguments passed to that function are the Windows' Structured Exception Handler address and the entry point of the executable.
So how exactly do these functions end up in loading the program into memory and getting it ready for the entry point to execute? Is what the debugger shows the entire process executed by the OS before main()?
if you call CreateProcess system internally call ZwCreateThread[Ex] to create first thread in process
when you create thread - you (if you direct call ZwCreateThread) or system initialize the CONTEXT record for new thread - here Eip(i386) or Rip(amd64) the entry point of thread. if you do this - you can specify any address. but when you call say Create[Remote]Thread[Ex] - how i say - the system fill CONTEXT and it set self routine as thread entry point. your original entry point is saved in Eax(i386) or Rcx(amd64) register.
the name of this routine depended from Windows version.
early this was BaseThreadStartThunk or BaseProcessStartThunk (in case from CreateProcess called) from kernel32.dll.
but now system specify RtlUserThreadStart from ntdll.dll . the RtlUserThreadStart usually call BaseThreadInitThunk from kernel32.dll (except native (boot execute) applications, like smss.exe and chkdsk.exe which no have kernel32.dll in self address space at all ). BaseThreadInitThunk already call your original thread entry point, and after (if) it return - RtlExitUserThread called.
the main goal of this common thread startup wrapper - set the top level SEH filter. only because this we can call SetUnhandledExceptionFilter function. if thread start direct from your entry point, without wrapper - the functional of Top level Exception Filter become unavailable.
but whatever the thread entry point - thread in user space - NEVER begin execute from this point !
early when user mode thread begin execute - system insert APC to thread with LdrInitializeThunk as Apc-routine - this is done by copy (save) thread CONTEXT to user stack and then call KiUserApcDispatcher which call LdrInitializeThunk. when LdrInitializeThunk finished - we return to KiUserApcDispatcher which called NtContinue with saved thread CONTEXT - only after this already thread entry point begin executed.
but now system do some optimization in this process - it copy (save) thread CONTEXT to user stack and direct call LdrInitializeThunk. at the end of this function NtContinue called - and thread entry point being executed.
so EVERY thread begin execute in user mode from LdrInitializeThunk. (this function with exactly name exist and called in all windows versions from nt4 to win10)
what is this function do ? for what is this ? you may be listen about DLL_THREAD_ATTACH notification ? when new thread in process begin executed (with exception for special system worked threads, like LdrpWorkCallback)- he walk by loaded DLL list, and call DLLs entry points with DLL_THREAD_ATTACH notification (of course if DLL have entry point and DisableThreadLibraryCalls not called for this DLL). but how this is implemented ? thanks to LdrInitializeThunk which call LdrpInitialize -> LdrpInitializeThread -> LdrpCallInitRoutine (for DLLs EP)
when the first thread in process start - this is special case. need do many extra jobs for process initialization. at this time only two modules loaded in process - EXE and ntdll.dll . LdrInitializeThunk
call LdrpInitializeProcess for this job. if very briefly:
different process structures is initialized
loading all DLL (and their dependents) to which EXE statically
linked - but not call they EPs !
called LdrpDoDebuggerBreak - this function look - are debugger
attached to process, and if yes - int 3 called - so debugger
receive exception message - STATUS_BREAKPOINT - most debuggers can
begin UI debugging only begin from this point. however exist
debugger(s) which let as debug process from LdrInitializeThunk -
all my screenshots from this kind debugger
important point - until in process executed code only from
ntdll.dll (and may be from kernel32.dll) - code from another
DLLs, any third-party code not executed in process yet.
optional loaded shim dll to process - Shim Engine initialized. but
this is OPTIONAL
walk by loaded DLL list and call its EPs with
DLL_PROCESS_DETACH
TLS Initializations and TLS callbacks called (if exists)
ZwTestAlert is called - this call check are exist APC in thread
queue, and execute its. this point exist in all version from NT4 to
win 10. this let as for example create process in suspended state
and then insert APC call ( QueueUserAPC ) to it thread
(PROCESS_INFORMATION.hThread) - as result this call will be
executed after process will be fully initialized, all
DLL_PROCESS_DETACH called, but before EXE entry point. in context
of first process thread.
and NtContinue called finally - this restore saved thread context
and we finally jump to thread EP
read also Flow of CreateProcess

WinDbg shows some variables but not others, shows some variables in same location

I'm trying to use WinDbg 6.2.9200.16384 x64 over a serial cable to debug a driver I'm writing. WinDbg connects to the target machine (Windows 8) just fine, and I see all the dbgprints as the system boots and loads everything. I can load the symbols for my driver just fine and can set breakpoints, and when my driver hits those breakpoints, the system halts as expected. This is where things get weird: when I hit a breakpoint, I can only see some of the local variables in my function in both the locals window and when using the 'dv' command. I created a variable to test with:
int myInt = 8;
When I use a dbgprint to show the value of myInt, it works fine and I see it as 8. However, the variable doesn't even appear at all in the locals window or with the 'dv' command. Other variables do, such as
ULONG rcb = 0;
and I can see its value just fine in the locals window. These variables are literally declared one after the other.
Another symptom of this strange problem is this. I have a function
ULONG someFunction(UINT16 offset) {
ULONG rcb, tempAddr, temp, temp1;
ULONG writeAddr, readAddr;
UINT16 dev;
dev = 15;
...
}
I call this function like so:
someFunction(0x777);
When I set a breakpoint in this function and inspect the variable values with WinDbg, nothing makes any sense. First, it only sees 4 of my 8 variables, just offset, rcb, writeAddr, and readAddr. It tells me the value of offset is not 0x777 as I would expect, but 0xE061 (this changes each time I run the code). When I look closer at the locals window (same information is shown via 'dv' and '? varname' commands) I notice that the location of offset and the location of rcb are the exact same address. Likewise, writeAddr and readAddr are stored at the same address as well. None of the other variables are detected by the debugger.
I am convinced that I've loaded the symbols properly, the source and symbols paths are set correctly, I've run '.reload /f' a million times with no errors loading my driver's symbols. I'm still able to break and step through other lines of code, but the locals just don't make any sense. When I dbgprint, the correct values are shown so it seems like this is a problem with the debugger itself, not with my driver. Any ideas?
<>
Nowadays Compiler has been enhanced a lot to get better optimized binary with optimized performance and other metrics. Hence compiler store few variables as locals(visible via 'dv /v' command) and store other variables in their registers. That's the reason you didn't see variable int myInt in dv command. We can get to know which registers are being used for the variables, by disassembling the function using 'uf binary!functionname' or by viewing disassembled code in Windbg View-> Disassembly.
Note that the driver may behave little differently with and without optimization of the compiler in the aspects of performance, memory usage, etc. So its always recommended to debug the one generated from the default optimized compiler, as this is the one used in realtime user scenario.
I fixed the problem. To anyone else who runs into this same thing: I was working with a free build of the driver, so the compiler had optimized out a lot of my variables. To fix it, either compile a checked version of the driver, or add the line
MSC_OPTIMIZATION=/Od /Oi
to your sources file to disable optimizations for the free build. Hope this helps anyone with the same problem.

How does a debugger set breakpoints if the image is in read-only memory?

How does a debugger set breakpoints if the image is in read-only memory? I know there are hardware breakpoints, but in the debugger I use (OllyDbg) those have to be set specially using a different dialog than normal breakpoints.
Explanation:
Here is a routine in a debugger that is comparing itself to a copy of itself. EDX points to the running image, EBX points to the known good copy of the image. The breakpoint on 4010CE only is reached if there is a mismatch. The character being compared is in the AL register. As you can see the debugger shows EB F6 at 10CE, but this is false. 10CE actually has CC in it, as you can see by looking at the AL register. This is because the debugger has secretely inserted the CC to perform the breakpoint.
The debugger first has to change the memory protection of the page it wants to write to. This can be done with VirtualProtectEx. After that it is able to write with WriteProcessMemory and then set the protection back to the original value.
Let me preface this with a disclaimer that I'm not familiar with your particular toolset.
If you haven't enabled hardware breakpoints, the only remaining breakpoint type is a software breakpoint. These are only hit (on x86 because that's what I'm most familiar with) when you replace the first byte of an instruction with a trap instruction, and will only be routed through the breakpoint mechanism of your OS to your debugger if the correct trap instruction for your OS is used and the debugger has already registered itself with the OS as a debugger for this process. In order to cause the software breakpoint to happen at the correct moment, the trap instruction must be written into your code segment over the first byte of your correct instruction.
The two answers that got here first explain the two scenarios which could get you here (at least, the only two I can think of):
The kernel always has write access everywhere, except for hardware-protected pages (ie on some sort of ROM), which your process' memory is almost certainly not. It has the ability to write the breakpoint instruction regardless of the permissions exposed to the user process being debugged.
The debugger must use some syscall to change the access rights on the memory of the target process before inserting the breakpoint.
Personally, I'm guessing the first thing is happening. The segment permissions are only in place to protect your target process from itself, not from a debugger process or from the kernel. Debugging mechanisms in operating systems pretty regularly violate "normal" permissions to allow the debugger to do whatever it wants to the target process. This, of course, is why some operating systems require you to enter a password before you're allowed to use the debugger in certain scenarios.
However, you can test if it's the second one by attempting to write to the code segment from inside the target process after a breakpoint has been set. If the write succeeds, you know the permissions have been lowered by the OS (to allow the process to be debugged). It would be pretty awkward for the OS to require a debugger to jump through this hoop since it can already insert arbitrary code into the writeable parts of memory and then force a jump to it by generating a stack frame overflow.
The debugger takes advantage of the WriteProcessMemory() function to alter the instruction in place. It'll keep a copy of the instruction. When the bp is hit it will reset the old byte value and set EIP back to the previous instruction so the real instruction can execute.

int instruction from user space

I was under the impression that "int" instruction on x86 is not privileged. So, I thought we should be able to execute this instruction from the user space application. But does not seem so.
I am trying to execute int from user application on windows. I know it may not be right to do so. But I wanted to have some fun. But windows is killing my application.
I think the issue is due to condition cpl <=iopl. Does anyone know how to get around it?
Generally the old dispatcher mechanism for user mode code to transition to kernel mode in order to invoke kernel services was implemented by int 2Eh (now replaced by sysenter). Also int 3 is still reserved for breakpoints to this very day.
Basically the kernel sets up traps for certain interrupts (don't remember whether for all, though) and depending the trap code they will perform some service for the user mode invoker or if that is not possible your application would get killed, because it attempts a privileged operation.
Details would anyway depend on the exact interrupt you were trying to invoke. The functions DbgBreakPoint (ntdll.dll) and DebugBreak (kernel32.dll) do nothing other than invoking int 3 (or actually the specific opcode int3), for example.
Edit 1: On newer Windows versions (XP SP2 and newer, IIRC) sysenter replaces int 2Eh as I wrote in my answer. One possible reason why it gets terminated - although you should be able to catch this via exception handling - is because you don't pass the parameters that it expects on the stack. Basically the usermode part of the native API places the parameters for the system service you call onto the stack, then places the number of the service (an index into the system service dispatcher table - SSDT, sometimes SDT) into a specific register and then calls on newer systems sysenter and on older systems int 2Eh.
The minimum ring level of a given interrupt vector (which decides whether a given "int" is privileged) is based on the ring-level descriptor associated with the vector in the interrupt descriptor table.
In Windows the majority of interrupts are privileged instructions. This prevents user-mode from merely calling the double-fault handler to immediately bugcheck the OS.
There are some non-privileged interrupts in Windows. Specifically:
int 1 (both CD 01 encoding and debug interrupt occurs after a single instruction if EFLAGS_TF is set in eflags)
int 3 (both encoding CC and CD 03)
int 2E (Windows system call)
All other interrupts are privileged, and calling them causes the "invalid instruction" interrupt to be issued instead.
INT is a 'privilege controlled' instruction. It has to be this way for the kernel to protect itself from usermode. INT goes through the exact same trap vectors that hardware interrupts and processor exceptions go through, so if usermode could arbitrarily trigger these exceptions, the interrupt dispatching code would get confused.
If you want to trigger an interrupt on a particular vector that's not already set up by Windows, you have to modify the IDT entry for that interrupt vector with a debugger or a kernel driver. Patchguard won't let you do this from a driver on x64 versions of Windows.

Resources