How do debuggers bypass Image File Execution Options when launching their debugee? - debugging

I'm doing some poking around in Windows internals for my general edification, and I'm trying to understand the mechanism behind Image File Execution Options. Specifically, I've set a Debugger entry for calc.exe, with "C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -NoLogo -NoProfile -NoExit -Command "& { start-process -filepath $args[0] -argumentlist $args[1..($args.Length - 1)] -nonewwindow -wait}" as the payload. This results in recursion, with many powershell instances being launched, which makes sense given that I'm intercepting their calls to calc.exe.
That begs the question, though: how do normal debuggers launch a program under test without causing this sort of recursive behavior?

It is anyway a good question about Windows internals, but the reason it has my interest right now is that it has become a practical question for me. At somewhere that I do paid work are three computers, each with a different Windows version and even different debuggers, for which using this IFEO trick results in the debugger debugging itself, apparently trapped in this very same circularity that troubles the OP.
How do debuggers usually avoid this circularity? Well, they themselves don't. Windows avoids it for them.
But let's look first at the circularity. Simple demonstrations are hardly ever helped by PowerShell concoctions and calc.exe is not what it used to be. Let's instead set the Debugger value for notepad.exe to c:\windows\system32\cmd.exe /k. Windows will interpret this as meaning that attempting to run notepad.exe should ordinarily run c:\windows\system32\cmd.exe /k notepad.exe instead. CMD will interpret this as meaning to run notepad.exe and hang about. But this execution too of notepad.exe will be turned into c:\windows\system32\cmd.exe /k notepad.exe, and so on. The Task Manager will soon show you many hundreds of cmd.exe instances. (The good news it that they're all on the one console and can be killed together.)
The OP's question is then why does CMD and its /k (or /c) switch for running a child go circular in a Debugger value but WinDbg, for instance, does not.
In one sense, the answer is down to one bit in an undocumented structure, the PS_CREATE_INFO, that's exchanged between user and kernel modes for the NtCreateUserProcess function. This structure has become fairly well known in some circles, not that they ever seem to say how. I think the structure dates from Windows Vista, but it's not known from Microsoft's public symbol files until Windows 8 and even then not from the kernel but from such things as the Internet Explorer component URLMON.DLL.
Anyway, in the modern form of the PS_CREATE_INFO structure, the 0x04 bit at offset 0x08 (32-bit) or 0x10 (64-bit) controls whether the kernel checks for the Debugger value. Symbol files tell us this bit is known to Microsoft as IFEOSkipDebugger. If this bit is clear and there's a Debugger value, then NtCreateUserProcess fails. Other feedback through the PS_CREATE_INFO structure tells KERNELBASE, for its handling of CreateProcessInternalW, to have its own look at the Debugger value and call NtCreateUserProcess again but for (presumably) some other executable and command line.
When instead the bit is set, the kernel doesn't care about the Debugger value and NtCreateUserProcess can succeed. How the bit ordinarily gets set is by KERNELBASE because the caller is asking not just to create a process but is specifically asking to be the debugger of the new process, i.e., has set either DEBUG_PROCESS or DEBUG_ONLY_THIS_PROCESS in the process creation flags. This is what I mean by saying that debuggers don't do anything themselves to avoid the circularity. Windows does it for them just for their wanting to debug the executable.
One way to look at the Debugger value as an Image File Execution Option for an executable X is that the value's presence means X cannot execute except under a debugger and the value's content may tell how to do that. As hackers have long noticed, and the kernel's programmers will have noticed well before, the content need not specify a debugger and the value can be adapted so that attempts to run X instead run Y. Less noticed is that Y won't be able to run X unless Y debugs X (or disables the Debugger value). Also less noticed is that not all attempts to run X will instead run Y: a debugger's attempt to run X as a debuggee will not be diverted.

TLDR of Geoff's great answer - use DEBUG_PROCESS or DEBUG_ONLY_THIS_PROCESS to bypass the Debugger / Image File Execution Options (IFEO) global flag and avoid the recursion.
In C#, using the excellent Vanara.PInvoke.Kernel32 NuGet:
var startupInfo = new STARTUPINFO();
var creationFlags = Kernel32.CREATE_PROCESS.DEBUG_ONLY_THIS_PROCESS;
CreateProcess(path, null, null, null, false, creationFlags, null, null, startupInfo, out var pi);
DebugActiveProcessStop(pi.dwProcessId);
Note that DebugActiveProcessStop was key for me (couldn't see a window when opened notepad.exe otherwise) - and makes sense anyway if your program is not really a debugger and you just want the bypass.

Related

How does a debugger set breakpoints if the image is in read-only memory?

How does a debugger set breakpoints if the image is in read-only memory? I know there are hardware breakpoints, but in the debugger I use (OllyDbg) those have to be set specially using a different dialog than normal breakpoints.
Explanation:
Here is a routine in a debugger that is comparing itself to a copy of itself. EDX points to the running image, EBX points to the known good copy of the image. The breakpoint on 4010CE only is reached if there is a mismatch. The character being compared is in the AL register. As you can see the debugger shows EB F6 at 10CE, but this is false. 10CE actually has CC in it, as you can see by looking at the AL register. This is because the debugger has secretely inserted the CC to perform the breakpoint.
The debugger first has to change the memory protection of the page it wants to write to. This can be done with VirtualProtectEx. After that it is able to write with WriteProcessMemory and then set the protection back to the original value.
Let me preface this with a disclaimer that I'm not familiar with your particular toolset.
If you haven't enabled hardware breakpoints, the only remaining breakpoint type is a software breakpoint. These are only hit (on x86 because that's what I'm most familiar with) when you replace the first byte of an instruction with a trap instruction, and will only be routed through the breakpoint mechanism of your OS to your debugger if the correct trap instruction for your OS is used and the debugger has already registered itself with the OS as a debugger for this process. In order to cause the software breakpoint to happen at the correct moment, the trap instruction must be written into your code segment over the first byte of your correct instruction.
The two answers that got here first explain the two scenarios which could get you here (at least, the only two I can think of):
The kernel always has write access everywhere, except for hardware-protected pages (ie on some sort of ROM), which your process' memory is almost certainly not. It has the ability to write the breakpoint instruction regardless of the permissions exposed to the user process being debugged.
The debugger must use some syscall to change the access rights on the memory of the target process before inserting the breakpoint.
Personally, I'm guessing the first thing is happening. The segment permissions are only in place to protect your target process from itself, not from a debugger process or from the kernel. Debugging mechanisms in operating systems pretty regularly violate "normal" permissions to allow the debugger to do whatever it wants to the target process. This, of course, is why some operating systems require you to enter a password before you're allowed to use the debugger in certain scenarios.
However, you can test if it's the second one by attempting to write to the code segment from inside the target process after a breakpoint has been set. If the write succeeds, you know the permissions have been lowered by the OS (to allow the process to be debugged). It would be pretty awkward for the OS to require a debugger to jump through this hoop since it can already insert arbitrary code into the writeable parts of memory and then force a jump to it by generating a stack frame overflow.
The debugger takes advantage of the WriteProcessMemory() function to alter the instruction in place. It'll keep a copy of the instruction. When the bp is hit it will reset the old byte value and set EIP back to the previous instruction so the real instruction can execute.

What is nCmdShow?

I've always been curious on what nCmdShow means in WinMain of a C program using Windows API.
I looked up the formal explanation: "Controls how the window is to be shown. This parameter can be one of the following values.".
I do not understand what that means, as a Windows program can contain more than one window, or no windows at all. In addition, as program begins, there is no window to be shown to begin with, which makes me question this argument even more.
Also from what I read, it always stays 10, which isn't even on the list of options in "http://msdn.microsoft.com/en-us/library/windows/desktop/ms633559%28v=vs.85%29.aspx"...
Is it obsolete? Can somebody explain its purpose, or provide any references explaining its use? I tried googling but saw nothing.
Thanks!
REVISITED:
When you right click a shortcut and go to properties, there is an option to start the window Minimized, Maximized, or Normal(ly).
Windows provides an nCmdShow to your program in case it wants to act in a special way if it was launched in any of these three ways. For example, it may hide itself inside notification bar if it was requested to be started minimized.
For exhaustiveness:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms633548(v=vs.85).aspx describes all the different ways that may be passed.
It is basically a hint to the application how it should show its main window. Although it is legacy, it is not as legacy as the hPrevInstance parameter. But, I digress...
The value of the nCmdShow parameter will be one of the constants specified in ShowWindow's API reference. It can be set by another process or system launching your application via CreateProcess. The STARTUPINFO struct that can optionally be passed to CreateProcess contains a wShowWindow member variable that will get passed to WinMain through the nCmdShow parameter.
Another way the nCmdShow parameter is passed is via calls to ShellExecute.
Off the top of my head, I can't think of any scenario (in recent versions of Windows) in which the operating system will explicitly pass a value other than SW_SHOW when launching an application.
It's not uncommon nor bad for an application to ignore the nCmdShow flag passed to WinMain[?].
Note this section from the ShowWindow documentation:
nCmdShow: This parameter is ignored the first time an application calls ShowWindow, if the program that launched the application provides a STARTUPINFO structure.
Even though your program has no window when it starts, the specified value gets implicitly used the first time you eventually call ShowWindow. (It's not read directly from WinMain's local nCmdShow variable, though, so you can't change its value within WinMain and expect to get different results. In that sense, it's not particularly useful unless your program needs to do something special if it's started minimized or maximized.)
The "n" in nCmdShow means "Short int".
(This is what I wanted to know when I came to this stack overflow page)
Source:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa378932(v=vs.85).aspx
nCmdShow is integer type,this parameter specifies how the application windows should be display( to O.S.)
If no value is specified by you than by default Windows O.S. say SW_NORMAL value of this param.
You can specify values of this parameter , but those who passed to WinMain() only for Windows O.S

LLDB Break at Address

I apologize for the likely trivial question but I am running into a wall as Google gives me the same non-applicable answers over and over.
I am trying to set a breakpoint in LLDB. After reading the documentation, the options available to me are to either stop on a certain line in the source or on a certain symbol.
What I want to do is set a breakpoint on a certain memory location.
Not read-or-write to that memory location either but simply breaking when the instruction at that location is about to be executed.
In Pseudocode:
break 0x00010000
breaks when EIP points to 0x00010000.
How can I do this?
breakpoint set has an address option; you would type help breakpoint set to see all of them. For your specific example,
(lldb) br s -a 0x10000
(You can always use shorter versions of command names in lldb that are unambiguous so typing out breakpoint set isn't necessary)
The alternative is to use "process launch --stop-at-entry ...". This will allow you to set breakpoints after the program is launched and then "continue" will let you stop on your first breakpoint. Interestingly (testing in Ubuntu) using --stop-at-entry takes a lot longer to start (~3 seconds). I need to use this on OS X and maybe it will be quicker there.

How to attach to a already running process noninvasively

I have a process suspended at breakpoint under visual studio debugger.
I can attach as many as cdb (Microsoft's console debugger) in non-invasive mode as
cdb -p pid -pvr
How to achieve the same using my own program which uses Debug Engine API.
IDebugClient* debugClient = 0;
(DebugCreate( __uuidof(IDebugClient), (void **)&debugClient );
debugClient->AttachProcess(0,id,DEBUG_ATTACH_NONINVASIVE
|DEBUG_ATTACH_NONINVASIVE_NO_SUSPEND);
This code causes E_INVALIDARG. Is this combination is not allowed ? The one below works, but when it calls GetStackTrace, it returns E_UNEXPECTED.
debugClient->AttachProcess(0,id,DEBUG_ATTACH_NONINVASIVE);
debugControl->GetStackTrace(0, 0, 0, pStackFrames, maxFrames, &framesFilled);
I am interested to attach to a process already at debug break noninvasive way , and get a few local variable from its current stack & some global variable value.
Secondly, can someone point me the function used to dump the content of memory for a symbol iteratively like !stl does. I need to write a plugin to dump one of my vector like structure.
Thanks
I believe there's nothing wrong with
DEBUG_ATTACH_NONINVASIVE|DEBUG_ATTACH_NONINVASIVE_NO_SUSPEND
combination - it is perfectly permissible and is even featured in assert sample.
Otherwise, as far as documentation goes - it is not that detailed. I would suggest debugging your extension with the help of wt (trace and watch data) - it is particularly useful when you need to locate the exact subroutine that is returning an error which might provide you with better insight on the problem.
As for remotely accessing typed data in your apps from an extension, I've found ExtRemoteTyped class (available in engextcpp.hpp in the sdk subfolder) to be very helpful and intuitive to use.

What happens if TF(trap flag) is set to 0 in 8086 microprocessors?

Here I searched that:
Trap Flag (T) – This flag is used for on-chip debugging. Setting trap
flag puts the microprocessor into single step mode for debugging. In
single stepping, the microprocessor executes a instruction and enters
into single step ISR.
If trap flag is set (1), the CPU automatically generates an internal
interrupt after each instruction, allowing a program to be inspected
as it executes instruction by instruction.
If trap flag is reset (0), no function is performed.
https://en.wikipedia.org/wiki/Trap_flag
Now I am coding on emu-8086. As explained, TF must be set in order to debugger work.
Should I set a TF always myself or it is set automatically?
If I somehow set a TF to 0, will the whole computer systems debuggers work or just emu-8086 wont debug?
I've never used emu8086 but by looking at some screenshot of it and judging by its name it's probably an emulator - this means it is not running the code natively.
Each instruction is changing the state of a virtual 8086 CPU (represented as a data structure in memory) and not the state of your real CPU.
With this emulation, emu8086 doesn't need to rely on the TF flag to single-step your program, it just needs to stop after one step of emulation and wait for you to hit another button.
This is also why you can find a thing such as "Step back".
If you were wondering what would happen if a debugged program (and not an emulated one) sets the TF flag then the answer is that it depends on the debugger.
The correct behaviour is the one where the debuggee receives the exceptions but this is hard to handle correctly (since the debugger itself uses the TF flag).
Some debugger just don't care and swallow the exception (i.e. they don't forward it to the program under debug) assuming that a well written program doesn't need to use the TF flag.
Unfortunately malwares routinely use a set of anti-debug technique including setting the TF and checking it back/waiting for exceptions to detect the presence of a debugger.
A truly transparent debugger has to handle the RFLAGS register carefully.
When debugging with breakpoints the TF is not set while the program is executing, so there is nothing to worry about.
However when single stepping the TF is set during the next instruction, this is problematic during a pushfd/q and the debugger must explicitly handle that case to avoid detection.
If the debuggee sets the TF the debugger must pass the debug exception to the program - under current OS the TF won't last more than an instruction because the OS will catch the exception,
trasnform it in a signal and dispatch it to the program while clearing the TF. So the debugger can simply do a check before stepping into a popfd/q instruction.
Where the TF doesn't get cleared by the OS the debugger must effectively emulate RFLAGS with a copy.
The debugger sets TF according to what it needs to do. The code being debugged should not modify TF.

Resources