A debug register substitute? - windows

I was reading some old articles about debugging, and one of them mentioned the debug registers. Reading some more about these registers and what they can do made me incredibly eager to have some fun with them. However when I tried looking for some more information about how to actually use them I read that they can only be accessed from ring 0 in windows.
I thought that that was the end then, since I'm not going to write a kernel driver just to play with a few registers. But then I thought about the memory editing tool I used to play around with. Cheat engine it's called, and one of the various options of the program was to specify to break on instructions/data that was being executed/accessed/read. That is exactly the same as the debug registers do. So I was wondering: Is there a substitute/replacement for the debug registers in windows? Since I'm sure that the program (cheat engine) doesn't use a kernel driver to set these values.

Thats not true at all, you can set HW debug register from ring3, indirectly (ollydbg does this), for this you need to use SetThreadContext under windows (example).
if you still want a substitute for HW registers, you can use INT3 for code break points and single step trapping for checking if a varibale has changed(highly inefficient).
a good reference is GDB and its source: http://developer.apple.com/library/mac/#documentation/DeveloperTools/gdb/gdbint/gdbint_3.html

Related

WinDbg not showing register values

Basically, this is the same question that was asked here.
When performing kernel debugging of a machine running Windows 7 or older, with WinDbg version 6.2 and up, the debugger doesn't show anything in the registers window. Pressing the Customize... button results in a message box that reads Registers are not yet known.
At the same time, issuing the r command results in perfectly valid register values being printed out.
What is the reason for this behaviour, and can it be fixed?
TL;DR: I wrote an extension DLL that fixes the bug. Available here.
The Problem
To understand the problem, we first need to understand that WinDbg is basically just a frontend to Microsoft's Windows Symbolic Debugger Engine, implemented inside dbgeng.dll. Other frontends include the command-line kd.exe (kernel debugger) and cdb.exe (user-mode debugger).
The engine implements everything we expect from a debugger: working with symbol files, read and writing memory and registers, setting breakpoitns, etc. The engine then exposes all of this functionality through COM-like interfaces (they implement IUnknown but are not registered components). This allows us, for instance, to write our own debugger (like this person did).
Armed with this knowledge, we can now make an educated guess as to how WinDbg obtains the values of the registers on the target machine.
The engine exposes the IDebugRegisters interface for manipulating registers. This interface declares the GetValues method for retrieving the values of multiple registers in one go. But how does WinDbg know how many registers are there? That why we have the GetNumberRegisters method.
So, to retrieve the values of all registers on the target, we'll have to do something like this:
Call IDebugRegisters::GetNumberRegisters to get the total number of registers.
Call IDebugRegisters::GetValues with the Count parameter set to the total number of registers, the Indices parameter set to NULL, and the Start parameter set to 0.
One tiny problem, though: the second call fails with E_INVALIDARG.
Ehm, excuse me? How can it fail? Especially puzzling is the documentation for this return value:
The value of the index of one of the registers is greater than the number of registers on the target machine.
But I just asked you how many registers there are, so how can that value be out of range? Okay, let's continue reading the docs anyway, maybe something will become clear:
If the return value is not S_OK, some of the registers still might have been read. If the target was not accessible, the return type is E_UNEXPECTED and Values is unchanged; otherwise, Values will contain partial results and the registers that could not be read will have type DEBUG_VALUE_INVALID.
(Emphasis mine.)
Aha! So maybe the engine just couldn't read one of the registers! But which one? Turns out that the engine chokes on the xcr0 register. From the Intel 64 and IA-32 Architectures Software Developer’s Manual:
Extended control register XCR0 contains a state-component bitmap that specifies the user state components that software has enabled the XSAVE feature set to manage. If the bit corresponding to a state component is clear in XCR0, instructions in the XSAVE feature set will not operate on that state component, regardless of the value of the instruction mask.
Okay, so the register controls the operation of the XSAVE instruction, which saves the state of the CPU's extended features (like XMM and AVX). According to the last comment on this page, this instruction requires some support from the operating system. Although the comment states that Windows 7 (that's what the VM I was testing on was running) does support this instruction, it seems that the issue at hand is related to the OS anyway, as when the target is Windows 8 everything works fine.
Really, it's unclear whether the bug is within the debugger engine, which reports more registers than it can retrieve values for, or within WinDbg, which refuses to show any values at all if the engine fails to produce all of them.
The Solution
We could, of course, bite the bullet and just use an older version of WinDbg for debugging older Windows versions. But where's the challenge in that?
Instead, I present to you a debugger extension that solves this problem. It does so by hooking (with the help of this library) the relevant debugger engine methods and returning S_OK if the only register that failed was xcr0. Otherwise, it propagates the failure. The extension supports runtime unload, so if you experience problems you can always disable the hooks.
That's it, have fun!

How to pin a interrupt to a CPU in driver

Is it possible to pin a softirq, or any other bottom half to a processor. I have a doubt that this could be done from within a softirq code.
But then inside a driver is it possible to pin a particular IRQ to a
core.
From user mode, you can easily do this by writing to /proc/irq/N/smp_affinity to control which processor(s) an interrupt is directed to. The symbols for the code implementing this are not exported though, so it's difficult to do from the kernel (at least for a loadable module which is how most drivers are structured).
The fact that the implementing function symbols aren't exported is a sign that the kernel developers don't want to encourage this. Presumably that's because it takes control away from the user. And also embeds assumptions about number of processors and so forth into the driver.
So, to answer your question, yes, it's possible, but it's discouraged, and you would need to do one of several "ugly" things to implement it ((a) change kernel exports, (b) link your driver statically into main kernel, or (c) open/write to the proc file from kernel mode).
The usual way to achieve this is by writing a user-mode program (can even be a shell script) that programs core numbers/masks into the appropriate proc file. See Documentation/IRQ-affinity.txt in the kernel source directory for details.

What is a TRAMPOLINE_ADDR for ARM and ARM64(aarch64)?

I am writing a basic check-pointing mechanism for ARM64 using PTrace in order to do so I am using some code from cryopid and I found a TRAMPOLINE_ADDR macro like the following:
#define TRAMPOLINE_ADDR 0x00800000 /* 8MB mark */ for x86
#define TRAMPOLINE_ADDR 0x00300000 /* 3MB mark */ for x86_64
So when I read about trampolines it is something related to jump statements. But my questions is from where the above values came and what would the corresponding values for the ARM and ARM64 platform.
Thank you
Just read the wikipedia page.
There is nothing magic about a trampoline or certainly a particular address, any address where you can have code that executes can hold a trampoline. there are many use cases for them...for example
say you are booting off of a flash, a spi flash, running at some safe rate so that the chip boots for all users. But you want to increase the rate of the spi flash and the spi peripheral does not allow you to change while executing code. So you would copy some code to ram, that code boosts the spi flash rate to a faster rate so you can use and/or run the flash faster, then you bounce back to running from the flash. you have bounced or trampolined off of that little bit of code in ram.
you have a chip that boots from flash, but has the ability to re-map that address space to ram for example, so you copy some code to some other ram, branch to it that little bit of trampoline code remaps the address space, then bounces you back or bounces you to where the flash is now mapped to or whatever.
you will see the gnu linker sometimes add a small trampoline, say you compile some modules as thumb and some others for arm, you no longer have to use that interwork thing, the linker takes care of cleaning this up, it may add an instruction or two to trampoline you between modes, sometimes it modifies the code to just go where it needs to sometimes it modifies the code to branch link somewhere close and that somewhere close is a trampoline.
I assume there may be a need to do the same thing for aarch64 if/when switching to that mode.
so there should be no magic. your specific application might have one or many trampolines, and the one you are interested might not even be called that, but is probably application specific, absolutely no reason why there would be one address for everyone, unless it is some very rigid operating specific (again "application specific") thing and one specific trampoline for that operating system is at some DEFINEd address.

what is the purpose of the BeingDebugged flag in the PEB structure?

What is the purpose of this flag (from the OS side)?
Which functions use this flag except isDebuggerPresent?
thanks a lot
It's effectively the same, but reading the PEB doesn't require a trip through kernel mode.
More explicitly, the IsDebuggerPresent API is documented and stable; the PEB structure is not, and could, conceivably, change across versions.
Also, the IsDebuggerPresent API (or flag) only checks for user-mode debuggers; kernel debuggers aren't detected via this function.
Why put it in the PEB? It saves some time, which was more important in early versions of NT. (There are a bunch of user-mode functions that check this flag before doing some runtime validation, and will break to the debugger if set.)
If you change the PEB field to 0, then IsDebuggerPresent will also return 0, although I believe that CheckRemoteDebuggerPresent will not.
As you have found the IsDebuggerPresent flag reads this from the PEB. As far as I know the PEB structure is not an official API but IsDebuggerPresent is so you should stick to that layer.
The uses of this method are quite limited if you are after a copy protection to prevent debugging your app. As you have found it is only a flag in your process space. If somebody debugs your application all he needs to do is to zero out the flag in the PEB table and let your app run.
You can raise the level by using the method CheckRemoteDebuggerPresent where you pass in your own process handle to get an answer. This method goes into the kernel and checks for the existence of a special debug structure which is associated with your process if it is beeing debugged. A user mode process cannot fake this one but you know there are always ways around by simply removing your check ....

State of Registers After Bootup

I'm working on a boot loader on an x86 machine.
When the BIOS copies the contents of the MBR to 0x7c00 and jumps to that address, is there a standard meaning to the contents of the registers? Do the registers have standard values?
I know that the segment registers are typically set to 0, but will sometimes be 0x7c0. What about the other hardware registers?
This early execution environment is highly implementation defined, meaning the implementation of your particular BIOS. Never make any assumptions on the contents of registers. They might be initialized to 0, but they might contain a random value just as well.
from the OS dev Wiki, which is where I get information when I'm playing with my toy OS's
Best option would be to assume nothing. If they have meaning, you will find that from the other side when you need the information they provide.
Undefined, I believe? I think it depends on the mainboard and CPU, and should be treated as random for your own good.
Safest bet is to assume undefined.
Always assume undefined, otherwise you'll hit bad problems if you ever try to port architectures.
There is nothing quite like the pain of porting code that assumes everything uninitialized will be set to zero.
The only thing that I know to be well defined is the processor state immediately after reset.
For the record you can find that in Intel's Software Developer's Manual Vol 3 chapter 8: "PROCESSOR MANAGEMENT AND INITIALIZATION" in the table titled " IA-32 Processor States Following Power-up, Reset, or INIT"
You can always initialize them yourself to start with a known state.

Resources