Visual Studio 2010 x64 __setReg Equivalent Compiler Intrinsic - windows

I have an application I have written in C where I really need to modify the value of one of the processor registers before calling a function. Normally I would do this with inline assembly, but as we all know that has been removed for 64 bit applications. I also cannot do this in a separate .asm file that is compiled with ml64 due to certain project constraints. So basically I need to execute the equivalent of the following code inline:
_asm mov r10d, 0xDEADBEEF
Does anyone know of a creative method or some other compiler intrinsic for x64 that will allow you to modify the value of a register inline?

Unfortunately, after looking at possible workarounds, it seems that Hans was right: it's simply not possible to modify the contents of a register inline. No compiler intrinsic exists to do it, and the only alternatives are to either write the entire function in 64-bit assembly as a separate .asm file and compile it with ml64, or do as Alexey suggested and allocate an executable block of memory beforehand and write the opcodes to it. You can then create a function pointer and just call this code directly. So for example, if I wanted to do the equivalent of:
mov r10d, ecx
ret
Just create an array to store the opcodes:
BYTE copyValueToR10[] = "\x44\x8B\xD1\xC3";
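You can then VirtualAlloc memory for this tiny function with PAGE_EXECUTE_READWRITE protection and copy the opcodes into it (memory that is only PAGE_EXECUTE cannot be written to; alternatively allocate it as PAGE_READWRITE, copy the opcodes, and switch it to PAGE_EXECUTE_READ with VirtualProtect). Next just create a function pointer to that memory and you're good to go. Definitely a dirty way to do it, but given the constraints of not having inline asm and not wanting to compile using ml64, this seems to be the only other way to do it.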
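The allocate-copy-call steps described above can be sketched roughly as follows. This is only a minimal sketch: make_executable_copy and set_r10_fn are hypothetical names, not part of any API, and the non-Windows branch (mmap/mprotect) is an addition so the helper can be built outside Windows. It uses the write-then-protect variant (PAGE_READWRITE followed by VirtualProtect) rather than a single writable and executable allocation.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on strict POSIX builds */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Opcodes from the answer above: mov r10d, ecx ; ret */
static const unsigned char copy_value_to_r10[] = { 0x44, 0x8B, 0xD1, 0xC3 };

/* Hypothetical helper: copies len bytes of machine code into freshly
   allocated memory and makes that memory executable.
   Returns NULL on failure. */
static void *make_executable_copy(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);
#ifdef _WIN32
    DWORD old_protect;
    void *mem = VirtualAlloc(NULL, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (mem == NULL)
        return NULL;
    memcpy(mem, code, len);
    /* Drop write access and add execute access once the bytes are in place. */
    if (!VirtualProtect(mem, len, PAGE_EXECUTE_READ, &old_protect)) {
        VirtualFree(mem, 0, MEM_RELEASE);
        return NULL;
    }
    FlushInstructionCache(GetCurrentProcess(), mem, len);
    return mem;
#else
    /* Assumption: POSIX equivalent, not part of the original answer. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
#endif
}

/* With the Windows x64 calling convention the first integer argument
   arrives in ecx, which is what the stub copies into r10d. */
typedef void (*set_r10_fn)(unsigned int);
```

On Windows x64 the result would be cast to set_r10_fn and called with the desired value, e.g. ((set_r10_fn)mem)(0xDEADBEEF). Keep in mind that r10 is a volatile register in that calling convention, so the compiler is free to overwrite it in any code it generates between this call and the call you actually care about.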

Related

HW register value (R15) not saved during a function call to external library

My code is written in C++, and compiled with gcc version 4.7.2.
It's linked with 3rd party library, which is written in C, and compiled with gcc 4.5.2.
My code calls a function initStuff(). During the debug I found out that the value of R15 register before the call to initStuff() is not the same as the value upon return from that function.
As a quick hack I did:
asm(" mov %%r15, %0" : "=r" ( saveR15 ) );
initStuff();
asm(" mov %0, %%r15;" : : "r" (saveR15) );
which seems to work for now.
Who is to blame here? How can I find if it's a compiler issue, or maybe compatibility issue?
gcc on x86-64 follows the System V ABI, which defines r15 as a callee-saved register; any function which uses this register is supposed to save and restore it.
So if this third-party function is not doing so, it is failing to conform to the ABI, and unless this is documented, it is to blame. AFAIK this part of the ABI has been stable forever, so if compiler-generated code (with default options) is failing to save and restore r15, that would be a compiler bug. More likely some part of the third-party code uses assembly language and is buggy, or conceivably it was built with non-standard compiler options.
You can either dig into it, or as a workaround, write a wrapper around it that saves and restores r15. Your current workaround is not really safe, since the compiler might reorder your asm statements with respect to surrounding code. You should instead put the call to initStuff inside a single asm block with the save-and-restore (declaring it as clobbering all caller-saved registers), or write a "naked" assembly wrapper which does the save/restore and call, and call it instead. (Make sure to preserve stack alignment.)
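A rough sketch of the "naked" wrapper idea, assuming GCC on x86-64 with an ELF target (other targets fall back to a plain call). The name initStuff_preserving_r15 is hypothetical, and the initStuff defined here is only a stand-in so the sketch links on its own; in the real program that symbol comes from the third-party library.

```c
#include <assert.h>

/* Stand-in for the third-party function (hypothetical body). */
static int init_calls = 0;
void initStuff(void) { init_calls++; }

/* Hypothetical wrapper: saves r15, calls initStuff, restores r15. */
void initStuff_preserving_r15(void);

#if defined(__GNUC__) && defined(__x86_64__) && defined(__ELF__)
/* On entry rsp is 8 bytes off a 16-byte boundary because the return
   address was just pushed; pushing r15 restores the 16-byte alignment
   that the ABI requires at the point of the call. */
__asm__(
    ".pushsection .text\n"
    ".globl initStuff_preserving_r15\n"
    ".type initStuff_preserving_r15, @function\n"
    "initStuff_preserving_r15:\n"
    "    push %r15\n"
    "    call initStuff\n"
    "    pop %r15\n"
    "    ret\n"
    ".size initStuff_preserving_r15, .-initStuff_preserving_r15\n"
    ".popsection\n"
);
#else
/* Fallback for other targets: nothing to preserve, just forward the call. */
void initStuff_preserving_r15(void) { initStuff(); }
#endif

/* Usage: call the wrapper wherever the code used to call initStuff(). */
static void init_library(void)
{
    initStuff_preserving_r15();
    assert(init_calls > 0); /* the stand-in was reached through the wrapper */
}
```

Because the wrapper is written entirely in assembly, the compiler cannot reorder anything between the save, the call, and the restore, which is the weakness of the two separate asm statements in the question.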

How to call library functions in shellcode

I want to generate shellcode using the following NASM code:
global _start
extern exit
section .text
_start:
xor rcx, rcx
or rcx, 10
call exit
The problem here is that I cannot use this because the address of exit function cannot be hard coded. So, how do I go about using library functions without having to re-implement them using system calls?
One way that I can think of, is to retrieve the address of exit function in a pre-processing program using GetProcAddress and substitute it in the shellcode at the appropriate place.
However, this method does not generate shellcode that can be run as it is. I'm sure there must be a better way to do it.
I am not an expert on writing shellcode, but you could try to find the import address table (IAT) of your target program and use the stored function pointers to call windows functions.
Note that you would be limited to the functions the target program uses.
Also you would have to let your shellcode calculate IAT's position relative to the process's base address due to relocations. Of course you could rely on Windows not relocating, but this might result in errors in a few cases.
Another issue is that you would have to find the target process's base address from outside.
A completely different approach would be to use syscalls directly, but they are really hard to use, to say nothing of the risk: Windows system call numbers are undocumented and change between versions.
Information on PE file structure:
https://msdn.microsoft.com/en-us/library/ms809762.aspx

inline assembly error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'

I have an inline AT&T style assembly block, which works with XMM registers and there are no problems in Release configuration of my XCode project, however I've stumbled upon this strange error (which is supposedly a GCC bug) in Debug configuration... Can I fix it somehow? There is nothing special in assembly code, but I am using a lot of memory constraints (12 constraints), can this cause this problem?
Not a complete answer, sorry, but the comments section is too short for this ...
Can you post a sample asm("..." :::) line that demonstrates the problem ?
The use of XMM registers is not the issue, the error message indicates that GCC wanted to create code like, say:
movdqa (%rax),%xmm0
i.e. memory loads/stores through pointers held in general registers, and you specified more memory locations than there are available general-purpose registers (probably 12 in debug mode, because RBP and RSP are used as frame and stack pointer, RBX is likely used for the global offset table, and RAX is reserved for return values), while the compiler does not realize that registers could be reused.
You might be able to eke things out by doing something like:
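void *all_mem_args_tbl[16] = { memarg1, memarg2, ... };
void *trashme; /* scratch pointer, managed by hand inside the asm */
asm ("movq (%1), %0\n\t" /* %1 = table, %0 = scratch */
"movdqa (%0), %%xmm0\n\t"
"movq 8(%1), %0\n\t"
"movdqa (%0), %%xmm1\n\t"
...
/* outputs come first; "&" marks the scratch as written before the inputs are done */
: "=&r"(trashme) : "r"(all_mem_args_tbl) : ...);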
i.e. put all the mem locations into a table that you pass as operand, and then manage the actual general-purpose register use on your own. It might be two pointer accesses through the indirection table, but whether that makes a difference is hard to say without knowing your complete assembler code piece.
The Debug configuration uses -O0 by default. Since this flag disables optimisations, the compiler is probably unable to allocate registers under the constraints specified by your inline assembly code, and runs out of registers.
One solution is to specify a different optimisation level, e.g. -Os, which is the one used by default in the Release configuration.

Simple "Hello-World", null-free shellcode for Windows needed

I would like to test a buffer-overflow by writing "Hello World" to console (using Windows XP 32-Bit). The shellcode needs to be null-free in order to be passed by "scanf" into the program I want to overflow. I've found plenty of assembly-tutorials for Linux, however none for Windows. Could someone please step me through this using NASM? Thxxx!
Assembly opcodes are the same, so the regular tricks to produce null-free shellcodes still apply, but the way to make system calls is different.
In Linux you make system calls with the "int 0x80" instruction, while on Windows you must use DLL libraries and do normal usermode calls to their exported functions.
For that reason, on Windows your shellcode must either:
Hardcode the Win32 API function addresses (most likely will only work on your machine)
Use a Win32 API resolver shellcode (works on every Windows version)
If you're just learning, for now it's probably easier to just hardcode the addresses you see in the debugger. To make the calls position independent you can load the addresses in registers. For example, a call to a function with 4 arguments:
PUSH 4 ; argument #4 to the function
PUSH 3 ; argument #3 to the function
PUSH 2 ; argument #2 to the function
PUSH 1 ; argument #1 to the function
MOV EAX, 0xDEADBEEF ; put the address of the function to call
CALL EAX
Note that the arguments are pushed in reverse order. After the CALL instruction EAX contains the return value, and the stack will be just like it was before (i.e. the function pops its own arguments, as the stdcall convention used by the Win32 API requires). The ECX and EDX registers may contain garbage, so don't rely on them keeping their values after the call.
A direct CALL instruction won't work, because its target is encoded relative to the address of the instruction itself, which makes it position dependent.
To avoid zeros in the address itself try any of the null-free tricks for x86 shellcode, there are many out there but my favorite (albeit lengthy) is encoding the values using XOR instructions:
MOV EAX, 0xDEADBEEF ^ 0xFFFFFFFF ; your value xor'ed against an arbitrary mask
XOR EAX, 0xFFFFFFFF ; the arbitrary mask
You can also try NEG EAX or NOT EAX (sign inversion and bit flipping) to see if they work, it's much cheaper (two bytes each).
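To see the arithmetic behind the XOR trick, here is a small C sketch that computes the encoded immediate and checks it for zero bytes. The helper names (has_null_byte, xor_encode, mask_is_usable) are made up for illustration; this only models the constants, it does not emit any machine code.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if any of the four bytes of value is zero. */
static int has_null_byte(uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8) {
        if (((value >> shift) & 0xFFu) == 0)
            return 1;
    }
    return 0;
}

/* The immediate that goes into the MOV instruction. XORing it with the
   same mask again (the XOR instruction) gives the original value back. */
static uint32_t xor_encode(uint32_t value, uint32_t mask)
{
    uint32_t encoded = value ^ mask;
    assert((encoded ^ mask) == value); /* XOR is its own inverse */
    return encoded;
}

/* A mask is usable when neither the mask itself nor the encoded
   immediate contains a zero byte, since both end up in the code. */
static int mask_is_usable(uint32_t value, uint32_t mask)
{
    return !has_null_byte(mask) && !has_null_byte(xor_encode(value, mask));
}
```

For instance, an address such as 0x7C001234 contains a zero byte, but XORed with 0xFFFFFFFF it becomes 0x83FFEDCB, which does not. If the value itself contains an 0xFF byte, that particular mask fails and another one has to be picked.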
You can get help on the different API functions you can call here: http://msdn.microsoft.com
The most important ones you'll need are probably the following:
WinExec(): http://msdn.microsoft.com/en-us/library/ms687393(VS.85).aspx
LoadLibrary(): http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
GetProcAddress(): http://msdn.microsoft.com/en-us/library/ms683212%28v=VS.85%29.aspx
The first launches a command, the next two are for loading DLL files and getting the addresses of its functions.
Here's a complete tutorial on writing Windows shellcodes: http://www.codeproject.com/Articles/325776/The-Art-of-Win32-Shellcoding
Assembly language is defined by your processor, and assembly syntax is defined by the assembler (hence AT&T and Intel syntax). The main difference is not the processor mode: both Windows XP and Linux run your program in protected mode, so you only have access to memory in your program's little cubby of memory, not to all the memory accessible to your computer. (Calling the actual interrupts and the BIOS to do stuff is how it worked in real-mode DOS.) What differs is how you ask the OS to do things: on 32-bit Linux you call int 0x80 to make calls to the kernel, while on Windows you call functions exported by the system DLLs. Anyway, the instructions in hello world type stuff would more-or-less be the same between Linux and Windows, as long as they are compatible processors, but the part that actually prints is not.
To get the shellcode from the program you've made, just load it into your target system's debugger (gdb for Linux, and debug for Windows) and in debug, type d (or was it u? Anyway, it should say if you type h (help)) and between instructions and memory will be the opcodes.
Just copy them all over to your text editor into one string, and maybe make a program that translates them all into their ascii values. Not sure how to do this in gdb tho...
Anyway, to make it into a bof exploit, enter aaaaa... and keep adding a's until it crashes from a buffer overflow error. But find exactly how many a's it takes to crash it. Then, it should tell you what memory address that was. Usually it should tell you in the error message. If it says '6161[rest of original return address]' then you got it (61 is the hex ASCII code for 'a', 97 in decimal, and addresses are normally shown in hex). Now you have to use your debugger to find out where this was. Disassemble the program with your debugger and look for where scanf was called. Set a breakpoint there, run and examine the stack. Look for all those 61's and see where they end. Then remove the breakpoint and type the amount of a's you found out it took (exactly the amount). If the error message was "buffer overflow at '61[rest of original return address]'" then remove that last a, put the address you found examining the stack, and insert your shellcode. If all goes well, you should see your shellcode execute.
Happy hacking...

GCC's extended version of asm

I never thought I'd be posting an assembly question. :-)
In GCC, there is an extended version of the asm function. This function can take four parameters: assembly-code, output-list, input-list and overwrite-list.
My question is, are the registers in the overwrite-list zeroed out? What happens to the values that were previously in there (from other code executing).
Update: In considering my answers thus far (thank you!), I want to add that though a register is listed in the clobber-list, it (in my instance) is being used in a pop (popl) command. There is no other reference.
No, they are not zeroed out. The purpose of the overwrite list (more commonly called the clobber list) is to inform GCC that, as a result of the asm instructions the register(s) listed in the clobber list will be modified, and so the compiler should preserve any which are currently live.
For example, on x86 the cpuid instruction returns information in four parts using four fixed registers: %eax, %ebx, %ecx and %edx, based on the input value of %eax. If we were only interested in the result in %eax and %ebx, then we might (naively) write:
int input_res1 = 0; // also used for first part of result
int res2;
__asm__("cpuid" : "+a"(input_res1), "=b"(res2) );
This would get the first and second parts of the result in the C variables input_res1 and res2; however, if GCC was using %ecx and %edx to hold other data, they would be overwritten by the cpuid instruction without GCC knowing. To prevent this, we use the clobber list:
int input_res1 = 0; // also used for first part of result
int res2;
__asm__("cpuid" : "+a"(input_res1), "=b"(res2)
: : "%ecx", "%edx" );
As we have told GCC that %ecx and %edx will be overwritten by this asm statement, it can handle the situation correctly, either by not using %ecx or %edx, or by saving their values to the stack before the asm statement and restoring them afterwards.
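When all four results are of interest, an alternative to the clobber list is to declare every register that cpuid writes as an output operand, so nothing is left to clobber. A minimal sketch, assuming GCC-style inline asm on x86; query_cpuid is a hypothetical helper name, and on other targets it simply reports that cpuid is unavailable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: runs cpuid for the given leaf (subleaf 0) and
   stores eax, ebx, ecx, edx in regs[0..3].
   Returns 1 if cpuid was executed, 0 on non-x86 targets. */
static int query_cpuid(unsigned int leaf, unsigned int regs[4])
{
    assert(regs != NULL);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int a = leaf; /* input leaf, overwritten with the eax result */
    unsigned int b;
    unsigned int c = 0;    /* input subleaf, overwritten with the ecx result */
    unsigned int d;
    /* Every register cpuid modifies is an operand, so GCC knows about all
       of them and no clobber list is needed. */
    __asm__ volatile ("cpuid"
                      : "+a"(a), "=b"(b), "+c"(c), "=d"(d));
    regs[0] = a;
    regs[1] = b;
    regs[2] = c;
    regs[3] = d;
    return 1;
#else
    (void)leaf;
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
    return 0;
#endif
}
```

Calling query_cpuid(0, regs) on an x86 machine leaves the highest supported basic leaf in regs[0] and the vendor string spread over regs[1], regs[3] and regs[2].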
Update:
With regards to your second question (why you are seeing a register listed in the clobber list for a popl instruction) - assuming your asm looks something like:
__asm__("popl %%eax" : : : "%eax" );
Then the code here is popping an item off the stack, however it doesn't care about the actual value - it's probably just keeping the stack balanced, or the value isn't needed in this code path. By writing this way, as opposed to:
int trash; // don't ever use this.
__asm__("popl %0" : "=r"(trash));
You don't have to explicitly create a temporary variable to hold the unwanted value. Admittedly in this case there isn't a huge difference between the two, but the version with the clobber makes it clear that you don't care about the value from the stack.
If by "zeroed out" you mean "the values in the registers are replaced with 0's to prevent me from knowing what some other function was doing" then no, the registers are not zeroed out before use. But it shouldn't matter because you're telling GCC you plan to store information there, not that you want to read information that's currently there.
You give this information to GCC so that (reading the documentation) "you need not guess which registers or memory locations will contain the data you want to use" when you're finished with the assembly code (eg., you don't have to remember if the data will be in the stack register, or some other register).
GCC needs a lot of help for assembly code because "The compiler ... does not parse the assembler instruction template and does not know what it means or even whether it is valid assembler input. The extended asm feature is most often used for machine instructions the compiler itself does not know exist."
Update
GCC is designed as a multi-pass compiler. Many of the passes are in fact entirely different programs. A set of programs forming "the compiler" translate your source from C, C++, Ada, Java, etc. into assembly code. Then a separate program (gas, for GNU Assembler) takes that assembly code and turns it into a binary (and then ld and collect2 do more things to the binary). Assembly blocks exist to pass text directly to gas, and the clobber-list (and input list) exist so that the compiler can do whatever set up is needed to pass information between the C, C++, Ada, Java, etc. side of things and the gas side of things, and to guarantee that any important information currently in registers can be protected from the assembly block by copying it to memory before the assembly block runs (and copying back from memory afterward).
The alternative would be to save and restore every register for every assembly code block. On a RISC machine with a large number of registers that could get expensive (the Itanium has 128 general registers, another 128 floating point registers and 64 1-bit registers, for instance).
It's been a while since I've written any assembly code. And I have much more experience using GCC's named registers feature than doing things with specific registers. So, looking at an example:
#include <stdio.h>
long foo(long l)
{
long result;
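asm (
/* no size suffix: the assembler takes the operand width from the
   registers, so this assembles whether long is 32 or 64 bits */
"mov %[l], %[reg];"
"inc %[reg];"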
: [reg] "=r" (result)
: [l] "r" (l)
);
return result;
}
int main(int argc, char** argv)
{
printf("%ld\n", foo(5L));
}
I have asked for an output register, which I will call reg inside the assembly code, and that GCC will automatically copy to the result variable on completion. There is no need to give this variable different names in C code vs assembly code; I only did it to show that it is possible. Whichever physical register GCC decides to use -- whether it's %%eax, %%ebx, %%ecx, etc. -- GCC will take care of copying any important data from that register into memory when I enter the assembly block so that I have full use of that register until the end of the assembly block.
I have also asked for an input register, which I will call l both in C and in assembly. GCC promises that whatever physical register it decides to give me will have the value currently in the C variable l when I enter the assembly block. GCC will also do any needed recordkeeping to protect any data that happens to be in that register before I enter the assembly block.
What if I add a line to the assembly code? Say:
"addl %[reg], %%ecx;"
Since the compiler part of GCC doesn't check the assembly code it won't have protected the data in %%ecx. If I'm lucky, %%ecx may happen to be one of the registers GCC decided to use for %[reg] or %[l]. If I'm not lucky, I will have "mysteriously" changed a value in some other part of my program.
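The fix for that situation is to name the register in the clobber list. Here is a sketch of the same function with the extra instruction and the clobber added, assuming GCC-style asm on x86 (other targets fall back to plain C). The name foo_with_clobber is made up, and the %k modifier, which prints the 32-bit name of the register GCC picked, is used so that addl also assembles when long is 64 bits wide.

```c
#include <assert.h>

long foo_with_clobber(long l)
{
    long result;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ (
        "mov %[l], %[reg]\n\t"
        "inc %[reg]\n\t"
        "addl %k[reg], %%ecx\n\t" /* the extra line: modifies ecx */
        : [reg] "=r" (result)
        : [l] "r" (l)
        : "ecx", "cc"             /* tell GCC that ecx and the flags change */
    );
#else
    result = l + 1;               /* portable fallback, same result */
#endif
    return result;
}

/* Usage example: behaves like the original foo(). */
static void foo_self_check(void)
{
    assert(foo_with_clobber(5L) == 6L);
}
```

With "ecx" in the clobber list GCC will neither keep a live value in that register across the asm statement nor pick it for %[reg] or %[l], so the extra instruction can no longer "mysteriously" change something else.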
I suspect the overwrite list is just to give GCC a hint not to store anything of value in these registers across the ASM call; since GCC doesn't analyze what ASM you're giving it, and certain instructions have side-effects that touch other registers not explicitly named in the code, this is the way to tell GCC about it.
