Branch to an address using GCC inline ARM asm - gcc

I want to branch to a particular address(NOT a label) using ARM assembly, without modifying the LR register. So I go with B instead of BL or BX.
I want this to be done in GCC inline asm.
Here is the documentation, and here is what I 've tried:
#define JMP(addr) \
__asm__("b %0" \
: /*output*/ \
: /*input*/ \
"r" (addr) \
);
It is a C macro, that can be called with an address. When I run it I get the following error:
error: undefined reference to 'r3'
The error is because of the usage of "r". I looked into it a bit, and I've found that it could be a bug on gcc 4.9.* version.
BTW, I am using Android/Linux Gcc 4.9 cross compiler, on an OSX.
Also, I don't know wether I should have loaded something on Rm.
Cheers!
Edit:
I changed the macro to this, and I still get undefined reference to r3 and r4:
#define JMP(addr) \
__asm__("LDR r5,=%0\n\t" \
"LDR r4,[r5]\n\t"\
"ADD r4,#1\n\t" \
"B r4" \
: /*output*/ \
: /*input*/ \
"r" (addr) \
: /*clobbered*/ \
"r4" ,"r5" \
);
Explanation:
load the address of the variable to r5, then load the value of that address to r4. Then add 1 to LSB (emm required by ARM specification?). And finally Branch to that address.

Since you are programming in C, you could just use a plain C approach without any assembly at all: just cast the variable, that holds the pointer to address to which you want to jump, to a function pointer and call it right away:
((void (*)(void)) addr)();
just an explanation to this jungle of brackets:
with this code you are casting addr to a pointer (signified by the star (*)) to a function that takes no argument (the second void means there are no arguments) and that also returns nothing (first void). finally the last two brackets are the actual invocation of that function.
Google for "C function pointer" for more information about that approach.
But if that doesn't work for you and you still want to go with the assembly approach, the instruction that you are in looking for is in fact BX (not sure why you excluded that initially. but I can guess that the name "Branch and Exchange" mislead you to believe that the register argument is swapped (and thereby changed) with the program counter, which is NOT the case, but it confused me in the beginning, too).
For that just a simple recap of the instructions:
B would take a label as an argument. Actually the jump will be encoded as an offset from the current position, which tells the processor to jump that many instruction forwards or backwards (normally the compiler, assembler or linker will take care of calculating that offset for you). During execution, control flow will simply be transferred to that position without changing any register (this means also the link register LR will stay unchanged)
BX R0 will take the absolute (so not an offset) address from a register, in this case R0, and continue execution at that address. This is also done without changing any other register.
BL and BLX R0 are the corresponding counterparts to the previous two instruction. They will do the same thing control flow wise, but on top of that save the current program counter in the link register LR. This is needed if the called function is supposed to return later on.
so in essence, what you would need to do is:
asm("BX %0" : : "r"(addr));
instructing the compiler to make sure the variable addr is in a register (r), which you are promising to only read and not to change. on top of that, upon return you won't have changed (clobbered) any other register.
See here
https://gcc.gnu.org/onlinedocs/gcc/Constraints.html
for more information about inline assembly constraints.
To help you understand why there are also other solutions floating around, here some things about the ARM architecture:
the program counter PC is for many instruction accessible as a regular register R15. It's just an alias for that exact register number.
this means that almost all arithmetic and register altering instructions can take it as an argument. However, for many of them it is highly deprecated.
if you are looking at the disassembly of a program compiled to ARM code, any function will end with one of three things:
BX LR which does exactly what you want to do: take the content of the link register (LR is an alias for R14) and jump to that location, effectively returning to the caller
POP {R4-R11, PC} restoring the caller saved register and jumping back to the caller. This will almost certainly have counterpart of PUSH {R4-R11, LR} in the beginning of the function: your are pushing the content of the link register (the return address) onto the stack but store it back into the program counter effectively returning to the caller in the end
B branch to a different function, if this function ends with a tail call and leaving it up to that function to return to the original caller.
Hope that helps,
Martin

You can't branch to a register, you can only branch to a label. If you want to jump to address in a register you need to move it into the PC register (r15).
#define JMP(addr) \
__asm__("mov pc,%0" \
: /*output*/ \
: /*input*/ \
"r" (addr) \
);

Related

Accessing global variables in ARM64 position independent assembly code

I'm writing some ARM64 assembly code for macOS, and it needs to access a global variable.
I tried to use the solution in this SO answer, and it works fine if I just call the function as is. However, my application needs to patch some instructions of this function, and the way I'm doing it, the function gets moved somewhere else in memory in the process. Note the adrp/ldr pair is untouched during patching.
However, if I try to run the function after moving it elsewhere in memory, it no longer returns correct results. This happens even if I just memcpy() the code as is, without patching. After tracing with a debugger, I isolated the issue to the address of the global valuable being incorrectly loaded by the adrp/ldr pair (and weirdly, the ldr is assembled as an add, as seen with objdump straight after compiling the binary -- not sure if it's somehow related to the issue here.)
What would be the correct way to load a global variable, so that it survives the function being copied somewhere else and run from there?
Note the adrp/ldr pair is untouched during patching.
There's the issue. If you rip code out of the binary it's in, then you effectively need to re-link it.
There's two ways of dealing with this:
If you have complete control over the segment layout, then you could have one executable segment with all of your assembly in it, and right next to it one segment with all addresses that code needs, and make sure the assembly ONLY has references to things on that page. Then wherever you copy your assembly, you'd also copy the data page next to it. This would enable you to make use of static addresses that get rebased by the dynamic linker at the time your binary is loaded. This might look something like:
.section __ASM,__asm,regular
.globl _asm_stub
.p2align 2
_asm_stub:
adrp x0, _some_ref#PAGE
ldr x0, [x0, _some_ref#PAGEOFF]
ret
.section __REF,__ref
.globl _some_ref
.p2align 3
_some_ref:
.8byte _main
Compile that with -Wl,-segprot,__ASM,rx,rx and you'll get an executable __ASM and a writeable __REF segment. Those two would have to maintain their relative position to each other when they get copied around.
(Note that on arm64 macOS you cannot put symbol references into executable segments for the dynamic linker to rebase, because it will fault and crash while trying to do so, and even if it were able to do that, it would invalidate the code signature.)
You act as a linker, scanning for PC-relative instructions and re-linking them as you go. The list of PC-relative instructions in arm64 is quite short, so it should be a feasible amount of work:
adr and adrp
b and bl
b.cond (and bc.cond with FEAT_HBC)
cbz and cbnz
tbz and tbnz
ldr and ldrsw (literal)
ldr (SIMD & FP literal)
prfm (literal)
(You can look for the string PC[] in the ARMv8 Reference Manual to find all uses.)
For each of those you'd have to check whether their target address lies within the range that's being copied or not. If it does, then you'd leave the instruction alone (unless you copy the code to a different offset within the 4K page than it was before, in which case you have to fix up adrp instructions). If it isn't then you'll have to recalculate the offset and emit a new instruction. Some of the instructions have a really low maximum offset (tbz/tbnz ±32KiB). But usually the only instructions that reference addresses across function boundaries are adr, adrp, b, bl and ldr. If all code on the page is written by you then you can do adrp+add instead of adr and adrp+ldr instead of just ldr, and if you have compiler-generated code on there, then all adr's and ldr's will have a nop before or after, which you can use to turn them into an adrp combo. That should get your maximum reference range up to ±128MiB.

Cortex M0 HardFault_Handler and getting the fault address

I'm having a HardFault when executing my program. I've found dozens of ways to get PC's value, but I'm using Keil uVision 5 and none of them has worked.
As far as I know I'm not in a multitasking context, and PSP contains 0xFFFFFFF1, so adding 24 to it would cause overflow.
Here's what I've managed to get working (as in, it compiles and execute):
enum { r0, r1, r2, r3, r12, lr, pc, psr};
extern "C" void HardFault_Handler()
{
uint32_t *stack;
__ASM volatile("MRS stack, MSP");
stack += 0x20;
pc = stack[pc];
psr = stack[psr];
__ASM volatile("BKPT #01");
}
Note the "+= 0x20", which is here to compensate for C function stack.
Whenever I read the PC's value, it's 0.
Would anyone have working code for that?
Otherwise, here's how I do it manually:
Put a breakpoint on HardFault_Handler (the original one)
When it breaks, look as MSP
Add 24 to its value.
Dump memory at that address.
And there it is, 0x00000000.
What am I doing wrong?
A few problems with your code
uint32_t *stack;
__ASM volatile("MRS stack, MSP");
MRS supports register destinations only. Your assembler migt be clever enough to transfer it to a temporary register first, but I'd like to see the machine code generated from that.
If you are using some kind of multitasking system, it might use PSP instead of MSP. See the linked code below on how one can distinguish that.
pc = stack[pc];
psr = stack[psr];
It uses the previous values of pc and psr as an index. Should be
pc = stack[6];
psr = stack[7];
Whenever I read the PC's value, it's 0.
Your program might actually have jumped to address 0 (e.g. through a null function pointer), tried to execute the value found there, which was probably not a valid instruction but the initial SP value from the vector table, and faulted on that. This code
void (*f)(void) = 0;
f();
does exactly that, I'm seeing 0x00000000 at offset 24.
Would anyone have working code for that?
This works for me. Note the code choosing between psp and msp, and the __attribute__((naked)) directive. You could try to find some equivalent for your compiler, to prevent the compiler from allocating a stack frame at all.

Simple "Hello-World", null-free shellcode for Windows needed

I would like to test a buffer-overflow by writing "Hello World" to console (using Windows XP 32-Bit). The shellcode needs to be null-free in order to be passed by "scanf" into the program I want to overflow. I've found plenty of assembly-tutorials for Linux, however none for Windows. Could someone please step me through this using NASM? Thxxx!
Assembly opcodes are the same, so the regular tricks to produce null-free shellcodes still apply, but the way to make system calls is different.
In Linux you make system calls with the "int 0x80" instruction, while on Windows you must use DLL libraries and do normal usermode calls to their exported functions.
For that reason, on Windows your shellcode must either:
Hardcode the Win32 API function addresses (most likely will only work on your machine)
Use a Win32 API resolver shellcode (works on every Windows version)
If you're just learning, for now it's probably easier to just hardcode the addresses you see in the debugger. To make the calls position independent you can load the addresses in registers. For example, a call to a function with 4 arguments:
PUSH 4 ; argument #4 to the function
PUSH 3 ; argument #3 to the function
PUSH 2 ; argument #2 to the function
PUSH 1 ; argument #1 to the function
MOV EAX, 0xDEADBEEF ; put the address of the function to call
CALL EAX
Note that the argument are pushed in reverse order. After the CALL instruction EAX contains the return value, and the stack will be just like it was before (i.e. the function pops its own arguments). The ECX and EDX registers may contain garbage, so don't rely on them keeping their values after the call.
A direct CALL instruction won't work, because those are position dependent.
To avoid zeros in the address itself try any of the null-free tricks for x86 shellcode, there are many out there but my favorite (albeit lengthy) is encoding the values using XOR instructions:
MOV EAX, 0xDEADBEEF ^ 0xFFFFFFFF ; your value xor'ed against an arbitrary mask
XOR EAX, 0xFFFFFFFF ; the arbitrary mask
You can also try NEG EAX or NOT EAX (sign inversion and bit flipping) to see if they work, it's much cheaper (two bytes each).
You can get help on the different API functions you can call here: http://msdn.microsoft.com
The most important ones you'll need are probably the following:
WinExec(): http://msdn.microsoft.com/en-us/library/ms687393(VS.85).aspx
LoadLibrary(): http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
GetProcAddress(): http://msdn.microsoft.com/en-us/library/ms683212%28v=VS.85%29.aspx
The first launches a command, the next two are for loading DLL files and getting the addresses of its functions.
Here's a complete tutorial on writing Windows shellcodes: http://www.codeproject.com/Articles/325776/The-Art-of-Win32-Shellcoding
Assembly language is defined by your processor, and assembly syntax is defined by the assembler (hence, at&t, and intel syntax) The main difference (at least i think it used to be...) is that windows is real-mode (call the actual interrupts to do stuff, and you can use all the memory accessible to your computer, instead of just your program) and linux is protected mode (You only have access to memory in your program's little cubby of memory, and you have to call int 0x80 and make calls to the kernel, instead of making calls to the hardware and bios) Anyway, hello world type stuff would more-or-less be the same between linux and windows, as long as they are compatible processors.
To get the shellcode from your program you've made, just load it into your target system's
debugger (gdb for linux, and debug for windows) and in debug, type d (or was it u? Anyway, it should say if you type h (help)) and between instructions and memory will be the opcodes.
Just copy them all over to your text editor into one string, and maybe make a program that translates them all into their ascii values. Not sure how to do this in gdb tho...
Anyway, to make it into a bof exploit, enter aaaaa... and keep adding a's until it crashes
from a buffer overflow error. But find exactly how many a's it takes to crash it. Then, it should tell you what memory adress that was. Usually it should tell you in the error message. If it says '9797[rest of original return adress]' then you got it. Now u gotta use ur debugger to find out where this was. disassemble the program with your debugger and look for where scanf was called. Set a breakpoint there, run and examine the stack. Look for all those 97's (which i forgot to mention is the ascii number for 'a'.) and see where they end. Then remove breakpoint and type the amount of a's you found out it took (exactly the amount. If the error message was "buffer overflow at '97[rest of original return adress]" then remove that last a, put the adress you found examining the stack, and insert your shellcode. If all goes well, you should see your shellcode execute.
Happy hacking...

(8051) Check if a single bit is set

I'm writing a program for a 8051 microcontroller. In the first part of the program I do some calculations and based on the result, I either light the LED or not (using CLR P1.7, where P1.7 is the port the LED is attached to in the microcontroller).
In the next part of the program I want to retrieve the bit, perhaps store it somewhere, and use it in a if-jump instruction like JB. How can I do that?
Also, I've seen the instruction MOV C, P1.7 in a code sample. What's the C here?
The C here is the 8051's carry flag - called that because it can be used to hold the "carry" when doing addition operations on multiple bytes.
It can also be used as a single-bit register - so (as here) where you want to move bits around, you can load it with a port value (such as P1.7) then store it somewhere else, for example:
MOV C, P1.7
MOV <bit-address>, C
Then later you can branch on it using:
JB <bit-address>, <label>
Some of the special function registers are also bit addressable. I believe its all the ones ending in 0 or 8. Don't have a reference in front of me but you can do something like setb r0.1. That way if you need the carry for something you dont have to worry about pushing it and using up space on your stack.

GCC's extended version of asm

I never thought I'd be posting an assembly question. :-)
In GCC, there is an extended version of the asm function. This function can take four parameters: assembly-code, output-list, input-list and overwrite-list.
My question is, are the registers in the overwrite-list zeroed out? What happens to the values that were previously in there (from other code executing).
Update: In considering my answers thus far (thank you!), I want to add that though a register is listed in the clobber-list, it (in my instance) is being used in a pop (popl) command. There is no other reference.
No, they are not zeroed out. The purpose of the overwrite list (more commonly called the clobber list) is to inform GCC that, as a result of the asm instructions the register(s) listed in the clobber list will be modified, and so the compiler should preserve any which are currently live.
For example, on x86 the cpuid instruction returns information in four parts using four fixed registers: %eax, %ebx, %ecx and %edx, based on the input value of %eax. If we were only interested in the result in %eax and %ebx, then we might (naively) write:
int input_res1 = 0; // also used for first part of result
int res2;
__asm__("cpuid" : "+a"(input_res1), "=b"(res2) );
This would get the first and second parts of the result in C variables input_res1 and res2; however if GCC was using %ecx and %edx to hold other data; they would be overwritten by the cpuid instruction without gcc knowing. To prevent this; we use the clobber list:
int input_res1 = 0; // also used for first part of result
int res2;
__asm__("cpuid" : "+a"(input_res1), "=b"(res2)
: : "%ecx", "%edx" );
As we have told GCC that %ecx and %edx will be overwritten by this asm call, it can handle the situation correctly - either by not using %ecx or %edx, or by saving their values to the stack before the asm function and restoring after.
Update:
With regards to your second question (why you are seeing a register listed in the clobber list for a popl instruction) - assuming your asm looks something like:
__asm__("popl %eax" : : : "%eax" );
Then the code here is popping an item off the stack, however it doesn't care about the actual value - it's probably just keeping the stack balanced, or the value isn't needed in this code path. By writing this way, as opposed to:
int trash // don't ever use this.
__asm__("popl %0" : "=r"(trash));
You don't have to explicitly create a temporary variable to hold the unwanted value. Admittedly in this case there isn't a huge difference between the two, but the version with the clobber makes it clear that you don't care about the value from the stack.
If by "zeroed out" you mean "the values in the registers are replaced with 0's to prevent me from knowing what some other function was doing" then no, the registers are not zeroed out before use. But it shouldn't matter because you're telling GCC you plan to store information there, not that you want to read information that's currently there.
You give this information to GCC so that (reading the documentation) "you need not guess which registers or memory locations will contain the data you want to use" when you're finished with the assembly code (eg., you don't have to remember if the data will be in the stack register, or some other register).
GCC needs a lot of help for assembly code because "The compiler ... does not parse the assembler instruction template and does not know what it means or even whether it is valid assembler input. The extended asm feature is most often used for machine instructions the compiler itself does not know exist."
Update
GCC is designed as a multi-pass compiler. Many of the passes are in fact entirely different programs. A set of programs forming "the compiler" translate your source from C, C++, Ada, Java, etc. into assembly code. Then a separate program (gas, for GNU Assembler) takes that assembly code and turns it into a binary (and then ld and collect2 do more things to the binary). Assembly blocks exist to pass text directly to gas, and the clobber-list (and input list) exist so that the compiler can do whatever set up is needed to pass information between the C, C++, Ada, Java, etc. side of things and the gas side of things, and to guarantee that any important information currently in registers can be protected from the assembly block by copying it to memory before the assembly block runs (and copying back from memory afterward).
The alternative would be to save and restore every register for every assembly code block. On a RISC machine with a large number of registers that could get expensive (the Itanium has 128 general registers, another 128 floating point registers and 64 1-bit registers, for instance).
It's been a while since I've written any assembly code. And I have much more experience using GCC's named registers feature than doing things with specific registers. So, looking at an example:
#include <stdio.h>
long foo(long l)
{
long result;
asm (
"movl %[l], %[reg];"
"incl %[reg];"
: [reg] "=r" (result)
: [l] "r" (l)
);
return result;
}
int main(int argc, char** argv)
{
printf("%ld\n", foo(5L));
}
I have asked for an output register, which I will call reg inside the assembly code, and that GCC will automatically copy to the result variable on completion. There is no need to give this variable different names in C code vs assembly code; I only did it to show that it is possible. Whichever physical register GCC decides to use -- whether it's %%eax, %%ebx, %%ecx, etc. -- GCC will take care of copying any important data from that register into memory when I enter the assembly block so that I have full use of that register until the end of the assembly block.
I have also asked for an input register, which I will call l both in C and in assembly. GCC promises that whatever physical register it decides to give me will have the value currently in the C variable l when I enter the assembly block. GCC will also do any needed recordkeeping to protect any data that happens to be in that register before I enter the assembly block.
What if I add a line to the assembly code? Say:
"addl %[reg], %%ecx;"
Since the compiler part of GCC doesn't check the assembly code it won't have protected the data in %%ecx. If I'm lucky, %%ecx may happen to be one of the registers GCC decided to use for %[reg] or %[l]. If I'm not lucky, I will have "mysteriously" changed a value in some other part of my program.
I suspect the overwrite list is just to give GCC a hint not to store anything of value in these registers across the ASM call; since GCC doesn't analyze what ASM you're giving it, and certain instructions have side-effects that touch other registers not explicitly named in the code, this is the way to tell GCC about it.

Resources