Moving a label into 64bit register - inline assembly (GCC / CLANG)

I'm trying to move a label's address into a 64-bit register and it won't let me.
I'm getting:
fatal error: error in backend: 32-bit absolute addressing is not supported in 64-bit mode
Here's an example of what I'm trying to do:
asm ("mov $label, %rax"); // Tried movq, movl (No difference)
asm volatile("label:");
Why won't it let me? Does it only allow moving a label into a 32-bit register?
I have to get that label's address into a 64-bit register, so how do I achieve that?

Try either of these two asm statements:
asm ("movabs $label, %rax");
asm ("lea label(%rip), %rax");
The first one uses a 64-bit immediate operand (and thus a 64-bit absolute relocation), while the second one uses RIP relative addressing. The second choice is probably the best as it's shorter, though it requires that label be within 2^31 bytes.
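However, as David Wohlferd noted, your code is unlikely to work: the compiler makes no promise about where two separate asm statements end up relative to each other or to the surrounding code, and it may duplicate them when inlining, so the label may not mark what you expect.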
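A minimal sketch that keeps the label and the lea in one asm statement, assuming x86-64 and GCC or Clang with AT&T syntax. The function name address_of_local_label is made up for illustration, and the numeric local label 1: is used so the statement can safely appear more than once in the generated assembly:

```c
#include <stdint.h>

/* Hypothetical helper: returns the address of a label that lives inside
   the same asm statement. Assumes x86-64, GCC/Clang, AT&T syntax. */
uintptr_t address_of_local_label(void)
{
    uintptr_t addr;
    __asm__ volatile(
        "lea 1f(%%rip), %0\n"   /* RIP-relative, so no 32-bit absolute relocation */
        "1:\n"                  /* numeric local label: may be defined many times */
        : "=r"(addr));
    return addr;
}
```

The returned value is the address of the instruction that follows the lea, which is only useful for inspection; jumping to it from elsewhere would still need the compiler's cooperation.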


Call address using assembly code

In our application, I have the following source code:
#define GET_CALL_ADDRESS(VAR) asm("movl 4(%%ebp),%0;" : "=r"(VAR));
void * _our_malloc(size_t size)
unsigned long calladdr;
return p;
I would like to know what GET_CALL_ADDRESS does. This code compiles and works fine on a 32-bit machine.
But on a 64-bit machine, during compilation I get the following error:
Error: incorrect register `%rax' used with `l' suffix
The directive
asm("movl 4(%%ebp),%0;" : "=r"(VAR));
copies a 32-bit quantity from [EBP+4] to VAR. VAR in your case is defined as calladdr. This assumes that the return address is 32-bit, which is not true anymore in a 64-bit system, and it assumes that the return address is at [EBP+4], which is also not true anymore in a 64-bit system.
The reason it fails is that calladdr lives in memory at something like [EBP-x] (where x is some small number like 4), and no single x86 instruction both fetches from [EBP+4] and stores to [EBP-x]. The value must first be loaded into a register and then stored from that register to [EBP-x]. The "=r" constraint lets GCC pick the register, and since calladdr is an unsigned long, which is 64 bits wide on a typical 64-bit Linux build, it picks a 64-bit register such as rax. The 'l' suffix of the movl instruction implies a 32-bit operand, so there is a mismatch.
Even if you somehow managed to sort this out, your next problem would be that in 64-bit code the frame pointer is RBP and, when a frame pointer is used at all, the return address sits at [RBP+8], not at [EBP+4].
So this entire statement assumes that you are compiling 32-bit code.
My recommendation: completely ditch this nonsense and replace it with some ready-made library (no need to re-invent the wheel) that works both in 32-bit and 64-bit mode, or with gcc's built-in function for retrieving the return address, as suggested by Michael Petch; then proceed to rebuild in 64-bit like a boss.
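A minimal sketch of the built-in route, assuming GCC or Clang. The names traced_malloc and last_caller are placeholders for illustration and are not from the original code; only __builtin_return_address and __builtin_extract_return_addr are the compiler's own:

```c
#include <stdint.h>
#include <stdlib.h>

/* Same macro name as in the question, but using the compiler built-in
   instead of reading [EBP+4]; works for 32-bit and 64-bit targets. */
#define GET_CALL_ADDRESS(VAR) \
    ((VAR) = (uintptr_t)__builtin_extract_return_addr(__builtin_return_address(0)))

uintptr_t last_caller;   /* hypothetical: remembers the most recent call site */

/* noinline so the function keeps a real caller and a real return address */
__attribute__((noinline))
void *traced_malloc(size_t size)
{
    uintptr_t calladdr;
    GET_CALL_ADDRESS(calladdr);
    last_caller = calladdr;
    return malloc(size);
}
```

Level 0 is the return address of the current function; higher levels are not guaranteed to work on every target, so the sketch sticks to 0.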

GDB doesn't disassemble program running in RAM correctly

I have an application compiled using GCC for an STM32F407 ARM processor. The linker places it in Flash, but it is executed from RAM. A small bootstrap program copies the application from Flash to RAM and then branches to the application's ResetHandler.
memcpy(appRamStart, appFlashStart, appRamSize);
// run the application
__asm volatile (
"ldr r1, =_app_ram_start\n\t" // load a pointer to the application's vectors
"add r1, #4\n\t" // increment vector pointer to the second entry (ResetHandler pointer)
"ldr r2, [r1, #0x0]\n\t" // load the ResetHandler address via the vector pointer
// bit[0] must be 1 for THUMB instructions otherwise a bus error will occur.
"bx r2" // jump to the ResetHandler - does not return from here
);
This all works ok, except when I try to debug the application from RAM (using GDB from Eclipse) the disassembly is incorrect. The curious thing is the debugger gets the source code correct, and will accept and halt on breakpoints that I have set. I can single step the source code lines. However, when I single step the assembly instructions, they make no sense at all. It also contains numerous undefined instructions. I'm assuming it is some kind of alignment problem, but it all looks correct to me. Any suggestions?
GDB probably relies on the symbol table to determine the instruction set mode, which can be Thumb(-2) or ARM. When you move the code to RAM, it probably can't find this information for the new addresses and falls back to ARM mode.
You can use set arm force-mode thumb in GDB to force Thumb-mode disassembly.
As a side note, if you get illegal instructions when debugging an ARM binary, this is generally the problem, unless it is complete nonsense such as trying to disassemble data sections.
I personally find it strange that tools don't try a heuristic approach when disassembling ARM binaries. In auto mode it shouldn't be hard to try both modes and count the errors to decide which mode to use as a last resort.

What does the 66 in "66:PUSH 08" stand for?

Test platform is windows 32bit.
I use IDA Pro to disassemble a PE file, do some very tedious transformation work, and reassemble it into a new PE file.
But the reassembled PE file differs from the original one when I use OllyDbg
to debug the new PE file (although this part of the assembly file I transformed is unchanged).
Here is part of the original one:
See the
is correct.
Here is part of my new PE file:
See now the
is changed to
66:6A 08
66:6A 00
and this causes the new PE file to fail when executed.
Basically, from what I have seen, it leaves the stack misaligned.
So does anyone know what is wrong with this part? I don't see any difference in the assembly code I transformed.
Could anyone give me some help? Thank you!
66h is the operand-size override prefix. In 32-bit code, it switches the operand size to 16-bit from the default 32-bit. So what happens here is that the PUSH instruction pushes a 16-bit value on the stack instead of a 32-bit one, and ESP is decremented by 2 instead of 4. That's why you get an unbalanced stack after the call.
You should check your assembler's documentation to see how you can force 32-bit operand size for the PUSH imm instructions. Different assemblers use different conventions for that. For example, in NASM you'd probably use something like push dword 8.
It is a "prefix" opcode byte.
0x66 means "operand size override". Your code is apparently operating in 32-bit mode; PUSH without the prefix will push a 32-bit value. With the prefix, the immediate is pushed as a 16-bit value, so ESP moves by 2 instead of 4. (I write a lot of assembly code, and have never had need to do that).
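To make the effect on the stack concrete, here is a small sketch in C that models how many bytes a PUSH immediate takes off ESP in 32-bit code. It is not a real disassembler: the function name push_imm_stack_bytes is made up, and it only recognises the two PUSH immediate opcodes (6A and 68) with at most one 66h prefix:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: bytes subtracted from ESP by a PUSH immediate
   (opcode 6A or 68) in 32-bit code. Returns 0 for any other encoding. */
int push_imm_stack_bytes(const uint8_t *code, size_t len)
{
    size_t i = 0;
    int operand_bytes = 4;              /* default operand size in 32-bit code */

    if (i < len && code[i] == 0x66) {   /* operand-size override prefix */
        operand_bytes = 2;
        i++;
    }
    if (i < len && (code[i] == 0x6A || code[i] == 0x68))
        return operand_bytes;
    return 0;
}
```

Feeding it the bytes 6A 08 gives 4, while 66 6A 08 gives 2, which matches the misaligned stack seen in the reassembled file.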

Using assembly JMP function on x86_64

I'm really new to programming (in general - it's pathetic) and some Python-related assembly has cropped up in this app that I'm hacking to run on 64-bit.
Essentially, the code goes like this:
#define FUNCTION(name) \
.globl _##name; \
_##name: \
jmp *(_p_##name)
The FUNCTION(name) syntax is used about 50 times to define headers for an external Python library as far as I can tell (I'm not going to pretend that I fully understand it, I'm just bugfixing).
Since I'm compiling for x86_64, the following error is spit out by GCC for each FUNCTION(name) instance:
32-bit absolute addressing is not supported for x86-64
cannot do signed 4 byte relocation
How would I go about "fixing" this to run on x86_64?
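Grab a copy of the Intel Architecture Software Developer's Manuals; Volume 2A (page 3-549 in my copy) has a huge pile of information about jmp. As you're seeing, some forms of the jmp instruction are invalid in 64-bit mode. In particular, the two "Jump far, absolute, address given in operand" forms won't work. The error messages here, though, point at the operand: *(_p_##name) asks for a 32-bit absolute address, which your toolchain cannot relocate in a 64-bit object. You will need to change to a relative addressing form, for example the RIP-relative operand _p_##name(%rip), or load the 64-bit address into a register and jump through that.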
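A minimal sketch of the RIP-relative form, written as file-scope asm in a C file so it can be tried on its own. It assumes x86-64 ELF (for example Linux, where C symbols have no leading underscore) and a GNU-compatible assembler; real_add, p_add and add_stub are placeholder names, not part of the original project:

```c
/* Target function and the pointer slot the stub jumps through. */
int real_add(int a, int b) { return a + b; }
int (*p_add)(int, int) = real_add;

/* The stub: an indirect jump through p_add, addressed relative to RIP
   so that no 32-bit absolute relocation is needed. */
__asm__(
    ".pushsection .text\n"
    ".globl add_stub\n"
    ".type add_stub, @function\n"
    "add_stub:\n"
    "    jmp *p_add(%rip)\n"
    ".popsection\n");

int add_stub(int a, int b);   /* C prototype for the asm stub */
```

Calling add_stub(2, 3) lands in real_add with the arguments untouched, because the stub only jumps and never alters registers or the stack. In the original macro the equivalent change would be jmp *_p_##name(%rip).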

GCC's extended version of asm

I never thought I'd be posting an assembly question. :-)
In GCC, there is an extended version of the asm function. This function can take four parameters: assembly-code, output-list, input-list and overwrite-list.
My question is, are the registers in the overwrite-list zeroed out? What happens to the values that were previously in there (from other code executing).
Update: In considering my answers thus far (thank you!), I want to add that though a register is listed in the clobber-list, it (in my instance) is being used in a pop (popl) command. There is no other reference.
No, they are not zeroed out. The purpose of the overwrite list (more commonly called the clobber list) is to inform GCC that, as a result of the asm instructions the register(s) listed in the clobber list will be modified, and so the compiler should preserve any which are currently live.
For example, on x86 the cpuid instruction returns information in four parts using four fixed registers: %eax, %ebx, %ecx and %edx, based on the input value of %eax. If we were only interested in the result in %eax and %ebx, then we might (naively) write:
int input_res1 = 0; // also used for first part of result
int res2;
__asm__("cpuid" : "+a"(input_res1), "=b"(res2) );
This would get the first and second parts of the result into the C variables input_res1 and res2. However, if GCC were using %ecx and %edx to hold other data, they would be overwritten by the cpuid instruction without GCC knowing. To prevent this, we use the clobber list:
int input_res1 = 0; // also used for first part of result
int res2;
__asm__("cpuid" : "+a"(input_res1), "=b"(res2)
: : "%ecx", "%edx" );
As we have told GCC that %ecx and %edx will be overwritten by this asm statement, it can handle the situation correctly, either by not using %ecx or %edx, or by saving their values to the stack before the asm statement and restoring them afterwards.
With regards to your second question (why you are seeing a register listed in the clobber list for a popl instruction) - assuming your asm looks something like:
__asm__("popl %%eax" : : : "%eax" );
Then the code here is popping an item off the stack but doesn't care about the actual value; it's probably just keeping the stack balanced, or the value isn't needed in this code path. By writing it this way, as opposed to:
int trash; // don't ever use this.
__asm__("popl %0" : "=r"(trash));
You don't have to explicitly create a temporary variable to hold the unwanted value. Admittedly in this case there isn't a huge difference between the two, but the version with the clobber makes it clear that you don't care about the value from the stack.
If by "zeroed out" you mean "the values in the registers are replaced with 0's to prevent me from knowing what some other function was doing" then no, the registers are not zeroed out before use. But it shouldn't matter because you're telling GCC you plan to store information there, not that you want to read information that's currently there.
You give this information to GCC so that (reading the documentation) "you need not guess which registers or memory locations will contain the data you want to use" when you're finished with the assembly code (eg., you don't have to remember if the data will be in the stack register, or some other register).
GCC needs a lot of help for assembly code because "The compiler ... does not parse the assembler instruction template and does not know what it means or even whether it is valid assembler input. The extended asm feature is most often used for machine instructions the compiler itself does not know exist."
GCC is designed as a multi-pass compiler. Many of the passes are in fact entirely different programs. A set of programs forming "the compiler" translate your source from C, C++, Ada, Java, etc. into assembly code. Then a separate program (gas, for GNU Assembler) takes that assembly code and turns it into a binary (and then ld and collect2 do more things to the binary). Assembly blocks exist to pass text directly to gas, and the clobber-list (and input list) exist so that the compiler can do whatever set up is needed to pass information between the C, C++, Ada, Java, etc. side of things and the gas side of things, and to guarantee that any important information currently in registers can be protected from the assembly block by copying it to memory before the assembly block runs (and copying back from memory afterward).
The alternative would be to save and restore every register for every assembly code block. On a RISC machine with a large number of registers that could get expensive (the Itanium has 128 general registers, another 128 floating point registers and 64 1-bit registers, for instance).
It's been a while since I've written any assembly code. And I have much more experience using GCC's named registers feature than doing things with specific registers. So, looking at an example:
#include <stdio.h>
/* Assumes 32-bit x86, where long is 32 bits and matches the l suffix. */
long foo(long l)
{
    long result;
    asm (
        "movl %[l], %[reg];"
        "incl %[reg];"
        : [reg] "=r" (result)
        : [l] "r" (l)
    );
    return result;
}
int main(int argc, char** argv)
{
    printf("%ld\n", foo(5L));
    return 0;
}
I have asked for an output register, which I will call reg inside the assembly code, and that GCC will automatically copy to the result variable on completion. There is no need to give this variable different names in C code vs assembly code; I only did it to show that it is possible. Whichever physical register GCC decides to use -- whether it's %%eax, %%ebx, %%ecx, etc. -- GCC will take care of copying any important data from that register into memory when I enter the assembly block so that I have full use of that register until the end of the assembly block.
I have also asked for an input register, which I will call l both in C and in assembly. GCC promises that whatever physical register it decides to give me will have the value currently in the C variable l when I enter the assembly block. GCC will also do any needed recordkeeping to protect any data that happens to be in that register before I enter the assembly block.
What if I add a line to the assembly code? Say:
"addl %[reg], %%ecx;"
Since the compiler part of GCC doesn't check the assembly code it won't have protected the data in %%ecx. If I'm lucky, %%ecx may happen to be one of the registers GCC decided to use for %[reg] or %[l]. If I'm not lucky, I will have "mysteriously" changed a value in some other part of my program.
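The way out of that situation is to name the register in the clobber list. A minimal sketch, assuming x86 or x86-64 with GCC or Clang; the function increment_using_ecx is made up for illustration and uses int operands so the l suffix is correct on both targets:

```c
/* Hypothetical example: uses %ecx as a scratch register and tells the
   compiler so, which keeps it from placing live data or operands there. */
int increment_using_ecx(int value)
{
    int result;
    __asm__(
        "movl %[in], %%ecx\n\t"    /* copy the input into the scratch register */
        "incl %%ecx\n\t"           /* modifies %ecx and the flags */
        "movl %%ecx, %[out]"       /* hand the result back */
        : [out] "=r"(result)
        : [in] "r"(value)
        : "ecx", "cc");            /* clobbers: scratch register and flags */
    return result;
}
```

With "ecx" listed, the compiler will not choose %ecx for %[in] or %[out], and it saves anything it was keeping there before the block runs.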
I suspect the overwrite list is just to give GCC a hint not to store anything of value in these registers across the ASM call; since GCC doesn't analyze what ASM you're giving it, and certain instructions have side-effects that touch other registers not explicitly named in the code, this is the way to tell GCC about it.
