I'm trying to convert a snippet of mine to a compiler that uses an inline asm syntax similar to gcc's. I read the documentation and all was fine until I encountered this line:
mov eax, dword ptr fs:[0x20]
I converted that to:
movl 0x20(%fs:), %eax
The compiler flipped, telling me that fs is not a 32-bit register and that this operation is invalid. How should I access fs in AT&T syntax?
Found the answer. It seems that gcc, or AT&T syntax itself, is inconsistent here: the segment override goes in front of the address rather than inside the parentheses.
movl %fs:0x20, %eax
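For context, here is a minimal sketch of how the corrected instruction might sit inside a GCC-style extended asm statement in C. The helper name `read_fs_0x20` is hypothetical, and the guard assumes x86-64 Linux, where `%fs` is the thread-pointer segment and offset 0x20 is readable. What is stored at that offset depends on the OS and C library, so the sketch only reads the value and does not interpret it.

```c
#include <stdint.h>

/* Hypothetical helper: load the 32-bit value at %fs:0x20.
 * In AT&T syntax the segment override prefixes the address (%fs:0x20);
 * it is not written inside the parentheses like a base register. */
uint32_t read_fs_0x20(void) {
#if defined(__x86_64__) && defined(__linux__)
    uint32_t value;
    /* %% escapes a literal % in extended asm; %0 is the output operand. */
    __asm__ volatile ("movl %%fs:0x20, %0" : "=r"(value));
    return value;
#else
    /* Assumption: on other targets %fs may not be a valid segment to read,
     * so this sketch returns 0 instead of touching it. */
    return 0;
#endif
}
```

Calling `read_fs_0x20()` twice in the same thread should give the same value, since the location belongs to per-thread bookkeeping that is set up once.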
Related
I have written a small piece of assembly with AT&T syntax and have currently declared three variables in the .data section. However, when I attempt to move any of those variables to a register, such as %eax, gcc raises an error. The code and error message are below:
.data
x:.int 14
y:.int 4
str: .string "some string\n"
.globl _main
_main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl x, %eax; #attempting to move the value of x to %eax;
leave
ret
The error raised is:
call_function.s:14:3: error: 32-bit absolute addressing is not supported in 64-bit mode
movl x, %eax;
^
I have also tried moving the value by first adding the $ character in front of x, however, a clang error is raised:
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Does anyone know how the value stored in x can be successfully moved to %eax? I am using x86 assembly on Mac OSX and compiling with gcc.
A RIP-relative addressing mode is the only good option for addressing static data on MacOS; the image base address is above 2^32 so 32-bit absolute addresses aren't usable even in position-dependent code (unlike x86-64 Linux). RIP-relative addressing of static data is position-independent, so it works even in position-independent executables (ASLR) and libraries.
movl x(%rip), %eax is the AT&T syntax for RIP-relative.
mov eax, dword ptr [rip+x] in GAS .intel_syntax noprefix.
Or, to get the address of a symbol into a register, lea x(%rip), %rdi
NASM syntax: mov eax, [rel x], or use default rel so [x] is RIP-relative.
See "Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array" for more background on what you can do on OS X. For example, movabs x, %eax would be possible because the destination register is AL/AX/EAX/RAX. That uses a 64-bit absolute address, but don't do it, because it's larger than and no faster than a RIP-relative load.
See also http://felixcloutier.com/x86/MOV.html.
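As a rough illustration of the RIP-relative form, the sketch below embeds a small AT&T-syntax routine in a C file through top-level asm. The names `x_value` and `load_x` are hypothetical, and the block assumes an x86-64 ELF target such as Linux with a GNU-style toolchain. On macOS the C-visible symbol would need a leading underscore, and `.pushsection`/`.popsection` may not be available. Other targets fall back to plain C so the file still builds.

```c
#if defined(__x86_64__) && defined(__ELF__)
/* Assumption: x86-64 ELF target; the symbol names are illustrative. */
__asm__(
    ".pushsection .data\n"
    ".balign 4\n"
    "x_value: .int 14\n"               /* static data, as in the question */
    ".popsection\n"
    ".pushsection .text\n"
    ".globl load_x\n"
    ".type load_x, @function\n"
    "load_x:\n"
    "    movl x_value(%rip), %eax\n"   /* RIP-relative load of the value */
    "    ret\n"
    ".size load_x, .-load_x\n"
    ".popsection\n"
);
int load_x(void);  /* implemented by the assembly above */
#else
/* Fallback for other targets: an ordinary C load of the same value. */
static int x_value = 14;
int load_x(void) { return x_value; }
#endif
```

Replacing `movl x_value(%rip), %eax` with `leaq x_value(%rip), %rax` would return the address of the variable instead of its contents, mirroring the lea form mentioned above.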
I am trying to learn x86_64 assembly, and am using GCC as my assembler. The exact command I'm using is:
gcc -nostdlib tapydn.S -D__ASSEMBLY__
I'm mainly using gcc for its preprocessor. Here is tapydn.S:
.global _start
#include <asm-generic/unistd.h>
syscall=0x80
.text
_start:
movl $__NR_exit, %eax
movl $0x00, %ebx
int $syscall
This results in a segmentation fault. I believe the problem is with the following line:
movl $__NR_exit, %eax
I used __NR_exit because it was more descriptive than some magic number. However, it appears that my usage of it is incorrect. I believe this to be the case because when I change the line in question to the following, it runs fine:
movl $0x01, %eax
Further backing up this trail of thought is the contents of usr/include/asm-generic/unistd.h:
#define __NR_exit 93
__SYSCALL(__NR_exit, sys_exit)
I expected the value of __NR_exit to be 1, not 93! Clearly I am misunderstanding its purpose and consequently its usage. For all I know, I'm getting lucky with the $0x01 case working (much like undefined behaviour in C++), so I kept digging...
Next, I looked for the definition of sys_exit. I couldn't find it. I tried using it anyway as follows (with and without the preceding $):
movl $sys_exit, %eax
This wouldn't link:
/tmp/cc7tEUtC.o: In function `_start':
(.text+0x1): undefined reference to `sys_exit'
collect2: error: ld returned 1 exit status
My guess is that it's a symbol in one of the system libraries and I'm not linking it due to my passing -nostdlib to GCC. I'd like to avoid linking such a large library for just one symbol if possible.
In response to Jester's comment about mixing 32 and 64 bit constants, I tried using the value 0x3C as suggested:
movq $0x3C, %eax
movq $0x00, %ebx
This also resulted in a segmentation fault. I also tried swapping out eax and ebx for rax and rbx:
movq $0x3C, %rax
movq $0x00, %rbx
The segmentation fault remained.
Jester then commented stating that I should be using syscall rather than int $0x80:
.global _start
#include <asm-generic/unistd.h>
.text
_start:
movq $0x3C, %rax
movq $0x00, %rbx
syscall
This works, but I was later informed that I should be using rdi instead of rbx as per the System V AMD64 ABI:
movq $0x00, %rdi
This also works fine, but still ends up using the magic number 0x3C for the system call number.
Wrapping up, my questions are as follows:
What is the correct usage of __NR_exit?
What should I be using instead of a magic number for the exit system call?
The correct header file to get the system call numbers is sys/syscall.h. The constants are called SYS_### where ### is the name of the system call you are interested in. The __NR_### macros are implementation details and should not be used. As a rule of thumb, if an identifier begins with an underscore, it should not be used; if it begins with two, it should definitely not be used. The arguments go into rdi, rsi, rdx, r10, r8, and r9. Here is a sample program for Linux:
#include <sys/syscall.h>
.globl _start
_start:
mov $SYS_exit,%eax
xor %edi,%edi
syscall
These conventions are mostly portable to other UNIX-like operating systems.
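The same constants can be tried from C, which makes the convention easy to check without exiting the process. The sketch below is only an illustration: the helper names `raw_syscall0` and `pid_via_syscall` are hypothetical, the inline asm path assumes x86-64 Linux, and other targets fall back to the libc `syscall()` wrapper. It uses `SYS_getpid` rather than `SYS_exit` so that the caller keeps running.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc, so that syscall() is declared */
#include <sys/syscall.h>  /* SYS_getpid, SYS_exit, ... */
#include <unistd.h>       /* syscall(), getpid() */

/* Hypothetical helper: invoke a system call that takes no arguments. */
long raw_syscall0(long number) {
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    /* The number goes in rax and the result comes back in rax;
     * the syscall instruction itself clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(number)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Assumption: elsewhere, defer to the libc syscall() wrapper. */
    return syscall(number);
#endif
}

/* Uses SYS_getpid instead of SYS_exit so the process survives the call. */
long pid_via_syscall(void) {
    return raw_syscall0(SYS_getpid);
}
```

The value returned by `pid_via_syscall()` should match what `getpid()` reports for the same process.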
I'm learning the basics of x86 via this free book.
Keep in mind this is specific to macOS x86 compared to Linux x86.
It's made for GNU/Linux, so I have to change some of the code, which is probably where I went wrong. I took this code snippet:
.section .data
.section .text
.globl _start
_start:
movl $1, %eax
movl $0, %ebx
int $0x80
After a bit of googling about x86 on macOS I turned that bit of code into this:
.data
.text
.globl _main
_main:
movl $1, %eax
movl $0, %ebx
int $0x80
I compiled this using gcc test.s which compiles it into a.out. When trying to run it using ./a.out I get the error [1] 17301 illegal hardware instruction ./a.out.
Any help is appreciated, thanks!
@Jester helped me out. You can view the comment on my question, but basically the calling convention is different on macOS. I found this resource, which helped me out.
I have been trying to get a better idea of what happens under the hood by using the compiler to generate the assembly programs of various C programs at different optimization levels. There is something that has been bothering me for a while.
When I compile t.c as follows,
gcc -S t.c
I get the assembly in AT&T syntax as follows.
function:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
addl 8(%ebp), %eax
popl %ebp
ret
.size function, .-function
When I compile using the masm argument as follows:
gcc -S t.c -masm=intel
I get the following output.
function:
push %ebp
mov %ebp, %esp
mov %eax, DWORD PTR [%ebp+12]
add %eax, DWORD PTR [%ebp+8]
pop %ebp
ret
.size function, .-function
There is a change in syntax, but there are still "%" prefixes on the register names (this is why I don't prefer AT&T syntax in the first place).
Can someone shed some light on why this is happening? How do I solve this issue?
The GNU assembler (gas) does have a separate option for controlling the % prefix. Documentation seems to suggest GCC doesn't have such an option, but my GCC (version Debian 4.3.2-1.1) doesn't produce the % prefix.