Hopefully not a silly question.
I compiled without specifying an optimization level: gcc test.c -o test (which seems to choose -O0).
gcc -O2 or -O3 outputs much cleaner (at least it seems to me) assembly code than -O0.
What's the reason for -O0, and how does it help us? I can't see that it's simpler than -O1 or -O2.
...
int sum(int x, int y)
{
int sum = x + y;
return sum;
}
...
0x00000000004004ed <+0>: push %rbp
0x00000000004004ee <+1>: mov %rsp,%rbp
0x00000000004004f1 <+4>: mov %edi,-0x14(%rbp)
0x00000000004004f4 <+7>: mov %esi,-0x18(%rbp)
0x00000000004004f7 <+10>: mov -0x18(%rbp),%eax
0x00000000004004fa <+13>: mov -0x14(%rbp),%edx
0x00000000004004fd <+16>: add %edx,%eax
0x00000000004004ff <+18>: mov %eax,-0x4(%rbp)
0x0000000000400502 <+21>: mov -0x4(%rbp),%eax
0x0000000000400505 <+24>: pop %rbp
0x0000000000400506 <+25>: retq
With optimizations turned off, each source statement maps directly to its own block of machine code, which makes debugging easier. With optimizations turned on, the compiler can do surprising things like rearranging code or eliminating variables, which makes debugging much harder.
Compiling with -O0 is also typically faster, as the optimizer is usually the slowest component of a modern compiler.
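One way to see this for yourself is to compile a small function twice, once with gcc -O0 -S and once with gcc -O2 -S, and compare the two .s files. The function below is only a sketch for that experiment; the name sum_with_temp and the extra unused local are made up for illustration and are not from the question.

```c
/* Hypothetical test function for comparing -O0 and -O2 output. */
int sum_with_temp(int x, int y)
{
    int unused = x * 2;   /* at -O0 this typically gets its own stack slot */
    int sum = x + y;      /* likewise stored and reloaded at -O0 */
    (void)unused;         /* silence the unused-variable warning */
    return sum;
}
```

At -O0 you should expect to find a store and a load for each local, much like the listing in the question. At -O2 the unused variable will most likely disappear and the whole body may shrink to a single add or lea, which is exactly what makes stepping through it in a debugger harder.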
I am trying to learn x86_64 assembly, and am using GCC as my assembler. The exact command I'm using is:
gcc -nostdlib tapydn.S -D__ASSEMBLY__
I'm mainly using gcc for its preprocessor. Here is tapydn.S:
.global _start
#include <asm-generic/unistd.h>
syscall=0x80
.text
_start:
movl $__NR_exit, %eax
movl $0x00, %ebx
int $syscall
This results in a segmentation fault. I believe the problem is with the following line:
movl $__NR_exit, %eax
I used __NR_exit because it was more descriptive than some magic number. However, it appears that my usage of it is incorrect. I believe this to be the case because when I change the line in question to the following, it runs fine:
movl $0x01, %eax
Further backing up this trail of thought is the contents of usr/include/asm-generic/unistd.h:
#define __NR_exit 93
__SYSCALL(__NR_exit, sys_exit)
I expected the value of __NR_exit to be 1, not 93! Clearly I am misunderstanding its purpose and consequently its usage. For all I know, I'm getting lucky with the $0x01 case working (much like undefined behaviour in C++), so I kept digging...
Next, I looked for the definition of sys_exit. I couldn't find it. I tried using it anyway as follows (with and without the preceding $):
movl $sys_exit, %eax
This wouldn't link:
/tmp/cc7tEUtC.o: In function `_start':
(.text+0x1): undefined reference to `sys_exit'
collect2: error: ld returned 1 exit status
My guess is that it's a symbol in one of the system libraries and I'm not linking it due to my passing -nostdlib to GCC. I'd like to avoid linking such a large library for just one symbol if possible.
In response to Jester's comment about mixing 32 and 64 bit constants, I tried using the value 0x3C as suggested:
movq $0x3C, %eax
movq $0x00, %ebx
This also resulted in a segmentation fault. I also tried swapping out eax and ebx for rax and rbx:
movq $0x3C, %rax
movq $0x00, %rbx
The segmentation fault remained.
Jester then commented stating that I should be using syscall rather than int $0x80:
.global _start
#include <asm-generic/unistd.h>
.text
_start:
movq $0x3C, %rax
movq $0x00, %rbx
syscall
This works, but I was later informed that I should be using rdi instead of rbx as per the System V AMD64 ABI:
movq $0x00, %rdi
This also works fine, but still ends up using the magic number 0x3C for the system call number.
Wrapping up, my questions are as follows:
What is the correct usage of __NR_exit?
What should I be using instead of a magic number for the exit system call?
The correct header file to get the system call numbers is sys/syscall.h. The constants are called SYS_### where ### is the name of the system call you are interested in. The __NR_### macros are implementation details and should not be used. As a rule of thumb, if an identifier begins with an underscore it should not be used; if it begins with two, it should definitely not be used. The system call number goes into rax, and the arguments go into rdi, rsi, rdx, r10, r8, and r9. Here is a sample program for Linux:
#include <sys/syscall.h>
.globl _start
_start:
mov $SYS_exit,%eax
xor %edi,%edi
syscall
These conventions are mostly portable to other UNIX-like operating systems.
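The same SYS_### constants are usable from C through the syscall() wrapper in glibc, which can be a quick way to check that a constant resolves to what you expect before using it in assembly. This is a minimal sketch assuming Linux with glibc; pid_via_raw_syscall is a hypothetical helper name.

```c
#define _GNU_SOURCE        /* exposes syscall() in <unistd.h> on glibc */
#include <sys/syscall.h>   /* SYS_getpid, SYS_exit, ... */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: issue getpid as a raw system call by number. */
long pid_via_raw_syscall(void)
{
    return syscall(SYS_getpid);
}
```

The result should match what getpid() returns. On x86-64 Linux, SYS_exit from the same header should expand to 60, the 0x3C magic number from the question.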
BOOL32 doStuff() {
return TRUE;
}
gcc 2.95 for VxWorks 5.x, compiling the above code with -O0 for 32-bit x86, generated the following code:
doStuff:
0e9de190: push %ebp
0e9de191: mov %esp,%ebp
308 return TRUE;
0e9de193: mov $0x1,%eax
0e9de198: jmp 0xe9de1a0 <doStuff+16>
312 {
0e9de19a: lea 0x0(%esi),%esi
// The JMP jumps here
0e9de1a0: mov %ebp,%esp
0e9de1a2: pop %ebp
0e9de1a3: ret
Everything looks normal until the JMP and LEA instruction. What are they for?
My guess is that it is some kind of alignment, but I am not sure about this.
I would have done something like this:
doStuff:
0e9de190: push %ebp
0e9de191: mov %esp,%ebp
308 return TRUE;
0e9de193: mov $0x1,%eax
0e9de1XX: mov %ebp,%esp
0e9de1XX: pop %ebp
0e9de1XX: ret
0e9de1XX: fill with lea 0x0, %esi
lea 0x0(%esi),%esi is a long NOP, and the jmp is jumping over it. You probably have an ancient version of binutils (containing as) to go with your ancient gcc version.
So when gcc put a .p2align to align a label in the middle of the function that isn't otherwise a branch target (for some bizarre reason, but it's -O0 so it's not even supposed to be good code), the assembler made a long NOP and jumped over it.
Normally you'd only jump over a block of NOPs if there were a lot of them, especially if they were all single-byte NOPs. This is really dumb code, so stop using such crusty tools. You could try upgrading your assembler (but still using gcc2.95 if you need to). Or check that it doesn't happen at -O2 or -O3, in which case it doesn't matter.
If you have to keep using gcc2.95 for some reason, then just be aware that it's ancient, and this is part of the tradeoff you're making to keep using whatever it is that's forcing you to use it.
I have two .asm files, one that calls a function inside the other. My files look like:
mainProg.asm:
global main
extern factorial
section .text
main:
;---snip---
push rcx
call factorial
pop rcx
;---snip---
ret
factorial.asm:
section .text
factorial:
cmp rdi, 0
je l2
mov rax, 1
l1:
mul rdi
dec rdi
jnz l1
ret
l2:
mov rax, 1
ret
(Yes, there are some things I could improve in the implementation.)
I tried to compile them according to the steps at How to link two nasm source files:
$ nasm -felf64 -o factorial.o factorial.asm
$ nasm -felf64 -o mainProg.o mainProg.asm
$ gcc -o mainProg mainProg.o factorial.o
The first two commands work without issue, but the last fails with
mainProg.o: In function `main':
mainProg.asm:(.text+0x22): undefined reference to `factorial'
collect2: error: ld returned 1 exit status
Changing the order of the object files doesn't change the error.
I tried searching for solutions to link two .o files, and I found the question C Makefile given two .o files. As mentioned there, I ran objdump -S factorial.o and got
factorial.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <factorial>:
0: 48 83 ff 00 cmp $0x0,%rdi
4: 74 0e je 14 <l2>
6: b8 01 00 00 00 mov $0x1,%eax
000000000000000b <l1>:
b: 48 f7 e7 mul %rdi
e: 48 ff cf dec %rdi
11: 75 f8 jne b <l1>
13: c3 retq
0000000000000014 <l2>:
14: b8 01 00 00 00 mov $0x1,%eax
19: c3 retq
which is pretty much identical to the source file. It clearly contains the factorial function, so why doesn't ld detect it? Is there a different method to link two .o files?
You need a global factorial assembler directive in factorial.asm. Without that, it's still in the symbol table, but the linker won't consider it for linking between objects.
A label like factorial: is halfway between a global/external symbol and a local label like .loop1:, which would not be present in the object file at all. Local labels are a good way to get less messy disassembly, with one block per function instead of a separate block starting after every branch target.
Non-global symbols are only useful for disassembly and stuff like that, AFAIK. I think they would get stripped, along with debug information, by strip.
Also, note that imul rax, rdi runs faster, because it doesn't have to store the high half of the result in %rdx, or even calculate it.
Also note that you can objdump -Mintel -d to get intel-syntax disassembly. Agner Fog's objconv is also very nice, but it's more typing because the output doesn't go to stdout by default. (Although a shell wrapper function or script can solve that.)
Anyway, this would be better:
global factorial
factorial:
mov eax, 1 ; depending on the assembler, might save a REX prefix
; early-out branch after setting rax, instead of duplicating the constant
test rdi, rdi ; test is shorter than compare-against-zero
jz .early_out
.loop: ; local label won't appear in the object file
imul rax, rdi
dec rdi
jnz .loop
.early_out:
ret
Why does main push/pop rcx? If you're writing functions that follow the standard ABI (definitely a good idea unless there's a large performance gain), and you want something to survive a call, keep it in a call-preserved register like rbx.
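For reference, the control flow of the rewritten factorial loop corresponds roughly to the C below. It is only a sketch to make the early-out and the loop shape easier to follow; factorial_u64 is a made-up name, and like the assembly it silently wraps on overflow.

```c
#include <stdint.h>

/* Hypothetical C rendering of the assembly loop above. */
uint64_t factorial_u64(uint64_t n)
{
    uint64_t result = 1;      /* mov eax, 1 */
    if (n == 0)               /* test rdi, rdi / jz .early_out */
        return result;
    do {
        result *= n;          /* imul rax, rdi */
    } while (--n != 0);       /* dec rdi / jnz .loop */
    return result;
}
```

Setting the result before the zero check is what lets both paths share a single ret without duplicating the constant.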
.text
.globl _start
_start:
pushq %rbp
movq %rsp,%rbp
movq $2, %rax
leaveq
retq
I'm compiling with -nostdlib:
[root# test]# gcc -nostdlib -Wall minimal.S &&./a.out
Segmentation fault
What's wrong here?
BTW, is it possible to give the entry point a name other than main or _start?
As @jaquadro mentions, you can specify the entry point on the command line to the linker (or use a linker script): gcc -Wall -Wextra -nostdlib -Wl,-eMyEntry minimal.S && ./a.out
The reason your program segfaults is that, since you're not using the standard library, there is nowhere to return to (retq). Instead, call exit using the correct syscall. In this case it is number 60, which is put into rax; the first (and only) parameter is put into rdi.
Example:
.text
.globl MyEntry
MyEntry:
# Use Syscall 60 (exit) to exit with error code 42
movq $60, %rax
movq $42, %rdi
syscall
Related question on how to perform syscalls on x86_64
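If you want to confirm from C that system call 60 with 42 in the first argument really ends the process with status 42, something like the following should work on Linux with glibc. It is a sketch, not part of the original answer: exit_status_of_child is a hypothetical helper that forks, has the child make the raw exit call, and reports the status the parent observes.

```c
#define _GNU_SOURCE        /* exposes syscall() in <unistd.h> on glibc */
#include <sys/syscall.h>   /* SYS_exit */
#include <sys/types.h>     /* pid_t */
#include <sys/wait.h>      /* waitpid(), WIFEXITED, WEXITSTATUS */
#include <unistd.h>        /* fork(), syscall() */

/* Hypothetical helper: returns the child's exit status, or -1 on error. */
int exit_status_of_child(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */
    if (pid == 0)
        syscall(SYS_exit, code);   /* child: raw exit, does not return */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling exit_status_of_child(42) should return 42, the same value echo $? would print after running the assembly version.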
You can set the entry point by passing an option to the linker
http://sca.uwaterloo.ca/coldfire/gcc-doc/docs/ld_24.html
To do this with gcc, you would do something like...
gcc all_my_other_gcc_commands -Wl,-e,start_symbol
main is different: it is not the entry point to your compiled application, although it is the function that will be called from the entry point. The entry point itself, if you're compiling C or C++ code, is defined in something like Start.S deep in the source tree of glibc, and is platform-dependent. If you're programming straight assembly, I don't know what actually goes on.
I have been trying to get a better idea of what happens under the hood by using the compiler to generate the assembly programs of various C programs at different optimization levels. There is something that has been bothering me for a while.
When I compile t.c as follows,
gcc -S t.c
I get the assembly in AT&T syntax as follows.
function:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
addl 8(%ebp), %eax
popl %ebp
ret
.size function, .-function
When I compile using the -masm argument as follows:
gcc -S t.c -masm=intel
I get the following output.
function:
push %ebp
mov %ebp, %esp
mov %eax, DWORD PTR [%ebp+12]
add %eax, DWORD PTR [%ebp+8]
pop %ebp
ret
.size function, .-function
The syntax changes, but there are still "%" prefixes on the register names (which is why I don't like AT&T syntax in the first place).
Can someone shed some light on why this is happening? How do I solve this issue?
The GNU assembler (gas) does have a separate option for controlling the % prefix. Documentation seems to suggest GCC doesn't have such an option, but my GCC (version Debian 4.3.2-1.1) doesn't produce the % prefix.