GCC generates useless instructions? - gcc

BOOL32 doStuff() {
return TRUE;
}
gcc 2.95 for vxworks 5.x, compiling the above code with -O0 for 32-bit x86 generated following code:
doStuff:
0e9de190: push %ebp
0e9de191: mov %esp,%ebp
308 return TRUE;
0e9de193: mov $0x1,%eax
0e9de198: jmp 0xe9de1a0 <doStuff+16>
312 {
0e9de19a: lea 0x0(%esi),%esi
// The JMP jumps here
0e9de1a0: mov %ebp,%esp
0e9de1a2: pop %ebp
0e9de1a3: ret
Everything looks normal until the JMP and LEA instruction. What are they for?
My guess is that it is some kind of alignment, but I am not sure about this.
I would have done something like this:
doStuff:
0e9de190: push %ebp
0e9de191: mov %esp,%ebp
308 return TRUE;
0e9de193: mov $0x1,%eax
0e9de1XX: mov %ebp,%esp
0e9de1XX: pop %ebp
0e9de1XX: ret
0e9de1XX: fill with lea 0x0, %esi

lea 0x0(%esi),%esi is a long NOP, and the jmp is jumping over it. You probably have an ancient version of binutils (containing as) to go with your ancient gcc version.
So when gcc put a .p2align to align a label in the middle of the function that isn't otherwise a branch target (for some bizarre reason, but it's -O0 so it's not even supposed to be good code), the assembler made a long NOP and jumped over it.
Normally you'd only jump over a block of NOPs if there were a lot of them, especially if they were all single-byte NOPs. This is really dumb code, so stop using such crusty tools. You could try upgrading your assembler (but still using gcc2.95 if you need to). Or check that it doesn't happen at -O2 or -O3, in which case it doesn't matter.
If you have to keep using gcc2.95 for some reason, then just be aware that it's ancient, and this is part of the tradeoff you're making to keep using whatever it is that's forcing you to use it.

Related

Implementing the "." word from Forth in x86 assembly

I am trying to make a function that, prints a number out on screen. Eventually, I'll make it able to take the top stack item, print it, and then pop it (like the "." word in Forth). But for now, I am trying to keep it simple. I think that I need to align the call stack in some way - and I figured that pushing and popping an arbitrary register before and after calling printf (rbx) would do the trick - but I am still getting a segmentation fault. A backtrace in GDB hasn't helped me make any progress either. Does anyone know why this code is causing a segmentation fault, and how to fix it?
How I am assembling (GAS):
gcc -masm=intel
.data
format_num: .ascii "%d\0"
.text
.global _main
.extern _printf
print_num:
push rbx
lea rdi, format_num[RIP]
mov esi, 250
xor eax, eax
call _printf
pop rbx
ret
_main:
call print_num
mov rdi, 0
mov rax, 0x2000001
syscall

Porting JonesForth to macOS v10.15 (Catalina)

I'm trying to make JonesForth run on a recent MacBook out of the box, just using Mac tools.
I started to convert everything 64 bits and attend to the Mac assembler syntax.
I got things to assemble, but I immediately run into a curious segmentation fault:
/* NEXT macro. */
.macro NEXT
lodsq
jmpq *(%rax)
.endm
...
/* Assembler entry point. */
.text
.globl start
.balign 16
start:
cld
mov %rsp,var_SZ(%rip) // Save the initial data stack pointer in FORTH variable S0.
mov return_stack_top(%rip),%rbp // Initialise the return stack.
//call set_up_data_segment
mov cold_start(%rip),%rsi // Initialise interpreter.
NEXT // Run interpreter!
.const
cold_start: // High-level code without a codeword.
.quad QUIT
QUIT is defined like this via macro defword:
.macro defword
.const_data
.balign 8
.globl name_$3
name_$3 :
.quad $4 // Link
.byte $2+$1 // Flags + length byte
.ascii $0 // The name
.balign 8 // Padding to next four-byte boundary
.globl $3
$3 :
.quad DOCOL // Codeword - the interpreter
// list of word pointers follow
.endm
// QUIT must not return (ie. must not call EXIT).
defword "QUIT",4,,QUIT,name_TELL
.quad RZ,RSPSTORE // R0 RSP!, clear the return stack
.quad INTERPRET // Interpret the next word
.quad BRANCH,-16 // And loop (indefinitely)
...more code
When I run this, I get a segmentation fault the first time in the NEXT macro:
(lldb) run
There is a running process, kill it and restart?: [Y/n] y
Process 83000 exited with status = 9 (0x00000009)
Process 83042 launched: '/Users/klapauciusisgreat/jonesforth64/jonesforth' (x86_64)
Process 83042 stopped
* thread #1, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x0000000100000698 jonesforth`start + 24
jonesforth`start:
-> 0x100000698 <+24>: jmpq *(%rax)
0x10000069a <+26>: nopw (%rax,%rax)
jonesforth`code_DROP:
0x1000006a0 <+0>: popq %rax
0x1000006a1 <+1>: lodsq (%rsi), %rax
Target 0: (jonesforth) stopped.
rax does point to what I think is the dereferenced address, DOCOL:
(lldb) register read
General Purpose Registers:
rax = 0x0000000100000660 jonesforth`DOCOL
So one mystery is:
Why does RAX point to DOCOL instead of QUIT? My guess is that the instruction was halfway executed and the result of the indirection was stored in rax. What are some good pointers to documentation?
Why the segmentation fault?
I commented out the original segment setup code in the original that called brk to set up a data segment. Another [implementation] also did not call it at all, so I thought I could as well ignore this. Is there any magic on how to set up segment permissions with syscalls in a 64-bit binary on Catalina? The make command is pretty much the standard JonesForth one:
jonesforth: jonesforth.S
gcc -nostdlib -g -static $(BUILD_ID_NONE) -o $# $<
P.S.: Yes, I can get JonesForth to work perfectly in Docker images, but that's besides the point. I really want it to work in 64 bit on Catalina, out of the box.
The original code had something like
mov $cold_start,%rsi
And the Apple assembler complains about not being able to use 32 immediate addressing in 64-bit binaries.
So I tried
mov $cold_start(%rip),%rsi
but that also doesn't work.
So I tried
mov cold_start(%rip),%rsi
which assembles, but of course it dereferences cold start, which is not something I need.
The correct way of doing this is apparently
lea cold_start(%rip),%rsi
This seems to work as intended.

Gdb: gcc -O0 assmbly exmple

Hopefully not a silly question.
Compiled without specifying optimization: gcc test.c -o test (which seem to chose -O0).
gcc -O2 or -O3 output much cleaner (at least it seems to me) assembly code than -O0.
What's the reason for -O0, how does it help us, I can't see that it's simpler than -O1 or -O2.
...
int sum(int x, int y)
{
int sum = x + y;
return sum;
}
...
0x00000000004004ed <+0>: push %rbp
0x00000000004004ee <+1>: mov %rsp,%rbp
0x00000000004004f1 <+4>: mov %edi,-0x14(%rbp)
0x00000000004004f4 <+7>: mov %esi,-0x18(%rbp)
0x00000000004004f7 <+10>: mov -0x18(%rbp),%eax
0x00000000004004fa <+13>: mov -0x14(%rbp),%edx
0x00000000004004fd <+16>: add %edx,%eax
0x00000000004004ff <+18>: mov %eax,-0x4(%rbp)
0x0000000000400502 <+21>: mov -0x4(%rbp),%eax
0x0000000000400505 <+24>: pop %rbp
0x0000000000400506 <+25>: retq
With optimizations turned off, there is a 1:1 representation between source code and machine code, allowing for easier debugging. With optimizations turned on, the compiler can do strange things like rearranging code or getting rid of variables that make debugging the code much harder.
Compiling with -O0 is also typically faster as the optimizer is usually the slowest component of every modern compiler.

What is the correct constant for the exit system call?

I am trying to learn x86_64 assembly, and am using GCC as my assembler. The exact command I'm using is:
gcc -nostdlib tapydn.S -D__ASSEMBLY__
I'm mainly using gcc for its preprocessor. Here is tapydn.S:
.global _start
#include <asm-generic/unistd.h>
syscall=0x80
.text
_start:
movl $__NR_exit, %eax
movl $0x00, %ebx
int $syscall
This results in a segmentation fault. I believe the problem is with the following line:
movl $__NR_exit, %eax
I used __NR_exit because it was more descriptive than some magic number. However, it appears that my usage of it is incorrect. I believe this to be the case because when I change the line in question to the following, it runs fine:
movl $0x01, %eax
Further backing up this trail of thought is the contents of usr/include/asm-generic/unistd.h:
#define __NR_exit 93
__SYSCALL(__NR_exit, sys_exit)
I expected the value of __NR_exit to be 1, not 93! Clearly I am misunderstanding its purpose and consequently its usage. For all I know, I'm getting lucky with the $0x01 case working (much like undefined behaviour in C++), so I kept digging...
Next, I looked for the definition of sys_exit. I couldn't find it. I tried using it anyway as follows (with and without the preceeding $):
movl $sys_exit, %eax
This wouldn't link:
/tmp/cc7tEUtC.o: In function `_start':
(.text+0x1): undefined reference to `sys_exit'
collect2: error: ld returned 1 exit status
My guess is that it's a symbol in one of the system libraries and I'm not linking it due to my passing -nostdlib to GCC. I'd like to avoid linking such a large library for just one symbol if possible.
In response to Jester's comment about mixing 32 and 64 bit constants, I tried using the value 0x3C as suggested:
movq $0x3C, %eax
movq $0x00, %ebx
This also resulting a segmentation fault. I also tried swapping out eax and ebx for rax and rbx:
movq $0x3C, %rax
movq $0x00, %rbx
The segmentation fault remained.
Jester then commented stating that I should be using syscall rather than int $0x80:
.global _start
#include <asm-generic/unistd.h>
.text
_start:
movq $0x3C, %rax
movq $0x00, %rbx
syscall
This works, but I was later informed that I should be using rdi instead of rbx as per the System V AMD64 ABI:
movq $0x00, %rdi
This also works fine, but still ends up using the magic number 0x3C for the system call number.
Wrapping up, my questions are as follows:
What is the correct usage of __NR_exit?
What should I be using instead of a magic number for the exit system call?
The correct header file to get the system call numbers is sys/syscall.h. The constants are called SYS_### where ### is the name of the system call you are interested in. The __NR_### macros are implementation details and should not be used. As a rule of thumb, if an identifier begins with an underscore it should not be used, if it begins with two it should definitely not be used. The arguments go into rdi, rsi, rdx, r10, r8, and r9. Here is a sample program for Linux:
#include <sys/syscall.h>
.globl _start
_start:
mov $SYS_exit,%eax
xor %edi,%edi
syscall
These conventions are mostly portable to other UNIX-like operating systems.

gcc gives weird Intel syntax

I have been trying to get a better idea of what happens under the hood by using the compiler to generate the assembly programs of various C programs at different optimization levels. There is something that has been bothering me for a while.
When I compile t.c as follows,
gcc -S t.c
I get the assembly in AT&T syntax as follows.
function:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
addl 8(%ebp), %eax
popl %ebp
ret
.size function, .-function
When I compile using the masm argument as follows:-
gcc -S t.c -masm=intel
I get the following output.
function:
push %ebp
mov %ebp, %esp
mov %eax, DWORD PTR [%ebp+12]
add %eax, DWORD PTR [%ebp+8]
pop %ebp
ret
.size function, .-function
There is a change in syntax but there are still "%"s before the notation of registers(this is why I don't prefer AT&T syntax in the first place).
Can someone shed some light on why this is happening? How do I solve this issue?
The GNU assembler (gas) does have a separate option for controlling the % prefix. Documentation seems to suggest GCC doesn't have such an option, but my GCC (version Debian 4.3.2-1.1) doesn't produce the % prefix.

Resources