How to compile assembly whose entry point is not main with gcc?


.text
.globl _start
_start:
pushq %rbp
movq %rsp,%rbp
movq $2, %rax
leaveq
retq
I'm compiling with -nostdlib:
[root# test]# gcc -nostdlib -Wall minimal.S &&./a.out
Segmentation fault
What's wrong here?
BTW, is it possible to name the entry point something other than main or _start?

As @jaquadro mentions, you can specify the entry point on the linker command line (or use a linker script): gcc -Wall -Wextra -nostdlib -Wl,-eMyEntry minimal.S && ./a.out
Your program segfaults because, without the standard library's startup code, there is nowhere to return to (retq). Instead, exit with the appropriate system call: here that is exit, number 60, which goes into rax, and its first (and only) parameter goes into rdi.
Example:
.text
.globl MyEntry
MyEntry:
# Use Syscall 60 (exit) to exit with error code 42
movq $60, %rax
movq $42, %rdi
syscall
Related question on how to perform syscalls on x86_64
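For comparison, here is a rough C sketch of the same idea, assuming Linux with glibc's syscall() wrapper. The helper name exit_status_via_syscall is made up for illustration. It forks a child that leaves through the raw exit system call, so the parent can observe the status without terminating itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_exit */
#include <sys/types.h>     /* pid_t */
#include <sys/wait.h>      /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>        /* fork, syscall, _exit */

/* Hypothetical helper: fork a child that terminates through the raw exit
 * system call and return the status the parent observes, or -1 on failure. */
int exit_status_via_syscall(int code)
{
    assert(code >= 0 && code <= 255);   /* wait() reports only the low 8 bits */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: on x86-64 Linux this corresponds to loading the exit
         * number into rax, the status into rdi, and executing syscall. */
        syscall(SYS_exit, code);
        _exit(127);                     /* not reached */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling exit_status_via_syscall(42) should hand back 42, the same status the assembly example above passes in rdi.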

You can set the entry point by passing an option to the linker
http://sca.uwaterloo.ca/coldfire/gcc-doc/docs/ld_24.html
To do this with gcc, you would do something like...
gcc all_my_other_gcc_commands -Wl,-e,start_symbol
main is different, it is not the entry point to your compiled application, although it is the function that will be called from the entry point. The entry point itself, if you're compiling C or C++ code, is defined in something like Start.S deep in the source tree of glibc, and is platform-dependent. If you're programming straight assembly, I don't know what actually goes on.

Related

How to configure gcc to use -no-pie by default?

I want to compile the following program on Linux:
.global _start
.text
_start:
mov $1, %rax
mov $1, %rdi
mov $msg, %rsi
mov $13, %rdx
syscall
mov $60, %rax
xor %rdi, %rdi
syscall
msg:
.ascii "Hello World!\n"
However, it gives me the following linker error:
$ gcc -nostdlib hello.s
/usr/bin/ld: /tmp/ccMNQrOF.o: relocation R_X86_64_32S against `.text' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
I figured that the reason it doesn't work is because gcc is using -pie to generate a shared object by default. Thus, using -no-pie fixes it:
$ gcc -no-pie -nostdlib hello.s
$ ./a.out
Hello World!
How do I configure gcc to use -no-pie by default? I'm using Arch Linux.

I guess just don't configure gcc with --enable-default-pie.
See this blog post: http://nanxiao.me/en/gccs-enable-enable-default-pie-option-make-you-stuck-at-relocation-r_x86_64_32s-against-error/, and Arch patch that enabled pie by default: https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/gcc&id=5936710c764016ce306f9cb975056e5b7605a65b.

To change the default you have to recompile gcc with default PIE disabled. Otherwise you will need -no-pie each time you compile position-dependent assembly code.
However, just for the example you provided, a better way is to use relative addressing like label_name(%rip).
Relative addressing allows PIE to function properly.
I modified yours into this: (see the leaq line)
.global _start
.text
_start:
movq $1, %rax
movq $1, %rdi
leaq msg(%rip), %rsi
movq $13, %rdx
syscall
movq $60, %rax
xorq %rdi, %rdi
syscall
.section .rodata
msg:
.ascii "Hello World!\n"
(I added .section .rodata only because read-only data like this usually belongs in the rodata section. Your version works fine, but the output of objdump -d then shows meaningless instructions disassembled from the bytes at the msg label.)

Fix relocations for global variables in position-independent executables with GCC

I'm looking for a gcc command-line flag or other settings to produce GOTOFF relocations rather than GOT relocations for my statically linked, position-independent i386 executable. More details on what I was trying below.
My source file g1.c looks like this:
extern int answer;
int get_answer1() { return answer; }
My other source file g2.c looks like this:
extern int answer;
int get_answer2() { return answer; }
I compile them with gcc -m32 -fPIE -Os -static -S -ffreestanding -fomit-frame-pointer -fno-unwind-tables -fno-asynchronous-unwind-tables g1.c for i386.
I get the following assembly output:
.file "g1.c"
.text
.globl get_answer1
.type get_answer1, @function
get_answer1:
call __x86.get_pc_thunk.cx
addl $_GLOBAL_OFFSET_TABLE_, %ecx
movl answer@GOT(%ecx), %eax
movl (%eax), %eax
ret
.size get_answer1, .-get_answer1
.section .text.__x86.get_pc_thunk.cx,"axG",@progbits,__x86.get_pc_thunk.cx,comdat
.globl __x86.get_pc_thunk.cx
.hidden __x86.get_pc_thunk.cx
.type __x86.get_pc_thunk.cx, @function
__x86.get_pc_thunk.cx:
movl (%esp), %ecx
ret
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",@progbits
Here is how to reproduce this behavior online with GCC 7.2: https://godbolt.org/g/XXkxJh
Instead of GOT above, I'd like to get GOTOFF, and the movl (%eax), %eax should disappear, so the assembly code for the function should look like this:
get_answer1:
call __x86.get_pc_thunk.cx
addl $_GLOBAL_OFFSET_TABLE_, %ecx
movl answer@GOTOFF(%ecx), %eax
ret
I have verified that this GOTOFF assembly version is what works, and the GOT version doesn't work (because it has an extra pointer indirection).
How can I convince gcc to generate the GOTOFF version? I've tried various combinations of -fPIC, -fpic, -fPIE, -fpie, -pie, -fno-plt. None of them worked, all of them made gcc produce the GOT version.
I couldn't find any i386-specific flag on https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html or any generic flag here: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html
In fact, I'm getting GOTOFF relocations for "..." string literals, and I also want to get them for extern variables.
The final output is a statically linked executable in a custom binary format (for which I've written a GNU ld linker script). There is no dynamic linking and no shared libraries. The address randomization is performed by a custom loader, which is free to load the executable to any address. So I do need position-independent code. There is no per-segment memory mapping: the entire executable is loaded as is, contiguously.
All the documentation I've been able to find online talks about position-independent executables which are dynamically linked, and I wasn't able to find anything useful there.

I wasn't able to solve this with gcc -fPIE, so I solved it manually, by processing the output file.
I use gcc -Wl,-q, with an output ELF executable file containing the relocations. I post-process this ELF executable file, and I add the following assembly instructions to the beginning:
call next
next:
pop ebx
add [ebx + R0 + (after_add - next)], ebx
add [ebx + R1 + (after_add - next)], ebx
add [ebx + R2 + (after_add - next)], ebx
...
after_add:
where R0, R1, R2 ... are the addresses of R_386_32 relocations in the ELF executable. Then I use `objcopy -O binary prog.elf prog.bin', and now `prog.bin' contains position-independent code, because it starts with the `add [ebx + ...], ebx' instructions, which apply the necessary relocations to the code when it starts running.
Depending on the execution environment, the gcc flag -Wl,-N is needed, to make the .text section writable (the `add [ebx + ...], ebx' instructions need that).
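To make the fix-up step concrete, here is a simplified C model of what the startup `add' instructions do: each 32-bit absolute address recorded by an R_386_32 relocation gets a run-time delta added to it. This is only a sketch under assumptions: the function name, the `delta' parameter (standing in for whatever the stub holds in ebx) and the flat-image layout are made up for illustration and are not the post-processing tool described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the self-relocation stub: add `delta` to every
 * 32-bit word whose offset into `image` is listed in `offsets`. */
void apply_abs32_relocs(uint8_t *image, size_t image_size,
                        const uint32_t *offsets, size_t count,
                        uint32_t delta)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t word;

        /* Each relocated word must lie entirely inside the image. */
        assert((size_t)offsets[i] + sizeof word <= image_size);

        /* memcpy because relocation targets may be unaligned. */
        memcpy(&word, image + offsets[i], sizeof word);
        word += delta;              /* link-time address -> run-time address */
        memcpy(image + offsets[i], &word, sizeof word);
    }
}
```

A word holding 0x1000 that is listed in `offsets' becomes 0x401000 after a call with delta 0x400000. Because the image modifies itself, it has to be writable, which matches the -Wl,-N remark above.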

What is the correct constant for the exit system call?

I am trying to learn x86_64 assembly, and am using GCC as my assembler. The exact command I'm using is:
gcc -nostdlib tapydn.S -D__ASSEMBLY__
I'm mainly using gcc for its preprocessor. Here is tapydn.S:
.global _start
#include <asm-generic/unistd.h>
syscall=0x80
.text
_start:
movl $__NR_exit, %eax
movl $0x00, %ebx
int $syscall
This results in a segmentation fault. I believe the problem is with the following line:
movl $__NR_exit, %eax
I used __NR_exit because it was more descriptive than some magic number. However, it appears that my usage of it is incorrect. I believe this to be the case because when I change the line in question to the following, it runs fine:
movl $0x01, %eax
Further backing up this train of thought are the contents of /usr/include/asm-generic/unistd.h:
#define __NR_exit 93
__SYSCALL(__NR_exit, sys_exit)
I expected the value of __NR_exit to be 1, not 93! Clearly I am misunderstanding its purpose and consequently its usage. For all I know, I'm getting lucky with the $0x01 case working (much like undefined behaviour in C++), so I kept digging...
Next, I looked for the definition of sys_exit. I couldn't find it. I tried using it anyway as follows (with and without the preceding $):
movl $sys_exit, %eax
This wouldn't link:
/tmp/cc7tEUtC.o: In function `_start':
(.text+0x1): undefined reference to `sys_exit'
collect2: error: ld returned 1 exit status
My guess is that it's a symbol in one of the system libraries and I'm not linking it due to my passing -nostdlib to GCC. I'd like to avoid linking such a large library for just one symbol if possible.
In response to Jester's comment about mixing 32 and 64 bit constants, I tried using the value 0x3C as suggested:
movq $0x3C, %eax
movq $0x00, %ebx
This also resulted in a segmentation fault. I also tried swapping out eax and ebx for rax and rbx:
movq $0x3C, %rax
movq $0x00, %rbx
The segmentation fault remained.
Jester then commented stating that I should be using syscall rather than int $0x80:
.global _start
#include <asm-generic/unistd.h>
.text
_start:
movq $0x3C, %rax
movq $0x00, %rbx
syscall
This works, but I was later informed that I should be using rdi instead of rbx as per the System V AMD64 ABI:
movq $0x00, %rdi
This also works fine, but still ends up using the magic number 0x3C for the system call number.
Wrapping up, my questions are as follows:
What is the correct usage of __NR_exit?
What should I be using instead of a magic number for the exit system call?

The correct header file for the system call numbers is sys/syscall.h. The constants are called SYS_###, where ### is the name of the system call you are interested in. The __NR_### macros are implementation details and should not be used. As a rule of thumb, an identifier that begins with an underscore should not be used, and one that begins with two should definitely not be used. The system call number goes into rax and the arguments go into rdi, rsi, rdx, r10, r8, and r9. Here is a sample program for Linux:
#include <sys/syscall.h>
.globl _start
_start:
mov $SYS_exit,%eax
xor %edi,%edi
syscall
These conventions are mostly portable to other UNIX-like operating systems.
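The same SYS_### constants can be used from C through the syscall() wrapper, which shows how the argument order lines up with the registers. This is a minimal sketch assuming Linux with glibc. The wrapper name raw_write is made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall */

/* Hypothetical thin wrapper around the write system call. On x86-64 Linux
 * the three arguments end up in rdi, rsi and rdx, in that order. */
long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    /* Returns the number of bytes written, or -1 with errno set. */
    return syscall(SYS_write, fd, buf, len);
}
```

For example, raw_write(1, "Hello World!\n", 13) should write the greeting to standard output without a magic number appearing anywhere in the source.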

PIE disabled. Absolute addressing when asm programming with gcc on mac OS X

I wrote the code below. I want to compile it using gcc on mac OS X,
but I get a message saying "PIE disabled. Absolute addressing" when I run gcc.
I googled it, but cannot find a solution.
Please advise.
hello.s file:
.data
hello: .string "Hello World!\n"
.text
.globl _main
_main:
push %rbp
mov %rsp, %rbp
movabs $hello, %rdi
call _printf
leave
ret
The error:
ld: warning: PIE disabled. Absolute addressing (perhaps -mdynamic-no-pic) not allowed in
code signed PIE, but used in _main from /var/folders/xs/4z9kr_n93111fhv9_j1dd9gw0000gn/T/ex2_64-369300.o.
To fix this warning, don't compile with -mdynamic-no-pic or link with -Wl,-no_pie

Looks like there are a couple solutions:
Link with -Wl,-no_pie:
clang -o hello hello.s -Wl,-no_pie
Don't use absolute addressing.
.data
hello: .string "Hello World!\n"
.text
.globl _main
_main:
push %rbp
mov %rsp, %rbp
lea hello(%rip), %rdi
mov $0, %rax
call _printf
leave
ret
Then you can compile and run:
host % clang -o hello hello.s
host % ./hello
Hello World!
The bit about zeroing out al is mentioned in section 3.5.7 of System V Application Binary Interface. Here's the relevant excerpt:
When a function taking variable-arguments is called, %al must be set
to the total number of floating point parameters passed to the
function in vector registers.
In your case this is zero. You are passing in zero floating point parameters.

ELF Shared Object in x86-64 Assembly language

I'm trying to create a shared library (*.so) in assembly and I'm not sure I'm doing it correctly...
My code is:
.section .data
.globl var1
var1:
.quad 0x012345
.section .text
.globl func1
func1:
xor %rax, %rax
# mov var1, %rcx # this is commented
ret
To compile it I run
gcc -c ker.s -g -fPIC -m64 -o ker.o
gcc ker.o -shared -fPIC -m64 -o libker.so
I can access variable var1 and call func1 with dlopen() and dlsym() from a program in C.
The problem is with variable var1. When I try to access it from func1, i.e. uncomment that line, the linker reports an error:
/usr/bin/ld: ker.o: relocation R_X86_64_32S against `var1' can not be used when making a shared object; recompile with -fPIC
ker.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
I don't understand. I've already compiled with -fPIC, so what's wrong?

I've already compiled with -fPIC, so what's wrong?
That part of the error message is for people who are linking compiler-generated code.
You're writing asm by hand, so as datenwolf correctly wrote, when writing a shared library in assembly, you have to take care for yourself that the code is position independent.
This means the file must not contain any 32-bit absolute addresses (because relocation to an arbitrary 64-bit base is impossible). 64-bit absolute relocations are supported, but normally you should only use them for jump tables.
mov var1, %rcx uses a 32-bit absolute addressing mode. You should normally never do this, even in position-dependent x86-64 code. The normal use-cases for 32-bit absolute addresses are: putting an address into a 64-bit register with mov $var1, %edi (zero-extends into RDI)
and indexing static arrays: mov arr(,%rdx,4), %edx
mov var1(%rip), %rcx uses a RIP-relative 32-bit offset. It's the efficient way to address static data, and compilers always use this even without -fPIE or -fPIC for static/global variables.
You have basically two possibilities:
Normal library-private static data, like C compilers will make for __attribute__((visibility("hidden"))) long var1;, same as for -fno-PIC.
.data
.globl var1 # linkable from other .o files in the same shared object / library
.hidden var1 # not visible for *dynamic* linking outside the library
var1:
.quad 0x012345
.text
.globl func1
func1:
xor %eax, %eax # return 0
mov var1(%rip), %rcx
ret
Full symbol-interposition-aware code like compilers generate for -fPIC.
You have to use the Global Offset Table. This is how a compiler does it if you tell it to produce code for a shared library.
Note that this comes with a performance hit because of the additional indirection.
See Sorry state of dynamic libraries on Linux for more about symbol-interposition and the overheads it imposes on code-gen for shared libraries if you're not careful about restricting symbol visibility to allow inlining.
var1@GOTPCREL is the address of a pointer to your var1. The pointer itself is reachable with RIP-relative addressing, while its content (the address of var1) is filled in by the dynamic linker when the library is loaded. This supports the case where the program using your library defines var1, so var1 in your library should resolve to that memory location instead of the one in the .data or .bss (or .text) of your .so.
.section .data
.globl var1
# without .hidden
var1:
.quad 0x012345
.section .text
.globl func1
func1:
xor %eax, %eax
mov var1@GOTPCREL(%rip), %rcx
mov (%rcx), %rcx
ret
See some additional information at http://www.bottomupcs.com/global_offset_tables.html
An example on the Godbolt compiler explorer of -fPIC vs. -fPIE shows the difference that symbol-interposition makes for getting the address of non-hidden global variables:
movl $x, %eax 5 bytes, -fno-pie
leaq x(%rip), %rax 7 bytes, -fPIE and hidden globals or static with -fPIC
movq y@GOTPCREL(%rip), %rax 7 bytes and a load instead of just ALU, -fPIC with non-hidden globals.
Actually loading always uses x(%rip), except for non-hidden / non-static vars with -fPIC where it has to get the runtime address from the GOT first, because it's not a link-time constant offset relative to the code.
Related: 32-bit absolute addresses no longer allowed in x86-64 Linux? (PIE executables).
A previous version of this answer stated that the DATA and BSS segments could move relative to TEXT when loading a dynamic library. This is incorrect, only the library base address is relocatable. RIP-relative access to other segments within the same library is guaranteed to be ok, and compilers emit code that does this. The ELF headers specify how the segments (which contain the sections) need to be loaded/mapped into memory.
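As a way to see both cases from C, here is a small sketch assuming GCC or Clang targeting ELF. The second variable (var2), the function names and the check function are made up for illustration. Compiling it with something like gcc -O2 -fPIC -S should show a plain RIP-relative load for the hidden variable and a GOT load followed by a dereference for the exported one.

```c
#include <assert.h>

/* Library-private data: the C counterpart of `.globl var1` + `.hidden var1`. */
__attribute__((visibility("hidden"))) long var1 = 0x012345;

/* Exported, interposable data: like `.globl` without `.hidden`.
 * `var2` is a hypothetical second variable added for comparison. */
long var2 = 0x6789;

/* Expected to compile to a single RIP-relative load under -fPIC. */
long read_hidden(void)
{
    return var1;
}

/* Expected to fetch the address from the GOT first under -fPIC. */
long read_exported(void)
{
    return var2;
}

/* Sanity check: both access paths see the initial values. */
void check_initial_values(void)
{
    assert(read_hidden() == 0x012345);
    assert(read_exported() == 0x6789);
}
```

Both functions return the same kind of value; the only difference is the extra indirection the compiler must emit when another module is allowed to override the symbol.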

I don't understand. I've already compiled with -fPIC, so what's wrong?
-fPIC is a flag concerning the creation of machine code from non-machine code, i.e. which operations to use, in the compilation stage. Assembly is not compiled, though! Each assembly mnemonic maps directly to a machine instruction; your code is merely transcribed into a slightly different format.
Since you're writing it in assembly, your assembly code must be position independent to be linkable into a shared library. -fPIC has no effect in your case, because it only affects code generation.

OK, I think I found something...
The first solution from drhirsch gives almost the same error, but the relocation type changes. The type always ends with 32. Why is that? Why does a 64-bit program use 32-bit relocations?
I found this from googling: http://www.technovelty.org/code/c/relocation-truncated.html
It says:
For code optimisation purposes, the default immediate size to the mov
instructions is a 32-bit value
So that's the cause. My program is 64-bit but the relocation is 32-bit, and all I need is to force it to be 64-bit with the movabs instruction.
This code is assembling and working (access to var1 from internal function func1 and from external C program via dlsym()):
.section .data
.globl var1
var1:
.quad 0x012345
.section .text
.globl func1
func1:
movabs var1, %rax # if one is symbol, other must be %rax
inc %rax
movabs %rax, var1
ret
But I'm in doubt about the Global Offset Table. Must I use it, or is this "direct" access absolutely correct?
