What are .seh_* assembly commands that gcc outputs? - gcc

I use gcc -S for a hello world program. What are the 5 .seh_ commands? I can't seem to find much info at all about them when I search.
.file "hi.c"
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "Hello World\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
call __main
leaq .LC0(%rip), %rcx
call puts
movl $0, %eax
addq $32, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (rubenvb-4.8.0) 4.8.0"
.def puts; .scl 2; .type 32; .endef

These are gas's implementation of MASM's frame handling pseudos for generating an executable's .pdata and .xdata sections (structured exception handling stuff). Also check out Raw Pseudo Operations. Apparently if your code might be in the stack during an SEH unwind operation, you are expected to use these.
I found slightly more information at https://sourceware.org/ml/binutils/2009-08/msg00193.html. This thread seems to be the original check-in to gas that added support for all the .seh_* pseudo-ops.
I would like to show the patch for .pdata and .xdata generation of
pe-coff targets via gas, and to get some feed-back. This patch
includes support for arm, ppc, arm, sh (3&4), mips, and x64. As for
x86 there is no OS support for runtime function information, I spared
this part. It would just increase executable size for x86 PE and there
is no real gain for this target.
Short overview:
There are at the moment three different function entry formats preset.
The first is the MIPS one. The second version is for ARM, PPC, SH3,
and SH4 mainly for Windows CE. The third is the IA64 and x64 version.
Note, the IA64 isn't implemented yet, but to find information about
it, please see specification about IA64 on
http://download.intel.com/design/Itanium/Downloads/245358.pdf file.
The first version has just entries in the pdata section: BeginAddress,
EndAddress, ExceptionHandler, HandlerData, and PrologueEndAddress.
Each value is a pointer to the corresponding data and has size of 4
bytes.
The second variant has the following entries in the pdata section.
BeginAddress, PrologueLength (8 bits), EndAddress (22 bits),
Use-32-bit-instruction (1 bit), and Exception-Handler-Exists (1 bit).
If the FunctionLength is zero, or the Exception-Handler-Exists bit is
true, a DATA_EH block is placed directly before function entry.
The third version has a function entry block of BeginAddress (RVA),
EndAddress (RVA), and UnwindData (RVA). The description of the
prologue, excepetion-handler, and additional SEH data is stored within
the UNWIND_DATA field in the xdata section.
.seh_proc <fct_name>
This specifies, that a SEH block begins for the function <fct_name>. This is valid for all
targets.
.seh_endprologue
By this pseudo the location of the prologue end-address (taken by the current code address of the appearance of
this pseudo). Valid for all targets.
.seh_handler <handler>[,<handler-data>]
This pseudo specifies the handler function to be used. For version 2 the
handler-data field specifies the user optional data block. For version
3 the handler-data field can be a rva to user-data (for FHANDLER), if
the name is #unwind the UHANDLER unwind block is generated, and if it
is #except (or not specified at all) EHANDLER exception block is
generated.
.seh_eh
This pseudo is used for version 2 to indicate the location of the function begin in assembly. Here the PDATA_EH data is
may stored to.
.seh_32/.seh_no32
This pseudos are just used for version 2 (see above for description). At the moment it defaults to no32, if not
specified.
.seh_endproc
By this pseudo the end of the SEH block is specified.
.seh_setframe <reg>,<offset>
By this pseudo the frame-register and the offset (value between 0-240 with 16-byte
alignment) can be specified. This is just used by version 3.
.seh_stackalloc <size>
By this stack allocation in code is described for version 3.
.seh_pushreg <reg>
By this a general register push in code is described for version 3.
.seh_savereg <reg>
By this a general register save to memory in code is described for version 3.
.seh_savemm <mm>
By this a mm register save to memory in code is described for version 3.
.seh_savexmm
By this a xmm register save to memory in code is described for version 3.
.seh_pushframe
By this information about entry kind can be described for version 3.
.seh_scope <begin>,<end>,<handler>,<jump>
By this SCOPED entries for unwind or exceptions can be specified for
version 3. This is just valid for UHANDLE and EHANDLER xdata
descriptor and a global handler has to be specified. For handler and
jump arguments, names of #1,#0, and #null can be used and they are
specifying that a constant instead of a rva has to be used.
There is also some hard-core discussion of .xdata and .pdata (along with a bunch of links) at https://sourceware.org/ml/binutils/2009-04/msg00181.html.
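To make the x64 ("version 3") directives a bit more concrete, here is a rough Python sketch of the UNWIND_INFO bytes that the prologue in the question describes. The opcode numbers and bit layout follow Microsoft's documented x64 unwind-data format as I understand it. The helper names (unwind_code, unwind_info) are made up for this sketch, and the prologue offsets 1, 4 and 8 are my assumption based on the usual encodings of pushq %rbp (1 byte), movq %rsp, %rbp (3 bytes) and subq $32, %rsp (4 bytes); this is not output captured from gas.

```python
# Sketch: encode x64 UNWIND_INFO for the prologue
#   pushq %rbp / movq %rsp, %rbp / subq $32, %rsp
UWOP_PUSH_NONVOL = 0   # what .seh_pushreg describes
UWOP_ALLOC_SMALL = 2   # what .seh_stackalloc describes for 8..128 bytes
UWOP_SET_FPREG = 3     # what .seh_setframe describes
RBP = 5                # x64 register number of %rbp


def unwind_code(prolog_offset, op, info):
    """One 2-byte UNWIND_CODE slot: offset byte, then (info << 4) | op."""
    return bytes([prolog_offset, (info << 4) | op])


def unwind_info(codes, prolog_size, frame_reg=0, frame_offset=0):
    """Header (4 bytes) followed by the codes in reverse prologue order."""
    version, flags = 1, 0
    header = bytes([
        version | (flags << 3),
        prolog_size,
        len(codes),
        frame_reg | ((frame_offset // 16) << 4),  # offset is stored scaled by 16
    ])
    return header + b"".join(reversed(codes))


codes = [
    unwind_code(1, UWOP_PUSH_NONVOL, RBP),          # offset after pushq
    unwind_code(4, UWOP_SET_FPREG, 0),              # offset after movq
    unwind_code(8, UWOP_ALLOC_SMALL, 32 // 8 - 1),  # offset after subq
]
blob = unwind_info(codes, prolog_size=8, frame_reg=RBP, frame_offset=0)
print(blob.hex(" "))
```

Under those assumptions the sketch prints 01 08 03 05 08 32 04 03 01 50. The point is only that each .seh_* directive in the prologue becomes one small record in .xdata, keyed by the code offset at which the directive appears.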

I stopped them from being output by using:
gcc -S -fno-asynchronous-unwind-tables hi.c
so I can look that up. But I'm happy with just not having them output anymore.

They seem related to exception handling. That's all I could find.
http://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/external/gpl3/binutils/dist/gas/config/obj-coff-seh.h

Related

Writing and debugging a min program in asm

I am trying to write a program to find the minimum value of a list of integers in asm. Here is what I have so far:
.section .data
data_items:
.long 2,3,4,5,1,9,10 # set 10 as the sentinel value
.section text
.globl _start
_start:
# %ebx holds min
# %edi holds index (destination index)
# %eax current data item
movl $255, %ebx # set the current min to 255
movl $0, %edi # the index is also zero
start_loop:
movl data_items(,%edi,4), %eax # set %eax equal to the current data item
cmpl $10, %eax # compare %eax with the sentinel (10) to see if we should exit
je exit_loop # if it's the sentinel value, exit
incl %edi # increment the index
cmpl %ebx, %eax # compare the current value to the current min
jge start_loop # if it's not less than the current min, go to start
movl %eax, %ebx # move the current value if less than the current min
jmp start_loop # always go back to the start if we've gotten this far
exit_loop:
movl $1, %eax # push the linux system call to %eax (1=exit)
int $0x80 # give linux control (so it will exit)
When I run this, I get the following:
$ as min.s -o min.o && ld min.o -o min && ./min
Segmentation fault (core dumped)
How is one supposed to debug asm? For example, at least in C the compiler tells you what the error might be and the line number, whereas here I know just about nothing. (Note: the error is having .section text instead of .section .text but how would one figure that out?)
It's very possible in C to write a program that compiles with no warnings but crashes (e.g. NULL pointer deref), and you'll see exactly the same thing. It's much more likely in asm, though.
You debug asm with a debugger, GDB for example. See tips at the bottom of https://stackoverflow.com/tags/x86/info. And if you make any system calls, use strace to see what your program is actually doing.
To debug this, you'd run it under GDB and notice that it segfaulted on the first instruction, movl $255, %ebx. That instruction doesn't access memory, so the code fetch itself must have faulted. So there must be something wrong with your sections that resulted in your code being linked into a non-executable segment of your executable.
objdump -d would also have given you a hint: it disassembles the .text section by default, and this program doesn't have one.
The reason text instead of .text causes this problem is that the defaults for sections with random names that aren't one of the few specially-recognized ones are read+write without exec.
In GAS, use .text or .data, special shortcut directives for .section .text or .data which avoid this problem for those sections. https://sourceware.org/binutils/docs/as/Text.html
But not all "standard" sections have special directives; you still need .section .rodata to switch to the read-only data section, where you should have put your array (read, no write; on newer toolchains, also no exec). Instead of switching to the .bss section, though, you can use .comm or .lcomm (https://sourceware.org/binutils/docs/as/bss.html).
Another possible problem is that you're building this 32-bit code as a 64-bit executable (unless you're using a 32-bit-only install where as --32 is the default). Using 32-bit addressing modes works in 64-bit modes, truncating the address to 32 bits. That works when accessing static data in a position-dependent executable on Linux, because all code+data is linked into the low 2GiB of virtual address space.
But any access to (%esp) or -4(%ebp) or whatever would fault because the stack in a 64-bit process is mapped to a high address with non-zero bits outside the low 32.
You'd notice that problem in GDB because layout reg would show all 16 64-bit integer registers, RAX..R15.
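The truncation described above can be illustrated with a small Python sketch. The two addresses are hypothetical examples I picked to look like a low static-data address and a high stack address; they are not taken from a real process, and effective_address_32 is just a name for this illustration.

```python
MASK32 = 0xFFFFFFFF


def effective_address_32(addr):
    """Address actually used when a 64-bit process uses a 32-bit
    addressing mode such as (%esp): the upper 32 bits are dropped."""
    return addr & MASK32


# Hypothetical example addresses, chosen only for illustration.
static_data = 0x0000000000600100   # in the low 2 GiB: survives truncation
stack_slot = 0x00007FFD12345678    # high stack address: does not survive

for name, addr in [("static_data", static_data), ("stack_slot", stack_slot)]:
    used = effective_address_32(addr)
    print(f"{name}: wanted {addr:#x}, used {used:#x}, intact={used == addr}")
```

The static-data address comes through unchanged, while the stack address turns into an unrelated low address, which is why the access faults.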

x64 nasm: pushing memory addresses onto the stack & call function

I'm pretty new to x64-assembly on the Mac, so I'm getting confused porting some 32-bit code in 64-bit.
The program should simply print out a message via the printf function from the C standard library.
I've started with this code:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
push msg
call _printf
mov rsp, rbp
pop rbp
ret
Compiling it with nasm this way:
$ nasm -f macho64 main.s
Returned following error:
main.s:12: error: Mach-O 64-bit format does not support 32-bit absolute addresses
I've tried to fix that problem by changing the code to this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
mov rax, msg ; shouldn't rax now contain the address of msg?
push rax ; push the address
call _printf
mov rsp, rbp
pop rbp
ret
It assembled fine with the nasm command above, but now there is a warning while linking the object file with gcc into an actual program:
$ gcc main.o
ld: warning: PIE disabled. Absolute addressing (perhaps -mdynamic-no-pic) not
allowed in code signed PIE, but used in _main from main.o. To fix this warning,
don't compile with -mdynamic-no-pic or link with -Wl,-no_pie
Since it's a warning not an error I've executed the a.out file:
$ ./a.out
Segmentation fault: 11
Hope anyone knows what I'm doing wrong.
The 64-bit OS X ABI largely complies with the System V ABI - AMD64 Architecture Processor Supplement. Its code model is very similar to the Small position independent code model (PIC) with the differences explained here. In that code model all local and small data is accessed directly using RIP-relative addressing. As noted in the comments by Z boson, the image base for 64-bit Mach-O executables is beyond the first 4 GiB of the virtual address space, therefore push msg is not only an invalid way to put the address of msg on the stack, but it is also an impossible one since PUSH does not support 64-bit immediate values. The code should rather look similar to:
; this is what you *would* do for later args on the stack
lea rax, [rel msg] ; RIP-relative addressing
push rax
But in that particular case one need not push the value on the stack at all. The 64-bit calling convention mandates that the first 6 integer/pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9, exactly in that order. The first 8 floating-point or vector arguments go into XMM0, XMM1, ..., XMM7. Only after all the available registers are used, or when there are arguments that cannot fit in any of those registers (e.g. an 80-bit long double value), is the stack used. 64-bit immediate values are loaded using MOV (the QWORD variant) and not PUSH. Simple return values are passed back in the RAX register. The caller must also keep the stack aligned to 16 bytes at the point of the call.
printf is a special function because it takes variable number of arguments. When calling such functions AL (the low byte of RAX) should be set to the number of floating-point arguments, passed in the vector registers. Also note that RIP-relative addressing is preferred for data that lies within 2 GiB of the code.
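As a rough illustration of the register assignment just described, here is a simplified Python model. It only distinguishes 'int' (integer/pointer) and 'fp' (floating-point) scalar arguments and ignores structs, long double and other special cases; assign_args and its return format are names invented for this sketch, not part of any ABI tooling.

```python
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
FP_REGS = [f"xmm{i}" for i in range(8)]


def assign_args(kinds):
    """kinds is a list of 'int' or 'fp'.

    Returns (locations, al), where al is the number of vector registers
    used, i.e. the value a caller of a variadic function puts in AL.
    """
    ints, fps = iter(INT_REGS), iter(FP_REGS)
    locations = []
    al = 0
    for kind in kinds:
        if kind == "fp":
            reg = next(fps, None)
            if reg is not None:
                al += 1
        else:
            reg = next(ints, None)
        # Once a register class is exhausted, the argument goes on the stack.
        locations.append(reg if reg is not None else "stack")
    return locations, al


# printf("%d %f\n", 1, 2.0): format pointer, one int, one double
print(assign_args(["int", "int", "fp"]))
```

For the printf call in the question there is a single pointer argument, so the model gives RDI for the format string and AL = 0.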
Here is how gcc translates printf("This is a test\n"); into assembly on OS X:
xorl %eax, %eax # (1)
leaq L_.str(%rip), %rdi # (2)
callq _printf # (3)
L_.str:
.asciz "This is a test\n"
(this is AT&T style assembly, source is left, destination is right, register names are prefixed with %, data width is encoded as a suffix to the instruction name)
At (1) zero is put into AL (by zeroing the whole RAX which avoids partial-register delays) since no floating-point arguments are being passed. At (2) the address of the string is loaded in RDI. Note how the value is actually an offset from the current value of RIP. Since the assembler doesn't know what this value would be, it puts a relocation request in the object file. The linker then sees the relocation and puts the correct value at link time.
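The "offset from the current value of RIP" can be sketched in Python as follows. RIP-relative displacements are measured from the address of the next instruction and must fit in a signed 32-bit field. The addresses below are hypothetical, picked only to resemble a Mach-O image above 4 GiB, and rel32 is a name made up for this sketch.

```python
def rel32(target, next_insn_addr):
    """Displacement stored in a RIP-relative operand.

    The CPU adds it to the address of the *next* instruction, and it
    has to fit in a signed 32-bit field (about +/- 2 GiB).
    """
    disp = target - next_insn_addr
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is out of RIP-relative range")
    return disp


# Hypothetical layout: a 7-byte leaq at 0x100000f40, string at 0x100000f90.
lea_addr = 0x100000F40
lea_len = 7
str_addr = 0x100000F90

disp = rel32(str_addr, lea_addr + lea_len)
print(hex(disp), disp.to_bytes(4, "little", signed=True).hex(" "))
```

Note that the absolute addresses are above 2^32, yet the displacement itself is tiny, which is why this form works regardless of where the image is loaded.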
I am not a NASM guru, but I think the following code should do it:
default rel ; make [rel msg] the default for [msg]
section .data
msg: db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp ; re-aligns the stack by 16 before call
mov rbp, rsp
xor eax, eax ; al = 0 FP args in XMM regs
lea rdi, [rel msg]
call _printf
mov rsp, rbp
pop rbp
ret
No answer yet has explained why NASM reports
Mach-O 64-bit format does not support 32-bit absolute addresses
The reason NASM won't do this is explained in Agner Fog's Optimizing Assembly manual, in section 3.3 (Addressing modes), under the subsection titled "32-bit absolute addressing in 64 bit mode", where he writes:
32-bit absolute addresses cannot be used in Mac OS X, where addresses are above 2^32 by
default.
This is not a problem on Linux or Windows. In fact I already showed this works at static-linkage-with-glibc-without-calling-main. That hello world code uses 32-bit absolute addressing with elf64 and runs fine.
@HristoIliev suggested using rip relative addressing but did not explain that 32-bit absolute addressing in Linux would work as well. In fact if you change lea rdi, [rel msg] to lea rdi, [msg] it assembles and runs fine with nasm -f elf64 but fails with nasm -f macho64
Like this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
xor al, al
lea rdi, [msg]
call _printf
mov rsp, rbp
pop rbp
ret
You can check that this is an absolute 32-bit address and not rip relative with objdump. However, it's important to point out that the preferred method is still rip relative addressing. Agner in the same manual writes:
There is absolutely no reason to use absolute addresses for simple memory operands. Rip-
relative addresses make instructions shorter, they eliminate the need for relocation at load
time, and they are safe to use in all systems.
So when would you use 32-bit absolute addresses in 64-bit mode? Static arrays are a good candidate. See the following subsection, Addressing static arrays in 64 bit mode. The simple case would be e.g.:
mov eax, [A+rcx*4]
where A is the absolute 32-bit address of the static array. This works fine with Linux but once again you can't do this with Mac OS X because the image base is larger than 2^32 by default. To do this on Mac OS X see example 3.11c and 3.11d in Agner's manual. In example 3.11c you could do
mov eax, [(imagerel A) + rbx + rcx*4]
Where you use the extern reference from Mach O __mh_execute_header to get the image base. In example 3.11d you use rip relative addressing and load the address like this
lea rbx, [rel A]; rel tells nasm to do [rip + A]
mov eax, [rbx + 4*rcx] ; A[i]
According to the documentation for the x86 64bit instruction set http://download.intel.com/products/processor/manual/325383.pdf
PUSH only accepts 8, 16 and 32bit immediate values (64bit registers and register addressed memory blocks are allowed though).
PUSH msg
Where msg is a 64bit immediate address will not compile as you found out.
What calling convention is _printf defined as in your 64bit library?
Is it expecting the parameter on the stack, or using a fast-call convention where the parameters are in registers? Because x86-64 makes more general purpose registers available, the fast-call convention is used more often.
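To illustrate the immediate-size limit: in 64-bit mode a 32-bit PUSH immediate is sign-extended to 64 bits, so only values that survive that round trip can be pushed directly. A small Python check, where fits_sign_extended_imm32 is a helper name invented here and the sample addresses are hypothetical:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def fits_sign_extended_imm32(value):
    """True if a 64-bit value equals its low 32 bits sign-extended."""
    value &= MASK64
    low = value & 0xFFFFFFFF
    extended = low | (0xFFFFFFFF00000000 if low & 0x80000000 else 0)
    return extended == value


# Hypothetical addresses: a low Linux-style one and one above 4 GiB.
for addr in (0x601040, 0x100001000, 0x80000000):
    print(f"{addr:#x}: {fits_sign_extended_imm32(addr)}")
```

A low address such as 0x601040 passes, while an address above 4 GiB (typical for 64-bit Mach-O, per the answers above) does not, and neither does 0x80000000, because sign extension would set the upper 32 bits.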

basic assembly not working on Mac (x86_64+Lion)?

here is the code(exit.s):
.section .data,
.section .text,
.globl _start
_start:
movl $1, %eax
movl $32, %ebx
syscall
when I execute " as exit.s -o exit.o && ld exit.o -o exit -e _start && ./exit"
the return is "Bus error: 10" and the output of "echo $?" is 138
I also tried the example of the correct answer in this question: Process command line in Linux 64 bit
and still get "bus error"...
First, you are using old 32-bit Linux kernel calling convention on Mac OS X - this absolutely doesn't work.
Second, syscalls in Mac OS X are structured in a different way - they all have a leading class identifier and a syscall number. The class can be Mach, BSD or something else (see here in the XNU source) and is shifted 24 bits to the left. Normal BSD syscalls have class 2 and thus begin from 0x2000000. Syscalls in class 0 are invalid.
As per §A.2.1 of the SysV AMD64 ABI, also followed by Mac OS X, the syscall id (together with its class on XNU!) goes in %rax (or in %eax, as the high 32 bits are unused on XNU). The first argument goes in %rdi, the next in %rsi, and so on. %rcx is used by the kernel and its value is destroyed, which is why all functions in libc.dylib save it into %r10 before making syscalls (similarly to the kernel_trap macro from syscall_sw.h).
Third, code sections in Mach-O binaries are called __text and not .text as in Linux ELF and also reside in the __TEXT segment, collectively referred as (__TEXT,__text) (nasm automatically translates .text as appropriate if Mach-O is selected as target object type) - see the Mac OS X ABI Mach-O File Format Reference. Even if you get the assembly instructions right, putting them in the wrong segment/section leads to bus error. You can either use the .section __TEXT,__text directive (see here for directive syntax) or you can also use the (simpler) .text directive, or you can drop it altogether since it is assumed if no -n option was supplied to as (see the manpage of as).
Fourth, the default entry point for the Mach-O ld is called start (although, as you've already figured it out, it can be changed via the -e linker option).
Given all the above you should modify your assembler source to read as follows:
# You could also add one of the following directives for completeness
# .text
# or
# .section __TEXT,__text
.globl start
start:
movl $0x2000001, %eax
movl $32, %edi
syscall
Here it is, working as expected:
$ as -o exit.o exit.s; ld -o exit exit.o
$ ./exit; echo $?
32
Adding more explanation on the magic number. I made the same mistake by applying the Linux syscall number to my NASM.
From the xnu kernel sources in osfmk/mach/i386/syscall_sw.h (search SYSCALL_CLASS_SHIFT).
/*
* Syscall classes for 64-bit system call entry.
* For 64-bit users, the 32-bit syscall number is partitioned
* with the high-order bits representing the class and low-order
* bits being the syscall number within that class.
* The high-order 32-bits of the 64-bit syscall number are unused.
* All system classes enter the kernel via the syscall instruction.
Syscalls are partitioned:
#define SYSCALL_CLASS_NONE 0 /* Invalid */
#define SYSCALL_CLASS_MACH 1 /* Mach */
#define SYSCALL_CLASS_UNIX 2 /* Unix/BSD */
#define SYSCALL_CLASS_MDEP 3 /* Machine-dependent */
#define SYSCALL_CLASS_DIAG 4 /* Diagnostics */
As we can see, the tag for BSD system calls is 2. So that magic number 0x2000000 is constructed as:
// 2 << 24
#define SYSCALL_CONSTRUCT_UNIX(syscall_number) \
((SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | \
(SYSCALL_NUMBER_MASK & (syscall_number)))
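The same arithmetic in Python, as a quick sanity check of the magic numbers. SYSCALL_CLASS_SHIFT = 24 matches the "2 << 24" comment above; the two mask definitions are my assumption about how the header defines them (class in the top 8 bits, number in the rest), and construct_unix is just a lower-case stand-in for the macro.

```python
SYSCALL_CLASS_SHIFT = 24
SYSCALL_CLASS_MASK = 0xFF << SYSCALL_CLASS_SHIFT         # assumed definition
SYSCALL_NUMBER_MASK = ~SYSCALL_CLASS_MASK & 0xFFFFFFFF   # assumed definition
SYSCALL_CLASS_UNIX = 2


def construct_unix(syscall_number):
    """Python equivalent of SYSCALL_CONSTRUCT_UNIX."""
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | (
        SYSCALL_NUMBER_MASK & syscall_number
    )


# BSD syscall 1 is exit, 4 is write.
print(hex(construct_unix(1)), hex(construct_unix(4)))
```

So exit (BSD syscall 1) becomes 0x2000001, the value loaded into %eax in the accepted answer.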
As for why the BSD class is the one used here, the reason is probably historical: XNU is a hybrid of the Mach kernel and a BSD layer, and POSIX-style calls such as exit live in the BSD part.
Inspired by the original answer.

Generating a pure (or flat) binary

How can you generate a flat binary that will run directly on the CPU?
That is, without an Operating System; also called free standing environment code (see What is the name for a program running directly without an OS?).
I've noticed that the assembler I'm using, as from the OS-X developer tools bundle, keeps generating Mach-O files, and not flat binaries.
This is the way I've done it. Using the linker that comes with the XCode Command Line Tools, you can combine object files using:
ld code1.o code2.o -o code.bin -r -U start
The -r asks ld to just combine object files together without making a library, -U tells ld to ignore the missing definition of _start (which would normally be provided by the C stdlib).
This creates a binary which still has some header bytes, but this is easily identified with
otool -l code.bin
Look for the __text section in the output:
Section
sectname __text
segname __TEXT
addr 0x00000000
size 0x0000003b
offset 240
align 2^4 (16)
reloff 300
nreloc 1
flags 0x80000400
reserved1 0
reserved2 0
Note the offset (which you can confirm by comparing the output of otool -l and hexdump). We don't want the headers so just use dd to copy out the bytes you need:
dd if=code.bin of=code_stripped.bin ibs=240 skip=1
where I've set the block size to the offset and skipped one block.
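If you want only the __text bytes, here is a small Python sketch that slices by both the offset and the size reported by otool (240 and 0x3b in the listing above). The dd command as written copies everything from the offset to the end of the file, so anything stored after the section, such as relocation entries, comes along too. extract_section is a name made up for this sketch, and the demo uses synthetic bytes rather than a real code.bin.

```python
def extract_section(image, offset, size):
    """Return exactly `size` bytes starting at `offset` of the file image."""
    if offset < 0 or size < 0 or offset + size > len(image):
        raise ValueError("section lies outside the file")
    return image[offset:offset + size]


# Synthetic stand-in for code.bin: 240 header bytes, 0x3b bytes of
# "code" (NOPs here), then some trailing data such as relocations.
fake_image = bytes(240) + b"\x90" * 0x3B + b"trailing relocation data"

text = extract_section(fake_image, offset=240, size=0x3B)
print(len(text), text[:4].hex(" "))

# With a real file (paths are examples only):
# with open("code.bin", "rb") as f:
#     data = f.read()
# with open("code_stripped.bin", "wb") as f:
#     f.write(extract_section(data, 240, 0x3B))
```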
You don't. You get the linker to produce a flat (pure) binary. To do that, you have to write a linker script file with OUTPUT_FORMAT(binary). If memory serves, you also need to specify something about how the sections are merged, but I don't remember any of the details.
I don't think you necessarily need to do this. Some bootloaders can load more complex executable formats. For example, GRUB can load ELF right off the bat. I'm sure you can somehow get it or some other bootloader to load Mach-O files.
You may want to try using the nasm assembler -- it has an option to control the output binary format, including -f bin for flat binaries.
Note that you can't easily compile C code to flat binaries, since almost any C code will require binary features (like external symbols and relocations) which can't be represented in a flat binary.
There is no easy way I know of.
Once I needed to create plain binary file which will be loaded and executed by another program. However, as didn't allow me to do that. I tried to use gobjcopy to convert object file to raw binary, but it was not able to properly convert code such as this:
.quad LinkName2 - LinkName1
In binary file produced by gobjcopy it looked like
.quad 0
I ended up writing a special dumping program, an executable that saves part of its own memory to disk:
.set SYS_EXIT, 0x2000001
.set SYS_READ, 0x2000003
.set SYS_WRITE, 0x2000004
.set SYS_OPEN, 0x2000005
.set SYS_CLOSE, 0x2000006
.data
dumpfile: .ascii "./dump"
.byte 0
OutputFileDescriptor: .quad 0
.section __TEXT,__text,regular
.globl _main
_main:
movl $0644, %edx # file mode
movl $0x601, %esi # O_CREAT | O_TRUNC | O_WRONLY
leaq dumpfile(%rip), %rdi
movl $SYS_OPEN, %eax
syscall
movq %rax, OutputFileDescriptor(%rip)
movq $EndDump - BeginDump, %rdx
leaq BeginDump(%rip), %rsi
movq OutputFileDescriptor(%rip), %rdi
movl $SYS_WRITE, %eax
syscall
movq OutputFileDescriptor(%rip), %rdi
movl $SYS_CLOSE, %eax
syscall
Done:
movq %rax, %rdi
movl $SYS_EXIT, %eax
syscall
.align 3
BeginDump:
.include "dump.s"
EndDump:
.quad 0
The code that has to be saved as a raw binary file is included in dump.s

ELF Shared Object in x86-64 Assembly language

I'm trying to create a Shared library (*.so) in ASM and I'm not sure that i do it correct...
My code is:
.section .data
.globl var1
var1:
.quad 0x012345
.section .text
.globl func1
func1:
xor %rax, %rax
# mov var1, %rcx # this is commented
ret
To compile it i run
gcc -c ker.s -g -fPIC -m64 -o ker.o
gcc ker.o -shared -fPIC -m64 -o libker.so
I can access variable var1 and call func1 with dlopen() and dlsym() from a program in C.
The problem is in variable var1. When i try to access it from func1, i.e. uncomment that line, the compiler generates an error:
/usr/bin/ld: ker.o: relocation R_X86_64_32S against `var1' can not be used when making a shared object; recompile with -fPIC
ker.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
I don't understand. I've already compiled with -fPIC, so what's wrong?
I've already compiled with -fPIC, so what's wrong?
That part of the error message is for people who are linking compiler-generated code.
You're writing asm by hand, so as datenwolf correctly wrote, when writing a shared library in assembly, you have to take care for yourself that the code is position independent.
This means the file must not contain any 32-bit absolute addresses (because relocation to an arbitrary 64-bit base is impossible). 64-bit absolute relocations are supported, but normally you should only use that for jump tables.
mov var1, %rcx uses a 32-bit absolute addressing mode. You should normally never do this, even in position-dependent x86-64 code. The normal use-cases for 32-bit absolute addresses are putting an address into a 64-bit register with mov $var1, %edi (zero-extends into RDI), and indexing static arrays: mov arr(,%rdx,4), %edx
mov var1(%rip), %rcx uses a RIP-relative 32-bit offset. It's the efficient way to address static data, and compilers always use this even without -fPIE or -fPIC for static/global variables.
You have basically two possibilities:
Normal library-private static data, like C compilers will make for __attribute__((visibility("hidden"))) long var1;, same as for -fno-PIC.
.data
.globl var1 # linkable from other .o files in the same shared object / library
.hidden var1 # not visible for *dynamic* linking outside the library
var1:
.quad 0x012345
.text
.globl func1
func1:
xor %eax, %eax # return 0
mov var1(%rip), %rcx
ret
full symbol-interposition-aware code like compilers generate for -fPIC.
You have to use the Global Offset Table. This is how a compiler does it if you tell it to produce code for a shared library.
Note that this comes with a performance hit because of the additional indirection.
See Sorry state of dynamic libraries on Linux for more about symbol-interposition and the overheads it imposes on code-gen for shared libraries if you're not careful about restricting symbol visibility to allow inlining.
var1@GOTPCREL is the address of a pointer to your var1. The pointer itself is reachable with rip-relative addressing, while its content (the address of var1) is filled in by the dynamic linker when the library is loaded. This supports the case where the program using your library defined var1, so var1 in your library should resolve to that memory location instead of the one in the .data or .bss (or .text) of your .so.
.section .data
.globl var1
# without .hidden
var1:
.quad 0x012345
.section .text
.globl func1
func1:
xor %eax, %eax
mov var1@GOTPCREL(%rip), %rcx
mov (%rcx), %rcx
ret
See some additional information at http://www.bottomupcs.com/global_offset_tables.html
An example on the Godbolt compiler explorer of -fPIC vs. -fPIE shows the difference that symbol-interposition makes for getting the address of non-hidden global variables:
movl $x, %eax 5 bytes, -fno-pie
leaq x(%rip), %rax 7 bytes, -fPIE and hidden globals or static with -fPIC
movq y@GOTPCREL(%rip), %rax 7 bytes and a load instead of just ALU, -fPIC with non-hidden globals.
Actually loading always uses x(%rip), except for non-hidden / non-static vars with -fPIC where it has to get the runtime address from the GOT first, because it's not a link-time constant offset relative to the code.
Related: 32-bit absolute addresses no longer allowed in x86-64 Linux? (PIE executables).
A previous version of this answer stated that the DATA and BSS segments could move relative to TEXT when loading a dynamic library. This is incorrect, only the library base address is relocatable. RIP-relative access to other segments within the same library is guaranteed to be ok, and compilers emit code that does this. The ELF headers specify how the segments (which contain the sections) need to be loaded/mapped into memory.
I don't understand. I've already compiled with -fPIC, so what's wrong?
-fPIC is a flag concerning the creation of machine code from non-machine code, i.e. which operations to use. In the compilation stage. Assembly is not compiled, though! Each assembly mnemonic maps directly to a machine instruction, your code is not compiled. It's just transcribed into a slightly different format.
Since you're writing it in assembly, your assembly code must be position independent to be linkable into a shared library. -fPIC has no effect in your case, because it only affects code generation.
Ok, i think i found something...
The first solution from drhirsch gives almost the same error, but the relocation type is changed. The type always ends with 32. Why is that? Why does a 64-bit program use a 32-bit relocation?
I found this from googling: http://www.technovelty.org/code/c/relocation-truncated.html
It says:
For code optimisation purposes, the default immediate size to the mov
instructions is a 32-bit value
So that's the case. I use 64-bit program but relocation is 32-bit and all i need is to force it to be 64 bit with movabs instruction.
This code is assembling and working (access to var1 from internal function func1 and from external C program via dlsym()):
.section .data
.globl var1
var1:
.quad 0x012345
.section .text
.globl func1
func1:
movabs var1, %rax # if one is symbol, other must be %rax
inc %rax
movabs %rax, var1
ret
But i'm in doubt about Global Offset Table. Must i use it, or this "direct" access is absolutely correct?

Resources