Problem
When I compile my assembly code with as (binutils) and link using link.exe (Visual Studio 2015) the program crashes because of an unrelocated address.
When linking with gcc (gcc hello-64-gas.obj -o hello-64-gas.exe) the program runs correctly without crash though.
Am I correct in assuming that the object file generated by as should be compiler independent, since ABI compatibility problems are in the hands of the assembly code writer?
Since I am a beginner, any explanation of my mistakes/incorrect assumptions is appreciated.
Platform
Windows 10, 64 bit
Linker: Visual Studio 2015 using the native command tools command prompt (x64)
Compiler: as from MinGW-w64
Example
The following code does not link correctly:
# hello-64-gas.asm print a string using printf
# Assemble: as hello-64-gas.asm -o hello-64-gas.obj --64
# Link: link -subsystem:CONSOLE hello-64-gas.obj -out:hello-64-gas.exe libcmt.lib libvcruntime.lib libucrt.lib legacy_stdio_definitions.lib
.intel_syntax noprefix
.global main
# Declare needed C functions
.extern printf
.section .data
msg: .asciz "Hello world"
fmt: .asciz "%s(%d; %f)\n"
myDouble: .double 2.33, -1.0
.text
main:
sub rsp, 8*5
mov rcx, offset flat: fmt
mov rdx, offset flat: msg
mov r8, 0xFF
mov r9, offset flat: myDouble
mov r9, [r9]
movq xmm4, r9
call printf
add rsp, 8*5
mov rax, 0
ret
When debugging, it seems mov r9, offset flat: myDouble is not relocated: it shows up as mov r9,18h, where 18h would be correct if the .data section were at position zero. Looking at the relocation table with objdump -dr hello-64-gas.obj yields:
...
19: 49 c7 c1 18 00 00 00 mov $0x18,%r9
1c: R_X86_64_32S .data
...
Variation (workaround?)
Replacing mov with movabs seems to work:
# hello-64-gas.asm print a string using printf
# Assemble: as hello-64-gas.asm -o hello-64-gas.obj --64
# Link: link -subsystem:CONSOLE hello-64-gas.obj -out:hello-64-gas.exe libcmt.lib libvcruntime.lib libucrt.lib legacy_stdio_definitions.lib
.intel_syntax noprefix
.global main
# Declare needed C functions
.extern printf
.section .data
msg: .asciz "Hello world"
fmt: .asciz "%s(%d; %f)\n"
myDouble: .double 2.33, -1.0
.text
main:
sub rsp, 8*5
movabs rcx, offset flat: fmt
movabs rdx, offset flat: msg
mov r8, 0xFF
movabs r9, offset flat: myDouble
mov r9, [r9]
movq xmm4, r9
call printf
add rsp, 8*5
mov rax, 0
ret
This does somehow run correctly when linked using link.exe.
The relocation that the GNU assembler is using for your references to myDouble, along with fmt and msg, isn't supported by Microsoft's linker. This relocation, called R_X86_64_32S by the GNU utilities and having a value of 0x11, isn't documented in Microsoft's PECOFF specification. As can be evidenced by using Microsoft's DUMPBIN on your object file, Microsoft's linker seems to use relocations with this value for some other undocumented purpose:
RELOCATIONS #1
Symbol Symbol
Offset Type Applied To Index Name
-------- ---------------- ----------------- -------- ------
00000007 EHANDLER 7 .data
0000000E EHANDLER 7 .data
0000001C EHANDLER 7 .data
00000029 REL32 00000000 C printf
As a workaround you can use any of the following:
a LEA instruction with RIP relative addressing, which generates a R_X86_64_PC32/REL32 relocation
as you found out yourself, a MOVABS instruction, which generates a R_X86_64_64/ADDR64 relocation
a 32-bit MOV instruction which generates a R_X86_64_32/ADDR32 relocation
In order these would be written as:
lea r9, [rip + myDouble]
movabs r9, offset myDouble
mov r9d, offset myDouble
These, along with mov r9, offset myDouble, are four different instructions with different encodings and subtly different semantics each requiring a different type of relocation.
The LEA instruction encodes myDouble as a 32-bit signed offset relative to RIP. This is the preferable instruction to use here, as it takes only 4 bytes to encode the address and it allows the executable to be loaded anywhere in the 64-bit address space. The only limitation is that the executable needs to be less than 2G in size, but this is a fundamental limitation of x64 PECOFF executables anyway.
The MOVABS encodes myDouble as a 64-bit absolute address. While in theory this allows myDouble to be located anywhere in the 64-bit address space, even more than 2G away from the instruction, it takes 8 bytes of encoding space and doesn't actually get you anything under Windows.
The 32-bit MOV instruction encodes myDouble as an unsigned 32-bit absolute address. It has the disadvantage of requiring the executable to be loaded somewhere in the first 4G of address space. Because of this you need to use the /LARGEADDRESSAWARE:NO flag with the Microsoft linker; otherwise you'll get an error.
The 64-bit MOV instruction you're using encodes myDouble as a 32-bit signed absolute address. This also limits where the executable can be loaded, and requires a type of relocation that Microsoft's PECOFF format isn't documented as having and isn't supported by Microsoft's linker.
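The range limits of these four encodings can be checked with a small Python model. This is only an illustrative sketch, not part of the answer above: the function name encodings_that_fit and the example addresses are assumptions, with 0x140000000 standing in for a typical image base of a 64-bit Windows executable.

```python
def encodings_that_fit(target, next_rip):
    """Report which of the four encodings can represent `target`.

    target   -- absolute virtual address of the symbol (hypothetical value)
    next_rip -- address of the instruction following the LEA (hypothetical value)
    """
    rel = target - next_rip
    return {
        # lea r9, [rip + sym]: signed 32-bit displacement from RIP
        "lea_rip_rel32": -2**31 <= rel < 2**31,
        # movabs r9, offset sym: full 64-bit immediate
        "movabs_imm64": 0 <= target < 2**64,
        # mov r9d, offset sym: 32-bit immediate, zero-extended into r9
        "mov32_imm32_zero_ext": 0 <= target < 2**32,
        # mov r9, offset sym: 32-bit immediate, sign-extended to 64 bits
        "mov64_imm32_sign_ext": (0 <= target < 2**31)
        or (2**64 - 2**31 <= target < 2**64),
    }


# Symbol in an image loaded at 0x140000000 (above 4G).
high = encodings_that_fit(0x140003018, 0x140001020)
# Same layout, but with the image loaded below 2G.
low = encodings_that_fit(0x00403018, 0x00401020)
```

With the high image base only the RIP-relative and 64-bit forms remain usable, while below 2G all four would work.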
Related
I have written a small piece of assembly with AT&T syntax and have currently declared three variables in the .data section. However, when I attempt to move any of those variables to a register, such as %eax, gcc raises an error. The code and error message are below:
.data
x:.int 14
y:.int 4
str: .string "some string\n"
.globl _main
_main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl x, %eax; #attempting to move the value of x to %eax;
leave
ret
The error raised is:
call_function.s:14:3: error: 32-bit absolute addressing is not supported in 64-bit mode
movl x, %eax;
^
I have also tried moving the value by first adding the $ character in front of x, however, a clang error is raised:
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Does anyone know how the value stored in x can be successfully moved to %eax? I am using x86 assembly on Mac OSX and compiling with gcc.
A RIP-relative addressing mode is the only good option for addressing static data on MacOS; the image base address is above 2^32 so 32-bit absolute addresses aren't usable even in position-dependent code (unlike x86-64 Linux). RIP-relative addressing of static data is position-independent, so it works even in position-independent executables (ASLR) and libraries.
movl x(%rip), %eax is the AT&T syntax for RIP-relative.
mov eax, dword ptr [rip+x] in GAS .intel_syntax noprefix.
Or, to get the address of a symbol into a register, lea x(%rip), %rdi
NASM syntax: mov eax, [rel x], or use default rel so [x] is RIP-relative.
See Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array for more background on what you can do on OS X, e.g. movabs x, %eax would be possible because the destination register is AL/AX/EAX/RAX. (64-bit absolute address, but don't do that because it's larger and not faster than a RIP-relative load.)
See also http://felixcloutier.com/x86/MOV.html.
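As a rough illustration of what the assembler produces for movl x(%rip), %eax, the Python sketch below builds the six instruction bytes (opcode 8B, ModRM 05, then a signed 32-bit displacement). The function name and the addresses are made up for the example; 0x100000000 is assumed as a typical 64-bit Mach-O image base.

```python
import struct


def encode_mov_eax_rip_rel(instr_addr, target_addr):
    """Encode `movl x(%rip), %eax` for hypothetical addresses."""
    length = 6  # opcode + ModRM (2 bytes) + disp32 (4 bytes)
    # The displacement is relative to the address of the *next* instruction.
    disp = target_addr - (instr_addr + length)
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is more than 2 GiB away from the instruction")
    return bytes([0x8B, 0x05]) + struct.pack("<i", disp)


# x placed after the code: small positive displacement.
forward = encode_mov_eax_rip_rel(0x100000F00, 0x100001000)
# x placed before the code: negative displacement.
backward = encode_mov_eax_rip_rel(0x100001000, 0x100000F00)
```

Only the distance between the instruction and x is encoded, so the same bytes stay valid wherever the image is loaded.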
I have two .asm files, one that calls a function inside the other. My files look like:
mainProg.asm:
global main
extern factorial
section .text
main:
;---snip---
push rcx
call factorial
pop rcx
;---snip---
ret
factorial.asm:
section .text
factorial:
cmp rdi, 0
je l2
mov rax, 1
l1:
mul rdi
dec rdi
jnz l1
ret
l2:
mov rax, 1
ret
(Yes, there's some things I could improve with the implementation.)
I tried to compile them according to the steps at How to link two nasm source files:
$ nasm -felf64 -o factorial.o factorial.asm
$ nasm -felf64 -o mainProg.o mainProg.asm
$ gcc -o mainProg mainProg.o factorial.o
The first two commands work without issue, but the last fails with
mainProg.o: In function `main':
mainProg.asm:(.text+0x22): undefined reference to `factorial'
collect2: error: ld returned 1 exit status
Changing the order of the object files doesn't change the error.
I tried searching for solutions to link two .o files, and I found the question C Makefile given two .o files. As mentioned there, I ran objdump -S factorial.o and got
factorial.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <factorial>:
0: 48 83 ff 00 cmp $0x0,%rdi
4: 74 0e je 14 <l2>
6: b8 01 00 00 00 mov $0x1,%eax
000000000000000b <l1>:
b: 48 f7 e7 mul %rdi
e: 48 ff cf dec %rdi
11: 75 f8 jne b <l1>
13: c3 retq
0000000000000014 <l2>:
14: b8 01 00 00 00 mov $0x1,%eax
19: c3 retq
which is pretty much identical to the source file. It clearly contains the factorial function, so why doesn't ld detect it? Is there a different method to link two .o files?
You need a global factorial assembler directive in factorial.asm. Without that, it's still in the symbol table, but the linker won't consider it for linking between objects.
A plain label like factorial: sits half way between a global/external symbol and a local label such as .loop1: (which would not be present in the object file at all). Local labels are a good way to get less messy disassembly, with one block per function instead of a separate block starting after every branch target.
Non-global symbols are only useful for disassembly and stuff like that, AFAIK. I think they would get stripped, along with debug information, by strip.
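The rule can be pictured with a toy model of symbol resolution in Python. This is a deliberately simplified sketch (real linkers also handle weak symbols, archives, and much more), and the resolve function and its dictionary layout are invented for illustration:

```python
def resolve(objects):
    """Resolve undefined references against *global* definitions only.

    `objects` maps an object-file name to a dict with
      "defined":   {symbol: "global" | "local"}
      "undefined": [symbols referenced but not defined in that file]
    """
    exported = {}
    for name, obj in objects.items():
        for sym, binding in obj["defined"].items():
            if binding == "global":
                exported[sym] = name
    errors = []
    for name, obj in objects.items():
        for sym in obj["undefined"]:
            if sym not in exported:
                errors.append(f"{name}: undefined reference to `{sym}'")
    return exported, errors


# factorial.asm without `global factorial`: the symbol exists but is local.
without_global = {
    "mainProg.o": {"defined": {"main": "global"}, "undefined": ["factorial"]},
    "factorial.o": {"defined": {"factorial": "local"}, "undefined": []},
}
# factorial.asm with `global factorial`.
with_global = {
    "mainProg.o": {"defined": {"main": "global"}, "undefined": ["factorial"]},
    "factorial.o": {"defined": {"factorial": "global"}, "undefined": []},
}
```

In the first case the model reports the same kind of undefined reference as ld; in the second the reference resolves to factorial.o.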
Also, note that imul rax, rdi runs faster, because it doesn't have to store the high half of the result in %rdx, or even calculate it.
Also note that you can objdump -Mintel -d to get intel-syntax disassembly. Agner Fog's objconv is also very nice, but it's more typing because the output doesn't go to stdout by default. (Although a shell wrapper function or script can solve that.)
Anyway, this would be better:
global factorial
factorial:
mov eax, 1 ; depending on the assembler, might save a REX prefix
; early-out branch after setting rax, instead of duplicating the constant
test rdi, rdi ; test is shorter than compare-against-zero
jz .early_out
.loop: ; local label won't appear in the object file
imul rax, rdi
dec rdi
jnz .loop
.early_out:
ret
Why does main push/pop rcx? If you're writing functions that follow the standard ABI (definitely a good idea unless there's a large performance gain), and you want something to survive a call, keep it in a call-preserved register like rbx.
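To sanity-check the control flow of the rewritten loop, here is a line-by-line Python model of it. It is only a sketch: the name factorial_model is hypothetical, and the 64-bit masking imitates the fact that imul rax, rdi keeps only the low 64 bits of the product.

```python
MASK64 = (1 << 64) - 1


def factorial_model(rdi):
    rax = 1                           # mov eax, 1
    if rdi == 0:                      # test rdi, rdi / jz .early_out
        return rax
    while True:                       # .loop:
        rax = (rax * rdi) & MASK64    # imul rax, rdi (low 64 bits only)
        rdi = (rdi - 1) & MASK64      # dec rdi
        if rdi == 0:                  # jnz .loop falls through at zero
            break
    return rax                        # .early_out: ret
```

For inputs up to 20 the result is the exact factorial; from 21 upward the product no longer fits in 64 bits and wraps around, just as it would in rax.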
I'm pretty new to x64 assembly on the Mac, so I'm getting confused porting some 32-bit code to 64-bit.
The program should simply print out a message via the printf function from the C standard library.
I've started with this code:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
push msg
call _printf
mov rsp, rbp
pop rbp
ret
Compiling it with nasm this way:
$ nasm -f macho64 main.s
Returned following error:
main.s:12: error: Mach-O 64-bit format does not support 32-bit absolute addresses
I've tried to fix that problem by changing the code to this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
mov rax, msg ; shouldn't rax now contain the address of msg?
push rax ; push the address
call _printf
mov rsp, rbp
pop rbp
ret
It compiled fine with the nasm command above but now there is a warning while compiling the object file with gcc to actual program:
$ gcc main.o
ld: warning: PIE disabled. Absolute addressing (perhaps -mdynamic-no-pic) not
allowed in code signed PIE, but used in _main from main.o. To fix this warning,
don't compile with -mdynamic-no-pic or link with -Wl,-no_pie
Since it's a warning not an error I've executed the a.out file:
$ ./a.out
Segmentation fault: 11
Hope anyone knows what I'm doing wrong.
The 64-bit OS X ABI largely complies with the System V ABI - AMD64 Architecture Processor Supplement. Its code model is very similar to the Small position independent code model (PIC) with the differences explained here. In that code model all local and small data is accessed directly using RIP-relative addressing. As noted in the comments by Z boson, the image base for 64-bit Mach-O executables is beyond the first 4 GiB of the virtual address space, therefore push msg is not only an invalid way to put the address of msg on the stack, but it is also an impossible one since PUSH does not support 64-bit immediate values. The code should rather look similar to:
; this is what you *would* do for later args on the stack
lea rax, [rel msg] ; RIP-relative addressing
push rax
But in that particular case one need not push the value on the stack at all. The 64-bit calling convention mandates that the first 6 integer/pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9, exactly in that order. The first 8 floating-point or vector arguments go into XMM0, XMM1, ..., XMM7. Only after all the available registers are used, or when there are arguments that cannot fit in any of those registers (e.g. an 80-bit long double value), is the stack used. A 64-bit immediate has to be loaded into a register with MOV (the 64-bit immediate form) and then pushed, since PUSH has no 64-bit immediate form. Simple return values are passed back in the RAX register. The caller must also keep the stack 16-byte aligned at the point of the call; unlike the Windows x64 convention, System V does not require the caller to reserve shadow space for the callee's register arguments.
printf is a special function because it takes a variable number of arguments. When calling such functions, AL (the low byte of RAX) should be set to the number of floating-point arguments passed in the vector registers. Also note that RIP-relative addressing is preferred for data that lies within 2 GiB of the code.
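The register assignment described above can be sketched in Python. This is a simplified model under stated assumptions: Python floats stand in for double arguments, everything else is treated as an integer or pointer, and structs, long double, and other special cases are ignored. The name assign_args is made up for the example.

```python
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SSE_REGS = [f"xmm{i}" for i in range(8)]


def assign_args(args):
    """Place each argument in a register or on the stack, in order."""
    int_used = 0
    sse_used = 0
    registers = []
    stack = []
    for arg in args:
        if isinstance(arg, float):
            if sse_used < len(SSE_REGS):
                registers.append((SSE_REGS[sse_used], arg))
                sse_used += 1
            else:
                stack.append(arg)
        else:
            if int_used < len(INT_REGS):
                registers.append((INT_REGS[int_used], arg))
                int_used += 1
            else:
                stack.append(arg)
    # For a variadic callee such as printf, AL carries the number of
    # vector registers used.
    return {"registers": registers, "stack": stack, "al": sse_used}


# A printf-style call with a format string, a string, an int and a double.
call = assign_args(["%s(%d; %f)\n", "Hello world", 0xFF, 2.33])
```

Here the format string lands in RDI, the double in XMM0, and AL would be set to 1 before the call.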
Here is how gcc translates printf("This is a test\n"); into assembly on OS X:
xorl %eax, %eax # (1)
leaq L_.str(%rip), %rdi # (2)
callq _printf # (3)
L_.str:
.asciz "This is a test\n"
(this is AT&T style assembly, source is left, destination is right, register names are prefixed with %, data width is encoded as a suffix to the instruction name)
At (1) zero is put into AL (by zeroing the whole RAX which avoids partial-register delays) since no floating-point arguments are being passed. At (2) the address of the string is loaded in RDI. Note how the value is actually an offset from the current value of RIP. Since the assembler doesn't know what this value would be, it puts a relocation request in the object file. The linker then sees the relocation and puts the correct value at link time.
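What the linker does with such a relocation request can be modelled in a few lines of Python. The sketch below uses the ELF-style formula value = S + A - P purely as an illustration (Mach-O has its own relocation records); the function name, section address, and symbol address are hypothetical.

```python
import struct


def apply_pc_relative(code, field_offset, section_addr, symbol_addr, addend=-4):
    """Patch a 32-bit PC-relative field in place and return the value written.

    S = symbol_addr, A = addend, P = address of the 4-byte field.
    The addend is -4 because RIP points just past the field when the
    displacement is the last part of the instruction.
    """
    place = section_addr + field_offset
    value = symbol_addr + addend - place
    if not -2**31 <= value < 2**31:
        raise OverflowError("relocation does not fit in 32 bits")
    code[field_offset:field_offset + 4] = struct.pack("<i", value)
    return value


# leaq L_.str(%rip), %rdi is 48 8d 3d <disp32>; the assembler leaves the
# displacement as zero and records a relocation at offset 3.
text = bytearray.fromhex("488d3d00000000")
value = apply_pc_relative(text, 3, section_addr=0x100000F50, symbol_addr=0x100000FA0)
```

After patching, the displacement equals the distance from the end of the LEA to the string, which is exactly what RIP-relative addressing needs at run time.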
I am not a NASM guru, but I think the following code should do it:
default rel ; make [rel msg] the default for [msg]
section .data
msg: db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp ; re-aligns the stack by 16 before call
mov rbp, rsp
xor eax, eax ; al = 0 FP args in XMM regs
lea rdi, [rel msg]
call _printf
mov rsp, rbp
pop rbp
ret
No answer yet has explained why NASM reports
Mach-O 64-bit format does not support 32-bit absolute addresses
The reason NASM won't do this is explained in Agner Fog's Optimizing Assembly manual, in section 3.3 (Addressing modes), under the subsection titled "32-bit absolute addressing in 64 bit mode", where he writes:
32-bit absolute addresses cannot be used in Mac OS X, where addresses are above 2^32 by
default.
This is not a problem on Linux or Windows. In fact I already showed this works at static-linkage-with-glibc-without-calling-main. That hello world code uses 32-bit absolute addressing with elf64 and runs fine.
@HristoIliev suggested using RIP-relative addressing but did not explain that 32-bit absolute addressing would work as well on Linux. In fact, if you change lea rdi, [rel msg] to lea rdi, [msg], it assembles and runs fine with nasm -felf64 but fails with nasm -fmacho64.
Like this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
xor al, al
lea rdi, [msg]
call _printf
mov rsp, rbp
pop rbp
ret
You can check that this is an absolute 32-bit address and not rip relative with objdump. However, it's important to point out that the preferred method is still rip relative addressing. Agner in the same manual writes:
There is absolutely no reason to use absolute addresses for simple memory operands. Rip-
relative addresses make instructions shorter, they eliminate the need for relocation at load
time, and they are safe to use in all systems.
So when would you use 32-bit absolute addresses in 64-bit mode? Static arrays are a good candidate. See the following subsection, Addressing static arrays in 64 bit mode. The simple case would be e.g.:
mov eax, [A+rcx*4]
where A is the absolute 32-bit address of the static array. This works fine with Linux but once again you can't do this with Mac OS X because the image base is larger than 2^32 by default. To do this on Mac OS X see example 3.11c and 3.11d in Agner's manual. In example 3.11c you could do
mov eax, [(imagerel A) + rbx + rcx*4]
where you use the extern reference to the Mach-O symbol __mh_execute_header to get the image base. In example 3.11d you use RIP-relative addressing and load the address like this
lea rbx, [rel A]; rel tells nasm to do [rip + A]
mov eax, [rbx + 4*rcx] ; A[i]
According to the documentation for the x86 64bit instruction set http://download.intel.com/products/processor/manual/325383.pdf
PUSH only accepts 8, 16 and 32bit immediate values (64bit registers and register addressed memory blocks are allowed though).
PUSH msg
Where msg is a 64bit immediate address will not compile as you found out.
What calling convention is _printf defined as in your 64bit library?
Is it expecting the parameter on the stack, or using a fast-call convention where the parameters are in registers? Because x86-64 makes more general purpose registers available, the fast-call convention is used more often.
When attempting to run the following assembly program:
.globl start
start:
pushq $0x0
movq $0x1, %rax
subq $0x8, %rsp
int $0x80
I am receiving the following errors:
dyld: no writable segment
Trace/BPT trap
Any idea what could be causing this? The analogous program in 32 bit assembly runs fine.
OSX now requires your executable to have a writable data segment with content, so it can relocate and link your code dynamically. Dunno why, maybe security reasons, maybe due to the new RIP register. If you put a .data segment in there (with some bogus content), you'll avoid the "no writable segment" error. IMO this is an ld bug.
Regarding the 64-bit syscall, you can do it 2 ways. GCC-style, which uses the _syscall PROCEDURE from libSystem.dylib, or raw. Raw uses the syscall instruction, not the int 0x80 trap. int 0x80 is an illegal instruction in 64-bit.
The "GCC method" will take care of categorizing the syscall for you, so you can use the same 32-bit numbers found in sys/syscall.h. But if you go raw, you'll have to classify what kind of syscall it is by ORing it with a type id. Here is an example of both. Note that the calling convention is different! (this is NASM syntax because gas annoys me)
; assemble with
; nasm -f macho64 -o syscall64.o syscall64.asm && ld -lc -ldylib1.o -e start -o syscall64 syscall64.o
extern _syscall
global start
[section .text align=16]
start:
; do it gcc-style
mov rdi, 0x4 ; sys_write
mov rsi, 1 ; file descriptor
mov rdx, hello
mov rcx, size
call _syscall ; we're calling a procedure, not trapping.
;now let's do it raw
mov rax, 0x2000001 ; SYS_exit = 1 and is type 2 (bsd call)
mov rdi, 0 ; Exit success = 0
syscall ; faster than int 0x80, and legal!
[section .data align=16]
hello: db "hello 64-bit syscall!", 0x0a
size: equ $-hello
Check out http://www.opensource.apple.com/source/xnu/xnu-792.13.8/osfmk/mach/i386/syscall_sw.h for more info on how a syscall is typed.
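The ORing with a type id amounts to the following arithmetic, shown here as a small Python sketch. The constant names mirror those in the header linked above but should be treated as assumptions; the shift of 24 is consistent with 0x2000001 being SYS_exit in class 2.

```python
SYSCALL_CLASS_SHIFT = 24
SYSCALL_CLASS_MACH = 1   # Mach traps
SYSCALL_CLASS_UNIX = 2   # BSD calls


def raw_syscall_number(syscall_class, number):
    """Combine a syscall class and a 32-bit-style syscall number for RAX."""
    return (syscall_class << SYSCALL_CLASS_SHIFT) | number


SYS_exit = 1
SYS_write = 4

exit_rax = raw_syscall_number(SYSCALL_CLASS_UNIX, SYS_exit)
write_rax = raw_syscall_number(SYSCALL_CLASS_UNIX, SYS_write)
```

exit_rax matches the 0x2000001 loaded into rax in the raw example; a raw write would use write_rax in the same way.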
The system call interface is different between 32 and 64 bits. Firstly, int $80 is replaced by syscall and the system call numbers are different. You will need to look up documentation for a 64-bit version of your system call. Here is an example of what a 64-bit program may look like.
I have a code like this:
.bss
woof: .long 0
.text
bleh:
...some op codes here.
now I would like to move the address of woof into eax. What's the intel syntax code here for doing that? The same goes with moving bleh's address into, say, ebx.
Your help is much appreciated!
The bss section can't have any actual objects in it. Some assemblers may still allow you to switch to the .bss section, but all you can do there is say something like: x: . = . + 4.
In most assemblers these days and specifically in gnu for intel, there is no longer a .bss directive, so you temporarily switch to bss and create the bss symbol in one shot with something like: .comm sym,size,alignment. This is why you are presumably getting an error ".bss directive not recognized" or something like that.
And then you can get the address with either:
lea woof, %eax
or
movl $woof, %eax
Update: aha, intel syntax, not intel architecture. OK:
.intel_syntax noprefix
lea esi,fun
lea esi,[fun]
mov eax,OFFSET FLAT:fun
.att_syntax
lea fun, %eax
mov $fun, %eax
.data
fun: .long 0x123
All the lea forms should generate the same code.