x64 nasm: pushing memory addresses onto the stack & call function

x64 nasm: pushing memory addresses onto the stack & call function - macos

I'm pretty new to x64-assembly on the Mac, so I'm getting confused porting some 32-bit code in 64-bit.
The program should simply print out a message via the printf function from the C standart library.
I've started with this code:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
push msg
call _printf
mov rsp, rbp
pop rbp
ret
Compiling it with nasm this way:
$ nasm -f macho64 main.s
Returned following error:
main.s:12: error: Mach-O 64-bit format does not support 32-bit absolute addresses
I've tried to fix that problem byte changing the code to this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
mov rax, msg ; shouldn't rax now contain the address of msg?
push rax ; push the address
call _printf
mov rsp, rbp
pop rbp
ret
It compiled fine with the nasm command above but now there is a warning while compiling the object file with gcc to actual program:
$ gcc main.o
ld: warning: PIE disabled. Absolute addressing (perhaps -mdynamic-no-pic) not
allowed in code signed PIE, but used in _main from main.o. To fix this warning,
don't compile with -mdynamic-no-pic or link with -Wl,-no_pie
Since it's a warning not an error I've executed the a.out file:
$ ./a.out
Segmentation fault: 11
Hope anyone knows what I'm doing wrong.

The 64-bit OS X ABI complies at large to the System V ABI - AMD64 Architecture Processor Supplement. Its code model is very similar to the Small position independent code model (PIC) with the differences explained here. In that code model all local and small data is accessed directly using RIP-relative addressing. As noted in the comments by Z boson, the image base for 64-bit Mach-O executables is beyond the first 4 GiB of the virtual address space, therefore push msg is not only an invalid way to put the address of msg on the stack, but it is also an impossible one since PUSH does not support 64-bit immediate values. The code should rather look similar to:
; this is what you *would* do for later args on the stack
lea rax, [rel msg] ; RIP-relative addressing
push rax
But in that particular case one needs not push the value on the stack at all. The 64-bit calling convention mandates that the fist 6 integer/pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9, exactly in that order. The first 8 floating-point or vector arguments go into XMM0, XMM1, ..., XMM7. Only after all the available registers are used or there are arguments that cannot fit in any of those registers (e.g. a 80-bit long double value) the stack is used. 64-bit immediate pushes are performed using MOV (the QWORD variant) and not PUSH. Simple return values are passed back in the RAX register. The caller must also provide stack space for the callee to save some of the registers.
printf is a special function because it takes variable number of arguments. When calling such functions AL (the low byte of RAX) should be set to the number of floating-point arguments, passed in the vector registers. Also note that RIP-relative addressing is preferred for data that lies within 2 GiB of the code.
Here is how gcc translates printf("This is a test\n"); into assembly on OS X:
xorl %eax, %eax # (1)
leaq L_.str(%rip), %rdi # (2)
callq _printf # (3)
L_.str:
.asciz "This is a test\n"
(this is AT&T style assembly, source is left, destination is right, register names are prefixed with %, data width is encoded as a suffix to the instruction name)
At (1) zero is put into AL (by zeroing the whole RAX which avoids partial-register delays) since no floating-point arguments are being passed. At (2) the address of the string is loaded in RDI. Note how the value is actually an offset from the current value of RIP. Since the assembler doesn't know what this value would be, it puts a relocation request in the object file. The linker then sees the relocation and puts the correct value at link time.
I am not a NASM guru, but I think the following code should do it:
default rel ; make [rel msg] the default for [msg]
section .data
msg: db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp ; re-aligns the stack by 16 before call
mov rbp, rsp
xor eax, eax ; al = 0 FP args in XMM regs
lea rdi, [rel msg]
call _printf
mov rsp, rbp
pop rbp
ret

No answer yet has explained why NASM reports
Mach-O 64-bit format does not support 32-bit absolute addresses
The reason NASM won't do this is explained in Agner Fog's Optimizing Assembly manual in section 3.3 Addressing modes under the subsection titled 32-bit absolute addressing in 64 bit mode he writes
32-bit absolute addresses cannot be used in Mac OS X, where addresses are above 2^32 by
default.
This is not a problem on Linux or Windows. In fact I already showed this works at static-linkage-with-glibc-without-calling-main. That hello world code uses 32-bit absolute addressing with elf64 and runs fine.
#HristoIliev suggested using rip relative addressing but did not explain that 32-bit absolute addressing in Linux would work as well. In fact if you change lea rdi, [rel msg] to lea rdi, [msg] it assembles and runs fine with nasm -efl64 but fails with nasm -macho64
Like this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
xor al, al
lea rdi, [msg]
call _printf
mov rsp, rbp
pop rbp
ret
You can check that this is an absolute 32-bit address and not rip relative with objdump. However, it's important to point out that the preferred method is still rip relative addressing. Agner in the same manual writes:
There is absolutely no reason to use absolute addresses for simple memory operands. Rip-
relative addresses make instructions shorter, they eliminate the need for relocation at load
time, and they are safe to use in all systems.
So when would use use 32-bit absolute addresses in 64-bit mode? Static arrays is a good candidate. See the following subsection Addressing static arrays in 64 bit mode. The simple case would be e.g:
mov eax, [A+rcx*4]
where A is the absolute 32-bit address of the static array. This works fine with Linux but once again you can't do this with Mac OS X because the image base is larger than 2^32 by default. To to this on Mac OS X see example 3.11c and 3.11d in Agner's manual. In example 3.11c you could do
mov eax, [(imagerel A) + rbx + rcx*4]
Where you use the extern reference from Mach O __mh_execute_header to get the image base. In example 3.11c you use rip relative addressing and load the address like this
lea rbx, [rel A]; rel tells nasm to do [rip + A]
mov eax, [rbx + 4*rcx] ; A[i]

According to the documentation for the x86 64bit instruction set http://download.intel.com/products/processor/manual/325383.pdf
PUSH only accepts 8, 16 and 32bit immediate values (64bit registers and register addressed memory blocks are allowed though).
PUSH msg
Where msg is a 64bit immediate address will not compile as you found out.
What calling convention is _printf defined as in your 64bit library?
Is it expecting the parameter on the stack or using a fast-call convention where the parameters on in registers? Because x86-64 makes more general purpose registers available the fast-call convention is used more often.

Related

I got the error "32-bit absolute addressing is not supported in 64-bit mode" [duplicate]

I have written a small piece of assembly with AT&T syntax and have currently declared three variables in the .data section. However, when I attempt to move any of those variables to a register, such as %eax, an error from gcc is raised. The code and error message is below:
.data
x:.int 14
y:.int 4
str: .string "some string\n"
.globl _main
_main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl x, %eax; #attempting to move the value of x to %eax;
leave
ret
The error raised is:
call_function.s:14:3: error: 32-bit absolute addressing is not supported in 64-bit mode
movl x, %eax;
^
I have also tried moving the value by first adding the $ character in front of x, however, a clang error is raised:
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Does anyone know how the value stored in x can be successfully moved to %eax? I am using x86 assembly on Mac OSX and compiling with gcc.

A RIP-relative addressing mode is the only good option for addressing static data on MacOS; the image base address is above 2^32 so 32-bit absolute addresses aren't usable even in position-dependent code (unlike x86-64 Linux). RIP-relative addressing of static data is position-independent, so it works even in position-independent executables (ASLR) and libraries.
movl x(%rip), %eax is the AT&T syntax for RIP-relative.
mov eax, dword ptr [rip+x] in GAS .intel_syntax noprefix.
Or, to get the address of a symbol into a register, lea x(%rip), %rdi
NASM syntax: mov eax, [rel x], or use default rel so [x] is RIP-relative.
See Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array for more background on what you can do on OS X, e.g. movabs x, %eax would be possible because the destination register is AL/AX/EAX/RAX. (64-bit absolute address, but don't do that because it's larger and not faster than a RIP-relative load.)
See also http://felixcloutier.com/x86/MOV.html.

Unable to move variables in .data to registers with Mac x86 Assembly

I have written a small piece of assembly with AT&T syntax and have currently declared three variables in the .data section. However, when I attempt to move any of those variables to a register, such as %eax, an error from gcc is raised. The code and error message is below:
.data
x:.int 14
y:.int 4
str: .string "some string\n"
.globl _main
_main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl x, %eax; #attempting to move the value of x to %eax;
leave
ret
The error raised is:
call_function.s:14:3: error: 32-bit absolute addressing is not supported in 64-bit mode
movl x, %eax;
^
I have also tried moving the value by first adding the $ character in front of x, however, a clang error is raised:
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Does anyone know how the value stored in x can be successfully moved to %eax? I am using x86 assembly on Mac OSX and compiling with gcc.

A RIP-relative addressing mode is the only good option for addressing static data on MacOS; the image base address is above 2^32 so 32-bit absolute addresses aren't usable even in position-dependent code (unlike x86-64 Linux). RIP-relative addressing of static data is position-independent, so it works even in position-independent executables (ASLR) and libraries.
movl x(%rip), %eax is the AT&T syntax for RIP-relative.
mov eax, dword ptr [rip+x] in GAS .intel_syntax noprefix.
Or, to get the address of a symbol into a register, lea x(%rip), %rdi
NASM syntax: mov eax, [rel x], or use default rel so [x] is RIP-relative.
See Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array for more background on what you can do on OS X, e.g. movabs x, %eax would be possible because the destination register is AL/AX/EAX/RAX. (64-bit absolute address, but don't do that because it's larger and not faster than a RIP-relative load.)
See also http://felixcloutier.com/x86/MOV.html.

Why "mov rcx, rax" is required when calling printf in x64 assembler?

I am trying to learn x64 assembler. I wrote "hello world" and tried to call printf using the following code:
EXTERN printf: PROC
PUBLIC hello_world_asm
.data
hello_msg db "Hello world", 0
.code
hello_world_asm PROC
push rbp ; save frame pointer
mov rbp, rsp ; fix stack pointer
sub rsp, 8 * (4 + 2) ; shadow space (32bytes)
lea rax, offset hello_msg
mov rcx, rax ; <---- QUESTION ABOUT THIS LINE
call printf
; epilog. restore stack pointer
mov rsp, rbp
pop rbp
ret
hello_world_asm ENDP
END
At the beginning I called printf without "mov rcx, rax", which ended up with access violation. Getting all frustrated I just wrote in C++ a call to printf and looked in the disassembler. There I saw the line "mov rcx, rax" which fixed everything, but WHY do I need to move RAX to RCX ??? Clearly I am missing something fundamental.
Thanks for your help!
p.s. a reference to good x64 assembler tutorial is more than welcome :-) couldn't find one.

It isn't required, this code just wastes an instruction by doing an lea into RAX and then copying to RCX, when it could do
lea rcx, hello_msg
call printf ; printf(rcx, rdx, r8, r9, stack...)
printf on 64-bit Windows ignores RAX as an input; RAX is the return-value register in the Windows x64 calling convention (and can also be clobbered by void functions). The first 4 args go in RCX, RDX, R8, and R9 (if they're integer/pointer like here).
Also note that FP args in xmm0..3 have to be mirrored to the corresponding integer register for variadic functions like printf (MS's docs), but for integer args it's not required to movq xmm0, rcx.
In the x86-64 System V calling convention, variadic functions want al = the number of FP args passed in registers. (So you'd xor eax,eax to zero it). But the x64 Windows convention doesn't need that; it's optimized to make variadic functions easy to implement (instead of for higher performance / more register args for normal functions).
A few 32-bit calling conventions pass an arg in EAX, for example Irvine32, or gcc -m32 -mregparm=1. But no standard x86-64 calling conventions do. You can do whatever you like with private asm functions you write, but you have to follow the standard calling conventions when calling library functions.
Also note that lea rax, offset hello_msg was weird; LEA uses memory-operand syntax and machine encoding (and gives you the address instead of the data). offset hello_msg is an immediate, not a memory operand. But MASM accepts it as a memory operand anyway in this context.
You could use mov ecx, offset hello_msg in position-dependent code, otherwise you want a RIP-relative LEA. I'm not sure of the MASM syntax for that.

The Windows 64-bit (x64/AMD64) calling convention passes the first four integer arguments in RCX, RDX, R8 and R9.
The return value is stored in RAX and it is volatile so a C/C++ compiler is allowed to use it as generic storage in a function.

Assembly code of malloc

I want to view the assembly code of malloc(), calloc() and free() but when I print the assembly code on radare2 it gives me the following code:
push rbp
mov rbp, rsp
sub rsp, 0x10
mov eax, 0xc8
mov edi, eax
call sym.imp.malloc
xor ecx, ecx
mov qword [local_8h], rax
mov eax, ecx
add rsp, 0x10
pop rbp
ret
How can I see sym.imp.malloc function code? Is there any way to see the code or any website to see the assembly?

Since libc is an open-source library, it is freely available and you can simply read the source code.
The source-code of malloc is available on many places online (example), and you can view the source of different versions of libc under malloc/malloc.c here.
The symbol sym.imp.malloc is how radare flags the address of malloc in the PLT (Procedure Linkage Table) and not the function itself.
Reading the Assembly of the function can be done in several ways:
Open your local libc library with radare2, seek to malloc, analyze the function and then print its disassmbly:
$ r2 /usr/lib/libc.so.6
[0x00020630]> s sym.malloc
[0x0007c620]> af
[0x0007c620]> pdf
If you want to see malloc when linked to another binary you need to open the binary in debug mode, then step to main to make it load the library, then search for the address of malloc, seek to it, analyze the function and print the disassembly:
$ r2 -d /bin/ls
Process with PID 20540 started...
= attach 20540 20540
bin.baddr 0x00400000
Using 0x400000
Assuming filepath /bin/ls
asm.bits 64
[0x7fa764841d80]> dcu main
Continue until 0x004028b0 using 1 bpsize
hit breakpoint at: 4028b0
[0x004028b0]> dmi libc malloc~name=malloc$
vaddr=0x7fa764315620 paddr=0x0007c620 ord=4162 fwd=NONE sz=388 bind=LOCAL type=FUNC name=malloc
vaddr=0x7fa764315620 paddr=0x0007c620 ord=5225 fwd=NONE sz=388 bind=LOCAL type=FUNC name=malloc
vaddr=0x7fa764315620 paddr=0x0007c620 ord=5750 fwd=NONE sz=388 bind=GLOBAL type=FUNC name=malloc
vaddr=0x7fa764315620 paddr=0x0007c620 ord=7013 fwd=NONE sz=388 bind=GLOBAL type=FUNC name=malloc
[0x004028b0]> s 0x7fa764315620
[0x7fa764315620]> af
[0x7fa764315620]> pdf

64 bit assembly on Mac OS X runtime errors: "dyld: no writable segment" and "Trace/BPT trap"

When attempting to run the following assembly program:
.globl start
start:
pushq $0x0
movq $0x1, %rax
subq $0x8, %rsp
int $0x80
I am receiving the following errors:
dyld: no writable segment
Trace/BPT trap
Any idea what could be causing this? The analogous program in 32 bit assembly runs fine.

OSX now requires your executable to have a writable data segment with content, so it can relocate and link your code dynamically. Dunno why, maybe security reasons, maybe due to the new RIP register. If you put a .data segment in there (with some bogus content), you'll avoid the "no writable segment" error. IMO this is an ld bug.
Regarding the 64-bit syscall, you can do it 2 ways. GCC-style, which uses the _syscall PROCEDURE from libSystem.dylib, or raw. Raw uses the syscall instruction, not the int 0x80 trap. int 0x80 is an illegal instruction in 64-bit.
The "GCC method" will take care of categorizing the syscall for you, so you can use the same 32-bit numbers found in sys/syscall.h. But if you go raw, you'll have to classify what kind of syscall it is by ORing it with a type id. Here is an example of both. Note that the calling convention is different! (this is NASM syntax because gas annoys me)
; assemble with
; nasm -f macho64 -o syscall64.o syscall64.asm && ld -lc -ldylib1.o -e start -o syscall64 syscall64.o
extern _syscall
global start
[section .text align=16]
start:
; do it gcc-style
mov rdi, 0x4 ; sys_write
mov rsi, 1 ; file descriptor
mov rdx, hello
mov rcx, size
call _syscall ; we're calling a procedure, not trapping.
;now let's do it raw
mov rax, 0x2000001 ; SYS_exit = 1 and is type 2 (bsd call)
mov rdi, 0 ; Exit success = 0
syscall ; faster than int 0x80, and legal!
[section .data align=16]
hello: db "hello 64-bit syscall!", 0x0a
size: equ $-hello
check out http://www.opensource.apple.com/source/xnu/xnu-792.13.8/osfmk/mach/i386/syscall_sw.h for more info on how a syscall is typed.

The system call interface is different between 32 and 64 bits. Firstly, int $80 is replaced by syscall and the system call numbers are different. You will need to look up documentation for a 64-bit version of your system call. Here is an example of what a 64-bit program may look like.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio