What is the difference between dword and 'the stack' in assembler - macos

I am trying to learn assembler and am somewhat confused by the method used by osx with nasm macho32 for passing arguments to functions.
I am following the book 'Assembly Language Step By Step' by Jeff Duntemann and using the internet extensively have altered it to run on osx both 32 and 64 bit.
So to begin with the linux version from the book
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!",10
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
nop
mov eax, 4 ; Specify sys_write syscall
mov ebx, 1 ; Specify File Descriptor 1: Standard Output
mov ecx, EatMsg ; Pass offset of the message
mov edx, EatLen ; Pass the length of the message
int 0x80 ; Make syscall to output the text to stdout
mov eax, 1 ; Specify Exit syscall
mov ebx, 0 ; Return a code of zero
int 0x80 ; Make syscall to terminate the program
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!", 0x0a
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
Then very similarly the 64 bit version for osx, other than changing the register names, replacing int 80H (which I understand is somewhat archaic) and adding 0x2000000 to the values moved to eax (don't understand this in the slightest) there isn't much to alter.
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!", 0x0a
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
mov rax, 0x2000004 ; Specify sys_write syscall
mov rdi, 1 ; Specify File Descriptor 1: Standard Output
mov rsi, EatMsg ; Pass offset of the message
mov rdx, EatLen ; Pass the length of the message
syscall ; Make syscall to output the text to stdout
mov rax, 0x2000001 ; Specify Exit syscall
mov rdi, 0 ; Return a code of zero
syscall ; Make syscall to terminate the program
The 32 Bit mac version on the other hand is quite different. I can see we are pushing the arguments to the stack dword, so my question is (and sorry for the long preamble) what is the difference between the stack that eax is being pushed to and dword and why do we just use the registers and not the stack in the 64 bit version (and linux)?
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!", 0x0a
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
mov eax, 0x4 ; Specify sys_write syscall
push dword EatLen ; Pass the length of the message
push dword EatMsg ; Pass offset of the message
push dword 1 ; Specify File Descriptor 1: Standard Output
push eax
int 0x80 ; Make syscall to output the text to stdout
add esp, 16 ; Move back the stack pointer
mov eax, 0x1 ; Specify Exit syscall
push dword 0 ; Return a code of zero
push eax
int 0x80 ; Make syscall to terminate the program

Well, you don't quite understand what is dword. Speaking HLL, it is not a variable, but rather a type. So push doword 1 means that you pushes a double word constant 1 into the stack. There only ONE stack, and both the one and the register eax are pushed in it.
The registers are used in linux because they are much faster, especially on old processors. Linux ABI (which is, as far as i know, a descent of System V ABI) was developed quite a long time ago and often used in systems where performance was critical, when the difference was very significant. OSX intel abi is much younger, afaik, and simplicity of using stack where more important in desktop OSX than the negligible slowdown. In 64-bit processors, more registers where added and hence the where more efficient to use them.

Related

Problems with getcwd syscall on OSX

Does anyone have an idea how to get the current working directory in OSX with NASM? The syscall getcwd isn't available on osx and dtruss pwd return lots of stat sys calls. However in the manual I can't find which structure variable of stat returns the current working directory.
Thanks.
That's a bit late answer, but nonetheless this can be achieved using 2 syscalls.
open_nocancel 0x2000018e (or open 0x2000005) opening a file descriptor for current dir
fcntl_nocancel 0x20000196 (or fcntl 0x2000005c) for reading the actual path
example:
#define F_GETPATH 50 ; from <sys/fcntl.h>
currentDirConstant:
db ".",0 ; needs segment read permission
outputPath:
resb 1000 ; needs segment write permission
mov rdi,currentDirConstant; input path 1st argument
xor esi, esi ; int flags 2nd argument, just use 0
xor edx, edx ; int mode 3rd argument, just use 0
mov eax, 0x2000018e ; open_nocancel syscall number
syscall
mov edi,eax ; file descriptor 1st argument of current dir from previous syscall
mov esi,F_GETPATH ; fcntl cmd 2nd argument
mov rdx, outputPath ; output buffer 3rd argument
mov eax, 0x20000196 ; fcntl_nocancel syscall number
syscall

General structure for executing system commands from x86-64 assembly (NASM)?

I am trying to make some basic system calls in assembly (x86-64 in NASM on OSX), but have so far been unsuccessful.
The only examples I have seen on the web so far are for reading from stdin or writing to stdout, such as this:
global main
section .text
main:
call write
write:
mov rax, 0x2000004
mov rdi, 1
mov rsi, message
mov rdx, length
syscall
section .data
message: db 'Hello, world!', 0xa
length: equ $ - message
However, when I try to use that same pattern to make another system call, it doesn't work (it's saying Bus error: 10):
global main
section .text
main:
call mkdir
mkdir:
mov rax, 0x2000136 ; mkdir system command number
mov rdi, rax ; point destination to system command
mov rsi, directory ; first argument
mov rdx, 755 ; second argument
syscall
section .data
directory: db 'tmp', 0xa
What is the general structure for calling system commands (on OSX in NASM ideally)?
Basically what it seems like you're supposed to do is find your desired system call in here: http://www.opensource.apple.com/source/xnu/xnu-1504.3.12/bsd/kern/syscalls.master. So the "write" one looks like this:
4 AUE_NULL ALL { user_ssize_t write(int fd, user_addr_t cbuf, user_size_t nbyte); }
That is saying:
system call number: 4
number of arguments: 3 (file descriptor, memory address to string/buffer, length of buffer)
So I was beginning to think the general pattern was this:
rax: system call number
rdi: maybe? point to system call ("destination index"), but why the `1` in the write example?
rsi: first argument to system call ("source index", the string in this case)
rdx: second argument to system call
rcx: third argument (if necessary, but not in the system write case)
So then it's like you could do a direct mapping of any of the system commands. So mkdir:
136 AUE_MKDIR ALL { int mkdir(user_addr_t path, int mode); }
would be translated to:
rax: 0x20000136 ; 136 + 20000000
rdi: i dunno, maybe `rax`?
rsi: directory (first argument)
rdx: 755 (mode, second argument)
But yeah, that doesn't work.
What am I doing wrong? What is the general pattern of how to do this so I can test it out on any of the other system commands in syscalls.master? Can you describe the role the different registers play here too? That would help clarify a lot I think.
I believe OSX is following the standard SYSV ABI calling convention, at least your example certainly looks like that. Arguments go in the registers RDI, RSI, RDX, R10, R8, and R9, in order. System call number goes into RAX.
Let's look at write: int fd, user_addr_t cbuf, user_size_t nbyte
The assembly:
mov rdi, 1 ; fd = 1 = stdout
mov rsi, message ; cbuf
mov rdx, length ; nbyte
Now, for mkdir: user_addr_t path, int mode
Obviously you need to put path into rdi and mode into rsi.
mkdir:
mov rax, 0x2000136 ; mkdir system command number
mov rdi, directory ; first argument
mov rsi, 0x1ED ; second argument, 0x1ED = 755 octal
syscall
ret
Note you need ret and the end of mkdir subroutine, and you also need one so your main doesn't fall through into mkdir. Furthermore, you should probably use lea to load the directory argument, and use RIP-relative addressing, such as lea rdi, [rel directory].
You've got it almost right: You need 0x88 (dec 136) for the syscall number. The syscalls in syscall.master are in decimal. You ended up calling getsid (which is syscall 310).
For arguments, don't use syscalls.master since that gives you the kernel perspective which is a tad skewed (when it comes to argument names). You should use /usr/include/unistd.h for the prototypes, and usr/inclunde/sys/syscall.h for the numbers. syscalls.master comes in handy only in cases where the syscalls aren't exported to these files, and those are cases where the master files says NO_SYSCALL_STUB.
As for the ABI, it's the same as System V AMD64 ABI. http://people.freebsd.org/~obrien/amd64-elf-abi.pdf
You can see the system calls as libsystem does them:
otool -tV /usr/lib/system/libsystem_kernel.dylib | more
# seek to /^_mkdir:
_mkdir:
0000000000012dfc movl $0x2000088, %eax
0000000000012e01 movq %rcx, %r10
0000000000012e04 syscall
0000000000012e06 jae 0x12e0d
0000000000012e08 jmpq cerror_nocancel
0000000000012e0d ret
0000000000012e0e nop
0000000000012e0f nop
All the system calls essentially have the structure:
Arguments by this point have been put in RDI,RSI,... as per above
ABI
The system call # is loaded into EAX. The 0x2 implies POSIX
syscall. 0x1 would be a Mach Trap, 0x3 - arch specific, 0x4 -
Diagnostic syscalls
rcx saved into r10
syscall gets executed
<< kernel portion occurs, wherein execution goes into kernel mode through trap,
and the value of eax is used to i) get to system call table and ii) branch to correct
system call >>
kernel mode returns to user mode, past the syscall instruction
EAX now holds the syscall return value, so
that "jae" means if the syscall return value is >=0 - i.e. ok -
continue to the "ret" and return to the user
if not, jump to
cerror_nocancel which loads the value of errno and returns the -1 to
the user.
The Bus error: 10 error appears to be caused by an incorrect syscall number and no exit syscall.
; nasm -f macho64 mkdir.asm && ld -o mkdir mkdir.o && ./mkdir
%define SYSCALL_MKDIR 0x2000088
%define SYSCALL_EXIT 0x2000001
global start
section .text
start:
call mkdir
call exit
ret
mkdir:
mov rax, SYSCALL_MKDIR
mov rdi, directory
mov rsi, 0x1ED
syscall
exit:
mov rax, SYSCALL_EXIT
mov rdi, 0
syscall
section .data
directory: db 'tmp', 0
Summary of changes to the original code:
Renaming the main symbol to start
Changing the mkdir syscall number from 0x2000136 to 0x2000088
Changing the registry assignments
Changing the 0xa character to 0 in the directory variable (works without but results in an incorrect filename)
NASM
I also had to install version 2.10.09 of nasm:
brew install https://raw.githubusercontent.com/Homebrew/homebrew/c1616860c8697ffed8887cae8088ab39141f0308/Library/Formula/nasm.rb
brew switch nasm 2.10.09
This was due to:
No nacho64 support in /usr/bin/nasm
Latest brew version (2.11.08) results in this error: fatal: No section for index 2 offset 0 found

GDB only read 1 character from STDIN

Currently I'm now learning assembly language, out of curiosity, I want to see how my program that was coded using assembly work under the hood, so I load up GDB (with -tui and layout asm ) and start debugging.
It came at my surprise when GDB asking for STDIN input, it only take 1 character and then stop taking others even if I'm not pressing enter.
I tried again in terminal, it's working fine and my program can take multiple STDIN input without any problem.
Here I provide partion from my testing assembly, this problem is reproducible with following assembly :
global main
segment .bss ; hold uninitialize buffer
szinput resb 10 ; declare 10 bytes uninitialize buffer
nszinput equ $ - szinput ; equ is meaning constant
segment .text
main:
pusha
mov eax, DWORD 3 ; read syscall
mov ebx, DWORD 0 ; 0 is stdin file descriptor
mov ecx, szinput ; void *buf
mov edx, nszinput ; size_t count
int 80h ; invoke syscall
popa
ret
Note that this assembly is for NASM assembler.
Running gdb --version, and this is the result : GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1

x64 nasm: pushing memory addresses onto the stack & call function

I'm pretty new to x64-assembly on the Mac, so I'm getting confused porting some 32-bit code in 64-bit.
The program should simply print out a message via the printf function from the C standart library.
I've started with this code:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
push msg
call _printf
mov rsp, rbp
pop rbp
ret
Compiling it with nasm this way:
$ nasm -f macho64 main.s
Returned following error:
main.s:12: error: Mach-O 64-bit format does not support 32-bit absolute addresses
I've tried to fix that problem byte changing the code to this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
mov rax, msg ; shouldn't rax now contain the address of msg?
push rax ; push the address
call _printf
mov rsp, rbp
pop rbp
ret
It compiled fine with the nasm command above but now there is a warning while compiling the object file with gcc to actual program:
$ gcc main.o
ld: warning: PIE disabled. Absolute addressing (perhaps -mdynamic-no-pic) not
allowed in code signed PIE, but used in _main from main.o. To fix this warning,
don't compile with -mdynamic-no-pic or link with -Wl,-no_pie
Since it's a warning not an error I've executed the a.out file:
$ ./a.out
Segmentation fault: 11
Hope anyone knows what I'm doing wrong.
The 64-bit OS X ABI complies at large to the System V ABI - AMD64 Architecture Processor Supplement. Its code model is very similar to the Small position independent code model (PIC) with the differences explained here. In that code model all local and small data is accessed directly using RIP-relative addressing. As noted in the comments by Z boson, the image base for 64-bit Mach-O executables is beyond the first 4 GiB of the virtual address space, therefore push msg is not only an invalid way to put the address of msg on the stack, but it is also an impossible one since PUSH does not support 64-bit immediate values. The code should rather look similar to:
; this is what you *would* do for later args on the stack
lea rax, [rel msg] ; RIP-relative addressing
push rax
But in that particular case one needs not push the value on the stack at all. The 64-bit calling convention mandates that the fist 6 integer/pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9, exactly in that order. The first 8 floating-point or vector arguments go into XMM0, XMM1, ..., XMM7. Only after all the available registers are used or there are arguments that cannot fit in any of those registers (e.g. a 80-bit long double value) the stack is used. 64-bit immediate pushes are performed using MOV (the QWORD variant) and not PUSH. Simple return values are passed back in the RAX register. The caller must also provide stack space for the callee to save some of the registers.
printf is a special function because it takes variable number of arguments. When calling such functions AL (the low byte of RAX) should be set to the number of floating-point arguments, passed in the vector registers. Also note that RIP-relative addressing is preferred for data that lies within 2 GiB of the code.
Here is how gcc translates printf("This is a test\n"); into assembly on OS X:
xorl %eax, %eax # (1)
leaq L_.str(%rip), %rdi # (2)
callq _printf # (3)
L_.str:
.asciz "This is a test\n"
(this is AT&T style assembly, source is left, destination is right, register names are prefixed with %, data width is encoded as a suffix to the instruction name)
At (1) zero is put into AL (by zeroing the whole RAX which avoids partial-register delays) since no floating-point arguments are being passed. At (2) the address of the string is loaded in RDI. Note how the value is actually an offset from the current value of RIP. Since the assembler doesn't know what this value would be, it puts a relocation request in the object file. The linker then sees the relocation and puts the correct value at link time.
I am not a NASM guru, but I think the following code should do it:
default rel ; make [rel msg] the default for [msg]
section .data
msg: db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp ; re-aligns the stack by 16 before call
mov rbp, rsp
xor eax, eax ; al = 0 FP args in XMM regs
lea rdi, [rel msg]
call _printf
mov rsp, rbp
pop rbp
ret
No answer yet has explained why NASM reports
Mach-O 64-bit format does not support 32-bit absolute addresses
The reason NASM won't do this is explained in Agner Fog's Optimizing Assembly manual in section 3.3 Addressing modes under the subsection titled 32-bit absolute addressing in 64 bit mode he writes
32-bit absolute addresses cannot be used in Mac OS X, where addresses are above 2^32 by
default.
This is not a problem on Linux or Windows. In fact I already showed this works at static-linkage-with-glibc-without-calling-main. That hello world code uses 32-bit absolute addressing with elf64 and runs fine.
#HristoIliev suggested using rip relative addressing but did not explain that 32-bit absolute addressing in Linux would work as well. In fact if you change lea rdi, [rel msg] to lea rdi, [msg] it assembles and runs fine with nasm -efl64 but fails with nasm -macho64
Like this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
xor al, al
lea rdi, [msg]
call _printf
mov rsp, rbp
pop rbp
ret
You can check that this is an absolute 32-bit address and not rip relative with objdump. However, it's important to point out that the preferred method is still rip relative addressing. Agner in the same manual writes:
There is absolutely no reason to use absolute addresses for simple memory operands. Rip-
relative addresses make instructions shorter, they eliminate the need for relocation at load
time, and they are safe to use in all systems.
So when would use use 32-bit absolute addresses in 64-bit mode? Static arrays is a good candidate. See the following subsection Addressing static arrays in 64 bit mode. The simple case would be e.g:
mov eax, [A+rcx*4]
where A is the absolute 32-bit address of the static array. This works fine with Linux but once again you can't do this with Mac OS X because the image base is larger than 2^32 by default. To to this on Mac OS X see example 3.11c and 3.11d in Agner's manual. In example 3.11c you could do
mov eax, [(imagerel A) + rbx + rcx*4]
Where you use the extern reference from Mach O __mh_execute_header to get the image base. In example 3.11c you use rip relative addressing and load the address like this
lea rbx, [rel A]; rel tells nasm to do [rip + A]
mov eax, [rbx + 4*rcx] ; A[i]
According to the documentation for the x86 64bit instruction set http://download.intel.com/products/processor/manual/325383.pdf
PUSH only accepts 8, 16 and 32bit immediate values (64bit registers and register addressed memory blocks are allowed though).
PUSH msg
Where msg is a 64bit immediate address will not compile as you found out.
What calling convention is _printf defined as in your 64bit library?
Is it expecting the parameter on the stack or using a fast-call convention where the parameters on in registers? Because x86-64 makes more general purpose registers available the fast-call convention is used more often.

64 bit assembly on Mac OS X runtime errors: "dyld: no writable segment" and "Trace/BPT trap"

When attempting to run the following assembly program:
.globl start
start:
pushq $0x0
movq $0x1, %rax
subq $0x8, %rsp
int $0x80
I am receiving the following errors:
dyld: no writable segment
Trace/BPT trap
Any idea what could be causing this? The analogous program in 32 bit assembly runs fine.
OSX now requires your executable to have a writable data segment with content, so it can relocate and link your code dynamically. Dunno why, maybe security reasons, maybe due to the new RIP register. If you put a .data segment in there (with some bogus content), you'll avoid the "no writable segment" error. IMO this is an ld bug.
Regarding the 64-bit syscall, you can do it 2 ways. GCC-style, which uses the _syscall PROCEDURE from libSystem.dylib, or raw. Raw uses the syscall instruction, not the int 0x80 trap. int 0x80 is an illegal instruction in 64-bit.
The "GCC method" will take care of categorizing the syscall for you, so you can use the same 32-bit numbers found in sys/syscall.h. But if you go raw, you'll have to classify what kind of syscall it is by ORing it with a type id. Here is an example of both. Note that the calling convention is different! (this is NASM syntax because gas annoys me)
; assemble with
; nasm -f macho64 -o syscall64.o syscall64.asm && ld -lc -ldylib1.o -e start -o syscall64 syscall64.o
extern _syscall
global start
[section .text align=16]
start:
; do it gcc-style
mov rdi, 0x4 ; sys_write
mov rsi, 1 ; file descriptor
mov rdx, hello
mov rcx, size
call _syscall ; we're calling a procedure, not trapping.
;now let's do it raw
mov rax, 0x2000001 ; SYS_exit = 1 and is type 2 (bsd call)
mov rdi, 0 ; Exit success = 0
syscall ; faster than int 0x80, and legal!
[section .data align=16]
hello: db "hello 64-bit syscall!", 0x0a
size: equ $-hello
check out http://www.opensource.apple.com/source/xnu/xnu-792.13.8/osfmk/mach/i386/syscall_sw.h for more info on how a syscall is typed.
The system call interface is different between 32 and 64 bits. Firstly, int $80 is replaced by syscall and the system call numbers are different. You will need to look up documentation for a 64-bit version of your system call. Here is an example of what a 64-bit program may look like.

Resources