Closing a file in x86_64 NASM assembly on macOS

I am having difficulty closing a file in NASM assembly on 64-bit macOS. My goal is to create a file and write to it. I think I have opened the file correctly. Now I need to close the file so that it actually gets created. Here is my code so far.
global start
; default rel
section .text
start:
;open
mov rax, 0x2000005 ; open
mov rdi, file ; path
mov rsi, 1 ; flags: 1 = O_WRONLY
syscall
;close
mov rax, 0x2000006
mov rdi, file ;I need to replace this line. With what though?
syscall
mov rax, 0x2000001 ;Exiting
xor rdi, rdi
syscall
section .data
str: db "Hello world", 0
strlen: equ $ - str
file: db "test.txt"
lenfile: equ $ - file
I did a bit of research before posting this question. I think I need to get something called a file handle (on macOS, a file descriptor). Any help would be greatly appreciated.
EDIT
Answer:
;close
mov rdi, rax ; rax holds the file descriptor returned by open; copy it to rdi before rax is overwritten
mov rax, 0x2000006
syscall
Edit 2
;close
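mov rdi, rax ; rax holds the file descriptor returned by open; copy it to rdi before rax is overwritten
mov rax, 0x2000006 ; close
mov rsi, 0x0201 ; ignored by close, which takes only the descriptor; 0x0201 = O_WRONLY | O_CREAT is an open flags value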
syscall
Actual answer
It turns out that I needed to give the full path to the file, not just the file name.
Edit 2 works, by the way.
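For comparison, the same create, write, close sequence can be sketched with the POSIX C wrappers that correspond to the open (5), write (4) and close (6) system calls used above. This is a sketch assuming a POSIX system; the function name `write_text_file` is hypothetical. It shows that `open()` hands back the descriptor the question calls a "file handler", that creating a file needs `O_CREAT` plus a permission mode (the third argument, which would go in RDX in the assembly version), and that `close()` takes nothing but that descriptor. C string literals are already NUL-terminated; in NASM the file name needs an explicit trailing 0 byte.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create `path` if needed, write `text`, close the descriptor.
   Returns 0 on success, -1 on any failure. */
int write_text_file(const char *path, const char *text)
{
    /* open() returns the file descriptor (the "file handler"), or -1 on error.
       O_CREAT makes it create a missing file; 0644 is the permission mode used
       when it does. Without O_TRUNC, an existing longer file keeps its tail. */
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    ssize_t written = write(fd, text, len);

    /* close() needs only the descriptor: no path, no flags. */
    int closed = close(fd);

    return (written == (ssize_t)len && closed == 0) ? 0 : -1;
}
```

On macOS, `O_WRONLY | O_CREAT` evaluates to 0x0201, the flags value that appears in Edit 2.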

Related

Implementing the "." word from Forth in x86 assembly

I am trying to make a function that prints a number on screen. Eventually, I'll make it take the top stack item, print it, and then pop it (like the "." word in Forth), but for now I am trying to keep it simple. I think I need to align the call stack in some way, and I figured that pushing and popping an arbitrary register (rbx) around the call to printf would do the trick, but I am still getting a segmentation fault. A backtrace in GDB hasn't helped me make any progress either. Does anyone know why this code causes a segmentation fault, and how to fix it?
How I am assembling (GAS):
gcc -masm=intel
.data
format_num: .ascii "%d\0"
.text
.global _main
.extern _printf
print_num:
push rbx
lea rdi, format_num[RIP]
mov esi, 250
xor eax, eax
call _printf
pop rbx
ret
_main:
call print_num
mov rdi, 0
mov rax, 0x2000001
syscall
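The alignment question comes down to arithmetic on RSP modulo 16. The sketch below assumes the System V AMD64 rule that macOS also follows: RSP must be a multiple of 16 immediately before a `call`, so every function is entered with RSP % 16 == 8 (its return address has just been pushed). The helper names are hypothetical, and the model only tracks 8-byte pushes and calls.

```c
/* Normalises x into the range 0..15. */
static int mod16(int x)
{
    int r = x % 16;
    return r < 0 ? r + 16 : r;
}

/* RSP % 16 at the entry of a callee, given RSP % 16 just before the `call`:
   `call` pushes an 8-byte return address, moving RSP down by 8. */
int rsp_mod16_at_entry(int before_call)
{
    return mod16(before_call - 8);
}

/* RSP % 16 just before a `call`, given RSP % 16 at function entry and the
   number of 8-byte pushes (e.g. `push rbx`) made before that call. */
int rsp_mod16_before_call(int at_entry, int pushes)
{
    return mod16(at_entry - 8 * pushes);
}

/* The ABI rule assumed here: RSP must be 16-byte aligned right before `call`. */
int call_is_aligned(int before_call)
{
    return before_call == 0;
}
```

Tracing the question's code with these helpers, assuming `_main` itself was entered correctly: `_main` makes no pushes, so `print_num` is entered with RSP % 16 == 0, and the extra `push rbx` leaves RSP % 16 == 8 at `call _printf`. Under this model, either dropping the `push rbx` or moving RSP down by 8 in `_main` before `call print_num` would make the `printf` call aligned.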

Successive sys_write syscalls not working as expected, NASM bug on OS X?

I'm trying to learn macOS assembly using NASM and I can't get a trivial program to work. I'm trying a variation of "Hello, World" where each of the two words is printed by a separate invocation of a macro. My source code looks like this:
%macro printString 2
mov rax, 0x2000004 ; write
mov rdi, 1 ; stdout
mov rsi, %1
mov rdx, %2
syscall
%endmacro
global start
section .text
start:
printString str1,str1.len
printString str2,str2.len
mov rax, 0x2000001 ; exit
mov rdi, 0
syscall
section .data
str1: db "Hello,",10,
.len: equ $ - str1
str2: db "world",10
.len: equ $ - str2
The expected result should be:
$./hw
Hello,
world
$
Instead I get:
$./hw
Hello,
$
What am I missing? How do I fix it?
EDIT: I am compiling & running with the following commands:
/usr/local/bin/nasm -f macho64 hw.asm
ld -macosx_version_min 10.7.0 -lSystem -o hw hw.o
./hw
NASM 2.11.08 and 2.13.02+ have bugs with macho64 output. What you are observing seems to be something I saw specifically with 2.13.02+ recently when using absolute references. The final linked program has incorrect fixups applied so the reference to str2 is incorrect. The incorrect fixup causes us to print out memory that isn't str2.
NASM has a bug report about this issue in their system. I have added a specific example of this failure based on the code in the question. Hopefully the NASM developers will be able to reproduce the failure and create a fix.
Update: As of June 2018 my view is that there are enough recurring bugs and regressions in NASM that I do not recommend NASM at this point in time for Macho-64 development.
Another recommendation I have for Macho-64 development is to use RIP relative addressing rather than absolute. RIP relative addressing is the default for 64-bit programs on later versions of MacOS.
In NASM you can use the default rel directive in your file to change the default from absolute to RIP relative addresses. For this to work you will have to change from using mov register, variable to lea register, [variable] when trying to move the address of a variable to a register. Your revised code could look like:
default rel
%macro printString 2
mov rax, 0x2000004 ; write
mov rdi, 1 ; stdout
lea rsi, [%1]
mov rdx, %2
syscall
%endmacro
global start
section .text
start:
printString str1,str1.len
printString str2,str2.len
mov rax, 0x2000001 ; exit
mov rdi, 0
syscall
section .data
str1: db "Hello,",10
.len: equ $ - str1
str2: db "world",10
.len: equ $ - str2
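With `default rel`, an operand such as `[str2]` is encoded as a signed 32-bit distance from the end of the current instruction rather than as an absolute address. The sketch below shows that arithmetic; the function names and the example addresses are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Displacement that must be encoded so a RIP-relative operand points at `target`,
   where `next_insn_addr` is the address of the instruction after the one using it.
   Returns false if the distance does not fit in a signed 32-bit field. */
bool rip_rel_displacement(uint64_t next_insn_addr, uint64_t target, int32_t *disp_out)
{
    int64_t delta = (int64_t)(target - next_insn_addr);
    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    *disp_out = (int32_t)delta;
    return true;
}

/* Effective address the CPU computes at run time: next instruction + sign-extended disp. */
uint64_t rip_rel_effective_address(uint64_t next_insn_addr, int32_t disp)
{
    return next_insn_addr + (uint64_t)(int64_t)disp;
}
```

Because only a distance is stored, the code reaches the same data wherever the image is loaded, as long as the data lies within about ±2 GiB of the instruction.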

General structure for executing system commands from x86-64 assembly (NASM)?

I am trying to make some basic system calls in assembly (x86-64 in NASM on OSX), but have so far been unsuccessful.
The only examples I have seen on the web so far are for reading from stdin or writing to stdout, such as this:
global main
section .text
main:
call write
write:
mov rax, 0x2000004
mov rdi, 1
mov rsi, message
mov rdx, length
syscall
section .data
message: db 'Hello, world!', 0xa
length: equ $ - message
However, when I try to use that same pattern to make another system call, it doesn't work (it's saying Bus error: 10):
global main
section .text
main:
call mkdir
mkdir:
mov rax, 0x2000136 ; mkdir system command number
mov rdi, rax ; point destination to system command
mov rsi, directory ; first argument
mov rdx, 755 ; second argument
syscall
section .data
directory: db 'tmp', 0xa
What is the general structure for calling system commands (on OSX in NASM ideally)?
Basically what it seems like you're supposed to do is find your desired system call in here: http://www.opensource.apple.com/source/xnu/xnu-1504.3.12/bsd/kern/syscalls.master. So the "write" one looks like this:
4 AUE_NULL ALL { user_ssize_t write(int fd, user_addr_t cbuf, user_size_t nbyte); }
That is saying:
system call number: 4
number of arguments: 3 (file descriptor, memory address to string/buffer, length of buffer)
So I was beginning to think the general pattern was this:
rax: system call number
rdi: maybe? point to system call ("destination index"), but why the `1` in the write example?
rsi: first argument to system call ("source index", the string in this case)
rdx: second argument to system call
rcx: third argument (if necessary, but not in the system write case)
So then it's like you could do a direct mapping of any of the system commands. So mkdir:
136 AUE_MKDIR ALL { int mkdir(user_addr_t path, int mode); }
would be translated to:
rax: 0x2000136 ; 0x2000000 + 136
rdi: I don't know, maybe `rax`?
rsi: directory (first argument)
rdx: 755 (mode, second argument)
But yeah, that doesn't work.
What am I doing wrong? What is the general pattern of how to do this so I can test it out on any of the other system commands in syscalls.master? Can you describe the role the different registers play here too? That would help clarify a lot I think.
I believe OSX is following the standard SYSV ABI calling convention, at least your example certainly looks like that. Arguments go in the registers RDI, RSI, RDX, R10, R8, and R9, in order. System call number goes into RAX.
Let's look at write: int fd, user_addr_t cbuf, user_size_t nbyte
The assembly:
mov rdi, 1 ; fd = 1 = stdout
mov rsi, message ; cbuf
mov rdx, length ; nbyte
Now, for mkdir: user_addr_t path, int mode
Obviously you need to put path into rdi and mode into rsi.
mkdir:
mov rax, 0x2000088 ; mkdir system call number (136 decimal = 0x88)
mov rdi, directory ; first argument
mov rsi, 0x1ED ; second argument, 0x1ED = 755 octal
syscall
ret
Note that you need a ret at the end of the mkdir subroutine, and you also need one so that main doesn't fall through into mkdir. Furthermore, you should probably use lea to load the directory argument, and use RIP-relative addressing, such as lea rdi, [rel directory].
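The two argument-register orders can be written down side by side. This sketch simply tabulates the registers named above for system calls and, for contrast, the System V function-call order, where the fourth argument goes in RCX instead of R10 (the `syscall` instruction overwrites RCX with the return address). The array and function names are hypothetical.

```c
#include <string.h> /* NULL */

/* Registers for arguments 1..6, in order (index 0 = first argument). */
static const char *const SYSCALL_ARG_REGS[6] = {"rdi", "rsi", "rdx", "r10", "r8", "r9"};
static const char *const CALL_ARG_REGS[6]    = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

/* Register for 0-based argument `index` of a system call, or NULL if out of range. */
const char *syscall_arg_register(int index)
{
    return (index >= 0 && index < 6) ? SYSCALL_ARG_REGS[index] : NULL;
}

/* Same lookup for an ordinary System V function call (e.g. calling printf). */
const char *call_arg_register(int index)
{
    return (index >= 0 && index < 6) ? CALL_ARG_REGS[index] : NULL;
}
```

For mkdir(path, mode), that gives path in RDI and mode in RSI; in every case the system call number goes in RAX, and RAX holds the result afterwards.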
You've got it almost right: you need 0x88 (decimal 136) for the syscall number. The syscalls in syscalls.master are numbered in decimal. You ended up calling getsid, because 0x136 is 310 in decimal, and syscall 310 is getsid.
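The decimal-versus-hexadecimal mix-up can be checked with a little arithmetic. This sketch assumes the 0x2000000 offset for BSD system calls used throughout this page; `BSD_SYSCALL_BASE` and `bsd_syscall_rax` are hypothetical names. The checks also cover the related mode pitfall: C's `0755` is octal (0x1ED), whereas `mov rdx, 755` in NASM is decimal.

```c
#include <stdint.h>

/* Hypothetical name for the offset that selects the BSD (POSIX) syscall class. */
#define BSD_SYSCALL_BASE 0x2000000u

/* Value to load into RAX for the BSD syscall whose number in syscalls.master
   (a decimal number) is `decimal_number`. */
uint32_t bsd_syscall_rax(uint32_t decimal_number)
{
    return BSD_SYSCALL_BASE + decimal_number;
}
```

NASM also accepts octal constants with a `q` or `o` suffix, for example `755q`.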
For arguments, don't use syscalls.master, since it gives you the kernel's perspective, which is a tad skewed when it comes to argument names. Use /usr/include/unistd.h for the prototypes and /usr/include/sys/syscall.h for the numbers. syscalls.master comes in handy only for syscalls that aren't exported to these files, which are the cases where the master file says NO_SYSCALL_STUB.
As for the ABI, it's the same as System V AMD64 ABI. http://people.freebsd.org/~obrien/amd64-elf-abi.pdf
You can see the system calls as libsystem does them:
otool -tV /usr/lib/system/libsystem_kernel.dylib | more
# seek to /^_mkdir:
_mkdir:
0000000000012dfc movl $0x2000088, %eax
0000000000012e01 movq %rcx, %r10
0000000000012e04 syscall
0000000000012e06 jae 0x12e0d
0000000000012e08 jmpq cerror_nocancel
0000000000012e0d ret
0000000000012e0e nop
0000000000012e0f nop
All the system calls essentially have the following structure:
Arguments have already been placed in RDI, RSI, ... as per the ABI above.
The system call number is loaded into EAX. The 0x2 class prefix means a BSD (POSIX) syscall; 0x1 would be a Mach trap, 0x3 a machine-dependent (arch-specific) call, and 0x4 a diagnostic syscall.
RCX is copied into R10, because the syscall instruction overwrites RCX with the return address, so the fourth argument travels in R10.
syscall is executed.
<< kernel portion: execution enters kernel mode through the trap, and the value of EAX is used to (i) find the system call table and (ii) branch to the correct system call >>
The kernel returns to user mode, just past the syscall instruction.
RAX now holds the result. The kernel reports errors through the carry flag: "jae" (jump if carry clear) is taken when the call succeeded, so execution continues to the "ret" and returns to the caller.
If the carry flag is set, RAX holds the errno value, and the stub falls through to the jump to cerror_nocancel, which stores it in errno and returns -1 to the caller.
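The number encoding and the return path described in the list can be modelled in C. This is a sketch, not libsystem's code: the enum and function names are hypothetical, the class value sits in the bits starting at bit 24 (which is why BSD calls carry the 0x2000000 offset), and `finish_syscall` mimics what the stub does after `syscall`, assuming the BSD convention that the kernel sets the carry flag on failure and leaves the errno value in RAX.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Syscall classes as listed above; the enum names are hypothetical. */
enum syscall_class { CLASS_MACH = 1, CLASS_UNIX = 2, CLASS_MDEP = 3, CLASS_DIAG = 4 };

/* The class goes in bits 24 and up; the low 24 bits hold the call number. */
uint32_t make_syscall_number(enum syscall_class cls, uint32_t number)
{
    return ((uint32_t)cls << 24) | (number & 0xFFFFFFu);
}

/* Mimics the stub after `syscall`: carry clear -> `jae` taken, return RAX;
   carry set -> cerror_nocancel path, store RAX in errno and return -1. */
long finish_syscall(long rax, bool carry_flag)
{
    if (!carry_flag)
        return rax;
    errno = (int)rax;
    return -1;
}
```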
The Bus error: 10 error appears to be caused by an incorrect syscall number and no exit syscall.
; nasm -f macho64 mkdir.asm && ld -o mkdir mkdir.o && ./mkdir
%define SYSCALL_MKDIR 0x2000088
%define SYSCALL_EXIT 0x2000001
global start
section .text
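start:
call mkdir
call exit
ret ; not reached: exit does not return
mkdir:
mov rax, SYSCALL_MKDIR
mov rdi, directory ; path (NUL-terminated, see the data section)
mov rsi, 0x1ED ; mode: 0x1ED = 755 octal
syscall
ret ; return to start, which then calls exit
exit:
mov rax, SYSCALL_EXIT
mov rdi, 0 ; exit status
syscall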
section .data
directory: db 'tmp', 0
Summary of changes to the original code:
Renaming the main symbol to start
Changing the mkdir syscall number from 0x2000136 to 0x2000088
Changing the register assignments
Changing the 0xa character to 0 in the directory variable (works without but results in an incorrect filename)
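For comparison, the equivalent request through the C library is a single `mkdir()` call. This sketch assumes a POSIX system; `ensure_dir` is a hypothetical helper. A C string literal already ends in the 0 byte that the assembly version had to add by hand, and `0755` is the same octal mode the assembly passes as 0x1ED (the process umask may still clear some bits).

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: create directory `path` with mode 0755 (octal, = 0x1ED).
   Returns true if mkdir() succeeded or failed only because the name already exists. */
bool ensure_dir(const char *path)
{
    if (mkdir(path, 0755) == 0)
        return true;
    return errno == EEXIST;
}
```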
NASM
I also had to install version 2.10.09 of nasm:
brew install https://raw.githubusercontent.com/Homebrew/homebrew/c1616860c8697ffed8887cae8088ab39141f0308/Library/Formula/nasm.rb
brew switch nasm 2.10.09
This was due to:
No macho64 support in /usr/bin/nasm
Latest brew version (2.11.08) results in this error: fatal: No section for index 2 offset 0 found

What is the difference between dword and 'the stack' in assembler

I am trying to learn assembler and am somewhat confused by the method used on OS X with NASM macho32 for passing arguments to functions.
I am following the book 'Assembly Language Step By Step' by Jeff Duntemann and, using the internet extensively, have altered it to run on OS X, both 32-bit and 64-bit.
So, to begin with, here is the Linux version from the book:
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!",10
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
nop
mov eax, 4 ; Specify sys_write syscall
mov ebx, 1 ; Specify File Descriptor 1: Standard Output
mov ecx, EatMsg ; Pass offset of the message
mov edx, EatLen ; Pass the length of the message
int 0x80 ; Make syscall to output the text to stdout
mov eax, 1 ; Specify Exit syscall
mov ebx, 0 ; Return a code of zero
int 0x80 ; Make syscall to terminate the program
The 64-bit version for OS X is very similar. Other than changing the register names, replacing int 80H (which I understand is somewhat archaic) with syscall, and adding 0x2000000 to the values moved into eax (which I don't understand in the slightest), there isn't much to alter.
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!", 0x0a
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
mov rax, 0x2000004 ; Specify sys_write syscall
mov rdi, 1 ; Specify File Descriptor 1: Standard Output
mov rsi, EatMsg ; Pass offset of the message
mov rdx, EatLen ; Pass the length of the message
syscall ; Make syscall to output the text to stdout
mov rax, 0x2000001 ; Specify Exit syscall
mov rdi, 0 ; Return a code of zero
syscall ; Make syscall to terminate the program
The 32-bit Mac version, on the other hand, is quite different. I can see we are pushing the arguments onto the stack as dwords, so my question (sorry for the long preamble) is: what is the difference between the stack that eax is being pushed to and dword, and why do we use only registers, not the stack, in the 64-bit version (and on Linux)?
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!", 0x0a
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
mov eax, 0x4 ; Specify sys_write syscall
push dword EatLen ; Pass the length of the message
push dword EatMsg ; Pass offset of the message
push dword 1 ; Specify File Descriptor 1: Standard Output
push eax
int 0x80 ; Make syscall to output the text to stdout
add esp, 16 ; Move back the stack pointer
mov eax, 0x1 ; Specify Exit syscall
push dword 0 ; Return a code of zero
push eax
int 0x80 ; Make syscall to terminate the program
Well, you don't quite understand what dword is. In HLL terms, it is not a variable but a type (a size). So push dword 1 means that you push a double-word constant 1 onto the stack. There is only ONE stack, and both the constant 1 and the register eax are pushed onto it. (The final push eax fills the slot where a return address would sit if the arguments had been pushed for a call to a C library stub; BSD-style 32-bit system calls expect that slot, and any dword will do.)
The registers are used on Linux because they are much faster, especially on old processors. The Linux ABI (which is, as far as I know, a descendant of the System V ABI) was developed quite a long time ago and was often used in systems where performance was critical and the difference was very significant. The OS X Intel ABI is much younger, as far as I know, and on desktop OS X the simplicity of using the stack was more important than the negligible slowdown. 64-bit processors added more registers, so it became more efficient to use them.
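The stack layout the 32-bit OS X code builds can be traced with a toy model. This sketch assumes a downward-growing stack of 4-byte slots; `toy_stack` and its helpers are hypothetical, and the message address used in the check is made up. After the four pushes, [esp] holds the value of eax, [esp+4] the file descriptor, [esp+8] the message address and [esp+12] the length (14 bytes for "Eat at Joe's!" plus the newline), which is why `add esp, 16` (four pushes of 4 bytes) restores the stack.

```c
#include <stdint.h>
#include <string.h>

/* Toy 32-bit stack: 64 dword slots; `esp` is a byte offset that starts at the top. */
typedef struct {
    uint32_t mem[64];
    uint32_t esp;
} toy_stack;

void toy_init(toy_stack *s)
{
    memset(s, 0, sizeof *s);
    s->esp = sizeof s->mem; /* empty stack: ESP points just past the last slot */
}

/* `push dword value`: move ESP down by 4 bytes, then store the value there. */
void toy_push_dword(toy_stack *s, uint32_t value)
{
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* Reads the dword at [esp + offset]. */
uint32_t toy_read(const toy_stack *s, uint32_t offset)
{
    return s->mem[(s->esp + offset) / 4];
}
```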
