I have this code:
global start
section .text
start:
mov rax,0x2000004
mov rdi,1
mov rsi,msg
mov rdx,msg.len
syscall
mov rax,0x2000004
mov rdi,2
mov rsi,msgt
mov rdx,msgt.len
syscall
mov rax,0x2000004
mov rdi,3
mov rsi,msgtn
mov rdx,msgtn.len
syscall
mov rax,0x2000001
mov rdi,0
syscall
section .data
msg: db "This is a string",10
.len: equ $ - msg
var: db 1
msgt: db "output of 1+1: "
.len: equ $ - msgt
msgtn: db 1
.len: equ $ - msg
I want to print the variable msgtn. I tried msgt: db "output of 1+1", var
But the NASM assembler failed with:
second.s:35: error: Mach-O 64-bit format does not support 32-bit absolute addresses
Instead of the variable, I also tried "output of 1+1", [1+1], but I got:
second.s:35: error: expression syntax error
I also tried it without the parentheses; that printed no number, only the string "1+1".
The command I used to assemble my program was:
/usr/local/Cellar/nasm/*/bin/nasm -f macho64 second.s && ld -macosx_version_min 10.7.0 second.o second.o
nasm -v shows:
NASM version 2.11.08 compiled on Nov 27 2015
OS X 10.9.5 with Intel core i5 (x86_64 assembly)
db directives let you put assemble-time-constant bytes into the object file (usually in the data section). You can use an expression as an argument, to have the assembler do some math for you at assemble time. Anything that needs to happen at run time needs to be done by instructions that you write, and that get run. It's not like C++ where a global variable can have a constructor that gets run at startup behind the scenes.
msgt: db "output of 1+1", var
would place those ascii characters, followed by (the low byte of?) the absolute address of var. You'd use this kind of thing (with dd or dq) to do something like this C: int var; int *global_ptr = &var;, where you have a global/static pointer variable that starts out initialized to point to another global/static variable. I'm not sure if MacOS X allows this with a 64bit pointer, or if it just refuses to do relocations for 32bit addresses. But that's why you're getting:
second.s:35: error: Mach-O 64-bit format does not support 32-bit absolute addresses
Notice that the numeric value of the pointer depends on where in virtual address space the code is loaded. So the address isn't strictly an assemble-time constant. The linker needs to mark things that need run-time relocation, like those 64bit immediate-constant addresses you mov into registers (mov rsi,msg). See this answer for some information on the difference between that and lea rsi, [rel msg] to get the address into a register using a RIP-relative method. (That answer has links to more detailed info, and so does the x86 wiki.)
Your attempt at using db [1+1]: What the heck were you expecting? [] in NASM syntax means memory reference. First: the resulting byte has to be an assemble-time constant. I'm not sure if there's an easy syntax for duplicating whatever's at some other address, but this isn't it. (I'd just define a macro and use it in both places.) Second: 2 is not a valid address.
msgt: db "output of 1+1: ", '0' + 1 + 1, 10
would put the ASCII characters: output of 1+1: 2\n at that point in the object file. 10 is the decimal value of ASCII newline. '0' is a way of writing 0x30, the ASCII encoding of the character '0'. A 2 byte is not a printable ASCII character. Your version that did that would have printed a 2 byte there, but you wouldn't notice unless you piped the output into hexdump (or od -t x1c or something, IDK what OS X provides. od isn't very nice, but it is widely available.)
Note that this string is not null-terminated. If you want to pass it to something expecting an implicit-length string (like fputs(3) or strchr(3), instead of write(2) or memchr(3)), tack on an extra , 0 to add a zero-byte after everything else.
If you wanted to do the math at run-time, you need to get the data into registers, add it, then store a string representation of the number into a buffer somewhere. (Or print it one byte at a time, but that's horrible.)
The easy way is to just call printf, to easily print a constant string with some stuff substituted in. Spend your time writing asm for the part of your code that needs to be hand-tuned, not re-implementing library functions.
There's some discussion of int-to-string in comments.
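The run-time route described above (compute the value, then store its string representation in a buffer) can be sketched in C to show the digit-by-digit logic that an asm version would also need. This is only an illustration: the function name u64_to_dec and the 20-byte buffer convention are assumptions made for this sketch, not something from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the decimal digits of `value` into `buf`
 * and returns how many bytes were written. The result is NOT
 * NUL-terminated, matching the explicit-length strings passed to write(2).
 * `buf` must have room for 20 bytes (the length of 2^64 - 1 in decimal). */
size_t u64_to_dec(uint64_t value, char *buf)
{
    char tmp[20];
    size_t n = 0;

    assert(buf != NULL);

    /* Digits come out least-significant first: value % 10 is the last digit. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Reverse into the caller's buffer so the most significant digit is first. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];

    return n;
}
```

The '0' + digit step is the same trick as the assemble-time '0' + 1 + 1 above, just done once per digit at run time. The returned length is what would go in rdx for a write system call.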
Your link command looks funny:
ld -macosx_version_min 10.7.0 second.o second.o
Are you sure you want the same .o twice?
You could save some code bytes by only moving to 32bit registers when you don't need sign-extension into the 64bit reg. e.g. mov edi,2 instead of mov rdi,2 saves a byte (the REX prefix), unless NASM is clever and does that anyway (actually, it does).
lea rsi, [rel msg] (or use default rel) is a shorter instruction than mov r64, imm64, though. (The AT&T mnemonic is movabs, but Intel syntax still calls it mov.)
Related
I wrote the following code to check if the 1st number- 'x' is greater than the 2nd number- 'y'. For x>y output should be 1 and for x<=y output should be 0.
section .txt
global _start
global checkGreater
_start:
mov rdi,x
mov rsi,y
call checkGreater
mov rax,60
mov rdi,0
syscall
checkGreater:
mov r8,rdi
mov r9,rsi
cmp r8,r9
jg skip
mov [c],byte '0'
skip:
mov rax,1
mov rdi,1
mov rsi,c
mov rdx,1
syscall
ret
section .data
x db 7
y db 5
c db '1',0
But for some reason (surely a mistake on my end), the code always prints 0 when executed.
I am using the following commands to run the code on Ubuntu 20.04.1 LTS with nasm 2.14.02-1
nasm -f elf64 fileName.asm
ld -s -o fileName fileName.o
./fileName
Where did I make a mistake?
And how should one debug assembly codes, I looked for printing received arguments in checkGreater, but it turns out that's a disturbing headache itself.
Note: In case someone is wondering why I didn't use x and y directly in checkGreater: I want to extend the comparison to user input, so I wrote the code this way.
The instructions
mov rdi,x
mov rsi,y
write the address of x into rdi, and the address of y into rsi. The rest of the code then compares those addresses, and the address of x is always lower than the address of y, since x is defined before y. So jg is never taken and the output is always 0.
What you should have written instead is
mov rdi,[x]
mov rsi,[y]
But then you have another problem: x and y variables are 1 byte long, while the destination registers are 8 bytes long. So simply doing the above fix will read extraneous bytes, leading to useless results. The final correction is to either fix the size of the variables (writing dq instead of db), or read them as bytes:
movzx rdi,byte [x]
movzx rsi,byte [y]
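The three variants (comparing addresses, loading 8 bytes from a 1-byte variable, and loading a zero-extended byte) can be modelled in C as a rough sketch. The array name data and the zero padding after c are assumptions made for illustration; what actually follows c in memory depends on the linker.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the question's .data section: x, y and c ('1', 0) back to
 * back, one byte each, followed by assumed zero padding. */
static const uint8_t data[8] = { 7, 5, '1', 0, 0, 0, 0, 0 };

/* Roughly what `mov rdi,x` / `mov rsi,y` followed by cmp tests:
 * the addresses. x sits before y, so this is never true. */
int address_of_x_is_greater(void)
{
    const uint8_t *x = &data[0];
    const uint8_t *y = &data[1];
    return x > y;
}

/* What `mov rdi,[x]` would load on x86-64: 8 bytes starting at x,
 * combined little-endian, so y and c leak into the upper bytes. */
uint64_t qword_load_at_x(void)
{
    uint64_t v = 0;
    assert(sizeof data == 8);          /* the loop reads exactly 8 bytes */
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | data[i];
    return v;
}

/* What `movzx rdi,byte [x]` / `movzx rsi,byte [y]` compares:
 * just the stored bytes, zero-extended to 64 bits. */
int value_of_x_is_greater(void)
{
    uint64_t x = data[0];
    uint64_t y = data[1];
    return x > y;
}
```

With this layout the 8-byte load yields 0x310507 rather than 7, which is why fixing only the missing brackets is not enough.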
As for
And how should one debug assembly codes
The main tool for you is an assembly-level debugger, like EDB on Linux or x64dbg on Windows. But in fact, most debuggers, even the ones intended for languages like C++, are capable of displaying disassembly for the program being debugged. So you can use e.g. GDB, or even a GUI wrapper for it like Qt Creator or Eclipse. Just be sure to switch to machine code mode, or use the appropriate commands like GDB's disassemble, stepi, info registers, etc.
Note that you don't have to build EDB or GDB from source (as the links above might suggest): they are likely already packaged in the Linux distribution you use. E.g. on Ubuntu the packages are called edb-debugger and gdb.
I have this piece of inline assembly code that should print A in text mode:
void print(){
asm volatile(
"mov ax,0xb800\n"
"mov ds,ax\n" /*<-as complains about this*/
"movb 0,'A'\n"
);
}
However, when I try to compile it with gcc (with -m32 and -masm=intel), I get:
./source/kernel.c: Assembler messages:
./source/kernel.c:4: Error: invalid instruction suffix for `mov'
btw this piece of code is from my operating system's kernel, so I can't use stdio.h or something like that.
Despite GCC's line numbering in the error message, that's not the line it's actually complaining about, it's the movb store. You can test that by commenting the other instructions. The error is actually printed by the assembler, with numbering based on .loc metadata directives from the compiler, and this is a multi-line asm template, so it's easy for that to go wrong I guess.
I suspect GAS .intel_syntax mode treats a literal 0 as an immediate, for consistency with mov al, 0 with 0 as a source operand. This of course can't work as a destination.
The "invalid instruction suffix" error message makes little sense, although note that Intel syntax doesn't use operand-size suffixes. (For some reason movb [0], 'A' is accepted, though.)
Instead use square brackets to avoid ambiguity; recommended for any memory operand, even if the address is a symbol instead of a literal number.
mov byte ptr [0], 'A'
mov byte ptr ds:0, 'A' also works, and is the syntax objdump -d -Mintel uses.
It's a good idea to always use square brackets on memory operands to remove any ambiguity, especially for people who might be used to the NASM flavour of Intel syntax.
I can find a Linux 64-bit system call table, but the call numbers do not work on macOS - I get a Bus Error: 10 whenever I try to use them.
What are the macOS call numbers for operations like sys_write?
You can get the list of system call numbers from user mode in (/usr/include/)sys/syscall.h. The numbers ARE NOT the same as in Linux. The file is autogenerated during XNU build from bsd/kern/syscalls/syscalls.master.
If you use the libsystem_kernel syscall export you can use the numbers as they are. If you use assembly you have to add 0x2000000 to mark them for the BSD layer (rather than 0x1000000, which would mean Mach traps, or 0x3000000, which would mean machine dependent).
To see examples of system call usage in assembly, you can easily disassemble the exported wrappers: x86_64's /usr/lib/system/libsystem_kernel.dylib (or ARM64's using jtool from the shared library cache).
You need to add 0x2000000 to the call number listed in a syscalls.master file. I'm using the XNU bsd/kern/syscalls.master file. Here's the entry in the syscalls.master file for the function that I'm going to call:
4 AUE_NULL ALL { user_ssize_t write(int fd, user_addr_t cbuf, user_size_t nbyte); }
In terms of which registers to pass arguments to, it's the same as 64-bit Linux. Arguments are passed through the rdi, rsi, rdx, r10, r8 and r9 registers, respectively. The write function takes three arguments, which are described in the following assembly:
mov rax, 0x2000004 ; sys_write call identifier
mov rdi, 1 ; STDOUT file descriptor
mov rsi, myMessage ; buffer to print
mov rdx, myMessageLen ; length of buffer
syscall ; make the system call
Error returns are different from Linux, though: on error, CF=1 and RAX=an errno code. (vs. Linux using rax=-4095..-1 as -errno in-band signalling.) See What is the relation between (carry flag) and syscall in assembly (x64 Intel syntax on Mac Os)?
RCX and R11 are overwritten by the syscall instruction itself, before any kernel code runs, so that part is necessarily the same as Linux.
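The two error conventions can be contrasted with a small C model. It is a sketch only: the function names are made up for illustration, and the carry flag is passed in as a plain int because C has no direct access to CF after a syscall instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Linux x86-64 convention: a raw return value in -4095..-1 is -errno.
 * Returns the errno code, or 0 if the call succeeded. */
int linux_errno_from_rax(int64_t rax)
{
    if (rax >= -4095 && rax <= -1)
        return (int)-rax;
    return 0;
}

/* macOS x86-64 convention: CF=1 signals an error and RAX holds the
 * (positive) errno code. `carry_flag` is the caller's copy of CF. */
int macos_errno_from_rax(uint64_t rax, int carry_flag)
{
    assert(carry_flag == 0 || carry_flag == 1);
    return carry_flag ? (int)rax : 0;
}
```

In hand-written asm the macOS check would typically be a jc right after syscall, while the Linux check compares rax against -4095.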
As was already pointed out, you need to add 0x2000000 to the call number. The explanation of that magic number comes from the xnu kernel sources in osfmk/mach/i386/syscall_sw.h (search SYSCALL_CLASS_SHIFT).
/*
* Syscall classes for 64-bit system call entry.
* For 64-bit users, the 32-bit syscall number is partitioned
* with the high-order bits representing the class and low-order
* bits being the syscall number within that class.
* The high-order 32-bits of the 64-bit syscall number are unused.
* All system classes enter the kernel via the syscall instruction.
System calls on OS X are divided into classes. All of them enter the kernel via the syscall instruction; the class encoded in the number then selects between invalid (NONE), Mach, Unix/BSD, machine-dependent and diagnostic calls.
#define SYSCALL_CLASS_NONE 0 /* Invalid */
#define SYSCALL_CLASS_MACH 1 /* Mach */
#define SYSCALL_CLASS_UNIX 2 /* Unix/BSD */
#define SYSCALL_CLASS_MDEP 3 /* Machine-dependent */
#define SYSCALL_CLASS_DIAG 4 /* Diagnostics */
Each system call is tagged with a class enumeration which is left-shifted 24 bits, SYSCALL_CLASS_SHIFT. The enumeration for BSD system calls is 2, SYSCALL_CLASS_UNIX. So that magic number 0x2000000 is constructed as:
// 2 << 24
#define SYSCALL_CONSTRUCT_UNIX(syscall_number) \
((SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | \
(SYSCALL_NUMBER_MASK & (syscall_number)))
Apparently you can get that magic number from the kernel sources but not from the developer include files. I think this means that Apple really wants you to link against library object files that resolve your system call shim rather than use an inline routine: object compatibility rather than source compatibility.
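To make the arithmetic concrete, here is a minimal C sketch of the same construction. The macro and function names are local to this example, and the 24-bit number mask is an assumption derived from the 24-bit shift described above, not copied from the header.

```c
#include <assert.h>
#include <stdint.h>

#define CLASS_SHIFT 24u                          /* class is shifted left 24 bits */
#define CLASS_UNIX  2u                           /* value of SYSCALL_CLASS_UNIX */
#define NUMBER_MASK ((1u << CLASS_SHIFT) - 1u)   /* assumption: low 24 bits hold the number */

/* Builds the value to load into rax for a BSD-class system call. */
uint32_t bsd_syscall_number(uint32_t number)
{
    assert((number & ~NUMBER_MASK) == 0);        /* number must fit below the class bits */
    return (CLASS_UNIX << CLASS_SHIFT) | (number & NUMBER_MASK);
}

/* Recovers the class from a full syscall value. */
uint32_t syscall_class_of(uint32_t value)
{
    return value >> CLASS_SHIFT;
}
```

For write (number 4 in syscalls.master) this gives 0x2000004, and for exit (number 1) it gives 0x2000001, the constants used in the asm above.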
On x86_64, the system call itself uses the System V ABI (section A.2.1) as Linux does, and it is made with the syscall instruction (int 0x80 is the 32-bit Linux mechanism). Arguments are passed in rdi, rsi, rdx, r10, r8 and r9. The syscall number is in the rax register.
I used "Assembly Language Step by Step" to learn assembly language programming on Linux. I recently got a Mac, on which int 0x80 doesn't seem to work (illegal instruction).
So I just wanted to know if there is a good reference (book/webpage) which gives the differences between the standard Unix assembly and Darwin assembly.
For practical purposes, this answer shows how to compile a hello world application using nasm on OSX.
This code can be compiled for linux as is, but the cmd-line command to compile it would probably differ:
section .text
global mystart ; make the main function externally visible
mystart:
; 1 print "hello, world"
; 1a prepare the arguments for the system call to write
push dword mylen ; message length
push dword mymsg ; message to write
push dword 1 ; file descriptor value
; 1b make the system call to write
mov eax, 0x4 ; system call number for write
sub esp, 4 ; OS X (and BSD) system calls need "extra space" on stack
int 0x80 ; make the actual system call
; 1c clean up the stack
add esp, 16 ; 3 args * 4 bytes/arg + 4 bytes extra space = 16 bytes
; 2 exit the program
; 2a prepare the argument for the sys call to exit
push dword 0 ; exit status returned to the operating system
; 2b make the call to sys call to exit
mov eax, 0x1 ; system call number for exit
sub esp, 4 ; OS X (and BSD) system calls need "extra space" on stack
int 0x80 ; make the system call
; 2c no need to clean up the stack because no code here would be executed: already exited
section .data
mymsg db "hello, world", 0xa ; string with a trailing newline (0xa is line feed)
mylen equ $-mymsg ; string length in bytes
Assemble the source (hello.nasm) to an object file:
nasm -f macho hello.nasm
Link to produce the executable:
ld -o hello -e mystart hello.o
This question will likely help: List of and documentation for system calls for XNU kernel in OSX.
Unfortunately, it looks like the book mentioned there is the only way to find out. As for int 0x80, I doubt it will work because it is a pretty Linux specific API that is built right into the kernel.
The compromise I make when working on an unfamiliar OS is to just use libc calls, but I can understand that even that may be too high level if you're just looking to learn.
Can you post your code and how you compiled it? (There are many ways to elicit illegal instruction errors.)
OSX picked up the BSD style of passing arguments, which is why you have to do things slightly differently.
I bookmarked this a while ago: http://www.freebsd.org/doc/en/books/developers-handbook/book.html#X86-SYSTEM-CALLS
So I was wondering if there is any? I know afd on Windows, but I'm not sure about anything on Mac.
And this is how I am using nasm on the following code: nasm a.asm -o a.com -l a.lst
[org 0x100]
mov ax, 5
mov bx, 10
add ax, bx
mov bx, 15
add ax, bx
mov ax, 0x4c00
int 0x21
On Windows I know a debugger named afd which helps me step through each statement, but I'm not sure how I can do this using gdb.
Nor am I able to execute this .com file; am I supposed to make some other file here?
Why are you writing 16-bit code that makes DOS syscalls? If you want to know how to write asm that's applicable to your OS, take a look at the code generated by "gcc -S" on some C code... (Note that code generated this way will have operands reversed, and is meant to be assembled with as instead of nasm)
Further, are you aware what this code is doing? It reads to me like this:
ax = 5
bx = 10
ax += bx
bx = 15
ax += bx
ax = 0x4c00
int 21h
Seems like this code is equivalent to:
mov bx, 15
mov ax, 0x4c00
int 21h
Which according to what I see here, is exit(0). You didn't need to change bx either...
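The register trace can be checked with a tiny C model. This is a sketch with made-up names; it only tracks ax and bx, and it relies on the DOS convention that int 21h takes its function number in AH (0x4c is "terminate") and the exit code in AL.

```c
#include <assert.h>
#include <stdint.h>

struct regs { uint16_t ax, bx; };

/* Replays the question's instructions on a two-register model. */
struct regs run_snippet(void)
{
    struct regs r = { 0, 0 };
    r.ax = 5;
    r.bx = 10;
    r.ax += r.bx;
    r.bx = 15;
    r.ax += r.bx;
    assert(r.ax == 30);   /* the sum exists only until the next line */
    r.ax = 0x4c00;        /* overwrites the sum before int 21h would run */
    return r;
}

/* High and low halves of ax, as int 21h sees them. */
uint8_t ah_of(uint16_t ax) { return (uint8_t)(ax >> 8); }
uint8_t al_of(uint16_t ax) { return (uint8_t)(ax & 0xff); }
```

At the point of the interrupt, AH is 0x4c and AL is 0, so the computed 30 is never used or reported anywhere.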
But. This doesn't even apply to what you were trying to do, because Mac OS X is not MS-DOS, does not know about DOS APIs, cannot run .COM files, etc. I wasn't even aware that it can run 16 bit code. You will want to look at nasm's -f elf option, and you will want to use registers like eax rather than ax.
I've not done assembly programming on OS X, but you could theoretically do something like this:
extern exit
global main
main:
push dword 0
call exit
; This will never get called, but hey...
add esp, 4
xor eax, eax
ret
Then:
nasm -f elf foo.asm -o foo.o
ld -o foo foo.o -lc
Of course this is relying on the C library, which you might not want to do. I've omitted the "full" version because I don't know what the syscall interface looks like on Mac. On many platforms your entry point is the symbol _start and you do syscalls with int 80h or sysenter.
As for debugging... I would also suggest GDB. You can advance by a single instruction with stepi, and the info registers command will dump register state. The disassemble command is also helpful.
Update: Just remembered, I don't think Mac OS X uses ELF... Well.. Much of what I wrote still applies. :-)
Xcode ships with GDB, the GNU Debugger.
Xcode 4 and newer ships with LLDB instead.
As others have said, use GDB, the GNU Debugger. When debugging assembly source, I usually find it useful to load a command file that contains something like the following:
display/5i $pc
display/x $eax
display/x $ebx
...
display/5i will display 5 instructions starting with the next to be executed. You can use the stepi command to step execution one instruction at a time. display/x $eax displays the contents of the eax register in hex. You will also likely want to use the x command to examine the contents of memory: x/x $eax, for example, prints the contents of the memory whose address is stored in eax.
These are a few of many commands. Download the GDB manual and skim through it to find other commands you may be interested in using.
IDA Pro does work on the Mac after a fashion (UI still runs on Windows; see an example).