I wrote the following code to check whether the first number, x, is greater than the second number, y. For x > y the output should be 1, and for x <= y the output should be 0.
section .txt
global _start
global checkGreater
_start:
mov rdi,x
mov rsi,y
call checkGreater
mov rax,60
mov rdi,0
syscall
checkGreater:
mov r8,rdi
mov r9,rsi
cmp r8,r9
jg skip
mov [c],byte '0'
skip:
mov rax,1
mov rdi,1
mov rsi,c
mov rdx,1
syscall
ret
section .data
x db 7
y db 5
c db '1',0
But for some reason (surely a mistake on my end), the code always outputs 0 when executed.
I am using the following commands to run the code on Ubuntu 20.04.1 LTS with nasm 2.14.02-1:
nasm -f elf64 fileName.asm
ld -s -o fileName fileName.o
./fileName
Where did I make a mistake?
And how should one debug assembly codes? I looked into printing the arguments received in checkGreater, but that turns out to be a headache in itself.
Note: In case someone is wondering why I didn't use x and y directly in checkGreater: I want to extend the comparison to user input, which is why I wrote the code this way.
The instructions
mov rdi,x
mov rsi,y
load the address of x into rdi and the address of y into rsi. The rest of the code then compares those addresses, and the address of x is always lower than the address of y, since x is defined before y.
What you should have written instead is
mov rdi,[x]
mov rsi,[y]
But then you have another problem: the variables x and y are 1 byte long, while the destination registers are 8 bytes long. So the fix above would read 7 extraneous bytes along with each variable, leading to useless results. The final correction is either to change the size of the variables (writing dq instead of db) or to read them as bytes:
movzx rdi,byte [x]
movzx rsi,byte [y]
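For reference, here is a sketch of the whole program with that fix applied. It assumes the same nasm and ld commands as in the question. The section name is spelled .text here on the assumption that the .txt in the question was a typo (NASM treats an unknown section name as an ordinary, non-executable section), and the local label .print is my own name.

```nasm
; Sketch: corrected version of the question's program (x86-64 Linux, NASM).
section .data
x   db 7
y   db 5
c   db '1'                  ; default answer: x > y

section .text               ; assumption: ".txt" in the question meant ".text"
global _start

_start:
    movzx rdi, byte [x]     ; load the value of x (1 byte), zero-extended
    movzx rsi, byte [y]     ; load the value of y (1 byte), zero-extended
    call checkGreater
    mov rax, 60             ; sys_exit
    mov rdi, 0              ; exit status 0
    syscall

checkGreater:
    cmp rdi, rsi            ; compare the values, not the addresses
    jg .print               ; x > y: keep the '1' already stored in c
    mov byte [c], '0'       ; x <= y: overwrite it with '0'
.print:
    mov rax, 1              ; sys_write
    mov rdi, 1              ; fd 1 = stdout
    mov rsi, c              ; address of the character to print
    mov rdx, 1              ; length: one byte
    syscall
    ret
```

With x = 7 and y = 5 this should print 1. Swapping the two values should print 0.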
As for
And how should one debug assembly codes
The main tool for you is an assembly-level debugger, like EDB on Linux or x64dbg on Windows. In fact, most debuggers, even the ones intended for languages like C++, can display the disassembly of the program being debugged. So you can use e.g. GDB, or even a GUI wrapper for it like Qt Creator or Eclipse. Just be sure to switch to machine code mode, or use the appropriate commands, like GDB's disassemble, stepi, info registers, etc.
Note that you don't have to build EDB or GDB from source (as the links above might suggest): they are likely already packaged in the Linux distribution you use. E.g. on Ubuntu the packages are called edb-debugger and gdb.
Related
I'm new to the GNU Debugger. I've been playing around with it, debugging assembly files (x86_64 Linux) for a day or so, and just a few hours ago I "discovered" the TUI interface.
My first attempt at using the TUI interface was to watch the registers change as I execute a simple Hello World program (in asm) one line at a time. Here is the code of the program:
section .data
text db "Hello, World!", 10
len equ $-text
section .text
global _start
_start:
nop
call _printText
mov rax, 60
mov rdi, 0
syscall
_printText:
nop
mov rax, 1
mov rdi, 1
mov rsi, text
mov rdx, len
syscall
ret
After creating the executable file, I run the following in a Linux terminal:
$ gdb -q ./hello -tui
Then I created three breakpoints: one right after _start, another right after _printText, and the last just above the mov rax, 60 for SYS_EXIT.
After this:
1) I run the program.
2) In gdb I type layout asm to see the code.
3) I type layout regs.
4) Finally I use stepi to see how the registers change as the hello world program executes.
The thing is that when the RIP register points to the address of ret, corresponding to SYS_EXIT, and I hit Enter, I get the following message in the console:
[Inferior 1 (process 2059) exited normally]
/build/gdb-cXfXJ3/gdb-7.11.1/gdb/thread.c:1100: internal-error: finish_thread_state: Assertion `tp' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
If I type n, the following appears (as the prompt says, it quits if I type y):
This is a bug, please report it. For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.
/build/gdb-cXfXJ3/gdb-7.11.1/gdb/thread.c:1100: internal-error: finish_thread_state: Assertion `tp' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n)
I don't know what a core file of GDB is (or what it is useful for), so I type n and the debugging session closes.
Does anyone know why this is happening and how can be fixed?
By the way, I'm also new to assembly, so if this happens because of something wrong in the program, I'd appreciate it if anyone could point that out.
I use the same GDB version as you and I always use the TUI features, but I've never had this problem. However, when I use your code, the internal GDB error occurs. But if I make one change in your write syscall function, the error does not manifest.
Although you are not calling another function from within a function, I generally create a stack frame by including at least the "push rbp", "mov rbp, rsp", and "leave" instructions in my x86-64 functions. This may be a band-aid or a workaround with respect to the "bug":
_printText:
push rbp
mov rbp, rsp
mov rax, 1
mov rdi, 1
mov rsi, text
mov rdx, len
syscall
leave
ret
Does anyone know why this is happening
It's happening because there is a bug in GDB (more precisely, an assertion that GDB internal variable tp is not NULL has been violated).
and how can be fixed?
You should try to reproduce this with a current version of GDB (the bug may already have been fixed) and, if it still occurs, file a bug report (as the message tells you).
I don't know what a core file of GDB (and what is useful for),
It's only useful to GDB developers.
The book Assembly Language Step by Step provides the following code as a sandbox:
section .data
section .text
global _start
_start:
nop
; insert sandbox code here
nop
Any example that I include in the space for sandbox is creating a segmentation fault. For example, adding this code:
mov ax, 067FEh
mov bx, ax
mov cl, bh
mov ch, bl
Then compiling with:
nasm -f macho sandbox.asm
ld -o sandbox -e _start sandbox.o
causes a segmentation fault when I run it on OS X. Is there a way to get more information about what's causing the segmentation fault?
The problem you have is that you have created a program that runs past the end of the code that you have written.
When your program executes, the loader will end up issuing a jmp to your _start. Your code then runs, but you do not have anything to return to the OS at the end, so it will simply continue running, executing whatever bytes happen to be in RAM after your code.
The simplest fix would be to properly exit the code. For example:
mov eax, 0x1 ; system call number for exit
sub esp, 4 ; OS X system calls need "extra space" on the stack
int 0x80
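As a sketch, the sandbox from the question with that exit sequence appended could look like this. It assumes the 32-bit BSD-style system-call convention used above (arguments on the stack, call number in eax). The push of an exit status of 0 is my addition and is not part of the snippet above. None of this applies to newer macOS releases that no longer run 32-bit programs.

```nasm
; Sketch: sandbox plus a clean exit (32-bit Mach-O, NASM).
section .data
section .text
global _start
_start:
    nop
    mov ax, 067FEh
    mov bx, ax
    mov cl, bh
    mov ch, bl
    nop
    push dword 0            ; exit status (assumed: 0)
    mov eax, 0x1            ; system call number for exit
    sub esp, 4              ; the extra stack slot OS X expects
    int 0x80                ; does not return
```

It is meant to be built with the same nasm -f macho and ld -e _start commands as in the question.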
Since you are not generating any actual output, you would need to step through with a debugger to see what's going on. After compiling you could use lldb to step through.
lldb ./sandbox
image dump sections
Make note of the address listed that is of type code for your executable (not dyld). It will likely be 0x0000000000001fe6. Continuing within lldb:
b s -a 0x0000000000001fe6
run
register read
step
register read
step
register read
At this point you should be past the NOPs and see things changing in registers. Have fun!
I have this code
global start
section .text
start:
mov rax,0x2000004
mov rdi,1
mov rsi,msg
mov rdx,msg.len
syscall
mov rax,0x2000004
mov rdi,2
mov rsi,msgt
mov rdx,msgt.len
syscall
mov rax,0x2000004
mov rdi,3
mov rsi,msgtn
mov rdx,msgtn.len
syscall
mov rax,0x2000001
mov rdi,0
syscall
section .data
msg: db "This is a string",10
.len: equ $ - msg
var: db 1
msgt: db "output of 1+1: "
.len: equ $ - msgt
msgtn: db 1
.len: equ $ - msgtn
I want to print the variable msgtn. I tried msgt: db "output of 1+1", var
But the NASM assembler failed with:
second.s:35: error: Mach-O 64-bit format does not support 32-bit absolute addresses
Instead of the variable, I also tried "output of 1+1", [1+1], but I got:
second.s:35: error: expression syntax error
I also tried it without the parentheses, but then there was no number, only the string "1+1".
The command I used to assemble my program was:
/usr/local/Cellar/nasm/*/bin/nasm -f macho64 second.s && ld -macosx_version_min 10.7.0 second.o second.o
nasm -v shows:
NASM version 2.11.08 compiled on Nov 27 2015
OS X 10.9.5 with Intel core i5 (x86_64 assembly)
db directives let you put assemble-time-constant bytes into the object file (usually in the data section). You can use an expression as an argument, to have the assembler do some math for you at assemble time. Anything that needs to happen at run time needs to be done by instructions that you write, and that get run. It's not like C++ where a global variable can have a constructor that gets run at startup behind the scenes.
msgt: db "output of 1+1", var
would place those ascii characters, followed by (the low byte of?) the absolute address of var. You'd use this kind of thing (with dd or dq) to do something like this C: int var; int *global_ptr = &var;, where you have a global/static pointer variable that starts out initialized to point to another global/static variable. I'm not sure if MacOS X allows this with a 64bit pointer, or if it just refuses to do relocations for 32bit addresses. But that's why you're getting:
second.s:35: error: Mach-O 64-bit format does not support 32-bit absolute addresses
Notice that the numeric value of the pointer depends on where in virtual address space the code is loaded. So the address isn't strictly an assemble-time constant. The linker needs to mark things that need run-time relocation, like those 64bit immediate-constant addresses you mov into registers (mov rsi,msg). See this answer for some information on the difference between that and lea rsi, [rel msg] to get the address into a register using a RIP-relative method. (That answer has links to more detailed info, and so does the x86 tag wiki.)
Your attempt at using db [1+1]: What the heck were you expecting? [] in NASM syntax means memory reference. First: the resulting byte has to be an assemble-time constant. I'm not sure if there's an easy syntax for duplicating whatever's at some other address, but this isn't it. (I'd just define a macro and use it in both places.) Second: 2 is not a valid address.
msgt: db "output of 1+1: ", '0' + 1 + 1, 10
would put the ASCII characters output of 1+1: 2\n at that point in the object file. 10 is the decimal value of the ASCII newline. '0' is a way of writing 0x30, the ASCII encoding of the character '0'. A 2 byte is not a printable ASCII character. Your version that did that would have printed a 2 byte there, but you wouldn't notice unless you piped the output into hexdump (or od -t x1c or something; I don't know what OS X provides. od isn't very nice, but it is widely available.)
Note that this string is not null-terminated. If you want to pass it to something expecting an implicit-length string (like fputs(3) or strchr(3), instead of write(2) or memchr(3)), tack on an extra , 0 to add a zero-byte after everything else.
If you wanted to do the math at run-time, you need to get the data into a register, add it, then store a string representation of the number into a buffer somewhere. (Or print it one byte at a time, but that's horrible.)
The easy way is to just call printf, to easily print a constant string with some stuff substituted in. Spend your time writing asm for the part of your code that needs to be hand-tuned, not re-implementing library functions.
There's some discussion of int-to-string in comments.
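As a rough sketch of the run-time approach described above, something like the following could work for a result that fits in a single decimal digit. It assumes 64-bit macOS with the same syscall numbers as the question. The names digit and msgt_len are mine, and a multi-digit result would need a real int-to-string loop instead of the single add of '0'.

```nasm
; Sketch: compute 1+1 at run time and print it (macOS x86-64, NASM).
default rel                     ; make [label] RIP-relative

global start

section .text
start:
    movzx eax, byte [var]       ; eax = 1 (value of var)
    add   eax, eax              ; eax = 1 + 1
    add   al, '0'               ; single digit -> ASCII character
    mov   [digit], al           ; patch it into the message buffer

    mov   rax, 0x2000004        ; write
    mov   rdi, 1                ; stdout
    lea   rsi, [msgt]           ; address of the message
    mov   rdx, msgt_len         ; length, including digit and newline
    syscall

    mov   rax, 0x2000001        ; exit
    mov   rdi, 0
    syscall

section .data
var:      db 1
msgt:     db "output of 1+1: "
digit:    db '?', 10            ; placeholder, overwritten at run time
msgt_len  equ $ - msgt
```

This should print output of 1+1: 2 followed by a newline. The db line only reserves a placeholder byte. The actual digit is produced by instructions that run.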
Your link command looks funny:
ld -macosx_version_min 10.7.0 second.o second.o
Are you sure you want the same .o twice?
You could save some code bytes by only moving to 32bit registers when you don't need sign-extension into the 64bit reg. e.g. mov edi,2 instead of mov rdi,2 saves a byte (the REX prefix), unless NASM is clever and does that anyway (actually, it does).
lea rsi, [rel msg] (or use default rel) is a shorter instruction than mov r64, imm64, though. (The AT&T mnemonic is movabs, but Intel syntax still calls it mov.)
I have written a C++ file and I want to output it as assembly. However, I want the assembly to be as clean as the example below:
.386
.model flat, c
; Custom Build Step, including a listing file placed in intermediate directory
; but without Source Browser information
; debug:
; ml -c -Zi "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
; release:
; ml -c "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
; outputs:
; $(IntDir)\$(InputName).obj
; Custom Build Step, including a listing file placed in intermediate directory
; and Source Browser information also placed in intermediate directory
; debug:
; ml -c -Zi "-Fl$(IntDir)\$(InputName).lst" "-FR$(IntDir)\$(InputName).sbr" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
; release:
; ml -c "-Fl$(IntDir)\$(InputName).lst" "-FR$(IntDir)\$(InputName).sbr" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
; outputs:
; $(IntDir)\$(InputName).obj
; $(IntDir)\$(InputName).sbr
.code
_TEXT SEGMENT
_p$ = -8
_Array$ = 8
_size$ = 12
ClearUsingPointers PROC NEAR ; ClearUsingPointers, COMDAT
; Line 15
push ebp
mov ebp, esp
sub esp, 204 ; 000000ccH
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-204]
mov ecx, 51 ; 00000033H
mov eax, -858993460 ; ccccccccH
rep stosd
; Line 17
mov eax, DWORD PTR _Array$[ebp]
mov DWORD PTR _p$[ebp], eax
jmp SHORT $L280
$L281:
mov eax, DWORD PTR _p$[ebp]
add eax, 4
mov DWORD PTR _p$[ebp], eax
$L280:
mov eax, DWORD PTR _size$[ebp]
mov ecx, DWORD PTR _Array$[ebp]
lea edx, DWORD PTR [ecx+eax*4]
cmp DWORD PTR _p$[ebp], edx
jae SHORT $L278
; Line 18
mov eax, DWORD PTR _p$[ebp]
mov DWORD PTR [eax], 0
jmp SHORT $L281
$L278:
; Line 19
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
ClearUsingPointers ENDP ; ClearUsingPointers
_TEXT ENDS
END
How was the above assembly generated? The one that I am able to generate is full of garbage (I don't know how else to describe it). How can I shorten it so that I can optimize it manually, compile it, and run it? By garbage I am referring to multiple lines like those below. Can I delete them?
PUBLIC ?value@?$integral_constant@_N$0A@@tr1@std@@2_NB ; std::tr1::integral_constant<bool,0>::value
PUBLIC ?value@?$integral_constant@_N$00@tr1@std@@2_NB ; std::tr1::integral_constant<bool,1>::value
PUBLIC ?value@?$integral_constant@I$0A@@tr1@std@@2IB ; std::tr1::integral_constant<unsigned int,0>::value
PUBLIC ?_Rank@?$_Arithmetic_traits@_N@std@@2HB ; std::_Arithmetic_traits<bool>::_Rank
PUBLIC ?_Rank@?$_Arithmetic_traits@D@std@@2HB ; std::_Arithmetic_traits<char>::_Rank
PUBLIC ?_Rank@?$_Arithmetic_traits@C@std@@2HB ; std::_Arithmetic_traits<signed char>::_Rank
PUBLIC ?_Rank@?$_Arithmetic_traits@E@std@@2HB ; std::_Arithmetic_traits<unsigned char>::_Rank
; COMDAT ?end@?$_Iosb@H@std@@2W4_Seekdir@12@B
CONST SEGMENT
?end@?$_Iosb@H@std@@2W4_Seekdir@12@B DD 02H ; std::_Iosb<int>::end
CONST ENDS
; COMDAT ?cur@?$_Iosb@H@std@@2W4_Seekdir@12@B
CONST SEGMENT
?cur@?$_Iosb@H@std@@2W4_Seekdir@12@B DD 01H ; std::_Iosb<int>::cur
CONST ENDS
; COMDAT ?beg@?$_Iosb@H@std@@2W4_Seekdir@12@B
CONST SEGMENT
?beg@?$_Iosb@H@std@@2W4_Seekdir@12@B DD 00H ; std::_Iosb<int>::beg
CONST ENDS
; COMDAT ?binary@?$_Iosb@H@std@@2W4_Openmode@12@B
CONST SEGMENT
?binary@?$_Iosb@H@std@@2W4_Openmode@12@B DD 020H ; std::_Iosb<int>::binary
CONST ENDS
In your project properties, under C/C++ settings, Output Files, select Assembly Output. The output will depend on which C/C++ optimization settings you select.
You will get precisely the assembly output you desire from Visual C++ by compiling with the /FA switch. This emits a listing containing only the instructions. Your other options include:
/FAb to get the instructions, followed by a comment stating its actual size (in bytes)
/FAc to get the instructions, preceded by the actual bytes used to encode that instruction
/FAs to get the instructions, with comments interspersed extracted from your actual C/C++ source code, showing what was responsible for generating those chunks of assembly code
Various combinations of these are also allowed, following the standard syntax for CL's command-line switches. For example, /FAcs will produce a rather complex-looking listing containing the raw bytes, assembly opcodes, and commented extracts from your source code.
This can also be controlled, as Keith Nicholas mentioned, in the "Assembly Output" setting under "C/C++ Settings" in the project options within the Visual Studio GUI. Most of the available options are there, but b isn't. You'll need to specify it manually if you want to use it. (I think it might actually be an undocumented option, but it's worked on every version of MSVC I've ever seen.)
The output of /FA alone is very lean. The only noise you get is the comments indicating the lines of your source code that are responsible for that particular chunk of assembly instructions. This is precisely what is shown in the first example from your question. I wish there were a way to prevent these from being included, but I can't find one. It makes it very difficult to easily diff the implementations of two variants of a function. I have an app that strips them out manually.
Note, of course, that none of this has anything to do with optimization. The actual binary code that the compiler generates (assuming, that is, you aren't passing the /c switch, which does a compile only without linking, but will still generate assembly listings) is identical, regardless of which variation of the /FA switch that you use. None of this additional information has any effect whatsoever. It is only for your benefit, aiding you when you are analyzing the code.
As for your real question, about eliminating the "garbage" shown in your second snippet… That simply comes from having included standard-library headers, which define a bunch of symbols and other junk that the compiler has to embed in the object files in order to make it possible for the linker to do its job. There is no way to prevent this from showing up. You have only two options:
If you aren't actually using the standard library, then don't include any of its headers. This will give you much "cleaner" output when using /FA.
If you are using the standard library, and need it to get the code to compile, then you'll just have to ignore it.
Notice that the "garbage" is only at the top of the file, making it possible to easily strip it out manually. When you're trying to analyze the generated object code, either just to understand what the compiler is doing or use it as a starting point to build your own optimized implementation, all you need to do is load the file in a text editor, search for the name of the function(s) you're interested in, and zip right to the relevant code. There will be no garbage there; just the required code.
I should point out that, if you are aiming to take the compiler-generated assembly listings, tweak the code slightly, and then run the whole shebang through an assembler (e.g., MASM), you can forget about it. There's no guarantee that it will work. The /FA assembly listings aren't designed to be fed back into an assembler. They are informational only. Extract the information you need from them, write the assembly code using the compiler's version as a basis, and then feed your clean source files into the assembler.
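To illustrate that last point, a hand-written routine based on the listing from the question might be sketched as below. It is written in NASM syntax rather than MASM. It assumes a 32-bit cdecl function with the prototype void ClearUsingPointers(int *Array, int size), which is inferred from the stack offsets in the listing and not confirmed by the question. The labels .next and .done are my own.

```nasm
; Sketch: hand-written equivalent of the listing's loop (32-bit, cdecl, NASM).
; Assumed C prototype: void ClearUsingPointers(int *Array, int size);
; On 32-bit Windows the C symbol usually carries a leading underscore,
; so the label may need to be _ClearUsingPointers there.
global ClearUsingPointers

section .text
ClearUsingPointers:
    mov eax, [esp+4]            ; eax = p = Array
    mov ecx, [esp+8]            ; ecx = size
    lea edx, [eax+ecx*4]        ; edx = &Array[size], one past the end
.next:
    cmp eax, edx
    jae .done                   ; stop once p >= end
    mov dword [eax], 0          ; *p = 0
    add eax, 4                  ; p++
    jmp .next
.done:
    ret
```

Only eax, ecx and edx are touched, which cdecl treats as caller-saved, so no stack frame or register saves are needed. The debug-only buffer fill with 0xCCCCCCCC from the listing is deliberately left out.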
So I was wondering whether there is one. I know AFD on Windows, but I'm not sure about anything for Mac.
And this is how I am using nasm on the following code: nasm a.asm -o a.com -l a.lst
[org 0x100]
mov ax, 5
mov bx, 10
add ax, bx
mov bx, 15
add ax, bx
mov ax, 0x4c00
int 0x21
On Windows I know a debugger named AFD, which lets me step through each statement, but I'm not sure how I can do this using gdb.
I am also unable to execute this .com file; am I supposed to produce some other kind of file here?
Why are you writing 16-bit code that makes DOS syscalls? If you want to know how to write asm that's applicable to your OS, take a look at the code generated by "gcc -S" on some C code... (Note that code generated this way will have its operands reversed, and is meant to be assembled with as instead of nasm.)
Further, are you aware what this code is doing? It reads to me like this:
ax = 5
bx = 10
ax += bx
bx = 15
ax += bx
ax = 0x4c00
int 21h
Seems like this code is equivalent to:
mov bx, 15
mov ax, 4c00h
int 21h
Which according to what I see here, is exit(0). You didn't need to change bx either...
But. This doesn't even apply to what you were trying to do, because Mac OS X is not MS-DOS, does not know about DOS APIs, cannot run .COM files, etc. I wasn't even aware that it can run 16 bit code. You will want to look at nasm's -f elf option, and you will want to use registers like eax rather than ax.
I've not done assembly programming on OS X, but you could theoretically do something like this:
extern exit
global main
main:
push dword 0
call exit
; This will never get called, but hey...
add esp, 4
xor eax, eax
ret
Then:
nasm -f elf foo.asm -o foo.o
ld -o foo foo.o -lc
Of course this is relying on the C library, which you might not want to do. I've omitted the "full" version because I don't know what the syscall interface looks like on Mac. On many platforms your entry point is the symbol _start and you do syscalls with int 80h or sysenter.
As for debugging... I would also suggest GDB. You can advance by a single instruction with stepi, and the info registers command will dump register state. The disassemble command is also helpful.
Update: Just remembered, I don't think Mac OS X uses ELF... Well.. Much of what I wrote still applies. :-)
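For what it's worth, a version that does not rely on the C library might be sketched like this for 64-bit OS X. It assumes the Mach-O 64-bit object format (nasm -f macho64), an entry symbol named start, and the BSD syscall numbering of 0x2000000 plus the call number. I have not checked it against the 32-bit setup discussed above.

```nasm
; Sketch: minimal exit(0) for 64-bit OS X without the C library (NASM).
global start

section .text
start:
    mov rax, 0x2000001      ; exit: BSD syscall class 0x2000000 + call number 1
    mov rdi, 0              ; exit status
    syscall
```

Single-stepping this in a debugger shows rax and rdi being loaded before the process exits at the syscall instruction.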
Xcode ships with GDB, the GNU Debugger.
Xcode 4 and newer ships with LLDB instead.
As others have said, use GDB, the GNU debugger. When debugging assembly source, I usually find it useful to load a command file that contains something like the following:
display/5i $pc
display/x $eax
display/x $ebx
...
display/5i will display 5 instructions starting with the next to be executed. You can use the stepi command to step execution one instruction at a time. display/x $eax displays the contents of the eax register in hex. You will also likely want to use the x command to examine the contents of memory: x/x $eax, for example, prints the contents of the memory whose address is stored in eax.
These are a few of many commands. Download the GDB manual and skim through it to find other commands you may be interested in using.
IDA Pro does work on the Mac after a fashion (UI still runs on Windows; see an example).