How to produce optimized assembly from C++ code in Visual Studio 2010

I have written a C++ file and I want to output it as assembly. However, I want the assembly to be optimized like the example below:
.386
.model flat, c
; Custom Build Step, including a listing file placed in intermediate directory
; but without Source Browser information
; debug:
; ml -c -Zi "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
; release:
; ml -c "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
; outputs:
; $(IntDir)\$(InputName).obj
; Custom Build Step, including a listing file placed in intermediate directory
; and Source Browser information also placed in intermediate directory
; debug:
; ml -c -Zi "-Fl$(IntDir)\$(InputName).lst" "-FR$(IntDir)\$(InputName).sbr" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
; release:
; ml -c "-Fl$(IntDir)\$(InputName).lst" "-FR$(IntDir)\$(InputName).sbr" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
; outputs:
; $(IntDir)\$(InputName).obj
; $(IntDir)\$(InputName).sbr
.code
_TEXT SEGMENT
_p$ = -8
_Array$ = 8
_size$ = 12
ClearUsingPointers PROC NEAR ; ClearUsingPointers, COMDAT
; Line 15
push ebp
mov ebp, esp
sub esp, 204 ; 000000ccH
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-204]
mov ecx, 51 ; 00000033H
mov eax, -858993460 ; ccccccccH
rep stosd
; Line 17
mov eax, DWORD PTR _Array$[ebp]
mov DWORD PTR _p$[ebp], eax
jmp SHORT $L280
$L281:
mov eax, DWORD PTR _p$[ebp]
add eax, 4
mov DWORD PTR _p$[ebp], eax
$L280:
mov eax, DWORD PTR _size$[ebp]
mov ecx, DWORD PTR _Array$[ebp]
lea edx, DWORD PTR [ecx+eax*4]
cmp DWORD PTR _p$[ebp], edx
jae SHORT $L278
; Line 18
mov eax, DWORD PTR _p$[ebp]
mov DWORD PTR [eax], 0
jmp SHORT $L281
$L278:
; Line 19
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
ClearUsingPointers ENDP ; ClearUsingPointers
_TEXT ENDS
END
How has the above assembly been generated? The one that I am able to generate is full of garbage (I don't know how else to explain it). How can I shorten it so I can optimize it manually, compile it and run it? By garbage I am referring to multiple lines like those below. Can I delete them?
PUBLIC ?value@?$integral_constant@_N$0A@@tr1@std@@2_NB ; std::tr1::integral_constant<bool,0>::value
PUBLIC ?value@?$integral_constant@_N$00@tr1@std@@2_NB ; std::tr1::integral_constant<bool,1>::value
PUBLIC ?value@?$integral_constant@I$0A@@tr1@std@@2IB ; std::tr1::integral_constant<unsigned int,0>::value
PUBLIC ?_Rank@?$_Arithmetic_traits@_N@std@@2HB ; std::_Arithmetic_traits<bool>::_Rank
PUBLIC ?_Rank@?$_Arithmetic_traits@D@std@@2HB ; std::_Arithmetic_traits<char>::_Rank
PUBLIC ?_Rank@?$_Arithmetic_traits@C@std@@2HB ; std::_Arithmetic_traits<signed char>::_Rank
PUBLIC ?_Rank@?$_Arithmetic_traits@E@std@@2HB ; std::_Arithmetic_traits<unsigned char>::_Rank
; COMDAT ?end@?$_Iosb@H@std@@2W4_Seekdir@12@B
CONST SEGMENT
?end@?$_Iosb@H@std@@2W4_Seekdir@12@B DD 02H ; std::_Iosb<int>::end
CONST ENDS
; COMDAT ?cur@?$_Iosb@H@std@@2W4_Seekdir@12@B
CONST SEGMENT
?cur@?$_Iosb@H@std@@2W4_Seekdir@12@B DD 01H ; std::_Iosb<int>::cur
CONST ENDS
; COMDAT ?beg@?$_Iosb@H@std@@2W4_Seekdir@12@B
CONST SEGMENT
?beg@?$_Iosb@H@std@@2W4_Seekdir@12@B DD 00H ; std::_Iosb<int>::beg
CONST ENDS
; COMDAT ?binary@?$_Iosb@H@std@@2W4_Openmode@12@B
CONST SEGMENT
?binary@?$_Iosb@H@std@@2W4_Openmode@12@B DD 020H ; std::_Iosb<int>::binary
CONST ENDS

In your project properties, under the C/C++ settings, go to Output Files and select Assembly Output. What the listing contains will depend on which C/C++ optimization settings you select.

You will get precisely the assembly output you desire from Visual C++ by compiling with the /FA switch. This emits a listing containing only the instructions. Your other options include:
/FAb to get the instructions, followed by a comment stating its actual size (in bytes)
/FAc to get the instructions, preceded by the actual bytes used to encode that instruction
/FAs to get the instructions, with comments interspersed extracted from your actual C/C++ source code, showing what was responsible for generating those chunks of assembly code
Various combinations of these are also allowed, following the standard syntax for CL's command-line switches. For example, /FAcs will produce a rather complex-looking listing containing the raw bytes, assembly opcodes, and commented extracts from your source code.
This can also be controlled, as Keith Nicholas mentioned, in the "Assembly Output" setting under "C/C++ Settings" in the project options within the Visual Studio GUI. Most of the available options are there, but b isn't. You'll need to specify it manually if you want to use it. (I think it might actually be an undocumented option, but it's worked on every version of MSVC I've ever seen.)
The output of /FA alone is very lean. The only noise you get is the comments indicating which lines of your source code are responsible for each chunk of assembly instructions. This is precisely what is shown in the first example from your question. I wish there were a way to prevent these from being included, but I can't find one. They make it very difficult to diff the implementations of two variants of a function. I have an app that strips them out manually.
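The stripping step can be approximated in a few lines of script. The following is a minimal sketch, not the app mentioned above: it assumes the markers have the "; Line N" form seen in the first listing of the question, and the name strip_line_markers is made up for illustration.

```python
import re

# Matches a line that holds nothing but a source-line marker such as "; Line 17".
# Assumption: markers look like the ones in the listing from the question.
LINE_MARKER = re.compile(r"^\s*;\s*Line\s+\d+\s*$")


def strip_line_markers(listing: str) -> str:
    """Return the listing with all '; Line N' marker lines removed."""
    kept = [line for line in listing.splitlines() if not LINE_MARKER.match(line)]
    return "\n".join(kept)


sample = """\
; Line 17
mov eax, DWORD PTR _Array$[ebp]
mov DWORD PTR _p$[ebp], eax
; Line 18
mov eax, DWORD PTR _p$[ebp]
mov DWORD PTR [eax], 0 ; trailing comments are kept
"""

print(strip_line_markers(sample))
```

Running two listings through such a filter before diffing them leaves only the instructions and ordinary comments to compare.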
Note, of course, that none of this has anything to do with optimization. The actual binary code that the compiler generates (assuming, that is, you aren't passing the /c switch, which compiles only without linking, but will still generate assembly listings) is identical regardless of which variation of the /FA switch you use. None of this additional information has any effect whatsoever on it; it is only for your benefit when you are analyzing the code. How optimized the listed code is depends on the optimization switches (such as /O2) that you compile with.
As for your real question, about eliminating the "garbage" shown in your second snippet… That simply comes from having included standard-library headers, which define a bunch of symbols and other junk that the compiler has to embed in the object files in order to make it possible for the linker to do its job. There is no way to prevent this from showing up. You have only two options:
If you aren't actually using the standard library, then don't include any of its headers. This will give you much "cleaner" output when using /FA.
If you are using the standard library, and need it to get the code to compile, then you'll just have to ignore it.
Notice that the "garbage" is only at the top of the file, making it possible to easily strip it out manually. When you're trying to analyze the generated object code, either just to understand what the compiler is doing or use it as a starting point to build your own optimized implementation, all you need to do is load the file in a text editor, search for the name of the function(s) you're interested in, and zip right to the relevant code. There will be no garbage there; just the required code.
I should point out that, if you are aiming to take the compiler-generated assembly listings, tweak the code slightly, and then run the whole shebang through an assembler (e.g., MASM), you can forget about it. There's no guarantee that it will work. The /FA assembly listings aren't designed to be fed back into an assembler. They are informational only. Extract the information you need from them, write the assembly code using the compiler's version as a basis, and then feed your clean source files into the assembler.

Related

Trouble debugging assembly code for greater of two numbers

I wrote the following code to check if the 1st number- 'x' is greater than the 2nd number- 'y'. For x>y output should be 1 and for x<=y output should be 0.
section .text
global _start
global checkGreater
_start:
mov rdi,x
mov rsi,y
call checkGreater
mov rax,60
mov rdi,0
syscall
checkGreater:
mov r8,rdi
mov r9,rsi
cmp r8,r9
jg skip
mov [c],byte '0'
skip:
mov rax,1
mov rdi,1
mov rsi,c
mov rdx,1
syscall
ret
section .data
x db 7
y db 5
c db '1',0
But for some reason (of course a mistake on my end), the code always gives 0 as the output when executed.
I am using the following commands to run the code on Ubuntu 20.04.1 LTS with nasm 2.14.02-1:
nasm -f elf64 fileName.asm
ld -s -o fileName fileName.o
./fileName
Where did I make a mistake?
And how should one debug assembly code? I looked into printing the arguments received in checkGreater, but that turned out to be a headache of its own.
Note: If someone is wondering why I didn't directly use x and y in checkGreater, I want to extend the comparison to user inputs, so I wrote the code that way.

The instructions
mov rdi,x
mov rsi,y
write the address of x into rdi, and the address of y into rsi. The code then goes on to compare the addresses, and the address of x is always lower than the address of y, since x is defined before y.
What you should have written instead is
mov rdi,[x]
mov rsi,[y]
But then you have another problem: x and y variables are 1 byte long, while the destination registers are 8 bytes long. So simply doing the above fix will read extraneous bytes, leading to useless results. The final correction is to either fix the size of the variables (writing dq instead of db), or read them as bytes:
movzx rdi,byte [x]
movzx rsi,byte [y]
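To see the three cases side by side, here is a small simulation of the reads. It is only a sketch: the load address DATA_BASE and the zero bytes after c are assumptions made for the illustration, since the real neighbouring bytes depend on the linker.

```python
# Toy model of the .data section from the question: x db 7, y db 5, c db '1',0.
# DATA_BASE and the zero padding after the last byte are assumptions made for
# this illustration; the real bytes following c depend on the linker.
DATA_BASE = 0x402000
data = bytes([7, 5, ord("1"), 0]) + bytes(8)
X_ADDR = DATA_BASE        # what "mov rdi,x" loads
Y_ADDR = DATA_BASE + 1    # what "mov rsi,y" loads


def load(addr: int, size: int) -> int:
    """Read `size` bytes at `addr` as a little-endian unsigned integer."""
    offset = addr - DATA_BASE
    return int.from_bytes(data[offset:offset + size], "little")


# 1) Original code: the addresses themselves are compared.
print("addresses:", X_ADDR > Y_ADDR)                    # False, so '0' is printed

# 2) mov rdi,[x] / mov rsi,[y]: 8 bytes are read, including the neighbours.
print("qwords:   ", hex(load(X_ADDR, 8)), hex(load(Y_ADDR, 8)))

# 3) movzx rdi,byte [x] / movzx rsi,byte [y]: only the intended byte is read.
print("bytes:    ", load(X_ADDR, 1), load(Y_ADDR, 1))   # 7 5
```

In case 2 the value read for x also contains y and c in its upper bytes, which is why the comparison result no longer reflects the two numbers alone.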
As for "And how should one debug assembly codes":
The main tool for you is an assembly-level debugger, like EDB on Linux or x64dbg on Windows. But in fact, most debuggers, even the ones intended for languages like C++, are capable of displaying disassembly for the program being debugged. So you can use e.g. GDB, or even a GUI wrapper for it like Qt Creator or Eclipse. Just be sure to switch to machine code mode, or use the appropriate commands like GDB's disassemble, stepi, info registers, etc.
Note that you don't have to build EDB or GDB from source: they are likely already packaged in the Linux distribution you use. E.g. on Ubuntu the packages are called edb-debugger and gdb.

Assembly code of malloc

I want to view the assembly code of malloc(), calloc() and free() but when I print the assembly code on radare2 it gives me the following code:
push rbp
mov rbp, rsp
sub rsp, 0x10
mov eax, 0xc8
mov edi, eax
call sym.imp.malloc
xor ecx, ecx
mov qword [local_8h], rax
mov eax, ecx
add rsp, 0x10
pop rbp
ret
How can I see sym.imp.malloc function code? Is there any way to see the code or any website to see the assembly?

Since libc is an open-source library, it is freely available and you can simply read the source code.
The source code of malloc is available in many places online, and you can view the source of different versions of libc under malloc/malloc.c.
The symbol sym.imp.malloc is how radare flags the address of malloc in the PLT (Procedure Linkage Table), not the function itself.
Reading the assembly of the function can be done in several ways:
Open your local libc library with radare2, seek to malloc, analyze the function and then print its disassembly:
$ r2 /usr/lib/libc.so.6
[0x00020630]> s sym.malloc
[0x0007c620]> af
[0x0007c620]> pdf
If you want to see malloc when linked to another binary you need to open the binary in debug mode, then step to main to make it load the library, then search for the address of malloc, seek to it, analyze the function and print the disassembly:
$ r2 -d /bin/ls
Process with PID 20540 started...
= attach 20540 20540
bin.baddr 0x00400000
Using 0x400000
Assuming filepath /bin/ls
asm.bits 64
[0x7fa764841d80]> dcu main
Continue until 0x004028b0 using 1 bpsize
hit breakpoint at: 4028b0
[0x004028b0]> dmi libc malloc~name=malloc$
vaddr=0x7fa764315620 paddr=0x0007c620 ord=4162 fwd=NONE sz=388 bind=LOCAL type=FUNC name=malloc
vaddr=0x7fa764315620 paddr=0x0007c620 ord=5225 fwd=NONE sz=388 bind=LOCAL type=FUNC name=malloc
vaddr=0x7fa764315620 paddr=0x0007c620 ord=5750 fwd=NONE sz=388 bind=GLOBAL type=FUNC name=malloc
vaddr=0x7fa764315620 paddr=0x0007c620 ord=7013 fwd=NONE sz=388 bind=GLOBAL type=FUNC name=malloc
[0x004028b0]> s 0x7fa764315620
[0x7fa764315620]> af
[0x7fa764315620]> pdf

What is the difference between dword and 'the stack' in assembler

I am trying to learn assembler and am somewhat confused by the method used by OS X with nasm macho32 for passing arguments to functions.
I am following the book 'Assembly Language Step By Step' by Jeff Duntemann and, using the internet extensively, have altered its examples to run on OS X, both 32 and 64 bit.
So, to begin with, the Linux version from the book:
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!",10
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
nop
mov eax, 4 ; Specify sys_write syscall
mov ebx, 1 ; Specify File Descriptor 1: Standard Output
mov ecx, EatMsg ; Pass offset of the message
mov edx, EatLen ; Pass the length of the message
int 0x80 ; Make syscall to output the text to stdout
mov eax, 1 ; Specify Exit syscall
mov ebx, 0 ; Return a code of zero
int 0x80 ; Make syscall to terminate the program
Then, very similarly, the 64-bit version for OS X: other than changing the register names, replacing int 80H (which I understand is somewhat archaic) and adding 0x2000000 to the values moved to eax (which I don't understand in the slightest), there isn't much to alter.
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!", 0x0a
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
mov rax, 0x2000004 ; Specify sys_write syscall
mov rdi, 1 ; Specify File Descriptor 1: Standard Output
mov rsi, EatMsg ; Pass offset of the message
mov rdx, EatLen ; Pass the length of the message
syscall ; Make syscall to output the text to stdout
mov rax, 0x2000001 ; Specify Exit syscall
mov rdi, 0 ; Return a code of zero
syscall ; Make syscall to terminate the program
The 32-bit Mac version, on the other hand, is quite different. I can see we are pushing the arguments to the stack dword, so my question is (and sorry for the long preamble): what is the difference between the stack that eax is being pushed to and dword, and why do we just use the registers and not the stack in the 64-bit version (and on Linux)?
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!", 0x0a
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
mov eax, 0x4 ; Specify sys_write syscall
push dword EatLen ; Pass the length of the message
push dword EatMsg ; Pass offset of the message
push dword 1 ; Specify File Descriptor 1: Standard Output
push eax
int 0x80 ; Make syscall to output the text to stdout
add esp, 16 ; Move back the stack pointer
mov eax, 0x1 ; Specify Exit syscall
push dword 0 ; Return a code of zero
push eax
int 0x80 ; Make syscall to terminate the program

Well, you don't quite understand what dword is. In high-level-language terms, it is not a variable but rather a type: a double word, i.e. 4 bytes. So push dword 1 means that you push the double-word constant 1 onto the stack. There is only ONE stack, and both that constant and the register eax are pushed onto it.
Registers are used on Linux because they are much faster, especially on old processors. The Linux ABI (which is, as far as I know, a descendant of the System V ABI) was developed quite a long time ago and was often used on systems where performance was critical and the difference was very significant. The OS X Intel ABI is much younger, AFAIK, and the simplicity of using the stack was more important on desktop OS X than the negligible slowdown. On 64-bit processors more registers were added, and hence it became more efficient to use them.
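To make the point about dword concrete, the pushes from the 32-bit listing can be replayed on a toy stack. This is only an illustrative sketch: the class Stack32, the starting stack pointer and the address used for EatMsg are all made up, and only the 4-byte size of a dword comes from the architecture.

```python
import struct


class Stack32:
    """Toy model of a 32-bit stack that grows towards lower addresses."""

    def __init__(self, top: int = 0x1000):   # hypothetical initial esp
        self.mem = {}                         # address -> byte value
        self.esp = top

    def push_dword(self, value: int) -> None:
        # "push dword X": esp drops by 4, then 4 little-endian bytes are stored.
        self.esp -= 4
        for i, b in enumerate(struct.pack("<I", value & 0xFFFFFFFF)):
            self.mem[self.esp + i] = b

    def read_dword(self, addr: int) -> int:
        raw = bytes(self.mem[addr + i] for i in range(4))
        return struct.unpack("<I", raw)[0]


EAT_MSG = 0x2000                         # hypothetical address of the message
EAT_LEN = len("Eat at Joe's!") + 1       # the text plus the newline byte

stack = Stack32()
start = stack.esp
eax = 4                                  # sys_write number, as in the listing

stack.push_dword(EAT_LEN)                # push dword EatLen
stack.push_dword(EAT_MSG)                # push dword EatMsg
stack.push_dword(1)                      # push dword 1
stack.push_dword(eax)                    # push eax (a register is also 4 bytes)

print("bytes pushed:", start - stack.esp)
print("top of stack:", stack.read_dword(stack.esp))
```

Four 4-byte pushes account for the add esp, 16 that follows the int 0x80 in the listing, and the constant 1 and the contents of eax end up next to each other on the same stack.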

Incremental linking causes unexpected disassembly for MASM program

A while ago I posted this question regarding strange behavior I was experiencing in trying to step through a MASM program.
Essentially, given the following code:
; Tell MASM to use the Intel 80386 instruction set.
.386
; Flat memory model, and Win 32 calling convention
.MODEL FLAT, STDCALL
; Treat labels as case-sensitive (required for windows.inc)
OPTION CaseMap:None
include windows.inc
include masm32.inc
include user32.inc
include kernel32.inc
include macros.asm
includelib masm32.lib
includelib user32.lib
includelib kernel32.lib
.DATA
BadText db "Error...", 0
GoodText db "Excellent!", 0
.CODE
main PROC
int 3
mov eax, 6
xor eax, eax
_label: add eax, ecx
dec ecx
jnz _label
cmp eax, 21
jz _good
_bad: invoke StdOut, addr BadText
jmp _quit
_good: invoke StdOut, addr GoodText
_quit: invoke ExitProcess, 0
main ENDP
END main
I could not get the int 3 instruction to trigger. It was clear why it didn't, examining the disassembly:
00400FFD add byte ptr [eax],al
00400FFF add ah,cl
--- [User path]\main.asm
mov eax, 6
00401001 mov eax,6
xor eax, eax
00401006 xor eax,eax
_label: add eax, ecx
The int 3 instruction had been replaced with add ah,cl, but I had no idea why. I managed to track the problem to whether or not Incremental Linking was enabled. The above disassembly was generated with Incremental Linking disabled (/INCREMENTAL:NO option on the command line). Re-enabling it would result in something like the following:
.CODE
main PROC
int 3
00401010 int 3
mov eax, 6
00401011 mov eax,6
xor eax, eax
00401016 xor eax,eax
I should note that the interleaving lines are references back to the original code (I guess a feature of Visual Studio's disassembly window). With Incremental Linking enabled, the disassembly corresponds exactly to what I had written in the program, which is how I expected it to behave all along.
So, why would disabling Incremental Linking cause the disassembly of my program to be altered? What could be happening behind the scenes that would actually alter how the program executes?

The "add" instruction is a two-byte instruction, the second byte of which is the 1-byte opcode of your int3. The first byte of the two-byte add instruction is probably some garbage just before the entry point. The address of the add instruction is probably 1 byte before where the int3 instruction would be.
I quickly assembled and then disassembled those two instructions with GNU as and objdump, and the result is:
8: 00 cc add %cl,%ah
a: cc int3
Here you can clearly see that the add instruction contains 0xcc as its second byte, while int3 is the single byte 0xcc.
In other words, make sure that you start disassembling at the entry point to avoid this problem.
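The effect can be reproduced with a toy linear decoder. This is a sketch under stated assumptions: the byte string is reconstructed from the addresses and instructions in the question's listing, and the decoder knows only the three opcodes needed here, so it is in no way a real disassembler.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code: bytes, i: int):
    """Decode one instruction at index i; return (text, length)."""
    op = code[i]
    if op == 0xCC:                                  # int 3
        return "int 3", 1
    if op == 0xB8:                                  # mov eax, imm32
        imm = int.from_bytes(code[i + 1:i + 5], "little")
        return f"mov eax,{imm}", 5
    if op == 0x00:                                  # add r/m8, r8
        modrm = code[i + 1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if mod == 3:                                # register destination
            return f"add {REG8[rm]},{REG8[reg]}", 2
        if mod == 0 and rm not in (4, 5):           # simple [reg] destination
            return f"add byte ptr [{REG32[rm]}],{REG8[reg]}", 2
    raise ValueError(f"byte {op:#04x} is outside this toy decoder")


def disassemble(code: bytes, base: int, start: int):
    """Decode linearly from address `start` to the end of `code`."""
    i, out = start - base, []
    while i < len(code):
        text, length = decode(code, i)
        out.append((base + i, text))
        i += length
    return out


# Assumed bytes: three zero bytes of padding, then int 3 and mov eax,6.
BASE = 0x400FFD
CODE = bytes([0x00, 0x00, 0x00, 0xCC, 0xB8, 0x06, 0x00, 0x00, 0x00])

for start in (0x400FFD, 0x401000):
    print(f"starting at {start:#x}:")
    for addr, text in disassemble(CODE, BASE, start):
        print(f"  {addr:08X} {text}")
```

Starting three bytes early, the decoder pairs the last padding byte with 0xCC and reports an add; starting at 0x401000, the same byte decodes as int 3. The bytes in memory are identical in both cases.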

Is there any assembly language debugger for OS X?

So I was wondering if there is any? I know afd on Windows, but I'm not sure about anything on the Mac.
And this is how I am using nasm on the following code: nasm a.asm -o a.com -l a.lst
[org 0x100]
mov ax, 5
mov bx, 10
add ax, bx
mov bx, 15
add ax, bx
mov ax, 0x4c00
int 0x21
On Windows I know a debugger named afd which helps me step through each statement, but I'm not sure how I can do this using gdb.
Nor am I able to execute this .com file; am I supposed to make some other file here?

Why are you writing 16-bit code that makes DOS syscalls? If you want to know how to write asm that's applicable to your OS, take a look at the code generated by "gcc -S" on some C code... (Note that code generated this way will have operands reversed, and is meant to be assembled with as instead of nasm.)
Further, are you aware what this code is doing? It reads to me like this:
ax = 5
bx = 10
ax += bx
bx = 15
ax += bx
ax = 0x4c00
int 21h
Seems like this code is equivalent to:
mov bx, 15
mov ax, 0x4c00
int 21h
Which according to what I see here, is exit(0). You didn't need to change bx either...
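The equivalence claimed above can be checked with a tiny register simulation. This is a sketch only: it models just mov and add on two 16-bit registers and ignores the flags, which the two sequences do set differently but which nothing reads before int 21h.

```python
def run(program):
    """Execute (op, dst, src) tuples on 16-bit registers ax and bx."""
    regs = {"ax": 0, "bx": 0}
    for op, dst, src in program:
        value = regs[src] if src in regs else src   # register name or immediate
        if op == "mov":
            regs[dst] = value & 0xFFFF
        elif op == "add":
            regs[dst] = (regs[dst] + value) & 0xFFFF
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs


original = [
    ("mov", "ax", 5),
    ("mov", "bx", 10),
    ("add", "ax", "bx"),
    ("mov", "bx", 15),
    ("add", "ax", "bx"),
    ("mov", "ax", 0x4C00),
]
shortened = [
    ("mov", "bx", 15),
    ("mov", "ax", 0x4C00),
]

final = run(original)
print(final == run(shortened))
print("ah =", hex(final["ax"] >> 8), "al =", final["ax"] & 0xFF)
```

The final mov overwrites everything the earlier additions computed in ax, so only ah = 0x4c (the DOS terminate function) and al = 0 (the return code) matter when int 21h is reached.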
But. This doesn't even apply to what you were trying to do, because Mac OS X is not MS-DOS, does not know about DOS APIs, cannot run .COM files, etc. I wasn't even aware that it can run 16 bit code. You will want to look at nasm's -f elf option, and you will want to use registers like eax rather than ax.
I've not done assembly programming on OS X, but you could theoretically do something like this:
extern exit
global main
main:
push dword 0
call exit
; This will never get called, but hey...
add esp, 4
xor eax, eax
ret
Then:
nasm -f elf foo.asm -o foo.o
ld -o foo foo.o -lc
Of course this is relying on the C library, which you might not want to do. I've omitted the "full" version because I don't know what the syscall interface looks like on Mac. On many platforms your entry point is the symbol _start and you do syscalls with int 80h or sysenter.
As for debugging... I would also suggest GDB. You can advance by a single instruction with stepi, and the info registers command will dump register state. The disassemble command is also helpful.
Update: Just remembered, I don't think Mac OS X uses ELF... Well.. Much of what I wrote still applies. :-)

Xcode ships with GDB, the GNU Debugger.
Xcode 4 and newer ships with LLDB instead.

As others have said, use GDB, the GNU debugger. When debugging assembly source, I usually find it useful to load a command file that contains something like the following:
display/5i $pc
display/x $eax
display/x $ebx
...
display/5i will display 5 instructions starting with the next to be executed. You can use the stepi command to step execution one instruction at a time. display/x $eax displays the contents of the eax register in hex. You will also likely want to use the x command to examine the contents of memory: x/x $eax, for example, prints the contents of the memory whose address is stored in eax.
These are a few of many commands. Download the GDB manual and skim through it to find other commands you may be interested in using.

IDA Pro does work on the Mac after a fashion (the UI still runs on Windows).
