Incremental linking causes unexpected disassembly for MASM program - visual-studio-2010

A while ago I posted this question regarding strange behavior I was experiencing in trying to step through a MASM program.
Essentially, given the following code:
; Tell MASM to use the Intel 80386 instruction set.
.386
; Flat memory model, and Win 32 calling convention
.MODEL FLAT, STDCALL
; Treat labels as case-sensitive (required for windows.inc)
OPTION CaseMap:None
include windows.inc
include masm32.inc
include user32.inc
include kernel32.inc
include macros.asm
includelib masm32.lib
includelib user32.lib
includelib kernel32.lib
.DATA
BadText db "Error...", 0
GoodText db "Excellent!", 0
.CODE
main PROC
int 3
mov eax, 6
xor eax, eax
_label: add eax, ecx
dec ecx
jnz _label
cmp eax, 21
jz _good
_bad: invoke StdOut, addr BadText
jmp _quit
_good: invoke StdOut, addr GoodText
_quit: invoke ExitProcess, 0
main ENDP
END main
I could not get the int 3 instruction to trigger. It was clear why it didn't, examining the disassembly:
00400FFD add byte ptr [eax],al
00400FFF add ah,cl
--- [User path]\main.asm
mov eax, 6
00401001 mov eax,6
xor eax, eax
00401006 xor eax,eax
_label: add eax, ecx
The int 3 instruction had been replaced with add al,cl, but I had no idea why. I managed to track the problem to whether or not Incremental Linking was enabled. The above disassembly was generated with Incremental Linking disabled (/INCREMENTAL:NO option on the command line). Re-enabling it would result in something like the following:
.CODE
main PROC
int 3
00401010 int 3
mov eax, 6
00401011 mov eax,6
xor eax, eax
00401016 xor eax,eax
I should note that the interleaving lines are references back to the original code (I guess a feature of Visual Studio's disassembly window). With Incremental Linking enabled, the disassembly corresponds exactly to what I had written in the program, which is how I expected it to behave all along.
So, why would disabling Incremental Linking cause the disassembly of my program to be altered? What could be happening behind the scenes that would actually alter how the program executes?

The "add" instruction is a two byte instruction, the second of which is the 1-byte opcode of your int3. The first byte of the two byte add instruction is probably some garbage just before the entrypoint. The address of the add instruction is probably 1 byte before where the int3 instruction would be.
I quickly assembled and then disassembled those two instructions with GNU as en objdump, and the result is:
8: 00 cc add %cl,%ah
a: cc int3
Here you can clearly see that the the add instruction contains the second byte 0xcc, while int3 is 0xcc
IOW make sure that you start disassembling on the entry point to avoid this problem.

Related

How to fix gdb skipping breakpoints

I wrote an assembly code to test multiplication and division with integers. After assembling the code with -g and compiling it with gcc -g; I put a series of breakpoints in GDB, before and after the arithmetic operations.
Some breakpoints are skipped and GDB ends the debugging session.
What could be happening? I attach example code and picture of it.
Debian 10.8 (gcc 8.3.0-6)and Nasm assembler.
I would appreciate the help.
test code
section .bss
data: resq 1
section .text
global main
main:
push rbp
mov rbp,rsp
mov rax,77
mov rbx,2
mul rbx
mov [rel data],rax
mov rsp,rbp
pop rbp
mov rax,60
mov rdi,0
syscall
ok this is the gdb output; consider two breakpoints, one is mul bx, and the other on mov rsp,rbp; before the syscall exit.
Breakpoint 2, in__libc_csu_init()
(gdb)c
Continuing.
[inferior 1 process(1974) exited normally]
(gdb)

Assembly code of malloc

I want to view the assembly code of malloc(), calloc() and free() but when I print the assembly code on radare2 it gives me the following code:
push rbp
mov rbp, rsp
sub rsp, 0x10
mov eax, 0xc8
mov edi, eax
call sym.imp.malloc
xor ecx, ecx
mov qword [local_8h], rax
mov eax, ecx
add rsp, 0x10
pop rbp
ret
How can I see sym.imp.malloc function code? Is there any way to see the code or any website to see the assembly?
Since libc is an open-source library, it is freely available and you can simply read the source code.
The source-code of malloc is available on many places online (example), and you can view the source of different versions of libc under malloc/malloc.c here.
The symbol sym.imp.malloc is how radare flags the address of malloc in the PLT (Procedure Linkage Table) and not the function itself.
Reading the Assembly of the function can be done in several ways:
Open your local libc library with radare2, seek to malloc, analyze the function and then print its disassmbly:
$ r2 /usr/lib/libc.so.6
[0x00020630]> s sym.malloc
[0x0007c620]> af
[0x0007c620]> pdf
If you want to see malloc when linked to another binary you need to open the binary in debug mode, then step to main to make it load the library, then search for the address of malloc, seek to it, analyze the function and print the disassembly:
$ r2 -d /bin/ls
Process with PID 20540 started...
= attach 20540 20540
bin.baddr 0x00400000
Using 0x400000
Assuming filepath /bin/ls
asm.bits 64
[0x7fa764841d80]> dcu main
Continue until 0x004028b0 using 1 bpsize
hit breakpoint at: 4028b0
[0x004028b0]> dmi libc malloc~name=malloc$
vaddr=0x7fa764315620 paddr=0x0007c620 ord=4162 fwd=NONE sz=388 bind=LOCAL type=FUNC name=malloc
vaddr=0x7fa764315620 paddr=0x0007c620 ord=5225 fwd=NONE sz=388 bind=LOCAL type=FUNC name=malloc
vaddr=0x7fa764315620 paddr=0x0007c620 ord=5750 fwd=NONE sz=388 bind=GLOBAL type=FUNC name=malloc
vaddr=0x7fa764315620 paddr=0x0007c620 ord=7013 fwd=NONE sz=388 bind=GLOBAL type=FUNC name=malloc
[0x004028b0]> s 0x7fa764315620
[0x7fa764315620]> af
[0x7fa764315620]> pdf

ml64 weird error output

Hello Stack Overflowers!
I have wrote a asm code file that i'm trying to build using ml64 (v14.00.23506.0)
but the assembler gives me weird output.. like there something wrong in line 1..
i'm sure there is nothing wrong with the code because it worked well on other ml64 version. it worked well with the ml64 from VS Community 2015, the current ml64 from VC++ BuildTools Tech-Preview gives me the error in the screenshot.
i tried ml64 from VS Express 2015 and VC BuildTools 2015 TP, both give same error.
ps. im building with a batch script that i wrote and coding in VIM.
can anyone guide me on how to overcome this issue?
-- UPDATED --
that's the cmd build error after i fixed the previous batch errors.
still weird error, can anyone understand?
(thats a test code)
includelib kernel32.lib
includelib user32.lib
EXTERN GetModuleHandleExW:PROC
EXTERN MessageBoxW:PROC
EXTERN ExitProcess:PROC
.DATA
ALIGN 4
pwszMessage WORD "H","e","l","l","o",","," ","W","o","r","l","d","!"
WORD 0
pwszCaption WORD "M","e","s","s","a","g","e","B","o","x"
WORD 0
hInstance QWORD 0
.CODE
entry PROC
sub rsp, 28h
xor rcx, rcx
mov rdx, rcx
lea r8, [hInstance]
call GetModuleHandleExW
lea rdx, [pwszMessage]
lea r8, [pwszCaption]
mov r9d, 40h
call MessageBoxW
call ExitProcess ; rcx suppose to hold 0 already
add rsp, 28h
ret
entry ENDP
END

What is the difference between dword and 'the stack' in assembler

I am trying to learn assembler and am somewhat confused by the method used by osx with nasm macho32 for passing arguments to functions.
I am following the book 'Assembly Language Step By Step' by Jeff Duntemann and using the internet extensively have altered it to run on osx both 32 and 64 bit.
So to begin with the linux version from the book
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!",10
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
nop
mov eax, 4 ; Specify sys_write syscall
mov ebx, 1 ; Specify File Descriptor 1: Standard Output
mov ecx, EatMsg ; Pass offset of the message
mov edx, EatLen ; Pass the length of the message
int 0x80 ; Make syscall to output the text to stdout
mov eax, 1 ; Specify Exit syscall
mov ebx, 0 ; Return a code of zero
int 0x80 ; Make syscall to terminate the program
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!", 0x0a
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
Then very similarly the 64 bit version for osx, other than changing the register names, replacing int 80H (which I understand is somewhat archaic) and adding 0x2000000 to the values moved to eax (don't understand this in the slightest) there isn't much to alter.
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!", 0x0a
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
mov rax, 0x2000004 ; Specify sys_write syscall
mov rdi, 1 ; Specify File Descriptor 1: Standard Output
mov rsi, EatMsg ; Pass offset of the message
mov rdx, EatLen ; Pass the length of the message
syscall ; Make syscall to output the text to stdout
mov rax, 0x2000001 ; Specify Exit syscall
mov rdi, 0 ; Return a code of zero
syscall ; Make syscall to terminate the program
The 32 Bit mac version on the other hand is quite different. I can see we are pushing the arguments to the stack dword, so my question is (and sorry for the long preamble) what is the difference between the stack that eax is being pushed to and dword and why do we just use the registers and not the stack in the 64 bit version (and linux)?
section .data ; Section containing initialised data
EatMsg db "Eat at Joe's!", 0x0a
EatLen equ $-EatMsg
section .bss ; Section containing uninitialised data
section .text ; Section containing code
global start ; Linker needs this to find the entry point!
start:
mov eax, 0x4 ; Specify sys_write syscall
push dword EatLen ; Pass the length of the message
push dword EatMsg ; Pass offset of the message
push dword 1 ; Specify File Descriptor 1: Standard Output
push eax
int 0x80 ; Make syscall to output the text to stdout
add esp, 16 ; Move back the stack pointer
mov eax, 0x1 ; Specify Exit syscall
push dword 0 ; Return a code of zero
push eax
int 0x80 ; Make syscall to terminate the program
Well, you don't quite understand what is dword. Speaking HLL, it is not a variable, but rather a type. So push doword 1 means that you pushes a double word constant 1 into the stack. There only ONE stack, and both the one and the register eax are pushed in it.
The registers are used in linux because they are much faster, especially on old processors. Linux ABI (which is, as far as i know, a descent of System V ABI) was developed quite a long time ago and often used in systems where performance was critical, when the difference was very significant. OSX intel abi is much younger, afaik, and simplicity of using stack where more important in desktop OSX than the negligible slowdown. In 64-bit processors, more registers where added and hence the where more efficient to use them.

x64 nasm: pushing memory addresses onto the stack & call function

I'm pretty new to x64-assembly on the Mac, so I'm getting confused porting some 32-bit code in 64-bit.
The program should simply print out a message via the printf function from the C standart library.
I've started with this code:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
push msg
call _printf
mov rsp, rbp
pop rbp
ret
Compiling it with nasm this way:
$ nasm -f macho64 main.s
Returned following error:
main.s:12: error: Mach-O 64-bit format does not support 32-bit absolute addresses
I've tried to fix that problem byte changing the code to this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
mov rax, msg ; shouldn't rax now contain the address of msg?
push rax ; push the address
call _printf
mov rsp, rbp
pop rbp
ret
It compiled fine with the nasm command above but now there is a warning while compiling the object file with gcc to actual program:
$ gcc main.o
ld: warning: PIE disabled. Absolute addressing (perhaps -mdynamic-no-pic) not
allowed in code signed PIE, but used in _main from main.o. To fix this warning,
don't compile with -mdynamic-no-pic or link with -Wl,-no_pie
Since it's a warning not an error I've executed the a.out file:
$ ./a.out
Segmentation fault: 11
Hope anyone knows what I'm doing wrong.
The 64-bit OS X ABI complies at large to the System V ABI - AMD64 Architecture Processor Supplement. Its code model is very similar to the Small position independent code model (PIC) with the differences explained here. In that code model all local and small data is accessed directly using RIP-relative addressing. As noted in the comments by Z boson, the image base for 64-bit Mach-O executables is beyond the first 4 GiB of the virtual address space, therefore push msg is not only an invalid way to put the address of msg on the stack, but it is also an impossible one since PUSH does not support 64-bit immediate values. The code should rather look similar to:
; this is what you *would* do for later args on the stack
lea rax, [rel msg] ; RIP-relative addressing
push rax
But in that particular case one needs not push the value on the stack at all. The 64-bit calling convention mandates that the fist 6 integer/pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9, exactly in that order. The first 8 floating-point or vector arguments go into XMM0, XMM1, ..., XMM7. Only after all the available registers are used or there are arguments that cannot fit in any of those registers (e.g. a 80-bit long double value) the stack is used. 64-bit immediate pushes are performed using MOV (the QWORD variant) and not PUSH. Simple return values are passed back in the RAX register. The caller must also provide stack space for the callee to save some of the registers.
printf is a special function because it takes variable number of arguments. When calling such functions AL (the low byte of RAX) should be set to the number of floating-point arguments, passed in the vector registers. Also note that RIP-relative addressing is preferred for data that lies within 2 GiB of the code.
Here is how gcc translates printf("This is a test\n"); into assembly on OS X:
xorl %eax, %eax # (1)
leaq L_.str(%rip), %rdi # (2)
callq _printf # (3)
L_.str:
.asciz "This is a test\n"
(this is AT&T style assembly, source is left, destination is right, register names are prefixed with %, data width is encoded as a suffix to the instruction name)
At (1) zero is put into AL (by zeroing the whole RAX which avoids partial-register delays) since no floating-point arguments are being passed. At (2) the address of the string is loaded in RDI. Note how the value is actually an offset from the current value of RIP. Since the assembler doesn't know what this value would be, it puts a relocation request in the object file. The linker then sees the relocation and puts the correct value at link time.
I am not a NASM guru, but I think the following code should do it:
default rel ; make [rel msg] the default for [msg]
section .data
msg: db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp ; re-aligns the stack by 16 before call
mov rbp, rsp
xor eax, eax ; al = 0 FP args in XMM regs
lea rdi, [rel msg]
call _printf
mov rsp, rbp
pop rbp
ret
No answer yet has explained why NASM reports
Mach-O 64-bit format does not support 32-bit absolute addresses
The reason NASM won't do this is explained in Agner Fog's Optimizing Assembly manual in section 3.3 Addressing modes under the subsection titled 32-bit absolute addressing in 64 bit mode he writes
32-bit absolute addresses cannot be used in Mac OS X, where addresses are above 2^32 by
default.
This is not a problem on Linux or Windows. In fact I already showed this works at static-linkage-with-glibc-without-calling-main. That hello world code uses 32-bit absolute addressing with elf64 and runs fine.
#HristoIliev suggested using rip relative addressing but did not explain that 32-bit absolute addressing in Linux would work as well. In fact if you change lea rdi, [rel msg] to lea rdi, [msg] it assembles and runs fine with nasm -efl64 but fails with nasm -macho64
Like this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
xor al, al
lea rdi, [msg]
call _printf
mov rsp, rbp
pop rbp
ret
You can check that this is an absolute 32-bit address and not rip relative with objdump. However, it's important to point out that the preferred method is still rip relative addressing. Agner in the same manual writes:
There is absolutely no reason to use absolute addresses for simple memory operands. Rip-
relative addresses make instructions shorter, they eliminate the need for relocation at load
time, and they are safe to use in all systems.
So when would use use 32-bit absolute addresses in 64-bit mode? Static arrays is a good candidate. See the following subsection Addressing static arrays in 64 bit mode. The simple case would be e.g:
mov eax, [A+rcx*4]
where A is the absolute 32-bit address of the static array. This works fine with Linux but once again you can't do this with Mac OS X because the image base is larger than 2^32 by default. To to this on Mac OS X see example 3.11c and 3.11d in Agner's manual. In example 3.11c you could do
mov eax, [(imagerel A) + rbx + rcx*4]
Where you use the extern reference from Mach O __mh_execute_header to get the image base. In example 3.11c you use rip relative addressing and load the address like this
lea rbx, [rel A]; rel tells nasm to do [rip + A]
mov eax, [rbx + 4*rcx] ; A[i]
According to the documentation for the x86 64bit instruction set http://download.intel.com/products/processor/manual/325383.pdf
PUSH only accepts 8, 16 and 32bit immediate values (64bit registers and register addressed memory blocks are allowed though).
PUSH msg
Where msg is a 64bit immediate address will not compile as you found out.
What calling convention is _printf defined as in your 64bit library?
Is it expecting the parameter on the stack or using a fast-call convention where the parameters on in registers? Because x86-64 makes more general purpose registers available the fast-call convention is used more often.

Resources