Fairly simple question this time. How do I write to screen the contents of a single register in Assembly? I'm getting a bit tired of calling DumpRegs just to see the value of one register.
I'm using x86 architecture, and MASM in Visual Studio, and Irvine32.lib.
Irvines's DumpReg uses repeatedly a macro of Macros.inc: mShowRegister. It can be used directly. Example:
INCLUDE Irvine32.inc
INCLUDE Macros.inc
.code
main PROC
mov esi, 0DeadBeefh
mShowRegister ESI, ESI
exit
main ENDP
END main
A documented macro with more options is mShow. Example:
INCLUDE Irvine32.inc
INCLUDE Macros.inc
.code
main PROC
mov esi, 0DeadBeefh
mshow ESI, h
exit
main ENDP
END main
Irvine32 has output functions that take a value in EAX, such as WriteDec (unsigned base 10).
See the documentation http://programming.msjc.edu/asm/help/source/irvinelib/writedec.htm which has links to WriteHex (hex = base 16), and WriteInt (signed base 10).
These functions use the Irvine32 calling convention and preserve all registers, including the arg-passing register EAX, so you can even insert them as a debug-print for EAX at least. More usually you'd use them as simply normal output functions.
Usually for asm debugging you'd use a debugger, not debug-print function calls.
I am a newcomer to assembly trying to understand the objdump of the following function:
int nothing(int num) {
return num;
}
This is the result (linux, x86-64, gcc 8):
push rbp
mov rbp,rsp
mov DWORD PTR [rbp-0x4],edi
mov eax,DWORD PTR [rbp-0x4]
pop rbp
ret
My questions are:
1. Where does edi come from? Reading through some intro docs, I was under the impression that [rbp-0x4] would contain num.
2. From the above, apparently edi contains the argument. But then what role does [rbp-0x4] play? Why not just mov eax, edi?
Thanks!
Where does edi come from?
... From the above, apparently edi contains the argument.
This is the calling convention (for Linux and many other OSs):
All programming languages for these OSs pass the first parameter in rdi. The result (value returned) is passed in rax.
And because your C compiler interprets int as 32 bits, only the low 32 bits of rdi and rax are used - which is edi and eax.
Programming languages for Windows pass the first parameter in rcx...
But then what role does [rbp-0x4] play?
Using rbp has mainly historic reasons here. In 16-bit code (as it was used in 1980s and 1990s PCs) it was not possible to address data on the stack using the sp register (which corresponds to rsp). The only register that allowed addressing values on the stack easily was the bp register (corresponding to rbp).
And even in 32- or 64-bit code it is more difficult to write a compiler that addresses local variables (on the stack) using rsp rather than using rbp.
The compiler generates the first 3 instructions of assembler code before it knows what is done in the C function. The compiler puts the value on the stack because you could do something like address = &num in the code. This is however not possible when num is in a register but only when num is located in the memory.
Why not just mov eax, edi?
If you tell the compiler to optimize the code, it will first check the content of the C function before generating the first assembler instruction. It will find out that it is not required to put the value into the memory.
In this case the code will indeed look like this:
mov eax, edi
ret
This is the code that I use with FASM:
format PE console
entry main
include '..\MACRO\import32.inc'
section '.data' data readable writeable
msg db "привіт!",0dh,0ah,0 ;hi
lcl_set db ?
section '.code' code readable executable
main:
;fail without set locale
push msg
call [printf]
pop ecx
;succeed with set locale
push msg
call _liapnuty
pop ecx
push 0
call [ExitProcess]
_liapnuty:
push ebp
mov ebp, esp
;sub esp, 0
mov ebx,[ebp+8] ; 1st arg addr
mov al, [lcl_set]
or al, al
jnz _liapnuty_rest
call __set_locale
_liapnuty_rest:
push ebx
call [printf]
pop ebx
mov esp, ebp
pop ebp
ret 0
__set_locale:
mov al, [lcl_set]
or al, al
jnz __set_locale_rest
push 1251
call SetConsoleCP
call SetConsoleOutputCP
pop ecx
mov [lcl_set], 1
;push lcl
;call [system]
; pop ecx
; mov [lcl_set], 1
;push cls
;call [printf]
; pop ecx
__set_locale_rest:
ret 0
section '.idata' import data readable
library kernel,'kernel32.dll',\
msvcrt,'msvcrt.dll'
import kernel,\
SetConsoleCP,'SetConsoleCP',\
SetConsoleOutputCP,'SetConsoleOutputCP',\
ExitProcess,'ExitProcess'
import msvcrt,\
printf,'printf'
It works almost perfectly, except that before exiting it waits for like a second for some reason. It outputs data almost instantly, yet it fails to shut down quickly. If the reason is using these libraries or not clearing the stack after calling ExitProcess (which obviously can't be done), then let me know and I will mostly gladly accept this answer, but I want to be 100% sure I'm doing everything correctly.
The reason for all of it was because kernel32 functions pop their parameters themselves on return. If I remove unnecessary pops it starts working fast again. Of course, the program still runs with damaged stack but it does a lot of damage control at the end. That's why it was slow, but still worked. For everyone facing this issue, make sure to be careful with the calling convention.
To debug the application and find the error I used OLLYDBG. It's free and it works. It helps you debug EXEs and DLLs, allowing to step one command at a time. Also it shows the memory, the stack and all of the registers and flags.
Using the stack I was able to find out that it gets corrupted.
I'm stepping through Structured Error Handling recovery code in Windows 7
(e.g, what happens after SEH handler is done and passes back "CONTINUE" code).
Here's a function which is called:
7783BD9F mov edi,edi
7783BDA1 push ebp
7783BDA2 mov ebp,esp
7783BDA4 push 1
7783BDA6 push dword ptr [ebp+0Ch]
7783BDA9 push dword ptr [ebp+8]
7783BDAC call 778692DF
7783BDB1 pop ebp
7783BDB2 ret 8
I'm used to the function prolog of "push ebp/mov ebp,esp". What's the purpose
of the "mov edi,edi"?
Raymond Chen (one of the Microsoft developers) has answered this exact question:
Why do Windows functions all begin with a pointless MOV EDI, EDI instruction?
And he links an even earlier reference:
Why does the compiler generate a MOV EDI, EDI instruction at the beginning of functions?
Basically, it leaves space for a jump instruction to be inserted during hot patching.
So I've got a very strange and specific problem here. We're putting seemingly valid code into a linker, and then the linker is making some mistakes, like removing a valid ptr label and instead of replacing it with a value, it's putting in 0 instead. But not everywhere either. It starts at a pretty arbitrary point.
So, some background: we have an interpreted language that is converted into assembly by putting together hand-generated chunks of assembly (by an in-house compiler-like application) and adding in variables where required. This is a system that has been working for around 10 years if not longer, in pretty much it's current form, so the validity of this method is not currently in question. This assembly is assembled to an obj file using the Microsoft Assembler (ml.exe or MASM).
Then in a separate step, this obj file is linked using the Microsoft Incremental Linker together with some other libraries (static, and import libraries for dlls) to create an executable.
Following is a portion of the assembly (.asm) file that the assembler outputs when it creates the obj file in the first step:
call _c_rt_strcmp
mov di, 1
mov ebp, esp
cmp ax, 0
je sym2148
dec di
sym2148: mov [ebp+6], di
add esp, 6
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 0bb49h
pop ax
mov byte ptr [ebx],al
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 012656h
push ebx
mov eax, OFFSET sym2151
push eax
call _c_rt_strcmp
mov di, 1
mov ebp, esp
cmp ax, 0
je sym2152
dec di
sym2152: mov [ebp+6], di
add esp, 6
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 0bb32h
pop ax
mov byte ptr [ebx],al
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 012656h
push ebx
mov eax, OFFSET sym2155
push eax
call _c_rt_strcmp
mov di, 1
mov ebp, esp
cmp ax, 0
je sym2156
dec di
sym2156: mov [ebp+6], di
add esp, 6
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 0bb4ah
pop ax
mov byte ptr [ebx],al
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 0126bbh
push ebx
mov eax, OFFSET sym2159
push eax
call _c_rt_strcmp
As far as I can see, that is correct, and makes sense. I believe that code that generates this does a string compare and then stores a value based on the result of the string compare, and it's a section of 20-30 of them, so there are 10 or so more groups before the first bit I posted, and about 15-20 after the last bit I posted.
Following is the disassembled view of the crash location that Visual C++ 5.0 (old I know, but this is on a legacy system unfortunately) shows when the executable crashes:
004093D2 call 00629570
004093D7 mov di,1
004093DB mov ebp,esp
004093DD cmp ax,0
004093E1 je 004093E5
004093E3 dec di
004093E5 mov word ptr [ebp+6],di
004093E9 add esp,6
004093EC mov ebx,dword ptr ds:[64174Ch]
004093F2 add ebx,0BB49h
004093F8 pop ax
004093FA mov byte ptr [ebx],al
004093FC mov ebx,dword ptr ds:[64174Ch]
00409402 add ebx,12656h
00409408 push ebx
00409409 mov eax,0
0040940E push eax
0040940F call 00409414
00409414 mov di,1
00409418 mov ebp,esp
0040941A cmp ax,0
0040941E je 00409422
00409420 dec di
00409422 mov word ptr [ebp+6],di
00409426 add esp,6
00409429 mov ebx,dword ptr ds:[0]
0040942F add ebx,0BB32h
00409435 pop ax
00409437 mov byte ptr [ebx],al
00409439 mov ebx,dword ptr ds:[0]
0040943F add ebx,12656h
00409445 push ebx
00409446 mov eax,0
0040944B push eax
0040944C call 00409451
00409451 mov di,1
00409455 mov ebp,esp
00409457 cmp ax,0
0040945B je 0040945F
0040945D dec di
0040945F mov word ptr [ebp+6],di
00409463 add esp,6
00409466 mov ebx,dword ptr ds:[0]
0040946C add ebx,0BB4Ah
00409472 pop ax
00409474 mov byte ptr [ebx],al
00409476 mov ebx,dword ptr ds:[0]
0040947C add ebx,126BBh
00409482 push ebx
00409483 mov eax,0
00409488 push eax
00409489 call 0040948E
The actual crash location is 0x00409429.
The two bits of code match, as in they are the same section of code, except the first from the .asm file is what is going into the linker, and the second one is that has come out of the linker.
From what I can see, it's crashing at that location because it's attempting to de-reference an address of 0, right? So of course that's going to fail. Also, if you take a look at the different function calls at 0x004093D2, 0x0040940F, 0x0040944C and 0x00409489, only the first one is valid, the others are simply doing a function call to the line directly after them! And this broken code continues on for the next 15-20 similar segments of code that is defined in that file.
If you look at the corresponding lines for the function calls and the bad pointers in both sections, you will see that in the .asm file everything appears to be correct, but somehow the linker just breaks it all when it compiles the exe, and in a very specific spot, as there are similar chunks of code previous in the file which are constructed correctly.
We do get some linker warnings of the form: "LINK : warning LNK4049: locally defined symbol "_smfv1_ptr" imported". But we were getting those same warnings even when it was working.
The assembler used was ml.exe version 6.11, and the linker was link.exe version 5.10.7303 which were both the version from Visual C++ 5.0. Since the assembled code seems to be right, I'm going to be trying the linker from Visual Studio 2005, 2008 and 2010 and see if anything changes.
I can't really imagine what could create this kind of error, and I thought maybe it was symbols getting messed up, but there are jumps to locations (for small 'if' statements) which are stored as symbols which still work fine after they get through the linker.
Is it possible that a symbol table or something similar is getting overloaded inside the linker, and it's just reverting to bad or default values?
Call to the following address is a clear sign of unresolved symbol reference (it makes sense if you notice that all calls in .obj files are emitted as E8 00 00 00 00). You have zeroes for some data references as well (sym2151, some references to _smfv1_ptr, sym2159). What's strange is that the first call to _c_rt_strcmp did get resolved. I would suggest turning on all warning/debugging/verbose switches that you can find, and generating all kinds of listing and map files. Maybe something will jump out.
Okay, so the end result seems to be that it is a bug with the Visual C++ version of the masm assembler "ml.exe" (big surprise, huh?)
So, moving to the versions of masm and link that were provided in Visual Studio 2005 seems to be the best solution for us.
Thanks for your help guys :)