While using WinDbg to debug the assembly code of an executable, I noticed that the compiler inserts extra code between two sequential statements. The statements are pretty simple, e.g. they don't involve complex objects or function calls:
int a, b;
char c;
long l;
a = 0; // ##
b = a + 1; // %%
c = 1; // ##
l = 1000000;
l = l + 1;
And the disassembly is
## 008a1725 c745f800000000 mov dword ptr [ebp-8],0
008a172c 80bd0bffffff00 cmp byte ptr [ebp-0F5h],0 ss:002b:0135f71f=00
008a1733 750d jne test!main+0x42 (008a1742)
008a1735 687c178a00 push offset test!main+0x7c (008a177c)
008a173a e893f9ffff call test!ILT+205(__RTC_UninitUse) (008a10d2)
008a173f 83c404 add esp,4
008a1742 8b45ec mov eax,dword ptr [ebp-14h]
%% 008a1745 83c001 add eax,1
008a1748 c6850bffffff01 mov byte ptr [ebp-0F5h],1
008a174f 8945ec mov dword ptr [ebp-14h],eax
## 008a1752 c645e301 mov byte ptr [ebp-1Dh],1
Please note that ##, %% and ## in the disassembly list show the corresponding C++ lines.
So what are that call, cmp, jne and push?
This is the compiler's run-time error checking (RTC). The extra instructions check for the use of uninitialized variables. I think you can control this from the compiler options in Visual Studio.
For more information, take a look at the documentation section on the /RTCu switch.
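As a rough model of what those extra instructions do, here is a minimal sketch in portable C++. It is not MSVC's actual runtime: `TrackedInt`, `report_uninit_use`, and `demo_reports` are hypothetical names invented for illustration. The idea is that the compiler keeps a hidden flag byte per tracked local, sets it on every store, and tests it before a load.

```cpp
#include <cassert>

// Hypothetical stand-in for the reporting routine (shown as __RTC_UninitUse in
// the listing). Here it only counts how often it was called.
static int g_reports = 0;
static void report_uninit_use(const char* /*name*/) { ++g_reports; }

// Models one local variable plus the hidden flag byte the compiler adds
// (the byte at [ebp-0F5h] in the listing).
struct TrackedInt {
    int value = 0;              // the variable's stack slot
    unsigned char written = 0;  // hidden flag: 0 = never stored to

    void store(int v) {         // mov byte ptr [flag],1 ; mov [slot],eax
        written = 1;
        value = v;
    }
    int load(const char* name) {  // cmp byte ptr [flag],0 ; jne ; push ; call
        if (written == 0) report_uninit_use(name);
        return value;
    }
};

// Returns how many uninitialized reads were reported in a tiny scenario.
int demo_reports() {
    g_reports = 0;
    TrackedInt b;
    (void)b.load("b");  // read before any store: reported
    b.store(1);
    (void)b.load("b");  // read after a store: not reported
    return g_reports;
}
```

In this model, the `cmp`/`jne` pair is the `if (written == 0)` test, and the `push`/`call` pair passes a description of the variable to the reporting routine.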
Before calling a member function of an object, the address of the object will be moved to ECX.
Inside the function, ECX will be moved to dword ptr [this], what does this mean?
C++ Source
#include <iostream>
class CAdd
{
public:
CAdd(int x, int y) : _x(x), _y(y) {}
int Do() { return _x + _y; }
private:
int _x;
int _y;
};
int main()
{
CAdd ca(1, 2);
int n = ca.Do();
std::cout << n << std::endl;
}
Disassembly
...
CAdd ca(1, 2);
00A87B4F push 2
00A87B51 push 1
00A87B53 lea ecx,[ca] ; the instance address
00A87B56 call CAdd::CAdd (0A6BA32h)
int Do() { return _x + _y; }
00A7FFB0 push ebp
00A7FFB1 mov ebp,esp
00A7FFB3 sub esp,0CCh
00A7FFB9 push ebx
00A7FFBA push esi
00A7FFBB push edi
00A7FFBC push ecx
00A7FFBD lea edi,[ebp-0Ch]
00A7FFC0 mov ecx,3
00A7FFC5 mov eax,0CCCCCCCCh
00A7FFCA rep stos dword ptr es:[edi]
00A7FFCC pop ecx
00A7FFCD mov dword ptr [this],ecx ; ========= QUESTION HERE!!! =========
00A7FFD0 mov ecx,offset _CC7F790E_main@cpp (0BC51F2h)
00A7FFD5 call @__CheckForDebuggerJustMyCode@4 (0A6AC36h)
00A7FFDA mov eax,dword ptr [this] ; ========= AND HERE!!! =========
00A7FFDD mov eax,dword ptr [eax]
00A7FFDF mov ecx,dword ptr [this]
00A7FFE2 add eax,dword ptr [ecx+4]
00A7FFE5 pop edi
00A7FFE6 pop esi
00A7FFE7 pop ebx
00A7FFE8 add esp,0CCh
00A7FFEE cmp ebp,esp
00A7FFF0 call __RTC_CheckEsp (0A69561h)
00A7FFF5 mov esp,ebp
00A7FFF7 pop ebp
00A7FFF8 ret
MSVC's asm output itself (https://godbolt.org/z/h44rW3Mxh) uses _this$[ebp] with _this$ = -4, in a debug build like this which wastes instructions storing/reloading incoming register args.
_this$ = -4
int CAdd::Do(void) PROC ; CAdd::Do, COMDAT
push ebp
mov ebp, esp
push ecx ; dummy push instead of sub to reserve 4 bytes
mov DWORD PTR _this$[ebp], ecx
mov eax, DWORD PTR _this$[ebp]
...
This is just spilling the register arg to a local on the stack with that name. (The default options for the MSVC version I used on Godbolt, x86 MSVC 19.29.30136, don't include @__CheckForDebuggerJustMyCode@4 or the runtime-check stack poisoning (rep stos) in Do(), but the usage of this is still there.)
Amusingly, the push ecx it uses (as a micro-optimization) instead of sub esp, 4 to reserve stack space already stored ECX, making the mov store redundant.
(AFAIK, no compilers actually use push to both initialize and make space for locals, but it would be an optimization in cases like this: What C/C++ compiler can use push pop instructions for creating local variables, instead of just increasing esp once?. Here the compiler uses the push only for its effect on ESP, not caring what it stores. It would do the same with optimization enabled, in a function that still needed to spill the value instead of keeping it in a register.)
Your disassembler apparently folds the frame pointer (EBP +) into what it's defining as a this symbol / macro, which makes the listing more confusing if you don't look around at other lines to find out how it defines that text macro or whatever it is.
What disassembler are you using? The one built-in to Visual Studio's debugger?
I guess that would make sense that it's using C local var names this way, even though it looks super weird to people familiar with asm. (Because only static storage is addressable with a mode like [symbol] not involving any registers.)
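To make the role of ECX and the [this] slot concrete, here is a minimal sketch that rewrites Do() as a free function taking the object pointer explicitly. `CAddLayout`, `Do_explicit`, and `y_offset` are hypothetical names for illustration, and the 4-byte offset assumes a 4-byte int as on the 32-bit target in the question.

```cpp
#include <cassert>
#include <cstddef>

// Plain-struct stand-in for CAdd, with public members so offsets can be checked.
struct CAddLayout {
    int x;  // offset 0: read by `mov eax,dword ptr [eax]`
    int y;  // offset 4 with a 4-byte int: read by `add eax,dword ptr [ecx+4]`
};

// Hypothetical free-function view of CAdd::Do(); `self` plays the role of ECX.
int Do_explicit(const CAddLayout* self) {
    const CAddLayout* this_local = self;  // mov dword ptr [this],ecx  (spill)
    int result = this_local->x;           // reload this, then load [this+0]
    result += this_local->y;              // reload this, then add [this+4]
    return result;
}

// Offset of y inside the object, i.e. the 4 in [ecx+4].
std::size_t y_offset() { return offsetof(CAddLayout, y); }
```

Calling `Do_explicit(&ca)` on an object holding 1 and 2 yields 3, mirroring `ca.Do()`. The local `this_local` is the stack slot that the debugger's listing labels `[this]`.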
I am debugging a simple piece of C++ code and looking at the disassembly.
In the disassembly, all the calculations are done in registers, and the result of the operation is returned afterwards. I only see the a and b variables being pushed onto the stack (the code is below). I don't see the resultant c variable pushed onto the stack. Am I missing something?
From what I found on the internet, all of the variables a, b and c should be pushed onto the stack, but in my disassembly that is not the case for c.
C++ code:
#include<iostream>
using namespace std;
int AddMe(int a, int b)
{
int c;
c = a + b;
return c;
}
int main()
{
AddMe(10, 20);
return 0;
}
Relevant assembly code:
int main()
{
00832020 push ebp
00832021 mov ebp,esp
00832023 sub esp,0C0h
00832029 push ebx
0083202A push esi
0083202B push edi
0083202C lea edi,[ebp-0C0h]
00832032 mov ecx,30h
00832037 mov eax,0CCCCCCCCh
0083203C rep stos dword ptr es:[edi]
0083203E mov ecx,offset _E7BF1688_Function@cpp (0849025h)
00832043 call @__CheckForDebuggerJustMyCode@4 (083145Bh)
AddMe(10, 20);
00832048 push 14h
0083204A push 0Ah
0083204C call std::operator<<<std::char_traits<char> > (08319FBh)
00832051 add esp,8
return 0;
00832054 xor eax,eax
}
As seen above, 14h and 0Ah are pushed onto the stack - corresponding to AddMe(10, 20);
But, when we look at the disassembly for the AddMe function, we see that the variable c (c = a + b), is not pushed onto the stack.
snippet of AddMe in Disassembly:
…
int c;
c = a + b;
00836028 mov eax,dword ptr [a]
0083602B add eax,dword ptr [b]
0083602E mov dword ptr [c],eax
return c;
00836031 mov eax,dword ptr [c]
}
shouldn't c be pushed to the stack in this program? Am I missing something?
All the calculations take place in registers.
Well yes, but they're stored afterwards.
Using memory-destination add instead of just using the accumulator register (EAX) would be an optimization, and one that's impossible when the result needs to be in a different location than any of the inputs to an expression.
Why is the stack not storing the result of the register computation here
It is, just not with push
You compiled with optimization disabled (debug mode) so every C object really does have its own address in the asm, and is kept in sync between C statements. i.e. no keeping C variables in registers. (Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?). This is one reason why debug mode is extra slow: it's not just avoiding optimizations, it's forcing store/reload.
But the compiler uses mov, not push, because c is not a function arg. That's a missed optimization that all compilers share, but in this case it's not even trying to optimize. (What C/C++ compiler can use push pop instructions for creating local variables, instead of just increasing esp once?). It would certainly be possible for the compiler to reserve space for c in the same instruction as storing it, using push. But compilers instead do stack allocation for all locals on entry to a function, with one sub esp, constant.
Somewhere before the mov dword ptr [c],eax that spills c to its stack slot, there's a sub esp, 12 or something that reserves stack space for c. In this exact case, MSVC uses a dummy push to reserve 4 bytes space, as an optimization over sub esp, 4.
In the MSVC asm output, the compiler will emit a c = ebp-4 line or something that defines c as a text substitution for ebp-4. If you looked at plain disassembly you'd just see [ebp-4] or whatever addressing mode instead of [c].
In MSVC asm output, don't assume that [c] refers to static storage. It's actually still stack space as expected, but using a symbolic name for the offset.
Putting your code on the Godbolt compiler explorer with 32-bit MSVC 19.22, we get the following asm which only uses symbolic asm constants for the offset, not the whole addressing mode. So [c] might just be that form of listing over-simplifying even further.
_c$ = -4 ; size = 4
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
int AddMe(int,int) PROC ; AddMe
push ebp
mov ebp, esp ; set up a legacy frame pointer
push ecx ; RESERVE 4B OF STACK SPACE FOR c
mov eax, DWORD PTR _a$[ebp]
add eax, DWORD PTR _b$[ebp] ; c = a+b
mov DWORD PTR _c$[ebp], eax ; spill c to the stack
mov eax, DWORD PTR _c$[ebp] ; reload it as the return value
mov esp, ebp ; restore ESP
pop ebp ; tear down the stack frame
ret 0
int AddMe(int,int) ENDP ; AddMe
The __cdecl calling convention, which AddMe() uses by default (depending on the compiler's configuration), requires parameters to be passed on the stack. But there is nothing requiring local variables to be stored on the stack. The compiler is allowed to use registers as an optimization, as long as the intent of the code is preserved.
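The store/reload behaviour described above can be imitated in source code. The following is a sketch only, with `AddMe_spilled` and `AddMe_plain` as hypothetical names: declaring c as volatile forces a real store and a real reload even in an optimized build, which roughly mirrors what a debug build does for every local.

```cpp
#include <cassert>

// `volatile` forces c to live in memory: one store, then one reload,
// like the mov [c],eax / mov eax,[c] pair in the debug-build listing.
int AddMe_spilled(int a, int b) {
    volatile int c;  // must occupy a stack slot
    c = a + b;       // store the sum to c's slot
    return c;        // reload c as the return value
}

// Without volatile, an optimizing compiler is free to keep c in a register
// (or fold it away entirely); the observable result is the same.
int AddMe_plain(int a, int b) {
    int c = a + b;
    return c;
}
```

Both functions return the same value; only the generated code differs, which is the point made by the last answer: nothing requires a local variable to have a stack slot as long as the program's behaviour is preserved.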
I am trying to encrypt a text file using the adobe type 1 font encryption algorithm. However, I don't know how to properly implement the algorithm in assembly language. Please, help me if you can.
Here is the adobe type 1 font encryption algorithm:
unsigned short int r;
unsigned short int c1 =52845;
unsigned short int c2 = 22719;
unsigned char encrypt(unsigned char plain)
{ unsigned char cipher;
cipher = (plain ^ (r >> 8));
r = (cipher + r) * c1 + c2;
return cipher;
}
Here is my code:
.model tiny
.data
filename db "file.txt", 0
bufferSize = 512
filehandle dw ?
buffer db bufferSize dup (0)
r dw 0
c1 dw 52845
c2 dw 22719
cipher dw ?
message1 db 'Cannot open file. $'
message2 db 'Cannot read file. $'
message3 db 'Cannot close file. $'
.code
org 100h
start:
call open
call read
call close
call Exit
;procedures
open:
mov ah,3DH
mov al,0
mov dx, offset filename
int 21h
jc openErr
mov filehandle, ax
ret
read: ;reads file
mov ah, 3Fh
mov bx, filehandle
mov cx, bufferSize
mov dx, offset buffer
int 21h
cmp ax,0
jc readErr
;displays content of file
call clear
mov ah, 9
mov dx, offset buffer
int 21h
ret
close:
mov ah, 3Eh
mov bx, filehandle
int 21h
jc closeErr
ret
encrypt:
; need loop to loop through each char, don't know how to do that
mov ax, [r]
shr ax, 8
mov bl, [buffer]
xor bh,bh
xor bx, ax
mov cipher, bx
mov dx, cipher
add dx, [r] ;get error: extra characters on line
imul dx, c1
add dx, c2
mov [r], dx
;decrypt:
clear: ;clears the screen
mov ax,003h
int 10h
ret
Exit:
mov ax, 4C00h
int 21h
newline: ;prints a newline
mov ah, 2
mov dl, 0DH
int 21h
mov dl, 0AH
int 21h
ret
;error messages
openErr :
call newline
lea DX,message1 ;set up pointer to error message
mov AH,9 ;display string function
int 21H ;DOS call
stc ;set error flag
ret
readErr :
call newline
lea DX,message2 ;set up pointer to error message
mov AH,9 ;display string function
int 21H ;DOS call
stc ;set error flag
ret
closeErr :
call newline
lea DX,message3 ;set up pointer to error message
mov AH,9 ;display string function
int 21H ;DOS call
stc ;set error flag
ret
end start
;encrypt:
; need loop to loop through each char, don't know how to do that
;mov ax, r
;shr ax, 8 r>>8
;mov bl, buffer bl = buffer
buffer is a symbolic address, not the value in the buffer. Use mov bl,[buffer] to read the value from memory. If you want to loop over the full buffer content, put that address into a register. si is often favoured for holding a pointer to the source of data (because it is hardcoded in the lods instructions, and "s" may be read as "source"). A C++ compiler doesn't care and picks any spare register it has, without considering its name.
;xor bx, ax bx = buffer ^(r>>8)
bh was not set and may be anything, ruining your calculation. Either do xor bx,bx before loading the bl from buffer, or xor bh,bh to clear bh only (tiny fraction slower on modern x86 because it must then combine bh+bl into bx), or use movzx bx,byte ptr [buffer] in previous step to zero-extend the char value into word value.
;mov cipher, bx cipher = buffer^(r>>8)
The label cipher was defined ahead of db ?, but here you are writing two bytes into memory, so the store will also overwrite the first byte of the message1 string. I also strongly suggest using [] every time you access memory. TASM/MASM allows the bare-symbol syntax, but I find it difficult to read (seeing [cipher] makes my brain automatically think "memory access" even at a quick glance over the source).
;mov cx, c1
c1 and c2 can be EQU constants, so they will be assembled as immediate values directly into the code.
;add cx, c2
That's not what the C encrypt does (multiplication has priority over addition).
;mov dx, cipher
Again you treat cipher as word, which makes sense, but it's against the db ? (so the db needs fixing, not code here).
;add dx, r
;imul dx, cx
I would use [] brackets around r again, and the multiplication/addition order is of course wrong.
Not a bad try, but don't hesitate to uncomment that piece of code, pre-set the registers with some debug values, jump straight to it, open the binary in a debugger, and single-step over it to verify that it works the way you want. You would probably quickly find all the things I wrote above.
Then just turn it into a loop (use any ASM book/tutorial with examples about working with arrays/strings, learn to use pointers, and it should not be that hard to finish).
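As a reference to check the assembly against while single-stepping, here is a minimal C++ sketch of the loop over a buffer. The function names are hypothetical, and the starting value of r is an assumption left to the caller (the question's assembly starts it at 0). The sum is multiplied by c1 before c2 is added, as in the C original, and the arithmetic is done in unsigned 32-bit so that the truncation to 16 bits is well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Constants from the algorithm quoted in the question.
constexpr std::uint16_t kC1 = 52845;
constexpr std::uint16_t kC2 = 22719;

// Encrypts buf in place. r is the running 16-bit key.
void encrypt_buffer(unsigned char* buf, std::size_t len, std::uint16_t r) {
    for (std::size_t i = 0; i < len; ++i) {
        unsigned char cipher = static_cast<unsigned char>(buf[i] ^ (r >> 8));
        // Unsigned 32-bit math avoids signed overflow; the cast keeps the low 16 bits.
        r = static_cast<std::uint16_t>((std::uint32_t{cipher} + r) * kC1 + kC2);
        buf[i] = cipher;
    }
}

// Inverse operation: the key update is driven by the cipher byte in both directions.
void decrypt_buffer(unsigned char* buf, std::size_t len, std::uint16_t r) {
    for (std::size_t i = 0; i < len; ++i) {
        unsigned char cipher = buf[i];
        buf[i] = static_cast<unsigned char>(cipher ^ (r >> 8));
        r = static_cast<std::uint16_t>((std::uint32_t{cipher} + r) * kC1 + kC2);
    }
}
```

In the assembly version, the index `i` corresponds to a pointer register such as si walking through buffer, and `len` corresponds to the byte count returned in ax by the read call.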
I'm using C++Builder for a GUI application on Win32. The Borland compiler's optimization is very bad, and it does not know how to use SSE.
I have a function that is 5 times faster when compiled with mingw gcc 4.7.
I am thinking about asking gcc to generate assembler code and then using this code inside my C function, because the Borland compiler allows inline assembler.
The function in C looks like this :
void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
{
double s = 77.777;
size_t m = mA[NT-3];
AV[2]=x[n-4]+m*s;
}
I made the function code very simple in order to simplify my question. My real function contains many loops.
The Borland C++ compiler generated this assembler code :
;
; void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
;
@1:
push ebp
mov ebp,esp
add esp,-16
push ebx
;
; {
; double s = 77.777;
;
mov dword ptr [ebp-8],1580547965
mov dword ptr [ebp-4],1079210426
;
; size_t m = mA[NT-3];
;
mov edx,dword ptr [ebp+20]
mov ecx,dword ptr [ebp+24]
mov eax,dword ptr [edx+4*ecx-12]
;
; AV[2]=x[n-4]+m*s;
;
?live16385@48: ; EAX = m
xor edx,edx
mov dword ptr [ebp-16],eax
mov dword ptr [ebp-12],edx
fild qword ptr [ebp-16]
mov ecx,dword ptr [ebp+8]
mov ebx,dword ptr [ebp+12]
mov eax,dword ptr [ebp+16]
fmul qword ptr [ebp-8]
fadd qword ptr [ecx+8*ebx-32]
fstp qword ptr [eax+16]
;
; }
;
?live16385@64: ;
@2:
pop ebx
mov esp,ebp
pop ebp
ret
While the gcc generated assembler code is :
_Test_Fn:
mov edx, DWORD PTR [esp+20]
mov eax, DWORD PTR [esp+16]
mov eax, DWORD PTR [eax-12+edx*4]
mov edx, DWORD PTR [esp+8]
add eax, -2147483648
cvtsi2sd xmm0, eax
mov eax, DWORD PTR [esp+4]
addsd xmm0, QWORD PTR LC0
mulsd xmm0, QWORD PTR LC1
addsd xmm0, QWORD PTR [eax-32+edx*8]
mov eax, DWORD PTR [esp+12]
movsd QWORD PTR [eax+16], xmm0
ret
LC0:
.long 0
.long 1105199104
.align 8
LC1:
.long 1580547965
.long 1079210426
.align 8
I would like help understanding how function argument access is done in gcc and Borland C++.
My function in C++ for Borland would be something like :
void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
{
__asm
{
put gcc generated assembler here
}
}
Borland accesses the arguments through the ebp register, while gcc uses the esp register.
Can I force one of the compilers to generate compatible code for accessing the arguments, using a calling convention like cdecl or stdcall?
The arguments are passed similarly in both cases. The difference is that the code generated by Borland expresses the argument locations relative to EBP register and GCC relative to ESP, but both of them refer to the same addresses.
Borland sets EBP to point to the start of the function's stack frame and expresses locations relative to that, while GCC doesn't set up a new stack frame but expresses locations relative to ESP, which the caller has left pointing to the end of the caller's stack frame.
The code generated by Borland sets up a stack frame at the beginning of the function, causing EBP in the Borland code to be equal to ESP in the GCC code decreased by 4. This can be seen by looking at the first two Borland lines:
push ebp ; decrease esp by 4
mov ebp,esp ; ebp = the original esp decreased by 4
The GCC code doesn't alter ESP and the Borland code doesn't alter EBP until the end of the procedure, so the relationship holds whenever the arguments are accessed.
The calling convention seems to be cdecl in both of the cases, and there's no difference in how the functions are called. You can add keyword __cdecl to both in order to make that clear.
void __cdecl Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
However adding inline assembly compiled with GCC to the function compiled with Borland is not straightforward, because Borland might set up a stack frame even if the function body contains only inline assembly, causing the value of ESP register to differ from the one used in the GCC code. I see three possible workarounds:
Compile with Borland without the option "Standard stack frames". If the compiler figures out that a stack frame is not needed, this might work.
Compile with GCC without the option -fomit-frame-pointer. This should make sure that at least the value of EBP is the same in both. The option is enabled at levels -O, -O2, -O3 and -Os.
Manually edit the assembly produced by GCC, changing references to ESP to EBP and adding 4 to the offset.
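The offset arithmetic behind the third workaround can be sketched as follows. The helper names are hypothetical, and the mapping assumes 4-byte cdecl argument slots and GCC code that never changes ESP inside the function, as in the listing above.

```cpp
#include <cassert>

// After `push ebp; mov ebp,esp`, EBP equals the entry value of ESP minus 4.
constexpr int kPushedEbpBytes = 4;

// Converts a GCC-style [esp+off] operand (no frame set up) into the
// equivalent Borland-style [ebp+off] operand.
constexpr int esp_to_ebp_offset(int esp_off) { return esp_off + kPushedEbpBytes; }

// cdecl slot of argument `index` relative to ESP at function entry:
// [esp+0] holds the return address, followed by the 4-byte arguments in order.
constexpr int arg_esp_offset(int index) { return 4 + 4 * index; }

// Test_Fn(x, n, AV, mA, NT): the results match the offsets in both listings.
static_assert(esp_to_ebp_offset(arg_esp_offset(0)) == 8,  "x  is at [ebp+8]");
static_assert(esp_to_ebp_offset(arg_esp_offset(1)) == 12, "n  is at [ebp+12]");
static_assert(esp_to_ebp_offset(arg_esp_offset(2)) == 16, "AV is at [ebp+16]");
static_assert(esp_to_ebp_offset(arg_esp_offset(3)) == 20, "mA is at [ebp+20]");
static_assert(esp_to_ebp_offset(arg_esp_offset(4)) == 24, "NT is at [ebp+24]");
```

For example, GCC's `mov edx, DWORD PTR [esp+20]` (NT) becomes `mov edx, DWORD PTR [ebp+24]` inside a function that has a Borland-style frame.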
I would recommend you do some reading up on Application Binary Interfaces.
Here is a relevant link to help you figure out what compiler generates what sort of code:
https://en.wikipedia.org/wiki/X86_calling_conventions
I'd try either compiling everything with GCC, or see if compiling just the critical file with GCC and the rest with Borland and linking together works. What you explain can be made to work, but it will be a hard job that probably isn't worth your invested time (unless it will run very frequently on many, many machines).
I'm learning x86 asm and using masm, and am trying to write a function which has the equivalent signature to the following c function:
void func(double a[], double b[], double c[], int len);
I'm not sure how to implement it?
The asm file will be compiled into a win32 DLL.
So that I can understand how to do this, can someone please translate this very simple function into asm for me:
void func(double a[], double b[], double c[], int len)
{
// a, b, and c have the same length, given by len
for (int i = 0; i < len; i++)
c[i] = a[i] + b[i];
}
I tried writing a function like this in C, compiling it, and looking at the corresponding disassembled code in the exe using OllyDbg but I couldn't even find my function in it.
Thank you kindly.
I haven't written x86 for a while but I can give you a general idea of how to do it. Since I don't have an assembler handy, this is written in notepad.
func proc a:DWORD, b:DWORD, c:DWORD, len:DWORD
mov eax, len
test eax, eax
jnz @F
ret
@@:
push ebx
push esi
xor eax, eax
mov esi, a
mov ebx, b
mov ecx, c
@@:
mov edx, dword ptr ds:[esi+eax*4] ; edx = a[i]
add edx, dword ptr ds:[ebx+eax*4] ; edx += b[i]
mov [ecx+eax*4], edx ; c[i] = a[i] + b[i]
inc eax ; i++
cmp eax, len
jl @B
pop esi
pop ebx
ret
func endp
The above function conforms to stdcall and is approximately how you would translate to x86 if your arguments were integers. Unfortunately, you are using doubles. The loop would be the same but you'd need to use the FPU stack and opcodes for doing the arithmetic. I haven't used that for a while and couldn't remember the instructions off the top of my head unfortunately.
You have to pass the memory addresses of the arrays. Consider the following code:
.data?
array1 DWORD 4 DUP(?)
.code
main PROC
push LENGTHOF array1
push OFFSET array1
call arrayFunc
main ENDP
arrayFunc PROC
push ebp
mov ebp, esp
push edi
mov edi, [ebp+08h]
mov ecx, [ebp+0Ch]
L1:
;reference each element of given array by [edi]
;add "TYPE" *array* to edi to increment
loop L1
pop edi
pop ebp
ret 8
arrayFunc ENDP
END main
I just wrote this code so you can understand the concept. I leave it to you to work out how to use the registers properly in order to achieve your program's goals.
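For comparison, the same pointer-walking pattern can be written out in C++ for the double arrays from the question. This is a sketch with a hypothetical function name, not a translation of either assembly listing: the three pointers play the role of address registers such as edi, and `remaining` plays the role of the ecx counter.

```cpp
#include <cassert>

// c[i] = a[i] + b[i], written with explicit pointers and a down-counter
// to mirror how the assembly walks the arrays.
void add_arrays(const double* a, const double* b, double* c, int len) {
    // Unlike the `loop` instruction, this guard also handles len <= 0 safely.
    for (int remaining = len; remaining > 0; --remaining) {
        *c = *a + *b;  // load a[i], add b[i], store to c[i]
        ++a;           // each ++ advances by sizeof(double),
        ++b;           // i.e. by the TYPE of one array element
        ++c;
    }
}
```

The caller passes only the array addresses and the element count, which is exactly what the pushes before `call arrayFunc` do in the assembly example.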