Compiler generated unexpected `IN AL, DX` (opcode `EC`) while setting up call stack - visual-studio

I was looking at some compiler output, and when a function is called it usually starts setting up the call stack like so:
PUSH EBP
MOV EBP, ESP
PUSH EDI
PUSH ESI
PUSH EBX
So we save the base pointer of the calling routine on the stack, move our own base pointer up, and then store the contents of a few registers on the stack. These are then restored to their original values at the end of the routine, like so:
LEA ESP, [EBP-0Ch]
POP EBX
POP ESI
POP EDI
POP EBP
RET
So far, so good. However, I noticed that in one routine the code that sets up the call stack looks a little different. In fact, it looks like this:
IN AL, DX
PUSH EDI
PUSH ESI
PUSH EBX
This is quite confusing for a number of reasons. For one thing, the end-of-method code is identical to that quoted above for the other method, and in particular seems to expect a saved copy of EBP to be available on the stack.
For another, if I understand correctly the command IN AL, DX reads into the AL register, which is the same as the EAX register, and as it so happens the very next command here is
XOR EAX, EAX
as the program wants to zero a few things it allocated on the stack.
Question: I'm wondering exactly what's going on here that I don't understand. The machine code being translated as IN AL, DX is the single byte EC, whereas the pair of instructions
PUSH EBP
MOV EBP, ESP
would correspond to three byte 55 88 EC. Is the disassembler misreading this somehow? Or is something relying on a side effect I don't understand?
If anyone's curious, this machine code was generated by the CLR's JIT compiler, and I'm viewing it with the Visual Studio debugger. Here's a minimal reproduction in C#:
class C {
string s = "";
public void f(string s) {
this.s = s;
}
}
However, note that this seems to be non-deterministic; sometimes I seem to get the IN AL, DX version, while other times there's a PUSH EBP followed by a MOV EBP, ESP.
EDIT: I'm starting to strongly suspect a disassembler bug -- I just got another situation where it shows IN AL, DX (opcode EC) and the two preceding bytes in memory are 55 88. So perhaps the disassembler is simply confused about the entry point of the method. (Though I'd still like some insight as to why that's happening!)

Sounds like you are using VS2015. Your conclusion is correct, its debugging engine has a lot of bugs. Yes, wrong address. Not the only problem, it does not restore breakpoints properly and you are apt to see the INT3 instruction still in the code. And it can't correctly refresh the disassembly when the jitter has re-generated the code and replace stub calls. You can't trust anything you see.
I recommend you use Tools > Options > Debugging > General and tick the "Use Managed Compatibility Mode" checkbox. That forces the debugger to use an older debugging engine, VS2010 vintage. It is much more stable.
You'll lose some features with this engine, like return value inspection and 64-bit Edit+Continue. Won't be missed when you do this kind of debugging. You will however see fake code addresses, as was always common before, so all CALL addresses are wrong and you can't easily identify calls into the CLR. Flipping the engine back-and-forth is a workaround of sorts, but of course a big annoyance.
This has not been worked on either, I saw no improvements in the Updates. But they no doubt had a big bug list to work through, VS2015 shipped before it was done. Hopefully VS2017 is better, we'll find out soon.

As Hans's answered, it's a bug in Visual Studio.
To confirm the same, I disassembled a binary using IDA 6.5 and Visual Studio 2019. Here is the screenshot:
Visual Studio 2019 missed 2 bytes (0x55 0x8B) while considering the start of main.
Note: 'Use managed compatibility mode' mentioned by Hans didn't fix the issue in VS2019.

Related

x86 assembler pushf causes program to exit

I think my real problem is I don't completely understand the stack frame mechanism so I am looking to understand why the following code causes the program execution to resume at the end of the application.
This code is called from a C function which is several call levels deep and the pushf causes program execution to revert back several levels through the stack and completely exit the program.
Since my work around works as expected I would like to know why using the pushf instruction appears to be (I assume) corrupting the stack.
In the routines I usually setup and clean up the stack with :
sub rsp, 28h
...
add rsp, 28h
However I noticed that this is only necessary when the assembly code calls a C function.
So I tried removing this from both routines but it made no difference. SaveFlagsCmb is an assembly function but could easily be a macro.
The code represents an emulated 6809 CPU Rora (Rotate Right Register A).
PUBLIC Rora_I_A ; Op 46 - Rotate Right through Carry A reg
Rora_I_A PROC
sub rsp, 28h
; Restore Flags
mov cx, word ptr [x86flags]
push cx
popf
; Rotate Right the byte and save the FLAGS
rcr byte ptr [q_s+AREG], 1
; rcr only affects Carry. Save the Carry first in dx then
; add 0 to result to trigger Zero and Sign/Neg flags
pushf ; this causes jump to end of program ????
pop dx ; this line never reached
and dx, CF ; Save only Carry Flag
add [q_s+AREG], 0 ; trigger NZ flags
mov rcx, NF+ZF+CF ; Flag Mask NZ
Call SaveFlagsCmb ; NZ from the add and CF saved in dx
add rsp, 28h
ret
Rora_I_A ENDP
However if I use this code it works as expected:
PUBLIC Rora_I_A ; Op 46 - Rotate Right through Carry A reg
Rora_I_A PROC
; sub rsp, 28h ; works with or without this!!!
; Restore Flags
mov ah, byte ptr [x86flags+LSB]
sahf
; Rotate Right the byte and save the FLAGS
rcr byte ptr [q_s+AREG], 1
; rcr only affects Carry. Save the Carry first in dx then
; add 0 to result to trigger Zero and Sign/Neg flags
lahf
mov dl, ah
and dx, CF ; Save only Carry Flag
add [q_s+AREG], 0 ; trigger NZ flags
mov rcx, NF+ZF+CF ; Flag Mask NZ
Call SaveFlagsCmb ; NZ from the add and CF saved in dx
; add rsp, 28h ; works with or without this!!!
ret
Rora_I_A ENDP
Your reported behaviour doesn't really make sense. Mostly this answer is just providing some background not a real answer, and a suggestion not to use pushf/popf in the first place for performance reasons.
Make sure your debugging tools work properly and aren't being fooled by something into falsely showing a "jump" to somewhere. (And jump where exactly?)
There's little reason to mess around with 16-bit operand size, but that's probably not your problem.
In Visual Studio / MASM, apparently (according to OP's comment) pushf assembles as pushfw, 66 9C which pushes 2 bytes. Presumably popf also assembles as popfw, only popping 2 bytes into FLAGS instead of the normal 8 bytes into RFLAGS. Other assemblers are different.1
So your code should work. Unless you're accidentally setting some other bit in FLAGS that breaks execution? There are bits in EFLAGS/RFLAGS other than condition codes, including the single-step TF Trap Flag: debug exception after every instruction.
We know you're in 64-bit mode, not 32-bit compat mode, otherwise rsp wouldn't be a valid register. And running 64-bit machine code in 32-bit mode wouldn't explain your observations either.
I'm not sure how that would explain pushf being a jump to anywhere. pushf itself can't fault or jump, and if popf set TF then the instruction after popf would have caused a debug exception.
Are you sure you're assembling 64-bit machine code and running it in 64-bit mode? The only thing that would be different if a CPU decoded your code in 32-bit mode should be the REX prefix on sub rsp, 28h, and the RIP-relative addressing mode on [x86flags] decoding as absolute (which would presumably fault). So I don't think that could explain what you're seeing.
Are you sure you're single-stepping by instructions (not source lines or C statements) with a debugger to test this?
Use a debugger to look at the machine code as you single-step. This seem really weird.
Anyway, it seems like a very low-performance idea to use pushf / popf at all, and also to be using 16-bit operand-size creating false dependencies.
e.g. you can set x86 CF with movzx ecx, word ptr [x86flags] / bt ecx, CF.
You can capture the output CF with setc cl
Also, if you're going to do multiple things to the byte from the guest memory, load it into an x86 register. A memory-destination RCR and a memory-destination ADD are unnecessarily slow vs. load / rcr / ... / test reg,reg / store.
LAHF/SAHF may be useful, but you can also do without them too for many cases. popf is quite slow (https://agner.org/optimize/) and it forces a round trip through memory. However, there is one condition-code outside the low 8 in x86 FLAGS: OF (signed overflow). asm-source compatibility with 8080 is still hurting x86 in 2019 :(
You can restore OF from a 0/1 integer with add al, 127: if AL was originally 1, it will overflow to 0x80, otherwise it won't. You can then restore the rest of the condition codes with SAHF. You can extract OF with seto al. Or you can just use pushf/popf.
; sub rsp, 28h ; works with or without this!!!
Yes of course. You have a leaf function that doesn't use any stack space.
You only need to reserve another 40 bytes (align the stack + 32 bytes of shadow space) if you were going to make another function call from this function.
Footnote 1: pushf/popf in other assemblers:
In NASM, pushf/popf default to the same width as other push/pop instructions: 8 bytes in 64-bit mode. You get the normal encoding without an operand-size prefix. (https://www.felixcloutier.com/x86/pushf:pushfd:pushfq)
Like for integer registers, both 16 and 64-bit operand-size for pushf/popf are encodeable in 64-bit mode, but 32-bit operand size isn't.
In NASM, your code would be broken because push cx / popf would push 2 bytes and pop 8, popping 6 bytes of your return address into RFLAGS.
But apparently MASM isn't like that. Probably a good idea to use explicit operand-size specifiers anyway, like pushfw and popfw if you use it at all, to make sure you get the 66 9C encoding, not just 9C pushfq.
Or better, use pushfq and pop rcx like a normal person: only write to 8 or 16-bit partial registers when you need to, and keep the stack qword-aligned. (16-byte alignment before call, 8-byte alignment always.)
I believe this is a bug in Visual Studio. I'm using 2022, so it's an issue that's been around for a while.
I don't know exactly what is triggering it, however stepping over one specific pushf in my code had the same symptoms, albeit with the code actually working.
Putting a breakpoint on the line after the pushf did break, and allowed further debugging of my app. Adding a push ax, pop ax before the pushf also seemed to fix the issue. So it must be a Visual Studio issue.
At this point I think MASM and debugging in Visual Studio has pretty much been abandoned. Any suggestions for alternatives for developing dlls on Windows would be appreciated!

How to find code for crash

I have some 64-bit code that runs in release mode on a server. There's no Visual studio on the server, only on my dev-machine. The program has been written by many authors now (me latest), and some code in it I'm still not familiar with, and its quite big.
The program crashes now and then with a nullpointer. The instruction at 0xwhatever (latest 0x40066c19) referenced memory at 0x00000000 - click on OK to terminate the program. I have all the source and PDB files for the EXE, but when i run it and attach the process, the memory 0x40066c19 is completely out of range. There is only ?? in that area. How do you use the info about "the instruction at ..." ?
The disassembly window displays something like (example) - but as you see there are simply too far from 00000001403CB888 to 0x40066c19
if (LastKickIdle > GetTickCount())
00000001403CB882 call qword ptr [__imp_GetTickCount (0140688310h)]
00000001403CB888 cmp dword ptr [LastKickIdle (0140888DF8h)],eax
00000001403CB88E ja CMainDlg::OnKickIdle+281h (01403CBAB1h)
return 1;
LastKickIdle = GetTickCount() + 500;
00000001403CB894 mov qword ptr [__formal],rbx
00000001403CB89C call qword ptr [__imp_GetTickCount (0140688310h)]
00000001403CB8A2 add eax,1F4h
00000001403CB8A7 mov dword ptr [LastKickIdle (0140888DF8h)],eax
I run into similar situations at work. I created a log class that takes a string arg and writes it to a file with some other useful info. I make entries at the beginning of methods or at places that I think something may be or may later be problematic. With this, I have at least been able to narrow down my search for problems.
Hope this helps.

Windows 64 ABI, correct register use if i do NOT call windows API?

As suggested to me in another question i checked the windows ABI and i'm left a little confused about what i can and cannot do if i'm not calling windows API myself.
My scenario is i'm programming .NET and need a small chunk of code in asm targeting a specific processor for a time critical section of code that does heavy multi pass processing on an array.
When checking the register information in the ABI at https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
I'm left a little confused about what applies to me if i
1) Don't call the windows API from the asm code
2) Don't return a value and take a single parameter.
Here is what i understand, am i getting all of it right?
RAX : i can overwrite this without preserving it as the function doesn't expect a return value
RCX : I need to preserve this as this is where the single int parameter will be passed, then i can overwrite it and not restore it
RDX/R8/R9 : Should not be initialized as there are no such parameters in my method, i can overwrite those and not restore them
R10/R11 : I can overwrite those without saving them, if the caller needs it he is in charge of preserving them
R12/R13/R14/R15/RDI/RSI/RBX : I can overwrite them but i first need to save them (or can i just not save them if i'm not calling the windows API?)
RBP/RSP : I'm assuming i shouldn't touch those?
If so am i correct that this is the right way to handle this (if i don't care about the time taking to preserve data and need as many registers available as possible)? Or is there a way to use even more registers?
; save required registers
push r12
push r13
push r14
push r15
push rdi
push rsi
push rbx
; my own array processing code here, using rax as the memory address passed as the first parameter
; safe to use rax rbx rcx rdx r8 r9 r10 r11 r12 r13 r14 r15 rdi rsi giving me 14 64bit registers
; 1 for the array address 13 for processing
; should not touch rbp rsp
; restore required registers
pop rbx
pop rsi
pop rdi
pop r15
pop r14
pop r13
pop r12
TL;DR: if you need registers that are marked preserved, push/pop them in proper order. With your code you can use those 14 registers you mention without issues. You may touch RBP if you preserve it, but don't touch RSP basically ever.
It does matter if you call Windows APIs but not in the way I assume you think. The ABI says what registers you must preserve. The preservation information means that the caller knows that there are registers you will not change. You don't need to call any Windows API functions for that requirement to be there.
The idea as an analogue (yeah, I know...): Here are five different colored stacks of sticky notes. You can use any of them, but if you need the red or the blue ones, could you keep the top one in a safe place and put it back when you stop since I need the phone numbers on them. About the other colors I don't care, they were just scratch paper and I've written the information elsewhere.
So if you call an external function you know that no function will ever change the value of the registers marked as preserved. Any other register may change their values and you have to make sure you don't have anything there that needs to be preserved.
And when your function is called, the caller expects the same: if they put a value in a preserved register, it will have the same value after the call. But any non-preserved registers may be whatever and they will make sure they store those values if they need to keep them.
The return value register you may use however you want. If the function doesn't return a value the caller must not expect it to have any specific value and also will not expect it to preserve its value.
You only need to preserve the registers you use. If you don't use all of these, you don't need to preserve all of them.
You can freely use RAX, RCX, RDX, R8, R9, R10 and R11. The latter two must be preserved by the caller, if necessary, not by your function.
Most of the time, these registers (or their subregisters like EAX) are enough for my purposes. I hardly ever need more.
Of course, if any of these (e.g. RCX) contain arguments for your function, it is up to you to preserve them for yourself as long as you need them. How you do that is also up to you. But if you push them, make sure that there is a corresponding pop somewhere.
Use This MSDN page as a guide.

Having problems with GdiGradientFill in FASM

I'm trying to write an ASM version of a Java app I developed recently, as a project in Win32 ASM, but as the title states, I'm having problems with GdiGradientFill; I'd prefer, for the moment, to use FASM, and avoid higher level ASM constructs, such as INVOKE and the use of the WIN32 includes.
What I have, atm:
PUSH [hWnd]
CALL [User32.GetWindowDC]
MOV [hDC], EAX
PUSH rectClient
PUSH [hWnd]
CALL [User32.GetClientRect]
PUSH [rectClient.left]
POP [colorOne.xPos]
PUSH [rectClient.top]
POP [colorOne.yPos]
MOV [colorOne.red], 0xC000
MOV [colorOne.green], 0xC000
MOV [colorOne.blue], 0xC000
MOV [colorOne.alpha], 0x0000
PUSH [rectClient.right]
POP [colorTwo.xPos]
PUSH [rectClient.bottom]
POP [colorTwo.yPos]
MOV [colorTwo.red], 0x0000
MOV [colorTwo.green], 0x2800
MOV [colorTwo.blue], 0x7700
MOV [colorTwo.alpha], 0x0C00
MOV [gRect.UpperLeft], 0
MOV [gRect.LowerRight], 1
PUSH GRADIENT_FILL_RECT_H
PUSH 1
PUSH gRect
PUSH 2
PUSH colorOne
PUSH [hDC]
CALL [GDI32.GdiGradientFill]
However, the code returns only a FALSE, and after going through both MSDN
(http://msdn.microsoft.com/en-us/library/windows/desktop/dd373585(v=vs.85).aspx)
and some other examples (http://www.asmcommunity.net/board/index.php?topic=4100.0), I still can't see what I am doing wrong, can anyone see the flaw here?
An additional problem has been with my attempts to use Msimg32's GradientFill, as this always leads to a crash, however, I have seen some reports that Win2K+ OS's simply pass the parameters from Msimg32 to GDI32; is this accurate, or has anyone else experienced problems with this form?
Pastebin link for whole code: http://pastebin.com/GEHDw6Qe
Thanks for any help, SS
EDIT:
Code is now working, honestly, I have no idea what has changed, I can't see anything different between the previous and now working data, other than changing the PUSH / POP sequence to MOV EAX, [rectClient.left], ect (The PUSH / POP method works, also) - Many thanks to those who offered assistance!
You're passing what looks like a RECT as the 4th parameter to GdiGradientFill. The function expects a GRADIENT_TRIANGLE.
Also, PUSH/POP is a very weird way to copy from one memory location to another. You're doing 4 memory accesses instead of two. Copy via a register; this is not Java.
Are you sure GetWindowDC is what you need? That one returns the DC for the whole window, title and border and all. For just the client area, people normally use GetDC(). When done, call ReleaseDC().

When and why use CoLoadLibrary?

The description of CoLoadLibrary() says it does pretty much the same as LoadLibraryEx() - loads a DLL into the process. COM classes creation functions - CoCreateInstance() and CoGetClassObject() - both do load the necessary DLL into the process too.
Then why is CoLoadLibrary() needed in the first place and how should it be used?
Have a look at the code:
mov edi,edi
push ebp
mov ebp,esp
push 8
push 0
push dword ptr [ebp+8]
call dword ptr [ole32!_imp__LoadLibraryExW (71eb1214)]
pop ebp
ret 8
So it just calls:
LoadLibraryEx( FileName, NULL, LOAD_WITH_ALTERED_SEARCH_PATH ).
Presumably, the routine merely exists for backwards compatibility -- it probably has its roots in Win16.
Perhaps if you were writing your own regsvr32.exe? But JP's disassembly doesn't really support my guess, because you could just use LoadLibraryEx instead. Maybe in the olden days, Microsoft planned on COM DLLs someday being loaded in a different way than regular DLLs (D-COM?), so this was a way of ensuring future compatibility.

Resources