Inline Assembly code in VC++ :: help needed on syscall to WaitForSingleObject - winapi

I have written a 64-bit command-line music player in VC++ (VS2019), compiled with the Intel C++ compiler, that plays WAV files using WASAPI. The OS is Win 7-SP1.
This is the part of the code I have questions about. Variable declarations are left out for conciseness.
// activate an IAudioClient
IAudioClient* pAudioClient;
...
...
// create an event
HANDLE hNeedDataEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
// set the event handle
hr = pAudioClient->SetEventHandle(hNeedDataEvent);
...
...
//works fine
do
{
WaitForSingleObject(hNeedDataEvent, INFINITE);
hr = pAudioRenderClient->ReleaseBuffer(nFramesInBuffer, 0);
hr = pAudioRenderClient->GetBuffer(nFramesInBuffer, &pData);
memcpy(pData, sound_buffer + nBytesToSkip, nBytesThisPass);
nBytesToSkip += nBytesThisPass;
} while (--nBuffersPlayed);
I want to replace the line of code: WaitForSingleObject(hNeedDataEvent, INFINITE);
with inline assembly code using a syscall. Portability is unimportant since this is just for experimentation/learning, because I have no knowledge of assembler.
I found a syscall table for Win7-SP1 on Github and here's what it says for NtWaitForSingleObject:
; ULONG64 __stdcall NtWaitForSingleObject( ULONG64 arg_01 , ULONG64 arg_02 , ULONG64 arg_03 );
NtWaitForSingleObject PROC STDCALL
mov r10 , rcx
mov eax , 1
;syscall
db 0Fh , 05h
ret
NtWaitForSingleObject ENDP
I think the inline Assembly code to replace the call to WaitForSingleObject should be:
__asm
{
mov r10, ?????? ; pHandle
xor edx, edx ; FALSE: The alert cannot be delivered
xor r8d, r8d ; Time-out interval, in microseconds. NULL means infinite
mov eax, 1 ; code number for WaitForSingleObject
syscall
}
My questions are:
What exactly do I need to move to r10 so that it will contain the "handle" of the event?
Is the rest of the inline Assembly code correct?
As an aside, when I disassembled my compiled code I see this:
mov rcx, [rbp+220h+hHandle] ; hHandle
mov edx, 0FFFFFFFFh ; dwMilliseconds
call cs:WaitForSingleObject

__asm
{
mov r10, hNeedDataEvent ; Handle: the event handle value itself
xor edx, edx ; Alertable = FALSE: the alert cannot be delivered
xor r8d, r8d ; Timeout: NULL pointer, which means wait indefinitely
mov eax, 1 ; system call number for NtWaitForSingleObject on Win7-SP1
syscall
}
I just assigned the value of hNeedDataEvent to r10 and it worked.
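One detail behind the third argument may be worth a sketch. As far as I know, the native NtWaitForSingleObject call takes a pointer to a 64-bit timeout counted in 100-nanosecond units, with negative values meaning "relative to now", and only a NULL pointer means "wait forever". The helper name ms_to_nt_timeout below is hypothetical and not part of any Windows header; it only illustrates the unit conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a Windows API): convert a relative timeout in
   milliseconds into the 64-bit value the native wait call would read
   through its timeout pointer. Units are 100 ns; negative means relative. */
int64_t ms_to_nt_timeout(uint32_t ms)
{
    return -((int64_t)ms * 10000); /* 1 ms = 10,000 units of 100 ns */
}
```

For an infinite wait, as in the snippet above, no such value is needed at all: the register simply holds a NULL pointer.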

Related

What does 'dword ptr[this]' mean in VS disassembly of C++ code?

Before calling a member function of an object, the address of the object will be moved to ECX.
Inside the function, ECX will be moved to dword ptr [this], what does this mean?
C++ Source
#include <iostream>
class CAdd
{
public:
CAdd(int x, int y) : _x(x), _y(y) {}
int Do() { return _x + _y; }
private:
int _x;
int _y;
};
int main()
{
CAdd ca(1, 2);
int n = ca.Do();
std::cout << n << std::endl;
}
Disassembly
...
CAdd ca(1, 2);
00A87B4F push 2
00A87B51 push 1
00A87B53 lea ecx,[ca] ; the instance address
00A87B56 call CAdd::CAdd (0A6BA32h)
int Do() { return _x + _y; }
00A7FFB0 push ebp
00A7FFB1 mov ebp,esp
00A7FFB3 sub esp,0CCh
00A7FFB9 push ebx
00A7FFBA push esi
00A7FFBB push edi
00A7FFBC push ecx
00A7FFBD lea edi,[ebp-0Ch]
00A7FFC0 mov ecx,3
00A7FFC5 mov eax,0CCCCCCCCh
00A7FFCA rep stos dword ptr es:[edi]
00A7FFCC pop ecx
00A7FFCD mov dword ptr [this],ecx ; ========= QUESTION HERE!!! =========
00A7FFD0 mov ecx,offset _CC7F790E_main#cpp (0BC51F2h)
00A7FFD5 call #__CheckForDebuggerJustMyCode#4 (0A6AC36h)
00A7FFDA mov eax,dword ptr [this] ; ========= AND HERE!!! =========
00A7FFDD mov eax,dword ptr [eax]
00A7FFDF mov ecx,dword ptr [this]
00A7FFE2 add eax,dword ptr [ecx+4]
00A7FFE5 pop edi
00A7FFE6 pop esi
00A7FFE7 pop ebx
00A7FFE8 add esp,0CCh
00A7FFEE cmp ebp,esp
00A7FFF0 call __RTC_CheckEsp (0A69561h)
00A7FFF5 mov esp,ebp
00A7FFF7 pop ebp
00A7FFF8 ret
MSVC's asm output itself (https://godbolt.org/z/h44rW3Mxh) uses _this$[ebp] with _this$ = -4, in a debug build like this which wastes instructions storing/reloading incoming register args.
_this$ = -4
int CAdd::Do(void) PROC ; CAdd::Do, COMDAT
push ebp
mov ebp, esp
push ecx ; dummy push instead of sub to reserve 4 bytes
mov DWORD PTR _this$[ebp], ecx
mov eax, DWORD PTR _this$[ebp]
...
This is just spilling the register arg to a local on the stack with that name. (The default options for the MSVC version I used on Godbolt, x86 MSVC 19.29.30136, don't include __CheckForDebuggerJustMyCode#4 or the runtime-check stack poisoning (rep stos) in Do(), but the usage of this is still there.)
Amusingly, the push ecx it uses (as a micro-optimization) instead of sub esp, 4 to reserve stack space already stored ECX, making the mov store redundant.
(AFAIK, no compilers actually use push to both initialize and make space for locals, but it would be an optimization for cases like this: What C/C++ compiler can use push pop instructions for creating local variables, instead of just increasing esp once?. MSVC is just using the push for its effect on ESP, not caring what it stores, and it would do the same with optimization enabled, in a function where it still needed to spill this instead of keeping it in a register.)
Your disassembler apparently folds the frame pointer (EBP +) into what it's defining as a this symbol / macro, making it more confusing if you don't look around at other lines to find out how it defines that text macro or whatever it is.
What disassembler are you using? The one built-in to Visual Studio's debugger?
I guess that would make sense that it's using C local var names this way, even though it looks super weird to people familiar with asm. (Because only static storage is addressable with a mode like [symbol] not involving any registers.)
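To make the idea concrete, here is a rough C stand-in for the class in the question. It is only a sketch: the names CAdd_Do, this_copy and demo_do are made up for illustration, and the comments map each step to the 32-bit debug disassembly shown above rather than to anything the compiler guarantees.

```c
#include <assert.h>

/* C stand-in for the CAdd class from the question. */
struct CAdd { int x; int y; };

/* Equivalent of CAdd::Do(): the object address arrives as an explicit
   argument here, where 32-bit MSVC passes it in ECX. */
int CAdd_Do(const struct CAdd *self)
{
    const struct CAdd *this_copy = self; /* debug-build spill: mov dword ptr [this],ecx */
    return this_copy->x + this_copy->y;  /* mov eax,[eax] then add eax,[ecx+4] */
}

/* Mirrors main() in the question: CAdd ca(1, 2); int n = ca.Do(); */
int demo_do(void)
{
    struct CAdd ca = { 1, 2 };
    return CAdd_Do(&ca); /* lea ecx,[ca] then call */
}
```

So dword ptr [this] is just the stack slot named this_copy here: a local variable holding the object's address.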

Repeated call of WriteConsole (NASM x64 on Win64)

I started to learn Assembly lately and for practice, I thought of making a small game.
To make the border graphic of the game I need to print a block character n times.
To test this, I wrote the following code:
bits 64
global main
extern ExitProcess
extern GetStdHandle
extern WriteConsoleA
section .text
main:
mov rcx, -11
call GetStdHandle
mov rbx, rax
drawFrame:
mov r12, [sze]
l:
mov rcx, rbx
mov rdx, msg
mov r8, 1
sub rsp, 48
mov r9, [rsp+40]
mov qword [rsp+32], 0
call WriteConsoleA
dec r12
jnz l
xor rcx, rcx
call ExitProcess
section .data
score dd 0
sze dq 20
msg db 0xdb
I wanted to do this with the WinAPI function for output.
Interestingly, this code stops after printing one char when using WriteConsoleA, but when I use C's putchar, it works correctly. I could also manage to make a C equivalent with the WriteConsoleA function, which also works fine. The disassembly of the C code didn't bring me further.
I suspect there's something wrong in my use of the stack that I don't see. Hopefully someone can explain or point out.
You don't want to keep subtracting 48 from RSP through each loop. You only need to allocate that space once before the loop and before you call a C library function or the WinAPI.
The primary problem is with your 4th parameter in R9. The WriteConsole function is defined as:
BOOL WINAPI WriteConsole(
_In_ HANDLE hConsoleOutput,
_In_ const VOID *lpBuffer,
_In_ DWORD nNumberOfCharsToWrite,
_Out_opt_ LPDWORD lpNumberOfCharsWritten,
_Reserved_ LPVOID lpReserved
);
R9 is supposed to be a pointer to a memory location that returns a DWORD with the number of characters written, but you do:
mov r9, [rsp+40]
This moves the 8 bytes starting at memory address RSP+40 to R9. What you want is the address of [rsp+40] which can be done using the LEA instruction:
lea r9, [rsp+40]
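The same distinction, sketched in C under the assumption that a small stand-in function is close enough to WriteConsoleA for illustration. fake_write and write_one_char are hypothetical names; the point is only that the callee stores through the pointer it is given, so the caller must hand over the address of a variable (the lea form) rather than whatever value that variable currently holds (the mov form).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for WriteConsoleA: reports how many characters were
   "written" through an optional out-pointer, like lpNumberOfCharsWritten. */
int fake_write(const char *buf, unsigned count, unsigned *written)
{
    (void)buf;
    if (written != NULL)
        *written = count; /* the callee stores through the pointer */
    return 1;             /* nonzero means success, as with a BOOL */
}

/* Correct call: pass the ADDRESS of a local, the C form of lea r9, [rsp+40]. */
unsigned write_one_char(void)
{
    unsigned written = 0;
    int ok = fake_write("\xdb", 1, &written);
    assert(ok);
    return written;
}
```

Passing the uninitialized contents of the slot instead would be like casting a garbage integer to unsigned * and letting the callee write through it.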
Your code could have looked like:
bits 64
global main
extern ExitProcess
extern GetStdHandle
extern WriteConsoleA
section .text
main:
sub rsp, 56 ; Allocate space for local variable(s)
; Allocate 32 bytes of space for shadow store
; Maintain 16 byte stack alignment for WinAPI/C library calls
; 56+8=64 . 64 is evenly divisible by 16.
mov rcx, -11
call GetStdHandle
mov rbx, rax
drawFrame:
mov r12, [sze]
l:
mov rcx, rbx
mov rdx, msg
mov r8, 1
lea r9, [rsp+40]
mov qword [rsp+32], 0
call WriteConsoleA
dec r12
jnz l
xor rcx, rcx
call ExitProcess
section .data
score dd 0
sze dq 20
msg db 0xdb
Important Note: In order to be compliant with the 64-bit Microsoft ABI you must maintain the 16 byte alignment of the stack pointer prior to calling a WinAPI or C library function. Upon calling the main function the stack pointer (RSP) was 16 byte aligned. At the point the main function starts executing the stack is misaligned by 8 because the 8 byte return address was pushed on the stack. 48+8=56 doesn't get you back on a 16 byte aligned stack address (56 is not evenly divisible by 16) but 56+8=64 does. 64 is evenly divisible by 16.
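The arithmetic in the note can be checked with a few lines of C. This is a sketch with a made-up helper name, aligned_frame_size, which assumes the only thing on the stack at function entry is the 8-byte return address.

```c
#include <assert.h>

/* Smallest frame size >= needed that leaves RSP 16-byte aligned at a call
   site, given that the return address has already misaligned it by 8. */
unsigned aligned_frame_size(unsigned needed)
{
    /* Round (needed + 8) up to a multiple of 16, then take the 8 back off. */
    return ((needed + 8 + 15) & ~15u) - 8;
}
```

With 32 bytes of shadow space, 8 bytes for the fifth argument and 8 bytes for the local at [rsp+40], needed is 48 and the helper yields 56, matching the sub rsp, 56 above.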

CreateFileA in Windows API in NASM 64 revisited: invalid parameter

I am creating a file in NASM 64 using CreateFileA in the Windows API. Yesterday I posted a question on this, which brought out some useful comments. Today I wrote this section in C, compiled it with the Pelles C compiler, and got the disassembly from x64dbg so I can use the output in NASM. I put the disassembly code into my NASM project, and here's the program in NASM as it is now:
CreateAuditFile:
sub rsp,38
xor eax,eax
mov qword [rsp+30],rax
mov eax,80
mov dword [rsp+28],eax
mov eax,2
mov dword [rsp+20],eax
xor r9,r9
xor r8d,r8d
mov edx,40000000
mov rcx,FileName_1
;lea rcx,[rel FileName_1]
call CreateFileA
call GetLastError
mov rdi,FileAuditString ; not from the disassembly
mov [rdi],rax ; not from the disassembly
xor eax,eax
add rsp,38
ret
That's the disassembly of this C code (done as a console app):
int main(void)
{
char* File_Name ="C:\\Audit_File_P1";
HANDLE hFile;
hFile = CreateFileA(
File_Name,
GENERIC_WRITE,
0,
NULL,
CREATE_ALWAYS,
FILE_ATTRIBUTE_NORMAL,
NULL);
}
This works in C, but in NASM it returns an error code 87, "invalid parameter." I notice two things: (1) the first line is sub rsp,38, which is an odd number and not 16-byte aligned; (2) for the file name (FileName_1), the disassembly shows
"lea rcx,qword ptr ds:[140006270]" but in NASM we use "mov rcx,FileName_1" instead of lea where the file name string is defined in the .data section, as it is here (FileName_1: db "c:\AuditFile_1.txt",0). Also, it inserts a qword parameter at [rsp+30] and a dword at [rsp+28] but a dword needs four bytes, not two, which seems wrong but it works from the C version.
So my question is: which parameter(s) is/are incorrect?
Thanks very much.
This problem was solved with the help of others above, and I am posting the full correct code below for others who need this information in the future:
CreateAuditFile:
mov rcx,FileName_1
sub rsp,56 ; 38h
xor eax,eax
mov qword [rsp+48],rax ; 30h
mov eax,0x80 ; FILE_ATTRIBUTE_NORMAL (the 80 in the disassembly is hex)
mov dword [rsp+40],eax ; 28h
mov eax,2 ; CREATE_ALWAYS
mov dword [rsp+32],eax ; 20h
xor r9,r9
xor r8d,r8d
mov edx,0x40000000 ; GENERIC_WRITE (the 40000000 in the disassembly is hex)
call CreateFileA
mov rdi,OutputFileHandle
mov [rdi+r15],rax
xor eax,eax
add rsp,56 ;38h
ret
Here is what changed (changes are followed by a comment and ***). The debugger prints numbers in hexadecimal, while NASM reads an unprefixed number as decimal, so every number copied from the disassembly has to be converted or written as hex:
CreateAuditFile:
sub rsp,38 - should be sub rsp,56 ***
xor eax,eax
mov qword [rsp+30],rax - should be mov qword [rsp+48],rax ***
mov eax,80 - should be mov eax,0x80 ***
mov dword [rsp+28],eax - should be mov dword [rsp+40],eax ***
mov eax,2
mov dword [rsp+20],eax - should be mov dword [rsp+32],eax ***
xor r9,r9
xor r8d,r8d
mov edx,40000000 - should be mov edx,0x40000000 ***
mov rcx,FileName_1
call CreateFileA
call GetLastError - do not call this without testing the result of CreateFileA ***
mov rdi,FileAuditString - not needed ***
mov [rdi],rax - not needed ***
mov rdi,OutputFileHandle - use this to capture the file handle ***
mov [rdi+r15],rax - use this to capture the file handle ***
xor eax,eax
add rsp,38 - should be add rsp,56 ***
ret
Thanks very much to everyone who responded.
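For anyone checking the numbers: the values in the disassembly make sense once they are read as hexadecimal. The sketch below uses two hypothetical helpers, from_debugger_hex and stack_arg_offset, and assumes the usual Windows x64 layout of 32 bytes of shadow space followed by one 8-byte slot per stack argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Read a debugger-style number, which has no prefix but is hexadecimal. */
long from_debugger_hex(const char *text)
{
    return strtol(text, NULL, 16);
}

/* Byte offset from RSP of stack argument n (n >= 5) at the call site:
   32 bytes of shadow space first, then one 8-byte slot per argument. */
unsigned stack_arg_offset(unsigned n)
{
    assert(n >= 5);
    return 32 + (n - 5) * 8;
}
```

This also explains the spacing that looked wrong in the question: 20h, 28h and 30h are 32, 40 and 48, so each argument gets a full 8-byte slot even when only a dword is stored in it.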

Output from registers using inline assembly are zeroed out

I've been through a couple of threads already, and everything seems to be worded correctly...but my output_01 is returning false. It seems that the inline assembly is writing zeros to my variables...and I just can't figure out why. Below is the code from the main C file which calls the assembly, and the assembly it calls (and I've thrown in the header file as well, though I don't think it has any bearing...but the devil is in the details...right?).
main file
#include <stdio.h>
#include "lab.h"
int output_01();
int main()
{
printf("Starting exercise lab_01:\n");
if(output_01())
printf("lab_01: Successful!");
else
printf("lab_01: Failed!");
return 0;
}
int output_01()
{
int eax=0;
int edx=0;
int ecx=0;
asm("call lab_01;"
"mov %0, eax;"
"mov %1, edx;"
"mov %2, ecx;"
:
:"r" (eax), "r" (edx), "r" (ecx)
:
);
if(eax==3 && edx==1 && ecx==2)
{
printf("\teax = %i\n",eax);
printf("\tedx = %i\n",edx);
printf("\tecx = %i\n",ecx);
return 1;
}
else
{
printf("\teax = %i\n",eax);
printf("\tedx = %i\n",edx);
printf("\tecx = %i\n",ecx);
return 0;
}
}
assembly file
BITS 32 ;you must specify bits mode
segment .text ;you must specify a section
GLOBAL lab_01, labSize_01
lab_01:
;instructions:
;the following registers have the following values:
;eax = 1
;edx = 2
;ecx = 3
;Make it so that the registers have the following values, using only the allowed opcodes and registers:
;eax = 3
;edx = 1
;ecx = 2
;Allowed registers: eax,ebx,ecx,edx
;Allowed opcodes: mov, int3
;Non volatile registers: ebp, ebx, edi, esi
;Volatile registers: eax, ecx, edx
;Forbidden items: immediate values, memory addresses
;;;;;;;;;;;;; EXERCISE SETUP CODE - DO NOT TOUCH
int3 ;make it 'easier' to debug
push ebx; this is to save ebx onto the stack.
mov eax, 1
mov edx, 2
mov ecx, 3
;;;;;;;;;;;;; YOUR CODE BELOW
;int3 ;make it 'easier' to debug
mov ebx, eax ;hold 1
mov eax, ecx ;eax is set 3
mov ecx, edx ;ecx is set to 2
mov edx, ebx ;edx is set to 1
int3 ;make it 'easier' to debug
;;;;;;;;;;;;; YOUR CODE ABOVE
pop ebx;
ret
labSize_01 dd $-lab_01 -1
lab.h file:
extern int lab_01();
You listed the registers as input only. You have no outputs at all. The correct asm for this is:
asm("call lab_01;"
: "=a" (eax), "=d" (edx), "=c" (ecx)
);
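A small self-contained sketch of the same idea, assuming GCC or Clang extended asm on x86. Where the answer uses write-only outputs ("=a"), this one uses "+a" and "+d", which pin the operands to EAX and EDX and mark them as both read and written. swap_ints is a made-up name, and the plain C fallback exists only so the example still compiles on other targets.

```c
#include <assert.h>

/* Swap two ints. On x86 with GCC or Clang this uses one xchg instruction;
   elsewhere it falls back to plain C. */
void swap_ints(int *first, int *second)
{
    int a = *first;
    int d = *second;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Both operands are outputs, so the compiler copies the register
       contents back into a and d after the instruction runs. */
    __asm__("xchg %0, %1" : "+a"(a), "+d"(d));
#else
    int tmp = a;
    a = d;
    d = tmp;
#endif
    *first = a;
    *second = d;
}
```

If the operands were listed only as inputs, as in the question, the compiler would be free to assume a and d still hold their old values afterwards.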

passing arrays to functions in x86 asm

I'm learning x86 asm and using masm, and am trying to write a function which has the equivalent signature to the following c function:
void func(double a[], double b[], double c[], int len);
I'm not sure how to implement it?
The asm file will be compiled into a win32 DLL.
So that I can understand how to do this, can someone please translate this very simple function into asm for me:
void func(double a[], double b[], double c[], int len)
{
// a, b, and c have the same length, given by len
for (int i = 0; i < len; i++)
c[i] = a[i] + b[i];
}
I tried writing a function like this in C, compiling it, and looking at the corresponding disassembled code in the exe using OllyDbg but I couldn't even find my function in it.
Thank you kindly.
I haven't written x86 for a while but I can give you a general idea of how to do it. Since I don't have an assembler handy, this is written in notepad.
func proc a:DWORD, b:DWORD, c:DWORD, len:DWORD
mov eax, len
test eax, eax
jnz @f
ret
@@:
push ebx
push esi
xor eax, eax ; eax = index i
mov esi, a
mov ebx, b
mov ecx, c
@@:
mov edx, dword ptr ds:[esi+eax*4] ; edx = a[i]
add edx, dword ptr ds:[ebx+eax*4] ; edx += b[i]
mov [ecx+eax*4], edx ; c[i] = a[i] + b[i]
inc eax ; next element
cmp eax, len
jl @b
pop esi
pop ebx
ret
func endp
The above function conforms to stdcall and is approximately how you would translate to x86 if your arguments were integers. Unfortunately, you are using doubles. The loop would be the same but you'd need to use the FPU stack and opcodes for doing the arithmetic. I haven't used that for a while and couldn't remember the instructions off the top of my head unfortunately.
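One thing that changes with doubles, independent of the FPU instructions, is the scale factor in the addressing: the integer version steps by 4 bytes per element, while a double occupies 8 bytes on x86. A hedged C sketch of the address computation, with element_address being a made-up name:

```c
#include <assert.h>
#include <stddef.h>

/* Address of element i computed by hand, the way [base + index*scale]
   does it in the assembly: base plus i times the element size. */
const double *element_address(const double *base, size_t i)
{
    const unsigned char *bytes = (const unsigned char *)base;
    return (const double *)(bytes + i * sizeof(double)); /* scale 8 on x86, not 4 */
}
```

In the loop above that would mean eax*8 instead of eax*4, alongside the floating-point loads, add and store.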
You have to pass the memory addresses of the arrays. Consider the following code:
.data?
array1 DWORD 4 DUP(?)
.code
main PROC
push LENGTHOF array1
push OFFSET array1
call arrayFunc
main ENDP
arrayFunc PROC
push ebp
mov ebp, esp
push edi
mov edi, [ebp+08h]
mov ecx, [ebp+0Ch]
L1:
;reference each element of given array by [edi]
;add "TYPE" *array* to edi to increment
loop L1
pop edi
pop ebp
ret 8
arrayFunc ENDP
END main
I just wrote this code so that you can understand the concept. I leave it to you to work out how to use the registers to achieve your program's goals.
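For comparison, this is roughly what the caller and callee look like in C. It is a sketch rather than a translation of the assembly above: func is the function from the question, demo_call is a hypothetical caller, and the comments only point out where addresses rather than array contents cross the call.

```c
#include <assert.h>

/* The function from the question. Each "double x[]" parameter is really a
   pointer, so only three addresses and a length are passed. */
void func(double a[], double b[], double c[], int len)
{
    for (int i = 0; i < len; i++)
        c[i] = a[i] + b[i];
}

/* Hypothetical caller: the array names decay to the addresses of their
   first elements, the C counterpart of push OFFSET array1. */
double demo_call(void)
{
    double a[3] = { 1.0, 2.0, 3.0 };
    double b[3] = { 10.0, 20.0, 30.0 };
    double c[3] = { 0.0, 0.0, 0.0 };
    func(a, b, c, 3);
    return c[2];
}
```

Because only addresses are passed, the callee writes directly into the caller's c array, and the length has to travel as a separate argument.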
