Repeated call of WriteConsole (NASM x64 on Win64) - winapi

I recently started learning Assembly, and for practice I thought of making a small game.
To make the border graphic of the game I need to print a block character n times.
To test this, I wrote the following code:
bits 64
global main
extern ExitProcess
extern GetStdHandle
extern WriteConsoleA
section .text
main:
mov rcx, -11
call GetStdHandle
mov rbx, rax
drawFrame:
mov r12, [sze]
l:
mov rcx, rbx
mov rdx, msg
mov r8, 1
sub rsp, 48
mov r9, [rsp+40]
mov qword [rsp+32], 0
call WriteConsoleA
dec r12
jnz l
xor rcx, rcx
call ExitProcess
section .data
score dd 0
sze dq 20
msg db 0xdb
I wanted to do this with the WinAPI function for output.
Interestingly, this code stops after printing one char when using WriteConsoleA, but when I use C's putchar, it works correctly. I could also manage to make a C equivalent with the WriteConsoleA function, which also works fine. The disassembly of the C code didn't bring me further.
I suspect there's something wrong in my use of the stack that I don't see. Hopefully someone can explain or point it out.

You don't want to keep subtracting 48 from RSP on each pass through the loop: nothing adds it back, so the stack pointer drifts further down on every iteration. You only need to allocate that space once, before the loop and before you call a C library function or the WinAPI.
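To make the drift concrete, here is a small Python model (not the assembly itself) of how RSP moves when sub rsp, 48 sits inside the loop versus before it. The starting address and the name rsp_after_loop are made up for illustration.

```python
def rsp_after_loop(rsp_start, iterations, frame=48, inside_loop=True):
    """Return the modelled stack pointer after the loop has finished."""
    rsp = rsp_start
    if not inside_loop:
        rsp -= frame              # allocate once, before the loop
    for _ in range(iterations):
        if inside_loop:
            rsp -= frame          # allocated again on every pass, never released
    return rsp


start = 0x7FF000                  # hypothetical starting value of RSP
leaked = start - rsp_after_loop(start, 20, inside_loop=True)
once = start - rsp_after_loop(start, 20, inside_loop=False)
print(leaked, once)               # 960 48
```

With sze set to 20, the in-loop version would end 960 bytes below where it started, while a single allocation costs 48 bytes.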
The primary problem is with your 4th parameter in R9. The WriteConsole function is defined as:
BOOL WINAPI WriteConsole(
_In_ HANDLE hConsoleOutput,
_In_ const VOID *lpBuffer,
_In_ DWORD nNumberOfCharsToWrite,
_Out_opt_ LPDWORD lpNumberOfCharsWritten,
_Reserved_ LPVOID lpReserved
);
R9 is supposed to be a pointer to a memory location that returns a DWORD with the number of characters written, but you do:
mov r9, [rsp+40]
This moves the 8 bytes starting at memory address RSP+40 to R9. What you want is the address of [rsp+40] which can be done using the LEA instruction:
lea r9, [rsp+40]
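The difference between the two instructions can be sketched in Python. This is only a model: the bytearray stands in for stack memory, offsets stand in for addresses, and the names mov_load and lea are illustrative.

```python
import struct


def mov_load(memory, address):
    """Model `mov r9, [address]`: read the 8 bytes stored at that address."""
    return struct.unpack_from("<Q", memory, address)[0]


def lea(base, displacement):
    """Model `lea r9, [base+displacement]`: compute the address, touch no memory."""
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF


stack = bytearray(64)                                  # hypothetical 64-byte stack area
rsp = 0                                                # offsets into `stack` stand in for addresses
struct.pack_into("<Q", stack, rsp + 40, 0xDEADBEEF)    # leftover junk at [rsp+40]

print(hex(mov_load(stack, rsp + 40)))                  # 0xdeadbeef: junk that would be used as a pointer
print(lea(rsp, 40))                                    # 40: the address of the slot itself
```

WriteConsoleA writes the character count through whatever pointer it receives in R9, so the first form hands it an arbitrary value while the second hands it a valid slot in the caller's frame.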
Your code could have looked like:
bits 64
global main
extern ExitProcess
extern GetStdHandle
extern WriteConsoleA
section .text
main:
sub rsp, 56 ; Allocate space for local variable(s)
; Allocate 32 bytes of space for shadow store
; Maintain 16 byte stack alignment for WinAPI/C library calls
; 56+8=64 . 64 is evenly divisible by 16.
mov rcx, -11
call GetStdHandle
mov rbx, rax
drawFrame:
mov r12, [sze]
l:
mov rcx, rbx
mov rdx, msg
mov r8, 1
lea r9, [rsp+40]
mov qword [rsp+32], 0
call WriteConsoleA
dec r12
jnz l
xor rcx, rcx
call ExitProcess
section .data
score dd 0
sze dq 20
msg db 0xdb
Important Note: To comply with the 64-bit Microsoft ABI you must keep the stack pointer 16-byte aligned at the point where you call a WinAPI or C library function. Just before main was called, the stack pointer (RSP) was 16-byte aligned. When main starts executing, the stack is misaligned by 8 because the call pushed the 8-byte return address. 48+8=56 doesn't get you back to a 16-byte aligned stack address (56 is not evenly divisible by 16), but 56+8=64 does. 64 is evenly divisible by 16.
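The arithmetic in this note can be written as a small helper. This is a sketch under the assumption stated above (RSP is 8 bytes off a 16-byte boundary on entry); frame_size is a hypothetical name, not part of any toolchain.

```python
def frame_size(local_bytes, max_call_args):
    """Bytes for `sub rsp, N` at the top of a function that calls Win64 APIs.

    Assumes RSP is 8 bytes off a 16-byte boundary on entry, because the
    caller's `call` pushed the return address.
    """
    shadow = 32                                   # home space for RCX, RDX, R8, R9
    stack_args = max(0, max_call_args - 4) * 8    # fifth and later arguments
    size = shadow + stack_args + local_bytes
    # Round up so that size + 8 (the return address) is a multiple of 16.
    return (size + 8 + 15) // 16 * 16 - 8


print(frame_size(local_bytes=8, max_call_args=5))   # 56
```

For the loop above (five arguments to WriteConsoleA plus one 8-byte slot for the character count) this gives 56, matching the sub rsp, 56 in the corrected code.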

Related

Are I/O statements in FreeBasic compiled as function calls?

Example:
Dim x As Integer, y As Integer
Input "x=", x
y = x ^ 3 + 3 * x ^ 2 - 24 * x + 30
Print y
End
When I used the FreeBasic compiler to generate assembly code for this source, I found
.globl _main
_main:
and
call ___main
in the assembly code. In addition, it looks like the Input statement is compiled as
call _fb_ConsoleInput@12
and
call _fb_InputInt@4
The "^" operator is compiled as
call _pow
(I am not sure whether the math function library of FreeBasic is integrated or external)
and the Print statement is compiled as
call _fb_PrintInt@12
and the End statement is compiled as
call _fb_End@4
The question is: How is FreeBasic source code compiled? Why do _main and ___main appear in the assembly code? Are I/O statements compiled as function calls?
Reference: Assembly code generated by FreeBasic compiler
.intel_syntax noprefix
.section .text
.balign 16
.globl _main
_main:
push ebp
mov ebp, esp
and esp, 0xFFFFFFF0
sub esp, 20
mov dword ptr [ebp-4], 0
call ___main
push 0
push dword ptr [ebp+12]
push dword ptr [ebp+8]
call _fb_Init@12
.L_0002:
mov dword ptr [ebp-8], 0
mov dword ptr [ebp-12], 0
push -1
push 0
push 2
push offset _Lt_0004
call _fb_StrAllocTempDescZEx@8
push eax
call _fb_ConsoleInput@12
lea eax, [ebp-8]
push eax
call _fb_InputInt@4
push dword ptr [_Lt_0005+4]
push dword ptr [_Lt_0005]
fild dword ptr [ebp-8]
sub esp,8
fstp qword ptr [esp]
call _pow
add esp, 16
fild dword ptr [ebp-8]
fild dword ptr [ebp-8]
fxch st(1)
fmulp
fmul qword ptr [_Lt_0005]
fxch st(1)
faddp
mov eax, dword ptr [ebp-8]
imul eax, 24
push eax
fild dword ptr [esp]
add esp, 4
fxch st(1)
fsubrp
fadd qword ptr [_Lt_0006]
fistp dword ptr [ebp-12]
push 1
push dword ptr [ebp-12]
push 0
call _fb_PrintInt@12
push 0
call _fb_End@4
.L_0003:
push 0
call _fb_End@4
mov eax, dword ptr [ebp-4]
mov esp, ebp
pop ebp
ret
.section .data
.balign 4
_Lt_0004: .ascii "x=\0"
.balign 8
_Lt_0005: .quad 0x4008000000000000
.balign 8
_Lt_0006: .quad 0x403E000000000000
Yes, things like PRINT are implemented as function calls, though I am not sure why this matters to you unless you are currently learning assembly.
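The same applies to the "^" operator in the question's listing, which becomes a call to pow with the exponent stored as a raw 64-bit constant. As a rough Python sketch (the helper names are mine, not FreeBASIC's), the two .quad constants can be decoded and the arithmetic of the listing mirrored like this:

```python
import struct


def quad_to_double(bits):
    """Reinterpret a 64-bit integer (as in a .quad directive) as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


three = quad_to_double(0x4008000000000000)     # _Lt_0005 in the listing
thirty = quad_to_double(0x403E000000000000)    # _Lt_0006 in the listing
print(three, thirty)                           # 3.0 30.0


def y_of(x):
    """Mirror the listing: pow() in floating point, then round to an integer."""
    value = float(x) ** three + three * x * x - 24 * x + thirty
    return round(value)                        # stands in for the final fistp


print(y_of(2))                                 # 2
```

So the whole expression is evaluated in floating point and only converted back to an integer when it is stored into y.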
As for _main, that is the ASM name for the main() C function used as the main program.
On x86, it is common for global/exported function names in C to be preceded by _ in the ASM output.
___main is the ASM name for __main(), a C function supplied by the MinGW/GCC runtime; the call at the top of _main runs its startup initialization before the rest of _main is executed.
Again, you'll see the extra _ preceding the C function name.
After that is a call to fb_Init(argc, argv, FB_LANG_FB) to initialize the FreeBASIC runtime library with the default "fb" FreeBASIC dialect and argc elements in the argument vector argv.
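The number after the function name is the stdcall name decoration, normally written with an @ sign (for example _fb_Init@12): it means the argument list is 12 bytes long (4+4+4=12 for fb_Init here); see __stdcall | Microsoft Docs for more information on that.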
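As an illustration of that naming rule, here is a minimal Python sketch that builds the decorated name from the argument sizes. stdcall_decorate is a hypothetical helper, and it assumes 32-bit x86, where every argument occupies whole 4-byte stack slots.

```python
def stdcall_decorate(name, arg_sizes):
    """Return the 32-bit stdcall-decorated symbol: _name@<bytes of arguments>."""
    # Each argument is rounded up to a multiple of 4 bytes on the stack.
    total = sum((size + 3) // 4 * 4 for size in arg_sizes)
    return f"_{name}@{total}"


print(stdcall_decorate("fb_Init", [4, 4, 4]))    # _fb_Init@12
print(stdcall_decorate("fb_End", [4]))           # _fb_End@4
```

The suffix also tells you how many bytes the callee removes from the stack on return, which is why the listing has no add esp after these calls but does have one after the cdecl call to _pow.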

InitializeCriticalSection fails in NASM

UPDATE: based on comments below, I revised the code below to add a struc and a pointer (new or revised code has "THIS IS NEW" or "THIS IS REVISED" beside the code). Now the program does not crash, so the pointer is initialized, but the program hangs at EnterCriticalSection. I suspect that in translating the sample MASM code below into NASM syntax, I did not declare the struc correctly. Any ideas? Thanks very much.
ORIGINAL QUESTION:
Below is a simple test program in 64-bit NASM, to test a critical section in
Windows. This is a dll and the entry point is Main_Entry_fn, which calls Init_Cores_fn, where we initialize four threads (cores) to call Test_fn.
I suspect that the problem is the pointer to the critical section. None of the online resources specifies what that pointer is. The doc "Using Critical Section Objects" at https://learn.microsoft.com/en-us/windows/desktop/sync/using-critical-section-objects shows a C++ example where the pointer appears to be relevant only to EnterCriticalSection and LeaveCriticalSection, but it's not a pointer to an independent object.
For those not familiar with NASM, the first parameter in a C++ signature goes into rcx and the second parameter goes into rdx, but otherwise it should function the same as in C or C++. It's the same thing as InitializeCriticalSectionAndSpinCount(&CriticalSection,0x00000400) in C++.
Here's the entire program:
; Header Section
[BITS 64]
[default rel]
extern malloc, calloc, realloc, free
global Main_Entry_fn
export Main_Entry_fn
extern CreateThread, CloseHandle, ExitThread
extern WaitForMultipleObjects, GetCurrentThreadId
extern InitializeCriticalSectionAndSpinCount, EnterCriticalSection
extern LeaveCriticalSection, DeleteCriticalSection, InitializeCriticalSection
struc CRITICAL_SECTION ; THIS IS NEW
.cs_quad: resq 5
endstruc
section .data align=16
const_1000000000: dq 1000000000
ThreadID: dq 0
TestInfo: times 20 dq 0
ThreadInfo: times 3 dq 0
ThreadInfo2: times 3 dq 0
ThreadInfo3: times 3 dq 0
ThreadInfo4: times 3 dq 0
ThreadHandles: times 4 dq 0
Division_Size: dq 0
Start_Byte: dq 0
End_Byte: dq 0
Return_Data_Array: times 4 dq 0
Core_Number: dq 0
const_inf: dq 0xFFFFFFFF
SpinCount: dq 0x00000400
CriticalSection: ; THIS IS NEW
istruc CRITICAL_SECTION
iend
section .text
; ______________________________________
Init_Cores_fn:
; Calculate the data divisions
mov rax,[const_1000000000]
mov rbx,4 ;cores
xor rdx,rdx
div rbx
mov [End_Byte],rax
mov [Division_Size],rax
mov rax,0
mov [Start_Byte],rax
; Populate the ThreadInfo arrays to pass for each core
; ThreadInfo: (1) startbyte; (2) endbyte; (3) Core_Number
mov rdi,ThreadInfo
mov rax,[Start_Byte]
mov [rdi],rax
mov rax,[End_Byte]
mov [rdi+8],rax
mov rax,[Core_Number]
mov [rdi+16],rax
call DupThreadInfo ; Create ThreadInfo arrays for cores 2-4
mov rbp,rsp ; preserve caller's stack frame
sub rsp,56 ; Shadow space (was 32)
; _____
; Create four threads
label_0:
mov rax,[Core_Number]
cmp rax,0
jne sb2
mov rdi,ThreadInfo
jmp sb5
sb2:cmp rax,8
jne sb3
mov rdi,ThreadInfo2
jmp sb5
sb3:cmp rax,16
jne sb4
mov rdi,ThreadInfo3
jmp sb5
sb4:cmp rax,24
jne sb5
mov rdi,ThreadInfo4
sb5:
; _____
; Create Threads
mov rcx,0 ; lpThreadAttributes (Security Attributes)
mov rdx,0 ; dwStackSize
mov r8,Test_fn ; lpStartAddress (function pointer)
mov r9,rdi ; lpParameter (array of data passed to each core)
mov rax,0
mov [rsp+32],rax ; use default creation flags
mov rdi,ThreadID
mov [rsp+40],rdi ; ThreadID
call CreateThread
; Move the handle into ThreadHandles array (returned in rax)
mov rdi,ThreadHandles
mov rcx,[Core_Number]
mov [rdi+rcx],rax
mov rdi,TestInfo
mov [rdi+rcx],rax
mov rax,[Core_Number]
add rax,8
mov [Core_Number],rax
mov rbx,32 ; Four cores
cmp rax,rbx
jl label_0
mov rcx,CriticalSection ; THIS IS REVISED
mov rdx,[SpinCount]
call InitializeCriticalSectionAndSpinCount
; _____
; Wait
mov rcx,4 ;rax ; number of handles
mov rdx,ThreadHandles ; pointer to handles array
mov r8,1 ; wait for all threads to complete
mov r9,[const_inf] ;4294967295 ;0xFFFFFFFF
call WaitForMultipleObjects
; _____
mov rsp,rbp ; can we push rbp so we can use it internally?
jmp label_900
; ______________________________________
Test_fn:
mov rdi,rcx
mov r14,[rdi] ; Start_Byte
mov r15,[rdi+8] ; End_Byte
mov r13,[rdi+16] ; Core_Number
;______
; while(n < 1000000000)
label_401:
cmp r14,r15
jge label_899
mov rcx,CriticalSection
call EnterCriticalSection
; n += 1
add r14,1
mov rcx,CriticalSection
call LeaveCriticalSection
jmp label_401
;______
label_899:
mov rdi,Return_Data_Array
mov [rdi+r13],r14
mov rbp,ThreadHandles
mov rax,[rbp+r13]
call ExitThread
ret
; __________
label_900:
mov rcx,CriticalSection
call DeleteCriticalSection
mov rdi,Return_Data_Array
mov rax,rdi
ret
; __________
; Main Entry
Main_Entry_fn:
push rdi
push rbp
call Init_Cores_fn
pop rbp
pop rdi
ret
DupThreadInfo:
mov rdi,ThreadInfo2
mov rax,8
mov [rdi+16],rax ; Core Number
mov rax,[Start_Byte]
add rax,[Division_Size]
mov [rdi],rax
mov rax,[End_Byte]
add rax,[Division_Size]
mov [rdi+8],rax
mov [Start_Byte],rax
mov rdi,ThreadInfo3
mov rax,16
mov [rdi+16],rax ; Core Number
mov rax,[Start_Byte]
mov [rdi],rax
add rax,[Division_Size]
mov [rdi+8],rax
mov [Start_Byte],rax
mov rdi,ThreadInfo4
mov rax,24
mov [rdi+16],rax ; Core Number
mov rax,[Start_Byte]
mov [rdi],rax
add rax,[Division_Size]
mov [rdi+8],rax
mov [Start_Byte],rax
ret
The code above shows the functions in three separate places, but of course we test them one at a time (but they all fail).
To summarize, my question is why do InitializeCriticalSection and InitializeCriticalSectionAndSpinCount both fail in the code above? The inputs are dead simple, so I don't understand why it should not work.
InitializeCriticalSection takes a pointer to a critical section object:
The process is responsible for allocating the memory used by a
critical section object, which it can do by declaring a variable of
type CRITICAL_SECTION.
So the code can look something like this (I use MASM syntax):
CRITICAL_SECTION STRUCT
DQ 5 DUP(?)
CRITICAL_SECTION ends
extern __imp_InitializeCriticalSection:QWORD
extern __imp_InitializeCriticalSectionAndSpinCount:QWORD
.DATA?
CriticalSection CRITICAL_SECTION {}
.CODE
lea rcx,CriticalSection
;mov edx,400h
;call __imp_InitializeCriticalSectionAndSpinCount
call __imp_InitializeCriticalSection
Also, you need to declare all imported functions as
extern __imp_funcname:QWORD
instead of
extern funcname
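The point about ownership of the storage can be sketched with Python's ctypes. This does not call the Windows API; it only models the five-quadword layout used in the MASM declaration above, and CriticalSectionStorage is a hypothetical name.

```python
import ctypes


class CriticalSectionStorage(ctypes.Structure):
    """Hypothetical stand-in for CRITICAL_SECTION: five opaque quadwords."""
    _fields_ = [("quads", ctypes.c_uint64 * 5)]


critical_section = CriticalSectionStorage()             # storage owned by the program
print(ctypes.sizeof(critical_section))                  # 40
pointer_value = ctypes.addressof(critical_section)      # the kind of value that goes into RCX
```

The program reserves the 40 bytes itself and passes their address; the initialization call only fills them in.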

why if the number I enter gets too high it returns the wrong number [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I've got the following code, but I can't work out why it returns the wrong number if the number I enter is too high. It might be because of the data types and the dividing and multiplying, but I can't work out exactly why. If you know why, I would be grateful for the help.
.586
.model flat, stdcall
option casemap :none
.stack 4096
extrn ExitProcess@4: proc
GetStdHandle proto :dword
ReadConsoleA proto :dword, :dword, :dword, :dword, :dword
WriteConsoleA proto :dword, :dword, :dword, :dword, :dword
STD_INPUT_HANDLE equ -10
STD_OUTPUT_HANDLE equ -11
.data
bufSize = 80
inputHandle DWORD ?
buffer db bufSize dup(?)
bytes_read DWORD ?
sum_string db "The number was ",0
outputHandle DWORD ?
bytes_written dd ?
actualNumber dw 0
asciiBuf db 4 dup (0)
.code
main:
invoke GetStdHandle, STD_INPUT_HANDLE
mov inputHandle, eax
invoke ReadConsoleA, inputHandle, addr buffer, bufSize, addr bytes_read,0
sub bytes_read, 2 ; -2 to remove cr,lf
mov ebx,0
mov al, byte ptr buffer+[ebx]
sub al,30h
add [actualNumber],ax
getNext:
inc bx
cmp ebx,bytes_read
jz cont
mov ax,10
mul [actualNumber]
mov actualNumber,ax
mov al, byte ptr buffer+[ebx]
sub al,30h
add actualNumber,ax
jmp getNext
cont:
invoke GetStdHandle, STD_OUTPUT_HANDLE
mov outputHandle, eax
mov eax,LENGTHOF sum_string ;length of sum_string
invoke WriteConsoleA, outputHandle, addr sum_string, eax, addr bytes_written, 0
mov ax,[actualNumber]
mov cl,10
mov bl,3
nextNum:
xor edx, edx
div cl
add ah,30h
mov byte ptr asciiBuf+[ebx],ah
dec ebx
mov ah,0
cmp al,0
ja nextNum
mov eax,4
invoke WriteConsoleA, outputHandle, addr asciiBuf, eax, addr bytes_written, 0
mov eax,0
mov eax,bytes_written
push 0
call ExitProcess@4
end main
Yes, it is plausible that your return value is capped by a maximum value. This maximum is either the BYTE boundary of 255 or the WORD boundary of 65535. Let me explain why, part by part:
mov inputHandle, eax
invoke ReadConsoleA, inputHandle, addr buffer, bufSize, addr bytes_read,0
sub bytes_read, 2 ; -2 to remove cr,lf
mov ebx,0
mov al, byte ptr buffer+[ebx]
sub al,30h
add [actualNumber],ax
In this part you call a Win32 API function, which always returns its result in the register EAX. After it has returned, you load the first byte of buffer into AL, the low 8 bits of EAX, and subtract 30h from it. Then you add the whole of AX to the WORD variable with add [actualNumber],ax: AL holds the digit you just converted, but AH still holds bits 8-15 of the return value. So AH stems from the EAX return value and is effectively undefined. You may be lucky if it's 0, but that should not be assumed.
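A tiny Python model of the register aliasing shows the effect. The leftover value in EAX is invented for the example; mov_al and ax_of are illustrative names.

```python
def mov_al(eax, byte_value):
    """Model a write to AL: replace bits 0-7 of EAX, keep everything else."""
    return (eax & 0xFFFFFF00) | (byte_value & 0xFF)


def ax_of(eax):
    """Return AX, the low 16 bits of EAX (AH is bits 8-15, AL is bits 0-7)."""
    return eax & 0xFFFF


eax = 0x00001234                        # hypothetical leftover value in EAX
eax = mov_al(eax, ord("7") - 0x30)      # AL ends up holding the converted digit 7
print(hex(ax_of(eax)))                  # 0x1207, not 0x7: AH still holds 0x12
```

Clearing the upper part first (for example with movzx, or by zeroing EAX before loading AL) removes the dependency on whatever the previous call left behind.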
The next problem is the following sub-routine:
getNext:
inc bx
cmp ebx,bytes_read
jz cont
mov ax,10
mul [actualNumber]
mov actualNumber,ax
mov al, byte ptr buffer+[ebx]
sub al,30h
add actualNumber,ax
jmp getNext
You move the decimal base 10 into the WORD register AX and multiply it by the WORD variable [actualNumber]. So far, so good. But the result of a 16-bit*16-bit MUL is returned in the register pair DX:AX (high word in DX, low word in AX). Your mov actualNumber,ax stores only the low 16 bits in your variable (DX is ignored, limiting your result to result % 65536). So your maximum possible result is MAX_WORD = 65535. Anything larger just gives you the remainder modulo 65536 in AX.
After your mov al, byte ptr buffer+[ebx] you overwrite the low 8 bits of this result with the BYTE at buffer[ebx] and then subtract 30h from it. Remember: the high 8 bits of the result still remain in AH, the high 8 bits of AX.
Then you add this value to the variable actualNumber with add actualNumber,ax. Let me condense these last two paragraphs:
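Operation | AH (high byte of AX) | AL (low byte of AX)
mov actualNumber,ax | product, high byte | product, low byte
mov al, byte ptr buffer+[ebx] | product, high byte (unchanged) | ASCII character
sub al,30h | product, high byte (unchanged) | digit value
add actualNumber,ax | added to actualNumber | added to actualNumber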
So you replace the low 8 bits of AX through AL, but AH still holds the high byte of the product that is already stored in actualNumber. Adding AX therefore counts that high byte twice:
actualNumber = 256 * (2 * AH) + (low byte of the product) + (byte ptr buffer[ebx]-30h) ; I doubt you want that ;-)
These problems make the result deviate from the expected value as soon as the running product reaches 256.
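The effect is easy to reproduce with a Python model of the loop that masks every value to 16 bits. This is a sketch, not the MASM code: it assumes AH happens to be 0 after ReadConsoleA unless told otherwise, and the function names are made up.

```python
def parse_as_posted(text, initial_ah=0):
    """Model the question's loop with 16-bit registers (AH assumed 0 at the start)."""
    ax = (initial_ah << 8) | (ord(text[0]) - 0x30)
    actual = ax & 0xFFFF                           # add [actualNumber],ax
    for ch in text[1:]:
        ax = (10 * actual) & 0xFFFF                # mul: only the low word stays in AX
        actual = ax                                # mov actualNumber,ax
        ax = (ax & 0xFF00) | (ord(ch) - 0x30)      # mov al,...; sub al,30h (AH untouched)
        actual = (actual + ax) & 0xFFFF            # add actualNumber,ax
    return actual


def parse_fixed(text):
    """Same loop, but only the digit value is added after the multiplication."""
    actual = 0
    for ch in text:
        actual = (10 * actual + (ord(ch) - 0x30)) & 0xFFFF
    return actual


print(parse_as_posted("25"), parse_fixed("25"))      # 25 25
print(parse_as_posted("300"), parse_fixed("300"))    # 556 300
print(parse_fixed("70000"))                          # 4464
```

Small inputs come out right because AH stays 0; "300" picks up an extra 256 from the doubled high byte, and even the corrected loop wraps modulo 65536 because the variable is a WORD.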

NASM ReadConsoleA or WriteConsoleA Buffer Debugging Issue

I am writing a NASM Assembly program on Windows to get the user to enter in two single digit numbers, add these together and then output the result. I am trying to use the Windows API for input and output.
Unfortunately, whilst I can get it to read in one number, as soon as the program loops round to get the second, the program ends rather than asking for the second value.
The output of the program shown below:
What is interesting is that if I input 1 then the value displayed is one larger so it is adding to something!
This holds for other single digits (2-9) entered as well.
I am pretty sure it is related to how I am using the ReadConsoleA function but I have hit a bit of a wall attempting to find a solution. I have installed gdb to debug the program and assembled it as follows:
nasm -f win64 -g -o task9.obj task9.asm
GoLink /console /entry _main task9.obj kernel32.dll
gdb task9
But I just get the following error:
"C:\Users\Administrator\Desktop/task9.exe": not in executable format: File format not recognized
I have since read that NASM doesn't output the debug information needed for the Win64 format but I am not 100% sure about that. I am fairly sure I have the 64-bit version of GDB installed:
My program is as follows:
extern ExitProcess ;windows API function to exit process
extern WriteConsoleA ;windows API function to write to the console window (ANSI version)
extern ReadConsoleA ;windows API function to read from the console window (ANSI version)
extern GetStdHandle ;windows API function to get the console handle for input/output
section .data ;the .data section is where variables and constants are defined
STD_OUTPUT_HANDLE equ -11
STD_INPUT_HANDLE equ -10
digits db '0123456789' ;list of digits
input_message db 'Please enter your next number: '
length equ $-input_message
section .bss ;the .bss section is where space is reserved for additional variables
input_buffer: resb 2 ;reserve 2 bytes for user input
char_written: resb 4
chars: resb 1 ;reserved for use with read operation
section .text ;the .text section is where the program code goes
global _main ;tells the machine which label to start program execution from
_num_to_str:
cmp rax, 0 ;compare value in rax to 0
jne .convert ;if not equal then jump to label
jmp .output
.convert:
;get next digit value
inc r15 ;increment the counter for next digit
mov rcx, 10
xor rdx, rdx ;clear previous remainder result
div rcx ;divide value in rax by value in rcx
;quotient (result) stored in rax
;remainder stored in rdx
push rdx ;store remainder on the stack
jmp _num_to_str
.output:
pop rdx ;get the last digit from the stack
;convert digit value to ascii character
mov r10, digits ;load the address of the digits into r10
add r10, rdx ;get the character of the digits string to display
mov rdx, r10 ;digit to print
mov r8, 1 ;one byte to be output
call _print
;decide whether to loop
dec r15 ;reduce remaining digits (having printed one)
cmp r15, 0 ;are there digits left to print?
jne .output ;if not equal then jump to label output
ret
_print:
;get the output handle
mov rcx, STD_OUTPUT_HANDLE ;specifies that the output handle is required
call GetStdHandle ;returns value for handle to rax
mov rcx, rax
mov r9, char_written
call WriteConsoleA
ret
_read:
;get the input handle
mov rcx, STD_INPUT_HANDLE ;specifies that the input handle is required
call GetStdHandle
;get value from keyboard
mov rcx, rax ;place the handle for operation
mov rdx, input_buffer ;set name to receive input from keyboard
mov r8, 2 ;max number of characters to read
mov r9, chars ;stores the number of characters actually read
call ReadConsoleA
movzx r12, byte[input_buffer]
ret
_get_value:
mov rdx, input_message ;move the input message into rdx for function call
mov r8, length ;load the length of the message for function call
call _print
xor r8, r8
xor r9, r9
call _read
.end:
ret
_main:
mov r13, 0 ;counter for values input
mov r14, 0 ;total for calculation
.loop:
xor r12, r12
call _get_value ;get value from user
sub r12, '0' ;convert char to integer
add r14, r12 ;add value to total
;decide whether to loop for another character or not
inc r13
cmp r13, 2
jne .loop
;convert total to ASCII value
mov rax, r14 ;num_to_str expects total in rax
mov r15, 0 ;num_to_str uses r15 as a counter - must be initialised
call _num_to_str
;exit the program
mov rcx, 0 ;exit code
call ExitProcess
I would really appreciate any assistance you can offer either with resolving the issue or how to resolve the issue with gdb.
I found the following issues with your code:
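The Microsoft x64 calling convention requires rsp to be 16-byte aligned at each call.
You must reserve 32 bytes of stack (shadow space) for the four register arguments, even though you pass them in registers.
Your chars variable needs 4 bytes, not 1, because ReadConsoleA stores a DWORD count there.
ReadConsole expects 5 arguments; the fifth goes on the stack.
You should read 3 bytes because ReadConsole also returns the CR LF that ends the line. Or you could just ignore leading whitespace.
Your _num_to_str is broken if the input is 0: it pops a digit that was never pushed.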
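The CR LF point explains why the second prompt never waits for input. The following Python toy model assumes that characters not consumed by one read stay pending for the next one; ConsoleLine is an invented class, not a wrapper around ReadConsoleA.

```python
class ConsoleLine:
    """Toy model: the console hands out a typed line, including CR LF, in chunks."""

    def __init__(self, typed_lines):
        self.typed = list(typed_lines)    # what the user will type, one entry per prompt
        self.pending = ""                 # characters typed but not yet read

    def read(self, max_chars):
        if not self.pending:              # only wait for the user when nothing is left over
            self.pending = self.typed.pop(0) + "\r\n"
        chunk, self.pending = self.pending[:max_chars], self.pending[max_chars:]
        return chunk


small = ConsoleLine(["1", "2"])
print([small.read(2), small.read(2)])     # ['1\r', '\n']: the second read never sees "2"

large = ConsoleLine(["1", "2"])
print([large.read(3), large.read(3)])     # ['1\r\n', '2\r\n']
```

With a 2-byte read, the second call is satisfied by the leftover line feed, so the program adds a control character instead of the second digit.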
Based on Jester's suggestions this is the final program:
extern ExitProcess ;windows API function to exit process
extern WriteConsoleA ;windows API function to write to the console window (ANSI version)
extern ReadConsoleA ;windows API function to read from the console window (ANSI version)
extern GetStdHandle ;windows API function to get the console handle for input/output
section .data ;the .data section is where variables and constants are defined
STD_OUTPUT_HANDLE equ -11
STD_INPUT_HANDLE equ -10
digits db '0123456789' ;list of digits
input_message db 'Please enter your next number: '
length equ $-input_message
NULL equ 0
section .bss ;the .bss section is where space is reserved for additional variables
input_buffer: resb 3 ;reserve 3 bytes for user input (digit, CR, LF)
char_written: resb 4
chars: resb 4 ;reserved for use with read operation
section .text ;the .text section is where the program code goes
global _main ;tells the machine which label to start program execution from
_num_to_str:
sub rsp, 40 ;32 bytes as before plus 8 so rsp is 16-byte aligned for the calls below
cmp rax, 0
jne .next_digit
sub rsp, 8 ;input is 0: pad the slot so .output can remove 16 bytes
push rax ;store the single digit 0
inc r15
jmp .output
.next_digit:
cmp rax, 0 ;compare value in rax to 0
jne .convert ;if not equal then jump to label
jmp .output
.convert:
;get next digit value
inc r15 ;increment the counter for next digit
mov rcx, 10
xor rdx, rdx ;clear previous remainder result
div rcx ;divide value in rax by value in rcx
;quotient (result) stored in rax
;remainder stored in rdx
sub rsp, 8 ;add space on stack for value
push rdx ;store remainder on the stack
jmp .next_digit
.output:
pop rdx ;get the last digit from the stack
add rsp, 8 ;remove space from stack for popped value
;convert digit value to ascii character
mov r10, digits ;load the address of the digits into r10
add r10, rdx ;get the character of the digits string to display
mov rdx, r10 ;digit to print
mov r8, 1 ;one byte to be output
call _print
;decide whether to loop
dec r15 ;reduce remaining digits (having printed one)
cmp r15, 0 ;are there digits left to print?
jne .output ;if not equal then jump to label output
add rsp, 40
ret
_print:
sub rsp, 40
;get the output handle
mov rcx, STD_OUTPUT_HANDLE ;specifies that the output handle is required
call GetStdHandle ;returns value for handle to rax
mov rcx, rax
mov r9, char_written
mov rax, qword 0 ;fifth argument
mov qword [rsp+0x20], rax
call WriteConsoleA
add rsp, 40
ret
_read:
sub rsp, 40
;get the input handle
mov rcx, STD_INPUT_HANDLE ;specifies that the input handle is required
call GetStdHandle
;get value from keyboard
mov rcx, rax ;place the handle for operation
xor rdx, rdx
mov rdx, input_buffer ;set name to receive input from keyboard
mov r8, 3 ;max number of characters to read
mov r9, chars ;stores the number of characters actually read
mov rax, qword 0 ;fifth argument
mov qword [rsp+0x20], rax
call ReadConsoleA
movzx r12, byte[input_buffer]
add rsp, 40
ret
_get_value:
sub rsp, 40
mov rdx, input_message ;move the input message into rdx for function call
mov r8, length ;load the length of the message for function call
call _print
call _read
.end:
add rsp, 40
ret
_main:
sub rsp, 40
mov r13, 0 ;counter for values input
mov r14, 0 ;total for calculation
.loop:
call _get_value ;get value from user
sub r12, '0' ;convert char to integer
add r14, r12 ;add value to total
;decide whether to loop for another character or not
inc r13
cmp r13, 2
jne .loop
;convert total to ASCII value
mov rax, r14 ;num_to_str expects total in rax
mov r15, 0 ;num_to_str uses r15 as a counter - must be initialised
call _num_to_str
;exit the program
mov rcx, 0 ;exit code
call ExitProcess
add rsp, 40
ret
As it turned out I was actually missing a 5th argument in the WriteConsole function as well.
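For reference, the digit conversion that _num_to_str performs can be sketched in Python. It is a model of the same idea (push remainders, pop them in reverse), with the zero case handled explicitly; it is not a translation of the exact register usage.

```python
def num_to_str(value):
    """Convert a non-negative integer to decimal text using a stack of remainders."""
    stack = []
    if value == 0:
        stack.append(0)                          # special case: still emit one digit
    while value != 0:
        value, remainder = divmod(value, 10)     # like div: quotient and remainder
        stack.append(remainder)                  # least significant digit is pushed first
    digits = "0123456789"
    out = []
    while stack:
        out.append(digits[stack.pop()])          # most significant digit comes off first
    return "".join(out)


print(num_to_str(0), num_to_str(17))             # 0 17
```

The stack reverses the order in which the remainders are produced, which is why the assembly version counts the pushes in r15 and pops the same number of times.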

x64 asm ret lands in no mans land

I assumed I had pushed something without popping it, or vice versa, but I can't find anything wrong! I write to the console with a call to a dll that links properly, and I inexplicably am in no man's land... (address 0x0000000000000000)
I've put some sleeps in, and I'm sure that the api call WriteConsoleA is returning. It's on my last ret under the print function.
Any ideas?
.exe:
extern FreeConsole
extern Sleep
extern ExitProcess
extern print
extern newconsole
extern strlen
section .data BITS 64
title: db 'Consolas!',0
message: db 'Hello, world',0,0
section .text bits 64
global Start
Start:
mov rcx, title
call newconsole
mov rcx, 1000
call Sleep
mov rcx, message
call print
mov rcx, 10000
call Sleep
call FreeConsole
xor rcx, rcx
call ExitProcess
.dll:
extern AllocConsole
extern SetConsoleTitleA
extern GetStdHandle
extern WriteConsoleA
extern Sleep
export newconsole
export strlen
export print
section .data BITS 64
console.writehandle: dq 0
console.readhandle: dq 0
console.write.result: dq 0
section .text BITS 64
global strlen
strlen:
push rax
push rdx
push rdi
mov rdi, rcx
xor rax, rax
mov rcx, dword -1
cld
repnz scasb
neg rcx
sub rcx, 2
pop rdi
pop rdx
pop rax
ret
global print
print:
mov rbp, rsp
push rcx
call strlen
mov r8, rcx
pop rdx
mov rcx, [console.writehandle]
mov r9, console.write.result
push qword 0
call WriteConsoleA
ret
global newconsole
newconsole:
push rax
push rcx
call AllocConsole
pop rcx
call SetConsoleTitleA
mov rcx, -11
call GetStdHandle
mov [console.writehandle], rax
pop rax
ret
I assume you're talking about this function:
global print
print:
mov rbp, rsp
push rcx
call strlen
mov r8, rcx
pop rdx
mov rcx, [console.writehandle]
mov r9, console.write.result
push qword 0
call WriteConsoleA
ret
The x64 ABI requires that stack space is reserved even for parameters passed in registers. WriteConsoleA is free to use those stack locations for whatever it wants, so you need to make sure that you've adjusted the stack appropriately. As it stands, you push only the fifth parameter (the reserved pointer) and leave no space below it for the four register parameters. I think something like the following will do the trick for you:
push qword 0
sub rsp, 4 * 8 ; reserve stack for register parameters
call WriteConsoleA
mov rsp, rbp ; restore rsp
ret
See http://msdn.microsoft.com/en-us/library/ms235286.aspx (emphasis added):
The x64 Application Binary Interface (ABI) is a 4 register fast-call calling convention, with stack-backing for those registers.
...
The caller is responsible for allocating space for parameters to the callee, and must always allocate sufficient space for the 4 register parameters, even if the callee doesn’t have that many parameters.
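According to the calling convention, the caller has to clean up the arguments it put on the stack. In this case that applies to the 5th argument to WriteConsoleA and to the space reserved for the register parameters. Since you have a copy of the original rsp in rbp, you can reload rsp from rbp, or add the bytes back after the call (8 for the pushed argument plus 32 for the reserved space).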
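The resulting layout can be checked with a few lines of Python. This is bookkeeping only, with RSP modelled as an integer; the function names are hypothetical.

```python
def stack_offset_of_argument(index):
    """Offset from RSP (just before `call`) of 1-based argument `index`, or None if in a register."""
    if index <= 4:
        return None                      # passed in RCX, RDX, R8, R9
    return 32 + 8 * (index - 5)          # stack arguments sit above the 32 reserved bytes


def rsp_after(steps, rsp=0):
    """Apply (name, size) steps that each lower RSP by `size` bytes."""
    for _name, size in steps:
        rsp -= size
    return rsp


start = 0                                # RSP before `push qword 0`, modelled as 0
pushed_at = start - 8                    # where the pushed zero lands
rsp = rsp_after([("push", 8), ("sub", 32)], start)
offset = pushed_at - rsp
print(offset, stack_offset_of_argument(5))   # 32 32
```

After push qword 0 followed by sub rsp, 32, the pushed value sits at [rsp+32], which is where the callee looks for its fifth argument, and 40 bytes have to be released afterwards.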

Resources