Calling LONGLONG RtlLargeIntegerDivide(LONGLONG, LONGLONG, LONGLONG*) in NASM (stdcall) - winapi

I'm trying to call the following function:
long long RtlLargeIntegerDivide(long long dividend, long long divisor, long long* pRemainder)
in assembly code (NASM). It uses the stdcall calling convention, and it returns the quotient. These are the specifications:
Input: [EDX,EAX] (dividend), [ECX,EBX] (divisor)
Output: [EDX,EAX] (quotient), [ECX,EBX] (remainder)
How do I go about doing this? (My main problem is not exactly understanding EBP and ESP, and how they relate to local variables.)
(And no, this isn't homework; I'm trying to implement a wrapper C run-time library.)
Thank you!

In 32 bit mode you don't have to use EBP to access local variables at all, that is just a convention remnant from 16 bit times and doesn't concern us now anyway.
ESP is your stack pointer, I assume you know that. You can "allocate" space for your local variables by decrementing ESP.
stdcall calling convention uses the stack for argument passing. They are in normal order when on the stack, but if you are using PUSH that means you push them reversed. Integral return values are in EAX (and EDX when necessary). The called function cleans up the arguments from the stack.
So the following code should do what you want:
sub ESP, 8; make room for remainder
push ESP ; pass pointer to remainder as argument
push ECX
push EBX ; pass divisor argument
push EDX
push EAX ; pass dividend argument
call RtlLargeIntegerDivide
; quotient returned in EDX:EAX
; so just load remainder from stack
pop EBX
pop ECX
(For speed, you can use MOV instead of PUSH/POP)

Related

How to display results of a column sum in masm [duplicate]

Fairly simple question this time. How do I write to screen the contents of a single register in Assembly? I'm getting a bit tired of calling DumpRegs just to see the value of one register.
I'm using x86 architecture, and MASM in Visual Studio, and Irvine32.lib.
Irvines's DumpReg uses repeatedly a macro of Macros.inc: mShowRegister. It can be used directly. Example:
INCLUDE Irvine32.inc
INCLUDE Macros.inc
.code
main PROC
mov esi, 0DeadBeefh
mShowRegister ESI, ESI
exit
main ENDP
END main
A documented macro with more options is mShow. Example:
INCLUDE Irvine32.inc
INCLUDE Macros.inc
.code
main PROC
mov esi, 0DeadBeefh
mshow ESI, h
exit
main ENDP
END main
Irvine32 has output functions that take a value in EAX, such as WriteDec (unsigned base 10).
See the documentation http://programming.msjc.edu/asm/help/source/irvinelib/writedec.htm which has links to WriteHex (hex = base 16), and WriteInt (signed base 10).
These functions use the Irvine32 calling convention and preserve all registers, including the arg-passing register EAX, so you can even insert them as a debug-print for EAX at least. More usually you'd use them as simply normal output functions.
Usually for asm debugging you'd use a debugger, not debug-print function calls.

Why does generated assembly mov edi to variable on stack?

I am a newcomer to assembly trying to understand the objdump of the following function:
int nothing(int num) {
return num;
}
This is the result (linux, x86-64, gcc 8):
push rbp
mov rbp,rsp
mov DWORD PTR [rbp-0x4],edi
mov eax,DWORD PTR [rbp-0x4]
pop rbp
ret
My questions are:
1. Where does edi come from? Reading through some intro docs, I was under the impression that [rbp-0x4] would contain num.
2. From the above, apparently edi contains the argument. But then what role does [rbp-0x4] play? Why not just mov eax, edi?
Thanks!
Where does edi come from?
... From the above, apparently edi contains the argument.
This is the calling convention (for Linux and many other OSs):
All programming languages for these OSs pass the first parameter in rdi. The result (value returned) is passed in rax.
And because your C compiler interprets int as 32 bits, only the low 32 bits of rdi and rax are used - which is edi and eax.
Programming languages for Windows pass the first parameter in rcx...
But then what role does [rbp-0x4] play?
Using rbp has mainly historic reasons here. In 16-bit code (as it was used in 1980s and 1990s PCs) it was not possible to address data on the stack using the sp register (which corresponds to rsp). The only register that allowed addressing values on the stack easily was the bp register (corresponding to rbp).
And even in 32- or 64-bit code it is more difficult to write a compiler that addresses local variables (on the stack) using rsp rather than using rbp.
The compiler generates the first 3 instructions of assembler code before it knows what is done in the C function. The compiler puts the value on the stack because you could do something like address = &num in the code. This is however not possible when num is in a register but only when num is located in the memory.
Why not just mov eax, edi?
If you tell the compiler to optimize the code, it will first check the content of the C function before generating the first assembler instruction. It will find out that it is not required to put the value into the memory.
In this case the code will indeed look like this:
mov eax, edi
ret

LOOP is done only one time when single stepping in Turbo Debugger

The code must output 'ccb',but output only 'c', LOOP is done only one time, i have calibrated in TD, but why LOOP is done only one time?
I THINK THAT I MUST TO DECREMENT STRING_LENGTH, SO I WROTE
DEC STRING_LENGTH
BUT IT NOT WORK, SO I WROTE LIKE THAT
MOV SP,STRING_LENGTH
DEC SP
MOV STRING_LENGTH,SP
I KNOW WHAT ARE YOU THINKING RIGHT NOW, THAT IS SO INCORRECT, YOU ARE RIGHT)))
I CAN USE C++, BUT I WANT TO DO IT ONLY IN ASSEMBLY,
DOSSEG
.MODEL SMALL
.STACK 200H
.DATA
STRING DB 'cScbd$'
STRING_LENGTH EQU $-STRING
STRING1 DB STRING_LENGTH DUP (?) , '$'
.CODE
MOV AX,#DATA
MOV DS,AX
XOR SI,SI
XOR DI,DI
MOV CX,STRING_LENGTH
S:
MOV BL,STRING[DI]
AND STRING[DI],01111100B
CMP STRING[DI],01100000B
JNE L1
MOV AL,BL
MOV STRING1[SI],AL
ADD SI,2
L1:
ADD DI,2
LOOP S
MOV DL,STRING1
MOV AH,9
INT 21H
MOV AH,4CH
INT 21H
END
In Turbo Debugger (TD.EXE) the F8 "F8 step" will execute the loop completely, until the cx becomes zero (you can even create infinite loop by updating cx back to some value, preventing it from reaching the 1 -> 0 step).
To get "single-step" out of the loop instruction, use the F7 "F7 trace" - that will cause the cx to go from 6 to 5, and the code pointer will follow the jump back on the start of the loop.
About some other issues of your code:
MOV SP,STRING_LENGTH
DEC SP
MOV STRING_LENGTH,SP
sp is not general purpose register, don't use it for calculation like this. Whenever some instruction does use stack implicitly (push, pop, call, ret, ...), the values are being written and read in memory area addressed by the ss:sp register pair, so by manipulating the sp value you are modifying the current "stack".
Also in 16 bit x86 real mode all the interrupts (keyboard, timer, ...), when they occur, the current state of flag register and code address is stored into stack, before giving the control to the interrupt handler code, which usually will push additional values to the stack, so whatever is in memory on addresses below current ss:sp is not safe in 16 bit x86 real mode, and the memory content keeps "randomly" changing there by all the interrupts being executed meanwhile (the TD.EXE itself does use part of this stack memory after every single step).
For arithmetic use other registers, not sp. Once you will know enough about "stack", you will understand what kind of sp manipulation is common and why (like sub sp,40 at beginning of function which needs additional "local" memory space), and how to restore stack back into expected state.
One more thing about that:
MOV SP,STRING_LENGTH
DEC SP
MOV STRING_LENGTH,SP
The STRING_LENGTH is defined by EQU, which makes it compile time constant, and only compile time. It's not "variable" (memory allocation), contrary to the things like someLabel dw 1345, which cause the assembler to emit two bytes with values 0100_0001B, 0000_0101B (when read as 16 bit word in little-endian way, that's value 1345 encoded), and the first byte address has symbolic name someLabel, which can be used in further instructions, like dec word ptr [someLabel] to decrement that value in memory from 1345 to 1344 during runtime.
But EQU is different, it assigns the symbol STRING_LENGTH final value, like 14.
So your code can be read as:
mov sp,14 ; makes almost sense, (practically destroys stack setup)
dec sp ; still valid
mov 14,sp ; doesn't make any sense, constant can't be destination for MOV

Register ESI causes RunTime-Check Failure #0 error

I've spend lot of time trying to solve this problem and I don't understand, why it doesn't work. Problem's description is in comments below:
.386
.MODEL FLAT, STDCALL
OPTION CASEMAP:NONE
.NOLIST
.NOCREF
INCLUDE \masm32\include\windows.inc
.LIST
.CODE
DllEntry PROC hInstDLL:HINSTANCE, reason:DWORD, reserved1:DWORD
mov eax, TRUE
ret
DllEntry ENDP
caesarAsm proc string: DWORD, key: DWORD, stringLength : DWORD
mov esi, 1 ; I cannot use this register, mov esi, (anything) causes Crash:
; Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention
mov eax, string
ret
caesarAsm endp
END DllEntry
I searched "whole" Internet, I found that the problem is connected with stack, but no operation on stacks helped me to solve it.
I'm using Microsoft Visual Studio 2012
I assume the error does not occur in this function, rather, it is triggered elsewhere. The esi register is a callee-saved register. You must make sure its value is the same at function exit as it was at entry. You can use it in your function, but you must save and restore its value. Such as:
push esi
mov esi, 1
mov eax, string
pop esi
ret
This is all well documented. You may only use eax, ecx and edx without saving.
Side note: you are using high-level features of your assembler, you might want to check the actual generated code or refrain from using them until you are confident in what the result is going to be. Incidentally masm has a USES keyword which would do the save/restore for you.

Malloc and Free multiple arrays in assembly

I'm trying to experiment with malloc and free in assembly code (NASM, 64 bit).
I have tried to malloc two arrays, each with space for 2 64 bit numbers. Now I would like to be able to write to their values (not sure if/how accessing them will work exactly) and then at the end of the whole program or in the case of an error at any point, free the memory.
What I have now works fine if there is one array but as soon as I add another, it fails on the first attempt to deallocate any memory :(
My code is currently the following:
extern printf, malloc, free
LINUX equ 80H ; interupt number for entering Linux kernel
EXIT equ 60 ; Linux system call 1 i.e. exit ()
segment .text
global main
main:
push dword 16 ; allocate 2 64 bit numbers
call malloc
add rsp, 4 ; Undo the push
test rax, rax ; Check for malloc failure
jz malloc_fail
mov r11, rax ; Save base pointer for array
; DO SOME CODE/ACCESSES/OPERATIONS HERE
push dword 16 ; allocate 2 64 bit numbers
call malloc
add rsp, 4 ; Undo the push
test rax, rax ; Check for malloc failure
jz malloc_fail
mov r12, rax ; Save base pointer for array
; DO SOME CODE/ACCESSES/OPERATIONS HERE
malloc_fail:
jmp dealloc
; Finish Up, deallocate memory and exit
dealloc:
dealloc_1:
test r11, r11 ; Check that the memory was originally allocated
jz dealloc_2 ; If not, try the next block of memory
push r11 ; push the address of the base of the array
call free ; Free this memory
add rsp, 4
dealloc_2:
test r12, r12
jz dealloc_end
push r12
call free
add rsp, 4
dealloc_end:
call os_return ; Exit
os_return:
mov rax, EXIT
mov rdi, 0
syscall
I'm assuming the above code is calling the C functions malloc() and free()...
If 1st malloc() fails, you arrive at dealloc_1 with whatever garbage is in r11 and r12 after returning from the malloc().
If 2nd malloc() fails, you arrive at dealloc_1 with whatever garbage is in r12 after returning from the malloc().
Therefore, you have to zero out r11 and r12 before doing the first allocation.
Since this is 64-bit mode, all pointers/addresses and sizes are normally 64-bit. When you pass one of those to a function, it has to be 64-bit. So, push dword 16 isn't quite right. It should be push qword 16 instead. Likewise, when you are removing these parameters from the stack, you have to remove exactly as many bytes as you've put there, so add rsp, 4 must change to add rsp, 8.
Finally, I don't know which registers malloc() and free() preserve and which they don't. You may need to save and restore the so-called volatile registers (see your C compiler documentation). The same holds for the code not shown. It must preserve r11 and r12 so they can be used for deallocation. EDIT: And I'd check if it's the right way of passing parameters through the stack (again, see your compiler documentation).
EDIT: you're testing r11 for 0 right before 2nd free(). It should be r12. But free() doesn't really mind receiving NULL pointers. So, these checks can be removed.
Pay attention to your code.
You have to obey x86-64 calling conventions: arguments might be passed through registers, in the case of malloc that would be RDI for the size. And as already pointed out, you have to watch out which registers are preserved by the called functions. (afaik only RBP, RSP and R12-R15 are preserved across function calls)
There are at least two bugs, because you test r11 again (the line test r11,r11 after dealloc_2:, but you supposedly wanted to test r12 here. Additionally you want to push a qword, if you are in 64 bit mode.
The reason the deallocation doesn't work at all may be because you are changing the contents of r11 or r12.
Not that both tests are not needed, as it is perfectly safe to call free with an null pointer.

Resources