NASM: why is cmp byte[rdi], byte[rsi] not compiling?

Why does the following NASM code not compile:
cmp byte [rdi], byte [rsi]
whereas this code compiles:
mov al, byte [rsi]
cmp byte [rdi], al

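Two memory operands are not supported: apart from string instructions such as movs and cmps, whose operands are implicit, an x86 instruction's encoding (the ModRM byte) has room for only one memory operand. And why not use cmpsb, which compares these two bytes directly? Note that cmpsb sets the flags from byte [rsi] minus byte [rdi] (the reverse of the mov al / cmp byte [rdi], al version above) and afterwards advances both RSI and RDI by one, or moves them back by one if the direction flag is set.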
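To make the operand order concrete, here is a C model of both sequences. It is a sketch, not real assembly: the function names are hypothetical, the comparison is unsigned (as ja/jb would interpret the flags), and the direction flag is assumed to be clear, which is its normal state.

```c
#include <stdint.h>

/* Model of cmpsb with DF = 0. The real instruction sets EFLAGS from
   byte [rsi] - byte [rdi]; this model returns the sign of that difference. */
static int cmpsb_model(const uint8_t **rsi, const uint8_t **rdi) {
    int diff = (int)**rsi - (int)**rdi;  /* source minus destination */
    *rsi += 1;                           /* DF = 0: both pointers advance by one byte */
    *rdi += 1;
    return (diff > 0) - (diff < 0);
}

/* Model of: mov al, byte [rsi] / cmp byte [rdi], al
   Flags come from byte [rdi] - byte [rsi], and neither pointer changes. */
static int load_then_cmp_model(const uint8_t *rsi, const uint8_t *rdi) {
    int diff = (int)*rdi - (int)*rsi;
    return (diff > 0) - (diff < 0);
}
```

Equality tests (je/jne) behave the same with either sequence; only the ordered conditions are swapped.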

Related

Why does a function double dereference arguments stored on stack and how is that possible? [duplicate]

This question already has answers here:
Basic use of immediates vs. square brackets in YASM/NASM x86 assembly
(4 answers)
x86 Nasm assembly - push'ing db vars on stack - how is the size known?
(2 answers)
Referencing the contents of a memory location. (x86 addressing modes)
(2 answers)
Why do you have to dereference the label of data to store something in there: Assembly 8086 FASM
(1 answer)
Closed 7 months ago.
I tried to understand how lfunc loads its stack arguments into flist in the following assembly code, which I found in a book. The book doesn't explain it. The code compiles and runs without errors, printing the intended output "The string is: ABCDEFGHIJ", but I can't grasp the legality or the logic of the code. What I don't understand is listed below.
In lfunc:
RBX, which is non-volatile (callee-saved) under the Microsoft x64 calling convention, is not saved before it is cleared with XOR. (But that is not what bugs me most.)
In the ";arguments on stack" portion:
mov rax, qword [rbp+8+8+32]
mov bl,[rax]
Here [rbp+8+8+32] dereferences the corresponding address stored on the stack, so as I understand it RAX should be loaded with the value represented by fourth, which is the character 'D' (0x44). (Why qword?) And if so, what can dereferencing 'D' on the second line possibly mean? There should be a memory address to dereference, but 'D' is a character.
Original code is listed below:
%include "io64.inc"
; stack.asm
extern printf
section .data
first db "A"
second db "B"
third db "C"
fourth db "D"
fifth db "E"
sixth db "F"
seventh db "G"
eighth db "H"
ninth db "I"
tenth db "J"
fmt db "The string is: %s",10,0
section .bss
flist resb 14 ;length of string plus end 0
section .text
global main
main:
push rbp
mov rbp,rsp
sub rsp, 8
mov rcx, flist
mov rdx, first
mov r8, second
mov r9, third
push tenth ; now start pushing in
push ninth ; reverse order
push eighth
push seventh
push sixth
push fifth
push fourth
sub rsp,32 ; shadow
call lfunc
add rsp,32+8
; print the result
mov rcx, fmt
mov rdx, flist
sub rsp,32+8
call printf
add rsp,32+8
leave
ret
;––––––––––––––––––––––––-
lfunc:
push rbp
mov rbp,rsp
xor rax,rax ;clear rax (especially higher bits)
;arguments in registers
mov al,byte[rdx] ; move the byte the argument points to into al
mov [rcx], al ; store al to memory (reserved in section .bss)
mov al, byte[r8]
mov [rcx+1], al
mov al, byte[r9]
mov [rcx+2], al
;arguments on stack
xor rbx,rbx
mov rax, qword [rbp+8+8+32] ; skip saved rbp (8) + return address (8) + shadow space (32)
mov bl,[rax]
mov [rcx+3], bl
mov rax, qword [rbp+48+8]
mov bl,[rax]
mov [rcx+4], bl
mov rax, qword [rbp+48+16]
mov bl,[rax]
mov [rcx+5], bl
mov rax, qword [rbp+48+24]
mov bl,[rax]
mov [rcx+6], bl
mov rax, qword [rbp+48+32]
mov bl,[rax]
mov [rcx+7], bl
mov rax, qword [rbp+48+40]
mov bl,[rax]
mov [rcx+8], bl
mov rax, qword [rbp+48+48]
mov bl,[rax]
mov [rcx+9], bl
mov bl,0 ; terminating zero
mov [rcx+10], bl
leave
ret
Additional info:
I cannot inspect the register values just after line 50, which corresponds to "XOR RAX, RAX" in lfunc, because when single-stepping the debugger automatically skips ahead to line 37 of main, which corresponds to "add RSP, 32+8". Even if I set breakpoints between those lines in lfunc, the debugger simply hangs and I have to abort debugging manually.
In the ";arguments on stack" portion:
mov rax, qword [rbp+8+8+32]
mov bl,[rax]
I am mentioning this again to be more precise about what I am asking, because the question was marked as a duplicate and the linked answers don't address my specific issue. My reasoning is that [rbp+8+8+32] == 0x44, because mov with square brackets clearly dereferences the address rbp+30h (which I assume is 64 bits wide). So the size of 0x44 is a byte. That is why I ask "Why qword?": it implies "lea [rbp+8+8+32]", which yields a qword reference, not mov. So if [rbp+8+8+32] equals 0x44, then [rax] == [0x0000000000000044], which is a garbage address (not relevant to our code here).
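The missing step is what main actually pushes. In NASM, push fourth pushes the address that the label fourth stands for (as an immediate), not the byte stored there; pushing the byte itself would need something like movzx eax, byte [fourth] followed by push rax. Each 8-byte stack slot therefore holds a 64-bit pointer, which is why the load is a qword and why mov bl,[rax] then fetches 'D'. Below is a C model of lfunc under that reading (a sketch; lfunc_model and its parameter names are hypothetical).

```c
#include <stddef.h>

/* C model of lfunc. stack_args[k] stands for the 8-byte slot at
   [rbp + 48 + 8*k], where 48 = saved rbp (8) + return address (8)
   + shadow space (32). Each slot holds a POINTER, because main did
   "push fourth", "push fifth", ... (addresses, not characters). */
static void lfunc_model(char *flist, const char *rdx, const char *r8,
                        const char *r9, const char *const stack_args[7]) {
    flist[0] = *rdx;                        /* mov al, byte [rdx] */
    flist[1] = *r8;                         /* mov al, byte [r8]  */
    flist[2] = *r9;                         /* mov al, byte [r9]  */
    for (size_t k = 0; k < 7; k++) {
        const char *rax = stack_args[k];    /* mov rax, qword [rbp+48+8*k]: a pointer */
        flist[3 + k] = *rax;                /* mov bl, [rax]: the character it points to */
    }
    flist[10] = '\0';                       /* terminating zero */
}
```

Because the pushes happen in reverse order, the last one (fourth) ends up at the lowest address, [rbp+48], so stack_args[0] corresponds to fourth.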

Segmentation fault when adding 2 digits - nasm MacOS x86_64

I am trying to write a program that accepts 2 digits as user input and then outputs their sum. I keep getting a segmentation fault when I run the program (I can enter the 2 digits, but then the program crashes). I already checked answers to similar questions, and many of them suggested clearing the registers, which I did, but I still get a segmentation fault.
section .text
global _main ;must be declared for linker (ld)
default rel
_main: ;tells linker entry point
call _readData
call _readData1
call _addData
call _displayData
mov RAX, 0x02000001 ;system call number (sys_exit)
syscall
_addData:
mov byte [sum], 0 ; init sum with 0
lea EAX, [buffer] ; load value from buffer to register
lea EBX, [buffer1] ; load value from buffer1 to register
sub byte [EAX], '0' ; transform to digit
sub byte [EBX], '0' ; transform to digit
add [sum], EAX ; increment value of sum by value from register
add [sum], EBX ; increment value of sum by value from 2nd register
add byte [sum], '0' ; convert to ASCII
xor EAX, EAX ; clear registers
xor EBX, EBX ; clear registers
ret
_readData:
mov RAX, 0x02000003
mov RDI, 2
mov RSI, buffer
mov RDX, SIZE
syscall
ret
_readData1:
mov RAX, 0x02000003
mov RDI, 2
mov RSI, buffer1
mov RDX, SIZE
syscall
ret
_displayData:
mov RAX, 0x02000004
mov RDI, 1
mov RSI, sum
mov RDX, SIZE
syscall
ret
section .bss
SIZE equ 4
buffer: resb SIZE
buffer1: resb SIZE
sum: resb SIZE
Unlike the other languages I have learned, it seems quite difficult to find a good source or tutorial about programming in assembly using NASM on x86-64. Is there any kind of walkthrough for beginners (so I don't need to ask on SO every time I am stuck :D)?
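In _addData, lea EAX, [buffer] computes the address of buffer (keeping only its low 32 bits), not the digit stored there, so add [sum], EAX adds part of an address to sum. In addition, 64-bit macOS normally maps executables above the 4 GiB mark, so dereferencing the truncated 32-bit address in sub byte [EAX], '0' is a likely cause of the segmentation fault. The C sketch below shows the intended arithmetic on the digit values; in NASM the value would be loaded with something like movzx eax, byte [buffer]. The helper name add_ascii_digits is hypothetical. It also handles sums of 10 or more (up to 9 + 9 = 18), which a single add byte [sum], '0' cannot represent.

```c
#include <stddef.h>

/* Adds two ASCII digits and writes the sum back as ASCII.
   Returns the number of characters stored in out (1 or 2). */
static size_t add_ascii_digits(char a, char b, char out[2]) {
    int sum = (a - '0') + (b - '0');     /* use the VALUES, converted from ASCII */
    if (sum >= 10) {                     /* e.g. 9 + 9 = 18 needs two characters */
        out[0] = (char)('0' + sum / 10);
        out[1] = (char)('0' + sum % 10);
        return 2;
    }
    out[0] = (char)('0' + sum);
    return 1;
}
```

The returned length is what the write system call in _displayData would need in RDX, rather than the fixed SIZE.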

Best way to load a byte for write

To read and then write a byte from and to memory, you can choose between a few options, among them:
Load a 32-bit value, write out a byte:
mov eax, [rdi]
mov byte [rsi], al
Load a byte, write a byte:
mov al, byte [rdi]
mov byte [rsi], al
Load a byte, zero extended, write a byte:
movzx eax, byte [rdi]
mov byte [rsi], al
All the approaches are the same on the write side, but they use different ways of reading the value. Is there any reason to prefer one over the others, particularly performance-wise?
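For comparison, here is a minimal C sketch of the same operation (the function names are hypothetical). On x86-64, GCC and Clang typically compile a plain byte copy like copy_byte to the third form, movzx eax, byte [rdi] followed by a byte store: writing EAX breaks any dependency on the old contents of RAX, whereas mov al, byte [rdi] has to merge into the existing register on many CPUs. The 32-bit load variant reads three extra bytes, which must be readable: if [rdi] is the last byte of a mapped page, the wider load can fault, and in C the equivalent over-read is undefined behaviour unless the buffer really holds four bytes.

```c
#include <stdint.h>
#include <string.h>

/* Options 2 and 3: read exactly one byte, write one byte. */
static void copy_byte(uint8_t *dst, const uint8_t *src) {
    *dst = *src;
}

/* Option 1 analogue: read 4 bytes, store the low one.
   Requires src[0..3] to be valid; on little-endian x86 the low byte is src[0]. */
static void copy_byte_via_u32(uint8_t *dst, const uint8_t *src) {
    uint32_t v;
    memcpy(&v, src, sizeof v);     /* like mov eax, [rdi] */
    *dst = (uint8_t)(v & 0xFFu);   /* like mov byte [rsi], al */
}
```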

Why does it return the wrong number if the number I enter gets too high? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
I've got the following code, but I can't work out why it returns the wrong number if the number I enter is too high. It might be because of the data types and the dividing and multiplying, but I can't work out exactly why. If you know why, I would be grateful for the help.
.586
.model flat, stdcall
option casemap :none
.stack 4096
extrn ExitProcess@4: proc
GetStdHandle proto :dword
ReadConsoleA proto :dword, :dword, :dword, :dword, :dword
WriteConsoleA proto :dword, :dword, :dword, :dword, :dword
STD_INPUT_HANDLE equ -10
STD_OUTPUT_HANDLE equ -11
.data
bufSize = 80
inputHandle DWORD ?
buffer db bufSize dup(?)
bytes_read DWORD ?
sum_string db "The number was ",0
outputHandle DWORD ?
bytes_written dd ?
actualNumber dw 0
asciiBuf db 4 dup (0)
.code
main:
invoke GetStdHandle, STD_INPUT_HANDLE
mov inputHandle, eax
invoke ReadConsoleA, inputHandle, addr buffer, bufSize, addr bytes_read,0
sub bytes_read, 2 ; -2 to remove cr,lf
mov ebx,0
mov al, byte ptr buffer+[ebx]
sub al,30h
add [actualNumber],ax
getNext:
inc bx
cmp ebx,bytes_read
jz cont
mov ax,10
mul [actualNumber]
mov actualNumber,ax
mov al, byte ptr buffer+[ebx]
sub al,30h
add actualNumber,ax
jmp getNext
cont:
invoke GetStdHandle, STD_OUTPUT_HANDLE
mov outputHandle, eax
mov eax,LENGTHOF sum_string ;length of sum_string
invoke WriteConsoleA, outputHandle, addr sum_string, eax, addr bytes_written, 0
mov ax,[actualNumber]
mov cl,10
mov bl,3
nextNum:
xor edx, edx
div cl
add ah,30h
mov byte ptr asciiBuf+[ebx],ah
dec ebx
mov ah,0
cmp al,0
ja nextNum
mov eax,4
invoke WriteConsoleA, outputHandle, addr asciiBuf, eax, addr bytes_written, 0
mov eax,0
mov eax,bytes_written
push 0
call ExitProcess@4
end main
Yes, it is plausible that your return value is capped by a maximum value. This maximum is either the BYTE limit of 255 or the WORD limit of 65535. Let me explain why, part by part:
mov inputHandle, eax
invoke ReadConsoleA, inputHandle, addr buffer, bufSize, addr bytes_read,0
sub bytes_read, 2 ; -2 to remove cr,lf
mov ebx,0
mov al, byte ptr buffer+[ebx]
sub al,30h
add [actualNumber],ax
In this part you call a Win32 API function, which always returns its result in the register EAX. After it has returned, you overwrite the lower 8 bits of that 32-bit return value (AL) with the byte at buffer+[ebx] and subtract 30h from it. Then you add the 8 bits you just modified in AL, together with the 8 bits of the return value still sitting in AH, as the 16-bit block AX to the WORD variable with add [actualNumber],ax. So AH stems from the EAX return value and is effectively undefined. You may be lucky if it's 0, but that should not be assumed.
The next problem is in the following loop:
getNext:
inc bx
cmp ebx,bytes_read
jz cont
mov ax,10
mul [actualNumber]
mov actualNumber,ax
mov al, byte ptr buffer+[ebx]
sub al,30h
add actualNumber,ax
jmp getNext
You move the decimal base 10 into the WORD register AX and multiply it by the WORD variable [actualNumber]. So far, so good. But the result of a 16-bit * 16-bit MUL is returned in the register pair DX:AX (DX holds the high 16 bits, AX the low 16 bits). So your mov actualNumber,ax only stores the lower 16 bits in your variable (DX is ignored, limiting your result to result % 65536). So your maximum possible result is MAX_WORD = 65535; anything larger just leaves the remainder modulo 65536 in AX.
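A minimal C sketch of what the 16-bit mul computes (mul16 is a hypothetical name): for 10 * 7000 = 70000, DX receives 1 and AX receives 4464, so storing only AX keeps 70000 % 65536.

```c
#include <stdint.h>

/* 16-bit MUL: AX * src -> DX:AX (DX = high 16 bits, AX = low 16 bits). */
static void mul16(uint16_t ax_in, uint16_t src, uint16_t *dx, uint16_t *ax) {
    uint32_t product = (uint32_t)ax_in * src;  /* full 32-bit result */
    *dx = (uint16_t)(product >> 16);
    *ax = (uint16_t)(product & 0xFFFFu);
}
```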
After mov al, byte ptr buffer+[ebx] you overwrite the lower 8 bits of this result with the BYTE pointed to by buffer[ebx] and then subtract 30h from it. Remember: the higher 8 bits of the result still remain in AH, the upper half of AX.
Then you (re)add this value to the variable actualNumber with add actualNumber,ax. Let me condense these last two paragraphs:
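Operation                     | AH (high byte of AX)     | AL (low byte of AX)
mov actualNumber,ax           | high byte of the product | low byte of the product
mov al, byte ptr buffer+[ebx] | high byte of the product | ASCII digit from buffer
sub al,30h                    | high byte of the product | ASCII digit - 30h
add actualNumber,ax           | all 16 bits of AX are added to actualNumber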
So you modify the lower 8 bits of AX through AL and then add AX to actualNumber, which already holds the same high byte AH. This effectively doubles AH:
actualNumber = 256 * (2 * AH) + AL_old + (byte ptr buffer[ebx]-30h) ; modulo 65536, where AL_old is the low byte of the product. I doubt you want that ;-)
These problems may cause several deviations from the desired result.
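The effect can be reproduced with a C replay of the loop (a sketch; parse_like_question and parse_fixed are hypothetical names, and AH is assumed to be 0 after ReadConsoleA, which, as noted above, is not guaranteed). Inputs up to 256 come out right because every intermediate product stays below 256, so AH is 0; "300" comes out as 556 because the product 300 (12Ch) carries AH = 1 into the extra 256 * AH term. In MASM, zero-extending the digit before the add (for example with movzx ax, byte ptr buffer[ebx] followed by sub ax,30h) would avoid the extra term.

```c
#include <stdint.h>
#include <string.h>

/* Replays the question's parsing loop. Assumes AH == 0 after ReadConsoleA. */
static uint16_t parse_like_question(const char *digits) {
    size_t n = strlen(digits);
    uint16_t ax = (uint16_t)(uint8_t)(digits[0] - '0');  /* AL = digit, AH assumed 0 */
    uint16_t actual = ax;                                 /* add [actualNumber],ax (starts at 0) */
    for (size_t i = 1; i < n; i++) {
        ax = (uint16_t)(10u * actual);                    /* mul: only AX of DX:AX is kept */
        actual = ax;                                      /* mov actualNumber,ax */
        ax = (uint16_t)((ax & 0xFF00u) | (uint8_t)(digits[i] - '0')); /* mov al / sub al,30h; AH kept */
        actual = (uint16_t)(actual + ax);                 /* add actualNumber,ax */
    }
    return actual;
}

/* Intended behaviour: each digit is zero-extended before it is added. */
static uint16_t parse_fixed(const char *digits) {
    uint16_t actual = 0;
    for (size_t i = 0; digits[i] != '\0'; i++)
        actual = (uint16_t)(actual * 10u + (uint16_t)(digits[i] - '0'));
    return actual;
}
```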

Use GCC generated assembler inside C++ Builder

I'm using C++Builder for a GUI application on Win32. The Borland compiler's optimization is very poor, and it does not know how to use SSE.
I have a function that is 5 times faster when compiled with MinGW GCC 4.7.
I am thinking about asking GCC to generate assembly code and then using that code inside my C function, because the Borland compiler allows inline assembly.
The function in C looks like this:
void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
{
double s = 77.777;
size_t m = mA[NT-3];
AV[2]=x[n-4]+m*s;
}
I made the function code very simple in order to simplify my question. My real function contains many loops.
The Borland C++ compiler generated this assembly code:
;
; void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
;
@1:
push ebp
mov ebp,esp
add esp,-16
push ebx
;
; {
; double s = 77.777;
;
mov dword ptr [ebp-8],1580547965
mov dword ptr [ebp-4],1079210426
;
; size_t m = mA[NT-3];
;
mov edx,dword ptr [ebp+20]
mov ecx,dword ptr [ebp+24]
mov eax,dword ptr [edx+4*ecx-12]
;
; AV[2]=x[n-4]+m*s;
;
?live16385@48: ; EAX = m
xor edx,edx
mov dword ptr [ebp-16],eax
mov dword ptr [ebp-12],edx
fild qword ptr [ebp-16]
mov ecx,dword ptr [ebp+8]
mov ebx,dword ptr [ebp+12]
mov eax,dword ptr [ebp+16]
fmul qword ptr [ebp-8]
fadd qword ptr [ecx+8*ebx-32]
fstp qword ptr [eax+16]
;
; }
;
?live16385@64: ;
@2:
pop ebx
mov esp,ebp
pop ebp
ret
while the GCC-generated assembly code is:
_Test_Fn:
mov edx, DWORD PTR [esp+20]
mov eax, DWORD PTR [esp+16]
mov eax, DWORD PTR [eax-12+edx*4]
mov edx, DWORD PTR [esp+8]
add eax, -2147483648
cvtsi2sd xmm0, eax
mov eax, DWORD PTR [esp+4]
addsd xmm0, QWORD PTR LC0
mulsd xmm0, QWORD PTR LC1
addsd xmm0, QWORD PTR [eax-32+edx*8]
mov eax, DWORD PTR [esp+12]
movsd QWORD PTR [eax+16], xmm0
ret
LC0:
.long 0
.long 1105199104
.align 8
LC1:
.long 1580547965
.long 1079210426
.align 8
I would like help understanding how function arguments are accessed in GCC and in Borland C++.
My function in C++ for Borland would be something like:
void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
{
__asm
{
put gcc generated assembler here
}
}
Borland accesses the arguments through the EBP register, while GCC uses the ESP register.
Can I force one of the compilers to generate compatible code for accessing the arguments by using a calling convention such as cdecl or stdcall?
The arguments are passed similarly in both cases. The difference is that the code generated by Borland expresses the argument locations relative to EBP register and GCC relative to ESP, but both of them refer to the same addresses.
Borland sets EBP to point to the start of the function's stack frame and expresses locations relative to that, while GCC doesn't set up a new stack frame but expresses locations relative to ESP, which the caller has left pointing to the end of the caller's stack frame.
The code generated by Borland sets up a stack frame at the beginning of the function, causing EBP in the Borland code to be equal to ESP in the GCC code decreased by 4. This can be seen by looking at the first two Borland lines:
push ebp ; decrease esp by 4
mov ebp,esp ; ebp = the original esp decreased by 4
The GCC code doesn't alter ESP and the Borland code doesn't alter EBP until the end of the procedure, so the relationship holds when the arguments are accessed.
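The offsets in the two listings can be cross-checked with a small sketch, assuming (as on Win32) that all five arguments of Test_Fn are 4 bytes wide: GCC reads x at [esp+4] and NT at [esp+20], while Borland reads mA at [ebp+20] and NT at [ebp+24]. The function names are hypothetical.

```c
#include <stddef.h>

/* cdecl on 32-bit x86: at function entry [esp] holds the return address,
   and the i-th argument (0-based, 4 bytes each here) is at [esp + 4 + 4*i]. */
static int esp_offset(size_t i) { return 4 + 4 * (int)i; }

/* After "push ebp; mov ebp, esp", EBP == entry ESP - 4, so the same
   argument sits 4 bytes further away from EBP. */
static int ebp_offset(size_t i) { return esp_offset(i) + 4; }
```

This is also the "+4" adjustment needed when rewriting ESP-relative references in the GCC output as EBP-relative ones.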
The calling convention seems to be cdecl in both of the cases, and there's no difference in how the functions are called. You can add keyword __cdecl to both in order to make that clear.
void __cdecl Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
However adding inline assembly compiled with GCC to the function compiled with Borland is not straightforward, because Borland might set up a stack frame even if the function body contains only inline assembly, causing the value of ESP register to differ from the one used in the GCC code. I see three possible workarounds:
Compile with Borland without the option "Standard stack frames". If the compiler figures out that a stack frame is not needed, this might work.
Compile with GCC using -fno-omit-frame-pointer, i.e. without the option -fomit-frame-pointer. This should make sure that at least the value of EBP is the same in both. The option -fomit-frame-pointer is enabled at levels -O, -O2, -O3 and -Os.
Manually edit the assembly produced by GCC, changing references to ESP to EBP and adding 4 to the offset.
I would recommend you do some reading up on Application Binary Interfaces.
Here is a relevant link to help you figure out what compiler generates what sort of code:
https://en.wikipedia.org/wiki/X86_calling_conventions
I'd try either compiling everything with GCC, or compiling just the critical file with GCC and the rest with Borland and seeing whether linking them together works. What you describe can be made to work, but it will be a hard job that probably isn't worth the time invested (unless the code will run very frequently on many, many machines).
