Why is correct sum of memory integers in register ax but not register eax? - visual-studio

Given this program, written by a student I am helping:
global _start
section .text
_start:
mov ebx, people
mov eax, [ebx + 2]
add eax, [ebx + 4]
add eax, [ebx + 6]
add eax, [ebx + 8]
mov ebx, [constant]
div bx
section .data
average dw 0
constant dw 5
people dw 0, 10, 25, 55, 125
While porting this from Visual Studio to a Linux machine to figure out what the problem was, I ran into some questions:
1) Why does gdb display the sum 215 when print $ax is issued, but a large number when print $eax is issued? Is that because we have added word values instead of longword values?
I did try to add into ax, rather than eax, and got the same results. I got a relocation complaint when I tried to move the initial value into ax. That's why I used eax.
2) Why is the quotient 43 when div bx is used, but the answer is wrong when div ebx is used?
As an aside, I believe I found the original problem, which was an integer overflow. The line mov ebx, [constant] was originally mov ebx, constant, which loaded the address of constant rather than the value 5 into ebx.

A few problems
All the data is defined as word but the code treats it as dword.
Prior to the division you need to extend the dividend into DX or EDX: div bx divides DX:AX, and div ebx divides EDX:EAX.
Although the first value in the array is zero, it's probably better to also include it in the code. If ever the data changes, at least the code will remain valid!
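To see why ax holds the right sum while eax does not, here is a small Python sketch (not part of the original program) that replays the four dword loads over the word-sized people array. It assumes that two zero bytes follow the array in memory; whatever really follows ends up in the upper half of the last load.

```python
import struct

# The question's array: five 16-bit little-endian words.
PEOPLE = [0, 10, 25, 55, 125]
# Assumption: two zero bytes follow the array in memory.
MEMORY = struct.pack("<5H", *PEOPLE) + b"\x00\x00"


def load_dword(offset):
    """Model `mov eax, [ebx + offset]`: it reads 4 bytes, not 2."""
    return struct.unpack_from("<I", MEMORY, offset)[0]


def sum_as_dwords(offsets=(2, 4, 6, 8)):
    """Replay the question's mov/add sequence on a 32-bit register."""
    eax = 0
    for offset in offsets:
        eax = (eax + load_dword(offset)) & 0xFFFFFFFF
    return eax


eax = sum_as_dwords()
print(hex(eax))       # 0xcd00d7
print(eax & 0xFFFF)   # ax: 215
```

Each dword load drags the next array element into bits 16 to 31, so the upper half of eax accumulates 25 + 55 + 125 while ax still holds 10 + 25 + 55 + 125 = 215.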
Solutions
Keep the data 16-bit and program accordingly
mov edx, people
mov ax, [edx]
add ax, [edx + 2]
add ax, [edx + 4]
add ax, [edx + 6]
add ax, [edx + 8]
xor dx, dx
div word [constant] ;Divide DX:AX by 5
...
constant dw 5
people dw 0, 10, 25, 55, 125
Make the data 32-bit and program accordingly
mov edx, people
mov eax, [edx]
add eax, [edx + 4]
add eax, [edx + 8]
add eax, [edx + 12]
add eax, [edx + 16]
xor edx, edx
div dword [constant] ;Divide EDX:EAX by 5
...
constant dd 5
people dd 0, 10, 25, 55, 125
See how I've avoided the use of EBX?
The division can take its divisor straight from memory.
The EDX register (which takes part in the division anyway) can address memory just as well.
Less register clobbering is a good thing!
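As a rough illustration of why the upper half of the dividend matters, the following Python sketch models the unsigned DIV instruction. The function name div and its parameters are made up for this example. The 32-bit sample value is what the question's dword loads would leave in eax if zero bytes follow the array, which is an assumption.

```python
def div(high, low, divisor, bits):
    """Model unsigned DIV: (high:low) / divisor for a `bits`-wide operand.

    bits=16 models `div bx` (DX:AX); bits=32 models `div ebx` (EDX:EAX).
    Returns (quotient, remainder); raises where the CPU raises #DE.
    """
    limit = (1 << bits) - 1
    if divisor == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    dividend = (high << bits) | low
    quotient, remainder = divmod(dividend, divisor)
    if quotient > limit:
        raise OverflowError("#DE: quotient does not fit in the register")
    return quotient, remainder


print(div(0, 215, 5, 16))          # (43, 0)
print(div(0, 0x00CD00D7, 5, 32))   # (2687019, 0)
```

With the high half cleared, the 16-bit division sees only the 215 in ax. The 32-bit division also sees the garbage in the upper half of eax, and a non-zero DX or EDX changes the result or triggers the divide error.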

Related

Sorting a list through a procedure in flat assembly

I'm pretty new to fasm and I just recently started learning about procedures. My problem is that I have a proc and I want it to sort my list in a certain way. But when I run my code it just seems to sort some random numbers from memory. I don't quite know why it happens and I would appreciate any help.
Here's the code:
format PE gui 5.0
include 'D:\Flat Assembler\INCLUDE\win32a.inc'
entry start
section '.data' data readable writable
mas1 dw 2, -3, 1, -1, 3, -2, 5, -5, -4, 4
N = ($ - mas1) / 2
numStr db N dup('%d '), 0
strStr db '%s', 0
undefStr db 'undefined', 0
buff db 50 dup(?)
Caption db 'Result', 0
section '.code' code readable executable
start:
stdcall bubble, mas1
cinvoke wsprintf, buff, numStr
invoke MessageBox, 0, buff, Caption, MB_OK + MB_ICONINFORMATION
invoke ExitProcess, 0
proc bubble, mas:word
mov ecx, 0
mov ebx, 0
outerLoop:
cmp ecx, 10
je done
mov ebx, 2
innerLoop:
mov eax, 0
mov edx, 0
cmp [mas+ebx], 0 ;if(mas[j] > 0)
jge continue ;continue
mov ax, [mas+ebx-2]
cmp ax, [mas+ebx]
jle continue
mov dx, [mas+ebx]
mov [mas+ebx-2], dx
mov [mas+ebx], ax
continue:
cmp ebx, 18 ;10
je innerDone
add ebx, 2 ;inc ebx
jmp innerLoop
innerDone:
inc ecx
jmp outerLoop
done:
mov ecx, 0
mov ebx, 0
mov ebx, 18
mov ecx, N
print:
mov eax, 0
mov ax, [mas+ebx]
cwde
push eax
sub ebx, 2
loop print
ret
endp
section '.idata' import data readable writeable
library kernel32,'KERNEL32.DLL',\
user32,'USER32.DLL'
include 'D:\Flat Assembler\INCLUDE\API\kernel32.inc'
include 'D:\Flat Assembler\INCLUDE\API\user32.inc'
Error 1
stdcall bubble, mas1
...
proc bubble, mas:word
The parameter mas1 is an address and is pushed to the stack as a dword. Therefore you should not limit the argument mas to a word.
What your bubble procedure needs is the full address of the array. You get this via mov esi, [mas], which FASM encodes as if you had written mov esi, [ebp+8]. EBP+8 is where the first argument (in your program, the only argument) resides when the standard prologue push ebp / mov ebp, esp is used.
Error 2
In your bubble procedure you push the resulting array to the stack hoping to have wsprintf use it from there, but once the bubble procedure executes its ret instruction, the epilogue code as well as the ret instruction itself will start eating your array and even return to the wrong address in memory!
If you're going to return an array via the stack, then store it above the return address and the argument(s). That's why I wrote in my program below:
sub esp, N*4 ; Space for N dwords on the stack
stdcall bubble, mas1
Error 3
cmp [mas+ebx], 0 ;if(mas[j] > 0)
jge continue ;continue
Your BubbleSort is wrong because you never let positive numbers get compared!
Furthermore, you make too many iterations, and they also continue for too long.
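For comparison, here is the intended algorithm as a short Python sketch: every adjacent pair is compared regardless of sign, and the inner loop gets one compare shorter after each pass. The function name bubble_sort is only for illustration.

```python
def bubble_sort(items):
    """Sort a list in place, ascending, and return it."""
    last = len(items) - 1              # index of the last unsorted item
    while last > 0:                    # outer loop: one pass per iteration
        for j in range(last):          # inner loop: compare items[j], items[j+1]
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
        last -= 1                      # the largest item is now in place
    return items


mas1 = [2, -3, 1, -1, 3, -2, 5, -5, -4, 4]
print(bubble_sort(mas1))  # [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
```

For N items this performs at most N-1 passes, and the first pass makes N-1 compares.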
I tested the program below on FASM 1.71.22. Don't forget to change the paths!
format PE gui 5.0
include 'C:\FASM\INCLUDE\win32a.inc'
entry start
section '.data' data readable writable
mas1 dw 2, -3, 1, -1, 3, -2, 5, -5, -4, 4
N = ($ - mas1) / 2
numStr db N-1 dup('%d, '), '%d', 0
;strStr db '%s', 0
;undefStr db 'undefined', 0
buff db 50 dup(?)
Caption db 'Result', 0
section '.code' code readable executable
start:
sub esp, N*4 ; Space for N dwords on the stack
stdcall bubble, mas1
cinvoke wsprintf, buff, numStr
invoke MessageBox, 0, buff, Caption, MB_OK + MB_ICONINFORMATION
invoke ExitProcess, 0
proc bubble uses ebx esi, mas
mov esi, [mas] ; Address of the array
mov ecx, (N-1)*2 ; Offset to the last item; Max (N-1) compares
outerLoop:
xor ebx, ebx
innerLoop:
mov ax, [esi+ebx]
mov dx, [esi+ebx+2]
cmp ax, dx
jle continue
mov [esi+ebx+2], ax
mov [esi+ebx], dx
continue:
add ebx, 2
cmp ebx, ecx
jb innerLoop
sub ecx, 2
jnz outerLoop
mov ebx, (N-1)*2
toStack:
movsx eax, word [esi+ebx]
mov [ebp+12+ebx*2], eax
sub ebx, 2
jnb toStack
ret
endp
section '.idata' import data readable writeable
library kernel32,'KERNEL32.DLL',\
user32,'USER32.DLL'
include 'C:\FASM\INCLUDE\API\kernel32.inc'
include 'C:\FASM\INCLUDE\API\user32.inc'
Error 2 revisited
IMO returning the resulting array through the stack would make better sense if your bubble procedure didn't modify the original array.
But in your present code you do, so...
Once you strike the toStack snippet from the bubble procedure, you can simply (after returning from the bubble procedure) push the word-sized elements of the array to the stack as dwords followed by using wsprintf.
...
start:
stdcall bubble, mas1
mov ebx, (N-1)*2
toStack:
movsx eax, word [mas1+ebx]
push eax
sub ebx, 2
jnb toStack
cinvoke wsprintf, buff, numStr
...
sub ecx, 2
jnz outerLoop
; See no more toStack here!
ret
endp
...

Effective addressing in Real Mode - accessing array

I am working in real mode of x86 and say I need to access an element from the array people, the index of which is in the register BX.
MOV BX, 2
struc person
.name resb 11
.age resb 1
endstruc
people: times 10 * person_size db 0
The effective addressing in real mode is limited to base + offset. So code like
mov [people + bx * person_size + person.age],byte 20
does not work; however, the assembler can do the calculation if no BX register is used:
mov [people + 2 * person_size + person.age],byte 20
I can do a multiplication or a few left shifts and make it work, but is there a way to access any element in an array without assuming that the size of the structure will remain the same in the future?
Is there any other way than multiplying like below (cannot do shifts if the structure size changes, code will also change)?
push ax
mov ax, person_size
mul bx
mov bx, ax
pop ax
add bx, person.age
mov [people + bx], byte 20
"The effective addressing in real mode is limited to base + offset."
That is true only on the 8086, not for 16-bit x86 code in general.
It's true that in real mode you can use scaled-index addressing like in Fifoernik's answer, but in your program it won't help much, since the scale factor is limited to 1, 2, 4, or 8 and your structure has 12 bytes.
You must do the multiplication yourself, especially since you want to leave open what the size of the structure will be in the future.
push ax
mov ax, person_size
mul bx
mov bx, ax
pop ax
add bx, person.age
mov [people + bx], byte 20
What real mode on x86-16 does offer (on the 80186 and later) is an extra imul variant with an immediate operand that simplifies your calculation:
imul bx, person_size
mov [people + bx + person.age], byte 20
There was no need to add person.age in a separate instruction. The assembler folds people and person.age into a single 16-bit displacement.
Your version with the mul bx instruction also modified the DX register. You didn't preserve that one like you did with AX!
For a true 8086 your code was (almost) fine:
push ax
push dx
mov ax, person_size
mul bx
mov bx, ax
pop dx
pop ax
mov [people + bx + person.age], byte 20
One optimization would pad the 12-byte structure to 16 bytes.
struc person
.name resb 11
.age resb 1
.pad resb 4
endstruc
This replaces multiplication by simple shifting to the left in order to access the elements:
For x86-16 (array index in ebx):
shl ebx, 1
mov [people + ebx * 8 + person.age], byte 20
or for 8086 (array index in bx):
push cx
mov cl, 4
shl bx, cl
pop cx
mov [people + bx + person.age], byte 20
Another solution uses a lookup table to avoid multiplication and padding.
LUT dw 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ; 10 elements
...
shl bx, 1 ; Lookup table holds words
mov bx, [LUT + bx] ; Fetch array element's offset
mov [people + bx + person.age], byte 20
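The three ways of turning an index into a byte offset can be compared in a small Python sketch. The constant and function names are invented for this illustration. The sizes come from the struc definitions above, and the lookup table is generated from the structure size so that it cannot drift out of sync.

```python
PERSON_SIZE = 12    # .name resb 11 + .age resb 1
PADDED_SIZE = 16    # after adding .pad resb 4
AGE_OFFSET = 11     # person.age
COUNT = 10

# Generated rather than typed by hand, so a size change updates it.
LUT = [i * PERSON_SIZE for i in range(COUNT)]


def offset_by_multiply(index, size=PERSON_SIZE):
    """imul bx, person_size / add person.age"""
    return index * size + AGE_OFFSET


def offset_by_shift(index):
    """shl bx, 4: only valid for the padded 16-byte structure."""
    return (index << 4) + AGE_OFFSET


def offset_by_lut(index):
    """mov bx, [LUT + bx*2]: fetch the precomputed element offset."""
    return LUT[index] + AGE_OFFSET


print(LUT)                      # [0, 12, 24, 36, 48, 60, 72, 84, 96, 108]
print(offset_by_multiply(2))    # 35
print(offset_by_shift(2))       # 43
```

The multiply and lookup-table variants give identical offsets for the 12-byte structure, while the shift variant matches a multiply by the padded size of 16.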

Assembly program crashes on call or exit

Debugging my code in VS2015, I get to the end of the program. The registers are as they should be; however, call ExitProcess, or any variation of it, causes an "Access violation writing location 0x00000004." I am utilizing Irvine32.inc from Kip Irvine's book. I have tried using call DumpRegs, but that too throws the error.
I have tried using other variations of call ExitProcess, such as exit and invoke ExitProcess,0 which did not work either, throwing the same error. Before, when I used the same format, the code worked fine. The only difference between this code and the last one is utilizing the general purpose registers.
include Irvine32.inc
.data
;ary dword 100, -30, 25, 14, 35, -92, 82, 134, 193, 99, 0
ary dword -24, 1, -5, 30, 35, 81, 94, 143, 0
.code
main PROC
;ESI will be used for the array
;EDI will be used for the array value
;ESP will be used for the array counting
;EAX will be used for the accumulating sum
;EBX will be used for the average
;ECX will be used for the remainder of avg
;EBP will be used for calculating remaining sum
mov eax,0 ;Set EAX register to 0
mov ebx,0 ;Set EBX register to 0
mov esp,0 ;Set ESP register to 0
mov esi,OFFSET ary ;Set ESI register to array
sum: mov edi,[esi] ;Set value to array value
cmp edi,0 ;Check value to termination value 0
je finsum ;If equal, jump to finsum
add esp,1 ;Add 1 to array count
add eax,edi ;Add value to sum
add esi,4 ;Increment to next address in array
jmp sum ;Loop back to sum array
finsum: mov ebp,eax ;Set remaining sum to the sum
cmp ebp,0 ;Compare rem sum to 0
je finavg ;Jump to finavg if sum is 0
cmp ebp,esp ;Check sum to array count
jl finavg ;Jump to finavg if sum is less than array count
avg: add ebx,1 ;Add to average
sub ebp,esp ;Subtract array count from rem sum
cmp ebp,esp ;Compare rem sum to array count
jge avg ;Jump to avg if rem sum is >= to ary count
finavg: mov ecx,ebp ;Set rem sum to remainder of avg
call ExitProcess
main ENDP
END MAIN
Registers before call ExitProcess
EAX = 00000163 EBX = 0000002C ECX = 00000003 EDX = 00401055
ESI = 004068C0 EDI = 00000000 EIP = 0040366B ESP = 00000008
EBP = 00000003 EFL = 00000293
OV = 0 UP = 0 EI = 1 PL = 1 ZR = 0 AC = 1 PE = 0 CY = 1
mov esp,0 sets the stack pointer to 0. Any stack instructions like push/pop or call/ret will crash after you do that.
Pick a different register for your array-count temporary, not the stack pointer! You have 7 other choices, and it looks like EDX is still unused.
In the normal calling convention, only EAX, ECX, and EDX are call-clobbered (so you can use them without preserving the caller's value). But you're calling ExitProcess instead of returning from main, so you can destroy all the registers. But ESP has to be valid when you call.
call works by pushing a return address onto the stack, like sub esp,4 / mov [esp], next_instruction / jmp ExitProcess. See https://www.felixcloutier.com/x86/CALL.html. As your register-dump shows, ESP=8 before the call, which is why it's trying to store to absolute address 4.
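The arithmetic behind the faulting address can be sketched in Python. Here memory is a plain dictionary standing in for RAM, and the return address is a made-up value for illustration only.

```python
MASK32 = 0xFFFFFFFF


def call(esp, return_address, memory):
    """Model CALL's stack effect: sub esp, 4 then store the return address."""
    esp = (esp - 4) & MASK32
    memory[esp] = return_address   # on real hardware this store faults if esp is bogus
    return esp


memory = {}                        # address -> dword, a stand-in for RAM
# 0x00403670 is a made-up return address, for illustration only.
new_esp = call(0x00000008, 0x00403670, memory)
print(hex(new_esp))                # 0x4
```

With ESP = 8, the store goes to address 4, which matches the "writing location 0x00000004" in the error message.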
Your code has 2 sections: looping over the array and then finding the average. You can reuse a register for different things in the 2 sections, often vastly reducing register pressure. (i.e. you don't run out of registers.)
Using implicit-length arrays (terminated by a sentinel element like 0) is unusual outside of strings. It's much more common to pass a function a pointer + length, instead of just a pointer.
But anyway, you have an implicit-length array so you have to find its length and remember that when calculating the average. Instead of incrementing a size counter inside the loop, you can calculate it from the pointer you're also incrementing. (Or use the counter as an array index like ary[ecx*4], but pointer-increments are often more efficient.)
Here's what an efficient (scalar) implementation might look like. (With SSE2 for SIMD you could add 4 elements with one instruction...)
It only uses 3 registers total. I could have used ECX instead of ESI (so main could ret without having destroyed any of the registers the caller expected it to preserve, only EAX, ECX, and EDX), but I kept ESI for consistency with your version.
.data
;ary dword 100, -30, 25, 14, 35, -92, 82, 134, 193, 99, 0
ary dword -24, 1, -5, 30, 35, 81, 94, 143, 0
.code
main PROC
;; inputs: static ary of signed dword integers
;; outputs: EAX = array average, EDX = remainder of sum/size
;; ESI = array count (in elements)
;; clobbers: none (other than the outputs)
; EAX = sum accumulator
; ESI = array pointer
; EDX = array element temporary
xor eax, eax ; sum = 0
mov esi, OFFSET ary ; incrementing a pointer is usually efficient, vs. ary[ecx*4] inside a loop or something. So this is good.
sumloop: ; do {
mov edx, [esi]
add esi, 4 ; p++ (advance the pointer, not the loaded value)
add eax, edx ; sum += *p++ without checking for 0, because + 0 is a no-op
test edx, edx ; sets FLAGS the same as cmp edx,0
jnz sumloop ; }while(array element != 0);
;;; fall through if the element is 0.
;;; esi points to one past the terminator, i.e. two past the last real element we want to count for the average
sub esi, OFFSET ary + 4 ; (end+4) - (start+4) = array size in bytes
shr esi, 2 ; esi = array length = (end-start)/element_size
cdq ; sign-extend sum into EDX:EAX as an input for idiv
idiv esi ; EAX = sum/length EDX = sum%length
call ExitProcess
main ENDP
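The same logic, sketched in Python to make the pointer arithmetic easy to check. BASE is a made-up address for ary, and the helper names sum_and_count and idiv are invented for this illustration. The idiv helper truncates toward zero like the hardware instruction, unlike Python's own floor division.

```python
ELEMENT_SIZE = 4
ARY = [-24, 1, -5, 30, 35, 81, 94, 143, 0]   # 0 is the sentinel
BASE = 0x1000                                # made-up address of ary


def sum_and_count(values, base=BASE):
    """Sum a 0-terminated array and derive its length from the pointer."""
    total = 0
    p = base                                           # plays the role of ESI
    while True:
        element = values[(p - base) // ELEMENT_SIZE]   # mov edx, [esi]
        p += ELEMENT_SIZE                              # pointer increment
        total += element                               # adding the 0 sentinel is harmless
        if element == 0:
            break
    # p is one element past the sentinel: (end+4) - (start+4) = size in bytes
    count = (p - (base + ELEMENT_SIZE)) // ELEMENT_SIZE
    return total, count


def idiv(dividend, divisor):
    """Signed division that truncates toward zero, as IDIV does."""
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    return quotient, dividend - quotient * divisor


total, count = sum_and_count(ARY)
print(total, count, idiv(total, count))   # 355 8 (44, 3)
```

The sum 355 is 0x163, the value shown in EAX in your register dump, and 44 remainder 3 matches the EBX and ECX values your subtraction loop produced.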
I used x86's hardware division instruction, instead of a subtraction loop. Your repeated-subtraction loop looked pretty complicated, but manual signed division can be tricky. I don't see where you're handling the possibility of the sum being negative. If your array had a negative sum, repeated subtraction would make it grow until it overflowed. Or in your case, you're breaking out of the loop if sum < count, which will be true on the first iteration for a negative sum.
Note that comments like Set EAX register to 0 are useless. We already know that from reading mov eax,0. sum = 0 describes the semantic meaning, not the architectural effect. There are some tricky x86 instructions where it does make sense to comment about what it even does in this specific case, but mov isn't one of them.
If you just wanted to do repeated subtraction with the assumption that sum is non-negative to start with, it's as simple as this:
;; UNSIGNED division (or signed with non-negative dividend and positive divisor)
; Inputs: sum(dividend) in EAX, count(divisor) in ECX
; Outputs: quotient in EDX, remainder in EAX (reverse of the DIV instruction)
xor edx, edx ; quotient counter = 0
cmp eax, ecx
jb subloop_end ; the quotient = 0 case
repeat_subtraction: ; do {
inc edx ; quotient++
sub eax, ecx ; dividend -= divisor
cmp eax, ecx
jae repeat_subtraction ; while( dividend >= divisor );
; fall through when eax < ecx (unsigned), leaving EAX = remainder
subloop_end:
Notice how checking for special cases before entering the loop lets us simplify it. See also Why are loops always compiled into "do...while" style (tail jump)?
sub eax, ecx and cmp eax, ecx in the same loop seems redundant: we could just use sub to set flags, and correct for the overshoot.
xor edx, edx ; quotient counter = 0
cmp eax, ecx
jb division_done ; the quotient = 0 case
repeat_subtraction: ; do {
inc edx ; quotient++
sub eax, ecx ; dividend -= divisor
jnc repeat_subtraction ; while( dividend -= divisor doesn't wrap (carry) );
add eax, ecx ; correct for the overshoot
dec edx
division_done:
(But this isn't actually faster in most cases on most modern x86 CPUs; they can run the inc, cmp, and sub in parallel even if the inputs weren't the same. This would maybe help on AMD Bulldozer-family where the integer cores are pretty narrow.)
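Both loop shapes can be checked against each other with a Python sketch that mimics 32-bit wraparound. The function names are invented for this example. The divisor == 0 guard is my addition, since both assembly loops would spin forever on a zero divisor.

```python
MASK32 = 0xFFFFFFFF


def divide_cmp(dividend, divisor):
    """Version with sub + cmp in the loop. Returns (quotient, remainder)."""
    if divisor == 0:
        raise ValueError("divisor must be non-zero")   # added guard
    quotient = 0
    if dividend < divisor:                 # jb: the quotient = 0 case
        return quotient, dividend
    while True:
        quotient += 1
        dividend -= divisor
        if dividend < divisor:             # cmp eax, ecx / jae
            break
    return quotient, dividend


def divide_overshoot(dividend, divisor):
    """Version that loops on sub's carry flag and corrects the overshoot."""
    if divisor == 0:
        raise ValueError("divisor must be non-zero")   # added guard
    quotient = 0
    if dividend < divisor:
        return quotient, dividend
    while True:
        quotient += 1
        carry = dividend < divisor                     # borrow out of sub
        dividend = (dividend - divisor) & MASK32       # wraps like the register
        if carry:                                      # jnc falls through
            break
    dividend = (dividend + divisor) & MASK32           # add eax, ecx
    quotient -= 1                                      # dec edx
    return quotient, dividend


print(divide_cmp(355, 8), divide_overshoot(355, 8))    # (44, 3) (44, 3)
```

The second version subtracts once too often on purpose, then undoes that last step.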
Obviously repeated subtraction is total garbage for performance with large numbers. It is possible to implement better algorithms, like one-bit-at-a-time long-division, but the idiv instruction is going to be faster for anything except the case where you know the quotient is 0 or 1, so it takes at most 1 subtraction. (div/idiv is pretty slow compared to any other integer operation, but the dedicated hardware is much faster than looping.)
If you do need to implement signed division manually, normally you record the signs, take the unsigned absolute value, then do unsigned division.
e.g. xor eax, ecx / sets dl gives you dl=0 if EAX and ECX had the same sign, or 1 if they were different (and thus the quotient will be negative). (SF is set according to the sign bit of the result, and XOR produces 1 for different inputs, 0 for same inputs.)
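A minimal Python sketch of that recipe, working on 32-bit two's-complement bit patterns. The names are made up for this illustration, divmod stands in for the unsigned division step, and the overflow case of dividing the most negative value by -1 is not modeled.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def signed_divide(dividend, divisor):
    """Truncating signed division built on an unsigned divide."""
    a = dividend & MASK32                       # bit patterns, as in EAX/ECX
    b = divisor & MASK32
    quotient_negative = ((a ^ b) >> 31) & 1     # like xor eax, ecx / sets dl
    remainder_negative = (a >> 31) & 1          # remainder follows the dividend
    ua = abs(to_signed32(a))                    # unsigned absolute values
    ub = abs(to_signed32(b))
    quotient, remainder = divmod(ua, ub)        # the unsigned division step
    if quotient_negative:
        quotient = -quotient
    if remainder_negative:
        remainder = -remainder
    return quotient, remainder


print(signed_divide(-355, 8))   # (-44, -3)
print(signed_divide(355, -8))   # (-44, 3)
```

The quotient is negative exactly when the signs differ, and the remainder takes the sign of the dividend, which is also what idiv produces.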

Flipping the first pixel in an image in asm

Hi, I am just doing this for practice before I create a loop that can flip a 3x3 image horizontally or vertically. I am using a variable called ap to store the address of the first pixel. I would also like to eventually use another variable called amp to store the mirrored pixel address, and also a register to store the calculated offset of the pixels, but for now I put it in manually. No matter what I do, the program doesn't swap them. Does anyone have an idea of what the issue is? Thank you for reading.
mov ecx, dword ptr[eax + ecx * 4]
mov ap, ecx //temporary pixel address storage
mov ecx, 0
mov ecx, dword ptr[eax + ecx * 4 + 8] //offset by 8 pixels
mov [ap], ecx
"I am using a variable called ap to store the addresses of the first pixel"
If the ap variable is supposed to contain an address, then you need to use the lea instruction (not the mov instruction).
; For the 1st line EAX is address of image = address of 1st pixel
mov ecx, 0 ;Index to 1st pixel
lea ecx, dword ptr[eax + ecx * 4] ;Address of 1st pixel
mov [ap], ecx
mov ecx, 2 ;Index to 3rd pixel
lea ecx, dword ptr[eax + ecx * 4] ;Address of 3rd pixel
mov [amp], ecx
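Now, to swap these pixels and thus flip the image, load both addresses into registers and exchange the dwords they point to. (mov ecx, [ap] only reads the variable ap itself, so the pixel has to be reached through a register.)
mov esi, [ap] ;Address of 1st pixel
mov edi, [amp] ;Address of 3rd pixel
mov ecx, [esi] ;1st pixel
mov edx, [edi] ;3rd pixel
mov [esi], edx
mov [edi], ecx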
To process the next lines of the image, you could add the number of bytes per scanline to the EAX register each time. For a 3x3 image that's probably 12.
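The difference between exchanging two addresses and exchanging the pixels they refer to is easy to see in a Python sketch. List indices stand in for pixel addresses here, and all names are invented for this illustration.

```python
def swap_addresses(ap, amp):
    """Exchange the two address variables; the image is not touched."""
    return amp, ap


def swap_pixels(image, ap, amp):
    """Exchange the pixels stored at positions ap and amp."""
    image[ap], image[amp] = image[amp], image[ap]


def flip_horizontal(image, width, height):
    """Mirror every scanline of a row-major image in place."""
    for row in range(height):
        left = row * width                 # first pixel of this scanline
        right = left + width - 1           # its mirror pixel
        while left < right:
            swap_pixels(image, left, right)
            left += 1
            right -= 1
    return image


image = list(range(9))                     # a 3x3 image with pixels 0..8
print(flip_horizontal(image, 3, 3))        # [2, 1, 0, 5, 4, 3, 8, 7, 6]
```

Only swap_pixels changes the image; swap_addresses merely relabels which variable holds which position.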
I don't get what this is supposed to do:
mov ecx, dword ptr[eax + ecx * 4]
What's in ecx? Is it a counter for the offset? But you are overwriting it each time...
If you're trying to save the original bit, I think you need to make sure you have the right value in ecx. Try
mov ecx, 0
first (you can also use xor ecx, ecx; it gets the job done and it's easier to read).

Jumping to random code when using IDIV

I am relatively new to assembler, but when creating code that works with arrays and calculates the average of each row, I encountered a problem that suggests I don't know how division really works. This is my code:
.model tiny
.code
.startup
Org 100h
Jmp Short Start
N Equ 2 ;columns
M Equ 3 ;rows
Matrix DW 2, 2, 3 ; elements
DW 4, 6, 6 ; elements
Vector DW M Dup (?)
S Equ Type Matrix
Start:
Mov Cx, M;20
Lea Di, Vector
Xor Si, Si
Cols: Push Cx
Mov Cx, N
Xor Bx, Bx
Xor Ax, Ax
Rows:
Add Ax, Matrix[Bx][Si]
Next:
Add Bx, S*M
Loop Rows
Add Si, S
Mov [Di], Ax
Add Di, S
Pop Cx
Loop Cols
Xor Bx, Bx
Mov Cx, M
Mov DX, 2
Print: Mov Ax, Vector[Bx]
IDiv Dx; div/idiv error here
Add Bx, S
Loop Print
.exit 0
There are no errors when compiling. Elements are counted correctly, but when division happens the debugger shows the program jumping to apparently random code. Why is this happening and how can I resolve it?
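On x86, IDiv with a 16-bit operand divides the 32-bit value in Dx:Ax by that operand, and raises a divide-error exception (interrupt 0) if the quotient does not fit in 16 bits. Your code loads Dx with 2 and then divides by Dx, so the dividend is 2 * 65536 + Ax and the quotient always overflows. That interrupt is most likely the jump to apparently random code that you see in the debugger.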
Try something like this:
Mov Di, 2
Print: Mov Ax, Vector[Bx]
Cwd ; sign extend Ax to Dx:Ax
IDiv Di
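To make the overflow concrete, here is a Python sketch of the 16-bit IDIV and of CWD. The function names are invented for this example, and register values are modeled as unsigned 16-bit patterns.

```python
def to_signed(value, bits):
    """Interpret a `bits`-wide pattern as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def cwd(ax):
    """CWD: return the DX that sign-extends AX into DX:AX."""
    return 0xFFFF if ax & 0x8000 else 0x0000


def idiv16(dx, ax, divisor):
    """Model 16-bit IDIV: signed DX:AX / divisor -> (AX, DX) patterns."""
    dividend = to_signed((dx << 16) | ax, 32)
    d = to_signed(divisor, 16)
    if d == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    quotient = abs(dividend) // abs(d)          # truncate toward zero
    if (dividend < 0) != (d < 0):
        quotient = -quotient
    remainder = dividend - quotient * d
    if not -0x8000 <= quotient <= 0x7FFF:
        raise OverflowError("#DE: quotient does not fit in AX")
    return quotient & 0xFFFF, remainder & 0xFFFF


print(idiv16(cwd(6), 6, 2))   # (3, 0)
try:
    idiv16(2, 6, 2)           # the question's case: Dx = 2 is also the divisor
except OverflowError as error:
    print(error)
```

With Dx = 2 the dividend is 0x00020006, whose quotient 65539 cannot fit in Ax. After Cwd the dividend is just the sign-extended Ax.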
