Recursive methods - visual-studio-2010

I'm having a hard time grasping recursion. For example, I have the following method. When the if statement evaluates to true, I expect the method to return. However, stepping through it in WinDbg and Visual Studio shows that the method continues to execute. I apologize for the generic question, but your feedback would really be appreciated.
How is N decremented in order to satisfy the if condition?
long factorial(int N)
{
if(N == 1)
return 1;
return N * factorial(N - 1);
}

Compiling and disassembling the function, you should get a disassembly similar to this:
0:000> cdb: Reading initial command 'uf fact!fact;q'
fact!fact:
00401000 55 push ebp
00401001 8bec mov ebp,esp
00401003 837d0801 cmp dword ptr [ebp+8],1
00401007 7507 jne fact!fact+0x10 (00401010)
fact!fact+0x9:
00401009 b801000000 mov eax,1
0040100e eb13 jmp fact!fact+0x23 (00401023)
fact!fact+0x10:
00401010 8b4508 mov eax,dword ptr [ebp+8]
00401013 83e801 sub eax,1
00401016 50 push eax
00401017 e8e4ffffff call fact!fact (00401000)
0040101c 83c404 add esp,4
0040101f 0faf4508 imul eax,dword ptr [ebp+8]
fact!fact+0x23:
00401023 5d pop ebp
00401024 c3 ret
quit:
Let's assume N == 5 when the function is entered, i.e. [ebp+8] holds 5.
As long as [ebp+8] is not 1 (here, as long as it is greater than 1), the jne is taken.
There you can see N being decremented (sub eax,1).
The decremented N is pushed and passed to fact again (a recursive call, made before the current call has returned to its caller). The same sequence repeats, with N decremented each time, until the jne is not taken, that is, until N ([ebp+8]) == 1.
When N becomes 1, the jne is not taken; instead eax is set to 1 and the jmp to 00401023 is taken,
where the function returns to its caller, which is the previous invocation of fact(int N).
That is, it returns to 0040101c, where the stack is cleaned up, eax is multiplied by that invocation's own N, and the result is stored back in eax.
This keeps happening until the ret returns to the original call in main(). See the stack below, captured just before pop ebp executes for the first time:
0:000> kPL
ChildEBP RetAddr
0013ff38 0040101c fact!fact(
int N = 0n1)+0x23
0013ff44 0040101c fact!fact(
int N = 0n2)+0x1c
0013ff50 0040101c fact!fact(
int N = 0n3)+0x1c
0013ff5c 0040101c fact!fact(
int N = 0n4)+0x1c
0013ff68 0040109f fact!fact(
int N = 0n5)+0x1c
0013ff78 0040140b fact!main(
int argc = 0n2,
char ** argv = 0x00033ac0)+0x6f
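The two phases visible in that stack trace (calls piling up, then returns unwinding) can be sketched in C with an explicit array standing in for the stack frames. This is only an illustration: the name factorial_explicit_stack, the frames array, and the limit of 12 (chosen so the result fits a 32-bit long) are assumptions, not part of the original code.

```c
/* Illustrative sketch: computes N! the way the call stack does, with an
   explicit array playing the role of each frame's [ebp+8] slot. */
long factorial_explicit_stack(int n)
{
    int frames[16];   /* one saved N per pending call */
    int depth = 0;
    long result;

    if (n < 1 || n > 12)      /* assumed limit: 12! still fits in 32 bits */
        return -1;

    /* "Call" phase: like push eax / call fact, repeated until N == 1. */
    while (n > 1) {
        frames[depth++] = n;
        n--;
    }

    /* Base case: mov eax,1 */
    result = 1;

    /* "Return" phase: each ret lands on the imul of the frame below. */
    while (depth > 0) {
        result *= frames[--depth];
    }
    return result;
}
```

For N == 5 the array holds 5, 4, 3, 2 when the base case is reached, which corresponds to the four frames in the trace that are waiting at +0x1c.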

I think the best way to grasp it is to work through your code manually. Say you call factorial(4); what happens? 4 is not equal to 1. Return 4 * factorial(4-1).
What is the return value of factorial(3)? 3 is not equal to 1. Return 3 * factorial(3-1).
What is the return value of factorial(2)? 2 is not equal to 1. Return 2 * factorial(2-1).
What is the return value of factorial(1)? 1 equals 1 is true. Return 1. This is the base case. Now we move back up the recursion.
Return 1. This is factorial(2-1).
Return 2*1. This is factorial(3-1).
Return 3*2. This is factorial(4-1).
Return 4*6. This is factorial(4), the original call you made.
The idea is that you have a function with a base case (when n = 1, return 1), and the function calls itself in a way that moves it towards the base case (factorial(n-1)).
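The manual walk-through above can also be printed by the program itself. The following is a minimal sketch, assuming a hypothetical extra depth parameter that is used only for indentation; it also treats anything at or below 1 as the base case, which differs slightly from the original N == 1 test.

```c
#include <stdio.h>

/* Illustrative sketch: same recursion as factorial(), plus trace output.
   `depth` is a made-up parameter used only to indent the trace. */
long factorial_traced(int n, int depth)
{
    long result;

    printf("%*sfactorial(%d) called\n", depth * 2, "", n);
    if (n <= 1) {
        result = 1;                                    /* base case */
    } else {
        result = n * factorial_traced(n - 1, depth + 1);
    }
    printf("%*sfactorial(%d) returns %ld\n", depth * 2, "", n, result);
    return result;
}
```

Calling factorial_traced(4, 0) prints the "called" lines on the way down and the "returns" lines on the way back up, in the same order as the steps listed above.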

Related

What is the code generator for computed goto?

example of computed goto:
...
GO TO ( 10, 20, 30, 40 ), N
...
10 CONTINUE
...
20 CONTINUE
...
40 CONTINUE
If N equals one, then go to 10.
If N equals two, then go to 20.
If N equals three, then go to 30.
If N equals four, then go to 40.
What code does the compiler generate for a computed goto in the final stage of compilation?
The most common way of compiling computed goto is a static jump table and an indirect branch instruction. For example (without -fPIC):
int test(int num) {
const void * const labels[] = {&&a, &&b, &&cl};
goto *labels[num];
a: return 1;
b: return 2;
cl: return 3;
}
This is going to be compiled as:
test(int): # #test(int)
movsxd rax, edi
jmp qword ptr [8*rax + .L__const.test(int).labels]
.Ltmp0: # Block address taken
mov eax, 1
ret
.Ltmp1: # Block address taken
mov eax, 3
ret
.Ltmp2: # Block address taken
mov eax, 2
ret
.L__const.test(int).labels:
.quad .Ltmp0
.quad .Ltmp2
.quad .Ltmp1
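The &&label syntax above is a GNU extension. A rough analogue of "static table plus indirect branch" in standard C is a table of function pointers. The sketch below is only an illustration under that assumption; the names handle_a, handle_b, handle_c and dispatch are hypothetical, and a compiler is free to inline or otherwise transform it rather than emit an indirect call.

```c
#include <stddef.h>

/* Hypothetical handlers standing in for the labels a, b and cl. */
static int handle_a(void) { return 1; }
static int handle_b(void) { return 2; }
static int handle_c(void) { return 3; }

/* Indexes a static table and makes an indirect call through it. */
int dispatch(int num)
{
    static int (*const table[])(void) = { handle_a, handle_b, handle_c };
    const size_t count = sizeof table / sizeof table[0];

    if (num < 0 || (size_t)num >= count)
        return -1;            /* bounds check the goto version lacks */
    return table[num]();
}
```

Unlike the computed goto in test(), this version checks the index first, so an out-of-range num returns -1 instead of jumping through whatever lies past the table.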

Assembly program crashes on call or exit

Debugging my code in VS2015, I get to the end of the program. The registers are as they should be; however, call ExitProcess, or any variation of it, causes an "Access violation writing location 0x00000004." I am using Irvine32.inc from Kip Irvine's book. I have tried using call DumpRegs, but that throws the error too.
I have tried other variations of call ExitProcess, such as exit and invoke ExitProcess,0, which did not work either and threw the same error. Before, when I used the same format, the code worked fine. The only difference between this code and the last one is the use of the general-purpose registers.
include Irvine32.inc
.data
;ary dword 100, -30, 25, 14, 35, -92, 82, 134, 193, 99, 0
ary dword -24, 1, -5, 30, 35, 81, 94, 143, 0
.code
main PROC
;ESI will be used for the array
;EDI will be used for the array value
;ESP will be used for the array counting
;EAX will be used for the accumulating sum
;EBX will be used for the average
;ECX will be used for the remainder of avg
;EBP will be used for calculating remaining sum
mov eax,0 ;Set EAX register to 0
mov ebx,0 ;Set EBX register to 0
mov esp,0 ;Set ESP register to 0
mov esi,OFFSET ary ;Set ESI register to array
sum: mov edi,[esi] ;Set value to array value
cmp edi,0 ;Check value to termination value 0
je finsum ;If equal, jump to finsum
add esp,1 ;Add 1 to array count
add eax,edi ;Add value to sum
add esi,4 ;Increment to next address in array
jmp sum ;Loop back to sum array
finsum: mov ebp,eax ;Set remaining sum to the sum
cmp ebp,0 ;Compare rem sum to 0
je finavg ;Jump to finavg if sum is 0
cmp ebp,esp ;Check sum to array count
jl finavg ;Jump to finavg if sum is less than array count
avg: add ebx,1 ;Add to average
sub ebp,esp ;Subtract array count from rem sum
cmp ebp,esp ;Compare rem sum to array count
jge avg ;Jump to avg if rem sum is >= to ary count
finavg: mov ecx,ebp ;Set rem sum to remainder of avg
call ExitProcess
main ENDP
END MAIN
Registers before call ExitProcess
EAX = 00000163 EBX = 0000002C ECX = 00000003 EDX = 00401055
ESI = 004068C0 EDI = 00000000 EIP = 0040366B ESP = 00000008
EBP = 00000003 EFL = 00000293
OV = 0 UP = 0 EI = 1 PL = 1 ZR = 0 AC = 1 PE = 0 CY = 1
mov esp,0 sets the stack pointer to 0. Any stack instructions like push/pop or call/ret will crash after you do that.
Pick a different register for your array-count temporary, not the stack pointer! You have 7 other choices, looks like you still have EDX unused.
In the normal calling convention, only EAX, ECX, and EDX are call-clobbered (so you can use them without preserving the caller's value). But you're calling ExitProcess instead of returning from main, so you can destroy all the registers. But ESP has to be valid when you call.
call works by pushing a return address onto the stack, like sub esp,4 / mov [esp], next_instruction / jmp ExitProcess. See https://www.felixcloutier.com/x86/CALL.html. As your register-dump shows, ESP=8 before the call, which is why it's trying to store to absolute address 4.
Your code has 2 sections: looping over the array and then finding the average. You can reuse a register for different things in the 2 sections, often vastly reducing register pressure. (i.e. you don't run out of registers.)
Using implicit-length arrays (terminated by a sentinel element like 0) is unusual outside of strings. It's much more common to pass a function a pointer + length, instead of just a pointer.
But anyway, you have an implicit-length array so you have to find its length and remember that when calculating the average. Instead of incrementing a size counter inside the loop, you can calculate it from the pointer you're also incrementing. (Or use the counter as an array index like ary[ecx*4], but pointer-increments are often more efficient.)
Here's what an efficient (scalar) implementation might look like. (With SSE2 for SIMD you could add 4 elements with one instruction...)
It only uses 3 registers total. I could have used ECX instead of ESI (so main could ret without having destroyed any of the registers the caller expected it to preserve, only EAX, ECX, and EDX), but I kept ESI for consistency with your version.
.data
;ary dword 100, -30, 25, 14, 35, -92, 82, 134, 193, 99, 0
ary dword -24, 1, -5, 30, 35, 81, 94, 143, 0
.code
main PROC
;; inputs: static ary of signed dword integers
;; outputs: EAX = array average, EDX = remainder of sum/size
;; ESI = array count (in elements)
;; clobbers: none (other than the outputs)
; EAX = sum accumulator
; ESI = array pointer
; EDX = array element temporary
xor eax, eax ; sum = 0
mov esi, OFFSET ary ; incrementing a pointer is usually efficient, vs. ary[ecx*4] inside a loop or something. So this is good.
sumloop: ; do {
mov edx, [esi]
add esi, 4
add eax, edx ; sum += *p++ without checking for 0, because + 0 is a no-op
test edx, edx ; sets FLAGS the same as cmp edx,0
jnz sumloop ; }while(array element != 0);
;;; fall through if the element is 0.
;;; esi points to one past the terminator, i.e. two past the last real element we want to count for the average
sub esi, OFFSET ary + 4 ; (end+4) - (start+4) = array size in bytes
shr esi, 2 ; esi = array length = (end-start)/element_size
cdq ; sign-extend sum into EDX:EAX as an input for idiv
idiv esi ; EAX = sum/length EDX = sum%length
call ExitProcess
main ENDP
I used x86's hardware division instruction, instead of a subtraction loop. Your repeated-subtraction loop looked pretty complicated, but manual signed division can be tricky. I don't see where you're handling the possibility of the sum being negative. If your array had a negative sum, repeated subtraction would make it grow until it overflowed. Or in your case, you're breaking out of the loop if sum < count, which will be true on the first iteration for a negative sum.
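For comparison, here is roughly the same approach written in C: sum up to the sentinel, derive the element count from the pointer difference, then do one hardware division. It is a sketch, not a translation of the assembly; the names avg_result and average_until_zero are hypothetical, and the empty-array guard is an addition the assembly version does not have.

```c
#include <stddef.h>

struct avg_result {
    int quotient;    /* sum / count, like EAX after idiv */
    int remainder;   /* sum % count, like EDX after idiv */
    size_t count;    /* number of elements before the 0 terminator */
};

/* `start` must point to an int array terminated by a 0 element. */
struct avg_result average_until_zero(const int *start)
{
    const int *p = start;
    int sum = 0;
    struct avg_result r = {0, 0, 0};

    while (*p != 0) {
        sum += *p++;
    }
    r.count = (size_t)(p - start);   /* length from the pointer difference */

    if (r.count == 0)                /* added guard: avoid dividing by zero */
        return r;

    r.quotient = sum / (int)r.count; /* C99 division truncates toward zero */
    r.remainder = sum % (int)r.count;
    return r;
}
```

With the array from the question (-24, 1, -5, 30, 35, 81, 94, 143, 0) the sum is 355 over 8 elements, which matches the EAX = 163h shown in the register dump.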
Note that comments like Set EAX register to 0 are useless. We already know that from reading mov eax,0. sum = 0 describes the semantic meaning, not the architectural effect. There are some tricky x86 instructions where it does make sense to comment about what it even does in this specific case, but mov isn't one of them.
If you just wanted to do repeated subtraction with the assumption that sum is non-negative to start with, it's as simple as this:
;; UNSIGNED division (or signed with non-negative dividend and positive divisor)
; Inputs: sum(dividend) in EAX, count(divisor) in ECX
; Outputs: quotient in EDX, remainder in EAX (reverse of the DIV instruction)
xor edx, edx ; quotient counter = 0
cmp eax, ecx
jb subloop_end ; the quotient = 0 case
repeat_subtraction: ; do {
inc edx ; quotient++
sub eax, ecx ; dividend -= divisor
cmp eax, ecx
jae repeat_subtraction ; while( dividend >= divisor );
; fall through when eax < ecx (unsigned), leaving EAX = remainder
subloop_end:
Notice how checking for special cases before entering the loop lets us simplify it. See also Why are loops always compiled into "do...while" style (tail jump)?
sub eax, ecx and cmp eax, ecx in the same loop seems redundant: we could just use sub to set flags, and correct for the overshoot.
xor edx, edx ; quotient counter = 0
cmp eax, ecx
jb division_done ; the quotient = 0 case
repeat_subtraction: ; do {
inc edx ; quotient++
sub eax, ecx ; dividend -= divisor
jnc repeat_subtraction ; while( dividend -= divisor doesn't wrap (carry) );
add eax, ecx ; correct for the overshoot
dec edx
division_done:
(But this isn't actually faster in most cases on most modern x86 CPUs; they can run the inc, cmp, and sub in parallel even if the inputs weren't the same. This would maybe help on AMD Bulldozer-family where the integer cores are pretty narrow.)
Obviously repeated subtraction is total garbage for performance with large numbers. It is possible to implement better algorithms, like one-bit-at-a-time long-division, but the idiv instruction is going to be faster for anything except the case where you know the quotient is 0 or 1, so it takes at most 1 subtraction. (div/idiv is pretty slow compared to any other integer operation, but the dedicated hardware is much faster than looping.)
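The repeated-subtraction loop is easy to mirror in C if you want to check the logic outside the debugger. This is a minimal sketch with hypothetical names (udiv_result, divide_by_subtraction); the divisor == 0 guard is an assumption added here, since the assembly loops above would never terminate with a zero divisor.

```c
struct udiv_result {
    unsigned quotient;
    unsigned remainder;
};

/* Unsigned division by repeated subtraction, for illustration only. */
struct udiv_result divide_by_subtraction(unsigned dividend, unsigned divisor)
{
    struct udiv_result r = { 0, dividend };

    if (divisor == 0)          /* added guard, not in the assembly */
        return r;

    while (r.remainder >= divisor) {   /* same test as cmp / jae */
        r.remainder -= divisor;        /* dividend -= divisor */
        r.quotient++;                  /* quotient++ */
    }
    return r;
}
```

The loop runs once per unit of quotient, which is why it is only reasonable when the quotient is known to be small.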
If you do need to implement signed division manually, normally you record the signs, take the unsigned absolute value, then do unsigned division.
e.g. xor eax, ecx / sets dl gives you dl=0 if EAX and ECX had the same sign, or 1 if they were different (and thus the quotient will be negative). (SF is set according to the sign bit of the result, and XOR produces 1 for different inputs, 0 for same inputs.)
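The "record the signs, divide the magnitudes, reapply the signs" recipe can be sketched in C as follows. The names sdiv_result and signed_divide are hypothetical, and the asserts spell out assumptions made to keep the sketch free of overflow: a non-zero divisor, and neither operand equal to INT_MIN. The remainder takes the sign of the dividend, as idiv does.

```c
#include <assert.h>
#include <limits.h>

struct sdiv_result {
    int quotient;
    int remainder;
};

/* Signed division built on unsigned division of the absolute values. */
struct sdiv_result signed_divide(int dividend, int divisor)
{
    struct sdiv_result r;
    /* Sign bit of the XOR is set when the signs differ (the xor / sets idea). */
    int quotient_negative = (dividend ^ divisor) < 0;
    int remainder_negative = dividend < 0;
    unsigned ua, ub, uq, ur;

    assert(divisor != 0);
    assert(dividend != INT_MIN && divisor != INT_MIN);

    ua = dividend < 0 ? (unsigned)(-dividend) : (unsigned)dividend;
    ub = divisor < 0 ? (unsigned)(-divisor) : (unsigned)divisor;

    uq = ua / ub;              /* unsigned division of the magnitudes */
    ur = ua % ub;

    r.quotient = quotient_negative ? -(int)uq : (int)uq;
    r.remainder = remainder_negative ? -(int)ur : (int)ur;
    return r;
}
```

The unsigned step in the middle could be replaced by any unsigned algorithm, including repeated subtraction, without touching the sign handling around it.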

Value of unused variable changing after subroutine call - Assembly

I am pretty new to assembly, which I have been learning for the last 7 hours (it's an early peek into the courses I have next semester, starting next month). I read some online tutorials and the NASM manual, and started to port a C program to NASM, just for learning.
int fact(int n)
{
return (n == 0) ? 1 : n * fact(n - 1);
}
I then started to port it to assembly, and had this as my solution:
fact:
; int fact(int n)
cmp dword ebx, 0 ; n == 0
je .yes
.no:
push ebx ; save ebx in stack
sub ebx, dword 1 ; sub 1 from ebx. (n - 1)
call fact ; call fact recursively
pop ebx ; get back the ebx from stack
imul eax, ebx ; eax *= ebx; eax == fact(n - 1)
ret
.yes:
mov eax, dword 1 ; store 1 in eax to return it
ret
I take in a DWORD (int I suppose) in the ebx register and return the value in the eax register. As you can see I am not at all using the variable i that I have declared in the .bss section. My variables are like this:
section .bss
; int i, f
i resb 2
f resb 2
It's 2 bytes for an int right? Okay then I'm prompting the user in the _main, getting the input with _scanf and then calling the function. Other than this and calling the function, I have no other code that changes the value of i variable.
mov ebx, dword [i] ; check for validity of the factorial value
cmp dword ebx, 0
jnl .no
.yes:
push em ; print error message and exit
call _printf
add esp, 4
ret
.no:
push dword 0 ; print the result and exit
push dword [i]
push rm
call _printf
add esp, 12
call fact ; call the fact function
mov dword [f], eax
push dword [f] ; print the result and exit
push dword [i]
push rm
call _printf
add esp, 12
ret
I don't see where I'm modifying the value of the i variable. On the first print, before the call to fact, it is indeed the value entered by the user, but in the later print, after calling the function, it prints some garbage value, as in the following output:
E:\ASM> factorial
Enter a number: 5
The factorial of 5 is 0The factorial of 7864325 is 120
E:\ASM>
Any clues? My complete source code is in this gist: https://gist.github.com/sriharshachilakapati/70049a778e12d8edd9c7acf6c2d44c33

assembly: sorting numbers using only conditional statements

I am new to assembly and I am trying to write a program that gets five user-inputted numbers, stores them in variables num1-num5, sorts them (without using arrays) with num5 having the greatest value and num1 having the lowest value, and then displays the sorted numbers. I am having trouble figuring out how to approach this. I got the 5 numbers and stored them in the variables, but I am confused as to how to start with sorting. I have tried a few things but I keep getting errors. This is the code that I can actually get running, but it isn't working the way I want it to.
TITLE MASM Template (main.asm)
INCLUDE Irvine32.inc
.data
getnumber byte "Please enter a number between 0 and 20",0ah,0dh,0
num1 byte 0
num2 byte 0
num3 byte 0
num4 byte 0
num5 byte 0
.code
main PROC
call Clrscr
;************* get the information from the user*******************
mov edx, offset getnumber ;ask to input number
call writestring
call readint
mov bl, al
mov num1, bl ;get the number and move to num1 variable
mov edx, offset getnumber ;ask to input number
call writestring
call readint
mov bl, al
mov num2, bl ;get the number and move to num2 variable
mov edx, offset getnumber ;ask to input number
call writestring
call readint
mov bl, al
mov num3, bl ;get the number and move to num3 variable
mov edx, offset getnumber ;ask to input number
call writestring
call readint
mov bl,al
mov num4, bl ;get the number and move to num4 variable
mov edx, offset getnumber ;ask to input number
call writestring
call readint
mov bl, al
mov num5, bl ;get the number and move to num5 variable
;***show the user inputed numbers****
mov al, num1
call writeint
mov al, num2
call writeint
mov al,num3
call writeint
mov al, num4
call writeint
mov al,num5
call writeint
;*****start comparing***
cmp bl,num5
jl jumptoisless
jg jumptoisgreater
jumptoisless:
call writeint
jumptoisgreater:
mov bl, num5
mov dl, num4
mov num5, dl
mov num4, bl
call writeint
jmp imdone
imdone:
call dumpregs
exit
main ENDP
END main
Some notes to your code:
call readint
mov bl, al
mov num2, bl
Why don't you simply store al directly to memory, as: mov [num2],al? You don't use the bl anyway.
Except here:
;*****start comparing***
cmp bl,num5
jl jumptoisless
jg jumptoisgreater
There I would be worried about what call writeint does to ebx (or did you do your homework, and you know for certain that call writeint preserves ebx?).
And if ebx is preserved, then bl still contains num5 from the input, so the two will be equal.
Funnily enough, when they are equal, execution continues into the jumptoisless: part of the code, which outputs whatever is left over in al, and then falls through into the jumptoisgreater: part, effectively executing all of them.
Can you watch the CPU for a while in a debugger, single-stepping over the instructions, to understand a bit better how it works? It's a state machine, i.e. based on the current values in registers and the contents of memory, it changes the state of registers and memory in a deterministic way.
So unless you jump away, the next instruction is executed after the current one, and jl + jg doesn't cover the "equal" case (at least you do cmp only once, so hopefully you understand that the jcc instructions don't change flags and both jl and jg operate on the same cmp result in the flags). The assembler doesn't care about the names of your labels, and it will not warn you that the "isgreater" code is executed even after the "isless" code ran first.
About how to solve your task:
I can't think of anything reasonably short, unless you start to treat the num1-num5 memory as an array, so you can address it in a generic way with a pointer and index. So I will gladly let you try on your own; just a reminder that sorting n values takes on the order of n*log2(n) compares, so even if you wrote very efficient sort code, you would need roughly 5*3 = 15 cmp instructions (log2(5) rounds up to 3, as 2^3 = 8).
On the other hand, an inefficient (but simple to write and understand) bubble sort over an array can be done with a single cmp inside two loops.
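To make the "single cmp inside two loops" shape concrete, here is a minimal bubble sort sketch in C. It assumes the five numbers are stored contiguously as an array of int (the question uses separate byte variables), and the name bubble_sort is hypothetical.

```c
#include <stddef.h>

/* Sorts values[0..n-1] in ascending order. */
void bubble_sort(int *values, size_t n)
{
    for (size_t pass = 0; pass + 1 < n; pass++) {
        /* After each pass the largest remaining value sits at the end. */
        for (size_t i = 0; i + 1 < n - pass; i++) {
            if (values[i] > values[i + 1]) {   /* the one comparison */
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
            }
        }
    }
}
```

In assembly the inner if becomes one cmp followed by a conditional jump over the swap, and the two for loops become two counters with their own conditional jumps back.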
rcgldr made me curious, so I have been trying a few things...
With insertion sort it's possible to use at most 8 cmp (I hope the pseudo-syntax is understandable for him):
Insert(0, num1)
// ^ no cmp
Insert((num2 <= [0] ? 0 : 1), num2)
// ^ 1x cmp executed
Insert((num3 <= [0] ? 0 : (num3 <= [1] ? 1 : 2)), num3)
// ^ at most 2 cmp executed
Insert((num4 <= [1] ? (num4 <= [0] ? 0 : 1) : (num4 <= [2] ? 2 : 3)), num4)
// ^ always 2 of 3 cmp executed
Insert((num5 <= [1] ? (num5 <= [0] ? 0 : 1) : (num5 <= [2] ? 2 : (num5 <= [3] ? 3 : 4))), num5)
// ^ at most 3 of 4 cmp executed
=> total at most 8 cmp executed.
Of course doing the "insert" with "position" over fixed variables would be total PITA... ;) So this is half-joke proposal just to see if 8x cmp is enough.
("6 compares" turned out to be my brain-fart, not possible AFAIK)
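The insertion scheme above is a binary search for the insert position, so it can be checked with a short C sketch that counts comparisons. This is an illustration over an array rather than fixed variables; the name sort5_count_compares is hypothetical, and the midpoint formula was picked so the probe order follows the pseudo-syntax ([0] first for num3, [1] first for num4 and num5).

```c
/* Sorts v[0..4] ascending by binary insertion and returns how many
   element comparisons were executed. */
int sort5_count_compares(int v[5])
{
    int compares = 0;

    for (int k = 1; k < 5; k++) {
        int x = v[k];
        int lo = 0;
        int hi = k;                 /* insert position lies in [lo, hi] */

        while (lo < hi) {
            int mid = (lo + hi - 1) / 2;
            compares++;             /* one cmp per probe */
            if (x <= v[mid])
                hi = mid;
            else
                lo = mid + 1;
        }

        /* Shift the tail up and drop x into place: the Insert(pos, x) step. */
        for (int j = k; j > lo; j--)
            v[j] = v[j - 1];
        v[lo] = x;
    }
    return compares;
}
```

The per-element maxima are 1, 2, 2 and 3 probes, which is where the total of at most 8 comes from.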

Visual Studio performance optimization in branching

Consider the following
while(true)
{
if(x>5)
// Run function A
else
// Run function B
}
If x is always less than 5, does the Visual Studio compiler do any optimization, i.e. never check whether x is larger than 5 and always run function B?
It depends on whether or not the compiler "knows" that x will always be less than 5.
Yes, nearly all modern compilers are capable of removing the branch. But the compiler needs to be able to prove that the branch will always go one direction.
Here's an example that can be optimized:
int x = 1;
if (x > 5)
printf("Hello\n");
else
printf("World\n");
The disassembly is:
sub rsp, 40 ; 00000028H
lea rcx, OFFSET FLAT:??_C@_06DKJADKFF@World?6?$AA@
call QWORD PTR __imp_printf
x = 1 is provably less than 5. So the compiler is able to remove the branch.
But in this example, even if you always input less than 5, the compiler doesn't know that. It must assume any input.
int x;
cin >> x;
if (x > 5)
printf("Hello\n");
else
printf("World\n");
The disassembly is:
cmp DWORD PTR x$[rsp], 5
lea rcx, OFFSET FLAT:??_C@_06NJBIDDBG@Hello?6?$AA@
jg SHORT $LN5@main
lea rcx, OFFSET FLAT:??_C@_06DKJADKFF@World?6?$AA@
$LN5@main:
call QWORD PTR __imp_printf
The branch stays. But note that it actually hoisted the function call out of the branch. So it really optimized the code down to something like this:
const char *str = "Hello\n";
if (!(x > 5))
str = "World\n";
printf(str);

Resources