Analysing the runtime efficiency of a Haskell function - performance

I have the following solution in Haskell to Problem 3:
isPrime :: Integer -> Bool
isPrime p = (divisors p) == [1, p]
divisors :: Integer -> [Integer]
divisors n = [d | d <- [1..n], n `mod` d == 0]
main = print (head (filter isPrime (filter ((==0) . (n `mod`)) [n-1,n-2..])))
  where n = 600851475143
However, it takes more than the minute limit given by Project Euler. So how do I analyze the time complexity of my code to determine where I need to make changes?
Note: Please do not post alternative algorithms. I want to figure those out on my own. For now I just want to analyse the code I have and look for ways to improve it. Thanks!

Two things:
Any time you see a list comprehension (as you have in divisors), or equivalently, some series of map and/or filter functions over a list (as you have in main), treat its complexity as Θ(n) just the same as you would treat a for-loop in an imperative language.
This is probably not quite the sort of advice you were expecting, but I hope it will be more helpful: Part of the purpose of Project Euler is to encourage you to think about the definitions of various mathematical concepts, and about the many different algorithms that might correctly satisfy those definitions.
Okay, that second suggestion was a bit too nebulous... What I mean is, for example, the way you've implemented isPrime is really a textbook definition:
isPrime :: Integer -> Bool
isPrime p = (divisors p) == [1, p]
-- p is prime if its only divisors are 1 and p.
Likewise, your implementation of divisors is straightforward:
divisors :: Integer -> [Integer]
divisors n = [d | d <- [1..n], n `mod` d == 0]
-- the divisors of n are the numbers between 1 and n that divide evenly into n.
These definitions both read very nicely! Algorithmically, on the other hand, they are too naïve. Let's take a simple example: what are the divisors of the number 10? [1, 2, 5, 10]. On inspection, you probably notice a couple things:
1 and 10 are pairs, and 2 and 5 are pairs.
Aside from 10 itself, there can't be any divisors of 10 that are greater than 5.
You can probably exploit properties like these to optimize your algorithm, right? So, without looking at your code -- just using pencil and paper -- try sketching out a faster algorithm for divisors. If you've understood my hint, divisors n should run in sqrt n time. You'll find more opportunities along these lines as you continue. You might decide to redefine everything differently, in a way that doesn't use your divisors function at all...
Hope this helps give you the right mindset for tackling these problems!

Let's start from the top.
divisors :: Integer -> [Integer]
divisors n = [d | d <- [1..n], n `mod` d == 0]
For now, let's assume that certain things are cheap: incrementing numbers is O(1), doing mod operations is O(1), and comparisons with 0 are O(1). (These are false assumptions, but what the heck.) The divisors function loops over all numbers from 1 to n, and does an O(1) operation on each number, so computing the complete output is O(n). Notice that here when we say O(n), n is the input number, not the size of the input! Since it takes m=log(n) bits to store n, this function takes O(2^m) time in the size of the input to produce a complete answer. I'll use n and m consistently to mean the input number and input size below.
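To make that concrete with the number from the question (my arithmetic, not part of the original answer): n = 600851475143, so m = log2(n) ≈ 40 bits, and a complete run of divisors n performs roughly 6 × 10^11 mod operations. At an optimistic 10^8 to 10^9 operations per second, that alone is somewhere between ten minutes and a couple of hours, which already blows the one-minute limit before isPrime even enters the picture.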
isPrime :: Integer -> Bool
isPrime p = (divisors p) == [1, p]
In the worst case, p is prime, which forces divisors to produce its whole output. Comparison to a list of statically-known length is O(1), so this is dominated by the call to divisors. O(n), O(2^m)
Your main function does a bunch of things at once, so let's break down subexpressions a bit.
filter ((==0) . (n `mod`))
This loops over a list, and does an O(1) operation on each element. This is O(m), where here m is the length of the input list.
filter isPrime
Loops over a list, doing O(n) work on each element, where here n is the largest number in the list. If the list happens to be n elements long (as it is in your case), this means this is O(n*n) work, or O(2^m*2^m) = O(4^m) work (as above, this analysis is for the case where it produces its entire list).
print . head
Tiny bits of work. Let's call it O(m) for the printing part.
main = print (head (filter isPrime (filter ((==0) . (n `mod`)) [n-1,n-2..])))
Considering all the subexpressions above, the filter isPrime bit is clearly the dominating factor. O(4^m), O(n^2)
Now, there's one final subtlety to consider: throughout the analysis above, I've consistently made the assumption that each function/subexpression was forced to produce its entire output. As we can see in main, this probably isn't true: we call head, which only forces a little bit of the list. However, if the input number itself isn't prime, we know for sure that we must look through at least half the list: there will certainly be no divisors between n/2 and n. So, at best, we cut our work in half -- which has no effect on the asymptotic cost.

Daniel Wagner's answer explains the general strategy of deriving bounds for the runtime complexity rather well. However, as is usually the case for general strategies, it yields too conservative bounds.
So, just for the heck of it, let's investigate this example in some more detail.
main = print (head (filter isPrime (filter ((==0) . (n `mod`)) [n-1,n-2..])))
where n = 600851475143
(Aside: if n were prime, this would cause a runtime error when checking n `mod` 0 == 0, thus I change the list to [n, n-1 .. 2] so that the algorithm works for all n > 1.)
Let's split up the expression into its parts, so we can see and analyse each part more easily
main = print answer
  where
    n             = 600851475143
    candidates    = [n, n-1 .. 2]
    divisorsOfN   = filter ((== 0) . (n `mod`)) candidates
    primeDivisors = filter isPrime divisorsOfN
    answer        = head primeDivisors
Like Daniel, I work with the assumption that arithmetic operations, comparisons etc. are O(1) - although not true, that's a good enough approximation for all remotely reasonable inputs.
So, of the list candidates, the elements from n down to answer have to be generated, n - answer + 1 elements, for a total cost of O(n - answer + 1). For composite n, we have answer <= n/2, then that's Θ(n).
Generating the list of divisors as far as needed is then Θ(n - answer + 1) too.
For the number d(n) of divisors of n, we can use the coarse estimate d(n) <= 2√n.
All divisors >= answer of n have to be checked for primality, that's at least half of all divisors.
Since the list of divisors is lazily generated, the complexity of
isPrime :: Integer -> Bool
isPrime p = (divisors p) == [1, p]
is O(smallest prime factor of p), because as soon as the first divisor > 1 is found, the equality test is determined. For composite p, the smallest prime factor is <= √p.
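To see that short-circuiting concretely, here is a tiny probe (my own addition, not part of the answer; it assumes the question's divisors is in scope, and uses the fact that 71 is the smallest prime factor of 600851475143):
smallFactors :: [Integer]
smallFactors = take 2 (divisors 600851475143)
-- [1,71] is returned almost instantly: the list is produced lazily, and the
-- equality test in isPrime is decided as soon as this second divisor appears.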
We have < 2√n primality checks of complexity at worst O(√n), and one check of complexity Θ(answer), so the combined work of all prime tests carried out is O(n).
Summing up, the total work needed is O(n), since the cost of each step is O(n) at worst.
In fact, the total work done in this algorithm is Θ(n). If n is prime, generating the list of divisors as far as needed is done in O(1), but the prime test is Θ(n). If n is composite, answer <= n/2, and generating the list of divisors as far as needed is Θ(n).
If we don't consider the arithmetic operations to be O(1), we have to multiply with the complexity of an arithmetic operation on numbers the size of n, that is O(log n) bits, which, depending on the algorithms used, usually gives a factor slightly above log n and below (log n)^2.

Related

Analysing running time, Big O

I am having problems with analysing the time complexity of an algorithm.
For example, the following Haskell code, which sorts a list:
sort xs
  | isSorted xs = xs
  | otherwise   = sort (check xs)
  where
    isSorted xs = all (== True) (zipWith (<=) xs (drop 1 xs))
    check []  = []
    check [x] = [x]
    check (x:y:xs)
      | x <= y    = x : check (y:xs)
      | otherwise = y : check (x:xs)
So, for n being the length of the list and t_isSorted(n) the running time function: there is a constant t_drop(n) = c, and t_all(n) = n, t_zipWith(n) = n:
t_isSorted(n) = c + n + n
For t_check:
t_check(1) = c1
t_check(n) = c2 + t_check(n-1), where c2 is the constant cost of comparing and possibly swapping one element
...
t_check(n) = i*c2 + t_check(n-i), with i = n-1
           = (n-1)*c2 + t_check(1)
           = n*c2 - c2 + c1
And how exactly do I have to combine those to get t_sort(n)? I guess in the worst case, sort xs has to run n-1 times.
isSorted is indeed O(n), since it's dominated by zipWith which in turn is O(n) since it does a linear pass over its argument.
check itself is O(n), since it only calls itself once per invocation and each call shortens the list by a constant number of elements. The fastest comparison-based sort (without knowing something more about the list) runs in O(n*log(n)) (equivalently, O(log(n!))) time. There's a mathematical proof of this, and a single O(n) pass of check is faster than that, so it cannot possibly be sorting the whole list.
check only moves things one step; it's effectively a single pass of bubble sort.
Consider sorting this list: [3,2,1]
check [3,2,1] = 2:(check [3,1]) -- since 3 > 2
check [3,1] = 1:(check [3]) -- since 3 > 1
check [3] = [3]
which would return the "sorted" list [2,1,3].
Then, as long as the list is not sorted, we loop. Since we might only put one element in its correct position (as 3 did in the example above), we might need O(n) loop iterations.
This gives a total time complexity of O(n) * O(n) = O(n^2).
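To see the number of passes directly, here is a small counting sketch (my own, not part of the answer; it assumes the question's isSorted and check have been lifted to top-level definitions):
sortPasses :: Ord a => [a] -> Int
sortPasses xs
  | isSorted xs = 0
  | otherwise   = 1 + sortPasses (check xs)
-- e.g. sortPasses [5,4,3,2,1] == 4: a reversed list of length n needs n-1
-- passes, and each pass costs O(n), which is where the O(n^2) above comes from.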
The time complexity is O(n^2).
You're right: one step takes O(n) time (for both the isSorted and check functions). It is called no more than n times (maybe even n - 1; it doesn't really matter for the time complexity): after the first call the largest element is guaranteed to be last, the same holds for the second largest after the second call, and in general the last k elements are the largest and properly sorted after k calls. Each call swaps only adjacent elements, so it removes at most one inversion per swap. As the number of inversions is O(n^2) in the worst case (namely, n * (n - 1) / 2), the time complexity is O(n^2).

Determining whether a system of congruences has a solution

Having a system of linear congruences, I'd like to determine if it has a solution. Using simple algorithms that solve such systems is impossible, as the answer may grow exponentially.
One hypothesis I have is that if a system of congruences has no solution, then there are two of them that contradict each other. I have no idea if this holds, if it did that would lead to an easy O(n^2 log n) algo, as checking if a pair of congruences has a solution requires O(log n) time. Nevertheless for this problem I'd rather see something closer to O(n).
We may assume that no modulus exceeds 10^6; in particular, we can quickly factor them all to begin with. We may even further assume that the sum of all moduli doesn't exceed 10^6 (but still, their product can be huge).
As you suspect, there's a fairly simple way to determine whether the set of congruences has a solution without actually needing to build that solution. You need to:
Reduce each congruence to the form x = a (mod n) if necessary; from the comments, it sounds as though you already have this.
Factorize each modulus n as a product of prime powers: n = p1^e1 * p2^e2 * ... * pk^ek.
Replace each congruence x = a (mod n) with a collection of congruences x = a (mod pi^ei), one for each of the k prime powers you found in step 2.
And now, by the Chinese Remainder Theorem it's enough to check compatibility for each prime independently: given any two congruences x = a (mod p^e) and x = b (mod p^f), they're compatible if and only if a = b (mod p^min(e, f)). Having determined compatibility, you can throw out the congruence with the smaller modulus without losing any information.
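That pairwise test is a one-liner; here is a Haskell sketch of it (my own rendering, for illustration only; the answer's actual Python code follows below):
-- Two congruences for the same prime p, each given as (prime-power modulus,
-- right-hand side): they are compatible iff the right-hand sides agree modulo
-- the smaller of the two moduli.
compatiblePP :: (Integer, Integer) -> (Integer, Integer) -> Bool
compatiblePP (m, a) (n, b) = (a - b) `mod` min m n == 0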
With the right data structures, you can do all this in a single pass through your congruences: for each prime p encountered, you'll need to keep track of the biggest exponent e found so far, together with the corresponding right-hand side (reduced modulo p^e for convenience). The running time will likely be dominated by the modulus factorizations, though if no modulus exceeds 10^6, then you can make that step very fast, too, by prebuilding a mapping from each integer in the range 1 .. 10^6 to its smallest prime factor.
EDIT: And since this is supposed to be a programming site, here's some (Python 3) code to illustrate the above. (For Python 2, replace the range call with xrange for better efficiency.)
def prime_power_factorisation(n):
    """Brain-dead factorisation routine, for illustration purposes only."""
    # DO NOT USE FOR LARGE n!
    while n > 1:
        p, pe = next(d for d in range(2, n+1) if n % d == 0), 1
        while n % p == 0:
            n, pe = n // p, pe*p
        yield p, pe

def compatible(old_ppc, new_ppc):
    """Determine whether two prime power congruences (with the same
    prime) are compatible."""
    m, a = old_ppc
    n, b = new_ppc
    return (a - b) % min(m, n) == 0

def are_congruences_solvable(moduli, right_hand_sides):
    """Determine whether the given congruences have a common solution."""
    # prime_power_congruences is a dictionary mapping each prime encountered
    # so far to a pair (prime power modulus, right-hand side).
    prime_power_congruences = {}
    for m, a in zip(moduli, right_hand_sides):
        for p, pe in prime_power_factorisation(m):
            # new prime-power congruence: modulus, rhs
            new_ppc = pe, a % pe
            if p in prime_power_congruences:
                old_ppc = prime_power_congruences[p]
                if not compatible(new_ppc, old_ppc):
                    return False
                # Keep the one with bigger exponent.
                prime_power_congruences[p] = max(new_ppc, old_ppc)
            else:
                prime_power_congruences[p] = new_ppc
    # If we got this far, there are no incompatibilities, and
    # the congruences have a mutual solution.
    return True
One final note: in the above, we made use of the fact that the moduli were small, so that computing prime power factorisations wasn't a big deal. But if you do need to do this for much larger moduli (hundreds or thousands of digits), it's still feasible. You can skip the factorisation step, and instead find a "coprime base" for the collection of moduli: that is, a collection of pairwise relatively prime positive integers such that each modulus appearing in your congruences can be expressed as a product (possibly with repetitions) of elements of that collection. Now proceed as above, but with reference to that coprime base instead of the set of primes and prime powers. See this article by Daniel Bernstein for an efficient way to compute a coprime base for a set of positive integers. You'd likely end up making two passes through your list: one to compute the coprime base, and a second to check the consistency.

Performance of two alternative functions that compute least divisor

In the book "The Haskell Road to Logic, Maths and Programming", the authors present two alternative ways of finding the least divisor k of a number n with k > 1, claiming that the second version is much faster than the first one. I have problems understanding why (I am a beginnner).
Here is the first version (page 10):
ld :: Integer -> Integer   -- finds the smallest divisor of n which is > 1
ld n = ldf 2 n

ldf :: Integer -> Integer -> Integer
ldf k n | n `rem` k == 0 = k
        | k ^ 2 > n      = n
        | otherwise      = ldf (k + 1) n
If I understand this correctly, the ld function basically ends up iterating through all integers in the interval [2..sqrt(n)] and stops as soon as one of them divides n, returning it as the result.
The second version, which the authors claim to be much faster, goes like this (page 23):
ldp :: Integer -> Integer   -- finds the smallest divisor of n which is > 1
ldp n = ldpf allPrimes n

ldpf :: [Integer] -> Integer -> Integer
ldpf (p:ps) n | rem n p == 0 = p
              | p ^ 2 > n    = n
              | otherwise    = ldpf ps n

allPrimes :: [Integer]
allPrimes = 2 : filter isPrime [3..]

isPrime :: Integer -> Bool
isPrime n | n < 1     = error "Not a positive integer"
          | n == 1    = False
          | otherwise = ldp n == n
The authors claim that this version is faster because it iterates only through the list of primes within the interval 2..sqrt(n), instead of iterating through all numbers in that range.
However, this argument doesn't convince me: the recursive function ldpf is eating one by one the numbers from the list of primes allPrimes. This list is generated by doing filter on the list of all integers.
So unless I am missing something, this second version ends up iterating through all numbers within the interval 2..sqrt(n) too, but for each number it first checks whether it is prime (a relatively expensive operation), and if so, it checks whether it divides n (a relatively cheap one).
I would say that just checking whether k divides n for each k should be faster. Where is the flaw in my reasoning?
The main advantage of the second solution is that you compute the list of primes allPrimes only once. Thanks to lazy evaluation, each call computes just the primes it needs, or reuses the ones that have been already computed. So the expensive part is computed only once and then just reused.
For computing the least divisor of just a single number, the first version is indeed more efficient. But try running ldp and ld for all numbers between let's say 1 and 100000 and you'll see the difference.
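A rough way to see that sharing effect (my own sketch, not from the book; it assumes the definitions above are in scope):
-- ld redoes its trial division from scratch for every input, whereas ldp
-- forces the allPrimes list once and then reuses it across all the calls.
main :: IO ()
main = do
  print (sum (map ld  [2..100000]))   -- first version
  print (sum (map ldp [2..100000]))   -- second version, shares allPrimes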
I don't know Haskell, so without proper measurements of both versions I can only assume that the claims are right. In that case the reasons could be:
1. The primes are precomputed in some array, so the primality check is not expensive.
2. The primes are computed on the fly and memoized: at first the first version will be faster, but continued use makes the second version faster (especially for bigger n).
3. The primes are computed on the fly and not memoized: as DeadMG mentioned, compiler optimizations plus CPU caching can sometimes feel like memoization up to a point; in that case version 2 will be faster overall, but if you keep mixing ever bigger n with small ones, then after the cache-invalidation point it will slow down even more than version 1.
4. This is just a guess: is it possible for your compiler to convert this kind of recursion -- a single-integer iteration through a list or range -- into a loop? (Most recursions of this type can be converted to a loop anyway.) That could explain everything: no recursive-call overhead, no heap thrashing.
[Note] As I wrote above, I am not a Haskell user, so treat this answer accordingly.
As I understand it, divisibility by 2 isn't as expensive a test as you might think: half of the candidates that allPrimes filters out are rejected by a check of the lowest bit, which is about as cheap as an operation can be, while the first algorithm performs a comparatively expensive true division of the whole number by every candidate. Say the possible divisor is 1956: it is filtered out of allPrimes by the very first test at almost no cost (it is even, hence divisible by 2), whereas dividing, say, 2^4253 - 1 by 1956 is already pointless, because that number is not divisible by 2. For really big numbers such divisions take a lot of time, and at least half of them (or, say, 5/6 of them, counting the divisors 2 and 3) are useless. Also, allPrimes is a cached list, so deciding whether the next integer should be included in allPrimes uses only already-verified primes, which keeps the primality test from being extremely expensive even for numbers that are actually prime. Combined, this gives the second method its advantage.

Reducing space complexity of Sieve of Eratosthenes to generate primes in a range

After getting through some of the SO posts, I found Sieve of Eratosthenes is the best & fastest way of generating prime numbers.
I want to generate the prime numbers between two numbers, say a and b.
AFAIK, in Sieve's method, the space complexity is O(b).
PS: I wrote Big-O and not Theta, because I don't know whether the space requirement can be reduced.
Can we reduce the space complexity in Sieve of Eratosthenes ?
There are two basic choices here: sieve the range [a..b] by primes below sqrt(b) (the "offset" sieve of Eratosthenes), or by odd numbers. That's right; just eliminate the multiples of each odd as you would of each prime. Sieve the range in one chunk, or in several "segments" if the range is too wide (but efficiency can deteriorate if the chunks are too narrow).
In Haskell executable pseudocode,
-- foldl :: (r -> x -> r) -> r -> [x] -> r       -- type signature of foldl
primesRange_by_Odds a b =
  foldl (\ r x -> r `minus` [q x, q x+2*x .. b])
        [o, o+2 .. b]                             -- initial value of `r`, the list
        [3, 5 .. floor (sqrt (fromIntegral b))]   -- values of `x`, one after another
  where
    o   = 1 + 2*div a 2                           -- odd start of the range
    q x = x*x - 2*x*min 0 (div (x*x-o) (2*x))     -- 1st odd multiple of x >= x*x in range
Sieving by odds will have the additional space complexity of O(1) (on top of the output / range space of O(|b-a|)).
That is because we can enumerate the odds just by iteratively adding 2 – unlike the "core" primes for the sieve of Eratosthenes, below sqrt(b), for which we have to reserve additional space of O(pi(sqrt(b))) = ~ 2*sqrt(b)/log(b) (where pi() is the prime-counting function).
The remaining question is how do we find those "core" primes. Trial division will entail an additional space of O(1) but if we were to do it by the sieve of Eratosthenes, we'd need the O(sqrt(b)) space to perform the core sieve itself -- unless we perform it as a segmented sieve thus with the auxiliary space requirement of O(sqrt(sqrt(b))). Choose whatever method suits your needs better.
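For concreteness, here is a minimal offset-sieve sketch in Haskell (my own code, not the answer's; it assumes a >= 2 and that ps already holds every prime up to sqrt b):
import Data.Array.Unboxed

-- Sieve the range [a..b] with the given core primes, using one Bool flag per
-- number in the range, i.e. O(b-a) extra space.
primesRange :: [Int] -> Int -> Int -> [Int]
primesRange ps a b = [i | (i, True) <- assocs flags]
  where
    flags :: UArray Int Bool
    flags = accumArray (\_ new -> new) True (a, b)
              [ (m, False)
              | p <- ps
              , let start = max (p * p) (((a + p - 1) `div` p) * p)
              , m <- [start, start + p .. b] ]
-- e.g. primesRange [2,3,5,7,11,13] 100 150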
Sorenson Sieve might be worth a peek if space complexity is really an issue. Got the reference from the wikipedia page you shared.
If you have enough space to store all the primes up to sqrt(b) then you can sieve for the primes in the range a to b using additional space O(b-a).
In Python this might look like:
def primesieve(ps, start, n):
    """Sieve the interval [start, start+n) for primes.
    Returns a list P of length n.
    P[x]==1 if the number start+x is prime.
    Relies on being given a list of primes in ps from 2 up to sqrt(start+n)."""
    P = [1]*n
    for p in ps:
        for k in range((-start) % p, n, p):
            if k + start <= p: continue
            P[k] = 0
    return P
You could easily make this take half the space by only sieving the odd numbers.
Search for "segmented sieve of Eratosthenes" at your favorite search engine. If you don't want to go searching, I have an implementation at my blog.

Time complexity of Euclid's Algorithm

I am having difficulty deciding what the time complexity of Euclid's greatest common divisor algorithm is. This algorithm in pseudo-code is:
function gcd(a, b)
    while b ≠ 0
        t := b
        b := a mod b
        a := t
    return a
It seems to depend on a and b. My thinking is that the time complexity is O(a % b). Is that correct? Is there a better way to write that?
One trick for analyzing the time complexity of Euclid's algorithm is to follow what happens over two iterations:
a', b' := a % b, b % (a % b)
Now a and b will both decrease, instead of only one, which makes the analysis easier. You can divide it into cases:
Tiny A: 2a <= b
Tiny B: 2b <= a
Small A: 2a > b but a < b
Small B: 2b > a but b < a
Equal: a == b
Now we'll show that every single case decreases the total a+b by at least a quarter:
Tiny A: b % (a % b) < a and 2a <= b, so b is decreased by at least half, so a+b decreased by at least 25%
Tiny B: a % b < b and 2b <= a, so a is decreased by at least half, so a+b decreased by at least 25%
Small A: b will become b-a, which is less than b/2, decreasing a+b by at least 25%.
Small B: a will become a-b, which is less than a/2, decreasing a+b by at least 25%.
Equal: a+b drops to 0, which is obviously decreasing a+b by at least 25%.
Therefore, by case analysis, every double-step decreases a+b by at least 25%. There's a maximum number of times this can happen before a+b is forced to drop below 1. The total number of steps (S) until we hit 0 must satisfy (4/3)^S <= A+B. Now just work it:
(4/3)^S <= A+B
S <= lg[4/3](A+B)
S is O(lg[4/3](A+B))
S is O(lg(A+B))
S is O(lg(A*B)) //because A*B asymptotically greater than A+B
S is O(lg(A)+lg(B))
//Input size N is lg(A) + lg(B)
S is O(N)
So the number of iterations is linear in the number of input digits. For numbers that fit into cpu registers, it's reasonable to model the iterations as taking constant time and pretend that the total running time of the gcd is linear.
Of course, if you're dealing with big integers, you must account for the fact that the modulus operations within each iteration don't have a constant cost. Roughly speaking, the total asymptotic runtime is going to be n^2 times a polylogarithmic factor. Something like n^2 lg(n) 2^O(log* n). The polylogarithmic factor can be avoided by instead using a binary gcd.
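To check the bound empirically, here is a tiny iteration counter (a Haskell sketch of my own, not part of the answer):
-- One count per remainder step of Euclid's algorithm.
gcdSteps :: Integer -> Integer -> Int
gcdSteps _ 0 = 0
gcdSteps a b = 1 + gcdSteps b (a `mod` b)
-- gcdSteps 55 34 == 8 and gcdSteps 121393 75025 == 24, matching the Fibonacci
-- worst cases mentioned in the following answer, and both comfortably within
-- the (4/3)^S <= A+B bound derived above.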
A suitable way to analyze an algorithm is to determine its worst-case scenario.
Euclidean GCD's worst case occurs when Fibonacci Pairs are involved.
void EGCD(fib[i], fib[i - 1]), where i > 0.
For instance, let's take the case where the dividend is 55 and the divisor is 34 (recall that we are still dealing with Fibonacci numbers).
As you may notice, this operation cost 8 iterations (or recursive calls).
Let's try larger Fibonacci numbers, namely 121393 and 75025. We can notice here as well that it took 24 iterations (or recursive calls).
You can also notice that each iteration yields a Fibonacci number; that's why we have so many operations. Indeed, only Fibonacci numbers produce this many.
Hence, the time complexity this time is going to be expressed as an upper bound (small Oh). The lower bound is intuitively Omega(1): the case of 500 divided by 2, for instance.
Solving the recurrence relation, we may say that the Euclidean GCD makes at most log(xy) operations.
There's a great look at this on the wikipedia article.
It even has a nice plot of complexity for value pairs.
It is not O(a%b).
It is known (see article) that it will never take more steps than five times the number of digits in the smaller number. So the max number of steps grows as the number of digits (ln b). The cost of each step also grows as the number of digits, so the complexity is bound by O(ln^2 b) where b is the smaller number. That's an upper limit, and the actual time is usually less.
See here.
In particular this part:
Lamé showed that the number of steps needed to arrive at the greatest common divisor for two numbers less than n is O(log n).
So O(log min(a, b)) is a good upper bound.
Here's an intuitive understanding of the runtime complexity of Euclid's algorithm. The formal proofs are covered in various texts such as Introduction to Algorithms and TAOCP Vol 2.
First think about what happens if we take the gcd of two Fibonacci numbers F(k+1) and F(k). You might quickly observe that Euclid's algorithm iterates on to F(k) and F(k-1). That is, with each iteration we move down one number in the Fibonacci series. As Fibonacci numbers are O(Phi^k), where Phi is the golden ratio, we can see that the runtime of GCD is O(log n), where n = max(a, b) and the log has base Phi. Next, we can prove that this is the worst case by observing that Fibonacci numbers consistently produce pairs where the remainders remain large enough at each iteration and never become zero until we have arrived at the start of the series.
We can make the O(log n) bound, where n = max(a, b), even tighter. Assume b >= a, so we can write the bound as O(log b). First, observe that the number of steps taken for GCD(ka, kb) is the same as for GCD(a, b): the quotient sequence is identical. As the biggest such k is gcd(a, b), we can replace b with b/gcd(a, b), leading to the tighter bound O(log(b/gcd(a, b))).
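As a quick sanity check of that observation, using the gcdSteps sketch given after an earlier answer: scaling both arguments by a common factor leaves every quotient, and hence the iteration count, unchanged.
-- gcdSteps 55 34          == 8
-- gcdSteps (7*55) (7*34)  == 8   -- identical quotient sequence, identical step count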
Here is the analysis in the book Data Structures and Algorithm Analysis in C by Mark Allen Weiss (second edition, 2.4.4):
Euclid's algorithm works by continually computing remainders until 0 is reached. The last nonzero remainder is the answer.
Here is the code:
unsigned int Gcd(unsigned int M, unsigned int N)
{
    unsigned int Rem;

    while (N > 0) {
        Rem = M % N;
        M = N;
        N = Rem;
    }
    return M;
}
Here is a THEOREM that we are going to use:
If M > N, then M mod N < M/2.
PROOF:
There are two cases. If N <= M/2, then since the remainder is smaller
than N, the theorem is true for this case. The other case is N > M/2.
But then N goes into M once with a remainder M - N < M/2, proving the
theorem.
So, we can make the following inference:
Variables      M       N            Rem
initial        M       N            M % N
1 iteration    N       M % N        N % (M % N)
2 iterations   M % N   N % (M % N)  (M % N) % (N % (M % N))  <  (M % N) / 2
So, after two iterations, the remainder is at most half of its original value. This would show that the number of iterations is at most 2logN = O(logN).
Note that the algorithm computes Gcd(M, N), assuming M >= N. (If N > M, the first iteration of the loop swaps them.)
The worst case arises when n and m are consecutive Fibonacci numbers.
gcd(F_n, F_{n-1}) = gcd(F_{n-1}, F_{n-2}) = ... = gcd(F_1, F_0) = 1, and the nth Fibonacci number is approximately 1.618^n, where 1.618 is the golden ratio.
So, to find gcd(n, m), the number of recursive calls will be Θ(log n).
The worst case of Euclid Algorithm is when the remainders are the biggest possible at each step, ie. for two consecutive terms of the Fibonacci sequence.
When n and m are the number of digits of a and b, assuming n >= m, the algorithm uses O(m) divisions.
Note that complexities are always given in terms of the sizes of inputs, in this case the number of digits.
Gabriel Lamé's Theorem bounds the number of steps by log(1/sqrt(5)*(a+1/2)) - 2, where the base of the log is (1+sqrt(5))/2. This is the worst-case scenario for the algorithm, and it occurs when the inputs are consecutive Fibonacci numbers.
A slightly more liberal bound of log a, where the base of the log is sqrt(2), is implied by Koblitz.
For cryptographic purposes we usually consider the bitwise complexity of the algorithms, taking into account that the bit size is given approximately by k = log a.
Here is a detailed analysis of the bitwise complexity of Euclid's Algorithm:
Although in most references the bitwise complexity of Euclid's Algorithm is given as O(log a)^3, there exists a tighter bound, which is O(log a)^2.
Consider r_0 = a, r_1 = b, and
r_0 = q_1*r_1 + r_2
...
r_{i-1} = q_i*r_i + r_{i+1}
...
r_{m-2} = q_{m-1}*r_{m-1} + r_m
r_{m-1} = q_m*r_m
Observe that a = r_0 >= b = r_1 > r_2 > r_3 > ... > r_{m-1} > r_m > 0 ..........(1)
and r_m is the greatest common divisor of a and b.
By a claim in Koblitz's book (A Course in Number Theory and Cryptography) it can be proven that r_{i+1} < r_{i-1}/2 ..........(2)
Again in Koblitz, the number of bit operations required to divide a k-bit positive integer by an l-bit positive integer (assuming k >= l) is given as (k - l + 1)*l ..........(3)
By (1) and (2) the number of divisions is O(log a), and so by (3) the total complexity is O(log a)^3.
Now this may be reduced to O(log a)^2 by a remark in Koblitz.
Consider k_i = log(r_i) + 1.
By (1) and (2) we have k_{i+1} <= k_i for i = 0, 1, ..., m-1 and k_{i+2} <= k_i - 1 for i = 0, 1, ..., m-2,
and by (3) the total cost of the m divisions is bounded by SUM over i = 1, ..., m of (k_{i-1} - k_i + 1)*k_i.
Rearranging this: SUM (k_{i-1} - k_i + 1)*k_i <= 4*k_0^2.
So the bitwise complexity of Euclid's Algorithm is O(log a)^2.
For the iterative algorithm, however, we have:
int iterativeEGCD(long long n, long long m) {
    long long a;
    int numberOfIterations = 0;
    while ( n != 0 ) {
        a = m;
        m = n;
        n = a % n;
        numberOfIterations++;
    }
    printf("\nIterative GCD iterated %d times.", numberOfIterations);
    return m;
}
With Fibonacci pairs, there is no difference between iterativeEGCD() and iterativeEGCDForWorstCase() where the latter looks like the following:
int iterativeEGCDForWorstCase(long long n, long long m) {
    long long a;
    int numberOfIterations = 0;
    while ( n != 0 ) {
        a = m;
        m = n;
        n = a - n;
        numberOfIterations++;
    }
    printf("\nIterative GCD iterated %d times.", numberOfIterations);
    return m;
}
Yes, with Fibonacci Pairs, n = a % n and n = a - n, it is exactly the same thing.
We also know that, in an earlier response for the same question, there is a prevailing decreasing factor: factor = m / (n % m).
Therefore, to shape the iterative version of the Euclidean GCD in a defined form, we may depict as a "simulator" like this:
void iterativeGCDSimulator(long long x, long long y) {
    long long i;
    double factor = x / (double)(x % y);
    int numberOfIterations = 0;
    for ( i = x * y ; i >= 1 ; i = i / factor ) {
        numberOfIterations++;
    }
    printf("\nIterative GCD Simulator iterated %d times.", numberOfIterations);
}
Based on the work (last slide) of Dr. Jauhar Ali, the loop above is logarithmic.
Yes, small Oh, because the simulator gives the number of iterations at most; non-Fibonacci pairs take fewer iterations than Fibonacci pairs when run through the Euclidean GCD.
At every step there are two cases (take a >= b; if not, the first step simply swaps them):
b > a / 2: then a, b = b, a % b makes the new second number a - b, which is less than a / 2.
b <= a / 2: then a, b = b, a % b makes the new second number a % b < b <= a / 2.
So at every step the freshly computed remainder is less than half of the old larger number, and after two steps both numbers are below half of their previous values.
In at most O(log a) + O(log b) steps this is reduced to the simple cases, which yields an O(log n) algorithm, where n is the larger of a and b.
I have found it here
