Exponentiation in a low complexity way - algorithm

I want to compute q^k, where q is n bits wide, under the following constraints:
the final result will be n*k bits wide.
for every step of the calculation, the result of multiplying x,y s.t. x is |x| bits wide and y is |y| bits wide is |x|+|y| bits wide.
I tried to do that in pairs; start with the q^2's, then q^4's, etc.
The 1st step result takes 2n bits, the 2nd takes (2^2)n bits, etc. and the last step takes n*2^(logk) (=kn) bits.
We have log(k) steps, and a careful calculation brings us to:
I'd be happy to hear about a faster way of doing that (or a better analysis of this algorithm or a similar), in the above restrictions.
Thanks in advance.

Assuming that multiplications with n-bit outputs have cost n f(n) for some function f that is positive and nondecreasing, the final multiplication costs asymptotically no less than the rest of the work. Preparing the squares for q^k where q has n bits costs
2 n f(2 n) + 4 n f(4 n) + ... + 2^floor(lg(k)) n f(2^floor(lg(k)) n)
<= (2 n + 4 n + ... + 2^floor(lg(k)) n) f(2^floor(lg(k)) n)
<= 2 (2^floor(lg(k)) n) f(2^floor(lg(k)) n)
<= 2 k n f(k n),
which is twice the cost of the final multiplication. The other multiplications can be analyzed similarly.

I think the best integer pow needs O(2*log2(k)) multiplications, obtained by going through the bits of k.
k can be written in binary form k = { k_(n-1), ..., k_3, k_2, k_1, k_0 }
so the equation looks like this:
q^k = q^( 1*k_0 + 2*k_1 + 4*k_2 + 8*k_3 ... )
q^k = q^(k_0) * q^(2*k_1) * q^(4*k_2) ....
so you have only n = ceil(log2(k)) steps to compute
each step multiplies q^(2^i) into the result if k_i != 0
q_0 = q
simplified working code (does not handle special cases 0^k,k^0,0^0,...):
for (s=1;k;k>>=1,q*=q)
    if (k&1) s*=q;
return s;
of course if you want to use big q,k then bigint arithmetic is needed instead
I do not think that pairing low-bit multiplications will be of much help
unless used for really big numbers
because pairing and merging sub-results of different bit counts is usually slower
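A minimal Python sketch of the bit-scanning loop above, for readers who want something runnable. The name int_pow is made up for this illustration, and Python's built-in big integers stand in for the bigint arithmetic mentioned in the answer:

```python
def int_pow(q: int, k: int) -> int:
    """Compute q**k for k >= 0 by scanning the bits of k from least significant."""
    if k < 0:
        raise ValueError("k must be non-negative")
    result = 1
    base = q                  # holds q^(2^i) while bit i of k is examined
    while k:
        if k & 1:             # bit i of k is set: multiply the current power in
            result *= base
        k >>= 1
        if k:                 # skip the final squaring, which would go unused
            base *= base
    return result


print(int_pow(3, 13) == 3 ** 13)
```

The loop does at most about 2*log2(k) multiplications: one squaring per bit plus one extra multiplication for every set bit.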


What is the best approach for computing logarithm of an integer x base 2 approximated to the greatest integer less than or equal to it?

What is the best amortized time complexity to compute floor(log(x)) amongst the algorithms that find floor(log(x)) for base 2?
There are many different algorithms for computing logarithms, each of which represents a different tradeoff of some sort. This answer surveys a variety of approaches and some of the tradeoffs involved.
Approach 1: Iterated Multiplication
One simple approach for computing ⌊log_b n⌋ is to compute the sequence b^0, b^1, b^2, etc. until we find a value greater than n. At that point, we can stop, and return the exponent just before this. The code for this is fairly straightforward:
x = 0; # Exponent
bX = 1; # b^Exponent
while (bX <= n) {
    bX *= b;
    x += 1;
}
return x - 1;
How fast is this? Notice that the loop counts up x = 0, x = 1, x = 2, etc. until eventually we pass x = ⌊log_b n⌋, doing one multiplication per iteration. If we assume all the integers we're dealing with fit into individual machine words - say, we're working with int or long or something like that - then each multiply takes time O(1) and the overall runtime is O(log_b n), with a memory usage of O(1).
Approach 2: Repeated Squaring
There's an old interview question that goes something like this. I have a number n, and I'm not going to tell you what it is. You can make queries of the form "is x equal to n, less than n, or greater than n?," and your goal is to use the fewest queries to figure out what n is. Assuming you have literally no idea what n can be, one reasonable approach works like this: guess the values 1, 2, 4, 8, 16, 32, ..., 2^k, ..., until you overshoot n. At that point, use a binary search on the range you just found to discover what n is. This runs in time O(log n), since after computing 2^(1 + log2 n) = 2n you'll overshoot n, and after that you're binary searching over a range of size n for a total runtime of O(log n).
Finding logarithms, in a sense, kinda sorta matches this problem. We have a number n written as b^x for some unknown x, and we want to find x. Using the above strategy as a starting point, we can therefore compute b^(2^0), b^(2^1), b^(2^2), etc. until we overshoot b^x. From there, we can run a secondary binary search to figure out the exact exponent required.
We can compute the series of values b^(2^k) by using the fact that
b^(2^(k+1)) = b^(2 · 2^k) = (b^(2^k))^2
and find a value that overshoots as follows:
x = 1; # exponent
bX = b; # b^x
while (bX <= n) {
    bX *= bX; # bX = bX^2
    x *= 2;
}
# Overshot, now do the binary search
The problem is how to do that binary search to figure things out. Specifically, we know that b^(2^x) is too big, but we don't know by how much. And unlike the "guess the number" game, binary searching over the exponent is a bit tricky.
One cute solution to this is based on the idea that if x is the value that we're looking for, then we can write x as a series of bits in binary. For example, let's write x = a_(m-1) 2^(m-1) + a_(m-2) 2^(m-2) + ... + a_1 2^1 + a_0 2^0. Then
b^x = b^(a_(m-1) 2^(m-1) + a_(m-2) 2^(m-2) + ... + a_1 2^1 + a_0 2^0)
= b^(a_(m-1) 2^(m-1)) · b^(a_(m-2) 2^(m-2)) · ... · b^(a_0 2^0)
In other words, we can try to discover what b^x is by building up x one bit at a time. To do so, as we compute the values b^1, b^2, b^4, b^8, etc., we can write down the values we discover. Then, once we overshoot, we can try multiplying them in and see which ones should be included and which should be excluded. Here's what that looks like:
x = 1;          # Exponent
bX = b;         # b^x
powers = [b];   # powers[i] = b^(2^i)
exps = [1];     # exps[i] = 2^i
while (bX <= n) {
    bX *= bX;       # bX = bX^2
    x *= 2;
    powers += bX;   # Append bX
    exps += x;      # Append x
}
# Overshot, now recover the bits, from the largest stored power down
resultPow = 1;  # b^result
result = 0;
i = length(powers) - 1;
while (i >= 0) {
    # If including this bit doesn't overshoot, it's part of the
    # representation of the exponent.
    if (resultPow * powers[i] <= n) {
        resultPow *= powers[i];
        result += exps[i];
    }
    i -= 1;
}
return result;
This is certainly a more involved approach, but it is faster. Since the value we're looking for is x = ⌊log_b n⌋ and we're essentially using the solution from the "guess the number game" to figure out what x is, the number of multiplications is O(log log_b n), with a memory usage of O(log log_b n) to hold the intermediate powers. This is exponentially faster than the previous solution!
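As a runnable illustration of the repeated-squaring idea, here is a small Python sketch. The function name floor_log is an assumption made for this example, and Python's arbitrary-precision integers hide the overflow concerns a fixed-width implementation would have:

```python
def floor_log(n: int, b: int) -> int:
    """Return floor(log_b(n)) for n >= 1 and b >= 2 using repeated squaring."""
    if n < 1 or b < 2:
        raise ValueError("need n >= 1 and b >= 2")
    powers = [b]                       # powers[i] == b ** (2 ** i)
    while powers[-1] <= n:
        powers.append(powers[-1] * powers[-1])
    result = 0                         # exponent built so far
    acc = 1                            # acc == b ** result
    # Try the stored powers from largest to smallest; keep each one that
    # does not push the running product past n.
    for i in range(len(powers) - 1, -1, -1):
        if acc * powers[i] <= n:
            acc *= powers[i]
            result += 1 << i
    return result


print(floor_log(10, 2), floor_log(81, 3), floor_log(80, 3))  # 3 4 3
```

The list of powers has O(log log_b n) entries, which is where the extra memory in this approach goes.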
Approach 3: Zeckendorf Representations
There's a slight modification on the previous approach that keeps the runtime of O(log log_b n) but drops the auxiliary space usage to O(1). The idea is that, instead of writing the exponent in binary using the regular system, we write the number out using Zeckendorf's theorem, which is a binary number system based on the Fibonacci sequence. The advantage is that instead of having to store the intermediate powers of two, we can use the fact that any two consecutive Fibonacci numbers are sufficient to let you compute the next or previous Fibonacci number, allowing us to regenerate the powers of b as needed. Here's an implementation of that idea in C++.
Approach 4: Bitwise Iterated Multiplication
In some cases, you need to find logarithms where the log base is two. In that case, you can take advantage of the fact that numbers on a computer are represented in binary and that multiplications and divisions by two correspond to bit shifts.
For example, let's take the iterated multiplication approach from before, where we computed larger and larger powers of b until we overshot. We can do that same technique using bitshifts, and it's much faster:
x = 0; # Exponent
while ((1 << x) <= n) {
    x++;
}
return x - 1;
This approach still runs in time O(log n), as before, but is probably faster implemented this way than with multiplications because the CPU can do bit shifts much more quickly.
Approach 5: Bitwise Binary Search
The base-two logarithm of a number, written in binary, is equivalent to the position of the most-significant bit of that number. To find that bit, we can use a binary search technique somewhat reminiscent of Approach 2, though done faster because the machine can process multiple bits in parallel in a single instruction. Basically, as before, we generate the sequence 2^(2^0), 2^(2^1), etc. until we overshoot the number, giving an upper bound on how high the highest bit can be. From there, we do a binary search to find the highest 1 bit. Here's what that looks like:
x = 1;
while ((1 << x) <= n) {
    x *= 2;
}
# We've overshot the high-order bit. Do a binary search to find it.
low = 0;
high = x;
while (low < high) {
    mid = (low + high) / 2;
    # Form a bitmask with 1s up to and including bit number mid.
    # This can be done by computing 2^{mid+1} - 1.
    mask = (1 << (mid + 1)) - 1;
    # If n has a 1 bit above the mask, branch higher
    if (n & ~mask) {
        low = mid + 1;
    }
    # Otherwise, branch lower
    else {
        high = mid;
    }
}
return low;
This approach runs in time O(log log n), since the binary search takes time logarithmic in the quantity being searched for and the quantity we're searching for is O(log n). It uses O(1) auxiliary space.
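The bitwise binary search can be sketched in Python as follows. The name floor_log2 is hypothetical, and the test here is "does n have a 1 bit above the mask", which is one way to make the search land on the highest set bit. The built-in int.bit_length() serves only as a cross-check:

```python
def floor_log2(n: int) -> int:
    """Return the index of the most significant set bit of n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be positive")
    x = 1
    while (1 << x) <= n:       # double x until 2^x overshoots n
        x *= 2
    low, high = 0, x           # the highest set bit lies in [low, high]
    while low < high:
        mid = (low + high) // 2
        mask = (1 << (mid + 1)) - 1    # 1s in bit positions 0..mid
        if n & ~mask:          # n has a 1 bit above position mid
            low = mid + 1
        else:
            high = mid
    return low


print(all(floor_log2(n) == n.bit_length() - 1 for n in range(1, 1000)))
```

In practice int.bit_length() (or a hardware count-leading-zeros instruction in other languages) is the simpler choice. The sketch is only meant to show the search itself.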
Approach 6: Magical Word-Level Parallelism
The speedup in Approach 5 is largely due to the fact that we can test multiple bits in parallel using a single bitwise operation. By doing some truly amazing things with machine words, it's possible to harness this parallelism to find the most-significant bit in a number in time O(1) using only basic arithmetic operations and bit-shifts, and to do so in a way where the runtime is completely independent of the size of a machine word (e.g. the algorithm works equally quickly on a 16-bit, 32-bit, and 64-bit machine). The techniques involved are fairly complex and I will confess that I had no idea this was possible to do until recently when I learned the technique when teaching an advanced data structures course.
To summarize, here are the approaches listed, their time complexity, and their space complexity.
Approach                 Which Bases?   Time Complexity    Space Complexity
Iter. Multiplication     Any            O(log_b n)         O(1)
Repeated Squaring        Any            O(log log_b n)     O(log log_b n)
Zeckendorf Logarithm     Any            O(log log_b n)     O(1)
Bitwise Multiplication   2              O(log n)           O(1)
Bitwise Binary Search    2              O(log log n)       O(1)
Word-Level Parallelism   2              O(1)               O(1)
There are many other algorithms I haven't mentioned here that are worth exploring. Some algorithms work by segmenting machine words into blocks of some fixed size, precomputing the position of the first 1 bit in each block, then testing one block at a time. These approaches have runtimes that depend on the size of the machine word, and (to the best of my knowledge) none of them are asymptotically faster than the approaches I've outlined here. Other approaches work by using the fact that some processors have instructions that immediately output the position of the most significant bit in a number, or work by using floating-point hardware. Those are also interesting and fascinating, be sure to check them out!
Another area worth exploring is when you have arbitrary-precision integers. There, the costs of multiplications, divisions, shifts, etc. are not O(1), and the relative costs of these algorithms change. This is definitely worth exploring in more depth if you're curious!
The code included here is written in pseudocode because it's designed mostly for exposition. In a real implementation, you'd need to worry about overflow, handling the case where the input is negative or zero, etc. Just FYI. :-)
Hope this helps!

Big O of multiplying 2 complex numbers?

What is the time complexity for multiplying two complex numbers?
For example (35 + 12i) *(45 +23i)
The asymptotic complexity is the same as for multiplying the components.
(35 + 12i) * (45 + 23i) == 35*45 + 45*12i + 35*23i - 12*23
== (35*45 - 12*23) + (45*12 + 35*23)i
You just have 4 real multiplications and 2 real additions.
So, if real multiplication is O(1), so is complex multiplication.
If real multiplication is not constant (as is the case for arbitrary precision values), then neither is complex multiplication.
If you multiply two complex numbers (a + bi) and (c + di), the calculation works out to (ac - bd) + (ad + bc)i, which requires a total of four multiplications, one subtraction, and one addition. Additions and subtractions take less time than multiplications, so the main cost is the four multiplications done here. Since four is a constant, this doesn't change the big-O runtime of doing the multiplications compared to the real number case.
Let's imagine you have two numbers n1 and n2, each of which is d digits long. If you use the grade-school method for multiplying these numbers together, you'd do the following:
for each digit d1 of n2, in reverse:
    let carry = 0
    for each digit d2 of n1, in reverse:
        let product = d1 * d2 + carry
        write down product mod 10
        set carry = product / 10, rounding down
add up all d of the d-digit numbers you wrote in step 1
That first loop runs in time Θ(d^2), since each digit in n2 is paired and multiplied with each digit of n1, doing O(1) work apiece. The result is d different d-digit numbers. Adding up those numbers will take time Θ(d^2), since you have to scan each digit of each number exactly once. Overall, this takes time Θ(d^2).
Notice that this runtime is a function of how many digits are in n1 and n2, rather than n1 and n2 themselves. The number of digits in a number n is Θ(log n), so this runtime is actually O((log max{n1, n2})^2) if you're multiplying two numbers n1 and n2.
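To make the Θ(d^2) digit count concrete, here is a rough Python sketch of grade-school multiplication on decimal digit lists. The function name is invented for this example, it assumes non-negative inputs, and it accumulates partial products directly into the result instead of writing out d separate rows and adding them afterwards:

```python
def grade_school_multiply(n1: int, n2: int) -> int:
    """Multiply two non-negative integers digit by digit (base 10)."""
    d1 = [int(c) for c in reversed(str(n1))]   # least significant digit first
    d2 = [int(c) for c in reversed(str(n2))]
    result = [0] * (len(d1) + len(d2))
    for i, x in enumerate(d2):                 # one pass per digit of n2
        carry = 0
        for j, y in enumerate(d1):             # pair it with every digit of n1
            total = result[i + j] + x * y + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(d1)] += carry           # leftover carry for this row
    return int("".join(str(d) for d in reversed(result)))


print(grade_school_multiply(35, 45))  # 1575
```

The two nested loops touch every pair of digits once, which is the quadratic cost described above.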
This is not the fastest way to do multiplications, though for a while there was a conjecture that it was. Karatsuba's algorithm runs in time O((log max{n1, n2})^(log_2 3)), where the exponent is around 1.585. There are more modern algorithms that run even faster than this, and it's an open problem whether it can be done in time O(log max{n1, n2}), that is, linear in the number of digits, with no exponent!
Multiplying two complex numbers only requires three real multiplications.
Let p = a * c, q = b * d, and r = (a + b) * (c + d).
Then (a + bi) * (c + di) = (p - q) + i(r - p - q).
See also Complex numbers product using only three multiplications.
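The three-multiplication formula is easy to check with a short Python sketch. The function name complex_mul3 and the tuple return format are choices made for this illustration only:

```python
def complex_mul3(a, b, c, d):
    """Return (real, imag) of (a + bi) * (c + di) using three multiplications."""
    p = a * c
    q = b * d
    r = (a + b) * (c + d)          # equals ac + ad + bc + bd
    return (p - q, r - p - q)      # (ac - bd, ad + bc)


print(complex_mul3(35, 12, 45, 23))  # (1299, 1345)
```

The saving is one multiplication in exchange for three extra additions or subtractions, so it mainly pays off when the components are large enough that a multiplication costs much more than an addition.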

Finding the first number larger than N that is a relative prime to M

Basically, the title says everything. The numbers are not too big (the maximum for N is ~2/3 * max(long) and max M is max(long)), so I think even a simple solution that I currently have is sufficient. M is always bigger than N.
What I currently have:
Most simple, just start from N + 1, perform plain Euclidean GCD, and if it returns 1 we are done, if not increment and try again.
I would like to know what is the worst case scenario with this solution. Performance is not a big issue, but still I feel like there must be a better way.
About the worst case, I made a small test:
Random r = new Random();
while (true)
{
    long num = (long) r.Next();
    num *= r.Next();
    f((long)(num * 0.61), num);
}

public static int max;
public static int f(long N, long M)
{
    int iter = 0;
    while (GCD(N++, M) != 1)
    {
        iter++;
    }
    if (iter > max)
        max = iter;
    return 0;
}
It has been running for ~30 minutes and the worst case so far is 29 iterations. So I believe that there is a more precise answer than O(N).
I don't know the worst scenario, but using the fact that M < 2^64, I can bound it above by 292 iterations and below by 53 (removing the restriction that the ratio N/M be approximately fixed).
Let p_1, …, p_k be the primes greater than or equal to 5 by which M is divisible. Let N' ≥ N be the least integer such that N' = 1 mod 6 or N' = 5 mod 6. For each i = 1, …, k, the prime p_i divides at most ceil(49/p_i) of the integers N', N' + 6, N' + 12, …, N' + 288. An upper bound on ∑_{i=1,…,k} ceil(49/p_i) is ∑_{i=3,…,16} ceil(49/q_i) = 48, where q_1, q_2, … are the primes in order starting with q_1 = 2. (This follows because ∏_{i=3,…,17} q_i ≥ 2^64 implies that M is the product of at most 14 distinct primes other than 2 and 3.) We conclude that at least one of the integers mentioned is relatively prime to M.
For the lower bound, let M = 614889782588491410 (product of the first fifteen primes) and let N = 1. After 1, the first integer relatively prime to the first fifteen primes is the sixteenth prime, 53.
I expect both bounds could be improved without too much work, though it's not clear to me for what purpose. For the upper bound, handle separately the case where 2 and 3 are both divisors of M, as then M can be the product of at most thirteen other primes. For the lower bound, one could try to find a good M by running the sieve of Eratosthenes to compute, for a range of integers, the list of primes dividing those integers. Then sweep a window across the range; if the product of the distinct primes in the window is too large, advance the trailing end of the window; otherwise, advance the leading end.
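For reference, the straightforward search from the question can be sketched in Python like this. The name next_coprime and the returned step count are assumptions added for illustration, and math.gcd stands in for the hand-written Euclidean GCD:

```python
from math import gcd


def next_coprime(n: int, m: int):
    """Return (first integer > n coprime to m, number of failed candidates)."""
    if m < 1:
        raise ValueError("m must be positive")
    candidate = n + 1
    failed = 0
    while gcd(candidate, m) != 1:
        candidate += 1
        failed += 1
    return candidate, failed


# The lower-bound example: M is the product of the first fifteen primes.
print(next_coprime(1, 614889782588491410))  # (53, 51)
```

Every integer from 2 through 52 has a prime factor no larger than 47, so all of them share a factor with M and the search stops at 53.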
Sure, it's not O(n). Knowing that the gap between consecutive primes near n is about log_e n on average, we can estimate that your algorithm needs roughly log_e n iterations (because after passing about log_e n numbers you will see a new prime, and a prime that does not divide your given number is relatively prime to it). For more detail about this gap, see prime gaps.
So for your bounded case the estimate is about log_e n <= log_e 2^64 ≈ 44 iterations. Note that this is a heuristic based on the average gap, not a strict worst-case bound.

Time complexity of Euclid's Algorithm

I am having difficulty deciding what the time complexity of Euclid's greatest common divisor algorithm is. This algorithm in pseudo-code is:
function gcd(a, b)
    while b ≠ 0
        t := b
        b := a mod b
        a := t
    return a
It seems to depend on a and b. My thinking is that the time complexity is O(a % b). Is that correct? Is there a better way to write that?
One trick for analyzing the time complexity of Euclid's algorithm is to follow what happens over two iterations:
a', b' := a % b, b % (a % b)
Now a and b will both decrease, instead of only one, which makes the analysis easier. You can divide it into cases:
Tiny A: 2a <= b
Tiny B: 2b <= a
Small A: 2a > b but a < b
Small B: 2b > a but b < a
Equal: a == b
Now we'll show that every single case decreases the total a+b by at least a quarter:
Tiny A: b % (a % b) < a and 2a <= b, so b is decreased by at least half, so a+b decreased by at least 25%
Tiny B: a % b < b and 2b <= a, so a is decreased by at least half, so a+b decreased by at least 25%
Small A: b will become b-a, which is less than b/2, decreasing a+b by at least 25%.
Small B: a will become a-b, which is less than a/2, decreasing a+b by at least 25%.
Equal: a+b drops to 0, which is obviously decreasing a+b by at least 25%.
Therefore, by case analysis, every double-step decreases a+b by at least 25%. There's a maximum number of times this can happen before a+b is forced to drop below 1. The total number of steps (S) until we hit 0 must satisfy (4/3)^S <= A+B. Now just work it:
(4/3)^S <= A+B
S <= lg[4/3](A+B)
S is O(lg[4/3](A+B))
S is O(lg(A+B))
S is O(lg(A*B)) //because A*B asymptotically greater than A+B
S is O(lg(A)+lg(B))
//Input size N is lg(A) + lg(B)
S is O(N)
So the number of iterations is linear in the number of input digits. For numbers that fit into cpu registers, it's reasonable to model the iterations as taking constant time and pretend that the total running time of the gcd is linear.
Of course, if you're dealing with big integers, you must account for the fact that the modulus operations within each iteration don't have a constant cost. Roughly speaking, the total asymptotic runtime is going to be n^2 times a polylogarithmic factor. Something like n^2 lg(n) 2^O(log* n). The polylogarithmic factor can be avoided by instead using a binary gcd.
The suitable way to analyze an algorithm is by determining its worst case scenarios.
Euclidean GCD's worst case occurs when Fibonacci Pairs are involved.
void EGCD(fib[i], fib[i - 1]), where i > 0.
For instance, let's opt for the case where the dividend is 55, and the divisor is 34 (recall that we are still dealing with fibonacci numbers).
As you may notice, this operation cost 8 iterations (or recursive calls).
Let's try larger Fibonacci numbers, namely 121393 and 75025. We can notice here as well that it took 24 iterations (or recursive calls).
You can also notice that each iteration yields a Fibonacci number. That's why we have so many operations. Indeed, we can obtain such results only with Fibonacci numbers.
Hence, the time complexity is going to be represented by small Oh (upper bound), this time. The lower bound is intuitively Omega(1): case of 500 divided by 2, for instance.
Let's solve the recurrence relation:
We may say then that Euclidean GCD can make log(xy) operation at most.
There's a great look at this on the wikipedia article.
It even has a nice plot of complexity for value pairs.
It is not O(a%b).
It is known (see article) that it will never take more steps than five times the number of digits in the smaller number. So the max number of steps grows as the number of digits (ln b). The cost of each step also grows as the number of digits, so the complexity is bound by O(ln^2 b) where b is the smaller number. That's an upper limit, and the actual time is usually less.
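A quick way to see both the Fibonacci worst case and the "five times the number of digits" bound is to count loop iterations directly. The helper name gcd_steps below is invented for this sketch, and it counts one step per modulus operation:

```python
def gcd_steps(a: int, b: int):
    """Return (gcd(a, b), number of modulus steps Euclid's algorithm takes)."""
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return a, steps


print(gcd_steps(55, 34))          # (1, 8)
print(gcd_steps(121393, 75025))   # (1, 24)

# Spot-check the bound: steps <= 5 * (decimal digits of the smaller number).
for a in range(1, 300):
    for b in range(1, a + 1):
        assert gcd_steps(a, b)[1] <= 5 * len(str(b))
```

The two Fibonacci pairs reproduce the 8 and 24 iterations quoted earlier in this thread, and the loop at the end is only a small empirical check of the bound, not a proof.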
See here.
In particular this part:
Lamé showed that the number of steps needed to arrive at the greatest common divisor for two numbers less than n is
So O(log min(a, b)) is a good upper bound.
Here's intuitive understanding of runtime complexity of Euclid's algorithm. The formal proofs are covered in various texts such as Introduction to Algorithms and TAOCP Vol 2.
First think about what happens if we try to take the gcd of two Fibonacci numbers F(k+1) and F(k). You might quickly observe that Euclid's algorithm iterates on to F(k) and F(k-1). That is, with each iteration we move down one number in the Fibonacci series. As Fibonacci numbers are O(Phi ^ k) where Phi is the golden ratio, we can see that the runtime of GCD is O(log n) where n=max(a, b) and the log has base Phi. Next, we can prove that this is the worst case by observing that Fibonacci numbers consistently produce pairs where the remainders remain large enough in each iteration and never become zero until you have arrived at the start of the series.
We can make the O(log n) bound, where n=max(a, b), even tighter. Assume that b >= a so we can write the bound as O(log b). First, observe that GCD(ka, kb) = k * GCD(a, b), and that the number of steps is the same for (ka, kb) as for (a, b). As the biggest value of k is gcd(a,b), we can replace b with b/gcd(a,b) in our runtime, leading to the tighter bound of O(log(b/gcd(a,b))).
Here is the analysis in the book Data Structures and Algorithm Analysis in C by Mark Allen Weiss (second edition, 2.4.4):
Euclid's algorithm works by continually computing remainders until 0 is reached. The last nonzero remainder is the answer.
Here is the code:
unsigned int Gcd(unsigned int M, unsigned int N)
{
    unsigned int Rem;
    while (N > 0) {
        Rem = M % N;
        M = N;
        N = Rem;
    }
    return M;
}
Here is a THEOREM that we are going to use:
If M > N, then M mod N < M/2.
There are two cases. If N <= M/2, then since the remainder is smaller
than N, the theorem is true for this case. The other case is N > M/2.
But then N goes into M once with a remainder M - N < M/2, proving the
theorem.
So, we can make the following inference:
Variables       M       N          Rem
initial         M       N          M%N
1 iteration     N       M%N        N%(M%N)
2 iterations    M%N     N%(M%N)    (M%N)%(N%(M%N)) < (M%N)/2
So, after two iterations, the remainder is at most half of its original value. This would show that the number of iterations is at most 2logN = O(logN).
Note that, the algorithm computes Gcd(M,N), assuming M >= N.(If N > M, the first iteration of the loop swaps them.)
Worst case will arise when both n and m are consecutive Fibonacci numbers.
gcd(F_n, F_(n−1)) = gcd(F_(n−1), F_(n−2)) = ⋯ = gcd(F_1, F_0) = 1, and the nth Fibonacci number is approximately 1.618^n / sqrt(5), where 1.618 is the golden ratio.
So, to find gcd(n,m), number of recursive calls will be Θ(logn).
The worst case of Euclid Algorithm is when the remainders are the biggest possible at each step, ie. for two consecutive terms of the Fibonacci sequence.
When n and m are the number of digits of a and b, assuming n >= m, the algorithm uses O(m) divisions.
Note that complexities are always given in terms of the sizes of inputs, in this case the number of digits.
Gabriel Lamé's theorem bounds the number of steps by log(sqrt(5)*(a+1/2))-2, where the base of the log is (1+sqrt(5))/2 and a is the larger input. This is the worst-case scenario for the algorithm, and it occurs when the inputs are consecutive Fibonacci numbers.
A slightly more liberal bound, log a where the base of the log is sqrt(2), is implied by Koblitz.
For cryptographic purposes we usually consider the bitwise complexity of the algorithms, taking into account that the bit size is given approximately by k=loga.
Here is a detailed analysis of the bitwise complexity of Euclid's algorithm:
Although in most references the bitwise complexity of Euclid's algorithm is given as O((log a)^3), there exists a tighter bound, which is O((log a)^2).
Consider: r_0 = a, r_1 = b, r_0 = q_1*r_1 + r_2, ..., r_(i-1) = q_i*r_i + r_(i+1), ..., r_(m-2) = q_(m-1)*r_(m-1) + r_m, r_(m-1) = q_m*r_m
Observe that: a = r_0 >= b = r_1 > r_2 > r_3 > ... > r_(m-1) > r_m > 0 ..........(1)
and r_m is the greatest common divisor of a and b.
By a claim in Koblitz's book (A Course in Number Theory and Cryptography) it can be proven that: r_(i+1) < r_(i-1)/2 .................(2)
Again in Koblitz, the number of bit operations required to divide a k-bit positive integer by an l-bit positive integer (assuming k >= l) is given as: (k-l+1)*l ...................(3)
By (1) and (2) the number of divisions is O(log a), and so by (3) the total complexity is O((log a)^3).
Now this may be reduced to O((log a)^2) by a remark in Koblitz.
Consider k_i = log r_i + 1
By (1) and (2) we have: k_(i+1) <= k_i for i = 0,1,...,m-1 and k_(i+2) <= k_i - 1 for i = 0,1,...,m-2
and by (3) the total cost of the m divisions is bounded by: SUM [k_(i-1) - (k_i - 1)]*k_i for i = 1,2,...,m
Rearranging this: SUM [k_(i-1) - (k_i - 1)]*k_i <= 4*k_0^2
So the bitwise complexity of Euclid's algorithm is O((log a)^2).
For the iterative algorithm, however, we have:
int iterativeEGCD(long long n, long long m) {
    long long a;
    int numberOfIterations = 0;
    while ( n != 0 ) {
        a = m;
        m = n;
        n = a % n;
        numberOfIterations ++;
    }
    printf("\nIterative GCD iterated %d times.", numberOfIterations);
    return m;
}
With Fibonacci pairs, there is no difference between iterativeEGCD() and iterativeEGCDForWorstCase() where the latter looks like the following:
int iterativeEGCDForWorstCase(long long n, long long m) {
    long long a;
    int numberOfIterations = 0;
    while ( n != 0 ) {
        a = m;
        m = n;
        n = a - n;
        numberOfIterations ++;
    }
    printf("\nIterative GCD iterated %d times.", numberOfIterations);
    return m;
}
Yes, with Fibonacci Pairs, n = a % n and n = a - n, it is exactly the same thing.
We also know that, in an earlier response for the same question, there is a prevailing decreasing factor: factor = m / (n % m).
Therefore, to shape the iterative version of the Euclidean GCD in a defined form, we may depict as a "simulator" like this:
void iterativeGCDSimulator(long long x, long long y) {
    long long i;
    double factor = x / (double)(x % y);
    int numberOfIterations = 0;
    for ( i = x * y ; i >= 1 ; i = i / factor) {
        numberOfIterations ++;
    }
    printf("\nIterative GCD Simulator iterated %d times.", numberOfIterations);
}
Based on the work (last slide) of Dr. Jauhar Ali, the loop above is logarithmic.
Yes, small Oh because the simulator tells the number of iterations at most. Non Fibonacci pairs would take a lesser number of iterations than Fibonacci, when probed on Euclidean GCD.
At every step, there are two cases
b >= a / 2, then a, b = b, a % b will make b at most half of its previous value
b < a / 2, then a, b = b, a % b will make a at most half of its previous value, since b is less than a / 2
So at every step, the algorithm will reduce at least one number to at most half of its previous value.
In at most O(log a)+O(log b) steps, this will be reduced to the simple cases, which yields an O(log n) algorithm, where n is the upper limit of a and b.
I have found it here

Calculating sum of geometric series (mod m)

I have a series
S = i^(m) + i^(2m) + ............... + i^(km) (mod m)
0 <= i < m, k may be very large (up to 100,000,000), m <= 300000
I want to find the sum. I cannot apply the Geometric Progression (GP) formula because then result will have denominator and then I will have to find modular inverse which may not exist (if the denominator and m are not coprime).
So I made an alternate algorithm making an assumption that these powers will make a cycle of length much smaller than k (because it is a modular equation and so I would obtain something like 2,7,9,1,2,7,9,1....) and that cycle will repeat in the above series. So instead of iterating from 0 to k, I would just find the sum of numbers in a cycle and then calculate the number of cycles in the above series and multiply them. So I first found i^m (mod m) and then multiplied this number again and again taking modulo at each step until I reached the first element again.
But when I actually coded the algorithm, for some values of i, I got cycles which were of very large size. And hence took a large amount of time before terminating and hence my assumption is incorrect.
So is there any other pattern we can find out? (Basically I don't want to iterate over k.)
So please give me an idea of an efficient algorithm to find the sum.
This is the algorithm for a similar problem I encountered
You probably know that one can calculate the power of a number in logarithmic time. You can also do so for calculating the sum of the geometric series. Since it holds that
1 + a + a^2 + ... + a^(2*n+1) = (1 + a) * (1 + (a^2) + (a^2)^2 + ... + (a^2)^n),
you can recursively calculate the geometric series on the right hand to get the result.
This way you do not need division, so you can take the remainder of the sum (and of intermediate results) modulo any number you want.
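A small Python sketch of that recursion, under the assumption that the sum includes the a^0 term and runs up to a^n. The names geometric_sum and series_sum are made up for this example; series_sum adapts the result to the series in the question, which starts at the first power:

```python
def geometric_sum(a: int, n: int, m: int) -> int:
    """Return (1 + a + a^2 + ... + a^n) mod m for n >= 0, without division."""
    if n == 0:
        return 1 % m
    if n % 2 == 1:
        # n = 2k + 1: pair the terms up, then recurse on ratio a^2 with k
        return (1 + a) * geometric_sum(a * a % m, n // 2, m) % m
    # n even: peel off the top term so the remaining top exponent is odd
    return (geometric_sum(a, n - 1, m) + pow(a, n, m)) % m


def series_sum(i: int, k: int, m: int) -> int:
    """Return (i^m + i^(2m) + ... + i^(km)) mod m, the sum from the question."""
    b = pow(i, m, m)                      # common ratio i^m mod m
    return (geometric_sum(b, k, m) - 1) % m


print(geometric_sum(2, 10, 1000))  # 47, since 2^11 - 1 = 2047
```

Each level of the recursion at least halves n after at most one peel-off step, so the depth stays logarithmic in n.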
As you've noted, doing the calculation for an arbitrary modulus m is difficult because many values might not have a multiplicative inverse mod m. However, if you can solve it for a carefully selected set of alternate moduli, you can combine them to obtain a solution mod m.
Factor m into p_1, p_2, p_3 ... p_n such that each p_i is a power of a distinct prime
Since each p is a distinct prime power, they are pairwise coprime. If we can calculate the sum of the series with respect to each modulus p_i, we can use the Chinese Remainder Theorem to reassemble them into a solution mod m.
For each prime power modulus, there are two trivial special cases:
If i^m is congruent to 0 mod p_i, the sum is trivially 0.
If i^m is congruent to 1 mod p_i, then the sum is congruent to k mod p_i.
For other values, one can apply the usual formula for the sum of a geometric sequence:
S = sum(j=0 to k, (i^m)^j) = ((i^m)^(k+1) - 1) / (i^m - 1)
TODO: Prove that (i^m - 1) is coprime to p_i or find an alternate solution for when they have a nontrivial GCD. Hopefully the fact that p_i is a prime power and also a divisor of m will be of some use... If p_i is a divisor of i, the condition holds. If p_i is prime (as opposed to a prime power), then either the special case i^m = 1 applies, or (i^m - 1) has a multiplicative inverse.
If the geometric sum formula isn't usable for some p_i, you could rearrange the calculation so you only need to iterate from 1 to p_i instead of 1 to k, taking advantage of the fact that the terms repeat with a period no longer than p_i.
(Since your series doesn't contain a j=0 term, the value you want is actually S-1.)
This yields a set of congruences mod p_i, which satisfy the requirements of the CRT.
The procedure for combining them into a solution mod m is described in the above link, so I won't repeat it here.
This can be done via the method of repeated squaring, which is O(log(k)) time, or O(log(k)log(m)) time, if you consider m a variable.
In general, a[n]=1+b+b^2+... b^(n-1) mod m can be computed by noting that:
a[j+k] = b^k*a[j] + a[k] mod m
a[2n] = (b^n+1)*a[n] mod m
The second just being the corollary for the first.
In your case, b=i^m can be computed in O(log m) time.
The following Python code implements this:
def geometric(n,b,m):
    T = 1       # T = a[2^j], the sum of the first 2^j terms
    e = b%m     # e = b^(2^j) mod m
    total = 0
    while n>0:
        if n&1==1:
            total = (e*total + T)%m
        T = ((e+1)*T)%m
        e = (e*e)%m
        n = n//2
        # print('{} {} {}'.format(total,T,e))
    return total
This bit of magic has a mathematical reason - the operation on pairs defined as
(x,r)#(y,s) = (x*y, x*s+r)
is associative, and rule 1 basically means that:
(b,1)#(b,1)#... n times ... #(b,1)=(b^n,1+b+b^2+...+b^(n-1))
Repeated squaring always works when operations are associative. In this case, the # operator is O(log(m)) time, so repeated squaring takes O(log(n)log(m)).
One way to look at this is via the matrix exponentiation:
[[b,1],[0,1]]^n == [[b^n,1+b+...+b^(n-1)],[0,1]]
You can use a similar method to compute (a^n-b^n)/(a-b) modulo m because matrix exponentiation gives:
[[b,1],[0,a]]^n == [[b^n,a^(n-1)+a^(n-2)b+...+ab^(n-2)+b^(n-1)],[0,a^n]]
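The matrix view translates directly into code. The following Python sketch uses plain nested lists for 2x2 matrices. The helper names mat_mul, mat_pow and geometric_via_matrix are assumptions for this illustration:

```python
def mat_mul(x, y, m):
    """Multiply two 2x2 matrices modulo m."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) % m for j in range(2)]
            for i in range(2)]


def mat_pow(mat, n, m):
    """Raise a 2x2 matrix to the n-th power modulo m by repeated squaring."""
    result = [[1 % m, 0], [0, 1 % m]]      # identity matrix
    while n:
        if n & 1:
            result = mat_mul(result, mat, m)
        mat = mat_mul(mat, mat, m)
        n >>= 1
    return result


def geometric_via_matrix(b, n, m):
    """Return (1 + b + ... + b^(n-1)) mod m, read off [[b,1],[0,1]]^n."""
    return mat_pow([[b % m, 1], [0, 1]], n, m)[0][1]


print(geometric_via_matrix(2, 3, 1000))  # 7 = 1 + 2 + 4
```

Because matrix multiplication is associative, the same square-and-multiply loop used for ordinary powers applies unchanged, and no division or modular inverse is ever needed.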
Based on the approach of #braindoper a complete algorithm which calculates
1 + a + a^2 + ... +a^n mod m
looks like this in Mathematica:
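geometricSeriesMod[a_, n_, m_] :=
  Module[ {q = a, exp = n, factor = 1, sum = 0, temp},
    While[And[exp > 0, q != 0],
      (* Even top exponent: peel off the highest term so exp becomes odd *)
      If[EvenQ[exp],
        temp = Mod[factor*PowerMod[q, exp, m], m];
        sum = Mod[sum + temp, m];
        exp--];
      (* Odd top exponent: 1 + q + ... + q^exp = (1 + q)(1 + q^2 + ... ) *)
      factor = Mod[Mod[1 + q, m]*factor, m];
      q = Mod[q*q, m];
      exp = Floor[ exp /2];
    ];
    Return [Mod[sum + factor, m]]
  ]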
a is the "ratio" of the series. It can be any integer (including zero and negative values).
n is the highest exponent of the series. Allowed are integers >= 0.
m is the integer modulus != 0
Note: The algorithm performs a Mod operation after every arithmetic operation. This is essential, if you transcribe this algorithm to a language with a limited word length for integers.
