Why does my Randomized SVD implementation use so much memory? - performance

I have a Julia implementation (below) of randomized SVD from this paper, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. See the algorithm on page 9 if you're curious.
I would expect randomized SVD to be more efficient than SVD for large datasets, but it's slightly slower and uses way more memory. Here are my performance statistics from @time:
SVD: 16.331761 seconds (17 allocations: 763.184 MiB, 0.82% gc time)
RSVD: 17.009699 seconds (38 allocations: 1.074 GiB, 0.83% gc time)
Note that my randomized SVD uses over 1 GB of memory. I'm not sure why. Here is my implementation:
using Distributions
using LinearAlgebra
# ------------------------------------------------------------------------------
function find_Q(A, l)
    #=
    Given an m × n matrix A, and an integer l, compute an m × l orthonormal
    matrix Q whose range approximates the range of A.
    =#
    m, n = size(A)
    Ω = rand(Normal(), n, l)
    Y = A * Ω
    Q, R = qr(Y)
    return Q
end
# ------------------------------------------------------------------------------
function randomized_SVD(A, k)
    #=
    Given an m × n matrix A, a target number k of singular vectors, and an
    exponent q (say q = 1 or q = 2), this procedure computes an approximate
    rank-2k factorization UΣVt, where U and V are orthonormal and Σ is
    nonnegative and diagonal.
    =#
    Q = find_Q(A, 2*k)
    B = Q' * A
    S, Σ, Vt = svd(B)
    U = Q * S
    return U, Σ, Vt
end
# ------------------------------------------------------------------------------
m = 2000
n = 20000
k = 10
# Construct low-rank matrix
A = rand(m, k) * rand(k, n)
println("Rank of A: ", rank(A))
println("Size of A: ", size(A))
println("Throwaway test:")
@time svd(A)
@time randomized_SVD(A, k)
println("Actual test:")
@time svd(A)
@time randomized_SVD(A, k)
println("Completed")
Note that I call @time twice per the Julia documentation, which says:
On the first call (@time sum_global()) the function gets compiled. (If you've not yet used @time in this session, it will also compile functions needed for timing.) You should not take the results of this run seriously.

As an additional note: the Julia documentation does not recommend using the @time macro for benchmarking. It is better to use the @benchmark macro from the BenchmarkTools.jl package.

Related

Time Complexity for finding Discrete Logarithm (brute-force)

I'm trying to understand the time complexity (Big-O) of the following algorithm which finds x such that g^x = y (mod p) (i.e. finding the discrete logarithm of y with base g modulo p).
Here's the pseudocode:
discreteLogarithm(y, g, p)
    y := y mod p
    a := g
    x := 1
    until a = y
        a := (a * g) mod p
        x++
    return x
end
I know that the time complexity of this approach is exponential in the number of binary digits in p - but what does this mean and why does it depend on p?
I understand that the complexity is determined by the number of loops (until a = y), but where does p come into this, what's this about binary digits?
The run time depends upon the order of g mod p. The worst case is order p-1 (when g is a primitive root mod p), which is O(p). The run time is thus O(p) modular multiplies. The key here is that p has about log p bits, where I use 'log' to mean base 2 logarithm. Since p = 2^(log p), a mathematical identity, the run time is exponential in the number of bits of p. To make it clearer, let b = log p be the number of bits. The worst case run time is O(2^b) modular multiplies. Each modular multiply takes O(b^2) time, so the full run time is O(2^b * b^2). The 2^b is the dominant term.
Depending upon your particular p and g, the order could be much smaller than p. However, some heuristics in analytic number theory show that on average, it is of order p.
EDIT: If you are not familiar with the concept of 'order' from group theory, here is a brief explanation. If you keep multiplying g by itself mod p, it eventually comes to 1 (provided p does not divide g). The order is the smallest exponent k >= 1 with g^k = 1 (mod p).
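To make the O(p) behaviour concrete, here is a minimal Python sketch of the same brute-force loop that also counts the modular multiplications. The function name discrete_log_bruteforce and the early exit for the case where y is not a power of g are my own additions for illustration; they are not part of the pseudocode in the question.

```python
def discrete_log_bruteforce(y, g, p):
    """Return (x, multiplications) with g^x = y (mod p), x >= 1.

    Returns (None, multiplications) when the powers of g cycle back to g
    without ever hitting y (this guard is an addition to the original loop).
    """
    y %= p
    start = g % p
    a = start
    x = 1
    mults = 0
    while a != y:
        a = a * g % p      # one modular multiplication per step
        x += 1
        mults += 1
        if a == start:     # full cycle completed: y is not a power of g
            return None, mults
    return x, mults


if __name__ == "__main__":
    # 2 is a primitive root mod 11, so its order is p - 1 = 10
    print(discrete_log_bruteforce(6, 2, 11))
```

The number of loop iterations is bounded by the order of g, which is why the cost grows like p, that is, like 2^b for a b-bit modulus.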

RSA factorization, explanation of the c^d % n

In my RSA, I am using the following code for computing c^d % m.
However, I am not sure how this version, which contains the double mod operation, works in the background.
function [f] = rsa ( m, d, c )
    f = 1;
    for index = 1:d
        f = f * mod(c , m );
        f = mod( f, m);
    end
There is another method that uses a binary expansion of the exponent d, where the exponent is represented as a sum of powers 2^n; I already know that one.
Could somebody help me? Thanks.
If you ignore the mod function calls for the moment, each iteration of the loop will simply multiply f (starting at 1) by c.
Therefore, after d iterations, f will equal
1*c*c*...*c = c^d
You could then apply the modulus operation at the end to compute "c^d%m".
However, doing it like this is very likely to overflow, so the code instead computes the modulus during every iteration to prevent overflow.
In summary, this code is simply doing a bruteforce calculation of "c^d%m". This approach will be far too slow in practice as d tends to be a very large number in RSA.
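As a small check of the explanation above, the following Python sketch mirrors the loop and compares it with reducing only once at the end. The name modpow_bruteforce is hypothetical. Python integers do not overflow, so the comparison with c**d % m is only feasible here because the test values are tiny.

```python
def modpow_bruteforce(c, d, m):
    """Compute c^d mod m with d multiplications, reducing after every step."""
    f = 1
    for _ in range(d):
        f = f * (c % m)   # first mod: keep the factor below m
        f = f % m         # second mod: keep the running product below m
    return f


if __name__ == "__main__":
    c, d, m = 7, 13, 33
    # Reducing at every step gives the same residue as reducing once at the end.
    print(modpow_bruteforce(c, d, m) == c**d % m)
    # The built-in three-argument pow computes the same value far faster.
    print(modpow_bruteforce(c, d, m) == pow(c, d, m))
```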

List of divisors of an integer n (Haskell)

I currently have the following function to get the divisors of an integer:
-- All divisors of a number
divisors :: Integer -> [Integer]
divisors 1 = [1]
divisors n = firstHalf ++ secondHalf
  where firstHalf = filter (divides n) (candidates n)
        secondHalf = filter (\d -> n `div` d /= d) (map (n `div`) (reverse firstHalf))
        candidates n = takeWhile (\d -> d * d <= n) [1..n]
I ended up adding the filter to secondHalf because a divisor was repeated when n is the square of a prime number. This seems like a very inefficient way to solve this problem.
So I have two questions: How do I measure whether this really is a bottleneck in my algorithm? And if it is, how do I go about finding a better way to avoid repetitions when n is the square of a prime?
To measure where the bottleneck is, put the three auxiliary definitions (firstHalf, secondHalf, candidates) at the top level, and run your code with the profiler on:
ghc -prof --make divisors.hs
./divisors 100 +RTS -p -RTS
Also, you know that the biggest candidate is sqrt n, so instead of doing that many multiplications d*d, just consider [1..floor (sqrt (fromIntegral n))]
For better algorithms, you should consult a maths book, for it's not a Haskell-related question. Things you can consider: if a divides b, then every divisor d of a divides b as well.
You'll want to use memoization or dynamic programming to avoid checking multiple times whether a given d divides b (for example, if 15 and 27 divide b, then you need to check only once that 3 divides b; the other times, you just see whether 3 is in your table of divisors of b).
You needn't test all the elements of the reversed second half. You know that if the square root is present, it is the head element there:
secondHalf = let (r:ds) = [n `div` d | d <- reverse firstHalf]
             in [r | n `div` r /= r] ++ ds
This assumes n is positive.
A simpler way to handle the sqrt of a number differently is to handle it separately:
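divs n =
  let
    r = floor $ sqrt $ fromIntegral n
    -- try r itself as a candidate unless it is the exact square root
    top = if r*r == n then r-1 else r
    (a,b) = unzip [(d, q) | d <- [1..top], let (q,rm) = quotRem n d, rm == 0]
  in
    if r*r == n
      then a ++ r : reverse b
      else a ++ reverse b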
That way we get the second half for free, as a part of producing the first half.
But this could hardly be a bottleneck in your application because the algorithm itself is inefficient. It is usually much faster to generate the divisors from a number's prime factorization. Prime factorization by trial division can be much quicker because we divide out each divisor as it is found, reducing the number being factorized and thus the number of divisors that are tried (up to the reduced number's square root). For example, 12348 = 2*2*3*3*7*7*7 and no factor above 7 is tried in the process of factorization, whereas in divs 12348 the number 12348 is divided by every number from 2 up to roughly its square root (about 111):
factorize n = go n (2:[3,5..])    -- or: (go n primes) where
  where                           --   primes = 2 :
    go n ds@(d:t)                 --     filter (null.tail.factorize) [3,5..]
      | d*d > n   = [n]
      | r == 0    = d : go q ds
      | otherwise = go n t
      where (q,r) = quotRem n d
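The step from a prime factorization to the full list of divisors is not spelled out above. Here is a minimal sketch of one way to do it, written in Python rather than Haskell. The names factorize and divisors_from_factors are illustrative, and the trial division tries every integer d rather than only 2 and the odd numbers.

```python
def factorize(n):
    """Prime factors of n (with repetition) by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out each factor as soon as it is found
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever is left is a prime
        factors.append(n)
    return factors


def divisors_from_factors(n):
    """All divisors of n, built as products of prime powers."""
    factors = factorize(n)
    divs = [1]
    for p in sorted(set(factors)):
        k = factors.count(p)
        # extend every known divisor by p^0, p^1, ..., p^k
        divs = [d * p**e for d in divs for e in range(k + 1)]
    return sorted(divs)


if __name__ == "__main__":
    print(factorize(12348))
    print(len(divisors_from_factors(12348)))
```

Each divisor is produced exactly once, so no filtering for duplicates (such as the repeated square root) is needed.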

Finding the closest integer fraction to a given random real between 0..1, given ranges of numerator and denominator

Given two ranges of positive integers x: [1 ... n] and y: [1 ... m] and random real R from 0 to 1, I need to find the pair of elements (i,j) from x and y such that x_i / y_j is closest to R.
What is the most efficient way to find this pair?
Using Farey sequence
This is a simple and mathematically beautiful algorithm to solve this: run a binary search, where on each iteration the next number is given by the mediant formula (below). By the properties of the Farey sequence that number is the one with the smallest denominator within that interval. Consequently this sequence will always converge and never 'miss' a valid solution.
In pseudocode:
input: m, n, R
a_num = 0, a_denom = 1
b_num = 1, b_denom = 1
repeat:
    -- interestingly c_num/c_denom is already in reduced form
    c_num = a_num + b_num
    c_denom = a_denom + b_denom
    -- if the numbers are too big, return the closest of a and b
    if c_num > n or c_denom > m then
        if R - a_num/a_denom < b_num/b_denom - R then
            return a_num, a_denom
        else
            return b_num, b_denom
    -- adjust the interval:
    if c_num/c_denom < R then
        a_num = c_num, a_denom = c_denom
    else
        b_num = c_num, b_denom = c_denom
    goto repeat
Even though it's fast on average (my educated guess that it's O(log max(m,n))), it can still be slow if R is close to a fraction with a small denominator. For example finding an approximation to 1/1000000 with m = n = 1000000 will take a million iterations.
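For reference, the mediant search can be written as a short runnable Python sketch. The name farey_closest is hypothetical, and I use fractions.Fraction so that the comparisons are exact; that is my choice, not something the pseudocode requires. As in the pseudocode, the lower bound starts at 0/1, so for very small R the result can be 0/1, which lies outside the stated numerator range [1 ... n].

```python
from fractions import Fraction


def farey_closest(R, n, m):
    """Closest fraction to R (0 <= R <= 1) with numerator <= n and denominator <= m."""
    a = (0, 1)  # lower bound a_num / a_denom
    b = (1, 1)  # upper bound b_num / b_denom
    while True:
        c = (a[0] + b[0], a[1] + b[1])  # mediant, already in lowest terms
        if c[0] > n or c[1] > m:
            # every fraction strictly between a and b is at least as "big" as c,
            # so the answer is whichever bound is closer to R
            if R - Fraction(*a) < Fraction(*b) - R:
                return a
            return b
        if Fraction(*c) < R:
            a = c
        else:
            b = c


if __name__ == "__main__":
    print(farey_closest(Fraction(113, 205), 50, 200))
```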
The standard approach to approximating reals with rationals is computing the continued fraction series (see [1]). Put a limit on the numerator and denominator while computing parts of the series, and the last value before you break the limits is a fraction very close to your real number.
This will find a very good approximation very fast, but I'm not sure it will always find a closest approximation. It is known that
any convergent [partial value of the continued fraction expansion] is nearer to the continued fraction than any other fraction whose denominator is less than that of the convergent
but there may be approximations with larger denominator (still below your limit) that are better approximations, but are not convergents.
[1] http://en.wikipedia.org/wiki/Continued_fraction
Let R be a real number with 0 <= R <= 1, and let the integers be x: [1 ... n] and y: [1 ... m]. It is assumed that n <= m, since if n > m then x[n]/y[m] will be greater than 1, which cannot be the closest approximation to R.
Therefore, the best approximation of R with the denominator d will be either floor(R*d) / d or ceil(R*d) / d.
The problem can be solved in O(m) time and O(1) space (in Python):
from random import random
from math import floor

def fractionize(R, n, d):
    error = abs(n/d - R)
    return (n, d, error)  # (numerator, denominator, absolute difference to R)

def better(a, b):
    return a if a[2] < b[2] else b

def approximate(R, n, m):
    best = (0, 1, R)
    for d in range(1, m+1):
        n1 = min(n, int(floor(R * d)))
        n2 = min(n, n1 + 1)  # ceil(R*d)
        best = better(best, fractionize(R, n1, d))
        best = better(best, fractionize(R, n2, d))
    return best

if __name__ == '__main__':
    def main():
        R = random()
        n = 30
        m = 100
        print(R, approximate(R, n, m))
    main()
Prolly get flamed, but a lookup might be best, where we precompute the real value of every possible fraction. So simply a 2D array indexed by numerator and denominator, with each element holding the real equivalent. I guess we have discrete X and Y parts so this is finite; it wouldn't be the other way around.... Ahh yeah, the actual searching part....erm reet....
Rather than a completely brute force search, do a linear search over the shortest of your lists, using round to find the best match for each element. Maybe something like this:
best_x,best_y=(1,1)
for x in 1...n:
    y=max(1,min(m,round(x/R)))
    #optional optimization (if you have a fast gcd)
    if gcd(x,y)>1:
        continue
    if abs(R-x/y)<abs(R-best_x/best_y):
        best_x,best_y=(x,y)
return (best_x,best_y)
Not at all sure whether the gcd "optimization" will ever be faster...
The Solution:
You can do this in O(1) space and O(m log(n)) time;
there is no need to create any list to search.
The pseudocode may be buggy, but the idea is this:
r: input number to search.
n,m: the ranges.
for (int i=1;i<=m;i++)
{
    minVal = min(Search(i,1,n,r), minVal);
}
//x and y are start and end of array:
decimal Search(i,x,y,r)
{
    if (i/x > r)
        return i/x - r;
    decimal middle1 = i/Cill((x+y)/2);
    decimal middle2 = i/Roof((x+y)/2);
    decimal dist = min(middle1,middle2)
    decimal searchResult = 100000;
    if( middle > r)
        searchResult = Search (i, x, cill((x+y)/2),r)
    else
        searchResult = Search(i, roof((x+y)/2), y,r)
    if (searchResult < dist)
        dist = searchResult;
    return dist;
}
Finding the index is left as homework for the reader.
Description: I think you can understand the idea from the code, but let's trace one iteration of the for loop:
when i=1:
you should search within the numbers below:
1,1/2,1/3,1/4,....,1/n
you check the number against (1,1/cill(n/2)) and (1/floor(n/2), 1/n) and do a similar binary search on it to find the smallest one.
This loop has to be done for all items, so it runs m times, and each run takes O(log(n)). This function can be improved by some mathematical rules, but it will be complicated, so I skip it.
If the denominator of R is larger than m then use the Farey method (which the Fraction.limit_denominator method implements) with a limit of m to get a fraction a/b where b is smaller than m else let a/b = R. With b <= m, either a <= n and you are done or else let M = math.ceil(n/R) and re-run the Farey method.
def approx2(a, b, n, m):
    from math import ceil
    from fractions import Fraction
    R = Fraction(a, b)
    if R < Fraction(1, m):
        return 1, m
    r = R.limit_denominator(m)
    if r.numerator > n:
        M = ceil(n/R)
        r = R.limit_denominator(M)
    return r.numerator, r.denominator
>>> approx2(113, 205, 50, 200)
(43, 78)
It might be possible to just run the Farey method once using a limiting denominator of min(ceil(n/R), m) but I am not sure about that:
def approx(a, b, n, m):
    from math import ceil
    from fractions import Fraction
    R = Fraction(a, b)
    if R < Fraction(1, m):
        return 1, m
    r = R.limit_denominator(min(ceil(n/R), m))
    return r.numerator, r.denominator

finding a^b^c^... mod m

I would like to calculate:
a^b^c^d... mod m
Do you know an efficient way, given that this number is too big but a, b, c, ... and m fit in a simple 32-bit int?
Any Ideas?
Caveat: This question is different from finding a^b mod m.
Also please note that a^(b^c) is not the same as (a^b)^c. The latter is equal to a^(bc). Exponentiation is right-associative.
a^(b^c) mod m = a^(b^c mod n) mod m, where n = φ(m) is Euler's totient function.
If m is prime, then n = m-1.
Edit: as Nabb pointed out, this only holds if a is coprime to m. So you would have to check this first.
The answer does not contain a full formal proof of correctness; I assumed that it is unnecessary here. Besides, it would be very hard to read on SO (no MathJax, for example).
I will use a (just slightly) specific prime factorization algorithm. It's not the best option, but it is enough.
tl;dr
We want to calculate a^x mod m. We will use the function modpow(a,x,m), described below.
If x is small enough (not exponential form or exists p^x | m) just calculate it and return
Split a into primes and calculate p^x mod m separately for each prime, using the modpow function
Calculate c' = gcd(p^x,m) and t' = totient(m/c')
Calculate w = modpow(x.base, x.exponent, t') + t'
Save pow(p, w - log_p c', m) * c' in table A
Multiply all elements of A and return the result modulo m
Here pow should look like python's pow.
Main problem:
Because the current best answer covers only the special case gcd(a,m) = 1, and the OP did not state this assumption in the question, I decided to write this answer. I will also use Euler's totient theorem. Quoting Wikipedia:
Euler's totient theorem:
If n and a are coprime positive integers, then
a^φ(n) ≡ 1 (mod n), where φ(n) is Euler's totient function.
The assumption that the numbers are co-prime is very important, as Nabb shows in a comment. So, first we need to ensure that the numbers are co-prime. (For greater clarity assume x = b^(c^...).) Because a^x = (p1^alpha)^x * (p2^beta)^x * ..., where a = p1^alpha * p2^beta * ... is the prime factorization of a, we can factorize a, separately calculate q1 = (p1^alpha)^x mod m, q2 = (p2^beta)^x mod m, ... and then calculate the answer in an easy way (q1 * q2 * q3 * ... mod m). A number a has at most O(log a) prime factors, so we will be forced to perform at most O(log a) calculations.
In fact we don't have to split into every prime factor of a (if not all occur in m with other exponents) and we can combine those with the same exponent, but it is not noteworthy for now.
Now take a look at the (p^z)^x mod m problem, where p is prime. Notice an important observation:
If a, b are positive integers smaller than m, c is some positive integer, and a ≡ b (mod m), then ac ≡ bc (mod mc).
Using the above observation, we can obtain a solution for the actual problem. We can easily calculate gcd((p^z)^x, m). If x*z is big, it is the largest power of p that divides m. Let c = gcd(p^(zx), m) and m' = m / c. (Notice (p^z)^x = p^(z*x).) Now we can easily (see below) calculate w = p^(zx - log_p c) mod m' using Euler's theorem, because these numbers are co-prime! Then, using the above observation, we can obtain p^(zx) mod m: since w*c mod (m'*c) = p^(zx) mod m, the answer for now is p^(zx) mod m = w*c, and w, c are easy to calculate.
Therefore we can easily calculate a^x mod m.
Calculate a^x mod m using Euler's theorem
Now assume a,m are co-prime. If we want to calculate a^x mod m, we can calculate t = totient(m) and notice that a^x mod m = a^(x mod t) mod m. This can be helpful if x is big and we know only a specific expression for x, for example x = 7^200.
Look at the example x = b^c. We can calculate t = totient(m) and x' = b^c mod t using the exponentiation by squaring algorithm in Θ(log c) time, and then (using the same algorithm) a^x' mod m, which is equal to the solution.
If x = b^(c^(d^...)) we will solve it recursively. First calculate t1 = totient(m), then t2 = totient(t1) and so on. For example take x=b^(c^d). If t1=totient(m), a^x mod m = a^(b^(c^d) mod t1) mod m, and we are able to say b^(c^d) mod t1 = b^(c^d mod t2) mod t1, where t2 = totient(t1). Everything is calculated using the exponentiation by squaring algorithm.
Note: If some totient isn't co-prime to the exponent, it is necessary to use the same trick as in the main problem (in fact, we should forget that it's an exponent and recursively solve the problem, as in the main problem). In the above example, if t2 isn't relatively prime to c, we have to use this trick.
Calculate φ(n)
Notice simple facts:
if gcd(a,b)=1, then φ(ab) = φ(a)*φ(b)
if p is prime φ(p^k)=(p-1)*p^(k-1)
Therefore we can factorize n (i.e. n = p1^k1 * p2^k2 * ...) and separately calculate φ(p1^k1),φ(p2^k2),... using fact 2. Then combine these using fact 1: φ(n)=φ(p1^k1)*φ(p2^k2)*...
It is worth remembering that, if we will calculate totient repeatedly, we may want to use Sieve of Eratosthenes and save prime numbers in table. It will reduce the constant.
python example: (it is correct, for the same reason as this factorization algorithm)
def totient(n) : # n - unsigned int
    result = 1
    p = 2 #prime numbers - 'iterator'
    while p**2 <= n :
        if(n%p == 0) : # * (p-1)
            result *= (p-1)
            n //= p
        while(n%p == 0) : # * p^(k-1)
            result *= p
            n //= p
        p += 1
    if n != 1 :
        result *= (n-1)
    return result # in O(sqrt(n))
Case: a^b^c mod m
Because it's in fact doing the same thing many times, I believe this case will show you how to solve the problem in general.
First, we have to split a into prime powers. The best representation will be a pair <number, exponent>.
c++11 example:
#include <tuple>
#include <vector>

std::vector<std::tuple<unsigned, unsigned>> split(unsigned n) {
    std::vector<std::tuple<unsigned, unsigned>> result;
    for(unsigned p = 2; p*p <= n; ++p) {
        unsigned current = 0;
        while(n % p == 0) {
            current += 1;
            n /= p;
        }
        if(current != 0)
            result.emplace_back(p, current);
    }
    if(n != 1)
        result.emplace_back(n, 1);
    return result;
}
After the split, we have to calculate (p^z)^(b^c) mod m = p^(z*(b^c)) mod m for every pair. First we should check whether p^(z*(b^c)) | m. If yes, the answer is just (p^z)^(b^c), but that is possible only when z, b, c are very small. I believe I don't have to show a code example for it.
And finally, if p^(z*b^c) > m we have to calculate the answer. First, we have to calculate c' = gcd(m, p^(z*b^c)). Then we are able to calculate t = totient(m') and (z*b^c - c' mod t). That's an easy way to get the answer.
function modpow(p, z, b, c, m : integers) # (p^z)^(b^c) mod m
    c' = 0
    m' = m
    while m' % p == 0 :
        c' += 1
        m' /= p
    # now m' = m / gcd((p^z)^(b^c), m)
    t = totient(m')
    exponent = z*(b^c)-c' mod t
    return p^c' * (p^exponent mod m')
And below Python working example:
def modpow(p, z, b, c, m) : # (p^z)^(b^c) mod m
    cp = 0
    while m % p == 0 :
        cp += 1
        m //= p # m = m' now
    t = totient(m)
    exponent = ((pow(b,c,t)*z)%t + t - (cp%t))%t
    # exponent = z*(b^c)-cp mod t
    return pow(p, cp)*pow(p, exponent, m)
Using this function, we can easily calculate (p^z)^(b^c) mod m; afterwards we just have to multiply all the results (mod m). We can also calculate everything on an ongoing basis, as in the example below. (I hope I didn't make a mistake while writing it.) The only assumption is that b, c are big enough (b^c > log(m), i.e. no p^(z*b^c) divides m); it's a simple check and I don't see the point of cluttering the code with it.
def solve(a,b,c,m) : # split and solve
    result = 1
    p = 2 # primes
    while p**2 <= a :
        z = 0
        while a % p == 0 :
            # calculate z
            a //= p
            z += 1
        if z != 0 :
            result *= modpow(p,z,b,c,m)
            result %= m
        p += 1
    if a != 1 : # Possible last prime
        result *= modpow(a, 1, b, c, m)
    return result % m
Looks like it works.
DEMO and it's correct!
Since for any relationship a=x^y, the relationship is invariant with respect to the numeric base you are using (base 2, base 6, base 16, etc).
Since the mod N operation is equivalent to extracting the least significant digit (LSD) in base N
Since the LSD of the result A in base N can only be affected by the LSD of X in base N, and not digits in higher places. (e.g. 34*56 = 30*50+30*6+50*4+4*6 = 10*(3*50+3*6+5*4)+4*6)
Therefore, from LSD(A)=LSD(X^Y) we can deduce
LSD(A)=LSD(LSD(X)^Y)
Therefore
A mod N = ((X mod N) ^ Y) mod N
and
(X ^ Y) mod N = ((X mod N) ^ Y) mod N
Therefore you can do the mod before each power step, which keeps your result in the range of integers.
This assumes a is not negative, and for any x^y, a^y < MAXINT
This answer answers the wrong question. (alex)
Modular Exponentiation is a correct way to solve this problem, here's a little bit of a hint:
To find a^b^c^d % m
You have to start with calculating
a % m, then a^b % m, then a^b^c % m and then a^b^c^d % m ... (you get the idea)
To find a^b % m, you basically need two ideas: [Let B=floor(b/2)]
a^b = (a^B)^2 if b is even OR a^b = (a^B)^2*a if b is odd.
(X*Y)%m = ((X%m) * (Y%m)) % m
(% = mod)
Therefore,
if b is even
a^b % m = (a^B % m)^2 % m
or if b is odd
a^b % m = (((a^B % m)^2) * (a % m)) % m
So if you knew the value of a^B, you can calculate this value.
To find a^B, apply a similar approach, dividing B until you reach 1.
e.g. To calculate 16^13 % 11:
16^13 % 11 = (16 % 11)^13 % 11 = 5^13 % 11
= ((5^6 % 11) * (5^6 % 11) * (5 % 11)) % 11 <---- (I)
To find 5^6 % 11:
5^6 % 11 = ((5^3 % 11) * (5^3 % 11)) % 11 <----(II)
To find 5^3 % 11:
5^3 % 11 = ((5^1 % 11) * (5^1 % 11) * (5 % 11)) % 11
= (((5 * 5) % 11) * 5) % 11 = ((25 % 11) * 5) % 11 = (3 * 5) % 11 = 15 % 11 = 4
Plugging this value into (II) gives
5^6 % 11 = (4 * 4) % 11 = 16 % 11 = 5
Plugging this value into (I) gives
5^13 % 11 = ((5 % 11) * (5 % 11) * 5) % 11 = ((25 % 11) * 5) % 11 = (3 * 5) % 11 = 15 % 11 = 4
This way 5^13 % 11 = 4
With this you can calculate anything of the form a^5^13 % 11 and so on...
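The halving recursion described above can be sketched in a few lines of Python. The function name modpow_recursive is my own; Python's built-in pow(a, b, m) does the same job and is used here only as a cross-check.

```python
def modpow_recursive(a, b, m):
    """a^b mod m by repeated halving of the exponent (b >= 0, m >= 1)."""
    if b == 0:
        return 1 % m
    half = modpow_recursive(a, b // 2, m)   # a^B mod m with B = floor(b/2)
    square = half * half % m                # (a^B)^2 mod m
    if b % 2 == 0:
        return square
    return square * (a % m) % m             # one extra factor a when b is odd


if __name__ == "__main__":
    print(modpow_recursive(16, 13, 11))
    print(modpow_recursive(16, 13, 11) == pow(16, 13, 11))
```

The recursion depth is about log2(b), so only O(log b) multiplications are needed instead of b.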
Look at the behavior of A^X mod M as X increases. It must eventually go into a cycle. Suppose the cycle has length P and starts after N steps. Then X >= N implies A^X = A^(X+P) = A^(X%P + (-N)%P + N) (mod M). Therefore we can compute A^B^C by computing y=B^C, z = y < N ? y : y%P + (-N)%P + N, return A^z (mod m).
Notice that we can recursively apply this strategy up the power tree, because the derived equation either has an exponent < M or an exponent involving a smaller exponent tower with a smaller dividend.
The only question is if you can efficiently compute N and P given A and M. Notice that overestimating N is fine. We can just set N to M and things will work out. P is a bit harder. If A and M are different primes, then P=M-1. If A has all of M's prime factors, then we get stuck at 0 and P=1. I'll leave it as an exercise to figure that out, because I don't know how.
///Returns equivalent to list.reverse().aggregate(1, acc,item => item^acc) % M
func PowerTowerMod(Link<int> list, int M, int upperB = M)
    requires M > 0, upperB >= M
    var X = list.Item
    if list.Next == null: return X
    var P = GetPeriodSomehow(base: X, mod: M)
    var e = PowerTowerMod(list.Next, P, M)
    if e^X < upperB then return e^X //todo: rewrite e^X < upperB so it doesn't blowup for large x
    return ModPow(X, M + (e-M) % P, M)
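For small M, the cycle start N and cycle length P from this answer can simply be found by brute force. The Python sketch below does that and then reduces an exponent accordingly. Both function names are hypothetical, and the O(M) search is only an illustration; it is not the efficient method the answer asks for.

```python
def cycle_of_powers(a, m):
    """Return (N, P): a^x mod m repeats with period P for every x >= N."""
    seen = {}                 # residue -> first exponent at which it appeared
    value, x = 1 % m, 0
    while value not in seen:
        seen[value] = x
        value = value * a % m
        x += 1
    start = seen[value]
    return start, x - start


def reduce_exponent(y, n, p):
    """Smallest exponent z with a^z = a^y (mod m), given the cycle (n, p)."""
    return y if y < n else n + (y - n) % p


if __name__ == "__main__":
    n, p = cycle_of_powers(2, 100)
    print(n, p)
    print(pow(2, reduce_exponent(3**4**5, n, p), 100))
```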
Tacet's answer is good, but there are substantial simplifications possible.
The powers of x, mod m, are preperiodic. If x is relatively prime to m, the powers of x are periodic, but even without that assumption, the part before the period is not long: at most the maximum of the exponents in the prime factorization of m, which is at most log_2 m. The length of the period divides phi(m), and in fact lambda(m), where lambda is Carmichael's function, the maximum multiplicative order mod m. This can be significantly smaller than phi(m). Lambda(m) can be computed quickly from the prime factorization of m, just as phi(m) can. Lambda(m) is the LCM of lambda(p_i^e_i) over all prime powers p_i^e_i in the prime factorization of m, and for odd prime powers, lambda(p_i^e_i) = phi(p_i^e_i). lambda(2)=1, lambda(4)=2, lambda(2^n)=2^(n-2) for larger powers of 2.
Define modPos(a,n) to be the representative of the congruence class of a in {0,1,..,n-1}. For nonnegative a, this is just a%n. For a negative, for some reason a%n is defined to be negative, so modPos(a,n) is (a%n)+n.
Define modMin(a,n,min) to be the least positive integer congruent to a mod n that is at least min. For a positive, you can compute this as min+modPos(a-min,n).
If b^c^... is smaller than log_2 m (and we can check whether this inequality holds by recursively taking logarithms), then we can simply compute a^b^c^... Otherwise, a^b^c^... mod m = a^modMin(b^c^..., lambda(m), [log_2 m]) mod m = a^modMin(b^c^... mod lambda(m), lambda(m), [log_2 m]) mod m.
For example, suppose we want to compute 2^3^4^5 mod 100. Note that 3^4^5 only has 489 digits, so this is doable by other methods, but it's big enough that you don't want to compute it directly. However, by the methods I gave here, you can compute 2^3^4^5 mod 100 by hand.
Since 3^4^5 > log_2 100,
2^3^4^5 mod 100
= 2^modMin(3^4^5,lambda(100),6) mod 100
= 2^modMin(3^4^5 mod lambda(100), lambda(100),6) mod 100
= 2^modMin(3^4^5 mod 20, 20,6) mod 100.
Let's compute 3^4^5 mod 20. Since 4^5 > log_2 20,
3^4^5 mod 20
= 3^modMin(4^5,lambda(20),4) mod 20
= 3^modMin(4^5 mod lambda(20),lambda(20),4) mod 20
= 3^modMin(4^5 mod 4, 4, 4) mod 20
= 3^modMin(0,4,4) mod 20
= 3^4 mod 20
= 81 mod 20
= 1
We can plug this into the previous calculation:
2^3^4^5 mod 100
= 2^modMin(3^4^5 mod 20, 20,6) mod 100
= 2^modMin(1,20,6) mod 100
= 2^21 mod 100
= 2097152 mod 100
= 52.
Note that 2^(3^4^5 mod 20) mod 100 = 2^1 mod 100 = 2, which is not correct. You can't reduce down to the preperiodic part of the powers of the base.
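The whole procedure can be sketched in Python as follows. This is a minimal illustration under my own assumptions: all tower entries are integers >= 1, the names carmichael, capped and tower_mod are hypothetical, and I use m.bit_length() (which is at least [log_2 m]) as the threshold, which is a slightly larger but still valid choice of min.

```python
from math import gcd


def carmichael(m):
    """Carmichael's lambda(m), from a trial-division factorization of m."""
    lam, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            k = 0
            while m % p == 0:
                m //= p
                k += 1
            if p == 2:
                part = 1 if k == 1 else 2 if k == 2 else 2 ** (k - 2)
            else:
                part = (p - 1) * p ** (k - 1)
            lam = lam * part // gcd(lam, part)   # running LCM
        p += 1
    if m > 1:                                    # leftover prime, exponent 1
        lam = lam * (m - 1) // gcd(lam, m - 1)
    return lam


def capped(tower, cap):
    """min(value of the tower, cap), without building huge numbers."""
    if len(tower) == 1:
        return min(tower[0], cap)
    return min(tower[0] ** capped(tower[1:], cap), cap)


def tower_mod(tower, m):
    """tower = [a, b, c, ...] stands for a^(b^(c^...)); returns its value mod m."""
    if m == 1:
        return 0
    a = tower[0]
    if len(tower) == 1:
        return a % m
    threshold = m.bit_length()           # upper bound on the preperiod length
    small = capped(tower[1:], threshold)
    if small < threshold:                # the exponent is tiny: use it directly
        return pow(a, small, m)
    lam = carmichael(m)
    e = tower_mod(tower[1:], lam)        # exponent mod lambda(m)
    e = threshold + (e - threshold) % lam   # modMin(e, lam, threshold)
    return pow(a, e, m)


if __name__ == "__main__":
    print(tower_mod([2, 3, 4, 5], 100))
```

For the worked example, tower_mod([2, 3, 4, 5], 100) agrees with the hand computation above and returns 52.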
