Matrix multiplication when one dimension is much larger than the other - algorithm

So I read that Strassen's matrix multiplication algorithm has complexity O(n^2.8),
but it works only when A is n x n and B is n x n.
What if
A is m x n and B is n x o,
and m is much, much bigger than n and o, but n and o are still very big?
Padding with zeroes might make the multiplication take longer.
I'm doing a project that requires multiplying such matrices, so I was hoping to get some advice.
Should I use the conventional algorithm, or is there a way to modify Strassen's algorithm to do it faster?

https://en.m.wikipedia.org/wiki/Strassen_algorithm
A product of size [2N x N] * [N x 10N] can be done as 20 separate [N x N] * [N x N] operations, arranged to form the result;
A product of size [N x 10N] * [10N x N] can be done as 10 separate [N x N] * [N x N] operations, summed to form the result.
These techniques make the implementation more complicated than simply padding to a power-of-two square; however, it is reasonable to assume that anyone implementing Strassen rather than conventional multiplication places a higher priority on computational efficiency than on simplicity of the implementation.
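As a rough illustration of the blocking idea (a minimal sketch, not from the answer above: NumPy's np.dot stands in for whatever square multiply routine, Strassen or otherwise, you would actually use), the rectangular product is cut into N x N tiles and each tile product is a square multiplication:

import numpy as np

def blocked_matmul(A, B, N):
    # A is (p*N x q*N), B is (q*N x r*N); the result C is (p*N x r*N).
    p, q, r = A.shape[0] // N, A.shape[1] // N, B.shape[1] // N
    C = np.zeros((p * N, r * N))
    for i in range(p):
        for j in range(r):
            for k in range(q):
                # Each term is a square N x N product; this is where a
                # Strassen routine would be plugged in instead of np.dot.
                C[i*N:(i+1)*N, j*N:(j+1)*N] += np.dot(
                    A[i*N:(i+1)*N, k*N:(k+1)*N],
                    B[k*N:(k+1)*N, j*N:(j+1)*N])
    return C

For the [2N x N] * [N x 10N] case above this gives p = 2, q = 1, r = 10, i.e. 20 square products arranged into the result; for [N x 10N] * [10N x N] it gives p = r = 1, q = 10, i.e. 10 square products summed.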

Related

What is the correct time complexity for an n factorial time function?

I am very new to this topic and I am trying to grasp everything related to the asymptotic notations. I want to ask for your opinion on the following question:
If we have, for an algorithm, that T(n)=n!, then we can say for its time complexity that:
1 x 1 x 1 ... x1 <= n! <= n x n x n ... x n
This relation means that n! = O(n^n) and n! = Ω(1). However, can't we do better? We want the big-oh to be as close as we can to the function T(n). If we do the following:
n! <= 1 x 2 x 3 x ... x (n-2) x n x n
That is, for the second-to-last factor, we replace (n-1) with n. Now isn't this relation true? So isn't it true that n! = O(1 x 2 x ... x (n-2) x n x n)? Something similar can be said for the lower bound Ω.
I am not sure if there is an error in my thought process, so I would really appreciate your input. Thanks in advance.
The mathematical statement n! = O(1 x 2 x ... x (n-2) x n x n) is true, but it is neither terribly helpful nor enlightening. In what situations do you want to write n! = O(...)?
Either you are satisfied with n! = n!, in which case you don't need to write n! = O(1 x 2 x ... x (n-2) x n x n); or you are not satisfied with n! = n! and want something that explains better exactly how large n! is, in which case you shouldn't be satisfied with n! = O(1 x 2 x ... x (n-2) x n x n) either, as it is not any easier to understand.
Personally, I am satisfied with polynomials, like n^2. I am satisfied with exponentials, like 2^n. I am also somewhat satisfied with n^n, because I know n^n = 2^(n log n), and I also know I can't hope to find a better expression for n^n.
But I am not satisfied with n!. I would like to be able to compare it to exponentials.
Here are two comparisons:
n! < n^n
2^n < n!
The first one is obtained by upperbounding every factor by n in the product; the second one is obtained by lowerbounding every factor by 2 in the product.
That's already pretty good; it tells us that n! is somewhere between the exponential 2^n and the superexponential n^n.
But you can easily tell that the upperbound n^n is too high; for instance, you can find the following tighter bounds quite easily:
n! < n^(n-1)
n! < 2 * n^(n-2)
n! < 6 * n^(n-3)
Note that n^(n-3) is a lot smaller than n^n when n is big! This is slightly better, but still not satisfying.
You could go even further, and notice that half the factors are smaller than n/2, thus:
n! < (n/2)^(n/2) * n^(n/2) = (1/2)^(n/2) * n^n = (n / sqrt(2))^n =~ (0.7 n)^n
This is a slightly tighter upper bound! But can we do even better? I am still not satisfied.
If you are not satisfied either, I encourage you to read: https://en.wikipedia.org/wiki/Stirling%27s_approximation
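For a quick feel for how these bounds compare (a small numeric check of my own, not part of the answer), the snippet below prints n!, the lower bound 2^n, the (n/sqrt(2))^n upper bound, the crude n^n upper bound, and Stirling's estimate:

import math

def stirling(n):
    # Stirling's approximation: n! ~ sqrt(2*pi*n) * (n/e)^n
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (5, 10, 20):
    print(n,
          math.factorial(n),         # exact n!
          2 ** n,                    # lower bound (valid for n >= 4)
          (n / math.sqrt(2)) ** n,   # the (0.7 n)^n upper bound
          n ** n,                    # crude upper bound
          round(stirling(n)))        # Stirling's estimate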

How computationally expensive is matrix multiplication?

What advantage does the function
N(x;θ) = θ1(θ2*x)
have over
G(x;θ) = θ*x
for an input vector
x ∈ R^n
θ1 ∈ R^(nx1)
θ2 ∈ R^(1xn)
θ ∈ R^(nxn)
In the first case, θ2 (dimension 1xn) is multiplied with x (dimension nx1), which gives a 1x1 output. That is then multiplied by θ1 (dimension nx1), so the output of N(x;θ) has dimension nx1. There are n elements in θ2 and n elements in θ1, so 2n parameters in total and about 2n multiplications.
In the second case, θ (dimension nxn) is multiplied with x (dimension nx1), which gives G(x;θ) an output of dimension nx1. Here θ has n*n (n^2) parameters, and the product takes about n^2 multiplications.
Therefore, the advantage is that the first case is computationally much cheaper to evaluate (and to store) than the second case.
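A small NumPy sketch of the difference (my own illustration; the variable names are mine): N(x;θ) uses 2n parameters and about 2n multiplications, while G(x;θ) uses n^2 parameters and about n^2 multiplications.

import numpy as np

n = 1000
x = np.random.randn(n, 1)

theta1 = np.random.randn(n, 1)   # n parameters
theta2 = np.random.randn(1, n)   # n parameters
theta = np.random.randn(n, n)    # n^2 parameters

N_x = theta1 @ (theta2 @ x)      # ~2n multiplications
G_x = theta @ x                  # ~n^2 multiplications

print(N_x.shape, G_x.shape)      # both are (n, 1)

The trade-off (not mentioned in the answer above) is that θ1(θ2 x) can only represent a rank-1 linear map, whereas θ x can represent any linear map on R^n.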

Does the asymptotic complexity of a multiplication algorithm only rely on the larger of the two operands?

I'm taking an algorithms class and I repeatedly have trouble when I'm asked to analyze the runtime of code when there is a line with multiplication or division. How can I find big-theta of multiplying an n digit number with an m digit number (where n>m)? Is it the same as multiplying two n digit numbers?
For example, right now I'm attempting to analyze the following line of code:
return n*count/100
where count is at most 100. Is the asymptotic complexity of this any different from n*n/100? or n*n/n?
You can always look it up here: Computational complexity of mathematical operations.
In your case, the complexity of n*count/100 is O(length(n)), since 100 is a constant and length(count) is at most 3.
In general, multiplication of two numbers of n and m digits takes O(nm), the same time required for division (here I assume long division). There are many sophisticated algorithms which beat this complexity.
To make things clearer, I will provide an example. Suppose you have three numbers:
A - n digits length
B - m digits length
C - p digits length
Find complexity of the following formula:
A * B / C
Multiply first. The complexity of A * B is O(nm), and the result is a number D of n+m digits. Now consider D / C; here the complexity is O((n+m)p), so the overall complexity is the sum of the two: O(nm + (n+m)p) = O(m(n+p) + np).
Divide first. We divide B / C; the complexity is O(mp), and we get a number E of at most m digits. Now we calculate A * E; here the complexity is O(nm). Again, the overall complexity is O(mp + nm) = O(m(n+p)).
From the analysis you can see that it is beneficial to divide first. Of course in real life situation you would account for numerical stability as well.
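As a concrete illustration of that conclusion (my own example, plugging digit counts into the operation-count formulas above):

def multiply_first(n, m, p):
    # A*B costs ~n*m digit operations; dividing the (n+m)-digit result by C costs ~(n+m)*p
    return n * m + (n + m) * p

def divide_first(n, m, p):
    # B/C costs ~m*p; multiplying A by the (at most m-digit) quotient costs ~n*m
    return m * p + n * m

n, m, p = 1000, 50, 40           # digit lengths of A, B, C
print(multiply_first(n, m, p))   # 92000
print(divide_first(n, m, p))     # 52000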
From Modern Computer Arithmetic:
Assume the larger operand has size m, and the smaller has size n ≤ m, and denote by M(m,n) the corresponding multiplication cost.
When m is an exact multiple of n, say m = kn, a trivial strategy is to cut the larger operand into k pieces, giving M(kn,n) = kM(n) + O(kn).
Suppose m ≥ n and n is large. To use an evaluation-interpolation scheme, we need to evaluate the product at m + n points, whereas balanced k by k multiplication needs 2k points. Taking k ≈ (m+n)/2, we see that M(m,n) ≤ M((m+n)/2)(1 + o(1)) as n → ∞. On the other hand, from the discussion above, we have M(m,n) ≤ ⌈m/n⌉M(n)(1 + o(1)).
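A sketch of the "cut the larger operand into k pieces" strategy (my own illustration, using Python integers and base-10 chunks; each chunk-times-b product plays the role of one balanced n-by-n multiplication):

def unbalanced_mul(a, b, n):
    # a has roughly k*n digits, b has about n digits.
    # Split a into n-digit chunks (least significant first) and accumulate
    # chunk * b shifted into place: a*b = sum(chunk_i * b * 10^(i*n)).
    base = 10 ** n
    result, shift = 0, 0
    while a:
        a, chunk = divmod(a, base)
        result += chunk * b * (10 ** shift)   # one "balanced" n x n multiply
        shift += n
    return result

a, b = 123456789012345678901234567890, 54321
print(unbalanced_mul(a, b, 5) == a * b)   # True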

Fast way to calculate n! mod m where m is prime?

I was curious if there was a good way to do this. My current code is something like:
def factorialMod(n, modulus):
    ans = 1
    for i in range(1, n+1):
        ans = ans * i % modulus
    return ans % modulus
But it seems quite slow!
I also can't calculate n! and then apply the prime modulus because sometimes n is so large that n! is just not feasible to calculate explicitly.
I also came across http://en.wikipedia.org/wiki/Stirling%27s_approximation and wonder if this can be used at all here in some way?
Or, how might I create a recursive, memoized function in C++?
n can be arbitrarily large
Well, n can't be arbitrarily large - if n >= m, then n! ≡ 0 (mod m) (because m is one of the factors, by the definition of factorial).
Assuming n << m and you need an exact value, your algorithm can't get any faster, to my knowledge. However, if n > m/2, you can use the following identity (Wilson's theorem - thanks @Daniel Fischer!) to cap the number of multiplications at about m-n:
(m-1)! ≡ -1 (mod m)
1 * 2 * 3 * ... * (n-1) * n * (n+1) * ... * (m-2) * (m-1) ≡ -1 (mod m)
n! * (n+1) * ... * (m-2) * (m-1) ≡ -1 (mod m)
n! ≡ -[(n+1) * ... * (m-2) * (m-1)]^(-1) (mod m)
This gives us a simple way to calculate n! (mod m) in m-n-1 multiplications, plus a modular inverse:
def factorialMod(n, modulus):
    ans = 1
    if n <= modulus//2:
        #calculate the factorial normally (right argument of range() is exclusive)
        for i in range(1, n+1):
            ans = (ans * i) % modulus
    else:
        #Fancypants method for large n
        for i in range(n+1, modulus):
            ans = (ans * i) % modulus
        ans = modinv(ans, modulus)
        ans = -1*ans + modulus
    return ans % modulus
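modinv isn't defined in the answer; a minimal version of it (my sketch, using Python 3.8+'s three-argument pow) would be:

def modinv(a, m):
    # modular inverse of a modulo m (m prime, a not divisible by m)
    return pow(a, -1, m)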
We can rephrase the above equation in another way that may or may not perform slightly faster. Using the identity k ≡ k - m (mod m) on each remaining factor, we can rephrase the equation as
n! ≡ -[(n+1) * ... * (m-2) * (m-1)]^(-1) (mod m)
n! ≡ -[(n+1-m) * ... * (m-2-m) * (m-1-m)]^(-1) (mod m)
(reverse order of terms)
n! ≡ -[(-1) * (-2) * ... * -(m-n-2) * -(m-n-1)]^(-1) (mod m)
n! ≡ -[(1) * (2) * ... * (m-n-2) * (m-n-1) * (-1)^(m-n-1)]^(-1) (mod m)
n! ≡ [(m-n-1)!]^(-1) * (-1)^(m-n) (mod m)
This can be written in Python as follows:
def factorialMod(n, modulus):
    ans = 1
    if n <= modulus//2:
        #calculate the factorial normally (right argument of range() is exclusive)
        for i in range(1, n+1):
            ans = (ans * i) % modulus
    else:
        #Fancypants method for large n
        for i in range(1, modulus-n):
            ans = (ans * i) % modulus
        ans = modinv(ans, modulus)
        #Since m is an odd prime, (-1)^(m-n) = -1 if n is even, +1 if n is odd
        if n % 2 == 0:
            ans = -1*ans + modulus
    return ans % modulus
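A quick sanity check (my own snippet; it assumes the factorialMod definition above together with the modinv helper):

import math

m = 1117   # an odd prime
for n in (5, 600, 1100, 1116):
    assert factorialMod(n, m) == math.factorial(n) % m
print("ok")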
If you don't need an exact value, life gets a bit easier - you can use Stirling's approximation to calculate an approximate value in O(log n) time (using exponentiation by squaring).
Finally, I should mention that if this is time-critical and you're using Python, try switching to C++. From personal experience, you should expect about an order-of-magnitude increase in speed or more, simply because this is exactly the sort of CPU-bound tight-loop that natively-compiled code excels at (also, for whatever reason, GMP seems much more finely-tuned than Python's Bignum).
Expanding my comment to an answer:
Yes, there are more efficient ways to do this, but they are extremely messy.
So unless you really need that extra performance, I don't suggest trying to implement them.
The key is to note that the modulus (which is essentially a division) is going to be the bottleneck operation. Fortunately, there are some very fast algorithms that allow you to perform modulus over the same number many times.
Division by Invariant Integers using Multiplication
Montgomery Reduction
These methods are fast because they essentially eliminate the modulus.
Those methods alone should give you a moderate speedup. To be truly efficient, you may need to unroll the loop to allow for better IPC:
Something like this:
ans0 = 1
ans1 = 1
for i in range(1, (n+1) // 2):
    ans0 = ans0 * (2*i + 0) % modulus
    ans1 = ans1 * (2*i + 1) % modulus
return ans0 * ans1 % modulus
but taking into account an odd number of iterations and combining it with one of the methods I linked to above.
Some may argue that loop-unrolling should be left to the compiler. I will counter-argue that compilers are currently not smart enough to unroll this particular loop. Have a closer look and you will see why.
Note that although my answer is language-agnostic, it is meant primarily for C or C++.
n! mod m can be computed in O(n^(1/2 + ε)) operations instead of the naive O(n). This requires the use of FFT polynomial multiplication, and is only worthwhile for very large n, e.g. n > 10^4.
An outline of the algorithm and some timings can be seen here: http://fredrikj.net/blog/2012/03/factorials-mod-n-and-wilsons-theorem/
If we want to calculate M = a*(a+1) * ... * (b-1) * b (mod p), we can use the following approach, assuming we can add, subtract and multiply fast (mod p), and get a running-time complexity of O( sqrt(b-a) * polylog(b-a) ).
For simplicity, assume (b-a+1) = k^2 is a square. Now we can divide our product into k parts, i.e. M = [a*..*(a+k-1)] * ... * [(b-k+1)*..*b]. Each of the factors in this product is of the form p(x) = x*(x+1)*..*(x+k-1), for an appropriate x.
By using a fast polynomial multiplication algorithm, such as the Schönhage–Strassen algorithm, in a divide & conquer manner, one can find the coefficients of the polynomial p(x) in O( k * polylog(k) ). There is also an algorithm for evaluating the same degree-k polynomial at k points in O( k * polylog(k) ), which means we can calculate p(a), p(a+k), ..., p(b-k+1) fast.
This algorithm for evaluating a polynomial at many points is described in the book "Prime Numbers" by C. Pomerance and R. Crandall. Once you have these k values, you can multiply them together in O(k) and get the desired value.
Note that all of our operations are taken (mod p).
The exact running time is O(sqrt(b-a) * log(b-a)^2 * log(log(b-a))).
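To make the block structure concrete, here is a deliberately naive sketch (my own illustration, not the real algorithm: the block products below are evaluated directly, so this version is still O(n); the O(sqrt(n)) bound comes from building p(x) with fast polynomial multiplication and evaluating it at all block starting points with fast multipoint evaluation):

import math

def factorial_mod_blocks(n, p):
    # n! mod p by grouping the factors 1..n into blocks of size k ~ sqrt(n);
    # each block is poly(x) = x*(x+1)*...*(x+k-1) evaluated at x = 1, k+1, 2k+1, ...
    k = max(1, math.isqrt(n))
    ans, x = 1, 1
    while x + k - 1 <= n:
        block = 1
        for j in range(k):
            block = block * (x + j) % p
        ans = ans * block % p
        x += k
    while x <= n:                 # leftover factors that don't fill a block
        ans = ans * x % p
        x += 1
    return ans

print(factorial_mod_blocks(100, 1117) == math.factorial(100) % 1117)   # True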
Expanding on my comment, this takes about 50% of the time for all n in [100, 100007] where m = 117 or 1117:
Function facmod(n As Integer, m As Integer) As Integer
    Dim f As Integer = 1
    For i As Integer = 2 To n
        f = f * i
        If f > m Then
            f = f Mod m
        End If
    Next
    Return f
End Function
I found the following function on Quora:
With f(n,m) = n! mod m;
function f(n, m: int64): int64;
begin
    if n = 1 then f := 1
    else f := ((n mod m) * (f(n-1, m) mod m)) mod m;
end;
This probably beats using a time-consuming loop and multiplying large numbers stored in strings. Also, it is applicable to any integer m.
The link where I found this function : https://www.quora.com/How-do-you-calculate-n-mod-m-where-n-is-in-the-1000s-and-m-is-a-very-large-prime-number-eg-n-1000-m-10-9+7
If n = m - 1 for a prime m, then by Wilson's theorem (http://en.wikipedia.org/wiki/Wilson's_theorem), n! mod m = m - 1.
Also, as has already been pointed out, n! mod m = 0 if n ≥ m.
Assuming that the "mod" operator of your chosen platform is sufficiently fast, you're bounded primarily by the speed at which you can calculate n! and the space you have available to compute it in.
Then it's essentially a 2-step operation:
Calculate n! (there are lots of fast algorithms so I won't repeat any here)
Take the mod of the result
There's no need to complexify things, especially if speed is the critical component. In general, do as few operations inside the loop as you can.
If you need to calculate n! mod m repeatedly, then you may want to memoize the values coming out of the function doing the calculations. As always, it's the classic space/time tradeoff, but lookup tables are very fast.
Lastly, you can combine memoization with recursion (and trampolines as well if needed) to get things really fast.

What is the BigO of linear regression?

How large a system is it reasonable to attempt to do a linear regression on?
Specifically: I have a system with ~300K sample points and ~1200 linear terms. Is this computationally feasible?
The linear regression is computed as (X'X)^-1 X'Y.
If X is an (n x k) matrix:
(X' X) takes O(n*k^2) time and produces a (k x k) matrix
The matrix inversion of a (k x k) matrix takes O(k^3) time
(X' Y) takes O(n*k) time and produces a (k x 1) vector
The final multiplication of the (k x k) inverse by that (k x 1) vector takes O(k^2) time
So the Big-O running time is O(k^2*(n + k)).
See also: http://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra
If you get fancy it looks like you can get the time down to O(k^2*(n+k^0.376)) with the Coppersmith–Winograd algorithm.
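A minimal NumPy sketch of this computation (my own illustration; the sizes are scaled down from the question's 300K x 1200, and in practice you might prefer np.linalg.lstsq over forming X'X explicitly):

import numpy as np

n, k = 30_000, 120               # scaled-down stand-ins for 300K samples, 1200 terms
X = np.random.randn(n, k)
Y = np.random.randn(n)

XtX = X.T @ X                    # O(n*k^2) -- the dominant cost when n >> k
XtY = X.T @ Y                    # O(n*k)
beta = np.linalg.solve(XtX, XtY) # O(k^3); solving is cheaper and more stable than inverting

print(beta.shape)                # (k,)

For the actual sizes in the question (n = 300K, k = 1200), the O(n*k^2) term is on the order of 4*10^11 multiply-adds, which is feasible on ordinary hardware with a decent BLAS.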
You can express this as a matrix equation A·x = b,
where the matrix A is 300K rows and 1200 columns, the coefficient vector x is 1200x1, and the RHS vector b is 300K x 1.
If you multiply both sides by the transpose of the matrix A, you have a square system of equations for the unknowns that's 1200x1200. You can use LU decomposition or any other algorithm you like to solve for the coefficients. (This is what least squares is doing.)
So the Big-O behavior is something like O(m*n^2), where m = 300K and n = 1200. You'd account for the transpose, the matrix multiplication, the LU decomposition, and the forward-back substitution to get the coefficients.
The linear regression is computed as (X'X)^-1 X'y.
As far as I learned, y is a vector of results (or in other words: dependant variables).
Therefore, if X is an (n × m) matrix and y is an (n × 1) matrix:
The transposing of a (n × m) matrix takes O(n⋅m) time and produces a (m × n) matrix
(X' X) takes O(n⋅m²) time and produces a (m × m) matrix
The matrix inversion of a (m × m) matrix takes O(m³) time
(X' y) takes O(n⋅m) time and produces a (m × 1) matrix
The final matrix multiplication of an (m × m) matrix and an (m × 1) matrix takes O(m²) time
So the Big-O running time is O(n⋅m + n⋅m² + m³ + n⋅m + m²).
Now, we know that:
m² ≤ m³
n⋅m ≤ n⋅m²
so asymptotically, the actual Big-O running time is O(n⋅m² + m³) = O(m²(n + m)).
And that's what we have from
http://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra
But, we know that there's a significant difference between the case n → ∞ and m → ∞.
https://en.wikipedia.org/wiki/Big_O_notation#Multiple_variables
So which one should we choose? Obviously it's the number of observations which is more likely to grow, rather than the number of attributes.
So my conclusion is that if we assume the number of attributes remains constant, we can ignore the m terms; that's a relief, because the time complexity of multivariate linear regression then becomes a mere linear O(n). On the other hand, we should expect the computing time to explode when the number of attributes increases substantially.
The closed-form linear regression model is computed as follows:
The derivative of the residual sum of squares is
∇RSS(W) = -2 H^t (y - HW)
So we solve
-2 H^t (y - HW) = 0
Then the value of W is
W = (H^t H)^-1 H^t y
where:
W: is the vector of estimated weights
H: is the N*D feature matrix, where N is the number of observations and D is the number of features
y: is the vector of actual values
Then, the complexity of
H^t H is O(N D^2)
The complexity of the matrix inversion is O(D^3)
So, the complexity of
(H^t H)^-1 is O(N D^2 + D^3)

Resources