How large a system is it reasonable to attempt to do a linear regression on?
Specifically: I have a system with ~300K sample points and ~1200 linear terms. Is this computationally feasible?
The linear regression is computed as (X'X)^-1 X'Y.
If X is an (n x k) matrix:
(X' X) takes O(n*k^2) time and produces a (k x k) matrix
The matrix inversion of a (k x k) matrix takes O(k^3) time
(X' Y) takes O(n*k) time and produces a (k x 1) vector, since Y is (n x 1)
The final multiplication of the (k x k) inverse by the (k x 1) vector takes O(k^2) time
So the Big-O running time is O(n*k^2 + k^3) = O(k^2*(n + k)), dominated by forming X'X and inverting it.
See also: http://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra
If you get fancy it looks like you can get the time down to O(k^2*(n+k^0.376)) with the Coppersmith–Winograd algorithm.
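For a sense of scale, here is a minimal feasibility sketch (NumPy assumed, random data standing in for your actual system) of exactly this computation at n = 300K, k = 1200; with a decent BLAS it should run in seconds to minutes, and X alone needs roughly 3 GB of RAM as float64:

# Minimal feasibility sketch: random stand-in data, n = 300K samples, k = 1200 terms.
import numpy as np

n, k = 300_000, 1_200
X = np.random.rand(n, k)          # design matrix, ~2.9 GB as float64
y = np.random.rand(n)

XtX = X.T @ X                     # O(n*k^2) -- the dominant term
Xty = X.T @ y                     # O(n*k)
beta = np.linalg.solve(XtX, Xty)  # O(k^3); solve() instead of an explicit inverse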
You can express this as a matrix equation:

A c = y

where the matrix A is 300K rows by 1200 columns, the coefficient vector c is 1200x1, and the RHS vector y is 300Kx1.
If you multiply both sides by the transpose of the matrix, you get the normal equations A'A c = A'y, a square 1200x1200 system of equations for the unknowns. You can use LU decomposition or any other algorithm you like to solve for the coefficients. (This is what least squares is doing.)
So the Big-O behavior is something like O(mn^2), where m = 300K and n = 1200. You'd account for the transpose multiplication A'A (O(mn^2)) and A'y (O(mn)), the LU decomposition (O(n^3)), and the forward/back substitution (O(n^2)) to get the coefficients.
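As a concrete sketch of that recipe (SciPy assumed; the data here is random filler), the explicit LU route looks like this:

# Form the 1200x1200 normal equations, then LU-factor and substitute.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

m, n = 300_000, 1_200
A = np.random.rand(m, n)            # the 300K x 1200 matrix
b = np.random.rand(m)               # the 300K x 1 RHS

AtA = A.T @ A                       # O(m*n^2)
Atb = A.T @ b                       # O(m*n)
lu, piv = lu_factor(AtA)            # O(n^3) LU decomposition
coeffs = lu_solve((lu, piv), Atb)   # O(n^2) forward/back substitution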
The linear regression is computed as (X'X)^-1 X'y.
As far as I learned, y is a vector of results (or in other words: the dependent variable).
Therefore, if X is an (n × m) matrix and y is an (n × 1) matrix:
The transposing of a (n × m) matrix takes O(n⋅m) time and produces a (m × n) matrix
(X' X) takes O(n⋅m²) time and produces a (m × m) matrix
The matrix inversion of a (m × m) matrix takes O(m³) time
(X' y) takes O(n⋅m) time and produces a (m × 1) matrix
The final multiplication of an (m × m) matrix by an (m × 1) vector takes O(m²) time
So the Big-O running time is O(n⋅m + n⋅m² + m³ + n⋅m + m²).
Now, we know that:
m² ≤ m³
n⋅m ≤ n⋅m²
so asymptotically, the actual Big-O running time is O(n⋅m² + m³) = O(m²(n + m)).
And that's what we have from
http://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra
But, we know that there's a significant difference between the case n → ∞ and m → ∞.
https://en.wikipedia.org/wiki/Big_O_notation#Multiple_variables
So which one should we choose? Obviously it's the number of observations which is more likely to grow, rather than the number of attributes.
So my conclusion is that if we assume the number of attributes remains constant, we can ignore the m terms, and that's a relief: the time complexity of a multivariate linear regression becomes merely linear, O(n). On the other hand, we should expect the computing time to explode when the number of attributes increases substantially.
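A quick way to convince yourself of the linear-in-n behavior is a timing experiment (my own harness, not part of the derivation above): hold m fixed and double n.

# With m fixed, each doubling of n should roughly double the runtime.
import time
import numpy as np

m = 200                                  # attributes held constant
for n in (50_000, 100_000, 200_000):     # observations grow
    X = np.random.rand(n, m)
    y = np.random.rand(n)
    t0 = time.perf_counter()
    np.linalg.solve(X.T @ X, X.T @ y)
    print(f"n={n:>7}: {time.perf_counter() - t0:.3f}s")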
The closed-form solution for linear regression is computed as follows. The derivative of the residual sum of squares is
RSS'(W) = -2H^t (y - HW)
So, we solve for
-2H^t (y-HW) = 0
Then, the W value is
W = (H^t H)^-1 H^t y
where:
W: is the vector of expected weights
H: is the features matrix N*D where N is the number of observations, and D is the number of features
y: is the vector of actual values, of length N
Then, the complexity of
H^t H is O(N·D^2)
The complexity of inverting the resulting (D × D) matrix is O(D^3)
So, the overall complexity of computing
(H^t H)^-1 is O(N·D^2 + D^3)
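In code, the corrected closed form is one line (a NumPy sketch with made-up sizes; variable names follow the answer):

import numpy as np

N, D = 10_000, 50
H = np.random.rand(N, D)               # features matrix, N x D
y = np.random.rand(N)                  # actual values, length N

# W = (H^t H)^-1 H^t y at O(N*D^2 + D^3); solve() avoids the explicit inverse.
W = np.linalg.solve(H.T @ H, H.T @ y)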
What advantage does the function
N(x;θ) = θ1(θ2*x)
have over
G(x;θ) = θ*x
for an input vector
x ∈ R^n
θ1 ∈ R^(nx1)
θ2 ∈ R^(1xn)
θ ∈ R^(nxn)
For the first case, θ2 with dimension 1xn is multiplied with x with dimension nx1, which gives a 1x1 output. That is then multiplied by θ1 with dimension nx1, so the output dimension of N(x;θ) is nx1. There are n elements in θ2 and n elements in θ1, so n+n (2n) elements in total.
For the second case, θ with dimension nxn is multiplied with x with dimension nx1, which gives G(x;θ) an output dimension of nx1. In this case there are n*n (n^2) elements in θ.
Therefore, the advantage is that the first case is computationally cheaper to calculate than the second, and it stores far fewer parameters (2n versus n^2).
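A small NumPy sketch makes the counts concrete (the sizes are made up for illustration):

import numpy as np

n = 1_000
x = np.random.rand(n, 1)            # input vector, n x 1

theta1 = np.random.rand(n, 1)       # n x 1
theta2 = np.random.rand(1, n)       # 1 x n
theta = np.random.rand(n, n)        # n x n

N_x = theta1 @ (theta2 @ x)         # ~2n multiplies, output n x 1
G_x = theta @ x                     # ~n^2 multiplies, output n x 1

print(theta1.size + theta2.size)    # 2000 parameters
print(theta.size)                   # 1000000 parameters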
I'm taking an algorithms class and I repeatedly have trouble when I'm asked to analyze the runtime of code when there is a line with multiplication or division. How can I find big-theta of multiplying an n digit number with an m digit number (where n>m)? Is it the same as multiplying two n digit numbers?
For example, right now I'm attempting to analyze the following line of code:
return n*count/100
where count is at most 100. Is the asymptotic complexity of this any different from n*n/100? or n*n/n?
You can always look it up here: Computational complexity of mathematical operations.
In your case, the complexity of n*count/100 is O(length(n)), as 100 is a constant and length(count) is at most 3.
In general, multiplication of two numbers of n and m digits takes O(nm), and the same holds for division; here I assume we are talking about schoolbook long multiplication and long division. There are many sophisticated algorithms which beat this complexity.
To make things clearer, I will provide an example. Suppose you have three numbers:
A - n digits length
B - m digits length
C - p digits length
Find complexity of the following formula:
A * B / C
Multiply first. The complexity of A * B is O(nm), and as a result we have a number D, which is n+m digits long. Now consider D / C: here the complexity is O((n+m)p), so the overall complexity is the sum of the two, O(nm + (n+m)p) = O(m(n+p) + np).
Divide first. We divide B / C: the complexity is O(mp), and we get a number E of at most m digits. Now we calculate A * E: here the complexity is O(nm). Again the overall complexity is O(mp + nm) = O(m(n+p)).
From the analysis you can see that it is beneficial to divide first. Of course, in a real-life situation you would account for numerical stability as well.
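To see the digit counts in action, here is a toy demonstration with Python's arbitrary-precision integers (the sizes are my own choice; note that with integer division the two orders differ by rounding, so this illustrates the complexity argument, not exact equality):

import random

A = random.randrange(10**9999, 10**10000)  # n = 10000 digits
B = random.randrange(10**4999, 10**5000)   # m = 5000 digits
C = random.randrange(10**999, 10**1000)    # p = 1000 digits

multiply_first = (A * B) // C   # intermediate D = A*B has n+m digits
divide_first = A * (B // C)     # intermediate E = B//C has only ~m-p digits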
From Modern Computer Arithmetic:
Assume the larger operand has size m, and the smaller has size n ≤ m, and denote by M(m,n) the corresponding multiplication cost.

When m is an exact multiple of n, say m = kn, a trivial strategy is to cut the larger operand into k pieces, giving M(kn,n) = kM(n) + O(kn).

Suppose m ≥ n and n is large. To use an evaluation-interpolation scheme, we need to evaluate the product at m + n points, whereas balanced k by k multiplication needs 2k points. Taking k ≈ (m+n)/2, we see that M(m,n) ≤ M((m+n)/2)(1 + o(1)) as n → ∞. On the other hand, from the discussion above, we have M(m,n) ≤ ⌈m/n⌉M(n)(1 + o(1)).
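A toy Python illustration of the cutting strategy (my own sketch; built-in int multiplication stands in for the balanced M(n) products, and I chunk in base 10 for readability):

def unbalanced_mul(a, b, n):
    """Multiply a (k*n digits) by b (n digits) as k balanced n-digit products."""
    result, shift = 0, 0
    while a:
        a, piece = divmod(a, 10**n)        # peel off the low n digits of a
        result += (piece * b) * 10**shift  # one balanced n x n digit product
        shift += n
    return result

assert unbalanced_mul(123456789, 123, 3) == 123456789 * 123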
So I read that Strassen's matrix multiplication algorithm has complexity O(n^2.8)
but it works only if A is n x n and B is n x n
What if
A is m x n and B is n x o
and m is much, much bigger than n and o, but n and o are still very big
Padding with zeroes might make the multiplication take longer
I'm doing a project that requires multiplication of such matrices, so I was hoping to get some advice
Should I use the conventional algorithm or is there a way to modify Strassen's algorithm to do it faster?
https://en.m.wikipedia.org/wiki/Strassen_algorithm
A product of size [2N x N] * [N x 10N] can be done as 20 separate [N x N] * [N x N] operations, arranged to form the result;
A product of size [N x 10N] * [10N x N] can be done as 10 separate [N x N] * [N x N] operations, summed to form the result.
These techniques will make the implementation more complicated, compared to simply padding to a power-of-two square; however, it is a reasonable assumption that anyone undertaking an implementation of Strassen, rather than conventional, multiplication, will place a higher priority on computational efficiency than on simplicity of the implementation.
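As a sketch of the blocking idea (NumPy's matmul standing in for a real Strassen kernel), here is the second decomposition, an [N x 10N] * [10N x N] product done as ten N x N products that are summed:

import numpy as np

N = 64
A = np.random.rand(N, 10 * N)
B = np.random.rand(10 * N, N)

C = np.zeros((N, N))
for i in range(10):
    # Each term is an N x N by N x N product; swap in Strassen here.
    C += A[:, i*N:(i+1)*N] @ B[i*N:(i+1)*N, :]

assert np.allclose(C, A @ B)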
I got this question wrong on an exam: Name a function that is neither O(n) nor Omega(n).
After attempting to learn this stuff on my own through YouTube, I'm thinking this may be a correct answer:
n^3 (1 + sin n) is neither O(n) nor Omega(n).
Would that be accurate?
Name a function that is neither O(n) nor Omega(n)
Saying f ∈ O(g) means the quotient
f(x)/g(x)
is bounded from above for all sufficiently large x.
f ∈ Ω(g) on the other hand means the quotient
f(x)/g(x)
is bounded below away from zero for all sufficiently large x.
So to find a function that is neither O(n) nor Ω(n) means finding a function f such that the quotient
f(x)/x
becomes arbitrarily large, and arbitrarily close to zero on every interval [y, ∞).
I'm thinking this may be a correct answer: (n^3 (1 + sin n)) is neither O(n) nor Omega(n).
Let's plug it in our quotient:
(n^3*(1 + sin n))/n = n^2*(1 + sin n)
The n^2 grows to infinity, and the factor 1 + sin n is larger than 1 for roughly three out of every six n. So on every interval [y, ∞) the quotient becomes arbitrarily large. Given an arbitrary K > 0, let N_0 = y + K + 1 and let N_1 be the smallest of N_0 + i, i = 0, 1, ..., 4, such that sin(N_0 + i) > 0. Then f(N_1)/N_1 > (y + K + 1)² > K² + K > K.
For the Ω(n) part (showing that the quotient also gets arbitrarily close to zero), it's not so easy to prove, although I believe it is satisfied.
But, we can modify the function a bit, retaining the idea of multiplying a growing function with an oscillating one in such a way that the proof becomes simple.
Instead of sin n, let us choose cos (π*n), and, to offset the zeros, add a fast decreasing function to it.
f'(n) = n^3*(1 + cos (π*n) + 1/n^4)
now,
/ n^3*(2 + 1/n^4), if n is even
f'(n) = <
\ 1/n , if n is odd
and it is obvious that f' is neither bounded from above nor bounded from below by any positive constant multiple of n.
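If you want to see it numerically, a few evaluations of the quotient f'(n)/n = n^2*(1 + cos(π*n) + 1/n^4) suffice (a quick check I added; cos(π*n) is only ±1 up to floating-point error here):

import math

for n in (10, 11, 100, 101, 1000, 1001):
    q = n**2 * (1 + math.cos(math.pi * n) + 1 / n**4)
    print(n, q)   # even n: ~2n^2, unbounded; odd n: ~1/n^2, vanishing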
I would consider something like a binary search. This is both O(log N) and Ω(log N). Since Omega is defined as a lower bound, the bound isn't allowed to exceed the function itself -- so a Θ(log N) function is definitely not Ω(N).
I think some of the comments on the deleted answer deserve some...clarification -- perhaps even outright correction. To quote from CLRS, "Ω-notation gives a lower bound for a function to within a constant factor."
Since N² exceeds N by more than a constant factor, N is not Ω(N²).
How to find the nth Tribonacci number with the matrix multiplication method if the initial values are some arbitrary numbers, say 1, 2, 3, i.e. T(1) = 1, T(2) = 2 and T(3) = 3?
If T(n) = T(n-1) + T(n-2) + T(n-3), then how do I find T(n) when n is very large? I would appreciate it if anyone could explain the matrix multiplication method, and how to construct the initial matrix.
The matrix multiplication method involves using the matrix recurrence relation.
For the Fibonacci series, we can define a vector of length 2 to represent adjacent Fibonacci numbers. Using this vector, we can define a recurrence relation with a matrix multiplication:

| F(n)   |   | 1 1 |   | F(n-1) |
| F(n-1) | = | 1 0 | * | F(n-2) |
Similarly, the Tribonacci series recurrence relation can be written in this way:

| T(n)   |   | 1 1 1 |   | T(n-1) |
| T(n-1) | = | 1 0 0 | * | T(n-2) |
| T(n-2) |   | 0 1 0 |   | T(n-3) |
The only difference is that the vector and matrix sizes are different.
Now, to calculate a large Tribonacci number, we just apply the matrix multiplication n times, and we get:

| T(n+3) |         | T(3) |
| T(n+2) | = M^n * | T(2) |
| T(n+1) |         | T(1) |

where M is the 3x3 recurrence matrix above.
The matrix to the power of n (M^n) can be efficiently calculated, because we can use an exponentiation algorithm.
Many efficient exponentiation algorithms for scalars are described by Wikipedia in Exponentiation by Squaring. We can use the same idea for matrix exponentiation.
I will describe a simple way to do this. First we write n as a binary number, eg:
n = 37 = 100101
Then, calculate M to each power of 2 by squaring the previous power of 2: M^1, M^2 = M^1·M^1, M^4 = M^2·M^2, M^8 = M^4·M^4, M^16 = M^8·M^8, M^32 = M^16·M^16, ...
And finally, multiply the powers of M corresponding to the binary digits of n. In this case, M^n = M^1·M^4·M^32.
After calculating that, we can multiply the matrix with the Tribonacci vector for the first 3 values, i.e. [T(3), T(2), T(1)] = [3, 2, 1], exactly as in the equation above.
Because the matrices have fixed size, each matrix multiplication takes constant time. We must do O(log n) matrix multiplications. Thus, we can calculate the nth Tribonacci number in O(log n) time.
Compare this to the normal dynamic programming method, which takes O(n) time by calculating each Tribonacci number up to the nth (i.e. for (i = 3 to n) { T[i] = T[i-1] + T[i-2] + T[i-3]; } return T[n];).
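Here is a self-contained Python sketch of the whole scheme (my own code following the description above; Python's big integers keep the result exact for large n):

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_pow(M, e):
    # exponentiation by squaring, O(log e) matrix multiplications
    R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # identity
    while e:
        if e & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        e >>= 1
    return R

def tribonacci(n, t1=1, t2=2, t3=3):
    """T(n) with T(1)=t1, T(2)=t2, T(3)=t3."""
    if n <= 3:
        return (t1, t2, t3)[n - 1]
    M = [[1, 1, 1], [1, 0, 0], [0, 1, 0]]
    P = mat_pow(M, n - 3)
    # [T(n), T(n-1), T(n-2)] = P * [T(3), T(2), T(1)]
    return P[0][0] * t3 + P[0][1] * t2 + P[0][2] * t1

print(tribonacci(6))   # T(4)=6, T(5)=11, T(6)=20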
I will assume that you know how to code up matrix multiplication in the language of your choice.
Consider:
| a1 b1 c1 |
[f(n) f(n - 1) f(n - 2)] * | a2 b2 c2 | = [f(n + 1) f(n) f(n - 1)]
| a3 b3 c3 |
Find the unknowns in the matrix based on that and that will be the matrix you want.
The answer in this case is:
1 1 0
1 0 1
1 0 0
The method is general, however: it works even if you sum k previous terms, and even if they have constants in front of them, etc.
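For instance, a quick NumPy check (my own, using the matrix just found and the starting values from the question) confirms the row-vector convention:

import numpy as np

M = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 0, 0]])

v = np.array([3, 2, 1])   # [T(3), T(2), T(1)]
for _ in range(3):
    v = v @ M             # one step of the recurrence
print(v)                  # [20 11  6] = [T(6), T(5), T(4)]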