Does anyone know how to implement principal component analysis (PCA) on an m-by-n matrix in MATLAB for normalization?
Assuming each column is a sample (that is, you have n samples, each of dimension m) stored in a matrix A, you first have to subtract the mean sample, i.e. the mean of each row taken across the columns:
Amm = bsxfun(@minus,A,mean(A,2));
then you want to do an eigenvalue decomposition on 1/size(Amm,2)*Amm*Amm' (you can use 1/(size(Amm,2)-1) as a scale factor if you want an interpretation as an unbiased covariance matrix) with:
[v,d] = eig(1/size(Amm,2)*Amm*Amm');
And the columns of v are going to be your PCA vectors. The entries of d are going to be your corresponding "variances".
However, if your m is huge then this is not the best way to go because storing Amm*Amm' is not practical. You want to instead compute:
[u,s,v] = svd(1/sqrt(size(Amm,2))*Amm,'econ');
This time u contains your PCA vectors. The diagonal entries of s are the square roots of the corresponding eigenvalues in d.
Note: there's another way to go if m is huge, i.e. computing eig(1/size(Amm,2)*Amm'*Amm); (notice the switch of transposes as compared to above) and doing a little trickery, but it's a longer explanation so I won't get into it.
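For readers who want the gist of that trickery, here is a short derivation sketch. It writes B for the centered m-by-n matrix Amm and n for the number of samples; w, u and λ are names introduced here for illustration only. It shows that an eigenvector of the small n-by-n problem maps to an eigenvector of the large m-by-m problem with the same non-zero eigenvalue.

```latex
% B is the centered m-by-n data matrix (Amm above), n the number of samples.
\[
\frac{1}{n} B^{\top} B \, w = \lambda w
\;\Longrightarrow\;
\frac{1}{n} B B^{\top} (B w)
  = B \left( \frac{1}{n} B^{\top} B \, w \right)
  = \lambda \, (B w)
\]
% Normalizing B w gives a unit-length PCA vector u (valid for lambda > 0, ||w|| = 1),
% because ||B w||^2 = w^T B^T B w = n * lambda.
\[
u = \frac{B w}{\lVert B w \rVert} = \frac{B w}{\sqrt{n \lambda}}
\]
```

So one can decompose the n-by-n matrix and multiply each eigenvector by Amm to recover the PCA vectors that belong to non-zero eigenvalues.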
I'm working on a program that sorts individuals into teams based on a sparse matrix with binary entries, each entry corresponding to whether or not i is willing to work with j and so on. I have the program running, but I need to be able to test it on random matrices to observe some relationships between the results and the parameters.
What I'd like to find is some way to generate a matrix that has a certain number of non-zero entries per row and a certain probability of symmetrical entries. That is, I want to be able to assign a specific number for P(w_ji = 1 | w_ij = 1) and use that to generate a matrix. I don't want symmetric matrices, but modeling this with completely random matrices would be inaccurate, since a real-world willingness matrix tends to be at least somewhat symmetric.
Does anyone know of anything I could use to generate such a matrix? I generally use python (with gurobi) and am open to installing any number of other libraries to help if I have to. If anyone else here uses gurobi, I would appreciate input on whether or not I could model matrix generation like this as an optimization problem using something like this for an objective function:
min <= sum(w[i,j] * w[j,i] for i in... for j in...) <= max
Thank you!
If all you want is a coefficient matrix with random distribution of 0 and 1 values, the easiest option is to pick a probability and do Bernoulli trials as to whether the value is 1. (If it is zero, omit the element for sparseness).
Alternately, if you need a random permutation of a fixed number of 0's and 1's, then try something like:
import random
n = 50
k = 10
positions = sorted(random.sample(range(n), k))
The list positions represents the nonzero elements you need.
With a matrix representation, this would be a good candidate for the Gurobi matrix variable object, MVar.
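For the reciprocity requirement in the question, one possible extension of the Bernoulli idea is to draw each unordered pair (i, j) once and decide both directions together. The sketch below is only an illustration: the function name `random_willingness` and its parameters are made up here, and it controls the expected density rather than an exact number of non-zeros per row.

```python
import random

def random_willingness(n, density, reciprocity, seed=None):
    """Return an n-by-n 0/1 matrix (list of lists) with a zero diagonal.

    density     : target P(w[i][j] = 1) for i != j
    reciprocity : target P(w[j][i] = 1 | w[i][j] = 1)
    Hypothetical helper; row sums are random, not fixed.
    """
    p_both = density * reciprocity          # both directions set
    p_one = density * (1.0 - reciprocity)   # exactly one given direction set
    if p_both + 2.0 * p_one > 1.0:
        raise ValueError("density * (2 - reciprocity) must not exceed 1")
    rng = random.Random(seed)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.random()
            if r < p_both:
                w[i][j] = w[j][i] = 1
            elif r < p_both + p_one:
                w[i][j] = 1
            elif r < p_both + 2.0 * p_one:
                w[j][i] = 1
    return w

w = random_willingness(200, density=0.1, reciprocity=0.6, seed=1)
```

With this construction P(w_ij = 1) = p_both + p_one = density and P(w_ji = 1 | w_ij = 1) = p_both / density = reciprocity. If an exact number of non-zeros per row is required, a different scheme (for example sampling positions per row as above and then repairing reciprocity) would be needed.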
Assume that multiplying a matrix G1 of dimension p×q with another matrix G2 of dimension q×r requires pqr scalar multiplications. Computing the product of n matrices G1G2G3...Gn can be done by parenthesizing in different ways. Define GiGi+1 as an explicitly computed pair for a given parenthesization if they are directly multiplied. For example, in the matrix multiplication chain G1G2G3G4G5G6 using parenthesization (G1(G2G3))(G4(G5G6)), G2G3 and G5G6 are the only explicitly computed pairs.
Consider a matrix multiplication chain F1F2F3F4F5, where matrices F1,F2,F3,F4 and F5 are of dimensions 2×25,25×3,3×16,16×1 and 1×1000, respectively. In the parenthesization of F1F2F3F4F5 that minimizes the total number of scalar multiplications, the explicitly computed pairs is/are
F1F2 and F3F4 only
F2F3 only
F3F4 only
F2F3 and F4F5 only
=======================================================================
My approach: I want to solve this in under one minute, but the only method I know is bottom-up dynamic programming with a table. The one thing I can conclude is that we should multiply by F5 last because it has 1000 in its dimensions. So how can I develop fast intuition for this kind of question?
======================================================================
Correct answer is F3F4
The most important thing to note is the dimension 1×1000. You need to watch out for it if you want to minimize the multiplications. What we are looking for is basically a way to multiply only small numbers with 1000.
If we go with F4F5, we would be doing 16×1×1000 = 16000 multiplications. But if we compute F3F4 first, the result has dimension 3×1, so going with F3F4 leaves us with small numbers like 3 and 1. So there is no way I'm going with F4F5.
By similar logic I would not go with F2F3: that would lose the small 3 and leave the bigger 25 and 16 to be used later with 1000.
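OK, for F1F2, you can quickly find that (F1F2)(F3F4) is not better than (F1(F2(F3F4))): the first costs 150 + 48 + 6 = 204 multiplications and the second 48 + 75 + 50 = 173, before the final multiplication by F5. So the answer is F3F4.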
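To double-check the intuition, the bottom-up table the asker mentions can be sketched in a few lines. This is the standard matrix-chain dynamic program, not a shortcut; the function name `matrix_chain` is just chosen for this example.

```python
def matrix_chain(dims):
    """dims[i] x dims[i+1] is the shape of matrix F(i+1).

    Returns (minimum scalar multiplications, optimal parenthesization).
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):               # chain length
        for i in range(n - length + 1):
            j = i + length - 1
            best = None
            for k in range(i, j):                # split into F(i..k) and F(k+1..j)
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if best is None or c < best:
                    best = c
                    split[i][j] = k
            cost[i][j] = best

    def paren(i, j):
        if i == j:
            return "F%d" % (i + 1)
        k = split[i][j]
        return "(" + paren(i, k) + paren(k + 1, j) + ")"

    return cost[0][n - 1], paren(0, n - 1)

total, order = matrix_chain([2, 25, 3, 16, 1, 1000])
print(total, order)  # 2173 ((F1(F2(F3F4)))F5)
```

The only pair of adjacent original matrices multiplied directly in that parenthesization is F3F4, which agrees with the answer above.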
I'm working with matrices having rank > 1. Is it possible to reduce the rank of a matrix to rank = 1 by substituting zeros for some values?
The rank of a matrix is the maximum number of linearly independent column vectors (or row vectors, which gives the same number, but I was taught to always use column vectors). So, if you're willing to lose a lot of the information about the transformation your matrix is defining, you could create a matrix that's just the first non-zero column of your matrix, and everything else set to zero. Guaranteed to be rank 1.
However, that loses a whole lot of information about the transformation. Perhaps a more useful thing to do would be project your matrix onto a space of size 1x1. There are ways to do this in such a way that can create an injection from your matrix to the new space, guaranteeing that no two matrices produce an equivalent result. The first one that comes to mind is:
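Let A be an n x m matrix with non-negative integer entries, with rows and columns indexed from 0.
Let P_i be the ith prime number.
Let F(A) = {product for i from 1 to (n * m)} P_i ^ (A_((i-1) div m),((i-1) mod m)). (A product rather than a sum is needed here: by unique factorization, the exponents, and hence the entries of A, can be recovered from F(A).)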
While this generates a single number, you can think of a single number as a 1 x 1 matrix, which, if non-zero, has rank 1.
All that being said, rank 1 matrices are kinda boring and you can do cooler stuff with matrices if you keep it at rank != 1. In particular, if you have an n x n matrix with rank n, a whole world of possibility opens up. It really depends on what you want to use these matrices for.
You might want to look at the singular value decomposition, which can be used to write your matrix as a sum of weighted outer products (see here). Choosing only the highest-weighted component of this sum will give you the closest rank-1 approximation to the decomposed matrix.
Most common linear algebra libraries (Eigen, OpenCV, NumPy) have an SVD implementation.
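If no SVD routine is at hand, the leading component can also be approximated by hand. The sketch below swaps in plain power iteration on A'A instead of a library SVD, so treat it as an illustration only: the function name `rank1_approximation` is made up, and it assumes the starting vector is not orthogonal to the dominant right singular vector.

```python
import math

def rank1_approximation(A, iterations=200):
    """Approximate the closest rank-1 matrix to A (list of rows).

    Uses power iteration on A^T A to find the dominant right singular
    vector v, then returns (A v) v^T, which equals sigma * u * v^T.
    """
    n, m = len(A), len(A[0])
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        u = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]  # u = A v
        w = [sum(A[i][j] * u[i] for i in range(n)) for j in range(m)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # start vector lies in the null space; result will be zero
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]
    return [[u[i] * v[j] for j in range(m)] for i in range(n)]

B = rank1_approximation([[3.0, 0.0], [0.0, 1.0]])
```

For the diagonal example only the entry belonging to the larger singular value (3) survives. For real work a library SVD is the safer choice, since power iteration converges slowly when the two largest singular values are close.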
Given an n-by-m matrix A, with it being guaranteed that n>m=rank(A), and given an n-by-1 column v, what is the fastest way to check if [A v] has rank strictly bigger than A?
For my application, A is sparse, n is about 2^12, and m is anywhere in 1:n-1.
Comparing rank(full([A v])) takes about a second on my machine, and I need to do it tens of thousands of times, so I would be very happy to discover a quicker way.
There is no need to do repeated solves IF you can afford to do ONE computation of a null space basis, namely for the null space of the transpose. Just one call to null will suffice. Given a new vector V, if the product of that basis with V is non-zero, then V will increase the rank of the matrix. For example, suppose we have the matrix M, which has a rank of 2.
M = [1 1;2 2;3 1;4 2];
nullM = null(M')';
Will a new column vector [1;1;1;1] increase the rank if we appended it to M?
nullM*[1;1;1;1]
ans =
-0.0321573705742971
-0.602164651199413
Yes, since it has a non-zero projection on at least one of the basis vectors in nullM.
How about this vector:
nullM*[0;0;1;1]
ans =
1.11022302462516e-16
2.22044604925031e-16
In this case, both numbers are essentially zero, so the vector in question would not have increased the rank of M.
The point is, only a simple matrix-vector multiplication is necessary once the null space basis has been generated. If your matrix is so large (and so nearly of full rank) that a call to null fails, then you will need to do more work. However, n = 4096 is not excessively large as long as the matrix does not have too many columns.
One alternative if null is too much is a call to svds, to find those singular vectors that are essentially zero. These will form the nullspace basis that we need.
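The same "precompute once, then test cheaply" idea can be sketched without any null-space routine by working with the complementary subspace. The Python sketch below is not the null-based method above: it builds an orthonormal basis of the column space with modified Gram-Schmidt and checks whether a new vector has a non-zero component outside it. The function names and the tolerance are assumptions made for this example.

```python
import math

def orthonormal_basis(columns, tol=1e-10):
    """Modified Gram-Schmidt on a list of column vectors (done once)."""
    basis = []
    for col in columns:
        r = list(col)
        for q in basis:
            c = sum(a * b for a, b in zip(q, r))
            r = [a - c * b for a, b in zip(r, q)]
        norm = math.sqrt(sum(a * a for a in r))
        if norm > tol:
            basis.append([a / norm for a in r])
    return basis

def increases_rank(basis, v, tol=1e-10):
    """True if v has a component outside the span of the basis."""
    r = list(v)
    for q in basis:
        c = sum(a * b for a, b in zip(q, r))
        r = [a - c * b for a, b in zip(r, q)]
    scale = max(1.0, math.sqrt(sum(a * a for a in v)))
    return math.sqrt(sum(a * a for a in r)) > tol * scale

# Columns of M = [1 1;2 2;3 1;4 2] from the example above.
basis = orthonormal_basis([[1, 2, 3, 4], [1, 2, 1, 2]])
print(increases_rank(basis, [1, 1, 1, 1]))  # True
print(increases_rank(basis, [0, 0, 1, 1]))  # False
```

Each test costs one pass over the basis vectors, comparable to the matrix-vector product with nullM. Which of the two bases is smaller depends on m: the column-space basis has m vectors, the null-space basis n-m.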
I would use sprank for sparse matrices. Check it out, it might be faster than any other method.
Edit: As pointed out correctly by @IanHincks, sprank returns the structural rank, which is only an upper bound on the actual rank. I am leaving the answer here, just in case someone else needs it in the future.
Maybe you can try to solve the system A*x=v in the least-squares sense; if the residual is essentially zero, the system has a solution, which means that the rank does not increase.
x = A\v;
norm(A*x-v) %% if this is small then the rank does not increase
I have lots of large (around 5000 x 5000) matrices that I need to invert in Matlab. I actually need the inverse, so I can't use mldivide instead, which is a lot faster for solving Ax=b for just one b.
My matrices are coming from a problem that means they have some nice properties. First off, their determinant is 1 so they're definitely invertible. They aren't diagonalizable, though, or I would try to diagonalize them, invert them, and then put them back. Their entries are all real numbers (actually rational).
I'm using Matlab for getting these matrices and for this stuff I need to do with their inverses, so I would prefer a way to speed Matlab up. But if there is another language I can use that'll be faster, then please let me know. I don't know a lot of other languages (a little bit of C and a little bit of Java), so if it's really complicated in some other language, then I might not be able to use it. Please go ahead and suggest it, though, just in case.
I actually need the inverse, so I can't use mldivide instead,...
That's not true, because you can still use mldivide to get the inverse. Note that A^{-1} = A^{-1} * I. In MATLAB, this is equivalent to
invA = A\speye(size(A));
On my machine, this takes about 10.5 seconds for a 5000x5000 matrix. Note that MATLAB does have an inv function to compute the inverse of a matrix. Although this will take about the same amount of time, it is less efficient in terms of numerical accuracy (more info in the link).
First off, their determinant is 1 so they're definitely invertible
Rather than det(A)=1, it is the condition number of your matrix that dictates how accurate or stable the inverse will be. Note that det(A) = ∏_{i=1}^{n} λ_i. So just setting λ_1 = M, λ_n = 1/M and λ_i = 1 for i ≠ 1, n will give you det(A)=1. However, as M → ∞, cond(A) = M^2 → ∞ and λ_n → 0, meaning your matrix is approaching singularity and there will be large numerical errors in computing the inverse.
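A tiny numeric illustration of that point, using a diagonal matrix so that the eigenvalues can be read off directly (the helper name and the choice M = 2^20 are only for this example; `math.prod` needs Python 3.8 or later):

```python
import math

def det_and_cond_of_diagonal(diag):
    """Determinant and 2-norm condition number of a diagonal matrix."""
    det = math.prod(diag)
    cond = max(abs(x) for x in diag) / min(abs(x) for x in diag)
    return det, cond

M = 2.0 ** 20
det, cond = det_and_cond_of_diagonal([M, 1.0, 1.0, 1.0 / M])
print(det, cond)  # 1.0 1099511627776.0, i.e. det = 1 while cond = M^2
```

The determinant stays at 1 no matter how large M gets, while the condition number grows like M^2.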
My matrices are coming from a problem that means they have some nice properties.
Of course, there are other more efficient algorithms that can be employed if your matrix is sparse or has other favorable properties. But without any additional info on your specific problem, there is nothing more that can be said.
I would prefer a way to speed Matlab up
MATLAB uses Gaussian elimination to compute the inverse of a general matrix (full rank, non-sparse, without any special properties) using mldivide, and this is Θ(n^3), where n is the size of the matrix. So, in your case, n=5000 and there are 1.25 x 10^11 floating point operations. So on a reasonable machine with about 10 Gflops of computational power, you're going to require at least 12.5 seconds to compute the inverse, and there is no way out of this unless you exploit the "special properties" (if they're exploitable).
Inverting an arbitrary 5000 x 5000 matrix is not computationally easy no matter what language you are using. I would recommend looking into approximations. If your matrices are low rank, you might want to try a low-rank approximation M = USV'
Here are some more ideas from math-overflow:
https://mathoverflow.net/search?q=matrix+inversion+approximation
First suppose the eigenvalues are all 1. Let A be the Jordan canonical form of your matrix. Then you can compute A^{-1} using only matrix multiplication and addition by
A^{-1} = I + (I-A) + (I-A)^2 + ... + (I-A)^k
where k < dim(A). Why does this work? Because generating functions are awesome. Recall the expansion
(1-x)^{-1} = 1/(1-x) = 1 + x + x^2 + ...
This means that we can invert (1-x) using an infinite sum. You want to invert a matrix A, so you want to take
A = I - X
Solving for X gives X = I-A. Therefore by substitution, we have
A^{-1} = (I - (I-A))^{-1} = I + (I-A) + (I-A)^2 + ...
Here I've just used the identity matrix I in place of the number 1. Now we have the problem of convergence to deal with, but this isn't actually a problem. By the assumption that A is in Jordan form and has all eigenvalues equal to 1, we know that A is upper triangular with all 1s on the diagonal. Therefore I-A is upper triangular with all 0s on the diagonal. Therefore all eigenvalues of I-A are 0, so its characteristic polynomial is x^dim(A) and its minimal polynomial is x^{k+1} for some k < dim(A). Since a matrix satisfies its minimal (and characteristic) polynomial, this means that (I-A)^{k+1} = 0. Therefore the above series is finite, with the largest nonzero term being (I-A)^k. So it converges.
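As a small sanity check of the finite series, here is a Python sketch for an upper triangular matrix with 1s on the diagonal. The helper names are made up for this example, and it simply sums all powers up to dim(A)-1 rather than stopping at the minimal-polynomial degree.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def unipotent_inverse(A):
    """Invert an upper triangular matrix with ones on the diagonal via
    A^{-1} = I + (I-A) + (I-A)^2 + ... + (I-A)^(n-1)."""
    n = len(A)
    I = [[int(i == j) for j in range(n)] for i in range(n)]
    N = [[I[i][j] - A[i][j] for j in range(n)] for i in range(n)]  # nilpotent
    inv = [row[:] for row in I]
    power = [row[:] for row in I]
    for _ in range(n - 1):          # (I-A)^n = 0, so n-1 terms suffice
        power = matmul(power, N)
        inv = [[inv[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return inv

A = [[1, 2, 3],
     [0, 1, 4],
     [0, 0, 1]]
A_inv = unipotent_inverse(A)
print(A_inv)  # [[1, -2, 5], [0, 1, -4], [0, 0, 1]]
```

Multiplying A by the result gives the identity, and only matrix products and sums were used.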
Now, for the general case, put your matrix into Jordan form, so that you have a block diagonal matrix, e.g.:
A 0 0
0 B 0
0 0 C
Where each block has a single value along the diagonal. If that value is a for A, then use the above trick to invert 1/a * A, and then divide the result by a to get A^{-1}, since A^{-1} = 1/a * (1/a * A)^{-1}. Since the full matrix is block diagonal the inverse will be
A^{-1} 0 0
0 B^{-1} 0
0 0 C^{-1}
There is nothing special about having three blocks, so this works no matter how many you have.
Note that this trick works whenever you have a matrix in Jordan form. The computation of the inverse in this case will be very fast in Matlab because it only involves matrix multiplication, and you can even use tricks to speed that up since you only need powers of a single matrix. This may not help you, though, if it's really costly to get the matrix into Jordan form.