I'm currently trying to develop a small matrix-oriented math library (I'm using Eigen 3 for matrix data structures and operations) and I wanted to implement some handy Matlab functions, such as the widely used backslash operator (which is equivalent to mldivide ) in order to compute the solution of linear systems (expressed in matrix form).
Is there any good detailed explanation on how this could be achieved ? (I've already implemented the Moore-Penrose pseudoinverse pinv function with a classical SVD decomposition, but I've read somewhere that A\b isn't always pinv(A)*b , at least Matalb doesn't simply do that)
Thanks
For x = A\b, the backslash operator encompasses a number of algorithms to handle different kinds of input matrices. So the matrix A is diagnosed and an execution path is selected according to its characteristics.
The following page describes in pseudo-code when A is a full matrix:
if size(A,1) == size(A,2) % A is square
if isequal(A,tril(A)) % A is lower triangular
x = A \ b; % This is a simple forward substitution on b
elseif isequal(A,triu(A)) % A is upper triangular
x = A \ b; % This is a simple backward substitution on b
else
if isequal(A,A') % A is symmetric
[R,p] = chol(A);
if (p == 0) % A is symmetric positive definite
x = R \ (R' \ b); % a forward and a backward substitution
return
end
end
[L,U,P] = lu(A); % general, square A
x = U \ (L \ (P*b)); % a forward and a backward substitution
end
else % A is rectangular
[Q,R] = qr(A);
x = R \ (Q' * b);
end
For non-square matrices, QR decomposition is used. For square triangular matrices, it performs a simple forward/backward substitution. For square symmetric positive-definite matrices, Cholesky decomposition is used. Otherwise LU decomposition is used for general square matrices.
Update: MathWorks has updated the algorithm section in the doc page of mldivide with some nice flow charts. See here and here (full and sparse cases).
All of these algorithms have corresponding methods in LAPACK, and in fact it's probably what MATLAB is doing (note that recent versions of MATLAB ship with the optimized Intel MKL implementation).
The reason for having different methods is that it tries to use the most specific algorithm to solve the system of equations that takes advantage of all the characteristics of the coefficient matrix (either because it would be faster or more numerically stable). So you could certainly use a general solver, but it wont be the most efficient.
In fact if you know what A is like beforehand, you could skip the extra testing process by calling linsolve and specifying the options directly.
if A is rectangular or singular, you could also use PINV to find a minimal norm least-squares solution (implemented using SVD decomposition):
x = pinv(A)*b
All of the above applies to dense matrices, sparse matrices are a whole different story. Usually iterative solvers are used in such cases. I believe MATLAB uses UMFPACK and other related libraries from the SuiteSpase package for direct solvers.
When working with sparse matrices, you can turn on diagnostic information and see the tests performed and algorithms chosen using spparms:
spparms('spumoni',2)
x = A\b;
What's more, the backslash operator also works on gpuArray's, in which case it relies on cuBLAS and MAGMA to execute on the GPU.
It is also implemented for distributed arrays which works in a distributed computing environment (work divided among a cluster of computers where each worker has only part of the array, possibly where the entire matrix cannot be stored in memory all at once). The underlying implementation is using ScaLAPACK.
That's a pretty tall order if you want to implement all of that yourself :)
Related
There's three components to this problem:
A three dimensional vector A.
A "smooth" function F.
A desired vector B (also three dimensional).
We want to find a vector A that when put through F will produce the vector B.
F(A) = B
F can be anything that somehow transforms or distorts A in some manner. The point is that we want to iteratively call F(A) until B is produced.
The question is:
How can we do this, but with the least amount of calls to F before finding a vector that equals B (within a reasonable threshold)?
I am assuming that what you call "smooth" is tantamount to being differentiable.
Since the concept of smoothness only makes sense in the rational / real numbers, I will also assume that you are solving a floating point-based problem.
In this case, I would formulate the problem as a nonlinear programming problem. i.e. minimizing the squared norm of the difference between f(A) and B, given by
(F(A)_1 -B_1)² + (F(A)_2 - B_2)² + (F(A)_3 - B_3)²
It should be clear that this expression is zero if and only if f(A) = B and positive otherwise. Therefore you would want to minimize it.
As an example, you could use the solvers built into the scipy optimization suite (available for python):
from scipy.optimize import minimize
# Example function
f = lambda x : [x[0] + 1, x[2], 2*x[1]]
# Optimization objective
fsq = lambda x : sum(v*v for v in f(x))
# Initial guess
x0 = [0,0,0]
res = minimize(fsq, x0, tol=1e-6)
# res.x is the solution, in this case
# array([-1.00000000e+00, 2.49999999e+00, -5.84117172e-09])
A binary search (as pointed out above) only works if the function is 1-d, which is not the case here. You can try out different optimization methods by adding the method="name" to the call to minimize, see the API. It is not always clear which method works best for your problem without knowing more about the nature of your function. As a rule of thumb, the more information you give to the solver, the better. If you can compute the derivative of F explicitly, passing it to the solver will help reduce the number of required evaluations. If F has a Hessian (i.e., if it is twice differentiable), providing the Hessian will help as well.
As an alternative, you can use the least_squares function on F directly via res = least_squares(f, x0). This could be faster since the solver can take care of the fact that you are solving a least squares problem rather than a generic optimization problem.
From a more general standpoint, the problem of restoring the function arguments producing a given value is called an Inverse Problem. These problems have been extensively studied.
Provided that F(A)=B, F,B are known and A remains unknown, you can start with a simple gradient search:
F(A)~= F(C) + F'(C)*(A-C)~=B
where F'(C) is the gradient of F() evaluated in point C. I'm assuming you can calculate this gradient analytically for now.
So, you can choose a point C that you estimate it is not very far from the solution and iterate by:
C= Co;
While(true)
{
Ai = inverse(F'(C))*(B-F(C)) + C;
convergence = Abs(Ai-C);
C=Ai;
if(convergence<someThreshold)
break;
}
if the gradient of F() cannot be calculated analytically, you can estimate it. Let Ei, i=1:3 be the ortonormal vectors, then
Fi'(C) = (F(C+Ei*d) - F(C-Ei*d))/(2*d);
F'(C) = [F1'(C) | F2'(C) | F3'(C)];
and d can be chosen as fixed or as a function of the convergence value.
These algorithms suffer from the problem of local maxima, null gradient areas, etc., so in order for it to work, the start point (Co) must be not very far from the solution where the function F() behaves monotonically
it seems like you can try a metaheuristic approach for this.
Genetic algorithm (GA) might be the best suite for this.
you can initiate a number of A vector to init a population, and use GA to make evolution on your population, so you will have better generation in which they have better vectors that F(Ax) closer to B.
Your fitness function can be a simple function that compare F(Ai) to B
You can choose how to mutate your population by each generation.
A simple example about GA can be found here link
i’m a new programmer on opencl, i’ve to perform a multiplication of 2 complex matrix but i don’t know how to deal with complex matrix on opencl. please any help? I aleady tried matrix multiplication with normal numbers.
One way, though probably not the most efficient, would be to regard your complex matrix, Z say as being two real matrices X (the real parts) and Y the imaginary parts,ie
X[i,j]= Real( Z[i,j]) Y[i,j] = Imag( Z[i,j])
If you have another complex matrix W say, which is split as above into U and V then to multiply:
Z*W = (X*U-Y*V, X*V+Y*U)
where on the rhs we have real matrices and real matrix multiplication and addition.
In terms of multiplies and adds this will be the same amount of computation as doing the complex multiplications and additions (of the elements) directly. The inefficiency will come if you are given, and should return, arrays of complex numbers; then you have to split, as above the matrices you are going to multiply into real ones, and combine the product into complex array.
I need to compute the following matrices:
M = XSX^T
and
V = XSy
what I'd like to know is the more efficient implementation using blas, knowing that S is a symmetric and definite positive matrix of dimension n, X has m rows and n columns while y is a vector of length n.
My implementation is the following:
I compute A = XS using dsymm and then with dgemm is obtained M=AX^T while dgemv is used to obtain V=Ay.
I think that at least M can be computed in a more efficient way since I know that M is symmetric and definite positive.
Your code is the best BLAS can do for you. There is no BLAS operation, that can exploit the fact that M is symmetric.
You are right though you'd technically only need to compute the upper diagonal part of the gemm product and then copy the strictly upper diagonal part to the lower diagonal part. But there is no routine for that.
May I inquire about the sizes? And may I also inspire some other sources for performance gains: Own build of your BLAS implementation, comparison with MKL, ACML, OpenBLAS, ATLAS. You could obviously code your own version that would use AVX, FMA intrinsics. You should be able to do better that some generalised library. Also what is the precision of your floating point variable?
I seriously doubt that you might gain too much by coding it yourself anyway. But what I would definitely suggest is converting everything to floats and testing if float precision is not giving you the same result with significant speed up in compute time. Very seldom have I seen such cases, which were more in the ODE solving domain and numeric integration of nasty functions.
But you did not address my question regarding the BLAS implementation and machine type.
Again, the optimisation beyond this point is not possible without more skills :(. But seriously, don't be to worried about this. There is a reason, why BLAS does not the optimisation you ask for. It might not be worth the hassle. Go with your solution.
And don't forget to investgate the use of floats rather than double. On R convert everything to float. For the Lapack commands use only sgemX
Without knowing the detail of your problem, it can be useful to recognize the zeros in the matrices. Partitioning the matrices to achieve this can provide significant benefits. Is M the sum of many XSX' sub matrices ?
For V = XSy, where y is a vector and X and S are matrices, calculating S.y then X.(Sy) should be better, unless X.S is a necessary calculation for M.
I have an MxM matrix S whose entries are zero on the diagonal, and non-zero everywhere else. I need to make a larger, block matrix. The blocks will be size NxN, and there will be MxM of them.
The (i,j)th block will be S(i,j)I where I=eye(N) is the NxN identity. This matrix will certainly be sparse, S has M^2-M nonzero entries and my block matrix will have N(M^2-M) out of (NM)^2 or about 1/N% nonzero entries, but I'll be adding it to another NMxNM matrix that I do not expect to be sparse.
Since I will be adding my block matrix to a full matrix, would there be any speed gain by trying to write my code in a 'sparse' way? I keep going back and forth, but my thinking is settling on: even if my code to turn S into a sparse block matrix isn't very efficient, when I tell it to add a full and sparse matrix together, wouldn't MATLAB know that it only needs to iterate over the nonzero elements? I've been trained that for loops are slow in MATLAB and things like repmat and padding with zeros is faster, but my guess is that the fastest thing to do would be to not even build the block matrix at all, but write code that adds the entries of (the small matrix) S to my other (large, full) matrix in a sparse way. If I were to learn how to build the block matrix with sparse code (faster than building it in a full way and passing it to sparse), then that code should be able to do the addition for me in a sparse way without even needing to build the block matrix right?
If you can keep a full NMxNM matrix in memory, dont bother with sparse operations. In fact in most cases A+B, with A full and B sparse, will take longer than A+B, where A and B are both full.
From your description, using sparse is likely slower for your problem:
If you're adding a sparse matrix A to a full matrix B, the result is full and there's almost certainly no advantage to having A sparse.
For example:
n = 12000; A = rand(n, n); B1 = rand(n, n); B2 = spalloc(n, n, n*n);
B2 is as sparse as possible, that is, it's all zeros!
On my machine, A+B1 takes about .23 seconds while A + B2 takes about .7 seconds.
Basically, operations on full matrices use BLAS/LAPACK library calls that are insanely optimized. Overhead associated with sparse is going to make things worse unless you're in the special cases where sparse is super useful.
When is sparse super useful?
Sparse is super useful when the size of matrices suggest that some algorithm should be very slow, but because of sparsity (+ perhaps special matrix structure), the actual number of computations required is orders of magnitude less.
EXAMPLE: Solving linear system A*x=b where A is block diagonal matrix:
As = sparse(rand(5, 5)); for(i=1:999) As = blkdiag(As, sparse(rand(5,5))); end %generate a 5000x5000 sparse block diagonal matrix of 5x5 blocks
Af = full(As);
b = rand(5000, 1);
On my machine, solving the linear system on the full matrix (i.e. Af \ b) takes about 2.3 seconds while As \ b takes .0012 seconds.
Sparse can be awesome, but it's only helpful for large problems where you can cleverly exploit structure.
I am trying using multivariate regression to play basketball. Specificlly, I need to, based on X, Y, and distance from the target, predict the pitch, yaw, and cannon strength. I was thinking of using multivariate regression with multipule variables for each of the output parameter. Is there a better way to do this?
Also, should I use solve directly for the best fit, or use gradient descent?
ElKamina's answer is correct but one thing to note about this is that it is identical to doing k independent ordinary least squares regressions. That is, the same as doing a separate linear regression from X to pitch, from X to yaw, and from X to strength. This means, you are not taking advantage of correlations between the output variables. This may be fine for your application, but one alternative that does take advantage of correlations in the output is reduced rank regression(a matlab implementation here), or somewhat related, you can explicitly uncorrelate y by projecting it onto its principle components (see PCA, also called PCA whitening in this case since you aren't reducing the dimensionality).
I highly recommend chapter 6 of Izenman's textbook "Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning" for a fairly high level overview of these techniques. If you're at a University it may be available online through your library.
If those alternatives don't perform well, there are many sophisticated non-linear regression methods that have multiple output versions (although most software packages don't have the multivariate modifications) such as support vector regression, Gaussian process regression, decision tree regression, or even neural networks.
Multivariate regression is equivalent to doing the inverse of the covariance of the input variable set. Since there are many solutions to inverting the matrix (if the dimensionality is not very high. Thousand should be okay), you should go directly for the best fit instead of gradient descent.
n be the number of samples, m be the number of input variables and k be the number of output variables.
X be the input data (n,m)
Y be the target data (n,k)
A be the coefficients you want to estimate (m,k)
XA = Y
X'XA=X'Y
A = inverse(X'X)X'Y
X' is the transpose of X.
As you can see, once you find the inverse of X'X you can calculate the coefficients for any number of output variables with just a couple of matrix multiplications.
Use any simple math tools to solve this (MATLAB/R/Python..).