Related question: Multiplying real matrix with a complex vector using BLAS
Suppose I aim at C = A*B, where A, B, and C are real, complex, and complex matrices, respectively, with elementwise products A[i,j] * B[j,k] := (A[i,j] Re(B[j,k]), A[i,j] Im(B[j,k])). Is there any available subroutine in BLAS for this?
I can think of splitting B into two real matrices for the real and imaginary parts, doing dgemm on each, then combining (the combine step should be cheaper than the matrix multiplications, even with plain nested loops), as suggested by Multiplying real matrix with a complex vector using BLAS.
I don't know if there is a direct option in BLAS.
No, there is no routine in standard BLAS that multiplies real and complex matrices together to produce a complex result.
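To illustrate the split-and-combine approach mentioned in the question, here is a rough C sketch using cblas_dgemm; the function name, the column-major layout and the scratch handling are illustrative assumptions, not an existing BLAS routine:

/* C = A*B with A real (m-by-k) and B complex (k-by-n): split B into its real
 * and imaginary parts, run two dgemm calls, then interleave the results back
 * into a complex array.  Column-major storage.                              */
#include <cblas.h>
#include <complex.h>
#include <stdlib.h>

void real_times_complex(const double *A, const double complex *B,
                        double complex *C, int m, int k, int n)
{
    double *Br = malloc(sizeof(double) * (size_t)k * n);
    double *Bi = malloc(sizeof(double) * (size_t)k * n);
    double *Cr = malloc(sizeof(double) * (size_t)m * n);
    double *Ci = malloc(sizeof(double) * (size_t)m * n);

    /* split B into real and imaginary parts */
    for (size_t p = 0; p < (size_t)k * n; ++p) {
        Br[p] = creal(B[p]);
        Bi[p] = cimag(B[p]);
    }

    /* two real matrix products: Re(C) = A*Re(B), Im(C) = A*Im(B) */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, m, Br, k, 0.0, Cr, m);
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, m, Bi, k, 0.0, Ci, m);

    /* combine back into a complex array */
    for (size_t p = 0; p < (size_t)m * n; ++p)
        C[p] = Cr[p] + Ci[p] * I;

    free(Br); free(Bi); free(Cr); free(Ci);
}

The split and combine passes touch only k*n and m*n elements, so they are indeed cheap next to the two O(m*n*k) dgemm calls, which matches the reasoning in the question.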
I'm a new programmer on OpenCL. I have to perform a multiplication of two complex matrices, but I don't know how to deal with complex matrices in OpenCL. Please, any help? I have already tried matrix multiplication with ordinary real numbers.
One way, though probably not the most efficient, would be to regard your complex matrix, Z say, as two real matrices: X (the real parts) and Y (the imaginary parts), i.e.
X[i,j] = Real(Z[i,j]),  Y[i,j] = Imag(Z[i,j])
If you have another complex matrix, W say, which is split as above into U and V, then to multiply:
Z*W = (X*U-Y*V, X*V+Y*U)
where on the rhs we have real matrices and real matrix multiplication and addition.
In terms of multiplies and adds this is the same amount of computation as doing the complex multiplications and additions (of the elements) directly. The inefficiency comes if you are given, and must return, arrays of complex numbers: then you have to split the matrices you are going to multiply into real ones, as above, and combine the product back into a complex array.
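As a concrete illustration of that formula, here is a plain-C sketch (not OpenCL) that reuses a generic real matrix multiply, which is the role your existing real-number kernel would play; all names, the square-size assumption and the row-major layout are illustrative:

/* Z = X + iY, W = U + iV, both n-by-n:  Z*W = (X*U - Y*V) + i*(X*V + Y*U). */
static void real_matmul(const double *a, const double *b, double *c, int n)
{
    /* c = a*b for row-major n-by-n real matrices */
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) {
            double s = 0.0;
            for (int j = 0; j < n; ++j)
                s += a[i * n + j] * b[j * n + k];
            c[i * n + k] = s;
        }
}

void complex_matmul_split(const double *X, const double *Y,   /* Re, Im of Z  */
                          const double *U, const double *V,   /* Re, Im of W  */
                          double *Pre, double *Pim,           /* Re, Im of ZW */
                          double *tmp, int n)                  /* n*n scratch  */
{
    real_matmul(X, U, Pre, n);                          /* Pre = X*U         */
    real_matmul(Y, V, tmp, n);                          /* tmp = Y*V         */
    for (int p = 0; p < n * n; ++p) Pre[p] -= tmp[p];   /* Pre = X*U - Y*V   */

    real_matmul(X, V, Pim, n);                          /* Pim = X*V         */
    real_matmul(Y, U, tmp, n);                          /* tmp = Y*U         */
    for (int p = 0; p < n * n; ++p) Pim[p] += tmp[p];   /* Pim = X*V + Y*U   */
}

Four real products plus two elementwise additions over n^2 entries, which is exactly the operation count described above.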
I need to compute the following matrices:
M = XSX^T
and
V = XSy
What I'd like to know is the most efficient implementation using BLAS, knowing that S is a symmetric positive definite matrix of dimension n, X has m rows and n columns, and y is a vector of length n.
My implementation is the following:
I compute A = XS using dsymm, then M = AX^T with dgemm, and V = Ay with dgemv.
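For reference, that sequence might look roughly like the following with the C interface to BLAS (CBLAS); the column-major convention, array names and workspace handling are illustrative assumptions, not the poster's actual code:

/* X is m-by-n, S is n-by-n symmetric, y has length n.               */
#include <cblas.h>

void xsxt_and_xsy(const double *X, const double *S, const double *y,
                  double *A,   /* m-by-n workspace, A = X*S   */
                  double *M,   /* m-by-m result,   M = A*X^T  */
                  double *V,   /* length-m result, V = A*y    */
                  int m, int n)
{
    /* A = X * S  (S symmetric, upper triangle referenced) */
    cblas_dsymm(CblasColMajor, CblasRight, CblasUpper,
                m, n, 1.0, S, n, X, m, 0.0, A, m);

    /* M = A * X^T */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans,
                m, m, n, 1.0, A, m, X, m, 0.0, M, m);

    /* V = A * y */
    cblas_dgemv(CblasColMajor, CblasNoTrans,
                m, n, 1.0, A, m, y, 1, 0.0, V, 1);
}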
I think that at least M can be computed in a more efficient way, since I know that M is symmetric and positive definite.
Your code is the best BLAS can do for you. There is no BLAS operation that can exploit the fact that M is symmetric.
You are right, though: technically you would only need to compute the upper triangular part of the gemm product and then copy the strictly upper triangular part into the lower triangle. But there is no routine for that.
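The mirroring step alluded to here is trivial to write by hand; a sketch in C for a column-major m-by-m matrix follows (it does not, of course, supply the missing "triangle-only" gemm):

/* Copy the strictly upper triangle of a column-major m-by-m matrix
 * into its strictly lower triangle.                                 */
void copy_upper_to_lower(double *M, int m)
{
    for (int j = 0; j < m; ++j)
        for (int i = 0; i < j; ++i)
            M[j + i * m] = M[i + j * m];   /* M(j,i) = M(i,j) */
}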
May I inquire about the sizes? And may I also suggest some other sources of performance gains: your own build of the BLAS implementation, and a comparison of MKL, ACML, OpenBLAS and ATLAS. You could obviously code your own version using AVX and FMA intrinsics; you should be able to do better than a generalised library. Also, what is the precision of your floating-point variables?
I seriously doubt you would gain much by coding it yourself anyway. But what I would definitely suggest is converting everything to single precision (float) and testing whether it gives you the same result with a significant speed-up in compute time. I have very seldom seen cases where single precision was not enough, and those were mostly in the ODE-solving domain and in the numerical integration of nasty functions.
But you did not address my question regarding the BLAS implementation and machine type.
Again, the optimisation beyond this point is not possible without more skills :(. But seriously, don't be too worried about this. There is a reason why BLAS does not do the optimisation you ask for; it might not be worth the hassle. Go with your solution.
And don't forget to investigate the use of floats rather than doubles. In R, convert everything to float. For the LAPACK/BLAS calls use only the single-precision sgemX routines.
Without knowing the details of your problem, it can be useful to recognize the zeros in the matrices. Partitioning the matrices to exploit this can provide significant benefits. Is M the sum of many XSX' sub-matrices?
For V = XSy, where y is a vector and X and S are matrices, calculating S.y then X.(Sy) should be better, unless X.S is a necessary calculation for M.
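In BLAS terms, the vector-first evaluation is a symmetric matrix-vector product followed by a general matrix-vector product, costing O(n^2 + m*n) flops instead of the O(m*n^2) needed to form X*S first. A hedged C sketch (column-major storage, illustrative names):

#include <cblas.h>

void xsy_vector_first(const double *X, const double *S, const double *y,
                      double *t,   /* length-n scratch, t = S*y */
                      double *V,   /* length-m result,  V = X*t */
                      int m, int n)
{
    /* t = S * y  (S symmetric, upper triangle referenced) */
    cblas_dsymv(CblasColMajor, CblasUpper, n, 1.0, S, n, y, 1, 0.0, t, 1);

    /* V = X * t */
    cblas_dgemv(CblasColMajor, CblasNoTrans, m, n, 1.0, X, m, t, 1, 0.0, V, 1);
}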
I have an MxM matrix S whose entries are zero on the diagonal, and non-zero everywhere else. I need to make a larger, block matrix. The blocks will be size NxN, and there will be MxM of them.
The (i,j)th block will be S(i,j)*I, where I = eye(N) is the NxN identity. This matrix will certainly be sparse: S has M^2-M nonzero entries, and my block matrix will have N(M^2-M) nonzeros out of (NM)^2 entries, a fraction of about 1/N. But I'll be adding it to another NMxNM matrix that I do not expect to be sparse.
Since I will be adding my block matrix to a full matrix, would there be any speed gain from trying to write my code in a 'sparse' way? I keep going back and forth, but my thinking is settling on this: even if my code to turn S into a sparse block matrix isn't very efficient, when I tell MATLAB to add a full and a sparse matrix together, wouldn't it know that it only needs to iterate over the nonzero elements? I've been trained that for loops are slow in MATLAB and that things like repmat and padding with zeros are faster, but my guess is that the fastest thing would be not to build the block matrix at all, and instead write code that adds the entries of the (small) matrix S to my other (large, full) matrix in a sparse way. And if I were to learn how to build the block matrix with sparse code (faster than building it full and passing it to sparse), then that code should be able to do the addition for me in a sparse way without even needing to build the block matrix, right?
If you can keep a full NMxNM matrix in memory, don't bother with sparse operations. In fact, in most cases A+B, with A full and B sparse, will take longer than A+B where A and B are both full.
From your description, using sparse is likely slower for your problem:
If you're adding a sparse matrix A to a full matrix B, the result is full and there's almost certainly no advantage to having A sparse.
For example:
n = 12000;
A = rand(n, n);
B1 = rand(n, n);
B2 = spalloc(n, n, n*n);
B2 is as sparse as possible, that is, it's all zeros!
On my machine, A+B1 takes about .23 seconds while A + B2 takes about .7 seconds.
Basically, operations on full matrices use BLAS/LAPACK library calls that are insanely optimized. Overhead associated with sparse is going to make things worse unless you're in the special cases where sparse is super useful.
When is sparse super useful?
Sparse is super useful when the size of matrices suggest that some algorithm should be very slow, but because of sparsity (+ perhaps special matrix structure), the actual number of computations required is orders of magnitude less.
EXAMPLE: Solving the linear system A*x = b where A is a block-diagonal matrix:
% generate a 5000x5000 sparse block-diagonal matrix of 5x5 blocks
As = sparse(rand(5, 5));
for i = 1:999
    As = blkdiag(As, sparse(rand(5, 5)));
end
Af = full(As);
b = rand(5000, 1);
On my machine, solving the linear system on the full matrix (i.e. Af \ b) takes about 2.3 seconds while As \ b takes .0012 seconds.
Sparse can be awesome, but it's only helpful for large problems where you can cleverly exploit structure.
I need to compute the determinant of a complex matrix which is symmetric. The size of the matrix ranges from 500x500 to 2000x2000. Is there any subroutine for me to call? By the way, I use ifort to compile.
The easiest way would be to do an LU-decomposition as described here. I would suggest using LAPACK for this task...
This article has some code in C doing that for a real-valued symmetric matrix, so you need to exchange dspsv for zspsv to handle double-precision complex matrices.
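As a hedged sketch of the LU route in C, here is one way to get the determinant via the general LAPACK factorization zgetrf (a plain LU rather than the packed symmetric solver named above, because the determinant can be read straight off the LU factors). For 2000x2000 matrices the raw product may overflow, so accumulating the logarithm of the determinant is the usual remedy; the function name and the column-major layout are illustrative:

#include <complex.h>
#include <lapacke.h>
#include <stdlib.h>

double complex zdet(lapack_int n, double complex *A)
{
    /* A is n-by-n, column-major, and is overwritten by its LU factors. */
    lapack_int *ipiv = malloc((size_t)n * sizeof *ipiv);
    lapack_int info = LAPACKE_zgetrf(LAPACK_COL_MAJOR, n, n,
                                     (lapack_complex_double *)A, n, ipiv);
    double complex det = 0.0;
    if (info == 0) {
        det = 1.0;
        for (lapack_int i = 0; i < n; ++i) {
            det *= A[i + i * (size_t)n];   /* diagonal of U                     */
            if (ipiv[i] != i + 1)          /* each row interchange flips the sign */
                det = -det;
        }
    }  /* info > 0 means U is exactly singular, so det stays 0 */
    free(ipiv);
    return det;
}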
I'm currently trying to develop a small matrix-oriented math library (I'm using Eigen 3 for matrix data structures and operations) and I wanted to implement some handy Matlab functions, such as the widely used backslash operator (which is equivalent to mldivide) in order to compute the solution of linear systems (expressed in matrix form).
Is there any good detailed explanation of how this could be achieved? (I've already implemented the Moore-Penrose pseudoinverse pinv function with a classical SVD decomposition, but I've read somewhere that A\b isn't always pinv(A)*b; at least Matlab doesn't simply do that.)
Thanks
For x = A\b, the backslash operator encompasses a number of algorithms to handle different kinds of input matrices. So the matrix A is diagnosed and an execution path is selected according to its characteristics.
The following pseudo-code describes the dispatch when A is a full matrix:
if size(A,1) == size(A,2)            % A is square
    if isequal(A,tril(A))            % A is lower triangular
        x = A \ b;                   % This is a simple forward substitution on b
    elseif isequal(A,triu(A))        % A is upper triangular
        x = A \ b;                   % This is a simple backward substitution on b
    else
        if isequal(A,A')             % A is symmetric
            [R,p] = chol(A);
            if (p == 0)              % A is symmetric positive definite
                x = R \ (R' \ b);    % a forward and a backward substitution
                return
            end
        end
        [L,U,P] = lu(A);             % general, square A
        x = U \ (L \ (P*b));         % a forward and a backward substitution
    end
else                                 % A is rectangular
    [Q,R] = qr(A);
    x = R \ (Q' * b);
end
For non-square matrices, QR decomposition is used. For square triangular matrices, it performs a simple forward/backward substitution. For square symmetric positive-definite matrices, Cholesky decomposition is used. Otherwise LU decomposition is used for general square matrices.
Update: MathWorks has updated the algorithm section in the doc page of mldivide with some nice flow charts. See here and here (full and sparse cases).
All of these algorithms have corresponding methods in LAPACK, and in fact it's probably what MATLAB is doing (note that recent versions of MATLAB ship with the optimized Intel MKL implementation).
The reason for having different methods is that mldivide tries to use the most specific algorithm for the system of equations, one that takes advantage of all the characteristics of the coefficient matrix (either because it is faster or more numerically stable). So you could certainly use a general solver, but it won't be the most efficient.
In fact if you know what A is like beforehand, you could skip the extra testing process by calling linsolve and specifying the options directly.
If A is rectangular or singular, you could also use pinv to find a minimum-norm least-squares solution (implemented using SVD decomposition):
x = pinv(A)*b
All of the above applies to dense matrices; sparse matrices are a whole different story. Usually iterative solvers are used in such cases. I believe MATLAB uses UMFPACK and other related libraries from the SuiteSparse package for direct solvers.
When working with sparse matrices, you can turn on diagnostic information and see the tests performed and algorithms chosen using spparms:
spparms('spumoni',2)
x = A\b;
What's more, the backslash operator also works on gpuArrays, in which case it relies on cuBLAS and MAGMA to execute on the GPU.
It is also implemented for distributed arrays, which work in a distributed computing environment (the work is divided among a cluster of computers where each worker holds only part of the array, possibly because the entire matrix cannot be stored in memory all at once). The underlying implementation uses ScaLAPACK.
That's a pretty tall order if you want to implement all of that yourself :)