Multiply part of an array as a matrix using matmul

My question is similar to this one: Multiply a 3D matrix with a 2D matrix. However, I'm coding in Fortran.
Say I have an RxSxT array A and an SxU matrix B, where R, S, T, U are integers, and I want to multiply A(:,:,0) with B. How can I do this with matmul? When I do something like
C(:,:,0) = matmul(A(:,:,0),B)
The compiler (gfortran) gives:
Warning: Array reference at (1) is out of bounds (0 < 1) in dimension 3
f951: internal compiler error: Segmentation fault
Is there a way around this?
Thanks.
EDIT: I should add that I'm actually transposing the second matrix. Say A is an RxSxT array and B is a UxS matrix. Then
C(:,:,0) = matmul(B, transpose(A(:,:,0)))
That transpose might be part of the problem. Does it convert A(i,j,k) to A(k,i,j)?

Remember that in Fortran array indices start at 1 by default. So unless you have declared your array A with a non-default lower bound on the third dimension, gfortran is entirely correct in pointing out your error.
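As a minimal sketch (the sizes R, S, T, U below are made-up parameters), declaring the third dimension with an explicit lower bound of 0 makes the expression from the question legal:
program slice_matmul
  implicit none
  integer, parameter :: R = 3, S = 4, T = 2, U = 5
  real :: A(R, S, 0:T-1), B(S, U), C(R, U, 0:T-1)

  call random_number(A)
  call random_number(B)

  ! A(:,:,0) is a valid rank-two R x S section, so matmul with the S x U matrix B works
  C(:,:,0) = matmul(A(:,:,0), B)
  print *, shape(C(:,:,0))
end program slice_matmul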
Of course, an internal compiler error is always a compiler bug; unless you have some ancient version of gfortran, please file a bug report at http://gcc.gnu.org/bugzilla.

transpose(A(:,:,0)) interchanges the two remaining indices, taking element A(i,j,0) to position (j,i) of the result. A(:,:,0) is a rank-two array, so there is no third index left to permute.
The compiler should never crash, whether or not the input source code is correct. Are you using the latest version of gfortran? You could report this "internal compiler error: Segmentation fault" to the gfortran development team: http://gcc.gnu.org/wiki/GFortran#bugs

Related

Crash in Eigen when working with mixed row major/col major sparse matrices

I'm getting a crash in Eigen 3.3.5 when trying to do something like the below:
Eigen::Map<Eigen::Matrix<float, 1, Eigen::Dynamic, Eigen::RowMajor>> eigenValues(valueBuffer, 1, 100000);
Eigen::Map<Eigen::Matrix<float, 1, Eigen::Dynamic, Eigen::RowMajor>> eigenChannels(channelBuffer, 1, 5000);
Eigen::SparseMatrix<float, Eigen::RowMajor> sparseChannels = eigenChannels.sparseView(1.0f, 1.e-4f);
Eigen::Map<const Eigen::SparseMatrix<float>> eigenLargeSparseMatrix(5000, 100000, LargeSparseMatrix.Values.Num(), LargeSparseMatrix.OuterStarts.GetData(), LargeSparseMatrix.InnerIndices.GetData(), LargeSparseMatrix.Values.GetData());
eigenValues += (sparseChannels * eigenLargeSparseMatrix);
Specifically, it's crashing in Eigen::internal::sparse_sparse_to_dense_product_impl in the inner loop when trying to grab the index of the lhsIt.
Assume that I've already checked that all the sizes of everything are correct, all my buffers are initialized correctly with real memory, etc. I've been going over every detail of this for a few days trying to find an error in my reasoning or logic.
Basically, all I'm trying to do is:
1xn row vector += (1xm row vector * mxn matrix)
where the left-hand side is dense and both right-hand-side operands are sparse (a reduced sketch of these shapes follows below).
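For reference, here is a standalone reduction of those shapes (the sizes and inserted values are made up; this reconstructs only the shapes and storage orders described in the question, not the original data or the crash):
#include <Eigen/Dense>
#include <Eigen/Sparse>

int main()
{
    const int m = 5, n = 20;

    // 1 x n dense accumulator, row-major (as in the Map in the question)
    Eigen::Matrix<float, 1, Eigen::Dynamic, Eigen::RowMajor> dense(n);
    dense.setZero();

    // 1 x m sparse row vector, row-major
    Eigen::SparseMatrix<float, Eigen::RowMajor> rowVec(1, m);
    rowVec.insert(0, 2) = 1.0f;

    // m x n sparse matrix, column-major (Eigen's default)
    Eigen::SparseMatrix<float> mat(m, n);
    mat.insert(2, 7) = 3.0f;

    // dense 1xn += (sparse 1xm) * (sparse mxn)
    dense += rowVec * mat;
    return 0;
}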
What appears to be happening, from looking at the templated call stack, is that add_assign_op correctly recognizes that the row vector has the RowMajor flag and the matrix is ColMajor, but then sparse_sparse_to_dense_product_impl receives both the lhs and the rhs as ColMajor.
From looking at the sparse_sparse_to_dense_product_selector code, this appears to be because Eigen simply converts the RowMajor lhs into a ColMajorLhs and calls the product impl. This seems bound to crash: it's a row vector for a reason, and I'm not sure why Eigen needs to transpose it. I'm really unsure how this is meant to work.
My challenge is that, for memory-streaming efficiency, I need the larger matrix to be organized sequentially in memory such that either (a) it is column-major and pre-multiplied by a row vector, or (b) it is row-major and post-multiplied by a column vector. Both versions hit this weird transpose code rather than just letting the two operands keep different storage orders.
Can anyone lend a hand? Am I doing something wrong? Is this a bug?
ADDED:
I'm reasonably sure at this point that the crash is my own fault after all, but I'd still like to understand why the row vector is being transposed before the multiply, as I would assume this produces undesirable behavior. Basically, I would just like to understand why sparse_sparse_to_dense_product_selector transposes row vectors into column vectors before the multiply.

Mapping complex sparse matrix in Eigen from MATLAB workspace

I am working on solving the linear system Ax = b with Eigen solvers through a MATLAB MEX function. Given a complex sparse matrix A and a sparse vector b from the MATLAB workspace, I want to map A and b into Eigen's sparse matrix format, solve the system with one of Eigen's linear solvers, and finally transfer the result x back to the MATLAB workspace.
However, since I am not good at C++ and not familiar with Eigen either, I am stuck at the first step: constructing the complex sparse matrix in a format Eigen accepts.
I have found the following in Eigen:
Eigen::MappedSparseMatrix<double,RowMajor> mat(rows, cols, nnz, row_ptr, col_index, values);
I can use the MEX functions mxGetPr, mxGetPi, mxGetIr, mxGetJc, etc. to obtain the "rows, cols, nnz, row_ptr, col_index, values" above. However, since in my case matrix A is a complex sparse matrix, I am not sure whether MappedSparseMatrix can handle that.
If it can, what should the MappedSparseMatrix declaration look like? Is the following correct?
Eigen::MappedSparseMatrix<std::complex<double>> mat(rows, cols, nnz, row_ptr, col_index, values_complex);
If so, how should I construct values_complex?
I have found a relevant topic before: I can use the following code to get a complex dense matrix.
MatrixXcd mat(m,n);
mat.real() = Map<MatrixXd>(realData,m,n);
mat.imag() = Map<MatrixXd>(imagData,m,n);
However, since my matrix A is sparse, it seems to produce errors if I define mat as a complex sparse matrix like the following:
SparseMatrix<std::complex<double> > mat;
mat.real() = Map<SparseMatrix>(rows, cols, nnz, row_ptr, col_index, realData);
mat.imag() = Map<SparseMatrix>(rows, cols, nnz, row_ptr, col_index, imagData);
So can anyone provide some advice on this?
MATLAB stores complex entries in two separate buffers, one for the real components and one for the imaginary components, whereas Eigen needs them to be interleaved:
value_ptr = [r0,i0,r1,i1,r2,i2,...]
so that it is compatible with std::complex<>. So in your case you will have to create a temporary buffer holding the values in that interleaved format and pass it to MappedSparseMatrix, or, if you are using Eigen 3.3, to Map<SparseMatrix<double,RowMajor> >.
Moreover, you will have to adjust the index buffers so that they are zero-based. To this end, decrement all entries of col_ptr and row_ptr by one before passing them to Eigen, and increment them back by one afterward.
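A minimal sketch of the interleaving step described above (the names n_rows, n_cols, nnz, col_ptr, row_ind, real_ptr and imag_ptr are placeholders for whatever you extract from the mxArray, assumed here to be zero-based int index buffers):
#include <Eigen/SparseCore>
#include <complex>
#include <vector>

void buildComplexSparse(int n_rows, int n_cols, int nnz,
                        int* col_ptr, int* row_ind,
                        const double* real_ptr, const double* imag_ptr)
{
    // Interleave MATLAB's separate real/imaginary buffers into std::complex values.
    std::vector<std::complex<double> > values(nnz);
    for (int k = 0; k < nnz; ++k)
        values[k] = std::complex<double>(real_ptr[k], imag_ptr[k]);

    // View the buffers as a column-major (CSC) complex sparse matrix, without copying the structure.
    Eigen::MappedSparseMatrix<std::complex<double> > A(
        n_rows, n_cols, nnz, col_ptr, row_ind, values.data());

    // A can now be handed to one of Eigen's sparse solvers (e.g. SparseLU) to solve A x = b.
}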

CUDA implementation for arbitrary precision arithmetics

I have to multiply two very large (~2000 x 2000) dense matrices whose entries are arbitrary-precision floats (I am using GMP and the precision is currently set to 600). I was wondering if there is any CUDA library that supports arbitrary-precision arithmetic. The only library I have found is CAMPARY; however, it seems to be missing references to some of the functions it uses.
The other solution I was considering was implementing a version of the Karatsuba algorithm for multiplying matrices with arbitrary-precision entries. The final step of the algorithm would just be multiplying matrices of doubles, which could be done very efficiently using cuBLAS. Is there any similar implementation already out there?
Since nobody has suggested such a library so far, let's assume that one doesn't exist.
You could always implement the naive approach (a skeleton is sketched after this list):
One grid thread for each pair of coordinates in the output matrix.
Each thread performs an inner product of a row and a column of the input matrices.
Individual element operations use code taken from GMP (hopefully not much more than copy-and-paste).
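A bare skeleton of that plan, where mp_float, mp_zero, mp_add and mp_mul are hypothetical placeholders for arbitrary-precision routines you would port from GMP/CAMPARY yourself; only the thread layout is meant literally:
// Hypothetical ~800-bit value type; the real layout depends on your ported GMP/CAMPARY code.
struct mp_float { unsigned int limbs[26]; };

// Placeholder bodies: substitute real arbitrary-precision routines here.
__device__ mp_float mp_zero()                              { return mp_float{}; }
__device__ mp_float mp_add(mp_float a, const mp_float& b)  { /* TODO */ return a; }
__device__ mp_float mp_mul(mp_float a, const mp_float& b)  { /* TODO */ return a; }

// One thread per output element C(row, col); each thread forms one inner product.
__global__ void mp_matmul_naive(const mp_float* A, const mp_float* B, mp_float* C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;

    mp_float acc = mp_zero();
    for (int k = 0; k < n; ++k)
        acc = mp_add(acc, mp_mul(A[row * n + k], B[k * n + col]));
    C[row * n + col] = acc;
}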
But you can also do better than this - just like you can do better for regular-float matrix multiplication. Here's my idea (likely not the best of course):
Consider the worked example of matrix multiplication using shared memory in the CUDA C Programming Guide. It suggests putting small submatrices in shared memory. You can still do this, but you need to be careful with shared memory sizes (they're small):
A typical GPU today has 64 KB of usable shared memory per block (or more).
The worked example uses 16 x 16 submatrices.
Times 2 (for the two multiplicands).
Times ceil(801/8) = 101 bytes per element (assuming the GMP representation uses 600 bits of mantissa, one sign bit, and 200 bits of exponent).
So 2 * 16 * 16 * 101 = 51,712 bytes < 64 KB!
That means you can probably just use the code in their worked example as-is, again replacing the float multiplication and addition with code from GMP.
You may then want to consider something like parallelizing the GMP code itself, i.e. using multiple threads to work together on single pairs of 600-bit-precision numbers. That would likely help your shared memory reading pattern. Alternatively, you could interleave the placement of 4-byte sequences from the representation of your elements in shared memory, for the same effect.
I realize this is a bit hand-wavy, but I'm pretty certain I've waved my hands correctly and it would be a "simple matter of coding".

If the command NullSpace doesn't work in Mathematica, is there anything else that does the same thing?

I have a matrix S (105 rows and 22 columns) and I need to find its "orthogonal" (when I multiply S with it, the result must be a zero matrix). I searched and the only command I found that seems to do what I want is NullSpace[S], but the result is not the matrix I need: it is a matrix with 8 rows and 22 columns that doesn't give me the result I want. I tried Transpose in case it got the matrix backwards, but then the multiplication cannot be done either. Is there anyone who knows Mathematica who can help me? Thanks.
I am not sure I understood your concept of an "orthogonal" matrix, which is usually defined differently. But if you are looking for a matrix T such that T.S == {{0,0,...},...}, then
T = NullSpace[Transpose[S]];
Unless your 105 x 22 matrix S is highly degenerate, there is no solution such that S.T == 0.
In this case, T = Transpose[NullSpace[S]] will most likely return {}.
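A quick numerical sanity check of the first suggestion, using a random matrix of the sizes given in the question (a full-rank S is assumed here):
(* Random 105 x 22 matrix standing in for S; NullSpace also works on numeric matrices *)
S = RandomReal[{-1, 1}, {105, 22}];
T = NullSpace[Transpose[S]];
Dimensions[T]     (* typically {83, 105} for a full-rank S *)
Max[Abs[T . S]]   (* numerically zero *)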

Speed up numpy matrix inverse

I am using NumPy/SciPy to invert a 20k matrix, and it's slow.
I tried:
(1) M_inv = M.I
(2) Ident = np.identity(len(M))
M_inv = scipy.linalg.solve(M, Ident)
(3) M_inv = scipy.linalg.inv(M)
but didn't see any speedup.
Is there any other way to speed this up?
This is a big matrix, and inverting it is going to be slow. Some options:
Use a numpy linked against Intel MKL (e.g. the Enthought distribution, or you can compile it yourself), which should be faster than one linked against standard BLAS/ATLAS.
If your matrix is sufficiently sparse, use scipy.sparse and the solvers in scipy.sparse.linalg. (This will probably be slower if there are only a few zeros, though.)
Figure out whether you really need an explicit representation of the inverted matrix for whatever you're trying to do with it; often you can get away without explicitly inverting it (see the sketch below), but it's hard to tell without knowing what you're doing with the matrix.
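On that last point, a minimal sketch (M and b are stand-ins, and the size is scaled down so it runs quickly): if the "inverse" is only ever applied to vectors, factor once and solve instead of forming inv(M) explicitly.
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
n = 2000                      # scaled down from 20k so the example runs in seconds
M = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# One O(n^3) LU factorization, reused for every right-hand side.
lu, piv = linalg.lu_factor(M)
x = linalg.lu_solve((lu, piv), b)     # each solve is then only O(n^2)

# Same result as going through the explicit inverse, without ever forming it.
x_ref = linalg.inv(M) @ b
print(np.allclose(x, x_ref))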
