I am working on solving the linear algebraic equation Ax = b by using Eigen solvers through mex function of Matlab. Given a complex sparse matrix A and a sparse vector b from Matlab workspace, I want to map matrix A and vector b in Eigen sparse matrix format. After that, I need to use Eigen's linear equation solvers to solve it. At the end I need to transfer the results x to Matlab workspace.
However, I am not good at C++ and not familiar with Eigen either, so I am stuck at the first step, namely constructing the complex sparse matrix in a format that Eigen accepts.
I have found the following constructor in Eigen:
Eigen::MappedSparseMatrix<double,RowMajor> mat(rows, cols, nnz, row_ptr, col_index, values);
I can use the MEX functions mxGetPr, mxGetPi, mxGetIr, mxGetJc, etc. to get the information needed for the above "rows, cols, nnz, row_ptr, col_index, values". However, since in my case matrix A is a complex sparse matrix, I am not sure whether "MappedSparseMatrix" can handle that.
If it can, what should the format of "MappedSparseMatrix" be? Is the following correct?
Eigen::MappedSparseMatrix<std::complex<double>> mat(rows, cols, nnz, row_ptr, col_index, values_complex);
If so, how should I construct values_complex?
I have found a relevant topic before: I can use the following code to get a complex dense matrix.
MatrixXcd mat(m,n);
mat.real() = Map<MatrixXd>(realData,m,n);
mat.imag() = Map<MatrixXd>(imagData,m,n);
However, since my matrix A is sparse, it seems to produce errors if I define mat as a complex sparse matrix like the following:
SparseMatrix<std::complex<double> > mat;
mat.real() = Map<SparseMatrix>(rows, cols, nnz, row_ptr, col_index, realData);
mat.imag() = Map<SparseMatrix>(rows, cols, nnz, row_ptr, col_index, imagData);
So can anyone provide some advice for that?
MATLAB stores complex entries in two separate buffers: one for the real components and one for the imaginary components, whereas Eigen needs them to be interleaved:
value_ptr = [r0,i0,r1,i1,r2,i2,...]
so that it is compatible with std::complex<>. So in your case, you will have to create a temporary buffer yourself that holds the values in that interleaved format and pass it to MappedSparseMatrix, or, if using Eigen 3.3, to Map<SparseMatrix<std::complex<double>,RowMajor> >.
Moreover, you will have to make sure the index buffers (row_ptr and col_index in your notation) are zero-based: if they are not, decrement every entry by one before passing them to Eigen, and increment them back by one afterward.
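For what it's worth, here is a rough, untested sketch of that glue code (it assumes the separate-complex MEX API with mxGetPr/mxGetPi and that A arrives as prhs[0]; MATLAB sparse matrices are stored in compressed-column form, so the default column-major SparseMatrix is the natural target):

#include <complex>
#include <vector>
#include <Eigen/SparseCore>
#include "mex.h"

// Inside mexFunction, with prhs[0] being the complex sparse matrix A:
const mwIndex* ir = mxGetIr(prhs[0]);   // row indices (inner)
const mwIndex* jc = mxGetJc(prhs[0]);   // column pointers (outer)
const double*  pr = mxGetPr(prhs[0]);   // real parts
const double*  pi = mxGetPi(prhs[0]);   // imaginary parts
const mwSize rows = mxGetM(prhs[0]);
const mwSize cols = mxGetN(prhs[0]);
const mwSize nnz  = jc[cols];

// Interleave the real/imaginary buffers so they match std::complex<double>.
std::vector<std::complex<double> > values(nnz);
for (mwSize k = 0; k < nnz; ++k)
    values[k] = std::complex<double>(pr[k], pi ? pi[k] : 0.0);

// Eigen's default StorageIndex is int, so copy the mwIndex buffers over
// (and shift them here if they happen to be one-based).
std::vector<int> outer(jc, jc + cols + 1), inner(ir, ir + nnz);

Eigen::Map<Eigen::SparseMatrix<std::complex<double> > >
    A(rows, cols, nnz, outer.data(), inner.data(), values.data());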
Say you have a Matrix<int, 4, 2> and a Matrix<int, 3, 2> which you want to multiply in the natural way that contracts the last (-1) dimension, without first transposing.
Is this possible, or do we have to transpose first? That would be silly (and bad for performance) from a cache perspective, because then the elements we are multiplying and summing aren't contiguous.
Here's a playground. https://godbolt.org/z/Gdj3sfzcb
Pytorch provides torch.inner and torch.tensordot which do this.
Just like in Numpy, transpose() only creates a "view"; it doesn't do any expensive memory operations (unless you assign the result to a new matrix). Simply call a * b.transpose() and let Eigen handle the details of the memory access. A properly optimized BLAS-style library like Eigen handles the transposition on smaller tiles in temporary memory for optimal performance.
Memory order still matters for fine tuning though. If you can, write your matrix multiplications in the form a.transpose() * b for column-major matrices (like Eigen, Matlab), or a * b.transpose() for row-major matrices like those in Numpy. That saves the BLAS library the trouble of doing that transposition.
Side note: You used auto for your result. Please read the Common Pitfalls chapter in the documentation. Your code didn't compute a matrix multiplication, it stored an expression of one.
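For concreteness, a minimal sketch with the shapes from the question, assigning the product to a concrete matrix type rather than auto so it is actually evaluated:

#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::Matrix<int, 4, 2> a = Eigen::Matrix<int, 4, 2>::Random();
    Eigen::Matrix<int, 3, 2> b = Eigen::Matrix<int, 3, 2>::Random();

    // Contract the shared last dimension: (4x2) * (2x3) = 4x3.
    // b.transpose() is only a view; no data is copied up front.
    Eigen::Matrix<int, 4, 3> c = a * b.transpose();

    std::cout << c << std::endl;
}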
I'm getting a crash in Eigen 3.3.5 when trying to do something like the below:
Eigen::Map<Eigen::Matrix<float, 1, Eigen::Dynamic, Eigen::RowMajor>> eigenValues(valueBuffer, 1, 100000);
Eigen::Map<Eigen::Matrix<float, 1, Eigen::Dynamic, Eigen::RowMajor>> eigenChannels(channelBuffer, 1, 5000);
Eigen::SparseMatrix<float, Eigen::RowMajor> sparseChannels = eigenChannels.sparseView(1.0f, 1.e-4f);
Eigen::Map<const Eigen::SparseMatrix<float>> eigenLargeSparseMatrix(5000, 100000, LargeSparseMatrix.Values.Num(), LargeSparseMatrix.OuterStarts.GetData(), LargeSparseMatrix.InnerIndices.GetData(), LargeSparseMatrix.Values.GetData());
eigenValues += (sparseChannels * eigenLargeSparseMatrix);
Specifically, it's crashing in Eigen::internal::sparse_sparse_to_dense_product_impl in the inner loop when trying to grab the index of the lhsIt.
Assume that I've already checked that all the sizes of everything are correct, all my buffers are initialized correctly with real memory, etc. I've been going over every detail of this for a few days trying to find an error in my reasoning or logic.
Basically all I'm trying to do is do:
1xn row vector += (1xm row vector * mxn matrix)
where the left side is dense and both right side vector/matrices are sparse.
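In a stripped-down, self-contained form (with made-up small sizes), the pattern is simply:

#include <Eigen/Dense>
#include <Eigen/Sparse>

int main() {
    // Dense 1xn accumulator (row-major, like the mapped buffer in the real code).
    Eigen::Matrix<float, 1, Eigen::Dynamic, Eigen::RowMajor> acc(1, 8);
    acc.setZero();

    // Sparse 1xm row vector and sparse mxn matrix (the real matrix is a column-major map).
    Eigen::SparseMatrix<float, Eigen::RowMajor> rowVec(1, 4);
    rowVec.insert(0, 1) = 2.0f;
    Eigen::SparseMatrix<float> mat(4, 8);
    mat.insert(1, 3) = 5.0f;

    acc += (rowVec * mat);   // 1x8 += (1x4 * 4x8)
}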
What appears to be happening from looking at the templated callstack is the add_assign_op correctly recognizes that the row vector has the RowMajor flag and the matrix is ColMajor, but then the sparse_sparse_to_dense_product_impl has both the lhs and rhs being ColMajor.
From looking at the sparse_sparse_to_dense_product_selector code, this appears to be because Eigen simply converts the RowMajor lhs into a ColMajor copy and calls the product impl. This seems bound to crash; it's a row vector for a reason, and I'm not sure why Eigen finds the need to transpose it. I'm really unsure how this is meant to work.
My challenge is that, for memory-streaming efficiency, I need the larger matrix to be laid out sequentially in memory such that either a) it's column-major and pre-multiplied by a row vector, or b) it's row-major and post-multiplied by a column vector. Both versions hit this storage-order conversion code rather than just letting the two operands have different orders.
Can anyone lend a hand? Am I doing something wrong? Is this a bug?
ADDED:
I'm reasonably sure the crash is my own fault at this point after all, but I'd still like to understand why the row vector is being transposed before the multiply, as I would assume this would produce undesirable behavior. Basically, at this point I would just like to understand why sparse_sparse_to_dense_product_selector transposes row vectors into column vectors before the multiply.
I used Rcpp (especially RcppArmadillo) to implement a method that returns several large matrices as its result, for example of size 10000*10000. How can I save these matrices to use them in the R environment? Assume that my code in Rcpp looks like:
Rcpp::List Output(20000);
for (int i = 0; i < 20000; ++i) {
    ...
    ...
    // Suppose that the previous lines allow me to compute a matrix Gi of size 10000*10000
    Output[i] = Gi;
}
return Output;
The way I programmed this is very costly and needs a lot of memory, but I need the 20000 matrices to compute an estimator in the R environment. How can I save the matrices? I do not know if the bigmatrix package can help me.
Best,
I finally found a solution. I noticed that I would need about 15 TB to save the matrices, which is impossible. What I finally did was to save only some features of the matrices, such as the eigenvalues and a few other quantities. See more details here
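For illustration, a sketch of that idea with RcppArmadillo (not my exact code; it assumes the Gi are symmetric so eig_sym applies, and uses a small random stand-in matrix so the example actually runs):

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
Rcpp::List spectra(int n) {
    Rcpp::List out(n);
    for (int i = 0; i < n; ++i) {
        // Stand-in for the real computation of Gi (here a random symmetric matrix).
        arma::mat Gi = arma::randu<arma::mat>(100, 100);
        Gi = 0.5 * (Gi + Gi.t());

        // Keep only the eigenvalues (a length-100 vector) instead of the full matrix.
        arma::vec eigval;
        arma::eig_sym(eigval, Gi);
        out[i] = eigval;
    }
    return out;
}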
Given a super-basic vertex shader such as:
output.position = mul(position, _gWorldViewProj);
I was having a great deal of trouble because I was setting _gWorldViewProj as follows; I tried both (a bit of flailing) to make sure it wasn't just backwards.
mWorldViewProj = world * view * proj;
mWorldViewProj = proj * view * world;
My solution turned out to be:
mWorldView = mWorld * mView;
mWorldViewProj = XMMatrixTranspose(mWorldView * proj);
Can someone explain why this XMMatrixTranspose was required? I know there were matrix differences between XNA and HLSL (I think) but not between vanilla C++ and HLSL, though I could be wrong.
Problem is I don't know if I'm wrong or what I'm wrong about! So if someone could tell me precisely why the transpose is required, I hopefully won't make the same mistake again.
On the CPU, 2D arrays are generally stored in row-major ordering, so the order in memory goes x[0][0], x[0][1], ... In HLSL, matrix declarations default to column-major ordering, so the order goes x[0][0], x[1][0], ...
In order to transform the memory from the format defined on the CPU to the order expected in HLSL, you need to transpose the CPU matrix before sending it to the GPU. Alternatively, you can use the row_major keyword in HLSL to declare the matrices as row major, eliminating the need for a transpose but leading to different codegen in HLSL (you'll often end up with mul-adds instead of dot-products).
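On the C++ side, the transpose-before-upload step might look roughly like this (a sketch with DirectXMath; PerObjectConstants is a hypothetical struct mirroring the cbuffer):

#include <DirectXMath.h>
using namespace DirectX;

struct PerObjectConstants { XMFLOAT4X4 worldViewProj; };  // hypothetical cbuffer mirror

void UpdateConstants(PerObjectConstants& cb)
{
    // DirectXMath builds row-major matrices on the CPU; HLSL defaults to
    // column-major, so transpose once before copying into the constant buffer.
    XMMATRIX world = XMMatrixIdentity();
    XMMATRIX view  = XMMatrixLookAtLH(XMVectorSet(0.f, 0.f, -5.f, 1.f),
                                      XMVectorSet(0.f, 0.f,  0.f, 1.f),
                                      XMVectorSet(0.f, 1.f,  0.f, 0.f));
    XMMATRIX proj  = XMMatrixPerspectiveFovLH(XM_PIDIV4, 16.f / 9.f, 0.1f, 100.f);

    XMMATRIX worldViewProj = world * view * proj;
    XMStoreFloat4x4(&cb.worldViewProj, XMMatrixTranspose(worldViewProj));
}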
I am trying to use CUSP as an external linear solver for Mathematica to use the power of the GPU.
Here is the CUSP Project webpage. I am asking for suggestions on how we can integrate CUSP with Mathematica. I am sure many of you here will be interested in discussing this. I think writing the input matrix to a file and then feeding it to a CUSP program is not the way to go. Using Mathematica's LibraryFunctionLoad would be a better way to pipeline the input matrix to the GPU-based solver on the fly. What would be the way to supply the matrix and the right-hand side directly from Mathematica?
Here is some CUSP code snippet.
#include <cusp/hyb_matrix.h>
#include <cusp/io/matrix_market.h>
#include <cusp/krylov/cg.h>
int main(void)
{
    // create an empty sparse matrix structure (HYB format)
    cusp::hyb_matrix<int, float, cusp::device_memory> A;

    // load a matrix stored in MatrixMarket format
    cusp::io::read_matrix_market_file(A, "5pt_10x10.mtx");

    // allocate storage for solution (x) and right hand side (b)
    cusp::array1d<float, cusp::device_memory> x(A.num_rows, 0);
    cusp::array1d<float, cusp::device_memory> b(A.num_rows, 1);

    // solve the linear system A * x = b with the Conjugate Gradient method
    cusp::krylov::cg(A, x, b);

    return 0;
}
This question gives us the opportunity to discuss the compilation capabilities of Mathematica 8. It is also possible to bring in the topic of the MathLink interface of MMA. I hope people here find this problem worthwhile and interesting enough to ponder.
BR
If you want to use LibraryLink (for which LibraryFunctionLoad is used to access a dynamic library function as a Mathematica downvalue), there's actually not much room for discussion: LibraryFunctions can receive Mathematica tensors of machine doubles or machine integers, and you're done.
The Mathematica MTensor format is a dense array, just as you'd naturally use in C, so if CUSP uses some other format you will need to write some glue code to translate between representations.
Refer to the LibraryLink tutorial for full details.
You will especially want to note the section "Memory Management of MTensors" in the Interaction with Mathematica page, and choose the "Shared" passing mode to pass a Mathematica tensor by reference.
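As a rough illustration of what that glue code could look like (an untested sketch, not an official recipe; cuspSolve is a hypothetical function name, and it assumes the caller passes the CSR pieces of A as zero-based integer tensors plus a real tensor for b):

#include "WolframLibrary.h"
#include <cusp/csr_matrix.h>
#include <cusp/array1d.h>
#include <cusp/krylov/cg.h>

// Hypothetical LibraryLink entry point: Args = {rowOffsets, columnIndices, values, b},
// Res = the solution x returned as a real tensor.
EXTERN_C DLLEXPORT mint WolframLibrary_getVersion() { return WolframLibraryVersion; }
EXTERN_C DLLEXPORT int WolframLibrary_initialize(WolframLibraryData libData) { return 0; }

EXTERN_C DLLEXPORT int cuspSolve(WolframLibraryData libData, mint Argc,
                                 MArgument *Args, MArgument Res)
{
    MTensor tRow = MArgument_getMTensor(Args[0]);
    MTensor tCol = MArgument_getMTensor(Args[1]);
    MTensor tVal = MArgument_getMTensor(Args[2]);
    MTensor tB   = MArgument_getMTensor(Args[3]);

    mint n   = libData->MTensor_getFlattenedLength(tB);
    mint nnz = libData->MTensor_getFlattenedLength(tVal);
    mint*  row = libData->MTensor_getIntegerData(tRow);   // length n+1, zero-based
    mint*  col = libData->MTensor_getIntegerData(tCol);   // length nnz, zero-based
    mreal* val = libData->MTensor_getRealData(tVal);
    mreal* rhs = libData->MTensor_getRealData(tB);

    // Assemble the CSR matrix on the host, then copy everything to the GPU.
    cusp::csr_matrix<int, float, cusp::host_memory> A(n, n, nnz);
    for (mint i = 0; i <= n; ++i)  A.row_offsets[i] = (int)row[i];
    for (mint k = 0; k < nnz; ++k) { A.column_indices[k] = (int)col[k];
                                     A.values[k]         = (float)val[k]; }
    cusp::array1d<float, cusp::host_memory> b(n);
    for (mint i = 0; i < n; ++i) b[i] = (float)rhs[i];

    cusp::csr_matrix<int, float, cusp::device_memory> dA = A;
    cusp::array1d<float, cusp::device_memory> db = b, dx(n, 0);
    cusp::krylov::cg(dA, dx, db);                 // solve A * x = b on the GPU

    // Copy the solution back into a new MTensor and return it.
    cusp::array1d<float, cusp::host_memory> hx = dx;
    MTensor tX; mint dims[1] = { n };
    libData->MTensor_new(MType_Real, 1, dims, &tX);
    mreal* out = libData->MTensor_getRealData(tX);
    for (mint i = 0; i < n; ++i) out[i] = (mreal)hx[i];
    MArgument_setMTensor(Res, tX);
    return LIBRARY_NO_ERROR;
}

On the Mathematica side this would be loaded with something like LibraryFunctionLoad["cuspLink", "cuspSolve", {{Integer, 1}, {Integer, 1}, {Real, 1}, {Real, 1}}, {Real, 1}], with the "Shared" passing option added per the memory-management section above.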