Sparse matrix creation in parallel

Are there any algorithms that allow efficient creation (element filling) of a sparse matrix (e.g. CSR or coordinate format) in parallel?

If you store your matrix as a coordinate map, any language which has a concurrent dictionary implementation available should do the job for you.
Java has ConcurrentHashMap and .NET 4 has ConcurrentDictionary, both of which allow multi-threaded, (as far as I know) non-blocking element insertion in parallel.
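A rough illustration of the same pattern in Python, which has no lock-free concurrent dictionary, so a plain dict guarded by a lock stands in for ConcurrentHashMap (the fill function and sparsity pattern are made up for the demo):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

matrix = {}                  # coordinate map: {(row, col): value}
lock = threading.Lock()
N = 1000

def fill_row(i):
    """Compute one row's nonzeros and insert them into the shared map."""
    for j in range(i % 7, N, 7):          # arbitrary sparsity pattern
        value = 1.0 / (1 + i + j)
        with lock:                        # ConcurrentHashMap avoids this lock
            matrix[(i, j)] = value

with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(fill_row, range(N)))    # list() forces completion / surfaces errors

print(len(matrix), "nonzeros inserted")
```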

There is no generally efficient algorithm for filling a sparse matrix in a data-parallel way. One plausible option is the coordinate (COO) format, which requires sorting after the entries have been filled in, but that format is slow for matrix products and similar operations.
One solution is not to build the sparse matrix at all: instead of keeping it in memory, you perform the operations implicitly, computing the elements of the matrix on the fly as they are needed.
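In code, "not building the matrix" usually means exposing only its action on a vector. A minimal matrix-free sketch, where each element of the hypothetical operator is derived from its indices on the fly (the tridiagonal formula is purely illustrative):

```python
import numpy as np

def apply_A(x):
    """Compute y = A @ x without ever storing A: element values are
    computed from their indices as needed (here, a 1-D Laplacian stencil)."""
    n = len(x)
    y = np.empty_like(x)
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y

# Iterative methods (CG, Lanczos, ...) only ever need this product.
print(apply_A(np.ones(5)))
```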

Related

Tensorflow: Operations With Block-diagonal Matrices

I'm looking for a way to implement block-diagonal matrices in Tensorflow. Specifically, I have block-diagonal matrix A with N blocks of size S x S each. Further, I have a vector v of length N*S. I want to calculate A dot v. Is there any efficient way to do it in Tensorflow?
Also, I would prefer the implementation which supports a batch dimension of v (e.g. its real dimension is batch_size x (N*S)) and which is memory efficient, keeping in memory only block-diagonal parts of A.
Thanks for any help!
You can simply convert your tensor to a sparse tensor, since a block-diagonal matrix is just a special case of one. Then the operations are done in an efficient way. If you already have a dense representation of the tensor, you can cast it using sparse_tensor = tf.contrib.layers.dense_to_sparse(dense_tensor). Otherwise, you can construct it with the tf.SparseTensor(...) function. To get the indices, you might use tf.strided_slice.
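A minimal sketch of that approach against the TF 2.x API (where tf.contrib no longer exists and the multiply is tf.sparse.sparse_dense_matmul); the shapes N, S and the batch handling follow the question's setup:

```python
import numpy as np
import tensorflow as tf

N, S, batch = 3, 2, 4
blocks = np.random.rand(N, S, S).astype(np.float32)  # only the blocks are stored

# COO indices for the block-diagonal layout: block k occupies rows and
# columns [k*S, (k+1)*S). The loop order yields row-major sorted indices.
indices = [[k * S + r, k * S + c]
           for k in range(N) for r in range(S) for c in range(S)]
A = tf.SparseTensor(indices=indices,
                    values=blocks.reshape(-1),
                    dense_shape=[N * S, N * S])

v = tf.random.normal([batch, N * S])
# sparse_dense_matmul takes the dense operand on the right, so transpose v.
Av = tf.transpose(tf.sparse.sparse_dense_matmul(A, tf.transpose(v)))
print(Av.shape)  # (batch, N*S)
```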

How are sparse Ax = b systems solved in practice?

Let A be an n x n sparse matrix, represented by a sequence of m tuples of the form (i, j, a), with indices i, j (between 0 and n-1) and a value a in the underlying field F.
What algorithms are used, in practice, to solve linear systems of equations of the form Ax = b? Please describe them, don't just link somewhere.
Notes:
I'm interested both in exact solutions for finite fields, and in exact and bounded-error solutions for reals or complex numbers using floating-point representation. I suppose exact or bounded-error solutions for rational numbers are also interesting.
I'm particularly interested in parallelizable solutions.
A is not fixed, i.e. you don't just get different b's for the same A.
The two main algorithms that I have used and parallelised are the Wiedemann algorithm and the Lanczos algorithm (and their block variants for GF(2) computations), both of which are better than structured Gaussian elimination.
The LaMacchia-Odlyzko paper (the one for the Lanczos algorithm) will tell you what you need to know. The algorithms involve repeatedly multiplying your sparse matrix by a sequence of vectors. To do this efficiently, you need to use the right data structure (a linked list) to make the matrix-vector multiply time proportional to the number of non-zero values in the matrix (i.e. the sparsity).
Parallelisation of these algorithms is trivial, but optimisation will depend upon the architecture of your system. The parallelisation of the matrix-vector multiply is done by splitting the matrix into blocks of rows (each processor gets one block); each block of rows is multiplied by the vector separately. Then you combine the results to get the new vector.
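A rough sketch of that row-block scheme in Python (CSR arrays stand in for the linked lists mentioned above; either makes the multiply cost proportional to the number of nonzeros, and the thread pool stands in for whatever parallel architecture you actually have):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def matvec_rows(indptr, indices, data, x, y, row_lo, row_hi):
    """One worker: multiply a block of rows of the sparse matrix by x."""
    for i in range(row_lo, row_hi):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y[i] = s

def parallel_matvec(indptr, indices, data, x, n_workers=4):
    n = len(indptr) - 1
    y = np.zeros(n)
    bounds = np.linspace(0, n, n_workers + 1, dtype=int)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            pool.submit(matvec_rows, indptr, indices, data, x, y, lo, hi)
    return y  # combining is trivial: each worker wrote a disjoint slice of y

# smoke test: 4x4 identity in CSR form
indptr, indices, data = np.array([0, 1, 2, 3, 4]), np.arange(4), np.ones(4)
print(parallel_matvec(indptr, indices, data, np.array([1.0, 2.0, 3.0, 4.0])))
```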
I've done these types of computations extensively. The team that originally broke the RSA-129 factorisation took 6 weeks using structured Gaussian elimination on a 16,384-processor MasPar. On the same machine, I worked with Arjen Lenstra (one of the authors) to solve the matrix in 4 days with block Wiedemann and in 1 day with block Lanczos. Unfortunately, I never published the result!

Evolving a matrix using a genetic algorithm

I recently discovered genetic algorithms, and after doing a little research I can't find any example of how to evolve structures more complex than a vector or a string.
Let's say that I'm using a covariance matrix for a certain computation (to compute a Mahalanobis distance, for example) and I want to look for a better matrix to do the job and minimize a certain criterion. Are there any classic examples of how to evolve the matrix, and which crossover operators to use?
Thanks!
Any structure of fixed size and shape that is made of numbers (or any other elements) can be rewritten to a 1-D vector and back. You can then use any operator you like which works on vectors.
If you wanted to work with matrices (or any other structures) directly you can always design your own operators, but a matrix basically is a vector, just written in a different way. For the matrix case there are a number of possibilities for operators (crossover):
Swap rows/columns (between the parents)
Swap submatrices (generalization of the above)
Continuous-space crossover methods like BLX-alpha, PCX, arithmetic crossover... These are all designed for vectors, but you can just treat the matrix as a vector (it's really not that different).
Mutation is probably going to be more or less identical to the vector case: you just mutate the elements (or some of them).
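A minimal sketch of two of those operators in NumPy (the function names are illustrative, not from any GA library):

```python
import numpy as np

rng = np.random.default_rng(0)

def row_swap_crossover(a, b):
    """Child inherits each row from one parent or the other (row swap)."""
    mask = rng.random(a.shape[0]) < 0.5
    return np.where(mask[:, None], a, b)

def arithmetic_crossover(a, b):
    """Treat the matrices as flat collections of numbers and blend element-wise."""
    w = rng.random(a.shape)
    return w * a + (1.0 - w) * b

p1, p2 = rng.random((4, 4)), rng.random((4, 4))
print(row_swap_crossover(p1, p2))
print(arithmetic_crossover(p1, p2))
```

One caveat for the covariance-matrix use case in the question: a child produced this way is not guaranteed to stay symmetric positive semidefinite, so you may need a repair step (e.g. symmetrise and clip negative eigenvalues) after crossover or mutation.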

Armadillo C++: Is there a specific way to efficiently create triangular or symmetric matrices

I am using Armadillo mostly for symmetric and triangular matrices. I wanted to be efficient in terms of memory storage. However, it seems there is no way other than to create a new mat and fill the lower/upper part of the matrix with zeros (for triangular) or with duplicates (for symmetric).
Is there a more efficient way of using triangular/symmetric matrices using Armadillo?
Thanks,
Antoine
There is no specific support for triangular or banded matrices in Armadillo. However, since version 3.4 support for sparse matrices has gradually been added. Depending on what Armadillo functions you need, and the sparsity of your matrix, you might gain from using SpMat<type> which implements the compressed sparse column (CSC) format. For each nonzero value in your matrix the CSC format stores the row index along with the value so you would likely not save much memory for a triangular matrix. A banded diagonal matrix should however consume significantly less memory.
symmatu()/symmatl() and trimatu()/trimatl() may be what you are looking for: http://arma.sourceforge.net/docs.html
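Armadillo's API is C++, but the storage tradeoff described in the first answer is easy to check numerically. A sketch using SciPy's csc_matrix, purely as a convenient CSC implementation (not Armadillo itself):

```python
import numpy as np
import scipy.sparse as sp

n = 1000
dense_bytes = n * n * 8  # dense double-precision storage

# Lower-triangular: ~n^2/2 nonzeros, and CSC stores a row index per value,
# so the saving over dense storage is small.
tri = sp.csc_matrix(np.tril(np.ones((n, n))))

# Tridiagonal (banded): ~3n nonzeros, a large saving.
band = sp.diags([1.0, 2.0, 1.0], offsets=[-1, 0, 1], shape=(n, n)).tocsc()

for name, m in [("triangular", tri), ("banded", band)]:
    used = m.data.nbytes + m.indices.nbytes + m.indptr.nbytes
    print(f"{name}: {used} bytes vs {dense_bytes} bytes dense")
```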

Why are permutation matrices used to swap rows of an array?

What are the advantages of using a permutation matrix to swap rows? Why would one create a permutation matrix and then apply a matrix multiplication? Is that easier and more efficient than just swapping rows with a for loop?
Permutation matrices are a useful mathematical abstraction, because they allow analysis using the normal rules of matrix algebra, without having to introduce another type of operation.
In software, good implementations do not store a permutation matrix as a full matrix, they store a permutation array and they apply it directly (without a full matrix multiplication).
Depending on the sizes of the matrices and the operations and access patterns involved, it may be cheaper not to apply the permutation to the data in memory at all, but just to use it as an extra indirection. So, when you request (P * M)(i,j), where P is a permutation matrix and M is some other matrix that you are permuting, the data need not be re-arranged at all, but rather the element access operation will look up the permuted row when you access the element.
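A minimal NumPy sketch of both points (the array names are illustrative):

```python
import numpy as np

M = np.arange(12.0).reshape(4, 3)
p = np.array([2, 0, 3, 1])   # permutation array: row i of P*M is row p[i] of M

# Apply the permutation directly: no n x n matrix, no matrix multiplication.
PM = M[p]

# Or leave the data where it is and use p as an extra indirection on access:
def pm_element(i, j):
    return M[p[i], j]        # looks up (P*M)(i, j) without rearranging memory

assert PM[1, 2] == pm_element(1, 2)
print(PM)
```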
The first thing that comes to mind is the issue called "spatial locality". Caching hardware assumes that if a memory location is accessed, nearby locations are likely to be accessed soon. Whether the elements of a row or the elements of a column are neighbours in memory depends on the language's layout (row-major versus column-major). I guess permutation matrices are designed with this problem in mind, since optimising matrix multiplication is one of the problems the algorithms community works hardest on improving. A simple loop structure will not be able to exploit the cache to improve performance.
