fast unitary matrix multiplication with block acceleration - algorithm

Assume I have a very fast subroutine for fixed-size unitary matrix multiplication. (The subroutine may involve hardware acceleration.) Say, a function called quantum_unmm_256(A, U, m) right-multiplies an m by 256 matrix A with a 256 by 256 unitary matrix U.
Now I want to multiply something with a unitary matrix whose size is a multiple of 256, say, a 1280x1280 unitary matrix. What would be a fast algorithm that makes the best use of the fast subroutine?
All matrices are assumed dense, with 64- or 128-bit complex floating-point entries.

Have a look at parallel matrix multiplication algorithms. You can always divide the matrix into blocks and multiply it in pieces, and you can even reduce the number of operations needed. See, for example, the Wikipedia article on block matrix multiplication, which describes how the product of two partitioned matrices is assembled from products of their blocks.
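For what it's worth, here is a minimal numpy sketch of that blocked approach. One caveat: the 256x256 sub-blocks of a unitary matrix are generally not unitary themselves, so the accelerated quantum_unmm_256 is stood in for by a hypothetical block_mm_256 kernel that accepts arbitrary 256x256 blocks:

import numpy as np

BLOCK = 256

def block_mm_256(A_slab, U_block):
    # Hypothetical stand-in for the accelerated kernel: right-multiplies an
    # (m x 256) slab by an arbitrary (256 x 256) block.
    return A_slab @ U_block

def blocked_right_multiply(A, U, block=BLOCK):
    # Computes A @ U where U is (n x n) and n is a multiple of `block`.
    m, n = A.shape
    assert U.shape == (n, n) and n % block == 0
    nb = n // block
    out = np.empty((m, n), dtype=np.result_type(A.dtype, U.dtype))
    for j in range(nb):                                   # column block of the result
        acc = np.zeros((m, block), dtype=out.dtype)
        for k in range(nb):                               # accumulate A_k @ U[k, j]
            acc += block_mm_256(A[:, k*block:(k+1)*block],
                                U[k*block:(k+1)*block, j*block:(j+1)*block])
        out[:, j*block:(j+1)*block] = acc
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 1280)) + 1j * rng.standard_normal((64, 1280))
Q, _ = np.linalg.qr(rng.standard_normal((1280, 1280)) + 1j * rng.standard_normal((1280, 1280)))
assert np.allclose(blocked_right_multiply(A, Q), A @ Q)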

This isn't a full answer, but it's too long for a comment:
It might be easier to work with the (1280x1280) array if it were reshaped to (5, 256, 5, 256) and then transposed to (5, 5, 256, 256). But even that could require a copy() to ensure that the innermost blocks (in numpy terms) are contiguous.
It could even be cast as a (5, 5) object-dtype array, where each element is a 256x256 block to hand to your 'fast' routine.
I could elaborate on those actions if needed, but I suspect you have enough numpy skills to do it.
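A rough sketch of those reshape/transpose steps, assuming a 1280x1280 complex array U (5 blocks of 256 along each axis):

import numpy as np

U = np.zeros((1280, 1280), dtype=np.complex128)   # placeholder for the actual unitary matrix
blocks = U.reshape(5, 256, 5, 256).transpose(0, 2, 1, 3).copy()
# blocks[i, j] is the contiguous (256 x 256) block U[256*i:256*(i+1), 256*j:256*(j+1)]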
There are a lot of things that are unclear about this question:
why both MATLAB and numpy tags?
how is this block acceleration coded? If it's fast it must be compiled code; if so, what's the proposed link to the interpreted code?
what are the constraints on the data structure? I suspect it must be some sort of contiguous block(s) of data. That's why I suggest the reshape and transpose.

Related

Complex matrix multiplication OpenCL

I'm new to OpenCL programming. I have to multiply two complex matrices, but I don't know how to deal with complex matrices in OpenCL. Any help, please? I have already done matrix multiplication with ordinary real numbers.
One way, though probably not the most efficient, would be to regard your complex matrix, say Z, as two real matrices: X (the real parts) and Y (the imaginary parts), i.e.
X[i,j] = Real(Z[i,j]), Y[i,j] = Imag(Z[i,j])
If you have another complex matrix, say W, which is split as above into U and V, then the product is
Z*W = (X*U - Y*V) + i*(X*V + Y*U)
where the right-hand side involves only real matrices, real matrix multiplication, and real addition.
In terms of multiplies and adds this is the same amount of computation as doing the complex multiplications and additions (of the elements) directly. The inefficiency comes if you are given, and must return, arrays of complex numbers: then you have to split the matrices you are going to multiply into real ones, as above, and combine the product back into a complex array.
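The same split written out in numpy just to check the algebra (this mirrors the formula above; it is not OpenCL code):

import numpy as np

def complex_matmul_via_real(Z, W):
    X, Y = Z.real, Z.imag          # real and imaginary parts of Z
    U, V = W.real, W.imag          # real and imaginary parts of W
    # Four real matrix products, recombined into the complex result
    return (X @ U - Y @ V) + 1j * (X @ V + Y @ U)

Z = np.random.rand(4, 4) + 1j * np.random.rand(4, 4)
W = np.random.rand(4, 4) + 1j * np.random.rand(4, 4)
assert np.allclose(complex_matmul_via_real(Z, W), Z @ W)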

Tensorflow: Operations With Block-diagonal Matrices

I'm looking for a way to implement block-diagonal matrices in Tensorflow. Specifically, I have a block-diagonal matrix A with N blocks of size S x S each. Further, I have a vector v of length N*S. I want to calculate A dot v. Is there any efficient way to do it in Tensorflow?
Also, I would prefer the implementation which supports a batch dimension of v (e.g. its real dimension is batch_size x (N*S)) and which is memory efficient, keeping in memory only block-diagonal parts of A.
Thanks for any help!
You can simply convert your tensor to a sparse tensor, since a block-diagonal matrix is just a special case of one. Then the operations are done in an efficient way. If you already have a dense representation of the tensor you can cast it using sparse_tensor = tf.contrib.layers.dense_to_sparse(dense_tensor). Otherwise, you can construct it with the tf.SparseTensor(...) function. To get the indices, you might use tf.strided_slice; see this post for more information.
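A small sketch along those lines with the TF2-style sparse API (tf.SparseTensor plus tf.sparse.sparse_dense_matmul; the dense_to_sparse helper above lives in the old contrib namespace), keeping only the N diagonal blocks in memory and supporting a batch dimension on v:

import numpy as np
import tensorflow as tf

N, S, batch = 4, 3, 8
blocks = np.random.rand(N, S, S).astype(np.float32)      # only the diagonal blocks are stored

# Indices/values of the (N*S) x (N*S) block-diagonal matrix, in row-major order
indices, values = [], []
for b in range(N):
    for i in range(S):
        for j in range(S):
            indices.append([b * S + i, b * S + j])
            values.append(blocks[b, i, j])
A_sparse = tf.SparseTensor(indices=indices, values=values, dense_shape=[N * S, N * S])

v = tf.constant(np.random.rand(batch, N * S).astype(np.float32))
# adjoint_b=True treats v as (N*S, batch); transpose back to batch-first
Av = tf.transpose(tf.sparse.sparse_dense_matmul(A_sparse, v, adjoint_b=True))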

How are sparse Ax = b systems solved in practice?

Let A be an n x n sparse matrix, represented by a sequence of m tuples of the form (i,j,a) --- with indices i,j (between 0 and n-1) and a a value in the underlying field F.
What algorithms are used, in practice, to solve linear systems of equations of the form Ax = b? Please describe them, don't just link somewhere.
Notes:
I'm interested both in exact solutions for finite fields, and in exact and bounded-error solutions for reals or complex numbers using floating-point representation. I suppose exact or bounded-error solutions for rational numbers are also interesting.
I'm particularly interested in parallelizable solutions.
A is not fixed, i.e. you don't just get different b's for the same A.
The main two algorithms that I have used and parallelised are the Wiedemann algorithm and the Lanczos algorithm (and their block variants for GF(2) computations), both of which are better than structured Gaussian elimination.
The LaMacchia-Odlyzko paper (the one for the Lanczos algorithm) will tell you what you need to know. The algorithms involve repeatedly multiplying your sparse matrix by a sequence of vectors. To do this efficiently, you need to use the right data structure (a linked list of the non-zero entries) so that the matrix-vector multiply takes time proportional to the number of non-zero values in the matrix (i.e. the sparsity).
Parallelisation of these algorithms is trivial, but optimisation will depend on the architecture of your system. The matrix-vector multiply is parallelised by splitting the matrix into blocks of rows (each processor gets one block); each block of rows multiplies the vector separately, and then you combine the results to get the new vector.
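A plain-Python sketch of that layout: group the (i, j, a) triples by row so one multiply touches every stored non-zero exactly once, and hand each worker a contiguous band of rows for the parallel version (the function names here are illustrative only):

from collections import defaultdict

def build_rows(triples):
    # Group (i, j, a) entries by row index i.
    rows = defaultdict(list)
    for i, j, a in triples:
        rows[i].append((j, a))
    return rows

def spmv(rows, x, n):
    # y = A x, with cost proportional to the number of non-zeros.
    y = [0] * n
    for i, entries in rows.items():
        y[i] = sum(a * x[j] for j, a in entries)
    return y

def spmv_row_band(rows, x, lo, hi):
    # The piece of y owned by one worker: rows lo..hi-1 only.
    return [sum(a * x[j] for j, a in rows.get(i, [])) for i in range(lo, hi)]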
I've done these types of computations extensively. The team that originally broke the RSA-129 factorisation took 6 weeks using structured Gaussian elimination on a 16,384-processor MasPar. On the same machine, I worked with Arjen Lenstra (one of the authors) to solve the matrix in 4 days with block Wiedemann and 1 day with block Lanczos. Unfortunately, I never published the result!

MATLAB sparsity - Is there a speed advantage in my situation?

I have an MxM matrix S whose entries are zero on the diagonal, and non-zero everywhere else. I need to make a larger, block matrix. The blocks will be size NxN, and there will be MxM of them.
The (i,j)th block will be S(i,j)*I, where I = eye(N) is the NxN identity. This matrix will certainly be sparse: S has M^2-M nonzero entries, so my block matrix will have N*(M^2-M) nonzeros out of (NM)^2 entries, i.e. a fraction of roughly 1/N. But I'll be adding it to another NMxNM matrix that I do not expect to be sparse.
Since I will be adding my block matrix to a full matrix, would there be any speed gain from writing my code in a 'sparse' way? I keep going back and forth, but my thinking is settling on this: even if my code to turn S into a sparse block matrix isn't very efficient, when I tell MATLAB to add a full and a sparse matrix together, wouldn't it know that it only needs to iterate over the nonzero elements?
I've been trained that for loops are slow in MATLAB and that things like repmat and padding with zeros are faster, but my guess is that the fastest thing would be to not build the block matrix at all, and instead write code that adds the entries of (the small matrix) S to my other (large, full) matrix in a sparse way. If I learn how to build the block matrix with sparse code (faster than building it full and passing it to sparse), then that code should be able to do the addition for me in a sparse way without even needing to build the block matrix, right?
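As an aside on that construction: the block matrix whose (i,j) block is S(i,j)*I is exactly the Kronecker product kron(S, eye(N)), which can be built sparsely without any loops. A rough sketch (in Python/scipy rather than MATLAB, purely for illustration; kron and speye in MATLAB follow the same pattern):

import numpy as np
from scipy import sparse

M, N = 4, 3                                   # small stand-in sizes
S = np.random.rand(M, M)
np.fill_diagonal(S, 0)                        # zero diagonal, non-zero elsewhere

# Block matrix whose (i, j) block is S[i, j] * eye(N): exactly kron(S, I_N)
A_block = sparse.kron(sparse.csr_matrix(S), sparse.identity(N), format='csr')

full = np.random.rand(N * M, N * M)           # the dense matrix it gets added to
result = full + A_block.toarray()             # the sum is dense either way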
If you can keep a full NMxNM matrix in memory, don't bother with sparse operations. In fact, in most cases A+B with A full and B sparse will take longer than A+B where A and B are both full.
From your description, using sparse is likely slower for your problem:
If you're adding a sparse matrix A to a full matrix B, the result is full and there's almost certainly no advantage to having A sparse.
For example:
n = 12000; A = rand(n, n); B1 = rand(n, n); B2 = spalloc(n, n, n*n);
B2 is as sparse as possible, that is, it's all zeros!
On my machine, A+B1 takes about .23 seconds while A + B2 takes about .7 seconds.
Basically, operations on full matrices use BLAS/LAPACK library calls that are insanely optimized. Overhead associated with sparse is going to make things worse unless you're in the special cases where sparse is super useful.
When is sparse super useful?
Sparse is super useful when the size of matrices suggest that some algorithm should be very slow, but because of sparsity (+ perhaps special matrix structure), the actual number of computations required is orders of magnitude less.
EXAMPLE: Solving the linear system A*x=b where A is a block-diagonal matrix:
As = sparse(rand(5, 5)); for(i=1:999) As = blkdiag(As, sparse(rand(5,5))); end %generate a 5000x5000 sparse block diagonal matrix of 5x5 blocks
Af = full(As);
b = rand(5000, 1);
On my machine, solving the linear system on the full matrix (i.e. Af \ b) takes about 2.3 seconds while As \ b takes .0012 seconds.
Sparse can be awesome, but it's only helpful for large problems where you can cleverly exploit structure.

finding eigenvalues of huge and very sparse matrix

I have the following problem. There is a matrix A of size NxN, where N = 200 000. It is very sparse: there are exactly M elements in each row, where M is one of {6, 18, 40, 68, 102} (I have 5 different scenarios), and the rest are zeros.
Now I would like to get all the eigenvalues and eigenvectors of matrix A.
Problem is, I cannot put matrix A into memory, as it is around 160 GB of data. What I am looking for is software that allows storing the sparse matrix compactly (without the zeros, my matrix is just a few MB) and then passing this stored matrix to an algorithm that calculates the eigenvalues and eigenvectors.
Can any of you recommend me a software for that?
EDIT: I found out I can reconfigure my matrix A so it becomes a band matrix. Then I could use LAPACK to get the eigenvalues and eigenvectors (concretely: http://software.intel.com/sites/products/documentation/doclib/iss/2013/mkl/mklman/GUID-D3C929A9-8E33-4540-8854-AA8BE61BB08F.htm). Problem is, I need all the vectors, and since my matrix is NxN, I cannot let LAPACK store the whole solution (all eigenvectors) in memory. The best approach would be a function that gives me the first K eigenvectors, then lets me rerun the program to get the next K eigenvectors, and so on, so I can save the results to a file.
You may try to use the SLEPc library, http://www.grycap.upv.es/slepc/description/summary.htm :
"SLEPc, the Scalable Library for Eigenvalue Problem Computations, is a software library for the solution of large sparse eigenproblems on parallel computers."
Read the second chapter of their users' manual, "EPS: Eigenvalue Problem Solver". They are focused on methods that preserve sparsity... but only a limited number of eigenvalues and eigenvectors are computed.
I hope your matrices have good properties (positive definite, for instance...):
EPSIsPositive(EPS eps,PetscBool *pos);
You may be interested in "spectrum slicing" to compute all eigenvalues in a given interval... Or you may set a target and compute the eigenvalues closest to that target.
See http://www.grycap.upv.es/slepc/documentation/current/docs/manualpages/EPS/EPSSetWhichEigenpairs.html#EPSSetWhichEigenpairs
See examples http://www.grycap.upv.es/slepc/documentation/current/src/eps/examples/tutorials/index.html
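A rough sense of that target / shift-invert idea, written with scipy.sparse.linalg rather than SLEPc just to show the pattern of getting K eigenpairs at a time and saving each batch to disk (sizes here are a small stand-in for the 200 000 x 200 000 problem):

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

n, K = 5000, 50
A = sparse.random(n, n, density=6.0 / n, format='csr')           # ~6 non-zeros per row
A = (A + A.T) * 0.5 + 10.0 * sparse.identity(n, format='csr')    # symmetric, comfortably non-singular

target = 0.0
# Shift-invert around `target` returns the K eigenvalues closest to it; sweeping
# the target (or interval endpoints, as in spectrum slicing) yields the spectrum
# in batches that can each be written to disk instead of held in memory at once.
vals, vecs = eigsh(A, k=K, sigma=target, which='LM')
np.save('eigvecs_batch_0.npy', vecs)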
Why do you need to compute all the eigenvectors for such large matrices?
Bye,
