Is there any built-in matrix type in CUDA for matrix and matrix-vector operations?

I want to implement some matrix–vector math. There are vector types like float2 and int2, but I cannot find any built-in matrix type in CUDA.
Is there a library suitable for such operations?

You're right to look for a library for matrix data types. I recommend taking a look at ArrayFire.
Here is the quick reference page with a listing of the supported types. Here are the functions you can run with it, organized into the categories of data analysis, linear algebra, image and signal processing, sparse matrices, and a bunch of commonplace algorithms for data indexing, sorting, reductions, visualization, and faster for-loops.
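For a flavor of the API, here is a minimal matrix-vector sketch using ArrayFire's C++ interface (af::randu, af::matmul, and af_print are the library's calls; the sizes and random values are just placeholders for the example):

#include <arrayfire.h>

int main() {
    // 3x3 matrix and length-3 vector of random values, stored on the device
    af::array A = af::randu(3, 3);
    af::array x = af::randu(3);

    // Matrix-vector product; af::matmul also handles matrix-matrix products
    af::array y = af::matmul(A, x);

    af_print(y);  // print the result
    return 0;
}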
Other libraries include CULA or MAGMA (focused on linear algebra), Thrust (targeted at 1D operations), and a host of niche academic libraries.
Disclaimer: I work on ArrayFire myself.

Related

Tuple v/s StaticVectors in Julia

If I understand correctly, since tuples are immutable in Julia, they must also be stack-allocated (similar to StaticVectors). So there should not be any advantage to using StaticVectors in place of tuples when I am dealing with small vectors, say a length-3 vector for the coordinates of a particle. Can someone highlight the advantages of using StaticVectors in such cases? And, more broadly, what are the use cases where I would want to choose one over the other?
Thanks.
The raw performance is similar, since StaticArrays are built on tuples. The point of StaticArrays is all the functionality: the linear algebra, the solvers, sorting, the mutable arrays, etc.
Tuples are a barebones data collection with barely any mathematical structure. That's fine as far as it goes, but StaticArrays has done most of the work you would have to do yourself with tuples.

Sparse matrix design in modern C++

I want to implement a sparse matrix class using modern C++, i.e. C++14 or 17. I know there must be some trade-offs between storage and runtime efficiency; right now I'd prefer to optimize more for storage efficiency. If possible, I'd prefer more work to be done at compile time rather than at runtime. For example, vector does have a lot of runtime checks, so it may not be optimal. Can someone suggest a container for this? I plan to support the following operations:
Matrix Multiplication, Addition, Subtraction, Inversion and Transpose
Matrix iterators, i.e. column/row
Efficient constructors
etc
Thanks!

Which hashing algorithm is suited for image local descriptors?

I run a sliding window (akin to a convolution kernel) over an image and extract the means/color histograms for each window. However, since the data is very high-dimensional, I wish to hash it into a signature, so I can perform approximate nearest-neighbor image search by aggregating the windows.
>>> means = cv2.mean(roi) #roi = window
>>> means
(181.12238527002307, 199.18315040165433, 206.514296508391, 0.0)..... => some numeric hash
Which hashing algorithm is appropriate for this situation? I have tried MD5 and SHA-1, but those are cryptographic and probably unsuited for k-NN.
I have read about MinHash and SimHash, but I am unsure whether they are suited to my use case. Any suggestions?
sliding window image example
Locality Sensitive Hashing (LSH) is a good candidate for your purpose. According to "Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions" (by Alexandr Andoni and Piotr Indyk), it is suitable for performing approximate search in high-dimensional spaces.
The characteristic of the hash functions employed in LSH is that their probability of collision is higher for feature vectors that are close to each other than for those that are far apart. Cryptographic hash functions don't have this property: with a crypto-hash, even a very small change in your feature vector will produce a completely different hash value.
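To make that property concrete, here is a minimal, library-free sketch of random-hyperplane hashing (the idea behind SimHash): each bit of the signature records which side of a random hyperplane the feature vector falls on, so nearby vectors agree on most bits. The function names and the 64-bit signature size are just choices for this example:

#include <cstdint>
#include <random>
#include <vector>

// Build 'bits' random hyperplanes (bits <= 64) for 'dim'-dimensional feature vectors.
std::vector<std::vector<float>> make_hyperplanes(int bits, int dim, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    std::vector<std::vector<float>> planes(bits, std::vector<float>(dim));
    for (auto &plane : planes)
        for (auto &coeff : plane) coeff = gauss(gen);
    return planes;
}

// Bit i of the signature is 1 if the feature vector lies on the positive side of plane i.
// Vectors that are close together differ in few bits, i.e. have a small Hamming distance.
uint64_t lsh_signature(const std::vector<float> &feature,
                       const std::vector<std::vector<float>> &planes) {
    uint64_t sig = 0;
    for (std::size_t i = 0; i < planes.size(); ++i) {
        float dot = 0.0f;
        for (std::size_t j = 0; j < feature.size(); ++j) dot += feature[j] * planes[i][j];
        if (dot > 0.0f) sig |= (uint64_t{1} << i);
    }
    return sig;
}

Candidate neighbors can then be ranked by the Hamming distance between signatures (a popcount of sig1 ^ sig2) instead of comparing the raw high-dimensional features.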
OpenCV FLANN has an implementation of LSH. Also the authors of the above paper provide an implementation here.
Having said that, I suggest you evaluate other algorithms in OpenCV FLANN on your dataset, so you can select the best one.
Regarding the features, you are basically using color information, which is very sensitive to illumination. Maybe illumination is irrelevant in your case; otherwise it would be better to try other feature descriptors (SIFT/SURF, ORB, BRIEF, and many more) and evaluate their performance with different algorithms.

Calculating the Jacobian using the Eigen library

I want to be able to compute the Jacobian matrix using the Eigen C++ library, but I cannot find any documentation on how to do this.
Previously, I have achieved this using the numdifftools package in Python. The function is:
numdifftools.Jacobian(ForwardsFunction)([input 1, input 2, input 3, ....])
Here, ForwardsFunction is a user-defined function which calculates the output state given the input state ([input 1, input 2, input 3, ...]). The numdifftools.Jacobian() method then automatically calculates the Jacobian for these input values, presumably using some automatic differentiation.
Is there an equivalent function in the Eigen library?
There are some tools in the Eigen library that perform numerical differentiation.
Take a look at:
https://eigen.tuxfamily.org/dox/unsupported/group__NumericalDiff__Module.html
https://eigen.tuxfamily.org/dox/unsupported/classEigen_1_1AutoDiffScalar.html
You might notice that those modules are "unsupported" (not part of the official Eigen library). The reasoning is the following: Eigen is a library for linear algebra, i.e. for manipulating sparse and dense matrices, and numerical differentiation is a little bit on the edge of its scope, so the priority of including it in the core library is lower. Those modules, as far as I know, are used by Eigen's own solvers in a very specific way. I have no experience using these Eigen numerical-differentiation classes in my projects, but you might give them a try.
https://eigen.tuxfamily.org/dox/unsupported/classEigen_1_1NumericalDiff.html
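For reference, the NumericalDiff module is typically used through a small functor; below is a rough sketch of that pattern with a toy two-input, two-output function (the typedefs and inputs()/values() methods are what the module expects as far as I know, so double-check against your Eigen version):

#include <cmath>
#include <iostream>
#include <Eigen/Dense>
#include <unsupported/Eigen/NumericalDiff>

// Functor playing the role of ForwardsFunction: here a toy map from R^2 to R^2.
struct ForwardsFunctor
{
    typedef double Scalar;
    typedef Eigen::VectorXd InputType;
    typedef Eigen::VectorXd ValueType;
    typedef Eigen::MatrixXd JacobianType;
    enum { InputsAtCompileTime = Eigen::Dynamic, ValuesAtCompileTime = Eigen::Dynamic };

    int inputs() const { return 2; }   // number of input variables
    int values() const { return 2; }   // number of output values

    // Compute the output state fvec for the input state x.
    int operator()(const Eigen::VectorXd &x, Eigen::VectorXd &fvec) const
    {
        fvec(0) = x(0) * x(0) + x(1);
        fvec(1) = std::sin(x(0)) * x(1);
        return 0;
    }
};

int main()
{
    ForwardsFunctor f;
    Eigen::NumericalDiff<ForwardsFunctor> numDiff(f);

    Eigen::VectorXd x(2);
    x << 1.0, 2.0;

    Eigen::MatrixXd J(2, 2);
    numDiff.df(x, J);              // fill J with the numerical Jacobian at x
    std::cout << J << std::endl;
}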

Calculate eigenvalues/eigenvectors of hundreds of small matrices using CUDA

I have a question on the eigen-decomposition of hundreds of small matrices using CUDA.
I need to calculate the eigenvalues and eigenvectors of hundreds (e.g. 500) of small (64-by-64) real symmetric matrices concurrently. I tried to implement it by the Jacobi method using chess tournament ordering (see this paper (PDF) for more information).
In this algorithm, 32 threads are defined in each block, each block handles one small matrix, and the 32 threads work together to annihilate 32 off-diagonal elements until convergence. However, I am not very satisfied with its performance.
I am wondering whether there is any better algorithm for my problem, i.e. the eigen-decomposition of many 64-by-64 real symmetric matrices. I guess the Householder method may be a better choice, but I am not sure whether it can be implemented efficiently in CUDA. There is not a lot of useful information online, since most other programmers are more interested in using CUDA/OpenCL to decompose one large matrix rather than many small matrices.
At least for the eigenvalues, a sample can be found in the CUDA SDK:
http://www.nvidia.de/content/cudazone/cuda_sdk/Linear_Algebra.html
The images on that page seem broken, but the sample downloads still work. I would suggest downloading the full SDK and having a look at that example. Also, this paper could be helpful:
http://docs.nvidia.com/cuda/samples/6_Advanced/eigenvalues/doc/eigenvalues.pdf
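Another option worth evaluating is cuSOLVER's batched Jacobi symmetric eigensolver (syevjBatched), which targets exactly the many-small-symmetric-matrices case. A rough double-precision sketch, with error checking omitted (check the exact call sequence against your CUDA toolkit's cuSOLVER documentation):

#include <cuda_runtime.h>
#include <cusolverDn.h>

// Eigen-decomposition of 'batch' n-by-n real symmetric matrices stored contiguously
// (column-major) in h_A; eigenvalues are returned in h_W, eigenvectors overwrite d_A.
void batched_symmetric_eig(double *h_A, double *h_W, int n, int batch)
{
    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    syevjInfo_t params;
    cusolverDnCreateSyevjInfo(&params);

    double *d_A, *d_W, *d_work;
    int *d_info;
    cudaMalloc(&d_A, sizeof(double) * n * n * batch);
    cudaMalloc(&d_W, sizeof(double) * n * batch);
    cudaMalloc(&d_info, sizeof(int) * batch);
    cudaMemcpy(d_A, h_A, sizeof(double) * n * n * batch, cudaMemcpyHostToDevice);

    int lwork = 0;
    cusolverDnDsyevjBatched_bufferSize(handle, CUSOLVER_EIG_MODE_VECTOR,
                                       CUBLAS_FILL_MODE_LOWER, n, d_A, n, d_W,
                                       &lwork, params, batch);
    cudaMalloc(&d_work, sizeof(double) * lwork);

    // Batched Jacobi eigensolver: d_W receives eigenvalues, d_A the eigenvectors.
    cusolverDnDsyevjBatched(handle, CUSOLVER_EIG_MODE_VECTOR, CUBLAS_FILL_MODE_LOWER,
                            n, d_A, n, d_W, d_work, lwork, d_info, params, batch);

    cudaMemcpy(h_W, d_W, sizeof(double) * n * batch, cudaMemcpyDeviceToHost);

    cudaFree(d_A); cudaFree(d_W); cudaFree(d_work); cudaFree(d_info);
    cusolverDnDestroySyevjInfo(params);
    cusolverDnDestroy(handle);
}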
