I have installed two GPUs (2x Nvidia Quadro 410) in my system, in different PCI slots. To solve matrix multiplication on both of these GPUs, how can I split the input matrices so that each GPU computes a part of the output matrix and then returns it?
For example, for two matrices A and B, each of order 10x10, to compute the output matrix C = A x B, 50 of the 100 elements (10 x 10) should be calculated on the 1st GPU and the other 50 on the 2nd GPU.
I am trying to implement this in OpenCL, but any algorithm that helps me arrive at a solution is welcome.
In general, if you have matrices X (of size axb, rows first) and Y (of size bxc),
X * Y = vcat(X[0:a/2,0:b] * Y, X[a/2:a,0:b] * Y)
In this pseudocode, vcat is vertical concatenation (stacking one matrix on top of the other, e.g. a 4x3 matrix concatenated with a 2x3 matrix produces a 6x3 matrix), : denotes ranges and [] is indexing.
Both arguments to vcat can be computed on different GPUs, and the concatenation can be achieved just by pointing the output to different sub-regions of the output buffer (assuming we have C-ordered arrays). The initial splitting of X can be similarly achieved just by using different sub-regions (since it is split along a row).
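The row-split scheme above can be sketched in NumPy (as a stand-in for the two OpenCL kernels; the actual device dispatch is up to you):

```python
import numpy as np

# Each half of X's rows is multiplied by the full Y. Writing each partial
# result into its own slice of C performs the "vcat" for free, since the
# halves are contiguous row blocks of a C-ordered output buffer.
a, b, c = 10, 10, 10          # example sizes from the question
X = np.random.rand(a, b)
Y = np.random.rand(b, c)

C = np.empty((a, c))
C[: a // 2, :] = X[: a // 2, :] @ Y   # would run on GPU 0
C[a // 2 :, :] = X[a // 2 :, :] @ Y   # would run on GPU 1

assert np.allclose(C, X @ Y)
```

The same slicing argument gives the kernel arguments: each GPU needs only its half of X, all of Y, and a pointer offset into C.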
I read about these laws in many threads here, but I still could not figure out how to apply their formulas to matrix-vector multiplication (y = y + Ax). Here I will try to explain my algorithm with respect to time:
T1 (sequential): processor zero generates vectors y and x and broadcasts
them. T2 (parallel): the matrix size (n) is divided among processors, and
each processor generates its own portion and does the multiplication.
All processors then send their results to processor zero.
T3 (sequential): processor zero collects the results, orders them and prints them.
If I run this multiple times with different matrix sizes and processor counts, how can I apply Amdahl's and Gustafson's laws to the results?
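One way to apply the two laws to timings like these (a sketch with hypothetical numbers, not your measured data): treat T1 + T3 as the serial part and T2 as the parallel part, and let s be the serial fraction of the single-processor run time.

```python
def amdahl_speedup(s, p):
    """Amdahl's law: predicted speedup with serial fraction s on p processors."""
    return 1.0 / (s + (1.0 - s) / p)

def gustafson_speedup(s, p):
    """Gustafson's law: scaled speedup when the problem size grows with p."""
    return s + p * (1.0 - s)

# Hypothetical example: suppose T1 + T3 = 0.1 s and T2 = 0.9 s on one processor.
s = 0.1 / (0.1 + 0.9)
print(amdahl_speedup(s, 4))      # Amdahl's speedup is bounded above by 1/s = 10
print(gustafson_speedup(s, 4))
```

Fitting your measured runs means estimating s from the timings at each matrix size; Amdahl's law describes a fixed problem size, Gustafson's describes a problem that grows with the processor count.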
I'm working with matrices having rank > 1. Is it possible to reduce the rank of a matrix to rank = 1 by substituting some values with zeros?
Rank in a matrix refers to how many of the column vectors are independent and non-zero (Or row vectors, but I was taught to always use column vectors). So, if you're willing to lose a lot of the information about the transformation your matrix is defining, you could create a matrix that's just the first non-zero column of your matrix, and everything else set to zero. Guaranteed to be rank 1.
However, that loses a whole lot of information about the transformation. Perhaps a more useful thing to do would be to project your matrix onto a space of size 1x1. There are ways to do this that create an injection from your matrix into the new space, guaranteeing that no two distinct matrices produce the same result. The first one that comes to mind is:
Let A be an n x m matrix with nonnegative integer entries
Let P_i be the ith prime number.
Let F(A) = {product over i from 0 to (n * m) - 1} of P_(i+1) ^ A_(i div m, i mod m)
While this generates a single number, you can think of a single number as a 1 x 1 matrix, which, if non-zero, has rank 1.
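A sketch of a prime encoding along these lines, as a product of prime powers (Gödel-style numbering, which unique factorisation makes injective for matrices with small nonnegative integer entries; `first_primes` and `encode` are helper names introduced here):

```python
def first_primes(k):
    """Return the first k prime numbers by trial division."""
    primes, n = [], 2
    while len(primes) < k:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def encode(A):
    """F(A): product of P_(i+1) ** A[i div m][i mod m] over a row-major scan."""
    n, m = len(A), len(A[0])
    flat = [A[i // m][i % m] for i in range(n * m)]
    result = 1
    for p, e in zip(first_primes(n * m), flat):
        result *= p ** e
    return result

print(encode([[1, 2], [0, 3]]))  # 2**1 * 3**2 * 5**0 * 7**3 = 6174
```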
All that being said, rank 1 matrices are kinda boring and you can do cooler stuff with matrices if you keep it at rank != 1. In particular, if you have an n x n matrix with rank n, a whole world of possibility opens up. It really depends on what you want to use these matrices for.
You might want to look at the singular value decomposition, which can be used to write your matrix as a sum of weighted outer products (see here). Choosing only the highest-weighted component of this sum will give you the closest rank-1 approximation to the decomposed matrix.
Most common linear algebra libraries (Eigen, OpenCV, NumPy) have an SVD implementation.
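The SVD route mentioned above is a one-liner in NumPy, for example:

```python
import numpy as np

# Keep only the largest singular value and its singular vectors; the
# result is the closest rank-1 matrix to A in the spectral/Frobenius sense.
A = np.random.rand(5, 4)
U, s, Vt = np.linalg.svd(A)
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])

assert np.linalg.matrix_rank(A1) == 1
```

The spectral-norm error of this approximation is exactly the second singular value s[1], which quantifies how much information the rank reduction throws away.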
What is the fastest way to transpose, reshape and multiply two matrices in Matlab? I want to do the following:
B = B';
B = reshape(B, 20, 5000000);
A = A * B;
where A is a 20 x 20 real matrix and B is a 25-million x 4 real matrix. Using the implementation above, the transpose operation is ~4 times slower than the matrix multiplication (on the GPU).
I heard about dgemm, which seems relevant but is not exactly what I'm looking for (it allows one to multiply matrices, transpose them, and add a stride (see the LDA argument) in a single, fast operation).
I'm mostly interested in the case when both A and B are already gpuArrays and we are using modern nVidia GPU hardware.
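To make the shapes in the pipeline above concrete, here is a scaled-down NumPy sketch (25 million rows shrunk to 100; MATLAB's reshape is column-major, so order='F' is used to match its semantics):

```python
import numpy as np

A = np.random.rand(20, 20)            # the 20 x 20 matrix
B = np.random.rand(100, 4)            # stands in for the 25-million x 4 matrix

Bt = B.T                              # 4 x 100, like B = B'
B2 = Bt.reshape(20, 20, order='F')    # column-major, like reshape(B, 20, 5000000)
C = A @ B2                            # like A = A * B

assert C.shape == (20, 20)
```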
I'm kind of ashamed to even ask this, but here goes. In every MATLAB help file where the input is an NxD matrix X, MATLAB describes the matrix arrangement as
Data, specified as a numeric matrix. The rows of X correspond to
observations, and the columns correspond to variables.
Above taken from help of kmeans
I'm kind of confused as to what MATLAB means by observations and variables.
Suppose I have a data matrix composed of 100 images, each represented by a feature vector of size 128 x 1. Are the 100 images my observations and the 128 features my variables, or is it the other way around?
Will my data matrix be of size 128 x 100 or 100 x 128?
Eugene's explanation from a statistics and probability standpoint is great, but I would like to explain it more from the viewpoint of data analysis and image processing.
Think of an observation as one sample from your data set. In this case, one observation is one image. Each sample has some dimensionality associated with it: a number of variables used to represent that sample.
For example, if we had a set of 100 2D Cartesian points, the number of observations is 100, while the dimensionality, or the total number of variables used to describe each point, is 2: we have an x coordinate and a y coordinate. As such, in the MATLAB universe, we'd place all of these data points into a single matrix. Each row of the matrix denotes one point in your data set. Therefore, the matrix you would create here is 100 x 2.
Now, back to your problem. We have 100 images, and each image can be expressed by 128 features. This suspiciously looks like you are trying to use SIFT or SURF to represent an image, so think of this as a situation where each image can be described by a 128-dimensional vector, or a histogram with 128 bins. Each feature is one component of the dimensionality that makes up the image. Therefore, you would have a 100 x 128 matrix. Each row represents one image, where each image is represented as a 1 x 128 feature vector.
In general, MATLAB's machine learning and data analysis algorithms assume that your matrix is M x N, where M is the total number of points that make up your data set while N is the dimensionality of one such point in your data set. In MATLAB's universe, the total number of observations is equal to the total number of points in your data set, while the total number of features / distinct attributes to represent one sample is the total number of variables.
tl;dr
Observation: One sample from your data set
Variable: One feature / attribute that helps describe an observation or sample in your data set.
Number of observations: Total number of points in your data set
Number of variables: Total number of features / attributes that make up an observation or sample in your data set.
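For your case, the convention above comes down to a matrix like this (the feature values here are random placeholders, not real SIFT/SURF descriptors):

```python
import numpy as np

# 100 images (observations), each with a 128-element feature vector
# (variables), stacked row-wise: one image per row.
n_observations, n_variables = 100, 128
X = np.random.rand(n_observations, n_variables)   # 100 x 128

# X[i, :] is the feature vector of image i.
assert X.shape == (100, 128)
assert X[0, :].shape == (128,)
```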
It looks like you are talking about some specific statistical/probabilistic functions. In statistics or probability theory there are some random variables that are results of some kind of measurements/observations over time (or some other dimension). So such a matrix is just a collection of N measurements of D different random variables.
I have now read tons of different explanations of the Gaussian blur, and I am really confused.
I roughly understand how the gaussian blur works.
http://en.wikipedia.org/wiki/Gaussian_blur
I understood that we choose 3*sigma as the maximum size for our mask because beyond that the values get really small.
But my three questions are:
How do I create a gaussian mask with the sigma only?
If I understood it correctly, the mask gives me the weights. Then
I place the mask on the top left pixel. I multiply the weights by
the values of the pixels under the mask. Then I move the mask to the
next pixel. I do this for all pixels. Is this correct?
I also know that 1D masks are faster. So I create a mask for x and a
mask for y. Let's say my mask would look like this (3x3):
1 2 1
2 4 2
1 2 1
How would my x and y mask look like?
1- A solution to create a Gaussian mask is to set up an N by N matrix, with N=3*sigma (or less if you want a coarser approximation), and fill each entry (i,j), for i,j from 0 to N-1, with exp(-((i-N/2)^2 + (j-N/2)^2)/(2*sigma^2)). As a comment mentioned, taking N=3*sigma just means that you truncate your Gaussian at a "sufficiently small" threshold.
2- Yes, you understood correctly. A small detail is that you'll need to normalize by the sum of your weights (i.e., divide the result of what you said by the sum of all the elements of your matrix). The other option is to build your matrix already normalized, so that you don't need to perform this normalization at the end (the normalized Gaussian formula becomes exp(-((i-N/2)^2 + (j-N/2)^2)/(2*sigma^2))/(2*pi*sigma^2)).
3- In your specific case, the 1D version is [1 2 1]: your x mask is the row [1 2 1] and your y mask is its transpose, since the matrix you gave is transpose([1 2 1]) * [1 2 1]. In general, you can directly build these 1D Gaussians using the 1D formula, which is similar to the one above: exp(-((i-N/2)^2)/(2*sigma^2)) (or the normalized version exp(-((i-N/2)^2)/(2*sigma^2)) / (sigma*sqrt(2*pi))).
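Points 1 and 3 can be sketched in NumPy: build the 1D mask from sigma alone, and check that the 2D mask factors as an outer product (the odd-size rounding below is one common convention, not the only one):

```python
import numpy as np

sigma = 1.0
N = int(np.ceil(3 * sigma)) | 1      # odd mask size near 3*sigma
ax = np.arange(N) - N // 2           # offsets i - N/2

g1 = np.exp(-(ax ** 2) / (2 * sigma ** 2))   # unnormalized 1D mask
g1 /= g1.sum()                               # normalize so the weights sum to 1

g2 = np.outer(g1, g1)                        # separable 2D mask
assert np.isclose(g2.sum(), 1.0)

# The question's 3x3 mask is exactly outer([1 2 1], [1 2 1]):
assert np.array_equal(np.outer([1, 2, 1], [1, 2, 1]),
                      [[1, 2, 1], [2, 4, 2], [1, 2, 1]])
```

Separability is what makes the two 1D passes equivalent to (and cheaper than) one 2D convolution.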