Which is better between 1D and 2D partitioning in matrix operations and how is it better?
I have looked at how both partitionings work but still couldn't find which one is better.
Can anyone please help me?
For distributed computation of sparse matrices, 2D partitioning has been shown to be more scalable than 1D partitioning [1]. With p processes, if you tile the matrix and distribute the tiles over a sqrt(p) x sqrt(p) process grid, a 2D partitioning such as 2D-cyclic confines the communication for any row or column of tiles to the sqrt(p) processes of one grid row or column. With 1D-column partitioning, by contrast, each matrix row is spread over all p processes, so row-group communication involves all p of them (while column-group communication needs no other process). The communication cost of 1D-column therefore grows with a factor of p rather than sqrt(p), and that bounds its speedup.
[1] Buluc, Aydin, and John R. Gilbert. Linear algebraic primitives for parallel computing on large graphs. University of California, Santa Barbara, 2010.
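As a rough illustration (a toy sketch in Python, not code from the reference, and the tile-to-rank mapping below is just one common convention), here is how many processes a row of tiles touches under each layout:

```python
import math

def owner_2d_cyclic(ti, tj, pr, pc):
    """Rank that owns tile (ti, tj) on a pr x pc process grid, 2D-cyclic."""
    return (ti % pr) * pc + (tj % pc)

p = 16
pr = pc = math.isqrt(p)  # sqrt(p) x sqrt(p) process grid

# 2D-cyclic: the tiles of tile-row 0 are owned by only sqrt(p) processes.
print(len({owner_2d_cyclic(0, tj, pr, pc) for tj in range(100)}))  # -> 4

# 1D-column: the columns crossing any matrix row live on all p processes.
print(len({tj % p for tj in range(100)}))                          # -> 16
```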
I have a huge dataset. We are talking about 100 3D matrices with 121x145x121 cells. Each cell has a value between 0 and 1, and I need a way to cluster these cells according to their correlation. The problem is that the dataset is too big for any algorithm I know; even using just half of it (each matrix is an MRI scan of a brain) we have around 400 billion pairs. Any ideas?
As a first step I would be tempted to try K-means clustering.
This appears in the Matlab statistics toolbox as the function kmeans.
In this algorithm you only end up computing the distances between the K current centres and the data points, so the number of pairs considered is much smaller than when comparing all pairs.
In Matlab, I've also found that the speed of the operation can be quite dependent on the organisation of your matrix (due to memory caching and optimisation issues). I would recommend transforming your 3d matrices so that the columns (held together in memory) correspond to the 100 values for a particular cell.
This can be done with the permute function.
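If you end up doing this in Python/NumPy rather than Matlab, a rough equivalent of the reorganisation plus a scalable k-means pass could look like this (the array name, K, and the use of scikit-learn's MiniBatchKMeans are my own choices):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Placeholder stand-in for the real data: 100 scans of shape 121x145x121
# (this allocates roughly 850 MB; shrink the shapes for a quick test).
scans = np.random.rand(100, 121, 145, 121).astype(np.float32)

# NumPy's equivalent of Matlab's permute: move the scan axis last so the
# 100 values belonging to one cell sit together in memory, then flatten
# to one row per cell (~2.1 million rows, 100 columns).
X = np.ascontiguousarray(np.transpose(scans, (1, 2, 3, 0))).reshape(-1, 100)

# Only distances to the K current centres are computed, never all
# ~400 billion cell pairs.
K = 50  # the number of clusters is a free choice here
labels = MiniBatchKMeans(n_clusters=K, random_state=0).fit_predict(X)
```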
Try a weighted K-means++ clustering algorithm. Sum the 100 input matrices point-wise to produce one "grey scale" matrix, then adjust the K-means++ algorithm to work with weighted (wt) values.
In the initialization phase, choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)² · wt².
The assignment step should be okay, but when computing the centroids in the update step, adjust the formula to account for the weights (or use the same formula, but count each point wt times).
You may not be able to use a library function to do this, but you start with a 100-fold decrease in the number of points and matrices to work with.
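A minimal NumPy sketch of the adjusted seeding step (the function name is made up, and drawing the first centre proportionally to wt is my own choice; the rest follows the description above):

```python
import numpy as np

def weighted_kmeanspp_init(points, wt, k, rng):
    """k-means++ seeding where point x is drawn with probability
    proportional to D(x)^2 * wt(x)^2 (D = distance to nearest centre)."""
    centers = [points[rng.choice(len(points), p=wt / wt.sum())]]
    for _ in range(k - 1):
        # squared distance from every point to its nearest chosen centre
        d2 = ((points[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(1)
        prob = d2 * wt ** 2
        centers.append(points[rng.choice(len(points), p=prob / prob.sum())])
    return np.array(centers)

# Usage: cells become (x, y, z) points, weights are the grey-scale sums.
rng = np.random.default_rng(0)
pts = rng.random((1000, 3))    # placeholder cell coordinates
weights = rng.random(1000)     # placeholder grey-scale values
centers = weighted_kmeanspp_init(pts, weights, k=10, rng=rng)
```

In the update step, the centroid of each cluster then becomes sum(wt·x) / sum(wt) over its members.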
I have a huge journal with actions done by users (like, for example, moderating contents).
I would like to find the 'mass' actions, meaning the actions that are too dense (the user probably made those actions without thinking too much :) ).
That would translate to clustering the actions by date (in a linear space), and to marking the clusters that are too dense.
I am no expert in clustering algorithms and methods, but I think k-means clustering would not do the trick, since I don't know the number of clusters.
Ideally, I would also like to be able to 'fine tune' the algorithm.
What would you advise?
P.S. Here are some resources that I found (in Ruby):
hierclust - a simple hierarchical clustering library for spatial data
AI4R - library that implements some clustering algorithms
K-means would probably do a good job as long as the number of clusters is known a priori. Since you don't know it, you might consider reading about the LBG algorithm, which is based on k-means and is used in data compression for vector quantisation. It's basically an iterative k-means that splits centroids after they converge and keeps splitting until you reach an acceptable number of clusters.
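A rough sketch of that splitting loop, using SciPy's kmeans2 for the inner k-means (the multiplicative perturbation and eps are illustrative choices):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def lbg(data, max_k, eps=0.01):
    """LBG-style clustering: run k-means, split every converged centroid
    into two slightly perturbed copies, repeat until max_k centroids."""
    centroids = data.mean(axis=0, keepdims=True)
    labels = np.zeros(len(data), dtype=int)
    while len(centroids) < max_k:
        centroids = np.vstack([centroids * (1 + eps), centroids * (1 - eps)])
        centroids, labels = kmeans2(data, centroids, minit='matrix')
    return centroids, labels

# e.g. one-dimensional timestamps, as in this question
times = np.array([8.0, 11, 15, 16, 17]).reshape(-1, 1)
centroids, labels = lbg(times, max_k=2)
```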
On the other hand, since your data is one-dimensional, you could do something completely different.
Assume that you've got actions which took place at 5 points in time: (8, 11, 15, 16, 17). Let's plot a Gaussian for each of these actions with μ equal to the time and σ = 3.
Now consider what the sum of these Gaussians looks like: it gives a density of actions over time, with a peak around 16.
Based on this observation I propose the following simple algorithm.
1. Create a vector of zeroes for the time range of interest.
2. For each action, calculate its Gaussian and add it to the vector.
3. Scan the vector for values greater than the maximum value in the vector multiplied by α.
Note that for each action only a small section of the vector needs updates because values of a Gaussian converge to zero very quickly.
You can tune the algorithm by adjusting the values of
α ∈ [0,1], which indicates how significant a peak of activity has to be in order to be flagged,
σ, which controls how close together actions have to be to count as part of the same burst, and
the time period represented by each element of the vector (minutes, seconds, etc.).
Notice that the algorithm is linear in the number of actions. Moreover, it shouldn't be difficult to parallelise: split the actions across multiple processes, let each sum its own Gaussians, and then add up the resulting vectors.
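Here is a compact sketch of the whole algorithm in Python/NumPy; the defaults for σ, α, and the grid resolution are placeholders to tune:

```python
import numpy as np

def dense_periods(times, sigma=3.0, alpha=0.5, resolution=1.0):
    """Return time points whose summed-Gaussian density exceeds
    alpha * max(density)."""
    lo, hi = min(times) - 4 * sigma, max(times) + 4 * sigma
    grid = np.arange(lo, hi, resolution)   # the "vector of zeroes"
    density = np.zeros_like(grid)
    for t in times:
        # only a small window needs updating: the Gaussian is ~0 past 4*sigma
        i0 = np.searchsorted(grid, t - 4 * sigma)
        i1 = np.searchsorted(grid, t + 4 * sigma)
        density[i0:i1] += np.exp(-((grid[i0:i1] - t) ** 2) / (2 * sigma ** 2))
    return grid[density > alpha * density.max()]

print(dense_periods([8, 11, 15, 16, 17]))  # flags the region around 16
```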
Have a look at density-based clustering, e.g. DBSCAN and OPTICS.
This sounds like exactly what you want.
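For instance, with scikit-learn (parameter values are illustrative: eps is the largest gap allowed inside a burst, min_samples the burst size threshold):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# action timestamps, e.g. in seconds, as one-dimensional points
times = np.array([0, 1, 2, 3, 500, 501, 502, 503, 504, 9000]).reshape(-1, 1)
labels = DBSCAN(eps=5, min_samples=4).fit_predict(times)
print(labels)  # -1 marks isolated actions; other labels mark dense bursts
```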
What are the advantages of using a permutation matrix to swap rows? Why would one create a permutation matrix and then apply matrix multiplication? Is that easier and more efficient than just swapping rows with a for loop?
Permutation matrices are a useful mathematical abstraction, because they allow analysis using the normal rules of matrix algebra, without having to introduce another type of operation.
In software, good implementations do not store a permutation matrix as a full matrix, they store a permutation array and they apply it directly (without a full matrix multiplication).
Depending on the sizes of the matrices and the operations and access patterns involved, it may be cheaper not to apply the permutation to the data in memory at all, but just to use it as an extra indirection. So, when you request (P * M)(i,j), where P is a permutation matrix and M is some other matrix that you are permuting, the data need not be re-arranged at all, but rather the element access operation will look up the permuted row when you access the element.
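A small NumPy sketch of the three options (the permutation and sizes are arbitrary):

```python
import numpy as np

n = 5
M = np.arange(n * n).reshape(n, n)
perm = np.array([2, 0, 1, 4, 3])  # row i of the result is row perm[i] of M

# 1. Full permutation-matrix multiply: O(n^3) if done densely.
P = np.eye(n)[perm]
out_matmul = P @ M

# 2. Stored permutation applied directly: a single O(n^2) row gather.
out_indexed = M[perm]
assert (out_matmul == out_indexed).all()

# 3. Pure indirection: no data moves at all, rows are looked up on access.
def permuted_element(i, j):
    return M[perm[i], j]
```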
The first thing that comes to my mind is spatial locality. Caches assume that if a memory location is accessed, nearby locations are likely to be accessed soon after. In some programming languages the elements of a row are neighbours in memory, while in others the elements of a column are; it depends on the implementation (row-major versus column-major storage). I guess permutation matrices are also designed with this problem in mind, since optimising matrix multiplication is one of the problems the algorithms community works hardest on. A simple loop structure will not be able to exploit the cache to improve performance.
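If you want to see the locality effect for yourself, a quick NumPy experiment like the one below typically shows contiguous row traversal beating strided column traversal (the exact ratio depends on your machine and array size):

```python
import time
import numpy as np

A = np.random.rand(5000, 5000)  # row-major (C order) by default

t0 = time.perf_counter()
rows = sum(A[i, :].sum() for i in range(5000))  # contiguous accesses
t1 = time.perf_counter()
cols = sum(A[:, j].sum() for j in range(5000))  # strided accesses
t2 = time.perf_counter()
print(f"rows: {t1 - t0:.3f}s  cols: {t2 - t1:.3f}s")
```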
Are there any algorithms that allow efficient creation (element filling) of a sparse matrix (e.g. CSR or coordinate format) in parallel?
If you store your matrix as a coordinate map, any language which has a concurrent dictionary implementation available should do the job for you.
Java's got the ConcurrentHashMap, and .NET 4 has ConcurrentDictionary, both of which allow multi-threaded non-blocking (afaik) element insertion in parallel.
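CPython's standard library has no lock-free concurrent dictionary, so a rough Python analogue of the same idea is a coordinate map guarded by a lock (compute_row below is a hypothetical stand-in for however you generate entries):

```python
import threading

entries = {}              # coordinate map: (row, col) -> value
lock = threading.Lock()

def compute_row(i):
    # placeholder: yield (column, value) pairs for row i
    yield (i, 1.0)

def fill(rows):
    for i in rows:
        for j, v in compute_row(i):
            with lock:    # a ConcurrentHashMap would avoid this global lock
                entries[(i, j)] = v

threads = [threading.Thread(target=fill, args=(range(t, 1000, 4),))
           for t in range(4)]
for t in threads: t.start()
for t in threads: t.join()
# `entries` is now a coordinate map; convert to CSR afterwards if needed.
```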
There are no efficient algorithms for creating sparse matrices in a data-parallel way. The coordinate format is a plausible candidate, but it requires sorting after the content has been filled in, and that format is slow for matrix products and similar operations.
The solution is not to build the sparse matrix at all: don't keep it in memory. Instead, perform the operations implicitly, computing the elements of the sparse matrix on the fly where they are needed.
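For example, with SciPy you can wrap the implicit matrix-vector product in a LinearOperator and hand it straight to a Krylov solver; the operator below is a toy 1D Laplacian, purely for illustration:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 1_000_000

def matvec(x):
    # action of a tridiagonal (-1, 2, -1) matrix, computed on the fly;
    # no sparse matrix is ever stored
    y = 2.0 * x
    y[:-1] -= x[1:]
    y[1:] -= x[:-1]
    return y

A = LinearOperator((n, n), matvec=matvec)
b = np.ones(n)
x, info = cg(A, b, maxiter=50)  # Krylov solvers only ever call matvec
```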
I would like to compute the Moore–Penrose pseudoinverse of an enormous matrix. Ideally, I would like to do it on a matrix that has 23 million rows and 1000 columns, but if necessary I can reduce the number of rows to 4 million by only running on one part of my experiment.
Obviously, loading the matrix into memory and running SVD on it is not going to work. Wikipedia points to Krylov subspace methods and mentions the Arnoldi, Lanczos, Conjugate gradient, GMRES (generalized minimum residual), BiCGSTAB (biconjugate gradient stabilized), QMR (quasi minimal residual), TFQMR (transpose-free QMR), and MINRES (minimal residual) methods as being among the best Krylov subspace methods. But I don't know where to go from here. Is computing the pseudoinverse of such a huge matrix even feasible? If so, using which algorithms or software libraries? I have a large computing cluster available, so parallel approaches are welcome.
This answer points to the R package biglm. Would that work? Has anyone used it? I normally work in Python, but don't mind using other languages and tools for this particular task.
You might be better off using a block iterative algorithm that converges directly to the least squares solution than computing the least squares solution through the pseudoinverse. See "Applied Iterative Methods" by Charlie Byrne. These algorithms are closely related to the Krylov subspace methods, but are tuned for easy computation. You can get an introduction by looking at chapter 3 of this preprint of another of his books.
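If SciPy is an option, its lsqr routine is a related (non-block) Krylov-type least-squares solver in the same spirit of avoiding the explicit pseudoinverse; it touches the matrix only through products with vectors, so the matrix can be sparse or even implicit (the sizes below are scaled down from the question's):

```python
import numpy as np
from scipy.sparse import random as sprandom
from scipy.sparse.linalg import lsqr

# tall sparse system standing in for the 23M x 1000 matrix
A = sprandom(100_000, 1_000, density=1e-3, format='csr', random_state=0)
b = np.ones(100_000)

result = lsqr(A, b, atol=1e-8, btol=1e-8)
x = result[0]  # least-squares solution, i.e. what pinv(A) @ b would give
```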