Given a rather sparse adjacency matrix, meaning one with a lot of zero entries, I would like to do the following: I would like to reorder the rows and columns of the matrix so that the non-zero entries end up as close as possible to the diagonal. I would then get some kind of pseudo-diagonal matrix.
I would like to know whether there are known algorithms for doing exactly that. Once I have this pseudo-diagonal matrix, I think there should ideally also be a metric for "how diagonal" the result is.
The reason for doing this is that afterwards I would be able to store the matrix in a much smaller data structure, which would be faster to store and load.
My own research has shown me that I might not know the correct terminology for the problem, so I would be happy to learn the correct wording for it. And of course I would like to get to know algorithms that can do this "pseudo diagonalisation" by reordering the rows and columns of a matrix.
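To make the goal concrete, here is a small sketch (Python, purely for illustration; the permutation shown is arbitrary and not produced by any particular algorithm) of what I mean by reordering and by a possible "how diagonal" metric, here the bandwidth, i.e. the largest distance of a non-zero entry from the diagonal:

    import numpy as np

    def bandwidth(A):
        # Largest |i - j| over all non-zero entries: 0 means purely diagonal.
        rows, cols = np.nonzero(A)
        return int(np.abs(rows - cols).max()) if rows.size else 0

    A = np.array([[1, 0, 0, 1],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [1, 0, 0, 1]])

    perm = [0, 3, 1, 2]                  # some reordering of rows and columns
    B = A[np.ix_(perm, perm)]            # apply it to rows and columns alike
    print(bandwidth(A), bandwidth(B))    # 3 before, 1 after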
Related
This is a raw idea I have in my mind, but I'm pretty sure there is a specific name for the following type of algorithm or math concept. Can somebody give me pointers on which area to look into more deeply if I want to implement something like this?
Say I have a [100x100] matrix of numbers. My goal is to find those positions in this matrix where the surrounding entries are very similar (e.g. differ from each other by at most 0.1).
In other words, I would go through each entry in this matrix and check whether its nearest neighbors are close (within 0.1) to that specific entry. In the end I want a list of the positions of the entries for which this distance criterion is satisfied.
The goal is to use this algorithm to find locations in this matrix where the entries are homogeneous, i.e. similar. I want to implement this in Javascript.
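To make the criterion concrete, here is a rough sketch of the scan I have in mind (shown in Python only for brevity; the real implementation would be in Javascript, and the 0.1 threshold and the 3x3 window are of course parameters):

    import numpy as np

    def homogeneous_positions(M, tol=0.1):
        # Return (row, col) positions whose surrounding neighbors (up to 8)
        # all lie within tol of the centre entry.
        n_rows, n_cols = M.shape
        result = []
        for i in range(n_rows):
            for j in range(n_cols):
                window = M[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
                if np.abs(window - M[i, j]).max() <= tol:
                    result.append((i, j))
        return result

    M = np.random.rand(100, 100)
    print(homogeneous_positions(M, tol=0.1)[:10])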
Suppose I know an algorithm that partitions a boolean matrix into a minimal set of disjoint rectangles covering all "ones" ("trues").
The task is to find a permutation of the rows and columns of the matrix such that the matrix built by shuffling the columns and rows according to that permutation can be partitioned into the smallest possible number of rectangles.
For illustration, one can think about the problem this way:
Suppose I have a set of objects and a set of properties. Each object can have any number of (distinct) properties. The task is to summarize (report) this mapping using the smallest number of sentences. Each sentence has the form "<list of objects> have properties <list of properties>".
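As a tiny example to fix ideas: with objects A, B, C and properties x, y, z, where A has {x, y}, B has {x, y} and C has {z}, the matrix is

       x y z
    A  1 1 0
    B  1 1 0
    C  0 0 1

and it can be reported with two sentences, "A, B have properties x, y" and "C has property z", each sentence corresponding to one all-ones rectangle.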
I know I can brute-force the solution by applying each permutation and running the algorithm on every try. But the number of permutations grows factorially, making this approach impractical for matrices bigger than about 15×15.
I know I can simplify the matrices before running the algorithm by removing duplicate rows and columns.
This problem feels NP-hard, and there might be no fast (polynomial-time) solution. If that is so, I'd be interested to learn about approximate solutions.
This is isomorphic to reducing logic circuits, given the full set of inputs (features) and the required truth table (which rows have which feature). You can solve the problem with classic Boolean algebra. The process is called logic optimization.
When I was in school, we drew Karnaugh maps on the board and drew colored boundaries to form our rectangles. However, it sounds as if you have something larger than one would handle on a board; try the Quine-McCluskey (QM) algorithm and the heuristics cited alongside it for a "good enough" solution, which is sufficient for many applications.
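For a rough idea of what logic optimization looks like in code, here is a sketch using SymPy's SOPform (Quine-McCluskey-style minimization); reading each matrix row as a minterm over the feature variables is my own illustrative framing, not a full reduction of your rectangle problem:

    from sympy import symbols
    from sympy.logic import SOPform

    # Three features; each row of the boolean matrix is read as a minterm.
    x, y, z = symbols('x y z')
    minterms = [[1, 1, 0],   # an object with features x and y
                [1, 1, 1],   # an object with features x, y and z
                [0, 0, 1]]   # an object with feature z only
    print(SOPform([x, y, z], minterms))   # minimized sum-of-products form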
My solution so far:
First, let us acknowledge that the problem is symmetric with respect to swapping rows with columns (features with objects).
Let us represent the problem with a binary matrix, where rows are objects, columns are features, and a 1 in the matrix represents a matched (object, feature) pair.
My idea so far is to run two steps in sequence until there are no 1s left in the matrix (a sketch of this loop follows the list):
Heuristically find a good unshuffling permutation of rows and columns on which I can run the 2D maximal-rectangle step.
Find the maximal rectangle, save it to the answer list and zero all 1s belonging to it.
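Roughly, the loop looks like the sketch below (Python; unshuffle_order and max_rectangle are the two helpers sketched further down, and the exact bookkeeping here is mine):

    import numpy as np

    def cover_with_rectangles(matrix):
        mat = np.array(matrix, dtype=bool)
        rectangles = []
        while mat.any():
            col_order = unshuffle_order(mat)          # step 1: reorder columns
            row_order = unshuffle_order(mat.T)        #         and rows
            perm = mat[np.ix_(row_order, col_order)]
            _, (top, left, bottom, right) = max_rectangle(perm)   # step 2
            rows = [int(row_order[i]) for i in range(top, bottom + 1)]
            cols = [int(col_order[j]) for j in range(left, right + 1)]
            rectangles.append((rows, cols))
            mat[np.ix_(rows, cols)] = False           # zero the covered 1s
        return rectangles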
Maximal rectangle problem
It can simply be any of the implementations of the maximal rectangle problem found on the net, for instance https://www.geeksforgeeks.org/maximum-size-rectangle-binary-sub-matrix-1s/
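For completeness, a self-contained version of that step (the classic row-by-row "largest rectangle in a histogram" reduction; the coordinate bookkeeping is mine so that it fits the loop sketched above):

    def max_rectangle(mat):
        # Largest all-ones rectangle; returns (area, (top, left, bottom, right)).
        n_rows, n_cols = len(mat), len(mat[0])
        heights = [0] * n_cols
        best = (0, (0, 0, -1, -1))
        for r in range(n_rows):
            for c in range(n_cols):
                heights[c] = heights[c] + 1 if mat[r][c] else 0
            stack = []                    # largest rectangle in this histogram
            for c in range(n_cols + 1):
                h = heights[c] if c < n_cols else 0
                while stack and heights[stack[-1]] >= h:
                    top_h = heights[stack.pop()]
                    left = stack[-1] + 1 if stack else 0
                    area = top_h * (c - left)
                    if area > best[0]:
                        best = (area, (r - top_h + 1, left, r, c - 1))
                stack.append(c)
        return best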
Unshuffling the rows (and columns)
Unshuffling the rows is independent of unshuffling the columns, and both tasks can be run separately (concurrently). Let us assume I am looking for the unshuffling permutation of columns.
Also, it is worth noting that unshuffling a matrix should yield the same result if we swap ones with zeroes.
Build a distance matrix of the columns. The distance between two columns is defined as the Manhattan distance between the columns represented numerically (i.e. 0 - absence of a relationship between object and feature, 1 - presence).
Run hierarchical clustering using the distance matrix. The complexity is O(n^2), as I believe single linkage should be good enough.
The leaf order returned by the hierarchical clustering is the unshuffling permutation.
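A sketch of this unshuffling step using SciPy (single linkage on Manhattan distances, with the dendrogram leaf order as the permutation; calling it on the transposed matrix gives the row permutation):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, leaves_list
    from scipy.spatial.distance import pdist

    def unshuffle_order(mat):
        # Permutation of the columns of 'mat' (a 0/1 matrix).
        cols = np.asarray(mat, dtype=float).T        # one row per column of mat
        dists = pdist(cols, metric='cityblock')      # Manhattan distances
        return leaves_list(linkage(dists, method='single'))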
The algorithm works well enough for my use cases. An implementation in R can be found at https://github.com/adamryczkowski/rectpartitions
This is my first question, and I hope it's not misdirected or in the wrong place.
Let's say I have a matrix of data that is fully populated except for one value. For example, Column 1 is Height, Column 2 is Weight, and Column 3 is Bench Press. So I surveyed 20 people and got their height, weight, and bench press weight. Now I have a 5'11 individual weighing 170 pounds, and would like to predict his/her bench press weight. You could look at this as the matrix having a missing value, or you could look at it as wanting to predict a dependent variable given a vector of independent variables. There are curve fitting approaches to this kind of problem, but I would like to know how to use the Singular Value Decomposition to answer this question.
I am aware of the Singular Value Decomposition as a means of predicting missing values, but virtually all the information I have found has been in relation to huge, highly sparse matrices, with respect to the Netflix Prize and related problems. I cannot figure out how to use SVD or a similar approach to predict a missing value from a small or medium sized, fully populated (except for one missing value) matrix.
A step-by-step algorithm for solving the example above using SVD would be very helpful to me. Thank you!
I was planning this as a comment, but it's too long by a fair bit, so I've submitted it as an answer.
My reading of SVD suggests to me that it is not very applicable to your example. In particular, it seems that you would need to somehow assign a difficulty ranking to the bench-press column of your matrix, or an ability ranking to the individuals, or perhaps both. Since the amount he can bench-press depends solely on his own height and weight, I don't think SVD would provide any improvement over just calculating the statistical average of what others in the list have accomplished and using that to predict the outcome for your 5'11 170lb lifter. Perhaps it would help if there were a BMI (body mass index) column and if BMI could be ranked... and probably a larger data set. I think the problem is that there is no noise in your matrix to be reduced by using SVD. Here's a tutorial that appears to use a similar problem: http://www.puffinwarellc.com/index.php/news-and-articles/articles/30-singular-value-decomposition-tutorial.html
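For completeness, when people talk about using SVD to fill in a missing value in a small dense matrix, the mechanical recipe usually meant is an iterative low-rank imputation along the lines of the sketch below. This is a generic scheme (not from the question or the linked tutorial), and it does not change my point about whether it actually beats a simple average on data like this:

    import numpy as np

    def svd_impute(X, rank=1, n_iter=50):
        # X: 2D float array with np.nan marking the missing entries.
        X = np.array(X, dtype=float)
        missing = np.isnan(X)
        col_means = np.nanmean(X, axis=0)
        X[missing] = np.take(col_means, np.where(missing)[1])   # initial guess
        for _ in range(n_iter):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]      # rank-r approximation
            X[missing] = low_rank[missing]                       # refresh only the gaps
        return X

    # e.g. rows of (height, weight, bench press) with one bench press set to
    # np.nan; svd_impute(data)[i, 2] is then the imputed value for that row.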
I'm new here, so I'm not sure whether this has been asked before, but I did look to see if it's here already.
I'm interested in whether anyone has encountered a similar problem. I have a sparse matrix that is LU-decomposed, and then those L and U factors are inverted. The problem I encounter is the following: the original sparse matrix requires editing because of the input data, and in some cases (I know why) it becomes singular. The solution for that is simple: I remove the row and column of the elements that made it singular and continue with my code. But is there a way to edit the inverted LU factors, or do I have to create new ones every time? Recomputing them consumes a lot of time, since the number of nonzero elements is around 10K or more.
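For reference, one identity that seems relevant (sketched below, assuming the full inverse A^-1 = U^-1 L^-1 is actually formed): when the same index k is removed from both the rows and the columns, the inverse of the reduced matrix follows from the full inverse by a rank-one correction (the block-matrix inversion formula), so the factors would not have to be rebuilt from scratch. Whether this is numerically acceptable for such matrices, and whether it beats a sparse refactorization, is a separate question.

    import numpy as np

    def inverse_after_removal(A_inv, k):
        # Given A_inv = inv(A), return inv(A with row k and column k deleted).
        # Valid as long as A_inv[k, k] is not (numerically) zero.
        keep = np.delete(np.arange(A_inv.shape[0]), k)
        E = A_inv[np.ix_(keep, keep)]
        f = A_inv[keep, k]
        g = A_inv[k, keep]
        return E - np.outer(f, g) / A_inv[k, k]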
I am trying to apply the Random Projections method to a very sparse dataset. I have found papers and tutorials about the Johnson-Lindenstrauss method, but every one of them is full of equations that give me no meaningful explanation. For example, this document on Johnson-Lindenstrauss.
Unfortunately, from this document I can get no idea about the implementation steps of the algorithm. It's a long shot, but is there anyone who can give me the plain-English version or very simple pseudocode of the algorithm? Or where should I start digging into these equations? Any suggestions?
For example, here is what I understand of the algorithm from reading this paper concerning Johnson-Lindenstrauss:
Assume we have an AxB matrix where A is the number of samples and B is the number of dimensions, e.g. 100x5000. And I want to reduce its dimension to 500, which will produce a 100x500 matrix.
As far as I understand: first, I need to construct a 100x500 matrix and fill the entries randomly with +1 and -1 (each with 50% probability).
Edit:
Okay, I think I started to get it. So we have a matrix A which is mxn. We want to reduce it to E which is mxk.
What we need to do is construct a matrix R of dimension nxk and fill it with 0, +1, or -1 with probabilities 2/3, 1/6, and 1/6, respectively.
After constructing this R, we simply do the matrix multiplication AxR to find our reduced matrix E. But we don't need to do a full matrix multiplication: if an entry of R is 0, we don't need to do any calculation and simply skip it; if it is +1, we just add the corresponding column, and if it is -1, we subtract it. So we end up using summation rather than multiplication to find E, and that is what makes this method very fast.
It turned out to be a very neat algorithm, although I felt too stupid to get the idea at first.
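Written out, the scheme above would look something like this (the sqrt(3/k) scaling is my guess at the normalization; everything else follows the 2/3, 1/6, 1/6 recipe above):

    import numpy as np

    def sparse_random_projection(A, k, rng=None):
        # A: (m, n) data matrix; returns the (m, k) reduced matrix E = A R.
        rng = np.random.default_rng() if rng is None else rng
        n = A.shape[1]
        R = rng.choice([0.0, 1.0, -1.0], size=(n, k), p=[2/3, 1/6, 1/6])
        # Scaling so that distances are preserved in expectation
        # (sqrt(3/k) is assumed here, matching the usual statement of this scheme).
        return (A @ R) * np.sqrt(3.0 / k)

    A = np.random.rand(100, 5000)
    E = sparse_random_projection(A, k=500)
    print(E.shape)                       # (100, 500)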
You have the idea right. However, as I understand random projection, the rows of your matrix R should have unit length. I believe that's approximately what the normalizing by 1/sqrt(k) is for: to normalize away the fact that they're not unit vectors.
It isn't a projection, but it's nearly one: R's rows aren't orthonormal, but within a much higher-dimensional space they very nearly are. In fact, the dot product of any two of those vectors you choose will be pretty close to 0. This is why it is generally a good approximation of actually finding a proper basis for the projection.
The mapping from high-dimensional data A to low-dimensional data E is given in the statement of theorem 1.1 in the latter paper - it is simply a scalar multiplication followed by a matrix multiplication. The data vectors are the rows of the matrices A and E. As the author points out in section 7.1, you don't need to use a full matrix multiplication algorithm.
If your dataset is sparse, then sparse random projections will not work well.
You have a few options here:
Option A:
Step 1. Apply a structured dense random projection (the so-called fast Hadamard transform is typically used). This is a special projection which is very fast to compute but otherwise has the properties of a normal dense random projection.
Step 2. Apply a sparse projection to the "densified" data (sparse random projections are useful for dense data only).
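A rough sketch of one common structured dense projection of this kind, the subsampled randomized Hadamard transform (for clarity it builds the full Hadamard matrix instead of using a fast transform, and it assumes the feature dimension is a power of two):

    import numpy as np
    from scipy.linalg import hadamard

    def srht(X, k, rng=None):
        # X: (m, n) with n a power of two; returns a dense (m, k) projection.
        rng = np.random.default_rng() if rng is None else rng
        m, n = X.shape
        signs = rng.choice([-1.0, 1.0], size=n)       # random sign flips (D)
        H = hadamard(n) / np.sqrt(n)                  # normalized Hadamard matrix
        cols = rng.choice(n, size=k, replace=False)   # uniform coordinate subsample
        return np.sqrt(n / k) * (X * signs) @ H[:, cols]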
Option B:
Apply SVD to the sparse data. If the data is sparse but has some structure, SVD is better. Random projection preserves the distances between all points; SVD better preserves the distances between dense regions, which in practice is more meaningful. Also, people use random projections to compute the SVD of huge datasets. Random projection gives you efficiency, but not necessarily the best quality of embedding in a low dimension.
If your data has no structure, then use random projections.
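A minimal sketch of the SVD route on sparse data, using SciPy's truncated sparse SVD (the random matrix here only stands in for a real sparse dataset):

    import scipy.sparse as sp
    from scipy.sparse.linalg import svds

    X = sp.random(1000, 5000, density=0.01, format='csr', random_state=0)
    k = 100
    U, s, Vt = svds(X, k=k)      # truncated SVD with k components
    E = U * s                    # (1000, k) low-dimensional embedding of the rows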
Option C:
For data points for which SVD has little error, use SVD; for the rest of the points use Random Projection
Option D:
Use a random projection based on the data points themselves.
It is very easy to understand what is going on here. It looks something like the following:
    import numpy as np

    def data_based_random_projection(data, k, sample_size=20, rng=None):
        # data: (n, d) matrix, n data points with d features; k: new dimension.
        rng = np.random.default_rng() if rng is None else rng
        n, d = data.shape
        random_features = np.zeros((k, d))
        for j in range(k):                              # generate k random projection vectors
            randomized_combination = np.zeros(d)        # feature vector of zeros
            sample_point_ids = rng.choice(n, size=min(sample_size, n), replace=False)
            for point_id in sample_point_ids:
                random_sign = rng.choice([-1.0, 1.0])   # +1/-1 with prob. 1/2
                randomized_combination += random_sign * data[point_id]   # vector operation
            norm = np.linalg.norm(randomized_combination)
            random_features[j] = randomized_combination / norm if norm else randomized_combination
        # Note that the normal random projection instead uses vectors of +/-1 entries
        # (if you want it sparse, randomly set a fraction to 0; also good to normalize by length).
        # To project the data points onto these random features:
        scores = data @ random_features.T   # scores[point_id, j] = dot(data[point_id], random_features[j])
        return scores
If you are still looking to solve this problem, write a message here, I can give you more pseudocode.
The way to think about it is that a random projection is just a random pattern, and the dot product (i.e. projecting the data point) between a data point and the pattern gives you their overlap. So if two data points overlap with many random patterns in the same way, those points are similar. Therefore, random projections preserve similarity while using less space, but they also add random fluctuations to the pairwise similarities. What the JL lemma tells you is that to keep the fluctuations around eps = 0.1 you need about 1/eps^2 * log(n) = 100*log(n) dimensions.
Good Luck!
RandPro is an R package that performs random projection using the Johnson-Lindenstrauss lemma.