So I want to find the smallest values in a matrix in the following way.
M1 = [[1000.  930.  940.  740.]
      [1000. 1000.  990.  670.]
      [1000. 1000. 1000.  680.]
      [1000. 1000. 1000. 1000.]]
Two matrix values should be chosen in such a way that each of the indices 0, 1, 2, 3 is used exactly once, while the sum of the chosen values is minimized.
So in this case the solution would be M1[2][3] and M1[0][1].
Incorrect would be M1[2][3] and M1[1][3], which has a lower sum but does not use unique index numbers.
The solution should work for NxN matrices, where N is even. So for an 8x8 matrix I want to find four elements, such that each of the index numbers 0, 1, 2, 3, 4, 5, 6, 7 is used exactly once.
Another constraint is that the matrix only contains values of interest in the upper triangle. Where the matrix elements are 1000, these elements can be ignored in finding the minimum sum.
I have tried to alter the Hungarian algorithm, but this was not successful.
Does anybody know of an algorithm that does what I want, or maybe a Python package which I can abuse?
Or does anybody have a smart solution that would help? I have to do this for matrices of about 200x200 elements at most.
I will suggest a solution that is probably not the fastest, but it may work.
You can build a graph this way:
the graph will contain N×N+1 vertices, which represent the cells of the matrix plus a new one, which will be the source
the source will be connected to all other vertices, with a distance equal to the value of the cell each of them represents.
then you must connect each vertex (except the source) to every other vertex it is possible to go to (for example, from M1[1][2] you can go to M1[0][3] but not to M1[1][3]). The distance from any vertex to a vertex V corresponds to the value of V in the matrix.
after you build this graph, you should walk K steps on it (K being the number of matrix cells you will pick, for example 2 in a 4x4 matrix like your example).
For each step you take, you store the last position you were at in a stack and in 2 hashes (the first storing all rows already used, the second all columns already used), and you mark the vertex you arrive at.
Whenever you enter a vertex, you check with the hashes whether it is possible to stay in it (theoretically O(1) checking). If it is, you add its value to the current sum; otherwise you go back to the previous position (stored in the stack) and subtract the weight you added when you entered the current vertex.
You should also keep a global best sum; every time you have walked K steps, you check whether the current sum is smaller than the global best, and if it is, you update it.
After you have walked all possible ways, the global best sum will be your answer.
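Here is a minimal Python sketch of that backtracking walk, assuming the 1000 sentinel marks cells to ignore (the function name and the pruning step are my own additions, not part of the description above):

def min_unique_pairing(M):
    # Pick N/2 upper-triangle cells so that every row/column index
    # 0..N-1 is used exactly once, minimizing the total sum.
    # Cells holding the 1000 sentinel are skipped.
    n = len(M)
    best = [float('inf'), None]

    def walk(used, total, chosen):
        if total >= best[0]:               # prune: this path cannot improve
            return
        if len(used) == n:                 # all indices consumed: record it
            best[0], best[1] = total, list(chosen)
            return
        i = min(set(range(n)) - used)      # smallest unused index
        for j in range(i + 1, n):
            if j not in used and M[i][j] != 1000:
                walk(used | {i, j}, total + M[i][j], chosen + [(i, j)])

    walk(frozenset(), 0.0, [])
    return best[0], best[1]

M1 = [[1000., 930., 940., 740.],
      [1000., 1000., 990., 670.],
      [1000., 1000., 1000., 680.],
      [1000., 1000., 1000., 1000.]]
print(min_unique_pairing(M1))   # (1610.0, [(0, 1), (2, 3)])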
Hope this helps :)
I have two sparse matrices "Matrix1" and "Matrix2" of the same size p x n.
By sparse matrix I mean that it contains a lot of exactly zero elements.
I want to show the two matrices under the same colormap and a unique colorbar. Doing this in MATLAB is straightforward:
% common color limits across both matrices
bottom = min(min(min(Matrix1)),min(min(Matrix2)));
top = max(max(max(Matrix1)),max(max(Matrix2)));
subplot(1,2,1)
imagesc(Matrix1)
colormap(gray)
caxis manual              % freeze the color limits so later plots don't rescale
caxis([bottom top]);
subplot(1,2,2)
imagesc(Matrix2)
colormap(gray)
caxis manual
caxis([bottom top]);
colorbar;
My problem:
When I show a matrix using imagesc(Matrix), the noise (or background) that is always present stays hidden; it shows up when using imagesc(10*log10(Matrix)) instead.
That is why I want to show the 10*log10 of the matrices. But in that case the minimum value will be -Inf, since the matrices are sparse. caxis will then give an error because bottom is equal to -Inf.
What do you suggest? How can I modify the above code?
Any help will be very much appreciated!
A very important point is that the minimum value in your matrix will always be 0. Leveraging this, a very simple way to address your problem is to add 1 inside the log operation, so that values that map to 0 in the original matrix also map to 0 after the log. This avoids the -Inf values you're encountering. In fact, this is a very common way of visualizing the Fourier transform. Adding 1 inside the logarithm ensures that the output has no negative values, yet the rate of change remains intact, as the effect is simply a translation of the curve by 1 unit to the left.
Therefore, simply do imagesc(10*log10(1 + Matrix));. The minimum is then bounded at 0, while the maximum is unbounded but determined by the largest value seen in Matrix.
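For reference, here is the same idea sketched in Python with matplotlib (the random demo matrices and all variable names are my own assumptions, not from the question):

import numpy as np
import matplotlib.pyplot as plt

# two sparse demo matrices standing in for Matrix1/Matrix2
rng = np.random.default_rng(0)
Matrix1 = rng.random((50, 50)) * (rng.random((50, 50)) > 0.9)
Matrix2 = rng.random((50, 50)) * (rng.random((50, 50)) > 0.9)

# log10(1 + x): zeros stay 0, so the color limits are never -Inf
Log1 = 10 * np.log10(1 + Matrix1)
Log2 = 10 * np.log10(1 + Matrix2)
bottom = min(Log1.min(), Log2.min())   # always 0 here
top = max(Log1.max(), Log2.max())

# shared color limits and a single colorbar, as in the MATLAB code above
fig, axes = plt.subplots(1, 2)
for ax, L in zip(axes, (Log1, Log2)):
    im = ax.imshow(L, cmap='gray', vmin=bottom, vmax=top)
fig.colorbar(im, ax=axes.ravel().tolist())
plt.show()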
Consider the following code:
import numpy as np
import tensorflow as tf

a = tf.convert_to_tensor(np.array([[1001, 1002], [3, 4]]), dtype=tf.float32)
b = tf.reduce_max(a, reduction_indices=[1], keep_dims=True)
with tf.Session():
    print(b.eval())
What exactly is the purpose of keep_dims here? I tested quite a bit, and saw that the above is equivalent to:
b=tf.reduce_max(a,reduction_indices=[1], keep_dims=False)
b=tf.expand_dims(b,1)
I may be wrong, but my guess is that if keep_dims is False, we get a 1D column vector, and if keep_dims=True, we have a 2x1 matrix. But how are they different?
If you reduce over one or more indices (i.e. dimensions of the tensor), you effectively reduce the rank of the tensor (i.e. its number of dimensions or, in other words, the number of indices you need in order to access an element of the tensor). By setting keep_dims=True, you are telling TensorFlow to keep the dimensions over which you reduce; they will then have size 1, but they are still there. While a column vector and an nx1 matrix are conceptually the same thing, in TensorFlow these are tensors of rank 1 (you need a single index to access an element) and rank 2 (you need two indices to access an element), respectively.
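numpy's keepdims flag behaves the same way, which makes the rank difference easy to see (a small sketch using the same values):

import numpy as np

a = np.array([[1001, 1002], [3, 4]], dtype=np.float32)

kept = np.max(a, axis=1, keepdims=True)      # rank 2: shape (2, 1)
dropped = np.max(a, axis=1, keepdims=False)  # rank 1: shape (2,)

print(kept.shape, dropped.shape)  # (2, 1) (2,)
print(kept[1, 0], dropped[1])     # two indices vs. one index: 4.0 4.0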
I am quite a newbie at MATLAB and I have a simple issue that is bothering me.
I want to know if it's possible to convert all the subscripts of a matrix to linear indices.
When using SUB2IND I must supply the x and y coordinates, but I want to convert all of them at the same time.
I can use the function FIND, which returns two vectors x and y, and this way I can use SUB2IND successfully, but FIND only returns the x and y coordinates of nonzero elements.
Is there a smart way to do this?
If you want all of the elements of an array A as linear indices, this can be done simply via:
IND = 1:numel(A);
This works for any size or dimension array.
See the MATLAB documentation on array indexing for more, including the difference between linear indexing and logical indexing. When you use find you're essentially using logical indexing to obtain linear indices. The find function can be used to obtain all of your linear indices via IND = find(A==A);, but this is horrendously inefficient (and it misses NaN elements, since NaN == NaN is false).
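For comparison, here is a rough numpy analogue of the same idea (the array name and values are assumptions; note that numpy is row-major while MATLAB is column-major):

import numpy as np

A = np.arange(12).reshape(3, 4)

# all linear ("flat") indices of A, regardless of element values
IND = np.arange(A.size)

# numpy's analogues of SUB2IND / IND2SUB
flat = np.ravel_multi_index((1, 2), A.shape)  # subscripts -> linear index
subs = np.unravel_index(flat, A.shape)        # linear index -> subscripts
print(IND, flat, subs)  # [ 0  1 ... 11] 6 (1, 2)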
You don't need to convert; just use a single number / 1-D vector when accessing elements of your matrix. For example, given a 5x5 matrix M
M=magic(5);
you can access the last element using M(5,5) or using M(25).
Similarly, M(21:25) will give you the elements M(1,5), M(2,5), ..., M(5,5).
Given a large sparse matrix (say 10k+ by 1M+), I need to find a subset, not necessarily contiguous, of the rows and columns that form a dense matrix (all non-zero elements). I want this submatrix to be as large as possible (not the largest sum, but the largest number of elements) within some aspect ratio constraints.
Are there any known exact or approximate solutions to this problem?
A quick scan on Google seems to give a lot of close-but-not-exactly results. What terms should I be looking for?
edit: Just to clarify, the submatrix need not be contiguous. In fact the row and column order is completely arbitrary, so adjacency is completely irrelevant.
A thought based on Chad Okere's idea:
1. Order the rows from largest count to smallest count (not necessary, but it might help perf)
2. Select two rows that have a "large" overlap
3. Add all other rows that won't reduce the overlap
4. Record that set
5. Add whatever row reduces the overlap by the least
6. Repeat at #3 until the result gets too small
7. Start over at #2 with a different starting pair
8. Continue until you decide the result is good enough
A rough sketch of this loop in Python is below.
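Something like this, treating each row as the set of its nonzero column indices (all names are assumptions, and this is a sketch of the steps above rather than a tuned implementation):

def greedy_dense(rows):
    # rows: list of sets of nonzero column indices, one set per row
    order = sorted(range(len(rows)), key=lambda r: -len(rows[r]))  # step 1
    best = (0, None)
    for a in order:                                    # steps 2 and 7
        for b in order:
            if b <= a:
                continue
            overlap = rows[a] & rows[b]
            picked = {a, b}
            while len(overlap) > 1:                    # step 6 stop rule
                for r in order:                        # step 3
                    if r not in picked and overlap <= rows[r]:
                        picked.add(r)
                score = len(picked) * len(overlap)     # step 4
                if score > best[0]:
                    best = (score, (set(picked), set(overlap)))
                rest = [r for r in order if r not in picked]
                if not rest:
                    break
                r = max(rest, key=lambda r: len(overlap & rows[r]))
                picked.add(r)                          # step 5
                overlap &= rows[r]
    return best  # (element count, (row set, column set))

# usage sketch: rows = [set(np.flatnonzero(r)) for r in M], with M a 0/1 array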
I assume you want something like this. You have a matrix like
1100101
1110101
0100101
You want columns 1,2,5,7 and rows 1 and 2, right? That submatrix would be 4x2 with 8 elements. Or you could go with columns 2,5,7 with rows 1,2,3, which would be a 3x3 matrix.
If you want an 'approximate' method, you could start with a single non-zero element, then go on to find another non-zero element and add it to your list of rows and columns. At some point you'll run into a non-zero element whose row and column, if added to your collection, would mean your collection is no longer entirely non-zero.
So for the above matrix, if you added 1,1 and 2,2 you would have rows 1,2 and columns 1,2 in your collection. If you tried to add 3,7 it would cause a problem, because 3,1 is zero. So you couldn't add it. You could add 2,5 and 2,7 though, creating the 4x2 submatrix.
You would basically iterate until you can't find any more new rows and columns to add. That would get you to a local optimum. You could store the result and start again with another starting point (perhaps one that didn't fit into your current solution).
Then just stop when you can't find any more after a while.
That, obviously, would take a long time, but I don't know if you'll be able to do it any more quickly.
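A small Python sketch of this growth heuristic (M is assumed to be a 0/1 numpy array; names are mine, and the indices below are 0-based):

import numpy as np

def grow_submatrix(M, seed):
    # grow row/column sets from one nonzero seed, accepting a new
    # nonzero element only if the rows x cols block stays all-nonzero
    rows, cols = {seed[0]}, {seed[1]}
    for (i, j) in zip(*np.nonzero(M)):
        r, c = rows | {i}, cols | {j}
        if M[np.ix_(sorted(r), sorted(c))].all():
            rows, cols = r, c
    return sorted(rows), sorted(cols)

M = np.array([[1,1,0,0,1,0,1],
              [1,1,1,0,1,0,1],
              [0,1,0,0,1,0,1]])
print(grow_submatrix(M, (0, 0)))  # ([0, 1], [0, 1, 4, 6]) -- the 8-element block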
I know you aren't working on this anymore, but I thought someone might have the same question as me in the future.
So, after realizing this is an NP-hard problem (by reduction from MAX-CLIQUE) I decided to come up with a heuristic that has worked well for me so far:
Given an N x M binary/boolean matrix, find a large dense submatrix:
Part I: Generate reasonable candidate submatrices
1. Consider each of the N rows to be an M-dimensional binary vector, v_i, where i = 1 to N
2. Compute a distance matrix for the N vectors using the Hamming distance
3. Use the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm to cluster the vectors
Initially, each of the v_i vectors is a singleton cluster. Step 3 above (clustering) gives the order in which the vectors should be combined into submatrices, so each internal node in the hierarchical clustering tree is a candidate submatrix.
Part II: Score and rank candidate submatrices
1. For each submatrix, calculate D, the number of elements in the dense subset of the vectors for the submatrix, by eliminating any column with one or more zeros
2. Select the submatrix that maximizes D
I also had some considerations regarding the minimum number of rows that needed to be preserved from the initial full matrix, and I would discard any candidate submatrix that did not meet this criterion before selecting the submatrix with the maximum D value.
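A sketch of how this heuristic might look in Python with scipy (all function and variable names are my own assumptions; scipy's 'average' linkage method is UPGMA):

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

def best_dense_submatrix(X, min_rows=2):
    # score every internal node of the UPGMA tree as a candidate:
    # keep only the all-ones columns, D = n_rows * n_dense_columns
    Z = linkage(pdist(X, metric='hamming'), method='average')  # UPGMA
    clusters = {i: [i] for i in range(len(X))}   # leaves are singletons
    best = (0, None)
    for k, (a, b, _, _) in enumerate(Z):
        members = clusters[int(a)] + clusters[int(b)]
        clusters[len(X) + k] = members           # internal node k
        if len(members) < min_rows:              # min-row criterion
            continue
        dense_cols = np.flatnonzero(X[members].all(axis=0))
        D = len(members) * len(dense_cols)
        if D > best[0]:
            best = (D, (sorted(members), dense_cols.tolist()))
    return best

X = np.array([[1,1,0,0,1,0,1],
              [1,1,1,0,1,0,1],
              [0,1,0,0,1,0,1]])
print(best_dense_submatrix(X))  # (9, ([0, 1, 2], [1, 4, 6]))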
Is this a Netflix problem?
MATLAB or some other sparse matrix libraries might have ways to handle it.
Is your intent to write your own?
Maybe the 1-D approach for each row would help you. The algorithm might look like this:
1. Loop over each row
2. Find the index of the first non-zero element
3. Find the index of the last non-zero element, i.e. the one with the largest span between non-zero columns in that row, and store both
4. Sort the rows from largest to smallest span between non-zero columns
At this point I start getting fuzzy (sorry, not an algorithm designer). I'd try looping over each row, lining up the indexes of the starting points, and looking for the maximum non-zero run of column indexes that I could.
You don't specify whether or not the dense matrix has to be square. I'll assume not.
I don't know how efficient this is or what its Big-O behavior would be. But it's a brute force method to start with.
EDIT. This is NOT the same as the problem below. My bad...
But based on the last comment below, it might be equivalent to the following:
Find the furthest vertically separated pair of zero points that have no zero point between them.
Find the furthest horizontally separated pair of zero points that have no zeros between them?
Then the horizontal region you're looking for is the rectangle that fits between these two pairs of points?
This exact problem is discussed in a gem of a book called "Programming Pearls" by Jon Bentley, and, as I recall, although there is a solution in one dimension, there is no easy answer for the 2-D or higher-dimensional variants.
The 1-D problem is, effectively, finding the largest sum of a contiguous subset of a list of numbers:
Iterate through the elements, keeping track of a running total starting from some previous element, and the maximum subtotal seen so far (along with the start and end elements that generated it). At each element, if the running subtotal is greater than the maximum seen so far, the maximum and its end element are updated. If the running total goes below zero, the start element is reset to the current element and the running total is reset to zero.
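That scan is Kadane's algorithm; here is a compact Python version (the function name is mine, and the test array is the example commonly quoted from the book):

def max_subarray(xs):
    # largest sum over any contiguous run, with its start/end indices
    best_sum, best_range = float('-inf'), (0, 0)
    running, start = 0, 0
    for i, x in enumerate(xs):
        running += x
        if running > best_sum:
            best_sum, best_range = running, (start, i)
        if running < 0:          # a negative prefix never helps
            running, start = 0, i + 1
    return best_sum, best_range

print(max_subarray([31, -41, 59, 26, -53, 58, 97, -93, -23, 84]))
# (187, (2, 6))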
The 2-D problem came from an attempt to create a visual image-processing algorithm: within a stream of brightness values representing pixels in a 2-color image, find the "brightest" rectangular area within the image, i.e. the contiguous 2-D sub-matrix with the highest sum of brightness values, where "brightness" was measured by the difference between a pixel's brightness value and the overall average brightness of the entire image (so many elements had negative values).
EDIT: To look up the 1-D solution I dredged up my copy of the 2nd edition of this book, and in it, Jon Bentley says "The 2-D version remains unsolved as this edition goes to print..." which was in 1999.