Sort matrix elements around the diagonal - algorithm

I am looking for an algorithm that can sort the rows of a matrix so that the elements cluster around the diagonal.
I will have a square matrix (around 80 rows/columns) containing only the values 0 and 1. There are algorithms that sort the rows so that most of the elements with the value 1 are below the diagonal.
I need an algorithm that sorts the rows so as to minimize the mean distance of the 1 elements to the diagonal.
Like so:
from:
0 1 0
1 0 1
1 1 0
to:
1 1 0
0 1 0
1 0 1
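To make the objective concrete, here is a small sketch that scores a matrix by the mean distance of its 1s to the diagonal. It assumes that the distance of cell (i, j) is |i - j|; the perpendicular Euclidean distance to the diagonal line is |i - j|/√2, which only rescales the mean. The function name is hypothetical.

```python
def mean_diagonal_distance(matrix):
    """Mean |i - j| over all cells (i, j) that hold a 1.

    Assumption: the distance to the diagonal is |i - j|, i.e. the number of
    columns between the cell and the diagonal cell of its row. The matrix is
    assumed to contain at least one 1.
    """
    distances = [abs(i - j)
                 for i, row in enumerate(matrix)
                 for j, value in enumerate(row)
                 if value == 1]
    return sum(distances) / len(distances)


before = [[0, 1, 0],
          [1, 0, 1],
          [1, 1, 0]]
after = [[1, 1, 0],
         [0, 1, 0],
         [1, 0, 1]]
print(mean_diagonal_distance(before))  # 1.2
print(mean_diagonal_distance(after))   # 0.6
```

A row permutation never changes the number of 1s, so minimizing this mean is the same as minimizing the total distance.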
Since I am not familiar with this topic, I hope that someone can help me. I am not looking for a complete solution; the name of such an algorithm, if one exists, or pseudocode would be sufficient.
Thanks a lot!

There is probably a more efficient way, but you could treat this problem as an assignment problem (trying to assign each row to a diagonal element).
This can be done in three steps:
1) Create a new matrix M where each entry M(i,j) contains the cost of assigning row i of your input matrix to the diagonal element j. For your example this matrix will be the following (average distance to the diagonal element):
1 0 1
1 1 1
0.5 0.5 1.5
Example: M(0,0) = 1 is the average distance when assigning row 0 of the input matrix (0 1 0) to the diagonal element positioned at 0.
2) Run an algorithm that finds the best assignment (e.g., the Hungarian algorithm). This gives an optimal 1:1 matching between rows and columns that minimizes the total cost in M.
The result will be the elements (0,1), (1,2) and (2,0).
3) Rearrange your input matrix using this assignment:
row 0 -> row 1
row 1 -> row 2
row 2 -> row 0
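The three steps above can be sketched in pure Python as follows. The function names are hypothetical; `hungarian` is a standard O(n³) potentials-based implementation of the Hungarian algorithm (in practice `scipy.optimize.linear_sum_assignment` solves the same minimum-cost assignment problem). The `average` flag follows the answer's average-distance cost; with `average=False` each row contributes the sum of its distances, which makes the total cost proportional to the question's mean distance, since every row permutation keeps the number of 1s fixed.

```python
def cost_matrix(matrix, average=True):
    """M[i][j] = distance of the 1s in row i to column j, the diagonal cell
    that row i would occupy if it were moved to position j."""
    n = len(matrix)
    M = []
    for row in matrix:
        ones = [c for c, v in enumerate(row) if v == 1]
        costs = []
        for j in range(n):
            total = sum(abs(c - j) for c in ones)
            costs.append(total / len(ones) if average and ones else total)
        M.append(costs)
    return M


def hungarian(cost):
    """Minimum-cost perfect assignment; returns assign[i] = column for row i."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-based)
    v = [0.0] * (n + 1)    # column potentials (1-based)
    p = [0] * (n + 1)      # p[j] = row matched to column j (0 = none)
    way = [0] * (n + 1)    # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:          # reached a free column
                break
        while j0:                   # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def sort_around_diagonal(matrix, average=True):
    assign = hungarian(cost_matrix(matrix, average))
    result = [None] * len(matrix)
    for i, j in enumerate(assign):
        result[j] = matrix[i]       # row i moves to position j
    return result


example = [[0, 1, 0],
           [1, 0, 1],
           [1, 1, 0]]
print(hungarian(cost_matrix(example)))  # [1, 2, 0]
print(sort_around_diagonal(example))    # [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
```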

Related

Binary search over 2d array to find a local maximum? What's wrong with this algorithm?

This is the classic problem of finding a local maximum (just one) in a matrix.
My algorithm is:
Choose the number in the center of the matrix.
Check if the number is a peak. If yes, return.
If not, check the numbers to the left and right. If one of them is greater than our current number, choose that half of the matrix. If both are greater, we can choose either half.
Repeat with the numbers to the top and bottom. This will leave us with one quadrant of the matrix to continue checking.
Since this is a binary search on an n x n matrix, which has n^2 elements, it should take O(log(n^2)) = O(2*log(n)) = O(log(n)).
I'm pretty sure this is not correct, but where is my mistake?
This algorithm isn't guaranteed to find a local maximum. Consider, for example, the case where you need to follow a winding path of ascending values through the matrix to get to the peak. If that path crosses back and forth between quadrants, your algorithm will not find it.
13 1 1 1 1
12 1 1 1 1
11 1 1 2 3
10 1 1 1 4
9 8 7 6 5
Or, here's a simpler example:
3 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
You start in the middle; how do you find the '3'? Your algorithm doesn't describe what to do when faced with a flat region.
Consider reading "Find a peak element in a 2D array", which describes a brute-force approach as well as an efficient method with a time complexity of O(rows * log(columns)), in your case O(n log n).
The algorithm is based on binary search, hence the log n term that also appears in your estimate:
Consider the mid column and find the maximum element in it.
Let the index of the mid column be ‘mid’, the value of the maximum element in the mid column be ‘max’, and the maximum element be at ‘mat[max_index][mid]’.
If max >= mat[max_index][mid-1] and max >= mat[max_index][mid+1], max is a peak; return max.
If max < mat[max_index][mid-1], recur for the left half of the matrix.
If max < mat[max_index][mid+1], recur for the right half of the matrix.
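The steps above can be sketched as an iterative column-wise binary search. This is a minimal sketch, assuming a peak means an element that is >= its up, down, left and right neighbours (cells outside the matrix are ignored); the function name is hypothetical.

```python
def find_peak_2d(mat):
    """Column-wise binary search for a peak: an element that is >= its
    up, down, left and right neighbours (neighbours outside the matrix are ignored)."""
    rows, cols = len(mat), len(mat[0])
    lo, hi = 0, cols - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        # O(rows): row index of the maximum in the middle column
        max_index = max(range(rows), key=lambda r: mat[r][mid])
        best = mat[max_index][mid]
        left = mat[max_index][mid - 1] if mid > 0 else float("-inf")
        right = mat[max_index][mid + 1] if mid + 1 < cols else float("-inf")
        if best >= left and best >= right:
            return max_index, mid      # also >= up/down, being the column max
        if left > best:
            hi = mid - 1               # a peak exists in the left half
        else:
            lo = mid + 1               # right > best: a peak exists on the right
    raise ValueError("unreachable for a non-empty matrix")


winding = [[13, 1, 1, 1, 1],
           [12, 1, 1, 1, 1],
           [11, 1, 1, 2, 3],
           [10, 1, 1, 1, 4],
           [9, 8, 7, 6, 5]]
print(find_peak_2d(winding))  # (0, 0), the 13
```

Each iteration costs O(rows) for the column maximum and halves the column range, which gives the O(rows * log(columns)) bound. Note that the neighbours are always read from the full matrix, even when they lie outside the current half.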
However, your algorithm won't work in all cases and might fail to find a local maximum, since you only look at the neighbors of the current center, which does not guarantee that you will find the maximum (the elements are, of course, not sorted). Example:
1 1 1 1 1
1 1 1 1 1
1 2 1 1 1
1 1 1 1 1
1 1 1 1 10
If you start from the center and pick the wrong submatrix, you are doomed not to find the local maximum.

making a singular matrix non singular by removing rows and columns

I have a large sparse n-by-n square matrix whose rank m is slightly below n. I want to make it non-singular by removing rows and columns according to a certain rule: if you remove the ith row, you must also remove the ith column, so that the matrix stays square. This is effectively removing a node from an adjacency graph.
My first question is: does there always exist a combination of n-m rows and columns that I can remove such that the remaining m-by-m submatrix is structurally non-singular?
My second question is: is there an efficient algorithm to obtain a p-by-p non-singular submatrix without removing an excessive number of rows and columns?
To provide more context, the matrix I'm dealing with is about 1000 by 1000 with a sparsity close to 0.05.
(1) is not true. Here's an example:
[1 0 0 0;
0 1 0 0;
0 0 0 1;
0 0 0 0]
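The rank is clearly 3 (which happens to be the number of nonzero rows/columns). Removing row 1, 2 or 3 would lower the rank, and so would removing column 1, 2 or 4. Since the ith row and the ith column must be removed together, every index from 1 to 4 is ruled out.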
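A brute-force check of this counterexample, as a sketch. It relies on the standard fact that a square matrix is structurally non-singular exactly when the bipartite graph of its nonzero pattern (rows on one side, columns on the other) has a perfect matching, so the structural rank is the size of a maximum matching. The helper names are hypothetical, and indices are 0-based here, whereas the answer above counts from 1.

```python
from itertools import combinations


def structural_rank(pattern):
    """Size of a maximum row-column matching over the nonzero entries
    (Kuhn's augmenting-path algorithm)."""
    n_rows = len(pattern)
    n_cols = len(pattern[0]) if pattern else 0
    row_of_col = [-1] * n_cols

    def augment(r, seen):
        for c in range(n_cols):
            if pattern[r][c] != 0 and not seen[c]:
                seen[c] = True
                if row_of_col[c] == -1 or augment(row_of_col[c], seen):
                    row_of_col[c] = r
                    return True
        return False

    return sum(augment(r, [False] * n_cols) for r in range(n_rows))


def principal_submatrix(matrix, keep):
    """Keep the same index set for rows and columns."""
    return [[matrix[i][j] for j in keep] for i in keep]


A = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]

m = structural_rank(A)
print(m)  # 3
for keep in combinations(range(len(A)), m):
    # every principal 3x3 block has structural rank 2, i.e. it is singular
    print(keep, structural_rank(principal_submatrix(A, keep)))
```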
The first claim is not true, as hiandbali has answered. I managed to solve the second problem with a DFS; the interior adjacency matrices are not singular.

Incidence matrices

Permutation of any two rows or columns in an incidence matrix simply corresponds to relabelling the vertices and edges of the same graph. Conversely, two graphs X and Y are isomorphic if and only if their incidence matrices A(X) and A(Y) differ only by permutations of rows and columns.
Can someone explain what this means, with an example? What exactly does "permutation of any two rows or columns" mean here?
"Permutation" here means "exchange". Consider the following node-node incidence matrix:
0 1 0
0 0 1
1 0 0
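It defines a graph with vertices 0, 1, 2 whose edges form the cycle 0-1-2-0. In a node-node matrix every vertex labels both a row and a column, so relabelling 1 to 2 and vice versa means exchanging rows 1 and 2 and also columns 1 and 2. We obtain
0 0 1
1 0 0
0 1 0
where the cycle is 0-2-1-0. Both graphs are "identical up to renaming of vertices", i.e. they are isomorphic. In a vertex-edge incidence matrix, as in the quoted passage, rows stand for vertices and columns for edges, so exchanging two rows alone relabels two vertices and exchanging two columns alone relabels two edges.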
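A small sketch of vertex relabelling on a node-node (adjacency) matrix, plus a brute-force isomorphism test over all relabellings (only feasible for tiny graphs). The helper names are hypothetical; the key point is that vertex v moves to position perm[v] in both the row and the column index.

```python
from itertools import permutations


def relabel(adj, perm):
    """Adjacency matrix after renaming vertex v to perm[v]:
    row v AND column v both move to position perm[v]."""
    n = len(adj)
    new = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            new[perm[i]][perm[j]] = adj[i][j]
    return new


def isomorphic(a, b):
    """Brute force: try every relabelling of a."""
    return any(relabel(a, p) == b for p in permutations(range(len(a))))


cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
swapped = relabel(cycle, [0, 2, 1])   # rename 1 <-> 2
print(swapped)                        # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(isomorphic(cycle, swapped))     # True
```

Exchanging only two rows of a node-node matrix is not a relabelling in general: swapping rows 0 and 1 of `cycle` gives a matrix with a 1 on the diagonal (a self-loop), which no relabelling of the cycle can produce.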

Algorithm to maximize the smallest diagonal element of a matrix

Suppose we are given a square matrix A. Our goal is to maximize the smallest diagonal element by row permutations. In other words, the given matrix A has n diagonal elements $d_1, \dots, d_n$ with minimum $\min_i d_i$, and we want to reach, by row permutations, the matrix whose smallest diagonal element is as large as possible.
This is $\max \min_i d_i$ over all row permutations.
For example, suppose A = [4 3 2 1; 1 4 3 2; 2 1 4 3; 2.5 3.5 4.5 1.5]. The diagonal is [4, 4, 4, 1.5], whose minimum is 1.5. We can swap rows 3 and 4 to get a new matrix $\tilde{A}$ = [4 3 2 1; 1 4 3 2; 2.5 3.5 4.5 1.5; 2 1 4 3]. The new diagonal is [4, 4, 4.5, 3], with a new minimum of 3. This seems to be the best possible result: 3 appears to be the maximum of $\min_i d_i$.
In my problem, n is much larger, around 1000. There are n! row permutations, so I cannot go through each of them. I think a greedy algorithm could help: start from the first row. If a_11 is not the largest element in the first column, swap it with the largest element in the first column by a row permutation. Then look at the second row, comparing a_22 with all remaining elements in the second column (except a_12), and swap if a_22 is not the largest. Continue like this until the last row.
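The greedy procedure described above can be sketched next to an exact brute force over all n! row orders (usable only for tiny n) to compare the two. Both function names are hypothetical.

```python
from itertools import permutations


def greedy_min_diagonal(A):
    """Greedy heuristic: for each position k, move the row (among rows k..n-1)
    with the largest entry in column k into row k; return the smallest diagonal entry."""
    rows = [list(r) for r in A]
    n = len(rows)
    for k in range(n):
        # max() returns the first maximum, so a tie keeps the current row in place
        best = max(range(k, n), key=lambda r: rows[r][k])
        rows[k], rows[best] = rows[best], rows[k]
    return min(rows[k][k] for k in range(n))


def brute_force_min_diagonal(A):
    """Exact max over all row orders of the smallest diagonal entry (tiny n only)."""
    n = len(A)
    return max(min(A[p[j]][j] for j in range(n)) for p in permutations(range(n)))


A = [[4, 3, 2, 1],
     [1, 4, 3, 2],
     [2, 1, 4, 3],
     [2.5, 3.5, 4.5, 1.5]]
print(greedy_min_diagonal(A), brute_force_min_diagonal(A))  # 3 3

B = [[5, 4],
     [5, 1]]
print(greedy_min_diagonal(B), brute_force_min_diagonal(B))  # 1 4
```

The second example shows that the greedy result is not guaranteed to be optimal: it keeps the first row for column 1 and is then stuck with the 1, while swapping the rows gives a diagonal of [5, 4].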
Is there any better algorithm to do it?
This is similar to Minimum Euclidean Matching but they are not the same.
Suppose you wanted to know whether there was a better solution to your problem than 3.
Change your matrix to have a 1 for every element that is strictly greater than 3:
4   3   2   1          1 0 0 0
1   4   3   2          0 1 0 0
2.5 3.5 4.5 1.5   ->   0 1 1 0
2   1   4   3          0 0 1 0
Your problem can be interpreted as trying to find a perfect matching in the bipartite graph that has this binary matrix as its biadjacency matrix.
In this case, it is easy to see that there is no way of improving your result because there is no way of reordering rows to make the diagonal entry in the last column greater than 3.
For a larger matrix, there are efficient algorithms to find maximum matchings in bipartite graphs.
This suggests an algorithm:
Use bisection to find the largest value for which the generated graph has a perfect matching
The assignment corresponding to the perfect matching with the largest value will be equal to the best permutation of rows
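The bisection idea can be sketched without any library. This is a minimal sketch under a few assumptions: the candidate thresholds are the distinct entries of A, an edge (row i, column j) exists when A[i][j] >= threshold (the "at least t" form of the answer's "strictly greater than" test), and matchings are found with Kuhn's augmenting-path algorithm. Feasibility is monotone in the threshold, which is what makes bisection valid. The function names are hypothetical. The recursive search can go n levels deep and the pure-Python matching is slow for n around 1000, so at that size a library matcher such as the networkx max-flow code in the EDIT is a better fit.

```python
def perfect_matching(A, threshold):
    """Kuhn's algorithm on edges (row i, column j) with A[i][j] >= threshold.
    Returns row_of_col (original row placed at each position) or None."""
    n = len(A)
    row_of_col = [-1] * n

    def augment(i, seen):
        for j in range(n):
            if A[i][j] >= threshold and not seen[j]:
                seen[j] = True
                if row_of_col[j] == -1 or augment(row_of_col[j], seen):
                    row_of_col[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, [False] * n):
            return None
    return row_of_col


def max_min_diagonal(A):
    """Bisection over the sorted distinct entries of A."""
    values = sorted({v for row in A for v in row})
    lo, hi = 0, len(values) - 1
    best = None                      # values[0] is always feasible
    while lo <= hi:
        mid = (lo + hi) // 2
        order = perfect_matching(A, values[mid])
        if order is not None:
            best = (values[mid], order)
            lo = mid + 1             # feasible: try a larger threshold
        else:
            hi = mid - 1             # infeasible: try a smaller one
    return best


A = [[4, 3, 2, 1],
     [1, 4, 3, 2],
     [2, 1, 4, 3],
     [2.5, 3.5, 4.5, 1.5]]
value, order = max_min_diagonal(A)
print(value)                         # 3
print([A[r] for r in order])         # the rows in the optimal order
```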
EDIT
This Python code illustrates how to use the networkx library to determine whether the graph has a perfect matching for a particular cutoff value.
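import networkx as nx

A = [[4, 3, 2, 1],
     [1, 4, 3, 2],
     [2, 1, 4, 3],
     [2.5, 3.5, 4.5, 1.5]]
cutoff = 3

# Flow network start -> row i -> col j -> end, every edge with capacity 1.
# A row-to-column edge exists only if A[i][j] is strictly greater than the cutoff.
G = nx.DiGraph()
for i, row in enumerate(A):
    G.add_edge('start', 'row' + str(i), capacity=1.0)
    G.add_edge('col' + str(i), 'end', capacity=1.0)
    for j, e in enumerate(row):
        if e > cutoff:
            G.add_edge('row' + str(i), 'col' + str(j), capacity=1.0)

# A perfect matching exists iff the maximum flow saturates all len(A) rows.
if nx.maximum_flow_value(G, 'start', 'end') < len(A):
    print('No perfect matching')
else:
    print('Has a perfect matching')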
For a random matrix of size 1000*1000 it takes about 1 second on my computer.
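Let $x_{ij}$ be 1 if row i is moved to row j and 0 otherwise.
You're interested in the following integer program:
max z
subject to
$\sum_{i=1}^{n} x_{ij} = 1 \quad \forall j$
$\sum_{j=1}^{n} x_{ij} = 1 \quad \forall i$
$\sum_{i=1}^{n} A_{ij} x_{ij} \ge z \quad \forall j$
$x_{ij} \in \{0, 1\}$
The third constraint says that the entry landing on diagonal position j, which is $A_{ij}$ for the row i moved there, is at least z.
Then plug this into GLPK, Gurobi, or CPLEX. Alternatively, solve the IP with your own branch-and-bound solver.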

Rectangular region in an array

Given an N*N matrix containing 1s and 0s and an integer k, what is the best method to find a rectangular region that contains exactly k 1s?
I can do it in O(N^3*log(N)), but surely the best solution is faster. First create another N*N matrix B (the initial matrix is A), defined as follows:
B[i][j] is the number of ones in the rectangle of A with corners (0,0) and (i,j).
You can compute B in O(N^2) by dynamic programming: B[i][j] = B[i-1][j] + B[i][j-1] - B[i-1][j-1] + A[i][j].
Now it is easy to solve this problem in O(N^4) by iterating over all bottom-right corners (i=1..N, j=1..N, O(N^2)), left columns (z=1..j, O(N)) and top rows (t=1..i, O(N)), and getting the number of ones in each rectangle with the help of B:
sum_of_ones = B[i][j] - B[i][z-1] - B[t-1][j] + B[t-1][z-1].
If sum_of_ones == k, output the result.
To get O(N^3*log(N)), find the top row t by binary search instead of iterating over all possible rows: for fixed i, j and z, sum_of_ones never increases as t grows, because the rectangle only shrinks.
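A sketch of this approach, keeping the answer's 1-based indices for B (row and column 0 are zero padding) and returning 0-based corners. It also accepts non-square matrices, an assumption beyond the N*N case in the question. The function name is hypothetical.

```python
def find_rectangle(A, k):
    """Return (top, left, bottom, right), 0-based inclusive, of a rectangle
    holding exactly k ones, or None. O(R * C^2 * log R) for an R x C matrix."""
    R, C = len(A), len(A[0])
    # B[i][j] = ones in A[0..i-1][0..j-1]; row 0 and column 0 are zero padding
    B = [[0] * (C + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        for j in range(1, C + 1):
            B[i][j] = B[i - 1][j] + B[i][j - 1] - B[i - 1][j - 1] + A[i - 1][j - 1]

    def ones(t, z, i, j):  # 1-based inclusive corners (t, z) .. (i, j)
        return B[i][j] - B[i][z - 1] - B[t - 1][j] + B[t - 1][z - 1]

    for i in range(1, R + 1):            # bottom row
        for j in range(1, C + 1):        # right column
            for z in range(1, j + 1):    # left column
                lo, hi = 1, i            # binary search over the top row t
                while lo <= hi:
                    t = (lo + hi) // 2
                    s = ones(t, z, i, j)
                    if s == k:
                        return t - 1, z - 1, i - 1, j - 1
                    if s > k:
                        lo = t + 1       # too many ones: move the top row down
                    else:
                        hi = t - 1       # too few ones: move the top row up
    return None


M = [[0, 1, 1, 1, 1, 0],
     [0, 1, 1, 1, 1, 0],
     [0, 1, 1, 1, 1, 0]]
top, left, bottom, right = find_rectangle(M, 12)
print(sum(M[r][c] for r in range(top, bottom + 1) for c in range(left, right + 1)))  # 12
```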
Consider this simpler problem:
Given a vector of size N containing only the values 1 and 0, find a contiguous subsequence that contains exactly k values of 1.
Let A be the given vector and S[i] = A[1] + A[2] + A[3] + ... + A[i], meaning how many 1s there are in the subsequence A[1..i].
For each i, we are interested in the existence of a j <= i such that S[i] - S[j-1] == k.
We can find this in O(n) with a hash table by using the following relation:
S[i] - S[j-1] == k => S[j-1] = S[i] - k
let H = an empty hash table mapping a prefix-sum value to a position
H[0] = 0   (S[0] = 0 is the empty prefix)
for i = 1 to N do
    if H contains (S[i] - k) then your sequence is A[H[S[i] - k] + 1 .. i]
    H[S[i]] = i
Now we can use this to solve your problem in O(N^3): for each contiguous band of rows in your matrix (there are O(N^2) such bands), treat the column sums of that band as a vector and apply the previous algorithm to it. Computing S is a bit more involved in the matrix case, but it's not hard to figure out. Let me know if you need more details.
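A sketch of the O(N^3) version, assuming 0-based indices. For each band of rows it keeps running column sums (extending the band by one row costs O(N)), then runs the prefix-sum hash search over those sums, storing the latest position of each prefix value as in the pseudocode. The function name is hypothetical.

```python
def find_rectangle_k_ones(A, k):
    """O(R^2 * C): collapse each band of rows top..bottom into column sums and
    look for a contiguous run of columns whose sum is exactly k."""
    R, C = len(A), len(A[0])
    for top in range(R):
        col_sums = [0] * C
        for bottom in range(top, R):
            for c in range(C):
                col_sums[c] += A[bottom][c]   # extend the band by one row
            last_pos = {0: -1}                # prefix value -> last column index
            prefix = 0
            for c in range(C):
                prefix += col_sums[c]
                if prefix - k in last_pos:
                    # columns last_pos[prefix - k] + 1 .. c sum to exactly k
                    return top, last_pos[prefix - k] + 1, bottom, c
                last_pos[prefix] = c
    return None


M = [[0, 1, 1, 1, 1, 0],
     [0, 1, 1, 1, 1, 0],
     [0, 1, 1, 1, 1, 0]]
print(find_rectangle_k_ones(M, 12))  # (0, 1, 2, 4): rows 0-2, columns 1-4
```

With 0-based indices the result corresponds to the walk-through below: all three rows, and positions 2 to 5 of the collapsed vector 0 3 3 3 3 0.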
Update:
Here's how the algorithm would work on the following matrix, assuming k = 12:
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
Consider the first row alone:
0 1 1 1 1 0
Consider it to be the vector 0 1 1 1 1 0 and apply the algorithm for the simpler problem on it: we find that there's no subsequence adding up to 12, so we move on.
Consider the first two rows:
0 1 1 1 1 0
0 1 1 1 1 0
Consider them to be the vector 0+0 1+1 1+1 1+1 1+1 0+0 = 0 2 2 2 2 0 and apply the algorithm for the simpler problem on it: again, no subsequence that adds up to 12, so move on.
Consider the first three rows:
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
Consider them to be the vector 0 3 3 3 3 0 and apply the algorithm for the simpler problem on it: we find the sequence starting at position 2 and ending at position 5 to be the solution. From this we can get the entire rectangle with simple bookkeeping.

Resources