Sorting a binary 2D matrix? - algorithm

I'm looking for some pointers here as I don't quite know where to start researching this one.
I have a 2D matrix with 0 or 1 in each cell, such as:
1 2 3 4
A 0 1 1 0
B 1 1 1 0
C 0 1 0 0
D 1 1 0 0
And I'd like to sort it so it is as "upper triangular" as possible, like so:
4 3 1 2
B 0 1 1 1
A 0 1 0 1
D 0 0 1 1
C 0 0 0 1
The rows and columns must remain intact, i.e. elements can't be moved individually and can only be swapped "whole".
I understand that there'll probably be pathological cases where a matrix has multiple possible sorted results (i.e. same shape, but differ in the identity of the "original" rows/columns.)
So, can anyone suggest where I might find some starting points for this? An existing library/algorithm would be great, but I'll settle for knowing the name of the problem I'm trying to solve!
I doubt it's a linear algebra problem as such, and maybe there's some kind of image processing technique that's applicable.
Any other ideas aside, my initial guess is just to write a simple insertion sort on the rows, then the columns and iterate that until it stabilises (and hope that detecting the pathological cases isn't too hard.)
More details: Some more information on what I'm trying to do may help clarify. Each row represents a competitor, each column represents a challenge. Each 1 or 0 represents "success" for the competitor on a particular challenge.
By sorting the matrix so all 1s are in the top-right, I hope to then provide a ranking of the intrinsic difficulty of each challenge and a ranking of the competitors (which will take into account the difficulty of the challenges they succeeded at, not just the number of successes.)
Note on accepted answer: I've accepted Simulated Annealing as "the answer" with the caveat that this question doesn't have a right answer. It seems like a good approach, though I haven't actually managed to come up with a scoring function that works for my problem.

An Algorithm based upon simulated annealing can handle this sort of thing without too much trouble. Not great if you have small matrices which most likely hae a fixed solution, but great if your matrices get to be larger and the problem becomes more difficult.
(However, it also fails your desire that insertions can be done incrementally.)
Preliminaries
Devise a performance function that "scores" a matrix - matrices that are closer to your triangleness should get a better score than those that are less triangle-y.
Devise a set of operations that are allowed on the matrix. Your description was a little ambiguous, but if you can swap rows then one op would be SwapRows(a, b). Another could be SwapCols(a, b).
The Annealing loop
I won't give a full exposition here, but the idea is simple. You perform random transformations on the matrix using your operations. You measure how much "better" the matrix is after the operation (using the performance function before and after the operation). Then you decide whether to commit that transformation. You repeat this process a lot.
Deciding whether to commit the transform is the fun part: you need to decide whether to perform that operation or not. Toward the end of the annealing process, you only accept transformations that improved the score of the matrix. But earlier on, in a more chaotic time, you allow transformations that don't improve the score. In the beginning, the algorithm is "hot" and anything goes. Eventually, the algorithm cools and only good transforms are allowed. If you linearly cool the algorithm, then the choice of whether to accept a transformation is:
public bool ShouldAccept(double cost, double temperature, Random random) {
return Math.Exp(-cost / temperature) > random.NextDouble();
}
You should read the excellent information contained in Numerical Recipes for more information on this algorithm.
Long story short, you should learn some of these general purpose algorithms. Doing so will allow you to solve large classes of problems that are hard to solve analytically.
Scoring algorithm
This is probably the trickiest part. You will want to devise a scorer that guides the annealing process toward your goal. The scorer should be a continuous function that results in larger numbers as the matrix approaches the ideal solution.
How do you measure the "ideal solution" - triangleness? Here is a naive and easy scorer: For every point, you know whether it should be 1 or 0. Add +1 to the score if the matrix is right, -1 if it's wrong. Here's some code so I can be explicit (not tested! please review!)
int Score(Matrix m) {
var score = 0;
for (var r = 0; r < m.NumRows; r++) {
for (var c = 0; c < m.NumCols; c++) {
var val = m.At(r, c);
var shouldBe = (c >= r) ? 1 : 0;
if (val == shouldBe) {
score++;
}
else {
score--;
}
}
}
return score;
}
With this scoring algorithm, a random field of 1s and 0s will give a score of 0. An "opposite" triangle will give the most negative score, and the correct solution will give the most positive score. Diffing two scores will give you the cost.
If this scorer doesn't work for you, then you will need to "tune" it until it produces the matrices you want.
This algorithm is based on the premise that tuning this scorer is much simpler than devising the optimal algorithm for sorting the matrix.

I came up with the below algorithm, and it seems to work correctly.
Phase 1: move rows with most 1s up and columns with most 1s right.
First the rows. Sort the rows by counting their 1s. We don't care
if 2 rows have the same number of 1s.
Now the columns. Sort the cols by
counting their 1s. We don't care
if 2 cols have the same number of
1s.
Phase 2: repeat phase 1 but with extra criterions, so that we satisfy the triangular matrix morph.
Criterion for rows: if 2 rows have the same number of 1s, we move up the row that begin with fewer 0s.
Criterion for cols: if 2 cols have the same number of 1s, we move right the col that has fewer 0s at the bottom.
Example:
Phase 1
1 2 3 4 1 2 3 4 4 1 3 2
A 0 1 1 0 B 1 1 1 0 B 0 1 1 1
B 1 1 1 0 - sort rows-> A 0 1 1 0 - sort cols-> A 0 0 1 1
C 0 1 0 0 D 1 1 0 0 D 0 1 0 1
D 1 1 0 0 C 0 1 0 0 C 0 0 0 1
Phase 2
4 1 3 2 4 1 3 2
B 0 1 1 1 B 0 1 1 1
A 0 0 1 1 - sort rows-> D 0 1 0 1 - sort cols-> "completed"
D 0 1 0 1 A 0 0 1 1
C 0 0 0 1 C 0 0 0 1
Edit: it turns out that my algorithm doesn't give proper triangular matrices always.
For example:
Phase 1
1 2 3 4 1 2 3 4
A 1 0 0 0 B 0 1 1 1
B 0 1 1 1 - sort rows-> C 0 0 1 1 - sort cols-> "completed"
C 0 0 1 1 A 1 0 0 0
D 0 0 0 1 D 0 0 0 1
Phase 2
1 2 3 4 1 2 3 4 2 1 3 4
B 0 1 1 1 B 0 1 1 1 B 1 0 1 1
C 0 0 1 1 - sort rows-> C 0 0 1 1 - sort cols-> C 0 0 1 1
A 1 0 0 0 A 1 0 0 0 A 0 1 0 0
D 0 0 0 1 D 0 0 0 1 D 0 0 0 1
(no change)
(*) Perhaps a phase 3 will increase the good results. In that phase we place the rows that start with fewer 0s in the top.

Look for a 1987 paper by Anna Lubiw on "Doubly Lexical Orderings of Matrices".
There is a citation below. The ordering is not identical to what you are looking for, but is pretty close. If nothing else, you should be able to get a pretty good idea from there.
http://dl.acm.org/citation.cfm?id=33385

Here's a starting point:
Convert each row from binary bits into a number
Sort the numbers in descending order.
Then convert each row back to binary.

Basic algorithm:
Determine the row sums and store
values. Determine the column sums
and store values.
Sort the row sums in ascending order. Sort the column
sums in ascending order.
Hopefully, you should have a matrix with as close to an upper-right triangular region as possible.

Treat rows as binary numbers, with the leftmost column as the most significant bit, and sort them in descending order, top to bottom
Treat the columns as binary numbers with the bottommost row as the most significant bit and sort them in ascending order, left to right.
Repeat until you reach a fixed point. Proof that the algorithm terminates left as an excercise for the reader.

Related

Is there an algorithm that divide a square cell grid map into 3 or 4 equal spaces?

So for example I have divide my map into something like this:
click on link
the matrix representative would be
0 1 0 1 0
1 1 1 1 0
0 1 1 1 1
0 1 0 0 0
one of the way I could divide it into even-ish would be:
click to see
where total square is 11 and since 11/3 gives us a decimal, I need to have 2 space with 4 square and one space with 3 squares.
but I don't know an algorithm that will be able to divide a small map like that.
there is probably a code that will be able to solve that particular map, but what if it is like :
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 0 0
0 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 0 1
Each value is a square in the map and 1 is the square that should be considered. 0 is an empty/null space that is not part of the map and should not be taken into consideration when dividing the map.
So far I try a for loop adding all value and divide by 3 to determine how many squares is needed for each space. Also, if I get a decimal, then one space can have one more square than the other. So in this problem there is 36 squares so if I try to divide it into 3 space, then each space would have 12 squares.
So I am looking to see if there is an algorithm that will be able to solve all types of map.
This is actually NP-hard for k>=2, where you want k=3 or k=4:
theorem 2.2 in On the complexity of partitioning graphs into connected subgraphs - M.E. DYER, A.M. FRIEZE
You can get a decent answer by greedily removing nodes from your graph, and backtracking if you can't merge the remaining nodes.
It would help if you gave a more rigorous definition of 'even-ish' - for example, consider a map with 13 nodes - Would you rather have divisions of size (4,4,5), (3,3,3,4), (4,4,4,1), (5,5,3), or (4,4,3,2)?

How to solve 5 * 5 Cube in efficient easy way

There is 5*5 cube puzzle named Happy cube Problem where for given mat , need to make a cube .
http://www.mathematische-basteleien.de/cube_its.htm#top
Its like, 6 blue mats are given-
From the following mats, Need to derive a Cube -
These way it has 3 more solutions.
So like first cub
For such problem, the easiest approach I could imagine was Recursion based where for each cube, I have 6 position , and for each position I will try check all other mate and which fit, I will go again recursively to solve the same. Like finding all permutations of each of the cube and then find which fits the best.So Dynamic Programming approach.
But I am making loads of mistake in recursion , so is there any better easy approach which I can use to solve the same?
I made matrix out of each mat or diagram provided, then I rotated them in each 90 clock-wise 4 times and anticlock wise times . I flip the array and did the same, now for each of the above iteration I will have to repeat the step for other cube, so again recursion .
0 0 1 0 1
1 1 1 1 1
0 1 1 1 0
1 1 1 1 1
0 1 0 1 1
-------------
0 1 0 1 0
1 1 1 1 0
0 1 1 1 1
1 1 1 1 0
1 1 0 1 1
-------------
1 1 0 1 1
0 1 1 1 1
1 1 1 1 0
0 1 1 1 1
0 1 0 1 0
-------------
1 0 1 0 0
1 1 1 1 1
0 1 1 1 0
1 1 1 1 1
1 1 0 1 0
-------------
1st - block is the Diagram
2nd - rotate clock wise
3rd - rotate anti clockwise
4th - flip
Still struggling to sort out the logic .
I can't believe this, but I actually wrote a set of scripts back in 2009 to brute-force solutions to this exact problem, for the simple cube case. I just put the code on Github: https://github.com/niklasb/3d-puzzle
Unfortunately the documentation is in German because that's the only language my team understood, but source code comments are in English. In particular, check out the file puzzle_lib.rb.
The approach is indeed just a straightforward backtracking algorithm, which I think is the way to go. I can't really say it's easy though, as far as I remember the 3-d aspect is a bit challenging. I implemented one optimization: Find all symmetries beforehand and only try each unique orientation of a piece. The idea is that the more characteristic the pieces are, the less options for placing pieces exist, so we can prune early. In the case of many symmetries, there might be lots of possibilities and we want to inspect only the ones that are unique up to symmetry.
Basically the algorithm works as follows: First, assign a fixed order to the sides of the cube, let's number them 0 to 5 for example. Then execute the following algorithm:
def check_slots():
for each edge e:
if slot adjacent to e are filled:
if the 1-0 patterns of the piece edges (excluding the corners)
have XOR != 0:
return false
if the corners are not "consistent":
return false
return true
def backtrack(slot_idx, pieces_left):
if slot_idx == 6:
# finished, we found a solution, output it or whatever
return
for each piece in pieces_left:
for each orientation o of piece:
fill slot slot_idx with piece in orientation o
if check_slots():
backtrack(slot_idx + 1, pieces_left \ {piece})
empty slot slot_idx
The corner consistency is a bit tricky: Either the corner must be filled by exactly one of the adjacent pieces or it must be accessible from a yet unfilled slot, i.e. not cut off by the already assigned pieces.
Of course you can ignore to drop some or all of the consistency checks and only check in the end, seeing as there are only 8^6 * 6! possible configurations overall. If you have more than 6 pieces, it becomes more important to prune early.

How I can get the 'n' possible matrices from two vectors?

I've been searching for an algorithm for the solution of all possible matrices of dimension 'n' that can be obtained with two arrays, one of the sum of the rows, and another, of the sum of the columns of a matrix. For example, if I have the following matrix of dimension 7:
matriz= [ 1 0 0 1 1 1 0
1 0 1 0 1 0 0
0 0 1 0 1 0 0
1 0 0 1 1 0 1
0 1 1 0 1 0 1
1 1 1 0 0 0 1
0 0 1 0 1 0 1 ]
The sum of the columns are:
col= [4 2 5 2 6 1 4]
The sum of the rows are:
row = [4 3 2 4 4 4 3]
Now, I want to obtain all possible matrices of "ones and zeros" where the sum of the columns and the rows fulfil the condition of "col" and "row" respectively.
I would appreciate ideas that can help solve this problem.
One obvious way is to brute-force a solution: for the first row, generate all the possibilities that have the right sum, then for each of these, generate all the possibilities for the 2nd row, and so on. Once you have generated all the rows, you check if the sum of the columns is right. But this will take a lot of time. My math might be rusty at this time of the day, but I believe the number of distinct possibilities for a row of length n of which k bits are 1 is given by the binomial coefficient or nchoosek(n,k) in Matlab. To determine the total number of possibilities, you have to multiply this number for every row:
>> n = 7;
>> row= [4 3 2 4 4 4 3];
>> prod(arrayfun(#(k) nchoosek(n, k), row))
ans =
3.8604e+10
This is a lot of possibilities to check! Doing the same for the columns gives
>> col= [4 2 5 2 6 1 4];
>> prod(arrayfun(#(k) nchoosek(n, k), col))
ans =
555891525
Still a large number, but 'only' a factor 70 smaller.
It might be possible to improve this brute-force method a little bit by seeing if the later rows are already constrained by the previous rows. If in your example, for a particular combination of the first two rows, both rows have a 1 in the second column, the rest of this column should all be 0, since the sum must be 2. This reduces the number of possibilities for the remaining rows a bit. Implementing such checks might complicate things a bit, but they might make the difference between a calculation that takes 2 days or one that takes just 1 hour.
An optimized version of this might alternatively generate rows and columns, and start with those for which the number of possibilities is the lowest. I don't know if there is a more elegant solution than this brute-force method, I would be interested to hear one.

How to Shuffle an Array with Fixed Row/Column Sum?

I need to assign random papers to students of a class, but I have the constraints that:
Each student should have two papers assigned.
Each paper should be assigned to (approximately) the same number of students.
Is there an elegant way to generate a matrix that has this property? i.e. it is shuffled but the row and column sums are constant? As an illustration:
Student A 1 0 0 1 1 0 | 3
Student B 1 0 1 0 0 1 | 3
Student C 0 1 1 0 1 0 | 3
Student D 0 1 0 1 0 1 | 3
----------------
2 2 2 2 2 2
I thought of first building an "initial matrix" with the right row/column sum, then randomly permuting first the rows, then the colums, but how do I generate this initial matrix? The problem here is that I'd be choosing between (e.g.) the following alternatives, and the fact that there are two students with the same pair of papers assigned (in the left setup) won't change through row/column shuffling:
INITIAL (MA): OR (MB):
A 1 1 1 0 0 0 || 1 1 1 0 0 0
B 1 1 1 0 0 0 || 0 1 1 1 0 0
C 0 0 0 1 1 1 || 0 0 0 1 1 1
D 0 0 0 1 1 1 || 1 0 0 0 1 1
I know I could come up with something quick/dirty and just tweak where necessary but it seemed like a fun exercise.
If you want to make permutations, what about:
Chose randomly a student, say student 1
For this student, chose a random paper he has, say paper A
Chose randomly another student
For this student, chose a random paper he has, say paper B (different from A)
Give paper B to student 1 and paper A to student 2.
That way, you preserve both the number of different papers and the number of papers per student. Indeed, both students give one paper and receive one back. Moreover, no paper is created nor deleted.
In term of table, it means finding two pairs of indices(i1,i2) and (j1,j2) such that A(i1,j1) = 1, A(i2,j2)=1, A(i1,j2)=0 and A(i2,j1)=0 and changing the 0s for 1s and the 1s for 0s => The sums of the rows and columns do not change.
Remark 1: If you do not want to proceed by permutations, you can simply put in a vector all the paper (put 2 times paper A, 2 times paper B,...). Then, random shuffle the vector and attribute the k first to the first student, the k next ones to student 2, ... However, you can end with a student having several times the same paper. In this case, make some permutations starting with the surnumerary papers.
You can generate the initial matrix as follows (pseudo-Python syntax):
column_sum = [0] * n_students
for i in range(n_students):
if column_sum[i] < max_allowed:
for j in range(i + 1, n_students):
if column_sum[j] < max_allowed:
generate_row_with_ones_at(i, j)
column_sum[i] += 1
column_sum[j] += 1
if n_rows == n_wanted:
return
This is a straightforward iteration over all n choose 2 distinct rows, but with the constraint on column sums enforced as early as possible.

Enumerate graphs under edge and symmetry constraints

I would like to create the set of all directed graphs with n vertices where each vertex has k direct successors and k direct predecessors. n and k won't be that large, rather around n = 8 and k = 3. The set includes cyclic and acyclic graphs. Each graph in turn will serve as a template for sampling a large number of weighted graphs.
My interest is in the role of topology motifs so I don't want to sample weights for any two graphs that are symmetric to each other, where symmetry means that no permutation of vertices exists in one graph that transforms it into the other.
A naive solution would be to consider the 2 ^ (n * (n - 1)) adjacency matrices and eliminate all those (most of them) for which direct successor or predecessor constraints are violated. For n = 8, that's still few enough bits to represent and simply enumerate each matrix comfortably inside a uint64_t.
Keeping track of row counts and column counts would be another improvement, but the real bottleneck will be adding the graph to the result set, at which point we need to test for symmetry against each other graph that's already in the set. For n = 8 that would be already more than 40,000 permutations per insert operation.
Could anyone refer me to an algorithm that I could read up on that can do all this in a smarter way? Is there a graph library for C, C++, Java, or Python that already implements such a comprehensive graph generator? Is there a repository where someone has already "tabulated" all graphs for reasonable n and k?
Graph isomorphism is, in my opinion, not something you should be thinking about implementing yourself. I believe the current state-of-the-art is Brendan McKay's Nauty (and associated programs/libraries). It's a bit of a bear to work with, but it may be worth it to avoid doing your own, naive graph isomorphism. Also, it's primarily geared towards undirected graphs, but it can do digraphs as well. You may want to check out the geng (which generates undirected graphs) and directg (which generates digraphs given an underlying graph) utilities that come with Nauty.
This is more of a comment than an answer, because it seems like I have missed something in your question.
First of all, is it possible for such a graph to be acyclic?
I am also wondering about your symmetry constraint. Does this not make all such graphs symmetric to one another? Is it allowed to permute rows and columns of the connection-matrix?
For example, if we allow self-connections in the graph, does the following connection-matrix fulfill your conditions?
1 1 0 0 0 0 0 1
1 1 1 0 0 0 0 0
0 1 1 1 0 0 0 0
0 0 1 1 1 0 0 0
0 0 0 1 1 1 0 0
0 0 0 0 1 1 1 0
0 0 0 0 0 1 1 1
1 0 0 0 0 0 1 1
Starting from this matrix, is it then not possible to permute the rows and columns of it to obtain all such graphs where all rows and columns have a sum of three?
One example of such a matrix can be obtained from the above matrix A in the following way (using MATLAB).
>> A(randperm(8),randperm(8))
ans =
0 1 0 0 0 1 1 0
0 0 1 0 1 0 1 0
1 1 0 1 0 0 0 0
1 1 0 0 0 1 0 0
1 0 0 1 0 0 0 1
0 0 1 1 0 0 0 1
0 0 1 0 1 0 0 1
0 0 0 0 1 1 1 0
PS. In this case I have repeated the command a few times in order to obtain a matrix with only zeros in the diagonal. :)
Edit
Ah, I see from your comments that I was not correct. Of course the permutation index must be the same for rows and columns. I at least should have noticed it when I started out with a graph with self-connections and obtained one without them after the permutation.
A random isomorphic permutation would instead look like this:
idx = randperm(8);
A(idx,idx);
which will keep all the self-connections.
Perhaps this could be of some use when the matrices are generated, but it is not at all as useful as I thought it would be.

Resources