Find every possible permutation of bits in a 2D array that has a single group of contiguous 1s - algorithm

Below I have represented 2 permutations of bits in a 2D bit array (1s are red). The matrix on the left has a single group of contiguous 1s but the right matrix has 2.
I would like to loop through every possible permutation of binary values in such an array that has a single group of contiguous 1s. I am aware that for a 10×7 grid like above there are 2(10 × 7) permutations when you include non-contiguous permutations, but my hope is that by excluding non-contiguous permutations I will be able to go through them all in reasonable CPU time.
Speaking of reasonableness, I am also interested in an algorithm to determine how many permutations are contiguous.
My question is similar to, but different from, these:
2D Bit matrix with every possible combination
Finding Contiguous Areas of Bits in 2D Bit Array
Any help is appreciated. I'm a bit stuck.

So, I found that the OEIS (Online Encyclopedia of Integer Sequences) has a sequence from n = 0..7 for the "number of nonzero n X n binary arrays with all 1's connected" (A059525). They provide no formula though except for grids fixed at 1 cell wide (triangular numbers), or 2 or 3 cells wide. There's a sequence for 4 x n too but no formula.

Two approaches I can think of. One is to iterate through all possible sequences and devise a test for non-contiguous groups and some method for skipping over large regions guaranteed to be non-contiguous.
A second approach is to attempt to build all sets of contiguous groups so you don't need to test. This is the approach I would take:
Let n = width * height
Enumerate blocks left to right, top to bottom from 0 to n - 1
Fix a block at position 0.
Generate all contiguous solutions between 1 and n blocks extending from position 0
Omit position 0 and fix a block at position 1
Find all contiguous solutions between 1 and n - 1 blocks extending from position 1
Continue until you've reached position n
You can place your pieces according to the following rules, backtracking for the next placement at each depth:
To left of most recently placed piece if placed in row above prior piece provided that no other neighbors exist for that vacancy.
Above left-most available piece in row of most recently placed piece if no other neighbors exist for that vacancy.
To right of most recently placed piece (if adjacent piece exists)
On the next row, farthest left vacancy such that upper row has a piece above any contiguous right remaining neighbors
Next move for any backtracked position is first available move to the right of, or in the row below, the backtracked position (obeying prior 4 rules)

Related

Algorithm challenge to merge sets

Given n sets of numbers. Each set contains some numbers from 1 to 100. How to select sets to merge into the longest set under a special rule, only two non-overlapping sets can merge. [1,2,3] can merge with [4,5] but not [3,4]. What will be an efficient algorithm to merge into the longest set.
My first attempt is to form an n by n matrix. Each row/column represents a set. Entry(i,j) equals to 1 if two sets overlap, entry(i,i) stores the length of set i. Then the questions becomes can we perform row and column operations at the same time to create a diagonal sub-matrix on top left corner whose trace is as large as possible.
However, I got stuck in how to efficiently perform row and column operations to form such a diagonal sub-matrix on top left corner.
As already pointed out in the comments (maximum coverage problem) you have a NP-hart problem. Luckily, matlab offers solvers for integer linear programming.
So we try to reduce the problem to the form:
min f*x subject to Ax<=b , 0<=x
There are n sets, we can encode a set as a vector of 0s and 1s. For example (1,1,1,0,0,...) would represent {1,2,3} and (0,0,1,1,0,0...) - {3,4}.
Every column of A represents a set. A(i,j)=1 means that the i-th element is in the j-th set, A(i,j)=0 means that the i-th element is not in the j-th set.
Now, x represents the sets we select: if x_j=1 than the set j is selected, if x_j=0 - than not selected!
As every element must be at most in one set, we choose b=(1, 1, 1, ..., 1): If we take two sets which contain the i-th element, than the i-th element of (Ax) would be at least 2.
The only question left is what is f? We try to maximize the number of elements in the union, so we choose f_j=-|set_j| (minus owing to min<->max conversion), with |set_j| - number of elements in the j-th set.
Putting it all in matlab we get:
f=-sum(A)
xopt=intlinprog(f.',1:n,A,ones(m,1),[],[],zeros(n,1),ones(n,1))
f.' - cost function as column
1:n - all n elements of x are integers
A - encodes n sets
ones(m,1) - b=(1,1,1...), there are m=100 elements
[],[] - there are no constrains of the form Aeq*x=beq
zeros(n,1), 0<=x must hold
ones(n,1), x<=1 follows already from others constrains, but maybe it will help the solver a little bit
You can represent sets as bit fields. A bitwise and operation yielding zero would indicate non-overlapping sets. Depending on the width of the underlying data type, you may need to perform multiple and operations. For example, with a 64 bit machine word size, I'd need two words to cover 1 to 100 as a bit field.

Heuristics for this (probably) NP-complete puzzle game

I asked whether this problem was NP-complete on the Computer Science forum, but asking for programming heuristics seems better suited for this site. So here it goes.
You are given an NxN grid of unit squares and 2N binary strings of length N. The goal is to fill the grid with 0's and 1's so that each string appears once and only once in the grid, either horizontally (left to right) or vertically (top down). Or determine that no such solution exists. If N is not fixed I suspect this is an NP-complete problem. However are there any heuristics that can hopefully speed up the search to faster than brute force trying all ways to fill in the grid with N vertical strings?
I remember programming this for my friend that had the 5x5 physical version of this game, but I used brute force back then. I can only think of this heuristic:
Consider a 4x4 map with these 8 strings (read each from left to right):
1 1 0 1
1 0 0 1
1 0 1 1
1 0 1 0
1 1 1 1
1 0 0 0
0 0 1 1
1 1 1 0
(Note that this is already solved, since the second 4 is the first 4 transposed)
First attempt:
We will choose columns from left to right. Since 7 of 8 strings start with 1, we will try to put the one with most 1s to the first column (so that we can lay rows more easily when columns are done).
In the second column, most string have 0, so you can also try putting a string with most zeros to the second row, and so on.
This i would call a wide-1 prediction, since it only looks at one column at a time
(Possible) Improvement:
You can look at 2 columns at a time (a wide-2 prediction, if i may call it like that). In this case, from the 8 strings, the most common combination of first two bits is 10 (5/8), so you would like to choose first two columns so the the combination 10 occurring as much as possible (in this case, 1111 followed by 1000 has 3 of 4 10 at start).
(Of course you don't have to stop at 2)
Weaknesses:
I don't know if this would work. I just made it up and thought it might work.
If you choose to he wide-X prediction, the number of possibilities is exponential with X
This can absolutely fail if the distribution of combinations if even.
What you can do:
As i said, this game has physical 5x5 adaptation, only there you can also lay the string from right-to-left and bottom-to-top, if you found that name, you could google further. I unfortunately don't remember it.
Sounds like you want the crossword grid filling algorithm:
First, build 2N subsets of your 2N strings -- each subset has all the strings with a particular bit at a particular postion. So subset(0,3) is all the strings that have a 0 in the 3rd position and subset(1,5) is all the strings that have a 1 in the 5th position.
The algorithm is a basic brute-force depth fist search trying all possible mappings of strings to slots in the grid, with severe pruning of impossible branches
Your search state is a set of assignments of strings to slots and a set of sets of possible assignments to the remaining slots. The initial state has 0 assignments and 2N sets, all of which contain all 2N strings.
At each step of the search, pick the most constrained set (the set with the fewest elements) from the set of possible sets. Try each element of the set in turn in that slot (adding it to the assigments and removing it from the set of sets), and constrain all the remaining sets of sets by removing the chosen string and intersecting the crossing sets with subset(X,N) (computed in step 1) where X is the bit from the chosen string and N is the row/column number of the chosen string
If you find an empty set when picking above, there is no solution with the choices so far, so backtrack up the tree to a different choice
This is still EXPTIME, but it is about as fast as you can get it. Since the main time consuming step is the set intersections, using 2N bit binary strings for your set representation is very fast -- for N=32, the sets fit in a 64-bit word and can be intersected with a single AND instruction. It also helps to have a POPCOUNT instruction, since you also need set sizes.
This can be solved as a 0/1 integer linear program with O(N^2) variables and constraints. First there are variables Xij which are 1 if string i is assigned to line j (where j=1 to N are rows and j = (N+1) to 2N are columns). Then there is a variable for each square in the grid, which indicates if the entry is 0 or 1. If the position of the square is (i,j) with variable Yij then the sum of all X variables for line j that correspond to strings that have a 1 in position i is equal to Yij, and the sum of all X variables for line j that correspond to strings that have a 0 in position i is equal to (1 - Yij). And similarly for line i and position j. Finally, the sum of all X variables Xij for each string i (summed over all lines j) is equal to 1.
There has been a lot of research in speeding up solvers for 0/1 integer programming so this may be able to often handle fairly large N (like N=100) for many examples. Also, in some cases, solving the relaxed non-integer linear program and rounding the solution off to 0/1 may produce a valid solution, in polynomial time.
We could choose the first lg 2N rows out of the 2N strings, and then since 2^(lg 2N) = 2N, in a lot of cases there shouldn't be very many ways to assign the N columns so that the prefixes of length lg 2N are respected. Then all the rows are filled in so they can be checked to see if a solution has been found. We can also try assigning more rows in the beginning, and fill in different combinations of rows besides the initial rows. (e.g. we can try filling in contiguous rows starting anywhere in the grid).
Running time for assigning lg 2N rows out of 2N strings is O((2N)^(lg 2N)) = O(2^((lg 2N)^2)), which grows slower than 2^N. Assigning columns to match the prefixes is the part that's the hardest to predict run time. If a prefix occurs K times among the assigned rows, and there are M remaining strings that have the prefix, then the number of assignments for this prefix is M*(M-1)...(M-K+1). The total number of possible column assignments is the product of these terms over all prefixes that occur among the rows. If this gets to be too large, the number of rows initially assigned can be increased. But it's hard to predict the worst-case run time unless an assumption is made like the NxN grid is filled in randomly.

Hungarian algorithm matching one set to itself

I'm looking for a variation on the Hungarian algorithm (I think) that will pair N people to themselves, excluding self-pairs and reverse-pairs, where N is even.
E.g. given N0 - N6 and a matrix C of costs for each pair, how can I obtain the set of 3 lowest-cost pairs?
C = [ [ - 5 6 1 9 4 ]
[ 5 - 4 8 6 2 ]
[ 6 4 - 3 7 6 ]
[ 1 8 3 - 8 9 ]
[ 9 6 7 8 - 5 ]
[ 4 2 6 9 5 - ] ]
In this example, the resulting pairs would be:
N0, N3
N1, N4
N2, N5
Having typed this out I'm now wondering if I can just increase the cost values in the "bottom half" of the matrix... or even better, remove them.
Is there a variation of Hungarian that works on a non-square matrix?
Or, is there another algorithm that solves this variation of the problem?
Increasing the values of the bottom half can result in an incorrect solution. You can see this as the corner coordinates (in your example coordinates 0,1 and 5,6) of the upper half will always be considered to be in the minimum X pairs, where X is the size of the matrix.
My Solution for finding the minimum X pairs
Take the standard Hungarian algorithm
You can set the diagonal to a value greater than the sum of the elements in the unaltered matrix — this step may allow you to speed up your implementation, depending on how your implementation handles nulls.
1) The first step of the standard algorithm is to go through each row, and then each column, reducing each row and column individually such that the minimum of each row and column is zero. This is unchanged.
The general principle of this solution, is to mirror every subsequent step of the original algorithm around the diagonal.
2) The next step of the algorithm is to select rows and columns so that every zero is included within the selection, using the minimum number of rows and columns.
My alteration to the algorithm means that when selecting a row/column, also select the column/row mirrored around that diagonal, but count it as one row or column selection for all purposes, including counting the diagonal (which will be the intersection of these mirrored row/column selection pairs) as only being selected once.
3) The next step is to check if you have the right solution — which in the standard algorithm means checking if the number of rows and columns selected is equal to the size of the matrix — in your example if six rows and columns have been selected.
For this variation however, when calculating when to end the algorithm treat each row/column mirrored pair of selections as a single row or column selection. If you have the right solution then end the algorithm here.
4) If the number of rows and columns is less than the size of the matrix, then find the smallest unselected element, and call it k. Subtract k from all uncovered elements, and add it to all elements that are covered twice (again, counting the mirrored row/column selection as a single selection).
My alteration of the algorithm means that when altering values, you will alter their mirrored values identically (this should happen naturally as a result of the mirrored selection process).
Then go back to step 2 and repeat steps 2-4 until step 3 indicates the algorithm is finished.
This will result in pairs of mirrored answers (which are the coordinates — to get the value of these coordinates refer back to the original matrix) — you can safely delete half of each pair arbitrarily.
To alter this algorithm to find the minimum R pairs, where R is less than the size of the matrix, reduce the stopping point in step 3 to R. This alteration is essential to answering your question.
As #Niklas B, stated you are solving Weighted perfect matching problem
take a look at this
here is part of document describing Primal-dual algorithm for weighted perfect matching
Please read all and let me know if is useful to you

Labelling a grid using n labels, where every label neighbours every other label

I am trying to create a grid with n separate labels, where each cell is labelled with one of the n labels such that all labels neighbour (edge-wise) all other labels somewhere in the grid (I don't care where). Labels are free to appear as many times as necessary, and I'd like the grid to be as small as possible. As an example, here's a grid for five labels, 1 to 5:
3 2 4
5 1 3
2 4 5
While generating this by hand is not too bad for small numbers of labels, it appears to be very hard to generate a grid of reasonable size for larger numbers and so I'm looking to write a program to generate them, without having to resort to a brute-force search. I imagine this must have been investigated before, but the closest I've found are De Bruijn tori, which are not quite what I'm looking for. Any help would be appreciated.
EDIT: Thanks to Benawii for the following improved description:
"Given an integer n, generate the smallest possible matrix where for every pair (x,y) where x≠y and x,y ∈ {1,...,n} there exists a pair of adjacent cells in the matrix whose values are x and y."
You can experiment with a simple greedy algorithm.
I don't think that I'm able to give you a strict mathematical prove, at least because the question is not strictly defined, but the algorithm is quite intuitive.
First, if you have 1...K numbers (K labels) then you need at least K*(K-1)/2 adjacent cells (connections) for full coverage. A matrix of size NxM generates (N-1)*M+(M-1)*N=2*N*M-(N+M) connections.
Since you didn't mention what you understand under 'smallest matrix', let's assume that you meant the area. In that case it is obvious that for the given area the square matrix will generate bigger number of connections because it has more 'inner' cells adjacent to 4 others. For example, for area 16 the matrix 4x4 is better than 2x8. 'Better' is intuitive - more connections and more chances to reach the goal. So lets use target square matrixes and expand them if needed. The above formula will become 2*N*(N-1).
Then we can experiment with the following greedy algorithm:
For input number K find the N such that 2*N*(N-1)>K*(K-1)/2. A simple school equation.
Keep an adjacency matrix M, set M[i][i]=1 for all i, and 0 for the rest of the pairs.
Initialize a resulting matrix R of size NxN, fill with 'empty value' markers, for example with -1.
Start from top-left corner and iterate right-down:
for (int i = 0; i < N; ++i)
for (int j = 0; j < N; ++j)
R[i][j];
for each such R[i][j] (which is -1 now) find such a value which will 'fit best'. Again, 'fit best' is an intuitive definition, here we understand such a value that will contribute to a new unused connection. For that reason create the set of already filled cell neighbor numbers - S, its size is 2 at most (upper and left neighbor). Then find first k such that M[x][k]=0 for both numbers x in S. If no such number then try at least one new connection, if no number even for one then both neighbors are completely covered, put some number from uncovered here - probably the one in 'worst situation' - such x where Sum(M[x][i]) is the smallest. You should also choose the one in 'worst situation' when there are several ones to choose from in any case.
After setting the value for R[i][j] don't forget to mark the new connections with numbers x from S - M[R[i][j]][x] = M[x][R[i][j]] = 1.
If the matrix is filled and there are still unmarked connections in M then append another row to the matrix and continue. If all the connections are found before the end then remove extra rows.
You can check this algorithm and see what will happen. Step 5 is the place for playing around, particularly in guessing which one to choose in equal situation (several numbers can be in equally 'worst situation').
Example:
for K=6 we need 15 connections:
N=4, we need 4x4 square matrix. The theory says that 4x3 matrix has 17 connections, so it can possibly fit, but we will try 4x4.
Here is the output of the algorithm above:
1234
5615
2413
36**
I'm not sure if you can do by 4x3, maybe yes... :)

How to find contiguous areas in matrix with less memory?

Suppose I have a matrix of 0 and 1. Now I would like to count all contiguous areas in the matrix that contain only 1.
allocate a new matrix "visited" to track visited elements
COUNT = 0
for each element E in the matrix
if E == 1 and E is not visited
COUNT = COUNT + 1
run BFS/DFS to visit all not-visited matrix elements connected to E
return COUNT
If the matrix is N x N then we need additional N x N "visited" matrix and also a queue/stack for BFS/DFS. Now I wonder if there is an algorithm, which solves this problem and requires less memory.
If memory is so critical, then you can use the following approach:
Additionally keep only an array with the length of matrix's one row, then scan matrix rows from up to bottom, and each time when you process current row, there are intervals of continuous 1's in it, which can merge different pieces located above them, for example
1110001110011
1111001110011
0011111000011
if you consider only first 2 rows, there are three different pieces, but 3rd row merges first two of them. So you can use the additional array to keep what different pieces are there until last row, i.e. initially it will be filled with 0's, and when you are just starting scanning 3rd row of matrix, that array should look like 1111002220033, i.e. each number shows to which part that cell belongs. After processing 3rd row they will be merged and additional array will become 0011111000033. So after scanning each row some parts are merged, and some parts disappear. Each time a part disappears, you can increment the counter to have the answer in the end.
However, this is more tricky and complex approach than BFS/DFS, and I don't think that you will need to use this anyway in practice.

Resources