Algorithm to identify a unique free polyomino (or polyomino hash)

In short: How to hash a free polyomino?
This could be generalized into: How to efficiently hash an arbitrary collection of 2D integer coordinates, where a set contains unique pairs of non-negative integers, and a set is considered unique if and only if no translation, rotation, or flip can map it identically to another set?
For impatient readers, please note I'm fully aware of a brute force approach. I'm looking for a better way -- or a very convincing proof that no other way can exist.
I'm working on some different algorithms to generate random polyominos. I want to test their output to determine how random they are -- i.e. are certain instances of a given order generated more frequently than others. Visually, it is very easy to identify different orientations of a free polyomino, for example the following Wikipedia illustration shows all 8 orientations of the "F" pentomino (Source):
How would one put a number on this polyomino - that is, hash a free polyomino? I don't want to depend on a prepopulated list of "named" polyominos. Broadly agreed-upon names only exist for orders 4 and 5, anyway.
This is not necessarily equivalent to enumerating all free (or one-sided, or fixed) polyominos of a given order. I only want to count the number of times a given configuration appears. If a generating algorithm never produces a certain polyomino it will simply not be counted.
The basic logic of the counting is:
testcount = 10000   // Arbitrary
order = 6           // Create hexominos in this test
hashcounts = new hashtable
for i = 1 to testcount
    poly = GenerateRandomPolyomino(order)
    hash = PolyHash(poly)
    if hashcounts.contains(hash) then
        hashcounts[hash]++
    else
        hashcounts[hash] = 1
What I'm looking for is an efficient PolyHash algorithm. The input polyominos are simply defined as a set of coordinates. One orientation of the T tetromino could be, for example:
[[1,0], [0,1], [1,1], [2,1]]:
|012
-+---
0| X
1|XXX
You can assume that the input polyomino will already be normalized to be aligned against the X and Y axes and have only positive coordinates. Formally, each set:
Will have at least 1 coordinate where the x value is 0
Will have at least 1 coordinate where the y value is 0
Will not have any coordinates where x < 0 or y < 0
I'm really looking for novel algorithms that avoid the increasing number of integer operations required by a general brute force approach, described below.
Brute force
A brute force solution suggested here and here consists of hashing each set as an unsigned integer using each coordinate as a binary flag, and taking the minimum hash of all possible rotations (and in my case flips), where each rotation / flip must also be translated to the origin. This results in a total of 23 set operations for each input set to get the "free" hash:
Rotate (6x)
Flip (1x)
Translate (7x)
Hash (8x)
Find minimum of computed hashes (1x)
Where the sequence of operations to obtain each hash is:
Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
Flip, Translate, Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
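Putting those steps together, here is a minimal Python sketch of the brute-force canonical hash (the helper names are mine; the input is assumed to be a list of (x, y) cells):

def normalize(cells):
    # Translate so the minimum x and minimum y are both 0.
    min_x = min(x for x, y in cells)
    min_y = min(y for x, y in cells)
    return [(x - min_x, y - min_y) for x, y in cells]

def cell_hash(cells):
    # Use the order (cell count) as the bit stride so two different shapes
    # of the same order can never produce the same integer.
    n = len(cells)
    return sum(1 << (y * n + x) for x, y in cells)

def free_hash(cells):
    # Minimum hash over the 4 rotations of both mirror images (8 orientations).
    best = None
    for shape in (cells, [(-x, y) for x, y in cells]):   # identity and flip
        for _ in range(4):
            shape = [(-y, x) for x, y in shape]          # rotate 90 degrees
            h = cell_hash(normalize(shape))
            best = h if best is None else min(best, h)
    return best

For the T tetromino from the question, free_hash([(1,0), (0,1), (1,1), (2,1)]) returns the same value for any rotation or flip of that shape.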

Well, I came up with a completely different approach. (Also thanks to corsiKa for some helpful insights!) Rather than hashing / encoding the squares, encode the path around them. The path consists of a sequence of 'turns' (including no turn) to perform before drawing each unit segment. I think an algorithm for getting the path from the coordinates of the squares is outside the scope of this question.
This does something very important: it destroys all location and orientation information, which we don't need. It is also very easy to get the path of the flipped object: you do so by simply reversing the order of the elements. Storage is compact because each element requires only 2 bits.
It does introduce one additional constraint: the polyomino must not have fully enclosed holes. (Formally, it must be simply connected.) Most discussions of polyominos consider a hole to exist even if it is sealed only by two touching corners, as this prevents tiling with any other non-trivial polyomino. Tracing the edges is not hindered by touching corners (as in the single heptomino with a hole), but it cannot leap from one outer loop to an inner one as in the complete ring-shaped octomino:
It also produces one additional challenge: finding the minimum ordering of the encoded path loop. This is because any rotation of the path (in the sense of string rotation) is a valid encoding. To always get the same encoding we have to find the minimal (or maximal) rotation of the path instructions. Thankfully this problem has already been solved: see for example http://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation.
Example:
If we arbitrarily assign the following values to the move operations:
No turn: 1
Turn right: 2
Turn left: 3
Here is the F pentomino traced clockwise:
An arbitrary initial encoding for the F pentomino is (starting at the bottom right corner):
2,2,3,1,2,2,3,2,2,3,2,1
The resulting minimum rotation of the encoding is
1,2,2,3,1,2,2,3,2,2,3,2
With 12 elements, this loop can be packed into 24 bits if two bits are used per instruction, or only 19 bits if the instructions are encoded as powers of three. Even with the 2-bit element encoding it easily fits in a single unsigned 32-bit integer, 0x006B6BAE:
1- 2- 2- 3- 1- 2- 2- 3- 2- 2- 3- 2
= 01-10-10-11-01-10-10-11-10-10-11-10
= 00000000011010110110101110101110
= 0x006B6BAE
The base-3 encoding with the start of the loop in the most significant powers of 3 is 0x5795F:
1*3^11 + 2*3^10 + 2*3^9 + 3*3^8 + 1*3^7 + 2*3^6
+ 2*3^5 + 3*3^4 + 2*3^3 + 2*3^2 + 3*3^1 + 2*3^0
= 0x0005795F
The maximum number of vertexes in the path around a polyomino of order n is 2n + 2. For 2-bit encoding the number of bits is twice the number of moves, so the maximum bits needed is 4n + 4. For base-3 encoding it's

ceil((2n + 2) * log2(3))

where ceil is the ceiling function (the "gallows" brackets). Accordingly any polyomino up to order 9 can be encoded in a single 32 bit integer. Knowing this you can choose your platform-specific data structure accordingly for the fastest hash comparison given the maximum order of the polyominos you'll be hashing.

You can reduce it down to 8 hash operations without the need to flip, rotate, or re-translate.
Note that this algorithm assumes you are operating with coordinates relative to the polyomino itself, i.e. it has already been normalized as described in the question.
Instead of applying operations that flip, rotate, and translate, simply change the order in which you hash.
For instance, let us take the F pent above. In the simple example, let us presume the hash operation was something like this:
int hashPolySingle(Poly p)
    int hash = 0
    for x = 0 to p.width
        for y = 0 to p.height
            hash = hash * 31 + p.contains(x,y) ? 1 : 0
    hashPolySingle = hash

int hashPoly(Poly p)
    int hash = hashPolySingle(p)
    p.rotateClockwise() // assume it translates inside
    hash = hash * 31 + hashPolySingle(p)
    // keep rotating for all 4 orientations
    p.flip()
    // hash those 4
Instead of applying the function to all 8 different orientations of the poly, I would apply 8 different hash functions to 1 poly.
int hashPolySingle(Poly p, bool flip, int corner)
    int hash = 0
    int xstart, xstop, ystart, ystop
    bool yfirst
    switch(corner)
        case 1: xstart = 0
                xstop = p.width
                ystart = 0
                ystop = p.height
                yfirst = false
                break
        case 2: xstart = p.width
                xstop = 0
                ystart = 0
                ystop = p.height
                yfirst = true
                break
        case 3: xstart = p.width
                xstop = 0
                ystart = p.height
                ystop = 0
                yfirst = false
                break
        case 4: xstart = 0
                xstop = p.width
                ystart = p.height
                ystop = 0
                yfirst = true
                break
        default: error()
    if(flip) swap(xstart, xstop)
    if(flip) swap(ystart, ystop)
    if(yfirst)
        for y = ystart to ystop
            for x = xstart to xstop
                hash = hash * 31 + p.contains(x,y) ? 1 : 0
    else
        for x = xstart to xstop
            for y = ystart to ystop
                hash = hash * 31 + p.contains(x,y) ? 1 : 0
    hashPolySingle = hash
Which is then called in the 8 different ways. You could also wrap hashPolySingle in a loop over the four corners and another over the flip flag; the result is the same.
int hashPoly(Poly p)
    // approach from each of the 4 corners
    int hash = hashPolySingle(p, false, 1)
    hash = hash * 31 + hashPolySingle(p, false, 2)
    hash = hash * 31 + hashPolySingle(p, false, 3)
    hash = hash * 31 + hashPolySingle(p, false, 4)
    // flip it
    hash = hash * 31 + hashPolySingle(p, true, 1)
    hash = hash * 31 + hashPolySingle(p, true, 2)
    hash = hash * 31 + hashPolySingle(p, true, 3)
    hash = hash * 31 + hashPolySingle(p, true, 4)
    hashPoly = hash
In this way, you're implicitly rotating the poly from each direction, but you're not actually performing the rotation and translation. It performs the 8 hashes, which seem to be entirely necessary in order to accurately hash all 8 orientations, but wastes no passes over the poly that are not actually doing hashes. This seems to me to be the most elegant solution.
Note that there may be a better hashPolySingle() algorithm to use. Mine uses a Cartesian exhaustion algorithm that is on the order of O(n^2). Its worst case scenario is an L shape, which would cause there to be an N/2 * (N-1)/2 sized square for only N elements, or an efficiency of 1:(N-1)/4, compared to an I shape which would be 1:1. It may also be that the inherent invariant imposed by the architecture would actually make it less efficient than the naive algorithm.
My suspicion is that the above concern can be alleviated by simulating the Cartesian exhaustion: convert the set of nodes into a bidirectional graph that can be traversed, so that the nodes are hit in the same order as in my much more naive hashing algorithm while ignoring the empty spaces. This would bring the algorithm down to O(n), as the graph should be constructible in O(n) time. Because I haven't done this, I can't say for sure, which is why I call it only a suspicion, but there should be a way to do it.

Here's my DFS (depth first search) explained:
Start with the top-most cell (left-most as a tiebreaker). Mark it as visited. Every time you visit a cell, check all four directions for unvisited neighbors. Always check the four directions in this order: up, left, down, right.
Example
In this example, up and left fail, but down succeeds. So far our output is 001, and we recursively search the "down" cell.
We mark our new current cell as visited (and we'll finish searching the original cell when we finish searching this cell). Here, up=0, left=1.
We search the left-most cell and there are no unvisited neighbors (up=0, left=0, down=0, right=0). Our total output so far is 001010000.
We continue our search of the second cell. down=0, right=1. We search the cell to the right.
up=0, left=0, down=1. Search the down cell: all 0s. Total output so far is 001010000010010000. Then, we return from the down cell...
right=0, return. return. (Now, we are at the starting cell.) right=0. Done!
So, the total output is 20 (N*4) bits: 00101000001001000000.
Encoding improvement
But, we can save some bits.
The last visited cell will always encode 0000 for its four directions. So, don't encode the last visited cell to save 4 bits.
Another improvement: if you reached a cell by moving left, don't check that cell's right side. So, we only need 3 bits per cell, except 4 bits for the first cell, and 0 for the last cell.
The first cell will never have an up, or left neighbor, so omit these bits. So the first cell takes 2 bits.
So, with these improvements, we use only N*3-4 bits (e.g. 5 cells -> 11 bits; 9 cells -> 23 bits).
If you really want, you can compact a little more by noting that exactly N-1 bits will be "1".
Caveat
Yes, you'll need to encode all 8 rotations/flips of the polyomino and choose the least to get a canonical encoding.
I suspect this will still be faster than the outline approach. Also, holes in the polyomino shouldn't be a problem.
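A rough Python sketch of the basic 4-bits-per-cell encoding described above (without the bit-saving improvements); cells is assumed to be a set of (x, y) tuples with y growing downward:

def dfs_encode(cells):
    start = min(cells, key=lambda c: (c[1], c[0]))   # top-most, then left-most
    visited = {start}
    bits = []

    def visit(cell):
        x, y = cell
        # check neighbours in the fixed order: up, left, down, right
        for nx, ny in ((x, y - 1), (x - 1, y), (x, y + 1), (x + 1, y)):
            if (nx, ny) in cells and (nx, ny) not in visited:
                bits.append(1)
                visited.add((nx, ny))
                visit((nx, ny))
            else:
                bits.append(0)

    visit(start)
    return bits

Applying this to each of the 8 rotations/flips and taking the smallest bit string then gives the canonical encoding, per the caveat above.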

I worked on the same problem recently. I solved the problem fairly simply by
(1) generate a unique ID for a polyomino, such that each identical poly would have the same UID. For example, find the bounding box, normalize the corner of the bounding box, and collect the set of non-empty cells.
(2) generate all possible permutations by rotating (and flipping, if appropriate) a polyomino, and look for duplicates.
The advantage of this brute-force approach, other than its simplicity, is that it still works if the polys are distinguishable in some other way, for example if some of them are colored or numbered.

You can set up something like a trie to uniquely identify (and not just hash) your polyomino. Take your normalized polyomino and set up a binary search tree, where the root branches on whether (0,0) has a set pixel, the next level branches on whether (0,1) has a set pixel, and so on. When you look up a polyomino, simply normalize it and then walk the tree. If you find it in the trie, then you're done. If not, assign that polyomino a unique id (just increment a counter), generate all 8 possible rotations and flips, then add those 8 to the trie.
On a trie miss, you'll have to generate all the rotations and reflections. But on a trie hit it should cost less (O(k^2) for k-polyominos).
To make lookups even more efficient, you could use a couple bits at a time and use a wider tree instead of a binary tree.
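A minimal Python sketch of that lookup flow; a flat dictionary stands in for the trie here, and all_orientations is a placeholder assumed to yield the 8 normalized rotations/flips (including the input itself):

class PolyRegistry:
    def __init__(self):
        self.ids = {}
        self.next_id = 0

    def key(self, cells):
        # Bounding box plus the cell bits in row-major order.
        w = max(x for x, y in cells) + 1
        h = max(y for x, y in cells) + 1
        return (w, h, tuple((x, y) in cells for y in range(h) for x in range(w)))

    def lookup(self, cells, all_orientations):
        k = self.key(cells)
        if k not in self.ids:                    # miss: register all 8 forms
            self.next_id += 1
            for shape in all_orientations(cells):
                self.ids[self.key(shape)] = self.next_id
        return self.ids[k]                       # hit: one key build + dict get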

A valid hash function, if you're really afraid of hash collisions, is to make a hash function x + order * y for coordinates and then loop through all the coordinates of a piece, adding (order ^ i) * hash(coord[i]) to the piece hash. That way, you can guarantee you won't get any hash collisions.

Related

Pairwise matching of tiles

Recently in a coding competition I came across this question.
We have 1000 tiles where each tile is a 3x3 matrix. Each cell in the matrix has an integer value from 0 to 9 which signifies the elevation of the cell. The problem was to find the maximum number of pairs of tiles such that they fit together perfectly. The tiles may be rotated to fit. By fit it means that for tile A and tile B
A[i] + B[i] = const for i = 0 to 8
The approach I thought of for this problem was to maintain a hash value corresponding to each tile. Then I would find the possible combinations of tiles that would be a fit and look them up in the hashtable.
Ex. For the tile below:

5 3 2                 4 6 7                       5 7 8
4 8 9   matches with  5 1 0   for const = 9, and  6 2 1   for const = 10
1 4 5                 8 5 4                       9 6 5

For this tile the 'const' would range from 9 (adding 0 to the maximum element) to 10 (adding 9 to the minimum element).
So I would get two possible combinations for tiles which I would look up in the table.
But this method is greedy and does not give the desired answer, and I was also unable to think of a proper hash function that would account for all possible rotations.
So what would be a good approach for solving this problem?
I am sure there is a brute force way to solve this problem but I was actually wondering whether a viable solution exists along the lines of the "pairwise equal to k" problem.
For n=1000 I would stick with the O(n^2) brute force solution. However an O(n log n) algorithm is described below.
The lexicographicalish ordering is defined by the following less-than operator:
Given two matrices M1, M2, define M1' as M1 if M1[1] is positive and -M1 if M1[1] is negative, and likewise for M2'. We say that M1<M2 if M1'[1]<M2'[1], or if M1'[1] == M2'[1] and M1'[2] < M2'[2], or if M1'[1] == M2'[1] and M1'[2] == M2'[2] and M1'[3] < M2'[3] etc.
Subtract the middle element of each matrix from the rest of the elements of the matrix i.e. A'[5] = A[5] and A'[i] = A[i] - A[5]. Then A' fits with B' if A'[i] +B'[i] = 0 for i!=5, and the elevation is A'[5] + B'[5].
Create an array of matrices and a dictionary. Rotate each matrix so that the top left corner has minimal absolute value before adding it to the array. If there are multiple corners with the same absolute value then duplicate the matrix and store both rotations in the array.
If some rotation of a matrix fits with itself and i,j are indices of rotations of this matrix, add the key-value pairs (i,j) and (j, i) to the dictionary.
Create an array S of indices 1,2... and sort S using the lexicographicalish ordering.
Instead of needing O(n^2) operations to check all possible pairs of matrices, it is only necessary to check all pairs of matrices with indices S_i and S_(i+1). If a pair of matrices fits, use the dictionary to check that the two matrices are not rotations of the same original matrix before calculating the elevation of the pair.
Not sure if this is the most efficient way for doing this, but it sure works.
What I would do is:
Go over all tiles and check the maximum and minimum value of each tile and save it in a different array.
Check all possible pairs.
If min(A) + max(B) == min(B) + max(A) then check if some rotation of B fits perfectly on A. If it does, add 1 to your count.
Else, it does not fit so you can skip the checking for this pair.
Note: The reason for saving both the maximum and minimum of each tile is that it can save us unnecessary calculations and rotation checks, since in O(1) we can tell that a pair cannot fit.
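For illustration, a rough Python sketch of this pre-filter plus rotation check (3x3 tiles as lists of lists; the names are mine, and in practice the min/max of each tile would be precomputed once as suggested above):

def rotate(t):
    # 90-degree clockwise rotation of a 3x3 tile.
    return [[t[2 - c][r] for c in range(3)] for r in range(3)]

def fits(a, b):
    # Does some rotation of b complement a to a constant elevation?
    for _ in range(4):
        target = a[0][0] + b[0][0]
        if all(a[r][c] + b[r][c] == target for r in range(3) for c in range(3)):
            return True
        b = rotate(b)
    return False

def count_fitting_pairs(tiles):
    count = 0
    for i in range(len(tiles)):
        for j in range(i + 1, len(tiles)):
            a, b = tiles[i], tiles[j]
            if min(map(min, a)) + max(map(max, b)) == min(map(min, b)) + max(map(max, a)):
                if fits(a, b):
                    count += 1
    return count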

Solving Rubik's Cubes for Dummies

Mr. Dum: Hello, I'm very stupid but I still want to solve a 3x3x3 Rubik's cube.
Mr. Smart: Well, you're in luck. Here is guidance to do just that!
Mr. Dum: No that won't work for me because I'm Dum. I'm only capable of following an algorithm like this.
pick up cube
look up a list of moves from some smart person
while(cube is not solved)
perform the next move from list and turn
the cube as instructed. If there are no
more turns in the list, I'll start from the
beginning again.
hey look, it's solved!
Mr. Smart: Ah, no problem here's your list!
Ok, so what sort of list would work for a problem like this? I know that the Rubik's cube can never be more than 20 moves away from solved, and that there are 43,252,003,274,489,856,000 permutations of a Rubik's Cube. Therefore, I think that this list could be (20 * 43,252,003,274,489,856,000) long, but
Does anyone know the shortest such list currently known?
How would you find a theoretically shortest list?
Note that this is purely a theoretical problem and I don't actually want to program a computer to do this.
An idea to get such a path through all permutations of the Cube would be to use some of the sequences that human solvers use. The main structure of the algorithm for Mr Smart would look like this:
function getMoves(callback):
    paritySwitchingSequences = getParitySwitchingSequences()
    cornerCycleSequences = getCornerCycleSequences()
    edgeCycleSequences = getEdgeCycleSequences()
    cornerRotationSequences = getCornerRotationSequences()
    edgeFlipSequences = getEdgeFlipSequences()

    foreach paritySeq in paritySwitchingSequences:
        if callback(paritySeq) return
        foreach cornerCycleSeq in cornerCycleSequences:
            if callback(cornerCycleSeq) return
            foreach edgeCycleSeq in edgeCycleSequences:
                if callback(edgeCycleSeq) return
                foreach cornerRotationSeq in cornerRotationSequences:
                    if callback(cornerRotationSeq) return
                    foreach edgeFlipSeq in edgeFlipSequences:
                        if callback(edgeFlipSeq) return
The 5 get... functions would all return an array of sequences, where each sequence is an array of moves. A callback system will avoid the need for keeping all moves in memory, and could be rewritten in the more modern generator syntax if available in the target language.
Mr Dumb would have this code:
function performMoves(sequence):
foreach move in sequence:
cube.do(move)
if cube.isSolved() then return true
return false
getMoves(performMoves)
Mr Dumb's code passes his callback function once to Mr Smart, who will then keep calling back that function until it returns true.
Mr Smart's code will go through each of the 5 get functions to retrieve the basic sequences he needs to start producing sequences to the caller. I will describe those functions below, starting with the one whose result is used in the innermost loop:
getEdgeFlipSequences
Imagine a cube that has all pieces in their right slots and rightly rotated, except that some edges may be flipped while still in the right slot. If those edges were flipped back, the cube would be solved. As there are 12 edges, but edges can only be flipped 2 at a time, the number of ways this cube could have its edges flipped (or not) is 2^11 = 2048. Put differently, 11 of the 12 edges can have any flip status (flipped or not), while the last one is determined by the flips of the other 11.
This function should return just as many sequences, such that after applying one of those sequences the next state of the cube is produced that has a unique set of edges flipped.
function getEdgeFlipSequences
    sequences = []
    for i = 1 to 2^11:
        for edge = 1 to 11:
            if i % (2^edge) != 0 then break
        sequence = getEdgePairFlipSequence(edge, 12)
        sequences.push(sequence)
    return sequences
The inner loop makes sure that with one flip in each iteration of the outer loop you get exactly all possible flip states.
It is like listing all numbers in binary representation by just flipping one bit to arrive at the next number. The numbers' output will not be in order when produced that way, but you will get them all. For example, for 4 bits (instead of 11), it would go like this:
0000
0001
0011
0010
0110
0111
0101
0100
1100
1101
1111
1110
1010
1011
1001
1000
The sequence will determine which edge to flip together with the 12th edge. I will not go into defining that getEdgePairFlipSequence function now. It is evident that there are sequences for flipping any pair of edges, and where they are not publicly available, one can easily make a few moves to bring those two edges in a better position, do the double flip and return those edges to their original position again by applying the starting moves in reversed order and in opposite direction.
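As a hypothetical illustration of that "bring into position, flip, undo" idea in Python (the move strings and the known double-flip are placeholders, not actual sequences):

def invert(move):
    # "U" -> "U'", "U'" -> "U"; half turns like "U2" are their own inverse.
    if move.endswith("'"):
        return move[:-1]
    if move.endswith("2"):
        return move
    return move + "'"

def conjugate(setup, core):
    # Setup moves, the known double-flip, then the setup undone in reverse.
    return setup + core + [invert(m) for m in reversed(setup)]

# usage: flip_seq = conjugate(setup_moves, known_double_flip_sequence)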
getCornerRotationSequences
The idea is the same as above, but now with rotated corners. The difference is that a corner can have three rotation states. But like with the flipped edges, if you know the rotations of 7 corners (already in their right position), the rotation of the 8th corner is determined as well. So there are 3^7 possible ways a cube can have its corners rotated.
The trick to rotate a corner together with the 8th corner, and so find all possible corner rotations also works here. The pattern in the 3-base number representation would be like this (for 3 corners):
000
001
002
012
011
010
020
021
022
122
121
120
110
111
112
102
101
100
200
201
202
212
211
210
220
221
222
So the code for this function would look like this:
function getCornerRotationSequences
    sequences = []
    for i = 1 to 3^7:
        for corner = 1 to 7:
            if i % (3^corner) != 0 break
        sequence = getCornerPairRotationSequence(corner, 8)
        sequences.push(sequence)
    return sequences
Again, I will not define getCornerPairRotationSequence. A similar reasoning as for the edges applies.
getEdgeCycleSequences
When you want to move edges around without affecting the rest of the cube, you need to cycle at least 3 of them, as it is not possible to swap two edges without altering anything else.
For instance, it is possible to swap two edges and two corners. But that would be out of the scope of this function. I will come back to this later when dealing with the last function.
This function aims to find all possible cube states that can be arrived at by repeatedly cycling 3 edges. There are 12 edges, and if you know the position of 10 of them, the positions of the 2 remaining ones are determined (still assuming corners remain at their position). So there are 12!/2 = 239 500 800 possible permutations of edges in these conditions.
This may be a bit of a problem memory-wise, as the array of sequences to produce will occupy a multiple of that number in bytes, so we could be talking about a few gigabytes. But I will assume there is enough memory for this:
function getEdgeCycleSequences
    sequences = []
    cycles = getCyclesReachingAllPermutations([1,2,3,4,5,6,7,8,9,10,11,12])
    foreach cycle in cycles:
        sequence = getEdgeTripletCycleSequence(cycle[0], cycle[1], cycle[2])
        sequences.push(sequence)
    return sequences
The getCyclesReachingAllPermutations function would return an array of triplets of edges, such that if you cycle the edges from left to right as listed in a triplet, and repeat this for the complete array, you get to all possible permutations of edges (without altering the position of corners).
Several answers for this question I asked can be used to implement getCyclesReachingAllPermutations. The pseudo code based on this answer could look like this:
function getCyclesReachingAllPermutations(n):
    c = [0] * n
    b = [0, 1, ... n]
    triplets = []
    while (true):
        triplet = [0]
        for (parity = 0; parity < 2; parity++):
            for (k = 1; k <= c[k]; k++):
                c[k] = 0
            if (k == n - 1):
                return triplets
            c[k] = c[k] + 1
            triplet.add( b[k] )
            for (j = 1, k--; j < k; j++, k--):
                swap(b, j, k)
        triplets.add(triplet)
As with the other main functions, there is a dependency here on a function getEdgeTripletCycleSequence, which I will not expand on. There are many known sequences to cycle three edges, for several positions, and others can be easily derived from them.
getCornerCycleSequences
I will keep this short, as it is the same thing as for edges. There are 8!/2 possible permutations for corners if edges don't move.
function getCornerCycleSequences
    sequences = []
    cycles = getCyclesReachingAllPermutations([1,2,3,4,5,6,7,8])
    foreach cycle in cycles:
        sequence = getCornerTripletCycleSequence(cycle[0], cycle[1], cycle[2])
        sequences.push(sequence)
    return sequences
getParitySwitchingSequences
This extra level is needed to deal with the fact that a cube can be in an odd or even position. It is odd when an odd number of quarter-moves (a half turn counts as 2 then) is needed to solve the cube.
I did not mention it before, but all the above used sequences should not change the parity of the cube. I did refer to it implicitly when I wrote that when permuting edges, corners should stay in their original position. This ensures that the parity does not change. If on the other hand you would apply a sequence that swaps two edges and two corners at the same time, you are bound to toggle the parity.
But since that was not accounted for with the four functions above, this extra layer is needed.
The function is quite simple:
function getParitySwitchingSequences
    return [
        [L], [-L]
    ]
L is a constant that represents the quarter move of the left face of the cube, and -L is the same move, but reversed. It could have been any face.
The simplest way to toggle the parity of a cube is just that: perform a quarter move.
Thoughts
This solution is certainly not the optimal one, but it is a solution that will eventually go through all states of the cube, albeit with many duplicate states appearing along the way. And it will do so with fewer than 20 moves between two consecutive permutations. The number of moves varies from 1 (for a parity toggle) to 18 in what I think is the worst case: flipping two edges, allowing 2 extra moves to bring an edge into a good relative position and 2 to put it back after a 14-move double flip.
One quick optimisation would be to make the parity loop the innermost loop: since it consists of only one quarter move, it is the cheapest one to repeat most often.
Hamiltonian graph: the best
A Hamiltonian circuit has been constructed on the graph in which each edge represents one move and the nodes represent all unique cube states. It is cyclic, so the edge leading out of the last node brings you back to the first node.
This lets you go through all cube states using exactly one move per state, so clearly a better solution cannot exist. The graph can be downloaded.
You can use the De Bruijn sequence to get a sequence that will definitely solve a Rubik's cube (because it will contain every possible sequence of 20 moves).
From wiki (Python):
def de_bruijn(k, n):
    """
    De Bruijn sequence for alphabet k
    and subsequences of length n.
    """
    try:
        # let's see if k can be cast to an integer;
        # if so, make our alphabet a list
        _ = int(k)
        alphabet = list(map(str, range(k)))
    except (ValueError, TypeError):
        alphabet = k
        k = len(k)

    a = [0] * k * n
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)
You can use it kinda like this:
print(de_bruijn(x, 20))
Where 20 is the size of your sequence and x is a list/string containing every possible turn (couldn't think of a better word) of the cube.

Most efficient algorithm to find the biggest square in a two dimension map [duplicate]

I would like to know the different algorithms to find the biggest square in a two-dimensional map dotted with obstacles.
An example, where o would be obstacles:
...........................
....o......................
............o..............
...........................
....o......................
...............o...........
...........................
......o..............o.....
..o.......o................
The biggest square would be (if we choose the first one):
.....xxxxxxx...............
....oxxxxxxx...............
.....xxxxxxxo..............
.....xxxxxxx...............
....oxxxxxxx...............
.....xxxxxxx...o...........
.....xxxxxxx...............
......o..............o.....
..o.......o................
What would be the fastest algorithm to find it? The one with the smallest complexity?
EDIT: I know that people are interested on the algorithm explained in the accepted answer, so I made a document that explains it a bit more, you can find it here:
https://docs.google.com/document/d/19pHCD433tYsvAor0WObxa2qusAjKdx96kaf3z5I8XT8/edit?usp=sharing
Here is how to do this in the optimal amount of time, O(nm). This is built on top of #dukeling's insight that you never need to check a solution of size less than your current known best solution.
The key is to be able to build a data structure that can answer this query in O(1) time.
Is there an obstacle in the square whose top left corner is at r, c and has size k?
To solve that problem, we'll support answering a slightly harder question, also in O(1).
What is the count of items in the rectangle from r1, c1 to r2, c2?
It's easy to answer the square existence question with an answer from the rectangle count question.
To answer the rectangle count question, note that if you had pre-computed the answer for every rectangle that starts in the top left, then you could answer the general question from r1, c1 to r2, c2 by a kind of clever inclusion/exclusion tactic using only rectangles that start in the top left:
c1 c2
-----------------------
| | | |
| A | B | |
|_____________|____| | r1
| | | |
| C | D | |
|_____________|____| | r2
|_____________________|
We want the count of stuff inside D, in terms of our pre-computed counts from the top left:
Count(D) = Count(A ∪ B ∪ C ∪ D) - Count(A ∪ C) - Count(A ∪ B) + Count(A)
You can pre-compute all the top left rectangles in O(nm) by doing some clever row/column partial sums, but I'll leave that to you.
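For reference, here is one way those partial sums and the O(1) queries could look in Python (a sketch; the grid representation and the extra parameters of possible() are my own, whereas the pseudocode below calls possible(r, c, k) directly):

def build_counts(grid, n, m):
    # count[r][c] = number of obstacles in rows 0..r-1, columns 0..c-1
    count = [[0] * (m + 1) for _ in range(n + 1)]
    for r in range(n):
        for c in range(m):
            count[r + 1][c + 1] = (grid[r][c] + count[r][c + 1]
                                   + count[r + 1][c] - count[r][c])
    return count

def rect_count(count, r1, c1, r2, c2):
    # obstacles in rows r1..r2-1, columns c1..c2-1 (half-open ranges)
    return count[r2][c2] - count[r1][c2] - count[r2][c1] + count[r1][c1]

def possible(count, n, m, r, c, k):
    # Is there an obstacle-free k x k square with top-left corner (r, c)?
    return r + k <= n and c + k <= m and rect_count(count, r, c, r + k, c + k) == 0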
Then answering the problem you want just involves checking possible solutions, starting with solutions that are at least as good as your known best. Your known best can only improve up to min(n, m) times in total, so the best_possible increment happens very rarely and almost all squares are rejected in O(1) time.
best_possible = 0
for r in range(n):
    for c in range(m):
        while True:
            # this looks O(min(n, m)), but it's amortized O(1) since
            # best_possible rarely increases.
            if possible(r, c, best_possible + 1):
                best_possible += 1
            else:
                break
One idea, making use of binary search.
The basic idea:
Start off in the top-left corner. See if a 1x1 square would work.
If it works, increase the side length of the square by 1 and repeat.
If it won't work, move right and repeat. If you've reached the right-most position, move to the next line.
The naive approach:
We can simply check every possible cell of every square at each step, but this is fairly inefficient.
The optimized approach:
When increasing the square size, we can just do a binary search over the next row and column to see if that row / column contains an obstacle at any of those positions.
When moving to the right, we can do a binary search for each next column to determine if that column contains an obstacle at any of those positions.
When moving down, we can do a similar binary search on each of the columns in the target position.
Implementation note:
To start off, we'd need to go through all the rows and columns and set up arrays containing the positions of the obstacles for each of them, which we can use for the binary searches.
Running time:
We do 2 binary searches to increase the square size, and the square size is at most the size of the grid, so that is fairly small (O(min(m,n) log max(m,n))) and gets dominated by the below.
Beyond that, for each position, we do a single binary search on a column.
So, for a grid with m columns and n rows, the overall complexity is O(mn log m).
But note how little we're actually searching below when the grid is sparse.
Example:
For your example:
012345678901234567890123456
0...........................
1....o......................
2............o..............
3...........................
4....o......................
5...............o...........
6...........................
7......o..............o.....
8..o.......o................
We'd first try a 1x1 square in the top-left corner, which works.
Then a 2x2 square. For this, we do a binary search for the range [0,1] on the row 1, which can be represented simply by {4} - an array of a single position corresponding to where the obstacle is. And we also do a binary search for the range [0,1] on the column 1, which contains no obstacles, thus an empty array - {}.
Then a 3x3 square. For this, we do a binary search for [0,2] on the row 2, which contains one obstacle at position 12, thus {12}. And we also do a binary search for [0,2] on the column 2, which contains an obstacle at position 8, thus {8}.
Then a 4x4 square. For this, we do a binary search for [0,3] on the row 3 - {}. And for [0,3] on column 3 - {}.
Then a 5x5 square. For this, we do a binary search for [0,4] on the row 4 - {4}. And for [0,4] column 4 - {1,4}.
This is the first time we actually find an obstacle. In the range [0,4], we find 4 in both the row and the column (we only really need to find one of them). So this indicates a fail.
From here we do a binary search on column 4 (again, not really necessary) for [0,4]. Then we binary search columns 5-8 for [0,4]; none of them contain an obstacle, so a square starting at position 5,0 is the next possible candidate.
So from here we try to increase the square size to 5x5, which works, then 6x6 and 7x7, which works.
Then we try 8x8, which doesn't work.
And so on.
I know binary search, but how does yours work?
So we're basically doing a range search within a set of values. This is fairly easy to do. First search for the starting value of the range, then the end value. If we get to the same point, there are no values in the range.
We don't really care what values exist in the range, just whether or not there are any.
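A tiny Python sketch of that range test, assuming the obstacle positions for each row/column were collected into sorted lists up front:

from bisect import bisect_left, bisect_right

def any_in_range(sorted_positions, lo, hi):
    # True if some obstacle position p with lo <= p <= hi exists.
    return bisect_right(sorted_positions, hi) > bisect_left(sorted_positions, lo)

# e.g. row 1 of the example grid has its obstacles stored as [4]:
# any_in_range([4], 0, 1) -> False    any_in_range([4], 0, 4) -> True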
So here's one rough approach.
Store the x-y positions of all the obstacles.
For each obstacle O
    find obstacle C that is nearest to it column-wise.
    find obstacle R-top that is nearest to it row-wise from the top.
    find obstacle R-bottom that is nearest to it row-wise from the bottom.
    if (|R-top.y - R-bottom.y| != |O.x - C.x|) continue
    Size of the square = Abs((R-top.y - R-bottom.y) * (O.x - C.x))
    Keep track of the sizes and positions to find the largest square
Complexity is roughly O(k^2) where k is the number of obstacles. You could reduce it to O(k * log k) if you use binary search.
The following SO articles are identical/similar to the problem you're trying to solve. You may want to look over those answers as well as the responses to your question.
Dynamic programming - Largest square block
dynamic programming: finding largest non-overlapping squares
Dynamic programming: Find largest diamond (rhombus)
Here's the baseline case I'd use, written in simplified Python/pseudocode.
# obstacleMap is a list of lists of MapElements, stored in row-major order
max([find_largest_rect(obstacleMap, element) for row in obstacleMap for element in row])

def find_largest_rect(obstacleMap, upper_left_elem):
    size = 0
    while not has_obstacles(obstacleMap, upper_left_elem, size + 1):
        size += 1
    return size

def has_obstacles(obstacleMap, upper_left_elem, size):
    # determines if there are obstacles on the outside square layer
    # for example, if U is the upper left element and size=3, then
    # has_obstacles checks the elements marked p.
    # .....
    # ..U.p
    # ....p
    # ..ppp
    periphery_row = obstacleMap[upper_left_elem.row][upper_left_elem.col:upper_left_elem.col + size]
    periphery_col = [row[upper_left_elem.col + size]
                     for row in obstacleMap[upper_left_elem.row:upper_left_elem.row + size]]
    return any(is_obstacle(elem) for elem in periphery_row + periphery_col)

def is_obstacle(elem):
    return elem.value == 'o'

class MapElement(object):
    def __init__(self, row, col, value):
        self.row = row
        self.col = col
        self.value = value
Here is an approach using a recurrence relation:

isSquare(R,C1,C2) = noObstacle(R,C1,R,C2) && noObstacle(R,C2,R-(C2-C1),C2) && isSquare(R-1,C1,C2-1)

where isSquare(R,C1,C2) is true for a square whose bottom side runs from (R,C1) to (R,C2), and noObstacle(R1,C1,R2,C2) checks whether there is no obstacle on the line segment from (R1,C1) to (R2,C2).
Find the maximum (C2-C1+1) for which isSquare(R,C1,C2) = true.
You can use dynamic programming to solve this problem in polynomial time. Use suitable data structure for searching obstacle.
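For completeness, a sketch of that classic DP in Python, assuming the map is given as a list of strings with 'o' marking obstacles as in the question:

def largest_square(grid):
    # side[r][c] = size of the largest obstacle-free square whose
    # bottom-right corner is at (r, c).
    n, m = len(grid), len(grid[0])
    side = [[0] * m for _ in range(n)]
    best = 0
    for r in range(n):
        for c in range(m):
            if grid[r][c] != 'o':
                if r == 0 or c == 0:
                    side[r][c] = 1
                else:
                    side[r][c] = 1 + min(side[r - 1][c],
                                         side[r][c - 1],
                                         side[r - 1][c - 1])
                best = max(best, side[r][c])
    return best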

Compare two arrays of points [closed]

I'm trying to find a way to find similarities in two arrays of points. I drew circles around points that have similar patterns and I would like to do some kind of automatic comparison in intervals of, say, 100 points and tell what the coefficient of similarity is for that interval. As you can see, the arrays might not be perfectly aligned, so point-to-point comparison would not be a good solution (I suppose). Patterns that are slightly misaligned could still count as matching (but obviously with a smaller coefficient).
What similarity could mean (1 coefficient is a perfect match, 0 or less - is not a match at all):
Points 640 to 660 - Very similar (coefficient is ~0.8)
Points 670 to 690 - Quite similar (coefficient is ~0.5-~0.6)
Points 720 to 780 - Let's say quite similar (coefficient is ~0.5-~0.6)
Points 790 to 810 - Perfectly similar (coefficient is 1)
The coefficient is just my idea of what the final calculated result of the comparing function could look like with the given data.
I read many posts on SO but it didn't seem to solve my problem. I would appreciate your help a lot. Thank you
P.S. Perfect answer would be the one that provides pseudo code for function which could accept two data arrays as arguments (intervals of data) and return coefficient of similarity.
I also think High Performance Mark has basically given you the answer (cross-correlation). In my opinion, most of the other answers are only giving you half of what you need (i.e., dot product plus compare against some threshold). However, this won't consider a signal to be similar to a shifted version of itself. You'll want to compute this dot product N + M - 1 times, where N, M are the sizes of the arrays. For each iteration, compute the dot product between array 1 and a shifted version of array 2. The amount you shift array 2 increases by one each iteration. You can think of array 2 as a window you are passing over array 1. You'll want to start the loop with the last element of array 2 only overlapping the first element in array 1.
This loop will generate numbers for different amounts of shift, and what you do with that number is up to you. Maybe you compare it (or the absolute value of it) against a threshold that you define to consider two signals "similar".
Lastly, in many contexts, a signal is considered similar to a scaled (in the amplitude sense, not time-scaling) version of itself, so there must be a normalization step prior to computing the cross-correlation. This is usually done by scaling the elements of the array so that the dot product with itself equals 1. Just be careful to ensure this makes sense for your application numerically, i.e., integers don't scale very well to values between 0 and 1 :-)
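A short numpy sketch of that normalized sliding dot product (cross-correlation); the function name and the normalization choice are mine:

import numpy as np

def similarity_profile(a, b):
    # Scale so each array's dot product with itself is 1 (per the answer above).
    a = a / (np.linalg.norm(a) or 1.0)
    b = b / (np.linalg.norm(b) or 1.0)
    # One value per shift; length is len(a) + len(b) - 1.
    return np.correlate(a, b, mode="full")

The maximum of similarity_profile(a, b) can then be compared against your similarity threshold, and the position of the maximum tells you the shift at which the signals line up best.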
I think HighPerformanceMark's suggestion is the standard way of doing the job.
A computationally lightweight alternative measure might be a dot product:
Split both arrays into the same predefined index intervals.
Consider the array elements in each interval as vector coordinates in high-dimensional space.
Compute the dot product of both vectors.
The dot product will not be negative. If the two vectors are perpendicular in their vector space, the dot product will be 0 (in fact that's how 'perpendicular' is usually defined in higher dimensions), and it will attain its maximum for identical vectors.
If you accept the geometric notion of perpendicularity as a (dis)similarity measure, here you go.
Caveat:
This is an ad hoc heuristic chosen for computational efficiency. I cannot tell you about the mathematical/statistical properties of the process and its separation properties; if you need rigorous analysis, you'll probably fare better with correlation theory anyway and should perhaps forward your question to math.stackexchange.com.
My Attempt:
Total_sum=0
1. For each index i in the range (m,n)
2. sum=0
3. k=Array1[i]*Array2[i]; t1=magnitude(Array1[i]); t2=magnitude(Array2[i]);
4. k=k/(t1*t2)
5. sum=sum+k
6. Total_sum=Total_sum+sum
Coefficient=Total_sum/(m-n)
If all values are equal, then sum would return 1 in each case and total_sum would return (m-n)*(1). Hence, when the same is divided by (m-n) we get the value as 1. If the graphs are exact opposites, we get -1 and for other variations a value between -1 and 1 is returned.
This is not so efficient when the y range or the x range is huge. But, I just wanted to give you an idea.
Another option would be to perform an extensive xnor.
1. For each index i in the range (m,n)
2. sum=1
3. k=Array1[i] xnor Array2[i];
4. k=k/((pow(2,number_of_bits))-1) //This will scale k down to a value between 0 and 1
5. sum=(sum+k)/2
Coefficient=sum
Is this helpful ?
You can define a distance metric for two vectors A and B of length N containing numbers in the interval [-1, 1] e.g. as
sum = 0
for i in 0 to N - 1:
    d = (A[i] - B[i])^2 // this is in range 0 .. 4
    sum = sum + d
sum = (sum / 4) / N // now in range 0 .. 1
This now returns distance 1 for vectors that are completely opposite (one is all 1, another all -1), and 0 for identical vectors.
You can translate this into your coefficient by
coeff = 1 - sum
However, this is a crude approach because it does not take into account the fact that there could be horizontal distortion or shift between the signals you want to compare, so let's look at some approaches for coping with that.
You can sort both your arrays (e.g. in ascending order) and then calculate the distance / coefficient. This returns more similarity than the original metric, and is agnostic towards permutations / shifts of the signal.
You can also calculate the differentials and calculate distance / coefficient for those, and then you can do that sorted also. Using differentials has the benefit that it eliminates vertical shifts. Sorted differentials eliminate horizontal shift but still recognize different shapes better than sorted original data points.
You can then e.g. average the different coefficients. Here is more complete code. The routine below calculates the coefficient for arrays A and B of the given size, taking d differentials (recursively) first. If sorted is true, the final (differentiated) array is sorted.
procedure calc(A, B, size, d, sorted):
    if (d > 0):
        A' = new array[size - 1]
        B' = new array[size - 1]
        for i in 0 to size - 2:
            A'[i] = (A[i + 1] - A[i]) / 2 // keep in range -1..1 by dividing by 2
            B'[i] = (B[i + 1] - B[i]) / 2
        return calc(A', B', size - 1, d - 1, sorted)
    else:
        if (sorted):
            A = sort(A)
            B = sort(B)
        sum = 0
        for i in 0 to size - 1:
            sum = sum + (A[i] - B[i]) * (A[i] - B[i])
        sum = (sum / 4) / size
        return 1 - sum // return the coefficient

procedure similarity(A, B, size):
    a = 0
    a = a + calc(A, B, size, 0, false)
    a = a + calc(A, B, size, 0, true)
    a = a + calc(A, B, size, 1, false)
    a = a + calc(A, B, size, 1, true)
    return a / 4 // take average
For something completely different, you could also run Fourier transform using FFT and then take a distance metric on the returning spectra.
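As a sketch of that idea (assuming equal-length intervals and numpy; the particular distance choice here is mine):

import numpy as np

def spectral_distance(a, b):
    # Compare magnitude spectra instead of raw samples (phase is discarded).
    sa = np.abs(np.fft.rfft(a))
    sb = np.abs(np.fft.rfft(b))
    sa = sa / (sa.sum() or 1.0)
    sb = sb / (sb.sum() or 1.0)
    return float(np.sum((sa - sb) ** 2))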

Simple Weighted Random Walk with Hysteresis

I've already written a solution for this, but it doesn't feel "right", so I'd like some input from others.
The rules are:
Movement is on a 2D grid (Directions arbitrarily labelled N, NE, E, SE, S, SW, W, NW)
Probabilities of moving in a given direction are relative to the direction of travel (i.e. 40% represents ahead), and weighted:
[14%][40%][14%]
[ 8%][ 4%][ 8%]
[ 4%][ 4%][ 4%]
This means with overwhelming probability, travel will continue along its current trajectory. The middle value represents stopping. As an example, if the last move was NW, then the absolute probabilities would read:
[40%][14%][ 8%]
[14%][ 4%][ 4%]
[ 8%][ 4%][ 4%]
The probabilities are approximate - one thing I toyed with was making stopped a static 5% chance outside of the main calculation, which would have altered the probability of any other operation ever so slightly.
My current algorithm is as follows (in simplified pseudocode):
int[] probabilities = [4,40,14,8,4,4,4,8,14]
if move.previous == null:
    move.previous = STOPPED
if move.previous != STOPPED:
    // Cycle probabilities[1:8] array until indexof(move.previous) = 40%
r = Random % 99
if r < probabilities.sum[0:0]:
    move.current = STOPPED
elif r < probabilities.sum[0:1]:
    move.current = NW
elif r < probabilities.sum[0:2]:
    move.current = NW
...
Reasons why I really dislike this method:
* It forces me to assign specific roles to array indices: [0] = stopped, [1] = North...
* It forces me to operate on a subset of the array when cycling (i.e. STOPPED always remains in place)
* It's very iterative, and therefore, slow. It has to check every value in turn until it gets to the right one. Cycling the array requires up to 4 operations.
* A 9-case if-block (most languages do not allow dynamic switches).
* Stopped has to be special cased in everything.
Things I have considered:
* Circular linked list: Simplifies the cycling (make the pivot always equal north) but requires maintaining a set of pointers, and still involves assigning roles to specific indices.
* Vectors: Really not sure how I'd go about weighting this, plus I'd need to worry about magnitude.
* Matrices: Rotating matrices does not work like that :)
* Use a well-known random walk algorithm: Overkill? Though recommendations are considered.
* Trees: Just thought of this, so no real thought given to it...
So. Does anyone have any bright ideas?
You have 8 directions, and when you pick a direction you have to "rotate this matrix".
But this is just addition modulo 8.
Since you have only 100 integers to pick the probability from, you can just put all the integers in a list, where the value of each entry points to the index of a direction.
You then rotate this direction (by modular addition) so that it points to the move you have to make.
Then you have one array holding the (x, y) difference you have to apply for each move.
Something like this:
//  40 numbers           14 numbers      8 numbers ...
int[100] probab = {0,0,0,0,0,0,....,1,1,1,.......,2,2,2,...};

and then

//                  N      NE     E      SE            STOP
int[9] next_move = {{0,1}, {1,1}, {1,0}, {1,-1}, ..., {0,0}}; // in a circle
So you pick
move = probab[randint(100)]
if (move != 8)   // if 8 you got stop
{
    move = (previous_move + move) % 8;
}
move_x = next_move[move][0];
move_y = next_move[move][1];
Use a more direct representation of direction in your algorithms, something like a (dx, dy) pair, for example.
This allows you to move by just having x += dx; y += dy;
(You can still use the "direction ENUM" + a lookup table if you wish...)
Your next problem is finding a good representation of the "probability table". Since r only ranges from 1 to 99 it might be feasible to just do a dumb array and use prob_table[r] directly.
Then, compute a 3x3 matrix of these probability tables using the method of your choice. It doesn't matter if it is slow because you only do it once.
To get the next direction simply
prob_table = dir_table[curr_dx][curr_dy];
(curr_dx, curr_dy) = get_next_dir(prob_table, random_number());
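A rough Python sketch of that table idea (the names are mine; the weights passed in would come from the question's 3x3 probability matrix):

import random

def build_table(weights):
    # weights: list of ((dx, dy), percent) pairs whose percents sum to 100.
    table = []
    for (dx, dy), pct in weights:
        table.extend([(dx, dy)] * pct)
    return table                      # 100 entries, indexed by 0..99

def next_step(x, y, table):
    dx, dy = table[random.randrange(100)]
    return x + dx, y + dy, (dx, dy)   # new position plus the direction taken

# dir_table could then map each previous (dx, dy), including (0, 0) for
# stopped, to its own 100-entry table built once with build_table, so each
# step is just two lookups.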
