Simple Weighted Random Walk with Hysteresis - algorithm

I've already written a solution for this, but it doesn't feel "right", so I'd like some input from others.
The rules are:
Movement is on a 2D grid (Directions arbitrarily labelled N, NE, E, SE, S, SW, W, NW)
Probabilities of moving in a given direction are relative to the direction of travel (i.e. 40% represents ahead), and weighted:
[14%][40%][14%]
[ 8%][ 4%][ 8%]
[ 4%][ 4%][ 4%]
This means with overwhelming probability, travel will continue along its current trajectory. The middle value represents stopping. As an example, if the last move was NW, then the absolute probabilities would read:
[40%][14%][ 8%]
[14%][ 4%][ 4%]
[ 8%][ 4%][ 4%]
The probabilities are approximate - one thing I toyed with was making stopped a static 5% chance outside of the main calculation, which would have altered the probability of any other operation ever so slightly.
My current algorithm is as follows (in simplified pseudocode):
int[] probabilities = [4,40,14,8,4,4,4,8,14]
if move.previous == null:
move.previous = STOPPED
if move.previous != STOPPED:
// Cycle probabilities[1:8] array until indexof(move.previous) = 40%
r = Random % 99
if r < probabilities.sum[0:0]:
move.current = STOPPED
elif r < probabilities.sum[0:1]:
move.current = NW
elif r < probabilities.sum[0:2]:
move.current = NW
...
Reasons why I really dislike this method:
* It forces me to assign specific roles to array indices: [0] = stopped, [1] = North...
* It forces me to operate on a subset of the array when cycling (i.e. STOPPED always remains in place)
* It's very iterative, and therefore, slow. It has to check every value in turn until it gets to the right one. Cycling the array requires up to 4 operations.
* A 9-case if-block (most languages do not allow dynamic switches).
* Stopped has to be special cased in everything.
Things I have considered:
* Circular linked list: Simplifies the cycling (make the pivot always equal north) but requires maintaining a set of pointers, and still involves assigning roles to specific indices.
* Vectors: Really not sure how I'd go about weighting this, plus I'd need to worry about magnitude.
* Matrices: Rotating matrices does not work like that :)
* Use a well-known random walk algorithm: Overkill? Though recommendations are considered.
* Trees: Just thought of this, so no real thought given to it...
So. Does anyone have any bright ideas?

8You have 8 directions and when you hit some direction you have to "rotate this matrix"
But this is just modulo over finite field.
Since you have only 100 integers to pick probability from, you can just putt all integers in list and value from each integers points to index of your direction.
This direction you rotate (modulo addition) in way that it points to move that you have to make.
And than you have one array that have difference that you have to apply to your move.
somethihing like that.
40 numbers 14 numbers 8 numbers
int[100] probab={0,0,0,0,0,0,....,1,1,1,.......,2,2,2,...};
and then
N NE E SE STOP
int[9] next_move={{0,1},{ 1,1},{1,1},{1,-1}...,{0,0}}; //in circle
So you pick
move=probab[randint(100)]
if(move != 8)//if 8 you got stop
{
move=(prevous_move+move)%8;
}
move_x=next_move[move][0];
move_y=next_move[move][1];

Use a more direct representation of direction in your algorithms, something like a (dx, dy) pair, for example.
This allows you to move by just having x += dx; y += dy;
(You can still use the "direction ENUM" + a lookup table if you wish...)
Your next problem is finding a good representation of the "probability table". Since r only ranges from 1 to 99 it might be feasible to just do a dumb array and use prob_table[r] directly.
Then, compute a 3x3 matrix of these probability tables using the method of your choice. It doesn't matter if it is slow because you only do it once.
To get the next direction simply
prob_table = dir_table[curr_dx][curr_dy];
(curr_dx, curr_dy) = get_next_dir(prob_table, random_number());

Related

What is a greedy algorithm for this problem that is minimally optimal + proof?

The details are a bit cringe, fair warning lol:
I want to set up meters on the floor of my building to catch someone; assume my floor is a number line from 0 to length L. The specific type of meter I am designing has a radius of detection that is 4.7 meters in the -x and +x direction (diameter of 9.4 meters of detection). I want to set them up in such a way that if the person I am trying to find steps foot anywhere in the floor, I will know. However, I can't just setup a meter anywhere (it may annoy other residents); therefore, there are only n valid locations that I can setup a meter. Additionally, these meters are expensive and time consuming to make, so I would like to use as few as possible.
For simplicity, you can assume the meter has 0 width, and that each valid location is just a point on the number line aformentioned. What is a greedy algorithm that places as few meters as possible, while being able to detect the entire hallway of length L like I want it to, or, if detecting the entire hallway is not possible, will output false for the set of n locations I have (and, if it isn't able to detect the whole hallway, still uses as few meters as possible while attempting to do so)?
Edit: some clarification on being able to detect the entire hallway or not
Given:
L (hallway length)
a list of N valid positions to place a meter (p_0 ... p_N-1) of radius 4.7
You can determine in O(N) either a valid and minimal ("good") covering of the whole hallway or a proof that no such covering exists given the constraints as follows (pseudo-code):
// total = total length;
// start = current starting position, initially 0
// possible = list of possible meter positions
// placed = list of (optimal) meter placements, initially empty
boolean solve(float total, float start, List<Float> possible, List<Float> placed):
if (total-start <= 0):
return true; // problem solved with no additional meters - woo!
else:
Float next = extractFurthestWithinRange(start, possible, 4.7);
if (next == null):
return false; // no way to cover end of hall: report failure
else:
placed.add(next); // placement decided
return solve(total, next + 4.7, possible, placed);
Where extractFurthestWithinRange(float start, List<Float> candidates, float range) returns null if there are no candidates within range of start, or returns the last position p in candidates such that p <= start + range -- and also removes p, and all candidates c such that p >= c.
The key here is that, by always choosing to place a meter in the next position that a) leaves no gaps and b) is furthest from the previously-placed position we are simultaneously creating a valid covering (= no gaps) and an optimal covering (= no possible valid covering could have used less meters - because our gaps are already as wide as possible). At each iteration, we either completely solve the problem, or take a greedy bite to reduce it to a (guaranteed) smaller problem.
Note that there can be other optimal coverings with different meter positions, but they will use the exact same number of meters as those returned from this pseudo-code. For example, if you adapt the code to start from the end of the hallway instead of from the start, the covering would still be good, but the gaps could be rearranged. Indeed, if you need the lexicographically minimal optimal covering, you should use the adapted algorithm that places meters starting from the end:
// remaining = length (starts at hallway length)
// possible = positions to place meters at, starting by closest to end of hallway
// placed = positions where meters have been placed
boolean solve(float remaining, List<Float> possible, Queue<Float> placed):
if (remaining <= 0):
return true; // problem solved with no additional meters - woo!
else:
// extracts points p up to and including p such that p >= remaining - range
Float next = extractFurthestWithinRange2(remaining, possible, 4.7);
if (next == null):
return false; // no way to cover start of hall: report failure
else:
placed.add(next); // placement decided
return solve(next - 4.7, possible, placed);
To prove that your solution is optimal if it is found, you merely have to prove that it finds the lexicographically last optimal solution.
And you do that by induction on the size of the lexicographically last optimal solution. The case of a zero length floor and no monitor is trivial. Otherwise you demonstrate that you found the first element of the lexicographically last solution. And covering the rest of the line with the remaining elements is your induction step.
Technical note, for this to work you have to be allowed to place monitoring stations outside of the line.

Genetic/Evolutionary algorithm - Painter

My task:
Create a program to copy a picture (given as input) using primitives only (like triangle or something). The program should use evolutionary algorithm to create output picture.
My question:
I need to invent an algorithm to create populations and check them (how much - in % - they match the input picture).
I have an idea; you can find it below.
So what I want from you: advice (if you find my idea not so bad) or inspiration (maybe you have a better idea?)
My idea:
Let's say that I'll use only triangles to build the output picture.
My first population is P pictures (generated by using T randomly generated triangles - called Elements).
I check by my fitness function every pictures in population and choose E of them as elite and rest of population just remove:
To compare 2 pictures we check every pixel in picture A and compare his R,G,B with
the same pixel (the same coordinates) in picture B.
I use this:
SingleDif = sqrt[ (Ar - Br)^2 + (Ag - Bg)^2 + (Ab - Bb)^2]
then i sum all differences (from all pixels) - lets call it SumDif
and use:
PictureDif = (DifMax - SumDif)/DifMax
where
DifMax = pictureHeight * pictureWidth * 255*3
The best are used to create the next population in this way:
picture MakeChild(picture Mother, picture Father)
{
picture child;
for( int i = 0; i < T; ++i )
{
j //this is a random number from 0 to 1 - created now
if( j < 0.5 ) child.element(i) = Mother.element(i);
else child.element(i) = Father.element(i)
if( j < some small % ) mutate( child.element(i) );
}
return child;
}
So it's quite simple. Only the mutation needs a comment: So there is always some small probability that element X in child will be different than X in his parent. To do this we make random changes in element in child (change his colour by random number, or add random number to his (x,y) coordinate - or his node).
So this is my idea... I didn't test it, didn't code it.
Please check my idea - what do you think about it?
I would make the number of patches of each child dynamic and get the mutation operation to insert/delete patches with some (low) probability. Of course this could result in a lot of redundancy and bloat in the child's genome. In these situations, it is usually a good idea to use the length of an individual's genome as a parameter of the fitness function so that individuals get rewarded (with a higher fitness value) for using fewer patches. So for example if the PictureDif of individuals A and B are the same but the A has fewer patches than B, then A has a higher fitness.
Another issue is the reproductive operator that you proposed (namely, the crossover operation). In order for the evolutionary process to work efficiently, you need to achieve a reasonable exploration and exploitation balance. One way of doing this is by having a set of reproductive operators that exhibit a good fitness correlation [1] which means the fitness of a child must be close to the fitness of its parent(s).
In the case of single parent reproduction you only need to find the right mutation parameters. However, when it comes to multi-parent reproduction (crossover) one of the frequently used techniques is to produce 2 children (instead of 1) from the same 2 parents. For the first child, each gene comes from the mother with the probability of 0.2 and from the father with the probability of 0.8, and for the second child the other way around. Of course after the crossover, you can do the mutation.
Oh and one more thing, for the mutation operators, when you say
... make random changes in element in child (change his colour by random number, or add random number to his (x,y) coordinate - or his node)
it's a good idea to use a Gaussian distribution to change the colour, coordinate etc.
[1] Evolutionary Computation: A unified approach by Kenneth A. De Jong, page 69

matlab: optimum amount of points for linear fit

I want to make a linear fit to few data points, as shown on the image. Since I know the intercept (in this case say 0.05), I want to fit only points which are in the linear region with this particular intercept. In this case it will be lets say points 5:22 (but not 22:30).
I'm looking for the simple algorithm to determine this optimal amount of points, based on... hmm, that's the question... R^2? Any Ideas how to do it?
I was thinking about probing R^2 for fits using points 1 to 2:30, 2 to 3:30, and so on, but I don't really know how to enclose it into clear and simple function. For fits with fixed intercept I'm using polyfit0 (http://www.mathworks.com/matlabcentral/fileexchange/272-polyfit0-m) . Thanks for any suggestions!
EDIT:
sample data:
intercept = 0.043;
x = 0.01:0.01:0.3;
y = [0.0530642513911393,0.0600786706929529,0.0673485248329648,0.0794662409166333,0.0895915873196170,0.103837395346484,0.107224784565365,0.120300492775786,0.126318699218730,0.141508831492330,0.147135757370947,0.161734674733680,0.170982455701681,0.191799936622712,0.192312642057298,0.204771365716483,0.222689541632988,0.242582251060963,0.252582727297656,0.267390860166283,0.282890010610515,0.292381165948577,0.307990544720676,0.314264952297699,0.332344368808024,0.355781519885611,0.373277721489254,0.387722683944356,0.413648156978284,0.446500064130389;];
What you have here is a rather difficult problem to find a general solution of.
One approach would be to compute all the slopes/intersects between all consecutive pairs of points, and then do cluster analysis on the intersepts:
slopes = diff(y)./diff(x);
intersepts = y(1:end-1) - slopes.*x(1:end-1);
idx = kmeans(intersepts, 3);
x([idx; 3] == 2) % the points with the intersepts closest to the linear one.
This requires the statistics toolbox (for kmeans). This is the best of all methods I tried, although the range of points found this way might have a few small holes in it; e.g., when the slopes of two points in the start and end range lie close to the slope of the line, these points will be detected as belonging to the line. This (and other factors) will require a bit more post-processing of the solution found this way.
Another approach (which I failed to construct successfully) is to do a linear fit in a loop, each time increasing the range of points from some point in the middle towards both of the endpoints, and see if the sum of the squared error remains small. This I gave up very quickly, because defining what "small" is is very subjective and must be done in some heuristic way.
I tried a more systematic and robust approach of the above:
function test
%% example data
slope = 2;
intercept = 1.5;
x = linspace(0.1, 5, 100).';
y = slope*x + intercept;
y(1:12) = log(x(1:12)) + y(12)-log(x(12));
y(74:100) = y(74:100) + (x(74:100)-x(74)).^8;
y = y + 0.2*randn(size(y));
%% simple algorithm
[X,fn] = fminsearch(#(ii)P(ii, x,y,intercept), [0.5 0.5])
[~,inds] = P(X, y,x,intercept)
end
function [C, inds] = P(ii, x,y,intercept)
% ii represents fraction of range from center to end,
% So ii lies between 0 and 1.
N = numel(x);
n = round(N/2);
ii = round(ii*n);
inds = min(max(1, n+(-ii(1):ii(2))), N);
% Solve linear system with fixed intercept
A = x(inds);
b = y(inds) - intercept;
% and return the sum of squared errors, divided by
% the number of points included in the set. This
% last step is required to prevent fminsearch from
% reducing the set to 1 point (= minimum possible
% squared error).
C = sum(((A\b)*A - b).^2)/numel(inds);
end
which only finds a rough approximation to the desired indices (12 and 74 in this example).
When fminsearch is run a few dozen times with random starting values (really just rand(1,2)), it gets more reliable, but I still wouln't bet my life on it.
If you have the statistics toolbox, use the kmeans option.
Depending on the number of data values, I would split the data into a relative small number of overlapping segments, and for each segment calculate the linear fit, or rather the 1-st order coefficient, (remember you know the intercept, which will be same for all segments).
Then, for each coefficient calculate the MSE between this hypothetical line and entire dataset, choosing the coefficient which yields the smallest MSE.

Algorithm to identify a unique free polyomino (or polyomino hash)

In short: How to hash a free polyomino?
This could be generalized into: How to efficiently hash an arbitrary collection of 2D integer coordinates, where a set contains unique pairs of non-negative integers, and a set is considered unique if and only if no translation, rotation, or flip can map it identically to another set?
For impatient readers, please note I'm fully aware of a brute force approach. I'm looking for a better way -- or a very convincing proof that no other way can exist.
I'm working on some different algorithms to generate random polyominos. I want to test their output to determine how random they are -- i.e. are certain instances of a given order generated more frequently than others. Visually, it is very easy to identify different orientations of a free polyomino, for example the following Wikipedia illustration shows all 8 orientations of the "F" pentomino (Source):
How would one put a number on this polyomino - that is, hash a free polyomino? I don't want to depend on a prepolulated list of "named" polyominos. Broadly agreed-upon names only exists for orders 4 and 5, anyway.
This is not necessarily equavalent to enumerating all free (or one-sided, or fixed) polyominos of a given order. I only want to count the number of times a given configuration appears. If a generating algorithm never produces a certain polyomino it will simply not be counted.
The basic logic of the counting is:
testcount = 10000 // Arbitrary
order = 6 // Create hexominos in this test
hashcounts = new hashtable
for i = 1 to testcount
poly = GenerateRandomPolyomino(order)
hash = PolyHash(poly)
if hashcounts.contains(hash) then
hashcounts[hash]++
else
hashcounts[hash] = 1
What I'm looking for is an efficient PolyHash algorithm. The input polyominos are simply defined as a set of coordinates. One orientation of the T tetronimo could be, for example:
[[1,0], [0,1], [1,1], [2,1]]:
|012
-+---
0| X
1|XXX
You can assume that that input polyomino will already be normalized to be aligned against the X and Y axes and have only positive coordinates. Formally, each set:
Will have at least 1 coordinate where the x value is 0
Will have at least 1 coordinate where the y value is 0
Will not have any coordinates where x < 0 or y < 0
I'm really looking for novel algorithms that avoid the increasing number of integer operations required by a general brute force approach, described below.
Brute force
A brute force solution suggested here and here consists of hashing each set as an unsigned integer using each coordinate as a binary flag, and taking the minimum hash of all possible rotations (and in my case flips), where each rotation / flip must also be translated to the origin. This results in a total of 23 set operations for each input set to get the "free" hash:
Rotate (6x)
Flip (1x)
Translate (7x)
Hash (8x)
Find minimum of computed hashes (1x)
Where the sequence of operations to obtain each hash is:
Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
Flip, Translate, Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
Well, I came up with a completely different approach. (Also thanks to corsiKa for some helpful insights!) Rather than hashing / encoding the squares, encode the path around them. The path consists of a sequence of 'turns' (including no turn) to perform before drawing each unit segment. I think an algorithm for getting the path from the coordinates of the squares is outside the scope of this question.
This does something very important: it destroys all location and orientation information, which we don't need. It is also very easy to get the path of the flipped object: you do so by simply reversing the order of the elements. Storage is compact because each element requires only 2 bits.
It does introduce one additional constraint: the polyomino must not have fully enclosed holes. (Formally, it must be simply connected.) Most discussions of polyominos consider a hole to exist even if it is sealed only by two touching corners, as this prevents tiling with any other non-trivial polyomino. Tracing the edges is not hindered by touching corners (as in the single heptomino with a hole), but it cannot leap from one outer loop to an inner one as in the complete ring-shaped octomino:
It also produces one additional challenge: finding the minumum ordering of the encoded path loop. This is because any rotation of the path (in the sense of string rotation) is a valid encoding. To always get the same encoding we have to find the minimal (or maximal) rotation of the path instructions. Thankfully this problem has already been solved: see for example http://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation.
Example:
If we arbitrarily assign the following values to the move operations:
No turn: 1
Turn right: 2
Turn left: 3
Here is the F pentomino traced clockwise:
An arbitrary initial encoding for the F pentomino is (starting at the bottom right corner):
2,2,3,1,2,2,3,2,2,3,2,1
The resulting minimum rotation of the encoding is
1,2,2,3,1,2,2,3,2,2,3,2
With 12 elements, this loop can be packed into 24 bits if two bits are used per instruction or only 19 bits if instructions are encoded as powers of three. Even with the 2-bit element encoding can easily fit that in a single unsigned 32 bit integer 0x6B6BAE:
1- 2- 2- 3- 1- 2- 2- 3- 2- 2- 3- 2
= 01-10-10-11-01-10-10-11-10-10-11-10
= 00000000011010110110101110101110
= 0x006B6BAE
The base-3 encoding with the start of the loop in the most significant powers of 3 is 0x5795F:
1*3^11 + 2*3^10 + 2*3^9 + 3*3^8 + 1*3^7 + 2*3^6
+ 2*3^5 + 3*3^4 + 2*3^3 + 2*3^2 + 3*3^1 + 2*3^0
= 0x0005795F
The maximum number of vertexes in the path around a polyomino of order n is 2n + 2. For 2-bit encoding the number of bits is twice the number of moves, so the maximum bits needed is 4n + 4. For base-3 encoding it's:
Where the "gallows" is the ceiling function. Accordingly any polyomino up to order 9 can be encoded in a single 32 bit integer. Knowing this you can choose your platform-specific data structure accordingly for the fastest hash comparison given the maximum order of the polyominos you'll be hashing.
You can reduce it down to 8 hash operations without the need to flip, rotate, or re-translate.
Note that this algorithm assumes you are operating with coordinates relative to itself. That is to say it's not in the wild.
Instead of applying operations that flip, rotate, and translate, instead simply change the order in which you hash.
For instance, let us take the F pent above. In the simple example, let us presume the hash operation was something like this:
int hashPolySingle(Poly p)
int hash = 0
for x = 0 to p.width
fory = 0 to p.height
hash = hash * 31 + p.contains(x,y) ? 1 : 0
hashPolySingle = hash
int hashPoly(Poly p)
int hash = hashPolySingle(p)
p.rotateClockwise() // assume it translates inside
hash = hash * 31 + hashPolySingle(p)
// keep rotating for all 4 oritentations
p.flip()
// hash those 4
Instead of applying the function to all 8 different orientations of the poly, I would apply 8 different hash functions to 1 poly.
int hashPolySingle(Poly p, bool flip, int corner)
int hash = 0
int xstart, xstop, ystart, ystop
bool yfirst
switch(corner)
case 1: xstart = 0
xstop = p.width
ystart = 0
ystop = p.height
yfirst = false
break
case 2: xstart = p.width
xstop = 0
ystart = 0
ystop = p.height
yfirst = true
break
case 3: xstart = p.width
xstop = 0
ystart = p.height
ystop = 0
yfirst = false
break
case 4: xstart = 0
xstop = p.width
ystart = p.height
ystop = 0
yfirst = true
break
default: error()
if(flip) swap(xstart, xstop)
if(flip) swap(ystart, ystop)
if(yfirst)
for y = ystart to ystop
for x = xstart to xstop
hash = hash * 31 + p.contains(x,y) ? 1 : 0
else
for x = xstart to xstop
for y = ystart to ystop
hash = hash * 31 + p.contains(x,y) ? 1 : 0
hashPolySingle = hash
Which is then called in the 8 different ways. You could also encapsulate hashPolySingle in for loop around the corner, and around the flip or not. All the same.
int hashPoly(Poly p)
// approach from each of the 4 corners
int hash = hashPolySingle(p, false, 1)
hash = hash * 31 + hashPolySingle(p, false, 2)
hash = hash * 31 + hashPolySingle(p, false, 3)
hash = hash * 31 + hashPolySingle(p, false, 4)
// flip it
hash = hash * 31 + hashPolySingle(p, true, 1)
hash = hash * 31 + hashPolySingle(p, true, 2)
hash = hash * 31 + hashPolySingle(p, true, 3)
hash = hash * 31 + hashPolySingle(p, true, 4)
hashPoly = hash
In this way, you're implicitly rotating the poly from each direction, but you're not actually performing the rotation and translation. It performs the 8 hashes, which seem to be entirely necessary in order to accurately hash all 8 orientations, but wastes no passes over the poly that are not actually doing hashes. This seems to me to be the most elegant solution.
Note that there may be a better hashPolySingle() algorithm to use. Mine uses a Cartesian exhaustion algorithm that is on the order of O(n^2). Its worst case scenario is an L shape, which would cause there to be an N/2 * (N-1)/2 sized square for only N elements, or an efficiency of 1:(N-1)/4, compared to an I shape which would be 1:1. It may also be that the inherent invariant imposed by the architecture would actually make it less efficient than the naive algorithm.
My suspicion is that the above concern can be alleviated by simulating the Cartesian exhaustion by converting the set of nodes into an bi-directional graph that can be traversed, causing the nodes to be hit in the same order as my much more naive hashing algorithm, ignoring the empty spaces. This will bring the algorithm down to O(n) as the graph should be able to be constructed in O(n) time. Because I haven't done this, I can't say for sure, which is why I say it's only a suspicion, but there should be a way to do it.
Here's my DFS (depth first search) explained:
Start with the top-most cell (left-most as a tiebreaker). Mark it as visited. Every time you visit a cell, check all four directions for unvisited neighbors. Always check the four directions in this order: up, left, down, right.
Example
In this example, up and left fail, but down succeeds. So far our output is 001, and we recursively search the "down" cell.
We mark our new current cell as visited (and we'll finish searching the original cell when we finish searching this cell). Here, up=0, left=1.
We search the left-most cell and there are no unvisted neighbors (up=0, left=0, down=0, right=0). Our total output so far is 001010000.
We continue our search of the second cell. down=0, right=1. We search the cell to the right.
up=0, left=0, down=1. Search the down cell: all 0s. Total output so far is 001010000010010000. Then, we return from the down cell...
right=0, return. return. (Now, we are at the starting cell.) right=0. Done!
So, the total output is 20 (N*4) bits: 00101000001001000000.
Encoding improvement
But, we can save some bits.
The last visited cell will always encode 0000 for its four directions. So, don't encode the last visited cell to save 4 bits.
Another improvement: if you reached a cell by moving left, don't check that cells right-side. So, we only need 3 bits per cell, except 4 bits for the first cell, and 0 for the last cell.
The first cell will never have an up, or left neighbor, so omit these bits. So the first cell takes 2 bits.
So, with these improvements, we use only N*3-4 bits (e.g. 5 cells -> 11 bits; 9 cells -> 23 bits).
If you really want, you can compact a little more by noting that exactly N-1 bits will be "1".
Caveat
Yes, you'll need to encode all 8 rotations/flips of the polyomino and choose the least to get a canonical encoding.
I suspect this will still be faster than the outline approach. Also, holes in the polyomino shouldn't be a problem.
I worked on the same problem recently. I solved the problem fairly simply by
(1) generate a unique ID for a polyomino, such that each identical poly would have the same UID. For example, find the bounding box, normalize the corner of the bounding box, and collect the set of non-empty cells.
(2) generate all possible permutations by rotating (and flipping, if appropriate) a polyomino, and look for duplicates.
The advantage of this brute approach, other than it's simplicity, is that it still works if the
polys are distinguishable in some other way, for example if some of them are colored or numbered.
You can set up something like a trie to uniquely identify (and not just hash) your polyomino. Take your normalized polyomino and set up a binary search tree, where the root branches on whether (0,0) is has a set pixel, the next level branches on whether (0,1) has a set pixel, and so on. When you look up a polyomino, simply normalize it and then walk the tree. If you find it in the trie, then you're done. If not, assign that polyomino a unique id (just increment a counter), generate all 8 possible rotations and flips, then add those 8 to the trie.
On a trie miss, you'll have to generate all the rotations and reflections. But on a trie hit it should cost less (O(k^2) for k-polyominos).
To make lookups even more efficient, you could use a couple bits at a time and use a wider tree instead of a binary tree.
A valid hash function, if you're really afraid of hash collisions, is to make a hash function x + order * y for coordinates and then loop trough all the coordinates of a piece, adding (order ^ i) * hash(coord[i]) to the piece hash. That way, you can guarantee you won't get any hash collisions.

Distributing points over a surface within boundries

I'm interested in a way (algorithm) of distributing a predefined number of points over a 4 sided surface like a square.
The main issue is that each point has got to have a minimum and maximum proximity to each other (random between two predefined values). Basically the distance of any two points should not be closer than let's say 2, and a further than 3.
My code will be implemented in ruby (the points are locations, the surface is a map), but any ideas or snippets are definitely welcomed as all my ideas include a fair amount of brute force.
Try this paper. It has a nice, intuitive algorithm that does what you need.
In our modelization, we adopted another model: we consider each center to be related to all its neighbours by a repulsive string.
At the beginning of the simulation, the centers are randomly distributed, as well as the strengths of the
strings. We choose randomly to move one center; then we calculate the resulting force caused by all
neighbours of the given center, and we calculate the displacement which is proportional and oriented
in the sense of the resulting force.
After a certain number of iterations (which depends on the number of
centers and the degree of initial randomness) the system becomes stable.
In case it is not clear from the figures, this approach generates uniformly distributed points. You may use instead a force that is zero inside your bounds (between 2 and 3, for example) and non-zero otherwise (repulsive if the points are too close, attractive if too far).
This is my Python implementation (sorry, I donĀ“t know ruby). Just import this and call uniform() to get a list of points.
import numpy as np
from numpy.linalg import norm
import pylab as pl
# find the nearest neighbors (brute force)
def neighbors(x, X, n=10):
dX = X - x
d = dX[:,0]**2 + dX[:,1]**2
idx = np.argsort(d)
return X[idx[1:11]]
# repulsion force, normalized to 1 when d == rmin
def repulsion(neib, x, d, rmin):
if d == 0:
return np.array([1,-1])
return 2*(x - neib)*rmin/(d*(d + rmin))
def attraction(neib, x, d, rmax):
return rmax*(neib - x)/(d**2)
def uniform(n=25, rmin=0.1, rmax=0.15):
# Generate randomly distributed points
X = np.random.random_sample( (n, 2) )
# Constants
# step is how much each point is allowed to move
# set to a lower value when you have more points
step = 1./50.
# maxk is the maximum number of iterations
# if step is too low, then maxk will need to increase
maxk = 100
k = 0
# Force applied to the points
F = np.zeros(X.shape)
# Repeat for maxk iterations or until all forces are zero
maxf = 1.
while maxf > 0 and k < maxk:
maxf = 0
for i in xrange(n):
# Force calculation for the i-th point
x = X[i]
f = np.zeros(x.shape)
# Interact with at most 10 neighbors
Neib = neighbors(x, X, 10)
# dmin is the distance to the nearest neighbor
dmin = norm(Neib[0] - x)
for neib in Neib:
d = norm(neib - x)
if d < rmin:
# feel repulsion from points that are too near
f += repulsion(neib, x, d, rmin)
elif dmin > rmax:
# feel attraction if there are no neighbors closer than rmax
f += attraction(neib, x, d, rmax)
# save all forces and the maximum force to normalize later
F[i] = f
if norm(f) <> 0:
maxf = max(maxf, norm(f))
# update all positions using the forces
if maxf > 0:
X += (F/maxf)*step
k += 1
if k == maxk:
print "warning: iteration limit reached"
return X
I presume that one of your brute force ideas includes just repeatedly generating points at random and checking to see if the constraints happen to be satisified.
Another way is to take a configuration that satisfies the constraints and repeatedly perturb a small part of it, chosen at random - for instance move a single point - to move to a randomly chosen nearby configuration. If you do this often enough you should move to a random configuration that is almost independent of the starting point. This could be justified under http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm or http://en.wikipedia.org/wiki/Gibbs_sampling.
I might try just doing it at random, then going through and dropping points that are to close to other points. You can compare the square of the distance to save some math time.
Or create cells with borders and place a point in each one. Less random, it depends on if this is a "just for looks thing" or not. But it could be very fast.
I made a compromise and ended up using the Poisson Disk Sampling method.
The result was fairly close to what I needed, especially with a lower number of tries (which also drastically reduces cost).

Resources