cudnnRNNForwardTraining seqLength / xDesc usage - cudnn

Let's say I have N sequences x[i], each with length seqLength[i] for 0 <= i < N. As far as I understand from the cuDNN docs, they have to be ordered by sequence length, the longest first, so assume that seqLength[i] >= seqLength[i+1]. Let's say that they have the feature dimension D, so x[i] is a 2D tensor of shape (seqLength[i], D). As far as I understand, I should prepare a tensor x where all x[i] are contiguously behind each other, i.e. it would be of shape (sum(seqLength), D).
According to the cuDNN docs, the functions cudnnRNNForwardInference / cudnnRNNForwardTraining take the arguments int seqLength and cudnnTensorDescriptor_t* xDesc, where:
seqLength: Number of iterations to unroll over.
xDesc: Array of tensor descriptors. Each must have the same second dimension. The first dimension may decrease from element n to element n + 1 but may not increase.
I'm not exactly sure I understand this correctly.
Is seqLength my max(seqLength)?
And xDesc is an array. Of what length? max(seqLength)? If so, I assume that it describes one batch of features for each frame, but some of the later frames will have fewer sequences in them. It sounds like the number of sequences per frame is described by the first dimension.
So:
xDesc[t].shape[0] = len([i for i in range(N) if t < seqLength[i]])
for all 0 <= t < max(seqLength). I.e. 0 <= xDesc[t].shape[0] <= N.
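To make the conjecture above concrete, here is a minimal Python sketch of the per-frame batch sizes it implies. It only illustrates the formula from the question (the helper name `frame_batch_sizes` is made up for illustration) and assumes the sequences are already sorted longest first; it does not confirm what cuDNN actually expects.

```python
def frame_batch_sizes(seq_lengths):
    """Number of sequences still active at each frame t (hypothetical helper)."""
    # Assumption from the question: sequences are sorted by length, longest first.
    assert all(a >= b for a, b in zip(seq_lengths, seq_lengths[1:]))
    max_len = seq_lengths[0] if seq_lengths else 0
    return [sum(1 for n in seq_lengths if t < n) for t in range(max_len)]


sizes = frame_batch_sizes([5, 3, 2])
print(sizes)       # batch size per frame; it never increases from t to t + 1
print(sum(sizes))  # total number of frames, i.e. sum(seq_lengths)
```

Under this reading, the array would have max(seqLength) entries, and the first dimension of entry t would be the t-th value of that list.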
How many dimensions does each xDesc[t] describe, i.e. what is len(xDesc[t].shape)? I would assume that it is 2 and the second dimension is the feature dimension, i.e. D, i.e.:
xDesc[t].shape = (len(...), D)
The strides would have to be set accordingly, although it's also not totally clear. If x is stored in row-major order, then
xDesc[0].strides[0] = D * xDesc[0].shape[0]
xDesc[0].strides[1] = 1
But how does cuDNN compute the offset for frame t? I guess it will keep track and thus calculate sum([xDesc[t2].strides[0] for t2 in range(t)]).
Most example code I have seen assumes that all sequences are of the same length. It also always describes 3 dimensions per xDesc[t], not 2. Why is that? The third dimension is always 1, as well as the stride of the second and third dimension, and the stride for the first dimension is N. So this assumes that the tensor x is row-major ordered and of shape (max(seqLength), N, D). The code is actually a bit strange. E.g. from TensorFlow:
int dims[] = {batch_size, data_size, 1};
int strides[] = {dims[1] * dims[2], dims[2], 1};
cudnnSetTensorNdDescriptor(
...,
sizeof(dims) / sizeof(dims[0]) /*nbDims*/, dims /*dimA*/,
strides /*strideA*/);
The code looks really similar in all examples I have found. Search for cudnnSetTensorNdDescriptor or cudnnRNNForwardTraining. E.g.:
TensorFlow (issue 6633)
Theano
mxnet
Torch
Baidu persistent-rnn
Caffe2
Chainer
I found one example which can handle sequences of different length. Again search for cudnnSetTensorNdDescriptor:
Microsoft CNTK
That claims that there must be 3 dimensions for every xDesc[t]. It has the comment:
these dimensions are what CUDNN expects: (the minibatch dimension, the data dimension, and the number 1 (because each descriptor describes one frame of data)
Edit: Support for this was added to PyTorch at the end of 2018, in this commit.
Am I missing something from the cuDNN documentation? I really have not found that information in it.
My question is basically: is my conclusion about how to set the arguments x, seqLength and xDesc for cudnnRNNForwardInference / cudnnRNNForwardTraining correct, along with my implicit assumptions? If not, how should I use these functions, what does the memory layout look like, etc.?


Understanding image steganography by LSB substitution method

I am having a very hard time understanding the LSB-based steganography method given in Section 2. The examples on the internet are very confusing and unclear. I am following the Matlab implementation https://www.mathworks.com/matlabcentral/fileexchange/41326-steganography-using-lsb-substitution and the paper titled "A SURVEY ON IMAGE STEGANOGRAPHY USING LSB SUBSTITUTION TECHNIQUE" (download link: https://irjet.net/archives/V4/i5/IRJET-V4I566.pdf).
Section 5 of this paper gives an example of the LSB based method. Suppose P1 = [10011011], P2 = [01101010], P3 = [11001100] are the 3 bytes of the cover image into which the message M = [011] is to be embedded. The result of the embedding is P1 = [10011010], P2 = [01101011], P3 = [11001101].
I am clueless how this answer comes. Can somebody please help in giving the steps/working example to clear the concept?
Based on my understanding of the Matlab code,
Stego = uint8(round(bitor(bitand(x, bitcmp(2^n - 1, 8)) , bitshift(y, n - 8))));
if n is the number of bits to be substituted, then the group of n bits is replaced by doing a complement/comparison of the n bits of the cover image (variable x) with the n bits of the message (variable y). If the bits are the same, there is no replacement; otherwise the bits are swapped. I don't know if my understanding is correct or not.
Your confusion stems from the fact that all 3 sources you're looking at talk about something different.
Paper 2, section 5
This describes the most basic form of LSB pixel substitution steganography. Each pixel is described by 8 bits. For each pixel we clear out the LSB and substitute it with one bit of the secret message. For example,
pixels = [xxxxxxxa, xxxxxxxb]
message = [c, d]
stego_pixels = [xxxxxxxc, xxxxxxxd]
Where x, a, b, c and d are bits and we don't care what x is.
Paper 1, section 2
This is the generalised form of LSB pixel substitution steganography. Instead of embedding the secret in the LSB, you embed it in the k least significant bits. If k = 1, then we have the simple form described above. The mathematical equations in this section mean the following:
We have an image of size MxN, with each pixel having a value between 0 and 255 (8 bits).
We have an n-bit message, with each bit being either 0 or 1. For example, for 12 bits it'd be m = [a, b, c, d, e, f, g, h, i, j, k, l].
Since we'll be embedding k bits per pixel, we group our message bits in groups of k. Assume that k = 3, then, m' = [abc, def, ghi, jkl]. Obviously, each group can have a value between 0 and 2^k - 1. Furthermore, the number of groups in m' cannot exceed the size of our image, or we won't be able to embed the whole message.
We clear out the last k bits from each pixel and we substitute them with one group of m'. When you take the modulo of a number with 2^k, the remainder you get is the last k bits of the original number. So by subtracting them, we clear out the last k bits.
Similar to the previous step, if we want to extract the message, we take the modulo of each pixel with 2^k to get the last k bits, where we have embedded our message. It's trivial to stitch these bit groups and obtain the original message, m, back.
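The modulo-based steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (pixels and message bits are plain lists of integers, and the function names `embed_lsb` / `extract_lsb` are made up here); it is not the code from either paper.

```python
def embed_lsb(pixels, message_bits, k=1):
    """Replace the k least significant bits of each pixel with one group of message bits."""
    if len(message_bits) % k != 0:
        raise ValueError("message length must be a multiple of k")
    groups = [message_bits[i:i + k] for i in range(0, len(message_bits), k)]
    if len(groups) > len(pixels):
        raise ValueError("message does not fit into the cover")
    stego = list(pixels)
    for idx, group in enumerate(groups):
        value = int("".join(str(b) for b in group), 2)  # group value in 0 .. 2^k - 1
        # pixel mod 2^k is the last k bits; subtracting it clears them
        stego[idx] = pixels[idx] - (pixels[idx] % 2**k) + value
    return stego


def extract_lsb(stego, n_bits, k=1):
    """Read back n_bits message bits, k bits per pixel."""
    bits = []
    for p in stego[: n_bits // k]:
        group = p % 2**k  # the last k bits hold the message group
        bits.extend(int(c) for c in format(group, "0{}b".format(k)))
    return bits


cover = [0b10011011, 0b01101010, 0b11001100]
stego = embed_lsb(cover, [0, 1, 1], k=1)
print([format(p, "08b") for p in stego])
print(extract_lsb(stego, 3, k=1))
```

With k = 1 and the three bytes from the question, this reproduces the embedding quoted from Section 5 of the survey paper.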
Matlab code
This code hides an image of size MxN in a cover image of the same size. The idea here is that the MSB holds the most information about the image and the LSB the least. For example, this is the bit plane decomposition of this image.
If we decide to hide k bits from the secret image, we want those to be the k most significant bits. Similarly, we can hide them in the k least significant bits of the cover image. The larger the k value, the more bits you'll hide from the secret for a more faithful reconstruction, but the more distortion you'll introduce to the cover.
Let's break down the nested functions in the code to see what they do. I'll use k instead of n to maintain consistency with the above sections.
bitcmp(2^k - 1, 8) creates the complement of 2^k - 1 for 8 bits. For example, if k = 3, then 2^k - 1 is like having the bits 00000111 and obviously its complement is 11111000. We're going to use this number as a mask, so mask = bitcmp(2^k - 1, 8).
bitand(x, mask) zeroes out the last k bits of the cover image, x. This is the bitwise AND operation and the reason we called the second part a mask is because anywhere we have a 1, we keep the original bit and anywhere we have a 0, we get a 0. Let's call cleared_pixels = bitand(x, mask).
bitshift(y, k - 8) keeps only the k most significant bits of the secret. For example, for k = 3 we achieve abcxxxxx -> 00000abc. This is done by shifting the number 8 - k = 5 places to the right; in MATLAB a negative shift count means a right shift, which is why the code passes n - 8. This is a logical shift operation. We'll call this result secret = bitshift(y, k - 8).
Finally, bitor(cleared_pixels, secret) simply combines the two together. The cleared pixels have the last k bits cleared and the secret is at most k bits, so the two parts don't interact; we get a pure combination.
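For readers who do not use MATLAB, the same mask, shift and combine steps can be sketched per pixel in Python. The function names are made up for illustration, and the sketch assumes 8-bit pixel values; `recover_pixel` is an added convenience that restores only the k bits that were hidden.

```python
def hide_pixel(cover, secret, k):
    """Put the k most significant bits of `secret` into the k least significant bits of `cover`."""
    mask = ~(2**k - 1) & 0xFF      # e.g. k = 3 -> 0b11111000
    cleared = cover & mask         # zero out the last k bits of the cover
    top_bits = secret >> (8 - k)   # abcxxxxx -> 00000abc
    return cleared | top_bits      # the two parts do not overlap


def recover_pixel(stego, k):
    """Approximate the secret pixel from the k hidden bits (lower bits are lost)."""
    return (stego & (2**k - 1)) << (8 - k)


stego = hide_pixel(0b10110101, 0b11100110, 3)
print(format(stego, "08b"), format(recover_pixel(stego, 3), "08b"))
```

A larger k keeps more of the secret in the recovered value but changes more of the cover pixel, which is the trade-off described above.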

MATLAB: Speeding up a discretization function using bsxfun

For a current project, I have to discretize quasi-continuous values into bins defined by some pre-defined binning resolution. For this purpose, I have written a function, which I expected to be highly efficient as it is able to both process scalar inputs as well as vector inputs using bsxfun. However, after some profiling, I found out that almost all processing time of my much larger project is produced in this function, and within the function, it's mainly the bsxfun part that takes time, with the min-query following on second place. Long story short, I am looking for advice on how to solve this task MUCH faster in MATLAB. Side note: I am usually passing vectors with some 50k elements.
Here's the code:
function sampleNo = value2sample(value,bins)
%Make sure both vectors have orientations fitting bsxfun
value = value(:);
bins = bins(:)';
%Recover bin resolution (avoids passing another parameter)
delta = median(diff(bins));
%Calculate distance matrix between all combinations
dist = abs(bsxfun(@minus,value,bins));
%What we really want to know is the minimum distance per row
[minval,ind] = min(dist,[],2);
%Make sure we don't accidentally further process NaNs as 1st bin
ind(isnan(minval))=NaN;
sampleNo = ind;
sampleNo(minval>delta) = NaN;
end
The reason that your function is slow is because you are computing the distance between every element of values and bins and storing them all in an array - if there are N values and M bins then you will require NM elements to store all the distances, and this is probably a really big number (e.g. if each input has 50,000 elements then you need 2.5 billion elements in the output array).
Moreover, since your bins are sorted (you didn't state this, but it looks like you are assuming it in your code) you do not need to compute the distance from every value to every bin. You can be much smarter:
function ind = value2sample(value, bins)
% Find median bin distance
delta = median(diff(bins));
% Bucket into 'nearest' bin by using midpoints
value = value(:);
bins = bins(:);
% The trailing Inf keeps values above the last midpoint in the last bin
mids = [-Inf; 0.5 * (bins(1:end-1) + bins(2:end)); Inf];
[~, ind] = histc(value, mids);
% Ensure that NaN values and points that aren't near any bin are returned as NaN
% (histc returns 0 for NaN, so those entries must not be used as indices)
ok = ind >= 1 & ind <= numel(bins);
far = false(size(ind));
far(ok) = abs(value(ok) - bins(ind(ok))) > delta;
ind(~ok | far) = NaN;
end
In my tests, with values = randn(10000, 1) and bins = -50:50 it takes around 4.5 milliseconds to run the original function, and 485 microseconds to run the code above, so you are getting around a 10x speedup (and the speedup will be even greater as you increase the size of the inputs).
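The midpoint idea is independent of MATLAB. As a rough illustration, here is the same lookup in Python using a binary search over the midpoints. The function mirrors the name used above but is only a sketch: it returns 0-based indices, uses None instead of NaN, and assumes `bins` is sorted in ascending order.

```python
import math
from bisect import bisect_right
from statistics import median


def value2sample(values, bins):
    """Index of the nearest bin for each value, or None if no bin is close enough."""
    bins = list(bins)
    delta = median(b - a for a, b in zip(bins, bins[1:]))  # typical bin spacing
    mids = [(a + b) / 2 for a, b in zip(bins, bins[1:])]   # borders between bins
    result = []
    for v in values:
        if math.isnan(v):
            result.append(None)
            continue
        i = bisect_right(mids, v)  # O(log M) instead of comparing with every bin
        result.append(i if abs(v - bins[i]) <= delta else None)
    return result


print(value2sample([0.4, 0.6, 2.9, 10.0], [0, 1, 2, 3]))
```

Each value costs one binary search, so memory stays proportional to the number of values plus the number of bins rather than their product.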
Thanks to @Chris Taylor, I was able to solve the problem very efficiently. The code now runs almost 400 times faster than before. The only changes I had to make from his version are reflected in the code below. The main issue was to replace histc (whose use is not encouraged anymore) by discretize.
function ind = value2sample(value, bins)
% Make sure the vectors are standing
value = value(:);
bins = bins(:);
% Bucket into 'nearest' bin by using midpoints
mids = [eps; 0.5 * (bins(1:end-1) + bins(2:end))];
ind = discretize(value, mids);
The only catch is that in this implementation your bins must be non-negative. Other than that, this code does exactly what I want, including the fact that ind has the same size as value and contains NaNs whenever a value is NaN or out of the range of bins.

Compare two arrays of points [closed]

Closed 9 years ago.
I'm trying to find a way to detect similarities between two arrays of points. I drew circles around points that have similar patterns, and I would like to do some kind of automatic comparison in intervals of, let's say, 100 points and report a similarity coefficient for each interval. As you can see, the arrays might not be perfectly aligned, so a point-to-point comparison would probably not be a good solution. Patterns that are slightly misaligned could still count as matching (but obviously with a smaller coefficient).
What similarity could mean (1 coefficient is a perfect match, 0 or less - is not a match at all):
Points 640 to 660 - Very similar (coefficient is ~0.8)
Points 670 to 690 - Quite similar (coefficient is ~0.5-~0.6)
Points 720 to 780 - Let's say quite similar (coefficient is ~0.5-~0.6)
Points 790 to 810 - Perfectly similar (coefficient is 1)
Coefficient is just my thoughts of how a final calculated result of comparing function could look like with given data.
I read many posts on SO but it didn't seem to solve my problem. I would appreciate your help a lot. Thank you
P.S. Perfect answer would be the one that provides pseudo code for function which could accept two data arrays as arguments (intervals of data) and return coefficient of similarity.
I also think High Performance Mark has basically given you the answer (cross-correlation). In my opinion, most of the other answers are only giving you half of what you need (i.e., dot product plus compare against some threshold). However, this won't consider a signal to be similar to a shifted version of itself. You'll want to compute this dot product N + M - 1 times, where N, M are the sizes of the arrays. For each iteration, compute the dot product between array 1 and a shifted version of array 2. The amount you shift array 2 increases by one each iteration. You can think of array 2 as a window you are passing over array 1. You'll want to start the loop with the last element of array 2 only overlapping the first element in array 1.
This loop will generate numbers for different amounts of shift, and what you do with that number is up to you. Maybe you compare it (or the absolute value of it) against a threshold that you define to consider two signals "similar".
Lastly, in many contexts, a signal is considered similar to a scaled (in the amplitude sense, not time-scaling) version of itself, so there must be a normalization step prior to computing the cross-correlation. This is usually done by scaling the elements of the array so that the dot product with itself equals 1. Just be careful to ensure this makes sense for your application numerically, i.e., integers don't scale very well to values between 0 and 1 :-)
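A minimal Python sketch of the loop described in this answer, assuming both inputs are plain lists of numbers. The function names are made up for illustration, and a real application would more likely use an FFT-based or library routine than these nested loops.

```python
import math


def normalize(v):
    """Scale v so that its dot product with itself equals 1 (if it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_correlation(a, b):
    """Dot product of a with b for every shift of b; returns {shift: score}."""
    a, b = normalize(a), normalize(b)
    n, m = len(a), len(b)
    scores = {}
    # Start with the last element of b overlapping the first element of a,
    # then slide b one step at a time: N + M - 1 shifts in total.
    for shift in range(-(m - 1), n):
        total = 0.0
        for j in range(m):
            i = shift + j
            if 0 <= i < n:
                total += a[i] * b[j]
        scores[shift] = total
    return scores


def best_match(a, b):
    """Shift with the highest score, together with that score."""
    scores = cross_correlation(a, b)
    shift = max(scores, key=scores.get)
    return shift, scores[shift]


print(best_match([0, 0, 1, 2, 1, 0, 0, 0], [1, 2, 1]))
```

The returned score can then be compared against a threshold of your choosing, as suggested above.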
i think High Performance Mark's suggestion is the standard way of doing the job.
a computationally lightweight alternative measure might be a dot product.
split both arrays into the same predefined index intervals.
consider the array elements in each interval as vector coordinates in high-dimensional space.
compute the dot product of both vectors.
if all array elements are non-negative, the dot product will not be negative. if the two vectors are perpendicular in their vector space, the dot product will be 0 (in fact that's how 'perpendicular' is usually defined in higher dimensions), and after scaling both vectors to unit length it attains its maximum of 1 for identical vectors.
if you accept the geometric notion of perpendicularity as a (dis)similarity measure, here you go.
caveat:
this is an ad hoc heuristic chosen for computational efficiency. i cannot tell you about mathematical/statistical properties of the process and separation properties - if you need rigorous analysis, however, you'll probably fare better with correlation theory anyway and should perhaps forward your question to math.stackexchange.com.
My Attempt:
Total_sum=0
1. For each index i in the range (m,n)
2. sum=0
3. k=Array1[i]*Array2[i]; t1=magnitude(Array1[i]); t2=magnitude(Array2[i]);
4. k=k/(t1*t2)
5. sum=sum+k
6. Total_sum=Total_sum+sum
Coefficient=Total_sum/(m-n)
If all values are equal, then sum would return 1 in each case and total_sum would return (m-n)*(1). Hence, when the same is divided by (m-n) we get the value as 1. If the graphs are exact opposites, we get -1 and for other variations a value between -1 and 1 is returned.
This is not so efficient when the y range or the x range is huge. But, I just wanted to give you an idea.
Another option would be to perform an extensive xnor.
1. For each index i in the range (m,n)
2. sum=1
3. k=Array1[i] xnor Array2[i];
4. k=k/((pow(2,number_of_bits))-1) //This will scale k down to a value between 0 and 1
5. sum=(sum+k)/2
Coefficient=sum
Is this helpful ?
You can define a distance metric for two vectors A and B of length N containing numbers in the interval [-1, 1] e.g. as
sum = 0
for i in 0 to N - 1:
    sum = sum + (A[i] - B[i])^2 // each term is in range 0 .. 4
sum = (sum / 4) / N // now in range 0 .. 1
This now returns distance 1 for vectors that are completely opposite (one is all 1, another all -1), and 0 for identical vectors.
You can translate this into your coefficient by
coeff = 1 - sum
However, this is a crude approach because it does not take into account the fact that there could be horizontal distortion or shift between the signals you want to compare, so let's look at some approaches for coping with that.
You can sort both your arrays (e.g. in ascending order) and then calculate the distance / coefficient. This returns more similarity than the original metric, and is agnostic towards permutations / shifts of the signal.
You can also calculate the differentials and calculate distance / coefficient for those, and then you can do that sorted also. Using differentials has the benefit that it eliminates vertical shifts. Sorted differentials eliminate horizontal shift but still recognize different shapes better than sorted original data points.
You can then e.g. average the different coefficients. Here is more complete code. The routine below calculates the coefficient for arrays A and B of the given size, and takes d many differentials (recursively) first. If sorted is true, the final (differentiated) array is sorted.
procedure calc(A, B, size, d, sorted):
    if (d > 0):
        A' = new array[size - 1]
        B' = new array[size - 1]
        for i in 0 to size - 2:
            A'[i] = (A[i + 1] - A[i]) / 2 // keep in range -1..1 by dividing by 2
            B'[i] = (B[i + 1] - B[i]) / 2
        return calc(A', B', size - 1, d - 1, sorted)
    else:
        if (sorted):
            A = sort(A)
            B = sort(B)
        sum = 0
        for i in 0 to size - 1:
            sum = sum + (A[i] - B[i]) * (A[i] - B[i])
        sum = (sum / 4) / size
        return 1 - sum // return the coefficient
procedure similarity(A, B, size):
    a = 0
    a = a + calc(A, B, size, 0, false)
    a = a + calc(A, B, size, 0, true)
    a = a + calc(A, B, size, 1, false)
    a = a + calc(A, B, size, 1, true)
    return a / 4 // take average
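The pseudocode above translates almost line for line into runnable Python. This sketch assumes, as the answer does, that both arrays have the same length (at least 2) and contain values in [-1, 1]; the loop replaces the recursion but computes the same thing.

```python
def calc(a, b, d=0, do_sort=False):
    """Similarity coefficient after taking d differentials and optionally sorting."""
    for _ in range(d):
        # Halve each difference so the values stay within -1 .. 1.
        a = [(a[i + 1] - a[i]) / 2 for i in range(len(a) - 1)]
        b = [(b[i + 1] - b[i]) / 2 for i in range(len(b) - 1)]
    if do_sort:
        a, b = sorted(a), sorted(b)
    total = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - (total / 4) / len(a)


def similarity(a, b):
    """Average of the four coefficients: raw / sorted, values / first differentials."""
    parts = [
        calc(a, b, 0, False),
        calc(a, b, 0, True),
        calc(a, b, 1, False),
        calc(a, b, 1, True),
    ]
    return sum(parts) / len(parts)


print(similarity([0.1, 0.5, -0.3], [0.1, 0.5, -0.3]))
print(similarity([1, 1, 1], [-1, -1, -1]))
```

Note that two constant but opposite signals still score well on the differential terms, because differentials remove vertical shifts by design.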
For something completely different, you could also run Fourier transform using FFT and then take a distance metric on the returning spectra.

matlab: optimum amount of points for linear fit

I want to make a linear fit to a few data points, as shown in the image. Since I know the intercept (in this case, say, 0.05), I want to fit only points which are in the linear region with this particular intercept. In this case it will be, let's say, points 5:22 (but not 22:30).
I'm looking for a simple algorithm to determine this optimal number of points, based on... hmm, that's the question... R^2? Any ideas how to do it?
I was thinking about probing R^2 for fits using points 1 to 2:30, 2 to 3:30, and so on, but I don't really know how to wrap it into a clear and simple function. For fits with fixed intercept I'm using polyfit0 (http://www.mathworks.com/matlabcentral/fileexchange/272-polyfit0-m). Thanks for any suggestions!
EDIT:
sample data:
intercept = 0.043;
x = 0.01:0.01:0.3;
y = [0.0530642513911393,0.0600786706929529,0.0673485248329648,0.0794662409166333,0.0895915873196170,0.103837395346484,0.107224784565365,0.120300492775786,0.126318699218730,0.141508831492330,0.147135757370947,0.161734674733680,0.170982455701681,0.191799936622712,0.192312642057298,0.204771365716483,0.222689541632988,0.242582251060963,0.252582727297656,0.267390860166283,0.282890010610515,0.292381165948577,0.307990544720676,0.314264952297699,0.332344368808024,0.355781519885611,0.373277721489254,0.387722683944356,0.413648156978284,0.446500064130389;];
What you have here is a rather difficult problem to find a general solution of.
One approach would be to compute all the slopes/intercepts between all consecutive pairs of points, and then do cluster analysis on the intercepts:
slopes = diff(y)./diff(x);
intercepts = y(1:end-1) - slopes.*x(1:end-1);
idx = kmeans(intercepts(:), 3); % kmeans expects one observation per row
x([idx; 3] == 2) % the points with the intercepts closest to the linear one (cluster numbering is arbitrary, so check which label that is)
This requires the statistics toolbox (for kmeans). This is the best of all methods I tried, although the range of points found this way might have a few small holes in it; e.g., when the slopes of two points in the start and end range lie close to the slope of the line, these points will be detected as belonging to the line. This (and other factors) will require a bit more post-processing of the solution found this way.
Another approach (which I failed to construct successfully) is to do a linear fit in a loop, each time increasing the range of points from some point in the middle towards both of the endpoints, and see if the sum of the squared error remains small. This I gave up very quickly, because defining what "small" is is very subjective and must be done in some heuristic way.
I tried a more systematic and robust version of the above:
function test
%% example data
slope = 2;
intercept = 1.5;
x = linspace(0.1, 5, 100).';
y = slope*x + intercept;
y(1:12) = log(x(1:12)) + y(12)-log(x(12));
y(74:100) = y(74:100) + (x(74:100)-x(74)).^8;
y = y + 0.2*randn(size(y));
%% simple algorithm
[X,fn] = fminsearch(@(ii)P(ii, x,y,intercept), [0.5 0.5])
[~,inds] = P(X, x,y,intercept)
end
function [C, inds] = P(ii, x,y,intercept)
% ii represents fraction of range from center to end,
% So ii lies between 0 and 1.
N = numel(x);
n = round(N/2);
ii = round(ii*n);
inds = min(max(1, n+(-ii(1):ii(2))), N);
% Solve linear system with fixed intercept
A = x(inds);
b = y(inds) - intercept;
% and return the sum of squared errors, divided by
% the number of points included in the set. This
% last step is required to prevent fminsearch from
% reducing the set to 1 point (= minimum possible
% squared error).
C = sum(((A\b)*A - b).^2)/numel(inds);
end
which only finds a rough approximation to the desired indices (12 and 74 in this example).
When fminsearch is run a few dozen times with random starting values (really just rand(1,2)), it gets more reliable, but I still wouldn't bet my life on it.
If you have the statistics toolbox, use the kmeans option.
Depending on the number of data values, I would split the data into a relatively small number of overlapping segments, and for each segment calculate the linear fit, or rather the first-order coefficient (remember you know the intercept, which will be the same for all segments).
Then, for each coefficient calculate the MSE between this hypothetical line and entire dataset, choosing the coefficient which yields the smallest MSE.
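The segment idea can be sketched as follows in Python. Everything here is an assumption made for illustration: the segment length, the step between segments and the function names are arbitrary choices, and with the intercept fixed the least-squares slope reduces to sum(x * (y - c)) / sum(x^2).

```python
def fit_slope(xs, ys, intercept):
    """Least-squares slope of y = intercept + slope * x with the intercept held fixed."""
    return sum(x * (y - intercept) for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mse(xs, ys, intercept, slope):
    """Mean squared error of the line against the given points."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def best_segment_slope(xs, ys, intercept, seg_len=8, step=4):
    """Fit each overlapping segment, score the line on the whole data set, keep the best.

    Returns (error, slope, segment_start), or None if no segment fits in the data.
    """
    best = None
    for start in range(0, len(xs) - seg_len + 1, step):
        seg = slice(start, start + seg_len)
        slope = fit_slope(xs[seg], ys[seg], intercept)
        err = mse(xs, ys, intercept, slope)
        if best is None or err < best[0]:
            best = (err, slope, start)
    return best


xs = [0.1 * i for i in range(1, 21)]
ys = [0.5 + 2.0 * x for x in xs]
print(best_segment_slope(xs, ys, 0.5))
```

Whether scoring against the entire data set really favours the linear region depends on how strongly the non-linear ends deviate, so treat the result as a starting point rather than a final answer.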

Algorithm to identify a unique free polyomino (or polyomino hash)

In short: How to hash a free polyomino?
This could be generalized into: How to efficiently hash an arbitrary collection of 2D integer coordinates, where a set contains unique pairs of non-negative integers, and a set is considered unique if and only if no translation, rotation, or flip can map it identically to another set?
For impatient readers, please note I'm fully aware of a brute force approach. I'm looking for a better way -- or a very convincing proof that no other way can exist.
I'm working on some different algorithms to generate random polyominos. I want to test their output to determine how random they are -- i.e. are certain instances of a given order generated more frequently than others. Visually, it is very easy to identify different orientations of a free polyomino, for example the following Wikipedia illustration shows all 8 orientations of the "F" pentomino (Source):
How would one put a number on this polyomino - that is, hash a free polyomino? I don't want to depend on a prepopulated list of "named" polyominos. Broadly agreed-upon names only exist for orders 4 and 5, anyway.
This is not necessarily equivalent to enumerating all free (or one-sided, or fixed) polyominos of a given order. I only want to count the number of times a given configuration appears. If a generating algorithm never produces a certain polyomino it will simply not be counted.
The basic logic of the counting is:
testcount = 10000 // Arbitrary
order = 6 // Create hexominos in this test
hashcounts = new hashtable
for i = 1 to testcount
    poly = GenerateRandomPolyomino(order)
    hash = PolyHash(poly)
    if hashcounts.contains(hash) then
        hashcounts[hash]++
    else
        hashcounts[hash] = 1
What I'm looking for is an efficient PolyHash algorithm. The input polyominos are simply defined as a set of coordinates. One orientation of the T tetromino could be, for example:
[[1,0], [0,1], [1,1], [2,1]]:
 |012
-+---
0| X
1|XXX
You can assume that the input polyomino will already be normalized to be aligned against the X and Y axes and have only non-negative coordinates. Formally, each set:
Will have at least 1 coordinate where the x value is 0
Will have at least 1 coordinate where the y value is 0
Will not have any coordinates where x < 0 or y < 0
I'm really looking for novel algorithms that avoid the increasing number of integer operations required by a general brute force approach, described below.
Brute force
A brute force solution suggested here and here consists of hashing each set as an unsigned integer using each coordinate as a binary flag, and taking the minimum hash of all possible rotations (and in my case flips), where each rotation / flip must also be translated to the origin. This results in a total of 23 set operations for each input set to get the "free" hash:
Rotate (6x)
Flip (1x)
Translate (7x)
Hash (8x)
Find minimum of computed hashes (1x)
Where the sequence of operations to obtain each hash is:
Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
Flip, Translate, Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
Rotate, Translate, Hash
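For reference, a compact Python sketch of this brute-force scheme. It is only an illustration: the function names are made up, the bit position `y * n + x` is one possible way to use each coordinate as a binary flag, and for simplicity the loop performs a few more rotations and flips than the 23 operations counted above.

```python
def normalize(cells):
    """Translate the cells so that the smallest x and the smallest y are both 0."""
    cells = list(cells)
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)


def rotate(cells):
    """Rotate by 90 degrees and translate back to the origin."""
    return normalize((-y, x) for x, y in cells)


def flip(cells):
    """Mirror horizontally and translate back to the origin."""
    return normalize((-x, y) for x, y in cells)


def fixed_hash(cells):
    """One bit per occupied cell; a polyomino of order n fits in an n x n box."""
    n = len(cells)
    return sum(1 << (y * n + x) for x, y in cells)


def free_hash(cells):
    """Minimum fixed hash over all 8 orientations."""
    cells = normalize(cells)
    best = None
    for _ in range(2):
        for _ in range(4):
            h = fixed_hash(cells)
            best = h if best is None else min(best, h)
            cells = rotate(cells)
        cells = flip(cells)
    return best


t_up = [(1, 0), (0, 1), (1, 1), (2, 1)]
t_side = [(0, 0), (0, 1), (0, 2), (1, 1)]
print(free_hash(t_up) == free_hash(t_side))
```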
Well, I came up with a completely different approach. (Also thanks to corsiKa for some helpful insights!) Rather than hashing / encoding the squares, encode the path around them. The path consists of a sequence of 'turns' (including no turn) to perform before drawing each unit segment. I think an algorithm for getting the path from the coordinates of the squares is outside the scope of this question.
This does something very important: it destroys all location and orientation information, which we don't need. It is also very easy to get the path of the flipped object: you do so by simply reversing the order of the elements. Storage is compact because each element requires only 2 bits.
It does introduce one additional constraint: the polyomino must not have fully enclosed holes. (Formally, it must be simply connected.) Most discussions of polyominos consider a hole to exist even if it is sealed only by two touching corners, as this prevents tiling with any other non-trivial polyomino. Tracing the edges is not hindered by touching corners (as in the single heptomino with a hole), but it cannot leap from one outer loop to an inner one as in the complete ring-shaped octomino:
It also produces one additional challenge: finding the minimum ordering of the encoded path loop. This is because any rotation of the path (in the sense of string rotation) is a valid encoding. To always get the same encoding we have to find the minimal (or maximal) rotation of the path instructions. Thankfully this problem has already been solved: see for example http://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation.
Example:
If we arbitrarily assign the following values to the move operations:
No turn: 1
Turn right: 2
Turn left: 3
Here is the F pentomino traced clockwise:
An arbitrary initial encoding for the F pentomino is (starting at the bottom right corner):
2,2,3,1,2,2,3,2,2,3,2,1
The resulting minimum rotation of the encoding is
1,2,2,3,1,2,2,3,2,2,3,2
With 12 elements, this loop can be packed into 24 bits if two bits are used per instruction, or only 19 bits if instructions are encoded as powers of three. Even with the 2-bit encoding it easily fits in a single unsigned 32-bit integer, 0x6B6BAE:
1- 2- 2- 3- 1- 2- 2- 3- 2- 2- 3- 2
= 01-10-10-11-01-10-10-11-10-10-11-10
= 00000000011010110110101110101110
= 0x006B6BAE
The base-3 encoding with the start of the loop in the most significant powers of 3 is 0x5795F:
1*3^11 + 2*3^10 + 2*3^9 + 3*3^8 + 1*3^7 + 2*3^6
+ 2*3^5 + 3*3^4 + 2*3^3 + 2*3^2 + 3*3^1 + 2*3^0
= 0x0005795F
The maximum number of vertices in the path around a polyomino of order n is 2n + 2. For 2-bit encoding the number of bits is twice the number of moves, so the maximum number of bits needed is 4n + 4. For base-3 encoding it's:
$\lceil \log_2 3^{2n+2} \rceil = \lceil (2n + 2) \log_2 3 \rceil$
Where the "gallows" is the ceiling function. Accordingly, any polyomino up to order 9 can be encoded in a single 32-bit integer with the base-3 encoding. Knowing this you can choose your platform-specific data structure accordingly for the fastest hash comparison given the maximum order of the polyominos you'll be hashing.
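The canonicalisation and packing steps can be checked with a short Python sketch. It assumes the turn sequence has already been traced (tracing the outline is out of scope here, as noted above), uses a simple quadratic search for the minimal rotation instead of a dedicated algorithm, and the function names are made up for illustration.

```python
def min_rotation(seq):
    """Lexicographically smallest rotation (brute force, fine for short paths)."""
    seq = list(seq)
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def pack_2bit(seq):
    """Pack the turns into an integer, two bits per turn, first turn most significant."""
    value = 0
    for turn in seq:
        value = (value << 2) | turn
    return value


def pack_base3(seq):
    """Pack the turns as powers of three, first turn in the highest power."""
    value = 0
    for turn in seq:
        value = value * 3 + turn
    return value


def path_hash(turns):
    """Orientation-free key: smallest rotation of the path or of its reversal.

    Using the reversed sequence for the flipped shape follows the claim made above.
    """
    turns = list(turns)
    return pack_2bit(min(min_rotation(turns), min_rotation(turns[::-1])))


trace = [2, 2, 3, 1, 2, 2, 3, 2, 2, 3, 2, 1]  # F pentomino, 1 = no turn, 2 = right, 3 = left
print(min_rotation(trace))
print(hex(pack_2bit(min_rotation(trace))), hex(pack_base3(min_rotation(trace))))
```

The two printed values match the 2-bit and base-3 encodings worked out by hand above.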
You can reduce it down to 8 hash operations without the need to flip, rotate, or re-translate.
Note that this algorithm assumes you are operating with coordinates relative to itself. That is to say it's not in the wild.
Instead of applying operations that flip, rotate, and translate, simply change the order in which you hash.
For instance, let us take the F pent above. In the simple example, let us presume the hash operation was something like this:
int hashPolySingle(Poly p)
    int hash = 0
    for x = 0 to p.width
        for y = 0 to p.height
            hash = hash * 31 + (p.contains(x,y) ? 1 : 0)
    hashPolySingle = hash
int hashPoly(Poly p)
    int hash = hashPolySingle(p)
    p.rotateClockwise() // assume it translates inside
    hash = hash * 31 + hashPolySingle(p)
    // keep rotating for all 4 orientations
    p.flip()
    // hash those 4
Instead of applying the function to all 8 different orientations of the poly, I would apply 8 different hash functions to 1 poly.
int hashPolySingle(Poly p, bool flip, int corner)
    int hash = 0
    int xstart, xstop, ystart, ystop
    bool yfirst
    switch(corner)
        case 1: xstart = 0
                xstop = p.width
                ystart = 0
                ystop = p.height
                yfirst = false
                break
        case 2: xstart = p.width
                xstop = 0
                ystart = 0
                ystop = p.height
                yfirst = true
                break
        case 3: xstart = p.width
                xstop = 0
                ystart = p.height
                ystop = 0
                yfirst = false
                break
        case 4: xstart = 0
                xstop = p.width
                ystart = p.height
                ystop = 0
                yfirst = true
                break
        default: error()
    if(flip) swap(xstart, xstop)
    // mirror along one axis only: swapping both axes would be a 180 degree rotation, which corners 1-4 already cover
    if(yfirst)
        for y = ystart to ystop
            for x = xstart to xstop
                hash = hash * 31 + (p.contains(x,y) ? 1 : 0)
    else
        for x = xstart to xstop
            for y = ystart to ystop
                hash = hash * 31 + (p.contains(x,y) ? 1 : 0)
    hashPolySingle = hash
This is then called in the 8 different ways. You could also wrap the call to hashPolySingle in a for loop over the corner, and another over flip or no flip. All the same.
int hashPoly(Poly p)
    // approach from each of the 4 corners
    int hash = hashPolySingle(p, false, 1)
    hash = hash * 31 + hashPolySingle(p, false, 2)
    hash = hash * 31 + hashPolySingle(p, false, 3)
    hash = hash * 31 + hashPolySingle(p, false, 4)
    // flip it
    hash = hash * 31 + hashPolySingle(p, true, 1)
    hash = hash * 31 + hashPolySingle(p, true, 2)
    hash = hash * 31 + hashPolySingle(p, true, 3)
    hash = hash * 31 + hashPolySingle(p, true, 4)
    hashPoly = hash
In this way, you're implicitly rotating the poly from each direction, but you're not actually performing the rotation and translation. It performs the 8 hashes, which seem to be entirely necessary in order to accurately hash all 8 orientations, but wastes no passes over the poly that are not actually doing hashes. This seems to me to be the most elegant solution.
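For comparison, here is a minimal Python sketch of the same "change the scan order, not the poly" idea. It is not the pseudocode above translated line by line: the names `traversal_bits` and `canonical_key` are made up for illustration, the poly is assumed to be a set of `(x, y)` cells, and the 8 readings are combined by taking their minimum rather than chaining them with `* 31`, so that the result cannot depend on which orientation you were handed.

```python
from itertools import product

def traversal_bits(cells, width, height, x_reversed, y_reversed, y_outer):
    """Read the bounding box in one of the 8 scan orders.

    Returns (outer length, inner length, bit, bit, ...) so that a
    w-by-h reading is never confused with an h-by-w reading.
    """
    xs = range(width - 1, -1, -1) if x_reversed else range(width)
    ys = range(height - 1, -1, -1) if y_reversed else range(height)
    if y_outer:
        scan = ((x, y) for y in ys for x in xs)
        dims = (height, width)
    else:
        scan = ((x, y) for x in xs for y in ys)
        dims = (width, height)
    return dims + tuple(1 if cell in cells else 0 for cell in scan)

def canonical_key(cells):
    """Smallest of the 8 readings; the poly itself is only translated."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    norm = {(x - min_x, y - min_y) for x, y in cells}
    width = max(x for x, _ in norm) + 1
    height = max(y for _, y in norm) + 1
    return min(
        traversal_bits(norm, width, height, xr, yr, yo)
        for xr, yr, yo in product((False, True), repeat=3)
    )

# The F pentomino and a rotated copy should get the same key.
f_pent = {(1, 0), (2, 0), (0, 1), (1, 1), (1, 2)}
rotated = {(-y, x) for x, y in f_pent}
print(canonical_key(f_pent) == canonical_key(rotated))
```

The returned tuple can be passed to `hash()` or used directly as a dictionary key.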
Note that there may be a better hashPolySingle() algorithm to use. Mine uses a Cartesian exhaustion algorithm that is on the order of O(n^2). Its worst case scenario is an L shape, which would cause there to be an N/2 * (N-1)/2 sized square for only N elements, or an efficiency of 1:(N-1)/4, compared to an I shape which would be 1:1. It may also be that the inherent invariant imposed by the architecture would actually make it less efficient than the naive algorithm.
My suspicion is that the above concern can be alleviated by simulating the Cartesian exhaustion: convert the set of nodes into a bidirectional graph that can be traversed so that the nodes are hit in the same order as in my much more naive hashing algorithm, ignoring the empty spaces. This should bring the algorithm down to O(n), as the graph should be constructible in O(n) time. Because I haven't done this, I can't say for sure, which is why I say it's only a suspicion, but there should be a way to do it.
Here's my DFS (depth first search) explained:
Start with the top-most cell (left-most as a tiebreaker). Mark it as visited. Every time you visit a cell, check all four directions for unvisited neighbors. Always check the four directions in this order: up, left, down, right.
Example
In this example, up and left fail, but down succeeds. So far our output is 001, and we recursively search the "down" cell.
We mark our new current cell as visited (and we'll finish searching the original cell when we finish searching this cell). Here, up=0, left=1.
We search the left-most cell and there are no unvisited neighbors (up=0, left=0, down=0, right=0). Our total output so far is 001010000.
We continue our search of the second cell. down=0, right=1. We search the cell to the right.
up=0, left=0, down=1. Search the down cell: all 0s. Total output so far is 001010000010010000. Then, we return from the down cell...
right=0, return. return. (Now, we are at the starting cell.) right=0. Done!
So, the total output is 20 (N*4) bits: 00101000001001000000.
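The walk-through can be sketched in Python roughly as follows. The function name `dfs_encode` and the cell coordinates are assumptions: the coordinates are reconstructed from the bits in the example (y grows downward), not taken from the original figures.

```python
def dfs_encode(cells):
    """Encode a polyomino as 4 bits per cell using the DFS described above.

    cells: set of (x, y) tuples, with y increasing downward.
    A bit is 1 only when the neighbor exists and has not been visited yet.
    """
    # Start at the top-most cell, left-most as a tiebreaker.
    start = min(cells, key=lambda c: (c[1], c[0]))
    visited = {start}
    bits = []

    def visit(cell):
        x, y = cell
        # Fixed direction order: up, left, down, right.
        for dx, dy in ((0, -1), (-1, 0), (0, 1), (1, 0)):
            neighbor = (x + dx, y + dy)
            if neighbor in cells and neighbor not in visited:
                bits.append("1")
                visited.add(neighbor)
                visit(neighbor)  # recurse before checking the next direction
            else:
                bits.append("0")

    visit(start)
    return "".join(bits)

# Assumed layout of the 5-cell example:
#   . # .
#   # # #
#   . . #
example = {(1, 0), (0, 1), (1, 1), (2, 1), (2, 2)}
print(dfs_encode(example))  # 00101000001001000000
```

This sketch recurses once per cell, so very large polyominoes would need an explicit stack instead.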
Encoding improvement
But, we can save some bits.
The last visited cell will always encode 0000 for its four directions. So, don't encode the last visited cell to save 4 bits.
Another improvement: if you reached a cell by moving left, don't check that cell's right side. So, we only need 3 bits per cell, except 4 bits for the first cell, and 0 for the last cell.
The first cell will never have an up, or left neighbor, so omit these bits. So the first cell takes 2 bits.
So, with these improvements, we use only N*3-4 bits (e.g. 5 cells -> 11 bits; 9 cells -> 23 bits).
If you really want, you can compact a little more by noting that exactly N-1 bits will be "1".
Caveat
Yes, you'll need to encode all 8 rotations/flips of the polyomino and choose the least to get a canonical encoding.
I suspect this will still be faster than the outline approach. Also, holes in the polyomino shouldn't be a problem.
I worked on the same problem recently. I solved the problem fairly simply by
(1) generate a unique ID for a polyomino, such that each identical poly would have the same UID. For example, find the bounding box, normalize the corner of the bounding box, and collect the set of non-empty cells.
(2) generate all possible permutations by rotating (and flipping, if appropriate) a polyomino, and look for duplicates.
The advantage of this brute-force approach, other than its simplicity, is that it still works if the polys are distinguishable in some other way, for example if some of them are colored or numbered.
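A rough Python sketch of steps (1) and (2), assuming a polyomino is a set of `(x, y)` cells. The helper names `normalize`, `orientations`, and `same_polyomino` are invented here for illustration.

```python
def normalize(cells):
    """Step (1): shift the bounding-box corner to (0, 0) and freeze the cell set."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)

def orientations(cells):
    """Step (2): every distinct rotation and mirror image, each normalized."""
    result = set()
    current = set(cells)
    for _ in range(4):
        current = {(-y, x) for x, y in current}                # quarter turn
        result.add(normalize(current))
        result.add(normalize({(-x, y) for x, y in current}))   # mirrored copy
    return result

def same_polyomino(a, b):
    """True when b is some rotation or reflection of a."""
    return normalize(b) in orientations(a)

f_pent = {(1, 0), (2, 0), (0, 1), (1, 1), (1, 2)}
turned = {(-y, x) for x, y in f_pent}
print(same_polyomino(f_pent, turned))   # True
print(len(orientations(f_pent)))        # 8, the F pentomino has no symmetry
```

To support colored or numbered cells, the tuples could carry an extra field that the rotations leave untouched.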
You can set up something like a trie to uniquely identify (and not just hash) your polyomino. Take your normalized polyomino and set up a binary tree, where the root branches on whether (0,0) has a set pixel, the next level branches on whether (0,1) has a set pixel, and so on. When you look up a polyomino, simply normalize it and then walk the tree. If you find it in the trie, then you're done. If not, assign that polyomino a unique id (just increment a counter), generate all 8 possible rotations and flips, then add those 8 to the trie.
On a trie miss, you'll have to generate all the rotations and reflections. But on a trie hit it should cost less (O(k^2) for k-polyominos).
To make lookups even more efficient, you could use a couple bits at a time and use a wider tree instead of a binary tree.
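Here is one way the binary version of that trie might look in Python, using nested dictionaries keyed by "pixel set or not". The class name `PolyominoTrie`, the fixed k-by-k scan grid, and the helper functions are all assumptions made for this sketch.

```python
def normalize(cells):
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)

def all_orientations(cells):
    out = set()
    current = set(cells)
    for _ in range(4):
        current = {(-y, x) for x, y in current}              # quarter turn
        out.add(normalize(current))
        out.add(normalize({(-x, y) for x, y in current}))    # mirrored copy
    return out

class PolyominoTrie:
    """Binary trie over a k-by-k grid; each level tests one pixel."""

    def __init__(self, k):
        self.k = k          # maximum number of cells, so every poly fits in k x k
        self.root = {}
        self.next_id = 0

    def _bits(self, norm):
        # Level order: (0,0), (0,1), ..., as described above.
        return [(x, y) in norm for x in range(self.k) for y in range(self.k)]

    def _find(self, norm):
        node = self.root
        for bit in self._bits(norm):
            node = node.get(bit)
            if node is None:
                return None
        return node.get("id")

    def _insert(self, norm, uid):
        node = self.root
        for bit in self._bits(norm):
            node = node.setdefault(bit, {})
        node["id"] = uid

    def lookup(self, cells):
        """Return the id for this shape, registering all its orientations on a miss."""
        norm = normalize(cells)
        uid = self._find(norm)
        if uid is None:
            uid = self.next_id
            self.next_id += 1
            for variant in all_orientations(norm):
                self._insert(variant, uid)
        return uid

trie = PolyominoTrie(5)
f_pent = {(1, 0), (2, 0), (0, 1), (1, 1), (1, 2)}
print(trie.lookup(f_pent))                              # 0
print(trie.lookup({(-y, x) for x, y in f_pent}))        # 0, found on a trie hit
```

A hit costs one walk of k*k levels; only a miss pays for generating the 8 orientations.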
A valid hash function, if you're really afraid of hash collisions, is to make a hash function x + order * y for coordinates and then loop through all the coordinates of a piece, adding (order ^ i) * hash(coord[i]) to the piece hash. That way, you can guarantee you won't get any hash collisions.
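A small Python sketch of that idea follows, with some assumptions spelled out because the collision-free claim depends on them: every coordinate satisfies 0 <= x, y < order, the cells are visited in a fixed (sorted) order, the positional base is order * order so that it exceeds every cell code (a deviation from reusing `order` itself), and the integers never overflow, which Python's arbitrary-precision ints take care of. The function name `exact_key` is made up.

```python
def exact_key(cells, order):
    """Collision-free key for a non-empty piece whose coordinates lie in [0, order).

    Each cell becomes the code x + order * y; the sorted codes are then
    treated as digits of a number written in base order * order.
    """
    if any(not (0 <= x < order and 0 <= y < order) for x, y in cells):
        raise ValueError("coordinates must lie in [0, order)")
    codes = sorted(x + order * y for x, y in cells)
    base = order * order  # assumption: strictly larger than any cell code
    return sum(base ** i * code for i, code in enumerate(codes))

print(exact_key({(0, 0), (1, 0)}, 4))  # 16
print(exact_key({(0, 0), (0, 1)}, 4))  # 64
```

The key grows with the number of cells, so in a language with fixed-width integers it would eventually overflow and the guarantee would be lost.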

Resources