perfect hash function for random integer - algorithm

Here's the problem:
X is a set of n distinct non-negative integers that I know in advance. All of them are less than or equal to m. I want a collision-free hash function, as simple as possible, that maps them to 0..n-1.
For example:
X = [31,223,121,100,123,71], so n = 6, m = 223.
I want to find a hash function to map them to [0, 1, 2, 3, 4, 5].
If mapping to 0..n-1 is too difficult, mapping X to some other small range would also be useful.
Finding such a function is not too difficult, but finding one that is simple and easy to generate is hard.
It would be even better if it preserved the order of X.
Any clues?

My favorite perfect hash is pretty easy.
The hash function you generate has the form:
hash = table1[h1(key)%N] + table2[h2(key)%N]
h1 and h2 are randomly generated hash functions. In your case, you can generate random constants and then have h1(key)=key*C1/m and h2(key)=key*C2/m, or something similarly simple.
To generate the perfect hash:
1. Generate random constants C1 and C2.
2. Imagine the bipartite graph with the table1 slots and table2 slots as vertices, and an edge for each key between table1[h1(key)%N] and table2[h2(key)%N]. Run a DFS to see if the graph is acyclic. If not, go back to step 1.
3. Now that you have an acyclic graph, start at any key/edge in each connected component, and set its slots in table1 and table2 however you like to give it whatever hash you like.
4. Traverse the tree starting at the vertices adjacent to the edge you just set. For every edge you traverse, one of its slots will already be set. Set the other one to make the hash value come out however you like.
That's it. All of steps (2), (3) and (4) can be combined into a single DFS traversal pretty easily.
The complete description and analysis is in this paper.
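As a rough illustration, the whole construction can be sketched in Python. This is only a sketch under stated assumptions: the names build_perfect_hash and _assign, the table size N = 2n + 1, the prime 2^31 - 1 and the form h(key) = (key*C mod prime) mod N are all choices made here for the example, not taken from the answer or the paper. The tables may hold negative numbers, since the hash is a plain sum.

```python
import random

def _assign(keys, targets, h1, h2, size):
    """Steps 2-4 in one DFS: return (table1, table2), or None if the graph has a cycle."""
    adj = [[] for _ in range(2 * size)]          # vertices 0..size-1: table1, rest: table2
    for idx, key in enumerate(keys):
        a, b = h1(key), size + h2(key)
        adj[a].append((b, idx))
        adj[b].append((a, idx))
    value = [None] * (2 * size)
    used = [False] * len(keys)                   # each key is one edge
    for root in range(2 * size):
        if value[root] is not None or not adj[root]:
            continue
        value[root] = 0                          # free choice per connected component
        stack = [root]
        while stack:
            u = stack.pop()
            for v, idx in adj[u]:
                if used[idx]:
                    continue
                used[idx] = True
                if value[v] is not None:         # reached a set vertex by a new edge: cycle
                    return None
                value[v] = targets[idx] - value[u]
                stack.append(v)
    value = [0 if x is None else x for x in value]
    return value[:size], value[size:]

def build_perfect_hash(keys, targets=None, seed=0, max_tries=1000):
    """Return (hash_fn, table1, table2) such that hash_fn(keys[i]) == targets[i]."""
    if targets is None:
        targets = list(range(len(keys)))         # assumed default: position in the list
    rng = random.Random(seed)
    size = 2 * len(keys) + 1                     # assumed table size N
    prime = 2_147_483_647                        # assumed: a prime larger than every key
    for _ in range(max_tries):
        c1, c2 = rng.randrange(1, prime), rng.randrange(1, prime)   # step 1
        h1 = lambda k, c=c1: (k * c % prime) % size
        h2 = lambda k, c=c2: (k * c % prime) % size
        tables = _assign(keys, targets, h1, h2, size)
        if tables is not None:
            t1, t2 = tables
            return (lambda k: t1[h1(k)] + t2[h2(k)]), t1, t2
    raise RuntimeError("no acyclic graph found; try a larger table size")

keys = [31, 223, 121, 100, 123, 71]
perfect_hash, table1, table2 = build_perfect_hash(keys)
```

Passing targets as the rank of each key in sorted order would give an order-preserving hash, at the cost of nothing more than a different target list.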

Related

Sum of Function defined on Subsets

I want to know if there are any fast approaches to solve the following problem. I have a list of codes, numbering somewhere in the thousands (A0, A1, A2, ...). There is a positive value attached to about a million distinct combinations (A0-A1, A2-A10, A1-A2-A10, ...). Let the values be denoted f(A0-A1). Note that not all the combinations have a value attached.
For each listed combination, I want to calculate the sum of the values attached to each set that contains the given combination. For instance, for A2-A10,
calculate
g(A2-A10) = f(A2-A10) + f(A1-A2-A10) + ...
I would like to do this with minimal time complexity. A simpler related problem is to find all combinations where g(C) is greater than a threshold value.
Key the existing combinations with a bit map, where bit n denotes whether An is in that particular coding. Store the values keyed by the bit map for each in your favorite hash-map structure. Thus, f(A0, A1, A10, A12) would be combo_val[11000000001010000...]
To sum all of the desired combinations, build a bit map of your root. For instance, with the combination above, we'd have root = 1100000000101000 (cutting off at 16 total elements for the sake of illustration).
Now simply loop through the keys of the hashmap, using root as a mask. Sum the desired values:
    total = 0
    for key in combo_val.keys():
        # bitwise AND: the key must contain every bit of root
        if (root & key) == root:
            total += combo_val[key]
Does that get you moving?
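A self-contained version of that loop might look like the sketch below. The helper names mask_of and superset_sum and the three sample values are made up for illustration. Note that a Python integer puts bit n at value 2^n, so the bits read right to left rather than left to right as in the strings above.

```python
def mask_of(code_numbers):
    """Build a bit map in which bit n is set when code An is in the combination."""
    mask = 0
    for n in code_numbers:
        mask |= 1 << n
    return mask

def superset_sum(combo_val, root):
    """Sum the values of every stored combination that contains all codes in root."""
    return sum(value for key, value in combo_val.items() if (key & root) == root)

# Hypothetical values for f(A0-A1), f(A2-A10) and f(A1-A2-A10).
combo_val = {
    mask_of([0, 1]): 3.0,
    mask_of([2, 10]): 5.0,
    mask_of([1, 2, 10]): 7.0,
}
g_a2_a10 = superset_sum(combo_val, mask_of([2, 10]))   # f(A2-A10) + f(A1-A2-A10)
```

Each query scans all stored combinations, so computing g for every one of a million combinations this way costs on the order of 10^12 mask tests.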
I thought waaay too long before coming up with the following approach.
Index the million combinations, so that each one can be referred to by its number. In your example:
0: A0-A1
1: A2-A10
2: A1-A2-A10
For each code, create an ordered list of combinations that contain that code. Call that code_combs. In your example:
A0: [0]
A1: [0, 2]
A2: [1, 2]
A10: [1, 2]
Now we have a combination of codes, like A2-A10. We create two arrays, one of codes, the other of indices. Set indices at 0. So:
codes = ['A2', 'A10']
indices = [0, 0]
And now do the following:
    while not done:
        let max_comb = max(code_combs[codes[i]][indices[i]] for i in range(len(codes)))
        Advance each index until we are at max_comb or greater
            (if we reach the end of any list, we are done)
        If all are at the same max_comb:
            add its value
            advance all indexes by 1
            (if we reach the end of any list, we are done)
Basically this is a k-way intersection of ordered lists. Now here is the trick. If we advance naively, this will be slightly faster because we only have to look at combinations that contain a code. However we can use a clever advance strategy like this:
Advance by 1, 2, 4, 8, etc until we reach or pass the point we want.
Do a binary search between the last two values until we find the point we want
(Be warned, implementing binary search is not always so easy to get right.)
And now we are crossing fingers. But if any one of our codes has few combinations that it is in, and there aren't too many codes in our combination, we can compute our intersection quite quickly.
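Here is one way the intersection with the doubling-then-binary-search advance could be written, as a sketch only. The function names gallop_to and superset_sum and the sample values list are invented for this example; the binary search is delegated to the standard bisect module to sidestep the off-by-one pitfalls mentioned above.

```python
from bisect import bisect_left

def gallop_to(lst, start, target):
    """Return the first index i >= start with lst[i] >= target, or len(lst) if none."""
    lo = hi = start
    step = 1
    # Advance by 1, 2, 4, 8, ... until we reach or pass the target.
    while hi < len(lst) and lst[hi] < target:
        lo = hi + 1
        hi += step
        step *= 2
    # Binary search between the last two probe positions.
    return bisect_left(lst, target, lo, min(hi, len(lst)))

def superset_sum(codes, code_combs, values):
    """Sum values[c] over every combination index c present in all the code lists."""
    lists = [code_combs[code] for code in codes]
    pos = [0] * len(lists)
    total = 0
    while all(p < len(lst) for p, lst in zip(pos, lists)):
        max_comb = max(lst[p] for p, lst in zip(pos, lists))
        for i, lst in enumerate(lists):
            pos[i] = gallop_to(lst, pos[i], max_comb)
            if pos[i] == len(lst):
                return total                      # one list is exhausted: done
        if all(lst[p] == max_comb for p, lst in zip(pos, lists)):
            total += values[max_comb]
            pos = [p + 1 for p in pos]
    return total

code_combs = {"A0": [0], "A1": [0, 2], "A2": [1, 2], "A10": [1, 2]}
values = [3.0, 5.0, 7.0]          # hypothetical f(A0-A1), f(A2-A10), f(A1-A2-A10)
g_a2_a10 = superset_sum(["A2", "A10"], code_combs, values)
```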

Implementing cartesian product, such that it can skip iterations

I want to implement a function which returns the Cartesian product of a set with itself, repeated a given number of times. For example:
input: {a, b}, 2
output:
aa
ab
bb
ba
input: {a, b}, 3
output:
aaa
aab
aba
abb
baa
bab
bba
bbb
However, the only way I can implement it is by first computing the Cartesian product of 2 sets ("ab", "ab"), and then repeatedly taking the product of that output with the same set. Here is pseudo-code:
    function product(A, B):
        result = []
        for i in A:
            for j in B:
                result.append([i,j])
        return result

    function product1(chars, count):
        result = product(chars, chars)
        for i in range(2, count):
            result = product(result, chars)
        return result
What I want is to start computing the last set directly, without computing all of the sets before it. Is this possible? A solution which gives me a similar result but isn't a Cartesian product is also acceptable.
I don't have a problem reading most general-purpose programming languages, so if you need to post code you can do it in any language you feel comfortable with.
Here's a recursive algorithm that builds S^n without building S^(n-1) "first". Imagine an infinite k-ary tree where |S| = k. Label with the elements of S each of the edges connecting any parent to its k children. An element of S^m can be thought of as any path of length m from the root. The set S^m, in that way of thinking, is the set of all such paths. Now the problem of finding S^n is a problem of enumerating all paths of length n - and we can name a path by considering the sequence of edge labels from beginning to end. We want to directly generate S^n without first enumerating all of S^(n-1), so a depth-first search modified to find all nodes at depth n seems appropriate. This is essentially how the below algorithm works:
    // collection to hold generated output
    members = []
    // recursive function to explore product space
    Products(set[1...n], length, current[1...m])
        // if the product we're working on is of the
        // desired length then record a copy of it and return
        if m = length then
            members.append(copy of current)
            return
        // otherwise we add each possible value to the end
        // and generate all products of the desired length
        // with the new vector as a prefix
        for i = 1 to n do
            current.addLast(set[i])
            Products(set, length, current)
            current.removeLast()
    // reset the result collection and request the set be generated
    members = []
    Products([a, b], 3, [])
Now, a breadth-first approach is no less efficient than a depth-first one, and if you think about it, it would be no different from exactly what you're already doing. Indeed, any approach that generates S^n must necessarily generate S^(n-1) at least once, since that can be found in a solution to S^n.
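The depth-first idea translates into a short Python generator. This is a sketch with a made-up function name (products); it yields the strings one at a time instead of collecting them, so the caller never holds S^(n-1) or even all of S^n in memory.

```python
def products(symbols, length, prefix=()):
    """Yield every string of the given length over symbols, depth first."""
    if len(prefix) == length:
        yield "".join(prefix)
        return
    for symbol in symbols:
        # Extend the current path by one edge label and recurse.
        yield from products(symbols, length, prefix + (symbol,))

pairs = list(products("ab", 2))
triples = list(products("ab", 3))
```

In practice the standard library's itertools.product("ab", repeat=3) produces the same tuples in the same order.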

how to generate a maze using a binary matrix representation?

I'm supposed to generate a maze using a binary matrix,
where 0 represents an empty cell
and 1 a wall.
I tried to use the DFS algorithm. The problem is that DFS refers to cells and the walls between them (each cell has at most four walls):
"then selects a random neighbouring cell that has not yet been visited. The computer removes the 'wall' between the two cells and..."
I don't understand how that maps onto the representation I've been asked to implement.
Does anyone have any idea?
When asked to make a grid, I'd start by making a multi-dimensional array.
Have the outer array contain each row of your grid, and each nested array be the columns.
Depending on whether each cell needs to remember being visited, each slot of the array can contain a simple class or struct (depending on your language of choice). Otherwise, it can simply contain an int or a bool.
A simple example could be:
    var grid = [
        [1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]
    ];
Representing an empty cell in the middle, surrounded by walls.
The idea is to lay out a pattern of cells and then build walls between neighbouring cells, except where the two cells are joined by an edge of the DFS tree.
Example DFS run (2D case, edge removed, can be inferred from order of nodes):
O
O
O
O
OO
OO
OO
(this branch is stuck so start a new one)
OOO
OO
...
OOO
OOO
OOO
Now construct the maze:
OXSOO
OXOXO
OOOXO
XXXXO
EOOOO
O -> cell
X -> wall
S -> start
E -> end
The wall can be either a block or just a plane; the topology is the same.
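One common way to map this onto a binary matrix, sketched below, is to treat walls as blocks: logical cell (r, c) lives at matrix position (2r+1, 2c+1), and the entry between two neighbouring cells is the wall that DFS may remove. The function name generate_maze, the (2*rows+1) by (2*cols+1) layout and the seed parameter are assumptions of this example, not something the question prescribes.

```python
import random

def generate_maze(rows, cols, seed=None):
    """Return a (2*rows+1) x (2*cols+1) matrix with 0 = empty cell and 1 = wall."""
    rng = random.Random(seed)
    grid = [[1] * (2 * cols + 1) for _ in range(2 * rows + 1)]   # start all wall
    visited = [[False] * cols for _ in range(rows)]
    visited[0][0] = True
    grid[1][1] = 0
    stack = [(0, 0)]
    while stack:
        r, c = stack[-1]
        neighbours = [
            (r + dr, c + dc)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < rows and 0 <= c + dc < cols
            and not visited[r + dr][c + dc]
        ]
        if not neighbours:
            stack.pop()                 # this branch is stuck, so backtrack
            continue
        nr, nc = rng.choice(neighbours)
        visited[nr][nc] = True
        grid[2 * nr + 1][2 * nc + 1] = 0     # open the newly visited cell
        grid[r + nr + 1][c + nc + 1] = 0     # remove the wall between the two cells
        stack.append((nr, nc))
    return grid

maze = generate_maze(3, 4, seed=1)
```

Because the removed walls form a spanning tree of the cells, every cell is reachable from every other by exactly one path.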

Create Ancestor Matrix from given Binary Tree

The question is, given an Ancestor Matrix, as a bitmap of 1s and 0s, to construct the corresponding Binary Tree. Can anyone give me an idea of how to do it? I found a solution on Stack Overflow, but the line a[root->data][temp[i]]=1 seems wrong: nothing guarantees that the nodes will contain data 1 to n. A node may contain, say, 2000, in which case there will be no a[2000][some_column], since there are only 7 nodes and hence 7 rows and columns in the matrix.
Two ways:
1. Normalize your node values such that they are all from 1 to n. If you have nodes 1, 2, 5000 for example, make them 1, 2, 3. You can do this by sorting or hashing your labels and keeping something like normalized[i] = normalized value of node i. normalized can be a map / hash table if you have very large labels or even text labels.
2. You might be able to use a sparse matrix for this, implementable with a hash table or a set: keep a hash table of hash tables. H[x] stores another hash table that stores your y values. So if in a naive matrix solution you had a[2000][5000] = 1, you would use H.get(2000) => returns a hash table H' of values stored on the 2000th row => H'.get(5000) => returns the value you want.
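To make the first option concrete, here is a small sketch that builds the ancestor matrix of a tree with arbitrary labels by first normalizing them to 0..n-1. The Node class, the function name ancestor_matrix, the preorder numbering and the assumption that labels are distinct are all choices made for this example.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def ancestor_matrix(root):
    """Return (index, matrix) where matrix[i][j] == 1 iff node i is an ancestor of node j."""
    index = {}                                   # label -> normalized row/column number

    def number(node):
        if node is not None:
            index[node.data] = len(index)        # preorder numbering, labels assumed distinct
            number(node.left)
            number(node.right)

    number(root)
    n = len(index)
    matrix = [[0] * n for _ in range(n)]

    def fill(node, ancestors):
        if node is None:
            return
        j = index[node.data]
        for i in ancestors:                      # every node on the path from the root
            matrix[i][j] = 1
        ancestors.append(j)
        fill(node.left, ancestors)
        fill(node.right, ancestors)
        ancestors.pop()

    fill(root, [])
    return index, matrix

tree = Node(2000, Node(5, Node(9)), Node(70))
index, matrix = ancestor_matrix(tree)
```

The second option amounts to replacing matrix with a dict of sets keyed by the original labels, which avoids the renumbering step.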

Algorithm/Data Structure for finding combinations of minimum values easily

I have a symmetric matrix like the one shown in the image attached below.
I've made up the notation A.B which represents the value at grid point (A, B). Furthermore, writing A.B.C gives me the minimum grid point value like so: MIN((A,B), (A,C), (B,C)).
As another example A.B.D gives me MIN((A,B), (A,D), (B,D)).
My goal is to find the minimum values for ALL combinations of letters (not repeating) for one row at a time. For this example, I need to find the min values with respect to row A, which are given by the calculations:
A.B = 6
A.C = 8
A.D = 4
A.B.C = MIN(6,8,6) = 6
A.B.D = MIN(6, 4, 4) = 4
A.C.D = MIN(8, 4, 2) = 2
A.B.C.D = MIN(6, 8, 4, 6, 4, 2) = 2
I realize that certain calculations can be reused which becomes increasingly important as the matrix size increases, but the problem is finding the most efficient way to implement this reuse.
Can anyone point me in the right direction to finding an efficient algorithm/data structure I can use for this problem?
You'll want to think about the lattice of subsets of the letters, ordered by inclusion. Essentially, you have a value f(S) given for every subset S of size 2 (that is, every off-diagonal element of the matrix - the diagonal elements don't seem to occur in your problem), and the problem is to find, for each subset T of size greater than two, the minimum f(S) over all S of size 2 contained in T. (And then you're interested only in sets T that contain a certain element "A" - but we'll disregard that for the moment.)
First of all, note that if you have n letters, this amounts to asking Omega(2^n) questions, roughly one for each subset. (Excluding the zero- and one-element subsets and those that don't include "A" saves you n + 1 sets and a factor of two, respectively, which big Omega absorbs.) So if you want to store all these answers for even moderately large n, you'll need a lot of memory. If n is large in your applications, it might be best to store some collection of pre-computed data and do some computation whenever you need a particular data point; I haven't thought about what would work best, but for example computing data only for a binary tree contained in the lattice would not necessarily help you at all beyond precomputing nothing at all.
With these things out of the way, let's assume you actually want all the answers computed and stored in memory. You'll want to compute these "layer by layer", that is, starting with the three-element subsets (since the two-element subsets are already given by your matrix), then four-element, then five-element, etc. This way, for a given subset T, when we're computing f(T) we will already have computed f(S) for every S strictly contained in T. There are several ways that you can make use of this, but I think the easiest might be to use two such subsets: let t1 and t2 be two different elements of T that you may select however you like, and let S be the subset of T that you get when you remove t1 and t2. Write S1 for S plus t1 and write S2 for S plus t2. Now every pair of letters contained in T is either fully contained in S1, or fully contained in S2, or it is {t1, t2}. Look up f(S1) and f(S2) in your previously computed values, then look up f({t1, t2}) directly in the matrix, and store f(T) = the minimum of these 3 numbers.
If you never select "A" for t1 or t2, then indeed you can compute everything you're interested in while not computing f for any sets T that don't contain "A". (This is possible because the steps outlined above are only interesting whenever T contains at least three elements.) Good! This leaves just one question: how to store the computed values f(T). What I would do is use a 2^(n-1)-sized array; represent each subset-of-your-alphabet-that-includes-"A" by the (n-1) bit number where the ith bit is 1 whenever the (i+1)th letter is in that set (so 0010110, which has bits 1, 2, and 4 set, represents the subset {"A", "C", "D", "F"} out of the alphabet "A" .. "H"; note I'm counting bits starting at 0 from the right, and letters starting at "A" = 0). This way, you can actually iterate through the sets in numerical order and don't need to think about how to iterate through all k-element subsets of an n-element set. (You do need to include a special case for when the set under consideration has 0 or 1 element, in which case you'll want to do nothing, or 2 elements, in which case you just copy the value from the matrix.)
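A compact sketch of that scheme follows. The function name min_table and the choice of t1 and t2 as the two lowest-numbered letters other than "A" are assumptions of this example; any choice works as long as "A" is never removed. The sample matrix uses the pair values implied by the question's row-A calculations (A.B = 6, A.C = 8, A.D = 4, B.C = 6, B.D = 4, C.D = 2), with an unused diagonal of 0.

```python
def min_table(matrix):
    """best[mask] = minimum pair value within {letter 0} plus the letters in mask.

    Bit i of mask stands for letter i + 1, so letter 0 ("A") is always included.
    """
    n = len(matrix)
    best = [None] * (1 << (n - 1))       # best[0] is just {"A"}: nothing to compute
    for mask in range(1, 1 << (n - 1)):  # numerical order visits subsets before supersets
        low = mask & -mask               # lowest set bit
        t1 = low.bit_length()            # its letter index (bit i -> letter i + 1)
        rest = mask ^ low
        if rest == 0:
            best[mask] = matrix[0][t1]   # two-element set: copy from the matrix
            continue
        low2 = rest & -rest
        t2 = low2.bit_length()
        # S1 = T without t2, S2 = T without t1; both are smaller masks, already done.
        best[mask] = min(best[mask ^ low2], best[mask ^ low], matrix[t1][t2])
    return best

#          A  B  C  D
matrix = [[0, 6, 8, 4],
          [6, 0, 6, 4],
          [8, 6, 0, 2],
          [4, 4, 2, 0]]
best = min_table(matrix)
```

Here best[0b011] is A.B.C, best[0b101] is A.B.D, best[0b110] is A.C.D and best[0b111] is A.B.C.D.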
Well, it looks simple to me, but perhaps I misunderstand the problem. I would do it like this:
let P be a pattern string in your notation X1.X2. ... .Xn, where Xi is a column in your matrix
first compute the array CS = [ (X1, X2), (X1, X3), ... (X1, Xn) ], which contains all combinations of X1 with every other element in the pattern; CS has n-1 elements, and you can easily build it in O(n)
now you must compute min (CS), i.e. finding the minimum value of the matrix elements corresponding to the combinations in CS; again you can easily find the minimum value in O(n)
done.
Note: since your matrix is symmetric, given P you just need to compute CS by combining the first element of P with all other elements: (X1, Xi) is equal to (Xi, X1)
If your matrix is very large, and you want to do some optimization, you may consider prefixes of P: let me explain with an example
when you have solved the problem for P = X1.X2.X3, store the result in an associative map, where X1.X2.X3 is the key
later on, when you solve a problem P' = X1.X2.X3.X7.X9.X10.X11 you search for the longest prefix of P' in your map: you can do this by starting with P' and removing one component (Xi) at a time from the end until you find a match in your map or you end up with an empty string
if you find a prefix of P' in your map then you already know the solution for that problem, so you just have to find the solution for the problem resulting from combining the first element of the prefix with the suffix, and then compare the two results: in our example the prefix is X1.X2.X3, and so you just have to solve the problem for X1.X7.X9.X10.X11, and then compare the two values and choose the min (don't forget to update your map with the new pattern P')
if you don't find any prefix, then you must solve the entire problem for P' (and again don't forget to update the map with the result, so that you can reuse it in the future)
This technique is essentially a form of memoization.

Resources