High and Low bits in van Emde Boas Tree

I was trying to understand the concept of vEB tree.
In an example:
I assumed a universe set U = {0, 1, 2, ..., 8}, so the size is 9.
Now lets take a subset S = {0, 1, 3, 4, 6, 7}.
For an operation FindSuccessor(3, S), where I need to know the smallest element > 3 in the subset S, I need to know the high and low bits of my element, i.e. 3.
One explanation says they are the first half and second half of the bits, giving the results 00 and 11 as high and low respectively.
Another says:
high = Floor [element/sqrt(|U|)] = Floor [3/ sqrt (9)] = Floor [1] = 1;
low = element % sqrt(|U|) = 3 % sqrt (9) = 0;
Please explain where I am going wrong.

You're not going wrong: the explanations describe two slightly different data structures that coincide only when |U| is a power of two that is also a perfect square. At a high level, we're trying to divide a key k into two halves, each with about √|U| possibilities. Dividing and taking the remainder by √|U| achieves this goal directly; the bit split is an approximation that runs faster on commodity hardware (assuming |U| is a power of two; the worst case is when |U| is not a perfect square and the first half has twice as many possibilities as the second). Pick one method and stick with it.
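To make the two splits concrete, here is a small sketch (my own example, not from the original explanations) that computes high and low both ways for k = 3. The bit split assumes the universe size is a power of two, so it uses |U| = 16, matching the 00/11 example:
#include <cmath>
#include <cstdio>

int main() {
    int k = 3;

    // Second explanation: arithmetic split by sqrt(|U|); works for |U| = 9.
    int u = 9;
    int root = (int)std::sqrt((double)u);                                     // sqrt(9) = 3
    std::printf("divide/mod: high=%d low=%d\n", k / root, k % root);          // high=1 low=0

    // First explanation: bit split; assumes |U| is a power of two, say |U| = 16,
    // so the 4-bit key 0011 splits into the halves 00 and 11.
    int halfBits = 2;
    std::printf("bit split:  high=%d low=%d\n",
                k >> halfBits, k & ((1 << halfBits) - 1));                    // high=0 low=3
}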
Here's an example of FindSuccessor(3, S). For simplicity, I'm going to bottom out the recursion at three elements.
The tree looks like
min=0| aux
max=7|------->min=0|
/ | \ max=2|
/ | \ /|\
/ | \ 0 1 2
/ | \
v v v
min=0| min=3| min=6|
max=1| max=4| max=7|
/| /| /|
0 1 3 4 6 7
At the root, we split 3 = (1, 0) and check whether child 1 (the middle child) has max > 3. It does, so we descend there and use brute force to compute the answer, 4. (Of course, if the tree had more than two levels, we would search recursively.)
A more interesting case is when S = {0, 1, 3, 6, 7}.
min=0| aux
max=7|------->min=0|
/ | \ max=2|
/ | \ /|\
/ | \ 0 1 2
/ | \
v v v
min=0| min=3| min=6|
max=1| max=3| max=7|
/| / /|
0 1 3 6 7
Here, we examine subtree 1 of the root, {3}, and find that its max is not greater than 3. So we find the successor of 1 in the aux data structure, which is 2, and return the min of subtree 2, which is 6.
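To make the two walks above concrete, here is a small self-contained sketch (my code, not the answerer's) of the same two-level search: split the key into (high, low), look inside cluster high for something larger, and otherwise take the min of the next non-empty cluster. The summary step is just a linear scan here, since the example bottoms out at three-element clusters anyway:
#include <cstdio>
#include <set>
#include <vector>

// Successor search over the universe {0..8} split into 3 clusters of 3 keys each.
int findSuccessor(const std::vector<std::set<int>>& clusters, int x) {
    int high = x / 3, low = x % 3;                        // split the key as in the question
    auto it = clusters[high].upper_bound(low);            // anything bigger inside x's own cluster?
    if (it != clusters[high].end())
        return high * 3 + *it;
    for (size_t h = high + 1; h < clusters.size(); ++h)   // otherwise: next non-empty cluster...
        if (!clusters[h].empty())
            return (int)h * 3 + *clusters[h].begin();     // ...and its min is the successor
    return -1;                                            // no successor
}

int main() {
    // S = {0, 1, 3, 4, 6, 7}: cluster 1 = {3, 4}, so the successor of 3 is found locally.
    std::vector<std::set<int>> s1 = {{0, 1}, {0, 1}, {0, 1}};
    // S = {0, 1, 3, 6, 7}: cluster 1 = {3}, so we fall through to cluster 2's min.
    std::vector<std::set<int>> s2 = {{0, 1}, {0}, {0, 1}};
    std::printf("%d %d\n", findSuccessor(s1, 3), findSuccessor(s2, 3));   // prints 4 6
}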

Related

hash for particular array

I have a very particular problem that I want to solve efficiently.
A geometry is defined by V volumes, numbered from 0 to V-1.
Each volume is bounded by several surfaces, numbered from 0 to N-1.
Volume | Surfaces
--------------------
Geometry A (V=3, N=7): 0 | [0 3 5 6 2]
1 | [5 4 2 1]
2 | [4 0 1 3 6]
Note that a surface will only appear once in a volume.
Also, a surface is at most in 2 volumes of a geometry.
Here is the problem:
I have two different descriptions of the same underlying geometry and I want to find which volume in Geometry A corresponds to which volume in Geometry B. In other words, I have the same N surfaces, but the V volumes are defined differently.
Here is a Geometry B that could correspond to Geometry A above:
Volume | Surfaces
--------------------
Geometry B (V=3, N=7): 0 | [1 5 4 2]
1 | [3 6 5 0 2]
2 | [0 1 3 6 4]
Given Geometry A and B, I want to be able to bind each volume of Geometry A to its corresponding volume in Geometry B, as efficiently as possible. For the example above, the expected result is:
A 0 1 2
B 1 0 2
Draft of solution:
Sort each array of surfaces in ascending or descending order, then sort the volumes by the lexicographic order of their surface lists. The problem is easily and robustly solved this way.
Better solution:
Compute a quick, unique hash for each array, then sort the volumes by this hash. The hash should not depend on the order of surfaces in the array.
Why do I think a hash can be a good solution?
Take hash(Volume) = min([Surfaces]).
This hash already has at most 1 collision, because a surface can only appear in 2 volumes!
Now, if I take hash(Volume) = min([Surfaces]) + max([Surfaces]) * N, I still have at most 1 collision, but the probability becomes very small when there are many volumes and surfaces.
As mentioned, your solution is a good approximation for what you want. However, if you seek a perfect hash function, you can use the following method:
Suppose p_i is the i-th prime number, so p_0 = 2, p_1 = 3, p_2 = 5, p_3 = 7, p_4 = 11, p_5 = 13, p_6 = 17, p_7 = 19, .... We can define a hash function on the values x_0, x_1, ..., x_k of an array as h(x_0, ..., x_k) = p_{x_0} p_{x_1} ... p_{x_k}. For repeated values we can use the number of repetitions as the power of p_{x_i}: for example, if x_i is repeated 3 times, the power of p_{x_i} in h would be p_{x_i}^3. In general, if the number of repetitions of x_i is a_i, we have h(x_0, ..., x_k) = p_{x_0}^{a_0} p_{x_1}^{a_1} ... p_{x_k}^{a_k}.
Hence, for geometry A we have:
Volume | Surfaces | Hash
----------------------------------
geometry A 0 | [0, 3, 5, 6, 2] | 2 * 7 * 13 * 17 * 5 = 15470
1 | [5, 4, 2, 1] | 13 * 11 * 5 * 3 = 2145
2 | [4, 0, 1, 3, 6] | 11 * 2 * 3 * 7 * 17 = 7854
And similarly for geometry B. As this function returns a unique value for each array (regardless of the order of its elements), you can match the volumes using the corresponding hash values. If N is not big, you can use a precomputed list of primes.
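Here is a minimal sketch of that prime-product hash (my code, with the primes hard-coded for N = 7; note that the product overflows 64 bits quickly for larger surface indices, so big integers would be needed in general). It reproduces the three values in the table:
#include <cstdint>
#include <cstdio>
#include <vector>

uint64_t primeHash(const std::vector<int>& surfaces,
                   const std::vector<uint64_t>& primes) {
    uint64_t h = 1;
    for (int s : surfaces)
        h *= primes[s];          // a repeated surface would multiply in again,
                                 // i.e. contribute as a power of its prime
    return h;
}

int main() {
    std::vector<uint64_t> primes = {2, 3, 5, 7, 11, 13, 17, 19};  // p_0 .. p_7
    std::printf("%llu\n", (unsigned long long)primeHash({0, 3, 5, 6, 2}, primes));  // 15470
    std::printf("%llu\n", (unsigned long long)primeHash({5, 4, 2, 1}, primes));     // 2145
    std::printf("%llu\n", (unsigned long long)primeHash({4, 0, 1, 3, 6}, primes));  // 7854
}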
I found a pretty good hash function that should almost never have collisions:
V: [S_0 S_1 S_2 S_3...S_N-1]
u64 hash(const std::vector<int>& V) {   // u64 = e.g. std::uint64_t
    u64 h = 0;
    for (int s : V)
        h ^= u64(1) << (s & 63);        // the 1 must be 64-bit, or the shift overflows an int
    return h;
}
This gives a 64-bit number where all values are possible (unlike Omg's solution, where most numbers are impossible to reach, given that there is no repetition in the list of surfaces).
In the extreme case where there is a collision (which I will see after sorting), I will compare the arrays lexicographically in a stupid manner.

Segment tree data position to tree position relation

I wonder if there is any relation between a data position in data_array and the corresponding position in tree_array.
int data[N];
int tree[M]; // let M = 2^X - 1, where X = nearest ceiling power of 2 to N;
void build_segment_tree();
I wonder if I can say that the n-th value of data[] is mapped to the i-th value of tree[]. Is there any mathematical relation?
You certainly can. A segment tree is used for its capability to store
segment information.
Now, if you want to create a segment tree out of N elements, then
you will need ceil(log_2(N)) + 1 levels, and in the last level you will find all the
length-1 ranges, i.e. the single elements.
These elements will be precisely at the positions (1-indexed) 2^ceil(log_2(N)) to 2^ceil(log_2(N)) + N - 1.
[1-8]
/ \
[1-4] [5-8]
/ \ / \
[1-2][3-4] [5-6][7-8]
/\ /\ /\ /\
[1][2] [3][4] [5][6] [7][8]
                1-11
              /      \
          1-6          7-11
         /    \        /    \
      1-3     4-6    7-9    10-11
     /   \   /   \   /   \   /  \
   1-2    3 4-5   6 7-8   9 10  11
   / \      / \     / \
  1   2    4   5   7   8
This mapping is only valid for segment trees built over a power-of-two number of elements.
For other values of N the single elements are not necessarily laid out contiguously in the last level, as the second diagram shows.
So the formula above does not hold when N is not a power of two.
In that case you cannot find such a simple rule.
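Here is a minimal sketch of the power-of-two case (my code; the sum operation and the sample values are arbitrary) showing the mapping data[i] -> tree[size + i] with a 1-indexed tree and size = 2^ceil(log2(N)):
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> data = {5, 2, 7, 1, 9, 3, 8, 6};       // N = 8, a power of two
    int n = (int)data.size();
    int size = 1;
    while (size < n) size <<= 1;                            // size = 2^ceil(log2(N))

    std::vector<long long> tree(2 * size, 0);               // 1-indexed, root at tree[1]
    for (int i = 0; i < n; ++i)
        tree[size + i] = data[i];                           // leaf mapping: data[i] -> tree[size + i]
    for (int v = size - 1; v >= 1; --v)
        tree[v] = tree[2 * v] + tree[2 * v + 1];            // internal nodes store the sum of their children

    std::printf("data[3] = %d, tree[%d] = %lld\n", data[3], size + 3, tree[size + 3]);
}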

How many permutations of a given array result in BST's of height 2?

A BST is generated (by successive insertion of nodes) from each permutation of keys from the set {1,2,3,4,5,6,7}. How many permutations determine trees of height two?
I've been stuck on this simple question for quite some time. Any hints, anyone?
By the way the answer is 80.
Consider how the tree would be height 2?
-It needs to have 4 as the root, 2 as the left child, 6 as the right child, etc.
How come 4 is the root?
-It needs to be inserted first. So one number is now fixed; the other 6 can still move around in the permutation.
And?
-After the first insert there are still 6 places left, 3 for the left and 3 for the right subtrees. That's 6 choose 3 = 20 choices.
Now what?
-For the left and right subtrees, their roots need to be inserted first, then the children's order does not affect the tree - 2, 1, 3 and 2, 3, 1 gives the same tree. That's 2 for each subtree, and 2 * 2 = 4 for the left and right subtrees.
So?
In conclusion: C(6, 3) * 2 * 2 = 20 * 2 * 2 = 80.
Note that there is only one possible shape for this tree - it has to be perfectly balanced. It therefore has to be this tree:
4
/ \
2 6
/ \ / \
1 3 5 7
This requires 4 to be inserted first. After that, the insertions need to build up the subtrees holding 1, 2, 3 and 5, 6, 7 in the proper order. This means that we will need to insert 2 before 1 and 3 and need to insert 6 before 5 and 7. It doesn't matter what relative order we insert 1 and 3 in, as long as they're after the 2, and similarly it doesn't matter what relative order we put 5 and 7 in as long as they're after 6. You can therefore think of what we need to insert as 2 X X and 6 Y Y, where the X's are the children of 2 and the Y's are the children of 6. We can then find all possible ways to get back the above tree by finding all interleaves of the sequences 2 X X and 6 Y Y, then multiplying by four (the number of ways of assigning X and Y the values 1, 3, 5, and 7).
So how many ways are there to interleave? Well, you can think of this as the number of ways to permute the sequence L L L R R R, since each permutation of L L L R R R tells us how to choose from either the Left sequence or the Right sequence. There are 6! / 3! 3! = 20 ways to do this. Since each of those twenty interleaves gives four possible insertion sequences, there end up being a total of 20 × 4 = 80 possible ways to do this.
Hope this helps!
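If you want to sanity-check the count, here is a small brute-force sketch (my own code, not part of the answer) that builds the BST for every permutation of {1, ..., 7} and counts those of height 2; it prints 80:
#include <algorithm>
#include <cstdio>
#include <vector>

// Plain unbalanced BST stored in parallel arrays; insert returns the depth of the new node.
struct BST {
    std::vector<int> key, left, right;                     // -1 = no child
    int insert(int v) {
        if (key.empty()) { key = {v}; left = {-1}; right = {-1}; return 0; }
        int cur = 0, depth = 0;
        while (true) {
            int& next = (v < key[cur]) ? left[cur] : right[cur];
            ++depth;
            if (next == -1) {
                next = (int)key.size();
                key.push_back(v); left.push_back(-1); right.push_back(-1);
                return depth;
            }
            cur = next;
        }
    }
};

int main() {
    std::vector<int> perm = {1, 2, 3, 4, 5, 6, 7};
    int count = 0;
    do {
        BST t;
        int height = 0;                                    // height = max depth of any inserted node
        for (int v : perm) height = std::max(height, t.insert(v));
        if (height == 2) ++count;
    } while (std::next_permutation(perm.begin(), perm.end()));
    std::printf("%d\n", count);                            // prints 80
}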
I've created a table of the number of permutations possible with 1 - 12 elements, for heights up to 12, and included the per-root breakdown for anybody trying to check that their manual process (described in other answers) matches the actual values.
http://www.asmatteringofit.com/blog/2014/6/14/permutations-of-a-binary-search-tree-of-height-x
Here is some C++ code supporting the accepted answer; I haven't shown the obvious ncr(i, j) function (the binomial coefficient). Hope someone will find it useful.
// solve(n, h) = number of insertion orders of n distinct keys whose BST has height exactly h
// (a single node has height 0); ncr(n, r) is the binomial coefficient "n choose r".
int solve(int n, int h) {
    if (n <= 1)
        return (h == 0);
    int ans = 0;
    for (int i = 0; i < n; i++) {          // i keys go into the left subtree, n-i-1 into the right
        int res = 0;
        for (int j = 0; j < h - 1; j++) {  // one subtree has height h-1, the other some height j < h-1
            res = res + solve(i, j) * solve(n - i - 1, h - 1);
            res = res + solve(n - i - 1, j) * solve(i, h - 1);
        }
        res = res + solve(i, h - 1) * solve(n - i - 1, h - 1);  // or both subtrees have height h-1
        ans = ans + ncr(n - 1, i) * res;   // interleave the two subtrees' insertion sequences
    }
    return ans;
}
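For completeness, here is one way the omitted ncr(n, r) helper could be written (my addition; declare it before solve, and any correct binomial coefficient works). With it, solve(7, 2) returns 80:
int ncr(int n, int r) {
    if (r < 0 || r > n) return 0;
    long long result = 1;
    for (int k = 1; k <= r; ++k)
        result = result * (n - r + k) / k;   // equals C(n-r+k, k) after each step, so it stays integral
    return (int)result;
}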
The tree must have 4 as the root and 2 and 6 as the left and right children, respectively. There is only one choice for the root, so the insertion must start with 4; once the root is inserted, however, there are many insertion orders. There are 2 choices for the second insertion: 2 or 6. Say we choose 2. Then there are three cases, depending on where 6 goes:
If 6 is the third insertion (4, 2, 6, -, -, -, -), there are 4! = 24 choices for the remaining insertions.
If 6 is the fourth insertion (4, 2, -, 6, -, -, -), there are 2 choices for the third insertion (1 or 3) and 3! choices for the rest, so 2 * 3! = 12.
If 6 is the fifth insertion (4, 2, -, -, 6, -, -), there are 2 choices for the third and fourth insertions (1 then 3, or 3 then 1) and 2 choices for the last two (5 then 7, or 7 then 5), so 4.
In total, if 2 is the second insertion we have 24 + 12 + 4 = 40 choices for the rest of the insertions. Similarly, there are 40 choices if the second insertion is 6, so the total number of different insertion orders is 80.

Hanoi configuration at a certain move

I am interested in finding how many disks are on each peg at a given move in the towers of Hanoi puzzle. For example, given n = 3 disks we have this sequence of configurations for optimally solving the puzzle:
0 1 2
0. 3 0 0
1. 2 0 1 (move 0 -> 2)
2. 1 1 1 (move 0 -> 1)
3. 1 2 0 (move 2 -> 1)
4. 0 2 1 (move 0 -> 2)
5. 1 1 1 (move 1 -> 0)
6. 1 0 2 (move 1 -> 2)
7. 0 0 3 (move 0 -> 2)
So given move number 5, I want to return 1 1 1, given move number 6, I want 1 0 2 etc.
This can easily be done by using the classical algorithm and stopping it after a certain number of moves, but I want something more efficient. The wikipedia page I linked to above gives an algorithm under the Binary solutions section. I think this is wrong however. I also do not understand how they calculate n.
If you follow their example and convert the disk positions it returns to what I want, it gives 4 0 4 for n = 8 disks and move number 216. Using the classical algorithm however, I get 4 2 2.
There is also an efficient algorithm implemented in C here that also gives 4 2 2 as the answer, but it lacks documentation and I don't have access to the paper it's based on.
The algorithm in the previous link seems correct, but can anyone explain how exactly it works?
A few related questions that I'm also interested in:
Is the wikipedia algorithm really wrong, or am I missing something? And how do they calculate n?
I only want to know how many disks are on each peg at a certain move, not which peg each disk is on, which is what the literature seems to be more concerned with. Is there a simpler way to solve my problem?
1) If your algo says Wikipedia is broken I'd guess you are right...
2) As for calculating the number of disks on each peg, it is pretty straightforward to do with a recursive algorithm:
(Untested, inelegant and possibly full of off-by-one errors; code follows:)
function hanoi(n, nsteps, begin, middle, end, nb, nm, ne)
// n = number of disks to move from begin to end
// nsteps = number of steps to move
// begin, middle, end = index of the pegs
// nb, nm, ne = number of disks currently in each of the pegs
if(nsteps == 0) return (begin, middle, end, nb, nm, ne)
//else:
//hanoi goes like
// a) h(n-1, begin, end, middle) | 2^(n-1) - 1 steps
// b) move 1 from begin -> end | 1 step
// c) h(n-1, middle, begin, end) | 2^(n-1) - 1 steps
// Since we know how the pile will look like after a), b) and c)
// we can skip those steps if nsteps is large...
if(nsteps < 2^(n-1)){
return hanoi(n-1, nsteps, begin, end, middle, nb, ne, nm);
}
nb -= n;
nm += (n-1);
ne += 1;
nsteps -= 2^(n-1);
//we are now between b) and c)
return hanoi((n-1), nsteps, middle, begin, end, nm, nb, ne);
function h(n, nsteps)
return hanoi(n, nsteps, 1, 2, 3, n, 0, 0)
If you want efficiency, you should try to convert this to an iterative form (it shouldn't be hard; you don't need to maintain a stack anyway) and find a way to better represent the state of the program, instead of using 6+ variables willy-nilly.
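Here is a cleaned-up C++ sketch of the same skip-ahead idea (my code, not a drop-in for the pseudocode above; pegs are 0, 1, 2 and k counts completed moves as in the question). It reproduces 1 1 1 for n = 3 at move 5 and 4 2 2 for n = 8 at move 216:
#include <cstdio>

// Distribute the disks after k moves of the optimal n-disk solution that moves
// everything from peg `from` to peg `to` (using `spare`), accumulating into cnt[].
// One recursion level per disk, so it runs in O(n).
void counts(int n, long long k, int from, int spare, int to, int cnt[3]) {
    if (n == 0) return;
    long long half = 1LL << (n - 1);                        // move number at which the biggest disk moves
    if (k < half) {
        cnt[from] += 1;                                     // biggest disk has not moved yet
        counts(n - 1, k, from, to, spare, cnt);             // still moving n-1 disks from -> spare
    } else {
        cnt[to] += 1;                                       // biggest disk already sits on the target
        counts(n - 1, k - half, spare, from, to, cnt);      // now moving n-1 disks spare -> to
    }
}

int main() {
    int cnt[3] = {0, 0, 0};
    counts(8, 216, 0, 1, 2, cnt);
    std::printf("%d %d %d\n", cnt[0], cnt[1], cnt[2]);      // prints 4 2 2
}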
You can make use of the fact that the position at powers of two is easily known. For a tower of size T, we have:
Time          | Heights
2^T - 1       | { 0,   0,   T   }
2^(T-1)       | { 0,   T-1, 1   }
2^(T-1) - 1   | { 1,   T-1, 0   }
2^(T-2)       | { 1,   1,   T-2 }
2^(T-2) - 1   | { 2,   0,   T-2 }
2^(T-3)       | { 2,   T-3, 1   }
2^(T-3) - 1   | { 3,   T-3, 0   }
...
0             | { T,   0,   0   }
It is easy to find out between which of those levels your move k lies; simply look at log2(k).
Next, notice that between moves 2^(a-1) and 2^a - 1, there are T - a disks which stay in the same place (the heaviest ones). All the other disks will move, however, since at this stage the algorithm is moving the subtower of size a. Hence you can use an iterative approach, descending one level at a time.
It might be a bit tricky to get the book-keeping right, but here you have the ingredients to find the heights for any k in O(T) time, i.e. logarithmic in the number of moves.
Cheers
If you look at the first few moves of the puzzle, you'll see an important pattern. Each move (i - j) below means on turn i, move disc j. Discs are 0-indexed, where 0 is the smallest disc.
1 - 0
2 - 1
3 - 0
4 - 2
5 - 0
6 - 1
7 - 0
8 - 3
9 - 0
10 - 1
11 - 0
12 - 2
13 - 0
14 - 1
15 - 0
Disc 0 is moved every 2 turns, starting on turn 1. Disc 1 is moved every 4 turns, starting on turn 2 ... Disc i is moved every 2^(i+1) turns, starting on turn 2^i.
So, in constant time we can determine how many times a given disc has moved, given m:
moves = (m + 2^i) / (2^(i+1)) [integer division]
The next thing to note is that each disc moves in a cyclic pattern. Namely, the odd-numbered discs move to the left each time they move (2, 3, 1, 2, 3, 1...) and the even-numbered discs move to the right (1, 3, 2, 1, 3, 2...)
So once you know how many times a disc has moved, you can easily determine which peg it ends on by taking mod 3 (and doing a little bit of figuring).
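Putting the two observations together, here is a small sketch (my own code, not the answerer's). The direction of each disc's cycle is chosen from the parity of n - i so that the full tower ends up on peg 2, matching the question's 0 -> 2 convention; tallying the per-disc pegs then gives the counts the question asks for. It prints 4 2 2 for n = 8, move 216:
#include <cstdio>

// Peg (0, 1 or 2) of disc i (0 = smallest) after move m of the optimal
// n-disc solution that moves the tower from peg 0 to peg 2.
int pegOfDisc(int n, int i, long long m) {
    long long times = (m + (1LL << i)) >> (i + 1);   // how many times disc i has moved so far
    int step = ((n - i) % 2 == 0) ? 1 : 2;           // cycle 0->1->2 or 0->2->1, so the tower ends on peg 2
    return (int)((times % 3) * step % 3);
}

int main() {
    int n = 8;
    long long m = 216;
    int cnt[3] = {0, 0, 0};
    for (int i = 0; i < n; ++i)
        cnt[pegOfDisc(n, i, m)] += 1;
    std::printf("%d %d %d\n", cnt[0], cnt[1], cnt[2]);   // prints 4 2 2
}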

How does this work? Weird Towers of Hanoi Solution

I was lost on the internet when I discovered this unusual, iterative solution to the towers of Hanoi:
for (int x = 1; x < (1 << nDisks); x++)
{
FromPole = (x & x-1) % 3;
ToPole = ((x | x-1) + 1) % 3;
moveDisk(FromPole, ToPole);
}
This post also has similar Delphi code in one of the answers.
However, for the life of me, I can't seem to find a good explanation for why this works.
Can anyone help me understand it?
The recursive solution to Towers of Hanoi works so that if you want to move N disks from peg A to C, you first move N-1 from A to B, then you move the bottom one to C, and then you again move the N-1 disks from B to C. In essence,
hanoi(from, to, spare, N):
hanoi(from, spare, to, N-1)
moveDisk(from, to)
hanoi(spare, to, from, N-1)
Clearly hanoi( _ , _ , _ , 1) takes one move, and hanoi ( _ , _ , _ , k) takes as many moves as 2 * hanoi( _ , _ , _ , k-1) + 1. So the solution length grows in the sequence 1, 3, 7, 15, ... This is the same sequence as (1 << k) - 1, which explains the length of the loop in the algorithm you posted.
If you look at the solutions themselves, for N = 1 you get
FROM TO
; hanoi(0, 2, 1, 1)
0 2 movedisk
For N = 2 you get
FROM TO
; hanoi(0, 2, 1, 2)
; hanoi(0, 1, 2, 1)
0 1 ; movedisk
0 2 ; movedisk
; hanoi(1, 2, 0, 1)
1 2 ; movedisk
And for N = 3 you get
FROM TO
; hanoi(0, 2, 1, 3)
; hanoi(0, 1, 2, 2)
; hanoi(0, 2, 1, 1)
0 2 ; movedisk
0 1 ; movedisk
; hanoi(2, 1, 0, 1)
2 1 ; movedisk
0 2 ; movedisk ***
; hanoi(1, 2, 0, 2)
; hanoi(1, 0, 2, 1)
1 0 ; movedisk
1 2 ; movedisk
; hanoi(0, 2, 1, 1)
0 2 ; movedisk
Because of the recursive nature of the solution, the FROM and TO columns follow a recursive logic: if you take the middle entry of the columns, the parts above and below are copies of each other, but with the numbers permuted. This is an obvious consequence of the algorithm itself, which does not perform any arithmetic on the peg numbers but only permutes them. In the case N=3 the middle row is at x=4 (marked with three stars above).
Now the expression (X & (X-1)) unsets the least significant set bit of X, so it maps e.g. the numbers from 1 to 7 like this:
1 -> 0
2 -> 0
3 -> 2
4 -> 0 (***)
5 -> 4 % 3 = 1
6 -> 4 % 3 = 1
7 -> 6 % 3 = 0
The trick is that because the middle row is always at an exact power of two and thus has exactly one bit set, the part after the middle row equals the part before it when you add the middle row value (4 in this case) to the rows (i.e. 4=0+4, 6=2+6). This implements the "copy" property, the addition of the middle row implements the permutation part. The expression (X | (X-1)) + 1 sets the lowest zero bit which has ones to its right, and clears these ones, so it has similar properties as expected:
1 -> 2
2 -> 4 % 3 = 1
3 -> 4 % 3 = 1
4 -> 8 (***) % 3 = 2
5 -> 6 % 3 = 0
6 -> 8 % 3 = 2
7 -> 8 % 3 = 2
As to why these sequences actually produce the correct peg numbers, let's consider the FROM column. The recursive solution starts with hanoi(0, 2, 1, N), so at the middle row (2 ** (N-1)) you must have movedisk(0, 2). Now by the recursion rule, at (2 ** (N-2)) you need to have movedisk(0, 1) and at (2 ** (N-1)) + 2 ** (N-2) movedisk (1, 2). This creates the "0,0,1" pattern for the from pegs which is visible with different permutations in the table above (check rows 2, 4 and 6 for 0,0,1 and rows 1, 2, 3 for 0,0,2, and rows 5, 6, 7 for 1,1,0, all permuted versions of the same pattern).
Now, of all the functions that have this property of creating copies of themselves around powers of two, but with offsets, the authors have selected those that produce, modulo 3, the correct permutations. This isn't an overly difficult task because there are only 6 possible permutations of the three integers 0..2 and the permutations progress in a logical order in the algorithm. (X|(X-1))+1 is not necessarily deeply linked with the Hanoi problem, and it doesn't need to be; it's enough that it has the copying property and that it happens to produce the correct permutations in the correct order.
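A quick way to convince yourself is to print both expressions for x = 1..7 and compare them against the N = 3 trace above (a tiny sketch of my own, not part of the original answer):
#include <cstdio>

int main() {
    // Reproduces the N = 3 move list: (0,2) (0,1) (2,1) (0,2) (1,0) (1,2) (0,2)
    for (int x = 1; x < (1 << 3); ++x) {
        int from = (x & (x - 1)) % 3;
        int to   = ((x | (x - 1)) + 1) % 3;
        std::printf("x=%d: move %d -> %d\n", x, from, to);
    }
}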
antti.huima's solution is essentially correct, but I wanted something more rigorous, and it was too big to fit in a comment. Here goes:
First note: at the middle step x = 2^(N-1) of this algorithm, the "from" peg is 0, and the "to" peg is 2^N % 3. This leaves 2^(N-1) % 3 for the "spare" peg.
This is also true for the last step of the algorithm, so we see that actually the authors' algorithm
is a slight "cheat": they're moving the disks from peg 0 to peg 2^N % 3, rather than a fixed,
pre-specified "to" peg. This could be changed with not much work.
The original Hanoi algorithm is:
hanoi(from, to, spare, N):
hanoi(from, spare, to, N-1)
move(from, to)
hanoi(spare, to, from, N-1)
Plugging in "from" = 0, "to" = 2^N % 3, "spare" = 2^(N-1) % 3, we get (suppressing the %3's):
hanoi(0, 2^N, 2^(N-1), N):
(a) hanoi(0, 2^(N-1), 2^N, N-1)
(b) move(0, 2^N)
(c) hanoi(2^(N-1), 2^N, 0, N-1)
The fundamental observation here is:
In line (c), the pegs are exactly the pegs of hanoi(0, 2^(N-1), 2^N, N-1) shifted by 2^(N-1) % 3, i.e.
they are exactly the pegs of line (a) with this amount added to them.
I claim that it follows that when we
run line (c), the "from" and "to" pegs are the corresponding pegs of line (a) shifted by 2^(N-1) % 3. This
follows from the easy, more general lemma that in hanoi(a+x, b+x, c+x, N), the "from" and "to" pegs are shifted exactly x from those in hanoi(a, b, c, N).
Now consider the functions
f(x) = (x & (x-1)) % 3
g(x) = ((x | (x-1)) + 1) % 3
To prove that the given algorithm works, we only have to show that:
f(2^(N-1)) == 0 and g(2^(N-1)) == 2^N % 3;
for 0 < i < 2^(N-1), we have f(2^(N-1) + i) == (f(i) + 2^(N-1)) % 3, and g(2^(N-1) + i) == (g(i) + 2^(N-1)) % 3.
Both of these are easy to show.
This isn't directly answering the question, but it was too long to put in a comment.
I had always done this by analyzing the size of disk you should move next. If you look at the disks moved, it comes out to:
1 disk : 1
2 disks : 1 2 1
3 disks : 1 2 1 3 1 2 1
4 disks : 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1
Odd sizes always move in the opposite direction from even ones, following the peg order (0, 1, 2, repeat) or (2, 1, 0, repeat).
If you take a look at the pattern, the ring to move is the highest set bit of the xor of the number of moves and the number of moves + 1.
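Equivalently, since the highest set bit of x ^ (x - 1) is just the lowest set bit of x, the disc moved on move x (1-based moves, disc 0 = smallest) is the number of trailing zeros of x. A tiny sketch of my own, using the GCC/Clang intrinsic __builtin_ctz:
#include <cstdio>

int main() {
    for (unsigned x = 1; x <= 15; ++x)          // reproduces 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
        std::printf("%d ", __builtin_ctz(x));   // disc moved on move x (0 = smallest)
    std::printf("\n");
}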

Resources