I have a heap (implemented like a binary tree: each node has two pointers to the children and one pointer to the parent).
How can I find the k-th element (in BFS order), given the number of elements in the heap? I think it can be done in O(log n) time.
(I'm assuming by "kth element (in a BFS order)" that you mean the kth element from the perspective of a top-to-bottom, left-to-right scan of the input.)
Since a binary heap is a complete binary tree (every level is full except possibly the last), the shape of the tree is a perfect binary tree of some height (containing 2^k - 1 nodes for some k) plus a last row with some number of nodes filled in from left to right. A really nifty property of these trees shows up when you write out the numbers of the nodes in a picture, one-indexing the values:
               1
       2               3
   4       5       6       7
 8   9  10  11  12  13  14  15
Notice that each layer starts with a node that's a power of two. So let's suppose, hypothetically, that you wanted to look up the number 13. The biggest power of two no greater than 13 is 8, so we know that 13 must appear in the row
8 9 10 11 12 13 14 15
We can now use this knowledge to reverse-engineer the path from 13 back up to the root of the tree. We know, for example, that 13 is in the latter half of the numbers in this row, which means that 13 belongs to the right subtree of the root (if it belonged to the left subtree, then we would be in the subtree containing 8, 9, 10, and 11). This means that we can go right from the root and throw out half of the numbers to get
12 13 14 15
We are now at node 3 in the tree. So do we go left or right? Well, 13 is in the first half of these numbers, so we know at this point that we need to descend into the left subtree of node 3. This takes us to node 6, and now we're left with the first half of the numbers:
12 13
13 is in the right half of these nodes, so we should descend to the right, taking us to node 13. And voila! We're there!
So how did this process work? Well, there's a really, really cute trick we can use. Let's write out the same tree we had above, but in binary:
0001
0010 0011
0100 0101 0110 0111
1000 1001 1010 1011 1100 1101 1110 1111
                         ^^^^
I've pointed out the location of node 13. Our algorithm worked in the following way:
1. Find the layer containing the node.
2. While not at the node in question:
   - If the node is in the first half of the layer it's in, move left and throw away the right half of the range.
   - If the node is in the second half of the layer it's in, move right and throw away the left half of the range.
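The range-halving walk above can be sketched in C++ roughly like this. The function name `pathByHalving` and the 'L'/'R' string encoding are my own assumptions for illustration; they are not part of the original description.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the root-to-node path for the k-th node
// (1-indexed, BFS order) as a string of 'L' and 'R' moves.
std::string pathByHalving(std::uint32_t k) {
    assert(k >= 1);

    // Find the layer [lo, hi) that contains k; lo is a power of two.
    std::uint64_t lo = 1;
    while (lo * 2 <= k) lo *= 2;
    std::uint64_t hi = lo * 2;

    std::string path;
    while (hi - lo > 1) {
        std::uint64_t mid = lo + (hi - lo) / 2;
        if (k < mid) {   // first half of the range: move left
            path += 'L';
            hi = mid;
        } else {         // second half of the range: move right
            path += 'R';
            lo = mid;
        }
    }
    return path;
}
```

For node 13 this walks the ranges [8, 16), [12, 16), [12, 14) and [13, 14), matching the steps traced by hand above.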
Let's think about what this means in binary. Finding the layer containing the node is equivalent to finding the most significant bit set in the number. In 13, which has binary representation 1101, the MSB is the 8 bit. This means that we're in the layer starting with eight.
So how do we determine whether we're in the left subtree or the right subtree? Well, to do that, we'd need to see if we are in the first half of this layer or the second half. And now for a cute trick - look at the next bit after the MSB. If this bit is set to 0, we're in the first half of the range, and otherwise we're in the second half of the range. Thus we can determine which half of the range we're in, and therefore which subtree to descend into, by looking at just the next bit of the number!
Once we've done that, we can just repeat this process. What do we do at the next level? Well, if the next bit is a zero, we go left, and if the next bit is a one, we go right. Take a look at what this means for 13:
1101
 ^^^
 |||
 ||+--- Go right at the third node.
 ||
 |+---- Go left at the second node.
 |
 +----- Go right at the first node.
In other words, we can spell out the path from the root of the tree to our node in question just by looking at the bits of the number after the MSB!
Does this always work? You bet! Let's try the number 7. This has binary representation 0111. The MSB is in the 4's place. Using our algorithm, we'd do this:
0111
  ^^
  ||
  |+--- Go right at the second node.
  |
  +---- Go right at the first node.
Looking in our original picture, this is the right path to take!
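To check the bit trick on a few more numbers without drawing the tree, here is a small sketch that only spells out the path. The name `pathFromBits` and the 'L'/'R' encoding are assumptions made for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spells out the path to node n (1-indexed, BFS order)
// by reading the bits of n that come after its most significant set bit.
std::string pathFromBits(unsigned n) {
    assert(n >= 1);

    // Index of the most significant set bit.
    int msb = 0;
    for (unsigned m = n; m > 1; m >>= 1) msb++;

    std::string path;
    for (int bit = msb - 1; bit >= 0; bit--)
        path += ((n >> bit) & 1u) ? 'R' : 'L';   // 1 = right, 0 = left
    return path;
}
```

With this, 13 (1101) gives right, left, right and 7 (0111) gives right, right, as in the two pictures above.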
Here's some rough C++ code for this algorithm:
struct Node {
    Node* left;
    Node* right;
    Node* parent;
};

/* Returns the n-th node (1-indexed, BFS order), or nullptr if n < 1 or
 * the tree has fewer than n nodes.
 */
Node* NthNode(Node* root, int n) {
    if (n < 1) return nullptr;

    /* Find the index of the most significant set bit of n, which marks
     * the largest power of two no greater than n.
     */
    int bitIndex = 0;
    while ((n >> (bitIndex + 1)) != 0) bitIndex++;

    /* Back off the bit index by one to skip the MSB itself. The remaining
     * bits spell out the path down.
     */
    bitIndex--;

    /* Read off the directions to take from the bits of n. */
    for (; root != nullptr && bitIndex >= 0; bitIndex--) {
        int mask = (1 << bitIndex);
        if (n & mask)
            root = root->right;
        else
            root = root->left;
    }
    return root;
}
I haven't tested this code! To paraphrase Don Knuth, I've just shown that conceptually it does the right thing. I might have an off-by-one error in here.
So how fast is this code? Well, the first loop runs until it finds the first power of two greater than n, which takes O(log n) time. The second loop counts backwards through the bits of n one at a time, doing O(1) work at each step. The overall algorithm thus takes a total of O(log n) time.
Hope this helps!
I saw an answer here with the idea implemented in Python (not very familiar with Python) - I was looking for a more general algorithm.
EDIT:
For clarification:
Say we are given a list of integer keys: 23 44 88 12 74 32 7 39 10
That list was chosen arbitrarily. We are to create an almost complete (or complete) binary search tree from that list. There is supposed to be only one such tree...how do we find it?
A binary search tree is constructed so that all items on a node's left subtree are less than the node, and all nodes on the right subtree are greater than the node.
A complete (or almost complete) binary tree is one in which all levels except possibly the last are completely full, and the bottom level is filled to the left.
So, for example, this is an almost-complete binary search tree:
      4
    /   \
   2     5
  / \
 1   3
This is not:
      3
    /   \
   2     4
  /       \
 1         5
Because the bottom level of the tree is not filled from the left.
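If you want to test a candidate tree against that definition, a breadth-first scan is one way to do it. This is only a sketch: `TreeNode` and `isComplete` are names I made up, and the check looks at the shape only, not at the search-tree ordering.

```cpp
#include <cassert>
#include <queue>

struct TreeNode {
    int key;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
    explicit TreeNode(int k) : key(k) {}
};

// Returns true if every level except possibly the last is full and the
// bottom level is filled from the left. In BFS order, that means no node
// may appear after the first missing child.
bool isComplete(const TreeNode* root) {
    std::queue<const TreeNode*> pending;
    pending.push(root);
    bool seenGap = false;
    while (!pending.empty()) {
        const TreeNode* node = pending.front();
        pending.pop();
        if (node == nullptr) {
            seenGap = true;
            continue;
        }
        if (seenGap) return false;   // a node shows up after a gap
        pending.push(node->left);
        pending.push(node->right);
    }
    return true;
}
```

On the two trees drawn above, the first one passes and the second one fails because 5 appears after the missing right child of 2.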
If the number of items is one less than a power of two (i.e. 3, 7, 15, etc.), then building the tree is easy. Start by sorting the list. Then, take the middle element as the root. So if you have [1,2,3,4,5,6,7], the root node is 4.
You do the same thing recursively for the right and left halves of the array.
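For the easy case only (a count that is one less than a power of two), the recursion can be sketched as follows. `BstNode` and `buildPerfect` are hypothetical names, and the sketch deliberately does not handle other sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct BstNode {
    int key;
    std::unique_ptr<BstNode> left, right;
    explicit BstNode(int k) : key(k) {}
};

// Builds a BST from sorted[lo, hi) by taking the middle element as the root.
// Assumes hi - lo is one less than a power of two, so every split is even.
std::unique_ptr<BstNode> buildPerfect(const std::vector<int>& sorted,
                                      std::size_t lo, std::size_t hi) {
    if (lo >= hi) return nullptr;
    std::size_t mid = lo + (hi - lo) / 2;
    auto node = std::make_unique<BstNode>(sorted[mid]);
    node->left = buildPerfect(sorted, lo, mid);
    node->right = buildPerfect(sorted, mid + 1, hi);
    return node;
}

// Sorts the keys first, then builds the tree.
std::unique_ptr<BstNode> buildPerfect(std::vector<int> keys) {
    std::sort(keys.begin(), keys.end());
    assert(((keys.size() + 1) & keys.size()) == 0);   // size must be 2^k - 1
    return buildPerfect(keys, 0, keys.size());
}
```

Given the keys 1 through 7 in any order, this produces 4 at the root with 2 and 6 as its children.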
If the number of items is not one less than a power of two, you have to adjust the starting point (the root node) so that the bottom row is left-filled. Note that you might have to apply that adjustment recursively, as well, whenever your subtree length is not one less than a power of two.
Since this is a homework assignment, I'll leave that for you to figure out.
I have tried many online sources, but I am unable to understand how a digital binary search tree works. The link below is an example for your reference:
(LINK: http://cseweb.ucsd.edu/~kube/cls/100/Lectures/lec15/lec15-10.html)
Can anybody construct a tree using these values and explain in detail how it works?
A 00001
S 10011
E 00101
R 10010
C 00011
H 10100
The tree is constructed in such a way that the binary representations of the keys (A,S,E,R,C,H) can be used to locate them in the tree. In each search step, the key is compared to the current node (which is the root of the current search tree). If the key is not the root, the most significant bit of the key's binary representation is used to select the left subtree (if the bit is 0) or the right subtree (if the bit is 1). This process is explained in more detail here.
In the example you provided, the key H (binary representation 10100) can be found as follows.
In the first step, the root is node A. As A does not equal H, the bit 1 is used, indicating that the right subtree should be chosen. Consequently, we consider the node S and the bit string 0100 which results from the original binary representation by omission of the most significant bit.
Since S does not equal H, we use the most significant bit of the remaining string, which is 0, indicating a choice of the left subtree. We consider the node R and the bit string 100.
As R does not equal H, again we use the most significant bit, which is 1, which means that the right subtree is to be chosen. We consider the node H and the bit string 00.
Since H equals H, we have found the desired key and the search terminates.
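The search steps above can be sketched in C++ as follows. The names `DstNode`, `dstSearch` and `kKeyBits` are assumptions for this example, and the sketch assumes every key is exactly 5 bits wide, as in the table.

```cpp
#include <cassert>

struct DstNode {
    char name;
    unsigned key;               // 5-bit key, e.g. H = 0b10100
    DstNode* left = nullptr;
    DstNode* right = nullptr;
    DstNode(char n, unsigned k) : name(n), key(k) {}
};

const int kKeyBits = 5;   // assumption: keys are 5 bits wide

// Compares the key with the current node, then uses one bit per level
// (most significant first) to choose the subtree to continue in.
const DstNode* dstSearch(const DstNode* node, unsigned key) {
    for (int bit = kKeyBits - 1; node != nullptr; bit--) {
        if (node->key == key) return node;   // found the key
        if (bit < 0) return nullptr;         // ran out of bits
        node = ((key >> bit) & 1u) ? node->right : node->left;
    }
    return nullptr;                          // fell off the tree
}
```

Searching for H (10100) visits A, S, R and H in that order, the same sequence as in the walkthrough.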
The DST works by checking the key's bits level by level. At each level it looks at the corresponding bit: if it is 0 it moves left, otherwise it moves right. The bit position that is checked matches the level.
For example:
The root is at level 0.
If the key is to be inserted at level 1, check its 1st bit:
if it is 0, insert it as the left child; if it is 1, insert it as the right child.
Similarly, for the second level check the key's second bit, and so on for the remaining levels.
In the given example;
First, A (00001) becomes the root node. Then S (10011): its first bit is 1, so it moves right and is inserted as the right child of A.
Next is E (00101): its first bit is 0, so it moves left and is inserted as the left child of A. Next in the sequence is R (10010): its first bit is 1, so it moves right to S; its second bit is 0, so it is inserted as the left child of S.
Next is C (00011): its first bit is 0, so it moves left to E; its second bit is 0, so it is inserted as the left child of E. Last is H (10100): its first bit is 1, so it moves right to S; its second bit is 0, so it moves left to R; it has to be inserted at the 3rd level, so check the 3rd bit, which is 1, and insert it as the right child of R.
Hope this clears up your doubt.
So the final DST looks like this: https://i.stack.imgur.com/Iet4n.gif
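The insertion order walked through above can be reproduced with a short sketch. `DstNode`, `dstInsert` and `kKeyBits` are hypothetical names, and the code assumes distinct keys that fit in 5 bits.

```cpp
#include <cassert>
#include <memory>

struct DstNode {
    char name;
    unsigned key;
    std::unique_ptr<DstNode> left, right;
    DstNode(char n, unsigned k) : name(n), key(k) {}
};

const int kKeyBits = 5;   // assumption: keys are 5 bits wide

// Follows one key bit per level (most significant first) until it reaches
// an empty slot, then places the new node there.
void dstInsert(std::unique_ptr<DstNode>& root, char name, unsigned key) {
    std::unique_ptr<DstNode>* slot = &root;
    for (int bit = kKeyBits - 1; *slot != nullptr; bit--) {
        if ((*slot)->key == key) return;   // key already present
        assert(bit >= 0);                  // holds when keys fit in kKeyBits bits
        slot = ((key >> bit) & 1u) ? &(*slot)->right : &(*slot)->left;
    }
    *slot = std::make_unique<DstNode>(name, key);
}
```

Inserting A, S, E, R, C, H in that order gives A at the root, E and S below it, C under E, R under S, and H as the right child of R.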
If a binary tree is constructed as follows
the root is 1
the left child of an element n is 2*n
the right child of an element n is (2*n)+1
if I get a number n, what's the quickest way to find out if it's in the left or right subtree of the root? Is there some easy-to-determine mathematical property of the left subtree?
note: this is NOT a homework question, though it is part of a bigger algorithmic problem I'm trying to solve.
Consider the numbers in binary. Each child will be the parent number with a 0 or 1 appended to it depending on whether it is left or right.
This means that everything to the left of the root will start 10 in binary and anything to the right will start 11 in binary.
This means that you should be able to work out which side it is on just using some shift operations and then some comparisons.
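One way to do those shifts and comparisons, as a minimal sketch (the `Side` enum and the function name `sideOfRoot` are made up for this example):

```cpp
#include <cassert>

enum class Side { Root, Left, Right };

// Shifts n right until only its two leading bits remain: binary 10 means
// the left subtree of the root, binary 11 means the right subtree.
Side sideOfRoot(unsigned n) {
    assert(n >= 1);
    if (n == 1) return Side::Root;
    while (n > 3) n >>= 1;
    return n == 2 ? Side::Left : Side::Right;
}
```

This takes O(log n) shifts; a find-highest-bit instruction could replace the loop if your platform has one.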
I should note that I don't know if this is the most efficient method, but it is an easy-to-determine mathematical property of the left subtree.
As noted by others, the consequence of appending a 0 or 1 is that each digit encodes the path through the tree. The first 1 represents the root of the tree, and from then on a 0 means taking the left branch at that point and a 1 means taking the right branch.
Thus binary 1001101 would mean left, left, right, right, left, right.
An obvious consequence of this is that the number of binary digits determines exactly how deep in the tree that number is. So 1 is at the top (1st) level, 10 would be at the second level (one choice made), and my example 1001101 would be at the 7th level (six choices made). I should note that I'm unfamiliar with binary tree terminology, so I'm not sure whether the root would usually be considered the first or the zeroth level, which is why I am being explicit about the number of choices made too.
One last observation in case it hasn't already been observed is that the numbers will also be assigned counting from top to bottom, left to right. So the first level is 1. The next is 2 on the left, 3 on the right. The level below that will go 4, 5, 6, 7 and then the row below that 8, 9, 10, 11, 12, 13, 14, 15 and so on. This isn't any more useful mathematically but if you are trying to visualise it may well help.
Following from Chris' observation, there's a very simple rule: Let x be the node you are looking for. Let S be the binary representation of x. Then the digits in S after the first from most significant tell you the path from the root: 0 means go left, 1 means go right.
Example: x = 27 (decimal) = 11011 (binary), so we need to go right, left, right, right to get there (the leading 1 is ignored).
The reason why this is true is that if you go right, you multiply by 2 (binary left shift by 1) and add a 1, so you are essentially appending a 1. Conversely, if you go left, you append a 0.
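The appending argument can be checked in the other direction too: start at 1 and apply the child rule once per move. A small sketch, with `nodeFromPath` and the 'L'/'R' encoding as assumed names:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: starts at the root (1) and appends one bit per move,
// using left child = 2*n and right child = 2*n + 1.
unsigned long long nodeFromPath(const std::string& path) {
    unsigned long long n = 1;
    for (char move : path) {
        assert(move == 'L' || move == 'R');
        n = 2 * n + (move == 'R' ? 1 : 0);
    }
    return n;
}
```

Going right, left, right, right from the root visits 3, 6, 13 and finally 27, which agrees with the example.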
Here is the interview problem: design a data structure for integers in the range {1,...,M} (numbers can be repeated) that supports insert(x), delete(x), and a return-mode operation that returns the most frequent number.
The interviewer said that all the operations can be done in O(1) with O(M) preprocessing. He would also have accepted insert(x) and delete(x) in O(log(n)) and return mode in O(1), with O(M) preprocessing.
But I could only come up with O(n) for insert(x) and delete(x) and O(1) for return mode. How can I get O(log(n)) and/or O(1) for insert(x) and delete(x), and O(1) for return mode, with O(M) preprocessing?
When you hear O(log X) operations, the first structures that come to mind should be a binary search tree and a heap. For reference: (since I'm focussing on a heap below)
A heap is a specialized tree-based data structure that satisfies the heap property: If A is a parent node of B then the key of node A is ordered with respect to the key of node B with the same ordering applying across the heap. ... The keys of parent nodes are always greater than or equal to those of the children and the highest key is in the root node (this kind of heap is called max heap) ....
A binary search tree doesn't allow construction (from unsorted data) in O(M), so let's see if we can make a heap work (you can create a heap in O(M)).
Clearly we want the most frequent number at the top, so this heap needs to use frequency as its ordering.
But this brings us to a problem - insert(x) and delete(x) will both require that we look through the entire heap to find the correct element.
Now you should be thinking "what if we had some sort of mapping from index to position in the tree?", and this is exactly what we're going to have. If all / most of the M elements exist, we could simply have an array, with each index i's element being a pointer to the node in the heap. If implemented correctly, this will allow us to look up the heap node in O(1), which we could then modify appropriately, and move, taking O(log M) for both insert and delete.
If only a few of the M elements exist, replacing the array with a (hash-)map (of integer to heap node) might be a good idea.
Returning the mode will take O(1).
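A minimal sketch of that heap-plus-mapping idea follows. It makes a few assumptions of its own: the heap is stored in an array, so an index stands in for the "pointer to the node"; all M values sit in the heap from the start with a count of 0; and the class name `ModeHeap` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class ModeHeap {
public:
    explicit ModeHeap(int m) : count_(m + 1, 0), pos_(m + 1, 0) {
        assert(m >= 1);
        // All counts start at 0, so any order is a valid heap: O(M) setup.
        for (int v = 1; v <= m; v++) {
            pos_[v] = heap_.size();
            heap_.push_back(v);
        }
    }
    void insert(int x) { count_[x]++; siftUp(pos_[x]); }      // O(log M)
    void remove(int x) {                                      // O(log M)
        if (count_[x] == 0) return;   // nothing to delete
        count_[x]--;
        siftDown(pos_[x]);
    }
    int mode() const { return heap_[0]; }   // O(1); ties broken arbitrarily

private:
    void swapAt(std::size_t i, std::size_t j) {
        std::swap(heap_[i], heap_[j]);
        pos_[heap_[i]] = i;
        pos_[heap_[j]] = j;
    }
    void siftUp(std::size_t i) {
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (count_[heap_[parent]] >= count_[heap_[i]]) break;
            swapAt(i, parent);
            i = parent;
        }
    }
    void siftDown(std::size_t i) {
        for (;;) {
            std::size_t best = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < heap_.size() && count_[heap_[l]] > count_[heap_[best]]) best = l;
            if (r < heap_.size() && count_[heap_[r]] > count_[heap_[best]]) best = r;
            if (best == i) break;
            swapAt(i, best);
            i = best;
        }
    }
    std::vector<int> heap_;          // values, kept as a max-heap on count
    std::vector<int> count_;         // count_[v] = occurrences of v
    std::vector<std::size_t> pos_;   // pos_[v] = index of v inside heap_
};
```

If nothing has been inserted yet, `mode()` still returns some value whose count is 0, so a caller that cares would need to check for the empty case separately.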
O(1) for all operations is certainly quite a bit more difficult.
The following structure comes to mind:
 3   2
 ^   ^
 |   |
 5   7   4   1
12  14  15  18
To explain what's going on here - 12, 14, 15 and 18 correspond to the frequency, and the numbers above correspond to the elements with said frequency, so both 5 and 3 would have a frequency of 12, 7 and 2 would have a frequency of 14, etc.
This could be implemented as a double linked-list:
         /-----------\                 /-----------\
(12) <-> 5 <-> 3 <-> (13) <-> (14) <-> 7 <-> 2 <-> (15) <-> 4 <-> (16) <-> (18) <-> 1
  ^------------------/ ^------/ ^------------------/ ^------------/ ^------/
You may notice that:
I filled in the missing 13 and 16 - these are necessary, otherwise we'll have to update all elements with the same frequency when doing an insert (in this example, you would've needed to update 5 to point to 13 when doing insert(3), because 13 wouldn't have existed yet, so it would've been pointing to 14).
I skipped 17 - this is just an optimization in terms of space usage - it makes this structure take O(M) space, as opposed to O(M + MaxFrequency). The exact condition for skipping a number is simply that there are no elements at its frequency or at one less than its frequency.
There are some strange things going on above the linked-list. These simply mean that 5 points to 13 as well, and 7 points to 15 as well, i.e. each element also keeps a pointer to the next frequency.
There are some strange things going on below the linked-list. These simply mean that each frequency keeps a pointer to the frequency before it (this is more space efficient than each element keeping a pointer to both its own and the next frequency).
Similarly to the above solution, we'd keep a mapping (array or map) of integer to node in this structure.
To do an insert:
Look up the node via the mapping.
Remove the node.
Get the pointer to the next frequency, insert it after that node.
Set the next frequency pointer using the element after the insert position (either it is the next frequency, in which case we can just make the pointer point to that, otherwise we can make this next frequency pointer point to the same element as that element's next frequency pointer).
To do a remove:
Look up the node via the mapping.
Remove the node.
Get the pointer to the current frequency via the next frequency, insert it before that node.
Set the next frequency pointer to that node.
To get the mode:
Return the last node.
Since the range is fixed, for simplicity let's take an example with M=7 (the range is 1 to 7). So we need at most 3 bits to represent each number.
0 - 000
1 - 001
2 - 010
3 - 011
4 - 100
5 - 101
6 - 110
7 - 111
Now create a binary tree (each node has two children, as in the Huffman coding algorithm). Each leaf will contain the frequency of one number (initially 0 for all). The addresses of these nodes are saved in an array, with the key as the index (i.e. the address of node 1 will be at index 1 in the array).
With this pre-processing, we can execute insert and delete in O(1) and mode in O(M) time.
insert(x) - go to location x in the array, get the address of the node and increment the counter for that node.
delete(x) - as above, but decrement the counter if it is > 0.
mode - linear search in the array for the maximum frequency (value of the counter).
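Here is a rough sketch of those three operations. It keeps only the array of counters and leaves the tree itself out, which is a simplification on my part; the class name `CounterTable` is also just an assumption.

```cpp
#include <cassert>
#include <vector>

class CounterTable {
public:
    // Index 0 is unused so that key x lives at index x.
    explicit CounterTable(int m) : count_(m + 1, 0) { assert(m >= 1); }

    void insert(int x) { count_[x]++; }                      // O(1)
    void remove(int x) { if (count_[x] > 0) count_[x]--; }   // O(1)

    // O(M): linear scan for the largest counter (smallest key wins ties).
    int mode() const {
        int best = 1;
        for (int v = 2; v < static_cast<int>(count_.size()); v++)
            if (count_[v] > count_[best]) best = v;
        return best;
    }

private:
    std::vector<int> count_;
};
```

This trades the O(1) mode asked for in the question for the simplest possible insert and delete.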
I had once known of a way to use logarithms to move from one leaf of a tree to the next "in-order" leaf of a tree. I think it involved taking a position value (rank?) of the "current" leaf and using it as a seed for a fresh traversal from the root down to the new target leaf - all the way using a log function test to determine whether to follow the right or left node down to the leaf.
I no longer recall how to exercise that technique. Can anyone re-introduce me?
I also don't recall if the technique required the tree to be balanced, or if it worked on n-trees or only binary trees. Any info would be appreciated.
Since you mentioned whether to go left or right, I'm going to assume you're talking about a binary tree specifically. In that case, I think you're right that there is a way. If your nodes are numbered left-to-right, top-to-bottom, starting with 1, then you can find the rank (depth in the tree) by taking the log2 of the node's number. To find that node again from the root, you can use the binary representation of the number, where 0 = left and 1 = right.
For example:
n = 11
11 in binary is 1011
We always ignore the first 1 since it's going to be there for every number (all nodes of rank n will be binary numbers with n+1 digits, with the first digit being 1). We're left with 011, which is saying from the root go left, then right, then right.
If you want to find the next in-order leaf, take the current leaf's number and add one, then traverse from the root using this method.
I believe this only works with balanced binary trees.
OK, this proposal requires more characters than I can fit into a comment box. Steven does not believe that knowing the depth of the node in the tree is useful. I think it is. I have been wrong in the past, and I'm sure I'll be wrong in the future, so I will try to explain how this idea works in an attempt to not be wrong in the present. If I am, I apologize ahead of time. I'm nearly certain I got it from one of my Algorithms and Datastructures courses, using the CLR book. Please excuse any slips in notation or nomenclature, I haven't studied this stuff in a while.
Quoting wikipedia, "a complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible."
We are considering a complete tree with any branching degree (where a binary tree has a branching degree of two). Also, we are considering our nodes to have a 'positional value' which is an ordering of the positional value (top to bottom, left to right) of the node.
Now, if we are given a positional value, we can find the node in the following fashion. Take the log_base_n of the positional value of the element we are looking for (take the floor of this, since we want an integer). Traverse down from the root that many times, minus one. Now, start looking through all the children of the nodes at this level. The node you are searching for will be in this set.
This is an attempt in explaining the additional part of the wikipedia definition:
"This depth is equal to the integer part of log2(n) where n
is the number of nodes on the balanced tree.
Example 1: balanced tree with 1 node, log2(1) = 0 (depth = 0).
Example 2: balanced tree with 3 nodes, log2(3) = 1.59 (depth=1).
Example 3: balanced tree with 5 nodes, log2(5) = 2.32
(depth of tree is 2 nodes)."
This is useful, because you can simply traverse down to this level and then start looking around. It is useful and important to know the depth your node is located on, so you can start looking there, instead of starting to look at the beginning. Unless you know what level of the tree you are on, you get to start looking at all the nodes sequentially.
That is why I think it is helpful to know the depth of the node we are searching for.
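For the binary case, the integer part of log2 can be computed without floating point by counting halvings. A tiny sketch (the name `floorLog2` is my own):

```cpp
#include <cassert>

// Integer part of log2(n), computed by counting how many times n can be
// halved before it reaches 1.
int floorLog2(unsigned n) {
    assert(n >= 1);
    int depth = 0;
    while (n > 1) {
        n >>= 1;
        depth++;
    }
    return depth;
}
```

It reproduces the three quoted examples: 1 node gives depth 0, 3 nodes give depth 1, and 5 nodes give depth 2.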
It is a little bit odd, since having the "positional value" is not something we normally care about in a tree. I can see why Steve thought of this in terms of an array, since positional value is inherent in arrays.
-Brian J. Stinar-
Something that at least resembles your description is the Binary Heap, used among other things in Priority Queues.
I think I've found the answer, or at least a facsimile.
Assume the tree nodes are numbered, starting at 1, top-down and left-to-right. Assume traversal begins at the root, and halts when it finds node X (which means the parent is linked to its children). Also, for quick reference, the base 2 logarithmic values for nodes 1 through 12 are:
log2(1) = 0.0
log2(2) = 1
log2(3) = 1.58
log2(4) = 2
log2(5) = 2.32
log2(6) = 2.58
log2(7) = 2.807
log2(8) = 3
log2(9) = 3.16
log2(10) = 3.32
log2(11) = 3.459
log2(12) = 3.58
The fractional portion represents a unique diagonal position (notice how nodes 3, 6, and 12 all have fractional portion 0.58). Also notice that every node belongs either to the left or right side of the tree, depending on whether the log fractional component is less than or greater than 0.5. Anecdotes aside, the algorithm for finding a node is then as follows:
Examine the fractional portion: if it is less than .5, turn left. Else turn right.
Subtract one from the whole number portion of the log; stop if the value reaches zero.
Double the fractional portion, and start over.
So, for example, if node 11 is what you seek then you start by computing the log which is 3.459. Then...
3.459: the fraction is less than .5, so turn left and decrement the whole number to 2.
2.918: the doubled fraction is more than .5, so turn right and decrement the whole number to 1.
1.836: doubling .918 gives 1.836, but only the fractional part counts, so turn right and decrement the prior whole number to 0. Done!!
With appropriate accommodations, the same technique appears to work for any balanced n-ary tree. For example, given a balanced ternary tree, the choice of following left, middle, or right edges is again based on the fractional portion of the log, as follows:
between 0.5-0.832: turn left (a one-third fraction range)
between 0.17-0.49: turn right (another one-third fraction range)
otherwise go down the middle. (the last one-third range)
The algorithm is adjusted by multiplying the fractional portion by 3 instead of 2. Again, a quick reference for those who want to test this last statement:
log3(1) = 0.0
log3(2) = 0.63
log3(3) = 1
log3(4) = 1.26
log3(5) = 1.46
log3(6) = 1.63
log3(7) = 1.77
log3(8) = 1.89
log3(9) = 2
At this point I wonder if there is an even more concise way to express this whole "log-based top-down selection of a node." I'm interested if anyone knows...
Case 1: Nodes have pointers to their parent
Starting from the node, follow the parent pointers upward until you reach a node that you entered from its left child and that has a non-null right_child. Go to that right_child, then follow left_child pointers as long as they are non-null.
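A sketch of Case 1 in C++ is below. It reads "one with non-null right_child" as an ancestor whose right child is not the subtree we just climbed out of, and on the way down it steps right whenever a node has no left child, so that it always ends on a leaf. Both readings are my assumptions, as are the names `PNode` and `nextLeaf`.

```cpp
#include <cassert>

struct PNode {
    PNode* parent = nullptr;
    PNode* left_child = nullptr;
    PNode* right_child = nullptr;
};

// Returns the next leaf to the right of `leaf`, or nullptr if `leaf` is
// the last leaf in the tree.
PNode* nextLeaf(PNode* leaf) {
    assert(leaf != nullptr);

    // Climb while we are a right child, or our parent has no right child.
    PNode* node = leaf;
    while (node->parent != nullptr &&
           (node->parent->right_child == node ||
            node->parent->right_child == nullptr)) {
        node = node->parent;
    }
    if (node->parent == nullptr) return nullptr;   // reached the root

    // Cross over to the sibling subtree and descend to its leftmost leaf.
    node = node->parent->right_child;
    while (node->left_child != nullptr || node->right_child != nullptr)
        node = node->left_child != nullptr ? node->left_child
                                           : node->right_child;
    return node;
}
```

Each call moves at most once up and once down the height of the tree.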
Case 2: Nodes do not have pointers to the parent
Starting from the root, find the path to the node (including the root and the node). Then find the last vertex (i.e. a node) on the path that has a non-null right_child which is not itself on the path. Go to that right_child and follow left_child pointers as long as they are non-null.
In both cases, we are traversing up or down between the root and one of the nodes. The length of such a traversal is on the order of the depth of the tree, hence logarithmic in the number of nodes if the tree is balanced.