How do digital search trees work? - algorithm

I have tried many online sources, but I am unable to understand how a digital search tree works. The link below is the example for your reference:
(LINK: http://cseweb.ucsd.edu/~kube/cls/100/Lectures/lec15/lec15-10.html)
Could anybody construct a tree using these values and explain in detail how it works?
A 00001
S 10011
E 00101
R 10010
C 00011
H 10100

The tree is constructed so that the binary representations of the keys (A, S, E, R, C, H) can be used to locate them in the tree. In each search step, the key is compared with the current node (the root of the current subtree). If the key is not at that node, the most significant remaining bit of the key's binary representation selects the left subtree (if the bit is 0) or the right subtree (if the bit is 1), and that bit is then discarded. This process is explained in more detail here.
In the example you provided, the key H (binary representation 10100) can be found as follows.
In the first step, the root is node A. As A does not equal H, the bit 1 is used, indicating that the right subtree should be chosen. Consequently, we consider the node S and the bit string 0100 which results from the original binary representation by omission of the most significant bit.
Since S does not equal H, we use the most significant bit, which is 0, indicating a choice of the left subtree. We consider the node R and the bit string 100.
As R does not equal H, again we use the most significant bit, which is 1, which means that the right subtree is to be chosen. We consider the node H and the bit string 00.
Since H equals H, we have found the desired key and the search terminates.
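The search procedure above can be sketched in C++ (C++14, for the binary literals). The Node struct, kBits = 5, dstSearch and ExampleTree are hypothetical names made up for this sketch. The tree is wired by hand to match the walk-through rather than built by an insertion routine.

```cpp
#include <cassert>

// Hypothetical node type for the example: a letter plus its 5-bit code.
struct Node {
    char name;
    unsigned key;              // e.g. H = 0b10100 in the question's table
    Node* left = nullptr;
    Node* right = nullptr;
};

constexpr int kBits = 5;       // assumption: every key is exactly 5 bits wide

// DST search: compare the whole key at each node; on a mismatch, branch on
// the next bit of the key (most significant first): 0 = left, 1 = right.
const Node* dstSearch(const Node* node, unsigned key) {
    assert(key < (1u << kBits));
    for (int bit = kBits - 1; node != nullptr; --bit) {
        if (node->key == key) return node;   // found it
        if (bit < 0) return nullptr;         // no bits left to branch on
        node = ((key >> bit) & 1u) ? node->right : node->left;
    }
    return nullptr;                          // fell off the tree: not present
}

// The tree from the worked example, wired by hand (A is the root).
struct ExampleTree {
    Node a{'A', 0b00001}, s{'S', 0b10011}, e{'E', 0b00101},
         r{'R', 0b10010}, c{'C', 0b00011}, h{'H', 0b10100};
    ExampleTree() {
        a.left = &e;  a.right = &s;   // 1st bit: E starts with 0, S with 1
        e.left = &c;                  // C = 0 0 ...
        s.left = &r;                  // R = 1 0 ...
        r.right = &h;                 // H = 1 0 1 ...
    }
    ExampleTree(const ExampleTree&) = delete;   // members point into *this
};
```

Searching for 0b10100 visits A, S and R and stops at H, the same route as the four steps above. A key such as 0b01000 falls off the tree below E, so the search returns nullptr.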

A DST works by checking the key's bits level by level. At each level it looks at one bit of the key. If that bit is 0 it moves to the left, and otherwise to the right. Which bit is checked depends on the level.
For example:
The root is at level 0.
If the key is to be inserted at level 1, check its 1st bit:
if it is 0, insert it as the left child; if it is 1, insert it as the right child.
Similarly, for the second level check the 2nd bit of the key being inserted, and continue this way for the remaining levels.
In the given example:
First, A (00001) becomes the root. Next comes S (10011). Its 1st bit is 1, so it moves right and is inserted as the right child of A.
Next, E (00101) has 1st bit 0, so it is inserted as the left child of A. Next in the sequence is R (10010). Its 1st bit is 1, so it moves right to S, and its 2nd bit is 0, so it is inserted as the left child of S.
Next is C (00011). Its 1st bit is 0, so it moves left to E, and its 2nd bit is 0, so it is inserted as the left child of E. Last is H (10100). Its 1st bit is 1, so it moves right to S, and its 2nd bit is 0, so it moves left to R. It will be inserted at the 3rd level, so the 3rd bit is checked. Since that bit is 1, H becomes the right child of R.
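Hope this clears up your doubt.
So the final DST looks like this:
[Figure: the final DST (https://i.stack.imgur.com/Iet4n.gif): A at the root, E and S as its left and right children, C as the left child of E, R as the left child of S, and H as the right child of R.]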
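The insertion walk-through can be checked in the same way. This sketch assumes the keys are the question's 5-bit codes. Node, kBits and dstInsert are names made up for it. Inserting A, S, E, R, C, H in that order should reproduce the final DST above: E and S under A, C under E, R under S, and H under R.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical owning node type for this sketch.
struct Node {
    char name = '\0';
    unsigned key = 0;
    std::unique_ptr<Node> left, right;
};

constexpr int kBits = 5;   // assumption: keys are 5-bit codes, as in the question

// DST insertion: walk down as in a search, using one bit per level
// (most significant first, 0 = left, 1 = right), and attach the new key
// at the first empty child slot. Returns false if the key is already present.
bool dstInsert(std::unique_ptr<Node>& root, char name, unsigned key) {
    assert(key < (1u << kBits));
    std::unique_ptr<Node>* slot = &root;
    for (int bit = kBits - 1; *slot; --bit) {
        if ((*slot)->key == key) return false;   // duplicate key
        if (bit < 0) return false;               // cannot happen for keys that fit in kBits bits
        slot = ((key >> bit) & 1u) ? &(*slot)->right : &(*slot)->left;
    }
    auto node = std::make_unique<Node>();
    node->name = name;
    node->key = key;
    *slot = std::move(node);
    return true;
}
```

Unlike an ordinary BST, the path is chosen by the key's bits, not by less/greater comparisons. The full-key comparison at each node only detects the key itself (a successful search or a duplicate).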

Related

Where would you add '4' to the above binary search tree?

Where would you add '4' to the above binary search tree? And why?
A) A
B) B
C) C
D) Any of the above
My TA said it was just A, but I'm wondering why it can't be all of the above.
It is only A. Starting from root if your number is less than 5, go left branch. If your number is greater than 5 go right branch. Same process for every node.
Answer: A
Options B and C both violate that BST property; i.e. the new key '4', whose value is smaller than '5', would end up in the right subtree of '5'. (The right subtree should have keys which are greater.)
A binary search tree works by following the left child if the value you are searching for is less than the current node's value and the right child if it is greater, until you find a node with the desired value or the child you need is an empty tree (null).
So to test A, B or C:
if 4 is greater than 5 and smaller than 8 and 6, B is the correct answer.
If 4 is greater than 5 and 8 but smaller than 42, C is the correct answer.
If 4 is smaller than 5 but greater than 3, A is the correct answer.
In some silly field of mathematics, or perhaps a parallel universe, all three might be correct at the same time, but apart from that only one of them is correct with standard number theory.
Or from a search perspective (look at your tree from the root while reading this):
4 is smaller than 5 so go left.
4 is greater than 3 so go right
right node empty, insert at A
Now where would 2 be inserted?
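The figure with the tree is not reproduced here, so this sketch uses a stand-in tree built from the keys 5, 3, 8, 6, 42. These keys are an assumption, chosen only so that 5 is the root and 3 its left child, as the answers describe. insertWithPath is a hypothetical helper that performs an ordinary BST insertion and records each turn it takes. Inserting 4 follows the route "LR" (left at 5, right at 3), which is spot A.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Plain BST insertion that also records the route taken from the root:
// 'L' when the new key is smaller than the node, 'R' otherwise
// (so equal keys go right in this sketch).
std::string insertWithPath(std::unique_ptr<Node>& root, int key) {
    std::string path;
    std::unique_ptr<Node>* slot = &root;
    while (*slot) {
        if (key < (*slot)->key) {
            slot = &(*slot)->left;
            path += 'L';
        } else {
            slot = &(*slot)->right;
            path += 'R';
        }
    }
    *slot = std::make_unique<Node>(key);   // the first empty slot on the route
    return path;
}
```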

Why is index 0 left unused in a heap implemented with an array?

I'm learning data structures and every source tells me not to use index 0 of the array while implementing heap, without giving any explanation why. I searched the web, searched StackExchange, and couldn't find an answer.
There's no reason why a heap implemented in an array has to leave the item at index 0 unused. If you put the root at 0, then the item at array[index] has its children at array[index*2+1] and array[index*2+2]. The node at array[child] has its parent at array[(child-1)/2].
Let's see.
              root at 0       root at 1
Left child    index*2 + 1     index*2
Right child   index*2 + 2     index*2 + 1
Parent        (index-1)/2     index/2
So having the root at 0 rather than at 1 costs you an extra add to find the left child, and an extra subtraction to find the parent.
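Here is a small check that the two columns describe the same layout. The node stored at index i with the root at 0 sits at index i + 1 with the root at 1, so each 0-based formula should equal its 1-based counterpart shifted by one. The helper names (left0, parent1, and so on) are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Root stored at index 0.
constexpr std::size_t left0(std::size_t i)   { return 2 * i + 1; }
constexpr std::size_t right0(std::size_t i)  { return 2 * i + 2; }
constexpr std::size_t parent0(std::size_t i) { return (i - 1) / 2; }  // requires i > 0

// Root stored at index 1 (slot 0 unused).
constexpr std::size_t left1(std::size_t i)   { return 2 * i; }
constexpr std::size_t right1(std::size_t i)  { return 2 * i + 1; }
constexpr std::size_t parent1(std::size_t i) { return i / 2; }        // requires i > 1

static_assert(left0(0) == 1 && left1(1) == 2, "the root's left child");
```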
For a more general case, where the heap may not be binary but a 3-heap, 4-heap, etc., with NUM_CHILDREN children per node instead of 2, the children of a node occupy a contiguous run of indexes, and the formulas are:
              root at 0                             root at 1
First child   index*NUM_CHILDREN + 1                (index-1)*NUM_CHILDREN + 2
Last child    index*NUM_CHILDREN + NUM_CHILDREN     index*NUM_CHILDREN + 1
Parent        (index-1)/NUM_CHILDREN                (index-2)/NUM_CHILDREN + 1
I can't see those few extra instructions making much of a difference in the run time.
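For NUM_CHILDREN = d, the children of index i form a contiguous run. With the root at 0, they run from d*i + 1 through d*i + d. With the root at 1, they run from d*(i-1) + 2 through d*i + 1. The parent is (i-1)/d or (i-2)/d + 1 respectively (integer division). The sketch below encodes these formulas with hypothetical helper names and checks them against each other in the same shifted-by-one way. For d = 2 they reduce to the binary formulas.

```cpp
#include <cassert>
#include <cstddef>

// d-ary heap, root at index 0: children of i are d*i + 1 ... d*i + d.
constexpr std::size_t firstChild0(std::size_t i, std::size_t d) { return d * i + 1; }
constexpr std::size_t lastChild0(std::size_t i, std::size_t d)  { return d * i + d; }
constexpr std::size_t parent0(std::size_t i, std::size_t d)     { return (i - 1) / d; }      // i > 0

// d-ary heap, root at index 1: children of i are d*(i-1) + 2 ... d*i + 1.
constexpr std::size_t firstChild1(std::size_t i, std::size_t d) { return d * (i - 1) + 2; }
constexpr std::size_t lastChild1(std::size_t i, std::size_t d)  { return d * i + 1; }
constexpr std::size_t parent1(std::size_t i, std::size_t d)     { return (i - 2) / d + 1; }  // i > 1
```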
For reasons why I think it's wrong to start at 1 in a language that has 0-based arrays, see https://stackoverflow.com/a/49806133/56778 and my blog post "But that's the way we've always done it!"
According to the CLRS book, there is some significance in terms of performance, since shift operations are generally very fast:
On most computers, the LEFT procedure can compute 2*i in one instruction by simply shifting the binary representation of i left by one bit position. Similarly, the RIGHT procedure can quickly compute 2*i+1 by shifting the binary representation of i left by one bit position and then adding in a 1 as the low-order bit. The PARENT procedure can compute i/2 by shifting i right one bit position.
So starting the heap at index 1 will probably make calculating the parent, left child, and right child indexes faster.
As observed by AnonJ, this is a question of taste rather than technical necessity. One nice thing about starting at 1 rather than 0 is that there's a bijection between binary strings x and the positive integers that maps a binary string x to the positive integer written 1x in binary. The string x gives the path from the root to the indexed node, where 0 means "take the left child", and 1 means "take the right child".
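The bijection can be made concrete: reading the path as bits and prefixing a 1 gives the node's 1-based index. indexFromPath is a hypothetical helper name. As in the answer, '0' means take the left child and '1' means take the right child. The sketch assumes the path is shorter than the bit width of std::size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Map a root-to-node path ("0" = left, "1" = right) to the node's index in a
// 1-based array heap: the index is "1" followed by the path, read as binary.
std::size_t indexFromPath(const std::string& path) {
    std::size_t index = 1;                                  // the root
    for (char step : path)
        index = (index << 1) | (step == '1' ? 1u : 0u);     // append one bit per step
    return index;
}
```

For example, the path "011" (left, right, right) gives binary 1011, which is index 11. The empty path gives the root, index 1.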
Another consideration is that the otherwise unused "zeroth" location can hold a sentinel with value minus infinity that, on architectures without branch prediction, may mean a non-negligible improvement in running time due to having only one test in the sift up loop rather than two.
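Here is a sketch of the sentinel idea for a min-heap stored from index 1. It assumes int keys and uses INT_MIN as the stand-in for minus infinity in slot 0. SentinelMinHeap is a hypothetical name. Because no key compares less than INT_MIN, the sift-up loop needs only the key comparison and no separate "reached the root?" test. Whether that is measurably faster depends on the machine, and this sketch does not measure it.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Min-heap stored from index 1; a[0] holds the sentinel "minus infinity".
struct SentinelMinHeap {
    std::vector<int> a{INT_MIN};   // one element: the sentinel

    void push(int key) {
        a.push_back(key);
        std::size_t i = a.size() - 1;
        // Single test per step: at i == 1 we compare with a[0] == INT_MIN,
        // which is never greater than key, so the loop stops at the root.
        while (key < a[i / 2]) {
            a[i] = a[i / 2];       // move the parent down one level
            i /= 2;
        }
        a[i] = key;
    }

    int top() const { return a[1]; }              // requires size() > 0
    std::size_t size() const { return a.size() - 1; }
};
```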
(While I was searching, I came up with an answer of my own but I don't know whether it's correct or not.)
If index 0 is used for the root node then subsequent calculations on its children cannot proceed, because we have indexOfLeftChild = indexOfParent * 2 and indexOfRightChild = indexOfParent * 2 + 1. However 0 * 2 = 0 and 0 * 2 + 1 = 1, which cannot represent the parent-children relationship we want. Therefore we have to start at 1 so that the tree, represented by array, complies with the mathematical properties we desire.

quickest way to find out if an element is in the left or right subtree

If a binary tree is constructed as follows
the root is 1
the left child of an element n is 2*n
the right child of an element n is (2*n)+1
If I get a number n, what's the quickest way to find out whether it's in the left or right subtree of the root? Is there some easy-to-determine mathematical property of the left subtree?
note: this is NOT a homework question, though it is part of a bigger algorithmic problem I'm trying to solve.
Consider the numbers in binary. Each child will be the parent number with a 0 or 1 appended to it depending on whether it is left or right.
This means that everything in the left subtree of the root starts with 10 in binary, and everything in the right subtree starts with 11.
This means that you should be able to work out which side it is on just using some shift operations and then some comparisons.
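Here is a sketch of the shift-and-compare idea in C++. It assumes n >= 2 fits in an unsigned, and inLeftSubtree is a hypothetical name. It finds the highest set bit of n and then looks at the bit just below it. A 0 there means the number starts with 10 (left subtree), and a 1 means it starts with 11 (right subtree).

```cpp
#include <cassert>

// Numbering: root = 1, left child of n = 2n, right child of n = 2n + 1.
// Returns true if node n (n >= 2) lies in the root's left subtree.
bool inLeftSubtree(unsigned n) {
    assert(n >= 2 && "node 1 is the root itself");
    unsigned msb = 1;
    while (msb <= n / 2) msb <<= 1;   // largest power of two <= n
    return (n & (msb >> 1)) == 0;     // bit below the MSB: 0 => "10...", 1 => "11..."
}
```

For example, 11 (1011) is in the left subtree and 13 (1101) is in the right subtree.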
I should note that I don't know if this is the most efficient method, but it is an easy-to-determine mathematical property of the left subtree.
As noted by others, a consequence of appending a 0 or a 1 is that each digit encodes the path through the subtree. The first 1 represents the root of the tree, and from then on a 0 means taking the left branch at that point and a 1 means taking the right branch.
Thus binary 1001101 would mean left, left, right, right, left, right.
An obvious consequence of this is that the number of binary digits determines exactly how deep in the tree that number is. So 1 is at the top (1st) level, 10 is at the second level (one choice made), and my example 1001101 is at the 7th level (six choices made). I should note that I'm unfamiliar with binary tree terminology, so I'm not sure whether the root would usually be considered the first or the zeroth level, which is why I am also being explicit about the number of choices made.
One last observation in case it hasn't already been observed is that the numbers will also be assigned counting from top to bottom, left to right. So the first level is 1. The next is 2 on the left, 3 on the right. The level below that will go 4, 5, 6, 7 and then the row below that 8, 9, 10, 11, 12, 13, 14, 15 and so on. This isn't any more useful mathematically but if you are trying to visualise it may well help.
Following from Chris' observation, there's a very simple rule. Let x be the node you are looking for, and let S be the binary representation of x. Then the digits of S after the leading (most significant) 1 tell you the path from the root: 0 means go left, 1 means go right.
Example: x = 27 (decimal) = 11011 (binary), so we need to go right, left, right, right to get there (the leading 1 is ignored).
The reason why this is true is that if you go right, you multiply by 2 (binary left shift by 1) and add a 1, so you are essentially appending a 1. Conversely, if you go left, you append a 0.
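Following the same rule, here is a short C++ sketch that spells out the path for any node n >= 1. pathTo is a hypothetical name. It peels bits off the low end until only the leading 1 is left, then reverses the collected turns. The length of the result is the number of choices made, which is the depth if the root is counted as depth 0.

```cpp
#include <cassert>
#include <string>

// Path from the root (node 1) to node n >= 1, where left = 2n and right = 2n + 1:
// the bits after the leading 1, most significant first; 'L' for 0, 'R' for 1.
std::string pathTo(unsigned n) {
    std::string path;
    for (; n > 1; n >>= 1)
        path += (n & 1u) ? 'R' : 'L';   // collects the last turn first
    return std::string(path.rbegin(), path.rend());
}
```

pathTo(77) gives "LLRRLR" (77 is 1001101 in binary, the earlier example), and pathTo(27) gives "RLRR".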

K-th element in a heap tree

I have a heap (implemented like a binary tree: each node has two pointers to the children and one pointer to the parent).
How can I find the k-th element (in BFS order), given the number of elements in it? I think it can be done in O(log n) time.
(I'm assuming by "kth element (in a BFS order)" that you mean the kth element from the perspective of a top-to-bottom, left-to-right scan of the input.)
Since you know that a binary heap is a complete binary tree (except possibly at the last level), you know that the shape of the tree is a perfect binary tree of some height (containing 2^k - 1 nodes for some k) with some number of nodes filled in from left to right. A really nifty property of these trees shows up when you write out the numbers of the nodes in a picture, one-indexing the values:
1
2 3
4 5 6 7
8 9 10 11 12 13 14 15
Notice that each layer starts with a node that's a power of two. So let's suppose, hypothetically, that you wanted to look up the number 13. The biggest power of two no greater than 13 is 8, so we know that 13 must appear in the row
8 9 10 11 12 13 14 15
We can now use this knowledge to reverse-engineer the path from 13 back up to the root of the tree. We know, for example, that 13 is in the latter half of the numbers in this row, which means that 13 belongs to the right subtree of the root (if it belonged to the left subtree, then we would be in the subtree containing 8, 9, 10, and 11). This means that we can go right from the root and throw out half of the numbers to get
12 13 14 15
We are now at node 3 in the tree. So do we go left or right? Well, 13 is in the first half of these numbers, so we know at this point that we need to descend into the left subtree of node 3. This takes us to node 6, and now we're left with the first half of the numbers:
12 13
13 is in the right half of these nodes, so we should descend to the right, taking us to node 13. And voila! We're there!
So how did this process work? Well, there's a really, really cute trick we can use. Let's write out the same tree we had above, but in binary:
0001
0010 0011
0100 0101 0110 0111
1000 1001 1010 1011 1100 1101 1110 1111
                         ^^^^
I've pointed out the location of node 13. Our algorithm worked in the following way:
Find the layer containing the node.
While not at the node in question:
If the node is in the first half of the layer it's in, move left and throw away the right half of the range.
If the node is in the second half of the layer it's in, move right and throw away the left half of the range.
Let's think about what this means in binary. Finding the layer containing the node is equivalent to finding the most significant bit set in the number. In 13, which has binary representation 1101, the MSB is the 8 bit. This means that we're in the layer starting with eight.
So how do we determine whether we're in the left subtree or the right subtree? Well, to do that, we'd need to see if we are in the first half of this layer or the second half. And now for a cute trick - look at the next bit after the MSB. If this bit is set to 0, we're in the first half of the range, and otherwise we're in the second half of the range. Thus we can determine which half of the range we're in by just looking at the next bit of the number! This means we can determine which subtree to descend into by looking just at the next bit of the number.
Once we've done that, we can just repeat this process. What do we do at the next level? Well, if the next bit is a zero, we go left, and if the next bit is a one, we go right. Take a look at what this means for 13:
1101
 ^^^
 |||
 ||+--- Go right at the third node.
 ||
 |+---- Go left at the second node.
 |
 +----- Go right at the first node.
In other words, we can spell out the path from the root of the tree to our node in question just by looking at the bits of the number after the MSB!
Does this always work? You bet! Let's try the number 7. This has binary representation 0111. The MSB is in the 4's place. Using our algorithm, we'd do this:
0111
  ^^
  ||
  |+--- Go right at the second node.
  |
  +---- Go right at the first node.
Looking in our original picture, this is the right path to take!
Here's some rough C/C++ pseudocode for this algorithm:
    // Minimal node type assumed so the sketch compiles (the question's nodes
    // also have a parent pointer, which this lookup does not need).
    struct Node {
        Node* left;
        Node* right;
    };

    // Return the n-th node (1-based, BFS order) of a complete binary tree.
    Node* NthNode(Node* root, unsigned n) {
        /* Find the position of the most significant set bit of n. */
        int bitIndex = 0;
        while ((n >> bitIndex) > 1) {
            bitIndex++;
        }
        /* Skip the MSB itself: it only tells us which layer n is in.
         * The remaining bits, from high to low, give the path down.
         */
        bitIndex--;
        /* Read off the directions from the bits of n: 0 = left, 1 = right. */
        for (; bitIndex >= 0; bitIndex--) {
            unsigned mask = 1u << bitIndex;
            if (n & mask)
                root = root->right;
            else
                root = root->left;
        }
        return root;
    }
I haven't tested this code! To paraphrase Don Knuth, I've just shown that conceptually it does the right thing. I might have an off-by-one error in here.
So how fast is this code? Well, the first loop runs until it finds the first power of two greater than n, which takes O(log n) time. The next part of the loop counts backwards through the bits of n one at a time, doing O(1) work at each step. The overall algorithm thus takes a total of O(log n) time.
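To check the bit walk on the pointer-based layout from the question, one option is to build a small complete tree and compare against BFS order. HeapNode, kthInBfsOrder and buildComplete are hypothetical names for this sketch. The helper wires up a complete tree whose BFS order is 1, 2, ..., n, so the k-th node reached by following the bits after the MSB should hold the value k.

```cpp
#include <cassert>
#include <vector>

// Pointer-based heap node, as in the question (children plus parent).
struct HeapNode {
    int value = 0;
    HeapNode* left = nullptr;
    HeapNode* right = nullptr;
    HeapNode* parent = nullptr;
};

// Same bit walk as NthNode, for k >= 1: skip the leading 1,
// then 0 = go left, 1 = go right.
HeapNode* kthInBfsOrder(HeapNode* root, unsigned k) {
    unsigned mask = 1;
    while (mask <= k / 2) mask <<= 1;          // highest set bit of k
    for (mask >>= 1; mask != 0; mask >>= 1)
        root = (k & mask) ? root->right : root->left;
    return root;
}

// Hypothetical test helper: wire nodes[1..n] into a complete tree whose
// BFS order is 1, 2, ..., n (nodes[0] is unused).
void buildComplete(std::vector<HeapNode>& nodes, unsigned n) {
    nodes.assign(n + 1, HeapNode{});           // no reallocation after this
    for (unsigned i = 1; i <= n; ++i) {
        nodes[i].value = static_cast<int>(i);
        if (i > 1) nodes[i].parent = &nodes[i / 2];
        if (2 * i <= n) nodes[i].left = &nodes[2 * i];
        if (2 * i + 1 <= n) nodes[i].right = &nodes[2 * i + 1];
    }
}
```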
Hope this helps!

Applying a Logarithm to Navigate a Tree

I had once known of a way to use logarithms to move from one leaf of a tree to the next "in-order" leaf of a tree. I think it involved taking a position value (rank?) of the "current" leaf and using it as a seed for a fresh traversal from the root down to the new target leaf - all the way using a log function test to determine whether to follow the right or left node down to the leaf.
I no longer recall how to exercise that technique. Can anyone re-introduce me?
I also don't recall if the technique required the tree to be balanced, or if it worked on n-trees or only binary trees. Any info would be appreciated.
Since you mentioned whether to go left or right, I'm going to assume you're talking about a binary tree specifically. In that case, I think you're right that there is a way. If your nodes are numbered left-to-right, top-to-bottom, starting with 1, then you can find the rank (depth in the tree) by taking the floor of log2 of the node's number. To find that node again from the root, you can use the binary representation of the number, where 0 = left and 1 = right.
For example:
n = 11
11 in binary is 1011
We always ignore the first 1 since it's going to be there for every number (all nodes of rank n will be binary numbers with n+1 digits, with the first digit being 1). We're left with 011, which is saying from the root go left, then right, then right.
If you want to find the next in-order leaf, take the current leaf's number and add one, then traverse from the root using this method.
I believe this only works with balanced binary trees.
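The "add one" step can be sanity-checked without building nodes, using the numbering itself (the children of n are 2n and 2n + 1). inorderLeaves is a hypothetical helper that lists, in in-order, the leaves of the tree made of nodes 1..lastNode. For a perfect tree (lastNode = 2^(D+1) - 1), the leaves come out as consecutive numbers, so adding one moves to the next leaf. When the last level is only partly filled, they do not.

```cpp
#include <cassert>
#include <vector>

// Append the leaves of the tree made of nodes 1..lastNode (children of n are
// 2n and 2n + 1) to `out`, in in-order (left subtree, node, right subtree).
void inorderLeaves(unsigned n, unsigned lastNode, std::vector<unsigned>& out) {
    if (n > lastNode) return;                 // no such node
    const bool isLeaf = 2 * n > lastNode;     // no left child means no children at all
    inorderLeaves(2 * n, lastNode, out);
    if (isLeaf) out.push_back(n);
    inorderLeaves(2 * n + 1, lastNode, out);
}
```

With 15 nodes the leaves come out as 8, 9, ..., 15. With 12 nodes they come out as 8, 9, 10, 11, 12, 7, so 12 + 1 does not name the next leaf.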
OK, this proposal requires more characters than I can fit into a comment box. Steven does not believe that knowing the depth of the node in the tree is useful. I think it is. I have been wrong in the past, and I'm sure I'll be wrong in the future, so I will try to explain how this idea works in an attempt to not be wrong in the present. If I am, I apologize ahead of time. I'm nearly certain I got it from one of my Algorithms and Datastructures courses, using the CLR book. Please excuse any slips in notation or nomenclature, I haven't studied this stuff in a while.
Quoting wikipedia, "a complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible."
We are considering a complete tree with any branching degree (a binary tree has a branching degree of two). We also assume each node has a 'positional value': its position in a top-to-bottom, left-to-right ordering of the nodes.
Now, if we are given a positional value p (counting the root as 1), we can find the node in the following fashion. Compute the depth of the node: for a binary tree this is the floor of log_2(p), and for a general branching degree n it is the floor of log_n((n-1)·p), which reduces to the binary formula when n = 2. Traverse down from the root that many levels, minus one. Now, start looking through all the children of the nodes at this level. The node you are searching for will be in this set.
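For a branching degree n above two, floor(log_n(p)) can land one level too shallow. In a ternary tree, for example, position 5 is already at depth 2, although log3(5) ≈ 1.46. This sketch therefore counts the level sizes 1, n, n^2, ... with integer arithmetic instead. Positions are assumed to start at 1 with the root, and depthOfPosition is a hypothetical name. The loop computes the same value as floor(log_n((n-1)·p)).

```cpp
#include <cassert>

// Depth (root = 0) of the node at 1-based position p in a complete tree with
// branching degree n, positions numbered top-to-bottom, left-to-right.
unsigned depthOfPosition(unsigned long long p, unsigned long long n) {
    assert(p >= 1 && n >= 2);
    unsigned depth = 0;
    unsigned long long levelSize = 1, lastOnLevel = 1;   // level 0 holds only position 1
    while (p > lastOnLevel) {
        levelSize *= n;                                   // next level is n times wider
        lastOnLevel += levelSize;                         // last position on that level
        ++depth;
    }
    return depth;
}
```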
This is an attempt at explaining the additional part of the Wikipedia definition:
"This depth is equal to the integer part of log2(n) where n is the number of nodes on the balanced tree.
Example 1: balanced tree with 1 node, log2(1) = 0 (depth = 0).
Example 2: balanced tree with 3 nodes, log2(3) = 1.59 (depth=1).
Example 3: balanced tree with 5 nodes, log2(5) = 2.32
(depth of tree is 2 nodes)."
This is useful, because you can simply traverse down to this level and then start looking around. It is useful and important to know the depth your node is located at, so you can start looking there instead of at the beginning. Unless you know what level of the tree you are on, you have to look through all the nodes sequentially.
That is why I think it is helpful to know the depth of the node we are searching for.
It is a little bit odd, since having the "positional value" is not something we normally care about in a tree. I can see why Steve thought of this in terms of an array, since positional value is inherent in arrays.
-Brian J. Stinar-
Something that at least resembles your description is the binary heap, used, among other places, in priority queues.
I think I've found the answer, or at least a facsimile.
Assume the tree nodes are numbered, starting at 1, top-down and left-to-right. Assume traversal begins at the root, and halts when it finds node X (which means the parent is linked to its children). Also, for quick reference, the base 2 logarithmic values for nodes 1 through 12 are:
log2(1) = 0.0
log2(2) = 1
log2(3) = 1.58
log2(4) = 2
log2(5) = 2.32
log2(6) = 2.58
log2(7) = 2.807
log2(8) = 3
log2(9) = 3.16
log2(10) = 3.32
log2(11) = 3.459
log2(12) = 3.58
The fractional portion represents a unique diagonal position (notice how nodes 3, 6, and 12 all have fractional portion 0.58). Also notice that every node belongs either to the left or the right side of the tree, depending on whether the fractional component of its log is less than log2(1.5) ≈ 0.585 or not. The right half of each row starts at 1.5 times the row's first node, which is why 3, 6, and 12 sit exactly on that boundary. Anecdotes aside, the algorithm for finding a node is then as follows:
examine the fractional portion: if it is less than .585, turn left; otherwise turn right.
subtract one from the whole number portion of the log, stop if the value reaches zero.
double the fractional portion, and start over.
So, for example, if node 11 is what you seek, you start by computing its log, which is 3.459. Then:
3.459: the fraction .459 is less than .585, so turn left and decrement the whole number to 2.
2.918: the doubled fraction .918 is more than .585, so turn right and decrement the whole number to 1.
1.836: doubling .918 gives 1.836, but only the fractional part .836 counts; it is more than .585, so turn right and decrement the whole number to 0. Done!!
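One way to test the fractional-log walk is to implement it with a configurable cut-off and compare it with the exact path read from the bits. logWalk and bitWalk are hypothetical names for this sketch. The walk agrees with the exact path for node 11, the worked example. For node 23 (binary 10111, true path left, right, right, right), however, it goes wrong with either cut-off. With 0.5 it turns right first, since frac(log2 23) ≈ 0.524. With log2(1.5) ≈ 0.585 it turns left four times. So the doubling rule looks like a heuristic that happens to hold for small nodes rather than an exact method.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// The fractional-log walk for node x >= 1 (root = 1): the whole part of
// log2(x) is the number of turns; turn right when the fractional part is at
// least `threshold`, then double the fraction and keep its fractional part.
std::string logWalk(unsigned x, double threshold) {
    double whole = 0.0;
    double frac = std::modf(std::log2(static_cast<double>(x)), &whole);
    std::string path;
    for (int steps = static_cast<int>(whole); steps > 0; --steps) {
        path += (frac < threshold) ? 'L' : 'R';
        frac = std::modf(2.0 * frac, &whole);
    }
    return path;
}

// Exact path from the bits after the leading 1 (0 = left, 1 = right).
std::string bitWalk(unsigned x) {
    std::string path;
    for (; x > 1; x >>= 1) path.insert(path.begin(), (x & 1u) ? 'R' : 'L');
    return path;
}
```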
With appropriate accommodations, the same technique appears to work for any balanced n-ary tree. For example, given a balanced ternary tree, the choice of following left, middle, or right edges is again based on the fractional portion of the log, as follows:
between 0.5-0.832: turn left (a one-third fraction range)
between 0.17-0.49: turn right (another one-third fraction range)
otherwise go down the middle. (the last one-third range)
The algorithm is adjusted by multiplying the fractional portion by 3 instead of 2. Again, a quick reference for those who want to test this last statement:
log3(1) = 0.0
log3(2) = 0.63
log3(3) = 1
log3(4) = 1.26
log3(5) = 1.46
log3(6) = 1.63
log3(7) = 1.77
log3(8) = 1.89
log3(9) = 2
At this point I wonder if there is an even more concise way to express this whole "log-based top-down selection of a node." I'm interested if anyone knows...
Case 1: Nodes have pointers to their parent
Starting from the node, traverse up the parent pointers until you reach an ancestor that you entered from its left child and that has a non-null right_child. Go to that right_child and traverse left_child as long as they are non-null.
Case 2: Nodes do not have pointers to the parent
Starting from the root, find the path to the node (including the root and the node). Then find the deepest vertex (i.e. node) on the path where the path continues into the left child and the right_child is non-null. Go to that right_child and traverse left_child as long as they are non-null.
In both cases, we traverse either up or down between the root and one of the nodes. Such a traversal is at most the depth of the tree, hence logarithmic in the number of nodes if the tree is balanced.
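Here is a sketch of Case 1 in C++, assuming nodes carry a parent pointer. Node, nextLeaf and the wiring helper linkHeapOrder are hypothetical names. The function climbs until it leaves a left subtree whose parent has a right child, steps into that right child, and then descends to a leaf. On the way down it takes the left child when there is one and otherwise the right child. In a complete tree, a node with a right child always has a left child, so this amounts to following left_child.

```cpp
#include <cassert>

struct Node {
    int value = 0;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Next leaf to the right of `leaf` (parent-pointer version), or nullptr.
Node* nextLeaf(Node* leaf) {
    Node* node = leaf;
    Node* parent = node->parent;
    // Climb while we come from a right child or the parent has no right child.
    while (parent && (node == parent->right || parent->right == nullptr)) {
        node = parent;
        parent = parent->parent;
    }
    if (!parent) return nullptr;          // `leaf` was the rightmost leaf
    node = parent->right;
    while (node->left || node->right)     // descend to the leftmost leaf
        node = node->left ? node->left : node->right;
    return node;
}

// Hypothetical helper: wire nodes[1..n] into a complete tree in heap order
// (children of i are 2i and 2i + 1). nodes[0] is unused.
void linkHeapOrder(Node* nodes, int n) {
    for (int i = 1; i <= n; ++i) {
        nodes[i].value = i;
        nodes[i].parent = (i > 1) ? &nodes[i / 2] : nullptr;
        nodes[i].left = (2 * i <= n) ? &nodes[2 * i] : nullptr;
        nodes[i].right = (2 * i + 1 <= n) ? &nodes[2 * i + 1] : nullptr;
    }
}
```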
