Binary Search Trees / Picking a Root - algorithm

I'm not quite sure how to pick a root for a binary search tree (I'm wanting to do without any code):
5, 9, 2, 1, 4, 8 ,3, 7, 6
How do I pick a root?
The steps are confusing me for this algorithm.

You can initialize an empty BST (binary search tree), then iterate the list and insert each item.
You don't need to pick a root, just build the tree. But maybe you want balanced the tree, you can insert as first element the middle value of the list, but the right answer is to use a balanced binary search tree (AVL tree).

Median number will be a better choice, because you want to have less depth.
Here is one example, the root is find the median the next one is also find the median
5
3 8
2 4 7 9
1 6
5 is get by (1+9)/2. 3 get from ceiling(1+4)/2 (you can also choose the floor of the median as the role of choosing median root)

BST with the same values can have many forms. For example, a tree containing 1,2 can be:
1 <- root
\
2 <-- right son
or
2 <- root
/
1 <-- left son
So you can have a tree where 1 is the root and it goes 1->2->3... and no left sons. You can have 5 as the root with 4 and 6 as left and right sons respectively, and you can have many other trees with the same values, but different ordering (and maybe different roots)

How do I pick a root?
In whichever way you want to. Any number of your data can be the root.
You would like to choose the median though, in this case, 5. With that choice, your tree should get as balanced as it gets, four nodes on the left of 5 and four nodes in the right subtree of 5.
Notice that any element could be the rood (even a random choice, or the first number in your example).
Um, then why should I worried finding the median and not always picking the first number (easiest choice)?
Because you want your Binary Search Tree (BST) to be as balanced as possible.
If you pick the min or the max number as a root, then your tree will reach its maximum depth (worst case scenario), and will emulate a single linked list, which will result in a worst case scenario for the search algorithm as well. However, as Michel stated, picking the minimum or maximum item for the root won't necessarily lead to a degenerate tree. For example, if you picked the minimum item for the root and but the right branch that contains the rest of the items is balanced, then the tree's height is only one level more than optimum. You only get a degenerate tree if you choose the nodes in ascending or descending order.
Keep in mind that in a BST, this rule must be respected:
Left children are less than the parent node and
all right children are greater than the parent node.
For more, read How binary search tree is created??

Related

Given a list of keys, how do we find the almost complete binary search tree of that list?

I saw an answer here with the idea implemented in Python (not very familiar with Python) - I was looking for a more general algorithm.
EDIT:
For clarification:
Say we are given a list of integer keys: 23 44 88 12 74 32 7 39 10
That list was chosen arbitrarily. We are to create an almost complete (or complete) binary search tree from that list. There is supposed to be only one such tree...how do we find it?
A binary search tree is constructed so that all items on a node's left subtree are less than the node, and all nodes on the right subtree are greater than the node.
A complete (or almost complete) binary tree is one in which all levels except possibly the last are completely full, and the bottom level is filled to the left.
So, for example, this is an almost-complete binary search tree:
4
/ \
2 5
/ \
1 3
This is not:
3
/ \
2 4
/ \
1 5
Because the bottom level of the tree is not filled from the left.
If the number of items is one less than a power of two (i.e. 3, 7, 15, etc.), then building the tree is easy. Start by sorting the list. Then, take the middle element as the root. So if you have [1,2,3,4,5,6,7], and the root node is 4.
You do the same thing recursively for the right and left halves of the array.
If the number of items is not one less than a power of two, you have to adjust the starting point (the root node) so that the bottom row is left-filled. Note that you might have to apply that adjustment recursively, as well, whenever your subtree length is not one less than a power of two.
Since this is a homework assignment, I'll leave that for you to figure out.

How can I write a non recursive algorithm to add root to tree?

I need a non recursive algorithm that will add up the root to the tree values, and then display the value that is the highest. Not to add up every element in the tree, just the highest value way to get from root to leaf.
2
/ \
8 6
in this example the answer would be 10
must be in O(n) timing
Try it breadth-first. Add the children of the actual node to a list, then proceed along that list until you reach its end.

Why does adding nodes in (reverse) order make for inefficient searching?

I am preparing for an exam and I have stumbled on the following question:
Draw the binary search tree that would result if data were to be added in the following order:
10,9,8,7,6,5,4,3
Why is the tree that results unsuitable for efficient searching?
My Answer:
I would have thought when creating a BST that we start with the value 10 as the root node then add 9 as the left sub tree value on the first level. Then 8 to the left subtree of 9 and so on. I don't know why this makes it inefficient for searching though. Any ideas?
Since the values are in decreasing order, they get added to the left at each level, practically leaving you with a linked list, which takes O(N) to search, instead of the preferred O(logN) of a BST.
Drawing:
10
/
9
/
8
/
7
/
6
/
5
/
4
/
3
This would create a linked-list, since it would just be a series of nodes; which is a heavily unbalanced tree.
You should look up red-black trees. They have the same time-complexities, but it will constantly move around nodes, so that it is always forming a triangular shape. This will keep the tree balanced.
This is inefficient because the node will always be added to the left subtree of the prior node. Making a search check every every node in the list until it finds the result even though the answer will always be to the left thus actually making it take more computations than just simply having a list that is searched through a loop.

Applying a Logarithm to Navigate a Tree

I had once known of a way to use logarithms to move from one leaf of a tree to the next "in-order" leaf of a tree. I think it involved taking a position value (rank?) of the "current" leaf and using it as a seed for a fresh traversal from the root down to the new target leaf - all the way using a log function test to determine whether to follow the right or left node down to the leaf.
I no longer recall how to exercise that technique. Can anyone re-introduce me?
I also don't recall if the technique required the tree to be balanced, or if it worked on n-trees or only binary trees. Any info would be appreciated.
Since you mentioned whether to go left or right, I'm going to assume you're talking about a binary tree specifically. In that case, I think you're right that there is a way. If your nodes are numbered left-to-right, top-to-bottom, starting with 1, then you can find the rank (depth in the tree) by taking the log2 of the node's number. To find that node again from the root, you can use the binary representation of the number, where 0 = left and 1 = right.
For example:
n = 11
11 in binary is 1011
We always ignore the first 1 since it's going to be there for every number (all nodes of rank n will be binary numbers with n+1 digits, with the first digit being 1). We're left with 011, which is saying from the root go left, then right, then right.
If you want to find the next in-order leaf, take the current leaf's number and add one, then traverse from the root using this method.
I believe this only works with balanced binary trees.
OK, this proposal requires more characters than I can fit into a comment box. Steven does not believe that knowing the depth of the node in the tree is useful. I think it is. I have been wrong in the past, and I'm sure I'll be wrong in the future, so I will try to explain how this idea works in an attempt to not be wrong in the present. If I am, I apologize ahead of time. I'm nearly certain I got it from one of my Algorithms and Datastructures courses, using the CLR book. Please excuse any slips in notation or nomenclature, I haven't studied this stuff in a while.
Quoting wikipedia, "a complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible."
We are considering a complete tree with any branching degree (where a binary tree has a branching degree of two). Also, we are considering our nodes to have a 'positional value' which is an ordering of the positional value (top to bottom, left to right) of the node.
Now, if we are given a positional value, we can find the node in the following fashion. Take the log_base_n of the positional value of the element we are looking for (floor of this, we want an integer). Traverse down from the root that many times, minus one. Now, start looking through all the children of the nodes at this level. Your node you are searching for will be in this set.
This is an attempt in explaining the additional part of the wikipedia definition:
"This depth is equal to the integer part of log2(n) where n
is the number of nodes on the balanced tree.
Example 1: balanced tree with 1 node, log2(1) = 0 (depth = 0).
Example 2: balanced tree with 3 nodes, log2(3) = 1.59 (depth=1).
Example 3: balanced tree with 5 nodes, log2(5) = 2.32
(depth of tree is 2 nodes)."
This is useful, because you can simply traverse down to this level and then start looking around. It is useful and important to know the depth your node is located on, so you can start looking there, instead of starting to look at the beginning. Unless you know what level of the tree you are on, you get to start looking at all the nodes sequentially.
That is why I think it is helpful to know the depth of the node we are searching for.
It is a little bit odd, since having the "positional value" is not something we normally care about in a tree. I can see why Steve thought of this in terms of an array, since positional value is inherent in arrays.
-Brian J. Stinar-
Something that at least resembles your description is the Binary Heap, used a.o. in Priority Queues.
I think I've found the answer, or at least a facsimile.
Assume the tree nodes are numbered, starting at 1, top-down and left-to-right. Assume traversal begins at the root, and halts when it finds node X (which means the parent is linked to its children). Also, for quick reference, the base 2 logarithmic values for nodes 1 through 12 are:
log2(1) = 0.0
log2(2) = 1
log2(3) = 1.58
log2(4) = 2
log2(5) = 2.32
log2(6) = 2.58
log2(7) = 2.807
log2(8) = 3
log2(9) = 3.16
log2(10) = 3.32
log2(11) = 3.459
log2(12) = 3.58
The fractional portion represents a unique diagonal position (notice how nodes 3, 6, and 12 all have fractional portion 0.58). Also notice that every node belongs either to the left or right side of the tree, depending on whether the log fractional component is less or great than 0.5. Anecdotes aside, the algorithm for finding a node is then as follows:
examine fractional portion, if it is less than .5, turn left. Else turn right.
subtract one from the whole number portion of the log, stop if the value reaches zero.
double the fractional portion, and start over.
So, for example, if node 11 is what you seek then you start by computing the log which is 3.459. Then...
3-459 <=fraction less than .5: turn left and decrement whole number to 2.
2-918 <=doubled fraction more than .5: turn right and decrement whole number to 1.
1-836 <=doubling .918 gives 1.836: but only fractional part counts: turn right and dec prior whole number to 0. Done!!
With appropriate accomodations, the same technique appears to work for any balanced n-ary tree. For example, given a balanced ternary tree, the choice of following left, middle, or right edges is again based on the fractional portion of the log, as follows:
between 0.5-0.832: turn left (a one-third fraction range)
between 0.17-0.49: turn right (another one-third fraction range)
otherwise go down the middle. (the last one-third range)
The algorithm is adjusted by multiplying the fractional portion by 3 instead of 2. Again, a quick reference for those who want to test this last statement:
log3(1) = 0.0
log3(2) = 0.63
log3(3) = 1
log3(4) = 1.26
log3(5) = 1.46
log3(6) = 1.63
log3(7) = 1.77
log3(8) = 1.89
log3(9) = 2
At this point I wonder if there is an even more concise way to express this whole "log-based top-down selection of a node." I'm interested if anyone knows...
Case 1: Nodes have pointers to their parent
Starting from the node, traverse up the parent pointer until one with non-null right_child is found. Go to the right_child and traverse left_child as long as they are non-null.
Case 2: Nodes do not have pointers to the parent
Starting from the root, find the path to the node (including the root and the node). Then find the latest vertex (i.e. a node) in the path that has non-null right_child. Go the the right_child and traverse left_child as long as they are non-null.
In both cases, we traversing either up or down from the root to one of the nodes. The maximum of such traversal is in the order of the depth of the tree, hence logarithmic in the size of the nodes if the tree is balanced.

Difference between a LinkedList and a Binary Search Tree

What are the main differences between a Linked List and a BinarySearchTree? Is BST just a way of maintaining a LinkedList? My instructor talked about LinkedList and then BST but did't compare them or didn't say when to prefer one over another. This is probably a dumb question but I'm really confused. I would appreciate if someone can clarify this in a simple manner.
Linked List:
Item(1) -> Item(2) -> Item(3) -> Item(4) -> Item(5) -> Item(6) -> Item(7)
Binary tree:
Node(1)
/
Node(2)
/ \
/ Node(3)
RootNode(4)
\ Node(5)
\ /
Node(6)
\
Node(7)
In a linked list, the items are linked together through a single next pointer.
In a binary tree, each node can have 0, 1 or 2 subnodes, where (in case of a binary search tree) the key of the left node is lesser than the key of the node and the key of the right node is more than the node. As long as the tree is balanced, the searchpath to each item is a lot shorter than that in a linked list.
Searchpaths:
------ ------ ------
key List Tree
------ ------ ------
1 1 3
2 2 2
3 3 3
4 4 1
5 5 3
6 6 2
7 7 3
------ ------ ------
avg 4 2.43
------ ------ ------
By larger structures the average search path becomes significant smaller:
------ ------ ------
items List Tree
------ ------ ------
1 1 1
3 2 1.67
7 4 2.43
15 8 3.29
31 16 4.16
63 32 5.09
------ ------ ------
A Binary Search Tree is a binary tree in which each internal node x stores an element such that the element stored in the left subtree of x are less than or equal to x and elements stored in the right subtree of x are greater than or equal to x.
Now a Linked List consists of a sequence of nodes, each containing arbitrary values and one or two references pointing to the next and/or previous nodes.
In computer science, a binary search tree (BST) is a binary tree data structure which has the following properties:
each node (item in the tree) has a distinct value;
both the left and right subtrees must also be binary search trees;
the left subtree of a node contains only values less than the node's value;
the right subtree of a node contains only values greater than or equal to the node's value.
In computer science, a linked list is one of the fundamental data structures, and can be used to implement other data structures.
So a Binary Search tree is an abstract concept that may be implemented with a linked list or an array. While the linked list is a fundamental data structure.
I would say the MAIN difference is that a binary search tree is sorted. When you insert into a binary search tree, where those elements end up being stored in memory is a function of their value. With a linked list, elements are blindly added to the list regardless of their value.
Right away you can some trade offs:
Linked lists preserve insertion order and inserting is less expensive
Binary search trees are generally quicker to search
A linked list is a sequential number of "nodes" linked to each other, ie:
public class LinkedListNode
{
Object Data;
LinkedListNode NextNode;
}
A Binary Search Tree uses a similar node structure, but instead of linking to the next node, it links to two child nodes:
public class BSTNode
{
Object Data
BSTNode LeftNode;
BSTNode RightNode;
}
By following specific rules when adding new nodes to a BST, you can create a data structure that is very fast to traverse. Other answers here have detailed these rules, I just wanted to show at the code level the difference between node classes.
It is important to note that if you insert sorted data into a BST, you'll end up with a linked list, and you lose the advantage of using a tree.
Because of this, a linkedList is an O(N) traversal data structure, while a BST is a O(N) traversal data structure in the worst case, and a O(log N) in the best case.
They do have similarities, but the main difference is that a Binary Search Tree is designed to support efficient searching for an element, or "key".
A binary search tree, like a doubly-linked list, points to two other elements in the structure. However, when adding elements to the structure, rather than just appending them to the end of the list, the binary tree is reorganized so that elements linked to the "left" node are less than the current node and elements linked to the "right" node are greater than the current node.
In a simple implementation, the new element is compared to the first element of the structure (the root of the tree). If it's less, the "left" branch is taken, otherwise the "right" branch is examined. This continues with each node, until a branch is found to be empty; the new element fills that position.
With this simple approach, if elements are added in order, you end up with a linked list (with the same performance). Different algorithms exist for maintaining some measure of balance in the tree, by rearranging nodes. For example, AVL trees do the most work to keep the tree as balanced as possible, giving the best search times. Red-black trees don't keep the tree as balanced, resulting in slightly slower searches, but do less work on average as keys are inserted or removed.
Linked lists and BSTs don't really have much in common, except that they're both data structures that act as containers. Linked lists basically allow you to insert and remove elements efficiently at any location in the list, while maintaining the ordering of the list. This list is implemented using pointers from one element to the next (and often the previous).
A binary search tree on the other hand is a data structure of a higher abstraction (i.e. it's not specified how this is implemented internally) that allows for efficient searches (i.e. in order to find a specific element you don't have to look at all the elements.
Notice that a linked list can be thought of as a degenerated binary tree, i.e. a tree where all nodes only have one child.
It's actually pretty simple. A linked list is just a bunch of items chained together, in no particular order. You can think of it as a really skinny tree that never branches:
1 -> 2 -> 5 -> 3 -> 9 -> 12 -> |i. (that last is an ascii-art attempt at a terminating null)
A Binary Search Tree is different in 2 ways: the binary part means that each node has 2 children, not one, and the search part means that those children are arranged to speed up searches - only smaller items to the left, and only larger ones to the right:
5
/ \
3 9
/ \ \
1 2 12
9 has no left child, and 1, 2, and 12 are "leaves" - they have no branches.
Make sense?
For most "lookup" kinds of uses, a BST is better. But for just "keeping a list of things to deal with later First-In-First-Out or Last-In-First-Out" kinds of things, a linked list might work well.
The issue with a linked list is searching within it (whether for retrieval or insert).
For a single-linked list, you have to start at the head and search sequentially to find the desired element. To avoid the need to scan the whole list, you need additional references to nodes within the list, in which case, it's no longer a simple linked list.
A binary tree allows for more rapid searching and insertion by being inherently sorted and navigable.
An alternative that I've used successfully in the past is a SkipList. This provides something akin to a linked list but with extra references to allow search performance comparable to a binary tree.
A linked list is just that... a list. It's linear; each node has a reference to the next node (and the previous, if you're talking of a doubly-linked list). A tree branches---each node has a reference to various child nodes. A binary tree is a special case in which each node has only two children. Thus, in a linked list, each node has a previous node and a next node, and in a binary tree, a node has a left child, right child, and parent.
These relationships may be bi-directional or uni-directional, depending on how you need to be able to traverse the structure.
Linked List is straight Linear data with adjacent nodes connected with each other e.g. A->B->C. You can consider it as a straight fence.
BST is a hierarchical structure just like a tree with the main trunk connected to branches and those branches in-turn connected to other branches and so on. The "Binary" word here means each branch is connected to a maximum of two branches.
You use linked list to represent straight data only with each item connected to a maximum of one item; whereas you can use BST to connect an item to two items. You can use BST to represent a data such as family tree, but that'll become n-ary search tree as there can be more than two children to each person.
A binary search tree can be implemented in any fashion, it doesn't need to use a linked list.
A linked list is simply a structure which contains nodes and pointers/references to other nodes inside a node. Given the head node of a list, you may browse to any other node in a linked list. Doubly-linked lists have two pointers/references: the normal reference to the next node, but also a reference to the previous node. If the last node in a doubly-linked list references the first node in the list as the next node, and the first node references the last node as its previous node, it is said to be a circular list.
A binary search tree is a tree that splits up its input into two roughly-equal halves based on a binary search comparison algorithm. Thus, it only needs a very few searches to find an element. For instance, if you had a tree with 1-10 and you needed to search for three, first the element at the top would be checked, probably a 5 or 6. Three would be less than that, so only the first half of the tree would then be checked. If the next value is 3, you have it, otherwise, a comparison is done, etc, until either it is not found or its data is returned. Thus the tree is fast for lookup, but not nessecarily fast for insertion or deletion. These are very rough descriptions.
Linked List from wikipedia, and Binary Search Tree, also from wikipedia.
They are totally different data structures.
A linked list is a sequence of element where each element is linked to the next one, and in the case of a doubly linked list, the previous one.
A binary search tree is something totally different. It has a root node, the root node has up to two child nodes, and each child node can have up to two child notes etc etc. It is a pretty clever data structure, but it would be somewhat tedious to explain it here. Check out the Wikipedia artcle on it.

Resources