Related
I'm not quite sure how to pick a root for a binary search tree (I'm wanting to do without any code):
5, 9, 2, 1, 4, 8 ,3, 7, 6
How do I pick a root?
The steps are confusing me for this algorithm.
You can initialize an empty BST (binary search tree), then iterate the list and insert each item.
You don't need to pick a root, just build the tree. But maybe you want balanced the tree, you can insert as first element the middle value of the list, but the right answer is to use a balanced binary search tree (AVL tree).
Median number will be a better choice, because you want to have less depth.
Here is one example, the root is find the median the next one is also find the median
5
3 8
2 4 7 9
1 6
5 is get by (1+9)/2. 3 get from ceiling(1+4)/2 (you can also choose the floor of the median as the role of choosing median root)
BST with the same values can have many forms. For example, a tree containing 1,2 can be:
1 <- root
\
2 <-- right son
or
2 <- root
/
1 <-- left son
So you can have a tree where 1 is the root and it goes 1->2->3... and no left sons. You can have 5 as the root with 4 and 6 as left and right sons respectively, and you can have many other trees with the same values, but different ordering (and maybe different roots)
How do I pick a root?
In whichever way you want to. Any number of your data can be the root.
You would like to choose the median though, in this case, 5. With that choice, your tree should get as balanced as it gets, four nodes on the left of 5 and four nodes in the right subtree of 5.
Notice that any element could be the rood (even a random choice, or the first number in your example).
Um, then why should I worried finding the median and not always picking the first number (easiest choice)?
Because you want your Binary Search Tree (BST) to be as balanced as possible.
If you pick the min or the max number as a root, then your tree will reach its maximum depth (worst case scenario), and will emulate a single linked list, which will result in a worst case scenario for the search algorithm as well. However, as Michel stated, picking the minimum or maximum item for the root won't necessarily lead to a degenerate tree. For example, if you picked the minimum item for the root and but the right branch that contains the rest of the items is balanced, then the tree's height is only one level more than optimum. You only get a degenerate tree if you choose the nodes in ascending or descending order.
Keep in mind that in a BST, this rule must be respected:
Left children are less than the parent node and
all right children are greater than the parent node.
For more, read How binary search tree is created??
I am trying to understand B+ trees. I have done some reading about it.
One thing I am confused about. For creation of tree some articles give no. of keys=n, no. of pointers=1+n and some increase them by 1.
For example I have to make a B+ tree of order 3 with
6,2,9,16,12,17,21,18
Here the root shall have 3 numbers and 4 pointers OR 4 numbers and 5 pointers.
The order measures the branching factor, or the maximum number of keys. When the root is alone it has one pointer, to its own key. Once more nodes are added it will have 1 pointer to its key and then n pointers to its children, where n is the order of the tree. In this case the B+ tree root will have one pointer to number (key) and up to 3 pointer to node. for a total of 4 pointers.
For more one b+ tree creation look at the insertion section:
B+ Tree Insertion Wikipedia
So, I read this TopCoder tutorial on RMQ (Range Minimum Query), and I got a big question.
On the section where he introduced the approach, what I can understand until now is this:
(The whole approach actually uses methodology introduced in Sparse Table (ST) Algorithm, Reduction from LCA to RMQ, and from RMQ to LCA)
Given an array A[N], we need to transform it to a Cartesian Tree, thus making a RMQ problem a LCA (Lowest Common Ancestor) problem. Later, we can get a simplified version of the array A, and make it a restricted RMQ problem.
So it's basically two transforms. So the first RMQ to LCA part is simple. By using a stack, we can make the transform in O(n) time, resulting an array T[N] where T[i] is the element i's parent. And the tree is completed.
But here's what I can't understand. The O(n) approach needs an array where |A[i] - A[i-1]| = 1, and that array is introduced in Reduction from LCA to RMQ section of the tutorial. That involves a Euler Tour of this tree. But how can this be achieved with my end result from the transform? My approach to it is not linear, so it should be considered bad in this approach, what would be the linear approach for this?
UPDATE: The point that confuse me
Here's the array A[]:
n : 0 1 2 3 4 5 6 7 8 9
A[n]: 2 4 3 1 6 7 8 9 1 7
Here's the array T[]:
n : 0 1 2 3 4 5 6 7 8 9
T[n]: 3 2 0 * 8 4 5 6 3 8 // * denotes -1, which is the root of the tree
//Above is from RMQ to LCA, it's from LCA to RMQ part that confuses me, more below.
A picture of the tree:
A Euler Tour needs to know each node's child, just like a DFS (Depth-First Search) while T[n] have only each element's root, not child.
Here is my current understanding of what's confusing you:
In order to reduce RMQ to LCA, you need to convert the array into a tree and then do an Euler tour of that tree.
In order to do an Euler tour, you need to store the tree such that each node points at its children.
The reduction that is listed from RMQ to LCA has each node point to its parent, not its children.
If this is the case, your concerns are totally justified, but there's an easy way to fix this. Specifically, once you have the array of all the parent pointers, you can convert it to a tree where each node points to its children in O(n) time. The idea is the following:
Create an array of n nodes. Each node has a value field, a left child, and a right child.
Initially, set the nth node to have a null left child, null right child, and the value of the nth element from the array.
Iterate across the T array (where T[n] is the parent index of n) and do the following:
If T[n] = *, then the nth entry is the root. You can store this for later use.
Otherwise, if T[n] < n, then you know that node n must be a right child of its parent, which is stored at position T[n]. So set the T[n]th node's right child to be the nth node.
Otherwise, if T[n] > n, then you know that node n must be a left child of its parent, which is stored at position T[n]. So set the T[n]th node's left child to be the nth node.
This runs in time O(n), since each node is processed exactly once.
Once this is done, you've explicitly constructed the tree structure that you need and have a pointer to the root. From there, it should be reasonably straightforward to proceed with the rest of the algorithm.
Hope this helps!
I am preparing for an exam and I have stumbled on the following question:
Draw the binary search tree that would result if data were to be added in the following order:
10,9,8,7,6,5,4,3
Why is the tree that results unsuitable for efficient searching?
My Answer:
I would have thought when creating a BST that we start with the value 10 as the root node then add 9 as the left sub tree value on the first level. Then 8 to the left subtree of 9 and so on. I don't know why this makes it inefficient for searching though. Any ideas?
Since the values are in decreasing order, they get added to the left at each level, practically leaving you with a linked list, which takes O(N) to search, instead of the preferred O(logN) of a BST.
Drawing:
10
/
9
/
8
/
7
/
6
/
5
/
4
/
3
This would create a linked-list, since it would just be a series of nodes; which is a heavily unbalanced tree.
You should look up red-black trees. They have the same time-complexities, but it will constantly move around nodes, so that it is always forming a triangular shape. This will keep the tree balanced.
This is inefficient because the node will always be added to the left subtree of the prior node. Making a search check every every node in the list until it finds the result even though the answer will always be to the left thus actually making it take more computations than just simply having a list that is searched through a loop.
What are the main differences between a Linked List and a BinarySearchTree? Is BST just a way of maintaining a LinkedList? My instructor talked about LinkedList and then BST but did't compare them or didn't say when to prefer one over another. This is probably a dumb question but I'm really confused. I would appreciate if someone can clarify this in a simple manner.
Linked List:
Item(1) -> Item(2) -> Item(3) -> Item(4) -> Item(5) -> Item(6) -> Item(7)
Binary tree:
Node(1)
/
Node(2)
/ \
/ Node(3)
RootNode(4)
\ Node(5)
\ /
Node(6)
\
Node(7)
In a linked list, the items are linked together through a single next pointer.
In a binary tree, each node can have 0, 1 or 2 subnodes, where (in case of a binary search tree) the key of the left node is lesser than the key of the node and the key of the right node is more than the node. As long as the tree is balanced, the searchpath to each item is a lot shorter than that in a linked list.
Searchpaths:
------ ------ ------
key List Tree
------ ------ ------
1 1 3
2 2 2
3 3 3
4 4 1
5 5 3
6 6 2
7 7 3
------ ------ ------
avg 4 2.43
------ ------ ------
By larger structures the average search path becomes significant smaller:
------ ------ ------
items List Tree
------ ------ ------
1 1 1
3 2 1.67
7 4 2.43
15 8 3.29
31 16 4.16
63 32 5.09
------ ------ ------
A Binary Search Tree is a binary tree in which each internal node x stores an element such that the element stored in the left subtree of x are less than or equal to x and elements stored in the right subtree of x are greater than or equal to x.
Now a Linked List consists of a sequence of nodes, each containing arbitrary values and one or two references pointing to the next and/or previous nodes.
In computer science, a binary search tree (BST) is a binary tree data structure which has the following properties:
each node (item in the tree) has a distinct value;
both the left and right subtrees must also be binary search trees;
the left subtree of a node contains only values less than the node's value;
the right subtree of a node contains only values greater than or equal to the node's value.
In computer science, a linked list is one of the fundamental data structures, and can be used to implement other data structures.
So a Binary Search tree is an abstract concept that may be implemented with a linked list or an array. While the linked list is a fundamental data structure.
I would say the MAIN difference is that a binary search tree is sorted. When you insert into a binary search tree, where those elements end up being stored in memory is a function of their value. With a linked list, elements are blindly added to the list regardless of their value.
Right away you can some trade offs:
Linked lists preserve insertion order and inserting is less expensive
Binary search trees are generally quicker to search
A linked list is a sequential number of "nodes" linked to each other, ie:
public class LinkedListNode
{
Object Data;
LinkedListNode NextNode;
}
A Binary Search Tree uses a similar node structure, but instead of linking to the next node, it links to two child nodes:
public class BSTNode
{
Object Data
BSTNode LeftNode;
BSTNode RightNode;
}
By following specific rules when adding new nodes to a BST, you can create a data structure that is very fast to traverse. Other answers here have detailed these rules, I just wanted to show at the code level the difference between node classes.
It is important to note that if you insert sorted data into a BST, you'll end up with a linked list, and you lose the advantage of using a tree.
Because of this, a linkedList is an O(N) traversal data structure, while a BST is a O(N) traversal data structure in the worst case, and a O(log N) in the best case.
They do have similarities, but the main difference is that a Binary Search Tree is designed to support efficient searching for an element, or "key".
A binary search tree, like a doubly-linked list, points to two other elements in the structure. However, when adding elements to the structure, rather than just appending them to the end of the list, the binary tree is reorganized so that elements linked to the "left" node are less than the current node and elements linked to the "right" node are greater than the current node.
In a simple implementation, the new element is compared to the first element of the structure (the root of the tree). If it's less, the "left" branch is taken, otherwise the "right" branch is examined. This continues with each node, until a branch is found to be empty; the new element fills that position.
With this simple approach, if elements are added in order, you end up with a linked list (with the same performance). Different algorithms exist for maintaining some measure of balance in the tree, by rearranging nodes. For example, AVL trees do the most work to keep the tree as balanced as possible, giving the best search times. Red-black trees don't keep the tree as balanced, resulting in slightly slower searches, but do less work on average as keys are inserted or removed.
Linked lists and BSTs don't really have much in common, except that they're both data structures that act as containers. Linked lists basically allow you to insert and remove elements efficiently at any location in the list, while maintaining the ordering of the list. This list is implemented using pointers from one element to the next (and often the previous).
A binary search tree on the other hand is a data structure of a higher abstraction (i.e. it's not specified how this is implemented internally) that allows for efficient searches (i.e. in order to find a specific element you don't have to look at all the elements.
Notice that a linked list can be thought of as a degenerated binary tree, i.e. a tree where all nodes only have one child.
It's actually pretty simple. A linked list is just a bunch of items chained together, in no particular order. You can think of it as a really skinny tree that never branches:
1 -> 2 -> 5 -> 3 -> 9 -> 12 -> |i. (that last is an ascii-art attempt at a terminating null)
A Binary Search Tree is different in 2 ways: the binary part means that each node has 2 children, not one, and the search part means that those children are arranged to speed up searches - only smaller items to the left, and only larger ones to the right:
5
/ \
3 9
/ \ \
1 2 12
9 has no left child, and 1, 2, and 12 are "leaves" - they have no branches.
Make sense?
For most "lookup" kinds of uses, a BST is better. But for just "keeping a list of things to deal with later First-In-First-Out or Last-In-First-Out" kinds of things, a linked list might work well.
The issue with a linked list is searching within it (whether for retrieval or insert).
For a single-linked list, you have to start at the head and search sequentially to find the desired element. To avoid the need to scan the whole list, you need additional references to nodes within the list, in which case, it's no longer a simple linked list.
A binary tree allows for more rapid searching and insertion by being inherently sorted and navigable.
An alternative that I've used successfully in the past is a SkipList. This provides something akin to a linked list but with extra references to allow search performance comparable to a binary tree.
A linked list is just that... a list. It's linear; each node has a reference to the next node (and the previous, if you're talking of a doubly-linked list). A tree branches---each node has a reference to various child nodes. A binary tree is a special case in which each node has only two children. Thus, in a linked list, each node has a previous node and a next node, and in a binary tree, a node has a left child, right child, and parent.
These relationships may be bi-directional or uni-directional, depending on how you need to be able to traverse the structure.
Linked List is straight Linear data with adjacent nodes connected with each other e.g. A->B->C. You can consider it as a straight fence.
BST is a hierarchical structure just like a tree with the main trunk connected to branches and those branches in-turn connected to other branches and so on. The "Binary" word here means each branch is connected to a maximum of two branches.
You use linked list to represent straight data only with each item connected to a maximum of one item; whereas you can use BST to connect an item to two items. You can use BST to represent a data such as family tree, but that'll become n-ary search tree as there can be more than two children to each person.
A binary search tree can be implemented in any fashion, it doesn't need to use a linked list.
A linked list is simply a structure which contains nodes and pointers/references to other nodes inside a node. Given the head node of a list, you may browse to any other node in a linked list. Doubly-linked lists have two pointers/references: the normal reference to the next node, but also a reference to the previous node. If the last node in a doubly-linked list references the first node in the list as the next node, and the first node references the last node as its previous node, it is said to be a circular list.
A binary search tree is a tree that splits up its input into two roughly-equal halves based on a binary search comparison algorithm. Thus, it only needs a very few searches to find an element. For instance, if you had a tree with 1-10 and you needed to search for three, first the element at the top would be checked, probably a 5 or 6. Three would be less than that, so only the first half of the tree would then be checked. If the next value is 3, you have it, otherwise, a comparison is done, etc, until either it is not found or its data is returned. Thus the tree is fast for lookup, but not nessecarily fast for insertion or deletion. These are very rough descriptions.
Linked List from wikipedia, and Binary Search Tree, also from wikipedia.
They are totally different data structures.
A linked list is a sequence of element where each element is linked to the next one, and in the case of a doubly linked list, the previous one.
A binary search tree is something totally different. It has a root node, the root node has up to two child nodes, and each child node can have up to two child notes etc etc. It is a pretty clever data structure, but it would be somewhat tedious to explain it here. Check out the Wikipedia artcle on it.