I am watching a lecture from IIT about data structures (Dr. Naveen Garg) about AVL trees.
My question: why can't the height of T2 be (h-1)?
The assumption is that the tree is balanced after the insertion WITHOUT rotation.
If a rotation occurred, it would be a different case, and you would deal with it with a ROTATION. I figured this out from "Since X remains balanced...": that is the assumption, and we are showing here that the tree stays balanced only in this case.
If ht(T2) were (h-1) as you said, then the tree would be unbalanced AFTER the insertion, which is not part of the ASSUMPTION in the question.
The balance factor of x would then be 2, so a rotation would have to take place.
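To make the argument concrete, here is a minimal sketch (assuming a bare Node class with left/right fields; all names here are illustrative, not from the lecture) of how height and balance factor are computed:

class Node:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def height(node):
    # Height of an empty subtree is -1, of a single leaf is 0.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # AVL invariant: this must stay in {-1, 0, +1} for every node.
    return height(node.left) - height(node.right)

With one subtree at height h+1 after the insertion and T2 at height h-1, the two heights would differ by 2, so x would violate the invariant and a rotation would be unavoidable.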
When adding or deleting a node in an AVL tree, rebalancing might occur. I can understand how there can be O(log(n)) rebalances needed, but when those rotations occur to balance the tree, how many nodes actually change level? I can't seem to find this anywhere. I was thinking it was O(log(n)) but can't seem to figure out why. Help would be greatly appreciated.
The answer is O(n).
Suppose that each node has a "depth" field; how much will it cost to maintain it?
There is a theorem: if the information in field F of node N depends solely on N's direct children, then F can be maintained in logarithmic time when the tree is updated (by insertion or deletion).
(the theorem can be proved by induction)
The "depth" field doesn't depends on its children - but rather on its parent.
Note, however, that the theorem is one-way: it says when a field can be maintained in logarithmic time, but not when it cannot. Therefore it can't be said with certainty that the "depth" field can be maintained in logarithmic time (the way the height or BF fields can), and in fact an insertion can change the depth of O(n) nodes:
In the first rotation the depths of t2 and t4 change, and in the second, the depths of t1, t2, and t3!
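For intuition, here is a minimal sketch of a right rotation (assuming nodes with left/right fields; rotate_right is an illustrative name, not from the original answer):

def rotate_right(y):
    # Rotate right around y; x is y's left child before the rotation.
    x = y.left
    y.left = x.right   # x's old right subtree keeps its depth
    x.right = y
    return x           # new subtree root: x rises one level, y sinks one

Every node in x's left subtree rises one level and every node in y's right subtree sinks one level, so a single rotation near the root already changes the "depth" of O(n) nodes, whereas child-dependent fields like height and BF only need updating along the rotation path.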
I wanted to understand how red-black trees work. I understood the algorithm and how to fix the properties after insert and delete operations, but something isn't clear to me. Why is a red-black tree more balanced than a binary tree? I want to understand the intuition: why do rotations and fixing the tree properties make a red-black tree more balanced?
Thanks.
Suppose you create a plain binary tree by inserting the following items in order: 1, 2, 3, 4, 5, 6, 7, 8, 9. Each new item will always be the largest item in the tree, and so it is inserted as the right-most possible node. Your "tree" would look like this:
1
 \
  2
   \
    3
     .
      .
       .
        9
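To see that height concretely, here is a minimal sketch (a naive insert with no balancing; all names are illustrative):

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # Plain BST insert: no rebalancing whatsoever.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(node):
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

root = None
for k in range(1, 10):
    root = insert(root, k)
print(height(root))  # 8, i.e. n - 1: every lookup degrades to a list scan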
The rotations performed in a red-black tree (or any type of balanced binary tree) ensure that neither the left nor right subtree of any node is significantly deeper than the other (typically, the difference in height is 0 or 1, but any constant factor would do.) This way, operations whose running time depends on the height h of the tree are always O(lg n), since the rotations maintain the property that h = O(lg n), whereas in the worst case shown above h = O(n).
For a red-black tree in particular, the node coloring is simply a bookkeeping trick that helps in proving that the rotations always maintain h = O(lg n). Different types of balanced binary trees (AVL trees, 2-3 trees, etc.) use different bookkeeping techniques for maintaining the same property.
Why is a red-black tree more balanced than a binary search tree?
Because a red-black tree guarantees O(log N) performance for insertions, deletions, and lookups for any order of operations.
Why do rotations and fixing the tree properties make a red-black tree more balanced?
Apart from the general properties that any binary search tree must obey, a red-black tree also obeys the following properties:
No node has two red links connected to it.
Every path from root to null link has the same number of black links.
Red links lean left.
Now we want to prove the following proposition:
Proposition. The height of the tree is ≤ 2 lg N in the worst case.
Proof.
Since every path from the root to any null link has the same number of black links, and two red links never appear in a row, the maximum height is at most 2 lg N in the worst case.
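To make the counting explicit (a standard sketch; B here denotes the number of black links on any root-to-null path):

N \ge 2^{B} - 1 \;\Rightarrow\; B \le \lg(N + 1), \qquad h \le 2B \le 2 \lg(N + 1)

The first inequality holds because the black links alone form a perfectly balanced tree of height B; the second holds because no two red links appear in a row, so red links can at most double a path's length.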
Although it is quite late, since I was recently studying RBTs and was struggling with the intuition behind why some magical rotations and coloring balance the tree, I was wondering about the same question as the OP:
why do rotations and fixing the tree properties make a red-black tree more balanced
After a few days of "research", I had the eureka moment and decided to write it up in detail. I won't copy-paste it here as some of the formatting would not come out right, so anyone who is interested can check it out on GitHub. I tried to explain it with a lot of images and simulations. Hope it helps someone someday who happens to trip into this thread searching for the same question :)
Since tree height is the main impediment to computational efficiency, a good strategy is to make the root of the shorter tree point to the root of the longer tree.
Does this really matter, though? I mean, if you did it the other way around (merged the longer tree into the shorter one), the tree height would only increase by 1. Since an increase of 1 wouldn't make a real difference (would it?), does it really matter which tree is merged into which? Or is there another reason why the shorter tree is merged into the longer one?
Note I am talking about disjoint sets.
It isn't really clear which kind of tree you are talking about (binary search trees, disjoint sets, or any n-ary tree).
But in any case, I think the reason is that although an increase of 1 isn't significant on its own, if you do n mergers you can end up with an increase of n. This can be significant if you have a data structure that needs lots of mergers (e.g. disjoint sets).
The quotation lacks context. For example, in some tree structures single elements may have to be inserted one by one (possibly rebalancing the tree; usually you want trees of height O(log n)). Maybe this is what is meant: then it is easier to insert the fewer elements into the larger tree.
Obviously, whether a height increase of 1 matters depends in part on how often the height is increased by one :-)
Edit: with disjoint sets, it is important that the smaller (shorter) tree be added to the bigger one.
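A minimal sketch of why (the usual union-by-rank scheme; all names here are illustrative):

# Disjoint set with union by rank (rank is an upper bound on tree height).
parent = list(range(10))
rank = [0] * 10

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return
    if rank[ra] < rank[rb]:
        ra, rb = rb, ra      # make ra the taller (or equal) root
    parent[rb] = ra          # the shorter tree hangs under the taller root
    if rank[ra] == rank[rb]:
        rank[ra] += 1        # height can only grow on an equal-rank union

A union of equal-rank trees at least doubles the size of the result, so reaching rank k takes at least 2^k elements and the height stays O(log n). Always hanging the taller tree under the shorter one could instead build a chain of height O(n) after n unions.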
We always see that operations on a (binary search) tree have O(log n) worst-case running time because the tree height is log n. I wonder: if we are told that an algorithm has a running time that is a function of log n, e.g. m + n log n, can we conclude that it must involve an (augmented) tree?
EDIT:
Thanks to your comments, I now realize divide-and-conquer and binary trees are very similar visually/conceptually. I had never made a connection between the two. But I can think of a case where the O(log n) does not come from a divide-and-conquer algorithm, and the tree involved has none of the properties of a BST/AVL/red-black tree.
That's the disjoint-set data structure with Find/Union operations, whose running time is O(N + M log N), with N being the number of elements and M the number of Find operations.
Please let me know if I'm missing something, but I cannot see how divide-and-conquer comes into play here. I just see that this (disjoint-set) case has a tree with no BST property and a running time that is a function of log N. So my question is about why I can or cannot generalize from this case.
What you have is exactly backwards. O(lg N) generally means some sort of divide-and-conquer algorithm, and one common way of implementing divide and conquer is a binary tree. While binary trees are a substantial subset of all divide-and-conquer algorithms, they are still only a subset.
In some cases, you can transform other divide-and-conquer algorithms fairly directly into binary trees (e.g. comments on another answer have already made an attempt at claiming a binary search is similar). Just for another obvious example, however, a multiway tree (e.g. a B-tree, B+ tree or B* tree), while clearly a tree, is just as clearly not a binary tree.
Again, if you want to badly enough, you can stretch the point that a multiway tree can be represented as a sort of warped version of a binary tree. If you want to, you can probably stretch all the exceptions to the point of saying that all of them are (at least something like) binary trees. At least to me, however, all that does is make "binary tree" synonymous with "divide and conquer". In other words, all you accomplish is warping the vocabulary and essentially obliterating a term that's both distinct and useful.
No, you can also binary search a sorted array (for instance). But don't take my word for it: http://en.wikipedia.org/wiki/Binary_search_algorithm
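For instance, a minimal sketch of binary search over a plain sorted list (no tree data structure anywhere; names are illustrative):

def binary_search(a, key):
    # a must be sorted; halving the search range each step gives O(log n).
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == key:
            return mid
        if a[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # key not present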
As a counterexample:

import math

def count_log_steps(a):
    # The loop runs about log2(len(a)) times: O(log n), and no tree anywhere.
    y = 0
    for x in range(int(math.log2(len(a))) + 1):
        y = y + 1
    return y
The run time is O(log(n)), but no tree here!
The answer is no. Binary search of a sorted array is O(log(n)).
Algorithms taking logarithmic time are commonly found in operations on binary trees.
Examples of O(logn):
Finding an item in a sorted array with a binary search or a balanced search tree.
Look up a value in a sorted input array by bisection.
As O(log(n)) is only an upper bound, all O(1) algorithms, like function(a, b) { return a + b; }, also satisfy the condition.
But I have to agree that all Theta(log(n)) algorithms kind of look like tree algorithms, or at least can be abstracted to a tree.
Short Answer:
Just because an algorithm has log(n) as part of its analysis does not mean that a tree is involved. For example, the following is a very simple algorithm that is O(log(n)):
for (int i = 1; i < n; i = i * 2)
    printf("hello");  /* i doubles each pass: about log2(n) iterations */
As you can see, no tree was involved. John also provides a good example of how binary search can be done on a sorted array. Both of these take O(log(n)) time, and there are plenty of other code examples that could be created or referenced. So don't make assumptions based on the asymptotic time complexity; look at the code to know for sure.
More On Trees:
Just because an algorithm involves "trees" doesn't imply O(log n) either. You need to know the tree type and how the operation affects the tree.
Some Examples:
Example 1)
Inserting into or searching the following unbalanced tree would be O(n).
Example 2)
Inserting into or searching the following balanced trees would both be O(log(n)).
Balanced Binary Tree:
Balanced Tree of Degree 3:
Additional Comments
If the trees you are using don't have a way to "balance", then there is a good chance that your operations will take O(n) time, not O(log n). If you use trees that are self-balancing, then inserts normally take more time, as the balancing of the tree normally occurs during the insert phase.
I've tried to understand what sorted trees are, and binary trees, and AVL, and so on...
I'm still not sure: what makes a sorted tree sorted? And what is the difference in complexity (Big-O) between searching in a sorted tree and searching in an unsorted tree? Hope you can help me.
Binary Trees
There exist two main types of binary trees: balanced and unbalanced. A balanced tree aims to keep the heights of its subtrees (height = the number of nodes between the root and the furthest child) as even as possible. There are several types of algorithms for balanced trees, the two most famous being AVL and red-black trees. The complexity of insert/delete/search operations on both AVL and red-black trees is O(log n) or better, which is the important part. Other self-balancing algorithms are AA, splay, and scapegoat trees.
Balanced trees gain their property (and name) of being balanced from the fact that, after every delete or insert operation, the algorithm inspects the tree to make sure it's still balanced; if it's not, it will try to fix this (which is done differently by each algorithm) by rotating nodes around in the tree.
Normal (or unbalanced) binary trees do not modify their structure to keep themselves balanced and run the risk of, most often over time, becoming very inefficient (especially if the values are inserted in order). However, if performance is of no issue and you mainly want a sorted data structure, they might do. The complexity of insert/delete/search operations on an unbalanced tree ranges from O(1) (best case, if you want the root) to O(n) (worst case, if you inserted all nodes in order and want the largest node).
There exists another variation called a randomized binary tree, which uses some kind of randomization to make sure the tree doesn't become fully unbalanced (which would be the same as a linked list).
A binary search tree is a tree structure where every node has at most two child nodes.
The nodes in the left subtree all have the property of being less than their parent, and the nodes in the right subtree are all greater than their parent.
The interesting thing about a binary search tree is that we can search for a value in O(log n) when the tree is properly sorted (and balanced). Doing the same search in a linked list, for example, would give us a search speed of O(n).
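A minimal search sketch under that assumption (nodes with key/left/right fields; names are illustrative):

def search(node, key):
    # One comparison per level: O(height), i.e. O(log n) in a balanced tree.
    while node is not None:
        if key == node.key:
            return node
        node = node.left if key < node.key else node.right
    return None  # key is not in the tree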
The best way to go about learning data structures would be to do a day of googling and reading Wikipedia articles.
This might get you started
http://en.wikipedia.org/wiki/Binary_search_tree
Do a Google search for the following:
site:stackoverflow.com binary trees
to get a list of SO questions which will answer your several questions.
There isn't really a lot of point in using a tree structure if it isn't sorted in some fashion - if you are planning on searching for a node in the tree and it is unsorted, you will have to traverse the entire tree (O(n)). If you have a tree which is sorted in some fashion, then it is only necessary to traverse down a single branch of the tree (typically O(log n)).
In a binary search tree the left child is always smaller than its parent, and the right child is always bigger, so you can search a sorted tree in O(log(n)): you just need to go left if the key is smaller than the current node and right if it is bigger.