Considering a skewed tree, where all the nodes extend in only one direction (left or right): can we say that a linked list with n nodes is also a skewed tree with height n?
Yes. A list is a degenerate tree. You could call it a "maximally unbalanced tree" if you want.
In fact, that's exactly what someone means when they say that you need to balance a binary search tree in order to get the O(log n) lookup performance, because if your tree becomes unbalanced, it degenerates into a list and lookup performance becomes O(n).
It is also sometimes useful to think in the other direction: most people have no trouble at all understanding how a persistent list works, but many people have trouble understanding how a persistent tree works. But the thing is: it actually works exactly like a persistent list, and it's generally easy to understand how a persistent tree works, if you start from a persistent list and then re-interpret that list as a degenerate tree.
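The equivalence is easy to see in code. Below is a minimal sketch (the `Node` and `SkewedTreeDemo` names are made up) showing that inserting already-sorted keys into a naive BST produces a right-skewed tree: every node uses only its right pointer, exactly like a singly linked list's next pointer, and the height equals the number of nodes.

```java
class SkewedTreeDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Naive (non-balancing) BST insert.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // Height counted in nodes, so a single node has height 1.
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k = 1; k <= 10; k++) root = insert(root, k); // sorted input
        System.out.println(height(root)); // prints 10: height == n, i.e. a list
    }
}
```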
Why do we always want shallow binary tree? In what cases is shallow binary tree better than non-shallow/minimum depth tree?
I am just confused because my prof keeps saying we want to aim for the shallowest possible binary tree, but I do not understand why. I guess smaller is better, but is there any specific concrete reason? Sorry for my bad English, thanks for your help.
I'm assuming this is in regards to binary search trees - if not, please let me know and I can update this answer.
In a binary search tree, the cost of almost every operation (insertion, deletion, lookup, successor, predecessor, min, max, range search, split, join, etc.) depends on the height of the binary search tree. The reason for this is that these operations work by walking down the tree from the root until they either fall off the tree or find what they're looking for. The deeper the tree, the longer this can take if you get bad inputs.
By shuffling nodes around to keep the tree height low, we can make it so that these operations are, in general, very fast. A tree with height h can have at most 2^h - 1 nodes in it, which is a huge number compared with h (figure that if h = 20, 2^h - 1 is over a million!), so if you make an effort to pack the nodes into the tree higher up and closer to the root, you'll get better operation speeds all around.
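To make the gap between h and the node count concrete, here is a quick check of that formula (the class name is made up):

```java
// A binary tree of height h holds at most 2^h - 1 nodes, so a height of 20
// already covers over a million keys.
class HeightVsNodes {
    static long maxNodes(int h) {
        return (1L << h) - 1; // 2^h - 1
    }

    public static void main(String[] args) {
        System.out.println(maxNodes(20)); // prints 1048575
    }
}
```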
There are some cases where it's actually beneficial to have trees that are as imbalanced as possible. For example, if you have a binary search tree and know in advance that some elements will be looked up more than others, you may want to shuffle the nodes around in the tree to put the high-frequency items higher up and the low-frequency items deeper in the tree. In non-binary-search-tree contexts, the randomized meldable priority queue works by randomly walking down a tree doing merges, and the less balanced the tree is the more likely it is for these operations to end early by falling off the tree.
I'm using a red-black binary tree with linked leafs on a project (Java's TreeMap), to quickly find and iterate through the items. The problem is that I can easily get 35000 items or so on the tree, and several times I have to remove "all items above X", which can be almost the entire tree (like 30000 items at once, because all of them are bigger than X), and that takes too much time to remove them and rebalance the tree each time.
Is there any algorithm that can help me here (so I can make my own tree implementation)?
You're looking for the split operation on a red/black tree, which takes the red/black tree and some value k and splits it into two red/black trees, one with all keys greater than or equal to k and one with all keys less than k. This can be implemented in O(log n) time if you augment the structure to store some extra information. In your case, since you're using Java, you can just split the tree and discard the root of the tree you don't care about so that the garbage collector can handle it.
Details on how to implement this are given in this paper, starting on page 9. It's implemented in terms of a catenate (or join) operation that combines two trees, but I think the exposition is pretty clear.
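For intuition, here is the split recursion sketched on a plain unbalanced BST, with the red/black rebalancing (the part the paper actually covers) omitted. The `Node`, `insert` and `split` names are hypothetical helpers, not TreeMap internals.

```java
class BstSplit {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // Returns {tree with keys < k, tree with keys >= k}.
    static Node[] split(Node root, int k) {
        if (root == null) return new Node[] { null, null };
        if (root.key < k) {
            Node[] parts = split(root.right, k); // keys >= k live in here
            root.right = parts[0];               // keep the "< k" remainder
            return new Node[] { root, parts[1] };
        } else {
            Node[] parts = split(root.left, k);  // keys < k live in here
            root.left = parts[1];
            return new Node[] { parts[0], root };
        }
    }

    static void inorder(Node n, StringBuilder out) {
        if (n == null) return;
        inorder(n.left, out);
        out.append(n.key).append(' ');
        inorder(n.right, out);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k : new int[] { 5, 2, 8, 1, 3, 7, 9 }) root = insert(root, k);
        Node[] halves = split(root, 7);
        StringBuilder lo = new StringBuilder(), hi = new StringBuilder();
        inorder(halves[0], lo);
        inorder(halves[1], hi);
        System.out.println(lo.toString().trim()); // prints 1 2 3 5
        System.out.println(hi.toString().trim()); // prints 7 8 9
    }
}
```

In your case you would keep the "less than k" half and let the garbage collector reclaim the other.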
Hope this helps!
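Alternatively, if you'd rather not write your own tree at all: Java's TreeMap already exposes bulk removal through its NavigableMap views. Clearing a view still removes entries one at a time internally (roughly O(m log n) for m removed keys) rather than doing a true O(log n) split, but it needs no custom code and may well be fast enough:

```java
import java.util.TreeMap;

class TailClearDemo {
    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>();
        for (int i = 0; i < 100; i++) map.put(i, "item" + i);
        map.tailMap(50, true).clear();     // drop every key >= 50
        System.out.println(map.size());    // prints 50
        System.out.println(map.lastKey()); // prints 49
    }
}
```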
I'm studying how to balance trees and I have some questions
Is it possible to balance a normal binary tree? If yes, which algorithm should be used?
Do I necessarily have to use an AVL or red-black tree to obtain a balanced tree? How do these work?
I read something about rotations and weights but I'm kind of confused right now.
Is it possible to balance a normal binary tree? If yes, which algorithm should be used?
In O(n) you can build a complete tree and populate it with the elements from an in-order traversal.
It cannot be done better, because a BST might in rare cases decay to a chain (a linked list), where every node has one null child. In that case, even accessing the middle element is O(n) by itself.
Do I necessarily have to use an AVL or red-black tree to obtain a balanced tree?
There are other balanced trees such as B+ trees, and other data structures (not trees) such as skip-lists. You might want to have a look at a list of known data structures, especially the trees section.
How do these work?
I find the wikipedia articles both on AVL tree and Red-Black tree very informative. If you have something specific you don't understand there - you should ask.
Also: trying to implement a balanced tree on your own (implement a known tree, not invent a new one, of course) is great for educational purposes, and by doing so you will definitely understand how it works.
Well... AVL and red-black trees are "normal binary trees" that are balanced and that keep that balance (for some definition of "balanced"). I'm not a computer science teacher, so I won't come up with my own explanation of the algorithms, and I guess you aren't looking for a cut&paste from Wikipedia :-)
Now, for balancing binary trees: if the tree is a search tree (i.e. 'sorted', but 'balanced' doesn't really make all that much sense if it's not) you could always just recreate the tree. The simplest algorithm is to use an array with all the elements from the tree, in sorted order (easily obtained from an inorder traversal). Then build an algorithm around this general idea:
Take the middle element of the array as the root of the tree. This creates a tree node and two arrays, "left" and "right", which are meant to form the left and right subtrees.
Apply this same algorithm recursively to create a tree from the "left" array and one from the "right" array. These two trees become the children of the parent node.
You might have to be careful with the case when the array has an even number of elements: there is no single "middle element", and the two candidate choices create subarrays whose sizes differ by one. In practice either choice is fine: a size difference of one at each level cannot unbalance the result, and the tree still ends up with the minimum possible height of about log2(n).
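The two recursive steps above can be sketched as follows (hypothetical `Node` and `build` names); for simplicity it takes the sorted elements as one array plus index bounds instead of physically creating the "left" and "right" arrays:

```java
class BalancedFromSorted {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Builds a height-balanced BST from a sorted array in O(n):
    // the middle element becomes the root, the halves become the subtrees.
    static Node build(int[] sorted, int lo, int hi) {
        if (lo > hi) return null;
        int mid = (lo + hi) / 2;       // either middle works for even sizes
        Node root = new Node(sorted[mid]);
        root.left = build(sorted, lo, mid - 1);
        root.right = build(sorted, mid + 1, hi);
        return root;
    }

    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        int[] keys = new int[15];
        for (int i = 0; i < 15; i++) keys[i] = i; // already in sorted order
        Node root = build(keys, 0, 14);
        System.out.println(height(root)); // prints 4: 15 nodes fit in 4 levels
    }
}
```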
Of course, doing something like this every time you change the tree isn't such a great idea; you really want to use self-balancing trees like AVL for that. Doing it after creating the tree might not be all that useful either: you could just use the array itself and do binary searches on it, instead of making a tree. The array IS just another form of a binary tree...
EDIT: there is a reason why a lot of computer scientists have spent a lot of time developing data structures and algorithms that perform well in certain situations. Rolling your own version of a balanced binary tree is unlikely to beat these...
Can you balance an unbalanced tree?
Yes, you can: use the same rebalancing routine you wrote for your AVL tree inside a post-order traversal of the whole tree.
Should You Do it?
No! You should recreate it instead; rebalancing node by node will cost you unnecessary work.
How do I recreate it?
Use an in-order traversal to put your nodes into an array. Then repeatedly take the middle element of the array (and, recursively, the middle of each left and right half) and add those nodes to the new tree.
Is it possible to balance a normal binary tree? If yes, which algorithm should be used?
Do I necessarily have to use an AVL or red-black tree to obtain a balanced tree? How do these work?
In general, trees are either unbalanced or balanced. AVL, red-black, 2-3, etc. are just trees with certain extra properties, and to maintain those properties they use some extra variables and functions. Those extra variables and functions could also be used in a "normal" binary tree; they are not bound to their respective type of tree. The nodes of a "normal" binary tree always had a balance factor; you just didn't use it, because you didn't care whether the tree was balanced or not. They also always had a height, a depth, etc. You just didn't care. At some point you will realize that all of these are trade-offs between speed and memory: if you know what you are doing, more memory usage will make your program faster, while less memory usage means more computation and so a slower program.
If I don't know the probabilities of accessing each element, but I'm sure that some elements will be accessed far more often than others, I will use a splay tree. What should I use if I already know all the probabilities? I assume there should be some data structure that is better than a splay tree for this case.
I'm trying to imagine all the cases where and when I should use each type of search tree. Maybe someone can post some links to articles comparing all the search trees and similar structures?
EDIT: I'd still like O(log n) as the worst case, but on average it should be faster. Splay trees are a good example, but I'd like to predefine the configuration of the tree.
For example, I have an array of elements to store [a1, a2, .. an], and the probabilities for each element [p1, p2, .. pn], which define how often I will access each element. I can create a splay tree, add each element to it (O(n log n)), and then access them with the given probabilities to shape the desired tree. So if I have probabilities [1/2, 1/4, 1/4], I need to splay the first element so that it ends up near the root. That means I need to order the elements by probability and splay them from the lowest to the highest access probability, which also takes O(n log n). So the overall time to build such a tree is O(n log n) with a big constant. My goal is to lower this number.
I don't mind using something other than a search tree, but I'd like the build time to be lower than in the splay-tree case, and I want search, insert and delete to stay in the range of O(log n) amortized.
Edit: I didn't see that you wanted to update the tree dynamically - the below algorithm requires all elements and probabilities to be known in advance. I'll leave the post up in case someone in such a situation comes along.
If you happen to be in possession of the third edition of Introduction to Algorithms by Cormen et al., it describes a dynamic programming algorithm for creating optimal binary search trees when you know all of the probabilities.
Here is a rough outline of the algorithm: First, sort the elements (on element value, not probability). We don't yet know which element should be the root of the tree, but we know that all elements that will be to the left of the root in the tree will be to the left of that element in the list, and vice versa for the elements to the right of the root. If we choose the element at index k to be the root, we get two subproblems: how to construct an optimal tree for the elements 0 through k-1, and for the elements k+1 through n-1. Solve these problems recursively, so that you know the expected cost for a search in a tree where the root is element k. Do this for all possible choices of k, and you will find which tree is the best one. Use dynamic programming or memoization in order to save computation time.
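Here is a compact sketch of that dynamic program, simplified to ignore the "dummy key" (unsuccessful search) probabilities that the full CLRS version also handles. It computes only the optimal expected cost; recording the chosen roots so you can rebuild the actual tree is a small extension.

```java
class OptimalBst {
    // DP over half-open key ranges [i, j): cost[i][j] is the minimal expected
    // number of comparisons for keys i..j-1, given hit probabilities p[i].
    // Recurrence: try each r in [i, j) as the root; every key in the range
    // then sits one level deeper, adding the range's total probability mass.
    static double optimalCost(double[] p) {
        int n = p.length;
        double[] prefix = new double[n + 1]; // prefix[j] = p[0] + ... + p[j-1]
        for (int i = 0; i < n; i++) prefix[i + 1] = prefix[i] + p[i];

        double[][] cost = new double[n + 1][n + 1];
        for (int len = 1; len <= n; len++) {
            for (int i = 0; i + len <= n; i++) {
                int j = i + len;
                double best = Double.POSITIVE_INFINITY;
                for (int r = i; r < j; r++) {  // try key r as the root
                    best = Math.min(best, cost[i][r] + cost[r + 1][j]);
                }
                cost[i][j] = best + (prefix[j] - prefix[i]);
            }
        }
        return cost[0][n];
    }

    public static void main(String[] args) {
        double[] p = { 0.5, 0.25, 0.25 };   // the probabilities from the question
        System.out.println(optimalCost(p)); // prints 1.75 (expected comparisons)
    }
}
```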
Use a hash table.
You never mentioned needing ordered iteration, and by sacrificing this you can achieve amortized O(1) insert/access complexity, better than O(log n).
Specifically, use a hash table with linked list buckets, and use the move-to-front optimization. What this means is each time you search a bucket (linked list) with more than one item, you move the item found to the front of that bucket. The next time you access this element, it will already be at the front.
If you know the access probabilities, you can further refine the technique. When inserting a new element into a bucket, don't insert it onto the front, but rather insert such that you maintain most-probable-first order. Note the move-to-front technique will tend to perform this sort implicitly already, but you can help it bootstrap more quickly.
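A minimal sketch of the chained hash table with move-to-front buckets described above (fixed bucket count, int keys; all names are made up):

```java
class MtfHashTable {
    static class Entry {
        int key;
        Entry next;
        Entry(int key, Entry next) { this.key = key; this.next = next; }
    }

    final Entry[] buckets;

    MtfHashTable(int size) { buckets = new Entry[size]; }

    int index(int key) { return Math.floorMod(key, buckets.length); }

    void add(int key) {
        int b = index(key);
        buckets[b] = new Entry(key, buckets[b]); // insert at the front
    }

    // Search the bucket; on a hit, splice the entry out and re-link it at
    // the front so the next lookup finds it immediately.
    boolean contains(int key) {
        int b = index(key);
        Entry prev = null;
        for (Entry e = buckets[b]; e != null; prev = e, e = e.next) {
            if (e.key == key) {
                if (prev != null) {              // not already at the front
                    prev.next = e.next;
                    e.next = buckets[b];
                    buckets[b] = e;
                }
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        MtfHashTable t = new MtfHashTable(1); // one bucket: worst-case chain
        t.add(1); t.add(2); t.add(3);         // bucket order: 3 -> 2 -> 1
        System.out.println(t.contains(1));    // prints true; 1 moves to front
        System.out.println(t.buckets[0].key); // prints 1
    }
}
```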
If your tree is not going to change once created, you probably should use a hash table or tango tree:
http://en.wikipedia.org/wiki/Tango_tree
Hash tables, when not overloaded, are O(1) lookup, degrading to O(n) when overloaded.
Tango trees, once constructed, are O(loglogn) lookup. They do not support deletion or insertion.
There's also something known as a "perfect hash" that might be good for your use.
I've tried to understand what sorted trees are, and binary trees, and AVL, and so on...
I'm still not sure: what makes a sorted tree sorted? And what is the difference in complexity (big-O) between searching a sorted tree and searching an unsorted tree? Hope you can help me.
Binary Trees
There are two main kinds of binary trees, balanced and unbalanced. A balanced tree aims to keep the height of the tree (height = the number of nodes on the path from the root to the furthest leaf) as small as possible. There are several types of algorithms for balanced trees, the two most famous being AVL and red-black trees. The complexity of insert/delete/search operations on both AVL and red-black trees is O(log n), which is the important part. Other self-balancing algorithms are the AA, splay and scapegoat trees.
Balanced trees gain their property (and name) of being balanced from the fact that after every delete or insert operation on the tree the algorithm introspects the tree to make sure it's still balanced, if it's not it will try to fix this (which is done differently with each algorithm) by rotating nodes around in the tree.
Normal (or unbalanced) binary trees do not modify their structure to keep themselves balanced, and so risk becoming very inefficient over time (especially if the values are inserted in order). However, if performance is not an issue and you mainly want a sorted data structure, they might do. The complexity of insert/delete/search operations on an unbalanced tree ranges from O(1) (best case, if you want the root) to O(n) (worst case, if you inserted all nodes in order and want the largest node).
There is another variation called a randomized binary tree, which uses randomization to make sure the tree doesn't become fully unbalanced (i.e. degenerate into a linked list).
A binary search tree is a tree structure where every node has at most two child nodes.
The nodes in the left subtree are all less than their parent, and the nodes in the right subtree are all greater.
The interesting thing about a binary search tree is that we can search for a value in O(log n) when the tree is properly balanced. Doing the same search in a LinkedList, for example, would give us a search speed of O(n).
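That O(log n) search is just a walk from the root, sketched below with a hypothetical minimal `Node` class (left children smaller, right children larger):

```java
class BstSearch {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // At each node we discard one whole subtree, so on a balanced tree
    // the loop runs O(log n) times.
    static boolean contains(Node root, int key) {
        Node cur = root;
        while (cur != null) {
            if (key == cur.key) return true;
            cur = key < cur.key ? cur.left : cur.right;
        }
        return false;
    }

    public static void main(String[] args) {
        //        4
        //       / \
        //      2   6
        //     / \ / \
        //    1  3 5  7
        Node root = new Node(4);
        root.left = new Node(2);       root.right = new Node(6);
        root.left.left = new Node(1);  root.left.right = new Node(3);
        root.right.left = new Node(5); root.right.right = new Node(7);
        System.out.println(contains(root, 5)); // prints true
        System.out.println(contains(root, 8)); // prints false
    }
}
```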
The best way to go about learning datastructures would be to do a day of googling and reading wikipedia articles.
This might get you started
http://en.wikipedia.org/wiki/Binary_search_tree
Do a google search for the following:
site:stackoverflow.com binary trees
to get a list of SO questions which will answer your several questions.
There isn't really a lot of point in using a tree structure if it isn't sorted in some fashion - if you are planning on searching for a node in the tree and it is unsorted, you will have to traverse the entire tree (O(n)). If you have a tree which is sorted in some fashion, then it is only necessary to traverse down a single branch of the tree (typically O(log n)).
In a binary search tree the left child is always smaller than its parent, and the right child is always bigger, so you can search a sorted (balanced) tree in O(log n): just go left if the key is smaller than the current node and right if it's bigger.