I was thinking of implementing a binary search tree. I have implemented some very basic operations such as search, insert, and delete.
Please share your experience: what other operations could I perform on binary search trees, and which basic operations are needed again and again in practice? I hope my question is clear.
Thanks.
Try a traversal operation (e.g., return the elements in the tree as a List, in order) and make sure that the tree remains balanced when elements are inserted/deleted.
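If you want a quick way to verify that, here is a minimal sketch in Python of a balance check (the Node class and names are just illustrative, not something from your code):

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def check_balanced(node):
        # A tree is balanced here if, at every node, the heights of the two
        # subtrees differ by at most one. Returns (is_balanced, height).
        if node is None:
            return True, 0
        left_ok, left_h = check_balanced(node.left)
        right_ok, right_h = check_balanced(node.right)
        balanced = left_ok and right_ok and abs(left_h - right_h) <= 1
        return balanced, 1 + max(left_h, right_h)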
You may want to look at the different ways of returning the tree (a couple of these are sketched in code after this list):
Depth-first (going all the way down a branch and back up, repeat)
In-order (left subtree, then node, then right subtree; for a BST this gives sorted order)
Level-order (each level as drawn in a diagram)
Returning as a flat array.
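For instance, two of these could be sketched in Python roughly like this (the Node class is illustrative):

    from collections import deque

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def inorder_list(node, out=None):
        # In-order: left subtree, node, right subtree -> sorted order for a BST.
        if out is None:
            out = []
        if node is not None:
            inorder_list(node.left, out)
            out.append(node.key)
            inorder_list(node.right, out)
        return out

    def level_order_list(root):
        # Level-order: visit nodes level by level, as drawn in a diagram.
        out = []
        queue = deque([root]) if root is not None else deque()
        while queue:
            node = queue.popleft()
            out.append(node.key)
            if node.left is not None:
                queue.append(node.left)
            if node.right is not None:
                queue.append(node.right)
        return out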
And if you're feeling particularly adventurous, take an array and import it as a tree. There is a specific format for this that goes something like (1(2(3)),(5)) - that example isn't balanced but you get the idea, and it's on Wikipedia.
You might also want to implement a rotation operation. A rotation changes the structure without changing the order of the elements. This is usually used to balance the tree (to make sure the leaves are all close to the same depth) and can also be used to move a given element to the root if you know it will be searched for more often.
My ASCII art is not great, but a rotation can turn this tree:
       f
      / \
     d   g
    / \
   b   e
  / \
 a   c
into this tree:
     d
   /   \
  b     f
 / \   / \
a   c e   g
Since the second tree is balanced, searches for f and g become slower, and searches for d, a, b, c become faster, with e staying the same.
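A minimal sketch of a right rotation in Python, applied to the example above (the Node class and the way the tree is built are only for illustration):

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def rotate_right(root):
        # The left child becomes the new root of this subtree; the old root
        # adopts the pivot's right subtree as its new left subtree.
        pivot = root.left
        root.left = pivot.right
        pivot.right = root
        return pivot   # the caller must re-attach the returned subtree root

    # Building the first tree and rotating around f:
    # f(d(b(a, c), e), g)  ->  d(b(a, c), f(e, g))
    f = Node('f', Node('d', Node('b', Node('a'), Node('c')), Node('e')), Node('g'))
    new_root = rotate_right(f)   # new_root.key == 'd', as in the second tree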
If this is homework, Good luck!
If this is curiosity, have fun!
If you want to implement this in production code without even knowing the basic operations, Don't do it!
http://www.boost.org/doc/libs/1_38_0/boost/graph/detail/array_binary_tree.hpp
At the very least, a binary search tree should have an insert, delete, and search operation. Any other operations will depend on what you intend to do with your tree, although some generic suggestions are: return parent of a given node, find left and right child of a given node, return the root node, preorder, inorder, and postorder traversals, as well as a breadth-first traversal.
If you really just want a list of stuff that might be useful or fun to implement...
Reverse the order of everything in the tree. This is O(N) I think?
A subtree: the elements between x and y, as a binary search tree themselves -- should be O(log N) I think?
Minimum, maximum? Yeah, trivial but I'm out of ideas!
I think I've seen a "map" operation somewhere: you change all elements of the tree with a monotonic function, i.e., a function with the property of always ascending ( f(x+dx) >= f(x) ) or always descending ( f(x+dx) <= f(x) ). In the first case you just apply that function to each node; in the other you also need to mirror the tree (swap the "left" and "right" children of every node), because the order of the resulting values is reversed.
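A small sketch of that idea in Python (the Node class is illustrative): apply the function to every key and, if the function is descending, also mirror the tree so the BST ordering is restored:

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def map_tree(node, f, descending=False):
        # Apply the monotonic function f to every key. If f is descending,
        # swap the children at every node so the tree stays a valid BST.
        if node is None:
            return
        node.key = f(node.key)
        map_tree(node.left, f, descending)
        map_tree(node.right, f, descending)
        if descending:
            node.left, node.right = node.right, node.left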
This is not a homework question. I heard that it is possible to mirror a binary tree i.e. flip it, in constant time. Is this really the case?
Sure, depending on your data structure, you would just do the equivalent of: instead of traversing down the left node and then the right node, you would traverse down the right node, and then the left node. This could be a parameter passed into the recursive function that traverses the tree (i.e. in C/C++, a bool bDoLeftFirst, and an if-statement that uses that parameter to decide which order to traverse the child nodes in).
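Here is a rough sketch of that idea in Python rather than C/C++ (names are illustrative): the traversal takes a flag deciding which child to visit first, so "mirroring" can be as cheap as flipping that one flag instead of swapping children at every node:

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def traverse(node, visit, left_first=True):
        # left_first=False walks the mirror image of the tree.
        if node is None:
            return
        first, second = (node.left, node.right) if left_first else (node.right, node.left)
        traverse(first, visit, left_first)
        visit(node.key)
        traverse(second, visit, left_first)

    root = Node(2, Node(1), Node(3))
    traverse(root, print)                    # prints 1 2 3
    traverse(root, print, left_first=False)  # prints 3 2 1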
Did you mean "invert binary tree", the problem Max Howell could not solve, which got him rejected by Google?
https://leetcode.com/problems/invert-binary-tree/
You can find solutions in the "discuss" section.
I'm trying to understand intuitively how to create recursive functions (for anything) but mostly for traversing a tree to check if that tree meets certain criteria.
For a random example, counting the number of nodes in a Binary Tree that are not the root or the leaves?
How do I go about thinking recursively about this problem and eventually coming up with a pseudocode solution?
Previously, I would start a problem like this by drawing out different scenarios and writing pseudocode to account for the cases I came up with, but I felt like I would (and did) miss some logic here and there.
Any suggestions?
In general, recursion is about finding a repetitive pattern and extrapolating it to a more general solution. According to Wikipedia:
"Recursion is the process of repeating items in a self-similar way.”
But it's a quite vague and unspecific definition. Let's go to the examples.
A binary tree is a highly repetitive structure. Consider this (almost) minimal example:
Now, imagine that you want to visit each node in that tree, assuming that you are starting in the root. It seems pretty straightforward:
         already in root
           /         \
          /           \
  visit(l_child)   visit(r_child)
So basically you are:
starting in the root →
visiting the left child →
getting back to the root →
visiting the right child →
getting back to the root.
Take a look at another example of a binary tree:
As you can see there's a huge resemblance to the previous structure. Now, let's visit each coloured node:
visit(l_subtree)
already in root
visit(r_subtree)
It's exactly the same pattern. Since we know how to traverse the subtrees, we can think of a more general algorithm:
inorder(node):
    if node == null:   // we've reached the bottom of a binary tree
        return
    inorder(node.l_child)
    do_something(node)
    inorder(node.r_child)
And that's the whole inorder traversal algorithm. I'm certain that you can figure out on your own how to write a pseudocode for preorder and postorder.
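Applying the same pattern to the random example from the question (counting the nodes that are neither the root nor a leaf), a rough Python sketch could look like this (the Node class is illustrative):

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def count_internal(node, is_root=True):
        # Count nodes that are neither the root nor a leaf.
        if node is None:
            return 0
        is_leaf = node.left is None and node.right is None
        me = 0 if (is_root or is_leaf) else 1
        return (me
                + count_internal(node.left, False)
                + count_internal(node.right, False))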
If recursion is still not intuitive to you, you can check this fractals example.
I learnt that, to retain the structure of a BST while serializing it, one needs to store the in-order traversal and one of either the pre-order or post-order traversals of the tree.
What makes in-order notation essential?
For a general binary tree (with unique elements) your statement would be correct. Consider, for example, these two inputs (not very prettily drawn ;-) ): a tree with root B and children A and C, versus a chain with root A, right child B, and right grandchild C.
If you serialize these using in-order traversal, both yield ABC. Similar cases exist for the other traversal types.
So why is a combination of in-order and pre-order enough?
The serialized shape of pre-order is [root][left subtree][right subtree]. The root is easy to identify, but you don't know where the left subtree ends and the right subtree begins.
Now consider in-order serialized: [left subtree][root][right subtree]. You know what the root is (thanks to pre-order), so it is really easy to identify the left and right subtrees.
Note that this is still not enough if the weights are not unique. If in the above example we change B into A, both trees would yield [AAC] for both traversal types.
For binary search trees deserialization is much easier. Why? Well, every subtree has the property that the nodes in the left subtree are smaller than the root, whereas the nodes in the right subtree are bigger. Therefore, the pre-order serialization [root][left subtree][right subtree] can easily and unambiguously be parsed again. So, in conclusion, the person who told you that at least two serialization approaches are needed for a BST was mistaken (maybe he also forgot about the properties of a BST).
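To illustrate that last point, here is a rough Python sketch that rebuilds a BST from its pre-order serialization alone, using an upper bound to decide where each left subtree ends (it assumes unique keys; the names are illustrative):

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def bst_from_preorder(keys):
        pos = 0

        def build(upper):
            nonlocal pos
            # Stop when we run out of keys or the next key belongs to an
            # ancestor's right subtree.
            if pos == len(keys) or keys[pos] > upper:
                return None
            node = Node(keys[pos])
            pos += 1
            node.left = build(node.key)   # keys smaller than node.key
            node.right = build(upper)     # keys up to the inherited bound
            return node

        return build(float('inf'))

    # Example: the pre-order list [5, 3, 2, 4, 8, 7, 9] rebuilds the original BST.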
Storing BSTs in some sort of order while serializing likely makes it simpler to build upon retrieval. Imagine that you have your BST and just pick nodes at random to serialize and store. When retrieving, it will retrieve in the order stored and then after the fact, something would have to go through and connect all of the nodes. While that should be possible - all the information is there - it seems unnecessary. Each node is just sort of floating; the deserialization process/program has to maintain a list of all the nodes (or similar) while it walks through the list connecting piece by piece.
On the other hand, if you store them in some sort of prescribed order, it can build the tree while reading in each node - it knows where to connect the nodes since they are in order (for clarity: this doesn't imply the next node must be connected to the previously-read node, in the case of adjacent leaves; it's just much simpler to hop up enough levels to the appropriate branch). This should be faster, and potentially use less memory (no list/container while building).
An explanation about Threaded Binary Search Trees (skip it if you know them):
We know that in a binary search tree with n nodes, there are n+1 left and right pointers that contain null. In order to make use of the memory occupied by those null pointers, we change the binary tree as follows -
for every node z in the tree:
if left[z] = NULL, we put in left[z] the value of tree-predecessor(z) (i.e, a pointer to the node which contains the predecessor key),
if right[z] = NULL, we put in right[z] the value of tree-successor(z) (again, this is a pointer to the node which contains the successor key).
A tree like that is called a threaded binary search tree, and the new links are called threads.
And my question is:
What is the main advantage of threaded binary search trees (in comparison to "regular" binary search trees)?
A quick search on the web has told me that it helps to implement in-order traversal iteratively rather than recursively.
Is that the only difference? Is there another way we can use the threads?
Is that such a meaningful advantage? And if so, why?
Recursive traversal costs O(n) time too, so..
Thank you very much.
Non-recursive in-order scan is a huge advantage. Imagine that somebody asks you to find the value "5" and the four values that follow it. That's difficult using recursion. But if you have a threaded tree then it's easy: do the usual search to find the value "5", and then follow the threads to get the next four values.
Similarly, what if you want the four values that precede a particular value? That's difficult with a recursive traversal, but trivial if you find the item and then walk the threaded links backwards.
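A rough Python sketch of that, using only successor ("right") threads for brevity (the node layout and names are illustrative, not a standard API):

    class TNode:
        # When right_is_thread is True, 'right' points to the in-order
        # successor instead of a right child.
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.right_is_thread = False

    def successor(node):
        # Follow the thread if there is one, otherwise go right and then
        # all the way down to the left.
        if node.right_is_thread:
            return node.right
        if node.right is None:
            return None
        node = node.right
        while node.left is not None:
            node = node.left
        return node

    def find_and_next_k(root, key, k):
        # Usual BST search for 'key', then walk the threads/successors
        # to collect the next k keys.
        node = root
        while node is not None and node.key != key:
            if key < node.key:
                node = node.left
            else:
                node = None if node.right_is_thread else node.right
        out = []
        while node is not None and len(out) < k:
            node = successor(node)
            if node is not None:
                out.append(node.key)
        return out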
The main advantage of threaded binary search trees over regular ones is that traversal is more efficient.
Traversing a threaded tree doesn't need recursion or an explicit stack or queue: each node has a pointer that gives its in-order successor or predecessor directly, whereas traversing a normal BST iteratively needs a stack, which costs extra memory (or relies on the language's call stack when done recursively).
The idea of deleting a node in a BST is:
If the node has no child, delete it and set the parent's pointer to it to null.
If the node has one child, replace the node with its child by updating the parent's pointer to point to that child.
If the node has two children, find the predecessor of the node and replace the node with it, then update the predecessor's parent's pointer so it points to the predecessor's only possible child (which can only be a left child).
The last case can also be done using a successor instead of a predecessor! (A rough sketch of both variants follows below.)
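A rough Python sketch of these cases, with a flag for choosing between predecessor and successor in the two-children case; this version copies the chosen key into the node and then deletes it from the corresponding subtree, which has the same effect as re-linking (names are illustrative):

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def delete(node, key, use_predecessor=True):
        # Returns the new root of this subtree after deleting 'key'.
        if node is None:
            return None
        if key < node.key:
            node.left = delete(node.left, key, use_predecessor)
        elif key > node.key:
            node.right = delete(node.right, key, use_predecessor)
        else:
            if node.left is None:    # no child or only a right child
                return node.right
            if node.right is None:   # only a left child
                return node.left
            if use_predecessor:      # predecessor: max of the left subtree
                repl = node.left
                while repl.right is not None:
                    repl = repl.right
                node.key = repl.key
                node.left = delete(node.left, repl.key, use_predecessor)
            else:                    # successor: min of the right subtree
                repl = node.right
                while repl.left is not None:
                    repl = repl.left
                node.key = repl.key
                node.right = delete(node.right, repl.key, use_predecessor)
        return node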
It's said that if we use the predecessor in some cases and the successor in others (giving them equal priority) we can get better empirical performance.
Now the question is: how is that done? Based on what strategy? And how does it affect performance? (I guess by performance they mean time complexity.)
What I think is that we have to choose the predecessor or the successor so as to keep the tree more balanced, but I don't know how to decide which one to use!
One solution is to choose one of them at random (fair randomness), but isn't it better to have a strategy based on the tree's structure? The question is WHEN to choose WHICH.
The thing is that this is a fundamental problem: finding a good removal algorithm for BSTs. People have tried to solve it for some 50 years (just like in-place merge) and haven't found anything better than the usual algorithm (removal via predecessor/successor). So what is wrong with the classic algorithm? It unbalances the tree: after many random insert/remove operations you end up with an unbalanced tree of height around sqrt(n), and it doesn't matter what you chose - removing the successor or the predecessor (or randomly choosing between the two) - the result is the same.
So, what to choose? I'm guessing that randomly choosing between successor and predecessor will postpone the unbalancing of your tree. But if you want a perfectly balanced tree, you have to use red-black trees or something like that.
As you said, it's a question of balance, so in general the method that disturbs the balance the least is preferable. You could keep some metrics to measure the level of balance (e.g., difference between maximal and minimal leaf height, average height, etc.), but I'm not sure the overhead is worth it. Also, there are self-balancing data structures (red-black trees, AVL trees, etc.) that mitigate this problem by rebalancing after each deletion. If you want to use a basic BST, I suppose the best strategy without a priori knowledge of the tree structure and the deletion sequence would be to toggle between the two methods on each deletion.