So I was looking at tree traversal algorithms. For example, in a K-d tree traversal, our goal is to traverse the nodes down to a leaf. This isn't so much a tree search as a root-to-leaf traversal.
In such a case, a recursive solution would suffice. However, in languages like C, calling a function recursively requires pushing values onto the stack, jumping between stack frames, and so on. The standard recursive method would be something like:
void traverse(Node* ptr) {
    if (ptr->left == NULL && ptr->right == NULL) return; /* reached a leaf */
    if (ptr->val >= threshold) traverse(ptr->right);
    else traverse(ptr->left);
}
traverse(root);
Hence, considering that there's a definite upper bound on the height of a binary tree (I'm sure this could be extended to other tree types too), would it be more efficient to perform this traversal iteratively instead:
Node* ptr = root;
for (int i = 0; i < tree.maxHeight; i++) {
    if (ptr->left == NULL && ptr->right == NULL) break; /* reached a leaf */
    if (ptr->val >= threshold) ptr = ptr->right;
    else ptr = ptr->left;
}
The maximum height of a binary tree is its number of nodes n (in the degenerate case), while a balanced one has height log(n). Hence I was wondering if there are any downsides to the iterative solution, or if it would indeed be faster than plain recursion. Is there any concept I'm missing here?
Your code isn't so much a tree traversal as it is a tree search. If all you want to do is go from the root to a leaf, then the iterative solution is simpler and faster, and will use less memory because you don't have to deal with stack frames.
If you want a full traversal of the tree: that is, an in-order traversal where you visit every node, then you either write a recursive algorithm, or you implement your own stack where you explicitly push and pop nodes. The iterative method where you implement your own stack will potentially be faster, but you can't avoid the O(log n) (in a balanced binary tree) or the possible O(n) (in a degenerate tree) memory usage. Implementing an explicit stack will use somewhat less memory, simply because it only has to contain tree node pointers, whereas a full stack frame contains considerably more.
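For concreteness, here's a minimal sketch of that explicit-stack approach in Java (an illustration only, assuming a LeetCode-style TreeNode with val, left and right fields, as in the solution further down):

import java.util.ArrayDeque;
import java.util.Deque;

class IterativeInorder {
    // Iterative in-order traversal with an explicit stack.
    // The stack holds only node references, and never more than O(h) of them,
    // where h is the height of the tree.
    static void inorder(TreeNode root) {
        Deque<TreeNode> stack = new ArrayDeque<>();
        TreeNode cur = root;
        while (cur != null || !stack.isEmpty()) {
            while (cur != null) {   // push the entire left spine
                stack.push(cur);
                cur = cur.left;
            }
            cur = stack.pop();      // visit the node
            System.out.println(cur.val);
            cur = cur.right;        // then walk its right subtree
        }
    }
}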
I am currently on 94. Binary Tree Inorder Traversal on LeetCode, and I am not sure how to analyze the run time and space complexity of the question.

In my opinion, the time complexity is O(n), where n is the number of nodes in the tree, since we need to visit every single node. Space is more contentious, though. I think it is O(h), where h is the maximum height of the tree, because the call stack incurred by the recursion can grow as deep as the height of the tree, and the stack pops as we backtrack. Some people suggest it is actually O(n), because in the worst case, where the tree is completely left- or right-skewed, the call stack is as deep as the number of nodes. But doesn't O(h) also cover that case, since the maximum height is then equal to the number of nodes? O(n) describes the worst case, but O(h) seems more accurate and fits more scenarios, including the one above. Which one should be the answer? Or, more specifically, which one would be accepted by an interviewer during a coding interview?
I will also paste my solution here:
class Solution {
    public List<Integer> inorderTraversal(TreeNode root) {
        List<Integer> res = new ArrayList<>();
        helper(root, res);
        return res;
    }

    public void helper(TreeNode root, List<Integer> res) {
        if (root != null) {
            if (root.left != null) {
                helper(root.left, res);
            }
            res.add(root.val);
            if (root.right != null) {
                helper(root.right, res);
            }
        }
    }
}
The space complexity is always O(n), even when the tree is balanced. This is because both the input and the output have a size of O(n). The output is newly allocated memory, so even if we ignored the memory already taken up by the input, the algorithm would still be using O(n) additional memory.
If we don't count the memory needed for the output either, then indeed the space complexity is O(h).
Now, it is less common to use the height of the input tree as a parameter for asymptotic complexity. It is more common to use the number of nodes for that purpose.
But either would be OK to mention during an interview, as long as you are clear about which space is intended: is it total space, or only auxiliary space? And does it include or exclude the space that the output occupies?
I have a question about the runtime of recursive patterns.
Example 1
int f(int n) {
    if (n <= 1) {
        return 1;
    }
    return f(n - 1) + f(n - 1);
}
I can understand that the runtime of the above code is O(2^N), because if I pass 5, it calls f(4) twice, then each f(4) calls f(3) twice, and so on until n reaches 1, i.e., something like O(branches^depth).
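As a sanity check of that reasoning: the number of calls C(n) satisfies C(n) = 2*C(n - 1) + 1 with C(1) = 1, which solves to C(n) = 2^n - 1. For example, f(5) triggers 31 calls.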
Example 2
Balanced Binary Tree
int sum(Node node) {
    if (node == null) {
        return 0;
    }
    return sum(node.left) + node.value + sum(node.right);
}
I read that the runtime of the above code is O(2^log N) since the tree is balanced, but I still see it as O(2^N). Can anyone explain?
When the number of elements gets halved each time, the runtime is O(log N). But how does that apply to a binary tree here?
Is it 2^log N just because the tree is balanced?
What if it is not balanced?
Edit:
We can simplify O(2^log N) = O(N), but I am still seeing it as O(2^N).
Thanks!
A binary tree will have complexity O(n) here, like any other tree, because you are ultimately traversing all of the elements of the tree. The halving does nothing special; we are simply calculating the sums of the two subtrees separately.
The term arises because, if the tree is balanced, then 2^(log_2(n)) is the number of elements in the tree (leaf + non-leaf), since there are log_2(n) levels.
Again, if it is not balanced it doesn't matter: we are performing an operation for which every element needs to be considered, making the runtime O(n).
Where would it have mattered? If we were searching for an element, then it would have mattered whether the tree is balanced or not.
I'll take a stab at this.
In a balanced binary tree, roughly half of the nodes lie to the left and half to the right of each parent node. The first layer of the tree is the root, with 1 element; the next layer has 2 elements, the next 4, then 8, and so on. So a full tree with L layers has 2^L - 1 nodes.
Reversing this, if you have N elements to insert into a tree, you end up with a balanced binary tree of depth L = log_2(N), so you only ever need to recurse through log_2(N) layers. At each layer the number of calls doubles, so in your case you end up with 2^log_2(N) calls and an O(2^log_2(N)) run time. Note that 2^log_2(N) = N, so it's the same either way, but we'll get to the advantage of a binary tree in a second.
If the tree is not balanced, you end up with depth greater than log_2(N), so you have more recursive calls. In the extreme case, when all of your children are to the left (or right) of their parent, you have N recursive calls, but each call returns immediately from one of its branches (no child on one side). Thus you would have O(N) run time, which is the same as before. Every node is visited once.
An advantage of a balanced tree is in cases like search. If the left-hand child is always less than the parent, and the right-hand child is always greater than, then you can search for an element n among N nodes in O(log_2(N)) time (not 2^log_2(N)!). If, however, your tree is severely imbalanced, this search becomes a linear traversal of all of the values and your search is O(N). If N is extremely large, or you perform this search a ton, this can be the difference between a tractable and an intractable algorithm.
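To make the counting argument concrete, here's a small self-contained sketch (hypothetical Node and SumCounter classes, purely for illustration) that counts the recursive calls. A tree with N nodes always produces exactly 2N + 1 calls, one per node plus one per null child, i.e. O(N) whether or not the tree is balanced:

class Node {
    int value;
    Node left, right;
    Node(int value, Node left, Node right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

class SumCounter {
    static int calls = 0;  // counts every invocation, including those on null children

    static int sum(Node node) {
        calls++;
        if (node == null) return 0;
        return sum(node.left) + node.value + sum(node.right);
    }

    public static void main(String[] args) {
        // A balanced 3-node tree: 2 at the root, leaves 1 and 3.
        Node root = new Node(2, new Node(1, null, null), new Node(3, null, null));
        System.out.println(sum(root));  // prints 6
        System.out.println(calls);      // prints 7 = 2*3 + 1
    }
}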
I am working through the book "Cracking the coding interview" by Gayle McDowell and came across an interesting recursive algorithm that sums the values of all the nodes in a balanced binary search tree.
int sum(Node node) {
    if (node == null) {
        return 0;
    }
    return sum(node.left) + node.value + sum(node.right);
}
Now Gayle says the runtime is O(N), which I find confusing, as I don't see how this algorithm will ever terminate. For a given node, when node.left is passed to sum in the first call, and then node.right is subsequently passed to sum in the second call, isn't the algorithm computing sum(node) a second time? Wouldn't this process go on forever? I'm still new to recursive algorithms, so it might just not be very intuitive yet.
Cheers!
The process won't go on forever. The data structure in question is a balanced binary search tree, not a graph, which could contain cycles.
Starting from the root, all the nodes are explored in the order left -> itself -> right, like a depth-first search.
node.left explores the left subtree of a node and node.right explores the right subtree of the same node; the two subtrees do not intersect. Draw the trail of program control to see the order in which the nodes are explored, and to confirm that there is no overlap in the traversal.
Since each node is visited only once, and the recursion starts unwinding when a leaf node is hit, the running time is O(N), N being the number of nodes.
The key to understanding a recursive algorithm is to trust that it does what it claims to do. Let me explain.
First admit that the function sum(node) returns the sum of the values of all nodes of the subtree rooted at node.
Then the code
if (node == null) {
    return 0;
}
return sum(node.left) + node.value + sum(node.right);
does one of two things:
if node is null, it returns 0; this is the non-recursive case, and the returned value is trivially correct;
otherwise, the function computes the sum for the left subtree, plus the value at node, plus the sum for the right subtree, i.e. the sum for the subtree rooted at node.
So in a way, if the function is correct, then it is correct :) The argument isn't actually circular, thanks to the non-recursive case, which is also correct; formally, this is an induction on the size of the subtree.
We can use the same way of reasoning to prove the running time of the algorithm.
Assume that the time required to process the tree rooted at node is proportional to the size of this subtree, call it |T|. This is another act of faith.
Then if node is null, the time is constant, say 1 unit. And if node isn't null, the time is |L| + 1 + |R| units, which is precisely |T|. So if the time to sum a subtree is proportional to the size of the subtree, the time to sum a tree is proportional to the size of the tree!
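Spelled out as a recurrence (just restating the argument above; |T| is the number of nodes in the subtree T, with L and R its left and right subtrees):

time(empty) = 1
time(T) = time(L) + time(R) + 1

Every node contributes one unit and every null link contributes one unit; a tree with |T| nodes has exactly |T| + 1 null links, so the total is 2|T| + 1, which is proportional to |T|.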
I was thinking about the different techniques to check the validity of a binary search tree. Naturally, the invariant that needs to be maintained is that the left subtree must be less than or equal to the current node, which in turn should be less than or equal to the right subtree. There are a couple of different ways to tackle this problem. The first is to propagate value constraints down each subtree; it can be outlined like this (in Java, for integer nodes):
public static boolean isBST(TreeNode node, int lower, int higher) {
    if (node == null) return true;
    else if (node.data < lower || node.data > higher) return false;
    return isBST(node.left, lower, node.data) && isBST(node.right, node.data, higher);
}
There is also another way to accomplish this using an in-order traversal, where you keep track of the previous element and make sure the progression is non-decreasing (a sketch follows below). Both these methods explore the left subtree first, though; in the event we have an inconsistency in the middle of the root's right subtree, what is the recommended path? I know that a BFS variant could be used, but would it be possible to use multiple techniques at the same time, and is that recommended? For example, we could do a BFS, an in-order and a reverse in-order traversal and return the moment a failure is detected. This might only be desirable for really large trees, in order to reduce the average runtime at the cost of a bit more space and multiple threads accessing the same data structure. Of course, if we're using a simple iterative solution for the in-order traversal (NOT a Morris traversal, which modifies the tree), we will be using up O(lg N) space.
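For reference, here is a minimal sketch of that in-order check (an illustration under assumptions: it reuses the TreeNode with an int data field from above, and it permits duplicates, matching the "less than or equal" invariant stated earlier):

public class InorderValidator {
    private Integer prev = null;  // data of the previously visited node; null before the first visit

    // In a valid BST an in-order walk yields a non-decreasing sequence,
    // so it suffices to compare each node with its in-order predecessor.
    public boolean isBST(TreeNode node) {
        if (node == null) return true;
        if (!isBST(node.left)) return false;                 // left subtree first
        if (prev != null && node.data < prev) return false;  // order violated
        prev = node.data;
        return isBST(node.right);                            // then the right subtree
    }
}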
I would expect this to depend on your precise situation: in particular, on the probability that your tree fails to be a valid BST, and the expected depth at which the failure occurs.
For example, if the tree is likely to be valid, then it would be wasteful to use three techniques at once, as the overall runtime for a valid tree would be roughly tripled.
What about iterative deepening depth-first search?
It is generally (asymptotically) as fast as breadth-first search (and also finds any early failure), but uses as little memory as depth-first search.
It would typically look something like this:
boolean isBST(TreeNode node, int lower, int higher, int depth)
{
    if (depth == 0)
        return true;  // depth limit reached; deeper nodes are checked in a later iteration
    // (elided body filled in here along the lines of the isBST above, plus the depth cutoff)
    if (node == null)
        return true;
    if (node.data < lower || node.data > higher)
        return false;
    return isBST(node.left, lower, node.data, depth - 1)
        && isBST(node.right, node.data, higher, depth - 1);
}
Caller:
boolean failed = false;
int treeHeight = height(root);  // assumes a height() helper is available
for (int depth = 2; depth <= treeHeight && !failed; depth++)
    failed = !isBST(root, Integer.MIN_VALUE, Integer.MAX_VALUE, depth);
I noticed on the AVL Tree Wikipedia page the following comment:
"If each node additionally records the size of its subtree (including itself and its descendants), then the nodes can be retrieved by index in O(log n) time as well."
I've googled and have found a few places mentioning accessing by index but can't seem to find an explanation of the algorithm one would write.
Many thanks
[UPDATE] Thanks, people. I found @templatetypedef's answer, combined with one of @user448810's links, particularly helpful. Especially this snippet:
"The key to both these functions is that the index of a node is the size of its left child. As long as we are descending a tree via its left child, we just take the index of the node. But when we have to move down the tree via its right child, we have to adjust the size to include the half of the tree that we have excluded."
Because my implementation is immutable, I didn't need to do any additional work when rebalancing, as each node calculates its size on construction (same as the Scheme impl linked).
My final implementation ended up being:
class Node<K,V> implements AVLTree<K,V> { ...
    public V index(int i) {
        if (left.size() == i) return value;
        if (i < left.size()) return left.index(i);
        return right.index(i - left.size() - 1);
    }
}

class Empty<K,V> implements AVLTree<K,V> { ...
    public V index(int i) { throw new IndexOutOfBoundsException(); }
}
Which is slightly different from the other implementations; let me know if you think I have a bug!
The general idea behind this construction is to take an existing BST and augment each node by storing the number of nodes in the left subtree. Once you have done this, you can look up the nth node in the tree by using the following recursive algorithm:
To look up the nth element in a BST whose root node has k elements in its left subtree:
If k = n, return the root node (with zero-based indexing, the root is the node at index k, since the k nodes of its left subtree come before it)
If n < k, recursively look up the nth element in the left subtree.
Otherwise, look up the (n - k - 1)st element in the right subtree.
This takes time O(h), where h is the height of the tree. In an AVL tree, this is O(log n). In CLRS, this construction is explored as applied to red/black trees, and they call such trees "order statistic trees."
You have to put in some extra logic during tree rotations to adjust the cached number of elements in the left subtree, but this is not particularly difficult.
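For example, here is a hypothetical sketch of how a left rotation would maintain the cached sizes, using the whole-subtree-size variant from the question. Only the two nodes whose subtrees actually change need recomputing:

class AvlNode {
    AvlNode left, right;
    int size = 1;  // cached size of the subtree rooted here (the node plus all descendants)

    static int size(AvlNode n) { return n == null ? 0 : n.size; }

    // Left rotation around x: y becomes the new subtree root.
    // Only x and y acquire new subtrees, so only their cached sizes
    // must be recomputed, bottom-up (x first, since it ends up below y).
    static AvlNode rotateLeft(AvlNode x) {
        AvlNode y = x.right;
        x.right = y.left;
        y.left = x;
        x.size = size(x.left) + size(x.right) + 1;
        y.size = size(y.left) + size(y.right) + 1;
        return y;
    }
}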
Hope this helps!