tree traversal runtime and space analysis general approach - algorithm

I am currently on 94. Binary Tree Traversal on LeetCode and I am not sure how to analyze the run time and space complexity of the question. In my opinion, the time complexity seems to be O(n), where n is the number of nodes in the tree, since we need to traverse every single node. For space, however, it is more controversial. I think it is O(h), where h is the max height of the tree, because the call stack incurred by recursion can go as deep as the max height of the tree, and the stack pops as we backtrack. Some people suggest it is actually O(n), because in the worst case, where the tree is completely left- or right-skewed, the call stack is as deep as the number of nodes; but doesn't O(h) also work here, since the max height is then equal to the number of nodes? O(n) covers the worst case, but O(h) seems more accurate and fits more scenarios, including the example above. Which one should be the answer? Or more specifically, which one would be accepted by an interviewer during a coding interview?
I will also paste my solution here:
class Solution {
    public List<Integer> inorderTraversal(TreeNode root) {
        List<Integer> res = new ArrayList<>();
        helper(root, res);
        return res;
    }

    public void helper(TreeNode root, List<Integer> res) {
        if (root != null) {
            if (root.left != null) {
                helper(root.left, res);
            }
            res.add(root.val);
            if (root.right != null) {
                helper(root.right, res);
            }
        }
    }
}

The space complexity is always O(n), even when the tree is balanced. This is because both the input and the output have a size of O(n). The output is newly allocated memory, so even if we ignore the memory already taken by the input, the algorithm still uses O(n) additional memory.
If we don't count the memory needed for the output either, then indeed the space complexity is O(h).
Now, it is less common to use the height of the input tree as a parameter for asymptotic complexity. It is more common to use the number of nodes for that purpose.
But either would be OK to mention during an interview, as long as you are clear about which space is intended: total space, only auxiliary space, or auxiliary space excluding the memory that the output occupies.

Recursive vs Iterative Tree Traversal

So I was looking at tree traversal algorithms. For example, in a k-d tree traversal, our goal is to traverse the nodes down to a leaf. This isn't so much a tree search as a simple root-to-leaf traversal.
In such a case, a recursive solution would suffice. However, in languages like C, calling a function recursively requires pushing values onto the stack, jumping between stack frames, etc. The standard recursive method would be something like:
void traverse(Node* ptr) {
    if (ptr->left == NULL && ptr->right == NULL) return;
    if (ptr->val >= threshold) traverse(ptr->right);
    else if (ptr->val < threshold) traverse(ptr->left);
}

traverse(root);
Hence, considering that there's a definite upper bound on binary trees (I'm sure this could be extended to other tree types too), would it be more efficient to perform this traversal iteratively instead:
Node* ptr = root;
for (int i = 0; i < tree.maxHeight; i++) {
    if (ptr->left == NULL && ptr->right == NULL) break;
    if (ptr->val >= threshold) ptr = ptr->right;
    else if (ptr->val < threshold) ptr = ptr->left;
}
The max height of a binary tree is at most its number of nodes, while a balanced one has height log(n). Hence I was wondering if there are any downsides to the iterative solution, or if it would indeed be faster than plain recursion. Is there any concept I'm missing here?
Your code isn't so much a tree traversal as it is a tree search. If all you want to do is go from the root to a leaf, then the iterative solution is simpler and faster, and will use less memory because you don't have to deal with stack frames.
If you want a full traversal of the tree: that is, an in-order traversal where you visit every node, then you either write a recursive algorithm, or you implement your own stack where you explicitly push and pop nodes. The iterative method where you implement your own stack will potentially be faster, but you can't avoid the O(log n) (in a balanced binary tree) or the possible O(n) (in a degenerate tree) memory usage. Implementing an explicit stack will use somewhat less memory, simply because it only has to contain tree node pointers, whereas a full stack frame contains considerably more.
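The explicit-stack variant of a full in-order traversal that the answer describes might look like the following minimal Java sketch. The `Node` class and all names here are illustrative, not from the original post:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class Node {
    int val;
    Node left, right;
    Node(int val) { this.val = val; }
}

public class IterativeInorder {
    // In-order traversal using an explicit stack instead of recursion.
    // The stack holds at most one node per level of the current path,
    // so its peak size is O(h), where h is the height of the tree.
    static List<Integer> inorder(Node root) {
        List<Integer> res = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node cur = root;
        while (cur != null || !stack.isEmpty()) {
            while (cur != null) {       // descend left, remembering the path
                stack.push(cur);
                cur = cur.left;
            }
            cur = stack.pop();          // visit the deepest unvisited node
            res.add(cur.val);
            cur = cur.right;            // then explore its right subtree
        }
        return res;
    }

    public static void main(String[] args) {
        Node root = new Node(2);
        root.left = new Node(1);
        root.right = new Node(3);
        System.out.println(inorder(root)); // [1, 2, 3]
    }
}
```

As the answer notes, this stack only stores node references, so it is somewhat lighter than the equivalent chain of call-stack frames.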

What would be the complexity of this Sorting Algorithm? What are the demerits of using the same?

The sorting algorithm can be described as follows:
1. Create a Binary Search Tree from the array data.
(For multiple occurrences, increment the occurrence counter of the current node.)
2. Traverse the BST in inorder fashion.
(Inorder traversal will return the elements of the array in sorted order.)
3. At each node in the inorder traversal, overwrite the array element at the current index (index beginning at 0) with the current node's value.
Here's a Java implementation for the same:
Structure of Node Class
class Node {
    Node left;
    int data;
    int occurence = 1;   // count of duplicate insertions of this value
    Node right;

    Node(int data) { this.data = data; }
    int getData() { return data; }
    int getOccurence() { return occurence; }
}
inorder function
(the return type is int just for obtaining correct indices at every call; it serves no other purpose)
public int inorder(Node root, int[] arr, int index) {
    if (root == null) return index;
    index = inorder(root.left, arr, index);
    for (int i = 0; i < root.getOccurence(); i++)
        arr[index++] = root.getData();
    index = inorder(root.right, arr, index);
    return index;
}
main()
public static void main(String[] args) {
    int[] arr = new int[]{100, 100, 1, 1, 1, 7, 98, 47, 13, 56};
    BinarySearchTree bst = new BinarySearchTree(new Node(arr[0]));
    for (int i = 1; i < arr.length; i++)
        bst.insert(bst.getRoot(), arr[i]);
    int dummy = bst.inorder(bst.getRoot(), arr, 0);
    System.out.println(Arrays.toString(arr));
}
The space complexity is terrible, I know, but it should not be such a big issue unless the sort is used on an extremely HUGE dataset. However, as I see it, isn't the time complexity O(n)? (Insertion and retrieval in a BST are O(log n), and each element is touched once, making it O(n).) Correct me if I am wrong, as I haven't yet studied Big-O well.
Assuming that the amortized (average) complexity of an insertion is O(log n), N inserts (construction of the tree) will give O(log(1) + log(2) + ... + log(N-1) + log(N)) = O(log(N!)) = O(N log N) (by Stirling's approximation). To read back the sorted array, perform an in-order depth-first traversal, which visits each node once and is hence O(N). Combining the two you get O(N log N).
However this requires that the tree is always balanced! This will not be the case in general for the most basic binary tree, as insertions do not check the relative depths of each child tree. There are many variants which are self-balancing - the two most famous being Red-Black trees and AVL trees. However the implementation of balancing is quite complicated and often leads to a higher constant factor in real-life performance.
the goal was to implement an O(n) algorithm to sort an Array of n elements with each element in the range [1, n^2]
In that case a radix sort (counting variation) would be O(n), taking a fixed number of passes (log_b(n^2)), where b is the "base" used for the digit and b is a function of n: with b == n it would take two passes; with b == sqrt(n), four passes; or, if n is small enough, with b == n^2 it would take a single pass and counting sort could be used. b can be rounded up to the next power of 2 in order to replace division and modulo with binary shift and binary AND. Radix sort needs O(n) extra space, but so do the links of a binary tree.
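A sketch of the counting-variation radix sort described above, for the b == n case (two stable counting passes over base-n digits). All names are illustrative, and the elements are assumed to lie in [1, n^2] as stated:

```java
import java.util.Arrays;

public class RadixSortBaseN {
    // Sorts n values, each in [1, n^2], in O(n) time using two
    // counting-sort passes with base b = n (LSD radix sort).
    static void sort(int[] a) {
        int n = a.length;
        if (n < 2) return;
        int[] buf = new int[n];
        countingPass(a, buf, n, 1);   // divisor 1 -> low base-n digit of (v-1)
        countingPass(buf, a, n, n);   // divisor n -> high base-n digit of (v-1)
    }

    // Stable counting sort of src into dst, keyed on ((v-1)/divisor) % base.
    static void countingPass(int[] src, int[] dst, int base, int divisor) {
        int[] count = new int[base + 1];
        for (int v : src) count[((v - 1) / divisor) % base + 1]++;
        for (int i = 0; i < base; i++) count[i + 1] += count[i]; // prefix sums
        for (int v : src) dst[count[((v - 1) / divisor) % base]++] = v;
    }

    public static void main(String[] args) {
        int[] a = {9, 1, 16, 4, 2, 7, 13, 3};   // n = 8, values in [1, 64]
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 7, 9, 13, 16]
    }
}
```

Because each pass is stable, sorting by the low digit first and the high digit second yields a fully sorted array after the second pass.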

RunTime Complexity of Recursive BinaryTree Traversal

This is my solution to the problem where, given a binary tree, you're asked to find the maximum sum over nodes that are not directly linked. "Directly linked" refers to a parent-child relationship, just to be clear.
My solution
If the current node is visited, you're not allowed to visit the nodes at the next level. If the current node, however, is not visited, you may or may not visit the nodes at the next level.
It passes all tests. However, what is the run time complexity of this recursive binary tree traversal? I think it's O(2^n) because, at every node, you have two choices, whether to use it or not, and accordingly each node at the next level again has two choices for each of those choices, and so on.
Space complexity: no additional storage is used, but since this is a recursive implementation, stack space is used, and the maximum number of frames on the stack is the height of the tree, which is n in the worst case. So O(n)?
public int rob(TreeNode root) {
    return rob(root, false);
}

public int rob(TreeNode root, boolean previousStateUsed) {
    if (root == null)
        return 0;
    if (root.left == null && root.right == null)
        return previousStateUsed ? 0 : root.val;
    if (previousStateUsed) {
        int leftSumIfCurrentIsNotUsed = rob(root.left, false);
        int rightSumIfCurrentIsNotUsed = rob(root.right, false);
        return leftSumIfCurrentIsNotUsed + rightSumIfCurrentIsNotUsed;
    } else {
        int leftSumIfCurrentIsNotUsed = rob(root.left, false);
        int rightSumIfCurrentIsNotUsed = rob(root.right, false);
        int leftSumIfCurrentIsUsed = rob(root.left, true);
        int rightSumIfCurrentIsUsed = rob(root.right, true);
        return Math.max(leftSumIfCurrentIsNotUsed + rightSumIfCurrentIsNotUsed,
                        leftSumIfCurrentIsUsed + rightSumIfCurrentIsUsed + root.val);
    }
}
Your current recursive solution would be O(2^n). It's pretty clear to see if we take an example:
Next, let's cross out alternating layers of nodes:
With the remaining nodes we have about n/2 nodes (this will vary, but you can always remove alternating layers to keep at least n/2 - 1 nodes in the worst case). With just these nodes, we can make any combination of them, because none of them conflict. Therefore we can be certain that this takes at least Omega(2^(n/2)) time in the worst case. You can probably get a tighter bound, but this should make you realize your solution will not scale well.
This problem is a pretty common adaptation of the Max Non-Adjacent Sum problem.
You should be able to use dynamic programming on this. I would highly recommend it. Imagine we are finding the solution for node i. Let's assume we already have the solutions for nodes i.left and i.right, and let's also assume we have the solutions for their children (i's grandchildren). We now have 2 options for i's max solution:
max-sum(i.left) + max-sum(i.right)
i.val + max-sum(i.left.left) + max-sum(i.left.right) + max-sum(i.right.left) + max-sum(i.right.right)
You take the max of these and that's your solution for i. You can perform this bottom-up DP or use memoization in your current program. Either should work. The best part is, now your solution is O(n)!
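The memoized version of the recurrence described above might look like this sketch; the class and method names are illustrative, not from the answer:

```java
import java.util.HashMap;
import java.util.Map;

public class HouseRobberIII {
    static class TreeNode {
        int val;
        TreeNode left, right;
        TreeNode(int val) { this.val = val; }
    }

    // Memoized form of the recurrence above: for node i, take the max of
    // (children's solutions) and (i.val + grandchildren's solutions).
    // Each node is solved once, so the whole thing is O(n).
    static int rob(TreeNode root) {
        return maxSum(root, new HashMap<>());
    }

    static int maxSum(TreeNode node, Map<TreeNode, Integer> memo) {
        if (node == null) return 0;
        Integer cached = memo.get(node);
        if (cached != null) return cached;
        // Option 1: skip this node; take the best of each child subtree.
        int skipped = maxSum(node.left, memo) + maxSum(node.right, memo);
        // Option 2: take this node; skip the children, recurse on grandchildren.
        int taken = node.val;
        if (node.left != null)
            taken += maxSum(node.left.left, memo) + maxSum(node.left.right, memo);
        if (node.right != null)
            taken += maxSum(node.right.left, memo) + maxSum(node.right.right, memo);
        int best = Math.max(skipped, taken);
        memo.put(node, best);
        return best;
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode(3);
        root.left = new TreeNode(2);
        root.right = new TreeNode(3);
        root.left.right = new TreeNode(3);
        root.right.right = new TreeNode(1);
        System.out.println(rob(root)); // 7  (3 + 3 + 1)
    }
}
```

The memo map guarantees each node's subproblem is computed exactly once, which is what collapses the O(2^n) branching down to O(n).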

Using Breadth First Search and inorder traversal to analyze the validity of a really large binary search tree

I was thinking about the different techniques to check the validity of a binary search tree. Naturally, the invariant that needs to be maintained is that the left subtree must be less than or equal to the current node, which in turn should be less than or equal to the right subtree. There are a couple of different ways to tackle this problem: The first is to check the constraints for values on each subtree and can be outlined like this (in Java, for integer nodes):
public static boolean isBST(TreeNode node, int lower, int higher) {
    if (node == null) return true;
    else if (node.data < lower || node.data > higher) return false;
    return isBST(node.left, lower, node.data) && isBST(node.right, node.data, higher);
}
There is also another way to accomplish this using an in-order traversal, where you keep track of the previous element and make sure the progression is non-decreasing. Both these methods explore the left subtrees first, though, and in the event we have an inconsistency in the middle of the root's right subtree, what is the recommended path? I know that a BFS variant could be used, but would it be possible to use multiple techniques at the same time, and is that recommended? For example, we could do a BFS, an inorder and a reverse-inorder traversal and return the moment a failure is detected. This could only maybe be desirable for really large trees, in order to reduce the average runtime at the cost of a bit more space and multiple threads accessing the same data structure. Of course, if we're using a simple iterative solution for the inorder traversal (NOT a Morris traversal that modifies the tree) we will be using up O(lg N) space.
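As an aside, the in-order check described above (an explicit stack plus the previously visited value) might be sketched like this; the class and method names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class InorderBstCheck {
    static class TreeNode {
        int data;
        TreeNode left, right;
        TreeNode(int data) { this.data = data; }
    }

    // Iterative in-order walk that tracks the previously visited value;
    // a BST must yield a non-decreasing sequence. Uses O(h) stack space.
    static boolean isBST(TreeNode root) {
        Deque<TreeNode> stack = new ArrayDeque<>();
        TreeNode cur = root;
        Integer prev = null;                // no value visited yet
        while (cur != null || !stack.isEmpty()) {
            while (cur != null) {
                stack.push(cur);
                cur = cur.left;
            }
            cur = stack.pop();
            if (prev != null && cur.data < prev) return false; // order broken
            prev = cur.data;
            cur = cur.right;
        }
        return true;
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode(2);
        root.left = new TreeNode(1);
        root.right = new TreeNode(3);
        System.out.println(isBST(root));    // true
        root.right.left = new TreeNode(1);  // violates the BST property
        System.out.println(isBST(root));    // false
    }
}
```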
I would expect this to depend on your precise situation, in particular on the probability that your tree fails to be a valid BST, and the expected depth at which the failure occurs.
For example, if the tree is likely to be valid, then it would be wasteful to use 3 techniques, as the overall runtime for a valid tree would roughly triple.
What about iterative deepening depth-first search?
It is generally (asymptotically) as fast as breadth-first search (and also finds any early failure), but uses as little memory as depth-first search.
It would typically look something like this:
boolean isBST(TreeNode node, int lower, int higher, int depth)
{
    if (depth == 0)
        return true;
    ...
    isBST(..., depth - 1)
    ...
}
Caller:
boolean failed = false;
int treeHeight = height(root);
for (int depth = 2; depth <= treeHeight && !failed; depth++)
    failed = !isBST(root, -INFINITY, INFINITY, depth);
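A complete version of this iterative-deepening check might look like the following sketch. The names and the long-typed bounds (standing in for -INFINITY/INFINITY) are my own choices, not from the answer:

```java
public class IterativeDeepeningBstCheck {
    static class TreeNode {
        int data;
        TreeNode left, right;
        TreeNode(int data) { this.data = data; }
    }

    // Depth-limited check: nodes below the depth cutoff are assumed valid
    // for this pass; later passes with a larger cutoff will reach them.
    static boolean isBST(TreeNode node, long lower, long higher, int depth) {
        if (node == null || depth == 0) return true;
        if (node.data < lower || node.data > higher) return false;
        return isBST(node.left, lower, node.data, depth - 1)
            && isBST(node.right, node.data, higher, depth - 1);
    }

    static int height(TreeNode node) {
        if (node == null) return 0;
        return 1 + Math.max(height(node.left), height(node.right));
    }

    // Iterative deepening: re-check with a growing depth limit so that a
    // shallow violation is found without a full deep traversal.
    static boolean isBST(TreeNode root) {
        int treeHeight = height(root);
        for (int depth = 1; depth <= treeHeight; depth++)
            if (!isBST(root, Long.MIN_VALUE, Long.MAX_VALUE, depth))
                return false;
        return true;
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode(5);
        root.left = new TreeNode(3);
        root.right = new TreeNode(8);
        root.right.left = new TreeNode(4);  // 4 < 5 violates the BST property
        System.out.println(isBST(root));    // false
    }
}
```

Re-checking shallow levels on every pass looks wasteful, but as the answer notes, the total work stays asymptotically the same as one full traversal while memory stays bounded by the current depth limit.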

Print binary tree in BFS fashion with O(1) space

I was wondering if it's possible to print a binary tree in breadth first order while using only O(1) space?
The difficult part is that one has to use additional space to remember the next level to traverse, and that grows with n.
Since we haven't placed any limitation on the time part, maybe there are some inefficient (in terms of time) ways that can achieve this?
Any idea?
This is going to depend on some finer-grained definitions, for example whether the edges have back-links. Then it's easy, because you can just follow a back link up the tree. Otherwise I can't think offhand of a way to do it without O(lg(number of nodes)) space, because you need to remember at least the nodes "above".
Update
Oh wait, of course it can be done in O(1) space with a space time trade. Everywhere you would want to do a back link, you save your place and do BFS, tracking the most recent node, until you find yours. Then back up to the most recently visited node and proceed.
Problem is, that's O(1) space but O(n^2) time.
Another update
Let's assume that we've reached node n_i, and we want to reach the parent of that node, which we'll call (w.l.o.g.) n_j. We have identified the distinguished root node n_0.
Modify the breadth-first search algorithm so that when it follows a directed edge (n_x, n_y), the source node of that edge is stored. Thus when you follow (n_x, n_y), you save n_x.
When you start the BFS again from n_0, you are guaranteed (assuming it really is a tree) that at SOME point you will transition the edge (n_j, n_i). At that point you observe you're back at n_i. You've stored n_j, and so you know the reverse edge is (n_i, n_j).
Thus, you get that single backtrack with only two extra cells, one for n_0 and one for the "saved" node. This is O(1).
I'm not so sure of O(n^2) -- it's late and it's been a hard day so I don't want to compose a proof. I'm sure it's O((|N|+|E|)^2) where |N| and |E| are the size of the sets of vertices and edges respectively.
An interesting special case is heaps.
From heapq docs:
Heaps are binary trees for which every parent node has a value less than or equal to any of its children. This implementation uses arrays for which heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k, counting elements from zero. For the sake of comparison, non-existing elements are considered to be infinite. The interesting property of a heap is that its smallest element is always the root, heap[0]. [explanation by François Pinard]
How the tree is represented in memory (indices of the array):

                               0
               1                               2
       3               4               5               6
   7       8       9      10      11      12      13      14
 15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
In this case nodes in the array are already stored in a breadth first order.
for value in the_heap:
    print(value)
O(1) in space.
I know that this is strictly not an answer to the question, but visiting the nodes of a tree in breadth-first order can be done using O(d) space, where d is the depth of the tree, by a recursive iterative deepening depth-first search (IDDFS). The space is required for the stack, of course. In the case of a balanced tree, d = O(lg n), where n is the number of nodes. I honestly don't see how you'd do it in constant space without the back-links suggested by @Charlie Martin.
It is easy to implement a recursive method to get all the nodes of a tree at a given level. Hence, we could calculate the height of the tree and print the nodes at each level. This is a level-order traversal of the tree, but the time complexity is O(n^2). Below is the Java implementation (source).
class Node
{
    int data;
    Node left, right;

    public Node(int item)
    {
        data = item;
        left = right = null;
    }
}

class BinaryTree
{
    Node root;

    public BinaryTree()
    {
        root = null;
    }

    void printLevelOrder()
    {
        int h = height(root);
        for (int i = 1; i <= h; i++)
            printGivenLevel(root, i);
    }

    int height(Node root)
    {
        if (root == null)
            return 0;
        int lheight = height(root.left);
        int rheight = height(root.right);
        return Math.max(lheight, rheight) + 1;   // tallest subtree plus this node
    }

    void printGivenLevel(Node root, int level)
    {
        if (root == null)
            return;
        if (level == 1)
            System.out.print(root.data + " ");
        else if (level > 1)
        {
            printGivenLevel(root.left, level - 1);
            printGivenLevel(root.right, level - 1);
        }
    }
}
