Shortest Root to Leaf Path - algorithm

What is the easiest way, preferably using recursion, to find the shortest root-to-leaf path in a BST (Binary Search Tree)? Java preferred, pseudocode okay.
Thanks!

General description:
Use a Breadth-first search (BFS) as opposed to a Depth-first search (DFS). Find the first node with no children.
With a DFS you might get lucky on some input trees (but there is no way to know you got lucky, so you would still have to search the whole tree), whereas with a BFS you can stop as soon as you dequeue the first childless node, usually without touching every node.
To recover the actual root-to-leaf path, follow the first childless node you find back up to the root using the parent references. If the nodes store no parent reference, keep track of the parents yourself as you descend. Since that gives you the path in reverse order, you can push it onto a stack and pop it off to read it from root to leaf.
Pseudo-code:
The problem is very simple; here is pseudo-code to find the shortest length:
Put the root node on the queue.
Repeat while the queue is not empty and no result has been found:
    Pull a node from the front of the queue and check whether it has any children.
    If it has no children, you are done: you have found the shortest path.
    Otherwise, push all of its children (left, right) onto the queue.
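In Java the BFS might look like this (a minimal sketch, assuming a simple TreeNode class with left and right fields; the depth travels in a second queue, which is also what the "all shortest paths" variant below builds on):

import java.util.ArrayDeque;
import java.util.Queue;

static int shortestLeafDepth(TreeNode root) {
    if (root == null) {
        return 0;
    }
    Queue<TreeNode> nodes = new ArrayDeque<>();
    Queue<Integer> depths = new ArrayDeque<>();   // depth stored alongside each node
    nodes.add(root);
    depths.add(1);
    while (!nodes.isEmpty()) {
        TreeNode node = nodes.remove();
        int depth = depths.remove();
        if (node.left == null && node.right == null) {
            return depth;                         // first childless node dequeued: shortest path found
        }
        if (node.left != null)  { nodes.add(node.left);  depths.add(depth + 1); }
        if (node.right != null) { nodes.add(node.right); depths.add(depth + 1); }
    }
    return 0;                                     // not reached for a non-empty tree
}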
Finding all shortest paths:
To find all shortest paths you can store the depth of the node along with node inside the queue. Then you would continue the algorithm for all nodes in the queue with the same depth.
Alternative:
If instead you decided to use a DFS, you would in general have to search the entire tree to find the shortest path. This can be optimized by keeping track of the shortest depth found so far and only descending into a subtree while the current depth is still smaller than that value (see the sketch below). The BFS is a much better solution, though.
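A rough Java sketch of that pruned DFS (my illustration, not part of the original answer; shortestLeafDepthDfs is a hypothetical helper, and the initial call would be shortestLeafDepthDfs(root, 0, Integer.MAX_VALUE)):

static int shortestLeafDepthDfs(TreeNode node, int depth, int best) {
    if (node == null || depth >= best) {
        return best;                      // dead end, or no chance of improving on `best`
    }
    if (node.left == null && node.right == null) {
        return depth + 1;                 // a leaf at a smaller depth than anything seen so far
    }
    best = shortestLeafDepthDfs(node.left, depth + 1, best);
    best = shortestLeafDepthDfs(node.right, depth + 1, best);
    return best;
}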

This is in C++, but it is so simple you can convert it easily. Just change min to max to get the maximum tree depth.
int TreeDepth(Node* p)
{
    return (p == NULL) ? 0 : min(TreeDepth(p->LeftChild), TreeDepth(p->RightChild)) + 1;
}
Just to explain what this is doing: it counts upward from the bottom of the tree (it returns 0 once it runs past a leaf, i.e. at a null pointer) back to the root. Doing this for the left and right hand sides of the tree and taking the minimum will give you the shortest path.

Breadth first search is exactly optimal in terms of the number of vertices visited. You have to visit every one of the vertices you'd visit in a breadth first search just in order to prove you have the closest leaf!
However, if you have a mandate to use recursion, Mike Thompson's approach is almost the right one to use -- and is slightly simpler.
TD(p) is
0 if p is NULL (empty tree special case)
1 if p is a leaf (p->left == NULL and p->right == NULL)
1 + TD(p->left) if only the left child exists (and symmetrically for the right)
1 + min(TD(p->left), TD(p->right)) if p has both children

static int findCheapestPathSimple(TreeNode root) {
    // Sums node values along the cheapest root-to-leaf path.
    if (root == null) {
        return 0;
    }
    // A missing child would contribute 0 and wrongly cut the path short,
    // so only recurse into children that actually exist.
    if (root.left == null)  return root.data + findCheapestPathSimple(root.right);
    if (root.right == null) return root.data + findCheapestPathSimple(root.left);
    return root.data + Math.min(findCheapestPathSimple(root.left),
                                findCheapestPathSimple(root.right));
}

shortestPath(X)
    if X == NIL
        return 0
    else if X.left == NIL and X.right == NIL    // X is a leaf
        return 1
    else
        if X.left == NIL
            return 1 + shortestPath(X.right)
        else if X.right == NIL
            return 1 + shortestPath(X.left)
        else
            return 1 + min(shortestPath(X.left), shortestPath(X.right))
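Since the question asked for Java, here is a direct translation of that pseudocode (a minimal sketch, assuming a TreeNode class with left and right references):

static int shortestPath(TreeNode x) {
    if (x == null) {
        return 0;                                  // empty tree special case
    }
    if (x.left == null && x.right == null) {
        return 1;                                  // x is a leaf
    }
    if (x.left == null) {
        return 1 + shortestPath(x.right);          // only the right subtree exists
    }
    if (x.right == null) {
        return 1 + shortestPath(x.left);           // only the left subtree exists
    }
    return 1 + Math.min(shortestPath(x.left), shortestPath(x.right));
}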

Related

Runtime of the following recursive algorithm?

I am working through the book "Cracking the coding interview" by Gayle McDowell and came across an interesting recursive algorithm that sums the values of all the nodes in a balanced binary search tree.
int sum(Node node) {
    if (node == null) {
        return 0;
    }
    return sum(node.left) + node.value + sum(node.right);
}
Now Gayle says the runtime is O(N) which I find confusing as I don't see how this algorithm will ever terminate. For a given node, when node.left is passed to sum in the first call, and then node.right is consequently passed to sum in the second call, isn't the algorithm computing sum(node) for the second time around? Wouldn't this process go on forever? I'm still new to recursive algorithms so it might just not be very intuitive yet.
Cheers!
The process won't go on forever. The data structure in question is a Balanced Binary Search Tree and not a Graph which can contain cycles.
Starting from the root, the nodes are explored in the order left subtree -> the node itself -> right subtree, like a depth-first search.
node.left explores the left subtree of a node and node.right explores the right subtree of the same node. The two subtrees do not share any nodes. Draw the trail of program control to see the order in which the nodes are explored, and to convince yourself that there is no overlap in the traversal.
Since each node is visited only once, and the recursion starts unwinding when a leaf node is hit, the running time is O(N), N being the number of nodes.
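As a small worked example (my own, not from the answer): for the three-node tree with root 2, left child 1 and right child 3, the calls unfold as sum(2) = sum(1) + 2 + sum(3) = (sum(null) + 1 + sum(null)) + 2 + (sum(null) + 3 + sum(null)) = 1 + 2 + 3 = 6, and each node is visited exactly once.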
The key to understanding a recursive algorithm is to trust that it does what it is supposed to do. Let me explain.
First admit that the function sum(node) returns the sum of the values of all nodes of the subtree rooted at node.
Then the code
if (node == null) {
    return 0;
}
return sum(node.left) + node.value + sum(node.right);
can do two things:
if node is null, return 0; this is a non-recursive case and the returned value is trivially correct;
otherwise, the function computes the sum for the left subtree, plus the value at node, plus the sum for the right subtree, i.e. the sum for the subtree rooted at node.
So in a way, if the function is correct, then it is correct :) Actually the argument isn't circular thanks to the non-recursive case, which is also correct.
We can use the same way of reasoning to prove the running time of the algorithm.
Assume that the time required to process the tree rooted at node is proportional to the size of this subtree, call it |T|. This is another act of faith.
Then if node is null, the time is constant, say 1 unit. And if node isn't null, the time is |L| + 1 + |R| units, which is precisely |T|. So if the time to sum a subtree is proportional to the size of the subtree, the time to sum a tree is proportional to the size of the tree!
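Written as a recurrence (a sketch of the same argument, with c the constant cost of a single call): if the left and right subtrees hold n_L and n_R nodes, then

T(n) = T(n_L) + T(n_R) + c,   with n_L + n_R = n - 1 and T(0) = c,

which solves to T(n) = c(2n + 1), i.e. O(N).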

Counting number of nodes in a complete binary tree

I want to count the number of nodes in a complete binary tree, but all I can think of is traversing the entire tree. That is an O(n) algorithm where n is the number of nodes in the tree. What could be the most efficient algorithm to achieve this?
Suppose that we start off by walking down the left and right spines of the tree to determine their heights. We'll either find that they're the same, in which case the last row is full, or we'll find that they're different. If the heights come back the same (say the height is h), then we know that there are 2^h - 1 nodes and we're done.
Otherwise, the heights must be h+1 and h, respectively. We know that there are then at least 2^h - 1 nodes, plus the number of nodes in the bottom layer of the tree. The question, then, is how to figure that out. One way to do this would be to find the rightmost node in the last layer. If you know at which index that node is, you know exactly how many nodes are in the last layer, so you can add that to 2^h - 1 and you're done.
If you have a complete binary tree with left height h+1, then there are between 1 and 2^h - 1 possible nodes that could be in the last layer. The question is then how to determine this as efficiently as possible.
Fortunately, since we know the nodes in the last layer get filled in from the left to the right, we can use binary search to figure out where the last filled node in the last layer is. Essentially, we guess the index where it might be, walk from the root of the tree down to where that leaf should be, and then either find a node there (so we know that the rightmost node in the bottom layer is either that node or to the right of it) or we don't (so we know that the rightmost node in the bottom layer must be purely to the left of the current location). We can walk down to where the kth node in the bottom layer should be by using the bits in the number k to guide the search: we start at the root, then go left if the first bit of k is 0 and right if the first bit of k is 1, then use the remaining bits in the same manner to walk down the tree. We do this O(h) times and each probe takes time O(h), so the total work done here is O(h^2). Since h is the height of the tree, we know that h = O(log n), so this algorithm takes O(log^2 n) time to complete.
I'm not sure whether it's possible to improve upon this algorithm. I can get an Ω(log n) lower bound on any correct algorithm, though. I'm going to argue that any algorithm that is always correct in all cases must inspect the rightmost leaf node in the final row of the tree. To see why, suppose there's a tree T where the algorithm doesn't do this. Let's suppose that the rightmost node that the algorithm inspects in the bottom row is x, that the actual rightmost node in the bottom row is y, and that the leftmost missing node in the bottom row that the algorithm detected is z. We know that x must be to the left of y (because the algorithm never inspected the rightmost node in the bottom row) and that y must be to the left of z (because y exists and z doesn't, so z must be further to the right than y). If you think about what the algorithm's "knowledge" is at this point, the algorithm has no idea whether or not there are any nodes purely to the right of x or purely to the left of z. Therefore, if we were to give it a modified tree T' where we deleted y, the algorithm wouldn't notice that anything had changed and would have exactly the same execution path on T and T'. However, since T and T' have a different number of nodes, the algorithm has to be wrong on at least one of them. Inspecting this node takes time at least Ω(log n) because of the time required to walk down the tree.
In short, you can do better than O(n) with the above O(log^2 n)-time algorithm, and you might be able to do even better than that, though I'm not entirely sure how or whether that's possible. I suspect it isn't, because I suspect that binary search is the optimal way to check the bottom row, and the total length of the paths down to the nodes you'd probe, even after taking into account that they share nodes in common, is Θ(log^2 n), but I'm not sure how to prove it.
Hope this helps!
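For reference, here is a rough Java sketch of the probe-and-binary-search idea described above (my own illustration, not code from the answer; it assumes a TreeNode class with left and right fields, and countNodesLogSquared / exists are hypothetical names):

static boolean exists(TreeNode root, int depth, int index) {
    // Walk from the root to position `index` (0-based, left to right) in the last
    // level, halving the index range at every step to decide left vs. right.
    int lo = 0, hi = (1 << depth) - 1;
    TreeNode node = root;
    for (int level = 0; level < depth && node != null; level++) {
        int mid = (lo + hi) / 2;
        if (index <= mid) { node = node.left;  hi = mid; }
        else              { node = node.right; lo = mid + 1; }
    }
    return node != null;
}

static int countNodesLogSquared(TreeNode root) {
    if (root == null) return 0;
    int depth = 0;                                   // depth of the last level, in edges
    for (TreeNode cur = root.left; cur != null; cur = cur.left) depth++;
    if (depth == 0) return 1;
    // Binary search for how many nodes the last level contains.
    int lo = 0, hi = (1 << depth) - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (exists(root, depth, mid)) lo = mid + 1;
        else                          hi = mid - 1;
    }
    return (1 << depth) - 1 + lo;                    // full levels above, plus the last level
}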
public int leftHeight(TreeNode root) {
    int h = 0;
    while (root != null) {
        root = root.left;
        h++;
    }
    return h;
}

public int rightHeight(TreeNode root) {
    int h = 0;
    while (root != null) {
        root = root.right;
        h++;
    }
    return h;
}

public int countNodes(TreeNode root) {
    if (root == null)
        return 0;
    int lh = leftHeight(root);
    int rh = rightHeight(root);
    if (lh == rh)
        return (1 << lh) - 1;
    return countNodes(root.left) + countNodes(root.right) + 1;
}
In each recursive call we traverse along the left and right boundaries of the complete binary tree to compute the left and right height. If they are equal, the tree is full with 2^h - 1 nodes. Otherwise we recurse on the left subtree and the right subtree. The first call is from the root (level = 0) and takes O(h) time to get the left and right height. We keep recursing until we reach a subtree that is a full binary tree; in the worst case we go all the way down to a leaf. So the complexity is (h + (h-1) + (h-2) + ... + 0) = h(h+1)/2 = O(h^2). The space complexity is the size of the call stack, which is O(h).
NOTE: For a complete binary tree, h = log(n).
If the binary tree is definitely complete (as opposed to 'nearly complete' or 'almost complete' as defined in the Wikipedia article), you can simply descend down one branch of the tree to a leaf. This takes O(log n). Then sum the powers of two up to this depth: 2^0 + 2^1 + ... + 2^d, which is 2^(d+1) - 1.
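As a tiny Java sketch of that (my illustration, not from the answer; countPerfect is a hypothetical name, and the tree is assumed to be perfect, with a TreeNode class exposing a left field):

static int countPerfect(TreeNode root) {
    if (root == null) return 0;
    int d = 0;                                // depth of the leaves, in edges
    for (TreeNode cur = root.left; cur != null; cur = cur.left) d++;
    return (1 << (d + 1)) - 1;                // 2^0 + 2^1 + ... + 2^d
}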
A C# sample might help others. This is similar to the approach whose time complexity is well explained above by templatetypedef.
public int GetLeftHeight(TreeNode treeNode)
{
    int heightCnt = 0;
    while (treeNode != null)
    {
        heightCnt++;
        treeNode = treeNode.LeftNode;
    }
    return heightCnt;
}

public int CountNodes(TreeNode treeNode)
{
    int heightIndx = GetLeftHeight(treeNode);
    int nodeCnt = 0;
    while (treeNode != null)
    {
        int rightHeight = GetLeftHeight(treeNode.RightNode);
        nodeCnt += (int)Math.Pow(2, rightHeight); // i.e. (1 << rightHeight)
        treeNode = (rightHeight == heightIndx - 1) ? treeNode.RightNode : treeNode.LeftNode;
        heightIndx--;
    }
    return nodeCnt;
}
Using Recursion:
int countNodes(TreeNode* root) {
    if (!root) {
        return 0;
    }
    else {
        return countNodes(root->left) + countNodes(root->right) + 1;
    }
}

Topological search and Breadth first search

Is it possible to use Breadth first search logic to do a topological sort of a DAG?
The solution in Cormen makes use of Depth-first search, but wouldn't it be easier to use BFS?
Reason:
BFS visits all the nodes in a particular depth before visiting nodes with the next depth value. It naturally means that the parents will be listed before the children if we do a BFS. Isn't this exactly what we need for a topological sort?
A mere BFS is only sufficient for a tree (or forest of trees), because in (forest of) trees, in-degrees are at most 1.
Now, look at this case:
B → C → D
       ↗
     A
A BFS where the queue is initialized to A B (whose in-degrees are zero) will return A B D C, which is not topologically sorted. That's why you have to maintain an in-degree count, and only pick nodes whose count has dropped to zero. (*)
BTW, this is the flaw in your 'reason': BFS only guarantees that one parent has been visited before, not all of them.
Edit: (*) In other words, you push back adjacent nodes whose in-degree is zero (in the example, after processing A, D would be skipped). So you're still using a queue; you've just added a filtering step to the general algorithm. That being said, continuing to call it a BFS is questionable.
It is possible; even Wikipedia describes an algorithm based on BFS.
Basically, you use a queue into which you insert all nodes with no incoming edges. Then, when you extract a node, you remove all of its outgoing edges and insert the nodes reachable from it that have no other incoming edges.
In a BFS all of the edges you actually walk will end up in the correct direction. But all the edges you don't walk (those between nodes at the same depth, or those from deeper nodes back up to earlier nodes) will end up going the wrong way if you lay out the graph in BFS order.
Yes, you really need DFS to do it.
Yes, you can do topological sorting using BFS. Actually I remember once my teacher told me that if the problem can be solved by BFS, never choose to solve it by DFS, because the logic for BFS is simpler than for DFS and most of the time you want the straightforward solution to a problem.
As YvesgereY and IVlad have mentioned, you need to start with nodes whose indegree is 0, meaning no other nodes point to them. Be sure to add these nodes to your result first. You can use a HashMap to map every node to its indegree, and a queue, which is very commonly seen in BFS, to assist the traversal. When you poll a node from the queue, the indegree of each of its neighbors needs to be decreased by 1; this is like deleting the node from the graph and deleting the edges between the node and its neighbors. Every time you come across a node with 0 indegree, offer it to the queue for checking its neighbors later, and add it to the result.
public ArrayList<DirectedGraphNode> topSort(ArrayList<DirectedGraphNode> graph) {
    ArrayList<DirectedGraphNode> result = new ArrayList<>();
    if (graph == null || graph.size() == 0) {
        return result;
    }
    Map<DirectedGraphNode, Integer> indegree = new HashMap<DirectedGraphNode, Integer>();
    Queue<DirectedGraphNode> queue = new LinkedList<DirectedGraphNode>();
    // Map each node to its indegree. Only nodes that some other node points to
    // end up in the map; nodes whose indegree == 0 are not mapped.
    for (DirectedGraphNode DAGNode : graph) {
        for (DirectedGraphNode nei : DAGNode.neighbors) {
            if (indegree.containsKey(nei)) {
                indegree.put(nei, indegree.get(nei) + 1);
            } else {
                indegree.put(nei, 1);
            }
        }
    }
    // Find all nodes with indegree == 0. They belong at the starting positions in the result.
    for (DirectedGraphNode GraphNode : graph) {
        if (!indegree.containsKey(GraphNode)) {
            queue.offer(GraphNode);
            result.add(GraphNode);
        }
    }
    // Every time we poll a node from the queue we effectively delete it from the
    // graph, so we decrease each neighbor's indegree by one; this has the same
    // meaning as deleting the edge from the node to that neighbor.
    while (!queue.isEmpty()) {
        DirectedGraphNode temp = queue.poll();
        for (DirectedGraphNode neighbor : temp.neighbors) {
            indegree.put(neighbor, indegree.get(neighbor) - 1);
            if (indegree.get(neighbor) == 0) {
                result.add(neighbor);
                queue.offer(neighbor);
            }
        }
    }
    return result;
}

Finding the lowest common ancestor of two nodes in a binary search tree - efficiently

Just wanted to know how efficient is the below algorithm to find the lowest common ancestor of two nodes in a binary search tree.
Node getLowestCommonAncestor(Node root, Node a, Node b) {
    Find the in-order traversal of Node root.
    Find temp1 = the in-order successor of Node a.
    Find temp2 = the in-order successor of Node b.
    return min(temp1, temp2);
}
Searching for the lowest common ancestor in a binary search tree is simpler than that: observe that the LCA is the node where the searches for item A and item B diverge for the first time.
Start from the root, and search for A and B at the same time. As long as both searches take you in the same direction, continue the search. Once you arrive at the node such that searching for A and B take you to different subtrees, you know that the current node is the LCA.
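A minimal Java sketch of that idea (my illustration, assuming a TreeNode class with an int data key and left/right references, and that both keys are present in the BST):

static TreeNode lowestCommonAncestor(TreeNode root, int a, int b) {
    TreeNode node = root;
    while (node != null) {
        if (a < node.data && b < node.data) {
            node = node.left;           // both keys lie in the left subtree
        } else if (a > node.data && b > node.data) {
            node = node.right;          // both keys lie in the right subtree
        } else {
            return node;                // the searches diverge here (or hit a key): this is the LCA
        }
    }
    return null;
}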
A node at the bottom of a large binary search tree can have an in-order successor close to it; for instance, if it is the left child of a node, its in-order successor is its parent.
Two nodes descending from different children of the root will have the root as their least common ancestor, no matter where they are, so I believe that your algorithm gets this case wrong.
There is a discussion of efficient LCA algorithms (given time to build a preparatory data structure) at http://en.wikipedia.org/wiki/Lowest_common_ancestor, with pointers to code.
An inefficient but simple way of finding the LCA is as follows: in the tree, keep pointers from children to parents and a note of the depth of each node. Given two nodes, move up from the deeper one until the depths are the same. If you are now pointing at the other node, it is the LCA. Otherwise move up one step from each node and check again, and so on, until you meet at the LCA.
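A short Java sketch of that approach (my illustration; the parent and depth fields are assumptions, not something the answer defines):

static TreeNode lcaWithParents(TreeNode a, TreeNode b) {
    while (a.depth > b.depth) a = a.parent;    // lift the deeper node first
    while (b.depth > a.depth) b = b.parent;
    while (a != b) {                           // climb in lockstep until the paths meet
        a = a.parent;
        b = b.parent;
    }
    return a;
}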
Finding the LCA in a BST is straightforward:
Find the node for which node1 and node2 lie on different sides. If node1 is an ancestor of node2 (or vice versa), that ancestor itself must be returned. The code below implements this algorithm.
TreeNode<K> commonAncestor(TreeNode t, K k1, K k2) {
    if (t == null) return null;
    while (true) {
        int c1 = t.k.compareTo(k1);
        int c2 = t.k.compareTo(k2);
        if (c1 * c2 <= 0) {
            return t;
        } else if (c1 < 0) {
            t = t.right;
        } else {
            t = t.left;
        }
    }
}

Algorithm for successor

I need an algorithm for returning the successor node of some arbitrary node of the given binary search tree.
To give you an answer with the same level of detail as your question: Go up until going right is possible, and then left until you reach a leaf.
There are two general approaches:
If your binary tree nodes have pointers to their parent node, then you can traverse directly from the node to the successor node. You will have to determine how to use the parent pointers to do the traversal.
If your binary tree nodes do not have parent pointers, then you will have to do an inorder traversal of the tree starting at the root (presumably you have a root node pointer) and return the next node after the given node.
You need to maintain a zipper as you descend the tree. A zipper is simply a list of the nodes you have traversed, and for each an indication of whether you next went left or right.
It allows you to travel back up in the tree even if there aren't any pointers from children to parents.
The algorithm for successor is to go back (up in the zipper) as long as you were coming from the left, then go right once, and then descend to the leftmost child.
It's easier with a figure...
The successor of a node here means the in-order successor.
The following method determines the in-order successor without any parent pointer or extra space, non-recursively.
struct node * minValue(struct node *n)
{
    /* Smallest value in a subtree: keep going left. */
    while (n->left != NULL)
        n = n->left;
    return n;
}

struct node * inOrderSuccessor(struct node *root, struct node *n)
{
    /* If the node has a right child, return the smallest value of the right subtree. */
    if (n->right != NULL)
        return minValue(n->right);

    /* Otherwise, return the lowest ancestor in whose left subtree node n lies:
       walk down from the root, remembering the last node at which we turned left. */
    struct node *succ = NULL;
    while (root)
    {
        if (n->data < root->data)
        {
            succ = root;
            root = root->left;
        }
        else if (n->data > root->data)
            root = root->right;
        else
            break;
    }
    return succ;
}
I'm quite certain this is right. Do correct me if I am wrong. Thanks.
