How find the longest consecutive path in a binary tree - algorithm

I was asked in an interview this question. Consider a binary tree, we need to print the length of the longest path, where each element differs by 1.
EG:
6
/ \
5 7
/ \ / \
2 4 8 9
answer: 5
( 4,5,6,7,8 )
How to do this?
I developed an algoirthm to print increasing path from root to leaf, but I was not to develop one that keeps track of path that's on both subtrees.
EDIT: Need to get back the original tree after modification.

As suggested by #qwertyman in the comments
remove all invalid edges i.e edges whose difference is greater than 1
Now we have a forest, for each forest calculate the diameter as it is given in #Filip Kočica solution
The answer would be the max diameter out of all forests

For each subtree, you can calculate the longest increasing path down from the subtree root, the longest decreasing path down, and the longest internal path consisting of the increasing and decreasing paths down from the same node anywhere in the subtree.
It's easy to calculate these for a node if you already have them for all of its children, so you can do it as part of any postorder traversal.
The answer is the longest internal path within the whole tree.

Let longest_desc[a] be the longest 1-by-1 descending path going down from a
Similarly longest_asc[a], the longest 1-by-1 incremental path going down from a
For a fixed root R, the answer would be longest_desc[R] + longest_asc[R] - 1.
A brut force solution would do 2 dfs/bfs traversals from each node X to compute longest_asc[X] and longest_desc[X] and then merge them together. The resulting runtime complexity would be O(n^2).
But we can actually do better using dynamic programming:
longest_asc[X] = max(longest_asc[Y in children[X]] with Y = X + 1)
longest_desc[X] = max(longest_desc[Y in children[X]] with Y = X - 1)
Then we can compute all the values in a single DFS traversal => O(n) solution.

The answer is incorrect - another user correctly pointed out a bug. My solution below works only when the max length path passes through the root. In case, for example, the max length path is entirely in the left subtree and does not pass through the root, this answer fails. Feel free to read further to acquaint yourself with a recursive solution... and the bug in it.
I'm assuming that it is not important that the path has to have a difference of +1 as shown in your example. A difference of -1, resulting in a path like 4 -> 5 -> 4 -> 3 -> 4 -> 5 is ok as well.
public int getLongestConsecutivePath(TreeNode root) {
return root == null
? 0
: getLength(root.left, root.value) + getLength(root.right, root.value);
}
private int getLength(TreeNode node, int prevVal) {
return node == null || Math.abs(node.value - prevVal) > 1
? 0
: Math.max(getLength(node.left, node.value), getLength(node.right, node.value)) + 1;
}
Explanation:
If the root is not null, we get the max length in left and right subtree and sum it.
To get max length in a subtree, we recursively get the max length of right and left subtree of the subtree.
If we have reached the leaf OR if we have reached a node where the difference in value is greater than 1, we return 0.
Else we recursively get the max length from the left and right subtree and add 1 to it to accommodate for this node itself.

Related

Generate all the leaf to leaf path in an n-array tree

Given an N-ary tree, I have to generate all the leaf to leaf paths in an n-array tree. The path should also denote the direction. As an example:
Tree:
1
/ \
2 6
/ \
3 4
/
5
Paths:
5 UP 3 UP 2 DOWN 4
4 UP 2 UP 1 DOWN 6
5 UP 3 UP 2 UP 1 DOWN 6
These paths can be in any order, but all paths need to be generated.
I kind of see the pattern:
looks like I have to do in order traversal and
need to save what I have seen so far.
However, can't really come up with an actual working algorithm.
Can anyone nudge me to the correct algorithm?
I am not looking for the actual implementation, just the pseudo code and the conceptual idea would be much appreciated.
The first thing I would do is to perform in-order traversal. As a result of this, we will accumulate all the leaves in the order from the leftmost to the rightmost nodes.(in you case this would be [5,4,6])
Along the way, I would certainly find the mapping between nodes and its parents so that we can perform dfs later. We can keep this mapping in HashMap(or its analogue). Apart from this, we will need to have the mapping between nodes and its priorities which we can compute from the result of the in-order traversal. In your example the in-order would be [5,3,2,4,1,6] and the list of priorities would be [0,1,2,3,4,5] respectively.
Here I assume that our node looks like(we may not have the mapping node -> parent a priori):
class TreeNode {
int val;
TreeNode[] nodes;
TreeNode(int x) {
val = x;
}
}
If we have n leaves, then we need to find n * (n - 1) / 2 paths. Obviously, if we have managed to find a path from leaf A to leaf B, then we can easily calculate the path from B to A. (by transforming UP -> DOWN and vice versa)
Then we start traversing over the array of leaves we computed earlier. For each leaf in the array we should be looking for paths to leaves which are situated to the right of the current one. (since we have already found the paths from the leftmost nodes to the current leaf)
To perform the dfs search, we should be going upwards and for each encountered node check whether we can go to its children. We should NOT go to a child whose priority is less than the priority of the current leaf. (doing so will lead us to the paths we already have) In addition to this, we should not visit nodes we have already visited along the way.
As we are performing dfs from some node, we can maintain a certain structure to keep the nodes(for instance, StringBuilder if you program in Java) we have come across so far. In our case, if we have reached leaf 4 from leaf 5, we accumulate the path = 5 UP 3 UP 2 DOWN 4. Since we have reached a leaf, we can discard the last visited node and proceed with dfs and the path = 5 UP 3 UP 2.
There might be a more advanced technique for solving this problem, but I think it is a good starting point. I hope this approach will help you out.
I didn't manage to create a solution without programming it out in Python. UNDER THE ASSUMPTION that I didn't overlook a corner case, my attempt goes like this:
In a depth-first search every node receives the down-paths, emits them (plus itself) if the node is a leaf or passes the down-paths to its children - the only thing to consider is that a leaf node is a starting point of a up-path, so these are input from the left to right children as well as returned to the parent node.
def print_leaf2leaf(root, path_down):
for st in path_down:
st.append(root)
if all([x is None for x in root.children]):
for st in path_down:
for n in st: print(n.d,end=" ")
print()
path_up = [[root]]
else:
path_up = []
for child in root.children:
path_up += child is not None and [st+[root] for st in print_root2root(child, path_down + path_up)] or []
for st in path_down:
st.pop()
return path_up
class node:
def __init__(self,d,*children):
self.d = d
self.children = children
## 1
## / \
## 2 6
## / \ /
## 3 4 7
## / / | \
## 5 8 9 10
five = node(5)
three = node(3,five)
four = node(4)
two = node(2,three,four)
eight = node(8)
nine = node(9)
ten = node(10)
seven = node(7,eight,nine,ten)
six = node(6,None,seven)
one = node(1,two,six)
print_leaf2leaf(one,[])

Job Interview Question Using Trees, What data to save?

I was solving the following job interview question and solved most of it but failed at the last requirement.
Q: Build a data structure which supports the following functions:
Init - Initialise Empty DS. O(1) Time complexity.
SetPositiveInDay(d,x) - Add to the DS that in day d exactly x new people were infected with covid-19. O(log n)Time complexity.
WorseBefore(d) - From the days inserted into the DS and smaller than d return the last one which has more newly infected people than d. O(log n)Time complexity.
For example:
Init()
SetPositiveInDay(1,10)
SetPositiveInDay(2,20)
SetPositiveInDay(3,15)
SetPositiveInDay(5,17)
SetPositiveInDay(23,180)
SetPositiveInDay(8,13)
SetPositiveInDay(13,18)
WorstBefore(13) // Returns day #2
SetPositiveInDay(10,19)
WorstBefore(13) // Returns day #10
Important note: you can't suppose that days will be entered by order and can't suppose too that there won't be "gaps" between days. (Some days may not be saved in the DS while those after it may be).
What I did?
I used AVL tree (I could use 2-3 tree too).
For each node I have:
Sick - Number of new infected people in that day.
maxLeftSick - Max number of infected people for left son.
maxRightSick - Max number of infected people for right son.
When inserted a new node I made sure that in rotation data won't get missed plus, for each single node from the new one till the root I did:
But I wasn't successful implementing WorseBefore(d).
Where to search?
First you need to find the node node corresponding to d in the tree ordered by days. Let x = Sick(node). This can be done in O(log n).
If maxLeftSick(node) > x, the solution must be in the left subtree of node. Search for the solution there and return the answer. This can be done in O(log n) - see below.
Otherwise, traverse the tree upwards towards the root, starting from node, until you find the first node nextPredecessor satisfying this property (this takes O(log n)):
nextPredecessor is smaller than node,
and either
Sick(nextPredecessor) > x or
maxLeftSick(nextPredecessor) > x.
If no such node exists, we give up. In case 1, just return nextPredecessor since that is the best solution.
In case 2, we know that the solution must be in the left subtree of nextPredecessor, so search there and return the answer. Again, this takes O(log n) - see below.
Note that there is no need to search in the right subtree of nextPredecessor since the only nodes that are smaller than node in that subtree would be the left subtree of node itself, and we have already excluded that.
Note also that it is not necessary to traverse further up the tree than nextPredecessor since those nodes are even smaller, and we are looking for the largest node satisfying all constraints.
How to search?
OK, so how do we search for the solution in a subtree? Finding the largest day within a subtree rooted in q that is worse than an infection number x is simple using the maxLeftSick and maxRightSick information:
If q has a right child and maxRightSick(q) > x then search in the right subtree of q.
If q has no right child and Sick(q) > x, return Day(q).
If q has a left child and maxLeftSick(q) > x then search in the left subtree of q.
Otherwise there is no solution within the subtree q.
We are effectively using maxLeftSick and maxRightSick to prune the search tree to include only "worse" nodes, and within that pruned tree we get the right most node, i.e. the one with the largest day.
It is easy to see that this algorithm runs in O(log n) where n is the total number of nodes since the number of steps is bounded by the height of the tree.
Pseudocode
Here is the pseudocode (assuming maxLeftSick and maxRightSick return -1 if no corresponding child node exists):
// Returns the largest day smaller than d such that its
// infection number is larger than the infection number on day d.
// Returns -1 if no such day exists.
int WorstBefore(int d) {
node = find(d);
// try to find the solution in the left subtree
if (maxLeftSick(node) > Sick(node)) {
return FindLastWorseThan(node -> left, Sick(node));
}
// move up towards root until we find the first node
// that is smaller than `node` and such that
// Sick(nextPredecessor) > Sick(node) or
// maxLeftSick(nextPredecessor) > Sick(node).
nextPredecessor = findNextPredecessor(node);
if (nextPredecessor == null) return -1;
// Case 1
if (Sick(nextPredecessor) > Sick(node)) return nextPredecessor;
// Case 2: maxLeftSick(nextPredecessor) > Sick(node)
return FindLastWorseThan(nextPredecessor -> left, Sick(node));
}
// Finds the latest day within the given subtree with root "node" where
// the infection number is larger than x. Runs in O(log(size(q)).
int FindLastWorseThan(Node q, int x) {
if ((q -> right) = null and Sick(q) > x) return Day(q);
if (maxRightSick(q) > x) return FindLastWorseThan(q -> right, x);
if (maxLeftSick(q) > x) return FindLastWorseThan(q -> left, x);
return -1;
}
First of all, your chosen data structure looks fine to me. You did not mention it explicitly, but I assume that the "key" you use in the AVL tree is the day number, i.e. an in-order traversal of the tree would list the nodes in their chronological order.
I would just suggest a cosmetic change: store the maximum value of sick in the node itself, so that you don't have two similar informations (maxLeftSick and maxRightSick) stored in one node instance, but move those two informations to the child nodes, so that your node.maxLeftSick is actually stored in node.left.maxSick, and similarly node.maxRightSick is stored in node.right.maxSick. This is of course not done when that child does not exist, but then we don't need that information either. In your structure maxLeftSick would be 0 when left is not defined. In my proposed structure, you would not have that value -- the 0 would follow naturally from the fact that there is no left child. In my proposal, the root node would have an information in maxSick which is not present in yours, and which would be the sum of your root.maxLeftSick and root.maxRightSick. This information would not really be used, but it is just there to make the structure consistent throughout the tree.
So you would just store one maxSick, which considers the current node's sick value also in that maximum. The processing you do during rotations will need to change accordingly, but will not become more complex.
I will assume that your AVL tree is single-threaded, i.e. you don't keep track of parent-pointers. So create a find method which will return the path to the node to be found. For instance, in Python syntax, it could look like this:
def find(self, day):
node = self.root
path = [] # an array of nodes
while node:
path.append(node)
if node.day == day: # bingo
return path
if day < node.day:
node = node.left
else:
node = node.right
Then the worstBefore method could look like this:
def worstBefore(self, day):
path = self.find(day)
if not path:
return # day not found
# get number of sick people on that day:
sick = path[-1].sick
# look for recent day with greater number of sick
while path:
node = path.pop() # walk upward, starting with found node
if node.day < day and node.sick > sick:
return node.day
if node.left and node.left.maxSick > sick:
# we will find the result in this subtree
node = node.left
while True:
if node.right and node.right.maxSick > sick:
node = node.right
elif node.sick > sick: # bingo
return node.day
else:
node = node.left
So the path returned by the find method will be used to get the parents of a node when you need to backtrack upwards in the tree along that path.
If along that path you find a left child whose maxSick is greater, then you know that the targeted node must be in that subtree. It is then a matter to walk down that subtree in a controlled way, choosing the right child when it still has maxSick greater. Otherwise check the current node's sick value and return that one if that value is greater. Otherwise go left, and repeat.
While there is no such left sub tree, go up along the path. If that parent would be a match, then return it (make sure to verify the day number). Keep checking for left sub trees that have a larger maxSick.
This runs in O(logn) because you first will walk zero or more steps upward and then zero or more steps downward (in a left subtree).
You can see your example scenario run on repl.it. There I focussed on this question, and didn't implement the rotations.

Algorithm for finding the size of the largest connected region of nodes of the same value in a tree

This is a question I got in an interview, and I'm still not fully sure how to solve it.
Let's say we have a tree of numbers, and we want to find the size of the largest connected region in the tree whose nodes have the same value. For example, in this tree
3
/ \
3 3
/ \ / \
1 2 3 4
The answer is 4, because you have a region of 4 connected 3s.
I would suggest a depth first search with a function that takes two inputs:
A target value
A start node
and returns two outputs:
the size of the subtree of node with values equal to the target value
the largest size of connected region within the subtree of node
You can then call this function with a dummy target value (e.g. -1) and the root node and it will return the answer in the second output.
In pseudocode:
dfs(target_value,start_node):
if start_node.value == target_value:
total = 1
best = 0
for each child of start_node:
x,m = dfs(target_value,child)
best = max(m,best)
total += x
return total,best
else
x,m = dfs(start_node.value,start_node)
return 0,max(x,m)
_,ans = dfs(-1, root_node)
print ans
Associate a counter with each node to represent the largest connected region rooted at that node where all the nodes are the same value. Initialize this counter to 1 for every node.
Run DFS on the tree.
When you back up from any node, if both nodes have the same value, add the child node's counter to the parent's counter.
When you're done, the largest counter associated with a node is your answer. You can keep track of this as you run the algorithm.

Algorithm to find the maximum non-adjacent sum in n-ary tree

Given an n-ary tree of integers, the task is to find the maximum sum of a subsequence with the constraint that no 2 numbers in the sequence should share a common edge in the tree.
Example:
1
/ \
2 5
/ \
3 4
Maximum non adjacent sum = 3 + 4 + 5 = 12
The following is the faulty extension of the algorithm outlined in http://www.geeksforgeeks.org/maximum-sum-such-that-no-two-elements-are-adjacent?
def max_sum(node, inc_sum, exc_sum):
for child in node.children:
exc_new = max(inc_sum, exc_sum)
inc_sum = exc_sum + child.val
exc_sum = exc_new
inc_sum, exc_sum = max(max_sum(child, inc_sum, exc_sum),
max_sum(child, inc_sum, inc_sum - node.val))
return exc_sum, inc_sum
But I wasn't sure if swapping exc_sum and inc_sum while returning is the right way to achieve the result and how do I keep track of the possible sums which can lead to a maximum sum, in this example, the maximum sum in the left subtree is (1+3+4) whereas the sum which leads to the final maximum is (3+4+5), so how should (3+4) be tracked? Should all the intermediary sums stored in a table?
Lets say dp[u][select] stores the answer: maximum sub sequence sum with no two nodes having edge such that we consider only the sub-tree rooted at node u ( such that u is selected or not ). Now you can write a recursive program where state of each recursion is (u,select) where u means root of the sub graph being considered and select means whether or not we select node u. So we get the following pseudo code
/* Initialize dp[][] to be -1 for all values (u,select) */
/* Select is 0 or 1 for false/true respectively */
int func(int node , int select )
{
if(dp[node][select] != -1)return dp[node][select];
int ans = 0,i;
// assuming value of node is same as node number
if(select)ans=node;
//edges[i] stores children of node i
for(i=0;i<edges[node].size();i++)
{
if(select)ans=ans+func(edges[node][i],1-select);
else ans=ans+max(func(edges[node][i],0),func(edges[node][i],1));
}
dp[node][select] = ans;
return ans;
}
// from main call, root is root of tree and answer is
// your final answer
answer = max(func(root,0),func(root,1));
We have used memoization in addition to recursion to reduce time complexity.Its O(V+E) in both space and time. You can see here a working version of
the code Code. Click on the fork on top right corner to run on test case
4 1
1 2
1 5
2 3
2 4
It gives output 12 as expected.
The input format is specified in comments in the code along with other clarifications. Its in C++ but there is not significant changes if you want it in python once you understand the code. Do post in comments if you have any doubts regarding the code.

Least distance between two values in a large binary tree with duplicate values

Given a binary tree that might contain duplicate values, you need to find minimum distance between two given values. Note that the binary tree can be large.
For example:
5
/ \
1 7
/ \ / \
4 3 8 2
/ \
1 2
The function should return 2 for (1 and 2 as input).
(If duplicates are not present, we can find LCA and then calculate the distance.)
I've written the following code but I couldn't handle cases when the values are present in different subtrees and in the below cases:
root = 1, root.left = 4, root.left.left = 3, root.left.right = 2, root.left.left.left = 1
root = 1, root.left = 4, root.left.left = 3, root.left.left.left = 1, root.left.left.right = 2
void dist(struct node* root,int& min,int n1,int n2,int pos1,int pos2,int level) {
if(!root)
return;
if(root->data==n1){
pos1 = level;
if(pos2>=0)
if(pos1-pos2 < min)
min = pos1-pos2;
}
else if(root->data==n2){
pos2 = level;
if(pos1>=0)
if(pos2-pos1 < min)
min = pos2-pos1;
}
dist(root->left,min,n1,n2,pos1,pos2,level+1);
dist(root->right,min,n1,n2,pos1,pos2,level+1);
}
I think at each node we can find if that node is the LCA of the values or not. If that node is LCA then find the distance and update min accordingly, but this would take O(n2).
Following is an algorithm to solve the problem:-
traverse all of the tree and calculate paths for each node using binary strings representation and store into hash map
eg. For your tree the hashmap will be
1 => 0,000
2 => 001,11
3 => 01
...
When query for distance between (u,v) check for each pair and calculate distance between them. Remove common prefix from strings and then sum the remaining lengths
eg. u=1 and v=2
distance(0,001) = 2
distance(0,11) = 3
distance(000,001) = 2
distance(000,11) = 5
min = 2
Note: I think the second step can be made more efficient but need to do more research
You can compute the LCA of a set of nodes by computing LCA(x1, LCA(x2, LCA(x3... and all the nodes in the set will be somewhere below this LCA. If you compare the LCAs of two sets of nodes and one is not directly beneath the other then the minimum distance between any two nodes in different sets will be at least the distance between the LCAs. If one LCA is above the other then the minimum distance could be zero.
This allows a sort of branch and bound approach. At each point you have a best minimum distance so far (initialized as infinity). Given two sets of nodes, use their LCAs to work out a lower bound on their minimum distance and discard them if this is no better than the best answer so far. If not discarded, split each set into two plus a possible single depending on whether each node in the set is to the left of the LCA, to the right of the LCA, or is the LCA. Recursively check for the minimum distance in the (up to nine) pairs of split sets. If both splits in a pair are below some minimum size, just work out the LCAs and minimum distances of each pair of nodes across the two sets - at this point may find out that you have a new best answer and can update the best answer so far.
Looking at the example at the top of the question, the LCA of the 2s is the root of the tree, and the LCA of the 1s is the highest 1. So the minimum distance between these two sets could be close to zero. Now split each set in two. The left hand 2 is distance two from both of the two 1s. The LCA of the right hand 2 is itself, on the right hand branch of the tree, and the LCA of each of the two 1s is down on the left hand branch of the tree. So the distance between the two is at least two, and we could tell this even if we had a large number of 2s anywhere below the position of the existing right-hand two, and a large number of 1s anywhere on the left hand subtree.
Do a pre-order traversal of the tree (or any traversal should work).
During this process, simply keep track of the closest 1 and 2, and update the distance whenever you find a 2 and the closest 1 is closer than the closest distance so far, or vice versa.
Code (C++, untested first draft): (hardcoded 1 and 2 for simplicity)
int getLeastDistance(Node *n, int *distTo1, int *distTo2)
{
if (n == NULL)
return;
int dist = LARGE_VALUE;
// process current node
if (n->data == 1)
{
dist = *distTo2;
*distTo1 = 0;
}
else if (n->data == 2)
{
dist = *distTo1;
*distTo2 = 0;
}
// go left
int newDistTo1 = *distTo1 + 1,
newDistTo2 = *distTo2 + 1;
dist = min(dist, getLeastDistance(n->left, &newDistTo1, &newDistTo2));
// update distances
*distTo1 = min(*distTo1, newDistTo1 + 1);
*distTo2 = min(*distTo2, newDistTo2 + 1);
// go right
newDistTo1 = *distTo1 + 1;
newDistTo2 = *distTo2 + 1;
dist = min(dist, getLeastDistance(n->right, &newDistTo1, &newDistTo2));
}
Caller:
Node root = ...;
int distTo1 = LARGE_VALUE, distTo2 = LARGE_VALUE;
int dist = getLeastDistance(&root, &distTo1, &distTo2);
Just be sure to make LARGE_VALUE far enough from the maximum value for int such that it won't overflow if incremented (-1 is probably safer, but it requires more complex code).

Resources