I had an interview yesterday that involved a pretty basic tree data structure:
t ::= int | (t * t)
where a tTree is either an integer (a leaf) or a pair of t's representing the left and right subtrees. This implies that the tree only has values at the leaf level.
An example tree could look like this:
        t
      /   \
     t     t
    / \   / \
   1   2 3   4
The task was to write a function equal(t, t) => bool that takes two tTrees and determines whether or not they are equal, pretty simple.
I wrote pretty standard code that turned out like this (below is pseudocode):
fun equal(a, b) {
    if a == b { // same memory address
        return true
    }
    if !a || !b {
        return false
    }
    // both leaves
    if isLeaf(a) && isLeaf(b) {
        return a == b
    }
    // both tTrees
    if isTree(a) && isTree(b) {
        return equal(a->leftTree, b->leftTree)
            && equal(a->rightTree, b->rightTree)
    }
    // otherwise
    return false
}
When asked to give the time and space complexity, I quickly answered:
O(n) time
O(1) space
My interviewer claimed they could create a tree such that this equal function would run in O(2^n) exponential time. I didn't (and still don't) see how this is possible given the algorithm above. I see that the function recursively calls itself twice, but the two calls examine disjoint subtrees, so between them they visit each node at most once, right?
Any thoughts or input on this would be really helpful.
As it stands, your code is O(n), and your interviewer was mistaken. The code's not O(1) in space use though: it's O(n) in the worst case (when the trees are very unbalanced) because your code is recursive (and not tail recursive).
Probably they were asking you to write a function that tests if two trees are isomorphic. That is, they were expecting you to write code that returns true when comparing these two trees:
      *         *
     / \       / \
    1   2     2   1
Then they misread your solution, assuming you'd written the naive code that does this, which would be O(2^n).
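For reference, here is what that naive isomorphism check might look like; this is a hedged sketch in Python (my own encoding: a tree is an int leaf or a (left, right) tuple), not code from the interview:

    # Naive isomorphism check: tries both pairings of children.
    def iso(a, b):
        if not isinstance(a, tuple) or not isinstance(b, tuple):
            return a == b                      # both must be equal leaves
        # Each call can spawn four recursive calls; on a degenerate
        # path-shaped tree this behaves like T(n) ~ 2*T(n-1), i.e. O(2^n).
        return (iso(a[0], b[0]) and iso(a[1], b[1])) or \
               (iso(a[0], b[1]) and iso(a[1], b[0]))

    print(iso((1, 2), (2, 1)))  # True: the two trees drawn above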
Another possibility is that some of the pointers could be reused in both the left and right branches of the same tree, allowing a tree with 2^n nodes to be represented in O(n) space. If 'n' is the size of the structure in memory, rather than the number of nodes, then the interviewer's position is correct. Here's such a tree:
  ___   ___   ___   ___   ___
 /   \ /   \ /   \ /   \ /   \
*     *     *     *     *     1
 \___/ \___/ \___/ \___/ \___/
The root is on the left; unfolded as a tree it has 2^5 = 32 leaf nodes (all 1), yet only six objects exist in memory.
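To make the sharing concrete, here is a sketch in Python (my own construction, not from the interview) of two such DAGs and why a structural equal blows up on them:

    # Each level reuses the same object for both children, so `depth`
    # interior nodes encode a "tree" with 2^depth leaves.
    def shared_tree(depth):
        node = 1
        for _ in range(depth):
            node = (node, node)   # left and right are the SAME object
        return node

    def equal(a, b):
        if a is b:                # identity short-circuit, as in the post
            return True
        if not isinstance(a, tuple) or not isinstance(b, tuple):
            return a == b
        return equal(a[0], b[0]) and equal(a[1], b[1])

    a, b = shared_tree(25), shared_tree(25)
    # equal(a, b) never triggers the identity check, because a's nodes are
    # distinct objects from b's nodes; it therefore walks all 2^25
    # root-to-leaf paths even though each DAG stores only 26 objects.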
Here is the problem description:
Given a binary tree, check whether it is a mirror of itself (ie, symmetric around its center).
For example, this binary tree [1,2,2,3,4,4,3] is symmetric:
        1
       / \
      2   2
     / \ / \
    3  4 4  3
But the following [1,2,2,null,3,null,3] is not:
       1
      / \
     2   2
      \   \
       3   3
Sourced from: Determine if tree is symmetric
I took a lot of time to solve the problem and the solution I came up with was to do a level order traversal and check that the values in each level are a palindrome. This implementation passed the tests on leetcode. However, when I read the editorial, I saw an extremely short recursive program and I have been having trouble getting my head around it.
public boolean isSymmetric(TreeNode root) {
    return isMirror(root, root);
}

public boolean isMirror(TreeNode t1, TreeNode t2) {
    if (t1 == null && t2 == null) return true;
    if (t1 == null || t2 == null) return false;
    return (t1.val == t2.val)
        && isMirror(t1.right, t2.left)
        && isMirror(t1.left, t2.right);
}
How can one prove the correctness of the above recursive version? (I guess this can be proved inductively?)
Can someone outline the thought process in coming up with such a solution. Do you verify the solution by actually visualizing the call stack or is there a good high level thinking framework to reason about such problems?
I understand that a tree is a recursive data structure in itself, i.e. composed of left and right subtrees that follow the same structure, but for some reason, when I try to verify the validity of this solution by visualizing the recursive calls, my thoughts eventually get entangled. This guy has done a good job of explaining how the call stack unrolls as the recursion proceeds, but I just want to improve my thought process for such "easy" recursive problems, hence this post.
(FWIW, I am familiar with recursion/DFS/backtracking and how the call flow is but still I was stuck coming up and validating the high level recursive idea for the above problem)
Thanks for helping out.
This is one of those problems that can be solved with a slick recursive algorithm. The idea is to maintain two references, both pointing at the root initially, then walk one into the left subtree and the other into the right subtree, mirroring the direction of traversal at each step so that the two references always point at symmetric nodes of the two subtrees (if they exist).

Here t1 and t2 refer to the left and right subtrees.
if (t1 == null && t2 == null) return true;

This accounts for the positions below leaf nodes, where both references are null: two absent subtrees mirror each other, so we return true.

if (t1 == null || t2 == null) return false;

Reaching this check means exactly one of the two references is null, i.e. one side has a subtree where the other has none, so the tree cannot be symmetric there and we return false.
return (t1.val == t2.val)
    && isMirror(t1.right, t2.left)
    && isMirror(t1.left, t2.right);
can be rewritten as:
if (t1.val != t2.val) return false;
auto left = isMirror(t1.right, t2.left);
auto right = isMirror(t1.left, t2.right);
return left && right;
Now that we know both subtrees are valid (i.e. not null), we check whether their values are the same. If not, we can return false immediately, as there is no point in looking further.

The reason we can compare the two values directly is that the references are always at mirrored positions: moving t1 into its right subtree while moving t2 into its left subtree (and vice versa) keeps each reference pointing at the node that should mirror the other.
        1   (t1, t2)
       / \
      2   2
     / \ / \
    4  5 5  4
After isMirror(t1.right, t2.left)
         1
        / \
  (t2) 2   2 (t1)
      / \ / \
     4  5 5  4
After calling isMirror(t1.right, t2.left) recursively again
          1
         / \
        2   2
       / \ / \
 (t2) 4  5 5  4 (t1)
Now this call will in turn recurse on its children, which returns true since they are both null. Then the values of t1 and t2 are compared, returning true or false. After that, isMirror(t1.left, t2.right) is called, which arrives here:
        1
       / \
      2   2
     / \ / \
    4  5 5  4
      (t2)(t1)
This does the same as the step above, and then the call stack unwinds.

So at each stack frame, left indicates whether the left subtree of t1 mirrors the right subtree of t2, and right indicates the opposite pairing.

Since we already checked t1.val == t2.val before recursing into the children, we know the roots match; if both child pairings match as well, the subtree at t1 mirrors the subtree at t2, so we return left && right.

If this gets a bit convoluted, trace it out on paper; that might clear things up a lot.
Hope this helps.
How can one prove the correctness of the above recursive version? (I
guess this can be proved inductively?)
Yes, you can prove it by induction.
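One way to set the induction up (a sketch in my own wording): let Mirror(t1, t2) mean that t1 and t2 are mirror images of each other as trees. Claim: isMirror(t1, t2) returns true exactly when Mirror(t1, t2) holds, by strong induction on the total number of nodes in t1 and t2. Base case: if both are null they are vacuously mirrors, and the code returns true; if exactly one is null they are not, and the code returns false. Inductive step: if both are non-null, then by definition Mirror(t1, t2) holds iff t1.val == t2.val, Mirror(t1.left, t2.right), and Mirror(t1.right, t2.left). The two recursive calls run on strictly fewer total nodes, so by the induction hypothesis they decide exactly those two sub-claims, and the conjunction the code returns therefore decides Mirror(t1, t2). Finally, a tree is symmetric iff Mirror(root, root), which is exactly what isSymmetric(root) computes.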
Can someone outline the thought process in coming up with such a
solution. Do you verify the solution by actually visualizing the call
stack or is there a good high level thinking framework to reason about
such problems?
I solved the problem both ways, with a level order traversal and with recursion. On seeing the problem, my first thought was level order traversal, and there is no shame in thinking of a sub-optimal solution (with extra space) on a first attempt. Then I figured out the recursive approach.

I think this is all about practice. The more recursion problems you solve, the more you start to see the corresponding recursion tree in your head. At the beginning I struggled with it and used a debugger to inspect the contents of the call stack at each step. But debugging is time consuming, and there is no room for it in whiteboard coding. At my current level, I can figure out easy/medium recursion problems in my head and hard ones by simulating on pen & paper or a whiteboard.

These things improve with experience and more practice, and experience with the specific data structure matters: a person who has seen and solved a lot of binary tree (pointer representation) and BST problems will likely do better here than a person who handles recursion very well but hasn't solved many binary tree problems.
Hope this helps!
I'm reading Algorithms, 4th edition, and I have some questions from chapter 3, Searching.

From the cost summary, the insert cost of BinarySearchST (2N in the worst case) is a little worse than that of SequentialSearchST (N in the worst case).
But the FrequencyCounter test with VisualAccumulator (which draws plots) shows:
Returning to the cost of the put() operations for FrequencyCounter for words of
length 8 or more, we see a reduction in the average cost from 2,246 compares (plus
array accesses) per operation for SequentialSearchST to 484 for BinarySearchST.
Shouldn't the put() operations of BinarySearchST need more compares(plus array accesses) than SequentialSearchST?
Another question, for BinarySearchST, the book says
Proposition B (continued). Inserting a new key into an ordered array of size N uses ~2N array accesses in the worst case, so inserting N keys into an initially empty table uses ~N^2 array accesses in the worst case.
When I look at the code of BinarySearchST, I think inserting a new key into an ordered array of size N uses ~ 4N
array accesses.
public void put(Key key, Value val) {
    if (key == null) throw new IllegalArgumentException("first argument to put() is null");
    if (val == null) {
        delete(key);
        return;
    }
    int i = rank(key);
    // key is already in table
    if (i < n && keys[i].compareTo(key) == 0) {
        vals[i] = val;
        return;
    }
    // insert new key-value pair
    if (n == keys.length) resize(2*keys.length);
    for (int j = n; j > i; j--) {
        keys[j] = keys[j-1];
        vals[j] = vals[j-1];
    }
    keys[i] = key;
    vals[i] = val;
    n++;
    assert check();
}
Because for every j in the loop there are 4 array accesses: 2 for reading and writing keys, and 2 for reading and writing vals. So why does Prop B say it uses ~2N array accesses?
Shouldn't the put() operations of BinarySearchST need more compares(plus array accesses) than SequentialSearchST?
The key thing to understand is where the complexity comes from in each of these two symbol table implementations. SequentialSearchST hits its worst case when the input key is not present, because then it takes N compares to discover the miss. Depending on the input text, this can happen quite often. And even when the key is already there, it takes N/2 compares on average to find it sequentially.

As for BinarySearchST, finding the key's position costs only log N compares in the worst case, so here the cost comes from resizing the array and/or shifting the existing elements to the right to make room for a new key. Notice that inserting a missing key takes about N/2 moves on average, while updating an existing key takes only about log N compares. The total running time therefore depends heavily on the distribution of the keys: if new keys keep arriving, the running time will be higher.

The test used the text of A Tale of Two Cities by Charles Dickens, taking only words of 8 letters or more. There are 14,350 such words, of which 5,737 are distinct. After 14,350 put() operations with 5,737 keys in the table, you would expect about 5737 / 2 ≈ 2868 compares per put() in SequentialSearchST; in fact it does a bit better, at 2,246 compares. BinarySearchST's running time depends significantly on whether a key is already present, and for this text there were far more O(log N) searches for existing keys than O(N) move sequences for new keys, which combined costs less than SequentialSearchST. Do not mix up average and worst case: this analysis relies on average-case behaviour for this specific input.
When I look at the code of BinarySearchST, I think inserting a new key
into an ordered array of size N uses ~ 4N array accesses.
The authors should have clarified their exact definition of "access". If referencing an array element counts as an access, then there are even more: 8N array accesses in the worst case, because you first have to resize the whole array (take a look at the implementation of resize()). Of course, the implementation could be rewritten to reduce the number of accesses in this case, by placing the new key in its correct position during the resize pass.
"Shouldn't the put() operations of BinarySearchST need more compares(plus array accesses) than SequentialSearchST?"
No, because earlier the book is talking about the WORST case. Worst and average cases are different, and the book's next sentence reads: "As before, this cost is even better than would be predicted by analysis, and the extra improvement is likely again explained by properties of the application ..."
"So why prop B says it uses ~2N array accesses?"
On that point I think you are right: formally there are 4N accesses. But what if we unroll the loop as:

keys[j] = keys[j-1];
keys[j-1] = keys[j-2];
keys[j-2] = keys[j-3];
...
keys[i+1] = keys[i];

would we still say it makes 4N accesses? I assume a JIT compiler is able to optimize the loop along those lines.

We can also observe that arrays live in linear memory and are fetched in page/cache-line-sized chunks, so an element accessed a moment ago is most likely still in cache, and reading it again is nearly free.
If a binary search tree is "balanced", there will be far fewer comparisons. For example (the numbers on the left are tree levels):

    1       d
          /   \
    2    b     f
        / \   / \
    3  a   c e   g
In the worst case "unbalanced", there will be more, on the same "order" as sequential. It's not a linear reduction when the tree is balanced, I think it's C * (ln(2) / ln(n+1)) or just O(log(N)) for short. So for millions of records there are much much less.
    1   a
         \
    2     b
           \
    3       c
             \
    4         d
               \
    5           e
                 \
    6             f
                   \
    7               g
If it's only a little unbalanced, the result will be somewhere in the middle.
    1       d
          /   \
    2    b     e
        / \     \
    3  a   c     f
                  \
    4              g
I'm not sure that your code is optimal, but if the book says there are twice as many operations in the worst case, it's probably accurate. Try to get it to 2x at each level if you're interested in the details for academic reasons.
I wouldn't worry about the value of C. You probably only want to use a BST if you know in advance it's going to be balanced or close to balanced, based on your insertion/update method, because O(N) will probably be catastrophic. Compare roughly ln(1,000,000,000,000 + 1) / ln(2) ≈ 40 comparisons against 1,000,000,000,000 of them.
The point about BinarySearchST vs SequentialSearchST performance in the average case was already covered in other responses.

Concerning the second question: the 2N is for a single array, which is evidently right. BinarySearchST uses two arrays, but either way, inserting N keys into an initially empty table costs ~N^2 array accesses; it only differs by a constant multiplier. Whether you count 2 + 4 + 6 + ... + 2N or twice that, you get ~N^2.
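Spelling that last sentence out as a formula:

    \sum_{k=1}^{N} 2k = N(N+1) \sim N^2

and doubling every term (to count both arrays) only changes the constant factor, never the ~N^2 growth.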
I do not understand the O(2^n) complexity that the recursive function for the Longest Common Subsequence algorithm has.
Usually, I can tie this notation with the number of basic operations (in this case comparisons) of the algorithm, but this time it doesn't make sense in my mind.
For example, take two strings, each of length 5. In the worst case the recursive function performs 251 comparisons, and 2^5 is not even close to that value.
Can anyone explain the algorithmic complexity of this function?
def lcs(xstr, ystr):
    global nComp
    if not xstr or not ystr:
        return ""
    x, xs, y, ys = xstr[0], xstr[1:], ystr[0], ystr[1:]
    nComp += 1
    #print("comparing", x, "with", y)
    if x == y:
        return x + lcs(xs, ys)
    else:
        return max(lcs(xstr, ys), lcs(xs, ystr), key=len)
To understand it properly, look at the diagram carefully and follow the recursive top-down approach while reading the graph.
Here, xstr = "ABCD"
ystr = "BAEC"
lcs("ABCD", "BAEC") // Here x != y
/ \
lcs("BCD", "BAEC") <-- x==y --> lcs("ABCD", "AEC") x==y
| |
| |
lcs("CD", "AEC") <-- x!=y --> lcs("BCD", "EC")
/ \ / \
/ \ / \
/ \ / \
lcs("D","AEC") lcs("CD", "EC") lcs("BCD", "C")
/ \ / \ / \
lcs("", "AEC") lcs("D","EC") lcs("CD", "C") lcs("BCD","")
| \ / \ | / |
Return lcs("", "EC") lcs("D" ,"C") lcs("D", "") lcs("CD","") Return
/ \ / \ / \ / \
Return lcs("","C") lcs("D","") lcs("","") Return lcs("D","") Return
/ \ / \ / / \
Return lcs("","") Return lcs("", "") Return
| |
Return Return
NOTE: Recursive calls are usually drawn as a tree, but here I used a graph just to compress the tree so the recursive calls can be followed at a glance (and, of course, it was easier for me to draw).

In the diagram there are redundant pairs, like lcs("CD", "EC"), which arises both from deleting "A" from "AEC" in lcs("CD", "AEC") and from deleting "B" from "BCD" in lcs("BCD", "EC"). Such pairs get computed more than once during execution, which drives up the running time of the program.

As you can see, every pair generates two subproblems at the next level until it hits an empty string or x == y. Each recursive call removes one character from one of the strings, so in the worst case the recursion tree has depth n+m (where n is the length of xstr and m the length of ystr) and branching factor 2, giving on the order of 2^(n+m) leaves.

Since n+m is an integer, call it N. The time complexity of the algorithm is therefore O(2^N), which is not efficient for larger values of N.

That is why we prefer the dynamic programming approach over plain recursion: it reduces the time complexity to O(n·m), which is O(n^2) when n == m.
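For comparison, here is a hedged sketch of such a memoized version in Python (the function names and the index-based formulation are mine); indexing instead of slicing keeps the cache keys small:

    from functools import lru_cache

    def lcs(xstr, ystr):
        @lru_cache(maxsize=None)
        def go(i, j):
            if i == len(xstr) or j == len(ystr):
                return ""
            if xstr[i] == ystr[j]:
                return xstr[i] + go(i + 1, j + 1)
            # Only n*m distinct (i, j) pairs exist, each solved once.
            return max(go(i, j + 1), go(i + 1, j), key=len)
        return go(0, 0)

    print(lcs("ABCD", "BAEC"))  # a longest common subsequence, length 2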
Even now, if you are having a hard time following the logic, I would suggest drawing a tree-like representation (not the graph I have shown here) for xstr = "ABC" and ystr = "EF". I hope you will understand it then.

Any doubts or comments are most welcome.
O(2^n) means the run time is proportional to 2^n for large enough n. It doesn't mean the count is bad, high, low, or anything specific for a small n, and it doesn't give a way to calculate the absolute run time.
To get a feel for the implication, you should consider the run-times for n = 1000, 2000, 3000, or even 1 million, 2 million, etc.
In your example, assuming that for n=5 the algorithm takes at most 251 iterations, the O(2^n) prediction is that for n=50 it would take on the order of 2^50 / 2^5 * 251 = 2^45 * 251 ≈ 8.8E15 iterations.
I have to come up with an efficient algorithm that takes a tree in this format:
        ?
       / \
      ?   ?
     / \ / \
    G  A A  A
and fills in the question mark nodes with the values that provide the least amount of mutations. The values can only be {A, C, T, G}. The tree will always have this same shape and amount of nodes. Also, it will always have the leaf nodes filled in and the remaining nodes will be question marks that need to be filled.
For instance, the tree on the right is correct and has fewer mutations than the one on the left.
        A             A
       / \           / \
      G   G         A   A
     / \ / \       / \ / \
    G  A A  A     G  A A  A
A mutation occurs when a parent node differs from a child node. So, the above left tree contains five mutations and the above right has one.
Can someone help me out by providing pseudocode? Thanks.
This looks like dynamic programming from the bottom of the tree up. For each node you want to work out the least-cost solution that leaves that node labelled A, C, T, or G, for each of those four possibilities, using the previously calculated costs for each possibility at the two nodes immediately below it. The code just to work out the cost might look a bit like this:

LeastCost(node, colourHere)
{
    if (isLeaf(node))
        return node.colour == colourHere ? 0 : infinity
    foreach colour
        leastLeft[colour] = LeastCost(node.leftChild, colour)
        leastRight[colour] = LeastCost(node.rightChild, colour)
    best = infinity
    foreach combination of (leftColour, rightColour)
        cost = leastLeft[combination.leftColour] +
               leastRight[combination.rightColour]
        if (combination.leftColour != colourHere)
            cost++
        if (combination.rightColour != colourHere)
            cost++
        if (cost < best)
            best = cost
    return best
}
To return the best answer as well as the best cost you need to keep track of the combination corresponding to the best answer as well. Come to think about it, you can save time by working out the answers for all four colours at each node at the same time.
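If it helps, here is the same bottom-up idea as runnable Python; this is a sketch under my own encoding (a tree is either a leaf character or a (left, right) pair), not a canonical implementation:

    BASES = "ACTG"

    def least_cost(node):
        """Map each base to the fewest mutations if this node gets that base."""
        if isinstance(node, str):            # leaf: its value is fixed
            return {b: (0 if b == node else float("inf")) for b in BASES}
        left, right = node
        cl, cr = least_cost(left), least_cost(right)
        cost = {}
        for here in BASES:
            # Best label for each child, paying 1 whenever it differs.
            best_l = min(cl[b] + (b != here) for b in BASES)
            best_r = min(cr[b] + (b != here) for b in BASES)
            cost[here] = best_l + best_r
        return cost

    tree = (("G", "A"), ("A", "A"))          # the example from the question
    costs = least_cost(tree)
    print(min(costs, key=costs.get), costs)  # 'A' with 1 mutation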
I have multiple binary trees stored as an array. In each slot is either nil (or null; pick your language) or a fixed tuple storing two numbers: the indices of the two "children". No node will have only one child -- it's either none or two.
Think of each slot as a binary node that only stores pointers to its children, and no inherent value.
Take this system of binary trees:
         0                1
        / \              / \
       2   3            4   5
      / \                  / \
     6   7                8   9
        / \
      10   11
The associated array would be:
0 1 2 3 4 5 6 7 8 9 10 11
[ [2,3] , [4,5] , [6,7] , nil , nil , [8,9] , nil , [10,11] , nil , nil , nil , nil ]
I've already written a simple function to find the direct parent of a node (simply by searching from the front until a slot is found that contains the child).

Furthermore, let us say that at relevant times, the trees are anywhere from a few to a few thousand levels deep.

I'd like to find a function

P(m,n)

that finds the lowest common ancestor of m and n. To put it more formally, the LCA is the "lowest", or deepest, node that has both m and n as descendants (children, or children of children, etc.). If there is none, nil is a valid return value.
Some examples, given our given tree:
P( 6,11) # => 2
P( 3,10) # => 0
P( 8, 6) # => nil
P( 2,11) # => 2
The main method I've been able to find uses an Euler tour: adding node A as the invisible parent of 0 and 1, with a "value" of -1, it turns the given tree into:
A-0-2-6-2-7-10-7-11-7-2-0-3-0-A-1-4-1-5-8-5-9-5-1-A
From that, simply find the node with the lowest number between your given m and n on the tour. For example, to find P(6,11), look for a 6 and an 11 on the tour; the lowest number between them is 2, and that's your answer. If A (-1) appears between them, return nil.
-- Calculating P(6,11) --
A-0-2-6-2-7-10-7-11-7-2-0-3-0-A-1-4-1-5-8-5-9-5-1-A
      ^ ^        ^
      | |        |
      m lowest   n
Unfortunately, computing the Euler tour of a tree that can be several thousand levels deep is rather machine-taxing, and because my trees are constantly changing throughout the program, I'd have to recompute the tour and hold it in memory every time I wanted an LCA.

Is there a more memory-efficient way, given the framework I'm using? One that iterates upwards, perhaps? One way I can think of is to "count" the generation/depth of both nodes, climb the deeper one up until it matches the depth of the shallower one, and then advance both together until they meet.

But that would involve climbing from level, say, 3025 back to 0, twice, just to count the generations, using a terribly inefficient climbing algorithm in the first place, and then climbing back up again.
Are there any other better ways?
Clarifications
In the way this system is built, every child will have a number greater than their parents.
This does not guarantee that if n is in generation X, there are no nodes in generation (X-1) that are greater than n. For example:
0
/ \
/ \
/ \
1 2 6
/ \ / \ / \
2 3 9 10 7 8
/ \ / \
4 5 11 12
is a valid tree system.
Also, an artifact of the way the trees are built are that the two immediate children of the same parent will always be consecutively numbered.
Are the nodes ordered as in your example, where children have a larger id than their parent? If so, you might be able to do something similar to a merge step to find the LCA. For your example, the parent chains of 6 and 11 are:
6 -> 2 -> 0
11 -> 7 -> 2 -> 0
So perhaps the algorithm would be:
left = left_start
right = right_start
while left != nil and right != nil
    if left == right
        return left
    else if left > right
        left = parent(left)
    else
        right = parent(right)
return nil

(The larger index always climbs, since every child has a larger number than its parent; if either walk runs off a root, there is no common ancestor.)
Which would run as:
left right
---- -----
6 11 (right -> 7)
6 7 (right -> 2)
6 2 (left -> 2)
2 2 (return 2)
Is this correct?
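For concreteness, a runnable sketch of the same walk-up in Python (the names are mine, and parent() is the linear scan the asker says they already have):

    def parent(tree, child):
        for i, slot in enumerate(tree):
            if slot is not None and child in slot:
                return i
        return None                      # child is a root

    def lca(tree, left, right):
        while left is not None and right is not None:
            if left == right:
                return left
            if left > right:             # children outnumber their parents,
                left = parent(tree, left)    # so the larger index climbs
            else:
                right = parent(tree, right)
        return None                      # ran off a root: different trees

    tree = [[2,3],[4,5],[6,7],None,None,[8,9],None,[10,11],
            None,None,None,None]
    print(lca(tree, 6, 11))   # 2
    print(lca(tree, 3, 10))   # 0
    print(lca(tree, 8, 6))    # None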
Maybe this will help: Dynamic LCA Queries on Trees.
Abstract:
Richard Cole, Ramesh Hariharan

We show how to maintain a data structure on trees which allows for the following operations, all in worst-case constant time: 1. Insertion of leaves and internal nodes. 2. Deletion of leaves. 3. Deletion of internal nodes with only one child. 4. Determining the Least Common Ancestor of any two nodes.

Conference: Symposium on Discrete Algorithms - SODA 1999
I've solved your problem in Haskell. Assuming you know the roots of the forest, the solution takes time linear in the size of the forest and constant additional memory. You can find the full code at http://pastebin.com/ha4gqU0n.
The solution is recursive, and the main idea is that you can call a function on a subtree which returns one of four results:
The subtree contains neither m nor n.
The subtree contains m but not n.
The subtree contains n but not m.
The subtree contains both m and n, and the index of their least common ancestor is k.
A node without children may contain m, n, or neither, and you simply return the appropriate result.
If a node with index k has two children, you combine the results as follows:
join :: Int -> Result -> Result -> Result
join _ (HasBoth k) _ = HasBoth k
join _ _ (HasBoth k) = HasBoth k
join _ HasNeither r = r
join _ r HasNeither = r
join k HasLeft HasRight = HasBoth k
join k HasRight HasLeft = HasBoth k
After computing this result you have to check the index k of the node itself; if k is equal to m or n, you will "extend" the result of the join operation.
My code uses algebraic data types, but I've been careful to assume you need only the following operations:
Get the index of a node
Find out if a node is empty, and if not, find its two children
Since your question is language-agnostic I hope you'll be able to adapt my solution.
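For instance, a hedged Python adaptation of the same four-result idea could look like this (the names and the "both"-tuple encoding are mine, matching the array layout from the question):

    # Returns "neither", "m", "n", or ("both", k) for the subtree at idx.
    def search(tree, idx, m, n):
        found = "m" if idx == m else "n" if idx == n else "neither"
        slot = tree[idx]
        if slot is None:                 # leaf: report what we saw
            return found
        l = search(tree, slot[0], m, n)
        r = search(tree, slot[1], m, n)
        for res in (l, r):
            if isinstance(res, tuple):   # LCA already found further down
                return res
        seen = {l, r, found} - {"neither"}
        if {"m", "n"} <= seen:
            return ("both", idx)         # this node is the LCA
        return next(iter(seen), "neither")

    def lca(tree, roots, m, n):
        for root in roots:
            res = search(tree, root, m, n)
            if isinstance(res, tuple):
                return res[1]
        return None

    tree = [[2,3],[4,5],[6,7],None,None,[8,9],None,[10,11],
            None,None,None,None]
    print(lca(tree, [0, 1], 6, 11))  # 2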
There are various performance tweaks you could put in. For example, if you find a root that has exactly one of the two nodes m and n, you can quit right away, because you know there's no common ancestor. Also, if you look at one subtree and it has the common ancestor, you can ignore the other subtree (that one I get for free using lazy evaluation).
Your question was primarily about how to save memory. If a linear-time solution is too slow, you'll probably need an auxiliary data structure. Space-for-time tradeoffs are the bane of our existence.
I think that you can simply loop backwards through the array, always replacing the higher of the two indices by its parent, until they are either equal or no further parent is found:
(defun lowest-common-ancestor (array node-index-1 node-index-2)
  (cond ((or (null node-index-1)          ; walked past a root:
             (null node-index-2))         ; no common ancestor
         nil)
        ((= node-index-1 node-index-2)    ; the walks met: that's the LCA
         node-index-1)
        ((< node-index-1 node-index-2)    ; children always outnumber their
         (lowest-common-ancestor          ; parents, so climb from the
          array                           ; larger index
          node-index-1
          (find-parent array node-index-2)))
        (t
         (lowest-common-ancestor
          array
          (find-parent array node-index-1)
          node-index-2))))