Time complexity of a tree related problem - algorithm

I am struggling to figure out the time complexity of the following problem (this is not homework, just something I came up with and can't understand).
Suppose you have an arbitrary tree. The algorithm is such that for every node in the tree you have to run some O(1) operation as many times as that node's number of leaf descendants. So, in the example tree below, we would run 2 operations for node A and 6 operations for the root node R.
Let's say you have n nodes, the tree is of depth d, and you may use any other notation necessary. What is the complexity?
I can't quite wrap my head around this. Surely it is less than O(n^2) but how do I approach this? Thank you!
Edit: leaf descendant of a node is a descendant that does not have any children. A descendant is a node reachable by repeated proceeding from parent to child (doesn't matter if it's an internal or a leaf node)

It's Ө(n^2). Obviously, as you noted, it's in O(n^2) because each node must have fewer than n descendant leaves.
In a tree with a construction like this:
A
/ \
B C
/ \
D E
/ \
F G
...
The top-most n/4 internal nodes have at least n/4 descendant leaves, so the total number of operations is at least n^2/16, which is in Ω(n^2).
If you have a depth limit d, then each node can have at most d ancestors, so you get O(n*min(d,n)), which is also tight by a similar construction.

I think it will be O(2(N - Leaf) + Leaf) where Leaf is the number of descendants of the tree. O(2(N - Leaf)) is required to iterate over the tree to find the leaf descendants and a O(1) operation needs to be performed on each of them.

Related

Running time to check if a binary tree is subtree of another binary tree

I've come across a naive solution for the problem of checking if a binary tree is subtree of another binary tree:
Given two binary trees, check if the first tree is subtree of the second one. A subtree of a tree T is a tree S consisting of a node in T and all of its descendants in T. The subtree corresponding to the root node is the entire tree; the subtree corresponding to any other node is called a proper subtree.
For example, in the following case, tree S is a subtree of tree T:
Tree 2
10
/ \
4 6
\
30
Tree 1
26
/ \
10 3
/ \ \
4 6 3
\
30
The solution is to traverse the tree T in preorder fashion. For every visited node in the traversal, see if the subtree rooted with this node is identical to S.
It is said in the post that the algorithm has a running time of n^2 or O(m*n) in the worst case where m and n are the sizes of both trees involved.
The point of confusion here is that, if we are traversing through both trees at the same time, in the worst case, it would seem that you would simply have to recurse through all of the nodes in the larger tree to find the subtree. So how could this version of the algorithm (not this one) have a quadratic running time?
Well, basically in the isSubTree() function you only traverse T tree (the main one, not a subtree). You do nothing with S, so in the worst case this function would be executed for every node in T. However (in the worst case) for each execution, it will check if areIdentical(T, S), which in the worst case has to fully traverse one of the given trees (till one of those is zero-sized).
Trees passed to areIdentical() function are obviously smaller and smaller, but in this case it doesn't matter if it comes to time complexity. Either way this gives you O(n^2) or O(n*m) (where n,m - number of nodes in those trees).
To solve reasonably optimally, flatten the two trees. Using Lisp notation,
we get
(10 (4(30) (6))
and
(26 (10 (4(30) (6)) (3 (3))
So the subtree is a substring of the parent. Using strstr we can
complete normally in O(N) time, it might take a little bit longer
if we have lots and lots of near sub-trees. You can use a suffix
tree if you need to do lots of searches and that gets it down to O(M)
time where M is the size of the subtree.
But actually runtime doesn't improve. It's the same algorithm,
and it will have N M behaviour if, for example, all the trees
have the same node id and structure, except for the last right
child of the query sub-tree. it's just that the operations
become a lot faster.

Count nodes bigger then root in each subtree of a given binary tree in O(n log n)

We are given a tree with n nodes in form of a pointer to its root node, where each node contains a pointer to its parent, left child and right child, and also a key which is an integer. For each node v I want to add additional field v.bigger which should contain number of nodes with key bigger than v.key, that are in a subtree rooted at v. Adding such a field to all nodes of a tree should take O(n log n) time in total.
I'm looking for any hints that would allow me to solve this problem. I tried several heuristics - for example when thinking about doing this problem in bottom-up manner, for a fixed node v, v.left and v.right could provide v with some kind of set (balanced BST?) with operation bigger(x), which for a given x returns a number of elements bigger than x in that set in logarihmic time. The problem is, we would need to merge such sets in O(log n), so this seems as a no-go, as I don't know any ordered set like data structure which supports quick merging.
I also thought about top-down approach - a node v adds one to some u.bigger for some node u if and only if u lies on a simple path to the root and u<v. So v could update all such u's somehow, but I couldn't come up with any reasonable way of doing that...
So, what is the right way of thinking about this problem?
Perform depth-first search in given tree (starting from root node).
When any node is visited for the first time (coming from parent node), add its key to some order-statistics data structure (OSDS). At the same time query OSDS for number of keys larger than current key and initialize v.bigger with negated result of this query.
When any node is visited for the last time (coming from right child), query OSDS for number of keys larger than current key and add the result to v.bigger.
You could apply this algorithm to any rooted trees (not necessarily binary trees). And it does not necessarily need parent pointers (you could use DFS stack instead).
For OSDS you could use either augmented BST or Fenwick tree. In case of Fenwick tree you need to preprocess given tree so that values of the keys are compressed: just copy all the keys to an array, sort it, remove duplicates, then substitute keys by their indexes in this array.
Basic idea:
Using the bottom-up approach, each node will get two ordered lists of the values in the subtree from both sons and then find how many of them are bigger. When finished, pass the combined ordered list upwards.
Details:
Leaves:
Leaves obviously have v.bigger=0. The node above them creates a two item list of the values, updates itself and adds its own value to the list.
All other nodes:
Get both lists from sons and merge them in an ordered way. Since they are already sorted, this is O(number of nodes in subtree). During the merge you can also find how many nodes qualify the condition and get the value of v.bigger for the node.
Why is this O(n logn)?
Every node in the tree counts through the number of nodes in its subtree. This means the root counts all the nodes in the tree, the sons of the root each count (combined) the number of nodes in the tree (yes, yes, -1 for the root) and so on all nodes in the same height count together the number of nodes that are lower. This gives us that the number of nodes counted is number of nodes * height of the tree - which is O(n logn)
What if for each node we keep a separate binary search tree (BST) which consists of nodes of the subtree rooted at that node.
For a node v at level k, merging the two subtrees v.left and v.right which both have O(n/2^(k+1)) elements is O(n/2^k). After forming the BST for this node, we can find v.bigger in O(n/2^(k+1)) time by just counting the elements in the right (traditionally) subtree of the BST. Summing up, we have O(3*n/2^(k+1)) operations for a single node at level k. There are a total of 2^k many level k nodes, therefore we have O(2^k*3*n/2^(k+1)) which is simplified as O(n) (dropping the 3/2 constant). operations at level k. There are log(n) levels, hence we have O(n*log(n)) operations in total.

Check if a binary tree has two identical subtrees

I want to write a method to find whether a tree has at least a pair of identical subtrees, the subtrees have to be identical in both value and structure.
Suppose you are given a tree as follows:
a
/ \
b f
/ / \
c g d
/ / /
d h e
/
e
This would return true because we have a pair of identical trees with root d.
My thought is to traverse each node and build a map of node name mapped to list of tree nodes. In each iteration, we check if current node name is in the map or not. If it's in, then we can call boolean isSameTree(TreeNode t1, TreeNode t2) function with current node against every node in the list of tree nodes to see if they are identical.
The time complexity would be O(n^3). And I wonder if we can do better than that!
Your example tree also has a pair of identical trees with root e. In general, if two trees are identical then either they are both leaves or they have subtrees which are identical. So you can simplify your test to checking whether all of the leaves in the original tree are distinct, which takes O(n) hashing operations and average case Theta(n) equality comparisons unless the hash is very poor.

Split a tree into equal parts by deleting an edge

I am looking for an algorithm to split a tree with N nodes (where the maximum degree of each node is 3) by removing one edge from it, so that the two trees that come as the result have as close as possible to N/2. How do I find the edge that is "the most centered"?
The tree comes as an input from a previous stage of the algorithm and is input as a graph - so it's not balanced nor is it clear which node is the root.
My idea is to find the longest path in the tree and then select the edge in the middle of the longest path. Does it work?
Optimally, I am looking for a solution that can ensure that neither of the trees has more than 2N / 3 nodes.
Thanks for your answers.
I don't believe that your initial algorithm works for the reason I mentioned in the comments. However, I think that you can solve this in O(n) time and space using a modified DFS.
Begin by walking the graph to count how many total nodes there are; call this n. Now, choose an arbitrary node and root the tree at it. We will now recursively explore the tree starting from the root and will compute for each subtree how many nodes are in each subtree. This can be done using a simple recursion:
If the current node is null, return 0.
Otherwise:
For each child, compute the number of nodes in the subtree rooted at that child.
Return 1 + the total number of nodes in all child subtrees
At this point, we know for each edge what split we will get by removing that edge, since if the subtree below that edge has k nodes in it, the spilt will be (k, n - k). You can thus find the best cut to make by iterating across all nodes and looking for the one that balances (k, n - k) most evenly.
Counting the nodes takes O(n) time, and running the recursion visits each node and edge at most O(1) times, so that takes O(n) time as well. Finding the best cut takes an additional O(n) time, for a net runtime of O(n). Since we need to store the subtree node counts, we need O(n) memory as well.
Hope this helps!
If you see my answer to Divide-And-Conquer Algorithm for Trees, you can see I'll find a node that partitions tree into 2 nearly equal size trees (bottom up algorithm), now you just need to choose one of the edges of this node to do what you want.
Your current approach is not working assume you have a complete binary tree, now add a path of length 3*log n to one of leafs (name it bad leaf), your longest path will be within one of a other leafs to the end of path connected to this bad leaf, and your middle edge will be within this path (in fact after you passed bad leaf) and if you partition base on this edge you have a part of O(log n) and another part of size O(n) .

Binary tree visit: get from one leaf to another leaf

Problem: I have a binary tree, all leaves are numbered (from left to right, starting from 0) and no connection exists between them.
I want an algorithm that, given two indices (of 2 distinct leaves), visits the tree starting from the greater leaf (the one with the higher index) and gets to the lower one.
The internal nodes of the tree do not contain any useful information.
I should chose the path based only on the leaves indices. The path start from a leaf and terminates on a leaf, and of course I can access a leaf if I know its index (through an array of pointers)
The tree is static, no insertion or deletion of nodes is allowed.
I have developed an algorithm to do it but it really sucks... any ideas?
One option would be to find the least common ancestor of the two nodes, along with the sequence of nodes you should take from each node to get to that ancestor. Here's a sketch of the algorithm:
Starting from each node, walk back up to that node's parent until you reach the root. Count the number of nodes on the path from each node to the root. Let the height of the first node be h1 and the height of the second node be h2.
Let h = min(h1, h2). This is the height of the higher of the two nodes.
Starting from each node, keep following the node's parent pointer until both nodes are at height h. Record the nodes you followed during this step. At this point, both nodes are at the same height.
Until you find a common node, keep marching upwards from each node to its parent. Eventually you will hit their common ancestor. At this point, follow the path from the first node up to this ancestor, then down the path from the ancestor down to the second node.
In the worst case, this takes O(h) time and O(h) space, where h is the height of the tree. For a balanced binary tree is this O(lg n) time and space, which is quite good.
If you're interested in a Much More Hardcore version of this algorithm, consider looking into Tarjan's Least Common Ancestors algorithm, which with linear preprocessing time, can be used to find the least common ancestor much more rapidly than this.
Hope this helps!
Distance between any two nodes can be calculated with the help of lowest common ancestor:
Dist(n1, n2) = Dist(root, n1) + Dist(root, n2) - 2*Dist(root, lca)
where lca is lowest common ancestor.
see this for more help about this algorithm and see this video for learning how to calculate lca.

Resources