Split a tree into equal parts by deleting an edge - algorithm

I am looking for an algorithm to split a tree with N nodes (where the maximum degree of each node is 3) by removing one edge from it, so that the two trees that come as the result have as close as possible to N/2. How do I find the edge that is "the most centered"?
The tree comes as an input from a previous stage of the algorithm and is input as a graph - so it's not balanced nor is it clear which node is the root.
My idea is to find the longest path in the tree and then select the edge in the middle of the longest path. Does it work?
Optimally, I am looking for a solution that can ensure that neither of the trees has more than 2N / 3 nodes.
Thanks for your answers.

I don't believe that your initial algorithm works for the reason I mentioned in the comments. However, I think that you can solve this in O(n) time and space using a modified DFS.
Begin by walking the graph to count how many total nodes there are; call this n. Now, choose an arbitrary node and root the tree at it. We will now recursively explore the tree starting from the root and will compute for each subtree how many nodes are in each subtree. This can be done using a simple recursion:
If the current node is null, return 0.
Otherwise:
For each child, compute the number of nodes in the subtree rooted at that child.
Return 1 + the total number of nodes in all child subtrees
At this point, we know for each edge what split we will get by removing that edge, since if the subtree below that edge has k nodes in it, the spilt will be (k, n - k). You can thus find the best cut to make by iterating across all nodes and looking for the one that balances (k, n - k) most evenly.
Counting the nodes takes O(n) time, and running the recursion visits each node and edge at most O(1) times, so that takes O(n) time as well. Finding the best cut takes an additional O(n) time, for a net runtime of O(n). Since we need to store the subtree node counts, we need O(n) memory as well.
Hope this helps!

If you see my answer to Divide-And-Conquer Algorithm for Trees, you can see I'll find a node that partitions tree into 2 nearly equal size trees (bottom up algorithm), now you just need to choose one of the edges of this node to do what you want.
Your current approach is not working assume you have a complete binary tree, now add a path of length 3*log n to one of leafs (name it bad leaf), your longest path will be within one of a other leafs to the end of path connected to this bad leaf, and your middle edge will be within this path (in fact after you passed bad leaf) and if you partition base on this edge you have a part of O(log n) and another part of size O(n) .

Related

Time/Space Complexity of Depth First Search

I've looked at various other StackOverflow answer's and they all are different to what my lecturer has written in his slides.
Depth First Search has a time complexity of O(b^m), where b is the
maximum branching factor of the search tree and m is the maximum depth
of the state space. Terrible if m is much larger than d, but if search
tree is "bushy", may be much faster than Breadth First Search.
He goes on to say..
The space complexity is O(bm), i.e. space linear in length of action
sequence! Need only store a single path from the root to the leaf
node, along with remaining unexpanded sibling nodes for each node on
path.
Another answer on StackOverflow states that it is O(n + m).
Time Complexity: If you can access each node in O(1) time, then with branching factor of b and max depth of m, the total number of nodes in this tree would be worst case = 1 + b + b2 + … + bm-1. Using the formula for summing a geometric sequence (or even solving it ourselves) tells that this sums to = (bm - 1)/(b - 1), resulting in total time to visit each node proportional to bm. Hence the complexity = O(bm).
On the other hand, if instead of using the branching factor and max depth you have the number of nodes n, then you can directly say that the complexity will be proportional to n or equal to O(n).
The other answers that you have linked in your question are similarly using different terminologies. The idea is same everywhere. Some solutions have added the edge count too to make the answer more precise, but in general, node count is sufficient to describe the complexity.
Space Complexity: The length of longest path = m. For each node, you have to store its siblings so that when you have visited all the children, and you come back to a parent node, you can know which sibling to explore next. For m nodes down the path, you will have to store b nodes extra for each of the m nodes. That’s how you get an O(bm) space complexity.
The complexity is O(n + m) where n is the number of nodes in your tree, and m is the number of edges.
The reason why your teacher represents the complexity as O(b ^ m), is probably because he wants to stress the difference between Depth First Search and Breadth First Search.
When using BFS, if your tree has a very large amount of spread compared to it's depth, and you're expecting results to be found at the leaves, then clearly DFS would make much more sense here as it reaches leaves faster than BFS, even though they both reach the last node in the same amount of time (work).
When a tree is very deep, and non-leaves can give information about deeper nodes, BFS can detect ways to prune the search tree in order to reduce the amount of nodes necessary to find your goal. Clearly, the higher up the tree you discover you can prune a sub tree, the more nodes you can skip.
This is harder when you're using DFS, because you're prioritize reaching a leaf over exploring nodes that are closer to the root.
I suppose this DFS time/space complexity is taught on an AI class but not on Algorithm class.
The DFS Search Tree here has slightly different meaning:
A node is a bookkeeping data structure used to represent the search
tree. A state corresponds to a configuration of the world. ...
Furthermore, two different nodes can contain the same world state if
that state is generated via two different search paths.
Quoted from book 'Artificial Intelligence - A Modern Approach'
So the time/space complexity here is focused on you visit nodes and check whether this is the goal state. #displayName already give a very clear explanation.
While O(m+n) is in algorithm class, the focus is the algorithm itself, when we store the graph as adjacency list and how we discover nodes.

Count nodes bigger then root in each subtree of a given binary tree in O(n log n)

We are given a tree with n nodes in form of a pointer to its root node, where each node contains a pointer to its parent, left child and right child, and also a key which is an integer. For each node v I want to add additional field v.bigger which should contain number of nodes with key bigger than v.key, that are in a subtree rooted at v. Adding such a field to all nodes of a tree should take O(n log n) time in total.
I'm looking for any hints that would allow me to solve this problem. I tried several heuristics - for example when thinking about doing this problem in bottom-up manner, for a fixed node v, v.left and v.right could provide v with some kind of set (balanced BST?) with operation bigger(x), which for a given x returns a number of elements bigger than x in that set in logarihmic time. The problem is, we would need to merge such sets in O(log n), so this seems as a no-go, as I don't know any ordered set like data structure which supports quick merging.
I also thought about top-down approach - a node v adds one to some u.bigger for some node u if and only if u lies on a simple path to the root and u<v. So v could update all such u's somehow, but I couldn't come up with any reasonable way of doing that...
So, what is the right way of thinking about this problem?
Perform depth-first search in given tree (starting from root node).
When any node is visited for the first time (coming from parent node), add its key to some order-statistics data structure (OSDS). At the same time query OSDS for number of keys larger than current key and initialize v.bigger with negated result of this query.
When any node is visited for the last time (coming from right child), query OSDS for number of keys larger than current key and add the result to v.bigger.
You could apply this algorithm to any rooted trees (not necessarily binary trees). And it does not necessarily need parent pointers (you could use DFS stack instead).
For OSDS you could use either augmented BST or Fenwick tree. In case of Fenwick tree you need to preprocess given tree so that values of the keys are compressed: just copy all the keys to an array, sort it, remove duplicates, then substitute keys by their indexes in this array.
Basic idea:
Using the bottom-up approach, each node will get two ordered lists of the values in the subtree from both sons and then find how many of them are bigger. When finished, pass the combined ordered list upwards.
Details:
Leaves:
Leaves obviously have v.bigger=0. The node above them creates a two item list of the values, updates itself and adds its own value to the list.
All other nodes:
Get both lists from sons and merge them in an ordered way. Since they are already sorted, this is O(number of nodes in subtree). During the merge you can also find how many nodes qualify the condition and get the value of v.bigger for the node.
Why is this O(n logn)?
Every node in the tree counts through the number of nodes in its subtree. This means the root counts all the nodes in the tree, the sons of the root each count (combined) the number of nodes in the tree (yes, yes, -1 for the root) and so on all nodes in the same height count together the number of nodes that are lower. This gives us that the number of nodes counted is number of nodes * height of the tree - which is O(n logn)
What if for each node we keep a separate binary search tree (BST) which consists of nodes of the subtree rooted at that node.
For a node v at level k, merging the two subtrees v.left and v.right which both have O(n/2^(k+1)) elements is O(n/2^k). After forming the BST for this node, we can find v.bigger in O(n/2^(k+1)) time by just counting the elements in the right (traditionally) subtree of the BST. Summing up, we have O(3*n/2^(k+1)) operations for a single node at level k. There are a total of 2^k many level k nodes, therefore we have O(2^k*3*n/2^(k+1)) which is simplified as O(n) (dropping the 3/2 constant). operations at level k. There are log(n) levels, hence we have O(n*log(n)) operations in total.

Complexity of a tree labeling algorithm

I have a generic weighted tree (undirected graph without cycles, connected) with n nodes and n-1 edges connecting a node to another one.
My algorithm does the following:
do
compute the actual leaves (nodes with degree 1)
remove all the leaves and their edges from the tree labelling each parent with the maximum value of the cost of his connected leaves
(for example if an internal node is connected to two leaf with edges with costs 5,6 then we label the internal node after removing the leaves with 6)
until the tree has size <= 2
return the node with maximum cost labelled
Can I say that the complexity is O(n) to compute the leaves and O(n) to eliminate each edge with leaf, so I have O(n)+O(n) = O(n)?
You can easily do this in O(n) with a set implemented as a simple list, queue, or stack (order of processing is unimportant).
Put all the leaves in the set.
In a loop, remove a leaf from the set, delete it and its edge from the graph. Process the label by updating the max of the parent. If the parent is now a leaf, add it to the set and keep going.
When the set is empty you're done, and the node labels are correct.
Initially constructing the set is O(n). Every vertex is placed on the set, removed and its label processed exactly once. That's all constant time. So for n nodes it is O(n) time. So we have O(n) + O(n) = O(n).
It's certainly possible to do this process in O(n), but whether or not your algorithm actually does depends.
If either "compute the actual leaves" or "remove all the leaves and their edges" loops over the entire tree, that step would take O(n).
And both the above steps will be repeated O(n) times in the worst case (if the tree is greatly unbalanced), so, in total, it could take O(n2).
To do this in O(n), you could have each node point to its parent so you can remove the leaf in constant time and maintain a collection of leaves so you always have the leaves, rather than having to calculate them - this would lead to O(n) running time.
As your tree is an artitary one. It can also be a link list in which case you would eliminate one node in each iteration and you would need (n-2) iterations of O(n) to find the leaf.
So your algorithm is actually O(N^2)
Here is an better algorithm that does that in O(N) for any tree
deleteLeaf(Node k) {
for each child do
value = deleteLeaf(child)
if(value>max)
max = value
delete(child)
return max
}
deleteLeaf(root) or deleteLeaf(root.child)

Why in-order traversal of a threaded tree is O(N)?

I can't seem to figure out how the in-order traversal of a threaded binary tree is O(N)..
Because you have to descend the links to find the the leftmost child and then go back by the thread when you want to add the parent to the traversal path. would not that be O(N^2)?
Thanks!
The traversal of a tree (threaded or not) is O(N) because visiting any node, starting from its parent, is O(1). The visitation of a node consists of three fixed operations: descending to the node from parent, the visitation proper (spending time at the node), and then returning to the parent. O(1 * N) is O(N).
The ultimate way to look at it is that the tree is a graph, and the traversal crosses each edge in the graph only twice. And the number of edges is proportional to the number of nodes since there are no cycles or redundant edges (each node can be reached by one unique path). A tree with N nodes has exactly N-1 edges: each node has an edge leading to it from its parent node, except for the root node of the tree.
At times it appears as if visiting a node requires more than one descent. For instance, after visiting the rightmost node in a subtree, we have to pop back up numerous levels before we can march to the right into the next subtree. But we did not descend all the way down just to visit that node. Each one-level descent can be accounted for as being necessary for visiting just the node immediately below, and the opposite ascent's
cost is lumped with that. By visiting a node V, we also gain access to all the nodes below it, but all those nodes benefit from and share the edge traversal from V's parent down to V, and back up again.
This is related to amortized analysis, which applies in situations where we can globally understand the overall cost based on some general observation about the structure of the problem, but at the detailed level of the individual operations, the costs are distributed in an uneven way that appears confusing.
Amortized analysis helps us understand that, for instance, N insertions into a hash table which resizes itself by growing exponentially are O(N). Most of the insertion operations are quick, but from time to time, we grow the table and process its contents. This is similar to how, from time to time during a tree traversal, we have to perform numerous consecutive ascents to climb out of a deep subtree.
The global observation about the hash table is that each item inserted into the table will move to a larger table on average about three times in three resize operations, and so each insertion can be regarded as "pre paying" for three re-insertions, which is a fixed cost. Of course, "older" items will be moved more times, but this is offset by "younger" entries that move fewer times, diluting the cost. And the global observation about the tree was already noted above: it has N-1 edges, each of which are traversed exactly twice during the traversal, so the visitation of each node "pays" for the double traversal of its respective edge. Because this is so easy to see, we don't actually have to formally apply amortized analysis to tree traversal.
Now suppose we performed an individual searches for each node (and the tree is a balanced search tree). Then the traversal would still not be O(N*N), but rather O(N log N). Suppose we have an ordered search tree which holds consecutive integers. If we increment over the integers and perform individual searches for each value, then each search is O(log N), and we end up doing N of these. In this situation, the edge traversals are no longer shared, so amortization does not apply. To reach some given node that we are searching for which is found at depth D, we have to cross D edges twice, for the sake of that node and that node alone. The next search in the loop for another integer will be completely independent of the previous one.
It may also help you to think of a linked list, which can be regarded as a very unbalanced tree. To visit all the items in a linked list of length N and return back to the head node is obviously O(N). Searching for each item individually is O(N*N), but in a traversal, we are not searching for each node individually, but using each predecessor as a springboard into finding the next node.
There is no loop to find the parent. Otherwise said, you are going through each arc between two node twice. That would be 2*number of arc = 2*(number of node -1) which is O(N).

Binary tree visit: get from one leaf to another leaf

Problem: I have a binary tree, all leaves are numbered (from left to right, starting from 0) and no connection exists between them.
I want an algorithm that, given two indices (of 2 distinct leaves), visits the tree starting from the greater leaf (the one with the higher index) and gets to the lower one.
The internal nodes of the tree do not contain any useful information.
I should chose the path based only on the leaves indices. The path start from a leaf and terminates on a leaf, and of course I can access a leaf if I know its index (through an array of pointers)
The tree is static, no insertion or deletion of nodes is allowed.
I have developed an algorithm to do it but it really sucks... any ideas?
One option would be to find the least common ancestor of the two nodes, along with the sequence of nodes you should take from each node to get to that ancestor. Here's a sketch of the algorithm:
Starting from each node, walk back up to that node's parent until you reach the root. Count the number of nodes on the path from each node to the root. Let the height of the first node be h1 and the height of the second node be h2.
Let h = min(h1, h2). This is the height of the higher of the two nodes.
Starting from each node, keep following the node's parent pointer until both nodes are at height h. Record the nodes you followed during this step. At this point, both nodes are at the same height.
Until you find a common node, keep marching upwards from each node to its parent. Eventually you will hit their common ancestor. At this point, follow the path from the first node up to this ancestor, then down the path from the ancestor down to the second node.
In the worst case, this takes O(h) time and O(h) space, where h is the height of the tree. For a balanced binary tree is this O(lg n) time and space, which is quite good.
If you're interested in a Much More Hardcore version of this algorithm, consider looking into Tarjan's Least Common Ancestors algorithm, which with linear preprocessing time, can be used to find the least common ancestor much more rapidly than this.
Hope this helps!
Distance between any two nodes can be calculated with the help of lowest common ancestor:
Dist(n1, n2) = Dist(root, n1) + Dist(root, n2) - 2*Dist(root, lca)
where lca is lowest common ancestor.
see this for more help about this algorithm and see this video for learning how to calculate lca.

Resources