Algorithm For Vertex Removal From Tree

When a vertex and its incident edges are removed from a tree, a collection of
subtrees remains. Write an algorithm that, given a graph that is a tree with n
vertices, finds a vertex v whose removal leaves no subtree with more than n/2
vertices.
I have attempted this problem using a modified DFS approach as well as a bridge finding algorithm. Any help would be much appreciated.

Create a recursive function that does a post-order traversal of the tree.
The function returns the subtree size and a vertex, or the vertex can be global (in which case you just set it instead of returning it).
Call the function for the root.
Call the function on each child's subtree.
If one of those calls returned a vertex, return that vertex.
Return the current vertex if these conditions hold:
Each child's subtree has less than or equal to n/2 vertices.
The sum of the children's subtree sizes is greater than or equal to (n-1)/2, i.e. the part of the tree 'above' the current vertex (everything outside its subtree) has at most n/2 nodes.
Return the sum of children's subtree sizes + 1 as the subtree size.
Running time: O(n).
I'm assuming you've already got the size of the tree - n - if not, you'll need to start off with another traversal to get this size (which doesn't affect the O(n) running time).
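Here is a minimal Python sketch of that post-order traversal (the adjacency-list input format, the function names, and the choice of root are my own assumptions, not from the answer above):

import sys

def find_centroid(n, adj, root=0):
    """Return a vertex whose removal leaves no subtree with more than n/2
    vertices.  adj[v] is the list of v's neighbours (vertices 0..n-1)."""
    sys.setrecursionlimit(max(1000, 2 * n))
    answer = None

    def subtree_size(v, parent):
        nonlocal answer
        size = 1                      # count v itself
        max_child = 0                 # largest child subtree seen so far
        for u in adj[v]:
            if u != parent:
                s = subtree_size(u, v)
                size += s
                max_child = max(max_child, s)
        # the part of the tree "above" v has n - size vertices
        if answer is None and max_child <= n / 2 and n - size <= n / 2:
            answer = v
        return size

    subtree_size(root, -1)
    return answer

For example, find_centroid(5, [[1], [0, 2], [1, 3], [2, 4], [3]]) returns vertex 2, the middle of a 5-vertex path, whose removal leaves two subtrees of 2 vertices each.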

Related

Algorithm to find the longest path in binary tree?

Question:
You are given a rooted binary tree (each node has at most two children).
For a simple path p between two nodes in the tree, let mp be the node on the path that is highest
(closest to the root). Define the weight of a path w(p) = Σ_u∈p d(u, mp), where d denotes the distance
(number of edges on the path between two nodes). That is, every node on the path is weighted by the
distance to the highest node on the path.
The question asks for an algorithm that finds the maximum weight among all simple paths in the tree. I'm not sure if I interpreted it correctly, but can I just find the longest path from mp to the farthest node? I haven't figured out which algorithm is appropriate for this question, but I think recursion is one way to do it. Again, I don't understand the question very well; it would be better if someone could "translate" it for me and guide me to the solution.
Let's assume we know mp. Then the highest-weight path must start in the left subtree and end in the right subtree (or vice versa). Otherwise, the path would not be simple. To find the start and end node, we would go as deep as possible into the respective subtrees as each level adds depth to the weight. Therefore, we can compute the weight of this path directly from the heights of the two subtrees (by using the analytic solution of the arithmetic progression):
max_weight = height_left * (height_left + 1) / 2 + height_right * (height_right + 1) / 2
To find the maximum weight path across the entire tree (without prescribing mp), simply check this value for all nodes. I.e., take a recursive algorithm that calculates the height for each subtree. When you have the two subtree heights for a node, calculate the maximum weight. Of all these weights, take the maximum. This requires time linear in the number of nodes.
And to answer your question: No, it is not necessarily the longest path in the tree. The path can have one branch that goes very deep but a very shallow branch on the other side. This is because adding one level deeper to the path does not just increase the weight by 1 but by the depth of that node.
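A short Python sketch of this idea (the node representation with .left and .right attributes is an assumption):

def max_weight_path(root):
    """For every node, combine the depths reachable in its two subtrees
    using the arithmetic-series formula above and keep the maximum."""
    best = 0

    def height(node):
        nonlocal best
        if node is None:
            return -1                      # height of an empty subtree
        h_left = height(node.left) + 1     # deepest level reachable on the left
        h_right = height(node.right) + 1   # deepest level reachable on the right
        weight = h_left * (h_left + 1) // 2 + h_right * (h_right + 1) // 2
        best = max(best, weight)
        return max(h_left, h_right)

    height(root)
    return best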
This problem is the diameter of a binary tree. In this case, a node at a lower level has a greater weight because it is farther from the root. Therefore, finding the longest path means finding the diameter of the binary tree. We can use a brute-force algorithm to solve it by traversing all leaf-to-leaf paths and then arriving at the diameter.
Method: naive approach
Find the height of the left and right subtrees, then find the left and right diameters. Return the maximum of (diameter of the left subtree, diameter of the right subtree, longest path between two nodes that passes through the root).
Time Complexity: when calculating the diameter recursively, each node computes the height of its subtree separately, and that height computation itself traverses the subtree from top to bottom, so the total is O(N^2).
Improve:
Notice that at every node, to find the diameter we call a separate function to find the height. We can improve this by computing the height of the tree and the diameter in the same traversal: every node returns two pieces of information in one pass, its height and the diameter of the tree with respect to that node. The running time is then O(N); see the sketch below.
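A possible Python version of that combined traversal (again assuming nodes with .left and .right; the diameter here is measured in edges):

def diameter(root):
    def helper(node):
        # returns (height, diameter) for the subtree rooted at node
        if node is None:
            return -1, 0
        lh, ld = helper(node.left)
        rh, rd = helper(node.right)
        through = lh + rh + 2              # longest path passing through node
        return max(lh, rh) + 1, max(ld, rd, through)

    return helper(root)[1]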

Relation between degrees of vertices and edge removal

I'm looking for help proving the following:
given an undirected tree with n vertices in which every vertex has degree <= 3,
(1) prove that there exists an edge whose removal leaves two trees, each with at most 2*n/3 vertices;
(2) suggest a linear-time algorithm that finds such an edge in the given tree.
Choose an arbitrary root. Do a post-order traversal to compute the size of each subtree. Then descend from the root, always moving into a child whose subtree is at least as large as its siblings', until you reach a subtree of size between (n-1)/3 inclusive and 2(n-1)/3 + 1 exclusive (the degree bound means each step goes from a subtree of size s to a child subtree of size at least (s-1)/2, so the size can never jump over this interval). Sever its parent edge.
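A Python sketch of this descent (the adjacency-list input and all names are my own; it returns the edge as a (parent, child) pair and uses the same size interval as above):

def find_splitting_edge(n, adj, root=0):
    """adj[v] lists v's neighbours in a tree whose vertices have degree <= 3."""
    if n < 2:
        return None                  # no edge to remove
    # iterative post-order pass to compute subtree sizes
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)
    size = {v: 1 for v in range(n)}
    for v in reversed(order):        # children appear after their parents in order
        if parent[v] is not None:
            size[parent[v]] += size[v]

    # descend via the largest child until the subtree size falls into
    # the interval [(n-1)/3, 2(n-1)/3 + 1)
    v = root
    while True:
        child = max((u for u in adj[v] if u != parent[v]), key=lambda u: size[u])
        if size[child] < 2 * (n - 1) / 3 + 1:
            return (v, child)
        v = child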

Complexity of a tree labeling algorithm

I have a generic weighted tree (undirected graph without cycles, connected) with n nodes and n-1 edges connecting a node to another one.
My algorithm does the following:
do
compute the actual leaves (nodes with degree 1)
remove all the leaves and their edges from the tree, labelling each parent with the maximum cost among the edges to its removed leaves
(for example, if an internal node is connected to two leaves by edges with costs 5 and 6, then after removing the leaves we label the internal node with 6)
until the tree has size <= 2
return the node with maximum cost labelled
Can I say that the complexity is O(n) to compute the leaves and O(n) to eliminate each edge with leaf, so I have O(n)+O(n) = O(n)?
You can easily do this in O(n) with a set implemented as a simple list, queue, or stack (order of processing is unimportant).
Put all the leaves in the set.
In a loop, remove a leaf from the set, delete it and its edge from the graph. Process the label by updating the max of the parent. If the parent is now a leaf, add it to the set and keep going.
When the set is empty you're done, and the node labels are correct.
Initially constructing the set is O(n). Every vertex is placed in the set, removed, and has its label processed exactly once, each in constant time. So for n nodes it is O(n) time; we have O(n) + O(n) = O(n).
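Here is one way this could look in Python (a sketch only; the adjacency representation with per-edge costs and the final "return the maximum label" step are my own reading of the question):

from collections import deque

def label_nodes(n, adj_cost):
    """adj_cost[v] is a dict {neighbour: edge_cost}.  Peel leaves off the tree,
    giving each parent the maximum cost among edges to its removed leaves,
    and return the node with the largest label."""
    degree = [len(adj_cost[v]) for v in range(n)]
    removed = [False] * n
    label = [float("-inf")] * n
    leaves = deque(v for v in range(n) if degree[v] == 1)

    while leaves:
        v = leaves.popleft()
        removed[v] = True
        # v's single remaining neighbour, if any, is its current parent
        parent = next((u for u in adj_cost[v] if not removed[u]), None)
        if parent is None:
            continue                 # v was the last node standing
        label[parent] = max(label[parent], adj_cost[v][parent])
        degree[parent] -= 1
        if degree[parent] == 1:
            leaves.append(parent)

    return max(range(n), key=lambda v: label[v])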
It's certainly possible to do this process in O(n), but whether your algorithm actually does depends on how each step is implemented.
If either "compute the actual leaves" or "remove all the leaves and their edges" loops over the entire tree, that step would take O(n).
And both of the above steps will be repeated O(n) times in the worst case (if the tree is greatly unbalanced), so, in total, it could take O(n^2).
To do this in O(n), you could have each node point to its parent so you can remove a leaf in constant time, and maintain a collection of the current leaves so you never have to recompute them; this leads to O(n) running time.
As your tree is arbitrary, it could also be a path (essentially a linked list), in which case you would eliminate only a couple of nodes in each iteration and would need O(n) iterations, each costing O(n) to find the leaves.
So your algorithm is actually O(N^2).
Here is a better algorithm that does it in O(N) for any tree:
// Post-order traversal: recursively delete each child's subtree first,
// keeping the maximum value returned by any child, then delete the child itself.
deleteLeaf(Node k) {
    max = -infinity
    for each child of k do
        value = deleteLeaf(child)
        if (value > max)
            max = value
        delete(child)
    return max
}
// Call deleteLeaf(root), or deleteLeaf(root.child) if the root itself should remain.

Finding number of nodes within a certain distance in a rooted tree

In a rooted and weighted tree, how can you find the number of nodes within a certain distance from each node? You only need to consider down edges, i.e. paths going down from the root. Keep in mind each edge has a weight.
I can do this in O(N^2) time using a DFS from each node and keeping track of the distance traveled, but with N >= 100000 it's a bit slow. I'm pretty sure you could easily solve it with unweighted edges with DP, but anyone know how to solve this one quickly? (Less than N^2)
It's possible to improve my previous answer to O(n log d) time and O(n) space by making use of the following observation:
The number of sufficiently-close nodes at a given node v is the sum of the numbers of sufficiently-close nodes of each of its children, less the number of nodes that have just become insufficiently-close.
Let's call the distance threshold m, and the distance on the edge between two adjacent nodes u and v d(u, v).
Every node has a single ancestor that is the first ancestor to miss out
For each node v, we will maintain a count, c(v), that is initially 0.
For any node v, consider the chain of ancestors from v's parent up to the root. Call the ith node in this chain a(v, i). Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. If we are able to quickly find i, then we can simply decrement c(a(v, i+1)) (bringing it (possibly further) below 0), so that when the counts of a(v, i+1)'s children are added to it in a later pass, v is correctly excluded from being counted. Provided we calculate fully accurate counts for all children of a node v before adding them to c(v), any such exclusions are correctly "propagated" to parent counts.
The tricky part is finding i efficiently. Call the sum of the distances of the first j >= 0 edges on the path from v to the root s(v, j), and call the list of all depth(v)+1 of these path lengths, listed in increasing order, s(v). What we want to do is binary-search the list of path lengths s(v) for the first entry greater than the threshold m: this would find i+1 in log(d) time. The problem is constructing s(v). We could easily build it using a running total from v up to the root -- but that would require O(d) time per node, nullifying any time improvement. We need a way to construct s(v) from s(parent(v)) in constant time, but the problem is that as we recurse from a node v to its child u, the path lengths grow "the wrong way": every path length x needs to become x + d(u, v), and a new path length of 0 needs to be added at the beginning. This appears to require O(d) updates, but a trick gets around the problem...
Finding i quickly
The solution is to calculate, at each node v, the total path length t(v) of all edges on the path from v to the root. This is easily done in constant time per node: t(v) = t(parent(v)) + d(v, parent(v)). We can then form s(v) by prepending -t to the beginning of s(parent(v)), and when performing the binary search, consider each element s(v, j) to represent s(v, j) + t (or equivalently, binary search for m - t instead of m). The insertion of -t at the start can be achieved in O(1) time by having a child u of a node v share v's path length array, with s(u) considered to begin one memory location before s(v). All path length arrays are "right-justified" inside a single memory buffer of size d+1 -- specifically, nodes at depth k will have their path length array begin at offset d-k inside the buffer to allow room for their descendant nodes to prepend entries. The array sharing means that sibling nodes will overwrite each other's path lengths, but this is not a problem: we only need the values in s(v) to remain valid while v and v's descendants are processed in the preorder DFS.
In this way we gain the effect of O(d) path length increases in O(1) time. Thus the total time required to find i at a given node is O(1) (to build s(v)) plus O(log d) (to find i using the modified binary search) = O(log d). A single preorder DFS pass is used to find and decrement the appropriate ancestor's count for each node; a postorder DFS pass then sums child counts into parent counts. These two passes can be combined into a single pass over the nodes that performs operations both before and after recursing.
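Below is a hedged Python sketch of this scheme as I read it (the input format, with children[v] a list of (child, edge_weight) pairs, and all names are assumptions; it returns, for every node, the number of descendants, the node itself included, within distance m):

import bisect
import sys

def count_close_descendants(n, root, children, m):
    sys.setrecursionlimit(max(1000, n + 100))

    # depths, needed to size the shared right-justified buffer
    depth = [0] * n
    stack = [root]
    while stack:
        v = stack.pop()
        for c, _ in children[v]:
            depth[c] = depth[v] + 1
            stack.append(c)
    d = max(depth)

    t = [0] * n                      # t[v] = distance from v to the root
    count = [1] * n                  # every node is within distance 0 of itself
    buf = [0] * (d + 1)              # shared buffer holding -t of the current ancestors
    anc = []                         # anc[k] = ancestor of the current node at depth k

    def dfs(v):
        k = depth[v]
        anc.append(v)
        buf[d - k] = -t[v]
        # buf[d-k .. d] is sorted increasing; buf[d-k+j] + t[v] is the distance
        # from v to its j-th ancestor (j = 0 being v itself).  Binary-search for
        # the first ancestor that is too far away and decrement its count.
        j = bisect.bisect_right(buf, m - t[v], d - k, d + 1) - (d - k)
        if j <= k:
            count[anc[k - j]] -= 1   # v must not be counted from this ancestor upwards
        for c, w in children[v]:
            t[c] = t[v] + w
            dfs(c)
            count[v] += count[c]     # post-order: fold child counts into the parent
        anc.pop()

    dfs(root)
    return count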
[EDIT: Please see my other answer for an even more efficient O(n log d) solution :) ]
Here's a simple O(nd)-time, O(n)-space algorithm, where d is the maximum depth of any node in the tree. A complete tree (a tree in which every node has the same number of children) with n nodes has depth d = O(log n), so this should be much faster than your O(n^2) DFS-based approach in most cases, though if the number of sufficiently-close descendants per node is small (i.e. if DFS only traverses a small number of levels) then your algorithm should not be too bad either.
For any node v, consider the chain of ancestors from v's parent up to the root. Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. So all we need to do is for each node, climb upwards towards the root until such time as the total path length exceeds the threshold distance m, incrementing the count at each ancestor as we go. There are n nodes, and for each node there are at most d ancestors, so this algorithm is trivially O(nd).
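For comparison, a sketch of this simpler O(nd) climb in Python (parent[root] == -1 and up_cost[v] being the weight of the edge from v to parent[v] are conventions I'm assuming):

def count_close_descendants_naive(n, parent, up_cost, m):
    count = [1] * n                  # every node is within distance 0 of itself
    for v in range(n):
        dist, u = 0, v
        while parent[u] != -1:
            dist += up_cost[u]
            if dist > m:
                break                # ancestors only get farther from here on
            u = parent[u]
            count[u] += 1
    return count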

Split a tree into equal parts by deleting an edge

I am looking for an algorithm to split a tree with N nodes (where the maximum degree of each node is 3) by removing one edge from it, so that the two resulting trees each have as close to N/2 nodes as possible. How do I find the edge that is "the most centered"?
The tree comes as an input from a previous stage of the algorithm and is input as a graph - so it's not balanced nor is it clear which node is the root.
My idea is to find the longest path in the tree and then select the edge in the middle of the longest path. Does it work?
Ideally, I am looking for a solution that can ensure that neither of the trees has more than 2N/3 nodes.
Thanks for your answers.
I don't believe that your initial algorithm works for the reason I mentioned in the comments. However, I think that you can solve this in O(n) time and space using a modified DFS.
Begin by walking the graph to count how many total nodes there are; call this n. Now, choose an arbitrary node and root the tree at it. We will now recursively explore the tree starting from the root and will compute for each subtree how many nodes are in each subtree. This can be done using a simple recursion:
If the current node is null, return 0.
Otherwise:
For each child, compute the number of nodes in the subtree rooted at that child.
Return 1 + the total number of nodes in all child subtrees.
At this point, we know for each edge what split we will get by removing that edge, since if the subtree below that edge has k nodes in it, the split will be (k, n - k). You can thus find the best cut to make by iterating across all nodes and looking for the one that balances (k, n - k) most evenly.
Counting the nodes takes O(n) time, and running the recursion visits each node and edge at most O(1) times, so that takes O(n) time as well. Finding the best cut takes an additional O(n) time, for a net runtime of O(n). Since we need to store the subtree node counts, we need O(n) memory as well.
Hope this helps!
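A compact Python sketch of this answer (the adjacency-list input and names are assumed; it returns the most balanced edge as a (parent, child) pair):

def most_balanced_edge(n, adj, root=0):
    parent = [-1] * n
    visited = [False] * n
    visited[root] = True
    order = []
    stack = [root]
    while stack:                     # iterative DFS to get a processing order
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if not visited[u]:
                visited[u] = True
                parent[u] = v
                stack.append(u)

    size = [1] * n
    for v in reversed(order):        # children come after their parents in order
        if parent[v] != -1:
            size[parent[v]] += size[v]

    # every non-root vertex v corresponds to the edge (parent[v], v); removing it
    # gives a split of (size[v], n - size[v]), so minimise the imbalance
    best = min((v for v in range(n) if parent[v] != -1),
               key=lambda v: abs(n - 2 * size[v]))
    return parent[best], best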
If you look at my answer to Divide-And-Conquer Algorithm for Trees, you can see it finds a node that partitions the tree into two nearly equal-sized trees (a bottom-up algorithm); now you just need to choose one of that node's edges to do what you want.
Your current approach does not work. Assume you have a complete binary tree, then attach a path of length 3*log n to one of its leaves (call it the bad leaf). The longest path now runs from one of the other leaves to the end of the path attached to the bad leaf, so the middle edge of that longest path lies on the attached path (in fact past the bad leaf). Partitioning on this edge gives one part of size O(log n) and another part of size O(n).

Resources