How to update MST after deleting an edge from the graph? - algorithm

I have a graph represented with adjacency lists and his MST represented by parent array.
My problem is that I need to delete an edge from the graph and update parent array.
I've already manage to do the cases when:
the edge doesn't exist;
the edge is in graph but not in MST (MST doesn't change);
and the edge is the only path from two nodes(in this case I return null, because the
graph is not connected).
What can I do when the edge is in MST and the edge in graph is in a cycle? I need to do this in O(n+m) complexity.
I write the costs of edges with red color.

Just search the minimum-distance path of
the now-disconnected tree portion
to the rest of the tree and
add that path in between-- which is a single edge.
Say your original tree is T. With the removal
of the edge, it is now split into trees T1 and T2.
Take one of these-- say T2.
Every edge incident to a node on T2 is either
on T2 or is one connecting T2 to T1. Among these
edges incident to a node on T2, pick the one which
( (has its other end at a T1 node) and
(has the minimum cost among all such edges) ).
Search for this edge, say e=(u,v), takes O(|E|) time. O(|N|) if the separated portion is a leaf.
if there were a tree, say T', with less cost than T1 \union T2 \union {e},
then the node sets N1 and N2 of T1 and T2 would have more inter-connections than just one edge,
i.e., there would be two or more more edges on T' that begin at a node in N1 and end at a node in N2.
Otherwise, ( (T1 and T2 are minimum spanning trees resply. over N1 and N2) and (e is the least costly
connection between T1 and T2) ) would be false. But then, any go-between T1 and T2 is costlier than e=(u,v)-- contradicts.
Skipped some proof details. hope this helps.

Related

Finding Minimum Spanning Tree (MST) in graph?

Given an Undirected graph G with with weight on its edges and 2 different minimal spanning trees: T, T'
Then I want to prove the following:
For every edge e in T that's not in T', there is an edge e' in T'
that's not in T such that if we replace e with e' in T (let's call
it T_new) then it's still a minimal spanning tree of G.
I think I am too close for finding the right algorithm but stuck a little:
I have proved that weight(e) must be exactly equal to weight(e').
Since T is a tree, deleting e will result in 2 separated components, then for T_new to be a tree it must use one of the edges connecting two vertices from those different components.
But, I wasn't able to know which edge e' exactly will work. Plus I wasn't able to prove that always there is such an edge (I just found some requirements for e' that is must satisfy).
Some notes: I know Kruskal algorithm, and familiar with an algorithm in which we can paint some edges in yellow and request it to generate minimal spanning trees with maximum yellow edges (In other words from all found minimal spanning trees return the one with maximum number of yellow edges)
Let T1 and T2 be the two connected components of T \ {e}. Consider the path P joining the endpoints of e in T'. Since e connects T1 and T2, so does P, and therefore there exists an edge e' in P that connects T1 and T2. The edge e' cannot be lighter than e, or else T would not be minimum (T \ {e} U {e'}). The edge e' cannot be heavier than e, or else T' would not be minimum (T' \ {e'} U {e}).

Graph and MST, Some Facts and Validity

My notes tell me that the first and last is false. I need some idea of how to understand the validity of these sentences in a more simple and concise manner.
Suppose M is an MST (minimum spanning tree) of the Weighted Graph - GR.
Let A be a vertex of GR then M-{A} is also MST of GR-{A}.
Let A be a leaf of M then M-{A} is also MST of GR-{A}.
If e is a edge of M then (M-{e}) is a forest of M1 and M2 trees such that for M_i, i=1,2 is a MST of Induced Graph GR on vertexes T_i.
Let A be a vertex of GR, then M-{A} is also MST of GR-{A}.
This is false.
If A is not a leaf, then M-{A} is not a connected graph, and so it cannot be a MST.
In more detail: A has at least two neighbors, and the single path (MST property) that exists between two of them includes A. If A is removed, then there is no more path between those two other vertices.
Let A be a leaf of M, then M-{A} is also MST of GR-{A}.
This is true.
As A is a leaf, M-{A} is a connected graph, and has one fewer edge than M: it does not have the edge e that connects with A in M.
Now assume that M-{A} is not a MST of GR-{A}. We know that M-{A} is a connected graph, and has no cycles -- otherwise M would not be an MST. So if it is not an MST, it must be that its weight is not minimised. Then there is a different graph P that is a MST of GR-{A}. But then P+e would have less total weight than M, but still be a spanning tree of GR. So then M could not have been a MST of GR. This is a contradiction, and so the original statement is true.
If e is a edge of M then M-{e} is a forest of M1 and M2 trees such that for Mi, i=1,2 is a MST of the induced graph GR on vertexes Ti.
This is true. Your notes are wrong on this one.
Let's assume that Mi (for either i=1,2) is not a MST of the induced graph on vertexes Ti. We know that Mi is not disconnected (the removal of e creates exactly 2 connected graphs), and that it has no cycles (otherwise M would have had those as well). So if it is not a MST, then the reason would be that its weight is not minimised, and another graph Pi is a MST on that induced graph, and has less total weight than Mi.
If you would then reintroduce the edge e, and so connect Pi with M3-i, we would have a spanning tree for GR which would have less weight than M, and so M is not a MST. This is again a contradiction and so the original statement is true.

Distinct minimum spanning tree

For a connected, weighed, undirected graph G:
G has a unique MST, if for every cut of G there is a unique minimum weight edge crossing the cut.
Is this statement true?
I think false because for the following graph in the given link there can be multiple MSTs.
https://drive.google.com/file/d/1yDK3juPxeDBdS-aEOx0aAsphy4odZ55l/view?usp=drivesdk
If you mean a connected graph G, with edge costs that are all distinct. Then G has a unique minimum spanning tree.
proof:
Suppose there are two different MSTs, call them T1 and T2, who have different sets of edges--{t11, t12, … t1n-1} for T1 and {t21, t22, … t2n-1} for T2. So, let ti be the edge with the smallest weight only in T1 (not in T2). Since it is the smallest, ti must be included in "every choice" of MST. That is, both MST of T1 and T2 would have it. But this contradicts the definition of ti.

Finding The Maximum Weight on A Tree

I have a tree with N nodes and each edge has some weight. I have to find, for all pairs (u,v) 1<=u<=N 1<=V<=N, the maximum weight in the path from U to V. How can I find the total sum of the maximum weight for every pair (U,V)?
You can use Floyd Warshall Algorithm. You can multiply each edges with (-1), and solve your problem like finding the shortest path between all pairs, then multiply your result with (-1).
For every node, you can extend it to a tree. The extend process of node x is, for every edge (x, y), if value[x] > value[y] then we go to node y and continue the process, so the result of extending node x is a tree, and the value of x is the largest of all.
There are possibility that some value of the tree are equal maximum, to break the tie, we can assign each node an index, to make the value like a pair (value[x], idx[x]), and each idx is different.
Then for each extended tree, suppose the node with max value is x, take it as the root, suppose it has m subtree, each has total node number num[i], then every path between two node from different subtree has max weight value[x], so value[x] counts SUM[num[i]*num[j]] (i != j) times, ie. (SUM[num[i]]^2-SUM[num[i]^2])/2, and we miss the path starting from node x, so we should also count SUM[num[i]] more times. num can be generated during the extending process.
This is a O(n^2) solution. And I believe there is other much more faster algorithm.
Given a tree T, let S(T) be the sum you're interested in: the sum over every pair of vertices of the largest weight of an edge in the path between them.
If T has no edges then it has 1 vertex, and S(T) = 0.
Otherwise, consider the largest edge E in the tree, and say its weight is W. If you remove this edge from the tree, you end up with two smaller trees (call them T1 and T2). Let's say T1 is the tree above the edge (that is, the subtree that contains the root of T).
If u is in T1 and v is in T2, then the largest edge in any path between them goes through E, which has weight W and is guaranteed to be the largest weight in the path. Otherwise, if u and v are both in T1, then the path between them stays inside T1. Same for T2.
Thus S(T) = 2*W*v(T1)*v(T2) + S(T1) + S(T2). (where v(T) means the number of vertices of T).
This gives you a recursive formula for computing S(T), and the only difficulty is finding the largest edge quickly in the subproblems.
One way is to store in each vertex of the tree the maximum weight of any edge below that vertex. Then you can find the largest weight in the whole tree in O(log(v(T)) time, and when you, following the algorithm above, remove that largest weight, you can patch up the weights in the part of the tree above the edge in log(v(T1)) time. The weights in T2 don't need adjusting.
Overall, this gives you an O(v(T)log(v(T))) algorithm. Each edge is removed exactly once, and there's at most O(log(v(T))) work per step.

How to find largest common sub-tree in the given two binary search trees?

Two BSTs (Binary Search Trees) are given. How to find largest common sub-tree in the given two binary trees?
EDIT 1:
Here is what I have thought:
Let, r1 = current node of 1st tree
r2 = current node of 2nd tree
There are some of the cases I think we need to consider:
Case 1 : r1.data < r2.data
2 subproblems to solve:
first, check r1 and r2.left
second, check r1.right and r2
Case 2 : r1.data > r2.data
2 subproblems to solve:
- first, check r1.left and r2
- second, check r1 and r2.right
Case 3 : r1.data == r2.data
Again, 2 cases to consider here:
(a) current node is part of largest common BST
compute common subtree size rooted at r1 and r2
(b)current node is NOT part of largest common BST
2 subproblems to solve:
first, solve r1.left and r2.left
second, solve r1.right and r2.right
I can think of the cases we need to check, but I am not able to code it, as of now. And it is NOT a homework problem. Does it look like?
Just hash the children and key of each node and look for duplicates. This would give a linear expected time algorithm. For example, see the following pseudocode, which assumes that there are no hash collisions (dealing with collisions would be straightforward):
ret = -1
// T is a tree node, H is a hash set, and first is a boolean flag
hashTree(T, H, first):
if (T is null):
return 0 // leaf case
h = hash(hashTree(T.left, H, first), hashTree(T.right, H, first), T.key)
if (first):
// store hashes of T1's nodes in the set H
H.insert(h)
else:
// check for hashes of T2's nodes in the set H containing T1's nodes
if H.contains(h):
ret = max(ret, size(T)) // size is recursive and memoized to get O(n) total time
return h
H = {}
hashTree(T1, H, true)
hashTree(T2, H, false)
return ret
Note that this is assuming the standard definition of a subtree of a BST, namely that a subtree consists of a node and all of its descendants.
Assuming there are no duplicate values in the trees:
LargestSubtree(Tree tree1, Tree tree2)
Int bestMatch := 0
Int bestMatchCount := 0
For each Node n in tree1 //should iterate breadth-first
//possible optimization: we can skip every node that is part of each subtree we find
Node n2 := BinarySearch(tree2(n.value))
Int matchCount := CountMatches(n, n2)
If (matchCount > bestMatchCount)
bestMatch := n.value
bestMatchCount := matchCount
End
End
Return ExtractSubtree(BinarySearch(tree1(bestMatch)), BinarySearch(tree2(bestMatch)))
End
CountMatches(Node n1, Node n2)
If (!n1 || !n2 || n1.value != n2.value)
Return 0
End
Return 1 + CountMatches(n1.left, n2.left) + CountMatches(n1.right, n2.right)
End
ExtractSubtree(Node n1, Node n2)
If (!n1 || !n2 || n1.value != n2.value)
Return nil
End
Node result := New Node(n1.value)
result.left := ExtractSubtree(n1.left, n2.left)
result.right := ExtractSubtree(n1.right, n2.right)
Return result
End
To briefly explain, this is a brute-force solution to the problem. It does a breadth-first walk of the first tree. For each node, it performs a BinarySearch of the second tree to locate the corresponding node in that tree. Then using those nodes it evaluates the total size of the common subtree rooted there. If the subtree is larger than any previously found subtree, it remembers it for later so that it can construct and return a copy of the largest subtree when the algorithm completes.
This algorithm does not handle duplicate values. It could be extended to do so by using a BinarySearch implementation that returns a list of all nodes with the given value, instead of just a single node. Then the algorithm could iterate this list and evaluate the subtree for each node and then proceed as normal.
The running time of this algorithm is O(n log m) (it traverses n nodes in the first tree, and performs a log m binary-search operation for each one), putting it on par with most common sorting algorithms. The space complexity is O(1) while running (nothing allocated beyond a few temporary variables), and O(n) when it returns its result (because it creates an explicit copy of the subtree, which may not be required depending upon exactly how the algorithm is supposed to express its result). So even this brute-force approach should perform reasonably well, although as noted by other answers an O(n) solution is possible.
There are also possible optimizations that could be applied to this algorithm, such as skipping over any nodes that were contained in a previously evaluated subtree. Because the tree-walk is breadth-first we know than any node that was part of some prior subtree cannot ever be the root of a larger subtree. This could significantly improve the performance of the algorithm in certain cases, but the worst-case running time (two trees with no common subtrees) would still be O(n log m).
I believe that I have an O(n + m)-time, O(n + m) space algorithm for solving this problem, assuming the trees are of size n and m, respectively. This algorithm assumes that the values in the trees are unique (that is, each element appears in each tree at most once), but they do not need to be binary search trees.
The algorithm is based on dynamic programming and works with the following intution: suppose that we have some tree T with root r and children T1 and T2. Suppose the other tree is S. Now, suppose that we know the maximum common subtree of T1 and S and of T2 and S. Then the maximum subtree of T and S
Is completely contained in T1 and r.
Is completely contained in T2 and r.
Uses both T1, T2, and r.
Therefore, we can compute the maximum common subtree (I'll abbreviate this as MCS) as follows. If MCS(T1, S) or MCS(T2, S) has the roots of T1 or T2 as roots, then the MCS we can get from T and S is given by the larger of MCS(T1, S) and MCS(T2, S). If exactly one of MCS(T1, S) and MCS(T2, S) has the root of T1 or T2 as a root (assume w.l.o.g. that it's T1), then look up r in S. If r has the root of T1 as a child, then we can extend that tree by a node and the MCS is given by the larger of this augmented tree and MCS(T2, S). Otherwise, if both MCS(T1, S) and MCS(T2, S) have the roots of T1 and T2 as roots, then look up r in S. If it has as a child the root of T1, we can extend the tree by adding in r. If it has as a child the root of T2, we can extend that tree by adding in r. Otherwise, we just take the larger of MCS(T1, S) and MCS(T2, S).
The formal version of the algorithm is as follows:
Create a new hash table mapping nodes in tree S from their value to the corresponding node in the tree. Then fill this table in with the nodes of S by doing a standard tree walk in O(m) time.
Create a new hash table mapping nodes in T from their value to the size of the maximum common subtree of the tree rooted at that node and S. Note that this means that the MCS-es stored in this table must be directly rooted at the given node. Leave this table empty.
Create a list of the nodes of T using a postorder traversal. This takes O(n) time. Note that this means that we will always process all of a node's children before the node itself; this is very important!
For each node v in the postorder traversal, in the order they were visited:
Look up the corresponding node in the hash table for the nodes of S.
If no node was found, set the size of the MCS rooted at v to 0.
If a node v' was found in S:
If neither of the children of v' match the children of v, set the size of the MCS rooted at v to 1.
If exactly one of the children of v' matches a child of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the subtree rooted at that child.
If both of the children of v' match the children of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the left subtree plus the size of the MCS of the right subtree.
(Note that step (4) runs in expected O(n) time, since it visits each node in S exactly once, makes O(n) hash table lookups, makes n hash table inserts, and does a constant amount of processing per node).
Iterate across the hash table and return the maximum value it contains. This step takes O(n) time as well. If the hash table is empty (S has size zero), return 0.
Overall, the runtime is O(n + m) time expected and O(n + m) space for the two hash tables.
To see a correctness proof, we proceed by induction on the height of the tree T. As a base case, if T has height zero, then we just return zero because the loop in (4) does not add anything to the hash table. If T has height one, then either it exists in T or it does not. If it exists in T, then it can't have any children at all, so we execute branch 4.3.1 and say that it has height one. Step (6) then reports that the MCS has size one, which is correct. If it does not exist, then we execute 4.2, putting zero into the hash table, so step (6) reports that the MCS has size zero as expected.
For the inductive step, assume that the algorithm works for all trees of height k' < k and consider a tree of height k. During our postorder walk of T, we will visit all of the nodes in the left subtree, then in the right subtree, and finally the root of T. By the inductive hypothesis, the table of MCS values will be filled in correctly for the left subtree and right subtree, since they have height ≤ k - 1 < k. Now consider what happens when we process the root. If the root doesn't appear in the tree S, then we put a zero into the table, and step (6) will pick the largest MCS value of some subtree of T, which must be fully contained in either its left subtree or right subtree. If the root appears in S, then we compute the size of the MCS rooted at the root of T by trying to link it with the MCS-es of its two children, which (inductively!) we've computed correctly.
Whew! That was an awesome problem. I hope this solution is correct!
EDIT: As was noted by #jonderry, this will find the largest common subgraph of the two trees, not the largest common complete subtree. However, you can restrict the algorithm to only work on subtrees quite easily. To do so, you would modify the inner code of the algorithm so that it records a subtree of size 0 if both subtrees aren't present with nonzero size. A similar inductive argument will show that this will find the largest complete subtree.
Though, admittedly, I like the "largest common subgraph" problem a lot more. :-)
The following algorithm computes all the largest common subtrees of two binary trees (with no assumption that it is a binary search tree). Let S and T be two binary trees. The algorithm works from the bottom of the trees up, starting at the leaves. We start by identifying leaves with the same value. Then consider their parents and identify nodes with the same children. More generally, at each iteration, we identify nodes provided they have the same value and their children are isomorphic (or isomorphic after swapping the left and right children). This algorithm terminates with the collection of all pairs of maximal subtrees in T and S.
Here is a more detailed description:
Let S and T be two binary trees. For simplicity, we may assume that for each node n, the left child has value <= the right child. If exactly one child of a node n is NULL, we assume the right node is NULL. (In general, we consider two subtrees isomorphic if they are up to permutation of the left/right children for each node.)
(1) Find all leaf nodes in each tree.
(2) Define a bipartite graph B with edges from nodes in S to nodes in T, initially with no edges. Let R(S) and T(S) be empty sets. Let R(S)_next and R(T)_next also be empty sets.
(3) For each leaf node in S and each leaf node in T, create an edge in B if the nodes have the same value. For each edge created from nodeS in S to nodeT in T, add all the parents of nodeS to the set R(S) and all the parents of nodeT to the set R(T).
(4) For each node nodeS in R(S) and each node nodeT in T(S), draw an edge between them in B if they have the same value AND
{
(i): nodeS->left is connected to nodeT->left and nodeS->right is connected to nodeT->right, OR
(ii): nodeS->left is connected to nodeT->right and nodeS->right is connected to nodeT->left, OR
(iii): nodeS->left is connected to nodeT-> right and nodeS->right == NULL and nodeT->right==NULL
(5) For each edge created in step (4), add their parents to R(S)_next and R(T)_next.
(6) If (R(S)_next) is nonempty {
(i) swap R(S) and R(S)_next and swap R(T) and R(T)_next.
(ii) Empty the contents of R(S)_next and R(T)_next.
(iii) Return to step (4).
}
When this algorithm terminates, R(S) and T(S) contain the roots of all maximal subtrees in S and T. Furthermore, the bipartite graph B identifies all pairs of nodes in S and nodes in T that give isomorphic subtrees.
I believe this algorithm has complexity is O(n log n), where n is the total number of nodes in S and T, since the sets R(S) and T(S) can be stored in BST’s ordered by value, however I would be interested to see a proof.

Resources