How to organize a MST when one node disappear? - algorithm

I'm doing my research and stuck with a question:
I am having a minimum spanning tree (prim algorithm), now one node in my tree gets deleted, I wonder if there is a way i can re-organize my tree such that the optimality still maintains?
I'm looking for some suggestions here and I will appreciate your help.
Thank you!

This problem has been well-studied. Research done in 2001 found a way to maintain a graph data structure so that you can insert or remove an edge and update the minimum spanning tree in time O(log4 n), which to the best of my knowledge is the best time bound anyone has been able to come up with so far. The paper describing this algorithm is dense and tricky, but if you're interested you can find it here:
Poly-Logarithmic Deterministic Fully-Dynamic Algorithms for Connectivity, Minimum Spanning Tree, 2-Edge, and Biconnectivity
Hope this helps!

When you remove a node in the tree, it may divide the graph into more than one disconnected component. In the worst case, imagine an MST where all the edges go from one central node to all the others - like a star. In this event, if the central node is removed, the whole MST will have to be redone. So, I guess the short answer is - it depends on what node is removed. The solution is to do it like aix mentioned - find all the components which are disconnected because of the removed node and connect them greedily. This could stretch from 0 (if a leaf node is removed) to n-1 (if the center of a star is removed)

If the deleted vertex was a leaf of the MST, you don't need to do anything: you still have a spanning tree and it's still optimal.
If it wasn't a leaf, you now have two sub-trees. All you need to do is reconnect them by the shortest edge that exists between the two sub-trees. The best way to find that edge is probably by using whatever data structure you used for the Prim's algorithm (or, trivially, in O(n^2) by considering all pairs of vertices).

Related

Efficient way for finding the nodes having maximum degree of separation in a connected graph

I am working on a graph library.It has to have a function which finds the two nodes which are most separated i.e they maximum number of the minimum number of nodes required to traverse before reaching the target node from the source node.
One naive way would be to calculate the degree of separation from each node to all other node and repeat the same for every node.
The complexity of this turns out to be O(n^2).
Any better solution to this problem ?
Use Floyd-Warshall algorithm to find all pairs shortest path. Then iterate through results and find one with the longest path.
Without any assumptions on the graph, Floyd-Warshall is the way to go.
If your graph is sparse (i.e. it has a relatively few edges by node, or |E|<<|N|^2), then Johnson is likely to be faster.
With unit edge weight (which seems to be your case), a naïve approach by computing the furthest node (with BFS) for each node leads to O(|N|.|E|). This can probably be improved further, but I don't see a way right now.

What are the practical factors to consider when choosing between Depth-First Search (DFS) and Breadth-First Search (BFS)? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
This post was edited and submitted for review 6 months ago and failed to reopen the post:
Original close reason(s) were not resolved
Improve this question
I understand the differences between DFS and BFS, but I'm interested to know what factors to consider when choosing DFS vs BFS.
Things like avoiding DFS for very deep trees, etc.
That heavily depends on the structure of the search tree and the number and location of solutions (aka searched-for items).
If you know a solution is not far from the root of the tree, a
breadth first search (BFS) might be better.
If the tree is very deep and solutions are rare, depth first search
(DFS) might take an extremely long time, but BFS could be faster.
If the tree is very wide, a BFS might need too much memory, so it
might be completely impractical.
If solutions are frequent but located deep in the tree, BFS could be
impractical.
If the search tree is very deep you will need to restrict the search
depth for depth first search (DFS), anyway (for example with
iterative deepening).
But these are just rules of thumb; you'll probably need to experiment.
I think in practice you'll usually not use these algorithms in their pure form anyway. There could be heuristics that help to explore promising parts of the search space first, or you might want to modify your search algorithm to be able to parallelize it efficiently.
Depth-first Search
Depth-first searches are often used in simulations of games (and game-like situations in the real world). In a typical game you can choose one of several possible actions. Each choice leads to further choices, each of which leads to further choices, and so on into an ever-expanding tree-shaped graph of possibilities.
For example in games like Chess, tic-tac-toe when you are deciding what move to make, you can mentally imagine a move, then your opponent’s possible responses, then your responses, and so on. You can decide what to do by seeing which move leads to the best outcome.
Only some paths in a game tree lead to your win. Some lead to a win by your opponent, when you reach such an ending, you must back up, or backtrack, to a previous node and try a different path. In this way you explore the tree until you find a path with a successful conclusion. Then you make the first move along this path.
Breadth-first search
The breadth-first search has an interesting property: It first finds all the vertices that are one edge away from the starting point, then all the vertices that are two edges away, and so on. This is useful if you’re trying to find the shortest path from the starting vertex to a given vertex. You start a BFS, and when you find the specified vertex, you know the path you’ve traced so far is the shortest path to the node. If there were a shorter path, the BFS would have found it already.
Breadth-first search can be used for finding the neighbour nodes in peer to peer networks like BitTorrent, GPS systems to find nearby locations, social networking sites to find people in the specified distance and things like that.
Nice Explanation from
http://www.programmerinterview.com/index.php/data-structures/dfs-vs-bfs/
An example of BFS
Here’s an example of what a BFS would look like. This is something like Level Order Tree Traversal where we will use QUEUE with ITERATIVE approach (Mostly RECURSION will end up with DFS). The numbers represent the order in which the nodes are accessed in a BFS:
In a depth first search, you start at the root, and follow one of the branches of the tree as far as possible until either the node you are looking for is found or you hit a leaf node ( a node with no children). If you hit a leaf node, then you continue the search at the nearest ancestor with unexplored children.
An example of DFS
Here’s an example of what a DFS would look like. I think post order traversal in binary tree will start work from the Leaf level first. The numbers represent the order in which the nodes are accessed in a DFS:
Differences between DFS and BFS
Comparing BFS and DFS, the big advantage of DFS is that it has much lower memory requirements than BFS, because it’s not necessary to store all of the child pointers at each level. Depending on the data and what you are looking for, either DFS or BFS could be advantageous.
For example, given a family tree if one were looking for someone on the tree who’s still alive, then it would be safe to assume that person would be on the bottom of the tree. This means that a BFS would take a very long time to reach that last level. A DFS, however, would find the goal faster. But, if one were looking for a family member who died a very long time ago, then that person would be closer to the top of the tree. Then, a BFS would usually be faster than a DFS. So, the advantages of either vary depending on the data and what you’re looking for.
One more example is Facebook; Suggestion on Friends of Friends. We need immediate friends for suggestion where we can use BFS. May be finding the shortest path or detecting the cycle (using recursion) we can use DFS.
Breadth First Search is generally the best approach when the depth of the tree can vary, and you only need to search part of the tree for a solution. For example, finding the shortest path from a starting value to a final value is a good place to use BFS.
Depth First Search is commonly used when you need to search the entire tree. It's easier to implement (using recursion) than BFS, and requires less state: While BFS requires you store the entire 'frontier', DFS only requires you store the list of parent nodes of the current element.
DFS is more space-efficient than BFS, but may go to unnecessary depths.
Their names are revealing: if there's a big breadth (i.e. big branching factor), but very limited depth (e.g. limited number of "moves"), then DFS can be more preferrable to BFS.
On IDDFS
It should be mentioned that there's a less-known variant that combines the space efficiency of DFS, but (cummulatively) the level-order visitation of BFS, is the iterative deepening depth-first search. This algorithm revisits some nodes, but it only contributes a constant factor of asymptotic difference.
When you approach this question as a programmer, one factor stands out: if you're using recursion, then depth-first search is simpler to implement, because you don't need to maintain an additional data structure containing the nodes yet to explore.
Here's depth-first search for a non-oriented graph if you're storing “already visited” information in the nodes:
def dfs(origin): # DFS from origin:
origin.visited = True # Mark the origin as visited
for neighbor in origin.neighbors: # Loop over the neighbors
if not neighbor.visited: dfs(neighbor) # Visit each neighbor if not already visited
If storing “already visited” information in a separate data structure:
def dfs(node, visited): # DFS from origin, with already-visited set:
visited.add(node) # Mark the origin as visited
for neighbor in node.neighbors: # Loop over the neighbors
if not neighbor in visited: # If the neighbor hasn't been visited yet,
dfs(neighbor, visited) # then visit the neighbor
dfs(origin, set())
Contrast this with breadth-first search where you need to maintain a separate data structure for the list of nodes yet to visit, no matter what.
One important advantage of BFS would be that it can be used to find the shortest path between any two nodes in an unweighted graph.
Whereas, we cannot use DFS for the same.
The following is a comprehensive answer to what you are asking.
In simple terms:
Breadth First Search (BFS) algorithm, from its name "Breadth", discovers all the neighbours of a node through the out edges of the node then it discovers the unvisited neighbours of the previously mentioned neighbours through their out edges and so forth, till all the nodes reachable from the origional source are visited (we can continue and take another origional source if there are remaining unvisited nodes and so forth). That's why it can be used to find the shortest path (if there is any) from a node (origional source) to another node if the weights of the edges are uniform.
Depth First Search (DFS) algorithm, from its name "Depth", discovers the unvisited neighbours of the most recently discovered node x through its out edges. If there is no unvisited neighbour from the node x, the algorithm backtracks to discover the unvisited neighbours of the node (through its out edges) from which node x was discovered, and so forth, till all the nodes reachable from the origional source are visited (we can continue and take another origional source if there are remaining unvisited nodes and so forth).
Both BFS and DFS can be incomplete. For example if the branching factor of a node is infinite, or very big for the resources (memory) to support (e.g. when storing the nodes to be discovered next), then BFS is not complete even though the searched key can be at a distance of few edges from the origional source. This infinite branching factor can be because of infinite choices (neighbouring nodes) from a given node to discover.
If the depth is infinite, or very big for the resources (memory) to support (e.g. when storing the nodes to be discovered next), then DFS is not complete even though the searched key can be the third neighbor of the origional source. This infinite depth can be because of a situation where there is, for every node the algorithm discovers, at least a new choice (neighbouring node) that is unvisited before.
Therefore, we can conclude when to use the BFS and DFS. Suppose we are dealing with a manageable limited branching factor and a manageable limited depth. If the searched node is shallow i.e. reachable after some edges from the origional source, then it is better to use BFS. On the other hand, if the searched node is deep i.e. reachable after a lot of edges from the origional source, then it is better to use DFS.
For example, in a social network if we want to search for people who have similar interests of a specific person, we can apply BFS from this person as an origional source, because mostly these people will be his direct friends or friends of friends i.e. one or two edges far.
On the other hand, if we want to search for people who have completely different interests of a specific person, we can apply DFS from this person as an origional source, because mostly these people will be very far from him i.e. friend of friend of friend.... i.e. too many edges far.
Applications of BFS and DFS can vary also because of the mechanism of searching in each one. For example, we can use either BFS (assuming the branching factor is manageable) or DFS (assuming the depth is manageable) when we just want to check the reachability from one node to another having no information where that node can be. Also both of them can solve same tasks like topological sorting of a graph (if it has).
BFS can be used to find the shortest path, with unit weight edges, from a node (origional source) to another. Whereas, DFS can be used to exhaust all the choices because of its nature of going in depth, like discovering the longest path between two nodes in an acyclic graph. Also DFS, can be used for cycle detection in a graph.
In the end if we have infinite depth and infinite branching factor, we can use Iterative Deepening Search (IDS).
I think it depends on what problems you are facing.
shortest path on simple graph -> bfs
all possible results -> dfs
search on graph(treat tree, martix as a graph too) -> dfs
....
Some algorithms depend on particular properties of DFS (or BFS) to work. For example the Hopcroft and Tarjan algorithm for finding 2-connected components takes advantage of the fact that each already visited node encountered by DFS is on the path from root to the currently explored node.
For BFS, we can consider Facebook example. We receive suggestion to add friends from the FB profile from other other friends profile. Suppose A->B, while B->E and B->F, so A will get suggestion for E And F. They must be using BFS to read till second level.
DFS is more based on scenarios where we want to forecast something based on data we have from source to destination. As mentioned already about chess or sudoku.
Once thing I have different here is, I believe DFS should be used for shortest path because DFS will cover the whole path first then we can decide the best. But as BFS will use greedy's approach so might be it looks like its the shortest path, but the final result might differ.
Let me know whether my understanding is wrong.
According to the properties of DFS and BFS.
For example,when we want to find the shortest path.
we usually use bfs,it can guarantee the 'shortest'.
but dfs only can guarantee that we can come from this point can achieve that point ,can not guarantee the 'shortest'.
Because Depth-First Searches use a stack as the nodes are processed, backtracking is provided with DFS. Because Breadth-First Searches use a queue, not a stack, to keep track of what nodes are processed, backtracking is not provided with BFS.
When tree width is very large and depth is low use DFS as recursion stack will not overflow.Use BFS when width is low and depth is very large to traverse the tree.
This is a good example to demonstrate that BFS is better than DFS in certain case. https://leetcode.com/problems/01-matrix/
When correctly implemented, both solutions should visit cells that have farther distance than the current cell +1.
But DFS is inefficient and repeatedly visited the same cell resulting O(n*n) complexity.
For example,
1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,
0,0,0,0,0,0,0,0,

How to detect if breaking an edge will make a graph disjoint?

I have a graph that starts off with a single, root node. Nodes are added one by one to the graph. At node creation time, they have to be linked either to the root node, or to another node, by a single edge. Edges can also be created and deleted (one by one, between any two nodes). Nodes can be deleted one at a time. Node and edge creation, deletion operations can happen in any arbitrary order.
OK, so here's my question: When an edge is deleted, is it possible do determine, in constant time (i.e. with an O(1) algorithm), if doing this will divide the graph into two disjoint subgraphs? If it will, then which side of the edge will the root node belong?
I'm willing to maintain, within reasonable limits, any additional data structure that can facilitate the derivation of this information.
Maybe it is not possible to do it in O(1), if so any pointers to literature will be appreciated.
Edit: The graph is a directed graph.
Edit 2: OK, maybe I can restrict the case to deletion of edges from the root node. [Edit 3: not, actually] Also, no edge lands into the root node.
To speed things up a little over the obvious O(|V|+|E|) solution, you could keep a spanning tree which is fairly easy to update as the graph is changed.
If an edge not in the spanning tree is deleted, then the graph isn't disconnected and do nothing. If an edge in the spanning tree is deleted, then you must try to find a new path between those two vertices (if you find one, use it to update the spanning tree, otherwise the graph is disconnected).
So, best case O(1), worst-case O(|V|+|E|), but fairly simple to implement anyway.
Is this a directed graph? The below assumes undirected.
What you are looking for is whether the given edge is a Bridge in the graph. I believe this can be found using a traversal looking for cycles containing that edge and would be O(|V| + |E|).
O(1) is too much to ask.
You might find that looking to maintain 2-edge connected components in dynamic graphs could be useful to you.
Eppstein et al have a paper on this: http://www.ics.uci.edu/~eppstein/pubs/EppGalIta-TR-93-20.pdf
which can maintain 2-edge connected components, in a graph of n nodes where edge insertions and deletions are allowed. It has O(sqrt(n)) time per update and O(log n) time per query.
So any time you delete, you can query in O(logn) to determine if the number of 2-edge connected components has changed. I suppose it can also tell you which component a specific node is in.
This paper is more general and applies to other graph problems, not only 2 edge connected components.
I suggest you look for bridges and dynamic 2-edge connectivity to get you started.
Hope that helps.
as said by Moron just before, you are actually looking for a Bridge in your graph.
Now a Bridge is an edge that has the described attribute and also originates and ends up in Cut Vertexes. Cut vertex is exactly what a Bridge is, but in a vertex (node) edition.
So the only way (though quite bending the initial data structure hypothesis) I can think of, to get a O(1) complexity for this, is if you first check every node in your graph if it is a Cut Vertex and then simply in constant time checking if the edge you want to delete is a attached to one of those two.
Finding if a node in a graph is a Cut Vertex takes O(m+n) where m = # edges and n= # nodes.
Cheers

Minimum cost strongly connected digraph

I have a digraph which is strongly connected (i.e. there is a path from i to j and j to i for each pair of nodes (i, j) in the graph G). I wish to find a strongly connected graph out of this graph such that the sum of all edges is the least.
To put it differently, I need to get rid of edges in such a way that after removing them, the graph will still be strongly connected and of least cost for the sum of edges.
I think it's an NP hard problem. I'm looking for an optimal solution, not approximation, for a small set of data like 20 nodes.
Edit
A more general description: Given a grap G(V,E) find a graph G'(V,E') such that if there exists a path from v1 to v2 in G than there also exists a path between v1 and v2 in G' and sum of each ei in E' is the least possible. so its similar to finding a minimum equivalent graph, only here we want to minimize the sum of edge weights rather than sum of edges.
Edit:
My approach so far:
I thought of solving it using TSP with multiple visits, but it is not correct. My goal here is to cover each city but using a minimum cost path. So, it's more like the cover set problem, I guess, but I'm not exactly sure. I'm required to cover each and every city using paths whose total cost is minimum, so visiting already visited paths multiple times does not add to the cost.
Your problem is known as minimum spanning strong sub(di)graph (MSSS) or, more generally, minimum cost spanning sub(di)graph and is NP-hard indeed. See also another book: page 501 and page 480.
I'd start with removing all edges that don't satisfy the triangle inequality - you can remove edge a -> c if going a -> b -> c is cheaper. This reminds me of TSP, but don't know if that leads anywhere.
My previous answer was to use the Chu-Liu/Edmonds algorithm which solves Arborescence problem; as Kazoom and ShreevatsaR pointed out, this doesn't help.
I would try this in a dynamic programming kind of way.
0- put the graph into a list
1- make a list of new subgraphs of each graph in the previous list, where you remove one different edge for each of the new subgraphs
2- remove duplicates from the new list
3- remove all graphs from the new list that are not strongly connected
4- compare the best graph from the new list with the current best, if better, set new current best
5- if the new list is empty, the current best is the solution, otherwise, recurse/loop/goto 1
In Lisp, it could perhaps look like this:
(defun best-subgraph (digraphs &optional (current-best (best digraphs)))
(let* ((new-list (remove-if-not #'strongly-connected
(remove-duplicates (list-subgraphs-1 digraphs)
:test #'digraph-equal)))
(this-best (best (cons current-best new-list))))
(if (null new-list)
this-best
(best-subgraph new-list this-best))))
The definitions of strongly-connected, list-subgraphs-1, digraph-equal, best, and better are left as an exercise for the reader.
This problem is equivalent to the problem described here: http://www.facebook.com/careers/puzzles.php?puzzle_id=1
Few ideas that helped me to solve the famous facebull puzzle:
Preprocessing step:
Pruning: remove all edges a-b if there are cheaper or having the same cost path, for example: a-c-b.
Graph decomposition: you can solve subproblems if the graph has articulation points
Merge vertexes into one virtual vertex if there are only one outgoing edge.
Calculation step:
Get approximate solution using the directed TSP with repeated visits. Use Floyd Warshall and then solve Assignment problem O(N^3) using hungarian method. If we got once cycle - it's directed TSP solution, if not - use branch and bound TSP. After that we have upper bound value - the cycle of the minimum cost.
Exact solution - branch and bound approach. We remove the vertexes from the shortest cycle and try build strongly connected graph with less cost, than upper bound.
That's all folks. If you want to test your solutions - try it here: http://codercharts.com/puzzle/caribbean-salesman
Sounds like you want to use the Dijkstra algorithm
Seems like what you basically want is an optimal solution for traveling-salesman where it is permitted for nodes to be visited more than once.
Edit:
Hmm. Could you solve this by essentially iterating over each node i and then doing a minimum spanning tree of all the edges pointing to that node i, unioned with another minimum spanning tree of all the edges pointing away from that node?
A 2-approximation to the minimal strongly connected subgraph is obtained by taking a union of a minimal in-branching and minimal out-branching, both rooted at the same (but arbitrary) vertex.
An out-branching, also known as arborescence, is a directed tree rooted at a single vertex spanning all vertexes. An in-branching is the same with reverse edges. These can be found by Edmonds' algorithm in time O(VE), and there are speedups to O(E log(V)) (see the wiki page). There is even an open source implementation.
The original reference for the 2-approximation result is the paper by JaJa and Frederickson, but the paper is not freely accessible.
There is even a 3/2 approximation by Adrian Vetta (PDF), but more complicated than the above.

How do I test whether a tree has a perfect matching in linear time?

Give a linear-time algorithm to test whether a tree has a perfect matching,
that is, a set of edges that touches each vertext of the tree exactly once.
This is from Algorithms by S. Dasgupta, and I just can't seem to nail this problem down. I know I need to use a greedy approach in some manner, but I just can't figure this out. Help?
Pseudocode is fine; once I have the idea, I can implement in any language trivially.
The algorithm has to be linear in anything. O( V + E ) is fine.
I think I have the solution. Since we know the graph is a tree, we know of the existance of leaf nodes, nodes with one edge and no children. In order for this node to be included in the perfect matching, that edge MUST exist in the final solution.
Ergo, we can find all edges connecting to a leaf node, add to the solution, and remove the touched edges from the graph. If, at the end of this process, we are left any remaining nodes untounched, there exists no perfect matching.
In the case of a "graph",
The first step of the problem should be to find the connected components.
Since every edge in the final answer connect two vertices, they belong to at most one of the connected components.
Then, the perfect matching could be found for each connected component.
Linear in what? Linear in the number of edges, keep the edges as an ordered incidence list, ie, every edge (vi, vj) in some total order. Then you can compare the two lists in O(n) of the edges.
The working algorithm would be something as follows:
For each leaf in the tree:
add edge from leaf to its parent to the solution
delete edge from leaf to its parent
delete all edges from the parent to any other vertices
delete leaf and parent from the tree
If the tree is empty then the answer is yes. Otherwise, there's no perfect matching.
I think that it's a simplified problem of finding a Hamiltonian path in a graph:
http://en.wikipedia.org/wiki/Hamiltonian_path
http://en.wikipedia.org/wiki/Hamiltonian_path_problem
I think that there are many solutions on the internet to this problem, but generally finding Hamilton cycle is a NP problem.

Resources