How can a binomial heap be useful in finding the connected components of a graph? And if it can't be used for that, why not?
I've never seen binomial heaps used this way, since graph connected components are usually found using a depth-first or breadth-first search, and neither algorithm requires any sort of priority queue. You could, of course, do a sort of "priority-first search" to find connected components by replacing the stack or queue of DFS or BFS with a priority queue, but there's little reason to do so. That would raise the cost of finding connected components to O(m + n log n), compared with the O(m + n) you'd get from a vanilla BFS or DFS.
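For contrast, here's a minimal sketch of the vanilla approach in Python (the function and variable names are my own), using BFS with a plain FIFO queue where a "priority-first search" would use a priority queue:

from collections import deque

def connected_components(adj):
    # adj maps each vertex to a list of its neighbors.
    # Total work is O(m + n): every vertex and edge is touched once.
    seen = set()
    components = []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp = []
        q = deque([s])
        while q:
            u = q.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        components.append(comp)
    return components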
There is one way in which you can tenuously say that binomial heaps might be useful, and that's in a different strategy for finding connected components. You can, alternatively, use a disjoint-set forest to identify connected components. You begin with each node in its own partition, then call the union operation for each edge to link nodes together. When you've finished, you will end up with a collection of trees, each of which represents one connected component.
There are many strategies for determining how to link trees in a disjoint-set forest. One of them is union-by-size, in which whenever you need to pick which representative to change, you pick the tree of smaller size and point it at the tree of larger size. You can prove that the smallest tree of height k that can be formed this way is a binomial tree of rank k. That's formed by pairing off all the nodes, then taking the representatives and pairing them off, etc. (Try it for yourself - isn't that cool?)
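If you'd like to see that shape fall out in code, here's a small Python sketch (the names are my own) of union-by-size without path compression, run on the pairing-off pattern just described:

def find(parent, x):
    # Walk up to the root; no path compression, so tree shapes are preserved.
    while parent[x] != x:
        x = parent[x]
    return x

def union(parent, size, x, y):
    # Union by size: point the smaller tree's root at the larger tree's root.
    rx, ry = find(parent, x), find(parent, y)
    if rx == ry:
        return
    if size[rx] < size[ry]:
        rx, ry = ry, rx
    parent[ry] = rx
    size[rx] += size[ry]

n = 8
parent, size = list(range(n)), [1] * n
# Pair off the nodes, then pair off the representatives, and so on.
for step in (1, 2, 4):
    for i in range(0, n, 2 * step):
        union(parent, size, i, i + step)
print(parent)  # [0, 0, 0, 2, 0, 4, 4, 6]: a binomial tree of rank 3 rooted at 0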
But that, to me, feels more like a coincidence than anything else. This is less about binomial heaps and more about binomial trees, and this particular shape only arises if you're looking for a pathological case rather than as a matter of course in the execution of the algorithm.
So the best answer I have is "technically you could do this, but you shouldn't, and technically binomial trees arise in this other context that's related to connected components, but that's not the same as using binomial heaps."
Hope this helps!
You are given a connected graph G and a series of edges S. One at a time, you take the next edge from S and check whether removing it would disconnect G. If it would, you return that edge; otherwise, you remove it from the graph and continue.
My initial thought was to use Tarjan's bridge-finding algorithm, which builds a DFS tree and then checks, for each tree edge, whether some back-edge links a vertex in the subtree below that edge to a vertex above it. If no such back-edge exists, the tree edge is a bridge.
You can find all of the bridges in O(V+E) time while building the tree, but I'm having trouble adapting Tarjan's algorithm to account for deletions: every time you delete an edge, the tree changes, and I can't see how to keep the algorithm at O(V+E) time. Any thoughts?
You can do this fairly easily in almost-linear, O(E * alpha(V)) time, where alpha is the ridiculously-slowly-growing inverse-Ackermann function, using a disjoint-set data structure. The trick is to reverse S and add edges, so you ask about when G first becomes connected, rather than disconnected. The incremental connectivity problem is quite a bit easier than the decremental connectivity case.
Given an implementation of disjoint-set, you can augment it to track the number of components, and a graph is connected exactly when there is only one component. If you start with n components, then before any Union(x, y) operation, check whether x and y belong to the same component. If they don't, then the union operation will reduce the graph's components by 1.
For your graph, you'll need to preprocess S to find all of the edges in G that are not in S, and add these to the disjoint-set data structure first. Then, if adding S[i] causes the graph to be connected for the first time, the answer is i, since we've already added S[i+1], S[i+2], ... S[n-1].
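Here's a minimal sketch of the whole approach in Python (the names are my own), assuming vertices are numbered 0..n-1 and both the graph's edge list and S are lists of vertex pairs:

def last_safe_deletion(n, edges, S):
    # Returns the index i such that removing S[i] first disconnects the graph,
    # or None if the graph stays connected even after removing all of S.
    parent, size = list(range(n)), [1] * n
    components = n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        nonlocal components
        rx, ry = find(x), find(y)
        if rx == ry:
            return
        if size[rx] < size[ry]:
            rx, ry = ry, rx
        parent[ry] = rx
        size[rx] += size[ry]
        components -= 1  # merging two distinct components

    removed = set(frozenset(e) for e in S)
    for u, v in edges:
        if frozenset((u, v)) not in removed:
            union(u, v)  # edges of G that are never deleted go in first
    if components == 1:
        return None  # still connected even with every edge of S gone
    for i in range(len(S) - 1, -1, -1):
        union(*S[i])  # add S in reverse: "undelete" one edge at a time
        if components == 1:
            return i  # removing S[i] is what first disconnects G
    return None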
Optimal Complexity
The inverse-Ackermann function is at most 4 for any input that fits in our universe, so our Union-find algorithm is usually considered 'pretty much linear'. However, if that's not good enough...
You can do this in O(V+E), although it's quite complex, and probably of mostly theoretical interest. Gabow and Tarjan's 1984 paper found an algorithm that supports Union-Find in O(1) amortized cost per operation if we know the order of all union operations, which we do here. It still uses the disjoint-set data structure, but adds additional auxiliary structures to cache node information on small sets.
Some pseudocode is provided in the paper, but you'll probably need to implement this yourself or look for implementations online. It also only works in the word RAM model, since it fundamentally relies on manipulating small bit-strings in constant time to use them as lookup tables (a fairly standard assumption, but you'll need to do some low-level bit manipulation).
Find the bridge edges

FOR E in S
    E1 and E2 are the vertices connected by E
    IF E is a bridge
        STOP
    remove E
    IF E1 is disconnected (zero edges on E1)
        STOP
    IF E2 is disconnected (zero edges on E2)
        STOP
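For completeness, here's a direct (and deliberately naive) Python rendering of that loop, assuming the networkx library is available; recomputing the bridge set before every deletion makes it O(|S| * (V + E)) overall, unlike the union-find approach above:

import networkx as nx

def stop_before_disconnecting(G, S):
    # G is an undirected nx.Graph; S lists the (u, v) edges to delete in order.
    for u, v in S:
        bridges = set(nx.bridges(G))  # recomputed from scratch each iteration
        if (u, v) in bridges or (v, u) in bridges:
            return (u, v)  # removing this edge would disconnect the graph
        G.remove_edge(u, v)
    return None  # every edge in S could be removed safely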
Using a disjoint set forest, we can efficiently check incrementally whether a graph has become connected while adding edges to it. (Or rather, it allows us to check whether two vertices already belong to the same connected component.) This is useful, for example, in Kruskal's algorithm for computing the minimum spanning tree.
Is there a similarly efficient data structure and algorithm to check whether a graph has become disconnected while removing edges from it?
More formally, I have a connected graph G = (V, E) from which I'm iteratively removing edges one by one. Without loss of generality, let's say these are e_1, e_2, and so on. I want to stop right before G becomes disconnected, so I need a way to check whether the removal of e_i will make the graph disconnected. This can of course be done by removing e_i and then doing a breadth-first or depth-first search from an arbitrary vertex, but this is an O(|E|) operation, making the entire algorithm O(|E|²). I'm hoping there is a faster way by somehow leveraging results from the previous iteration.
I've looked at partition refinement but it doesn't quite solve this problem.
There’s a family of data structures that are designed to support queries of the form “add an edge between two nodes,” “delete an edge between two nodes,” and “is there a path from node x to node y?” This is called the dynamic connectivity problem and there are many data structures that you can use to solve it. Unfortunately, these data structures tend to be much more complicated than a disjoint-set forest, so you may want to search around to see if you can find a ready-made implementation somewhere.
The layered forest structure of Holm et al. works by maintaining a hierarchy of spanning forests for the graph and doing some clever bookkeeping with edges to avoid rescanning them when seeing if two parts of the graph are still linked. It supports adding and deleting edges in amortized time O(log^2 n) and queries of the form “are these two nodes connected?” in time O(log n / log log n).
A more recent randomized structure called the randomized cutset was developed by Kapron et al. It has worst-case O(log^5 n) costs for all three operations (IIRC).
There might be a better way to solve your particular problem that doesn’t require these heavyweight approaches, but I figured I’d mention these in case they’re helpful.
Apologies first, English is not my first language.
So here's my understanding of a graph represented as an adjacency list: it's usually used for sparse graphs, which most graphs are, and it uses V (the number of vertices) lists. So, V head pointers + 2E nodes (two per edge) for an undirected graph. Therefore, space complexity = O(V + E).
Since any node can have up to V-1 edges (excluding itself), checking a node's adjacency takes O(V) time.
And checking all the edges takes O(2E + V), so O(V + E).
Now, since it's mostly used for sparse graphs, checking adjacency is rarely actually O(V); it's proportional to the number of edges the given vertex has (which is O(V) at worst, since V-1 is the possible maximum).
What I'm wondering is: is it possible to make each list (the edge nodes) a binary tree? Then, to find out whether node A is adjacent to node B, the time complexity would be O(log n) instead of linear O(n).
If it is possible, is it actually done often? And what is that kind of data structure called? I've been googling whether such combinations are possible but couldn't find anything. I would be very grateful if anyone could explain this to me in detail, as I'm new to data structures. Thank you.
Edit: I know binary search can be performed on arrays. I'm talking about the linked-list representation; I thought I made that obvious when I mentioned the head pointers to the lists, but apparently not.
There's no reason the adjacency list for each vertex couldn't be stored as a binary tree, but there are tradeoffs.
As you say, this adjacency list representation is often used for sparse graphs. Often, "sparse graph" means that a particular vertex is adjacent to few others, so your adjacency list for a particular vertex would be very small. While it's true that binary search is O(log n) and sequential search is O(n), when n is very small sequential search is faster. I've seen cases where sequential search beats binary search when n is smaller than 16. It depends on the implementation, of course, but don't count on binary search being faster for small lists.
Another thing to think about is memory. Linked list overhead is one pointer per node. Unless, of course, you're using a doubly linked list. Binary tree overhead is two pointers per node. Perhaps not a big deal, but if you're trying to represent a very large graph, that extra pointer will become important.
If the graph will be updated frequently at run time, you have to take that into account, too. Adding a new edge to a linked list of edges is an O(1) operation. But adding an edge to a binary tree will require O(log n). And you want to make sure you keep that tree balanced. An unbalanced tree starts to act like a linked list.
So, yes, you could make your adjacency lists binary trees. You have to decide whether it's worth the extra effort, based on your application's speed requirements and the nature of your data.
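To make that tradeoff concrete, here's a small Python sketch (the names are my own) that keeps each vertex's neighbors sorted and binary-searches them with bisect; the sorted list stands in for a balanced BST, giving O(log n) lookups, though its inserts are O(n) where a real tree would manage O(log n):

from bisect import bisect_left, insort

class SortedAdjacencyGraph:
    def __init__(self):
        self.adj = {}  # vertex -> sorted list of neighbors

    def add_edge(self, u, v):
        insort(self.adj.setdefault(u, []), v)
        insort(self.adj.setdefault(v, []), u)

    def is_adjacent(self, u, v):
        # Binary search instead of a linear scan: O(log deg(u)).
        neighbors = self.adj.get(u, [])
        i = bisect_left(neighbors, v)
        return i < len(neighbors) and neighbors[i] == v

g = SortedAdjacencyGraph()
g.add_edge("A", "B")
g.add_edge("A", "C")
print(g.is_adjacent("A", "B"))  # True
print(g.is_adjacent("B", "C"))  # False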
My book describes a method to find the strongly connected components of a directed graph in linear time. In addition, several other algorithms for finding strongly connected components (e.g., Tarjan's algorithm) can also find the components in linear time.
However, all of these algorithms require the vertices of the graph to be ordered by decreasing post value (the time at which the vertex is left). Common sorting algorithms such as mergesort take O(n log n) time.
So how do these algorithms manage to locate the strongly connected components in linear time, if ordering the vertices by post value takes O(n log n) time?
Since "time" (the kind by which the post values are measured) is monotonically nondecreasing as a function of time (the number of steps executed by the depth-first search program), it suffices to append each node to a list immediately after the traversal leaves it. At the end of the traversal, the list is in sorted order.
Alternatively, since the post values are integers bounded above polynomially, on some machine models it is possible to sort them in linear time using e.g. radix sort.
I have studied the two graph traversal algorithms, depth-first search and breadth-first search. Since both algorithms are used to solve the same problem of graph traversal, I would like to know how to choose between the two. Is one more efficient than the other, or is there any reason why I would choose one over the other in a particular scenario?
Thank you.
The main difference to me is somewhat theoretical. If you had an infinitely sized graph, then DFS would never find an element if it exists outside of the first path it chooses; it would essentially keep going down that first path forever. BFS, however, would eventually find the element.
If the size of the graph is finite, DFS would likely find an outlier element (one with a larger distance between root and goal) faster, while BFS would find a closer element faster, except in the case where DFS happens to choose the path to the shallow element.
In general, BFS is better for problems related to finding shortest paths and the like, because you go from one node to all nodes adjacent to it, and hence you effectively move from path length one to path length two and so on.
DFS, on the other hand, helps more with connectivity problems and with finding cycles in a graph (though I think you might be able to find cycles with a bit of modification of BFS too). Determining connectivity with DFS is trivial: if you have to call the explore procedure more than once from the outer DFS procedure, then the graph is disconnected (this is for an undirected graph). The strongly connected components algorithm for a directed graph is likewise a modification of DFS. Another application of DFS is topological sorting.
These are some applications of both algorithms:
DFS:
Connectivity
Strongly Connected Components
Topological Sorting
BFS:
Shortest paths (Dijkstra's algorithm is somewhat of a modification of BFS).
Testing whether the graph is bipartite (see the sketch below).
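As a concrete instance of that last item, here's a minimal BFS two-coloring sketch in Python (the names are my own); a graph is bipartite exactly when no edge joins two vertices of the same color:

from collections import deque

def is_bipartite(adj):
    color = {}
    for s in adj:  # handle every connected component
        if s in color:
            continue
        color[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # opposite color to u
                    q.append(v)
                elif color[v] == color[u]:
                    return False  # same-colored neighbors: odd cycle found
    return True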
When traversing a multiply-connected graph, the order in which nodes are traversed may greatly influence (by many orders of magnitude) the number of nodes the traversal method must track. Some kinds of algorithms will be massively better when using breadth-first; others will be massively better when using depth-first.
At one extreme, doing a depth-first search on a binary tree with N leaf nodes requires that the traversing method keep track of lg N nodes, while a breadth-first search would require keeping track of at least N/2 nodes (since it might scan all other nodes before it scans any leaf nodes; immediately prior to scanning the first leaf node, it would have encountered N/2 of the leaves' parent nodes, which have to be tracked separately since none of them reference each other).
At the other extreme, doing a flood-fill on an NxN grid with a method that, if its pixel hasn't been colored yet, colors that pixel and then flood-fills its neighbors will require enqueuing O(N) pixel coordinates if using breadth-first search, but O(N^2) pixel coordinates if using depth-first. When using breadth-first search, paint will seem to "spread out", regardless of the shape to be painted. When using a depth-first algorithm to paint a rectangular spiral, each line of which is straight on one side and jagged on the other (which sides are straight and which jagged depends upon the exact algorithm used), all of the straight sections will get painted before any of the jagged ones, meaning the system must track the location of every jag separately.
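To illustrate, here's a breadth-first flood fill sketched in Python (the names are my own); swapping popleft for pop turns the queue into a stack and gives the depth-first behavior described above:

from collections import deque

def flood_fill(grid, row, col, new_color):
    # Breadth-first flood fill: paint spreads outward in rings from the start.
    old_color = grid[row][col]
    if old_color == new_color:
        return
    grid[row][col] = new_color
    q = deque([(row, col)])
    while q:
        r, c = q.popleft()  # q.pop() here would make it depth-first
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == old_color):
                grid[nr][nc] = new_color
                q.append((nr, nc))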
For a complete/perfect tree, DFS takes a linear amount of space with respect to the depth of the tree whereas BFS takes an exponential amount of space with respect to the depth of the tree. This is because for BFS the maximum number of nodes in the queue is proportional to the number of nodes in one level of the tree. In DFS the maximum number of nodes in the stack is proportional to the depth of the tree.