Consider the following graph. I can distinguish 4 strongly connected components, but there are 5.
Which one did I miss? Also, can a node be shared by several components?
The 5 components are:
Top left node
Top right node
Bottom left node
Bottom right node
The rest of the nodes
What you thought of as components are not actually components, because each of them can be extended until it becomes the fifth component in the list.
Notice that it is not possible to extend the listed components any further, because each of the corner nodes is either unreachable from anywhere else (it has only outgoing edges) or can't reach any other node (it has only incoming edges). Therefore you can't add those corners to a bigger component, and you can't add anything to the corner nodes to make them larger components.
By definition, strongly connected components are maximal (they cannot be extended any further), but the definition says nothing about components not intersecting. However, it is easy to show that components defined this way can't intersect: if two components shared a node, every node of one could reach every node of the other through that shared node, so their union would be a larger strongly connected set, contradicting maximality.
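To see the five components come out of an actual computation, here is a minimal sketch using Kosaraju's algorithm. The original picture is not available, so the graph below is a hypothetical stand-in: the node names (`TL`, `TR`, `BL`, `BR`, `a`, `b`, `c`) and the edges are assumptions chosen so that two corners have only outgoing edges, two have only incoming edges, and the rest form a cycle.

```python
from collections import defaultdict

def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm: returns a list of sets, one per component."""
    graph, reverse = defaultdict(list), defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)

    # Pass 1: record nodes in order of DFS completion on the original graph.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours handled: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in reverse[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components

# Hypothetical graph: TL and TR only have outgoing edges,
# BL and BR only have incoming edges, a -> b -> c -> a is a cycle.
example_nodes = ["TL", "TR", "a", "b", "c", "BL", "BR"]
example_edges = [("TL", "a"), ("TR", "b"), ("a", "b"), ("b", "c"),
                 ("c", "a"), ("c", "BL"), ("a", "BR")]
example_components = strongly_connected_components(example_nodes, example_edges)
```

Each node is assigned to exactly one set, which matches the claim that components cannot overlap.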
The problem statement
Given a directed graph of N nodes where each node is pointing to any one of the N nodes (can possibly point to itself). Ishu, the coder, is bored and he has discovered a problem out of it to keep himself busy. Problem is as follows:
A node is 'good' if it satisfies one of the following:
It is the special node (marked as node 1)
It is pointing to the special node (node 1)
It is pointing to a good node.
Ishu is going to change pointers of some nodes to make them all 'good'. You have to find the minimum number of pointers to change in order to make all the nodes good (Thus, a Good Graph).
NOTE: Resultant Graph should hold the property that all nodes are good and each node must point to exactly one node.
Problem Constraints
1 <= N <= 10^5
Input Format
First and only argument is an integer array A containing N numbers all between 1 to N, where i-th number is the number of node that i-th node is pointing to.
Output Format
An Integer denoting minimum number of pointer changes.
My Attempted Solution
I tried building a graph by reversing the edges and then colouring connected nodes; the answer would be colors - 1. In short, I was attempting to find the number of connected components of a directed graph, which does not make sense (as directed graphs do not have any concept of connected components). Other solutions on the web, like this and this, also say to find the number of connected components, but again, connected components for a directed graph do not make any sense to me. The question looks a bit trickier to me than a first look at it suggests.
@mcdowella gives an accurate characterization of the graph: in each connected component, all chains of pointers end up in the same dead end or cycle.
There's a complication if the special node has a non-null pointer, though. If the special node has a non-null pointer, then it may or may not be in the terminal cycle for its connected component. First set the special node pointer to null to make this problem go away, and then:
If the graph has m connected components, then m-1 pointers would have to be changed to make all nodes good.
Since you don't have to actually change the pointers, all you need to do to solve the problem is count the connected components in the graph.
There are many ways to do this. If you think of the edges as undirected, then you could use BFS or DFS to trace each connected component, for example.
With the kind of graph you actually have, though, where each node has a directed pointer, the easiest way to solve this is with a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Initially you just put each node in its own set, then union the sets on either side of each pointer, and then count the number of sets by counting the number of roots in the disjoint set structure.
If you haven't implemented a disjoint set structure before, it's a little tough to understand just how easy it is. There are a couple of simple implementations here: How to properly implement disjoint set data structure for finding spanning forests in Python?
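The disjoint set approach described above can be sketched as follows. This is a minimal illustration, not a reference solution: the function name `min_pointer_changes` is made up here, and the input is assumed to follow the problem statement (a list `A` where `A[i-1]` is the node that node `i` points to). The special node's pointer is skipped, which has the same effect as setting it to null.

```python
def min_pointer_changes(A):
    """A[i-1] is the node that node i points to (nodes are numbered from 1)."""
    n = len(A)
    parent = list(range(n + 1))  # index 0 is unused

    def find(x):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for node, target in enumerate(A, start=1):
        if node == 1:
            continue  # treat the special node's pointer as null
        root_a, root_b = find(node), find(target)
        if root_a != root_b:
            parent[root_a] = root_b

    # Every root of the structure stands for one connected component.
    components = sum(1 for node in range(1, n + 1) if find(node) == node)
    return components - 1
```

For example, with `A = [2, 3, 2]` node 1 feeds into the cycle 2 -> 3 -> 2, so ignoring node 1's pointer leaves two components and one pointer has to change.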
I think this is a very similar question to Graphic: Rerouting problem test in python language. Because each node points to just one other node, your graph consists of a set of cycles (counting a node pointing to itself as a cycle) with trees feeding in to them. You can find a cycle by following pointers between nodes and checking for nodes that you have already visited. You need to break one link in each cycle, redirecting it to point to the special node. This routes everything else in that cycle, and all the trees that feed into it, to the special node.
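A rough sketch of the cycle-following idea, under the assumption that a cycle already containing the special node needs no change (everything on it reaches node 1 anyway). The function name and the three-state marking are illustrative choices, not something taken from the linked question.

```python
def min_pointer_changes_by_cycles(A):
    """A[i-1] is the node that node i points to (nodes are numbered from 1)."""
    n = len(A)
    state = [0] * (n + 1)  # 0 = unvisited, 1 = on the current walk, 2 = finished
    changes = 0
    for start in range(1, n + 1):
        path = []
        node = start
        # Follow pointers until we hit something we have seen before.
        while state[node] == 0:
            state[node] = 1
            path.append(node)
            node = A[node - 1]
        if state[node] == 1:
            # The walk ran into itself, so a new cycle starts at `node`.
            cycle = path[path.index(node):]
            if 1 not in cycle:
                changes += 1  # break this cycle by redirecting one link to node 1
        for visited in path:
            state[visited] = 2
    return changes
```

Every node is walked over once, so the whole pass is linear in N.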
Since this distance learning thing started I've really struggled to understand data structures and this question really threw me for a loop. I have absolutely no idea how to even start with the code let alone get my point across. Any help at all would be much appreciated...
Pick any node in the graph. That node belongs to some component, and it’s there with potentially a few other nodes. So run a BFS, painting that node and everything reachable from it gold. That’s one component.
Now pick another node. One of two things must be true about it. First, it could be the case that the node has already been painted gold. In that case, you already have counted the component that contains it. Second, it could be unpainted. In that case, you haven’t counted its component, so paint it and all nodes reachable from it gold, and that’s your second component.
Do you think you can generalize this idea so that you count all the components?
As for runtime: how many times does each node get visited this way? Remember that you only paint each node gold once and that each edge is only visited in the course of painting nodes.
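If you get stuck generalizing, here is one possible way to write it down. The original question's exact input format isn't shown, so this sketch assumes an undirected graph given as a dictionary `adjacency` that maps each node to its neighbours; both the name and the representation are assumptions.

```python
from collections import deque

def count_components(adjacency):
    """adjacency maps each node to an iterable of its neighbours (undirected)."""
    painted = set()
    components = 0
    for start in adjacency:
        if start in painted:
            continue  # already counted as part of an earlier component
        components += 1
        painted.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in painted:
                    painted.add(neighbour)
                    queue.append(neighbour)
    return components
```

Each node enters the queue at most once and each adjacency list is scanned once, which gives O(V + E) overall.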
The following graph sample is a portion of a directed acyclic graph which is to be layered and cleaned up so that only edges connecting consecutive layers are kept.
So what I need is to eliminate edges that form "shortcuts", that is, that jump between non-consecutive layers.
The following considerations apply:
The bluish ring layering is valid because, starting at 83140 and ending at 29518, both branches have the same amount (3) of intermediary nodes, and there is no path that is longer between start and end node;
The green ring, starting at 94347 and ending at 107263, has an invalid edge (already red-crossed), because the left branch encompasses only one intermediary node, while the right branch encompasses three intermediary nodes; Besides, since the first edge of that branch is already valid - we know it pertains to the valid blue ring - it is possible to know which is the right edge to cross-out - otherwise it would be impossible to know which layer should be assigned to node 94030 and so it should be eliminated;
If we consider the pink ring after considering the green one, we know that the lower red-crossed edge is to be removed.
BUT if we consider only the yellow ring, both branches seem to be right (they contain the same number of inner nodes), but actually they only seem right because they contain symmetric errors (shortcuts jumping the same amount of nodes on both branches). If we take this ring locally, at least one of the branches would end up in wrong layers, so it is necessary to use more global data to avoid this error.
My questions are:
What typical concepts and operations are involved in the formulation and possible solution of this problem?
Is there an algorithm for that?
First, topologically sort the graph.
Now, from the beginning of the sorted array, start a breadth-first search and try to find the proper "depth" (i.e. distance from the root) of every node. Since a node can have multiple parents, for a node x, depth[x] is the maximum depth of all its parents, plus one. We initialize the depth of all nodes to -1.
Now, in the BFS traversal, when we encounter a node p, we try to update the depth of all its children c, where depth[c] = max(depth[c], depth[p]+1). There are two ways we can detect a child with a shortcut:
if depth[p]+1 < depth[c], it means c has a parent with a higher depth than p. So the edge from p to c must be a shortcut.
if depth[p]+1 > depth[c] and depth[c] != -1, it means c has a parent with a lower depth than p. So p is a better parent, and that other parent's edge to c must be a shortcut.
In both cases, we mark c as problematic.
Now, for every 'problematic' node x, we check all its parents, whose depth should be depth[x]-1. If any of them has a lower depth than that, it has a shortcut edge to x that needs to be removed.
Since the graph can have multiple roots, we should keep a variable to mark visited nodes, and repeat the above for any node that is left unvisited.
This resolves the yellow ring problem, because before we visit any node, all its predecessors have already been visited and properly ranked. This is ensured by the topological sort.
(Note: we can do this in just one pass. Instead of marking problematic nodes, we can maintain a parent variable for each node and delete the edge to the old parent whenever case 2 occurs. Case 1 should be obvious.)
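A simplified sketch of the same idea, under a few assumptions: the graph is given as a list of `(parent, child)` edges, roots get depth 0 rather than -1, and instead of tracking 'problematic' nodes during the traversal, all depths are computed first (longest path from any root, using Kahn's topological ordering) and the edges are filtered afterwards. The function name is made up for illustration.

```python
from collections import defaultdict, deque

def remove_shortcut_edges(edges):
    """Return (depth, kept_edges) for a DAG given as (parent, child) pairs."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    # Process nodes in topological order so that every parent is final
    # before its children are ranked.
    depth = {node: 0 for node in nodes}
    queue = deque(node for node in nodes if indegree[node] == 0)
    while queue:
        parent = queue.popleft()
        for child in children[parent]:
            depth[child] = max(depth[child], depth[parent] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # An edge is kept only if it connects consecutive layers.
    kept = [(p, c) for p, c in edges if depth[c] == depth[p] + 1]
    return depth, kept
```

With the edges `a -> b`, `b -> c`, `a -> c`, `c -> d`, node `c` ends up in layer 2, so `a -> c` jumps a layer and is dropped.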
Let's say I have an undirected graph G. Let's say I add the following:
add_edge(1,2,G);
add_edge(1,3,G);
add_edge(0,2,G);
Now let's say I add this again:
add_edge(0,2,G);
Do I have two edges in my graph from 0 ---> 2 ?
What happens if I added the edge twice and I do:
remove_edge(0,2,G);
Do both edges disappear, or do I still have one of them?
The answer to both of your questions depends on the definition of graph G.
The answer to the first question, according to the boost::graph tutorial, depends on which OutEdgeList you use in your graph definition. If you use a container that cannot represent multiple edges (such as setS or hash_setS), there will be only one edge between two vertices no matter how many times you insert it. If you use a vectorS, multisetS or similar, there will be one edge inserted for each call of add_edge().
The answer to the second question, according to the same page (that section of the page does not allow direct links - just search for remove_edge) is that all edges between the two vertices will be removed after calling that particular remove_edge() function. There are several other versions of remove_edge() (described on the same page), each with a slightly different behaviour.
I have a cyclic directed graph. Starting at the leaves, I wish to propagate data attached to each node downstream to all nodes that are reachable from that node. In particular, I need to keep pushing data around any cycles that are reached until the cycles stabilise.
I'm completely sure that this is a stock graph traversal problem. However, I'm having a fair bit of difficulty trying to find a suitable algorithm --- I think I'm missing a few crucial search keywords.
Before I attempt to write my own half-assed O(n^3) algorithm, can anyone point me at a proper solution? And what is this particular problem called?
Since the graph is cyclic (i.e. can contain cycles), I would first break it down into strongly connected components. A strongly connected component of a directed graph is a maximal subgraph where each node is reachable from every other node in the same subgraph. This would yield a set of subgraphs. Notice that in a strongly connected component of more than one node, every node lies on at least one cycle.
Now, in each component, any information in one node will eventually end up in every other node of the component (since they are all reachable). Thus for each subgraph we can simply take all the data from all the nodes in it and make every node have the same set of data. No need to keep going through the cycles. Also, at the end of this step, all nodes in the same component contain exactly the same data.
The next step would be to collapse each strongly connected component into a single node. As the nodes within the same component all have the same data, and are therefore basically the same, this operation does not really change the graph. The newly created "super node" will inherit all the edges going out or coming into the component's nodes from nodes outside the component.
Since we have collapsed all strongly connected components, there will be no cycles in the resultant graph (why? because had there been a cycle formed by the resultant nodes, they would all have been placed in the same component in the first place). The resultant graph is now a Directed Acyclic Graph. There are no cycles, and a simple depth first traversal from all nodes with indegree=0 (i.e. nodes that have no incoming edges), propagating data from each node to its adjacent nodes (i.e. its "children"), should get the job done.
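The whole pipeline might look roughly like the sketch below. Several things here are assumptions made for illustration: the data attached to each node is modelled as a Python set and "propagating" means taking unions, the function name `propagate` is invented, and the final step processes the collapsed graph in topological order (Kahn's algorithm) rather than with a plain depth-first traversal, so that each super node has received everything from its predecessors before it passes data on.

```python
from collections import defaultdict, deque

def propagate(nodes, edges, data):
    """Return, for every node, the union of its data and all upstream data."""
    graph, reverse = defaultdict(list), defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)

    # Step 1: strongly connected components (Kosaraju, iterative DFS).
    seen, order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()
    component = {}
    for start in reversed(order):
        if start in component:
            continue
        component[start] = start  # the first node found names the component
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in reverse[node]:
                if nxt not in component:
                    component[nxt] = start
                    stack.append(nxt)

    # Step 2: merge the data of every node in a component.
    merged = defaultdict(set)
    for node in nodes:
        merged[component[node]] |= data.get(node, set())

    # Step 3: build the collapsed DAG between components.
    dag = defaultdict(set)
    indegree = {comp: 0 for comp in set(component.values())}
    for u, v in edges:
        cu, cv = component[u], component[v]
        if cu != cv and cv not in dag[cu]:
            dag[cu].add(cv)
            indegree[cv] += 1

    # Step 4: push data downstream in topological order.
    queue = deque(comp for comp, deg in indegree.items() if deg == 0)
    while queue:
        comp = queue.popleft()
        for nxt in dag[comp]:
            merged[nxt] |= merged[comp]
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    return {node: set(merged[component[node]]) for node in nodes}
```

For instance, with edges `a -> b`, `b -> c`, `c -> b`, `c -> d`, `e -> d`, nodes `b` and `c` collapse into one super node and `d` ends up with the data of every node.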