How do I solve the following graph problem - algorithm

The problem statement
Given a directed graph of N nodes where each node points to exactly one of the N nodes (possibly itself). Ishu, the coder, is bored, and he has come up with a problem out of it to keep himself busy. The problem is as follows:
A node is 'good' if it satisfies one of the following:
It is the special node (marked as node 1)
It is pointing to the special node (node 1)
It is pointing to a good node.
Ishu is going to change pointers of some nodes to make them all 'good'. You have to find the minimum number of pointers to change in order to make all the nodes good (Thus, a Good Graph).
NOTE: Resultant Graph should hold the property that all nodes are good and each node must point to exactly one node.
Problem Constraints
1 <= N <= 10^5
Input Format
The first and only argument is an integer array A containing N numbers, all between 1 and N, where the i-th number is the node that the i-th node points to.
Output Format
An integer denoting the minimum number of pointer changes.
My Attempted Solution
I tried building a graph by reversing the edges, and then trying to colour connected nodes; the answer would be colours - 1. In short, I was attempting to find the number of connected components of a directed graph, which does not make sense to me (as directed graphs do not have any concept of connected components). Other solutions on the web, like this and this, also point to finding the number of connected components, but again, connected components of a directed graph do not make any sense to me. The question looks a bit trickier to me than a first look at it suggests.

mcdowella gives an accurate characterization of the graph -- in each connected component, all chains of pointers end up in the same dead end or cycle.
There's a complication if the special node has a non-null pointer, though. If the special node has a non-null pointer, then it may or may not be in the terminal cycle for its connected component. First set the special node pointer to null to make this problem go away, and then:
If the graph has m connected components, then m-1 pointers would have to be changed to make all nodes good.
Since you don't have to actually change the pointers, all you need to do to solve the problem is count the connected components in the graph.
There are many ways to do this. If you think of the edges as undirected, then you could use BFS or DFS to trace each connected component, for example.
With the kind of graph you actually have, though, where each node has a directed pointer, the easiest way to solve this is with a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Initially you just put each node in its own set, then union the sets on either side of each pointer, and then count the number of sets by counting the number of roots in the disjoint set structure.
If you haven't implemented a disjoint set structure before, it's a little tough to understand just how easy it is. There are a couple of simple implementations here: How to properly implement disjoint set data structure for finding spanning forests in Python?
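A minimal Python sketch of that counting, assuming the 1-indexed input array A from the problem statement (count_changes and the helper names are mine, not part of the problem); "nulling" the special node's pointer simply means skipping its edge in the union step:

def count_changes(A):
    # A[i] is the node that node i+1 points to (1-based values)
    n = len(A)
    parent = list(range(n + 1))            # disjoint-set forest; index 0 unused

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry

    for i in range(1, n + 1):
        if i == 1:
            continue                       # "null out" the special node's pointer
        union(i, A[i - 1])                 # treat the pointer as an undirected edge

    roots = {find(i) for i in range(1, n + 1)}
    return len(roots) - 1                  # m components -> m - 1 changes

For example, count_changes([2, 3, 1]) returns 0 (every node already reaches node 1), while count_changes([1, 2]) returns 1 (node 2 points to itself and needs one redirect).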

I think this is a very similar question to Graphic: Rerouting problem test in python language. Because each node points to just one other node, your graph consists of a set of cycles (counting a node pointing to itself as a cycle) with trees feeding in to them. You can find a cycle by following pointers between nodes and checking for nodes that you have already visited. You need to break one link in each cycle, redirecting it to point to the special node. This routes everything else in that cycle, and all the trees that feed into it, to the special node.
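That cycle-breaking view also fits in a few lines of Python (min_changes is my own name; it follows pointers with a three-state visited marker and charges one change for every cycle that does not contain node 1):

def min_changes(A):
    # A[i] is the node that node i+1 points to (1-based values)
    n = len(A)
    state = [0] * (n + 1)   # 0 = unvisited, 1 = on current path, 2 = done
    changes = 0
    for start in range(1, n + 1):
        path = []
        v = start
        while state[v] == 0:
            state[v] = 1
            path.append(v)
            v = A[v - 1]
        if state[v] == 1:
            # found a new cycle; it needs one redirect unless node 1 is on it
            cycle = path[path.index(v):]
            if 1 not in cycle:
                changes += 1
        for u in path:
            state[u] = 2
    return changes

Each node is visited a constant number of times, so this runs in O(N).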

Related

Constructing a binary tree from a list of its edges (node pairs)

I'd like to construct a binary tree from a quite unusual input. The input contains:
Total number of nodes.
The integer label of the root.
A list of all edges (pairs of vertices/nodes that are connected to each other). The edges in the list are UNSORTED; there is only one rule for determining left/right children: the child in the edge that appears first in the list is always on the left. The order of child/parent within each vertex pair is also random.
I've come up with some straightforward solutions, but they require multiple searches through the list of all edges (I'd basically find the 2 edges that have the labeled root in them and repeat this process for all the subtrees).
I imagine this straightforward approach would be VERY inefficient for trees with a large number of nodes, but I can't come up with anything else.
Any ideas for more efficient algorithms to solve this?
Here's an example for better visualization:
INPUT: 5 NODES, ROOT LABELED 2, LIST OF EDGES: [(1,0),(1,2),(2,3),(1,4)]
The tree would look like this:
      2
    /   \
   1     3
  / \
 0   4
It is important to clarify whether the given edge list is stated to be directed or not.
If edges are given in a directed fashion (i.e. it is stated that any given edge A-B also includes the information that A is the parent of B), storing the edges in an adjacency list while recording the number of incoming edges for each vertex in an array should be sufficient. Once you go through the array of incoming-edge counts, the vertex with 0 incoming edges (i.e. no parents) is the root. Then you can run a DFS in linear time to traverse the graph and put it in whatever data structure best suits your needs.
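For the directed case, the root-finding step might look like this in Python (find_root is my own name; edges are assumed to be (parent, child) pairs):

def find_root(edges):
    # the root is the only vertex that never appears as a child,
    # i.e. it has zero incoming edges
    children = {child for _parent, child in edges}
    vertices = {v for edge in edges for v in edge}
    roots = [v for v in vertices if v not in children]
    return roots[0] if roots else None

From there, a DFS over the adjacency list builds the tree as described above.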
If the edges given are stated to be undirected, the scheme changes a bit, as you no longer have the concept of incoming and outgoing edges. In that case, since no further structure for the tree is specified (e.g. BST), you can basically consider any node with fewer than 3 edges as a root candidate and run DFS as mentioned above (that is, all the leaves and the intermediate nodes with a single child).
A simple solution is: just link all the edges in the tree, that's it!
Start by preparing a dictionary. If nodes for the start and end points of an edge don't exist yet, create them; as the input order is random, you can set their left and right pointers to NULL initially.
You have the rule "the child in the edge that appears first in the list is always on the left", so create the children accordingly.
Also, you already know the root of the tree so you can iterate across the nodes you have constructed so far.
Through this you can generate the tree in one pass.
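A rough Python sketch of that idea, using the example input above (the Node class and build_tree are my own names; node labels are assumed to be 0..n-1 and the edges are the undirected pairs from the question):

from collections import deque

class Node:
    def __init__(self, label):
        self.label = label
        self.left = None
        self.right = None

def build_tree(n, root_label, edges):
    # adjacency list that remembers the position of each edge in the input
    adj = {v: [] for v in range(n)}
    for idx, (a, b) in enumerate(edges):
        adj[a].append((idx, b))
        adj[b].append((idx, a))

    nodes = {v: Node(v) for v in range(n)}
    visited = {root_label}
    queue = deque([root_label])
    while queue:
        v = queue.popleft()
        # children of v, ordered by where their edge appears in the list:
        # the earlier edge gives the left child, the later one the right child
        kids = sorted((idx, w) for idx, w in adj[v] if w not in visited)
        for slot, (_, w) in zip(("left", "right"), kids):
            setattr(nodes[v], slot, nodes[w])
            visited.add(w)
            queue.append(w)
    return nodes[root_label]

For the example, build_tree(5, 2, [(1,0),(1,2),(2,3),(1,4)]) gives a root labeled 2 with left child 1 and right child 3, and 1 in turn has left child 0 and right child 4.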
Hope this helps!

Check if a changing undirected graph has at least one circle

I have an undirected graph which initially has no edges. Now in every step an edge is added or deleted, and one has to check whether the graph has at least one cycle. Probably the easiest necessary and sufficient condition for that is
number of connected components + number of edges > number of nodes.
As the "steps" I mentioned above are executed millions of times, this check has to be really fast. So I wonder what would be a quick way to check the condition, exploiting the fact that in each step only one edge changes.
Any suggestions?
If you are keen, you can try to implement a fully dynamic graph connectivity data structure like the one described in "Poly-logarithmic deterministic fully-dynamic graph algorithms I: connectivity and minimum spanning tree" by Jacob Holm, Kristian de Lichtenberg, Mikkel Thorup.
When adding an edge, you check whether the two endpoints are connected. If not, the number of connected components decreases by one. After deleting an edge, check whether the two endpoints are still connected. If not, the number of connected components increases by one. The amortized runtime of edge insertion and deletion would be O(log^2 n), but I can imagine the constant factor is quite high.
There are newer results with better bounds. There is also an experimental evaluation of some of the dynamic connectivity algorithms that considers implementation details as well. There is also a Javascript implementation; I have no idea how good it is.
I guess in practice you can make it much easier by maintaining a spanning forest. You get edge additions and non-tree edge deletions (almost) for free. For tree edge deletions you could just use "brute force" in the form of BFS or DFS to check whether the endpoints are still connected. Especially if the number of nodes is bounded, maybe that works well enough in practice; BFS and DFS are both O(n^2) for dense graphs, and you can charge some of that work to the operations where you got lucky and didn't have a lot to do.
I suggest you label all the nodes. Use integers, that's easiest.
At any point, your graph will be divided into a number of disjoint subgraphs. Initially, each node is in its own subgraph.
Maintain the condition that each subgraph has a unique label, and all the nodes in the subgraph carry that label. Initially, just give each node a unique label. If your problem includes adding nodes, you might want to maintain a variable to hold the next available label.
If and only if a new edge would connect two nodes with identical labels, then the edge would create a cycle.
Whenever you add an edge, you will connect two previously disjoint subgraphs. You must relabel one of the subgraphs to match the other, which will require visiting all the nodes of one subgraph. This is the highest computational burden in this scheme.
If you don't mind allocating more space, you should also maintain a list of labels in use, associated with a count of the nodes carrying that label. This will allow you to choose the smaller subgraph when relabeling.
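A small Python sketch of that bookkeeping (the class and method names are mine); it only covers edge additions, as described above, and relabels the smaller side on every merge:

class LabeledGraph:
    def __init__(self, n):
        self.label = list(range(n))                 # current label of each node
        self.members = {i: [i] for i in range(n)}   # nodes carrying each label

    def add_edge(self, u, v):
        """Return True if adding u-v would create a cycle, else merge the subgraphs."""
        if self.label[u] == self.label[v]:
            return True                             # already the same subgraph: the edge closes a cycle
        big, small = self.label[u], self.label[v]
        if len(self.members[big]) < len(self.members[small]):
            big, small = small, big
        for node in self.members.pop(small):        # relabel the smaller subgraph
            self.label[node] = big
            self.members[big].append(node)
        return False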
If you know which two nodes are being connected by the new edge, you could use some sort of path-finding algorithm to detect an alternative path between the two nodes. In other words, if a path connecting the two nodes of your new edge exists before you add the new edge, adding the new edge will create a cycle.
Your problem then reduces to finding a path between two given nodes.
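In code this is just a reachability check between the two endpoints before the edge is inserted; a minimal sketch with an iterative depth-first search, assuming adj maps every node to the set of its current neighbours (the function name is mine):

def would_create_cycle(adj, u, v):
    # True if some path u .. v already exists, so adding edge u-v closes a cycle
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False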

Is it possible to develop an algorithm to solve a graph isomorphism?

Or will I need to develop an algorithm for every unique graph? The user is given a type of graph, and they are then supposed to use the interface to add nodes and edges to an initial graph. Then they submit the graph and the algorithm is supposed to confirm whether the user's graph matches the given graph.
The algorithm needs to confirm not only the neighbours of each node, but also that each node and each edge has the correct value. The initial graphs will always have a root node, which is where the algorithm can start from.
I am wondering if I can develop the logic for such an algorithm in the general sense, or will I need to actually code a unique algorithm for each unique graph. It isn't a big deal if it's the latter case, since I only have about 20 unique graphs.
Thanks. I hope I was clear.
The graph isomorphism problem might not be hard, but it is very hard to prove that it is not hard.
There are three possibilities for this problem.
1. Graph isomorphism problem is NP-hard.
2. Graph isomorphism problem has a polynomial time solution.
3. Graph isomorphism problem is neither NP-hard nor in P.
If two graphs are isomorphic, then there exists a permutation witnessing this isomorphism. Taking this permutation as a certificate, we can verify in polynomial time that the two graphs are isomorphic to each other. Thus, graph isomorphism lies in NP. However, for more than 30 years no one has been able to prove whether this problem is NP-hard or in P. Thus, this problem is intrinsically hard despite its simple problem description.
If I understand the question properly, you can have ONE single algorithm, which will work by accepting one of several reference graphs as its input (in addition to the unknown graph whose isomorphism with the reference graph is to be asserted).
It appears that you seek to assert whether a given graph is exactly identical to another graph, rather than asserting whether the graphs are isomorphic relative to a particular set of operations or characteristics. This implies that the algorithm be supplied some specific reference graph, rather than working off some set of "abstract" rules such as whether neither graph has loops, or whether both graphs are fully connected, etc., even though the graphs may differ in some other fashion.
Edit, following confirmation that:
Yeah, the algorithm would be supplied a reference graph (which is the answer), and will then check the user's graph to see if it is isomorphic (including the values of edges and nodes) to the reference
In that case, yes, it is quite possible to develop a relatively simple algorithm which would assert isomorphism of these two graphs. Note that the considerations mentioned in other remarks and answers, relative to the fact that the problem may be NP-hard, merely indicate that a simple algorithm [or any algorithm for that matter] may not be sufficient to solve the problem in a reasonable amount of time for graphs whose size and complexity are too big. However, assuming relatively small graphs and taking advantage (!) of the requirement that the weights of edges and nodes also need to match, the following algorithm should generally be applicable.
General idea:
For each sub-graph that is disconnected from the rest of the graph, identify one (or possibly several) node(s) in the user graph which must match a particular node of the reference graph. By following the paths from this node [in an orderly fashion, more on this below], assert the identity of other nodes and/or determine that there are some nodes which cannot be matched (and hence that the two structures are not isomorphic).
Rough pseudo code:
1. For both the reference and the user-supplied graph, make the list of their Connected Components, i.e. the list of sub-graphs therein which are disconnected from the rest of the graph. Finding these connected components is done by following either a breadth-first or a depth-first path starting at a given node and "marking" all nodes on that path with an arbitrary [typically incremental] element ID number. Once a given path has been fully visited, repeat the operation from any other non-marked node, and do so until there are no more non-marked nodes.
2. Build a "database" of the characteristics of each graph.
This will be useful to identify matching candidates and also to determine, early on, instances of non-isomorphism.
Each "database" would have two kinds of "records" : node and edge, with the following fields, respectively:
- node_id, Connected_element_Id, node weight, number of outgoing edges, number of incoming edges, sum of outgoing edges weights, sum of incoming edges weight.
node
- edge_id, Connected_element_Id, edge weight, node_id_of_start, node_id_of_end, weight_of_start_node, weight_of_end_node
3. Build a database of the Connected elements of each graph
Each record should have the following fields: Connected_element_id, number of nodes, number of edges, sum of node weights, sum of edge weights.
4. [optionally] Dispatch the easy cases of non-isomorphism:
4.a mismatch of the number of connected elements
4.b mismatch of the number of connected elements, grouped by all fields but the id (number of nodes, number of edges, sum of node weights, sum of edge weights)
5. For each connected element in the reference graph
5.1 Identify candidates for the matching connected element in the user-supplied graph. The candidates must have the same connected element characteristics (number of nodes, number of edges, sum of nodes weights, sum of edges weights) and contain the same list of nodes and edges, again, counted by grouping by all characteristics but the id.
5.2 For each candidate, finalize its confirmation as an isomorphic graph relative to the corresponding connected element in the reference graph. This is done by starting at a candidate node-match, i.e. a node, hopefully unique, which has the exact same characteristics in both graphs. In case there is no such node, one needs to try and disqualify each possible candidate until isomorphism can be confirmed (or all candidates are exhausted). From the candidate node match, walk the graph, say breadth-first, finding matches for the other nodes on the basis of the direction and weight of the edges and the weight of the nodes.
The main tricks with this algorithm are to keep proper accounting of the candidates (whether a candidate connected element at the higher level or a candidate node at the lower level), and to remember and mark other identified items as such (and un-mark them if the hypothetical candidate eventually proves not to be feasible).
I realize the above falls short of a formal algorithm description, but that should give you an idea of what is required and possibly a starting point, should you decide to implement it.
You may remark that the requirement of matching node and edge weights, which may appear to be an added difficulty for asserting isomorphism, effectively simplifies the algorithm, because these node/edge characteristics make the nodes more unique and hence make it more likely that the algorithm will a) find unique node candidates and b) quickly find other candidates on the path and/or quickly assert non-isomorphism.
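To make steps 1 and 2 a bit more concrete, here is a rough Python sketch; it assumes each graph comes as a list of nodes, a node_weight dict, and a list of (u, v, weight) edge triples (all of these names are my own, not something the question prescribes):

from collections import deque, defaultdict

def component_labels(nodes, edges):
    # Step 1: label connected components, treating every edge as undirected
    adj = defaultdict(set)
    for u, v, _w in edges:
        adj[u].add(v)
        adj[v].add(u)
    label, current = {}, 0
    for start in nodes:
        if start in label:
            continue
        label[start] = current
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in label:
                    label[y] = current
                    queue.append(y)
        current += 1
    return label

def node_records(nodes, edges, node_weight):
    # Step 2: per-node characteristics used to pair up candidate nodes
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    out_sum, in_sum = defaultdict(float), defaultdict(float)
    for u, v, w in edges:
        out_deg[u] += 1
        out_sum[u] += w
        in_deg[v] += 1
        in_sum[v] += w
    return {v: (node_weight[v], out_deg[v], in_deg[v], out_sum[v], in_sum[v])
            for v in nodes}

Two nodes can only be matched in step 5 if their records are equal, and the per-component totals of step 3 fall out of the same loops.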

An algorithm to check if a vertex is reachable

Is there an algorithm that can check, in a directed graph, if a vertex, let's say V2, is reachable from a vertex V1, without traversing all the vertices?
You might find a route to that node without traversing all the edges, and if so you can give a yes answer as soon as you do. Nothing short of traversing all the edges can confirm that the node isn't reachable (unless there's some other constraint you haven't stated that could be used to eliminate the possibility earlier).
Edit: I should add that it depends on how often you need to do queries versus how large (and dense) your graph is. If you need to do a huge number of queries on a relatively small graph, it may make sense to pre-process the data in the graph to produce a matrix with a bit at the intersection of any V1 and V2 to indicate whether there's a connection from V1 to V2. This doesn't avoid traversing the graph, but it can avoid traversing the graph at the time of the query. I.e., it's basically a greedy algorithm that assumes you're going to eventually use enough of the combinations that it's easiest to just traverse them all and store the result. Depending on the size of the graph, the pre-processing step may be slow, but once it's done executing a query becomes quite fast (constant time, and usually a pretty small constant at that).
Depth first search or breadth first search. Stop when you find one. But there's no way to tell there's none without going through every one, no. You can sometimes improve the performance with heuristics, for example if you have additional information about the graph. For example, if the graph represents a coordinate space like a real map, and most of the time you know that there's going to be a mostly direct path, then you can have the depth-first search look along lines that "aim towards the target". However, imagine the case where the start and end points are right next to each other, but with no edge in between, and to find the path, you have to go way out of the way. You have to check every case in order to be exhaustive.
I doubt it has a name, but a breadth-first search might go like this:
Add V1 to a queue of nodes to be visited
While there are nodes in the queue:
Take the next node off the queue
If the node is V2, return true
Mark the node as visited
For every node at the end of an outgoing edge which is not yet visited:
Add this node to the queue
End for
End while
Return false
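The same search in Python, assuming the graph is a dict mapping each vertex to the list of vertices its outgoing edges point to (the function name is mine):

from collections import deque

def reachable(graph, v1, v2):
    # breadth-first search from v1, stopping as soon as v2 is found
    seen = {v1}
    queue = deque([v1])
    while queue:
        node = queue.popleft()
        if node == v2:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False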
Create an adjacency matrix when the graph is created. At the same time you do this, create matrices consisting of the powers of the adjacency matrix up to the number of nodes in the graph. To find if there is a path from node u to node v, check the matrices (starting from M^1 and going to M^n) and examine the value at (u, v) in each matrix. If, for any of the matrices checked, that value is greater than zero, you can stop the check because there is indeed a connection. (This gives you even more information as well: the power tells you the number of steps between nodes, and the value tells you how many paths there are between nodes for that step number.)
(Note that if you know the number of steps in the longest path in your graph, for whatever reason, you only need to create a number of matrices up to that power. As well, if you want to save memory, you could just store the base adjacency matrix and create the others as you go along, but for large matrices that may take a fair amount of time if you aren't using an efficient method of doing the multiplications, whether from a library or written on your own.)
It would probably be easiest to just do a depth- or breadth-first search, though, as others have suggested, not only because they're comparatively easy to implement but also because you can generate the path between nodes as you go along. (Technically you'd be generating multiple paths and discarding loops/dead-end ones along the way, but whatever.)
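A small NumPy sketch of the matrix-power check (connected_by_powers is my own name); entries are clipped to 0/1 here because only reachability is needed, which drops the path-count information mentioned above:

import numpy as np

def connected_by_powers(adj, u, v):
    # adj: n x n 0/1 adjacency matrix; entry (u, v) of adj^k counts
    # the walks of length exactly k from u to v
    adj = np.asarray(adj, dtype=np.int64)
    n = adj.shape[0]
    power = adj.copy()
    for _ in range(n):
        if power[u, v] > 0:
            return True
        power = np.minimum(power @ adj, 1)  # keep entries 0/1; we only need "is there a walk"
    return False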
In principle, you can't determine that a path exists without traversing some part of the graph, because the failure case (a path does not exist) cannot be determined without traversing the entire graph.
You MAY be able to improve your performance by searching backwards (search from destination to starting point), or by alternating between forward and backward search steps.
Any good AI textbook will talk at length about search techniques. Elaine Rich's book was good in this area. Amazon is your FRIEND.
You mentioned here that the graph represents a road network. If the graph is planar, you could use Thorup's Algorithm which creates an O(nlogn) space data structure that takes O(nlogn) time to build and answers queries in O(1) time.
Another approach to this problem would allow you to ignore all of the vertices. If you were to only look at the edges, you can produce a transitive closure array that will show you each vertex that is reachable from any other vertex.
Start with your list of edges:
Va -> Vc
Va -> Vd
....
Create an array with the start location as the rows and the end location as the columns. Fill the array with 0. For each edge in the list of edges, place a 1 at the (start, end) coordinate of that edge.
Now you iterate until either the (V1, V2) entry is 1 or there are no more changes.
For each row N:
NextRowN = RowN
For each column C that is true in RowN
OR row C into NextRowN (everything reachable from C is also reachable from N)
Set RowN to NextRowN
If you run this algorithm until the end, you will quickly have a complete list of all reachable vertices without looking at any of them. The runtime is proportional to the number of edges. This would work well with a reasonable implementation and a reasonable number of edges.
A slightly more complex version of this algorithm would be to only calculate the vertices reachable from V1. To do this, you would focus your scope on the ones that are currently reachable at any given time. You can also limit adding rows to only one time, since the other rows never change.
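The iteration described above, written out in Python (0-indexed vertices; transitive_closure is my own name for it):

def transitive_closure(n, edges):
    # reach[a][b] is True when b can be reached from a
    reach = [[False] * n for _ in range(n)]
    for a, b in edges:
        reach[a][b] = True
    changed = True
    while changed:
        changed = False
        for a in range(n):
            row = reach[a]
            for b in range(n):
                if row[b]:
                    # everything reachable from b is also reachable from a
                    for c in range(n):
                        if reach[b][c] and not row[c]:
                            row[c] = True
                            changed = True
    return reach

This is just a transitive closure computation; Warshall's algorithm produces the same matrix in a single pass over an intermediate-vertex loop.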
In order to be sure, you either have to find a path, or traverse all vertices that are reachable from V1 once.
I would recommend an implementation of depth first or breadth first search that stops when it encounters a vertex that it has already seen. The vertex will be processed on the first occurrence only. You need to make sure that the search starts at V1 and stops when it runs out of vertices or encounters V2.

Is there a proper algorithm to solve edge-removing problem?

There is a directed graph (not necessarily connected) of which one or more nodes are distinguished as sources. Any node accessible from any one of the sources is considered 'lit'.
Now suppose one of the edges is removed. The problem is to determine the nodes that were previously lit and are not lit anymore.
An analogy like a city electricity system may be considered, I presume.
This is a "dynamic graph reachability" problem. The following paper should be useful:
A fully dynamic reachability algorithm for directed graphs with an almost linear update time. Liam Roditty, Uri Zwick. Theory of Computing, 2002.
This gives an algorithm with O(m * sqrt(n))-time updates (amortized) and O(sqrt(n))-time queries on a possibly-cyclic graph (where m is the number of edges and n the number of nodes). If the graph is acyclic, this can be improved to O(m)-time updates (amortized) and O(n/log n)-time queries.
It's always possible you could do better than this given the specific structure of your problem, or by trading space for time.
If, instead of just "lit" or "unlit", you keep for each node a set of the nodes from which it is powered or lit, and consider a node with an empty set as "unlit" and a node with a non-empty set as "lit", then removing an edge would simply involve removing the source node of that edge from the target node's set.
EDIT: Forgot this:
And if you remove the last lit-from node in the set, traverse the outgoing edges and remove the node you just "unlit" from their sets (and possibly continue traversing from there too, and so on).
EDIT2 (rephrase for tafa):
Firstly: I misread the original question and thought that it stated that for each node it was already known to be lit or unlit, which as I re-read it now, was not mentioned.
However, if for each node in your network you store a set containing the nodes it was lit through, you can easily traverse the graph from the removed edge and fix up any lit/unlit references.
So for example if we have nodes A,B,C,D like this: (lame attempt at ascii art)
A -> B -> D
 \-> C -/
Then at node A you would store that it was a source (and thus lit by itself), in both B and C you would store that they were lit by A, and in D you would store that it was lit by both B and C.
Then say we remove the edge from B to D: in D we remove B from the lit-source list, but it remains lit as it is still lit by C. Next say we remove the edge from A to C after that: A is removed from C's set, and thus C is no longer lit. We then go on to traverse the edges that originated at C, and remove C from D's set, which is now also unlit. In this case we are done, but if the set were bigger, we'd just go on from D.
This algorithm will only ever visit the nodes that are directly affected by a removal or addition of an edge, and as such (apart from the extra storage needed at each node) should be close to optimal.
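A rough Python sketch of that bookkeeping (succ and lit_by are my own names): succ[x] is the set of successors of x, and lit_by[x] is the set of nodes currently lighting x, with every source lighting itself. One caveat: nodes that keep each other lit only through a cycle are not caught by this simple propagation and would need extra handling.

def remove_edge(succ, lit_by, u, v):
    # Remove the edge u -> v and return the set of nodes that become unlit.
    succ[u].discard(v)
    lit_by[v].discard(u)
    newly_unlit = set()
    stack = [v]
    while stack:
        x = stack.pop()
        if lit_by[x] or x in newly_unlit:
            continue                      # still lit, or already handled
        newly_unlit.add(x)
        for y in succ[x]:                 # x can no longer light its successors
            lit_by[y].discard(x)
            stack.append(y)
    return newly_unlit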
Is this your homework?
The simplest solution is to do a DFS (http://en.wikipedia.org/wiki/Depth-first_search) or a BFS (http://en.wikipedia.org/wiki/Breadth-first_search) on the original graph starting from the source nodes. This will get you all the original lit nodes.
Now remove the edge in question and do the DFS again. You get the nodes which still remain lit.
Output the nodes that appear in the first set but not the second.
This is an asymptotically optimal algorithm, since you do two DFSs (or BFSs) which take O(n + m) time and space (where n = number of nodes, m = number of edges), and this dominates the complexity. You need at least Ω(n + m) time and space just to read the input, therefore the algorithm is optimal.
Now if you want to remove several edges, that would be interesting. In this case, we would be talking about dynamic data structures. Is this what you intended?
EDIT: Taking into account the comments:
not connected is not a problem, since nodes in unreachable connected components will not be reached during the search
there is a smart way to do the DFS or BFS from all nodes at once (I will describe BFS). You just have to put them all on the stack/queue at the beginning.
Pseudo code for a BFS which searches for all nodes reachable from any of the starting nodes:
Queue q = [all starting nodes], all marked as visited
while (q not empty)
{
x = q.pop()
forall (y neighbour of x) {
if (y was not visited) {
visited[y] = true
q.push(y)
}
}
}
Replace Queue with a Stack and you get a sort of DFS.
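Putting the whole approach into Python (reachable_from and newly_unlit are my own names; the graph is assumed to be a dict of adjacency lists):

def reachable_from(graph, sources):
    # multi-source search: everything reachable from any source is "lit"
    lit = set(sources)
    stack = list(sources)
    while stack:
        x = stack.pop()
        for y in graph.get(x, ()):
            if y not in lit:
                lit.add(y)
                stack.append(y)
    return lit

def newly_unlit(graph, sources, removed_edge):
    # lit nodes before the removal, minus lit nodes after the removal
    before = reachable_from(graph, sources)
    u, v = removed_edge
    pruned = {a: [b for b in bs if (a, b) != (u, v)] for a, bs in graph.items()}
    after = reachable_from(pruned, sources)
    return before - after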
How big and how connected are the graphs? You could store all paths from the source nodes to all other nodes and look for nodes where all paths to that node contain one of the removed edges.
EDIT: Extending this description a bit:
Do a DFS from each source node. Keep track of all paths generated to each node (as edges, not vertices, so then we only need to know the edges involved, not their order, and so we can use a bitmap). Keep a count for each node of the number of paths from source to node.
Now iterate over the paths. Remove any path that contains the removed edge(s) and decrement the counter for that node. If a node counter is decremented to zero, it was lit and now isn't.
I would keep the information about connected source nodes on the edges while building the graph (e.g. if an edge has connectivity to the sources S1 and S2, its source list contains S1 and S2), and create the nodes with the information of their input and output edges. When an edge is removed, update the output edges of the target node of that edge by considering the node's remaining input edges, and traverse through all the target nodes of the updated edges using DFS or BFS (in the case of a cyclic graph, consider marking visited nodes). While updating the graph, it is also possible to find nodes left without any edge that has a source connection (the lit-to-unlit nodes). However, it might not be a good solution if you'd like to remove multiple edges at the same time, since that may cause the same edges to be traversed again and again.
