Divide a graph into two sets - algorithm

The question is from Code Jam.
Question:
Is there any way to divide the nodes of a graph into two groups such that any two nodes that cannot be in the same group end up in different groups?
Is there a standard algorithm for this?
How should I tackle the problem when both groups must have the same number of elements?

First, the feasibility problem (does such a split exist or not) is the 2-coloring problem on the following graph:
G = (V,E)
V = { all nodes }
E = { (u,v) | u and v are "troubling each other" }
It is solved by checking whether the graph is bipartite, which can be done with BFS.
How to tackle the problem when each group should have the same number of elements:
First, assume the graph is bipartite, so there is some solution.
Split the graph into its connected components: (S1,S2,S3,...,Sk).
Each connected component is itself a bipartite subgraph Si = (Li,Ri), where Li and Ri are its two sides (each connected component has exactly one such split, up to swapping Li and Ri).
Create a new array:
arr[i] = |Li| - |Ri|
where |X| is the cardinality of X (number of elements in the set)
Now, solving this problem is the same as solving the partition problem on arr (name the sides so that |Li| >= |Ri|, which makes every arr[i] non-negative). It can be solved in pseudo-polynomial time, which here is polynomial in the number of nodes because the values sum to at most n.
The solution to the partition problem puts each arr[i] in A or in B such that sum{A} is as close as possible to sum{B}; the two groups have equal size exactly when sum{A} = sum{B}. If arr[i] is in A, color Li with "1" and Ri with "2" in your solution. Otherwise, do the opposite.
The solution runs in O(k*n+m), where k is the number of connected components, n is the number of nodes in the graph, and m is the number of edges in the graph.
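The balancing step above can be sketched as a small subset-sum style dynamic program. This is only an illustration under assumptions: the function name balance_components and the input format (one (|Li|, |Ri|) pair per connected component) are made up here, and instead of working on arr[i] directly it tracks the achievable sizes of group "1", which is an equivalent formulation.

```python
def balance_components(sizes):
    """sizes: list of (|Li|, |Ri|), one pair per connected component.

    Returns a list of booleans (True = swap the sides of that component,
    i.e. Ri goes to group 1) such that both groups end up with the same
    number of nodes, or None if no such choice exists.
    """
    total = sum(l + r for l, r in sizes)
    if total % 2:
        return None
    target = total // 2
    # reachable[i] maps an achievable size of group 1 after the first i
    # components to the choice made for component i to get there.
    reachable = [{0: None}]
    for l, r in sizes:
        nxt = {}
        for s in reachable[-1]:
            nxt.setdefault(s + l, False)  # Li goes to group 1
            nxt.setdefault(s + r, True)   # Ri goes to group 1 (swapped)
        reachable.append(nxt)
    if target not in reachable[-1]:
        return None
    # Walk back through the tables to recover one valid set of choices.
    flips = []
    s = target
    for i in range(len(sizes), 0, -1):
        flip = reachable[i][s]
        flips.append(flip)
        l, r = sizes[i - 1]
        s -= r if flip else l
    flips.reverse()
    return flips
```

Each table holds at most n + 1 sizes and there are k tables, which matches the O(k*n) part of the bound above.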

You build a graph from the given nodes (using a hash table to map names to nodes) and then use BFS or DFS to traverse the graph and determine whether it is bipartite (that is, divisible into two disjoint sets such that a node in one set is only in "trouble" with nodes in the other set, never with a node in its own set). This is done by assigning a boolean value to each node as it is visited by the BFS/DFS and then checking whether any of its visited neighbors has the same value, which would mean the graph is not bipartite (not divisible into two groups).
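A minimal sketch of that check, assuming the input arrives as a list of name pairs that must not share a group. The function name two_color and the input format are assumptions made for this example; a plain dict plays the role of the hash table.

```python
from collections import deque


def two_color(pairs):
    """pairs: iterable of (name_a, name_b) that must be in different groups.

    Returns a dict name -> 0 or 1, or None if the graph is not bipartite.
    """
    # Build the adjacency lists, keyed by name.
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    color = {}
    for start in adj:            # handle every connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]   # opposite group
                    queue.append(v)
                elif color[v] == color[u]:
                    return None               # conflict: not bipartite
    return color
```

For example, two_color([("a", "b"), ("b", "c")]) puts "a" and "c" in one group and "b" in the other, while a triangle of three mutually conflicting names yields None.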

Related

How can I know all possible ways that the edges are connected if I know the toposort in a graph?

How can I know all possible ways that the edges are connected if I know the topological sort?
Here is the original problem:
Little C has topologically sorted a simple (no multiple edges) directed acyclic graph, but accidentally lost the original graph. Apart from the topological order, little C only remembers that the original graph had k edges and that there is one vertex u in the graph that can reach all the other vertices. He wants to know how many simple directed acyclic graphs satisfy these requirements. Since the answer may be large, you only need to output the answer modulo m.
I have just learned topological sorting and wonder how I can use it in reverse. I know the final topological order, say (1 2 3 4), that one vertex can reach all other vertices, and that there are 4 edges in all, but I need the number of possible ways in which the edges can be placed.
I think this problem has something to do with counting permutations, and the specific u has to be first in the topological order.
NOTE: the maximum of m can be up to 200'000, so you definitely cannot brute-force this problem!
Let the topological order be u = 1, 2, …, n. Since 1 can reach all other
vertices, the topological order begins with 1. Each node v > 1, being
reachable from u, must have arcs from one or more nodes < v. These
choices are linked only by the constraint on the number of arcs.
We end up computing Count[v][m] (modulo whatever the modulus is) as
the number of reconstructions on 1, 2, …, v with exactly m arcs. The
answer is Count[n][k].
Count[1][m] = 1 if m == 0 else 0
for v > 1, Count[v][m] = sum for j = 1 to min(m, v-1) of (v-1 choose j)*Count[v-1][m-j]
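The recurrence can be transcribed directly, as in the sketch below. The function name count_dags and its parameter names are chosen for this example only. Binomial coefficients are built with Pascal's triangle so that the modulus does not have to be prime. This direct version does O(n^2 * k) work, so it is meant to illustrate the recurrence and is probably too slow for large limits.

```python
def count_dags(n, k, mod):
    """Number of reconstructions on vertices 1..n with exactly k arcs, modulo mod."""
    # binom[i][j] = (i choose j) % mod, via Pascal's triangle.
    binom = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        binom[i][0] = 1 % mod
        for j in range(1, i + 1):
            binom[i][j] = (binom[i - 1][j - 1] + binom[i - 1][j]) % mod

    # count[m] holds Count[v][m] for the current v; start with v = 1.
    count = [0] * (k + 1)
    count[0] = 1 % mod
    for v in range(2, n + 1):
        new = [0] * (k + 1)
        for m in range(1, k + 1):
            total = 0
            # Vertex v receives j >= 1 arcs from the v-1 earlier vertices.
            for j in range(1, min(m, v - 1) + 1):
                total += binom[v - 1][j] * count[m - j]
            new[m] = total % mod
        count = new
    return count[k]
```

For the order (1 2 3 4) with 4 edges from the question, count_dags(4, 4, 10**9 + 7) evaluates to 9.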

Compute distance using DFS

I was torn between these two methods:
M1:
Use adjacency list to represent graph G with vertices P and edges A
Use DFS on G storing all the distances from p in an array d;
Loop through d checking all entries. If some d[u] >6, return false otherwise true
M2:
Use adjacency list to represent graph G with vertices P and edges A
Use BFS on G storing all the distances from p in an array d;
Loop through d checking all entries. If some d[u] >6, return false otherwise true
Both methods run in O(|P| + |A|) time in the worst case, so I think both would be a correct answer to this question. I chose the DFS method, reasoning that DFS should find an "outlier" of degree of freedom 7 earlier than BFS, since BFS would have to traverse every single vertex up to degree 7 in every case.
Apparently this is wrong according to the teacher, as using DFS, you can't compute the distances. I don't understand why you wouldn't be able to compute the distances. I could have a number n indicating the degree of freedom I am currently at. Starting from root p, the child would have n = 1. Now I store n in array d. Then I keep traversing down until no child is to be found, while incrementing n and storing the value in my array d. Then, if the back-tracking starts, the value n will be decremented until we find an unvisited child node of any of the visited nodes on the stack. If there is an unvisited child, increment once again, then increment until no more child is found, decrement until the next unvisited child from the stack is found...
I believe that would be a way to store the distances with DFS
Both BFS and DFS can do the job: they can both limit their search to a depth of 6, and at the end of the traversal they can check whether the whole population was reached or not. But there are some important differences:
With BFS
The BFS traversal is the algorithm I would opt for. When a BFS search determines the degree of a person, it is definitive: no correction needs to be made to it.
Here is a sketch of how you can do this with BFS:
def withinSixDegrees(P, A, p):
    visited = set()  # empty set
    frontier = []  # empty array
    visited.add(p)  # search starts at person p
    frontier.append(p)
    for degree in [1, 2, 3, 4, 5, 6]:
        nextFrontier = []  # empty array
        for person in frontier:
            for acquaintance in A[person]:
                if acquaintance not in visited:
                    visited.add(acquaintance)
                    nextFrontier.append(acquaintance)
        frontier = nextFrontier
        if len(visited) == len(P):  # have we reached the whole population?
            return True
    # After six rounds we did not reach all people, so...
    return False
This assumes that you can find the list of acquaintances for a given person via A[person]. If A is not structured like an adjacency list but as a list of pairs, then first do some preprocessing on the original A to create such an adjacency list.
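The preprocessing step could look like the following sketch. It assumes the original A is a list of (person, person) pairs and that acquaintance is mutual; the function name build_adjacency is made up for this example.

```python
def build_adjacency(P, pairs):
    """Turn a list of (a, b) acquaintance pairs into adjacency lists.

    Every person in P gets an entry, so isolated people map to an empty list.
    """
    A = {person: [] for person in P}
    for a, b in pairs:
        A[a].append(b)  # acquaintance is assumed to be mutual,
        A[b].append(a)  # so the edge is stored in both directions
    return A
```

This takes O(|P| + |A|) time, so it does not change the overall complexity.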
With DFS
A DFS algorithm has the downside that it will not necessarily start with optimal paths, so it will find that some persons have degree 6 while there really are shorter, uninvestigated paths that could improve on that degree. This means that a DFS algorithm may need to revisit nodes and even partial paths (edges) to register such improvements and cascade them through a visited path up to degree 6. And there might even be several improvements to be applied for the same person.
A DFS algorithm could look like this:
def allWithinSixDegrees(P, A, p):
    degreeOfPerson = dict()  # empty key/value dictionary
    for person in P:
        degreeOfPerson[person] = 7  # some value greater than 6

    def dfs(person, degree):
        if degree >= 7:
            return  # don't lose time for higher degrees than 6.
        for acquaintance in A[person]:
            if degree < degreeOfPerson[acquaintance]:  # improvement?
                degreeOfPerson[acquaintance] = degree
                dfs(acquaintance, degree + 1)

    # start DFS
    degreeOfPerson[p] = 0
    dfs(p, 1)
    # Check if all persons got a degree of maximum 6
    for person in P:
        if degreeOfPerson[person] > 6:
            return False
    return True
Example
If the graph has three nodes, linked as a triangle a-b-c, with starting point a, then this would be the sequence. Indentation means (recursive) call of dfs:
degreeOfPerson[a] = 0
a->b: degreeOfPerson[b] = 1
    b->c: degreeOfPerson[c] = 2
        c->a: # cannot improve degreeOfPerson[a]. Backtrack
        c->b: # cannot improve degreeOfPerson[b]. Backtrack
    b->a: # cannot improve degreeOfPerson[a]. Backtrack
a->c: degreeOfPerson[c] = 1 # improvement!
    c->a: # cannot improve degreeOfPerson[a]. Backtrack
    c->b: # cannot improve degreeOfPerson[b]. Backtrack
Time Complexity
The number of times the same edge can be visited with DFS is not more than the maximum degree we are looking for -- in your case 6. If that is a constant, then it does not affect the time complexity. If however the degree to check for is an input value, then the time complexity of DFS becomes O(maxdegree * |E| + |V|).
A simple depth-first search algorithm does not necessarily yield shortest paths in an undirected graph. For example, consider a simple triangle graph. If you start at one vertex, you will process the other two vertices. A naive algorithm will report one vertex at distance one from the source and a second vertex at distance two from the source. However, this is incorrect, since the distance from the source to either vertex is actually one.
A much more natural approach is to use the breadth-first search (BFS) algorithm. It can be shown that a breadth-first search computes shortest paths, and it requires significantly fewer modifications.
You definitely can use depth-first search to compute the distances from one node to another, but it is not a natural approach. In fact, it is very common to miscompute distances using a depth-first search algorithm (see: http://www-student.cse.buffalo.edu/~atri/cse331/support/dfs-bfs/index.html), particularly when the underlying graph has cycles. There are some special cases you must handle if you want to do it this way, but it definitely is possible.
With that being said, the depth-first search algorithm you describe does not appear to be correct. For example, it will fail on the triangle graph that I described above. This is true because the standard depth-first search only visits each vertex once, and you would not revisit a vertex after its distance has been set. Thus, if you take the "longer path" to a vertex in a cycle at first, you will end up with an incorrect distance value.
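The triangle example can be reproduced with a short sketch. The helper names naive_dfs_depths and bfs_distances are made up for this illustration; the first one records the depth at which a standard DFS first reaches each vertex, the second one computes true distances with BFS.

```python
from collections import deque


def naive_dfs_depths(adj, source):
    """Depth at which a standard DFS first reaches each vertex (not a distance!)."""
    depth = {source: 0}

    def visit(u):
        for v in adj[u]:
            if v not in depth:          # each vertex is visited only once
                depth[v] = depth[u] + 1
                visit(v)

    visit(source)
    return depth


def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(naive_dfs_depths(triangle, "a"))  # {'a': 0, 'b': 1, 'c': 2}
print(bfs_distances(triangle, "a"))     # {'a': 0, 'b': 1, 'c': 1}
```

The DFS reaches c through b first and never revisits it, so it reports 2 where the true distance is 1.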

Linear-time algorithm for number of distinct paths from each vertex in a directed acyclic graph

I am working on the following past paper question for an algorithms module:
Let G = (V, E) be a simple directed acyclic graph (DAG).
For a pair of vertices v, u in V, we say v is reachable from u if there is a (directed) path from u to v in G.
(We assume that every vertex is reachable from itself.)
For any vertex v in V, let R(v) be the reachability number of vertex v, which is the number of vertices u in V that are reachable from v.
Design an algorithm which, for a given DAG, G = (V, E), computes the values of R(v) for all vertices v in V.
Provide the analysis of your algorithm (i.e., correctness and running time
analysis).
(Optimally, one should try to design an algorithm running in
O(n + m) time.)
So far, I have the following thoughts:
The following algorithm for finding a topological sort of a DAG might be useful:
TopologicalSort(G)
1. Run DFS on G and compute a DFS-numbering, N // A DFS-numbering is a numbering (starting from 1) of the vertices of G, representing the point at which the DFS-call on a given vertex v finishes.
2. Let the topological sort be the function a(v) = n - N[v] + 1 // n is the number of nodes in G and N[v] is the DFS-number of v.
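For illustration, the TopologicalSort procedure above might be written as follows. This is a sketch under assumptions: the graph is a dict mapping every vertex to its list of successors, and the function name topological_order is invented for the example. It is recursive, so very deep graphs may hit Python's recursion limit.

```python
def topological_order(graph):
    """graph: dict vertex -> list of successors (every vertex must be a key).

    Returns a dict a with a[v] = n - N[v] + 1, where N[v] is the order in
    which the DFS call on v finishes (starting from 1).
    """
    n = len(graph)
    finish = {}       # N in the text above
    visited = set()
    counter = 0

    def dfs(v):
        nonlocal counter
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                dfs(w)
        counter += 1
        finish[v] = counter   # v finishes after all of its successors

    for v in graph:
        if v not in visited:
            dfs(v)
    return {v: n - finish[v] + 1 for v in graph}
```

Because a vertex finishes only after everything reachable from it has finished, every edge u -> w ends up with a(u) < a(w).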
My second thought is that dynamic programming might be a useful approach, too.
However, I am currently not sure how to combine these two ideas into a solution.
I would appreciate any hints!
EDIT: Unfortunately the approach below is not correct in general. It may count multiple times the nodes that can be reached via multiple paths.
The ideas below are valid if the DAG is a polytree, since this guarantees that there is at most one path between any two nodes.
You can use the following steps:
1. Find all nodes with 0 in-degree (i.e. no incoming edges).
   This can be done in O(n + m), e.g. by looping through all edges
   and marking those nodes that are the end of any edge. The nodes with 0
   in-degree are those which have not been marked.
2. Start a DFS from each node with 0 in-degree.
   After the DFS call for a node ends, we want to have computed for that
   node the information of its reachability.
   In order to achieve this, we need to add the reachability of the
   successors of this node. Some of these values might have already been
   computed (if the successor was already visited by DFS), therefore this
   is a dynamic programming solution.
The following pseudocode describes the DFS code:
function DFS(node) {
    visited[node] = true;
    reachability[node] = 1;
    for each successor of node {
        if (!visited[successor]) {
            DFS(successor);
        }
        reachability[node] += reachability[successor];
    }
}
After calling this for all nodes with 0 in-degree, the reachability
array will contain the reachability for all nodes in the graph.
The overall complexity is O(n + m).
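A runnable sketch of this pseudocode, together with a small case that shows the over-counting mentioned in the EDIT. The graph format (dict of successor lists) and the function name reachability_polytree are assumptions for this example. It starts a DFS from every unvisited node instead of first collecting the 0 in-degree nodes, which gives the same values.

```python
def reachability_polytree(graph):
    """graph: dict vertex -> list of successors (every vertex must be a key).

    Sums the reachability of the successors; exact only when there is at
    most one path between any two nodes (e.g. a polytree).
    """
    reach = {}

    def dfs(v):
        reach[v] = 1                  # every vertex reaches itself
        for w in graph[v]:
            if w not in reach:
                dfs(w)
            reach[v] += reach[w]      # reuse the value computed for w

    for v in graph:
        if v not in reach:
            dfs(v)
    return reach


tree = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(reachability_polytree(tree))     # {'a': 4, 'b': 2, 'd': 1, 'c': 1}
print(reachability_polytree(diamond))  # 'a' gets 5, although only 4 vertices exist
```

In the diamond, d is counted once through b and once through c, which is exactly the failure described in the EDIT.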
I'd suggest using a Breadth First Search approach.
For every node, add all of its successors to the queue. In addition to that, maintain a separate array for calculating the reachability.
For example, if A->B, then
1.) Mark A as traversed
2.) B is added to the queue
3.) arr[B]+=1
This way, we can get R(v) for all vertices in O(|V| + |E|) time through arr[].
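A plain per-node counter can count a vertex more than once when it is reachable along several paths. One way to get exact values, at the price of no longer being linear, is to keep for each vertex the set of vertices it reaches, stored as the bits of a Python integer. This is a different technique from the ones described above, and the function name reachability_exact and the dict-of-successors format are assumptions for this sketch.

```python
def reachability_exact(graph):
    """graph: dict vertex -> list of successors (every vertex must be a key).

    Returns R(v) for every vertex of a DAG. Each vertex carries a bitmask of
    the vertices reachable from it, so the cost is roughly O(m * n / w)
    word operations rather than O(n + m).
    """
    index = {v: i for i, v in enumerate(graph)}
    mask = {}

    def dfs(v):
        m = 1 << index[v]             # v reaches itself
        for w in graph[v]:
            if w not in mask:
                dfs(w)
            m |= mask[w]              # union, so shared descendants count once
        mask[v] = m

    for v in graph:
        if v not in mask:
            dfs(v)
    return {v: bin(mask[v]).count("1") for v in graph}
```

On the diamond a->b, a->c, b->d, c->d this returns 4 for a, 2 for b and c, and 1 for d.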

Minimum Spanning tree different from another

Assume we are given
an undirected graph g where every node i, 1 <= i < n, is connected to all j, i < j <= n
and a source s.
We want to find the total costs (defined as the sum of all edges' weights) of the cheapest minimum spanning tree that differs from the minimum distance tree of s (i.e. from the MST obtained by running prim/dijkstra on s) by at least one edge.
What would be the best way to tackle this? Because currently, I can only think of some kind of fixed-point iteration
1. run Dijkstra on (g,s) to obtain the reference graph r that we need to differ from
2. costs := sum(edge_weights_of(r))
3. change := 0
4. for each vertex u in r, run a BFS and note for each reached vertex v the longest edge on the path from u to v
5. iterate through all edges e = (a,b) in g and find the e' = (a',b') that is NOT in r and minimizes newchange := weight(e') - weight(longest_edge(a',b'))
6. if (first_time_here OR newchange < 0) then change += newchange
7. if (newchange < 0) goto 4
8. result := costs + change
That seems to waste a lot of time... It relies on the fact that adding an edge to a spanning tree creates a cycle from which we can remove the longest edge.
I also thought about using Kruskal to get an overall minimum spanning tree and only using the above algorithm to replace a single edge when the trees from both, prim and kruskal, happen to be the same, but that doesn't seem to work as the result would be highly dependent on the edges selected during a run of kruskal.
Any suggestions/hints?
You can do it using Prim's algorithm.
Prim's algorithm:
let T be a single vertex x
while (T has fewer than n vertices)
{
    1. find the smallest edge connecting T to G-T
    2. add it to T
}
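The unmodified algorithm above can be sketched with a binary heap. The function name prim_mst and the adjacency format (vertex -> list of (weight, neighbour) pairs) are assumptions for this example. Vertices need to be mutually comparable (e.g. all strings or all integers) because ties on weight fall through to comparing vertex names.

```python
import heapq


def prim_mst(adj, start):
    """adj: dict vertex -> list of (weight, neighbour), undirected graph.

    Returns (total_weight, edges) where edges is a list of (u, v, weight)
    in the order in which they were added to T.
    """
    in_tree = {start}
    heap = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(heap)
    total, edges = 0, []
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)   # smallest edge leaving T so far
        if v in in_tree:
            continue                    # both ends already in T: skip
        in_tree.add(v)
        total += w
        edges.append((u, v, w))
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return total, edges
```

On a triangle with weights a-b = 1, b-c = 2 and a-c = 3, starting at a, this picks a-b and then b-c for a total weight of 3.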
Now let's modify it.
Suppose you already have one minimum spanning tree, say Tree(E,V).
Then use this algorithm:
Prim's algorithm (Modified):
let T be a single vertex
let isOther = false
while (T has fewer than n vertices)
{
    1. find the smallest edge (say e) connecting T to G-T
    2. if more than one smallest edge is found {
           check which of them is in E(Tree)
           choose one different from it
           add it to T
           set isOther = true
       }
       else if exactly one edge is found {
           add it to T
           if E(Tree) doesn't contain this edge, set isOther = true
           else don't touch isOther (keep its value)
       }
}
If isOther = true, you have found another tree, T, that differs from Tree(E,V).
Otherwise the graph has a single minimum spanning tree.

Find a maximum tree subgraph with given number of edges that is a subgraph of a tree

The problem is as follows: you are given a graph that is a tree and the number of edges you can use. Starting at v1, you choose edges that go out of any of the vertices you have already visited.
An example:
In this example the optimal approach is:
for k==1 AC -> 5
for k==2 AB BH -> 11
for k==3 AC AB BH -> 16
At first I thought this was the problem of finding the maximum path of length k starting from A, which would be trivial, but the point is that you can always choose to go a different way, so that approach did not work.
What I have thought of so far:
Cut the tree at k, and brute force all the possibilities.
Calculate the cost of going to an edge for all edges.
The cost would include the sum of all edges before the edge we are trying to go to divided by the amount of edges you need to add in order to get to that edge.
From there pick the maximum, for all edges, update the cost, and do it again until you have reached k.
The second approach seems good, but it reminds me a bit of the knapsack problem.
So my question is: is there a better approach for this? Is this problem NP-hard?
EDIT: A counter example for the trimming answer:
This code illustrates a memoisation approach based on the subproblem of computing the max weight from a tree rooted at a certain node.
I think the complexity will be O(k^2 * E), where E is the number of edges in the graph (E=n-1 for a tree): there are O(kE) cached states, and each one loops over up to k ways of splitting the edges.
edges = {}
edges['A'] = ('B', 1), ('C', 5)
edges['B'] = ('G', 3), ('H', 10)
edges['C'] = ('D', 2), ('E', 1), ('F', 3)
cache = {}

def max_weight_subgraph(node, k, used=0):
    """Compute the max weight from a subgraph rooted at node.
    Can use up to k edges.
    Not allowed to use the first `used` connections from the node."""
    if k == 0:
        return 0
    key = node, k, used
    if key in cache:
        return cache[key]
    if node not in edges:
        return 0
    E = edges[node]
    best = 0
    if used < len(E):
        child, weight = E[used]
        # Choose the amount r of edges to get from the subgraph at child
        for r in range(k):
            # We have k-1-r edges remaining to be used by the rest of the children
            best = max(best, weight +
                       max_weight_subgraph(node, k - 1 - r, used + 1) +
                       max_weight_subgraph(child, r, 0))
        # Also consider not using this child at all
        best = max(best, max_weight_subgraph(node, k, used + 1))
    cache[key] = best
    return best

for k in range(1, 4):
    print(k, max_weight_subgraph('A', k))

Resources