Question:
Given a directed graph of N nodes and M edges (M <= 2N), find all the nodes that are reachable from all other nodes.
Example:
The below graph has 4 nodes and 4 edges:
Answer: Nodes (2) and (3) are reachable from all other nodes.
P/S:
The only solution I came up with is to reverse the graph, run a search from every node and check whether it reaches all other nodes. But it would take O(n^2).
Is there any other approach that takes O(n log n) or less?
Here's my take on the O(n^2) approach:
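#include <algorithm>
#include <cstring>
#include <iostream>
#include <vector>
using namespace std;
const int MAXN = 100005; // assumed upper bound on n, not given in the question
int n, m;
vector<int> adj[MAXN]; // reversed graph: adj[v] lists every u with an edge u -> v
bool visited[MAXN];
void init(){
    cin >> n >> m;
    for(int i = 0; i < m; i++){
        int u, v; cin >> u >> v;
        adj[v].emplace_back(u);
    }
}
void dfs(int u){
    visited[u] = true;
    for(int v : adj[u])
        if(!visited[v]) dfs(v);
}
int main(){
    init();
    for(int u = 1; u <= n; u++){
        memset(visited, 0, sizeof visited);
        dfs(u);
        // u is an answer if the search on the reversed graph reached every node
        if(count(visited + 1, visited + n + 1, true) == n) cout << u << ' ';
    }
}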
You can get an O(|V| + |E|) algorithm using Tarjan's strongly connected components algorithm. Many sample implementations are available online or in standard algorithms textbooks.
Once you have the strongly connected components, look at the condensation of the graph (the directed acyclic graph whose nodes are the strongly connected components). A vertex is reachable from all others exactly when its component is the only component with out-degree 0 (the only sink) in the condensation: every path in a DAG can be extended until it hits a sink, so a unique sink is reached from everywhere. If there are two or more sink components, there are no such vertices, because no sink can reach another sink. Counting the components with out-degree 0 takes one pass over the edges, so the whole procedure stays O(|V| + |E|).
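The idea above can be sketched as follows. This is a minimal illustration, not the answerer's code: it uses Kosaraju's two-pass algorithm in place of Tarjan's (both run in O(|V| + |E|)), and the function name reachableFromAll, the 1-based vertex numbering and the edge-list input are assumptions made for the example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: vertices reachable from every other vertex, found via the
// strongly connected components (Kosaraju's algorithm, iterative DFS).
// Vertices are numbered 1..n; edges are (from, to) pairs.
std::vector<int> reachableFromAll(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<int>> adj(n + 1), radj(n + 1);
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second);
        radj[e.second].push_back(e.first);
    }

    // Pass 1: record vertices in order of DFS finishing time.
    std::vector<int> order;
    std::vector<char> seen(n + 1, 0);
    for (int s = 1; s <= n; ++s) {
        if (seen[s]) continue;
        std::vector<std::pair<int, std::size_t>> stack;
        stack.push_back({s, 0});
        seen[s] = 1;
        while (!stack.empty()) {
            int u = stack.back().first;
            std::size_t idx = stack.back().second;
            if (idx < adj[u].size()) {
                stack.back().second = idx + 1;
                int v = adj[u][idx];
                if (!seen[v]) { seen[v] = 1; stack.push_back({v, 0}); }
            } else {
                order.push_back(u);
                stack.pop_back();
            }
        }
    }

    // Pass 2: sweep the reversed graph in decreasing finishing time;
    // each sweep collects exactly one strongly connected component.
    std::vector<int> comp(n + 1, -1);
    int numComp = 0;
    for (int k = n - 1; k >= 0; --k) {
        int s = order[k];
        if (comp[s] != -1) continue;
        std::vector<int> stack{s};
        comp[s] = numComp;
        while (!stack.empty()) {
            int u = stack.back();
            stack.pop_back();
            for (int v : radj[u])
                if (comp[v] == -1) { comp[v] = numComp; stack.push_back(v); }
        }
        ++numComp;
    }

    // A component is a sink of the condensation if no edge leaves it.
    std::vector<char> hasOut(numComp, 0);
    for (const auto& e : edges)
        if (comp[e.first] != comp[e.second]) hasOut[comp[e.first]] = 1;

    int sink = -1, sinks = 0;
    for (int c = 0; c < numComp; ++c)
        if (!hasOut[c]) { sink = c; ++sinks; }

    // Only a unique sink component is reachable from everywhere.
    std::vector<int> result;
    if (sinks == 1)
        for (int v = 1; v <= n; ++v)
            if (comp[v] == sink) result.push_back(v);
    return result;
}
```

Called with n = 4 and the edges 2 -> 3, 3 -> 2, 1 -> 2, 4 -> 1, the function returns nodes 2 and 3; with two sink components it returns an empty list.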
Can you share the code that takes O(n^2)?
Anyway, this approach only takes O(n), which is better than O(n log n)! (Note that it assumes every node has at most one outgoing edge.)
class DirectedGraph {
graph = {}
constructor(directedGraph) {
for (let [id, arrow] of directedGraph)
this.graph[id] = { arrow, isVisited: false }
}
find_end_points(id) {
let out = [], graph = this.graph;
function go_next(id, from) {
let node = graph[id];
if (node.isVisited || !node.arrow) return out.push(id)
if (node.arrow == from) out.push(id);
node.isVisited = true;
go_next(node.arrow, id);
}
go_next(id, null);
return out
}
}
let directedGraph = new DirectedGraph([[2, 3], [3, 2], [1, 2], [4, 1]]);
console.log(directedGraph.find_end_points(4))
Related
I am looking for an algorithm to find the global minimum cut in an undirected graph.
I want to input a graph and have the algorithm output the smallest set of edges whose removal partitions the graph into two parts.
Here are the requirements:
find the exact edges, not only their number.
the min-cut edges must be computed with 100% correctness.
it is an undirected graph.
the algorithm shall terminate by indicating whether or not it found the answer.
I searched some articles on the Internet and found that Karger's minimum cut algorithm is a randomized one, so its output may not be the exact min cut. I don't want an algorithm like that.
I want to compute the exact edges (I need to know which edges they are) whose number is the smallest.
I would like to hear some advice while I am looking for such algorithms.
It would be great if your advice came with an introduction to the algorithm and example code.
Thanks in advance.
We can do this using a max-flow algorithm. By the max-flow min-cut theorem, once the maximum flow from a source s to a sink t has been computed, the minimum s-t cut consists of the edges that lead from a vertex still reachable from s in the residual graph to a vertex that is no longer reachable. Every such edge is saturated, but a saturated edge is not necessarily part of the cut. You can read more about the max-flow min-cut theorem. Calculating the max flow is a fairly standard problem with many polynomial-time solutions; you can read more about them here.
So to summarize the algorithm: first find the max flow of the graph, then mark the vertices reachable from s along edges with remaining capacity; the min-cut edges are the ones that go from a marked vertex to an unmarked one. This gives the minimum cut separating s from t. For the global minimum cut of an undirected graph, fix s, repeat with every other vertex as t, and keep the smallest cut found.
One way to solve the max flow problem is the Ford-Fulkerson algorithm: it finds an augmenting path in the graph, pushes as much flow along it as the smallest residual capacity on that path allows, and repeats this process until no augmenting paths are left.
To find the augmenting path we can do either a depth-first search or a breadth-first search. The Edmonds-Karp algorithm finds each augmenting path with a simple breadth-first search.
I have written C++ code below to find the max flow and min cut; the code to find the max flow is taken from the book "Competitive Programming" by Steven Halim.
Also note that since the graph is undirected, each edge is stored in both directions (a -- b and b -- a); the min cut function prints each cut edge once, oriented from the source side to the sink side.
#include<iostream>
#include<vector>
#include<queue>
#include<utility>
#include<algorithm>
#define MAXX 100
#define INF 1e9
using namespace std;
int s, t, flow, n, dist[MAXX], par[MAXX], AdjMat[MAXX][MAXX]; //Adjacency Matrix graph (residual capacities)
vector< pair<int, int> > G[MAXX]; //adjacency list graph
void minCut(){
    //Mark every vertex still reachable from s in the residual graph
    vector<bool> seen(n, false);
    queue<int> q;
    q.push(s); seen[s] = true;
    while(!q.empty()){
        int u = q.front(); q.pop();
        for(int j = 0;j < (int)G[u].size();j++){
            int v = G[u][j].first;
            if(AdjMat[u][v] > 0 && !seen[v]){ seen[v] = true; q.push(v); }
        }
    }
    //Cut edges go from the reachable side to the unreachable side
    for(int i = 0;i < n;i++){
        for(int j = 0;j < (int)G[i].size();j++){
            int v = G[i][j].first;
            if(seen[i] && !seen[v]) cout << i << " " << v << endl;
        }
    }
}
void augmentPath(int v, int minEdge){
if(v == s){ flow = minEdge;return; }
else if(par[v] != -1){
augmentPath(par[v], min(minEdge, AdjMat[par[v]][v]));
AdjMat[par[v]][v] -= flow; //forward edges
AdjMat[v][par[v]] += flow; //backward edges
}
}
void EdmondsKarp(){
int max_flow = 0;
while(1){
flow = 0;
for(int i= 0;i < n;i++) dist[i] = -1, par[i] = -1; //reset the BFS state before every search
queue<int> q;
q.push(s);dist[s] = 0;
while(!q.empty()){
int u = q.front();q.pop();
if(u == t) break;
for(int i = 0;i < G[u].size();i++){
int v = G[u][i].first;
if(AdjMat[u][v] > 0 && dist[v] == -1){
dist[v] = dist[u] + 1;
q.push(v);
par[v] = u;
}
}
}
augmentPath(t, INF);
if(flow == 0) break; //Max flow reached, now we have saturated edges that form the min cut
max_flow += flow;
}
}
int main(){
//Create the graph here, both as an adjacency list and adjacency matrix
//also mark the source i.e "s" and sink "t", before calling max flow.
return 0;
}
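Since main() above is left empty, here is a compact, self-contained sketch of the same procedure that can be run on a small graph. It is an illustration under assumptions rather than the code from the book: the function name minCutEdges and the capacity-matrix input are made up for the example, and the cut is read off as the original edges that go from a vertex reachable from s in the final residual graph to an unreachable one.

```cpp
#include <algorithm>
#include <climits>
#include <queue>
#include <utility>
#include <vector>

// Sketch: s-t minimum cut of an undirected graph with Edmonds-Karp.
// cap is an n x n capacity matrix; an undirected edge {a, b} of weight w is
// stored as cap[a][b] = cap[b][a] = w. Returns the cut edges as (u, v) pairs
// with u on the source side.
std::vector<std::pair<int, int>> minCutEdges(std::vector<std::vector<int>> cap, int s, int t) {
    const int n = static_cast<int>(cap.size());
    const std::vector<std::vector<int>> original = cap;  // keep the input capacities
    std::vector<int> par(n);
    std::vector<char> seen(n, 0);

    // BFS in the residual graph; fills par and marks the reached vertices.
    auto bfs = [&]() {
        std::fill(seen.begin(), seen.end(), 0);
        std::fill(par.begin(), par.end(), -1);
        std::queue<int> q;
        q.push(s);
        seen[s] = 1;
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            for (int v = 0; v < n; ++v)
                if (!seen[v] && cap[u][v] > 0) { seen[v] = 1; par[v] = u; q.push(v); }
        }
    };

    for (;;) {
        bfs();
        if (!seen[t]) break;  // no augmenting path left: the flow is maximum
        int bottleneck = INT_MAX;
        for (int v = t; v != s; v = par[v]) bottleneck = std::min(bottleneck, cap[par[v]][v]);
        for (int v = t; v != s; v = par[v]) {
            cap[par[v]][v] -= bottleneck;  // forward edge
            cap[v][par[v]] += bottleneck;  // backward edge
        }
    }

    // seen now marks the source side of the cut.
    std::vector<std::pair<int, int>> cut;
    for (int u = 0; u < n; ++u)
        for (int v = 0; v < n; ++v)
            if (seen[u] && !seen[v] && original[u][v] > 0) cut.push_back({u, v});
    return cut;
}
```

For the 4-vertex graph with edges 0 -- 1 (3), 0 -- 2 (3), 1 -- 3 (1), 2 -- 3 (1), s = 0 and t = 3, the sketch reports the cut edges 1 -- 3 and 2 -- 3. The adjacency matrix keeps the example short; for large sparse graphs an adjacency list would be the more natural choice.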
I'm reading graph algorithms from Cormen book. Below is pseudocode from that book
Prim algorithm for MST
MST-PRIM (G, w, r)
for each u in G.V
u.key = infinity
u.p = NIL
r.key = 0
Q = G.V
while Q neq null
u = EXTRACT-MIN(Q)
for each v in G.Adj[u]
if (v in Q) and (w(u,v) < v.key)
v.p = u
v.key = w(u,v)
Dijkstra algorithm to find single source shortest path.
INITIALIZE-SINGLE-SOURCE (G,s)
for each vertex v in G.V
v.d = infinity
v.par = NIL
s.d = 0
DIJKSTRA (G, w, s)
INITIALIZE-SINGLE-SOURCE(G,s)
S = NULL
Q = G.V
while Q neq null
u = EXTRACT-MIN(Q)
S = S U {u}
for each vertex v in G.Adj[u]
RELAX(u,v,w)
My question is: why does Prim's algorithm check whether the vertex belongs to Q (v in Q), i.e. that the vertex doesn't yet belong to the tree, whereas Dijkstra's algorithm does not check for that?
Any reason, why?
The algorithms called Prim and Dijkstra solve different problems in the first place. 'Prim' finds a minimum spanning tree of an undirected graph, while 'Dijkstra' solves the single-source shortest path problem for directed graphs with nonnegative edge weights.
In both algorithms queue Q contains all vertices that are not 'done' yet, i.e. white and gray according to common terminology (see here).
In Dijkstra's algorithm, a black vertex cannot be relaxed, because if it could, that would mean that its distance was not correct beforehand (which contradicts the property of black nodes). So it makes no difference whether you check v in Q or not.
In Prim's algorithm, it is possible to find an edge of small weight that leads to an already black vertex. That's why, if we do not check v in Q, the value in vertex v can indeed change. Normally, it does not matter, because we never read the min-weight value of black vertices. However, your pseudocode is using a MinHeap data structure. In this case each modification of vertex values must be accompanied by DecreaseKey. Clearly, it is not valid to call DecreaseKey for black vertices, because they are not in the heap. That's why I suppose the author decided to check for v in Q explicitly.
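To see the effect of the v in Q test concretely, here is a small sketch (not the book's code). It follows MST-PRIM but replaces the heap with a plain O(V^2) array scan, so no DecreaseKey is involved; the function name primKeySum and the flag checkInQ are hypothetical, and the graph is assumed to be connected with positive weights.

```cpp
#include <climits>
#include <utility>
#include <vector>

// Sketch of MST-PRIM with an array scan standing in for EXTRACT-MIN.
// adj[u] holds (neighbour, weight) pairs; vertex 0 is the root r.
// Returns the sum of all keys, which equals the MST weight when the
// "v in Q" test is applied. Assumes a connected graph.
int primKeySum(const std::vector<std::vector<std::pair<int, int>>>& adj, bool checkInQ) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> key(n, INT_MAX);
    std::vector<bool> inQ(n, true);
    key[0] = 0;
    for (int round = 0; round < n; ++round) {
        // EXTRACT-MIN: pick the vertex in Q with the smallest key.
        int u = -1;
        for (int v = 0; v < n; ++v)
            if (inQ[v] && (u == -1 || key[v] < key[u])) u = v;
        inQ[u] = false;  // u becomes black (joins the tree)
        for (const auto& e : adj[u]) {
            int v = e.first, w = e.second;
            // Without the inQ test, keys of black vertices may be overwritten.
            if ((!checkInQ || inQ[v]) && w < key[v]) key[v] = w;
        }
    }
    int sum = 0;
    for (int k : key) sum += k;
    return sum;
}
```

On the path 0 -- 1 (weight 5), 1 -- 2 (weight 1), the version with the check yields 6, the true MST weight. Without the check, vertex 2 later overwrites the key of the already black vertex 1 with 1, and the sum drops to 2, which does not correspond to any spanning tree.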
Generally speaking, the code for Dijkstra's and Prim's algorithms is usually almost identical, except for a minor difference:
Prim's algorithm checks whether w(u, v) is less than D(v) in RELAX.
Dijkstra's algorithm checks whether D(u) + w(u, v) is less than D(v) in RELAX.
Take a look at my personal implementation for both Dijkstra and Prim written in C++.
They are very similar and I modified Dijkstra into Prim.
Dijkstra:
const int INF = INT_MAX / 4;
struct node { int v, w; };
bool operator<(node l, node r){if(l.w==r.w)return l.v>r.v; return l.w> r.w;}
vector<int> Dijkstra(int max_v, int start_v, vector<vector<node>>& adj_list) {
vector<int> min_dist(max_v + 1, INF);
priority_queue<node> q;
q.push({ start_v, 0 });
min_dist[start_v] = 0;
while (q.size()) {
node n = q.top(); q.pop();
for (auto adj : adj_list[n.v]) {
if (min_dist[adj.v] > n.w + adj.w) {
min_dist[adj.v] = n.w + adj.w;
q.push({ adj.v, adj.w + n.w });
}
}
}
return min_dist;
}
Prim:
struct node { int v, w; };
bool operator<(node l, node r) { return l.w > r.w; }
int MST_Prim(int max_v, int start_v, vector<vector<node>>& adj_list) {
vector<int> visit(max_v + 1, 0);
priority_queue<node> q; q.push({ start_v, 0 });
int sum = 0;
while (q.size()) {
node n = q.top(); q.pop();
if (visit[n.v]) continue;
visit[n.v] = 1;
sum += n.w;
for (auto adj : adj_list[n.v]) {
q.push({ adj.v, adj.w });
}
}
return sum;
}
Here is an exercise on graphs.
Given an undirected graph G with n vertices and m edges, and an integer k, give an O(m + n) algorithm that finds the maximum induced subgraph H of G such that each vertex in H has degree ≥ k, or prove that no such graph exists.
An induced subgraph F = (U, R) of a graph G = (V, E) is a subset U of the vertices V of G, together with all edges R of G such that both endpoints of each edge are in U.
My initial idea is like this:
First, this exercise actually asks us to take all vertices S whose degrees are greater than or equal to k, and then remove the vertices in S that don't have any edge connecting them to others. The refined S is then H, in which all vertices have degree >= k and the edges between them are R.
In addition, it asks for O(m+n), so I think I need a BFS or DFS. Then I get stuck.
In BFS, I can know the degree of a vertex. But once I get the degree of v (a vertex), I don't know other connected vertices except for its parent. But if the parent doesn't have degree >= k, I can't eliminate v as it may still be connected with others.
Any hints?
Edit:
According to the answer of @Michael J. Barber, I implemented it and updated the code here:
Can anyone have a look at the key method of the code, public Graph kCore(Graph g, int k)? Did I do it right? Is it O(m+n)?
class EdgeNode {
EdgeNode next;
int y;
}
public class Graph {
public EdgeNode[] edges;
public int numVertices;
public boolean directed;
public Graph(int _numVertices, boolean _directed) {
numVertices = _numVertices;
directed = _directed;
edges = new EdgeNode[numVertices];
}
public void insertEdge(int x, int y) {
insertEdge(x, y, directed);
}
public void insertEdge(int x, int y, boolean _directed) {
EdgeNode edge = new EdgeNode();
edge.y = y;
edge.next = edges[x];
edges[x] = edge;
if (!_directed)
insertEdge(y, x, true);
}
public Graph kCore(Graph g, int k) {
    int[] degree = new int[g.numVertices];
    boolean[] deleted = new boolean[g.numVertices];
    updateAllDegree(g, degree); // get all degree info for every vertex
    for (int i = 0; i < g.numVertices; i++) {
        if (!deleted[i] && degree[i] < k) {
            deleteVertex(i, k, degree, deleted, g);
        }
    }
    // Construct the kCore subgraph. Vertex numbers are kept, so h has as many
    // vertex slots as g; deleted vertices simply end up without edges.
    Graph h = new Graph(g.numVertices, false);
    for (int i = 0; i < g.numVertices; i++) {
        if (!deleted[i]) {
            EdgeNode p = g.edges[i];
            while (p != null) {
                if (!deleted[p.y])
                    h.insertEdge(i, p.y, true); // I just insert the good edge as directed, because i->p.y is inserted and later p.y->i will be inserted too in this loop.
                p = p.next;
            }
        }
    }
    return h;
}
private void deleteVertex(int i, int k, int[] degree, boolean[] deleted, Graph g) {
    deleted[i] = true;
    EdgeNode p = g.edges[i];
    while (p != null) {
        // Removing i lowers the degree of each surviving neighbour by one
        if (!deleted[p.y] && --degree[p.y] < k)
            deleteVertex(p.y, k, degree, deleted, g);
        p = p.next;
    }
}
private void updateAllDegree(Graph g, int[] degree) {
for(int i = 0;i < g.numVertices;i++) {
EdgeNode p = g.edges[i];
while(p!=null) {
degree[i] += 1;
p = p.next;
}
}
}
}
A maximal induced subgraph where the vertices have minimum degree k is called a k-core. You can find the k-cores just by repeatedly removing any vertices with degree less than k.
In practice, you first evaluate the degrees of all the vertices, which is O(m). You then go through the vertices looking for vertices with degree less than k. When you find such a vertex, cut it from the graph and update the degrees of the neighbors, also deleting any neighbors whose degrees drop below k. You need to look at each vertex at least once (so doable in O(n)) and update degrees at most once for each edge (so doable in O(m)), giving a total asymptotic bound of O(m+n).
The remaining connected components are the k-cores. Find the biggest one by evaluating their sizes.
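The peeling procedure described above can be sketched with a work queue instead of recursion. This is a minimal illustration under assumptions: the function name kCoreVertices, the 0-based vertex numbering and the edge-list input are made up for the example, and the graph is assumed to have no self-loops.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Sketch: vertices that survive k-core peeling in an undirected graph
// on vertices 0..n-1. Runs in O(n + m).
std::vector<int> kCoreVertices(int n, const std::vector<std::pair<int, int>>& edges, int k) {
    std::vector<std::vector<int>> adj(n);
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    std::vector<int> degree(n);
    std::vector<char> removed(n, 0);
    std::queue<int> pending;  // vertices already marked for removal
    for (int v = 0; v < n; ++v) {
        degree[v] = static_cast<int>(adj[v].size());
        if (degree[v] < k) { removed[v] = 1; pending.push(v); }
    }
    // Each vertex enters the queue at most once and each edge is
    // examined at most twice, which gives the O(n + m) bound.
    while (!pending.empty()) {
        int u = pending.front();
        pending.pop();
        for (int v : adj[u]) {
            if (removed[v]) continue;
            if (--degree[v] < k) { removed[v] = 1; pending.push(v); }
        }
    }
    std::vector<int> core;
    for (int v = 0; v < n; ++v)
        if (!removed[v]) core.push_back(v);
    return core;
}
```

For a triangle 0, 1, 2 with a tail 0 -- 3 -- 4 and k = 2, vertex 4 is removed first, which drops vertex 3 below degree 2, leaving the triangle. An empty result means no subgraph with minimum degree k exists.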
What optimizations exist for trying to find the longest path in a cyclic graph?
Longest path in cyclic graphs is known to be NP-complete. What optimizations or heuristics can make finding the longest path faster than DFSing the entire graph? Are there any probabilistic approaches?
I have a graph with specific qualities, but I'm looking for an answer to this in the general case. Linking to papers would be fantastic. Here is a partial answer:
Confirm it is cyclic. Longest path in acyclic graphs is easily computed using dynamic programming.
Find out if the graph is planar (which algorithm is best?). If it is, you might see if it is a block graph, ptolemaic graph, or cactus graph and apply the methods found in this paper.
Find out how many simple cycles there are using Donald B Johnson's algorithm (Java implementation). You can change any cyclic graph into an acyclic one by removing an edge in a simple cycle. You can then run the dynamic programming solution found on the Wikipedia page. For completeness, you would have to do this N times for each cycle, where N is the length of the cycle. Thus, for an entire graph, the number of times you have to run the DP solution is equal to the product of the lengths of all cycles.
If you have to DFS the entire graph, you can prune some paths by computing the "reachability" of each node in advance. This reachability, which is mainly applicable to directed graphs, is the number of nodes each node can reach without repetitions. It is the maximum the longest path from that node could possibly be. With this information, if your current path plus the reachability of the child node is less than the longest you've already found, there is no point in taking that branch as it is impossible that you would find a longer path.
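The pruning idea can be sketched as below for a directed, unweighted graph, with path length counted in edges. This is only an illustration: the names extend and longestPathEdges are hypothetical, and reachability is computed with one BFS per vertex, which costs O(V * (V + E)) up front. The search itself is still exponential in the worst case; the bound only cuts branches.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Extend the current simple path (ending at u, using len edges) in every
// direction that can still beat the best length found so far.
static void extend(const std::vector<std::vector<int>>& adj, const std::vector<int>& reach,
                   std::vector<char>& onPath, int u, int len, int& best) {
    best = std::max(best, len);
    for (int v : adj[u]) {
        if (onPath[v]) continue;
        // From v at most reach[v] further edges are possible: prune if that cannot win.
        if (len + 1 + reach[v] <= best) continue;
        onPath[v] = 1;
        extend(adj, reach, onPath, v, len + 1, best);
        onPath[v] = 0;
    }
}

// Sketch: length (in edges) of the longest simple path in a directed graph.
int longestPathEdges(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    // reach[s] = number of vertices other than s that s can reach.
    std::vector<int> reach(n, 0);
    for (int s = 0; s < n; ++s) {
        std::vector<char> seen(n, 0);
        std::queue<int> q;
        q.push(s);
        seen[s] = 1;
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            for (int v : adj[u])
                if (!seen[v]) { seen[v] = 1; ++reach[s]; q.push(v); }
        }
    }
    int best = 0;
    std::vector<char> onPath(n, 0);
    for (int s = 0; s < n; ++s) {
        if (reach[s] <= best) continue;  // no path from s can be longer than best
        onPath[s] = 1;
        extend(adj, reach, onPath, s, 0, best);
        onPath[s] = 0;
    }
    return best;
}
```

On the graph 0 -> 1 -> 2 -> 0 with an extra edge 2 -> 3, the longest simple path is 0 -> 1 -> 2 -> 3 with 3 edges, and once it is found the remaining start vertices are skipped by the bound.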
Here is a dynamic programming approach with O(n*2^n) states (O(n^2*2^n) time, since each state looks at up to n predecessors) that should be feasible for up to say 20 vertices:
m(b, U) = the maximum length of any path ending at b and visiting only (some of) the vertices in U.
Initially, set m(b, {b}) = 0.
Then, m(b, U) = max value of m(x, U - {b}) + d(x, b) over all x in U such that x is not b and an edge (x, b) exists. Take the maximum of these values for all endpoints b, with U = V (the full set of vertices). That will be the maximum length of any path.
The following C code assumes a distance matrix in d[N][N]. If your graph is unweighted, you can change every read access to this array to the constant 1. A traceback showing an optimal sequence of vertices (there may be more than one) is also computed in the array p[N][NBITS].
#define N 20
#define NBITS (1 << N)
int d[N][N]; /* Assumed to be populated earlier. -1 means "no edge". */
int m[N][NBITS]; /* DP matrix. -2 means "unknown". */
int p[N][NBITS]; /* DP predecessor traceback matrix. */
/* Maximum distance for a path ending at vertex b, visiting only
vertices in visited. */
int subsolve(int b, unsigned visited) {
if (visited == (1 << b)) {
/* A single vertex */
p[b][visited] = -1;
return 0;
}
if (m[b][visited] == -2) {
/* Haven't solved this subproblem yet */
int best = 0, bestPred = -1; /* 0: the path consisting of b alone is always allowed, so U need not be used up */
unsigned i;
for (i = 0; i < N; ++i) {
if (i != b && ((visited >> i) & 1) && d[i][b] != -1) {
int x = subsolve(i, visited & ~(1 << b));
if (x != -1) {
x += d[i][b];
if (x > best) {
best = x;
bestPred = i;
}
}
}
}
m[b][visited] = best;
p[b][visited] = bestPred;
}
return m[b][visited];
}
/* Maximum path length for d[][].
n must be <= N.
*last will contain the last vertex in the path; use p[][] to trace back. */
int solve(int n, int *last) {
int b, i;
int best = -1;
/* Need to blank the DP and predecessor matrices */
for (b = 0; b < N; ++b) {
for (i = 0; i < NBITS; ++i) {
m[b][i] = -2;
p[b][i] = -2;
}
}
for (b = 0; b < n; ++b) {
int x = subsolve(b, (1 << n) - 1);
if (x > best) {
best = x;
*last = b;
}
}
return best;
}
On my PC, this solves a 20x20 complete graph with edge weights randomly chosen in the range [0, 1000) in about 7s and needs about 160Mb (half of that is for the predecessor trace).
(Please, no comments about using fixed-size arrays. Use malloc() (or better yet, C++ vector<int>) in a real program. I just wrote it this way so things would be clearer.)
I have written code that solves MST using Prim's method. I read that this kind of implementation (using a priority queue) should have O(E + VlogV) = O(VlogV), where E is the number of edges and V the number of vertices, but when I look at my code it simply doesn't look that way. I would appreciate it if someone could clear this up for me.
To me it seems the running time is this:
The while loop takes O(E) times(until we go through all the edges)
Inside that loop we extract an element from the Q which takes O(logE) time.
And the second inner loop takes O(V) time (although we don't run this loop every time, it is clear that it will be run V times since we have to add all the vertices)
My conclusion would be that the running time is: O( E(logE+V) ) = O( E*V ).
This is my code:
#define p_int pair < int, int >
int N, M; //N - nmb of vertices, M - nmb of edges
int graph[100][100] = { 0 }; //adj. matrix
bool in_tree[100] = { false }; //if a node if in the mst
priority_queue< p_int, vector < p_int >, greater < p_int > > Q;
/*
keeps track of the smallest edge connecting a node in the mst tree and
a node outside the tree. First part of pair is the weight of the edge and the
second is the node. We don't remember the parent node because we don't need it :-)
*/
int mst_prim()
{
Q.push( make_pair( 0, 0 ) );
int nconnected = 0;
int mst_cost = 0;
while( nconnected < N )
{
p_int node = Q.top(); Q.pop();
if( in_tree[ node.second ] == false )
{
mst_cost += node.first;
in_tree[ node.second ] = true;
for( int i = 0; i < N; ++i )
if( graph[ node.second ][i] > 0 && in_tree[i]== false )
Q.push( make_pair( graph[ node.second ][i], i ) );
nconnected++;
}
}
return mst_cost;
}
You can use adjacency lists to speed your solution up (but not for dense graphs), but even then, you are not going to get O(E + V log V) without a Fibonacci heap.
Maybe the Kruskal algorithm would be simpler for you to understand. It features no priority queue, you only have to sort an array of edges once. It goes like this basically:
Insert all edges into an array and sort them by weight
Iterate over the sorted edges, and for each edge connecting nodes i and j, check if i and j are connected. If they are, skip the edge, else add the edge into the MST.
The only catch is to be quickly able to say if two nodes are connected. For this you use the Union-Find data structure, which goes like this:
#include <cstdlib> //for rand()
#define MAX_NODES 100000 //assumed upper bound on the number of nodes
int T[MAX_NODES];
int getParent(int a)
{
if (T[a]==-1)return a;
return T[a]=getParent(T[a]);
}
void Unite(int a,int b)
{
if (rand()&1)
T[a]=b;
else
T[b]=a;
}
In the beginning, just initialize T to all -1, and then every time you want to find out if nodes A and B are connected, just compare their parents - if they are the same, they are connected (like this getParent(A)==getParent(B)). When you are inserting the edge to MST, make sure to update the Union-Find with Unite(getParent(A),getParent(B)).
The analysis is simple: you sort the edges and then iterate over them, using the Union-Find, whose operations take close to O(1) amortized time. So it is O(E log E + E), which equals O(E log E).
That is it ;-)
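Putting the two steps together, a minimal Kruskal sketch could look like this. It is an illustration, not the answerer's code: the names Edge, DisjointSets and kruskalWeight are made up, vertices are assumed to be numbered 0..n-1, and the union step uses plain path compression with a fixed link direction instead of the random linking shown above.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct Edge { int u, v, w; };

// Union-Find with path compression.
struct DisjointSets {
    std::vector<int> parent;
    explicit DisjointSets(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int a) { return parent[a] == a ? a : parent[a] = find(parent[a]); }
    // Returns false if a and b were already connected.
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// Sketch: total weight of a minimum spanning forest of an undirected graph.
int kruskalWeight(int n, std::vector<Edge> edges) {
    // Step 1: sort the edges by weight.
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });
    // Step 2: take every edge that joins two different components.
    DisjointSets sets(n);
    int total = 0;
    for (const Edge& e : edges)
        if (sets.unite(e.u, e.v)) total += e.w;
    return total;
}
```

For the edges 0 -- 1 (1), 1 -- 2 (2), 0 -- 2 (3), 2 -- 3 (4), the edge 0 -- 2 is skipped because 0 and 2 are already connected, and the result is 1 + 2 + 4 = 7. To recover the tree itself, collect the accepted edges instead of summing their weights.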
I did not have to deal with the algorithm before, but what you have implemented does not match the algorithm as explained on Wikipedia. The algorithm there works as follows.
Put all vertices into the queue. O(V)
While the queue is not empty... O(V)
Take the vertex whose connecting edge has the minimum weight from the queue. O(log(V))
Update the weights of adjacent vertices. O(E / V), this is the average number of adjacent vertices.
Reestablish the queue structure. O(log(V))
This gives
O(V) + O(V) * (O(log(V)) + O(E / V))
= O(V) + O(V) * O(log(V)) + O(V) * O(E / V)
= O(V) + O(V * log(V)) + O(E)
= O(V * log(V)) + O(E)
exactly what one expects.