I'm reading about graph algorithms in the Cormen book. Below is the pseudocode from that book.
Prim's algorithm for MST:
MST-PRIM (G, w, r)
    for each u in G.V
        u.key = infinity
        u.p = NIL
    r.key = 0
    Q = G.V
    while Q ≠ ∅
        u = EXTRACT-MIN(Q)
        for each v in G.Adj[u]
            if (v in Q) and (w(u,v) < v.key)
                v.p = u
                v.key = w(u,v)
Dijkstra's algorithm for single-source shortest paths:
INITIALIZE-SINGLE-SOURCE (G, s)
    for each vertex v in G.V
        v.d = infinity
        v.par = NIL
    s.d = 0
DIJKSTRA (G, w, s)
    INITIALIZE-SINGLE-SOURCE(G, s)
    S = ∅
    Q = G.V
    while Q ≠ ∅
        u = EXTRACT-MIN(Q)
        S = S ∪ {u}
        for each vertex v in G.Adj[u]
            RELAX(u, v, w)
My question is: why does Prim's algorithm check whether the vertex belongs to Q (v in Q), i.e. that the vertex is not yet in the tree, whereas Dijkstra's algorithm has no such check?
Is there a reason for this?
Prim's and Dijkstra's algorithms solve different problems in the first place. Prim's finds a minimum spanning tree of an undirected graph, while Dijkstra's solves the single-source shortest path problem for directed graphs with nonnegative edge weights.
In both algorithms, the queue Q contains all vertices that are not 'done' yet, i.e. white and gray according to common terminology (see here).
In Dijkstra's algorithm, a black vertex cannot be relaxed: if it could, its distance would not have been correct beforehand, which contradicts the defining property of black vertices. So it makes no difference whether you check v in Q or not.
In Prim's algorithm, it is possible to find an edge of small weight that leads to an already black vertex. So if we do not check v in Q, the key stored in v can indeed change. For the key alone this would not matter, because we never read the min-weight value of a black vertex again, but the pseudocode also overwrites v.p, which would corrupt the recorded tree. In addition, your pseudocode uses a min-heap, where each modification of a vertex's key must be accompanied by a DecreaseKey call. Clearly, it is not valid to call DecreaseKey for black vertices, because they are no longer in the heap. That's why I suppose the author decided to check v in Q explicitly.
Generally speaking, the code for Dijkstra's and Prim's algorithms is usually almost identical, except for one minor difference:
Prim's algorithm checks whether w(u, v) is less than D(v) in RELAX.
Dijkstra's algorithm checks whether D(u) + w(u, v) is less than D(v) in RELAX.
Take a look at my personal implementations of both Dijkstra and Prim, written in C++.
They are very similar; I obtained Prim by modifying Dijkstra.
Dijkstra:
#include <climits>
#include <queue>
#include <vector>
using namespace std;

const int INF = INT_MAX / 4;
struct node { int v, w; };
// Reversed comparison so that priority_queue behaves as a min-heap on w.
bool operator<(node l, node r) { if (l.w == r.w) return l.v > r.v; return l.w > r.w; }

vector<int> Dijkstra(int max_v, int start_v, vector<vector<node>>& adj_list) {
    vector<int> min_dist(max_v + 1, INF);
    priority_queue<node> q;
    q.push({ start_v, 0 });
    min_dist[start_v] = 0;
    while (q.size()) {
        node n = q.top(); q.pop();
        for (auto adj : adj_list[n.v]) {
            // Relax: compare the path length through n with the best known distance.
            if (min_dist[adj.v] > n.w + adj.w) {
                min_dist[adj.v] = n.w + adj.w;
                q.push({ adj.v, adj.w + n.w });
            }
        }
    }
    return min_dist;
}
Prim:
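#include <queue>
#include <vector>
using namespace std;

struct node { int v, w; };
// Reversed comparison so that priority_queue behaves as a min-heap on w.
bool operator<(node l, node r) { return l.w > r.w; }

int MST_Prim(int max_v, int start_v, vector<vector<node>>& adj_list) {
    vector<int> visit(max_v + 1, 0);
    priority_queue<node> q; q.push({ start_v, 0 });
    int sum = 0;
    while (q.size()) {
        node n = q.top(); q.pop();
        if (visit[n.v]) continue; // vertex is already in the tree
        visit[n.v] = 1;
        sum += n.w;
        for (auto adj : adj_list[n.v]) {
            // Push the edge weight alone, not an accumulated distance.
            q.push({ adj.v, adj.w });
        }
    }
    return sum;
}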
When writing Dijkstra's algorithm with a priority queue, why don't we check whether a node has already been visited?
while (!pq.empty())
{
    int u = pq.top().second;
    pq.pop();
    // Get all adjacent of u.
    for (auto x : adj[u])
    {
        int v = x.first;
        int weight = x.second;
        if (dist[v] > dist[u] + weight)
        {
            dist[v] = dist[u] + weight;
            pq.push(make_pair(dist[v], v));
        }
    }
}
It does check the previous value for the node in dist[v], which I assume stores the current best distance from the source to node v. If a new path to v is found that is shorter than the previous shortest one, v is reinserted into the priority queue, because it may now provide shorter paths to other nodes. If the new distance to v is not shorter than the previous one, v is left alone. This is why there is no else in the implementation.
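As a rough illustration of the point above, here is a minimal sketch of the same loop wrapped in a function. The name dijkstraLazy and the (neighbor, weight) adjacency layout are assumptions made for this example and are not part of the question's code. It also includes an optional guard that skips queue entries made stale by a later, shorter push; with non-negative weights the result is the same without it, the guard only avoids redundant work.

```cpp
#include <climits>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: adj[u] holds (neighbor, weight) pairs,
// weights are assumed to be non-negative.
std::vector<int> dijkstraLazy(
    int source, const std::vector<std::vector<std::pair<int, int>>>& adj) {
    const int INF = INT_MAX / 4;
    std::vector<int> dist(adj.size(), INF);
    using Item = std::pair<int, int>;  // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        Item top = pq.top();
        pq.pop();
        int d = top.first;
        int u = top.second;
        // Optional: this entry was superseded by a shorter one pushed later.
        if (d > dist[u]) continue;
        for (const std::pair<int, int>& edge : adj[u]) {
            int v = edge.first;
            int weight = edge.second;
            // The dist comparison is what stops finished vertices from
            // being pushed again, so no separate visited array is needed.
            if (dist[v] > dist[u] + weight) {
                dist[v] = dist[u] + weight;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}
```

A vertex may sit in the queue several times with different distances; only the entry matching dist[u] does useful work.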
There will be many queries. Each query (A,B,K) requires you to check whether a node (value=K) can be found on the path from A to B. The solution is expected to run in at most O(n + q log q), where n is the number of nodes and q is the number of queries.
I have a solution in mind, which I am posting below. I would like to know what other approaches there are.
My approach:
Find the LCA (lowest common ancestor) of A and B. Check whether K is an ancestor of A or of B. If so, check whether the LCA is an ancestor of K; if it is, output yes. To test whether one vertex is an ancestor of another, we can check whether the second vertex lies in the subtree of the first. (This can be done in O(1) if we precompute each node's in/out visiting times in a DFS. https://www.geeksforgeeks.org/printing-pre-and-post-visited-times-in-dfs-of-a-graph/ )
But the complexity increases if all queries have the same K value: we need to check all the K that satisfy the in-out times with A or B. To optimize that, we can sort all K by their DFS in-out times.
Any thoughts?
R lies on the path between U and V in exactly the following cases:
R is the lowest common ancestor of U and V.
R is on the path between LCA(U,V) and U.
R is on the path between LCA(U,V) and V.
// Function that returns true if R
// exists on the path between U
// and V in the given tree
bool isPresent(int U, int V, int R)
{
    // Calculating LCA between U and V
    int LCA = lowestCommonAncestor(U, V);
    // Calculating LCA between U and R
    int LCA_1 = lowestCommonAncestor(U, R);
    // Calculating LCA between V and R
    int LCA_2 = lowestCommonAncestor(V, R);
    if (LCA == R || (LCA_1 == LCA && LCA_2 == R) ||
        (LCA_2 == LCA && LCA_1 == R)) {
        return true;
    }
    return false;
}
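For comparison, the ancestor test from the question (DFS in/out times) can be combined with any LCA routine. The following is only a sketch under assumptions: the type TreePathIndex and its members are hypothetical names, vertices are numbered 0..n-1, and the LCA is found by naive parent climbing, which costs O(height) per query and would have to be swapped for a faster method to meet the stated bound.

```cpp
#include <vector>

// Hypothetical helper for this example: a rooted tree with DFS entry/exit
// times, parents and depths.
struct TreePathIndex {
    std::vector<std::vector<int>> adj;
    std::vector<int> tin, tout, parent, depth;
    int timer = 0;

    explicit TreePathIndex(int n)
        : adj(n), tin(n, 0), tout(n, 0), parent(n, 0), depth(n, 0) {}

    void addEdge(int a, int b) {
        adj[a].push_back(b);
        adj[b].push_back(a);
    }

    // Call once after all edges have been added.
    void build(int root) {
        timer = 0;
        dfs(root, root, 0);
    }

    void dfs(int u, int p, int d) {
        parent[u] = p;
        depth[u] = d;
        tin[u] = timer++;
        for (int v : adj[u])
            if (v != p) dfs(v, u, d + 1);
        tout[u] = timer++;
    }

    // True if a is an ancestor of b (a vertex counts as its own ancestor).
    bool isAncestor(int a, int b) const {
        return tin[a] <= tin[b] && tout[b] <= tout[a];
    }

    // Naive LCA by climbing parent pointers; a placeholder for a faster one.
    int lca(int a, int b) const {
        while (depth[a] > depth[b]) a = parent[a];
        while (depth[b] > depth[a]) b = parent[b];
        while (a != b) {
            a = parent[a];
            b = parent[b];
        }
        return a;
    }

    // k is on the path a..b iff it lies below (or at) the LCA and is an
    // ancestor of at least one endpoint.
    bool onPath(int a, int b, int k) const {
        int l = lca(a, b);
        return isAncestor(l, k) && (isAncestor(k, a) || isAncestor(k, b));
    }
};
```

After build, each onPath call performs one LCA lookup and three O(1) interval comparisons.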
Question:
Given a directed graph of N nodes and M edges (M <= 2N), find all the nodes that are reachable from all other nodes.
Example:
The below graph has 4 nodes and 4 edges:
Answer: Nodes (2) and (3) are reachable from all other nodes.
P.S.:
The only solution I came up with is to reverse the graph, run a BFS from every node, and check whether it reaches all other nodes. But that would take O(n^2).
Is there another approach that takes O(n log n) or less?
Here's my take on the O(n^2) approach:
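#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

const int MAXN = 100005; // assumed upper bound on n, not given in the question
int n, m;
vector<int> adj[MAXN];
bool visited[MAXN];

void init() {
    cin >> n >> m;
    for (int i = 0; i < m; i++) {
        int u, v; cin >> u >> v;
        adj[v].emplace_back(u); // store the reversed edge
    }
}

void dfs(int u) {
    visited[u] = true;
    for (int v : adj[u])
        if (!visited[v]) dfs(v);
}

int main() {
    init();
    for (int u = 1; u <= n; u++) {
        fill(visited + 1, visited + n + 1, false);
        dfs(u);
        // u reaches everything in the reversed graph, so everything reaches u
        if (count(visited + 1, visited + n + 1, true) == n) cout << u << ' ';
    }
}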
You can get an O(|V| + |E|) algorithm using Tarjan's strongly connected components algorithm. Many sample implementations are available online or in standard algorithms textbooks.
Once you have the strongly connected components, look at the condensation of the graph (the directed acyclic graph whose nodes are the strongly connected components). The set of vertices reachable from all others is exactly the set of vertices in the sink component, i.e. the component with out-degree 0, provided there is exactly one sink. If the condensation has more than one sink, there are no such vertices, because no sink can reach another sink. You can count the sinks in O(|V| + |E|) time by scanning every edge and marking the component of its source as a non-sink whenever the edge leaves that component.
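A minimal sketch of this idea, assuming vertices are numbered 0..n-1 and the graph is given as adjacency lists. The function name reachableFromAll is made up for this example, and the recursive DFS may need to be made iterative for very deep graphs.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Returns the vertices reachable from every other vertex, in increasing
// order. An empty result means no such vertex exists.
std::vector<int> reachableFromAll(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> index(n, -1), low(n, 0), comp(n, -1), stack;
    std::vector<bool> onStack(n, false);
    int counter = 0, compCount = 0;

    // Tarjan's strongly connected components algorithm.
    std::function<void(int)> dfs = [&](int u) {
        index[u] = low[u] = counter++;
        stack.push_back(u);
        onStack[u] = true;
        for (int v : adj[u]) {
            if (index[v] == -1) {
                dfs(v);
                low[u] = std::min(low[u], low[v]);
            } else if (onStack[v]) {
                low[u] = std::min(low[u], index[v]);
            }
        }
        if (low[u] == index[u]) {  // u is the root of a component
            int v;
            do {
                v = stack.back();
                stack.pop_back();
                onStack[v] = false;
                comp[v] = compCount;
            } while (v != u);
            ++compCount;
        }
    };
    for (int u = 0; u < n; ++u)
        if (index[u] == -1) dfs(u);

    // A component is a sink if no edge leaves it.
    std::vector<bool> hasOutEdge(compCount, false);
    for (int u = 0; u < n; ++u)
        for (int v : adj[u])
            if (comp[u] != comp[v]) hasOutEdge[comp[u]] = true;

    int sink = -1, sinkCount = 0;
    for (int c = 0; c < compCount; ++c)
        if (!hasOutEdge[c]) {
            sink = c;
            ++sinkCount;
        }

    std::vector<int> result;
    if (sinkCount == 1)
        for (int u = 0; u < n; ++u)
            if (comp[u] == sink) result.push_back(u);
    return result;
}
```

Every vertex of a directed acyclic graph can reach at least one sink, which is why a unique sink component is reachable from everywhere.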
Can you share the code that takes O(n^2)?
Anyway, this approach takes only O(n), which is better than O(n log n)! Note that it assumes every node has at most one outgoing edge.
class DirectedGraph {
    graph = {}
    constructor(directedGraph) {
        for (let [id, arrow] of directedGraph)
            this.graph[id] = { arrow, isVisited: false }
    }
    find_end_points(id) {
        let out = [], graph = this.graph;
        function go_next(id, from) {
            let node = graph[id];
            if (node.isVisited || !node.arrow) return out.push(id)
            if (node.arrow == from) out.push(id);
            node.isVisited = true;
            go_next(node.arrow, id);
        }
        go_next(id, null);
        return out
    }
}
let directedGraph = new DirectedGraph([[2, 3], [3, 2], [1, 2], [4, 1]]);
console.log(directedGraph.find_end_points(4))
(This is derived from a recently completed programming contest)
You are given G, a connected graph with N nodes and N-1 edges.
(Notice that this implies G forms a tree.)
Each edge of G is directed. (not necessarily upward to any root)
For each vertex v of G it is possible to invert zero or more edges such that there is a directed path from every other vertex w to v. Let the minimum possible number of edge inversions to achieve this be f(v).
By what linear or loglinear algorithm can we determine the subset of vertexes that have the minimal overall f(v) (including the value of f(v) of those vertexes)?
For example consider the 4 vertex graph with these edges:
A<--B
C<--B
D<--B
Here f(A) = 2, f(B) = 3, f(C) = 2 and f(D) = 2,
so the desired output is {A,C,D} and 2.
(note we only need to calculate the f(v) of vertexes that have a minimal f(v) - not all of them)
Code:
For posterity, here is the code of the solution:
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

struct Edge
{
    bool fwd;  // true if stored in the edge's original direction
    int dest;
};

int main()
{
    int n;
    cin >> n;
    vector<vector<Edge>> V(n + 1);
    for (int i = 0; i < n - 1; i++)
    {
        int src, dest;
        cin >> src >> dest;
        V[src].push_back(Edge{true, dest});
        V[dest].push_back(Edge{false, src});
    }
    vector<int> F(n + 1, -1);
    vector<bool> done(n + 1, false);
    vector<int> todo;
    // First pass: F[1] = number of edges pointing away from vertex 1.
    todo.push_back(1);
    done[1] = true;
    F[1] = 0;
    while (!todo.empty())
    {
        int next = todo.back();
        todo.pop_back();
        for (Edge e : V[next])
        {
            if (done[e.dest])
                continue;
            if (e.fwd)  // points away from vertex 1, so it must be inverted
                F[1]++;
            done[e.dest] = true;
            todo.push_back(e.dest);
        }
    }
    // Second pass: derive F of each vertex from F of its neighbor.
    todo.push_back(1);
    while (!todo.empty())
    {
        int next = todo.back();
        todo.pop_back();
        for (Edge e : V[next])
        {
            if (F[e.dest] != -1)
                continue;
            if (e.fwd)  // next -> e.dest already points at e.dest
                F[e.dest] = F[next] - 1;
            else        // e.dest -> next must now be inverted
                F[e.dest] = F[next] + 1;
            todo.push_back(e.dest);
        }
    }
    int minf = *min_element(F.begin() + 1, F.end());
    cout << minf << endl;
    for (int i = 1; i <= n; i++)
        if (F[i] == minf)
            cout << i << " ";
    cout << endl;
}
I think that the following algorithm works correctly, and it certainly works in linear time.
The motivation for this algorithm is the following. Let's suppose that you already know the value of f(v) for some single node v. Now, consider any node u adjacent to v. If we want to compute the value of f(u), we can reuse some of the information from f(v) in order to compute it. Note that in order to get from any node w in the graph to u, one of two cases must happen:
That path passes through the edge connecting u and v. In that case, the way that we get from w to u is to go from w to v, then to follow the edge from v to u.
That path does not pass through the edge connecting u and v. In that case, the way that we get from w to u is the exact same way that we got from w to v, except that we stop as soon as we get to u.
The reason that this observation is important is that it means that if we know the number of edges we'd flip to get from any node to v, we can easily modify it to get the set of edges that we'd flip to get from any node to u. Specifically, it's going to be the same set of edges as before, except that we want to direct the edge connecting u and v so that it connects v to u rather than the other way around.
If the edge from u to v is initially directed (u, v), then we have to flip all the normal edges we flipped to get every node pointing at v, plus one more edge to get v pointed back at u. Thus f(u) = f(v) + 1. Otherwise, if the edge is originally directed (v, u), then the set of edges that we'd flip would be the same as before (pointing everything at v), except that we wouldn't flip the edge (v, u). Thus f(u) = f(v) - 1.
Consequently, once we know the value of f for a single node v, we can compute it for each adjacent node u as follows:
f(u) = f(v) + 1 if (u, v) is an edge.
f(u) = f(v) - 1 otherwise
This means that we can compute f(v) for all nodes v as follows:
Compute f(v) for some initial node v, chosen arbitrarily.
Do a DFS starting from v. When reaching a node u, compute its f score using the above logic.
All that's left to do is to compute f(v) for some initial node. To do this, we can run a DFS from v outward. Every time we see an edge pointed the wrong way, we have to flip it. Thus the initial value of f(v) is given by the number of wrong-pointing edges we find during the initial DFS.
We thus can compute the f score for each node in O(n) time by doing an initial DFS to compute f(v) for the initial node, then a secondary DFS to compute f(u) for each other node u. You can then for-loop over each of the n f-scores to find the minimum score, then do one more loop to find all values with that f-score. Each of these steps takes O(n) time, so the overall algorithm takes O(n) time as well.
Hope this helps! This was an awesome problem!
Here is an exercise on graphs.
Given an undirected graph G with n vertices and m edges, and an integer k, give an O(m + n) algorithm that finds the maximum induced subgraph H of G such that each vertex in H has degree ≥ k, or prove that no such graph exists.
An induced subgraph F = (U, R) of a graph G = (V, E) is a subset of U of the vertices V of G, and all edges R of G such that both vertices of each edge are in U.
My initial idea is like this:
First, this exercise actually asks us to take the set S of all vertices whose degree is greater than or equal to k, and then remove from S the vertices that have no edge connecting them to other vertices. The refined S is then H, in which all vertices have degree >= k, and the edges between them are R.
In addition, it asks for O(m+n), so I think I need to use a BFS or DFS. Then I get stuck.
In a BFS, I can find the degree of a vertex. But once I have the degree of a vertex v, I don't know its other connected vertices except for its parent. And if the parent doesn't have degree >= k, I can't eliminate v, as it may still be connected to others.
Any hints?
Edit:
Following the answer of @Michael J. Barber, I implemented it and have updated the code here:
Can anyone have a look at the key method of the code, public Graph kCore(Graph g, int k)? Did I do it right? Is it O(m+n)?
class EdgeNode {
    EdgeNode next;
    int y;
}
public class Graph {
    public EdgeNode[] edges;
    public int numVertices;
    public boolean directed;

    public Graph(int _numVertices, boolean _directed) {
        numVertices = _numVertices;
        directed = _directed;
        edges = new EdgeNode[numVertices];
    }

    public void insertEdge(int x, int y) {
        insertEdge(x, y, directed);
    }

    public void insertEdge(int x, int y, boolean _directed) {
        EdgeNode edge = new EdgeNode();
        edge.y = y;
        edge.next = edges[x];
        edges[x] = edge;
        if (!_directed)
            insertEdge(y, x, true);
    }

    public Graph kCore(Graph g, int k) {
        int[] degree = new int[g.numVertices];
        boolean[] deleted = new boolean[g.numVertices];
        updateAllDegree(g, degree); // get the degree of every vertex
        for (int i = 0; i < g.numVertices; i++) {
            if (!deleted[i] && degree[i] < k) {
                deleteVertex(i, deleted, degree, k, g);
            }
        }
        // Construct the k-core subgraph. Vertex ids are kept, so h has as many
        // slots as g; deleted vertices simply end up without edges.
        Graph h = new Graph(g.numVertices, false);
        for (int i = 0; i < g.numVertices; i++) {
            if (!deleted[i]) {
                EdgeNode p = g.edges[i];
                while (p != null) {
                    if (!deleted[p.y])
                        h.insertEdge(i, p.y, true); // inserted as directed: p.y->i is added when the loop reaches p.y
                    p = p.next;
                }
            }
        }
        return h;
    }

    private void deleteVertex(int i, boolean[] deleted, int[] degree, int k, Graph g) {
        deleted[i] = true;
        EdgeNode p = g.edges[i];
        while (p != null) {
            if (!deleted[p.y]) {
                degree[p.y]--; // i is gone, so its neighbor loses one edge
                if (degree[p.y] < k)
                    deleteVertex(p.y, deleted, degree, k, g);
            }
            p = p.next;
        }
    }

    private void updateAllDegree(Graph g, int[] degree) {
        for (int i = 0; i < g.numVertices; i++) {
            EdgeNode p = g.edges[i];
            while (p != null) {
                degree[i] += 1;
                p = p.next;
            }
        }
    }
}
A maximal induced subgraph where the vertices have minimum degree k is called a k-core. You can find the k-cores just by repeatedly removing any vertices with degree less than k.
In practice, you first evaluate the degrees of all the vertices, which is O(m). You then go through the vertices looking for vertices with degree less than k. When you find such a vertex, cut it from the graph and update the degrees of the neighbors, also deleting any neighbors whose degrees drop below k. You need to look at each vertex at least once (so doable in O(n)) and update degrees at most once for each edge (so doable in O(m)), giving a total asymptotic bound of O(m+n).
The remaining connected components are the k-cores. Find the biggest one by evaluating their sizes.
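The peeling procedure described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name kCoreVertices is made up, the graph is assumed to be simple and undirected with each edge stored in both adjacency lists, and vertices are numbered 0..n-1.

```cpp
#include <queue>
#include <vector>

// Returns keep[v] == true for every vertex that survives the peeling,
// i.e. belongs to the k-core.
std::vector<bool> kCoreVertices(const std::vector<std::vector<int>>& adj, int k) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> degree(n);
    std::vector<bool> removed(n, false);
    std::queue<int> pending;  // removed vertices whose neighbors still need updating

    for (int v = 0; v < n; ++v) {
        degree[v] = static_cast<int>(adj[v].size());
        if (degree[v] < k) {
            removed[v] = true;
            pending.push(v);
        }
    }
    while (!pending.empty()) {
        int v = pending.front();
        pending.pop();
        for (int u : adj[v]) {
            if (removed[u]) continue;
            // Removing v costs u one edge; u may now fall below k as well.
            if (--degree[u] < k) {
                removed[u] = true;
                pending.push(u);
            }
        }
    }

    std::vector<bool> keep(n);
    for (int v = 0; v < n; ++v) keep[v] = !removed[v];
    return keep;
}
```

Each vertex enters the queue at most once and each adjacency list is scanned at most once, which gives the O(m+n) bound.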