Shortest paths between vertices in two sets - algorithm

You're given a directed weighted graph, which has m edges and n vertices. Every edge's weight is nonnegative. The vertices are either in set S1 or in set S2 (S1 and S2 are disjoint). You need to find the shortest path between any pairs (v1, v2) (v1 is in S1, and v2 in S2).
The running time of the solution should be O(m log n).
'Nonnegative' and 'm log n' remind me of Dijkstra, but I have no idea how to solve it with only a constant number of Dijkstra runs.
Thanks in advance.

Dijkstra, but modified:
1. Initialize:
   double min = Double.POSITIVE_INFINITY; // shortest distance found so far between n1 (of s1) and n2 (of s2)
   Node n1 = null, n2 = null; // members of s1 and s2 with the shortest distance between them
   Set<Node> visited = new HashSet<>(); // initially empty set of visited nodes of s1
2. Run Dijkstra from each node of s1 until
   a. you reach a node of s2 (add the start node to visited; if distance < min, update min, n1 and n2)
   b. you reach a node of s1 that is already in visited (add the start node to visited)
   c. you run out of nodes to visit (possible only if the graph is not connected; add the start node to visited)
3. Return the result pair (n1, n2).
A single run of Dijkstra is O(m log n) in the worst case, but you will not be running it fully |s1| times. Instead, the end result of the above steps is equivalent to, or faster than (in terms of edges visited), running Dijkstra fully just once per connected component. Therefore, the whole algorithm is O(m log n).
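A commonly used alternative, which is not the procedure described above, is to seed a single Dijkstra run with every vertex of S1 at distance 0 (equivalent to adding a virtual super-source with zero-weight edges into S1) and to stop at the first vertex of S2 that leaves the heap. Here is a minimal Python sketch of that idea. It assumes the graph is a dict mapping each vertex to a list of (neighbor, weight) pairs; the function name closest_pair is hypothetical.

```python
import heapq

def closest_pair(graph, s1, s2):
    """Return (distance, v1, v2) for the closest pair with v1 in s1 and v2 in s2,
    or None if no vertex of s2 is reachable from s1.

    graph: dict mapping vertex -> list of (neighbor, nonnegative weight).
    """
    s2 = set(s2)
    dist = {v: 0 for v in s1}      # every S1 vertex starts at distance 0
    origin = {v: v for v in s1}    # the S1 vertex each tentative distance came from
    heap = [(0, v) for v in dist]
    heapq.heapify(heap)
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue               # stale heap entry
        done.add(u)
        if u in s2:
            return d, origin[u], u  # first S2 vertex popped is the closest one
        for w, weight in graph.get(u, ()):
            nd = d + weight
            if w not in dist or nd < dist[w]:
                dist[w] = nd
                origin[w] = origin[u]
                heapq.heappush(heap, (nd, w))
    return None
```

Because this is one Dijkstra run on the original graph plus at most |S1| extra heap entries, the O(m log n) bound follows directly from the usual binary-heap analysis.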

Related

Find the N highest cost vertices that have a path to S, where S is a vertex in an undirected Graph G

I would like to know what would be the most efficient way (w.r.t. space and time) to solve the following problem:
Given an undirected Graph G = (V, E), a positive number N and a vertex S in V. Assume that every vertex in V has a cost value. Find the N highest cost vertices that are connected to S.
For example:
G = (V, E)
V = {v1, v2, v3, v4},
E = {(v1, v2),
(v1, v3),
(v2, v4),
(v3, v4)}
v1 cost = 1
v2 cost = 2
v3 cost = 3
v4 cost = 4
N = 2, S = v1
result: {v3, v4}
This problem can be solved easily with a graph traversal algorithm (e.g., BFS or DFS). To find the vertices connected to S, we can run either BFS or DFS starting from S. As the space and time complexity of BFS and DFS are the same (i.e., time complexity: O(V+E), space complexity: O(E) including the stored graph), here I am going to show the pseudocode using DFS:
Parameter Definition:
* G -> Graph
* S -> Starting node
* N -> Number of connected (highest cost) vertices to find
* Cost -> Array of size V, contains the vertex cost value
procedure DFS-traversal(G, S, N, Cost):
    let St be a stack
    let Q be a min-priority-queue containing pairs <cost, vertex-id>
    let discovered be an array (of size V) to mark already visited vertices
    St.push(S)
    // Comment: if you do not want to consider the case "S is connected to S"
    // then, you can consider commenting the following line
    Q.push(make-pair(Cost[S], S))
    label S as discovered
    while St is not empty:
        v = St.pop()
        for all edges from v to w in G.adjacentEdges(v) do
            if w is not labeled as discovered:
                label w as discovered
                St.push(w)
                Q.push(make-pair(Cost[w], w))
                if Q.size() == N + 1:
                    Q.pop()
    let ret be an N-sized array
    while Q is not empty:
        ret.append(Q.top().second)
        Q.pop()
Let's describe the process first. Here, I run the iterative version of DFS to traverse the graph starting from S. During the traversal, I use a priority-queue to keep the N highest cost vertices that are reachable from S. Instead of the priority-queue, we can use a simple array (or we can even reuse the discovered array) to keep a record of the reachable vertices with their costs.
Analysis of space-complexity:
To store the graph: O(E)
Priority-queue: O(N)
Stack: O(V)
For labeling discovered: O(V)
So, as O(E) is the dominating term here, we can consider O(E) as the overall space complexity.
Analysis of time-complexity:
DFS-traversal: O(V+E)
To track N highest cost vertices:
By maintaining priority-queue: O(V*logN)
Or alternatively using array: O(V*logV)
The overall time-complexity would be: O(V*logN + E) or O(V*logV + E)
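The pseudocode above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the graph is a dict of adjacency lists, costs are stored in a dict, and the function name top_n_reachable is hypothetical.

```python
import heapq

def top_n_reachable(adj, cost, start, n):
    """Return the n highest-cost vertices reachable from start (start included),
    highest cost first.

    adj: dict mapping vertex -> iterable of neighbors (undirected graph).
    cost: dict mapping vertex -> numeric cost.
    """
    discovered = {start}
    stack = [start]
    heap = [(cost[start], start)]  # min-heap of (cost, vertex), at most n entries
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w not in discovered:
                discovered.add(w)
                stack.append(w)
                heapq.heappush(heap, (cost[w], w))
                if len(heap) > n:
                    heapq.heappop(heap)  # drop the cheapest candidate
    return [v for _, v in sorted(heap, reverse=True)]

# The example from the question: N = 2, S = v1
adj = {"v1": ["v2", "v3"], "v2": ["v1", "v4"], "v3": ["v1", "v4"], "v4": ["v2", "v3"]}
cost = {"v1": 1, "v2": 2, "v3": 3, "v4": 4}
result = top_n_reachable(adj, cost, "v1", 2)
```

Keeping the heap capped at N entries is what gives the O(V*logN) term in the analysis: each discovered vertex costs one push and at most one pop on a heap of size at most N + 1.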

Shortest path distance from source(s) to all nodes in the graph - O(m + n log(n)) time

Let G(V,E) be a directed weighted graph with edge lengths, where all edge lengths are positive except for two edges, which have negative lengths. Given a fixed vertex s, give an algorithm computing shortest paths from s to any other vertex in O(e + v log(v)) time.
My work:
I am thinking about using the reweighting technique of Johnson's algorithm: run the Bellman-Ford algorithm once and then apply Dijkstra v times. This gives a time complexity of O(v^2 log v + ve).
That is the standard all-pairs shortest path problem. As I only need one source vertex (s), my time complexity will be O(v log v + e), right?
For this kind of problem, changing the graph is often a lot easier than changing the algorithm. Let's call the two negative-weight edges N1 and N2; a path by definition cannot use the same edge more than once, so there are four kinds of path:
A. Those which use neither N1 nor N2,
B. Those which use N1 but not N2,
C. Those which use N2 but not N1,
D. Those which use both N1 and N2.
So we can construct a new graph with four copies of each node from the original graph, such that for each node u in the original graph, (u, A), (u, B), (u, C) and (u, D) are nodes in the new graph. The edges in the new graph are as follows:
For each positive weight edge u-v in the original graph, there are four copies of this edge in the new graph, (u, A)-(v, A) ... (u, D)-(v, D). Each edge in the new graph has the same weight as the corresponding edge in the original graph.
For the first negative-weight edge (N1), there are two copies of this edge in the new graph; one from layer A to layer B, and one from layer C to layer D. These new edges have weight 0.
For the second negative-weight edge (N2), there are two copies of this edge in the new graph; one from layer A to layer C, and one from layer B to layer D. These new edges have weight 0.
Now we can run any standard single-source shortest-path algorithm, e.g. Dijkstra's algorithm, just once on the new graph. The shortest path from the source to a node u in the original graph will be one of the following four paths in the new graph, whichever corresponds to a path of the lowest weight in the original graph:
(source, A) to (u, A) with the same weight.
(source, A) to (u, B) with the weight in the new graph plus the (negative) weight of N1.
(source, A) to (u, C) with the weight in the new graph plus the (negative) weight of N2.
(source, A) to (u, D) with the weight in the new graph plus the (negative) weights of N1 and N2.
Since the new graph has 4V vertices and 4(E - 2) + 4 = 4E - 4 edges, the worst-case performance of Dijkstra's algorithm is O((4E - 4) + 4V log 4V), which simplifies to O(E + V log V) as required.
To ensure that a shortest path in the new graph corresponds to a genuine path in the original graph, it remains to be proved that a path from e.g. (source, A) to (u, B) will not use two copies of the same edge from the original graph. That is quite easy to show, but I'll leave it to you as something to think about.
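The layered construction described above can be sketched in Python as follows. This is an illustration under stated assumptions: vertices are hashable and comparable (e.g. integers), the graph has no negative cycle, positive edges are given as (u, v, w) triples, and the two negative edges are passed separately. The function name and parameter layout are hypothetical. A binary heap is used for brevity, so this sketch runs in O(E log V) rather than the Fibonacci-heap bound.

```python
import heapq

def shortest_with_two_negative_edges(edges, n1, n2, source):
    """Single-source shortest distances when exactly two edges are negative.

    edges: list of (u, v, w) with w > 0 (the positive edges only).
    n1, n2: the two negative edges as (u, v, w) with w < 0.
    Returns a dict vertex -> shortest distance from source (reachable vertices only).
    """
    A, B, C, D = range(4)  # layers: neither used, N1 used, N2 used, both used
    adj = {}

    def add(u, lu, v, lv, w):
        adj.setdefault((u, lu), []).append(((v, lv), w))

    for u, v, w in edges:
        for layer in (A, B, C, D):
            add(u, layer, v, layer, w)
    # Negative edges become zero-weight edges that switch layers.
    add(n1[0], A, n1[1], B, 0)
    add(n1[0], C, n1[1], D, 0)
    add(n2[0], A, n2[1], C, 0)
    add(n2[0], B, n2[1], D, 0)

    # Plain Dijkstra on the layered graph, starting in layer A.
    dist = {(source, A): 0}
    heap = [(0, (source, A))]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, w in adj.get(node, ()):
            nd = d + w
            if nxt not in dist or nd < dist[nxt]:
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))

    # Add back the (negative) weights of the edges each layer has used.
    offset = {A: 0, B: n1[2], C: n2[2], D: n1[2] + n2[2]}
    best = {}
    for (u, layer), d in dist.items():
        total = d + offset[layer]
        if u not in best or total < best[u]:
            best[u] = total
    return best
```

For example, with positive edges 0->1 (4), 1->3 (5), 0->3 (10) and negative edges 1->2 (-3) and 2->3 (-1), the cheapest route to vertex 3 goes through both negative edges and has total weight 0.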

Find two paths in a graph that are in distance of at least D(constant)

Instance of the problem:
Undirected and unweighted graph G=(V,E).
Two source nodes a and b, two destination nodes c and d, and a constant D (a positive integer). (We can assume that lambda(c,d), lambda(a,b) > D, where lambda(x,y) is the length of the shortest path between x and y in G.)
We have two people standing on the nodes a and b.
Definition: scheduler set
A scheduler set is a set of orders such that in each step only one of the two people moves from their node v to one of v's neighbors, where they start at the nodes a, b and end at the nodes c, d. A scheduler set is "missing-disorders" if in each step the distance between the two people is > D.
I need to find an algorithm that decides whether a "missing-disorders scheduler set" exists.
Any suggestions?
One simple solution would be to first solve all-pairs shortest paths using n breadth-first searches from every node in O(n * (n + m)).
Then create the graph of valid node pairs (x,y) with lambda(x, y) > D, with edges indicating the possible moves. There is an edge {(v,w), (x,y)} if v = x and there is an edge {w, y} in the original graph or if w = y and there is an edge {v, x} in the original graph. This new graph has O(n^2) nodes and O(nm) edges.
Now you just need to check whether (c, d) is reachable from (a, b) in the new graph. This can be achieved using DFS or BFS.
The total runtime is O(n * (n + m)).
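The pair-graph search described above can be sketched in Python without materializing the new graph: states are pairs (x, y) and neighbors are generated on the fly. This is a rough sketch under some assumptions: adj is a dict of adjacency lists for an undirected graph, the first person must end at c and the second at d, and unreachable pairs are treated as infinitely far apart. The function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, start):
    """Unweighted shortest-path distances from start to every reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def schedule_exists(adj, a, b, c, d, D):
    """Decide whether (c, d) is reachable from (a, b) while the two people
    always stay more than D apart and move one at a time."""
    dist = {v: bfs_distances(adj, v) for v in adj}  # all-pairs via n BFS runs

    def far(x, y):
        # Assumption: vertices in different components count as infinitely far.
        return dist[x].get(y, float("inf")) > D

    if not far(a, b) or not far(c, d):
        return False
    seen = {(a, b)}
    queue = deque([(a, b)])
    while queue:
        x, y = queue.popleft()
        if (x, y) == (c, d):
            return True
        # Exactly one of the two people moves to an adjacent node.
        moves = [(nx, y) for nx in adj[x]] + [(x, ny) for ny in adj[y]]
        for state in moves:
            if state not in seen and far(*state):
                seen.add(state)
                queue.append(state)
    return False
```

On a 6-cycle with a = 0, b = 3, c = 1, d = 4, a schedule exists for D = 1 but not for D = 2, because any single move from opposite corners brings the two people to distance 2.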

Explanation of Prim's algorithm

I have to implement Prim's algorithm using a min-heap based priority queue. If my graph contained the vertices A, B, C, and D with the below undirected adjacency list... [it is sorted as (vertex name, weight to adjacent vertex)]
A -> B,4 -> D,3
B -> A,4 -> C,1 -> D,7
C -> B,1
D -> B,7 -> A,3
Rough Graph:
A-4-B-1-C
|   /
3  7
| /
D
What would the priority queue look like? I have no idea what I should put into it. Should I put everything? Should I put just A B C and D. I have no clue and I would really like an answer.
Prim's: grow the tree by adding the edge of min weight with exactly one end in the tree.
The PQ contains the edges with one end in the tree.
Start with vertex 0 added to the tree and add all edges incident to vertex 0 to the PQ.
DeleteMin() will give you the minimum-weight edge (v, w). If w is not yet in the tree, add the edge to the MST and add all edges incident to w to the PQ.
Is this enough to get you started?
---
So, in your example, in the first iteration, the MST will contain vertex A, and the PQ will contain the 2 edges going out from A:
A-4-B
A-3-D
Here's Prim's algorithm:
Choose a node.
Mark it as visited.
Place all edges from this node into a priority queue (sorted to give smallest weights first).
While queue not empty:
    pop edge from queue
    if both ends are visited, continue
    add this edge to your minimum spanning tree
    add all edges coming out of the end that hasn't been visited to the queue
    mark that node as visited
So to answer your question, you put the edges in from one node.
If you put all of the edges into the priority queue, you've got Kruskal's algorithm, which is also used for minimum spanning trees.
How you represent your graph determines the running time. With adjacency lists, Kruskal's is O(E log E) and Prim's is O(E log V), unless you use a Fibonacci heap for Prim's, in which case you can achieve O(E + V log V).
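The edge-based procedure above can be sketched in Python with heapq as the min-heap. This is a minimal illustration, assuming the graph is a dict mapping each vertex to a list of (neighbor, weight) pairs, as in the adjacency list from the question; the function name prim_mst is hypothetical.

```python
import heapq

def prim_mst(adj, start):
    """Return the MST edges as (u, v, weight), in the order they are added.

    adj: dict mapping vertex -> list of (neighbor, weight), undirected graph.
    """
    visited = {start}
    # The heap holds edges (weight, from, to) with 'from' already in the tree.
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)
    mst = []
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # both ends already in the tree, skip this edge
        visited.add(v)
        mst.append((u, v, w))
        for x, wx in adj[v]:
            if x not in visited:
                heapq.heappush(heap, (wx, v, x))
    return mst

# The graph from the question
adj = {
    "A": [("B", 4), ("D", 3)],
    "B": [("A", 4), ("C", 1), ("D", 7)],
    "C": [("B", 1)],
    "D": [("B", 7), ("A", 3)],
}
tree = prim_mst(adj, "A")
```

Starting from A, the heap first holds (4, A, B) and (3, A, D). The edge A-D is taken first, then A-B, then B-C, and the leftover edge D-B is discarded because both of its ends are already in the tree.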
You can assign weights to your vertices. Then use priority queue based on these weights. This is a reference from the wiki: http://en.wikipedia.org/wiki/Prim's_algorithm
MST-PRIM (G, w, r) {
    for each u ∈ G.V
        u.key = ∞
        u.parent = NIL
    r.key = 0
    Q = G.V
    while (Q ≠ ∅)
        u = Extract-Min(Q)
        for each v ∈ G.Adj[u]
            if (v ∈ Q) and w(u,v) < v.key
                v.parent = u
                v.key = w(u,v)
}
Q will be your priority queue. You can use struct to hold the information of the vertices.

Why is the time complexity of both DFS and BFS O( V + E )

The basic algorithm for BFS:
set start vertex to visited
load it into queue
while queue not empty
    remove a vertex from the queue
    for each edge incident to vertex
        if the neighbor is not visited
            load into queue
            mark vertex
So I would think the time complexity would be:
v1 + (incident edges) + v2 + (incident edges) + .... + vn + (incident edges)
where v is vertex 1 to n
Firstly, is what I've said correct? Secondly, how is this O(N + E)? Any intuition as to why would be really nice. Thanks.
Your sum
v1 + (incident edges) + v2 + (incident edges) + .... + vn + (incident edges)
can be rewritten as
(v1 + v2 + ... + vn) + [(incident_edges v1) + (incident_edges v2) + ... + (incident_edges vn)]
and the first group is O(N) while the other is O(E).
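To make the two groups of the sum concrete, here is a small Python sketch of BFS that counts them separately. It assumes an undirected graph stored as a dict of adjacency lists; the function name bfs_with_counts and the counter names are hypothetical.

```python
from collections import deque

def bfs_with_counts(adj, start):
    """Run BFS from start and return (vertex_steps, edge_steps).

    vertex_steps counts dequeued vertices (the first group of the sum);
    edge_steps counts adjacency-list entries scanned (the second group).
    """
    visited = {start}
    queue = deque([start])
    vertex_steps = 0
    edge_steps = 0
    while queue:
        v = queue.popleft()
        vertex_steps += 1
        for w in adj[v]:
            edge_steps += 1
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return vertex_steps, edge_steps

# 4 vertices, 4 undirected edges: A-B, A-D, B-C, B-D
adj = {"A": ["B", "D"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B", "A"]}
counts = bfs_with_counts(adj, "A")
```

On this connected graph the vertex counter ends at 4 (one per vertex) and the edge counter at 8, which is twice the number of edges because each undirected edge appears in two adjacency lists.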
DFS (analysis):
Setting/getting a vertex/edge label takes O(1) time
Each vertex is labeled twice
    once as UNEXPLORED
    once as VISITED
Each edge is labeled twice
    once as UNEXPLORED
    once as DISCOVERY or BACK
Method incidentEdges is called once for each vertex
DFS runs in O(n + m) time provided the graph is represented by the adjacency list structure
    Recall that Σv deg(v) = 2m
BFS (analysis):
Setting/getting a vertex/edge label takes O(1) time
Each vertex is labeled twice
    once as UNEXPLORED
    once as VISITED
Each edge is labeled twice
    once as UNEXPLORED
    once as DISCOVERY or CROSS
Each vertex is inserted once into a sequence Li
Method incidentEdges is called once for each vertex
BFS runs in O(n + m) time provided the graph is represented by the adjacency list structure
    Recall that Σv deg(v) = 2m
Very simplified without much formality: every edge is considered exactly twice, and every node is processed exactly once, so the complexity has to be a constant multiple of the number of edges as well as the number of vertices.
An intuitive explanation of this comes from analysing a single loop iteration:
visit a vertex -> O(1)
a for loop over all the incident edges -> O(e), where e is the number of edges incident on a given vertex v.
So the total time for a single iteration is O(1) + O(e). Now sum it over every vertex, as each vertex is visited once. This gives
For every V => O(1) + O(e) => O(V) + O(E)
Time complexity is O(E+V) instead of O(2E+V) because if the time complexity is n^2+2n+7 then it is written as O(n^2).
Hence, O(2E+V) is written as O(E+V)
because difference between n^2 and n matters but not between n and 2n.
I think every edge has been considered twice and every node has been visited once, so the total time complexity should be O(2E+V).
Short but simple explanation:
In the worst case you would need to visit every vertex and every edge, hence
the time complexity in the worst case is O(V+E).
In BFS, each neighboring vertex is inserted once into a queue. This is done by looking at the edges of the vertex. Each visited vertex is marked so it cannot be visited again: each vertex is visited exactly once, and all edges of each vertex are checked. So the complexity of BFS is O(V + E).
In DFS, each node maintains a list of all its adjacent edges; then, for each node, you need to discover all its neighbors by traversing its adjacency list just once in linear time. For a directed graph, the sum of the sizes of the adjacency lists of all the nodes is E (the total number of edges). So, the complexity of DFS is O(V + E).
It's O(V+E) because each visit to a vertex v of V must scan the edges incident to v, of which there are at most V-1. There are V such visits, which contributes O(V). Adding up the incident edges scanned over all V visits gives a total proportional to E, which contributes O(E). So the total time complexity is O(V + E).
