Runtime of the single-source shortest paths algorithm for directed acyclic graphs

Here is the algorithm:

Topologically sort the vertices of G
Initialize-Single-Source(G, s)
for each vertex u, taken in topologically sorted order
    for each vertex v in G.Adjacent[u]
        Relax(u, v, w)
Topological sort has runtime O(V + E), where V is the number of vertices and E is the number of edges.
Initialize-Single-Source(G, s) has runtime O(V).
The main question is about the double for loop: its running time is O(V + E). But I cannot understand why it is not O(V*E). For every vertex we seem to go through edges, and normally a nested loop (two for loops together) has complexity O(N^2), but in this case that's not true.

For each vertex u, you only iterate over the edges that go out of u. Each distinct edge is visited exactly once, and that's why the algorithm takes O(V + E) time.
This assumes you are using a graph representation (like adjacency lists, not a matrix) that gives quick access to every vertex's outgoing edges. The topological sort also requires this.
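A minimal runnable sketch of the algorithm above, assuming the graph is stored as adjacency lists mapping each vertex to (neighbour, weight) pairs (the function names `topo_sort` and `dag_shortest_paths` are illustrative, not from the original text):

```python
def topo_sort(graph, n):
    """Topological sort via DFS reverse postorder; O(V + E)."""
    visited = [False] * n
    order = []

    def dfs(u):
        visited[u] = True
        for v, _w in graph[u]:
            if not visited[v]:
                dfs(v)
        order.append(u)

    for u in range(n):
        if not visited[u]:
            dfs(u)
    return order[::-1]

def dag_shortest_paths(graph, n, s):
    """Single-source shortest paths in a DAG; O(V + E) overall."""
    INF = float('inf')
    dist = [INF] * n                      # Initialize-Single-Source: O(V)
    dist[s] = 0
    for u in topo_sort(graph, n):         # outer loop: each vertex once
        if dist[u] == INF:
            continue                      # u unreachable from s
        for v, w in graph[u]:             # inner loop: each edge once in total
            if dist[u] + w < dist[v]:     # Relax(u, v, w)
                dist[v] = dist[u] + w
    return dist
```

Note that across the whole run, the inner loop body executes once per edge, not once per (vertex, edge) pair, which is exactly the O(V + E) bound.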

Related

Time Complexity Analysis of BFS

I know that there are a ton of questions out there about the time complexity of BFS, which is O(V+E).
However, I still struggle to understand why the time complexity is O(V+E) and not O(V*E).
I know that O(V+E) stands for O(max[V, E]), and my only guess is that it has something to do with the density of the graph and not with the algorithm itself, unlike, say, merge sort, whose time complexity is always O(n log n).
Examples I've thought of are:
A directed graph with |E| = |V|-1, where the time complexity will be O(V)
A directed graph with |E| = |V|*(|V|-1), where the complexity would in fact be O(|E|) = O(|V|*|V|), as each vertex has an outgoing edge to every other vertex besides itself
Am I in the right direction? Any insight would be really helpful.
Your examples illustrate that the complexity is not O(V*E) but O(E). True, E can be large in comparison with V, but it doesn't matter when you say the complexity is O(E).
When the graph is connected, you can always say it is O(E). The reason to include V in the time complexity is to cover graphs that have many more vertices than edges (and thus are disconnected): BFS will not only have to visit all edges, but also all vertices, including those that have no edges, just to detect that they have none. And so we must say O(V+E).
The complexity falls out easily if you walk through the algorithm. Let Q be the FIFO queue, initially containing the source node. BFS basically does the following:
while Q not empty
    pop u from Q
    for each neighbour v of u
        if v is not marked
            mark v
            push v into Q
Since each node is added once and removed once, the while loop runs O(V) times. Also, each time we pop u we perform |adj[u]| operations, where |adj[u]| is the number of neighbours of u.
Therefore the total complexity is the sum of (1 + |adj[u]|) over all vertices, which is O(V + E), since the sum of the |adj[u]| terms is O(E) (2E for an undirected graph and E for a directed one).
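To make the counting concrete, here is a small instrumented BFS (a sketch, assuming the graph is an adjacency-list dict; the counter names are illustrative). For a connected undirected graph it performs exactly V dequeues and 2E edge examinations:

```python
from collections import deque

def bfs_with_counts(adj, source):
    """BFS that counts vertex dequeues and edge examinations."""
    marked = {source}
    q = deque([source])
    pops = edge_checks = 0
    while q:
        u = q.popleft()
        pops += 1                 # each vertex is dequeued exactly once
        for v in adj[u]:
            edge_checks += 1      # each undirected edge is examined twice in total
            if v not in marked:
                marked.add(v)
                q.append(v)
    return pops, edge_checks
```

On a triangle 0-1-2 with a pendant edge 2-3 (V = 4 vertices, E = 4 edges), the counters come out to 4 pops and 8 = 2E edge checks, matching the sum-of-degrees argument above.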
Consider a situation where you have a tree-like graph, maybe even with cycles; you start the search from the root and your target is the last leaf. In this case you will traverse all the edges before you reach your destination.
E.g.
0 - 1
1 - 2
0 - 2
0 - 3
In this scenario you will check all 4 edges before you actually find node #3.
It depends on how the adjacency list is implemented. A properly implemented adjacency list is a list/array of vertices with a list of related edges attached to each vertex entry.
The key is that the edge entries point directly to their corresponding vertex array/list entry; they never have to search through the vertex array/list for a matching entry, they can just look it up directly. This ensures that the total number of edge accesses is 2E and the total number of vertex accesses is V + 2E, which makes the total time O(V + E).
In an improperly implemented adjacency list, the vertex array/list is not directly indexed, so going from an edge entry to a vertex entry requires searching through the vertex list, which is O(V). That means the total time is O(E*V).
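A minimal sketch of the "properly implemented" representation, assuming vertices are identified by integer indices so that following an edge is an O(1) array lookup (the helper name `build_adjacency` is illustrative):

```python
def build_adjacency(n, edges):
    """Array-of-lists adjacency structure for an undirected graph.

    adj[u] holds the integer ids of u's neighbours, so going from an
    edge entry to a vertex entry is an O(1) index, never an O(V) search.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)   # each undirected edge produces
        adj[v].append(u)   # two entries: 2E in total
    return adj
```

Iterating over every list touches each edge entry once, 2E entries in total, which together with the V vertex slots gives the O(V + E) traversal bound.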

How to find longest increasing subsequence among all simple paths of an unweighted general graph?

Let G = (V, E) be an unweighted general graph in which every vertex v has a weight w(v).
An increasing subsequence of a simple path p in G is a sequence of vertices of p in which the weights of all vertices along the sequence increase. The simple paths can be closed paths.
A longest increasing subsequence (LIS) of a simple path p is an increasing subsequence of p that has the maximum number of vertices.
The question is: how do we find a longest increasing subsequence among all simple paths of G?
Note that the graph is undirected, therefore it is not a directed acyclic graph (DAG).
Here's a very fast algorithm for solving this problem. The longest increasing subsequence in the graph is a subsequence of a path in the graph, and each path must belong purely to a single connected component. So if we can solve this problem on connected components, we can solve it for the overall graph by finding the best solution across all connected components.
Next, think about the case where you're solving this problem for a connected graph G. In that case, the longest increasing subsequence you could find would be formed by sorting the nodes by their weight, then traversing from the lowest-weight node to the second, then to the third, then to the fourth, etc. If there are any ties or duplicates, you can just skip them. In other words, you can solve this problem by
Sorting all the nodes by weight,
Discarding all but one node of each weight, and
Forming an LIS by visiting each node in sequence.
This leads to a very fast algorithm for the overall problem. In time O(m + n), find all connected components. For each connected component, use the preceding algorithm in time O(Sort(n)), where Sort(n) is the time required to sort n elements (which could be Θ(n log n) if you use heapsort, Θ(n + U) for bucket sort, Θ(n lg U) for radix sort, etc.). Then, return the longest sequence you find.
Overall, the runtime is O(m + n + Sort(n)), which beats my previous approach and should be a lot easier to code up.
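A sketch of the component-plus-sort procedure just described, assuming the answer's claim that within a connected component the distinct weights can all be realised in increasing order along some simple path (the union-find helper is an illustrative implementation choice, not from the original text):

```python
def lis_over_all_paths(n, edges, weight):
    """Per the argument above: for each connected component, the answer
    is the number of distinct vertex weights; return the maximum."""
    # Connected components via union-find with path halving
    # (O(m + n) up to near-constant factors).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    # Collect distinct weights per component; deduplicating via sets
    # plays the role of the Sort(n) term.
    distinct = {}
    for v in range(n):
        distinct.setdefault(find(v), set()).add(weight[v])
    return max(len(s) for s in distinct.values())
```

For example, with two components {0, 1, 2} (weights 3, 1, 2) and {3, 4} (weights 5, 5), the first component contributes 3 distinct weights and wins.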
I had originally posted this answer, which I'll leave up because I think it's interesting:
Imagine that you pick a simple path out of the graph G and look at the longest increasing subsequence of that path. Although the path walks all over the graph and might have lots of intermediary nodes, the longest increasing subsequence of that path really only cares about
the first node on the path that's also a part of the LIS, and
from that point, the next-largest value in the path.
As a result, we can think about forming an LIS like this. Start at any node in the graph. Now, travel to any node in the graph that (1) has a higher value than the current node and (2) is reachable from the current node, then repeat this process as many times as desired. The goal is to do so in a way that gives the longest possible sequence of increasing values.
We can model this process as finding a longest path in a DAG. Each node in the DAG represents a node in the original graph G, and there's an edge from a node u to a node v if
there's a path from u to v in G, and
w(u) < w(v).
This is a DAG because of that second condition, even though the original graph isn't a DAG.
So we can solve this overall problem in a two-step process. First, build the DAG described above. To do so:
Find the connected components of the original graph G and label each node with its connected component number. Time: O(m + n).
For each node u in G, construct a corresponding node u' in a new DAG D. Time: O(n).
For each node u in G, and for each node v in G that's in the same connected component as u, if w(u) < w(v), add an edge from u' to v'. Time: Θ(n^2) in the worst case, Θ(n) in the best case.
Find the longest path in D. This path corresponds to the longest increasing subsequence of any simple path in G. Time: O(m + n).
Overall runtime: Θ(n^2) in the worst case, Θ(m + n) in the best case.

tree graph - find how many pairs of vertices, for which the sum of edges on a path between them is C

I've got a weighted tree graph, where all the weights are positive. I need an algorithm to solve the following problem.
How many pairs of vertices are there in this graph, for which the sum of the weights of edges between them equals C?
I thought of a solution that's O(n^2):
For each vertex we start a DFS from it and stop when the sum gets bigger than C. Since the number of edges is n-1, that obviously gives us an O(n^2) solution.
But can we do better?
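The O(n^2) idea from the question can be sketched as follows, assuming the tree is given as adjacency lists of (neighbour, weight) pairs; the positive weights are what make the pruning valid (the function name is illustrative):

```python
def count_pairs_with_path_sum(adj, C):
    """Count unordered vertex pairs whose tree-path weight sum equals C.

    One DFS from every vertex; since all weights are positive we can
    stop extending a path once its sum exceeds C. O(n^2) on a tree.
    """
    count = 0
    for s in adj:
        stack = [(s, None, 0)]
        while stack:
            u, parent, dist = stack.pop()
            if dist == C:
                count += 1
            for v, w in adj[u]:
                if v != parent and dist + w <= C:  # prune: weights positive
                    stack.append((v, u, dist + w))
    return count // 2  # each pair is found from both endpoints
```

On a star with centre 0, leaves 1..3, and unit weights, the pairs at total weight 2 are exactly the three pairs of leaves.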
For an undirected graph, in terms of theoretical asymptotic complexity: no, you cannot do better, since the number of such pairs can itself be Θ(n^2).
As an example, take a 'sun/flower' graph:
G = (V ∪ {x}, E)
E = { (x, v) | v ∈ V }
w(e) = 1 for all edges
It is easy to see that the graph is indeed a tree.
However, the number of unordered pairs at distance exactly 2 is (n-1)(n-2)/2, which is in Omega(n^2), and thus any algorithm that finds all of them will be Omega(n^2) in this case.
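A quick sanity check of that bound, counting unordered pairs in a small star (the distance function below simply encodes the star's structure: centre 0, leaves 1..n-1, unit weights; it is an illustrative helper, not part of the answer above):

```python
from itertools import combinations

def star_pairs_at_distance(n, target):
    """Count unordered vertex pairs at the given path weight in the
    star tree K_{1,n-1}: centre 0, leaves 1..n-1, unit edge weights."""
    def dist(u, v):
        # centre-to-leaf paths have weight 1; leaf-to-leaf paths go
        # through the centre and have weight 2
        return 1 if 0 in (u, v) else 2
    return sum(1 for u, v in combinations(range(n), 2)
               if dist(u, v) == target)
```

With n = 5 this gives 6 = (n-1)(n-2)/2 pairs at distance 2, so merely listing them is already quadratic in n.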

Why is the complexity of BFS O(V+E) instead of O(V*E)?

Some pseudocode here (disregard my style)
Starting from v1 (enqueued):

function BFS(queue Q)
    v2 = dequeue Q
    enqueue all unvisited connected nodes of v2 into Q
    BFS(Q)
end // maybe minor problems here
Since there are V vertices in the graph, these V vertices are connected by E edges, and visiting the connected nodes (equivalent to visiting the connected edges) happens in the inner loop (the outer loop being the recursion itself), it seems to me that the complexity should be O(V*E) rather than O(V+E). Can anyone explain this for me?
E is not the number of edges adjacent to each vertex - it's actually the total number of edges in the graph. Defining it this way is useful because you don't necessarily have the same number of edges on every single vertex.
Since each edge gets visited once by the time the BFS ends, you get O(E) complexity from that part. Then you add the O(V) for visiting each vertex once and get O(V + E) in total.

Is the worst time complexity of BFS in a graph traversal n+2E?

I understand that the time complexity of BFS in a graph traversal is O(V + E), since every vertex and every edge will be explored in the worst case.
Well, is the exact time complexity V + 2E?
Every vertex is explored once, plus every adjacent vertex.
The sum of the degrees of all the vertices in a graph = number of edges * 2 = 2E.
Thus the time complexity is V + 2E. Am I correct?
For a random graph, the time complexity is O(V+E): Breadth-first search.
As stated in the link, depending on the topology of your graph, O(E) may vary from O(V) (if your graph is acyclic) to O(V^2) (if all vertices are connected with each other).
Therefore the time complexity varies from O(V + V) = O(V) to O(V + V^2) = O(V^2) according to the topology of your graph.
Besides, if no vertex is isolated then |V| <= 2|E|, so O(V + E) = O(3E) = O(E) is also correct, but the bound is looser.
Assumptions
Let's assume that G is connected and undirected. If it's not connected, then you can apply the below idea to every connected component of G independently. In addition, let's assume that G is represented as an adjacency lists and for every vertex v, we can decide if v was visited in O(1) time for example using a lookup table.
Analyze
If you want to count the exact number of steps in the BFS you can observe that:
Since G is connected, BFS will visit every vertex exactly once, so we count |V| visits to nodes. Notice that during one visit you may perform more operations than just marking the current vertex visited, not counting the loop over its edges.
For every vertex v we want to count how many edges the BFS examines at that vertex.
You have to loop over all edges of v to perform the BFS; if you skip an edge, it's easy to show that BFS is no longer correct. So every edge is examined twice, once from each endpoint.
One question may arise here: is there any need to examine the edge (p, v) at vertex v, where p is the parent of v in the already constructed BFS tree, i.e. we came to v directly from p? Of course you don't have to consider this edge, but deciding to skip it also costs at least one additional operation:
for (v, u) in v.edges:
    if u == p:  # p is the parent of v in the already constructed tree
        continue
    if not visited[u]:
        BFS(u, parent=v)
It examines the same number of edges as the code below, but has higher complexity, because for all but one edge we run two if-statements rather than one.
for (v, u) in v.edges:
    if not visited[u]:  # if p is the parent of v, then p is already visited
        BFS(u, parent=v)
Conclusion
You may even devise a different method to skip the edge (v, p), but it always takes at least one operation, so it's wasted effort.
