Dijkstra Time Complexity using Binary Heap - algorithm

Let G(V, E) be an undirected graph with positive edge weights. Dijkstra's single-source shortest path algorithm can be implemented using the binary heap data structure with time complexity:
1. O(|V|^2)
2. O(|E|+|V|log|V|)
3. O(|V|log|V|)
4. O((|E|+|V|)log|V|)
========================================================================
Correct answer is -
O((|E|+|V|)log|V|)
=========================================================================
My Approach is as follows -
O(V + V + V log V + E log V) = O(E log V)
O(V) to initialize.
O(V) to build the heap.
V log V to perform Extract_Min.
E log V to perform Decrease Key.
Now, as I get O(ElogV) and when I see options, a part of me says the
correct one is O(VlogV) because for a sparse Graph |V| = |E|, but as I
said the correct answer is O((|E|+|V|)log|V|). So, where am I going
wrong?

Well, you are correct that the complexity is actually O(E log V).
Since E can be up to (V^2 - V)/2, this is not the same as O(V log V).
If every vertex has an edge, then V <= 2E, so in that case, O(E log V) = O( (E+V) log V). That is the usual case, and corresponds to the "correct" answer.
But technically, O(E log V) is not the same as O((E+V) log V), because there may be a whole bunch of disconnected vertices in V. When that is the case, however, Dijkstra's algorithm will never see all those vertices, since it only finds vertices connected to the single source. So, when the difference between these two complexities matters, you are right and the "correct answer" is not.

Let me put it this way. The correct answer is O((E+V) log V). If not all of the other vertices are reachable from the source vertex, V log V could be more than E log V. But if we assume that every other vertex is reachable from the source, the graph has at least V-1 edges, so it is E log V. It comes down to reachability from the source vertex.
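For concreteness, here is a minimal Python sketch of Dijkstra with a binary heap, using the standard heapq module. Since heapq has no Decrease_Key operation, the sketch simply pushes a new entry on every relaxation and skips stale entries when popping; the heap then holds O(E) entries, which still gives the O((E+V) log V) bound. The dict-of-lists adjacency format is an assumption made for illustration.

import heapq

def dijkstra(adj, source):
    # adj: dict mapping vertex -> list of (neighbor, weight) pairs (assumed format)
    dist = {v: float('inf') for v in adj}
    dist[source] = 0
    heap = [(0, source)]                         # binary heap of (distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)               # Extract_Min: O(log V)
        if d > dist[u]:
            continue                             # stale entry; skipping it stands in for Decrease_Key
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))   # O(log V) per relaxed edge
    return dist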

Related

Time Complexity Analysis of BFS

I know that there are a ton of questions out there about the time complexity of BFS, which is O(V+E).
However, I still struggle to understand why the time complexity is O(V+E) and not O(V*E).
I know that O(V+E) stands for O(max[V,E]), and my only guess is that it has something to do with the density of the graph and not with the algorithm itself, unlike, say, Merge Sort, whose time complexity is always O(n log n).
Examples I've thought of are:
A directed graph with |E| = |V|-1, where the time complexity will be O(V).
A directed graph with |E| = |V|*(|V|-1), where the complexity would in fact be O(|E|) = O(|V|*|V|), as each vertex has an outgoing edge to every other vertex besides itself.
Am I in the right direction? Any insight would be really helpful.
Your "examples of thought" illustrate that the complexity is not O(V*E), but O(E). True, E can be a large number in comparison with V, but it doesn't matter when you say the complexity is O(E).
When the graph is connected, you can always say it is O(E). The reason to include V in the time complexity is to cover graphs that have many more vertices than edges (and thus are disconnected): the BFS algorithm will not only have to visit all edges, but also all vertices, including those that have no edges, just to detect that they have none. And so we must say O(V+E).
The complexity falls out easily if you walk through the algorithm. Let Q be the FIFO queue that initially contains the source node. BFS basically does the following:
while Q not empty
    pop u from Q
    for each adjacency v of u
        if v is not marked
            mark v
            push v into Q
Since each node is added once and removed once, the while loop runs O(V) times. Also, each time we pop u we perform |adj[u]| operations, where |adj[u]| is the number of adjacencies of u.
Therefore the total complexity is the sum of (1 + |adj[u]|) over all vertices, which is O(V+E), since the sum of the adjacencies is O(E) (2E for an undirected graph and E for a directed one).
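As a concrete version of that pseudocode, here is a short Python sketch; the dict-of-lists adjacency format is an assumption made for illustration.

from collections import deque

def bfs(adj, source):
    # adj: dict mapping vertex -> list of neighbors (assumed format)
    marked = {source}
    order = []
    q = deque([source])          # FIFO queue Q, initially holding the source
    while q:                     # each vertex enters and leaves Q exactly once: O(V)
        u = q.popleft()
        order.append(u)
        for v in adj[u]:         # over the whole run this scans every adjacency list: O(E)
            if v not in marked:
                marked.add(v)
                q.append(v)
    return order                 # vertices in BFS order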
Consider a situation where you have a graph shaped like a tree, maybe even with some extra edges forming cycles: you start the search from the root and your target is the last leaf. In this case you will traverse all the edges before you reach your destination.
E.g.
0 - 1
1 - 2
0 - 2
0 - 3
In this scenario you will check 4 edges before you actually find node #3.
It depends on how the adjacency list is implemented. A properly implemented adjacency list is a list/array of vertices with a list of related edges attached to each vertex entry.
The key is that the edge entries point directly to their corresponding entry in the vertex array/list; the algorithm never has to search through the vertex array/list for a matching entry, it can just look it up directly. This ensures that the total number of edge accesses is 2E and the total number of vertex accesses is V+2E, which makes the total time O(E+V).
In an improperly implemented adjacency list, the vertex array/list is not directly indexed, so going from an edge entry to a vertex entry requires searching through the vertex list, which is O(V); the total time then becomes O(E*V).
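To illustrate the properly implemented case, here is a small sketch, assuming the vertices are numbered 0..n-1 and the graph is undirected, as in the example above.

def build_adjacency_list(n, edges):
    # adj[u] stores the indices of u's neighbors directly, so following an edge
    # is an O(1) array lookup, never a scan of the vertex list
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)         # undirected: each edge appears in both endpoint lists
        adj[v].append(u)
    return adj

adj = build_adjacency_list(4, [(0, 1), (1, 2), (0, 2), (0, 3)])
# adj[0] == [1, 2, 3]: the edge (0, 3) leads straight to vertex 3's entry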

Minimum spanning tree to minimize cost

Can someone please help me solve this problem?
We have a set E of roads, a set H of highways, and a set V of different cities. We also have a cost x(i) associated with each road i and a cost y(i) associated with each highway i. We want to build the roads to connect the cities, with the conditions that there is always a path between any pair of cities and that we can build at most one highway, which may be cheaper than a road.
Set E and set H are different, and their respective costs are unrelated.
Design an algorithm to build the roads (with at most one highway) that minimize the total cost.
So, what we have is a fully connected graph of edges.
Solution steps:
1. Find the minimum spanning tree for the roads alone and take its cost as the current minimum.
2. Add one highway to the road graph and calculate the minimum spanning tree cost again.
3. Compare the cost from step 2 with the current minimum and replace the minimum if it is smaller.
4. Remove that highway.
5. Go back to step 2 and repeat the steps for each remaining highway.
Total cost: m * mst_cost(n), i.e. one MST computation per highway.
Using Prim's or Kruskal's to build an MST: O(E log V).
The problem is the constraint of at most 1 highway.
1. Naive method to solve this:
For each possible highway, build the MST from scratch.
Time complexity of this solution: O(H E log V)
2. Alternative
Idea: If you build an MST, you can refine it into a better MST when an additional edge becomes available that you have not considered before.
Suppose the new edge connects (u,v). If you use this edge, you can remove the most expensive edge in the path between vertices u and v in the MST. You can find the path naively in O(V) time.
Using this idea, the time complexity is the cost to build the initial MST O(E log V) and the time to try to refine the MST with each of the H highways. The total algorithmic complexity is therefore O(E log V + H V), which is better than the first solution.
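A sketch of that refinement step is below; it assumes the MST is given as a dict of (neighbor, weight) lists, and the function names are made up for illustration. It finds the heaviest edge on the u-v tree path with a DFS in O(V), then reports the total cost if the highway replaces that edge; the caller keeps the original MST cost whenever no highway improves on it.

def max_edge_on_path(mst_adj, u, v):
    # mst_adj: dict vertex -> list of (neighbor, weight) describing the MST (a tree);
    # returns the largest edge weight on the unique u-v path, via DFS in O(V)
    stack = [(u, None, 0)]       # (vertex, parent, heaviest edge seen so far)
    while stack:
        node, parent, heaviest = stack.pop()
        if node == v:
            return heaviest
        for nxt, w in mst_adj[node]:
            if nxt != parent:
                stack.append((nxt, node, max(heaviest, w)))
    raise ValueError("u and v are not connected in the tree")

def cost_with_highway(mst_adj, mst_cost, highway):
    # highway: (u, v, cost); total cost if this highway replaces the heaviest u-v path edge
    u, v, cost = highway
    return mst_cost - max_edge_on_path(mst_adj, u, v) + cost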
3. Optimized refinement
Instead of doing a naive path-searching method with the second method, we can find a faster way to do this. One related problem is LCA (lowest common ancestor). A good way of solving LCA is using jump pointers. First you root the tree, then each vertex will have jump pointers towards the root (1 step, 2 steps, 4 steps, etc.). Pre-processing costs O(V log V) time, and finding the LCA of 2 vertices is O(log V) in the worst case (although it is actually O(log(depth of tree)), which is usually better).
Once you have found the LCA, that implicitly gives you the path between vertices u and v. However, to find the most expensive edge to delete could be expensive since traversing the path is costly.
In 1-dimensional problems, the range-maximum-query (RMQ) can be employed. This uses a segment tree to solve the RMQ in O(log N) time.
Instead of a 1-dimensional space (like an array), we have a tree. However, we can apply the same idea, and build a segment tree-like structure. In fact, this is equivalent to bundling an extra piece of information with each jump pointer. To find the LCA, each vertex in the tree will have log(tree depth) jump pointers towards the root. We can bundle the maximum edge weight of the edges we jump over with the jump pointer. The cost of adding this information is the same as creating the jump pointer in the first place. Therefore, a slight refinement to the LCA algorithm allows us to find the maximum edge weight on the path between vertices u and v in O(log (depth)) time.
Finally, putting it together, the algorithmic complexity of this 3rd solution is O(E log V + H log V) or equivalently O((E+H) log V).
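For reference, a sketch of those max-weight jump pointers (binary lifting) follows. It assumes the MST vertices are numbered 0..n-1, edge weights are non-negative, and the tree is given as a dict of (neighbor, weight) lists; the function names are illustrative, not from any particular library.

def build_jump_pointers(tree_adj, root=0):
    # up[k][v] is v's 2^k-th ancestor; mx[k][v] is the max edge weight on that jump
    n = len(tree_adj)
    LOG = max(1, n.bit_length())
    up = [[root] * n for _ in range(LOG)]
    mx = [[0] * n for _ in range(LOG)]
    depth = [0] * n
    stack, seen = [(root, root, 0)], {root}      # iterative DFS to set parents and depths
    while stack:
        v, parent, w = stack.pop()
        up[0][v], mx[0][v] = parent, w
        for nxt, wt in tree_adj[v]:
            if nxt not in seen:
                seen.add(nxt)
                depth[nxt] = depth[v] + 1
                stack.append((nxt, v, wt))
    for k in range(1, LOG):                      # bundle the max edge weight with each jump pointer
        for v in range(n):
            up[k][v] = up[k - 1][up[k - 1][v]]
            mx[k][v] = max(mx[k - 1][v], mx[k - 1][up[k - 1][v]])
    return up, mx, depth

def query_max_edge(up, mx, depth, u, v):
    # max edge weight on the u-v path of the rooted tree, O(log V) per query
    best = 0
    if depth[u] < depth[v]:
        u, v = v, u
    diff = depth[u] - depth[v]
    for k in range(len(up)):                     # lift u up to v's depth
        if diff & (1 << k):
            best = max(best, mx[k][u])
            u = up[k][u]
    if u == v:
        return best
    for k in reversed(range(len(up))):           # lift both until just below the LCA
        if up[k][u] != up[k][v]:
            best = max(best, mx[k][u], mx[k][v])
            u, v = up[k][u], up[k][v]
    return max(best, mx[0][u], mx[0][v])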

How can a heap be used to optimize Prim's minimum spanning tree algorithm?

I have to solve a question that is something like this:
I am given a number N which represents the number of points I have. Each point has two coordinates: X and Y.
I can find the distance between two points with the following formula:
abs(x2-x1)+abs(y2-y1),
(x1,y1) being the coordinates of the first point, (x2,y2) the coordinates of the second point and abs() being the absolute value.
I have to find the minimum spanning tree, meaning I must have all my points connected with the sum of the edges being minimal. Prim's algorithm is good, but it is too slow. I read that I can make it faster using a heap but I didn't find any article that explains how to do that.
Can anyone explain to me how Prim's algorithm works with a heap (some sample code would be good, but not necessarily), please?
It is possible to solve this problem efficiently (in O(n log n) time), but it is not that easy. Just using Prim's algorithm with a heap does not help (it actually makes it even slower), because its time complexity is O(E log V), which is O(n^2 * log n) in this case.
However, you can use the Delaunay triangulation to reduce the number of edges in the graph. The Delaunay triangulation graph is planar, so it has a linear number of edges. That's why running Prim's algorithm with a heap on it gives O(n log n) time complexity (there are O(n) edges and n vertices). You can read more about it here (covering this algorithm in detail and proving its correctness would make my answer way too long): http://en.wikipedia.org/wiki/Euclidean_minimum_spanning_tree. Note that even though the article is about the Euclidean MST, the approach for your case is essentially the same (it is possible to build the Delaunay triangulation for Manhattan distance efficiently, too).
A description of Prim's algorithm with a heap itself is already present in two other answers to your question.
From the Wikipedia article on Prim's algorithm:
[S]toring vertices instead of edges can improve it still further. The heap should order the vertices by the smallest edge-weight that connects them to any vertex in the partially constructed minimum spanning tree (MST) (or infinity if no such edge exists). Every time a vertex v is chosen and added to the MST, a decrease-key operation is performed on all vertices w outside the partial MST such that v is connected to w, setting the key to the minimum of its previous value and the edge cost of (v,w).
While it was pointed out that Prim's with a heap is O(E log V), which is O(n^2 log n) in the worst case, I can provide what makes the heap faster in cases other than that worst case, since that has still not been answered.
What makes Prim's so costly at O(V^2) is the updating needed in each iteration of the algorithm. In general, Prim's works by keeping a table of your vertices with the cheapest known cost of connecting each one to the growing tree, and picking the cheapest vertex to add to that tree until all are added. Every time you add a vertex, you must go back to the table and update any vertices that can now be reached with less weight. You must then walk back through the whole table to decide which vertex is cheapest to add. This setup - having to pick the next vertex (O(V)) V times - gives the O(V^2).
The heap is able to improve this running time in all cases besides the worst case because it fixes this bottleneck. By working with a minimum heap, you can access the minimum weight under consideration in O(1). Additionally, it costs O(log V) to restore the heap property after adding an element, which is done E times, for O(E log V) to maintain the heap for Prim's. This becomes the new bottleneck, which is what gives rise to the final running time of O(E log V).
So, depending on how much you know about your data, Prim's with a heap can certainly be more efficient than without!
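For illustration, here is a minimal Python sketch of Prim's with a binary heap (heapq). Because heapq has no decrease-key operation, this lazy variant pushes a new entry whenever it finds a cheaper connection and discards stale entries on pop; the asymptotic bound is still O(E log V). The dict-of-lists adjacency format is an assumption.

import heapq

def prim_mst_cost(adj, start=0):
    # adj: dict vertex -> list of (neighbor, weight); returns the MST cost of the
    # connected component containing start
    in_tree = set()
    total = 0
    heap = [(0, start)]                          # (cheapest known connecting weight, vertex)
    while heap:
        w, u = heapq.heappop(heap)               # pick the cheapest vertex to attach next
        if u in in_tree:
            continue                             # stale entry, already attached
        in_tree.add(u)
        total += w
        for v, wt in adj[u]:
            if v not in in_tree:
                heapq.heappush(heap, (wt, v))    # lazy "decrease-key": just push again
    return total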

Graph In-degree Calculation from Adjacency-list

I came across this question in which it was required to calculate the in-degree of each node of a graph from its adjacency-list representation:
for each u
    for each Adj[i] where i != u
        if (i, u) ∈ E
            in-degree[u] += 1
Now, according to me, its time complexity should be O(|V||E| + |V|^2), but the solution I referred to instead described it as O(|V||E|).
Please help and tell me which one is correct.
Rather than O(|V||E|), the complexity of computing indegrees is O(|E|). Let us consider the following pseudocode for computing indegrees of each node:
for each u
    indegree[u] = 0;

for each u
    for each v ∈ Adj[u]
        indegree[v]++;
The first loop has linear complexity O(|V|). For the second part: for each u, the inner loop executes at most |E| times, while the outer loop executes |V| times, so the second part appears to have complexity O(|V||E|). In fact, the code executes one increment per edge, so a more accurate bound is O(|E|).
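The same counting argument in runnable form, as a small sketch (assuming the adjacency list is a dict mapping each vertex to a list of its successors):

def in_degrees(adj):
    indegree = {u: 0 for u in adj}    # O(|V|) initialization
    for u in adj:
        for v in adj[u]:              # each directed edge (u, v) is visited exactly once
            indegree[v] += 1
    return indegree

print(in_degrees({'a': ['b', 'c'], 'b': ['c'], 'c': []}))   # {'a': 0, 'b': 1, 'c': 2}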
According to http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)Graphs.html, Section 4.2, with an adjacency list representation,
Finding predecessors of a node u is extremely expensive, requiring looking through every list of every node in time O(n+m), where m is the total number of edges.
So, in the notation used here, the time complexity of computing the in-degree of a node is O(|V| + |E|).
This can be reduced at the cost of using extra space, however. The Wiki also states that
adding a second copy of the graph with reversed edges lets us find all predecessors of u in O(d-(u)) time, where d-(u) is u's in-degree.
An example of a package which implements this approach is the Python package Networkx. As you can see from the constructor of the DiGraph object for directional graphs, networkx keeps track of both self._succ and self._pred, which are dictionaries representing the successors and predecessors of each node, respectively. This allows it to compute each node's in_degree efficiently.
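Roughly, the usage looks like this (a sketch; the exact return type of in_degree varies a little between networkx versions):

import networkx as nx

G = nx.DiGraph()
G.add_edges_from([('a', 'b'), ('a', 'c'), ('b', 'c')])
print(G.in_degree('c'))        # 2, read off the stored predecessor mapping
print(dict(G.in_degree()))     # {'a': 0, 'b': 1, 'c': 2}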
O(|V|+|E|) is the correct answer: you visit each vertex, which is O(|V|), and each visit scans only that vertex's share of the edges, so O(|E|) in total. Also, usually |E| >> |V|, so O(|E|) is also correct.

Even length path algorithm

I was asked for my homework to write an efficient algorithm that finds all the vertices in a directed graph to which there is a path of even length from a given vertex.
This is what I thought of:
(It's very similar to the "Visit" procedure of DFS.)
Visit(vertex u)
    color[u] <- gray
    for each v ∈ adj[u]
        for each w ∈ adj[v]
            if color[w] = white then
                print w
                Visit(w)
I think it works, but I'm having a hard time calculating its efficiency, especially when the graph has cycles. Could you help me?
If I may suggest an alternative: I would reduce the problem and use plain DFS instead of modifying DFS.
Given a graph G = (V,E), create a graph G' = (V,E') where E' = {(u,v) | there is a w in V such that (u,w) and (w,v) are in E}.
In other words, we are creating a graph G' which has an edge (u,v) if and only if there is a path of length 2 from u to v in G.
Given that graph, we can derive the following algorithm [high level pseudo-code]:
Create G' from G
run DFS on G' from the source s, and mark the same nodes that this DFS marks.
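In code, the reduction might look something like this; a rough sketch, assuming the directed graph is a dict mapping each vertex to a list of its successors and that every vertex appears as a key.

def squared_graph(adj):
    # G': edge (u, v) whenever G has a path u -> w -> v of length exactly 2
    adj2 = {u: set() for u in adj}
    for u in adj:
        for w in adj[u]:
            for v in adj[w]:
                adj2[u].add(v)
    return adj2

def reachable_by_even_path(adj, s):
    # plain DFS on G'; returns the vertices reachable from s by an even-length path
    adj2 = squared_graph(adj)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in adj2[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen                  # includes s itself (path of length 0)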
Correctness and time complexity analysis of the solution:
Complexity:
The complexity is O(min{|V|^2, |E|^2} + |V|), because of part 1: there are at most min{|E|^2, |V|^2} edges in G', so the DFS in step 2 runs in O(|E'| + |V|) = O(min{|V|^2, |E|^2} + |V|).
Correctness:
If the algorithm found that there is a path from v0 to vk, then from the correctness of DFS - there is a path v0->v1->...->vk on G', so there is a path v0->v0'->v1->v1'->...->vk of even length on G.
If there is a path of even length on G from v0 to vk, let it be v0->v1->...->vk. then v0->v2->...->vk is a path on G', and will be found by DFS - from the correctness of DFS.
As a side note:
Reducing problems instead of modifying algorithms is usually less vulnerable to bugs and easier to analyze and prove correct, so you should usually prefer reductions over modified algorithms when possible.
EDIT: regarding your solution: analysing it shows the two are pretty much identical, except that I generate E' as pre-processing and you generate it on the fly, in each iteration.
Since your solution generates the edges on the fly, it might do some work more than once. However, it does at most |V| times more work, since each vertex is visited at most once.
Assuming |E| = O(|V|^2) for simplicity, this gives a total upper bound of O(|V|^3) on the run time of your solution.
It is also a lower bound: look at the example of a clique. During each visit() of any node, the algorithm does O(|V|^2) work to generate all possibilities and then visit()s one of them; since we visit exactly |V| nodes, we get a total run time of Omega(|V|^3).
Since the solution is both O(|V|^3) and Omega(|V|^3), it is Theta(|V|^3) in total.
For a graph G(V,E) we should rebuild it as a bipartite graph G'(V',E'), where:
V1 = {v1 : v ∈ V}
V2 = {v2 : v ∈ V}
V' = V1 ∪ V2
E' = {(u1,v2) : (u,v) ∈ E} ∪ {(u2,v1) : (u,v) ∈ E}
For example, the original graph (figure omitted) becomes the corresponding bipartite graph (figure omitted).
On this bipartite graph we should run the BFS algorithm: BFS(G', s1).
After running BFS(G', s1) we should return the array d that contains, for each vertex u, the length δ(s1, u1) of the shortest even path from s to u.
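Equivalently, instead of materializing G', the BFS can be run over (vertex, parity) states, which is the same bipartite construction in disguise. A sketch, assuming the graph is a dict mapping each vertex to its successor list:

from collections import deque

def even_path_distances(adj, s):
    # dist[(u, p)]: length of the shortest path from s to u whose length has parity p
    INF = float('inf')
    dist = {(u, p): INF for u in adj for p in (0, 1)}
    dist[(s, 0)] = 0
    q = deque([(s, 0)])
    while q:
        u, p = q.popleft()
        for v in adj[u]:
            state = (v, 1 - p)               # taking one edge flips the parity
            if dist[state] == INF:
                dist[state] = dist[(u, p)] + 1
                q.append(state)
    return {u: dist[(u, 0)] for u in adj}    # shortest even-length path to each vertex (inf if none)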
