What is the time complexity of Dijkstra's algorithm using an adjacency list and priority queue? - algorithm

Let's say I have code like this:
enqueue source vertex
while(queue is not empty){
dequeue min vertex
add to shortest path set
iterate over vertex edges
if not in shortest path set and new distance is smaller, enqueue
}
What is the time complexity if the while loop runs for all edges in the graph instead of only running V times, once per vertex? Is it still O(E log V), since it's O(E+E) * O(log V)?

Yes, this is pretty much how you implement Dijkstra's algorithm when your priority queue doesn't support a DECREASE_KEY operation. The priority queue contains (cost,vertex) records, and whenever you find a vertex cost that is lower than the previous one, you just insert a new record.
The complexity becomes O(E log E). For a simple graph E <= V^2, so log E <= 2 log V, which means O(E log E) is still O(E log V).
This is the same complexity that Dijkstra's algorithm has when you use a binary heap that does support the DECREASE_KEY operation, because DECREASE_KEY takes O(log V) time. To get down to O(E + V log V), you need to use a Fibonacci heap that can do DECREASE_KEY in constant time.
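This lazy-insertion variant might be sketched in Python with heapq (which has no DECREASE_KEY); the dict-of-lists adjacency format and the function name are my own choices, not from the question:

```python
import heapq

def dijkstra(adj, source):
    """Lazy Dijkstra: push a new (cost, vertex) record instead of decrease-key.
    adj maps each vertex to a list of (neighbor, weight) pairs."""
    dist = {source: 0}
    done = set()                      # the "shortest path set"
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:                 # stale record: a cheaper one was popped earlier
            continue
        done.add(u)
        for v, w in adj.get(u, []):
            nd = d + w
            if v not in done and nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))   # duplicate entries; O(E) pushes total
    return dist
```

The `if u in done: continue` line is what makes duplicates harmless: the first time a vertex is popped, its distance is final, and later stale records are simply skipped.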

Related

Time Complexity of BFS and DFS for both Matrix and Adjacency List

(The question included images of the BFS and DFS pseudocode, omitted here.)
By my calculation, I think the time complexity of both will be O(n), but I'm also unsure whether it might be O(V+E), where V stands for vertices and E stands for edges. Can anyone give a detailed time complexity analysis of both pseudocodes?
In short: what is the time complexity of BFS and DFS on both the adjacency matrix and the adjacency list?
Let us analyze the time complexity of BFS first for adjacency list implementation.
For breadth-first search, that's what we do:
Start from a node and mark it as visited. Then mark all of that node's neighbors as visited and add them to a queue. Then fetch the next node from the queue and repeat until the queue is empty. If the queue is empty but there are still unvisited nodes, call the BFS function again for one of those nodes.
When we are at a node, we check each of its neighbors to fill up the queue. If a neighbor is already visited (visited[neighbor] == 1), we do not add it to the queue. A neighbor of a node is another node connected to it by an edge, so checking all neighbors of all nodes means checking all the edges; this contributes O(E). Also, since we add each node to the queue (and pop it later) exactly once, that contributes O(V).
So which one should we take?
Well, we take both, which is asymptotically the same as the maximum of E and V. That's why we say O(V+E). If one of them is larger than the other, the smaller one is absorbed as a lower-order term.
For example, a complete graph with N nodes has N*(N-1)/2 edges. At each node we check all N-1 neighbors, which makes N*(N-1) checks in total. Therefore the time complexity is max(N, N*(N-1)) = O(N^2).
On the other hand, for a sparse graph with N nodes and, say, sqrt(N) edges, the time complexity of BFS is O(N).
The same logic applies to DFS: you visit each node and check each edge as you dive into the depths of the graph, and again that makes it O(V+E).
As for your assumption, it is partially correct. However, as explained above, we cannot say the time complexity will always be O(n). (I assume n is the number of vertices; you didn't specify that in your question.)
Notice that these are for the adjacency list implementation.
For the adjacency matrix implementation, to find the neighbors of a node we have to scan that node's entire row, which takes O(V). We do this for every vertex, so the total is O(V^2).
So for the matrix implementation the time complexity does not depend on the number of edges. Since in most cases V + E is much smaller than V^2, prefer the adjacency list implementation.
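As a rough illustration of the adjacency-list analysis above, a BFS might look like this in Python (the dict-of-lists graph format is an assumption, not from the question):

```python
from collections import deque

def bfs(adj, start):
    """BFS on an adjacency list: each vertex is enqueued once (O(V)) and
    each edge is examined a constant number of times (O(E)), giving O(V + E)."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj.get(u, []):
            if v not in visited:      # skip already-visited neighbors
                visited.add(v)
                queue.append(v)
    return order
```

The inner loop runs once per edge endpoint over the whole traversal, which is exactly where the O(E) term comes from.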

Why is the time complexity of Dijkstra O((V + E) logV)

I was reading about worst case time complexity for the Dijkstra algorithm using binary heap (the graph being represented as adjacency list).
According to Wikipedia (https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm#Running_time) and various Stack Overflow questions, this is O((V + E) logV) where E - number of edges, V - number of vertices. However I found no explanation as to why it can't be done in O(V + E logV).
With a self-balancing binary search tree or binary heap, the algorithm requires Θ((E+V) logV) time in the worst case
In case E >= V, the complexity reduces to O(E logV) anyway. Otherwise, we have O(E) vertices in the same connected component as the start vertex (and the algorithm ends once we get to them). On each iteration we obtain one of these vertices, taking O(logV) time to remove it from the heap.
Each update of the distance to a connected vertex takes O(logV) and the number of these updates is bound by the number of edges E so in total we do O(E) such updates. Adding O(V) time for initializing distances, we get final complexity O(V + E logV).
Where am I wrong?

shortest path between 2 vertices in undirected weighted graph

I am trying to find the shortest path between 2 vertices in an undirected weighted graph. It is also known that the weights are integers less than log(log|V|), where |V| is the number of vertices. It is easy to solve using the Bellman-Ford or Dijkstra algorithms, but is there any algorithm which can do it faster?
So far, I have been thinking of using BFS and splitting each edge of weight greater than 1 into several edges of weight 1, but that is not a good idea if |V| is large. No, it is not my homework; I am just wondering.
One way to think of this question is to improve the running time of using Dijkstra's algorithm to find the shortest path between two vertices in an undirected weighted graph. In this case, you can use a binary heap as the data structure. A heap is a complete binary tree with the heap property: in a min-heap (max-heap), every parent node is no larger (no smaller) than its children. Here you can use a min-heap to store the cost to each node from the starting node.
More information about heap can be found here: https://courses.csail.mit.edu/6.006/fall10/handouts/recitation10-8.pdf
With a heap, the running time of Dijkstra's algorithm can be reduced from O(V^2) to O(E log V): selecting the minimum distance from the heap takes O(log V) (reading the minimum is O(1), and fixing up the heap after removal takes O(log V)), and updating distances to vertices takes O(E log V) in total, because each of the E edge relaxations may trigger an O(log V) heap fix-up.
Hope this helps.
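A sketch of that heap-based approach for two specific vertices, using Python's heapq and stopping early once the target is popped; the adjacency format and function name are my own assumptions:

```python
import heapq

def shortest_path(adj, s, t):
    """Heap-based Dijkstra between two vertices, returning as soon as the
    target t is popped (its distance is then final).
    adj: vertex -> list of (neighbor, weight); the graph is undirected,
    so each edge is assumed to appear in both endpoints' lists."""
    dist = {s: 0}
    seen = set()
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            return d                  # first pop of t is the shortest distance
        if u in seen:
            continue
        seen.add(u)
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float('inf')               # t unreachable from s
```

The early return is safe for the same reason Dijkstra's is correct: with non-negative weights, a vertex's distance is final the first time it leaves the heap.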

Dijkstra Time Complexity using Binary Heap

Let G(V, E) be an undirected graph with positive edge weights. Dijkstra's single-source shortest path algorithm can be implemented using the binary heap data structure with time complexity:
1. O(|V|^2)
2. O(|E|+|V|log|V|)
3. O(|V|log|V|)
4. O((|E|+|V|)log|V|)
========================================================================
Correct answer is -
O((|E|+|V|)log|V|)
=========================================================================
My Approach is as follows -
O(V + V + V log V + E log V) = O(E log V)
O(V) to initialize.
O(V) to build the heap.
O(V log V) to perform Extract_Min.
O(E log V) to perform Decrease_Key.
Now, as I get O(E log V), when I look at the options a part of me says the correct one is O(V log V), because for a sparse graph |V| = |E|; but as I said, the given correct answer is O((|E|+|V|) log |V|). So where am I going wrong?
Well, you are correct that the complexity is actually O(E log V).
Since E can be up to (V^2 - V)/2, this is not the same as O(V log V).
If every vertex has an edge, then V <= 2E, so in that case, O(E log V) = O( (E+V) log V). That is the usual case, and corresponds to the "correct" answer.
But technically, O(E log V) is not the same as O((E+V) log V), because there may be a whole bunch of disconnected vertices in V. When that is the case, however, Dijkstra's algorithm will never see all those vertices, since it only finds vertices connected to the single source. So, when the difference between these two complexities matters, you are right and the "correct answer" is not.
Let me put it this way: the correct answer is O((E+V) log V). If some vertices are not reachable from the source, V log V could be more than E log V. But if we assume every vertex is reachable from the source, the graph has at least V-1 edges, so it reduces to E log V. It comes down to reachability from the source vertex.
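For concreteness, the Extract_Min and Decrease_Key operations counted above assume a binary heap that can locate a vertex's entry. A minimal, untuned position-tracking heap might look like this; all names here are my own, not from any library:

```python
class IndexedMinHeap:
    """Binary min-heap of (key, vertex) records with DECREASE_KEY support.
    `pos` tracks each vertex's index in the heap array, so a vertex's
    entry can be found and sifted up in O(log V)."""
    def __init__(self):
        self.heap = []            # list of [key, vertex]
        self.pos = {}             # vertex -> index in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i][0] < self.heap[parent][0]:
                self._swap(i, parent)
                i = parent
            else:
                break

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest, left, right = i, 2 * i + 1, 2 * i + 2
            if left < n and self.heap[left][0] < self.heap[smallest][0]:
                smallest = left
            if right < n and self.heap[right][0] < self.heap[smallest][0]:
                smallest = right
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def insert(self, vertex, key):            # O(log V)
        self.heap.append([key, vertex])
        self.pos[vertex] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def extract_min(self):                    # O(log V)
        self._swap(0, len(self.heap) - 1)
        key, vertex = self.heap.pop()
        del self.pos[vertex]
        if self.heap:
            self._sift_down(0)
        return vertex, key

    def decrease_key(self, vertex, key):      # O(log V)
        i = self.pos[vertex]
        self.heap[i][0] = key                 # key must not increase
        self._sift_up(i)
```

Dijkstra then performs at most V extract_min calls and at most E decrease_key calls, which is where the O((E+V) log V) bound comes from.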

How can a heap be used to optimize Prim's minimum spanning tree algorithm?

I have to solve a question that is something like this:
I am given a number N which represents the number of points I have. Each point has two coordinates: X and Y.
I can find the distance between two points with the following formula:
abs(x2-x1)+abs(y2-y1),
(x1,y1) being the coordinates of the first point, (x2,y2) the coordinates of the second point and abs() being the absolute value.
I have to find the minimum spanning tree, meaning all my points must be connected with the sum of the edge weights being minimal. Prim's algorithm is good, but it is too slow. I read that I can make it faster using a heap, but I didn't find any article that explains how to do that.
Can anyone explain to me how Prim's algorithm works with a heap (some sample code would be good, but is not strictly necessary), please?
It is possible to solve this problem efficiently (in O(n log n) time), but it is not that easy. Just using Prim's algorithm with a heap does not help (it actually makes it even slower), because its time complexity is O(E log V), which is O(n^2 log n) in this case, since the graph on the points is complete.
However, you can use the Delaunay triangulation to reduce the number of edges in the graph. The Delaunay triangulation graph is planar, so it has a linear number of edges. That's why running Prim's algorithm with a heap on it gives O(n log n) time complexity (there are O(n) edges and n vertices). You can read more about it here (covering this algorithm in detail and proving its correctness would make my answer far too long): http://en.wikipedia.org/wiki/Euclidean_minimum_spanning_tree. Note that even though the article is about the Euclidean MST, the approach for your case is essentially the same (it is possible to build the Delaunay triangulation for Manhattan distance efficiently, too).
A description of the Prim's algorithm with a heap itself is already present in two other answers to your question.
From the Wikipedia article on Prim's algorithm:
[S]toring vertices instead of edges can improve it still further. The heap should order the vertices by the smallest edge-weight that connects them to any vertex in the partially constructed minimum spanning tree (MST) (or infinity if no such edge exists). Every time a vertex v is chosen and added to the MST, a decrease-key operation is performed on all vertices w outside the partial MST such that v is connected to w, setting the key to the minimum of its previous value and the edge cost of (v,w).
While it was pointed out that Prim's with a heap is O(E log V), which is O(n^2 log n) in the worst case, let me explain what makes the heap faster in cases other than that worst case, since that has still not been answered.
What makes Prim's so costly at O(V^2) is the necessary updating each iteration in the algorithm. In general, Prim's works by keeping a table of your vertices with the lowest length to other vertices and picking the cheapest vertex to add to your growing tree until all are added. Every time you add a vertex, you must then go back to your table and update any vertices that can now be accessed with less weight. You then must walk back all the way through your table to decide which vertex is cheapest to add. This setup - having to pick the next vertex (O(V)) V times - gives the O(V^2).
The heap is able to improve this running time in all cases besides the worst case because it fixes this bottleneck. By working with a min-heap, you can access the minimum weight under consideration in O(1). Additionally, it costs O(log V) to fix the heap after inserting or updating an entry to maintain its properties, which is done E times, for O(E log V) total to maintain the heap for Prim's. This becomes the new bottleneck, which is what gives rise to the final running time of O(E log V).
So, depending on how much you know about your data, Prim's with a heap can certainly be more efficient than without!
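A lazy-heap version of Prim's along those lines might be sketched as follows; this pushes candidate entries instead of doing decrease-key, so it is the O(E log E) = O(E log V) variant, and the adjacency format is my own assumption:

```python
import heapq

def prim_mst_weight(adj, start):
    """Prim's with a lazy min-heap: push a (weight, vertex) entry for every
    edge leaving the tree, and skip stale entries whose vertex is already in.
    adj: vertex -> list of (neighbor, weight); assumes a connected graph
    with each undirected edge listed at both endpoints."""
    in_tree = set()
    total = 0
    pq = [(0, start)]                 # weight 0 to pull the start vertex in
    while pq:
        w, u = heapq.heappop(pq)
        if u in in_tree:              # stale entry: vertex already added
            continue
        in_tree.add(u)
        total += w
        for v, wt in adj.get(u, []):
            if v not in in_tree:
                heapq.heappush(pq, (wt, v))
    return total
```

Each pop and push costs O(log) of the heap size, and there are O(E) of each, matching the O(E log V) analysis above.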
