Dijkstra time complexity with C++ pq

Dijkstra's time complexity is O(V + E log V) with binary heaps.
But a C++ priority_queue (used as the binary heap) does not support decrease-key. One suggested workaround is to simply insert the same vertex into the priority queue again with the decreased distance. For example, from https://www.hackerearth.com/practice/algorithms/graphs/shortest-path-algorithms/tutorial/:
// Assumes global arrays: int dist[MAXN]; bool vis[MAXN];
// vector<pair<int,int>> v[MAXN]; // v[x] holds (neighbour, weight) pairs
void dijkstra(){
    memset(dist, 0x3f, sizeof dist); // set all vertex distances to "infinity"
    memset(vis, false, sizeof vis);  // mark every vertex as unvisited
    dist[1] = 0;
    multiset<pair<int,int>> s; // the multiset does the job of a min-priority queue
    s.insert({0, 1}); // insert the source node with distance = 0
    while(!s.empty()){
        pair<int,int> p = *s.begin(); // pop the vertex with the minimum distance
        s.erase(s.begin());
        int x = p.second; // p.first is the distance, p.second the vertex
        if(vis[x]) continue; // skip if the popped vertex was finalized before
        vis[x] = true;
        for(int i = 0; i < (int)v[x].size(); i++){
            int e = v[x][i].first, w = v[x][i].second;
            if(dist[x] + w < dist[e]){ // check if the next vertex's distance can be reduced
                dist[e] = dist[x] + w;
                s.insert({dist[e], e}); // insert the next vertex with the updated distance
            }
        }
    }
}
With this implementation the complexity should increase (as opposed to the O(V + E log V) claimed in the article), since the heap can grow beyond size V. I believe the complexity should be O(V + E log E).
Am I correct? If not, what is the correct complexity?

Those bounds are actually equivalent for simple connected graphs. Since
|V| − 1 ≤ |E| ≤ |V| (|V| − 1)/2,
we can take logs and find that
log(|V|) − O(1/|V|) ≤ log(|V| − 1) ≤ log(|E|) ≤ log (|V| (|V| − 1)/2) ≤ 2 log(|V|),
hence Θ(log(|V|)) = Θ(log(|E|)).
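For reference, the same lazy-deletion idea written with std::priority_queue (the structure the question asks about) might look like the following sketch. This is not the article's code; the adjacency-list layout and names are illustrative assumptions.

#include <bits/stdc++.h>
using namespace std;

// Minimal sketch of lazy-deletion Dijkstra with std::priority_queue.
// adj[u] holds (neighbour, weight) pairs; names are illustrative.
vector<long long> dijkstra(int n, const vector<vector<pair<int,int>>>& adj, int src) {
    const long long INF = numeric_limits<long long>::max();
    vector<long long> dist(n, INF);
    // min-heap of (distance, vertex); duplicate entries stand in for decrease-key
    priority_queue<pair<long long,int>, vector<pair<long long,int>>, greater<>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d != dist[u]) continue;      // stale entry: skip it
        for (auto [v, w] : adj[u])
            if (d + w < dist[v]) {
                dist[v] = d + w;
                pq.push({dist[v], v});   // re-insert instead of decrease-key
            }
    }
    return dist;
}

Because every edge can add one heap entry, the heap holds up to E entries, which is exactly where the O(E log E) = O(E log V) bound comes from.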

Related

Given the nodes of a graph arranged on a circle, find the minimum number of nodes to remove so that each remaining node has an edge to its next node.

Given n people sitting at a table, where some of the people are familiar with each other (familiarity is a bidirectional relation), find the minimum number of people to remove from the table so that each person remaining at the table is familiar with both of their neighbours. Give an O(n^2) solution.
My current effort:
As the required O(n^2) bound suggests, I tried solving the problem via T(n) = T(n-1) + O(n).
But suppose I have found the ideal circle with m nodes involved in it and now want to add a new node: I can check whether the new node can become a member of the circle and form a new circle of m+1 nodes, and if so the problem is solved. If not, however, I would need to store all circles of length m-1, m-2, ..., and try adding the new node to each of them, which takes more than O(n) time.
Assume we have a graph with n nodes, labeled 0 to n - 1.
If we view the problem as finding the shortest cycle from node A back to A, then the distance between nodes a and b is (b - a - 1), i.e. the number of nodes strictly between the two, and we can only go from a to b if b > a or b is the start node. The problem is thus reduced to the classic problem of finding a shortest path in a graph.
Using Dijkstra's algorithm, we obtain an O(N^2 log N) algorithm.
Java code:
class State {
    int node, distance;
    State(int node, int distance) { this.node = node; this.distance = distance; }
}

int result = n - 1;
for (int i = 0; i < n; i++) {
    // Find the shortest cycle from i back to i
    PriorityQueue<State> q = new PriorityQueue<>((a, b) -> Integer.compare(a.distance, b.distance));
    int[] dist = new int[n];
    Arrays.fill(dist, n - 1);
    for (int j : map[i]) {
        if (j > i) {
            dist[j] = j - i - 1;               // nodes skipped between i and j
            q.add(new State(j, dist[j]));
        }
    }
    while (!q.isEmpty()) {
        State s = q.poll();
        if (s.distance > dist[s.node]) {       // stale queue entry, skip it
            continue;
        }
        for (int j : map[s.node]) {
            // weight = number of skipped nodes (the b - a - 1 metric above);
            // the second case wraps around the circle to close the cycle at i
            int w = (j > s.node) ? j - s.node - 1 : (n - 1 - s.node) + j;
            if ((j > s.node || j == i) && dist[j] > s.distance + w) {
                dist[j] = s.distance + w;
                q.add(new State(j, dist[j]));
            }
        }
    }
    result = Integer.min(result, dist[i]);
}

O(n) solution to counting sub-arrays with sum constraints

I'm trying to improve my intuition around the following two sub-array problems.
Problem one
Return the length of the shortest, non-empty, contiguous sub-array of A with sum at least
K. If there is no non-empty sub-array with sum at least K, return -1
I've come across an O(N) solution online.
public int shortestSubarray(int[] A, int K) {
int N = A.length;
long[] P = new long[N+1];
for (int i = 0; i < N; ++i)
P[i+1] = P[i] + (long) A[i];
// Want smallest y-x with P[y] - P[x] >= K
int ans = N+1; // N+1 is impossible
Deque<Integer> monoq = new ArrayDeque<>(); // opt(y) candidates, as indices of P
for (int y = 0; y < P.length; ++y) {
// Want opt(y) = largest x with P[x] <= P[y] - K;
while (!monoq.isEmpty() && P[y] <= P[monoq.getLast()])
monoq.removeLast();
while (!monoq.isEmpty() && P[y] >= P[monoq.getFirst()] + K)
ans = Math.min(ans, y - monoq.removeFirst());
monoq.addLast(y);
}
return ans < N+1 ? ans : -1;
}
It seems to be maintaining a sliding window with a deque. It looks like a variant of Kadane's algorithm.
Problem two
Given an array of N integers (positive and negative), find the number of
contiguous sub-arrays whose sum is greater than or equal to K (which can also be
positive or negative).
The best solution I've seen to this problem is O(n log n), as described in the following answer.
tree = an empty search tree
result = 0
// This sum corresponds to an empty prefix.
prefixSum = 0
tree.add(prefixSum)
// Iterate over the input array from left to right.
for elem <- array:
prefixSum += elem
// Add the number of subarrays that have this element as the last one
// and their sum is not less than K.
result += tree.getNumberOfLessOrEqual(prefixSum - K)
// Add the current prefix sum to the tree.
tree.add(prefixSum)
print result
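As a concrete version of this pseudocode, the search tree can be a Fenwick tree over coordinate-compressed prefix sums, with getNumberOfLessOrEqual becoming a prefix count. This is my own sketch, not the original answerer's code, and all names are illustrative:

#include <bits/stdc++.h>
using namespace std;

// Count subarrays with sum >= K in O(n log n), mirroring the pseudocode above.
// A Fenwick tree (BIT) over compressed prefix sums plays the role of the search tree.
long long countSubarrays(const vector<long long>& a, long long K) {
    int n = a.size();
    vector<long long> pre(n + 1, 0);
    for (int i = 0; i < n; ++i) pre[i + 1] = pre[i] + a[i];

    // Coordinate-compress all prefix sums.
    vector<long long> sorted_pre(pre);
    sort(sorted_pre.begin(), sorted_pre.end());
    sorted_pre.erase(unique(sorted_pre.begin(), sorted_pre.end()), sorted_pre.end());
    int m = sorted_pre.size();
    vector<int> bit(m + 1, 0);
    auto update = [&](int i) { for (++i; i <= m; i += i & -i) bit[i]++; };
    auto query  = [&](int i) { long long s = 0; for (++i; i > 0; i -= i & -i) s += bit[i]; return s; };

    long long result = 0;
    for (int y = 0; y <= n; ++y) {
        if (y > 0) {
            // Subarrays ending at index y-1: count x with pre[x] <= pre[y] - K.
            long long bound = pre[y] - K;
            int idx = upper_bound(sorted_pre.begin(), sorted_pre.end(), bound) - sorted_pre.begin() - 1;
            if (idx >= 0) result += query(idx);
        }
        int pos = lower_bound(sorted_pre.begin(), sorted_pre.end(), pre[y]) - sorted_pre.begin();
        update(pos); // add the current prefix sum to the "tree"
    }
    return result;
}

An order-statistics tree (e.g. a GNU policy tree) would work equally well in place of the Fenwick tree; the bound stays O(n log n) either way.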
My questions
Is my intuition that algorithm one is a variant of Kadane's algorithm correct?
If so, is there a variant of this algorithm (or another O(n) solution) that can be used to solve problem two?
Why can problem two only be solved in O(n log n) time when the two problems look so similar?

DFS and BFS Time and Space complexities of 'Number of islands' on Leetcode

Here is the question description; this post refers to the first two suggested approaches, DFS and BFS. I have included the problem statement here for easier reading.
Given a 2d grid map of '1's (land) and '0's (water), count the number of
islands. An island is surrounded by water and is formed by connecting adjacent
lands horizontally or vertically. You may assume all four edges of the grid are
all surrounded by water.
Example 1:
Input:
11110
11010
11000
00000
Output: 1
Example 2:
Input:
11000
11000
00100
00011
Output: 3
I am unclear as to why the time complexity for both DFS and BFS is O(rows * columns). I see how this is the case when the grid is full of 0's - we simply have to check each cell. However, doesn't the DFS approach add more time to the search? Even if we mark visited cells by changing them to 0 in the dfs method, we still revisit all the cells because of the two outer loops. If dfs can take O(n) time for a big grid with large row and column counts, wouldn't the time complexity then be O(rows * columns * max[rows, cols])? Moreover, isn't it the same with the BFS approach, where it is O(rows * cols * possibleMaxSizeOfQueue), and possibleMaxSizeOfQueue could again be max[rows, cols]?
for (int r = 0; r < nr; ++r) {
for (int c = 0; c < nc; ++c) {
if (grid[r][c] == '1') {
++num_islands;
dfs(grid, r, c);
}
}
}
How is DFS's space complexity O(rows*cols)? Is it not possible/common to consider the call stack space as freed when a recursion branch returns?
How is the space complexity for BFS O(min(rows, cols))? The way I see it, the queue could contain all the cells in the case of a grid full of 1's, giving O(rows*cols) space complexity for BFS.
DFS Solution
class Solution {
void dfs(char[][] grid, int r, int c) {
int nr = grid.length;
int nc = grid[0].length;
if (r < 0 || c < 0 || r >= nr || c >= nc || grid[r][c] == '0') {
return;
}
grid[r][c] = '0';
dfs(grid, r - 1, c);
dfs(grid, r + 1, c);
dfs(grid, r, c - 1);
dfs(grid, r, c + 1);
}
public int numIslands(char[][] grid) {
if (grid == null || grid.length == 0) {
return 0;
}
int nr = grid.length;
int nc = grid[0].length;
int num_islands = 0;
for (int r = 0; r < nr; ++r) {
for (int c = 0; c < nc; ++c) {
if (grid[r][c] == '1') {
++num_islands;
dfs(grid, r, c);
}
}
}
return num_islands;
}
}
Time complexity : O(M×N) where M is the number of rows
and N is the number of columns.
Space complexity : worst case O(M×N), in the case that the
grid is filled with land and DFS goes M×N levels deep.
BFS Solution
class Solution {
public int numIslands(char[][] grid) {
if (grid == null || grid.length == 0) {
return 0;
}
int nr = grid.length;
int nc = grid[0].length;
int num_islands = 0;
for (int r = 0; r < nr; ++r) {
for (int c = 0; c < nc; ++c) {
if (grid[r][c] == '1') {
++num_islands;
grid[r][c] = '0'; // mark as visited
Queue<Integer> neighbors = new LinkedList<>();
neighbors.add(r * nc + c);
while (!neighbors.isEmpty()) {
int id = neighbors.remove();
int row = id / nc;
int col = id % nc;
if (row - 1 >= 0 && grid[row-1][col] == '1') {
neighbors.add((row-1) * nc + col);
grid[row-1][col] = '0';
}
if (row + 1 < nr && grid[row+1][col] == '1') {
neighbors.add((row+1) * nc + col);
grid[row+1][col] = '0';
}
if (col - 1 >= 0 && grid[row][col-1] == '1') {
neighbors.add(row * nc + col-1);
grid[row][col-1] = '0';
}
if (col + 1 < nc && grid[row][col+1] == '1') {
neighbors.add(row * nc + col+1);
grid[row][col+1] = '0';
}
}
}
}
}
return num_islands;
}
}
Time complexity : O(M×N) where M is the number of rows
and N is the number of columns.
Space complexity : O(min(M,N)), because in the worst case,
where the grid is filled with land, the size of the queue can
grow up to min(M,N).
DFS's time complexity is proportional to the total number of vertices and edges of the graph visited. In this case, there are N*M vertices and slightly fewer than 4*N*M edges; their sum is still O(N*M).
Why: we process each edge exactly once in each direction. A situation where a recursive call is immediately terminated does not matter, as the time spent on that call can be accounted for at the call site, and there is at most one call per directed edge, hence O(N*M).
BFS's time complexity is quite similar. The maximal length of the queue does not matter at all, because at no point do we examine the queue as a whole: it only receives "append" and "remove first" queries, which can be processed in constant time regardless of its size. If we need to check whether a vertex was already visited, we do so in constant time.
Worst-case space complexity for DFS is Theta(N*M): just take any "snake-wise" maze:
......
#####.
......
.#####
......
Here DFS will be forced to traverse the path in full before it stops and starts freeing up the stack. However, at no point will there be more than N*M+1 elements on the stack.
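To make the Θ(N*M) recursion depth tangible, here is a small self-contained experiment (my own sketch, not part of the answer) that runs a recursive DFS on the snake maze above and reports the maximum recursion depth; the maze literal and names are assumptions:

#include <bits/stdc++.h>
using namespace std;

// Sketch: measure recursive DFS depth on the "snake-wise" maze above.
// '.' is open, '#' is a wall; DFS starts in the top-left corner.
vector<string> g = {
    "......",
    "#####.",
    "......",
    ".#####",
    "......"
};
int R, C, maxDepth = 0;
vector<vector<bool>> seen;

void dfs(int r, int c, int depth) {
    maxDepth = max(maxDepth, depth);
    seen[r][c] = true;
    int dr[] = {-1, 1, 0, 0}, dc[] = {0, 0, -1, 1};
    for (int d = 0; d < 4; ++d) {
        int nr = r + dr[d], nc = c + dc[d];
        if (nr >= 0 && nr < R && nc >= 0 && nc < C && g[nr][nc] == '.' && !seen[nr][nc])
            dfs(nr, nc, depth + 1);
    }
}

int main() {
    R = g.size(); C = g[0].size();
    seen.assign(R, vector<bool>(C, false));
    dfs(0, 0, 1);
    // The open cells form one long path, so the recursion depth equals the
    // number of open cells (20 here) -- Theta(N*M) for this maze family.
    cout << "max recursion depth: " << maxDepth << "\n";
}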
Worst-case space complexity for BFS is indeed not O(max(N, M)): it's Theta(N*M), even for a simple grid. Here is an example from math.stackexchange.com:
If we start BFS at the red point, it will end up with a queue containing all the leaves of the tree, whose number is proportional to N*M. One can also truncate 3/4 of the example and make the red dot appear in the upper-left corner.
It looks like the solution you've read is wrong with respect to BFS's worst-case memory consumption.
#yeputons: I don't think the space complexity for BFS will be proportional to N * M.
When you say the queue could hold at most all the leaf elements (when starting at the centre), that actually means at most 2*(N+M) elements.
And when starting from one of the corners it is indeed O(min(M, N)), because the number of elements being added to the queue is constrained.
This is a classic example of a BFS implementation. The only catch here is that we don't need extra space to mark a node as visited: the existing grid matrix can be reused to identify visited nodes. I have made a small video explaining this logic; please check it out:
https://www.youtube.com/watch?v=GkG4cQzyFoU
I think the space complexity of BFS is indeed O(min(M, N)), where M is the row count and N is the column count. Please see the examples below:
When you start traversing a matrix from a corner, the maximum number of cells/nodes you can have in the queue is k, where k is the number of cells on a diagonal line in the matrix, which means k = min(M, N).
When you start traversing a matrix from the centre, the maximum number of cells/nodes you can have in the queue follows {1, 4, 8, 12, 16, ..., 4i}, where i is the i-th layer, and such cells fit in a matrix of minimum size {1, 4, 9, 16, 25, ..., i*i} respectively. Please see the sketch below:
We know that i is at most min(M, N), so yet again we have a space complexity of O(4 * min(M, N)), which is O(min(M, N)).
Below is my attempt at arguing against #yeputons' answer:
I think the space complexity of BFS in that answer is not applicable to matrix traversal. The plot shown there is of a binary tree laid out in a matrix, whereas we traverse in a ternary-tree fashion, except for the first step where we branch out to 4 branches. The traversal is more like what's described in a different answer to the Math Stack Exchange question quoted by #yeputons (https://math.stackexchange.com/a/2219954/758829). But I feel this is still not the same, because when traversing a matrix in BFS fashion we only go outwards from the starting point, while the traversal in both answers to the Math Stack Exchange question proceeds recursively, which means it is not strictly outwards from the starting point.
Please correct me if I'm wrong. Thank you!
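As an empirical aside (my own sketch, not any answerer's code), one can measure the maximum BFS queue size on a wall-free, all-land grid, which is the scenario the last two answers discuss; note that yeputons' Θ(N*M) example relies on a maze with walls, which this experiment does not cover. Grid sizes and names are illustrative:

#include <bits/stdc++.h>
using namespace std;

// Sketch: track the maximum BFS queue size on an all-land R x C grid,
// starting from a corner versus from the centre.
size_t maxQueueFrom(int R, int C, int sr, int sc) {
    vector<vector<bool>> seen(R, vector<bool>(C, false));
    queue<pair<int,int>> q;
    q.push({sr, sc});
    seen[sr][sc] = true;
    size_t best = 0;
    int dr[] = {-1, 1, 0, 0}, dc[] = {0, 0, -1, 1};
    while (!q.empty()) {
        best = max(best, q.size());
        auto [r, c] = q.front(); q.pop();
        for (int d = 0; d < 4; ++d) {
            int nr = r + dr[d], nc = c + dc[d];
            if (nr >= 0 && nr < R && nc >= 0 && nc < C && !seen[nr][nc]) {
                seen[nr][nc] = true;
                q.push({nr, nc});
            }
        }
    }
    return best;
}

int main() {
    int R = 50, C = 80;
    // Corner start: the frontier is an anti-diagonal, so the maximum is min(R, C).
    cout << "corner start: " << maxQueueFrom(R, C, 0, 0) << "\n";
    // Centre start: the frontier is a clipped diamond; it stays O(R + C), not R*C.
    cout << "centre start: " << maxQueueFrom(R, C, R / 2, C / 2) << "\n";
}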

Path of Length N in graph with constraints

I want to find the number of paths of length N in a graph where a vertex can be any natural number. Two vertices are connected only if their product is less than some natural number P; if the product of two vertices is greater than P, they are not connected and cannot be reached from one another.
I can obviously run two nested loops (<= P) and create an adjacency matrix, but P can be extremely large, so this approach would be extremely slow. Can anyone think of an optimal approach to solve the problem? Can we solve it using dynamic programming?
I agree with Ante's recurrence, although I used a slightly simplified version. Note that I'm using the letter P to name the maximum product, as it is used in the original problem statement:
f(1,x) = 1
f(i,x) = sum(f(i-1, y) for y in {1, ..., floor(P/x)})
f(i,x) is the number of sequences of length i that end with x. The answer to the question is then f(n+1, 1).
Of course since P can be up to 10^9 in this task, a straightforward implementation with a DP table is out of the question. However, there are only up to m < 70000 possible different values of floor(P/i). So let's find the maximal segments aj ... bj, where floor(P/aj) = floor(P/bj). We can find those segments in O(number of segments * log P) using binary search.
Imagine the full DP table for f. Since there are only m different values for floor(P/x), every row of f consists of m contiguous ranges that have the same value.
So let's compute the compressed DP table, where we represent the rows as list of (length, value) pairs. We start with f(1) = [(P, 1)] and we can compute f(i+1) from f(i) by processing the segments in increasing order and computing prefix sums of the lengths stored in f(i).
The total runtime of my implementation of this approach is O(m (log P + n)). This is the code I used:
#include <bits/stdc++.h>
using namespace std;
using ll = long long;

const int mod = 1000000007;

void add(int& x, ll y) { x = (x + y) % mod; }

int main() {
    int n, P;
    cin >> n >> P;
    // Find the maximal segments [x, y) on which floor(P/x) is constant,
    // via binary search; there are only O(sqrt(P)) of them.
    int x = 1;
    vector<pair<int,int>> segments; // (floor(P/x), segment length)
    while (x <= P) {
        int y = x + 1, hi = P + 1;
        while (y < hi) {
            int mid = (y + hi) / 2;
            if (P / mid < P / x) hi = mid;
            else y = mid + 1;
        }
        segments.push_back(make_pair(P / x, y - x));
        x = y;
    }
    reverse(begin(segments), end(segments));
    // Compressed DP rows: a list of (run length, value) pairs. f(1, x) = 1 for all x.
    vector<pair<int,int>> dp;
    dp.push_back(make_pair(P, 1));
    for (int i = 1; i <= n; ++i) {
        int j = 0;
        int sum_smaller = 0, cnt_smaller = 0;
        vector<pair<int,int>> dp2;
        for (auto it : segments) {
            int value = it.first, cnt = it.second;
            // Advance over the runs of f(i) lying entirely within the first
            // `value` positions, accumulating their prefix sum.
            while (cnt_smaller + dp[j].first <= value) {
                cnt_smaller += dp[j].first;
                add(sum_smaller, (ll)dp[j].first * dp[j].second);
                j++;
            }
            int pref_sum = sum_smaller;
            if (value > cnt_smaller)
                add(pref_sum, (ll)(value - cnt_smaller) * dp[j].second);
            dp2.push_back(make_pair(cnt, pref_sum));
        }
        dp = dp2;
        reverse(begin(dp), end(dp));
    }
    cout << dp[0].second << endl;
}
I needed to do some micro-optimizations with the handling of the arrays to get AC, but those aren't really relevant, so I left them out.
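As a sanity check for the compressed DP (my own sketch, not part of the answer), a brute-force table over all x in 1..P is feasible for tiny P and should print the same value; n, P and all names here are illustrative:

#include <bits/stdc++.h>
using namespace std;

// Brute-force f(i, x) over the full table, only feasible for small P.
// f(1, x) = 1; f(i, x) = sum of f(i-1, y) for y = 1 .. floor(P/x).
// The compressed DP above reports f(n+1, 1).
int main() {
    int n = 3, P = 10;                       // tiny instance for checking
    const int MOD = 1000000007;
    vector<long long> f(P + 1, 1), g(P + 1); // f holds the row f(i, .), starting at i = 1
    for (int i = 2; i <= n + 1; ++i) {
        // prefix sums of the previous row let us evaluate each row in O(P)
        vector<long long> pre(P + 1, 0);
        for (int x = 1; x <= P; ++x) pre[x] = (pre[x - 1] + f[x]) % MOD;
        for (int x = 1; x <= P; ++x) g[x] = pre[P / x];
        swap(f, g);
    }
    cout << f[1] << "\n";                    // should match the compressed DP's output
}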
If the number of vertices is small, then the adjacency matrix A can help: the sum of the elements of A^N is the number of distinct paths, if the paths are oriented. If not, the number of paths is the sum of the elements divided by 2. This is because element (i, j) represents the number of paths from vertex i to vertex j.
In this case, the same approach can be taken with DP, using the reasoning that the number of paths of length n from a vertex v is the sum of the numbers of paths of length n-1 from all its neighbours. The neighbours of vertex i are the vertices from 1 to floor(Q/i). With that we can construct a function N(vertex, length) which represents the number of paths of a given length from a given vertex:
N(i, 1) = floor(Q/i),
N(i, n) = sum( N(j, n-1) for j in {1, ..., floor(Q/i)} ).
The number of all oriented paths of length n is sum( N(i, n) ).
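A direct implementation of this recurrence with a prefix-sum trick runs in O(n * Q) and is fine for small Q. This is a hedged sketch of mine, not Ante's code; names and the sample instance are illustrative, and there is no modular reduction, so large instances would overflow:

#include <bits/stdc++.h>
using namespace std;

// Direct implementation of the N(i, n) recurrence with prefix sums: O(n * Q).
int main() {
    int n = 3;
    long long Q = 10;                              // illustrative small instance
    vector<long long> cnt(Q + 1);
    for (int i = 1; i <= Q; ++i) cnt[i] = Q / i;   // N(i, 1) = floor(Q/i)
    for (int len = 2; len <= n; ++len) {
        vector<long long> pre(Q + 1, 0);           // prefix sums of the previous row
        for (int j = 1; j <= Q; ++j) pre[j] = pre[j - 1] + cnt[j];
        vector<long long> next(Q + 1);
        for (int i = 1; i <= Q; ++i) next[i] = pre[Q / i]; // sum over the neighbours of i
        cnt = next;
    }
    long long total = 0;
    for (int i = 1; i <= Q; ++i) total += cnt[i];  // all oriented paths of length n
    cout << total << "\n";
}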

Running time of minimum spanning tree? ( Prim method )

I have written code that solves MST using Prim's method. I read that this kind of implementation (using a priority queue) should have complexity O(E + V log V) = O(V log V), where E is the number of edges and V the number of vertices, but when I look at my code it simply doesn't look that way. I would appreciate it if someone could clear this up for me.
To me it seems the running time is this:
The while loop takes O(E) time (until we go through all the edges).
Inside that loop we extract an element from the Q, which takes O(log E) time.
The second inner loop takes O(V) time (although we don't run this loop every time,
it is clear that it will be run V times, since we have to add all the vertices).
My conclusion would be that the running time is O(E * (log E + V)) = O(E * V).
This is my code:
#include <bits/stdc++.h>
using namespace std;

typedef pair<int,int> p_int;

int N, M;                      // N - number of vertices, M - number of edges
int graph[100][100] = { 0 };   // adjacency matrix
bool in_tree[100] = { false }; // whether a node is already in the MST
priority_queue< p_int, vector<p_int>, greater<p_int> > Q;
/*
Q keeps track of the smallest edge connecting a node in the MST and a node
outside the tree. The first part of each pair is the weight of the edge and
the second is the node. We don't remember the parent node because we don't need it :-)
*/

int mst_prim()
{
    Q.push(make_pair(0, 0));
    int nconnected = 0;
    int mst_cost = 0;
    while (nconnected < N)
    {
        p_int node = Q.top(); Q.pop();
        if (in_tree[node.second] == false)
        {
            mst_cost += node.first;
            in_tree[node.second] = true;
            for (int i = 0; i < N; ++i)
                if (graph[node.second][i] > 0 && in_tree[i] == false)
                    Q.push(make_pair(graph[node.second][i], i));
            nconnected++;
        }
    }
    return mst_cost;
}
You can use adjacency lists to speed your solution up (but not for dense graphs), but even then, you are not going to get O(V log V) without a Fibonacci heap.
Maybe the Kruskal algorithm would be simpler for you to understand. It features no priority queue; you only have to sort an array of edges once. Basically, it goes like this:
Insert all edges into an array and sort them by weight.
Iterate over the sorted edges, and for each edge connecting nodes i and j, check whether i and j are connected. If they are, skip the edge; otherwise, add the edge to the MST.
The only catch is being able to quickly say whether two nodes are connected. For this you use the Union-Find data structure, which goes like this:
const int MAX_NODES = 100000;
int T[MAX_NODES];

int getParent(int a)
{
    if (T[a] == -1) return a;
    return T[a] = getParent(T[a]); // path compression
}

void Unite(int a, int b)
{
    if (rand() & 1)
        T[a] = b;
    else
        T[b] = a;
}
In the beginning, just initialize T to all -1, and then every time you want to find out whether nodes A and B are connected, just compare their parents - if they are the same, they are connected (getParent(A) == getParent(B)). When inserting an edge into the MST, make sure to update the Union-Find with Unite(getParent(A), getParent(B)).
The analysis is simple: you sort the edges and iterate over them using the UF, whose operations take nearly O(1) amortized time. So it is O(E log E + E), which equals O(E log E).
That is it ;-)
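Putting the steps together, a minimal self-contained Kruskal sketch might look like this; the edge list, array sizes and names are my own illustrative assumptions:

#include <bits/stdc++.h>
using namespace std;

// Minimal Kruskal sketch following the steps above.
struct Edge { int w, a, b; };

int parent[100000];

int getParent(int a) {
    if (parent[a] == -1) return a;
    return parent[a] = getParent(parent[a]); // path compression
}

int kruskal(int n, vector<Edge>& edges) {
    memset(parent, -1, sizeof parent);
    sort(edges.begin(), edges.end(),
         [](const Edge& x, const Edge& y) { return x.w < y.w; });
    int cost = 0;
    for (const Edge& e : edges) {
        int ra = getParent(e.a), rb = getParent(e.b);
        if (ra == rb) continue;   // would close a cycle: skip this edge
        parent[ra] = rb;          // unite the two components
        cost += e.w;
    }
    return cost;
}

int main() {
    vector<Edge> edges = {{1, 0, 1}, {4, 1, 2}, {3, 0, 2}, {2, 2, 3}};
    cout << kruskal(4, edges) << "\n"; // MST cost: 1 + 3 + 2 = 6
}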
I have not had to deal with the algorithm before, but what you have implemented does not match the algorithm as explained on Wikipedia. The algorithm there works as follows.
Put all vertices into the queue. O(V)
While the queue is not empty... O(V)
Take the vertex with the minimum weight from the queue. O(log(V))
Update the weights of adjacent vertices. O(E / V), this being the average number of adjacent vertices.
Reestablish the queue structure. O(log(V))
This gives
O(V) + O(V) * (O(log(V)) + O(E / V))
= O(V) + O(V) * O(log(V)) + O(V) * O(E / V)
= O(V) + O(V * log(V)) + O(E)
= O(V * log(V)) + O(E)
exactly what one expects. (Note that this counts each weight update as O(1) apart from reestablishing the queue; with a plain binary heap every decrease-key costs O(log V), giving O(E log V) overall.)
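For completeness, here is a sketch (my own, under stated assumptions) of Prim on an adjacency list with the same lazy-deletion trick discussed in the first question; it achieves O(E log E) = O(E log V) without a Fibonacci heap. The layout of adj[u] as (weight, neighbour) pairs and a connected graph are assumptions:

#include <bits/stdc++.h>
using namespace std;

// Prim with an adjacency list and lazy deletion: O(E log E) = O(E log V).
int mstPrim(int n, const vector<vector<pair<int,int>>>& adj) {
    vector<bool> inTree(n, false);
    // min-heap of (edge weight, vertex); duplicate entries replace decrease-key
    priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> pq;
    pq.push({0, 0});
    int cost = 0, connected = 0;
    while (connected < n) {
        auto [w, u] = pq.top(); pq.pop();
        if (inTree[u]) continue;       // stale entry: u was added earlier
        inTree[u] = true;
        cost += w;
        ++connected;
        for (auto [ew, v] : adj[u])
            if (!inTree[v]) pq.push({ew, v});
    }
    return cost;
}

int main() {
    // Same tiny graph as the Kruskal example above: expected MST cost 6.
    vector<vector<pair<int,int>>> adj(4);
    auto addEdge = [&](int a, int b, int w) {
        adj[a].push_back({w, b});
        adj[b].push_back({w, a});
    };
    addEdge(0, 1, 1); addEdge(1, 2, 4); addEdge(0, 2, 3); addEdge(2, 3, 2);
    cout << mstPrim(4, adj) << "\n";
}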
