Time complexity of union - algorithm

We have a directed graph G=(V,E) in which each edge (u, v) in E has an associated value r(u, v) in R with 0 <= r(u, v) <= 1, representing the reliability of a communication channel from vertex u to vertex v.
Consider r(u, v) to be the probability that the channel from u to v will not fail the transfer, and assume that these probabilities are independent.
I want to write an efficient algorithm that finds the most reliable path between two given nodes.
I have tried the following:
DIJKSTRA(G,r,s,t)
1. INITIALIZE-SINGLE-SOURCE(G,s)
2. S = Ø
3. Q = G.V
4. while Q != Ø
5.     u <- EXTRACT-MAX(Q)
6.     if u = t, return d[t]
7.     S <- S U {u}
8.     for each vertex v in G.Adj[u]
9.         RELAX(u,v,r)

INITIALIZE-SINGLE-SOURCE(G,s)
1. for each vertex v in G.V
2.     d[v] = -inf
3.     pi[v] = NIL
4. d[s] = 1

RELAX(u,v,r)
1. if d[v] < d[u]*r(u,v)
2.     d[v] <- d[u]*r(u,v)
3.     pi[v] <- u
and I wanted to find the complexity of the algorithm.
The time complexity of INITIALIZE-SINGLE-SOURCE(G,s) is O(|V|).
The time complexity of line 4 is O(1).
The time complexity of line 5 is O(|V|).
The time complexity of line 7 is O(log(|V|)).
The time complexity of line 8 is O(1).
What is the time complexity of the command S <- S U {u}?
The call to RELAX in line 9 is executed O(Σ_{v in V} deg(v)) = O(|E|) times in total, and the time complexity of RELAX is O(1).
So the time complexity of the algorithm is equal to the time complexity of lines 3-9 plus O(|E|).
What is the time complexity of the union?

So the time complexity of the algorithm is equal to the time
complexity of lines 3-9 plus O(|E|). What is the time complexity of
the union?
No, it is not the union that dominates: the union can be done efficiently, in O(1) expected time if you use a hash table, for example. Moreover, since you use S only for the union, it appears to be redundant altogether.
The complexity of the algorithm also depends heavily on your EXTRACT-MAX(Q) function (usually logarithmic in the size of Q, so O(log V) per iteration) and on RELAX(u,v,r) (also usually logarithmic in the size of Q, since it needs to update entries in the priority queue).
As expected, this brings us to the same complexity as the original Dijkstra's algorithm: O(E + V log V) with a Fibonacci heap, or O(E log V) with a binary heap.

I think the solution should be based on the classic Dijkstra algorithm (whose complexity is well known), as you suggested; however, your solution defines the "shortest path" objective incorrectly.
Note that the probability of A and B both succeeding is p(A) * p(B) (if they are independent). Hence, you should find the path whose product of edge reliabilities is maximized, whereas Dijkstra's algorithm finds the path whose sum of edge weights is minimized.
To overcome this issue you should define the weight of your edges as:
r*(u, v) = -log( r(u, v) )
By introducing the logarithm you convert the multiplicative problem into an additive one: maximizing a product of reliabilities is equivalent to minimizing the sum of their negative logarithms, and all the new weights are nonnegative because r(u, v) <= 1, so standard Dijkstra applies.
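Here is a minimal sketch of that transform in Python (assuming the graph is given as an adjacency dict mapping u to a list of (v, r) pairs; the function name most_reliable_path is mine, not from the question):

import heapq
import math

def most_reliable_path(graph, s, t):
    """Most reliable s-t path: run Dijkstra on weights w(u,v) = -log r(u,v)."""
    dist = {s: 0.0}            # dist[v] = -log(best reliability found so far)
    parent = {s: None}
    pq = [(0.0, s)]            # min-heap keyed by the transformed distance
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:             # the key is minimal when extracted, so it is final
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return math.exp(-d), path[::-1]
        if d > dist.get(u, float("inf")):
            continue           # stale queue entry
        for v, r in graph.get(u, []):
            nd = d - math.log(r)   # multiplying probabilities = adding -logs
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(pq, (nd, v))
    return 0.0, None           # t unreachable

g = {'a': [('b', 0.9), ('c', 0.5)], 'b': [('c', 0.8)]}
print(most_reliable_path(g, 'a', 'c'))   # ~0.72 via ['a', 'b', 'c'], beating the direct 0.5 channel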

Related

Running time of Kruskal's algorithm

Kruskal's algorithm is the following:
MST-KRUSKAL(G,w)
1. A = Ø
2. for each vertex v ∈ G.V
3.     MAKE-SET(v)
4. sort the edges of G.E into nondecreasing order by weight w
5. for each edge (u,v) ∈ G.E, taken in nondecreasing order by weight w
6.     if FIND-SET(u) != FIND-SET(v)
7.         A = A U {(u,v)}
8.         UNION(u,v)
9. return A
According to my textbook:
Initializing the set A in line 1 takes O(1) time, and the time to sort
the edges in line 4 is O(E lgE). The for loop of lines 5-8 performs
O(E) FIND-SET and UNION operations on the disjoint-set forest. Along
with the |V| MAKE-SET operations, these take a total of O((V+E)α(V))
time, where α is a very slowly growing function. Because we assume
that G is connected, we have |E| >= |V|-1, and so the disjoint-set
operations take O(E α(V)) time. Moreover, since α(V)=O(lgV)=O(lgE),
the total running time of Kruskal's algorithm is O(E lgE). Observing
that |E|<|V|^2, we have lg |E|=O(lgV), and so we can restate the
running time of Kruskal's algorithm as O(E lgV).
Could you explain to me why we deduce that the time to sort the edges in line 4 is O(E lgE)?
Also, how do we get that the total time complexity is O((V+E)α(V))?
In addition, suppose that all edge weights in a graph are integers from 1 to |V|. How fast can you make Kruskal's algorithm run? What if the edge weights are integers in the range from 1 to W for some constant W?
How does the time complexity depend on the weights of the edges?
EDIT:
In addition, suppose that all edge weights in a graph are integers
from 1 to |V|. How fast can you make Kruskal's algorithm run?
I have thought the following:
In order for Kruskal's algorithm to run faster, we can sort the edges using Counting Sort.
Line 1 requires O(1) time.
Lines 2-3 require O(|V|α(|V|)) time.
Line 4 requires O(|V|+|E|) time.
Lines 5-8 require O(|E|α(|V|)) time.
Line 9 requires O(1) time.
So if we use Counting Sort to sort the edges, the time complexity of Kruskal's algorithm will be O(|V| + |E|) + O(|E|α(|V|)).
Could you tell me if my idea is right?
Also:
What if the edges weights are integers in the range from 1 to W for
some constant W?
We will again use Counting Sort. The algorithm will be the same. We find the time complexity as follows:
Line 1 requires O(1) time.
Lines 2-3 require O(|V|) time.
Line 4 requires O(W+|E|) = O(W) + O(|E|) = O(1) + O(|E|) = O(|E|) time, since W is a constant.
Lines 5-8 require O(|E|α(|V|)) time.
Line 9 requires O(1) time.
So the time complexity will be O(|V| + |E|α(|V|)).
Could you explain to me why we deduce that the time to sort the edges in line 4 is O(E lg E)?
To sort a set of N items we use an O(N lg N) algorithm such as quicksort, merge sort, or heapsort. To sort E edges we therefore need O(E lg E) time. This is not always necessary, however, as in some cases we can use a sorting algorithm with better complexity (read further).
Also, how do we get that the total time complexity is O((V+E)α(V))?
I don't think the total complexity is O((V+E)α(V)); that is the complexity of the loop in lines 5-8 together with the MAKE-SET calls. The O((V+E)α(V)) bound comes from the |V| MAKE-SET operations and the O(E) FIND-SET and UNION operations. To find out why we multiply by α(V), you will need to read the in-depth analysis of the disjoint-set data structure in an algorithms book.
How fast can you make Kruskal's algorithm run?
For the first part, line 4, we have O(E lg E) complexity, and for the second part, lines 5-8, we have O((E+V)α(V)) complexity. These two summed up yield O(E lg E). If we use an O(N lg N) sort, this cannot be improved.
What if the edges weights are integers in the range from 1 to W for
some constant W?
If that is the case, then we could use counting sort for the first part, giving line 4 a complexity of O(E+W) = O(E). In that case the algorithm has O((E+V)α(V)) total complexity. Note, however, that O(E+W) hides a constant that could be rather large, which might be impractical for large W.
How does the time complexity depend on the weight of the edges?
As said, if the weights of the edges are small enough, we can use counting sort and speed up the algorithm.
EDIT:
In addition, suppose that all edge weights in a graph are integers
from 1 to |V|. How fast can you make Kruskal's algorithm run? I have
thought the following:
In order for Kruskal's algorithm to run faster, we can sort the edges
using Counting Sort.
Line 1 requires O(1) time. Lines 2-3 require O(|V|α(|V|)) time.
Line 4 requires O(|V|+|E|) time. Lines 5-8 require
O(|E|α(|V|)) time. Line 9 requires O(1) time.
Your idea is correct; however, you can make the bounds tighter.
Lines 2-3 require O(|V|) time rather than O(|V|α(|V|)). We simplified it to O(|V|α(|V|)) in the previous calculations to make them easier.
With this you get a time of:
O(1) + O(|V|) + O(|V| + |E|) + O(|E|α(|V|)) + O(1) = O(|V| + |E|) + O(|E|α(|V|))
You can simplify this to either O((|V| + |E|)α(|V|)) or O(|V| + |E|α(|V|)).
So you were correct, and this is an improvement, since O((|V| + |E|)α(|V|)) grows more slowly than O((|V| + |E|) lg|E|).
The calculations for the range 1 to W are analogous.
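As a sketch of the counting-sort variant discussed above (assuming vertices labelled 0..n-1 and integer weights in 1..max_w; the function name kruskal_counting_sort and the bucket layout are my choices, not from the textbook):

def kruskal_counting_sort(n, edges, max_w):
    """Kruskal's MST for integer edge weights in 1..max_w.

    Counting-sort buckets replace the O(E lg E) comparison sort with
    O(E + W) work, so the disjoint-set operations, O(E alpha(V)),
    dominate the running time.
    """
    # Counting sort: bucket the edges by weight, then read the buckets
    # back in nondecreasing order (line 4 of MST-KRUSKAL).
    buckets = [[] for _ in range(max_w + 1)]
    for u, v, w in edges:
        buckets[w].append((u, v, w))

    # Disjoint-set forest with union by rank and path compression
    # (lines 2-3: MAKE-SET for every vertex).
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst = []
    for bucket in buckets:                  # lines 5-8
        for u, v, w in bucket:
            ru, rv = find(u), find(v)
            if ru != rv:
                if rank[ru] < rank[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru
                if rank[ru] == rank[rv]:
                    rank[ru] += 1
                mst.append((u, v, w))
    return mst

edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 3), (2, 3, 4)]
print(kruskal_counting_sort(4, edges, 4))   # [(0, 2, 1), (1, 2, 2), (1, 3, 3)]

The same bucketing works for weights in 1..W; only the number of buckets changes.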

Big O Notation of Algorithm composed of smaller algorithms

I am working on an assignment that takes a graph, adds an extra vertex to the graph, applies Bellman-Ford with the new vertex as the source, and then applies Dijkstra's all-pairs to the graph.
The algorithms being used have run-times/space requirements of:
Adding extra vertex
-- Running Time: V
-- Space: V
Bellman Ford single source shortest path algorithm
-- Running time: EV
-- Space: V
Dijkstra's all pairs shortest path algorithm
-- Running time: EV log V
-- Space: V
I am having difficulty understanding whether I am calculating the big O of the total process correctly. Each program is run separately, and the output is piped from program to program. My thought is that the total algorithm would have a big-O running time of:
O(V + EV + EV log V), which would simplify to O(EV log V)
The space requirement would be calculated in a similar fashion. Am I thinking of this correctly? Thanks!
Exactly. A "rule of thumb" is that, in a sequence of code blocks, the overall complexity is dominated by the block with the greatest complexity (asymptotically).
Mathematically, when V tends to very large numbers, V is smaller than EV, which is smaller than EV log V. So, for large V, the complexity of your algorithm is approximated well by EV log V.
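Spelled out term by term with the costs listed above (assuming E >= 1 and V >= 2, so each term dominates the previous one):

O(V + EV + EV log V) = O(EV log V)    (running time, since V <= EV <= EV log V)
O(V + V + V) = O(V)                   (space, as each of the three stages uses O(V))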

Number of Edges in Sparse Graph?

I was reading Dijkstra's algorithm in Chapter 24 and got confused by the meaning of "sparse graph". They say: "If the graph is sufficiently sparse (in particular, E = o(V^2/lg V)) we can improve the algorithm by implementing the min-priority queue with a binary min-heap."
My questions:
From where did they derive the expression E = o(V^2/lg V) for a sparse graph?
Can't we use a min-priority queue in the case of a dense graph? What will be the effect on Dijkstra's time complexity?
Reference: CLRS, page 662, 3rd ed.
Substitute that expression for E into the total running time, O((V + E) lg V), and you will see that if E = o(V^2/lg V), the total is o(V^2), which is an improvement over the O(V^2) running time of not using a min-heap.
You can use a min-heap on a dense graph, but it no longer helps. Once again, substitute: assume a complete graph, so E = Θ(V^2). Then the running time becomes O((V + V^2) lg V) = O(V^2 lg V), which is worse than O(V^2).
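Written out, the first substitution goes as follows (using the fact that V lg V = o(V^2)):

(V + E) lg V = V lg V + E lg V = V lg V + o(V^2/lg V) * lg V = V lg V + o(V^2) = o(V^2)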

Prim's algorithm when range of Edge weights is known

Suppose that all the edge weights in a graph are integers in the range from 1 to |V|. How fast can you make Prim's algorithm run? What if the edge weights are integers in the range 1 to W for some constant W?
I think that since Prim's algorithm is based on a min-heap, knowledge about the range of the edge weights will not help in speeding up the procedure. Is this correct?
With this constraint, you can implement a priority queue that uses O(V) or O(W) space respectively, but has O(1) INSERT and O(1) EXTRACT-MIN operations. In fact, you can get O(1) for all the operations Prim's algorithm requires. Since the time complexity of the heap operations influences the complexity of the main algorithm, you can do better than the default generic implementation.
I think the main idea for solving this problem is to remember that W is a constant: if you represent your priority queue as some structure whose size is bounded by W, walking the entire structure at each iteration will not change the time complexity of your algorithm (see the sketch after this answer).
For example, represent your priority queue as an array T with positions 1 to W+1, keeping a linked list of vertices in each position such that T[i] is a list of all the vertices that have priority (key) equal to i, and T[W+1] stores the vertices with priority equal to infinity. You then get:
O(V) to build your priority queue (just insert all the vertices into the list T[W+1]);
O(W) to extract the minimum element (just walk T searching for the first non-empty position);
O(1) to decrease a key (if vertex v had key i and it is updated to j, just remove v from list T[i] and insert it at the front of list T[j]).
So this gives you complexity O(VW + E) instead of O(V log V + E).
(Of course, it will not work if the range is from 1 to |V|, because VW + E would then be V^2 + E, which is greater than V log V + E.)
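A minimal Python sketch of that bucket queue (assuming an adjacency-list input adj[u] = [(v, w), ...] for an undirected graph with integer weights in 1..max_w; the name prim_bucket and the use of sets in place of linked lists are my choices):

def prim_bucket(n, adj, max_w):
    """Prim's MST with the bucket priority queue described above.

    T[max_w + 1] plays the role of "priority infinity"; EXTRACT-MIN
    scans for the first non-empty bucket, O(W) per call, so the total
    is O(V*W + E) instead of O(V log V + E log V).
    """
    INF = max_w + 1
    key = [INF] * n                     # key[v] = lightest edge from v to the tree
    parent = [None] * n
    in_tree = [False] * n
    buckets = [set() for _ in range(INF + 1)]
    key[0] = 0                          # grow the tree from vertex 0
    buckets[0].add(0)
    for v in range(1, n):
        buckets[INF].add(v)

    total = 0
    for _ in range(n):
        # EXTRACT-MIN: walk the buckets for the first non-empty one.
        u = None
        for b in buckets:
            if b:
                u = b.pop()
                break
        if u is None or key[u] == INF:
            break                       # remaining vertices are unreachable
        in_tree[u] = True
        total += key[u]
        # DECREASE-KEY: move v from bucket key[v] to bucket w in O(1).
        for v, w in adj[u]:
            if not in_tree[v] and w < key[v]:
                buckets[key[v]].discard(v)
                buckets[w].add(v)
                key[v] = w
                parent[v] = u
    return total, parent

adj = [[(1, 2), (2, 1)],                # 0-1 (weight 2), 0-2 (weight 1)
       [(0, 2), (2, 2), (3, 3)],        # 1-2 (weight 2), 1-3 (weight 3)
       [(0, 1), (1, 2)],
       [(1, 3)]]
print(prim_bucket(4, adj, 3))           # (6, [None, 0, 0, 1])

Sets stand in for the answer's linked lists: discard and add are O(1) on average, which is all DECREASE-KEY needs.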
For a non-binary-heap implementation of Prim's algorithm, pseudocode can be found in Cormen, Introduction to Algorithms, 3rd edition.
Knowing the range to be 1..k, we can create an array of size k and walk through the adjacency lists, adding each edge to the bucket corresponding to its weight. By the nature of this storage, the edges are sorted by weight. This takes O(n+m) time.
Following the pseudocode for Prim's algorithm in Cormen (page 636), its complexity works out to O(n lg n + m lg n) = O((n+m) lg n) time. Specifically, steps 7 and 11 contribute the lg n factor, iterated over the n-loop and the m-loop respectively: the n lg n term comes from the EXTRACT-MIN operation, and the m lg n term comes from the "implicit DECREASE-KEY" operation. Both can be replaced with operations on our edge-weight array, each a scan of O(k). With this modification, Prim's algorithm becomes an O(nk + mk) = O(k(n+m)) algorithm.

Difficult algorithm: Optimal Solution to 1

This is one of those algorithms that is difficult just because there are so many options. Imagine a number N and a set of primes under 10, i.e. {2, 3, 5, 7}. The goal is to keep dividing N until we reach 1. If at any step N is not divisible by any of the given primes, then you can perform one of the following operations:
i) N = N - 1
OR ii) N = N + 1
This will ensure that N is even and we can continue.
The goal should be achieved using the minimum number of operations.
Please note that this may sound trivial, i.e. you could implement a step in your algorithm saying "if N is divisible by any prime, then divide it". But this does not always produce the optimal solution.
E.g. if N = 134: 134 is divisible by 2. If you divide by 2, you get 67. 67 is not divisible by any prime, so you perform an operation, and N becomes 66 or 68, both of which require another operation further on. So 2 operations in total.
Alternatively, if N = 134 and you perform the operation N = N + 1, i.e. N = 135, the total number of operations needed to reach 1 is 1. So this is the optimal solution.
Unless there is some mathematical solution for this problem (if you are looking for a mathematical solution, math.SE is better for this question), you can reduce the problem to a shortest-path problem.
Represent the problem as a graph G=(V,E) where V = N (all natural numbers) and E = {(u,v) | you can get from u to v in a single step}(1).
Now you need to run a classic search algorithm from your source (the input number) to your target (the number 1). Some choices that yield an optimal solution are:
BFS: since the reduced graph is unweighted, BFS is guaranteed to be both complete (it finds a solution if one exists) and optimal (it finds a shortest solution).
Heuristic A*: also complete and optimal(2), and if you have a good heuristic function it should be faster than an uninformed BFS.
Optimization note:
The graph can be constructed "on the fly"; there is no need to create it as pre-processing. To do so, you need a function next: V -> 2^V (from a node to a set of nodes) such that next(v) = {u | (v,u) is in E}.
P.S. A comment on complexity: the BFS solution is pseudo-polynomial (linear in the input number in the worst case), since the "highest" vertex you will ever develop is n+1, so the solution is basically O(n) in the worst case, though I believe a deeper analysis could restrict it to a better limit.
(1) If you are interested in counting only +1/-1 as operations, you can create the edges based on the state reached after finishing the divisions.
(2) If an admissible heuristic function is used.
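Following footnote (1), here is a minimal Python sketch in which only +1/-1 count as operations; divisions then become weight-0 edges, turning the plain BFS into a 0-1 BFS on a deque (the function name min_ops is mine):

from collections import deque

def min_ops(n, primes=(2, 3, 5, 7)):
    """Minimum number of +1/-1 operations needed to reduce n to 1.

    Divisions by the given primes are free, matching the question's
    examples. The implicit graph is explored on the fly: weight-0
    edges (divisions) go to the front of the deque, weight-1 edges
    (+1/-1) to the back, so states come out in nondecreasing cost.
    """
    dist = {n: 0}
    dq = deque([n])
    while dq:
        u = dq.popleft()
        d = dist[u]
        if u == 1:
            return d
        # Weight-0 edges: one division step per prime dividing u.
        for p in primes:
            if u % p == 0:
                v = u // p
                if v not in dist or d < dist[v]:
                    dist[v] = d
                    dq.appendleft(v)     # free move: front of the deque
        # Weight-1 edges: +1 and -1.
        for v in (u - 1, u + 1):
            if v >= 1 and (v not in dist or d + 1 < dist[v]):
                dist[v] = d + 1
                dq.append(v)             # paid move: back of the deque
    return None

print(min_ops(134))   # 1: 134 -> 135 -> 27 -> 9 -> 3 -> 1
print(min_ops(67))    # 2: 67 -> 66 -> 11 -> 12 -> ... -> 1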
