Time Complexity of the Kruskal Algorithm?

I am calculating the time complexity of Kruskal's algorithm like this (please see the algorithm in the image attached):
T(n) = O(1) + O(V) + O(E log E) + O(V log V)
     = O(E log E) + O(V log V)
Since |E| >= |V| - 1, O(V log V) = O(E log E), so
T(n) = O(E log E) + O(E log E)
     = O(E log E)
The CLRS Algorithm:
Is this correct, or am I doing something wrong? Please tell me.

Kruskal is O(E log E); your derivation is right. You could also say O(E log V) because E <= V * V, so log(E) <= 2 log(V) (I don't know why I remember that, other than that I think a prof put that on an exam at one point...)

Since |E| < |V|², we can state the bound with V terms instead of E terms:
|E| < |V|²
log |E| < log |V|²
log |E| < 2 log |V|
So the running time of MST-KRUSKAL is O(E log V).

Sorry for the late reply.
The runtime of Kruskal's algorithm is O(E log E) and not O(E log V).
The edges have to be sorted first, which takes O(E log E), and that dominates the time spent verifying whether each edge under consideration is a safe edge, which takes O(E log V). And |E| >= |V| - 1 (with equality in the corner case where the graph is already a tree), so it's safe to state the runtime as O(E log E).

O(E log E) is definitely O(E log V), because E <= V^2 (fully connected graph):
E log E <= E log(V^2) = 2E log V = O(E log V)

All the other answers are correct, but we can consider the following case, which gives us a time complexity of O(|E|).
The following answer is from the Algorithms book by Dasgupta, chapter 5, page 140, section on path compression:
In the time complexity computation of this algorithm, the dominant part is the edge-sorting step, which is O(|E| log |E|), or, as the other answers explain, O(|E| log |V|).
But what if the given edges are already sorted?
Or if the weights are small (say, O(|E|)) so that sorting can be done in linear time (for example with counting sort)?
In such a case, the data structure part (the union-find) becomes the bottleneck, and it is useful to think about improving its performance beyond log n per operation.
The solution is to use path compression while doing the find() operation.
The amortized cost turns out to be just barely more than O(1), down from the earlier O(log n). For more details please check this reference.
The brief idea is: whenever find(v) is called to find the root of the set that v belongs to, every node on the path from v to the root has its parent link changed to point directly at the root. This way, if you call find(x) on any node x on that same path again, you get the set's root (label) in O(1). Hence, in this case the bottleneck is the union-find operations, and with the described solution they take (amortized) nearly O(1), so the running time of the algorithm in the described situation is O(|E|).
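A minimal sketch of find() with path compression, assuming a plain array-based union-find (the names and the size n here are only illustrative):

```python
# Minimal union-find with path compression, as described above.
n = 8                                # number of vertices, arbitrary for the sketch
parent = list(range(n))              # every vertex starts as its own root

def find(v):
    root = v
    while parent[root] != root:      # first pass: walk up to the root
        root = parent[root]
    while parent[v] != root:         # second pass: point every node on the path at the root
        parent[v], v = root, parent[v]
    return root

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return False                 # already in the same set: the edge would form a cycle
    parent[ra] = rb                  # merge the two sets
    return True
```

After a find(v), every node that was on the path answers later find() calls in O(1), which is where the near-constant amortized cost comes from.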

From line 5 to line 9 the complexity is O(E):
line 5: O(E)
line 6: O(1)
line 7: O(1)
line 8: O(1)
line 9: O(1)
Up to line 5 you have calculated the complexity correctly. Finally, the dominating factor here is the O(E lg E) sort, so the complexity is O(E lg E).
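To connect this back to the analysis above, here is a minimal sketch of Kruskal in Python with a simplified union-find. It mirrors the structure of the CLRS pseudocode being discussed but is not the book's code, and the input format (a list of (weight, u, v) tuples) is just an assumption for the example:

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) tuples over vertices 0..n-1."""
    parent = list(range(n))

    def find(v):
        # simplified find (no path compression), enough for a sketch
        while parent[v] != v:
            v = parent[v]
        return v

    mst = []
    edges.sort()                        # O(E log E): the dominant step
    for w, u, v in edges:               # the loop analysed above: O(E) iterations
        ru, rv = find(u), find(v)
        if ru != rv:                    # safe edge: it joins two components
            parent[ru] = rv
            mst.append((u, v, w))
    return mst
```

The sort line is the O(E lg E) term; the loop after it is the O(E) part discussed above (ignoring the union-find factor).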

O(n log n) vs O(m) for algorithm

I am trying to find an algorithm for a problem where I have two sets A and B of points, with n and m points respectively. I have two algorithms for the sets, with complexities O(n log n) and O(m), and I am now wondering whether the complexity of both algorithms combined is O(n log n) or O(m).
Basically, I am wondering whether there is some relation between m and n which would result in O(m).
If m and n are truly independent of one another and neither quantity influences the other, then the runtime of running an O(n log n)-time algorithm and then an O(m)-time algorithm will be O(n log n + m). Neither term dominates the other - if n gets huge compared to m then the n log n part dominates, and if m is huge relative to n then the m term dominates.
This gets more complicated if you know how m and n relate to one another in some way. Many graph algorithms, for example, use m to denote the number of edges and n to denote the number of nodes. In those cases, you can sometimes simplify these expressions, but sometimes cannot. For example, the cost of implementing Dijkstra’s algorithm with a Fibonacci heap is O(m + n log n), the same as what we have above.
The size of your input is x := m + n.
The complexity of the combined algorithm (if both algorithms are performed at most a constant number of times) is:
O(n log n) + O(m) = O(x log x) + O(x) = O(x log x).
Yes, if m ~ n^n, then O(log m) = O(n log n).
There is a log formula:
log(b^c) = c * log(b)
EDIT:
For both algorithms combined, the Big-O is always the larger one, because we are concerned with the asymptotic upper bound.
So it will depend on the values of n and m. E.g. while n^n < m, the complexity is O(log m); after that it becomes O(n log n).
For Big-O notation we are only concerned with the larger values, so if n^n >> m then it is O(n log n), else if m >> n^n then it is O(log m).

Dijkstra's: Where is the equation from? m < n^2/log n

In this passage from my textbook:
where are the inequalities from? (The ones that I've marked with red rectangles.) I feel that they describe a relationship between vertices and edges in a graph, but I don't understand it.
You have two implementations of Dijkstra’s algorithm to choose from. One runs in time O((m + n) log n) = O(m log n), assuming the graph is connected. The other runs in time O(n²). The question is where the crossover point is between these two runtimes. Equating and simplifying gives that
m log n = n2
m = n2 / log n
So if m is asymptotically smaller than n2 / log n, you’d prefer the heap implementation, and if m is asymptotically bigger than n2 / log n you’d prefer the unsorted sequence approach.
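As a rough worked example (n = 1024 is picked purely for illustration): $n^2 / \log_2 n = 1024^2 / 10 \approx 10^5$, so for a graph with about a thousand vertices the heap implementation is the better choice while the edge count stays well below roughly $10^5$, and the unsorted-sequence approach wins once it is well above that.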
(Note that, with a Fibonacci heap, the runtime of Dijkstra’s algorithm is O(m + n log n), which is never asymptotically worse than O(n2).)

Time complexity of BFS and DFS on a BinaryTree: Why O(n)?

The time complexity of BFS, or DFS, on a graph is O(V+E) because we traverse all the nodes and edges of the graph. (I get that) But for a binary tree, the time complexity of BFS and DFS is O(V)... Why is that?
I am assuming it is because of the following: O(V+E) = O(V + V-1) = O(2V) = O(V). Is this the correct reasoning? If not, an intuitive explanation would be much appreciated. Thanks
All trees have n - 1 edges, n being the number of nodes. The time complexity is still technically O(V + E), but that equates to O(n + (n-1)) = O(n).
You can actually see it in a different way, without using graphs.
n is the number of nodes.
Denote by f(n) the number of steps required to traverse the whole tree (note, the time complexity will then be O(f(n))).
Consider that for each node we need to:
either visit it, or pass through it going left, or pass through it going right, and eventually return through it, each at most one time.
All these 4 operations can happen at most once for each node. Agree?
From this we deduce that f(n) <= 4n,
because for each node we can have at most those 4 operations, and we have n nodes.
Obviously, at the same time, n <= f(n)
because we need to visit each node at least once.
Therefore,
n <= f(n) <= 4n
Applying the O notation, we get
O(n) <= O(f(n)) <= O(4n)
Recalling that O(4n) = O(n) by the properties of O (invariance under nonzero multiplicative constants), we get that
O(n) <= O(f(n)) <= O(4n) = O(n),
or
O(n) <= O(f(n)) <= O(n).
Notice that the left side of this chain of inequalities is equal to the right side, meaning that it is not merely a chain of inequalities but a chain of equalities, or
O(n) = O(f(n)) = O(n),
meaning that the complexity is O(n).
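A minimal Python sketch that makes the counting concrete (the Node class and the traversals are just illustrative; each node is handled a constant number of times, hence O(n)):

```python
from collections import deque

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def dfs(node):
    """Each node is entered exactly once, so the total work is O(n)."""
    if node is None:
        return 0
    return 1 + dfs(node.left) + dfs(node.right)

def bfs(root):
    """Each node is pushed and popped exactly once, so the total work is O(n)."""
    count, queue = 0, deque([root] if root else [])
    while queue:
        node = queue.popleft()
        count += 1
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return count
```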

Complexity of Dijkstra's Algorithm for Heap Implementation

In CLRS, the analysis of Dijkstra's algorithm is as follows:
How many times do you need to use the heap? Once to pull each node off the heap (i.e. Extract-Min in CLRS) --- O(N) times; and also every time an edge is examined --- O(E) times --- you might need to change a distance (i.e. Decrease-Key in CLRS), which means fixing the heap order. Each heap operation needs O(log N) work.
Thus, the total time complexity is O((N + E) log N), which is O(E log N) if all vertices are reachable from the source.
My Question is:
Why does the complexity become O(E log N) if all vertices are reachable from the source? Why can we ignore the O(N log N) part of O((N + E) log N)?
If all vertices are reachable from the source, then there are at least N-1 edges in the graph; therefore E >= N-1, N = O(E), and O((N + E) log N) = O((E + E) log N) = O(E log N).
If all nodes are connected there must be at least N-1 edges. So E >= N-1, thus N <= E+1 and N+E <= 2E+1, which is in O(E).
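For reference, a minimal Python sketch of the binary-heap version. Python's heapq has no Decrease-Key, so this sketch pushes duplicate entries and skips stale ones instead, which gives O(E log E) = O(E log N) heap work; the graph format (a dict of adjacency lists) is just an assumption for the example:

```python
import heapq

def dijkstra(graph, source):
    """graph: dict mapping vertex -> list of (neighbor, weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]                       # (distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)             # Extract-Min, O(log N) each
        if d > dist.get(u, float('inf')):
            continue                           # stale entry: stands in for Decrease-Key
        for v, w in graph.get(u, []):          # every edge examined once overall
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))  # push a new entry instead of Decrease-Key
    return dist
```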

Find 3 elements in each of 3 arrays that sum to a given value

Let A, B, C be 3 arrays of n elements each. Find an algorithm for determining whether there exist a in A, b in B, c in C such that a + b + c = k.
I have tried the following algorithm, but it takes O(n²):
Sort all 3 arrays. - O(n log n)
Temporary array h = k - (a+b) - O(n)
For every h, find c' in B such that c' = h - B[i] - O(n)
Search c' in C using binary search - O(log n)
Total = O(n log n) + O(n) + O(n² log n)
Can we solve it in O(n log n)?
Your question asks about solving the problem 3SUMx1, in linearithmic time, which is shown to reduce to 3SUMx3 in randomized linear time. See here for the reduction.
Unless you're about to publish something very big, I doubt that there can be such a fast algorithm for your problem, which is at least as hard as 3SUM (you can also show the reduction in the opposite direction with some work, too).
Edit: To make the above paragraph clear, the linear-time reduction from 3SUM proves that OP's problem is $\Omega(n^{1.5})$.
This is just a variation of the 3SUM problem; you cannot solve it in O(n log n).
It can be solved in O(n^2). The algorithm you described is wrong: it does not consider all combinations of indexes from A and B. See https://en.wikipedia.org/wiki/3SUM
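A minimal sketch of the usual O(n²) expected-time approach in Python (hash the values of C, then check every (a, b) pair; the function name is just illustrative):

```python
def three_array_sum(A, B, C, k):
    """Return True if some a in A, b in B, c in C satisfies a + b + c == k."""
    c_values = set(C)                  # O(n) expected build time
    for a in A:                        # O(n^2) pairs in total
        for b in B:
            if k - a - b in c_values:  # O(1) expected lookup
                return True
    return False
```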
