Why does E dominate V?

I analyzed the running time of Kruskal's algorithm and came up with O(E log E + E log V + V).
I asked my professor, and he said that if the graph is very sparse, with many isolated vertices, then V dominates E, which makes sense; otherwise E dominates V, and I cannot understand why.
I can give an example where the graph is not sparse but V is still greater than E.
Can anyone help me clear up this confusion?

A tree in an undirected graph has |V| - 1 edges.
Since a tree is the connected graph with the fewest possible edges, every connected undirected graph has |E| in Omega(|V|), so |V| is dominated by |E|.
Put differently: if |E| < |V| - 1, the graph is not connected.
Now, since Kruskal's algorithm is designed to find a spanning tree, you can abort the algorithm as soon as you see that |E| < |V| - 1 - there is no spanning tree at all, so there is no point looking for one.
From this we conclude that when |E| < |V| - 1 there is no point in discussing the complexity of Kruskal's algorithm, so we can safely assume |E| >= |V| - 1, and therefore |V| is dominated by |E|.

Density = number of edges / number of possible edges = E / (V(V-1)/2)
Let the graph be a tree, so E = V - 1 and therefore V = E + 1.
Kruskal's complexity is then
O(E log E + E log V + V) = O(E log E + E log(E + 1) + (E + 1)) = O(E log E)
So E dominates. E will dominate as long as V = O(E), which holds for every connected graph since E >= V - 1.
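To make the term-by-term accounting concrete, here is a minimal Kruskal sketch in Python (my own illustration, not from the question), with each phase annotated and with the early exit mentioned in the first answer. It assumes edges are given as (weight, u, v) triples over vertices numbered 0..V-1:

def kruskal(num_vertices, edges):
    # edges: list of (weight, u, v) triples, vertices numbered 0 .. num_vertices - 1
    if len(edges) < num_vertices - 1:
        return None                          # |E| < |V| - 1: no spanning tree can exist
    parent = list(range(num_vertices))       # union-find initialisation: O(V)

    def find(x):                             # find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):            # sorting the edges: O(E log E)
        ru, rv = find(u), find(v)            # union-find work over all edges: roughly O(E log V)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree if len(tree) == num_vertices - 1 else None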


Find the N highest-cost vertices that have a path to S, where S is a vertex in an undirected graph G

I would like to know what would be the most efficient way (w.r.t. space and time) to solve the following problem:
Given an undirected graph G = (V, E), a positive number N and a vertex S in V. Assume that every vertex in V has a cost value. Find the N highest-cost vertices that are connected to S.
For example:
G = (V, E)
V = {v1, v2, v3, v4},
E = {(v1, v2),
(v1, v3),
(v2, v4),
(v3, v4)}
v1 cost = 1
v2 cost = 2
v3 cost = 3
v4 cost = 4
N = 2, S = v1
result: {v3, v4}
This problem can be solved easily with a graph traversal algorithm (e.g., BFS or DFS). To find the vertices connected to S, we can run either BFS or DFS starting from S. Since BFS and DFS have the same asymptotic complexity (time: O(V+E), space: O(V+E) including the graph representation), here I am going to show the pseudocode using DFS:
Parameter Definition:
* G -> Graph
* S -> Starting node
* N -> Number of connected (highest cost) vertices to find
* Cost -> Array of size V, contains the vertex cost value
procedure DFS-traversal(G, S, N, Cost):
    let St be a stack
    let Q be a min-priority-queue of <cost, vertex-id> pairs
    let discovered be an array (of size V) to mark already visited vertices
    St.push(S)
    // Comment: if you do not want to consider the case "S is connected to S",
    // you can comment out the following line
    Q.push(make-pair(Cost[S], S))
    label S as discovered
    while St is not empty:
        v = St.pop()
        for all edges from v to w in G.adjacentEdges(v) do:
            if w is not labeled as discovered:
                label w as discovered
                St.push(w)
                Q.push(make-pair(Cost[w], w))
                if Q.size() == N + 1:
                    Q.pop()    // drop the cheapest of the N + 1 candidates
    let ret be an N-sized array
    while Q is not empty:
        ret.append(Q.top().second)
        Q.pop()
    return ret
Let me first describe the process. Here I run the iterative version of DFS to traverse the graph starting from S. During the traversal, I use a priority queue to keep the N highest-cost vertices that are reachable from S. Instead of the priority queue, we could also use a simple array (or even reuse the discovered array) to record the reachable vertices together with their costs.
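If you want something directly runnable, here is a small Python sketch of the same idea (my own illustration, using heapq as the min-priority-queue and an adjacency-list dict; the function and variable names are assumptions, not part of the question):

import heapq

def n_highest_cost_connected(adj, cost, S, N):
    # adj: dict mapping vertex -> list of neighbours; cost: dict vertex -> cost
    discovered = {S}
    stack = [S]
    best = [(cost[S], S)]              # min-heap of (cost, vertex), size kept <= N
    while stack:
        v = stack.pop()
        for w in adj.get(v, []):
            if w not in discovered:
                discovered.add(w)
                stack.append(w)
                heapq.heappush(best, (cost[w], w))
                if len(best) > N:
                    heapq.heappop(best)    # drop the cheapest candidate
    return [v for _, v in best]

# Example from the question: expected result {v3, v4}
adj = {'v1': ['v2', 'v3'], 'v2': ['v1', 'v4'], 'v3': ['v1', 'v4'], 'v4': ['v2', 'v3']}
cost = {'v1': 1, 'v2': 2, 'v3': 3, 'v4': 4}
print(n_highest_cost_connected(adj, cost, 'v1', 2))    # prints v3 and v4, in some order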
Analysis of space-complexity:
To store the graph: O(E)
Priority-queue: O(N)
Stack: O(V)
For labeling discovered: O(V)
Adding these up, the overall space complexity is O(V + E).
Analysis of time-complexity:
DFS-traversal: O(V+E)
To track N highest cost vertices:
By maintaining priority-queue: O(V*logN)
Or alternatively using array: O(V*logV)
The overall time-complexity would be: O(V*logN + E) or O(V*logV + E)

Running time of a variation of Prim's algorithm

I wanted to find a "minimum spanning tree" given a set A of edges that must be included in it (so it is not a true minimum spanning tree, but, given A, it has the least possible total weight). The "minimum spanning tree" must therefore contain all edges of A. I made some modifications to Prim's algorithm, which can be found below. I then wanted to find the running time of this algorithm, but I'm having trouble finding the running time of checking whether the intersection of two sets is empty.
Could somebody please help me? And what would the total running time be? I have already put the running time of each step next to that step, except for the "?".
Notation clarification:
δ(W) = {{v ,w} ∈ E : w ∈ W, v ∈ V\W} for W ⊂ V
algorithm:
1. T = ∅, W = {v} for some v ∈ V                              O(1)
2. While W ≠ V                                                n iterations
       If (A ∩ δ(W) ≠ ∅) do                                   ?
           Take e = {v, w} ∈ (A ∩ δ(W))                       O(1)
           T = T ∪ {e}                                        O(1)
           W = W ∪ {v, w}                                     O(1)
       Else
           Find e = {v, w} ∈ δ(W) s.t. c_e ≤ c_f ∀ f ∈ δ(W)   O(m)
           T = T ∪ {e}                                        O(1)
           W = W ∪ {v, w}                                     O(1)
   End while
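For what it's worth, here is a small Python sketch of the loop above (my own illustration, not part of the question). It checks A ∩ δ(W) simply by scanning δ(W) and testing membership of each edge in a hash set holding A, so that step costs O(|δ(W)|) = O(m) per iteration under these assumptions:

def modified_prim(vertices, edges, A, weight):
    # vertices: iterable of vertices; edges: iterable of frozenset({u, v})
    # A: set of edges that must be in the tree; weight: dict edge -> cost
    vertices = set(vertices)
    W = {next(iter(vertices))}            # W = {v} for some v in V, O(1)
    T = set()
    while W != vertices:                  # n iterations
        # delta(W): edges with exactly one endpoint in W, found by scanning all edges: O(m)
        delta = [e for e in edges if len(e & W) == 1]
        if not delta:
            return None                   # graph is disconnected: no spanning tree
        # A ∩ delta(W): scan delta(W) and test membership in A, O(|delta(W)|)
        forced = [e for e in delta if e in A]
        e = forced[0] if forced else min(delta, key=lambda d: weight[d])
        T.add(e)                          # T = T ∪ {e}
        W |= e                            # W = W ∪ {v, w}
    return T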

What is O(n*log m) + O(m)?

I am confused about addition with big O notation.
I'm to create an algorithm to find an MST of a graph, with some other requirements, for a school problem. Its time complexity is supposed to be O(E * log V), where E is the number of edges and V the number of vertices in the graph. I have arrived at a solution that is in O(E * log V) + O(V).
Does it hold that O(E * log V) + O(V) = O(E * log V)?
Thank you for all the answers! I am assuming this complexity for connected graphs; on graphs that are not connected, my algorithm works in O(E * log V).
For any x, you can make a graph with x edges and 2ˣ (mostly disconnected) vertices.
For such a graph, E log V = x², so (V + E log V)/(E log V) = (2ˣ+x²)/x².
This grows without bound as x increases, so O(E log V) + O(V) is NOT the same as O(E log V), even for graphs.
HOWEVER, if you restrict attention to connected graphs, then E >= V - 1. In that case, as long as V >= 2, you have E >= 1 and log V >= 1, so V <= E + 1 <= 2E and therefore V + E log V <= 2E + E log V <= 3(E log V).
So O(E log V) = O(E log V) + O(V) for connected graphs.
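A quick numeric check of that counterexample (my own snippet, using the x-edges / 2^x-vertices construction above):

# ratio (V + E log V) / (E log V) = (2**x + x*x) / (x*x) for the construction above
for x in (4, 8, 16, 32):
    print(x, (2 ** x + x * x) / (x * x))
# prints 2.0, 5.0, 257.0, ~4.2e6: the ratio grows without bound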
O(ElogV+V) is not the same as O(ElogV). In general V can be arbitrarily larger than ElogV, which makes the two complexity classes different.
But, assuming you have an O(ElogV + V) time algorithm for finding an MST if one exists, you can turn it into a guaranteed O(ElogV) time algorithm, assuming the graph is represented in adjacency list form.
We can determine, in O(E) time, whether E >= V/2. Go through the vertices of the graph and check whether each one has at least one adjacent edge. If you find a vertex with no adjacent edges, the graph clearly has no MST, since that vertex is not connected to the rest of the graph. If you get through all the vertices, you know that E >= V/2. If you find a vertex with no adjacent edges after n steps, you know the graph has at least (n-1)/2 edges, so this procedure takes O(E) time (even though naively it looks like O(V) time).
If E is less than V/2, the graph is disconnected (since in a connected graph, E>=V-1), and there's no MST.
So: check if E>=V/2 and only if so, run your MST algorithm.
This takes O(E + ElogV + V) = O(E + ElogV + 2E) = O(ElogV) time.
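A minimal sketch of that pre-check, assuming an adjacency-list representation (the names are mine, purely illustrative):

def might_have_mst(adj):
    # adj[v] lists the edges adjacent to vertex v
    for neighbours in adj:
        if not neighbours:
            return False     # isolated vertex: the graph is disconnected, no MST exists
    return True              # every vertex has degree >= 1, hence E >= V/2

# Run the O(E log V + V) MST algorithm only when might_have_mst(adj) returns True.
# By the argument above, the loop stops within O(E) steps even though it looks like O(V).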

Dropping non-constants in Algorithm Complexity

So, basically I'm implementing an algorithm to calculate distances from one source node to every other node in a weighted graph, and if a node is in a negative cycle, it detects and marks that node as such.
My question regards the total time complexity of my algorithm. Assume V is number of nodes and E the number of edges.
The algorithm starts by reading E lines of input that specify the edges of the graph and inserting them into the corresponding adjacency lists. This operation is O(E).
I apply the Bellman-Ford relaxation pass V-1 times to compute the distances, and then apply it V-1 more times to detect the nodes in a negative cycle. This is 2 * O(VE) = O(VE).
I print a distance vector of size V to display the distances and/or whether each node is in a negative cycle. O(V).
So I guess my total complexity would be O(VE + V + E). Now my question is: since VE is almost always bigger than V + E (for large values it always is!), can I drop the V + E term and simply write O(VE)?
Yes, O(VE + V + E) simplifies to O(VE) given that V and E represent the number of vertices and edges in a graph. For a highly connected graph, E = O(V^2) and so in that case VE + V + E = O(V^3) = O(VE). For a sparse graph, E = O(V) (note, this is not necessarily a tight upper bound) and so VE + V + E = O(V^2) = O(VE). In all cases O(VE) is an appropriate upper bound on the complexity.
Yes, when dealing with asymptotic complexity, you always assume that V and E are very large (in theory, you study complexity by taking limits as V and E approach infinity). In pretty much the same way that you can write n^2 + n = O(n^2), in your case VE + V + E is O(VE).
Note that the worst-case complexity of Bellman-Ford actually is O(VE), which confirms that your reasoning is correct.
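For illustration, here is a minimal sketch of the procedure the question describes (the names and the directed edge-list representation are my assumptions): V-1 relaxation passes to compute distances, then V-1 more passes in which any vertex that can still be improved, or is reached from one that can, is marked as affected by a negative cycle.

def bellman_ford_mark_negative(num_vertices, edges, src):
    # edges: list of directed (u, v, w) triples; overall cost: O(VE)
    INF = float('inf')
    dist = [INF] * num_vertices
    dist[src] = 0
    # V-1 relaxation passes to compute shortest distances: O(VE)
    for _ in range(num_vertices - 1):
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # V-1 more passes: anything that still improves (or is reachable from
    # something that does) lies on or behind a negative cycle
    in_neg_cycle = [False] * num_vertices
    for _ in range(num_vertices - 1):
        for u, v, w in edges:
            if dist[u] != INF and (dist[u] + w < dist[v] or in_neg_cycle[u]):
                dist[v] = dist[u] + w
                in_neg_cycle[v] = True
    return dist, in_neg_cycle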

Graph Minimum Spanning Tree using BFS

This is a problem from a practice exam that I'm struggling with:
Let G = (V, E) be a weighted undirected connected graph, with positive
weights (you may assume that the weights are distinct). Given a real
number r, define the subgraph Gr = (V, {e in E | w(e) <= r}). For
example, G0 has no edges (obviously disconnected), and Ginfinity = G
(which by assumption is connected). The problem is to find the
smallest r such that Gr is connected.
Describe an O(mlogn)-time algorithm that solves the problem by
repeated applications of BFS or DFS.
The real problem is doing it in O(mlogn). Here's what I've got:
r = min( w(e) )                                  => O(m)
while true do                                    => O(m)
    Gr = G with edges e | w(e) > r removed       => O(m)
    if | BFS( Gr ).V | < |V|                     => O(m + n)
        r++ (or r = next smallest w(e))
    else
        return r
That's a whopping O(m^2 + mn). Any ideas for getting it down to O(mlogn)? Thanks!
You are iterating over all possible edge costs, which gives the outer loop O(m) iterations. Notice that if the graph is disconnected after you discard all edges with weight > w(e), it is also disconnected after you discard all edges with weight > w(e') for any w(e') < w(e). You can use this property to do a binary search over the edge costs and thus reduce the outer loop to O(log n) iterations.
lo = min(w(e) for e in edges), hi = max(w(e) for e in edges)
while lo < hi:
    mid = (lo + hi) // 2
    if connected(graph after discarding all e where w(e) > mid):
        hi = mid        # G_mid is connected, so the answer is <= mid
    else:
        lo = mid + 1    # G_mid is disconnected, so the answer is > mid
return lo
# (this assumes integer edge weights; for arbitrary weights, binary-search
#  the sorted list of distinct edge weights instead, as in the next answer)
The binary search has a complexity of O(log(max_w - min_w)) (you can actually bring it down to O(log(edges)) by searching over the sorted distinct edge weights), and discarding edges plus determining connectivity can be done in O(edges + vertices), so this can be done in O((edges + vertices) * log(edges)).
Warning: I have not tested this in code yet, so there may be bugs. But the idea should work.
How about the following algorithm?
First take a list of all edges (or all distinct edge lengths) from the graph and sort them. That takes O(m log m) = O(m log n) time: m is at most n^2, so O(log m) = O(log n^2) = O(2 log n) = O(log n).
It is obvious that r should be equal to the weight of some edge. So you can do a binary search on the index of the edge in the sorted array.
For each index you try, take the length of the corresponding edge as r, and check the graph for connectivity using only the edges of length <= r, with BFS or DFS.
Each iteration of the binary search takes O(m + n) = O(m) (the graph is connected, so n <= m + 1), and you have to make O(log m) = O(log n) iterations, for a total of O(m log n).
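Here is a small Python sketch of that second approach (my own code, under the assumption that vertices are numbered 0..n-1 and that G itself is connected, as the problem guarantees): sort the distinct edge weights, then binary-search over them, checking connectivity with a BFS that only uses edges of weight <= r.

from collections import deque

def smallest_connecting_r(n, edges):
    # edges: list of (u, v, w) triples; n: number of vertices, numbered 0 .. n-1
    weights = sorted(set(w for _, _, w in edges))      # O(m log m) = O(m log n)

    def connected(r):
        # BFS from vertex 0 using only edges of weight <= r: O(m + n)
        adj = [[] for _ in range(n)]
        for u, v, w in edges:
            if w <= r:
                adj[u].append(v)
                adj[v].append(u)
        seen = [False] * n
        seen[0] = True
        queue = deque([0])
        count = 1
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    count += 1
                    queue.append(v)
        return count == n

    lo, hi = 0, len(weights) - 1                       # binary search over edge-weight indices
    while lo < hi:                                     # O(log m) = O(log n) iterations
        mid = (lo + hi) // 2
        if connected(weights[mid]):
            hi = mid                                   # G_r is connected, try a smaller r
        else:
            lo = mid + 1                               # G_r is disconnected, r must be larger
    return weights[lo]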
