What is the order of growth of the running time of the following code if the graph uses an adjacency-list representation, where V is the number of vertices, and E is the total number of edges?
// G.V() returns number of vertices, G is the graph.
for (int v = 0; v < G.V(); v++) {
for (w : G.adj(v)) {
System.out.println(v + "-" + w);
}
}
Why is the time complexity of the above code Theta(V+E), where V is the number of vertices and E is the number of edges?
I believe that if we let printing be the cost function, then it should be Theta(sum of degrees of each v) = Theta(2E) = Theta(E) because we enter the inner loop deg(v) times for vertex v.
if we let printing be the cost function, then
Using such assumption, yes, there will be Theta(E) println calls.
However, generally the execution time does not depend only on printing, but also on other instructions such as v++, there will be Theta(V+E) of them all.
Related
I am going through this link for adjacency list representation.
http://www.geeksforgeeks.org/graph-and-its-representations/
I have a simple doubt in some part of a code as follows :
// A utility function to print the adjacenncy list representation of graph
void printGraph(struct Graph* graph)
{
int v;
for (v = 0; v < graph->V; ++v)
{
struct AdjListNode* pCrawl = graph->array[v].head;
printf("\n Adjacency list of vertex %d\n head ", v);
while (pCrawl)
{
printf("-> %d", pCrawl->dest);
pCrawl = pCrawl->next;
}
printf("\n");
}
}
Since, here for every V while loop is executed say d times where d is the degree of each vertex.
So, I think time complexity is like
d0 + d1 + d2 + d3 ....... +dV where di is the degree of each vertex.
All this sums up O(E), but the link says that time complexity is O(|V|+|E|)
Not sure what is the problem with the understanding. Some help needed here
The important thing here is that for the time complexity to be valid, we need to cover every possible situation:
The outer loop is executed O(|V|) regardless of the graph structure.
Even if we had no edges at all, for every iteration of the outer loop, we would have to do a constant number of operations (O(1))
The inner loop is executed once for every edge, thus O(deg(v)) times, where deg(v) is the degree of the current node.
Thus the runtime of a single iteration of the outer loop is O(1 + deg(v)). Note that we cannot leave out the 1, because deg(v) might be 0 but we still need to do some work in that iteration
Summing it all up, we get a runtime of O(|V| * 1 + deg(v1) + deg(v2) + ...) = O(|V| + |E|).
As mentioned before, |E| could be rather small such that we still need
to account for the work done exclusively in the outer loop. Thus, we cannot simply remove the |V| term.
In an undirected graph with V vertices and E edges how would you count the number of triangles in O(|V||E|)? I see the algorithm here but I'm not exactly sure how that would be implemented to achieve that complexity. Here's the code presented in that post:
for each edge (u, v):
for each vertex w:
if (v, w) is an edge and (w, u) is an edge:
return true
return false
Would you use an adjacency list representation of the graph to traverse all edges in the outer loop and then an adjacency matrix to check for the existence of the 2 edges in the inner loop?
Also, I saw a another solution presented as O(|V||E|) which involves performing a depth-first search on the graph and when you encounter a backedge (u,v) from the vertex u you're visiting check if the grandparent of the vertex u is vertex v. If it is then you have found a triangle. Is this algorithm correct? If so, wouldn't this be O(|V|+|E|)? In the post I linked to there is a counterexample for the breadth-first search solution offered up but based on the examples I came up with it seems like the depth-first search method I outlined above works.
Firstly, note that the algorithm does not so much count the number of triangles, but rather returns whether one exists at all.
For the first algorithm, the analysis becomes simple if we assume that we can do the lookup of (a, b) is an edge in constant time. (Since we loop over all vertices for all edges, and only do something with constant time we get O(|V||E|*1). ) Telling whether something is a member of a set in constant time can be done using for example a hashtable/set. We could also, as you said, do this by the use of the adjacency matrix, which we could create beforehand by looping over all the edges, not changing our total complexity.
An adjacency list representation for looping over the edges could perhaps be used, but traversing this may be O(|V|+|E|), giving us the total complexity O(|V||V| + |V||E|) which may be more than we wanted. If that is the case, we should instead loop over this first, and add all our edges to a normal collection (like a list).
For your proposed DFS algorithm, the problem is that we cannot be sure to encounter a certain edge as a backedge at the correct moment, as is illustrated by the following counterexample:
A -- B --- C -- D
\ / |
E ----- F
Here if we look from A-B-C-E, and then find the backedge E-B, we correctly find the triangle; but if we instead go A-B-C-D-F-E, the backedges E-B, and E-C, do no longer satisfy our condition.
This is a naive approach to count the number of cycles.
We need the input in the form of an adjacency matrix.
public int countTricycles(int [][] adj){
int n = adj.length;
int count = 0;
for(int i = 0; i < n ;i++){
for(int j = 0; j < n; j++){
if(adj[i][j] != 0){
for(int k = 0; k < n; k++){
if(k!=i && adj[j][k] != 0 && adj[i][k] != 0 ){
count++;
}
}
}
}
}
return count/6;
}
The complexity would be O(n^3).
Let G = (V, E) be a directed graph with nodes v_1, v_2,..., v_n. We say that G is an ordered graph if it has the following properties.
Each edge goes from a node with lower index to a node with a higher index. That is, every directed edge has the form (v_i, v_j) with i < j.
Each node except v_n has at least one edge leaving it. That is, for every node v_i, there is at least one edge of the form (v_i, v_j).
Give an efficient algorithm that takes an ordered graph G and returns the length of the longest path that begins at v_1 and ends at v_n.
If you want to see the nice latex version: here
My attempt:
Dynamic programming. Opt(i) = max {Opt(j)} + 1. for all j such such j is reachable from i.
Is there perhaps a better way to do this? I think even with memoization my algorithm will still be exponential. (this is just from an old midterm review I found online)
Your approach is right, you will have to do
Opt(i) = max {Opt(j)} + 1} for all j such that j is reachable from i
However, this is exponential only if you run it without memoization. With memoization, you will have the memoized optimal value for every node j, j > i, when you are on node i.
For the worst case complexity, let us assume that every two nodes that could be connected are connected. This means, v_1 is connected with (v_2, v_3, ... v_n); v_i is connected with (v_(i+1), v_(i+2), ... v_n).
Number of Vertices (V) = n
Hence, number of edges (E) = n*(n+1)/2 = O(V^2)
Let us focus our attention on a vertex v_k. For this vertex, we have to go through the already derived optimal values of (n-k) nodes.
Number of ways of reaching v_k directly = (k-1)
Hence worst case time complexity => sigma((k-1)*(n-k)) from k=1 to k=n, which is a sigma of power 2 polynomical, and hence will result in O(n^3) Time complexity.
Simplistically, the worst case time complexity is O(n^3) == O(V^3) == O(E) * O(V) == O(EV).
Thanks to the first property, this problem can be solved O(V^2) or even better with O(E) where V is the number of vertices and E is the number of edges. Indeed, it uses the dynamic programming approach which is quiet similar with the one you gives. Let opt[i] be the length of the longest path for v_1 to v_i. Then
opt[i] = max(opt[j]) + 1 where j < i and we v_i and v_j is connected,
using this equation, it can be solved in O(V^2).
Even better, we can solve this in another order.
int LongestPath() {
for (int v = 1; v <= V; ++v) opt[v] = -1;
opt[1] = 0;
for (int v = 1; v <= V; ++v) {
if (opt[v] >= 0) {
/* Each edge can be visited at most once,
thus the runtime time is bounded by |E|.
*/
for_each( v' can be reached from v)
opt[v'] = max(opt[v]+1, opt[v']);
}
}
return opt[V];
}
I have a weighted graph, no negative weights, and I would like to find the path from one node to another, trying to minimize the cost for the single step. I don't need to minimize the total cost of the trip (as e.g. Dijkstra does) but the average step-cost. However, I have a constraint: K, the maximum number of nodes in the path.
So for example to go from A to J maybe Dijkstra would find this path (between parenthesis the weight)
A (4) D (6) J -> total cost: 10
and the algorithm I need, setting K = 10, would find something like
A (1) B (2) C (2) D (1) E (3) F (2) G (1) H (3) J -> total cost: 15
Is there any well known algorithm for this problem?
Thanks in advance.
Eugenio
Edit as answer to templatetypedef.
Some questions:
1) The fact that it can happen to take a cycle multiple times to drive down the average is not good for my problem: maybe I should have mentioned it but I don' want to visit the same node more than once
2) Is it possible to exploit the fact that I don't have negative weights?
3) When you said O(kE) you meant for the whole algorithm or just for the additional part?
Let's take this simple implementation in C where n=number of nodes e=number of edges, d is a vector with the distances, p a vector with the predecessor and a structure edges (u,v,w) memorize the edges in the graphs
for (i = 0; i < n; ++i)
d[i] = INFINITY;
d[s] = 0;
for (i = 0; i < n - 1; ++i)
for (j = 0; j < e; ++j)
if (d[edges[j].u] + edges[j].w < d[edges[j].v]){
d[edges[j].v] = d[edges[j].u] + edges[j].w;
p[edges[j].v] = u;
}
I'm not sure how I should modify the code according to your answer; to take into consideration the average instead of the total cost should this be enough?
for (i = 0; i < n; ++i)
d[i] = INFINITY;
d[s] = 0;
for (i = 0; i < n - 1; ++i)
steps = 0;
for (j = 0; j < e; ++j)
if ( (d[edges[j].u]+ edges[j].w)/(steps+1) < d[edges[j].v]/steps){
d[edges[j].v] = d[edges[j].u] + edges[j].w;
p[edges[j].v] = u;
steps++;
}
But anyway I don't know how take into consideration the K limit at the same time...Thanks again in advance for your help.
Edit
Since I can afford some errors I'm thinking about this naif solution:
precompute all the shortest paths and memorize in A
precompute all the shortest paths on a modified graph, where I cut the edges over a certain weight and memorize them in B
When I need a path, I look in A, e.g. from x to y this is the path
x->z->y
then for each step I look in B,
so for x > z I see if there is a connection in B, if not I keep x > z otherwise I fill the path x > z with the subpath provided by B, that could be something like x->j->h->z; then I do the same for z->y.
Each time I will also check if I'm adding a cyclic path.
Maybe I will get some weird paths but it could work in most of the case.
If I extend the solution trying with different "cut thresholds" maybe I can also be close to respect the K constrain.
I believe that you can solve this using a modified version of the Bellman-Ford algorithm.
Bellman-Ford is based on the following dynamic programming recurrence that tries to find the shortest path from some start node s to each other node that's of length no greater than m for some m. As a base case, when you consider paths of length zero, the only reachable node is s and the initial values are
BF(s, t, 0) = infinity
BF(s, s, 0) = 0
Then, if we know the values for a path of length m, we can find it for paths of length m + 1 by noting that the old path may still be valid, or we want to extend some path by length one:
BF(s, t, m + 1) = min {
BF(s, t, m),
BF(s, u, m) + d(u, t) for any node u connected to t
}
The algorithm as a whole works by noting that any shortest path must have length no greater than n and then using the above recurrence and dynamic programming to compute the value of BF(s, t, n) for all t. Its overall runtime is O(EV), since there are E edges to consider at each step and V total vertices.
Let's see how we can change this algorithm to solve your problem. First, to limit this to paths of length k, we can just cut off the Bellman-Ford iteration after finding all shortest paths of length up to k. To find the path with lowest average cost is a bit trickier. At each point, we'll track two quantities - the length of the shortest path reaching a node t and the average length of that path. When considering new paths that can reach t, our options are to either keep the earlier path we found (whose cost is given by the shortest path so far divided by the number of nodes in it) or to extend some other path by one step. The new cost of that path is then given by the total cost from before plus the edge length divided by the number of edges in the old path plus one. If we take the cheapest of these and then record both its cost and number of edges, at the end we will have computed the path with lowest average cost of length no greater than k in time O(kE). As an initialization, we will say that the path from the start node to itself has length 0 and average cost 0 (the average cost doesn't matter, since whenever we multiply it by the number of edges we get 0). We will also say that every other node is at distance infinity by saying that the average cost of an edge is infinity and that the number of edges is one. That way, if we ever try computing the cost of a path formed by extending the path, it will appear to have average cost infinity and won't be chosen.
Mathematically, the solution looks like this. At each point we store the average edge cost and the total number of edges at each node:
BF(s, t, 0).edges = 1
BF(s, t, 0).cost = infinity
BF(s, s, 0).edges = 0
BF(s, s, 0).cost = 0
BF(s, t, m + 1).cost = min {
BF(s, t, m).cost,
(BF(s, u, m).cost * BF(s, u, m).edges + d(u, t)) / (BF(s, u, m).edges + 1)
}
BF(s, t, m + 1).edges = {
BF(s, t, m).edges if you chose the first option above.
BF(s, u, m).edges + 1 else, where u is as above
}
Note that this may not find a simple path of length k, since minimizing the average cost might require you to take a cycle with low (positive or negative) cost multiple times to drive down the average. For example, if a graph has a cost-zero loop, you should just keep taking it as many times as you can.
EDIT: In response to your new questions, this approach won't work if you don't want to duplicate nodes on a path. As #comestibles has pointed out, this version of the problem is NP-hard, so unless P = NP you shouldn't expect to find any good polynomial-time algorithm for this problem.
As for the runtime, the algorithm I've described above runs in total time O(kE). This is because each iteration of computing the recurrence takes O(E) time and there are a total of k iterations.
Finally, let's look at your proposed code. I've reprinted it here:
for (i = 0; i < n - 1; ++i) {
steps = 0;
for (j = 0; j < e; ++j) {
if ( (d[edges[j].u]+ edges[j].w)/(steps+1) < d[edges[j].v]/steps){
d[edges[j].v] = d[edges[j].u] + edges[j].w;
p[edges[j].v] = u;
steps++;
}
}
}
Your first question was how to take k into account. This can be done easily by rewriting the outer loop to count up to k, not n - 1. That gives us this code:
for (i = 0; i < k; ++i) {
steps = 0;
for (j = 0; j < e; ++j) {
if ( (d[edges[j].u]+ edges[j].w)/(steps+1) < d[edges[j].v]/steps){
d[edges[j].v] = d[edges[j].u] + edges[j].w;
p[edges[j].v] = u;
steps++;
}
}
}
One problem that I'm noticing is that the modified Bellman-Ford algorithm needs to have each candidate best path store its number of edges independently, since each node's optimal path might be reached by a different number of edges. To fix this, I would suggest having the d array store two values - the number of edges required to reach the node and the average cost of a node along that path. You would then update your code by replacing the steps variable in these equations with the cached path lengths.
Hope this helps!
For the new version of your problem, there's a reduction from Hamilton path (making your problem intractable). Take an instance of Hamilton path (i.e., a graph whose edges are assumed to have unit weight), add source and sink vertices and edges of weight 2 from the source to all others and from the sink to all others. Set K = |V| + 2 and request a path from source to sink. There exists a Hamilton path if and only if the optimal mean edge length is (|V| + 3)/(|V| + 2).
Care to tell us why you want these paths so that we can advise you of a reasonable approximation strategy?
You can slightly modify Bellman-Ford algorithm to find minimum path using at most K edges/nodes.
If the number of edges is fixed than you have to minimize total cost, because average cost would be TotalCost/NumberOfEdges.
One of the solutions would be to iterate NumberOfEdges from 1 to K, find minimal total cost and choose minimum TotalCost/NumberOfEdges.
I have written a code that solves MST using Prim method. I read that this kind of implementation(using priority queue) should have O(E + VlogV) = O(VlogV) where E is the number of edges and V number of Edges but when I look at my code it simply doesn't look that way.I would appreciate it if someone could clear this up for me.
To me it seems the running time is this:
The while loop takes O(E) times(until we go through all the edges)
Inside that loop we extract an element from the Q which takes O(logE) time.
And the second inner loop takes O(V) time(although we dont run this loop everytime
it is clear that it will be ran V times since we have to add all the vertices )
My conclusion would be that the running time is: O( E(logE+V) ) = O( E*V ).
This is my code:
#define p_int pair < int, int >
int N, M; //N - nmb of vertices, M - nmb of edges
int graph[100][100] = { 0 }; //adj. matrix
bool in_tree[100] = { false }; //if a node if in the mst
priority_queue< p_int, vector < p_int >, greater < p_int > > Q;
/*
keeps track of what is the smallest edge connecting a node in the mst tree and
a node outside the tree. First part of pair is the weight of the edge and the
second is the node. We dont remember the parent node beaceuse we dont need it :-)
*/
int mst_prim()
{
Q.push( make_pair( 0, 0 ) );
int nconnected = 0;
int mst_cost = 0;
while( nconnected < N )
{
p_int node = Q.top(); Q.pop();
if( in_tree[ node.second ] == false )
{
mst_cost += node.first;
in_tree[ node.second ] = true;
for( int i = 0; i < N; ++i )
if( graph[ node.second ][i] > 0 && in_tree[i]== false )
Q.push( make_pair( graph[ node.second ][i], i ) );
nconnected++;
}
}
return mst_cost;
}
You can use adjacency lists to speed your solution up (but not for dense graphs), but even then, you are not going to get O(V log V) without a Fibonacci heap..
Maybe the Kruskal algorithm would be simpler for you to understand. It features no priority queue, you only have to sort an array of edges once. It goes like this basically:
Insert all edges into an array and sort them by weight
Iterate over the sorted edges, and for each edge connecting nodes i and j, check if i and j are connected. If they are, skip the edge, else add the edge into the MST.
The only catch is to be quickly able to say if two nodes are connected. For this you use the Union-Find data structure, which goes like this:
int T[MAX_#_OF_NODES]
int getParent(int a)
{
if (T[a]==-1)return a;
return T[a]=getParent(T[a]);
}
void Unite(int a,int b)
{
if (rand()&1)
T[a]=b;
else
T[b]=a;
}
In the beginning, just initialize T to all -1, and then every time you want to find out if nodes A and B are connected, just compare their parents - if they are the same, they are connected (like this getParent(A)==getParent(B)). When you are inserting the edge to MST, make sure to update the Union-Find with Unite(getParent(A),getParent(B)).
The analysis is simple, you sort the edges and iterate over using the UF that takes O(1). So it is O(E logE + E ) which equals O(E log E).
That is it ;-)
I did not have to deal with the algorithm before, but what you have implemented does not match the algorithm as explained on Wikipedia. The algorithm there works as follows.
But all vertices into the queue. O(V)
While the queue is not empty... O(V)
Take the edge with the minimum weight from the queue. O(log(V))
Update the weights of adjacent vertices. O(E / V), this is the average number of adjacent vertices.
Reestablish the queue structure. O(log(V))
This gives
O(V) + O(V) * (O(log(V)) + O(V/E))
= O(V) + O(V) * O(log(V)) + O(V) * O(E / V)
= O(V) + O(V * log(V)) + O(E)
= O(V * log(V)) + O(E)
exactly what one expects.