Given: an unweighted, directed graph G=(V,E), which may contain any number of cycles.
Goal: for every vertex, I want the length of the longest simple path to some target vertex X in V.
Algorithm Idea:
For each v in V
v.distanceToTarget = DepthFirstSearch(v)
Next
DepthFirstSearch(v as Vertex)
    if v = target then
        'Distance to the target is 0 for the target itself
        return 0
    elseif v.isVisitedInCurrentDFSPath then
        'Cycle found -> I won't find the target by going around in cycles -> abort
        return -infinity
    else
        'Return the maximum distance of all successors + 1
        return max(v.Successors.ForEach(Function(s) DepthFirstSearch(s))) + 1
    end if
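Rendered as Python (a sketch with names of my own choosing; note that `isVisitedInCurrentDFSPath` only works if a vertex is marked on entry and unmarked when backtracking, a detail the pseudocode leaves implicit):

```python
def longest_path_to_target(graph, target):
    """Length of the longest simple path from each vertex to `target`.

    graph: dict mapping each vertex to a list of its successors.
    Returns -inf for vertices from which the target is unreachable.
    """
    NEG_INF = float("-inf")

    def dfs(v, on_path):
        if v == target:
            return 0          # distance from the target to itself is 0
        if v in on_path:
            return NEG_INF    # cycle found -> abort this branch
        on_path.add(v)        # mark v as visited on the current DFS path
        best = max((dfs(s, on_path) for s in graph[v]), default=NEG_INF)
        on_path.discard(v)    # unmark when backtracking
        return best + 1 if best != NEG_INF else NEG_INF

    return {v: dfs(v, set()) for v in graph}
```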
Is this correct for all cases? (Assuming that the target can be reached from every vertex.)
The number of edges in my graphs is very small.
Assume |E| <= 3*|V| holds. How would I compute the average time complexity?
Thanks!
Time complexity is about which factors influence your runtime most. In your case you're evaluating all possible paths between v and the target. That is basically O(number of routes). Now you need to figure out how to express the number of all possible routes in terms of E and V.
Most likely the result will be something like O(exp(E)) or O(exp(V)), because the number of routes going through each vertex grows exponentially as you add new possible routes.
EDIT: I missed that you were asking for the average time complexity, i.e. the amortized complexity. But since your algorithm always evaluates all possible routes, the worst-case complexity is the same as the average complexity.
Here I use pseudocode to present my algorithm; it's a variation of DFS.
The coding style imitates Introduction to Algorithms. Every time we come across a vertex, it is painted BLACK. Suppose the starting vertex is START, the target vertex is TARGET, and the graph is represented as G=(V,E). One more thing: assume that the graph is connected, and strongly connected if it's a directed graph.
FIND-ALL-PATH(G,START,TARGET)
    for each vertex u in G.V
        u.color = WHITE
    path = 0    // store the result
    DFS-VISIT(G,START)

DFS-VISIT(G,u)
    if u == TARGET
        path = path + 1
        return
    u.color = BLACK
    for each v in G.Adj[u]
        if v.color == WHITE
            DFS-VISIT(G,v)
    u.color = WHITE    // re-paint the vertex to find other possible paths
How do I analyze the time complexity of the algorithm above? If it were normal DFS, it would of course be O(N+E), because each vertex is visited only once and each edge at most twice. But what about this one? It seems hard to pin down how many times each vertex or edge is visited.
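For reference, a runnable Python version of the pseudocode above might look like this (names are my own):

```python
def count_paths(graph, start, target):
    """Count all simple paths from start to target: DFS with backtracking.

    graph: dict mapping each vertex to a list of its neighbours.
    """
    count = 0
    visited = set()           # plays the role of the BLACK colouring

    def visit(u):
        nonlocal count
        if u == target:
            count += 1        # path = path + 1
            return
        visited.add(u)        # u.color = BLACK
        for v in graph[u]:
            if v not in visited:
                visit(v)
        visited.remove(u)     # re-open u to find other possible paths

    visit(start)
    return count
```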
To analyze the time complexity for FIND-ALL-PATH, let's see what is the time complexity of DFS-VISIT. I am assuming you are using Adjacency List for representing the Graph.
Here, in one call of DFS-VISIT every vertex which is connected to u (the vertex you passed as the argument) is going to be explored once (i.e. vertex color is going to be changed to BLACK). Since this is a recursive function so in each recursion a new stack is going to be formed and there the set G:Adj[u] present in each stack is nothing but element adjacent to u. Therefore, every node in all the list put together will be examined(color is changed) exactly once and whenever they are examined, we do a constant work (i.e. O(1) operation). There are overall E elements in case of directed Graph and 2E in case of un-directed Graph in Adjacency List representation. So we say it's time is O(E), where E is the number of edges. In some books, they add extra time O(N), where N is the number of vertices, so they say they overall time complexity for DFS-VISIT is O(N+E)(I think that the reason for that extra O(N) time is the for loop which gets executed N number of times or it may be something else). BTW, N is always less than E so you can either ignore it or consider it, it doesn't affect the Asymptotic time for the DFS-VISIT.
The time complexity of the function FIND-ALL-PATH is N times the time complexity of DFS-VISIT, where N is the number of vertices in the graph. So the algorithm you wrote above is not exactly the same as the depth-first traversal algorithm, though it does the same kind of work. Your algorithm takes more time because you call DFS-VISIT for every vertex in your graph. FIND-ALL-PATH could be optimized by checking, before calling DFS-VISIT, whether the vertex's color has already been changed to BLACK (that is what is generally done in depth-first traversal).
i.e. you should have written the function like this:
FIND-ALL-PATH(G,START,TARGET)
    for each vertex u in G.V
        u.color = WHITE
    path = 0    // store the result
    for each vertex u in G.V
        if u.color is WHITE
            DFS-VISIT(G,u)
Now the function written above has the same time complexity as DFS-VISIT.
Also note that some time is taken to initialize the color of all vertices to WHITE, which is an O(N) operation.
So the overall time complexity of your FIND-ALL-PATH is O(N) + O(N*(N+E)); you can ignore the first O(N), as it is dominated by the other term.
Thus, the time complexity = O(N*(N+E)), or, if you count just O(E) for your DFS-VISIT function, O(N*E).
Let me know if you have doubt at any point mentioned above.
For a directed graph:
Suppose there are 7 vertices {0,1,2,3,4,5,6}, and consider the worst case, where every vertex is connected to every other vertex.
The number of edges traversed over all routes from x to vertex 6 is then as follows:
(6->6) 6 to 6 = 0
(5->6) 5 to 6 = 1
(4->6) 4 to 6 = (4 to 5 -> 6) + (4 to 6) = (1 + (5 -> 6)) + (1 + 0) = (1 + 1) + 1 = 3
(3->6) 3 to 6 = (3 to 4 -> 6) + (3 to 5 -> 6) + (3 to 6) = (1 + 3) + (1 + 1) + (1 + 0) = 7
(2->6) 2 to 6 = 4 + 7 + 3 + 1 = 15
(1->6) 1 to 6 = 5 + 15 + 7 + 3 + 1 = 31
(0->6) 0 to 6 = 6 + 31 + 15 + 7 + 3 + 1 = 63
So the time to cover all paths from 0 to 6 is the summation (1 + 3 + 7 + 15 + ... + T(n-1) + T(n)) + (total number of vertices - 1) = (2^(n+1) - 2 - n) + (V - 1),
where n = V - 1.
So the final time complexity = O(2^V).
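The table above can be checked with a small brute-force sketch (assuming, as in the derivation, the complete DAG with forward edges i -> j for i < j):

```python
def path_edge_total(V, src):
    """Total number of edges, summed over all paths from src to V-1,
    in the complete DAG on vertices 0..V-1 (edge i -> j whenever i < j)."""
    if src == V - 1:
        return 0
    # one edge to each later vertex j, plus everything traversed beyond j
    return sum(1 + path_edge_total(V, j) for j in range(src + 1, V))
```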
For an undirected graph:
Every edge will be traversed twice: 2 * ((2^(n+1) - 2 - n) + (V - 1)) = O(2^(V+1)) = O(2^V).
Let G=(V,E) be a directed graph with weights w: E -> R. Assume |V| is divisible by 10, and let s be a vertex in V. Describe an algorithm to find a shortest path from s to every v such that the path contains exactly |V|/10 edges.
At first I thought about using dynamic programming, but I ended up with a complexity of O(|V|^3), which is apparently not optimal for this exercise.
We can't use Bellman-Ford (as far as I understand), since there might be negative cycles.
What could be an optimal algorithm for this problem?
Thanks
EDIT
I forgot to mention a crucial piece of information: the path may be non-simple, i.e. we may repeat edges in our path.
You can perform a depth-limited search with a limit of |V|/10 on your graph. It will find, for each vertex, the least-cost walk with exactly that many edges.
limit = v_size / 10
best[v_size]    // initialize all entries to MAX_INT

depth_limited(node, length, cost)
    if length == limit
        if cost < best[node]
            best[node] = cost
        return
    for each u connected to node
        depth_limited(u, length+1, cost + w[node][u])

depth_limited(start_node, 0, 0)
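A Python sketch of this depth-limited search (names are my own; note it is exponential in general, since it enumerates every walk of the given length):

```python
def shortest_with_exact_edges(n, edges, start, limit):
    """Depth-limited search: best[v] = minimum cost of a walk from `start`
    to v using exactly `limit` edges (vertices and edges may repeat).

    edges: dict mapping u to a list of (v, weight) pairs.
    """
    INF = float("inf")
    best = [INF] * n

    def depth_limited(node, length, cost):
        if length == limit:
            if cost < best[node]:
                best[node] = cost
            return
        for v, w in edges.get(node, []):
            depth_limited(v, length + 1, cost + w)

    depth_limited(start, 0, 0)
    return best
```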
In my opinion, Bellman-Ford's algorithm SHOULD be applicable here with a slight modification.
After iteration k, the distance u_j(k) to each node would be the shortest distance from the source s using exactly k edges.
Initialize u_j(0) = infinity for all u_j, and 0 for s. Then recurrence relation would be,
u_j(k) = min {u_p(k-1) + w_{pj} | There is an edge from u_p to u_j}
Note that in this case u_j(k) may be greater than u_j(k-1).
The complexity of the above algorithm is O(|E|.|V|/10) = O(|E|.|V|).
Also, negative cycles won't matter in this case, because we stop after exactly |V|/10 iterations. Whether the cost could be decreased further by adding more edges is irrelevant.
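A sketch of this modified Bellman-Ford in Python (my own naming; each round relaxes every edge once, so after k rounds dist[j] holds the shortest walk from s using exactly k edges):

```python
def exact_k_edge_shortest(n, edges, s, k):
    """Modified Bellman-Ford: after iteration t, dist[j] is the shortest
    walk from s to j using exactly t edges. Negative cycles are harmless
    because we stop after exactly k iterations.

    edges: list of directed edges (p, j, w).
    """
    INF = float("inf")
    dist = [INF] * n
    dist[s] = 0               # u_j(0) = infinity for all j, 0 for s
    for _ in range(k):
        new = [INF] * n
        for p, j, w in edges:
            # u_j(k) = min over predecessors p of u_p(k-1) + w_pj
            if dist[p] + w < new[j]:
                new[j] = dist[p] + w
        dist = new
    return dist
```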
I'm looking for an algorithm to solve this problem. I have to implement it (so I need a non-NP solution XD).
I have a complete graph with a cost on each arc and a reward on each vertex. I have only a start point; the end point doesn't matter, because the problem is to find a path that sees as many vertices as possible, in order to collect the maximum possible reward, subject to a maximum cost limit (which is why the end position doesn't matter).
I think finding the optimum solution is an NP-hard problem, but an approximate solution is also appreciated :D
Thanks
I'm trying to study how to solve the problem with branch & bound...
update: complete problem description
I have a region in which there are several areas, each identified by its id and its x,y,z position. Each vertex identifies one of these areas. The maximum number of areas is 200.
From a start point S, I know the cost to reach each vertex from every other vertex (a complete graph), specified in seconds and stored on the arc (so only integer values).
When I visit a vertex I get a reward (float values).
My objective is to find a path in the graph that maximizes the reward, subject to a cost constraint on the path: I only have limited time to complete it (for example 600 seconds).
The graph is represented as adjacency matrices for the cost and the reward (but I can change the representation if that is useful).
I can visit a vertex more than once, but I collect its reward only once!
Since you're interested in branch and bound, let's formulate a linear program. Use Floyd–Warshall to adjust the costs minimally downward so that cost(uw) ≤ cost(uv) + cost(vw) for all vertices u, v, w.
Let s be the starting vertex. We have 0-1 variables x(v) that indicate whether vertex v is part of the path and 0-1 variables y(uv) that indicate whether the arc uv is part of the path. We seek to maximize
sum over all vertices v of reward(v) x(v).
The constraints unfortunately are rather complicated. We first relate the x and y variables.
for all vertices v ≠ s, x(v) - sum over all vertices u of y(uv) = 0
Then we bound the cost.
sum over all arcs uv of cost(uv) y(uv) ≤ budget
We have (pre)flow constraints to ensure that the arcs chosen look like a path possibly accompanied by cycles (we'll handle the cycles shortly).
for all vertices v,
    sum over all vertices u of y(uv) - sum over all vertices w of y(vw)
        ≥ -1 if v = s
        ≥ 0 if v ≠ s
To handle the cycles, we add cut covering constraints.
for all subsets of vertices T such that s is not in T,
for all vertices t in T,
sum over all vertices u not in T and v in T of y(uv) - x(t) ≥ 0
Because of the preflow constraints, a cycle necessarily is disconnected from the path structure.
There are exponentially many cut covering constraints, so when solving the LP we have to generate them on demand. This means finding the minimum cut between s and each other vertex t, then checking whether the capacity of that cut is at least x(t). If it is smaller, the constraint is violated, so we add it and use the dual simplex method to find the new optimum (repeating as necessary).
I'm going to pass on describing the branching machinery – this should be taken care of by your LP solver anyway.
Finding the optimal solution
Here is a recursive approach to solving your problem.
Let's begin with some definitions :
Let A = (A_i), 1 ≤ i ≤ N, be the areas.
Let w_{i,j} = w_{j,i} be the time cost for traveling from A_i to A_j and vice versa.
Let r_i be the reward for visiting area A_i.
Here is the recursive procedure that will output the exact requested solution : (pseudo-code)
List<Area> GetBestPath(int time_limit, Area S, int *rwd) {
    int best_reward(0), best_fit(0);
    List<Area> possible_path[N] = {[]};
    if (time_limit < 0) {
        return [];
    }
    if (!S.visited) {
        *rwd += S.reward;
        S.visit();
    }
    for (int i = 0; i < N; ++i) {
        if (S.index != i) {
            int possible_reward = 0;  // must be reset for every neighbour
            possible_path[i] = GetBestPath(time_limit - W[S.index][i], A[i], &possible_reward);
            if (possible_reward > best_reward) {
                best_reward = possible_reward;
                best_fit = i;
            }
        }
    }
    *rwd += best_reward;
    possible_path[best_fit].push_front(S);
    return possible_path[best_fit];
}
For obvious clarity reasons, I assumed the A_i, as well as the w_{i,j}, to be globally accessible.
Explanations
You start at S. First thing you do? Collect the reward and mark the node as visited. Then you have to check which way to go is best among S's N-1 neighbours (let's call them N_{S,i} for 1 ≤ i ≤ N-1).
This is exactly the same as solving the problem for N_{S,i} with a time limit of:
time_limit - W(S ↔ N_{S,i})
And since you mark the visited nodes, when arriving at an area you first check whether it is marked. If so, you get no reward... Otherwise you collect it and mark it as visited...
And so forth!
The ending condition is when time_limit (C) becomes negative. This tells us we have reached the limit and cannot make further moves: the recursion ends. The final path may contain useless journeys if all the rewards have been collected before the time limit C is reached. You'll have to "prune" the output list.
Complexity ?
Oh, this solution is soooooooo awful in terms of complexity!
Each call leads to N-1 further calls... until the time limit is reached. The longest possible call sequence comes from going back and forth along the shortest edge each time. Let w_min be the weight of this edge.
Then, obviously, the overall complexity is bounded by (N-1)^(C/w_min), i.e. roughly N^(C/w_min).
This is huuuuuge.
Another approach
Maintain a hash table of all the visited nodes.
On the other side, maintain a max-priority queue (e.g. using a max-heap) of the nodes that have not been collected yet (the top of the heap is the node with the highest reward). The priority value for each node A_i in the queue is the pair (r_i, E[w_{i,j}]).
1. Pop the heap: Target <- heap.pop().
2. Compute the shortest path to this node using Dijkstra's algorithm.
3. Check the path: if its cost is too high, the node is not reachable; add it to the unreachable-nodes list.
4. Otherwise, collect all the uncollected nodes you find along the path, remove each collected node from the heap, and set Target as the new starting point.
In either case, proceed to step 1 until the heap is empty.
Note: a hash table is best suited to keep track of the collected nodes; this way we can check any node on a path computed by Dijkstra in O(1).
Likewise, maintaining a hash table mapping each node to its position in the heap might help optimize the "pruning" of the heap when collecting the nodes along a path.
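A Python sketch of these steps (assuming a complete cost matrix `w` and names of my own choosing; this is the heuristic, not an exact solver):

```python
import heapq

def greedy_collect(n, w, reward, start, budget):
    """Heuristic from the steps above: repeatedly aim for the highest-reward
    uncollected node via a shortest path, until the budget runs out.

    w: dict (u, v) -> cost for every ordered pair (complete graph assumed).
    reward: list of per-vertex rewards.
    """
    INF = float("inf")

    def dijkstra(src):
        dist = [INF] * n
        prev = {}
        dist[src] = 0
        pq = [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue
            for v in range(n):
                if v != u and d + w[(u, v)] < dist[v]:
                    dist[v] = d + w[(u, v)]
                    prev[v] = u
                    heapq.heappush(pq, (dist[v], v))
        return dist, prev

    collected = {start}
    total = reward[start]
    pos, left = start, budget
    # max-priority queue of uncollected nodes, keyed by reward
    heap = [(-reward[v], v) for v in range(n) if v != start]
    heapq.heapify(heap)
    while heap:
        _, target = heapq.heappop(heap)
        if target in collected:
            continue              # already picked up along an earlier path
        dist, prev = dijkstra(pos)
        if dist[target] > left:
            continue              # unreachable within the remaining budget
        path = [target]           # walk the predecessor chain back to pos
        while path[-1] != pos:
            path.append(prev[path[-1]])
        for v in path:
            if v not in collected:
                collected.add(v)
                total += reward[v]
        left -= dist[target]
        pos = target
    return total, collected
```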
A little analysis
This approach is slightly better than the first one in terms of complexity, but it may not lead to the optimal result. In fact, it can perform quite poorly on some graph configurations. For example, suppose all nodes have reward r except one node T with reward r+1, and W(N ↔ T) = C for every node N while all the other nodes are cheaply reachable from one another. Then this heuristic will only collect T and miss every other node. In this particular case, the best solution would have been to ignore T and collect everyone else, for a reward of (N-1).r instead of only r+1.
I'm trying to come up with a reasonable algorithm for this problem:
Let's say we have bunch of locations. We know the distances between each pair of locations. Each location also has a point. The goal is to maximize the sum of the points while travelling from a starting location to a destination location without exceeding a given amount of distance.
Here is a simple example:
Starting location: C , Destination: B, Given amount of distance: 45
Solution: C-A-B route with 9 points
I'm just curious whether there is some kind of dynamic programming algorithm for this type of problem. What would be the best, or rather the easiest, approach to this problem?
Any help is greatly appreciated.
Edit: You are not allowed to visit the same location many times.
EDIT: Under the newly added restriction that every node can be visited only once, the problem is most definitely NP-hard, via reduction from Hamiltonian path: for a general undirected, unweighted graph, set all edge weights to zero and every vertex weight to 1. Then the maximum reachable score is n iff there is a Hamiltonian path in the original graph.
So it might be a good idea to look into integer linear programming solvers for instance families that are not constructed specifically to be hard.
The solution below assumes that a vertex can be visited more than once and makes use of the fact that node weights are bounded by a constant.
Let p(x) be the point value for vertex x and w(x,y) be the distance weight of the edge {x,y} or w(x,y) = ∞ if x and y are not adjacent.
If we are allowed to visit a vertex multiple times, and if we can assume that p(x) <= C for some constant C, we might get away with the following recurrence: let f(x,y,P) be the minimum distance we need to travel to get from x to y while collecting exactly P points. We have
f(x,y,P) = ∞ for all P < 0
f(x,x,p(x)) = 0 for all x
f(x,y,P) = min over all z of (w(x,z) + f(z,y,P - p(x)))
We can compute f using dynamic programming. Now we just need to find the largest P such that
f(start, end, P) <= distance upper bound
This P is the solution.
The complexity of this algorithm with a naive implementation is O(n^4 * C). If the graph is sparse, we can get O(n^2 * m * C) by using adjacency lists for the MIN aggregation.
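A memoized Python sketch of the recurrence (my own naming; it assumes p(x) >= 1 so the recursion on P is well-founded, and the cap on P in the final loop is a crude illustration rather than a tight bound):

```python
import math
from functools import lru_cache

def best_score(w, p, start, end, budget):
    """Sketch of the recurrence above. f(x, P) = minimum distance of a walk
    from x to `end` collecting exactly P points; every visit to a vertex
    scores its points, and p[x] >= 1 is assumed so that P strictly
    decreases on every recursive call.

    w: distance matrix (math.inf where no edge), p: per-vertex points.
    """
    n = len(p)

    @lru_cache(maxsize=None)
    def f(x, P):
        if P < 0:
            return math.inf
        # base case f(x, x, p(x)) = 0, specialised to y = end
        best = 0 if (x == end and P == p[x]) else math.inf
        for z in range(n):
            if z != x and w[x][z] < math.inf:
                best = min(best, w[x][z] + f(z, P - p[x]))
        return best

    # answer: the largest P such that f(start, P) <= budget
    answer = 0
    for P in range(1, 3 * sum(p) + 1):
        if f(start, P) <= budget:
            answer = P
    return answer
```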
A universal sink in a directed graph is a vertex v whose in-degree is
|V|-1 and whose out-degree is 0.
I can determine whether a directed graph G has a universal sink by the following alg.
Note: G is represented as an adjacency matrix AdjM and AdjM is given:
for (i = 1 to |V|)
    if (AdjM[i,1] + AdjM[i,2] + AdjM[i,3] + ... + AdjM[i,|V|] == 0
        && AdjM[1,i] + AdjM[2,i] + AdjM[3,i] + ... + AdjM[|V|,i] == |V|-1)
        then return i; // i is a universal sink
I solved this problem in O(|V|) time by writing out all |V| of the AdjM[i,_] and AdjM[_,i] values in the code, thus eliminating the inner loop that does these summations.
Is there a way of doing this -- solving it in O(|V|) time -- without explicitly coding the summations with each AdjM[i,_] and AdjM[_,i] as the terms?
There must be a better way to do it using bit-wise operations, but I can't see it now.
This is Q 22.1-6 in section "Representations of Graphs" of CLRS, p.530.
Thanks in advance.
I think you can easily construct a graph with one universal sink, and change it to a graph with no universal sink by changing only one value in AdjM. This means that you must examine every value in AdjM in order to determine if a universal sink exists.
No matter how cleverly you manipulate the indices and pointers, you cannot beat O(|V|^2).
This can be done in O(|V|) as shown here
Basically, you don't need the sum. You can just check that
all AdjM[i, _] are 0
all AdjM[_, i] are 1 except for AdjM[i, i].
I don't see, however, how you can eliminate the loop. The solution with checks will typically be faster, because you can break out of the loop as soon as any of the checks fails.
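For completeness, the O(|V|) approach the other answer alludes to is the classic candidate-elimination trick; here is a sketch (my own code, not taken from the linked page):

```python
def universal_sink(adj):
    """Candidate elimination in O(|V|): if adj[cand][j] == 1, cand has an
    out-edge and cannot be a sink; otherwise the edge cand -> j is missing,
    so j lacks an in-edge and cannot be a sink. Each comparison eliminates
    one vertex, leaving a single candidate to verify.

    adj: adjacency matrix as a list of 0/1 lists. Returns the sink or -1.
    """
    n = len(adj)
    cand = 0
    for j in range(1, n):
        if adj[cand][j]:
            cand = j
    # verify the surviving candidate: out-degree 0, in-degree |V|-1
    if any(adj[cand][j] for j in range(n)):
        return -1
    if any(not adj[i][cand] for i in range(n) if i != cand):
        return -1
    return cand
```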