Path from s to e in a weighted DAG graph with limitations - algorithm

Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes such that:
the total distance is less than some constant d
starting from s, each node in the path is closer to e than the previous one (as in, when you traverse the path you are getting closer to your destination e, in terms of the edge weight of the remaining path).
We can assume there are no cycles in the graph. There are no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?

Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes you O(n+m) so it won't add to the complexity of the problem (since you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else since those never fulfill your second criterion.
EDIT: I figured out an algorithm; it's polynomial, depending on the size of your graphs it may still not be sufficiently efficient though. See the edit further down.
Now for some complexity. The first thing to think about here is an upper bound on how many paths there can actually be, since, depending on the choice of d and the edge weights, this also bounds the work any potential algorithm may have to do.
How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i<j. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.
How many paths can there be from one vertex to another in the graph described above? The answer is 2^(n-2). Proof by induction:
Take the graph over 2 vertices as described above; there is 1 = 2^0 = 2^(2-2) path from vertex 1 to vertex 2: (1->2).
Induction step: assuming there are 2^(n-2) paths from the vertex with number 1 of an n-vertex graph as described above to the vertex with number n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own edge to the vertex now labeled n+1. Additionally, it has 2^(i-2) paths to that vertex for every i in [2;n] (it has all the paths the other vertices collectively have to the vertex n+1, each "prefixed" with the edge 1->i). This gives us 1 + Σ_{k=2}^{n} 2^(k-2) = 1 + Σ_{k=0}^{n-2} 2^k = 1 + (2^(n-1) - 1) = 2^(n-1) = 2^((n+1)-2).
So we see that there are DAGs that have 2^(n-2) distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on the weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't find some form of optimum (which is what you're looking for) efficiently, though.
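Since the construction is concrete, the count is easy to verify mechanically. Here is a tiny sketch (count_paths is a hypothetical helper, not part of the original answer) that counts paths in the "complete DAG" implicitly and checks the 2^(n-2) formula for small n:

from functools import lru_cache

def count_paths(n):
    # Edge i -> j exists iff i < j; count the distinct paths from 1 to n.
    @lru_cache(maxsize=None)
    def paths_from(i):
        if i == n:
            return 1
        return sum(paths_from(j) for j in range(i + 1, n + 1))
    return paths_from(1)

for n in range(2, 10):
    assert count_paths(n) == 2 ** (n - 2)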
EDIT: Ok so here goes what I would do:
Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological sorting of the graph from s to e, let's call that the integer interval [s;e]. Delete everything from the graph that isn't strictly in that interval, meaning all vertices outside of it along with the incident edges. During the topSort, you'll also be able to see whether there is a path from s to e, so you'll know whether there are any paths s->...->e. Complexity of this part is O(n+m).
Now the actual algorithm:
Traverse the vertices of [s;e] in the order imposed by the topological sorting.
For every vertex v, store a two-dimensional array of information; let's call it pre_v[][] since it's going to store information about the predecessors of a node on the paths leading towards it.
In pre_v[i][j], store how long the total path of length (counted in vertices) i is as a sum of the edge weights, if j is the predecessor of the current vertex on that path. For example, pre_{s+1}[1][s] would have the weight of the edge s->s+1 in it, while all other entries in pre_{s+1} would be 0/undefined.
When calculating the array for a new vertex v, all we have to do is check its incoming edges and iterate over the arrays of the start vertices of those edges. For example, let's say vertex v has an incoming edge from vertex w with weight c. Consider what the entry pre_v[i][w] should be. We have an edge w->v, so we need to set pre_v[i][w] to min(pre_w[i-1][k] for all k, ignoring 0/undefined entries) + c (notice the subscript of the array!); we effectively take the cost of a path of length i-1 that leads to w and add the cost of the edge w->v. Why the minimum? The vertex w can have many predecessors for paths of length i-1; however, we want to stay below a cost limit, and greedy minimization at each vertex will do that for us. We need to do this for all i in [1;v-s].
While calculating the array for a vertex, do not set entries that would give you a path with a cost of d or more; since all edges have positive weights, paths can only get more costly with each additional edge, so just ignore those.
Once you have reached e and finished calculating pre_e, you're done with this part of the algorithm.
Iterate over pre_e, starting with pre_e[e-s]; since we have no cycles, all paths are simple paths, and therefore the longest path from s to e can have e-s edges. Find the largest i such that pre_e[i] has a non-zero (meaning defined) entry; if none exists, there is no path fitting your criteria. You can reconstruct any existing path using the arrays of the other vertices.
Now that gives you a space complexity of O(n³) and a time complexity of O(n²m) - the arrays have O(n²) entries, and we have to iterate over O(m) arrays, one for each edge - but I think it's fairly obvious where the wasteful use of data structures can be optimized, using hashing structures and other things than arrays.

Or you could just use a one-dimensional array and only store the current minimum instead of recomputing it every time (you'll have to store the sum of edge weights of the path together with the predecessor vertex, though, since you need the predecessor to reconstruct the path). This changes the size of the arrays from n² to n, since you now only need one entry per number-of-nodes-on-path-to-vertex, bringing the space complexity of the algorithm down to O(n²) and the time complexity to O(nm).

You can also try to do some form of topological sort that gets rid of the vertices from which you can't reach e, since those can be safely ignored as well.
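To make this concrete, here is a minimal sketch of the optimized one-entry-per-path-length variant described above, assuming the graph comes as (u, v, w) edge triples; the function name and layout are illustrative, not from the answer:

from collections import defaultdict, deque

def longest_bounded_path(n, edges, s, e, d):
    # edges: (u, v, w) triples with w > 0; zero-weight edges must already be
    # deleted, which also makes criterion 2 automatic (the weight of the
    # remaining path strictly decreases along any path).
    adj = defaultdict(list)
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1
    # Topological order via Kahn's algorithm, O(n+m).
    order, q = [], deque(i for i in range(n) if indeg[i] == 0)
    while q:
        u = q.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    # best[v][i] = (min cost, predecessor) over paths s->v with i edges;
    # one entry per path length, as in the one-dimensional optimization.
    best = [dict() for _ in range(n)]
    best[s][0] = (0, None)
    for u in order:
        for v, w in adj[u]:
            for i, (cost, _) in best[u].items():
                if cost + w < d and cost + w < best[v].get(i + 1, (float("inf"),))[0]:
                    best[v][i + 1] = (cost + w, u)
    if not best[e]:
        return None                     # no path fitting the criteria
    i = max(best[e])                    # most edges = most nodes
    path, v = [], e
    while v is not None:
        path.append(v)
        v, i = best[v][i][1], i - 1
    return path[::-1]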

Related

Time Complexity Analysis of BFS

I know that there are a ton of questions out there about the time complexity of BFS, which is O(V+E).
However, I still struggle to understand why the time complexity is O(V+E) and not O(V*E).
I know that O(V+E) stands for O(max[V,E]), and my only guess is that it has something to do with the density of the graph and not with the algorithm itself, unlike, say, Merge Sort, whose time complexity is always O(n log n).
Examples I've thought of are :
A directed graph with |E| = |V|-1; here the time complexity will indeed be O(V).
A directed graph with |E| = |V|*(|V|-1); the complexity would in fact be O(|E|) = O(|V|²), as each vertex has an outgoing edge to every other vertex besides itself.
Am I in the right direction? Any insight would be really helpful.
Your "examples of thought" illustrate that the complexity is not O(V*E), but O(E). True, E can be a large number in comparison with V, but it doesn't matter when you say the complexity is O(E).
When the graph is connected, then you can always say it is O(E). The reason to include V in the time complexity, is to cover for the graphs that have many more vertices than edges (and thus are disconnected): the BFS algorithm will not only have to visit all edges, but also all vertices, including those that have no edges, just to detect that they don't have edges. And so we must say O(V+E).
The complexity comes off easily if you walk through the algorithm. Let Q be the FIFO queue where initially it contains the source node. BFS basically does the following
while Q not empty
    pop u from Q
    for each adjacency v of u
        if v is not marked
            mark v
            push v into Q
Since each node is added once and removed once, the while loop runs O(V) times. Also, each time we pop u we perform |adj[u]| operations, where |adj[u]| is the number of adjacencies of u.
Therefore the total complexity is Σ_{u∈V} (1 + |adj[u]|), which is O(V+E), since the sum of the adjacencies is O(E) (2E for an undirected graph and E for a directed one).
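For illustration, here is a minimal Python version of that pseudocode (names are mine); each vertex is pushed at most once and each adjacency list is scanned once, which is where the O(V+E) bound comes from:

from collections import deque

def bfs(adj, source):
    # adj: list of adjacency lists indexed by vertex id
    marked = [False] * len(adj)
    marked[source] = True
    q = deque([source])
    order = []
    while q:                  # each vertex is popped at most once: O(V)
        u = q.popleft()
        order.append(u)
        for v in adj[u]:      # |adj[u]| work per pop, O(E) over the whole run
            if not marked[v]:
                marked[v] = True
                q.append(v)
    return order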
Consider a situation where you have a graph, maybe even with cycles; you start the search from the root and your target is the last node. In this case you will traverse all the edges before you reach your destination.
E.g.
0 - 1
1 - 2
0 - 2
0 - 3
In this scenario you will check 4 edges before you actually find node #3.
It depends on how the adjacency list is implemented. A properly implemented adjacency list is a list/array of vertices with a list of related edges attached to each vertex entry.
The key is that the edge entries point directly to their corresponding vertex array/list entry; they never have to search through the vertex array/list for a matching entry, they can just look it up directly. This ensures that the total number of edge accesses is 2E and the total number of vertex accesses is V+2E, making the total time O(E+V).
In improperly implemented adjacency lists, the vertex array/list is not directly indexed, so to go from an edge entry to a vertex entry you have to search through the vertex list, which is O(V); that makes the total time O(E*V).
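As a sketch of what "properly implemented" means here, assuming vertices are numbered 0..n-1 (this reuses the small example graph from the previous answer; the add_edge helper is mine):

n = 4
adj = [[] for _ in range(n)]    # adj[u] holds the ids of u's neighbors

def add_edge(u, v):             # undirected: each edge is stored twice
    adj[u].append(v)
    adj[v].append(u)

add_edge(0, 1); add_edge(1, 2); add_edge(0, 2); add_edge(0, 3)
# Following an edge entry is a direct O(1) index into adj, never a search,
# so a full traversal makes 2E edge accesses in total.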

All paths of length k from a given vertex

Let's say I have a directed graph G(V,E) where each edge has a real-valued cost in (0,1). For a given i, I need to find all the couples of vertices (i,j) starting from i that "match". Two vertices (i,j) match if there is a directed path from i to j of length exactly k (k is a given number that is relatively small and can be considered constant) with cost >= C (C is a given number). The cost of a path is calculated as the product of its edges' costs. For example, if a path of length 2 starting from i and ending in j consists of edges e1 and e2, then CostOfPath = cost(e1)*cost(e2).
This has to be done in O(E+V*k). So what I thought of is modifying the DFS algorithm, updating the distances from the given starting vertex i until they reach the length k. If they don't, we can't have a match. However, I am having a hard time figuring out what exactly I can modify in the DFS. Any ideas?
When you need to consider paths with a fixed number of edges in it, dynamic programming often comes to help (while other approaches often fail).
Let's denote dp[v][j] the maximal cost of the path from vertex i (fixed) to vertex v that has exactly j edges.
For starting values, you can set dp[v][1] to the cost of the edge from i to v (or 0 if no such edge exists). Or, if you think about it, it is cleaner to set values for j==0 rather than j==1: dp[i][0] is 1, while dp[v][0] is zero for v != i.
Now, if you have values for some j, it is easy to calculate values for j+1:
dp[v][j+1] = max( dp[v'][j] * cost((v', v)) ), taking the maximum over all edges (v', v) entering v
This is very similar to the Bellman-Ford algorithm, only the latter does not need to track the number of edges and thus can use a one-dimensional array.
This gives you a solution in O((E+V)*k). Not exactly what you requested, but I doubt that a solution in O(E+V*k) exists.
(In the solution above, I assume that the constant C is positive, so a zero-cost path is equivalent to the path being absent. If you need to, you can account for the C==0 case specifically.)
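Here is a minimal sketch of that dynamic program, assuming the graph is given as adjacency lists of (neighbor, cost) pairs; the function name is illustrative:

def matching_vertices(adj, i, k, C):
    # adj[u]: list of (v, cost) pairs with cost in (0, 1)
    n = len(adj)
    dp = [[0.0] * (k + 1) for _ in range(n)]
    dp[i][0] = 1.0                      # the empty path from i to i
    for j in range(k):                  # k rounds, each scanning every edge once
        for u in range(n):
            if dp[u][j] == 0.0:         # no path of j edges reaches u
                continue
            for v, cost in adj[u]:
                dp[v][j + 1] = max(dp[v][j + 1], dp[u][j] * cost)
    # (i, v) "match" iff some path of exactly k edges has cost >= C
    return [v for v in range(n) if dp[v][k] >= C]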

least cost path, destination unknown

Question
How would one going about finding a least cost path when the destination is unknown, but the number of edges traversed is a fixed value? Is there a specific name for this problem, or for an algorithm to solve it?
Note that maybe the term "walk" is more appropriate than "path", I'm not sure.
Explanation
Say you have a weighted graph, and you start at vertex V1. The goal is to find a path of length N (where N is the number of edges traversed, can cross the same edge multiple times, can revisit vertices) that has the smallest cost. This process would need to be repeated for all possible starting vertices.
As a concrete illustration, consider a turn-based game where there are rooms connected by corridors. Each corridor has a cost associated with it, and your final score is lowered by an amount equal to each cost 'paid'. It takes 1 turn to traverse a corridor, and the game lasts 10 turns. You can stay in a room (self-loop), but staying put has a cost associated with it too. If you know the cost of all corridors (and for staying put in each room; i.e., you know the weighted graph), what is the optimal (highest-scoring) path to take for a 10-turn (or N-turn) game? You can revisit rooms and corridors.
Possible Approach (likely to fail)
I was originally thinking of using Dijkstra's algorithm to find the least cost path between all pairs of vertices, and then, for each starting vertex, selecting the LCPs of length N. However, I realized that this might not give the LCP of length N for a given starting vertex. For example, Dijkstra's LCP between V1 and V2 might have length < N, and Dijkstra's might have excluded an unnecessary but low-cost edge which, if included, would have made the path length equal N.
It's an interesting fact that if A is the adjacency matrix and you compute A^k using addition and min in place of the usual multiply and sum used in normal matrix multiplication, then A^k[i,j] is the length of the shortest path from node i to node j using exactly k edges. Now the trick is to use repeated squaring so that A^k needs only O(log k) matrix multiply ops.
If you need the path in addition to the minimum length, you must track where the result of each min operation came from.
For your purposes, you want the location of the min of each row of the result matrix and corresponding path.
This is a good algorithm if the graph is dense. If it's sparse, then doing one breadth-first search per node to depth k will be faster.
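Here is a small sketch of that idea, with addition taking the place of the multiply and min the place of the sum; the helper names are mine, and path reconstruction (tracking the argmin) is omitted:

INF = float("inf")

def min_plus_mult(A, B):
    # one "multiplication": entry (i, j) is min over m of A[i][m] + B[m][j]
    n = len(A)
    return [[min(A[i][m] + B[m][j] for m in range(n))
             for j in range(n)] for i in range(n)]

def min_plus_power(A, k):
    # A: adjacency matrix of edge lengths, INF where there is no edge
    n = len(A)
    # min-plus identity: 0 on the diagonal, INF elsewhere
    result = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    while k:                            # repeated squaring: O(log k) products
        if k & 1:
            result = min_plus_mult(result, A)
        A = min_plus_mult(A, A)
        k >>= 1
    return result    # result[i][j]: cheapest i->j walk with exactly k edges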

Efficient algorithm to extract a subgraph within a maximum distance from multiple vertices

I have an algorithmic problem where there's a straightforward solution, but it seems wasteful. I'm wondering if there's a more efficient way to do the same thing.
Here's the problem:
Input: A large graph G with non-negative edge weights (interpreted as lengths), a list of vertices v, and a list of distances d the same length as v.
Output: The subgraph S of G consisting of all of the vertices that are at a distance of at most d[i] from v[i] for some i.
The obvious solution is to use Dijkstra's algorithm starting from each v[i], modified so that it bails out after hitting a distance of d[i], and then taking the union of the subgraphs that each search traverses. However, in my use case it's frequently going to be the case that the search trees from the v[i]s overlap substantially. That means the Dijkstra approach will wastefully traverse the vertices in the overlap multiple times before I take the union.
In the case that there is only one vertex in v, the Dijkstra approach runs in O(|S|log|S|), taking |S| to be the number of vertices (my graph is sparse, so I ignore the edges term). Is it possible to achieve the same asymptotic run time when v has more than one vertex?
My first idea was to combine the searches out of each v[i] into the same priority queue, but the "bail out" condition mentioned above complicates this approach. Sometimes a vertex will be reached in a shorter distance from one v[i], but you would still want to search through it from another v[j] if the second vertex has a larger d[j] allotted to it.
Thanks!
You can solve this with the complexity of a single Dijkstra run.
Let D be the maximum of the distances in d.
Define a new start vertex, and give it edges to each of the vertices in v.
The length of the edge between start and v[i] should be set to D-d[i].
Then in this new graph, S is given by all vertices within distance D of the start vertex, so apply Dijkstra to the start vertex.
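A minimal sketch of this construction, assuming adjacency lists of (neighbor, length) pairs and vertices numbered 0..n-1; the super-source gets index n, and the function name is illustrative:

import heapq

def subgraph_vertices(adj, v, d):
    # adj[u]: list of (neighbor, length) pairs; v, d as in the question
    n = len(adj)
    D = max(d)
    # super-source n with an edge of length D - d[i] to each v[i]
    ext = adj + [[(v[i], D - d[i]) for i in range(len(v))]]
    dist = [float("inf")] * (n + 1)
    dist[n] = 0.0
    heap = [(0.0, n)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:                # stale heap entry, skip
            continue
        for w, length in ext[u]:
            if du + length < dist[w]:
                dist[w] = du + length
                heapq.heappush(heap, (du + length, w))
    # exactly the vertices within d[i] of some v[i]
    return [u for u in range(n) if dist[u] <= D]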

Prim and Kruskal's algorithms complexity

Given an undirected connected graph with weights w: E -> {1,2,3,4,5,6,7}, meaning there are only 7 possible weights.
I need to find a spanning tree using Prim's algorithm in O(n+m) and using Kruskal's algorithm in O(m*α(m,n)).
I have no idea how to do this and really need some guidance about how the weights can help me in here.
You can sort the edge weights faster.
In Kruskal's algorithm you don't need an O(M lg M) sort; you can just use counting sort (or any other O(M) algorithm). The final complexity is then O(M) for sorting and O(M*α(M)) for the union-find phase; in total it is O(M*α(M)).
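A small sketch of this, assuming edges come as (weight, u, v) triples with weights in 1..7; the union-find here is a plain path-compression version and the names are mine:

def kruskal_small_weights(n, edges):
    # edges: (w, u, v) triples with w in 1..7
    buckets = [[] for _ in range(8)]    # counting sort by weight: O(M)
    for w, u, v in edges:
        buckets[w].append((u, v))
    parent = list(range(n))
    def find(x):                        # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w in range(1, 8):               # cheapest weight class first
        for u, v in buckets[w]:
            ru, rv = find(u), find(v)
            if ru != rv:                # edge connects two components: take it
                parent[ru] = rv
                mst.append((u, v, w))
    return mst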
For the case of Prim's algorithm, you don't need a heap; you need 7 lists/queues/arrays/anything with constant-time insert and retrieval, one for each weight. Then, when you are looking for the cheapest outgoing edge, you check whether one of these lists is nonempty (starting from the cheapest) and use that edge. Since 7 is a constant, the whole algorithm runs in O(M) time (the graph is connected, so M >= N-1).
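And a sketch of that bucket-based Prim, assuming a connected graph given as adjacency lists of (neighbor, weight) pairs; stale bucket entries are simply skipped when popped, and the function name is mine:

def prim_small_weights(n, adj):
    # adj[u]: list of (v, w) pairs, w in 1..7; graph assumed connected
    in_tree = [False] * n
    buckets = [[] for _ in range(8)]    # buckets[w]: candidate edges of weight w
    mst = []

    def add_vertex(u):
        in_tree[u] = True
        for v, w in adj[u]:
            if not in_tree[v]:
                buckets[w].append((u, v))

    add_vertex(0)
    while len(mst) < n - 1:
        edge = None
        for w in range(1, 8):           # scan the 7 buckets, cheapest first
            while buckets[w]:
                u, v = buckets[w].pop()
                if not in_tree[v]:      # skip entries made stale in the meantime
                    edge = (u, v, w)
                    break
            if edge:
                break
        u, v, w = edge
        mst.append(edge)
        add_vertex(v)
    return mst

Each edge enters the buckets at most twice and is popped at most twice, and each iteration scans at most 7 buckets, so the whole run is O(N+M).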
As I understand, it is not popular to answer homework assignments, but this could hopefully be useful for other people than just you ;)
Prim:
Prim is an algorithm for finding a minimum spanning tree (MST), just as Kruskal is.
An easy way to visualize the algorithm, is to draw the graph out on a piece of paper.
Then you create a moveable line (cut) over all the nodes you have selected. In the following, the set A will be the nodes inside the cut. Then you choose the smallest edge running through the cut, i.e. from a node inside the line to a node on the outside. Always choose the edge with the lowest weight. After adding the new node, you move the cut so it contains the newly added node. Then you repeat until all nodes are within the cut.
A short summary of the algorithm is:
Create a set, A, which will contain the chosen vertices. It will initially contain a random starting node, chosen by you.
Create another set, B. This will initially be empty and used to mark all chosen edges.
Choose an edge E = (u, v), that is, an edge from node u to node v. The edge E must be the edge with the smallest weight that has node u within the set A and node v outside of it. (If there are several edges with equal weight, any of them can be chosen at random.)
Add the edge (u, v) to the set B and v to the set A.
Repeat steps 3 and 4 until A = V, where V is the set of all vertices.
The sets A and B now describe your spanning tree! The MST will contain the nodes within A, and B will describe how they connect.
Kruskal:
Kruskal is similar to Prim, except there is no cut; you always choose the smallest remaining edge that doesn't close a cycle.
Create a set A, which is initially empty. It will be used to store the chosen edges.
Choose the edge e with minimum weight from the edge set E that is not already in A and does not form a cycle with the edges already in A. ((u,v) = (v,u), since the graph is undirected.)
Add e to A.
Repeat steps 2 and 3 until the edges in A span all vertices, that is, until A contains |V|-1 edges.
I am unsure about the exact performance of these algorithms, but I assume Kruskal is O(E log E), and the performance of Prim depends on which data structure you use to store the edges. With a binary heap, searching for the smallest edge is faster than with an adjacency matrix.
Hope this helps!
