Find Path of a Specific Weight in a Weighted DAG - algorithm

Given a DAG in which all edges have positive weights, and given a value N:
Is there an algorithm to find a simple path (no cycles or repeated nodes) with total weight N?
I am aware of the algorithm for finding a path of a given length (number of edges), but I am confused about how to handle a given path weight.
Can Dijkstra be modified for this case? Or is there anything else?

This is NP-complete, so don't expect any reasonably fast (polynomial-time) algorithm. Here's a reduction from the NP-complete Subset Sum problem, where we are given a multiset of n integers X = {x_1, x_2, ..., x_n} and a number k, and asked if there is any submultiset of the n numbers that sum to exactly k:
Create a graph G with n+1 vertices v_1, v_2, ..., v_{n+1}. For each vertex v_i, add edges to every higher-numbered vertex v_j, and give all these edges weight x_i. This graph has O(n^2) edges and can be constructed in O(n^2) time. Clearly it contains no cycles.
Suppose the answer to the Subset Sum problem is YES: That is, there exists a submultiset Y of X such that the numbers in Y total to exactly k. Actually, let Y = {y_1, y_2, ..., y_m} consist of the m <= n indices 1 <= i <= n of the selected elements of X. Then there is a corresponding path in the graph G with exactly the same weight -- namely the path that starts at v_{y_1}, takes the edge to v_{y_2} (which is of weight x_{y_1}), then takes the edge to v_{y_3}, and so on, finally arriving at v_{y_m} and taking a final edge (which is of weight x_{y_m}) to the terminal vertex v_{n+1}.
In the other direction, suppose that there is a simple path in G of total weight exactly k. Since the path is simple, each vertex appears at most once. Thus each edge in the path leaves a unique vertex. For each vertex v_i in the path except the last, add x_i to a set of chosen numbers: these numbers correspond to the edge weights in the path, so clearly they sum to exactly k, implying that the solution to the Subset Sum problem is also YES. (Notice that the position of the final vertex in the path doesn't matter, since we only care about the vertex that it leaves, and all edges leaving a vertex have the same weight.)
A YES answer to either problem implies a YES answer to the other problem, so a NO answer to either problem implies a NO answer to the other problem. Thus the answer to any Subset Sum problem instance can be found by first constructing the specified instance of your problem in polynomial time, and then using any algorithm for your problem to solve that instance -- so if an algorithm exists that can solve any instance of your problem in polynomial time, the NP-hard Subset Sum problem can also be solved in polynomial time.
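The construction in the reduction above can be sketched in a few lines of Python. This is only an illustration, assuming positive integer inputs; the function names `build_reduction_graph`, `has_path_of_weight` and `subset_sum_brute_force` are made up for this example, and the path search is a plain exhaustive search (exponential time), not an efficient algorithm.

```python
from itertools import combinations


def build_reduction_graph(xs):
    """Build the DAG from the reduction: vertices 1..n+1, and every edge
    leaving v_i (to any higher-numbered vertex) has weight xs[i-1]."""
    n = len(xs)
    return {(i, j): xs[i - 1]
            for i in range(1, n + 1)
            for j in range(i + 1, n + 2)}


def has_path_of_weight(edges, n_vertices, target):
    """Exhaustive search for a path of total weight exactly `target`.
    Assumes positive weights, so overshooting branches can be pruned."""
    def dfs(v, remaining):
        if remaining == 0:
            return True
        for w in range(v + 1, n_vertices + 1):
            weight = edges[(v, w)]
            if weight <= remaining and dfs(w, remaining - weight):
                return True
        return False

    return any(dfs(start, target) for start in range(1, n_vertices + 1))


def subset_sum_brute_force(xs, k):
    """Reference answer for Subset Sum, by trying every sub-multiset."""
    return any(sum(c) == k
               for r in range(len(xs) + 1)
               for c in combinations(xs, r))


xs = [3, 34, 4, 12, 5, 2]
edges = build_reduction_graph(xs)
print(has_path_of_weight(edges, len(xs) + 1, 9))   # a path exists iff a subset sums to 9
```

On small inputs the two answers agree for every target, which is exactly what the two directions of the proof claim.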

Related

All paths of length k from a given vertex

Let's say I have a directed graph G(V,E) with edge costs (real numbers) ∈ (0,1). For a given i, I need to find all the pairs of vertices (i,j) starting from i that "match". Two vertices (i,j) match if there is a directed path from i to j with length exactly k (k is a given number that is relatively small and could be considered constant) and with cost >= C (C is a given number). The cost of a path is calculated as the product of the costs of its edges. For example, if a path of length 2 starting at i and ending at j consists of edges e1 and e2, then CostOfpath=cost(e1)*cost(e2).
This has to be done in O(E+V*k). So what I thought of is modifying the DFS algorithm to update the distances from the given starting vertex i until they reach length k. If they don't, then we can't have a match. However, I am having a hard time finding what exactly I can modify in the DFS. Any ideas?
When you need to consider paths with a fixed number of edges in it, dynamic programming often comes to help (while other approaches often fail).
Let dp[v][j] denote the maximal cost of a path from vertex i (fixed) to vertex v that has exactly j edges.
For the starting values, you can set the values for j==1: dp[v][1] is the cost of the edge from i to v (or 0 if no such edge exists). Or, if you think about it, you can more simply set the values for j==0 instead of j==1: dp[i][0] is 1, while dp[v][0] is zero for v!=i.
Now, if you have the values for some j, it is easy to calculate the values for j+1:
dp[v][j+1] = max( dp[v'][j] * cost((v', v)) ), where the maximum is taken over all edges (v', v) entering v
This is very similar to Ford-Bellman's algorithm, only that the latter does not need to track the number of edges and thus can use one-dimensional array.
This gives you the solution in O((E+V)*k). Not exactly what you have requested, but I doubt that there exists solution in O(E+V*k).
(In the solution above, I assume that the constant C is positive, and so a zero cost path is equivalent to the path being absolutely absent. If you need, you can specifically account for the C==0 case.)
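A minimal sketch of the dynamic programming described above, in Python. It assumes vertices are numbered 0..n-1, edges are given as (u, v, cost) triples, and C is positive; the function names are chosen for this example only. Like the recurrence itself, it does not forbid a vertex from appearing more than once on a path. Only the previous layer of dp is kept, since layer j+1 depends on layer j alone.

```python
def max_cost_paths(n, edges, source, k):
    """Return a list `dp` where dp[v] is the maximal product of edge costs
    over paths from `source` to v with exactly k edges (0.0 if none exists)."""
    dp = [0.0] * n
    dp[source] = 1.0                      # j == 0: the empty path has cost 1
    for _ in range(k):                    # build layer j+1 from layer j
        nxt = [0.0] * n
        for u, v, cost in edges:
            candidate = dp[u] * cost
            if candidate > nxt[v]:
                nxt[v] = candidate
        dp = nxt
    return dp


def matching_vertices(n, edges, source, k, C):
    """Vertices j that "match" the source: best k-edge path cost >= C (C > 0)."""
    best = max_cost_paths(n, edges, source, k)
    return [v for v in range(n) if best[v] >= C]


edges = [(0, 1, 0.5), (1, 2, 0.5), (0, 2, 0.9), (2, 3, 0.8), (1, 3, 0.1)]
print(matching_vertices(4, edges, source=0, k=2, C=0.5))
```

Each of the k rounds scans every edge once and allocates one array of size V, which gives the O((E+V)*k) bound mentioned above.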

Single Source Shortest Path with constraint lower bound of minimum cost

Problem description:
Given a graph G as an adjacency matrix and adjacency list, in which there is a source vertex s and a destination vertex d. Find the shortest path from s to d, with a constraint. The constraint is that the shortest path cost c has a lower bound, i.e., the cost c must be at least an assigned lower bound N, and must be the smallest among the costs of all possible paths that are greater than or equal to N.
I understand that with this constraint a conventional SSSP algorithm like Bellman-Ford cannot work correctly. How can I find the most efficient algorithm for this problem?
I suppose you wanted a walk, since a path cannot have cycles.
Unfortunately, the change-making problem, which is NP-complete, can easily be modeled as an instance of this problem.
Change-making problem: Given N types of coins with values c_i, is it possible to make change for the number X using those N types of coins?
Modelling: Assume all c_i's are even (double all c_i, and also X, if not). Create N + 2 vertices, where the i-th vertex represents the i-th coin for 1 <= i <= N. Also, the (N+1)-th and (N+2)-th vertices each have an edge to every coin vertex i with cost (c_i / 2). The problem is then equivalent to finding a shortest path of cost at least X, which is NP-complete. The reduction should be obvious, but if further explanations are needed I can edit my answer.

Algorithm to Maximize Degree Centrality of Subgraph

Say I have some graph with nodes and undirected edges (the edges may have a weight associated to them).
I want to find all (or at least one) connected subgraphs that maximize the sum of the degree centrality of all nodes in the subgraph (the degree centrality is based on the original graph) under the constraint that the sum of the weighted edges is < X.
Is there an algorithm that will do this?
A quick search took me to this description of degree centrality. It turns out that the "degree centrality" of a vertex is simply its degree (neighbour count).
Unfortunately your problem is NP-hard, so it's very unlikely that any algorithm exists that can solve every instance quickly. First notice that, assuming edge weights are positive, the edges in any optimal solution necessarily form a tree, since in any non-tree you can delete at least 1 edge without destroying connectivity, and doing so will decrease the total edge weight of the subgraph. (So, as a positive spinoff: If you compute the minimum spanning tree of your input graph and find that it happens to have total weight < X, then you can simply include every vertex in the graph in your solution.)
Let's formulate a decision version of your problem. Given a graph G = (V, E) with positive (I'll assume) weights on the edges, a number X and a number Y, we want to know: Does there exist a connected subgraph G' = (V', E') of G such that the sum of the edge weights in E' is at most X, and the sum of the degrees of V' (w.r.t. G) is at least Y? (Clearly this is no harder than your original problem: If you had an algorithm to solve your problem, then you could just run it, add up the degrees of the vertices in the subgraph it found and compare this to Y to answer "my" problem.)
Here's a reduction from the NP-hard Steiner Tree in Graphs problem, where we are given a graph G = (V, E) with positive weights on the edges, a subset S of its vertices, and a number k, and the task is to determine whether it's possible to connect the vertices in S using a subset of edges with total weight at most k. (As I showed above, the solution will necessarily be a tree.) If the sum of all degrees in G is d, then all we need to do to transform G into an input for your problem is the following: For each vertex s_i in S we add enough new "ballast" vertices that are each connected only to s_i, via edges with weight X+1, to bring the degree of s_i up to d+1. We set X to k, and set Y to |S|(d+1).
Now suppose that the solution to the Steiner Tree problem is YES -- that is, there exists a subset of edges having total weight <= k that does connect all the vertices in S. In that case, it's clear that the same subgraph in the instance of your problem constructed above connects (possibly among others) all the vertices in S, and since each vertex in S has degree d+1, the total degree is at least |S|(d+1), so the answer to your decision problem is also YES.
In the other direction, suppose that the answer to your decision problem is YES -- that is, there exists a subset of edges having total weight <= X ( = k) that connects a set of vertices having total degree at least |S|(d+1). We need to show that this implies a YES answer to the original Steiner Tree problem. Clearly it suffices to show that the vertex set V' of any subgraph satisfying the conditions above (i.e. edges have total weight <= k and vertices have total degree >= |S|(d+1)) contains S (possibly among other vertices). So let V' be the vertex set of such a solution, and suppose to the contrary that there is some vertex u in S that is not in V'. But then the largest sum of degrees that we could possibly make would be to include all other non-ballast vertices in the graph in V', which would give a degree total of at most (|S|-1)(d+1) + d (the first term is the degree sum for the other vertices in S; the second is an upper bound on the degree sum of all non-S vertices in G; note that none of the ballast vertices we added in could be in the subgraph, because the only way to include any of them is to use an edge of weight X+1, which we obviously can't do). But clearly (|S|-1)(d+1) + d = |S|(d+1) - 1, which is strictly less than |S|(d+1), contradicting our assumption that V' has a degree total at least |S|(d+1). So it follows that S is a subset of V', and thus that it is possible to use the same subset of edges to connect the vertices in S for a total weight of at most k, i.e. that the answer to the Steiner Tree problem is also YES.
So a YES answer to either problem implies a YES answer to the other one, in turn implying that a NO answer to either implies a NO answer to the other. Thus if it were possible to solve the decision version of your problem in polynomial time, it would imply a polynomial-time solution to the NP-hard Steiner Tree in Graphs problem. This means the decision version of your problem is itself NP-hard, and so is the optimisation version (which as I said above is at least as hard). (The decision form is also NP-complete, since a YES answer can be easily verified in polynomial time.)
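To make the "ballast" construction from the reduction concrete, here is a small Python sketch. It assumes the Steiner Tree instance is given as a list of undirected (u, v, weight) edges plus a list of terminals S and a bound k; the function name and the tuple labels used for ballast vertices are invented for this illustration.

```python
from collections import defaultdict


def steiner_to_degree_instance(edges, terminals, k):
    """Transform a Steiner Tree instance into an instance of the decision
    problem above. Returns (new_edges, X, Y)."""
    degree = defaultdict(int)
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    d = sum(degree.values())              # sum of all degrees in G
    X = k
    new_edges = list(edges)
    for s in terminals:
        # Add ballast leaves until s has degree d+1; each ballast edge
        # weighs X+1, so it can never be part of a feasible subgraph.
        for b in range(d + 1 - degree[s]):
            new_edges.append((s, ("ballast", s, b), X + 1))
    Y = len(terminals) * (d + 1)
    return new_edges, X, Y


edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 4)]
new_edges, X, Y = steiner_to_degree_instance(edges, ["a", "c"], k=3)
print(len(new_edges), X, Y)
```

The number of added vertices is at most |S|(d+1), so the transformed instance has size polynomial in the size of the original one.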
Sidenote: At first I thought I had a very straightforward reduction from the NP-hard Knapsack problem: Given a list of n weights w_1, ..., w_n and a list of n profits p_1, ..., p_n, make a single central vertex c, and n other vertices v_1, ..., v_n. For each v_i, attach it to c with an edge of weight w_i, and add p_i other leaf vertices, each attached only to v_i with an edge of weight X+1. However this reduction doesn't actually work, because the profits can be exponential in the input size n, meaning that the constructed instance of your problem might need to have an exponential number of vertices, which isn't allowed for a polynomial-time reduction.

Optimal substructure for least number of perfect squares

Question: I know how the recursion works but I can't seem to understand the 'optimal substructure' for this problem which necessitates the use of dynamic programming.
Problem: Find the least number of perfect squares that sum up to a given number.
Let's say we want to find the shortest path from U to V. If we have a node X in between then shortest path from U to V will be shortest path from U to X plus shortest path from X to V.
I am having difficulty understanding how the least squares problem follows the optimal substructure property.
To my understanding, the recurrence relation for the sum of perfect squares behaves similarly to the recurrence relation for shortest paths in the following way. Let
f(n) := minimum number of perfect squares which sum up to exactly n
then a suitable recurrence can be stated as
f(n) = min{ f(n-i) + f(i) : 0 < i < n }, with f(n) = 1 whenever n is itself a perfect square
which means that all partitions of the original argument into two summands have to be taken into account. Intuitively, the 'split point' for the shortest path problem is a node, whereas in the perfect squares problem it is the decision how to partition into summands (which are then examined further).
You did not state the property correctly, the third paragraph should be rephrased this way:
Let's say we want to find the shortest path from U to V. First verify if U and V are linked directly, if not, compute for each node X in between U and V the shortest path from U to X plus shortest path from X to V, the smallest answers will be the shortest paths from U to V, possibly duplicated.
This is applicable for your problem because you can determine the set of nodes X that are between U and V. For U=0 and V=n, this set is all the numbers from 1 to n-1, because you are adding positive numbers.
For the solution, use an array to cache the smallest path from 0 to i for i going from 0 to n. For each new value, a linear search will yield the best solutions, for an overall time complexity of O(n^2).
You can optimize the linear search by enumerating only the perfect squares less than or equal to n. This list is much shorter than the whole list of numbers. Its length is actually sqrt(n), so the complexity for the overall search drops to O(n^{3/2}).
The cache can be just a pair of integers: the length of the path and the value of an intermediary X that is on one of the shortest paths. This gives a space complexity of O(n).
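The cached search described above could look roughly like this in Python. It is a sketch, with `least_squares` as a made-up name; it stores, for every value m, the length of the best "path" from 0 to m and one square that lies on it, and it tries only perfect squares as the last step.

```python
from math import isqrt


def least_squares(n):
    """Return (count, squares) where `squares` is a shortest list of perfect
    squares summing to n. Runs in O(n * sqrt(n)) time and O(n) space."""
    squares = [i * i for i in range(1, isqrt(n) + 1)]
    count = [0] * (n + 1)       # count[m]: least number of squares summing to m
    last = [0] * (n + 1)        # last[m]: one square used in an optimal sum for m
    for m in range(1, n + 1):
        best = m + 1            # upper bound: m = 1 + 1 + ... + 1 uses m squares
        for sq in squares:
            if sq > m:
                break
            if count[m - sq] + 1 < best:
                best = count[m - sq] + 1
                last[m] = sq
        count[m] = best
    # Walk back through the cached choices to recover one optimal sum.
    parts, m = [], n
    while m > 0:
        parts.append(last[m])
        m -= last[m]
    return count[n], parts


print(least_squares(12))
```

Restricting the inner loop to squares is valid because an optimal sum for m always ends in some square sq, and the rest must be an optimal sum for m - sq.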
The problem in question, finding the least number of perfect squares that sum up to a given number, has been extensively studied for more than 17 centuries: Lagrange's four-square theorem, also known as Bachet's conjecture, was already known to Diophantus in the third century. It states that every natural number can be represented as the sum of four integer squares. Analytical solutions exist for determining whether any given integer is the sum of 1, 2, 3 or 4 perfect squares.

Path from s to e in a weighted DAG graph with limitations

Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes such that:
the total distance is less than some constant d
starting from s, each node in the path is closer than the previous one to the node e. (as in, when you traverse the path you are getting closer to your destination e. in terms of the edge weight of the remaining path.)
We can assume there are no cycles in the graph. There are no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?
Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes you O(n+m) so it won't add to the complexity of the problem (since you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else since those never fulfill your second criterion.
EDIT: I figured out an algorithm; it's polynomial, depending on the size of your graphs it may still not be sufficiently efficient though. See the edit further down.
Now for some complexity. The first thing to think about here is an upper bound on how many paths we can actually have, so depending on the choice of d and the weights of the edges, we also have an upper bound on the complexity of any potential algorithm.
How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i<j. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.
How many paths can there be from one vertex to another in the graph described above? The answer is 2^{n-2}. Proof by induction:
Take the graph over 2 vertices as described above; there is 1 = 2^0 = 2^{2-2} path from vertex 1 to vertex 2: (1->2).
Induction step: assuming there are 2^{n-2} paths from the vertex with number 1 of an n vertex graph as described above to the vertex with number n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own edge to the vertex now labeled n+1. Additionally, it has 2^{i-2} paths to that vertex for every i in [2;n] (it has all the paths the other vertices have to the vertex n+1 collectively, each "prefixed" with the edge 1->i). This gives us 1 + Σ_{k=2}^{n} 2^{k-2} = 1 + Σ_{k=0}^{n-2} 2^k = 1 + (2^{n-1} - 1) = 2^{n-1} = 2^{(n+1)-2}.
So we see that there are DAGs that have 2^{n-2} distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't choose some form of optimum (which is what you're looking for) efficiently though.
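The path count for the complete DAG can be checked numerically for small n. The following Python sketch (the function name is chosen for this example) counts the paths from vertex 1 to vertex n directly, using memoization over the vertices.

```python
from functools import lru_cache


def count_paths_complete_dag(n):
    """Count paths from vertex 1 to vertex n in the DAG on vertices 1..n
    that has an edge i->j if and only if i < j."""
    @lru_cache(maxsize=None)
    def paths_from(v):
        if v == n:
            return 1
        # Every higher-numbered vertex is a direct successor of v.
        return sum(paths_from(w) for w in range(v + 1, n + 1))

    return paths_from(1)


for n in range(2, 8):
    print(n, count_paths_complete_dag(n), 2 ** (n - 2))
```

Counting the paths is cheap; it is enumerating them one by one that takes exponential time.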
EDIT: Ok so here goes what I would do:
Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological sorting of the graph from s to e, let's call that the integer interval [s;e]. Delete everything from the graph that isn't strictly in that interval, meaning all vertices outside of it along with the incident edges. During the topSort, you'll also be able to see whether there is a path from s to e, so you'll know whether there are any paths s-...->e. Complexity of this part is O(n+m).
Now the actual algorithm:
traverse the vertices of [s;e] in the order imposed by the topological sorting
for every vertex v, store a two-dimensional array of information; let's call it pre_v[][] since it's going to store information about the predecessors of a node on the paths leading towards it
in pre_v[i][j], store how long the total path of length i (counted in vertices) is as a sum of the edge weights, if j is the predecessor of the current vertex on that path. For example, pre_{s+1}[1][s] would have the weight of the edge s->s+1 in it, while all other entries in pre_{s+1} would be 0/undefined.
when calculating the array for a new vertex v, all we have to do is check its incoming edges and iterate over the arrays for the start vertices of those edges. For example, let's say vertex v has an incoming edge from vertex w, having weight c. Consider what the entry pre_v[i][w] should be. We have an edge w->v, so we need to set pre_v[i][w] to min(pre_w[i-1][k] for all k, but ignore entries with 0) + c (notice the subscript of the array!); we effectively take the cost of a path of length i - 1 that leads to w, and add the cost of the edge w->v. Why the minimum? The vertex w can have many predecessors for paths of length i - 1; however, we want to stay below a cost limit, which greedy minimization at each vertex will do for us. We will need to do this for all i in [1;v-s].
While calculating the array for a vertex, do not set entries that would give you a path with cost above d; since all edges have positive weights, we can only get more costly paths with each edge, so just ignore those.
Once you have reached e and finished calculating pre_e, you're done with this part of the algorithm.
Iterate over pre_e, starting with pre_e[e-s]; since we have no cycles, all paths are simple paths and therefore the longest path from s to e can have e-s edges. Find the largest i such that pre_e[i] has a non-zero (meaning it is defined) entry; if none exists, there is no path fitting your criteria. You can reconstruct any existing path using the arrays of the other vertices.
Now that gives you a space complexity of O(n^3) and a time complexity of O(n²m) - the arrays have O(n²) entries, we have to iterate over O(m) arrays, one array for each edge - but I think it's very obvious where the wasteful use of data structures here can be optimized using hashing structures and other things than arrays. Or you could just use a one-dimensional array and only store the current minimum instead of recomputing it every time (you'll have to encapsulate the sum of edge weights of the path together with the predecessor vertex though since you need to know the predecessor to reconstruct the path), which would change the size of the arrays from n² to n since you now only need one entry per number-of-nodes-on-path-to-vertex, bringing down the space complexity of the algorithm to O(n²) and the time complexity to O(nm). You can also try and do some form of topological sort that gets rid of the vertices from which you can't reach e, because those can be safely ignored as well.
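The one-dimensional variant mentioned at the end can be sketched as follows in Python. This is an illustration under some assumptions, not a definitive implementation: vertices are numbered 0..n-1, edges are (u, v, weight) triples, zero-weight edges are dropped as suggested above, and the second criterion is taken to be satisfied automatically once all remaining weights are positive. The name `longest_path_under_budget` is made up for this example.

```python
from collections import deque


def longest_path_under_budget(n, edges, s, e, d):
    """Return the path s -> e with the most vertices whose total weight is
    strictly less than d, or None if no such path exists."""
    adj = [[] for _ in range(n)]
    indegree = [0] * n
    for u, v, w in edges:
        if w > 0:                         # zero-weight edges are discarded
            adj[u].append((v, w))
            indegree[v] += 1

    # Topological order (Kahn's algorithm).
    order = []
    queue = deque(v for v in range(n) if indegree[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    INF = float("inf")
    # best[v][i] = (minimum weight of a path s -> v with i edges, predecessor)
    best = [[(INF, None)] * n for _ in range(n)]
    best[s][0] = (0, None)
    for u in order:
        for i in range(n - 1):
            cost_u = best[u][i][0]
            if cost_u == INF:
                continue
            for v, w in adj[u]:
                candidate = cost_u + w
                # Entries that would reach the limit d are never stored.
                if candidate < d and candidate < best[v][i + 1][0]:
                    best[v][i + 1] = (candidate, u)

    # Largest edge count with a defined entry at e, then walk back.
    for i in range(n - 1, -1, -1):
        if best[e][i][0] < d:
            path, v = [e], e
            while i > 0:
                v = best[v][i][1]
                path.append(v)
                i -= 1
            return path[::-1]
    return None


edges = [(0, 1, 1), (1, 2, 1), (2, 4, 1), (0, 4, 1), (0, 3, 2), (3, 4, 5)]
print(longest_path_under_budget(5, edges, s=0, e=4, d=4))
```

Storing only the minimum per (vertex, edge count) pair is what brings the table down to n entries per vertex, for O(n²) space and O(nm) time on top of the topological sort.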

Resources