Imagine a graph where each vertex has a value (for example, a number of stones) and is connected to other vertices through weighted edges, where each weight represents the cost, in stones, of traversing that edge. I want to find the largest possible value m such that every vertex can end up with at least m stones. Vertices can exchange stones with others, but the amount that arrives is reduced by the distance, i.e. the weight of the edges connecting them.
I need to solve this with a greedy algorithm in O(n) complexity, where n is the number of vertices, but I have trouble identifying the subproblems/greedy choice. I was hoping that someone could provide a stepping stone or some hints on how to accomplish this; much appreciated.
Summary of Question
I am not sure I have understood the question correctly, so first I will summarize my understanding.
We have a graph with vertices v1, v2, ..., vn and weighted edges; let the weight of the edge between vi and vj be W[i,j].
Each vertex starts with a number of stones; let A[i] be the number of stones on vertex vi.
You wish to perform multiple transfers in order to maximise the value of min(A[i] for i = 1..n).
x stones can be transferred between vi and vj if x > W[i,j]; this operation transforms the values as:
A[i] -= x
A[j] += x-W[i,j] # Note fewer stones arrive than leave
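For concreteness, here is that operation in Python (a direct transcription of the rule above; the function name is mine, arrays 0-indexed):

    def transfer(A, W, i, j, x):
        # Move x stones from vertex i to vertex j; only x - W[i][j] arrive.
        assert x > W[i][j]
        A[i] -= x
        A[j] += x - W[i][j]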
Is this correct?
Response
I believe this problem is NP-hard because it can be used to solve 3-SAT, a known NP-complete problem.
For a 3-SAT instance with M clauses, such as:
(A+B+!C).(B+C+D)
Construct a directed graph which has a node for each clause (with no stones), a node for each variable with 3M+1 stones, and two auxiliary nodes for each variable with 1 stone each (one represents the variable having a positive value, and one represents it having a negative value).
Then connect the nodes as follows: each variable node has edges to its two auxiliary nodes, and each auxiliary node has edges to the clause nodes that use the corresponding literal.
This graph will have a solution with all vertices having value >= 1 if and only if the 3-SAT instance is satisfiable.
The point is that each red node (e.g. for variable A) can only send stones to either A=1 or A=0, but not both. If we provide stones to the green node A=1, then this node can supply stones to all of the blue clauses which use that variable in a positive sense.
(Your original question does not involve a directed graph, but I doubt that this additional change will make a material difference to the complexity of the problem.)
Summary
I am afraid it is going to be very hard to get an O(n) solution to this problem.
Related
I've been reading some papers on multicut algorithms for segmenting graph structures. I'm specifically interested in this work which proposes an algorithm to solve an extension of the multicut problem:
https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Keuper_Efficient_Decomposition_of_ICCV_2015_paper.pdf
Regarding the edge costs, it says:
...for any pair of nodes, a real-valued cost (reward) to all decompositions for which these nodes are in distinct components
Fair enough. It further says that the solution to the multicut problem is a simple binary vector of length equal to the number of edges in the graph, in which a '1' indicates that the corresponding edge separates two vertices belonging to distinct graph components:
for every edge vw ∈ E ∪ F, y(v,w) = 1 if and only if v and w are in distinct components of G.
But then the optimization problem is written as the minimization min_y Σ_{e ∈ E∪F} c_e · y_e.
This doesn't seem to make sense. If the edge weights depict rewards for that edge connecting nodes in distinct components, shouldn't this be a maximization problem? And in either case, if all edge weights are positive, wouldn't that lead to a trivial solution where y is an all-zeros vector? The above expression is followed by some constraints in the paper, but I couldn't figure out how any of those prevent this outcome.
Furthermore, when it later tries to generate an initial solution using Greedy Additive Edge Contraction, it says:
Alg. 1 starts from the decomposition into single nodes. In every iteration, a pair of neighboring components is joined for which the join decreases the objective value maximally. If no join strictly decreases the objective value, the algorithm terminates.
Again, if edge weights are rewards for keeping nodes separate, wouldn't joining any two nodes reduce the reward? And even if I assume for a second that edge weights are penalties for keeping nodes separate, wouldn't this method simply lump all the nodes into a single component?
The only way I see this working is if the edge weights are a balanced combination of positive and negative values, but I'm pretty sure I'm missing something, because this constraint isn't mentioned anywhere in the literature.
Just citing this multicut lecture:
Minimum Multicut. The input consists of a weighted, undirected graph G = (V,E) with a non-negative weight c_e for every edge in E, and a set of terminal pairs {(s1,t1), (s2,t2), ..., (sk,tk)}. A multicut is a set of edges whose removal disconnects each of the terminal pairs.
I think this definition makes it clear that the multicut problem is a minimization over the accumulated weight of the edges selected to cut. Maximizing the weight would of course be trivial (remove all edges). No?
Better late than never, here's the answer:
The weights c_e for cutting the edge e are not restricted to be positive as defined in Definition 1. In fact, Equation (7) specifies that they are log-ratios of two complementary probabilities. That means if the estimated probability for edge e being cut is greater than 0.5, then c_e will be negative. If it's smaller, then c_e will be positive.
While the trivial "all edges cut" solution is still feasible, it is quite unlikely to be optimal in any non-toy instance, where some edges are more likely to be cut while others are more likely to remain.
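For illustration, here is my reading of that weighting in Python (the exact sign convention of Eq. (7) in the paper may differ; treat this as a sketch):

    import math

    def edge_cost(p_cut):
        # Log-ratio of the two complementary probabilities: negative when
        # the edge is more likely cut (p_cut > 0.5), positive otherwise.
        return math.log((1.0 - p_cut) / p_cut)

    print(edge_cost(0.9))  # negative: cutting this edge lowers the objective
    print(edge_cost(0.1))  # positive: cutting this edge raises the objective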
Question
How would one go about finding a least-cost path when the destination is unknown, but the number of edges traversed is a fixed value? Is there a specific name for this problem, or for an algorithm that solves it?
Note that maybe the term "walk" is more appropriate than "path", I'm not sure.
Explanation
Say you have a weighted graph and you start at vertex V1. The goal is to find a path of length N (where N is the number of edges traversed; you can cross the same edge multiple times and revisit vertices) that has the smallest cost. This process would need to be repeated for all possible starting vertices.
As an illustration, consider a turn-based game where rooms are connected by corridors. Each corridor has a cost associated with it, and your final score is lowered by an amount equal to each cost 'paid'. It takes 1 turn to traverse a corridor, and the game lasts 10 turns. You can stay in a room (a self-loop), but staying put has a cost associated with it too. If you know the cost of every corridor (and of staying put in each room; i.e., you know the weighted graph), what is the optimal (highest-scoring) path to take for a 10-turn (or N-turn) game? You can revisit rooms and corridors.
Possible Approach (likely to fail)
I was originally thinking of using Dijkstra's algorithm to find the least-cost path (LCP) between all pairs of vertices, and then, for each starting vertex, selecting the LCPs of length N. However, I realized that this might not give the LCP of length N for a given starting vertex. For example, Dijkstra's LCP between V1 and V2 might have length < N, and Dijkstra's might have excluded an unnecessary but low-cost edge which, if included, would have made the path length equal to N.
It's an interesting fact that if A is the adjacency matrix and you compute A^k using addition and min in place of the usual multiply and sum used in normal matrix multiplication, then A^k[i,j] is the length of the shortest path from node i to node j with exactly k edges. Now the trick is to use repeated squaring so that A^k needs only log k matrix multiply ops.
If you need the path in addition to the minimum length, you must track where the result of each min operation came from.
For your purposes, you want the location of the min of each row of the result matrix and corresponding path.
This is a good algorithm if the graph is dense. If it's sparse, then doing one breadth-first search per node to depth k will be faster.
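A small Python sketch of that min-plus matrix power with repeated squaring (function names are mine; use float('inf') where no edge exists, including on the diagonal, unless a vertex has a genuine self-loop, since we want exactly k edges):

    INF = float('inf')

    def min_plus_mult(A, B):
        # (min, +) matrix product: C[i][j] = min over t of A[i][t] + B[t][j].
        n = len(A)
        return [[min(A[i][t] + B[t][j] for t in range(n)) for j in range(n)]
                for i in range(n)]

    def min_plus_power(A, k):
        # Repeated squaring: only O(log k) (min, +) products needed; assumes k >= 1.
        # min_plus_power(A, k)[i][j] is the cheapest walk from i to j
        # using exactly k edges (INF if none exists).
        result, base = None, A
        while k:
            if k & 1:
                result = base if result is None else min_plus_mult(result, base)
            base = min_plus_mult(base, base)
            k >>= 1
        return result

For the original question, you would then take, for each row i, the minimum entry of min_plus_power(A, N), tracking argmins along the way if you need the path itself, as noted above.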
I'm trying to find a heuristic for a problem that maps to a directed graph with, say, non-negative edge weights. However, each edge is associated with two weight properties rather than one (say one is distance, and the other indicates how good the road's 4G LTE coverage is!). Is there a specific variation of Dijkstra's, Bellman-Ford, or any other algorithm that pursues this objective? Of course, a naive workaround is manually deriving a single weight as a combination of the two, but this does not look good.
Can it be generalized to cases with multiple properties?
Say you want to optimize simultaneously two criteria: distance and attractiveness (and say path attractiveness is defined as the attractiveness of the most attractive edge, although you can think of different definitions). The following variation of Dijkstra can be shown to work, but I think it is mainly useful where one of the criteria takes a small number of values - say attractiveness is 1, ..., k for some small fixed k (smaller i is better).
The standard pseudocode for Dijkstra's algorithm uses a single priority queue. Instead, use k priority queues. Priority queue i will correspond, in Dijkstra's algorithm, to the shortest path to a node v ∈ V with attractiveness i.
Start by initializing that each node is in each of the queues with distance ∞ (because, initially, the shortest path to v with attractiveness i is infinite).
In the main Dijkstra loop, where it says
while Q is not empty
change it to
while there is an i for which Q[i] is not empty
Q = Q[i] for the lowest such i
and continue from there.
Note that when you update, you pop from queue Q[i], and insert to Q[j] for j ≥ i.
It's possible to modify the proof of Dijkstra's relaxation property to show that this works.
Note that you will obtain up to k|V| results, since for each node and each attractiveness value, you can have a shortest distance to that node with that attractiveness.
Example
Taking an example from the comments:
So basically if a path has a total no-coverage miles of >10, then we go for another path.
Here, e.g., assuming the miles are integers (or can be rounded to integers), we could create 11 queues: queue i corresponds to the shortest distance with i no-coverage miles, except for 10, which corresponds to 10-or-higher no-coverage-miles.
At some point in the algorithm, say all queues below queue 3 are empty. We pop queue 3 and update the popped vertex's neighbors: this might update, e.g., some node in queue 4, if the edge from the popped node to that node has 1 no-coverage mile.
As the algorithm runs, it outputs mappings of (node, no-coverage-distance) → shortest distance. Here, you could decide that you discard all mappings for which the second item in the pair is 10.
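Here is a Python sketch of this k-queue variant, applied to the no-coverage example (variable names are mine; each edge carries a distance plus an integer number of no-coverage miles, capped at k-1):

    import heapq

    INF = float('inf')

    def multi_criteria_dijkstra(adj, source, k):
        # adj[u] = list of (v, dist, inc); inc is the edge's integer
        # contribution to the second criterion (no-coverage miles).
        # dist_to[v][i] = shortest distance to v among paths whose second
        # criterion totals i (with i = k-1 meaning "k-1 or more").
        # Nodes are inserted lazily rather than pre-inserted with distance INF.
        n = len(adj)
        dist_to = [[INF] * k for _ in range(n)]
        dist_to[source][0] = 0
        queues = [[] for _ in range(k)]
        heapq.heappush(queues[0], (0, source))
        while True:
            # Pop from the lowest-numbered non-empty queue.
            i = next((i for i in range(k) if queues[i]), None)
            if i is None:
                break
            d, u = heapq.heappop(queues[i])
            if d > dist_to[u][i]:
                continue  # stale entry
            for v, w, inc in adj[u]:
                j = min(i + inc, k - 1)  # updates always go to Q[j], j >= i
                if d + w < dist_to[v][j]:
                    dist_to[v][j] = d + w
                    heapq.heappush(queues[j], (d + w, v))
        return dist_to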
Background:
Consider an undirected graph with vertices S1, S2, ..., S6, where every edge has a weight that is either positive or negative.
Definitions:
In this graph, a simple cycle is called a conflicting cycle if it has an odd number of negative edges, and a concordant cycle if it has an even (or zero) number of negative edges. In the example graph, for instance, there are two conflicting cycles (S1-S2-S3-S1 and S2-S3-S4-S2), and all other cycles are concordant. A graph is called happy if it has no conflicting cycle.
Objective:
Make the graph happy by removing some edges, while keeping the cost (the sum of the absolute values of the weights of the removed edges) as low as possible. In the example, removing a single edge leaves no conflicting cycles, so the graph becomes happy, and the cost is only 2.
This problem is NP-hard by reduction from maximum cut. Given an instance of maximum cut, multiply all of the edge weights by -1. The constraints of this problem dictate that edges be removed so as to eliminate all odd cycles, i.e., we need to find the maximum-weight bipartite subgraph.
This problem in fact is equivalent to a 2-label unique label cover problem. The goal is to color each vertex black or white so as to minimize the sum of costs for (i) positive edges that connect vertices of different colors (ii) negative edges that connect vertices of the same color. Deleting all of these edges is a valid solution to the original problem. Conversely, given a valid set of edges to delete, we can determine a coloring. I expect that there's an approximation algorithm based on semidefinite programming (and the relaxation could be used for branch and bound).
Unless you're familiar with combinatorial optimization, however, the algorithm that I would suggest is integer programming. Let x(e) be 1 if we delete edge e and let x(e) be 0 if we don't.
minimize sum_{edges e} cost(e) x(e)
subject to
for every simple cycle C with an odd number of negative edges,
sum_{edges e in C} x(e) >= 1
for each edge e, x(e) in {0, 1}
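As a concrete sketch, here is that program in PuLP, taking the list of conflicting cycles as given (the helper name and data layout are mine):

    from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

    def solve_edge_deletion(edges, cost, conflicting_cycles):
        # edges: list of hashable edge ids; cost[e] = |weight of e|;
        # conflicting_cycles: iterable of edge subsets, one per simple cycle
        # with an odd number of negative edges (enumerated beforehand).
        prob = LpProblem("signed_graph_balancing", LpMinimize)
        x = {e: LpVariable(f"x_{i}", cat=LpBinary) for i, e in enumerate(edges)}
        prob += lpSum(cost[e] * x[e] for e in edges)       # objective
        for cycle in conflicting_cycles:
            prob += lpSum(x[e] for e in cycle) >= 1        # break the cycle
        prob.solve()
        return {e for e in edges if x[e].value() > 0.5}    # edges to delete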
The solver will do most of the work. The problem is handling the exponential number of constraints that I've written. The crudest thing to do is to generate all simple cycles and give the solver the whole program. Another possibility is to solve to optimality with a subset of the constraints, test whether the solution is actually valid, and, if not, introduce one or more missing constraints. To do the test, attempt to two-color the undeleted subgraph such that vertices joined by a positive edge have identical colors and vertices joined by a negative edge have different colors. Color greedily; if we get stuck, then there's an odd cycle at fault.
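A minimal sketch of that greedy two-coloring test (function name and data layout are mine):

    from collections import deque

    def is_happy(n, edges):
        # edges: list of (u, v, sign) over the undeleted graph, sign = +1/-1.
        # Greedily 2-color so that positive edges join equal colors and
        # negative edges join different colors; failure exposes a
        # conflicting cycle.
        adj = [[] for _ in range(n)]
        for u, v, s in edges:
            adj[u].append((v, s))
            adj[v].append((u, s))
        color = [None] * n
        for start in range(n):
            if color[start] is not None:
                continue
            color[start] = 0
            q = deque([start])
            while q:
                u = q.popleft()
                for v, s in adj[u]:
                    want = color[u] if s > 0 else 1 - color[u]
                    if color[v] is None:
                        color[v] = want
                        q.append(v)
                    elif color[v] != want:
                        return False  # a conflicting cycle exists
        return True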
With more sophistication, it's possible to solve the program as written via a technique called column generation.
I've written a solver for this problem (under the name "Signed Graph Balancing"). It is based on a fixed-parameter algorithm that is fast if only few edges need to be deleted. The method is described in the paper "Separator-based data reduction for signed graph balancing".
Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes such that:
the total distance is less than some constant d
starting from s, each node in the path is closer than the previous one to the node e, in the sense that as you traverse the path, the total edge weight of the remaining path to e keeps decreasing.
We can assume there are no cycles in the graph. There are no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?
Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes you O(n+m) so it won't add to the complexity of the problem (since you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else since those never fulfill your second criterion.
EDIT: I figured out an algorithm; it's polynomial, depending on the size of your graphs it may still not be sufficiently efficient though. See the edit further down.
Now for some complexity. The first thing to think about here is an upper bound on how many paths we can actually have, so depending on the choice of d and the weights of the edges, we also have an upper bound on the complexity of any potential algorithm.
How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i<j. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.
How many paths can there be from one vertex to another in the graph described above? The answer is 2^(n-2). Proof by induction:
Take the graph over 2 vertices as described above; there is 1 = 2^0 = 2^(2-2) path from vertex 1 to vertex 2: (1->2).
Induction step: assuming there are 2^(n-2) paths from the vertex with number 1 of an n-vertex graph as described above to the vertex with number n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own edge to the vertex now labeled n+1. Additionally, it has 2^(i-2) paths to that vertex for every i in [2;n] (it has all the paths the other vertices collectively have to the vertex n+1, each "prefixed" with the edge 1->i). This gives us 1 + Σ_{k=2}^{n} 2^(k-2) = 1 + Σ_{k=0}^{n-2} 2^k = 1 + (2^(n-1) - 1) = 2^(n-1) = 2^((n+1)-2).
So we see that there are DAGs that have 2^(n-2) distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on the weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't choose some form of optimum (which is what you're looking for) efficiently, though.
EDIT: Ok so here goes what I would do:
Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological sorting from s to e, and call it the integer interval [s;e]. Delete everything from the graph that isn't strictly in that interval, meaning all vertices outside of it along with their incident edges. During the topological sort you'll also be able to see whether there is a path from s to e, so you'll know whether any paths s-...->e exist at all. Complexity of this part is O(n+m).
Now the actual algorithm:
Traverse the vertices of [s;e] in the order imposed by the topological sorting.
For every vertex v, store a two-dimensional array of information; let's call it prev_v[][], since it's going to store information about the predecessors of a node on the paths leading towards it.
In prev_v[i][j], store the total weight of a path of length i (counted in edges) ending at v, if j is the predecessor of v on that path. For example, prev_{s+1}[1][s] would hold the weight of the edge s->s+1, while all other entries in prev_{s+1} would be 0/undefined.
When calculating the array for a new vertex v, all we have to do is check its incoming edges and iterate over the arrays of the start vertices of those edges. For example, let's say vertex v has an incoming edge from vertex w with weight c. Consider what the entry prev_v[i][w] should be. We have an edge w->v, so we need to set prev_v[i][w] to min(prev_w[i-1][k] over all k, ignoring 0/undefined entries) + c (notice the subscript of the array!); we effectively take the cost of a path of length i - 1 that leads to w, and add the cost of the edge w->v. Why the minimum? The vertex w can have many predecessors for paths of length i - 1; however, we want to stay below a cost limit, which greedy minimization at each vertex does for us. We will need to do this for all i in [1;v-s].
While calculating the array for a vertex, do not set entries that would give you a path with cost above d; since all edges have positive weights, we can only get more costly paths with each edge, so just ignore those.
Once you have reached e and finished calculating prev_e, you're done with this part of the algorithm.
Iterate over prev_e, starting with prev_e[e-s]; since we have no cycles, all paths are simple paths, and therefore the longest path from s to e can have at most e-s edges. Find the largest i such that prev_e[i] has a non-zero (meaning defined) entry; if none exists, there is no path fitting your criteria. You can reconstruct any existing path using the arrays of the other vertices.
That gives you a space complexity of O(n³) and a time complexity of O(n²m): the arrays have O(n²) entries, and we have to iterate over O(m) arrays, one array for each edge. I think it's fairly obvious where the wasteful use of data structures here can be optimized using hashing structures and other things than plain arrays. In particular, you could use a one-dimensional array that only stores the current minimum instead of recomputing it every time (you'll have to store the sum of edge weights of the path together with the predecessor vertex, since you need the predecessor to reconstruct the path). That shrinks each array from n² to n entries, since you now only need one entry per number-of-edges-on-the-path-to-the-vertex, bringing the space complexity down to O(n²) and the time complexity to O(nm). You can also try a form of topological sort that removes the vertices from which you can't reach e, because those can be safely ignored as well.
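For reference, here is a compact Python sketch of that optimized variant, storing one (cost, predecessor) entry per path length at each vertex (function and variable names are mine; it assumes a DAG with strictly positive weights):

    from collections import defaultdict, deque

    def max_node_path(n, edges, s, e, d):
        # edges: list of (u, v, w) with w > 0. best[v][i] = (cost, pred) of a
        # minimum-cost path s -> v with exactly i edges, pruned to cost <= d.
        adj = defaultdict(list)
        indeg = [0] * n
        for u, v, w in edges:
            adj[u].append((v, w))
            indeg[v] += 1
        order, q = [], deque(i for i in range(n) if indeg[i] == 0)
        while q:                      # Kahn's algorithm: topological order
            u = q.popleft()
            order.append(u)
            for v, _ in adj[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    q.append(v)
        best = [dict() for _ in range(n)]
        best[s][0] = (0, None)
        for u in order:               # relax edges in topological order
            for v, w in adj[u]:
                for i, (cost, _) in list(best[u].items()):
                    if cost + w <= d and \
                       (i + 1 not in best[v] or cost + w < best[v][i + 1][0]):
                        best[v][i + 1] = (cost + w, u)
        if not best[e]:
            return None               # no path from s to e within budget d
        i = max(best[e])              # most edges => most nodes on the path
        path, v = [], e
        while v is not None:          # walk the predecessor chain backwards
            path.append(v)
            v = best[v][i][1]
            i -= 1
        return path[::-1]

Each vertex keeps at most n entries and each edge is relaxed against each entry of its tail once, matching the O(nm)-time, O(n²)-space bound above.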