I need to write an algorithm for the following scenario:
Given a set of points (cities) on a map that I have to visit. Each of these cities has a priority, from 1 (you need to go there ASAP) to 5 (no rush, but go there eventually). I have limited resources, so I cannot visit all the priority-1 cities first. (E.g., if NY and SF have priority 1 and Washington has priority 5, I'm looking for the path NY-Washington-SF, not NY-SF-Washington.)
I don't know if it matters, but n (the number of cities) will usually be around 10-20.
I found a presentation of "The Hierarchical Traveling Salesman Problem" (Panchamgam, Xiong, Golden, and Wasil), which is close to what I'm looking for, but the corresponding article is not publicly accessible.
Can you recommend existing algorithms for such scenarios? Or point me in the right direction what to search for?
An approximation would be alright. My scenario is not as life-threatening as the one described by Panchamgam et al. It's important to avoid unnecessary detours caused by the priorities without ignoring them completely.
In standard TSP you want to minimize the total length of the route. In your case you basically want to optimize two metrics: the length of the route, and how early high-priority cities appear on the route. You need to combine these into a single metric, and how you do that might affect the choice of algorithm. For example, map the city priorities to penalties, e.g.
1 -> 16
2 -> 8
3 -> 4
4 -> 2
5 -> 1
Then use, as the metric to minimize, the total sum of (city_penalty * distance_to_city_from_start_on_route). This pushes the high-priority cities toward the beginning of the route in general, but allows out-of-priority-order traversal if the route would otherwise become too long. Obviously the penalty values should be tuned experimentally.
With this metric, you can then use a standard stochastic search approach: start with a route, then swap cities or edges on the route in order to decrease the metric (using simulated annealing or tabu search).
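To make that concrete, here is a minimal sketch of the idea in Python: the cost of a route is the sum of penalty * distance-from-start, and simulated annealing proposes random segment reversals. The function names, cooling schedule, and acceptance rule are illustrative assumptions, not tuned choices.

```python
import math
import random

def route_cost(order, dist, penalty):
    """Sum over cities of penalty[city] * distance_from_start(city)
    along the route; order[0] is the fixed starting city."""
    total = travelled = 0.0
    for prev, city in zip(order, order[1:]):
        travelled += dist[prev][city]
        total += penalty[city] * travelled
    return total

def anneal(cities, dist, penalty, iters=20000, seed=0):
    """Simulated annealing over visiting orders: propose a random
    segment reversal, always accept improvements, and accept
    worsenings with a probability that shrinks as the run cools."""
    rng = random.Random(seed)
    order = list(cities)
    cost = route_cost(order, dist, penalty)
    best, best_cost = order[:], cost
    for k in range(iters):
        t = max(1.0 - k / iters, 1e-9)          # linear cooling
        i, j = sorted(rng.sample(range(1, len(order)), 2))
        cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        c = route_cost(cand, dist, penalty)
        if c < cost or rng.random() < math.exp(-(c - cost) / (t * (abs(cost) + 1))):
            order, cost = cand, c
            if c < best_cost:
                best, best_cost = order[:], c
    return best, best_cost
```

Swapping the annealing loop for a tabu list, or the reversal move for single-city swaps, fits the same skeleton.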
An upper bound of 20-ish puts dynamic programming in play. There's an O(n^2 2^n)-time algorithm for the plain old traveling salesman path that goes like this. For each end vertex (n choices) and each subset of vertices containing that end vertex (2^(n - 1) choices), we determine the cheapest path that visits the entire subset. Iterate over the subsets so that each set comes after its proper subsets (e.g., represent the sets as bit vectors and count from 0 to 2^n - 1). For each end vertex v in a subset S, the cheapest path over S is either just v (if S = {v}) or it consists of a cheapest path over S - {v} (computed already) followed by v. Each vertex w in S - {v} is a possibility for the end vertex of the path over S - {v}, i.e., the next-to-last vertex of the path over S.
You haven't completely specified how the priorities interact with the goal of minimizing distance. One could, for example, translate the priorities into deadlines (you must visit this vertex before traveling x distance). The dynamic program adapts easily to this setting: the only modification needed is to assign cost +infinity if the distance to reach the specified end vertex is too great. There are many other possibilities here; you can have an objective that sums, over each vertex, some vertex-dependent function of the distance at which that vertex is reached.
From an engineering standpoint, the nice thing about implementing an exact algorithm is that it is much easier to test (just compare with brute force).
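A sketch of that dynamic program, including the deadline modification described above, might look like this (the `deadline` parameter name and data layout are illustrative; the code favors clarity over bit-twiddling speed):

```python
def tsp_path(dist, start=0, deadline=None):
    """Held-Karp dynamic program for the cheapest path that starts at
    `start` and visits every vertex exactly once, in O(n^2 * 2^n) time.
    `deadline` is an optional list of per-vertex distance deadlines
    (None = no deadline); a state that reaches a vertex after its
    deadline is discarded, i.e. effectively given cost +infinity."""
    n = len(dist)
    INF = float('inf')
    full = 1 << n
    # dp[S][v] = cheapest cost of a path covering vertex set S, ending at v
    dp = [[INF] * n for _ in range(full)]
    dp[1 << start][start] = 0.0
    for S in range(full):                 # subsets come before their supersets
        for v in range(n):
            c = dp[S][v]
            if c == INF:
                continue
            for w in range(n):
                if (S >> w) & 1:
                    continue              # w is already on the path
                nc = c + dist[v][w]
                if deadline and deadline[w] is not None and nc > deadline[w]:
                    continue              # w would be reached too late
                T = S | (1 << w)
                if nc < dp[T][w]:
                    dp[T][w] = nc
    return min(dp[full - 1])              # best over all end vertices
```

For the brute-force comparison mentioned above, checking this against an exhaustive enumeration of permutations on 6-8 vertices is cheap and catches most bugs.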
I am trying to develop an algorithm to solve a problem that I am not able to classify. Here is the subject:
You have a map divided into sections that have a certain area and where a certain number of people live.
The problem consists of finding sets of connected sections whose total area does not exceed a certain value, while maximizing the number of inhabitants in the selected sections.
For now I can think of two approaches:
1. Treat the problem as an all-pairs shortest paths problem in an undirected graph with positive natural weights, where solutions that do not meet the maximum-selected-area constraint are discarded. For this you could use the Floyd-Warshall algorithm, Dijkstra for all pairs, or Thorup's algorithm (which can be done in time V * E, where these are the vertices and edges of the graph).
2. Treat it as an open vehicle routing problem with profits (OVRPP), where each vehicle can start and end wherever it wants.
Another approach
Also, depending on the combinatorics of the particular problem, it is possible in certain cases to use genetic algorithms together with tabu search, but only for cases where finding an exact optimal solution is not feasible.
To be clearer: what is sought is a selection of connected sections whose areas sum to no more than a given total area. The parameter to maximize is the sum of the populations of the selected sections. The objective is to find an optimal solution.
For example, this is the optimal selection with max area of 6 (red color area)
Thank you all in advance!
One pragmatic approach would be to formulate this as an instance of integer linear programming (ILP) and use an off-the-shelf ILP solver. One way to formulate it is to build a graph with one vertex per section and an edge between each pair of adjacent sections; then, selecting a connected set of sections is equivalent to selecting a spanning tree for that set.
So, let x_v be a set of zero-or-one variables, one for each vertex v, and let y_{u,v} be another set of zero-or-one variables, one per edge (u,v). The intended meaning is that x_v=1 means that v is one of the selected sections; and that y_{u,v}=1 if and only if x_u=x_v=1, which can be enforced by y_{u,v} >= x_u + x_v - 1, y_{u,v} <= x_u, y_{u,v} <= x_v. Also add a constraint that the number of y's that are 1 is one less than the number of x's that are 1 (so that the y's form a tree): sum_v x_v = 1 + sum_{u,v} y_{u,v}. Finally, you have a constraint that the total area does not exceed the maximum: sum_v A_v x_v <= maxarea, where A_v is the area of section v.
Your goal is then to maximize sum_v P_v x_v, where P_v is the population of section v. The solution to this integer linear program will give the optimal solution to your problem.
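Since the instances may be small, an exhaustive reference solver is also easy to write, and it is handy for validating an ILP model or a heuristic against known-optimal answers. A sketch (the function name and the dict-based data layout are illustrative assumptions):

```python
from collections import deque
from itertools import combinations

def best_connected_selection(adj, area, pop, max_area):
    """Exhaustive search over all connected sets of sections within the
    area budget, maximizing total population. adj maps a section to the
    set of adjacent sections; area and pop map sections to their values.
    Exponential in the number of sections, so only practical for small
    instances, but useful as a ground-truth reference."""
    nodes = list(adj)
    best, best_pop = set(), 0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            chosen = set(subset)
            if sum(area[v] for v in chosen) > max_area:
                continue
            # connectivity check: BFS from an arbitrary chosen section
            seen, queue = {subset[0]}, deque([subset[0]])
            while queue:
                v = queue.popleft()
                for w in adj[v] & chosen:
                    if w not in seen:
                        seen.add(w)
                        queue.append(w)
            total_pop = sum(pop[v] for v in chosen)
            if seen == chosen and total_pop > best_pop:
                best, best_pop = chosen, total_pop
    return best, best_pop
```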
I'm looking at several problems of similar format but different difficulty. Help would be appreciated with polynomial-time solutions (preferably relatively fast, but not necessarily), or even brute-force solutions, for any of them.
The idea of all of the problems is that you have a weighted, undirected graph, and that an agent controls some of the nodes of the graph at the start. The agent can gain control of a node if they already control two adjacent nodes. The agent is trying to minimise the time they take to control a certain number of nodes. The problems differ on some details.
(1) You gain control of nodes in order (i.e., you cannot take over multiple nodes simultaneously). The time taken to take control of a node is defined as the minimum of the weights of the edges from the two nodes used to take control of it. The goal is to take control of every single node in the graph.
(2) Again, you gain nodes in order and the goal is to take control of every single node in the graph. The time taken to take control of a node is defined as the maximum of the weights of the edges from the two nodes used to take control of it.
(3) Either (1) or (2), but with the goal of taking control of a certain number of nodes, not necessarily all of them.
(4) (3), but you can take control of multiple nodes simultaneously. Basically, say nodes 2 and 4 are being used to take over node 3 in time of 5. During that time of 5, nodes 2 and 4 cannot be used to take over a node that is not node 3. However, nodes 5 and 6 may for example be simultaneously taking over node 1.
(5) (4), but with an unweighted graph.
I started with problem (4). I progressively made the problem easier, from (4) to (3) to (2) to (1), with the hope that I could construct the solution for (4) from there. Finally, I solved (1) but do not know how to solve any of the others. My solution to (1) is this: of all candidate nodes that have two adjacent nodes we control, simply take the one that takes the shortest time. This is similar to Dijkstra's shortest-path algorithm. However, this kind of solution should not solve any of the others. I believe a dynamic programming solution might work, but I have no idea how to formulate one. I also have not found brute-force solutions for any of the four problems. It is also possible that some of the problems are not polynomially solvable, and I would be curious to know why if that is the case.
The ideas for the questions are my own, and I'm solving them for my own entertainment. But I would not be surprised if they can be found elsewhere.
This isn't an answer to the problem. It is a demonstration that the greedy approach fails for problem 1.
Suppose that we have a graph with 7 nodes. We start by controlling A and B. The edges A-B, B-C, and C-D all have cost 1. Both E and F connect to A, B, and D with cost 10. G connects to A, B, C, and D with cost 100.
The greedy strategy that you describe will connect to E and F at cost 10 each, then D at cost 10, then C at cost 1, then G at cost 100 for a total cost of 131.
The best strategy is to connect to G at cost 100, then C and D at cost 1, then E and F at cost 10 for a total cost of 122 < 131.
And this example demonstrates that greedy is not always going to produce the right answer.
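The counterexample above is small enough to verify by brute force over acquisition orders. A sketch (the `EDGES`, `weight`, and `cost_of_order` names are illustrative helpers; node labels follow the answer):

```python
from itertools import permutations

# The counterexample graph: the agent starts by controlling A and B.
EDGES = {
    ('A', 'B'): 1, ('B', 'C'): 1, ('C', 'D'): 1,
    ('E', 'A'): 10, ('E', 'B'): 10, ('E', 'D'): 10,
    ('F', 'A'): 10, ('F', 'B'): 10, ('F', 'D'): 10,
    ('G', 'A'): 100, ('G', 'B'): 100, ('G', 'C'): 100, ('G', 'D'): 100,
}

def weight(u, v):
    return EDGES.get((u, v)) or EDGES.get((v, u))

def cost_of_order(order, start=('A', 'B')):
    """Cost of acquiring nodes in the given order under problem (1):
    a node needs two already-controlled neighbours, and its cost is the
    cheapest edge from a controlled neighbour. Returns None if some
    node is taken before two of its neighbours are controlled."""
    controlled, total = set(start), 0
    for v in order:
        ws = sorted(w for u in controlled if (w := weight(v, u)) is not None)
        if len(ws) < 2:
            return None          # infeasible order
        total += ws[0]           # min over edges from controlled neighbours
        controlled.add(v)
    return total
```

Enumerating all feasible orders of the remaining five nodes confirms the greedy total of 131 and the optimum of 122.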
I haven't been able to come up with a reduction yet, but these problems have the flavor of NP-hard network design and maximum coverage problems, so I would be quite surprised if variants (3) through (5) were tractable.
My practical suggestion would be to apply the Biased Random-Key Genetic Algorithm framework. The linked slide deck covers the generic part (an individual is a map from nodes to numbers; at each step, we rank individuals, retain the top x% "elite" individuals as is, produce y% offspring by crossing a random elite individual with a random non-elite individual, biased toward selecting the elite chromosomes, and fill out the rest of the population with random individuals). The non-generic part is translating an individual into a solution. My recommended starting point would be to choose to explore the lowest-numbered eligible node each time.
I have a problem similar to the basic TSP but not quite the same.
I have a starting position for a player character, and he has to pick up n objects in the shortest time possible. He doesn't need to return to the original position and the order in which he picks up the objects does not matter.
In other words, the problem is to find the minimum-weight (distance) Hamiltonian path with a given (fixed) start vertex.
What I currently have is an algorithm like this:

    best_total_weight_so_far = Inf
    foreach possible end vertex:
        add a vertex with 0-weight edges to the start and end vertices
        current_solution = solve TSP for this graph
        remove the 0-weight vertex
        total_weight = Weight(current_solution)
        if total_weight < best_total_weight_so_far:
            best_solution = current_solution
            best_total_weight_so_far = total_weight
However this algorithm seems to be somewhat time-consuming, since it has to solve the TSP n-1 times. Is there a better approach to solving the original problem?
It is a rather minor variation of TSP and clearly NP-hard. Any heuristic algorithm for TSP (and you really shouldn't try to do anything better than a heuristic for a game, IMHO) should be easily modifiable to your situation. Even nearest neighbor probably wouldn't be bad; in fact, for your situation it would probably work better than it does for TSP, since in nearest neighbor the closing edge back to the start is often the worst one. Perhaps you can use NN + 2-opt to eliminate edge crossings.
On edit: Your problem can easily be reduced to the TSP for directed graphs. Double all of the existing edges so that each is replaced by a pair of opposite arcs. The cost of each arc is simply the cost of the corresponding edge, except for the arcs that go into the start node: make those cost 0 (no cost for returning home at the end of the day). If you have code that solves the TSP for directed graphs, you can thus use it for your case as well.
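The NN + 2-opt idea mentioned above can be sketched as follows for a fixed-start open path, assuming Euclidean points (all function names are illustrative):

```python
import math
import random

def path_length(pts, order):
    return sum(math.dist(pts[order[i]], pts[order[i + 1]])
               for i in range(len(order) - 1))

def nn_path(pts, start=0):
    """Nearest-neighbour construction for an open path with a fixed start."""
    todo = set(range(len(pts))) - {start}
    order = [start]
    while todo:
        nxt = min(todo, key=lambda j: math.dist(pts[order[-1]], pts[j]))
        order.append(nxt)
        todo.remove(nxt)
    return order

def two_opt(pts, order):
    """2-opt for an open path: repeatedly reverse a segment (never
    moving the fixed start at index 0) while that shortens the path."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if path_length(pts, cand) < path_length(pts, order) - 1e-9:
                    order, improved = cand, True
    return order
```

Note that, unlike in tour-based 2-opt, reversing a suffix of an open path is a legal move, which is why `j` may run to the final index.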
At the risk of it getting slow (20 points should be fine), you can use good old exact TSP algorithms in the way John describes. 20 points is really easy for TSP: instances with thousands of points are routinely solved, and instances with tens of thousands of points have been solved.
For example, use linear programming and branch & bound.
Make an LP problem with one variable per arc (there are more of them now because the graph is directed). The variables will be between 0 and 1, where 0 means "don't take this arc in the solution", 1 means "take it", and fractional values sort of mean "take it .. a bit" (whatever that means).
The costs are obviously the distances, except for returning to the start. See John's answer.
Then you need constraints: for each node, the sum of its incoming arcs must be 1, and the sum of its outgoing arcs must be 1. Also, the sum of each pair of arcs that was previously one edge must be less than or equal to 1.

The solution will now consist of disconnected triangles, because that is the smallest way to connect the nodes so that each has both an incoming arc and an outgoing arc, with those two arcs not being "the same edge". So the sub-tours must be eliminated. The simplest way to do that (probably strong enough for 20 points) is to decompose the solution into connected components, and then for each connected component require that the sum of arcs entering it be at least 1 (it can be more than 1), and the same for the arcs leaving it. Solve the LP problem again and repeat until there is only one component. There are more cuts you can add, such as the obvious Gomory cuts, but also fancy special TSP cuts (comb cuts, blossom cuts, crown cuts .. there are whole books about this); you won't need any of them for 20 points.
What this gives you is, sometimes, the solution directly. Usually, to begin with, it will contain fractional edges. In that case it still gives you a good underestimate of how long the tour will be, and you can use that within the framework of branch & bound to determine the actual best tour. The idea there is to pick an edge that was fractional in the result and force it to either 0 or 1 (this often turns edges that were previously integral into fractional ones, so you have to keep all chosen edges fixed throughout the sub-tree in order to guarantee termination). Now you have two sub-problems; solve each recursively. Whenever the estimate from the LP solution becomes longer than the best path you have found so far, you can prune that sub-tree (since it's an underestimate, all integral solutions in that part of the tree can only be worse). You can initialize the best-so-far solution with a heuristic one, but for 20 points it doesn't really matter; the techniques described here are already enough to solve 100-point problems.
I have to study the resistance of the principal cluster of a percolating network of conducting wires. Individual wires are labeled from 1 to n. I represent the network by a graph G(V,E) and find its adjacency matrix A, where A_ij = 1 if wires i and j are in contact, 0 otherwise.
My question is the following: given that I need to implement Kirchhoff's laws on the main percolated cluster, I need an algorithm that returns, ideally, all the smallest loops in the cluster. Do you know of an algorithm (mine is brute force now and not efficient) that finds all the loops inside a graph from its adjacency matrix?
In general, there can be exponentially many simple cycles (loops), so since you want only the "smallest", it sounds as though you don't want them all. If you're looking to write equations corresponding to Kirchhoff's second law for all possible cycles, then it suffices to use just the equation for each cycle in a cycle basis. There is a polynomial-time algorithm to find the cycle basis that uses the least total number of edges (a minimum cycle basis). Rather than implement that algorithm, however, it may suffice to switch from arc variables x_{u->v} to differences of node variables y_v - y_u (fix one node variable per connected component to be zero).
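If a cycle basis is enough, the fundamental cycles of a spanning forest are easy to extract directly from the adjacency structure. A sketch (this yields *a* cycle basis, not necessarily a minimum one, and assumes a simple undirected graph given as an adjacency dict):

```python
from collections import deque

def cycle_basis(adj):
    """Fundamental cycle basis from a BFS spanning forest: every
    non-tree edge closes exactly one cycle with the tree paths from its
    endpoints up to their lowest common ancestor.
    adj: node -> set of neighbours (simple undirected graph)."""
    parent, depth = {}, {}
    for root in adj:                      # BFS forest over all components
        if root in parent:
            continue
        parent[root], depth[root] = None, 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v], depth[v] = u, depth[u] + 1
                    queue.append(v)
    tree = {frozenset((v, parent[v])) for v in parent if parent[v] is not None}
    all_edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    basis = []
    for edge in all_edges - tree:
        u, v = tuple(edge)
        path_u, path_v = [u], [v]
        a, b = u, v
        while a != b:                     # climb to the common ancestor
            if depth[a] >= depth[b]:
                a = parent[a]
                path_u.append(a)
            else:
                b = parent[b]
                path_v.append(b)
        basis.append(path_u + path_v[-2::-1])
    return basis
```

The basis has exactly E - V + C cycles (C = number of connected components), which matches the number of independent Kirchhoff voltage equations.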
I am aware that Dijkstra's algorithm can find the minimum distance between two nodes (or in case of a metro - stations). My question though concerns finding the minimum number of transfers between two stations. Moreover, out of all the minimum transfer paths I want the one with the shortest time.
Now in order to find a minimum-transfer path I utilize a specialized BFS applied to metro lines, but it does not guarantee that the path found is the shortest among all other minimum-transfer paths.
I was thinking that perhaps modifying Dijkstra's algorithm might help - by heuristically adding weight (time) for each transfer, such that it would deter the algorithm from making transfer to a different line. But in this case I would need to find the transfer weights empirically.
Addition to the question:
I have been recommended to add a "penalty" to each time the algorithm wants to transfer to a different subway line. Here I explain some of my concerns about that.
I have put off this problem for a few days and got back to it today. After looking at the problem again, it seems that running Dijkstra's algorithm on stations and figuring out where the transfers occur is harder than one might think.
Here's an example:
If here I have a partial graph (just 4 stations) and their metro lines: A (red), B (red, blue), C (red), D (blue). Let station A be the source.
And the connections are :
---- D(blue) - B (blue, red) - A (red) - C (red) -----
If I follow Dijkstra's algorithm: initially I place A into the queue, then dequeue A in the first iteration and look at its neighbors, B and C, and update their distances according to the weights A-B and A-C. Now even though B connects two lines, at this point I don't know if I need to make a transfer at B, so I do not add the "penalty" for a transfer.

Let's say that the distance A-B < A-C, which causes B to be dequeued on the next iteration. Its neighbor is D, and only at this point do I see that the transfer had to be made at B. But B has already been processed (dequeued). So I am not sure how this "delay" in determining the need for a transfer would affect the integrity of the algorithm.
Any thoughts?
You can make each of your weights a pair: (# of transfers, time). You can add these weights in the obvious way, and compare them in lexicographic order (compare # of transfers first, use time as the tiebreaker).
Of course, as others have mentioned, using K * (# of transfers) + time for some large enough K produces the same effect, as long as you know the maximum time a priori and you don't run out of bits in your weight storage.
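A sketch of Dijkstra with such pair weights, assuming transfers have already been modeled as a 0/1 flag on each edge (e.g., by building the graph per line as other answers suggest); the names are illustrative:

```python
import heapq

INF_PAIR = (float('inf'), float('inf'))

def dijkstra_pair(adj, source):
    """Dijkstra's algorithm with (transfers, time) pair weights.
    adj maps a node to a list of (neighbour, transfers, time) edges,
    where transfers is 1 if taking the edge means changing lines and
    0 otherwise. Pairs add component-wise, and Python compares tuples
    lexicographically, which is exactly the order we want."""
    dist = {source: (0, 0)}
    heap = [((0, 0), source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF_PAIR):
            continue                      # stale queue entry
        for v, transfers, time in adj[u]:
            nd = (d[0] + transfers, d[1] + time)
            if nd < dist.get(v, INF_PAIR):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Because the pair weights are non-negative in both components and compared lexicographically, the usual correctness argument for Dijkstra goes through unchanged.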
I'm going to be describing my solution using the A* algorithm, which I consider to be an extension (and an improvement -- please don't shoot me) of Dijkstra's algorithm that is easier to understand intuitively. The basics go like this:
Add the starting path to the priority queue, weighted by distance-so-far + minimum distance to goal
Every iteration, take the lowest-weighted path and explode it into every path that is one step away from it (discarding paths that wrap around themselves), putting those back into the queue. Stop if you find a path that ends at the goal.
Instead of making your weight simply distance-so-far + minimum-distance-to-goal, you could use two weights, Stops and Distance/Time, compared this way:
Compare stops first, and report this comparison if possible (i.e., if they aren't the same)
If stops are equal, compare distance traveled
And sort your queue this way.
If you've ever played Mario Party, think of stops as Stars and distance as Coins. In the middle of the game, a person with two stars and ten coins is going to be above someone with one star and fifty coins.
Doing this guarantees that the first path you take out of your priority queue will be one with the fewest possible stops.
You have the right idea, but you don't really need to find the transfer weights empirically -- you just have to ensure that the weight for a single transfer is greater than the weight for the longest possible travel time. You should be pretty safe if you give a transfer a weight equivalent to, say, a year of travel time.
As Amadan noted in a comment, it's all about creating the right graph. I'll just describe it in more detail.
Consider two vertices (stations) to share an edge if they are on a single line. With this graph (and all weights equal to 1) you will find the minimum number of transitions with Dijkstra.
Now, let's assume that the maximum travel time is always less than 10000 (use your own constant). Then the weight of edge AB (where A and B are on one line) is time_to_travel_between(A, B) + 10000.
Running Dijkstra on such a graph guarantees that the minimal number of transitions is used, with minimum travel time as the secondary objective.
Update, in response to a comment:
Let's "prove" it. Suppose there are two solutions: one with 2 transfers and 40 minutes of travel time, and one with 3 transfers and 25 minutes. In the first case you travel on 3 lines, so the path weight will be 3*10000 + 40. In the second: 4*10000 + 25. The first solution will be chosen.
I had the same problem as you until now. I was using Dijkstra. The penalties for transfers are a very good idea indeed, and I've been using them for a while now. The main problem is that you cannot use them directly in the weight, as you first have to identify the transfer. And I didn't want to modify the algorithm.

So what I've been doing is: each time you find a transfer, delete the node, add it back with the penalty weight, and rerun on the graph.
But this way I found out that Dijkstra won't work. And this is where I tried Floyd-Warshall, which, contrary to Dijkstra, compares all possible paths through the graph between each pair of vertices.

Switching to Floyd-Warshall helped me with my problem. Hope it helps you as well. It's easier to code and a lot simpler to implement.