Maximum Diversity: translate a heuristic algorithm into C (or pseudocode) - algorithm

I have a set of N items and I know their mutual distances. Every item has a cost and I have a budget. I need to accomplish the following task: suppose I put an item in the basket. The next item in the basket will be the item whose distance from the first is the maximum (under the budget constraint); the third item will be the item whose sum of distances from items 1 and 2 is the maximum (under the budget constraint); a fourth item will be the one whose sum of distances from items 1, 2 and 3 is the maximum (again within budget), and so on. How do I find the subset whose total distance (computed as above) is maximal? Do you know any algorithm to solve this problem? Thanks in advance.
UPDATE: I've done some research and this problem is called the Maximum Diversity Problem. I can't translate the heuristic algorithm stated above (which would solve the problem) into C or pseudocode!

This is an interesting question. If I understand correctly, you are trying to find a path with maximum distance given a budget.
Let us imagine the items as a connected graph, so that we can use tools from graph theory. The edges are the costs and the vertices (nodes) are the actual items. Essentially, it seems you want to find a maximum path under budget constraints, so a kind of reverse Dijkstra algorithm.
Steps:
Select a starting vertex.
Evaluate the distances from the starting point.
Select the vertex with maximum distance. If this is above your budget, go to the next one, burning the edge that was above your budget.
Calculate the distance between the newly added item and the others as the sum of the path to get to the item plus the cost of choosing the other item (i.e. in the first iteration, say we got item 1 and then went to item 2; then the distance between item 2 and item x would be item 1 + item 2 + item x).
Select the maximum again. If it is above budget, go to the next maximum, burning the edge to the maximum that would be above your budget.
Repeat until the budget is exhausted.
Hope this helps. Feel free to ask for clarification if anything is unclear. I suggest some background reading on graph theory and associated algorithms.
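For reference, the greedy procedure described in the question itself can be sketched in a few lines. This is only a minimal sketch, not an exact solver for the Maximum Diversity Problem: it assumes a symmetric distance matrix `dist`, a per-item `cost` list and a single `budget`, and the function names `greedy_diversity` and `best_greedy` are made up for illustration.

```python
from typing import List, Tuple


def greedy_diversity(dist: List[List[float]], cost: List[float],
                     budget: float, start: int) -> Tuple[List[int], float]:
    """Greedy basket from the question: repeatedly add the affordable item
    whose summed distance to the items already chosen is largest."""
    n = len(cost)
    if cost[start] > budget:
        return [], 0.0
    chosen = [start]
    in_basket = [False] * n
    in_basket[start] = True
    remaining = budget - cost[start]
    total = 0.0
    # gain[j] = sum of distances from item j to every item in the basket
    gain = [dist[start][j] for j in range(n)]
    while True:
        best, best_gain = -1, -1.0
        for j in range(n):
            if not in_basket[j] and cost[j] <= remaining and gain[j] > best_gain:
                best, best_gain = j, gain[j]
        if best < 0:          # nothing affordable is left
            break
        chosen.append(best)
        in_basket[best] = True
        remaining -= cost[best]
        total += best_gain
        for j in range(n):    # update gains incrementally
            gain[j] += dist[best][j]
    return chosen, total


def best_greedy(dist: List[List[float]], cost: List[float],
                budget: float) -> Tuple[List[int], float]:
    """Try every item as the starting item and keep the best basket found."""
    results = [greedy_diversity(dist, cost, budget, s) for s in range(len(cost))]
    return max(results, key=lambda r: r[1])
```

Trying every start item costs roughly O(N^3) overall, which is usually acceptable for a heuristic; the result is the best greedy basket, not necessarily the optimal subset.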

Related

Optimization problem in connected graphs with profits

I am trying to develop an algorithm to solve a problem that I am not able to classify. Here is the problem:
You have a map divided into sections, each of which has a certain area and a certain number of inhabitants.
The problem consists of finding a set of connected sections whose total area does not exceed a certain value, maximizing the number of selected inhabitants.
For now I can think of two approaches:
Treat the problem as an all-pairs shortest paths problem in an undirected graph with positive natural values, where the solutions that do not meet the constraint on the maximum selected area are discarded. For this you could use the Floyd-Warshall algorithm, Dijkstra for all pairs, or Thorup's algorithm (which can be done in time V * E, where these are the vertices and edges of the graph).
Treat it as an open vehicle routing problem with profits (OVRPP), where each vehicle can start and end wherever it wants.
Another approach:
Also, depending on the combinatorics of the particular problem, it is possible in certain cases to use genetic algorithms, together with tabu search, but this is only for cases where finding an optimal solution is infeasible.
To be clearer, what is sought is to obtain a selection of connected sections whose sum of areas does not exceed a total area. The parameter to maximize is the sum of populations of the selected sections. The objective is to find an optimal solution.
For example, this is the optimal selection with max area of 6 (red color area)
Thank you all in advance!
One pragmatic approach would be to formulate this as an instance of integer linear programming and use an off-the-shelf ILP solver. One way to formulate this as an ILP instance is to build a graph with one vertex per section and an edge between each pair of adjacent sections; then, selecting a connected component in that graph is equivalent to selecting a spanning tree for that component.
So, let x_v be a set of zero-or-one variables, one for each vertex v, and let y_{u,v} be another set of zero-or-one variables, one per edge (u,v). The intended meaning is that x_v=1 means that v is one of the selected sections; and that y_{u,v}=1 if and only if x_u=x_v=1, which can be enforced by y_{u,v} >= x_u + x_v - 1, y_{u,v} <= x_u, y_{u,v} <= x_v. Also add a constraint that the number of y's that are 1 is one less than the number of x's that are 1 (so that the y's form a tree): sum_v x_v = 1 + sum_{u,v} y_{u,v}. Finally, you have a constraint that the total area does not exceed the maximum: sum_v A_v x_v <= maxarea, where A_v is the area of section v.
Then your goal is to maximize sum_v P_v x_v, where P_v is the population of section v. Then the solution to this integer linear programming problem will give the optimal solution to your problem.
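When experimenting with a formulation like this, it can help to have a brute-force reference for tiny maps to compare the solver's output against. The following is a minimal sketch under assumptions: sections are numbered 0..n-1, `adj` maps each section to the set of its adjacent sections, and the function names are hypothetical. It enumerates every subset, so it is only usable for a handful of sections.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def is_connected(nodes: Tuple[int, ...], adj: Dict[int, Set[int]]) -> bool:
    """Depth-first search restricted to the chosen sections."""
    chosen = set(nodes)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in chosen and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(chosen)


def best_connected_selection(area: List[float], pop: List[int],
                             adj: Dict[int, Set[int]],
                             max_area: float) -> Tuple[int, Tuple[int, ...]]:
    """Return (population, sections) of the best connected selection
    whose total area does not exceed max_area. Exponential time."""
    best_pop, best_set = 0, ()
    n = len(area)
    for k in range(1, n + 1):
        for nodes in combinations(range(n), k):
            if sum(area[v] for v in nodes) > max_area:
                continue
            if not is_connected(nodes, adj):
                continue
            p = sum(pop[v] for v in nodes)
            if p > best_pop:
                best_pop, best_set = p, nodes
    return best_pop, best_set
```

If the ILP model and this enumeration disagree on a small instance, that usually points at a missing or incorrect constraint (connectivity is the usual suspect).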

Knapsack problem with dependent selection

Just as in the classic knapsack problem, we want to maximize the total value without letting the total weight exceed the capacity, and the items' values and weights are independent. But for some items, if you want to select the item, you have to select some other items as well.
For example: there are item_1, item_2, ..., item_n. If you want to select item_1, you have to select item_3 and item_5, and if you want to select item_3, you have to select item_2, item_7, item_9, etc.
The dependencies are arbitrary; that is, if we draw the dependency graph, it is just a general directed graph.
First, I noticed the "precedence constrained knapsack problem" and the "partially ordered knapsack problem", but in my problem the dependency relation is not antisymmetric (that is, the dependency graph may contain cycles).
The closest problem I found is the "set-union knapsack problem":
Given a set of items, select the subset with largest total value, subject to the constraint that the total weight of the items selected does not exceed a fixed capacity. The total value of a set of items is the sum of the individual values and the total weight is the sum of the individual weights. In the Set-Union Knapsack Problem, the items each have a value, but instead of a weight, each corresponds to a set of elements. Each element has a weight. The total value of a set of items is sum of the individual values, but the total weight is the sum of the weights of the elements in the union of the corresponding sets.
But it only takes the union of the weights; the value of some items may still be counted several times.
Is there any way to efficiently solve this problem?
EDITED:
I found a way that lets me leverage an approximation algorithm:
step 1. Build a directed dependency graph.
step 2. Transform this graph into its component graph (use DFS to find the strongly connected components) to remove cycles.
step 3. Now this becomes a "precedence constrained knapsack problem" or "partially ordered knapsack problem". These are strongly NP-complete, but there are many papers on them, and an approximation algorithm can be found to solve them.
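Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, assuming items are numbered 0..n-1 and `deps[u]` lists the items that item u requires; the function name `condense` is hypothetical. It uses Tarjan's strongly connected components algorithm (recursive, so only suitable for modest n) and merges each cycle into one "super item" whose value and weight are the sums over its members.

```python
from typing import Dict, List, Set, Tuple


def condense(n: int, deps: Dict[int, List[int]], value: List[float],
             weight: List[float]
             ) -> Tuple[List[int], List[float], List[float], List[Set[int]]]:
    """Collapse dependency cycles. Returns (comp_of, comp_value,
    comp_weight, comp_deps), where comp_deps is an acyclic dependency graph."""
    index = [-1] * n
    low = [0] * n
    on_stack = [False] * n
    stack: List[int] = []
    comp_of = [-1] * n
    counter = 0
    n_comp = 0

    def visit(u: int) -> None:
        nonlocal counter, n_comp
        index[u] = low[u] = counter
        counter += 1
        stack.append(u)
        on_stack[u] = True
        for v in deps.get(u, []):
            if index[v] == -1:
                visit(v)
                low[u] = min(low[u], low[v])
            elif on_stack[v]:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:          # u is the root of a component
            while True:
                w = stack.pop()
                on_stack[w] = False
                comp_of[w] = n_comp
                if w == u:
                    break
            n_comp += 1

    for u in range(n):
        if index[u] == -1:
            visit(u)

    comp_value = [0.0] * n_comp
    comp_weight = [0.0] * n_comp
    for u in range(n):
        comp_value[comp_of[u]] += value[u]
        comp_weight[comp_of[u]] += weight[u]
    comp_deps: List[Set[int]] = [set() for _ in range(n_comp)]
    for u in range(n):
        for v in deps.get(u, []):
            if comp_of[u] != comp_of[v]:
                comp_deps[comp_of[u]].add(comp_of[v])
    return comp_of, comp_value, comp_weight, comp_deps
```

Items that depend on each other in a cycle must be selected together anyway, so treating them as one item loses nothing; the resulting `comp_deps` graph is a DAG and can be handed to a precedence-constrained knapsack method.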
Before selecting an item, you have to check whether it will create a cycle. If it creates a cycle, discard it and move on to the next item. For that you can use Kruskal's algorithm.

Finding a subgraph of max weight

I have a city area (let's think of it as a graph of streets), where all streets have some weight and length associated with them. What I want to do is find a connected set of streets, located near each other, with max (or close to max) total weight W, given that my subgraph can only contain up to N streets.
I'm specifically not interested in a subgraph that would span the entire graph, but rather only a small cluster of streets that has max or close to max combined weight and where all streets are located "near" each other, where "near" would be defined as no street being more than X meters away from the center of the cluster. The resulting subgraph would have to be connected.
Does anyone know the name of this problem, or of an algorithm for it, assuming one exists?
Also interested in any solutions, exact or approximations.
To show this visually, assume my graph is all the street segments (intersection to intersection) in the image below. So an individual street is not Avenue A; it's Avenue A between 10th and 11th, and so on. Each street has a weight of either 1 or 0. Assume that the set of streets with max weight is in the selected polygon; what I want to do is find this polygon.
Here's a proposal. Consider each vertex in the graph as the "center" as you've defined it. For each center C[i], execute Dijkstra's algorithm to construct a shortest path tree with C[i] as the origin. Stop constructing the tree when it would include a vertex farther than the max allowed distance from the center.
Then let A[i] be the set of all edges incident to vertices in the tree centered on C[i]. The result will be the set A[i] with maximum weight.
The run time of one execution of Dijkstra's algorithm is O(|E[i]| + |V[i]| log |V[i]|) for the ith center. The sets here are limited in size by the max distance from the center. Total cost is sum_(i in 1..|V|) O(|E[i]| + |V[i]| log |V[i]|). In the degenerate case where the max allowable distance allows the whole graph to be included from each center, cost will be O(|V| (|E| + |V| log |V|)).
I can think of some possible optimizations to improve run time, but want to verify this gets at the problem you have in mind.
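A minimal sketch of this proposal, under some assumptions: intersections are integer vertices, `graph[u]` is a list of `(v, length, weight)` tuples with each street listed in both directions, and there is at most one street between any two intersections. The names `reachable_within` and `best_cluster` are made up for illustration. As in the description above, the cap of N streets is not enforced here.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[int, List[Tuple[int, float, float]]]  # u -> [(v, length, weight)]


def reachable_within(graph: Graph, center: int, max_dist: float) -> Set[int]:
    """Dijkstra from `center`, cut off at `max_dist`."""
    dist = {center: 0.0}
    heap = [(0.0, center)]
    done: Set[int] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, length, _ in graph[u]:
            nd = d + length
            if nd <= max_dist and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return done


def best_cluster(graph: Graph, max_dist: float) -> Tuple[float, Optional[int]]:
    """Return (total weight, center) of the heaviest edge set A[i]."""
    best: Tuple[float, Optional[int]] = (float("-inf"), None)
    for c in graph:
        inside = reachable_within(graph, c, max_dist)
        edges = {}
        for u in inside:                 # edges incident to tree vertices
            for v, _, w in graph[u]:
                edges[(min(u, v), max(u, v))] = w   # count each street once
        total = sum(edges.values())
        if total > best[0]:
            best = (total, c)
    return best
```

Because each run stops at `max_dist`, the per-center cost depends only on the size of the neighbourhood, which matches the run-time argument above.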
Here is an integer programming exact formulation for the problem assuming you have a finite number of total streets, S, and the "center" of a cluster can be one of the finite number of streets S. If you are looking at a cluster center in continuous Euclidean space, that is going to take us into the domain of the Weber Problem. That may still be doable, but we would have to look at a column-generation formulation.
Objective function maximizes the weight of selected streets indexed by j. Constraint (1) specifies that exactly one center be chosen. Constraint (2) specifies that for any potential center, i, only N streets are chosen as neighbors. Constraint (3) stipulates that a street is chosen as part of some neighborhood only if the corresponding center is chosen. The rest are binary integer constraints.
If the street chosen as center counts as one of the N streets, it is easy to enforce by specifying y_{ii} = x_i
Note: If the above formulation is right, or captures the problem accurately, [MIP] can be solved trivially, once set N_i has been identified.
Consider each i in turn. From N_i pick the top N neighboring streets in descending order of weight. This is your incumbent solution. Update the incumbent solution if a better solution is found as you iterate over all i.
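The enumeration just described might look like the sketch below. It assumes that `neighbors[i]` already holds the set N_i (the streets within range of candidate center i) and that `weights[j]` is the weight of street j; both names, and the function name `best_top_n`, are hypothetical. Like the description, it does not check that the picked streets are connected to each other.

```python
from typing import Dict, List, Optional, Tuple


def best_top_n(weights: List[float], neighbors: Dict[int, List[int]],
               n_max: int) -> Tuple[float, Optional[int], List[int]]:
    """For each candidate center i, take the n_max heaviest streets of N_i
    and keep the best (total weight, center, streets) seen so far."""
    best_total = float("-inf")
    best_center: Optional[int] = None
    best_streets: List[int] = []
    for i, candidates in neighbors.items():
        picked = sorted(candidates, key=lambda j: weights[j], reverse=True)[:n_max]
        total = sum(weights[j] for j in picked)
        if total > best_total:           # update the incumbent solution
            best_total, best_center, best_streets = total, i, picked
    return best_total, best_center, best_streets
```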

Minimum cost path for N*N matrix in ϴ(N^2) time

I am provided with an N*N matrix of towns, and each town has a respective widget price (a weight).
I am tasked with finding the cheapest cost to acquire a widget for every town, using the following equation:
Cost for target town : (Target Town Cost) + (Rectilinear distance away from base town)
Cheapest Total Cost for base town = min(costs calculated for all target towns)
The simple solution is ϴ(N^4), where we go through all N^2 base towns, and calculate the cost for all N^2 targets, and the answer is the minimum.
I need to find a solution in ϴ(N^2). I know there is a dynamic programming solution but we have not covered that yet, so I am trying to find an alternative.
Solutions I have tried so far:
Calculating the cost for the middle row and column, then recursively calculating the costs for the middle row and column for every sub-grid. This would however be ϴ(N^2 log(N^2)), since calculating the minimum for the initial row and column would be ϴ((number of base towns) * (checking all target towns)) = ϴ(N^2 log(N^2)).
Creating a min-heap based on the smallest town, but this would still take ϴ(N^2 log(N)) due to building the heap.
The current attempt I am trying seems to be leading me in the right direction. We can start out by
Finding the minimum cost town of the whole grid, and we know it is its own minimum cost town. That will take ϴ(N^2).
Take the surrounding towns. Their minimum cost will either be themselves, or the global minimum.
We recurse onto the surrounding towns again, and here I don't know what to do. I am looking for a property of the already calculated towns that I could take advantage of in order to calculate each current surrounding town in constant time.
Is there something to this or should I consider some other method?
Thank you.
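One known way to reach ϴ(N^2) for this kind of cost (own price plus rectilinear distance) is a two-pass sweep over the grid: the first pass lets cheap prices spread downwards and to the right, the second lets them spread upwards and to the left, adding 1 per step. The sketch below assumes that the rectilinear distance between adjacent towns is 1; the function name `cheapest_costs` is made up. It is arguably a simple form of dynamic programming, so whether it counts as the "alternative" asked for is a judgment call.

```python
from typing import List


def cheapest_costs(price: List[List[int]]) -> List[List[int]]:
    """best[i][j] = min over all towns (k, l) of price[k][l] + |i-k| + |j-l|."""
    n = len(price)
    best = [row[:] for row in price]
    # Forward sweep: propagate from the upper and left neighbours.
    for i in range(n):
        for j in range(n):
            if i > 0:
                best[i][j] = min(best[i][j], best[i - 1][j] + 1)
            if j > 0:
                best[i][j] = min(best[i][j], best[i][j - 1] + 1)
    # Backward sweep: propagate from the lower and right neighbours.
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if i < n - 1:
                best[i][j] = min(best[i][j], best[i + 1][j] + 1)
            if j < n - 1:
                best[i][j] = min(best[i][j], best[i][j + 1] + 1)
    return best
```

Each town is touched a constant number of times per sweep, so the total work is ϴ(N^2). Comparing the output with the ϴ(N^4) brute force on small grids is an easy way to convince yourself that two sweeps are enough.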

Optimal (approximated) path Algorithm (distance + priority)

I need to write an algorithm for the following scenario:
Given a set of points (cities) on a map that I have to visit. Each of these cities has a priority, from 1 (you need to go there ASAP) to 5 (no rush, but go there eventually). I have limited resources, so I cannot visit all the priority-1 cities first. (E.g. if NY and SF have priority 1 and Washington has priority 5, I'm looking for a path NY-Washington-SF, not NY-SF-Washington.)
I don't know if it matters, but n (the number of cities) will usually be around 10-20.
I found a presentation of "The Hierarchical Traveling Salesman Problem" (Panchamgam, Xiong, Golden and Wasi) which is kind of what I'm looking for but the corresponding article is not publicly accessible.
Can you recommend existing algorithms for such scenarios? Or point me in the right direction what to search for?
An approximation would be all right. My scenario is not as life-threatening as the one described by Panchamgam et al. It's important to avoid unnecessary detours caused by the priorities without ignoring them completely.
In standard TSP you want to minimize the total length of the route. In your case you want to basically optimize two metrics: the length of the route, and how early high priority cities appear on the route. You need to put these two metrics into a single metric, and how you do that might have an impact on the algorithm choice. For example, map the city priorities to penalties, e.g.
1 -> 16
2 -> 8
3 -> 4
4 -> 2
5 -> 1
Then, as the metric to minimize, use the total sum of (city_penalty * distance_to_city_from_start_on_route). In general this pushes the high priority cities to the beginning of the route, but it allows out-of-priority-order traversal if the route would otherwise become too long. Obviously the penalty values should be tuned experimentally.
With this metric, you can then use e.g. standard stochastic search approach --- start with a route and then swap cities or edges on the route in order to decrease the metric (use simulated annealing or tabu search).
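A minimal sketch of this metric and of a very plain swap-based search. Assumptions: cities have 2-D coordinates in `pos`, distances are Euclidean, the first city of the route is a fixed starting point, and the penalty table is the example mapping above. The search here is simple hill climbing rather than simulated annealing or tabu search, and the names `route_metric` and `improve_by_swaps` are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

PENALTY = {1: 16, 2: 8, 3: 4, 4: 2, 5: 1}  # example values, to be tuned


def route_metric(route: Sequence[str], pos: Dict[str, Tuple[float, float]],
                 priority: Dict[str, int]) -> float:
    """Sum of penalty * distance travelled along the route to reach each city."""
    total, travelled = 0.0, 0.0
    for k, city in enumerate(route):
        if k > 0:
            (x0, y0), (x1, y1) = pos[route[k - 1]], pos[city]
            travelled += math.hypot(x1 - x0, y1 - y0)
        total += PENALTY[priority[city]] * travelled
    return total


def improve_by_swaps(route: Sequence[str], pos: Dict[str, Tuple[float, float]],
                     priority: Dict[str, int]) -> Tuple[List[str], float]:
    """Swap pairs of cities (keeping route[0] fixed) while the metric drops."""
    route = list(route)
    best = route_metric(route, pos, priority)
    improved = True
    while improved:
        improved = False
        for a in range(1, len(route)):
            for b in range(a + 1, len(route)):
                route[a], route[b] = route[b], route[a]
                m = route_metric(route, pos, priority)
                if m < best - 1e-12:
                    best, improved = m, True
                else:                     # undo a swap that did not help
                    route[a], route[b] = route[b], route[a]
    return route, best
```

With three cities on a line, A (priority 1), B (priority 5) in the middle and C (priority 1), the route A-C-B scores 35 and A-B-C scores 33, so the search settles on visiting the low-priority city on the way, which is the NY-Washington-SF behaviour asked for in the question.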
An upper bound of 20ish puts dynamic programming in play. There's an O(n^2 2^n)-time algorithm for plain old traveling salesman path that goes like this. For each end vertex (n) and subset of vertices containing that end vertex (2^(n - 1)), we're going to determine the cheapest tour that visits the entire subset and ends at that vertex. Iterate over the subsets so that each set comes after its proper subsets (e.g., represent the sets as bit vectors and count from 0 to 2^n - 1). For each end vertex v in a subset S, the cheapest tour of S is either just v (if S = {v}) or it consists of a cheapest tour of S - {v} (computed already) followed by v. Each vertex w in S - {v} is a possibility for the last vertex of the tour of S - {v}, i.e. the next-to-last vertex of the tour of S, so take the minimum over all such w.
You haven't completely specified how the priorities interact with the goal of minimizing the distance. One could, for example, translate the priorities into deadlines (you must visit this vertex before traveling x distance). The dynamic program adapts easily for this setting: the only modification needed is to assign cost +infinity if the time to reach the specified end vertex is too great. There are a lot of other possibilities here; you can have an objective consisting of a sum over each individual vertex of some vertex-dependent function of the distance to reach that vertex.
From an engineering standpoint, the nice thing about implementing an exact algorithm is that it is much easier to test (just compare with brute force).
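A minimal sketch of the plain subset dynamic program described above (no priorities or deadlines), assuming a full distance matrix `dist` and a free choice of start and end city; the function name `shortest_path_tour` is hypothetical. The table has n * 2^n entries, so in pure Python this is comfortable up to roughly 15 cities and gets slow and memory-hungry near 20.

```python
from typing import List


def shortest_path_tour(dist: List[List[float]]) -> float:
    """Length of the cheapest path that visits every vertex exactly once."""
    n = len(dist)
    inf = float("inf")
    # best[S][v] = cheapest path visiting exactly the set S (bitmask), ending at v
    best = [[inf] * n for _ in range(1 << n)]
    for v in range(n):
        best[1 << v][v] = 0.0
    # Counting upwards guarantees every proper subset is handled before S.
    for S in range(1, 1 << n):
        for v in range(n):
            if not S & (1 << v):
                continue
            prev = S ^ (1 << v)           # S - {v}
            if prev == 0:
                continue
            for w in range(n):            # w = last vertex of the path on S - {v}
                if prev & (1 << w):
                    cand = best[prev][w] + dist[w][v]
                    if cand < best[S][v]:
                        best[S][v] = cand
    return min(best[(1 << n) - 1])
```

To add deadlines as suggested, one would set `cand` to infinity whenever it exceeds the deadline of v; for objectives that sum a per-vertex function of arrival distance, the table would store that objective instead of the path length.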
