Optimization problem in connected graphs with profits - algorithm

I am trying to develop an algorithm to solve a problem that I am not able to classify. Here is the problem:
You have a map divided into sections that have a certain area and where a certain number of people live.
The problem consists of finding a set of connected sections whose total area does not exceed a given value while maximizing the number of inhabitants selected.
For now I can think of two approaches:
1- Treat the problem as an all-pairs shortest paths problem in an undirected graph with positive natural values, where the solutions that do not meet the constraint of the maximum selected area are discarded. For this you could use the Floyd-Warshall algorithm, Dijkstra for all pairs, or Thorup's algorithm (which can be done in time V * E, where these are the numbers of vertices and edges of the graph).
2- Treat it as an open vehicle routing problem with profits (OVRPP), where each vehicle can start and end wherever it wants.
Another approach:
Also, depending on the combinatorics of the particular problem, it is possible in certain cases to use genetic algorithms, together with tabu search, but this is only for cases where finding an optimal solution is infeasible.
To be clearer, what is sought is to obtain a selection of connected sections whose sum of areas does not exceed a total area. The parameter to maximize is the sum of populations of the selected sections. The objective is to find an optimal solution.
For example, this is the optimal selection with max area of 6 (red color area)
Thank you all in advance!

One pragmatic approach would be to formulate this as an instance of integer linear programming, and use an off-the-shelf ILP solver. One way to formulate this as an ILP instance is to build a graph with one vertex per section and an edge between each pair of adjacent sections; then, selecting a connected component in that graph is equivalent to selecting a spanning tree for that component.
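So, let x_v be a set of zero-or-one variables, one for each vertex v, and let y_{u,v} be another set of zero-or-one variables, one per edge (u,v). The intended meaning is that x_v=1 means that v is one of the selected sections, and that y_{u,v}=1 means that edge (u,v) belongs to the spanning tree of the selection. A tree edge may only join two selected sections, which can be enforced by y_{u,v} <= x_u, y_{u,v} <= x_v. (Also requiring y_{u,v} >= x_u + x_v - 1 would force every edge between two selected sections into the tree, which rules out any selection whose sections form a cycle.) Also add a constraint that the number of y's that are 1 is one less than the number of x's that are 1 (so that the y's can form a tree): sum_v x_v = 1 + sum_{u,v} y_{u,v}. Note that this count is necessary but not sufficient for connectivity: a cycle plus a separate isolated section also satisfies it, so a complete formulation additionally needs cycle-elimination or flow constraints on the y's. Finally, you have a constraint that the total area does not exceed the maximum: sum_v A_v x_v <= maxarea, where A_v is the area of section v.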
Then your goal is to maximize sum_v P_v x_v, where P_v is the population of section v. Then the solution to this integer linear programming problem will give the optimal solution to your problem.
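For small maps, the result of any ILP formulation can be checked against an exhaustive search. The following is only a minimal sketch of such a check, not the ILP itself: it enumerates every subset of sections, so it is exponential and only meant for toy instances. The names `best_connected_selection`, `adj`, `area` and `pop` are made up for this illustration.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Return True if the given sections form one connected piece."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def best_connected_selection(adj, area, pop, max_area):
    """Brute force: try every subset, keep the feasible one with most people."""
    best_pop, best_set = 0, frozenset()
    sections = sorted(adj)
    for size in range(1, len(sections) + 1):
        for subset in combinations(sections, size):
            if sum(area[s] for s in subset) > max_area:
                continue  # violates the area budget
            if not is_connected(subset, adj):
                continue  # sections must be contiguous
            total = sum(pop[s] for s in subset)
            if total > best_pop:
                best_pop, best_set = total, frozenset(subset)
    return best_pop, best_set


# Hypothetical toy map: four sections in a row, a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
area = {"a": 2, "b": 1, "c": 3, "d": 2}
pop = {"a": 10, "b": 1, "c": 5, "d": 20}
print(best_connected_selection(adj, area, pop, max_area=6))
```

With a maximum area of 6 on this toy map, the search selects sections b, c and d (area 6, population 26).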

Related

Finding a subgraph of max weight

I have a city area (let's think of it as a graph of streets), where all streets have some weight and length associated with them. What I want to do is find a connected set of streets, located near each other, with some max (or close to max) total weight W, given that my max subgraph can only contain up to N streets.
I'm specifically not interested in a subgraph that would span the entire graph, but rather only a small cluster of streets that has max or close to max combined weight and where all streets are located "near" each other, where "near" would be defined as no street being more than X meters away from the center of the cluster. Resulting subgraph would have to be connected.
Does anyone know the name of this algorithm, assuming it exists?
Also interested in any solutions, exact or approximations.
To show this visually, assume my graph is all the street segments (intersection to intersection) in the image below. So individual street is not Avenue A, it's Avenue A between 10th and 11th, and so on. Street will either have weight of 1 or 0. Assume that the set of streets with max weights are in the selected polygon - what I want to do is find this polygon.
Here's a proposal. Consider each vertex in the graph of nodes as the "center" as you've defined it. For each center C[i], execute Dijkstra's algorithm to construct a shortest path tree with C[i] as the origin. Stop constructing the tree when it would include a vertex more than the max allowed from the center.
Then let A[i] be the set of all edges incident to vertices in the tree centered on C[i]. The result will be the set A[i] with maximum weight.
The run time of one execution of Dijkstra's algorithm is O(|E[i]| + |V[i]| log |V[i]|) for the ith center, where V[i] and E[i] are the vertices and edges explored from that center. The sets here are limited in size by the max distance from center. Total cost is sum_(i in 1..|V|) O(|E[i]| + |V[i]| log |V[i]|). In the degenerate case where the max allowable distance allows the whole graph to be included from each center, cost will be O(|V| (|E| + |V| log |V|)).
I can think of some possible optimizations to improve run time, but want to verify this gets at the problem you have in mind.
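The proposal above could be sketched roughly as follows. This is an illustration under assumptions, not a tuned implementation: the graph is taken to be a simple undirected graph stored as an adjacency dict of `(neighbor, length, weight)` tuples, the helper names are invented here, and the cap of N streets from the question is not enforced.

```python
import heapq


def add_street(adj, u, v, length, weight):
    """Register an undirected street segment between intersections u and v."""
    adj.setdefault(u, []).append((v, length, weight))
    adj.setdefault(v, []).append((u, length, weight))


def reachable_within(adj, center, max_dist):
    """Dijkstra from center that never settles a vertex beyond max_dist."""
    dist = {center: 0.0}
    heap = [(0.0, center)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length, _weight in adj[u]:
            nd = d + length
            if nd <= max_dist and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return set(dist)


def best_cluster(adj, max_dist):
    """Try every vertex as center; keep the heaviest set of incident edges."""
    best = (float("-inf"), None, set())
    for center in adj:
        inside = reachable_within(adj, center, max_dist)
        edges, total = set(), 0
        for u in inside:
            for v, _length, weight in adj[u]:
                key = frozenset((u, v))  # assumes no parallel streets
                if key not in edges:
                    edges.add(key)
                    total += weight
        if total > best[0]:
            best = (total, center, edges)
    return best


# Hypothetical street graph: A - B - C are close together, D and E are far away.
adj = {}
add_street(adj, "A", "B", 100, 1)
add_street(adj, "B", "C", 100, 1)
add_street(adj, "C", "D", 500, 1)
add_street(adj, "D", "E", 100, 0)
total, center, edges = best_cluster(adj, max_dist=150)
print(total, center)
```

On this toy graph, the center B reaches A and C within 150 meters, and the edges incident to those three vertices have total weight 3.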
Here is an integer programming exact formulation for the problem assuming you have a finite number of total streets, S, and the "center" of a cluster can be one of the finite number of streets S. If you are looking at a cluster center in continuous Euclidean space, that is going to take us into the domain of the Weber Problem. That may still be doable, but we would have to look at a column-generation formulation.
Objective function maximizes the weight of selected streets indexed by j. Constraint (1) specifies that exactly one center be chosen. Constraint (2) specifies that for any potential center, i, only N streets are chosen as neighbors. Constraint (3) stipulates that a street is chosen as part of some neighborhood only if the corresponding center is chosen. The rest are binary integer constraints.
If the street chosen as center counts as one of the N streets, it is easy to enforce by specifying y_{ii} = x_i
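One formulation consistent with this description could be written as below. It is a reconstruction under assumptions, not necessarily the exact model the answer had in mind: w_j is assumed to be the weight of street j, x_i = 1 when street i is the chosen center, y_{ij} = 1 when street j is selected as part of the neighborhood of center i, and N_i is assumed to be the set of streets within X meters of street i.

```latex
\begin{align*}
\text{[MIP]}\quad \max\ & \sum_{i \in S} \sum_{j \in N_i} w_j \, y_{ij} \\
\text{s.t.}\ & \sum_{i \in S} x_i = 1 && (1) \\
& \sum_{j \in N_i} y_{ij} \le N && \forall i \in S \quad (2) \\
& y_{ij} \le x_i && \forall i \in S,\ j \in N_i \quad (3) \\
& x_i,\ y_{ij} \in \{0, 1\}
\end{align*}
```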
Note: If the above formulation is right, or captures the problem accurately, [MIP] can be solved trivially, once set N_i has been identified.
Consider each i in turn. From N_i pick the top N neighboring streets in descending order of weight. This is your incumbent solution. Update the incumbent solution if a better solution is found as you iterate through the i's.
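The enumeration just described might look like this in code. It is a minimal sketch: `neighbors[i]` stands in for the precomputed set N_i, and all names and data are hypothetical. Like the formulation it follows, it does not check that the chosen streets are connected.

```python
def best_neighborhood(weights, neighbors, n_max):
    """For each candidate center, take its n_max heaviest nearby streets."""
    best_total, best_center, best_streets = float("-inf"), None, []
    for center, candidates in neighbors.items():
        # Sort the candidate streets by weight, heaviest first, and keep n_max.
        chosen = sorted(candidates, key=lambda j: weights[j], reverse=True)[:n_max]
        total = sum(weights[j] for j in chosen)
        if total > best_total:  # update the incumbent
            best_total, best_center, best_streets = total, center, chosen
    return best_total, best_center, best_streets


# Hypothetical data: street weights are 0 or 1, as in the question.
weights = {"s1": 1, "s2": 0, "s3": 1, "s4": 1, "s5": 0}
neighbors = {
    "s1": ["s1", "s2", "s3"],
    "s3": ["s2", "s3", "s4", "s1"],
    "s5": ["s4", "s5"],
}
print(best_neighborhood(weights, neighbors, n_max=3))
```

Here the center s3 wins with total weight 3, using streets s3, s4 and s1.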

How to select k nodes in a fully connected graph with max separation between any pair of nodes?

Suppose I have a fully connected graph of N nodes, and I know the weight between any pair of nodes. How do I select k nodes such that I maximize the minimum distance between any pair of selected nodes?
I mapped this problem as a more general case of the one I actually want to solve, which I've dubbed the cheating students problem (I don't know if it has an actual name).
Cheating Students problem:
Given an N x M matrix, how do you select k cells with maximum distance between any pair of cells? You could assume the matrix is a classroom where k cheating students are taking a test. No pair of students should be close to each other, and thus we want to maximize the minimum distance between any pair.
Your generalized graph problem appears to be very closely related to the maximum independent set problem described in https://en.wikipedia.org/wiki/Independent_set_%28graph_theory%29, which is NP-complete. I can find a maximum independent set by running a binary chop to find the largest k for which an algorithm solving your graph problem returns a minimum distance greater than 1. Since finding a maximum independent set is hard, I think your generalized problem is hard.
I don't see an easy way to solve the matrix problem, either, but the related problem of packing circles as efficiently as possible on a 2-d surface of infinite size has been solved, and the answer is what is called a hexagonal packing (https://en.wikipedia.org/wiki/Circle_packing) which confusingly is based on a triangular tiling (https://en.wikipedia.org/wiki/Triangular_tiling - "The vertices of the triangular tiling are the centers of the densest possible circle packing").
So for finite matrices and numbers of students it is possible that arranging the students in widely separated rows, with the rows staggered so that each student is centered between the pair of students nearest them in the row in front of them and behind them, is not too far from optimal - or at least a good place from which to start some sort of hill-climbing attempt.
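To judge how close a staggered layout or a hill-climbing result is to optimal, it can help to have exact answers for tiny classrooms. The sketch below is plain exhaustive search, not the staggered-row heuristic: it tries every choice of k cells, assumes Euclidean distance between cell coordinates and k >= 2, and is only usable for very small grids. The function name is invented for this example.

```python
from itertools import combinations
from math import dist


def best_seating(rows, cols, k):
    """Exhaustively find k cells maximizing the minimum pairwise distance."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    best_gap, best_cells = -1.0, None
    for chosen in combinations(cells, k):
        # The quality of a seating is its closest pair of students.
        gap = min(dist(a, b) for a, b in combinations(chosen, 2))
        if gap > best_gap:
            best_gap, best_cells = gap, chosen
    return best_gap, best_cells


print(best_seating(3, 3, 4))
```

For a 3 x 3 room and 4 students, the search puts one student in each corner, giving a minimum distance of 2.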

Which algorithm should match this specific Graph

I have a specific question. Suppose you have a graph where each vertex specifies how many connections it must have to other vertices, and the following rules/properties apply:
1- The graph can be incomplete (there is no need for every vertex to be connected to every other).
2- There can be two connections between two vertices only if they are in opposite directions (e.g. A points to B, B points to A).
3- Suppose they are on a 2D plane; there can be no crossing of connections (not even tangents).
4- There is no interest in the shortest path, just in respecting the properties and knowing whether the solution is unique or not.
5- There may be no possible solution.
EDIT: Alright guys, sorry for not being specific. I'll try to clarify my point here: what I want to do is, given a number of vertices, know if a graph is connected (if all the points have at least one connection to the graph). It may be impossible to make a graph out of the given vertices, so I want to know if there is a solution, if the solution is unique or not, or (worst case scenario) if there is no possible solution. I think that clarifies points 4 and 5. The graph is undirected, and the connections cannot curve, only straight lines. The nodes (vertices) are fixed; we have their position from or W/E input. I wanted to know the best approach. I've been researching and it is a connectivity problem, though maybe some specific algorithm may be more efficient at this task. That's all, sorry for the late reply.
EDIT2: Alright guys, would the problem be different if we think of each vertex as being on a row and column of a plane matrix, able to connect only with other vertices on the same column or row? So it would be just 90/180/270/360 degree straight connections. This would hugely reduce the possibilities, right?
I am going to assume that the question is: Given the degree of each vertex, work out a graph that passes all the constraints given.
I think you can reduce this to a very large integer programming problem - linear constraints, but with the variables required to be integers (in fact either 0 or 1), which makes the problem much more difficult than ordinary linear programming.
Let the unknowns be of the form Xij, where Xij is 1 if there is an edge from node i to node j, and 0 otherwise. The requirements on the number of connections then amount to requirements of the form SUM_{all i}Xij = K for some K dependent on the requirement. The requirement that the graph is planar reduces to the requirement that the graph not contain either of two known graphs, K5 and K3,3, as a minor - https://en.wikipedia.org/wiki/Graph_minor. Each possible subgraph then produces a constraint such as X01 + X02 + ... < 5 - there will be a huge number of these constraints - so large that for a large number of nodes simply producing all the constraints may be too expensive to be practical, let alone solving them. The number of constraints goes up as at least the 6th power of the number of nodes. However this is polynomial, so it is theoretically practical to write down the MIP to be solved - so perhaps this is better than no algorithm at all.
Assuming that you are asking us to:
Find out if it is possible to generate one-or-more directed planar graphs such that each vertex has a given out-degree (not necessarily the same out-degree per vertex).
Let's also assume that you want the graph to be connected.
If there are n vertices and the vertices have degrees d_1 ... d_n then for vertex i there are C(n-1,d_i) = (n-1)!/((d_i)!*(n-1-d_i)!) possible combinations of out-edges from that vertex. Taking the product of all these combinations over all the vertices will give you the upper bound on the number of possible graphs.
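As a quick way to see how fast this bound grows, the product can be computed directly. This is just the formula above transcribed into code; the function name is invented for this example.

```python
from math import comb, prod


def out_edge_choices_upper_bound(degrees):
    """Product over vertices of C(n-1, d_i): ways to pick each vertex's out-edges."""
    n = len(degrees)
    return prod(comb(n - 1, d) for d in degrees)


# Four vertices with required out-degrees 1, 2, 1 and 0.
print(out_edge_choices_upper_bound([1, 2, 1, 0]))  # 3 * 3 * 3 * 1 = 27
```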
The naive approach is:
Generate all possible graphs.
Filter the graphs to only have connected graphs.
Run a planarity test on the graph to determine if it is planar (you can consider the graph to be undirected in this step); discard if it isn't.
Profit!

Optimal (approximated) path Algorithm (distance + priority)

I need to write an algorithm for the following scenario:
Given a set of points (cities) on a map that I have to visit. Each of these cities has a priority, from 1 (you need to go there ASAP) to 5 (no rush, but go there eventually). I have limited resources, so I cannot visit all the priority-1 cities first. (E.g. if NY and SF have priority 1 and Washington has priority 5, I'm looking for a path NY-Washington-SF, not NY-SF-Washington.)
I don't know if it matters, but n (# of cities) will usually be around 10-20.
I found a presentation of "The Hierarchical Traveling Salesman Problem" (Panchamgam, Xiong, Golden and Wasi) which is kind of what I'm looking for but the corresponding article is not publicly accessible.
Can you recommend existing algorithms for such scenarios? Or point me in the right direction what to search for?
An approximation would be alright. My scenario is not as life-threatening as the scenario described by Panchamgam et al. It's important to avoid unnecessary detours caused by the priorities without ignoring them completely.
In standard TSP you want to minimize the total length of the route. In your case you want to basically optimize two metrics: the length of the route, and how early high priority cities appear on the route. You need to put these two metrics into a single metric, and how you do that might have an impact on the algorithm choice. For example, map the city priorities to penalties, e.g.
1 -> 16
2 -> 8
3 -> 4
4 -> 2
5 -> 1
Then use as the metric you want to minimize the total sum of (city_penalty * distance_to_city_from_start_on_route). This pushes the high priority cities to the beginning of the route in general but allows for out of priority order traversal if otherwise the route becomes too long. Obviously the penalty values should be experimentally tuned.
With this metric, you can then use e.g. standard stochastic search approach --- start with a route and then swap cities or edges on the route in order to decrease the metric (use simulated annealing or tabu search).
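A minimal sketch of this metric, together with the simplest possible local search over it, is shown below. The local search is plain hill climbing by swapping two cities, standing in for the simulated annealing or tabu search mentioned above; the coordinates, city names and function names are invented for illustration, and the first entry of the route is assumed to be a fixed starting point.

```python
from math import dist

# Priority -> penalty mapping from the text; values should be tuned experimentally.
PENALTY = {1: 16, 2: 8, 3: 4, 4: 2, 5: 1}


def route_cost(route, coords, priority):
    """Sum of penalty * distance travelled along the route when reaching each city."""
    travelled, cost = 0.0, 0.0
    for prev, city in zip(route, route[1:]):
        travelled += dist(coords[prev], coords[city])
        cost += PENALTY[priority[city]] * travelled
    return cost


def improve_by_swaps(route, coords, priority):
    """Hill climbing: keep swapping two cities while the metric decreases."""
    best = list(route)
    best_cost = route_cost(best, coords, priority)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):  # position 0 is the fixed start
            for j in range(i + 1, len(best)):
                cand = best[:]
                cand[i], cand[j] = cand[j], cand[i]
                cand_cost = route_cost(cand, coords, priority)
                if cand_cost < best_cost:
                    best, best_cost, improved = cand, cand_cost, True
    return best, best_cost


# Hypothetical example: A (priority 5) lies on the way to B (priority 1).
coords = {"S": (0, 0), "A": (1, 0), "B": (2, 0)}
priority = {"A": 5, "B": 1}
print(improve_by_swaps(["S", "B", "A"], coords, priority))
```

In this toy case the route S-A-B costs 33 and S-B-A costs 35, so the low-priority city that lies on the way is visited first, which mirrors the NY-Washington-SF example in the question.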
An upper bound of 20ish puts dynamic programming in play. There's an O(n^2 2^n)-time algorithm for plain old traveling salesman path that goes like this. For each end vertex (n choices) and each subset of vertices containing that end vertex (2^(n - 1) choices), we're going to determine the cheapest path that visits the entire subset and ends at that vertex. Iterate over the subsets so that each set comes after its proper subsets (e.g., represent the sets as bit vectors and count from 0 to 2^n - 1). For each end vertex v in a subset S, the cheapest path through S ending at v is either just v (if S = {v}) or it consists of a cheapest path through S - {v} (computed already) followed by v. Each vertex w in S - {v} is a possibility for the last vertex of the path through S - {v}, that is, the next-to-last vertex of the path through S.
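The dynamic program can be sketched as follows for the plain shortest-path version, without priorities. Assumptions for this illustration: `d` is a full distance matrix indexed by vertex number, the path may start anywhere, and the function name is invented here.

```python
def shortest_hamiltonian_path(d):
    """Subset DP over bitmasks; returns the length of the cheapest path visiting all vertices."""
    n = len(d)
    inf = float("inf")
    # best[mask][v]: cheapest path visiting exactly the vertices in mask and ending at v.
    best = [[inf] * n for _ in range(1 << n)]
    for v in range(n):
        best[1 << v][v] = 0  # a single vertex costs nothing
    # Counting upward guarantees every proper subset is processed first.
    for mask in range(1, 1 << n):
        for v in range(n):
            if not mask & (1 << v):
                continue
            prev_mask = mask ^ (1 << v)  # the subset S - {v}
            if prev_mask == 0:
                continue
            for w in range(n):
                if prev_mask & (1 << w):
                    cand = best[prev_mask][w] + d[w][v]
                    if cand < best[mask][v]:
                        best[mask][v] = cand
    return min(best[(1 << n) - 1])


# Four cities on a line at positions 0, 1, 3 and 6.
xs = [0, 1, 3, 6]
d = [[abs(a - b) for b in xs] for a in xs]
print(shortest_hamiltonian_path(d))  # 6: walk from one end to the other
```

The table has 2^n rows of n entries, so at n = 20 it holds about 21 million values; that is workable but memory-heavy in pure Python.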
You haven't completely specified how the priorities interact with the goal of minimizing the distance. One could, for example, translate the priorities into deadlines (you must visit this vertex before traveling x distance). The dynamic program adapts easily for this setting: the only modification needed is to assign cost +infinity if the time to reach the specified end vertex is too great. There are a lot of other possibilities here; you can have an objective consisting of a sum over each individual vertex of some vertex-dependent function of the distance to reach that vertex.
From an engineering standpoint, the nice thing about implementing an exact algorithm is that it is much easier to test (just compare with brute force).

Algorithm for finding optimal node pairs in hexagonal graph

I'm searching for an algorithm to find pairs of adjacent nodes on a hexagonal (honeycomb) graph that minimizes a cost function.
each node is connected to three adjacent nodes
each node "i" should be paired with exactly one neighbor node "j".
each pair of nodes defines a cost function
c = pairCost( i, j )
The total cost is then computed as
totalCost = 1/2 sum_{i=1:N} ( pairCost(i, pair(i) ) )
Where pair(i) returns the index of the node that "i" is paired with. (The sum is divided by two because the sum counts each node twice). My question is, how do I find node pairs that minimize the totalCost?
The linked image should make it clearer what a solution would look like (thick red line indicates a pairing):
Some further notes:
I don't really care about the outmost nodes
My cost function is something like || v(i) - v(j) || (distance between vectors associated with the nodes)
I'm guessing the problem might be NP-hard, but I don't really need the truly optimal solution, a good one would suffice.
Naive algos tend to get nodes that are "locked in", i.e. all their neighbors are taken.
Note: I'm not familiar with the usual nomenclature in this field (is it graph theory?). If you could help with that, then maybe that could enable me to search for a solution in the literature.
This is an instance of the minimum weight perfect matching problem in a general graph - of course you'll have to negate your weights to turn it into a maximum weight matching problem. Edmonds's paths, trees and flowers algorithm (Wikipedia link) solves this for you (there is also a public Python implementation). The naive implementation is O(n^4) for n vertices, but it can be pushed down to O(n^(1/2) m) for n vertices and m edges using the algorithm of Micali and Vazirani (sorry, couldn't find a PDF for that).
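Before wiring in a full matching implementation, it can be useful to have a reference answer for small graphs. The sketch below is exhaustive recursion, not Edmonds's algorithm: it takes exponential time and is only meant for checking results on a handful of nodes. The function name, the ring-shaped example and the cost function are assumptions made for this illustration.

```python
def min_cost_perfect_matching(nodes, adj, pair_cost):
    """Exhaustive search: pair the first unmatched node with each free neighbor in turn."""

    def solve(remaining):
        if not remaining:
            return 0.0, []
        i = remaining[0]  # this node must be paired with one of its neighbors
        best = (float("inf"), None)
        for j in adj[i]:
            if j in remaining[1:]:
                rest = tuple(x for x in remaining if x not in (i, j))
                cost, pairs = solve(rest)
                total = cost + pair_cost(i, j)
                if total < best[0]:
                    best = (total, [(i, j)] + pairs)
        return best  # (inf, None) when no perfect matching exists

    return solve(tuple(sorted(nodes)))


# Hypothetical example: one hexagon (ring of six nodes) with a value per node.
value = [0, 1, 5, 6, 10, 11]
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
cost, pairs = min_cost_perfect_matching(range(6), adj, lambda i, j: abs(value[i] - value[j]))
print(cost, pairs)
```

A six-node ring has only two perfect matchings; here the cheaper one pairs (0, 1), (2, 3) and (4, 5) for a total cost of 3.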
This seems related to the minimum edge cover problem, with the additional constraint that there can only be one edge per node, and that you're trying to minimize the cost rather than the number of edges. Maybe you can find some answers by searching for that phrase.
Failing that, your problem can be phrased as an integer linear programming problem, which is NP-complete, which means that you might get dreadful performance for even medium-sized problems. (This does not necessarily mean that the problem itself is NP-complete, though.)
