Need Algorithm to Allocating Managers to Stores - algorithm

Suppose you have "n" managers and "n" stores all located randomly across a geographic area. I need to be able to assign each manager to a store. The managers will travel daily from their homes to their assigned store. In general I'd like to minimize the daily distance travelled. This can be interpreted in two ways:
Minimize the average daily travel distance (which is also the same as minimizing the total travel distance.
Minimize the maximum travel distance for any single manager
Is this a known problem? Are there any obvious algorithms to solve it? It seems similar to the traveling salesman problem but it's not quite the same.

Both can be solved polynomially
I'll quickly cover both ways to define an optimal allocation as described in the question. Note that I won't make any assumption on the traveling times, such as triangle inequality. Of course such a property is likely to hold in practice, and there may be better algorithms that use these properties.
Minimize total distance
For this instance, we consider the managers and the stores to be a weighted complete bipartite graph. We then want a matching that minimizes the sum of the weights.
This is called the Balanced Assignment Problem, which is a specific case of minimum/maximum matching. Because the graph is bipartite, this can be solved polynomially. Wikipedia lists a couple of algorithms for solving this problem, most notably the Hungarian algorithm.
Minimize maximum distance
If we wish to minimize the maximum distance, we can find a solution through a binary search. Specifically, we binary search over the maximum distance and attempt to find a matching that does not violate this maximum distance.
For any given maximum distance x, we create the bipartite graph that has edges between manager M and store S if and only if d(M, S) < x. We then try to create a complete matching on this bipartite graph with any bipartite matching algorithm, and through success and failure complete the binary search for the smallest x that allows for matching, thus minimizing the maximum distance.

Related

Minimize total distance using k links among n nodes

I came up with the following problem that I do not know the solutions to nor can I find the 'lookup' term to investigate further.
Say we have N ordered nodes (n_1,n_2....n_N) each with a fixed distance of 1 between them. So dist(n_1,n_N) = N-1. Now we are are allowed to connect any two nodes, thus effectively reducing their distance to 1. Assume we can have k such connections.
The problem is : how do we choose which nodes to connect to minimize the total distance between any two nodes ?
Is this problem a known variant of some well-studied problem ? Does an efficient solution exist for this (or a variant where we only want to minimize the max distance between any two nodes)
Thanks
You may be interested in "On the sum of all distances in a graph or digraph". That paper refers to your "total distance" as the "transmission" of a graph. Your "max distance" is generally called the "diameter" of a graph. It discusses the two, proves some properties of a graph's transmission, and shows that the transmission and diameter are independent of one another.
Naively, you've got n-choose-k options to try. That's pretty bad if n and k are large. Not too bad if one of them is small.
There is work on doing better than that. This Mathoverflow question asks about reducing the mean distance between vertices, which is proportional to the transmission of the graph. There are two answers, neither of which I can vouch for. It also refers to a paper that directly addresses this question.
Minimizing the diameter of a graph is dealt with in this paper.
You might consider addressing this question to the Math stackexchange.

How to find the size of maximal clique or clique number?

Given an undirected graph G = G(V, E), how can I find the size of the largest clique in it in polynomial time? Knowing the number of edges, I could put an upper limit on the maximal clique size with
https://cs.stackexchange.com/questions/11360/size-of-maximum-clique-given-a-fixed-amount-of-edges
, and then I could iterate downwards from that upper limit to 1. Since this upper cap is O(sqrt(|E|)), I think I can check for the maximal clique size in O(sqrt(|E|) * sqrt(|E|) * sqrt(|E|)) time.
Is there a more efficient way to solve this NP-complete problem?
Finding the largest clique in a graph is the clique number of the graph and is also known as the maximum clique problem (MCP). This is one of the most deeply studied problems in the graph domain and is known to be NP-Hard so no polynomial time algorithm is expected to be found to solve it in the general case (there are particular graph configurations which do have polynomial time algorithms). Maximum clique is even hard to approximate (i.e. find a number close to the clique number).
If you are interested in exact MCP algorithms there have been a number of important improvements in the past decade, which have increased performance in around two orders of magnitude. The current leading family of algorithms are branch and bound and use approximate coloring to compute bounds. I name the most important ones and the improvement:
Branching on color (MCQ)
Static initial ordering in every subproblem (MCS and BBMC)
Recoloring: MCS
Use of bit strings to encode the graph and the main operations (BBMC)
Reduction to maximum satisfiability to improve bounds (MaxSAT)
Selective coloring (BBMCL)
and others.
It is actually a very active line of research in the scientific community.
The top algorithms are currently BBMC, MCS and I would say MaxSAT. Of these probably BBMC and its variants (which use a bit string encoding) are the current leading general purpose solvers. The library of bitstrings used for BBMC is publicly available.
Well I was thinking a bit about some dynamic programming approach and maybe I figured something out.
First : find nodes with very low degree (can be done in O(n)). Test them, if they are part of any clique and then remove them. With a little "luck" you can crush graph into few separate components and then solve each one independently (which is much much faster).
(To identify component, O(n) time is required).
Second : For each component, you can find if it makes sense to try to find any clique of given size. How? Lets say, you want to find clique of size 19. Then there has to exist at least 19 nodes with at least 19 degree. Otherwise, such clique cannot exist and you dont have to test it.

Optimal distribution of power plants on a city

I've searched both Google and Stack Overflow for an answer to my problem but I can't find one.
I need to find the optimal distribution for the power network of a city. The city is represented by a connected graph. I want to distribute power plants among some of those nodes in order to cover all of them in the electrical grid. The problem being that every power plant has a certain "range" (it can only cover for example in a "radius" of two nodes). My program needs to find the minimum number of power plants and their locations to cover the entire city.
I know from my searches that it should be related to MST's (minimum spanning trees) but the problem is in the limited range of the power plants.
I've thought about going through every node in the city and calculate the sub-graph containing all nodes within the range of a power plant in that node until I find the one that covers the most uncovered nodes and then keep doing that until the entire city is covered (basically brute forcing the problem) but that seems very unpractical and I was wondering if there is any other more effective way to solve this problem.
Thanks.
Unfortunately, this problem is known to be NP-hard by a reduction from the dominating set problem.
Given a graph G, a dominating set in G is a set of nodes D such that every node in the graph is either in D or is one hop away from D. The problem of finding the smallest dominating set in a graph is known to be NP-hard, and this problem easily reduces to the one you're trying to solve: given a graph G, produce a city (represented as a graph) that has the same structure as G, then give every power plant a radius of 1 (meaning that it can cover a node and all its neighbors). Finding the smallest set of power plants to cover the entire city then ends up producing a dominating set for the graph. Therefore, your problem is NP-hard.
As mentioned in this section of the Wikipedia page, it turns out that this problem is surprisingly hard to approximate. The Wikipedia page lists a few algorithms and approaches for approximating it, but it appears to be one of those NP-hard problems that resists polynomial-time approximation schemes.
Hope this helps!

Sorting points such that the minimal Euclidean distance between consecutive points would be maximized

Given a set of points in a 3D Cartesian space, I am looking for an algorithm that will sort these points, such that the minimal Euclidean distance between two consecutive points would be maximized.
It would also be beneficial if the algorithm tends to maximize the average Euclidean distance between consecutive points.
Edit:
I've crossposted on https://cstheory.stackexchange.com/ and got a good answer. See https://cstheory.stackexchange.com/questions/8609/sorting-points-such-that-the-minimal-euclidean-distance-between-consecutive-poin.
Here is a lower bound for the cost of the solution, which might serve as a building block for branch and bound or a more unreliable incomplete search algorithm:
Sort the distances between the points and consider them in non-increasing order. Use http://en.wikipedia.org/wiki/Disjoint-set_data_structure to keep track of sets of points, merging two sets when connected by a link between two points. The length of the shortest distance you encounter up to the point when you merge all the points into one set is an upper bound to the minimum distance in a perfect solution, because a perfect solution also merges all the points into one. However your upper bound may be longer than the minimum distance for a perfect solution, because the links you are joining up will probably form a tree, not a path.
You can model your problem by graph, draw line between your points, now you have a complete graph, now your problem is finding longest path in this graph which is NP-Hard, see wiki for longest path.
In fact I answered a second part of problem, maximize average, which means maximize path which goes from every node of graph, if you weight them as 1/distance it will be a travelling salesman problem (minimize the path length) and is NP-Hard. and for this case may be is useful to see Metric TSP approximation.

Efficient minimal spanning tree in metric space

I have a large set of points (n > 10000 in number) in some metric space (e.g. equipped with Jaccard Distance). I want to connect them with a minimal spanning tree, using the metric as the weight on the edges.
Is there an algorithm that runs in less than O(n2) time?
If not, is there an algorithm that runs in less than O(n2) average time (possibly using randomization)?
If not, is there an algorithm that runs in less than O(n2) time and gives a good approximation of the minimum spanning tree?
If not, is there a reason why such algorithm can't exist?
Thank you in advance!
Edit for the posters below:
Classical algorithms for finding minimal spanning tree don't work here. They have an E factor in their running time, but in my case E = n2 since I actually consider the complete graph. I also don't have enough memory to store all the >49995000 possible edges.
Apparently, according to this: Estimating the weight of metric minimum spanning trees in sublinear time there is no deterministic o(n^2) (note: smallOh, which is probably what you meant by less than O(n^2), I suppose) algorithm. That paper also gives a sub-linear randomized algorithm for the metric minimum weight spanning tree.
Also look at this paper: An optimal minimum spanning tree algorithm which gives an optimal algorithm. The paper also claims that the complexity of the optimal algorithm is not yet known!
The references in the first paper should be helpful and that paper is probably the most relevant to your question.
Hope that helps.
When I was looking at a very similar problem 3-4 years ago, I could not find an ideal solution in the literature I looked at.
The trick I think is to find a "small" subset of "likely good" edges, which you can then run plain old Kruskal on. In general, it's likely that many MST edges can be found among the set of edges that join each vertex to its k nearest neighbours, for some small k. These edges might not span the graph, but when they don't, each component can be collapsed to a single vertex (chosen randomly) and the process repeated. (For better accuracy, instead of picking a single representative to become the new "supervertex", pick some small number r of representatives and in the next round examine all r^2 distances between 2 supervertices, choosing the minimum.)
k-nearest-neighbour algorithms are quite well-studied for the case where objects can be represented as vectors in a finite-dimensional Euclidean space, so if you can find a way to map your objects down to that (e.g. with multidimensional scaling) then you may have luck there. In particular, mapping down to 2D allows you to compute a Voronoi diagram, and MST edges will always be between adjacent faces. But from what little I've read, this approach doesn't always produce good-quality results.
Otherwise, you may find clustering approaches useful: Clustering large datasets in arbitrary metric spaces is one of the few papers I found that explicitly deals with objects that are not necessarily finite-dimensional vectors in a Euclidean space, and which gives consideration to the possibility of computationally expensive distance functions.

Resources