I came across this problem wherein there are a number of houses on a 2-D grid (their coordinates are given) and we essentially have to find which house can be used as a meeting point so that the total distance traveled by everyone is minimized. Let us assume that a step along the x- or y-axis costs 1 unit and a step to a diagonal neighbor costs (say) 1.2 units.
I cannot really think of a good optimization algorithm for this.
P.S: Not a homework problem. And I am only looking for an algorithm (not code) and if possible, its proof.
P.S #2: I am not looking for the Exhaustive solution. Believe it or not, that did strike me :)
As already pointed out, in the case of Manhattan distance the median gives a solution. This is an obvious conclusion from the well-known fact that the median minimizes the mean absolute deviation:
E|X-c| >= E|X-median(X)| for any constant c.
And here you can find an example of the proof for discrete case:
https://stats.stackexchange.com/questions/7307/mean-and-median-properties/7315#7315
This is probably really inefficient, but loop through all the houses and, for each one, loop through all the other houses (nested for loops). Use the distance formula to find the distance between the two houses; then you have the distance between every pair of houses. One quick and easy way to find the closest house overall is to add up everyone's walking distance to a given house. The house with the least total walking distance is the meeting place of choice.
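A minimal sketch of that brute force, assuming the questioner's step costs (1 per axis step, 1.2 per diagonal step), so the cheapest path between two houses uses diagonal steps for the shorter offset and axis steps for the rest; the example coordinates are made up:

    def grid_distance(a, b, diag_cost=1.2):
        """Cheapest path cost between two grid points when an axis step costs 1
        and a diagonal step costs diag_cost (the metric assumed in the question)."""
        dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
        return diag_cost * min(dx, dy) + (max(dx, dy) - min(dx, dy))

    def best_meeting_house(houses):
        """Brute force: the house minimizing the total distance everyone travels."""
        return min(houses,
                   key=lambda h: sum(grid_distance(h, other) for other in houses))

    houses = [(0, 0), (4, 2), (1, 5), (3, 3)]   # example coordinates
    print(best_meeting_house(houses))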
I have been bugged by the same problem for some time now. The solution is the obvious consensus given in earlier posts: find the median (mx, my) independently in each coordinate and then pick the closest point among the given N points; that is the meeting place. To see why this is actually the solution, first consider the distance
d = sum(|xi-x|) + sum(|yi-y|) over all 1<=i<=N,
which is separable in x and y, so we can solve the 1-D case for x and y independently. I will skip the explanation given above and hence conclude that (mx, my) is the best solution if we consider all possible points. The bigger challenge is to prove that we may move from (mx, my) to the closest given point (xi, yi) without increasing the distance. The proof goes:
Consider that we have sorted the x-coordinates (for the sake of the proof) and that X1 < X2 < ... < Xn, with Xj < mx < X(j+1) where j = N/2. Now move mx one step to the left, that is, mx' = mx - 1.
Hence d' = |X1-mx+1| + ... + |Xj-mx+1| + |X(j+1)-mx+1| + ... + |Xn-mx+1|.
Moving to mx-1 increases the distance by 1 for the N/2 values with k >= j+1 and decreases it by 1 for the values with k <= j, so the effects cancel and (mx-1, my) gives the same total. It means that there is a whole region Xj < x < X(j+1), Yj < y < Y(j+1) where the distance does not change, so we can pick the closest given point there, which is the answer.
I have ignored the subtle even/odd case for N, but I hope the math works itself out once you see the basic proof.
This is my first post, do help me improve my writing skills.
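To make the recipe above concrete, here is a minimal sketch under the plain Manhattan metric (no diagonals): compute the coordinate-wise median and then return the given house closest to it. The coordinates are made up for illustration.

    def manhattan(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def median_meeting_house(houses):
        """Coordinate-wise median (mx, my), then snap to the nearest given house."""
        xs = sorted(x for x, _ in houses)
        ys = sorted(y for _, y in houses)
        m = (xs[len(xs) // 2], ys[len(ys) // 2])   # one valid median point
        return min(houses, key=lambda h: manhattan(h, m))

    houses = [(0, 0), (4, 2), (1, 5), (3, 3)]      # example coordinates
    print(median_meeting_house(houses))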
Your distance metric is weird. You'd expect that travelling on the diagonal should take at least sqrt(2) ~= 1.41 times the distance of travelling along a component direction, because that's how much further it is if travelling in a straight line along the diagonal by the Pythagorean theorem.
If you insisted on a Manhattan distance (no diagonals allowed), then you'd want to pick the house closest to the point (median(x), median(y)) of the houses.
Consider the 1D case: you have a bunch of points on a line and you want to pick the meeting spot. For concreteness/simplicity, let's say there are 5 houses, none of them duplicates.
Consider what happens as the meeting spot drifts away from the median to the right. For every unit moved, until you pass the 4th house in left-to-right order, 3 people have to take an additional step to the right and 2 people have to take one less step to the left, so the cost goes up by 1. Once you pass the 4th house, 4 people have to take an additional step to the right and a single person takes one less step to the left, so the cost increases by 3 per unit. An identical argument holds as you move the meeting spot to the left of the median. Moving away from the median always increases the cost.
The argument generalizes to any number of people, with or without duplicate houses, and even to an arbitrary number of dimensions, so long as you aren't allowed to use the diagonal.
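To see that argument in numbers, here is a tiny illustration with made-up house positions; the total cost rises by 1 per unit step away from the median and by 3 once the meeting spot passes the 4th house:

    houses = [1, 3, 4, 8, 9]            # 5 houses on a line; the median is 4

    def total_cost(spot):
        return sum(abs(h - spot) for h in houses)

    for spot in range(1, 10):
        print(spot, total_cost(spot))   # the minimum occurs at the median, spot = 4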
Your problem is called Optimal Meeting Point Finding.
The following paper gives an efficient approximate algorithm
http://www.cse.ust.hk/~wilfred/paper/vldb11.pdf
Well, you could brute force it. Take each house and calculate the distance to each other house. Sum the distances up for each individual house. Then just grab the house that had the lowest sum.
I successfully implemented an 8-puzzle solver using the A* algorithm and now I am adding a twist to it: there could be more than one empty space in the puzzle and the numbers on the tiles are no longer unique (there could be numbers that are the same).
While the algorithm works after I modified it to generate successor states for all empty spaces, it didn't solve the puzzle in the smallest number of moves possible (I actually came up with a smaller number of moves when I tried to solve it by hand, surprise!).
Question: Is Manhattan distance still a viable heuristic in this puzzle? If not, what could the heuristic be?
Yes, an admissible heuristic for this problem can involve Manhattan distance.
The simplest approach is just to take the Manhattan distance to the closest possible target location for each tile.
This is clearly admissible, because no tile can reach any of its target locations in fewer moves than the Manhattan distance to the closest one, even ignoring all obstacles.
But we can do better - for two identical tiles A and B with target positions 1 and 2, rather than calculating the distance to the closest one for each, we can calculate the distance of all possible assignments of tiles to positions, so:
min(dist(A,1) + dist(B,2), dist(A,2) + dist(B,1))
This can be generalized to any number of tiles, but keep in mind that for n identical tiles there are n! such assignments, so it quickly becomes expensive to calculate.
Seeing why this is admissible is still fairly easy: since we take the minimum over all possible assignments of tiles to positions, the actual number of moves required can't be any less.
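A minimal sketch of that heuristic (the board representation and function names here are my own, not from the original): for each group of identical tiles, take the minimum total Manhattan distance over all assignments of current positions to target positions.

    from itertools import permutations

    def manhattan(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def group_heuristic(current_positions, target_positions):
        """Minimum total Manhattan distance over all assignments of a group of
        identical tiles to their target cells (n! assignments for n tiles)."""
        return min(
            sum(manhattan(c, t) for c, t in zip(current_positions, perm))
            for perm in permutations(target_positions)
        )

    def heuristic(groups):
        """groups: dict mapping tile value -> (current positions, target positions)."""
        return sum(group_heuristic(cur, tgt) for cur, tgt in groups.values())

    # Two identical tiles at (0,0) and (2,2) with targets at (0,1) and (2,1):
    print(group_heuristic([(0, 0), (2, 2)], [(0, 1), (2, 1)]))   # -> 2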
I'm looking for an efficient solution to the following problem: for a given set of points in n-dimensional Euclidean space, find the member of this set that minimizes the total distance to the other points in the set.
The obvious naïve approach is quadratic, so I'm looking for something less than quadratic.
My first thought was that all I need is to find the center of the bounding sphere and then find the closest point in the set to this point. But this is actually not true: imagine a right triangle. All its vertices are equidistant from that center; nevertheless, exactly one vertex meets our requirement.
It would be nice if someone could shed some light on this issue.
My first guess was that the average minimizes the distance to all of the points: find the average, then pick the point in the set closest to it. As correctly pointed out in the comments, the median instead of the average will actually minimize the distance (the average minimizes the squared distance). The median can also be calculated in O(n). For high-dimensional datasets this solution would of course be O(n*m), where m is the number of dimensions.
Also some links:
See accepted answer here: Algorithm to find point of minimum total distance from locations
And link provided by mcdowella: http://en.wikipedia.org/wiki/Geometric_median
I am making this up as I go along, but there appears to be a close connection between "best point of a set" and "best point" in convex optimization.
Your score function is a sum of distances. Each distance is convex and U-shaped (OK, V-shaped in this case), so their sum is convex and U-shaped. In particular it has a perfectly good derivative everywhere except at the points in the set, and this derivative is optimistic: if you take the value at a point and its derivative (neglecting any set point coinciding with the point you are looking at), then predictions based on them will be optimistic; the line formed using the derivative lies almost entirely beneath the correct function but grazes it at a single point.
This leads to the following algorithm:
Repeatedly
Pick a point at random and check whether it is the best point so far; if so, take note of it. Take the derivative of the sum of distances at this point. Use this derivative, and the value at that point, to work out the predicted sum of distances at every other point, and discard as possible answers the points where this prediction is already worse than the best answer so far (although you still need to take them into account when working out distances and derivatives). These will be the points on the far side of a plane drawn through the chosen point, normal to the derivative.
Now discard the chosen point as a contender as well and repeat if there are any points left to consider.
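A minimal sketch of that loop, assuming ordinary Euclidean distance and using numpy; the subgradient at the chosen point simply omits the term for the point itself:

    import numpy as np

    def best_point_of_set(points, seed=0):
        """Find the input point minimizing the sum of distances to the others,
        discarding candidates whose convex lower bound already rules them out."""
        rng = np.random.default_rng(seed)
        pts = np.asarray(points, dtype=float)
        candidates = set(range(len(pts)))
        best_idx, best_cost = None, np.inf
        while candidates:
            q = int(rng.choice(list(candidates)))
            diffs = pts - pts[q]                     # vectors from q to every point
            dists = np.linalg.norm(diffs, axis=1)
            cost = dists.sum()                       # sum of distances from q
            if cost < best_cost:
                best_idx, best_cost = q, cost
            nz = dists > 0
            grad = -(diffs[nz] / dists[nz][:, None]).sum(axis=0)   # subgradient at q
            candidates.discard(q)
            for p in list(candidates):
                # Convexity: true cost at p >= cost + grad . (p - q); prune p if
                # even this optimistic prediction cannot beat the best so far.
                if cost + grad @ (pts[p] - pts[q]) >= best_cost:
                    candidates.discard(p)
        return best_idx, best_cost

    print(best_point_of_set([(0, 0), (1, 0), (0, 1), (5, 5)]))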
I would expect this to be something like n log n on randomly chosen points. However, if the set of points form the vertices of a regular polygon in n dimensions then it will cost N^2, discarding only the chosen point each time - any of the N points is in fact a correct answer and they all have the same sum of distances from each other.
I will of course up-vote anybody who can confirm or deny this general principle for finding the best of a set of given points under a convex objective function.
OK - I was interested enough in this to program this up - so I have 200+ lines of Java to dump in here if anybody cares. In 2 dimensions it's very fast, but at 20 dimensions you gain only a factor of two or so - this is reasonably understandable - each iteration cuts off points by projecting the problem down to a line and chopping off a fraction of the points outside the line. A randomly chosen point will be about half as far away from the centre as the other points - and very roughly you can expect the projection to cut off all but some multiple of the d-th root of 1/2 so as d increases the fraction of points you can discard in each iteration reduces.
The book "Introduction to algorithms" by Cormen has a question post office location problem in chap 9.
We are given n points p1, p2, ..., pn with weights w1, w2, ..., wn. Find a point p (not necessarily one of the input points) that minimizes the sum of wi*d(pi, p), where d(a, b) is the distance between points a and b.
Looking at the solution, I understand that the weighted median would be the best answer for this problem.
But I have some fundamental doubts about the actual coding part and its usage.
If all elements have equal weight, then to find the weighted median we find the point up to which the running sum of weights stays below 1/2. How do we extend that here?
Given a real scenario, say the weights are the number of letters to be delivered to various houses and we want to minimize the distance traveled by choosing the location of the post office, with the x-coordinates given (assuming all the houses lie along a single dimension): how would we actually go about it?
Could someone help me clear up my doubts and understand the problem?
EDIT :
I was also thinking about a very similar problem: there is a rectangular (2-D) grid with different numbers of people at various places, and they all want to meet at one point (which must have integer coordinates). What difference would there be from the above problem, and how would we solve it?
You still want the point at which the weights sum to 1/2. Pick any point and consider whether you would do better moving one point to the left or one point to the right from there. If you move left by one step you reduce the distance to all points on your left and increase the distance to all points on your right by the same amount. Whether you win or lose by this depends on the sum of the weights to the left and the sum of the weights to the right. If the weight on one side exceeds 1/2 you can do better by moving in that direction, so the only point where you can't do better by choosing another one is the point where the weights on either side are both <= 1/2.
For 1/2 to be the right threshold the weights have to sum to 1, so if you start off with weights which are numbers of letters then you have to divide them by the total number of letters to get them to sum to one. Of course this penalty function doesn't really make sense unless you have to make a separate trip for each letter to be delivered, but I'm assuming that we are supposed to ignore that.
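A minimal sketch of the 1-D case under those assumptions (positions are house x-coordinates, weights are letter counts; the numbers are made up): normalize the weights and stop at the first house where the running total reaches 1/2.

    def weighted_median(positions, weights):
        """Post-office location in 1-D: the first point whose cumulative
        (normalized) weight reaches 1/2 minimizes the weighted sum of distances."""
        total = sum(weights)
        cumulative = 0.0
        for x, w in sorted(zip(positions, weights)):   # houses in x order
            cumulative += w / total                    # weights normalized to sum to 1
            if cumulative >= 0.5:
                return x

    # Houses at x = 1, 3, 7, 10 with 2, 1, 4, 3 letters respectively:
    print(weighted_median([1, 3, 7, 10], [2, 1, 4, 3]))   # -> 7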
EDIT
For more than one dimension, you pretty much end up solving the problem of minimising the weighted sum of distances directly. Wikipedia describes this in http://en.wikipedia.org/wiki/Geometric_median. You want to take weights into account, but that doesn't complicate the problem that much. One way of doing it is http://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares. Unfortunately, that doesn't guarantee that the solution it finds will be on a grid point, or that the nearest grid point to a solution will be the best grid point. It probably won't be too bad, but finding the very best grid point in all possible cases might be trickier.
EDIT : this answer is wrong, see comments
What you're looking for is called the center of mass (IMHO the weighted median is the center of mass in one dimension).
I didn't get your first question; can you give more detail?
For your example, we would compute the average position weighted by the number of letters linked with each position. This would give us: x_center = sum(x_i * w_i) / sum(w_i) and y_center = sum(y_i * w_i) / sum(w_i).
Does this correctly answer your problem?
Given a set of n points, can we find three points that describe a triangle with minimum area in O(n^2)? If yes, how, and if not, can we do better than O(n^3)?
I have found some papers that state that this problem is at least as hard as the problem that asks to find three collinear points (a triangle with area 0). These papers describe an O(n^2) solution to this problem by reducing it to an instance of the 3-sum problem. I couldn't find any solution for what I'm interested in however. See this (look for General Position) for such a paper and more information on 3-sum.
There are O(n^2) algorithms for finding the minimum area triangle.
For instance you can find one here: http://www.cs.tufts.edu/comp/163/fall09/CG-lecture9-LA.pdf
If I understood that pdf correctly, the basic idea is as follows:
1. For each pair of points A, B, find the point that is closest to the line AB.
2. Construct a dual of the points so that lines <-> points: the line y = mx + c is mapped to the point (m, c).
3. In the dual, for a given point (which corresponds to a line through a pair of points in the original set), the vertically nearest dual line gives us the required closest point for step 1.
Apparently steps 2 and 3 can be done in O(n^2) time.
Also, I doubt the papers showed 3SUM-hardness by reducing this problem to 3SUM; it should be the other way round (reducing 3SUM to this problem).
There's an algorithm that finds the required area with complexity O(n^2*log(n)).
For each point Pi in the set do the following (without loss of generality we can assume that Pi is at the origin, or translate the points to make it so).
Then for any two other points (x1,y1), (x2,y2) the triangle area will be 0.5*|x1*y2 - x2*y1|, so we need to minimize that value. Instead of iterating through all pairs of remaining points (which gives us O(N^3) complexity overall), we sort those points using the predicate x1*y2 < x2*y1, i.e. by angle around the origin. It is claimed that to find the triangle with minimal area we need to check only the pairs of adjacent points in the sorted array.
So the complexity of this procedure for each point is O(n*log(n)) and the whole algorithm works in O(n^2*log(n)).
P.S. I can't quickly find the proof that this algorithm is correct :( I hope to find it later and post it then.
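Here is a sketch of the procedure described above (my own illustration); it sorts the translated points by angle with atan2 instead of the raw cross-product predicate and checks adjacent pairs, including the wrap-around pair:

    import math

    def min_area_triangle(points):
        """For each point, translate it to the origin, sort the remaining points
        by angle, and evaluate the area only for angularly adjacent pairs."""
        best = math.inf
        for i, (ox, oy) in enumerate(points):
            rest = [(x - ox, y - oy) for j, (x, y) in enumerate(points)
                    if j != i and (x, y) != (ox, oy)]
            if len(rest) < 2:
                continue
            rest.sort(key=lambda p: math.atan2(p[1], p[0]))
            for k in range(len(rest)):
                (x1, y1), (x2, y2) = rest[k], rest[(k + 1) % len(rest)]
                best = min(best, 0.5 * abs(x1 * y2 - x2 * y1))
        return best

    print(min_area_triangle([(0, 0), (4, 0), (0, 4), (1, 1), (2, 3)]))   # -> 0.5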
The problem
Given a set of n points, can we find three points that describe a triangle with minimum area in O(n^2)? If yes, how, and if not, can we do better than O(n^3)?
is better resolved in this paper: James King, A Survey of 3sum-Hard Problems, 2004
Given n points on a 2-D plane, what is the point such that the total distance from all the points is minimized? This point need not be from the set of points given. Is it the centroid or something else?
How to find all such points(if more than one) with an algorithm?
This is known as the "Center of Distance" and is different from the centroid.
Firstly you have to define what measure of distance you are using. If we assume you are using the standard metric of d = sqrt((x1-x2)^2 + (y1-y2)^2), then the answer is not necessarily unique, and the problem is to minimize the sum of these distances.
The easiest example to show this answer is not unique is the straight line example. Any point in between the two points has an equal total distance from all points.
In 1D, the correct answer is any point that has the same number of points to its right and to its left. As long as this is true, any move to the left or right increases the distance to the points on one side and decreases the distance to the points on the other side by the same amount, and so leaves the total distance the same. This also proves the centroid is not necessarily the right answer.
If we extend to 2D this is no longer the case - as the sqrt makes the problem weighted. Surprisingly to me there does not seem to be a standard algorithm! The page here seems to use a brute force method. I never knew that!
If I wanted to use an algorithm, then I would find the median point in X and Y as a start point, then use a gradient descent algorithm - this would get the answer pretty quickly. The whole equation ends up as a quadratic, so it feels like there ought to be an exact solution.
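As a sketch of that idea (starting from the coordinate-wise median and then iterating; this uses Weiszfeld's fixed-point update, a standard method for the geometric median, in place of plain gradient descent):

    import numpy as np

    def geometric_median(points, iters=100, eps=1e-9):
        """Approximate the point minimizing the sum of Euclidean distances,
        starting from the coordinate-wise median and applying Weiszfeld updates."""
        pts = np.asarray(points, dtype=float)
        guess = np.median(pts, axis=0)                 # start at the median point
        for _ in range(iters):
            d = np.linalg.norm(pts - guess, axis=1)
            d = np.where(d < eps, eps, d)              # avoid division by zero
            w = 1.0 / d
            new_guess = (pts * w[:, None]).sum(axis=0) / w.sum()
            if np.linalg.norm(new_guess - guess) < eps:
                break
            guess = new_guess
        return guess

    print(geometric_median([(0, 0), (0, 2), (3, 1)]))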
There may be more than one point. Consider a plane with only two points on it. Those points describe a line-segment. Any point on that line-segment will have the same total distance from the two end-points.
This is discussed in detail here http://www.ddj.com/architect/184405252?pgno=1
A brute-force algorithm might give you the best results. First, locate a rectangle (or any quadrilateral) bounding the input points. Then, for each point inside the rectangle, calculate its distance from every input point and sum these distances; call this the 'cost' of the point. Repeat for each candidate point and select the point with the minimum cost.
Intelligence can also be added to the algorithm: it can eliminate areas based on average cost, etc.
That's how I would approach the problem, at least... hope it helps.