Given N points (in 2D) with x and y coordinates, you have to find a point P (among the N given points) such that the sum of the distances from the other (N-1) points to P is minimum.
For example, given N points p1(x1,y1), p2(x2,y2), ..., pN(xN,yN), we have to find a point P among p1, p2, ..., pN whose sum of distances from all the other points is minimum.
I used a brute-force approach, but I need a better one. I also tried using the median, the mean, etc., but they do not work for all cases.
Then I came up with the idea of treating the given points as the vertices of a polygon, finding the centroid of that polygon, and then choosing the given point nearest to the centroid. But I'm not sure whether the centroid minimizes the sum of distances to the vertices of a polygon, so I'm not sure whether this is a good way. Is there an algorithm for solving this problem?
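For reference, a minimal sketch of the brute-force O(N^2) computation (assuming the points are given as (x, y) tuples); it is exact and useful for checking any faster approach:

```python
import math

def best_point(points):
    """Return the input point whose total Euclidean distance
    to the other points is smallest (O(N^2) brute force)."""
    def total(p):
        return sum(math.dist(p, q) for q in points)  # distance of p to itself is 0
    return min(points, key=total)

print(best_point([(0, 0), (4, 0), (1, 1), (2, 5)]))
```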
If your points are nicely distributed, and if there are so many of them that brute force (calculating the total distance from each point to every other point) is unappealing, the following might give you a good enough answer. By 'nicely distributed' I mean (approximately) uniformly or (approximately) randomly, and without marked clustering in multiple locations.
Create a uniform k*k grid, where k is an odd integer, across your space. If your points are nicely distributed, the one you are looking for is (probably) in the central cell of this grid. For every other cell in the grid, count the number of points in the cell and approximate their average position (either use the cell centre or calculate the average (x,y) of the points in the cell).
For each point in the central cell, compute the distance to every other point in the central cell, plus the weighted distance to the points in each of the other cells. This will, of course, be the distance from the point to the 'average' position of the points in the other cell, weighted by the number of points in that cell. The point in the central cell with the smallest total is your (approximate) answer.
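A rough sketch of this in Python, assuming the points are (x, y) tuples; it returns the best candidate found in the central cell (or None if that cell happens to be empty):

```python
import math
from collections import defaultdict

def approx_best_point(points, k=5):
    """Search only the central cell of a k*k grid (k odd); distances to
    points outside it are approximated by the distance to that cell's
    average position, weighted by the number of points in the cell."""
    xs, ys = zip(*points)
    min_x, min_y = min(xs), min(ys)
    w = (max(xs) - min_x) / k or 1.0
    h = (max(ys) - min_y) / k or 1.0

    def cell(p):
        return (min(int((p[0] - min_x) / w), k - 1),
                min(int((p[1] - min_y) / h), k - 1))

    buckets = defaultdict(list)
    for p in points:
        buckets[cell(p)].append(p)

    centre = (k // 2, k // 2)
    summaries = []                      # (count, average position) per other cell
    for c, pts in buckets.items():
        if c != centre:
            avg = (sum(p[0] for p in pts) / len(pts),
                   sum(p[1] for p in pts) / len(pts))
            summaries.append((len(pts), avg))

    best, best_cost = None, float("inf")
    for p in buckets.get(centre, []):
        cost = sum(math.dist(p, q) for q in buckets[centre])
        cost += sum(n * math.dist(p, avg) for n, avg in summaries)
        if cost < best_cost:
            best, best_cost = p, cost
    return best
```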
You'll have to juggle the increased accuracy of higher values for k against the increased computational load and figure out what works best for your points. If the distribution of points across cells is far from uniform then this approach may not be suitable.
This sort of approach is quite widely used in large-scale simulations where points have properties, such as gravity and charge, which operate over distances. Whether it suits your needs, I don't know.
The point in question is known as the geometric median.
The centroid (or center of mass), defined similarly to the geometric median except that it minimizes the sum of the squared distances to each sample, can be found by a simple formula: its coordinates are the averages of the coordinates of the samples. No such formula is known for the geometric median, and it has been shown that no explicit formula, nor an exact algorithm involving only arithmetic operations and k-th roots, can exist in general.
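In symbols, for sample points p_1, ..., p_n:

$$ \text{centroid:}\quad \bar{p} \;=\; \arg\min_{x}\sum_{i=1}^{n}\lVert x-p_i\rVert^{2} \;=\; \frac{1}{n}\sum_{i=1}^{n}p_i, \qquad \text{geometric median:}\quad \arg\min_{x}\sum_{i=1}^{n}\lVert x-p_i\rVert . $$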
I'm not sure I understand your question, but when you calculate the minimum spanning tree, the sum of the distances from any point to any other point along the tree is minimum.
Related
Suppose I have n points (in my case, 4 points) in 3 dimensions. I want to determine both the point a that minimizes the sum of squared distances to these n points, and the largest difference that can exist between the distances from an arbitrary point b to any two of these n points (i.e. the two "farthest points").
How can this be accomplished most efficiently? I know that, in 2 dimensions with 3 points, the point minimizing the squared distances is the centroid of the triangle formed by the 3 points, and the largest difference can be found by taking a point located precisely at one (any?) of the 3 points. It seems that the same should be true in 3 dimensions, although I am unsure.
I want to determine both the point that minimizes distance from each of these n points
The centroid minimizes the sum of the squared distances to every point in the set, but it will not minimize the maximum (farthest) distance to the points.
I suspect that you are interested in computing the center and radius of the minimal sphere containing every point in the set. This is a classic problem in computational geometry that can be solved approximately in linear time quite easily, or exactly with the algorithm proposed by Emmerich Welzl.
If the number of points is as small as 4, an approximate solution is to search for the pair of points with maximum distance (there are 6 possible pairs), take the midpoint as the center and half the distance as the radius, and then check that the other two points are also inside the sphere, growing it if necessary.
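A minimal sketch of that approximation in Python, assuming the points are 3D coordinate tuples; it starts from the farthest pair and then grows the sphere whenever a remaining point falls outside:

```python
import math
from itertools import combinations

def approx_bounding_sphere(points):
    """Approximate smallest enclosing sphere: start with the farthest pair,
    then grow the sphere toward any point that is still outside."""
    a, b = max(combinations(points, 2), key=lambda pq: math.dist(*pq))
    center = [(a[i] + b[i]) / 2 for i in range(3)]
    radius = math.dist(a, b) / 2
    for p in points:
        d = math.dist(center, p)
        if d > radius:
            shift = (d - radius) / 2          # grow just enough to enclose p
            radius += shift
            center = [c + (pi - c) * shift / d for c, pi in zip(center, p)]
    return center, radius
```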
See more information at
https://en.wikipedia.org/wiki/Bounding_sphere
https://en.wikipedia.org/wiki/Smallest-circle_problem
The largest difference between the distances from a point to two given points is achieved when the three points are collinear and the unknown point lies "outside" the segment joining the two given points (there are infinitely many such positions). In this configuration, the difference equals the distance between the two given points.
If you mean to maximize all differences simultaneously (or rather the sum of differences), you must go to infinity in some direction. That direction maximizes the sum of the lengths of the projections of all edges.
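This is just the triangle inequality: for any point b and given points p1, p2,

$$ \lvert\, d(b, p_1) - d(b, p_2) \,\rvert \;\le\; d(p_1, p_2), $$

with equality exactly when b lies on the line through p1 and p2 but not strictly between them.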
Let P1, ..., Pk be a collection of pairwise disjoint simple polygons with a total of n edges, all enclosed within a given square. Find the largest disk that can be inscribed in this square so that it is disjoint from the interiors of all the polygons Pi.
I was thinking of using the Voronoi diagram of the line segments...
Consider the distance field d(x, y) on the square that defines for every point (x, y) the distance to the nearest polygon. Your aim is to find the maximum of this distance field.
A very easy approach is to discretize the distance field on a grid and find the vertex with the maximum value. You can also subdivide the grid further around candidate points.
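A rough sketch of the discretized version, assuming each polygon is given as a list of (x, y) vertices and the square as its two opposite corners; it evaluates, at each grid vertex, the distance to the nearest polygon edge (and to the square's border) and keeps the best vertex. For simplicity it omits the point-in-polygon test, i.e. it assumes grid vertices falling inside a polygon are filtered out separately:

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def distance_field_maximum(polygons, square, resolution=100):
    """Sample the distance field on a (resolution+1)^2 grid over the square
    and return the grid vertex with the largest clearance."""
    (x0, y0), (x1, y1) = square
    edges = [(poly[i], poly[(i + 1) % len(poly)])
             for poly in polygons for i in range(len(poly))]
    best, best_d = None, -1.0
    for i in range(resolution + 1):
        for j in range(resolution + 1):
            p = (x0 + (x1 - x0) * i / resolution,
                 y0 + (y1 - y0) * j / resolution)
            d = min(p[0] - x0, x1 - p[0], p[1] - y0, y1 - p[1])   # square border
            d = min(d, min(point_segment_distance(p, a, b) for a, b in edges))
            if d > best_d:
                best, best_d = p, d
    return best, best_d
```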
But you can also incorporate a continuous optimization step. For this, calculate the discretized distance field and use the top vertices as seeds for continuous optimization. In this optimization, move the point, such that the distance increases until you cannot move anymore. This will find local maxima. With proper initialization, one of these maxima is most likely the global maximum.
To find the direction in which to move, find the intersections of the current disk with all polygons. If there is one intersection, moving in the opposite direction will definitely increase the distance. If there are several intersections, you could average the directions. You can either use a pre-defined step length or an estimate. To estimate the step length, find the distance to the next intersection (excluding the already-found intersection primitives, not the whole polygons); call this distance d_next. You can safely move (d_next - d_current) / 2 without causing a new intersection. There may be a smarter way to calculate the step length.
To find the n nearest neighbors of a point in d-dimensional space efficiently, I selected the dimension with the greatest scatter (i.e. the coordinate in which the differences between points are largest). The whole range from the minimal to the maximal value in this dimension was split into k bins. Each bin contains the points whose coordinate (in this dimension) falls within the range of that bin. It was ensured that there are at least 2n points in each bin.
The algorithm for finding the n nearest neighbors of a point x is the following (a code sketch is given after the list):
1. Identify the bin kx in which point x lies (its projection, to be precise).
2. Compute the distances between x and all the points in bin kx.
3. Sort the computed distances in ascending order.
4. Select the first n distances. The points at these distances are returned as the n nearest neighbors of x.
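A direct sketch of the described procedure in Python (it inherits the same failure mode), assuming the points are equal-length numeric tuples:

```python
import math

def build_bins(points, k):
    """Split the points into k bins along the dimension with the largest spread."""
    dims = len(points[0])
    spreads = [max(p[d] for p in points) - min(p[d] for p in points) for d in range(dims)]
    dim = spreads.index(max(spreads))
    lo = min(p[dim] for p in points)
    width = (spreads[dim] / k) or 1.0
    bins = [[] for _ in range(k)]
    for p in points:
        bins[min(int((p[dim] - lo) / width), k - 1)].append(p)
    return bins, dim, lo, width

def n_nearest(x, bins, dim, lo, width, n):
    """Return the n nearest neighbours of x, searched only inside x's own bin."""
    k = len(bins)
    b = min(max(int((x[dim] - lo) / width), 0), k - 1)   # bin of x's projection
    return sorted(bins[b], key=lambda p: math.dist(x, p))[:n]
```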
This algorithm does not work for all cases. In which cases can the algorithm fail to compute the nearest neighbors?
Can anyone propose a modification of the algorithm to ensure correct operation for all cases?
Where kNN fails:
If the data is a jumble of all the different classes, then kNN will fail, because it will try to find the k nearest neighbours even though the points are essentially random.
Outlier points:
Say you have two clusters of different classes. If the query is an outlier point, kNN will still assign it one of the classes, even though the query point is far away from both clusters.
This is failing because (any of) the k nearest neighbors of x could be in a different bin than x.
What do you mean by "not working"? You do understand that what you are doing is only an approximate method?
Try normalising the data and then choosing the dimension; otherwise comparing the scatter across dimensions makes no sense.
The best vector for discrimination or for clustering may not be one of the original dimensions, but any combination of dimensions.
Use PCA (Principal Component Analysis) or LDA (Linear Discriminant Analysis), to identify a discriminative dimension.
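For example, with scikit-learn (an assumption here, not something from the original post) you could bin along the first principal component instead of along a raw coordinate:

```python
import numpy as np
from sklearn.decomposition import PCA

points = np.random.rand(1000, 5)                 # example data, 5-dimensional
pca = PCA(n_components=1)
projection = pca.fit_transform(points)[:, 0]     # coordinate along the most-varying direction
# build the bins along `projection` rather than along one of the original axes
```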
I'm wondering about the Manhattan distance. It is very specific and, I don't know if it's a good word, simple. For example, given a set of n points in this metric, it is very easy to find the distance between the two farthest points, in linear time. But is it also easy to find the two closest points?
I heard that there exists a universal algorithm for finding the two closest points in any metric, but it's complicated. I'm wondering whether in this situation (the Manhattan metric) it is possible to exploit the special properties of this distance and come up with an easier algorithm that is friendlier to implement.
EDIT: n points on a plane, and let's say -10^9 <= x, y <= 10^9 for all points.
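For context, the linear-time farthest-pair computation mentioned in the question uses the standard rotation (x, y) -> (x + y, x - y), which turns the L1 distance into the L-infinity distance; the closest pair does not reduce this easily:

```python
def manhattan_farthest_distance(points):
    """Largest Manhattan distance between any two points, in O(n) time."""
    u = [x + y for x, y in points]
    v = [x - y for x, y in points]
    return max(max(u) - min(u), max(v) - min(v))
```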
Assuming you're talking about n points on a plane, find the minimal and maximal values of the x and y coordinates. Create a matrix of size (maxX - minX) x (maxY - minY), such that every point maps to a cell in the matrix. Fill the matrix with the n given points (not all cells will be filled; set NaN there, for example). Scan the matrix: the shortest distance is between adjacent filled cells (there might be several such pairs).
I'm building an application based around finding a "convenient meeting point" given a set of locations.
Currently I'm defining "convenient" as "minimising the total travel distance". This is a different problem from finding the centroid as illustrated by the following example (using cartesian coordinates rather than latitude and longitude for convenience):
A is at (0,0)
B is at (0,0)
C is at (0,12)
The location of minimum total travel for these points is at (0,0) with total travel distance of 12; the centroid is at (0,4) with total travel distance of 16 (4 + 4 + 8).
If the location were confined to being at one of the points, the problem appears to become simpler, but this isn't a constraint I intend to have (unlike, for example, this otherwise similar question).
What I can't seem to do is come up with any sort of algorithm to solve this - suggestions welcomed please!
Here is a solution that finds the geographical midpoint and then iteratively explores nearby positions to adjust that towards the minimum total distance point.
http://www.geomidpoint.com/calculation.html
This question is also quite similar to
Minimum Sum of All Travel Times
Here is a wikipedia article on the general problem you're trying to solve:
http://en.wikipedia.org/wiki/Geometric_median
In a way what you appear to be looking for is the center of mass of a triangle with equal weights at the vertices. That would point to barycentric coordinates.
When going beyond a triangle there are solutions for generalized barycentric coordinates, and you could give priorities to people by modifying the weights of the vertices. What that still would not account for is distances on a real map (you can't just travel in a straight line in any direction), but it may be a start?
One option is to define an objective (and gradient) function and use a generic optimization library, such as scipy.optimize; fmin_cg would be a good algorithm to try for your problem. Your objective would be the sum of distances as defined in the "Definition" section of the Geometric median Wikipedia page referenced by hatchet. The argument to your objective function is y.
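A minimal sketch of that approach (the example coordinates are made up; note that the gradient is undefined exactly at one of the input points, so a starting point such as the centroid that does not coincide with any of them is convenient):

```python
import numpy as np
from scipy.optimize import fmin_cg

points = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 5.0]])   # example locations

def total_distance(y):
    """Sum of Euclidean distances from candidate point y to every location."""
    return np.sqrt(((points - y) ** 2).sum(axis=1)).sum()

def gradient(y):
    diffs = y - points
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    return (diffs / dists[:, None]).sum(axis=0)

start = points.mean(axis=0)          # centroid as the initial guess
best = fmin_cg(total_distance, start, fprime=gradient)
print(best)                          # approximate geometric median
```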