I have a set of 6000 vectors (n-dimensional) and have computed the full distance matrix. Suppose I want to find the 10 points that are farthest from each other, i.e. the 10-point subset whose sum of pairwise distances is the highest among all possible 10-point subsets of the set.
If I also want to enforce a threshold so that no two selected points are too close, what is the fastest algorithm for this?
cheers
Daniele
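Finding the exact best 10-point subset means comparing an enormous number of subsets, so a greedy heuristic is a common starting point. The sketch below is one such heuristic, not an exact method: it seeds with the farthest pair and then repeatedly adds the point with the largest summed distance to the points already chosen, skipping any candidate closer than the threshold to a chosen point. The names `distance_matrix`, `greedy_dispersed_subset` and `min_dist` are made up for this illustration.

```python
import math

def distance_matrix(points):
    """Full Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]

def greedy_dispersed_subset(dist, k, min_dist=0.0):
    """Greedy heuristic for a k-subset with a large sum of pairwise distances.

    dist is a precomputed square distance matrix. The result is not
    guaranteed to be optimal and may hold fewer than k indices if the
    threshold rules out every remaining candidate.
    """
    n = len(dist)
    # Seed with the farthest pair.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    if dist[i0][j0] < min_dist:
        return [i0]  # no pair satisfies the threshold
    chosen = [i0, j0]
    # gain[c] = sum of distances from c to every chosen point so far
    gain = [dist[c][i0] + dist[c][j0] for c in range(n)]
    while len(chosen) < k:
        candidates = [c for c in range(n)
                      if c not in chosen
                      and all(dist[c][s] >= min_dist for s in chosen)]
        if not candidates:
            break
        best = max(candidates, key=lambda c: gain[c])
        chosen.append(best)
        for c in range(n):
            gain[c] += dist[c][best]
    return chosen
```

A swap-based local search (try replacing one chosen point with one unchosen point and keep the swap if the sum grows) could be run afterwards to improve the greedy result.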
Related
Suppose I have some n points (in my case, 4 points) in 3 dimensions. I want to determine both the point a which minimizes the squared distance to each of these n points, as well as the largest difference that can exist between the distance from an arbitrary point b and any two of these n points (i.e. the two "farthest points").
How can this be accomplished most efficiently? I know that, in 2 dimensions and with 3 points, the point that minimizes the distance is the centroid of the triangle formed by the 3 points, and the largest difference can be found by taking a point located precisely at one (any?) of the 3 points. It seems that the same should be true in 3 dimensions, although I am unsure.
I want to determine both the point that minimizes distance from each of these n points
The centroid minimizes the sum of the squared distances to every point in the set, but it does not minimize the maximum distance (the distance to the farthest point).
I suspect that you are interested in computing the center and radius of the minimal sphere containing every point in the set. This is a classic problem in computational geometry that can be solved approximately in linear time quite easily, or exactly (in expected linear time) if you implement the algorithm proposed by Emmerich Welzl.
If the number of points is as small as 4, an approximate solution is to search for the pair of points with the maximum distance (there are only 6 possible pairs), take their midpoint as the center and half their distance as the radius. Then check that the other two points are also inside the sphere, and make it grow if necessary.
See more information at
https://en.wikipedia.org/wiki/Bounding_sphere
https://en.wikipedia.org/wiki/Smallest-circle_problem
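The farthest-pair approach described above can be sketched as follows. This is an approximation, not Welzl's exact algorithm. The growth step (enlarge the sphere just enough to include an outside point while still covering the old sphere) is borrowed from Ritter's bounding-sphere heuristic, and the function name `approx_bounding_sphere` is made up for this example. It assumes at least two points.

```python
import math
from itertools import combinations

def approx_bounding_sphere(points):
    """Approximate enclosing sphere: farthest pair first, then grow."""
    # Start with the sphere whose diameter is the farthest pair.
    a, b = max(combinations(points, 2), key=lambda pq: math.dist(pq[0], pq[1]))
    center = [(x + y) / 2 for x, y in zip(a, b)]
    radius = math.dist(a, b) / 2
    for p in points:
        d = math.dist(center, p)
        if d > radius:
            # Grow so that p lies on the new surface and the old
            # sphere stays inside the new one.
            new_radius = (radius + d) / 2
            shift = (d - new_radius) / d
            center = [c + (x - c) * shift for c, x in zip(center, p)]
            radius = new_radius
    return center, radius
```

The result always contains every input point, but its radius can be somewhat larger than that of the true minimal sphere.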
The largest difference between the distances of a point to two given points is achieved when the three points are aligned and the unknown point is "outside" (there are infinitely many solutions). In this configuration, the difference is just the distance between the two given points.
If you mean to maximize all differences simultaneously (or rather the sum of differences), you must go to infinity in some direction. That direction maximizes the sum of the lengths of the projections of all edges.
I have a set of coordinates. Each coordinate represents a building and carries its number of residents. I would like to find the neighbors within a radius of 0.5 km of any given point. These points represent base stations, and I want to calculate how many subscribers each base station will cover, so I need the number of neighbors within the radius. In my research I found the RadiusNearestNeighborsRegression technique, but this technique never gives me the number of neighbors. It only gives the average value over all neighbors in the radius. I need either the sum over the neighbors or the number of neighbors; with the count, I could multiply the average output by the number of neighbors, which gives the total for all neighbors.
I hope I explained this clearly.
You could partition the residences into regions such that, if the minimum distance between any 2 regions is too large, there would be no need to make comparisons between any pair of residences split between the 2 regions.
Also, when you have found that A and B are close enough, A counts towards B's neighbors and B counts towards A's without having to repeat that calculation.
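One way to realize the partitioning idea is a uniform grid: buildings are bucketed into square cells, and a base station is compared only against buildings in the cells its radius can reach. The sketch below assumes planar coordinates in kilometres (latitude/longitude would first need projecting, or a great-circle distance), and the names `build_grid` and `coverage` are made up for this example.

```python
import math
from collections import defaultdict

def build_grid(buildings, cell):
    """Bucket (x, y, residents) tuples into square cells of side `cell`."""
    grid = defaultdict(list)
    for x, y, residents in buildings:
        grid[(math.floor(x / cell), math.floor(y / cell))].append((x, y, residents))
    return grid

def coverage(grid, cell, station, radius):
    """Return (number of buildings, total residents) within `radius` of station."""
    sx, sy = station
    reach = math.ceil(radius / cell)  # how many cells the radius can span
    cx, cy = math.floor(sx / cell), math.floor(sy / cell)
    count = total = 0
    for i in range(cx - reach, cx + reach + 1):
        for j in range(cy - reach, cy + reach + 1):
            for x, y, residents in grid.get((i, j), ()):
                if math.hypot(x - sx, y - sy) <= radius:
                    count += 1
                    total += residents
    return count, total
```

With a cell size equal to the radius, each query inspects at most a 3 x 3 block of cells, and the grid is built once and reused for every base station.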
I have two sets of 3D points and I want to find the closest point in the second set for each point in the first set. In a more difficult case, the sets may have different numbers of points, and I need to find the closest pairs of points. I'm not sure what this problem is called, but I have some brute-force ideas for solving it. For example, I could calculate the distance between all pairs of points and choose the pairs with the shortest total distance. The maximum number of points in each set is 20, so I don't need the most efficient solution.
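With at most 20 points per set, the brute-force idea is cheap. A minimal sketch of the first case (the closest point in the second set for each point in the first) is below; the function name `nearest_in_second_set` is made up for this example. Note that several points of the first set may share the same nearest neighbor. Forcing a one-to-one pairing with the shortest total distance is a different problem (minimum-cost assignment) and is not covered by this sketch.

```python
import math

def nearest_in_second_set(first, second):
    """For each point in `first`, return (index in `second`, distance)
    of its closest point, by comparing against every point in `second`."""
    result = []
    for p in first:
        j = min(range(len(second)), key=lambda idx: math.dist(p, second[idx]))
        result.append((j, math.dist(p, second[j])))
    return result
```

The two sets may have different sizes; the cost is one distance evaluation per pair of points.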
If I have a set of k vectors of n dimensions, how can I sort them such that the distance between each consecutive pair of vectors is as small as possible? The distance can be calculated using the Euclidean distance, but how is the "sorting" then implemented in an efficient manner?
I'm thinking one approach would be to select a vector at random, calculate the distance to all other vectors, pick the vector that minimizes the distance as the next vector and repeat until all vectors have been "sorted". However, this greedy search would probably render different results depending on which vector I start with.
Any ideas on how to do this?
If you really want just 'that the distance between each consecutive pair of vectors is the minimal possible' without randomness, you can firstly find 2 closest points (by O(n log n) algo like this) - let's say, p and q, then search for closest points for p (let's say, r) and q (let's say, s), then compare distance (p,r) and (q,s) and if the first is smaller, start with q,p,r and use your greedy algo (in other case, obviously, start with p,q,s).
However, if your goal is actually to arrange points so that the sum of all paired distances is smallest, you should choose any approximate solution for Travelling salesman problem. Note this trick in order to reduce your task to TSP.
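The deterministic start described above (closest pair first, then extend from whichever end has the nearer neighbor) can be sketched like this. For brevity the closest pair is found here by comparing all pairs, which is quadratic rather than O(n log n), and the function name `greedy_order` is made up for this example. Like any greedy ordering, it does not guarantee the smallest total path length.

```python
import math
from itertools import combinations

def greedy_order(vectors):
    """Return indices ordered greedily, seeded with the closest pair."""
    n = len(vectors)
    if n < 3:
        return list(range(n))

    def d(i, j):
        return math.dist(vectors[i], vectors[j])

    # Closest pair (p, q), found by brute force.
    p, q = min(combinations(range(n), 2), key=lambda ij: d(ij[0], ij[1]))
    rest = set(range(n)) - {p, q}
    r = min(rest, key=lambda i: d(p, i))  # nearest remaining point to p
    s = min(rest, key=lambda i: d(q, i))  # nearest remaining point to q
    # Put the end with the nearer neighbor last, so the chain grows from it.
    order = [q, p] if d(p, r) < d(q, s) else [p, q]
    while rest:
        last = order[-1]
        nxt = min(rest, key=lambda i: d(last, i))
        order.append(nxt)
        rest.remove(nxt)
    return order
```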
Given N points (in 2D) with x and y coordinates, you have to find a point P (among the N given points) such that the sum of distances from the other (N-1) points to P is minimum.
For example, N points are given: p1(x1,y1), p2(x2,y2), ..., pN(xN,yN).
We have to find a point P among p1, p2, ..., pN whose sum of distances from all other points is minimum.
I used brute force approach , but I need a better approach. I also tried by finding median, mean etc. but it is not working for all cases.
then I came up with an idea that I would treat X as a vertices of a polygon and find centroid of this polygon, and then I will choose a point from Y nearest to the centroid. But I'm not sure whether centroid minimizes sum of its distances to the vertices of polygon, so I'm not sure whether this is a good way? Is there any algorithm for solving this problem?
If your points are nicely distributed and if there are so many of them that brute force (calculating the total distance from each point to every other point) is unappealing the following might give you a good enough answer. By 'nicely distributed' I mean (approximately) uniformly or (approximately) randomly and without marked clustering in multiple locations.
Create a uniform k*k grid, where k is an odd integer, across your space. If your points are nicely distributed the one which you are looking for is (probably) in the central cell of this grid. For all the other cells in the grid count the number of points in each cell and approximate the average position of the points in each cell (either use the cell centre or calculate the average (x,y) for points in the cell).
For each point in the central cell, compute the distance to every other point in the central cell, and the weighted average distance to the points in the other cells. This will, of course, be the distance from the point to the 'average' position of points in the other cells, weighted by the number of points in the other cells.
You'll have to juggle the increased accuracy of higher values for k against the increased computational load and figure out what works best for your points. If the distribution of points across cells is far from uniform then this approach may not be suitable.
This sort of approach is quite widely used in large-scale simulations where points have properties, such as gravity and charge, which operate over distances. Whether it suits your needs, I don't know.
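A rough sketch of the grid idea follows, under the same "nicely distributed" assumption. Points in the central cell are the only candidates; every other cell is summarized by its point count and mean position, and a candidate's score is its exact distance sum to the other central points plus the count-weighted distance to each cell summary. The function name `approx_min_distance_sum_point` and the default `k=5` are made up for this example.

```python
import math
from collections import defaultdict

def approx_min_distance_sum_point(points, k=5):
    """Approximate index of the point with the smallest distance sum.

    Returns None if the central grid cell is empty, in which case the
    points are not distributed the way this heuristic assumes.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    w = (max(xs) - x0) / k or 1.0  # cell width (fallback if all x are equal)
    h = (max(ys) - y0) / k or 1.0  # cell height
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cx = min(int((x - x0) / w), k - 1)
        cy = min(int((y - y0) / h), k - 1)
        cells[(cx, cy)].append(idx)
    mid = (k // 2, k // 2)
    central = cells.get(mid, [])
    if not central:
        return None
    # Summarize every non-central cell as (count, mean position).
    summaries = []
    for key, members in cells.items():
        if key != mid:
            m = len(members)
            mean = (sum(points[i][0] for i in members) / m,
                    sum(points[i][1] for i in members) / m)
            summaries.append((m, mean))

    def score(i):
        near = sum(math.dist(points[i], points[j]) for j in central)
        far = sum(m * math.dist(points[i], mean) for m, mean in summaries)
        return near + far

    return min(central, key=score)
```

Comparing the result against brute force on a sample of the data is a reasonable way to pick k.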
The point in question is known as the geometric median.
The centroid or center of mass, defined similarly to the geometric median as minimizing the sum of the squares of the distances to each sample, can be found by a simple formula: its coordinates are the averages of the coordinates of the samples. No such formula is known for the geometric median, and it has been shown that no explicit formula, nor an exact algorithm involving only arithmetic operations and kth roots, can exist in general.
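Although no closed formula exists, the geometric median can be approximated iteratively. The sketch below uses Weiszfeld's algorithm, which is a technique brought in here for illustration and not mentioned in the answer above: each step re-averages the samples with weights inversely proportional to their distance from the current estimate. The function name, `iterations` and `eps` are made up, and stopping as soon as the estimate lands on a sample is a simplification.

```python
import math

def geometric_median(points, iterations=200, eps=1e-9):
    """Approximate the geometric median with Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the centroid.
    est = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iterations):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = math.dist(est, p)
            if d < eps:
                # Simplification: stop if the estimate coincides with a sample.
                return est
            weight = 1.0 / d
            den += weight
            for i in range(dim):
                num[i] += weight * p[i]
        new = [v / den for v in num]
        if math.dist(new, est) < eps:
            return new
        est = new
    return est
```

For the original question, where P must be one of the given points, the input points nearest to this estimate make natural candidates, but the nearest one is not guaranteed to have the smallest distance sum, so the candidates should still be compared directly.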
I'm not sure if I understand your question but when you calculate the minimum spanning tree the sum from any point to any other point from the tree is minimum.