I have a set of points and need to know which one is the farthest, in Euclidean distance, from any other point.
I want to improve on the O(n^2) brute force.
Now, I've heard about k-d trees as a solution, BUT
a k-d tree doesn't give the nearest distance for a point 'x' that is already present in the tree, and there is no implementation that supports removing it.
Edit:
You can do this by ignoring self both in the nearest-neighbour search and in where we initially set the root/parent to begin the search.
given n points Pi, 1 <= i <= n:
build kd-tree (with an O(n) median of median algorithm this is O(n log n))
for all points Pi: find the second-closest point (the closest point will be Pi itself), compute the distance, and remember Pi if that distance is a new maximum; this is O(n log n) again.
Altogether this is an O(n log n) algorithm.
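For reference, a minimal sketch of this recipe using SciPy's k-d tree (assuming SciPy is available; the sample data is illustrative). Querying with k=2 returns the point itself first, so the second column is the distance to the nearest other point:

```python
import numpy as np
from scipy.spatial import cKDTree   # assumes SciPy is installed

points = np.random.rand(1000, 2)    # n points Pi
tree = cKDTree(points)              # build: O(n log n)
dist, _ = tree.query(points, k=2)   # two closest points for each Pi
nn_dist = dist[:, 1]                # distance to the nearest *other* point
i = int(np.argmax(nn_dist))         # the Pi whose nearest neighbour is farthest
print(points[i], nn_dist[i])
```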
I assume that you want to find the point that maximises the distance to the nearest neighbour. Like a small island in the south pacific being 1100 miles away from the nearest land.
Well, you should be nowhere near O(n^2). Say you have a million points. Divide the points into a 1000 x 1000 grid. To find the nearest point, you would only have to examine the nine neighbouring grid cells, so you are far below O(n^2). If a cell contains lots of points, they will be close together, so you can remove them from the search quickly.
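A rough sketch of the grid idea, under the assumption that every point's nearest neighbour falls within the 3 x 3 block of surrounding cells (with sparse data you would widen the searched ring); all names are illustrative:

```python
import math

def most_isolated(points, cell_size):
    # Bucket the points into square grid cells.
    grid = {}
    for p in points:
        key = (int(p[0] // cell_size), int(p[1] // cell_size))
        grid.setdefault(key, []).append(p)

    def nearest_dist(p):
        cx, cy = int(p[0] // cell_size), int(p[1] // cell_size)
        best = math.inf
        # Assumption: the nearest neighbour sits in one of the nine
        # cells around p; otherwise the search ring must grow.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for q in grid.get((cx + dx, cy + dy), ()):
                    if q is not p:
                        best = min(best, math.dist(p, q))
        return best

    # The point whose nearest neighbour is farthest away.
    return max(points, key=nearest_dist)
```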
Related
Suppose we have two sets of points, say A and B (both of size O(n)), in the plane. Can we find the farthest pair of points, one from A and one from B, in O(n) time?
No, you cannot calculate the furthest point for each point in O(n). The best you can obtain is O(n log n) with a 2-d tree. You can do this with a technique similar to finding a closest point.
Read a more detailed answer here where I show a couple of other approaches to solve a similar problem.
I'm studying for an algorithms exam and I've hit a problem that I couldn't deal with for a few days. So I'm writing here for help.
For a given two disjoint sets on plane:
G={(x_1^G, y_1^G), (x_2^G, y_2^G), ..., (x_n^G, y_n^G)}
D={(x_1^D, y_1^D), (x_2^D, y_2^D), ..., (x_n^D, y_n^D)}
Where for every 1 <= i, j <= n we have y_i^D < y_j^G, so G is above D.
Find an efficient algorithm that
computes the distance between them,
defined as:
d(G,D) = min{ d(a,b): a \in G and b\in D },
where d(a,b) = |x_a - x_b| + |y_a - y_b|
O(n^2) is trivial, so it is not the answer.
I hope the solution isn't too hard, since it comes from the review materials for the test. Can anybody help?
I suspect it will turn out that this is a special case of some common problem. But if it is a special case, maybe the solution can be easier?
There are a few different ways to do this in O(n log n) time.
One: Compute the Manhattan-distance Voronoi diagram of the G points and build a point-location data structure on top of it. This takes O(n log n) time. For each D point, find the closest G point using the point-location data structure. This takes O(log n) time per D point. Take the min of the distances between the pairs you just found and that's your answer.
Two: You can adapt Fortune's algorithm to this problem; just keep separate binary trees for D and G points. Kind of annoying to describe.
The next idea computes the distance of the closest pair for the infinity-norm, which is max(|x1-x2|, |y1-y2|). You can tilt your problem 45 degrees (substituting u = x-y, v = x+y) to get it into the appropriate form.
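A quick sanity check of that substitution (hypothetical helper names):

```python
def tilt(p):
    x, y = p
    return (x - y, x + y)             # u = x - y, v = x + y

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def chebyshev(a, b):                  # the infinity norm
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

a, b = (3, 7), (-2, 5)
assert manhattan(a, b) == chebyshev(tilt(a), tilt(b))   # both equal 7
```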
Three (variant of two): Sort all of the points by y coordinate. Maintain d, the distance between the closest pair seen so far. We'll sweep a line from top to bottom, maintaining two binary search trees, one of G points and one of D points. When a point is d or farther above the sweep line, we remove it from its binary search tree. When a point is first encountered by the sweep line, say a D point, we (1) check the G binary search tree to see if it has any elements whose x-coordinate is within d of the new point's, updating d as necessary, and (2) insert the new point into D's binary search tree. Each point only causes a constant number of binary search tree operations plus a constant amount of additional work, so the sweep is O(n log n). The sort is too, unsurprisingly, so our overall time complexity is as desired.
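A sketch of idea three in the tilted coordinates, where the Manhattan distance becomes max(|du|, |dv|). It leans on the third-party sortedcontainers package to play the balanced-BST role (an assumption; any balanced tree works), and all names are illustrative:

```python
from collections import deque
from sortedcontainers import SortedList   # assumption: pip install sortedcontainers

INF = float('inf')

def min_manhattan(G, D):
    # Tilt 45 degrees: Manhattan in (x, y) equals Chebyshev in (u, v).
    events = sorted([(x + y, x - y, 0) for x, y in G] +
                    [(x + y, x - y, 1) for x, y in D],
                    reverse=True)            # sweep v from top to bottom
    trees = (SortedList(), SortedList())     # active points, keyed by u
    queues = (deque(), deque())              # the same points in sweep order
    d = INF
    for v, u, tag in events:
        other = 1 - tag
        # Points d or farther above the sweep line can never improve d.
        while queues[other] and queues[other][0][0] - v >= d:
            vv, uu = queues[other].popleft()
            trees[other].remove((uu, vv))
        # Check the other set's points whose u-coordinate is within d.
        for uu, vv in trees[other].irange((u - d, -INF), (u + d, INF)):
            d = min(d, max(abs(u - uu), abs(v - vv)))
        trees[tag].add((u, v))
        queues[tag].append((v, u))
    return d
```

For example, min_manhattan([(0, 5), (2, 6)], [(1, 0), (4, 1)]) returns 6, matching the brute-force minimum.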
You can probably make a divide-and-conquer strategy work too based on similar ideas to three.
Assume we have an array that holds n vectors. We want to calculate the maximum euclidean distance between those vectors.
The easiest (naive?) approach would be to iterate over the array and, for each vector, calculate its distance to all subsequent vectors, then find the maximum.
This algorithm, however, would grow (n-1)! with respect to the size of the array.
Is there any other more efficient approach to this problem?
Thanks.
Your computation of the naive algorithm's complexity is wonky, it should be O(n(n-1)/2), which reduces to O(n^2). Computing the distance between two vectors is O(k) where k is the number of elements in the vector; this still gives a complexity well below O(n!).
Complexity is O(N^2 * K) for the brute-force algorithm (K is the number of elements in each vector). But we can do better by using the fact that in Euclidean space, for points A, B and C:
|AB| + |AC| >= |BC|
The algorithm should be something like this:
If the maximum distance found so far is MAX, and for a pair |AB| there is a point C such that the distances |AC| and |CB| have already been computed and MAX > |AC| + |CB|, then we can skip the calculation of |AB|.
It is difficult to tell the complexity of this algorithm, but my gut feeling tells me it is not far from O(N * log(N) * K).
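One cheap instance of this pruning, as a sketch: fix a single point C as a pivot and use |AB| <= |AC| + |CB| to skip hopeless pairs. The pivot choice is illustrative, and the worst case is still O(N^2 * K):

```python
import math

def max_dist_pruned(points):
    pivot = points[0]                          # illustrative choice of C
    r = [math.dist(pivot, p) for p in points]  # |C Pi| for every point
    best = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            if r[i] + r[j] <= best:            # |Pi Pj| <= r[i] + r[j] <= best,
                continue                       # so this pair can't win: skip it
            best = max(best, math.dist(points[i], points[j]))
    return best
```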
This question has been here before, see How to find two most distant points?
And the answer is: it can be done in less than O(n^2) in Euclidean space. See also http://mukeshiiitm.wordpress.com/2008/05/27/find-the-farthest-pair-of-points/
So suppose you have a pair of points A and B. Consider the hypersphere that has A and B at the north and south poles respectively. Could any point C contained in the hypersphere be farther from A than B is? No: any such C sees the segment AB at a right or obtuse angle, so |AC| <= |AB|.
Further suppose we partition the point set into sqrt(N) hyperboxes with sqrt(N) points each. For any pair of hyperboxes, we can calculate in O(k) time an upper bound on the distance between any two points contained within them, by simply calculating the distance between their farthest corners. If we already have a candidate better than this bound, we can discard every pair of points drawn from those two hyperboxes.
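A 2-D sketch of that hyperbox pruning (illustrative names; a fixed k x k grid stands in for the sqrt(N) partitioning). Box pairs are visited in decreasing order of their upper bound, so the scan can stop as soon as the bound drops below the best distance found:

```python
import math

def box_bound(b1, b2):
    # b = (xmin, ymin, xmax, ymax); on each axis the farthest corners
    # are max(|lo1 - hi2|, |hi1 - lo2|) apart.
    dx = max(abs(b1[0] - b2[2]), abs(b1[2] - b2[0]))
    dy = max(abs(b1[1] - b2[3]), abs(b1[3] - b2[1]))
    return math.hypot(dx, dy)

def max_dist_boxes(points, k):
    # A fixed k x k grid of boxes (stand-in for the sqrt(N) partitioning).
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    sx = (max(xs) - x0) / k or 1.0             # cell sizes (avoid zero width)
    sy = (max(ys) - y0) / k or 1.0
    boxes = {}
    for p in points:
        key = (min(int((p[0] - x0) / sx), k - 1),
               min(int((p[1] - y0) / sy), k - 1))
        boxes.setdefault(key, []).append(p)
    # Tight bounding box of the points inside each cell.
    bb = {key: (min(p[0] for p in pts), min(p[1] for p in pts),
                max(p[0] for p in pts), max(p[1] for p in pts))
          for key, pts in boxes.items()}
    keys = list(boxes)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i:]]
    pairs.sort(key=lambda ab: box_bound(bb[ab[0]], bb[ab[1]]), reverse=True)
    best = 0.0
    for a, b in pairs:
        if box_bound(bb[a], bb[b]) <= best:
            break                              # all remaining pairs bound lower
        for p in boxes[a]:
            for q in boxes[b]:
                best = max(best, math.dist(p, q))
    return best
```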
I need to find the two points that have the biggest distance between them.
The easiest method is to compute the distance between each pair of them, but that solution would have quadratic complexity.
So I'm looking for any faster solution.
How about:
1. Determine the convex hull of the set of points.
2. Find the longest distance between points on the hull.
That should allow you to ignore all points not on the hull when checking for distance.
To elaborate on rossom's answer:
1. Find the convex hull of the points, which can be computed in O(n log n) time with an algorithm like Graham's scan, or in O(n log h) time with other algorithms, which I assume are harder to implement.
2. Start at a point, say A, and loop through the other points to find the one furthest from it, say B.
3. Advance A to the next point and advance B until it is furthest from A again. If this distance is larger than the one from part 2, store it as the largest. Repeat until you have looped through all points A in the set.
Parts 2 and 3 take amortized O(n) time and therefore the overall algorithm takes O(n log n) or O(n log h) time depending on how much time you can be bothered spending on implementing convex hull.
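A compact sketch, assuming SciPy is available. For brevity it checks all pairs of hull vertices, which is O(h^2) rather than the O(h) rotating-calipers pass from parts 2 and 3, but h is usually tiny compared to n:

```python
import numpy as np
from scipy.spatial import ConvexHull   # assumes SciPy is installed

def farthest_pair(points):
    pts = np.asarray(points, dtype=float)
    hull = pts[ConvexHull(pts).vertices]        # part 1: keep only hull points
    d = np.linalg.norm(hull[:, None, :] - hull[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    return hull[i], hull[j]
```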
This is great and all but if you only have a few thousand points (like you said), O(n^2) should work fine (unless you're executing it many times).
Given a set S of points in 2-dimensional space, provide an algorithm that computes the nearest (Euclidean) neighbor for each point in the set. I think this is called the nearest neighbor graph, isn't it? Is there an existing efficient algorithm (O(N log N), where N = len(S))?
The k-d tree is a pretty standard approach for nearest neighbor search (even in 2-space; don't let the first illustration throw you).
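For example, a sketch with SciPy's k-d tree (assuming SciPy is available): querying with k=2 lets you skip the zero-distance self match, and the whole graph falls out of one call:

```python
import numpy as np
from scipy.spatial import cKDTree   # assumes SciPy is installed

def nn_graph(S):
    pts = np.asarray(S, dtype=float)
    _, idx = cKDTree(pts).query(pts, k=2)   # self match comes back first
    return {i: int(idx[i, 1]) for i in range(len(pts))}   # point -> its nearest neighbor
```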