Closest Pair of Points Algorithm

I am currently working on implementing the closest pair of points algorithm in C++. That is, given a list of points (x, y), find the pair of points that has the smallest Euclidean distance. I have done research into this and my understanding of the algorithm is the following (please correct me if I'm wrong):
Split the array of points down the middle.
Recursively find the pair of points with the minimum distance for the left and right halves.
Sort the left and right halves by y-coordinate, and compare each point on the left to its 6 closest neighbors (by y-coordinate) on the right. There is some theoretical stuff behind this, but this is my understanding of what needs to be done.
I've gotten the recursion part of the algorithm to work, but am struggling to find an efficient way to find the 6 closest neighbors on the right for each point on the left. In other words, given two sorted arrays, I need to find the 6 closest numbers in Array B for each point in array A. I assume something similar to merge sort is required here, but haven't been able to figure it out. Any help would be much appreciated.

Sounds like you want a quad tree.

Let dist = min(dist_L, dist_R) where dist_L, dist_R are the minimum distances found in the left and right sets, respectively.
Now to find the minimum distance where one point is on the left half and the other on the right half, you only need to consider points whose x-coordinates are in the interval [x_m - dist, x_m+dist].
The idea now is to consider the 6 closest points in this interval. So sort the points by y-coordinate and, for each point, look forward at the next 6. This will result in an O(n log^2(n)) running time.
You can further improve upon this to O(nlogn) by speeding up the sorting process. To do this, have each recursive call also return a sorted list of the points. Then to sort the entire list, you just have to merge the two sorted lists. An observant reader would notice that this is precisely merge sort.
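A minimal C++ sketch of the O(n log n) variant described above, under a few assumptions: the `Point` struct and the names `solve` and `closestPair` are made up for illustration, the points are sorted by x once up front, and each recursive call leaves its range sorted by y so that the parent only has to merge. Instead of counting exactly 6 neighbours, the inner loop stops as soon as the y-gap reaches the current best distance, which visits only a constant number of points per strip entry.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point { double x, y; };  // hypothetical point type

static double dist(const Point& a, const Point& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Precondition: pts[lo, hi) is sorted by x.
// Postcondition: pts[lo, hi) is sorted by y; returns the smallest distance in it.
static double solve(std::vector<Point>& pts, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return std::numeric_limits<double>::infinity();
    std::size_t mid = lo + (hi - lo) / 2;
    double xm = pts[mid].x;  // dividing line x_m, read before the halves are reordered
    double d = std::min(solve(pts, lo, mid), solve(pts, mid, hi));

    // Merge step of merge sort: both halves came back sorted by y.
    auto byY = [](const Point& a, const Point& b) { return a.y < b.y; };
    std::inplace_merge(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi, byY);

    // Keep only points closer than d to the dividing line; they stay sorted by y.
    std::vector<Point> strip;
    for (std::size_t i = lo; i < hi; ++i)
        if (std::abs(pts[i].x - xm) < d) strip.push_back(pts[i]);

    // Look forward only while the y-gap is still smaller than d.
    for (std::size_t i = 0; i < strip.size(); ++i)
        for (std::size_t j = i + 1;
             j < strip.size() && strip[j].y - strip[i].y < d; ++j)
            d = std::min(d, dist(strip[i], strip[j]));
    return d;
}

// Returns the smallest pairwise distance, or infinity for fewer than 2 points.
double closestPair(std::vector<Point> pts) {
    std::sort(pts.begin(), pts.end(),
              [](const Point& a, const Point& b) { return a.x < b.x; });
    return solve(pts, 0, pts.size());
}
```

The sketch returns only the distance; tracking the pair itself would mean carrying two points alongside `d`.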

Related

Efficient algorithm for matching line segments without intersection

I wanted to know an efficient algorithm to match (partition into n/2 distinct pairs) n=2k points in general position in the plane in such a way that the segments joining the matched points do not cross. Any idea would help out immensely.
Mr. SRKV, there is a simpler way of doing it.
1.Sort all the points based on the x-coordinate.
2.Now pair the leftmost point with the next leftmost one.
3.Remove the two points that we just paired.
4.Continue from step 2 until there are no points left.
In case two points have the same x-coordinate, the following is the tie-breaking rule:
Join the point with the lowest y-coordinate to the point with the 2nd lowest y-coordinate.
If there are an odd number of points with the same x-coordinate, then we join the lone remaining point (topmost y) with the point at the next x-coordinate (if there are multiple, then the lowest one).
Total complexity: O(n log n) to sort and O(n) to traverse, so asymptotically it is O(n log n).
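A minimal C++ sketch of the sort-and-pair approach above. The `Point` struct, the `Segment` alias and the function name `matchPoints` are assumptions made for illustration. The tie-breaking rule is folded into the sort by ordering points with equal x by their y-coordinate, so pairing consecutive points in that order gives the same pairs.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { double x, y; };            // hypothetical point type
using Segment = std::pair<Point, Point>;  // hypothetical pair-of-points type

// Pairs up the points so that consecutive points in (x, then y) order are joined.
// With an odd number of points, the last one is left unmatched.
std::vector<Segment> matchPoints(std::vector<Point> pts) {
    std::sort(pts.begin(), pts.end(), [](const Point& a, const Point& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    std::vector<Segment> pairs;
    for (std::size_t i = 0; i + 1 < pts.size(); i += 2)
        pairs.emplace_back(pts[i], pts[i + 1]);
    return pairs;
}
```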
1.Find the convex hull.
2.Working your way around the hull (let's say clockwise), take adjacent pairs of vertices and add them to your set of pairs. Delete each pair from the graph as you do so. If the hull contains an even number of points, then all of them will be deleted, otherwise 1 will be left over.
3.If the graph still contains points, go to step 1.
If each hull contains an even number of points, then it's clear that every pair of line segments found by this algorithm either came from the same hull, or from different hulls -- and either way, they will not intersect. I'm convinced it will work even when some hulls have an odd number of points.

In the closest pair of points divide and conquer algorithm, what is the significance of sorting the "strip" by the points' y values?

I believe I understand the algorithm quite clearly, except for the step where you check whether any points are close to each other across the division by creating a strip in which the points are candidates.
But then the algorithm states to sort the points by their y coordinates and then check each other point in the strip to find if there is a smaller distance than the one previously found. It basically sounds like you brute force within the strip.
For example, here's what Introduction to Algorithms states:
So it seems you just take each point and compare it against all the others to find the closest points? Why is it necessary to sort by y value then? You already have them sorted by x, why not brute force with that?
You don't brute force compare against all points in Y' but only against the one next to p. If that one is already too far away you can just stop, because all other points will be even further away. You only continue evaluating the next closest neighbor if the last one was still within your search distance.
The text explains it in the "As we will see shortly" section.
Sorting is an optimization here that allows you to iterate nearest neighbors in O(1) after paying the sorting costs of O(n log n) once.
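To illustrate the early exit, here is a small C++ sketch of the strip scan on its own. The `Point` struct and the name `stripClosest` are assumptions, and the strip is assumed to be sorted by y already. The inner loop stops at the first point whose y-gap reaches the current best distance, because every later point in y order is at least that far away.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };  // hypothetical point type

// `strip` must already be sorted by y; `d` is the best distance found so far.
// Returns the smaller of `d` and the closest pair distance inside the strip.
double stripClosest(const std::vector<Point>& strip, double d) {
    for (std::size_t i = 0; i < strip.size(); ++i) {
        for (std::size_t j = i + 1; j < strip.size(); ++j) {
            // Sorted by y: once this gap reaches d, all later points are too far.
            if (strip[j].y - strip[i].y >= d) break;
            d = std::min(d, std::hypot(strip[i].x - strip[j].x,
                                       strip[i].y - strip[j].y));
        }
    }
    return d;
}
```

With the strip sorted by x instead, there would be no such stopping condition, since a point with a nearby x can still be arbitrarily far away in y.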

Find two points in a given set of points in 2D plane with least distance in less than O(n^2) time

I was asked this question at Yahoo for a machine learning profile. Given a set of points as (x,y) coordinates, I was asked to find the two points with the lowest distance in O(n) or O(log n) time.
Obviously I was able to come up with an O(n^2) solution but was nowhere near getting a better algorithm. Even though the problem statement was screaming for divide and conquer, I just could not come up with the reasoning for the merge step. I also googled this question and found that it is actually very popular, but I still could not get hold of the reasoning behind the merge step.
Can anyone help me out with this?
Input: (x1,y1),(x2,y2),(x3,y3),(x4,y4),(x5,y5)
The problem can be solved in O(n log n) time using the recursive divide and conquer approach, e.g., as follows:
1.Sort points according to their x-coordinates.
2.Split the set of points into two equal-sized subsets by a vertical line x=xmid.
3.Solve the problem recursively in the left and right subsets. This yields the left-side and right-side minimum distances dLmin and dRmin, respectively.
4.Find the minimal distance dLRmin among the pairs of points in which one point lies to the left of the dividing vertical line and the other point lies to the right.
5.The final answer is the minimum among dLmin, dRmin, and dLRmin.
http://en.wikipedia.org/wiki/Closest_pair_of_points

Calculating all bitonic paths

I'm trying to calculate all bitonic paths for a given set of points.
Given N points.
My guess is there are O(n!) possible paths.
Reasoning
You have n points you can choose from your starting location. From there you have n-1 points, then n-2 points...which seems to equal n!.
Is this reasoning correct?
You can solve it with good old dynamic programming.
Let Count(top,bottom) be the number of incomplete tours such that top is the rightmost top row point and bottom is the rightmost bottom row point and all the points left of top and bottom are already in the trail.
Now, Count(i,j) = Count(k,j) where k={i-1}U{l: l
This is O(n^3) complexity.
If you want to enumerate all the bitonic trails, along with Count also keep track of all the paths. In the update step, append to each path appropriately. This would require a lot of memory though. If you don't want to use a lot of memory, use recursion (same idea: sort the points, and at every recursion point either put the new point in the top fork or the bottom fork and check if there are any crossings).

Finding the farthest point in one set from another set

My goal is a more efficient implementation of the algorithm posed in this question.
Consider two sets of points (in N-space; 3-space for the example case of RGB colorspace, while a solution for 1-space or 2-space differs only in the distance calculation). How do you find the point in the first set that is the farthest from its nearest neighbor in the second set?
In a 1-space example, given the sets A:{2,4,6,8} and B:{1,3,5}, the answer would be 8, as 8 is 3 units away from 5 (its nearest neighbor in B) while all other members of A are just 1 unit away from their nearest neighbor in B. Edit: 1-space is overly simplified, as sorting is related to distance in a way that it is not in higher dimensions.
The solution in the source question involves a brute force comparison of every point in one set (all R,G,B where 512>=R+G+B>=256 and R%4=0 and G%4=0 and B%4=0) to every point in the other set (colorTable). Ignore, for the sake of this question, that the first set is elaborated programmatically instead of iterated over as a stored list like the second set.
First you need to find every element's nearest neighbor in the other set.
To do this efficiently you need a nearest neighbor algorithm. Personally I would implement a kd-tree just because I've done it in the past in my algorithm class and it was fairly straightforward. Another viable alternative is an R-tree.
Do this once for each element in the smaller set. (Add one element from the smaller set to the larger one and run the algorithm to find its nearest neighbor.)
From this you should be able to get a list of nearest neighbors for each element.
While finding the pairs of nearest neighbors, keep them in a sorted data structure which has a fast addition method and a fast getMax method, such as a heap, sorted by Euclidean distance.
Then, once you're done simply ask the heap for the max.
The run time for this breaks down as follows:
N = size of smaller set
M = size of the larger set
N * O(log M + 1) for all the kd-tree nearest neighbor checks.
N * O(1) for calculating the Euclidean distance before adding it to the heap.
N * O(log N) for adding the pairs into the heap.
O(1) to get the final answer :D
So in the end the whole algorithm is O(N*log M).
If you don't care about the order of each pair you can save a bit of time and space by only keeping the max found so far.
*Disclaimer: This all assumes you won't be using an enormously high number of dimensions and that your elements follow a mostly random distribution.
The most obvious approach seems to me to be to build a tree structure on one set to allow you to search it relatively quickly. A kd-tree or similar would probably be appropriate for that.
Having done that, you walk over all the points in the other set and use the tree to find their nearest neighbour in the first set, keeping track of the maximum as you go.
It's nlog(n) to build the tree, and log(n) for one search so the whole thing should run in nlog(n).
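A rough C++ sketch of the kd-tree approach for the 3-space case, keeping only the maximum found so far rather than a heap. Everything named here (`Point3`, `build`, `nearest`, `farthestFromNearest`) is an assumption made for illustration. The tree is stored implicitly in the vector: the median of each range sits at its midpoint. Distances are compared in squared form to avoid square roots, and both sets are assumed to be non-empty.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

using Point3 = std::array<double, 3>;  // hypothetical 3-space point type

static double sqDist(const Point3& a, const Point3& b) {
    double s = 0;
    for (int k = 0; k < 3; ++k) s += (a[k] - b[k]) * (a[k] - b[k]);
    return s;
}

// Implicit kd-tree: the median of pts[lo, hi) on the current axis ends up at mid.
static void build(std::vector<Point3>& pts, std::size_t lo, std::size_t hi, int axis) {
    if (hi - lo < 2) return;
    std::size_t mid = lo + (hi - lo) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
        [axis](const Point3& a, const Point3& b) { return a[axis] < b[axis]; });
    build(pts, lo, mid, (axis + 1) % 3);
    build(pts, mid + 1, hi, (axis + 1) % 3);
}

// Lowers `best` to the squared distance from q to its nearest point in pts[lo, hi).
static void nearest(const std::vector<Point3>& pts, std::size_t lo, std::size_t hi,
                    int axis, const Point3& q, double& best) {
    if (lo >= hi) return;
    std::size_t mid = lo + (hi - lo) / 2;
    best = std::min(best, sqDist(pts[mid], q));
    double delta = q[axis] - pts[mid][axis];
    int next = (axis + 1) % 3;
    // Search the side containing q first; visit the other side only if the
    // splitting plane is closer than the best match so far.
    if (delta < 0) {
        nearest(pts, lo, mid, next, q, best);
        if (delta * delta < best) nearest(pts, mid + 1, hi, next, q, best);
    } else {
        nearest(pts, mid + 1, hi, next, q, best);
        if (delta * delta < best) nearest(pts, lo, mid, next, q, best);
    }
}

// Returns the point of `a` whose nearest neighbour in `b` is farthest away.
Point3 farthestFromNearest(const std::vector<Point3>& a, std::vector<Point3> b) {
    build(b, 0, b.size(), 0);
    Point3 result{};
    double worst = -1;
    for (const Point3& p : a) {
        double best = std::numeric_limits<double>::infinity();
        nearest(b, 0, b.size(), 0, p, best);
        if (best > worst) { worst = best; result = p; }
    }
    return result;
}
```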
To make things more efficient, consider using a Pigeonhole algorithm - group the points in your reference set (your colorTable) by their location in n-space. This allows you to efficiently find the nearest neighbour without having to iterate all the points.
For example, if you were working in 2-space, divide your plane into a 5 x 5 grid, giving 25 squares, with 25 groups of points.
In 3 space, divide your cube into a 5 x 5 x 5 grid, giving 125 cubes, each with a set of points.
Then, to test point n, find the square/cube/group that contains n and test distance to those points. You only need to test points from neighbouring groups if point n is closer to the edge than to the nearest neighbour in the group.
For each point in set B, find the distance to its nearest neighbor in set A.
To find the distance to each nearest neighbor, you can use a kd-tree as long as the number of dimensions is reasonable, there aren't too many points, and you will be doing many queries - otherwise it will be too expensive to build the tree to be worthwhile.
Maybe I'm misunderstanding the question, but wouldn't it be easiest to just reverse the sign on all the coordinates in one data set (i.e. multiply one set of coordinates by -1), then find the first nearest neighbour (which would be the farthest neighbour)? You can use your favourite knn algorithm with k=1.
EDIT: I meant nlog(n) where n is the sum of the sizes of both sets.
In the 1-space case you could do something like this (pseudocode).
Use a structure like this:
Struct Item {
    int value
    int setid
}
(1) Max Distance = 0
(2) Read all the sets into Item structures
(3) Create an Array of pointers to all the Items
(4) Sort the array of pointers by the Item->value field of the structure
(5) Walk the array from beginning to end; for each Item of the first set, find the closest Item with a different Item->setid before it and after it in the array
    take the smaller of the two distances as this Item's nearest-neighbor distance
    check if this distance is greater than Max Distance; if so, set Max Distance to this distance
Return the max distance.
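For the 1-space case, a short C++ sketch under stated assumptions: the function name `farthestFromNearest1D` is made up, both sets are non-empty, and a binary search into the sorted second set is used in place of a walk over one merged array. For each element of the first set it checks the closest value on both sides, which matters for a case like 8 in the question's example, whose nearest neighbour 5 is not adjacent to it in merged order.

```cpp
#include <algorithm>
#include <iterator>
#include <limits>
#include <vector>

// Returns the element of `a` whose nearest neighbour in `b` is farthest away.
// Assumes both vectors are non-empty.
int farthestFromNearest1D(const std::vector<int>& a, std::vector<int> b) {
    std::sort(b.begin(), b.end());
    int result = a.front();
    int worst = -1;
    for (int v : a) {
        auto it = std::lower_bound(b.begin(), b.end(), v);  // first element >= v
        int nearest = std::numeric_limits<int>::max();
        if (it != b.end()) nearest = *it - v;                // neighbour at or above v
        if (it != b.begin())                                 // neighbour below v
            nearest = std::min(nearest, v - *std::prev(it));
        if (nearest > worst) { worst = nearest; result = v; }
    }
    return result;
}
```

Called with A:{2,4,6,8} and B:{1,3,5} from the question, this returns 8.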
