Finding the closest pair using Divide and Conquer

I am currently assigned to create a C++ program to find the closest pair of points in an (x,y) coordinate system. However, I am having a lot of trouble trying to understand one thing.
Every tutorial/guide I have read about the closest pair problem tells me to sort the set of points by their Y coordinates, but I don't see what the point of this is. Can someone explain why we sort by Y coordinates, and what it is used for? I understand that we sort the points by X in order to get L and X*, but I just don't understand why we have to sort the points by Y coordinates as well.

You don't, but then your running time is not improved over O(n^2). The whole point is to compute as little as possible, examining as few points as possible and ignoring those you know cannot be part of the answer. Sorting by Y is what lets you do that: inside the vertical strip around the dividing line, each point only needs to be compared against its few nearest neighbours in Y order, not against every other point in the strip.
Here's a pretty good explanation I just googled: http://www.cs.mcgill.ca/~cs251/ClosestPair/ClosestPairDQ.html
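To make the role of the Y-sort concrete, here is a sketch of the whole divide-and-conquer algorithm (in Python rather than C++, purely for illustration; it assumes distinct points). The strip check at the end is where the Y-sorted order pays off:

```python
import math

def closest_pair(points):
    """O(n log n) divide-and-conquer closest pair (sketch; assumes distinct points)."""
    px = sorted(points)                      # sorted by x
    py = sorted(points, key=lambda p: p[1])  # sorted by y

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def solve(px, py):
        n = len(px)
        if n <= 3:  # brute force tiny cases
            best = float('inf')
            for i in range(n):
                for j in range(i + 1, n):
                    best = min(best, dist(px[i], px[j]))
            return best
        mid = n // 2
        mid_x = px[mid][0]
        left_px, right_px = px[:mid], px[mid:]
        left_set = set(left_px)
        left_py = [p for p in py if p in left_set]
        right_py = [p for p in py if p not in left_set]
        d = min(solve(left_px, left_py), solve(right_px, right_py))
        # Strip: points within d of the dividing line, already in y order.
        strip = [p for p in py if abs(p[0] - mid_x) < d]
        # Because the strip is y-sorted, each point needs to be checked
        # against at most the next 7 points before the y-gap alone
        # exceeds d -- this is the whole reason for sorting by Y.
        for i in range(len(strip)):
            for j in range(i + 1, min(i + 8, len(strip))):
                if strip[j][1] - strip[i][1] >= d:
                    break
                d = min(d, dist(strip[i], strip[j]))
        return d

    return solve(px, py)
```

Without the Y-sort, the strip step would have to compare every pair inside the strip, and the recurrence would degrade back toward O(n^2).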

Related

Dividing a point cloud into equal-size sub-clouds

I want to find an algorithm that solves the following problem.
Suppose we have a point cloud with N points of dimension m. We want to divide the point cloud into sub-clouds, where every sub-cloud has size greater than or equal to k, and we want to minimize the following:
each sub-cloud's size is as close as possible to k;
the distances between points within each sub-cloud.
Any direction toward a solution would be great, and an implementation in Python would be appreciated.
Have you thought about using the k-means clustering algorithm?
I know it's not a perfect solution, as you still need to handle the minimum-size condition, but it's a good direction.
To solve the issue I would:
Choose the number of clusters to be roughly N / (desired sub-cloud size, what you called k). I think that gives the best chance of success.
For each sub-cloud returned by the algorithm that is smaller than the wanted size, add its points to the closest sub-cloud that has been created.
Hope it helped you somehow!
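The suggestion above can be sketched roughly as follows. This is a pure-Python Lloyd's iteration followed by the merge step for undersized clusters; the function name and parameters are illustrative, and for real data you would more likely use scikit-learn's KMeans:

```python
import random

def kmeans_min_size(points, target_size, iters=50, seed=0):
    """Sketch: plain k-means with k ~ N / target_size, then fold clusters
    smaller than target_size into the nearest big-enough cluster."""
    rng = random.Random(seed)
    k = max(1, len(points) // target_size)
    centers = rng.sample(points, k)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iters):
        # Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[i].append(p)
        # Recompute centers as cluster centroids.
        new_centers = []
        for i, c in enumerate(clusters):
            if c:
                new_centers.append(tuple(sum(col) / len(c) for col in zip(*c)))
            else:
                new_centers.append(centers[i])
        if new_centers == centers:
            break
        centers = new_centers

    # Fold undersized clusters into the nearest cluster that is big enough.
    big = [c for c in clusters if len(c) >= target_size]
    small = [c for c in clusters if 0 < len(c) < target_size]
    if not big:  # degenerate case: merge everything into one sub-cloud
        return [sum(small, [])] if small else []
    for c in small:
        centroid = tuple(sum(col) / len(c) for col in zip(*c))
        nearest = min(big, key=lambda b: dist2(
            centroid, tuple(sum(col) / len(b) for col in zip(*b))))
        nearest.extend(c)
    return big
```

Every returned sub-cloud ends up with at least target_size points, though the "as close as possible to k" objective is only approximated.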

What's an O(n^2 log n) algorithm that decides if a distinct point set is completely triangulable? [duplicate]

What is the best algorithm to find whether any three points in a set of n points are collinear? Please also explain the complexity if it is not trivial.
Thanks
Bala
If you can come up with a better-than-O(N^2) algorithm, you can publish it!
This problem is 3SUM-hard, and whether there is a sub-quadratic algorithm (i.e. better than O(N^2)) for it is an open problem. Many common computational geometry problems (including yours) have been shown to be 3SUM-hard, and this class of problems is growing. Like NP-hardness, the concept of 3SUM-hardness has proven useful in establishing the 'toughness' of some problems.
For a proof that your problem is 3SUM-hard, refer to the excellent survey paper here: http://www.cs.mcgill.ca/~jking/papers/3sumhard.pdf
Your problem appears on page 3 (conveniently called 3-POINTS-ON-LINE) in the above mentioned paper.
So, the currently best known algorithm is O(N^2) and you already have it :-)
A simple O(d*N^2) time and space algorithm, where d is the dimensionality and N is the number of points (probably not optimal):
Create a bounding box around the set of points (make it big enough so there are no points on the boundary)
For each pair of points, compute the line passing through them.
For each line, compute its two collision points with the bounding box.
The two collision points determine the original line, so if any two lines match they will produce the same pair of collision points.
Use a hash set to determine if there are any duplicate collision point pairs.
There are 3 collinear points if and only if there were duplicates.
Another simple (maybe even trivial) solution which doesn't use a hash table, runs in O(n^2 log n) time, and uses O(n) space:
Let S be a set of points, we will describe an algorithm which finds out whether or not S contains some three collinear points.
For each point o in S do:
Pass a line L parallel to the x-axis through o.
Replace every point in S below L with its reflection across L. (For example, if L is the x-axis, (a,-x) for x>0 becomes (a,x) after the reflection.) Let the new set of points be S'.
The angle of each point p in S' is the angle that the segment po makes with the line L. Sort the points of S' by their angles.
Walk through the sorted points in S'. If there are two consecutive points which are collinear with o - return true.
If no collinear points were found in the loop - return false.
The loop runs n times, and each iteration performs n log n steps. It is not hard to prove that if there are three points on a line they will be found, and that nothing will be found otherwise.
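A closely related O(n^2 log n) formulation avoids floating-point angles entirely: for each anchor point, sort the directions to all other points as exact reduced integer vectors; a repeated direction means two points on the same line through the anchor, i.e. a collinear triple. This sketch (the function name is ours) assumes integer coordinates and distinct points:

```python
from math import gcd

def has_collinear_triple(points):
    """O(n^2 log n) collinearity test via exact slope comparison.
    Assumes distinct points with integer coordinates."""
    n = len(points)
    for i in range(n):
        ox, oy = points[i]
        dirs = []
        for j in range(n):
            if j == i:
                continue
            dx, dy = points[j][0] - ox, points[j][1] - oy
            g = gcd(abs(dx), abs(dy))          # g >= 1 since points are distinct
            dx, dy = dx // g, dy // g          # reduce to primitive direction
            if dx < 0 or (dx == 0 and dy < 0):  # canonical sign, so opposite
                dx, dy = -dx, -dy               # directions compare equal
            dirs.append((dx, dy))
        dirs.sort()
        # Two equal consecutive directions => two points collinear with o.
        for a, b in zip(dirs, dirs[1:]):
            if a == b:
                return True
    return False
```

Using reduced integer pairs instead of angles makes the comparison exact, which matters because near-collinear points can defeat floating-point angle sorting.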

Finding the horizontal cost (and vertical cost) of an element?

I was browsing Stack Overflow for a question about an algorithm to find the closest point to point A from a list of 2D points. I know the list must be sorted to get an optimal time, but I want something faster than O(N^2), which is what brute force would take.
I found an answer that seems appealing here: Given list of 2d points, find the point closest to all other points, but I don't fully understand the answer. I am wondering about the part where he begins explaining about the horizontal/vertical cost and from that point onward. Could someone provide me an example of what to do in the case of these (random) points?
point A: (20, 40.6)
List of Points[(-20,200),(12,47), (4,0), (-82,92), (40,15), (112, 97), (-203, 84)]
If you can provide an alternative method to the linked post that would also be fine. I know it has something to do with sorting the list, and probably taking out the extremes, but again I'm not sure what method(s) to use.
EDIT: I understand now that it is not the Euclidean distance that I am most interested in. Would a divide-and-conquer algorithm be the best bet here? I don't fully understand it yet, but it sounds like it solves what I want in O(N*log(N)). Would this approach be optimal, and if so, would someone mind breaking it down to the basics? I have been unable to follow it as most other sites describe it.
What you are trying to do is not possible if there is no structure in the list of points and they can truly be random. Assume you have an algorithm that runs faster than linear time; then there is some point B in your list that is never read by the algorithm. If I change B to another value, the algorithm necessarily runs the same way and returns the same result. Now, if the algorithm does not return a point of the list equal to A, I can set B = A; the correct solution to the problem would then be B (you can't get any closer than actually being the same point), and the algorithm would necessarily return a wrong result.
What the question you are referring to is trying to do is find a point A out of a list L such that the sum of all distances between A and every point in L is minimal. The algorithm described in the answer runs in time O(n*log(n)) (where n is the number of points in L). Note that n*log(n) grows faster than n, so it is actually slower than looking at every element.
Also, "distance" in that question does not refer to the Euclidean distance. Whereas you would normally define the distance between a point (x_1,y_1) and a second point (x_2,y_2) to be sqrt((x_2-x_1)^2+(y_2-y_1)^2), the question refers to the "taxicab distance" |x_2-x_1|+|y_2-y_1|, where | | denotes the absolute value.
Re: edit
If you just want to find one point of the list that is closest to a fixed point A, then you can search for it linearly in the list. See the following Python code:
def distance(p, q):
    # Manhattan distance between two 2D points
    px, py = p
    qx, qy = q
    return abs(px - qx) + abs(py - qy)

def findClosest(a, listOfPoints):
    minDist = float("inf")
    minIndex = None
    for index, b in enumerate(listOfPoints):
        d = distance(a, b)
        if d < minDist:
            minDist = d
            minIndex = index
    return minDist, minIndex

a = (20, 40.6)
listOfPoints = [(-20, 200), (12, 47), (4, 0), (-82, 92), (40, 15), (112, 97), (-203, 84)]
minDist, minIndex = findClosest(a, listOfPoints)
print("minDist:", minDist)
print("minIndex:", minIndex)
print("closest point:", listOfPoints[minIndex])
The challenge of the referenced question is that you don't want to minimize the distance to a fixed point, but that you want to find a point A out of the list L whose average distance to all other points in L is minimal.

Is there an efficient algorithm to generate random points in general position in the plane?

I need to generate n random points in general position in the plane, i.e. no three points may lie on the same line. The points should have integer coordinates and lie inside a fixed m x m square. What would be the best algorithm to solve such a problem?
Update: square is aligned with the axes.
Since they're integers within a square, treat them as points in a bitmap. When you add a point after the first, use Bresenham's algorithm to paint all pixels on each of the lines going through the new point and one of the old ones. When you need to add a new point, pick a random location and check whether it's clear; otherwise, try again. Since each pair of pixels defines a new line, and thus excludes up to m-2 other pixels, you will see more random choices rejected as the number of points grows before you find a good one. The advantage of this approach is that you only pay the cost of walking all the lines when a point is accepted, while rejecting a bad one is a very quick test.
(if you want to use a different definition of line, just replace Bresenham's with the appropriate algorithm)
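The bitmap idea above can be sketched as follows. One deviation, in the spirit of that last parenthetical: instead of rasterising with Bresenham (which paints pixels near the line), this sketch blocks exactly the lattice points lying on each line through two accepted points, since those are the only cells that would actually create a collinear triple. The function name is illustrative:

```python
import random
from math import gcd

def general_position_points(n, m, seed=0):
    """Rejection sampling of n integer points in general position in an
    m x m grid. Illustrative only: it may loop for a long time (or forever)
    if n is too large for the grid."""
    rng = random.Random(seed)
    chosen = []
    blocked = set()
    while len(chosen) < n:
        p = (rng.randrange(m), rng.randrange(m))
        if p in blocked or p in chosen:
            continue  # cheap rejection test
        # Accept p, then block every lattice point on each line (p, q).
        for q in chosen:
            dx, dy = q[0] - p[0], q[1] - p[1]
            g = gcd(abs(dx), abs(dy))
            dx, dy = dx // g, dy // g      # primitive step along the line
            for s in (1, -1):              # walk both directions from p
                x, y = p
                while 0 <= x < m and 0 <= y < m:
                    blocked.add((x, y))
                    x += s * dx
                    y += s * dy
        chosen.append(p)
    return chosen
```

By construction, every accepted point avoids every line through two earlier points, so no three chosen points can be collinear.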
Can't see any way around checking each point as you add it, either by (a) running through all of the possible lines it could be on, or (b) eliminating conflicting points as you go along to reduce the possible locations for the next point. Of the two, (b) seems like it could give you better performance.
Similar to #LaC's answer. If memory is not a problem, you could do it like this:
Add all points on the plane to a list (L).
Shuffle the list.
For each point (P) in the list,
For each point (Q) previously picked,
Remove from L every point that is collinear with P and Q.
Add P to the picked list.
You could continue the outer loop until you have enough points, or run out of them.
This might just work (though might be a little constrained on being random). Find the largest circle you can draw within the square (this seems very doable). Pick any n points on the circle, no three will ever be collinear :-).
This should be an easy enough task in code. Say the circle is centered at origin (so something of the form x^2 + y^2 = r^2). Assuming r is fixed and x randomly generated, you can solve to find y coordinates. This gives you two points on the circle for every x which are diametrically opposite. Hope this helps.
Edit: Oh, integer points; I just noticed that. That's a pity. I'm going to keep this solution up though, since I like the idea.
Both #LaC's and #MizardX's solutions are very interesting, but you can combine them to get an even better solution.
The problem with #LaC's solution is that random choices get rejected. The more points you have already generated, the harder it gets to generate new ones. If there is only one available position left, you have only a slight chance of randomly choosing it (1/(n*m)).
In #MizardX's solution you never get rejected choices, but if you directly implement the "remove every point from L which is collinear with P and Q" step you get worse complexity (O(n^5)).
Instead it would be better to use a bitmap to find which points from L are to be removed. For each point, the bitmap would hold either its index in the list L (if it is still free to use) or a marker that the point is already crossed out. This way you get a worst-case complexity of O(n^4), which is probably optimal.
EDIT:
I've just found that question: Generate Non-Degenerate Point Set in 2D - C++
It's very similar to this one, and it would be good to use the solution from the accepted answer there. Modifying it a bit to use radix or bucket sort, and initially adding all n^2 possible points to the set P and shuffling it, one can also get a worst-case complexity of O(n^4) with much simpler code. Moreover, if space is a problem and #LaC's solution is not feasible due to its space requirements, this algorithm will fit in without modifications and still offer decent complexity.
Here is a paper that can maybe solve your problem:
"POINT-SETS IN GENERAL POSITION WITH MANY
SIMILAR COPIES OF A PATTERN"
by BERNARDO M. ABREGO AND SILVIA FERNANDEZ-MERCHANT
Um, you don't specify which plane, but just generate three random numbers and assign them to x, y, and z.
If 'the plane' is arbitrary, then set z = 0 every time, or something similar.
Check x and y to see whether they lie within your m boundary,
and compare each new x,y pair against every earlier pair of points to see whether it lies on the same line as them; if it does, regenerate the random values.

Trilateration of a signal using Time Difference of Arrival

I am having some trouble finding or implementing an algorithm to locate a signal source. The objective of my work is to find the position of a sound emitter.
To accomplish this I am using three microphones. The technique I am using is multilateration, which is based on the time difference of arrival.
The time differences of arrival between the microphones are found using cross-correlation of the received signals.
I have already implemented the algorithm to find the time difference of arrival, but my problem is more with how multilateration works; it's unclear to me from my reference, and I couldn't find any other good reference that is free/open.
If you have some references on how I can implement a multilateration algorithm, or some other trilateration algorithm based on time difference of arrival, it would be a great help.
Thanks in advance.
The point you are looking for is the intersection of three hyperbolas. I am assuming 2D here since you only use 3 receptors. Technically, you can find a unique 3D solution but as you likely have noise, I assume that if you wanted a 3D result, you would have taken 4 microphones (or more).
The Wikipedia page makes some of the computations for you. They do it in 3D; you just have to set z = 0 and solve the system of equations (7).
The system is overdetermined, so you will want to solve it in the least squares sense (this is the point in using 3 receptors actually).
I can help you with multi-lateration in general.
Basically, if you want a solution in 3D, you need at least 4 points and 4 distances from them: 2 points give you a circle of candidate solutions (the intersection of two spheres), 3 points give you 2 possible solutions (the intersection of three spheres), so to have one unique solution you need 4 spheres. Once you have the points (4+) and the distances to them (there is an easy way to transform the TDOA into a set of equations involving plain length distances rather than times), you need a way to solve the set of equations. First you need a cost function (or solution-error function, as I call it), which would be something like
err(x,y,z) = sum(i=1..n) | sqrt[(x-xi)^2 + (y-yi)^2 + (z-zi)^2] - di |
where x, y, z are the coordinates of the current point in the numerical solution, and xi, yi, zi and di are the coordinates of, and distance to, the i-th reference point. To solve this, my advice is NOT to use Gauss-Newton or Newton methods. They need the first and second derivatives of the function above, and those have discontinuities at some points in space; it is not a smooth function, so these methods won't work. What will work is the direct-search family of optimization algorithms (finding minima and maxima; in our case, the minimum of the error/cost function).
That should help anyone wanting to find a solution for similar problem.
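The direct-search advice above can be sketched with a minimal compass (coordinate) search on that absolute-value cost function. The names anchors and dists are illustrative, not from the original post, and a real application would more likely reach for a library optimizer such as Nelder-Mead:

```python
import math

def multilaterate(anchors, dists, start=(0.0, 0.0, 0.0), step=10.0, tol=1e-6):
    """Derivative-free compass search minimizing
    err(x,y,z) = sum_i | dist((x,y,z), anchor_i) - d_i |."""
    def err(p):
        return sum(abs(math.dist(p, a) - d) for a, d in zip(anchors, dists))

    p = list(start)
    best = err(p)
    while step > tol:
        improved = False
        # Probe +/- step along each coordinate axis, moving greedily.
        for axis in range(3):
            for sign in (1.0, -1.0):
                q = list(p)
                q[axis] += sign * step
                e = err(q)
                if e < best:
                    p, best, improved = q, e, True
        if not improved:
            step /= 2.0  # shrink the pattern when no axis move helps
    return tuple(p), best
```

No derivatives are evaluated anywhere, which is exactly why direct search tolerates the kinks in the absolute-value cost that break Newton-type methods.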
