Closest Pair of Points in 3+ Dimensions (Divide and Conquer) - algorithm

I am struggling to wrap my head around how the divide and conquer algorithm works for dimensions greater than 2, specifically how to find the closest pair of points between two sub-problems.
I know that I need to only consider points within a distance d of the division between the two on the x axis.
I know that in the 3d case I need to compare each point to only 15 others.
What I don't understand is how to choose those 15 points. In the 2d case, one simply sorts the values by y value and goes through them in order. In the 3d case, however, each point needs to be compared to the 15 points closest to it on both the y and z axes. I can't seem to find a way to determine what those 15 points are in a way that doesn't have a worst case of O(n^2)...
What am I missing here?

A simple solution is to create an octree or a k-d tree with all the points and then use it to find the nearest point for every point. That is O(N log N) for the average case.
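One concrete way to sketch that in Python is SciPy's cKDTree; the library choice and the function name are mine, not part of the answer, and it works in any dimension (assumes at least two points):

    import numpy as np
    from scipy.spatial import cKDTree

    def closest_pair_kdtree(points):
        """Closest pair via one nearest-neighbour query per point against a k-d tree."""
        pts = np.asarray(points, dtype=float)
        tree = cKDTree(pts)
        # k=2 because the nearest neighbour of each point is the point itself
        # (distance 0); the second column holds the true nearest neighbour.
        dists, idx = tree.query(pts, k=2)
        i = int(np.argmin(dists[:, 1]))
        return pts[i], pts[idx[i, 1]], dists[i, 1]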
A faster solution, which I think is O(N) for the average case, can be implemented based on the following idea:
If you partition the space in half (say by some axis aligned plane), you get the points divided in two subsets, A and B, and the two nearest points can be both in A, both in B or one in A and one in B.
So, you have to create a priority queue of pairs of 3d boxes, ordered by the minimum distance between the two boxes in each pair, and then:
1) Pick the first pair of boxes from the queue
2) If both boxes are the same box A, divide it in half in two boxes B and C and push the pairs (B, B), (C, C) and (B, C) into the queue.
3) If they are different boxes (A, B), divide the bigger one (for instance, B) in half, obtaining boxes C and D, and push the pairs (A, C) and (A, D) into the queue.
4) Repeat.
Also, when the number of points inside the pair of boxes goes below some threshold you may use brute force to find the nearest pair of points.
The search stops once the distance between the two boxes in the pair at the top is bigger than the minimal distance found so far.
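Here is a sketch of that box-pair scheme in Python, assuming at least two input points. The names, the median split along the longest axis, and the choice to split the box with more points (one reasonable reading of "the biggest") are my own; treat it as an illustration rather than a definitive implementation:

    import heapq
    from itertools import count
    from math import dist, inf, sqrt

    class Box:
        """Axis-aligned 3d box holding a subset of the points."""
        def __init__(self, points):
            self.points = points
            self.lo = tuple(min(p[i] for p in points) for i in range(3))
            self.hi = tuple(max(p[i] for p in points) for i in range(3))

        def split(self):
            """Split in half along the longest axis (median split)."""
            axis = max(range(3), key=lambda i: self.hi[i] - self.lo[i])
            pts = sorted(self.points, key=lambda p: p[axis])
            mid = len(pts) // 2
            return Box(pts[:mid]), Box(pts[mid:])

    def box_distance(a, b):
        """Minimum possible distance between a point of box a and a point of box b."""
        gaps = (max(0.0, a.lo[i] - b.hi[i], b.lo[i] - a.hi[i]) for i in range(3))
        return sqrt(sum(g * g for g in gaps))

    def brute_force(pts_a, pts_b, same):
        best, pair = inf, None
        for i, p in enumerate(pts_a):
            for q in (pts_a[i + 1:] if same else pts_b):
                d = dist(p, q)
                if d < best:
                    best, pair = d, (p, q)
        return best, pair

    def closest_pair(points, threshold=16):
        root = Box(list(points))
        tiebreak = count()                      # heap tie-breaker; boxes are not comparable
        queue = [(0.0, next(tiebreak), root, root)]
        best, best_pair = inf, None
        while queue:
            d, _, a, b = heapq.heappop(queue)
            if d >= best:                       # no remaining box pair can beat the best pair
                break
            same = a is b
            if len(a.points) + len(b.points) <= threshold:
                cand, pair = brute_force(a.points, b.points, same)
                if cand < best:
                    best, best_pair = cand, pair
            elif same:                          # step 2: split A into B and C
                b1, b2 = a.split()
                for x, y in ((b1, b1), (b2, b2), (b1, b2)):
                    heapq.heappush(queue, (box_distance(x, y), next(tiebreak), x, y))
            else:                               # step 3: split the box with more points
                big, small = (a, b) if len(a.points) >= len(b.points) else (b, a)
                c1, c2 = big.split()
                for x, y in ((small, c1), (small, c2)):
                    heapq.heappush(queue, (box_distance(x, y), next(tiebreak), x, y))
        return best_pair, best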

Related

How to find a unique set of closest pairs of points in 1D?

My question is very similar to this one:
How to find a unique set of closest pairs of points?
The only difference is that I am in 1D.
So, I have two sets of points (as I'm in 1D, we can see them as numbers between 0 and 1), A and B, containing m and n elements respectively, with m <= n.
My goal is to find the set C, made of m DISTINCT points of B, that minimizes the sum of the distances |A(i) - C(i)|.
If m = n, I can use the Wasserstein distance, which has a nice 1D implementation.
In 2D, I would use the Hungarian algorithm, but it's quite expensive, and I hope that there is a quicker solution in 1D.
Thanks
Thinking aloud:
It is an easy matter to find, for every point in A, the two nearest points in B, one on each side. For this it suffices to sort A and B increasingly, and by a merge-like process you find the predecessor and successor in B of every point of A.
The cost of this process is O(NA log NA) + O(NB log NB) + O(NA + NB), where the last term can be absorbed.
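A small sketch of that merge-like sweep in Python (the function name is mine):

    def nearest_neighbours(A, B):
        """For every point of A, find its predecessor and successor in B
        with a single merge-like sweep over the two sorted lists."""
        A_sorted, B_sorted = sorted(A), sorted(B)
        out, j = [], 0
        for a in A_sorted:
            while j < len(B_sorted) and B_sorted[j] < a:
                j += 1
            pred = B_sorted[j - 1] if j > 0 else None          # nearest B-point on the left
            succ = B_sorted[j] if j < len(B_sorted) else None  # nearest B-point on the right
            out.append((a, pred, succ))
        return out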
The smallest sum will be achieved by assigning every point its nearest neighbor, among the left and right ones.
So far so good, but unfortunately the nearest neighbor might also be the nearest to a neighboring point in A, and the conflict needs to be arbitrated (one of the A-points must be assigned its other B-neighbor). In the worst case, this can cause cascaded conflicts.
So far, I fear that this problem is global and I see no better way than to try and resolve the conflicts in all possible ways and keep the best configuration. This process is exponential in the length of the sequences of conflicting points.

Analytic geometry, ordering vertices of a triangle to capture the shortest and second shortest sides

If I have x and y coordinates for triangle corners A, B, and C, I want to know which of the six orderings of {A, B, C} put the shortest side of the triangle between the first two vertices in the ordering, and the second shortest side between the last two. I know how to solve this, but not in a way that isn't clumsy and inelegant and all around ugly. My favorite language is Ruby, but I respect all of them.
As the third side of a triangle cannot be deduced from the other two, you must compute the three distances.
As the three points may need to be permuted in one of six ways, you cannot work this out with a decision tree that has fewer than three levels (two levels can distinguish at most four cases).
Hence, compute the three distances and sort them increasingly using the same optimal decision tree as here: https://stackoverflow.com/a/22112521/1196549 (obviously their A, B, C correspond to your distances). For every leaf of the tree, determine what permutation of your points you must apply.
For instance, if you determine |AB|<|CA|<|BC|, you must swap A and B. Solve all six cases similarly.
Doing this you will obtain maximally efficient code.
If you are completely paranoid like I am, you can organize the decision tree in such a way that the cases that require a heavier permutation effort are detected in two tests rather than three.
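As an illustration, here is one way the six-leaf decision tree could look (in Python rather than Ruby, comparing squared lengths to avoid square roots; the function name is mine). The case |AB| < |CA| < |BC| indeed returns B, A, C:

    def order_vertices(A, B, C):
        """Return the vertices so that the shortest side joins the first two
        and the second shortest side joins the last two (assumes no ties)."""
        dAB = (A[0] - B[0]) ** 2 + (A[1] - B[1]) ** 2
        dBC = (B[0] - C[0]) ** 2 + (B[1] - C[1]) ** 2
        dCA = (C[0] - A[0]) ** 2 + (C[1] - A[1]) ** 2
        if dAB < dBC:
            if dBC < dCA:        # AB < BC < CA
                return A, B, C
            elif dAB < dCA:      # AB < CA < BC
                return B, A, C
            else:                # CA < AB < BC
                return C, A, B
        else:
            if dAB < dCA:        # BC < AB < CA
                return C, B, A
            elif dBC < dCA:      # BC < CA < AB
                return B, C, A
            else:                # CA < BC < AB
                return A, C, B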
Here's how I would do it: let's take a triangle with sides x, y, and z, such that l(x) <= l(y) <= l(z). Then, let x', y', and z' be the vertices opposite to x, y, and z, respectively.
Your output will be y', z', x' (if you draw out your triangle, you'll see that this is the order which achieves your requirement). So, the pseudocode looks like:
For points a, b, c each with some coordinates (x, y), calculate the length of the segment opposite to each point (e.g. for a this is segment bc)
Order a, b, c by the length of their opposing segment in the order of [2nd longest, longest, shortest]
Return
Does this make sense? The real work is computing the Euclidean distance of each opposing segment. If you get stuck, update your question with your code and I'm happy to help you work it out.
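A short sketch of that recipe (again Python for consistency; math.dist needs Python 3.8+, and the names are mine):

    import math

    def order_by_opposite_side(a, b, c):
        """Order the vertices by the length of the side opposite each one,
        in the order [2nd longest, longest, shortest]."""
        pts = (a, b, c)
        opposite = lambda i: math.dist(pts[(i + 1) % 3], pts[(i + 2) % 3])
        s, m, l = sorted(range(3), key=opposite)   # vertex indices whose opposite side is [shortest, middle, longest]
        return pts[m], pts[l], pts[s]

For example, a = (0, 0), b = (1, 0), c = (0, 3) returns b, a, c, which matches the decision-tree answer above.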

Finding weight value of isosceles right angled triangles

I want to find out the weight value of every isosceles right angled triangle inside an n*m rectangular grid. The weight value is the total obtained by adding up the point values in the rectangular grid. Let me explain through an example.
Here is the rectangular grid with n=1 and m=2. I want to find out the weight of each isosceles right angled triangle present in this grid. Here are the possible right angled isosceles triangles which can be formed from this grid.
So I want to find out the weight value of each of these triangles like triangle A has 4, B has 6.
I tried finding the total number of right angled triangles using "C Program to detect right angled triangles", but it is difficult to find each triangle's weight if I only know how many triangles there are. My approach for this problem was picking every point and finding the triangle associated with it and the corresponding weight value, but that takes about 4 times the number of points in the grid (4 times 2*3 in this case). I want to find an efficient formula so that I can perform this operation for large n and m as well. Any help would be appreciated.
Per the discussion in the comments, you're looking to enumerate all of the possible triangles and discover the sum of all of the points on the edges.
You can enumerate the triangles as follows. Given a point p = (p1, p2) and another point q = (q1, q2) there is exactly one right angled isosceles starting at p, going to q and turning right. The third vertex will be at r = (q1 + q2 - p2, q2 - q1 + p1). If you loop over all pairs of vertices, this will find every possible triangle exactly once.
Next we need the weight of each line segment. Given a line segment from p to q, first find the GCD of (q1 - p1, q2 - p2). (Special case: if one component is 0, the GCD is the absolute value of the other.) Then divide both components by that GCD to get the smallest vector along that line going from lattice point to lattice point. Let's call that smallest vector v. Now you can add up the weights for p, p+v, p+2v, ..., stopping at q. (Note: each edge should include one endpoint and not the other, so no vertex is counted twice.)
And there you go. The final algorithm should be O(n^2 m^2 log(n+m)). Which can't be improved much given that the number of right-angled isosceles triangles is O(n^2 m^2). If needed you could improve the log factor by making the weight of (starting point, unit vector, n) recursive then memoizing it. However that requires a O(n^2 m^2) data structure and locality problems addressing it could easily exceed the theoretical performance gain.
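A brute-force sketch of the enumeration described so far (Python; w is assumed to be an (n+1) x (m+1) array of point weights indexed as w[x][y], and all names are mine; the improved version below is not shown):

    from math import gcd

    def segment_weight(p, q, w):
        """Sum of grid point weights on segment p..q, including p, excluding q."""
        dx, dy = q[0] - p[0], q[1] - p[1]
        g = gcd(abs(dx), abs(dy))          # gcd(a, 0) == a, exactly what we need here
        vx, vy = dx // g, dy // g          # smallest lattice step along the segment
        x, y = p
        total = 0
        for _ in range(g):                 # g lattice points, q itself excluded
            total += w[x][y]
            x, y = x + vx, y + vy
        return total

    def triangle_weights(n, m, w):
        """Yield (p, q, r, weight) for every right angled isosceles triangle with
        lattice vertices inside the grid, each triangle found exactly once."""
        pts = [(x, y) for x in range(n + 1) for y in range(m + 1)]
        inside = lambda v: 0 <= v[0] <= n and 0 <= v[1] <= m
        for p in pts:
            for q in pts:
                if p == q:
                    continue
                # third vertex: start at p, go to q, turn right
                r = (q[0] + q[1] - p[1], q[1] - q[0] + p[0])
                if inside(r):
                    wt = (segment_weight(p, q, w) + segment_weight(q, r, w)
                          + segment_weight(r, p, w))
                    yield p, q, r, wt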
OK, improvement! Instead of iterating over pairs of points, iterate over starting vectors v = (v1, v2) with v1 and v2 relatively prime (check with the Euclidean algorithm), then over starting points p = (p1, p2), then over multiples k of the starting vector. The triangles that you are considering will be (p1, p2), (p1 + k*v1, p2 + k*v2), (p1 + k*v1 + k*v2, p2 - k*v1 + k*v2). And NOW, for each starting vector, each of the 3 edge directions you could be going, and each lattice line parallel to that direction, you can precompute the cumulative sum of the weights up to each point on that line. (An O(nm) data structure.) With that data structure the two inner loops can execute in time O(1) per triangle.
This gives you a O(n^2 m^2) algorithm to find the total weight of all O(n^2 * m^2) right-angled isosceles triangles. Which is as good as you could theoretically do. (And the auxiliary data structures required are O(nm).)

Tangents range for all pairs of points in a box

Suppose I have a box with a lot of points. I need to be able to calculate the min and max angles for all lines which go through all possible pairs of the points. I can do it in O(n^2) time by just pairing every point with all the others. But is there a faster algorithm?
Taking the idea of dual plane proposed by Evgeny Kluev, and my comment about finding left-most intersection point, I'll try to give an equivalent direct solution without any dual space.
The solution is simple: sort your points by (x, y) lexicographically. Now draw a line through each two adjacent points in the sorted order. It can be proved that the minimal angle is achieved by one of these lines. In order to get maximal angle, you need to sort by (x, -y) lexicographically, and also check only adjacent pairs of points.
Let's prove the idea for the min angle. Consider the two points A and B which yield the minimal possible angle. Among such pairs we can choose the one with the minimal difference of x coordinates.
Suppose they have the same y. If there is no other point between them, then they are adjacent. If there are points between them, then clearly at least one of them is adjacent to A in our order, and all of them yield the same angle.
Suppose there exists a point P with x coordinate strictly between A and B, i.e. Ax < Px < Bx. If P lies on AB, then AP has the same angle but a smaller difference of x coordinates, hence a contradiction. If P is not on AB, then either AP or PB gives a smaller angle, which is also a contradiction.
Now we have points A and B lying on two adjacent vertical lines, with no other points between these lines. If A and B are the only points on their vertical lines, then the pair AB is clearly adjacent in sorted order and QED. If there are many points on these lines, the minimal angle is obviously achieved by taking the highest point on the left vertical line (which must be A) and the lowest point on the right vertical line (which must be B). Since we sort points of equal x by y, these two points are also adjacent.
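A sketch of this adjacent-pairs idea in Python (at least two points assumed; angles are measured in (-90, 90] degrees, with vertical pairs treated as 90, and the names are mine):

    import math

    def slope_angle(p, q):
        """Angle of the line pq with the x axis, in (-90, 90] degrees."""
        dx, dy = q[0] - p[0], q[1] - p[1]
        return 90.0 if dx == 0 else math.degrees(math.atan(dy / dx))

    def min_max_angle(points):
        """Min and max line angle over all pairs, checking only adjacent pairs
        in the two lexicographic orders described above."""
        by_xy = sorted(points)                                    # sort by (x, y)
        by_x_neg_y = sorted(points, key=lambda p: (p[0], -p[1]))  # sort by (x, -y)
        lo = min(slope_angle(a, b) for a, b in zip(by_xy, by_xy[1:]))
        hi = max(slope_angle(a, b) for a, b in zip(by_x_neg_y, by_x_neg_y[1:]))
        return lo, hi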
Sort the points (or use a hash map) to find out if there are any horizontal lines.
Then solve this problem on the dual plane. There you only need to find the leftmost and the rightmost intersection points. Use binary searches to find a pair of horizontal coordinates such that all intersection points are between them. (You could quickly find approximate results just by continuing the binary searches from these coordinates.)
Then sort the lines according to their tangents on the dual plane, and for pairs of adjacent lines in this sorted order find the intersections closest to those horizontal coordinates. This does not guarantee good complexity in the worst case (when some lines on the primal plane are almost horizontal), but in most cases the time complexity would be determined by sorting: O(N log N) + O(binary_search_complexity).

Triangle partitioning

This was a problem in the 2010 Pacific ACM-ICPC contest. The gist of it is trying to find a way to partition a set of points inside a triangle into three subtriangles such that each partition contains exactly a third of the points.
Input:
Coordinates of a bounding triangle: (v1x,v1y),(v2x,v2y),(v3x,v3y)
A number 3n < 30000 representing the number of points lying inside the triangle
Coordinates of the 3n points: (x_i,y_i) for i=1...3n
Output:
A point (sx,sy) that splits the triangle into 3 subtriangles such that each subtriangle contains exactly n points.
The way the splitting point splits the bounding triangle into subtriangles is as follows: Draw a line from the splitting point to each of the three vertices. This will divide the triangle into 3 subtriangles.
We are guaranteed that such a point exists. Any such point will suffice (the answer is not necessarily unique).
Here is an example of the problem for n=2 (6 points). We are given the coordinates of each of the colored points and the coordinates of each vertex of the large triangle. The splitting point is circled in gray.
Can someone suggest an algorithm faster than O(n^2)?
Here's an O(n log n) algorithm. Let's assume no degeneracy.
The high-level idea is, given a triangle PQR,
   P
  C \
 / S\
R-----Q
we initially place the center point C at P. Slide C toward R until there are n points inside the triangle CPQ and one (S) on the segment CQ. Slide C toward Q until either triangle CRP is no longer deficient (perturb C and we're done) or CP hits a point. In the latter case, slide C away from P until either triangle CRP is no longer deficient (we're done) or CQ hits a point, in which case we begin sliding C toward Q again.
Clearly the implementation cannot “slide” points, so for each triangle involving C, for each vertex S of that triangle other than C, store the points inside the triangle in a binary search tree sorted by angle with S. These structures suffice to implement this kinetic algorithm.
I assert without proof that this algorithm is correct.
As for the running time, each event is a point-line intersection and can be handled in time O(log n). The angles PC and QC and RC are all monotonic, so each of O(1) lines hits each point at most once.
The main idea is: if we have got a line, we can try to find a point on it using linear search. If the line is not good enough, we can move it using binary search.
1) Sort the points based on the direction from vertex A. Sort them for B and C too.
2) Set the current range for vertex A to be all the points.
3) Select the 2 middle points from the range for vertex A. These 2 points define a subrange for A. Get some line AD lying between these points.
4) Iterate over the points lying between BA and AD (starting from BA). Stop when n points are found. Select the subrange of directions from B between point n and the next point after n (if there is no point after n, use BC). If fewer than n points can be found, set the current range for vertex A to the left half of the current range and go to step 3.
5) Same as step 4, but for vertex C.
6) If subranges A, B, C intersect, choose any point from there and finish. Otherwise, if A&B is closer to A, set the current range for vertex A to the right half of the current range and go to step 3. Otherwise set the current range for vertex A to the left half of the current range and go to step 3.
Complexity: sorting O(n * log n), search O(n * log n). (Combination of binary and linear search).
Here is an approach that takes O(log n) passes of cost n each.
Each pass starts with an initial point, which divides the triangle into three subtriangles. If each has n points, we are finished. If not, consider the subtriangle whose count is furthest away from the desired n. Suppose, just for now, that it has too many. The imbalances sum to zero, so at least one of the other two subtriangles has too few points. The third subtriangle either also has too few, or has exactly n points - otherwise the original subtriangle would not have the highest discrepancy.
Take the most imbalanced subtriangle and consider moving the centre point along the line leading away from it. As you do so, the imbalance of the most imbalanced subtriangle will decrease. For each point in the triangle, you can work out when that point crosses into or out of the most imbalanced subtriangle as you move the centre point. Therefore you can work out in time n where to move the centre point to give the most imbalanced subtriangle any desired count.
As you move the centre point you can choose whether points move into or out of the most imbalanced subtriangle, but you cannot choose which of the other two subtriangles they go to or come from - although you can easily predict which, from the side of the sliding line they lie on - so you can move the centre point along this line to get the lowest maximum discrepancy after the move. In the worst case, all of the points moved go into, or out of, the subtriangle that was exactly balanced. However, if the imbalanced subtriangle has n + k points, then by moving k/2 of them you can reach, at worst, the case where it and the previously balanced subtriangle are each out by k/2. The third subtriangle may still be unbalanced by up to k, in the other direction, but in that case a second pass will reduce the maximum imbalance to something below k/2.
Therefore, in the case of a large imbalance, we can reduce it by at worst a constant factor in two passes of the above algorithm, so in O(log n) passes the imbalance will be small enough that we are into special cases where we worry about an excess of at most one point. Here I am going to guess that the number of such special cases is practically enumerable in a program, and that the cost amounts to a small constant addition.
I think there is a linear time algorithm. See the last paragraph of the paper "Illumination by floodlights" by Steiger and Streinu. Their algorithm works for any k1, k2, k3 that sum up to n, so k1 = k2 = k3 = n/3 is a special case.
Here is a link to the article: http://www.sciencedirect.com/science/article/pii/S0925772197000278 and a CiteSeerX link: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.4634

Resources