Find all collinear points in a given set - algorithm

This is an interview question: "Find all collinear points in a given set".
As I understand it, they ask to print out the points that lie on the same line (any two points are trivially collinear). I would suggest the following.
Let's introduce two types: Line (a pair of doubles) and Point (a pair of integers).
Create a multimap: HashMap<Line, List<Point>>.
Loop over all pairs of points, and for each pair: calculate the Line connecting the points and add the line with those points to the multimap.
Finally, the multimap contains the lines as keys and a list of collinear points for each line as its value.
The complexity is O(N^2). Does it make sense? Are there better solutions?
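A minimal C++ sketch of this grouping idea (the names are illustrative, not from the question; the Line key is the (slope, intercept) pair, which needs special handling for vertical lines and is fragile under floating-point comparison, so a production version would normalize an integer representation of A*x + B*y + C = 0 instead):

#include <limits>
#include <map>
#include <set>
#include <utility>
#include <vector>
using namespace std;

typedef pair<int, int> Point;      // (x, y)
typedef pair<double, double> Line; // (slope, intercept)

map<Line, set<Point>> groupByLine(const vector<Point>& pts) {
    map<Line, set<Point>> lines;
    for (size_t i = 0; i < pts.size(); ++i) {
        for (size_t j = i + 1; j < pts.size(); ++j) {
            double dx = pts[j].first - pts[i].first;
            double dy = pts[j].second - pts[i].second;
            Line key;
            if (dx == 0) {
                // vertical line: key it by (infinity, x)
                key = Line(numeric_limits<double>::infinity(), pts[i].first);
            } else {
                double slope = dy / dx;
                key = Line(slope, pts[i].second - slope * pts[i].first);
            }
            lines[key].insert(pts[i]);
            lines[key].insert(pts[j]);
        }
    }
    return lines; // every value with 3 or more points is a group of collinear points
}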

"Collinear" doesn't really mean anything here unless you fix 2 points to begin with. So "find all collinear points in a given set" doesn't make much sense in my opinion, unless you fix 2 points and test the others to see whether they're collinear with them.
Maybe a better question is, what is the maximum number of collinear points in the given set? In that case, you could fix on 2 points (just use 2 loops), then loop over the other points and check that the slope matches between the fixed points and the other points. You could use something like this for that check, assuming the coordinates are integer (you could change parameter types to double otherwise).
// ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
// Returns whether 3 points are collinear
// ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
bool collinear(int x1, int y1, int x2, int y2, int x3, int y3) {
return (y1 - y2) * (x1 - x3) == (y1 - y3) * (x1 - x2);
}
So the logic becomes:
int best = 2;
for (int i = 0; i < numPoints; ++i) {
    for (int j = i + 1; j < numPoints; ++j) {
        int count = 2;
        for (int k = 0; k < numPoints; ++k) {
            if (k == i || k == j)
                continue;
            // check that points i, j and k are collinear (use the function above);
            // if so, increment count (assuming x[] and y[] hold the point coordinates)
            if (collinear(x[i], y[i], x[j], y[j], x[k], y[k]))
                ++count;
        }
        best = max(best, count);
    }
}

Solve the problem in the dual plane: convert all the points into lines. Then run a plane sweep to report all the intersections. Lines in the dual plane that pass through the same point correspond to collinear points in the primal plane. The plane sweep (Bentley-Ottmann) reports all k intersections in O((n + k) log n) time.

Are you sure your analysis of the runtime is correct? You say to loop over all the pairs of points, of which there are n*(n-1)/2, i.e. O(n^2). Then you add the line and the pair of points to the map. However, I don't think the time to add such a line + point combination is constant, so overall your time is O(n^2 log n), not a constant times n^2, which is what O(n^2) means.
So the real question would be: given that it can be done in time O(n^2 log n), can it be done in time O(n^2)? Clearly Ω(n^2) is a lower bound, as you must at the very least print out every pair of points, of which there are Θ(n^2). My feeling is that this is essentially a general sorting problem and that one cannot expect better than O(n^2 log n) in general; however, a full proof of that fact may be non-trivial.
Another thing to beware of is that the set may contain zero or one points, and so you will need to check that your algorithm handles these cases correctly, otherwise write special cases for those.

Related

Given N points in a 2D plane, determine if there is a line that divides them into two sets of N / 2 points each + some more rules

Given N points on a 2D plane, determine if there is a line that divides them into two sets of N / 2 points each.
There are two more rules:
The sum of the distances of each set of points to this line should be the same.
The line can't pass through any of the points.
Extras (not sure if helps):
We can assume that N is large (~100k); -2000 <= x[i], y[i] <= 2000
Do you folks have any insights into this problem? I really tried many things, but I believe that I should use some sort of equality, or prove something like: sum(distancesSet1[i]) = sum(distancesSet2[i]).
If you want, I can also post here the stuff that I tried and failed (or I think it failed), but before I'd like to see your suggestions.
Thank you so much!
Edit:
What I need to know for this problem is to exactly say whether it's possible or not given the set of N points.
Update: This was an attempt to answer the initial, more general question of whether it was possible to divide the points or not.
The problem as defined by your constraints is mathematically unsolvable. You can't guarantee that the sums of the distances will be equal for both sets.
All you need as proof is a counterexample:
S = [[-1000,0], [0,0], [1,0], [2,0]]
There is only one way a dividing line can split these points into two sets of two:
S1 = [[-1000,0], [0,0]]
S2 = [[1,0], [2,0]]
All points are on a line L1. Given your bullet #2, we can conclude that any line L2 that separates those points will form an angle t with L1. The sums of the distances are then:
sum1 = a*sin(t) :: 1000 < a < 1002
sum2 = b*sin(t) :: 1 < b < 3
t != 0
sum1 > sum2
QED
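(For a concrete instance, take L2 to be the vertical line x = 1/2, so sin(t) = 1: the distance sums are 1000.5 + 0.5 = 1001 for S1 and 0.5 + 1.5 = 2 for S2.)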

Algorithm to find if triangles formed by set of points contains origin or not and give total count as well?

Input: S = {p1, . . . , pn}, n points on 2D plane each point is given by its x and y-coordinate.
For simplicity, we assume:
The origin (0, 0) is NOT in S.
Any line L passing through (0, 0) contains at most one point in S.
No three points in S lie on the same line.
If we pick any three points from S, we can form a triangle. So the total number of triangles that can be formed this way is Θ(n^3).
Some of these triangles contain (0, 0), some do not.
Problem: Calculate the number of triangles that contain (0, 0).
You may assume we have an O(1) time function Test(pi, pj , pk) that, given three points pi, pj , pk in S, returns 1, if the triangle formed by {pi, pj , pk} contains (0, 0), and returns 0 otherwise. It’s trivial to solve the problem in Θ(n^3) time (just enumerate and test all triangles).
Describe an algorithm for solving this problem with O(n log n) run time.
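For reference, one way such a constant-time Test could be implemented (a sketch, not part of the original problem statement): the triangle contains the origin iff the origin lies on the same side of all three directed edges, and the problem's assumptions rule out the origin lying exactly on an edge.

struct Pt { double x, y; };

// Sign of the cross product (b - a) x (O - a), where O is the origin.
int sideOfOrigin(Pt a, Pt b) {
    double v = (b.x - a.x) * (-a.y) - (b.y - a.y) * (-a.x);
    return (v > 0) - (v < 0);
}

// Returns 1 if the triangle (p, q, r) contains (0, 0), otherwise 0.
int Test(Pt p, Pt q, Pt r) {
    int s1 = sideOfOrigin(p, q);
    int s2 = sideOfOrigin(q, r);
    int s3 = sideOfOrigin(r, p);
    return (s1 == s2 && s2 == s3) ? 1 : 0;
}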
My analysis of the above problem leads to the following conclusion
There are 4 quadrants: (+,+), (+,-), (-,-), (-,+) (i.e. whether the x and y coordinates are > 0 or not).
Let
s1 = x < 0, y > 0
s2 = x > 0, y > 0
s3 = x < 0, y < 0
s4 = x > 0, y < 0
Now we only need to test triples of points drawn from the following combinations of sets:
S1 S2 S3
S1 S1 S4
S2 S2 S3
S3 S3 S2
S1 S4 S4
S1 S3 S4
S1 S2 S4
S2 S3 S4
I now need to test only the points in the above combinations of sets (e.g. one point from s1, one point from s2 and one point from s3 for the first combination) and see whether the points contain (0,0) by calling the Test function (which is assumed to be a constant-time function here).
Can someone guide me on this ?
Image added below for clarification on why only some combinations (s1, s2, s4) can contain (0,0) and some (s1, s1, s3) cannot.
I'm guessing we're in the same class (based on the strange wording of the question), so now that the due date is past, I feel alright giving out my solution. I managed to find the n log n algorithm, which, as the question stated, is more a matter of cleverly transforming the problem, and less of a Dynamic Programming / DaC solution.
Note: This is not an exhaustive proof, I leave that to you.
First, some visual observations. Take some triangle that obviously contains the origin.
Then, convert the points to vectors.
Convince yourself that any selection of three points, one from each vector, describes a triangle that also contains the origin.
It also follows that, if you perform the above steps on a triangle that doesn't enclose the origin, any combination of points along those vectors will also not contain the origin.
The main point to get from this is, the magnitude of the vector does not matter, only the direction. Additionally, a hint to the question says that "any line crossing (0,0) only contains one point in S", from which we can extrapolate that the direction of each vector is unique.
So, if only the angle matters, it would follow that there is some logic that determines what range of points, given two points, could possibly form a triangle that encloses the origin. For simplicity, we'll assume we've taken all the points in S and converted them to vectors, then normalized them, effectively making all points lie on the unit circle.
So, take two points along this circle.
Then, draw a line from each point through the origin and to the opposite side of the circle.
It follows that, given the two points, any point that lies along the red arc can form a triangle that encloses the origin.
So our algorithm should do the following:
Take each point in S. Make a secondary array A, and for each point, add its angle along the unit circle (atan2(x, y), mapped into the range [0, 2π)) to A. Let's assume this is O(n).
Sort A in increasing order: O(n log n), assuming we use Merge Sort.
Count the number of triangles possible for each pair (Ai, Aj). This means we count the number of Ak with Ai + π ≤ Ak ≤ Aj + π. Since the array is sorted, we can use a Binary Search to find the indices of Ai + π and Aj + π, which is O(2 log n) = O(log n).
However, we run into a problem, there are n^2 points, and if we have to do an O(log n) search for each, we have O(n^2 log n). So, we need to make one more observation.
Given some Ai < Aj, we'll say Tij describes the number of triangles possible, as calculated by the above method. Then, given a third Ak > Aj, we know that Tij ≤ Tik, as the number of points between Ai + π and Ak + π must be at least as many as there are between Ai + π and Aj + π. In fact, it is exactly the count between Ai + π and Aj + π, plus the count between Aj + π and Ak + π. Since we already know the count between Ai + π and Aj + π, we don't need to recalculate it - we only need to calculate the number between Aj + π and Ak + π, then add the previous count. It follows that:
A(n) = count(A(n),A(n-1)) + count(A(n-1),A(n-2)) + ... + count(A(1),A(0))
And this means we don't need to check all n^2 pairs, we only need to check consecutive pairs - so, only n-1.
So, all of the above gives us the following pseudocode solution.
int triangleCount(point P[], int n)
    double A[n];
    int C[n], totalCount = 0;
    for (i = 0...n-1)
        A[i] = atan2(P[i].x, P[i].y);
    mergeSort(A);
    int midPoint = binarySearch(A, π);
    for (i = 0...midPoint-1)
        double left = A[i] + π, right = A[i+1] + π;
        C[i] = binarySearch(A, right) - binarySearch(A, left);
        for (j = 0...i)
            totalCount += C[j];
    return totalCount;
It seems that in the worst case there are Θ(n^3) triangles containing the origin, and since you need them all, the answer is no, there is no better algorithm.
For a worst case, consider a regular polygon with an odd number of vertices n, centered at the origin.
Here is an outline of the calculations. A chord connecting two vertices which are k < n/2 vertices apart is a base for Θ(k) triangles. Fix a vertex; its contribution is a sum over all chords coming from it, yielding Θ(n^2), and the total (the contribution of all n vertices) is Θ(n^3) (each triangle is counted 3 times, which doesn't affect the asymptotics).

Algorithm: Distance transform - any faster algorithm?

I'm trying to solve the distance transform problem (using Manhattan distance). Basically, given a matrix of 0's and 1's, the program must assign to every position its distance to the nearest 1. For example, for this one
0000
0100
0000
0000
distance transform matrix is
2123
1012
2123
3234
Possible solutions off the top of my head:
Slowest ones (slowest because I have tried to implement them - they were lagging on very big matrices):
Brute force - for every 1 that the program reads, change distances accordingly from beginning to end.
Breadth-first search from 0's - for every 0, the program looks for the nearest 1 inside out.
Same as 2, but starting from 1's: mark every distance inside out.
Much faster (read from other people's code)
Breadth-first search from 1's
1. Assign all values in the distance matrix to -1 or very big value.
2. While reading matrix, put all positions of 1's into queue.
3. While queue is not empty
a. Dequeue position - let it be x
b. For each position around x (that has distance 1 from it)
if position is valid (does not exceed matrix dimensions) then
if distance is not initialized or is greater than (distance of x) + 1 then
I. distance = (distance of x) + 1
II. enqueue position into queue
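A compact C++ version of this multi-source BFS might look like this (a sketch; the grid representation and function name are illustrative, not from the original post):

#include <queue>
#include <utility>
#include <vector>
using namespace std;

// dist[r][c] = Manhattan distance from (r, c) to the nearest 1 in grid.
vector<vector<int>> distanceTransform(const vector<vector<int>>& grid) {
    int n = grid.size(), m = grid[0].size();
    vector<vector<int>> dist(n, vector<int>(m, -1));   // -1 = not initialized
    queue<pair<int, int>> q;
    for (int r = 0; r < n; ++r)                        // steps 1-2: enqueue all 1's
        for (int c = 0; c < m; ++c)
            if (grid[r][c] == 1) { dist[r][c] = 0; q.push(make_pair(r, c)); }
    const int dr[4] = {1, -1, 0, 0}, dc[4] = {0, 0, 1, -1};
    while (!q.empty()) {                               // step 3: expand layer by layer
        int r = q.front().first, c = q.front().second;
        q.pop();
        for (int d = 0; d < 4; ++d) {
            int nr = r + dr[d], nc = c + dc[d];
            if (nr >= 0 && nr < n && nc >= 0 && nc < m && dist[nr][nc] == -1) {
                dist[nr][nc] = dist[r][c] + 1;         // first visit is already the shortest
                q.push(make_pair(nr, nc));
            }
        }
    }
    return dist;
}

Because BFS explores cells in order of increasing distance, the "greater than" test in step 3.b simplifies to a plain "not yet visited" check.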
I wanted to ask if there is a faster solution to this problem. I tried to search for distance transform algorithms, but most of them deal with Euclidean distances.
Thanks in advance.
The breadth first search would perform Θ(n*m) operations where n and m are the width and height of your matrix.
You need to output Θ(n*m) numbers, so you can't get any faster than that from a theoretical point of view.
I'm assuming you are not interested in going towards discussions involving cache and such optimizations.
Note that this solution works in more interesting cases. For example, imagine the same question, but there could be different "sources":
00000
01000
00000
00000
00010
Using BFS, you will get the following distance-to-closest-source in the same time complexity:
21234
10123
21223
32212
32101
However, with a single source, there is another solution that might have a slightly better performance in practice (even though the complexity is still the same).
Before, let's observe the following property.
Property: If the source is at (a, b), then a point (x, y) has the following Manhattan distance:
d(x, y) = abs(x - a) + abs(y - b)
This should be quite easy to prove. So another algorithm would be:
for r in rows
    for c in cols
        d(r, c) = abs(r - a) + abs(c - b)
which is very short and easy.
Unless you write and test it, there is no easy way of comparing the two algorithms. Assuming an efficient bounded queue implementation (with an array), you have the following major operations per cell:
BFS: queue insertion/deletion, visit of each node 5 times (four times by neighbors, and one time out of the queue)
Direct formula: two subtractions and two ifs
It would really depend on the compiler and its optimizations as well as the specific CPU and memory architecture to say which would perform better.
That said, I'd advise for going with whichever seems simpler to you. Note however that with multiple sources, in the second solution you would need multiple passes on the array (or multiple distance calculations in one pass) and that would definitely have a worse performance than BFS for a large enough number of sources.
You don't need a queue or anything like that at all. Notice that if (i,j) is at distance d from (k,l), one way to realise that distance is to go left or right |i-k| times and then up or down |j-l| times.
So, initialise your matrix with big numbers and stick a zero everywhere you have a 1 in your input. Now do something like this:
for (i = 0; i < sx-1; i++) {
    for (j = 0; j < sy-1; j++) {
        dist[i+1][j] = min(dist[i+1][j], dist[i][j]+1);
        dist[i][j+1] = min(dist[i][j+1], dist[i][j]+1);
    }
    dist[i+1][sy-1] = min(dist[i+1][sy-1], dist[i][sy-1]+1); /* last column: propagate down */
}
for (j = 0; j < sy-1; j++) {
    dist[sx-1][j+1] = min(dist[sx-1][j+1], dist[sx-1][j]+1); /* last row: propagate right */
}
At this point, you've found all of the shortest paths that involve only going down or right. If you do a similar thing for going up and left, dist[i][j] will give you the distance from (i, j) to the nearest 1 in your input matrix.
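For completeness, the mirrored pass (going up and left, run after the loops above) would look something like this:

for (i = sx-1; i > 0; i--) {
    for (j = sy-1; j > 0; j--) {
        dist[i-1][j] = min(dist[i-1][j], dist[i][j]+1);
        dist[i][j-1] = min(dist[i][j-1], dist[i][j]+1);
    }
    dist[i-1][0] = min(dist[i-1][0], dist[i][0]+1); /* first column: propagate up */
}
for (j = sy-1; j > 0; j--) {
    dist[0][j-1] = min(dist[0][j-1], dist[0][j]+1); /* first row: propagate left */
}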

Amount of points above some line

Consider some points on a 2D plane and a function f(x) = ax (i.e. y = ax + b with b = 0). Let's say a point is a 1x1 square.
Now we want to tell how many points are between the function f(x) and the y axis, as in the picture below.
Black points are valid, white ones are not. We also say a point is valid if it:
intersects with the y axis;
or with the function f(x);
or is between them.
As denoted in the picture:
How can we solve this, assuming that we don't remove any of the points and we don't add them? Is there any other approach than standard brute force?
If I am understanding this right the points are random and given to you by their coordinates, and the line is also given to you. If that is the case, there cannot be any a priori knowledge about any relationship between the points, so you'd have to go through them, in the order given, and compare their x coordinate with 0 and their y coordinate with f(x). If a point passes the check you increment the counter, otherwise you don't. The algorithm runs in O(n) time and I highly doubt you can do any better than that without some extra information about the points.
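A minimal C++ sketch of that single pass (assuming "valid" means x >= 0 and y >= a*x, i.e. on or right of the y axis and on or above the line; adjust the comparisons to whatever the picture actually requires):

#include <vector>
using namespace std;

struct Point { double x, y; };

int countValid(const vector<Point>& pts, double a) {
    int count = 0;
    for (size_t i = 0; i < pts.size(); ++i)
        if (pts[i].x >= 0 && pts[i].y >= a * pts[i].x)  // on the axis/line counts as valid
            ++count;
    return count;
}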
The question is quite unclear, but it appears from the comment "I mean find that a in f(x)=ax to have maximum points which are valid and their amount doesn't exceed some value X" that you want to find a such that N(a) = X, where by N(a) I mean the number of points right of the y axis and above the line y = ax; or, if no such a exists, find a such that m = N(a) < X and N(b) < m implies N(b) < X.
Here's an O(n*ln(n)) algorithm: For each point p, excluding any p below y=0, compute slope M_p as ratio of p's y and x coordinates, or DBL_MAX if x=0. Sort the M's into ascending order (this is the O(n*ln(n)) step), and call the sorted array S.
Now we will set up an array T such that when any X is given, S[T[X-1]] is a slope that will place X points on or above that slope:
S[n] = DBL_MAX;
for (k = 0, j = n-1; j >= 0; --j) {
    T[j] = k;
    do ++k; while (k <= n && S[k] == S[k-1]);
}
Thereafter, let any X be given. Let h = T[X-1]. If h<n then N(S[h]) <= X; if h==n, there are multiple points on the Y axis and no finite slope will work.
This algorithm uses time O(n*ln(n)) and space O(n) to preprocess a set of n first-quadrant points, and thereafter uses time O(1) to find an a for any given X, 0 < X <= n, such that N(a) = X, if such a exists, else returns a such that N(a) < X < N(b) if b>a, else returns DBL_MAX.

Efficient algorithm for finding spheres farthest apart in large collection

I've got a collection of 10000 - 100000 spheres, and I need to find the ones farthest apart.
One simple way to do this is to simply compare all the spheres to each other and store the biggest distance, but this feels like a real resource hog of an algorithm.
The Spheres are stored in the following way:
Sphere (float x, float y, float z, float radius);
The method Sphere::distanceTo(Sphere &s) returns the distance between the two center points of the spheres.
Example:
Sphere *spheres;
float biggestDistance = 0.0f;
for (int i = 0; i < nOfSpheres; i++) {
    for (int j = 0; j < nOfSpheres; j++) {
        float d = spheres[i].distanceTo(spheres[j]);
        if (d > biggestDistance) {
            biggestDistance = d;
        }
    }
}
What I'm looking for is an algorithm that somehow loops through all the possible combinations in a smarter way, if there is any.
The project is written in C++ (which it has to be), so any solutions that only work in languages other than C/C++ are of less interest.
The largest distance between any two points in a set S of points is called the diameter. Finding the diameter of a set of points is a well-known problem in computational geometry. In general, there are two steps here:
Find the three-dimensional convex hull composed of the center of each sphere -- say, using the quickhull implementation in CGAL.
Find the points on the hull that are farthest apart. (Two points on the interior of the hull cannot be part of the diameter, or otherwise they would be on the hull, which is a contradiction.)
With quickhull, you can do the first step in O(n log n) in the average case and O(n^2) worst-case running time. (In practice, quickhull significantly outperforms all other known algorithms.) It is possible to guarantee a better worst-case bound if you can guarantee certain properties about the ordering of the spheres, but that is a different topic.
The second step can be done in O(h log h), where h is the number of points on the hull. In the worst case, h = n (every point is on the hull), but that's pretty unlikely if you have thousands of random spheres. In general, h will be much smaller than n. Here's an overview of this method.
Could you perhaps store these spheres in a BSP Tree? If that's acceptable, then you could start by looking for nodes of the tree containing spheres which are furthest apart. Then you can continue down the tree until you get to individual spheres.
Your problem looks like something that could be solved using graphs. Since the distance from Sphere A to Sphere B is the same as the distance from Sphere B to Sphere A, you can minimize the number of comparisons you have to make.
I think what you're looking at here is known as an Adjacency List. You can build one up and then traverse it to find the longest distance.
Another approach will still give you O(n^2), but you can minimize the number of comparisons you have to make. You can store the result of your calculation in a hash table where the key is the name of the edge (so AB would hold the length from A to B). Before you perform your distance calculation, check to see if AB or BA already exists in the hash table.
EDIT
Using the adjacency-list method (which is basically a Breadth-First Search) you get O(b^d) or worst-case O(|E| + |V|) complexity.
Paul got my brain thinking and you can optimize a bit by changing
for (int j=0; j < nOfSpheres; j++)
to
for (int j=i+1; j < nOfSpheres; j++)
You don't need to compare sphere A to B AND B to A. This halves the number of comparisons, although the search is still O(n^2).
--- Addition -------
Another thing that makes this calculation expensive is the distanceTo calculations:
distance = sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)
That's a lot of math. You can trim that down by checking
(x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2 > maxdist^2
instead, which removes the sqrt until the end.
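Putting the two suggestions together (the j = i+1 loop and the squared-distance comparison), a rough sketch, assuming the Sphere coordinates are accessible as public members (names follow the question):

#include <cmath>

struct Sphere { float x, y, z, radius; };

// Still O(n^2), but each unordered pair is checked once and sqrt is taken only at the end.
float biggestDistance(const Sphere* spheres, int n) {
    float best2 = 0.0f;                    // largest squared distance seen so far
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            float dx = spheres[i].x - spheres[j].x;
            float dy = spheres[i].y - spheres[j].y;
            float dz = spheres[i].z - spheres[j].z;
            float d2 = dx*dx + dy*dy + dz*dz;
            if (d2 > best2) best2 = d2;
        }
    }
    return std::sqrt(best2);
}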
