Efficient algorithm for finding spheres farthest apart in a large collection

I've got a collection of 10000 - 100000 spheres, and I need to find the ones farthest apart.
One simple way is to compare all the spheres to each other and store the biggest distance, but this feels like a real resource hog of an algorithm.
The Spheres are stored in the following way:
Sphere (float x, float y, float z, float radius);
The method Sphere::distanceTo(Sphere &s) returns the distance between the two center points of the spheres.
Example:
Sphere *spheres;
float biggestDistance = 0;

for (int i = 0; i < nOfSpheres; i++) {
    for (int j = 0; j < nOfSpheres; j++) {
        if (spheres[i].distanceTo(spheres[j]) > biggestDistance) {
            biggestDistance = spheres[i].distanceTo(spheres[j]);
        }
    }
}
What I'm looking for is an algorithm that somehow loops through all the possible combinations in a smarter way, if there is any.
The project is written in C++ (which it has to be), so any solutions that only work in languages other than C/C++ are of less interest.

The largest distance between any two points in a set S of points is called the diameter. Finding the diameter of a set of points is a well-known problem in computational geometry. In general, there are two steps here:
Find the three-dimensional convex hull of the sphere centers -- say, using the quickhull implementation in CGAL.
Find the points on the hull that are farthest apart. (Two points on the interior of the hull cannot be part of the diameter, or otherwise they would be on the hull, which is a contradiction.)
With quickhull, you can do the first step in O(n log n) in the average case and O(n^2) worst-case running time. (In practice, quickhull significantly outperforms all other known algorithms.) It is possible to guarantee a better worst-case bound if you can guarantee certain properties about the ordering of the spheres, but that is a different topic.
The second step can be done in O(h log h), where h is the number of points on the hull. In the worst case, h = n (every point is on the hull), but that's pretty unlikely if you have thousands of random spheres. In general, h will be much smaller than n. Here's an overview of this method.
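As a rough sketch of those two steps (assuming CGAL is available, and assuming Sphere exposes its center as x, y, z members -- adapt to your actual class), you can build the hull of the centers and then simply brute-force the much smaller set of hull vertices; in practice that is usually fast enough even without the O(h log h) step:

#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Polyhedron_3.h>
#include <CGAL/convex_hull_3.h>
#include <algorithm>
#include <cmath>
#include <vector>

typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
typedef K::Point_3 Point_3;
typedef CGAL::Polyhedron_3<K> Polyhedron;

// Assumes something like: struct Sphere { float x, y, z, radius; ... };
// and that the centers are not all coplanar (required when convex_hull_3
// writes its output into a polyhedron).
float largestCenterDistance(const Sphere *spheres, int nOfSpheres) {
    // Step 1: convex hull of the sphere centers.
    std::vector<Point_3> centers;
    centers.reserve(nOfSpheres);
    for (int i = 0; i < nOfSpheres; ++i)
        centers.push_back(Point_3(spheres[i].x, spheres[i].y, spheres[i].z));

    Polyhedron hull;
    CGAL::convex_hull_3(centers.begin(), centers.end(), hull);

    // Step 2: compare only the h hull vertices (usually h << n).
    std::vector<Point_3> hullPoints(hull.points_begin(), hull.points_end());

    double best = 0.0;
    for (std::size_t i = 0; i < hullPoints.size(); ++i)
        for (std::size_t j = i + 1; j < hullPoints.size(); ++j) {
            double dx = hullPoints[i].x() - hullPoints[j].x();
            double dy = hullPoints[i].y() - hullPoints[j].y();
            double dz = hullPoints[i].z() - hullPoints[j].z();
            best = std::max(best, dx * dx + dy * dy + dz * dz);
        }
    return static_cast<float>(std::sqrt(best));
}

If you also need which pair of spheres achieves that distance, carry the original index along with each center instead of storing only the point.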

Could you perhaps store these spheres in a BSP Tree? If that's acceptable, then you could start by looking for nodes of the tree containing spheres which are furthest apart. Then you can continue down the tree until you get to individual spheres.

Your problem looks like something that could be solved using graphs. Since the distance from Sphere A to Sphere B is the same as the distance from Sphere B to Sphere A, you can minimize the number of comparisons you have to make.
I think what you're looking at here is known as an adjacency list. You can build one up and then traverse it to find the longest distance.
Another approach will still give you O(n^2), but you can minimize the number of comparisons you have to make. Store the result of each calculation in a hash table where the key is the name of the edge (so AB would hold the length from A to B). Before you perform a distance calculation, check whether AB or BA already exists in the hash table.
EDIT
Using the adjacency-list method (which is basically a Breadth-First Search) you get O(b^d) or worst-case O(|E| + |V|) complexity.

Paul got my brain thinking and you can optimize a bit by changing
for (int j=0; j < nOfSpheres; j++)
to
for (int j=i+1; j < nOfSpheres; j++)
You don't need to compare sphere A to B AND B to A. This halves the number of comparisons, though the search is still O(n^2) overall.
--- Addition -------
Another thing that makes this calculation expensive is the distanceTo calculations.
distance = sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)
That's a lot of math. You can trim that down by checking to see if
(x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2 > maxdist^2
That removes the sqrt until the very end.
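Putting both ideas together, a sketch of the tightened loop could look like the following; it assumes the Sphere members x, y and z are accessible (otherwise add a squaredDistanceTo() helper to the class):

#include <cmath>

float largestDistance(const Sphere *spheres, int nOfSpheres, int &bestI, int &bestJ) {
    float biggestSquared = -1.0f;
    for (int i = 0; i < nOfSpheres; i++) {
        for (int j = i + 1; j < nOfSpheres; j++) {   // never look at (j, i) again
            float dx = spheres[j].x - spheres[i].x;
            float dy = spheres[j].y - spheres[i].y;
            float dz = spheres[j].z - spheres[i].z;
            float d2 = dx * dx + dy * dy + dz * dz;  // squared distance, no sqrt
            if (d2 > biggestSquared) {
                biggestSquared = d2;
                bestI = i;
                bestJ = j;
            }
        }
    }
    return std::sqrt(biggestSquared);                // one sqrt at the very end
}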

Related

Efficiently obtain the graph from given facts

I have a set of points in the 2-D (xy) plane, say n points.
(x1, y1), (x2, y2), (x3, y3), ......................., (xn, yn)
My objective is to draw a graph. Two nodes(points) in the graph will be connected iff
abs(difference in x coordinate) + abs(difference in y coordinate) = L(given).
It can be done in O(n*n). Is it possible to do it more efficiently?
BTW I am trying to solve this problem
You can do this in O(n log n + E) time, where E is the actual number of edges you end up with (that is, the number of pairs of neighbors).
For any given point, its allowed-neighbor locations form a diamond shape with side-length L√2:
    *
   * *
  *   *
 *     *
*   o   *
 *     *
  *   *
   * *
    *
If you sort the points by x + y with fallback to x − y, then a single O(n + E) pass through the sorted points will let you find all neighbors of this type:
*
 *
  *
   *
o   *
for each point. (To do this, you use an index i to keep track of the current point you're finding neighbors for, and a separate index j to keep track of the line of allowed neighbors such that x_j − y_j = x_i − y_i + L. That may sound like O(n^2), since you have two indices into the array; but the trick is that j is monotonically increasing with i, so each of i and j makes just a single pass through the array. This would even be an O(n) pass, except that if you do find any neighbors of (x_i, y_i), then you'll need to re-consider them as potential neighbors for (x_{i+1}, y_{i+1}), so you can't increment j. So it comes out to an O(n + E) pass.)
You can then re-sort them by y − x with fallback to x + y, and repeat the process to find these neighbors:
    *
   *
  *
 *
*   o
And since neighbor-ness is a symmetric relation, you don't actually need to worry about the remaining neighbors:
    o
 *     *
  *   *
   * *
    *
(The overall O(n log n + E) time includes O(n log n) time to sort the points, plus the time for the two O(n + E) passes.)
It is certainly possible to do it efficiently given certain assumptions about the data. I'll think about more general cases. If, for instance, the points are distributed homogeneously and the interaction distance L(given) is small relative to the spread of the data then the problem can be converted to O(n) by binning the particles.
(A figure in the original answer contrasts the unbinned situation on the left with the binned situation on the right.)
The bin size is taken to be >=L(given) and, for any particle, the particle's bin and the 8 neighbouring bins are all searched. If the number of particles in a bin averages a constant d, then the problem is solvable in O(9dn)=O(n) time.
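A rough sketch of that binning idea (my own names, assuming non-negative integer coordinates and L >= 1): cells of side L are kept in a hash map, so each point is only tested against points in its own cell and the eight surrounding cells:

#include <cstdint>
#include <cstdlib>
#include <unordered_map>
#include <vector>

struct Pt { int x, y; };

// Pack the two cell indices into one 64-bit hash-map key.
static std::uint64_t cellKey(int cx, int cy) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32) |
           static_cast<std::uint32_t>(cy);
}

std::vector<std::pair<int,int>> buildEdgesByBinning(const std::vector<Pt>& pts, int L) {
    std::unordered_map<std::uint64_t, std::vector<int>> bins;
    for (int i = 0; i < (int)pts.size(); ++i)
        bins[cellKey(pts[i].x / L, pts[i].y / L)].push_back(i);

    std::vector<std::pair<int,int>> edges;
    for (int i = 0; i < (int)pts.size(); ++i) {
        int cx = pts[i].x / L, cy = pts[i].y / L;
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy) {
                auto it = bins.find(cellKey(cx + dx, cy + dy));
                if (it == bins.end()) continue;
                for (int j : it->second)
                    if (j > i &&                                   // each pair once
                        std::abs(pts[i].x - pts[j].x) +
                        std::abs(pts[i].y - pts[j].y) == L)
                        edges.push_back({i, j});
            }
    }
    return edges;
}

With roughly constant occupancy per bin this is the O(9dn) = O(n) behaviour described above.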
Another possibility, related to the foregoing, is to use a sparse-matrix structure to store 1 values at the location of all of your points and 0 values elsewhere.
While nice libraries exist for this, you can fake it by coming up with a hash which combines your x and y coordinates. In C++ that looks something like:
std::unordered_set< std::pair<int,int> > hashset;
Presize the hashtable so it is perhaps 30-50% larger than you need to avoid expensive rehashing.
Add all the points to the hashset; this takes O(n) time.
Now, the interaction distance L(given) defines a diamond about a center point. You could pregenerate a list of offsets for this diamond. For instance, if L=2, the offsets are:
int dx[]={-2,-1, 0, 1, 2, 1, 0,-1};
int dy[]={ 0, 1, 2, 1, 0,-1,-2,-1};
Now, for each point, loop over the list of offsets and add them to that point's coordinates. This generates an implicit list of locations where neighbours could be. Use the hashset to check if that neighbour exists. This takes O(n) time and is efficient if 8L << N (with some qualifications about the number of neighbours reachable from the first node).
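Here is a sketch of that hash-set variant, assuming integer coordinates (the names are mine). std::pair has no standard hash, so a small combining hash is supplied, as suggested above, and the offsets at Manhattan distance exactly L are pregenerated once:

#include <cstddef>
#include <cstdlib>
#include <unordered_set>
#include <utility>
#include <vector>

// Combine x and y into one hash value for the unordered_set.
struct PairHash {
    std::size_t operator()(const std::pair<int,int>& p) const {
        return std::hash<long long>()(
            (static_cast<long long>(p.first) << 32) ^
            static_cast<unsigned int>(p.second));
    }
};

typedef std::pair<int,int> XY;

std::vector<std::pair<XY,XY>> buildEdgesByHashing(const std::vector<XY>& pts, int L) {
    // Pregenerate the 4*L offsets with |dx| + |dy| == L.
    std::vector<XY> offsets;
    for (int dx = -L; dx <= L; ++dx) {
        int rest = L - std::abs(dx);
        offsets.push_back({dx, rest});
        if (rest != 0) offsets.push_back({dx, -rest});
    }

    std::unordered_set<XY, PairHash> hashset(pts.size() * 2);  // presized
    for (const XY& p : pts) hashset.insert(p);

    std::vector<std::pair<XY,XY>> edges;
    for (const XY& p : pts)
        for (const XY& o : offsets) {
            XY q(p.first + o.first, p.second + o.second);
            if (hashset.count(q) && p < q)   // p < q reports each edge only once
                edges.push_back({p, q});
        }
    return edges;
}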
I like ruakh's solution a lot. Another approach would allow incrementally growing the point set without a loss in efficiency.
To add each point P, you'd search the tree for points Q meeting your criteria and add edges when any were found.
At each level of any k-d tree search, the rectangular extent represented by each child is available. In this case, you would continue the search "down" into a child node only if its extent could possibly contain a point matching P, i.e., only if the rectangle includes some part of the diamond that ruakh describes.
Analysis of k-d tree searches is usually tricky. I'm pretty sure this algorithm runs in expected O(|E| log n) time for a random point set, but it's also pretty easy to imagine point sets where performance is better and others where it's worse.
Consider the lines y = x and y = -x, and consider the distance of each point from these lines.
Two points are connected only if they have the right difference in distance to these two lines.
Thus you can bucket all the points by distance to these lines. Then in each bucket, keep an ordered map of points (ordered by how far along the line they are). Any points within the right distance in this ordered map should be connected in the graph.
This should be O(N log N) worst case, even if all the points are on top of each other.
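One way to flesh that out (a sketch with my own names, assuming integer coordinates): work in the rotated coordinates u = x + y and v = x - y, which are proportional to the distances from the two diagonal lines. Two points are at Manhattan distance exactly L iff max(|du|, |dv|) = L, so for each point it suffices to probe the u-bucket L above it (allowing |dv| <= L) and the v-bucket L above it (requiring |du| < L, so diamond corners aren't reported twice):

#include <algorithm>
#include <climits>
#include <cstdlib>
#include <map>
#include <vector>

struct P { int x, y; };

// Returns each connected pair (as indices into pts) exactly once.
std::vector<std::pair<int,int>> connectAtDistanceL(const std::vector<P>& pts, int L) {
    // key -> sorted list of (other rotated coordinate, point index)
    std::map<int, std::vector<std::pair<int,int>>> byU, byV;
    for (int i = 0; i < (int)pts.size(); ++i) {
        int u = pts[i].x + pts[i].y, v = pts[i].x - pts[i].y;
        byU[u].push_back({v, i});
        byV[v].push_back({u, i});
    }
    for (auto& b : byU) std::sort(b.second.begin(), b.second.end());
    for (auto& b : byV) std::sort(b.second.begin(), b.second.end());

    std::vector<std::pair<int,int>> edges;
    // Report entries of one bucket whose coordinate lies within halfWidth of center.
    auto scan = [&edges](std::map<int, std::vector<std::pair<int,int>>>& buckets,
                         int key, int center, int halfWidth, int i, bool strict) {
        auto it = buckets.find(key);
        if (it == buckets.end()) return;
        auto lo = std::lower_bound(it->second.begin(), it->second.end(),
                                   std::make_pair(center - halfWidth, INT_MIN));
        for (; lo != it->second.end() && lo->first <= center + halfWidth; ++lo)
            if (!strict || std::abs(lo->first - center) < halfWidth)
                edges.push_back({i, lo->second});
    };

    for (int i = 0; i < (int)pts.size(); ++i) {
        int u = pts[i].x + pts[i].y, v = pts[i].x - pts[i].y;
        scan(byU, u + L, v, L, i, false);  // |du| == L pairs (du = +L side only)
        scan(byV, v + L, u, L, i, true);   // |dv| == L, |du| < L pairs
    }
    return edges;
}

Sorting the buckets costs O(N log N), and each edge is discovered exactly once from one of its endpoints.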

Algorithm: Distance transform - any faster algorithm?

I'm trying to solve the distance transform problem (using Manhattan distance). Basically, given a matrix of 0's and 1's, the program must assign to every position its distance to the nearest 1. For example, for this one
0000
0100
0000
0000
distance transform matrix is
2123
1012
2123
3234
Possible solutions from my head are:
Slowest ones (slowest because I have tried to implement them - they were lagging on very big matrices):
Brute-force - for every 1 that program reads, change distances accordingly from beginning till end.
Breadth-first search from 0's - for every 0, program looks for nearest 1 inside out.
Same as 2 but starting from 1's mark every distance inside out.
Much faster (read from other people's code)
Breadth-first search from 1's
1. Assign all values in the distance matrix to -1 or a very big value.
2. While reading the matrix, put all positions of 1's into a queue.
3. While the queue is not empty:
   a. Dequeue a position -- let it be x
   b. For each position around x (that has distance 1 from it):
      if the position is valid (does not exceed the matrix dimensions) then
         if its distance is not initialized or is greater than (distance of x) + 1 then
            I. distance = (distance of x) + 1
            II. enqueue the position into the queue
I wanted to ask if there is a faster solution to this problem. I tried to search for distance transform algorithms, but most of them deal with Euclidean distances.
Thanks in advance.
The breadth first search would perform Θ(n*m) operations where n and m are the width and height of your matrix.
You need to output Θ(n*m) numbers, so you can't get any faster than that from a theoretical point of view.
I'm assuming you are not interested in going towards discussions involving cache and such optimizations.
Note that this solution works in more interesting cases. For example, imagine the same question, but there could be different "sources":
00000
01000
00000
00000
00010
Using BFS, you will get the following distance-to-closest-source in the same time complexity:
21234
10123
21223
32212
32101
However, with a single source, there is another solution that might have a slightly better performance in practice (even though the complexity is still the same).
First, let's observe the following property.
Property: If source is at (a, b), then a point (x, y) has the following manhattan distance:
d(x, y) = abs(x - a) + abs(y - b)
This should be quite easy to prove. So another algorithm would be:
for r in rows:
    for c in cols:
        d(r, c) = abs(r - a) + abs(c - b)
which is very short and easy.
Unless you write and test it, there is no easy way of comparing the two algorithms. Assuming an efficient bounded queue implementation (with an array), you have the following major operations per cell:
BFS: queue insertion/deletion, visit of each node 5 times (four times by neighbors, and one time out of the queue)
Direct formula: two subtraction and two ifs
It would really depend on the compiler and its optimizations as well as the specific CPU and memory architecture to say which would perform better.
That said, I'd advise for going with whichever seems simpler to you. Note however that with multiple sources, in the second solution you would need multiple passes on the array (or multiple distance calculations in one pass) and that would definitely have a worse performance than BFS for a large enough number of sources.
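For reference, a minimal sketch of that multi-source BFS in C++ (my own names), assuming the input grid is given as a vector of strings of '0'/'1' characters:

#include <queue>
#include <string>
#include <utility>
#include <vector>

std::vector<std::vector<int>> distanceTransform(const std::vector<std::string>& grid) {
    int n = grid.size(), m = grid[0].size();
    std::vector<std::vector<int>> dist(n, std::vector<int>(m, -1));
    std::queue<std::pair<int,int>> q;

    // Seed: every '1' is a source at distance 0.
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            if (grid[i][j] == '1') { dist[i][j] = 0; q.push({i, j}); }

    const int di[4] = {1, -1, 0, 0};
    const int dj[4] = {0, 0, 1, -1};
    while (!q.empty()) {
        std::pair<int,int> cur = q.front(); q.pop();
        for (int k = 0; k < 4; ++k) {
            int ni = cur.first + di[k], nj = cur.second + dj[k];
            if (ni < 0 || ni >= n || nj < 0 || nj >= m) continue;  // stay in bounds
            if (dist[ni][nj] != -1) continue;                       // already settled
            dist[ni][nj] = dist[cur.first][cur.second] + 1;
            q.push({ni, nj});
        }
    }
    return dist;
}

Because the BFS expands in order of increasing distance, each cell is finalized the first time it is reached, giving the Θ(n*m) bound discussed above.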
You don't need a queue or anything like that at all. Notice that if (i,j) is at distance d from (k,l), one way to realise that distance is to go left or right |i-k| times and then up or down |j-l| times.
So, initialise your matrix with big numbers and stick a zero everywhere you have a 1 in your input. Now do something like this:
for (i = 0; i < sx-1; i++) {
    for (j = 0; j < sy-1; j++) {
        dist[i+1][j] = min(dist[i+1][j], dist[i][j]+1);
        dist[i][j+1] = min(dist[i][j+1], dist[i][j]+1);
    }
    dist[i+1][sy-1] = min(dist[i+1][sy-1], dist[i][sy-1]+1);
}
for (j = 0; j < sy-1; j++) {
    dist[sx-1][j+1] = min(dist[sx-1][j+1], dist[sx-1][j]+1);
}
At this point, you've found all of the shortest paths that involve only going down or right. If you do a similar thing for going up and left, dist[i][j] will give you the distance from (i, j) to the nearest 1 in your input matrix.
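For completeness, the mirror pass (going only up or left) could look like this, using the same dist, sx and sy names:

for (i = sx-1; i > 0; i--) {
    for (j = sy-1; j > 0; j--) {
        dist[i-1][j] = min(dist[i-1][j], dist[i][j]+1);
        dist[i][j-1] = min(dist[i][j-1], dist[i][j]+1);
    }
    dist[i-1][0] = min(dist[i-1][0], dist[i][0]+1);
}
for (j = sy-1; j > 0; j--) {
    dist[0][j-1] = min(dist[0][j-1], dist[0][j]+1);
}

Since every shortest Manhattan path can be split into one vertical run and one horizontal run, the down/right pass followed by this up/left pass covers sources in all four directions.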

Given list of 2d points, find the point closest to all other points

Input: list of 2d points (x,y) where x and y are integers.
Distance: distance is defined as the Manhattan distance.
ie:
def dist(p1, p2):
    return abs(p1.x - p2.x) + abs(p1.y - p2.y)
What is an efficient algorithm to find the point that is closest to all other points.
I can only think of a brute force O(n^2) solution:
minDist = inf
bestPoint = null
for p1 in points:
    dist = 0
    for p2 in points:
        dist += distance(p1, p2)
    if dist < minDist:
        minDist = dist
        bestPoint = p1
basically look at every pair of points.
Note that in 1-D the point that minimizes the sum of distances to all the points is the median.
In 2-D the problem can be solved in O(n log n) as follows:
Create a sorted array of x-coordinates and for each element in the array compute the "horizontal" cost of choosing that coordinate. The horizontal cost of an element is the sum of distances to all the points projected onto the X-axis. This can be computed in linear time by scanning the array twice (once from left to right and once in the reverse direction). Similarly create a sorted array of y-coordinates and for each element in the array compute the "vertical" cost of choosing that coordinate.
Now for each point in the original array, we can compute the total cost to all other points in O(1) time by adding the horizontal and vertical costs. So we can compute the optimal point in O(n). Thus the total running time is O(n log n).
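A sketch of that approach in C++ (names are my own): sort a copy of each coordinate axis, build prefix sums, and then the summed distance along one axis from any value follows from that value's rank.

#include <algorithm>
#include <vector>

struct Point2 { long long x, y; };

// Sum of |v - c| over all coordinates c of one axis, via sorting + prefix sums.
struct AxisCost {
    std::vector<long long> sorted;
    std::vector<long long> prefix;   // prefix[i] = sum of sorted[0..i-1]

    explicit AxisCost(std::vector<long long> v) : sorted(std::move(v)) {
        std::sort(sorted.begin(), sorted.end());
        prefix.assign(sorted.size() + 1, 0);
        for (std::size_t i = 0; i < sorted.size(); ++i)
            prefix[i + 1] = prefix[i] + sorted[i];
    }

    long long cost(long long v) const {
        // k coordinates are <= v; they contribute k*v - (their sum),
        // the remaining ones contribute (their sum) - (n-k)*v.
        std::size_t k = std::upper_bound(sorted.begin(), sorted.end(), v) - sorted.begin();
        long long below = (long long)k * v - prefix[k];
        long long above = (prefix.back() - prefix[k]) - (long long)(sorted.size() - k) * v;
        return below + above;
    }
};

// Assumes pts is non-empty.
Point2 closestToAll(const std::vector<Point2>& pts) {
    std::vector<long long> xs, ys;
    for (const Point2& p : pts) { xs.push_back(p.x); ys.push_back(p.y); }
    AxisCost cx(std::move(xs)), cy(std::move(ys));

    Point2 best = pts[0];
    long long bestCost = cx.cost(pts[0].x) + cy.cost(pts[0].y);
    for (const Point2& p : pts) {
        long long c = cx.cost(p.x) + cy.cost(p.y);
        if (c < bestCost) { bestCost = c; best = p; }
    }
    return best;
}

The Manhattan distance separates into x and y terms, which is why the two one-dimensional costs can simply be added.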
What you are looking for is center of mass.
You basically add all the x's and y's together and divide by the mass of the whole system.
Now, you have uniform massless particles; let their mass be 1.
Then all you have to do is sum the locations of the particles and divide by the number of particles.
Say we have p1(1,2), p2(1,1), p3(1,0).
// we sum the x's
bestXcord = (1+1+1)/3 = 1
// we sum the y's
bestYcord = (2+1+0)/3 = 1
so p2 is the closest.
solved in O(n)
Starting from your initial algorithm, there is an optimization possible:
minDist = inf
bestPoint = null
for p1 in points:
    dist = 0
    for p2 in points:
        dist += distance(p1, p2)
        // This will weed out bad points rather fast
        if dist >= minDist then continue(p1)
    // Reaching this line means dist < minDist, so no min()/argmin() is needed
    minDist = dist
    bestPoint = p1
The idea is to throw away outliers as fast as possible. This can be improved further:
start the p1 loop with a heuristic "inner" point (this creates a good minDist first, so worse points get thrown away faster)
start the p2 loop with heuristic "outer" points (this makes dist rise fast, possibly triggering the exit condition faster)
If you trade size for speed, you can go another route:
// assumes points are numbered 0..n
total[] = int[n+1];  // initialized to 0
for (i = 0; i < n; i++)
    for (j = i+1; j <= n; j++) {
        d = dist(p[i], p[j]);
        total[i] += d;
        total[j] += d;
    }
minDist = min(total);
bestPoint = p[argmin(total)];
which needs more space for the total array, but calculates each distance only once. This has the advantage of being better suited to a "get the N best points" sort of problem and to calculation-intensive metrics. I suspect it would bring nothing for the Manhattan metric on an x86 or x64 architecture, though: the memory access would heavily dominate the calculation.

Fast algorithm to find out the number of points under hyperplane

Given points in Euclidean space, is there a fast algorithm to count the number of points 'under' one arbitrary hyperplane? Fast means time complexity lower than O(n)
Time for preprocessing or sorting the points is okay
And even if not for high dimensions, I'd like to know whether there exists one that can be used in 2-dimensional space.
If you're willing to preprocess the points, then you have to visit each one at least once, which is O(n). If you consider a test of which side the point is on as part of the preprocessing then you've got an O(0) algorithm (with O(n) preprocessing). So I don't think this question makes sense as stated.
Nevertheless, I'll attempt to give a useful answer, even if it's not precisely what the OP asked for.
Choose a hyperplane unit normal and root point. If the plane is given in parametric form
(P - O).N == 0
then you have these already, just make sure the normal is unitized.
If it's given in analytic form: Sum(i = 1 to n: a[i] x[i]) + d = 0, then the vector A = (a[1], ..., a[n]) is a normal of the plane, and N = A/||A|| is the unit plane normal. A point O (for origin) on the plane is O = -(d/||A||) N.
You can test which side each point P is on by projecting it onto N and checking the sign of the parameter:
Let V = P - O. V is the vector from the chosen origin O to P.
Let s N be the projection of V onto N. If s is negative, then P is "under" the hyperplane.
You should go to the link on vector projection if you're rusty on the subject, but I'll summarize here using my notation. Or, you can take my word for it, and just skip to the formula at the end.
If alpha is the angle between V and N, then from the definition of cosine we have cos(alpha) = s/||V||, since N is a unit normal and s is the (signed) length of the projection of V onto N. But we also know from vector algebra that cos(alpha) = (V.N)/(||V|| ||N||) = (V.N)/||V||, where "." is the scalar product (a.k.a. dot product, or euclidean inner product).
Equating these two expressions for cos(alpha) we have
s = V.N
(using the fact that ||N|| == 1).
So your preprocessing work is to compute N and O, and your test is:
bool is_under = (dot(V, N) < 0.);
I don't believe it can be done any faster.
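To make the test concrete in 3-D: since the sign of V.N equals the sign of A.P + d, you can skip the normalization entirely when all you need is which side each point is on. A small sketch (names are mine):

#include <vector>

struct Point3 { double x, y, z; };

// Counts points strictly under the plane a*x + b*y + c*z + d = 0;
// only the sign of the expression matters, so no unitizing is needed.
int countUnder(const std::vector<Point3>& pts,
               double a, double b, double c, double d) {
    int count = 0;
    for (const Point3& p : pts)
        if (a * p.x + b * p.y + c * p.z + d < 0.0)  // "under" = negative side
            ++count;
    return count;
}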
When setting the point values, check the hyperplane condition for each point as you set it, and increment (or don't increment) a counter accordingly. O(n)
I found an O(log N) algorithm in two dimensions, using divide-and-conquer and binary search, with O(N log N) preprocessing time complexity and O(N log N) memory complexity.
The basic idea is that the points can be divided into the left N/2 points and the right N/2 points, and the number of points under the line (in 2D) is the sum of the number of left points under the line and the number of right points under the line. I'll call the infinite line that divides the whole point set into 'left' and 'right' the 'dividing line'. The dividing line will look like 'x = k'.
If the 'left points' and the 'right points' are each sorted in y-axis order, then the number of qualifying points -- the points in the lower corner -- can be found quickly by binary searching for the number of points whose y values are lower than the y value of the intersection of the query line and the dividing line.
Therefore time complexity is
T(N) = 2T(N/2) + O(log N)
and finally the time complexity is O(log N)

Find all collinear points in a given set

This is an interview question: "Find all collinear points in a given set".
As I understand it, they ask to print out the points that lie on the same line (and any two points are always collinear). I would suggest the following.
Let's introduce two types Line (pair of doubles) and Point (pair of integers).
Create a multimap: HashMap<Line, List<Point>>
Loop over all pairs of points and for each pair: calculate the Line connecting the points and add the line with those points to the multimap.
Finally, the multimap contains the lines as the keys and a list of collinear points for each line as its value.
The complexity is O(N^2). Does it make sense? Are there better solutions?
Collinear here doesn't really make sense unless you fix on 2 points to begin with. So to say, "find all collinear points in a given set" doesn't make much sense in my opinion, unless you fix on 2 points and test the others to see if they're collinear.
Maybe a better question is, what is the maximum number of collinear points in the given set? In that case, you could fix on 2 points (just use 2 loops), then loop over the other points and check that the slope matches between the fixed points and the other points. You could use something like this for that check, assuming the coordinates are integer (you could change parameter types to double otherwise).
// ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
// Returns whether 3 points are collinear
// ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
bool collinear(int x1, int y1, int x2, int y2, int x3, int y3) {
return (y1 - y2) * (x1 - x3) == (y1 - y3) * (x1 - x2);
}
So the logic becomes:
int best = 2;
for (int i = 0; i < number of points; ++i) {
    for (int j = i+1; j < number of points; ++j) {
        int count = 2;
        for (int k = 0; k < number of points; ++k) {
            if (k == i || k == j)
                continue;
            check that points i, j and k are collinear (use the function above); if so, increment count.
        }
        best = max(best, count);
    }
}
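For reference, a compilable version of that sketch, reusing the collinear() function above (the container and struct names are mine):

#include <algorithm>
#include <vector>

struct Point { int x, y; };

// Maximum number of collinear points in the set.
int maxCollinear(const std::vector<Point>& pts) {
    int n = (int)pts.size();
    int best = std::min(n, 2);
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j) {
            int count = 2;                       // the two fixed points
            for (int k = 0; k < n; ++k) {
                if (k == i || k == j) continue;
                if (collinear(pts[i].x, pts[i].y, pts[j].x, pts[j].y,
                              pts[k].x, pts[k].y))
                    ++count;
            }
            best = std::max(best, count);
        }
    return best;
}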
Solve the problem in the dual plane: convert all the points into lines, then run a plane sweep to report all the intersections. Lines in the dual plane that pass through the same point represent collinear points, and the plane sweep can be done in O(n log n) time.
Are you sure your analysis of the runtime is correct? You say to loop over all the pairs of points, of which there are n*(n-1)/2, i.e. O(n^2). Then you add the line and the pair of points to the map. However, I don't think the time to add such a line + point combination is constant. This means overall your time is O(n^2 log n) not a constant times n^2, which is what O(n^2) means.
So the real question would be, given that it can be done in time O(n^2 log n) can it be done in time O(n^2). Clearly O(n^2) gives a lower bound as you must at the very least print out every pair of points, of which there are O(n^2). My feeling is this is a completely general sorting problem and that one cannot expect better than O(n^2 log n) in general. However a full proof of that fact may be non-trivial.
Another thing to beware of is that the set may contain zero or one points, and so you will need to check that your algorithm handles these cases correctly, otherwise write special cases for those.
