Efficiently obtain the graph from given facts - algorithm

I have a set of n points in the 2-D (xy) plane:
(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)
My objective is to draw a graph. Two nodes (points) in the graph will be connected iff
abs(difference in x coordinates) + abs(difference in y coordinates) = L (given).
It can be done in O(n²). Is it possible to do it more efficiently?
BTW, I am trying to solve this problem.

You can do this in O(n log n + E) time, where E is the actual number of edges you end up with (that is, the number of pairs of neighbors).
For any given point, its allowed-neighbor locations form a diamond shape with side-length L√2:
    *
   * *
  *   *
 *     *
*   o   *
 *     *
  *   *
   * *
    *
If you sort the points by x + y with fallback to x − y, then a single O(n + E) pass through the sorted points will let you find all neighbors of this type:
*
 *
  *
   *
o   *
for each point. (To do this, you use an index i to keep track of the current point you're finding neighbors for, and a separate index j to keep track of the line of allowed neighbors, i.e. the points with x_j + y_j = x_i + y_i + L. That may sound like O(n²), since you have two indices into the array; but the trick is that j is monotonically increasing with i, so each of i and j makes just a single pass through the array. This would even be an O(n) pass, except that if you do find any neighbors of (x_i, y_i), then you'll need to re-consider them as potential neighbors of (x_{i+1}, y_{i+1}), so you can't increment j past them. So it comes out to an O(n + E) pass.)
You can then re-sort them by y − x with fallback to x + y, and repeat the process to find these neighbors:
    *
   *
  *
 *
*   o
And since neighbor-ness is a symmetric relation, you don't actually need to worry about the remaining neighbors:
    o
   * *
  *   *
 *     *
    *
(The overall O(n log n + E) time includes O(n log n) time to sort the points, plus the time for the two O(n + E) passes.)
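To make the two-pointer pass concrete, here is a minimal C++ sketch of the first pass (the second pass is the same code with the two sort keys swapped). The struct, the function name, and the edge output format are my own illustration, assuming integer coordinates:

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { int x, y, id; };

// First pass: for each point i, find the neighbors on the line
// x + y == (x_i + y_i) + L, scanning the array sorted by (x + y, x - y).
void diagonalPass(std::vector<Point>& pts, int L,
                  std::vector<std::pair<int,int>>& edges) {
    auto key = [](const Point& p) { return std::make_pair(p.x + p.y, p.x - p.y); };
    std::sort(pts.begin(), pts.end(),
              [&](const Point& a, const Point& b) { return key(a) < key(b); });
    std::size_t j = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        int u = pts[i].x + pts[i].y;
        int v = pts[i].x - pts[i].y;
        // j tracks the first candidate on the neighbor line; it only
        // ever moves forward as i advances.
        while (j < pts.size() && key(pts[j]) < std::make_pair(u + L, v - L)) ++j;
        // Scan candidates without consuming them, since they may also be
        // neighbors of later points; this is the O(n + E) part.
        for (std::size_t k = j; k < pts.size(); ++k) {
            if (pts[k].x + pts[k].y != u + L || pts[k].x - pts[k].y > v + L) break;
            edges.emplace_back(pts[i].id, pts[k].id);
        }
    }
}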

It is certainly possible to do it efficiently given certain assumptions about the data; I'll think about more general cases. If, for instance, the points are distributed homogeneously and the interaction distance L is small relative to the spread of the data, then the problem can be reduced to O(n) by binning the particles.
The bin size is taken to be >= L and, for any particle, the particle's bin and the 8 neighbouring bins are all searched. If the number of particles in a bin averages a constant d, then the problem is solvable in O(9dn) = O(n) time.
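A sketch of that binning scheme in C++, assuming integer coordinates so the Manhattan-distance test is exact (the names and the map-of-bins representation are mine):

#include <cstdlib>
#include <map>
#include <utility>
#include <vector>

struct Pt { long long x, y; };

// Floor division that rounds toward negative infinity, so bins are
// correct for negative coordinates too.
long long floorDiv(long long a, long long b) {
    return (a >= 0) ? a / b : -((-a + b - 1) / b);
}

std::vector<std::pair<int,int>> binnedPairs(const std::vector<Pt>& pts, long long L) {
    std::map<std::pair<long long,long long>, std::vector<int>> bins;
    for (int i = 0; i < (int)pts.size(); ++i)
        bins[{floorDiv(pts[i].x, L), floorDiv(pts[i].y, L)}].push_back(i);
    std::vector<std::pair<int,int>> edges;
    for (int i = 0; i < (int)pts.size(); ++i) {
        long long bx = floorDiv(pts[i].x, L), by = floorDiv(pts[i].y, L);
        for (int dx = -1; dx <= 1; ++dx)        // the particle's own bin
            for (int dy = -1; dy <= 1; ++dy) {  // plus the 8 neighbouring bins
                auto it = bins.find({bx + dx, by + dy});
                if (it == bins.end()) continue;
                for (int j : it->second)
                    if (j > i && std::llabs(pts[i].x - pts[j].x)
                               + std::llabs(pts[i].y - pts[j].y) == L)
                        edges.emplace_back(i, j);
            }
    }
    return edges;
}

With an average bin occupancy of d, each point is tested against about 9d others, which is where the O(9dn) bound above comes from.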

Another possibility, related to the foregoing, is to use a sparse-matrix structure to store 1 values at the locations of all of your points and 0 values elsewhere.
While nice libraries exist for this, you can fake it by coming up with a hash which combines your x and y coordinates; note that std::unordered_set has no built-in hash for std::pair, so you have to supply that combining hash yourself. In C++ that looks something like:
std::unordered_set<std::pair<int,int>, PairHash> hashset; // PairHash: your combining hash
Presize the hash table so it is perhaps 30-50% larger than you need, to avoid expensive rehashing.
Add all the points to the hashset; this takes O(n) time.
Now, the interaction distance L defines a diamond about a center point. You could pregenerate a list of offsets for this diamond. For instance, if L = 2, the 4L = 8 offsets are:
int dx[] = {-2, -1, 0, 1, 2,  1,  0, -1};
int dy[] = { 0,  1, 2, 1, 0, -1, -2, -1};
Now, for each point, loop over the list of offsets, adding each one to that point's coordinates. This generates an implicit list of locations where neighbours could be; use the hashset to check whether each such neighbour actually exists. This costs 4L lookups per point, i.e. O(nL) overall, which is effectively O(n) whenever 4L << n.
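Putting the pieces together, a C++ sketch of the whole approach (PairHash and the function name are mine, and the hash combiner is illustrative rather than production grade):

#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// std::unordered_map/set have no built-in hash for std::pair, so supply one.
struct PairHash {
    std::size_t operator()(const std::pair<int,int>& p) const {
        long long combined = ((long long)p.first << 32) ^ (unsigned int)p.second;
        return std::hash<long long>()(combined);
    }
};

// Offset lookup, hard-coded for L = 2.
std::vector<std::pair<int,int>> offsetEdges(const std::vector<std::pair<int,int>>& pts) {
    static const int dx[] = {-2, -1, 0, 1, 2,  1,  0, -1};
    static const int dy[] = { 0,  1, 2, 1, 0, -1, -2, -1};
    // Map each occupied location to its point index; presized to avoid rehashing.
    std::unordered_map<std::pair<int,int>, int, PairHash> loc(2 * pts.size());
    for (int i = 0; i < (int)pts.size(); ++i) loc[pts[i]] = i;
    std::vector<std::pair<int,int>> edges;
    for (int i = 0; i < (int)pts.size(); ++i)
        for (int k = 0; k < 8; ++k) {
            auto it = loc.find({pts[i].first + dx[k], pts[i].second + dy[k]});
            if (it != loc.end() && it->second > i) // i < j avoids duplicate edges
                edges.emplace_back(i, it->second);
        }
    return edges;
}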

I like ruakh's solution a lot. Another approach, using a k-d tree, would allow incrementally growing the point set without a loss in efficiency.
To add each point P, you'd search the tree for points Q meeting your criteria and add edges whenever any are found.
At each level of a k-d tree search, the rectangular extent represented by each child is available. You would continue the search "down" into a child node if and only if its extent could possibly contain a point matching P, i.e., the rectangle would have to include some part of the diamond that ruakh describes.
Analysis of k-d tree searches is usually tricky. I'm pretty sure this algorithm runs in expected O(|E| log n) time for a random point set, but it's also pretty easy to imagine point sets where performance is better and others where it's worse.
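The pruning test itself is simple. Here is a sketch (names mine, integer coordinates assumed) of the "can this child's rectangle contain a point at Manhattan distance exactly L from P?" predicate:

#include <algorithm>

// The child's extent is [x0, x1] x [y0, y1]. Every Manhattan distance
// between the minimum and maximum is achievable inside the rectangle
// (moving one grid step changes the distance by 1), so the rectangle
// can hold a neighbor iff L lies between those bounds.
bool mayContainNeighbor(int px, int py,
                        int x0, int x1, int y0, int y1, int L) {
    int dxMin = std::max({0, x0 - px, px - x1});
    int dyMin = std::max({0, y0 - py, py - y1});
    int dxMax = std::max(px - x0, x1 - px);
    int dyMax = std::max(py - y0, y1 - py);
    return dxMin + dyMin <= L && L <= dxMax + dyMax;
}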

Consider the lines y = x and y = −x, and the distance of each point from these lines.
Two points are connected only if they have the right difference in distances to these two lines.
Thus you can bucket all the points by distance to these lines. Then, in each bucket, keep an ordered map of points (ordered by how far along the line they are). Any points within the right distance in this ordered map should be connected in the graph.
This should be O(n log n) worst case, even if all the points are on top of each other.
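A sketch of that bucketing in C++ (my own rendering, assuming integer coordinates): with u = x + y and v = x − y, two points are at Manhattan distance L exactly when max(|Δu|, |Δv|) = L, so each point's neighbors sit in the buckets v ± L and u ± L, within a window of the other coordinate:

#include <algorithm>
#include <map>
#include <set>
#include <utility>
#include <vector>

std::vector<std::pair<int,int>> bucketEdges(const std::vector<std::pair<int,int>>& pts, int L) {
    // bucket key -> (position along the line -> point index)
    std::map<int, std::multimap<int,int>> byV, byU;
    for (int i = 0; i < (int)pts.size(); ++i) {
        int u = pts[i].first + pts[i].second, v = pts[i].first - pts[i].second;
        byV[v].insert({u, i});
        byU[u].insert({v, i});
    }
    std::set<std::pair<int,int>> edges; // a set dedupes the diamond's corners
    for (int i = 0; i < (int)pts.size(); ++i) {
        int u = pts[i].first + pts[i].second, v = pts[i].first - pts[i].second;
        auto scan = [&](std::map<int, std::multimap<int,int>>& buckets,
                        int key, int lo, int hi) {
            auto b = buckets.find(key);
            if (b == buckets.end()) return;
            for (auto it = b->second.lower_bound(lo);
                 it != b->second.end() && it->first <= hi; ++it)
                edges.insert({std::min(i, it->second), std::max(i, it->second)});
        };
        scan(byV, v + L, u - L, u + L); // neighbors with delta-v = +L
        scan(byV, v - L, u - L, u + L); // neighbors with delta-v = -L
        scan(byU, u + L, v - L, v + L); // neighbors with delta-u = +L
        scan(byU, u - L, v - L, v + L); // neighbors with delta-u = -L
    }
    return std::vector<std::pair<int,int>>(edges.begin(), edges.end());
}

Each lookup costs O(log n), so the whole thing is O((n + E) log n), matching the stated bound.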

Related

Given a simple polygon P, consisting of n vertices, and Set S Of k points, determine if each of the polygon vertices are covered by some point from S

My best solution was to check, for every vertex of P, whether such a point exists in S, for a total complexity of O(n*k). I believe there should be a more efficient solution. Any hints?
Whether P is a polygon or not seems to be irrelevant. So the generalized question becomes: given two sets of points, A (with a points) and B (with b points), find out whether A is a subset of B or not.
The simple solution is O(a * b), but you can also get O(a + b) by doing some preprocessing.
Put all the points of B in a hash map, with the x-coordinate as key and a hash set of the y-coordinates as value (Map<Number, Set<Number>>). This lets you query whether a point (x, y) is in B in O(1): map.containsKey(x) && map.get(x).contains(y).
Go through all the points of A and check whether each point is in B using the data structure created above.
Step 1 is O(b) and Step 2 is O(a) which gives O(a + b).
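For completeness, the same idea in C++ (a sketch; allCovered and the container choice are mine):

#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Is every point of A present in B? O(a + b) expected time.
bool allCovered(const std::vector<std::pair<int,int>>& A,
                const std::vector<std::pair<int,int>>& B) {
    std::unordered_map<int, std::unordered_set<int>> byX; // x -> set of y's
    for (const auto& p : B) byX[p.first].insert(p.second);   // step 1: O(b)
    for (const auto& p : A) {                                // step 2: O(a)
        auto it = byX.find(p.first);
        if (it == byX.end() || it->second.count(p.second) == 0) return false;
    }
    return true;
}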

A divide-and-conquer algorithm for counting dominating points?

Let's say that a point at coordinate (x1, y1) dominates another point (x2, y2) if x1 ≤ x2 and y1 ≤ y2.
I have a set of points (x1, y1), ..., (xn, yn) and I want to find the total number of dominating pairs. I can do this using brute force by comparing all points against one another, but this takes time O(n²). Instead, I'd like to use a divide-and-conquer approach to solve this in time O(n log n).
Right now, I have the following algorithm:
Draw a vertical line dividing the set of points into two equal subsets P_left and P_right. As a base case, if there are just two points left, I can compare them directly.
Recursively count the number of dominating pairs in P_left and P_right
Some conquer step?
The problem is that I can't see what the "conquer" step should be here. I want to count how many dominating pairs there are that cross from P_left into P_right, but I don't know how to do that without comparing all the points in both parts, which would take time O(n²).
Can anyone give me a hint about how to do the conquer step?
For example, suppose the division line is drawn so that the y-coordinates of the two halves are {1, 3, 4, 5, 5} and {5, 8, 9, 10, 12}.
Suppose you sort the points in both halves separately in ascending order by their y-coordinates. Now, look at the lowest y-valued point in each half. If the lowest point on the left has a lower y-value than the lowest point on the right, then that left point dominates every remaining point on the right (it has a smaller x-coordinate and a smaller y-coordinate). Otherwise, the bottom point on the right isn't dominated by anything on the left, since everything remaining on the left has a greater y-value.
In either case, you can remove one point from one of the two halves and repeat the process with the remaining sorted lists. This does O(1) work per point, so if there are n total points, this does O(n) work (after sorting) to count the number of dominating pairs across the two halves. If you've seen it before, this is similar to the algorithm for counting inversions in an array.
Factoring in the time required to sort the points (O(n log n)), this conquer step takes O(n log n) time, giving the recurrence
T(n) = 2T(n / 2) + O(n log n)
This solves to O(n log² n) according to the Master Theorem.
However, you can speed this up. Suppose that before you start the divide-and-conquer step you presort the points by their y-coordinates, doing one pass of O(n log n) work. Using tricks similar to the closest pair of points problem, you can then get the points in each half sorted in O(n) time on each subproblem of size n (see the discussion at the bottom of this page for details). That changes the recurrence to
T(n) = 2T(n / 2) + O(n)
Which solves to O(n log n), as required.
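Here's a C++ sketch of the whole thing, folding the conquer step into a merge sort over the y-values (my code; it assumes all coordinates are distinct to sidestep tie-handling):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Counts pairs (p, q) with p.x <= q.x and p.y <= q.y. After sorting by x,
// every earlier point has a smaller x, so we only need to count pairs
// whose y-values are in ascending order, merge-sort style.
long long mergeCount(std::vector<int>& ys, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return 0;
    std::size_t mid = lo + (hi - lo) / 2;
    long long count = mergeCount(ys, lo, mid) + mergeCount(ys, mid, hi);
    std::vector<int> merged;
    merged.reserve(hi - lo);
    std::size_t i = lo, j = mid;
    while (i < mid && j < hi) {
        if (ys[i] < ys[j]) {
            count += hi - j; // ys[i] dominates every y still in the right half
            merged.push_back(ys[i++]);
        } else {
            merged.push_back(ys[j++]); // dominated by nothing remaining on the left
        }
    }
    while (i < mid) merged.push_back(ys[i++]);
    while (j < hi)  merged.push_back(ys[j++]);
    std::copy(merged.begin(), merged.end(), ys.begin() + lo);
    return count;
}

long long countDominatingPairs(std::vector<std::pair<int,int>> pts) {
    std::sort(pts.begin(), pts.end()); // by x; the left half ends up with smaller x
    std::vector<int> ys;
    ys.reserve(pts.size());
    for (const auto& p : pts) ys.push_back(p.second);
    return mergeCount(ys, 0, ys.size());
}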
Hope this helps!
Well, in this way you have O(n²) just for the division into subsets...
My approach would be different:
sort points by X ... O(n log n)
now check for Y
but check only points with bigger X (if you sorted ascending, then those with a larger index)
so now you have O(n log n + n²/2)
You can also speed things up further by doing separate X and Y tests and then combining the results, which leads to O(n + 3n log n):
add an index attribute to your points, where index = 0xYYYYXXXXh is an unsigned integer type
YYYY is the index of the point in the Y-sorted array
XXXX is the index of the point in the X-sorted array
if you have more than 2^16 points, use a bigger than 32-bit data type.
sort points by ascending X and set the XXXX part of their index ... O(n log n)
sort points by ascending Y and set the YYYY part of their index ... O(n log n)
sort points by ascending index ... O(n log n)
now point i can dominate point j only if (i < j)
but if you need to actually create all the pairs for every point, that would take O(n²/2), so this approach would not save any time
if you need just a single pair for any point, then a simple loop will suffice: O(n−1)
so in this case O(n−1 + 3n log n) -> ~O(n + 3n log n)
hope it helped... of course, if you are stuck with that subdivision approach, then I have no better solution for you.
PS. for this you do not need any additional recursion, just 3 sorts and only one uint per point, so the memory requirements are not that big, and it should even be faster in general than a recursive subdivision
This algorithm runs in O(N*log(N)) where N is the size of the list of points and it uses O(1) extra space.
Perform the following steps:
Sort the list of points by y-coordinate (ascending order), breaking ties by x-coordinate (ascending order).
Go through the sorted list in reverse order to count the dominating points:
if the current x-coordinate >= the max x-coordinate encountered so far,
then increment the result and update the max.
This works because if all points with a greater y-coordinate have a smaller x-coordinate than the current point, you have found a dominating point. The sorting step makes this really efficient.
Here's the Python code:
def count_dom_points(points):
    # sort by y ascending, breaking ties by x ascending
    points.sort(key=lambda p: (p[1], p[0]))
    maxi = float('-inf')
    count = 0
    for x, y in reversed(points):
        if x >= maxi:
            count += 1
            maxi = x
    return count

Finding a square whose side length is R in the 2D plane?

I was at an interview at a high-frequency trading firm, and they asked me:
Find a square whose side length is R, given n points in the 2D plane.
Conditions:
-- sides parallel to the axes
-- it contains at least 5 of the n points
-- running complexity must not depend on R
They told me to give them an O(n) algorithm.
Interesting problem, thanks for posting! Here's my solution. It feels a bit inelegant but I think it meets the problem definition:
Inputs: R, P = {(x_0, y_0), (x_1, y_1), ..., (x_N-1, y_N-1)}
Output: (u,v) such that the square with corners (u,v) and (u+R, v+R) contains at least 5 points from P, or NULL if no such (u,v) exists
Constraint: asymptotic run time should be O(n)
Consider tiling the plane with R×R squares. Construct a sparse matrix B defined as
B[i][j] = {(x,y) in P | floor(x/R) = i and floor(y/R) = j}
As you are constructing B, if you find an entry that contains at least five elements, stop and output (u,v) = (i*R, j*R) for the i, j of that entry.
If the construction of B did not yield a solution then either there is no solution or else the square with side length R does not line up with our tiling. To test for this second case we will consider points from four adjacent tiles.
Iterate the non-empty entries in B. For each non-empty entry B[i][j], consider the collection of points contained in the tile represented by the entry itself and in the tiles above and to the right. These are the points in entries: B[i][j], B[i+1][j], B[i][j+1], B[i+1][j+1]. There can be no more than 16 points in this collection, since each entry must have fewer than 5. Examine this collection and test if there are 5 points among the points in this collection satisfying the problem criteria; if so stop and output the solution. (I could specify this algorithm in more detail, but since (a) such an algorithm clearly exists, and (b) its asymptotic runtime is O(1), I won't go into that detail).
If after iterating the entries in B no solution is found then output NULL.
The construction of B involves just a single pass over P and hence is O(N). B has no more than N elements, so iterating it is O(N). The algorithm for each element in B considers no more than 16 points and hence does not depend on N and is O(1), so the overall solution meets the O(N) target.
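Here is a C++ sketch of this (the names are mine; for safety it gathers the full 3×3 neighbourhood of each non-empty tile, which subsumes the 2×2 blocks described above). The inner test relies on the fact that an optimal square can always be slid so that its left and bottom edges each touch a contained point, which keeps it O(1):

#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct P2 { double x, y; };

// Returns true and sets (u, v) if some axis-aligned R x R square with
// corners (u, v) and (u + R, v + R) contains at least 5 of the points.
bool findSquare(const std::vector<P2>& pts, double R, double& u, double& v) {
    std::map<std::pair<long long,long long>, std::vector<int>> B;
    for (int i = 0; i < (int)pts.size(); ++i) {
        std::pair<long long,long long> key((long long)std::floor(pts[i].x / R),
                                           (long long)std::floor(pts[i].y / R));
        B[key].push_back(i);
        if ((int)B[key].size() >= 5) { // a whole tile already works
            u = key.first * R; v = key.second * R;
            return true;
        }
    }
    for (const auto& e : B) { // every tile now holds at most 4 points
        std::vector<int> cand;
        for (int di = -1; di <= 1; ++di)
            for (int dj = -1; dj <= 1; ++dj) {
                auto it = B.find({e.first.first + di, e.first.second + dj});
                if (it != B.end())
                    cand.insert(cand.end(), it->second.begin(), it->second.end());
            }
        // Try every candidate (left edge, bottom edge) pair: O(1) work,
        // since cand holds a bounded number of points.
        for (int a : cand)
            for (int b : cand) {
                int inside = 0;
                for (int c : cand)
                    if (pts[c].x >= pts[a].x && pts[c].x <= pts[a].x + R &&
                        pts[c].y >= pts[b].y && pts[c].y <= pts[b].y + R)
                        ++inside;
                if (inside >= 5) { u = pts[a].x; v = pts[b].y; return true; }
            }
    }
    return false;
}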
Run through the set once, keeping the 5 largest x-values in a (sorted) local array a[]. Maintaining the sorted local array is O(N) (constant time performed at most N times).
Define xMax and xMin as the x-coordinates of the points with the largest and 5th-largest x-values respectively (i.e., a[0] and a[4]).
Sort a[] again on y-value, and set yMax and yMin as above, again in constant time.
Define deltaX = xMax − xMin, deltaY = yMax − yMin, and R = the larger of deltaX and deltaY.
The square of side length R located with its upper-right corner at (xMax, yMax) meets the criteria.
Observation, if R is fixed in advance:
O(N) complexity means no sort is allowed except on a fixed number of points; only a radix sort would meet the criteria, and it requires a constraint on the values of xMax − xMin and of yMax − yMin, which was not provided.
Perhaps the trick is to start with the point furthest down and to the left, and move up and right. The lower-left-most point can be determined in a single pass of the input.
Moving up and right in steps and counting the points in the square requires sorting the points on X and Y in advance, which, to be done in O(N) time, requires that the radix-sort constraint be met.

Fast algorithm to find out the number of points under hyperplane

Given points in Euclidean space, is there a fast algorithm to count the number of points 'under' an arbitrary hyperplane? Fast means time complexity lower than O(n).
Time for preprocessing or sorting the points is okay.
And even if a high-dimensional solution doesn't exist, I'd like to know whether one exists for 2-dimensional space.
Any preprocessing has to visit each point at least once, which is O(n). If you consider a test of which side each point is on as part of the preprocessing, then you've got an O(0) query (with O(n) preprocessing). So I don't think this question makes sense as stated.
Nevertheless, I'll attempt to give a useful answer, even if it's not precisely what the OP asked for.
Choose a hyperplane unit normal N and root point O. If the plane is given in parametric form
(P - O).N == 0
then you have these already; just make sure the normal is unitized.
If it's given in analytic form, Sum(i = 1 to n: a[i] x[i]) + d = 0, then the vector A = (a[1], ..., a[n]) is a normal of the plane, and N = A/||A|| is the unit plane normal. A point O (for origin) on the plane is then O = -(d/||A||) N.
You can test which side each point P is on by projecting it onto N and checking the sign of the parameter:
Let V = P - O. V is the vector from the chosen origin O to P.
Let sN be the projection of V onto N. If s is negative, then P is "under" the hyperplane.
You should go to the link on vector projection if you're rusty on the subject, but I'll summarize here using my notation. Or, you can take my word for it, and just skip to the formula at the end.
If alpha is the angle between V and N, then from the definition of cosine we have cos(alpha) = s||N||/||V|| = s/||V||, since N is a unit normal. But we also know from vector algebra that cos(alpha) = (V.N)/(||V|| ||N||) = (V.N)/||V||, where "." is the scalar product (a.k.a. dot product, or Euclidean inner product).
Equating these two expressions for cos(alpha), we have
s = V.N
So your preprocessing work is to compute N and O, and your test is:
bool is_under = (dot(V, N) < 0.);
I don't believe it can be done any faster.
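In code, the preprocessing and the test might look like this (a sketch for the analytic form; the names are mine):

#include <cmath>
#include <cstddef>
#include <vector>

struct Hyperplane {
    std::vector<double> N; // unit normal
    double offset;         // d / ||A||
};

// Preprocess the analytic form Sum(a[i] x[i]) + d = 0.
Hyperplane preprocess(const std::vector<double>& a, double d) {
    double norm = 0.0;
    for (double ai : a) norm += ai * ai;
    norm = std::sqrt(norm);
    Hyperplane h;
    for (double ai : a) h.N.push_back(ai / norm);
    h.offset = d / norm;
    return h;
}

// The signed distance of P from the plane is N.P + d/||A||;
// P is "under" the plane when it is negative.
bool isUnder(const Hyperplane& h, const std::vector<double>& P) {
    double s = h.offset;
    for (std::size_t i = 0; i < P.size(); ++i) s += h.N[i] * P[i];
    return s < 0.0;
}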
When setting the point values, check the side condition as each point is set, and increment (or don't increment) the counter accordingly. O(n)
I found an O(log N) algorithm in 2D using divide-and-conquer and binary search, with O(N log N) preprocessing time and O(N log N) memory.
The basic idea is that the points can be divided into the left N/2 points and the right N/2 points, and the number of points under the line (in 2D) is the sum of the number of left points under the line and the number of right points under the line. I'll call the infinite line that divides the whole point set into 'left' and 'right' the 'dividing line'; it looks like 'x = k'.
If the 'left points' and the 'right points' are each sorted in y-axis order, then the number of specific points (the points at the lower-right corner) can be found quickly by binary searching for the number of points whose y-values are lower than the y-value of the intersection of the query line with the dividing line.
Therefore the time complexity is
T(N) = 2T(N/2) + O(log N)
and finally the time complexity is O(log N)

Efficient algorithm for finding spheres farthest apart in large collection

I've got a collection of 10000 - 100000 spheres, and I need to find the ones farthest apart.
One simple way to do this is to simply compare all the spheres to each other and store the biggest distance, but this feels like a real resource hog of an algorithm.
The Spheres are stored in the following way:
Sphere (float x, float y, float z, float radius);
The method Sphere::distanceTo(Sphere &s) returns the distance between the two center points of the spheres.
Example:
Sphere *spheres;
float biggestDistance = 0.0f;
for (int i = 0; i < nOfSpheres; i++) {
    for (int j = 0; j < nOfSpheres; j++) {
        float d = spheres[i].distanceTo(spheres[j]);
        if (d > biggestDistance) {
            biggestDistance = d;
        }
    }
}
What I'm looking for is an algorithm that somehow loops through all the possible combinations in a smarter way, if there is any.
The project is written in C++ (which it has to be), so any solutions that only work in languages other than C/C++ are of less interest.
The largest distance between any two points in a set S of points is called the diameter. Finding the diameter of a set of points is a well-known problem in computational geometry. In general, there are two steps here:
Find the three-dimensional convex hull composed of the center of each sphere -- say, using the quickhull implementation in CGAL.
Find the points on the hull that are farthest apart. (Two points on the interior of the hull cannot be part of the diameter, or otherwise they would be on the hull, which is a contradiction.)
With quickhull, you can do the first step in O(n log n) average-case and O(n²) worst-case running time. (In practice, quickhull significantly outperforms all other known algorithms.) It is possible to guarantee a better worst-case bound if you can guarantee certain properties about the ordering of the spheres, but that is a different topic.
The second step can be done in O(h log h), where h is the number of points on the hull. In the worst case, h = n (every point is on the hull), but that's pretty unlikely if you have thousands of random spheres. In general, h will be much smaller than n. Here's an overview of this method.
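As a rough sketch of the second step, assuming you already have the hull vertices from quickhull (e.g. via CGAL), the brute-force O(h²) scan below is usually acceptable when h is small, though it is not the O(h log h) method referenced above:

#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Farthest pair among the hull vertices; squared distances avoid
// calling sqrt inside the loop.
double hullDiameter(const std::vector<Vec3>& hull) {
    double best2 = 0.0;
    for (std::size_t i = 0; i < hull.size(); ++i)
        for (std::size_t j = i + 1; j < hull.size(); ++j) {
            double dx = hull[i].x - hull[j].x;
            double dy = hull[i].y - hull[j].y;
            double dz = hull[i].z - hull[j].z;
            double d2 = dx * dx + dy * dy + dz * dz;
            if (d2 > best2) best2 = d2;
        }
    return std::sqrt(best2);
}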
Could you perhaps store these spheres in a BSP Tree? If that's acceptable, then you could start by looking for nodes of the tree containing spheres which are furthest apart. Then you can continue down the tree until you get to individual spheres.
Your problem looks like something that could be solved using graphs. Since the distance from sphere A to sphere B is the same as the distance from sphere B to sphere A, you can minimize the number of comparisons you have to make.
I think what you're looking at here is known as an adjacency list. You can build one up, then traverse it to find the longest distance.
Another approach will still give you O(n²), but minimizes the number of distance calculations: store the result of each calculation in a hash table where the key is the name of the edge (so AB would hold the length from A to B). Before you perform a distance calculation, check whether AB or BA already exists in the hash table.
EDIT
Using the adjacency-list method (which is basically a breadth-first search) you get O(b^d), or in the worst case O(|E| + |V|).
Paul got my brain thinking, and you can optimize a bit by changing
for (int j = 0; j < nOfSpheres; j++)
to
for (int j = i + 1; j < nOfSpheres; j++)
You don't need to compare sphere A to B AND B to A. This halves the number of comparisons, although the search is still O(n²).
--- Addition -------
Another thing that makes this calculation expensive is the distanceTo calculations.
distance = sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)
That's a lot of math. You can trim it down by checking whether
(x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2 > maxdist^2
This removes the sqrt until the end.
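For instance, a hypothetical squared-distance helper alongside the Sphere class (the member layout is assumed from the constructor shown above):

#include <cmath>

struct Sphere { float x, y, z, radius; };

// Compare squared center distances inside the loops and take a single
// sqrt at the end, when the actual largest distance is reported.
inline float distanceSquared(const Sphere& a, const Sphere& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}
// usage: track best2 = max(best2, distanceSquared(a, b)) in the loop,
// then report std::sqrt(best2) once at the end.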
