Fast algorithm to find out the number of points under hyperplane - algorithm

Given points in Euclidean space, is there a fast algorithm to count the number of points 'under' one arbitrary hyperplane? Fast means time complexity lower than O(n)
Time for preprocessing or sorting the points is okay
And, even if not high dimensional, I'd like to know whether there exists one that can be used in 2 dimension space

If you're willing to preprocess the points, then you have to visit each one at least once, which is O(n). If you consider a test of which side the point is on as part of the preprocessing then you've got an O(0) algorithm (with O(n) preprocessing). So I don't think this question makes sense as stated.
Nevertheless, I'll attempt to give a useful answer, even if it's not precisely what the OP asked for.
Choose a hyperplane unit normal and root point. If the plane is given in parametric form
(P - O).N == 0
then you have these already, just make sure the normal is unitized.
If it's given in analytic form: Sum(i = 1 to n: a[i] x[i]) + d = 0, then the vector A = (a1, ... a[n]) is a normal of the plane, and N = A/||A|| is the unit plane normal. A point O (for origin) on the plane is d N.
You can test which side each point P is on by projecting it onto N add checking the sign of the parameter:
Let V = P - O. V is the vector from the chosen origin O to P.
Let s N be the projection of V onto N. If s is negative, then P is "under" the hyperplane.
You should go to the link on vector projection if you're rusty on the subject, but I'll summarize here using my notation. Or, you can take my word for it, and just skip to the formula at the end.
If alpha is the angle between V and N, then from the definition of cosine we have cos(alpha) = s||N||/||V|| = s/||V|| since N is a unit normal. But we also know from vector algebra that cos(alpha) = ||V||(V.N), where "." is scalar product (a.k.a. dot product, or euclidean inner product).
Equating these two expressions for cos(alpha) we have
s = (V.V)(V.N)
(using the fact that ||V||^2 == V.V).
So your proprocesing work is to compute N and O, and your test is:
bool is_under = (dot(V, V)*dot(V, N) < 0.);
I don't believe it can be done any faster.

When setting the point values, use checking conditions at that point setting. Then increment or dont increment the counter. O(n)

I found O(logN) algorithm in 2D dimension by using divide-and-conquer and binary search with O(N log N) preprocessing time complexity and O(N log N) memory complexity
The basic idea is that points can be divided into left N/2 points and right N/2 points, and the number of points that's under the line(in 2D dimension) is sum of the number of left points under the line and the number of the right points under the line. I'll call the infinite line that divides whole points into 'left' and 'right' as 'dividing line'. Dividing line will be look like 'x = k'
If each 'left points' and 'right points' are sorted by y-axis order, then the number of specific points - the points at the right lower corner - can be quickly found by using binary searching 'the number of points whose y values are lower than the y value of intersection point of the line and the Dividing line'.
Therefore time complexity is
T(N) = 2T(N/2) + O(log N)
and finally the time complexity is O(log N)

Related

Find the m by m square that contains the most "conflicting pairs"?

There are two types of units on a 2d plane, green units (G) and red units (R).
The plane is represented as an n by n matrix, each unit is represented as an element in the matrix.
A pair of two units is called a "conflicting pair" if the two are of different colours. The goal is to find the m by m submatrix that contains the most "conflicting pairs".
Example
[R R 0 0 0
R R 0 0 0
0 0 R R 0
0 0 0 G G
0 0 0 G G]
In the above 5 by 5 matrix, the "most conflicting" 3 by 3 submatrix is at the lower right corner, where there are two red units and four green units, which amounts to 8 conflicting pairs within the submatrix.
A naive solution will take O(m^2n^2) for iterating every element in every possible submatrix.
I also thought of using dynamic programming like the Summed-area table algorithm, the time complexity will then be O(n^2), which looks good since it's already O(n^2) for scanning each element once.
However the n by n matrix may be large and sparse and given in a sparse format (like CSR), in that case an O(n^2) algorithm may not be efficient. Any suggeststions on how do I do better for sparse matrices (and dense matrices)?
If you have k non-empty cells (with R or G) then you can solve with time complexity O(k^2) (squeeze the matrix) because optimal submatrix has one non-empty cell on the border of the matrix.
Or time complexity maybe O(k * (log n)^2) if use two dimension sparse segments tree for getting sum on a rectangle.
The answer is given by
idx = argmax SUM(X_r,m) * SUM(X_g,m)
where SUM(X,m) returns a matrix with the summation of units in each m x m window, X_r and X_g are the matrices with only red and green units enabled respectively, and idx is the m x m window with the largest number of conflicting nodes.
The question then becomes can SUM(X,m) be more efficiently calculated for sparse matrices. I think the answer is: it really depends on the structure of X and the value of m.
An obvious way to make use of the sparsity of X is to compute SUM(X,m) by using the identity
SUM(X,m) = transpose(SUM1d( transpose(SUM1d(X,m) ), m )) (1)
where SUM1d(X,m) is the results of summing intervals of length m along rows of X. Clearly, SUM1d can be implemented in O(n) time for each row, and O(n^2) for the entire matrix, in a similar fashion to the Sum-Area-Table algorithm. This yields the same complexity O(n^2) for the entire algorithm. But that is rather uninteresting as it's the same runtime as a Sum-Area-Table algorithm.
What is interesting is asking whether SUM1d(X,m) can be implemented to take advantage of any sparsity of X. It's clear that SUM1d can be implemented to take full advantage of the sparsity of the input matrix; however, depending on the structure of X and the size of m the output matrix may not be sparse.
Assuming, m is much less than n then implementing SUM1d(X,m) as described in eq (1) above can be done in O(nz_row) time where nz_row is the max number of non-zero elements on any of the rows of X. Furthermore, SUM1d(X,m) will produce a sparse matrix, albeit with O(m) less sparsity. Since we assume m is much less than n this is still a sparse matrix and will still translate to efficiency gains.
Therefore, we should expect O(n*nz_row) for the first call to SUM1d in eq (1) and O(n*m*nz_col) for the second call to SUM1d.

How to find minimal and maximal angle determined by three points selected from n points?

Given n points in 2-D plane, like (0,0),(1,1), ... We can select any three points from them to construct angle. For example, we choose A(0, 0), B(1, 1), C(1, 0), then we get angle ABC = 45 degree, ACB = 90 degree and CAB = 45 degree.
My question is how to calculate max or min angle determined by three points selected from n points.
Obviously, we can use brute-force algorithm - calculate all angels and find maximal and minimal value, using Law Of Cosines to calculate angles and Pythagorean theorem to calculate distances. But does efficient algorithm exist?
If I'm correct, brute-force runs in O(n^3): you basically take every triplet, compute the 3 angles, and store the overall max.
You can improve slightly to O(n^2 * log(n)) but it's trickier:
best_angle = 0
for each point p1:
for each point p2:
compute vector (p1, p2), and the signed angle it makes with the X-axis
store it in an array A
sort array A # O(n * log(n))
# Traverse A to find the best choice:
for each alpha in A:
look for the element beta in A closest to alpha+Pi
# Takes O(log n) because it's sorted. Don't forget to take into account the fact that A represents a circle: both ends touch...
best_angle = max(best_angle, abs(beta - alpha))
The complexity is O(n * (n + nlog(n) + n * log(n))) = O(n^2 * log(n))
Of course you can also retrieve the pt1, pt2 that obtained the best angle during the loops.
There is probably still better, this feels like doing too much work overall, even if you re-use your previous computations of pt1 for pt2, ..., ptn ...

Data structure to support a particular query on a set of 2D points

I have a set of 2D points, and I want to be able to make the following query with arguments x_min and n: what are the n points with largest y which have x > x_min?
To rephrase in Ruby:
class PointsThing
def initialize(points)
#points = points
end
def query(x_min, n)
#points.select { |point| point.x > x_min }.sort_by { |point| point.y }.take(n)
end
end
Ideally, my class would also support an insert and delete operation.
I can't think of a data structure for this which would enable the query to run in less than O(|#points|) time. Does anyone know one?
Sort the points by x descending. For each point in order, insert it into a purely functional red-black tree ordered by y descending. Keep all of the intermediate trees in an array.
To look up a particular x_min, use binary search to find the intermediate tree where exactly the points with x > x_min have been inserted. Traverse this tree to find the first n points.
The preprocessing cost is O(p log p) in time and space, where p is the number of points. The query time is O(log p + n), where n is the number of points to be returned in the query.
If your data are not sorted, then you have no choice but to check every point since you cannot know if there exists another point for which y is greater than that of all other points and for which x > x_min. In short: you can't know if another point should be included if you don't check them all.
In that case, I would assume that it would be impossible to check in sublinear time as you ask for, since you have to check them all. Best case for searching all would be linear.
If your data are sorted, then your best case will be constant time (all n points are those with the greatest y), and worst case would be linear (all n points are those with least y). Average case would be closer to constant I think if your x and x_min are both roughly random within a specific range.
If you want this to scale (that is, you could have large values of n), you will want to keep your resultant set sorted as well since you will need to check new potential points against it and to drop the lowest value when you insert (if size > n). Using a tree, this can be log time.
So, to do the entire thing, worst case is for unsorted points, in which case you're looking at nlog(n) time. Sorted points is better, in which case you're looking at average case of log(n) time (again, assuming roughly randomly distributed values for x and x_min), which yes is sub-linear.
In case it isn't at first obvious why sorted points will have have constant time to search through, I will go over that here quickly.
If the n points with the greatest y values all had x > x_min (the best case) then you are just grabbing what you need off the top, so that case is obvious.
For the average case, assuming roughly randomly distributed x and x_min, the odds that x > x_min are basically half. For any two random numbers a and b, a > b is just as likely to be true as b > a. This is the same thing with x and x_min; x > x_min is equally as likely to be true as x_min > x, meaning 0.5 probability. This means that, for your points, on average every second point checked will meet your x > x_min requirement, so on average you will check 2n points to find the n highest points that meet your criteria. So the best case was c time, average is 2c which is still constant.
Note, however, that for values of n approaching the size of the set this hides the fact that you are going through the entire set, essentially bringing it right back up to linear time. So my assertion that it is constant time does not hold true if you assume random values of n within the range of the size of your set.
If this is not a purely academic question and is prompted by some actual need, then it depends on the situation.
(edit)
I just realized that my constant-time assertions were assuming a data structure where you have direct access to the highest value and can go sequentially to lower values. If the data structure that those are provided to you in does not fit that description, then obviously that will not be the case.
Some precomputation would help in this case.
First partition the set of points taking x_min as pivot element.
Then for set of points lying on right side of x_min build a max_heap based on y co-ordinates.
Now run your query as: Perform n extract_max operations on the built max_heap.
The running time of your query would be log X + log (X-1) + ..... log (X-(n-1))
log X: For the first extract max operation.
log X-1: For the second extract max operation and so on.
X: Size of original Max heap.
Even in the worst case when your n << X , The time taken would be O(n log X).
Notation
Let P be the set of points.
Let top_y ( n, x_min) describe the query to collect the n points from P with the largest y-coordinates among those with x-coordinate greater than or equal to `x_min' .
Let x_0 be the minimum of x coordinates in your point set. Partition the x axis to the right of x_0 into a set of left-hand closed, right-hand open intervals I_i by the set of x coordinates of the point set P such that min(I_i) is the i-th but smallest x coordinate from P. Define the coordinate rank r(x) of x as the index of the interval x is an element of or 0 if x < x_0.
Note that r(x) can be computed in O(log #({I_i})) using a binary search tree.
Simple Solution
Sort your point set by decreasing y-coordinates and save this array A in time O(#P log #P) and space O(#P).
Process each query top_y ( n, x_min ) by traversing this array in order, skipping over items A_i: A_i.x < x_0, counting all other entries until the counter reaches n or you are at the end of A. This processing takes O(n) time and O(1) space.
Note that this may already be sufficient: Queries top_y ( n_0, a_0 ); a_0 < min { p.x | p \in P }, n_0 = c * #P, c = const require step 1 anyway and for n << #P and 'infrequent' queries any further optimizations weren't worth the effort.
Observation
Consider the sequences s_i,s_(i+1)of points with x-coordinates greater than or equal tomin(I_i), min(I_(i+1)), ordered by decreasing y-coordinate.s_(i+1)is a strict subsequence ofs_i`.
If p_1 \in s_(i+1) and p_2.x >= p_1.x then p_2 \in s_(i+1).
Refined Solution
A refined data structure allows for O(n) + O(log #P) query processing time.
Annotate the array A from the simple solution with a 'successor dispatch' for precisely those elements A_i with A_(i+1).x < A_i.x; This dispatch data would consist of an array disp:[r(A_(i+1).x) + 1 .. r(A_i.x)] of A-indexes of the next element in A whose x-coordinate ranks at least as high as the index into disp. The given dispatch indices suffice for processing the query, since ...
... disp[j] = disp[r(A_(i+1).x) + 1] for each j <= r(A_(i+1).x).
... for any x_min with r(x_min) > r(A_i.x), the algorithm wouldn't be here
The proper index to access disp is r(x_min) which remains constant throughout a query and thus takes O(log #P) to compute once per query while the index selection itself is O(1) at each A element.
disp can be precomputed. No two disp entries across all disp arrays are identical (Proof skipped, but it's easy [;-)] to see given the construction). Therefore the construction of disp arrays can be performed stack-based in a single sweep through the point set sorted in A. As there are #P entries, the disp structure takes O(#P) space and O(#P) time to construct, being dominated by space and time requirements for y-sorting. So in a certain sense, this structure comes for free.
Time requirements for query top_y(n,x_min)
Computing r(x_min): O(log #P);
Passage through A: O(n);

A divide-and-conquer algorithm for counting dominating points?

Let's say that a point at coordinate (x1,y1) dominates another point (x2,y2) if x1 ≤ x2 and y1 ≤ y2;
I have a set of points (x1,y1) , ....(xn,yn) and I want to find the total number of dominating pairs. I can do this using brute force by comparing all points against one another, but this takes time O(n2). Instead, I'd like to use a divide-and-conquer approach to solve this in time O(n log n).
Right now, I have the following algorithm:
Draw a vertical line dividing the set of points points into two equal subsets of Pleft and Pright. As a base case, if there are just two points left, I can compare them directly.
Recursively count the number of dominating pairs in Pleft and Pright
Some conquer step?
The problem is that I can't see what the "conquer" step should be here. I want to count how many dominating pairs there are that cross from Pleft into Pright, but I don't know how to do that without comparing all the points in both parts, which would take time O(n2).
Can anyone give me a hint about how to do the conquer step?
so the 2 halves of y coordinates are : {1,3,4,5,5} and {5,8,9,10,12}
i draw the division line.
Suppose you sort the points in both halves separately in ascending order by their y coordinates. Now, look at the lowest y-valued point in both halves. If the lowest point on the left has a lower y value than the lowest point on the right, then that point is dominated by all points on the right. Otherwise, the bottom point on the right doesn't dominate anything on the left.
In either case, you can remove one point from one of the two halves and repeat the process with the remaining sorted lists. This does O(1) work per point, so if there are n total points, this does O(n) work (after sorting) to count the number of dominating pairs across the two halves. If you've seen it before, this is similar to the algorithm for counting inversions in an array).
Factoring in the time required to sort the points (O(n log n)), this conquer step takes O(n log n) time, giving the recurrence
T(n) = 2T(n / 2) + O(n log n)
This solves to O(n log2 n) according to the Master Theorem.
However, you can speed this up. Suppose that before you start the divide amd conquer step that you presort the points by their y coordinates, doing one pass of O(n log n) work. Using tricks similar to the closest pair of points problem, you can then get the points in each half sorted in O(n) time on each subproblem of size n (see the discussion at this bottom of this page) for details). That changes the recurrence to
T(n) = 2T(n / 2) + O(n)
Which solves to O(n log n), as required.
Hope this helps!
Well in this way you have O(n^2) just for division to subsets...
My approach would be different
sort points by X ... O(n.log(n))
now check for Y
but check only points with bigger X (if you sort them ascending then with larger index)
so now you have O(n.log(n)+(n.n/2))
You can also further speed things up by doing separate X and Y test and after that combine the result, that would leads O(n + 3.n.log(n))
add index attribute to your points
where index = 0xYYYYXXXXh is unsigned integer type
YYYY is index of point in Y-sorted array
XXXX is index of point in X-sorted array
if you have more than 2^16 points use bigger then 32-bit data-type.
sort points by ascending X and set XXXX part of their index O1(n.log(n))
sort points by ascending Y and set YYYY part of their index O2(n.log(n))
sort points by ascending index O3(n.log(n))
now point i dominates any point j if (i < j)
but if you need to create actually all the pairs for any point
that would take O4(n.n/2) so this approach will save not a bit of time
if you need just single pair for any point then simple loop will suffice O4(n-1)
so in this case O(n-1+3.n.log(n)) -> ~O(n+3.n.log(n))
hope it helped,... of course if you are stuck with that subdivision approach than i have no better solution for you.
PS. for this you do not need any additional recursion just 3x sorting and only one uint for any point so the memory requirements are not that big and even should be faster than recursive call to subdivision recursion in general
This algorithm runs in O(N*log(N)) where N is the size of the list of points and it uses O(1) extra space.
Perform the following steps:
Sort the list of points by y-coordinate (ascending order), break ties by
x-coordinate (ascending order).
Go through the sorted list in reverse order to count the dominating points:
if the current x-coordinate >= max x-coordinate value encountered so far
then increment the result and update the max.
This works since you know for sure that if all pairs with a greater y-coordinates have a smaller x-coordinate than the current point you have found a dominating points. The sorting step makes it really efficient.
Here's the Python code:
def my_cmp(p1, p2):
delta_y = p1[1] - p2[1]
if delta_y != 0:
return delta_y
return p1[0] - p2[0]
def count_dom_points(points):
points.sort(cmp = my_cmp)
maxi = float('-inf')
count = 0
for x, y in reversed(points):
if x >= maxi:
count += 1
maxi = x
return count

Finding a square side length is R in 2D plane ?

I was at the high frequency Trading firm interview, they asked me
Find a square whose length size is R with given n points in the 2D plane
conditions:
--parallel sides to the axis
and it contains at least 5 of the n points
running complexity is not relative to the R
they told me to give them O(n) algorithm
Interesting problem, thanks for posting! Here's my solution. It feels a bit inelegant but I think it meets the problem definition:
Inputs: R, P = {(x_0, y_0), (x_1, y_1), ..., (x_N-1, y_N-1)}
Output: (u,v) such that the square with corners (u,v) and (u+R, v+R) contains at least 5 points from P, or NULL if no such (u,v) exist
Constraint: asymptotic run time should be O(n)
Consider tiling the plane with RxR squares. Construct a sparse matrix, B defined as
B[i][j] = {(x,y) in P | floor(x/R) = i and floor(y/R) = j}
As you are constructing B, if you find an entry that contains at least five elements stop and output (u,v) = (i*R, j*R) for i,j of the matrix entry containing five points.
If the construction of B did not yield a solution then either there is no solution or else the square with side length R does not line up with our tiling. To test for this second case we will consider points from four adjacent tiles.
Iterate the non-empty entries in B. For each non-empty entry B[i][j], consider the collection of points contained in the tile represented by the entry itself and in the tiles above and to the right. These are the points in entries: B[i][j], B[i+1][j], B[i][j+1], B[i+1][j+1]. There can be no more than 16 points in this collection, since each entry must have fewer than 5. Examine this collection and test if there are 5 points among the points in this collection satisfying the problem criteria; if so stop and output the solution. (I could specify this algorithm in more detail, but since (a) such an algorithm clearly exists, and (b) its asymptotic runtime is O(1), I won't go into that detail).
If after iterating the entries in B no solution is found then output NULL.
The construction of B involves just a single pass over P and hence is O(N). B has no more than N elements, so iterating it is O(N). The algorithm for each element in B considers no more than 16 points and hence does not depend on N and is O(1), so the overall solution meets the O(N) target.
Run through set once, keeping the 5 largest x values in a (sorted) local array. Maintaining the sorted local array is O(N) (constant time performed N times at most).
Define xMin and xMax as the x-coordinates of the two points with largest and 5th largest x values respectively (ie (a[0] and a[4]).
Sort a[] again on Y value, and set yMin and yMax as above, again in constant time.
Define deltaX = xMax- xMin, and deltaY as yMax - yMin, and R = largest of deltaX and deltaY.
The square of side length R located with upper-right at (xMax,yMax) meets the criteria.
Observation if R is fixed in advance:
O(N) complexity means no sort is allowed except on a fixed number of points, as only a Radix sort would meet the criteria and it requires a constraint on the values of xMax-xMin and of yMax-yMin, which was not provided.
Perhaps the trick is to start with the point furthest down and left, and move up and right. The lower-left-most point can be determined in a single pass of the input.
Moving up and right in steps and counitng points in the square requries sorting the points on X and Y in advance, which to be done in O(N) time requiress that the Radix sort constraint be met.

Resources