Computational Geometry set of points algorithm

I have to design an algorithm with running time O(nlogn) for the following problem:
Given a set P of n points, determine a value A > 0 such that the shear transformation (x,y) -> (x+Ay,y) does not change the order (in x direction) of points with unequal x-coordinates.
I am having a lot of difficulty even figuring out where to begin.
Any help with this would be greatly appreciated!
Thank you!

I think y = 0.
When x = 0, A > 0
(x,y) -> (x+Ay,y)
-> (0+(A*0),0) = (0,0)
When x = 1, A > 0
(x,y) -> (x+Ay,y)
-> (1+(A*0),0) = (1,0)
with unequal x-coordinates, (2,0), (3,0), (4,0)...
So, I think that the begin point may be (0,0), x=0.

Suppose all x,y coordinates are positive numbers. (Without loss of generality, one can add offsets.) In time O(n log n), sort a list L of the points, primarily in ascending order by x coordinates and secondarily in ascending order by y coordinates. In time O(n), process point pairs (in L order) as follows. Let p, q be any two consecutive points in L, and let px, qx, py, qy denote their x and y coordinate values. Because L is sorted, px <= qx, so there are three cases: If px=qx, do nothing. Else, if py<=qy, do nothing (the shear moves q at least as far to the right as p). Else (px<qx, py>qy) require that px + A*py < qx + A*qy, i.e. A < (qx-px)/(py-qy).
So: Go through L in order, and find the smallest bound A' = (qx-px)/(py-qy) over all consecutive point pairs where px<qx and py>qy. Then choose a value of A that's a little less than A', for example, A'/2. (If the object of the problem is to find the largest such A, note that A' itself makes two points collide, so A' is the least upper bound of the valid values rather than a valid value.) If no pair imposes a bound, any A > 0 works.
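A minimal Python sketch of this scan, assuming integer coordinates so that `Fraction` keeps the bound exact. The function names `shear_bound` and `choose_shear` are made up for illustration. For each consecutive pair with px < qx and py > qy it applies the constraint A < (qx - px)/(py - qy) and keeps the smallest bound.

```python
from fractions import Fraction

def shear_bound(points):
    """Return the least upper bound A' on valid shear factors, or None if
    every A > 0 preserves the x-order. Assumes integer coordinates."""
    pts = sorted(points)  # ascending by x, ties broken by ascending y
    bound = None
    for (px, py), (qx, qy) in zip(pts, pts[1:]):
        if px == qx or py <= qy:
            continue  # equal x, or q moves at least as far right as p
        limit = Fraction(qx - px, py - qy)  # need A < limit
        if bound is None or limit < bound:
            bound = limit
    return bound

def choose_shear(points):
    """Pick a concrete A strictly below the bound (A'/2, or 1 if unbounded)."""
    bound = shear_bound(points)
    return Fraction(1) if bound is None else bound / 2
```

For the points (0,3), (1,0), (2,5) the only constraining pair is (0,3) and (1,0), so the bound is 1/3 and the sketch picks A = 1/6.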

Ok, here's a rough stab at a method.
Sort the list of points by x order, breaking ties between equal x values by y so that comparing consecutive points is enough. (This gives the O(nlogn)--all the following steps are O(n).)
Generate a new list of dx_i = x_(i+1) - x_i, the differences between the x coordinates. As the x_i are ordered, all of these dx_i >= 0.
Now for some A, the transformed dx_i(A) will be x_(i+1) - x_i + A * (y_(i+1) - y_i). There will be an order change if this is negative or zero, i.e. x_(i+1)(A) <= x_i(A).
So for each dx_i, find the value of A that would make dx_i(A) zero, namely
A_i = - (x_(i+1) - x_i)/(y_(i+1) - y_i). You now have a list of coefficients that would 'cause' an order swap between a consecutive (in x-order) pair of points. Watch for division by zero, but that's the case where two points have the same y, these points will not change order. Some of the A_i will be negative, discard these as you want A>0. (Negative A_i will also induce an order swap, so the A>0 requirement is a little arbitrary.)
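Find the smallest A_i > 0 in the list. Any A with 0 < A < A_i(min) will be a shear that does not change the order of your points. If ties are acceptable you can pick A_i(min) itself, as that will bring two points to the same x, but not past each other; otherwise pick something smaller, such as A_i(min)/2. If no A_i is positive, every A > 0 works.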

Related

Maximum area rectangle

Given a set of points (x[1]; y[1]), (x[2]; y[2]), ..., (x[n]; y[n]), we need to find the maximum area of a rectangle whose vertices all belong to the point set. The rectangle does not have to be axis-aligned. For example, the answer for (1; 1), (2; 2), (2; 0), (3; 1) is 2.
n <= 1300; -10^9 <= x[i], y[i] <= 10^9.
Can someone help me with this problem? My solution is brute-force O(N^3) and it gets TLE (time limit exceeded): I select three points and compute the fourth.
Every pair of points determines a line L, which has a slope m and an intercept c. (Ignore vertical lines for now.) Instead of considering the intercept, let's work with a different quantity that gives much the same information: The distance d(L) between the line and the origin, i.e., the length of a line segment R perpendicular to L and connecting L to the origin. Additionally, we can talk about the "displacement" of a point along L: We can say that the point p on L where it meets R has displacement 0, and the point on L that is x "above" p (has distance x from p and higher y coordinate) has displacement x, with negative displacements for points "below" p. In fact, we don't need the intercept or d(L) to define the displacement of a point with respect to a line L -- just the line's slope. Define disp(m, q) to be the displacement of point q on a line with slope m.
Suppose a, b, c, d are the vertices of a rectangle, with sides ab, bc, cd and da. Observe that the line containing ab has the same slope m as the line containing cd, and (disp(m, a), disp(m, b)) = (disp(m, d), disp(m, c)). So the only 4-tuples of vertices that we need to test are those comprised of pairs of vertex pairs like ab and cd -- vertex pairs having the same slope and displacement pairs. Furthermore, one side length (shared by ab and cd) is equal to |disp(m, b) - disp(m, a)|, and the other side length will be |d(Lab) - d(Lcd)|, where Lab and Lcd are the lines containing the line segments ab and cd, respectively.
To find these 4-tuples of vertices efficiently:
For all ordered pairs of vertices i, j (i ≠ j):
Let L be the line passing through i and j. Compute its slope m and distance d(L) from the origin (signed, so that parallel lines on opposite sides of the origin are told apart). Also compute disp(m, i) and disp(m, j). If disp(m, i) <= disp(m, j), add the tuple (m, disp(m, i), disp(m, j), d(L)) to an array Z.
Sort Z lexicographically. This will place all point pairs lying on lines of the same slope and having equal displacements in a contiguous block, ordered by increasing d(L).
Scan through the array, looking for block boundaries -- positions k at which any of the first three tuple elements changes, plus the end of the array. Let prev be the last such k found (initially, prev = 0). For each such k:
Compute (Z[k-1][3] - Z[prev][3]) * (Z[k-1][2] - Z[k-1][1]). This is the area of the largest rectangle having a pair of sides with slope Z[k-1][0] and length (Z[k-1][2] - Z[k-1][1]). If this is greater than the maximum rectangle size found so far, update it. Then set prev = k.
This algorithm takes O(n^2 log n) time and O(n^2) space.
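A minimal Python sketch of the sort-and-scan idea, with one substitution that is not in the answer above: instead of a floating-point slope it keys each pair on a reduced integer direction vector (dx, dy), and uses dot and cross products with that vector in place of disp and d(L). Both are then scaled by the length of (dx, dy), which is divided out at the end. This keeps everything exact for integer coordinates and also covers vertical lines. The function name `max_rectangle_area` is hypothetical, and the sketch assumes integer coordinates.

```python
from itertools import combinations, groupby
from math import gcd

def max_rectangle_area(points):
    """Largest area of a rectangle with all four vertices in `points`
    (0 if there is none). Assumes integer coordinates."""
    Z = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        dx, dy = x2 - x1, y2 - y1
        g = gcd(dx, dy)
        if g == 0:
            continue  # duplicate point, no line
        dx, dy = dx // g, dy // g
        if dx < 0 or (dx == 0 and dy < 0):
            dx, dy = -dx, -dy  # canonical orientation of the direction
        s1 = x1 * dx + y1 * dy  # displacement along the line (scaled)
        s2 = x2 * dx + y2 * dy
        d = dx * y1 - dy * x1   # signed offset of the line (scaled)
        Z.append((dx, dy, min(s1, s2), max(s1, s2), d))
    Z.sort()
    best = 0
    # Each block holds parallel segments with identical displacement pairs,
    # ordered by offset, so first and last give the widest rectangle.
    for (dx, dy, lo, hi), block in groupby(Z, key=lambda t: t[:4]):
        block = list(block)
        width = block[-1][4] - block[0][4]
        best = max(best, (hi - lo) * width // (dx * dx + dy * dy))
    return best
```

On the example from the question, (1; 1), (2; 2), (2; 0), (3; 1), the segments (1,1)-(2,2) and (2,0)-(3,1) land in the same block and give area 2. Like the description above, the sketch takes O(n^2 log n) time and O(n^2) space.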

Algorithm to find if triangles formed by set of points contains origin or not and give total count as well?

Input: S = {p1, . . . , pn}, n points on 2D plane each point is given by its x and y-coordinate.
For simplicity, we assume:
The origin (0, 0) is NOT in S.
Any line L passing through (0, 0) contains at most one point in S.
No three points in S lie on the same line.
If we pick any three points from S, we can form a triangle. So the total number of triangles that can be formed this way is Θ(n^3).
Some of these triangles contain (0, 0), some do not.
Problem: Calculate the number of triangles that contain (0, 0).
You may assume we have an O(1) time function Test(pi, pj , pk) that, given three points pi, pj , pk in S, returns 1, if the triangle formed by {pi, pj , pk} contains (0, 0), and returns 0 otherwise. It’s trivial to solve the problem in Θ(n^3) time (just enumerate and test all triangles).
Describe an algorithm for solving this problem with O(n log n) run time.
My analysis of the above problem leads to the following conclusion.
There are 4 quadrants, ( + ,+ ) , ( + ,- ) , ( -, - ), ( -, + ), depending on whether the x and y coordinates are > 0 or not.
Let
s1 = coordinate x < 0 and y > 0
s2 = x > 0 , y > 0
s3 = x < 0 , y < 0
s4 = x > 0 , y < 0
Now we need to do the testing of points in between sets of the following combinations only
S1 S2 S3
S1 S1 S4
S2 S2 S3
S3 S3 S2
S1 S4 S4
S1 S3 S4
S1 S2 S4
S2 S3 S4
I now need to test only the points in the above combinations of sets (e.g. one point from s1, one point from s2 and one point from s3 for the first combination) and see whether the triangle contains (0,0) by calling the Test function (which is assumed to be a constant-time function here).
Can someone guide me on this ?
Image added below for clarification on why only some subsets (s1,s2 , s4 ) can contain (0,0) and some ( s1,s1,s3) cannot.
I'm guessing we're in the same class (based on the strange wording of the question), so now that the due date is past, I feel alright giving out my solution. I managed to find the n log n algorithm, which, as the question stated, is more a matter of cleverly transforming the problem, and less of a Dynamic Programming / DaC solution.
Note: This is not an exhaustive proof, I leave that to you.
First, some visual observations. Take some triangle that obviously contains the origin.
Then, convert the points to vectors.
Convince yourself that any selection of three points, one from each vector, describes a triangle that also contains the origin.
It also follows that, if you perform the above steps on a triangle that doesn't enclose the origin, any combination of points along those vectors will also not contain the origin.
The main point to get from this is, the magnitude of the vector does not matter, only the direction. Additionally, a hint to the question says that "any line crossing (0,0) only contains one point in S", from which we can extrapolate that the direction of each vector is unique.
So, if only the angle matters, it would follow that there is some logic that determines what range of points, given two points, could possibly form a triangle that encloses the origin. For simplicity, we'll assume we've taken all the points in S and converted them to vectors, then normalized them, effectively making all points lie on the unit circle.
So, take two points along this circle.
Then, draw a line from each point through the origin and to the opposite side of the circle.
It follows that, given the two points, any point that lies along the red arc can form a triangle.
So our algorithm should do the following:
Take each point in S. Make a secondary array A, and for each point, add its angle around the unit circle, atan2(y, x) shifted into the range 0 ≤ Ai < 2π, to A. This is O(n).
Sort A in increasing order. O(n log n), assuming we use Merge Sort.
Count the number of triangles possible for each pair (Ai, Aj) with Ai < Aj < Ai + π. This means that we count the number of Ak with Ai + π < Ak < Aj + π. To count every triangle exactly once, let Ai be the smallest of its three angles, so that Ak never wraps past 2π. Since the array is sorted, we can use a Binary Search to find the indices of Ai + π and Aj + π, which is O(2 log n) = O(log n).
However, we run into a problem: there are O(n^2) pairs, and if we have to do an O(log n) search for each, we have O(n^2 log n). So, we need to make one more observation.
Given some Ai < Aj, we'll say Tij describes the number of triangles possible, as calculated by the above method. Then, given a third Ak > Aj, we know that Tij ≤ Tik, as the number of points between Ai + π and Ak + π must be at least as many as there are between Ai + π and Aj + π. In fact, it is exactly the count between Ai + π and Aj + π, plus the count between Aj + π and Ak + π. Since we already know the count between Ai + π and Aj + π, we don't need to recalculate it - we only need to calculate the number between Aj + π and Ak + π, then add the previous count. It follows that:
T(i,j) = count(A(i),A(i+1)) + count(A(i+1),A(i+2)) + ... + count(A(j-1),A(j))
where count(A(t),A(t+1)) is the number of points between A(t) + π and A(t+1) + π.
And this means we don't need a binary search for all n^2 pairs. One binary search per point gives U[i], the number of angles below A[i] + π, so that Tij = U[j] - U[i], and a prefix sum over U adds up all the j that go with a given i in O(1).
So, all the above can give us the following pseudocode solution.
long triangleCount(point P[], int n)
    double A[n]; int U[n]; long prefix[n+1], totalCount = 0;
    for(i=0...n-1)
        A[i] = atan2(P[i].y, P[i].x);      // in (-π, π]
        if(A[i] < 0) A[i] += 2π;           // shift into [0, 2π)
    mergeSort(A);
    prefix[0] = 0;
    for(i=0...n-1)
        U[i] = lowerBound(A, A[i] + π);    // binary search: number of angles < A[i] + π
        prefix[i+1] = prefix[i] + U[i];
    for(i=0...n-1)                         // A[i] is the smallest angle of the triangle
        // second vertex j runs over i+1 ... U[i]-1, third vertex over U[i] ... U[j]-1
        totalCount += (prefix[U[i]] - prefix[i+1]) - (U[i] - i - 1) * U[i];
    return totalCount;
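A minimal Python sketch of the angle-sorting count, assuming the general-position conditions from the question (no point at the origin, no two points on a common line through the origin). The function name is hypothetical. Each triangle is charged to its vertex with the smallest angle: for that vertex i, the second vertex j lies within π counter-clockwise of it, and the third vertex lies between angle i + π and angle j + π without wrapping past 2π.

```python
from bisect import bisect_left
from itertools import accumulate
from math import atan2, pi

def count_triangles_containing_origin(points):
    """Count triangles with vertices in `points` that contain (0, 0).
    Assumes no two points lie on one line through the origin."""
    angles = sorted(atan2(y, x) % (2 * pi) for x, y in points)
    n = len(angles)
    # upper[i] = number of angles strictly below angles[i] + pi
    upper = [bisect_left(angles, a + pi) for a in angles]
    prefix = [0] + list(accumulate(upper))  # prefix[t] = sum(upper[:t])
    total = 0
    for i in range(n):
        u = upper[i]
        # sum over j in (i, u) of (upper[j] - u) third vertices
        total += (prefix[u] - prefix[i + 1]) - (u - i - 1) * u
    return total
```

After the O(n log n) sort, the loop does one binary search per point and O(1) further work, so the whole sketch stays within O(n log n).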
It seems that in the worst case there are Θ(n^3) triangles containing the origin, and since you need them all, the answer is no, there is no better algorithm.
For a worst case consider a regular polygon of an odd degree n, centered at the origin.
Here is an outline of the calculations. A chord connecting two vertices which are k < n/2 vertices apart is a base for Θ(k) triangles. Fix a vertex; its contribution is a sum over all chords coming from it, yielding Θ(n^2), and a total (a contribution of all n vertices) is Θ(n^3) (each triangle is counted 3 times, which doesn't affect the asymptotic).

Data structure to hold and retrieve points in a plane

Definition 1: Point (x,y) is controlling point (x',y') if and only if x < x' and y < y'.
Definition 2: Point (x,y) is controlled by point (x',y') if and only if x' < x and y' < y.
I'm trying to come up with data structure to support the following operations:
Add(x,y) - Adds a point (x,y) to the system in O(logn) complexity, where n is the number of points in the system.
Remove(x,y) - Removes a point (x,y) from the system in O(logn) complexity, where n is the number of points in the system.
Score(x,y) - Returns (the number of points that (x,y) controls) minus (the number of points that (x,y) is controlled by). Worst case complexity O(logn).
I've tried to solve it using multiple AVL trees, but could not come up with elegant enough solution.
Point (x,y) is controlling point (x',y') if and only if x < x' and y < y'.
Point (x,y) is controlled by point (x',y') if and only if x' < x and y' < y.
Let's assume that (x,y) is the middle of a square that is split into four quadrants: A (upper left), B (upper right), C (lower left) and D (lower right).
(x,y) is controlling the points in quadrant B and is being controlled by the points in C.
The output required is the number of points (x,y) controls minus the number of points (x,y) is being controlled by. Which is the number of points in B minus the number of points in C, B-C (referring to the number of points in A,B,C,D as simply A,B,C,D).
We can easily calculate the number of points in A+C, that's simply the number of points with x' < x.
Same goes for C+D (points with y' < y), A+B (y' > y), B+D (x' > x).
We add up A+C to C+D which is A+2C+D.
Add up A+B to B+D which is A+2B+D.
Deduct the two: A+2B+D-(A+2C+D) = 2B-2C, divide by two: (2B-2C)/2 = B-C which is the output needed.
(I'm assuming handling the 1D case is simple enough and there is no need to explain.)
For the sake of future reference
Solution outline:
We will maintain two AVL trees.
Tree_X: will hold points sorted by their X coordinate.
Tree_Y: will hold points sorted by their Y coordinate.
Each node within both trees will hold the following additional data:
Number of nodes in left sub-tree.
Number of nodes in right sub-tree.
For a point (x,y) we will define regions A, B, C, D:
Point (x',y') is in A if x' < x and y' > y.
Point (x',y') is in B if x' > x and y' > y.
Point (x',y') is in C if x' < x and y' < y.
Point (x',y') is in D if x' > x and y' < y.
Now it is clear that Score(x,y) = |B|-|C|, since (x,y) controls exactly the points in B and is controlled by exactly the points in C.
However |A|+|C|, |B|+|D|, |A|+|B|, |C|+|D| could be easily retrieved from our two AVL trees, as we will soon see.
And notice that [(|A| + |B| + |B| + |D|) - (|A| + |C| + |C| + |D|)]/2 = |B|-|C|
Implementation of required operations:
Add(x,y) - We will add point (x,y) to both of our AVL trees. Since the additional data we are storing is affected only on the insertion path (including any rotations along it) and since the insertion occurs in O(logn), the total cost of Add(x,y) is O(logn).
Remove(x,y) - We will remove point (x,y) from both of our AVL trees. Since the additional data we are storing is affected only on the removal path and since the removal occurs in O(logn), the total cost of Remove(x,y) is O(logn).
Score(x,y) - I will show how to calculate |B|+|D|; the others are done in a similar way and with the same complexity. It is clear that |B|+|D| is the number of points which satisfy x' > x. To calculate this number we will:
Find x in Tree_X. Complexity O(logn).
Start with the size of the right sub-tree of that node, then go upwards in Tree_X until the root; each time we arrive at a parent from its left child, add 1 plus the size of the parent's right sub-tree. Complexity O(logn).
Total cost of Score(x,y) is O(logn).
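A small Python sketch of the Score formula, not of the AVL trees themselves: two sorted lists queried with `bisect` stand in for Tree_X and Tree_Y. Inserting into a Python list is O(n), so this only illustrates how the four rank counts combine; a balanced tree with sub-tree sizes would be needed to reach O(logn) for Add and Remove. The class name `ScoreIndex` is made up, and the sketch assumes that no two points share an x or a y coordinate (otherwise the four half-plane counts no longer decompose into A, B, C, D).

```python
from bisect import bisect_left, bisect_right, insort

class ScoreIndex:
    """Sorted-list stand-in for the two augmented trees (illustration only)."""

    def __init__(self):
        self.xs = []  # x coordinates, sorted
        self.ys = []  # y coordinates, sorted

    def add(self, x, y):
        insort(self.xs, x)
        insort(self.ys, y)

    def remove(self, x, y):
        # Assumes (x, y) was added earlier.
        self.xs.pop(bisect_left(self.xs, x))
        self.ys.pop(bisect_left(self.ys, y))

    def score(self, x, y):
        left = bisect_left(self.xs, x)                    # |A| + |C|
        right = len(self.xs) - bisect_right(self.xs, x)   # |B| + |D|
        below = bisect_left(self.ys, y)                   # |C| + |D|
        above = len(self.ys) - bisect_right(self.ys, y)   # |A| + |B|
        return ((above + right) - (left + below)) // 2    # |B| - |C|
```

With the points (1,1), (2,3), (3,2), (4,4) stored, `score(0, 0)` is 4, because (0,0) controls all four points and is controlled by none.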

Finding a square with side length R in the 2D plane?

I was at an interview with a high-frequency trading firm, and they asked me:
Given n points in the 2D plane, find a square with side length R
conditions:
--its sides are parallel to the axes
--it contains at least 5 of the n points
--the running time must not depend on R
They told me to give them an O(n) algorithm.
Interesting problem, thanks for posting! Here's my solution. It feels a bit inelegant but I think it meets the problem definition:
Inputs: R, P = {(x_0, y_0), (x_1, y_1), ..., (x_N-1, y_N-1)}
Output: (u,v) such that the square with corners (u,v) and (u+R, v+R) contains at least 5 points from P, or NULL if no such (u,v) exist
Constraint: asymptotic run time should be O(n)
Consider tiling the plane with RxR squares. Construct a sparse matrix, B defined as
B[i][j] = {(x,y) in P | floor(x/R) = i and floor(y/R) = j}
As you are constructing B, if you find an entry that contains at least five elements stop and output (u,v) = (i*R, j*R) for i,j of the matrix entry containing five points.
If the construction of B did not yield a solution then either there is no solution or else the square with side length R does not line up with our tiling. To test for this second case we will consider points from four adjacent tiles.
Iterate the non-empty entries in B. For each non-empty entry B[i][j], consider the collection of points contained in the tile represented by the entry itself and in the tiles above and to the right. These are the points in entries: B[i][j], B[i+1][j], B[i][j+1], B[i+1][j+1]. Because a qualifying square may have its lower-left corner in an empty tile, repeat this for each of the four 2x2 blocks of tiles that contain B[i][j], not only the one that has it in the lower-left position. There can be no more than 16 points in each collection, since each entry must have fewer than 5. Examine the collection and test if there are 5 points among the points in it satisfying the problem criteria; if so stop and output the solution. (I could specify this algorithm in more detail, but since (a) such an algorithm clearly exists, and (b) its asymptotic runtime is O(1), I won't go into that detail).
If after iterating the entries in B no solution is found then output NULL.
The construction of B involves just a single pass over P and hence is O(N). B has no more than N elements, so iterating it is O(N). The algorithm for each element in B considers no more than 16 points and hence does not depend on N and is O(1), so the overall solution meets the O(N) target.
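A minimal Python sketch of the tiling approach, under a few assumptions that are not in the answer: the sparse matrix B is a dictionary keyed by tile index (so the O(N) bound is an expected bound for hashing), the square is closed (points on its border count), and the constant-time check on up to 16 points simply tries every point's x as the left edge and every point's y as the bottom edge. The function name `find_square` is hypothetical. The sketch examines every 2x2 block of tiles that touches a non-empty tile, which also covers squares whose lower-left tile is empty.

```python
from collections import defaultdict

def find_square(points, R, need=5):
    """Return (u, v) so that [u, u+R] x [v, v+R] holds at least `need`
    points, or None if no such axis-parallel square exists."""
    tiles = defaultdict(list)
    for x, y in points:
        key = (x // R, y // R)
        tiles[key].append((x, y))
        if len(tiles[key]) >= need:        # one tile already suffices
            return (key[0] * R, key[1] * R)
    seen = set()
    for i, j in list(tiles):
        for bi in (i - 1, i):              # every 2x2 block containing (i, j)
            for bj in (j - 1, j):
                if (bi, bj) in seen:
                    continue
                seen.add((bi, bj))
                cand = [p for di in (0, 1) for dj in (0, 1)
                        for p in tiles.get((bi + di, bj + dj), ())]
                # At most 16 candidates, so this search is O(1) per block.
                for u in {x for x, _ in cand}:
                    for v in {y for _, y in cand}:
                        inside = sum(1 for x, y in cand
                                     if u <= x <= u + R and v <= y <= v + R)
                        if inside >= need:
                            return (u, v)
    return None
```

Restricting the left and bottom edges to coordinates of candidate points loses nothing: any qualifying square can be slid right and up until its left and bottom edges each touch one of the points it contains, without dropping any of them.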
Run through set once, keeping the 5 largest x values in a (sorted) local array. Maintaining the sorted local array is O(N) (constant time performed N times at most).
Define xMax and xMin as the x-coordinates of the two points with the largest and 5th largest x values respectively (i.e. a[0] and a[4]).
Sort a[] again on Y value, and set yMin and yMax as above, again in constant time.
Define deltaX = xMax- xMin, and deltaY as yMax - yMin, and R = largest of deltaX and deltaY.
The square of side length R located with upper-right at (xMax,yMax) meets the criteria.
Observation if R is fixed in advance:
O(N) complexity means no sort is allowed except on a fixed number of points, as only a Radix sort would meet the criteria and it requires a constraint on the values of xMax-xMin and of yMax-yMin, which was not provided.
Perhaps the trick is to start with the point furthest down and left, and move up and right. The lower-left-most point can be determined in a single pass of the input.
Moving up and right in steps and counting points in the square requires sorting the points on X and Y in advance, which, to be done in O(N) time, requires that the Radix sort constraint be met.

Amount of points above some line

Consider some points on a 2D plane and a function f(x)=ax+b, where b=0. Let's say a point is a 1x1 square.
Now we want to tell how many points lie between the function f(x) and the y axis, as in the picture below.
Black points are valid, white not. We also say point is valid if it:
intersects with the y axis;
or with the function f(x);
or is between them.
As denoted in the picture :
How can we solve this, assuming that we don't remove any of the points and we don't add them? Is there any other approach than standard brute force?
If I am understanding this right the points are random and given to you by their coordinates, and the line is also given to you. If that is the case, there cannot be any a priori knowledge about any relationship between the points, so you'd have to go through them, in the order given, and compare their x coordinate with 0 and their y coordinate with f(x). If a point passes the check you increment the counter, otherwise you don't. The algorithm runs in O(n) time and I highly doubt you can do any better than that without some extra information about the points.
The question is quite unclear, but it appears from the comment "I mean find that a in f(x)=ax to have maximum points which are valid and their amount doesn't exceed some value X" that you want to find a such that N(a)=X, where by N(a) I mean the number of points right of the y axis and above the line y=ax; or, if no such a exists, find a such that m = N(a) < X is as large as possible, i.e. N(b) > m implies N(b) > X.
Here's an O(n*ln(n)) algorithm: For each point p, excluding any p below y=0, compute slope M_p as ratio of p's y and x coordinates, or DBL_MAX if x=0. Sort the M's into ascending order (this is the O(n*ln(n)) step), and call the sorted array S.
Now we will set up an array T such that when any X is given, S[T[X-1]] is a slope that will place X points on or above that slope:
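S[n] = DBL_MAX;                        /* sentinel: no slope in S works */
for (j = 0, h = n; j < n; ++j) {       /* j = X-1 */
    k = n - 1 - j;                     /* S[k..n-1] are the X largest slopes */
    if (k == 0 || S[k] != S[k-1])
        h = k;                         /* a run of equal slopes starts at k */
    T[j] = h;
}
Thereafter, let any X be given. Let h = T[X-1]. If h<n then N(S[h]) <= X; if h==n, the largest slopes are tied more than X times (for example, multiple points on the Y axis) and no slope in S will work.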
This algorithm uses time O(n*ln(n)) and space O(n) to preprocess a set of n first-quadrant points, and thereafter uses time O(1) to find an a for any given X, 0 < X <= n, such that N(a) = X, if such a exists, else returns a such that N(a) < X < N(b) if b>a, else returns DBL_MAX.
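A minimal Python sketch of the preprocessing and the O(1) lookup, assuming first-quadrant points given as (x, y) pairs and using `math.inf` where the answer uses DBL_MAX. The function names are hypothetical. T[X-1] is the index of the start of the highest run of equal slopes that leaves at most X points on or above the line.

```python
from math import inf

def build_slope_table(points):
    """Return (S, T): S is the ascending slope list with an inf sentinel
    appended, T[X-1] indexes the slope to use for a target count X."""
    S = sorted((y / x if x != 0 else inf) for x, y in points)
    n = len(S)
    T = [0] * n
    h = n                       # n means "no slope in S works"
    for j in range(n):          # j = X - 1
        k = n - 1 - j           # S[k:] are the X largest slopes
        if k == 0 or S[k] != S[k - 1]:
            h = k               # a run of equal slopes starts at k
        T[j] = h
    S.append(inf)               # sentinel, read when T[X-1] == n
    return S, T

def slope_for(S, T, X):
    """O(1) lookup of a slope a with N(a) <= X, for 0 < X <= n."""
    return S[T[X - 1]]
```

For the points (1,1), (2,2), (1,2) the slopes are 1, 1, 2, so T is [2, 2, 0]: X = 3 gives slope 1, while X = 1 and X = 2 both give slope 2, because the two tied points cannot be separated by any line through the origin.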
