Finding a square side length is R in 2D plane ? - algorithm

I was at the high frequency Trading firm interview, they asked me
Find a square whose length size is R with given n points in the 2D plane
conditions:
--parallel sides to the axis
and it contains at least 5 of the n points
running complexity is not relative to the R
they told me to give them O(n) algorithm

Interesting problem, thanks for posting! Here's my solution. It feels a bit inelegant but I think it meets the problem definition:
Inputs: R, P = {(x_0, y_0), (x_1, y_1), ..., (x_N-1, y_N-1)}
Output: (u,v) such that the square with corners (u,v) and (u+R, v+R) contains at least 5 points from P, or NULL if no such (u,v) exist
Constraint: asymptotic run time should be O(n)
Consider tiling the plane with RxR squares. Construct a sparse matrix, B defined as
B[i][j] = {(x,y) in P | floor(x/R) = i and floor(y/R) = j}
As you are constructing B, if you find an entry that contains at least five elements stop and output (u,v) = (i*R, j*R) for i,j of the matrix entry containing five points.
If the construction of B did not yield a solution then either there is no solution or else the square with side length R does not line up with our tiling. To test for this second case we will consider points from four adjacent tiles.
Iterate the non-empty entries in B. For each non-empty entry B[i][j], consider the collection of points contained in the tile represented by the entry itself and in the tiles above and to the right. These are the points in entries: B[i][j], B[i+1][j], B[i][j+1], B[i+1][j+1]. There can be no more than 16 points in this collection, since each entry must have fewer than 5. Examine this collection and test if there are 5 points among the points in this collection satisfying the problem criteria; if so stop and output the solution. (I could specify this algorithm in more detail, but since (a) such an algorithm clearly exists, and (b) its asymptotic runtime is O(1), I won't go into that detail).
If after iterating the entries in B no solution is found then output NULL.
The construction of B involves just a single pass over P and hence is O(N). B has no more than N elements, so iterating it is O(N). The algorithm for each element in B considers no more than 16 points and hence does not depend on N and is O(1), so the overall solution meets the O(N) target.

Run through set once, keeping the 5 largest x values in a (sorted) local array. Maintaining the sorted local array is O(N) (constant time performed N times at most).
Define xMin and xMax as the x-coordinates of the two points with largest and 5th largest x values respectively (ie (a[0] and a[4]).
Sort a[] again on Y value, and set yMin and yMax as above, again in constant time.
Define deltaX = xMax- xMin, and deltaY as yMax - yMin, and R = largest of deltaX and deltaY.
The square of side length R located with upper-right at (xMax,yMax) meets the criteria.
Observation if R is fixed in advance:
O(N) complexity means no sort is allowed except on a fixed number of points, as only a Radix sort would meet the criteria and it requires a constraint on the values of xMax-xMin and of yMax-yMin, which was not provided.
Perhaps the trick is to start with the point furthest down and left, and move up and right. The lower-left-most point can be determined in a single pass of the input.
Moving up and right in steps and counitng points in the square requries sorting the points on X and Y in advance, which to be done in O(N) time requiress that the Radix sort constraint be met.

Related

How to find independent points in a unit square in O(n log n)?

Consider a unit square containing n 2D points. We say that two points p and q are independent in a square, if the Euclidean distance between them is greater than 1. A unit square can contain at most 3 mutually independent points. I would like to find those 3 mutually independent points in the given unit square in O(n log n). Is it possible? Please help me.
Can this problem be solved in O(n^2) without using any spatial data structures such as Quadtree, kd-tree, etc?
Use a spatial data structure such as a Quadtree to store your points. Each node in the quadtree has a bounding box and a set of 4 child nodes, and a list of points (empty except for the leaf nodes). The points are stored in the leaf nodes.
The point quadtree is an adaptation of a binary tree used to represent two-dimensional point data. It shares the features of all quadtrees but is a true tree as the center of a subdivision is always on a point. The tree shape depends on the order in which data is processed. It is often very efficient in comparing two-dimensional, ordered data points, usually operating in O(log n) time.
For each point, maintain a set of all points that are independent of that point.
Insert all your points into the quadtree, then iterate through the points and use the quadtree to find the points that are independent of each:
main()
{
for each point p
insert p into quadtree
set p's set to empty
for each point p
findIndependentPoints(p, root node of quadtree)
}
findIndependentPoints(Point p, QuadTreeNode n)
{
Point f = farthest corner of bounding box of n
if distance between f and p < 1
return // none of the points in this node or
// its children are independent of p
for each point q in n
if distance between p and q > 1
find intersection r of q's set and p's set
if r is non-empty then
p, q, r are the 3 points -> ***SOLVED***
add p to q's set of independent points
add q to p's set of independent points
for each subnode m of n (up 4 of them)
findIndependentPoints(p, m)
}
You could speed up this:
find intersection r of q's set and p's set
by storing each set as a quadtree. Then you could find the intersection by searching in q's quadtree for a point independent of p using the same early-out technique:
// find intersection r of q's set and p's set:
// r = findMututallyIndependentPoint(p, q's quadtree root)
Point findMututallyIndependentPoint(Point p, QuadTreeNode n)
{
Point f = farthest corner of bounding box of n
if distance between f and p < 1
return // none of the points in this node or
// its children are independent of p
for each point r in n
if distance between p and r > 1
return r
for each subnode m of n (up 4 of them)
findMututallyIndependentPoint(p, m)
}
An alternative to using Quadtrees is using K-d trees, which produces more balanced trees where each leaf node is a similar depth from the root. The algorithm for finding independent points in that case would be the same, except that there would only be up to 2 and not 4 child nodes for each node in the data structure, and the bounding boxes at each level would be of variable size.
You might want to try this out.
Pick the top left point (Y) with coordinate (0,1). Calculate distance from each point from the List to point Y.
Sort the result in increasing order into SortedPointList (L)
If the first point (A) and the last point (B) in list L are independent:
Foreach point P in list L:
if P is independent to both A and B:
Return A, B, P
Pick the top right point (X) with coordinate (1,1). Calculate distance from each point from the List to point X.
Sort the result in increasing order into SortedPointList (S)
If the first point (C) and the last point (D) in list L are independent:
Foreach point O in list S:
if P is independent to both C and D:
Return C, D, O
Return null
This is a wrong solution. Kept it just for comments. If one finds another solution based on smallest enclosing circle, please put a link as a comment.
Solve the Smallest-circle problem.
If diameter of a circle <= 1, return null.
If the circle is determined by 3 points, check which are "mutually independent". If there are only two of them, try to find the third by iteration.
If the circle is determined by 2 points, they are "mutually independent". Try to find the third one by iteration.
Smallest-sircle problem can be solved in O(N), thus the whole problem complexity is also O(N).

Maximum minimum manhattan distance

Input:
A set of points
Coordinates are non-negative integer type.
Integer k
Output:
A point P(x, y) (in or not in the given set) whose manhattan distance to closest is maximal and max(x, y) <= k
My (naive) solution:
For every (x, y) in the grid which contain given set
BFS to find closest point to (x, y)
...
return maximum;
But I feel it run very slow for a large grid, please help me to design a better algorithm (or the code / peseudo code) to solve this problem.
Should I instead of loop over every (x, y) in grid, just need to loop every median x, y
P.S: Sorry for my English
EDIT:
example:
Given P1(x1,y1), P2(x2,y2), P3(x3,y3). Find P(x,y) such that min{dist(P,P1), dist(P,P2),
dist(P,P3)} is maximal
Yes, you can do it better. I'm not sure if my solution is optimal, but it's better than yours.
Instead of doing separate BFS for every point in the grid. Do a 'cumulative' BFS from all the input points at once.
You start with 2-dimensional array dist[k][k] with cells initialized to +inf and zero if there is a point in the input for this cell, then from every point P in the input you try to go in every possible direction. The further you are from the start point the bigger integer you put in the array dist. If there is a value in dist for a specific cell, but you can get there with a smaller amount of steps (smaller integer) you overwrite it.
In the end, when no more moves can be done, you scan the array dist to find the cell with maximum value. This is your point.
I think this would work quite well in practice.
For k = 3, assuming 1 <= x,y <= k, P1 = (1,1), P2 = (1,3), P3 = (2,2)
dist would be equal in the beginning
0, +inf, +inf,
+inf, 0, +inf,
0, +inf, +inf,
in the next step it would be:
0, 1, +inf,
1, 0, 1,
0, 1, +inf,
and in the next step it would be:
0, 1, 2,
1, 0, 1,
0, 1, 2,
so the output is P = (3,1) or (3,3)
If K is not large enough and you need to find a point with integer coordinates, you should do, as another answer suggested - Calculate minimum distances for all points on the grid, using BFS, strarting from all given points at once.
Faster solution, for large K, and probably the only one which can find a point with float coordinates, is as following. It has complexity of O(n log n log k)
Search for resulting maximum distance using dihotomy. You have to check if there is any point inside the square [0, k] X [0, k] which is at least given distance away from all points in given set. Suppose, you can check that fast enough for any distance. It is obvious, that if there is such point for some distance R, there always will be some point for all smaller distances r < R. For example, the same point would go. Thus you can search for maximum distance using binary search procedure.
Now, how to fast check for existence (and also find) a point which is at least r units away from all given points. You should draw "Manhattan spheres of radius r" around all given points. These are set of points at most r units away from given point. They are tilted by 45 degrees squares with diagonal equal to 2r. Now turn the picture by 45 degrees, and all squares will be parallel to the axis. Now you can check for existence of any point outside such squares using sweeping line algorithm. You have to sort all vertical edges of squares, and then process them one by one from left to right. Left borders will add segment mark to sweeping line, Left borders will erase it. And you have to check if there is any non marked point on the line. You can implement it using segment tree. Then, you have to check if there is any non marked point on the line inside the initial square [0,k]X[0,k].
So, again, overall solution will be binary search for r. Inside of it you will have to check if there is any point at least r units away from all given points. Do that by constructing "manhattans spheres of radius r" and then scanning them with a diagonal line from left-top corner to right-bottom. While moving line you should store number of opened spheres at each point at the line in the segment tree. between opening and closing of any spheres, line does not change, and if there is any free point there, it means, that you found it for distance r.
Binary search contributes log k to complexity. Each checking procedure is n log n for sorting squares borders, and n log k (n log n?) for processing them all.
Voronoi diagram would be another fast solution and could also find non integer answer. But it is much much harder to implement even for Manhattan measure.
First try
We can turn a 2D problem into a 1D problem by projecting onto the lines y=x and y=-x. If the points are (x1,y1) and (x2,y2) then the manhattan distance is abs(x1-x2)+abs(y1-y2). Change coordinate to a u-v system with basis U = (1,1), V = (1,-1). Coords of the two points in this basis are u1 = (x1-y1)/sqrt(2), v1= (x1+y1), u2= (x1-y1), v2 = (x1+y1). And the manhatten distance is the largest of abs(u1-u2), abs(v1-v2).
How this helps. We can just work with the 1D u-values of each points. Sort by u-value, loop through points and find the largest difference between pains of points. Do the same of v-values.
Calculating u,v coords of O(n), quick sorting is O(n log n), looping through sorted list is O(n).
Alas does not work well. Fails if we have point (-10,0), (10,0), (0,-10), (0,10). Lets try a
Voronoi diagram
Construct a Voronoi diagram
using Manhattan distance. This can be calculate in O(n log n) using https://en.wikipedia.org/wiki/Fortune%27s_algorithm
The vertices in the diagram are points which have maximum distance from its nearest vertices. There is psudo-code for the algorithm on the wikipedia page. You might need to adapt this for Manhattan distance.

A divide-and-conquer algorithm for counting dominating points?

Let's say that a point at coordinate (x1,y1) dominates another point (x2,y2) if x1 ≤ x2 and y1 ≤ y2;
I have a set of points (x1,y1) , ....(xn,yn) and I want to find the total number of dominating pairs. I can do this using brute force by comparing all points against one another, but this takes time O(n2). Instead, I'd like to use a divide-and-conquer approach to solve this in time O(n log n).
Right now, I have the following algorithm:
Draw a vertical line dividing the set of points points into two equal subsets of Pleft and Pright. As a base case, if there are just two points left, I can compare them directly.
Recursively count the number of dominating pairs in Pleft and Pright
Some conquer step?
The problem is that I can't see what the "conquer" step should be here. I want to count how many dominating pairs there are that cross from Pleft into Pright, but I don't know how to do that without comparing all the points in both parts, which would take time O(n2).
Can anyone give me a hint about how to do the conquer step?
so the 2 halves of y coordinates are : {1,3,4,5,5} and {5,8,9,10,12}
i draw the division line.
Suppose you sort the points in both halves separately in ascending order by their y coordinates. Now, look at the lowest y-valued point in both halves. If the lowest point on the left has a lower y value than the lowest point on the right, then that point is dominated by all points on the right. Otherwise, the bottom point on the right doesn't dominate anything on the left.
In either case, you can remove one point from one of the two halves and repeat the process with the remaining sorted lists. This does O(1) work per point, so if there are n total points, this does O(n) work (after sorting) to count the number of dominating pairs across the two halves. If you've seen it before, this is similar to the algorithm for counting inversions in an array).
Factoring in the time required to sort the points (O(n log n)), this conquer step takes O(n log n) time, giving the recurrence
T(n) = 2T(n / 2) + O(n log n)
This solves to O(n log2 n) according to the Master Theorem.
However, you can speed this up. Suppose that before you start the divide amd conquer step that you presort the points by their y coordinates, doing one pass of O(n log n) work. Using tricks similar to the closest pair of points problem, you can then get the points in each half sorted in O(n) time on each subproblem of size n (see the discussion at this bottom of this page) for details). That changes the recurrence to
T(n) = 2T(n / 2) + O(n)
Which solves to O(n log n), as required.
Hope this helps!
Well in this way you have O(n^2) just for division to subsets...
My approach would be different
sort points by X ... O(n.log(n))
now check for Y
but check only points with bigger X (if you sort them ascending then with larger index)
so now you have O(n.log(n)+(n.n/2))
You can also further speed things up by doing separate X and Y test and after that combine the result, that would leads O(n + 3.n.log(n))
add index attribute to your points
where index = 0xYYYYXXXXh is unsigned integer type
YYYY is index of point in Y-sorted array
XXXX is index of point in X-sorted array
if you have more than 2^16 points use bigger then 32-bit data-type.
sort points by ascending X and set XXXX part of their index O1(n.log(n))
sort points by ascending Y and set YYYY part of their index O2(n.log(n))
sort points by ascending index O3(n.log(n))
now point i dominates any point j if (i < j)
but if you need to create actually all the pairs for any point
that would take O4(n.n/2) so this approach will save not a bit of time
if you need just single pair for any point then simple loop will suffice O4(n-1)
so in this case O(n-1+3.n.log(n)) -> ~O(n+3.n.log(n))
hope it helped,... of course if you are stuck with that subdivision approach than i have no better solution for you.
PS. for this you do not need any additional recursion just 3x sorting and only one uint for any point so the memory requirements are not that big and even should be faster than recursive call to subdivision recursion in general
This algorithm runs in O(N*log(N)) where N is the size of the list of points and it uses O(1) extra space.
Perform the following steps:
Sort the list of points by y-coordinate (ascending order), break ties by
x-coordinate (ascending order).
Go through the sorted list in reverse order to count the dominating points:
if the current x-coordinate >= max x-coordinate value encountered so far
then increment the result and update the max.
This works since you know for sure that if all pairs with a greater y-coordinates have a smaller x-coordinate than the current point you have found a dominating points. The sorting step makes it really efficient.
Here's the Python code:
def my_cmp(p1, p2):
delta_y = p1[1] - p2[1]
if delta_y != 0:
return delta_y
return p1[0] - p2[0]
def count_dom_points(points):
points.sort(cmp = my_cmp)
maxi = float('-inf')
count = 0
for x, y in reversed(points):
if x >= maxi:
count += 1
maxi = x
return count

Algorithm for polygon with weight on vertices and operations on edges

I am thinking about the algorithm for the following problem (found on carrercup):
Given a polygon with N vertexes and N edges. There is an int number(could be negative) on every vertex and an operation in set(*,+) on every edge. Every time, we remove an edge E from the polygon, merge the two vertexes linked by the edge(V1,V2) to a new vertex with value: V1 op(E) V2. The last case would be two vertexes with two edges, the result is the bigger one.
Return the max result value can be gotten from a given polygon.
I think we can use just greedy approach. I.e. for polygon with k edges find a pair (p, q) which produces the maximum number when collapsing: (p ,q) = max ({i operation j : i, j - adjacent edges)
Then just call a recursion on polygons:
1. Let function CollapseMaxPair( P(k) ) - gets polygon with k edges and returns 'collapsed' polygon with k-1 edges
2. Then our recursion:
P = P(N);
Releat until two edges left
P = CollapseMaxPair( P )
maxvalue = max ( two remained values)
What do you think?
I have answered this question here: Google Interview : Find the maximum sum of a polygon and it was pointed out to me that that question is a duplicate of this one. Since no one has answered this question fully yet, I have decided to add this answer here as well.
As you have identified (tagged) correctly, this indeed is very similar to the matrix multiplication problem (in what order do I multiply matrixes in order to do it quickly).
This can be solved polynomially using a dynamic algorithm.
I'm going to instead solve a similar, more classic (and identical) problem, given a formula with numbers, addition and multiplications, what way of parenthesizing it gives the maximal value, for example
6+1 * 2 becomes (6+1)*2 which is more than 6+(1*2).
Let us denote our input a1 to an real numbers and o(1),...o(n-1) either * or +. Our approach will work as follows, we will observe the subproblem F(i,j) which represents the maximal formula (after parenthasizing) for a1,...aj. We will create a table of such subproblems and observe that F(1,n) is exactly the result we were looking for.
Define
F(i,j)
- If i>j return 0 //no sub-formula of negative length
- If i=j return ai // the maximal formula for one number is the number
- If i<j return the maximal value for all m between i (including) and j (not included) of:
F(i,m) (o(m)) F(m+1,j) //check all places for possible parenthasis insertion
This goes through all possible options. TProof of correctness is done by induction on the size n=j-i and is pretty trivial.
Lets go through runtime analysis:
If we do not save the values dynamically for smaller subproblems this runs pretty slow, however we can make this algorithm perform relatively fast in O(n^3)
We create a n*n table T in which the cell at index i,j contains F(i,j) filling F(i,i) and F(i,j) for j smaller than i is done in O(1) for each cell since we can calculate these values directly, then we go diagonally and fill F(i+1,i+1) (which we can do quickly since we already know all the previous values in the recursive formula), we repeat this n times for n diagonals (all the diagonals in the table really) and filling each cell takes (O(n)), since each cell has O(n) cells we fill each diagonals in O(n^2) meaning we fill all the table in O(n^3). After filling the table we obviously know F(1,n) which is the solution to your problem.
Now back to your problem
If you translate the polygon into n different formulas (one for starting at each vertex) and run the algorithm for formula values on it, you get exactly the value you want.
Here's a case where your greedy algorithm fails:
Imagine your polygon is a square with vertices A, B, C, D (top left, top right, bottom right, bottom left). This gives us edges (A,B), (A,D), (B,C), and (C, D).
Let the weights be A=-1, B=-1, C=-1, and D=1,000,000.
A (-1) ------ B (-1)
| |
| |
| |
| |
D(1000000) ---C (-1)
Clearly, the best strategy is to collapse (A,B), and then (B,C), so that you may end up with D by itself. Your algorithm, however, will start with either (A,D) or (D,C), which will not be optimal.
A greedy algorithm that combines the min sums has a similar weakness, so we need to think of something else.
I'm starting to see how we want to try to get all positive numbers together on one side and all negatives on the other.
If we think about the initial polygon entirely as a state, then we can imagine all the possible child states to be the subsequent graphs were an edge is collapsed. This creates a tree-like structure. A BFS or DFS would eventually give us an optimal solution, but at the cost of traversing the entire tree in the worst case, which is probably not as efficient as you'd like.
What you are looking for is a greedy best-first approach to search down this tree that is provably optimal. Perhaps you could create an A*-like search through it, although I'm not sure what your admissable heuristic would be.
I don't think the greedy algorithm works. Let the vertices be A = 0, B = 1, C = 2, and the edges be AB = a - 5b, BC = b + c, CA = -20. The greedy algorithm selects BC to evaluate first, value 3. Then AB, value, -15. However, there is a better sequence to use. Evaluate AB first, value -5. Then evaluate BC, value -3. I don't know of a better algorithm though.

Fast algorithm to find out the number of points under hyperplane

Given points in Euclidean space, is there a fast algorithm to count the number of points 'under' one arbitrary hyperplane? Fast means time complexity lower than O(n)
Time for preprocessing or sorting the points is okay
And, even if not high dimensional, I'd like to know whether there exists one that can be used in 2 dimension space
If you're willing to preprocess the points, then you have to visit each one at least once, which is O(n). If you consider a test of which side the point is on as part of the preprocessing then you've got an O(0) algorithm (with O(n) preprocessing). So I don't think this question makes sense as stated.
Nevertheless, I'll attempt to give a useful answer, even if it's not precisely what the OP asked for.
Choose a hyperplane unit normal and root point. If the plane is given in parametric form
(P - O).N == 0
then you have these already, just make sure the normal is unitized.
If it's given in analytic form: Sum(i = 1 to n: a[i] x[i]) + d = 0, then the vector A = (a1, ... a[n]) is a normal of the plane, and N = A/||A|| is the unit plane normal. A point O (for origin) on the plane is d N.
You can test which side each point P is on by projecting it onto N add checking the sign of the parameter:
Let V = P - O. V is the vector from the chosen origin O to P.
Let s N be the projection of V onto N. If s is negative, then P is "under" the hyperplane.
You should go to the link on vector projection if you're rusty on the subject, but I'll summarize here using my notation. Or, you can take my word for it, and just skip to the formula at the end.
If alpha is the angle between V and N, then from the definition of cosine we have cos(alpha) = s||N||/||V|| = s/||V|| since N is a unit normal. But we also know from vector algebra that cos(alpha) = ||V||(V.N), where "." is scalar product (a.k.a. dot product, or euclidean inner product).
Equating these two expressions for cos(alpha) we have
s = (V.V)(V.N)
(using the fact that ||V||^2 == V.V).
So your proprocesing work is to compute N and O, and your test is:
bool is_under = (dot(V, V)*dot(V, N) < 0.);
I don't believe it can be done any faster.
When setting the point values, use checking conditions at that point setting. Then increment or dont increment the counter. O(n)
I found O(logN) algorithm in 2D dimension by using divide-and-conquer and binary search with O(N log N) preprocessing time complexity and O(N log N) memory complexity
The basic idea is that points can be divided into left N/2 points and right N/2 points, and the number of points that's under the line(in 2D dimension) is sum of the number of left points under the line and the number of the right points under the line. I'll call the infinite line that divides whole points into 'left' and 'right' as 'dividing line'. Dividing line will be look like 'x = k'
If each 'left points' and 'right points' are sorted by y-axis order, then the number of specific points - the points at the right lower corner - can be quickly found by using binary searching 'the number of points whose y values are lower than the y value of intersection point of the line and the Dividing line'.
Therefore time complexity is
T(N) = 2T(N/2) + O(log N)
and finally the time complexity is O(log N)

Resources