Maximum rectangle overlapping point - algorithm

Given the coordinates of N rectangles (N<=100.000) in the grid L*C (L and C can range from 0 to 1.000.000.000) I want to know what is the maximum number of rectangle overlapping at any point in the grid.
So I figured I would use a sweeping algorithm, for each event (opening or ending of a rectangle) sorted by x value, I add or remove an interval to my structure.
I have to use a tree to maintain the maximum overlapping of the intervals, and be able to add and remove an interval.
I know how to do that when the values of the intervals (start and end) are ranging from 0 to 100.000, but it is impossible here since the dimensions of the plane are from 0 to 1.000.000.000. How can I implement such a tree?

If you know the coordinates of all the rectangles up-front, you can use "coordinate compression".
Since you only have 10^5 rectangles, that means you have at most 2*10^5 different x and y coordinates. You can therefore create a mapping from those coordinates to natural numbers from 1 to 2*10^5 (by simply sorting the coordinates). Then you can just use the normal tree that you already know for the new coordinates.
This would be enough to get the number of rectangles, but if you also need the point where they overlap, you should also maintain a reverse mapping so you can get back to the real coordinates of the rectangles. In the general case, the answer will be a rectangle, not just a single point.

Use an interval tree. Your case is a bit more complicated because you really need a weighted interval tree, where the weight is the number of open rectangles for that interval.

Related

Minimum amount of rectangles from multi-colored grid

I've been working for some time in an XNA roguelike game and I can't get my head around the following problem: developing an algorithm to divide a matrix of non-binary values into the fewest rectangles grouping these values.
Example: given the following matrix
01234567
0 ---##*##
1 ---##*##
2 --------
The algorithm should return:
3x3 rectangle of '-'s starting at (0,0)
2x2 rectangle of '#'s starting at (3, 0)
1x2 rectangle of '*'s starting at (5, 0)
2x2 rectangle of '#'s starting at (6, 0)
5x1 rectangle of '-'s starting at (3, 2)
Why am I doing this: I've gotten a pretty big dungeon type with a size of approximately 500x500. If I were to individually call the "Draw" method for each tile's Sprite, my FPS would be far too low. It is possible to optimize this process by grouping similar-textured tiles and applying texture repetition to them, which would dramatically decrease the amount of GPU draw calls for that. For example, if my map were the previous matrix, instead of calling draw 16 times, I'd call it only 5 times.
I've looked at some algorithms which can give you the biggest rectangle of a type inside a given binary matrix, but that doesn't fit my problem.
Thanks in advance!
You can use breadth first searches to separate each area of different tile type.
Picking a partitioning within the individual shapes is an NP-hard problem (see https://en.wikipedia.org/wiki/Graph_partition), so you can't find an efficient solution that guarantees the minimum number of rectangles. However if you don't mind an extra rectangle or two for each shape and your shapes are relatively small, you can come up with algorithms that split the shape into a number of rectangles close to the minimum.
An off the top of my head guess for something that could potentially work would be to pick a tile with the maximum connecting tiles and start growing a rectangle from it using a recursive algorithm to maximize the size. Remove the resulting rectangle from the shape, then repeat until there are no more tiles not included in a rectangle. Again, this won't produce perfect results, there are graphs on which this will return with more than the minimum amount of rectangles, but it's an easy to implement ballpark solution. With a little more effort I'm sure you will be able to find better heuristics to use and get better results too.
One possible building block is a routine to check, given two points, whether the rectangle formed by using those points as opposite corners is all of the same type. I think that a fast (but unreliable) means of testing this can be based on mapping each type to a large random number, and then working out the sum of the numbers within a rectangle modulo a large prime. Take one of the numbers within the rectangle. If the sum of the numbers within the rectangle is the size of the rectangle times the one number sampled, assume that the all of the numbers in the rectangle are the same.
In one dimension we can work out all of the cumulative sums a, a+b, a+b+c, a+b+c+d,... in time O(N) and then, for any two points, work out the sum for the interval between them by subtracting cumulative sums: b+c+d = a+b+c+d - a. In two dimensions, we can use cumulative sums to work out, for each point, the sum of all of the numbers from positions which have x and y co-ordinates no greater than the (x, y) coordinate of that position. For any rectangle we can work out the sum of the numbers within that rectangle by working out A-B-C+D where A,B,C,D are two-dimensional cumulative sums.
So with pre-processing O(N) we can work out a table which allows us to compute the sum of the numbers within a rectangle specified by its opposite corners in time O(1). Unless we are very unlucky, checking this sum against the size of the rectangle times a number extracted from within the rectangle will tell us whether the rectangle is all of the same type.
Based on this, repeatedly start with a random point not covered. Take a point just to its left and move that point left as long as the interval between the two points is of the same type. Then move that point up as long as the rectangle formed by the two points is of the same type. Now move the first point to the right and down as long as the rectangle formed by the two points is of the same type. Now you think you have a large rectangle covering the original point. Check it. In the unlikely event that it is not all of the same type, add that rectangle to a list of "fooled me" rectangles you can check against in future and try again. If it is all of the same type, count that as one extracted rectangle and mark all of the points in it as covered. Continue until all points are covered.
This is a greedy algorithm that makes no attempt at producing the optimal solution, but it should be reasonably fast - the most expensive part is checking that the rectangle really is all of the same type, and - assuming you pass that test - each cell checked is also a cell covered so the total cost for the whole process should be O(N) where N is the number of cells of input data.

Most efficient way to select point with the most surrounding points

N.B: there's a major edit at the bottom of the question - check it out
Question
Say I have a set of points:
I want to find the point with the most points surrounding it, within radius (ie a circle) or within (ie a square) of the point for 2 dimensions. I'll refer to it as the densest point function.
For the diagrams in this question, I'll represent the surrounding region as circles. In the image above, the middle point's surrounding region is shown in green. This middle point has the most surrounding points of all the points within radius and would be returned by the densest point function.
What I've tried
A viable way to solve this problem would be to use a range searching solution; this answer explains further and that it has " worst-case time". Using this, I could get the number of points surrounding each point and choose the point with largest surrounding point count.
However, if the points were extremely densely packed (in the order of a million), as such:
then each of these million points () would need to have a range search performed. The worst-case time , where is the number of points returned in the range, is true for the following point tree types:
kd-trees of two dimensions (which are actually slightly worse, at ),
2d-range trees,
Quadtrees, which have a worst-case time of
So, for a group of points within radius of all points within the group, it gives complexity of for each point. This yields over a trillion operations!
Any ideas on a more efficient, precise way of achieving this, so that I could find the point with the most surrounding points for a group of points, and in a reasonable time (preferably or less)?
EDIT
Turns out that the method above is correct! I just need help implementing it.
(Semi-)Solution
If I use a 2d-range tree:
A range reporting query costs , for returned points,
For a range tree with fractional cascading (also known as layered range trees) the complexity is ,
For 2 dimensions, that is ,
Furthermore, if I perform a range counting query (i.e., I do not report each point), then it costs .
I'd perform this on every point - yielding the complexity I desired!
Problem
However, I cannot figure out how to write the code for a counting query for a 2d layered range tree.
I've found a great resource (from page 113 onwards) about range trees, including 2d-range tree psuedocode. But I can't figure out how to introduce fractional cascading, nor how to correctly implement the counting query so that it is of O(log n) complexity.
I've also found two range tree implementations here and here in Java, and one in C++ here, although I'm not sure this uses fractional cascading as it states above the countInRange method that
It returns the number of such points in worst case
* O(log(n)^d) time. It can also return the points that are in the rectangle in worst case
* O(log(n)^d + k) time where k is the number of points that lie in the rectangle.
which suggests to me it does not apply fractional cascading.
Refined question
To answer the question above therefore, all I need to know is if there are any libraries with 2d-range trees with fractional cascading that have a range counting query of complexity so I don't go reinventing any wheels, or can you help me to write/modify the resources above to perform a query of that complexity?
Also not complaining if you can provide me with any other methods to achieve a range counting query of 2d points in in any other way!
I suggest using plane sweep algorithm. This allows one-dimensional range queries instead of 2-d queries. (Which is more efficient, simpler, and in case of square neighborhood does not require fractional cascading):
Sort points by Y-coordinate to array S.
Advance 3 pointers to array S: one (C) for currently inspected (center) point; other one, A (a little bit ahead) for nearest point at distance > R below C; and the last one, B (a little bit behind) for farthest point at distance < R above it.
Insert points pointed by A to Order statistic tree (ordered by coordinate X) and remove points pointed by B from this tree. Use this tree to find points at distance R to the left/right from C and use difference of these points' positions in the tree to get number of points in square area around C.
Use results of previous step to select "most surrounded" point.
This algorithm could be optimized if you rotate points (or just exchange X-Y coordinates) so that width of the occupied area is not larger than its height. Also you could cut points into vertical slices (with R-sized overlap) and process slices separately - if there are too many elements in the tree so that it does not fit in CPU cache (which is unlikely for only 1 million points). This algorithm (optimized or not) has time complexity O(n log n).
For circular neighborhood (if R is not too large and points are evenly distributed) you could approximate circle with several rectangles:
In this case step 2 of the algorithm should use more pointers to allow insertion/removal to/from several trees. And on step 3 you should do a linear search near points at proper distance (<=R) to distinguish points inside the circle from the points outside it.
Other way to deal with circular neighborhood is to approximate circle with rectangles of equal height (but here circle should be split into more pieces). This results in much simpler algorithm (where sorted arrays are used instead of order statistic trees):
Cut area occupied by points into horizontal slices, sort slices by Y, then sort points inside slices by X.
For each point in each slice, assume it to be a "center" point and do step 3.
For each nearby slice use binary search to find points with Euclidean distance close to R, then use linear search to tell "inside" points from "outside" ones. Stop linear search where the slice is completely inside the circle, and count remaining points by difference of positions in the array.
Use results of previous step to select "most surrounded" point.
This algorithm allows optimizations mentioned earlier as well as fractional cascading.
I would start by creating something like a https://en.wikipedia.org/wiki/K-d_tree, where you have a tree with points at the leaves and each node information about its descendants. At each node I would keep a count of the number of descendants, and a bounding box enclosing those descendants.
Now for each point I would recursively search the tree. At each node I visit, either all of the bounding box is within R of the current point, all of the bounding box is more than R away from the current point, or some of it is inside R and some outside R. In the first case I can use the count of the number of descendants of the current node to increase the count of points within R of the current point and return up one level of the recursion. In the second case I can simply return up one level of the recursion without incrementing anything. It is only in the intermediate case that I need to continue recursing down the tree.
So I can work out for each point the number of neighbours within R without checking every other point, and pick the point with the highest count.
If the points are spread out evenly then I think you will end up constructing a k-d tree where the lower levels are close to a regular grid, and I think if the grid is of size A x A then in the worst case R is large enough so that its boundary is a circle that intersects O(A) low level cells, so I think that if you have O(n) points you could expect this to cost about O(n * sqrt(n)).
You can speed up whatever algorithm you use by preprocessing your data in O(n) time to estimate the number of neighbouring points.
For a circle of radius R, create a grid whose cells have dimension R in both the x- and y-directions. For each point, determine to which cell it belongs. For a given cell c this test is easy:
c.x<=p.x && p.x<=c.x+R && c.y<=p.y && p.y<=c.y+R
(You may want to think deeply about whether a closed or half-open interval is correct.)
If you have relatively dense/homogeneous coverage, then you can use an array to store the values. If coverage is sparse/heterogeneous, you may wish to use a hashmap.
Now, consider a point on the grid. The extremal locations of a point within a cell are as indicated:
Points at the corners of the cell can only be neighbours with points in four cells. Points along an edge can be neighbours with points in six cells. Points not on an edge are neighbours with points in 7-9 cells. Since it's rare for a point to fall exactly on a corner or edge, we assume that any point in the focal cell is neighbours with the points in all 8 surrounding cells.
So, if a point p is in a cell (x,y), N[p] identifies the number of neighbours of p within radius R, and Np[y][x] denotes the number of points in cell (x,y), then N[p] is given by:
N[p] = Np[y][x]+
Np[y][x-1]+
Np[y-1][x-1]+
Np[y-1][x]+
Np[y-1][x+1]+
Np[y][x+1]+
Np[y+1][x+1]+
Np[y+1][x]+
Np[y+1][x-1]
Once we have the number of neighbours estimated for each point, we can heapify that data structure into a maxheap in O(n) time (with, e.g. make_heap). The structure is now a priority-queue and we can pull points off in O(log n) time per query ordered by their estimated number of neighbours.
Do this for the first point and use a O(log n + k) circle search (or some more clever algorithm) to determine the actual number of neighbours the point has. Make a note of this point in a variable best_found and update its N[p] value.
Peek at the top of the heap. If the estimated number of neighbours is less than N[best_found] then we are done. Otherwise, repeat the above operation.
To improve estimates you could use a finer grid, like so:
along with some clever sliding window techniques to reduce the amount of processing required (see, for instance, this answer for rectangular cases - for circular windows you should probably use a collection of FIFO queues). To increase security you can randomize the origin of the grid.
Considering again the example you posed:
It's clear that this heuristic has the potential to save considerable time: with the above grid, only a single expensive check would need to be performed in order to prove that the middle point has the most neighbours. Again, a higher-resolution grid will improve the estimates and decrease the number of expensive checks which need to be made.
You could, and should, use a similar bounding technique in conjunction with mcdowella's answers; however, his answer does not provide a good place to start looking, so it is possible to spend a lot of time exploring low-value points.

Given a set of rectangles, do any overlap?

Given a set of rectangles represented as tuples (xmin, xmax, ymin, ymax) where xmin and xmax are the left and right edges, and ymin and ymax are the bottom and top edges, respectively - is there any pair of overlapping rectangles in the set?
A straightforward approach is to compare every pair of rectangles for overlap, but this is O(n^2) - it should be possible to do better.
Update: xmin, xmax, ymin, ymax are integers. So a condition for rectangle 1 and rectangle 2 to overlap is xmin_2 <= xmax_1 AND xmax_2 >= xmin_1; similarly for the Y coordinates.
If one rectangle contains another, the pair is considered overlapping.
You can do it in O(N log N) approach the following way.
Firstly, "squeeze" your y coordinates. That is, sort all y coordinates (tops and bottoms) together in one array, and then replace coordinates in your rectangle description by its index in a sorted array. Now you have all y's being integers from 0 to 2n-1, and the answer to your problem did not change (in case you have equal y's, see below).
Now you can divide the plane into 2n-1 stripes, each unit height, and each rectangle spans completely several of them. Prepare an segment tree for these stripes. (See this link for segment tree overview.)
Then, sort all x-coordinates in question (both left and right boundaries) in the same array, keeping for each coordinate the information from which rectangle it comes and whether this is a left or right boundary.
Then go through this list, and as you go, maintain list of all the rectangles that are currently "active", that is, for which you have seen a left boundary but not right boundary yet.
More exactly, in your segment tree you need to keep for each stripe how many active rectangles cover it. When you encounter a left boundary, you need to add 1 for all stripes between a corresponding rectangle's bottom and top. When you encounter a right boundary, you need to subtract one. Both addition and subtraction can be done in O(log N) using the mass update (lazy propagation) of the segment tree.
And to actually check what you need, when you meet a left boundary, before adding 1, check, whether there is at least one stripe between bottom and top that has non-zero coverage. This can be done in O(log N) by performing a sum on interval query in segment tree. If the sum on this interval is greater than 0, then you have an intersection.
squeeze y's
sort all x's
t = segment tree on 2n-1 cells
for all x's
r = rectangle for which this x is
if this is left boundary
if t.sum(r.bottom, r.top-1)>0 // O(log N) request
you have occurence
t.add(r.bottom, r.top-1, 1) // O(log N) request
else
t.subtract(r.bottom, r.top-1) // O(log N) request
You should implement it carefully taking into account whether you consider a touch to be an intersection or not, and this will affect your treatment of equal numbers. If you consider touch an intersection, then all you need to do is, when sorting y's, make sure that of all points with equal coordinates all tops go after all bottoms, and similarly when you sort x's, make sure that of all equal x's all lefts go before all rights.
Why don't you try a plane sweep algorithm? Plane sweep is a design paradigm widely used in computational geometry, so it has the advantage that it is well studied and a lot of documetation is available online. Take a look at this. The line segment intersection problem should give you some ideas, also the area of union of rectangles.
Read about Bentley-Ottman algorithm for line segment intersection, the problem is very similar to yours and it has O((n+k)logn) where k is the number of intersections, nevertheless, since your rectangles sides are parallel to the x and y axis, it is way more simpler so you can modify Bentley-Ottman to run in O(nlogn +k) since you won't need to update the event heap, since all intersections can be detected once the rectangle is visited and won't modify the sweep line ordering, so no need to mantain the events. To retrieve all intersecting rectangles with the new rectangle I suggest using a range tree on the ymin and ymax for each rectangle, it will give you all points lying in the interval defined by the ymin and ymax of the new rectangle and thus the rectangles intersecting it.
If you need more details you should take a look at chapter two of M. de Berg, et. al Computational Geometry book. Also take a look at this paper, they show how to find all intersections between convex polygons in O(nlogn + k), it might prove simpler than my above suggestion since all data strcutures are explained there and your rectangles are convex, a very good thing in this case.
You can do better by building a new list of rectangles that do not overlap. From the set of rectangles, take the first one and add it to the list. It obviously does not overlap with any others because it is the only one in the list. Take the next one from the set and see if it overlaps with the first one in the list. If it does, return true; otherwise, add it to the list. Repeat for all rectangles in the set.
Each time, you are comparing rectangle r with the r-1 rectangles in the list. This can be done in O(n*(n-1)/2) or O((n^2-n)/2). You can even apply this algorithm to the original set without having to create a new list.

Generating random points with defined minimum and maximum distance

I need algorithm ideas for generating points in 2D space with defined minimum and maximum possible distances between points.
Bassicaly, i want to find a good way to insert a point in a 2D space filled with points, in such manner that the point has random location, but is also more than MINIMUM_DISTANCE_NUM and less than MAXIMUM_DISTANCE_NUM away from nearest points.
I need it for a game, so it should be fast, and not depending on random probability.
Store the set of points in a Kd tree. Generate a new point at random, then examine its nearest neighbors which can be looked up quickly in the Kd tree. If the point is accepted (i.e. MIN_DIST < nearest neighbor < MAX_DIST), then add it to the tree.
I should note that this will work best under conditions where the points are not too tightly packed, i.e. MIN * N << L where N is the number of points and L is the dimension of the box you are putting them in. If this is not true, then most new points will be rejected. But think about it, in this limit, you're packing marbles into a box, and the arrangement can't be very "random" at all above a certain density.
You could use a 2D regular grid of Points (P0,P1,P2,P3,...,P(m*n), when m is width and n is height of this grid )
Each point is associated with 1) a boolean saying wether this grid point was used or not and 2) a 'shift' from this grid position to avoid too much regularity. (or you can put the point+shift coordinates in your grid allready)
Then when you need a new point, just pick a random point of your grid which was not used, state this point as 'used' and use the Point+shift in your game.
Depending on n, m, width/height of your 2D space, and the number of points you're gonna use, this could be just fine.
A good option for this is to use Poisson-Disc sampling. The algorithm is efficient (O(n)) and "produces points that are tightly-packed, but no closer to each other than a specified minimum distance, resulting in a more natural pattern".
https://www.jasondavies.com/poisson-disc/
How many points are you talking about? If you have an upper limit to the number of points, you can generate (pre-compute) an array of points and store them in the array. Take that array and store them in a file.
You'll have all the hard computational processing done before the map loads (so that way you can use any random point generating algorithm) and then you have a nice fast way to get your points.
That way you can generate a ton of different maps and then randomly select one of the maps to generate your points.
This only works if you have an upper limit and you can precomputer the points before the game loads

Find the largest convex black area in an image

I have an image of which this is a small cut-out:
As you can see it are white pixels on a black background. We can draw imaginary lines between these pixels (or better, points). With these lines we can enclose areas.
How can I find the largest convex black area in this image that doesn't contain a white pixel in it?
Here is a small hand-drawn example of what I mean by the largest convex black area:
P.S.: The image is not noise, it represents the primes below 10000000 ordered horizontally.
Trying to find maximum convex area is a difficult task to do. Wouldn't you just be fine with finding rectangles with maximum area? This problem is much easier and can be solved in O(n) - linear time in number of pixels. The algorithm follows.
Say you want to find largest rectangle of free (white) pixels (Sorry, I have images with different colors - white is equivalent to your black, grey is equivalent to your white).
You can do this very efficiently by two pass linear O(n) time algorithm (n being number of pixels):
1) in a first pass, go by columns, from bottom to top, and for each pixel, denote the number of consecutive pixels available up to this one:
repeat, until:
2) in a second pass, go by rows, read current_number. For each number k keep track of the sums of consecutive numbers that were >= k (i.e. potential rectangles of height k). Close the sums (potential rectangles) for k > current_number and look if the sum (~ rectangle area) is greater than the current maximum - if yes, update the maximum. At the end of each line, close all opened potential rectangles (for all k).
This way you will obtain all maximum rectangles. It is not the same as maximum convex area of course, but probably would give you some hints (some heuristics) on where to look for maximum convex areas.
I'll sketch a correct, poly-time algorithm. Undoubtedly there are data-structural improvements to be made, but I believe that a better understanding of this problem in particular will be required to search very large datasets (or, perhaps, an ad-hoc upper bound on the dimensions of the box containing the polygon).
The main loop consists of guessing the lowest point p in the largest convex polygon (breaking ties in favor of the leftmost point) and then computing the largest convex polygon that can be with p and points q such that (q.y > p.y) || (q.y == p.y && q.x > p.x).
The dynamic program relies on the same geometric facts as Graham's scan. Assume without loss of generality that p = (0, 0) and sort the points q in order of the counterclockwise angle they make with the x-axis (compare two points by considering the sign of their dot product). Let the points in sorted order be q1, …, qn. Let q0 = p. For each 0 ≤ i < j ≤ n, we're going to compute the largest convex polygon on points q0, a subset of q1, …, qi - 1, qi, and qj.
The base cases where i = 0 are easy, since the only “polygon” is the zero-area segment q0qj. Inductively, to compute the (i, j) entry, we're going to try, for all 0 ≤ k ≤ i, extending the (k, i) polygon with (i, j). When can we do this? In the first place, the triangle q0qiqj must not contain other points. The other condition is that the angle qkqiqj had better not be a right turn (once again, check the sign of the appropriate dot product).
At the end, return the largest polygon found. Why does this work? It's not hard to prove that convex polygons have the optimal substructure required by the dynamic program and that the program considers exactly those polygons satisfying Graham's characterization of convexity.
You could try treating the pixels as vertices and performing Delaunay triangulation of the pointset. Then you would need to find the largest set of connected triangles that does not create a concave shape and does not have any internal vertices.
If I understand your problem correctly, it's an instance of Connected Component Labeling. You can start for example at: http://en.wikipedia.org/wiki/Connected-component_labeling
I thought of an approach to solve this problem:
Out of the set of all points generate all possible 3-point-subsets. This is a set of all the triangles in your space. From this set remove all triangles that contain another point and you obtain the set of all empty triangles.
For each of the empty triangles you would then grow it to its maximum size. That is, for every point outside the rectangle you would insert it between the two closest points of the polygon and check if there are points within this new triangle. If not, you will remember that point and the area it adds. For every new point you want to add that one that maximizes the added area. When no more point can be added the maximum convex polygon has been constructed. Record the area for each polygon and remember the one with the largest area.
Crucial to the performance of this algorithm is your ability to determine a) whether a point lies within a triangle and b) whether the polygon remains convex after adding a certain point.
I think you can reduce b) to be a problem of a) and then you only need to find the most efficient method to determine whether a point is within a triangle. The reduction of the search space can be achieved as follows: Take a triangle and increase all edges to infinite length in both directions. This separates the area outside the triangle into 6 subregions. Good for us is that only 3 of those subregions can contain points that would adhere to the convexity constraint. Thus for each point that you test you need to determine if its in a convex-expanding subregion, which again is the question of whether it's in a certain triangle.
The whole polygon as it evolves and approaches the shape of a circle will have smaller and smaller regions that still allow convex expansion. A point once in a concave region will not become part of the convex-expanding region again so you can quickly reduce the number of points you'll have to consider for expansion. Additionally while testing points for expansion you can further cut down the list of possible points. If a point is tested false, then it is in the concave subregion of another point and thus all other points in the concave subregion of the tested points need not be considered as they're also in the concave subregion of the inner point. You should be able to cut down to a list of possible points very quickly.
Still you need to do this for every empty triangle of course.
Unfortunately I can't guarantee that by adding always the maximum new region your polygon becomes the maximum polygon possible.

Resources