Merging overlapping axis-aligned rectangles

Merging overlapping axis-aligned rectangles - algorithm

I have a set of axis-aligned rectangles. When two rectangles overlap (partly or completely), they shall be merged into their common bounding box. This process works recursively.
Detecting all overlaps and using union-find to form groups that are merged at the end will not work, because merging two rectangles covers a larger area and can create new overlaps. (In the figure below, after the two overlapping rectangles have been merged, a new overlap appears.)
As the number of rectangles in my case is moderate (say N<100), a brute-force solution is usable (try all pairs and, if an overlap is found, merge and restart from the beginning). Still, I would like to reduce the complexity, which is probably O(N³) in the worst case.
Any suggestions on how to improve this?

I think an R-Tree will do the job here. An R-Tree indexes rectangular regions and lets you insert, delete and query (e.g., intersect queries) in O(log n) for "normal" queries in low dimensions.
The idea is to process your rectangles successively. For each rectangle, do the following:
1. Perform an intersect query on the current R-Tree (empty in the beginning).
2. If there are results, delete them from the R-Tree, merge the current rectangle with all result rectangles and insert the newly merged rectangle (for this last step, jump back to step 1).
3. If there are no results, just insert the rectangle into the R-Tree.
In total you will perform:
O(n) intersect queries in step 1 (O(n log n)),
O(n) insert steps in step 3 (O(n log n)),
at most n delete and n insert steps in step 2 (O(n log n)), because each time you perform step 2 you decrease the number of rectangles by at least 1.
In theory you should get away with O(n log n), however the merging steps in the end (with large rectangles) might have low selectivity and need more than O(log n), but depending on the data distribution this should not ruin the overall runtime.
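A minimal Python sketch of the three steps above, assuming rectangles are (xmin, ymin, xmax, ymax) tuples and that rectangles which merely touch along an edge do not count as overlapping. A plain list stands in for the R-Tree here, so each query costs O(n) instead of O(log n); with a real R-Tree library the list comprehension and the remove/append calls would be replaced by its query, delete and insert operations. The names overlaps, bounding_box and merge_overlapping are made up for illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlaps(a: Rect, b: Rect) -> bool:
    # Strict inequalities: sharing only an edge or a corner is not an overlap (assumption).
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def bounding_box(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_overlapping(rects: List[Rect]) -> List[Rect]:
    index: List[Rect] = []  # stand-in for the R-Tree; always pairwise non-overlapping
    for rect in rects:
        current = rect
        while True:
            # Step 1: intersect query against everything inserted so far.
            hits = [r for r in index if overlaps(r, current)]
            if not hits:
                # Step 3: nothing overlaps, so the rectangle can be inserted.
                index.append(current)
                break
            # Step 2: delete the hits, grow the current rectangle, query again.
            for hit in hits:
                index.remove(hit)
                current = bounding_box(current, hit)
    return index


if __name__ == "__main__":
    # The third rectangle bridges the first two only after it has been merged with the first.
    print(merge_overlapping([(0, 0, 2, 2), (2.5, 0, 4, 0.5), (1, 1, 3, 3), (10, 10, 11, 11)]))
```

Because the index never contains two overlapping rectangles, every pass through step 2 removes at least one stored rectangle, which is what bounds the number of repeated queries.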

Use a balanced normalized quad-tree.
Normalized: Gather all the x coordinates, sort them and replace them with the index in the sorted array. Same for the y coordinates.
Balanced: When building the quad-tree always split at the middle coordinate.
So when you get a rectangle, you mark the correct nodes in the tree with some id of the rectangle. If you find any other rectangles underneath (which means they overlap), gather them in a set. When done, if the set is not empty (you found overlapping rectangles), create a new rectangle to represent the union of the sub-rectangles. If the computed rectangle is bigger than the one you just inserted, apply the algorithm again using the newly computed rectangle. Repeat this until it no longer grows, then move to the next input rectangle.
For performance, every node in the quad-tree stores all the rectangles overlapping that node, in addition to marking it as an end-node.
Complexity: Initial normalization is O(NlogN). Inserting and checking for overlaps will be O(log(N)^2). You need to do this for the original N rectangles and also for the overlaps. Every time you find an overlap you eliminate at least one of the original rectangles, so you can find at most (N-1) overlaps. So you need about 2N operations, and the overall complexity is going to be O(N(log(N)^2)).
This is better than other approaches because you don't need to check any-to-any rectangles for overlap.
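The normalization step can be sketched in Python as follows, assuming rectangles are (xmin, ymin, xmax, ymax) tuples. Dropping duplicate coordinates before ranking is my own assumption, and the function name normalize is hypothetical. The sorted coordinate lists are returned as well so that results can be mapped back to the original coordinates.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def normalize(rects: List[Rect]):
    # Gather and sort the distinct x and y coordinates.
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    # Replace every coordinate by its index in the sorted array.
    x_rank = {value: i for i, value in enumerate(xs)}
    y_rank = {value: i for i, value in enumerate(ys)}
    ranked = [(x_rank[r[0]], y_rank[r[1]], x_rank[r[2]], y_rank[r[3]]) for r in rects]
    return ranked, xs, ys


if __name__ == "__main__":
    ranked, xs, ys = normalize([(0.5, 10, 7.25, 20), (3, 15, 100, 18)])
    print(ranked)
    # Mapping a normalized rectangle back: (xs[x0], ys[y0], xs[x1], ys[y1]).
```

Since a merged bounding box only reuses coordinates of its inputs, it stays on the same integer grid, so the tree never needs to be rebuilt after a merge.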

This can be solved using a combination of a plane sweep and a spatial data structure: we merge intersecting rectangles along the sweep line and put any rectangles behind the sweep line into the spatial data structure. Every time we get a newly merged rectangle, we check the spatial data structure for any rectangles intersecting this new rectangle and merge them if found.
If any rectangle (R) behind the sweep line intersects some rectangle (S) under the sweep line, then one of the two corners of R nearest to the sweep line is inside S. This means that the spatial data structure should store points (two points for each rectangle) and answer queries about any point lying inside a given rectangle. One obvious way to implement such a data structure is a segment tree where each leaf contains the rectangle with top and bottom sides at the corresponding y-coordinate and each other node points to one of its descendants containing the rightmost rectangle (the one whose right side is nearest to the sweep line).
To use such segment tree we should compress (normalize) y-coordinates of rectangles' corners.
1. Compress the y-coordinates.
2. Sort the rectangles by the x-coordinate of their left sides.
3. Move the sweep line to the next rectangle; if it passes the right sides of some rectangles, move them to the segment tree.
4. Check if the current rectangle intersects anything along the sweep line. If not, go to step 3.
5. Check if the union of the rectangles found in step 4 intersects anything in the segment tree and merge recursively, then go to step 4.
6. When step 3 reaches the end of the list, get all rectangles under the sweep line and all rectangles in the segment tree and uncompress their coordinates.
Worst-case time complexity is determined by segment tree: O(n log n). Space complexity is O(n).

Related

How to find and print intersections of n given circles in O(n*log(n)) time?

I'm looking for an O(n*logn) algorithm to find and print the intersections of n given circles. Each circle is specified by its center and radius.
An O(n^2) algorithm consists of checking, for every pair of circles, whether the distance between the centers is less than or equal to the sum of the radii. However, I'm looking for a faster one!
A similar problem is the intersection of line segments. I think my problem can also be solved using a line sweep algorithm, but I am unable to figure out how to modify the event queue in the case of circles.
Please also take care of the following corner case. The black points indicate the event points (as per User Sneftel's solution below, the intersection of the circles marked by arrows won't be printed).

The line sweep algorithm will simply add circles to a list when you encounter their left extrema (that is, (x-r, y)), and remove them from the list when you encounter their right extrema. Right before you add a circle to the list, check it against the circles already in the list. So your event queue is basically the left and right extrema of all circles, sorted by x. (Note that you know all the events ahead of time, so it's not really a "queue" in the normal sense.)
This is also known as "sweep and prune".
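A rough Python sketch of this sweep, assuming circles are given as (x, y, r) tuples and using the distance test from the question (distance between centers at most the sum of the radii). The function name intersecting_pairs is made up. Note that the active list can hold many circles at once, so this is still quadratic in the worst case; it only prunes pairs whose x-extents do not overlap.

```python
import math
from typing import List, Tuple

Circle = Tuple[float, float, float]  # (center x, center y, radius)


def intersecting_pairs(circles: List[Circle]) -> List[Tuple[int, int]]:
    events = []
    for i, (x, _, r) in enumerate(circles):
        events.append((x - r, 0, i))  # left extremum: circle enters the active set
        events.append((x + r, 1, i))  # right extremum: circle leaves the active set
    # Kind 0 sorts before kind 1, so circles whose x-extents merely touch are still compared.
    events.sort()

    active = set()
    pairs = []
    for _, kind, i in events:
        if kind == 1:
            active.discard(i)
            continue
        xi, yi, ri = circles[i]
        # Right before adding the circle, check it against the circles already active.
        for j in active:
            xj, yj, rj = circles[j]
            if math.hypot(xi - xj, yi - yj) <= ri + rj:
                pairs.append((min(i, j), max(i, j)))
        active.add(i)
    return pairs


if __name__ == "__main__":
    print(sorted(intersecting_pairs([(0, 0, 1), (1.5, 0, 1), (5, 0, 1), (6.9, 0, 1), (0, 10, 1)])))
```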

This is the correct solution I found, based on a modification of User Sneftel's algorithm, which wasn't working for all cases.
Fig 1: Represent each circle by a bounding box.
Now, to use the sweep line method with the sweep line moving parallel to the y-axis, we need TWO line segments to represent each circle's y-range, as shown in figure 2.
Having done that, the problem reduces to the following:
Here 2 line segments represent one circle.
The sweep line status can be maintained in any balanced dynamic data structure, such as an AVL tree, a skip list or a red-black tree, with insertion/update/deletion/retrieval time of at most O(log n).
The comparison function in this case checks whether the two circles corresponding to adjacent line segments intersect (instead of checking whether the line segments themselves intersect, as in the original line sweep method for finding line segment intersections). This takes O(1) time, as only a constant number of operations is required.
Number of event points: 4n (for n circles => 2n line segments => 4n end points)
Complexity = O(4n log(4n)) = O(n log n)

Segment Intersection

Here is a question from CLRS.
A disk consists of a circle plus its interior and is represented by its center point and radius. Two disks intersect if they have any point in common. Give an O(n lg n)-time algorithm to determine whether any two disks in a set of n intersect.
It's not my homework. I think we can take the horizontal diameter of every circle as its representative line segment. If two segments become consecutive in the sweep order, we check the distance between the two centers. If it is less than or equal to the sum of the radii of the circles, then they intersect.
Please let me know if I'm correct.

Build a Voronoi diagram for disk centers. This is an O(n log n) job.
Now for each edge of the diagram, take the corresponding pair of centers and check whether their disks intersect.

Build a k-d tree with the centres of the circles.
For every circle (p, r), find using the k-d tree the set S of circles whose centres are nearer than 2r from p.
Check if any of the circles in S touches the current circle.
I think the average cost for this algorithm is O(NlogN).
The logic is that we loop over the set, which is O(N), and for every element get a subset of nearby elements, which is O(NlogN), so, a priori, the complexity is O(N^2 logN). But we also have to consider that the probability of two random circles being less than 2r apart and not touching is less than 3/4 (if they touch we can short-circuit the algorithm).
That means that the average size of S is probabilistically limited to be a small value.

Another approach to solve the problem:
Divide the plane using a grid whose diameter is that of the biggest circle.
Use a hashing algorithm to classify the grid cells in N groups.
For every circle calculate the grid cells it overlaps and the corresponding groups.
Get all the circles in a group and...
Check if the biggest circle touches any other circle in the group.
Recurse applying this algorithm to the remaining circles in the group.
The same algorithm, implemented in Scala: https://github.com/salva/simplering/blob/master/touching/src/main/scala/org/vesbot/simplering/touching/Circle.scala

Finding number of lattice points inside a region

Given a set of points in the 2-D plane, how can I find the number of points lying on or inside an arbitrary triangle?
One method is to check every point to see whether it lies inside the given triangle.
But I read that a k-d tree can be used to find the number of points lying within a region in O(log n) time, where n is the number of points. I did not understand how to implement that.
Is there any simpler method to do that?
Or will a k-d tree work? If so, can someone explain how?

It can be done by recursively checking the position of sub-partitions relative to the triangle. To see which points of a tree node are in the triangle, check for each of the node's partitions (there are 2 in a k-d tree) whether it is wholly inside the triangle, wholly outside the triangle, or intersecting the triangle. If a partition is inside the triangle, add the number of points in that partition to the result; if a partition is outside the triangle, do nothing for that partition; if a partition intersects the triangle, make the same check for the sub-partitions of that partition.
For this, each tree node has to store the number of points in its sub-tree, which is easy to do during tree creation.
The running time depends on the number of intersections of the triangle's edges with partition boundaries. I'm not sure whether it is O(log n) in the worst case.
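Here is a minimal Python sketch of that recursion, under a few assumptions of mine: each node stores the tight bounding box of its points rather than the splitting planes, leaves hold up to leaf_size points, and the triangle is non-degenerate. A box counts as "inside" when all four corners are in the triangle, and as "outside" when the bounding boxes do not overlap or all four corners lie beyond one triangle edge. All names (Node, build, count_in_triangle) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, tri):
    # tri must be counter-clockwise; points on the border count as inside.
    a, b, c = tri
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0


@dataclass
class Node:
    box: Tuple[float, float, float, float]  # tight bounding box of the subtree's points
    count: int                              # number of points in the subtree
    points: Optional[List[Point]] = None    # set for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], depth: int = 0, leaf_size: int = 4) -> Node:
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    box = (min(xs), min(ys), max(xs), max(ys))
    if len(points) <= leaf_size:
        return Node(box, len(points), points=list(points))
    pts = sorted(points, key=lambda p: p[depth % 2])  # alternate the split axis
    mid = len(pts) // 2
    return Node(box, len(pts),
                left=build(pts[:mid], depth + 1, leaf_size),
                right=build(pts[mid:], depth + 1, leaf_size))


def count_in_triangle(node: Node, tri) -> int:
    a, b, c = tri
    if cross(a, b, c) < 0:  # normalize to counter-clockwise order
        tri = (a, c, b)
    return _count(node, tri)


def _count(node: Node, tri) -> int:
    x0, y0, x1, y1 = node.box
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    if all(in_triangle(p, tri) for p in corners):
        return node.count  # partition wholly inside: add its stored count
    txs, tys = [p[0] for p in tri], [p[1] for p in tri]
    edges = ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0]))
    if (max(txs) < x0 or min(txs) > x1 or max(tys) < y0 or min(tys) > y1
            or any(all(cross(a, b, p) < 0 for p in corners) for a, b in edges)):
        return 0  # partition wholly outside: contributes nothing
    if node.points is not None:
        return sum(in_triangle(p, tri) for p in node.points)
    return _count(node.left, tri) + _count(node.right, tri)
```

Usage would be something like count_in_triangle(build(points), (a, b, c)) with a non-empty point list. The sketch does not settle the worst-case question; a long thin triangle can still cross many partitions.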

How to calculate total volume of multiple overlapping cuboids

I have a list of cuboids, defined by the coordinates of their lower-left-back and upper-right-front corners, with edges parallel to the axes. Coordinates are double values. These cuboids are densely packed and will overlap with one or more others, or even fully contain others.
I need to calculate the total volume encompassed by all the given cuboids. Areas which overlap (even multiple times) should be counted exactly once.
For example, the volumes:
((0,0,0) (3,3,3))
((0,1,0) (2,2,4))
((1,0,1) (2,5,2))
((6,6,6) (8,8,8))
The total volume is 27 + 2 + 2 + 8 = 39.
Is there an easy way to do this (in O(n^3) time or better)?

How about maintaining a collection of non-intersecting cuboids as each one is processed?
This collection would start empty.
The first cuboid would be added to the collection – it would be the only element, therefore guaranteed not to intersect anything else.
The second and subsequent cuboids would be checked against the elements in the collection. For each new cuboid N, for each element E already in the collection:
If N is totally contained by E, discard N and resume processing at the next new cuboid.
If N totally contains E, remove E from the collection and continue testing N against the other elements in the collection.
If N intersects E, split N into up to five (see comment below) smaller cuboids (depending on how they intersect) representing the volume that does not intersect and continue testing these smaller cuboids against the other elements in the collection.
If we get to the end of the tests against the non-intersecting elements with one or more cuboids generated from N (representing the volume contributed by N that wasn't in any of the previous cuboids) then add them all to the collection and process the next cuboid.
Once all the cuboids have been processed, the total volume will be the sum of the volumes in the collection of non-intersecting cuboids that has been built up.

This can be solved efficiently using a plane-sweep algorithm, that is a straightforward extension of the line-sweep algorithm suggested here for finding the total area of overlapping rectangles.
For each cuboid, add its left and right x-coordinates to an event queue and sort the queue. Now sweep a yz-plane (that has a constant x value) through the cuboids and record the volume between any two successive events in the event queue. We do this by maintaining the list of rectangles that intersect the plane at any stage.
As we sweep the plane we encounter two types of events:
(1) We see the beginning of new cuboid that starts intersecting the sweeping plane. In this case a new rectangle intersects the plane, and we update the area of the rectangles intersecting the sweeping plane.
(2) The end of an existing cuboid that was intersecting with the plane. In this case we have to remove the corresponding rectangle from the list of rectangles that are currently intersecting the plane and update the new area of the resulting rectangles.
The volume of the cuboids between any two successive events q_i and q_{i+1} is equal to the horizontal distance between the two events times the area of the rectangles intersecting the sweep plane at q_i.
By using the O(n log n) algorithm for computing the area of rectangles as a subroutine, we can obtain an O(n^2 log n) algorithm for computing the total volume of the cuboids. But there may be a more efficient way of maintaining the rectangles (since we only add or delete a rectangle at any stage).
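A short Python sketch of this plane sweep, assuming cuboids are given as ((x0, y0, z0), (x1, y1, z1)) pairs. To keep it short, the area subroutine below is a simple strip-by-strip sweep that is slower than the O(n log n) algorithm mentioned above, so the overall bound is worse than O(n^2 log n); swapping in a segment-tree based area routine would fix that. The names union_area and union_volume are made up.

```python
def union_area(rects):
    """Area of the union of rectangles given as (y0, z0, y1, z1)."""
    ys = sorted({y for r in rects for y in (r[0], r[2])})
    area = 0.0
    for ya, yb in zip(ys, ys[1:]):
        # z-intervals of the rectangles that span the whole strip [ya, yb].
        spans = sorted((r[1], r[3]) for r in rects if r[0] <= ya and r[2] >= yb)
        covered, end = 0.0, float("-inf")
        for lo, hi in spans:
            if hi > end:  # only the part beyond what is already covered counts
                covered += hi - max(lo, end)
                end = hi
        area += covered * (yb - ya)
    return area


def union_volume(cuboids):
    """Volume of the union of cuboids given as ((x0, y0, z0), (x1, y1, z1))."""
    events = []
    for i, (lo, hi) in enumerate(cuboids):
        events.append((lo[0], 1, i))  # cuboid starts intersecting the sweep plane
        events.append((hi[0], 0, i))  # cuboid stops intersecting the sweep plane
    events.sort()

    active = set()
    volume, prev_x = 0.0, None
    for x, kind, i in events:
        if active and x > prev_x:
            # Cross-section of the active cuboids in the yz-plane.
            rects = [(cuboids[j][0][1], cuboids[j][0][2],
                      cuboids[j][1][1], cuboids[j][1][2]) for j in active]
            volume += union_area(rects) * (x - prev_x)
        if kind == 1:
            active.add(i)
        else:
            active.discard(i)
        prev_x = x
    return volume


if __name__ == "__main__":
    print(union_volume([((0, 0, 0), (3, 3, 3)), ((0, 1, 0), (2, 2, 4)),
                        ((1, 0, 1), (2, 5, 2)), ((6, 6, 6), (8, 8, 8))]))
```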

I recently had the same problem and found the following approach easy to implement and working for n dimensions.
First build a grid and then check for each cell in the grid whether it overlaps with a cuboid or not. The volume of overlapping cuboids is the sum of the volumes for those cells which are included in one or more cuboids.
Describe your cuboids with their min/max value for each dimension.
For each dimension store min/max values of each cuboid in an array. Sort this array and remove duplicates.
Now you have grid points of a non-equidistant grid. Each cell of the grid is either completely inside one or more cuboids or not.
Iterate over the grid cells and count the volume for those cells which overlap with one or more cuboids.
You can get all grid cells by using the Cartesian Product.
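The steps above could look roughly like this in Python, assuming each cuboid is a (mins, maxs) pair of coordinate tuples with the same number of dimensions. The function name grid_union_volume is hypothetical. The Cartesian product over the per-dimension intervals enumerates the grid cells.

```python
from itertools import product


def grid_union_volume(cuboids):
    """Union volume of axis-aligned boxes given as (mins, maxs) coordinate tuples."""
    if not cuboids:
        return 0.0
    dims = len(cuboids[0][0])
    # Per dimension: all min/max values, sorted, duplicates removed.
    axes = [sorted({c[side][d] for c in cuboids for side in (0, 1)}) for d in range(dims)]

    total = 0.0
    # Each index tuple picks one interval per dimension, i.e. one grid cell.
    for idx in product(*(range(len(axis) - 1) for axis in axes)):
        lo = [axes[d][i] for d, i in enumerate(idx)]
        hi = [axes[d][i + 1] for d, i in enumerate(idx)]
        # A cell is either completely inside some cuboid or inside none.
        if any(all(c[0][d] <= lo[d] and hi[d] <= c[1][d] for d in range(dims))
               for c in cuboids):
            cell = 1.0
            for d in range(dims):
                cell *= hi[d] - lo[d]
            total += cell
    return total


if __name__ == "__main__":
    print(grid_union_volume([((0, 0, 0), (3, 3, 3)), ((0, 1, 0), (2, 2, 4)),
                             ((1, 0, 1), (2, 5, 2)), ((6, 6, 6), (8, 8, 8))]))
```

The containment test scans all cuboids for every cell, so this is meant to show the idea rather than to be fast.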

I tried the cellular approach suggested by #ccssmnn; it worked but was way too slow. The problem is that the size of the array used for "For each dimension store min/max values of each cuboid in an array." is O(n), so the number of cells (hence, the execution time) is n^d, e.g., n^3 for three dimensions.
Next, I tried a nested sweep-line algorithm, as suggested by #krjampani; much faster but still too slow. I believe the complexity is n^2*log^3(n).
So now, I'm wondering if there's any recourse. I've read several postings that mention the use of interval trees or augmented interval trees; might this approach have better complexity, e.g., n*log^3(n)?
Also, I'm trying to get my head around what would be the augmenting value in this case? In the case of point or range queries, I can see sorting the cuboids by their (xlo,ylo,zlo) and using max(xhi,yhi,zhi) for each subtree as the augmenting value, but can't figure out how to extend this to keep track of the union of the cuboids and its volume.

Intersection of N rectangles

I'm looking for an algorithm to solve this problem:
Given N rectangles in the Cartesian plane, find out whether the intersection of those rectangles is empty or not. Each rectangle can lie in any direction (its edges need not be parallel to Ox and Oy).
Do you have any suggestions for solving this problem? :) I can think of testing the intersection of each pair of rectangles. However, that's O(N*N) and quite slow :(

Abstract
Either use a sorting algorithm according to smallest X value of the rectangle, or store your rectangles in an R-tree and search it.
Straight-forward approach (with sorting)
Let us denote low_x() - the smallest (leftmost) X value of a rectangle, and high_x() - the highest (rightmost) X value of a rectangle.
Algorithm:
Sort the rectangles according to low_x(). # O(n log n)
For each rectangle in the sorted array: # O(n)
    Find its highest X point. # O(1)
    Compare it with all rectangles whose low_x() is smaller than this.high_x(). # O(log n)
Complexity analysis
This should work in O(n log n) on uniformly distributed rectangles.
The worst case would be O(n^2), for example when the rectangles don't overlap but are one above another. In this case, generalize the algorithm to have low_y() and high_y() too.
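A small Python sketch of the sorting approach, assuming each rectangle is a list of four (x, y) corners so that rotated rectangles work too. It only reports candidate pairs whose x-extents overlap; an exact intersection test would still have to be run on each candidate. The binary search is where the O(log n) step comes from, and the function name candidate_pairs is made up.

```python
from bisect import bisect_right


def candidate_pairs(rects):
    """Pairs of rectangle indices whose x-extents overlap."""
    # (low_x, high_x, index) for every rectangle, sorted by low_x.
    spans = sorted((min(x for x, _ in r), max(x for x, _ in r), i)
                   for i, r in enumerate(rects))
    lows = [span[0] for span in spans]

    pairs = []
    for k, (_, high_x, i) in enumerate(spans):
        # Binary search for the first rectangle that starts beyond high_x.
        end = bisect_right(lows, high_x)
        for _, _, j in spans[k + 1:end]:
            pairs.append((min(i, j), max(i, j)))
    return pairs


if __name__ == "__main__":
    print(candidate_pairs([
        [(0, 0), (2, 0), (2, 1), (0, 1)],
        [(1, 5), (3, 5), (3, 6), (1, 6)],
        [(5, 0), (6, 0), (6, 1), (5, 1)],
    ]))
```

In the example, the first two rectangles are reported even though they are far apart in y, which is exactly the worst case described above.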
Data-structure approach: R-Trees
R-trees (a spatial generalization of B-trees) are one of the best ways to store geospatial data, and can be useful in this problem. Simply store your rectangles in an R-tree, and you can spot intersections with a straightforward O(n log n) complexity. (n searches, log n time for each).

Observation 1: given a polygon A and a rectangle B, the intersection A ∩ B can be computed by 4 intersection with half-planes corresponding to each edge of B.
Observation 2: cutting a half-plane from a convex polygon gives you a convex polygon. The first rectangle is a convex polygon. This operation increases the number of vertices by at most 1.
Observation 3: the signed distance of the vertices of a convex polygon to a straight line is a unimodal function.
Here is a sketch of the algorithm:
Maintain the current partial intersection D in a balanced binary tree in a CCW order.
When cutting a half-plane defined by a line L, find the two edges in D that intersect L. This can be done in logarithmic time through some clever binary or ternary search exploiting the unimodality of the signed distance to L. (This is the part I don't exactly remember.) Remove all the vertices on the one side of L from D, and insert the intersection points to D.
Repeat for all edges L of all rectangles.
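A simplified Python sketch of the half-plane cutting, assuming every rectangle is given as four corners in counter-clockwise order. Instead of the balanced tree and the logarithmic search, it walks over all vertices of D for every cut (plain Sutherland-Hodgman style clipping), so each cut is linear in the size of D; it shows the geometry, not the claimed running time. The names clip and common_intersection are hypothetical.

```python
def clip(polygon, a, b):
    """Keep the part of a convex polygon on or to the left of the directed line a -> b."""
    def side(p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

    out = []
    for i, p in enumerate(polygon):
        q = polygon[(i + 1) % len(polygon)]
        sp, sq = side(p), side(q)
        if sp >= 0:
            out.append(p)  # p is inside the half-plane (or on its border)
        if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
            # Edge p-q crosses the line: insert the intersection point.
            t = sp / (sp - sq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def common_intersection(rects):
    """Intersection of all rectangles (corner lists in CCW order); [] if it is empty."""
    region = list(rects[0])
    for rect in rects[1:]:
        for i in range(4):
            region = clip(region, rect[i], rect[(i + 1) % 4])
            if not region:
                return []
    return region


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    shifted = [(1, 1), (3, 1), (3, 3), (1, 3)]
    diamond = [(1, -1), (3, 1), (1, 3), (-1, 1)]  # a rotated rectangle
    print(common_intersection([square, shifted, diamond]))
```

If the rectangles only touch, the returned region degenerates to a point or a segment, so a caller that needs a positive-area intersection would have to check the area as well.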

This seems like a good application of Klee's measure. Basically, if you read http://en.wikipedia.org/wiki/Klee%27s_measure_problem there are lower bounds on the runtime of the best algorithms that can be found for rectilinear intersections at O(n log n).

I think you should use something like the sweep line algorithm: finding intersections is one of its applications. Also, have a look at the answers to this question.

Since the rectangles need not be parallel to the axes, it is easier to transform the problem into an already solved one: compute the intersections of the borders of the rectangles.
build a set S which contains all borders, together with the rectangle they're belonging to; you get a set of tuples of the form ((x_start,y_start), (x_end,y_end), r_n), where r_n is of course the ID of the corresponding rectangle
now use a sweep line algorithm to find the intersections of those lines
The sweep line stops at every x-coordinate in S, i.e. all start values and all end values. For every new start coordinate, put the corresponding line in a temporary set I. For each new end-coordinate, remove the corresponding line from I.
Additionally to adding new lines to I, you can check for each new line whether it intersects with one of the lines currently in I. If they do, the corresponding rectangles do, too.
You can find a detailed explanation of this algorithm here.
The runtime is O(n*log(n) + c*log(n)), where c is the number of intersection points of the lines in I.

Pick the smallest rectangle from the set (or any rectangle), and go over each point within it. If one of its points also lies in all other rectangles, the intersection is not empty. If no point lies in ALL other rectangles, the intersection is empty.
