K-d trees: nearest neighbor search algorithm

This is my understanding of it:
1. Recurse down the tree, taking the left or right subtree according as whether ELEMENT would lie in the left or the right subtree, if it existed.
2. Set CURRENT_BEST as the first leaf node that you reach.
3. As you recurse back up, check to see whether ELEMENT lies closer to the splitting hyperplane than it does to CURRENT_BEST. If so, set CURRENT_BEST as the current node.
This is the part I got from Wikipedia and my class, and the part I don't understand:
4. Check to see whether any node in the other subtree of the splitting point singled out in 3. is closer to ELEMENT than the splitting point.
I don't see why we need to do 4., since any point that might lie in the one subtree of the splitting node must necessarily be closer to the splitting node than to any point in the other subtree.
It's obviously my understanding of the algorithm that is flawed, so help will be greatly appreciated.

Step 4 is the 'else' of step 3: it is what you do when the splitting plane is closer to ELEMENT than CURRENT_BEST is. Just because the point you found would be in the same rectangle as the point you are finding the neighbour for doesn't mean that it is the closest.
Imagine the following scenario: you have two points in your kD-Tree, A and B. A is in the middle of its rectangle, while B is just over the edge, in the partitioned area next to that of A. If you now search for the nearest neighbour to point C, which is right next to B but happens to be on the other side of the edge and in the partition area of A, the first point you choose will be A, because the initial depth-first search picks whatever would be in the same partition as your search point. However, B is actually closer, so even though you chose A, you need to check whether B is closer; otherwise your kD-Tree won't give you correct results.
A good way of visualising this is to draw it out:
A-------------C--|--B
A is the first point we found in the DFS, C is our point we want the nearest neighbour of, B is the actual nearest neighbour, | is our split plane.
Another way to think of it is to draw a circle with radius dist(A,C) around point C. If any other rectangles have any portion of themselves fall within this circle, then there is a chance that they hold a point which might be closer to C than A is, so they must be checked. If you now find B, you can reduce the radius of your circle (because B is closer) so that fewer rectangles have a chance of intersecting, and once you have checked all the rectangles which intersect with your circle (reducing your circle radius as you find closer neighbours) you can definitively say that there are no closer points.
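To make the circle argument concrete, here is a minimal Python sketch of the search. The tuple-based node layout and the names build and nearest are assumptions made for illustration, not taken from any particular library. The comparison abs(diff) < best_d is the "does the circle cross the splitting plane" test that decides whether step 4 is needed.

```python
from math import dist, inf

def build(points, depth=0):
    """Build a k-d tree as nested tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(node, target, best=None, best_d=inf):
    """Return (closest_point, distance) for target."""
    if node is None:
        return best, best_d
    point, axis, left, right = node
    d = dist(point, target)
    if d < best_d:
        best, best_d = point, d
    # Descend first into the side that would contain the target.
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best, best_d = nearest(near, target, best, best_d)
    # Step 4: visit the other side only if the circle of radius best_d
    # around the target crosses the splitting plane.
    if abs(diff) < best_d:
        best, best_d = nearest(far, target, best, best_d)
    return best, best_d

# A = (0, 0) shares the target's partition, B = (10, 0) sits just across the split at x = 9.
tree = build([(0.0, 0.0), (9.0, 50.0), (10.0, 0.0)])
print(nearest(tree, (8.0, 0.0)))  # ((10.0, 0.0), 2.0)
```

In this example the descent reaches A first (distance 8), but the splitting plane is only 1 away, so the far side is searched and B is found.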

I wrote a basic C++ implementation on github. It has both an iterative and recursive version.

function kdtree (list of points pointList, int depth)
{
    // Base case: an empty point list yields an empty subtree
    if pointList is empty
        return nil;
    // Select axis based on depth so that axis cycles through all valid values
    var int axis := depth mod k;
    // Sort point list and choose median as pivot element
    select median by axis from pointList;
    // Create node and construct subtrees
    var tree_node node;
    node.location := median;
    node.leftChild := kdtree(points in pointList before median, depth+1);
    node.rightChild := kdtree(points in pointList after median, depth+1);
    return node;
}

Related

HilbertMaze task on Codility

Can anybody give me a hint about how to approach following task from Codility: https://codility.com/programmers/task/hilbert_maze/
I would be able to find the shortest path by generating the maze and searching for the shortest path using BFS, but since the worst-case time complexity is expected to be O(N) I don't think this would be the right way to go. Time complexity of BFS is O(|V| + |E|) where V is the set of vertices and E the set of edges. For example if N = 3, we have a grid of size 17x17 and it's intuitively obvious that we can't find the path in only 3 steps.
So, either the indicated time complexity is wrong and should be something like M^2 or there is a quick trick to simply calculate the distance between two points without using graph algorithms. I found some algorithms for calculating Hilbert distance for 2 given points (if that's what is needed here), which use bit manipulations etc. but couldn't understand them at all. Moreover, I think that the goal of the task is to find out on your own how to calculate the distance and not using an existing formula.
Is there somebody who solved this task and can help me further? Thanks!
Here is the solution I came up with:
Every point's location can be defined by an array of quadrants and their orientations (it will have N elements), each element representing the orientation in the previous quadrant. The whole maze has an upwards orientation.
You need to define this array for both points. For example: if N = 2 and the point is in the lower left quadrant then it will have the orientation to the left. We take this quadrant and rotate our coordinate system so that it has the same orientation. This way we define the next quadrant and orientation pair in our new system. So if our point is again in the lower left quadrant, it will have orientation to the left, but as this is relative to our previous orientation (which was also to the left) it becomes an upwards orientation.
At this point we have all the quadrants and orientations down to the smallest maze that contains our point. We need to solve them backwards, starting from the smallest maze. Every maze can be solved by the following rules:
if our point in our current quadrant is on any of the extremes (meaning that any of the coordinate's components is either the lowest or highest of the quadrant) we leave it where it was, otherwise check the next points
if our point is downwards or at the middle of the current quadrant then we move to the quadrant's lowest middle point (this is relative to the previously defined orientation, i.e.: if our orientation is upwards then we will move our point to the topmost middle point)
if our point is upwards (in the relative direction) we will have to move it to the topmost middle point
Storing these moves, we check if we have any common elements in the two arrays belonging to the two points:
if not, we calculate the distance between the two endpoints and then add up all the distances from the two move lists (in these lists every distance can be calculated as coordinate component subtractions, i.e.: abs(x1-x2) + abs(y1-y2))
if we have common elements, then we delete every move after them, including the common elements, and we calculate the distance as mentioned in the previous point
This solution can be optimised; it is just meant to present an idea to start with.
Edit: Here is my implementation of the above presented solution in Swift3: https://codility.com/demo/results/training9WWFXU-EWC/

Four-way navigation algorithm

Consider a rectangular shaped canvas, containing rectangles of random sizes and positions. To navigate between these rectangles, a user can use four arrows: up, down, left, right.
Are you familiar with any algorithm of navigation that would produce a fairly straightforward user experience?
I came across a few solutions but none of them seemed suitable. I am aware that no solution will be "ideal". However, the kind of algorithm I am looking for is the sort used to navigate between icons on a desktop using only the arrow keys.
[EDIT 21/5/2013: As pointed out by Gene in a comment, my weighting scheme actually does not guarantee that every rectangle will be reachable from every other rectangle -- only that every rectangle will be connected to some other rectangle in each direction.]
A nice way to do this is using maximum weighted bipartite matching.
What we want to do is build a table defining a function f(r, d) that returns the rectangle that the user will be moved to if they are currently at rectangle r and hit direction d (up, down, left or right). We would like this function to have some nice properties, such as:
1. It must be possible to reach every rectangle from every other rectangle
2. Pressing left then right or vice versa, or up then down or vice versa, should leave the user in the same place
3. Pressing e.g. left should take the user to a rectangle to the left (this is a bit more difficult to state precisely, but we can use a scoring system to measure the quality)
For each rectangle, create 4 vertices in a graph: one for each possible key that could be pressed while at that rectangle. For a particular rectangle r, call them rU, rD, rL and rR. For every pair of rectangles r and s, create 4 edges:
(rU, sD)
(rD, sU)
(rL, sR)
(rR, sL)
This graph has 2 connected components: one contains all U and D vertices, and the other contains all L and R vertices. Each component is bipartite, because e.g. no U vertex is ever connected to another U vertex. We could in fact run maximum weighted bipartite matching on each component separately, although it's easier just to talk about running it once on the entire graph after grouping, say, U vertices with L vertices and D vertices with R vertices.
Assign each of these edges a nonnegative weight according to how much sense it makes for that pair of rectangles to be connected by that pair of keys. You are free to choose the form for this scoring function, but it should probably be:
inversely proportional to the distances between the rectangles (you could use the distance between their centres), and
inversely proportional to how far the angle between the centres of the rectangles differs from the desired horizontal or vertical line, and
zero whenever the rectangles are oriented the wrong way (e.g. for the edge (rU, sD), if the centre of r is actually above the centre of s). Alternatively, you can just delete these zero-weight edges.
This function attempts to satisfy requirement 3 at the top.
[EDIT #2 24/5/2013: Added an example function below.]
Here is C-ish pseudocode for an example function satisfying these properties. It takes the centre points of 2 rectangles and the direction from rectangle 1 (the direction from rectangle 2 is always the opposite of this direction):
const double MAXDISTSQUARED = /* The maximum possible squared distance */;
const double Z = /* A +ve number. Z > 1 => distance more important than angle */;
// Return a weight in the range [0, 1], with higher indicating a better fit.
// The UP/DOWN checks below imply screen coordinates (y grows downwards).
double getWeight(enum direction d, int x1, int y1, int x2, int y2) {
    // Rectangle 2 lies on the wrong side of rectangle 1 for this direction
    if ((d == LEFT && x1 < x2) ||
        (d == RIGHT && x1 > x2) ||
        (d == UP && y1 < y2) ||
        (d == DOWN && y1 > y2)) return 0;
    // Don't need to take sqrt(); in fact it's probably better not to
    double distSquared = (x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2);
    // atan2() takes (y, x); absolute differences keep the angle in [0, PI/2]
    double angle = atan2(fabs(y1 - y2), fabs(x1 - x2)); // 0 => horiz; PI/2 => vert
    if (d == UP || d == DOWN) angle = PI / 2 - angle;
    return 1 - pow(distSquared / MAXDISTSQUARED, Z) * (2 * angle / PI);
}
Now run maximum weighted bipartite matching. This will attempt to find the set of edges having highest total weight such that every vertex (or at least as many as possible) is adjacent to a selected edge, but no vertex is adjacent to more than one edge. (If we allowed a vertex to be adjacent to more than one edge, it would mean that pressing that key while at that rectangle would take you to more than one destination rectangle, which doesn't make sense.) Each edge in this matching corresponds to a bidirectional pair of keypresses, so that pressing e.g. up and then down will take you back to where you were, automatically satisfying requirement 2 at the top.
The only requirement not automatically satisfied by this approach so far is the important one, number 1: it does not necessarily guarantee that every rectangle will be reachable. If we just use the "raw" quality scores as edge weights, then this can actually occur for certain configurations, e.g. when there is one rectangle in each of the 4 corners of the screen, plus one at the centre, the centre one might be unreachable.
[EDIT 21/5/2013: As Gene says, my claim below that property 1 is satisfied by the new weighting scheme I propose is wrong. In many cases every rectangle will be reachable, but in general, you need to solve the NP-hard Hamiltonian Cycle problem to guarantee this. I'll leave the explanation in as it gets us some of the way there. In any case it can be hacked around by adjusting weights between connected components upward whenever subcycles are detected.]
In order to guarantee that the matching algorithm always returns a matching in which every rectangle is reachable, we need to adjust the edge weights so that it is never possible for a matching to score higher than a matching with more edges. This can be achieved by scaling the scoring function to between 0 and 1, and adding the number of rectangles, n, to each edge's weight. This works because a full matching then has score at least 4n^2 (i.e. even if the quality score is 0, the edge itself has a weight of n and there are 4n of them), while any matching with fewer edges has score at most 4(n-1)(n+1) = 4n^2 - 4, which is strictly less.
It's a truism that to a person with a hammer everything looks like a nail. Shortest path algorithms are an obvious tool here because shortest distance seems intuitive.
However we are designing a UI where logical distance is much more important than physical distance.
So let's try thinking differently.
One constraint is that repeatedly hitting the up (right, down or left) arrow ought to eventually cycle through all the rectangles. Otherwise some unreachable "orphans" are likely. Achieving this with an algorithm based on physical (2d) distance is difficult because the closest item in 2d might be in the wrong direction in the 1d projection corresponding to the arrow pair being used. I.e. hitting the up arrow could easily select a box below the current. Ouch.
So let's adopt an extremely simple solution. Just sort all the rectangles on the x-coordinate of their centroids. Hitting the right and left arrow cycles through rectangles in this order: right to the next highest x and left to the next lowest x. Wrap at the screen edges.
Also do the same with y-coordinates. Using up and down cycles in this order.
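As a rough illustration of this sorted-order scheme, here is a small Python sketch. The rectangle format (x, y, width, height), the function names and the assumption that y grows downwards on screen (so "down" means the next highest y) are all choices made for this example.

```python
def build_orders(rects):
    """rects: list of (x, y, width, height).
    Returns rectangle indices sorted by centroid x and by centroid y."""
    ids = range(len(rects))
    x_order = sorted(ids, key=lambda i: rects[i][0] + rects[i][2] / 2)
    y_order = sorted(ids, key=lambda i: rects[i][1] + rects[i][3] / 2)
    return x_order, y_order

def move(current, key, x_order, y_order):
    """Return the index of the rectangle selected after pressing key."""
    order = x_order if key in ("left", "right") else y_order
    step = 1 if key in ("right", "down") else -1
    pos = order.index(current)
    # The modulo wraps around at the ends of the order.
    return order[(pos + step) % len(order)]

rects = [(0, 0, 10, 10), (50, 5, 10, 10), (20, 40, 10, 10)]
x_order, y_order = build_orders(rects)
print(move(0, "right", x_order, y_order))  # 2
```

Because each arrow pair walks a single cyclic order, every rectangle is reachable by pressing one key repeatedly, and left/right (or up/down) always undo each other.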
The key (pun intended) to success is adding dynamic information to the screen while cycling to show the user the logic of what is occurring. Here is a proposal. Others are possible.
At first vertical (up or down) key, a pale translucent overlay appears over the rectangles. They are shaded pale red or blue in a pattern that alternates by y coordinate of centroid. There are also horizontal hash marks of matching color across the entire window. The only reason for two colors is to provide a visual indicator of correspondence between lines and rectangles. The currently selected rectangle is non-translucent and the hash mark is brighter than all the others. When you continue to hit the up or down key, the highlighted box changes in the centroid y-order as described above. The overlay disappears when no arrow key has been struck for a half second or so.
A very similar overlay appears if a horizontal key is hit, only it's vertical hash marks and x-order.
As a user I'd really like this scheme. But YMMV.
The algorithm and data structures needed to implement this are obvious, trivial, and scale very well. The effort will go into making the overlays look good.
NB Now that I have done all the drawings I realize it would be a good idea to place a correctly colored dot at the centroid of each box to show which of the lines is intersecting it. Some illustrative diagrams follow.
[Figure: Bare boxes]
[Figure: Selection with up or down arrow in progress]
[Figure: Selection with left or right arrow in progress]
What about building a movement graph as follows:
for any direction, try to go to the nearest rectangle, in the given direction, whose center point is the middle of the current rectangle's side.
try to eliminate loops, e.g. moving 'right' from A should try to yield a different rectangle than moving 'up-right' from A. For example in this drawing, the 'right' from green should be orange, even though pink would be the nearest mid-point
(Thanks to biziclop): if any rectangles aren't reachable in the graph, then re-map one of the adjoining rectangles to get to it, likely the one with the least error. Repeat until all rectangles are reachable (I think that algorithm would terminate...)
Then store the graph and only use that to navigate. You don't want to change the directions in the middle of the session.
This problem can be modeled as a graph problem, and the navigation algorithm can then be treated as shortest-path routing.
Here is the modelling.
Each rectangle is a vertex in the graph. From each vertex (aka rectangle) you have four options - up, down, left, right. So, you can reach four different rectangles, i.e this vertex will have four neighbors and you add these edges to graph.
I am not sure if this is part of the problem -- "multiple rectangles can be reached from a rectangle using a particular action (e.g. up)". If not, the above modelling is good enough. If yes, then add all such vertices as the neighbor for this vertex. Therefore, you may not end up with a 4-regular graph. Otherwise, you will model your problem into a 4-regular graph.
Now, the question is how do you define your "navigation" algorithm. If you do not want to distinguish between your actions, i.e. if up, down, left, and right are all equal, then you can add weight of 1 to all edges.
If you decide to give a particular action more precedence than other, say up is better than the rest, then you can give weight for edges resulted from up movement as 1, and the remaining edges as 2. The idea is by assigning different weights you can distinguish between the edges you will travel.
If you decide that all up edges are not equal, i.e. the up distance between A and B, is shorter than the up distance between C and D, then you can accordingly assign weights to the edges during the graph construction process.
Here is the routing
Now how to find the route -- You can use Dijkstra's algorithm to find a shortest path between a given pair of vertices. If you are interested in multiple shortest paths, you can use a k-shortest path algorithm to find k shortest paths between a pair of nodes, and then pick your best path.
Note that the graph you end up with does not have to be a directed graph. If you prefer a directed graph, you can assign directions to the edges when you are constructing them. Otherwise you should be good using an undirected graph, as all you care is to use an edge to reach a vertex from another. Moreover, if rectangle A can be reached using up from rectangle B, then rectangle B can be reached using down from rectangle A. Thus, directions really do not matter, if you do not need them for other reasons. If you do not like the assumption I just made, then you need to construct a directed graph.
Hope this helps.

Placing a point to minimise distance from the farthest point

I have a question,
Given a set of points, how do you place a point with the constraint that the distance to the farthest point is as small as possible?
This is in reference to this problem. I do not know how to proceed. Some pointers anyone?
Thanks
Check out this page. It describes several methods to do this.
http://www.personal.kent.edu/~rmuhamma/Compgeometry/MyCG/CG-Applets/Center/centercli.htm
In case the link above ever dies, here's the relevant part that describes the most straight-forward method:
An O(n^2)-time Algorithm
1. Draw a circle at center c such that all points of the given set lie within the circle. Clearly, this circle can be made smaller (or else we have the solution).
2. Make the circle smaller by finding the point A farthest from the center of the circle, and drawing a new circle with the same center that passes through A. These two steps produce a smaller enclosing circle: the new circle still contains all the points of the given set, except that it now passes through the farthest point, A, rather than enclosing it.
3. If the circle passes through 2 or more points, proceed to step 4. Otherwise, make the circle smaller by moving the center towards point A until the circle makes contact with another point B from the set.
4. At this stage we have a circle, C, that passes through two or more points of the given set. If the circle contains a point-free arc longer than half the circle's circumference, the circle can be made smaller. Let D and E be the points at the ends of this point-free interval. While keeping D and E on the circle's boundary, reduce the diameter of the circle until we have either case (a) or case (b).
Case (a): The diameter is the distance DE.
    We are done!
Case (b): The circle C touches another point from the set, F.
    Check whether there exist point-free arc intervals of length more than half the circumference of C.
    If no such point-free arc intervals exist, then
        we are done!
    Else
        go to step 4.
        In this case, three points must lie on an arc less than half the circumference in length. We repeat step 4 on the outer two of the three points on the arc.
Another page here, with a sample applet:
http://www.sunshine2k.de/stuff/Java/Welzl/Welzl.html
You need to use the Voronoi diagram, possibly the Farthest-Point Voronoi diagram, where the plane is divided into regions such that points in the same region have the same farthest point.
Update
You need to build a Farthest-Point Voronoi diagram first, which takes O(nlogn) time, and then find the center of the smallest circle among all the vertices (if the circle is defined by three points) and all the edges (if the circle is defined by two points). The total time complexity of this approach is O(nlogn).
I just saw the Smallest circle problem wiki page; it seems there is an O(n) time algorithm. You can check it out if you care about the speed, otherwise never mind.
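For reference, the expected-linear-time approach is usually implemented as a randomized incremental algorithm in the style of Welzl. The following is a minimal Python sketch of that idea, not the code from either linked page; the function names, the fixed shuffle seed and the eps tolerance are assumptions chosen for this example. The returned center is the point whose distance to the farthest input point is smallest.

```python
import random
from math import dist

def circle_from_two(a, b):
    """Circle with segment ab as its diameter."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2), dist(a, b) / 2

def circle_from_three(a, b, c):
    """Circumcircle of a, b, c (assumes the points are not collinear)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy), dist((ux, uy), a)

def contains(circle, p, eps=1e-9):
    center, radius = circle
    return dist(center, p) <= radius + eps

def smallest_enclosing_circle(points, seed=0):
    """Return (center, radius), or None for an empty input."""
    pts = list(points)
    random.Random(seed).shuffle(pts)  # random order gives the expected O(n) bound
    circle = None
    for i, p in enumerate(pts):
        if circle is not None and contains(circle, p):
            continue
        # p must lie on the boundary of the circle for pts[:i + 1].
        circle = (p, 0.0)
        for j, q in enumerate(pts[:i]):
            if contains(circle, q):
                continue
            # Both p and q must lie on the boundary.
            circle = circle_from_two(p, q)
            for s in pts[:j]:
                if not contains(circle, s):
                    circle = circle_from_three(p, q, s)
    return circle

center, radius = smallest_enclosing_circle([(0, 0), (2, 0), (1, 1), (1, 0.2)])
```

For the four sample points the result should be, up to rounding, the circle centred at (1, 0) with radius 1.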

Algorithm to take the union of rectangles and to see if the union is still a rectangle

I have a problem in which I have to test whether the union of a given set of rectangles forms a rectangle or not. I don't have much experience solving computational geometry problems.
My approach to the problem was that, since I know the coordinates of all the rectangles, I can easily sort the points and then deduce the corner points of the largest rectangle possible. Then I could sweep a line and see if all the points on the line fall inside the rectangle. But this approach is flawed and would fail because the union may be in the form of a 'U'.
It would be a great help if you could push me in the right direction.
Your own version does not take into account that the edges of the rectangles can be non-parallel to each other. Therefore, there might not be "largest rectangle possible".
I would try this general approach:
1) Find the convex hull. You can find convex hull calculation algorithms here http://en.wikipedia.org/wiki/Convex_hull_algorithms.
2) Check if the convex hull is a rectangle. You can do this by looping through all the points on convex hull and checking if they all form 180 or 90 degree angles. If they do not, union is not a rectangle.
3) Go through all points on the convex hull. For each point check if the middle point between ThisPoint and NextPoint lies on the edge of any initially given rectangle.
If every middle point does, union is a rectangle.
If it does not, union is not a rectangle.
Complexity would be O(n log h) for finding convex hull, O(h) for the second part and O(h*n) for third part, where h is number of points on the convex hull.
Edit:
If the goal is to check whether the resulting object is a filled rectangle, not just the edges and corners of a rectangle, then add step (4).
4) Find all line segments that are formed by intersecting or touching rectangles. Note - by definition all of these line segments are segments of edges of given rectangles. If a rectangle does not touch/intersect other rectangles, the line segments are its edges.
For each line segment check if its middle point is
On the edge of the convex hull
Inside one of given rectangles
On the edge of two non-overlapping given rectangles.
If at least one of these is true for every line segment, resulting object is a filled rectangle.
You could deduce the corner points of the largest rectangle possible, and then go over all the rectangles that share a border with the largest possible rectangle, for example the bottom, and make sure that the line is entirely contained in their borders. This will also fail if an empty space in the middle of the rectangle is a problem, however. I think the complexity will be O(n^2).
I think you are on the right track. After you get the coordinates of the largest possible rectangle:
If the largest possible rectangle is a valid rectangle, then each side of it must be a union of sides of the original rectangles. You can scan the original rectangle set and find those rectangles that are part of the largest side we are looking for (this can be done in O(n) by checking if X==largestRectangle.Top.X if you are looking at the top side, etc.); let's call them S.
For each side s in S we can create an interval [from,to]. All we need to check is whether the union of all intervals matches the side of the largest Rectangle. This can be done in O(nlog(n)) by standard algorithms, or on average O(n) by some hash trick (see http://www.careercup.com/question?id=12523672 , see my last comment (of the last comment) there for the O(n) algorithm ).
For example, say we have two 1*1 rectangles in the first quadrant whose bottom-left coordinates are (0,0) and (1,0). The largest rectangle is 2*1 with bottom-left coordinate (0,0). Since [0,1] Union [1,2] is [0,2], the top side and bottom side match the largest rectangle, and similarly for the left and right sides.
Now suppose we have a U shape: 3*1 at (0,0), 1*1 at (0,1), 1*1 at (2,1). The largest rectangle is 3*2 at (0,0). Since for the top side [0,1] Union [2,3] does not match [0,3], the algorithm will report that the union of the above rectangles is not a rectangle.
So you can do this in O(n) on average, or O(nlog(n)) at least if you don't want to mess with some complex hash bucket algorithm. Much better than O(n^4)!
Edit: We have a small problem if there exists empty space somewhere in the middle of all rectangles. Let me think about it....
Edit2: An easy way to detect empty space is, for each corner of a rectangle which is not a point on the largest rectangle, to go outward a little bit in all four (diagonal) directions and check if we are still in any rectangle. This is O(n^2), which ruins my beautiful O(nlog(n))! Can anyone come up with a better idea?
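The per-side interval check described in this answer could look roughly like the Python sketch below. It uses the sort-based O(nlog(n)) variant, not the hash trick, and only handles the top side. The rectangle format (x1, y1, x2, y2) and the function names are assumptions for illustration. As noted in the edits above, this check alone does not detect holes in the middle.

```python
def intervals_cover(intervals, lo, hi):
    """True if the union of closed intervals covers [lo, hi] without gaps."""
    reach = lo
    for start, end in sorted(intervals):
        if start > reach:
            return False  # gap between reach and the next interval
        reach = max(reach, end)
    return reach >= hi

def top_side_covered(rects):
    """rects: list of (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    min_x = min(r[0] for r in rects)
    max_x = max(r[2] for r in rects)
    top = max(r[3] for r in rects)
    # Rectangles whose top edge lies on the top side of the bounding rectangle.
    intervals = [(r[0], r[2]) for r in rects if r[3] == top]
    return intervals_cover(intervals, min_x, max_x)

print(top_side_covered([(0, 0, 1, 1), (1, 0, 2, 1)]))                # True
print(top_side_covered([(0, 0, 3, 1), (0, 1, 1, 2), (2, 1, 3, 2)]))  # False
```

The second call is the U shape from the example: the top-side intervals [0,1] and [2,3] leave a gap, so the side is not covered.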
I haven't looked at a similar problem in the past, so there may be far more efficient ways of doing it. The key problem is that you cannot look at containment of one rectangle in another in isolation since they could be adjacent but still form a rectangle, or one rectangle could be contained within multiple.
You can't just look at the projection of each rectangle on to the edges of the bounding rectangle unless the problem allows you to leave holes in the middle of the rectangle, although that is probably a fast initial check that could be performed before the following exhaustive approach:
1. Run through the list once, calculating the minimum and maximum x and y coordinates and the area of each rectangle.
2. Create an input list containing your input rectangles ordered by descending size.
3. Create a work list containing the bounding rectangle initially.
4. While there are rectangles in the work list:
    Take the largest rectangle in the input list, R.
    Create an empty list for fragments.
    For each rectangle r in the work list, intersect r with R, splitting r into a rectangular portion contained within R (if any) and zero or more rectangles not within R. If r was split, discard the portion contained within R and add the remaining rectangles to the fragment list.
    Add the contents of the fragment list to the work list.
Assuming your rectangles are aligned to the coordinate axis:
Given two rectangles A, B, you can make a function that subtracts B from A returning a set of sub-rectangles of A (that may be the empty set): Set = subtract_rectangle(A, B)
Then, given a set of rectangles R for which you want to know if their union is a rectangle:
Calculate a maximum rectangle Big that covers all the rectangles as ((min_x,min_y)-(max_x,max_y))
make the set S contain the rectangle Big: S = (Big)
for every rectangle B in R:
    S1 = ()
    for every rectangle A in S:
        S1 = S1 + subtract_rectangle(A, B)
    S = S1
At the end, S contains the parts of Big not covered by any rectangle from R.
If S is empty then the union of the rectangles is a rectangle.
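A possible Python version of this subtraction approach is sketched below, assuming axis-aligned rectangles given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2. The way subtract_rectangle splits the remainder into up to four strips is one choice among several.

```python
def subtract_rectangle(a, b):
    """Return the parts of a not covered by b, as a list of rectangles."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Intersection of a and b.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    if ix1 >= ix2 or iy1 >= iy2:
        return [a]  # no overlap of positive area, a is untouched
    pieces = []
    if ay1 < iy1:
        pieces.append((ax1, ay1, ax2, iy1))  # full-width strip at lower y
    if iy2 < ay2:
        pieces.append((ax1, iy2, ax2, ay2))  # full-width strip at higher y
    if ax1 < ix1:
        pieces.append((ax1, iy1, ix1, iy2))  # strip at lower x
    if ix2 < ax2:
        pieces.append((ix2, iy1, ax2, iy2))  # strip at higher x
    return pieces

def union_is_rectangle(rects):
    """True if the union of rects is exactly their bounding rectangle."""
    big = (min(r[0] for r in rects), min(r[1] for r in rects),
           max(r[2] for r in rects), max(r[3] for r in rects))
    uncovered = [big]
    for b in rects:
        uncovered = [piece for a in uncovered
                     for piece in subtract_rectangle(a, b)]
    return not uncovered

print(union_is_rectangle([(0, 0, 1, 1), (1, 0, 2, 1)]))                # True
print(union_is_rectangle([(0, 0, 3, 1), (0, 1, 1, 2), (2, 1, 3, 2)]))  # False
```

Since whatever is left in uncovered is exactly the part of Big that no rectangle covers, this version also catches holes in the middle.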
If the rectangles are not aligned to the coordinate axis you can use a similar algorithm but that employs triangles instead of rectangles. The only issues are that subtracting triangles is not so simple to implement and that handling numerical errors can be difficult.
A simple approach just came to mind: If two rectangles share an edge[1], then together they form a rectangle which contains both - either the rectangles are adjacent [][ ] or one contains the other [[] ].
So if the list of rectangles forms a larger rectangle, then all you need is to repeatedly iterate over the rectangles, and "unify" pairs of them into a single larger one. If in one iteration you can unify none, then it is not possible to create any larger rectangle than you already have with those pieces; otherwise, you will keep "unifying" rectangles until a single one is left.
[1] Share, as in they have the same edge; it is not enough for one of them to have an edge included in one of the other's edges.
efficiency
Since efficiency seems to be a problem, you could probably speed it up by creating two indexes of rectangles, one with the larger edge size and another with the smaller edge size.
Then compare the edges with the same size, and if they are the same unify the two rectangles, remove them from the indexes and add the new rectangle to the indexes.
You can probably speed it up by not moving to the next iteration when you unify something, but to proceed to the end of the indexes before reiterating. (Stopping when one iteration does no unifications, or there is only one rectangle left.)
Additionally, the edges of a rectangle resulting from unification are by analysis always equal or larger than the edges of the original rectangles.
So if the indexes are ordered by ascending edge size, the new rectangle will be inserted in either the same position as you are checking or in positions yet to be checked, so each unification will not require an extra iteration cycle. (As the new rectangle will assuredly not unify with any rectangle previously checked in this iteration, since its edges are larger than all edges checked.)
For this to hold, in each step of a particular iteration you need to attempt unification on the next smaller edge from either of the indexes:
If you're in index1=3 and index2=6, you check index1 and advance that index;
If next edge on that index is 5, next iteration step will be in index1=5 and index2=6, so it will check index1 and advance that index;
If next edge on that index is 7, next iteration step will be in index1=7 and index2=6, so it will check index2 and advance that index;
If next edge on that index is 10, next iteration step will be in index1=7 and index2=10, so it will check index1 and advance that index;
etc.
examples
[A ][B ]
[C ][D ]
A can be unified with B, C with D, and then AB with CD. One left, ABCD, thus possible.
[A ][B ]
[C ][D ]
A can be unified with B, C with D, but AB cannot be unified with CD. 2 left, AB and CD, thus not possible.
[A ][B ]
[C ][D [E]]
A can be unified with B, C with D, CD with E, CDE with AB. 1 left, ABCDE, thus possible.
[A ][B ]
[C ][D ][E]
A can be unified with B, C with D, CD with AB, but not E. 2 left, ABCD and E, thus not possible.
pitfall
If a rectangle is contained in another but does not share a border, this approach will not unify them.
A way to address this is, when one hits an iteration that does not unify anything and before concluding that it is not possible to unify the set of rectangles, to get the rectangle with the widest edge and discard from the indexes all others that are contained within this largest rectangle.
This still does not address two situations.
First, consider this map:
A B C D
E F G H
Suppose we have rectangles ACGE and BDFH. These rectangles share no edge and neither contains the other, but together they form a larger rectangle.
Second, consider this map:
A B C D
E F G H
I J K L
Suppose we have rectangles ABIJ, CDHG and EHLI. They do not share edges, are not contained within each other, and no two of them can be unified into a single rectangle; but in total they form a rectangle.
With these pitfalls this method is not complete. But it can be used to greatly reduce the complexity of the problem and reduce the number of rectangles to analyse.
Maybe...
Gather up all the x-coordinates in a list, and sort them. From this list, create a sequence of adjacent intervals. Do the same thing for the y-coordinates. Now you've got two lists of intervals. For each pair of intervals (A=[x1,x2] from the x-list, B=[y1,y2] from the y-list), make their product rectangle A x B = (x1,y1)-(x2,y2)
If every single product rectangle is contained in at least one of your initial rectangles, then the union must be a rectangle.
Making this efficient (I think I've offered about an O(n^4) algorithm) is a different question entirely.
As jva stated, "Your own version does not take into account that the edges of the rectangles can be non-parallel to each other." This answer also assumes "parallel" rectangles.
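The product-rectangle check described above can be sketched as follows. This assumes axis-parallel rectangles given as `(x1, y1, x2, y2)` tuples with `x1 < x2` and `y1 < y2`. The function name `union_is_rectangle` is made up for illustration, and no attempt is made at efficiency.

```python
from itertools import product

def union_is_rectangle(rects):
    """Return True if the union of axis-parallel rectangles is a rectangle.

    rects: list of (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    """
    # Sorted, de-duplicated coordinates define the adjacent intervals.
    xs = sorted({x for x1, _, x2, _ in rects for x in (x1, x2)})
    ys = sorted({y for _, y1, _, y2 in rects for y in (y1, y2)})
    x_intervals = list(zip(xs, xs[1:]))
    y_intervals = list(zip(ys, ys[1:]))
    # Every product rectangle must lie inside at least one input rectangle.
    for (xa, xb), (ya, yb) in product(x_intervals, y_intervals):
        if not any(x1 <= xa and xb <= x2 and y1 <= ya and yb <= y2
                   for x1, y1, x2, y2 in rects):
            return False
    return True
```

Because every input edge lies on one of the grid lines, each product rectangle is either fully inside or fully outside any given input rectangle, which is why testing containment per cell is enough.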
If you have a grid as opposed to needing infinite precision, depending on the number and sizes of the rectangles and the granularity of the grid, it might be feasible to brute-force it.
Just take your "largest rectangle possible" and test all its points to see whether each point is in at least one of the smaller rectangles.
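A brute-force version of this idea might look like the sketch below. It assumes integer coordinates and treats each unit grid cell as one "point" to test. The function name `covers_bounding_box` and the `(x1, y1, x2, y2)` tuple format are assumptions for illustration.

```python
def covers_bounding_box(rects):
    """Brute-force check on an integer grid.

    rects: list of (x1, y1, x2, y2) with integer coordinates (assumed format).
    Returns True if every unit cell of the bounding box is covered.
    """
    # The "largest rectangle possible" is the bounding box of all inputs.
    x_min = min(r[0] for r in rects)
    y_min = min(r[1] for r in rects)
    x_max = max(r[2] for r in rects)
    y_max = max(r[3] for r in rects)
    for x in range(x_min, x_max):
        for y in range(y_min, y_max):
            # The cell [x, x+1) x [y, y+1) must lie in some rectangle.
            if not any(x1 <= x < x2 and y1 <= y < y2
                       for x1, y1, x2, y2 in rects):
                return False
    return True
```

The cost grows with the area of the bounding box times the number of rectangles, so this is only practical for coarse grids.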
I was finally able to find this impressive JavaScript project (thanks to GitHub search :) !)
https://github.com/evanw/csg.js
Also have a look into my answer here with other interesting projects
General case, thinking in images:
| outer_rect - union(inner rectangles) |
Check that result is zero

Triangle partitioning

This was a problem in the 2010 Pacific ACM-ICPC contest. The gist of it is trying to find a way to partition a set of points inside a triangle into three subtriangles such that each partition contains exactly a third of the points.
Input:
Coordinates of a bounding triangle: (v1x,v1y),(v2x,v2y),(v3x,v3y)
A number 3n < 30000 representing the number of points lying inside the triangle
Coordinates of the 3n points: (x_i,y_i) for i=1...3n
Output:
A point (sx,sy) that splits the triangle into 3 subtriangles such that each subtriangle contains exactly n points.
The way the splitting point splits the bounding triangle into subtriangles is as follows: Draw a line from the splitting point to each of the three vertices. This will divide the triangle into 3 subtriangles.
We are guaranteed that such a point exists. Any such point will suffice (the answer is not necessarily unique).
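To make the output condition concrete, here is a small checker for a candidate splitting point. It is only a verification helper, not a solution to the problem. The names `cross`, `in_triangle` and `subtriangle_counts` are illustrative, and points lying exactly on a dividing line are treated as belonging to no subtriangle (an assumption, since the statement does not say how such ties are handled).

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, a, b, c):
    """True if p is strictly inside triangle abc (either orientation)."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)

def subtriangle_counts(v1, v2, v3, s, points):
    """Count the points strictly inside each subtriangle formed by s."""
    triangles = [(s, v1, v2), (s, v2, v3), (s, v3, v1)]
    return [sum(in_triangle(p, *t) for p in points) for t in triangles]
```

A candidate `(sx, sy)` is a valid answer when `subtriangle_counts` returns `[n, n, n]`.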
Here is an example of the problem for n=2 (6 points). We are given the coordinates of each of the colored points and the coordinates of each vertex of the large triangle. The splitting point is circled in gray.
Can someone suggest an algorithm faster than O(n^2)?
Here's an O(n log n) algorithm. Let's assume no degeneracy.
The high-level idea is, given a triangle PQR,
P
C \
/ S\
R-----Q
we initially place the center point C at P. Slide C toward R until there are n points inside the triangle CPQ and one (S) on the segment CQ. Slide C toward Q until either triangle CRP is no longer deficient (perturb C and we're done) or CP hits a point. In the latter case, slide C away from P until either triangle CRP is no longer deficient (we're done) or CQ hits a point, in which case we begin sliding C toward Q again.
Clearly the implementation cannot “slide” points, so for each triangle involving C, for each vertex S of that triangle other than C, store the points inside the triangle in a binary search tree sorted by angle with S. These structures suffice to implement this kinetic algorithm.
I assert without proof that this algorithm is correct.
As for the running time, each event is a point-line intersection and can be handled in time O(log n). The angles PC and QC and RC are all monotonic, so each of O(1) lines hits each point at most once.
The main idea: given a line, we can try to find a point on it using linear search. If the line is not good enough, we can move it using binary search.
1. Sort the points based on the direction from vertex A. Sort them for B and C too.
2. Set current range for vertex A to be all the points.
3. Select 2 middle points from the range for vertex A. These 2 points define subrange for 'A'. Get some line AD lying between these points.
4. Iterate for all the points lying between B and AD (starting from BA). Stop when n points found. Select subrange of directions from B to points n and next after n (if there is no point after n, use BC). If less than n points can be found, set current range for vertex A to be the left half of the current range and go to step 3.
5. Same as step 4, but for vertex C.
6. If subranges A, B, C intersect, choose any point from there and finish. Otherwise, if A&B is closer to A, set current range for vertex A to be the right half of the current range and go to step 3. Otherwise set current range for vertex A to be the left half of the current range and go to step 3.
Complexity: sorting O(n * log n), search O(n * log n). (Combination of binary and linear search).
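The sorting step of this approach could be sketched as below. Angles are measured from the edge leaving the vertex, so that points inside the triangle never straddle the branch cut of `atan2`. The function name `sort_by_direction` and its `toward` parameter (the neighbouring vertex that defines the zero direction) are assumptions for illustration.

```python
import math

def sort_by_direction(vertex, toward, points):
    """Sort points by counter-clockwise angle around `vertex`,
    measured from the ray vertex -> toward."""
    rx, ry = toward[0] - vertex[0], toward[1] - vertex[1]

    def angle(p):
        dx, dy = p[0] - vertex[0], p[1] - vertex[1]
        # atan2(cross, dot) gives the signed angle from the reference ray.
        return math.atan2(rx * dy - ry * dx, rx * dx + ry * dy)

    return sorted(points, key=angle)
```

Calling it once per vertex, each time with the next vertex of the triangle as `toward`, gives the three sorted orders that the binary and linear searches operate on.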
Here is an approach that takes O(log n) passes of cost n each.
Each pass starts with an initial point, which divides the triangle into three subtriangles. If each has n points, we are finished. If not, consider the subtriangle which is furthest away from the desired n. Suppose it has too many, just for now. The imbalances sum to zero, so at least one of the other two subtriangles has too few points. The third subtriangle either also has too few, or has exactly n points - or the original subtriangle would not have the highest discrepancy.
Take the most imbalanced subtriangle and consider moving the centre point along the line leading away from it. As you do so, the imbalance of the most imbalanced subtriangle will reduce. For each point in the triangle, you can work out when that point crosses into or out of the most imbalanced subtriangle as you move the centre point. Therefore you can work out in time n where to move the centre point to give the most imbalanced triangle any desired count.
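One way to compute when a point crosses one edge of the subtriangle is sketched below, assuming the centre moves as `centre + t * direction`. The helper solves for the `t` at which the line through a fixed vertex and the moving centre passes through point `p`. The names `crossing_parameter`, `centre` and `direction` are illustrative, and a fuller version would handle both moving edges of the subtriangle and confirm that `p` lies on the segment rather than only on the extended line.

```python
def cross(a, b):
    """Z-component of the 2D cross product a x b."""
    return a[0] * b[1] - a[1] * b[0]

def crossing_parameter(vertex, centre, direction, p):
    """Return t such that p lies on the line through `vertex` and
    `centre + t * direction`, or None if no single such t exists."""
    cv = (centre[0] - vertex[0], centre[1] - vertex[1])
    pv = (p[0] - vertex[0], p[1] - vertex[1])
    # Solve cross(cv + t * direction, pv) = 0 for t.
    denom = cross(direction, pv)
    if denom == 0:
        return None  # direction is parallel to vertex -> p
    return -cross(cv, pv) / denom
```

Computing this parameter for every point takes linear time, after which the value of `t` that yields a desired count can be picked from the resulting crossing values.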
As you move the centre point you can choose whether points move in or out of the most imbalanced subtriangle, but you can't choose which of the other two subtriangles they go to, or from - but you can predict which easily from which side of the line along which you are sliding the centre point they live, so you can move the centre point along this line to get the lowest maximum discrepancy after the move. In the worst case, all of the points moved go into, or out of, the subtriangle that was exactly balanced. However, if the imbalanced subtriangle has n + k points, by moving k/2 of them, you can move, at worst, to the case where it and the previously balanced subtriangle are out by k/2. The third subtriangle may still be unbalanced by up to k, in the other direction, but in this case a second pass will reduce the maximum imbalance to something below k/2.
Therefore in the case of a large unbalance, we can reduce it by at worst a constant factor in two passes of the above algorithm, so in O(log n) passes the imbalance will be small enough that we are into special cases where we worry about an excess of at most one point. Here I am going to guess that the number of such special cases is practically enumerable in a program, and the cost amounts to a small constant addition.
I think there is a linear time algorithm. See the last paragraph of the paper "Illumination by floodlights" by Steiger and Streinu. Their algorithm works for any k1, k2, k3 that sum up to n. Therefore, k1=k2=k3=n/3 is a special case.
Here is the link where you can find the article: http://www.sciencedirect.com/science/article/pii/S0925772197000278 A CiteSeerX link is http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.4634

Resources