Nearest neighbour search in a constantly changing set of line segments - algorithm

I have a set of line segments. I want to perform the following operations on them:
1. Insert a new line segment.
2. Find all line segments within radius R of a given point.
3. Find all points within radius R1 of a given point.
4. Given a line segment, find whether it intersects any of the existing line segments. The exact intersection point is not necessary (though that probably doesn't simplify anything).
The problem with algorithms like kd/bd trees or BSP trees is that they assume a static set of points; in my case the set will constantly be augmented with new points, necessitating a rebuild of the tree.
What data structures/algorithms will be best suited to this situation?
Edit: Accepted the answer that is the simplest and solves my problem. Thanks everyone!

Maintaining a dynamic tree might not be as bad as you think.
As you insert new points/lines etc into the collection it's clear that you'd need to refine the current tree, but I can't see why you'd have to re-build the whole tree from scratch every time a new entity was added, as you're suggesting.
With a dynamic tree approach you'd have a valid tree at all times, so you can then just use the fast range searches supported by your tree type to find the geometric entities you've mentioned.
For your particular problem:
You could set up a dynamic geometric tree, where the leaf elements maintain a list of geometric entities (points and lines) associated with that leaf.
When a line is inserted into the collection it should be pushed onto the lists of all leaf elements that it intersects with. You can do this efficiently by traversing the subset of the tree, from the root, that intersects with the line.
To find all points/lines within a specified radial halo you first need to find all leaves in this region. Again, this can be done by traversing the subset of the tree, from the root, that is enclosed by or intersects with the halo. Since there may be some overlap, you need to check that the entities associated with this set of leaf elements actually lie within the halo.
Once you've inserted a line into a set of leaf elements, you can find whether it intersects another line by scanning all of the lines associated with the subset of leaf boxes you've just found, and doing line-line intersection tests on that subset.
A potential dynamic tree refinement algorithm, based on an upper limit to the number of entities associated with each leaf in the tree, might work along these lines:
function insert(x, y)
    find the tree leaf element enclosing the new entity at (x, y), using whatever
    fast search algorithm your tree supports
    if (number of entities in leaf > max allowable) then
        refine the current leaf element (typically a bisection or quadrisection,
        depending on the 2D tree type)
        push all entities from the old leaf element's list onto the new child
        element lists, based on which child encloses each entity
    else
        push the new entity onto the list for the leaf element
    endif
This type of refinement strategy only makes local changes to the tree and is thus generally pretty fast in practice. If you're also deleting entities from the collection, you can support dynamic aggregation as well by imposing a minimum number of entities per leaf and collapsing leaf elements into their parents when necessary.
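To make the idea concrete, here is a minimal Python sketch of a dynamic quadtree with this kind of leaf refinement, handling points only for brevity; the names (QuadTreeNode, MAX_PER_LEAF) are illustrative, not from any particular library:

MAX_PER_LEAF = 8

class QuadTreeNode:
    def __init__(self, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)
        self.children = None          # None while this node is still a leaf
        self.entities = []            # points stored in this leaf

    def insert(self, px, py):
        if self.children is not None:
            self._child_for(px, py).insert(px, py)
            return
        self.entities.append((px, py))
        if len(self.entities) > MAX_PER_LEAF:
            self._refine()

    def _refine(self):
        # Quadrisect this leaf and redistribute its entities to the children.
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTreeNode(x0, y0, mx, my), QuadTreeNode(mx, y0, x1, my),
                         QuadTreeNode(x0, my, mx, y1), QuadTreeNode(mx, my, x1, y1)]
        for px, py in self.entities:
            self._child_for(px, py).insert(px, py)
        self.entities = []

    def _child_for(self, px, py):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if px >= mx else 0) + (2 if py >= my else 0)]

    def query_radius(self, cx, cy, r, out):
        # Prune subtrees whose box lies entirely outside the halo.
        x0, y0, x1, y1 = self.bounds
        nx, ny = min(max(cx, x0), x1), min(max(cy, y0), y1)
        if (nx - cx) ** 2 + (ny - cy) ** 2 > r * r:
            return
        if self.children is not None:
            for child in self.children:
                child.query_radius(cx, cy, r, out)
        else:
            out.extend(p for p in self.entities
                       if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r * r)

Line segments would be handled the same way, except that a segment is pushed onto every leaf whose box it intersects, and the radial query then collects candidate segments for an exact distance or intersection test.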
I've used this type of approach a number of times with quadtrees/octrees, and I can't at this stage see why a similar approach wouldn't work with kd-trees etc.
Hope this helps.

One possibility is dividing your space into a grid of boxes - perhaps 10 in the y-axis and 10 in the x-axis for a total of 100.
Store these boxes in an array, so it's very easy/fast to determine neighboring boxes. Each box holds a list (e.g. a vector) of the line segments that lie in that box.
When you calculate line segments within R of one segment, you can check only the relevant neighboring boxes.
Of course, you can create multiple maps of differing granularities, maybe 100 by 100 smaller boxes. Simply consider space vs time and maintenance trade-offs in your design.
Updating segment positions is cheap: just integer-divide by the box size in the x and y directions. For example, if the box size is 20 in both directions and your new coordinate is (145, 30), then 145/20 == 7 and 30/20 == 1, so it goes into box(7, 1) in a 0-based system.
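A rough Python sketch of that bookkeeping, assuming a box size of 20 and storing each segment only in the boxes containing its endpoints (a complete version would add every box the segment passes through):

from collections import defaultdict

BOX = 20                                  # box size in both directions

def box_of(x, y):
    return (int(x // BOX), int(y // BOX)) # e.g. (145, 30) -> box (7, 1)

grid = defaultdict(list)                  # (bx, by) -> segments in that box

def insert_segment(seg):
    (x1, y1), (x2, y2) = seg
    for cell in {box_of(x1, y1), box_of(x2, y2)}:
        grid[cell].append(seg)

def candidates_near(px, py, r):
    # Gather segments from all boxes that could contain anything within r.
    bx, by = box_of(px, py)
    reach = int(r // BOX) + 1
    found = []
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            found.extend(grid[(bx + dx, by + dy)])
    return found                          # still needs an exact distance check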

While items 2 & 3 are relatively easy, using a simple linear search with distance checks as each line is inserted, item 4 is a bit more involved.
I'd tend to use a constrained triangulation to solve this, where all the input lines are treated as constraints, and the triangulation is balanced using a nearest-neighbour rather than Delaunay criterion. This is covered pretty well in Triangulations and Applications by Øyvind Hjelle and Morten Dæhlen, and in Joseph O'Rourke's Computational Geometry in C. Both have source available, including routines for getting sets of all intersections.
The approach I've taken to do this dynamically in the past is as follows:
Create an arbitrary triangulation (TIN) consisting of two triangles surrounding the extents (current + future) of your data.
For each new line:
Insert both end points into the TIN. This can be done very quickly by traversing triangles to find the insertion point, and replacing the triangle found with three new triangles based on the new point and the old triangle.
Cut a section through the TIN between the two end points, keeping a list of points where the section cuts any previously inserted lines. Add the intersection point details to a list stored against both lines, and insert the intersection points into the TIN.
Force the inserted line in as a constraint.
Balance all pairs of adjacent triangles modified in the above process using a nearest-neighbour criterion, and repeat until all triangles have been balanced.
This works better than a grid-based method for poorly distributed data, but is more difficult to implement. Grouping end-points and lines into overlapping grids will probably be a good optimization for items 2 & 3.
Note that I think using the term 'nearest neighbour' in your question is misleading, as this is not the same as 'all points within a given distance of a line', or 'all points within a given radius of another point'. Nearest neighbour typically implies a single result, and does not equate to 'within a given point to point or point to line distance'.

Instead of inserting into and deleting from a tree, you can compute a curve that completely fills the plane. Such a curve reduces the 2D problem to a 1D one, and you can then find the nearest neighbour by searching along the curve. Examples are the Z-order curve and the Hilbert curve. Here is a description of the related closest-pair problem: http://en.wikipedia.org/wiki/Closest_pair_of_points_problem.
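For illustration, here is a small Python sketch of a Z-order (Morton) key for non-negative 16-bit integer coordinates; keeping entities sorted by this key is what turns the 2D problem into a 1D one:

def part1by1(n):
    # Spread the low 16 bits of n so a zero bit separates each original bit.
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton_key(x, y):
    # Interleave x and y bits; nearby points usually get nearby keys.
    return (part1by1(y) << 1) | part1by1(x)

Keeping the entities in a list sorted by this key lets a binary search land near spatially close entries, which then serve as candidates for the exact nearest-neighbour check.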


Using a spatial index to find points within range of each other

I'm trying to find a spatial index structure suitable for a particular problem: using a union-find data structure, I want to connect/associate points that are within a certain range of each other.
I have a lot of points and I'm trying to optimize an existing solution by using a better spatial index.
Right now, I'm using a simple 2D grid indexing each square of width [threshold distance] of my point map, and I look for potential unions by searching for points in adjacent squares in the grid.
Then I compute the squared Euclidean distance for each candidate pair of points from the adjacent cells, compare it to my squared threshold, and use the union-find structure (optimized with path compression etc.) to build groups of points.
Here is some illustration of the method. The single black points actually represent the set of points that belong to a cell of the grid, and the outgoing colored arrows represent the actual distance comparisons with the outside points.
(I'm also checking for potential connected points that belong to the same cells).
By using this pattern I make sure I'm not doing any distance comparison twice by using a proper "neighbor cell" pattern that doesn't overlap with already tested stuff when I iterate over the grid cells.
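For reference, the union-find part of this setup can be as small as the following sketch (with path compression and union by rank; illustrative code, not the actual implementation in use):

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]   # path compression
            i = self.parent[i]
        return i

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra                                # union by rank
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

union(i, j) is called for every candidate pair whose squared distance is within the squared threshold.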
Issue is : this approach is not even close to being fast enough, and I'm trying to replace the "spatial grid index" method with something that could maybe be faster.
I've looked into quadtrees as a suitable spatial index for this problem, but I don't think it is suitable to solve it (I don't see any way of performing repeated "neighbours" checks for a particular cell more effectively using a quadtree), but maybe I'm wrong on that.
Therefore, I'm looking for a better algorithm/data structure to effectively index my points and query them for proximity.
Thanks in advance.
I have some comments:
1) I think your problem is equivalent to a "spatial join". A spatial join takes two sets of geometries, for example a set R of rectangles and a set P of points, and finds for every rectangle all points in that rectangle. In your case, R would be the rectangles (edge length = 2 * max distance) around each point and P the set of your points. Searching for "spatial join" may give you some useful references.
2) You may want to have a look at space filling curves. Space filling curves create a linear order for a set of spatial entities (points) with the property that points that are close in the linear ordering are usually also close in space (and vice versa). This may be useful when developing an algorithm.
3) Have a look at OpenVDB. OpenVDB has a spatial index structure that is highly optimized for traversing 'voxel' cells and their neighbors.
4) Have a look at the PH-Tree (disclaimer: this is my own project). The PH-Tree is somewhat like a quadtree but uses low-level bit operations to optimize navigation. It is also Z-ordered/Morton-ordered (see space filling curves above). You can create a window query for each point which returns all points within that rectangle. To my knowledge, the PH-Tree is the fastest index structure for this kind of operation, especially if you typically have only 9 points in a rectangle. If you are interested in the code, the V13 implementation is probably the fastest; however, the V16 should be much easier to understand and modify.
I tried it on my rather old desktop machine: with about 1,000,000 points I can do about 200,000 window queries per second, so it should take about 5 seconds to find all neighbors for every point.
If you are using Java, my spatial index collection may also be useful.
A standard approach to this is the "sweep and prune" algorithm. Sort all the points by X coordinate, then iterate through them. As you do, maintain the lowest index of the point which is within the threshold distance (in X) of the current point. The points within that range are candidates for merging. You then do the same thing sorting by Y. Then you only need to check the Euclidean distance for those pairs which showed up in both the X and Y scans.
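Here is a rough Python sketch of one sweep pass under that scheme; the candidate set is the intersection of the X-pass and Y-pass results, and the names are illustrative:

def sweep_pairs(points, threshold, axis):
    # One sweep-and-prune pass: index pairs within `threshold` along one axis.
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    pairs = set()
    lo = 0
    for hi in range(len(order)):
        v = points[order[hi]][axis]
        while points[order[lo]][axis] < v - threshold:
            lo += 1                       # lowest index still within threshold
        for j in range(lo, hi):
            pairs.add(frozenset((order[j], order[hi])))
    return pairs

# Candidate pairs are those found in both passes; only they need the exact check:
# candidates = sweep_pairs(pts, t, 0) & sweep_pairs(pts, t, 1)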
Note that with your current union-find approach, you can end up unioning points which are quite far from each other, if there are a bunch of nearby points "bridging" them. So your basic approach -- of unioning groups of points based on proximity -- can induce an arbitrary amount of distance error, not just the threshold distance.

Most efficient way to select point with the most surrounding points

N.B: there's a major edit at the bottom of the question - check it out
Question
Say I have a set of points:
I want to find the point with the most points surrounding it, within a given radius (i.e. a circle) or a given square region around the point, in 2 dimensions. I'll refer to this as the densest point function.
For the diagrams in this question, I'll represent the surrounding region as circles. In the image above, the middle point's surrounding region is shown in green. This middle point has the most surrounding points of all the points within radius and would be returned by the densest point function.
What I've tried
A viable way to solve this problem would be to use a range searching solution; this answer explains further and that it has " worst-case time". Using this, I could get the number of points surrounding each point and choose the point with largest surrounding point count.
However, if the points were extremely densely packed (in the order of a million), as such:
then each of these million points () would need to have a range search performed. The worst-case time , where is the number of points returned in the range, is true for the following point tree types:
kd-trees of two dimensions (which are actually slightly worse, at ),
2d-range trees,
Quadtrees, which have a worst-case time of
So, for a group of points within radius of all points within the group, it gives complexity of for each point. This yields over a trillion operations!
Any ideas on a more efficient, precise way of achieving this, so that I could find the point with the most surrounding points for a group of points, and in a reasonable time (preferably or less)?
EDIT
Turns out that the method above is correct! I just need help implementing it.
(Semi-)Solution
If I use a 2d-range tree:
A range reporting query costs , for returned points,
For a range tree with fractional cascading (also known as layered range trees) the complexity is ,
For 2 dimensions, that is ,
Furthermore, if I perform a range counting query (i.e., I do not report each point), then it costs .
I'd perform this on every point - yielding the complexity I desired!
Problem
However, I cannot figure out how to write the code for a counting query for a 2d layered range tree.
I've found a great resource (from page 113 onwards) about range trees, including 2d-range tree pseudocode. But I can't figure out how to introduce fractional cascading, nor how to correctly implement the counting query so that it is of O(log n) complexity.
I've also found two range tree implementations here and here in Java, and one in C++ here, although I'm not sure this uses fractional cascading as it states above the countInRange method that
It returns the number of such points in worst case
* O(log(n)^d) time. It can also return the points that are in the rectangle in worst case
* O(log(n)^d + k) time where k is the number of points that lie in the rectangle.
which suggests to me it does not apply fractional cascading.
Refined question
To answer the question above therefore, all I need to know is if there are any libraries with 2d-range trees with fractional cascading that have a range counting query of complexity so I don't go reinventing any wheels, or can you help me to write/modify the resources above to perform a query of that complexity?
Also, I'm not complaining if you can provide me with any other method of achieving a range counting query on 2D points with that complexity!
I suggest using a plane sweep algorithm. This allows one-dimensional range queries instead of 2-D queries, which is more efficient, simpler, and in the case of a square neighborhood does not require fractional cascading:
Sort points by Y-coordinate to array S.
Advance 3 pointers into array S: one (C) for the currently inspected (center) point; another, A (a little bit ahead), for the nearest point at distance > R below C; and the last one, B (a little bit behind), for the farthest point at distance < R above it.
Insert the points pointed to by A into an order statistic tree (ordered by X coordinate) and remove the points pointed to by B from this tree. Use this tree to find points at distance R to the left/right of C, and use the difference of these points' positions in the tree to get the number of points in the square area around C.
Use results of previous step to select "most surrounded" point.
This algorithm could be optimized if you rotate points (or just exchange X-Y coordinates) so that width of the occupied area is not larger than its height. Also you could cut points into vertical slices (with R-sized overlap) and process slices separately - if there are too many elements in the tree so that it does not fit in CPU cache (which is unlikely for only 1 million points). This algorithm (optimized or not) has time complexity O(n log n).
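A simplified Python sketch of the square-neighbourhood version is below; it uses a plain sorted list with bisect in place of the order statistic tree, so insertion/removal is O(n) here rather than O(log n), but the counting logic is the same:

import bisect

def most_surrounded(points, R):
    # Count, for each point, the points inside the 2R x 2R square around it.
    # A sorted list + bisect stands in for the order statistic tree here.
    pts = sorted(points, key=lambda p: p[1])        # sort by Y
    xs = []                                          # X's of points within R in Y
    ahead = behind = 0
    best, best_count = None, -1
    for cx, cy in pts:
        while ahead < len(pts) and pts[ahead][1] <= cy + R:
            bisect.insort(xs, pts[ahead][0])
            ahead += 1
        while pts[behind][1] < cy - R:
            xs.remove(pts[behind][0])
            behind += 1
        # Count includes the centre point itself, the same offset for every point.
        count = bisect.bisect_right(xs, cx + R) - bisect.bisect_left(xs, cx - R)
        if count > best_count:
            best, best_count = (cx, cy), count
    return best, best_count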
For circular neighborhood (if R is not too large and points are evenly distributed) you could approximate circle with several rectangles:
In this case step 2 of the algorithm should use more pointers to allow insertion/removal to/from several trees. And on step 3 you should do a linear search near points at proper distance (<=R) to distinguish points inside the circle from the points outside it.
Other way to deal with circular neighborhood is to approximate circle with rectangles of equal height (but here circle should be split into more pieces). This results in much simpler algorithm (where sorted arrays are used instead of order statistic trees):
Cut area occupied by points into horizontal slices, sort slices by Y, then sort points inside slices by X.
For each point in each slice, assume it to be a "center" point and do step 3.
For each nearby slice use binary search to find points with Euclidean distance close to R, then use linear search to tell "inside" points from "outside" ones. Stop linear search where the slice is completely inside the circle, and count remaining points by difference of positions in the array.
Use results of previous step to select "most surrounded" point.
This algorithm allows optimizations mentioned earlier as well as fractional cascading.
I would start by creating something like a k-d tree (https://en.wikipedia.org/wiki/K-d_tree), where you have a tree with points at the leaves and each node holds information about its descendants. At each node I would keep a count of the number of descendants, and a bounding box enclosing those descendants.
Now for each point I would recursively search the tree. At each node I visit, either all of the bounding box is within R of the current point, all of the bounding box is more than R away from the current point, or some of it is inside R and some outside R. In the first case I can use the count of the number of descendants of the current node to increase the count of points within R of the current point and return up one level of the recursion. In the second case I can simply return up one level of the recursion without incrementing anything. It is only in the intermediate case that I need to continue recursing down the tree.
So I can work out for each point the number of neighbours within R without checking every other point, and pick the point with the highest count.
If the points are spread out evenly then I think you will end up constructing a k-d tree where the lower levels are close to a regular grid, and I think if the grid is of size A x A then in the worst case R is large enough so that its boundary is a circle that intersects O(A) low level cells, so I think that if you have O(n) points you could expect this to cost about O(n * sqrt(n)).
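A rough Python sketch of this counting search, assuming a small bucketed k-d-style tree whose nodes store a descendant count and bounding box (names and bucket size are illustrative):

import math

class CountingNode:
    def __init__(self, points, depth=0):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        self.bbox = (min(xs), min(ys), max(xs), max(ys))
        self.count = len(points)
        if len(points) <= 4:                       # small bucket at the leaves
            self.points, self.left, self.right = points, None, None
            return
        axis = depth % 2
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2
        self.points = None
        self.left = CountingNode(points[:mid], depth + 1)
        self.right = CountingNode(points[mid:], depth + 1)

def count_within(node, cx, cy, R):
    x0, y0, x1, y1 = node.bbox
    far = math.hypot(max(abs(cx - x0), abs(cx - x1)), max(abs(cy - y0), abs(cy - y1)))
    near = math.hypot(max(x0 - cx, 0, cx - x1), max(y0 - cy, 0, cy - y1))
    if far <= R:
        return node.count                          # box entirely inside the circle
    if near > R:
        return 0                                   # box entirely outside the circle
    if node.points is not None:                    # partial overlap at a leaf
        return sum(math.hypot(px - cx, py - cy) <= R for px, py in node.points)
    return count_within(node.left, cx, cy, R) + count_within(node.right, cx, cy, R)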
You can speed up whatever algorithm you use by preprocessing your data in O(n) time to estimate the number of neighbouring points.
For a circle of radius R, create a grid whose cells have dimension R in both the x- and y-directions. For each point, determine to which cell it belongs. For a given cell c this test is easy:
c.x<=p.x && p.x<=c.x+R && c.y<=p.y && p.y<=c.y+R
(You may want to think deeply about whether a closed or half-open interval is correct.)
If you have relatively dense/homogeneous coverage, then you can use an array to store the values. If coverage is sparse/heterogeneous, you may wish to use a hashmap.
Now, consider a point on the grid. The extremal locations of a point within a cell are as indicated:
Points at the corners of the cell can only be neighbours with points in four cells. Points along an edge can be neighbours with points in six cells. Points not on an edge are neighbours with points in 7-9 cells. Since it's rare for a point to fall exactly on a corner or edge, we assume that any point in the focal cell is neighbours with the points in all 8 surrounding cells.
So, if a point p is in a cell (x,y), N[p] identifies the number of neighbours of p within radius R, and Np[y][x] denotes the number of points in cell (x,y), then N[p] is given by:
N[p] = Np[y][x]+
Np[y][x-1]+
Np[y-1][x-1]+
Np[y-1][x]+
Np[y-1][x+1]+
Np[y][x+1]+
Np[y+1][x+1]+
Np[y+1][x]+
Np[y+1][x-1]
Once we have the number of neighbours estimated for each point, we can heapify that data structure into a maxheap in O(n) time (with, e.g. make_heap). The structure is now a priority-queue and we can pull points off in O(log n) time per query ordered by their estimated number of neighbours.
Do this for the first point and use a O(log n + k) circle search (or some more clever algorithm) to determine the actual number of neighbours the point has. Make a note of this point in a variable best_found and update its N[p] value.
Peek at the top of the heap. If the estimated number of neighbours is less than N[best_found] then we are done. Otherwise, repeat the above operation.
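A condensed Python sketch of this estimate-then-verify loop; exact_count stands in for the exact circle search and is assumed, not implemented here:

from collections import defaultdict
import heapq

def densest_point(points, R, exact_count):
    def cell(p):
        return (int(p[0] // R), int(p[1] // R))
    counts = defaultdict(int)
    for p in points:
        counts[cell(p)] += 1
    def estimate(p):                      # points in the focal cell + 8 neighbours
        cx, cy = cell(p)
        return sum(counts[(cx + dx, cy + dy)]
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1))
    heap = [(-estimate(p), p) for p in points]
    heapq.heapify(heap)                   # O(n) max-heap via negated keys
    best, best_n = None, -1
    while heap:
        neg_est, p = heapq.heappop(heap)
        if -neg_est <= best_n:            # no remaining estimate can beat the best
            break
        n = exact_count(p, R)             # the expensive exact circle search
        if n > best_n:
            best, best_n = p, n
    return best, best_n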
To improve estimates you could use a finer grid, like so:
along with some clever sliding window techniques to reduce the amount of processing required (see, for instance, this answer for rectangular cases - for circular windows you should probably use a collection of FIFO queues). To increase security you can randomize the origin of the grid.
Considering again the example you posed:
It's clear that this heuristic has the potential to save considerable time: with the above grid, only a single expensive check would need to be performed in order to prove that the middle point has the most neighbours. Again, a higher-resolution grid will improve the estimates and decrease the number of expensive checks which need to be made.
You could, and should, use a similar bounding technique in conjunction with mcdowella's answer; however, his answer does not provide a good place to start looking, so it is possible to spend a lot of time exploring low-value points.

Implementation of Planar Point Location using slabs

I am looking into planar point location, and right now I'm thinking about making use of the slabs approach.
I've read through a couple of articles:
Planar Point Location Using Persistent Search Trees
Persistent Data Structures
Point location on Wikipedia
I understand the concept behind a persistent balanced binary search tree, but let's omit that in this question; I don't really care about the extra storage overhead. The thing with these articles is that they discuss how speed can be improved, but don't explain the basic stuff. For example:
Could you please correct me if I am wrong:
We draw a vertical line through every intersection.
Now we have slabs that are split by the segments at different angles.
Every intersection with any line is considered a vertex.
Slabs are sorted by order in a binary search tree (let's omit a partially persistent BST).
Somehow, sectors are sorted in those respective BSTs, even though the segments dividing them are almost always at an angle. Does each node have to carry a definition of its area?
Please refer to this example image:
Questions:
How would I actually figure out if the point lies in Node c and not in Node b? Is it somehow via area?
Do I instead need to arrange my nodes to contain information about the segments? I could then check whether the query point lies above a segment (if that's how I should determine my sectors). If so, would I then search through a polygon list afterwards, to see which polygon this particular segment belongs to?
Maybe I need to store BST for each line and not a slab?
Would I then have to look at the two BSTs, one belonging to the line on the left and the other to the line on the right of the vertex? I could then sort the vertices by y coordinate in each tree and return the y coordinate of the vertex (the end of a segment) right below my query point. Having done this for the left and right lines, I would then compare whether the names of the segments those vertices come from actually match.
However, this will not give me the right answer, since even if the names do match, I might be below or above the segment (if I am close to it). Also, this implies I have to do 3 binary searches (1 for the lines, 1 for the y coordinate on the left line, 1 for the right one), and the book says I only need to do 2 searches (1 for the slab, a 2nd for the sector).
Could someone please point me in the direction to do it?
I probably just missed some essential thought or something.
Edit:
Here is another good article, that explains the solution to the problem, however, I don't quite understand how I would achieve the following:
"Consider any query point q ∈ R2. To find the face of G that contains q, we first use binary search with the x-coordinate of q to find the vertical slab s that contains q. Given s, we use binary search with the y-coordinate of q to find the edges of Es between which q lies. "
How exactly to find those 2 edges? Is it as simple as checking if the point lies below the segment? However, this seems like a complicated check to do (and expensive), as we descend down the tree, inspecting the other nodes.
Well, this was asked a year ago so I guess you're not going to be around to accept the answer. It's a fun question, though, so here goes anyway...
Yes, you can divide the plane into slabs using vertical lines through every intersection. The important result is that all line segments that touch a slab will go all the way through, and the order of those line segments will be the same across the entire slab.
In each slab, you make a binary search tree of those line segments, ordered by their order in the slab. The areas that you call "nodes" are the leaves of the tree, and the line segments are the internal nodes. I will call the leaves "areas" to avoid ambiguity. Since the order of the line segments is the same across the entire slab, the order in this tree is valid for every x coordinate in the slab.
To determine the area that contains any point, you find the slab that contains the point, and then do a simple search in the BST to find the area.
Let's say you're looking for point C, and you're at the node in slab 2 that contains the segment (x2,y2) - (x3,y4). Determine whether C is above or below this line segment, and then take that branch in the tree.
For example, if C is (cx,cy) then the y coordinate of the segment at x=cx is:
testy = ((cx-x2)/(x3-x2)) * (y4-y2) + y2
If cy < testy, then take the upward branch, otherwise take the downward branch. (assuming positive Y goes downward).
When you get to a leaf, that's the area that contains C. You're done.
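A small Python sketch of that per-slab descent, assuming internal nodes store their segment's endpoints and leaves store area labels (an illustrative structure, with positive Y downward as above):

class Internal:
    def __init__(self, seg, above, below):
        self.seg, self.above, self.below = seg, above, below

class Leaf:
    def __init__(self, area):
        self.area = area

def locate(node, cx, cy):
    while isinstance(node, Internal):
        (x1, y1), (x2, y2) = node.seg
        testy = (cx - x1) / (x2 - x1) * (y2 - y1) + y1   # segment's y at x = cx
        node = node.above if cy < testy else node.below  # positive Y goes downward
    return node.area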
Now... Making a whole new tree for every slab takes a lot of space, since you can have N^2 intersection points and therefore N^2 slabs. Storing a separate tree in every slab would take O(N^3) space.
Using, say, persistent red-black trees, however, the slabs can actually share a lot of nodes. Using a simple purely functional implementation like Chris Okasaki's red-black tree takes O(log N) space per intersection for a total of O(N^2*log N) space.
I seem to remember that there is also a constant-space-per-change persistent red-black tree that would give you O(N^2), but I don't have a reference.
Note that for most real-life scenarios there are closer to O(N) intersections, because lines don't cross very much, but it's still nice to save the space by using a persistent tree.

Algorithm for falling grid items

I do not know how to describe the goal succinctly, which may be why I haven't been able to find an applicable algorithm despite ample searching, but a picture shows it clearly:
Given the state of items in the grid at the left, does anyone know of an algorithm for efficiently finding the ending positions shown in the grid at right? In this case all the items have "fallen" "down", but the direction of course is arbitrary. The point is just that:
There are a collection of items of arbitrary shapes, but all composed of contiguous squares
Items cannot overlap
All items should move the maximum distance in a given direction until they are touching a wall, or they are touching another item which [...is touching another item ad infinitum...] is touching a wall.
This is not homework, I'm not a student. This is for my own interest in geometry and programming. I haven't mentioned the language because it doesn't matter. I can implement whatever algorithm in the language I'm using for the specific project I'm working on. A useful answer could be described in words or code; it's the ideas that matter.
This problem could probably be abstracted into some kind of graph (in the mathematical sense) of dependencies and slack space, so perhaps an algorithm aimed at minimizing lag time could be adapted.
If you don't know the answer but are about to try to make up an algorithm on the spot, just remember that there can be circular dependencies, such as with the interlocking pink (backwards) "C" and blue "T" shapes. Parts of T are below C, and parts of C are below T. This would be even more tricky if interlocking items were locked through a "loop" of several pieces.
Some notes for an applicable algorithm: All the following are very easy and fast to do because of the way I've built the grid object management framework:
Enumerate the individual squares within a piece
Enumerate all pieces
Find the piece, if any, occupying a specific square in the overall grid
A note on the answer:
maniek hinted it first, but bloops has provided a brilliant explanation. I think the absolute key is the insight that all pieces moving the same amount maintain their relationship to each other, and therefore those relationships don't have to be considered.
An additional speed-up for a sparsely populated board would be to shift all pieces to eliminate rows that are completely empty. It is very easy to count empty rows and to identify pieces on one side ("above") an empty row.
Last note: I did in fact implement the algorithm described by bloops, with a few implementation-specific modifications. It works beautifully.
The Idea
Define the set of frozen objects inductively as follows:
An object touching the bottom is frozen.
An object lying on a frozen object is frozen.
Intuitively, exactly the frozen objects have reached their final place. Call the non-frozen objects active.
Claim: All active objects can fall one unit downwards simultaneously.
Proof: Of course, an active object will not hit another active object, since their relative position with respect to each other does not change. An active object will also not hit a frozen object: if that were so, then the active object would in fact be frozen, because it would be lying on a frozen object. This contradicts our assumption.
Our algorithm's very high-level pseudo-code would be as follows:
while (there are active objects):
    move active objects downwards simultaneously until one of them hits a frozen object
    update the status (active/frozen) of each object
Notice that at least one object becomes frozen in each iteration of the while loop. Also, every object becomes frozen exactly once. These observations would be used while analyzing the run-time complexity of the actual algorithm.
The Algorithm
We use the concept of time to improve the efficiency of most operations. The time is measured starting from 0, and every unit movement of the active objects take 1 unit time. Observe that, when we are at time t, the displacement of all the objects currently active at time t, is exactly t units downward.
Note that in each column, the relative ordering of each cell is fixed. One of the implications of this is that each cell can directly stop at most one other cell from falling. This observation could be used to efficiently predict the time of the next collision. We can also get away with 'processing' every cell at most once.
We index the columns starting from 1 and increasing from left to right; and the rows with height starting from 1. For ease of implementation, introduce a new object called bottom - which is the only object which is initially frozen and consists of all the cells at height 0.
Data Structures
For an efficient implementation, we maintain the following data structures:
An associative array A containing the final displacement of each cell. If a cell is active, its entry should be, say, -1.
For each column k, we maintain the set S_k of the initial row numbers of active cells in column k. We need to be able to support successor queries and deletions on this set. We could use a Van Emde Boas tree, and answer every query in O(log log H); where H is the height of the grid. Or, we could use a balanced binary search tree which can perform these operations in O(log N); where N is the number of cells in column k.
A priority queue Q which will store the active cells, keyed by the expected time of their future collision. Again, we could go for either a vEB tree for a query time of O(log log H) or a priority queue with O(log N) time per operation.
Implementation
The detailed pseudo-code of the algorithm follows:
Populate the S_k's with the active cells
Initialize Q to be an empty priority queue
For every cell b in bottom:
    Push Q[b] = 0
while (Q is not empty):
    (x, t) = Q.extract_min()    // the active cell x collides at time t
    Object O = parent_object(x)
    For every cell y in O:
        A[y] = t                // freeze cell y at displacement t
        k = column(y)
        S_k.delete(y)
        a = S_k.successor(y)    // find the only active cell that can collide with y
        if (a != nil):
            // find the expected time of the collision between a and y
            // note that both their positions are currently t + (their original height)
            coll_t = t + height(a) - height(y) - 1
            Push/update Q[a] = coll_t
The final position of any object can be obtained by querying A for the displacement of any cell belonging to that object.
Running Time
We process and freeze every cell exactly once. We perform a constant number of lookups while freezing every cell. We assume parent_object lookup can be done in constant time. The complexity of the whole algorithm is O(N log N) or O(N log log H) depending on the data structures we use. Here, N is the total number of cells across all objects.
And now something completely different :)
Each piece that rests on the ground is fixed. Each piece that rests on a fixed piece is fixed. The rest can move. Move the unfixed pieces 1 square down, repeat until nothing can move.
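A direct, unoptimized Python sketch of that iteration, assuming each piece is given as a set of (column, row) cells with row 0 resting on the floor (an illustrative representation, not from the question):

def settle(pieces):
    # pieces: list of sets of (col, row) cells; a cell at row 0 rests on the floor.
    pieces = [set(p) for p in pieces]
    while True:
        frozen = [False] * len(pieces)
        changed = True
        while changed:                           # propagate "resting on frozen"
            changed = False
            for i, p in enumerate(pieces):
                if frozen[i]:
                    continue
                below = {(c, r - 1) for (c, r) in p} - p
                on_floor = any(r == 0 for (_, r) in p)
                on_frozen = any(frozen[j] and (below & pieces[j])
                                for j in range(len(pieces)) if j != i)
                if on_floor or on_frozen:
                    frozen[i] = changed = True
        movable = [i for i, f in enumerate(frozen) if not f]
        if not movable:
            return pieces
        for i in movable:                        # all active pieces fall one unit
            pieces[i] = {(c, r - 1) for (c, r) in pieces[i]}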
Okay, so this appears to be as follows:
We proceed in steps.
In each step we build a directed graph whose vertices are the objects and whose edges are as follows:
1) If x and y are two objects, then add an edge x->y if x cannot move until y moves. Note that we can have both x->y and y->x edges.
2) Further, there are objects which can no longer move since they are at the bottom, so we colour their vertices blue. The remaining vertices are red.
3) In the directed graph we find all the strongly connected components using Kosaraju's/Tarjan's algorithm etc. (In case you are not familiar with SCCs, they are an extremely powerful technique and you can refer to Kosaraju's algorithm.) Once we get the SCCs we shrink each one to a single vertex, preserving all edges external to the SCC. If any vertex in an SCC is blue then we colour the new vertex blue, else red. This implies that if one object in an SCC cannot move then none can.
4) The resulting graph is a directed acyclic graph, so you can do a topological sort. Traverse the vertices in increasing topological order and, whenever you see a red vertex, move the objects it represents.
Continue these steps until you can no longer move any vertex in step 4.
If two objects A and B overlap then we say that they are inconsistent relative to each other. For a proof of correctness, argue the following lemmas:
1) "If I move an SCC then none of the objects in it cause inconsistency among themselves."
2) "When I move an object in step 4, I do not cause inconsistencies."
The challenge for you now will be to formally prove the correctness and find suitable data structures to solve it efficiently. Let me know if you need any help.
I haven't cleared all the details, but I think the following seems like a somewhat systematic approach:
Looking at the whole picture as a graph, what you need is a topological sort of all the vertices - i.e. the items. Item A should come before item B in the sorting if any part of A is below any part of B. Then, when you have the items sorted topologically, you can just iterate through them in that order and determine the positions - as all the items below the current one already have fixed positions.
In order to be able to do the topological sort, you need an acyclic graph. You can use one of the algorithms for Strongly Connected Components to find all the cycles and compress them into single vertices, then perform a topsort on the resulting graph.
To find the positions of the pieces within a SCC: first consider it as one big piece and determine where it will end up. This will determine some fixed pieces that cannot move anymore. Remove them and repeat the same procedure for the rest of the pieces in this SCC (if any) to find out their final positions.
The third part is the only one that seems computationally intensive - if you have a very complicated structure for the pieces, but it should still be more optimal than trying to move the pieces one square of the grid at a time.
EDITED several times. I think this is all you need to do:
Find all the pieces whose falls are mutually dependent on each other and combine them into an equivalent larger piece (e.g. the T and backwards C in your picture).
Iterate through all the pieces, moving each the maximum distance down before it hits something. Repeat until nothing moves.

Suggestions on speeding up edge selection

I am building a graph editor in C# where the user can place nodes and then connect them with either a directed or undirected edge. When finished, an A* pathfinding algorithm determines the best path between two nodes.
What I have: A Node class with an x, y, list of connected nodes and F, G and H scores.
An Edge class with a Start, Finish and whether or not it is directed.
A Graph class which contains a list of Nodes and Edges as well as the A* algorithm
Right now, when a user wants to select a node or an edge, the mouse position gets recorded and I iterate through every node and edge to determine whether it should be selected. This is obviously slow. I was thinking I could implement a QuadTree for my nodes to speed it up, but what can I do to speed up edge selection?
Since users are "drawing" these graphs I would assume they include a number of nodes and edges that humans would likely be able to generate (say 1-5k max?). Just store both in the same QuadTree (assuming you already have one written).
You can easily extend a classic QuadTree into a PMR QuadTree which adds splitting criteria based on the number of line segments crossing through them. I've written a hybrid PR/PMR QuadTree which supported bucketing both points and lines, and in reality it worked with a high enough performance for 10-50k moving objects (rebalancing buckets!).
So your problem is that the person has already drawn a set of nodes and edges, and you'd like to make the test to figure out which edge was clicked on much faster.
Well an edge is a line segment. For the purpose of filtering down to a small number of possible candidate edges, there is no harm in extending edges into lines. Even if you have a large number of edges, only a small number will pass close to a given point so iterating through those won't be bad.
Now divide edges into two groups: vertical, and not vertical. You can store the vertical edges in a sorted data structure and easily test which vertical lines are close to any given point.
The not vertical ones are more tricky. For them you can draw vertical boundaries to the left and right of the region where your nodes can be placed, and then store each line as the pair of heights at which the line intersects those lines. And you can store those pairs in a QuadTree. You can add to this QuadTree logic to be able to take a point, and search through the QuadTree for all lines passing within a certain distance of that point. (The idea is that at any point in the QuadTree you can construct a pair of bounding lines for all of the lines below that point. If your point is not between those lines, or close to them, you can skip that section of the tree.)
I think you have all the ingredients already.
Here's a suggestion:
Index all your edges in a spatial data structure (could be QuadTree, R-Tree etc.). Every edge should be indexed using its bounding box.
Record the mouse position.
Search for the most specific rectangle containing your mouse position.
This rectangle should contain one or more edges/nodes; iterate through them, according to the needed mode.
(The tricky part): If the user has not indicated any edge from the most specific rectangle, you should go up one level and iterate over the edges included in this level. Maybe you can do without this.
This should be faster.
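Whichever index does the coarse filtering, the final hit test per candidate edge is a point-to-segment distance check; here is a short sketch of that test (shown in Python for brevity, with pick_radius as an assumed click tolerance):

def dist_sq_point_segment(px, py, ax, ay, bx, by):
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                # degenerate edge: a single point
        return (px - ax) ** 2 + (py - ay) ** 2
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))              # clamp the projection onto the segment
    cx, cy = ax + t * dx, ay + t * dy      # closest point on the segment
    return (px - cx) ** 2 + (py - cy) ** 2

# An edge is considered clicked when dist_sq_point_segment(...) <= pick_radius ** 2.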