Efficient placement of boxes in a voxel environment - algorithm

I have an arbitrary set of voxels that define a room for a game. My goal is to place various props (bound to the size of voxels) within this room according to specific rules for the prop. Some simple rule examples (voxel dimensions are xyz, and there are far more props than just these three):
A bookshelf (2x3x1) needs to be placed against a wall and the ground.
A table (4x1x2) needs to be placed on the ground.
A wall-mounted lamp (1x1x1) needs to be placed against the wall.
The props can't intersect with other props or with existing voxels. I'm trying to find some efficient data structures and/or algorithms that can let me do this fairly fast. My current method is this:
1) Create a set S of possible prop locations, which is the set of empty voxels that are adjacent to filled voxels.
2) Mark each item in S according to whether it's a floor, wall, corner, etc.
3) Pick a prop P to be placed.
4) Choose a subset S' of S that fulfills the placement rules of the prop (wall only, corner, etc.).
5) Pick an arbitrary element E from S'.
6) Here is the non-optimal part: try to somehow fit P around E. See if the bounds of P allow it to be placed on top of E without intersecting other props or voxels. If it doesn't fit, try rotating and/or translating the bounds of P until it's in a legal spot that contains E.
7) If it fits, update S to include the new prop, and start placing more props.
8) If it still can't fit, pick another arbitrary element from S' and try again.
It technically works, but it's far from optimal and can perform horribly in worst-case scenarios, such as when a prop can't fit anywhere in a room, or when I'm picking many large floor props for a room where most of the floor space is broken up by pillars and holes.
It would be ideal if I could somehow take the dimensions of P into account when picking E. I was looking into generating a 3D convolution map of the voxel grid (essentially making a blurry image of the grid) so that each voxel has some rough data about how much space it has around it, but the problem is I need to update the map every time I place down a new prop, which sounds expensive.
Another idea was to store the world in an octree and somehow make better placement checks with that, but I can't seem to picture how that would help much. Would an octree allow me to determine if an arbitrary box contains any points any more efficiently than a dictionary keyed by position?
TLDR: How would you programmatically decorate a house in Minecraft using decorations that can be larger than a single voxel?

If you don't have too many voxels in S, after creating walls and floors you can simply create 3 exhaustive sets of valid placements, one for each prop type. Let's call the set of valid placements for props of type p ValidPlacements(p).
Then, when you successfully place a new object into the world, for each prop type p, generate the set of placements of type p that would intersect in at least 1 voxel with the just-placed object, and delete these from ValidPlacements(p). (Some of these placements may already be absent, because they were already known to be impossible due to earlier-placed objects -- this is not an error condition, it can just be ignored.) Use a hash table or balanced tree structure to hold each set of placements, so that each one can be looked up and deleted in O(1) or O(log n) time, respectively.
Because your objects are small, placing an object eliminates only a small number of other possible object placements, so the number of placements deleted for any object will be small (it should be roughly proportional to the product of the volumes of the two objects being intersected). If you need to backtrack and try other placements of an object x, record which placements were actually deleted from the allowed-placements sets when x was placed, and reinsert them when you remove x.
Example: Placing a bookshelf with its top-left-forwardmost corner at (x, y, z) will eliminate 2*3*1 = 6 possible placements of lamps (i.e. 1 placement for each voxel now occupied by the bookshelf), and a larger number of table and bookshelf placements (i.e. 1 placement for each possible placement that would overlap in any way with the just-placed bookshelf).
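For concreteness, here is a minimal Python sketch of this bookkeeping, under the assumption that props are axis-aligned boxes identified by type and minimum corner (rotations omitted); names such as PlacementIndex and placements_covering are illustrative, not from the original answer:

from itertools import product

# Hypothetical prop dimensions (x, y, z); rotations omitted for brevity.
DIMS = {"bookshelf": (2, 3, 1), "table": (4, 1, 2), "lamp": (1, 1, 1)}

def voxels_of(prop, corner):
    # All voxels covered by `prop` placed with its minimum corner at `corner`.
    return product(*(range(c, c + s) for c, s in zip(corner, DIMS[prop])))

def placements_covering(prop, voxel):
    # All placements (minimum corners) of `prop` whose box would cover `voxel`.
    return product(*(range(v - s + 1, v + 1) for v, s in zip(voxel, DIMS[prop])))

class PlacementIndex:
    def __init__(self, valid):
        self.valid = valid        # prop type -> set of still-valid min corners
        self.removed_by = {}      # object id -> placements deleted when it was placed

    def place(self, obj_id, prop, corner):
        removed = []
        for voxel in voxels_of(prop, corner):
            for p, corners in self.valid.items():
                for cand in placements_covering(p, voxel):
                    if cand in corners:
                        corners.discard(cand)   # already-absent is fine
                        removed.append((p, cand))
        self.removed_by[obj_id] = removed

    def unplace(self, obj_id):
        # Backtracking: reinsert exactly the placements this object removed.
        for p, cand in self.removed_by.pop(obj_id, []):
            self.valid[p].add(cand)

Placing an object then touches a number of candidate placements roughly proportional to the product of the two volumes, as noted above, and unplace restores exactly what place removed, which is what makes backtracking cheap.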

Related

How to index nearby 3D points on the fly?

In physics simulations (for example n-body systems) it is sometimes necessary to keep track, in some kind of index, of which particles (points in 3D space) are close enough to interact (within some cutoff distance d). However, particles can move around, so the index must be updated, ideally on the fly without recomputing it entirely. Also, for efficiency in calculating interactions it is necessary to keep the list of interacting particles in the form of tiles: a tile is a fixed-size array (e.g. 32x32) where the rows and columns are particles, and almost every row particle is close enough to interact with almost every column particle (and the array keeps track of which ones actually do interact).
What algorithms may be used to do this?
Here is a more detailed description of the problem:
Initial construction: Given a list of points in 3D space (on the order of a few thousand to a few million, stored as array of floats), produce a list of tiles of a fixed size (NxN), where each tile has two lists of points (N row points and N column points), and a boolean array NxN which describes whether the interaction between each row and column particle should be calculated, and for which:
a. every pair of points p1,p2 for which distance(p1,p2) < d is found in at least one tile and marked as being calculated (no missing interactions), and
b. if any pair of points is in more than one tile, it is only marked as being calculated in the boolean array in at most one tile (no duplicates),
and also the number of tiles should be relatively small if possible (but this is less important than being able to update the tiles efficiently).
Update step: If the positions of the points change slightly (by much less than d), update the list of tiles in the fastest way possible so that they still meet conditions a and b. (This step is repeated many times.)
It is okay to keep any necessary data structures that help with this, for example the bounding boxes of each tile, or a spatial index like a quadtree. It is probably too slow to calculate all particle pairwise distances for every update step (and in any case we only care about particles which are close, so we can skip most possible pairs just by sorting along a single dimension, for example). It is also probably too slow to keep a full (quadtree or similar) index of all particle positions. On the other hand, it is perfectly fine to construct the tiles on a regular grid of some kind. The density of particles per unit volume in 3D space is roughly constant, so the tiles can probably be built from (essentially) fixed-size bounding boxes.
To give an example of the typical scale/properties of this kind of problem, suppose there are 1 million particles, arranged as a random packing of spheres of diameter 1 unit into a cube of size roughly 100x100x100. Suppose the cutoff distance is 5 units, so typically each particle would be interacting with (2*5)**3 or ~1000 other particles. The tile size is 32x32. There are roughly 1e+9 interacting pairs of particles, so the minimum possible number of tiles is ~1e+6. Now assume each time the positions change, the particles move a distance of around 0.0001 units in a random direction, but always in a way such that they stay at least 1 unit away from any other particle and the typical density of particles per unit volume stays the same. There would typically be many millions of position update steps like that. The number of newly created pairs of interactions per step due to the movement is (back of the envelope) (10**2 * 6 * 0.0001 / 10**3) * 1e+9 = 60000, so one update step can be handled in principle by marking 60000 pairs as non-interacting in their original tiles, and adding at most 60000 new tiles (mostly empty - one per pair of newly interacting particles). This would rapidly get to a point where most tiles are empty, so it is definitely necessary to combine/merge tiles somehow pretty often - but how to do it without a full rebuild of the tile list?
P.S. It is probably useful to describe how this differs from the typical spatial index (e.g. octrees) scenario: a. we only care about grouping close-by points together into tiles, not looking up which points are in an arbitrary bounding box or which points are closest to a query point - a bit closer to clustering than querying; b. the density of points in space is pretty constant; and c. the index has to be updated very often, but most moves are tiny.
Not sure my reasoning is sound, but here's an idea:
Divide your space into a grid of 3D cubes with side length d. Then do the following:
Assign each point to the cube that contains it; this is fast since you can derive a point's cube directly from its coordinates.
Now check the following:
Mark all pairs of points within the top-left quarter of a cube as colliding; they're guaranteed to be less than d apart. Further, every "quarter cube" in space is the top-left quarter of exactly one cube, so you won't check the same pair twice.
Check for collisions of type (p, q), where p is a point in the top-left quartile and q is a point not in it. This way, every pair of points is again checked at most once, because every pair of quartiles is checked exactly once.
Since every pair of close points is either in the same quartile or in neighbouring quartiles, it will be caught by the first or the second check. Further, since the points are distributed approximately evenly, the runtime is much less than n^2 (n = number of points); in aggregate it's on the order of k^2 per quartile (k = number of points per quartile, which appears to be approximately constant).
In an update step, you only need to check:
if a point crossed a boundary of a box, which should be fast since you can look at one coordinate at a time, and the boxes' boundaries are simple multiples of d/2
check for collisions of the points as above
To create the tiles, divide the space into a second grid of (non-overlapping) cubes whose width is chosen such that the average number of interaction centers (midpoints between two particles that almost interact) falling into a given cube is less than the width of your tiles (i.e. 32). Since each particle is expected to interact with 300-500 particles, this width will be much smaller than d.
Then, while checking for interactions in steps 1 & 2, assign particle interactions to these new cubes according to the coordinates of the center of their interaction. Assign one tile per cube, and mark the interacting particles assigned to that cube in the tile.
Further optimizations might be to consider the distance of a point's closest neighbour within a cube, and derive from that how many update steps are needed at least to change the collision status of that point; then ignore that point for this many steps.
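A compact sketch of the grid idea above, using full cells of side d rather than quarter cells (a common simplification: any pair closer than d then shares a cell or sits in adjacent cells, at the cost of scanning each 27-cell neighbourhood):

import math
from collections import defaultdict

def build_cells(points, d):
    # Map each point index to the cube of side d that contains it.
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        cells[(math.floor(x / d), math.floor(y / d), math.floor(z / d))].append(i)
    return cells

def close_pairs(points, d):
    cells = build_cells(points, d)
    pairs = []
    for (cx, cy, cz), members in cells.items():
        # A pair closer than d is in the same cell or in adjacent cells.
        for ox in (-1, 0, 1):
            for oy in (-1, 0, 1):
                for oz in (-1, 0, 1):
                    for j in cells.get((cx + ox, cy + oy, cz + oz), ()):
                        for i in members:
                            if i < j and math.dist(points[i], points[j]) < d:
                                pairs.append((i, j))
    return pairs

An update step then only has to move a point between cells when one of its coordinates crosses a multiple of the cell width, which is the cheap boundary check described above.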
I suggest the following algorithm. Suppose we have a 1x1x1 cube and the cutoff distance is 0.001:
Choose three base anchor points: (0,0,0), (0,1,0), (1,0,0).
Associate an array of size 1000 (1 / 0.001) with each anchor point.
Add three fields to each regular point; they store the distance between that point and each anchor point.
This distance also serves as an index into the corresponding anchor point's array; e.g. a distance of 0.4324 means index 432.
Each array slot stores the set of points at that (quantized) distance.
Recalculate the distance between a regular point and each anchor point whenever the point is updated.
Move the point between the sets in the arrays during the update.
These structures give you an easy way to find all nearby points: take the intersection of three sets, where the sets are chosen based on the distances between the query point and the anchor points.
In short, it is the intersection of three spherical shells. You may need to apply additional filtering to the result if you want to trim the corners of this intersection.
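A toy sketch of this scheme, with assumed details (distances quantized at the cutoff resolution, buckets stored as sets):

import math
from collections import defaultdict

ANCHORS = [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
RES = 0.001   # bucket width = cutoff distance

def build_index(points):
    # One bucket array per anchor, keyed by quantized distance to that anchor.
    index = [defaultdict(set) for _ in ANCHORS]
    for i, p in enumerate(points):
        for a, buckets in zip(ANCHORS, index):
            buckets[int(math.dist(p, a) / RES)].add(i)
    return index

def candidates(index, q):
    shells = []
    for a, buckets in zip(ANCHORS, index):
        b = int(math.dist(q, a) / RES)
        # Points within RES of q can only land in buckets b-1, b, b+1.
        shells.append(buckets[b - 1] | buckets[b] | buckets[b + 1])
    return set.intersection(*shells)   # intersection of three spherical shells

As noted above, the intersection of three shells is not exactly a ball, so a final true-distance check on the candidates is still required.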
Consider using the Barnes-Hut algorithm or something similar. A simulation in 2D would use a quadtree data structure to store particles, and a 3D simulation would use an octree.
The benefit of using a tree structure is that it stores the particles in a way that lets nearby particles be found quickly by traversing the tree, while far-away particles lie on traversal paths that can be ignored.
Wikipedia has a good description of the algorithm:
The Barnes–Hut tree
In a three-dimensional n-body simulation, the Barnes–Hut algorithm recursively divides the n bodies into groups by storing them in an octree (or a quad-tree in a 2D simulation). Each node in this tree represents a region of the three-dimensional space. The topmost node represents the whole space, and its eight children represent the eight octants of the space. The space is recursively subdivided into octants until each subdivision contains 0 or 1 bodies (some regions do not have bodies in all of their octants). There are two types of nodes in the octree: internal and external nodes. An external node has no children and is either empty or represents a single body. Each internal node represents the group of bodies beneath it, and stores the center of mass and the total mass of all its children bodies.
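As a rough illustration of the quoted construction, here is a minimal point-octree insertion sketch (the per-node mass and centre-of-mass bookkeeping that Barnes-Hut also needs is omitted, and coincident points would need extra handling):

class Octree:
    def __init__(self, center, half):
        self.center, self.half = center, half
        self.body = None        # payload of an external node
        self.children = None    # 8 children once subdivided

    def _octant(self, p):
        cx, cy, cz = self.center
        return (p[0] >= cx) | ((p[1] >= cy) << 1) | ((p[2] >= cz) << 2)

    def _child_center(self, idx):
        h = self.half / 2
        cx, cy, cz = self.center
        return (cx + (h if idx & 1 else -h),
                cy + (h if idx & 2 else -h),
                cz + (h if idx & 4 else -h))

    def insert(self, p):
        if self.children is None and self.body is None:
            self.body = p                       # empty external node
            return
        if self.children is None:               # occupied leaf: subdivide
            self.children = [Octree(self._child_center(i), self.half / 2)
                             for i in range(8)]
            old, self.body = self.body, None
            self.children[self._octant(old)].insert(old)
        self.children[self._octant(p)].insert(p)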

Finding Contiguous Areas of Bits in 2D Bit Array

The Problem
I have a bit array which represents a 2-dimensional "map" of "tiles". This image provides a graphical example of the bits in the bit array:
I need to determine how many contiguous "areas" of bits exist in the array. In the example above, there are two such contiguous "areas", as illustrated here:
Tiles must be located directly N, S, E or W of a tile to be considered "contiguous". Diagonally-touching tiles do not count.
What I've Got So Far
Because these bit arrays can become relatively large (several MB in size), I have intentionally avoided using any sort of recursion in my algorithm.
The pseudo-code is as follows:
LET S BE SOURCE DATA ARRAY
LET C BE ARRAY OF IDENTICAL LENGTH TO S, USED TO TRACK "CHECKED" BITS

FOREACH INDEX I IN S
    IF C[I] THEN
        CONTINUE
    IF S[I] THEN
        EXTRACT_AREA(S, C, I)
    ELSE
        SET C[I]

EXTRACT_AREA(S, C, I):
    LET T BE TARGET DATA ARRAY FOR STORING BITS OF THE AREA WE'RE EXTRACTING
    LET F BE STACK OF TILES TO SEARCH NEXT
    PUSH I ONTO F
    WHILE F IS NOT EMPTY
        LET X = POP FROM F
        IF C[X] THEN
            CONTINUE
        SET C[X]
        IF S[X] THEN
            PUSH TILE NORTH OF X TO F
            PUSH TILE SOUTH OF X TO F
            PUSH TILE WEST OF X TO F
            PUSH TILE EAST OF X TO F
            SET T[X]
    RETURN T
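For reference, a directly runnable Python version of the same approach, counting 4-connected areas with an explicit stack instead of recursion:

def count_areas(grid):
    # grid: list of lists of 0/1; returns the number of 4-connected areas.
    rows, cols = len(grid), len(grid[0])
    checked = [[False] * cols for _ in range(rows)]
    areas = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not checked[r][c]:
                areas += 1
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    if not (0 <= y < rows and 0 <= x < cols):
                        continue
                    if checked[y][x] or not grid[y][x]:
                        continue
                    checked[y][x] = True
                    stack.extend(((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)))
    return areas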
What I Don't Like About My Solution
Just to run, it requires two times the memory of the bitmap array it's processing.
While extracting an "area", it uses three times the memory of the bitmap array.
Duplicates exist in the "tiles to check" stack - which seems ugly, but not worth avoiding the way I have things now.
What I'd Like To See
Better memory profile
Faster handling of large areas
Solution / Follow-Up
I rewrote the solution to explore only the edges (per hatchet's suggestion).
This was very simple to implement - and eliminated the need to keep track of "visited tiles" completely.
Based on three simple rules, I can traverse the edges, track min/max x & y values, and complete when I've arrived at the start again.
Here's the demo with the three rules I used:
One approach would be a perimeter walk.
Given a starting point anywhere along the edge of the shape, remember that point.
Start the bounding box as just that point.
Walk the perimeter using a clockwise rule set - if the point used to get to the current point was above, then first look right, then down, then left to find the next point on the shape perimeter. This is kind of like the simple strategy of solving a maze where you continuously follow a wall and always bear to the right.
Each time you visit a new perimeter point, expand the bounding box if the new point is outside it (i.e. keep track of the min and max x and y).
Continue until the starting point is reached.
Cons: if the shape has lots of single pixel 'filaments', you'll be revisiting them as the walk comes back.
Pros: if the shape has large expanses of internal occupied space, you never have to visit them or record them like you would if you were recording visited pixels in a flood fill.
So, conserves space, but in some cases at the expense of time.
Edit
As is so often the case, this problem is known, named, and has multiple algorithmic solutions. The problem you described is called Minimum Bounding Rectangle. One way to solve this is using Contour Tracing. The method I described above is in that class, and is called Moore-Neighbor Tracing or Radial Sweep. The links I've included for them discuss them in detail and point out a problem I hadn't caught. Sometimes you'll revisit the start point before you have traversed the entire perimeter. If your start point is, for instance, somewhere along a single-pixel 'filament', you will revisit it before you're done, and unless you consider this possibility, you'll stop prematurely. The website I linked to talks about ways to address this stopping problem. Other pages at that website also talk about two other algorithms: Square Tracing and Theo Pavlidis's Algorithm. One thing to note is that these consider diagonals as contiguous, whereas you don't, but that should just be something that can be handled with minor modifications to the basic algorithms.
An alternative approach to your problem is Connected-component labeling. For your needs though, this may be a more time expensive solution than you require.
Additional resource:
Moore Neighbor Contour Tracing Algorithm in C++
I actually got a question like this in an interview once.
You can pretend the array is a graph where the connected nodes are the adjacent ones. My algorithm would involve scanning to the right until you find a marked node, then doing a breadth-first search, which runs in O(n) and avoids recursion. When the BFS returns, keep searching from where you left off; if a node has already been marked by one of the previous BFSs, you obviously don't need to search it again. I wasn't sure if you wanted to actually return the number of objects found, but it's easy to keep track by just incrementing a counter when you hit the first marked square of each object.
Generally when you do a flood-fill type algorithm you are placed at a spot and asked to fill. Since this is about finding all the filled regions, one way you would want to optimize is to avoid rechecking the already-marked nodes from previous BFSs; unfortunately, at the moment I cannot think of a way to do that.
One hacky way to reduce memory consumption would be to store a short[][] instead of a boolean[][], and use this scheme to avoid making a whole second 2D array:
unmarked = 0, marked = 1, checked and unmarked = 2, checked and marked = 3
This way you can check the status of an entry by its value and avoid making a second array.
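In Python terms the same trick looks like this (the particular values are an assumption; any encoding with a separate "checked" bit works):

UNMARKED, MARKED, CHECKED_UNMARKED, CHECKED_MARKED = 0, 1, 2, 3

def check(grid, y, x):
    # Record "visited" in place instead of in a second 2D array.
    if grid[y][x] == UNMARKED:
        grid[y][x] = CHECKED_UNMARKED
    elif grid[y][x] == MARKED:
        grid[y][x] = CHECKED_MARKED

def is_checked(grid, y, x):
    return grid[y][x] in (CHECKED_UNMARKED, CHECKED_MARKED)

def is_marked(grid, y, x):
    return grid[y][x] in (MARKED, CHECKED_MARKED)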

Algorithm for falling grid items

I do not know how to describe the goal succinctly, which may be why I haven't been able to find an applicable algorithm despite ample searching, but a picture shows it clearly:
Given the state of items in the grid at the left, does anyone know of an algorithm for efficiently finding the ending positions shown in the grid at right? In this case all the items have "fallen" "down", but the direction of course is arbitrary. The point is just that:
There are a collection of items of arbitrary shapes, but all composed of contiguous squares
Items cannot overlap
All items should move the maximum distance in a given direction until they are touching a wall, or they are touching another item which [...is touching another item ad infinitum...] is touching a wall.
This is not homework, I'm not a student. This is for my own interest in geometry and programming. I haven't mentioned the language because it doesn't matter. I can implement whatever algorithm in the language I'm using for the specific project I'm working on. A useful answer could be described in words or code; it's the ideas that matter.
This problem could probably be abstracted into some kind of graph (in the mathematical sense) of dependencies and slack space, so perhaps an algorithm aimed at minimizing lag time could be adapted.
If you don't know the answer but are about to try to make up an algorithm on the spot, just remember that there can be circular dependencies, such as with the interlocking pink (backwards) "C" and blue "T" shapes. Parts of T are below C, and parts of C are below T. This would be even more tricky if interlocking items were locked through a "loop" of several pieces.
Some notes for an applicable algorithm: All the following are very easy and fast to do because of the way I've built the grid object management framework:
Enumerate the individual squares within a piece
Enumerate all pieces
Find the piece, if any, occupying a specific square in the overall grid
A note on the answer:
maniek hinted it first, but bloops has provided a brilliant explanation. I think the absolute key is the insight that all pieces moving the same amount maintain their relationship to each other, and therefore those relationships don't have to be considered.
An additional speed-up for a sparsely populated board would be to shift all pieces to eliminate rows that are completely empty. It is very easy to count empty rows and to identify pieces on one side ("above") an empty row.
Last note: I did in fact implement the algorithm described by bloops, with a few implementation-specific modifications. It works beautifully.
The Idea
Define the set of frozen objects inductively as follows:
An object touching the bottom is frozen.
An object lying on a frozen object is frozen.
Intuitively, exactly the frozen objects have reached their final place. Call the non-frozen objects active.
Claim: All active objects can fall one unit downwards simultaneously.
Proof: Of course, an active object will not hit another active object, since their relative positions do not change. An active object will also not hit a frozen object: if that were so, the active object would in fact have been frozen, because it was lying on a frozen object. This contradicts our assumption.
Our algorithm's very high-level pseudo-code would be as follows:
while (there are active objects):
    move active objects downwards simultaneously until one of them hits a frozen object
    update the status (active/frozen) of each object
Notice that at least one object becomes frozen in each iteration of the while loop. Also, every object becomes frozen exactly once. These observations would be used while analyzing the run-time complexity of the actual algorithm.
The Algorithm
We use the concept of time to improve the efficiency of most operations. The time is measured starting from 0, and every unit movement of the active objects take 1 unit time. Observe that, when we are at time t, the displacement of all the objects currently active at time t, is exactly t units downward.
Note that in each column, the relative ordering of each cell is fixed. One of the implications of this is that each cell can directly stop at most one other cell from falling. This observation could be used to efficiently predict the time of the next collision. We can also get away with 'processing' every cell at most once.
We index the columns starting from 1 and increasing from left to right; and the rows with height starting from 1. For ease of implementation, introduce a new object called bottom - which is the only object which is initially frozen and consists of all the cells at height 0.
Data Structures
For an efficient implementation, we maintain the following data structures:
An associative array A containing the final displacement of each cell. If a cell is active, its entry should be, say, -1.
For each column k, we maintain the set S_k of the initial row numbers of active cells in column k. We need to be able to support successor queries and deletions on this set. We could use a Van Emde Boas tree, and answer every query in O(log log H); where H is the height of the grid. Or, we could use a balanced binary search tree which can perform these operations in O(log N); where N is the number of cells in column k.
A priority queue Q which will store the active cells, keyed by the expected time of their future collision. Again, we could go for either a vEB tree for a query time of O(log log H), or a priority queue with O(log N) time per operation.
Implementation
The detailed pseudo-code of the algorithm follows:
Populate the S_k's with active cells
Initialize Q to be an empty priority queue
For every cell b in bottom:
    Push Q[b] = 0
while (Q is not empty):
    (x,t) = Q.extract_min()   // the active cell x collides at time t
    Object O = parent_object(x)
    For every cell y in O:
        A[y] = t              // freeze cell y at displacement t
        k = column(y)
        S_k.delete(y)
        a = S_k.successor(y)  // find the only active cell that can collide with y
        if (a != nil):
            // find the expected time of the collision between a and y
            // note that both their positions are currently t + (their original height)
            coll_t = t + height(a) - height(y) - 1
            Push/update Q[a] = coll_t
The final position of any object can be obtained by querying A for the displacement of any cell belonging to that object.
Running Time
We process and freeze every cell exactly once. We perform a constant number of lookups while freezing every cell. We assume parent_object lookup can be done in constant time. The complexity of the whole algorithm is O(N log N) or O(N log log H) depending on the data structures we use. Here, N is the total number of cells across all objects.
And now something completely different :)
Each piece that rests on the ground is fixed. Each piece that rests on a fixed piece is fixed. The rest can move. Move the unfixed pieces 1 square down, repeat until nothing can move.
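A direct, unoptimized Python rendering of that fixed-point idea (pieces as sets of (row, col) cells, row 0 being the ground; all names are mine):

def settle(pieces):
    pieces = [set(p) for p in pieces]
    while True:
        cells = {c: i for i, p in enumerate(pieces) for c in p}
        frozen = [False] * len(pieces)
        changed = True
        while changed:                      # propagate "frozen" upwards
            changed = False
            for i, p in enumerate(pieces):
                if frozen[i]:
                    continue
                for (r, c) in p:
                    below = (r - 1, c)
                    if r == 0 or (below in cells and cells[below] != i
                                  and frozen[cells[below]]):
                        frozen[i] = changed = True
                        break
        active = [i for i in range(len(pieces)) if not frozen[i]]
        if not active:
            return pieces
        for i in active:                    # all active pieces fall one unit
            pieces[i] = {(r - 1, c) for (r, c) in pieces[i]}

This pays one full pass per unit of fall distance, which is exactly the inefficiency the time-based algorithm above avoids, but it makes the frozen/active distinction concrete.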
Okay, so this appears to be as follows:
We proceed in steps. In each step we build a directed graph whose vertices are the objects and whose edges are as follows:
1) If x and y are two objects, then add an edge x -> y if x cannot move until y moves. Note that we can have both the edge x -> y and the edge y -> x.
2) Further, there are objects which can no longer move because they are at the bottom, so we colour their vertices blue. The remaining vertices are red.
3) In the directed graph, find all the strongly connected components using Kosaraju's algorithm, Tarjan's algorithm, etc. (If you are not familiar with SCCs, they are an extremely powerful technique; you can refer to Kosaraju's algorithm.) Once we have the SCCs, we contract each one to a single vertex, i.e. we replace the SCC by one vertex while preserving all edges external to the SCC. If any vertex in the SCC is blue, we colour the new vertex blue; otherwise it is red. This captures the fact that if one object in an SCC cannot move, then none can.
4) The resulting graph is a directed acyclic graph, so you can do a topological sort. Traverse the vertices in increasing topological order and, whenever you see a red vertex, move the objects represented by that vertex.
Continue these steps until you can no longer move any vertex in step 4.
If two objects A and B overlap, then we say that they are inconsistent relative to each other. For a proof of correctness, argue the following lemmas:
1) If I move an SCC, then none of the objects in it cause inconsistency among themselves.
2) When I move an object in step 4, I do not cause inconsistencies.
The challenge for you now will be to formally prove the correctness and to find suitable data structures to solve it efficiently. Let me know if you need any help.
I haven't cleared all the details, but I think the following seems like a somewhat systematic approach:
Looking at the whole picture as a graph, what you need is a topological sort of all the vertices - i.e. the items. Item A should come before item B in the sorting if any part of A is below any part of B. Then, when you have the items sorted topologically, you can just iterate through them in this order and determine the positions - as all the items below the current one already have fixed positions.
In order to be able to do the topological sort, you need an acyclic graph. You can use one of the algorithms for Strongly Connected Components to find all the cycles and compress them into single vertices, then perform a topsort on the resulting graph.
To find the positions of the pieces within a SCC: first consider it as one big piece and determine where it will end up. This will determine some fixed pieces that cannot move anymore. Remove them and repeat the same procedure for the rest of the pieces in this SCC (if any) to find out their final positions.
The third part is the only one that seems computationally intensive - if you have a very complicated structure for the pieces, but it should still be more optimal than trying to move the pieces one square of the grid at a time.
EDITED several times. I think this is all you need to do:
Find all the pieces that can only fall mutually dependent on each other and combine them into an equivalent larger piece (e.g. the T and backwards C in your picture.)
Iterate through all the pieces, moving each the maximum distance down before it hits something. Repeat until nothing moves.

Nearest neighbour search in a constantly changing set of line segments

I have a set of line segments. I want to perform the following operations on them:
Insert a new line segment.
Find all line segments within radius R of a given point.
Find all points within radius R1 of a given point.
Given a line segment, find if it intersects any of the existing line segments. The exact intersection point is not necessary (though that probably doesn't simplify anything).
The problem with algorithms like k-d trees or BSP trees is that they assume a static set of points. In my case the set will constantly be augmented with new points, necessitating a rebuild of the tree.
What data-structure/algorithms will be most suited for this situation ?
Edit: Accepted the answer that is the simplest and solves my problem. Thanks everyone!
Maintaining a dynamic tree might not be as bad as you think.
As you insert new points/lines etc into the collection it's clear that you'd need to refine the current tree, but I can't see why you'd have to re-build the whole tree from scratch every time a new entity was added, as you're suggesting.
With a dynamic tree approach you'd have a valid tree at all times, so you can then just use the fast range searches supported by your tree type to find the geometric entities you've mentioned.
For your particular problem:
You could set up a dynamic geometric tree, where the leaf elements maintain a list of geometric entities (points and lines) associated with that leaf.
When a line is inserted into the collection it should be pushed onto the lists of all leaf elements that it intersects with. You can do this efficiently by traversing the subset of the tree from the root that intersects with the line.
To find all points/lines within a specified radial halo you first need to find all leaves in this region. Again, this can be done by traversing the subset of the tree from the root that is enclosed by, or that intersects with, the halo. Since there may be some overlap, you need to check that the entities associated with this set of leaf elements actually lie within the halo.
Once you've inserted a line into a set of leaf elements, you can find whether it intersects another line by scanning all of the lines associated with the subset of leaf boxes you've just found, then doing line-line intersection tests on this subset.
A potential dynamic tree refinement algorithm, based on an upper limit to the number of entities associated with each leaf in the tree, might work along these lines:
function insert(x, y)
    find the tree leaf element enclosing the new entity at (x, y), using whatever
    fast search algorithm your tree supports
    if (number of entities in leaf > max allowable) then
        refine the current leaf element (typically either a bisection or
        quadrisection, based on the 2D tree type)
        push all entities from the old leaf element's list onto the new child
        element lists, based on the enclosing child element
    else
        push the new entity onto the list for the leaf element
    endif
This type of refinement strategy only makes local changes to the tree and is thus generally pretty fast in practice. If you're also deleting entities from the collection you can also support dynamic aggregation by imposing a minimum number of entities per leaf, and collapsing leaf elements to their parents when necessary.
I've used this type of approach a number of times with quadtrees/octrees, and I can't at this stage see why a similar approach wouldn't work with kd-trees etc.
Hope this helps.
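For what it's worth, here is a bare-bones Python sketch of that leaf-refinement idea for points (line segments would additionally be pushed onto every leaf they intersect, as described above; the capacity and splitting policy are assumptions):

MAX_PER_LEAF = 8   # assumed refinement threshold

class QuadLeaf:
    def __init__(self, x0, y0, x1, y1):
        self.box = (x0, y0, x1, y1)
        self.entities = []          # (x, y, payload) tuples
        self.children = None

    def insert(self, x, y, payload):
        if self.children is not None:
            self._child_for(x, y).insert(x, y, payload)
            return
        self.entities.append((x, y, payload))
        if len(self.entities) > MAX_PER_LEAF:
            self._split()           # local change only; rest of tree untouched

    def _split(self):
        old, self.entities = self.entities, []
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadLeaf(x0, y0, mx, my), QuadLeaf(mx, y0, x1, my),
                         QuadLeaf(x0, my, mx, y1), QuadLeaf(mx, my, x1, y1)]
        for (x, y, p) in old:
            self._child_for(x, y).insert(x, y, p)

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(x >= mx) + 2 * (y >= my)]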
One possibility is dividing your space into a grid of boxes - perhaps 10 in the y-axis and 10 in the x-axis for a total of 100.
Store these boxes in an array, so it's very easy/fast to determine neighboring boxes. Each box holds a list/vector of the line segments that live in that box.
When you calculate line segments within R of one segment, you can check only the relevant neighboring boxes.
Of course, you can create multiple maps of differing granularities, maybe 100 by 100 smaller boxes. Simply consider space vs time and maintenance trade-offs in your design.
Updating segment positions is cheap: just integer-divide by the box sizes in the x and y directions. For example, if the box size is 20 in both directions and your new coordinate is (145, 30): 145/20 == 7 and 30/20 == 1, so it goes into box (7, 1) in a 0-based system.
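That lookup is literally two integer divisions; as a sketch (BOX matching the example's box size of 20):

BOX = 20   # box size in both directions, as in the example

def box_of(x, y):
    return (x // BOX, y // BOX)      # box_of(145, 30) == (7, 1)

def neighbor_boxes(bx, by):
    return [(bx + dx, by + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]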
While items 2 & 3 are relatively easy, using a simple linear search with distance checks as each line is inserted, item 4 is a bit more involved.
I'd tend to use a constrained triangulation to solve this, where all the input lines are treated as constraints, and the triangulation is balanced using a nearest-neighbour rather than Delaunay criterion. This is covered pretty well in Triangulations and Applications by Øyvind Hjelle and Morten Dæhlen, and in Joseph O'Rourke's Computational Geometry in C. Both have source available, including routines for getting sets of all intersections.
The approach I've taken to do this dynamically in the past is as follows;
Create an arbitrary triangulation (TIN) comprising two triangles surrounding the extents (current + future) of your data.
For each new line:
Insert both points into the TIN. This can be done very quickly by traversing triangles to find the insertion point, and replacing the triangle found with three new triangles based on the new point and old triangle.
Cut a section through the TIN based on the two end points, keeping a list of points where the section cuts any previously inserted lines.
Add the intersection point details to a list stored against both lines, and insert them into the TIN.
Force the inserted line in as a constraint.
Balance all pairs of adjacent triangles modified in the above process using a nearest-neighbour criterion, and repeat until all triangles have been balanced.
This works better than a grid-based method for poorly distributed data, but is more difficult to implement. Grouping end-points and lines into overlapping grids will probably be a good optimization for 2 & 3.
Note that I think using the term 'nearest neighbour' in your question is misleading, as this is not the same as 'all points within a given distance of a line', or 'all points within a given radius of another point'. Nearest neighbour typically implies a single result, and does not equate to 'within a given point to point or point to line distance'.
Instead of inserting into and deleting from a tree, you can use a curve that completely fills the plane. Such a curve reduces the 2D complexity to a 1D complexity, and you would be able to find the nearest neighbour. Examples include the Z-order curve and the Hilbert curve. Here is a description of the related problem: http://en.wikipedia.org/wiki/Closest_pair_of_points_problem.
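A small sketch of that idea, assuming coordinates quantized to 16 bits: interleaving the bits of x and y yields a Z-order (Morton) key, and keeping entities sorted by that key keeps spatially close items close together in 1D:

def part1by1(n):
    # Spread the 16 bits of n out to the even bit positions.
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton2(x, y):
    # Z-order key: bits of x on even positions, bits of y on odd positions.
    return part1by1(x) | (part1by1(y) << 1)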

Randomly and efficiently filling space with shapes

What is the most efficient way to randomly fill a space with as many non-overlapping shapes? In my specific case, I'm filling a circle with circles. I'm randomly placing circles until either a certain percentage of the outer circle is filled OR a certain number of placements have failed (i.e. were placed in a position that overlapped an existing circle). This is pretty slow, and often leaves empty spaces unless I allow a huge number of failures.
So, is there some other type of filling algorithm I can use to quickly fill as much space as possible, but still look random?
Issue you are running into
You are running into the Coupon collector's problem because you are using a technique of Rejection sampling.
You are also making strong assumptions about what a "random filling" is. Your algorithm will leave large gaps between circles; is this what you mean by "random"? Nevertheless it is a perfectly valid definition, and I approve of it.
Solution
To adapt your current "random filling" to avoid the rejection sampling coupon-collector's issue, merely divide the space you are filling into a grid. For example if your circles are of radius 1, divide the larger circle into a grid of 1/sqrt(2)-width blocks. When it becomes "impossible" to fill a gridbox, ignore that gridbox when you pick new points. Problem solved!
Possible dangers
You have to be careful how you code this however! Possible dangers:
If you do something like if (random point in invalid grid){ generateAnotherPoint() } then you ignore the benefit / core idea of this optimization.
If you do something like pickARandomValidGridbox() then you will slightly reduce the probability of making circles near the edge of the larger circle (though this may be fine if you're doing this for a graphics art project rather than a scientific or mathematical one). However, if you make the grid size 1/sqrt(2) times the radius of the circle, you will not run into this problem, because no circle can be drawn from a gridbox at the edge of the large circle, and thus you can ignore all gridboxes at the edge.
Implementation
Thus the generalization of your method to avoid the coupon-collector's problem is as follows:
Inputs: large circle coordinates/radius(R), small circle radius(r)
Output: set of coordinates of all the small circles
Algorithm:
divide your LargeCircle into a grid of r/sqrt(2)
ValidBoxes = {set of all gridboxes that lie entirely within LargeCircle}
SmallCircles = {empty set}
until ValidBoxes is empty:
    pick a random gridbox Box from ValidBoxes
    pick a random point inside Box to be center of small circle C
    check neighboring gridboxes for other circles which may overlap*
    if there is no overlap:
        add C to SmallCircles
        remove Box from ValidBoxes    # possible because grid is small
    else if there is an overlap:
        increase Box.failcount
        if Box.failcount > MAX_PERGRIDBOX_FAIL_COUNT:
            remove Box from ValidBoxes
return SmallCircles
(*) This step is also an important optimization, which I can only assume you do not already have. Without it, your doesThisCircleOverlapAnother(...) function is incredibly inefficient at O(N) per query, which will make filling in circles nearly impossible for large ratios R>>r.
This is the exact generalization of your algorithm to avoid the slowness, while still retaining the elegant randomness of it.
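A compact Python rendering of the algorithm above (exhausting ValidBoxes as in the pseudo-code; MAX_PERGRIDBOX_FAIL_COUNT appears as max_fail):

import math, random

def fill_circle(R, r, max_fail=30):
    cell = r / math.sqrt(2)
    n = math.ceil(2 * R / cell)
    valid, fails, by_box = set(), {}, {}
    for i in range(n):
        for j in range(n):
            corners = [((i + a) * cell - R, (j + b) * cell - R)
                       for a in (0, 1) for b in (0, 1)]
            # Keep gridboxes whose every corner keeps a circle inside R.
            if all(math.hypot(x, y) <= R - r for x, y in corners):
                valid.add((i, j))
    while valid:
        bi, bj = random.choice(tuple(valid))
        cx = bi * cell - R + random.random() * cell
        cy = bj * cell - R + random.random() * cell
        near = [by_box[(bi + di, bj + dj)]                 # the starred check:
                for di in range(-3, 4) for dj in range(-3, 4)
                if (bi + di, bj + dj) in by_box]           # neighbours only
        if all(math.hypot(cx - x, cy - y) >= 2 * r for x, y in near):
            by_box[(bi, bj)] = (cx, cy)
            valid.discard((bi, bj))   # a box this small holds only one centre
        else:
            fails[(bi, bj)] = fails.get((bi, bj), 0) + 1
            if fails[(bi, bj)] > max_fail:
                valid.discard((bi, bj))
    return list(by_box.values())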
Generalization to larger irregular features
edit: Since you've commented that this is for a game and you are interested in irregular shapes, you can generalize this as follows. For any small irregular shape, enclose it in a circle that represents how far you want it to be from things. Your grid can be the size of the smallest terrain feature. Larger features can encompass 1x2 or 2x2 or 3x2 or 3x3 etc. contiguous blocks. Note that many games with features that span large distances (mountains) and small distances (torches) often require grids which are recursively split (i.e. some blocks are split into further 2x2 or 2x2x2 subblocks), generating a tree structure. This structure, with extensive bookkeeping, will allow you to randomly place the contiguous blocks, but it requires a lot of coding. What you can do instead is use the circle-grid algorithm to place the larger features first (when there's lots of space to work with on the map and you can just check adjacent gridboxes for a collision without running into the coupon-collector's problem), then place the smaller features. If you can place your features in this order, this requires almost no extra coding besides checking neighboring gridboxes for collisions when you place a 1x2/3x3/etc. group.
One way to do this that produces interesting looking results is
create an empty NxM grid
create an empty has-open-neighbors set
for i = 1 to NumberOfRegions
    pick a random point in the grid
    assign that grid point a (terrain) type
    add the point to the has-open-neighbors set
while has-open-neighbors is not empty
    foreach point in has-open-neighbors
        get neighbor-points as the immediate neighbors of point
        that don't have an assigned terrain type in the grid
        if none
            remove point from has-open-neighbors
        else
            pick a random neighbor-point from neighbor-points
            assign its grid location the same (terrain) type as point
            add neighbor-point to the has-open-neighbors set
When done, has-open-neighbors will be empty and the grid will have been populated with at most NumberOfRegions regions (some regions with the same terrain type may be adjacent and so will combine to form a single region).
Sample output using this algorithm with 30 points, 14 terrain types, and a 200x200 pixel world:
Edit: tried to clarify the algorithm.
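For concreteness, the pseudo-code above in runnable Python (grid cells hold a terrain type or None; the parameter names are mine):

import random

def grow_regions(n, m, num_regions, num_types):
    grid = [[None] * m for _ in range(n)]
    open_set = set()
    for _ in range(num_regions):
        p = (random.randrange(n), random.randrange(m))
        grid[p[0]][p[1]] = random.randrange(num_types)   # seed a region
        open_set.add(p)
    while open_set:
        for (r, c) in list(open_set):                    # iterate a snapshot
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < n and 0 <= c + dc < m
                    and grid[r + dr][c + dc] is None]
            if not nbrs:
                open_set.discard((r, c))                 # no open neighbours left
            else:
                q = random.choice(nbrs)
                grid[q[0]][q[1]] = grid[r][c]            # same terrain type as point
                open_set.add(q)
    return grid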
How about using a 2-step process:
Choose a bunch of n points randomly -- these will become the centres of the circles.
Determine the radii of these circles so that they do not overlap.
For step 2, for each circle centre you need to know the distance to its nearest neighbour. (This can be computed for all points in O(n^2) time using brute force, although it may be that faster algorithms exist for points in the plane.) Then simply divide that distance by 2 to get a safe radius. (You can also shrink it further, either by a fixed amount or by an amount proportional to the radius, to ensure that no circles will be touching.)
To see that this works, consider any point p and its nearest neighbour q, which is some distance d from p. If p is also q's nearest neighbour, then both points will get circles with radius d/2, which will therefore be touching; OTOH, if q has a different nearest neighbour, it must be at some distance d' <= d, so the circle centred at q will be no larger. Either way, the two circles will not overlap.
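The whole scheme is only a few lines (brute-force O(n^2) nearest-neighbour pass; shrink is the optional safety factor mentioned above):

import math

def radii_for(centres, shrink=0.95):
    # Assumes at least two centres; each radius is a shrunk half nearest-
    # neighbour distance, so no two circles can touch or overlap.
    radii = []
    for i, p in enumerate(centres):
        nearest = min(math.dist(p, q) for j, q in enumerate(centres) if j != i)
        radii.append(shrink * nearest / 2)
    return radii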
My idea would be to start out with a compact grid layout. Then take each circle and perturb it in some random direction. The distance in which you perturb it can also be chosen at random (just make sure that the distance doesn't make it overlap another circle).
This is just an idea and I'm sure there are a number of ways you could modify it and improve upon it.
