I have a set of 3D objects (points, triangles, lines). I would like to split these objects where they overlap. To do this, I want to compare each pair of objects that overlap in 3D space.
I only need to do the operation once, though the new split objects need to be added to the list of 3D objects to check for further overlaps.
Is there an efficient data structure to put the objects into for this kind of iteration? Would the cost of building such a structure be greater than sorting the objects along one axis and sweeping a front and back coordinate to keep track of overlapping objects?
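For reference, a minimal sketch of that sort-and-sweep idea (often called "sweep and prune"); the AABB tuple layout is an assumption for illustration:

```python
def sweep_and_prune(objects, axis=0):
    """Yield candidate overlapping pairs by sweeping along one axis.

    objects: list of (min_corner, max_corner) AABBs, each corner a
    3-tuple of floats. Candidates still need a full 3D overlap test.
    """
    order = sorted(range(len(objects)), key=lambda i: objects[i][0][axis])
    active = []  # indices whose interval may still overlap the sweep front
    for i in order:
        lo_i = objects[i][0][axis]
        # Drop intervals that ended before the current sweep position.
        active = [j for j in active if objects[j][1][axis] >= lo_i]
        for j in active:
            yield (j, i)  # overlap on this axis; verify the other two
        active.append(i)
```

For a one-shot pass like this, sort-and-sweep is often the simpler baseline to beat; trees tend to pay off only when the same set is queried repeatedly.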
Related
In physics simulations (for example, n-body systems) it is sometimes necessary to keep track, in some kind of index, of which particles (points in 3D space) are close enough to interact (within some cutoff distance d). However, particles can move around, so the index has to be updated, ideally on the fly without recomputing it entirely. Also, for efficiency in calculating interactions, the list of interacting particles has to be kept in the form of tiles: a tile is a fixed-size array (e.g. 32x32) whose rows and columns are particles, where almost every row particle is close enough to interact with almost every column particle (and the array keeps track of which ones actually do interact).
What algorithms may be used to do this?
Here is a more detailed description of the problem:
Initial construction: Given a list of points in 3D space (on the order of a few thousand to a few million, stored as an array of floats), produce a list of tiles of a fixed size (NxN), where each tile has two lists of points (N row points and N column points) and an NxN boolean array which describes whether the interaction between each row and column particle should be calculated, and for which:
a. every pair of points p1,p2 for which distance(p1,p2) < d is found in at least one tile and marked as being calculated (no missing interactions), and
b. if any pair of points appears in more than one tile, it is marked as being calculated in the boolean array of at most one tile (no duplicates),
and also the number of tiles is relatively small if possible (but this is less important than being able to update the tiles efficiently)
Update step: If the positions of the points change slightly (by much less than d), update the list of tiles in the fastest way possible so that they still meet the same conditions a and b (this step is repeated many times)
It is okay to keep any data structures that help with this, for example the bounding boxes of each tile, or a spatial index like a quadtree. It is probably too slow to calculate all pairwise particle distances on every update step (and in any case we only care about particles which are close, so we can skip most pairs just by sorting along a single dimension, for example). It is also probably too slow to keep a full (quadtree or similar) index of all particle positions. On the other hand, it is perfectly fine to construct the tiles on a regular grid of some kind. The density of particles per unit volume is roughly constant, so the tiles can probably be built from (essentially) fixed-size bounding boxes.
To give an example of the typical scale/properties of this kind of problem, suppose there are 1 million particles, arranged as a random packing of spheres of diameter 1 unit into a cube of size roughly 100x100x100. Suppose the cutoff distance is 5 units, so each particle would typically interact with (2*5)**3 or ~1000 other particles. The tile size is 32x32. There are roughly 1e+9 interacting pairs of particles, so the minimum possible number of tiles is ~1e+6. Now assume that on each position update the particles move a distance of around 0.0001 units in a random direction, but always such that they stay at least 1 unit away from any other particle and the typical density of particles per unit volume stays the same. There would typically be many millions of such update steps. The number of newly created interacting pairs per step due to the movement is (back of the envelope) (10**2 * 6 * 0.0001 / 10**3) * 1e+9 = 60000, so one update step can in principle be handled by marking 60000 pairs as non-interacting in their original tiles and adding at most 60000 new tiles (mostly empty: one per pair of newly interacting particles). This would rapidly get to a point where most tiles are nearly empty, so it is definitely necessary to combine/merge tiles fairly often; but how to do that without a full rebuild of the tile list?
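For what it's worth, the back-of-the-envelope numbers above can be reproduced like this (just the arithmetic, not part of any algorithm):

```python
n = 1_000_000      # particles
box = 100.0        # cube side
d = 5.0            # cutoff distance
step = 0.0001      # movement per update

density = n / box**3                      # ~1 particle per unit volume
neighbours = (2 * d)**3 * density         # ~1000 per particle
pairs = n * neighbours                    # ~1e9 interacting (ordered) pairs

# Pairs whose separation crosses the cutoff per step: a shell of
# thickness ~step on the surface of the (2d)^3 cube, relative to its volume.
crossing_fraction = 6 * (2 * d)**2 * step / (2 * d)**3   # 6e-5
new_pairs = crossing_fraction * pairs                     # ~60000
```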
P.S. It is probably useful to describe how this differs from the typical spatial-index (e.g. octree) scenario: a. we only care about grouping nearby points together into tiles, not about looking up which points lie in an arbitrary bounding box or which points are closest to a query point (a bit closer to clustering than to querying); b. the density of points in space is fairly constant; and c. the index has to be updated very often, but most moves are tiny.
Not sure my reasoning is sound, but here's an idea:
Divide your space into a regular grid of 3D cubes.
The cubes have a side length of d. Then do the following:
Assign all points to the cubes in which they're contained; this is fast, since you can derive a point's cube directly from its coordinates
Now check the following:
Mark all pairs of points in the top-left quarter of each cube as colliding; any two such points are less than d apart (the quarter cube's diagonal is sqrt(3)*d/2 < d). Further, every "quarter cube" in space is the top-left quarter of exactly one cube, so you won't check the same pair twice.
Check for collisions of type (p, q), where p is a point in the top-left quarter and q is a point not in it. This way, you will check the collision between any two points at most once, because every pair of quarters is checked exactly once.
Since every pair of points is either in the same quarter or in neighbouring quarters, they'll be checked by the first or the second step. Further, since the points are distributed approximately evenly, your runtime is much less than n^2 (n = number of points); per cube it's on the order of k^2 (k = number of points per quarter, which appears to be approximately constant).
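A minimal sketch of steps 1 and 2, using whole cubes of side d rather than the quarter-cube bookkeeping (easier to show, and still checks each pair at most once):

```python
from collections import defaultdict
from itertools import combinations
from math import dist, floor

# The 13 neighbour offsets on the lexicographically positive side, so every
# pair of adjacent cubes is visited from exactly one of the two cubes.
OFFSETS = [(dx, dy, dz)
           for dx in (0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
           if (dx, dy, dz) > (0, 0, 0)]

def build_grid(points, d):
    """Map each cube of side d (integer-indexed) to the point ids inside it."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(floor(c / d) for c in p)].append(i)
    return grid

def close_pairs(points, grid, d):
    """Yield each pair (i, j) with distance < d exactly once."""
    for cell, members in grid.items():
        for i, j in combinations(members, 2):      # within the same cube
            if dist(points[i], points[j]) < d:
                yield i, j
        for off in OFFSETS:                        # against neighbour cubes
            other = grid.get(tuple(c + o for c, o in zip(cell, off)))
            for i in members:
                for j in (other or ()):
                    if dist(points[i], points[j]) < d:
                        yield i, j
```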
In an update step, you only need to check:
if a point crossed a boundary of a box, which should be fast since you can look at one coordinate at a time, and box boundaries are simple multiples of d/2
check for collisions of the points as above
To create the tiles, divide the space into a second grid of (non-overlapping) cubes, whose width is chosen such that the average number of interaction centers (midpoints between two particles that interact or almost interact) falling into a given cube is less than the width of your tiles (i.e. 32). Since each particle is expected to interact with 300-500 particles, this width will be much smaller than d.
Then, while checking for interactions in steps 1 and 2, assign particle interactions to these new cubes according to the coordinates of the center of their interaction. Assign one tile per cube, and mark interacting particles assigned to that cube in the tile.
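A rough sketch of that bucketing, assuming the interacting pairs are already known from steps 1 and 2 (names are placeholders):

```python
from collections import defaultdict
from math import floor

def bucket_interactions(points, pairs, tile_cube_width):
    """Group interacting pairs by the cube containing the pair's midpoint."""
    buckets = defaultdict(list)
    for i, j in pairs:
        mid = tuple((a + b) / 2 for a, b in zip(points[i], points[j]))
        cell = tuple(floor(c / tile_cube_width) for c in mid)
        buckets[cell].append((i, j))
    # One 32x32 tile per cell: its row/column particles are the particles
    # appearing in that cell's pairs; the boolean mask marks the pairs.
    return buckets
```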
A further optimization might be to consider the distance to a point's closest neighbour within a cube, and derive from that a lower bound on how many update steps are needed to change the collision status of that point; the point can then be ignored for that many steps.
I suggest the following algorithm. Say we have a 1x1x1 cube and the cutoff distance is 0.001.
Choose three base anchor points: (0,0,0), (0,1,0), (1,0,0).
Associate an array of size 1000 (1 / 0.001) with each anchor point.
Add three fields to each regular point; they store the distance between that point and each anchor point.
Each distance also serves as an index into the corresponding anchor's array, e.g. 0.4324 means index 432.
Each array slot stores the set of points whose distance falls into that slot.
Recalculate the distance between a point and each anchor point every time the point is updated.
During an update, move the point between the sets in the arrays accordingly.
These structures give you an easy way to find all nearby points: take the intersection of three sets, where the sets are chosen based on the distances between the query point and the anchor points.
In short, it is the intersection of three spherical shells. You may need to apply additional filtering to the result if you want to trim the corners of this intersection.
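A sketch of this scheme, assuming the unit cube and cutoff from the example (note that distances to a corner anchor can reach sqrt(3), so the arrays need to be a bit larger than 1000):

```python
from math import ceil, dist, sqrt

ANCHORS = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
CUTOFF = 0.001
NBINS = ceil(sqrt(3) / CUTOFF)   # max anchor distance in a unit cube

# buckets[a][k]: ids of points whose distance to anchor a falls in bin k.
buckets = [[set() for _ in range(NBINS)] for _ in ANCHORS]

def bin_of(p, anchor):
    return int(dist(p, anchor) / CUTOFF)

def insert(pid, p):
    for a, anchor in enumerate(ANCHORS):
        buckets[a][bin_of(p, anchor)].add(pid)

def candidates(p):
    """Ids whose three anchor distances all match p's to within one bin."""
    result = None
    for a, anchor in enumerate(ANCHORS):
        k = bin_of(p, anchor)
        near = set().union(*buckets[a][max(0, k - 1):k + 2])
        result = near if result is None else result & near
    return result  # still needs an exact distance check to trim the corners
```

On an update, remove the point from its three old bins and reinsert it, recomputing `bin_of` for each anchor.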
Consider using the Barnes-Hut algorithm or something similar. A simulation in 2D would use a quadtree data structure to store particles, and a 3D simulation would use an octree.
The benefit of using a tree structure is that it stores the particles in a way that nearby particles can be found quickly by traversing the tree, while far-away particles sit in traversal paths that can be ignored.
Wikipedia has a good description of the algorithm:
The Barnes–Hut tree
In a three-dimensional n-body simulation, the Barnes–Hut algorithm recursively divides the n bodies into groups by storing them in an octree (or a quad-tree in a 2D simulation). Each node in this tree represents a region of the three-dimensional space. The topmost node represents the whole space, and its eight children represent the eight octants of the space. The space is recursively subdivided into octants until each subdivision contains 0 or 1 bodies (some regions do not have bodies in all of their octants). There are two types of nodes in the octree: internal and external nodes. An external node has no children and is either empty or represents a single body. Each internal node represents the group of bodies beneath it, and stores the center of mass and the total mass of all its children bodies.
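A minimal insertion sketch in the spirit of that description (positions are assumed distinct; the force-calculation traversal is not shown):

```python
class Octree:
    """Barnes-Hut style octree node covering a cube of half-width `half`."""

    def __init__(self, center, half):
        self.center, self.half = center, half
        self.body = None        # (position, mass) if this is an external node
        self.children = None    # eight sub-nodes if this is an internal node
        self.mass = 0.0
        self.com = (0.0, 0.0, 0.0)   # centre of mass of everything below

    def _octant(self, pos):
        return sum(1 << i for i in range(3) if pos[i] >= self.center[i])

    def insert(self, pos, mass):
        if self.children is None and self.body is None:
            self.body = (pos, mass)          # empty external node: store body
        else:
            if self.children is None:        # occupied external node: split
                h = self.half / 2
                self.children = [
                    Octree(tuple(c + (h if (o >> i) & 1 else -h)
                                 for i, c in enumerate(self.center)), h)
                    for o in range(8)
                ]
                old_pos, old_mass = self.body
                self.body = None
                self.children[self._octant(old_pos)].insert(old_pos, old_mass)
            self.children[self._octant(pos)].insert(pos, mass)
        # Maintain the aggregate mass and centre of mass on the way down.
        total = self.mass + mass
        self.com = tuple((c * self.mass + p * mass) / total
                         for c, p in zip(self.com, pos))
        self.mass = total
```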
Task with example
I'm working with geodata (country-size) from OpenStreetMap. Buildings are often polygons without housenumbers, and a single point carrying the housenumber is placed within the polygon of the building. Buildings may have multiple housenumbers.
I want to match the housenumbers to the polygons of the buildings.
Simple solution
For each housenumber, perform a point-in-polygon test against each building polygon.
Problem
Way too slow for about 50,000,000 buildings and 10,000,000 address-points.
Idea
Build an index for the building polygons to accelerate the search for the surrounding polygon of each housenumber point.
Question
What index or strategy would you recommend for this polygon structure? The polygons never overlap and the area is sparsely covered.
This question has also been posted to gis.stackexchange.com; it was recommended to post it there.
Since it sounds like you have well-formed polygons to test against, I'd use a spatial hash with an AABB check, and then finally the full point-in-polygon test. Hopefully at that point you'll be averaging three or fewer point-in-polygon tests per address.
Break the area your data covers into a simple grid whose cell size is a small multiple (2 to 4) of the median building size. (Maybe 100-200 meters?)
Compute the axis aligned bounding box of every polygon, add it (with its bounding box) to each grid location which the bounding box intersects. (It's pretty simple to figure out where an axis aligned bounding box overlaps regular axis aligned grid cells. I wouldn't store the grid in a simple 2D array -- I'd use a hash table that maps 2D integer grid coordinates, e.g. (1023, 301), to a list of polygons)
Then go through all your address points. Look up in your hash table what cell that point is in. Go through all the polygons in that cell and if the point is within any polygon's axis aligned bounding box do the full point-in-polygon test.
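A sketch of those three steps, with a hypothetical `point_in_polygon` standing in for whatever your polygon library provides:

```python
from collections import defaultdict

CELL = 150.0  # metres; a small multiple of the median building size

def cell_of(x, y):
    return (int(x // CELL), int(y // CELL))

def build_grid(buildings):
    """buildings: list of (vertices, (minx, miny, maxx, maxy)) tuples."""
    grid = defaultdict(list)
    for b in buildings:
        _, (minx, miny, maxx, maxy) = b
        cx0, cy0 = cell_of(minx, miny)
        cx1, cy1 = cell_of(maxx, maxy)
        for cx in range(cx0, cx1 + 1):       # every cell the AABB touches
            for cy in range(cy0, cy1 + 1):
                grid[(cx, cy)].append(b)
    return grid

def building_containing(grid, x, y):
    for vertices, (minx, miny, maxx, maxy) in grid.get(cell_of(x, y), ()):
        if minx <= x <= maxx and miny <= y <= maxy:       # cheap AABB reject
            if point_in_polygon(x, y, vertices):          # hypothetical test
                return vertices
    return None
```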
This has several advantages:
The data structures are simple -- no fancy libraries needed (other than handling polygons). With C++, your polygon library, and the std namespace this could be implemented in less than an hour.
Spatial structure isn't hierarchical -- when you're looking up the points you only have to do one O(1) lookup in the hash table.
And of course, the usual disadvantage of grids as a spatial structure:
Doesn't handle wildly varying polygon sizes particularly well. However, I'm hoping that since you're using map data the sizes are almost always within an order of magnitude of each other, and probably much less.
Assuming you end up with at most N polygons in each grid cell, each polygon has P points, and you've got B buildings and A addresses, you're looking at O(B*P + N*A). Since N and P are likely small, especially on average, you can consider this O(B + A) -- pretty much linear.
I have an array of [width, height, x, y] vectors, like so: [[width_1, height_1, x_1, y_1],...,[width_n, height_n, x_n, y_n]], representing a 2D plane of blocks. This array is potentially long (n > 10k).
An example (images omitted): an arrangement of blocks, and the line segments they must be projected onto.
The problem is, however, that the blocks are not neatly stacked, but can be in any shape and position.
The criterion for which block should be projected doesn't really matter. In the example I took the first (along the x-axis) largest block, which seems reasonable.
What is important is that a list (vector) is maintained of which other blocks were occluded by the projected block. The blocks carry important metadata, so I should be able to answer the question "to what line segment was this block projected?"
So, concretely: how can a 2D plane be efficiently projected onto a line, in a sense "casting a shadow", in a way that maintains a means of seeing which blocks partake in a given line segment (shadow)?
Edit: while the problem is rather generic, the concrete problem is that I have a document with multiple columns and floating images, for which I would like to generate a "minimap" indicating where to find certain annotations (colors).
Assuming that the rectangles are always aligned with the axes, as in your example, I would use a sweep line approach:
Sort the rectangle tops/bottoms according to their y value. For every element, keep a reference to the full rectangle data.
Scan the list in increasing y order, maintaining a set S of the rectangles that contain the current y value. For every top of a rectangle r, add r to S. Similarly, for every bottom of r, remove r from S. Every time you do this, one segment is closed and a new one is started. If you inspect S at that point, you have all the rectangles participating in the new segment, so this is the place to apply a policy for choosing the segment color.
If you need to know later what segments a rectangle belongs to, you can build a mapping between rectangles and segments lists, and update it during the scan.
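A sketch of the scan, assuming rectangles are (id, x, y, width, height) and the projection is onto the y axis:

```python
def sweep_segments(rects):
    """Yield (y_start, y_end, ids) for each maximal segment of the shadow."""
    events = []
    for rid, x, y, w, h in rects:
        events.append((y, 'top', rid))          # rectangle begins
        events.append((y + h, 'bottom', rid))   # rectangle ends
    events.sort(key=lambda e: e[0])

    active, prev_y = set(), None
    for y, kind, rid in events:
        if prev_y is not None and y > prev_y and active:
            yield prev_y, y, frozenset(active)  # one segment of the shadow
        if kind == 'top':
            active.add(rid)
        else:
            active.discard(rid)
        prev_y = y
```

Inverting the result into a rectangle-to-segments mapping is then a matter of recording each yielded segment under every id in its `ids` set.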
I was wondering if there is a good data structure for holding a list of axis-aligned, non-overlapping, discrete-space rectangles; each rectangle could thus be stored as the integers x, y, width, and height. It would be easy to just store such a list, but I also want to be able to query whether a given x,y coordinate is inside any of the rectangles.
One easy solution would be to create a hash and fill it with the hashed lower-left coordinates of each rectangle. This would not allow me to test a given x,y coordinate, because a point in the middle of a rectangle would hit an empty slot. Another option is to create an entry in the hash table for every unit square a rectangle covers, but that would create far too many needless entries for a rectangle of, say, 100 by 100.
An R-tree can be used. R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles, or polygons. All the rectangles can be stored in the tree, so searching becomes easy.
The Wikipedia page, a short presentation, and the original research paper will help you understand the concept.
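For instance, with the Python `rtree` package (a wrapper around libspatialindex), a point query is just an intersection with a degenerate box; the coordinates below are made up for illustration:

```python
from rtree import index

idx = index.Index()
rects = {0: (10, 10, 110, 60),      # (minx, miny, maxx, maxy)
         1: (200, 40, 230, 90)}
for rid, box in rects.items():
    idx.insert(rid, box)

# Which rectangles contain the point (25, 30)?
hits = list(idx.intersection((25, 30, 25, 30)))   # -> [0]
```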
I have an arbitrary set of voxels that define a room for a game. My goal is to place various props (bound to the size of voxels) within this room according to specific rules for the prop. Some simple rule examples (voxel dimensions are xyz, and there are far more props than just these three):
A bookshelf (2x3x1) needs to be placed against a wall and the ground.
A table (4x1x2) needs to be placed on the ground.
A wall-mounted lamp (1x1x1) needs to be placed against the wall.
The props can't intersect with other props or with existing voxels. I'm trying to find some efficient data structures and/or algorithms that can let me do this fairly fast. My current method is this:
Create a set S of possible prop locations, which is the set of empty voxels that are adjacent to filled voxels.
Mark each item in S if it's a floor, wall, corner, etc.
Pick a prop P to be placed.
Choose a subset S' of S that fulfills the placement rules of the prop (wall only, corner, etc).
Pick an arbitrary element E from S'.
Here is the non-optimal part: try to somehow fit P around E. See if the bounds of P allow it to be placed on top of E without intersecting other props and voxels. If it doesn't fit, try rotating and/or translating the bounds of P until it's in a legal spot that contains E.
If it can fit, then update S to include the new prop, and start placing more props.
If it still can't fit, then pick another arbitrary element from S' and try again.
It technically works, but it's not very optimal and can perform horribly in worst-case scenarios, such as when a prop can't fit anywhere in a room, or when I'm picking many large floor props to put in a room where most of the floor space is broken up by pillars and holes.
It would be ideal if I could somehow take the dimensions of P into account when picking E. I was looking into generating a 3D convolution map of the voxel grid (essentially making a blurry image of the grid) so that each voxel has some rough data about how much space there is around it, but the problem is that I'd need to update the map every time I place a new prop, which sounds expensive.
Another idea was to store the world in an octree and somehow make better placement checks with that, but I can't seem to picture how that would help much. Would an octree allow me to determine if an arbitrary box contains any points any more efficiently than a dictionary keyed by position?
TLDR: How would you programmatically decorate a house in Minecraft using decorations that can be larger than a single voxel?
If you don't have too many voxels in S, after creating walls and floors you can simply create 3 exhaustive sets of valid placements, one for each prop type. Let's call the set of valid placements for props of type p ValidPlacements(p).
Then, when you successfully place a new object into the world, for each prop type p, generate the set of placements of type p that would intersect in at least 1 voxel with the just-placed object, and delete these from ValidPlacements(p). (Some of these placements may already be absent because they were already ruled out by earlier-placed objects -- this is not an error condition; it can just be ignored.) Use a hash table or balanced tree structure to hold each set of placements, so that each one can be looked up and deleted in O(1) or O(log n) time, respectively.
Because your objects are small, placing an object eliminates only a small number of other possible object placements, so the number of placements deleted for any object will be small (it should be roughly proportional to the product of the volumes of the two objects being intersected). If you need to backtrack and try other placements of an object x, record which placements were actually deleted from the allowed-placements sets when x was placed, and reinsert them when you remove x.
Example: Placing a bookshelf with its top-left-forwardmost corner at (x, y, z) eliminates 2*3*1 = 6 possible placements of lamps (i.e. one placement for each voxel now occupied by the bookshelf), and a larger number of table and bookshelf placements (i.e. one for each possible placement that would overlap in any way with the just-placed bookshelf).
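A sketch of that bookkeeping; `PROP_TYPES`, `enumerate_valid_placements`, and `placements_overlapping` are hypothetical helpers standing in for your rule checks and intersection tests:

```python
# valid[p]: the set of still-legal placements for prop type p, e.g. each
# placement identified by (anchor_voxel, rotation).
valid = {p: set(enumerate_valid_placements(p)) for p in PROP_TYPES}

def place(obj):
    """Commit obj and prune the placements it blocks; return them for undo."""
    deleted = []
    for p in PROP_TYPES:
        for placement in placements_overlapping(obj, p):
            if placement in valid[p]:    # may already be gone; that's fine
                valid[p].discard(placement)
                deleted.append((p, placement))
    return deleted

def unplace(deleted):
    """Backtrack: reinsert exactly the placements that place() removed."""
    for p, placement in deleted:
        valid[p].add(placement)
```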