Imagine we have a set of quad-tree cells that intersect with a viewport. These cells represent areas and are used for a spatial query.
After the viewport is moved, we get a new set of intersecting quad-tree cells.
Since some of the cells are identical and were already queried, we don't need to query all cells from the second set. However, in addition to identical cells, a cell can be contained within an already queried cell, since cells can have different depths.
The viewport can be panned and zoomed in or out.
I would like to calculate two new sets from these two sets. A diff-set that contains only the new cells we need to query. And a combined-set that contains all queried cells after the second set has also been queried. The combined-set should also merge 4 lower-level cells into a single higher-level cell in case all 4 of them have been queried.
Are there any well known algorithms for these problems? I feel like I'm reinventing the wheel, but I have no idea what keywords to search for.
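To make the two operations concrete, here is a minimal sketch of what I mean, assuming cells are identified by quadkey strings (a child's key is its parent's key plus one digit, so containment is a prefix test); all names are illustrative:

def is_covered(cell, queried):
    # A cell is covered if it, or any ancestor, is in the queried set.
    return any(cell[:i] in queried for i in range(1, len(cell) + 1))

def diff_set(old_cells, new_cells):
    # Cells from the new set that still need to be queried.
    return {c for c in new_cells if not is_covered(c, old_cells)}

def merge_siblings(cells):
    # Repeatedly replace 4 sibling cells with their parent.
    cells = set(cells)
    changed = True
    while changed:
        changed = False
        for cell in list(cells):
            parent = cell[:-1]
            siblings = {parent + d for d in "0123"}
            if parent and siblings <= cells:
                cells -= siblings
                cells.add(parent)
                changed = True
    return cells

def combined_set(old_cells, new_cells):
    union = old_cells | new_cells
    # Drop cells contained in another cell, then merge complete sibling groups.
    top = {c for c in union if not is_covered(c, union - {c})}
    return merge_siblings(top)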
Related
The H3 API reference introduced polyfill, the idea of which is "a point-in-poly operation on every hexagon in a k-ring defined around the given geofence". The questions are:
I don't understand the meaning of "k-ring defined around a geofence". Is this a "ring" whose center is actually the total geofence?
If the judgment is based on the hexagons' center locations via a point-in-polygon operation, it's possible that the geofence and a hexagon overlap while the hexagon's center is outside the geofence, so I can't get its index using polyfill. So, is there any way to get the 2 kinds of hexagons separately: the hexagons that are totally inside a geofence, and the hexagons that only partly overlap the geofence?
You can actually ignore the k-ring part of this - it's an implementation detail, and in fact that detail has changed in the latest version of the library. The basic idea in both implementations is that we collect a "test" set of cells that covers the entire polygon, and then do a point-in-poly check based on the center of each cell.
It sounds like what you need are ways to get all fully-contained cells and all intersecting cells. There's an existing feature request for this functionality, and while we'd like to add this we don't have any support for other polyfill modes in the library at present.
It's not too hard to roll this yourself, but it may be slow for large sets. You'd need to have a polygon intersection check for a cell and the polygon - a naive implementation would simply check if any two segments intersect (O(polygonVertices), since the count of cell vertices is effectively constant).
Run polyfill to get the starting set. This includes fully contained and some, but not necessarily all, partially contained cells.
For each cell in the starting set, check if it intersects the polygon.
If it does not intersect, it is fully contained; add it to the set of contained cells.
If it does intersect, add it to the set of partially contained cells, and also to a temporary queue of "border" cells.
Now identify partially-contained cells that were not in the initial polyfill set. While cells remain in the border queue, pop the first cell and use kRing(cell, 1) to get its neighbors. For each neighbor:
If the neighbor is in the initial polyfill set, ignore.
If the neighbor does not intersect the polygon, ignore.
If the neighbor intersects the polygon, add to the set of partially contained cells and push onto the border queue.
When the border queue is empty, your two sets (contained and partially contained) are complete.
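A rough Python sketch of those steps, treating polyfill, k_ring, and the cell/polygon intersection test as injected helpers (the first two shaped like the v3 H3 bindings, the last one something you'd roll yourself as described above):

from collections import deque

def classify_cells(polygon, res, polyfill, k_ring, intersects):
    # polyfill(polygon, res): cells whose centers lie in the polygon.
    # intersects(cell, polygon): True if the cell boundary crosses the polygon boundary.
    start = set(polyfill(polygon, res))
    contained, partial = set(), set()
    border = deque()
    for cell in start:
        if intersects(cell, polygon):
            partial.add(cell)
            border.append(cell)
        else:
            contained.add(cell)
    # Expand outward to pick up partially contained cells polyfill missed.
    while border:
        cell = border.popleft()
        for neighbor in k_ring(cell, 1):
            if neighbor in start or neighbor in partial:
                continue
            if intersects(neighbor, polygon):
                partial.add(neighbor)
                border.append(neighbor)
    return contained, partial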
Task with example
I'm working with geodata (country-size) from openstreetmap. Buildings are often polygons without housenumbers and a single point with the housenumber is placed within the polygon of the building. Buildings may have multiple housenumbers.
I want to match the housenumbers to the polygons of the buildings.
Simple solution
For each housenumber, perform a point-in-polygon test with each building polygon.
Problem
Way too slow for about 50,000,000 buildings and 10,000,000 address-points.
Idea
Build an index for the building polygons to accelerate the search for the surrounding polygon for each housenumber point.
Question
What index or strategy would you recommend for this polygon structure? The polygons never overlap and the area is sparsely covered.
This question was cross-posted to gis.stackexchange.com, as it was recommended to post it there.
Since it sounds like you have well-formed polygons to test against, I'd use a spatial hash with an AABB check, and then finally the full point-in-polygon test. Hopefully at that point you'll be averaging three or fewer point-in-polygon tests per address.
Break the area your data covers into a simple grid where each grid cell is a small multiple (2 to 4) of the median building size. (Maybe 100-200 meters?)
Compute the axis aligned bounding box of every polygon, add it (with its bounding box) to each grid location which the bounding box intersects. (It's pretty simple to figure out where an axis aligned bounding box overlaps regular axis aligned grid cells. I wouldn't store the grid in a simple 2D array -- I'd use a hash table that maps 2D integer grid coordinates, e.g. (1023, 301), to a list of polygons)
Then go through all your address points. Look up in your hash table what cell that point is in. Go through all the polygons in that cell and if the point is within any polygon's axis aligned bounding box do the full point-in-polygon test.
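A compact sketch of that scheme, assuming polygons are lists of (x, y) tuples and some point_in_polygon(point, polygon) helper exists (all names here are illustrative):

from collections import defaultdict

CELL = 150.0  # grid cell size, a small multiple of the median building size

def bbox(polygon):
    xs = [x for x, y in polygon]
    ys = [y for x, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def build_grid(polygons):
    grid = defaultdict(list)
    for poly in polygons:
        xlo, ylo, xhi, yhi = bbox(poly)
        # Register the polygon in every grid cell its bounding box touches.
        for ix in range(int(xlo // CELL), int(xhi // CELL) + 1):
            for iy in range(int(ylo // CELL), int(yhi // CELL) + 1):
                grid[(ix, iy)].append((xlo, ylo, xhi, yhi, poly))
    return grid

def find_building(grid, x, y, point_in_polygon):
    # One O(1) hash lookup, then AABB checks, then the full test.
    for xlo, ylo, xhi, yhi, poly in grid.get((int(x // CELL), int(y // CELL)), []):
        if xlo <= x <= xhi and ylo <= y <= yhi and point_in_polygon((x, y), poly):
            return poly
    return None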
This has several advantages:
The data structures are simple -- no fancy libraries needed (other than handling polygons). With C++, your polygon library, and the std namespace this could be implemented in less than an hour.
Spatial structure isn't hierarchical -- when you're looking up the points you only have to do one O(1) lookup in the hash table.
And of course, the usual disadvantage of grids as a spatial structure:
Doesn't handle wildly varying sized polygons particularly well. However, I'm hoping since you're using map data the sizes are almost always within an order of magnitude, and probably much less.
Assuming you end up with at most N polygons in each grid cell, each polygon has P points, and you've got B buildings and A addresses, you're looking at O(B*P + N*A). Since P and N are likely relatively small, especially on average, you could consider this O(B + A) -- pretty much linear.
I have a rendering application that renders lots and lots of cubes in a 3-dimensional grid. This is inherently inefficient, as each cube is drawn with its own geometry (12 triangles), and often the cubes are adjacent, creating one surface that could be represented by a single rectangle.
To populate the area I use a 3-dimensional array, where a value of 0 denotes empty space and a non-0 value denotes a block.
e.g. (where X denotes where a cube would be placed)
OOOXXXOOOO
OOXXXXXXXO
OOXXXXXXXO
OOXXXXOOOO
would currently be represented as 21 cubes, or 252 triangles, whereas it could easily be represented as (where each letter denotes a part of a rectangle)
OOOAAAOOOO
OOBAAACCCO
OOBAAACCCO
OOBAAAOOOO
which is a mere 3 rectangles, or 36 triangles.
The typical size of these grids is 128x128x128, so it's clear I would benefit from a massive performance boost if I could efficiently reduce the shapes to the fewest rectangles possible in a reasonable amount of time, but I'm stuck for ideas for an algorithm.
Using Dynamic programming - Largest square block would be one option, but it wouldn't result in an optimal answer, although if the solution is too complex to perform efficiently then this would have to be the way to go.
Eventually I will have multiple types of cubes (e.g. green, brown, blue, referenced using different non-0 numbers in the array) so if possible a version that would work with multiple categories would be very helpful.
Maybe something "octree" like:
Build a 64x64x64 grid over your 128x128x128 grid so each cell of the first grid "contains" eight cells of the second.
For each cell of the 64x64x64 grid, proceed like this:
If the eight contained cells have the same value, put that value in the 64x64x64 grid.
Else draw each cell individually and put -1 in the 64x64x64 grid.
Now build a 32x32x32 grid over the 64x64x64 one and repeat.
Then 16x16x16, 8x8x8, 4x4x4, 2x2x2, 1x1x1 and you're done :)
Of course, it would be best if the octree was computed once and for all, not for each rendering operation.
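As a sketch, one coarsening step could look like this (assuming the grid is a dense nested list and -1 is reserved as the "mixed, draw children individually" marker):

MIXED = -1

def coarsen(grid):
    # Collapse a 2n x 2n x 2n grid into an n x n x n grid: keep the shared
    # value if all eight children agree, otherwise mark the cell as MIXED.
    n = len(grid) // 2
    out = [[[0] * n for _ in range(n)] for _ in range(n)]
    for x in range(n):
        for y in range(n):
            for z in range(n):
                children = {
                    grid[2 * x + dx][2 * y + dy][2 * z + dz]
                    for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
                }
                out[x][y][z] = children.pop() if len(children) == 1 else MIXED
    return out

def build_pyramid(grid):
    # Build the whole 128 -> 64 -> ... -> 1 pyramid once, not per frame.
    levels = [grid]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels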
In a multi-dimensional space, I have a collection of rectangles, all of which are aligned to the grid. (I am using the word "rectangles" loosely - in a three dimensional space, they would be rectangular prisms.)
I want to query this collection for all rectangles that overlap an input rectangle.
What is the best data structure for holding the collection of rectangles? I will be adding rectangles to and removing rectangles from the collection from time to time, but these operations will be infrequent. The operation I want to be fast is the query.
One solution is to keep the corners of the rectangles in a list, and do a linear scan over the list, finding which rectangles overlap the query rectangle and skipping over the ones that don't.
However, I want the query operation to be faster than linear.
I've looked at the k-d tree data structure, but it holds a collection of points, not a collection of rectangles, and I don't see any obvious way to generalize it.
The coordinates of my rectangles are discrete, in case you find that helpful.
I am interested in the general solution, but I will also tell you the properties of my specific problem: my problem space has three dimensions, and the number of distinct values in each varies wildly. The first dimension has two possible values, the second dimension has 87 values, and the third dimension has 1.8 million values.
You can probably use KD-Trees which can be used for rectangles according to the wiki page:
Variations

Instead of points, a kd-tree can also contain rectangles or hyperrectangles. A 2D rectangle is considered a 4D object (xlow, xhigh, ylow, yhigh). Thus range search becomes the problem of returning all rectangles intersecting the search rectangle. The tree is constructed the usual way with all the rectangles at the leaves. In an orthogonal range search, the opposite coordinate is used when comparing against the median. For example, if the current level is split along xhigh, we check the xlow coordinate of the search rectangle. If the median is less than the xlow coordinate of the search rectangle, then no rectangle in the left branch can ever intersect with the search rectangle and so can be pruned. Otherwise both branches should be traversed. See also interval tree, which is a 1-dimensional special case.
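To make the quoted scheme concrete, here is a toy sketch that treats each 2D rectangle as the 4-tuple (xlow, xhigh, ylow, yhigh), cycles the split dimension per level, and keeps rectangles at interior nodes rather than only at the leaves (a common simplification of the quoted construction):

def build(rects, depth=0):
    # rects: list of (xlow, xhigh, ylow, yhigh) tuples.
    if not rects:
        return None
    d = depth % 4
    rects = sorted(rects, key=lambda r: r[d])
    mid = len(rects) // 2
    return {"rect": rects[mid], "dim": d,
            "left": build(rects[:mid], depth + 1),
            "right": build(rects[mid + 1:], depth + 1)}

def overlap(a, b):
    return a[0] <= b[1] and a[1] >= b[0] and a[2] <= b[3] and a[3] >= b[2]

def search(node, q, out):
    if node is None:
        return
    if overlap(node["rect"], q):
        out.append(node["rect"])
    d, m = node["dim"], node["rect"][node["dim"]]
    if d % 2 == 0:
        # Split on a low edge (xlow/ylow): if the median's low edge is past
        # the query's matching high edge, prune the right branch.
        search(node["left"], q, out)
        if m <= q[d + 1]:
            search(node["right"], q, out)
    else:
        # Split on a high edge (xhigh/yhigh): if the median's high edge is
        # below the query's matching low edge, prune the left branch.
        if m >= q[d - 1]:
            search(node["left"], q, out)
        search(node["right"], q, out)

hits = []
search(build([(0, 2, 0, 2), (5, 9, 5, 9), (1, 3, 1, 6)]), (2, 4, 2, 4), hits)
# hits == [(1, 3, 1, 6), (0, 2, 0, 2)]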
Let's call the original problem PN, where N is the number of dimensions.
Suppose we know the solution for P1 - 1-dimensional problem: find if a new interval is overlapping with a given collection of intervals.
Once we know how to solve it, we can check if the new rectangle overlaps the collection of rectangles in each of the x/y/z projections.
So the solution of P3 is equivalent to P1_x AND P1_y AND P1_z.
In order to solve P1 efficiently we can use a sorted list. Each node of the list will include a coordinate and the number-of-opened-intervals-up-to-this-coordinate.
Suppose we have the following intervals:
[1,5]
[2,9]
[3,7]
[0,2]
then the list will look as follows:
{0,1} , {1,2} , {2,2}, {3,3}, {5,2}, {7,1}, {9,0}
If we receive a new interval, say [6,7], we find the largest item in the list that is smaller than 6: {5,2}, and the smallest item that is greater than 7: {9,0}.
So it is easy to say that the new interval does overlap with the existing ones.
And the search in the sorted list is faster than linear :)
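In Python, that 1-D structure might look like this (one interpretation of the scheme above, using bisect on the sorted coordinates):

import bisect

def build_events(intervals):
    # Sorted (coordinate, open_count) pairs; open_count is the number of
    # intervals still open just after this coordinate.
    deltas = {}
    for lo, hi in intervals:
        deltas[lo] = deltas.get(lo, 0) + 1
        deltas[hi] = deltas.get(hi, 0) - 1
    events, count = [], 0
    for coord in sorted(deltas):
        count += deltas[coord]
        events.append((coord, count))
    return events

def overlaps(events, lo, hi):
    coords = [c for c, _ in events]
    # An event coordinate inside [lo, hi] means some interval starts or ends
    # there, so the query overlaps it.
    i = bisect.bisect_left(coords, lo)
    if i < len(coords) and coords[i] <= hi:
        return True
    # Otherwise [lo, hi] sits strictly between two events; it overlaps iff
    # the count carried over from the previous event is positive.
    return i > 0 and events[i - 1][1] > 0

events = build_events([(1, 5), (2, 9), (3, 7), (0, 2)])
# events == [(0, 1), (1, 2), (2, 2), (3, 3), (5, 2), (7, 1), (9, 0)]
print(overlaps(events, 6, 7))  # True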
You have to use some sort of a partitioning technique. However, because your problem is constrained (you use only rectangles), the data-structure can be a little simplified. I haven't thought this through in detail, but something like this should work ;)
Using the discrete value constraint, you can create a secondary table-like data structure where you store the discrete values of the second dimension (the 87 possible values). Assume that these values represent planes perpendicular to this dimension. For each of these planes you can store, in this secondary table, the rectangles that intersect them.
Similarly, for the third dimension you can use another table with as many equally spaced values as you need (1.8 million is too much, so you would probably want to make this at least a couple of magnitudes smaller), and map each pair of adjacent values to the rectangles that lie between them.
Given a query rectangle, you can query the first table in constant time to determine a set of rectangles that possibly intersect the query. Then you can do another query on the second table and intersect the two result sets. This should narrow down the number of actual intersection tests that you have to perform.
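A rough sketch of those two tables (the bin width and all names are made up for illustration; each rectangle is given as a (lo, hi) pair per dimension):

from collections import defaultdict

BIN = 1000  # bin width for the 1.8-million-value dimension

def build_tables(rects):
    # rects: id -> ((lo1, hi1), (lo2, hi2), (lo3, hi3))
    dim2, dim3 = defaultdict(set), defaultdict(set)
    for rid, (_, d2, d3) in rects.items():
        for v in range(d2[0], d2[1] + 1):  # only 87 possible values
            dim2[v].add(rid)
        for b in range(d3[0] // BIN, d3[1] // BIN + 1):
            dim3[b].add(rid)
    return dim2, dim3

def candidates(dim2, dim3, q2, q3):
    # Ids whose dim-2 values and dim-3 bins overlap the query ranges; each
    # candidate still needs an exact intersection test afterwards.
    c2 = set().union(*(dim2[v] for v in range(q2[0], q2[1] + 1)))
    c3 = set().union(*(dim3[b] for b in range(q3[0] // BIN, q3[1] // BIN + 1)))
    return c2 & c3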
Given a basic grid (like a piece of graph paper), where each cell has been randomly filled in with one of n colors, is there a tried and true algorithm out there that can tell me what contiguous regions (groups of cells of the same color that are joined at the side) there are? Let's say n is something reasonable, like 5.
I have some ideas, but they all feel horribly inefficient.
The best possible algorithm is O(number of cells), and is not related to the number of colors.
This can be achieved by iterating through the cells; every time you visit one that has not been marked as visited, do a graph traversal to find all the contiguous cells in that region, then continue iterating.
Edit:
Here's a simple pseudo code example of a depth first search, which is an easy to implement graph traversal:
function visit(cell) {
    if cell.marked return
    cell.marked = true
    foreach neighbor in cell.neighbors {
        if cell.color == neighbor.color {
            visit(neighbor)
        }
    }
}
In addition to recursive's recursive answer, you can use a stack if recursion is too slow:
function visit(cell) {
    stack = new stack
    stack.push cell
    while not stack.empty {
        cell = stack.pop
        if cell.marked continue
        cell.marked = true
        foreach neighbor in cell.neighbors {
            if cell.color == neighbor.color {
                stack.push neighbor
            }
        }
    }
}
You could try doing a flood fill on each square. As the flood spreads, record the grid squares in an array or something, and colour them in an unused colour, say -1.
The Wikipedia article on flood fill might be useful to you here: http://en.wikipedia.org/wiki/Flood_fill
Union-find would work here as well. Indeed, you can formulate your question as a problem about a graph: the vertices are the grid cells, and two vertices are adjacent if their grid cells have the same color. You're trying to find the connected components.
The way you would use a union-find data structure is as follows: first create a union-find data structure with as many elements as you have cells. Then iterate through the cells, and union two adjacent cells if they have the same color. In the end, run find on each cell and store the response. Cells with the same find are in the same contiguous colored region.
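A compact sketch of that approach (plain path compression, no ranks, for brevity):

def regions(grid):
    # Returns a grid of root labels; equal labels = same contiguous region.
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(a, b):
        parent[find(a)] = find(b)

    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and grid[r][c] == grid[r][c + 1]:
                union(r * cols + c, r * cols + c + 1)
            if r + 1 < rows and grid[r][c] == grid[r + 1][c]:
                union(r * cols + c, (r + 1) * cols + c)
    return [[find(r * cols + c) for c in range(cols)] for r in range(rows)]

labels = regions([[1, 1, 2],
                  [2, 1, 2],
                  [2, 2, 2]])  # two regions: the 1s and the 2s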
If you want a little more fine-grained control, you might think about using the A* algorithm and using the heuristic to include similarly colored tiles.
You iterate through the cells in a scanline, going left-right, top-bottom. Each cell carries a reference to a list of contiguous cells, and that list is shared as the same memory object between all the cells it contains. For each cell, you add the current cell to its list (creating the list if it doesn't exist yet). Then, if the cell to the right or below is the same color, you share that list with it. If that cell already has a list, you combine the two lists and replace the reference to the list object, in every cell named in either list, with the new merged list.
Afterward, each cell holds a reference to a list containing every cell contiguous with it. This neatly combines the work of the flood fill across all cells rather than repeating it for each cell. Since you have the lists, replacing the data with the merged data is just iterating through a list. It will be O(n*c), where n is the number of cells and c is a measure of how contiguous the grid is. A completely disjointed grid takes n time; a completely contiguous one-color grid takes n^2/2.
I heard this question in a video and also found it here and I came up with what is the best approach I have seen in my searching. Here are the basic steps of the algorithm:
Loop through the array (assuming the grid of colors is represented as a 2-dimensional array) from top-left to bottom-right.
When you go through the first row just check the color to the left to see if it is the same color. When you go through all subsequent rows, check the cell above and the cell to the left - this is more efficient than checking to the top, bottom, left and right every time. Don't forget to check that the left cell is not out of bounds.
Create a Dictionary of type <int,Dictionary<int,Hashset<cell>>> for storing colors and groups within those colors. The Hashset contains cell locations (cell object with 2 properties: int row, int column).
If the cell is not connected at the top or left to a cell of the same color then create a new Dictionary entry, a new color group within that entry, and add the current cell to that group (Hashset). Else it is connected to another cell of the same color; add the current cell to the color group containing the cell it's connected to.
If at some point you encounter a cell that has the same color at the top and left, if they both belong to the same color group then that's easy, just add the current cell to that color group. Else check the kitty-corner cell to the top-left. If it is a different color than the current cell and the cell to the top and cell to the left belong to different color groups --> merge the 2 color groups together; add the current cell to the group.
Finally, loop through all of the Hashsets to see which one has the highest count - this will be the return value.
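A condensed Python sketch of that scan; it drops the outer per-color dictionary (globally unique group ids make the color key unnecessary) and decides merges by comparing group ids directly instead of the kitty-corner check:

def largest_region(grid):
    rows, cols = len(grid), len(grid[0])
    group_of = {}  # (row, col) -> group id
    members = {}   # group id -> set of cells
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            color = grid[r][c]
            up = group_of[(r - 1, c)] if r > 0 and grid[r - 1][c] == color else None
            left = group_of[(r, c - 1)] if c > 0 and grid[r][c - 1] == color else None
            if up is None and left is None:
                gid, next_id = next_id, next_id + 1
                members[gid] = set()
            elif up is not None and left is not None and up != left:
                # The cell joins two separate groups: merge smaller into larger.
                gid, other = (up, left) if len(members[up]) >= len(members[left]) else (left, up)
                for cell in members[other]:
                    group_of[cell] = gid
                members[gid] |= members.pop(other)
            else:
                gid = up if up is not None else left
            group_of[(r, c)] = gid
            members[gid].add((r, c))
    return max(members.values(), key=len)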
Here is a link to a video I made with visual and full explanation:
https://d.tube/#!/v/israelgeeksout77/wm2ax1vpu3y
P.S. I found this post on GeeksForGeeks https://www.geeksforgeeks.org/largest-connected-component-on-a-grid/
They conveniently posted source code to this problem in several languages! But I tried their code vs. mine and mine ran in about 1/3 of the time.