Data structure for traversing polygon segments in Haskell? - data-structures

i have an unordered list of horizontal/vertical segments of length 1, which build one or many polygons. I now need to find the list of all connected corners in each polygon.
Example:
[Horizontal (0,0), Vertical (2,0), Horizontal (1,0), Horizontal (2,2), Vertical (2,1)]
represents a line like this
2 X--X
|
1 X
|
0 X--X--X
0 1 2
I would be looking for the corners [(2,0), (2,2)].
In imperative languages i would probably use a (doubly-)linked data structure and traverse those. I can't come up with a elegant representation for this in Haskell. How would you do it?

Before we go looking for corners, let's take a step back. What are you trying to do?
I have an unordered list of horizontal/vertical segments of length 1, which build one or many polygons. I now need to find the list of all connected corners in each polygon.
"Searching" and "unordered lists" don't really go together, as I'm sure you realize! This is true even in simple lookups, but it's even worse for what you're doing, which is conceptually closer to finding duplicates because it requires correlating elements of the collection with each other, instead of inspecting each independently.
So, you're definitely going to want something with a lot more structure to it. One option would be a more semantically-meaningful representation in terms of complete polygons, allowing a simple traversal of an unbroken perimeter, but I'm guessing you don't have that information available (for instance, if you're trying to create such a structure here).
Now, in a comment you said:
The reason for this is, that the segments were stored in "Set"s before, in order to remove overlapping segments. This representation guarantees that there is only one segment (x,y)--(x+1,y).
This is worth further thought. How does Data.Set work, and why is it better for removing duplicates than an unordered list? That last bit's kind of a give-away, because Data.Set is precisely an ordered collection, so by giving each item a representation that sorts uniquely, you get the combined benefits of automatically removing duplicates and fast lookup.
As mentioned above, your actual problem is conceptually similar to finding duplicates; instead of finding overlapping segments, you want adjacent ones. Can using Data.Set help you here as well?
Alas, it cannot. To see why, think about how the sorting works. Given two items X and Y, there are three possible comparisons: X < Y, X == Y, or X > Y. If distinct, adjacent elements differ by the minimum amount possible, you can safely examine only elements that are adjacent in the sorted collection. But this cannot generalize to line segments for multiple reasons, the simplest being that up to four distinct elements can be adjacent, which cannot be described in a sorted sequence.
Hopefully I've been heavy-handed enough with my hints that you're now wondering what a sorted collection that does allow four distinct elements to be adjacent would look like, and whether it would allow easy searching the way Data.Set does.
The answer to the latter is yes, absolutely, and the answer to the first is that it would be a higher-dimensional search tree. A simple binary search tree looks something like this:
data Tree a = Leaf | Branch a (Tree a) (Tree a)
...where you ensure that at any branch, all leaf values in the left half are smaller than those in the right. A simple 2-dimensional search tree would instead look something like this:
data Tree a = Leaf | Branch a (Tree a) (Tree a) (Tree a) (Tree a)
...where each branch represents a quadrant, sorting by comparing on the two axes independently. Otherwise, it works just like familiar 1-dimensional search trees, with straightforward translations of many standard algorithms, and given a specific line segment you can quickly search for any adjacent segments.
Edit: In hindsight, I got a little too wrapped-up in exposition and forgot to give references. This is not at all a novel concept, and has many extant variations:
What I described would be called a point Quadtree and is a simple extension of binary search trees, like Data.Set.
The same concept can be done with regions instead of discrete points, with lookups ending at regions that are either fully included or excluded. These are similar extensions of tries, like Data.IntSet.
A variation called R-trees are similar to a B-trees and have useful performance characteristics for some purposes.
The concepts extend just as well to higher dimensions, as well. Data structures along these lines are used for rendering and collision detection in simulations and video games, spatial databases with "nearest neighbor" searches, as well as more abstract applications you wouldn't normally think of geometrically, where sparse data points can be sorted along multiple axes and some combined notion of "distance" is meaningful.
Oddly enough, I've been unable to find any implementation of such data structures on Hackage, besides one incomplete and seemingly-abandoned package.

If I understand the problem description correctly, each segment can take part in up to four possible corners that each identifies a specific complementary segment. Given a list of segments, we can than walk down the list seeing which possible two-segment corners are present, then figure out where those segments meet. This is a very slow approach due to the repeated list traversals, but if you are only dealing with handfuls of segments, it is at least fairly concise.
data Segment = Horizontal (Int,Int) | Vertical (Int,Int) deriving (Eq, Show)
example = [ Horizontal (0,0)
, Vertical (2,0)
, Horizontal (1,0)
, Horizontal (2,2)
, Vertical (2,1) ]
corners [] = []
corners (Horizontal (x,y):xs) = ns ++ corners xs
where ns = map cornerLoc . filter (`elem` xs) $
map Vertical [(x,y),(x+1,y),(x,y-1),(x+1,y-1)]
cornerLoc (Vertical (x',_)) = (max x x', y)
corners (Vertical (x,y):xs) = ns ++ corners xs
where ns = map cornerLoc . filter (`elem` xs) $
map Horizontal [(x,y),(x,y+1),(x-1,y),(x-1,y+1)]
cornerLoc (Horizontal (_,y')) = (x, max y y')

Related

Using a spatial index to find points within range of each other

I'm trying to find a spatial index structure suitable for a particular problem : using a union-find data structure, I want to connect\associate points that are within a certain range of each other.
I have a lot of points and I'm trying to optimize an existing solution by using a better spatial index.
Right now, I'm using a simple 2D grid indexing each square of width [threshold distance] of my point map, and I look for potential unions by searching for points in adjacent squares in the grid.
Then I compute the squared Euclidean distance to the adjacent cells combinations, which I compare to my squared threshold, and I use the union-find structure (optimized using path compression and etc.) to build groups of points.
Here is some illustration of the method. The single black points actually represent the set of points that belong to a cell of the grid, and the outgoing colored arrows represent the actual distance comparisons with the outside points.
(I'm also checking for potential connected points that belong to the same cells).
By using this pattern I make sure I'm not doing any distance comparison twice by using a proper "neighbor cell" pattern that doesn't overlap with already tested stuff when I iterate over the grid cells.
Issue is : this approach is not even close to being fast enough, and I'm trying to replace the "spatial grid index" method with something that could maybe be faster.
I've looked into quadtrees as a suitable spatial index for this problem, but I don't think it is suitable to solve it (I don't see any way of performing repeated "neighbours" checks for a particular cell more effectively using a quadtree), but maybe I'm wrong on that.
Therefore, I'm looking for a better algorithm\data structure to effectively index my points and query them for proximity.
Thanks in advance.
I have some comments:
1) I think your problem is equivalent to a "spatial join". A spatial join takes two sets of geometries, for example a set R of rectangles and a set P of points and finds for every rectangle all points in that rectangle. In Your case, R would be the rectangles (edge length = 2 * max distance) around each point and P the set of your points. Searching for spatial join may give you some useful references.
2) You may want to have a look at space filling curves. Space filling curves create a linear order for a set of spatial entities (points) with the property that points that a close in the linear ordering are usually also close in space (and vice versa). This may be useful when developing an algorithm.
3) Have look at OpenVDB. OpenVDB has a spatial index structure that is highly optimized to traverse 'voxel'-cells and their neighbors.
4) Have a look at the PH-Tree (disclaimer: this is my own project). The PH-Tree is a somewhat like a quadtree but uses low level bit operations to optimize navigation. It is also Z-ordered/Morten-ordered (see space filling curves above). You can create a window-query for each point which returns all points within that rectangle. To my knowledge, the PH-Tree is the fastest index structure for this kind of operation, especially if you typically have only 9 points in a rectangle. If you are interested in the code, the V13 implementation is probably the fastest, however the V16 should be much easier to understand and modify.
I tried on my rather old desktop machine, using about 1,000,000 points I can do about 200,000 window queries per second, so it should take about 5 second to find all neighbors for every point.
If you are using Java, my spatial index collection may also be useful.
A standard approach to this is the "sweep and prune" algorithm. Sort all the points by X coordinate, then iterate through them. As you do, maintain the lowest index of the point which is within the threshold distance (in X) of the current point. The points within that range are candidates for merging. You then do the same thing sorting by Y. Then you only need to check the Euclidean distance for those pairs which showed up in both the X and Y scans.
Note that with your current union-find approach, you can end up unioning points which are quite far from each other, if there are a bunch of nearby points "bridging" them. So your basic approach -- of unioning groups of points based on proximity -- can induce an arbitrary amount of distance error, not just the threshold distance.

Finding all points in certain radius of another point

I am making a simple game and stumbled upon this problem. Assume several points in 2D space. What I want is to make points close to each other interact in some way.
Let me throw a picture here for better understanding of the problem:
Now, the problem isn't about computing the distance. I know how to do that.
At first I had around 10 points and I could simply check every combination, but as you can already assume, this is extremely inefficient with increasing number of points. What if I had a million of points in total, but all of them would be very distant to each other?
I'm trying to find a suitable data structure or a way to look at this problem, so every point can only mind their surrounding and not whole space. Are there any known algorithms for this? I don't exactly know how to name this problem so I can google exactly what I want.
If you don't know of such known algorighm, all ideas are very welcome.
This is a range searching problem. More specifically - the 2-d circular range reporting problem.
Quoting from "Solving Query-Retrieval Problems by Compacting Voronoi Diagrams" [Aggarwal, Hansen, Leighton, 1990]:
Input: A set P of n points in the Euclidean plane E²
Query: Find all points of P contained in a disk in E² with radius r centered at q.
The best results were obtained in "Optimal Halfspace Range Reporting in Three Dimensions" [Afshani, Chan, 2009]. Their method requires O(n) space data structure that supports queries in O(log n + k) worst-case time. The structure can be preprocessed by a randomized algorithm that runs in O(n log n) expected time. (n is the number of input points, and k in the number of output points).
The CGAL library supports circular range search queries. See here.
You're still going to have to iterate through every point, but there are two optimizations you can perform:
1) You can eliminate obvious points by checking if x1 < radius and if y1 < radius (like Brent already mentioned in another answer).
2) Instead of calculating the distance, you can calculate the square of the distance and compare it to the square of the allowed radius. This saves you from performing expensive square root calculations.
This is probably the best performance you're gonna get.
This looks like a nearest neighbor problem. You should be using the kd tree for storing the points.
https://en.wikipedia.org/wiki/K-d_tree
Space partitioning is what you want.. https://en.wikipedia.org/wiki/Quadtree
If you could get those points to be sorted by x and y values, then you could quickly pick out those points (binary search?) which are within a box of the central point: x +- r, y +- r. Once you have that subset of points, then you can use the distance formula to see if they are within the radius.
I assume you have a minimum and maximum X and Y coordinate? If so how about this.
Call our radius R, Xmax-Xmin X, and Ymax-Ymin Y.
Have a 2D matrix of [X/R, Y/R] of double-linked lists. Put each dot structure on the correct linked list.
To find dots you need to interact with, you only need check your cell plus your 8 neighbors.
Example: if X and Y are 100 each, and R is 1, then put a dot at 43.2, 77.1 in cell [43,77]. You'll check cells [42,76] [43,76] [44,76] [42,77] [43,77] [44,77] [42,78] [43,78] [44,78] for matches. Note that not all cells in your own box will match (for instance 43.9,77.9 is in the same list but more than 1 unit distant), and you'll always need to check all 8 neighbors.
As dots move (it sounds like they'd move?) you'd simply unlink them (fast and easy with a double-link list) and relink in their new location. Moving any dot is O(1). Moving them all is O(n).
If that array size gives too many cells, you can make bigger cells with the same algo and probably same code; just be prepared for fewer candidate dots to actually be close enough. For instance if R=1 and the map is a million times R by a million times R, you wouldn't be able to make a 2D array that big. Better perhaps to have each cell be 1000 units wide? As long as density was low, the same code as before would probably work: check each dot only against other dots in this cell plus the neighboring 8 cells. Just be prepared for more candidates failing to be within R.
If some cells will have a lot of dots, each cell having a linked list, perhaps the cell should have an red-black tree indexed by X coordinate? Even in the same cell the vast majority of other cell members will be too far away so just traverse the tree from X-R to X+R. Rather than loop over all dots, and go diving into each one's tree, perhaps you could instead iterate through the tree looking for X coords within R and if/when you find them calculate the distance. As you traverse one cell's tree from low to high X, you need only check the neighboring cell to the left's tree while in the first R entries.
You could also go to cells smaller than R. You'd have fewer candidates that fail to be close enough. For instance with R/2, you'd check 25 link lists instead of 9, but have on average (if randomly distributed) 25/36ths as many dots to check. That might be a minor gain.

Implementation of Planar Point Location using slabs

I am looking into a Planar Point Location, and right now thinking about make use of The Slabs approach.
I've read through a couple of articles:
Planar Point Location Using Persistent Search Trees
Persistent Data Structures
Point location on Wikipedia
I understand the concept behind a persistent balanced binary search tree, but let's omit them in this question, I don't really care about extra storage space overhead. The thing with the articles is they are discussing how speed can be improved, but don't explain basic stuff. For example:
Could you please correct me if I am wrong:
We draw a line across all intersections.
Now, we have slabs that are split by the segments at different angles.
Every intersection with any line is considered a vertex.
Slabs are sorted by order in a binary search tree (let's omit a partially persistent bst)
Somehow, sectors are sorted in those respective BST's, even though the segments dividing them are almost always at an angle. Does each node has to carry a definition of area?
Please refer to this example image:
Questions:
How would I actually figure out if the point lies in Node c and not in Node b? Is it somehow via area?
Do I instead need arrange my nodes to contain information about the segments? I could then check if the query point lies above a segment (if that's how I should determine my sectors)? If so, would I then search through a Polygon List after, to see which polygon this particular segment belongs to?
Maybe I need to store BST for each line and not a slab?
Would I then have to look at 2 BST's belonging to line on the left, and the 2nd one of the line to the right from the vertex? I could then sort the vertexes by y coordinate in each tree and return the y coordinate of a vertex (the end of a segment) right below my query point. Having done this for left and right line, I would then do a comparison to see if the names of the segment those vertexes come from actually match.
However, this will not give me the right answer, since even if the names do match, I might be below or above the segment (if I am close to it). Also, this implies I have to do 3 binary searches (1 for lines, 1 for the y-coordinate on left line, 1 for the right one), and the books says I only need to do 2 searches (1 for slab, 2nd for sector).
Could someone please point me in the direction to do it?
I probably just missed some essential thought or something.
Edit:
Here is another good article, that explains the solution to the problem, however, I don't quite understand how I would achieve the following:
"Consider any query point q ∈ R2. To find the face of G that contains q, we first use binary search with the x-coordinate of q to find the vertical slab s that contains q. Given s, we use binary search with the y-coordinate of q to find the edges of Es between which q lies. "
How exactly to find those 2 edges? Is it as simple as checking if the point lies below the segment? However, this seems like a complicated check to do (and expensive), as we descend down the tree, inspecting the other nodes.
Well, this was asked a year ago so I guess you're not going to be around to accept the answer. It's a fun question, though, so here goes anyway...
Yes, you can divide the plane into slabs using vertical lines through every intersection. The important result is that all line segments that touch a slab will go all the way through, and the order of those line segments will be the same across the entire slab.
In each slab, you make a binary search tree of those line segments, ordered by their order in the slab. The areas that you call "nodes" are the leaves of the tree, and the line segments are the internal nodes. I will call the leaves "areas" to avoid ambiguity. Since the order of the line segments is the same across the entire slab, the order in this tree is valid for every x coordinate in the slab.
to determine the area that contains any point, you find the slab that contains the point, and then do a simple search in the BST to find the area.
Lets say you're looking for point C, and you're at the node in slab 2 that contains (x2,y2) - (x3,y4). Determine whether C is above or below this line segment, and then take that branch in the tree.
For example, if C is (cx,cy) then the y coordinate of the segment at x=cx is:
testy = ((cx-x2)/(x3-x2)) * (y4-y2) + y2
If cy < testy, then take the upward branch, otherwise take the downward branch. (assuming positive Y goes downward).
When you get to a leaf, that's the area that contains C. You're done.
Now... Making a whole new tree for every slab takes a lot of space, since you can have N^2 intersection points and therefore N^2 slabs. Storing a separate tree in every slab would take O(N^3) space.
Using, say, persistent red-black trees, however, the slabs can actually share a lot of nodes. Using a simple purely functional implementation like Chris Okasaki's red-black tree takes O(log N) space per intersection for a total of O(N^2*log N) space.
I seem to remember that there is also a constant-space-per-change persistent red-black tree that would give you O(N^2), but I don't have a reference.
Note that for most real-life scenarios there are closer to O(N) intersections, because lines don't cross very much, but it's still nice to save the space by using a persistent tree.

Applying Binary Search Trees to find Moore Neighbours

Assuming I have a large set of Coordinates such us (3,4), (5,-6), etc., where x and y are integers; is it possible to order them using a BST?
How can I go about determining what should be on the left vs right node?
The reason why I'm looking at BST instead of simply using a list of coordinates is so that I can more efficiently (vs linear search) determine those coordinates that would be in the Moore neighborhood (Chebyshev distance 1) of another.
I've thought about alternating comparisons to x and y values; is that a good approach?
How else can I apply BST to this situation? Or is using BST untenable?
I suggest you create a grid of cells. Each cell (which is actually a list) contains all coordinates which like within it.
If you need to find neighbors of a coordinate, you just look at the coordinates that lies within the same cell (or in neighboring cells).
Despite the simplicity of the approach by aioobe to create a grid of cells (two dimensional array), it was a bit heavy handed/inefficient to store the state of all possible cells/coordinates especially when I may have cases where there is only have a handful of actual coordinates in a very large space (sparse array).
Ultimately I realised using a BST is feasible (there are other approaches) and this is what I did to find Moore Neighbours efficiently using a balanced BST:
Impose a ordinal relation on coordinates i.e. (x1,y1) > (x2,y2) => (x1 > x2) || (x1==x2 && y1 > y2)
Sort list of coordinates by that relationship (to generate a balanced tree)
Generate BST from sorted list recursively: insert median element as node, slice the list below and above the median and pass as parameter to recursive call for generating left and right sub-trees
Then to search for the Moore Neighbours of a coordinate, I can then just look up/search for the 8 possible neighbour coordinates in the tree (as aioobe suggests) in O(log n).

Nearest neighbour search in a constantly changing set of line segments

I have a set of line segments. I want to perform the following operations on them:
Insert a new line segment.
Find all line segments within radius R of a given point.
Find all points within radium R1 of a given point.
Given a line segment, find if it intersects any of the existing line segments. Exact intersection point is not necessary(though that probably doesnt simplify anything.)
The problem is algorithms like kd/bd tree or BSP trees is that they assume a static set of points, and in my case the points will constantly get augmented with new points, necessitating rebuilding the tree.
What data-structure/algorithms will be most suited for this situation ?
Edit: Accepted the answer that is the simplest and solves my problem. Thanks everyone!
Maintaining a dynamic tree might not be as bad as you think.
As you insert new points/lines etc into the collection it's clear that you'd need to refine the current tree, but I can't see why you'd have to re-build the whole tree from scratch every time a new entity was added, as you're suggesting.
With a dynamic tree approach you'd have a valid tree at all times, so you can then just use the fast range searches supported by your tree type to find the geometric entities you've mentioned.
For your particular problem:
You could setup a dynamic geometric tree, where the leaf elements
maintain a list of geometric entities (points and lines) associated
with that leaf.
When a line is inserted into the collection it should be pushed onto
the lists of all leaf elements that it intersects with. You can do
this efficiently by traversing the subset of the tree from the root
that intersects with the line.
To find all points/lines within a specified radial halo you first need
to find all leaves in this region. Again, this can be done by traversing
the subset of the tree from the root that is enclosed by, or that
intersects with the halo. Since there maybe some overlap, you need to
check that the entities associated with this set of leaf elements
actually lie within the halo.
Once you've inserted a line into a set of leaf elements, you can find
whether it intersects with another line by scanning all of the lines
associated with the subset of leaf boxes you've just found. You can then
do line-line intersection tests on this subset.
A potential dynamic tree refinement algorithm, based on an upper limit to the number of entities associated with each leaf in the tree, might work along these lines:
function insert(x, y)
find the tree leaf element enclosing the new entitiy at (x,y) based on whatever
fast search algorithm your tree supports
if (number of entities per leaf > max allowable) do
refine current leaf element (would typically either be a bisection
or quadrisection based on a 2D tree type)
push all entities from the old leaf element list onto the new child element
lists, based on the enclosing child element
else
push new entity onto list for leaf element
endif
This type of refinement strategy only makes local changes to the tree and is thus generally pretty fast in practice. If you're also deleting entities from the collection you can also support dynamic aggregation by imposing a minimum number of entities per leaf, and collapsing leaf elements to their parents when necessary.
I've used this type of approach a number of times with quadtrees/octrees, and I can't at this stage see why a similar approach wouldn't work with kd-trees etc.
Hope this helps.
One possibility is dividing your space into a grid of boxes - perhaps 10 in the y-axis and 10 in the x-axis for a total of 100.
Store these boxes in an array, so it's very easy/fast to determine neighboring boxes. Each box will hold a list vector of line segments that live in that box.
When you calculate line segments within R of one segment, you can check only the relevant neighboring boxes.
Of course, you can create multiple maps of differing granularities, maybe 100 by 100 smaller boxes. Simply consider space vs time and maintenance trade-offs in your design.
updating segment positions is cheap: just integer-divide by box sizes in the x and y directions. For example, if box-size is 20 in both directions and your new coordinate is 145,30. 145/20==7 and 30/20==1, so it goes into box(7,1), for a 0-based system.
While items 2 & 3 are relatively easy, using a simple linear search with distance checks as each line is inserted, item 4 is a bit more involved.
I'd tend to use a constrained triangulation to solve this, where all the input lines are treated as constraints, and the triangulation is balanced using a nearest neighbour rather than Delaunay criterion. This is covered pretty well in Triangulations and applications by Øyvind Hjelle, Morten Dæhlen and Joseph O'Rourkes Computational Geometry in C Both have source available, including getting sets of all intersections.
The approach I've taken to do this dynamically in the past is as follows;
Create a arbitrary triangulation (TIN) comprising of two triangles
surrounding the extents (current + future) of your data.
For each new line
Insert both points into the TIN. This can be done very quickly by
traversing triangles to find the insertion point, and replacing the
triangle found with three new triangles based on the new point and
old triangle.
Cut a section through the TIN based on the two end points, keeping a
list of points where the section cuts any previously inserted lined.
Add the intersection point details to a list stored against both
lines, and insert them into the TIN.
Force the inserted line as a constraint
Balance all pairs of adjacent triangles modified in the above process
using a nearest neighbour criterion, and repeat until all triangles
have been balanced.
This works better than a grid based method for poorly distributed data, but is more difficult to implement. Grouping end-point and lines into overlapping grids will probably be a good optimization for 2 & 3.
Note that I think using the term 'nearest neighbour' in your question is misleading, as this is not the same as 'all points within a given distance of a line', or 'all points within a given radius of another point'. Nearest neighbour typically implies a single result, and does not equate to 'within a given point to point or point to line distance'.
Instead of inserting and deleting into a tree you can calculate a curve that completely fills the plane. Such a curve reduce the 2d complexity to a 1d complexity and you would be able to find the nearest neighbor. You can find some example like z curve and hilbert curve. Here is a better description of my problem http://en.wikipedia.org/wiki/Closest_pair_of_points_problem.

Resources