Tree that contains points - improvement needed - algorithm

Currently I'm building my own structure that holds points in 2D space. I know there are many ready-made algorithms and kinds of trees, but I want something lightweight. Each node stores an (x, y) point and has four children: the topLeft, topRight, botLeft and botRight nodes.
Inserting:
Each new node is inserted depending on its position.
If the tree is empty, insert the new node. If the tree is not empty, go to the root node and:
1. Decide in which quadrant of the current node the new node lies.
2. If that quadrant (e.g. topLeft) is not occupied, insert the new node there.
3. If the topLeft position is occupied, descend to that node and repeat.
Removing:
The structure I need does not require a "remove particular node" operation, so once the job is done the destructor deletes the whole tree recursively.
Check if the node is inside of particular area:
If the tree is not empty, go to the root node and then:
1. If the area's x is less than the node's x and the area's y is less than the node's y, descend into the topLeft child (if it exists).
2. Do the same for topRight (check the area's x + width and y).
3. The same for botRight (check the area's x + width and y + height).
4. The same for botLeft (check the area's x and y + height).
5. Check whether the current node lies inside the area; if it does, do whatever you need with the point. Recursively go back and repeat.
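The insertion and area-check procedures above can be sketched in Python (a minimal hypothetical version of the structure, not the asker's actual code; screen coordinates are assumed, so "top" means smaller y):

```python
class PointTree:
    """Each node stores one point and splits space at that point
    into four quadrants (topLeft, topRight, botLeft, botRight)."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.children = {}  # (is_left, is_top) -> child subtree

    def insert(self, x, y):
        q = (x < self.x, y < self.y)  # which quadrant the new point falls in
        if q in self.children:
            self.children[q].insert(x, y)  # occupied: descend and repeat
        else:
            self.children[q] = PointTree(x, y)

    def query(self, rx, ry, w, h, out=None):
        """Collect all points inside the rect [rx, rx+w] x [ry, ry+h]."""
        if out is None:
            out = []
        if rx <= self.x <= rx + w and ry <= self.y <= ry + h:
            out.append((self.x, self.y))
        for (is_left, is_top), child in self.children.items():
            # descend only into quadrants the rectangle can reach
            x_ok = rx < self.x if is_left else rx + w >= self.x
            y_ok = ry < self.y if is_top else ry + h >= self.y
            if x_ok and y_ok:
                child.query(rx, ry, w, h, out)
        return out
```

For example, inserting (2, 2) and (8, 3) under a root at (5, 5) and querying the rect (0, 0, 4, 4) returns only (2, 2).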
That's how my structure looks; the image showed which branches would be checked for a particular area (orange).
My question is: is there a better algorithm? What can I improve? I looked at quadtrees, but they seem different and store more data. I need something that can easily hold moving objects in 2D. I appreciate your help.

What you have is basically a quadtree, except that you split at your data points instead of at the usual cell midpoint.
You can improve the system a bit by switching to a k-d tree. It's similar, except that at each node you split along a single dimension. The main difference is that you only have two pointers per node (instead of four), so you save about half the memory.
Another thing: you split your space all the way down to one point per node. Because modern CPUs are very good at scanning small arrays, linear search will be faster than traversing the tree for small counts. So I would only split a cell once it already holds 50-100 points. This also saves a bunch of pointers that don't need to be stored at all.
If you know something about the distribution of your points, you might be able to do better still. If the distribution is roughly uniform, you can simply chunk your space into uniform cells and store the points in the associated cells. A rule of thumb says that for N points you should have sqrt(N) cells, but you should experiment to see what works best.
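The cell-chunking idea might look like this (a Python sketch; `cell_size` is a hypothetical tuning parameter you would derive from your point count and extent, e.g. via the sqrt(N)-cells rule of thumb):

```python
from collections import defaultdict

class UniformGrid:
    """Uniform spatial hash: points are bucketed into fixed-size cells,
    so neighbourhood queries only scan nearby buckets."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _key(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x, y):
        self.cells[self._key(x, y)].append((x, y))

    def query_radius(self, x, y, r):
        """All points within distance r of (x, y): scan only the cells
        the disc can touch, then do exact distance checks."""
        cx0, cy0 = self._key(x - r, y - r)
        cx1, cy1 = self._key(x + r, y + r)
        out = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for (px, py) in self.cells.get((cx, cy), []):
                    if (px - x) ** 2 + (py - y) ** 2 <= r * r:
                        out.append((px, py))
        return out
```

Moving objects are cheap to handle here: remove a point from its old bucket and append it to the new one, with no tree rebalancing.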

Related

Can a quad-tree be used to accurately determine the closest object to a point?

I have a list of coordinates and I need to find the closest coordinate to a specific point which I'll call P.
At first I tried to just calculate the distance from each coordinate to P, but this is too slow.
I then tried storing these coordinates in a quad-tree, finding the leaf node that contains P, then finding the closest coordinate in that leaf by comparing every coordinate's distance to P. This gives a good approximation of the closest coordinate, but it can be wrong (when a coordinate outside the leaf is actually closer). I've also tried searching through the leaf node's parent; that makes the search more accurate, but still not exact.
If this can be done with a quad-tree, please let me know how. Otherwise, what other reasonably efficient methods or data structures could I use? Or is it even possible to do this exactly in an efficient manner?
Try a "loose quadtree". It does not have a fixed volume per node, so each node's bounding volume can adapt to the items added.
If you don't like the quadtree's traversal performance and your objects are just points, an adaptive grid can perform fast, close to O(N). Memory-wise, though, the loose quadtree would be better.
There is an algorithm by Hjaltason and Samet described in their paper "Distance browsing in spatial databases". It can easily be applied to quadtrees; I have an implementation here.
Basically, you maintain a list sorted by distance (closest first) whose entries are either points in your tree (what you call coordinates) or nodes of the tree (keyed by the distance to their closest corner, or distance = 0 if they overlap the search point).
You start by adding all the nodes that overlap your search point, then add all the points and subnodes contained in those nodes.
Then you simply return points from the top of the list until you have as many closest points as you want. Whenever a node is at the top of the list, add that node's points and subnodes to the list and check the top again. Repeat.
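Here is a sketch of that best-first search in Python, over a generic tree of bounding boxes rather than the linked implementation (the `Node` class and its fields are assumptions for illustration, not Hjaltason and Samet's actual code):

```python
import heapq
import itertools

class Node:
    def __init__(self, bbox, points=(), children=()):
        self.bbox = bbox                 # (x0, y0, x1, y1)
        self.points = list(points)
        self.children = list(children)

def nearest(root, q, k=1):
    """Best-first search: one priority queue holds both tree nodes (keyed
    by the min distance from q to their bounding box) and points (keyed by
    exact distance). Popping a point means nothing closer remains."""
    counter = itertools.count()          # tie-breaker so heapq never compares items

    def box_dist2(b):
        x0, y0, x1, y1 = b
        dx = max(x0 - q[0], 0, q[0] - x1)
        dy = max(y0 - q[1], 0, q[1] - y1)
        return dx * dx + dy * dy

    heap = [(box_dist2(root.bbox), next(counter), root)]
    found = []
    while heap and len(found) < k:
        d, _, item = heapq.heappop(heap)
        if isinstance(item, tuple):      # a point: guaranteed nearest remaining
            found.append(item)
        else:                            # a node: expand its points and children
            for p in item.points:
                d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                heapq.heappush(heap, (d2, next(counter), p))
            for c in item.children:
                heapq.heappush(heap, (box_dist2(c.bbox), next(counter), c))
    return found
```

Because a point is only popped when no queued node could contain anything closer, the k-th pop is exactly the k-th nearest neighbour.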
Yes, you can find the closest coordinate with a quad-tree even when it is not in the same leaf as the query point. To do that, use the following search algorithm:
1. Search for the closest position inside the quad-tree (the leaf containing your point).
2. Take its distance from your initial position.
3. Starting from the root node, search for all the nodes inside the bounding box defined by that distance.
4. Return the closest of all the nodes found inside this bounding box.
However, this is a very basic algorithm with no performance optimizations. Among other things:
If the distance calculated in step 2 is less than the distance to the border of the tree node, you don't need to do steps 3 and 4 (or you can start from a node other than the root).
Steps 3 and 4 could also be merged into a single search of the tree, using the distance to the closest node found so far as the bounding box.
You could also order the search for nodes inside the bounding box so that nodes closest to your position are visited first.
I have not done the complexity calculation, but you should expect a worst case on a single node that is as bad as, if not worse than, the naive approach; in general, though, you should get a pretty decent speed-up while remaining error-free.

Implementation of Planar Point Location using slabs

I am looking into Planar Point Location, and right now I'm thinking about using the slabs approach.
I've read through a couple of articles:
Planar Point Location Using Persistent Search Trees
Persistent Data Structures
Point location on Wikipedia
I understand the concept behind a persistent balanced binary search tree, but let's omit that in this question; I don't really care about the extra storage overhead. The thing about the articles is that they discuss how speed can be improved but don't explain the basic stuff. For example:
Could you please correct me if I am wrong:
We draw a line across all intersections.
Now, we have slabs that are split by the segments at different angles.
Every intersection with any line is considered a vertex.
Slabs are sorted by order in a binary search tree (let's omit a partially persistent bst)
Somehow, sectors are sorted within those respective BSTs, even though the segments dividing them are almost always at an angle. Does each node have to carry a definition of its area?
Please refer to this example image:
Questions:
How would I actually figure out that the point lies in Node c and not in Node b? Is it somehow via area?
Do I instead need to arrange my nodes to contain information about the segments? I could then check whether the query point lies above a segment (if that's how I should determine my sector). If so, would I then search through a polygon list afterwards to see which polygon that particular segment belongs to?
Maybe I need to store a BST for each line rather than for each slab?
Would I then look at the two BSTs belonging to the line on the left and the line on the right of the vertex? I could sort the vertices by y coordinate in each tree and return the vertex (the end of a segment) right below my query point. Having done this for the left and right lines, I would compare whether the segments those vertices come from actually match.
However, this will not give me the right answer: even if the names match, I might be below or above the segment (if I am close to it). Also, this implies three binary searches (one for the lines, one for the y coordinate on the left line, one for the right), and the book says I only need two (one for the slab, one for the sector).
Could someone please point me in the direction to do it?
I probably just missed some essential thought or something.
Edit:
Here is another good article that explains the solution to the problem; however, I don't quite understand how I would achieve the following:
"Consider any query point q ∈ R2. To find the face of G that contains q, we first use binary search with the x-coordinate of q to find the vertical slab s that contains q. Given s, we use binary search with the y-coordinate of q to find the edges of Es between which q lies. "
How exactly do I find those two edges? Is it as simple as checking whether the point lies below each segment? That seems like a complicated (and expensive) check to perform while descending the tree.
Well, this was asked a year ago so I guess you're not going to be around to accept the answer. It's a fun question, though, so here goes anyway...
Yes, you can divide the plane into slabs using vertical lines through every intersection. The important result is that all line segments that touch a slab will go all the way through, and the order of those line segments will be the same across the entire slab.
In each slab, you make a binary search tree of those line segments, ordered by their order in the slab. The areas that you call "nodes" are the leaves of the tree, and the line segments are the internal nodes. I will call the leaves "areas" to avoid ambiguity. Since the order of the line segments is the same across the entire slab, the order in this tree is valid for every x coordinate in the slab.
To determine the area that contains a given point, you find the slab that contains the point, then do a simple search in that slab's BST to find the area.
Let's say you're looking for point C, and you're at the node in slab 2 that holds the segment (x2,y2)-(x3,y4). Determine whether C is above or below this segment, then take that branch of the tree.
For example, if C is (cx,cy) then the y coordinate of the segment at x=cx is:
testy = ((cx-x2)/(x3-x2)) * (y4-y2) + y2
If cy < testy, take the upward branch; otherwise take the downward branch (assuming positive Y points downward).
When you get to a leaf, that's the area that contains C. You're done.
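The per-node test can be packaged as a small helper (Python; the variable names follow the example above, and y is assumed to grow downward):

```python
def below_segment(cx, cy, seg):
    """Branch decision at an internal BST node holding a line segment:
    interpolate the segment's y at x = cx and compare with cy."""
    (x2, y2), (x3, y4) = seg
    testy = ((cx - x2) / (x3 - x2)) * (y4 - y2) + y2
    return "up" if cy < testy else "down"
```

For the segment (0,0)-(10,10), a query point (5, 2) takes the upward branch and (5, 8) the downward one.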
Now... Making a whole new tree for every slab takes a lot of space, since you can have N^2 intersection points and therefore N^2 slabs. Storing a separate tree in every slab would take O(N^3) space.
Using, say, persistent red-black trees, however, the slabs can actually share a lot of nodes. A simple purely functional implementation like Chris Okasaki's red-black tree takes O(log N) space per intersection, for a total of O(N^2 log N) space.
I seem to remember that there is also a constant-space-per-change persistent red-black tree that would give you O(N^2), but I don't have a reference.
Note that for most real-life scenarios there are closer to O(N) intersections, because lines don't cross very much, but it's still nice to save the space by using a persistent tree.

I need a way to construct a 2D-polygon with holes

My object is constructed on a honeycomb grid. All objects are connected; the red lines represent the connections between nodes. I've heard Binary Space Partitioning (BSP) trees are good for this type of problem, but I'm not sure what "front" and "back" would be in my case.
I've implemented the lookup using the honeycomb grid system as shown (x , y)
class Node {
    Point position; // center position
    Point grid;     // honeycomb grid system
}

class MyObject {
    Node lookup(Point grid);
}
I need a data structure to represent the graph as the user adds more nodes to the scene, and a way to quickly determine whether a grid point is (relative to MyObject):
1. outside
2. inside
3. inside a hole
How big is the space you're working in?
Simulate the whole thing with a simple rectangular grid assuming even-rows are staggered right.
Any node has coordinates [x,y]
For (y%2 == 0) neighbor nodes are [x-1,y][x+1,y][x,y+1][x,y-1][x-1,y-1][x-1,y+1]
For (y%2 == 1) neighbor nodes are [x-1,y][x+1,y][x,y+1][x,y-1][x+1,y-1][x+1,y+1]
Each node can be full or empty, and empty nodes can be checked or not checked. Initially, all empty nodes are unchecked.
Check whether a node belongs to a hole as follows:
If the node is full, it does not belong to a hole; it belongs to the shape.
If the node is empty, mark it as checked.
Iterate over all neighbor nodes:
Skip nodes marked as checked or full.
Recursively repeat the search for all unchecked nodes.
If the recursion ever reaches x<0, y<0, x>MAX_X or y>MAX_Y, abort: the node is outside the shape.
If the recursion ends without reaching an edge of the playfield, the node belongs to a hole.
Additionally, you may then repeat the procedure, turning all checked nodes into "outside" or "hole" for later reference.
If you want to index all holes up front, it may be easier to find all unchecked empty nodes at the borders of the playfield (x==0, y==0, x==MAX_X, y==MAX_Y) and use the above algorithm to mark them as outside. All remaining empty nodes are holes.
Depending on your grid size, you can implement this as a 2D array of structs/objects holding each node's state (or even chars, with the status packed into bits), sized [MAX_X+1][MAX_Y+1] if that's reasonable; or as a list (vector) of full nodes, each containing its coordinates, status and, for speed, its neighbors. In the latter case, search the shape, finding all empty neighbor nodes as potential holes. Edge nodes with extreme coordinates (lowest/highest x/y) belong to the "outside". Follow their empty neighbors that have full neighbors to trace the outer edge of the shape. All remaining edges are inner edges, and running the algorithm from them gives you all the "holes".
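The hole check above translates almost directly into an iterative flood fill (a Python sketch; `full` is an assumed set of occupied (x, y) cells, and the staggered neighbor tables from earlier are used as given):

```python
def classify_empty(start, full, max_x, max_y):
    """Flood-fill from a cell: 'shape' if it is full, 'outside' if the
    fill escapes the playfield bounds, otherwise 'hole'."""
    def neighbors(x, y):
        if y % 2 == 0:
            return [(x-1, y), (x+1, y), (x, y+1), (x, y-1), (x-1, y-1), (x-1, y+1)]
        return [(x-1, y), (x+1, y), (x, y+1), (x, y-1), (x+1, y-1), (x+1, y+1)]

    if start in full:
        return "shape"
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        if x < 0 or y < 0 or x > max_x or y > max_y:
            return "outside"         # the fill escaped: not enclosed by the shape
        for n in neighbors(x, y):
            if n not in seen and n not in full:
                seen.add(n)
                stack.append(n)
    return "hole"                    # fill exhausted without reaching the border
```

The `seen` set plays the role of the "checked" flag above, so each empty cell is visited at most once.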
My suggestion:
Assign each tile a center position in 2D Cartesian space.
Build a binary search tree (BST) containing all center positions.
For a query point, use that BST to find the nearest occupied tile and compute the query point's position relative to it.
Determine whether that relative position is inside or outside the tile using a geometric formula, e.g. as in:
Is a point inside regular hexagon
Alternatively, work with an approximation using squares, e.g., as seen here:
Hexagonal Grids, how do you find which hexagon a point is in?

Algorithm for falling grid items

I do not know how to describe the goal succinctly, which may be why I haven't been able to find an applicable algorithm despite ample searching, but a picture shows it clearly:
Given the state of items in the grid at the left, does anyone know of an algorithm for efficiently finding the ending positions shown in the grid at right? In this case all the items have "fallen" "down", but the direction of course is arbitrary. The point is just that:
There is a collection of items of arbitrary shapes, all composed of contiguous squares
Items cannot overlap
All items should move the maximum distance in a given direction until they are touching a wall, or they are touching another item which [...is touching another item ad infinitum...] is touching a wall.
This is not homework, I'm not a student. This is for my own interest in geometry and programming. I haven't mentioned the language because it doesn't matter. I can implement whatever algorithm in the language I'm using for the specific project I'm working on. A useful answer could be described in words or code; it's the ideas that matter.
This problem could probably be abstracted into some kind of graph (in the mathematical sense) of dependencies and slack space, so perhaps an algorithm aimed at minimizing lag time could be adapted.
If you don't know the answer but are about to try to make up an algorithm on the spot, just remember that there can be circular dependencies, such as with the interlocking pink (backwards) "C" and blue "T" shapes. Parts of T are below C, and parts of C are below T. This would be even more tricky if interlocking items were locked through a "loop" of several pieces.
Some notes for an applicable algorithm: All the following are very easy and fast to do because of the way I've built the grid object management framework:
Enumerate the individual squares within a piece
Enumerate all pieces
Find the piece, if any, occupying a specific square in the overall grid
A note on the answer:
maniek hinted at it first, but bloops has provided a brilliant explanation. I think the absolute key is the insight that pieces all moving the same amount maintain their relationships to each other, and therefore those relationships don't have to be considered.
An additional speed-up for a sparsely populated board would be to shift all pieces to eliminate rows that are completely empty. It is very easy to count empty rows and to identify the pieces on one side of ("above") an empty row.
Last note: I did in fact implement the algorithm described by bloops, with a few implementation-specific modifications. It works beautifully.
The Idea
Define the set of frozen objects inductively as follows:
An object touching the bottom is frozen.
An object lying on a frozen object is frozen.
Intuitively, exactly the frozen objects have reached their final place. Call the non-frozen objects active.
Claim: All active objects can fall one unit downwards simultaneously.
Proof: Of course, an active object will not hit another active object, since their relative positions do not change. An active object will also not hit a frozen object: if it did, the active object would in fact already be frozen, because it would be lying on a frozen object, contradicting our assumption.
Our algorithm's very high-level pseudo-code would be as follows:
while (there are active objects):
    move active objects downwards simultaneously until one of them hits a frozen object
    update the status (active/frozen) of each object
Notice that at least one object becomes frozen in each iteration of the while loop. Also, every object becomes frozen exactly once. These observations would be used while analyzing the run-time complexity of the actual algorithm.
The Algorithm
We use the concept of time to improve the efficiency of most operations. Time is measured starting from 0, and every unit movement of the active objects takes 1 unit of time. Observe that, at time t, the displacement of all objects currently active is exactly t units downward.
Note that in each column, the relative ordering of each cell is fixed. One of the implications of this is that each cell can directly stop at most one other cell from falling. This observation could be used to efficiently predict the time of the next collision. We can also get away with 'processing' every cell at most once.
We index the columns starting from 1 and increasing from left to right; and the rows with height starting from 1. For ease of implementation, introduce a new object called bottom - which is the only object which is initially frozen and consists of all the cells at height 0.
Data Structures
For an efficient implementation, we maintain the following data structures:
An associative array A containing the final displacement of each cell. If a cell is active, its entry should be, say, -1.
For each column k, we maintain the set S_k of the initial row numbers of the active cells in column k. We need to support successor queries and deletions on this set. We could use a van Emde Boas tree and answer every query in O(log log H), where H is the height of the grid; or a balanced binary search tree, which performs these operations in O(log N), where N is the number of cells in column k.
A priority queue Q that stores the active cells keyed by the expected time of their next collision. Again, we could use either a vEB tree for O(log log H) per query, or a standard priority queue with O(log N) per operation.
Implementation
The detailed pseudo-code of the algorithm follows:
Populate the S_k's with active cells
Initialize Q to be an empty priority queue
For every cell b in bottom:
    Push Q[b] = 0
while (Q is not empty):
    (x,t) = Q.extract_min()    // the active cell x collides at time t
    Object O = parent_object(x)
    For every cell y in O:
        A[y] = t               // freeze cell y at displacement t
        k = column(y)
        S_k.delete(y)
        a = S_k.successor(y)   // find the only active cell that can collide with y
        if (a != nil):
            // expected time of the collision between a and y;
            // note that both their positions are currently t + (their original height)
            coll_t = t + height(a) - height(y) - 1
            Push/update Q[a] = coll_t
The final position of any object can be obtained by querying A for the displacement of any cell belonging to that object.
Running Time
We process and freeze every cell exactly once. We perform a constant number of lookups while freezing every cell. We assume parent_object lookup can be done in constant time. The complexity of the whole algorithm is O(N log N) or O(N log log H) depending on the data structures we use. Here, N is the total number of cells across all objects.
And now something completely different :)
Each piece that rests on the ground is fixed. Each piece that rests on a fixed piece is fixed. The rest can move. Move the unfixed pieces 1 square down, repeat until nothing can move.
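That fixed-point loop can be sketched directly (Python; the representation is an assumption for illustration: pieces are given as sets of (x, y) cells, y = 0 is the floor row, and falling means y decreasing):

```python
def settle(pieces):
    """Naive iteration: compute the frozen set (touching the floor, or
    resting on a frozen piece), drop every active piece one unit, repeat.
    `pieces` maps a piece id to a set of (x, y) cells."""
    pieces = {pid: set(cells) for pid, cells in pieces.items()}
    while True:
        occupied = {cell: pid for pid, cells in pieces.items() for cell in cells}
        frozen = {pid for pid, cells in pieces.items()
                  if any(y == 0 for _, y in cells)}
        changed = True
        while changed:                  # propagate: resting on frozen => frozen
            changed = False
            for pid, cells in pieces.items():
                if pid in frozen:
                    continue
                below = {occupied.get((x, y - 1)) for x, y in cells}
                if below & frozen:
                    frozen.add(pid)
                    changed = True
        active = set(pieces) - frozen
        if not active:
            return pieces
        for pid in active:              # all active pieces drop together
            pieces[pid] = {(x, y - 1) for x, y in pieces[pid]}
```

Because interlocked active pieces all drop by the same amount each round, circular dependencies like the C/T pair need no special handling here.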
Okay, so this appears to be as follows. We proceed in steps; in each step we build a directed graph whose vertices are the objects and whose edges are defined as follows:
1) If x and y are two objects, add an edge x->y if x cannot move until y moves. Note that we can have both x->y and y->x.
2) Some objects can no longer move because they are at the bottom; color their vertices blue. The remaining vertices are red.
3) Find all the strongly connected components of the directed graph using Kosaraju's or Tarjan's algorithm. (If you are not familiar with SCCs, they are an extremely powerful technique; see Kosaraju's algorithm.) Once we have the SCCs, shrink each one to a single vertex, preserving all edges external to the SCC. If any vertex in an SCC is blue, color the new vertex blue; otherwise it is red. This captures the fact that if one object in an SCC cannot move, none of them can.
4) The resulting graph is a directed acyclic graph, so you can topologically sort it. Traverse the vertices in increasing topological order and, for each red vertex you see, move the objects it represents.
Continue until the traversal in the last step can no longer move any vertex.
If two objects A and B overlap, we say they are inconsistent relative to each other. For a proof of correctness, argue the following lemmas:
1) "If I move an SCC, the objects in it cause no inconsistencies among themselves."
2) "When I move an object during the topological-order traversal, I do not cause inconsistencies."
The challenge now is to formally prove correctness and to find suitable data structures that solve this efficiently. Let me know if you need any help.
I haven't worked out all the details, but I think the following is a somewhat systematic approach:
Looking at the whole picture as a graph, what you need is a topological sort of all the vertices, i.e. the items. Item A should come before item B in the sort if any part of A is below any part of B. Once the items are sorted topologically, you can iterate through them in that order and determine the positions, since all the items below the current one already have fixed positions.
To be able to topologically sort, you need an acyclic graph. You can use one of the Strongly Connected Components algorithms to find all the cycles and compress each into a single vertex, then perform the topsort on the resulting graph.
To find the positions of the pieces within an SCC: first treat it as one big piece and determine where it will end up. This fixes some pieces that cannot move any more. Remove them and repeat the procedure for the remaining pieces in the SCC (if any) to find their final positions.
This third part is the only one that seems computationally intensive if the pieces have a very complicated structure, but it should still be more efficient than moving the pieces one grid square at a time.
EDITED several times. I think this is all you need to do:
Find all the pieces that can only fall mutually dependent on each other and combine them into an equivalent larger piece (e.g. the T and the backwards C in your picture).
Iterate through all the pieces, moving each the maximum distance down before it hits something. Repeat until nothing moves.

Nearest neighbour search in a constantly changing set of line segments

I have a set of line segments. I want to perform the following operations on them:
Insert a new line segment.
Find all line segments within radius R of a given point.
Find all points within radius R1 of a given point.
Given a line segment, find whether it intersects any of the existing line segments. The exact intersection point is not necessary (though that probably doesn't simplify anything).
The problem with structures like k-d trees or BSP trees is that they assume a static set of points; in my case the set is constantly augmented with new points, necessitating a rebuild of the tree.
What data structures/algorithms would be most suited to this situation?
Edit: Accepted the answer that is the simplest and solves my problem. Thanks everyone!
Maintaining a dynamic tree might not be as bad as you think.
As you insert new points/lines into the collection, it's clear you'd need to refine the current tree, but I can't see why you'd have to rebuild the whole tree from scratch every time a new entity is added, as you're suggesting.
With a dynamic tree approach you'd have a valid tree at all times, so you can then just use the fast range searches supported by your tree type to find the geometric entities you've mentioned.
For your particular problem:
You could set up a dynamic geometric tree in which the leaf elements maintain a list of the geometric entities (points and lines) associated with that leaf.
When a line is inserted into the collection, it should be pushed onto the lists of all leaf elements it intersects. You can do this efficiently by traversing, from the root, the subset of the tree that intersects the line.
To find all points/lines within a specified radial halo, you first need to find all leaves in that region. Again, this can be done by traversing, from the root, the subset of the tree that is enclosed by or intersects the halo. Since there may be some overlap, you then check that the entities associated with this set of leaves actually lie within the halo.
Once you've inserted a line into a set of leaf elements, you can find whether it intersects another line by scanning all of the lines associated with the subset of leaf boxes you've just found, and doing line-line intersection tests on that subset.
A potential dynamic tree refinement algorithm, based on an upper limit to the number of entities associated with each leaf in the tree, might work along these lines:
function insert(x, y)
    find the tree leaf element enclosing the new entity at (x, y), using whatever
    fast search algorithm your tree supports
    if (number of entities in leaf > max allowable) then
        refine the current leaf element (typically a bisection or quadrisection,
        depending on the 2D tree type)
        push all entities from the old leaf's list onto the new child elements'
        lists, based on which child encloses each entity
    else
        push the new entity onto the leaf element's list
    endif
This type of refinement strategy only makes local changes to the tree and is thus generally pretty fast in practice. If you're also deleting entities from the collection you can also support dynamic aggregation by imposing a minimum number of entities per leaf, and collapsing leaf elements to their parents when necessary.
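For points only, that insert-with-refinement strategy might look like this (a Python sketch with a hypothetical bucket capacity; lines would additionally be pushed onto every leaf they intersect, and a real implementation would need a depth limit for degenerate duplicate points):

```python
class QuadTree:
    """Leaves hold entity lists and quadrisect when they exceed capacity."""

    def __init__(self, x0, y0, x1, y1, cap=4):
        self.x0, self.y0, self.x1, self.y1, self.cap = x0, y0, x1, y1, cap
        self.points = []
        self.children = None

    def _child(self, x, y):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return self.children[(2 if y >= my else 0) + (1 if x >= mx else 0)]

    def insert(self, x, y):
        if self.children is not None:
            self._child(x, y).insert(x, y)
        elif len(self.points) < self.cap:
            self.points.append((x, y))
        else:                          # refine: quadrisect and redistribute
            mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
            self.children = [
                QuadTree(self.x0, self.y0, mx, my, self.cap),
                QuadTree(mx, self.y0, self.x1, my, self.cap),
                QuadTree(self.x0, my, mx, self.y1, self.cap),
                QuadTree(mx, my, self.x1, self.y1, self.cap),
            ]
            for p in self.points + [(x, y)]:
                self._child(*p).insert(*p)
            self.points = []

    def count(self):
        n = len(self.points)
        if self.children:
            n += sum(c.count() for c in self.children)
        return n
```

Note that refinement only touches one leaf and its four new children, which is why this kind of update stays fast in practice.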
I've used this type of approach a number of times with quadtrees/octrees, and I can't at this stage see why a similar approach wouldn't work with kd-trees etc.
Hope this helps.
One possibility is dividing your space into a grid of boxes - perhaps 10 in the y-axis and 10 in the x-axis for a total of 100.
Store these boxes in an array, so it's very easy and fast to determine neighboring boxes. Each box holds a vector of the line segments that live in it.
When you look for line segments within R of a given point or segment, you can check only the relevant neighboring boxes.
Of course, you can create multiple maps of differing granularities, maybe 100 by 100 smaller boxes. Simply consider space vs time and maintenance trade-offs in your design.
Updating segment positions is cheap: just integer-divide the coordinates by the box size in x and y. For example, if the box size is 20 in both directions and your new coordinate is (145, 30): 145/20 == 7 and 30/20 == 1, so it goes into box(7,1) in a 0-based system.
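That mapping is a one-liner (Python, integer coordinates assumed):

```python
def box_of(x, y, box_size):
    """Map a coordinate to its 0-based grid box by integer division."""
    return (x // box_size, y // box_size)
```

Following the worked example above, `box_of(145, 30, 20)` gives box (7, 1).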
While items 2 & 3 are relatively easy, using a simple linear search with distance checks as each line is inserted, item 4 is a bit more involved.
I'd tend to use a constrained triangulation to solve this, where all the input lines are treated as constraints and the triangulation is balanced using a nearest neighbour rather than a Delaunay criterion. This is covered pretty well in Triangulations and Applications by Øyvind Hjelle and Morten Dæhlen, and in Joseph O'Rourke's Computational Geometry in C. Both have source available, including routines for getting the sets of all intersections.
The approach I've taken to do this dynamically in the past is as follows;
Create an arbitrary triangulation (TIN) comprising two triangles surrounding the extents (current + future) of your data.
For each new line:
1. Insert both endpoints into the TIN. This can be done very quickly by traversing triangles to find the insertion point, then replacing the triangle found with three new triangles based on the new point and the old triangle.
2. Cut a section through the TIN between the two endpoints, keeping a list of the points where the section cuts any previously inserted lines.
3. Add the intersection-point details to a list stored against both lines, and insert those points into the TIN.
4. Force the inserted line in as a constraint.
5. Balance all pairs of adjacent triangles modified in the above process using the nearest neighbour criterion, and repeat until all triangles have been balanced.
This works better than a grid-based method for poorly distributed data, but is more difficult to implement. Grouping end-points and lines into overlapping grids will probably be a good optimization for 2 & 3.
Note that I think using the term 'nearest neighbour' in your question is misleading, as it is not the same as 'all points within a given distance of a line' or 'all points within a given radius of another point'. Nearest neighbour typically implies a single result, not 'everything within a given point-to-point or point-to-line distance'.
Instead of inserting into and deleting from a tree, you can use a space-filling curve, which completely fills the plane. Such a curve reduces the 2D problem to a 1D one, letting you find nearest neighbors with ordinary one-dimensional searches. Examples include the Z-order curve and the Hilbert curve. See also the closest pair of points problem: http://en.wikipedia.org/wiki/Closest_pair_of_points_problem.
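A Z-order (Morton) key, for instance, is computed by interleaving the coordinate bits (a Python sketch; non-negative integer coordinates are assumed). Nearby points usually get nearby keys, though some neighbors straddle curve discontinuities, so exact queries still need a check across adjacent key ranges:

```python
def morton(x, y, bits=16):
    """Interleave the bits of x and y to get the point's position along
    a Z-order curve (x in the even bit positions, y in the odd ones)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return code
```

The four cells of the unit 2x2 block map to keys 0-3, tracing the "Z" shape that gives the curve its name.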
