Suggestions on speeding up edge selection - algorithm

I am building a graph editor in C# where the user can place nodes and then connect them with either a directed or undirected edge. When finished, an A* pathfinding algorithm determines the best path between two nodes.
What I have: A Node class with an x, y, list of connected nodes and F, G and H scores.
An Edge class with a Start, Finish and whether or not it is directed.
A Graph class which contains a list of Nodes and Edges as well as the A* algorithm
Right now, when a user wants to select a node or an edge, the mouse position is recorded and I iterate through every node and edge to determine whether it should be selected. This is obviously slow. I was thinking I could implement a QuadTree for my nodes to speed it up, but what can I do to speed up edge selection?

Since users are "drawing" these graphs I would assume they include a number of nodes and edges that humans would likely be able to generate (say 1-5k max?). Just store both in the same QuadTree (assuming you already have one written).
You can easily extend a classic QuadTree into a PMR QuadTree which adds splitting criteria based on the number of line segments crossing through them. I've written a hybrid PR/PMR QuadTree which supported bucketing both points and lines, and in reality it worked with a high enough performance for 10-50k moving objects (rebalancing buckets!).

So your problem is that the person has already drawn a set of nodes and edges, and you'd like to make the test to figure out which edge was clicked on much faster.
Well an edge is a line segment. For the purpose of filtering down to a small number of possible candidate edges, there is no harm in extending edges into lines. Even if you have a large number of edges, only a small number will pass close to a given point so iterating through those won't be bad.
Now divide edges into two groups: vertical and not vertical. You can store the vertical edges in a sorted data structure and easily test which vertical lines are close to any given point.
The non-vertical ones are trickier. For them you can draw vertical boundaries to the left and right of the region where your nodes can be placed, and then store each line as the pair of heights at which it intersects those boundaries. You can store those pairs in a QuadTree, and add logic to this QuadTree that takes a point and searches for all lines passing within a certain distance of that point. (The idea is that at any node in the QuadTree you can construct a pair of bounding lines for all of the lines below that node. If your point is not between those lines, or close to them, you can skip that section of the tree.)
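As a rough sketch of the "pair of heights" representation (the function names `line_to_heights` and `height_at` are made up for illustration, and points are assumed to be `(x, y)` tuples), a non-vertical line can be mapped to its two boundary heights and evaluated again by linear interpolation:

```python
def line_to_heights(p, q, x_left, x_right):
    """Map the non-vertical line through p and q to the heights at which it
    crosses the vertical boundaries x = x_left and x = x_right."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("vertical edges are stored separately")
    slope = (y2 - y1) / (x2 - x1)
    return (y1 + slope * (x_left - x1), y1 + slope * (x_right - x1))


def height_at(heights, x, x_left, x_right):
    """Height of the line at x, interpolated between its two boundary heights."""
    h_left, h_right = heights
    t = (x - x_left) / (x_right - x_left)
    return h_left + t * (h_right - h_left)
```

Because the height at any x is an interpolation of the two stored values, a QuadTree cell covering a range of left heights and a range of right heights bounds every line in it between the lines (min left, min right) and (max left, max right), which is what makes the pruning described above possible.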

I think you have all the ingredients already.
Here's a suggestion:
Index all your edges in a spatial data structure (could be QuadTree, R-Tree etc.). Every edge should be indexed using its bounding box.
Record the mouse position.
Search for the most specific rectangle containing your mouse position.
This rectangle should have one or more edges/nodes; iterate through them, according to the needed mode.
(The tricky part): If the user has not indicated any edge from the most specific rectangle, you should go up one level and iterate over the edges included in this level. Maybe you can do without this.
This should be faster.
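Whichever index supplies the candidates, the last step is an exact test against each candidate edge, because a bounding box only says the edge might be near the mouse. A minimal sketch of that final test, assuming edges are pairs of `(x, y)` tuples and using a made-up `tolerance` in pixels:

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all are (x, y) tuples)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line, then clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def pick_edge(mouse, candidates, tolerance=5.0):
    """Return the candidate edge closest to the mouse, or None if no
    candidate lies within tolerance."""
    best, best_dist = None, tolerance
    for a, b in candidates:
        d = point_segment_distance(mouse, a, b)
        if d <= best_dist:
            best, best_dist = (a, b), d
    return best
```

Here `candidates` stands for whatever the spatial index returned for the rectangle under the mouse; with only a handful of candidates this loop is cheap.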

Related

Can a quad-tree be used to accurately determine the closest object to a point?

I have a list of coordinates and I need to find the closest coordinate to a specific point which I'll call P.
At first I tried to just calculate the distance from each coordinate to P, but this is too slow.
I then tried to store these coordinates as a quad-tree, find the leaf node that contains P, then find the closest coordinate in that leaf by comparing distances of every coordinate to P. This gives a good approximation for the closest coordinate, but can be wrong sometimes. (when a coordinate is outside the leaf node, but closer). I've also tried searching through the leaf node's parent, but while that makes the search more accurate, it doesn't make it perfect.
If it is possible for this to be done with a quad-tree, please let me know how. Otherwise, what other methods/data structures could I use that are reasonably efficient, or is it even possible to do this perfectly in an efficient manner?
Try "loose quadtree". It does not have a fixed volume per node. So it can adjust each node's bounding volume to adapt to the items added.
If you don't like quadtree's traversing performance and if your objects are just points, adaptive-grid can perform fast or very close to O(N). But memory-wise, loose quadtree would be better.
There is an algorithm by Hjaltason and Samet described in their paper "Distance browsing in spatial databases". It can easily be applied to quadtrees, I have an implementation here.
Basically, you maintain a sorted list of objects, ordered by distance (closest first). The objects are either points in your tree (what you call coordinates) or nodes of the tree (distance to the closest corner, or distance=0 if they overlap with your search point).
You start by adding all nodes that overlap with your search point, and then add all points and subnodes in these nodes.
Then you simply return points from the top of the list until you have as many closest points as you want. If a node is at the top of the list, add points/subnodes from that node to the list and check the top of the list again. Repeat.
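The procedure above can be sketched with a priority queue. This is not the linked implementation, only an illustration under some assumptions: a small bucketed point quadtree (`QuadNode`, a made-up class), points as `(x, y)` tuples, and squared distance from the query to a node's box as the node's priority.

```python
import heapq
import itertools


class QuadNode:
    """Bucketed point quadtree node covering the square (x0, y0)-(x1, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0):
        self.bounds = (x0, y0, x1, y1)
        self.capacity, self.depth = capacity, depth
        self.points = []       # points stored here while this is a leaf
        self.children = None   # four sub-quadrants once split

    def _child_for(self, p):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(p[0] >= mx) + 2 * (p[1] >= my)]

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # Split an overfull leaf; the depth cap guards against duplicate points.
        if len(self.points) > self.capacity and self.depth < 16:
            x0, y0, x1, y1 = self.bounds
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            boxes = [(x0, y0, mx, my), (mx, y0, x1, my),
                     (x0, my, mx, y1), (mx, my, x1, y1)]
            self.children = [QuadNode(*b, self.capacity, self.depth + 1)
                             for b in boxes]
            old, self.points = self.points, []
            for q in old:
                self._child_for(q).insert(q)


def box_distance_sq(bounds, q):
    """Squared distance from q to the box; 0 if q is inside it."""
    x0, y0, x1, y1 = bounds
    dx = max(x0 - q[0], 0.0, q[0] - x1)
    dy = max(y0 - q[1], 0.0, q[1] - y1)
    return dx * dx + dy * dy


def nearest(root, q, k=1):
    """Return the k points closest to q, closest first."""
    tie = itertools.count()  # tie-breaker so the heap never compares items
    heap = [(box_distance_sq(root.bounds, q), next(tie), root)]
    result = []
    while heap and len(result) < k:
        _, _, item = heapq.heappop(heap)
        if isinstance(item, QuadNode):
            # Expand the node: queue its subnodes and its points.
            for child in item.children or ():
                heapq.heappush(
                    heap, (box_distance_sq(child.bounds, q), next(tie), child))
            for p in item.points:
                d = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                heapq.heappush(heap, (d, next(tie), p))
        else:
            # A point at the top of the queue is closer than anything unexpanded.
            result.append(item)
    return result
```

The result is exact because a node's box distance never exceeds the distance to any point inside it, so no closer point can still be hidden in an unexpanded node when a point reaches the top.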
Yes, you can find the closest coordinate inside a quad-tree even when it is not directly inside the leaf. To do that, you can use the following search algorithm:
1. Search for the closest position inside the quad-tree.
2. Take its distance from your initial position.
3. Starting from your root node, search all the nodes inside the bounding box defined by that distance around your position.
4. Return the closest node from all the nodes inside this bounding box.
However, this is a very basic algorithm with no performance optimizations. Among other things:
If the distance calculated in 2. is less than the distance to the border of the tree node, then you don't need to do 3 or 4 (or you can start from a node that is not the root node).
Also, 3 and 4 could be merged into a single algorithm that only searches inside the tree, using the distance to the closest node found so far as the bounding box.
And you could also order the search inside the bounding box so that the nodes closest to your position are visited first.
I have not done a complexity calculation, but you should expect a worst case that is as bad as, if not worse than, the plain linear search. In general, though, you should get a pretty decent speed-up while remaining error free.

Adding cycles to a Minimum Spanning Tree without moving the points?

I am generating a dungeon layout for a video game. I have created the rooms, spaced them out using separation steering, and created a fully connected weighted, undirected graph of the rooms. Then I calculated a MST using Prim's Algorithm, all using GML (GameMaker Language). I miss Python.
My intention is to add additional edges to reintroduce loops, so a player does not have to always return along a path, and to make layouts more interesting. The problem is, these edges cannot cross, and I would prefer not to have to move the points around. I had been given a recommendation to use Delaunay Triangulation, but if I am honest this is completely over my head, and may not be a viable solution in GML. I am asking for any suggestions on algorithms that I could use to identify edges that I could add that do not intersect previously created edges.
I have included an image of the MST (the lines connect to the corners of the red markers, even if the image shows they stop short)
If I'm understanding your question correctly, we're looking at more of a geometry problem than a graph theory problem. You have existing points and line segments with concrete locations in 2d space, and you want to add new line segments that will not intersect existing line segments.
For checking whether you can connect two nodes, node1 and node2, you can iterate through all existing edges and see whether the line segment node1---node2 would intersect the line segment edge.endpoint1 --- edge.endpoint2. The key function that checks whether two line segments intersect can be implemented with any of the solutions found here: How can I check if two segments intersect?.
That would take O(E) time and look something like
def canAddEdge(node1, node2):
    # A new edge is valid only if it crosses none of the existing edges.
    for edge in graph:
        if doesIntersect([node1.location(), node2.location(),
                          edge.endpoint1.location(), edge.endpoint2.location()]):
            return False
    return True
And you can get a list of valid edges to add in O(EV^2) with something like
def getListOfValidEdges(graph):
    validEdges = []
    for index, firstEndpointNode in enumerate(graph.nodes()):
        # Start at index + 1 so a node is never paired with itself.
        for secondEndpointNode in graph.nodes()[index + 1:]:
            if canAddEdge(firstEndpointNode, secondEndpointNode):
                validEdges.append([firstEndpointNode, secondEndpointNode])
    return validEdges
Of course, you would need to recalculate the valid edges every time after adding a new edge.
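For reference, one common way to write the intersection check uses orientation tests. The sketch below is one possible version, not the linked answer's code; `segments_intersect` and `blocks_new_edge` are hypothetical names, and points are assumed to be `(x, y)` tuples. The second function adds an assumption specific to this use case: two edges that only meet at a shared room are not treated as crossing.

```python
def orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 for a left turn,
    -1 for a right turn, 0 if the three points are collinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


def on_segment(a, b, c):
    """True if c, already known to be collinear with a-b, lies on a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0]) and
            min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, p3, p4):
    """True if segment p1-p2 and segment p3-p4 share at least one point."""
    o1, o2 = orientation(p1, p2, p3), orientation(p1, p2, p4)
    o3, o4 = orientation(p3, p4, p1), orientation(p3, p4, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and on_segment(p1, p2, p3)) or
            (o2 == 0 and on_segment(p1, p2, p4)) or
            (o3 == 0 and on_segment(p3, p4, p1)) or
            (o4 == 0 and on_segment(p3, p4, p2)))


def blocks_new_edge(new_a, new_b, old_a, old_b):
    """True if the existing edge old_a-old_b rules out adding new_a-new_b.
    Assumption: edges meeting only at a shared endpoint are allowed."""
    shared = {new_a, new_b} & {old_a, old_b}
    if len(shared) == 2:
        return True  # the same edge already exists
    if len(shared) == 1:
        s = next(iter(shared))
        a = new_b if new_a == s else new_a
        b = old_b if old_a == s else old_a
        if orientation(s, a, b) != 0:
            return False
        # Collinear: the edges overlap only if they leave s in the same direction.
        return (a[0] - s[0]) * (b[0] - s[0]) + (a[1] - s[1]) * (b[1] - s[1]) > 0
    return segments_intersect(new_a, new_b, old_a, old_b)
```

Without the shared-endpoint rule, every candidate edge would be rejected by the MST edges already attached to its own two rooms.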

Algorithm placing nodes in 2D - Diagram Creation

Is there an unlicensed algorithm for placing nodes/vertices in a compact, clear way, with nodes being close to each other without overlapping and having short edges and those with many links not being all in the center etc, i.e. all that matters in a good diagram?
In other words, how do I isomorph a graph in the most clearly arranged position?
Oh, and the nodes are rectangles (as I said, it's for a diagram) and can differ in size depending on their content.

Find nearest edge in graph

I want to find the nearest edge in a graph. Consider the following example:
Figure 1: yellow: vertices, black: edges, blue: query-point
General Information:
The graph contains about 10 million vertices and about 15 million edges. Every vertex has coordinates. Edges are defined by the two adjacent vertices.
Simplest solution:
I could simply calculate the distance from the query-point to every other edge in the graph, but that would be horribly slow.
Idea and difficulties:
My idea was to use some spatial index to accelerate the query. I already implemented a kd-tree to find the nearest vertex. But as Figure 1 shows the edges incident to the nearest vertex are not necessarily the nearest to the query-point. I would get the edge 3-4 instead of the nearer edge 7-8.
Question:
Is there an algorithm to find the nearest edge in a graph?
A very simple solution (but maybe not the one with lowest complexity) would be to use a quad tree for all your edges based on their bounding box. Then you simply extract the set of edges closest to your query point and iterate over them to find the closest edge.
The extracted set of edges returned by the quad tree should be many factors smaller than your original 15 million edges and therefore a lot less expensive to iterate through.
A quad tree is a simpler data structure than the R-tree. It is fairly common and should be readily available in many environments. For example, in Java the JTS Topology Suite has a structure QuadTree that can easily be wrapped to perform this task.
There are spatial query structures which are appropriate for other types of data than points. The most general is the "R-tree" structure (and its many, many variants), which will allow you to store the bounding rectangles of your line segments. You can then search outward from your query points, examining the segments in the bounding rectangles and stopping when the nearest remaining rectangle is further than the closest line encountered so far. This could have poor performance when there are many long line segments overlapping, but for a PSLG such as you seem to have here, that shouldn't happen.
Another option is to use the segments to define a BSP tree, and scan outwards from your point to find all the "visible" lines. This in turn will be problematic if your point can see many edges.
Without proof:
You start with a constrained Delaunay Triangulation, that is, a triangulation that takes the existing edges into account. E.g. CGAL or Triangle can do this. For each query point you determine which triangle it belongs to. Then you only have to check the edges touching a corner of that triangle.
I think this should work in most cases, but there are certainly corner cases where it fails, e.g. when there are many vertices without any edge at all, so at least you have to remove those empty vertices.
You can compute the Voronoi diagram and run a query on each Voronoi cell. You can subdivide the Voronoi diagram to get a better result. You can combine a metric and the Voronoi diagram: http://www.cc.gatech.edu/~phlosoft/voronoi/
You could insert extra vertices in long edges to get some approximation based on closest vertices.
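A small sketch of that last idea, with made-up names (`densify`, `build_samples`) and an assumed spacing parameter `max_len`: each edge is replaced by sample points that remember which edge they came from, and those samples can then go into the existing kd-tree.

```python
import math


def densify(a, b, max_len):
    """Return points along segment a-b, endpoints included, with
    consecutive points at most max_len apart."""
    length = math.hypot(b[0] - a[0], b[1] - a[1])
    pieces = max(1, math.ceil(length / max_len))
    return [(a[0] + (b[0] - a[0]) * i / pieces,
             a[1] + (b[1] - a[1]) * i / pieces) for i in range(pieces + 1)]


def build_samples(edges, max_len):
    """Return (point, edge_index) pairs ready to be loaded into a point index."""
    samples = []
    for edge_index, (a, b) in enumerate(edges):
        samples.extend((p, edge_index) for p in densify(a, b, max_len))
    return samples
```

Every point of an edge is within `max_len / 2` of one of its samples, so the edge found through the nearest sample is at most `max_len / 2` farther away than the true nearest edge. The price is memory: smaller spacing means many more points in the index.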

Nearest neighbour search in a constantly changing set of line segments

I have a set of line segments. I want to perform the following operations on them:
Insert a new line segment.
Find all line segments within radius R of a given point.
Find all points within radius R1 of a given point.
Given a line segment, find if it intersects any of the existing line segments. The exact intersection point is not necessary (though that probably doesn't simplify anything).
The problem with algorithms like kd/bd trees or BSP trees is that they assume a static set of points, and in my case the points will constantly get augmented with new points, necessitating rebuilding the tree.
What data-structure/algorithms will be most suited for this situation ?
Edit: Accepted the answer that is the simplest and solves my problem. Thanks everyone!
Maintaining a dynamic tree might not be as bad as you think.
As you insert new points/lines etc into the collection it's clear that you'd need to refine the current tree, but I can't see why you'd have to re-build the whole tree from scratch every time a new entity was added, as you're suggesting.
With a dynamic tree approach you'd have a valid tree at all times, so you can then just use the fast range searches supported by your tree type to find the geometric entities you've mentioned.
For your particular problem:
You could set up a dynamic geometric tree, where the leaf elements
maintain a list of geometric entities (points and lines) associated
with that leaf.
When a line is inserted into the collection it should be pushed onto
the lists of all leaf elements that it intersects with. You can do
this efficiently by traversing the subset of the tree from the root
that intersects with the line.
To find all points/lines within a specified radial halo you first need
to find all leaves in this region. Again, this can be done by traversing
the subset of the tree from the root that is enclosed by, or that
intersects with the halo. Since there may be some overlap, you need to
check that the entities associated with this set of leaf elements
actually lie within the halo.
Once you've inserted a line into a set of leaf elements, you can find
whether it intersects with another line by scanning all of the lines
associated with the subset of leaf boxes you've just found. You can then
do line-line intersection tests on this subset.
A potential dynamic tree refinement algorithm, based on an upper limit to the number of entities associated with each leaf in the tree, might work along these lines:
function insert(x, y)
    find the tree leaf element enclosing the new entity at (x,y) based on whatever
    fast search algorithm your tree supports
    if (number of entities per leaf > max allowable) do
        refine current leaf element (would typically either be a bisection
        or quadrisection based on a 2D tree type)
        push all entities from the old leaf element list, plus the new entity,
        onto the new child element lists, based on the enclosing child element
    else
        push new entity onto list for leaf element
    endif
This type of refinement strategy only makes local changes to the tree and is thus generally pretty fast in practice. If you're also deleting entities from the collection you can also support dynamic aggregation by imposing a minimum number of entities per leaf, and collapsing leaf elements to their parents when necessary.
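To make the approach concrete, here is a rough Python sketch of such a tree for line segments. It is an illustration under stated assumptions, not the code I used: `SegmentQuadtree`, `MAX_PER_LEAF` and `MAX_DEPTH` are made-up names, segments are pairs of `(x, y)` tuples, and a segment is assigned to every leaf its bounding box overlaps, which is a conservative stand-in for an exact segment/box intersection test.

```python
import math

MAX_PER_LEAF = 4   # assumed refinement threshold
MAX_DEPTH = 8      # assumed cap so long segments cannot force endless splits


def _overlaps(a, b):
    """True if boxes a and b, given as (x0, y0, x1, y1), overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _seg_box(seg):
    (ax, ay), (bx, by) = seg
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


def _point_seg_dist(p, seg):
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))


class SegmentQuadtree:
    def __init__(self, box, depth=0):
        self.box, self.depth = box, depth
        self.segments = []     # only leaves hold segments
        self.children = None

    def insert(self, seg):
        if not _overlaps(self.box, _seg_box(seg)):
            return
        if self.children is not None:
            for child in self.children:
                child.insert(seg)
            return
        self.segments.append(seg)
        if len(self.segments) > MAX_PER_LEAF and self.depth < MAX_DEPTH:
            self._refine()

    def _refine(self):
        """Quadrisect this leaf and push its segments down to the children."""
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [SegmentQuadtree(b, self.depth + 1) for b in
                         ((x0, y0, mx, my), (mx, y0, x1, my),
                          (x0, my, mx, y1), (mx, my, x1, y1))]
        old, self.segments = self.segments, []
        for seg in old:
            for child in self.children:
                child.insert(seg)

    def within(self, p, r):
        """All segments within distance r of point p."""
        halo = (p[0] - r, p[1] - r, p[0] + r, p[1] + r)
        found = set()
        self._collect(halo, found)
        # Leaves only give candidates; confirm each with an exact distance test.
        return {s for s in found if _point_seg_dist(p, s) <= r}

    def _collect(self, halo, found):
        if not _overlaps(self.box, halo):
            return
        if self.children is None:
            found.update(self.segments)
        else:
            for child in self.children:
                child._collect(halo, found)
```

Only the leaf that overflows is refined on insertion, so the rest of the tree is left untouched. The same candidate set can feed line-line intersection tests for a newly inserted segment.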
I've used this type of approach a number of times with quadtrees/octrees, and I can't at this stage see why a similar approach wouldn't work with kd-trees etc.
Hope this helps.
One possibility is dividing your space into a grid of boxes - perhaps 10 in the y-axis and 10 in the x-axis for a total of 100.
Store these boxes in an array, so it's very easy/fast to determine neighboring boxes. Each box will hold a list (vector) of the line segments that live in that box.
When you calculate line segments within R of one segment, you can check only the relevant neighboring boxes.
Of course, you can create multiple maps of differing granularities, maybe 100 by 100 smaller boxes. Simply consider space vs time and maintenance trade-offs in your design.
Updating segment positions is cheap: just integer-divide by the box sizes in the x and y directions. For example, if the box size is 20 in both directions and your new coordinate is (145, 30), then 145/20 == 7 and 30/20 == 1, so it goes into box(7,1), for a 0-based system.
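A minimal sketch of this grid in Python, under a few assumptions: `SegmentGrid` is a made-up name, segments are pairs of `(x, y)` tuples, a dictionary keyed by `(column, row)` stands in for the array, and a segment is filed under every box its bounding box touches.

```python
from collections import defaultdict


class SegmentGrid:
    def __init__(self, box_size):
        self.box_size = box_size
        self.boxes = defaultdict(list)   # (column, row) -> segments in that box

    def box_of(self, x, y):
        """Box index of a coordinate, by integer division."""
        return (int(x // self.box_size), int(y // self.box_size))

    def insert(self, seg):
        (ax, ay), (bx, by) = seg
        c0, r0 = self.box_of(min(ax, bx), min(ay, by))
        c1, r1 = self.box_of(max(ax, bx), max(ay, by))
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                self.boxes[(c, r)].append(seg)

    def candidates(self, x, y, radius):
        """Segments filed in any box within radius of (x, y). These still
        need an exact distance check afterwards."""
        c0, r0 = self.box_of(x - radius, y - radius)
        c1, r1 = self.box_of(x + radius, y + radius)
        found = set()
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                found.update(self.boxes.get((c, r), ()))
        return found
```

With `SegmentGrid(20)`, `box_of(145, 30)` gives `(7, 1)`, matching the example above. A dictionary only stores boxes that contain something; a fixed array works the same way if the extents are known up front.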
While items 2 & 3 are relatively easy, using a simple linear search with distance checks as each line is inserted, item 4 is a bit more involved.
I'd tend to use a constrained triangulation to solve this, where all the input lines are treated as constraints, and the triangulation is balanced using a nearest neighbour rather than Delaunay criterion. This is covered pretty well in Triangulations and Applications by Øyvind Hjelle and Morten Dæhlen, and in Joseph O'Rourke's Computational Geometry in C. Both have source available, including getting sets of all intersections.
The approach I've taken to do this dynamically in the past is as follows:
Create an arbitrary triangulation (TIN) comprising two triangles
surrounding the extents (current + future) of your data.
For each new line
Insert both points into the TIN. This can be done very quickly by
traversing triangles to find the insertion point, and replacing the
triangle found with three new triangles based on the new point and
old triangle.
Cut a section through the TIN based on the two end points, keeping a
list of points where the section cuts any previously inserted lines.
Add the intersection point details to a list stored against both
lines, and insert them into the TIN.
Force the inserted line as a constraint
Balance all pairs of adjacent triangles modified in the above process
using a nearest neighbour criterion, and repeat until all triangles
have been balanced.
This works better than a grid based method for poorly distributed data, but is more difficult to implement. Grouping end-point and lines into overlapping grids will probably be a good optimization for 2 & 3.
Note that I think using the term 'nearest neighbour' in your question is misleading, as this is not the same as 'all points within a given distance of a line', or 'all points within a given radius of another point'. Nearest neighbour typically implies a single result, and does not equate to 'within a given point to point or point to line distance'.
Instead of inserting into and deleting from a tree, you can calculate a curve that completely fills the plane. Such a curve reduces the 2D complexity to a 1D complexity, and you would be able to find the nearest neighbor. You can find some examples, like the Z curve and the Hilbert curve. Here is a better description of the problem: http://en.wikipedia.org/wiki/Closest_pair_of_points_problem.
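As a small illustration of the Z curve, the key of a point can be built by interleaving the bits of its coordinates. This sketch assumes non-negative integer coordinates that fit in `bits` bits; `interleave_bits` is a made-up name.

```python
def interleave_bits(x, y, bits=16):
    """Z-order (Morton) key: bit i of x goes to bit 2i of the key,
    bit i of y goes to bit 2i + 1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key
```

Keeping the points in a list sorted by this key allows cheap inserts and range scans. Points that are close in the plane can still land far apart along the curve, so neighbours in key order are only candidates and need a real distance check.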

Resources