Nearest neighbor zones visualized - algorithm

I'm writing an app that looks up points in two-dimensional space using a k-d tree. It would be nice, during development, to be able to "see" the nearest-neighbor zones surrounding each point.
In the attached image, the red points are points in the k-d tree, and the blue lines surrounding each point bound the zone where a nearest neighbor search will return the contained point.
The image was created thusly:
    for each point in the space:
        da = distance to nearest neighbor
        db = distance to second-nearest neighbor
        if absolute_value(da - db) < 4:
            draw blue pixel
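For reference, here is a direct translation of that loop into Python (a minimal sketch; the 512x512 grid, the random seed points, and the use of scipy.spatial.cKDTree for the two-nearest-neighbor query are my assumptions, not part of the original app):

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(42)
    points = rng.uniform(0, 512, size=(30, 2))     # hypothetical "red" points
    tree = cKDTree(points)

    image = np.zeros((512, 512), dtype=bool)
    for y in range(512):
        for x in range(512):
            # distances to the nearest (da) and second-nearest (db) point
            (da, db), _ = tree.query((x, y), k=2)
            if abs(da - db) < 4:
                image[y, x] = True                 # "draw blue pixel"

The per-pixel Python loop is a large part of the slowness; passing all pixel coordinates to tree.query in one call already helps, but the answers below avoid the per-pixel work entirely.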
This algorithm has two problems:
(more important) It's slow on my (reasonably fast Core i7) computer.
(less important) It's sloppy, as you can see by the varying widths of the blue lines.
What is this "visualization" of a set of points called?
What are some good algorithms to create such a visualization?

This is called a Voronoi Diagram and there are many excellent algorithms for generating them efficiently. The one I've heard about most is Fortune's algorithm, which runs in time O(n log n), though other algorithms exist for this problem as well.
Hope this helps!

Hey Jacob, you found an interesting way of generating this Voronoi diagram, even if it is not so efficient.
The less important issue first: the varying-thickness boundaries you get, those butterfly shapes, are in fact the area between the two branches of a hyperbola, precisely the hyperbola given by the equation |da - db| = 4. To get a line of constant thickness instead, you have to replace this criterion by the distance from the pixel P to the perpendicular bisector of its two nearest neighbors, call them A and B; using vector calculus, | AP.AB/||AB|| - ||AB||/2 | < 4, where AP and AB are the vectors from A to P and from A to B.
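A vectorized sketch of this bisector-band criterion (assuming numpy and scipy.spatial.cKDTree are available; the 512x512 grid and the half-width of 2 pixels are arbitrary choices of mine):

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    seeds = rng.uniform(0, 512, size=(20, 2))      # hypothetical seed points
    tree = cKDTree(seeds)

    # Query every pixel for its two nearest seeds in one call.
    ys, xs = np.mgrid[0:512, 0:512]
    pixels = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
    _, idx = tree.query(pixels, k=2)               # nearest (A) and second-nearest (B)
    A, B = seeds[idx[:, 0]], seeds[idx[:, 1]]

    # Distance from each pixel P to the perpendicular bisector of A and B:
    # | AP.AB / ||AB|| - ||AB|| / 2 |
    AB = B - A
    AP = pixels - A
    norm_AB = np.linalg.norm(AB, axis=1)
    dist_to_bisector = np.abs((AP * AB).sum(axis=1) / norm_AB - norm_AB / 2)

    # Pixels within 2 units of the bisector form a boundary of constant thickness.
    boundary = (dist_to_bisector < 2).reshape(512, 512)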
The more important issue: there are two well-known efficient solutions for constructing the Voronoi diagram of a set of points: Fortune's sweep-line algorithm (as mentioned by templatetypedef) and Preparata & Shamos' divide-and-conquer solution. Both run in optimal O(N log N) time for N points, but neither is easy to implement.
These algorithms construct the Voronoi diagram as a set of line segments and half-lines. Check http://en.wikipedia.org/wiki/Voronoi_diagram.
The paper "Primitives for the Manipulation of General Subdivisions and the Computation of Voronoi Diagrams" describes both algorithms in a somewhat high-level framework while taking care of all the implementation details; the article is difficult, but the algorithms are implementable.
You may also have a look at "A straightforward iterative algorithm for the planar Voronoi diagram", which I never tried.
A totally different approach is to build the distance map from the given points directly, for example by means of Dijkstra's algorithm: starting from the given points, you grow the boundary of the area within a given distance of every point and stop growing when two boundaries meet. [More explanation required.] See http://1.bp.blogspot.com/-O6rXggLa9fE/TnAwz4f9hXI/AAAAAAAAAPk/0vrqEKRPVIw/s1600/distmap-20-seed4-fin.jpg
Another good starting point (for efficiently computing the distance map) is the paper "A general algorithm for computing distance transforms in linear time".
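A rough grid-based illustration of the distance-map idea (a multi-source breadth-first flood rather than a true Dijkstra, so it grows Manhattan-distance regions as an approximation; the grid size and seed pixels are made up):

    from collections import deque
    import numpy as np

    W = H = 256
    seeds = [(40, 60), (200, 100), (120, 220)]      # hypothetical seed pixels

    label = -np.ones((H, W), dtype=int)             # which seed owns each pixel
    queue = deque()
    for i, (x, y) in enumerate(seeds):
        label[y, x] = i
        queue.append((x, y))

    # Multi-source BFS: each region grows one pixel per step until regions collide.
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < W and 0 <= ny < H and label[ny, nx] == -1:
                label[ny, nx] = label[y, x]
                queue.append((nx, ny))

    # Boundary pixels: any pixel whose right or lower neighbour has a different label.
    boundary = (label[:, :-1] != label[:, 1:])[:-1, :] | (label[:-1, :] != label[1:, :])[:, :-1]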

From personal experience: Fortune's algorithm is a pain to implement. The divide and conquer algorithm presented by Guibas and Stolfi isn't too bad; they give detailed pseudocode that's easy to transcribe into a procedural programming language. Both will blow up if you have nearly degenerate inputs and use floating point, but since the primitives are quadratic, if you can represent coordinates as 32-bit integers, then you can use 64 bits to carry out the determinant computations.
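As an example of such a primitive, here is a 2D orientation test written as a sketch in Python (the function name is mine; Python integers are arbitrary precision, so the determinant is exact for any integer input, while in a fixed-width language you would bound the coordinates so the products below cannot overflow):

    def orientation(ax, ay, bx, by, cx, cy):
        """Sign of the cross product (B - A) x (C - A).

        Returns +1 if C lies to the left of the directed line A->B,
        -1 if it lies to the right, and 0 if the three points are collinear.
        With integer coordinates the determinant is computed exactly.
        """
        det = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
        return (det > 0) - (det < 0)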
Once you get it working, you might consider replacing your kd-tree algorithms, which have a Theta(√n) worst case, with algorithms that work on planar subdivisions.

You can find a great implementation of it in the D3.js library: http://mbostock.github.com/d3/ex/voronoi.html

Related

reference algorithm for weighted voronoi diagrams?

Can someone point me to a reference implementation showing how to construct a (multiplicatively and/or additively) weighted Voronoi diagram, preferably based on Fortune's Voronoi algorithm?
My goal:
Given a set of points (each point has a weight) and a set of boundary edges (usually a rectangle), I want to construct a weighted Voronoi diagram using either Python or the processing.org framework. Here is an example.
What I have worked on so far:
So far I have implemented Fortune's algorithm as well as the "centroidal Voronoi tessellation" presented in Michael Balzer's paper. Algorithm 3 states how the weights need to be adjusted; however, when I implement this, my geometry no longer works. To fix this, the sweep-line algorithm has to be updated to take the weights into account, but I have been unable to do this so far.
Hence I would like to see how other people solved this problem.
For the additively weighted Voronoi diagram: remember that a power diagram in dimension n is just an (unweighted) Voronoi diagram in dimension n+1.
For that, just recall that the power diagram is unchanged if you add the same constant to every weight, so the weighted Voronoi diagram can be written as an unweighted Voronoi diagram using lifted coordinates, for example in 2D lifted to 3D:
(x_i, y_i, sqrt(C - w_i))
where w_i is the weight of the seed, and C is any constant large enough that C - w_i is positive for every seed (in practice, just barely large enough, to keep the lifted coordinates small).
Once your diagram is computed, just discard the last component.
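As a quick numerical sanity check of this lifting (a sketch with hypothetical seeds and weights; it verifies, per query point, that the nearest lifted 3D seed is the seed with the smallest power distance, i.e. squared distance minus weight):

    import numpy as np

    rng = np.random.default_rng(1)
    seeds = rng.uniform(0, 100, size=(10, 2))        # hypothetical 2D seeds
    weights = rng.uniform(0, 50, size=10)            # weights applied to squared distance

    C = weights.max() + 1.0                          # large enough that C - w_i > 0
    lifted = np.column_stack([seeds, np.sqrt(C - weights)])   # (x_i, y_i, sqrt(C - w_i))

    queries = rng.uniform(0, 100, size=(1000, 2))
    q3 = np.column_stack([queries, np.zeros(len(queries))])   # queries live in the plane z = 0

    # Owner according to the power diagram: argmin of ||q - s_i||^2 - w_i
    power = ((queries[:, None, :] - seeds[None, :, :]) ** 2).sum(-1) - weights
    owner_power = power.argmin(axis=1)

    # Owner according to the plain 3D Voronoi diagram of the lifted seeds
    d3 = ((q3[:, None, :] - lifted[None, :, :]) ** 2).sum(-1)
    owner_lifted = d3.argmin(axis=1)

    assert np.array_equal(owner_power, owner_lifted)  # the two assignments agree

This works because the squared 3D distance equals the power distance plus the constant C, which does not change the argmin.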
So, basically, you only need to find a library that can handle Voronoi diagrams in one dimension higher than your problem. CGAL can do that. This also makes the implementation extremely easy.
This computation is not easy, but it is available in CGAL. See the manual pages here.
See also the Effective Computational Geometry project, which employs and supports CGAL.
There is little "off-the-shelf" open-source code out there for the case where distances to the centers are weighted with a multiplicative factor.
To my knowledge, none of the current CGAL packages covers this case.
Takashi Ohyama's beautifully colorful website (http://www.nirarebakun.com/voro/emwvoro.html) provides Java implementations for up to 100 points with a simple algorithm (Euclidean and Manhattan distances).
There is also a paper describing this simple intersection algorithm, with an approximate O(n^3)-time implementation as a plugin to TerraView: http://www.geoinfo.info/geoinfo2011/papers/mauricio1.pdf
However, I cannot find the source of this plugin in the TerraView / TerraLib repository.
Aurenhammer and Edelsbrunner describe an optimal O(n^2)-time algorithm, but I'm not aware of any available code for it.
If you are comfortable digging into Octave, you could reference the code provided in their library.

efficient algorithm to find nearest point in a graph that does not have a known equation

I'm asking this question out of curiosity, since my quick and dirty implementation seems to be good enough. However, I'm curious what a better implementation would be.
I have a graph of real-world data. There are no duplicate X values, and the X value increments at a consistent rate across the graph, but the Y data is based on real-world output. I want to find the nearest point on the graph to an arbitrary given point P programmatically. I'm trying to find an efficient (i.e. fast) algorithm for doing this. I don't need the exact closest point; I can settle for a point that is 'nearly' the closest.
The obvious lazy solution is to increment through every single point in the graph, calculate the distance, and then find the minimum of the distance. This however could theoretically be slow for large graphs; too slow for what I want.
Since I only need an approximate closest point, I imagine the ideal, fastest approach would involve generating a best-fit line and using that line to calculate where the closest point should be in real time; but that sounds like a potential mathematical headache I'm not about to take on.
My solution is a hack which works only because I assume my point P isn't arbitrary; namely, I assume that P will usually be close to my graph line, and when that happens I can rule out the distant X values from consideration. I calculate how close the point on the line that shares P's X coordinate is, and use the distance between that point and P to calculate the largest/smallest X values that could possibly be closer points.
I can't help but feel there should be a faster algorithm than my solution (which is only useful because I assume 99% of the time my point P will be close to the line already). I tried googling for better algorithms but found so many that didn't quite fit that it was hard to find what I was looking for amongst all the clutter of inappropriate algorithms. So, does anyone here have a suggested algorithm that would be more efficient? Keep in mind I don't need a full algorithm, since what I have works for my needs; I'm just curious what the proper solution would have been.
If you store the [x,y] points in a quadtree you'll be able to find the closest one quickly (something like O(log n)). I think that's the best you can do without making assumptions about where the point is going to be. Rather than repeat the algorithm here have a look at this link.
Your solution is pretty good. By examining how the points vary in y, couldn't you calculate a bound on the number of points along the x axis you need to examine, instead of using an arbitrary one?
Let's say your point P=(x,y) and your real-world data is a function y=f(x)
Step 1: Calculate r=|f(x)-y|.
Step 2: Find points in the interval I=(x-r,x+r)
Step 3: Find the closest point in I to P.
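A minimal sketch of these three steps (the function and array names are mine; it assumes the data is stored as parallel sequences xs, ys sorted by x, and it uses the full Euclidean distance to the sample nearest in x as the radius, which is still a valid upper bound for the pruning in Step 2):

    import bisect
    import math

    def nearest_point(xs, ys, px, py):
        """Nearest data point to P = (px, py), pruning by x as described above."""
        n = len(xs)

        # Step 1: distance r from P to the sample whose x is closest to P's x.
        i = bisect.bisect_left(xs, px)
        i = min(range(max(i - 1, 0), min(i + 1, n - 1) + 1),
                key=lambda j: abs(xs[j] - px))
        r = math.hypot(xs[i] - px, ys[i] - py)

        # Step 2: only samples with x in [px - r, px + r] can possibly be closer.
        lo = bisect.bisect_left(xs, px - r)
        hi = bisect.bisect_right(xs, px + r)

        # Step 3: brute-force the (hopefully small) candidate window.
        best = min(range(lo, hi),
                   key=lambda j: (xs[j] - px) ** 2 + (ys[j] - py) ** 2)
        return xs[best], ys[best]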
If you can use a data structure, some common data structures for spatial searching (including nearest neighbour) are...
quad-tree (and octree etc).
kd-tree
bsp tree (only practical for a static set of points).
r-tree
The r-tree comes in a number of variants. It's very closely related to the B+ tree, but with (depending on the variant) different orderings on the items (points) in the leaf nodes.
The Hilbert R tree uses a strict ordering of points based on the Hilbert curve. The Hilbert curve (or rather a generalization of it) is very good at ordering multi-dimensional data so that nearby points in space are usually nearby in the linear ordering.
In principle, the Hilbert ordering could be applied by sorting a simple array of points. The natural clustering in this would mean that a search would usually only need to search a few fairly-short spans in the array - with the complication being that you need to work out which spans they are.
I used to have a link for a good paper on doing the Hilbert curve ordering calculations, but I've lost it. An ordering based on Gray codes would be simpler, but not quite as efficient at clustering. In fact, there's a deep connection between Gray codes and Hilbert curves - that paper I've lost uses Gray code related functions quite a bit.
EDIT - I found that link - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.133.7490
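For a flavour of the idea, here is a sketch using the simpler Morton (Z-order) code in place of the Hilbert ordering discussed above; it clusters somewhat less well than the Hilbert curve, but sorting points by this key still tends to put nearby points next to each other in the array (bit width and names are my choices):

    def morton_key(x, y, bits=16):
        """Interleave the bits of two non-negative integer coordinates."""
        key = 0
        for i in range(bits):
            key |= ((x >> i) & 1) << (2 * i)
            key |= ((y >> i) & 1) << (2 * i + 1)
        return key

    # Hypothetical integer points, sorted by their Morton key.
    points = [(5, 9), (6, 8), (200, 300), (7, 9), (201, 299)]
    points.sort(key=lambda p: morton_key(*p))
    # Nearby points such as (5, 9), (6, 8), (7, 9) end up adjacent in the array.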

Given a set of polygons and a series of points, find which polygons the points are located in

This is a question similar to the one here, but I figure it would be helpful if I could recast it in more general terms.
I have a set of polygons; these polygons can touch one another, overlap, and take on any shape. My question is: given a list of points, how do I devise an efficient algorithm that finds which polygons each point is located in?
One interesting restriction on the location of the points, if it helps: all the points are located on the edges of the polygons.
I understand that r-trees can help, but given that I am querying a whole series of points, is there a more efficient approach than computing the answer for each point one by one?
The key search term here is point location. Under that name, there are many algorithms in the computational geometry literature for various cases, from special to general. For example, this link lists various software packages, including my own. (A somewhat out-of-date list now.)
There is a significant tradeoff between speed and program complexity (and therefore implementation effort). The easiest-to-program method is to check each point against each polygon, using standard point-in-polygon code. But this could be slow depending on how many polygons you have.
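For reference, a minimal ray-casting point-in-polygon test of the kind referred to above (a standard textbook routine, not taken from any particular package; points exactly on an edge may be classified either way by this simple version):

    def point_in_polygon(px, py, polygon):
        """Count crossings of a horizontal ray from (px, py) towards +infinity.

        polygon is a list of (x, y) vertices in order; an odd number of
        edge crossings means the point is inside.
        """
        inside = False
        n = len(polygon)
        for i in range(n):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % n]
            # Does this edge straddle the ray's y coordinate?
            if (y1 > py) != (y2 > py):
                # x coordinate where the edge crosses the horizontal line y = py
                x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                if x_cross > px:
                    inside = not inside
        return inside

    # Example: unit square
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    assert point_in_polygon(0.5, 0.5, square) is True
    assert point_in_polygon(1.5, 0.5, square) is False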
More difficult is to build a point-location data structure by sweeping the plane and finding all the edge-edge intersection points. See this Wikipedia article for some of your options.
I think you are bumping up against intuition about the problem (which is a quasi-analog perception) versus a computational approach which is of necessity O(n).
Given a plane, a degenerate polygon (a line), and an arbitrary set of points on the plane, do the points intersect the line or fall "above" or "below" it? I cannot think of an approach that is smaller than O(n) even for this degenerate case.
Either, each point would have to be checked for its relation to the line, or you'd have to partition the points into some tree-like structure which would require at least O(n) operations but very likely more.
If I were better at computational geometry, I might be able to say with authority that you've just restated Klee's measure problem but as it is I just have to suggest it.
If points can only fall on the edges, then you can find the polygons in O(n) by just examining the edges.
If it were otherwise, you'd have to triangulate the polygons in O(n log n) to test against the triangles in O(n).
You could also divide the space by the line extended from each segment, noting which side is inside/outside of the corresponding polygon. A point is inside a polygon if it falls on an edge or if it's on the inside part of every edge of the polygon (as stated, this test applies to convex polygons). It's O(n) on the number of edges in the worst case, but tends to O(m) on the number of polygons in the average case.
An R-tree would help in both cases, but only if you have several points to test. Otherwise, constructing the R-tree would be more expensive than searching through the list of triangles.

Simplified (or smooth) polygons that contain the original detailed polygon

I have a detailed 2D polygon (representing a geographic area) that is defined by a very large set of vertices. I'm looking for an algorithm that will simplify and smooth the polygon, (reducing the number of vertices) with the constraint that the area of the resulting polygon must contain all the vertices of the detailed polygon.
For context, here's an example of the edge of one complex polygon:
My research:
I found the Ramer–Douglas–Peucker algorithm, which will reduce the number of vertices, but the resulting polygon will not contain all of the original polygon's vertices. See the Ramer-Douglas-Peucker article on Wikipedia.
I considered expanding the polygon (I believe this is also known as outward polygon offsetting). I found these questions: Expanding a polygon (convex only) and Inflating a polygon. But I don't think this will substantially reduce the detail of my polygon.
Thanks for any advice you can give me!
Edit
As of 2013, most links below are not functional anymore. However, I've found the cited paper, algorithm included, still available at this (very slow) server.
Here you can find a project dealing exactly with your issues. Although it works primarily with an area "filled" by points, you can set it to work with a "perimeter" type definition as yours.
It uses a k-nearest neighbors approach for calculating the region.
Samples:
Here you can request a copy of the paper.
Seemingly they planned to offer an online service for requesting calculations, but I didn't test it, and probably it isn't running.
HTH!
I think Visvalingam’s algorithm can be adapted for this purpose - by skipping removal of triangles that would reduce the area.
I had a very similar problem: I needed an inflating simplification of polygons.
I used a simple algorithm that either removes a concave point (this will increase the polygon's size) or removes a convex edge (between 2 convex points) and extends the adjacent edges. Either of these two operations removes one point from the polygon.
I chose to remove the point or the edge that leads to the smallest area variation. You can repeat this process until the simplification is acceptable to you (for example, no more than 200 points).
The two main difficulties were obtaining a fast algorithm (by avoiding computing each vertex/edge removal's variation twice and keeping the candidates sorted) and avoiding the introduction of self-intersections in the process (not very easy to do or to explain, but possible with limited computational complexity).
In fact, after looking more closely, it is a similar idea to Visvalingam's, adapted for edge removal.
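A rough sketch of the concave-point half of this idea (my own simplification: it skips the convex-edge removal, the sorted candidate structure, and the self-intersection checks described above, so it can produce self-intersecting output on tricky inputs; vertices are assumed to be in counter-clockwise order):

    def cross(o, a, b):
        """z-component of (a - o) x (b - o); positive for a left (CCW) turn."""
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def inflate_simplify(poly, target):
        """Repeatedly delete the concave vertex whose removal adds the least area.

        Removing a concave (reflex) vertex can only enlarge a CCW polygon,
        so the result still contains the original.  poly is a list of (x, y).
        """
        poly = list(poly)
        while len(poly) > target:
            best, best_area = None, None
            for i in range(len(poly)):
                prev, cur, nxt = poly[i - 1], poly[i], poly[(i + 1) % len(poly)]
                c = cross(prev, cur, nxt)
                if c < 0:                       # reflex vertex: removing it inflates
                    added = -c / 2.0            # area of the triangle that gets swallowed
                    if best_area is None or added < best_area:
                        best, best_area = i, added
            if best is None:                    # no concave vertices left to remove
                break
            del poly[best]
        return poly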
That's an interesting problem! I never tried anything like this, but here's an idea off the top of my head... apologies if it makes no sense or wouldn't work :)
Calculate a convex hull, that might be way too big / imprecise
Divide the hull into N slices, for example joining each one of the hull's vertices to the center
Calculate the intersection of your object with each slice
Repeat recursively for each intersection (calculating the intersection's hull, etc)
Each level of recursion should give a better approximation... when you reach a satisfying level, merge all the hulls from that level to get the final polygon.
Does that sound like it could do the job?
To some degree I'm not sure what you are trying to do but it seems you have two very good answers. One is Ramer–Douglas–Peucker (DP) and the other is computing the alpha shape (also called a Concave Hull, non-convex hull, etc.). I found a more recent paper describing alpha shapes and linked it below.
I personally think DP with polygon expansion is the way to go. I'm not sure why you think it won't substantially reduce the number of vertices. With DP you supply a factor and you can make it anything you want to the point where you end up with a triangle no matter what your input. Picking this factor can be hard but in your case I think it's the best method. You should be able to determine the factor based on the size of the largest bit of detail you want to go away. You can do this with direct testing or by calculating it from your source data.
http://www.it.uu.se/edu/course/homepage/projektTDB/ht13/project10/Project-10-report.pdf
I've written a simple modification of Douglas-Peucker that might be helpful to anyone having this problem in the future: https://github.com/prakol16/rdp-expansion-only
It's identical to DP except that it pushes a line segment outwards a bit if the points that it would remove are outside the polygon. This guarantees that the resulting simplified polygon contains all the original polygon, but it has almost the same number of line segments as the original DP algorithm and is usually reasonably good at approximating the original shape.
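For context, here is a compact sketch of the standard Douglas-Peucker routine that the linked modification builds on (plain DP only; the expansion-only behaviour of the repository above is not reproduced here):

    import math

    def rdp(points, epsilon):
        """Plain Ramer-Douglas-Peucker simplification of an open polyline.

        Keeps both endpoints and recursively keeps any intermediate point
        lying farther than epsilon from the chord joining the endpoints.
        """
        if len(points) < 3:
            return list(points)
        (x1, y1), (x2, y2) = points[0], points[-1]
        chord = math.hypot(x2 - x1, y2 - y1)

        def dist(p):
            # Perpendicular distance from p to the line through the endpoints
            # (distance to the first endpoint if the chord is degenerate).
            if chord == 0:
                return math.hypot(p[0] - x1, p[1] - y1)
            return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / chord

        index, far_point = max(enumerate(points[1:-1], start=1), key=lambda t: dist(t[1]))
        if dist(far_point) > epsilon:
            left = rdp(points[:index + 1], epsilon)
            right = rdp(points[index:], epsilon)
            return left[:-1] + right              # drop the duplicated pivot point
        return [points[0], points[-1]]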

Smallest circle which covers given points on 2D plane

Problem: What is the smallest possible diameter of a circle which covers given N points on a 2D plane?
What is the most efficient algorithm to solve this problem and how does it work?
This is the smallest circle problem. See the references for the links to the suggested algorithms.
E. Welzl, "Smallest Enclosing Disks (Balls and Ellipsoids)", in H. Maurer (Ed.), New Results and New Trends in Computer Science, Lecture Notes in Computer Science, Vol. 555, Springer-Verlag, 359-370 (1991)
is the reference to the "fastest" algorithm.
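A compact, didactic sketch of Welzl's randomized recursion (suitable for small point sets; the recursion depth is linear in the number of points, and the three-point case assumes non-collinear boundary points, so a production version would use the iterative move-to-front variant and exact or guarded arithmetic):

    import random

    def circle_two(a, b):
        """Smallest circle with a and b as a diameter."""
        cx, cy = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
        r = math := ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 / 2
        return (cx, cy, r)

    def circle_three(a, b, c):
        """Circumscribed circle of three non-collinear points."""
        ax, ay = a; bx, by = b; cx, cy = c
        d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
        ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay) + (cx**2 + cy**2) * (ay - by)) / d
        uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx) + (cx**2 + cy**2) * (bx - ax)) / d
        r = ((ux - ax) ** 2 + (uy - ay) ** 2) ** 0.5
        return (ux, uy, r)

    def in_circle(p, circ, eps=1e-9):
        cx, cy, r = circ
        return (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= (r + eps) ** 2

    def trivial(boundary):
        if not boundary:
            return (0.0, 0.0, 0.0)
        if len(boundary) == 1:
            return (boundary[0][0], boundary[0][1], 0.0)
        if len(boundary) == 2:
            return circle_two(boundary[0], boundary[1])
        return circle_three(*boundary)

    def welzl(points, boundary=()):
        """Welzl's recursion: boundary holds points known to lie on the circle."""
        if not points or len(boundary) == 3:
            return trivial(list(boundary))
        p = points[0]
        circ = welzl(points[1:], boundary)
        if in_circle(p, circ):
            return circ
        return welzl(points[1:], boundary + (p,))

    def smallest_enclosing_circle(points):
        pts = list(points)
        random.shuffle(pts)          # randomization gives the expected linear-time bound
        return welzl(pts)

    print(smallest_enclosing_circle([(0, 0), (4, 0), (2, 3), (1, 1)]))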
There are several algorithms and implementations out there for the Smallest enclosing balls problem.
For 2D and 3D, Gärtner's implementation is probably the fastest.
For higher dimensions (up to 10,000, say), take a look at https://github.com/hbf/miniball, which is the implementation of an algorithm by Gärtner, Kutz, and Fischer (note: I am one of the co-authors).
For very, very high dimensions, core-set (approximation) algorithms will be faster.
Note: If you are looking for an algorithm to compute the smallest enclosing sphere of spheres, you will find a C++ implementation in the Computational Geometry Algorithms Library (CGAL). (You do not need to use all of CGAL; simply extract the required header and source files.)
The furthest-point Voronoi diagram approach
http://www.dma.fi.upm.es/mabellanas/tfcs/fvd/algorithm.html
turns out to work really well for the 2D problem. It is non-iterative and (I'm pretty sure) guaranteed exact. I suspect it doesn't extend so well to higher dimensions, which is why it gets little attention in the literature.
If there is interest, I'll describe it here; the above link is a bit hard to follow, I think.
Edit: another link: http://ojs.statsbiblioteket.dk/index.php/daimipb/article/view/6704
