Simple algorithm to compute vertical map of set of segments? - algorithm

The paper An optimal algorithm for intersecting line segments in the plane by Chazelle and Edelsbrunner defines the "vertical map" of a set of segments as
the planar subdivision obtained by drawing a rectangular frame around the
segments and connecting by a vertical line segment every endpoint to
the edges immediately above and below
(It really is easier to understand looking at the diagram on the top of page 3).
They make the comment
we do not know of any simple algorithm, no matter how slow, for
computing the vertical map of a set of segments
To me this seems confusing, because I immediately thought of a simple algorithm to do so:
Iterate through every endpoint p:
    Iterate through every other segment s:
        t = intersection(s, vertical line through p)
        keep track of the closest t above and below p
    add the closest point above and the closest point below and connect them with a segment
And then we're done? I mean, this would only be O(N^2) running time, not even that bad.
What more could they mean that this algorithm doesn't satisfy? Do they expect to find the intersection points as well? Wouldn't an O(N^2) pairwise intersection check do that? Do they want to enumerate the subdivisions of the plane? Couldn't that just be done very simply using standard algorithms to go from points+edges to faces? e.g. by walking around the border of the face?
Is there something I'm missing?
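For concreteness, here is roughly what I have in mind as a quick Python sketch (the helper names are my own, and it ignores vertical segments and other degenerate cases):

def y_at(seg, x):
    """y-coordinate of segment seg at abscissa x, or None if the vertical
    line through x misses the segment (vertical segments are skipped)."""
    (x1, y1), (x2, y2) = seg
    if x1 == x2 or not (min(x1, x2) <= x <= max(x1, x2)):
        return None
    t = (x - x1) / (x2 - x1)
    return y1 + t * (y2 - y1)

def vertical_extensions(segments, frame_bottom, frame_top):
    """For every endpoint, connect it by a vertical segment to the edges
    immediately above and below (or to the frame if nothing is hit)."""
    result = []
    for i, seg in enumerate(segments):
        for (px, py) in seg:
            above, below = frame_top, frame_bottom
            for j, other in enumerate(segments):
                if j == i:
                    continue
                y = y_at(other, px)
                if y is None:
                    continue
                if y > py:
                    above = min(above, y)   # closest crossing above p
                elif y < py:
                    below = max(below, y)   # closest crossing below p
            result.append(((px, below), (px, above)))
    return result

# Example: two segments inside a frame spanning y in [0, 10].
segs = [((1, 2), (6, 2)), ((2, 5), (7, 6))]
for v in vertical_extensions(segs, 0, 10):
    print(v)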

The paper is from 1992, and, as an example, the author works in C. At that time there were no convenient, optimized collections, sorting routines, or other pleasant library tools like the ones we have today (except perhaps in specific languages, and those were neither fast nor easy to use and code).
So, if you try to do it with only arrays and simple types, you will probably need a lot of arrays, indices, and loops, and the result will not be very easy to understand.
Note that the paper pays a lot of attention to processing time and memory usage.

Related

Use of Hilbert Curve to query a rectangular area and see if it overlaps other rectangles

I am looking for a method that can help me in a project I am working on. The idea is that there are some rectangles in 2D space, and I want to query a rectangular area and see if it overlaps any rectangle in that area. If it does, the query fails. If it doesn't, meaning the space is empty, it succeeds.
I was linked to z-order curves to help turn 2d coordinates into 1d. While I was reading about it, I encountered the Hilbert curve. I read that the Hilbert curve is preferred over a z-order curve because it maintains better proximity of points. I also read that the Hilbert curve is used to make more efficient quadtrees and octrees.
I was reading this article https://sigmodrecord.org/publications/sigmodRecord/0103/3.lawder.pdf for a possible solution but I don't know if this applies to my case.
I also saw this comment https://news.ycombinator.com/item?id=14217760 which mentioned multiple index entries for non point objects.
Is there an elegant method where I can use the Hilbert curve to achieve this? Is it possible with just an array of rectangles?
I am pretty sure it is possible and can be very efficient. But there is a lot of complexity involved.
I have implemented this for a z-order curve and called it PH-Tree; you can find implementations in Java and C++, as well as theoretical background, in the link. The PH-Tree is similar to a z-ordered quadtree but with several modifications that allow taking full advantage of the space filling curve.
There are a few things to unpack here.
Hilbert curves vs z-order curves: Yes, Hilbert curves have slightly better proximity than z-order curves. Better proximity does two things: 1) it reduces the number of elements (nodes, data) to look at and 2) it improves linear access patterns (less hopping around). Both are important if the spatial index is stored on disk and I/O is expensive (especially on old disk drives).
If I remember correctly, Hilbert curves and z-order curves are similar enough that they always access the same amount of data. The only thing that Hilbert curves are better at is linear access.
However, Hilbert curves are much more complicated to calculate; in the tests I made with an in-memory index (not very thorough testing, I admit) I found that z-order curves are considerably more efficient, because the reduced CPU time outweighs the cost of accessing data slightly out of order.
Space filling curves (at least Hilbert and z-curve) allow for some neat bit-level hacks that can speed up index operations. However, even for z-ordering I found that getting these right required a lot of thinking (I wrote a whole paper about it). I believe these operations can be adapted for Hilbert curves, but it may take some time and, as I wrote above, it may not improve performance at all.
Storing rectangles in a spatial curve index:
There are different approaches to encode rectangles in a spatial curve. All approaches that I am aware of encode the rectangle as a multi-dimensional point. I found the following encoding to work best (assuming axis-aligned rectangles). We define the rectangle by the lower-left (minimum) corner and the upper-right (maximum) corner, e.g. min = {min_0, min_1} = {2, 3} and max = {max_0, max_1} = {8, 4}. We transform this (by interleaving the min/max values) into a 4-dimensional point {2, 8, 3, 4} and store this point in the index. Other approaches use a different ordering (2, 3, 8, 4) or, instead of two corners, store the center point and the lengths of the edges.
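As a tiny Python illustration of that encoding (the function name is mine, not part of any PH-Tree API):

def rect_to_point(min_corner, max_corner):
    """Encode an axis-aligned 2D rectangle as a 4D point by interleaving
    the min/max values per dimension: (min_x, max_x, min_y, max_y)."""
    (min_x, min_y), (max_x, max_y) = min_corner, max_corner
    return (min_x, max_x, min_y, max_y)

# Example from the text: min = {2, 3}, max = {8, 4}  ->  (2, 8, 3, 4)
print(rect_to_point((2, 3), (8, 4)))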
Querying:
If you want to find any rectangles that overlap/intersect with a given region (we call that a window query) we need to create a 4-dimensional query box, i.e. an axis aligned box that is defined by a 4D min-point and 4D max-point (copied from here):
min = {−∞, min_0, −∞, min_1} = {−∞, 2, −∞, 3}
max = {max_0, +∞, max_1, +∞} = {8, +∞, 4, +∞}
We can process the dimensions one by one. The first min/max pair is {−∞, 8}, which will match any 2D rectangle whose minimum x-coordinate is 8 or lower. All coordinates:
d=0: min/max pair is {−∞, 8}: matches any 2D min-x <= 8
d=1: min/max pair is {2, +∞}: matches any 2D max-x >= 2
d=2: min/max pair is {−∞, 4}: matches any 2D min-y <= 4
d=3: min/max pair is {3, +∞}: matches any 2D max-y >= 3
If all these conditions hold true, then the stored rectangle overlaps with the query window.
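Spelled out as a plain Python check on the encoded 4D point (illustration only; a real index evaluates these bounds while traversing the tree rather than testing every entry):

import math

def overlaps(stored, query_min, query_max):
    """stored is the 4D encoding (min_x, max_x, min_y, max_y) of a rectangle;
    query_min/query_max define the 2D query window. Returns True if the
    stored rectangle overlaps the window, using the 4D box described above."""
    lo = (-math.inf, query_min[0], -math.inf, query_min[1])
    hi = (query_max[0], math.inf, query_max[1], math.inf)
    return all(l <= s <= h for s, l, h in zip(stored, lo, hi))

# Query window min = (2, 3), max = (8, 4): the rectangle encoded as (2, 8, 3, 4) overlaps it.
print(overlaps((2, 8, 3, 4), (2, 3), (8, 4)))  # True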
Final words:
This sounds complicated but can be implemented very efficiently (it also lends itself to vectorization). I found that it is on par with other indexes (quadtree, R-Star-Tree), see some benchmarks I made here.
Specifically, I found that z-ordered indexes have very good insertion/update/removal times (I don't know whether that matters for you) and are very good for small query result sizes (it sounds like you often expect it to be zero, i.e. no overlapping rectangle found). They generally work well with large datasets (thousands or millions of entries) and datasets that have strong clusters.
For smaller datasets, or if you typically expect to find many results (you can of course abort a query early once you find the first match!), other index types may be better.
On a last note, I found the dataset characteristics to have considerable influence on which index worked best. Also, implementation appears to be at least as important as the underlying algorithms. Especially for quadtrees I found huge variations in performance from different implementations.

Algorithm for multiple polyline and polygon decimation

We have some polylines (list of points, has start and end point, not cyclic) and polygons (list of points, cyclic, no such thing as endpoints).
We want to map each polyline to a new polyline and each polygon to a new polygon so the total number of edges is small enough.
Let's say the number of edges originally is N, and we want our result to have M edges. N is much larger than M.
Polylines need to keep their start and end points, so they contribute at least 1 edge (a polyline has one fewer edge than it has vertices). Polygons need to still be polygons, so they contribute at least 3 edges (a polygon has as many edges as vertices). M will be at least large enough to satisfy this requirement.
The outputs should be as close as possible to the inputs. This would end up being an optimization problem of minimizing some metric to within some small tolerance of the true optimal solution. Originally I'd have used the area of the symmetric difference of the original and result (area between), but if another metric makes this easier to do I'll gladly take that instead.
It's okay if the results only include vertices in the original, the fit will be a little worse but it might be necessary to keep the time complexity down.
Since I'm asking for an algorithm, it'd be nice to also see an implementation. I'll likely have to re-implement it for where I'll be using it anyway, so details like what language or what data structures won't matter too much.
As for how good the approximation needs to be: about what you'd expect from getting a vector image from a bitmap image. The actual use is for a game tool, though, and there are some game-specific quirks; that's why the output edge count is fixed rather than the tolerance.
It's pretty hard to find any information on this kind of thing, so without even providing a full workable algorithm, just some pointers would be very much appreciated.
The Ramer–Douglas–Peucker algorithm (mentioned in the comment) is definitely good, but it has some disadvantages:
It requires an open polyline as input; for a closed polygon one has to fix an arbitrary starting point, which either decreases the final quality or forces one to test many starting points, which hurts performance.
The vertices of the simplified polyline are a subset of the original polyline's vertices; other placements are not considered. This permits very fast implementations, but again decreases the precision of the simplified polyline.
Another alternative is to take the well-known algorithm for simplification of triangular meshes, Surface Simplification Using Quadric Error Metrics, and adapt it for polylines:
distances to planes containing triangles are replaced with distances to lines containing polyline segments,
quadratic forms lose one dimension if the polyline is two dimensional.
But the majority of the algorithm is kept, including the queue of edge contractions (a min-heap) ordered by the estimated distortion each contraction produces in the polyline.
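For illustration only, here is a rough Python sketch of the greedy, priority-queue idea. It is a simplified stand-in rather than the quadric-based method itself: it only removes existing vertices (using the distance from the removed vertex to the segment joining its neighbours as the error) instead of contracting edges to new optimal positions, and it handles open polylines only (a closed polygon would need circular prev/next links).

import heapq

def point_segment_dist(p, a, b):
    """Distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    cx, cy = ax + t * dx, ay + t * dy
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5

def decimate_polyline(points, target_edges):
    """Greedily drop the interior vertex whose removal adds the least error
    until only target_edges edges remain. Endpoints are always kept."""
    n = len(points)
    prev = list(range(-1, n - 1))      # doubly linked list over vertex indices
    nxt = list(range(1, n + 1))
    alive = [True] * n
    heap = []

    def push(i):
        if 0 < i < n - 1 and alive[i]:
            err = point_segment_dist(points[i], points[prev[i]], points[nxt[i]])
            heapq.heappush(heap, (err, i, prev[i], nxt[i]))

    for i in range(1, n - 1):
        push(i)

    edges = n - 1
    while edges > target_edges and heap:
        err, i, pi, ni = heapq.heappop(heap)
        if not alive[i] or prev[i] != pi or nxt[i] != ni:
            continue                    # stale heap entry, skip it
        alive[i] = False                # remove vertex i
        nxt[pi], prev[ni] = ni, pi
        edges -= 1
        push(pi)                        # neighbours' errors have changed
        push(ni)

    return [p for p, keep in zip(points, alive) if keep]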
Here is an example of applying this algorithm:
Red is the original polyline and blue is the simplified polyline. One can see that its vertices do not all lie on the original polyline, while the general shape is preserved as well as possible with so few line segments.
it'd be nice to also see an implementation.
One can find an implementation in MeshLib, see MRPolylineDecimate.h/.cpp

How to break a geometry into blocks?

I am certain there is already some algorithm that does what I need, but I am not sure what phrase to Google, or what is the algorithm category.
Here is my problem: I have a polyhedron made up of several contacting blocks (hyperslabs), i.e. the edges are axis-aligned and the angles between edges are 90°. There may be holes inside the polyhedron.
I want to break up this concave polyhedron into as few convex rectangular axis-aligned whole blocks as possible (if the original polyhedron is convex and has no holes, then it is already such a block, and therefore the solution). To illustrate, here are some 2D images I made (but I need the solution for 3D, and preferably N-D):
I have this geometry:
One possible breakup into blocks is this:
But the one I want is this (with as few blocks as possible):
I have the impression that an exact algorithm may be too expensive (is this problem NP-hard?), so an approximate algorithm is suitable.
One detail that may make the problem easier, so that there could be a more appropriate/specialized algorithm for it, is that all edge lengths are multiples of some fixed value (you may think of all edge lengths as integers, or of the geometry as being made up of uniform tiny squares, or voxels).
Background: this is the structured grid discretization of a PDE domain.
What algorithm can solve this problem? What class of algorithms should I search for?
Update: Before you upvote this answer, I want to point out that it is slightly off-topic. The original poster has a question about the decomposition of a polyhedron whose faces are axis-aligned. Given such a polyhedron, the question is how to decompose it into convex parts, in 3D and possibly nD. My answer is about the decomposition of a general polyhedron. So when I give an answer with a given implementation, that answer applies to the special case of an axis-aligned polyhedron, but there might be a better implementation for axis-aligned polyhedra. And when my answer says that a problem for generic polyhedra is NP-complete, there might still be a polynomial solution for the special case of axis-aligned polyhedra. I do not know.
Now here is my (slightly off-topic) answer, below the horizontal rule...
The CGAL C++ library has an algorithm that, given a 2D polygon, can compute the optimal convex decomposition of that polygon. The method is mentioned in the part 2D Polygon Partitioning of the manual. The method is named CGAL::optimal_convex_partition_2. I quote the manual:
This function provides an implementation of Greene's dynamic programming algorithm for optimal partitioning [2]. This algorithm requires O(n^4) time and O(n^3) space in the worst case.
In the bibliography of that CGAL chapter, the article [2] is:
[2] Daniel H. Greene. The decomposition of polygons into convex parts. In Franco P. Preparata, editor, Computational Geometry, volume 1 of Adv. Comput. Res., pages 235–259. JAI Press, Greenwich, Conn., 1983.
It seems to be exactly what you are looking for.
Note that the same chapter of the CGAL manual also mentions an approximation (hence not optimal) that runs in O(n): CGAL::approx_convex_partition_2.
Edit, about the 3D case:
In 3D, CGAL has another chapter about Convex Decomposition of Polyhedra. The second paragraph of the chapter says "this problem is known to be NP-hard [1]". The reference [1] is:
[1] Bernard Chazelle. Convex partitions of polyhedra: a lower bound and worst-case optimal algorithm. SIAM J. Comput., 13:488–507, 1984.
CGAL has a method CGAL::convex_decomposition_3 that computes a non-optimal decomposition.
I have the feeling your problem is NP-hard. I suggest a first step might be to break the figure into sub-rectangles along all hyperplanes. So in your example there would be three hyperplanes (lines) and four resulting rectangles. Then the problem becomes one of recombining rectangles into larger rectangles to minimize the final number of rectangles. Maybe 0-1 integer programming?
I think dynamic programming might be your friend.
The first step I see is to divide the polyhedron into a trivial collection of blocks such that every possible face is available (i.e. slice and dice it into the smallest pieces possible). This should be trivial because everything is an axis aligned box, so k-tree like solutions should be sufficient.
This seems reasonable because I can look at its cost. The cost of doing this is that I "forget" the original configuration of hyperslabs, choosing to replace it with a new set of hyperslabs. The only way this could lead me astray is if the original configuration had something to offer for the solution. Given that you want an "optimal" solution for all configurations, we have to assume that the original structure isn't very helpful. I don't know if it can be proven that this original information is useless, but I'm going to make that assumption in this answer.
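To make that first slice-and-dice step concrete before moving on, here is a naive Python sketch (my own hypothetical helper; it enumerates the full grid of elementary cells, which is exponential in the worst case, whereas a real implementation would use a k-tree-like structure):

from itertools import product

def slice_into_cells(boxes):
    """boxes: list of (mins, maxs) tuples of equal dimension, axis-aligned.
    Returns the elementary cells obtained by cutting along every coordinate
    plane touched by any box, keeping only cells inside some input box."""
    dim = len(boxes[0][0])
    # All cut coordinates per axis.
    cuts = [sorted({b[0][d] for b in boxes} | {b[1][d] for b in boxes})
            for d in range(dim)]
    cells = []
    # Each candidate cell is a product of consecutive cut intervals, one per axis.
    for idx in product(*(range(len(c) - 1) for c in cuts)):
        lo = tuple(cuts[d][idx[d]] for d in range(dim))
        hi = tuple(cuts[d][idx[d] + 1] for d in range(dim))
        center = tuple((l + h) / 2 for l, h in zip(lo, hi))
        if any(all(bl <= c <= bh for c, bl, bh in zip(center, b[0], b[1]))
               for b in boxes):
            cells.append((lo, hi))
    return cells

# Two overlapping 2D blocks -> the elementary grid cells covering their union.
print(slice_into_cells([((0, 0), (2, 1)), ((1, 0), (3, 2))]))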
The problem has now been reduced to a graph problem similar to a constrained spanning forest problem. I think the most natural way to view the problem is to think of it as a graph coloring problem (as long as you can avoid confusing it with the more famous graph coloring problem of trying to color a map without two states of the same color sharing a border). I have a graph of nodes (small blocks), each of which I wish to assign a color (which will eventually be the "hyperslab" which covers that block). I have the constraint that I must assign colors in hyperslab shapes.
Now a key observation is that not all possibilities must be considered. Take the final colored graph we want to see. We can partition this graph in any way we please by breaking any hyperslab which crosses the partition into two pieces. However, not every partition is meaningful. The only partitions that make sense are axis aligned cuts, which always break a hyperslab into two hyperslabs (as opposed to any more complicated shape which could occur if the cut was not axis aligned).
Now this cutting is the reverse of the problem we're really trying to solve; the cutting is actually what we did in the first step, while what we want to find is the optimal merging, undoing those cuts. However, this shows a key feature we will use in dynamic programming: the only features that matter for merging are on the exposed surface of a cut. Once we find the optimal way of forming the central region, it generally doesn't play a part in the rest of the algorithm.
So let's start by building a collection of hyperslab-spaces, which can define not just a plain hyperslab, but any configuration of hyperslabs such as those with holes. Each hyperslab-space records:
The number of leaf hyperslabs contained within it (this is the number we are eventually going to try to minimize)
The internal configuration of hyperslabs.
A map of the surface of the hyperslab-space, which can be used for merging.
We then define a "merge" rule to turn two or more adjacent hyperslab-spaces into one:
Hyperslab-spaces may only be combined into new hyperslab-spaces (so you need to combine enough pieces to create a new hyperslab, not some more exotic shape)
Merges are done simply by comparing the surfaces. If there are features with matching dimensionalities, they are merged (because it is trivial to show that, if the features match, it is always better to merge hyperslabs than not to)
Now this is enough to solve the problem with brute force, though that brute-force search is certainly intractable. However, we can add an additional rule which will drop this cost dramatically: "One hyperslab-space is deemed 'better' than another if they cover the same space and have exactly the same features on their surface. In this case, the one with fewer hyperslabs inside it is the better choice."
Now the idea here is that, early on in the algorithm, you will have to keep track of all sorts of combinations, just in case they are the most useful. However, as the merging algorithm makes things bigger and bigger, it will become less likely that internal details will be exposed on the surface of the hyperslab-space. Consider
+===+===+===+---+---+---+---+
| : : A | X : : : :
+---+---+---+---+---+---+---+
| : : B | Y : : : :
+---+---+---+---+---+---+---+
| : : | : : : :
+===+===+===+ +---+---+---+
Take a look at the left side box, which I have taken the liberty of marking with stronger lines. When it comes to merging these boxes with the rest of the world, the AB:XY surface is all that matters. As such, there are only a handful of merge patterns which can occur at this surface:
No merges possible
A:X allows merging, but B:Y does not
B:Y allows merging, but A:X does not
Both A:X and B:Y allow merging (two independent merges)
We can merge a larger square, AB:XY
There are many ways to cover the 3x3 square (at least a few dozen). However, we only need to remember the best way to achieve each of those merge processes. Thus once we reach this point in the dynamic programming, we can forget about all of the other combinations that can occur, and only focus on the best way to achieve each set of surface features.
In fact, this sets up the problem for an easy greedy algorithm which explores whichever merges provide the best promise for decreasing the number of hyperslabs, always remembering the best way to achieve a given set of surface features. When the algorithm is done merging, whatever that final hyperslab-space contains is the optimal layout.
I don't know if it is provable, but my gut instinct thinks that this will be an O(n^d) algorithm where d is the number of dimensions. I think the worst case solution for this would be a collection of hyperslabs which, when put together, forms one big hyperslab. In this case, I believe the algorithm will eventually work its way into the reverse of a k-tree algorithm. Again, no proof is given... it's just my gut instinct.
You can try a constrained Delaunay triangulation. It gives very few triangles.
Are you able to determine the equations for each line?
If so, maybe you can get the intersection points between those lines. Then, if you take one axis and start to look for a value which has more than two points sharing it, you should "draw" a line. (At the beginning of the sweep there will be zero points, then two (your first pair), and when you find more than two points you will be able to determine which points belong to the first polygon and which to the second one.)
Eg, if you have those lines:
verticals (red):
x = 0, x = 2, x = 5
horizontals (yellow):
y = 0, y = 2, y = 3, y = 5
and you start to sweep along the X axis, you will first get p1 and p2 (and we know which line equation they belong to), then you will get p3, p4, p5 and p6. Here you can check which of those points share a line with p1 and p2: in this case p4 and p5. So your first new polygon is p1, p2, p4, p5.
Now we save the 'new' pair of points (p3, p6) and continue the sweep until the next points. Here we have p7, p8, p9 and p10; looking for the points which share a line with the previous points (p3 and p6), we get p7 and p10. Those are the points of your second polygon.
When we repeat the exercise for the Y axis, we will get two points (p3, p7) and then just three (p1, p2, p8)! In this case we should use the farthest point (p8) on the same line as the newly discovered point.
As we are using line equations and points, in 2 or more dimensions the procedure should be very similar.
ps, sorry for my english :S
I hope this helps :)

How to fit more than one line to data points

I am trying to fit more than one line to a list of points in 2D. My points are quite low in number (16 or 32).
These points are coming from a simulated environment of a robot with laser range finders attached to its side. If the points lie on a line, it means they detected a wall; if not, they detected an obstacle. I am trying to detect the walls and calculate their intersection, and for this I thought the best idea is to fit lines to the dataset.
Fitting one line to a set of points is not a problem, if we know all those points lie on or around a line.
My problem is that I don't know how I can detect which sets of points should be classified as fitting the same line and which should not be, for each line. Also, I don't even know the number of points on a line, while naturally it would be best to detect the longest possible line segment.
How would you solve this problem? If I look at all the possibilities for example for groups of 5 points for all the 32 points then it gives 32 choose 5 = 201376 possibilities. I think it takes way too much time to try all the possibilities and try to fit a line to all 5-tuples.
So what would be a better algorithm that would run much faster? I could connect points within a distance limit and create polylines. But even connecting the points is a hard task, as the edge distances change even within a single line.
Do you think it is possible to do some kind of Hough transform on a discrete dataset with such a low number of entries?
Note: if this problem is too hard to solve, I was thinking about using the order of the sensors for filtering. This way the algorithm would be easier, but if there is a small obstacle in front of a wall, it would disrupt the continuity of the line and thus break the wall into two halves.
A good way to find lines in noisy point data like this is to use RANSAC. Standard RANSAC usage is to pick the best hypothesis (= a line in this case), but you can just as easily pick the best 2 or 4 lines given your data. Have a look at the example here:
http://www.janeriksolem.net/2009/06/ransac-using-python.html
Python code is available here
http://www.scipy.org/Cookbook/RANSAC
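For a self-contained flavour of the idea (this is my own bare-bones sketch, not the code behind those links): fit one line with RANSAC, then peel off its inliers and repeat to get the best 2, 3, ... lines.

import random
import math

def fit_line_ransac(points, n_iter=500, inlier_tol=0.1):
    """Return (a, b, c, inliers) for the line ax + by + c = 0 supported by
    the most points within inlier_tol. Plain-Python sketch."""
    best_inliers, best_line = [], None
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = random.sample(points, 2)
        a, b = y2 - y1, x1 - x2              # normal of the line through the pair
        norm = math.hypot(a, b)
        if norm == 0:
            continue                         # degenerate sample, try again
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) <= inlier_tol]
        if len(inliers) > len(best_inliers):
            best_inliers, best_line = inliers, (a, b, c)
    if best_line is None:
        raise ValueError("could not fit a line (degenerate input)")
    return (*best_line, best_inliers)

def fit_k_lines(points, k, **kw):
    """Greedily peel off k lines: fit, remove the inliers, repeat."""
    remaining, lines = list(points), []
    for _ in range(k):
        if len(remaining) < 2:
            break
        a, b, c, inliers = fit_line_ransac(remaining, **kw)
        lines.append((a, b, c))
        remaining = [p for p in remaining if p not in inliers]
    return lines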
The first thing I would point out is that you seem to be ignoring a vital aspect of the data, which is that you know which sensors (or readings) are adjacent to each other. If you have N laser sensors, you know where they are bolted to the robot, and if you are rotating a sensor you know the order (and position) in which the measurements are taken. So connecting points together to form a piecewise linear fit (polylines) should be trivial. Once you had these correspondences, you could take each set of four points and determine whether they can be modeled effectively by only 2 lines, or something like that.
Secondly, it's well known that finding a globally optimal fit for even two lines to an arbitrary set of points is NP-Hard as it can be reduced to k-means clustering, so I would not expect to find an efficient algorithm for this. When I used the Hough transform it was for finding corners based on pixel intensities, in theory it is probably applicable to this sort of problem but it is just an approximation and it is probably going to take a fair bit of work to figure out and implement.
I hate to suggest it but it seems like you might benefit by looking at the problem in a slightly different way. When I was working in autonomous navigation with a laser range finder we solved this problem by discretizing the space into an occupancy grid, which is the default approach. Of course, this just assumes the walls are just another object, which is not a particularly outrageous idea IMO. Beyond that, can you assume you have a map of the walls and you are just trying to find the obstacles and localize (find the pose of) the robot? If so, there are a large number of papers and tech reports on this problem.
A part of the solution could be (it's where I would start) to investigate the robot. Questions such as:
How fast is the robot turning/moving?
At what interval is the robot shooting the laser?
Where was the robot and what was its orientation when a point was found?
The answers to these questions might help you better than trying to find some algorithm that relies on the points only. Especially when there are so very few.
The answers to these questions give you more information/points. E.g., if you know the robot was at a specific position when it detected a point, there are no points in between the position of the robot and the detected point. That is very valuable.
For each triad of points, fit a line through it and compute how much the line deviates from the points.
Then use only the good (low-deviation) triads and merge them if they have two points in common, and then grow the sets by appending all triads that have (at least) two points in the set.
As the last step you can ditch the triads that haven't been merged with any other but share some element with a result set of at least 4 elements (to cope with crossing lines), but keep those triads that did not merge with anyone and do not have any element in common with the sets (this would yield the three points on the right side of your example as one of the lines).
I presume this would find the left line and the bottom line in the second step, and the right line in the third.
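Purely to illustrate that recipe, a rough Python sketch (my own formulation; the deviation measure is a total-least-squares residual and the merge is a simplified single pass):

from itertools import combinations
import math

def triad_deviation(p, q, r):
    """RMS distance of three points from their best-fit line, via the
    smaller eigenvalue of the 2x2 scatter matrix (total least squares)."""
    xs, ys = [p[0], q[0], r[0]], [p[1], q[1], r[1]]
    mx, my = sum(xs) / 3, sum(ys) / 3
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Smaller eigenvalue of [[sxx, sxy], [sxy, syy]] = sum of squared residuals.
    lam = 0.5 * (sxx + syy - math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2))
    return math.sqrt(max(lam, 0.0) / 3)

def group_by_triads(points, tol=0.05):
    """Keep low-deviation triads, then merge triads sharing two points.
    Single pass only; a full version would also merge groups that later
    come to share points."""
    good = [set(t) for t in combinations(range(len(points)), 3)
            if triad_deviation(*(points[i] for i in t)) <= tol]
    groups = []
    for tri in good:
        for g in groups:
            if len(g & tri) >= 2:
                g |= tri
                break
        else:
            groups.append(set(tri))
    return [[points[i] for i in sorted(g)] for g in groups]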

efficient algorithm to find nearest point in a graph that does not have a known equation

I'm asking this question out of curiosity, since my quick and dirty implementation seems to be good enough. However, I'm curious what a better implementation would be.
I have a graph of real-world data. There are no duplicate X values and the X value increments at a consistent rate across the graph, but the Y data is based on real-world output. I want to find the nearest point on the graph to an arbitrary given point P programmatically. I'm trying to find an efficient (i.e. fast) algorithm for doing this. I don't need the exact closest point; I can settle for a point that is 'nearly' the closest point.
The obvious lazy solution is to increment through every single point in the graph, calculate the distance, and then find the minimum of the distance. This however could theoretically be slow for large graphs; too slow for what I want.
Since I only need an approximate closest point I imagine the ideal fastest equation would involve generating a best fit line and using that line to calculate where the point should be in real time; but that sounds like a potential mathematical headache I'm not about to take on.
My solution is a hack which works only because I assume my point P isn't arbitrary; namely, I assume that P will usually be close to my graph line, and when that happens I can rule the distant X values out of consideration. I calculate how close the point on the line that shares P's X coordinate is, and use the distance between that point and P to determine the largest/smallest X value that could possibly give a closer point.
I can't help but feel there should be a faster algorithm than my solution (which is only useful because I assume that 99% of the time my point P will be close to the line already). I tried googling for better algorithms but found so many that didn't quite fit that it was hard to find what I was looking for amongst all the clutter of inappropriate algorithms. So, does anyone here have a suggested algorithm that would be more efficient? Keep in mind I don't need a full algorithm, since what I have works for my needs; I'm just curious what the proper solution would have been.
If you store the [x,y] points in a quadtree you'll be able to find the closest one quickly (something like O(log n)). I think that's the best you can do without making assumptions about where the point is going to be. Rather than repeat the algorithm here have a look at this link.
Your solution is pretty good; by examining how the points vary in y, couldn't you calculate a bound on the number of points along the x axis you need to examine, instead of using an arbitrary one?
Let's say your point P=(x,y) and your real-world data is a function y=f(x)
Step 1: Calculate r=|f(x)-y|.
Step 2: Find points in the interval I=(x-r,x+r)
Step 3: Find the closest point in I to P.
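A rough Python sketch of those three steps, assuming the graph is stored as sorted, evenly sampled xs/ys arrays (illustration only; it uses the full distance to the nearest sample as the radius, which reduces to |f(x)-y| when P's x coincides with a sample):

import bisect
import math

def nearest_on_graph(xs, ys, px, py):
    """xs sorted (evenly sampled), ys the measured values; (px, py) the query."""
    # Step 1: candidate = the sample nearest to px in x; r = its distance to P.
    i = min(bisect.bisect_left(xs, px), len(xs) - 1)
    r = math.hypot(xs[i] - px, ys[i] - py)
    # Step 2: only samples with x in [px - r, px + r] can possibly be closer.
    lo = bisect.bisect_left(xs, px - r)
    hi = bisect.bisect_right(xs, px + r)
    # Step 3: brute-force that (hopefully short) interval.
    j = min(range(lo, hi), key=lambda k: math.hypot(xs[k] - px, ys[k] - py))
    return xs[j], ys[j]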
If you can use a data structure, some common data structures for spacial searching (including nearest neighbour) are...
quad-tree (and octree etc).
kd-tree
bsp tree (only practical for a static set of points).
r-tree
The r-tree comes in a number of variants. It's very closely related to the B+ tree, but with (depending on the variant) different orderings on the items (points) in the leaf nodes.
The Hilbert R tree uses a strict ordering of points based on the Hilbert curve. The Hilbert curve (or rather a generalization of it) is very good at ordering multi-dimensional data so that nearby points in space are usually nearby in the linear ordering.
In principle, the Hilbert ordering could be applied by sorting a simple array of points. The natural clustering in this would mean that a search would usually only need to search a few fairly-short spans in the array - with the complication being that you need to work out which spans they are.
I used to have a link for a good paper on doing the Hilbert curve ordering calculations, but I've lost it. An ordering based on Gray codes would be simpler, but not quite as efficient at clustering. In fact, there's a deep connection between Gray codes and Hilbert curves - that paper I've lost uses Gray code related functions quite a bit.
EDIT - I found that link - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.133.7490
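To illustrate the sort-a-simple-array idea without the Hilbert-curve bookkeeping, here is a sketch using the simpler z-order (Morton) key; as noted above, a true Hilbert key clusters a bit better but is more work to compute:

def morton_key(x, y, bits=16):
    """Interleave the bits of two non-negative integer coordinates to get a
    z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key

# Sort the points once by their curve key; nearby points usually end up in a
# few short, contiguous spans of this array.
points = [(5, 1), (2, 3), (7, 7), (2, 2), (6, 1)]
points.sort(key=lambda p: morton_key(*p))
print(points)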

Resources