Divide points on a plane with a line - algorithm

I have N green points and M red points (no three of them are colinear). I want to tell if it is possible to divide those points with a line in such a way so all green points are on one side and all red points on the other side. If there is such line I would like to find the equation of it. The line cannot pass through those points. What is the fastest algorithm to solve this problem? This is not a homework assignment, just a problem I've recently thought about.

Build convex hulls for both sets. If they intersect, there is no such dividing line
Citation:
Two pattern sets Xi and Xj are said to be linearly separable if their
convex hulls are disjoint
When hulls are disjoint, line might be found with rotating calipers

That is exactly what SVMs do. More specifically, a SVM would find the line that is further from both sets of points, if they happen to be linearly separable; otherwise, the algorithm can be tuned to find some kind of "best-effort" solution.
There are plenty of sources where you can read about SVMs in more detail, but basically you would need to use a linear kernel. As an example, here is the SVM implementation in scikit-learn, with some images.

Use (single layer) perceptron to learn a separating hyperplane (a straight line in 2-dimensions), by posing the problem as a binary classification problem.
If two sets of points are linearly separable, then one such hyperplane exists, the perceptron algorithm is guaranteed to converge and solution will give you the hyperplane (a straight line in 2-dimensions) separating the two classes (green and blue).
In case the solution does not exist, the algorithm will not converge and we can stop after a finite number of iterations.
In either of the cases, after the algorithm stops (either with convergence or beyond maximum iterations), we can check whether all the points on the either sides of the output line are monochromatic (i.e., one side contains only red, another side contains only green points), if so, then return the equation of the line, otherwise output no such line exists.
We can use perceptron pocket algorithm too rather than using the naive implementation, but for our objective the naive implementation will also work (https://en.wikipedia.org/wiki/Perceptron).

Related

Finding a point that is not inside any rotated rectangles

I'm looking for an algorithm that can quickly (I'm heavily constrained by performance) find a point inside of a circle, where this point is outside of all rectangles in a provided set (these rectangles can be rotated).
Or alternatively, to find a circle A with its center inside a circle B, where circle A does not intersect with a set of line segments.
The only solution I can come up with is to just loop through samples of points and then loop through the rectangles for each of them. But since my space is continuous, that's quite a pain. I'm basically satisfied with just a single point that doesn't intersect, but there will be cases where no such points exist. In the latter case I would ideally try to find a point with the least amount of intersections, or be able to find the answer that no such point exists.
Does anyone know of any algorithms that can accomplish this in something less than O(n^2)? Anything that would help identify good candidate points would be awesome too.
A typical example of the situation is this:
Lots of big rectangles, with small circle in which I hope to find a point (here indicated with blue). It's common that many of the rectangles fall completely outside of the circle, and also common that the circle is completely covered. There's only a small set of lengths and widths that tend to be used for the rectangles.
There are probably several interesting ways to do this. The simplest algorithm I can think of that gives a decent runtime is an algorithm as follows:
Treat all rectangles as a set of line segments.
Use an efficient algorithm to find the intersection of all line segments (for example the Bentley-Ottmann algorithm.)
Create a list of points of interest (POIs) that are either a) the corners of a rectangle or b) the intersection points computed in 2.
Create a finer set of line segments such that each line segment terminates at a POI defined in 3.
Using the POIs and the finer set of line segments from 4, compute a constrained triangulation (for example a Constrained Delaunay Triangulation.)
Pick any (unlabeled) triangle to start. Determine if the triangle lies within at least one rectangle (label it as a COVERED triangle) or not (label it as a FREE triangle). To do this you can use any point in polygon algorithm, for example ray-casting.
Run a Depth or Breadth first search starting at this triangle and expanding to neighbors, taking care not to cross between any triangle pair that would require crossing a line segment defined in 4. For every triangle visited, label it as the same label as the starting triangle.
Repeat 6-7 until all triangles are labeled (or all triangles covering the circle of interest are labeled.)
The union of all FREE triangles intersected with the circled of interest yields precisely the points that are not covered by any rectangle and are within the circle.
Note, this algorithm is a bit general and can be improved by focusing only in the area around the circle (for example a bounding box region can only be considered, with the bounding box encompassing all rectangles intersecting the circle.)
To analyze the runtime, consider the runtime of each key step:
has a runtime of O((n+k) log n) where k is the number of intersections, where n is the number of line segments.
has a runtime of O(m log m) where m is the number of POIs, m is O(n+k)
and 7. should be analyzed together. In the worst case, each triangle would need O(n) computations to check for containment in a rectangle. Given that there would be O(m) triangles this would yield a O(nm) bound. However, the purpose of the triangulation is to reuse the point in polygon computation for the seeding triangle to label as many neighboring triangles as possible. In practice the number of triangles that would require a point in polygon computation should be negligible. Therefore the runtime of this step is O(tn) where t is the number of traingles for which point in polygon computations are performed.
The runtime expected is, therefore, O((n+k) log n + t(n+k)) where k is the number of intersections in step 2 and t is the number of triangles for which point in polygon computations are performed. In the worst case this is O(n^2 log n) as you can create a pathological example with n^2 intersections, but this should be unlikely if not possible. Likewise, the number t should be kept to a minimum to make this as efficient as possible. If both t << n and k << n^2, this would be quite efficient.
One approximation that could yield performance improvement:
Consider approximating the circle by a set of r line segments, and including these line segments in steps 1-5. While this is an approximation, it would potentially improve the runtime, as only triangles inside the circle would ever need to be considered.

How to break a geometry into blocks?

I am certain there is already some algorithm that does what I need, but I am not sure what phrase to Google, or what is the algorithm category.
Here is my problem: I have a polyhedron made up by several contacting blocks (hyperslabs), i. e. the edges are axis aligned and the angles between edges are 90°. There may be holes inside the polyhedron.
I want to break up this concave polyhedron in as little convex rectangular axis-aligned whole blocks are possible (if the original polyhedron is convex and has no holes, then it is already such a block, and therefore, the solution). To illustrate, some 2-D images I made (but I need the solution for 3-D, and preferably, N-D):
I have this geometry:
One possible breakup into blocks is this:
But the one I want is this (with as few blocks as possible):
I have the impression that an exact algorithm may be too expensive (is this problem NP-hard?), so an approximate algorithm is suitable.
One detail that maybe make the problem easier, so that there could be a more appropriated/specialized algorithm for it is that all edges have sizes multiple of some fixed value (you may think all edges sizes are integer numbers, or that the geometry is made up by uniform tiny squares, or voxels).
Background: this is the structured grid discretization of a PDE domain.
What algorithm can solve this problem? What class of algorithms should I
search for?
Update: Before you upvote that answer, I want to point out that my answer is slightly off-topic. The original poster have a question about the decomposition of a polyhedron with faces that are axis-aligned. Given such kind of polyhedron, the question is to decompose it into convex parts. And the question is in 3D, possibly nD. My answer is about the decomposition of a general polyhedron. So when I give an answer with a given implementation, that answer applies to the special case of polyhedron axis-aligned, but it might be that there exists a better implementation for axis-aligned polyhedron. And when my answer says that a problem for generic polyhedron is NP-complete, it might be that there exists a polynomial solution for the special case of axis-aligned polyhedron. I do not know.
Now here is my (slightly off-topic) answer, below the horizontal rule...
The CGAL C++ library has an algorithm that, given a 2D polygon, can compute the optimal convex decomposition of that polygon. The method is mentioned in the part 2D Polygon Partitioning of the manual. The method is named CGAL::optimal_convex_partition_2. I quote the manual:
This function provides an implementation of Greene's dynamic programming algorithm for optimal partitioning [2]. This algorithm requires O(n4) time and O(n3) space in the worst case.
In the bibliography of that CGAL chapter, the article [2] is:
[2] Daniel H. Greene. The decomposition of polygons into convex parts. In Franco P. Preparata, editor, Computational Geometry, volume 1 of Adv. Comput. Res., pages 235–259. JAI Press, Greenwich, Conn., 1983.
It seems to be exactly what you are looking for.
Note that the same chapter of the CGAL manual also mention an approximation, hence not optimal, that run in O(n): CGAL::approx_convex_partition_2.
Edit, about the 3D case:
In 3D, CGAL has another chapter about Convex Decomposition of Polyhedra. The second paragraph of the chapter says "this problem is known to be NP-hard [1]". The reference [1] is:
[1] Bernard Chazelle. Convex partitions of polyhedra: a lower bound and worst-case optimal algorithm. SIAM J. Comput., 13:488–507, 1984.
CGAL has a method CGAL::convex_decomposition_3 that computes a non-optimal decomposition.
I have the feeling your problem is NP-hard. I suggest a first step might be to break the figure into sub-rectangles along all hyperplanes. So in your example there would be three hyperplanes (lines) and four resulting rectangles. Then the problem becomes one of recombining rectangles into larger rectangles to minimize the final number of rectangles. Maybe 0-1 integer programming?
I think dynamic programming might be your friend.
The first step I see is to divide the polyhedron into a trivial collection of blocks such that every possible face is available (i.e. slice and dice it into the smallest pieces possible). This should be trivial because everything is an axis aligned box, so k-tree like solutions should be sufficient.
This seems reasonable because I can look at its cost. The cost of doing this is that I "forget" the original configuration of hyperslabs, choosing to replace it with a new set of hyperslabs. The only way this could lead me astray is if the original configuration had something to offer for the solution. Given that you want an "optimal" solution for all configurations, we have to assume that the original structure isn't very helpful. I don't know if it can be proven that this original information is useless, but I'm going to make that assumption in this answer.
The problem has now been reduced to a graph problem similar to a constrained spanning forest problem. I think the most natural way to view the problem is to think of it as a graph coloring problem (as long as you can avoid confusing it with the more famous graph coloring problem of trying to color a map without two states of the same color sharing a border). I have a graph of nodes (small blocks), each of which I wish to assign a color (which will eventually be the "hyperslab" which covers that block). I have the constraint that I must assign colors in hyperslab shapes.
Now a key observation is that not all possibilities must be considered. Take the final colored graph we want to see. We can partition this graph in any way we please by breaking any hyperslab which crosses the partition into two pieces. However, not every partition is meaningful. The only partitions that make sense are axis aligned cuts, which always break a hyperslab into two hyperslabs (as opposed to any more complicated shape which could occur if the cut was not axis aligned).
Now this cut is the reverse of the problem we're really trying to solve. That cutting is actually the thing we did in the first step. While we want to find the optimal merging algorithm, undoing those cuts. However, this shows a key feature we will use in dynamic programming: the only features that matter for merging are on the exposed surface of a cut. Once we find the optimal way of forming the central region, it generally doesn't play a part in the algorithm.
So let's start by building a collection of hyperslab-spaces, which can define not just a plain hyperslab, but any configuration of hyperslabs such as those with holes. Each hyperslab-space records:
The number of leaf hyperslabs contained within it (this is the number we are eventually going to try to minimize)
The internal configuration of hyperslabs.
A map of the surface of the hyperslab-space, which can be used for merging.
We then define a "merge" rule to turn two or more adjacent hyperslab-spaces into one:
Hyperslab-spaces may only be combined into new hyperslab-spaces (so you need to combine enough pieces to create a new hyperslab, not some more exotic shape)
Merges are done simply by comparing the surfaces. If there are features with matching dimensionalities, they are merged (because it is trivial to show that, if the features match, it is always better to merge hyperslabs than not to)
Now this is enough to solve the problem with brute force. The solution will be NP-complete for certain. However, we can add an additional rule which will drop this cost dramatically: "One hyperslab-space is deemed 'better' than another if they cover the same space, and have exactly the same features on their surface. In this case, the one with fewer hyperslabs inside it is the better choice."
Now the idea here is that, early on in the algorithm, you will have to keep track of all sorts of combinations, just in case they are the most useful. However, as the merging algorithm makes things bigger and bigger, it will become less likely that internal details will be exposed on the surface of the hyperslab-space. Consider
+===+===+===+---+---+---+---+
| : : A | X : : : :
+---+---+---+---+---+---+---+
| : : B | Y : : : :
+---+---+---+---+---+---+---+
| : : | : : : :
+===+===+===+ +---+---+---+
Take a look at the left side box, which I have taken the liberty of marking in stronger lines. When it comes to merging boxes with the rest of the world, the AB:XY surface is all that matters. As such, there are only a handful of merge patterns which can occur at this surface
No merges possible
A:X allows merging, but B:Y does not
B:Y allows merging, but A:X does not
Both A:X and B:Y allow merging (two independent merges)
We can merge a larger square, AB:XY
There are many ways to cover the 3x3 square (at least a few dozen). However, we only need to remember the best way to achieve each of those merge processes. Thus once we reach this point in the dynamic programming, we can forget about all of the other combinations that can occur, and only focus on the best way to achieve each set of surface features.
In fact, this sets up the problem for an easy greedy algorithm which explores whichever merges provide the best promise for decreasing the number of hyperslabs, always remembering the best way to achieve a given set of surface features. When the algorithm is done merging, whatever that final hyperslab-space contains is the optimal layout.
I don't know if it is provable, but my gut instinct thinks that this will be an O(n^d) algorithm where d is the number of dimensions. I think the worst case solution for this would be a collection of hyperslabs which, when put together, forms one big hyperslab. In this case, I believe the algorithm will eventually work its way into the reverse of a k-tree algorithm. Again, no proof is given... it's just my gut instinct.
You can try a constrained delaunay triangulation. It gives very few triangles.
Are you able to determine the equations for each line?
If so, maybe you can get the intersection (points) between those lines. Then if you take one axis, and start to look for a value which has more than two points (sharing this value) then you should "draw" a line. (At the beginning of the sweep there will be zero points, then two (your first pair) and when you find more than two points, you will be able to determine which points are of the first polygon and which are of the second one.
Eg, if you have those lines:
verticals (red):
x = 0, x = 2, x = 5
horizontals (yellow):
y = 0, y = 2, y = 3, y = 5
and you start to sweep through of X axis, you will get p1 and p2, (and we know to which line-equation they belong ) then you will get p3,p4,p5 and p6 !! So here you can check which of those points share the same line of p1 and p2. In this case p4 and p5. So your first new polygon is p1,p2,p4,p5.
Now we save the 'new' pair of points (p3, p6) and continue with the sweep until the next points. Here we have p7,p8,p9 and p10, looking for the points which share the line of the previous points (p3 and p6) and we get p7 and p10. Those are the points of your second polygon.
When we repeat the exercise for the Y axis, we will get two points (p3,p7) and then just three (p1,p2,p8) ! On this case we should use the farest point (p8) in the same line of the new discovered point.
As we are using lines equations and points 2 or more dimensions, the procedure should be very similar
ps, sorry for my english :S
I hope this helps :)

Sorting Geographical non-contiguous line segments along an implied curve

Given:
A Set (for the sake of discussion we will call it S), which is an unordered collection of line segments. Each line segment is defined as two Longitude-Latitude end-points. While all of the line segments follow an implied curve, there are "gaps" between each of the segments, of various sizes. We refer to this curve as "implied" because it is not explicitly defined anywhere. The only information that we have available are the line segments contained within S.
Desired Result:
A sequence (for the sake of discussion we will call it R), which is an ordered collection of line segments. Each line segment is defined just as before, following the same implied curve as before but are now sorted by their position along the implied curve.
Context (i.e. "Why in the heck do I need this?"):
Basically I have incomplete geographical data that needs to be normalized and "completed" by doing some very simple interpolation to form a complete curve with no gaps. You might ask "why not just fit a curve to all the line segment end-points and be done with it?" -- well, that's not quite what I am after. The line segments are precisely where they should be located, and there is no need for the final curve to be "smooth". In fact, I intend to connect each of the segments with a straight-line (the crudest form of interpolation imaginable). But, connecting the segments is easy; the hard part is sorting them.
So In Summary: What would be a performant algorithm for going from S to R?
You can use a k-d tree or a cover tree to find nearby points quickly.
If you need one continuous curve, I would suggest that a short traveling salesman path that incorporates the given edges would be a reasonable reconstruction. You could use 2-opt together with a k-d tree the way Bentley described (paywalled, sorry; I think there's also a description in this chapter on TSP local search by Johnson and McGeoch). The one modification needed would be to ensure that the initial path includes the given edges and that 2-opt moves do not remove those edges.
I guess the implied curve has two properties. One is it is continious which means there is no segments. Second, its first derivative is continious which means there is no corners.
From second property we can say that if the angle between two line is closer to each other, they are more related. But i guess it is not enough. You can define a cost function which depends on the angle between lines and distance of lines.
C = A*angle + B*distance (where A,B should be tested and tuned)
Form this function you can find how much each line is related to another one. Than you can just simply connect the line with the strongest relations. Though i guess greedy algorithm does not mean you will always get the optimal solution.

Determine whether the two classes are linearly separable (algorithmically in 2D)

There are two classes, let's call them X and O. A number of elements belonging to these classes are spread out in the xy-plane. Here is an example where the two classes are not linearly separable. It is not possible to draw a straight line that perfectly divides the Xs and the Os on each side of the line.
How to determine, in general, whether the two classes are linearly separable?. I am interested in an algorithm where no assumptions are made regarding the number of elements or their distribution. An algorithm of the lowest computational complexity is of course preferred.
If you found the convex hull for both the X points and the O points separately (i.e. you have two separate convex hulls at this stage) you would then just need to check whether any segments of the hulls intersected or whether either hull was enclosed by the other.
If the two hulls were found to be totally disjoint the two data-sets would be geometrically separable.
Since the hulls are convex by definition, any separator would be a straight line.
There are efficient algorithms that can be used both to find the convex hull (the qhull algorithm is based on an O(nlog(n)) quickhull approach I think), and to perform line-line intersection tests for a set of segments (sweepline at O(nlog(n))), so overall it seems that an efficient O(nlog(n)) algorithm should be possible.
This type of approach should also generalise to general k-way separation tests (where you have k groups of objects) by forming the convex hull and performing the intersection tests for each group.
It should also work in higher dimensions, although the intersection tests would start to become more challenging...
Hope this helps.
Computationally the most effective way to decide whether two sets of points are linearly separable is by applying linear programming. GLTK is perfect for that purpose and pretty much every highlevel language offers an interface for it - R, Python, Octave, Julia, etc.
Let's say you have a set of points A and B:
Then you have to minimize the 0 for the following conditions:
(The A below is a matrix, not the set of points from above)
"Minimizing 0" effectively means that you don't need to actually optimize an objective function because this is not necessary to find out if the sets are linearly separable.
In the end
() is defining the separating plane.
In case you are interested in a working example in R or the math details, then check this out.
Here is a naïve algorithm that I'm quite sure will work (and, if so, shows that the problem is not NP-complete, as another post claims), but I wouldn't be surprised if it can be done more efficiently: If a separating line exists, it will be possible to move and rotate it until it hits two of the X'es or one X and one O. Therefore, we can simply look at all the possible lines that intersect two X'es or one X and one O, and see if any of them are dividing lines. So, for each of the O(n^2) pairs, iterate over all the n-2 other elements to see if all the X'es are on one side and all the O's on the other. Total time complexity: O(n^3).
Linear perceptron is guaranteed to find such separation if one exists.
See: http://en.wikipedia.org/wiki/Perceptron .
You can probably apply linear programming to this problem. I'm not sure of its computational complexity in formal terms, but the technique is successfully applied to very large problems covering a wide range of domains.
Computing a linear SVM then determining which side of the computed plane with optimal marginals each point lies on will tell you if the points are linearly separable.
This is overkill, but if you need a quick one off solution, there are many existing SVM libraries that will do this for you.
As mentioned by ElKamina, Linear Perceptron is guaranteed to find a solution if one exists. This approach is not efficient for large dimensions. Computationally the most effective way to decide whether two sets of points are linearly separable is by applying linear programming.
A code with an example to solve using Perceptron in Matlab is here
In general that problem is NP-hard but there are good approximate solutions like K-means clustering.
Well, both Perceptron and SVM (Support Vector Machines) can tell if two data sets are separable linearly, but SVM can find the Optimal Hiperplane of separability. Besides, it can work with n-dimensional vectors, not only points.
It is used in applications such as face recognition. I recomend to go deep into this topic.

Simplified (or smooth) polygons that contain the original detailed polygon

I have a detailed 2D polygon (representing a geographic area) that is defined by a very large set of vertices. I'm looking for an algorithm that will simplify and smooth the polygon, (reducing the number of vertices) with the constraint that the area of the resulting polygon must contain all the vertices of the detailed polygon.
For context, here's an example of the edge of one complex polygon:
My research:
I found the Ramer–Douglas–Peucker algorithm which will reduce the number of vertices - but the resulting polygon will not contain all of the original polygon's vertices. See this article Ramer-Douglas-Peucker on Wikipedia
I considered expanding the polygon (I believe this is also known as outward polygon offsetting). I found these questions: Expanding a polygon (convex only) and Inflating a polygon. But I don't think this will substantially reduce the detail of my polygon.
Thanks for any advice you can give me!
Edit
As of 2013, most links below are not functional anymore. However, I've found the cited paper, algorithm included, still available at this (very slow) server.
Here you can find a project dealing exactly with your issues. Although it works primarily with an area "filled" by points, you can set it to work with a "perimeter" type definition as yours.
It uses a k-nearest neighbors approach for calculating the region.
Samples:
Here you can request a copy of the paper.
Seemingly they planned to offer an online service for requesting calculations, but I didn't test it, and probably it isn't running.
HTH!
I think Visvalingam’s algorithm can be adapted for this purpose - by skipping removal of triangles that would reduce the area.
I had a very similar problem : I needed an inflating simplification of polygons.
I did a simple algorithm, by removing concav point (this will increase the polygon size) or removing convex edge (between 2 convex points) and prolongating adjacent edges. In any case, doing one of those 2 possibilities will remove one point on the polygon.
I choosed to removed the point or the edge that leads to smallest area variation. You can repeat this process, until the simplification is ok for you (for example no more than 200 points).
The 2 main difficulties were to obtain fast algorithm (by avoiding to compute vertex/edge removal variation twice and maintaining possibilities sorted) and to avoid inserting self-intersection in the process (not very easy to do and to explain but possible with limited computational complexity).
In fact, after looking more closely it is a similar idea than the one of Visvalingam with adaptation for edge removal.
That's an interesting problem! I never tried anything like this, but here's an idea off the top of my head... apologies if it makes no sense or wouldn't work :)
Calculate a convex hull, that might be way too big / imprecise
Divide the hull into N slices, for example joining each one of the hull's vertices to the center
Calculate the intersection of your object with each slice
Repeat recursively for each intersection (calculating the intersection's hull, etc)
Each level of recursion should give a better approximation.... when you reached a satisfying level, merge all the hulls from that level to get the final polygon.
Does that sound like it could do the job?
To some degree I'm not sure what you are trying to do but it seems you have two very good answers. One is Ramer–Douglas–Peucker (DP) and the other is computing the alpha shape (also called a Concave Hull, non-convex hull, etc.). I found a more recent paper describing alpha shapes and linked it below.
I personally think DP with polygon expansion is the way to go. I'm not sure why you think it won't substantially reduce the number of vertices. With DP you supply a factor and you can make it anything you want to the point where you end up with a triangle no matter what your input. Picking this factor can be hard but in your case I think it's the best method. You should be able to determine the factor based on the size of the largest bit of detail you want to go away. You can do this with direct testing or by calculating it from your source data.
http://www.it.uu.se/edu/course/homepage/projektTDB/ht13/project10/Project-10-report.pdf
I've written a simple modification of Douglas-Peucker that might be helpful to anyone having this problem in the future: https://github.com/prakol16/rdp-expansion-only
It's identical to DP except that it pushes a line segment outwards a bit if the points that it would remove are outside the polygon. This guarantees that the resulting simplified polygon contains all the original polygon, but it has almost the same number of line segments as the original DP algorithm and is usually reasonably good at approximating the original shape.

Resources