How to calculate "corners" in a GPX track? - algorithm

I want to be able to take a GPX track of a twisty road and have an algorithm count the number of corners. I guess it will have to be done comparing the "bearings" of subsequent tracks.
However I'm new to this and wonder if there is a simple solution out there.

If you took your coordinates and were able to determine when the bearings changed, you would have the answer. To do this, we could find the best fit of straight line on the points given.
This is the segmented least squares problem - given a set of points, find the minimum cost line segments that fit the points.
There is a tradeoff cost between least-sqaures error and adding a new line (otherwise, creating line segments for every two points would have zero error), you will have to play with the parameters for this yourself with your data.
See
http://www.cs.princeton.edu/~wayne/cs423/lectures/dynamic-programming-4up.pdf
Segmented least squares algorithm, don't understand this dynamic programming concept at all

Related

Mapping 2D points to a fixed grid

I have any number of points on an imaginary 2D surface. I also have a grid on the same surface with points at regular intervals along the X and Y access. My task is to map each point to the nearest grid point.
The code is straight forward enough until there are a shortage of grid points. The code I've been developing finds the closest grid point, displaying an already mapped point if the distance will be shorter for the current point.
I then added a second step that compares each mapped point to another and, if swapping the mapping with another point produces a smaller sum of the total mapped distance of both points, I swap them.
This last step seems important as it reduces the number crossed map lines. (This would be used to map points on a plate to a grid on another plate, with pins connecting the two, and lines that don't cross seem to have a higher chance that the pins would not make contact.)
Questions:
Can anyone comment on my thinking that if the image above were truly optimized, (that is, the mapped points--overall--would have the smallest total distance), then none of the lines were cross?
And has anyone seen any existing algorithms to help with this. I've searched but came up with nothing.
The problem could be approached as a variation of the Assignment Problem, with the "agents" being the grid squares and the points being the "tasks", (or vice versa) with the distance between them being the "cost" for that agent-task combination. You could solve with the Hungarian algorithm.
To handle the fact that there are more grid squares than points, find a bounding box for the possible grid squares you want to consider and add dummy points that have a cost of 0 associated with all grid squares.
The Hungarian algorithm is O(n3), perhaps your approach is already good enough.
See also:
How to find the optimal mapping between two sets?
How to optimize assignment of tasks to agents with these constraints?
If I understand your main concern correctly, minimising total length of line segments, the algorithm you used does not find the best mapping and it is clear in your image. e.g. when two line segments cross each other, simple mathematic says that if you rearrange their endpoints such that they do not cross, it provides a better total sum. You can use this simple approach (rearranging crossed items) to get better approximation to the optimum, you should apply swapping for more somehow many iterations.
In the following picture you can see why crossing has longer length than non crossing (first question) and also why by swapping once there still exists crossing edges (second question and w.r.t. Comments), I just drew one sample, in fact one may need many iterations of swapping to get non crossed result.
This is a heuristic algorithm certainly not optimum but I expect to be very good and efficient and simple to implement.

Algorithm for getting an area from line data (CAD fill algorithm)

I am searching for an algorithm which calculates, from a list of lines, the area a given point (P) is in. The lines are stored with x and y coordinates, the resulting area schould be stored as a polygon which may has holes in it.
Here two images to illustrate what I mean:
Many CAD applications use such algorithms to let the user hatch areas. I don't know how to call such a method so I don't really know what to search for.
Edit 1: To clearify what I'm looking for:
I don't need an algorithm to calculate the area of a polygon, I need an algorithm which returns the polygon which is formed by lines around P. If P is outside any possible polygon, the algorithm fails.
(I also edited the images)
Edit 2: I read the accepted answer of this possible duplicate and it seams I'm looking for a point location algorithm.
I also stumpled across this site which does exactly explains what I'm looking for and more important it led me to the open-source CGAL library which provides the functionality to do such things. Point 4.6 of the CGAL manual gives an example how to use this library to form a region from a bunch of line segments.
One way to do it is to imagine a line from P to infinity. For each polygon test each edge to see if it crosses the line. Count up how many lines cross. If its even then the point is outside the polygon, if its odd then the point is inside the polygon.
This is a fairly standard mathematical result. If you want to get a little deeper into the subject have a look at https://en.wikipedia.org/wiki/Winding_number.
I did a fairly similar things earlier this week Detecting Regions Described by Lines on HTML5 Canvas
this calculates a the polygon around a point given a set of lines. You can see a working example at the jsfiddle mentioned in the answer
the difference is that it uses infinite mathematical lines rather than line segments, but it could easily be modified for this example.
An outline algorithm
First construct two data-structures one of line-segments one of intersection-points of line-segments. In each record which two lines give each intersection, so you can flip from
one to the other. You can get this by pair-wise intersection of line-segments.
It might be easiest to cut the segments at the intersection points so each segment only have two solutions on it, one at each end. We assume we have done this.
Prune the data-structures, remove all line-segments with no intersections and only one segment - these cannot contribute.
Find possible starting lines. You could calculate the distance from each line to the point and take the smallest. You could check for intersections with a line from the point to infinity.
Walk around the polygon anti-clockwise. With you starting line find the most anti-clockwise end. At that point find the most anti-clockwise segment. Follow that and repeat. It may happen that closed loop is formed, in which case discard all the segments in the loop. Continue until you get back to the starting point.
Check if the polygon actually encloses the point. If so we are done. If not discard segments in the polygon and start again.

extract points which satisfy certain conditions

I have an array of points in one plane. They form some shape. I need to extract points from this array which only form straight lines of this shape.
At this moment I have an algorithm but it does not work very good. I take first two points, make a straight line and then check if the following points lie on it with some tolerance. But there is a problem: the points which form straight line are not really on the straight but have some deviation. This deviation is quite large. If in my algorithm I make deviation large enough to get points from the straight part, then other points which are on the slightly bent part but have deviation less then specified also extracted.
I am looking for some idea on how to perform such task.
Here is the picture:
In circles are the parts which I want to extract. Red points are the parts which I could extract with my approach. If I increase the tolerance then I miss the straight pieces too.
First, if you already have some candidate subset of points and want to check whether they lie on a straight line. Use a form of linear regression to identify the best-fitting line, then check how well it fits and accept or reject the hypothesis that this particular segment is linear based on that.
One of the most standard ways of doing that is using Least Squares method.
Identifying the subset is a different problem, the best solution to which will depend strongly on the kind of data you have and the objective. I suggest that enumerating all the segments is a good starting point, if the amount of data is not extremely large, -- that should be doable in no more than cubic time, I gather.
There are certainly some approximations one can apply, e.g. choosing a point in the sequence and building a subset by iteratively adding points on either side as long as the segment remains linear within the tolerance threshold, than accepting or rejecting it if the segment is long enough.
I assume here that the curve is parameterizable by one of the coordinates. If this is not the case, e.g. if the curve is closed, additional steps may be required to separate the curve into parameterizable segments.
EDIT: how to check a segment is straight
There's a number of options.
First, I would expect that for a straight line the average deviation would stay roughly the same as you add the new points, then you can simply find a reasonable threshold on that given the data.
Second option is to further split the subset into a fixed number of parts (e.g. 2), find the best fitting line for each one and then compare these. In case of a straight line, roughly the same line should be predicted, but for a curve it would be different.
Third option is to perform nonlinear curve fitting, e.g. fit a quadratic curve and check the coefficient for the quadratic term -- if the line is straight, it should be close to zero.
In each case, of course, there is a tradeoff between the segment size and the deviation of the points from that segment. In the extreme case, there would either be one huge linear segment with huge deviation or a whole buch of 2-point segments with 0 deviation. The actual threshold on the deviation, the difference between the tangent curves, or the magnitude of the quadratic term (depending on the option you prefer) has to be selected for the given dataset to suit your needs. Looking at the plot, I would say that the threshold should be picked so as to allow for segments of length 10 or so.

How to fit more than one line to data points

I am trying to fit more than one line to a list of points in 2D. My points are quite low in number (16 or 32).
These points are coming from a simulated environment of a robot with laser range finders attached to its side. If the points lie on a line it means that they detected a wall, if not, it means they detected an obstacle. I am trying to detect the walls and calculate their intersection, and for this I thought the best idea is to fit lines on the dataset.
Fitting one line to a set of points is not a problem, if we know all those points line on or around a line.
My problem is that I don't know how can I detect which sets of points should be classified for fitting on the same line and which should not be, for each line. Also, I don't even now the number of points on a line, while naturally it would be the best to detect the longest possible line segment.
How would you solve this problem? If I look at all the possibilities for example for groups of 5 points for all the 32 points then it gives 32 choose 5 = 201376 possibilities. I think it takes way too much time to try all the possibilities and try to fit a line to all 5-tuples.
So what would be a better algorithm what would run much faster? I could connect points within limit and create polylines. But even connecting the points is a hard task, as the edge distances change even within a single line.
Do you think it is possible to do some kind of Hough transform on a discrete dataset with such a low number of entries?
Note: if this problem is too hard to solve, I was thinking about using the order of the sensors and use it for filtering. This way the algorithm could be easier but if there is a small obstacle in front of a wall, it would distract the continuity of the line and thus break the wall into two halves.
A good way to find lines in noisy point data like this is to use RANSAC. Standard RANSAC usage is to pick the best hypothesis (=line in this case) but you can just as easy pick the best 2 or 4 lines given your data. Have a look at the example here:
http://www.janeriksolem.net/2009/06/ransac-using-python.html
Python code is available here
http://www.scipy.org/Cookbook/RANSAC
The first thing I would point out is that you seem to be ignoring a vital aspect of the data, which is you know which sensors (or readings) are adjacent to each other. If you have N laser sensors you know where they are bolted to the robot and if you are rotating a sensor you know the order (and position) in which the measurements are taken. So, connecting points together to form a piecewise linear fit (polylines) should be trivial. Once you had these correspondances you could take each set of four points and determine if they can be modeled effectively by only 2 lines, or something.
Secondly, it's well known that finding a globally optimal fit for even two lines to an arbitrary set of points is NP-Hard as it can be reduced to k-means clustering, so I would not expect to find an efficient algorithm for this. When I used the Hough transform it was for finding corners based on pixel intensities, in theory it is probably applicable to this sort of problem but it is just an approximation and it is probably going to take a fair bit of work to figure out and implement.
I hate to suggest it but it seems like you might benefit by looking at the problem in a slightly different way. When I was working in autonomous navigation with a laser range finder we solved this problem by discretizing the space into an occupancy grid, which is the default approach. Of course, this just assumes the walls are just another object, which is not a particularly outrageous idea IMO. Beyond that, can you assume you have a map of the walls and you are just trying to find the obstacles and localize (find the pose of) the robot? If so, there are a large number of papers and tech reports on this problem.
A part of the solution could be (it's where I would start) to investigate the robot. Questions such as:
How fast is the robot turning/moving?
At what interval is the robot shooting the laser?
Where was the robot and what was its orientation when a point was found?
The answers to these questions might help you better than trying to find some algorithm that relies on the points only. Especially when there are so very few.
The answers to these questions give you more information/points. E.g., if you know the robot was at a specific position when it detected a point, there are no points in between the position of the robot and the detected point. that is very valuable.
For all the triads, fit a line through them and compute how much the line is deviating or not from the points.
Then use only the good (little-deviating) triads and merge them if they have two points at common, and the grow the sets by appending all triads that have (at least) two points in the set.
As the last step you can ditch the triads that haven't been merged with any other but have some common element with result set of at least 4 elements (to get away with crossing lines) but keep those triads that did not merge with anyone but does not have any element in common with sets (this would yield three points on the right side of your example as one of the lines).
I presume this would find the left line and bottom line in the second step, and the right line the third.

Space partitioning algorithm

I have a set of points which are contained within the rectangle. I'd like to split the rectangles into subrectangles based on point density (giving a number of subrectangles or desired density, whichever is easiest).
The partitioning doesn't have to be exact (almost any approximation better than regular grid would do), but the algorithm has to cope with the large number of points - approx. 200 millions. The desired number of subrectangles however is substantially lower (around 1000).
Does anyone know any algorithm which may help me with this particular task?
Just to understand the problem.
The following is crude and perform badly, but I want to know if the result is what you want>
Assumption> Number of rectangles is even
Assumption> Point distribution is markedly 2D (no big accumulation in one line)
Procedure>
Bisect n/2 times in either axis, looping from one end to the other of each previously determined rectangle counting "passed" points and storing the number of passed points at each iteration. Once counted, bisect the rectangle selecting by the points counted in each loop.
Is that what you want to achieve?
I think I'd start with the following, which is close to what #belisarius already proposed. If you have any additional requirements, such as preferring 'nearly square' rectangles to 'long and thin' ones you'll need to modify this naive approach. I'll assume, for the sake of simplicity, that the points are approximately randomly distributed.
Split your initial rectangle in 2 with a line parallel to the short side of the rectangle and running exactly through the mid-point.
Count the number of points in both half-rectangles. If they are equal (enough) then go to step 4. Otherwise, go to step 3.
Based on the distribution of points between the half-rectangles, move the line to even things up again. So if, perchance, the first cut split the points 1/3, 2/3, move the line half-way into the heavy half of the rectangle. Go to step 2. (Be careful not to get trapped here, moving the line in ever decreasing steps first in one direction, then the other.)
Now, pass each of the half-rectangles in to a recursive call to this function, at step 1.
I hope that outlines the proposal well enough. It has limitations: it will produce a number of rectangles equal to some power of 2, so adjust it if that's not good enough. I've phrased it recursively, but it's ideal for parallelisation. Each split creates two tasks, each of which splits a rectangle and creates two more tasks.
If you don't like that approach, perhaps you could start with a regular grid with some multiple (10 - 100 perhaps) of the number of rectangles you want. Count the number of points in each of these tiny rectangles. Then start gluing the tiny rectangles together until the less-tiny rectangle contains (approximately) the right number of points. Or, if it satisfies your requirements well enough, you could use this as a discretisation method and integrate it with my first approach, but only place the cutting lines along the boundaries of the tiny rectangles. This would probably be much quicker as you'd only have to count the points in each tiny rectangle once.
I haven't really thought about the running time of either of these; I have a preference for the former approach 'cos I do a fair amount of parallel programming and have oodles of processors.
You're after a standard Kd-tree or binary space partitioning tree, I think. (You can look it up on Wikipedia.)
Since you have very many points, you may wish to only approximately partition the first few levels. In this case, you should take a random sample of your 200M points--maybe 200k of them--and split the full data set at the midpoint of the subsample (along whichever axis is longer). If you actually choose the points at random, the probability that you'll miss a huge cluster of points that need to be subdivided will be approximately zero.
Now you have two problems of about 100M points each. Divide each along the longer axis. Repeat until you stop taking subsamples and split along the whole data set. After ten breadth-first iterations you'll be done.
If you have a different problem--you must provide tick marks along the X and Y axis and fill in a grid along those as best you can, rather than having the irregular decomposition of a Kd-tree--take your subsample of points and find the 0/32, 1/32, ..., 32/32 percentiles along each axis. Draw your grid lines there, then fill the resulting 1024-element grid with your points.
R-tree
Good question.
I think the area you need to investigate is "computational geometry" and the "k-partitioning" problem. There's a link that might help get you started here
You might find that the problem itself is NP-hard which means a good approximation algorithm is the best you're going to get.
Would K-means clustering or a Voronoi diagram be a good fit for the problem you are trying to solve?
That's looks like Cluster analysis.
Would a QuadTree work?
A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two dimensional space by recursively subdividing it into four quadrants or regions. The regions may be square or rectangular, or may have arbitrary shapes. This data structure was named a quadtree by Raphael Finkel and J.L. Bentley in 1974. A similar partitioning is also known as a Q-tree. All forms of Quadtrees share some common features:
They decompose space into adaptable cells
Each cell (or bucket) has a maximum capacity. When maximum capacity is reached, the bucket splits
The tree directory follows the spatial decomposition of the Quadtree

Resources