I have a number of 2D line segments that should all intersect at one point, but they don't because of noise earlier in the calculations that cannot be reduced.
Is there an algorithm to compute the best approximation to the intersection of all the line segments? Something like the point with the minimum average distance to all the line segments, which doesn't necessarily lie on any of the segments?
The first comment from Amit is your answer. I'll explain why.
Let p_i be your points of intersection and c = 1/n sum(p_i) their centroid. Let's show that c minimizes the average squared distance d(a) between the p_i and an arbitrary point a:
d(a) = 1/n sum( |a-p_i|^2 )
What is being averaged in d(a) is, using inner product notation,
|a-p_i|^2 = <a-p_i, a-p_i> = |a|^2 + |p_i|^2 - 2<a,p_i>
The average of <a,p_i> is just <a,c>, using the bilinear properties of dot product. So,
d(a) = |a|^2 - 2<a,c> + 1/n sum( |p_i|^2 )
And so likewise
d(c) = |c|^2 - 2<c,c> + 1/n sum( |p_i|^2 ) = -|c|^2 + 1/n sum( |p_i|^2 )
Subtracting the two
d(a) - d(c) = |a|^2 - 2<a,c> + |c|^2 = |a-c|^2
So, adding d(c) to both sides, the average distance to an arbitrary point a is
d(a) = d(c) + |a-c|^2
which, since |a-c|^2 is non-negative, is minimized when |a-c|^2 is zero, in other words when a = c.
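A minimal sketch of this approach, assuming the segments are given as {x1, y1, x2, y2} arrays (the method name is illustrative): intersect each pair of segments, treated as infinite lines, and average the intersection points.
// Sketch: centroid of all pairwise line intersections.
// Segments are {x1, y1, x2, y2}; (nearly) parallel pairs are skipped.
static double[] approximateIntersection(double[][] segments) {
    double sumX = 0, sumY = 0;
    int count = 0;
    for (int i = 0; i < segments.length; i++) {
        for (int j = i + 1; j < segments.length; j++) {
            double[] a = segments[i], b = segments[j];
            double d1x = a[2] - a[0], d1y = a[3] - a[1];   // direction of line i
            double d2x = b[2] - b[0], d2y = b[3] - b[1];   // direction of line j
            double denom = d1x * d2y - d1y * d2x;          // 2D cross product
            if (Math.abs(denom) < 1e-12) continue;         // (nearly) parallel, skip
            double t = ((b[0] - a[0]) * d2y - (b[1] - a[1]) * d2x) / denom;
            sumX += a[0] + t * d1x;
            sumY += a[1] + t * d1y;
            count++;
        }
    }
    return new double[] { sumX / count, sumY / count };    // the centroid c
}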
If we have the freedom to select a metric, the sum of squared distances gives a simple algorithm.
We can represent the squared distance to line #i as a function of the point coordinates; we get (A[i]x, x) + (b[i], x) + c[i], where A[i] is a 2x2 matrix (for points in the plane), b[i] is a vector, c[i] is a number, and (a, b) denotes the scalar product.
Their sum has the same form: (A[sum]x, x) + (b[sum], x) + c[sum].
The minimum of such a function is attained at x = -inverse(A[sum]) * b[sum] / 2.
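A minimal sketch of this idea, under the same {x1, y1, x2, y2} segment assumption as above: instead of forming -inverse(A)b/2 explicitly, it accumulates the equivalent 2x2 normal equations sum(I - d d^T) x = sum((I - d d^T) p), where d is the unit direction and p a point of line i, and solves them directly.
// Sketch: point minimizing the sum of squared distances to the (infinite) lines.
static double[] leastSquaresIntersection(double[][] segments) {
    double a11 = 0, a12 = 0, a22 = 0, r1 = 0, r2 = 0;
    for (double[] s : segments) {
        double dx = s[2] - s[0], dy = s[3] - s[1];
        double len = Math.hypot(dx, dy);
        dx /= len; dy /= len;                       // unit direction d
        double m11 = 1 - dx * dx;                   // I - d d^T
        double m12 = -dx * dy;
        double m22 = 1 - dy * dy;
        a11 += m11; a12 += m12; a22 += m22;
        r1 += m11 * s[0] + m12 * s[1];              // (I - d d^T) p, with p = first endpoint
        r2 += m12 * s[0] + m22 * s[1];
    }
    double det = a11 * a22 - a12 * a12;             // assumes the lines are not all parallel
    return new double[] { (a22 * r1 - a12 * r2) / det,
                          (a11 * r2 - a12 * r1) / det };
}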
I am working on a genetic algorithm. Here is how it works:
Input: a list of 2D points
Input: the degree of the curve
Output: the equation of the curve that fits the points best (it tries to minimize the sum of vertical distances from the points' y values to the curve)
The algorithm finds good equations for simple straight lines and for 2-degree equations.
But for 4 points and degree-3 equations and above, it gets more complicated. I cannot find the right combination of parameters: sometimes I have to wait 5 minutes and the curve found is still very bad. I have tried modifying many parameters, from the population size to the number of parents selected...
Are there well-known parameter combinations or theorems in GA programming that can help me?
Thank you ! :)
Based on what is given, you would need polynomial interpolation, in which the degree of the equation is the number of points minus 1.
n = (Number of points) - 1
Now having said that, let's assume you have 5 points that need to be fitted and I am going to define them in a variable:
var points = [[0,0], [2,3], [4,-1], [5,7], [6,9]]
Note that the array of points has been ordered by the x values, which you need to do as well.
Then the equation would be:
f(x) = a1*x^4 + a2*x^3 + a3*x^2 + a4*x + a5
Now, based on the definition (https://en.wikipedia.org/wiki/Polynomial_interpolation#Constructing_the_interpolation_polynomial), the coefficients are found by solving a system of linear equations; use the referenced page to set that system up and compute them.
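For illustration, a minimal sketch of that interpolation (not a GA): it builds the Vandermonde system described on the referenced page and solves it with plain Gaussian elimination. In practice a linear algebra library would be preferable; distinct x values are assumed.
// Sketch: exact polynomial interpolation through the given points via the Vandermonde system.
// points[i] = {x, y}; returns coefficients c0..cn of c0 + c1*x + ... + cn*x^n.
static double[] interpolate(double[][] points) {
    int m = points.length;                       // m = n + 1 equations
    double[][] a = new double[m][m + 1];         // augmented matrix [V | y]
    for (int i = 0; i < m; i++) {
        double pow = 1;
        for (int j = 0; j < m; j++) {
            a[i][j] = pow;                       // x_i^j
            pow *= points[i][0];
        }
        a[i][m] = points[i][1];                  // right-hand side y_i
    }
    // Gaussian elimination with partial pivoting.
    for (int col = 0; col < m; col++) {
        int pivot = col;
        for (int r = col + 1; r < m; r++)
            if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
        double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
        for (int r = col + 1; r < m; r++) {
            double f = a[r][col] / a[col][col];
            for (int k = col; k <= m; k++) a[r][k] -= f * a[col][k];
        }
    }
    double[] c = new double[m];
    for (int i = m - 1; i >= 0; i--) {           // back substitution
        double s = a[i][m];
        for (int j = i + 1; j < m; j++) s -= a[i][j] * c[j];
        c[i] = s / a[i][i];
    }
    return c;
}
For example, double[] c = interpolate(new double[][]{{0,0},{2,3},{4,-1},{5,7},{6,9}}); gives the degree-4 polynomial through the five sample points above.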
It is not that complicated: for a polynomial interpolation of degree n you get the following equation:
p(x) = c0 + c1 * x + c2 * x^2 + ... + cn * x^n = y
This means we need n + 1 genes for the coefficients c0 to cn.
The fitness function is the sum of the squared distances from all points to the curve; the formula for the squared distance is below. This way a smaller value is better; if you don't want that, you can take the inverse (1 / sum of squared distances):
d_squared(xi, yi) = (yi - p(xi))^2
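As an illustration, assuming the coefficient genes are stored lowest degree first in a plain array (names are illustrative), the fitness evaluation could look like this:
// Sketch: sum of squared vertical distances of the points to the candidate polynomial.
// genes[k] is the coefficient c_k; points[i] = {x, y}. Smaller is better.
static double fitness(double[] genes, double[][] points) {
    double sum = 0;
    for (double[] pt : points) {
        double p = 0, pow = 1;
        for (double c : genes) {            // evaluate p(x) = c0 + c1*x + ... + cn*x^n
            p += c * pow;
            pow *= pt[0];
        }
        double d = pt[1] - p;               // vertical distance y_i - p(x_i)
        sum += d * d;
    }
    return sum;
}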
I think for faster convergence you could limit the mutation, e.g. when mutating, choose a new value between min and max (e.g. -1000 and 1000) with 20% probability, and with 80% probability multiply the old value by a random factor between 0.8 and 1.2.
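A minimal sketch of that mutation rule, using the bounds and probabilities suggested above (everything else is illustrative):
import java.util.Random;

// Sketch: mutate one gene, mostly by a small multiplicative tweak, occasionally by a full reset.
static double mutate(double oldValue, Random rng) {
    if (rng.nextDouble() < 0.2) {
        return -1000 + 2000 * rng.nextDouble();        // 20%: new value in [-1000, 1000)
    }
    double factor = 0.8 + 0.4 * rng.nextDouble();      // 80%: factor in [0.8, 1.2)
    return oldValue * factor;
}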
Given n points in the 2-D plane, like (0,0), (1,1), ..., we can select any three of them to form an angle. For example, if we choose A(0, 0), B(1, 1), C(1, 0), then we get angle ABC = 45 degrees, ACB = 90 degrees and CAB = 45 degrees.
My question is how to calculate max or min angle determined by three points selected from n points.
Obviously, we can use a brute-force algorithm: calculate all angles and find the maximal and minimal values, using the Law of Cosines to calculate angles and the Pythagorean theorem to calculate distances. But does a more efficient algorithm exist?
If I'm correct, brute-force runs in O(n^3): you basically take every triplet, compute the 3 angles, and store the overall max.
You can improve slightly to O(n^2 * log(n)) but it's trickier:
best_angle = 0
for each point p1:
    A = empty array
    for each other point p2:
        compute the vector (p1, p2), and the signed angle it makes with the X-axis
        store that angle in A
    sort array A                     # O(n * log(n))
    # Traverse A to find the best choice:
    for each alpha in A:
        look for the element beta in A closest to alpha + Pi
        # Takes O(log n) because A is sorted. Don't forget to take into account
        # the fact that A represents a circle: both ends touch...
        best_angle = max(best_angle, abs(beta - alpha))
The complexity is O(n * (n + nlog(n) + n * log(n))) = O(n^2 * log(n))
Of course you can also retrieve the pt1, pt2 that obtained the best angle during the loops.
There is probably something better still; this feels like doing too much work overall, even if you reuse the computations done for pt1 when handling pt2, ..., ptn...
Input: S = {p1, ..., pn}, n points on the 2D plane; each point is given by its x- and y-coordinates.
For simplicity, we assume:
The origin (0, 0) is NOT in S.
Any line L passing through (0, 0) contains at most one point in S.
No three points in S lie on the same line.
If we pick any three points from S, we can form a triangle. So the total number of triangles that can be formed this way is Θ(n^3).
Some of these triangles contain (0, 0), some do not.
Problem: Calculate the number of triangles that contain (0, 0).
You may assume we have an O(1) time function Test(pi, pj , pk) that, given three points pi, pj , pk in S, returns 1, if the triangle formed by {pi, pj , pk} contains (0, 0), and returns 0 otherwise. It’s trivial to solve the problem in Θ(n^3) time (just enumerate and test all triangles).
Describe an algorithm for solving this problem with O(n log n) run time.
My analysis of the above problem leads to the following conclusion
There are 4 quadrants: (+, +), (+, -), (-, -), (-, +), depending on whether the x and y coordinates are > 0 or not.
Let
s1 = points with x < 0 and y > 0
s2 = points with x > 0 and y > 0
s3 = points with x < 0 and y < 0
s4 = points with x > 0 and y < 0
Now we only need to test triples of points drawn from the following combinations of sets:
S1 S2 S3
S1 S1 S4
S2 S2 S3
S3 S3 S2
S1 S4 S4
S1 S3 S4
S1 S2 S4
S2 S3 S4
I now need to test the points in the above combinations of sets only (e.g. one point from s1, one point from s2 and one point from s3 for the first combination) and see whether the points contain (0,0) by calling the Test function (which is assumed to be a constant-time function here).
Can someone guide me on this ?
Image added below for clarification on why only some subsets (s1, s2, s4) can contain (0,0) and some (s1, s1, s3) cannot.
I'm guessing we're in the same class (based on the strange wording of the question), so now that the due date is past, I feel alright giving out my solution. I managed to find the n log n algorithm, which, as the question stated, is more a matter of cleverly transforming the problem, and less of a Dynamic Programming / DaC solution.
Note: this is not an exhaustive proof; I leave that to you.
First, some visual observations. Take some triangle that obviously contains the origin.
Then, convert the points to vectors.
Convince yourself that any selection of three points, one from each vector, describes a triangle that also contains the origin.
It also follows that, if you perform the above steps on a triangle that doesn't enclose the origin, any combination of points along those vectors will also not contain the origin.
The main point to get from this is, the magnitude of the vector does not matter, only the direction. Additionally, a hint to the question says that "any line crossing (0,0) only contains one point in S", from which we can extrapolate that the direction of each vector is unique.
So, if only the angle matters, it would follow that there is some logic that determines what range of points, given two points, could possibly form a triangle that encloses the origin. For simplicity, we'll assume we've taken all the points in S and converted them to vectors, then normalized them, effectively making all points lie on the unit circle.
So, take two points along this circle.
Then, draw a line from each point through the origin and to the opposite side of the circle.
It follows that, given the two points, any third point that lies along the red arc (the arc between the two antipodal points drawn in the previous step) forms a triangle that contains the origin.
So our algorithm should do the following:
Take each point in S. Make a secondary array A, and for each point, add its angle around the origin (atan2(y, x), shifted into [0, 2π)) to A. Let's assume this is O(n).
Sort A in increasing order. O(n log n), assuming we use merge sort.
Count the number of triangles possible for each pair (Ai, Aj). This means counting the number of Ak with Ai + π ≤ Ak ≤ Aj + π. Since the array is sorted, we can use a binary search to find the indices of Ai + π and Aj + π, which is O(2 log n) = O(log n)
However, we run into a problem: there are O(n^2) pairs, and if we have to do an O(log n) search for each, we have O(n^2 log n). So, we need to make one more observation.
Given some Ai < Aj, we'll say Tij describes the number of triangles possible, as calculated by the above method. Then, given a third Ak > Aj, we know that Tij ≤ Tik, as the number of points between Ai + π and Ak + π must be at least as many as there are between Ai + π and Aj + π. In fact, it is exactly the count between Ai + π and Aj + π, plus the count between Aj + π and Ak + π. Since we already know the count between Ai + π and Aj + π, we don't need to recalculate it - we only need to calculate the number between Aj + π and Ak + π, then add the previous count. It follows that:
A(n) = count(A(n),A(n-1)) + count(A(n-1),A(n-2)) + ... + count(A(1),A(0))
And this means we don't need to check all n^2 pairs, we only need to check consecutive pairs - so, only n-1.
So, all the above can give us the following pseudocode solution.
int triangleCount(point P[], int n)
    double A[n];                        // angles around the origin
    int C[n], totalCount = 0, runningSum = 0;
    for (i = 0 ... n-1)
        A[i] = atan2(P[i].y, P[i].x);
    mergeSort(A);                       // O(n log n)
    int midPoint = binarySearch(A, π);
    for (i = 0 ... midPoint-1)
        double left = A[i] + π, right = A[i+1] + π;
        C[i] = binarySearch(A, right) - binarySearch(A, left);
        runningSum += C[i];             // C[0] + ... + C[i], replacing the inner loop
        totalCount += runningSum;       // keeps the whole pass O(n log n)
    return totalCount;
It seems that in the worst case there are Θ(n^3) triangles containing the origin, and since you need them all, the answer is no, there is no better algorithm.
For a worst case, consider a regular polygon with an odd number n of vertices, centered at the origin.
Here is an outline of the calculation. A chord connecting two vertices which are k < n/2 vertices apart is the base for Θ(k) triangles. Fix a vertex; its contribution is a sum over all chords coming from it, yielding Θ(n^2), and the total (the contribution of all n vertices) is Θ(n^3) (each triangle is counted 3 times, which doesn't affect the asymptotics).
I am fitting a plane to a 3D point set with the least squares method. I already have an algorithm to do that, but I want to modify it to use weighted least squares, meaning I have a weight for each point (the bigger the weight, the closer the plane should be to the point).
The current algorithm (without weight) looks like this:
Compute the sum:
for (Point3D p3d : pointCloud) {
    pos = p3d.getPosition();
    fSumX  += pos[0];
    fSumY  += pos[1];
    fSumZ  += pos[2];
    fSumXX += pos[0] * pos[0];
    fSumXY += pos[0] * pos[1];
    fSumXZ += pos[0] * pos[2];
    fSumYY += pos[1] * pos[1];
    fSumYZ += pos[1] * pos[2];
}
then build the matrices:
double[][] A = {
    {fSumXX, fSumXY, fSumX},
    {fSumXY, fSumYY, fSumY},
    {fSumX,  fSumY,  pointCloud.size()}
};
double[][] B = {
    {fSumXZ},
    {fSumYZ},
    {fSumZ}
};
then solve Ax = B; the 3 components of the solution are the coefficients of the fitted plane...
So, can you please help me how to modify this to use weights? Thanks!
Intuition
A point x on a plane defined by a (unit) normal n and a point p on the plane obeys n.(x - p) = 0. If a point y does not lie on the plane, n.(y - p) will not be equal to zero, so a useful way to define a cost is |n.(y - p)|^2. For a unit normal, this is the squared distance of the point y from the plane.
With equal weights, you want to find an n that minimizes the total squared error when summing over the points:
f(n) = sum_i | n.(x_i - p) |^2
Now this assumes we know some point p that lies on the plane. We can easily compute one as the centroid, which is simply the component-wise mean of the points in the point cloud and will always lie in the least-squares plane.
Solution
Let's define a matrix M whose ith row is the point x_i minus the centroid c. Then we can rewrite:
f(n) = | M n |^2
You should be able to convince yourself that this matrix multiplication version is the same as the sum on the previous equation.
You can then take singular value decomposition of M, and the n you want is then given by the right singular vector of M that corresponds to the smallest singular value.
To incorporate weights, you simply need to define a weight w_i for each point: calculate c as the weighted average of the points, change sum_i | n.(x_i - c) |^2 to sum_i | w_i * n.(x_i - c) |^2, and build the matrix M in a similar way (scaling each row by w_i). Then solve as before.
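A minimal sketch of this recipe, assuming Apache Commons Math for the SVD (any linear algebra library with an SVD works); the rows of M are scaled by w_i as described above, and the method name is illustrative:
import org.apache.commons.math3.linear.*;

// Sketch: weighted least-squares plane via SVD.
// points is an n x 3 array, weights has length n; returns {nx, ny, nz, cx, cy, cz}.
static double[] fitPlaneWeighted(double[][] points, double[] weights) {
    int n = points.length;
    double wSum = 0, cx = 0, cy = 0, cz = 0;            // weighted centroid c
    for (int i = 0; i < n; i++) {
        wSum += weights[i];
        cx += weights[i] * points[i][0];
        cy += weights[i] * points[i][1];
        cz += weights[i] * points[i][2];
    }
    cx /= wSum; cy /= wSum; cz /= wSum;
    double[][] m = new double[n][3];                    // row i = w_i * (x_i - c)
    for (int i = 0; i < n; i++) {
        m[i][0] = weights[i] * (points[i][0] - cx);
        m[i][1] = weights[i] * (points[i][1] - cy);
        m[i][2] = weights[i] * (points[i][2] - cz);
    }
    SingularValueDecomposition svd =
        new SingularValueDecomposition(MatrixUtils.createRealMatrix(m));
    // Right singular vector for the smallest singular value = last column of V
    // (Commons Math returns singular values in non-increasing order).
    double[] normal = svd.getV().getColumn(2);
    return new double[] { normal[0], normal[1], normal[2], cx, cy, cz };
}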
Multiply each term in each sum by the corresponding weight. For example:
fSumZ += weight * pos[2];
fSumXX += weight * pos[0]*pos[0];
Since pointCloud.size() is the sum of 1 over all points, it should be replaced with the sum of all weights.
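Applied to the loop from the question, that looks roughly like this, assuming getPosition() returns a double[] and with getWeight() as a hypothetical accessor for the per-point weight:
// Sketch: weighted sums for the normal equations.
double fSumX = 0, fSumY = 0, fSumZ = 0, fSumXX = 0,
       fSumXY = 0, fSumXZ = 0, fSumYY = 0, fSumYZ = 0, fSumW = 0;
for (Point3D p3d : pointCloud) {
    double[] pos = p3d.getPosition();
    double w = p3d.getWeight();      // hypothetical accessor for the per-point weight
    fSumX  += w * pos[0];
    fSumY  += w * pos[1];
    fSumZ  += w * pos[2];
    fSumXX += w * pos[0] * pos[0];
    fSumXY += w * pos[0] * pos[1];
    fSumXZ += w * pos[0] * pos[2];
    fSumYY += w * pos[1] * pos[1];
    fSumYZ += w * pos[1] * pos[2];
    fSumW  += w;                     // replaces pointCloud.size()
}
The matrix A is then built exactly as before, with fSumW in place of pointCloud.size().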
Start from redefining the least-square error calculation. The formula tries to minimize the sum of squares of errors. Multiply the squared error by a function of two points which decreases with their distance. Then try to minimize the weighted sum of squared errors and derive the coefficients from that.
The problem:
N points are given on a 2-dimensional plane. What is the maximum number of points on the same straight line?
The problem has an O(N^2) solution: go through each point and find the number of points which have the same dx/dy ratio relative to the current point. Store the dx/dy ratios in a hash map for efficiency.
Is there a better solution to this problem than O(N^2)?
There is likely no solution to this problem that is significantly better than O(n^2) in a standard model of computation.
The problem of finding three collinear points reduces to the problem of finding the line that goes through the most points, and finding three collinear points is 3SUM-hard, meaning that solving it in less than O(n^2) time would be a major theoretical result.
See the previous question on finding three collinear points.
For your reference (using the known proof), suppose we want to answer a 3SUM problem such as finding x, y, z in list X such that x + y + z = 0. If we had a fast algorithm for the collinear point problem, we could use that algorithm to solve the 3SUM problem as follows.
For each x in X, create the point (x, x^3) (for now we assume the elements of X are distinct). Next, check whether there exist three collinear points among the created points.
To see that this works, note that if x + y + z = 0 then the slope of the line from x to y is
(y^3 - x^3) / (y - x) = y^2 + yx + x^2
and the slope of the line from x to z is
(z^3 - x^3) / (z - x) = z^2 + zx + x^2 = (-(x + y))^2 - (x + y)x + x^2
= x^2 + 2xy + y^2 - x^2 - xy + x^2 = y^2 + yx + x^2
Conversely, if the slope from x to y equals the slope from x to z then
y^2 + yx + x^2 = z^2 + zx + x^2,
which implies that
(y - z) (x + y + z) = 0,
so either y = z or z = -x - y, which suffices to prove that the reduction is valid.
If there are duplicates in X, first check whether x + 2y = 0 for any x and duplicated element y (in linear time using hashing, or in O(n lg n) time using sorting), and then remove the duplicates before reducing to the collinear point-finding problem.
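For illustration, a minimal sketch of the reduction just described, where hasThreeCollinear() is a hypothetical oracle standing in for a fast collinear-point algorithm (it is deliberately left unimplemented, which is the point of the hardness argument):
import java.util.stream.DoubleStream;

// Sketch: reduce 3SUM (are there x, y, z in X with x + y + z = 0?) to collinear point finding
// by lifting each value onto the curve y = x^3, as in the proof above.
static boolean threeSumViaCollinearity(double[] X) {
    double[] xs = DoubleStream.of(X).distinct().toArray();   // duplicates handled separately, see above
    double[][] pts = new double[xs.length][2];
    for (int i = 0; i < xs.length; i++) {
        pts[i][0] = xs[i];
        pts[i][1] = xs[i] * xs[i] * xs[i];                    // the point (x, x^3)
    }
    return hasThreeCollinear(pts);                            // collinear triple <=> zero-sum triple
}

// Hypothetical oracle; a sub-quadratic implementation is exactly what 3SUM-hardness rules out.
static boolean hasThreeCollinear(double[][] pts) {
    throw new UnsupportedOperationException();
}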
If you limit the problem to lines passing through the origin, you can convert the points to polar coordinates (angle, distance from origin) and sort them by angle. All points with the same angle lie on the same line through the origin. O(n log n)
I don't think there is a faster solution in the general case.
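A minimal sketch of that special case (the method name is illustrative; in real code exact keys, e.g. reduced integer ratios, would be more robust than floating-point angles):
import java.util.HashMap;
import java.util.Map;

// Sketch of the special case above: maximum number of points on one line through the origin.
// points[i] = {x, y}; assumes the origin itself is not in the set.
static int maxPointsThroughOrigin(double[][] points) {
    Map<Double, Integer> counts = new HashMap<>();
    int best = 0;
    for (double[] p : points) {
        double angle = Math.atan2(p[1], p[0]);   // in (-pi, pi]
        if (angle < 0) angle += Math.PI;         // fold opposite rays onto the same line
        if (angle == Math.PI) angle = 0.0;       // pi and 0 are the same line (the x-axis)
        best = Math.max(best, counts.merge(angle, 1, Integer::sum));
    }
    return best;
}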
The Hough Transform can give you an approximate solution. It is approximate because the binning technique has a limited resolution in parameter space, so the maximum bin will give you some limited range of possible lines.
Again an O(n^2) solution, with pseudocode. The idea is to create a hash table with the line itself as the key. A line is identified by the slope between the two points and the point where the line cuts the x-axis (or, for lines parallel to the x-axis, where it cuts the y-axis).
The solution assumes languages like Java or C#, where the equals and hashCode methods of the object are used by the hashing function.
Create an object (call it SlopeObject) with 3 fields:
Slope // Can be Infinity
Point of intercept with x-axis -- poix // Will be (Infinity, some y value) or (x value, 0)
Count
poix will be a point (x, y) pair. If the line crosses the x-axis, poix will be (some number, 0). If the line is parallel to the x-axis, then poix = (Infinity, some number), where the y value is where the line crosses the y-axis.
Override the equals method so that 2 objects are equal if their Slope and poix are equal.
hashCode is overridden with a function which provides a hash code based on the combination of the values of Slope and poix. Some pseudocode below:
HashMap<SlopeObject, SlopeObject> map;
foreach (point a in the array) {
    foreach (every other point b) {
        slope = calculateSlope(a, b);
        poix = calculateXInterception(a, b);
        SlopeObject so = new SlopeObject(slope, poix, 1); // slope, poix and initial count 1
        SlopeObject inMapSlopeObj = map.get(so);
        if (inMapSlopeObj == null) {
            map.put(so, so);
        } else {
            inMapSlopeObj.setCount(inMapSlopeObj.getCount() + 1);
        }
    }
}
SlopeObject maxCounted = getObjectWithMaxCount(map);
print("line is through " + maxCounted.poix + " with slope " + maxCounted.slope);
Move to the dual plane using the point-line duality transform: p = (a, b) maps to the line p*: y = a*x + b.
Now use a line sweep algorithm to find all intersection points in O((N + K) log N) time, where K is the number of intersections.
(If you have points which are one above the other, just rotate the points by some small angle.)
The intersection points in the dual plane correspond to lines in the primal plane.
To whoever said that, since 3SUM has a reduction to this problem, the complexity must be O(n^2): please note that the best known complexity of 3SUM is (slightly) less than quadratic.
Please check https://en.wikipedia.org/wiki/3SUM and also read
https://tmc.web.engr.illinois.edu/reduce3sum_sosa.pdf
As already mentioned, there probably isn't a way to solve the general case of this problem better than O(n^2). However, if you assume that a large number of points lie on the same line (say the probability that a random point in the set lies on the line with the maximum number of points is p) and you don't need an exact algorithm, a randomized algorithm is more efficient.
maxPoints = 0
Repeat for k iterations:
    1. Pick 2 distinct points uniformly at random
    2. maxPoints = max(maxPoints, number of points that lie on the
       line defined by the 2 points chosen in step 1)
Note that in the first step, if you pick 2 points which lie on the line with the maximum number of points, you'll get the optimal solution. Assuming n is very large (i.e. we can treat the probability of finding 2 desirable points as sampling with replacement), the probability of this happening is p^2. Therefore the probability of finding only a suboptimal solution after k iterations is (1 - p^2)^k.
Suppose you can tolerate a false negative rate err. Then this algorithm runs in O(nk) = O(n * log(err) / log(1 - p^2)). If both n and p are large enough, this is significantly more efficient than O(n^2). (E.g. suppose n = 1,000,000 and you know there are at least 10,000 points that lie on the same line. Then n^2 would require on the order of 10^12 operations, while the randomized algorithm would require on the order of 10^11 operations to get an error rate of less than 5*10^-5.)
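A minimal sketch of this randomized scheme; the collinearity tolerance eps and the method name are illustrative:
import java.util.Random;

// Sketch: randomized estimate of the maximum number of collinear points.
// points[i] = {x, y}; after k iterations the result is optimal with probability >= 1 - (1 - p^2)^k.
static int maxCollinearRandomized(double[][] points, int k, Random rng) {
    int n = points.length, best = 0;
    double eps = 1e-9;                              // tolerance for the collinearity test
    for (int it = 0; it < k; it++) {
        int i = rng.nextInt(n), j = rng.nextInt(n);
        if (i == j) { it--; continue; }             // need two distinct points
        double ax = points[i][0], ay = points[i][1];
        double dx = points[j][0] - ax, dy = points[j][1] - ay;
        int count = 0;
        for (double[] p : points) {
            double cross = dx * (p[1] - ay) - dy * (p[0] - ax);   // zero iff p is on the line
            if (Math.abs(cross) < eps) count++;
        }
        best = Math.max(best, count);
    }
    return best;
}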
It is unlikely that an o(n^2) algorithm exists, since the problem (even just checking whether 3 points in R^2 are collinear) is 3SUM-hard (http://en.wikipedia.org/wiki/3SUM).
This is not a solution better than O(n^2), but you can do the following (a sketch follows the steps):
1. For each point, first translate it to (0,0), and then apply the equivalent translation to all the other points, moving them by the same x, y offset you needed to move the originally chosen point.
2. Convert this new set of translated points to their angles with respect to the new (0,0).
3. Keep track of the maximum number (MSN) of points that fall at each angle.
4. Choose the maximum stored number (MSN), and that will be the solution.
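A minimal sketch of those steps (names are illustrative; as in the other angle-based approaches, exact keys would be more robust than floating-point angles):
import java.util.HashMap;
import java.util.Map;

// Sketch: for each anchor point, bucket the other points by the angle they make with it
// (equivalent to translating the anchor to (0,0)); the largest bucket plus the anchor itself
// is the answer. O(n^2) expected time.
static int maxCollinear(double[][] points) {
    int best = Math.min(points.length, 2);          // any two points are collinear
    for (double[] anchor : points) {
        Map<Double, Integer> byAngle = new HashMap<>();
        for (double[] p : points) {
            if (p == anchor) continue;
            double angle = Math.atan2(p[1] - anchor[1], p[0] - anchor[0]);
            if (angle < 0) angle += Math.PI;        // opposite directions are the same line
            if (angle == Math.PI) angle = 0.0;
            best = Math.max(best, byAngle.merge(angle, 1, Integer::sum) + 1);  // +1 for the anchor
        }
    }
    return best;
}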