I came across a traveling salesman solution which uses Matlab script, and in its code, I found that it uses a representation called City Coordinates, which looks like:
CityCood = [0.4000,0.2439,0.1707,0.2239,0.5171;0.4439,0.1463,0.2293,0.7610,0.9414]
for 5 cities.
At this point, I have no idea how the author arrived at this representation, since from what I have seen so far, the information at hand should be a 5*5 symmetric matrix representing the distance between any two of these five cities.
So I would be grateful if anyone could give me an idea on how that coordinate-based representation works. Thanks in advance.
CityCoord (I think there's a letter missing) is a 2-by-5 array. I assume this means that CityCoord contains two coordinates (x,y) for every single city.
To create a 5-by-5 distance matrix, you can call
squareform(pdist(CityCoord'))
If you don't have the Statistics Toolbox, an equivalent form to the solution provided by @Jonas to compute the Euclidean distance is:
%# dist(u,v) = norm(u-v) = sqrt(sum((u-v).^2))
D = cell2mat( arrayfun( ...
    @(i) sqrt( sum( bsxfun(@minus, CityCoord, CityCoord(:,i)).^2 ) ), ...
    (1:size(CityCoord,2))', ...
    'UniformOutput',false) );
Otherwise, we can use the fact that ||u-v||^2 = ||u||^2 + ||v||^2 - 2*u.v to implement an even faster vectorized code:
X = sum(CityCoord.^2);
D = real( sqrt(bsxfun(@plus,X,X')-2*(CityCoord'*CityCoord)) );
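For readers without Matlab, the same idea can be sketched in Python with NumPy. This is only an illustration: the name city_coord and the helper distance_matrix are hypothetical, and the numbers are the ones from the question. Each column is treated as one city's (x, y) pair, and the 5-by-5 matrix of Euclidean distances is built by broadcasting.

```python
import numpy as np

# Coordinates from the question: row 0 holds x, row 1 holds y, one column per city.
city_coord = np.array([
    [0.4000, 0.2439, 0.1707, 0.2239, 0.5171],
    [0.4439, 0.1463, 0.2293, 0.7610, 0.9414],
])

def distance_matrix(coords):
    """Return the symmetric matrix of Euclidean distances between columns."""
    pts = coords.T                              # shape (n_cities, 2)
    diff = pts[:, None, :] - pts[None, :, :]    # all pairwise coordinate differences
    return np.sqrt((diff ** 2).sum(axis=-1))    # shape (n_cities, n_cities)

D = distance_matrix(city_coord)
print(D.shape)
```

The result is the symmetric matrix with a zero diagonal that the question expected as input, so the coordinate form and the distance-matrix form carry the same information for a Euclidean TSP.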
I have a set of points W={(x1, y1), (x2, y2),..., (xn, yn)} on the 2D plane. Can you find an algorithm that takes these points as the input and returns a point (x, y) on the 2D plane which has the minimum sum of distances from the points in W? In other words, if
di = Euclidean_distance((x, y), (xi, yi))
I want to minimize:
d1 + d2 + ... + dn
The Problem
You're looking for the geometric median.
An Easy Solution
There is no closed-form solution to this problem, so iterative or probabilistic methods are used. The easiest way to find this is probably with Weiszfeld's algorithm:
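In its usual form, Weiszfeld's algorithm repeatedly replaces the current estimate y_k by a distance-weighted average of the data points x_i (the symbols y_k and x_i are my notation, not the answer's):

```latex
y_{k+1} = \frac{\sum_{i=1}^{n} x_i / \lVert x_i - y_k \rVert}{\sum_{i=1}^{n} 1 / \lVert x_i - y_k \rVert}
```

The update is undefined when y_k coincides with one of the data points, because one of the distances is then zero.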
We can implement this in Python as follows:
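import numpy as np
from numpy.linalg import norm as npnorm

# Random test data (same setup as in the SOCP example further down)
POINT_NUM = 100
pts = np.random.rand(POINT_NUM, 2)

c_pt_old = np.random.rand(2)       # previous estimate
c_pt_new = np.array([0.0, 0.0])    # current estimate
while npnorm(c_pt_old - c_pt_new) > 1e-6:
    num = 0
    denom = 0
    for i in range(POINT_NUM):
        dist = npnorm(c_pt_new - pts[i, :])
        num += pts[i, :] / dist    # distance-weighted sum of the points
        denom += 1 / dist          # sum of the weights
    c_pt_old = c_pt_new
    c_pt_new = num / denom

print(c_pt_new)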
There's a chance that Weiszfeld's algorithm won't converge, so it might be best to run it several times from different starting points.
A General Solution
You can also find this using second-order cone programming (SOCP). In addition to solving your specific problem, this general formulation then allows you to easily add constraints and weightings, such as variable uncertainty in the location of each data point.
To do so, you create a number of indicator variables representing the distance between the proposed center point and the data points.
You then minimize the sum of the indicator variables. The result follows:
import cvxpy as cp
import numpy as np
import matplotlib.pyplot as plt
#Generate random test data
POINT_NUM = 100
pts = np.random.rand(POINT_NUM,2)
c_pt = cp.Variable(2) #The center point we wish to locate
distances = cp.Variable(POINT_NUM) #Distance from the center point to each data point
#Generate constraints. These are used to hold distances.
constraints = []
for i in range(POINT_NUM):
    constraints.append( cp.norm(c_pt-pts[i,:])<=distances[i] )
objective = cp.Minimize(cp.sum(distances))
problem = cp.Problem(objective,constraints)
optimal_value = problem.solve()
print("Optimal value = {0}".format(optimal_value))
print("Optimal location = {0}".format(c_pt.value))
plt.scatter(x=pts[:,0], y=pts[:,1], s=1)
plt.scatter(c_pt.value[0], c_pt.value[1], s=10)
plt.show()
SOCPs are available in a number of solvers including CPLEX, Elemental, ECOS, ECOS_BB, GUROBI, MOSEK, CVXOPT, and SCS.
I've tested and the two approaches give the same answers to within tolerance.
Weiszfeld, E. (1937). "Sur le point pour lequel la somme des distances de n points donnés est minimum". Tohoku Mathematical Journal. 43: 355–386.
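If that point does not need to be from your sample, then the mean minimises the sum of squared Euclidean distances. Note that this is not the same objective as the sum of (unsquared) distances asked about in the question.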
A third method would be to use a compact nonlinear programming formulation. An unconstrained NLP model would be:
min sum(i, ||x-p(i)|| )
This has just 2 variables (the coordinates of x).
There is a very good initial point available. Let p(i,c) be the coordinates of the data points. Then the mean is
m(c) = sum(i, p(i,c)) / n
where n is the number of data points. This point is often very close to the optimal value of x. So we can use m as an excellent initial point for x.
Some limited experiments indicate this approach is considerably faster than a cone programming formulation for large n.
For details see Yet Another Math Programming Consultant - Finding the Central Point in a Point Cloud blog post.
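A minimal sketch of this unconstrained formulation, assuming SciPy is available. The function name geometric_median_nlp and the choice of the Nelder-Mead method are my own assumptions (Nelder-Mead needs no gradient, which avoids trouble where the objective is not differentiable); the blog post referenced above may use a different solver.

```python
import numpy as np
from scipy.optimize import minimize

def geometric_median_nlp(points):
    """Minimise sum_i ||x - p_i|| over x, starting from the mean of the points."""
    points = np.asarray(points, dtype=float)

    def total_distance(x):
        return np.linalg.norm(points - x, axis=1).sum()

    start = points.mean(axis=0)                      # the initial point m
    result = minimize(total_distance, start, method="Nelder-Mead")
    return result.x, total_distance(result.x), total_distance(start)

# Small example in which the mean is visibly not the minimiser.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
center, best, at_mean = geometric_median_nlp(pts)
print(center, best, at_mean)
```

For these four points the outlier at (10, 10) pulls the mean far away from the other three, while the minimiser of the sum of distances stays close to them.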
I have a matrix of around 1000 geospatial points (longitude, latitude) and I am trying to find the points that are within a 1 km range.
NOTE: The points are dynamic. Imagine 1000 vehicles moving, so I have to recalculate all distances every few seconds.
I did some searching and read about graph algorithms (like Floyd–Warshall) to solve this, and I ended up with many keywords, and I am kind of lost now. I care about performance, and since the search radius is short, I will not consider the curvature of the earth.
Basically, it appears that I have to calculate the distance from every point to every other point, then sort the distances starting from every point in the matrix and get the points that are in its range. So if I have 1000 coordinates, I have to perform this process (1000^2-1000) times, and I do not believe this is the optimum solution. Thank you.
If you make a model with a grid of 1 km spacing:
     0    1    2    3
   ____|____|____|____
0      |    |    |
      c|   b|a   |   d
   ____|____|____|____
1      |    |    |
       |    |f   |
   ____|e___|____|____
2      |    |g   |
Let's assume your starting point is a.
If your grid cells are 1 km in size, points within 1 km of a have to be in the same cell or in one of the 8 neighbouring cells (points b, d, e, f are candidates).
Every other cell can be ignored (c, g).
Although d may be nearly as far from a as c is, c can be dropped early because there are 2 cell borders to cross between a and c. Here a and d lie at opposite sides of their cells, and are therefore nearly 2 km away from each other, but d still has to be checked.
For dropping points early, it is enough to check the x- or y-part of the cell coordinate. Since a belongs to cell (0,2) (row 0, column 2), a point whose column index is 0 or smaller, or greater than 3, is already out of range.
After this filtering only a few candidates remain, and you can search them exhaustively.
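The grid idea above can be sketched in Python as follows. This is a minimal illustration under the question's flat-earth assumption, with coordinates already converted to kilometres; the function name pairs_within and the sample data are hypothetical. Points are bucketed into square cells whose side equals the search radius, and each point is compared only with points in its own cell and the 8 neighbouring cells.

```python
import math
from collections import defaultdict

def pairs_within(points, radius):
    """Return the set of index pairs (i, j), i < j, whose distance is <= radius."""
    # Bucket every point into a square cell with side length equal to the radius.
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / radius), math.floor(y / radius))].append(idx)

    pairs = set()
    for (cx, cy), members in grid.items():
        # Only the cell itself and its 8 neighbours can hold points in range.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:
                            px, py = points[i]
                            qx, qy = points[j]
                            if math.hypot(px - qx, py - qy) <= radius:
                                pairs.add((i, j))
    return pairs

sample = [(0.0, 0.0), (0.5, 0.0), (0.9, 0.0), (1.1, 0.0),
          (3.0, 3.0), (3.2, 3.9), (5.0, 5.0)]
print(sorted(pairs_within(sample, 1.0)))
```

Because the vehicles move, the grid would be rebuilt (or updated) on every tick; rebuilding is a single pass over the points, so this stays cheap for 1000 points.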
In your case, you should be looking at the GeoHash which allows you to quickly query the coordinates within a given distance.
FYI, MongoDB uses geohash internally and it's performing excellently.
Try with an R-Tree. The R-Tree supports the operation to find all the points closest to a given point that are not further away than a given radius. The execution time is optimal and I think it's O(number_of_points_in_the_result).
You could compute the geocodes covering a 1 km range around each of those 1000 coordinates and check whether some points fall in that range. Maybe it's not optimal, but you will save yourself some sorting.
If you want to look up each point against every other point, then you already have the right formula (1000^2-1000). There isn't any shortcut for this calculation. However, when you know where to start the search and you want to look for points within a 1 km radius, you can use a grid or a spatial index to speed up the lookup. Most likely it uses a divide and conquer algorithm, and the cheapest of these is a geohash or a Z-curve. You can also try a kd-tree. Maybe this is even simpler. But if your points are in Euclidean space, then there is the planar method described here: http://en.wikipedia.org/wiki/Closest_pair_of_points_problem.
Edit: When I say 1000^2-1000 I mean the size of the grid, but there are actually only 1000 * (1000 − 1) / 2 distinct pairs of points, so a lot less math.
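As a sketch of the kd-tree suggestion, assuming SciPy is available: scipy.spatial.cKDTree has a query_pairs method that returns all index pairs closer than a given distance. The random positions below are made-up test data in kilometres, standing in for the vehicle coordinates after projection to a flat plane.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical test data: 1000 vehicle positions in a 20 km x 20 km square.
rng = np.random.default_rng(0)
positions_km = rng.uniform(0.0, 20.0, size=(1000, 2))

tree = cKDTree(positions_km)             # rebuilt whenever the positions change
close_pairs = tree.query_pairs(r=1.0)    # set of (i, j) with i < j and distance <= 1 km
print(len(close_pairs))
```

Rebuilding the tree every few seconds is usually acceptable at this size, since construction for 1000 points is fast compared with the refresh interval.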
I have something sort of similar on a web page I worked on, I think. The user clicks a location on the map and enters a radius, and a function returns all the locations within a database within the given radius. Do you mean you are trying to find the points that are within 1km of one of the points in the radius? Or are you trying to find the points that are within 1km of each other? I think you should do something like this.
radius = given radius;
x1 = latitude of given point;
y1 = longitude of given point;
x2 = null;
y2 = null;
x = null;
y = null;
dist = null;
for ( i=0; i<locationArray.length; i++ ) {
    x2 = locationArray[i].latitude;
    y2 = locationArray[i].longitude;
    x = x1 - x2;
    y = y1 - y2;
    // dist is in the same units as the coordinates, so radius must use those units too
    dist = sqrt(x*x + y*y);
    if (dist <= radius) {
        // these are your points
    }
}
If you are trying to calculate all of the points that are within 1km of another point, you could add an outer loop giving the information of x1 and y1, which would then make the inner loop test the distance between the given point and every other point giving every point in your matrix as input. The calculations shouldn't take too long, since it is so basic.
I had the same problem, but in web service development.
In my case, to avoid the calculation time problem, I used a simple divide & conquer solution: the idea was to calculate the distances between the new point and the others at every new data insertion, so that my application can directly access the distance between two points, which has already been calculated and stored in my database.
I'm trying to find the shortest path between two points in a grid with no obstacles, where I can move in all directions (N, NE, E, SE, S, SW, W, NW).
It seems to be a common task... Is this not implemented already in Matlab? When Matlab plots two points joined by a line ( plot(X,Y,'-') ) seems to internally do this calculation as I guess that the generated image is a grid too.
Example: From [1,1] to [3,6] one solution is [1,1; 2,2; 2,3; 2,4; 3,5; 3,6]
I have tried:
dist_x = length(linspace(p1(1),p2(1), dist(p1(1),p2(1))+1));
dist_y = length(linspace(p1(2),p2(2), dist(p1(2),p2(2))+1));
num_points = max(dist_x, dist_y);
x = round(linspace(p1(1),p2(1),num_points));
y = round(linspace(p1(2),p2(2),num_points));
But I think that it returns more points than it should and maybe there is an implemented routine.
Thanks a lot
The solution (given by J.F. Sebastian) is the Bresenham Line Algorithm.
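For illustration, here is a minimal Python sketch of the integer form of Bresenham's line algorithm (the function name bresenham is my own, and the thread itself is about Matlab, so treat this as a language-neutral outline). It visits max(|dx|, |dy|) + 1 grid cells, moving one step horizontally, vertically or diagonally each time.

```python
def bresenham(x0, y0, x1, y1):
    """Return the grid cells on a line from (x0, y0) to (x1, y1), inclusive."""
    dx = abs(x1 - x0)
    dy = abs(y1 - y0)
    sx = 1 if x0 < x1 else -1      # step direction in x
    sy = 1 if y0 < y1 else -1      # step direction in y
    err = dx - dy                  # running error term
    cells = []
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 > -dy:               # time to step in x
            err -= dy
            x0 += sx
        if e2 < dx:                # time to step in y
            err += dx
            y0 += sy
    return cells

path = bresenham(1, 1, 3, 6)
print(path)
```

For the example in the question, from [1,1] to [3,6], this returns 6 cells, the same count as the hand-written solution, although the intermediate cells may differ because several shortest paths exist.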
Given a set of points, what's the fastest way to fit a parabola to them? Is it doing the least squares calculation or is there an iterative way?
Thanks
Edit:
I think gradient descent is the way to go. The least squares calculation would have been a little bit more taxing (having to do qr decomposition or something to keep things stable).
If the points have no error associated, you may interpolate by three points. Otherwise least squares or any equivalent formulation is the way to go.
I recently needed to find a parabola that passes through 3 points.
suppose you have (x1,y1), (x2,y2) and (x3,y3) and you want the parabola
y-y0 = a*(x-x0)^2
to pass through them: find y0, x0, and a.
You can do some algebra and get this solution (providing the points aren't all on a line) :
let c = (y1-y2) / (y2-y3)
x0 = ( -x1^2 + x2^2 + c*( x2^2 - x3^2 ) ) / (2.0*( -x1+x2 + c*x2 - c*x3 ))
a = (y1-y2) / ( (x1-x0)^2 - (x2-x0)^2 )
y0 = y1 - a*(x1-x0)^2
Note in the equation for c if y2==y3 then you've got a problem. So in my algorithm I check for this and swap say x1, y1 with x2, y2 and then proceed.
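The formulas above can be sketched in Python as follows. The function name parabola_through is hypothetical. The reordering step is my generalisation of the swap described above: the point placed in the middle is one whose y differs from both of the others, so that neither c nor a divides by zero.

```python
def parabola_through(p1, p2, p3):
    """Return (x0, y0, a) such that y - y0 = a*(x - x0)**2 passes through the points."""
    pts = [p1, p2, p3]
    # Put a point in the middle whose y differs from both others (avoids 0 denominators).
    for k in range(3):
        mid = pts[k]
        first, last = pts[:k] + pts[k + 1:]
        if mid[1] != first[1] and mid[1] != last[1]:
            break
    else:
        raise ValueError("all three y values are equal: the points lie on a horizontal line")
    (x1, y1), (x2, y2), (x3, y3) = first, mid, last

    c = (y1 - y2) / (y2 - y3)
    # This division fails (ZeroDivisionError) when the three points are collinear.
    x0 = (-x1**2 + x2**2 + c * (x2**2 - x3**2)) / (2.0 * (-x1 + x2 + c * x2 - c * x3))
    a = (y1 - y2) / ((x1 - x0)**2 - (x2 - x0)**2)
    y0 = y1 - a * (x1 - x0)**2
    return x0, y0, a

print(parabola_through((-1, 2), (0, 1), (2, 5)))
```

The three sample points lie on y = x^2 + 1, so the vertex should come out at (0, 1) with a = 1.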
hope that helps!
Paul Probert
A calculated solution is almost always faster than an iterative solution. The "exception" would be for low iteration counts and complex calculations.
I would use the least squares method. I've only ever coded it for linear regression fits but it can be used for parabolas (I had reason to look it up recently; sources included an old edition of "Numerical Recipes" by Press et al. and "Engineering Mathematics" by Kreyszig).
ALGORITHM FOR PARABOLA
Read the number of data points n and the order of the polynomial Mp.
Read the data values.
If n < Mp + 1
    [ Regression is not possible ]
    stop
else
    continue;
Set M = Mp + 1;
Compute the coefficients of the C-matrix.
Compute the coefficients of the B-matrix.
Solve for the coefficients
    a1, a2, ..., aM.
Write the coefficients.
Estimate the function value at the given values of the independent variable.
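The steps above can be sketched with NumPy as follows. This is a minimal illustration, not the answer's own code: the function name fit_polynomial is hypothetical, and instead of forming the C- and B-matrices explicitly it hands the design matrix to numpy.linalg.lstsq, which solves the same least squares problem in a numerically stable way.

```python
import numpy as np

def fit_polynomial(x, y, order=2):
    """Least squares fit; returns coefficients a[0] + a[1]*x + ... + a[order]*x**order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) < order + 1:
        raise ValueError("regression is not possible: too few data points")
    design = np.vander(x, order + 1, increasing=True)   # columns 1, x, x**2, ...
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

coeffs = fit_polynomial([-2, -1, 0, 1, 2], [5, 2, 1, 2, 5])
print(coeffs)                          # constant, linear and quadratic coefficient
print(np.polyval(coeffs[::-1], 3.0))   # estimate the fitted function at x = 3
```

The sample data lie exactly on y = x^2 + 1, so the fit should recover that parabola up to rounding.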
Using the free arbitrary accuracy math program "PARI" (for Mac or PC):
Here is how I would fit a parabola to a set of 641 points,
and I also show how to find the minimum of that parabola:
Set a high number of digits of precision:
\p 300
Write the data points to a text file separated by one space
for each data point
(use ASCII characters in base ten, no space at file start or file end, and no returns, write extremely large or small floating points as for example
"9.0E-23" but not "9.0D-23" ).
make a string to point to that file:
fileone="./desktop/data.txt"
read that file into PARI using the following instructions:
fileopen(fileone,r)
readsplit(file) = my(cmd);cmd="perl -ne \"chomp; print '[' . join(',', split(/ +/)) . ']\n';\"";eval(externstr(Str(cmd," ",file)))
readsplit(fileone)
Label that data with a name:
in = %
V = in[1]
Define a least squares fit function:
lsf(X,Y,n) = my(M=matrix(#X,n+1,i,j,X[i]^(j-1)));fit=Polrev(matsolve(M~*M,M~*Y~))
Apply that lsf function to your 641 data points:
lsf([-320..320],V, 2)
Then if you want to show the minimum of that parabolic fit, enter:
xextreme = solve (x=-1000,1000,eval(deriv(fit)));print (xextreme*(124.5678-123.5678)/640+(124.5678+123.5678)/2);x=xextreme;print(eval(fit))
(I had to adjust for my particular x-axis scaling before the "print" statement in that command line above).
(Note: A sacrifice made to simplify this algorithm causes it to work only when the data set has equally spaced x-axis coordinates.)
I was worried that my last post was too compact to follow and too hard to convert to other environments. I would like to show here how to solve the generalized problem of parabolic data fitting explicitly, without specialized matrix math terminology, and so that each multiplication, division, subtraction and addition can be seen at once.
To save ink this fit reparameterizes the x-axis as evenly spaced points centered on zero so that odd powered sums all get eliminated (saving a lot of space and time), so the x-coordinates of the N data points are effectively labeled by points of this vector: X=[-(N-1)/2..(N-1)/2]. For example "xextreme" will be returned versus those integer indices and so (if desired) a simple (consumes very little CPU time) linear transformation must be applied after the algorithm below to get it versus your problem's particular x-axis labels.
This is written in the language of the free program "PARI" but all the commands are simple to translate to any language.
Step 1: assign a label to the y-axis data:
? V=[5,2,1,2,5]
"PARI" confirms that entry:
%280 = [5, 2, 1, 2, 5]
Then type in the following processing algorithm, which calculates a best fit parabola through any y-axis data set with constant x-axis separation:
? g=#V;h=(g-1)*g*(g+1)/3;i=h*(3*g*g-7)/5;\
a=sum(i=1,g,V[i]);b=sum(i=1,g,(2*i-1-g)*V[i]);c=sum(i=1,g,(2*i-1-g)*(2*i-1-g)*V[i]);\
A=matdet([a,c;h,i])/matdet([g,h;h,i]);B=b/h*2;C=matdet([g,h;a,c])/matdet([g,h;h,i])*4;\
xextreme=-B/(2*C);yextreme=-B*B/(4*C)+A;fit=Polrev([A,B,C]);\
print("\n","y of extreme is ",yextreme,"\n","which occurs this many data points from center of data: ",xextreme)
(Note for non-PARI users: the command "matdet([a,c;h,i])" is just another way of entering "a*i-c*h")
Those commands then produce the following screen output:
y of extreme is 1
which occurs this many data points from center of data: 0
The algorithm stores the polynomial of the fit in the variable "fit":
? fit
%282 = x^2 + 1
?
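As a rough translation of the PARI commands above into Python (the function name fit_parabola_equally_spaced and the variable names s0, s1, s2 and k are my own), the closed-form fit for equally spaced data might look like this. As in the PARI version, x is measured in data points from the center of the data.

```python
def fit_parabola_equally_spaced(values):
    """Fit y = A + B*x + C*x**2 to equally spaced data, x centered on zero."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 data points")
    h = (n - 1) * n * (n + 1) / 3        # sum of u**2, where u = 2*x
    k = h * (3 * n * n - 7) / 5          # sum of u**4
    u = [2 * i - 1 - n for i in range(1, n + 1)]   # doubled centered indices
    s0 = sum(values)
    s1 = sum(ui * v for ui, v in zip(u, values))
    s2 = sum(ui * ui * v for ui, v in zip(u, values))
    det = n * k - h * h
    A = (s0 * k - s2 * h) / det
    B = 2 * s1 / h
    C = 4 * (n * s2 - h * s0) / det
    x_extreme = -B / (2 * C)             # fails if the data are exactly linear (C == 0)
    y_extreme = A - B * B / (4 * C)
    return A, B, C, x_extreme, y_extreme

print(fit_parabola_equally_spaced([5, 2, 1, 2, 5]))
```

With the same input V=[5,2,1,2,5] this should reproduce the fit x^2 + 1 and an extreme of 1 at the center of the data.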
(Note that to make that algorithm short the x-axis labels are assigned as X=[-(N-1)/2..(N-1)/2], thus they are X=[-2,-1,0,1,2]. To correct that for the same polynomial as parameterized by an x-axis coordinate data set of say X=[−1,0,1,2,3], just apply a simple linear transform, in this case: "x^2 + 1" --> "(t - 1)^2 + 1".)
A-B-C-D are 4 points. We define r = length(B-C), the angles ang1 = (A-B-C) and ang2 = (B-C-D), and the torsion angle tors1 = (A-B-C-D). What I really need to do is to find the coordinates of C and D provided that I have the new values of r, ang1, ang2 and tors1.
The thing is that the points A and B are rigidly connected to each other, and points C and D are also connected to each other by a rigid connector, so to speak. That is the distance (C-D) remains fixed and also distance A-B remains fixed. There is no such rigid connection between the points B and C.
We have the old coordinates of the 4 points for some other set of (r,ang1,ang2,tors1) and we need to find the new coordinates when this defining set of variables changes to some arbitrary value.
I would be grateful for any helpful comments.
Thanks a lot.
I'm not allowed to post a picture because I'm a new user :(
Additional Info: An iterative solution is not going to be useful because I need to do this in a simulation "plenty of times O(10^6)".
I think the best way to approach this problem would be to think in terms of analytic geometry.
Each point A, B, C, D has some 3D coordinates (x,y,z) and you have some relationships between them. For example, distance B-C being equal to r means that
r = sqrt[ (x_b - x_c)^2 + (y_b - y_c)^2 + (z_b - z_c)^2 ]
Once you define such relations it remains to solve the resulting system of equations for the unknown values of coordinates of the points you need to determine.
This is a general approach, if you describe the problem better (maybe a picture?) it might be easy to find some efficient ways of solving such systems because of some special properties your problem has.
You haven't mentioned the coordinate system. Even if (r, a1, a2, t) don't change, the "coordinates" will change if the whole structure can be sent whirling off into space. So I'll make some assumptions:
Put B at the origin, C on the positive X axis and A in the XY plane with y>0. If you don't know the distance AB, calculate it from the old coordinates. Likewise CD.
A: (-AB cos(a1), AB sin(a1), 0)
B: (0, 0, 0)
C: (r, 0, 0)
D: (r + CD cos(a2), CD sin(a2) cos(t), CD sin(a2) sin(t))
(Just watch out for sign conventions in the angles.)
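Here is a small Python sketch of this construction that also checks the resulting geometry. It assumes one particular convention: ang1 and ang2 are the interior angles at B and C. Under that assumption the cosine terms get the opposite sign from the coordinates listed above, which is exactly the kind of convention issue to watch out for. The function names place_points and angle_at are hypothetical.

```python
import numpy as np

def place_points(ab, cd, r, ang1, ang2, tors):
    """B at the origin, C on the +x axis, A in the xy-plane with y > 0.

    Assumes ang1 = angle A-B-C and ang2 = angle B-C-D are interior angles (radians).
    """
    A = ab * np.array([np.cos(ang1), np.sin(ang1), 0.0])
    B = np.zeros(3)
    C = np.array([r, 0.0, 0.0])
    D = C + cd * np.array([-np.cos(ang2),
                           np.sin(ang2) * np.cos(tors),
                           np.sin(ang2) * np.sin(tors)])
    return A, B, C, D

def angle_at(p, q, s):
    """Interior angle at q formed by the points p-q-s."""
    u, v = p - q, s - q
    return np.arccos(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

A, B, C, D = place_points(ab=1.0, cd=1.2, r=1.5, ang1=2.0, ang2=1.9, tors=0.7)
print(angle_at(A, B, C), angle_at(B, C, D), np.linalg.norm(C - B))
```

Since everything is a closed-form expression, this costs only a handful of trigonometric evaluations per update, which fits the requirement of roughly 10^6 evaluations without iteration.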
You are describing a set of constraints.
What you need to do is check, for every constraint, whether it is still satisfied, and if not, calculate the most efficient way to get it correct again.
For instance, in the case of length b-c = r, if b-c is not r anymore, make it r again by moving both b and c towards or away from each other so that the constraint is met again.
Do this for every constraint, one by one.
Then repeat a few times until the system has stabilized again (i.e. all constraints are met).
That's it.
You are asking for a solution to a nonlinear system of equations. For the mathematically inclined, I will write out the constraint equations:
Suppose you have positions of points A,B,C,D. We define vectors AB=A-B, etc., and furthermore, we use the notation nAB to denote the normalized vector AB/|AB|. With this notation, we have:
AB.AB = fixed
CD.CD = fixed
CB.CB = r*r
nAB.nCB = cos(ang1)
nDC.nBC = cos(ang2)
Let n = (nCB x nAB)/|nCB x nAB| and E = D - (DC.n) n // projection of D onto plane defined by ABC
nEC.nDC = cos(tors1)
nEC x nDC = sin(tors1) // not sure if your torsion angle is signed (if not, delete this)
where the dot (.) denotes dot product, and cross (x) denotes cross product.
Each point is defined by 3 coordinates, so there are 12 unknowns, and 6 constraint equations, leaving 6 degrees of freedom that are unconstrained. These are the 6 gauge DOFs from the translational and rotational invariance of the space.
Assuming you have old point positions A', B', C', and D', and you want to find a new solution which is "closest" (in a sense I defined) to those old positions, then you are solving an optimization problem:
minimize: AA'.AA' + BB'.BB' + CC'.CC' + DD'.DD'
subject to the constraints above.
This optimization problem has no nice properties so you will want to use something like Conjugate Gradient descent to find a locally optimal solution with the starting guess being the old point positions. That is an iterative solution, which you said is unacceptable, but there is no direct solution unless you clarify your problem.
If this sounds good to you, I can elaborate on the nitty gritty of performing the numerical optimization.
This is a different solution than the one I gave already. Here I assume that the positions of A and B are not allowed to change (i.e. positions of A and B are constants), similar to Beta's solution. Note that there are still an infinite number of solutions, since we can rotate the structure around the axis defined by A-B and all your constraints are still satisfied.
Let the coordinates of A be A[0], A[1] and A[2], and similarly for B. You want explicit equations for C and D, as you mentioned in the response to Beta's solution, so here they are:
First find the position of C. As mentioned before, there are an infinite number of possibilities, so I will pick a good one for you.
Vector AB = A-B
Normalize(AB)
int best_i = 0;
for i = 1 to 2
    if abs(AB[i]) < abs(AB[best_i])
        best_i = i
// best_i contains dimension in which AB is smallest
Vector N = Cross(AB, unit_vec[best_i]) // A good normal vector to AB
Normalize(N)
Vector T = Cross(N, AB) // AB, N, and T form an orthonormal frame
Normalize(T) // redundant, but just in case
C = B + r*AB*cos(ang1) + r*N*sin(ang1)
// Assume s is the known, fixed distance between C and D
// Update the frame
Vector BC = B-C, Normalize(BC)
N = Cross(BC, T), Normalize(N)
D = C + s*cos(ang2)*BC + s*sin(ang2)*( cos(tors1)*N +/- sin(tors1)*T )
That last plus or minus depends on how you define the orthonormal frame. Try one and see if it's what you want, otherwise it's the other sign. The notation above is pretty informal, but it gives a definite recipe for how to generate C and D from A, B, and your parameters. It also chooses a good C (which depends on a good, nondegenerate N). unit_vec[i] refers to the vector of all zeros, except for a 1 at index i. As usual, I have not tested the pseudocode above :)
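A runnable sketch of this recipe in Python with NumPy, including a check of the result. Several things here are my assumptions rather than part of the answer: angles are interior angles in radians, the torsion is measured with the common atan2-based dihedral formula, the in-plane direction is recomputed from A so that no sign has to be guessed, and the names build_c_and_d, dihedral and unit are hypothetical. It breaks down when ang1 is 0 or pi, because A, B and C are then collinear and the torsion is undefined.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def build_c_and_d(A, B, r, s, ang1, ang2, tors):
    """Place C and D for fixed A and B; s is the fixed C-D distance."""
    u = unit(A - B)
    axis = np.zeros(3)
    axis[np.argmin(np.abs(u))] = 1.0           # coordinate axis least aligned with A-B
    n = unit(np.cross(u, axis))                # an arbitrary but well-conditioned normal
    C = B + r * (np.cos(ang1) * u + np.sin(ang1) * n)
    e = unit(B - C)                            # direction from C back to B
    m = unit((A - B) - np.dot(A - B, e) * e)   # perpendicular to e, on A's side
    z = np.cross(m, e)                         # completes the frame (e, m, z)
    D = C + s * (np.cos(ang2) * e
                 + np.sin(ang2) * (np.cos(tors) * m + np.sin(tors) * z))
    return C, D

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in radians."""
    b1, b2, b3 = p1 - p0, p2 - p1, p3 - p2
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    return np.arctan2(np.dot(np.cross(n1, n2), unit(b2)), np.dot(n1, n2))

A = np.array([1.0, 2.0, 3.0])
B = np.array([0.5, -1.0, 2.0])
C, D = build_c_and_d(A, B, r=1.5, s=1.2, ang1=2.0, ang2=1.9, tors=-1.0)
print(np.linalg.norm(C - B), np.linalg.norm(D - C), dihedral(A, B, C, D))
```

As noted above, C is still only one of infinitely many valid choices, since the whole structure can be rotated about the A-B axis without violating any constraint.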