How does comparing/matching images with kd-trees and nearest neighbor search work? - computational-geometry

I have been searching Google for material about kd-trees and image comparison, but I couldn't make the 'link' between the techniques for image comparison using kd-trees.
Firstly, I found some articles talking about speed improvement with randomized kd-trees, and then I was introduced to SIFT. After understanding basically how SIFT works, I read about nearest neighbor search.
My real question is: if I have a set of feature points from SIFT and I create a kd-tree for every image, how can nearest neighbor search help me compare the images? At first, I thought that comparing images with a tree would work with some algorithm checking the tree structure and how near every point of image A is to a point in the same node of image B.
If the question is too dumb, please suggest material or some topic to search for.
Thank you!

I'd suggest first understanding slow feature matching, without kd-trees.
Input: 1000 reference features, e.g. of faces or flowers; call these F1 .. F1000. A query feature Q: which face or flower feature is most like, i.e. nearest to, Q?
As you know, SIFT reduces an image feature to 128 8-bit numbers, scaled so that
similarity( feature F, feature Q ) = Euclidean distance( SIFT(F), SIFT(Q) ).
The simplest way to find which of F1 .. F1000 is most like Q is just to look at F1, F2 ... one by one:
# find the feature of F1 .. F1000 nearest Q
nearestdistance = infinity
nearestindex = 0
for j in 1 .. 1000:
    distance = Euclideandistance( SIFT(Fj), SIFT(Q) )  # 128 numbers vs. 128 numbers
    if distance < nearestdistance:
        nearestdistance = distance
        nearestindex = j
(Of course one computes the SIFT numbers outside the loop.)
A kd-tree is just a way of finding nearby vectors quickly; it has little to do with what is being matched (vectors of numbers representing ...), or how (Euclidean distance).
Now kd-trees are very fast for 2d, 3d ... up to perhaps 20d, but may be no faster than a linear scan of all the data above 20d. So how can a kd-tree work for features in 128d? The main trick is to quit searching early. The paper by Muja and Lowe, Fast approximate nearest neighbors with automatic algorithm configuration, 2009, 10p, describes multiple randomized kd-trees for matching 128d SIFT features. (Lowe is the inventor of SIFT.)
To compare two images I and Q, one finds a set of feature vectors -- several hundred up to a few thousand SIFT vectors -- for each, and looks for near matches of these sets. (One may think of images as molecules and features as atoms; near-matching molecules is much harder than near-matching atoms, but it helps to be able to match atoms quickly.)
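For a concrete picture, here is a minimal sketch using SciPy's cKDTree; the random arrays only stand in for real SIFT descriptors (in practice they would come from a SIFT extractor, e.g. OpenCV's), and the match threshold is arbitrary:
import numpy as np
from scipy.spatial import cKDTree

# Stand-ins for SIFT output: each image yields a set of 128-d descriptor vectors.
desc_I = np.random.rand(500, 128)    # descriptors of reference image I
desc_Q = np.random.rand(450, 128)    # descriptors of query image Q

tree = cKDTree(desc_I)               # index I's descriptors once
dist, idx = tree.query(desc_Q, k=1)  # nearest I-descriptor for every Q-descriptor

# A crude similarity score: how many query features found a close match.
# (At 128d an exact kd-tree degrades toward a linear scan; FLANN's randomized
# trees trade exactness for speed here.)
matches = np.count_nonzero(dist < 0.5)
print(matches, "of", len(desc_Q), "query features matched")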
Hope this helps.

If you are planning on using kd-trees for approximate NN search in higher dimensions, you might want to review the experiments here: http://zach.in.tu-clausthal.de/software/approximate_nn/

I suggest extracting color code values from each image and creating a kd-tree from those feature vectors.
You can use the following MATLAB code to extract the color code features.
[im, colourMap] = imread('image.jpg');   % colourMap is non-empty only for indexed images
len = size(im,3);                        % number of colour channels
if (len == 1)                            % indexed image: convert to RGB first
    im = ind2rgb(im, colourMap);
    im = uint8(im .* 255);
end
% quantise each channel into 4 levels (0..3)
im(logical(  0 <= im & im <=  63)) = 0;
im(logical( 64 <= im & im <= 127)) = 1;
im(logical(128 <= im & im <= 191)) = 2;
im(logical(192 <= im & im <= 255)) = 3;
% combine the three quantised channels into a single colour code in 0..63
im = im(:,:,1) * 16 + im(:,:,2) * 4 + im(:,:,3);
% 64-bin histogram of colour codes = the feature vector
imHist = histc(im(:), 0:63);

Related

Best way to find all points of lattice in sphere

Given a bunch of arbitrary vectors (stored in a matrix A) and a radius r, I'd like to find all integer-valued linear combinations of those vectors which land inside a sphere of radius r. The necessary coordinates I would then store in a Matrix V. So, for instance, if the linear combination
K=[0; 1; 0]
lands inside my sphere, i.e. something like
if norm(A*K) <= r then
V(:,1)=K
end
etc.
The vectors in A are sure to be the simplest possible basis for the given lattice and the largest vector will have length 1. Not sure if that restricts the vectors in any useful way but I suspect it might. - They won't have as similar directions as a less ideal basis would have.
I tried a few approaches already but none of them seem particularly satisfying. I can't seem to find a nice pattern to traverse the lattice.
My current approach involves starting in the middle (i.e. with the linear combination of all 0s) and going through the necessary coordinates one by one. It involves storing a bunch of extra vectors to keep track of, so I can go through all the octants (in the 3D case) of the coordinates and find them one by one. This implementation seems awfully complex and not very flexible (in particular, it doesn't seem to be easily generalizable to arbitrary numbers of dimensions; although that isn't strictly necessary for the current purpose, it'd be a nice-to-have).
Is there a nice* way to find all the required points?
(*Ideally both efficient and elegant**. If REALLY necessary, it wouldn't matter THAT much to have a few extra points outside the sphere, but preferably not that many more. I definitely do need all the vectors inside the sphere. If it makes a large difference, I'm most interested in the 3D case.
**I'm pretty sure my current implementation is neither.)
Similar questions I found:
Find all points in sphere of radius r around arbitrary coordinate - this is actually a much more general case than what I'm looking for. I am only dealing with periodic lattices and my sphere is always centered at 0, coinciding with one point on the lattice.
But I don't have a list of points but rather a matrix of vectors with which I can generate all the points.
How to efficiently enumerate all points of sphere in n-dimensional grid - the case for a completely regular hypercubic lattice and the Manhattan distance. I'm looking for completely arbitrary lattices and Euclidean distance (or, for efficiency purposes, obviously the square of that).
Offhand, without proving any assertions, I think that 1) if the set of vectors is not of maximal rank then the number of solutions is infinite; 2) if the set is of maximal rank, then the image of the linear transformation generated by the vectors is a subspace (e.g., plane) of the target space, which intersects the sphere in a lower-dimensional sphere; 3) it follows that you can reduce the problem to a 1-1 linear transformation (kxk matrix on a k-dimensional space); 4) since the matrix is invertible, you can "pull back" the sphere to an ellipsoid in the space containing the lattice points, and as a bonus you get a nice geometric description of the ellipsoid (principal axis theorem); 5) your problem now becomes exactly one of determining the lattice points inside the ellipsoid.
The latter problem is related to an old problem (counting the lattice points inside an ellipse) which was considered by Gauss, who derived a good approximation. Determining the lattice points inside an ellipse(oid) is probably not such a tidy problem, but it probably can be reduced one dimension at a time (the cross-section of an ellipsoid and a plane is another ellipsoid).
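As a rough illustration of steps 4 and 5 for a square, invertible A, here is a brute-force numpy sketch (the box bound uses the fact that the ellipsoid K^T (A^T A) K <= r^2 extends to r*sqrt(((A^T A)^-1)_ii) along coordinate i):
import numpy as np
from itertools import product

def lattice_points_in_sphere(A, r):
    """Integer vectors K with ||A @ K|| <= r, for a square invertible basis A."""
    M = A.T @ A                      # pulled-back ellipsoid: K^T M K <= r^2
    Minv = np.linalg.inv(M)
    # the ellipsoid's extent along coordinate i is r * sqrt(Minv[i, i])
    bounds = np.floor(r * np.sqrt(np.diag(Minv))).astype(int)
    hits = []
    for K in product(*(range(-b, b + 1) for b in bounds)):
        K = np.array(K)
        if K @ M @ K <= r * r:       # inside the ellipsoid <=> A*K inside the sphere
            hits.append(K)
    return np.array(hits)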
I found a method that makes me a lot happier for now. There may still be possible improvements, so if you have a better method, or find an error in this code, definitely please share. Though here is what I have for now: (all written in SciLab)
Step 1: Figure out the maximal ranges as defined by a bounding n-parallelotope aligned with the axes of the lattice vectors. Thanks to ElKamina for the vague suggestion, as well as to chappers for this reply to another of my questions over on math.se: https://math.stackexchange.com/a/1230160/49989
function I=findMaxComponents(A,r) //given a matrix A of lattice basis vectors
                                  //and a sphere radius r,
                                  //find the corners of the bounding parallelotope
                                  //built from the lattice, and store it in I.
    [dims,vecs]=size(A); //figure out how many vectors there are in A (and, unnecessarily, how long they are)
    U=eye(vecs,vecs);    //builds matching unit matrix
    iATA=pinv(A'*A);     //finds the (pseudo-)inverse of A^T A
    iAT=pinv(A');        //finds the (pseudo-)inverse of A^T
    I=[];                //initializes I as an empty vector
    for i=1:vecs         //for each lattice vector,
        t=r*(iATA*U(:,i))/norm(iAT*U(:,i)) //find the maximum component such that
                                           //it fits in the bounding n-parallelotope
                                           //of a (n-1)-sphere of radius r
        I=[I,t(i)];      //and append it to I
    end
    I=[-I;I]; //also append the minima (by symmetry, the negative maxima)
endfunction
In my question I only asked for a general basis, i.e., for n dimensions, a set of n arbitrary but linearly independent vectors. The above code, by virtue of using the pseudo-inverse, works for matrices of arbitrary shape; similarly, Scilab's "A'" returns the conjugate transpose rather than just the transpose of A, so it should equally work for complex matrices.
In the last step I put the corresponding minimal components.
For one such A as an example, this gives me the following in Scilab's console:
A =
0.9701425 - 0.2425356 0.
0.2425356 0.4850713 0.7276069
0.2425356 0.7276069 - 0.2425356
r=3;
I=findMaxComponents(A,r)
I =
- 2.9494438 - 3.4186986 - 4.0826424
2.9494438 3.4186986 4.0826424
I=int(I)
I =
- 2. - 3. - 4.
2. 3. 4.
The values found by findMaxComponents are the largest possible coefficients of each lattice vector such that a linear combination with that coefficient can still land on the sphere. Since I'm looking for the largest such combinations with integer coefficients, I can safely drop the part after the decimal point to get the maximal plausible integer ranges. So for the given matrix A, I'll have to go from -2 to 2 in the first component, from -3 to 3 in the second and from -4 to 4 in the third, and I'm sure to hit all the points inside the sphere (plus superfluous extra points, but importantly definitely every valid point inside). Next up:
Step 2: using the above information, generate all the candidate combinations.
function K=findAllCombinations(I) //takes a matrix of the form produced by
                                  //findMaxComponents() and returns a matrix
                                  //which lists all the integer linear combinations
                                  //in the respective ranges.
    v=I(1,:);    //starting from the minimal vector
    K=[];
    next=1;      //keeps track of what component to advance next
    changed=%F;  //keeps track of whether to add the vector to the output
    while or(v~=I(2,:)) //as long as not all components of v match all components of the maximum vector
        if v <= I(2,:) then //if each current component is smaller than each largest possible component
            if ~changed then
                K=[K;v]; //store the vector and
            end
            v(next)=v(next)+1; //advance the component by 1
            next=1;            //also reset next to 1
            changed=%F;
        else
            v(1:next)=I(1,1:next); //reset all components smaller than or equal to the current one and
            next=next+1;           //advance the next larger component next time
            changed=%T;
        end
    end
    K=[K;I(2,:)]'; //while loop ends a single iteration early so add the maximal vector too
                   //also transpose K to fit better with the other functions
endfunction
So now that I have that, all that remains is to check whether a given combination actually does lie inside or outside the sphere. All I gotta do for that is:
Step 3: Filter the combinations to find the actually valid lattice points
function points=generatePoints(A,K,r)
    possiblePoints=A*K; //explicitly generates all the possible points
    points=[];
    for i=possiblePoints
        if i'*i<=r*r then //filter those that are too far from the origin
            points=[points i];
        end
    end
endfunction
And I get all the combinations that actually do fit inside the sphere of radius r.
For the above example, the output is rather long: Of originally 315 possible points for a sphere of radius 3 I get 163 remaining points.
The first 4 are: (each column is one)
- 0.2425356 0.2425356 1.2126781 - 0.9701425
- 2.4253563 - 2.6678919 - 2.4253563 - 2.4253563
1.6977494 0. 0.2425356 0.4850713
So the remainder of the work is optimization. Presumably some of those loops could be made faster, and especially as the number of dimensions goes up, I have to generate an awful lot of points which I then have to discard, so maybe there is a better way than taking the bounding n-parallelotope of the (n-1)-sphere as a starting point.
Let us just represent K as X.
The problem can be represented as:
(a11*x1 + a12*x2 + ...)^2 + (a21*x1 + a22*x2 + ...)^2 + ... < r^2
(x1,x2,...) will not form a sphere.
This can be done with recursion on dimension: pick a lattice hyperplane direction and index all such hyperplanes that intersect the r-radius ball. The intersection of each such hyperplane with the ball is itself a ball, in one lower dimension. Repeat. Here's the calling function code in Octave:
function lat_points(lat_bas_mx,rr)
  % **globals for hyperplane lattice point recursive function**
  clear global;  % this seems necessary/important between runs of this function
  global MLB;
  global NN_hat;
  global NN_len;
  global INP;    % matrix of interior points, each point(vector) a column vector
  global ctr;    % integer counter, for keeping track of lattice point vectors added
                 % in the pre-allocated INP matrix; will finish iteration with actual # of points found
  ctr = 0;       % counts number of ball-interior lattice points found
  MLB = lat_bas_mx;
  ndim = size(MLB)(1);
  % **create hyperplane normal vectors for recursion step**
  % given full-rank lattice basis matrix MLB (each vector in lattice basis a column),
  % form set of normal vectors between successive, nested lattice hyperplanes;
  % store them as columnar unit normal vectors in NN_hat matrix and their lengths in NN_len vector
  NN_hat = [];
  for jj=1:ndim-1
    tmp_mx = MLB(:,jj+1:ndim);
    tmp_mx = [NN_hat(:,1:jj-1),tmp_mx];
    NN_hat(:,jj) = null(tmp_mx');  % null space of transpose = orthogonal to columns
    tmp_len = norm(NN_hat(:,jj));
    NN_hat(:,jj) = NN_hat(:,jj)/tmp_len;
    NN_len(jj) = dot(MLB(:,jj),NN_hat(:,jj));
    if (NN_len(jj)<0)  % NN_hat(:,jj) and MLB(:,jj) must have positive dot product
                       % for cutting hyperplane indexing to work correctly
      NN_hat(:,jj) = -NN_hat(:,jj);
      NN_len(jj) = -NN_len(jj);
    endif
  endfor
  NN_len(ndim) = norm(MLB(:,ndim));
  NN_hat(:,ndim) = MLB(:,ndim)/NN_len(ndim);  % the lowest recursion level normal
                                              % is just the last lattice basis vector
  % **estimate number of interior lattice points, and pre-allocate memory for INP**
  vol_ppl = prod(NN_len);  % the volume of the ndim dimensional lattice parallelepiped
                           % is just the product of the NN_len's (they amount to the nested altitudes
                           % of hyperplane "parallelepipeds")
  vol_bll = exp( (ndim/2)*log(pi) + ndim*log(rr) - gammaln(ndim/2+1) );  % volume of ndim ball, radius rr
  est_num_pts = ceil(vol_bll/vol_ppl);  % estimated number of lattice points in the ball
  err_fac = 1.1;  % error factor for memory pre-allocation--assume max of err_fac*est_num_pts columns required in INP
  INP = zeros(ndim,ceil(err_fac*est_num_pts));
  % **call the (recursive) function**
  % for output, global variable INP (matrix of interior points)
  % stores each valid lattice point (as a column vector)
  clp = zeros(ndim,1);  % confirmed lattice point (start at origin)
  bpt = zeros(ndim,1);  % point at center of ball (initially, at origin)
  rd = 1;               % initial recursion depth must always be 1
  hyp_fun(clp,bpt,rr,ndim,rd);
  printf("%i lattice points found\n",ctr);
  INP = INP(:,1:ctr);   % trim excess zeros from pre-allocation (if any)
endfunction
Regarding the NN_len(jj)*NN_hat(:,jj) vectors--they can be viewed as successive (nested) altitudes in the ndim-dimensional "parallelepiped" formed by the vectors in the lattice basis, MLB. The volume of the lattice basis parallelepiped is just prod(NN_len)--for a quick estimate of the number of interior lattice points, divide the volume of the ndim-ball of radius rr by prod(NN_len). Here's the recursive function code:
function hyp_fun(clp,bpt,rr,ndim,rd)
  %{
  clp = the lattice point we're entering this lattice hyperplane with
  bpt = location of center of ball in this hyperplane
  rr = radius of ball
  rd = recursion depth--from 1 to ndim
  %}
  global MLB;
  global NN_hat;
  global NN_len;
  global INP;
  global ctr;
  % hyperplane intersection detection step
  nml_hat = NN_hat(:,rd);
  nh_comp = dot(clp-bpt,nml_hat);
  ix_hi = floor((rr-nh_comp)/NN_len(rd));
  ix_lo = ceil((-rr-nh_comp)/NN_len(rd));
  if (ix_hi<ix_lo)
    return  % no hyperplane intersections detected w/ ball;
            % get out of this recursion level
  endif
  hp_ix = [ix_lo:ix_hi];  % indices are created wrt the received reference point
  hp_ln = length(hp_ix);
  % loop through detected hyperplanes (updated)
  if (rd<ndim)
    bpt_new_mx = bpt*ones(1,hp_ln) + NN_len(rd)*nml_hat*hp_ix;  % an ndim by length(hp_ix) matrix
    clp_new_mx = clp*ones(1,hp_ln) + MLB(:,rd)*hp_ix;           % an ndim by length(hp_ix) matrix
    dd_vec = nh_comp + NN_len(rd)*hp_ix;                        % a length(hp_ix) row vector
    rr_new_vec = sqrt(rr^2-dd_vec.^2);
    for jj=1:hp_ln
      hyp_fun(clp_new_mx(:,jj),bpt_new_mx(:,jj),rr_new_vec(jj),ndim,rd+1);
    endfor
  else  % rd=ndim--so at deepest level of recursion; record the points on the given 1-dim
        % "lattice line" that are inside the ball
    INP(:,ctr+1:ctr+hp_ln) = clp + MLB(:,rd)*hp_ix;
    ctr += hp_ln;
    return
  endif
endfunction
This has some Octave-y/Matlab-y things in it, but most should be easily understandable: M(:,jj) references column jj of matrix M; the tick ' means transpose; [A B] concatenates matrices A and B; A=[] declares an empty matrix.
Updated / better optimized from original answer:
"vectorized" the code in the recursive function, to avoid most "for" loops (those slowed it down a factor of ~10; the code now is a bit more difficult to understand though)
pre-allocated memory for the INP matrix-of-interior points (this speeded it up by another order of magnitude; before that, Octave was having to resize the INP matrix for every call to the innermost recursion level--for large matrices/arrays that can really slow things down)
Because this routine was part of a project, I also coded it in Python. From informal testing, the Python version is another 2-3 times faster than this (Octave) version.
For reference, here is the old, much slower code in the original posting of this answer:
% (OLD slower code, using for loops, and constantly resizing
% the INP matrix) loop through detected hyperplanes
if (rd<ndim)
for jj=1:length(hp_ix)
bpt_new = bpt + hp_ix(jj)*NN_len(rd)*nml_hat;
clp_new = clp + hp_ix(jj)*MLB(:,rd);
dd = nh_comp + hp_ix(jj)*NN_len(rd);
rr_new = sqrt(rr^2-dd^2);
hyp_fun(clp_new,bpt_new,rr_new,ndim,rd+1);
endfor
else % rd=ndim--so at deepest level of recursion; record the points on the given 1-dim
% "lattice line" that are inside the ball
for jj=1:length(hp_ix)
clp_new = clp + hp_ix(jj)*MLB(:,rd);
INP = [INP clp_new];
endfor
return
endif

Intersection of axis-aligned rectangular cuboids (MBR) in one dimension

Currently I'm doing benchmarks on time series indexing algorithms. Since most of the time no reference implementations are available, I have to write my own implementations (all in Java). At the moment I am stuck a little on section 6.2 of a paper called Indexing multi-dimensional time-series with support for multiple distance measures, available here as a PDF: http://hadjieleftheriou.com/papers/vldbj04-2.pdf
An MBR (minimum bounding rectangle) is basically a rectangular cuboid given by a corner coordinate and an extent (direction) per dimension. As an example, P and Q are two MBRs with P.coord={0,0,0} and P.dir={1,1,3}, and Q.coord={0.5,0.5,1} and Q.dir={1,1,1}, where the first entries represent the time dimension.
Now I would like to calculate the MINDIST(Q,P) between Q and P :
However I am not sure how to implement the "intersection of two MBRs in the time dimension" (Dim 1) since I am not sure what the intersection in the time dimension actually means. It is also not clear what h_Q, l_Q, l_P, h_P mean, since this notation is not explained (my guess is they mean something like highest or lowest value of a dimension in the intersection).
I would highly appreciate it, if someone could explain to me how to calculate the intersection of two MBRs in the first dimension and maybe enlighten me with an interpretation of the notation. Thanks!
Well, Figure 14 in your paper explains the time intersection. And the rectangles are axis-aligned, thus it makes sense to use high and low on each coordinate.
The multiplication sign you see is not a cross product, just a normal multiplication, because on both sides of it you have a scalar, and not vectors.
However I must agree that the discussion on page 14 is rather fuzzy, but it seems to tell us that both types of intersections (complete and partial), when they have a t subscript, mean the norm of the intersection along the t coordinate.
Thus it seems you could factorize the time intersection to get a formula that would be :
It is worth noting that, maybe counter-intuitively, when your objects don't intersect on the time plane, their MINDIST is defined to be 0.
Hence the following pseudo-code:
mindist(P, Q)
{
    if( Q.coord[0] + Q.dir[0] < P.coord[0] ||
        Q.coord[0] > P.coord[0] + P.dir[0] )
        return 0;
    time = min(Q.coord[0] + Q.dir[0], P.coord[0] + P.dir[0]) - max(Q.coord[0], P.coord[0]);
    sum = 0;
    for(d=1; d<D; ++d)
    {
        if( Q.coord[d] + Q.dir[d] < P.coord[d] )
            x = Q.coord[d] + Q.dir[d] - P.coord[d];
        else if( P.coord[d] + P.dir[d] < Q.coord[d] )
            x = P.coord[d] + P.dir[d] - Q.coord[d];
        else
            x = 0;
        sum += x*x;
    }
    return sqrt(time * sum);
}
Note the absolute values in the paper are unnecessary here: we just checked which values were bigger, and since x is squared anyway, we only ever add non-negative numbers.

Multilateration implementation with inaccurate distance data

I am trying to create an Android smartphone application which uses Apple's iBeacon technology to determine its own current indoor location. I already managed to get all available beacons and calculate the distance to them via the RSSI signal.
Currently I face the problem that I am not able to find any library or implementation of an algorithm which calculates the estimated location in 2D from 3 (or more) distances to fixed points, under the condition that these distances are not accurate (meaning that the three "trilateration circles" do not intersect in one point).
I would be deeply grateful if anybody could post a link to, or an implementation of, that in any common programming language (Java, C++, Python, PHP, Javascript or whatever). I already read a lot on Stack Overflow about that topic, but could not find any answer I was able to convert into code (only some mathematical approaches with matrices and inverting them, calculating with vectors or stuff like that).
EDIT
I thought about an approach of my own, which works quite well for me but is not that efficient or scientific. I iterate over every meter (or, as in my example, every 0.1 meter) of the location grid and calculate how likely each location is to be the actual position of the handset, by comparing the distance of that location to all beacons with the distances I calculate from the received RSSI signals.
Code example:
public Location trilaterate(ArrayList<Beacon> beacons, double maxX, double maxY)
{
    for (double x = 0; x <= maxX; x += .1)
    {
        for (double y = 0; y <= maxY; y += .1)
        {
            double currentLocationProbability = 0;
            for (Beacon beacon : beacons)
            {
                // distance difference between calculated distance to beacon transmitter
                // (rssi-calculated distance) and current location:
                // |sqrt(dX^2 + dY^2) - distanceToTransmitter|
                double distanceDifference = Math
                        .abs(Math.sqrt(Math.pow(beacon.getLocation().x - x, 2)
                                + Math.pow(beacon.getLocation().y - y, 2))
                                - beacon.getCurrentDistanceToTransmitter());
                // weight the distance difference with the beacon's calculated rssi-distance.
                // The smaller the calculated rssi-distance is, the more the distance
                // difference will be weighted (it is assumed that nearer beacons measure
                // the distance more accurately)
                distanceDifference /= Math.pow(beacon.getCurrentDistanceToTransmitter(), 0.9);
                // sum up all weighted distance differences for every beacon in
                // "currentLocationProbability"
                currentLocationProbability += distanceDifference;
            }
            addToLocationMap(currentLocationProbability, x, y);
            // the previous line is my approach; I create a Set of Locations with the 5 most
            // probable locations in it to estimate the accuracy of the measurement afterwards.
            // If that is not necessary, a simple variable assignment for the most probable
            // location would do the job also
        }
    }
    Location bestLocation = getLocationSet().first().location;
    bestLocation.accuracy = calculateLocationAccuracy();
    Log.w("TRILATERATION", "Location " + bestLocation + " best with accuracy "
            + bestLocation.accuracy);
    return bestLocation;
}
Of course, the downside of that is that on a 300 m² floor I have 30,000 locations to iterate over, measuring the distance to every single beacon I got a signal from (if that is 5 beacons, that's 150,000 calculations just to determine a single location). That's a lot, so I will leave the question open and hope for further solutions or a good improvement of this existing solution to make it more efficient.
Of course it does not have to be a trilateration approach, as the original title of this question suggested; an algorithm which includes more than three beacons in the location determination (multilateration) is also fine.
If the current approach is fine except for being too slow, then you could speed it up by recursively subdividing the plane. This works sort of like finding nearest neighbors in a kd-tree. Suppose that we are given an axis-aligned box and wish to find the approximate best solution in the box. If the box is small enough, then return the center.
Otherwise, divide the box in half, either by x or by y depending on which side is longer. For both halves, compute a bound on the solution quality as follows. Since the objective function is additive, sum lower bounds for each beacon. The lower bound for a beacon is the distance of the circle to the box, times the scaling factor. Recursively find the best solution in the child with the lower lower bound. Examine the other child only if the best solution in the first child is worse than the other child's lower bound.
Most of the implementation work here is the box-to-circle distance computation. Since the box is axis-aligned, we can use interval arithmetic to determine the precise range of distances from box points to the circle center.
P.S.: Math.hypot is a nice function for computing 2D Euclidean distances.
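Here is a rough Python sketch of that scheme, using the same weighted residual as the question's grid search; the beacon coordinates and radii at the bottom are made-up example values:
import math

def box_point_dist_range(box, cx, cy):
    # min and max Euclidean distance from any point in box = (x0, y0, x1, y1) to (cx, cy)
    x0, y0, x1, y1 = box
    dx_min = max(x0 - cx, 0.0, cx - x1)
    dy_min = max(y0 - cy, 0.0, cy - y1)
    dx_max = max(cx - x0, x1 - cx)
    dy_max = max(cy - y0, y1 - cy)
    return math.hypot(dx_min, dy_min), math.hypot(dx_max, dy_max)

def lower_bound(box, beacons):
    # sum over beacons of the smallest possible weighted residual inside the box
    total = 0.0
    for bx, by, r in beacons:            # r = rssi-estimated distance to the beacon
        dmin, dmax = box_point_dist_range(box, bx, by)
        gap = dmin - r if r < dmin else (r - dmax if r > dmax else 0.0)
        total += gap / r ** 0.9          # same weighting as in the question's grid search
    return total

def cost(x, y, beacons):
    return sum(abs(math.hypot(bx - x, by - y) - r) / r ** 0.9 for bx, by, r in beacons)

def best_location(box, beacons, eps=0.1, best=(float("inf"), None)):
    x0, y0, x1, y1 = box
    if lower_bound(box, beacons) >= best[0]:
        return best                                  # this box cannot beat the current best
    if x1 - x0 <= eps and y1 - y0 <= eps:            # small enough: evaluate the center
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        c = cost(cx, cy, beacons)
        return (c, (cx, cy)) if c < best[0] else best
    if x1 - x0 >= y1 - y0:                           # split the longer side in half
        m = (x0 + x1) / 2
        children = [(x0, y0, m, y1), (m, y0, x1, y1)]
    else:
        m = (y0 + y1) / 2
        children = [(x0, y0, x1, m), (x0, m, x1, y1)]
    children.sort(key=lambda b: lower_bound(b, beacons))   # most promising child first
    for child in children:
        best = best_location(child, beacons, eps, best)
    return best

# beacons as (x, y, rssi_estimated_distance); the values are only an example
print(best_location((0.0, 0.0, 20.0, 15.0), [(0, 0, 7), (0, 10, 7), (10, 5, 12)]))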
Instead of taking confidence levels of individual beacons into account, I would try to assign an overall confidence level to your result after you make the best guess you can with the available data. I don't think the only available metric (perceived power) is a good indication of accuracy. With poor geometry or a misbehaving beacon, you could be trusting poor data highly. It might make better sense to come up with an overall confidence level based on how well the perceived distances to the beacons line up with the calculated point, assuming you trust all beacons equally.
I wrote some Python below that comes up with a best guess based on the provided data in the 3-beacon case, by calculating the two points of intersection of the circles for the first two beacons and then choosing the point that best matches the third. It's meant to get you started on the problem and is not a final solution. If the beacons don't intersect, it slightly increases the radius of each until they do meet or a threshold is reached. Likewise, it makes sure the third beacon agrees within a settable threshold. For n beacons, I would pick 3 or 4 of the strongest signals and use those. There are tons of optimizations that could be done, and I think this is a trial-by-fire problem due to the unwieldy nature of beaconing.
import math

beacons = [[0.0,0.0,7.0],[0.0,10.0,7.0],[10.0,5.0,16.0]]  # x, y, radius

def point_dist(x1,y1,x2,y2):
    x = x2-x1
    y = y2-y1
    return math.sqrt((x*x)+(y*y))

# determines two points of intersection for two circles [x,y,radius]
# returns None if the circles do not intersect
def circle_intersection(beacon1,beacon2):
    r1 = beacon1[2]
    r2 = beacon2[2]
    dist = point_dist(beacon1[0],beacon1[1],beacon2[0],beacon2[1])
    heron_root = (dist+r1+r2)*(-dist+r1+r2)*(dist-r1+r2)*(dist+r1-r2)
    if ( heron_root > 0 ):
        heron = 0.25*math.sqrt(heron_root)
        xbase = (0.5)*(beacon1[0]+beacon2[0]) + (0.5)*(beacon2[0]-beacon1[0])*(r1*r1-r2*r2)/(dist*dist)
        xdiff = 2*(beacon2[1]-beacon1[1])*heron/(dist*dist)
        ybase = (0.5)*(beacon1[1]+beacon2[1]) + (0.5)*(beacon2[1]-beacon1[1])*(r1*r1-r2*r2)/(dist*dist)
        ydiff = 2*(beacon2[0]-beacon1[0])*heron/(dist*dist)
        return (xbase+xdiff,ybase-ydiff),(xbase-xdiff,ybase+ydiff)
    else:
        # no intersection, need to pseudo-increase beacon power and try again
        return None

# find the two points of intersection between beacon0 and beacon1
# will use beacon2 to determine the better of the two points
failing = True
power_increases = 0
while failing and power_increases < 10:
    res = circle_intersection(beacons[0],beacons[1])
    if ( res ):
        intersection = res
    else:
        beacons[0][2] *= 1.001
        beacons[1][2] *= 1.001
        power_increases += 1
        continue
    failing = False

# make sure the best fit is within x% (10% of the total distance from the 3rd beacon in this case)
# otherwise the results are too far off
THRESHOLD = 0.1
if failing:
    print('Bad Beacon Data (Beacon0 & Beacon1 don\'t intersect after many "power increases")')
else:
    # finding best point between beacon1 and beacon2
    dist1 = point_dist(beacons[2][0],beacons[2][1],intersection[0][0],intersection[0][1])
    dist2 = point_dist(beacons[2][0],beacons[2][1],intersection[1][0],intersection[1][1])
    if ( math.fabs(dist1-beacons[2][2]) < math.fabs(dist2-beacons[2][2]) ):
        best_point = intersection[0]
        best_dist = dist1
    else:
        best_point = intersection[1]
        best_dist = dist2
    best_dist_diff = math.fabs(best_dist-beacons[2][2])
    if best_dist_diff < THRESHOLD*best_dist:
        print(best_point)
    else:
        print('Bad Beacon Data (Beacon2 distance to best point not within threshold)')
If you want to trust closer beacons more, you may want to calculate the intersection points between the two closest beacons and then use the farther beacon as a tie-breaker. Keep in mind that almost anything you do with "confidence levels" for the individual measurements will be a hack at best. Since you will always be working with very bad data, you will definitely need to loosen up the power_increases limit and the threshold percentage.
You have 3 points: A(xA,yA,zA), B(xB,yB,zB) and C(xC,yC,zC), which respectively are approximately at distances dA, dB and dC from your goal point G(xG,yG,zG).
Let's say cA, cB and cC are the confidence rates ( 0 < cX <= 1 ) of each point.
Basically, you might take something really close to 1, like {0.95, 0.97, 0.99}.
If you don't know, try different coefficients depending on the average distance: if a distance is really big, you're likely not to be very confident about it.
Here is the way I'd do it:
var sum = (cA*dA) + (cB*dB) + (cC*dC);
dA = cA*dA/sum;
dB = cB*dB/sum;
dC = cC*dC/sum;
xG = (xA*dA) + (xB*dB) + (xC*dC);
yG = (yA*dA) + (yB*dB) + (yC*dC);
zG = (zA*dA) + (zB*dB) + (zC*dC);
Basic, and not really smart, but it will do the job for some simple tasks.
EDIT
You can take any confidence coefficient you want in [0, inf[, but IMHO restricting it to [0,1] is a good idea to keep a realistic result.

Algorithm to calculate the distances between many geo points

I have a matrix of around 1000 geospatial points (longitude, latitude) and I am trying to find, for each point, the points that are within a 1 km range.
NOTE: the points are dynamic. Imagine 1000 vehicles are moving, so I have to re-calculate all distances every few seconds.
I did some searching and read about graph algorithms like Floyd–Warshall to solve this, and I ended up with many keywords, and I am kind of lost now. I am concerned about performance, and since the search radius is short, I will not consider the curvature of the earth.
Basically, it appears that I have to calculate the distance between every point and every other point, then sort the distances starting from every point in the matrix and get the points that are in its range. So if I have 1000 coordinates, I have to perform this process (1000^2 - 1000) times, and I do not believe this is the optimal solution. Thank you.
If you make a model with a grid of 1 km spacing:
0 1 2 3
___|____|____|____
0 | | |
c| b|a | d
___|____|____|____
1 | | |
| |f |
___|e___|____|____
2 | |g |
Let's assume your starting point is a.
If your grid cells are 1 km in size, points within 1 km reach have to be in the same cell or one of the 8 neighbouring cells (points b, d, e, f).
Every other cell can be ignored (c, g).
While d may be nearly as far from a as c is, c can be dropped early: there are two cell borders between a and c, so they are at least a full cell apart (and in the typical case nearly 2 km), whereas a and d merely lie on opposite sides of a shared border and can be arbitrarily close.
For deciding early which elements can be excluded, it is enough to check the x or y part of the cell coordinate. Since a belongs to cell (0,2), any point whose cell column is 0 or smaller, or greater than 3, is already out of range.
For the few candidates that remain after this filtering, you can use exhaustive search.
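A small Python sketch of that idea, assuming the lat/lon values have already been projected to planar x/y coordinates in kilometres (the question already ignores the earth's curvature):
from collections import defaultdict
from math import hypot, floor

def pairs_within_1km(points):
    # points: list of (x, y) in kilometres; returns index pairs (i, j) at most 1 km apart
    cells = defaultdict(list)                       # hash grid with 1 km cell size
    for i, (x, y) in enumerate(points):
        cells[(floor(x), floor(y))].append(i)
    pairs = []
    for i, (x, y) in enumerate(points):
        cx, cy = floor(x), floor(y)
        for dx in (-1, 0, 1):                       # own cell plus the 8 neighbours
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), []):
                    if j > i and hypot(points[j][0] - x, points[j][1] - y) <= 1.0:
                        pairs.append((i, j))
    return pairs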
In your case, you should be looking at GeoHash, which allows you to quickly query coordinates within a given distance.
FYI, MongoDB uses geohashes internally and it performs excellently.
Try an R-tree. The R-tree supports the operation of finding all points closest to a given point that are no further away than a given radius. The execution time is optimal, and I think it's O(number_of_points_in_the_result).
You could compute geocodes of the 1 km range around each of those 1000 coordinates and check whether some points are in that range. Maybe it's not optimal, but you will save yourself some sorting.
If you want to look up the matrix for each point vs. each point, then you already have the right formula (1000^2 - 1000). There isn't any shortcut for this calculation. However, when you know where to start the search and you only want points within a 1 km radius, you can use a grid or spatial algorithm to speed up the lookup. Most likely it uses a divide-and-conquer algorithm, and the cheapest of them is a geohash or a Z-order curve. You can also try a kd-tree. Maybe this is even simpler. But if your points are in Euclidean space then there is the planar method described here: http://en.wikipedia.org/wiki/Closest_pair_of_points_problem.
Edit: When I say 1000^2 - 1000, I mean the size of the full grid; it's actually 1000*(1000 - 1)/2 distinct pairs of points, so a lot less math.
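If SciPy is available, a kd-tree version of that within-radius lookup is only a few lines (a sketch, assuming the coordinates have already been projected to planar x/y in kilometres):
import numpy as np
from scipy.spatial import cKDTree

pts = np.random.rand(1000, 2) * 10.0       # stand-in for 1000 projected positions, in km
tree = cKDTree(pts)                        # rebuild every few seconds as the vehicles move
pairs = tree.query_pairs(r=1.0)            # all index pairs (i, j), i < j, within 1 km
near_first = tree.query_ball_point(pts[0], r=1.0)   # everything within 1 km of point 0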
I have something sort of similar on a web page I worked on, I think. The user clicks a location on the map and enters a radius, and a function returns all the locations in a database within the given radius. Do you mean you are trying to find the points that are within 1 km of one given point, or the points that are within 1 km of each other? I think you should do something like this.
radius = given radius
x1 = latitude of given point;
y1 = longitude of given point;
x2 = null;
y2 = null;
x = null;
y = null;
dist = null;
for ( i=0; i<locationArray.length; i++ ) {
    x2 = locationArray[i].latitude;
    y2 = locationArray[i].longitude;
    x = x1 - x2;
    y = y1 - y2;
    dist = sqrt(x^2 + y^2);
    if (dist <= radius)
        these are your points
}
If you are trying to calculate all of the points that are within 1 km of another point, you could add an outer loop that supplies x1 and y1, so that the inner loop tests the distance between that point and every other point in your matrix. The calculations shouldn't take too long, since they are so basic.
I had the same problem, but in a web service development.
In my case, to avoid the calculation-time problem, I used a simple divide & conquer solution: the idea was to calculate the distance between the new point and the others at every new data insertion, so that my application can directly look up the distance between two points that has already been calculated and stored in my database.

Fastest way to fit a parabola to set of points?

Given a set of points, what's the fastest way to fit a parabola to them? Is it doing the least squares calculation or is there an iterative way?
Thanks
Edit:
I think gradient descent is the way to go. The least squares calculation would have been a little bit more taxing (having to do a QR decomposition or something to keep things stable).
If the points have no error associated with them, you may interpolate through three points. Otherwise, least squares or any equivalent formulation is the way to go.
I recently needed to find a parabola that passes through 3 points.
Suppose you have (x1,y1), (x2,y2) and (x3,y3) and you want the parabola
y - y0 = a*(x - x0)^2
to pass through them: find y0, x0, and a.
You can do some algebra and get this solution (provided the points aren't all on a line):
let c = (y1-y2) / (y2-y3)
x0 = ( -x1^2 + x2^2 + c*( x2^2 - x3^2 ) ) / (2.0*( -x1+x2 + c*x2 - c*x3 ))
a = (y1-y2) / ( (x1-x0)^2 - (x2-x0)^2 )
y0 = y1 - a*(x1-x0)^2
Note that in the equation for c, if y2 == y3 you've got a problem (division by zero). So in my algorithm I check for this, swap, say, x1, y1 with x2, y2, and then proceed.
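A rough Python transcription of those formulas (my own sketch, including the swap for the y2 == y3 case):
def parabola_through_3_points(x1, y1, x2, y2, x3, y3):
    # returns (x0, y0, a) such that y - y0 = a*(x - x0)^2 passes through the three points
    if y2 == y3:                      # swap so the ratio c below is well defined
        x1, y1, x2, y2 = x2, y2, x1, y1
    c = (y1 - y2) / (y2 - y3)
    x0 = (-x1**2 + x2**2 + c * (x2**2 - x3**2)) / (2.0 * (-x1 + x2 + c * x2 - c * x3))
    a = (y1 - y2) / ((x1 - x0)**2 - (x2 - x0)**2)
    y0 = y1 - a * (x1 - x0)**2
    return x0, y0, a

print(parabola_through_3_points(0, 1, 1, 0, 2, 1))   # y = (x - 1)^2  ->  (1.0, 0.0, 1.0)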
hope that helps!
Paul Probert
A calculated solution is almost always faster than an iterative solution. The "exception" would be for low iteration counts and complex calculations.
I would use the least squares method. I've only ever coded it for linear regression fits, but it can be used for parabolas (I had reason to look it up recently; sources included an old edition of "Numerical Recipes", Press et al., and "Engineering Mathematics", Kreyszig).
ALGORITHM FOR PARABOLA
1. Read the number of data points n and the order of the polynomial Mp.
2. Read the data values.
3. If n < Mp, regression is not possible: stop. Otherwise continue.
4. Set M = Mp + 1.
5. Compute the coefficients of the C-matrix.
6. Compute the coefficients of the B-matrix.
7. Solve for the coefficients a1, a2, ..., an.
8. Write out the coefficients.
9. Estimate the function value at the given values of the independent variables.
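For the quadratic case this boils down to a few lines of numpy (a sketch; np.linalg.lstsq solves the same least-squares system, internally via SVD rather than the explicit normal equations):
import numpy as np

def fit_parabola(x, y):
    # least-squares fit of y ~ a0 + a1*x + a2*x^2
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    M = np.vander(x, 3, increasing=True)          # design matrix with columns 1, x, x^2
    coeffs, *_ = np.linalg.lstsq(M, y, rcond=None)
    return coeffs                                 # [a0, a1, a2]

print(fit_parabola([-2, -1, 0, 1, 2], [5, 2, 1, 2, 5]))   # ~ [1, 0, 1], i.e. y = x^2 + 1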
Using the free arbitrary-precision math program "PARI" (for Mac or PC):
Here is how I would fit a parabola to a set of 641 points, and I also show how to find the minimum of that parabola.
Set a high number of digits of precision:
\p 300
Write the data points to a text file, separated by one space per data point (use ASCII characters in base ten, no space at the file start or end, no returns, and write extremely large or small floating point values as, for example, "9.0E-23" but not "9.0D-23").
make a string to point to that file:
fileone="./desktop/data.txt"
read that file into PARI using the following instructions:
fileopen(fileone,r)
readsplit(file) = my(cmd);cmd="perl -ne \"chomp; print '[' . join(',', split(/ +/)) . ']\n';\"";eval(externstr(Str(cmd," ",file)))
readsplit(fileone)
Label that data with a name:
in = %
V = in[1]
Define a least squares fit function:
lsf(X,Y,n) = my(M=matrix(#X,n+1,i,j,X[i]^(j-1)));fit=Polrev(matsolve(M~*M,M~*Y~))
Apply that lsf function to your 641 data points:
lsf([-320..320],V, 2)
Then if you want to show the minimum of that parabolic fit, enter:
xextreme = solve (x=-1000,1000,eval(deriv(fit)));print (xextreme*(124.5678-123.5678)/640+(124.5678+123.5678)/2);x=xextreme;print(eval(fit))
(I had to adjust for my particular x-axis scaling before the "print" statement in that command line above).
(Note: a sacrifice made to simplify this algorithm causes it to work only when the data set has equally spaced x-axis coordinates.)
I was worried that my last post was too compact to follow and too hard to convert to other environments. I would like to show here how to solve the generalized problem of parabolic data fitting explicitly, without specialized matrix math terminology, and so that each multiplication, division, subtraction and addition can be seen at once.
To save ink, this fit reparameterizes the x-axis as evenly spaced points centered on zero, so that odd-powered sums all get eliminated (saving a lot of space and time); the x-coordinates of the N data points are effectively labeled by points of this vector: X=[-(N-1)/2..(N-1)/2]. For example, "xextreme" will be returned versus those integer indices, and so (if desired) a simple linear transformation (which consumes very little CPU time) must be applied after the algorithm below to get it versus your problem's particular x-axis labels.
This is written in the language of the free program "PARI", but all the commands are simple to translate to any language.
Step 1: assign a label to the y-axis data:
? V=[5,2,1,2,5]
"PARI" confirms that entry:
%280 = [5, 2, 1, 2, 5]
Then type in the following processing algorithm, which calculates a best-fit parabola through any y-axis data set with constant x-axis separation:
? g=#V;h=(g-1)*g*(g+1)/3;i=h*(3*g*g-7)/5;\
a=sum(i=1,g,V[i]);b=sum(i=1,g,(2*i-1-g)*V[i]);c=sum(i=1,g,(2*i-1-g)*(2*i-1-g)*V[i]);\
A=matdet([a,c;h,i])/matdet([g,h;h,i]);B=b/h*2;C=matdet([g,h;a,c])/matdet([g,h;h,i])*4;\
xextreme=-B/(2*C);yextreme=-B*B/(4*C)+A;fit=Polrev([A,B,C]);\
print("\n","y of extreme is ",yextreme,"\n","which occurs this many data points from center of data: ",xextreme)
(Note for non-PARI users: the command "matdet([a,c;h,i])" is just another way of entering "a*i - c*h".)
Those commands then produce the following screen output:
y of extreme is 1
which occurs this many data points from center of data: 0
The algorithm stores the polynomial of the fit in the variable "fit":
? fit
%282 = x^2 + 1
?
(Note that to make the algorithm short, the x-axis labels are assigned as X=[-(N-1)/2..(N-1)/2], thus here they are X=[-2,-1,0,1,2]. To correct that for the same polynomial as parameterized by an x-axis coordinate data set of, say, X=[-1,0,1,2,3], just apply a simple linear transform, in this case: "x^2 + 1" --> "(t - 1)^2 + 1".)
