Algorithm to calculate the distances between many geo points - algorithm

I have a matrix of around 1000 geospatial points (longitude, latitude) and I am trying to find the points that are within a 1 km range.
NOTE: "The points are dynamic. Imagine 1000 vehicles moving, so I have to recalculate all distances every few seconds."
I did some searching and read about graph algorithms like Floyd–Warshall to solve this, and I ended up with many keywords, and I am kind of lost now. Performance matters, and since the search radius is short, I will not consider the curvature of the earth.
Basically, it appears that I have to calculate the distance from every point to every other point, then sort the distances starting from every point in the matrix and get the points that are in its range. So if I have 1000 coordinates, I have to perform this process (1000^2-1000) times, and I do not believe this is the optimum solution. Thank you.

If you make a model with a grid of 1km spacing:
  0    1    2    3
 ___|____|____|____
0   |    |    |
   c|   b|a   |   d
 ___|____|____|____
1   |    |    |
    |    |f   |
 ___|e___|____|____
2   |    |g   |
Let's assume your starting point is a.
If your grid is of 1km size, points within 1km have to be in the same cell or in one of the 8 neighbouring cells (points b, d, e, f are the candidates).
Every other cell can be ignored (c, g).
While d is nearly the same distance from a as c is, c can be dropped early, because there are two cell borders to cross between it and a. d sits in a neighbouring cell, so it survives the cell filter, but a and d lie on opposite sides of their cells and are therefore nearly 2 km away from each other, so the exact distance check rejects it.
To drop points early, it is enough to check the x- or y-part of the cell coordinate. Since a belongs to cell (0,2), a point whose x cell index is 0 or smaller, or greater than 3, is already out of range.
After filtering down to a few candidates, you may use exhaustive search.
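The grid idea above can be sketched roughly as follows. This is only a minimal illustration, assuming the coordinates have already been converted to planar kilometres; the function name `neighbours_within` and the sample coordinates are made up for the example and are not from the answer.

```python
from collections import defaultdict
from math import floor, hypot

def neighbours_within(points, radius=1.0):
    """Return {index: [indices within `radius`]} for planar (x, y) points."""
    # Bucket every point by the grid cell (cell size == radius) that contains it.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(floor(x / radius), floor(y / radius))].append(i)

    result = {i: [] for i in range(len(points))}
    for i, (x, y) in enumerate(points):
        cx, cy = floor(x / radius), floor(y / radius)
        # Only the point's own cell and its 8 neighbours can hold points in range.
        for nx in (cx - 1, cx, cx + 1):
            for ny in (cy - 1, cy, cy + 1):
                for j in cells.get((nx, ny), ()):
                    # Exhaustive distance check on the few remaining candidates.
                    if j != i and hypot(points[j][0] - x, points[j][1] - y) <= radius:
                        result[i].append(j)
    return result

# Coordinates in km, loosely modelled on the first row of the sketch (values are made up).
pts = {"a": (2.1, 0.5), "b": (1.9, 0.5), "c": (0.8, 0.5), "d": (3.9, 0.5)}
names = list(pts)
found = neighbours_within([pts[n] for n in names])
print({names[i]: [names[j] for j in js] for i, js in found.items()})
# {'a': ['b'], 'b': ['a'], 'c': [], 'd': []}
```

Since the vehicles move, the buckets would have to be rebuilt (or updated) every few seconds, but that is a single pass over the points.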

In your case, you should be looking at the GeoHash which allows you to quickly query the coordinates within a given distance.
FYI, MongoDB uses geohash internally and it's performing excellently.

Try an R-tree. An R-tree supports the query "find all points that are no further away from a given point than a given radius". I think the typical query time is about O(log n + k), where k is the number of points in the result, although an R-tree gives no worst-case guarantee.

You could compute the geocodes covering a 1km range around each of those 1000 coordinates and check whether any points fall in that range. Maybe it's not optimal, but you will save yourself some sorting.

If you want to look up each point against every other point in the matrix, then you already have the right formula (1000^2-1000). There isn't any shortcut for this calculation. However, when you know where to start the search and you want to look for points within a 1KM radius, you can use a grid or a spatial index to speed up the lookup. Most likely it uses a divide-and-conquer approach, and the cheapest of these is a geohash or a Z-order curve. You can also try a kd-tree. Maybe this is even simpler. But if your points are in Euclidean space, then there is the planar method described here: http://en.wikipedia.org/wiki/Closest_pair_of_points_problem.
Edit: When I say 1000^2-1000, I mean the size of the matrix without its diagonal, but there are actually only 1000 * (1000 − 1) / 2 = 499,500 distinct pairs of points, so a lot less math.
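As a quick sanity check of those counts (this snippet is an illustration, not part of the original answer), the number of ordered and unordered pairs for n = 1000 can be verified like this:

```python
from itertools import combinations

n = 1000
ordered_pairs = n * n - n            # every (i, j) with i != j
unordered_pairs = n * (n - 1) // 2   # each pair of points counted once

# Cross-check the closed form by actually enumerating the pairs.
enumerated = sum(1 for _ in combinations(range(n), 2))
print(ordered_pairs, unordered_pairs, enumerated)  # 999000 499500 499500
```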

I have something sort of similar on a web page I worked on, I think. The user clicks a location on the map and enters a radius, and a function returns all the locations within a database within the given radius. Do you mean you are trying to find the points that are within 1km of one of the points in the radius? Or are you trying to find the points that are within 1km of each other? I think you should do something like this.
// givenRadius and givenPoint are placeholders for your own input;
// the radius must be in the same units as the stored coordinates
var radius = givenRadius;
var x1 = givenPoint.latitude;
var y1 = givenPoint.longitude;
var pointsInRange = [];
for (var i = 0; i < locationArray.length; i++) {
    var x = x1 - locationArray[i].latitude;
    var y = y1 - locationArray[i].longitude;
    var dist = Math.sqrt(x * x + y * y);
    if (dist <= radius) {
        pointsInRange.push(locationArray[i]); // these are your points
    }
}
If you are trying to calculate all of the points that are within 1km of another point, you could add an outer loop giving the information of x1 and y1, which would then make the inner loop test the distance between the given point and every other point giving every point in your matrix as input. The calculations shouldn't take too long, since it is so basic.
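One thing to watch out for with the loop above is that latitude and longitude are in degrees while the radius is in km. A minimal Python sketch of the all-pairs version, assuming a flat-earth (equirectangular) approximation is acceptable for a 1 km radius, could look like this. The helper names, the Earth radius constant and the sample coordinates are my own assumptions, not from the answer.

```python
from math import cos, hypot, radians

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def to_local_km(lat, lon, ref_lat):
    """Project (lat, lon) in degrees to flat (x, y) in km near `ref_lat`."""
    x = radians(lon) * cos(radians(ref_lat)) * EARTH_RADIUS_KM
    y = radians(lat) * EARTH_RADIUS_KM
    return x, y

def pairs_within(locations, radius_km=1.0):
    """Brute-force check of every unordered pair of (lat, lon) points."""
    ref_lat = sum(lat for lat, _ in locations) / len(locations)
    flat = [to_local_km(lat, lon, ref_lat) for lat, lon in locations]
    close = []
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):  # inner loop starts at i + 1: each pair once
            if hypot(flat[i][0] - flat[j][0], flat[i][1] - flat[j][1]) <= radius_km:
                close.append((i, j))
    return close

# Made-up sample positions: the first two are well under 1 km apart.
vehicles = [(52.5200, 13.4050), (52.5250, 13.4100), (52.6000, 13.4050)]
print(pairs_within(vehicles))  # [(0, 1)]
```

For 1000 points this is about half a million distance checks per update, which is usually fine, but it grows quadratically.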

I had the same problem, but in web service development.
In my case, to avoid the calculation-time problem, I used a simple divide & conquer solution: the idea was to calculate the distance between the new point and all the others on every new data insertion, so that my application can directly access the distance between any two points, which has already been calculated and stored in my database.

Related

How can you iterate linearly through a 3D grid?

Assume we have a 3D grid that spans some 3D space. This grid is made out of cubes, the cubes need not have integer length, they can have any possible floating point length.
Our goal is, given a point and a direction, to check linearly each cube in our path once and exactly once.
So if this was just a regular 3D array and the direction is say in the X direction, starting at position (1,2,0) the algorithm would be:
for(i in number of cubes)
{
grid[1+i][2][0]
}
But of course the origin and the direction are arbitrary and floating point numbers, so it's not as easy as iterating through only one dimension of a 3D array. And the fact the side lengths of the cubes are also arbitrary floats makes it slightly harder as well.
Assume that your cube side lengths are s = (sx, sy, sz), your ray direction is d = (dx, dy, dz), and your starting point is p = (px, py, pz). Then, the ray that you want to traverse is r(t) = p + t * d, where t is an arbitrary positive number.
Let's focus on a single dimension. If you are currently at the lower boundary of a cube, then the step length dt that you need to make on your ray in order to get to the upper boundary of the cube is: dt = s / d. And we can calculate this step length for each of the three dimensions, i.e. dt is also a 3D vector.
Now, the idea is as follows: Find the cell where the ray's starting point lies in and find the parameter values t where the first intersection with the grid occurs per dimension. Then, you can incrementally find the parameter values where you switch from one cube to the next for each dimension. Sort the changes by the respective t value and just iterate.
Some more details:
cell = floor((p - gridLowerBound) / s) <-- the / is component-wise division
I will only cover the case where the direction is positive. There are some minor changes if you go in the negative direction but I am sure that you can do these.
Find the first intersections per dimension (nextIntersection is a 3D vector):
nextIntersection = (gridLowerBound + (cell + (1, 1, 1)) * s - p) / d
And calculate the step length:
dt = s / d
Now, just iterate:
loop
    visit cell
    if(nextIntersection.x < nextIntersection.y && nextIntersection.x < nextIntersection.z)
        cell.x++
        nextIntersection.x += dt.x
    else if(nextIntersection.y < nextIntersection.z)
        cell.y++
        nextIntersection.y += dt.y
    else
        cell.z++
        nextIntersection.z += dt.z
    end if
    if cell is outside of grid
        terminate
end loop
I have omitted the case where two or three cells are changed at the same time. The above code will only change one at a time. If you need this, feel free to adapt the code accordingly.
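For reference, the traversal described above could be sketched in Python roughly like this. It is only an illustration under some assumptions of mine: the grid is given by its lower corner `lower`, cell size `s` and cell counts `dims`, negative directions are handled by stepping the cell index down, and ties are still resolved one axis at a time, as in the pseudocode. The function name `traverse` and the `max_steps` guard are made up.

```python
from math import floor, inf

def traverse(p, d, s, lower, dims, max_steps=10_000):
    """Yield the integer cell indices that the ray p + t*d visits, in order."""
    if not any(d):
        raise ValueError("direction must be non-zero")
    cell = [floor((p[k] - lower[k]) / s[k]) for k in range(3)]
    step, next_t, dt = [0] * 3, [inf] * 3, [inf] * 3
    for k in range(3):
        if d[k] > 0:    # next boundary is the upper face of the current cell
            step[k] = 1
            next_t[k] = (lower[k] + (cell[k] + 1) * s[k] - p[k]) / d[k]
            dt[k] = s[k] / d[k]
        elif d[k] < 0:  # next boundary is the lower face of the current cell
            step[k] = -1
            next_t[k] = (lower[k] + cell[k] * s[k] - p[k]) / d[k]
            dt[k] = -s[k] / d[k]
    for _ in range(max_steps):
        if any(not 0 <= cell[k] < dims[k] for k in range(3)):
            return  # left the grid
        yield tuple(cell)
        k = next_t.index(min(next_t))  # axis whose boundary is hit first
        cell[k] += step[k]
        next_t[k] += dt[k]

path = list(traverse(p=(0.5, 0.5, 0.5), d=(1.0, 0.5, 0.0),
                     s=(1.0, 1.0, 1.0), lower=(0.0, 0.0, 0.0), dims=(4, 4, 4)))
print(path)
# [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0), (3, 1, 0), (3, 2, 0)]
```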
Well, if you are working with floats, you can write the equation of the line in the specified direction, parameterized by t. Because there is only a finite number of floats between any two floats, you can simply check which cube each of these points is in, since each point (x,y,z) has components that fall in the respective intervals defining a cube.
The issue gets a little bit harder if you consider intervals that are dense.
The key here is even with floats this is a discrete problem of searching. The fact that the equation of a line between any two points is a discrete set of points means you merely need to check them all to the cube intervals. What's better is there is a symmetry (a line) allowing you to enumerate each point easily with arithmetic expression, one after another for checking.
Also perhaps consider integer case first as it is same but slightly simpler in determining the discrete points as it is a line in Z_2^8?

In a restricted space with n dimension, how to find the coordinates of p points, so that they are as far as possible from each other?

For example, in a 2D space, with x [0 ; 1] and y [0 ; 1]. For p = 4, intuitively, I will place each point at each corner of the square.
But what can be the general algorithm?
Edit: The algorithm needs modification if the dimensions are not orthogonal to each other.
To uniformly place the points as described in your example you could do something like this:
var combinedSize = 0
for each dimension d in d0..dn {
combinedSize += d.length;
}
val listOfDistancesBetweenPointsAlongEachDimension = new List
for each d dimension d0..dn {
val percentageOfWholeDimensionSize = d.length/combinedSize
val pointsToPlaceAlongThisDimension = percentageOfWholeDimensionSize * numberOfPoints
listOfDistancesBetweenPointsAlongEachDimension[d.index] = d.length/(pointsToPlaceAlongThisDimension - 1)
}
Run on your example it gives:
combinedSize = 2
percentageOfWholeDimensionSize = 1 / 2
pointsToPlaceAlongThisDimension = 0.5 * 4
listOfDistancesBetweenPointsAlongEachDimension[0] = 1 / (2 - 1)
listOfDistancesBetweenPointsAlongEachDimension[1] = 1 / (2 - 1)
note: The minus 1 deals with the inclusive interval, allowing points at both endpoints of the dimension
2D case
In 2D (n=2) the solution is to place your p points evenly on some circle. If you want also to define the distance d between points then the circle should have radius around:
2*Pi*r = ~p*d
r = ~(p*d)/(2*Pi)
To be more precise you should use circumference of regular p-point polygon instead of circle circumference (I am too lazy to do that). Or you can compute the distance of produced points and scale up/down as needed instead.
So each point p(i) can be defined as:
p(i).x = r*cos((i*2.0*Pi)/p)
p(i).y = r*sin((i*2.0*Pi)/p)
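A small sketch of the 2D placement, using the exact circumradius of a regular p-gon with side length d, r = d / (2*sin(Pi/p)), instead of the circle approximation above. The function name `points_on_circle` is made up for the example, and it assumes p >= 2.

```python
from math import cos, dist, pi, sin

def points_on_circle(p, d):
    """Place p points on a circle so that neighbouring points are d apart."""
    # Circumradius of a regular p-gon with side length d (needs p >= 2).
    r = d / (2 * sin(pi / p))
    return [(r * cos(2 * pi * i / p), r * sin(2 * pi * i / p)) for i in range(p)]

pts = points_on_circle(4, 1.0)
# Distance between each point and the next one around the circle.
print([round(dist(pts[i], pts[(i + 1) % 4]), 6) for i in range(4)])  # [1.0, 1.0, 1.0, 1.0]
```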
3D case
Just use sphere instead of circle.
ND case
Use ND hypersphere instead of circle.
So your question boils down to place p "equidistant" points to a n-D hypersphere (either surface or volume). As you can see 2D case is simple, but in 3D this starts to be a problem. See:
Make a sphere with equidistant vertices
sphere subdivision triangulation
As you can see there are quite a few approaches to do this (there are much more of them even using Fibonacci sequence generated spiral) which are more or less hard to grasp or implement.
However If you want to generalize this into ND space you need to chose general approach. I would try to do something like this:
1. Place p uniformly distributed points inside the bounding hypersphere
each point should have position, velocity and acceleration vectors. You can also place the points randomly (just ensure none are at the same position)...
2. For each p compute acceleration
each p should repel every other point (opposite of gravity).
3. update position
just do a Newton D'Alembert physics simulation in ND. Do not forget to include some dampening of speed so the simulation will stop in time. Bound the position and speed to the sphere so points will not cross its border, nor have their speed reflected inwards.
4. loop #2 until max speed of any p drops below some threshold
This will more or less accurately place p points on the circumference of the ND hypersphere, which gives you the minimal distance d between them. If there is some special dependency between n and p, then there might be better configurations than this, but for arbitrary numbers I think this approach should be safe enough.
Now, by modifying the rules in #2 you can achieve 2 different outcomes: one filling the hypersphere surface (by placing a massive negative mass into the center) and the second filling its volume. The radius will also be different for these two options: for one you need to use the surface and for the other the volume...
Here is an example of a similar simulation used to solve a geometry problem:
How to implement a constraint solver for 2-D geometry?
Here is a preview of the 3D surface case:
The number on top is the max abs speed of particles used to determine the simulations stopped and the white-ish lines are speed vectors. You need to carefully select the acceleration and dampening coefficients so the simulation is fast ...
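A very rough Python sketch of the repulsion loop described above, for any number of dimensions n. Several things here are my own simplifications and not from the answer: it runs a fixed number of steps instead of watching the max speed, it only clamps positions to the unit sphere (velocities are left to the dampening), and the values of `dt`, `damping` and `steps` are untuned guesses.

```python
import random
from math import sqrt

def spread_points(p, n, steps=300, dt=0.02, damping=0.9, seed=0):
    """Spread p points inside the unit n-ball by mutual repulsion."""
    rng = random.Random(seed)
    # Random start near the centre; the clamp below keeps points in the ball.
    pos = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(p)]
    vel = [[0.0] * n for _ in range(p)]
    for _ in range(steps):
        # Every point is pushed away from every other point (inverse-square law).
        acc = [[0.0] * n for _ in range(p)]
        for i in range(p):
            for j in range(p):
                if i == j:
                    continue
                diff = [a - b for a, b in zip(pos[i], pos[j])]
                d2 = sum(c * c for c in diff) + 1e-9  # softening avoids division by zero
                for k in range(n):
                    acc[i][k] += diff[k] / (d2 * sqrt(d2))
        # Damped integration, then clamp the position to the bounding sphere.
        for i in range(p):
            for k in range(n):
                vel[i][k] = (vel[i][k] + acc[i][k] * dt) * damping
                pos[i][k] += vel[i][k] * dt
            norm = sqrt(sum(c * c for c in pos[i]))
            if norm > 1.0:
                pos[i] = [c / norm for c in pos[i]]
    return pos

result = spread_points(p=4, n=2)
for point in result:
    print([round(c, 3) for c in point])
```

For a real use you would add the stop condition on the maximum speed and tune the coefficients, as noted above.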

What is the best way to check all pixels within certain radius?

I'm currently developing an application that will alert users of incoming rain. To do this I want to check certain area around user location for rainfall (different pixel colours for intensity on rainfall radar image). I would like the checked area to be a circle but I don't know how to do this efficiently.
Let's say I want to check radius of 50km. My current idea is to take subset of image with size 100kmx100km (user+50km west, user+50km east, user+50km north, user+50km south) and then check for each pixel in this subset if it's closer to user than 50km.
My question here is, is there a better solution that is used for this type of problems?
If the occurrence of the event you are searching for (rain or anything) is relatively rare, then there's nothing wrong with scanning a square of pixels and then, only after detecting rain in that square, checking whether that rain is within the desired 50km circle. Note that the key point here is that you don't need to check each pixel of the square for being inside the circle (that would be very inefficient); you search for your event (rain) first, and only when you have found it do you check whether it falls into the 50km circle. To implement this efficiently you also have to develop some smart strategy for handling multi-pixel "stains" of rain on your image.
However, since you are scanning a raster image, you can easily implement the well-known Bresenham circle algorithm to find the starting and the ending point of the circle for each scan line. That way you can easily limit your scan to the desired 50km radius.
On second thought, you don't even need the Bresenham algorithm for that. For each row of pixels in your square, calculate the points of intersection of that row with the 50km circle (using the usual schoolbook formula with square root), and then check all pixels that fall between these intersection points. Process all rows in the same fashion and you are done.
P.S. Unfortunately, the Wikipedia page I linked does not present the Bresenham algorithm at all. It has code for the Michener circle algorithm instead. The Michener algorithm will also work for circle rasterization purposes, but it is less precise than the Bresenham algorithm. If you care about precision, find a true Bresenham somewhere. It is actually surprisingly difficult to find on the net: most search hits erroneously present Michener as Bresenham.
There is: you can modify the midpoint circle algorithm to give you an array that holds, for each y, the x coordinate where the circle starts (and ends, which is the same thing because of symmetry). This array is easy to compute, pseudocode below.
Then you can just iterate over exactly the right part, without checking anything.
Pseudo code:
data = new int[radius];
int f = 1 - radius, ddF_x = 1;
int ddF_y = -2 * radius;
int x = 0, y = radius;
while (x < y)
{
if (f >= 0)
{
y--;
ddF_y += 2; f += ddF_y;
}
x++;
ddF_x += 2; f += ddF_x;
data[radius - y] = x; data[radius - x] = y;
}
Maybe you can try something that will speed up your algorithm.
In brute force algorithm you will probably use equation:
(x-p)^2 + (y-q)^2 < r^2
(p,q) - center of the circle, user position
r - radius (50km)
If you want to find all pixels (x,y) that satisfy the above condition and check them, your algorithm is O(n^2).
Instead of scanning all pixels in this circle, I would check only the pixels that are on the border of the circle.
In that case, you can use a more clever way to define the circle:
x = p + r*cos(a)
y = q + r*sin(a)
a - angle measured in radians [0-2pi]
Now you can sample some angles, for example twenty of them, iterate and find all pairs (x,y) that are on the border for a radius of 50km. Now check whether they are in the rain zone and alert the user.
For more safety I recommend using multiple radii (smaller than 50km), because a whole rain cloud can be inside the circle without touching its border, and your app would not detect it. For example, use 3 inner circles (r = 5km, 15km, 30km) and do the same thing. The efficiency of this algorithm depends only on the number of angles and the number of inner circles.
Pseudocode will be:
checkRainDanger()
p,q <- position
radius[] <- array of radii
for c = 1 to length(radius)
a=0
while(a<2*pi)
x = p + radius[c]*cos(a)
y = q + radius[c]*sin(a)
if rainZone(x,y)
return true
else
a+=pi/10
end_while
end_for
return false //no danger
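import math

r = 50  # radius in pixels (example value)
r2 = r * r
for x in range(-r, r + 1):
    max_y = math.isqrt(r2 - x * x)  # integer half-height of the circle in this column
    for y in range(-max_y, max_y + 1):
        pass  # (x, y) is inside the circle - check for rain here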

find all points within a range to any point of an other set

I have two sets of points A and B.
I want to find all points in B that are within a certain range r of A, where a point b in B is said to be within range r of A if there is at least one point a in A whose (Euclidean) distance to b is less than or equal to r.
Each of the two sets is a coherent set of points. They are generated from the voxel locations of two non-overlapping objects.
In 1D this problem is fairly easy: all points of B within [min(A)-r max(A)+r]
But I am in 3D.
What is the best way to do this?
I currently repetitively search for every point in A all points in B that within range using some knn algorithm (ie. matlab's rangesearch) and then unite all those sets. But I got a feeling that there should be a better way to do this. I'd prefer a high level/vectorized solution in matlab, but pseudo code is fine too :)
I also thought of writing all the points to images and using image dilation on object A with a radius of r. But that sounds like quite an overhead.
You can use a k-d tree to store all points of A.
Iterate over the points b of B, and for each point find the nearest point in A (let it be a) using the k-d tree. The point b should be included in the result if and only if the distance d(a,b) is at most r.
Complexity will be O(|B| * log(|A|) + |A|*log(|A|))
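A minimal pure-Python sketch of this k-d tree approach is below. It is only meant to show the idea, not to be fast: the tree is stored as nested tuples, the build step re-sorts at every level (so it is slower than the bound above suggests), and the function names `build`, `nearest_dist` and `within_range` are made up for the example.

```python
from math import dist

def build(points, depth=0):
    """Build a k-d tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda q: q[axis])
    mid = len(points) // 2
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest_dist(node, target, depth=0, best=float("inf")):
    """Distance from `target` to its nearest neighbour stored in the tree."""
    if node is None:
        return best
    point, left, right = node
    axis = depth % len(target)
    best = min(best, dist(point, target))
    delta = target[axis] - point[axis]
    near, far = (left, right) if delta < 0 else (right, left)
    best = nearest_dist(near, target, depth + 1, best)
    # The far side can only help if the splitting plane is closer than `best`.
    if abs(delta) < best:
        best = nearest_dist(far, target, depth + 1, best)
    return best

def within_range(A, B, r):
    """All points of B whose nearest point in A is at most r away."""
    tree = build(A)
    return [b for b in B if nearest_dist(tree, b) <= r]

A = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
B = [(0.5, 0, 0), (3, 3, 3), (1, 1, 0)]
print(within_range(A, B, r=1.0))  # [(0.5, 0, 0), (1, 1, 0)]
```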
I achieved a further speedup by enhancing @amit's solution: first filter out the points of B that are definitely too far away from all points in A, because they are too far away even in a single dimension (kind of following the 1D solution mentioned in the question).
Doing so limits the complexity to O(|B|+min(|B|,(2r/res)^3) * log(|A|) + |A|*log(|A|)) where res is the minimum distance between two points and thus reduces run time in the test case to 5s (from 10s, and even more in other cases).
example code in matlab:
r=5;
A=randn(10,3);
B=randn(200,3)+5;
roughframe=[min(A,[],1)-r;max(A,[],1)+r];
sortedout=any(bsxfun(@lt,B,roughframe(1,:)),2)|any(bsxfun(@gt,B,roughframe(2,:)),2);
B=B(~sortedout,:);
[~,dist]=knnsearch(A,B);
B=B(dist<=r,:);
bsxfun() is your friend here. So, say you have 10 points in set A and 3 points in set B. You want to have them arranged so that the singleton dimension is along the rows for one set and along the columns for the other. I will randomly generate them for demonstration
A = rand(10, 1, 3); % 10 points in x, y, z, singleton in rows
B = rand(1, 3, 3); % 3 points in x, y, z, singleton in cols
Then, distances among all the points can be calculated in two steps
dd = bsxfun(@(x,y) (x - y).^2, A, B); % differences of x, y, z in squares
d = sqrt(sum(dd, 3)); % this completes sqrt(dx^2 + dy^2 + dz^2)
Now you have an array of the distances between the points in A and B. So, for example, the distance between point 3 in A and point 2 in B should be in d(3, 2). Hope this helps.

How to filter a set of 2D points moving in a certain way

I have a list of points moving in two dimensions (x- and y-axis) represented as rows in an array. I might have N points - i.e., N rows:
1 t1 x1 y1
2 t2 x2 y2
.
.
.
N tN xN yN
where ti, xi, and yi are the time index, x-coordinate, and y-coordinate for point i. The time index ti is an integer from 1 to T. The number of points at each such possible time index can vary from 0 to N (still with only N points in total).
My goal is to filter out all the points that do not move in a certain way; or to keep only those that do. A point must move in a parabolic trajectory, with decreasing x- and y-coordinate (i.e., moving to the left and downwards only). Points with other dynamic behaviour must be removed.
Can I use a simple sorting mechanism on this array, and then analyse the order of the time index? I have also considered the fact that points having the same time index ti are physically distinct points, and so should be paired up with other points. The complexity of the problem grew, and now I turn to you.
NOTE: You can assume that the points are confined to a sub-region of the (x,y)-plane between two parabolic curves. These curves intersect at only one point: a point close to the origin of motion for any point.
More Information:
I have made some datafiles available:
MATLAB datafile (1.17 kB)
same data as CSV with semicolon as column separator (2.77 kB)
Necessary context:
The datafile hold one uint32 array with 176 rows and 5 columns. The columns are:
pixel x-coordinate in 175-by-175 lattice
pixel y-coordinate in 175-by-175 lattice
discrete theta angle-index
time index (from 1 to T = 10)
row index for this original sorting
The points "live" in a 175-by-175 pixel lattice, inside the upper quadrant of a circle with radius 175. The points travel on the circle circumference in a counterclockwise rotation up to a certain angle theta with the horizontal, where they are thrown off into something close to a parabolic orbit. Column 3 holds a discrete index into a list with indices 1 to 45 covering 0 to 90 degrees (one index thus spans 2 degrees). The theta angle was originally deduced solely from the points by setting up the trivial equations of motion and solving for the angle. This gives rise to a quasi-symmetric quartic which can be solved in closed form. The actual metric radius of the circle is 0.2 m and the pixel coordinates were converted to metric using simple linear interpolation (but what we see here are the points in original pixel space).
My problem is that some points are not behaving properly, and since I need to do statistics on the theta angle, I need to remove the points that certainly do NOT move in a parabolic trajectory. These errors are expected and fully natural, but still need to be filtered out.
MATLAB plot code:
% load data and setup variables:
load mat_points.mat;
num_r = 175;
num_T = 10;
num_gridN = 20;
% begin plotting:
figure(1000);
clf;
plot( ...
num_r * cos(0:0.1:pi/2), ...
num_r * sin(0:0.1:pi/2), ...
'Color', 'k', ...
'LineWidth', 2 ...
);
axis equal;
xlim([0 num_r]);
ylim([0 num_r]);
hold all;
% setup grid (yea... went crazy with one):
vec_tickValues = linspace(0, num_r, num_gridN);
cell_tickLabels = repmat({''}, size(vec_tickValues));
cell_tickLabels{1} = sprintf('%u', vec_tickValues(1));
cell_tickLabels{end} = sprintf('%u', vec_tickValues(end));
set(gca, 'XTick', vec_tickValues);
set(gca, 'XTickLabel', cell_tickLabels);
set(gca, 'YTick', vec_tickValues);
set(gca, 'YTickLabel', cell_tickLabels);
set(gca, 'GridLineStyle', '-');
grid on;
% plot points per timeindex (with increasing brightness):
vec_grayIndex = linspace(0,0.9,num_T);
for num_kt = 1:num_T
vec_xCoords = mat_points((mat_points(:,4) == num_kt), 1);
vec_yCoords = mat_points((mat_points(:,4) == num_kt), 2);
plot(vec_xCoords, vec_yCoords, 'o', ...
'MarkerEdgeColor', 'k', ...
'MarkerFaceColor', vec_grayIndex(num_kt) * ones(1,3) ...
);
end
Thanks :)
Why, it looks almost as if you're simulating a radar tracking debris from the collision of two missiles...
Anyway, let's coin a new term: object. Objects are moving along parabolae and at certain times they may emit flashes that appear as points. There are also other points which we are trying to filter out.
We will need some more information:
Can we assume that the objects obey the physics of things falling under gravity?
Must every object emit a point at every timestep during its lifetime?
Speaking of lifetime, do all objects begin at the same time? Can some expire before others?
How precise is the data? Is it exact? Is there a measure of error? To put it another way, do we understand how poorly the points from an object might fit a perfect parabola?
Sort the data with (index,time) as keys and, for all locations of a point i, see if they follow a parabolic trajectory?
Which part are you having problems with? Sorting should be very easy. IMHO, it is the second part (testing whether a set of points follows a parabolic trajectory) that is difficult.
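For the second part, one possible check is sketched below. It assumes the hard step is already done, i.e. the points have been grouped into one track per object with consecutive integer time indices, and that the motion has roughly constant acceleration, so the second differences of x and y should be nearly constant. The function name `follows_parabola`, the tolerance `tol` (in pixels) and the sample tracks are all made up for illustration.

```python
def follows_parabola(track, tol=1.0):
    """track: list of (t, x, y) sorted by t, with consecutive integer t."""
    xs = [x for _, x, _ in track]
    ys = [y for _, _, y in track]
    # Requirement from the question: move to the left and downwards only.
    if any(b >= a for a, b in zip(xs, xs[1:])):
        return False
    if any(b >= a for a, b in zip(ys, ys[1:])):
        return False

    def second_diffs(v):
        return [v[i + 2] - 2 * v[i + 1] + v[i] for i in range(len(v) - 2)]

    # Constant acceleration means constant second differences on each axis.
    for dd in (second_diffs(xs), second_diffs(ys)):
        if dd and max(dd) - min(dd) > tol:
            return False
    return True

good = [(t, 100 - 5 * t, 100 - 2 * t - t * t) for t in range(1, 6)]
bad = [(1, 95, 97), (2, 90, 92), (3, 85, 70), (4, 80, 69), (5, 75, 65)]
print(follows_parabola(good), follows_parabola(bad))  # True False
```

With noisy pixel data the tolerance would need to be chosen from the expected measurement error, and a least-squares fit of a quadratic per track would probably be more robust than raw differences.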

Resources