How does the Google Distance Matrix API calculate the distance from point A to B? Often there are multiple routes from A to B, and the question is how Google prioritizes the different routes to find the one that is used for the distance calculation. Strategies could be:
Fastest
Shortest
Low risk of queues
Etc.
Sincerely,
Henning
Google Maps routing prioritizes the fastest route, but the Distance Matrix API can also give you an accurate distance in meters for that route. Here is an answer from Nick Johnson, unfortunately not about this exact question: What algorithms compute directions from point A to point B on a map?. At least it suggests the algorithm is a modified one. I think a fastest-route calculation makes the map more flexible; I don't see why they couldn't switch between both.
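For what it's worth, here is a minimal sketch (Python with the requests library; the coordinates and API key are placeholders) of querying the Distance Matrix API. The distance field is reported in meters for whatever route Google selected, and departure_time enables traffic-aware durations:

```python
import requests

# Placeholder coordinates and API key -- substitute your own.
params = {
    "origins": "52.5200,13.4050",       # point A (lat,lng)
    "destinations": "48.1351,11.5820",  # point B (lat,lng)
    "mode": "driving",
    "departure_time": "now",            # enables traffic-aware duration
    "key": "YOUR_API_KEY",
}
resp = requests.get(
    "https://maps.googleapis.com/maps/api/distancematrix/json",
    params=params,
).json()

element = resp["rows"][0]["elements"][0]
print(element["distance"]["value"], "meters")   # distance of the chosen route
print(element["duration"]["value"], "seconds")  # travel time of that route
```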
I am developing a mobile application for recording user trips. A trip is made of a sequence of user positions (with longitude and latitude values).
Now my problem is how to determine whether a trip has already been traveled before. In other words, how do I detect duplicated paths between trips?
(I know that we can never have two trips with exactly the same data points, which is why I don't know how to begin; I am looking for an algorithm that can approximately address this problem.)
Thanks for your help!
There are a couple of trajectory distance measures that could help: Euclidean distance, Dynamic Time Warping (DTW), Edit Distance with Real Penalty (ERP), LCSS, ... Which one to pick depends on how you want to define similarity.
In this paper the authors describe these distance measures and evaluate them.
As far as I understand your scenario, an LCSS- or ERP-based similarity measure might fit. A quick search brought me to this GitHub repository.
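To make the idea concrete, here is a minimal sketch of Dynamic Time Warping between two trajectories (plain Euclidean distance between points is assumed for brevity; for real GPS coordinates you would substitute a haversine distance). A small DTW value means the trips follow roughly the same path, even if they were sampled at different rates:

```python
import math

def dtw_distance(traj_a, traj_b):
    # Dynamic Time Warping between two trajectories,
    # each given as a list of (lat, lon) tuples.
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    # dp[i][j] = cost of aligning traj_a[:i] with traj_b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(traj_a[i - 1], traj_b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # extra point in traj_a
                                  dp[i][j - 1],      # extra point in traj_b
                                  dp[i - 1][j - 1])  # points matched
    return dp[n][m]

print(dtw_distance([(0, 0), (1, 1), (2, 2)],
                   [(0, 0.1), (0.5, 0.6), (1, 1.1), (2, 2)]))
```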
Suppose there is a point cloud of 50,000 points in x-y-z 3D space. For every point in this cloud, what algorithms or data structures should be used to find its k neighbours that lie within a distance range [R,r]? The naive way is to go through each of the other 49,999 points for each of the 50,000 points and do a metric test, but this approach would take a long time. Just as there is the k-d tree for finding a nearest neighbour quickly, is there a real-time data structure/algorithm implementation out there to pre-process the point cloud and achieve the goal in the shortest time?
Your problem is part of the topic of Nearest Neighbor Search, or more precisely, k-Nearest Neighbor Search. The answer to your question depends on the data structure you use to store the points. If you use R-trees or variants like R*-trees, and you are doing multiple searches on your database, you will likely see a substantial performance improvement in two- or three-dimensional space compared with naive linear search. In higher dimensions, space-partitioning schemes tend to underperform linear search.
As some answers already suggest, for NN search you could use a tree structure such as a k-d tree. Implementations are available for all major programming languages.
If your notation [R,r] means a hollow sphere, you should compare one-time testing (is the distance within the interval?) against two stages (test against the outer radius, then remove the samples that also pass the test against the inner radius); see the sketch below.
You also did not mention your performance requirements (timing or frame rate?) or your intended application (which determines the feasible approaches).
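As a rough illustration of the two-stage idea, here is a sketch using SciPy's cKDTree (the radii are placeholders): query_ball_point collects everything inside the outer sphere, and the inner sphere is then filtered out with an exact distance test.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((50_000, 3))   # the point cloud
tree = cKDTree(points)             # one-time pre-processing

r_inner, r_outer = 0.05, 0.10      # placeholder radii

def hollow_sphere_neighbours(i):
    # Stage 1: everything within the outer radius of point i.
    outer = tree.query_ball_point(points[i], r_outer)
    # Stage 2: drop the point itself and everything inside the inner radius.
    d = np.linalg.norm(points[outer] - points[i], axis=1)
    return [j for j, dj in zip(outer, d) if dj >= r_inner and j != i]

print(hollow_sphere_neighbours(0)[:10])
```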
If you are using an ordinary Euclidean metric, you could go through the list three times and extract the points that are within R in each dimension, essentially extracting the enclosing cube (see the sketch below). Searching the resulting list would still be O(n^2) overall, but on a much smaller n.
There are efficient algorithms (on average, for random data); see Nearest neighbor search.
Your approach is simple, but not efficient.
Please read through, check your requirements, and get back to us so we can help.
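A minimal NumPy sketch of the enclosing-cube pre-filter described above (query point and radius are placeholders): the cheap per-axis test shrinks the candidate set before the exact distance test is applied.

```python
import numpy as np

points = np.random.rand(50_000, 3)
q = points[0]    # placeholder query point
R = 0.1          # placeholder outer radius

# Per-axis test: keep only points inside the enclosing cube of side 2R.
mask = np.all(np.abs(points - q) <= R, axis=1)
candidates = points[mask]

# Exact Euclidean test on the much smaller candidate set.
d = np.linalg.norm(candidates - q, axis=1)
print(np.count_nonzero(d <= R), "points within R of q")
```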
I have a floor on which various sensors are placed at different locations. For every transmitting device, sensors may detect its readings. There can be 6-7 sensors on a floor, and it is possible that a particular reading is not detected by some sensors but is detected by others.
For every reading I get, I would like to identify the location of that reading on the floor. We divide the floor logically into TILEs (5x5 feet areas) and compute what the reading at each TILE should ideally be, as detected by each sensor device (based on a transmission path-loss equation).
I am using the precomputed readings from the N sensor devices at each TILE as a point in N-dimensional space. When I get a real-life reading, I find the nearest neighbours of this reading and assign the reading to that location.
I would like to know if there is a variant of k-nearest-neighbours in which a dimension can be REMOVED from consideration. This would be especially useful when a particular sensor is not reporting any reading. I understand that putting a weighting on a dimension will be impossible with algorithms like k-d trees or R-trees. However, I would like to know whether it is possible to discard a dimension when computing nearest neighbours. Is there any such algorithm?
EDIT:
What I want to know is whether the same R-tree/k-d tree can be used for k-nearest searches with different queries, where each query has a different weighting of the dimensions. I don't want to construct another k-d tree for every different weighting.
EDIT 2:
Is there any library in Python that allows you to specify a custom distance function and search for the k nearest neighbours? Essentially, I want to use a different custom distance function for each query.
For both R-trees and k-d trees, using weighted Minkowski norms is straightforward. Just put the weights into your distance equations!
Putting weights into the Euclidean point-to-rectangle minimum distance is trivial: just look at the regular formula and plug in the weights as desired.
Distances are not used at tree construction time, so you can vary the weights as desired at query time.
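As a brute-force illustration of per-query weights (not tied to any particular tree implementation, and assuming a recent SciPy where the minkowski metric accepts a w argument): setting a sensor's weight to 0 effectively discards that dimension for that query.

```python
import numpy as np
from scipy.spatial.distance import cdist

tiles = np.random.rand(1000, 7)   # placeholder precomputed TILE readings, 7 sensors
reading = np.random.rand(7)       # placeholder real-life reading

def knn_weighted(query, weights, k=3):
    # Weighted Euclidean distance; a weight of 0 removes a dimension.
    d = cdist(tiles, query[None, :], metric="minkowski", p=2, w=weights)[:, 0]
    idx = np.argsort(d)[:k]
    return idx, d[idx]

# Sensor 2 reported nothing, so drop that dimension for this query only.
weights = np.array([1, 1, 0, 1, 1, 1, 1], dtype=float)
print(knn_weighted(reading, weights))
```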
After going through a lot of questions on Stack Overflow, and finally going through the details of the scipy k-d tree source code, I realised that the answer by "celion" at the following link is correct:
KD-Trees and missing values (vector comparison)
Excerpt:
"I think the best solution involves getting your hands dirty in the code that you're working with. Presumably the nearest-neighbor search computes the distance between the point in the tree leaf and the query vector; you should be able to modify this to handle the case where the point and the query vector are different sizes. E.g. if the points in the tree are given in 3D, but your query vector is only length 2, then the "distance" between the point (p0, p1, p2) and the query vector (x0, x1) would be
sqrt( (p0-x0)^2 + (p1-x1)^2 )
I didn't dig into the java code that you linked to, but I can try to find exactly where the change would need to go if you need help.
-Chris
PS - you might not need the sqrt in the equation above, since distance squared is usually equivalent."
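A minimal sketch of that idea (representing a missing reading as None is an assumption of this example): dimensions where the query has no value are simply skipped in the distance sum.

```python
import math

def partial_distance(point, query):
    # Compare only the dimensions where the query has a reading;
    # a missing reading is represented here as None.
    total = 0.0
    for p, q in zip(point, query):
        if q is not None:
            total += (p - q) ** 2
    return math.sqrt(total)

# Tile fingerprint from 3 sensors vs. a reading where sensor 1 is silent.
print(partial_distance((0.2, 0.5, 0.9), (0.25, None, 0.85)))
```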
I'm developing an algorithm and data structure to handle lookups by Euclidean distance on a large quantity of 2-dimensional points.
I've tried researching this on Google Scholar but have found nothing yet (probably because I don't know what this problem is usually called in the literature).
These are the two approaches I've considered:
Approach 1:
Create a two-dimensional grid of buckets. Insert points into buckets, keeping a reference to each point's bucket.
On lookup of point P with distance D, get its bucket B and all the buckets any of whose grid-square corners are at distance < D from B.
Finally, enumerate the points in all those buckets and calculate their distances to P (see the sketch after Approach 2).
Approach 2:
Create two lists, each containing all the points ordered by one of the coordinates (x, y). On lookup of point P with distance D, perform binary searches in each list to find the rectangular region in which points have a Chebyshev distance to P < D.
Finally, calculate the Euclidean distance to P for all those points.
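Referring back to Approach 1: here is a rough sketch, simplified so that the cell size equals the query distance D, which means a query only has to scan the 3x3 block of cells around P's cell rather than test bucket corners.

```python
from collections import defaultdict
from math import dist, floor

class Grid:
    # Uniform grid of buckets; assumes cell size >= query distance,
    # so all candidates lie in the 3x3 block of cells around P.
    def __init__(self, cell):
        self.cell = cell
        self.buckets = defaultdict(list)

    def _key(self, p):
        return (floor(p[0] / self.cell), floor(p[1] / self.cell))

    def insert(self, p):
        self.buckets[self._key(p)].append(p)

    def query(self, p, d):
        cx, cy = self._key(p)
        out = []
        for gx in range(cx - 1, cx + 2):
            for gy in range(cy - 1, cy + 2):
                for q in self.buckets.get((gx, gy), []):
                    if dist(p, q) <= d:
                        out.append(q)
        return out

g = Grid(cell=1.0)
for p in [(0.1, 0.2), (0.4, 0.9), (3.0, 3.0)]:
    g.insert(p)
print(g.query((0.3, 0.5), 1.0))   # -> [(0.1, 0.2), (0.4, 0.9)]
```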
I'm guessing that state-of-the-art algorithms will be vastly superior to these, though. Any ideas on this are appreciated.
Some tips to help you:
Take a look at the KDTree; it is a k-dimensional tree (2-d in your case) and one of the best ways to look for nearest neighbours.
Perhaps you could benefit from a spatial database, specifically developed to deal with geospatial data.
You could use either of the above with your desired distance function. Depending on your application, you may want map distance, great-circle distance, constant-slope distance, constant-bearing distance, etc. Your distance function should be known to you. I usually apply the great-circle (haversine) distance when dealing with Google-Maps-like maps and tracks.
In case you want a Python implementation, there is scipy.spatial (docs). From this module, the KDTree method query_ball_point((px, py), radius) seems to be what you're looking for.
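A minimal usage sketch (the points and radius are placeholders):

```python
import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(1_000_000, 2)   # placeholder 2-D points
tree = cKDTree(points)                  # build once, query many times

# Indices of all points within Euclidean distance 0.01 of (0.5, 0.5).
idx = tree.query_ball_point((0.5, 0.5), r=0.01)
print(len(idx), "points found")
```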
Hope this helps!
I just finished implementing a k-d tree for doing fast nearest-neighbour searches. I'm interested in playing around with distance metrics other than the Euclidean distance. My understanding of the k-d tree is that the speedy search is not guaranteed to be exact if the metric is non-Euclidean, which means that I might need to implement a new data structure and search algorithm if I want to try out new metrics.
I have two questions:
Does using a kd-tree permanently tie me to the Euclidean distance?
If so, what other sorts of algorithms should I try that work for arbitrary metrics? I don't have a ton of time to implement lots of different data structures, but other structures I'm considering include cover trees and VP-trees.
The nearest-neighbour search procedure described on the Wikipedia page you linked to can certainly be generalised to other distance metrics, provided you replace "hypersphere" with the equivalent geometric object for the given metric and test each hyperplane for crossings with this object.
Example: if you are using the Manhattan distance instead (i.e. the sum of the absolute values of the differences in the vector components), your hypersphere becomes a (multidimensional) diamond. (This is easiest to visualise in 2D: if your current nearest neighbour is at distance x from the query point p, then any closer neighbour behind a different hyperplane must intersect a diamond that has width and height 2x and is centred on p.) This might make the hyperplane-crossing test more difficult to code or slower to run, but the general principle still applies.
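You can see this in practice with SciPy's kd-tree, whose query method accepts any Minkowski p-norm (p=1 is the Manhattan distance, p=2 the Euclidean); a small sketch:

```python
import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(10_000, 2)
tree = cKDTree(points)
q = np.array([0.5, 0.5])

# Same tree, two metrics: the nearest neighbour can differ.
d2, i2 = tree.query(q, k=1, p=2)   # Euclidean distance
d1, i1 = tree.query(q, k=1, p=1)   # Manhattan distance
print(i2, d2, i1, d1)
```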
I don't think you're tied to Euclidean distance; as j_random_hacker says, you can probably use Manhattan distance. But I'm pretty sure you're tied to geometries that can be represented in Cartesian coordinates, so you couldn't use a k-d tree to index a general metric space, for example.