How do we get the shortest distance route from point A to B by default from Google Direction API - directions

How do we get the shortest-distance route from point A to B by default from the alternative routes suggested by the Google Directions API? By default it returns the shortest-duration route, depending on the current traffic conditions. I have noticed that Google responds with multiple alternative routes if you set "provideRouteAlternatives=true". Is there a parameter we can send to the API so that it always returns the shortest-distance route by default?

As Rameshwor has mentioned, the suggested route returned by Google may be optimised for journey time rather than journey distance. If one or more waypoints have been specified then only one route may be returned anyway, but programmatically it's best to assume that more than one route will always be returned.
The following example shows a simple way to find the route with the shortest journey distance in JavaScript; please note this code isn't optimised but it should work:
var route_options = [];
for (var i = 0; i < response.routes.length; i++) {
    var route = response.routes[i];
    var distance = 0;
    // Total the legs to find the overall journey distance for each route option
    for (var j = 0; j < route.legs.length; j++) {
        distance += route.legs[j].distance.value; // metres
    }
    route_options.push({
        'route_id': i,
        'distance': distance
    });
}
/* Example contents at this point:
route_options = [
    {route_id:0, distance:35125},
    {route_id:1, distance:22918},
    {route_id:2, distance:20561}
];
*/
// Sort the route options; shortest to longest distance in ascending order
route_options.sort(function(a, b) {
    return a.distance - b.distance;
});
/* After sorting:
route_options = [
    {route_id:2, distance:20561},
    {route_id:1, distance:22918},
    {route_id:0, distance:35125}
];
*/
var shortest_distance = (route_options[0]['distance'] * 0.001); // convert metres to kilometres
You can then access the "route_id" value to access the correct route in the response object.

By default Google returns the shortest-duration route, depending on the current traffic conditions. It responds with multiple alternative routes if you set “provideRouteAlternatives=true”. When you do not ask for alternative routes, you get only the most optimized route, although this isn’t necessarily optimized for distance.
If you’re trying to get the shortest route, then the way to do this would be to have your application evaluate the total distance of each route in the response, and programmatically select the one with the shortest distance. There isn’t a query parameter that you can pass to Google in the request to say “return only the shortest route”.
Check this demo
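The selection step described above can also be sketched server-side in Python. This is only a minimal sketch: it assumes the response has already been parsed into a dict with the routes / legs / distance.value layout used by the Directions output, and the function name shortest_route and the sample data are made up for illustration.

```python
def shortest_route(response):
    """Return (route_index, total_metres) for the shortest route in a parsed response."""
    totals = [
        sum(leg["distance"]["value"] for leg in route["legs"])  # metres per route
        for route in response["routes"]
    ]
    index = min(range(len(totals)), key=totals.__getitem__)
    return index, totals[index]


# Hypothetical hand-written response, reduced to the fields used above
sample = {"routes": [
    {"legs": [{"distance": {"value": 30000}}, {"distance": {"value": 5125}}]},
    {"legs": [{"distance": {"value": 22918}}]},
    {"legs": [{"distance": {"value": 20561}}]},
]}

index, metres = shortest_route(sample)
print(index, metres / 1000)  # route index and distance in kilometres
```

Note that the sketch raises a ValueError when the response contains no routes, so a real caller would want to check for that first.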

Using the shortest route rather than the fastest route is generally not a good idea in practice.
When the shortest route is not the fastest, it is likely to be a lower quality route in terms of time, fuel efficiency and sometimes even personal safety. These factors are more important to the majority of drivers on the road.
There are a few workarounds that could potentially yield shorter routes, but they have significant drawbacks, so I'd recommend against them:
Request routes in both directions.
Directions from A to B may not yield a feasible route from B to A due to situations like one-way streets, turn restrictions and different locations of highway exits. Requesting routes in both directions and taking the shortest route may yield a route that is not usable in one direction.
Request alternative routes.
Asking for alternative routes and picking the shortest route can yield a shorter route than that returned by default. However, alternative routes are not generally stable (may change over time as short-term road conditions change) nor guaranteed to include the shortest route. This means that the shortest route may still not be available, and also the shortest route found by this approach may change over time, giving an impression of instability.
Oftentimes I've seen requests for the shortest routes come from use cases where the goal is rather a realistic driving distance at specific times of the day and week, e.g. workers commuting to/from work. In these cases, driving directions can be requested with departure_time set to the relevant time of the day (and day of the week) within a week (in the future, not past) to obtain a route influenced by typical traffic conditions at that time of the day and week.
Note: this is only available if the request includes an API key or a Google Maps APIs Premium Plan client ID.
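As a rough illustration of the departure_time suggestion, the sketch below builds a Directions web service request URL for the next occurrence of a given weekday and hour. The helper names next_weekday_at and directions_url are hypothetical, the time is handled in UTC for simplicity (a real commute time would need the local time zone), and no request is actually sent.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/directions/json"


def next_weekday_at(weekday, hour, now=None):
    """Next future UTC datetime on `weekday` (Monday=0) at `hour`:00, within 7 days."""
    now = now or datetime.now(timezone.utc)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:  # departure_time must not be in the past
        candidate += timedelta(days=7)
    return candidate


def directions_url(origin, destination, departure, api_key):
    """Build the request URL; departure_time is sent as seconds since the epoch."""
    params = {
        "origin": origin,
        "destination": destination,
        "departure_time": int(departure.timestamp()),
        "key": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


# Example: a Monday 08:00 (UTC) commute; "YOUR_API_KEY" is a placeholder
print(directions_url("Point A", "Point B", next_weekday_at(0, 8), "YOUR_API_KEY"))
```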


How do I find the largest cluster in this simple dataset?

I have data on users and their interests. Some users have more interests than others. Data looks like below.
How do I find the largest cluster of users with the most interests in common? Formally, I am trying to maximize (number of users in cluster * number of shared interests in cluster)
In the data below, the largest cluster is:
Users: [1,2,3]
Interests: [2,3]
Cluster-value: 3 users x 2 shared interests = 6
User 1: {3,2}
User 2: {3,2,4}
User 3: {2,3,8}
User 4: {7}
User 5: {7}
User 6: {9}
Here is a hypothetical data generation process:
import random

# Generate 300 random (user, interest) tuples
def generate_data():
    data = []
    while len(data) < 300:
        # 100 users, 50 interests
        data_pt = {"user": random.randint(1, 100), "interest": random.randint(1, 50)}
        if data_pt not in data:
            data.append(data_pt)
    return data

def largest_cluster(data):
    # the function I am trying to write
    return None
UPDATE: As somebody pointed out, the data is too sparse. In the real case, there would be more users than interests. So I have updated the data generating process.
This looks to me like the kind of combinatorial optimization problem which would fall into the NP-Hard complexity class, which would of course mean that it's intractable to find an exact solution for instances with more than ~30 users.
Dynamic Programming would be the tool you'd want to employ if you were to find a usable algorithm for a problem with an exponential search space like this (here the solution space is all 2^n subsets of users), but I don't see DP helping us here because of the lack of overlapping sub-problems. That is, for DP to help, we have to be able to use and combine solutions to smaller sub-problems into an overall solution in polynomial time, and I don't see how we can do that for this problem.
Imagine you have a solution for a size=k problem, using a limited subset of the users {u1, u2, ..., uk}, and you want to use that solution to find the new solution when you add another user u(k+1). The issue is that the solution set in the incrementally larger instance might not overlap at all with the previous solution (it may be an entirely different group of users/interests), so we can't effectively combine solutions to subproblems to get the overall solution. If, instead of using only the single optimal solution for the size-k problem to reason about the size-(k+1) problem, you stored all possible user combinations from the smaller instance along with their scores, you could quite easily do set intersections across these groups' interests with the new user's interests to find the new optimal solution. However, the information you have to store would double with each iteration, yielding an exponential-time algorithm no better than the brute-force solution. You run into similar problems if you try to base your DP on incrementally adding interests rather than users.
So if you know you only have a few users, you can use the brute-force approach: generate all user combinations, take a set intersection of each combination's interests, score each one and save the max score. The best way to approach larger instances would probably be with approximate solutions through search algorithms (unless there is a DP solution I don't see). You could iteratively add/subtract/swap users to improve the score and climb towards an optimum, or use a branch-and-bound algorithm which systematically explores all user combinations but stops exploring any user-subset branches with a null interest intersection (as adding additional users to that subset will still produce a null intersection). You might have a lot of user groups with null interest intersections, so this latter approach could be quite quick in practice because it prunes off large parts of the search space, and if you ran it without a depth limit it would find the exact solution eventually.
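For reference, the brute-force approach for small inputs could look roughly like the sketch below. The function name and the dict-of-sets input format are my own choices rather than anything from the question; the sample data is the six-user example from the question.

```python
from itertools import combinations


def largest_cluster_brute_force(user_interests):
    """user_interests: dict mapping user -> set of interests.

    Tries every non-empty group of users (exponential in the number of users)
    and returns (users, shared_interests, score) for the best-scoring group.
    """
    best = ((), set(), 0)
    users = sorted(user_interests)
    for size in range(1, len(users) + 1):
        for group in combinations(users, size):
            shared = set.intersection(*(user_interests[u] for u in group))
            score = len(group) * len(shared)
            if score > best[2]:
                best = (group, shared, score)
    return best


sample = {1: {3, 2}, 2: {3, 2, 4}, 3: {2, 3, 8}, 4: {7}, 5: {7}, 6: {9}}
print(largest_cluster_brute_force(sample))  # users (1, 2, 3), interests {2, 3}, score 6
```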
Branch-and-bound would work something like this:
def get_largest_cluster(pairs):
    # pairs: iterable of (user, interest) tuples
    user_interests = {}  # build a dict: user -> {set of user's interests}
    for user, interest in pairs:
        user_interests.setdefault(user, set()).add(interest)
    users = list(user_interests)  # save list of users to iterate over
    best = {"score": 0, "cluster": set(), "interests": set()}

    # defined locally so it can see users, user_interests and best
    def generate_cluster_scores(i, cluster, cluster_interests):
        score = len(cluster) * len(cluster_interests)
        if score > best["score"]:
            best.update(score=score, cluster=set(cluster), interests=set(cluster_interests))
        if i == len(users):
            return
        cur_user = users[i]
        cur_interests = user_interests[cur_user]
        new_interests = cur_interests if not cluster else cluster_interests & cur_interests
        # generate the remaining subsets without and with cur_user
        generate_cluster_scores(i + 1, cluster, cluster_interests)
        if new_interests:  # bound the search here
            generate_cluster_scores(i + 1, cluster | {cur_user}, new_interests)

    # generate and score user clusters
    generate_cluster_scores(0, set(), set())
    return best["cluster"], best["interests"], best["score"]
You might be able to do more sophisticated bounding (for example, pruning when the current cluster could not beat your current best score even if all the remaining users were added and the interest intersection stayed the same), but checking for an empty interest intersection is simple enough. This works fine for 100 users and 50 interests, up to around 800 data points. You could also make it more efficient by iterating over whichever of the users and the interests is smaller (to generate fewer recursive calls/combinations) and mirroring the logic for the case where there are fewer interests. Also, you get more interesting clusters with fewer users/interests.
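The more sophisticated bound mentioned above could be sketched as follows. This is one possible reading, not a tested production implementation: a branch is dropped when (current cluster size + users still undecided) * (current shared interests) cannot beat the best score found so far. All names here are illustrative.

```python
def largest_cluster(pairs):
    """pairs: iterable of (user, interest). Returns (users, interests, score)."""
    interests_of = {}
    for user, interest in pairs:
        interests_of.setdefault(user, set()).add(interest)
    users = sorted(interests_of)
    best = {"score": 0, "users": frozenset(), "interests": frozenset()}

    def search(i, cluster, shared):
        score = len(cluster) * len(shared)
        if score > best["score"]:
            best.update(score=score, users=frozenset(cluster), interests=frozenset(shared))
        if i == len(users):
            return
        # Optimistic bound: every remaining user joins and `shared` does not shrink.
        if cluster and (len(cluster) + len(users) - i) * len(shared) <= best["score"]:
            return
        user = users[i]
        new_shared = shared & interests_of[user] if cluster else interests_of[user]
        if new_shared:  # an empty intersection can never become non-empty again
            search(i + 1, cluster | {user}, new_shared)
        search(i + 1, cluster, shared)

    search(0, frozenset(), frozenset())
    return best["users"], best["interests"], best["score"]


pairs = [(1, 3), (1, 2), (2, 3), (2, 2), (2, 4), (3, 2), (3, 3), (3, 8),
         (4, 7), (5, 7), (6, 9)]
print(largest_cluster(pairs))
```

Trying the "include this user" branch first tends to raise the best score early, which makes the bound cut more of the remaining search.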

Cypher recommendation query performance

I am working with rNeo4j for a recommendation application and I am having some issues writing an efficient query. The goal of the query is to recommend an item to a user, with the stipulation that they have not used the item before.
I want to return the item's name, the nodes on the path (for a visualization of the recommendation), and some additional measures to be able to make the recommendation as relevant as possible. Currently I'm returning the number of users that have used the item before, the length of the path to the recommendation, and a sum of the qCount relationship property.
Current query:
MATCH (subject:User {id: {idQ}}), (rec:Item),
      p = shortestPath((subject)-[*]-(rec))
WHERE NOT (subject)-[:ACCESSED]->(rec)
MATCH (users:User)-[:ACCESSED]->(rec)
RETURN rec.Name as Item,
       count(users) as popularity,
       length(p) as pathLength,
       reduce(weight = 0, q IN relationships(p)| weight + toInt(q.qCount)) as Strength,
       nodes(p) as path
ORDER BY pathLength, Strength DESCENDING, popularity DESCENDING
LIMIT {resultLimit}
The query appears to be working correctly, but it takes too long for the desired application (around 8 seconds). Does anyone have some suggestions for how to improve my query's performance?
I am new to cypher so I apologize if it is something obvious to a more advanced user.
One thing to consider is specifying an upper bound on the variable-length path pattern, like this: p = shortestPath((subject)-[*2..5]->(rec)). This limits the number of relationships in the pattern to a maximum of 5 (and a minimum of 2). Without a maximum, performance can be poor, as paths of all lengths are considered.
Another thing to consider: by summing the relationship property qCount across all relationships in the path and then sorting by this sum, you are effectively looking for the shortest weighted path. Neo4j includes some graph algorithms (such as Dijkstra) for finding these paths efficiently; however, they are not exposed via Cypher. See this page for more info.

Network Coverage: Finding percentage of points within a given distance

I'll start out by framing the problem I'm trying to solve. This is a health care problem so I'll use the terms 'member' and 'provider.' Basically, we want to try to contract providers until a certain percentage of members are "covered."
With that, let me define "coverage": a member is covered if there is a contracted provider within a given number of miles (let's call this maxd for maximum distance). So if our maxd=15, and there's a provider 12 miles away from me, I'm covered by that provider. Each member only has to be covered by one provider.
The goal here is to cover a certain percentage of members (let's say 90%) while having to contract the fewest number of providers. In this case, it's helpful to generate a list that, given our current state (current state being our list of contracted providers), shows us which providers will cover the most members that aren't already covered.
Here's how I'm doing this so far. I have a set contracted_providers that tells me who I have contracted. It may be empty. First, I find out what members are already covered and forget about them, since members only need to be covered once.
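maxd = 15  # maximum distance to be covered, 15 for example
for p in contracted_providers:
    for m in list(members):  # iterate over a copy so we can remove while looping
        if dist(p, m) <= maxd:
            members.remove(m)  # already covered, so forget about this member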
Then I calculate each provider's coverage (percentage-wise) on the remaining set of yet-uncovered members.
uncovered_members = members  # renaming this for clarity
results = dict()
for p in not_contracted_providers:
    count = 0
    for m in uncovered_members:  # this set now just contains uncovered members
        if dist(p, m) <= maxd:
            count += 1
    results[p] = count / len(uncovered_members)  # fraction of uncovered members that this provider would cover
Ok, thanks for bearing with me through that. Now I can ask my question. These data sets are pretty big. On the larger end of the scale, we might have 10,000 providers and 40,000 members. Is there any better way to do this than brute-force?
I'm thinking something along the lines of a data structure that represents a heat map and then use that to find the best providers. Basically something that allows me to cheat a little bit and not have to calculate each individual distance for every provider, member combination. I've tried to research this but I don't even know what to search for, so any sort of direction would be helpful. If it's relevant, all locations are represented by geolocation (lat,long).
And as a side note, if brute force is pretty much the only option, would something like Hadoop be a good choice to do it quickly?
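For concreteness, the brute-force baseline described in the question can be written as a self-contained sketch like the one below. It is not a faster method, only the same O(providers x members) computation. The haversine formula for dist and the Earth radius in miles are my assumptions (the question only says locations are (lat, long) pairs), and the function names are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed unit: miles)


def dist(a, b):
    """Great-circle (haversine) distance in miles between two (lat, long) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def rank_providers(members, contracted, candidates, maxd=15):
    """Return {candidate: fraction of still-uncovered members it would cover}."""
    uncovered = [m for m in members
                 if not any(dist(p, m) <= maxd for p in contracted)]
    if not uncovered:  # everyone is already covered
        return {p: 0.0 for p in candidates}
    return {p: sum(dist(p, m) <= maxd for m in uncovered) / len(uncovered)
            for p in candidates}


members = [(40.0, -75.0), (40.1, -75.0), (41.0, -75.0), (42.0, -75.0)]
contracted = [(40.0, -75.0)]
candidates = [(41.05, -75.0), (45.0, -75.0)]
print(rank_providers(members, contracted, candidates))
```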

Algorithm for optimizing the order of actions with cooldowns

I can choose from a list of "actions" to perform one once a second. Each action on the list has a numerical value representing how much it's worth, and also a value representing its "cooldown" -- the number of seconds I have to wait before using that action again. The list might look something like this:
Action A has a value of 1 and a cooldown of 2 seconds
Action B has a value of 1.5 and a cooldown of 3 seconds
Action C has a value of 2 and a cooldown of 5 seconds
Action D has a value of 3 and a cooldown of 10 seconds
So in this situation, the order ABA would have a total value of (1+1.5+1) = 3.5, and it would be acceptable because the first use of A happens at 1 second and the final use of A happens at 3 seconds, and the difference between those two is greater than or equal to the cooldown of A, 2 seconds. The order AAB would not work because you'd be doing A only a second apart, less than the cooldown.
My problem is trying to optimize the order in which the actions are used, maximizing the total value over a certain number of actions. Obviously the optimal order if you're only using one action would be to do Action D, resulting in a total value of 3. The maximum value from two actions would come from doing CD or DC, resulting in a total value of 5. It gets more complicated when you do 10 or 20 or 100 total actions. I can't find a way to optimize the order of actions without brute forcing it, which gives it complexity exponential on the total number of actions you want to optimize the order for. That becomes impossible past about 15 total.
So, is there any way to find the optimal time with less complexity? Has this problem ever been researched? I imagine there could be some kind of weighted-graph type algorithm that works on this, but I have no idea how it would work, let alone how to implement it.
Sorry if this is confusing -- it's kind of weird conceptually and I couldn't find a better way to frame it.
EDIT: Here is a proper solution using a highly modified Dijkstra's Algorithm:
Dijkstra's algorithm is used to find the shortest path through a graph: a set of nodes (usually locations, but for this example let's say they are actions) which are inter-connected by arcs (in this case, instead of a distance, each arc will have a 'value').
Here is the structure in essence.
Graph{//in most implementations these are not Arrays, but Maps. Honestly, for your needs you don't need a graph, just nodes and arcs... this is just used to keep track of them.
    node[] nodes;
    arc[] arcs;
}
Node{//this represents an action
    arc[] options;//for this implementation, this will always be a list of all possible Actions to use.
    float value;//Action value
}
Arc{
    node start;//the last action used
    node end;//the action after that
    dist=1;//1 second
}
We can use this datatype to make a map of all of the viable options to take to get the optimal solution, based on looking at the end-total of each path. Therefore, the more seconds ahead you look for a pattern, the more likely you are to find a very-optimal path.
Every segment of a road on the map has a distance, which represents its value, and every stop on the road is a one-second mark, since that is the time to make the decision of where to go (what action to execute) next.
For simplicity's sake, let's say that A and B are the only viable options.
na means no action, because no actions are available.
If you are travelling for 4 seconds (the higher the amount, the better the results) your choices are...
There are more too, but I already know that the optimal path is B->A->na->B->A, because its value is the highest. So, the established best pattern for handling this combination of actions is (at least after analyzing it for 4 seconds) B->A->na->B->A
This will actually be quite an easy recursive algorithm.
cur is the current action that you are at; it is a Node. In this example, every other action is seen as a viable option, so it's as if every 'place' on the map has a path going to every other place.
numLeft is the number of seconds left to run the simulation. The higher the initial value, the more desirable the results.
This won't work as written, but will give you a good idea of how the algorithm works.
function getOptimal(cur,numLeft,path){
    if(numLeft==0){//base case (inferred): no time left
        var emptyNode;//let's say, an empty node with a value of 0.
        return emptyNode;
    }
    var best=path;
    for(var i=0;i<cur.options.length;i++){
        var opt=cur.options[i];//this is a COPY
        for(var i2=0;i2<opt.length;i2++){
            opt[i2].timeCooled+=1;//everything below this in the loop is as if it is one second ahead
        }
        var potential=getOptimal(opt[i],numLeft-1,best);
        if(getTotal(potential)>getTotal(cur)){best.add(potential);}//if it makes it better, use it! getTotal will sum up the values of an array of nodes(actions)
    }
    return best;
}
function getOptimalExample(){
    log(getOptimal(someNode,4,someEmptyArrayOfNodes));//someNode will be A or B
}
End edit.
I'm a bit confused on the question but...
If you have a limited amount of actions, and that's it, then always pick the action with the most value, unless the cooldown hasn't been met yet.
Sounds like you want something like this (in pseudocode):
function getOptimal(){
    var a=[A,B,C,D];//A,B,C, and D are actions
    a.sort()//(just pseudocode. Sort the array items by how much value they have.)
    var theBest=null;
    for(var i=0;i<a.length;++i){//find which action is the most valuable
        for(...){//now just loop through, and add time to each OTHER Action for their timeSinceLastUsed...
        }//That way, some previously used, but more valuable actions will be freed up again.
    }//because a is worth the most, and you can use it now, so why not?
}
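A runnable reading of this greedy rule in Python might look like the sketch below; the function name and the dict layout are illustrative, and seconds are numbered from 1 as in the question. It is worth checking its output against small cases by hand, because greedy choices are not guaranteed to be optimal here: with the question's four actions and 5 seconds it picks D, C, B, A and then has nothing available (total 7.5), whereas B, D, C, B, A is also valid and totals 9.

```python
def greedy_schedule(actions, seconds):
    """actions: dict name -> (value, cooldown). Each second, use the most valuable ready action."""
    ready_at = {name: 1 for name in actions}  # first second each action may be used
    order, total = [], 0.0
    for t in range(1, seconds + 1):
        available = [n for n in actions if ready_at[n] <= t]
        if not available:
            order.append(None)  # nothing is off cooldown this second
            continue
        pick = max(available, key=lambda n: actions[n][0])
        order.append(pick)
        total += actions[pick][0]
        ready_at[pick] = t + actions[pick][1]  # usable again after its cooldown
    return order, total


ACTIONS = {"A": (1, 2), "B": (1.5, 3), "C": (2, 5), "D": (3, 10)}
print(greedy_schedule(ACTIONS, 5))
```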
EDIT: After rereading your problem a bit more, I see that the weighted scheduling algorithm would need to be tweaked to fit your problem statement; in our case we only want to take those overlapping actions out of the set that match the class of the action we selected, and those that start at the same point in time. IE if we select a1, we want to remove a2 and b1 from the set but not b2.
This looks very similar to the weighted scheduling problem which is discussed in depth in this pdf. In essence, the weights are your action's values and the intervals are (starttime,starttime+cooldown). The dynamic programming solution can be memoized which makes it run in O(nlogn) time. The only difficult part will be modifying your problem such that it looks like the weighted interval problem which allows us to then utilize the predetermined solution.
Because your intervals don't have set start and end times (i.e. you can choose when to start a certain action), I'd suggest enumerating all possible start times for all given actions assuming some set time range, then using these static start/end times with the dynamic programming solution. Assuming you can only start an action on a full second, you could run action A for intervals (0-2,1-3,2-4,...), action B for intervals (0-3,1-4,2-5,...), action C for intervals (0-5,1-6,2-7,...) etc. You can then take the union of the actions' interval sets to get a problem space that looks like the original weighted interval problem:
|---1---2---3---4---5---6---7---| time
|{--a1--}-----------------------| v=1
|---{--a2---}-------------------| v=1
|-------{--a3---}---------------| v=1
|{----b1----}-------------------| v=1.5
|---{----b2-----}---------------| v=1.5
|-------{----b3-----}-----------| v=1.5
|{--------c1--------}-----------| v=2
|---{--------c2---------}-------| v=2
|-------{-------c3----------}---| v=2
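For readers who have not seen it, the classic weighted interval scheduling DP referred to above can be sketched as follows. This is the unmodified textbook algorithm (sort by end time, binary-search the last compatible interval), so it forbids any overlap between chosen intervals; as the EDIT notes, it would still need to be adapted so that only intervals of the same action (or with the same start time) exclude each other. The function name is illustrative.

```python
from bisect import bisect_right


def max_weight_schedule(intervals):
    """intervals: list of (start, end, weight). Intervals that only touch are compatible."""
    intervals = sorted(intervals, key=lambda iv: iv[1])  # sort by end time
    ends = [iv[1] for iv in intervals]
    best = [0.0] * (len(intervals) + 1)  # best[k]: optimum using the first k intervals
    for k, (start, end, weight) in enumerate(intervals, start=1):
        # p = how many of the earlier intervals end at or before this one starts
        p = bisect_right(ends, start, 0, k - 1)
        best[k] = max(best[k - 1], weight + best[p])
    return best[-1]


# The nine intervals from the diagram above: (start, end, value)
diagram = [(0, 2, 1), (1, 3, 1), (2, 4, 1),
           (0, 3, 1.5), (1, 4, 1.5), (2, 5, 1.5),
           (0, 5, 2), (1, 6, 2), (2, 7, 2)]
print(max_weight_schedule(diagram))  # 3.0 (a1 followed by c3)
```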
Always choose the available action worth the most points.

Algorithm to calculate a page importance based on its views / comments

I need an algorithm that allows me to determine an appropriate <priority> field for my website's sitemap based on the page's views and comments count.
For those of you unfamiliar with sitemaps, the priority field is used to signal the importance of a page relative to the others on the same website. It must be a decimal number between 0 and 1.
The algorithm will accept two parameters, viewCount and commentCount, and will return the priority value. For example:
GetPriority(100000, 100000); // Damn, a lot of views/comments! The returned value will be very close to 1, for example 0.995
GetPriority(3, 2); // Ok not many users are interested in this page, so for example it will return 0.082
You mentioned doing this in an SQL query, so I'll give samples in that.
If you have a table/view Pages, something like this
views:int - indexed
comments:int - indexed
Then you can order them by the following expression:
(0.3*LOG10(10+views)/LOG10(10+(SELECT MAX(views) FROM Pages))) +
(0.7*LOG10(10+comments)/LOG10(10+(SELECT MAX(comments) FROM Pages)))
I've deliberately chosen unequal weighting between views and comments. A problem that can arise with keeping an equal weighting with views/comments is that the ranking becomes a self-fulfilling prophecy - a page is returned at the top of the list, so it's visited more often, and thus gets more points, so it's shown at the top of the list, and it's visited more often, and it gets more points.... Putting more weight on the comments reflects that these take real effort and show real interest.
The above formula will give you ranking based on all-time statistics. So an article that amassed the same number of views/comments in the last week as another article amassed in the last year will be given the same priority. It may make sense to repeat the formula, each time specifying a range of dates, and favoring pages with higher activity, e.g.
0.3*(score for views/comments today) - live data
0.3*(score for views/comments in the last week)
0.25*(score for views/comments in the last month)
0.15*(score for all views/comments, all time)
This will ensure that "hot" pages are given higher priority than similarly scored pages that haven't seen much action lately. All values apart from today's scores can be persisted in tables by scheduled stored procedures so that the database isn't having to aggregate many many comments/view stats. Only today's stats are computed "live". Taking it one step further, the ranking formula itself can be computed and stored for historical data by a stored procedure run daily.
EDIT: To get a strict range from 0.1 to 1.0, you would modify the formula like this. But I stress - this will only add overhead and is unnecessary - the absolute values of priority are not important - only their relative values to other urls. The search engine uses these to answer the question, is URL A more important/relevant than URL B? It does this by comparing their priorities - which one is greatest - not their absolute values.
// unnormalized - x is some page id
// the original formula (now in pseudo code)
un(x) = 0.3*log(views(x)+10)/log(10+maxViews()) +
        0.7*log(comments(x)+10)/log(10+maxComments())
The maximum will be 1.0; the minimum will start at 1.0 and move downwards as more views/comments are made.
We define un(0) as the minimum value, i.e. the value of the above formula when views(x) and comments(x) are both 0.
To get a normalized formula from 0.1 to 1.0, you then compute n(x), the normalized priority for page x:
               (1.0-un(x)) * (un(0)-0.1)
n(x) = un(x) - -------------------------   when un(0) != 1.0
                     (1.0-un(0))
     = 0.1 otherwise.
Priority = W1 * views / maxViewsOfAllArticles + W2 * comments / maxCommentsOfAllArticles
with W1+W2=1
Although IMHO, just use 0.5*log_10(10+views)/log_10(10+maxViews) + 0.5*log_10(10+comments)/log_10(10+maxComments)
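As a concrete sketch, the log-scaled formula could be wrapped in a function like the one below. The signature is my own: unlike the GetPriority(viewCount, commentCount) in the question, it also takes the site-wide maxima as arguments, since the formula needs them, and view_weight is a hypothetical parameter defaulting to the 0.5/0.5 split.

```python
from math import log10


def get_priority(views, comments, max_views, max_comments, view_weight=0.5):
    """Log-scaled priority; a page holding both maxima scores exactly 1.0."""
    view_score = log10(10 + views) / log10(10 + max_views)
    comment_score = log10(10 + comments) / log10(10 + max_comments)
    return view_weight * view_score + (1 - view_weight) * comment_score


print(get_priority(100000, 100000, 100000, 100000))  # 1.0
print(get_priority(3, 2, 100000, 100000))            # a much lower value
```

Because of the "10 +" inside each logarithm, a page with zero views and zero comments still gets a small positive score rather than 0.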
What you're looking for here is not an algorithm, but a formula.
Unfortunately, you haven't really specified the details of what you want, so there's no way we can provide the formula to you.
Instead, let's try to walk through the problem together.
You've got two incoming parameters, the viewCount and the commentCount. You want to return a single number, Priority. So far, so good.
You say that Priority should range between 0 and 1, but this isn't really important. If we were to come up with a formula we liked, but resulted in values between 0 and N, we could just divide the results by N-- so this constraint isn't really relevant.
Now, the first thing we need to decide is the relative weight of Comments vs Views.
If page A has 100 comments and 10 views, and page B has 10 comments and 100 views, which should have a higher priority? Or, should it be the same priority? You need to decide what's right for your definition of Priority.
If you decide, for example, that comments are 5 times more valuable than views, then we can begin with a formula like
Priority = 5 * Comments + Views
Obviously, this can be generalized to
Priority = A * Comments + B * Views
Where A and B are relative weights.
But sometimes we want our weights to be exponential instead of linear, like
Priority = Comments ^ A + Views ^ B
which will give a very different curve than the earlier formula.
Alternatively, multiplying the terms, as in
Priority = Comments ^ A * Views ^ B
will give a higher value to a page with 20 comments and 20 views than to one with 1 comment and 40 views, if the weights are equal.
So, to summarize:
You really ought to make a spreadsheet with sample values for Views and Comments, and then play around with various formulas until you get one that has the distribution that you are hoping for.
We can't do it for you, because we don't know how you want to value things.
I know it has been a while since this was asked, but I encountered a similar problem and had a different solution.
When you want to have a way to rank something, and there are multiple factors that you're using to perform that ranking, you're doing something called multi-criteria decision analysis (MCDA).
There are several ways to handle this. In your case, your criteria have different "units". One is in units of comments, the other is in units of views. Furthermore, you may want to give different weight to these criteria based on whatever business rules you come up with.
In that case, the best solution is something called a weighted product model.
The gist is that you take each of your criteria and turn it into a percentage (as was previously suggested), then you take that percentage and raise it to the power of X, where X is a number between 0 and 1. This number represents your weight. Your total weights should add up to one.
Lastly, you multiply the results together to come up with a rank. If the rank is greater than 1, then the numerator page has a higher rank than the denominator page.
Each page would be compared against every other page by doing something like:
p1C = page 1 comments
p1V = page 1 views
p2C = page 2 comments
p2V = page 2 views
wC = comment weight
wV = view weight
rank = (p1C/p2C)^(wC) * (p1V/p2V)^(wV)
The end result is a sorted list of pages according to their rank.
I've implemented this in C# by performing a sort on a collection of objects implementing IComparable.
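A rough Python equivalent of that pairwise comparison is sketched below; it is not the C# implementation mentioned above. The weights 0.7/0.3, the dict layout and the "+ 1" added to every count (to avoid dividing by zero for pages with no comments or views) are all assumptions of mine.

```python
from functools import cmp_to_key


def rank(p1, p2, w_comments=0.7, w_views=0.3):
    """Weighted product ratio of page p1 over page p2; > 1 means p1 ranks higher."""
    # "+ 1" avoids division by zero for pages with no comments or views (assumption)
    return (((p1["comments"] + 1) / (p2["comments"] + 1)) ** w_comments
            * ((p1["views"] + 1) / (p2["views"] + 1)) ** w_views)


def compare(p1, p2):
    """Comparison function for sorting: higher-ranked pages come first."""
    r = rank(p1, p2)
    return -1 if r > 1 else (1 if r < 1 else 0)


pages = [
    {"name": "a", "comments": 10, "views": 100},
    {"name": "b", "comments": 100, "views": 10},
    {"name": "c", "comments": 0, "views": 5},
]
ranked = sorted(pages, key=cmp_to_key(compare))
print([p["name"] for p in ranked])  # ['b', 'a', 'c']
```

Since the ratio exceeds 1 exactly when the product comments^wC * views^wV of the first page exceeds that of the second, sorting by that single per-page product gives the same order and avoids the pairwise comparisons.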
What several posters have essentially advocated without conceptual clarification is that you use linear regression to determine a weighting function of webpage view and comment counts to establish priority.
This technique is pretty easy to implement for your problem, and the basic concept is described well in this Wikipedia article on linear regression models.
A quick summary of how to apply it to your problem is:
Determine the parameters of the line which best fits the view and comment count data for all your site's webpages, i.e., use linear regression.
Use the line parameters to derive your priority function for the view and comment counts.
Code examples for basic linear regression should not be hard to track down if you don't want to implement it from scratch from basic math formulas (use the web, Numerical Recipes, etc.). Also, any general math software package like Matlab, R, etc., comes with linear regression functions.
The most naive approach would be the following:
Let v[i] be the number of views of page i and c[i] the number of comments for page i, then define the relative view weight for page i to be
r_v(i) = v[i]/(sum_j v[j])
where sum_j v[j] is the total of the v[.] over all pages. Similarly define the relative comment weight for page i to be
r_c(i) = c[i]/(sum_j c[j]).
Now you want some constant parameter p, with 0 <= p <= 1, which indicates the importance of views over comments: p = 0 means only comments are significant, p = 1 means only views are significant, and p = 0.5 gives equal weight.
Then set the priority to be
p*r_v(i) + (1-p)*r_c(i)
This might be over-simplistic but it's probably the best starting point.
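In code, this naive scheme might look like the following sketch. The function name is illustrative, and it assumes the site has at least one view and one comment in total (otherwise the divisions fail).

```python
def naive_priorities(views, comments, p=0.5):
    """views[i], comments[i]: counts for page i. Returns p*r_v(i) + (1-p)*r_c(i) per page."""
    total_views, total_comments = sum(views), sum(comments)
    return [p * v / total_views + (1 - p) * c / total_comments
            for v, c in zip(views, comments)]


print(naive_priorities([10, 30, 60], [1, 1, 2]))  # roughly [0.175, 0.275, 0.55]
```

Because each relative weight is a share of the site-wide total, the priorities of all pages sum to 1, so individual values get smaller as the number of pages grows.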
