Calculating "Kevin Bacon" Numbers - algorithm

I've been playing around with some things and thought up the idea of trying to figure out Kevin Bacon numbers. I have data for a site that for this purpose we can consider a social network. Let's pretend that it's Facebook (for simplification of discussion). I have people and I have a list of their friends, so I have the connections between them. How can I calculate the distance from one person to another (basically, a Kevin Bacon number)?
My best idea is a Bidirectional search, with a depth limit (to limit computational complexity and avoid the problem of people who simply can't be connected in the graph), but I realize this is rather brute force.
Could it be better to make little sub-graphs (say something equivalent to groups on Facebook), calculate the shortest distances between them (ahead of time, perhaps) and then try to use THOSE to find a link? While this requires pre-calculation, it could make it possible to search many fewer nodes (nodes could be groups instead of individuals, making the graph much smaller). This would still be a bidirectional search though.
I could also pre-calculate the number of people an individual is connected to, and search the "popular" people first, since they could have the best chance of connecting to the given destination individual. I realize this would trade the guarantee of finding the shortest path for speed. I think I'd also want to use a depth-first search instead of the breadth-first search I'd planned to use in the other cases.
Can someone think of a simpler/faster way of doing this? I'd like to be able to find the shortest length between two people, so it's not as easy as always having the same end point (such as in the Kevin Bacon problem).
I realize that there are problems, like the possibility of chains of 200 people, but that can be solved by putting a limit on the depth I'm willing to search.

This is a standard shortest path problem. There are lots of solutions, including Dijkstra's algorithm and Bellman-Ford. You may be particularly interested in looking at the A* algorithm and seeing how it would perform with the cost function relative to the inverse of any particular node's degree. The idea would be to visit more popular nodes (those with higher degree) first.
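One way to read that suggestion is a Dijkstra-style search in which stepping onto a node costs the inverse of its degree, so popular people are expanded first. Here is a minimal sketch of that idea in Python; it assumes the network is a dict mapping each person to a set of friends, and the function name is made up for illustration.
import heapq

def degree_guided_search(graph, start, goal):
    """Dijkstra-style search where stepping onto a person costs 1/degree,
    so well-connected ("popular") people are explored first.  Returns the
    chain of people found, or None if the two are not connected."""
    best = {start: 0.0}
    heap = [(0.0, start, [start])]
    while heap:
        cost, person, path = heapq.heappop(heap)
        if person == goal:
            return path
        if cost > best[person]:
            continue                                 # stale queue entry
        for friend in graph[person]:
            new_cost = cost + 1.0 / max(len(graph[friend]), 1)
            if new_cost < best.get(friend, float("inf")):
                best[friend] = new_cost
                heapq.heappush(heap, (new_cost, friend, path + [friend]))
    return None
Bear in mind that the path this returns minimizes the sum of the 1/degree costs, not the number of hops, so it can come back longer (in hops) than what a plain breadth-first search would give.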

Sounds like a job for Dijkstra's algorithm.
Edit: Eh, I shouldn't have pulled the trigger so fast. Dijkstra's (and Bellman-Ford) reduce to a breadth-first search when all the edge weights are 1, so this isn't too useful. Oh well.
The A* algorithm, mentioned by tvanfosson, may be ideal for this. The idea is that instead of searching and recursing in whatever order the elements appear in each level of the tree (rooted at your start- or end-point), you use some heuristic to determine which element to try first. In your case a good bet would probably be a node's degree (number of "friends"), but you might instead use the number of people within some number of degrees of a given person (e.g., the guy who has three friends who each have 100 friends is likely to be a better node than the guy who has 20 friends in a clique that shuns outsiders). There are all sorts of other things you could use as a heuristic (friends get 2 points, friends-of-friends get 1 point; whatever, experiment).
Combine this with a depth limit (cut off after 6 degrees of separation, or whatever), and you can vastly improve your average case (worst case is still the same as basic BFS).
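A rough sketch of that combination: a greedy best-first search ordered by friend count, with a depth cutoff. This is Python, the graph is again assumed to be a dict of person to set of friends, and the function name is invented; it is one possible reading of the answer, not the only one.
import heapq
from itertools import count

def best_first_search(graph, start, goal, max_depth=6):
    """Greedy best-first search guided by friend count, with a depth cutoff.
    Returns a path (not necessarily the shortest), or None if nothing is
    found within max_depth hops."""
    tie = count()                    # tie-breaker so paths are never compared
    heap = [(-len(graph[start]), next(tie), 0, [start])]
    visited = {start}
    while heap:
        _, _, depth, path = heapq.heappop(heap)
        person = path[-1]
        if person == goal:
            return path
        if depth == max_depth:
            continue
        for friend in graph[person]:
            if friend not in visited:
                visited.add(friend)
                heapq.heappush(heap,
                               (-len(graph[friend]), next(tie), depth + 1, path + [friend]))
    return None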

Run a breadth-first search in both directions (from each endpoint) and stop when you have a connection or reach your depth limit.
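A sketch of that answer might look like the following (Python; the graph is again assumed to be a dict mapping each person to their set of friends, and the function names are invented). Expanding whole levels and taking the minimum over the meeting nodes keeps the answer an exact shortest distance:
def bacon_number(graph, start, goal, max_depth=6):
    """Level-synchronised BFS from both endpoints.  Whenever the two searches
    share a node, the smallest combined distance through a shared node is the
    exact shortest path length.  Returns None if nothing is found within
    max_depth hops."""
    dist_s, dist_g = {start: 0}, {goal: 0}
    frontier_s, frontier_g = [start], [goal]
    depth_s = depth_g = 0
    while frontier_s and frontier_g:
        meeting = dist_s.keys() & dist_g.keys()
        if meeting:
            return min(dist_s[x] + dist_g[x] for x in meeting)
        if depth_s + depth_g >= max_depth:
            return None                              # depth limit reached
        if len(frontier_s) <= len(frontier_g):       # grow the smaller side
            frontier_s = _next_level(graph, frontier_s, dist_s)
            depth_s += 1
        else:
            frontier_g = _next_level(graph, frontier_g, dist_g)
            depth_g += 1
    return None                                      # different components

def _next_level(graph, frontier, dist):
    """Expand one full BFS level and return the newly discovered nodes."""
    nxt = []
    for node in frontier:
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                nxt.append(neighbour)
    return nxt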

This one might be better overall: Floyd-Warshall, the all-pairs shortest-distance algorithm.
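For reference, a bare-bones sketch of that (Python, unit edge weights, made-up names). Keep in mind it is O(V^3) time and O(V^2) memory, so it only makes sense as a precomputation on a fairly small network:
def all_pairs_hops(people, graph):
    """Floyd-Warshall on unit edge weights: dist[u][v] ends up being the
    Bacon number between every pair of people."""
    INF = float("inf")
    dist = {u: {v: (0 if u == v else INF) for v in people} for u in people}
    for u in people:
        for v in graph[u]:
            dist[u][v] = 1                       # every friendship is one hop
    for k in people:
        for i in people:
            dik = dist[i][k]
            if dik == INF:
                continue
            for j in people:
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    return dist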

Related

A* or Bidirectional Breadth First Search?

Not sure if this is the right place to ask but,
I have programmed a bidirectional breadth-first search algorithm in Java which searches from both the start node in a graph and the goal node in a graph at the same time. In a graph with 3000000 (3 million) nodes of which all nodes are connected with an average of 4 other nodes (connected with two-way/bidirectional edges) it takes on average only 0.5 seconds on a mediocre CPU to find the shortest path between any two random nodes, with about 10 seconds being the worst-case time that I found over 30 test runs.
Assuming that only a search for one path is required (for example, when plotting a route between a starting point and a destination), what would be the advantages of using an A* algorithm with a decent heuristic in this case? Yes, it might be slightly faster at finding a path, but A* most likely won't find the shortest path.
Can someone elaborate on why A* is often chosen over bidirectional breadth-first search and what advantages it yields in terms of performance (computation time, memory usage, the chance of finding the optimal solution etc.)?
Also, happy Easter everyone!
Bidirectional search is great when it’s feasible, but it relies on being able to produce goal nodes as easily as start nodes (how would you select a goal node in a chess game, for example?), and the branching factor being similar in both directions. It relies on being able to go backwards at all, come to that. Even if you have a good heuristic, you’re unlikely to be able to take much advantage from it in the reverse part of the search.

Game of pathfinding in a weighted graph

I was wondering about an optimisation problem related to graph exploration:
Suppose we have a connected weighted graph, and each vertex has between 1 and 4 connections to other vertices. Now let's say some vertices contain chocolate, and we place on that graph, at two different vertices, two students, each controlled by an AI that has access to the positions of the chocolates, the position of the other student, and the graph. At each turn, both of them can move to a connected vertex (and need K moves if the weight of the edge is K). Finally, if a student is on a vertex containing chocolate, he eats it.
My question: what would be the best algorithm for the AI so that the student it controls eats more chocolate than the other?
Thank you.
The best approach to this would likely be using cost analysis and heuristics.
Your bot has 1-4 choices it can make, so you should consider each one. What's the benefit of going up? What's the cost? If you take value as benefit - cost, then you should choose the most valuable option.
So... How do you even calculate benefit and cost, anyways? You've outlined a few conditions already. The K value is a cost. Whether there is a chocolate there is a benefit. Maybe analyze the number of chocolates adjacent to a given vertex?
But wait! You know your enemy's position, and your enemy knows yours! If you move in a given direction, it'll influence their move (if they're smart). So you'll have to look ahead and analyze what your enemy will probably do in response to you. Chess engines plan out the best possible move by looking ahead with this exact system. Unfortunately, your problem is unbounded: there is no true win condition, so your bot would look ahead indefinitely (or until it runs out of memory, anyway). This is a problem, because calculations take time. Chess engines get around it by imposing a time limit: they take the best move they've found when time runs out.
Now, I've given you a general framework to work within; you still need to assemble it. Part of making heuristics is weighing your costs and benefits. Maybe the cost of K is insignificant compared to the benefit of getting a chocolate? Maybe it isn't? Use constant coefficients to weigh the costs and benefits.
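A very small sketch of that kind of scoring (Python). The coefficient values and the edge_weight/chocolates data structures are made-up placeholders, not anything from the original question:
CHOCOLATE_WEIGHT = 10.0   # made-up coefficient: how much a chocolate is worth
MOVE_COST_WEIGHT = 1.0    # made-up coefficient: how much each unit of K hurts

def move_value(graph, edge_weight, chocolates, current, candidate):
    """Score one candidate move: benefit of chocolate on (or next to) the
    candidate vertex minus the cost of the edge weight K to reach it.
    graph[v] is the set of neighbours, chocolates is a set of vertices,
    and edge_weight[(u, v)] is K for that edge."""
    benefit = CHOCOLATE_WEIGHT if candidate in chocolates else 0.0
    benefit += sum(1.0 for n in graph[candidate] if n in chocolates)
    cost = MOVE_COST_WEIGHT * edge_weight[(current, candidate)]
    return benefit - cost

def choose_move(graph, edge_weight, chocolates, current):
    """Pick the neighbouring vertex with the highest value."""
    return max(graph[current],
               key=lambda v: move_value(graph, edge_weight, chocolates, current, v))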
As for finding your way through the grid, I'd take a look at Dijkstra's algorithm or A* to move.
Oh, and before I forget, here's the wikipedia article on graph traversal. You may find it useful.

How to divide a connected weighted graph to N semi-equal subgraphs

I have a graph of many hundreds of nodes that are mostly connected with each other. I can process the entire graph, but that takes a lot of time, so I would like to divide it into smaller sub-graphs of approximately similar size.
In other words: I have a collection of aerial images and I do pairwise image matching on all of them. As a result I get a set of matches for each pair (a pixel from the first image matched with a pixel on the second image). The number of matches is taken as the weight of this (undirected) edge. These edges then form the graph mentioned above.
I'm not so familiar with graph theory (as it is a very broad topic). What is the best algorithm for this job?
Thank you.
Edit:
This problem has a perfect analogy which I think is easier to understand. Imagine you have a set of people and their connections/friendships, like a social network. Each friendship has a numeric value/weight representing how good friends they are. So in a large group of people I want to get the k most interconnected sub-groups.
Unfortunately, the problem you're describing is almost certainly NP-hard. From a graph perspective, you have a graph where each edge has a weight on it. You're trying to split the graph into relatively equal pieces while minimizing the total weight of the edges cut. This is the minimum k-cut problem, which is NP-hard when k is part of the input. If you add the constraint that the pieces should also be roughly even in size, you get the balanced k-cut (graph partitioning) problem, which is also NP-hard.
The good news is that there are nice approximation algorithms for these problems, so if you're looking for solutions that are just "good enough," then you can probably find a library somewhere that implements them. There are also other techniques like spectral clustering which work well in practice and are really fast, but which don't have any guarantees on how well they'll do.
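If "good enough" is acceptable, one sketch using an off-the-shelf spectral clustering implementation could look like the following. It assumes scikit-learn is available and that the match counts are stored in a symmetric adjacency matrix; the function name is made up.
from sklearn.cluster import SpectralClustering

def partition_images(adjacency, k):
    """Split the match graph into k clusters by spectral clustering.
    adjacency is a symmetric matrix whose [i][j] entry is the number of
    matches between image i and image j.  There is no guarantee the clusters
    come out exactly equal in size."""
    model = SpectralClustering(n_clusters=k, affinity="precomputed")
    return model.fit_predict(adjacency)     # cluster label for each image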

Algorithm Optimization - Shortest Route Between Multiple Points

Problem: I have a large collection of points. Each of these points has a list with references to other points with the distance between them already calculated and stored. I need to determine the shortest route that begins from an origin and passes through a specific number of points to any destination.
Ex: I'm on vacation and I'm staying in a specific city. I'm making a ONE WAY trip to see ANY four cities and I want to travel the least distance possible. I cannot visit the same city more than once.
Current solution: Right now I'm just iterating through every possibility manually and storing the shortest path. This works but feels inefficient. Also, this problem will eventually be expanded to include searching from multiple origin points to multiple destination points, so I think that might explode the search space.
What is the better way to search for the shortest route?
Answering the updated post: your solution of checking every possibility is optimal (at least, no one has discovered a better algorithm so far). Yes, that's the travelling salesman problem, whose essence is not just touching every city, but touching each city exactly once. If you don't need the best possible solution, you may find it useful to use heuristics that work faster but allow a limited discrepancy from the ideal solution.
For future answerers: Floyd-Warshall algorithm and all Floyd-like variations are inapplicable here.
In general you should try to prune bad variants early. I think you should use some variation of the branch and bound method:
http://en.wikipedia.org/wiki/Branch_and_bound
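A minimal sketch of that idea for this problem (Python): recursively try orderings of the intermediate cities and abandon any partial route whose cost already exceeds the best complete route found. The function name is invented, and dist[a][b] is assumed to hold precomputed pairwise distances.
def branch_and_bound_route(start, end, intermediates, dist):
    """Visit all the intermediate cities in some order, pruning any partial
    route whose cost already exceeds the best complete route found so far."""
    best = {"cost": float("inf"), "order": None}

    def extend(order, remaining, cost):
        if cost >= best["cost"]:
            return                                   # bound: prune this branch
        if not remaining:
            last = order[-1] if order else start
            total = cost + dist[last][end]
            if total < best["cost"]:
                best["cost"], best["order"] = total, list(order)
            return
        for city in remaining:
            prev = order[-1] if order else start
            extend(order + [city], remaining - {city}, cost + dist[prev][city])

    extend([], frozenset(intermediates), 0)
    return best["order"], best["cost"]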
Either breadth-first search, as norheim.se said, or Dijkstra's algorithm would be my suggestion as well.
This sounds Travelling Salesman-esque? One solution is to use an optimisation technique such as an evolutionary algorithm. Currently you are doing an exhaustive search, which will get very slow very quickly. But I think this is pretty much a travelling salesman problem; it has been tackled for several decades (if not centuries), and as such there are several possible ways of attack. Google is your friend.
Perhaps this is what the original poster means by "iterating through each possibility manually and storing the shortest path", but I thought I'd like to make explicit what appears to be a baseline solution.
Assume you already have a two-point shortest path algorithm--this has classical solutions for various kinds of graphs. Assume all distances are nonnegative and d(A->B->C) = d(A->B) + d(B->C).
The essentials are that the path starts at S, goes through the intermediate cities "abcd" in some order, and ends at E:
e.g. SabcdE, SacbdE, etc...
With only 4 intermediate cities, you enumerate all 24 permutations. For each permutation use your shortest two-point algorithm to compute the path from head to tail, and its total distance.
Then, given the start and end points, there are 12 ways to choose which of abcd is attached to S and which to E, and for each of those there are two possible orderings of the interior pair. You've computed those interior distances already, so you add on the distance from S to the head and from the tail to E, and choose the minimum. So once you've precomputed the intermediate distances for a fixed set of interior cities, you need to solve 12 two-point shortest path problems for any pair of start and end points.
This obviously scales poorly with an increasing number of intermediate cities. It's not clear to me that you could do better unless you impose greater restrictions on the graph structure (is this in a physical Euclidean space? Triangle inequality?).
My thought example: suppose all intermediate distances between cities are O(1). With no restriction on the graph, then the distance from S to any intermediate city might be 1000 except for one being 1. Same for the tail. So you can force the first city to be visited to be anything. Now, go one layer down, take the first city as the "start point". Apply the same argument: you can make the best path go to any of the following cities by manipulating the distances in the graph.
So it seems that the complexity can't be helped without additional assumptions.
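For completeness, here is a sketch of the baseline enumeration described above (Python). It assumes dist[a][b] already holds the precomputed two-point shortest distances, and the function name is made up.
from itertools import permutations

def best_route(start, end, intermediates, dist):
    """Baseline described above: with a fixed set of intermediate cities,
    try every ordering and keep the cheapest start -> ... -> end route."""
    best_order, best_cost = None, float("inf")
    for order in permutations(intermediates):    # 4! = 24 orderings for 4 cities
        stops = (start,) + order + (end,)
        cost = sum(dist[a][b] for a, b in zip(stops, stops[1:]))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost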
This is a very common, real-world situation anyone can run into. The Google Maps user interface gives you the path in the same order you add destinations to the list; it doesn't give you the optimal path, although the Google Maps API does provide a solution for this. In the request to find the path you have to set the flag 'optimizeWaypoints: true'. The request will look like this:
var request = {
  origin: start,
  destination: end,
  waypoints: waypts,
  optimizeWaypoints: true,
  travelMode: google.maps.TravelMode.DRIVING
};
You can see the whole code of the utility via view source, as the complete utility is developed in JavaScript and HTML.
I hope it helps.
It appears that the edges of your graph are bidirectional. In this case, the algorithm you're looking for is Dijkstra's algorithm.

How to design an approximate path solution?

I am attempting to write (or expand on an existing) graph search algorithm that will let me find the path that gets closest to the destination node, given that there is no guarantee the nodes will be connected.
To provide a realistic application of this, let's say I need to get from Brampton, Ontario to Hamilton, Ontario. I know my possible options at my start point are local transit, a GO bus, or walking. I know that walking is the least desired way to get to my destination, so I look at the GO bus first. I know I can take the GO bus to a point close to Hamilton, but at that point the GO bus turns and goes in another direction, and that closest point is a place where I have no options (other than walking, but the algorithm would only consider walking for short distances; otherwise it would consider the route not feasible).
Using this same example, if the algorithm were to find a way that is longer but gets me closer to the destination node (or possibly right at the destination node), that would be a higher-weighted path. (The weightings don't matter so much while it's searching; only when the results are delivered would it list the paths by how close they get to the destination, in ascending order.) For example, one GO bus may get me 3 km from the destination node, while three public transit buses would get me 500 m away.
So my question is two fold:
1) What algorithm should I start with that does something similar?
2) How would I programmatically express that it's OK if nodes don't connect, so that the search doesn't just jump from node A to node R? Would starting from the end and working backward accomplish this?
Edit: I forgot to ask how to aim for the best approximate solution, because with a large graph there could be millions of possible solutions to this problem.
Thanks,
Michael
Read up on the A* algorithm. It is a generalization of Dijkstra's shortest path algorithm that allows you to specify a heuristic providing a lower bound for the distance between two vertices. In your case, the heuristic function would simply return the Euclidean distance.
Run the algorithm and keep track of the vertex with the best characteristic value, which you somehow compute from the graph distance from source and Euclidean distance to target. The only tricky part is to determine when to terminate (unless you want to traverse the entire graph).
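A sketch of that idea (Python). It assumes each vertex has 2-D coordinates in coords and that graph[u] yields (neighbour, cost) pairs; both the data layout and the function name are assumptions, not anything from the question.
import heapq
import math

def closest_reachable(graph, coords, start, goal):
    """A*-style search that also remembers the reachable vertex closest to the
    goal, so something useful comes back even when the goal is unreachable."""
    def h(u):                                     # straight-line distance to goal
        (x1, y1), (x2, y2) = coords[u], coords[goal]
        return math.hypot(x1 - x2, y1 - y2)

    g = {start: 0.0}
    heap = [(h(start), start)]
    best_vertex, best_h = start, h(start)
    while heap:
        _, u = heapq.heappop(heap)
        if u == goal:
            return goal, g[goal]
        if h(u) < best_h:
            best_vertex, best_h = u, h(u)
        for v, cost in graph[u]:
            new_g = g[u] + cost
            if new_g < g.get(v, float("inf")):
                g[v] = new_g
                heapq.heappush(heap, (new_g + h(v), v))
    return best_vertex, g[best_vertex]            # goal unreachable: closest instead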
Why can't you assume that all nodes are connected? In the real world they normally are; you can always walk, call a taxi, etc.
In this case, you could simply change your model the following way: You have one graph for each transportation method. The nodes that are at the same place are connected with edges of weight 0 (i.e. if you are dropped off by car at an airport or train station).
Then, label each vertex and edge with the transportation type and you can simply use existing routing algorithms. Oh, by the way: A* will not scale well to really big networks. To get something that software like Yahoo/Google/Microsoft Maps actually would use, have a look here. The work of this research group includes the winner of the DIMACS shortest path challenge.
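A minimal sketch of that layered-graph construction (Python); the mode names and dictionary layout are invented for illustration:
def build_multimodal_graph(mode_graphs):
    """mode_graphs maps a transportation mode to its own adjacency structure,
    e.g. {"bus": {stop: [(other_stop, minutes), ...]}, "walk": {...}}.
    Returns one combined graph keyed by (mode, stop), with zero-cost
    "transfer" edges between modes that serve the same stop."""
    combined = {}
    for mode, graph in mode_graphs.items():
        for stop, edges in graph.items():
            combined.setdefault((mode, stop), [])
            for other, cost in edges:
                combined.setdefault((mode, other), [])
                combined[(mode, stop)].append(((mode, other), cost))
    # zero-weight transfer edges: switching modes at the same place is free
    stops = {stop for g in mode_graphs.values() for stop in g}
    for stop in stops:
        copies = [(m, stop) for m in mode_graphs if stop in mode_graphs[m]]
        for a in copies:
            for b in copies:
                if a != b:
                    combined[a].append((b, 0))
    return combined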
Sounds very much like a travelling salesman problem with additional node characteristics. Just be wary that this type of problem is NP-complete, and your best bet would be to go with some sort of approximation algorithm.
