Time complexity of betweenness centrality? - algorithm

What is the time complexity of computing betweenness centrality if we are given the shortest path predecessor matrix of a graph?
Predecessor matrix cells look like this:
if node i and node j are directly connected then value in the cell is 0;
if node i and node j are not connected then value in the cell is -1;
else cell = predecessor(j) - this can be only one predecessor if there is a single shortest path or an array of predecessors if there are more than one shortest paths between i and j.
Thank you for your answer,
I am familiar with Brandes Algorithm. However Brandes Algorithm will compute the betweenness for all the nodes inside a network. I think that time spent for computing CB for one vertex is the same as the time for computing CB for all vertices as Brandes algorithm can't be adapted for such a case.
So, my idea was to store the predecessor matrix, and to be able to compute CB for a certain vertex (and not have to wait for all vertices CB computations).
I am aware I can't achieve smaller time complexity but I think that the difference in amount of time can be made by not computing CB for all 7000 vertices. Instead, by having this matrix I am able to compute CB for only one single vertex.
I think it is possible to compute CB in O(n^2*L) where L is the average shortest path when we have predecessor matrix.
What is your opinion about this concept?

As far as I can find out, the best known algorithm for computing betweenness centrality is the one described in this paper:
Ulrik Brandes (2001). A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology 25(2):163-177.
You'll see that this algorithm computes, as a first step, the number of shortest paths between every pair of nodes. It is natural to do so in a way that simultaneously computes the predecessor matrix too. So it appears that there's no benefit to pre-computing the predecessor matrix: you get it essentially for free anyway while executing Brandes' algorithm.
(Of course, this isn't a proof that it makes no difference, and maybe someone else knows better. You might want to ask on cstheory.stackexchange.com.)

Related

What is the simplest, easiest algorithm for finding EMST of a complete graph of order 10^5

I just want to be clear that EMST stands for Euclidean Minimum Spanning Tree.
Essentially, I have been given a file with 100k 4D vertices (one vertex on each line). The goal is to visit every vertex in the file while minimizing the total distance traveled. The distance from a point to another point is simply the Euclidean Distance (Distance if you draw a Straight Line between two points".
I already know that this is pretty much the Traveling Salesman Problem, which is NP Complete, so I am looking to approximate the solution.
The first approximation algorithm that came to my mind is by finding the MST from a graph constructed from the file... But that would take O(N^2) to even just construct all the edges from the file given the fact that it's a complete graph ( I can go from any point to another ). And given that my input is N = 10^5, my algorithm will have a huge running time, which is too slow...
Any ideas on how I can plan on approximating the solution? Thank you very much..
I know it's quadratic-time, but I think you should consider Prim with an implicit graph. The structure of the algorithm is
for each vertex v
mindist[v] := infinity
visited[v] := false
choose a root vertex r
mindist[r] := 0
repeat |V| times
let w be the minimizer of d[w] such that not visited[w]
visited[w] := true
for each vertex v
if not visited[v] and distance(w, v) < mindist[v]:
mindist[v] := distance(w, v)
parent[v] := w
Since the storage used is linear, it will likely stay resident in cache, and there are no fancy data structures, so this algorithm should run pretty fast.
I am going to assume that you actually want a EMST as your title suggests, and the TSP is just a means to that end, and not the actual goal itself. The two have very different restrictions (the TSP being far more restrictive), and thus very different optimal solutions.
Overview
The idea is that we want to run a modified kruskal's algorithm, which will make use of a k-d tree to find the closest pairs without evaluating every potential edge. We can find the shortest edge to each vertex in a connected component, take the shortest overall, and connect our connected components via that edge. As you'll see, this connects at least half of our connected components each iteration, so it takes at most logn iterations to complete.
Nearest Neighbor Search
For constructing an EMST, you'll want to use a data structure for querying for nearest neighbors in 4D space. You could extend octrees to work in a higher dimension, but I'd personally go with a k-d tree. You can construct a k-d tree in O(nlogn) time using the median of medians algorithm to find the median at each level, and you can insert / remove from a balanced k-d tree in O(logn) time.
Once you've built a k-d tree, you'll want to query for the nearest neighbor to each point. We'll then construct the edge between these two vertices. Many of these edges will be duplicated, as for some vertices A and B, A's nearest neighbor may be B, and B's nearest neighbor may be A. We'll handle this by storing which connected component each vertex belongs to, and after two vertices are joined by an edge, the duplicate edge will clearly connect two vertices of the same connected component, and so we'll discard it. To accomplish this, we'll use a disjoint-set (just like in many implementations of kruskal's algorithm) to assign a connected component to each vertex. This will also prevent us from creating cycles in our graph, which would introduce unnecessary edges in the MST.
Merging
However, as we construct each edge, we'll want to insert it into a min-heap priority queue before checking which edges to keep and which edges connect already-connected vertices. This will not affect the outcome of this first iteration, but later on we will need to handle edges by increasing distance. Then dequeue all the edges, check for unnecessary / redundant edges via the disjoint-set, insert valid edges into the MST, and merge the respective disjoint-sets. All of this of course introduces a nlogn factor for constructing and dequeuing elements from the min-heap (we could also just sort them in a plain array, if we wished).
After this first iteration of adding edges, we'll have connected at least half of the MST, maybe more. This is because for each vertex we added one edge, and we can have at most one duplicate per edge, so we've added a few as vertices / 2 edges, but as many as vertices - 1. Now at least 1/2 of our MST has been built. We'll continue the process as described in the following paragraphs, until we've added vertices - 1 edges in total.
Generalizing NN-Search
To continue, we'll want to construct lists of the vertices in each connected component, so that we can iterate over them by groups. This can be done in nearly linear time, as searching (also merging) a disjoint-set takes O(α(n)) time (α being the inverse ackermann function) and we repeat exactly n times. Once we have our lists of vertices per connected component, the rest is fairly straightforward. We'll take our existing k-d tree, and remove all the vertices in our current connected component. We'll then query for the nearest neighbor to each vertex to each vertex in our connected component, and add these edges to our min-heap. We'll then add these vertices back into the k-d tree, and repeat on the next connected component. Since we insert/remove a total of n elements, this amounts to an average case O(nlogn) time complexity.
Now that we have a queue of the shortest potential edges connecting our connected components, we'll dequeue these in order, and just as before insert valid edges and merge the disjoint sets. For the same reasons as before, this is guaranteed to connect at least half of our components, maybe even all of them. We'll repeat this process until we have connected all vertices into a single connected component, which will be our MST. Note that because we halve the number of disconnected components each iteration, it'll take at most O(logn) iterations to connect every vertex in our MST (most likely far less).
Remarks
Overall, this will take O(nlog^2(n)) time. There will likely be far less than log(n) iterations however, so expect a speedup there in practice. Also note that R-tree might be a good alternative to the k-d tree- I don't know how they compare in practice however.

Using Single Source Shortest Path to traverse a chess board

Say we have a n x n chess board (or a matrix in other words) and each square has a weight to it. A piece can move horizontally or vertically, but it can't move diagonally. The cost of each movement will be equal to the difference of the two squares on the chess board. Using an algorithm, I want to find the minimum cost for a single chess piece to move from the square (1,1) to square (n,n) which has a worst-case time complexity in polynomial time.
Could dikstras algorithm be used to solve this? Would my algorithm below be able to solve this problem? Diijkstras can already be ran in polynomial time, but what makes it this time complexity?
Pseudocode:
We have some empty set S, some integer V, and input a unweighted graph. After that we complete a adjacency matrix showing the cost of an edge without the infinity weighted vertices and while the matrix hasn't picked all the vertices we find a vertex and if the square value is less then the square we're currently on, move to that square and update V with the difference between the two squares and update S marking each vertices thats been visited. We do this process until there are no more vertices.
Thanks.
Since you are trying to find a minimum cost path, you can use Dijkstra's for this. Since Dijkstra is O(|E| + |V|log|V|) in the worst case, where E is the number of edges and V is the number of verticies in the graph, this satisfies your polynomial time complexity requirement.
However, if your algorithm considers only costs associated with the beginning and end square of a move, and not the intermediate nodes, then you must connect all possible beginning and end squares together so that you can take "short-cuts" around the intermediate nodes.

Algorithm for determining largest covered area

I'm looking for an algorithm which I'm sure must have been studied, but I'm not familiar enough with graph theory to even know the right terms to search for.
In the abstract, I'm looking for an algorithm to determine the set of routes between reachable vertices [x1, x2, xn] and a certain starting vertex, when each edge has a weight and each route can only have a given maximum total weight of x.
In more practical terms, I have road network and for each road segment a length and maximum travel speed. I need to determine the area that can be reached within a certain time span from any starting point on the network. If I can find the furthest away points that are reachable within that time, then I will use a convex hull algorithm to determine the area (this approximates enough for my use case).
So my question, how do I find those end points? My first intuition was to use Dijkstra's algorithm and stop once I've 'consumed' a certain 'budget' of time, subtracting from that budget on each road segment; but I get stuck when the algorithm should backtrack but has used its budget. Is there a known name for this problem?
If I understood the problem correctly, your initial guess is right. Dijkstra's algorithm, or any other algorithm finding a shortest path from a vertex to all other vertices (like A*) will fit.
In the simplest case you can construct the graph, where weight of edges stands for minimum time required to pass this segment of road. If you have its length and maximum allowed speed, I assume you know it. Run the algorithm from the starting point, pick those vertices with the shortest path less than x. As simple as that.
If you want to optimize things, note that during the work of Dijkstra's algorithm, currently known shortest paths to the vertices are increasing monotonically with each iteration. Which is kind of expected when you deal with graphs with non-negative weights. Now, on each step you are picking an unused vertex with minimum current shortest path. If this path is greater than x, you may stop. There is no chance that you have any vertices with shortest path less than x from now on.
If you need to exactly determine points between vertices, that a vehicle can reach in a given time, it is just a small extension to the above algorithm. As a next step, consider all (u, v) edges, where u can be reached in time x, while v cannot. I.e. if we define shortest path to vertex w as t(w), we have t(u) <= x and t(v) > x. Now use some basic math to interpolate point between u and v with the coefficient (x - t(u)) / (t(v) - t(u)).
Using breadth first search from the starting node seems a good way to solve the problem in O(V+E) time complexity. Well that's what Dijkstra does, but it stops after finding the smallest path. In your case, however, you must continue collecting routes for your set of routes until no route can be extended keeping its weigth less than or equal the maximum total weight.
And I don't think there is any backtracking in Dijkstra's algorithm.

Finding reachable vertices for every vertex in a directed graph

I know that brute force approach to do this is perform DFS on all the vertices of the graph.So for this algorithm the complexity would be O(V|V+E|). But is there more efficient way to do this?
I get the impression from papers like http://research.microsoft.com/pubs/144985/todsfinal.pdf that there is no algorithm that does better than O(VE) or O(V^3) in the general case. For sparse graphs and other special graphs there are faster algorithms. It seems, however, that you can still make improvements by separating "index construction" from "query", if you have some idea of the number of queries that will be made on the data. If there are going to be a lot of queries, O(1) is possible for queries if all the data is pre-computed (DFS or Floyd-Warshall, etc.) and stored in O(n^2) space. On the other hand, if there are going to be relatively few queries, space and/or index construction time can be reduced at the expense of query time.
I really suspect that there isn't a known better algorithm for general graphs. All the papers I found on the subject [1] [2] describe algorithms that run in O(|V| * |E|) time. That isn't better than your naïve attempt in the worst case.
Even the wikipedia page [3] says the fastest algorithms reduce the problem to matrix multiplication, which the fastest algorithms are only marginally better than your baseline.
[1] http://ion.uwinnipeg.ca/~ychen2/conferencePapers/tranRelationCopy.pdf
[2] http://www.vldb.org/conf/1988/P382.PDF
[3] http://en.wikipedia.org/wiki/Transitive_closure#Algorithms
[EDIT: As pointed out by kraskevich, the final query step can be worse than I had originally claimed: up to O(|V|^2) even for an output of size O(|V|), which is no better than ordinary DFS without any preprocessing.].
In the worst case, O(|V|^2) space would be needed to store all this information explicitly -- i.e., to store the complete list of reachable vertices for each vertex (think of a graph in which every vertex has an edge to every other vertex). But it's possible to represent it in such a way that only O(|V|) space is needed, and this representation can be built in O(|V|+|E|) time, and a query on it will only take time proportional to the size of the answer (number of reachable vertices).
The basic idea is: Every vertex in a strongly connected component (SCC) can reach every other vertex in the same SCC (this is the definition of SCC), and can reach all vertices in SCCs that it can reach, and no other vertices.
Find all SCCs; this can be done in O(|V|+|E|) time. Build a table SCC, so that SCC(u) = i if the SCC of u is i (both vertices in G and SCCs can be represented as integers). Afterwards make another pass through this table to build a dual table, Verts, so that Verts(i) contains a list of all vertices in the ith SCC.
Build a new graph G' whose vertices are the SCCs of G. G' will necessarily be acyclic.
So, given a vertex u in G, look up its SCC, SCC(u). Call this i. Perform a DFS through G' starting at vertex i: For each vertex (of G') j encountered during this DFS, output every vertex (of G) in Verts(j).

How to calculate minimum time of travel between two places when a matrix containing the time of travel between each node is given

For n stations a n*n matrix A is given such that A[i][j] represents time of direct journey from station i to j (i,j <= n).
The person travelling between stations always seeks least time. Given two station numbers a, b, how to proceed about calculating minimum time of travel between them?
Can this problem be solved without using graph theory, i.e. just by matrix A alone?
You do need graph theory in order to solve it - more specifically, you need Dijkstra's algorithm. Representing the graph as a matrix is neither an advantage nor a disadvantage to that algorithm.
Note, though, that Dijkstra's algorithm requires all distances to be nonnegative. If for some reason you have negative "distances" in your matrix, you must use the slower Bellman-Ford algorithm instead.
(If you're really keen on using matrix operations and don't mind that it will be terribly slow, you could use the Floyd-Warshall algorithm, which is based on quite simple matrix operations, to compute the shortest paths between all pairs of stations (massive overkill), and then pick the pair you're interested in...)
This looks strikingly similar to the traveling salesman problem which is NP hard.
Wiki link to TSP

Resources