Time complexity of two-way BFS - algorithm

Time complexity of traditional (one-way) BFS is O(V+E) when an adjacency list is used. What is it in case of two-way BFS?
Based on the answer here, I know:
BFS will traverse 1 + B + B^2 + ... + B^k vertices; whereas
Bi-directional BFS will traverse 2 + 2B + 2B^2 + ... + 2B^(k/2) vertices.
But I don't know how to derive the time complexity based on this.

I'm pretty sure that the time complexity is O(V+E) too.
Let's imagine the following graph: you have two binary trees of the same depth. For each tree, you connect all of its leaves to one extra vertex, and you connect the two roots to each other. If the search starts and ends at the two extra vertices attached to the leaves, both algorithms have to look at every vertex. The bi-directional search has to look at all edges too, whereas the one-way search can skip 2^(d - 1) - 1 edges (d is the depth). So I somewhat doubt that the more complex bi-directional algorithm is even worth it here. My thought is that you might find a solution faster if the points are "close", but it doesn't help in the extreme cases that take the longest to compute.
By the way, the number of traversed vertices is an approximation at best, because you discard all vertices that were already discovered. So the real number is probably a lot smaller than B^k.

Related

Why is the complexity of BFS O(V+E) instead of O(E)? [duplicate]

This is a generic BFS implementation:
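A minimal sketch of what such a generic BFS typically looks like, assuming the graph is an adjacency list with vertices numbered 0..V-1. The function and variable names here are illustrative and not taken from the question:

```python
from collections import deque


def bfs(adj, source):
    """Return the vertices reachable from `source` in BFS order.

    `adj[v]` is the list of neighbours of vertex v (assumed representation).
    """
    visited = [False] * len(adj)   # O(V) initialisation work
    visited[source] = True
    order = []
    queue = deque([source])
    while queue:                   # outer loop: each vertex is dequeued at most once
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:           # inner loop: each edge is examined once per endpoint
            if not visited[w]:     # the "if w is not visited" check
                visited[w] = True
                queue.append(w)
    return order
```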
For a connected graph with V nodes and E total number of edges, we know that every edge will be considered twice in the inner loop. So if the total number of iterations in the inner loop of BFS is going to be 2 * number of edges E, isn't the runtime going to be O(E) instead?
This is a case where one needs to look a little deeper at the implementation. In particular, how do I determine if a node is visited or not?
The traditional algorithm does this by coloring the vertices. All vertices are colored white at first, and they get colored black as they are visited. Thus visitation can be determined simply by looking at the color of the vertex. If you use this approach, then you have to do O(V) worth of initialization work setting the color of each vertex to white at the start.
You could manage your colors differently. You could maintain a data structure containing all visited nodes. If you did this, you could avoid the O(V) initialization cost. However, you will pay that cost elsewhere in the data structure. For example, if you stored them all in a balanced tree, each "if w is not visited" check now costs O(log V).
This obviously gives you a choice. You can have O(V+E) using the traditional coloring approach, or you can have O(E log V) by storing this information in your own data structure.
You specify a connected graph in your problem. In this case, O(V+E) == O(E) because the number of vertices can never be more than E+1. However, the time complexity of BFS is typically given with respect to an arbitrary graph, which can include a very sparse graph.
If a graph is sufficiently sparse (say, a million vertices and five edges), the cost of initialization may be great enough that you want to switch to an O(E log V) algorithm. However, such graphs are pretty rare in a practical setting. In practice, the traditional approach (giving each vertex a color) is so blindingly fast compared to the fancier data structures that you choose the traditional coloring scheme for everything except the most extraordinarily sparse graphs.
If you maintained a dedicated color property on your vertices, with the invariant that all nodes are black between algorithm invocations, you could drop the cost to O(E) by doing each BFS twice. On the first pass you would set every vertex you reach to white, and on the second pass you would turn them all black again. If you had a very sparse graph, this could be more efficient.
Well, let's break it up into easy pieces...
You keep a visited array, and by looking it up you decide whether or not to push a node into the queue. Once a node is visited, you don't push it again. So how many nodes get pushed into the queue? At most V, so that part costs O(V).
Now, each time you take a node out of the queue, you visit all of its neighboring nodes. Over all V nodes, how many neighbors will you come across? It's the number of edges if the edges are unidirectional (a directed graph), or 2 * the number of edges if they are bidirectional (an undirected graph). So that part costs O(E) for unidirectional and O(2 * E) for bidirectional edges.
So the total complexity is O(V + E) or O(V + 2 * E); dropping the constant factor, we simply say O(V + E).
Because a graph might have fewer edges than vertices.
Consider this graph:
1 ---- 2
|
|
3 ---- 4
There are 4 vertices but only 3 edges, and in BFS you have to traverse each and every vertex. That's why the time complexity is O(V+E): it accounts for both V and E.

Minimum Spanning Tree (MST) algorithm variation

I was asked the following question in an interview and I am unable to find an efficient solution.
Here is the problem:
We want to build a network and we are given c nodes/cities and D possible edges/connections made by roads. Edges are bidirectional and we know the cost of the edge. The costs of the edges can be represented as d[i,j] which denotes the cost of the edge i-j. Note not all c nodes can be directly connected to each other (D is the set of possible edges).
Now we are given a list of k potential edges/connections that have no cost. However, you can only choose one edge in the list of k edges to use (like getting free funding to build an airport between two cities).
So the question is... find the set of roads (and the one free airport) that minimizes total cost required to build the network connecting all cities in an efficient runtime.
So in short, solve a minimum spanning tree problem, but one where you can choose 1 edge in a list of k potential edges to be free of cost. I'm unsure how to solve it. I've tried finding all the spanning trees in order of increasing cost and choosing the cheapest, but I'm still stuck on how to account for the one free edge from the list of k potential free edges. I've also tried finding the MST of the D potential connections and then adjusting it according to the options in k to get a result.
Thank you for any help!
One idea would be to treat your favorite MST algorithm as a black box and to think about changing the edges in the graph before asking for the MST. For example, you could try something like this:
for each edge in the list of possible free edges:
    make the graph G' formed by setting that edge cost to 0.
    compute the MST of G'
return the cheapest MST out of all the ones generated this way
The runtime of this approach is O(kT(m, n)), where k is the number of edges to test and T(m, n) is the cost of computing an MST using your favorite black-box algorithm.
We can do better than this. There's a well-known problem of the following form:
Suppose you have an MST T for a graph G. You then reduce the cost of some edge {u, v}. Find an MST T' in the new graph G'.
There are many algorithms for solving this problem efficiently. Here's one:
Run a DFS in T starting at u until you find v.
If the heaviest edge on the path found this way costs more than {u, v}:
    Delete that edge.
    Add {u, v} to the spanning tree.
Return the resulting tree T'.
(Proving that this works is tedious but doable.) This would give an algorithm of cost O(T(m, n) + kn), since you would be building an initial MST (time T(m, n)), then doing k runs of DFS in a tree with n nodes.
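The DFS-based approach can be sketched roughly as follows. This is only a sketch under assumptions: vertices are numbered 0..n-1, edges are (cost, u, v) tuples with non-negative costs, the paid edges alone already connect the graph, and the helper names (kruskal, heaviest_on_path, best_free_edge) are made up for illustration. Because a free edge costs 0, swapping it in for the heaviest edge on the tree path never makes the tree more expensive.

```python
def kruskal(n, edges):
    """Return the MST edges of a connected graph; edges are (cost, u, v) tuples."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((cost, u, v))
    return tree


def heaviest_on_path(n, tree, u, v):
    """Cost of the heaviest tree edge on the u-v path, found by an iterative DFS."""
    adj = [[] for _ in range(n)]
    for cost, a, b in tree:
        adj[a].append((b, cost))
        adj[b].append((a, cost))
    stack = [(u, -1, 0)]                    # (vertex, parent, heaviest edge so far)
    while stack:
        x, par, best = stack.pop()
        if x == v:
            return best
        for y, cost in adj[x]:
            if y != par:
                stack.append((y, x, max(best, cost)))
    raise ValueError("u and v are not connected in the tree")


def best_free_edge(n, edges, free_edges):
    """Return (total cost, chosen free edge) after trying each zero-cost candidate."""
    tree = kruskal(n, edges)
    base = sum(cost for cost, _, _ in tree)
    best_total, best_edge = base, None
    for u, v in free_edges:
        saving = heaviest_on_path(n, tree, u, v)   # the edge the free one would replace
        if base - saving < best_total:
            best_total, best_edge = base - saving, (u, v)
    return best_total, best_edge
```

The adjacency list is rebuilt on every query to keep the sketch short; building it once would not change the O(T(m, n) + kn) bound.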
However, this can potentially be improved even further if you're okay using some more advanced algorithms. The paper "On Cartesian Trees and Range Minimum Queries" by Demaine et al. shows that, in O(n) time, it is possible to preprocess a minimum spanning tree so that queries of the form "what is the lowest-cost edge on the path in this tree between nodes u and v?" can be answered in time O(1). You could therefore build this structure instead of doing a DFS to find the bottleneck edge between u and v, reducing the overall runtime to O(T(m, n) + n + k). Given that T(m, n) is very low (the best known bound is O(m α(m)), where α(m) is the inverse Ackermann function and is less than five for all inputs in the feasible universe), this is asymptotically a very quick algorithm!
First generate an MST. Now, if you add a free edge, you will create exactly one cycle. You could then remove the heaviest edge in the cycle to get a cheaper tree.
To find the best tree you can make by adding one free edge, you need to find the heaviest edge in the MST that you could replace with a free one.
You can do that by testing one free edge at a time:
1. Pick a free edge
2. Find the lowest common ancestor in the tree (from an arbitrary root) of its adjacent vertices
3. Remember the heaviest edge on the path between the free edge vertices
When you're done, you know which free edge to use -- it's the one associated with the heaviest tree edge, and you know which edge it replaces.
In order to make steps (2) and (3) faster, you can remember the depth of each node and connect it to multiple ancestors like a skip list. You can then do those steps in O(log |V|) time, leading to a total complexity of O( (|E|+k) log |V| ), which is pretty good.
EDIT: Even Easier Way
After thinking about this a bit, it seems there's a super easy way to figure out which free edge to use and which MST edge to replace.
Disregarding the k possible free edges, you build the MST from the other edges using Kruskal's algorithm, but you modify the usual disjoint set data structure as follows:
Use union by size or rank, but not path compression. Every union operation will then establish exactly one link and take O(log N) time, and every path will be at most O(log N) long.
For each link, remember the index of the edge that caused it to be created.
For each possible free edge, then, you can walk up the links in the disjoint set structure to find out exactly at which point its endpoints were connected into the same connected component. You get the index of the last required edge, i.e., the one it would replace, and the free edge with the greatest replacement target index is the one you should use.
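A rough sketch of this modified Kruskal, under the same kind of assumptions as before: vertices are numbered 0..n-1, edges are (cost, u, v) tuples, the paid edges alone connect the graph, and best_free_edge_kruskal is a made-up name. Links are created in order of increasing edge index, so link indices only grow on the way up to the root; walking up from both endpoints along the older link first ends on the edge that first joined them.

```python
def best_free_edge_kruskal(n, edges, free_edges):
    """Return (MST edge to replace, free edge to use).

    edges: (cost, u, v) tuples; free_edges: non-empty list of (u, v) pairs.
    """
    order = sorted(edges)
    parent = list(range(n))
    size = [1] * n
    link = [None] * n      # link[x]: index in `order` of the edge that hung x under parent[x]

    def find(x):           # deliberately no path compression, so links stay intact
        while parent[x] != x:
            x = parent[x]
        return x

    for i, (_, u, v) in enumerate(order):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if size[ru] > size[rv]:
            ru, rv = rv, ru
        parent[ru] = rv    # union by size: the smaller root goes under the larger
        size[rv] += size[ru]
        link[ru] = i

    def connect_index(u, v):
        """Index of the Kruskal edge that first put u and v in one component."""
        last = -1
        while u != v:
            u_root, v_root = parent[u] == u, parent[v] == v
            if u_root and v_root:
                raise ValueError("endpoints are not connected by paid edges")
            # Follow whichever outgoing link was created earlier.
            if v_root or (not u_root and link[u] < link[v]):
                last, u = link[u], parent[u]
            else:
                last, v = link[v], parent[v]
        return last

    best_idx, chosen = max((connect_index(u, v), (u, v)) for u, v in free_edges)
    return (order[best_idx] if best_idx >= 0 else None), chosen
```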

Is there a better algorithm to find the shortest path in a graph?

I'm facing a problem where I have to find the shortest path between two nodes in a graph. The graph has some characteristics that I'm sure can lead to a better solution, as all the ones I've found and thought of until now are O(V+E).
In particular:
- The graph is a single connected component.
- The graph is undirected and unweighted.
- The nodes that form a simple cycle are a complete subgraph (***).
I need to find and return the minimum distance, given two nodes of the graph.
I've looked at different algorithms, for weighted and unweighted graphs: Dijkstra, Bellman-Ford, Floyd-Warshall and Breadth First Search, but I can't find an algorithm that makes use of the (***) property, which I'm quite sure is important and useful.
Thanks in advance.
If the input to your problem is a graph and a single pair of vertices then you cannot hope for a solution faster than O(V + E) simply because you need to at least read the input data. However, if you have multiple (say, K) queries, then you can indeed do better than O(K*(V + E)).
If that is the case then one way of incorporating property (***) that I see is the following:
If the graph is a (rooted) tree then the shortest path between two vertices (u, v) is a path (u--w--v), where w is the least common ancestor (LCA) of u and v. There exists an algorithm that takes O(V + E) time for a certain precomputation and then O(1) time for the actual LCA queries (it is described, for example, here). Once you have the vertex w, it is then straightforward to calculate the length of the path, since it is essentially (depth(u) - depth(w)) + (depth(v) - depth(w)), where depth(x) is the depth of the vertex x in our rooted tree.
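To make the tree case concrete, here is a small sketch that computes the distance as depth(u) + depth(v) - 2 * depth(w), with depth measured from the root. It assumes the tree is given as an adjacency list with vertices 0..V-1, and it finds the LCA by simply walking parent pointers, which is not the O(1)-per-query method mentioned above. The names root_tree and distance are illustrative.

```python
from collections import deque


def root_tree(adj, root=0):
    """Precompute parent and depth arrays for a tree given as an adjacency list."""
    parent = [-1] * len(adj)
    depth = [0] * len(adj)
    seen = [False] * len(adj)
    seen[root] = True
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if not seen[w]:
                seen[w] = True
                parent[w] = v
                depth[w] = depth[v] + 1
                queue.append(w)
    return parent, depth


def distance(parent, depth, u, v):
    """Number of edges on the tree path between u and v."""
    du, dv = depth[u], depth[v]
    while depth[u] > depth[v]:      # lift the deeper vertex to the same level
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:                   # climb together until the paths meet at the LCA
        u, v = parent[u], parent[v]
    w = u                           # w is the least common ancestor
    return (du - depth[w]) + (dv - depth[w])
```

Swapping in a constant-time LCA structure would change only how w is found; the distance formula stays the same.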
In your case, the graph is not a tree, but resembles one a bit. I will give a high level idea of what seems to be possible for this case.
Property (***) tells us that each biconnected component (block) is a complete subgraph, so the distance between each pair of vertices inside such a component is 1. Therefore, if we contract each such component into a single vertex then we could do something similar to the previous case.
However, there would be a few subtleties to take care of. For example, when a path in the "contracted" tree passes a vertex, it could mean that we need to visit either one or two vertices in the original graph, depending on whether or not we need to switch the vertex before continuing along our contracted tree. But this is something that we can precompute once for each contracted vertex, and then each query can again be made to run in O(1) time, so overall for K queries we would then have O(V + E) for preprocessing and O(K) for queries, giving us total O(V + E + K) time.

Find the shortest path from source to target in a weighted-undirected graph in O(V + E) time

I've been tasked with designing an algorithm that finds the shortest path in a weighted undirected graph with V nodes and E edges in O(V + E) time. The graph weights are all positive integers and no weight is greater than 15.
I believe I can use Dijkstra's algorithm to find the shortest path from a source node to a target node, but I don't think it satisfies the runtime constraints.
Knowing the runtimes of BFS and DFS, I'm thinking that some sort of modification of those algorithms will get me to O(V + E), but I'm not sure what direction to head in or how I can leverage the <= 15 weight constraint on the edges.
Any help is appreciated.
You can use Dijkstra's algorithm, but you have to be a little careful with the priority queue.
Since all the weights are integers from 1 to 15, there can only be 16 different priorities in the queue at any one time. You can use this fact to implement all your priority queue operations in constant time. That will change the complexity of the algorithm from O(|V| + |E| log |V|) to O(|V| + |E|).
There are lots of ways to make that priority queue. Basically you partition the entries into lists of entries with the same priority, and then you only have to prioritize the 16 lists. It's reasonable to keep those 16 lists in a circular array.
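One possible way to write this down, as a sketch rather than a definitive implementation: the graph is assumed to be an adjacency list of (neighbour, weight) pairs with vertices 0..V-1, and bucket_dijkstra is a made-up name. Every queued distance lies between the current distance d and d + 15, so 16 buckets indexed by distance mod 16 never collide.

```python
def bucket_dijkstra(adj, source, max_weight=15):
    """Shortest distances from `source`.

    adj[v] is a list of (neighbour, weight) pairs with integer weights
    in the range 1..max_weight. Unreachable vertices keep distance inf.
    """
    dist = [float("inf")] * len(adj)
    dist[source] = 0
    nbuckets = max_weight + 1
    buckets = [[] for _ in range(nbuckets)]   # circular array, indexed by distance mod nbuckets
    buckets[0].append(source)
    pending = 1        # number of entries currently stored in the buckets
    d = 0              # distance currently being processed
    while pending:
        bucket = buckets[d % nbuckets]
        while bucket:
            v = bucket.pop()
            pending -= 1
            if dist[v] != d:                  # stale entry: v was already reached more cheaply
                continue
            for w, weight in adj[v]:
                nd = d + weight
                if nd < dist[w]:
                    dist[w] = nd
                    buckets[nd % nbuckets].append(w)
                    pending += 1
        d += 1
    return dist
```

The outer loop advances d at most 15 * (V - 1) times, so with the weight bound treated as a constant the whole thing is O(V + E).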
The algorithm you're looking for is called Dial's algorithm; it also works on graphs that contain cycles. Its complexity is O(E + WV), where W is the maximum edge weight. If W >> V, you can replace the one-bucket-per-weight scheme with buckets covering the weight ranges 1, 2-3, 4-7, 8-15, etc.
It's an optimization of Dijkstra's algorithm that uses the fact that, given the bounded range of weights, you can replace the Fibonacci heap with buckets, which reduces the find_node operation from O(log n) to O(1).
The algorithm in detail is well described on GeeksForGeeks and Wikipedia among others.
You may also be interested in the Directed Acyclic Graph shortest-path algorithm in Cormen's Introduction to Algorithms (p. 655) or on GeeksForGeeks.

Algorithm to find if a node is reachable from another node

I have a large graph with millions of nodes. I want to check if node 'A' is reachable from node 'B' with less than 4 hops. If possible, I want the shortest path. Which is the best way (or algorithm) to solve this issue?
Note that if the graph is unweighted (as it seems in your question) - a simple and efficient BFS will be enough to find the shortest path from the source to the target.
Also, since you have a single source and a single target, you can apply bi-directional BFS, which is more efficient than BFS.
Algorithm idea: do a BFS search simultaneously from the source and the target: [BFS until depth 1 in both, until depth 2 in both, ....].
The algorithm will end when you find a vertex v which is in the frontier of both BFS runs.
Algorithm behavior: The vertex v that terminates the algorithm's run will be exactly in the middle between the source and the target.
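A minimal sketch of the idea, assuming an undirected, unweighted graph stored as an adjacency list (or dict) and returning only the path length. The name bidirectional_bfs is illustrative. The two searches take turns expanding one full level each; as soon as one side examines an edge that leads into the other side's visited set, the two partial distances plus that edge give the shortest distance.

```python
def bidirectional_bfs(adj, source, target):
    """Length in edges of a shortest source-target path, or None if there is none."""
    if source == target:
        return 0
    dist_s, dist_t = {source: 0}, {target: 0}   # vertices seen by each search
    frontier_s, frontier_t = [source], [target]
    expand_source = True
    while frontier_s and frontier_t:
        if expand_source:
            frontier, dist, other = frontier_s, dist_s, dist_t
        else:
            frontier, dist, other = frontier_t, dist_t, dist_s
        next_frontier = []
        for v in frontier:                       # expand one full level of this side
            for w in adj[v]:
                if w in other:                   # the two searches meet on edge (v, w)
                    return dist[v] + 1 + other[w]
                if w not in dist:
                    dist[w] = dist[v] + 1
                    next_frontier.append(w)
        if expand_source:
            frontier_s = next_frontier
        else:
            frontier_t = next_frontier
        expand_source = not expand_source        # alternate between the two sides
    return None                                  # one side ran out of vertices
```

Keeping the visited vertices in dictionaries means only the explored part of the graph is touched, which matters when the graph has millions of nodes and the search is shallow.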
This algorithm will yield much better results than BFS from the source in most cases [an explanation of why it is better than BFS follows], and will surely provide an answer if one exists.
Why is it better than BFS from the source?
Assume the distance between source and target is k, and the branching factor is B [every vertex has B edges].
BFS will open: 1 + B + B^2 + ... + B^k vertices.
Bi-directional BFS will open: 2 + 2B + 2B^2 + ... + 2B^(k/2) vertices.
For large B and k, the second is obviously much better than the first.
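To get a feel for the gap, the two sums can be evaluated directly. This is just arithmetic on the idealized model, where every vertex has exactly B new neighbours, each search counts every level from 0 up to its depth, and k is even. The function names are made up for illustration.

```python
def bfs_opened(b, k):
    """1 + B + B^2 + ... + B^k: vertices opened by one-way BFS in the idealized model."""
    return sum(b ** i for i in range(k + 1))


def bidirectional_opened(b, k):
    """Twice the sum of B^i for i = 0..k/2: both searches go half the distance."""
    return 2 * sum(b ** i for i in range(k // 2 + 1))
```

For B = 10 and k = 6 this gives 1,111,111 opened vertices for one-way BFS versus 2,222 for the bi-directional search, i.e. roughly B^k against B^(k/2).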
(*) The explanation of bi-directional search is taken from another answer I posted
The best algorithm for finding the shortest path between two nodes in a graph in which you have no extra information about how likely it is that one node is close to the target is Dijkstra's Algorithm. You can easily modify this algorithm to quit after 3 hops, to avoid wasting computation on results that you are not interested in.
If you do have some extra information about the likelihood that a given node is close to your target, you can use A* search, which uses a heuristic on a node's distance to its target to improve its runtime performance.
If you need a path of fewer than 3 hops, then the only possible paths are A (A=B), A-B (the nodes are adjacent), and A-X-B (X is a node adjacent to both ends). So there's no need for any complex algorithm. First, test for A=B; second, test whether A and B are adjacent; and third, try to find an X that is adjacent to both A and B (e.g. by intersecting the adjacency sets of the two endpoints).
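Those three checks fit in a few lines. This sketch assumes the graph is stored as a dict mapping each node to a set of its neighbours, and path_within_two_hops is a hypothetical name:

```python
def path_within_two_hops(adj, a, b):
    """Return a path from a to b using at most two edges, or None if there is none."""
    if a == b:
        return [a]
    if b in adj[a]:                 # A and B are adjacent
        return [a, b]
    common = adj[a] & adj[b]        # nodes adjacent to both endpoints
    if common:
        return [a, next(iter(common)), b]
    return None
```

The set intersection only looks at the two adjacency sets, so the cost depends on the degrees of A and B rather than on the size of the whole graph.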

Resources