I realize you can find the diameter (the max distance) of an undirected unweighted graph by using BFS twice; my question is about the specifics of this algorithm.
If I were to implement this, would I literally just do BFS twice and it would return the max distance? Or do I have to track the distance values of each node throughout the BFS and check whether the new max is greater than the old max, etc.? I have heard that with BFS the last visited node is at the maximum distance from the original node, which means I wouldn't need to do all that, right?

You have to run BFS n times, once from each node. Distances must be calculated from scratch every time: distances from some node u are meaningless when you run BFS from some other node v, so you have to recompute them entirely.
Now, for each node v you store the maximum distance to any other node. The diameter of the graph is the maximum of these maxima.
However, as I understood from your comment, you are solving the problem for a tree rather than a general graph. In the case of a tree, it is simpler. Run BFS from any node v. Find any of the farthest nodes from v; let it be d1. Now run BFS again from node d1 and find any of the farthest nodes from it; let it be d2. Then a path from d1 to d2 is a diameter of the tree (one of them). There is a proof in the answer to this question.
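A minimal sketch of the general-graph procedure, assuming the graph is connected and stored as a Python dict mapping each node to a list of neighbours (the names `bfs_distances` and `diameter` are hypothetical). It runs one BFS per node, so it costs O(V·(V+E)).

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first discovery = shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Maximum eccentricity over all nodes; assumes a connected graph."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Hypothetical example: the 6-cycle 0-1-2-3-4-5-0.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(diameter(cycle))  # 3
```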
Note that these two BFS's still compute all distances from scratch. So yes, you just have to run BFS twice.
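The two-BFS procedure for trees can be sketched as follows, assuming an adjacency-dict tree (the helper names `farthest` and `tree_diameter` are hypothetical). It also illustrates the point from the question: BFS dequeues nodes in nondecreasing distance order, so the last node dequeued is a farthest one and no separate max tracking is needed.

```python
from collections import deque

def farthest(adj, source):
    """BFS from source; return (a farthest node, its distance)."""
    dist = {source: 0}
    queue = deque([source])
    last = source
    while queue:
        u = queue.popleft()
        last = u                         # nodes leave the queue in distance order
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return last, dist[last]

def tree_diameter(adj):
    """Return (d1, d2, length) where the d1-d2 path is a diameter of the tree."""
    start = next(iter(adj))              # any node works as the first root
    d1, _ = farthest(adj, start)         # first BFS: a farthest node from start
    d2, length = farthest(adj, d1)       # second BFS: a farthest node from d1
    return d1, d2, length

# Hypothetical tree with edges 0-1, 1-2, 1-3, 3-4, 4-5.
tree = {i: [] for i in range(6)}
for a, b in [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5)]:
    tree[a].append(b)
    tree[b].append(a)
print(tree_diameter(tree))  # (5, 2, 4)
```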


Algorithm to find any two nodes with distance of at least half the (undirected) graph's diameter

I have to give an algorithm as follows:
Given an undirected connected graph G, give an algorithm that finds two nodes x,y such that their distance is at least half the diameter of the Graph. Prove any claim.
I'm assuming I have to run a BFS from any arbitrary node and find its furthest node to find the diameter. Then find two of the explored nodes whose distance is bigger than half the diameter.
But I doubt this is the optimal, intended solution. Is there a way to find these two required nodes at the same time as running the BFS that finds the diameter, so that the complexity remains polynomial?
Any guidance or hint would be appreciated!
The diameter (let's call it D) of a graph is the largest distance (= minimal number of hops) between any two of its nodes.
Choose any node and perform BFS, while retaining, for each node, the number of hops from your initial node. This takes O(V + E), since you will visit every node exactly once and examine each edge a constant number of times. Note that this number of hops is also the shortest distance to v from the root, which I will refer to as d(root, v).
Now, take the node z (a leaf of the BFS tree) that has the largest number of hops from your root. Congratulations, d(root, z) >= D/2, because of the following
Lemma: for any node x in a connected graph of diameter D, there must exist a node y that is at least D/2 far away.
Proof: If this were not so, then there would be some node x such that d(x, y) < D/2 for all y. But then, by passing through x, we could connect any two nodes a and b by a path of length at most d(a, x) + d(x, b) < D/2 + D/2 = D, and therefore the graph's diameter would be smaller than D, a contradiction.
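Following the lemma, one BFS is enough. A minimal sketch, assuming a connected graph given as an adjacency dict (the function name `half_diameter_pair` is hypothetical): it returns the root, a farthest node from it, and their distance, which the lemma guarantees is at least D/2.

```python
from collections import deque

def half_diameter_pair(adj, root):
    """Return (x, y, d(x, y)) with d(x, y) >= D/2, using a single BFS from root."""
    dist = {root: 0}
    queue = deque([root])
    far = root
    while queue:
        u = queue.popleft()
        if dist[u] > dist[far]:
            far = u                      # remember a farthest node seen so far
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # By the lemma, ecc(root) >= D/2, so (root, far) satisfies the requirement.
    return root, far, dist[far]

# Hypothetical example: the path 0-1-2-3-4 (diameter 4), rooted at its middle.
path5 = {i: [] for i in range(5)}
for i in range(4):
    path5[i].append(i + 1)
    path5[i + 1].append(i)
print(half_diameter_pair(path5, 2))  # (2, 0, 2)
```

Rooting at the middle node is the worst case here, and it still meets the bound exactly.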
That's actually the tricky one, but I think I got it. Interestingly, your partially wrong solution put me on the right track.
Let's just copy a few definitions here:
Distance between two vertices in a graph is the number of edges in a shortest path
The eccentricity of a vertex v is the greatest distance between v and any other vertex
The diameter d of a graph is the maximum eccentricity of any vertex in the graph. That is, d is the greatest distance between any pair of vertices
The real issue would be actually finding the diameter; it's not an easy task. To find the diameter you cannot just choose any node and run BFS: in that case you only find the node with the highest distance from that node (its eccentricity), which is not the diameter. To actually find the diameter you would have to run BFS (= find the eccentricity) from every single node, and the highest distance you get is the diameter (there are some better algorithms, but as I said, it's not a simple task).
However! You don't have to know the diameter at all. If you run BFS from a random node and find the node with the highest distance (the eccentricity), that's the solution to your algorithm: x is your starting node and y is the node with the highest distance.
Why? Imagine a super simple graph like this:
You can see that the diameter is between nodes 1 and 4. So no matter which node you run BFS from, that node is either in the middle (in which case its farthest node is at half the diameter) or not in the middle, in which case its farthest node is at even more than half the diameter.
Even more complex graphs do not change this fact.
If you choose 6 or 7, it's not exactly on the diameter path (because the highest distance is between 1-2-3-4-5), but that only means you get an even higher distance, which is fine for your task.
Result: run BFS from a random node; when it ends, take the node with the highest distance from the starting node (= find the eccentricity and remember the furthest node). The starting node and this "ending" node are (x, y).

Find the Diameter of an unweighted undirected graph

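By diameter I mean the largest minimum distance (shortest-path length) between any two nodes in the graph.
To solve this, could we just run BFS from any node, then choose one of the nodes farthest from the original node, run BFS from this new node, and take the largest distance found there as the diameter of the graph?
Another post talks about weighted directed graphs; this question is strictly about unweighted ones. Although the same algorithm might work here, I am asking whether we can do it more efficiently with the algorithm I proposed here.
NetworkX's diameter function computes exactly this quantity (the maximum eccentricity over all nodes). Note that the two-BFS procedure from the question is only guaranteed to be exact on trees; on a general graph it gives a lower bound on the diameter.
import networkx as nx
G = nx.lollipop_graph(5, 5)
print(nx.diameter(G))
Output: 6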

Efficient algorithm to extract a subgraph within a maximum distance from multiple vertices

I have an algorithmic problem where there's a straightforward solution, but it seems wasteful. I'm wondering if there's a more efficient way to do the same thing.
Here's the problem:
Input: A large graph G with non-negative edge weights (interpreted as lengths), a list of vertices v, and a list of distances d the same length as v.
Output: The subgraph S of G consisting of all of the vertices that are at a distance of at most d[i] from v[i] for some i.
The obvious solution is to use Dijkstra's algorithm starting from each v[i], modified so that it bails out after hitting a distance of d[i], and then taking the union of the subgraphs that each search traverses. However, in my use case it's frequently going to be the case that the search trees from the v[i]s overlap substantially. That means the Dijkstra approach will wastefully traverse the vertices in the overlap multiple times before I take the union.
In the case that there is only one vertex in v, the Dijkstra approach runs in O(|S|log|S|), taking |S| to be the number of vertices (my graph is sparse, so I ignore the edges term). Is it possible to achieve the same asymptotic run time when v has more than one vertex?
My first idea was to combine the searches out of each v[i] into the same priority queue, but the "bail out" condition mentioned above complicates this approach. Sometimes a vertex will be reached in a shorter distance from one v[i], but you would still want to search through it from another v[j] if the second vertex has a larger d[j] allotted to it.
You can solve this with the complexity of a single Dijkstra run.
Let D be the maximum of the distances in d.
Define a new start vertex, and give it edges to each of the vertices in v.
The length of the edge between start and v[i] should be set to D-d[i].
Then, in this new graph, S is given by all vertices within distance D of the start vertex, so run Dijkstra from the start vertex and stop expanding once distances exceed D.
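A minimal Python sketch of this reduction, assuming an adjacency dict where `adj[u]` lists `(v, length)` pairs with non-negative lengths (the name `within_any_radius` is hypothetical). Rather than adding a real start vertex, it seeds the heap with that vertex's edges, each of length D - d[i], and never enqueues anything farther than D.

```python
import heapq

def within_any_radius(adj, sources, radii):
    """Return the vertices w with dist(sources[i], w) <= radii[i] for some i.

    adj[u] is a list of (v, length) pairs; all lengths are non-negative.
    """
    D = max(radii)
    best = {}                        # tentative distance from the virtual start
    heap = []
    # The virtual start vertex is never materialised: its edges start -> v[i]
    # of length D - d[i] become the initial heap entries.
    for v, r in zip(sources, radii):
        if D - r < best.get(v, float("inf")):
            best[v] = D - r
            heapq.heappush(heap, (D - r, v))
    settled = set()
    while heap:
        du, u = heapq.heappop(heap)
        if u in settled:
            continue                 # stale entry left by a later improvement
        settled.add(u)
        for v, w in adj[u]:
            nd = du + w
            # Bail out: never enqueue anything farther than D from the start.
            if nd <= D and nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return settled

# Hypothetical example: a path 0-1-2-3-4-5-6 with unit edge lengths.
path = {i: [] for i in range(7)}
for i in range(6):
    path[i].append((i + 1, 1))
    path[i + 1].append((i, 1))
print(sorted(within_any_radius(path, [0, 6], [1, 2])))  # [0, 1, 4, 5, 6]
```

Each vertex is settled at most once, even where the balls around different v[i] overlap, which is where the saving over separate Dijkstra runs comes from.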

What is difference between BFS and Dijkstra's algorithms when looking for shortest path?

I was reading about Graph algorithms and I came across these two algorithms:
Dijkstra's algorithm
Breadth-first search
What is the difference between Dijkstra's algorithm and BFS while looking for the shortest-path between nodes?
I searched a lot about this but didn't get any satisfactory answer!
The rules for BFS for finding shortest-path in a graph are:
We discover all the connected vertices,
Add them in the queue and also
Store the distance (weight/length) from source u to that vertex v.
Update the distance whenever a shorter path from source u to vertex v is found, and we have it!
This is exactly the same thing we do in Dijkstra's algorithm!
So why are the time complexities of these algorithms so different?
If anyone can explain it with the help of pseudocode, I will be very grateful!
I know I am missing something! Please help!
Breadth-first search is just Dijkstra's algorithm with all edge weights equal to 1.
Dijkstra's algorithm is conceptually breadth-first search that respects edge costs.
The process for exploring the graph is structurally the same in both cases.
When using BFS for finding the shortest path in a graph, we discover all the connected vertices, add them to the queue and also maintain the distance from source to that vertex. Now, if we find a path from source to that vertex with less distance then we update it!
We do not maintain a distance in BFS. It is for discovery of nodes.
So we put them in a plain queue and pop them. This is unlike Dijkstra, where we put the accumulated weight of a node (after relaxation) in a priority queue and pop the minimum distance.
So BFS works like Dijkstra on a graph where all edges have equal weight. The complexity differs because of the use of a simple queue versus a priority queue.
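Dijkstra and BFS are essentially the same algorithm. As other members said, Dijkstra uses a priority_queue whereas BFS uses a queue. The difference comes from the way the shortest path is calculated in the two algorithms.
In the BFS algorithm, to find the shortest path we traverse in all directions and update the distance array accordingly. Basically, the code is as follows (Python; the name queue_shortest_paths is just for illustration):
from collections import deque

def queue_shortest_paths(adj, src):
    # adj[u] is a list of (v, weight) pairs
    dist = {u: float("inf") for u in adj}
    dist[src] = 0
    queue = deque([src])
    while queue:                            # while queue not empty
        u = queue.popleft()                 # pop the node at the front (say u)
        for v, weight in adj[u]:            # for all its adjacent nodes (say v)
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight  # update distance of v
                queue.append(v)             # push v into the queue
    return dist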
The above code also gives the shortest paths in a weighted graph, but its time complexity is not that of normal BFS, i.e. O(E+V). It is more than O(E+V) because nodes can be pushed into the queue more than once, so many edges are relaxed repeatedly.
Consider the above graph. Dry run the above code on it and you will find that nodes 2 and 3 are pushed into the queue twice, and the distances of all later nodes are updated twice as well.
So, if there are many more nodes after 3, the distances computed from the first insertion of 2 are propagated to all later nodes, and then those distances are updated again from the second push of node 2. The same happens with 3.
So you can see that nodes are repeated: not every node and edge is traversed only once.
Dijkstra's algorithm does something smarter here: rather than expanding in all directions, it always expands the node with the shortest tentative distance, so repeated updates of distances are avoided.
So, to track the shortest distance, we use a priority_queue in place of the normal queue.
If you dry run the above graph again using Dijkstra's algorithm, you will find that nodes may still be pushed twice, but only the entry with the shorter distance is processed.
So every node is processed only once, but the time complexity is higher than normal BFS because of the priority_queue.
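The graph referred to above is not reproduced, so the following sketch uses a small hypothetical graph of its own, where the direct edge 0 -> 2 is expensive and the detour 0 -> 1 -> 4 -> 2 is cheap. It counts how often each version processes a node. The FIFO version pops nodes 2 and 3 twice each, while Dijkstra skips the outdated heap entry.

```python
import heapq
from collections import deque

def fifo_relax(adj, src):
    """Queue-based relaxation (plain FIFO queue); also counts pops."""
    dist = {u: float("inf") for u in adj}
    dist[src] = 0
    queue, pops = deque([src]), 0
    while queue:
        u = queue.popleft()
        pops += 1
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                queue.append(v)
    return dist, pops

def dijkstra(adj, src):
    """Dijkstra with a binary heap; counts nodes actually processed."""
    dist = {u: float("inf") for u in adj}
    dist[src] = 0
    heap, processed = [(0, src)], 0
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # outdated entry: skip it
        processed += 1
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist, processed

# Hypothetical graph: adj[u] lists (v, weight) pairs.
g = {0: [(2, 5), (1, 1)], 1: [(4, 1)], 2: [(3, 1)], 3: [], 4: [(2, 1)]}
print(fifo_relax(g, 0))  # 7 pops: nodes 2 and 3 are each popped twice
print(dijkstra(g, 0))    # 5 nodes processed: each exactly once
```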
With the SPFA algorithm, you can compute shortest paths in a graph with weighted edges using a normal queue.
It is a variant of the Bellman-Ford algorithm, and it can also handle negative weights.
But on the downside, its worst-case time complexity is worse than Dijkstra's.
Since you asked for pseudocode, this website has visualizations with pseudocode.
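A minimal SPFA sketch, assuming an adjacency dict of `(v, weight)` lists in which weights may be negative. Two details are my additions: an in-queue flag that avoids duplicate queue entries, and negative-cycle detection by tracking how many edges the walk behind each distance uses. A walk of n or more edges repeats a vertex, and that can only happen through a negative cycle.

```python
from collections import deque

def spfa(adj, src):
    """Shortest paths with a FIFO queue (SPFA, a Bellman-Ford variant).

    adj[u] is a list of (v, weight); weights may be negative.
    Returns a dict of distances, or None if a negative cycle is reachable.
    """
    n = len(adj)
    dist = {u: float("inf") for u in adj}
    edges = {u: 0 for u in adj}           # edges on the walk that gave dist[u]
    in_queue = {u: False for u in adj}
    dist[src] = 0
    queue = deque([src])
    in_queue[src] = True
    while queue:
        u = queue.popleft()
        in_queue[u] = False
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                edges[v] = edges[u] + 1
                if edges[v] >= n:
                    return None           # walk repeats a vertex: negative cycle
                if not in_queue[v]:       # skip duplicate queue entries
                    queue.append(v)
                    in_queue[v] = True
    return dist

# Hypothetical graph with one negative edge (1 -> 3) but no negative cycle.
g = {0: [(1, 4), (2, 2)], 1: [(3, -3)], 2: [(1, 1)], 3: []}
print(spfa(g, 0))  # {0: 0, 1: 3, 2: 2, 3: 0}
```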

graph - How to find Minimum Directed Cycle (minimum total weight)?

Here is an exercise:
Let G be a weighted directed graph with n vertices and m edges, where all edges have positive weight. A directed cycle is a directed path that starts and ends at the same vertex and contains at least one edge. Give an O(n^3) algorithm to find a directed cycle in G of minimum total weight. Partial credit will be given for an O((n^2)*m) algorithm.
Here is my algorithm.
I do a DFS. Each time when I find a back edge, I know I've got a directed cycle.
Then I will temporarily go backwards along the parent array (until I travel through all vertices in the cycle) and calculate the total weights.
Then I compare the total weight of this cycle with min. min always takes the minimum total weights. After the DFS finishes, our minimum directed cycle is also found.
Ok, then about the time complexity.
To be honest, I don't know the time complexity of my algorithm.
For DFS, the traversal takes O(m+n) (if m is the number of edges, and n is the number of vertices). For each vertex, it might point back to one of its ancestors and thus forms a cycle. When a cycle is found, it takes O(n) to summarise the total weights.
So I think the total time is O(m + n*n). But this is obviously wrong, since the exercise asks for O(n^3) and only gives partial credit for O(m*n^2).
Can anyone help me with:
Is my algorithm correct?
What is the time complexity if my algorithm is correct?
Is there any better algorithm for this problem?
You can use Floyd-Warshall algorithm here.
The Floyd-Warshall algorithm finds shortest path between all pairs of vertices.
The algorithm is then very simple: go over all pairs (u,v) and find the pair that minimizes dist(u,v)+dist(v,u), since this pair indicates a cycle from u to u with weight dist(u,v)+dist(v,u). If the graph also allows self-loops (an edge (u,u)), you will also need to check them on their own, because those cycles (and only those) are not checked by the algorithm.
pseudo code:
run Floyd-Warshall on the graph
min <- infinity
pair <- None
for each pair of distinct vertices u, v:
    if dist(u,v) + dist(v,u) < min:
        min <- dist(u,v) + dist(v,u)
        pair <- (u,v)
(u, v) <- pair
return path(u,v) + path(v,u)
path(u,v) + path(v,u) is actually the path found from u to v and then from v to u, which is a cycle.
The algorithm runs in O(n^3): Floyd-Warshall is the bottleneck, since the loop over pairs takes only O(n^2) time.
I think correctness in here is trivial, but let me know if you disagree with me and I'll try to explain it better.
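A runnable sketch of this approach, assuming vertices are numbered 0..n-1 and edges are given as `(u, v, w)` triples with positive weights (the function name and the successor matrix `nxt` used for path reconstruction are my additions). Self-loops are checked on their own, as described above.

```python
INF = float("inf")

def min_directed_cycle(n, edges):
    """Minimum-weight directed cycle via Floyd-Warshall, O(n^3).

    n: vertices 0..n-1; edges: list of (u, v, w) with w > 0.
    Returns (weight, cycle_vertices), or (INF, []) if the graph is acyclic.
    """
    dist = [[INF] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]      # next hop on a shortest u->v path
    best, cycle = INF, []
    for u, v, w in edges:
        if u == v:                            # self-loops are checked on their own
            if w < best:
                best, cycle = w, [u]
        elif w < dist[u][v]:
            dist[u][v], nxt[u][v] = w, v
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]

    def path(u, v):                           # vertices u, ..., up to (not incl.) v
        out = []
        while u != v:
            out.append(u)
            u = nxt[u][v]
        return out

    for u in range(n):
        for v in range(u + 1, n):             # dist(u,v)+dist(v,u) is symmetric
            if dist[u][v] + dist[v][u] < best:
                best = dist[u][v] + dist[v][u]
                cycle = path(u, v) + path(v, u)
    return best, cycle

# Hypothetical graph: cycles 0->1->2->0 (6), 0->1->0 (7) and 2->3->2 (2).
edges = [(0, 1, 2), (1, 2, 2), (2, 0, 2), (1, 0, 5), (2, 3, 1), (3, 2, 1)]
print(min_directed_cycle(4, edges))  # (2, [2, 3])
```

Because all weights are positive, a minimum closed walk cannot revisit a vertex (splitting it would give a lighter cycle), so the returned vertex list is a simple cycle.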
Is my algorithm correct?
No. Let me give a counterexample. Imagine you start DFS from u, there are two paths p1 and p2 from u to v and one path p3 from v back to u, and p1 is shorter than p2.
Assume you start by taking path p2 to v and walk back to u by path p3. One cycle is found, but it's clearly not the minimum one. Then you continue exploring u by taking path p1, but since v is already fully explored, the DFS ends without finding the minimum cycle.
"For each vertex, it might point back to one of its ancestors and thus forms a cycle"
I think it might point back to any of its ancestors, which means up to N of them.
Also, how are you going to mark vertices once you come out of their DFS? You may reach them again from another vertex, and that would be another cycle. So this is not an O(n+m) DFS anymore.
So your algorithm is incomplete.
same here
During one DFS, I think each vertex should be either unseen or checked, and for checked vertices you can store the minimum weight of the path back to the starting vertex. Then, if at some other stage you find an edge to that vertex, you don't have to search for this path any more.
This DFS will find the minimum directed cycle containing the first vertex, and it's O(n^2) (O(n+m) if you store the graph as an adjacency list).
So doing it from every vertex is O(n^3) (O(n*(n+m))).
Sorry for my English; I'm not good at the terminology.
I did a similar kind of thing, but I did not use any visited array for the DFS (which was needed for my algorithm to work correctly), and hence I realised that my algorithm had exponential complexity.
Since you are finding all cycles, it is not possible to do it in less than exponential time, because there can be 2^(e-v+1) cycles.
