Spanning Tree VS. Spanning Forest - algorithm

What's the difference between a Spanning Tree and a Spanning Forest in graphs, conceptually.
Also, is it possible to construct a Spanning Forest through DFS or BFS traversals? Why? How?
I understand the Spanning Tree, but I couldn't find any clear explanations about spanning forests. Even Wikipedia (https://en.wikipedia.org/wiki/Spanning_tree), doesn't give a clear definition about it.
My book (Data Structures & Algorithms, Wiley - sixth edition) also has no definition for spanning forests.
I was wondering, if we have a graph with for example three connected components in it, is it possible to construct a spanning forest by DFS/BFS traversals?

When there is only one connected component in your graph, the spanning tree = spanning forest.
But when there are multiple connected components in your graph. For example in following picture we have 3 connected components.:
So for each component, we will have a spanning tree, and all 3 spanning trees will constitute spanning forest
I was wondering, if we have a graph with for example three connected
components in it, is it possible to construct a spanning forest by
DFS/BFS traversals?
Yes it is possible. When there is only 1 connected component, your BFS or DFS will terminate visiting all the vertices and you will have a spanning tree (which in this case is equal to spanning forest).
But when you have more than 1 connected component, like in the picture, the only thing you have to do is start another BFS or DFS from an unvisited vertex. Your algorithm terminates when there is no unvisited vertex left and each BFS or DFS traversal will yield a spanning tree.

Non-trivial spanning forests can be constructed even for complete graphs via the following algorithm:
preconditions
all vertices are unmarked
the edges are enumerated from 1 to m
edge processing
if both of its adjacent vertices are marked, skip it because then that edge would either merge trees of the forest or creates a cycle in one of its trees
else mark its unmarked adjacent vertices
algorithm
process the edges in the order of their enumeration
explanaton:
while it is feasible in the construction of spanning trees to add edges that "bridge" connected components, those edges are not added in the above algorithm.
interpretation:
if the edges are enumerated according to ascending length, the edges of the resulting spanning forest will be a subsets of the MST and the trees of the forest will resemble "basins" i.e. the length of edges is smallest for the one that created the connected component and increases with every edge that is attached in later steps.
In that case the properties of spanning forest may provide insight into the structural properties of the original graph and/or be utilized in algorithms.

Related

Relation between Dijkstra and MST

This question came to my mind when I see this question. For simplicity, we can limit our discussion to undirected, weighted, connected graphs. It is clear that Dijkstra cannot guarantee to produce a MST if we choose an arbitrary node from a graph as the source. However, is it guaranteed that there must exist one node in an undirected, weighted, connected graph, which will produce a MST for the graph if we choose it as the source and apply Dijkstra's algorithm? Maybe you can give a proof or a counterexample. Thanks!
However, is it guaranteed that there must exist one node in an
undirected, weighted, connected graph, which will produce a MST for
the graph if we choose it as the source and apply Dijkstra's
algorithm?
Nope, Dijkstra's algorithm minimizes the path weight from a single node to all other nodes. A minimum spanning tree minimizes the sum of the weights needed to connect all nodes together. There's no reason to expect that those disparate requirements will result in identical solutions.
Consider a complete graph where the sum of the weight of any two edges exceeds the weight of any single edge. That forces Dijkstra to always select the direct connection as the shortest path between two nodes. Then, if the lowest weight edges in the graph don't all originate from a single node, the minimum spanning tree won't be the same as any of the trees that Dijkstra will produce.
Here's an example:
The minimum spanning tree consists of the three edges with weight 3 (total weight 9). The trees returned by Dijkstra's algorithm will be whichever three edges connect directly to the source node (total weight 10 or 11).

Prim's Algorithm and disconnected graph

Consider we are trying to apply prim's algorithm on a disconnected graph. Consider that this disconnected graph has vertices a,b,c and d. Where this vertex d is disconnected. Now I need to check my understanding, if we apply prim's algorithm on this disconnected graph, the algorithm will not reach the vertex d and therefore will return a MST with vertices a,b and c only. So, is this assumption right ?
While it is true that the actual definition of MST applies to connected graphs, you could also say that, for a disconnected graph, a minimum spanning forest is composed of a minimum spanning tree for each connected component.
The problem adds an initial step to isolate each connected sub-graph, and apply Prim's algorithm to each of these.
There's no point in applying a Minimum Spanning Tree on a disconnected graph because by definition it is disconnected and it will not span all the vertices. By definition it is relevant only to connected graphs:
A minimum spanning tree (MST) or minimum weight spanning tree is a
subset of the edges of a connected, edge-weighted undirected graph
that connects all the vertices together, without any cycles and with
the minimum possible total edge weight.

Trying to create an algo that creates a spanning tree with least number of edges removed from a unweighed graph

I am trying to create an algo that will get a spanning tree with root node such that, the spanning tree will have least number of edges removed from Original Graph G.
Thanks in advance
For any connected graph, the spanning tree always contains n-1 edges where n is the number of nodes in the graph. So you will have to remove all the remaining edges. (If I have understood your question correctly)
Even for disconnected graphs, the number of edges in a spanning tree is defined by the number of components and number of nodes in each component.

Determining whether or not a directed or undirected graph is a tree

I would like to know of a fast algorithm to determine if a directed or undirected graph is a tree.
This post seems to deal with it, but it is not very clear; according to this link, if the graph is acyclic, then it is a tree. But if you consider the directed and undirected graphs below: in my opinion, only graphs 1 and 4 are trees. I suppose 3 is neither cyclic, nor a tree.
What needs to be checked to see if a directed or undirected graph is a tree or not, in an efficient way? And taking it one step ahead: if a tree exists then is it a binary tree or not?
For a directed graph:
Find the vertex with no incoming edges (if there is more than one or no such vertex, fail).
Do a breadth-first or depth-first search from that vertex. If you encounter an already visited vertex, it's not a tree.
If you're done and there are unexplored vertices, it's not a tree - the graph is not connected.
Otherwise, it's a tree.
To check for a binary tree, additionally check if each vertex has at most 2 outgoing edges.
For an undirected graph:
Check for a cycle with a simple depth-first search (starting from any vertex) - "If an unexplored edge leads to a node visited before, then the graph contains a cycle." If there's a cycle, it's not a tree.
If the above process leaves some vertices unexplored, it's not a tree, because it's not connected.
Otherwise, it's a tree.
To check for a binary tree, if the graph has more than one vertex, additionally check that all vertices have 1-3 edges (1 to the parent and 2 to the children).
Checking for the root, i.e. whether one vertex contains 1-2 edges, is not necessary as there has to be vertices with 1-2 edges in an acyclic connected undirected graph.
Note that identifying the root is not generically possible (it may be possible in special cases) as, in many undirected graphs, more than one of the nodes can be made the root if we were to make it a binary tree.
If an undirected given graph is a tree:
the graph is connected
the number of edges equals the number of nodes - 1.
An undirected graph is a tree when the following two conditions are true:
The graph is a connected graph.
The graph does not have a cycle.
A directed graph is a tree when the following three conditions are true:
The graph is a connected graph.
The graph does not have a cycle.
Each node except root should have exactly one parent.

how can a breadth-first-search-tree include a cross-edge?

Well, I know that a breadth-first-search-tree of an undirected graph can't have a back edge. But I'm wondering how can it even have a cross-edge? I'm not able to image a spanning tree of a graph G constructed out of OFS, that contains a cross-edge.
The process of building a spanning tree using BFS over an undirected graph would generate the following types of edges:
Tree edges
Cross edges (connecting vertices on different branches)
A simple example: Imagine a triangle (a tri-vertice clique) - start a BFS from any node, and you'll reach the other two on the first step. You're left with an edge between them that does not belong to the spanning tree.
What about back-edges (connecting an ancestor with an non-immediate child) ? Well, as you point out, in BFS over an undirected graph you won't have them, since you would have used that edge when first reaching the ancestor.
In fact, you can make a stronger statement - all non-tree edges should be between vertices as the same level, or adjacent ones (you can't use that edge for the tree if the vertice on the other side is a sibling, like in the triangle case, or a sibling of the parent, that was not explored yet). Either way, it's falls under the definition of a cross-edge.
I had this same question...and the answer is that there are no cross edges in the BFS, but that the BFS tree itself encodes all the edges that would have been back-edges and forward-edges in the DFS tree as tree edges in the BFS tree, such that the remaining edges which the undirected graph has, but which are still not present in the BFS, are cross edges--and nothing else.
So the Boolean difference of the set of edges in the undirected graph and the edges in the BFS tree are all cross edges.
...As opposed to the DFS, where the set of missing edges may also include "Back Edges," "Forward Edges," and "Cross Edges."
I don't know why it is in the algorithmic parlance to say that both "tree edges and cross edges are in a BFS"
...I think it is just a short hand, and that in a math class, the professor would have written the relationship in set notation and unions (which I can't do on this stack exchange).

Resources