Find the Root in a given Graph? - data-structures

Given a directed graph, find the root of the graph, i.e. the node with the maximum number of outgoing edges,
so that the graph can be divided into the maximum number of individual subtrees.

Assuming that the graph is given as an adjacency matrix, you can scan each row to count the outgoing edges of the corresponding node, and then scan those counts to find the node with the maximum outdegree. This takes O(n^2) time.
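A minimal sketch of that row scan in Python, assuming the matrix is a list of lists in which a nonzero entry adj[i][j] means an edge from i to j. The function name and the sample matrix are made up for illustration.

```python
def max_outdegree_node(adj):
    """Return (node, outdegree) for the row with the most nonzero entries."""
    best_node, best_degree = -1, -1
    for node, row in enumerate(adj):
        # Outdegree of `node` = number of nonzero cells in its row.
        degree = sum(1 for cell in row if cell)
        if degree > best_degree:
            best_node, best_degree = node, degree
    return best_node, best_degree


# Hypothetical 4-node graph: node 0 points at 1, 2 and 3.
adj = [
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
root, degree = max_outdegree_node(adj)
```

Ties go to the lowest-numbered node here; that is a choice of this sketch, not something the question specifies.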

Scan the nodes one by one and store each node's number of outgoing edges in a sorted list; the node with the largest count is then the root.

Related

DAG - Algorithm to ensure there is a single source and a single sink

I have to ensure that a graph in our application is a DAG with a unique source and a unique sink.
Specifically I have to ensure that for a given start node and end node (both of which are known at the outset) that every node in the graph lies on a path from the start node to the end node.
I already have an implementation of Tarjan's algorithm which I use to identify cycles, and a topological sorting algorithm that I can run once Tarjan's algorithm reports the graph is a DAG.
What is the most efficient way to make sure the graph meets this criterion?
If your graph is represented by an adjacency matrix, then node x is a source node if the xth column of the matrix is all 0s and is a sink node if row x of the matrix is all 0s. You could run two quick passes over the matrix to count the number of rows and columns that are all 0s to determine how many sources and sinks exist and what they are. This takes time O(n^2) and is probably the fastest way to check this.
If your graph is represented by an adjacency list, you can find all sink nodes in time O(n) by checking which nodes have no outgoing edges. You can find all sources by maintaining for each node a boolean value indicating whether it has any incoming edges, which is initially false. You can then iterate across all the edges in the list in time O(n + m), marking all nodes with incoming edges. The nodes that weren't marked as having incoming edges are then sources. This process takes time O(m + n) and has such little overhead that it's probably one of the fastest approaches.
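As a rough illustration of the adjacency-list pass, here is a Python sketch. It assumes the graph is a dict mapping every node to a list of its successors (so sinks map to an empty list); the function name and the example graph are hypothetical.

```python
def sources_and_sinks(graph):
    """graph: {node: [successor, ...]} with every node present as a key."""
    has_incoming = {node: False for node in graph}
    # One pass over all edges marks every node that is the head of an edge.
    for successors in graph.values():
        for succ in successors:
            has_incoming[succ] = True
    sources = [node for node in graph if not has_incoming[node]]
    sinks = [node for node in graph if not graph[node]]
    return sources, sinks


# Hypothetical diamond-shaped DAG: a -> b -> d and a -> c -> d.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
sources, sinks = sources_and_sinks(graph)
```

The graph has a unique source and a unique sink exactly when both returned lists have length one.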
Hope this helps!
A simple breadth-first or depth-first search should satisfy this. Firstly, you can keep a set of the sink nodes that you've seen. Secondly, you can keep a set of the nodes that you've discovered using BFS/DFS. The graph is connected iff there is a single connected component, that is, iff the search from the source reaches every node. Assuming you're using some kind of adjacency list style representation for your graph, the algorithm would look something like this:
create an empty queue
create an empty set to store seen vertices
create an empty set for sink nodes
add source node to queue and to seen vertices
while queue is not empty
    get next vertex from queue
    if num of adjacent nodes == 0
        add vertex to sink node set
    else
        for each node in adjacent nodes
            if node is not in seen vertices
                add node to seen vertices
                add node to queue
return size of sink nodes == 1 && size of seen vertices == total number in graph
This will be linear in the number of vertices and edges in the graph.
Note that as long as you know a source vertex to start from, this will also ensure the property of a single source: any other vertex that is a source won't be discovered by the BFS/DFS, and thus the size of the seen vertices won't be the total number in the graph.
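The search described above could look roughly like this in Python. This is a sketch that assumes the graph has already been confirmed to be a DAG and is stored as a dict mapping every node to a list of successors; the function name is made up.

```python
from collections import deque


def has_single_source_and_sink(graph, source):
    """BFS from `source`; True if it reaches every node and exactly one sink."""
    seen = {source}
    sinks = set()
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if not graph[vertex]:
            sinks.add(vertex)  # no outgoing edges: this vertex is a sink
        for nxt in graph[vertex]:
            if nxt not in seen:
                seen.add(nxt)  # mark on enqueue so nothing is queued twice
                queue.append(nxt)
    return len(sinks) == 1 and len(seen) == len(graph)


# Hypothetical inputs: a diamond, and the same idea with a second source "x".
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
extra_source = {"a": ["b"], "x": ["b"], "b": []}
```

In the second graph the node "x" is never discovered from "a", so the size check fails, which is the single-source argument made above.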
If your algorithm takes as input a weakly connected DAG, assume that there is exactly one node s whose in-degree is zero and exactly one node t whose out-degree is zero, while all other nodes have positive in-degree and out-degree. Then s can reach all other nodes, and all other nodes can reach t. To see this, take any node v and repeatedly step back along an incoming edge. Because the graph is finite and acyclic, this walk must stop at a node with in-degree zero, and the only such node is s, so s reaches v. The symmetric argument along outgoing edges shows that v reaches t. On the other hand, if the DAG is not weakly connected, it definitely does not satisfy the requirements that you want. In sum, you can first compute the weakly connected components of the DAG using BFS/DFS, meanwhile counting the nodes whose in-degree or out-degree is zero. The complexity of this algorithm is O(|V| + |E|).
First, do a DFS on the graph, starting from the source node, which, as you say, is known in advance. If you encounter a back edge[1], then you have a cycle and you can exit with an error. During this traversal you can also identify whether there are nodes not reachable from the source[2] and take the appropriate action.
Once you have determined the graph is a DAG, you can ensure that every node lies on a path from the source to the sink by another DFS, starting from the source, as follows:
bool have_path(source, sink) {
    if source == sink {
        source.flag = true
        return true
    }
    // a dead end that is not the sink has no path to `sink`
    if succ(source) is empty
        return false
    // traverse all successor nodes of `source`
    for dst in succ(source) {
        if not dst.flag and not have_path(dst, sink)
            return false // exit as soon as we find a node with no path to `sink`
    }
    source.flag = true
    return true
}
The procedure have_path sets a boolean flag in each node for which there exists some path from that node to the sink. At the same time, the procedure traverses all nodes reachable from the source, and only those. If the procedure returns true, then all the nodes reachable from the source lie on a path to the sink. Unreachable nodes were already handled in the first phase.
[1] an edge linking a node with a greater DFS number to a node with a lesser DFS number that is not yet completely processed, i.e. is still on the DFS stack
[2] e.g. they would not have an assigned DFS number
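For comparison, here is a hedged Python rendering of the second phase. It assumes the graph is a dict mapping every node to a list of successors and that the first phase has already ruled out cycles. The name all_reach_sink and the reaches dict (standing in for the per-node flag) are inventions of this sketch, and it treats a dead-end node other than the sink as a failure.

```python
def all_reach_sink(graph, source, sink):
    """True if every node reachable from `source` has a path to `sink`."""
    reaches = {}  # node -> True once a path to `sink` is confirmed

    def have_path(node):
        if node == sink:
            reaches[node] = True
            return True
        if not graph[node]:
            return False  # dead end that is not the sink
        for dst in graph[node]:
            # Skip successors already confirmed; recurse into the rest.
            if not reaches.get(dst, False) and not have_path(dst):
                return False
        reaches[node] = True
        return True

    return have_path(source)


# Hypothetical inputs: a valid diamond, and one with a dead-end node "c".
ok_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
dead_end = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
```

Because this is recursive, very deep graphs may hit Python's recursion limit; an explicit stack would avoid that.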

BOOST graph queries

I have the following queries:
I want to know how to create a graph dynamically
How to manage multiple weights on a graph
How to find the distance from a particular node to another in a minimum spanning tree using Kruskal. In Kruskal's, the minimum spanning tree is output as a vector of edges, hence the vertices are not explicitly stored. I do not know how to get the distance from, for example, node 0 to the node furthest from it. I tried getting the vertices using source and target and then storing the vertices in an array. After that, I located node 0 and from there iterated and reverse iterated through the vertices, summing the weights to find the largest distance from node 0. But I feel I'm going about it in the most roundabout way. There must be a function for this, or perhaps a cleverer way of going about it.
Does Kruskal store the edges of the spanning tree in the order of the spanning tree? Or at least, is the first node of the first stored edge the actual first node? How can I get the order of the nodes in the spanning tree in Kruskal's?
Similarly, how can I get the weight of the spanning tree using Prim? The way I did it was to use the predecessor array where predecessors are stored, look up each edge in weightsMap and add it. Is there an easier way? And in Prim's, the distanceMap stores the distance from node 0 to the others in the original graph and not in the spanning tree, right?
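One library-independent way to approach the distance query is to rebuild an adjacency list from the spanning-tree edges and walk the tree once from the start node. The sketch below is in Python rather than Boost and makes no claim about what the Boost Graph Library offers; the (u, v, weight) triples are a stand-in for the edge vector that Kruskal produced, and all names are hypothetical.

```python
from collections import defaultdict


def tree_distances(tree_edges, start):
    """Distance from `start` to every node along the edges of a tree."""
    adjacency = defaultdict(list)
    for u, v, w in tree_edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))
    dist = {start: 0}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt, w in adjacency[node]:
            if nxt not in dist:  # a tree has exactly one path to each node
                dist[nxt] = dist[node] + w
                stack.append(nxt)
    return dist


# Hypothetical spanning-tree edges as (u, v, weight), in no particular order.
mst = [(0, 1, 4), (1, 2, 3), (1, 3, 1), (3, 4, 7)]
dist = tree_distances(mst, 0)
farthest = max(dist, key=dist.get)
total_weight = sum(w for _, _, w in mst)
```

This does not rely on the edges arriving in any particular order, and the total weight of the tree is just the sum over the edge list.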

Min cost edge deletions in tree to separate all leaves in a tree

This is from a larger problem, that I have boiled down to the following problem. Given a weighted tree with positive edge weights, and having k leaves. A leaf is a node which has exactly one adjacent node in the tree. I need to delete some edges from the tree so that the tree splits up into k components, with each component containing exactly one of the leaf nodes from the original tree. In other words, I need to delete edges so that all the leaves in the original tree are separated/disconnected from every other leaf of the original tree.
I need to do this in such a way that the sum of weights (cost) of the deleted edges is minimized. It is trivial to show that k-1 edges need to be deleted. So I need to minimize the sum of weights of these k-1 edges.
What is the optimal way to do this? Any hints would be appreciated. Thanks!
I think a greedy algorithm works here,
i.e. delete the lowest weight edge that generates a new component, and repeat k-1 times.
Note that you have to be careful for a graph such as:
D<-A->B->C
If you delete B->C first, then deleting A->B does not generate a new component because B was not a leaf so does not need to be separated.
In other words, when selecting the lowest weight edge, only consider edges whose removal still leaves at least one of the original leaf nodes on each side.
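One way to experiment with this idea is the mirrored formulation sketched below in Python: instead of deleting light edges first, it keeps heavy edges first and deletes an edge only when keeping it would join two components that each already contain a leaf. This is a swapped-in variant rather than the exact deletion order described above, and it should give the same total cost. The function name, the 0..n-1 node numbering and the (u, v, weight) edge format are assumptions.

```python
def min_leaf_separation_cost(n, edges):
    """edges: list of (u, v, weight) forming a tree on nodes 0..n-1."""
    degree = [0] * n
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    parent = list(range(n))
    has_leaf = [d == 1 for d in degree]  # per union-find root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deleted = []
    # Heaviest edges first: keep an edge unless it would connect two leaves.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        if has_leaf[ru] and has_leaf[rv]:
            deleted.append((u, v, w))
        else:
            parent[ru] = rv
            has_leaf[rv] = has_leaf[ru] or has_leaf[rv]
    return sum(w for _, _, w in deleted), deleted


# The path D - A - B - C from above, numbered D=0, A=1, B=2, C=3,
# with made-up weights.
cost, removed = min_leaf_separation_cost(4, [(1, 0, 5), (1, 2, 1), (2, 3, 2)])
```

On the path example only the two end nodes are leaves, so a single edge is removed, and it is the cheapest one.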

how to find all edge-disjoint equi-cost paths in an undirected graph between a pair of nodes?

Given an undirected graph G = (V, E), the edges in G have non-negative weights.
[1] how to find all the edge-disjoint and equal-cost paths between two nodes s and t ?
[2] how to find all the edge-disjoint paths between two nodes s and t ?
[3] how to find all the vertex-disjoint and equal-cost paths between two nodes s and t ?
[4] how to find all the vertex-disjoint paths between two nodes s and t ?
Any approximation algorithms instead ?
Build a tree where each node is a representation of a path through your undirected graph.
I know, a tree is a graph too, but in this answer I will use the following terms with these meanings:
vertex: is a vertex in your undirected graph. It is NOT a node in my tree.
edge: is an edge in your undirected graph. It is NOT part of my tree.
node: a vertex in my tree, not in your graph.
root: the sole node on top of my tree that has no parent.
leaf: any node in my tree that has no children.
I will not talk about edges in my tree, so there is no word for tree-edges. Every edge in this answer is part of your graph.
Algorithm to build the tree
The root of the tree is the representation of a path that consists only of the vertex s and contains no edge. Let its weight be 0.
Do for every node in that tree:
    Take the end-vertex of the path represented by that node (I call it the current node) and find all edges that lead away from that end-vertex.
    If there are no edges that lead away from this vertex, you have reached a dead end. This path will never lead to vertex t. So mark this node as a leaf and give it an infinite weight.
    Else:
        For each of those edges:
            Add a child node to the current node. Let it be a copy of the current node. Attach the edge to the end of the path and then attach the edge's end-vertex to the path too (so each path follows the pattern vertex-edge-vertex-edge-vertex-...).
            Now traverse up in the tree, back to the root, and check if any of the predecessors has an end-vertex that is identical to the just added end-vertex.
            If you have a match, the newly generated node is the representation of a path that contains a loop. Mark this node as a leaf and give it an infinite weight.
            Else, if there is no loop, just add the newly added edge's weight to the node's weight.
            Now test if the end-vertex of the newly generated node is the vertex t.
            If it is, mark this node as a leaf, since the path represented by this node is a valid path from s to t.
This algorithm always comes to an end in finite time. At the end you have 3 types of leaves in your tree:
nodes that represent dead ends with an infinite weight
nodes that represent loops, also with an infinite weight
nodes that represent a valid path from s to t, with its weight being the sum of the weights of all edges that are part of this path.
Each path represented by a leaf has its own sequence of edges (but might contain the same sequence of vertices), so the leaves with finite weights represent the complete set of paths from s to t that differ from one another in at least one edge. Taking edge-disjoint in that sense, this is the solution of exercise [2].
Now do for all leaves with finite weight:
Write its weight and its path into a list. Sort the list by the weights. Now paths with identical weight are neighbours in the list, and you can find and extract all groups of equal-cost paths in this sorted list. This is the solution of exercise [1].
Now, do for each entry in this list:
add to each path in this list the list of its vertices. After you have done this, you have a table with these columns:
weight
path
1st vertex (is always s)
2nd vertex
3rd vertex
...
Sort this table lexicographically by the vertices and, after all vertices, by the weight (sort by 1st vertex, 2nd vertex, 3rd vertex, ..., weight).
If one row in this sorted table has the same sequence of vertices as the row before, then delete this row.
This is the list of all vertex-disjoint paths between two nodes s and t, and so it is the solution of exercise [4].
And in this list you find all equal-cost paths as neighbours, so you can easily extract all groups of vertex-disjoint and equal-cost paths between two nodes s and t from that list, so here you have the solution of exercise [3].
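The path tree described in this answer corresponds to a depth-first enumeration of simple paths, which can be sketched in Python as follows. The sketch assumes the undirected graph is stored as a dict mapping each vertex to a list of (neighbour, weight) pairs with every edge listed in both directions. It lists distinct simple paths and groups them by cost, as the answer's tree does; it does not try to select a pairwise-disjoint subset. All names and the sample graph are made up.

```python
from collections import defaultdict


def simple_paths(graph, s, t):
    """Yield (cost, vertex_list) for every simple path from s to t."""
    path = [s]
    on_path = {s}

    def extend(vertex, cost):
        if vertex == t:
            yield cost, list(path)
            return
        for nxt, weight in graph.get(vertex, []):
            if nxt in on_path:
                continue  # revisiting a vertex would close a loop
            path.append(nxt)
            on_path.add(nxt)
            yield from extend(nxt, cost + weight)
            path.pop()
            on_path.remove(nxt)

    yield from extend(s, 0)


def group_by_cost(paths):
    """Collect paths of equal total weight under the same key."""
    groups = defaultdict(list)
    for cost, vertices in paths:
        groups[cost].append(vertices)
    return dict(groups)


# Hypothetical graph: two cost-3 routes plus two longer detours over a-b.
graph = {
    "s": [("a", 1), ("b", 2)],
    "a": [("s", 1), ("t", 2), ("b", 5)],
    "b": [("s", 2), ("t", 1), ("a", 5)],
    "t": [("a", 2), ("b", 1)],
}
groups = group_by_cost(simple_paths(graph, "s", "t"))
```

The number of simple paths can grow exponentially with the size of the graph, so this is only practical for small inputs.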

How to find the maximum-weight path between two vertices in a DAG?

In a DAG G, with non negative weighted edges, how do you find the maximum-weight path between two vertices in G?
Thank you guys!
You can solve this in O(n + m) time (where n is the number of nodes and m the number of edges) using a topological sort. Begin by doing a topological sort on the reverse graph, so that you have all the nodes ordered in such a way that no node is visited before all its children are visited.
Now, we're going to label all the nodes with the weight of the highest-weight path starting with that node. This is done based on the following recursive observation:
The weight of the highest-weight path starting from a sink node (any node with no outgoing edges) is zero, since the only path starting from that node is the length-zero path of just that node.
The weight of the highest-weight path starting from any other node is given by the maximum weight of any path formed by following an outgoing edge to a node, then taking the maximum-weight path from that node.
Because we have the nodes reverse-topologically sorted, we can visit all of the nodes in an order that guarantees that if we ever try following an edge and looking up the cost of the heaviest path at the endpoint of that edge, we will have already computed the maximum-weight path starting at that endpoint. This means that once we have the reverse topological sorted order, we can apply the following algorithm to all the nodes in that order:
If the node has no outgoing edges, record the weight of the heaviest path starting at that node (denoted d(u)) as zero.
Otherwise, for each edge (u, v) leaving the current node u, compute l(u, v) + d(v), and set d(u) to be the largest value attained this way.
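Once we've done this step, we can make one last pass over all the nodes and return the highest value of d attained by any node, which is the weight of the heaviest path anywhere in the graph. If you need the heaviest path between two specific vertices s and t, set d(t) to zero, set d of every other node with no outgoing edges to negative infinity, and read off d(s) instead.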
The runtime of this algorithm can be analyzed as follows. Computing a topological sort can be done in O(n + m) time using many different methods. When we then scan over each node and each outgoing edge from each node, we visit each node and edge exactly once. This means that we spend O(n) time on the nodes and O(m) time on the edges. Finally, we spend O(n) time on one final pass over the nodes to find the highest weight path. This gives a grand total of O(n + m) time, which is linear in the size of the input.
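A possible Python sketch of this dynamic program follows. It targets the question's two fixed vertices s and t by pinning d(t) = 0 and giving every other dead end a value of negative infinity. It assumes the DAG is a dict of dicts {u: {v: weight}} and uses graphlib.TopologicalSorter from the standard library (Python 3.9 or later); the function name and the sample graph are hypothetical.

```python
import math
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def max_weight_path(graph, s, t):
    """graph: {u: {v: weight}}. Heaviest s -> t path weight, or -inf if none."""
    # Passing successors where TopologicalSorter expects predecessors makes
    # static_order() emit each node after all of its successors, which is
    # the reverse topological order the algorithm needs.
    order = TopologicalSorter({u: list(vs) for u, vs in graph.items()}).static_order()
    d = {}
    for u in order:
        if u == t:
            d[u] = 0  # the empty path that ends at t
        else:
            # Best of l(u, v) + d(v) over outgoing edges; -inf for dead ends.
            d[u] = max(
                (w + d[v] for v, w in graph.get(u, {}).items()),
                default=-math.inf,
            )
    return d[s]


# Hypothetical DAG: the heaviest s -> t route is s -> b -> a -> t.
dag = {"s": {"a": 2, "b": 1}, "a": {"t": 3}, "b": {"a": 4, "t": 1}, "t": {}}
best = max_weight_path(dag, "s", "t")
```

Recording which successor achieved the maximum at each node would let you rebuild the path itself rather than just its weight.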
A simple brute-force algorithm can be written using recursive functions.
Start with an empty vector (in C++: std::vector) and insert the first node.
Then call your recursive function with the vector as argument. It does the following:
loop over all neighbours and for each neighbour
    copy the vector
    add the neighbour
    call itself recursively
Also pass the total weight as an argument to the recursive function and add the edge weight in every recursive call.
The function should stop whenever it reaches the end node. Then compare the total weight with the maximum weight you have so far (use a global variable), and if the new total weight is bigger, update the maximum weight and store the vector.
The rest is up to you.
