Efficiently find the depth of a graph from every node - algorithm

I have a problem where I am to find the minimum possible depth of a graph which implies that I have to find the maximum depth from each node and return the least of them all. Obviously a simple DFS from each node will do the trick but when things get crazy with extremely large input, then DFS becomes inefficient (time limit). I tried keeping the distance of each leaf to the node being explored in memory to but that didn't help much.
How do I efficiently find the minimum depth of a very large graph. It is worthy of note that the graph in question has no cycle.

To find the graph centre/center of an undirected tree graph you could:
Do a DFS to find a list of all leaf nodes O(n)
Remove all these leaf nodes from the graph and note during the deletion which new nodes become leaf nodes
Repeat step 2 until the graph is completely deleted
The node/nodes deleted in the last stage of the algorithm will be the graph centres of your tree.
Each node is deleted once, so this whole process can be done in O(n).

What you seem to be looking for is the diameter / 2. You could compute the diameter of a tree as below and call it as findDiameter(n, null), for an arbitrary node n of the tree.
public findDiameter(node n, node from) returns <depth, diameter>
// apply recursively on all children
foreach child in (neighbors(n) minus from) do
<de, di> = findDiameter(child, n)
// depth of this node is 1 more than max depth of children
depth = 1 + max(de)
// max diameter either uses this node, then it is 1 + the 2 largest depths
// or it does not use this node, then it's the max depth of the neighbors
diameter = max(max(di), 1 + max(de) + oneButLargest(de))
All you need to do is in the loop over the neighbors keep track of the largest diameter and the 2 largest depths.

Related

Leetcode 543: Diameter of Binary tree top down or bottom-up?

I'm working on diameter of a binary tree and I've copied over the question.
Given the root of a binary tree, return the length of the diameter of the tree.
The diameter of a binary tree is the length of the longest path between any two nodes in a tree. This path may or may not pass through the root. The length of a path between two nodes is represented by the number of edges between them.
The official solution is below.
def diameterOfBinaryTree(self, root: Optional[TreeNode]) -> int:
max_diameter = 0
def getDepth(root):
nonlocal max_diameter
if not root:
return 0
left = getDepth(root.left)
right = getDepth(root.right)
diameter = right + left
max_diameter = max(max_diameter, diameter)
return max(left, right) + 1
getDepth(root)
return max_diameter
The solution has the time complexity as O(N) b/c each node in the tree is only processed once. Is this a bottom-up algorithm b/c the leaf nodes are processed before their parent nodes, and they return information (their longest path, essentially) to their callers? If the algorithm was instead top-down where the diameter of parent nodes are calculated before the diameter of their child nodes are calculated, would the algorithm be O(N^2) since there's repeated diameter calculations?

Shortest path from one source which goes through N edges

In my economics research I am currently dealing with a specific shortest path problem:
Given a directed deterministic dynamic graph with weights on the edges, I need to find the shortest path from one source S, which goes through N edges. The graph can have cycles, the edge weights could be negative, and the path is allowed to go through a vertex or edge more than once.
Is there an efficient algorithm for this problem?
One possibility would be:
First find the lowest edge-weight in the graph.
And then build a priority queue of all paths from the starting edge (initially an empty path from starting point) where all yet-to-be-handled edges are counted as having the lowest weight.
Main loop:
Remove path with lowest weight from the queue.
If path has N edges you are done
Otherwise add all possible one-edge extensions of that path to priority queue
However, that simple algorithm has a flaw - you might re-visit a vertex multiple times as i:th edge (visiting as 2nd and 4th is ok, but 4th in two different paths is the issue), which is inefficient.
The algorithm can be improved by skipping them in the 3rd step above, since the priority queue guarantees that the first partial path to the vertex had the lowest weight-sum to that vertex, and the rest of the path does not depend on how you reached the vertex (since edges and vertices can be duplicated).
The "exactly N edges" constraint makes this problem much easier to solve than if that constraint didn't exist. Essentially you can solve N = 0 (just the start node), use that to solve N = 1 (all the neighbors of the start node), then N = 2 (neighbors of the solution to N = 1, taking the lowest cost path for nodes that are are connected to multiple nodes), etc.
In pseudocode (using {field: val} to mean "a record with a field named field with value val"):
# returns a map from node to cost, where each key represents
# a node reachable from start_node in exactly n steps, and the
# associated value is the total cost of the cheapest path to
# that node
cheapest_path(n, start_node):
i = 0
horizon = new map()
horizon[start_node] = {cost: 0, path: []}
while i <= n:
next_horizon = new map()
for node, entry in key_value_pairs(horizon):
for neighbor in neighbors(node):
this_neighbor_cost = entry.cost + edge_weight(node, neighbor, i)
this_neighbor_path = entry.path + [neighbor]
if next_horizon[neighbor] does not exist or this_neighbor_cost < next_horizon[neighbor].cost:
next_horizon[neighbor] = {cost: this_neighbor_cost, path: this_neighbor_path}
i = i + 1
horizon = next_horizon
return horizon
We take account of dynamic weights using edge_weight(node, neighbor, i), meaning "the cost of going from node to neighbor at time step i.
This is a degenerate version of a single-source shortest-path algorithm like Dijkstra's Algorithm, but it's much simpler because we know we must walk exactly N steps so we don't need to worry about getting stuck in negative-weight cycles, or longer paths with cheaper weights, or anything like that.

Calculate the number of nodes on either side of an edge in a tree

A tree here means an acyclic undirected graph with n nodes and n-1 edges. For each edge in the tree, calculate the number of nodes on either side of it. If on removing the edge, you get two trees having a and b number of nodes, then I want to find those values a and b for all edges in the tree (ideally in O(n) time).
Intuitively I feel a multisource BFS starting from all the "leaf" nodes would yield an answer, but I'm not able to translate it into code.
For extra credit, provide an algorithm that works in any general graph.
Run a depth-first search (or a breadth-first search if you like it more) from any node.
That node will be called the root node, and all edges will be traversed only in the direction from the root node.
For each node, we calculate the number of nodes in its rooted subtree.
When a node is visited for the first time, we set this number to 1.
When the subtree of a child is fully visited, we add the size of its subtree to the parent.
After this, we know the number of nodes on one side of each edge.
The number on the other side is just the total minus the number we found.
(The extra credit version of your question involves finding bridges in the graph on top of this as a non-trivial part, and thus deserves to be asked as a separate question if you are really interested.)
Consider the following tree:
1
/ \
2 3
/ \ | \
5 6 7 8
If we cut the edge between node 1 and 2, The tree will surely split into two tree because there is only one unique edge between two nodes according to tree property:
1
\
3
| \
7 8
and
2
/ \
5 6
So, now a is the number of nodes rooted at 1 and b is number of nodes rooted at 2.
> Run one DFS considering any node as root.
> During DFS, for each node x, calculate nodes[x] and parent[x] where
nodes [x] = k means number of nodes of sub-tree rooted at x is k
parent[x] = y means y is parent of x.
> For any edge between node x and y where parent[x] = y:
a := nodes[root] - nodes[x]
b := nodes[x]
Time and space complexity both O(n).
Note that n=b-a+1. Due to this, you don't need to count both sides of the edge. This greatly simplifies things. A normal recursion over the nodes starting from the root is enough. Since your tree is undirected you don't really have a "root", just pick one of the leaves.
What you want to do is to "go down" the tree until you reach the bottom. Then you count backwards from there. The leaf returns 1, and each recursive step sums the return values for each edge and then increment by 1.
Here is the Java code. Function countEdges() takes in the adjacency list of the tree as an argument also current node and the parent node of the current node(here parent node means that current node was introduced by parent node in this DFS).
Here edge[][] stores the number of nodes on one side of the edge[i][j], obviously the number of nodes on the other side will be equal to (total nodes - edge[i][j]).
int edge[][];
int countEdges(ArrayList<Integer> adj[], int cur, int par) {
// If current nodes is leaf node and is not the node provided by the calling function then return 1
if(adj[cur].size() == 1 && par != 0) return 1;
int count = 1;
// count the number of nodes recursively for each neighbor of current node.
for(int neighbor: adj[cur]) {
if(neighbor == par) continue;
count += countEdges(adj, neighbor, cur);
}
// while returning from recursion assign the result obtained in the edge[][] matrix.
return edge[par][cur] = count;
}
Since we are visiting each node only once in the DFS time complexity should be O(V).

Graph algorithm to calculate node degree

I'm trying to implement the topological-sort algorithm for a DAG. (http://en.wikipedia.org/wiki/Topological_sorting)
First step of this simple algorithm is finding nodes with zero degree, and I cannot find any way to do this without a quadratic algorithm.
My graph implementation is a simple adjacency list and the basic process is to loop through every node and for every node go through each adjacency list so the complexity will be O(|V| * |V|).
The complexity of topological-sort is O(|V| + |E|) so i think there must be a way to calculate the degree for all nodes in a linear way.
You can maintain the indegree of all vertices while removing nodes from the graph and maintain a linked list of zero indegree nodes:
indeg[x] = indegree of node x (compute this by going through the adjacency lists)
zero = [ x in nodes | indeg[x] = 0 ]
result = []
while zero != []:
x = zero.pop()
result.push(x)
for y in adj(x):
indeg[y]--
if indeg[y] = 0:
zero.push(y)
That said, topological sort using DFS is conceptionally much simpler, IMHO:
result = []
visited = {}
dfs(x):
if x in visited: return
visited.insert(x)
for y in adj(x):
dfs(y)
result.push(x)
for x in V: dfs(x)
reverse(result)
You can achieve it in o(|v|+|e|). Follow below given steps:
Create two lists inDegree, outDegree which maintain count for in coming and out going edges for each node, initialize it to 0.
Now traverse through given adjacency list, for edge (u,v) in graph g, increase count of outdegree for u, and increment count of indegree for v.
You can traverse through adjacency list in o(v +e) , and will have indegree and outdegree for each u in o(|v|+|e|).
The Complexity that you mentioned for visiting adjacency nodes is not quite correct (O(n2)), because if you think carefully, you will notice that this is more like a BFS search. So, you visit each node and each edge only once. Therefore, the complexity is O(m+n). Where, n is the number of nodes and m is the edge count.
You can also use DFS for topological sorting. You won't need additional pass to calculate in-degree after processing each node.
http://www.geeksforgeeks.org/topological-sorting/

the center of a tree with weighted edges

i am trying to give a solution to my university assignement..given a connected tree T=(V,E). every edge e has a specific positive cost c..d(v,w) is the distance between node v and w..I'm asked to give the pseudocode of an algorithm that finds the center of such a tree(the node that minimizes the maximum distance to every other node)..
My solution consists first of all in finding the first two taller branches of the tree..then the center will be in the taller branch in a distance of H/2 from the root(H is the difference between the heights of the two taller branches)..the pseudocode is:
Algorithm solution(Node root, int height, List path)
root: the root of the tree
height : the height measured for every branch. Initially height=0
path : the path from the root to a leaf. Initially path={root}
Result : the center of the tree
if root==null than
return "error message"
endif
/*a list that will contain an element <h,path> for every
leaf of the tree. h is the distanze of the leaf from the root
and path is the path*/
List L = empty
if isLeaf(root) than
L = L union {<height,path>}
endif
foreach child c of root do
solution(c,height+d(root,c),path UNION {c})
endfor
/*for every leaf in the tree I have stored in L an element containing
the distance from the root and the relative path. Now I'm going to pick
the two most taller branches of the tree*/
Array array = sort(L)
<h1,path1> = array[0]//corresponding to the tallest branch
<h2,path2> = array[1]//corresponding to the next tallest branch
H = h1 - h2;
/*The center will be the node c in path1 with d(root,c)=H/2. If such a
node does not exist we can choose the node with te distance from the root
closer to H/2 */
int accumulator = 0
for each element a in path1 do
if d(root,a)>H/2 than
return MIN([d(root,a)-H/2],[H/2-d(root,a.parent)])
endif
end for
end Algorithm
is this a correct solution??is there an alternative and more efficient one??
Thank you...
Your idea is correct. You can arbitrarily pick any vertex to be a root of the tree, and then traverse the tree in 'post-order'. Since the weights are always positive, you can always pick the two longest 'branches' and update the answer in O(1) for each node.
Just remember that you are looking for the 'global' longest path (i.e. diameter of the graph), rather than the 'local' longest paths that go through the roots of the subtrees.
You can find more information if you search for "(weighted) Jordan Center (in a tree)". The optimal algorithm is O(N) for trees, so asymptotically your solution is optimal since you only use a single DFS which is O(|V| + |E|) == O(|V|) for a tree.

Resources