Distance of every possible path between all nodes in minimum spanning tree - algorithm

I want to compute the distance between every pair of nodes. There are a few questions about this, but none solves my problem.
My approach is to implement a recursive DFS and store the result of each path while backtracking, but the problem is that the running time is high: I go through each path N times (N being the number of edges).
As you can see in my code, I run a DFS with every possible node as the root. I don't know how to reuse the already-computed paths so that a single DFS pass gives the answer.
I think it could be done in O(n), since the graph is a minimum spanning tree and there is only one path between any pair of nodes.
My code:
typedef long long ll;

vector<int> visited(50001);
map<pair<ll, ll>, ll> mymap;   // (u, v) -> distance from u to v
map<ll, vector<ll>> graph;     // adjacency list
ll ans;                        // running distance from `start`

void calc(ll start, ll node)
{
    visited[node] = 1;
    vector<ll>::iterator it;
    for (it = graph[node].begin(); it != graph[node].end(); ++it)
    {
        if (!visited[*it])
        {
            ans += mymap[make_pair(node, *it)];  // extend the current path
            mymap[make_pair(start, *it)] = ans;  // record distance start -> *it
            calc(start, *it);
            ans -= mymap[make_pair(node, *it)];  // backtrack
        }
    }
}
In main:
for (i = 1; i <= n; i++)
{
    fill(visited.begin(), visited.end(), 0);
    ans = 0;
    calc(i, i);
}

I can think of a solution of complexity O(n log n) using divide and conquer. Let me share it.
Let's choose an edge e of distance d connecting nodes a and b, and cut it. Now we have two trees, rooted at a and b. Let's assume:
number of nodes in tree a = na
sum of distance between every node in tree a is = ca
sum of distance of every node to root in tree a is = ra
number of nodes in tree b = nb
sum of distance between every node in tree b is = cb
sum of distance of every node to root in tree b is = rb
So the sum of distances between every pair of nodes in the original tree is:
ca + cb + (nb * ra + na * nb * d + na * rb)
Now we can calculate the sum of distances inside tree a or b using the same approach. One thing to be careful about: we have to choose the edge e so that the sizes of the two resulting trees are as balanced as possible, which keeps the recursion depth around O(log n); cutting near the tree's centroid achieves this. Note also that the cross term na * nb * d is exactly edge e's own contribution to the total, which leads to the one-pass sketch below.
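Summing that per-edge contribution over all edges yields the total of all pairwise distances in a single DFS. A minimal C++ sketch of this view (the container names are illustrative, not from the question's code):

#include <bits/stdc++.h>
using namespace std;

// Sum of distances over all unordered node pairs in a tree: each edge
// (u, v) of length d contributes d * na * nb, where na and nb are the
// node counts on its two sides (the cross term of the formula above).
int n;                                     // number of nodes
vector<vector<pair<int, long long>>> adj;  // adj[u] = {(v, d), ...}
long long total = 0;                       // running answer

int dfs(int u, int parent) {
    int size = 1;                          // nodes in u's subtree
    for (auto [v, d] : adj[u]) {
        if (v == parent) continue;
        int sub = dfs(v, u);               // nb for the edge (u, v)
        total += d * (long long)sub * (n - sub);  // d * na * nb
        size += sub;
    }
    return size;
}
// Usage: set n, fill adj (1-indexed), then call dfs(1, 0); total holds the sum.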

Related

How to count all reachable nodes in a directed graph?

There is a directed graph (which might contain cycles), and each node has a value on it. How could we get the sum of reachable values for each node? For example, in the following graph:
the reachable sum for node 1 is: 2 + 3 + 4 + 5 + 6 + 7 = 27
the reachable sum for node 2 is: 4 + 5 + 6 + 7 = 22
.....
My solution: to get the sum for all nodes, I think the time complexity is O(n + m), where n is the number of nodes and m is the number of edges. DFS should be used: for each node, recursively visit its successors, and save the sum for a node once it has been computed so that we never calculate it again. A set has to be kept per node to avoid endless recursion caused by cycles.
Does it work? I don't think it is elegant enough, especially since so many sets have to be created. Is there any better solution? Thanks.
This can be done by first finding the strongly connected components (SCCs), which takes O(|V|+|E|). Then build a new graph, G', for the SCCs (each SCC is a node in this graph), where each node's value is the sum of the values of the nodes in that SCC.
Formally,
G' = (V', E')
where V' = { U1, U2, ..., Uk | U_i is an SCC of the graph G }
E' = { (U_i, U_j) | there are nodes u_i in U_i and u_j in U_j such that (u_i, u_j) is in E }
Then this graph (G') is a DAG, and the question becomes simpler: it seems to be a variant of the question linked in the comments.
EDIT: the rest of the original answer (the DP solution kept below, which appeared struck out on the original page) is a mistake from this point on; editing with a new answer. Sorry about that.
Now, a DFS can be used from each node to find the sum of values (resetting the visited flags before each starting node):
DFS(v):
    if v.visited:
        return 0
    v.visited = true
    if v is leaf:
        return v.value
    return v.value + sum([DFS(u) for u in v.children])
This is O(V^2 + VE) worst case, but since the new graph has fewer nodes, V and E are now significantly lower.
Some local optimizations can be made; for example, if a node has a single child, you can reuse the child's pre-calculated value and not apply DFS on it again, since there is no fear of counting twice in this case.
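For concreteness, here is a minimal C++ sketch of this per-source DFS on the condensation; it assumes G' (here dag) and the per-SCC value sums (here val) are already built, and both names are illustrative:

#include <bits/stdc++.h>
using namespace std;

// dag[u] lists the SCC-nodes reachable from u in one step; val[u] is the
// sum of the original node values inside SCC u (both assumed prebuilt).
vector<vector<int>> dag;
vector<long long> val;
vector<char> seen;

long long dfs(int v) {
    if (seen[v]) return 0;           // already counted for this source
    seen[v] = 1;
    long long sum = val[v];          // count v itself
    for (int u : dag[v]) sum += dfs(u);
    return sum;
}

// One DFS per source, resetting the marks in between: O(V'(V' + E')).
vector<long long> reachableSums() {
    int k = dag.size();
    vector<long long> d(k);
    for (int s = 0; s < k; ++s) {
        seen.assign(k, 0);
        d[s] = dfs(s);
    }
    return d;
}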
The (struck-out) DP solution for this problem (on the DAG G') was:
D[i] = value(i) + sum { D[j] | (i,j) is an edge in G' }
This can be calculated in linear time (after a topological sort of the DAG), but it is the mistaken part: when two paths from i reach the same node, that node's value is counted more than once.
Pseudo code:
Find SCCs
Build G'
Topologically sort G'
Find D[i] for each node in G'
Assign D[U_i] to every node u_i in U_i, for each U_i
Total time is O(|V|+|E|).
You can use the DFS or BFS algorithm to solve your problem; both have complexity O(V + E).
You don't have to count all values for all nodes, and you don't need recursion.
Just do something like this.
Typically DFS looks like this:
unmark all vertices
choose some starting vertex x
mark x
list L = x
while L nonempty
    choose some vertex v from front of list
    visit v
    for each unmarked neighbor w
        mark w
        add it to end of list
In your case you have to add a few lines:
unmark all vertices
choose some starting vertex x
mark x
list L = x
float sum = 0
while L nonempty
    choose some vertex v from front of list
    visit v
    sum += v->value
    for each unmarked neighbor w
        mark w
        add it to end of list
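For concreteness, here is the augmented traversal as runnable C++ (the function name and containers are illustrative); it returns the sum of the values of every node reachable from the start vertex x:

#include <bits/stdc++.h>
using namespace std;

// Runnable version of the augmented traversal above: BFS from x,
// accumulating the value of every node reachable from x.
long long reachableSum(int x, const vector<vector<int>>& adj,
                       const vector<long long>& value) {
    vector<char> marked(adj.size(), 0);
    queue<int> q;
    marked[x] = 1;
    q.push(x);
    long long sum = 0;
    while (!q.empty()) {
        int v = q.front(); q.pop();
        sum += value[v];                      // "visit v"
        for (int w : adj[v])
            if (!marked[w]) { marked[w] = 1; q.push(w); }
    }
    return sum;
}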

Calculate the number of nodes on either side of an edge in a tree

A tree here means an acyclic undirected graph with n nodes and n-1 edges. For each edge in the tree, calculate the number of nodes on either side of it. If on removing the edge, you get two trees having a and b number of nodes, then I want to find those values a and b for all edges in the tree (ideally in O(n) time).
Intuitively I feel a multisource BFS starting from all the "leaf" nodes would yield an answer, but I'm not able to translate it into code.
For extra credit, provide an algorithm that works in any general graph.
Run a depth-first search (or a breadth-first search if you like it more) from any node.
That node will be called the root node, and all edges will be traversed only in the direction from the root node.
For each node, we calculate the number of nodes in its rooted subtree.
When a node is visited for the first time, we set this number to 1.
When the subtree of a child is fully visited, we add the size of its subtree to the parent.
After this, we know the number of nodes on one side of each edge.
The number on the other side is just the total minus the number we found.
(The extra credit version of your question involves finding bridges in the graph on top of this as a non-trivial part, and thus deserves to be asked as a separate question if you are really interested.)
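A minimal C++ sketch of the rooted-subtree counting described above (names are illustrative): after one DFS, the edge from a node c to its parent has subtree[c] nodes on one side and n - subtree[c] on the other.

#include <bits/stdc++.h>
using namespace std;

int n;                       // total number of nodes
vector<vector<int>> adj;     // tree adjacency list, nodes 0..n-1
vector<int> subtree;         // subtree[v] = nodes in v's rooted subtree

// Post-order DFS: a node's subtree size is 1 plus its children's sizes.
int countSubtree(int v, int parent) {
    subtree[v] = 1;
    for (int w : adj[v])
        if (w != parent)
            subtree[v] += countSubtree(w, v);
    return subtree[v];
}
// Usage: subtree.assign(n, 0); countSubtree(root, -1);
// then the edge (parent(c), c) splits the tree into subtree[c]
// and n - subtree[c] nodes.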
Consider the following tree:
      1
     / \
    2   3
   / \  | \
  5   6 7  8
If we cut the edge between nodes 1 and 2, the tree will surely split into two trees, because in a tree there is exactly one path between any two nodes:
  1
   \
    3
    | \
    7  8
and
  2
 / \
5   6
So now a is the number of nodes in the tree rooted at 1, and b is the number of nodes in the tree rooted at 2.
> Run one DFS considering any node as root.
> During DFS, for each node x, calculate nodes[x] and parent[x], where
    nodes[x] = k means the sub-tree rooted at x has k nodes,
    parent[x] = y means y is the parent of x.
> For any edge between nodes x and y, where parent[x] = y:
    a := nodes[root] - nodes[x]
    b := nodes[x]
Time and space complexity are both O(n).
Note that n = a + b. Because of this, you don't need to count both sides of the edge, which greatly simplifies things. A normal recursion over the nodes starting from the root is enough. Since your tree is undirected you don't really have a "root"; just pick one of the leaves.
What you want to do is to "go down" the tree until you reach the bottom, then count backwards from there: a leaf returns 1, and each recursive step sums the return values over its child edges and then adds 1.
Here is the Java code. The function countEdges() takes the adjacency list of the tree, the current node, and the parent of the current node (here the parent is the node from which the current node was reached in this DFS).
edge[][] stores the number of nodes on one side of the edge (i, j); the number of nodes on the other side is then (total nodes - edge[i][j]).
int edge[][];

int countEdges(ArrayList<Integer> adj[], int cur, int par) {
    // If the current node is a leaf (and not the root, for which par == 0),
    // there is exactly one node on its side of the edge.
    if (adj[cur].size() == 1 && par != 0)
        return edge[par][cur] = 1;
    int count = 1;
    // Count the number of nodes recursively for each neighbor of the current node.
    for (int neighbor : adj[cur]) {
        if (neighbor == par) continue;
        count += countEdges(adj, neighbor, cur);
    }
    // While returning from the recursion, record the result in the edge[][] matrix.
    return edge[par][cur] = count;
}
Since we visit each node only once in the DFS, the time complexity is O(V).

Algorithm - Finding the number of pairs with diameter distance in a tree?

I have a non-rooted bidirectional unweighted non-binary tree. I know how to find the diameter of the tree, the greatest distance between any pair of points in the tree, but I'm interested in finding the number of pairs with that max distance. Is there an algorithm to find the number of pairs with diameter distance in better than O(V^2) time, where V is the number of nodes?
Thank you!
Yes, there's a linear-time algorithm that operates bottom-up and resembles the algorithm for just finding the diameter. Here's the signature in Java-ish pseudocode; I'll leave the algorithm itself as an exercise.
class Node {
    Collection<Node> children;
}

class Result {
    int height;          // height of the tree
    int num_deep_nodes;  // number of nodes whose depth equals the height
    int diameter;        // length of the longest path inside the tree
    int num_long_paths;  // number of pairs of nodes at distance |diameter|
}

Result computeNumberOfLongPaths(Node root); // recursive
Yes, there is an algorithm with O(V+E) time. It is simply a modified version of finding the diameter.
As we know, we can find the diameter using two calls of BFS: make the first call from any node and remember the last node discovered, u; then run a second call BFS(u) and remember the last node discovered, say v. The distance between u and v gives us the diameter.
Coming to the number of pairs at that max distance:
1. Before invoking the first BFS, initialize an array distance of length |V|, with distance[s] = 0, where s is the starting vertex of the first BFS call (any node).
2. In the BFS, modify the while loop as:
while (Q is not empty)
{
    e = dequeue(Q)
    for all vertices w adjacent to e
    {
        if (w is not visited)
        {
            enqueue(w)
            mark w as visited
            distance[w] = distance[e] + 1
            parent[w] = e
        }
    }
}
3. As said, the last node discovered by the first BFS is u. Now count the number of vertices that are at the same level as vertex u. mark is an array of length n with all values initialized to 0; 0 means the vertex has not been counted yet.
n1 = 0
for i = 1 to number of vertices
{
    if (distance[i] == distance[u] && mark[i] == 0)
    {
        n1++
        mark[i] = 1  /* vertex counted */
    }
}
n1 gives the number of vertices at the same level as vertex u. All vertices with mark[i] = 1 are now marked and will not be counted again.
4. Similarly, before performing the second BFS, from u this time, initialize another array distance2 of length |V| with distance2[u] = 0.
5. Run BFS(u) and again remember the last node discovered, say v.
6. Repeat step 3, this time on the distance2 array, with a different counter n2 = 0 and the condition being:
if (distance2[i] == distance2[v] && mark[i] == 0)
    n2++
else if (distance2[i] == distance2[v] && mark[i] == 1)
    set_common = 1
7. set_common is a global flag that is set when there is a set of vertices such that the path between any two of them is a diameter, and the first BFS marked at least one of those vertices but not all of them; that is why mark[i] == 1 can occur here. If the first BFS had already marked all such vertices, then n2 would be 0 and set_common would not be set (nor would it be needed), but that situation is handled the same way as below.
In that case the number of pairs giving the diameter is:
C(n1+n2, 2) - X = (n1+n2)! / (2! * (n1+n2-2)!) - X
I will elaborate on what X is below. Otherwise the number of pairs is n1 * n2, which is the case when two disjoint sets of vertices give the diameter.
So the condition used is:
if (n2 == 0 || set_common == 1)
    number_of_pairs = C(n1+n2, 2) - X
else
    number_of_pairs = n1 * n2
Now about X. It can happen that some of the marked vertices have a common parent; in that case we must not count those combinations. So before using the above condition, it is advised to run the following:
X = 0  /* initialize */
for (i = 1 to number of vertices)
{
    s = 0, p = -1
    if (mark[i] == 0)
        continue
    else
    {
        s++
        if (p == -1)
            p = parent[i]
        while ((i+1) <= number_of_vertices && p == parent[i+1])
        {
            s++
            i++
        }
    }
    if (s > 1)
        X = X + sC2  /* s choose 2 */
}
Proof of correctness
It is very easy: since BFS traverses a tree level by level, n1 gives the number of vertices at the level of u, n2 gives the number of vertices at the level of v, and the distance between u and v equals the diameter. Therefore the distance between any vertex on u's level and any vertex on v's level is the diameter.
The time taken is 2 * O(|V|) + 2 * (time of one BFS) = O(V + E).
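A compact C++ sketch of steps 1-5 (the two BFS passes and the raw level counts n1 and n2); the mark array and the X correction above are omitted for brevity, and all names are illustrative:

#include <bits/stdc++.h>
using namespace std;

// BFS from s; fills dist[] with levels and returns the last node dequeued.
int bfs(int s, const vector<vector<int>>& adj, vector<int>& dist) {
    dist.assign(adj.size(), -1);
    queue<int> q;
    dist[s] = 0;
    q.push(s);
    int last = s;
    while (!q.empty()) {
        int e = q.front(); q.pop();
        last = e;
        for (int w : adj[e])
            if (dist[w] == -1) { dist[w] = dist[e] + 1; q.push(w); }
    }
    return last;
}

// Steps 1-5: two BFS passes give the diameter endpoints u and v; n1 and n2
// count the vertices at the same level as u and v, respectively.
pair<int, int> levelCounts(const vector<vector<int>>& adj) {
    vector<int> dist, dist2;
    int u = bfs(0, adj, dist);        // first pass from an arbitrary node
    int v = bfs(u, adj, dist2);       // second pass from u; dist2[v] = diameter
    int n1 = 0, n2 = 0;
    for (size_t i = 0; i < adj.size(); ++i) {
        if (dist[i] == dist[u])   ++n1;
        if (dist2[i] == dist2[v]) ++n2;
    }
    return {n1, n2};                  // plug into the pair-count formula above
}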

Finding cheapest path on a graph, cost determined by max-weight of used nodes

I have a graph G with a starting node S and an ending node E. What's special about this graph is that instead of the edges, it's the nodes that have a cost. I want to find a path (a set of nodes W) between S and E such that max(W) is minimized. (In reality I am not interested in W itself, just in max(W).) Equivalently: if I remove all nodes with cost larger than k, what's the smallest k so that S and E are still connected?
I have one idea, but I want to know whether it is correct and optimal. Here's my current pseudocode:
L := priority queue of nodes (minimum on top)
L.add(S, S.weight)
while (!L.empty) {
    X = L.poll()
    if (X == E) return X.weight
    mark X visited
    foreach (unvisited neighbour N of X, N not in L) {
        N.weight = max(N.weight, X.weight)
        L.add(N, N.weight)
    }
}
I believe it is worst case O(n log n) where n is the number of nodes.
Here are some details for my specific problem (percolation), but I am also interested in algorithms for this problem in general. Node weights are uniformly distributed at random between 0 and a given max value. My nodes are Poisson distributed on the R²-plane, and an edge between two nodes exists if the distance between them is less than a given constant. There are potentially very many nodes, so they are generated on the fly (hidden in the foreach in the pseudocode). My starting node is at (0,0) and the ending node is any node at distance greater than R from (0,0).
EDIT: The weights on the nodes are floating point numbers.
Starting from an empty graph, you can insert vertices (and their edges to existing neighbours) one at a time in increasing weight order, using a fast union/find data structure to maintain the set of connected components. This is just like the Kruskal algorithm for building minimum spanning trees, but instead of adding edges one at a time, for each vertex v that you process, you would combine the components of all of v's neighbours.
You also keep track of which two components contain the start and end vertices. (Initially comp(S) = S and comp(E) = E; before each union operation, the two input components X and Y can be checked to see whether either one is comp(S) or comp(E), and the latter updated accordingly in O(1) time.) As soon as these two components become a single component (i.e. comp(S) = comp(E)), you stop. The vertex just added is the maximum-weight vertex on the path between S and E that minimises the maximum weight of any vertex.
[EDIT: Added time complexity info]
If the graph contains n vertices and m edges, it will take O(n log n) time to sort the vertices by weight. There will be at most m union operations (since every edge could be used to combine two components). If a simple disjoint set data structure is used, all of these union operations could be done in O(m + n log n) time, and this would become the overall time complexity; if path compression is also used, this drops to O(m A(n)), where A(n) is the incredibly slowly growing inverse Ackermann function, but the overall time complexity remains unchanged from before because the initial sorting dominates.
Assuming integer weights, Pham Trung's binary search approach will take O((n + m) log maxW) time, where maxW is the heaviest vertex in the graph. On sparse graphs (where m = O(n)), this becomes O(n log maxW), while mine becomes O(n log n), so here his algorithm will beat mine if log(maxW) << log(n) (i.e. if all weights are very small). If his algorithm is called on a graph with large weights but only a small number of distinct weights, then one possible optimisation would be to sort the weights in O(n log n) time and then replace them all with their ranks in the sorted order.
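A minimal C++ sketch of this Kruskal-style vertex insertion, assuming an adjacency list and per-vertex weights (all names are illustrative); it returns the smallest k such that S and E are connected using only vertices of weight <= k:

#include <bits/stdc++.h>
using namespace std;

// Disjoint-set (union/find) with path compression.
struct DSU {
    vector<int> p;
    DSU(int n) : p(n) { iota(p.begin(), p.end(), 0); }
    int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
    void unite(int a, int b) { p[find(a)] = find(b); }
};

// Insert vertices in increasing weight order, unioning each with its
// already-inserted neighbours; stop when S and E share a component.
double minimaxWeight(const vector<vector<int>>& adj,
                     const vector<double>& weight, int S, int E) {
    int n = adj.size();
    if (S == E) return weight[S];
    vector<int> order(n);
    iota(order.begin(), order.end(), 0);
    sort(order.begin(), order.end(),
         [&](int a, int b) { return weight[a] < weight[b]; });
    DSU dsu(n);
    vector<char> inserted(n, 0);
    for (int v : order) {
        inserted[v] = 1;
        for (int w : adj[v])
            if (inserted[w]) dsu.unite(v, w);
        if (dsu.find(S) == dsu.find(E))
            return weight[v];          // v is the bottleneck vertex: max(W)
    }
    return -1.0;                       // S and E are not connected at all
}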
This problem can be solved by using binary search.
Assume that the solution is x. Starting from the start node, we use BFS or DFS to explore the graph, visiting only those nodes with weight <= x. If, in the end, Start and End are connected, x can be the solution. We can find the optimal value of x by applying binary search.
Pseudo code
int min = min_value_of_all_node;
int max = max_value_of_all_node;
int result = max;
while (min <= max) {
    int mid = (min + max) >> 1;
    if (BFS(mid)) { // use breadth-first search to explore the graph
        result = min(mid, result);
        max = mid - 1;
    } else {
        min = mid + 1;
    }
}
print result;
Note: we only need to try weights that actually occur in the graph, which reduces the number of binary search steps to O(log n), where n is the number of distinct weights.
If the weights are floats, just use the following approach:
List<Double> listWeight; // sorted list of distinct weights
int min = 0;
int max = listWeight.size() - 1;
int result = max;
while (min <= max) {
    int mid = (min + max) >> 1;
    if (BFS(listWeight.get(mid))) { // breadth-first search over the graph
        result = min(mid, result);
        max = mid - 1;
    } else {
        min = mid + 1;
    }
}
print listWeight.get(result);

Finding the heaviest length-constrained path in a weighted Binary Tree

UPDATE
I worked out an algorithm that I think runs in O(n*k) time. Below is the pseudo-code:
routine heaviestKPath( T, k )
    // create a 2D matrix with n rows and k+1 columns, each element = -∞;
    // we use k+1 columns because the 0th column must be all 0s for a later
    // routine to work properly, and for simplicity in our algorithm
    matrix = new array[ T.getVertexCount() ][ k + 1 ] (-∞)
    // set all elements in the first column of this matrix = 0
    for row = 1 to T.getVertexCount()
        matrix[ row ][ 0 ] = 0
    // fill our matrix by traversing the tree
    traverseToFillMatrix( T.root, k )
    // consider a path that would arc over a node
    globalMaxWeight = -∞
    findArcs( T.root, k )
    return globalMaxWeight
end routine
// node = the current node; k = the path length; node.lc = node's left child;
// node.rc = node's right child; node.idx = node's index (row) in the matrix;
// node.lc.wt/node.rc.wt = weight of the edge to the left/right child
routine traverseToFillMatrix( node, k )
    if (node == null) return
    traverseToFillMatrix( node.lc, k ) // recurse left
    traverseToFillMatrix( node.rc, k ) // recurse right
    // in the case that a left/right child doesn't exist, or both,
    // let's assume the code is smart enough to handle these cases
    matrix[ node.idx ][ 1 ] = max( node.lc.wt, node.rc.wt )
    for i = 2 to k {
        // max returns the heavier of the 2 paths
        matrix[ node.idx ][ i ] = max( matrix[ node.lc.idx ][ i-1 ] + node.lc.wt,
                                       matrix[ node.rc.idx ][ i-1 ] + node.rc.wt )
    }
end routine
// node = the current node, k = the path length
routine findArcs( node, k )
    if (node == null) return
    nodeMax = matrix[ node.idx ][ k ]
    i = 1
    j = k - 1
    while ( i + j == k AND i < k ) {
        left  = node.lc.wt + matrix[ node.lc.idx ][ i-1 ]
        right = node.rc.wt + matrix[ node.rc.idx ][ j-1 ]
        if ( left + right > nodeMax ) {
            nodeMax = left + right
        }
        i++; j--
    }
    // if this node's max weight is larger than the global max weight, update
    if ( globalMaxWeight < nodeMax ) {
        globalMaxWeight = nodeMax
    }
    findArcs( node.lc, k ) // recurse left
    findArcs( node.rc, k ) // recurse right
end routine
Let me know what you think. Feedback is welcome.
I think I have come up with two naive algorithms that find the heaviest length-constrained path in a weighted Binary Tree. First, the description of the problem is as follows: given an n-vertex Binary Tree with weighted edges and some value k, find the heaviest path of length k.
For both algorithms, I'll need references to all vertices, so I'll just do a simple traversal of the tree, giving each vertex a reference to its left, right, and parent nodes.
Algorithm 1
For this algorithm, I'm basically planning on running DFS from each node in the tree, with consideration to the fixed path length. In addition, since the path I'm looking for has the potential of going from the left subtree through the root to the right subtree, I will have to consider 3 choices at each node. But this will result in an O(n*3^k) algorithm and I don't like that.
Algorithm 2
I'm essentially thinking about using a modified version of Dijkstra's algorithm in order to consider a fixed path length. Since I'm looking for the heaviest path and Dijkstra's algorithm finds the lightest, I'm planning on negating all edge weights before starting the traversal. Actually... this doesn't quite work, since I'd have to run Dijkstra's from each node, and that doesn't seem much more efficient than the above algorithm.
So I guess my main questions are these. First, do the algorithms I've described above solve the problem at hand? I'm not totally certain the Dijkstra's version will work, as Dijkstra's is meant for positive edge values.
Now, I am sure there exist more clever/efficient algorithms for this... what is a better algorithm? I've read about "Using spine decompositions to efficiently solve the length-constrained heaviest path problem for trees" but that is really complicated and I don't understand it at all. Are there other algorithms that tackle this problem, maybe not as efficiently as spine decomposition but easier to understand?
You could use a DFS downwards from each node that stops after k edges to search for paths, but notice that this will do 2^k work at each node for a total of O(n*2^k) work, since the number of paths doubles at each level you go down from the starting node.
As DasBoot says in a comment, there is no advantage to using Dijkstra's algorithm here, since its cleverness amounts to choosing the shortest (or longest) way to get between 2 points when multiple routes are possible. With a tree there is always exactly 1 way.
I have a dynamic programming algorithm in mind that will require O(nk) time. Here are some hints:
If you choose some leaf vertex to be the root r and direct all other vertices downwards, away from the root, notice that every path in this directed tree has a highest node -- that is, a unique node that is nearest to r.
You can calculate the heaviest length-k path overall by going through each node v and calculating the heaviest length-k path whose highest node is v, finally taking the maximum over all nodes.
A length-k path whose highest node is v must have a length-i path descending towards one child and a length-(k-i) path descending towards the other.
That should be enough to get you thinking in the right direction; let me know if you need further help.
Here's my solution. Feedback is welcome.
Let's treat the binary tree as a directed graph, with edges going from parent to children. Let's define two concepts for each vertex v:
a) an arc: a directed path that starts from vertex v, with all vertices on the path being descendants of v.
b) a child-path: a directed or non-directed path containing v; it could start anywhere and end anywhere, e.g. go from one child of v up to v and then down to the other child. The set of arcs is a subset of the set of child-paths.
We also define a function HeaviestArc(v,j), which gives the heaviest arc of length j starting at v, on either the left or the right side. We also define LeftHeaviest(v,j) and RightHeaviest(v,j) as the heaviest left and right arcs of length j, respectively.
Given this, we can define the following recurrences for each vertex v, based on its children:
LeftHeaviest(v,j) = weight(LeftEdge(v)) + HeaviestArc(LeftChild(v), j-1)
RightHeaviest(v,j) = weight(RightEdge(v)) + HeaviestArc(RightChild(v), j-1)
HeaviestArc(v,j) = max(LeftHeaviest(v,j), RightHeaviest(v,j))
Here j goes from 1 to k, and HeaviestArc(v,0) = LeftHeaviest(v,0) = RightHeaviest(v,0) = 0 for all v. For leaf nodes, HeaviestArc(v,0) = 0 and HeaviestArc(v,j) = -inf for all other j (I need to think about corner cases more thoroughly).
And then HeaviestChildPath(v), the heaviest child-path containing v, can be calculated as:
HeaviestChildPath(v) = max{ LeftHeaviest(v,j) + RightHeaviest(v,k-j) : j = 0 to k }
The heaviest path overall is then the heaviest of all child-paths.
The estimated runtime of the algorithm is O(kn).
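A minimal C++ sketch of these recurrences (the node layout and all names are illustrative); solve() fills arc[j] = HeaviestArc(v,j) bottom-up and tracks the best child-path:

#include <bits/stdc++.h>
using namespace std;
const long long NEG = LLONG_MIN / 4;  // "-inf" that is safe to add to

struct Node {
    Node *lc = nullptr, *rc = nullptr;
    long long lwt = 0, rwt = 0;       // weights of the edges to the children
    vector<long long> arc;            // arc[j] = HeaviestArc(v, j)
};

long long best = NEG;                 // heaviest child-path found so far

void solve(Node* v, int k) {
    if (!v) return;
    solve(v->lc, k);                  // children first (bottom-up)
    solve(v->rc, k);
    vector<long long> L(k + 1, NEG), R(k + 1, NEG);
    L[0] = R[0] = 0;                  // LeftHeaviest(v,0) = RightHeaviest(v,0) = 0
    for (int j = 1; j <= k; ++j) {    // the two recurrences above
        if (v->lc && v->lc->arc[j-1] != NEG) L[j] = v->lwt + v->lc->arc[j-1];
        if (v->rc && v->rc->arc[j-1] != NEG) R[j] = v->rwt + v->rc->arc[j-1];
    }
    v->arc.assign(k + 1, NEG);
    for (int j = 0; j <= k; ++j) v->arc[j] = max(L[j], R[j]);
    // HeaviestChildPath(v) = max over j of LeftHeaviest(v,j) + RightHeaviest(v,k-j)
    for (int j = 0; j <= k; ++j)
        if (L[j] != NEG && R[k-j] != NEG)
            best = max(best, L[j] + R[k-j]);
}
// After solve(root, k), best holds the weight of the heaviest path of exactly k edges.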
max_weight = 0

def traverse(node, came_from, running_weight, level):
    if node is None:
        return
    global max_weight
    if level == 0:
        if max_weight < running_weight:
            max_weight = running_weight
        return
    # try all three directions, but never step straight back
    for nxt in (node.left, node.right, node.parent):
        if nxt is not came_from:
            traverse(nxt, node, running_weight + node.weight, level - 1)

for node in tree:
    traverse(node, None, 0, N)
