Prove traversing a k-ary tree twice yields the diameter - algorithm

I've known this algorithm for finding the diameter of a tree for quite some time:
1. Select a random node A.
2. Run BFS from A to find the node farthest from A. Name this node S.
3. Run BFS from S to find the node farthest from S. Name it D.
4. The path between S and D is a diameter of the tree.
But why does it work?
I would accept both Ivan's and coproc's answers if I could. These are two very different approaches that both answer my question.
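For reference, the two-pass procedure from the question can be sketched as follows. This is only a minimal sketch, assuming an unweighted tree stored as an adjacency dict; the names `bfs_farthest`, `tree_diameter` and `example_tree` are made up for illustration.

```python
from collections import deque

def bfs_farthest(adj, start):
    """Return (farthest_node, distance) from start in an unweighted tree."""
    dist = {start: 0}
    queue = deque([start])
    farthest = start
    while queue:
        u = queue.popleft()
        if dist[u] > dist[farthest]:
            farthest = u
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return farthest, dist[farthest]

def tree_diameter(adj, a):
    """Two BFS passes: A -> S, then S -> D. Returns (S, D, |SD|)."""
    s, _ = bfs_farthest(adj, a)
    d, length = bfs_farthest(adj, s)
    return s, d, length

# Hypothetical example tree with edges 0-1, 1-2, 1-3, 3-4, 4-5.
example_tree = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3, 5], 5: [4]}
```

On this example the longest path is 0-1-3-4-5, so every starting node should report a length of 4.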

Say S = [A - B - C - D - ... - X - Y - Z] is the diameter of the tree.
Consider any node # on S. If you start from # and go "away" from the diameter, there won't be a chain longer than min(length(#, A), length(#, Z)); otherwise that chain could replace one end of S and give a longer path.
So a DFS from any node of the tree will end at A or Z, i.e. one end of the diameter, and a second DFS from that end will of course lead you to the other end.

Suppose you've completed steps 1 and 2 and have found S, and that there is no diameter in the tree that includes S. Pick a diameter PQ of the tree. You basically have to check the possible cases and in all of them, find that either PS or SQ is at least as long as PQ - which would be a contradiction.
In order to systematically check all cases, you can assume that the tree is rooted at A. Then the shortest path between any two vertices U and V is calculated in the following way - let W be the lowest common ancestor of U and V. Then the length of UV is equal to the sum of the distances between U and W and between V and W - and, in a rooted tree, these distances are just differences in the levels of the nodes (and S has a maximum level in this tree).
Then analyze all possible positions of S with respect to the subtree rooted at W (the lowest common ancestor of P and Q) and the vertices P and Q. For example, the first case is simple: S is not in the subtree rooted at W. Then we can trivially improve the path by selecting whichever of P and Q is farther from the root and connecting it to S. The rest of the cases are similar.
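The distance formula used above (length of UV = levels of U and V measured from their lowest common ancestor W) can be sketched like this. It is a minimal illustration, assuming the tree is an adjacency dict; `root_tree` and `lca_distance` are hypothetical helper names.

```python
def root_tree(adj, root):
    """Return (parent, level) dicts for the tree rooted at root."""
    parent = {root: None}
    level = {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in level:
                parent[w] = u
                level[w] = level[u] + 1
                stack.append(w)
    return parent, level

def lca_distance(parent, level, u, v):
    """Return (W, |UV|) where W is the lowest common ancestor of u and v."""
    x, y = u, v
    # Lift the deeper vertex until both are on the same level.
    while level[x] > level[y]:
        x = parent[x]
    while level[y] > level[x]:
        y = parent[y]
    # Climb together until the two walks meet at W.
    while x != y:
        x, y = parent[x], parent[y]
    return x, (level[u] - level[x]) + (level[v] - level[x])

# Hypothetical example tree with edges 0-1, 1-2, 1-3, 3-4, 4-5, rooted at 1.
example_tree = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3, 5], 5: [4]}
parent, level = root_tree(example_tree, 1)
```

With the root at 1, `lca_distance(parent, level, 0, 5)` finds W = 1 and adds the two level differences 1 and 3.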

This algorithm works for any connected acyclic undirected graph, i.e. any tree; no designated root is needed.
A proof can be constructed by choosing any two additional points S2 and D2 and showing that their distance d(S2,D2) ≤ d(S,D). From the algorithm we know
by step 2: d(A,S)≥d(A,D), d(A,S)≥d(A,S2), d(A,S)≥d(A,D2) and
by step 3: d(S,D)≥d(S,A), d(S,D)≥d(S,S2), d(S,D)≥d(S,D2).
By distinguishing at most 5 cases (e.g. the paths SD and S2D2 have no edge in common; the paths SD and S2D2 have edges in common and A is connected to the edges running to S; etc.), one can decompose the above distances into sub-paths and rewrite the inequalities based on the sub-paths. The conclusion follows from simple algebra. The details are left to the reader as an exercise. :-)

A few lemmas/facts before we get started with the proof.
Lemma 1: T is a tree, so there is exactly one path between any pair of vertices.
Lemma 2: If S--D is the diameter, then a BFS with source S (or D) will give D (or S) the largest distance. (By definition of diameter.)
Also, let's define |XY| to be the length of the path X--Y.
Define |XX| = 0.
Let A be the random node selected by the algorithm.
After step 2, let the farthest node found be P.
If P is either S or D, then by Lemma 2 we are done. So we must show that P has to be either S or D.
Claim: If S--D is the diameter, then P is either S or D.
Proof: I am going to prove the above by proving the contrapositive. The proof is for a tree with a unique diameter, but it should work with minor changes (mostly the equalities) for non-unique diameters too.
Contrapositive: If P is neither S nor D, then S--D is not the diameter.
Assume P is neither S nor D.
Case 1: The path A--P intersects S--D
Let K be the first vertex on the path from A to P that lies on S--D. BFS marked P as the farthest node from A, and by Lemma 1 the path from A to S also passes through K, so
|AP| > |AS|
|AK| + |KP| > |AK| + |KS|
Therefore we get |KP| > |KS|.
Similarly |KP| > |KD|.
Now we consider the path SP (if the path K--P starts out along the diameter toward S, swap the roles of S and D)
|SP| = |SK| + |KP|
|SP| > |SK| + |KD|
|SP| > |SD|
So SP is longer than the diameter which means SD is NOT the diameter.
Case 2: The path A--P does NOT intersect S--D
Now we know BFS marked P as the farthest node. So we have
|AP| > |AD|
|AP| > |AS|
We can write |AD| = |AK| + |KD| where K is the vertex of the diameter nearest to A (K may be S or D itself). Similarly |AS| = |AK| + |KS|.
Without loss of generality assume |AD|>=|AS|
|AK| + |KD| >= |AK| + |KS|
|KD| >= |KS|
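Now consider the path PD. Let B be the last vertex that the paths A--P and A--K have in common (B may be A itself). B lies on A--P, so it is not on the diameter and |BK| > 0.
|AP| > |AD| gives |AB| + |BP| > |AB| + |BK| + |KD|, so |BP| > |BK| + |KD|
|PD| = |PB| + |BK| + |KD|
|PD| > 2|BK| + 2|KD|
|PD| > |KD| + |KD| (|BK| > 0)
|PD| > |SK| + |KD| (|KD| >= |KS|)
|PD| > |SD|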
So SD is not the diameter and hence the claim.
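A proof is what the question asks for, but the claim is also easy to sanity-check empirically. The sketch below compares the two-pass result against a brute-force diameter on random trees, trying every possible starting node A. Everything here (`random_tree`, `check`, the trial counts and the seed) is made up for illustration.

```python
import random
from collections import deque

def bfs_dist(adj, start):
    """Distances from start to every node of an unweighted tree."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def random_tree(n, rng):
    """Attach each new vertex to a random earlier vertex."""
    adj = {i: [] for i in range(n)}
    for v in range(1, n):
        u = rng.randrange(v)
        adj[u].append(v)
        adj[v].append(u)
    return adj

def two_pass_diameter(adj, a):
    dist_a = bfs_dist(adj, a)
    p = max(dist_a, key=dist_a.get)      # farthest node from A
    return max(bfs_dist(adj, p).values())

def brute_force_diameter(adj):
    return max(max(bfs_dist(adj, v).values()) for v in adj)

def check(trials=200, max_n=30, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        adj = random_tree(rng.randint(1, max_n), rng)
        expected = brute_force_diameter(adj)
        if any(two_pass_diameter(adj, a) != expected for a in adj):
            return False
    return True
```

`check()` returning True is evidence, not a proof; it only says no counterexample turned up among the sampled trees.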

Let the set s represent the nodes along the diameter of the tree, with A and Z being the end nodes, so that the distance from A to Z is the diameter. For any node n that is a member of s, the longest possible path from n will end in either A or Z. Now if you pick a random node v in the tree, it either is a member of the set or it has a path to a node n in this set. Since the longest path from n ends at either A or Z, and the path from v to n cannot be longer than either the path from n to A or the path from n to Z (if it were, v would have to be a member of the set), running BFS from any node v will first find either A or Z, and the subsequent call will find the complementary end point. Not a math girl, just throwing out thoughts.

Related

How to count all reachable nodes in a directed graph?

There is a directed graph (which might contain cycles), and each node has a value on it, how could we get the sum of reachable value for each node. For example, in the following graph:
the reachable sum for node 1 is: 2 + 3 + 4 + 5 + 6 + 7 = 27
the reachable sum for node 2 is: 4 + 5 + 6 + 7 = 22
.....
My solution: to get the sums for all nodes, I think the time complexity is O(n + m), where n is the number of nodes and m is the number of edges. DFS should be used: for each node, recursively visit its successors and save the sum once it has been computed, so that it does not need to be calculated again. A set needs to be created for each node to avoid endless recursion caused by cycles.
Does it work? I don't think it is elegant enough, especially many sets have to be created. Is there any better solution? Thanks.
This can be done by first finding Strongly Connected Components (SCC), which can be done in O(|V|+|E|). Then, build a new graph, G', for the SCCs (each SCC is a node in the graph), where each node has value which is the sum of the nodes in that SCC.
Formally,
G' = (V',E')
Where V' = {U1, U2, ..., Uk | U_i is a SCC of the graph G}
E' = {(U_i,U_j) | there is node u_i in U_i and u_j in U_j such that (u_i,u_j) is in E }
Then, this graph (G') is a DAG, and the question becomes simpler, and seems to be a variant of question linked in comments.
EDIT: the previous answer (struck out) was mistaken from this point on; here is a new answer. Sorry about that.
Now, a DFS can be used from each node to find the sum of values (clear all visited flags before each starting node):
DFS(v):
    if v.visited:
        return 0
    v.visited = true
    return v.value + sum([DFS(u) for u in v.children])
This is O(V^2 + VE) in the worst case, but since G' has fewer nodes than G, V and E may now be significantly lower.
Some local optimizations can be made. For example, if a node has a single child, you can reuse the pre-calculated value and not apply DFS on the child again, since there is no fear of counting twice in this case.
A DP solution for this problem (DAG) can be:
D[i] = value(i) + sum {D[j] | (i,j) is an edge in G' }
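This can be calculated in linear time (after a topological sort of the DAG). Note that the recurrence adds a node once for every distinct path that reaches it, so it gives the exact reachable sum only when no node of G' can be reached from i along two different paths.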
Pseudo code:
Find SCCs
Build G'
Topological sort G'
Find D[i] for each node in G'
For each U_i, assign D[U_i] to every node u_i in U_i.
Total time is O(|V|+|E|).
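Here is a rough sketch of the SCC idea in Python. It is not the exact linear-time pipeline listed above: it builds G' with Kosaraju's algorithm (a swapped-in choice, any SCC algorithm would do) and then runs one DFS per component, so nodes shared between branches are counted once, at a worst-case cost of O(V'(V' + E')). The function name `reachable_sums` and the example graph are assumptions for illustration, and each node's own value is included in its sum.

```python
def reachable_sums(graph, value):
    """graph: node -> list of successors; value: node -> number.
    Returns node -> sum of values of all nodes reachable from it (itself included)."""
    # Kosaraju pass 1: record nodes in order of DFS completion.
    order, seen = [], set()
    def visit(u):
        seen.add(u)
        for w in graph[u]:
            if w not in seen:
                visit(w)
        order.append(u)
    for u in graph:
        if u not in seen:
            visit(u)
    # Kosaraju pass 2: label components on the reversed graph.
    reverse = {u: [] for u in graph}
    for u in graph:
        for w in graph[u]:
            reverse[w].append(u)
    comp = {}
    def assign(u, label):
        comp[u] = label
        for w in reverse[u]:
            if w not in comp:
                assign(w, label)
    for u in reversed(order):
        if u not in comp:
            assign(u, u)
    # Build G': one node per SCC, valued with the sum of its members.
    comp_value, comp_edges = {}, {}
    for u in graph:
        c = comp[u]
        comp_value[c] = comp_value.get(c, 0) + value[u]
        comp_edges.setdefault(c, set())
        for w in graph[u]:
            if comp[w] != c:
                comp_edges[c].add(comp[w])
    # One DFS per component over G'; the visited set prevents double counting.
    def component_sum(c):
        visited, stack, total = {c}, [c], 0
        while stack:
            x = stack.pop()
            total += comp_value[x]
            for y in comp_edges[x]:
                if y not in visited:
                    visited.add(y)
                    stack.append(y)
        return total
    sums = {c: component_sum(c) for c in comp_value}
    return {u: sums[comp[u]] for u in graph}

# Hypothetical example: a diamond 1 -> {2, 3} -> 4 plus a cycle 4 <-> 5.
example_graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: [4]}
example_value = {v: v for v in example_graph}
```

In the example, nodes 4 and 5 collapse into one component worth 9, and node 1 reaches it through both 2 and 3 but counts it once. The recursive helpers are fine for small graphs; a large graph would need an iterative version.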
You can use DFS or BFS to solve your problem.
Both have complexity O(V + E).
You don't have to compute all values for all nodes, and you don't need recursion.
Just do something like this.
A typical traversal looks like this (it takes vertices from the front of the list and adds them to the end, so it is a BFS):
unmark all vertices
choose some starting vertex x
mark x
list L = x
while L nonempty
    remove some vertex v from the front of the list
    visit v
    for each unmarked neighbor w
        mark w
        add it to end of list
In your case you have to add some lines:
unmark all vertices
choose some starting vertex x
mark x
list L = x
float sum = 0
while L nonempty
    remove some vertex v from the front of the list
    visit v
    sum += v->value
    for each unmarked neighbor w
        mark w
        add it to end of list
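The pseudocode above translates almost line for line into Python. This is a minimal sketch for a single starting vertex, assuming the graph is a dict of successor lists; `reachable_sum_from` is a made-up name. Like the pseudocode, it includes the starting vertex's own value in the sum.

```python
from collections import deque

def reachable_sum_from(graph, value, start):
    """Sum of value[v] over every vertex v reachable from start (start included)."""
    marked = {start}
    queue = deque([start])
    total = 0
    while queue:
        v = queue.popleft()          # take a vertex from the front of the list
        total += value[v]
        for w in graph[v]:
            if w not in marked:      # each vertex is marked, hence counted, once
                marked.add(w)
                queue.append(w)
    return total

# Hypothetical example: a diamond 1 -> {2, 3} -> 4 plus a cycle 4 <-> 5.
example_graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: [4]}
example_value = {v: v for v in example_graph}
```

One call costs O(V + E); getting the sum for every node this way means one call per node.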

Shortest path that traverses a list of required edges

I have a directed graph, that looks like this:
I want to find the cheapest path from Start to End where the orange dotted lines are all required for the path to be valid.
The natural shortest path would be: Start -> A -> B -> End with the resultant cost = 5, but we have not met all required edge visits.
The path I want to find (via a general solution) is Start -> A -> B -> C -> D -> B -> End where the cost = 7 and we have met all required edge visits.
Does anyone have any thoughts on how to require such edge traversals?
Let R be the set of required edges and F = |R|. Let G be the input graph, t (resp. s) the starting (resp. ending) point of the requested path.
Preprocessing: A bunch of Dijkstra's algorithm runs...
The first step is to create another graph. This graph will have exactly F+2 vertices:
One for each edge in R
One for the starting point t of the path you want to compute
One for the ending point s of the path you want to compute
To create this graph, you will have to do the following:
Remove every edge in R from G.
For each edge E = (b,e) in R:
    Compute the shortest path from t to b and the shortest path from e to s. If they exist, add an edge from t to E and an edge from E to s in the "new graph", each weighted by the length of the corresponding shortest path.
    For each edge E' = (b', e') in R \ {E}:
        Compute the shortest path from e to b'. If it exists, add an edge from E to E' in the new graph, weighted by the length of that shortest path.
    Attach each computed path as a payload to the relevant edge.
The complexity to build this graph is O((F+2)².(E+V).log(V)) where E (resp. V) is the number of edges (resp. vertices) in the original graph.
Exhaustive search for the best possible path
From this point, we have to find the shortest Hamiltonian Path in the newly created graph. Unfortunately, this task is a hard problem. We have no better way than exploring every possible path. But that doesn't mean we can't do it cleverly.
We will perform the search using backtracking. We can achieve this by maintaining two sets:
The list of currently explored vertices: K (K for Known)
The list of currently unknown vertices: U (U for Unknown)
Before digging in the algorithm definition, here are the main ideas. We cannot do anything else than exploring the whole space of possible paths in the new graph. At each step, we have to make a decision: which edge do we take next? This leads to a sequence of decisions until we cannot move anymore or we reached s. But now we need to go back and cancel decisions to see if we can do better by changing a direction. To cancel decisions we proceed like this:
Every time we are stuck (or found a path), we cancel the last decision we made
Each time we take a decision at some point, we keep track of which decision, so when we get back to this point, we know not to take that very same decision and explore the others that are available.
We can be stuck because:
We found a path.
We cannot move further (there is no edge we can explore or the only one we could take increases the current partial path too much -- its length becomes higher than the length of the current best path found).
The final algorithm can be summed up in this fashion: (I give an iterative implementation, one can find a recursive implementation a tad easier and clearer)
Let K ← [], L[0..F+1] ← [] and U ← V (where V is the set of every vertex in the working graph minus the starting and ending vertices t and s). Finally let l ← i ← 0 and best_path_length ← ∞ and best_path ← []
While (i ≥ 0):
    While U ≠ []
        c ← U.popFront() (we take the head of U)
        L[i].pushBack(c)
        If i == F+1 AND (l == weight(cur_path.back(), s) + l) < best_path_length:
            best_path_length ← l
            best_path ← cur_path
        If there is an edge e between K.tail() and c, and weight(e) + l < best_path_length: (if K is empty, then replace K.tail() with t in the previous statement)
            K.pushBack(c)
            i ← i+1
            l ← weight(e) + l
            cur_path.pushBack(c)
    Concatenate L[i] at the end of U
    L[i] ← []
    i ← i-1
    cur_path.popBack()
At the end of the while loop (while (i ≥ 0)), best_path will hold the best path (in the new graph). From there you just have to get the edges' payload to rebuild the path in the original graph.
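To make the two phases concrete, here is a small sketch in Python. It follows the preprocessing described above (Dijkstra on G with the required edges removed), but for the search it simply tries every ordering of the required edges with `itertools.permutations` instead of the pruned backtracking, which is only reasonable for a handful of required edges. It returns the cost only, not the path. The function names and the example graph with its weights are invented for illustration, since the graph from the question is not reproduced here.

```python
import heapq
from itertools import permutations

def dijkstra(graph, source):
    """graph: node -> list of (neighbor, weight). Returns node -> shortest distance."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for w, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

def cheapest_walk_cost(edges, required, start, end):
    """edges and required are lists of (u, v, weight); required is a subset of edges."""
    # Connecting paths are computed in G with the required edges removed.
    graph = {}
    for u, v, w in edges:
        if (u, v, w) not in required:
            graph.setdefault(u, []).append((v, w))
    # One Dijkstra run from the start and from the head of every required edge.
    sources = {start} | {e for _, e, _ in required}
    dist = {src: dijkstra(graph, src) for src in sources}
    inf = float("inf")
    best = inf
    for order in permutations(required):  # exhaustive: F! orderings
        cost, here = 0, start
        for b, e, w in order:
            cost += dist[here].get(b, inf) + w
            here = e
        cost += dist[here].get(end, inf)
        best = min(best, cost)
    return best

# Hypothetical graph: every edge costs 1; C->D and D->B are required.
example_edges = [("Start", "A", 1), ("A", "B", 1), ("B", "End", 1),
                 ("B", "C", 1), ("C", "D", 1), ("D", "B", 1)]
example_required = [("C", "D", 1), ("D", "B", 1)]
```

In this made-up graph the unconstrained shortest path Start -> A -> B -> End costs 3, while the cheapest valid walk Start -> A -> B -> C -> D -> B -> End costs 6. The result is infinity when no ordering is feasible.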

Find two paths in a graph that are in distance of at least D(constant)

Instance of the problem:
An undirected, unweighted graph G=(V,E).
Two source nodes a and b, two destination nodes c and d, and a constant D (a positive integer). We can assume that lambda(c,d), lambda(a,b) > D, where lambda(x,y) is the length of the shortest path between x and y in G.
Two people are standing on the nodes a and b.
Definition (scheduler set):
A scheduler set is a sequence of orders such that in each step only one of the people moves from their node v to one of v's neighbors, where the starting positions are the nodes a, b and the ending positions are the nodes c, d. A scheduler set is "missing-disorders" if in each step the distance between the two people is > D.
I need to find an algorithm that decides whether there is a "missing-disorders scheduler set" or not.
Any suggestions?
One simple solution would be to first solve all-pairs shortest paths using n breadth-first searches from every node in O(n * (n + m)).
Then create the graph of valid node pairs (x,y) with lambda(x, y) > D, with edges indicating the possible moves. There is an edge {(v,w), (x,y)} if v = x and there is an edge {w, y} in the original graph or if w = y and there is an edge {v, x} in the original graph. This new graph has O(n^2) nodes and O(nm) edges.
Now you just need to check whether (c, d) is reachable from (a, b) in the new graph. This can be achieved using DFS or BFS.
The total runtime is O(n * (n + m)).
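A minimal sketch of this in Python, assuming the graph is an adjacency dict. Rather than materializing the pair graph, it generates the neighbors of a pair (x, y) on the fly during the BFS. `schedule_exists` and the 6-cycle example are made up for illustration, and unreachable pairs are treated as infinitely far apart, which is an assumption of this sketch.

```python
from collections import deque

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def schedule_exists(adj, a, b, c, d, D):
    """True if the two people can get from (a, b) to (c, d), moving one at a time,
    while their distance stays strictly greater than D after every step."""
    dist = {v: bfs_distances(adj, v) for v in adj}   # all-pairs shortest paths
    inf = float("inf")

    def valid(x, y):
        return dist[x].get(y, inf) > D

    if not (valid(a, b) and valid(c, d)):
        return False
    seen = {(a, b)}
    queue = deque([(a, b)])
    while queue:
        x, y = queue.popleft()
        if (x, y) == (c, d):
            return True
        # Either the first person moves or the second one does.
        moves = [(nx, y) for nx in adj[x]] + [(x, ny) for ny in adj[y]]
        for state in moves:
            if state not in seen and valid(*state):
                seen.add(state)
                queue.append(state)
    return False

# Hypothetical example: a cycle on 6 nodes.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

On the 6-cycle, going from (0, 3) to (1, 4) works for D = 1 (move to (1, 3), then to (1, 4)). It fails for D = 2, because any single move brings the two people to distance 2.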

How to find if a graph is a tree and its center

Is there an algorithm (or a sequence of algorithms) to find, given a generic graph structure G=(V,E) with no notion of parent node, leaf node and child node but only neighborhood relations:
1) Whether G is a tree or not (is it sufficient to check |V| = |E|+1?)
2) If the graph is actually a tree, its leaves and its center (i.e. the node of the graph which minimizes the tree depth)?
Thanks
If the "center" of the tree is defined as "the node of the graph which minimizes the tree depth", there's an easier way to find it than finding the diameter.
d[] = degrees of all nodes
que = { leaves, i.e. i such that d[i]==1 }
while len(que) > 1:
    i = que.pop_front()
    d[i]--
    for j in neighbors[i]:
        if d[j] > 0:
            d[j]--
            if d[j] == 1:
                que.push_back(j)
and the last one left in que is the center.
You can prove this by thinking about the diameter path.
To simplify, we assume the diameter path has an odd number of nodes, so that the middle node of the path is unique; let's call that node M.
We can see that:
M will not be pushed to the back of que until every other node on the diameter path has been pushed into que.
If there were another node N that is pushed after M has already been pushed into que, then N would have to be on a path longer than the diameter path. Therefore N can't exist, and M must be the last node pushed (and left) in que.
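A layer-by-layer variant of the same peeling idea, as a minimal Python sketch. It removes all current leaves at once instead of using a single queue, which also handles the case the proof sets aside, where the diameter has an even number of nodes and there are two centers. `tree_centers` and the example tree are assumptions for illustration.

```python
def tree_centers(adj):
    """Return the one or two center nodes of a tree given as node -> list of neighbors."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = len(adj)
    layer = [v for v in adj if degree[v] <= 1]   # current leaves
    while remaining > 2:
        remaining -= len(layer)
        next_layer = []
        for leaf in layer:
            for w in adj[leaf]:
                degree[w] -= 1
                if degree[w] == 1:               # w just became a leaf
                    next_layer.append(w)
        layer = next_layer
    return sorted(layer)

# Hypothetical example tree with edges 0-1, 1-2, 1-3, 3-4, 4-5.
example_tree = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3, 5], 5: [4]}
```

For the example tree, the diameter 0-1-3-4-5 has node 3 in the middle, and that is the center the peeling ends on.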
For (1), all you have to do is verify |V| = |E| + 1 and that the graph is connected.
For (2), you need to find a diameter (a longest path), then pick a node in the middle of the diameter path. I vaguely remember that there's an easy way to do this for trees.
You start with an arbitrary node a, then find a node at maximal distance from a, call it b. Then you search from b and find a node at maximal distance from b, call it c. The path from b to c is a diameter.
There are other ways to do it that might be more convenient for you. Check Google too.
No, it is not enough: a tree is a CONNECTED graph with n-1 edges. A graph that is not connected can also have n-1 edges, and it won't be a tree.
You can run a BFS to check whether the graph is connected and then count the number of edges; together that is enough information to decide whether the graph is a tree.
The leaves are the nodes v with degree d(v) = 1, i.e. the nodes with exactly one neighbor.
(1) The answer assumes non-directed graphs
(2) In here, n denotes the number of vertices.
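Both checks fit in a few lines of Python. This is a minimal sketch assuming an undirected simple graph stored as an adjacency dict (each edge listed in both directions); `is_tree` and `leaves` are illustrative names.

```python
def is_tree(adj):
    """adj: node -> list of neighbors of an undirected simple graph."""
    n = len(adj)
    if n == 0:
        return False
    # Each undirected edge appears twice in the adjacency lists.
    edge_count = sum(len(neighbors) for neighbors in adj.values()) // 2
    if edge_count != n - 1:
        return False
    # Connectivity check by traversal from an arbitrary node.
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def leaves(adj):
    """Nodes of degree 1."""
    return [v for v in adj if len(adj[v]) == 1]

# Hypothetical examples: a tree, and a triangle plus an isolated node (4 nodes, 3 edges).
example_tree = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3, 5], 5: [4]}
not_a_tree = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
```

The second example has exactly n-1 edges but is not connected, which is why the edge count alone is not sufficient.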

Path finding algorithm on graph considering both nodes and edges

I have an undirected graph. For now, assume that the graph is complete. Each node has a certain value associated with it. All edges have a positive weight.
I want to find a path between any 2 given nodes such that the sum of the values associated with the path nodes is maximum while at the same time the path length is within a given threshold value.
The solution should be "global", meaning that the path obtained should be optimal among all possible paths. I tried a linear programming approach but am not able to formulate it correctly.
Any suggestions or a different method of solving would be of great help.
Thanks!
If you are looking for an algorithm for general graphs, your problem is NP-complete. Assume the path length threshold is n-1 and each vertex has value 1. If you could solve your problem, you could decide whether the given graph has a Hamiltonian path: if your maximum-value path has value n, then you have a Hamiltonian path. I think you can use something like the Held-Karp relaxation to find a good solution.
This might not be perfect, but if the threshold value (T) is small enough, there's a simple algorithm that runs in O(n^3 T^2). It's a small modification of Floyd-Warshall.
d = int array with size n x n x (T + 1)
initialize all d[i][j][k] to -infty
for i in nodes:
    d[i][i][0] = value[i]
for e:(u, v) in edges:
    d[u][v][w(e)] = value[u] + value[v]
for t in 1 .. T
    for k in nodes:
        for t' in 1..t-1:
            for i in nodes:
                for j in nodes:
                    d[i][j][t] = max(d[i][j][t],
                                     d[i][k][t'] + d[k][j][t-t'] - value[k])
The result is the pair (i, j) with the maximum d[i][j][t] for all t in 0..T
EDIT: this assumes that the paths are allowed to be not simple, they can contain cycles.
EDIT2: This also assumes that if a node appears more than once in a path, it will be counted more than once. This is apparently not what OP wanted!
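For anyone who wants to experiment with it, the DP above could look like this in Python. It is a sketch under the same assumptions as the two EDITs (walks may repeat nodes, and a repeated node is counted every time), plus the assumption that edge weights are positive integers. `best_walk_values`, `best_value` and the 3-node example are made up for illustration.

```python
def best_walk_values(n, value, edges, T):
    """Nodes are 0..n-1; edges are undirected (u, v, w) with positive integer w.
    d[i][j][t] is the best value sum over walks from i to j of total length exactly t,
    counting repeated nodes each time, or -inf if no such walk exists."""
    NEG = float("-inf")
    d = [[[NEG] * (T + 1) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        d[i][i][0] = value[i]
    for u, v, w in edges:
        if w <= T:
            both = value[u] + value[v]
            d[u][v][w] = max(d[u][v][w], both)
            d[v][u][w] = max(d[v][u][w], both)
    for t in range(1, T + 1):
        for k in range(n):
            for t1 in range(1, t):           # split the walk at k: t1 + (t - t1)
                for i in range(n):
                    left = d[i][k][t1]
                    if left == NEG:
                        continue
                    for j in range(n):
                        # value[k] is in both halves, so subtract it once
                        cand = left + d[k][j][t - t1] - value[k]
                        if cand > d[i][j][t]:
                            d[i][j][t] = cand
    return d

def best_value(d, i, j, T):
    """Best value for a walk from i to j of length at most T."""
    return max(d[i][j][: T + 1])

# Hypothetical example: 3 nodes with values 1, 5, 2 and a length threshold of 3.
example_d = best_walk_values(3, [1, 5, 2], [(0, 1, 2), (1, 2, 2), (0, 2, 1)], 3)
```

In the example, the best walk from 0 to 2 within length 3 is 0-2-0-2 with value 1 + 2 + 1 + 2 = 6. This is exactly the repeated-node behaviour the second EDIT warns about.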
Integer program (this may be a good idea or maybe not):
For each vertex v, let x_v be 1 if vertex v is visited and 0 otherwise. For each arc a, let y_a be the number of times arc a is used. Let s be the source and t be the destination. The objective is
maximize ∑_v value(v) x_v .
The constraints are
∑_a value(a) y_a ≤ threshold
∀v, ∑_{a has head v} y_a - ∑_{a has tail v} y_a = { -1 if v = s; 1 if v = t; 0 otherwise } (conserve flow)
∀v ≠ s, x_v ≤ ∑_{a has head v} y_a (must enter a vertex to visit)
∀v, x_v ≤ 1 (visit each vertex at most once)
∀v ∉ {s, t}, ∀ cuts S that separate vertex v from {s, t}, x_v ≤ ∑_{a such that tail(a) ∉ S ∧ head(a) ∈ S} y_a (benefit only from vertices not on isolated loops).
To solve, do branch and bound with the relaxation values. Unfortunately, the last group of constraints are exponential in number, so when you're solving the relaxed dual, you'll need to generate columns. Typically for connectivity problems, this means using a min-cut algorithm repeatedly to find a cut worth enforcing. Good luck!
If you just add the weight of a node to the weights of its outgoing edges, you can forget about the node weights. Then you can use any of the standard algorithms for the shortest path problem.
