ET-tree in Henzinger and King (1995) - data-structures

I'm having trouble understanding the following from the beginning of section 2.5 of Henzinger and King (FOCS 1995):
We encode an arbitrary tree T with n vertices using a sequence of 2n - 1 symbols, which is generated as follows: Root the tree at an arbitrary vertex.
Then traverse T in depth-first search order traversing each edge twice (once in each direction) and visiting every degree-d vertex d times, except for the root which is visited d + 1 times. Each time any vertex u is encountered, we call this an occurrence of the vertex. Let ET(T) be the sequence of node occurrences representing an arbitrary tree T.
For each spanning tree T(B) of a block B of H_i each occurrence of ET(T(B)) is stored in a node of a balanced binary search tree, called the ET(T(B))-tree. For each vertex u in T(B), we arbitrarily choose one occurrence to be the active occurrence of u.
With the active occurrence of each vertex v, we keep the (unordered) list of nontree edges in B which are incident to u, stored as a balanced binary tree. Each node in the ET-tree contains the number of nontree edges stored in its subtree.
Using this data structure for each level we can sample an edge of T_1 in time O(logn).
My questions follow some explanation of some terms.
From section 2.2,
A block is a maximal set of nodes that are biconnected.
H_i is a subgraph of G, whose dynamic biconnectivity (which is much of the topic of the paper) we wish to maintain. (More specifically, the edge set of G has been partitioned into l = log(|E(G)|) sets and each set induces a graph H_i for 1 <= i <= l.)
Here's my understanding. Pick some T(B) of H_i. The key of each active node in the corresponding ET tree is the number of "nontree edges in B [that] are incident to u". The value of each node is the appropriate list of edges.
Questions:
To finalise the ET tree, we have each node store the sum of the keys of all nodes in its subtree (including the node itself)?
The list for each node is not a balanced tree, right?
If only the active nodes matter, why create an ET-tree in the first place?
To sample an edge, we generate a number between 1 and |E(B))|. If the number is between 1 and |V(B)| - 1 then we're choosing some tree edge. Otherwise, we traverse the balanced tree and then select the appropriate edge from a list?

Related

Concept of Tarjan's bridge-finding algorithm

I was doning a problem of finding a bridge in a undirected connected graph, I looked up wikipedia for Tarjan's algorithm. Here is what it writes
Tarjan's bridge-finding algorithm
The first linear time algorithm for finding the bridges in a graph was described by
Robert Tarjan in 1974. It performs the following steps:
Find a spanning forest of G
Create a rooted forest F from the spanning forest
Traverse the forest F in preorder and number the nodes. Parent nodes in the forest now have lower numbers than child nodes.
For each node v in preorder (denoting each node using its preorder number), do:
Compute the number of forest descendants ND(v) for this node, by adding one to the sum of its children's descendants.
Compute L(v), the lowest preorder label reachable from v by a path for which all but the last edge stays within
the subtree rooted at v. This is the minimum of the set
consisting of the preorder label of v, of the values of
L(w) at child nodes of v and of the preorder
labels of nodes reachable from v by edges that do not
belong to F.
Similarly, compute H(v), the highest preorder label reachable by a path for which all but the last edge stays within the
subtree rooted at v. This is the maximum of the set
consisting of the preorder label of v, of the values of
H(w) at child nodes of v and of the preorder
labels of nodes reachable from v by edges that do not
belong to F.
For each node w with parent node v, if L(w) = w and H(w) < w + ND(w) then the edge
from v to w is a bridge.
I wonder whether I understand the previous steps wrong, since in my opinion, I think that L(w) = w is never gonna happen except at the root. Where in other cases, L(w) should be at least smaller than the father of w.
Source
The English description of L and H is slightly wrong -- they should exclude paths that contain the parent edge, or else it's as if there are parallel edges between each pair of adjacent nodes, hence no bridges. The algorithm for computing L and H correctly iterates over children only.

Specific Graph and need to more Creative solution

Directed Graph (|V|=a, |E|=b) is given.
each vertexes has specific weight. we want for each vertex (1..a) find a vertex with maximum weight that can be reachable from that vertex.
Update 1: one nice answer is prepare by #Paul in O(b + a log a). but I
search for O(a + b) algorithms, if any?
Is there any different efficient or fastest any other ways for doing it?
Yes, it's possible to modify Tarjan's SCC algorithm to solve this problem in linear time.
Tarjan's algorithm uses two node fields to drive its SCC finding logic: index, which represents the order in which the algorithm discovers the nodes; and lowlink, the minimum index reachable by a sequence of tree arcs followed by a back arc. As part of the same depth-first traversal, we can compute another field, maxweight, which has one of two meanings:
For a node not yet included in a finished SCC, it represents the maximum weight reachable by a sequence of tree arcs, optionally followed by a cross arc to another SCC and then any subsequent path.
For nodes in a finished SCC, it represents the maximum weight reachable.
The logic for computing maxweight is as follows. If we discover an arc from v to a new node w, then vw is a tree arc, so we compute w.maxweight recursively and update v.maxweight = max(v.maxweight, w.maxweight). If w is on the stack, then we do nothing, because vw is a back arc and not included in the definition of maxweight. Otherwise, vw is a cross arc, and we do the same update that we would have done for a tree arc, just without the recursive call.
When Tarjan's algorithm identifies an SCC, it's because it has a node r with r.lowlink == r.index. Since r is the depth-first search root of this SCC, its value of maxweight is correct for the whole SCC. Instead of recording each node in the SCC, we simply update its maxweight to r.maxweight.
Sort all nodes by weight in decreasing order and create the graph g' with all edges in E reversed (i.e. if there's an edge a -> b in g, there's an edge b -> a in g'). In this graph you can now propagate the maximum-value by simple DFS. Do this iteratively for all nodes and terminate when a maximum-weight has already been assigned.
As pseudocode:
dfs_assign_weight_reachable(node, weight):
if node.max_weight_reachable >= weight:
return
node.max_weight_reachable = weight
for n = neighbor of node:
dfs_assign_weight_reachable(n, weight)
g' = g with all edges reversed
nodes = nodes from g' sorted descendingly by weight
assign max_weight_reachable = -inf to each node in nodes
for node in nodes:
dfs_assign_weight_reachable(node, node.weight)
UPDATE:
The tight bound is O(b + a log a). a log a is caused by the sorting step. And each edge gets visited once during the reversal step and once during the assigning maximum weights, giving the second term in the max-expression.
Acknowledgement:
I'd like to thank #SerialLazer for the time invested in a discussion about the time-complexity of the above algorithm and helping me figure out the correct bound.

relation between degrees of vertices and edge removal

I'm looking for help to prove the next question:
given an undirected tree with n vertices with each one's degree <= 3,
(1) prove that there exists an edge that if we remove we'll have two trees with number of vertices in each one - maximum (2*n/3).
(2) suggest a linear algorithm that finds such an edge in the above given tree
Choose an arbitrary root. Do a post order traversal to compute the size of each subtree. By descending from the root via children with subtrees at least as large as their siblings, find a subtree of size between (n-1)/3 inclusive and 2(n-1)/3 + 1 exclusive (the degree bound keeps the size from decreasing by more than minus one divided by two). Sever its parent edge.

Detecting connectedness of nodes over a long time in a graph

I start out with a graph of N nodes with no edges.
Then I procede to take M predetermined steps.
At each step, I must either create an edge between two nodes, or delete an edge.
After each step, I must print out how many connected components there are in my graph.
Is there an algorithm for solving this problem in time linear with respect to M? If not, is there one better than O(min(M,N) * M) in the worst case?
EDIT:
The program does not get to decide what the M steps are.
I have to read from the input, whether I am supposed to create an edge or delete it, and also which edge I am supposed to create/delete.
So example input might be
N = 4
M = 4
JOIN 1 2
JOIN 2 3
DELETE 2 3
DELETE 1 2
Then my output should be
3 # (1 2) 3 4
2 # (1 2 3) 4
3 # (1 2) 3 4
4 # 1 2 3 4
There are ways to solve this problem fully online, but they're more complicated than this answer. The algorithm that I'm proposing is to maintain a spanning forest of the available edges, together with the number of components of the spanning forest (and hence the graph). If we were attacking this problem fully online, then this would be problematic, since a spanning forest edge might get deleted, leaving us to paw through the unused edges for a replacement. We know, however, how soon each edge currently in the graph will be deleted.
The particular spanning forest that we maintain is a maximum-weight spanning forest, where the weight of each edge is its deletion time. If an edge belonging to this spanning forest is deleted, then there is no replacement, since every other edge connecting the components represented by its endpoints either hasn't been inserted yet or, having lesser weight, has already been deleted.
There's a dynamic tree data structure, also referred to as a link/cut tree, due to Sleator and Tarjan, that can be made to provide the following operations in logarithmic time.
Link(u, v, w) - inserts an edge between u and v with weight w;
u and v must not be connected already
Cut(u, v) - cuts the edge between u and v, if it exists;
returns a boolean indicating whether an edge was removed
FindMin(u, v) - finds the minimum-weight edge on the path from u to v
and returns its endpoints and weight;
returns null if either u = v or u and v are not connected
To maintain the forest, when an edge from u to v is inserted, compare its removal time to the minimum on the path from u to v. If the minimum does not exist, then insert the edge. If the minimum is less than the new edge, delete the minimum and replace it with the new edge. Otherwise, do nothing. When an edge from u to v is deleted, attempt to delete it from the forest.
The running time of this approach is O(m log n). If you don't have a dynamic tree handy, then it admittedly will take quite a while to implement. Instead of using a proper dynamic tree, I've had success with a much simpler data structure that just stores the forest as a bunch of nodes with weights and parent pointers. The running time then is O(m d) where d is the maximum diameter of the graph, and if you're lucky, d is quite a lot less than n.

how to find all edge-disjoint equi-cost paths in an undirected graph between a pair of nodes?

Given an undirected graph G = (V, E), the edges in G have non-negative weights.
[1] how to find all the edge-disjoint and equal-cost paths between two nodes s and t ?
[2] how to find all the edge-disjoint paths between two nodes s and t ?
[3] how to find all the vertex-disjoint and equal-cost paths between two nodes s and t ?
[4] how to find all the vertex-disjoint paths between two nodes s and t ?
Any approximation algorithms instead ?
Build a tree where each node is a representation of a path through your unidirectional graph.
I know, a tree is a graph too, but in this answer I will use the following terms with this meanings:
vertex: is a vertex in your unidirectional graph. It is NOT a node in my tree.
edge: is an edge in your unidirectional graph. It is NOT part of my tree.
node: a vertex in my tree, not in your graph.
root: the sole node on top of my tree that has no parent.
leaf: any node in my tree that has no children.
I will not talk about edges in my tree, so there is no word for tree-edges. Every edge in this answer is part of your graph.
Algorithm to build the tree
The root of the tree is the representation of a path that contains only of the vertex s and contains no edge. Let its weight be 0.
Do for every node in that tree:
Take the end-vertex of the path represented by that node (I call it the actual node) and find all edges that lead away from that end-vertex.
If: there are no edges that lead away from this vertex, you reached a dead end. This path never will lead to vertex t. So mark this node as a leaf and give it an infinite weight.
Else:
For each of those edges:
add a child-node to the actual node. Let it be a copy of the actual node. Attach the edge to the end of path and then attach the edges end-vertex to the path too. (so each path follows the pattern vertex-edge-vertex-edge-vertex-...)
Now traverse up in the tree, back to the root and check if any of the predecessors has an end-vertex that is identic with the just added end-vertex.
If you have a match, the newly generated node is the representation of a path that contains a loop. Mark this node as a leaf and give it an infinite weight.
Else If there is no loop, just add the newly added edges weight to the nodes weight.
Now test, if the end-vertex of the newly generated node is the vertex t.
If it really is, mark this node as a leaf, since the path represented by this node is a valid path from s to t.
This algorithm always comes to an end in finite time. At the end you have 3 types of leafs in your tree:
nodes that represent dead ends with an infinite weight
nodes that represent loops, also with an infinite weight
nodes that represent a valid path from s to t, with its weight beeing the sum of all edges weights that are part of this path.
Each path represented by a leaf has its individual sequence of edges (but might contain the same sequence of vertexes), so the leafs with finite weights represent the complete set of edge-disjoint pathes from s to t. This is the solution of exercise [2].
Now do for all leafs with finite weight:
Write its weight and its path into a list. Sort the list by the weights. Now paths with identic weight are neighbours in the list, and you can find and extract all groups of equal-cost paths in this sorted list. This is the solution of exercise [1].
Now, do for each entry in this list:
add to each path in this list the list of its vertexes. After you have done this, you have a table with this columns:
weight
path
1st vertex (is always s)
2nd vertex
3rd vertex
...
Sort this table lexigraphic by the vertexes and after all vertexes by the weight (sort by 1st vertex, 2nd vertex, 3rd vertex ,... ,weight)
If one row in this sorted table has the same sequence of vertexes as the row before, then delete this row.
This is the list of all vertex-disjoint paths between two nodes s and t, and so it is the solution of exercise [4].
And in this list you find all equal-cost paths as neighbours, so you can easily extract all groups of vertex-disjoint and equal-cost paths between two nodes s and t from that list, so here you have the solution of exercise [3].

Resources