I've already looked through my book many times, but I'm confused about one definition. In my data structures and algorithms book, I have the following definition:
A node u is an ancestor of a node v if u = v or u is an ancestor of the parent of v. Conversely, we say that a node v is a descendant of a node u if u is an ancestor of v.
What is the first part of the definition saying? Does it mean that a node only has two ancestors, itself (u = v) and the parent of its parent (the parent of v)?
A node u is an ancestor of a node v if u = v or u is an ancestor of the parent of v.
This is a recursive definition, and means that the ancestors of a node are the node itself together with the node's parent and all the parent's ancestors.
The fact that the parent's ancestors are defined by the same definition is what makes it recursive, and what makes, for example, the parent's parent also an ancestor.
Perhaps a walk-through of a structure applying this definition will help.
Consider the tree:
      A
     / \
    B   C
   /   / \
  D   E   F
     /
    G
If we want to find all ancestors of node G, we apply the definition:
G itself is an ancestor, and as E is G's parent, all of E's ancestors are also ancestors of G.
This means in order to find all ancestors of G we must find all ancestors of E.
Again, by the definition E itself is an ancestor of E and therefore also of G, as are all ancestors of E's parent, C.
So we have to find all ancestors of C!
Again applying the definition, C is an ancestor of itself, and thus of both E and G. And the ancestors of C's parent A are as well.
So we have to find all ancestors of A!
Well, by the definition A is its own ancestor, and as it has no parent, it has no other ancestors.
So then, having reached a terminating condition, the recursion sends us back through the stack for the result ...
C has the ancestors C (itself) and A (all the parent A's ancestors).
E has the ancestors E (itself) and C, A (all the parent C's ancestors).
G has the ancestors G (itself) and E, C, A (all the parent E's ancestors).
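The recursive definition translates almost line for line into code. A minimal sketch, assuming nodes carry a parent pointer (the Node type and names are my own illustration, not from the book):

```cpp
#include <string>
#include <vector>

// Illustrative node type: each node stores its label and a pointer to
// its parent (nullptr for the root).
struct Node {
    std::string label;
    Node* parent;
};

// Mirrors the definition: the ancestors of v are v itself (the u = v case)
// plus all ancestors of v's parent (the recursive case).
std::vector<std::string> ancestors(const Node* v) {
    if (v == nullptr) return {};                      // no parent left: recursion stops
    std::vector<std::string> result{v->label};        // v is its own ancestor
    for (const std::string& a : ancestors(v->parent)) // recurse on the parent
        result.push_back(a);
    return result;
}
```

For node G in the tree above this returns {G, E, C, A}, matching the walk-through.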
I hope this makes it more clear, but, as they say, "In order to understand recursion, you must first understand recursion."
Can someone help me: how could I traverse a balanced binary tree in order without recursion, a stack, or Morris traversal? I want to traverse it iteratively without modifying the tree.
Thank you so much.
In the case where there are no duplicate keys, this corresponds to the tree representing a set (or map). In that case, a backtracking approach will be O(log n) per key (by the AVL tree height property). One could get a faster run time by storing the nodes (as recursion does), but often this is infeasible.
To find the next node, descend from the root using the current node's key as the target; whenever you take a left branch, first record that node as the ancestor (before descending).
There are three cases: current is null -> next = root; current has a right child -> next = current.right; current has no right child -> next = ancestor (if no ancestor was recorded, the traversal is finished).
In the first two cases, then descend left from next until there is no left child.
I'll use the example from the Wikipedia AVL tree article, without the balance factors (this works on any binary search tree, but performance is only guaranteed when it is balanced).
current | path          | result
--------+---------------+-------
null    | root J        | C
C       | ancestor D    | D
D       | ancestor F    | F
F       | right G       | G
G       | ancestor J    | J
J       | right P       | L
L       | right N       | N
N       | ancestor P    | P
P       | right V       | Q
Q       | ancestor S    | S
S       | right U       | U
U       | ancestor V    | V
V       | right X       | X
X       | ancestor null | null
If the tree can have duplicate entries (a multiset), this will not work, because it relies on the keys being unique.
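Putting the whole procedure together (the Node type and function names below are mine; it assumes a binary search tree with unique keys and a known root):

```cpp
#include <string>

// Illustrative BST node; keys must be unique and the root known.
struct Node {
    char key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// In-order successor of `current` (leftmost node if current is null),
// using key comparisons only: no stack, no recursion, no modification.
// O(height) per step, so O(log n) per key on a balanced tree.
Node* successor(Node* root, Node* current) {
    Node* next;
    if (current == nullptr) {
        next = root;                        // first call: start at the root
    } else if (current->right != nullptr) {
        next = current->right;              // case 2: step right once...
    } else {
        // Case 3: re-descend from the root toward current's key; the last
        // node where we turned left is the "ancestor", i.e. the successor.
        Node* ancestor = nullptr;
        for (Node* n = root; n != current; ) {
            if (current->key < n->key) { ancestor = n; n = n->left; }
            else n = n->right;
        }
        return ancestor;                    // null: traversal finished
    }
    while (next != nullptr && next->left != nullptr)
        next = next->left;                  // ...then left as far as possible
    return next;
}

std::string inOrder(Node* root) {
    std::string out;
    for (Node* n = successor(root, nullptr); n != nullptr; n = successor(root, n))
        out += n->key;
    return out;
}
```

The table above is exactly the trace of successor on that tree.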
I have a list of tree nodes, let's say (a b c d). Node b is a parent of some node e, and e is a parent of c. I want to write an algorithm to remove nodes like c from the list.
Any suggestions other than iterating over all parents of each node and comparing with each element?
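For what it's worth, the parent-walk you describe can at least avoid comparing each node against every other list element: put the list in a hash set first, then each node walks up its parent chain once. A sketch, with an illustrative node type (not from your code):

```cpp
#include <unordered_set>
#include <vector>

// Illustrative node type: each node knows its parent (nullptr for the root).
struct TNode {
    TNode* parent = nullptr;
};

// Keep only the nodes that have no proper ancestor in the list.
// O(n * h) for n nodes in a tree of height h, with O(1) set lookups.
std::vector<TNode*> dropDescendants(const std::vector<TNode*>& nodes) {
    std::unordered_set<const TNode*> inList(nodes.begin(), nodes.end());
    std::vector<TNode*> kept;
    for (TNode* n : nodes) {
        bool covered = false;  // does some ancestor of n appear in the list?
        for (const TNode* p = n->parent; p != nullptr; p = p->parent)
            if (inList.count(p)) { covered = true; break; }
        if (!covered) kept.push_back(n);
    }
    return kept;
}
```

With your example (b is an ancestor of c via e), the list (a b c d) filters to (a b d). Doing better than walking the parent chains would need extra preprocessing, e.g. a DFS numbering of the whole tree.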
I am reading about an algorithm for twig pattern matching, the TJFast algorithm.
There is a function dbl(n); the parameter n is a node, and the function returns the direct branching or leaf nodes, but I cannot understand it. The article is "From region encoding to extended dewey: On efficient processing of XML twig pattern matching". There is an example, but it is vague to me.
Based on the definition in the article:
dbl(v) (for direct branching or leaf node) returns the set of all branching nodes b and leaf nodes f in the twig rooted at v such that there is no branching node along the path from v to b or f, excluding v, b or f.
example :
dbl(a)={b,c}
dbl(c)={f,g}
I cannot understand why dbl(c) = {f,g}?
dbl (direct branching or leaf nodes) only contains branching nodes and leaf nodes. Among those nodes, it only contains the ones with no intermediate branching node between them and v (the root of the twig).
It's surprisingly hard to find the definition for a branching node, but it appears to be a node that has more than one child. d and e are not branching nodes, because they only have one child. Therefore, they cannot be part of dbl(c).
Then, the path from c to f has no branching nodes, so f is in dbl(c). Likewise, the path from c to g has no branching nodes, so g is in dbl(c).
So we have:
dbl(c) = {f,g}
I'm guessing they probably use dbl to represent subqueries.
dbl(a) = {b,c}, because b is a leaf node, c is a branching node, and both of them are descendants of a. In addition, note that there is no other branching node or leaf node which is a descendant of a and an ancestor of b (or c).
dbl(c) = {f,g}, because f and g are leaf nodes and both of them are descendants of c. In addition, note that there is no other branching node or leaf node which is a descendant of c and an ancestor of f (or g).
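To make the definition concrete, here is a small sketch computing dbl over a toy tree. The tree shape below is my reconstruction, chosen to be consistent with the example (the article's actual figure may differ):

```cpp
#include <map>
#include <set>
#include <vector>

// A toy tree: label -> children.
// Reconstructed shape: a -> b, c;  c -> d, e;  d -> f;  e -> g
using Tree = std::map<char, std::vector<char>>;

bool isBranching(const Tree& t, char v) {
    auto it = t.find(v);
    return it != t.end() && it->second.size() > 1;  // more than one child
}

bool isLeaf(const Tree& t, char v) {
    auto it = t.find(v);
    return it == t.end() || it->second.empty();     // no children at all
}

// dbl(v): all branching or leaf nodes in the twig rooted at v such that
// no branching node lies strictly between v and them.
std::set<char> dbl(const Tree& t, char v) {
    std::set<char> out;
    auto it = t.find(v);
    if (it == t.end()) return out;  // v is a leaf: nothing below it
    for (char c : it->second) {
        if (isBranching(t, c) || isLeaf(t, c)) {
            out.insert(c);  // c qualifies; a branching c also blocks deeper nodes
        } else {
            auto deeper = dbl(t, c);  // pass through a non-branching internal node
            out.insert(deeper.begin(), deeper.end());
        }
    }
    return out;
}
```

On this tree, d and e are single-child nodes, so dbl passes straight through them, giving dbl(c) = {f,g} and dbl(a) = {b,c} as in the article's example.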
I'm having some trouble trying to represent and manipulate dependency graphs in this scenario:
a node has some dependencies that have to be resolved
no path may contain dependency loops (i.e. the graph must be a DAG)
every dependency could be satisfied by more than one node
I start from the target node and recursively look for its dependencies, but I have to maintain the above properties, in particular the third one.
Just a little example here:
I would like to have a graph like the following one
           (A)
          /   \
         /     \
        /       \
[(B),(C),(D)]   (E)
   / \      \
  /   \     (H)
(F)   (G)
which means:
F,G,C,H,E have no dependencies
D depends on H
B depends on F and G
A depends on E, and on one of B, C, or D
So, if I write down all the possible topological-sorted paths to A I should have:
E -> F -> G -> B -> A
E -> C -> A
E -> H -> D -> A
How can I model a graph with these properties? Which kind of data structure is the more suitable to do that?
You should use a normal adjacency list with an additional property, wherein a node knows the other nodes that would also satisfy the same dependency. This means that B, C, D should all know that they belong to the same equivalence class. You can achieve this by inserting them all into a set.
Node:
    List<Node> adjacencyList
    Set<Node> equivalentDependencies
To use this data structure in a topological sort, whenever you remove a source and all its outgoing edges, also remove the nodes in its equivalence class and their outgoing edges, and recursively remove the nodes that point to them.
From Wikipedia:
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    add n to tail of L
    for each node j in the equivalence class of n do   <=== removing equivalent dependencies
        remove j from S
        for each node k with an edge e from j to k do
            remove edge e from the graph
            if k has no other incoming edges then
                insert k into S
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else
    return L (a topologically sorted order)
This algorithm will give you one of the modified topologically-sorted orders.
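Here is a sketch of that structure and the modified sort in code (all names are illustrative; for brevity it omits the cycle check and the recursive pruning of nodes that only fed a skipped equivalent):

```cpp
#include <map>
#include <set>
#include <vector>

// Edges run from a dependency to the node that requires it; `equiv`
// maps a node to its equivalence class. Names are illustrative.
struct OrGraph {
    std::map<char, std::set<char>> out, in, equiv;
    void addEdge(char u, char v) { out[u].insert(v); in[v].insert(u); }
};

std::vector<char> topoSort(OrGraph g) {  // by value: the sort consumes g
    std::set<char> nodes, S, done;
    for (const auto& p : g.out) {
        nodes.insert(p.first);
        nodes.insert(p.second.begin(), p.second.end());
    }
    for (char v : nodes)
        if (g.in[v].empty()) S.insert(v);  // initial sources

    std::vector<char> L;
    auto dropEdges = [&](char x) {  // removing edges may expose new sources
        for (char k : g.out[x]) {
            g.in[k].erase(x);
            if (g.in[k].empty() && !done.count(k)) S.insert(k);
        }
        g.out[x].clear();
    };
    while (!S.empty()) {
        char n = *S.begin();
        S.erase(S.begin());
        L.push_back(n);
        done.insert(n);
        for (char o : g.equiv[n]) {  // n satisfies its whole class
            if (done.count(o)) continue;
            done.insert(o);
            S.erase(o);              // o no longer needs to be emitted
            dropEdges(o);
        }
        dropEdges(n);
    }
    return L;
}
```

Running it on the example graph above (equivalence class {B, C, D}) yields C, E, A, F, G, H: exactly one member of the class is emitted; F and G still appear because the pruning step is omitted.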
I am having a tough time understanding Tarjan's lowest common ancestor algorithm. Can somebody explain it with an example?
I am stuck after the DFS search, what exactly does the algorithm do?
My explanation will be based on the Wikipedia link posted above :).
I assume that you already know about the disjoint-set union structure used in the algorithm.
(If not, please read about it; you can find it in "Introduction to Algorithms".)
The basic idea is that while the algorithm is visiting a node x, x is recorded as the ancestor of all the descendants it has visited so far.
So to find the least common ancestor (LCA) r of two nodes (u, v), there are two cases:
Node u is a descendant of node v (or vice versa); this case is obvious.
Node u is in the ith branch and v is in the jth branch (i < j) of node r. After visiting node u, the algorithm backtracks to node r, which is the ancestor of the two nodes, marks the ancestor of u as r, and goes on to visit node v.
At the moment it visits node v, since u has already been marked as visited (black), the answer will be r. Hope you get it!
I will explain using the code from CP-Algorithms:
#include <iostream>
#include <utility>
#include <vector>
using namespace std;

// Globals and standard DSU helpers the snippet relies on (declared here for completeness):
vector<vector<int>> adj, queries;
vector<int> parent, dsu_rank, ancestor;
vector<bool> visited;

int find_set(int v) { return v == parent[v] ? v : parent[v] = find_set(parent[v]); }

void union_sets(int a, int b) {
    a = find_set(a), b = find_set(b);
    if (a == b) return;
    if (dsu_rank[a] < dsu_rank[b]) swap(a, b);  // union by rank
    parent[b] = a;
    if (dsu_rank[a] == dsu_rank[b]) dsu_rank[a]++;
}

void dfs(int v)
{
    visited[v] = true;
    ancestor[v] = v;
    for (int u : adj[v]) {
        if (!visited[u]) {
            dfs(u);
            union_sets(v, u);
            ancestor[find_set(v)] = v;
        }
    }
    for (int other_node : queries[v]) {
        if (visited[other_node])
            cout << "LCA of " << v << " and " << other_node
                 << " is " << ancestor[find_set(other_node)] << ".\n";
    }
}
Let's outline a proof of the algorithm.
Lemma 1: For each vertex v and its parent p, after we visit v from p and union v with p, p and all vertices in the subtree rooted at v (i.e. p and all descendants of v, including v) will be in one disjoint set represented by p (i.e. ancestor[root of the disjoint set] is p).
Proof: Suppose the tree has height h. Proceed by induction on vertex height, starting from the leaf nodes.
Lemma 2: For each vertex v, right before we mark it as visited, the following statements are true:
1. Each of v's ancestors pi will be in a disjoint set that contains precisely pi and all vertices in the subtrees of pi that pi has already finished visiting.
2. Every visited vertex so far is in one of these disjoint sets.
Proof: We proceed by induction. The statement is vacuously true for the root (the only vertex of height 0) as it has no ancestors. Now suppose the statement holds for every vertex of height k for k ≥ 0, and suppose v is a vertex of height k + 1. Let p be v's parent. Before p visits v, suppose it has already visited its children c1, c2, ..., cn. By Lemma 1, p and all vertices in the subtrees rooted at c1, c2, ..., cn are in one disjoint set represented by p. Furthermore, all newly visited vertices after we visited p are the vertices in this disjoint set. Since p is of height k, we can use the induction hypothesis to conclude that v indeed satisfies 1 and 2.
We are now ready to prove the algorithm.
Claim: For each query (u,v), the algorithm outputs the lowest common ancestor of u and v.
Proof: Without loss of generality suppose we visit u before we visit v in the DFS. Then either v is a descendant of u or not.
If v is a descendant of u, by Lemma 1 we know that u and v are in one disjoint set that is represented by u, which means ancestor[find_set(v)] is u, the correct answer.
If v is not a descendant of u, then by Lemma 2 we know that u must be in one of the disjoint sets, each of them represented by an ancestor of v at the time we mark v. Let p be the representative vertex of the disjoint set u is in. By Lemma 2 we know p is an ancestor of v, and u is in a visited subtree of p and therefore a descendant of p. These facts do not change while we visit the rest of v's children, so p is indeed a common ancestor of u and v. To see that p is the lowest common ancestor, suppose q is the child of p of which v is a descendant (i.e. if we travel back to the root from v, q is the last vertex before we reach p; q can be v). Suppose for contradiction that u is also a descendant of q. Then by Lemma 2, u is in both the disjoint set represented by p and the disjoint set represented by q, so this disjoint set contains two of v's ancestors, a contradiction.