Breadth First Vs Depth First - algorithm

When traversing a tree/graph, what is the difference between breadth-first and depth-first traversal? Any code or pseudocode examples would be great.

These two terms differentiate between two different ways of walking a tree.
It is probably easiest just to exhibit the difference. Consider the tree:
    A
   / \
  B   C
 /   / \
D   E   F
A depth-first traversal would visit the nodes in this order:
A, B, D, C, E, F
Notice that you go all the way down one leg before moving on.
A breadth-first traversal would visit the nodes in this order:
A, B, C, D, E, F
Here we work all the way across each level before going down.
(Note that there is some ambiguity in the traversal orders, and I've cheated to maintain the "reading" order at each level of the tree. In either case I could get to B before or after C, and likewise I could get to E before or after F. This may or may not matter, depending on your application...)
Both kinds of traversal can be achieved with the pseudocode:
Store the root node in Container
While (there are nodes in Container)
    N = Get the "next" node from Container
    Store all the children of N in Container
    Do some work on N
The difference between the two traversal orders lies in the choice of Container.
For depth first use a stack. (The recursive implementation uses the call-stack...)
For breadth-first use a queue.
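To make the choice of container concrete, here is a minimal Java sketch (the Node class and its fields are illustrative assumptions, not from the answer above). The same loop performs both traversals; only the end of the deque we take the "next" node from changes.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class Node {
    String payload;
    List<Node> children;
    Node(String payload, List<Node> children) {
        this.payload = payload;
        this.children = children;
    }
}

class Traversal {
    // One loop, two traversals: only the choice of "next" node differs.
    static void traverse(Node root, boolean depthFirst) {
        Deque<Node> container = new ArrayDeque<>();
        container.addLast(root);
        while (!container.isEmpty()) {
            // LIFO (stack) gives depth-first; FIFO (queue) gives breadth-first.
            Node n = depthFirst ? container.pollLast() : container.pollFirst();
            for (Node child : n.children) {
                container.addLast(child);
            }
            System.out.println(n.payload); // do some work on N
        }
    }
}

On the example tree, the stack version prints A, C, F, E, B, D (one of the legal depth-first orders allowed by the ambiguity note above) and the queue version prints A, B, C, D, E, F.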
The recursive implementation looks like
ProcessNode(Node)
    Work on the payload Node
    Foreach child of Node
        ProcessNode(child)
    /* Alternate time to work on the payload Node (see below) */
The recursion ends when you reach a node that has no children, so it is guaranteed to end for finite, acyclic graphs.
At this point, I've still cheated a little. With a little cleverness you can also work on the nodes in this order:
D, B, E, F, C, A
which is a variation of depth-first, where I don't do the work at each node until I'm walking back up the tree. I have however visited the higher nodes on the way down to find their children.
This traversal is fairly natural in the recursive implementation (use the "Alternate time" line above instead of the first "Work" line), and not too hard if you use an explicit stack, but I'll leave it as an exercise.

Understanding the terms:
The words breadth and depth describe the two behaviors: breadth-first spreads out across a whole level before descending, while depth-first follows one path as deep as it can before backtracking.
Depth-First Search:
The depth-first search algorithm acts as if it wants to get as far away from the starting point as quickly as possible. It generally uses a stack to remember where it should go when it reaches a dead end.
Rules to follow: Push the starting vertex A onto the stack, then:
1. If possible, visit an adjacent unvisited vertex, mark it as visited, and push it on the stack.
2. If you can't follow Rule 1, then, if possible, pop a vertex off the stack.
3. If you can't follow Rule 1 or Rule 2, you're done.
Java code:
public void searchDepthFirst() {
    // Begin at vertex 0 (A)
    vertexList[0].wasVisited = true;
    displayVertex(0);
    stack.push(0);
    while (!stack.isEmpty()) {
        int adjacentVertex = getAdjUnvisitedVertex(stack.peek());
        // If no such vertex, back up by popping
        if (adjacentVertex == -1) {
            stack.pop();
        } else {
            vertexList[adjacentVertex].wasVisited = true;
            displayVertex(adjacentVertex); // do something with the vertex
            stack.push(adjacentVertex);
        }
    }
    // Stack is empty, so we're done; reset flags
    for (int j = 0; j < nVerts; j++)
        vertexList[j].wasVisited = false;
}
Applications: Depth-first searches are often used in simulations of games (and game-like situations in the real world). In a typical game you can choose one of several possible actions. Each choice leads to further choices, each of which leads to further choices, and so on into an ever-expanding tree-shaped graph of possibilities.
Breadth-First Search:
The breadth-first search algorithm likes to stay as close as possible to the starting point. This kind of search is generally implemented using a queue.
Rules to follow: Make the starting vertex A the current vertex, then:
1. Visit the next unvisited vertex (if there is one) that's adjacent to the current vertex, mark it, and insert it into the queue.
2. If you can't carry out Rule 1 because there are no more unvisited vertices, remove a vertex from the queue (if possible) and make it the current vertex.
3. If you can't carry out Rule 2 because the queue is empty, you're done.
Java code:
public void searchBreadthFirst() {
    vertexList[0].wasVisited = true;
    displayVertex(0);
    queue.insert(0);
    int v2;
    while (!queue.isEmpty()) {
        int v1 = queue.remove();
        // Until it has no unvisited neighbors, get one
        while ((v2 = getAdjUnvisitedVertex(v1)) != -1) {
            vertexList[v2].wasVisited = true;
            displayVertex(v2); // do something with the vertex
            queue.insert(v2);
        }
    }
    // Queue is empty, so we're done; reset flags
    for (int j = 0; j < nVerts; j++)
        vertexList[j].wasVisited = false;
}
Applications: Breadth-first search first finds all the vertices that are one edge away from the starting point, then all the vertices that are two edges away, and so on. This is useful if you’re trying to find the shortest path from the starting vertex to a given vertex.
Hopefully that is enough for understanding breadth-first and depth-first searches. For further reading, I would recommend the Graphs chapter from Robert Lafore's excellent data structures book.

Given this binary tree:

          G
         / \
        D   I
       / \ / \
      B  E H  K
     / \  \
    A   C  F
Breadth First Traversal:
Traverse across each level from left to right.
"I'm G, my kids are D and I, my grandkids are B, E, H and K, their grandkids are A, C, F"
- Level 1: G
- Level 2: D, I
- Level 3: B, E, H, K
- Level 4: A, C, F
Order Searched: G, D, I, B, E, H, K, A, C, F
Depth First Traversal:
Traversal is not done ACROSS entire levels at a time. Instead, traversal dives into the DEPTH (from root to leaf) of the tree first. However, it's a bit more complex than simply up and down.
There are three methods:
1) PREORDER: ROOT, LEFT, RIGHT.
You need to think of this as a recursive process:
Grab the Root. (G)
Then Check the Left. (It's a tree)
Grab the Root of the Left. (D)
Then Check the Left of D. (It's a tree)
Grab the Root of the Left (B)
Then Check the Left of B. (A)
Check the Right of B. (C, and it's a leaf node. Finish B tree. Continue D tree)
Check the Right of D. (It's a tree)
Grab the Root. (E)
Check the Left of E. (Nothing)
Check the Right of E. (F, Finish D Tree. Move back to G Tree)
Check the Right of G. (It's a tree)
Grab the Root of the I Tree. (I)
Check the Left. (H, it's a leaf.)
Check the Right. (K, it's a leaf. Finish G tree)
DONE: G, D, B, A, C, E, F, I, H, K
2) INORDER: LEFT, ROOT, RIGHT
Where the root is "in" or between the left and right child node.
Check the Left of the G Tree. (It's a D Tree)
Check the Left of the D Tree. (It's a B Tree)
Check the Left of the B Tree. (A)
Check the Root of the B Tree (B)
Check the Right of the B Tree (C, finished B Tree!)
Check the Right of the D Tree. (It's an E Tree)
Check the Left of the E Tree. (Nothing)
Check the Right of the E Tree. (F, it's a leaf. Finish E Tree. Finish D Tree)...
Onwards until...
DONE: A, B, C, D, E, F, G, H, I, K
3) POSTORDER: LEFT, RIGHT, ROOT
DONE: A, C, B, F, E, D, H, K, I, G
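To tie the three orders to code, here is a minimal recursive Java sketch (the TreeNode class is an assumption for illustration). The only difference between the three methods is where the visit happens relative to the two recursive calls; on the tree above they print exactly the three DONE sequences listed.

class TreeNode {
    char value;
    TreeNode left, right;
    TreeNode(char value) { this.value = value; }
}

class DepthFirstOrders {
    static void preorder(TreeNode n) {  // ROOT, LEFT, RIGHT
        if (n == null) return;
        System.out.print(n.value + " ");
        preorder(n.left);
        preorder(n.right);
    }
    static void inorder(TreeNode n) {   // LEFT, ROOT, RIGHT
        if (n == null) return;
        inorder(n.left);
        System.out.print(n.value + " ");
        inorder(n.right);
    }
    static void postorder(TreeNode n) { // LEFT, RIGHT, ROOT
        if (n == null) return;
        postorder(n.left);
        postorder(n.right);
        System.out.print(n.value + " ");
    }
}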
Usage (aka, why do we care):
I really enjoyed this simple Quora explanation of the Depth First Traversal methods and how they are commonly used:
"In-Order Traversal will print values [in order for the BST (binary search tree)]"
"Pre-order traversal is used to create a copy of the [binary search tree]."
"Postorder traversal is used to delete the [binary search tree]."
https://www.quora.com/What-is-the-use-of-pre-order-and-post-order-traversal-of-binary-trees-in-computing

I think it would be interesting to write both of them in a way where switching only a few lines of code gives you one algorithm or the other, so that you will see that the dilemma is not as strong as it seems at first.
I personally like the interpretation of BFS as flooding a landscape: the low-altitude areas will be flooded first, and only then would the high-altitude areas follow. If you imagine the landscape altitudes as isolines, as we see in geography books, it's easy to see that BFS fills all the area under the same isoline at the same time, just as this would happen in physics. Thus, interpreting altitudes as distance or scaled cost gives a pretty intuitive idea of the algorithm.
With this in mind, you can adapt the idea behind breadth-first search to find the minimum spanning tree, shortest paths, and many other minimization algorithms.
I haven't seen any equally intuitive interpretation of DFS yet (only the standard one about the maze, but it isn't as powerful as the BFS/flooding one), so for me BFS seems to correlate better with physical phenomena as described above, while DFS correlates better with a choice dilemma in rational systems (i.e., people or computers deciding which move to make in a chess game, or how to get out of a maze).
So, for me, the difference between the two lies in which natural phenomenon best matches their propagation model (traversal) in real life.

Related

data structure for representing non square board game

I am trying to design a board game which is not square. There are 2 different types of pieces (attackers & defenders). Both can move to any adjacent free intersection. An attacker can also jump over a defender if the adjacent intersection beyond it, in the same line, is empty. Considering these cases, I thought of storing the board as an array.
But this is not the right choice, as I would need to hard-code the attackable positions for each index. I need your suggestions for designing this board.
One more option is using a graph and maintaining the direction of each node (left, right, top, down), but this would require 3 "down" nodes on the top vertex of the board.
The two times I had to do this, I created a play_line data type. I had the canonical graph with nodes and edges; a play_line is a sequence of edges.
A legal move to a free, adjacent intersection is a trivial property of the graph alone. A piece A at node m can move to any node n where:
- edge (m, n) exists
- node n is empty
A jump from m, over n, to p exists where:
- edge (m, n) exists
- edge (n, p) exists
- node n contains a piece D (defender)
- node p is empty
- along the play_line containing edge (m, n), (n, p) is the next edge on that line
Does that help?
Update after OP comment
There is nothing to maintain for the play_line objects, as the lines of play do not change once initialized. They are hard-coded from the game board, an enhancement of the graph. For instance, if the board is labeled

        a
  b  c  d  e  f
  g  h  i  j  k
  l  m  n  o  p
     q  r  s

then the first full row is a line of play containing five nodes in order, [b, c, d, e, f]. There are corresponding graph edges (correct by construction) of (b, c), (c, d), (d, e), (e, f). Note that your code must be able to traverse this in either direction, or you must make a second play_line in the reverse order.
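As a rough Java sketch of the idea (all names here, PlayLine, BoardNode, jumpTarget, are hypothetical, not from the original answer): a line of play is just an ordered list of nodes, and the jump rule reads two steps ahead along that line.

import java.util.List;

class BoardNode {
    char label;
    boolean empty = true;
    char piece; // 'A' for attacker, 'D' for defender when not empty
}

// A line of play: an ordered sequence of nodes; consecutive pairs are graph edges.
class PlayLine {
    List<BoardNode> nodes;
    PlayLine(List<BoardNode> nodes) { this.nodes = nodes; }

    // If an attacker at 'from' can jump along this line, return the landing node, else null.
    BoardNode jumpTarget(BoardNode from) {
        int i = nodes.indexOf(from);
        if (i < 0 || i + 2 >= nodes.size()) return null;
        BoardNode over = nodes.get(i + 1);
        BoardNode land = nodes.get(i + 2);
        // jump rule: defender on the adjacent intersection, empty landing point beyond it
        if (!over.empty && over.piece == 'D' && land.empty) return land;
        return null;
    }
}

Note that this sketch only scans forward along the line, which is exactly why the answer says you must either traverse in both directions or keep a second play_line in reverse order.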

Counting p-cousins on a directed tree

We're given a directed tree to work with. We define the concepts of p-ancestor and p-cousin as follows:
p-ancestor: A node is a 1-ancestor of another if it is its parent. It is the p-ancestor of a node if it is the parent of that node's (p-1)-ancestor.
p-cousin: A node is the p-cousin of another if they share the same p-ancestor.
For example, consider the tree below (diagram not shown):
4 has three 1-cousins, i.e., 3, 4 and 5, since they all share the common 1-ancestor, which is 1.
For a particular tree, the problem is as follows. You are given multiple pairs of (node,p) and are supposed to count (and output) the number of p-cousins of the corresponding nodes.
A slow algorithm would be to crawl up to the p-ancestor and run a BFS for each node.
What is the (asymptotically) fastest way to solve the problem?
If an off-line solution is acceptable, two depth-first searches can do the job.
Assume that we can index all of those n queries (node, p) from 0 to n - 1.
We can convert each query (node, p) into another type of query (ancestor, p) as follows:
The answer for a query (node, p), where node is at level a (the distance from the root to the node is a), is the number of descendants at level a of the ancestor at level a - p. So, for each query, we can find who that ancestor is:
Pseudo code
dfs(int node, int level, int[] path, int[] ancestorForQuery, List<Query>[] data){
    path[level] = node;
    visit all child nodes;
    for(Query query : data[node])
        if(query.p <= level)
            ancestorForQuery[query.index] = path[level - query.p];
}
Now, after the first DFS, instead of the original queries, we have a new type of query (ancestor, p).
Assume that we have an array count, which at index i stores the number of nodes at level i. For a node a at level x whose p-descendants we need to count, the result of the query is:
query result = count[x + p] after we visit a - count[x + p] before we visit a
Pseudo code
dfs2(int node, int level, int[] result, int[] count, List<TransformedQuery>[] data){
    count[level]++;
    for(TransformedQuery query : data[node]){
        result[query.index] -= count[level + query.p];
    }
    visit all child nodes;
    for(TransformedQuery query : data[node]){
        result[query.index] += count[level + query.p];
    }
}
The result of each query is stored in the result array.
If p is fixed, I suggest the following algorithm:
Let's say that count[v] is the number of p-children of v; initially all count[v] are set to 0. And pparent[v] is the p-parent of v.
Let's now run a DFS on the tree and keep a stack of visited nodes, i.e., when we visit some v, we push it onto the stack; once we leave v, we pop.
Suppose we've come to some node v in our DFS. Let's do count[stack[size - p]]++, indicating that v is a p-child of stack[size - p]. Also set pparent[v] = stack[size - p].
Once your DFS is finished, you can calculate the desired number of p-cousins of v like this:
count[pparent[v]]
The complexity of this is O(n) for the DFS and O(1) for each of the m queries, i.e., O(n + m) overall.
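A rough Java sketch of this fixed-p scheme (array handling and initialization are my assumptions; pparent is assumed pre-filled with -1). The "stack of visited nodes" is just the path from the root to the current node:

import java.util.List;

class FixedPCousins {
    List<Integer>[] children; // children[v] = v's child nodes
    int[] count;              // count[v] = number of p-children of v
    int[] pparent;            // pparent[v] = p-ancestor of v, or -1 if none
    int[] path;               // path[d] = node at depth d on the current root path
    int p;                    // the fixed p

    void dfs(int v, int depth) {
        path[depth] = v;               // "push" v onto the stack of visited nodes
        if (depth >= p) {
            int anc = path[depth - p]; // v's p-ancestor sits p levels up the path
            count[anc]++;
            pparent[v] = anc;
        }
        for (int c : children[v]) dfs(c, depth + 1); // returning "pops" v implicitly
    }

    // p-cousins of v = nodes sharing v's p-ancestor (v itself included, as in the example)
    int query(int v) {
        return pparent[v] >= 0 ? count[pparent[v]] : 0;
    }
}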
First I'll describe a fairly simple way to answer each query in O(p) time that uses O(n) preprocessing time and space, and then mention a way that query times can be sped up to O(log p) time for a factor of just O(log n) extra preprocessing time and space.
O(p)-time query algorithm
The basic idea is that if we write out the sequence of nodes visited during a DFS traversal of the tree in such a way that every node is written out at a vertical position corresponding to its level in the tree, then the set of p-cousins of a node form a horizontal interval in this diagram. Note that this "writing out" looks very much like a typical tree diagram, except without lines connecting nodes, and (if a postorder traversal is used; preorder would be just as good) parent nodes always appearing to the right of their children. So given a query (v, p), what we will do is essentially:
Find the p-th ancestor u of the given node v. Naively this takes O(p) time.
Find the p-th left-descendant l of u -- that is, the node you reach after repeating the process of visiting the leftmost child of the current node, p times. Naively this takes O(p) time.
Find the p-th right-descendant r of u (defined similarly). Naively this takes O(p) time.
Return the value x[r] - x[l] + 1, where x[i] is a precalculated value that records the number of nodes in the sequence described above that are at the same level as, and at or to the left of, node i. This takes constant time.
The preprocessing step is where we calculate x[i], for each 1 <= i <= n. This is accomplished by performing a DFS that builds up a second array y[] that records the number y[d] of nodes visited so far at depth d. Specifically, y[d] is initially 0 for each d; during the DFS, when we visit a node v at depth d, we simply increment y[d] and then set x[v] = y[d].
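A small Java sketch of that preprocessing step (field names are mine, not from the answer): a postorder DFS fills x[v] from a per-depth running counter y[d].

class CousinPreprocessing {
    java.util.List<Integer>[] children;
    int[] x; // x[v] = number of nodes at v's level at or to the left of v
    int[] y; // y[d] = number of nodes visited so far at depth d

    void dfs(int v, int d) {
        for (int c : children[v]) dfs(c, d + 1); // postorder: children first
        y[d]++;      // one more node seen at this depth...
        x[v] = y[d]; // ...and v records its horizontal position
    }
}

With x[] in place, a query (v, p) reduces to the constant-time arithmetic of step 4.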
O(log p)-time query algorithm
The above algorithm should already be fast enough if the tree is fairly balanced -- but in the worst case, when each node has just a single child, O(p) = O(n). Notice that it is the navigation up and down the tree in the first 3 of the above 4 steps that forces O(p) time -- the last step takes constant time.
To fix this, we can add some extra pointers to make navigating up and down the tree faster. A simple and flexible way uses "pointer doubling": For each node v, we will store log2(depth(v)) pointers to successively higher ancestors. To populate these pointers, we perform log2(maxDepth) DFS iterations, where on the i-th iteration we set each node v's i-th ancestor pointer to its (i-1)-th ancestor's (i-1)-th ancestor: this takes just two pointer lookups per node per DFS. With these pointers, moving any distance p up the tree always takes at most log(p) jumps, because the distance can be reduced by at least half on each jump. The exact same procedure can be used to populate corresponding lists of pointers for "left-descendants" and "right-descendants" to speed up steps 2 and 3, respectively, to O(log p) time.
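A compact Java sketch of the pointer-doubling table for the upward direction (names and array layout are mine; the left- and right-descendant tables are built the same way):

class BinaryLifting {
    // up[i][v] = the (2^i)-th ancestor of v, or -1 if it does not exist;
    // maxLog must exceed log2 of the largest p we will be asked about
    int[][] up;

    void build(int[] parent, int n, int maxLog) {
        up = new int[maxLog][n];
        up[0] = parent; // the 1st ancestor is the parent itself
        for (int i = 1; i < maxLog; i++)
            for (int v = 0; v < n; v++)
                up[i][v] = up[i - 1][v] == -1 ? -1 : up[i - 1][up[i - 1][v]];
    }

    // Jump p levels up from v in O(log p) pointer hops.
    int ancestor(int v, int p) {
        for (int i = 0; v != -1 && p > 0; i++, p >>= 1)
            if ((p & 1) == 1) v = up[i][v];
        return v;
    }
}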

How to find largest common sub-tree in the given two binary search trees?

Two BSTs (binary search trees) are given. How do you find the largest common subtree in the given two binary trees?
EDIT 1:
Here is what I have thought:
Let, r1 = current node of 1st tree
r2 = current node of 2nd tree
There are some cases I think we need to consider:
Case 1: r1.data < r2.data
2 subproblems to solve:
- first, check r1 and r2.left
- second, check r1.right and r2
Case 2: r1.data > r2.data
2 subproblems to solve:
- first, check r1.left and r2
- second, check r1 and r2.right
Case 3: r1.data == r2.data
Again, 2 cases to consider here:
(a) the current node is part of the largest common BST:
    compute the common subtree size rooted at r1 and r2
(b) the current node is NOT part of the largest common BST:
2 subproblems to solve:
- first, solve r1.left and r2.left
- second, solve r1.right and r2.right
I can think of the cases we need to check, but I am not able to code it as of now. And it is NOT a homework problem. Does it look like one?
Just hash the children and key of each node and look for duplicates. This would give a linear expected time algorithm. For example, see the following pseudocode, which assumes that there are no hash collisions (dealing with collisions would be straightforward):
ret = -1

// T is a tree node, H is a hash set, and first is a boolean flag
hashTree(T, H, first):
    if (T is null):
        return 0 // leaf case
    h = hash(hashTree(T.left, H, first), hashTree(T.right, H, first), T.key)
    if (first):
        // store hashes of T1's nodes in the set H
        H.insert(h)
    else:
        // check for hashes of T2's nodes in the set H containing T1's nodes
        if H.contains(h):
            ret = max(ret, size(T)) // size is recursive and memoized to get O(n) total time
    return h

H = {}
hashTree(T1, H, true)
hashTree(T2, H, false)
return ret
Note that this is assuming the standard definition of a subtree of a BST, namely that a subtree consists of a node and all of its descendants.
Assuming there are no duplicate values in the trees:
LargestSubtree(Tree tree1, Tree tree2)
    Int bestMatch := 0
    Int bestMatchCount := 0
    For each Node n in tree1 //should iterate breadth-first
        //possible optimization: we can skip every node that is part of each subtree we find
        Node n2 := BinarySearch(tree2, n.value)
        Int matchCount := CountMatches(n, n2)
        If (matchCount > bestMatchCount)
            bestMatch := n.value
            bestMatchCount := matchCount
        End
    End
    Return ExtractSubtree(BinarySearch(tree1, bestMatch), BinarySearch(tree2, bestMatch))
End

CountMatches(Node n1, Node n2)
    If (!n1 || !n2 || n1.value != n2.value)
        Return 0
    End
    Return 1 + CountMatches(n1.left, n2.left) + CountMatches(n1.right, n2.right)
End

ExtractSubtree(Node n1, Node n2)
    If (!n1 || !n2 || n1.value != n2.value)
        Return nil
    End
    Node result := New Node(n1.value)
    result.left := ExtractSubtree(n1.left, n2.left)
    result.right := ExtractSubtree(n1.right, n2.right)
    Return result
End
To briefly explain, this is a brute-force solution to the problem. It does a breadth-first walk of the first tree. For each node, it performs a BinarySearch of the second tree to locate the corresponding node in that tree. Then using those nodes it evaluates the total size of the common subtree rooted there. If the subtree is larger than any previously found subtree, it remembers it for later so that it can construct and return a copy of the largest subtree when the algorithm completes.
This algorithm does not handle duplicate values. It could be extended to do so by using a BinarySearch implementation that returns a list of all nodes with the given value, instead of just a single node. Then the algorithm could iterate this list and evaluate the subtree for each node and then proceed as normal.
The running time of this algorithm is O(n log m) (it traverses n nodes in the first tree, and performs a log m binary-search operation for each one), putting it on par with most common sorting algorithms. The space complexity is O(1) while running (nothing allocated beyond a few temporary variables), and O(n) when it returns its result (because it creates an explicit copy of the subtree, which may not be required depending upon exactly how the algorithm is supposed to express its result). So even this brute-force approach should perform reasonably well, although as noted by other answers an O(n) solution is possible.
There are also possible optimizations that could be applied to this algorithm, such as skipping over any nodes that were contained in a previously evaluated subtree. Because the tree-walk is breadth-first, we know that any node that was part of some prior subtree cannot ever be the root of a larger subtree. This could significantly improve the performance of the algorithm in certain cases, but the worst-case running time (two trees with no common subtrees) would still be O(n log m).
I believe that I have an O(n + m)-time, O(n + m) space algorithm for solving this problem, assuming the trees are of size n and m, respectively. This algorithm assumes that the values in the trees are unique (that is, each element appears in each tree at most once), but they do not need to be binary search trees.
The algorithm is based on dynamic programming and works with the following intuition: suppose that we have some tree T with root r and children T1 and T2. Suppose the other tree is S. Now, suppose that we know the maximum common subtree of T1 and S and of T2 and S. Then the maximum common subtree of T and S either:
- is completely contained in T1,
- is completely contained in T2, or
- uses both T1, T2, and r.
Therefore, we can compute the maximum common subtree (I'll abbreviate this as MCS) as follows. If neither MCS(T1, S) nor MCS(T2, S) has the root of T1 or T2 as its root, then the MCS we can get from T and S is given by the larger of MCS(T1, S) and MCS(T2, S). If exactly one of MCS(T1, S) and MCS(T2, S) has the root of T1 or T2 as a root (assume w.l.o.g. that it's T1), then look up r in S. If r has the root of T1 as a child, then we can extend that tree by a node and the MCS is given by the larger of this augmented tree and MCS(T2, S). Otherwise, if both MCS(T1, S) and MCS(T2, S) have the roots of T1 and T2 as roots, then look up r in S. If it has as a child the root of T1, we can extend that tree by adding in r. If it has as a child the root of T2, we can extend that tree by adding in r. Otherwise, we just take the larger of MCS(T1, S) and MCS(T2, S).
The formal version of the algorithm is as follows:
Create a new hash table mapping nodes in tree S from their value to the corresponding node in the tree. Then fill this table in with the nodes of S by doing a standard tree walk in O(m) time.
Create a new hash table mapping nodes in T from their value to the size of the maximum common subtree of the tree rooted at that node and S. Note that this means that the MCS-es stored in this table must be directly rooted at the given node. Leave this table empty.
Create a list of the nodes of T using a postorder traversal. This takes O(n) time. Note that this means that we will always process all of a node's children before the node itself; this is very important!
For each node v in the postorder traversal, in the order they were visited:
Look up the corresponding node in the hash table for the nodes of S.
If no node was found, set the size of the MCS rooted at v to 0.
If a node v' was found in S:
If neither of the children of v' match the children of v, set the size of the MCS rooted at v to 1.
If exactly one of the children of v' matches a child of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the subtree rooted at that child.
If both of the children of v' match the children of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the left subtree plus the size of the MCS of the right subtree.
(Note that step (4) runs in expected O(n) time, since it visits each node in T exactly once, makes O(n) hash table lookups, makes n hash table inserts, and does a constant amount of processing per node.)
Iterate across the hash table and return the maximum value it contains. This step takes O(n) time as well. If the hash table is empty (T has size zero), return 0.
Overall, the runtime is O(n + m) time expected and O(n + m) space for the two hash tables.
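A condensed Java sketch of this procedure (class and variable names are my own, and as the EDIT below notes, it finds the largest common subgraph rooted at matching nodes):

import java.util.HashMap;
import java.util.Map;

class BstNode {
    int value;
    BstNode left, right;
}

class MaxCommonSubtree {
    static int solve(BstNode t, BstNode s) {
        Map<Integer, BstNode> sByValue = new HashMap<>();
        index(s, sByValue);                          // step 1: O(m) walk of S
        Map<BstNode, Integer> mcs = new HashMap<>(); // step 2: MCS size rooted at each node of T
        postorder(t, sByValue, mcs);                 // steps 3-4
        int best = 0;                                // step 5: take the maximum
        for (int size : mcs.values()) best = Math.max(best, size);
        return best;
    }

    static void index(BstNode s, Map<Integer, BstNode> m) {
        if (s == null) return;
        m.put(s.value, s);
        index(s.left, m);
        index(s.right, m);
    }

    static void postorder(BstNode v, Map<Integer, BstNode> sByValue, Map<BstNode, Integer> mcs) {
        if (v == null) return;
        postorder(v.left, sByValue, mcs);  // children are processed before the node itself
        postorder(v.right, sByValue, mcs);
        BstNode w = sByValue.get(v.value);
        if (w == null) { mcs.put(v, 0); return; }    // step 4.2: no matching node in S
        int size = 1;                                // step 4.3: the matched node itself
        if (v.left != null && w.left != null && v.left.value == w.left.value)
            size += mcs.get(v.left);                 // left children match
        if (v.right != null && w.right != null && v.right.value == w.right.value)
            size += mcs.get(v.right);                // right children match
        mcs.put(v, size);
    }
}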
To see a correctness proof, we proceed by induction on the height of the tree T. As a base case, if T has height zero, then we just return zero because the loop in (4) does not add anything to the hash table. If T has height one, then either its root exists in S or it does not. If it exists in S, then it can't have any children at all, so we execute branch 4.3.1 and record a size of one. Step (6) then reports that the MCS has size one, which is correct. If it does not exist, then we execute 4.2, putting zero into the hash table, so step (6) reports that the MCS has size zero, as expected.
For the inductive step, assume that the algorithm works for all trees of height k' < k and consider a tree of height k. During our postorder walk of T, we will visit all of the nodes in the left subtree, then in the right subtree, and finally the root of T. By the inductive hypothesis, the table of MCS values will be filled in correctly for the left subtree and right subtree, since they have height ≤ k - 1 < k. Now consider what happens when we process the root. If the root doesn't appear in the tree S, then we put a zero into the table, and step (6) will pick the largest MCS value of some subtree of T, which must be fully contained in either its left subtree or right subtree. If the root appears in S, then we compute the size of the MCS rooted at the root of T by trying to link it with the MCS-es of its two children, which (inductively!) we've computed correctly.
Whew! That was an awesome problem. I hope this solution is correct!
EDIT: As was noted by @jonderry, this will find the largest common subgraph of the two trees, not the largest common complete subtree. However, you can restrict the algorithm to only work on subtrees quite easily. To do so, you would modify the inner code of the algorithm so that it records a subtree of size 0 if both subtrees aren't present with nonzero size. A similar inductive argument will show that this will find the largest complete subtree.
Though, admittedly, I like the "largest common subgraph" problem a lot more. :-)
The following algorithm computes all the largest common subtrees of two binary trees (with no assumption that it is a binary search tree). Let S and T be two binary trees. The algorithm works from the bottom of the trees up, starting at the leaves. We start by identifying leaves with the same value. Then consider their parents and identify nodes with the same children. More generally, at each iteration, we identify nodes provided they have the same value and their children are isomorphic (or isomorphic after swapping the left and right children). This algorithm terminates with the collection of all pairs of maximal subtrees in T and S.
Here is a more detailed description:
Let S and T be two binary trees. For simplicity, we may assume that for each node n, the left child has value <= the right child. If exactly one child of a node n is NULL, we assume the right node is NULL. (In general, we consider two subtrees isomorphic if they are up to permutation of the left/right children for each node.)
(1) Find all leaf nodes in each tree.
(2) Define a bipartite graph B with edges from nodes in S to nodes in T, initially with no edges. Let R(S) and R(T) be empty sets. Let R(S)_next and R(T)_next also be empty sets.
(3) For each leaf node in S and each leaf node in T, create an edge in B if the nodes have the same value. For each edge created from nodeS in S to nodeT in T, add all the parents of nodeS to the set R(S) and all the parents of nodeT to the set R(T).
(4) For each node nodeS in R(S) and each node nodeT in R(T), draw an edge between them in B if they have the same value AND
{
    (i): nodeS->left is connected to nodeT->left and nodeS->right is connected to nodeT->right, OR
    (ii): nodeS->left is connected to nodeT->right and nodeS->right is connected to nodeT->left, OR
    (iii): nodeS->left is connected to nodeT->left and nodeS->right == NULL and nodeT->right == NULL
}
(5) For each edge created in step (4), add their parents to R(S)_next and R(T)_next.
(6) If R(S)_next is nonempty {
    (i) Swap R(S) and R(S)_next, and swap R(T) and R(T)_next.
    (ii) Empty the contents of R(S)_next and R(T)_next.
    (iii) Return to step (4).
}
When this algorithm terminates, R(S) and R(T) contain the roots of all maximal subtrees in S and T. Furthermore, the bipartite graph B identifies all pairs of nodes in S and nodes in T that give isomorphic subtrees.
I believe this algorithm has complexity O(n log n), where n is the total number of nodes in S and T, since the sets R(S) and R(T) can be stored in BSTs ordered by value; however, I would be interested to see a proof.

need a graph algorithm similar to DFS

I'm curious if there is a specific graph algorithm that traverses an unweighted acyclic directed graph by choosing a start node and then proceeding via DFS. If a node is encountered that has unsearched predecessors, then it should backtrack along the incoming paths until all paths to the start have been explored.
I found a wikipedia category for graph algorithms but there is a small sea of algorithms here and I'm not familiar with most of them.
EDIT: example:
given the graph {AB, EB, BC, BD}, traverse as {A,B,E,B,C,D}, or as the unique order {A,B,E,C,D}.
Note that this algorithm, unlike BFS or DFS, does not need to begin again at a new start node if all paths from the first start node are exhausted.
In DFS, you usually choose the vertex to be visited after u based on the edges starting at u. Here you want to choose first based on the edges ending at u. To do this, you could keep the transpose graph as well, and try to get the next vertex from there first.
It would be something like this:

procedure dfs(vertex u)
    mark u as visited
    for each edge (v, u) // found in the transpose graph
        if v not visited
            dfs(v)
    for each edge (u, w)
        if w not visited
            dfs(w)
What you are looking for is a topological sort. As far as I'm aware, there's no easy way to traverse a graph in its topologically sorted order without any precomputation.
The standard way to get the topsort is to do a standard DFS and then store the nodes in order of their finishing times. Finally, reverse those nodes and voila, you have them in the order you desire.
Pseudocode:
list topsort

procedure dfs(vertex u)
    mark u as visited
    for each edge (u, v)
        if v not visited
            dfs(v)
    add u to the back of topsort
The list topsort will then contain the vertices in the reverse order that you want. Just reverse the elements of topsort to correct that.
If you're looking for topological sort, you can also do this, given an adjacency list (or a list of edges (u,v) which you can preprocess in O(E) time):
list top_sort( graph in adjacency list )
    parent = new list
    queue = new queue
    for each u in nodes
        parent(u) = number of parents
        if ( parent(u) is 0 ) // nothing points to node u
            queue.enqueue( u )
    while ( queue is not empty )
        u = queue.pop
        add u to visited
        for each edge ( u, v )
            decrement parent(v) // children all have one less parent
            if ( parent(v) is 0 )
                queue.enqueue( v )
Given an adjacency list (or a list of edges (u,v)), this is O(V + E), since each edge is touched twice - once to increment, once to decrement, in O(1) time each. With a normal queue, each vertex will also be processed by the queue at most twice - which can also be done in O(1) with a standard queue.
Note that this differs from the DFS (at least a straight-up implementation) in that it handles forests.
Another interesting note is that if you substitute the queue with a priority_queue imposing some sort of structure/ordering, you can actually return the results sorted in some order.
For example, for a canonical class dependency graph (you can only take class X if you took class Y):
100:
101: 100
200: 100 101
201:
202: 201
you would probably get, as a result:
100, 201, 101, 202, 200
but if you always want to take lower-numbered classes first, you can easily change it to return:
100, 101, 200, 201, 202
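A small Java sketch of that priority-queue variant (the map-based graph encoding is my assumption): replacing the FIFO queue with a PriorityQueue yields the lexicographically smallest topological order, e.g. 100, 101, 200, 201, 202 for the class-dependency example above.

import java.util.*;

class OrderedTopSort {
    // prereqs maps each course to the list of courses that must come first
    static List<Integer> topSort(Map<Integer, List<Integer>> prereqs) {
        Map<Integer, Integer> indegree = new HashMap<>();
        Map<Integer, List<Integer>> dependents = new HashMap<>();
        for (int u : prereqs.keySet()) {
            indegree.putIfAbsent(u, 0);
            for (int p : prereqs.get(u)) {
                indegree.putIfAbsent(p, 0);
                indegree.merge(u, 1, Integer::sum);
                dependents.computeIfAbsent(p, k -> new ArrayList<>()).add(u);
            }
        }
        // PriorityQueue instead of a plain queue: always take the smallest ready node
        PriorityQueue<Integer> ready = new PriorityQueue<>();
        for (var e : indegree.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int u = ready.poll();
            order.add(u);
            for (int v : dependents.getOrDefault(u, List.of()))
                if (indegree.merge(v, -1, Integer::sum) == 0) ready.add(v);
        }
        return order;
    }
}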

What is a good algorithm for getting the minimum vertex cover of a tree?

What is a good algorithm for getting the minimum vertex cover of a tree?
INPUT:
The node's neighbours.
OUTPUT:
The minimum number of vertices.
I didn't fully understand after reading the answers here, so I thought I'd post one from here
The general idea is that you root the tree at an arbitrary node, and ask whether that root is in the cover or not. If it is, then you calculate the min vertex cover of the subtrees rooted at its children by recursing. If it isn't, then every child of the root must be in the vertex cover so that every edge between the root and its children is covered. In this case, you recurse on the root's grandchildren.
So for example, if you had the following tree:
       A
      / \
     B   C
    / \ / \
   D  E F  G
Note that by inspection, you know the min vertex cover is {B, C}. We will find this min cover.
Here we start with A.
A is in the cover
We move down to the two subtrees of B and C, and recurse on this algorithm. We can't simply state that B and C are not in the cover, because even if AB and AC are covered, we can't say anything about whether we will need B and C to be in the cover or not.
(Think about the following tree, where both the root and one of its children are in the min cover ({A, D}):

      A
     /|\
    B C D
       /|\
      E F G
)
A is not in the cover
But we know that AB and AC must be covered, so we have to add B and C to the cover. Since B and C are in the cover, we can recurse on their children instead of recursing on B and C (even if we did, it wouldn't give us any more information).
"Formally"
Let C(x) be the size of the min cover rooted at x.
Then,
C(x) = min (
    1 + sum( C(i) for i in x's children ),                     // root in cover
    len(x's children) + sum( C(i) for i in x's grandchildren ) // root not in cover
)
T(V,E) is a tree, which implies that for any leaf, any minimal vertex cover has to include either the leaf or the vertex adjacent to the leaf. This gives us the following algorithm for finding S, the vertex cover:
Find all leaves of the tree (BFS or DFS), O(|V|) in a tree.
If (u,v) is an edge such that v is a leaf, add u to the vertex cover, and prune (u,v). This will leave you with a forest T_1(V_1,E_1), ..., T_n(V_n,E_n).
Now, if V_i={v}, meaning |V_i|=1, then that tree can be dropped since all edges incident on v are covered. This means that we have a termination condition for a recursion, where we have either one or no vertices, and we can compute S_i as the cover for each T_i, and define S as all the vertices from step 2 union the cover of each T_i.
Now, all that remains is to verify that if the original tree has only one vertex, we return 1 and never start the recursion, and the minimal vertex cover can be computed.
Edit:
Actually, after thinking about it for a bit, it can be accomplished with a simple DFS variant.
I hope you can find more answers related to your question there.
I was thinking about my solution; you will probably need to polish it, but as long as dynamic programming is in one of your tags, you probably need to:
For each vertex u, define S+(u) as the cover size with vertex u and S-(u) as the cover size without vertex u. Then:
S+(u) = 1 + Sum( S-(v) ) for each child v of u.
S-(u) = Sum( max{ S-(v), S+(v) } ) for each child v of u.
The answer is max(S+(r), S-(r)), where r is the root of your tree.
After reading this, I changed the above algorithm to find the maximum independent set, since the wiki article states:
A set is independent if and only if its complement is a vertex cover.
So by changing min to max we can find the maximum independent set, and by its complement the minimum vertex cover, since both problems are equivalent.
{- Haskell implementation of Artem's algorithm -}
data Tree = Branch [Tree]
    deriving Show

{- first int is the min cover; second int is the min cover that includes the root -}
minVC :: Tree -> (Int, Int)
minVC (Branch subtrees) =
    let costs = map minVC subtrees
        minWithRoot = 1 + sum (map fst costs)
    in (min minWithRoot (sum (map snd costs)), minWithRoot)
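As a quick check, minVC (Branch [Branch [], Branch []]) evaluates to (1, 1): for a root with two leaf children, taking the root alone covers both edges, and that cover happens to include the root.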
We can use a DFS-based algorithm to solve this problem:

DFS(node x)
{
    discovered[x] = true;
    /* Scan the adjacency list of node x */
    while((y = getNextAdj()) != NULL)
    {
        if(discovered[y] == false)
        {
            DFS(y);
            /* y is a child of node x.
               If the child was not selected for the minimum vertex cover,
               then select the parent. */
            if(y->isSelected == false)
            {
                x->isSelected = true;
            }
        }
    }
}
The leaf node will never be selected for the vertex cover.
We need to find the minimum vertex cover. For each node we have two choices: either include it or not. But according to the problem, for each edge (u, v), at least one of u or v should be in the cover, so if the current vertex is not included then we must include its children, and if we do include the current vertex then we may or may not include its children, based on what is optimal.
Here, DP1[v] = the answer for vertex v when we include it.
DP2[v] = the answer for vertex v when we don't include it.
DP1[v] = 1 + sum( min(DP2[c], DP1[c]) ) - include the current vertex, and for each child take whichever choice is optimal.
DP2[v] = sum( DP1[c] ) - don't include the current vertex, so we must include all its children. Here, c ranges over the children of vertex v.
Then, our solution is min(DP1[root], DP2[root]).
#include <bits/stdc++.h>
using namespace std;

vector<vector<int> > g;
int dp1[100010], dp2[100010];

void dfs(int curr, int p){
    for(auto it : g[curr]){
        if(it == p){
            continue;
        }
        dfs(it, curr);
        dp1[curr] += min(dp1[it], dp2[it]);
        dp2[curr] += dp1[it];
    }
    dp1[curr] += 1;
}

int main(){
    int n;
    cin >> n;
    g.resize(n+1);
    for(int i=0 ; i<n-1 ; i++){
        int u, v;
        cin >> u >> v;
        g[u].push_back(v);
        g[v].push_back(u);
    }
    dfs(1, 0);
    cout << min(dp1[1], dp2[1]);
    return 0;
}
I would simply use a linear program to solve the minimum vertex cover problem.
A formulation as an integer linear program could look like the one given here: ILP formulation
I don't think that your own implementation would be faster than these highly optimized LP solvers.
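For reference, the standard textbook ILP for minimum vertex cover on a graph G = (V, E) looks like this (a generic formulation, not necessarily the exact one on the linked page):

\min \sum_{v \in V} x_v
\text{subject to } x_u + x_v \ge 1 \quad \forall (u, v) \in E
x_v \in \{0, 1\} \quad \forall v \in V

Since trees are bipartite, the LP relaxation (0 \le x_v \le 1) is already integral for this problem by König's theorem, so even the relaxed problem gives the exact answer here.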
