I'm trying to get my head around calculating the time complexity of my own method - can anybody nudge me in the direction of analysing a method that uses a for-each loop which makes a recursive call on itself?
I've written a method that does a tree traversal of an n-ary tree. Unfortunately I can't post the exact code, but given the root node to start with, it goes:
for each (child of node)
    does a quick check
    sets a boolean
    does a recursive call on itself until we reach the leaf nodes
Your loop visits every node of the tree exactly once.
Beginning with the root node, you visit all of its child nodes, and for each of those you call the same function on its child nodes, and so on down the tree.
Since you visit every node exactly once, this traversal has a runtime of O(n) for n nodes of your tree, assuming that the quick check runs in constant time and does not do anything that depends on n.
"is the for each part done n times":
Yes and no: the for-each part is done numberOfChildsOfNode(node) times for a single node, but since you do that for each node by calling your function recursively, the total number of times the loop body executes is actually n.
What you can test/try: declare a static variable executionCount or something like that, initialize it to 0 and increment it inside your loop. You should see that executionCount equals the number of nodes.
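To make that concrete, here is a minimal Python sketch of such a traversal with a counter; NaryNode, quick_check and the flag attribute are placeholders, not your actual code. Note that counting inside the loop counts every node except the root (the root is nobody's child), so the counter ends at n - 1; moving the increment to the top of traverse would count all n nodes.

class NaryNode:
    def __init__(self, children=None):
        self.children = children or []
        self.flag = False

def quick_check(node):
    return True                          # stands in for the O(1) "quick check"

execution_count = 0                      # counts how often the loop body runs

def traverse(node):
    global execution_count
    for child in node.children:          # "for each child of node"
        execution_count += 1             # every node except the root is someone's
                                         # child, so this ends at n - 1
        child.flag = quick_check(child)  # "sets a boolean"
        traverse(child)                  # recursive call down to the leaf nodes

root = NaryNode([NaryNode([NaryNode()]), NaryNode()])
traverse(root)
print(execution_count)                   # 4 nodes in total -> prints 3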
I'm trying to figure out this data structure, but I don't understand how we can
tell there are O(log(n)) subtrees that represent the answer to a query?
Thanks!
If we make the assumption that the above is a purely functional binary tree [wiki], i.e. one where the nodes are immutable, then we can make a "copy" of this tree such that only elements with a value larger than x1 and smaller than x2 are in the tree.
Let us start with a very simple case to illustrate the point. Imagine that we do not have any bounds at all; then we can simply return the entire tree. So instead of constructing a new tree, we return a reference to the root of the tree. We can thus, without any bounds, return a tree in O(1), given that the tree is not edited (at least not as long as we use the subtree).
The above case is of course quite simple. We simply make a "copy" (not really a copy since the data is immutable, we can just return the tree) of the entire tree. So let us aim to solve a more complex problem: we want to construct a tree that contains all elements larger than a threshold x1. Basically we can define a recursive algorithm for that:
the cut version of None (or whatever represents a null reference, or a reference to an empty tree) is None;
if the node has a value smaller than the threshold, we return the cut version of the right subtree; and
if the node has a value greater than the threshold, we return a node that has the same right subtree, and as left child the cut version of its left child.
So in pseudo-code it looks like:
def treelarger(some_node, min):
    if some_node is None:
        return None
    if some_node.value > min:
        return Node(treelarger(some_node.left, min), some_node.value, some_node.right)
    else:
        return treelarger(some_node.right, min)
This algorithm thus runs in O(h), with h the height of the tree, since for each case (except the first one) we recurse into one (not both) of the children, and it ends when we reach a node without children (or at least one that does not have a subtree in the direction in which we need to cut).
We thus do not make a complete copy of the tree. We reuse a lot of nodes of the old tree. We only construct a new "surface", but most of the "volume" is part of the old binary tree. Although the tree itself contains O(n) nodes, we construct at most O(h) new nodes. We could optimize the above such that, if the cut version of one of the subtrees is unchanged, we do not create a new node. But that does not matter much in terms of time complexity: we generate at most O(h) new nodes, and the total number of nodes is either less than the original number, or the same.
In case of a complete tree, the height of the tree h scales with O(log n), and thus this algorithm will run in O(log n).
Then how can we generate a tree with elements between two thresholds? We can easily rewrite the above into an algorithm treesmaller that generates a subtree that contains all elements that are smaller:
def treesmaller(some_node, max):
    if some_node is None:
        return None
    if some_node.value < max:
        return Node(some_node.left, some_node.value, treesmaller(some_node.right, max))
    else:
        return treesmaller(some_node.left, max)
So roughly speaking there are two differences:
we change the condition from some_node.value > min to some_node.value < max; and
we recurse on the right subchild in case the condition holds, and on the left if it does not hold.
Now the conclusions we draw from the previous algorithm are also conclusions that can be applied to this algorithm, since again it only introduces O(h) new nodes, and the total number of nodes can only decrease.
Although we can construct an algorithm that takes the two thresholds concurrently into account, we can simply reuse the above algorithms to construct a subtree containing only elements within range: we first pass the tree to the treelarger function, and then that result through a treesmaller (or vice versa).
Since in both algorithms we introduce O(h) new nodes, and the height of the tree cannot increase, we construct at most O(2h), and thus O(h), new nodes.
Given that the original tree was a complete tree, it thus holds that we create O(log n) new nodes.
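Putting the pieces together, here is a small runnable sketch of the above; the immutable Node type and the combining function treebetween are illustrative names I picked, not part of your data structure:

from collections import namedtuple

# Immutable node: old and new trees can safely share subtrees.
Node = namedtuple('Node', ['left', 'value', 'right'])

def treelarger(some_node, min):
    # subtree containing only the values strictly larger than min
    if some_node is None:
        return None
    if some_node.value > min:
        return Node(treelarger(some_node.left, min), some_node.value, some_node.right)
    return treelarger(some_node.right, min)

def treesmaller(some_node, max):
    # subtree containing only the values strictly smaller than max
    if some_node is None:
        return None
    if some_node.value < max:
        return Node(some_node.left, some_node.value, treesmaller(some_node.right, max))
    return treesmaller(some_node.left, max)

def treebetween(some_node, min, max):
    # chain both cuts: each introduces O(h) new nodes, so O(h) in total
    return treesmaller(treelarger(some_node, min), max)

For a balanced tree, treebetween(root, x1, x2) therefore creates only O(log n) new nodes and shares everything else with the original tree.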
Consider the search for the two endpoints of the range. This search will continue until finding the lowest common ancestor of the two leaf nodes that span your interval. At that point, the search branches with one part zigging left and one part zagging right. For now, let's just focus on the part of the query that branches to the left, since the logic is the same but reversed for the right branch.
In this search, it helps to think of each node as not representing a single point, but rather a range of points. The general procedure, then, is the following:
If the query range fully subsumes the range represented by this node, stop searching in x and begin searching the y-subtree of this node.
If the query range is purely in the range represented by the right subtree of this node, continue the x search to the right and don't investigate the y-subtree.
If the query range overlaps the left subtree's range, then it must fully subsume the right subtree's range. So process the right subtree's y-subtree, then recursively explore the x-subtree to the left.
In all cases, we add at most one y-subtree in for consideration and then recursively continue exploring the x-subtree in only one direction. This means that we essentially trace out a path down the x-tree, adding in at most one y-subtree per step. Since the tree has height O(log n), the overall number of y-subtrees visited this way is O(log n). And then, including the number of y-subtrees visited in the case where we branched right at the top, we get another O(log n) subtrees for a total of O(log n) total subtrees to search.
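Roughly, the left branch of that search could look as follows; the fields key, left, right, is_leaf and y_subtree are assumptions about how such an x-tree might be stored, not a specific library API:

def collect_left_branch(node, x1):
    # Walk the left branch below the split node, picking up at most one
    # y-subtree per level, i.e. O(log n) subtrees in a balanced x-tree.
    y_subtrees = []
    while node is not None and not node.is_leaf:
        if x1 <= node.key:
            # the whole right subtree lies inside the query range:
            # hand over its y-subtree as one piece and keep zigging left
            y_subtrees.append(node.right.y_subtree)
            node = node.left
        else:
            # the range starts to the right of this node's split value
            node = node.right
    # a leaf reached here is still checked individually against the range
    return y_subtrees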
Hope this helps!
According to Wikipedia, there are basically two differences in the implementation of DFS and BFS.
They are:
1) DFS uses a stack while BFS uses a queue (this I understand).
2) DFS delays checking whether a vertex has been discovered until the vertex is popped from the stack rather than making this check before pushing the vertex.
I am not able to understand the second difference. I mean, why does DFS visit the node after removing it from the stack, while BFS visits the node before adding it to the queue?
Thanks!
Extra info:
In a simple implementation of the above two algorithms, we take a boolean array (let us name it visited) to keep track of which nodes have been visited. The question mentions this visited boolean array.
This is probably the first time I have ever heard of a DFS that delays setting the "discovered" property until the node is popped from the stack (even on Wikipedia, both the recursive and the iterative pseudocode label the current node as discovered before pushing its children onto the stack). Also, if you "discover" the node only when you finish processing it, I think you could easily get into endless loops.
There are, however, situations where I use two flags for each node: one set when entering the node, one upon leaving it (usually I write DFS recursively, so right at the end of the recursive function). I think I used something like this when I needed things like strongly connected components or critical points in a connected graph (nodes which, if removed, would make the graph lose its connectivity). Also, the order in which you exit the nodes is often used for topological sorting (the topological sort is just the reverse of the order in which you finished processing the nodes), as in the sketch below.
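A minimal sketch of the two-flag idea, assuming the graph is an adjacency-list dict such as {'a': ['b', 'c']}; this is an illustration, not my original code:

def dfs_two_flags(graph, v, entered=None, leave_order=None):
    # record when a node is entered and the order in which nodes are left
    if entered is None:
        entered, leave_order = set(), []
    entered.add(v)                        # flag set on entering the node
    for w in graph.get(v, []):
        if w not in entered:
            dfs_two_flags(graph, w, entered, leave_order)
    leave_order.append(v)                 # exit order recorded on leaving the node
    return leave_order

# For the part of a DAG reachable from 'a', reversing the exit order
# gives a topological order:
dag = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
print(list(reversed(dfs_two_flags(dag, 'a'))))   # prints ['a', 'c', 'b', 'd']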
The Wikipedia article mentions two ways to perform a DFS: using recursion and using a stack.
For completeness, I copy both here:
Using recursion
procedure DFS(G,v):
    label v as discovered
    for all edges from v to w in G.adjacentEdges(v) do
        if vertex w is not labeled as discovered then
            recursively call DFS(G,w)
Using a stack
procedure DFS-iterative(G,v):
    let S be a stack
    S.push(v)
    while S is not empty
        v ← S.pop()
        if v is not labeled as discovered:
            label v as discovered
            for all edges from v to w in G.adjacentEdges(v) do
                S.push(w)
The important thing to know here is how method calls work. There is an underlying stack, let's call it T. Before the method gets called, its arguments are pushed onto the stack. The method then takes the arguments from that stack again, performs its operation, and pushes its result back onto the stack. This result is then taken from the stack by the calling method.
As an example, consider the following snippet:
function caller() {
    callResult = callee(argument1, argument2);
}
In terms of the stack T, this is what happens (schematically):
// inside method caller
T.push(argument1);
T.push(argument2);
"call method callee"
// inside method callee
argument2 = T.pop();
argument1 = T.pop();
"do something"
T.push(result);
"return from method callee"
// inside method caller
callResult = T.pop();
This is pretty much what is happening in the second implementation: the stack is used explicitly. You can compare making the call to DFS in the first snippet with pushing a vertex on the stack, and compare executing the call to DFS in the first snippet with popping the vertex from the stack.
The first thing after popping vertex v from the stack is marking it as discovered. This is equivalent to marking it as discovered as a first step in the execution of DFS.
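For completeness, here is a direct Python transcription of the iterative pseudocode above; the adjacency-list dict is an assumed representation of the graph:

def dfs_iterative(graph, start):
    discovered = set()
    order = []                        # the order in which nodes are visited
    stack = [start]
    while stack:
        v = stack.pop()
        if v not in discovered:       # the check is delayed until the pop
            discovered.add(v)
            order.append(v)
            for w in graph[v]:
                stack.append(w)       # may push already-discovered vertices again
    return order

print(dfs_iterative({'a': ['b', 'c'], 'b': ['c'], 'c': []}, 'a'))
# prints ['a', 'c', 'b'] -- 'c' gets pushed twice but is only visited once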
Let's say I have binary trees A and B and I want to know if A is a "part" of B. I am not only talking about subtrees. What I want to know is if B has all the nodes and edges that A does.
My thought was that since a tree is essentially a graph, I could view this question as a subgraph isomorphism problem (i.e. checking whether A is a subgraph of B). But according to Wikipedia this is an NP-complete problem.
http://en.wikipedia.org/wiki/Subgraph_isomorphism_problem
I know that you can check if A is a subtree of B or not with O(n) algorithms (e.g. using preorder and inorder traversals to flatten the trees to strings and checking for substrings). I was trying to modify this a little to see if I can also test for just "parts" as well, but to no avail. This is where I'm stuck.
Are there any other ways to view this problem other than using subgraph isomorphism? I'm thinking there must be faster methods since binary trees are much more restricted and simpler versions of graphs.
Thanks in advance!
EDIT: I realized that the worst case for even a brute force method for my question would only take O(m * n), which is polynomial. So I guess this isn't an NP-complete problem after all. Then my next question is: is there an algorithm that is faster than O(m * n)?
I would approach this problem in two steps:
Find the root of A in B (either BFS or DFS)
Verify that A is contained in B (given that starting node), using a recursive algorithm, as below (I concocted some crazy pseudo-language, because you didn't specify the language; I think this should be understandable, no matter your background). Note that a is a node from A (initially the root) and b is a node from B (initially the node found in step 1).
function checkTrees(node a, node b) returns boolean
    if a does not exist or b does not exist then
        // base of the recursion
        return false
    else if a is different from b then
        // compare the current nodes
        return false
    else
        // check the children of a
        boolean leftFound = true
        boolean rightFound = true
        if a.left exists then
            // try to match the left child of a with
            // every possible neighbor of b
            leftFound = checkTrees(a.left, b.left)
                or checkTrees(a.left, b.right)
                or checkTrees(a.left, b.parent)
        if a.right exists then
            // try to match the right child of a with
            // every possible neighbor of b
            rightFound = checkTrees(a.right, b.left)
                or checkTrees(a.right, b.right)
                or checkTrees(a.right, b.parent)
        return leftFound and rightFound
About the running time: let m be the number of nodes in A and n be the number of nodes in B. The search in the first step takes O(n) time. The running time of the second step depends on one crucial assumption I made, but that might be wrong: I assumed that every node of A is equal to at most one node of B. If that is the case, the running time of the second step is O(m) (because you can never search too far in the wrong direction). So the total running time would be O(m + n).
While writing down my assumption, I started to wonder whether that isn't oversimplifying your case...
You could compare the trees bottom-up as follows:
for each leaf in tree A, identify the corresponding node in tree B.
start a parallel traversal towards the root in both trees from the nodes just matched.
specifically, move to the parent of a node in A and subsequently move towards the root in B until you either encounter the corresponding node in B (proceed) or a marked node in A (see below, if a match in B is found proceed, else fail) or the root of B (fail)
mark all nodes visited in A.
you succeed, if you haven't failed ;-).
The main part of the algorithm runs in O(e_B): in the worst case, all edges in B are visited a constant number of times. The leaf node matching runs in O(n_A * log n_B) if the B vertices are sorted, and in O(n_A * log n_A + n_B * log n_B + n) = O(n_B * log n_B) otherwise (sort each node set, then linearly scan the results).
EDIT:
Re-reading your question, the abovementioned step 2 is even easier: for matching nodes in A and B, their parents must match too (otherwise there would be a mismatch between the edge sets). No effect on worst-case run time, of course.
Given: list of N nodes. Each node consists of 2 numbers: nodeID and parentID. parentID may be null (if it's a root node).
Is there an algorithm for recreating a tree from this list of nodes with time complexity better than O(N^2)?
Each node may have 0 or more children.
Short description of an algorithm with O(N^2) complexity:
find a root Node, put it to a Queue
while Queue is not empty
    parentNode = Queue.pop()
    loop through nodes
        if currentNode.parentId = parentNode.id
            parentNode.addChild(currentNode)
            queue.push(currentNode)
            nodes.remove(currentNode)
It seems that this algorithm has O(N^2) time complexity (with a small coefficient, maybe 0.25). But I may be wrong in the complexity calculation here.
Since you've already got an external structure to the tree (a queue), I'm going to assume you don't mind using a bit of extra memory to get the job done faster.
Do it in two conceptual steps with a hash table:
First make a hash table that relates node IDs to their actual node.
Then look up a node's parent based on its parent's ID in the hash table and add the child to that parent.
More programmatically:
for each node
    add node to hash table indexed by node's ID
for each node
    if parent is null, set node as the root
    otherwise look up the parent in the hash table using the node's parent ID
        and add the node as a child of the found parent
The only potential issue with this technique is you don't necessarily end up with a valid tree until the very end. (That is, the root node may not have a child until the last link.) Depending on what you're doing with your tree, this may be an issue.
If that is an issue you can end up doing the same operation with a data structure that doesn't have that issue (just a vanilla tree with no attached data) and then mirror the structure.
All in all, this should be O(N) on the average.
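A rough Python sketch of this approach, assuming the input arrives as (nodeID, parentID) pairs with parentID None for the root; TreeNode and build_tree are illustrative names:

class TreeNode:
    def __init__(self, node_id):
        self.id = node_id
        self.children = []

def build_tree(pairs):
    # first pass: hash table from node ID to its node, O(N) expected
    by_id = {node_id: TreeNode(node_id) for node_id, _ in pairs}
    root = None
    # second pass: look up each node's parent by its parent ID and link them
    for node_id, parent_id in pairs:
        if parent_id is None:
            root = by_id[node_id]
        else:
            by_id[parent_id].children.append(by_id[node_id])
    return root

root = build_tree([(1, None), (2, 1), (3, 1), (4, 2)])
print([child.id for child in root.children])   # prints [2, 3]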
For each node initialize a list of children and for each node update the parent's children list with itself. Complexity O(n).
For node in NodeList:
    node.childList = []
For node in NodeList:
    if node.parent is not NULL:
        node.parent.childList.append(&node)
If the parent link is not readily available, then create a hash map. FWIW, the best worst-case complexity of a hash map is O(log n) for each insertion or lookup, so the final complexity becomes O(n log n).
I do not know what your input is, but let's assume that it is some sort of unordered list.
You can then create a tree structure by just putting them into a data structure that allows looking them up by their nodeID. For example, an array would do that. You then have a tree that is only linked in the direction of the parents. Transformation from an unordered list into this array is possible in linear time, assuming that the nodeIDs are unique.
In order to get the tree also linked in the direction of the children, you can prepare the nodes with a data structure (e.g. a list) to hold the children, and then do a second pass that adds each node to its parent's children list. This is also possible in linear time.
I need to determine the complexity of the pseudocode I wrote
while root ≠ null
    while hasChild(root)
        push(parentTree) ← root
        root ← pop(getChilds(root))
    ...
    is parentTree isEmpty
        root ← null
    else
        root ← pop(parentTree)
How can I know the number of executions (of each line) in a worst-case scenario?
I am not able to determine it, because I actually do not know the counts for the first two lines. After that, it's easy, but I don't know the counts for those first two lines...
It's a tree implementation using a stack, and root is the root node, as you see.
By the way, it's the first time I have written pseudo code, so I am not sure I wrote it in a good way. If it's not correct, I can rewrite it.
Prima facie analysis leads me to think the runtime is O(log n * log n).
Reasoning:
The outer while loop executes at most c log n times (where c is a constant). This is due to the fact that it relies on the root variable, which in turn relies on popping from parentTree.
parentTree only gets populated with the original root's descendants, iteratively. At most it will hold all the nodes down one path in the tree; the length of a single path down a (balanced) tree is known to be log n.
The inner while loop also executes at most d log n times (d is a constant). If the ... does not execute in O(1), then it would execute in d log n + X, and the overall runtime would be O(log n * (log n + X)), likely simplifying to O(X log n).
Assuming the "is" is an "if", the if/else statements run in O(1).
Outer * Inner = O(c log n * d log n)