Why memory is the main constraint while using breadth first search - data-structures

I am reading about BFS and DFS, and I understand that BFS uses a queue to store the nodes that are yet to be visited, while DFS uses a stack. But when going through the differences, I found that a lot of websites mention that breadth-first search needs more memory because it needs a queue to store the nodes. I didn't understand why only BFS needs more memory, because DFS also uses a stack to maintain the nodes. Can anyone please let me know if I am missing anything?

Well, for a start, balanced trees tend to be wider than they are tall. That's because every time you add a level of depth to a balanced tree, you roughly double its capacity.
So, for storing 16,383 items, your width at the bottom of the tree is 8,192 but your depth is only 14:
Level 1: 1
2: 2-3
3: 4-7
4: 8-15
5: 16-31
6: 32-63
7: 64-127
8: 128-255
9: 256-511
10: 512-1023
11: 1024-2047
12: 2048-4095
13: 4096-8191
14: 8192-16383

The main difference between BFS and DFS storage is that BFS keeps the queue of nodes it is going to visit, while the DFS stack keeps nodes it visited while going from the root to the current node (it will go back to those nodes when it finishes traversing the children of the current node).
In the worst case both BFS and DFS will store O(N) nodes in the queue or stack.
The worst case for DFS in terms of memory usage is when it stores almost all the nodes of the tree in the stack, that's when a tree looks like a linked list (each node except the last one has exactly one child). It will have N-1 nodes in the stack in this case.
For BFS the worst case in terms of memory usage would be when your root node is connected to each of the other nodes, in this case it will store N-1 nodes in the queue — just the same amount as DFS stores in the stack in its worst case.
But if we think about balanced trees (the average case), DFS will only store the path from the root to the current node each time (that's about log N nodes), while BFS will store the queue which, for balanced binary trees, can be as large as N/2 when you get to the bottom of the tree.
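This difference is easy to see empirically. Below is a small Python sketch (the tree representation and function names are my own, purely for illustration) that builds the 14-level tree from the example above and measures the peak size of the frontier for both traversals:

```python
from collections import deque

def build_balanced(depth):
    """Build a perfect binary tree of the given depth as (left, right) tuples."""
    if depth == 0:
        return None
    return (build_balanced(depth - 1), build_balanced(depth - 1))

def peak_frontier(root, use_queue):
    """Traverse the tree and return the largest size the queue/stack reaches."""
    frontier = deque([root])
    peak = 1
    while frontier:
        node = frontier.popleft() if use_queue else frontier.pop()
        for child in node:              # node is a (left, right) tuple
            if child is not None:
                frontier.append(child)
        peak = max(peak, len(frontier))
    return peak

tree = build_balanced(14)                    # 16,383 nodes, as in the example above
print(peak_frontier(tree, use_queue=True))   # → 8192
print(peak_frontier(tree, use_queue=False))  # → 14
```

On this perfect tree the BFS queue peaks at 8,192 nodes (the whole bottom level), while the DFS stack never holds more than 14 nodes: one pending sibling per level of the current path.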

Related

Best 'order' traversal to copy a balanced binary tree into an AVL tree with minimum rotations

I have two binary trees: A, whose nodes and pointers (left, right, parent) I can access, and B, whose internals I cannot access at all. The idea is to copy A into B by iterating over the nodes of A and inserting each one into B. Given that B is an AVL tree, is there a traversal of A (preorder, inorder, postorder) that minimizes the number of rotations when inserting elements into B?
Edit:
The tree A is balanced, I just don't know the exact implementation;
Iteration on tree A needs to be done using only pointers (the programming language is C and there is no queue or stack data structure that I can make use of).
Rebalancing in AVL happens when the depth of one part of the tree exceeds the depth of some other part of the tree by more than one. So to avoid triggering a rebalance you want to feed nodes into the AVL tree one level at a time; that is, feed it all of the nodes from level N of the original tree before you feed it any of the nodes from level N+1.
That ordering would be achieved by a breadth-first traversal of the original tree.
Edit
OP added:
Iteration on tree A needs to be done using only pointers (the programming language is C and there is no queue or stack data structure that I can make use of).
That does not affect the answer to the question as posed, which is still that a breadth-first traversal requires the fewest rebalances.
It does affect the way you will implement the breadth-first traversal. If you can't use a predefined queue then there are several ways that you could implement your own queue in C: an array, if permitted, or some variety of linked list are the obvious choices.
If you aren't allowed to use dynamic memory allocation, and the size of the original tree is not bounded such that you can build a queue using a fixed buffer that is sized for the worst case, then you can abandon the queue-based approach and instead use recursion to visit successively deeper levels of the tree. (Imagine a recursive traversal that stops when it reaches a specified depth in the tree, and only emits a result for nodes at that specified depth. Wrap that recursion in a while or for loop that runs from a depth of zero to the maximum depth of the tree.)
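The depth-limited recursion described above can be sketched as follows (in Python rather than C, for brevity; the `Node` class and function names are illustrative, not from the question):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def visit_level(node, depth, target, visit):
    """Recurse down to `target` depth, emitting only the nodes found there.
    Returns True if any node exists at that depth, so the caller knows
    when to stop deepening."""
    if node is None:
        return False
    if depth == target:
        visit(node)
        return True
    found_left = visit_level(node.left, depth + 1, target, visit)
    found_right = visit_level(node.right, depth + 1, target, visit)
    return found_left or found_right

def level_order_no_queue(root, visit):
    """Breadth-first traversal using only the call stack (no queue)."""
    target = 0
    while visit_level(root, 0, target, visit):
        target += 1
```

Each pass re-descends from the root, but on a balanced tree the deepest levels dominate the cost, so the total work stays O(N); on a degenerate (list-shaped) tree it degrades to O(N^2).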
If the original tree is not necessarily AVL-balanced, then you can't just copy it.
To ensure that there is no rebalancing in the new tree, you should create a complete binary tree, and you should insert the nodes in BFS/level order so that every intermediate tree is also complete.
A "complete" tree is one in which every level is full, except possibly the last. Since every complete tree is AVL-balanced, and every intermediate tree is complete, there will be no rebalancing required.
If you can't copy your original tree out into an array or other data structure, then you'll need to do log(N) in-order traversals of the original tree to copy all the nodes. During the first traversal, you select and copy the root. During the second, you select and copy level 2. During the third, you copy level 3, etc.
Whether or not a source node is selected for each level depends only on its index within the source tree, so the actual structure of the source tree is irrelevant.
Since each traversal takes O(N) time, the total time spent traversing is O(N log N). Since inserts take O(log N) time, though, that is how long insertion takes as well, so doing log N traversals does not increase the complexity of the overall process.
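For the simplest case, where the number of keys is exactly 2^k - 1 so the final tree is perfect, the insertion order can be computed directly from the sorted keys by taking the midpoint of each range, breadth-first. A hedged sketch (the helper below is my own, not part of the answer above):

```python
from collections import deque

def complete_bfs_order(keys):
    """Given sorted keys whose count is 2^k - 1 (a perfect tree, for
    simplicity), return them in the level order of the balanced BST
    built over them. Inserting in this order keeps every intermediate
    tree complete, so an AVL tree never rotates."""
    order = []
    ranges = deque([(0, len(keys) - 1)])
    while ranges:
        lo, hi = ranges.popleft()
        if lo > hi:
            continue
        mid = (lo + hi) // 2
        order.append(keys[mid])         # root of this subrange
        ranges.append((lo, mid - 1))    # enqueue both halves, level by level
        ranges.append((mid + 1, hi))
    return order

print(complete_bfs_order(list(range(1, 8))))  # → [4, 2, 6, 1, 3, 5, 7]
```

Inserting [4, 2, 6, 1, 3, 5, 7] into an empty AVL tree fills it one complete level at a time, so no rotation is ever triggered. For node counts that are not 2^k - 1, the midpoint rule yields a balanced but not necessarily complete tree, and the level-by-level selection described above is needed instead.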

Time complexity to delete a leaf node from Max Heap?

Let's say we're given a max-heap and we want to delete one of the leaf nodes. How much time will it take to delete a leaf node and maintain the max-heap property?
My main doubt is: will it take O(n) time to reach the leaf nodes?
Also, why does a binary heap have to be a complete binary tree and not an almost complete binary tree?
A binary heap is a complete binary tree. All levels are full, except possibly the last, which is left-filled. A complete binary tree is not necessarily a full binary tree.
In a binary heap of size N, represented in an array, the leaf nodes are in the last half of the array. That is, the nodes from N/2 to N-1 are leaf nodes. Deleting the last node (i.e. a[N-1]) is an O(1) operation: all you have to do is remove the node and decrease the size of the heap.
Removing any other leaf node is potentially an O(log n) operation because you have to:
Move the last node, a[N-1] to the node that you're deleting.
Bubble that item up into the heap, to its proper position.
The first part is, of course, O(1). The second part can require up to log(n) - 1 moves. The average is less than 2, but the worst case is log(n) - 1.
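A sketch of that procedure on an array-backed max-heap (a Python list; the function name is my own):

```python
def delete_leaf(heap, i):
    """Delete the leaf at index i from a max-heap stored in a list.
    Assumes i indexes a leaf, i.e. i >= len(heap) // 2."""
    last = heap.pop()          # remove the last node: O(1)
    if i == len(heap):         # the deleted leaf *was* the last node
        return
    heap[i] = last             # move the last node into the hole
    # Sift up: a leaf has no children, so only the parent link can
    # now violate the max-heap property. At most log(n) - 1 swaps.
    while i > 0 and heap[(i - 1) // 2] < heap[i]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
```

Note that only the upward direction needs checking: after the move, position `i` is still a leaf in the shrunken heap, so there are no children below it to compare against.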
In a max-heap you can reach a leaf node in O(log n), since it is a complete binary tree and traversing the entire height of the tree takes O(log n).
Once this is done, you can sift the replacement value into place to restore the heap property, which also takes O(log n).
An almost complete binary tree is no different from a complete binary tree except for the following two restrictions:
Nodes are added to a new level only after the current level is completely filled.
Within each node, the left child is filled before the right.
Every formula that applies to a complete binary tree also applies to an almost complete binary tree.
The only difference is that an almost complete binary tree may have a gap at the last level, running from the right towards the left. If there is no gap, it is a complete binary tree.
A heap is forced to have this property of being a complete binary tree for efficiency purposes.

BFS vs DFS for these situations?

I can't decide whether to use a BFS or a DFS in these two situations.
Situation 1: the graph is an unbalanced, undirected, edge-weighted tree with height 40 and a minimal depth to any leaf node of 38. What is the best algorithm to find the minimal edge cost from the root to any leaf?
Situation 2: the graph is a max heap. Which algorithm is best to find the maximum key value within each level of the heap?
For situation 1 I'm thinking DFS, because you don't have to go through all of the branches to find the smallest one: the second a branch's cost exceeds the best found so far, you can stop.
For situation 2 I'm thinking BFS, because BFS gets all the nodes from each level at once, which is better for comparison.
Any advice?
I am assuming that you only have a pointer to the root of the tree/heap to start off with in both cases.
The worst case time complexity for both situations regardless of whether you use BFS or DFS is O(n), where n is the number of nodes. Thus any optimizations that you may be able to come up with would be "on average" optimizations.
You are correct that DFS is likely to perform better than BFS for situation 1 for the exact reason that you have given.
For situation 2, however, DFS is no slower than BFS (in theory at least), because you can simply store each node at its corresponding level and then compare all nodes in each level later. For space complexity, however, BFS would be better, because once a level is done and you move onto the next, you don't have to store any of the parent nodes. For this reason BFS can be recommended for situation 2.
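For situation 1, the pruning idea can be sketched as a branch-and-bound DFS (the tuple-based tree encoding below is my own, purely for illustration: each node is a list of `(edge_weight, child)` pairs, and a leaf is an empty list):

```python
import math

def min_leaf_cost(node, best=math.inf, cost=0):
    """DFS for the minimum root-to-leaf edge cost, pruning any branch
    whose partial cost already meets or exceeds the best complete
    path found so far."""
    if cost >= best:
        return best              # prune: this branch cannot improve
    if not node:                 # leaf: a complete root-to-leaf path
        return cost
    for weight, child in node:
        best = min_leaf_cost(child, best, cost + weight)
    return best

# Paths: 5+1=6, 5+2=7, 3+10=13, 3+4=7; the last three are pruned early.
tree = [(5, [(1, []), (2, [])]), (3, [(10, []), (4, [])])]
print(min_leaf_cost(tree))   # → 6
```

The worst case is still O(n), but on inputs like situation 1 (a deep tree where complete paths are found quickly) the bound tightens early and large subtrees are skipped.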

Why the Red Black Tree is kept unbalanced after insertion?

Here is a red-black tree which seems unbalanced. If this is the case, can someone please explain why it is unbalanced?
The term "balanced" is a bit ambiguous, since different kinds of balanced trees have different constraints.
A red-black tree ensures that every path to a leaf has the same number of black nodes, and at least as many black nodes as red nodes. The result is that the longest path is at most twice as long as the shortest path, which is good enough to guarantee O(log N) time for search, insert, and delete operations.
Most other kinds of balanced trees have tighter balancing constraints. An AVL tree, for example, ensures that the lengths of the longest paths on either side of every node differ by at most 1. This is more than you need, and that has costs -- inserting or deleting in an AVL tree (after finding the target node) takes O(log N) operations on average, while inserting or deleting in a red-black tree takes O(1) operations on average.
If you wanted to keep a tree completely balanced, so that you had the same number of descendents on either side of every node, +/- 1, it would be very expensive -- insert and delete operations would take O(N) time.
Yes, it is balanced. The rule says: counting the black NIL leaves, the longest possible path should consist of at most 2*B-1 nodes, where B is the number of black nodes on the shortest possible path from the root to any leaf. In your example the shortest path has 2 black nodes, so B = 2, and the longest path can have up to 3 nodes, but it has just 2.
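The two constraints are mechanical enough to check in code. Here is a small sketch (the tuple encoding and function name are mine) that verifies both red-black path properties and returns the black height:

```python
RED, BLACK = "R", "B"

def check_rb(node):
    """Verify the red-black path invariants on a tree of
    (color, left, right) tuples, where None is a black NIL leaf.
    Returns the black height, or raises AssertionError on a violation."""
    if node is None:
        return 1                       # NIL leaves count as black
    color, left, right = node
    if color == RED:                   # a red node may not have a red child
        for child in (left, right):
            assert child is None or child[0] == BLACK, "red node with red child"
    bh_left = check_rb(left)
    bh_right = check_rb(right)
    assert bh_left == bh_right, "unequal black counts on two paths"
    return bh_left + (1 if color == BLACK else 0)
```

Because every path must contain the same number of black nodes and red nodes cannot be adjacent, any tree this function accepts automatically satisfies the "longest path at most twice the shortest" bound discussed above.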

Why in-order traversal of a threaded tree is O(N)?

I can't seem to figure out how the in-order traversal of a threaded binary tree is O(N)..
Because you have to descend the links to find the leftmost child, and then go back via the thread when you want to add the parent to the traversal path, wouldn't that be O(N^2)?
Thanks!
The traversal of a tree (threaded or not) is O(N) because visiting any node, starting from its parent, is O(1). The visitation of a node consists of three fixed operations: descending to the node from parent, the visitation proper (spending time at the node), and then returning to the parent. O(1 * N) is O(N).
The ultimate way to look at it is that the tree is a graph, and the traversal crosses each edge in the graph only twice. And the number of edges is proportional to the number of nodes since there are no cycles or redundant edges (each node can be reached by one unique path). A tree with N nodes has exactly N-1 edges: each node has an edge leading to it from its parent node, except for the root node of the tree.
At times it appears as if visiting a node requires more than one descent. For instance, after visiting the rightmost node in a subtree, we have to pop back up numerous levels before we can march to the right into the next subtree. But we did not descend all the way down just to visit that node. Each one-level descent can be accounted for as being necessary for visiting just the node immediately below, and the opposite ascent's cost is lumped with that. By visiting a node V, we also gain access to all the nodes below it, but all those nodes benefit from and share the edge traversal from V's parent down to V, and back up again.
This is related to amortized analysis, which applies in situations where we can globally understand the overall cost based on some general observation about the structure of the problem, but at the detailed level of the individual operations, the costs are distributed in an uneven way that appears confusing.
Amortized analysis helps us understand that, for instance, N insertions into a hash table which resizes itself by growing exponentially are O(N). Most of the insertion operations are quick, but from time to time, we grow the table and process its contents. This is similar to how, from time to time during a tree traversal, we have to perform numerous consecutive ascents to climb out of a deep subtree.
The global observation about the hash table is that each item inserted into the table will move to a larger table on average about three times in three resize operations, and so each insertion can be regarded as "pre paying" for three re-insertions, which is a fixed cost. Of course, "older" items will be moved more times, but this is offset by "younger" entries that move fewer times, diluting the cost. And the global observation about the tree was already noted above: it has N-1 edges, each of which are traversed exactly twice during the traversal, so the visitation of each node "pays" for the double traversal of its respective edge. Because this is so easy to see, we don't actually have to formally apply amortized analysis to tree traversal.
Now suppose we performed an individual search for each node (and the tree is a balanced search tree). Then the traversal would still not be O(N*N), but rather O(N log N). Suppose we have an ordered search tree which holds consecutive integers. If we iterate over the integers and perform an individual search for each value, then each search is O(log N), and we end up doing N of these. In this situation, the edge traversals are no longer shared, so amortization does not apply. To reach some given node that we are searching for, found at depth D, we have to cross D edges twice, for the sake of that node and that node alone. The next search in the loop for another integer will be completely independent of the previous one.
It may also help you to think of a linked list, which can be regarded as a very unbalanced tree. To visit all the items in a linked list of length N and return back to the head node is obviously O(N). Searching for each item individually is O(N*N), but in a traversal, we are not searching for each node individually, but using each predecessor as a springboard into finding the next node.
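To make the "each edge at most twice" argument concrete, here is a sketch of in-order traversal over a right-threaded tree (the `TNode` class is illustrative, not from the question). Every left edge is descended once, every right edge and every thread is followed once, so the whole traversal is O(N):

```python
class TNode:
    """Node in a right-threaded binary tree: `right` points either to a
    right child or, when `thread` is True, to the in-order successor."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.thread = False

def inorder_threaded(root):
    """In-order traversal in O(N): no repeated descents, no parent search."""
    out = []
    node = root
    while node and node.left:      # initial descent to the leftmost node
        node = node.left
    while node:
        out.append(node.key)
        if node.thread:            # follow the thread straight to the successor
            node = node.right
        else:                      # otherwise: leftmost node of the right subtree
            node = node.right
            while node and node.left:
                node = node.left
    return out

# A three-node tree: 2 with children 1 and 3; node 1's right pointer
# is a thread back to its in-order successor, 2.
a, b, c = TNode(1), TNode(2), TNode(3)
b.left, b.right = a, c
a.right, a.thread = b, True
print(inorder_threaded(b))   # → [1, 2, 3]
```

Note that when a thread is followed, the successor is reached in a single step; the "climb back up" that worried the questioner never happens as a separate search.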
There is no loop to find the parent. Put differently, you are going through each arc between two nodes twice. That would be 2 * (number of arcs) = 2 * (number of nodes - 1), which is O(N).
