Cracking the Coding Interview 6th Edition: 10.10. Rank from Stream - algorithm

The problem statement is as follows:
Imagine you are reading in a stream of integers. Periodically, you
wish to be able to look up the rank of a number x (the number of
values less than or equal to x). Implement the data structures and
algorithms to support these operations. That is, implement the method
track(int x), which is called when each number is generated, and the
method getRankOfNumber(int x), which returns the number of values
less than or equal to x (not including x itself).
EXAMPLE: Stream(in order of appearance): 5, 1, 4, 4, 5, 9, 7, 13, 3
getRankOfNumber(1) = 0 getRankOfNumber(3) = 1 getRankOfNumber(4) = 3
The suggested solution uses a modified binary search tree in which each node stores the number of nodes in its left subtree. Both methods run in O(log N) time on a balanced tree and O(N) on an unbalanced one, where N is the number of nodes.
But how can we construct a balanced BST from a stream of random integers? Won't the tree become unbalanced in due time if we keep on adding to the same tree and the root is not the median? Shouldn't the worst case complexity be O(N) for this solution (in which case a HashMap with O(1) and O(N) for track() and getRankOfNumber() respectively would be better)?

You just need to build an AVL or red-black tree to get the O(lg n) complexities you want.
As for the rank, it's fairly simple. Let count(T) be the number of elements in the tree rooted at T.
The rank of a node N is computed as follows:
First, there are count(N's left subtree) nodes before N (the elements smaller than N).
Let A be N's parent. If N is the right child of A, then A and everything in A's left subtree also come before N, which adds 1 + count(A's left subtree).
Likewise, if A is the right child of some B, add 1 + count(B's left subtree).
Keep walking up until you reach the root, adding 1 + count(left subtree) for every ancestor you reach from its right child. An ancestor you reach from its left child adds nothing, but you still have to keep climbing past it.
Since the height of a balanced tree is O(lg n), this takes O(lg n) to return a rank (O(lg n) to find the node plus O(lg n) to walk back up), provided every node stores the size of its left subtree.
hope that helps :)

It should be easier to picture if you build a binary search tree (BST) from the stream of numbers. All values less than a node go to its left, and all values greater than it go to its right.
The rank of x is then the number of nodes in the left subtree of the node holding x, plus, for every ancestor at which the search for x turned right, that ancestor and the nodes in its left subtree.
Operations to be done: find the node with value x, which is O(log N) on a balanced tree, and count the nodes in the relevant left subtrees. Counting by traversal costs time proportional to the subtree size, so it can be as bad as O(N).
To bring the counting down to O(1) per node, keep another field 'leftSubTreeSize' in the Node class and update it during insertion. The whole lookup is then O(log N) on a balanced tree.
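As a rough illustration of the answers above, here is a minimal Java sketch of a BST whose nodes keep a left-subtree count. The class name RankTracker and the field leftSize are made up for this example, and two simplifying assumptions are baked in: the tree is a plain, non-balancing BST (so the worst case is still O(N); an AVL or red-black tree would be needed for a guaranteed O(log N)), and duplicates are sent to the left so that they count as "less than or equal".

```java
// Sketch only: a plain (unbalanced) BST where each node counts the values in its left subtree.
class RankTracker {
    private static final class Node {
        final int value;
        int leftSize;      // number of values stored in this node's left subtree
        Node left, right;
        Node(int value) { this.value = value; }
    }

    private Node root;

    /** Inserts x. Equal values go left, so they are counted as "less than or equal". */
    void track(int x) {
        if (root == null) { root = new Node(x); return; }
        Node cur = root;
        while (true) {
            if (x <= cur.value) {
                cur.leftSize++;                  // x will end up in cur's left subtree
                if (cur.left == null) { cur.left = new Node(x); return; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node(x); return; }
                cur = cur.right;
            }
        }
    }

    /** Number of tracked values <= x, not counting x itself; -1 if x was never tracked. */
    int getRankOfNumber(int x) {
        int rank = 0;
        Node cur = root;
        while (cur != null) {
            if (x == cur.value) return rank + cur.leftSize;
            if (x < cur.value) {
                cur = cur.left;
            } else {
                rank += cur.leftSize + 1;        // cur and its whole left subtree precede x
                cur = cur.right;
            }
        }
        return -1;
    }
}
```

Feeding it the stream from the problem statement (5, 1, 4, 4, 5, 9, 7, 13, 3) gives ranks 0, 1 and 3 for the values 1, 3 and 4, matching the example.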

Related

How to decide a value resides in which subtree from a root for a level order binary tree?

I have a complete binary tree whose nodes are numbered in level order starting from 2 at the root (so 3 and 4 are the children of 2, 5 and 6 are the children of 3, and so on).
Given a root value and another value v, how can I decide whether v is on left or right subtree of the tree, without traversing the tree?
For example: Let's say root = 2, v = 15. I want to decide using a mathematical function or something that v is in right subtree.
Another example could be, root = 3, v = 10. Answer should be left subtree.
I know I can do this by a tree traversal. I want to know if this is possible in O(1).
It is unclear from your question if you want O(1) to be the time complexity or space complexity.
But, I assume you are talking about the time complexity as space is abundant these days.
If the extra space is acceptable, there is an approach that tells you which subtree holds a given value in constant time per query.
The idea is to store all the ancestors of the node with proper direction.
For example:
Let's assume Node 11 to be the target node.
In a single traversal, we can maintain a separate ancestors map for all the nodes containing the respective ancestor and direction to reach the target node.
Starting from the root, Node 2.
Node 2 has no parent, therefore, its ancestors map will be empty.
For Node 3, store a key value pair <2, L> (2 for parent and L for left).
Likewise, for Node 4, store a key value pair <2, R> (2 for parent and R for right).
For Node 6, the ancestors map looks like:
{
2 : "L",
3 : "R"
}
Repeat the procedure until every node is covered.
Now, the ancestors map for Node 11 looks as follows:
{
2 : "L",
3 : "R",
6 : "L"
}
To answer a query, check whether the root of the subtree in question is a key in the ancestors map of Node 11.
If it is, return the stored value, which tells you left or right, in constant time.
PS: An unordered (hash) map works well here.
Also, since this is a complete binary tree, its height for N nodes is about log2(N).
Therefore, the space required is O(N * log2(N)).
Insertion into an unordered map takes O(1) time on average.
Therefore, the time to build all the maps is O(N * log2(N)) on average.
The time per query is O(1) on average.
For N <= 10^5, building the ancestors maps should finish well within 1 second.
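The ancestors-map idea above could be sketched in Java roughly as follows. This assumes the numbering used in the examples (root 2, children of 2 are 3 and 4, and so on), so a node with value v sits at 1-based level-order position v - 1. The class name AncestorIndex and the method side are invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: precomputes, for every node, a map from each ancestor to the
// direction ('L' or 'R') taken at that ancestor on the way down to the node.
class AncestorIndex {
    private final Map<Integer, Map<Integer, Character>> ancestors = new HashMap<>();

    /** Builds the maps for all nodes with values 2..maxValue. */
    AncestorIndex(int maxValue) {
        for (int v = 2; v <= maxValue; v++) {
            Map<Integer, Character> m = new HashMap<>();
            int pos = v - 1;                       // 1-based level-order position
            if (pos > 1) {                         // the root (pos 1) has no ancestors
                int parentValue = pos / 2 + 1;
                m.putAll(ancestors.get(parentValue));          // inherit the parent's ancestors
                m.put(parentValue, pos % 2 == 0 ? 'L' : 'R');  // even position = left child
            }
            ancestors.put(v, m);
        }
    }

    /** Returns 'L' or 'R' if v lies in that subtree of root, or null if root is not an ancestor of v. */
    Character side(int root, int v) {
        Map<Integer, Character> m = ancestors.get(v);
        return m == null ? null : m.get(root);
    }
}
```

With new AncestorIndex(15), side(2, 15) yields 'R' and side(3, 10) yields 'L', as in the question. Copying the parent's map for each node is what produces the O(N * log2(N)) space noted above.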

Why is the time complexity of performing n union find (union by size) operations O(n log n)?

In Tree based Implementation of Union Find operation, each element is stored in a node, which contains a pointer to a set name. A node v whose set pointer points back to v is also a set name. Each set is a tree, rooted at a node with a self-referencing set pointer.
To perform a union, we simply make the root of one tree point to the root of the other. To perform a find, we follow set name pointers from the starting node until reaching a node whose set name pointer refers back to itself.
In union by size, when performing a union, we make the root of the smaller tree point to the root of the larger. This implies O(n log n) time for performing n union-find operations. Each time we follow a pointer, we are going to a subtree of size at least double the size of the previous subtree. Thus, we will follow at most O(log n) pointers for any find.
I do not understand why a find is always O(log n), no matter which unions came before it. Can someone please explain how the worst-case complexity is actually computed?
Let's assume for the moment that each tree of height h contains at least 2^h nodes. What happens if you join two such trees?
Say the smaller tree (by node count) has height h1 and the larger has height h2. The smaller root is attached under the larger root, so the combined tree has height max(h2, h1 + 1).
If the height stays at h2, the tree has the same height but more nodes, so it still has at least 2^h2 nodes.
If the height becomes h1 + 1, the smaller tree has at least 2^h1 nodes and the larger tree has at least as many, so the combined tree contains at least 2^h1 + 2^h1 = 2^(h1+1) nodes. So the condition still holds.
The most basic trees (1 node, height 0) also fulfill the condition. It follows that all trees that can be constructed by joining two trees together fulfill it as well.
Now the height is just the maximal number of steps to follow during a find. If a tree has n nodes and height h (n >= 2^h), this immediately gives log2(n) >= h >= steps.
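To make the bound concrete, here is a small Java sketch of union by size without path compression. The class name UnionFind and the helper depth (which counts the pointers a find would follow) are illustrative names, not taken from any particular textbook.

```java
// Sketch only: tree-based union-find with union by size and no path compression.
class UnionFind {
    private final int[] parent;
    private final int[] size;    // size[r] is meaningful only while r is a root

    UnionFind(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
    }

    /** Follows parent pointers until reaching a self-referencing root. */
    int find(int v) {
        while (parent[v] != v) v = parent[v];
        return v;
    }

    /** Number of pointers followed by find(v). */
    int depth(int v) {
        int steps = 0;
        while (parent[v] != v) { v = parent[v]; steps++; }
        return steps;
    }

    /** Points the root of the smaller tree at the root of the larger one. */
    void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        size[ra] += size[rb];
    }
}
```

Merging 8 singletons pairwise (0-1, 2-3, 4-5, 6-7, then 0-2, 4-6, then 0-4) is the worst case for this structure: element 7 ends up exactly 3 = log2(8) pointers away from the root, and no element is deeper.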
If you add path compression to union by rank or size, n union-find operations take O(n α(n)) time, where α is the inverse Ackermann function. A slightly weaker but easier-to-prove bound is O(n lg* n), where lg* is the iterated logarithm.
Note that both bounds are better than O(n log n).
In the question "Why is the Ackermann function related to the amortized complexity of union-find algorithm used for disjoint sets?" you can find details about this relation.
We need to prove that a tree with k items has height at most log(k) (base 2), so no tree is ever taller than log(N), where N is the number of items in the union-find structure. Call this claim (1).
In the base case, every tree is a single node with height 0 = log(1), so (1) is satisfied.
Now, assuming all the trees satisfy (1), we need to prove that joining any 2 trees with i and j nodes (i <= j) creates a new tree whose height is at most log(i + j). Call this claim (2).
The join takes the root of the smaller tree and attaches it to the root of the bigger one, so the height of the new tree is at most:
max(log(j), 1 + log(i)) = max(log(j), log(2i)) <= log(i + j) => (2) proved
log(j): the new tree keeps the height of the bigger tree
1 + log(i): the smaller tree, now hanging one level below the root, determines the new height
See the picture below for more details:
Ref: book Algorithms

Big O Time Complexity for Recursive Pattern

I have a question about the runtime of a recursive pattern.
Example 1
int f(int n) {
    if (n <= 1) {
        return 1;
    }
    return f(n - 1) + f(n - 1);
}
I can understand that the runtime for the above code is O(2^N) because if I pass 5, it calls 4 twice then each 4 calls 3 twice and follows till it reaches 1 i.e., something like O(branches^depth).
Example 2
Balanced Binary Tree
int sum(Node node) {
    if (node == null) {
        return 0;
    }
    return sum(node.left) + node.value + sum(node.right);
}
I read that the runtime for the above code is O(2^log N) since it is balanced but I still see it as O(2^N). Can anyone explain it?
When the number of elements is halved each time, the runtime is log N. But how does that apply to a binary tree here?
Is it 2^log N just because it is balanced?
What if it is not balanced?
Edit:
I know O(2^log N) simplifies to O(N), but I still see it as O(2^N).
Thanks!
A binary tree will have complexity O(n) here, like any other tree, because you ultimately traverse all of the elements of the tree. The halving does nothing special; it only means the sum for each child's subtree is computed separately.
The 2^(log n) term comes from the balanced case: a balanced tree has about log_2(n) levels, and 2^(log_2(n)) = n is the number of elements in the tree (leaf and non-leaf).
Again, if it is not balanced, it doesn't matter. The operation has to consider every element, which makes the runtime O(n).
Where could it have mattered? If we were searching for an element, then it would have mattered whether the tree is balanced or not.
I'll take a stab at this.
In a balanced binary tree, roughly half of each node's descendants sit in its left subtree and half in its right. The first layer of the tree is the root, with 1 element, then 2 elements in the next layer, then 4 elements in the next, then 8, and so on. So for a full tree with L layers, you have 2^L - 1 nodes in the tree.
Reversing this, if you have N elements to insert into a tree, you end up with a balanced binary tree of depth about L = log_2(N), so you only ever need to call your recursive algorithm for about log_2(N) layers. At each layer, you are doubling the number of calls to your algorithm, so in your case you end up with about 2^log_2(N) calls and O(2^log_2(N)) run time. Note that 2^log_2(N) = N, so it's the same either way, but we'll get to the advantage of a binary tree in a second.
If the tree is not balanced, you end up with depth greater than log_2(N), so you have more recursive calls. In the extreme case, when all of your children are to the left (or right) of their parent, you have N recursive calls, but each call returns immediately from one of its branches (no child on one side). Thus you would have O(N) run time, which is the same as before. Every node is visited once.
An advantage of a balanced tree is in cases like search. If the left-hand child is always less than the parent, and the right-hand child is always greater than, then you can search for an element n among N nodes in O(log_2(N)) time (not 2^log_2(N)!). If, however, your tree is severely imbalanced, this search becomes a linear traversal of all of the values and your search is O(N). If N is extremely large, or you perform this search a ton, this can be the difference between a tractable and an intractable algorithm.
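One way to convince yourself of the difference is to count the calls. The Java sketch below instruments both functions from the question with a counter; the class name CallCounter and the helpers balanced and chain are made up for the experiment.

```java
// Sketch only: counts how many times each recursive function from the question is invoked.
class CallCounter {
    static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    static long calls;   // reset to 0 before each experiment

    static int f(int n) {
        calls++;
        if (n <= 1) return 1;
        return f(n - 1) + f(n - 1);
    }

    static int sum(Node node) {
        calls++;
        if (node == null) return 0;
        return sum(node.left) + node.value + sum(node.right);
    }

    /** Builds a balanced BST holding the values lo..hi. */
    static Node balanced(int lo, int hi) {
        if (lo > hi) return null;
        int mid = (lo + hi) >>> 1;
        Node n = new Node(mid);
        n.left = balanced(lo, mid - 1);
        n.right = balanced(mid + 1, hi);
        return n;
    }

    /** Builds a fully right-leaning chain holding the values 1..n. */
    static Node chain(int n) {
        Node head = null;
        for (int v = n; v >= 1; v--) {
            Node x = new Node(v);
            x.right = head;
            head = x;
        }
        return head;
    }
}
```

f(5) makes 2^5 - 1 = 31 calls, whereas sum makes 15 calls on a 7-node tree whether it is balanced or a chain: one call per node plus one per null child, that is 2N + 1. The shape of the tree changes the recursion depth, not the number of calls.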

Disjoint Set in a special ways?

We implement the disjoint-set data structure with trees. In this data structure, makeset() creates a set with one element, and merge(i, j) merges the trees of sets i and j in such a way that the tree with the lower height becomes a child of the root of the other tree. If we do n makeset() operations and n-1 merge() operations in random order, and then do one find operation, what is the cost of this find operation in the worst case?
I) O(n)
II) O(1)
III) O(n log n)
IV) O(log n)
Answer: IV.
Could anyone give some hints on how the author arrived at this answer?
The O(log n) find is only true when you use union by rank (also known as weighted union). With this optimisation, we always place the tree with lower rank under the root of the tree with higher rank. If both have the same rank, we choose arbitrarily, but increase the rank of the resulting tree by one. This gives an O(log n) bound on the depth of the tree. We can prove this by showing that a node that is i levels below the root (equivalent to being in a tree of rank >= i) is in a tree of at least 2^i nodes (this is the same as showing a tree of size n has depth at most log n). This is easily done with induction.
Induction hypothesis: tree size is >= 2^j for j <= i.
Case i == 0: the node is the root, size is >= 1 = 2^0.
Case i + 1: the length of a path becomes i + 1 if it was i and the tree was then placed underneath
another tree. By the induction hypothesis, it was in a tree of size >= 2^i at
that time. It is being placed under another tree, which by our merge rules means
the other tree has at least rank i as well, and therefore also had >= 2^i nodes. The new tree
therefore has >= 2^i + 2^i = 2^(i + 1) nodes.
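The invariant behind the induction (a root of rank r heads a tree of at least 2^r nodes) can be checked mechanically. The following Java sketch implements union by rank without path compression; the names RankedForest, pathLength and invariantHolds are hypothetical, and the size array exists only to test the invariant.

```java
// Sketch only: union by rank (no path compression) plus a check of size >= 2^rank.
class RankedForest {
    private final int[] parent;
    private final int[] rank;   // without path compression this equals the tree height
    private final int[] size;   // kept only so the invariant can be verified

    RankedForest(int n) {
        parent = new int[n];
        rank = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
    }

    int find(int v) {
        while (parent[v] != v) v = parent[v];
        return v;
    }

    /** Places the lower-rank root under the higher-rank root; equal ranks bump the rank by one. */
    void merge(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (rank[ra] < rank[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        size[ra] += size[rb];
        if (rank[ra] == rank[rb]) rank[ra]++;
    }

    /** Number of parent pointers between v and its root. */
    int pathLength(int v) {
        int steps = 0;
        while (parent[v] != v) { v = parent[v]; steps++; }
        return steps;
    }

    /** True if every root r satisfies size[r] >= 2^rank[r]. */
    boolean invariantHolds() {
        for (int r = 0; r < parent.length; r++) {
            if (parent[r] == r && size[r] < (1 << rank[r])) return false;
        }
        return true;
    }
}
```

Because a root of rank r always has at least 2^r nodes beneath it, a forest over n elements can never produce a rank above log2(n), which is where answer IV comes from.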

How to generate an AVL tree as lopsided as possible?

I saw this in a paper, and someone argued that there can be at most log(n) rotations when we delete a node from an AVL tree. I believe we can reach this bound by generating an AVL tree that is as lopsided as possible. The problem is how to do this. It would help me a lot in researching rotations on removal. Thanks very much!
If you want to make a maximally lopsided AVL tree, you are looking for a Fibonacci tree, which is defined inductively as follows:
A Fibonacci tree of order 0 is empty.
A Fibonacci tree of order 1 is a single node.
A Fibonacci tree of order n + 2 is a node whose left child is a Fibonacci tree of order n and whose right child is a Fibonacci tree of order n + 1.
For example, here's a Fibonacci tree of order 5:
The Fibonacci trees represent the maximum amount of skew that an AVL tree can have, since any additional skew would push some node's balance factor beyond the limit that AVL trees allow.
You can use this definition to very easily generate maximally lopsided AVL trees:
function FibonacciTree(int order) {
    if order = 0, return the empty tree.
    if order = 1, create a single node and return it.
    otherwise:
        let left = FibonacciTree(order - 2)
        let right = FibonacciTree(order - 1)
        return a tree whose left child is "left" and whose right child is "right."
}
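For reference, a runnable Java version of the pseudocode might look like the sketch below. The helpers height, size and isAvl are not part of the original answer; they are added here only to confirm that the result is a valid AVL shape.

```java
// Sketch only: builds the shape of a Fibonacci tree (nodes carry no keys).
class FibonacciTree {
    static final class Node {
        Node left, right;
    }

    /** Order 0 is the empty tree, order 1 a single node, order k has subtrees of order k-2 and k-1. */
    static Node build(int order) {
        if (order <= 0) return null;
        Node n = new Node();
        if (order == 1) return n;
        n.left = build(order - 2);
        n.right = build(order - 1);
        return n;
    }

    /** Height counted in nodes; the empty tree has height 0. */
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    static int size(Node n) {
        return n == null ? 0 : 1 + size(n.left) + size(n.right);
    }

    /** True if every node's subtrees differ in height by at most 1. */
    static boolean isAvl(Node n) {
        if (n == null) return true;
        return Math.abs(height(n.left) - height(n.right)) <= 1
                && isAvl(n.left) && isAvl(n.right);
    }
}
```

The order-5 tree has 12 nodes and height 5. To turn the shape into an actual AVL tree, keys can be assigned by an in-order traversal.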
Hope this helps!
