How many nodes in an AVL tree change depth after a rotation - algorithm

When adding or deleting a node in an AVL tree, rebalancing might occur. I can understand how there can be O(log(n)) rebalances needed, but when those rotations occur to balance the tree how many nodes actually change level. I can't seem to find this anywhere. I was thinking it was O(log(n)) but can't seem to figure out why. Help would be greatly appreciated

The answer is O(n).
Suppose that at each node there is a "depth" field, how much will it cost to maintain it?
There is a theorem: If the information in field F of node N depends solely on its direct children, then it can be maintained when updated (-inserted or deleted) in logarithmic time.
(the theorem can be proved by induction)
The "depth" field doesn't depends on its children - but rather on its parent.
Note, however, that the theorem is one way, meaning it says when it can be maintained in logarithmic time, but doesn't say when not.Therefore it cant't be said with certaity that the "depth" field can be maintained in logarithmic time (such as height or BF fields), and it can even be seen at insertion that O(n) nodes change thier depth:
In the first rotation the depth of t2 and t4 have been changed, and in the second, the depth of t1, t2, and t3!

Related

Why is shallow binary tree better?

Why do we always want shallow binary tree? In what cases is shallow binary tree better than non-shallow/minimum depth tree?
I am just confused as my prof keeps saying we want to aim for shallowest possible binary tree but I do not understand why. I guess smallar is better but is there any specific concrete reason? Sorry for my bad english thanks for your help
I'm assuming this is in regards to binary search trees - if not, please let me know and I can update this answer.
In a binary search tree, the cost of almost every operation (insertion, deletion, lookup, successor, predecessor, min, max, range search, split, join, etc.) depends on the height of the binary search tree. The reason for this is that these operations work by walking down the tree from the root until they either fall off the tree or find what they're looking for. The deeper the tree, the longer this can take if you get bad inputs.
By shuffling nodes around to keep the tree height low, we can make it so that these operations are, in general, very fast. A tree with height h can have at most 2h - 1 nodes in it, which is a huge number compared with h (figure that if h = 20, 2h - 1 is over a million!), so if you make an effort to pack the nodes into the tree higher up and closer to the root, you'll get better operation speeds all around.
There are some cases where it's actually beneficial to have trees that are as imbalanced as possible. For example, if you have a binary search tree and know in advance that some elements will be looked up more than others, you may want to shuffle the nodes around in the tree to put the high-frequency items higher up and the low-frequency items deeper in the tree. In non-binary-search-tree contexts, the randomized meldable priority queue works by randomly walking down a tree doing merges, and the less balanced the tree is the more likely it is for these operations to end early by falling off the tree.

Promote a node after already lost 2 or more children

In the decrease-key operation of a Fibonacci Heap, if it is allowed to lose s > 1 children before cutting a node and melding it to the root list (promote the node), does this alter the overall runtime complexity? I think there are no changes in the complexity since the change in potential will be the same. But I am not sure if I am right.
And how can this be proved by the amortized analysis?
Changing the number of children that a node in the Fibonacci heap can lose does affect the runtime, but my suspicion is that if you're careful with how you do it you'll still get the same asymptotic runtime.
You're correct that the potential function will be unchanged if you allow each node to lose multiple children before being promoted back up to the root. However, the potential function isn't the source of the Fibonacci heap's efficiency. The reason that we perform cascading cuts (promoting multiple nodes back up to the root level during a decrease-key) is to ensure that a tree that has order n has a number of nodes in it that is exponential in n. That way, when doing a dequeue-min operation and coalescing trees together such that there is at most one tree of each order, the total number of trees required to store all the nodes is logarithmic in the number of nodes. The standard marking scheme ensures that each tree of order n has at least Θ(φn) nodes, where φ is the Golden Ratio (around 1.618...)
If you allow more nodes to be removed out of each tree before promoting them back to the root, my suspicion is that if you cap the number of missing children at some constant that you should still get the same asymptotic time bounds, but probably with a higher constant factor (because each tree holds fewer nodes and therefore more trees will be required). It might be worth writing out the math to see what recurrence relation you get for the number of nodes in each tree in case you want an exact value.
Hope this helps!

Check if 2 tree nodes are related (ancestor/descendant) in O(1) with pre-processing

Check if 2 tree nodes are related (i.e. ancestor-descendant)
solve it in O(1) time, with O(N) space (N = # of nodes)
pre-processing is allowed
That's it. I'll be going to my solution (approach) below. Please stop if you want to think yourself first.
For a pre-processing I decided to do a pre-order (recursively go through the root first, then children) and give a label to each node.
Let me explain the labels in details. Each label will consist of comma-separated natural numbers like "1,2,1,4,5" - the length of this sequence equals to (the depth of the node + 1). E.g. the label of the root is "1", root's children will have labels "1,1", "1,2", "1,3" etc.. Next-level nodes will have labels like "1,1,1", "1,1,2", ..., "1,2,1", "1,2,2", ...
Assume that "the order number" of a node is the "1-based index of this node" in the children list of its parent.
Common rule: node's label consists of its parent label followed by comma and "the order number" of the node.
Thus, to answer if two nodes are related (i.e. ancestor-descendant) in O(1), I'll be checking if the label of one of them is "a prefix" of the other's label. Though I'm not sure if such labels can be considered to occupy O(N) space.
Any critics with fixes or an alternative approach is expected.
You can do it in O(n) preprocessing time, and O(n) space, with O(1) query time, if you store the preorder number and postorder number for each vertex and use this fact:
For two given nodes x and y of a tree T, x is an ancestor of y if and
only if x occurs before y in the preorder traversal of T and after y
in the post-order traversal.
(From this page: http://www.cs.arizona.edu/xiss/numbering.htm)
What you did in the worst case is Theta(d) where d is the depth of the higher node, and so is not O(1). Space is also not O(n).
if you consider a tree where a node in the tree has n/2 children (say), the running time of setting the labels will be as high as O(n*n). So this labeling scheme wont work ....
There are linear time lowest common ancestor algorithms(at least off-line). For instance have a look here. You can also have a look at tarjan's offline LCA algorithm. Please note that these articles require that you know the pairs for which you will be performing the LCA in advance. I think there are also online linear time precomputation time algorithms but they are very complex. For instance there is a linear precomputation time algorithm for the range minimum query problem. As far as I remember this solution passed through the LCA problem twice . The problem with the algorithm is that it had such a large constant that it require enormous input to be actually faster then the O(n*log(n)) algorithm.
There is much simpler approach that requires O(n*log(n)) additional memory and again answers in constant time.
Hope this helps.

Backtracking Algorithm

How weigth order affects the computing cost in a backtracking algorithm? The number of nodes and search trees are the same but when it's non-ordered it tooks a more time, so it's doing something.
Thanks!
Sometimes in backtracking algorithms, when you know a certain branch is not an answer - you can trim it. This is very common with agents for games, and is called Alpha Beta Prunning.
Thus - when you reorder the visited nodes, you can increase your prunning rate and thus decrease the actual number of nodes you visit, without affecting the correctness of your answer.
One more possibility - if there is no prunning is cache performance. Sometimes trees are stored as array [especially complete trees]. Arrays are most efficient when iterating, and not "jumping randomly". The reorder might change this behavior, resulting in better/worse cache behavior.
The essence of backtracking is precisely not looking at all possibilities or nodes (in this case), however, if the nodes are not ordered it is impossible for the algorithm to "prune" a possible branch because it is not known with certainty if the element Is actually on that branch.
Unlike when it is an ordered tree since if the searched element is greater / smaller the root of that subtree, the searched element is to the right or left respectively. That is why if the tree is not ordered the computational order is equal to brute force, however, if the tree is ordered in the worst case order is equivalent to brute force, but the order of execution is smaller.

Data Structure to maintain numbers

Please suggest a data structure to maintain numbers in such a way that i can answer the following queries -
Find(int n) - O(log(n))
Count number of numbers less than k = O(log(n))
Insert - O(Log(n))
It's not homework, but a smaller problem that i am encountering to solve a bigger one - Number of students with better grades and lower jee rank
I have though of an avl tree with maintaining number of nodes in subtree at each nod.But i dont know how to maintain this count at each node when an insert is being done and re-balancing is being done.
I would also try using an AVL Tree. Without looking much deeper into it, I don't think this would be too hard to add. In case of an AVL tree you alway need to know the depth of each subtree for each node anyway (or at least the balancing factor). So it should not be too hard to propagate the size of the subtrees. In case of an rotation, you know exactely where each node and each subtree will land, so it should be just a simple recalculation for those nodes, which are rotated.
Finding in a binary tree is O(log(n)), inserting too.
If you store the subtree size in the node:
you can coming back from a successful insert in a subtree increment the node's counter;
on deleting the same.
So subtree size is like a find, O(log(n)).
Have a look at the different variants of heap data structures, e.g. here.

Resources