Splay tree rotation algorithm: Why use zig-zig and zig-zag instead of simpler rotations? - data-structures

I don't quite understand why the rotation in the splay tree data structure takes into account not only the parent of the rotating node, but also the grandparent (the zig-zag and zig-zig operations). Why would the following not work:
As we insert, for instance, a new node into the tree, we check whether we insert into the left or right subtree. If we insert into the left, we rotate the result RIGHT, and vice versa for the right subtree. Recursively it would be something like this:
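Tree insert(Tree root, Key k) {
    if (root == null) return new Tree(k);  // base case: empty subtree becomes the new node
    if (k < root.key) {
        root.setLeft(insert(root.getLeft(), k));
        return rotateRight(root);  // lift the newly inserted key one level
    }
    // vice versa for the right subtree (setRight, then rotateLeft)
}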
That should avoid the whole "splay" procedure, don't you think?

The algorithm you're proposing on the tree is called the "move-to-root" heuristic and is discussed on page four of Sleator and Tarjan's original paper on splay trees. They cite an older paper by Allen and Munro where it is shown that if you try to use move-to-root as a means for reshaping trees, it is possible for the amortized cost of each lookup to be O(n), which is quite slow. Splaying is a very carefully designed algorithm for reshaping the tree, and it guarantees amortized O(log n) lookups no matter what sequence of accesses is performed.
Intuitively, move-to-root is not a very good way to reshape the tree because, while it makes the accessed node easier to reach in the future, it can push the other nodes on the path from that node to the root further down. As a result, the overall tree can get worse when doing this version of tree reorganizing. On the other hand, splaying roughly halves the depth of every node on the access path while bringing the accessed node to the root, which means that as a whole the tree tends to get better during a splay.
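To make the problem concrete, here is a minimal Python sketch of the insertion scheme from the question. The names (Node, insert_move_to_root, height) are made up for illustration. It only shows one bad input, sorted insertions, and is not a reproduction of Allen and Munro's analysis.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_right(node):
    # Lift node.left above node and return the new subtree root.
    child = node.left
    node.left = child.right
    child.right = node
    return child

def rotate_left(node):
    # Mirror image: lift node.right above node.
    child = node.right
    node.right = child.left
    child.left = node
    return child

def insert_move_to_root(root, key):
    # The scheme from the question: insert recursively, then rotate the
    # new key up one level at every step on the way back to the root.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert_move_to_root(root.left, key)
        return rotate_right(root)
    root.right = insert_move_to_root(root.right, key)
    return rotate_left(root)

def height(node):
    # Height in edges; an empty tree has height -1.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

root = None
for k in range(1, 101):
    root = insert_move_to_root(root, k)
print(root.key, height(root))  # prints: 100 99
```

After 100 sorted insertions the tree is a single left-leaning chain, so the next lookup of the smallest key has to walk through all 100 nodes.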
Hope this helps!

Related

Best 'order' traversal to copy a balanced binary tree into an AVL tree with minimum rotations

I have two binary trees. For one, A, I can access the nodes and their pointers (left, right, parent); for the other, B, I don't have access to any of the internals. The idea is to copy A into B by iterating over the nodes of A and inserting each one into B. Given that B is an AVL tree, is there a traversal of A (preorder, inorder, postorder) that minimizes the number of rotations when inserting the elements into B?
Edit:
The tree A is balanced, I just don't know the exact implementation;
Iteration on tree A needs to be done using only pointers (the programming language is C and there is no queue or stack data structure that I can make use of).
Rebalancing in an AVL tree happens when, at some node, the height of one subtree exceeds the height of the other subtree by more than one. So to avoid triggering a rebalance you want to feed nodes into the AVL tree one level at a time; that is, feed it all of the nodes from level N of the original tree before you feed it any of the nodes from level N+1.
That ordering would be achieved by a breadth-first traversal of the original tree.
Edit
OP added:
"Iteration on tree A needs to be done using only pointers (the programming language is C and there is no queue or stack data structure that I can make use of)."
That does not affect the answer to the question as posed, which is still that a breadth-first traversal requires the fewest rebalances.
It does affect the way you will implement the breadth-first traversal. If you can't use a predefined queue then there are several ways that you could implement your own queue in C: an array, if permitted, or some variety of linked list are the obvious choices.
If you aren't allowed to use dynamic memory allocation, and the size of the original tree is not bounded such that you can build a queue using a fixed buffer that is sized for the worst case, then you can abandon the queue-based approach and instead use recursion to visit successively deeper levels of the tree. (Imagine a recursive traversal that stops when it reaches a specified depth in the tree, and only emits a result for nodes at that specified depth. Wrap that recursion in a while or for loop that runs from a depth of zero to the maximum depth of the tree.)
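The queue-free traversal described in the parentheses could look roughly like this. It is a sketch in Python rather than C, with made-up names (Node, levels, emit_level, level_order). Appending to a list stands in for the insert call on B.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def levels(node):
    # Number of levels in the tree (0 for an empty tree).
    if node is None:
        return 0
    return 1 + max(levels(node.left), levels(node.right))

def emit_level(node, depth, out):
    # Recursive descent that stops at the requested depth and only
    # emits nodes found exactly there.
    if node is None:
        return
    if depth == 0:
        out.append(node.key)  # stand-in for "insert node.key into B"
        return
    emit_level(node.left, depth - 1, out)
    emit_level(node.right, depth - 1, out)

def level_order(root):
    out = []
    for depth in range(levels(root)):  # one pass per level, top to bottom
        emit_level(root, depth, out)
    return out

root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(level_order(root))  # prints: [4, 2, 6, 1, 3, 5, 7]
```

Each pass walks the tree again from the root, so for a balanced tree the traversal costs O(N log N) in total instead of the O(N) of a queue-based version. The only extra memory is the recursion stack.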
If the original tree is not necessarily AVL-balanced, then you can't just copy it.
To ensure that there is no rebalancing in the new tree, you should create a complete binary tree, and you should insert the nodes in BFS/level order so that every intermediate tree is also complete.
A "complete" tree is one in which every level is full, except possibly the last. Since every complete tree is AVL-balanced, and every intermediate tree is complete, there will be no rebalancing required.
If you can't copy your original tree out into an array or other data structure, then you'll need to do log(N) in-order traversals of the original tree to copy all the nodes. During the first traversal, you select and copy the root. During the second, you select and copy level 2. During the third, you copy level 3, etc.
Whether or not a source node is selected for each level depends only on its in-order index (its rank among the keys), so the actual structure of the source tree is irrelevant.
Since each traversal takes O(N) time, the total time spent traversing is O(N log N). Since inserts take O(log N) time, though, that is how long insertion takes as well, so doing log N traversals does not increase the complexity of the overall process.
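To see which in-order index ends up on which level, here is a small sketch that assumes the keys are already available in sorted order (for example from one in-order pass). It lays them out in the level order of a complete tree. The function name and the heap-style slot numbering are my own choices for illustration.

```python
def complete_bst_level_order(sorted_keys):
    # Slot i (1-based) has children 2*i and 2*i + 1, as in a binary heap.
    # Filling the slots in in-order sequence with sorted keys yields a
    # complete BST; reading the slots left to right is its level order.
    n = len(sorted_keys)
    slots = [None] * (n + 1)
    keys = iter(sorted_keys)

    def fill(i):
        if i > n:
            return
        fill(2 * i)            # left subtree gets the smaller keys
        slots[i] = next(keys)  # then this slot
        fill(2 * i + 1)        # then the right subtree
    fill(1)
    return slots[1:]

print(complete_bst_level_order([1, 2, 3, 4, 5, 6, 7]))  # prints: [4, 2, 6, 1, 3, 5, 7]
print(complete_bst_level_order([1, 2, 3, 4, 5, 6]))     # prints: [4, 2, 6, 1, 3, 5]
```

Inserting the keys into an empty AVL tree in the printed order keeps every intermediate tree complete. Unlike the repeated-traversal approach in the answer, this version needs O(N) extra space for the slots.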

Creating a Red-Black tree from BST tree - the fastest way?

I have to create and describe an algorithm for my university course that gets a BST tree T and creates a new BST tree T' that satisfies these properties (and is as fast as possible):
1) T' has the same exact key values as T
2) T' is a Red-Black tree
So far I've had only one idea: randomize 0 or 1. In case of 0, get the max key node from left subtree of T and insert it into T', otherwise get the min key node from right subtree of T and insert it into T'. This is to ensure that Red-Black tree is at least somewhat balanced. The insertion would be any standard RB insertion.
Complexity of getting min/max is O(h), and since this needs to be repeated for each of the nodes in T, this would get quite high. I could also keep a pointer at the max node of left subtree and min node of the right subtree, which would solve the problem of traversing the whole height of the tree every time.
What do you think about this solution? I'm pretty sure it can be done better. Sorry if there is an obvious better solution, but I couldn't find answer to this on the internet, also it's only my 2nd semester at the university and I don't have much experience with programming.
Unless you have some other constraints or information, the fastest way is to forget about the shape of the original BST.
Just put the keys in an ordered list, and build a complete binary tree from it, all in O(N) time.
Then, if there's a partially filled leaf level, then color those nodes red. The rest are black.
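A sketch of this construction, assuming the keys have already been collected into a sorted list by an in-order traversal of T. The names RBNode, build_red_black and black_height are made up. Note that the midpoint split used here fills every level except possibly the last, but does not necessarily pack the last level to the left, which the coloring rule does not need.

```python
class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right

def build_red_black(sorted_keys):
    n = len(sorted_keys)
    if n == 0:
        return None
    last_level = n.bit_length() - 1          # depth of the deepest level
    last_level_full = ((n + 1) & n) == 0     # true when n is 2^k - 1

    def build(lo, hi, depth):
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        # Only nodes on a partially filled bottom level are red.
        red = depth == last_level and not last_level_full
        return RBNode(sorted_keys[mid], "red" if red else "black",
                      build(lo, mid, depth + 1),
                      build(mid + 1, hi, depth + 1))
    return build(0, n, 0)

def black_height(node):
    # Returns the black height and asserts the red-black invariants.
    if node is None:
        return 1
    left, right = black_height(node.left), black_height(node.right)
    assert left == right, "black heights differ"
    if node.color == "red":
        for child in (node.left, node.right):
            assert child is None or child.color == "black"
    return left + (1 if node.color == "black" else 0)

tree = build_red_black(list(range(1, 11)))
print(tree.key, tree.color, black_height(tree))  # prints: 6 black 4
```

Building the tree is O(N) once the keys are sorted, and the in-order traversal that produces the sorted list is O(N) as well.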

Is it always possible to turn one BST into another using at most O(n) tree rotations?

This earlier question asks whether it's always possible to turn one BST for a set of values into another BST for the same set of values purely using tree rotations (the answer is yes). However, is it always possible to do this using at most O(n) total tree rotations?
Yes, it is always possible to turn one BST into another using at most O(n) tree rotations. This answer follows the same general approach as the other answer by picking some canonical tree shape T* and bounding the number of rotations needed to turn an arbitrary tree into our canonical tree. Then you can turn an arbitrary tree T₁ into another tree T₂ by transforming T₁ into T* and then transforming T* into T₂.
As suggested in comments, you can choose your canonical tree to be a degenerate linked list. For trees of n nodes, this upper bounds the number of rotations needed at 2n−2.
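Here is a rough sketch of where the 2n−2 figure comes from. Flattening a tree into a right-leaning chain needs at most n−1 right rotations, because every rotation adds exactly one node to the right spine. Doing this for T₁ and undoing it for T₂ gives at most 2(n−1) rotations. The code below is the usual "tree to vine" loop; the names are made up for illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def to_right_chain(root):
    # Returns (new_root, number_of_rotations).
    pseudo = Node(None, None, root)  # dummy parent above the real root
    tail = pseudo
    rotations = 0
    while tail.right is not None:
        cur = tail.right
        if cur.left is None:
            tail = cur               # cur is in its final position on the chain
        else:
            child = cur.left         # right rotation at cur
            cur.left = child.right
            child.right = cur
            tail.right = child
            rotations += 1
    return pseudo.right, rotations

root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
chain, rotations = to_right_chain(root)
keys = []
while chain is not None:
    keys.append(chain.key)
    chain = chain.right
print(keys, rotations)  # prints: [1, 2, 3, 4, 5, 6, 7] 4
```

In this example the right spine starts with three nodes (4, 6, 7), so four rotations are enough to reach the seven-node chain.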
In the paper Rotation Distance, Triangulation, and Hyperbolic Geometry, Daniel Sleator, Robert Tarjan, and William Thurston proved that the rotation distance between any two binary trees of n nodes is at most 2n−6 for n ≥ 11 (better than the bound we get when transforming into a linked list).
At a high level, they did this by introducing a way to represent any binary tree as a polygon triangulation, where a tree rotation has a corresponding triangulation operation. Then, instead of reasoning about binary trees in their usual representation, the paper picks a canonical triangulation and shows how to transform an arbitrary triangulation into their desired one.
The canonical triangulation they chose is one where all diagonals emanate from a single vertex in a fan-like shape, which ends up corresponding to a somewhat unintuitive binary tree shape (a generalization of linked lists that also includes diamond shaped trees consisting of a root, a left child whose right child is a linked list, and a right child whose left child is a linked list).
It's a very cool technique that illustrates the power of isometries in data structures, showing how changing our representation can give us a new way of approaching a problem. Some friends and I recently put together a writeup walking through Sleator, Tarjan, and Thurston's proof if you would like to explore this in more detail.
Yes, this is always possible. I fear that the best I can do right now is give you a silly algorithm that proves it's possible, though I suspect that there must be a much better way to do this.
The Day-Stout-Warren algorithm is an algorithm that, starting with any BST, uses tree rotations to convert it to a perfectly balanced BST. It runs in time O(n) and does O(n) total rotations.
So suppose that you want to turn one tree T1 into another tree T2 using tree rotations. Run Day-Stout-Warren on both trees to convert them to the same balanced tree T*, and record the rotations that you needed to make in both cases. Then you can turn T1 into T2 by first running all the rotations needed to perfectly balance T1, then running, in reverse order, the inverses of the rotations needed to turn T2 into a balanced tree. This turns T1 into T* and then turns T* into T2. Since the Day-Stout-Warren algorithm makes only O(n) total rotations, this too makes only O(n) total rotations.
I feel like there has to be a better way to do this, but I'm not sure off the top of my head how to achieve this. If I think of anything, I'll let you know!

Keeping avl tree balanced without rotations

A B-tree is a self-balancing tree, like an AVL tree. HERE we can see how left and right rotations are used to keep an AVL tree balanced.
And HERE is a link which explains insertion in B tree. This insertion technique does not involve any rotations, if I am not wrong, to keep tree balanced. And therefore it looks simpler.
Question: Is there any similar (or any other technique without using rotations) to keep avl tree balanced ?
The answer is... yes and no.
B-trees don't need to perform rotations because they have some slack with how many different keys they can pack into a node. As you add more and more keys into a B-tree, you can avoid the tree becoming lopsided by absorbing those keys into the nodes themselves.
Binary trees don't have this luxury. If you insert a key into a binary tree, it will increase the height of some branch in that tree by 1 in all cases because that key needs to go into its own node. Rotations combat the overall growth of the tree by ensuring that if certain branches grow too much, that height is shuffled into the rest of the tree.
Most balanced BSTs have some sort of rebalancing strategy that involves rotations, but not all do. One notable example of a strategy that doesn't directly involve rotations is the scapegoat tree, which rebalances by tearing huge subtrees out of the master tree, optimally rebuilding them, then gluing the subtree back into the main tree. This approach doesn't technically involve any rotations and is a pretty clean way to implement a balanced tree.
That said - the most space-efficient implementations of scapegoat trees do indeed use rotations to convert an imbalanced tree into a perfectly balanced one. You don't have to use rotations to do this, though if space is short it's probably the best way to do so.
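For illustration, the "tear out and rebuild" step could be sketched like this. This is only the rebuild of a single subtree, written with O(size) extra space and no rotations. A real scapegoat tree also needs the logic that decides which subtree to rebuild, which is not shown. All names are made up.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def flatten(node, out):
    # Collect the subtree's nodes in sorted (in-order) order.
    if node is not None:
        flatten(node.left, out)
        out.append(node)
        flatten(node.right, out)

def rebuild(nodes, lo, hi):
    # Relink nodes[lo:hi] into a balanced subtree around the middle node.
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = nodes[mid]
    node.left = rebuild(nodes, lo, mid)
    node.right = rebuild(nodes, mid + 1, hi)
    return node

def rebuild_subtree(root):
    nodes = []
    flatten(root, nodes)
    return rebuild(nodes, 0, len(nodes))

def height(node):
    # Height in edges; an empty tree has height -1.
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

# A badly unbalanced subtree: a right-leaning chain 1 -> 2 -> ... -> 7.
chain = Node(1)
tail = chain
for k in range(2, 8):
    tail.right = Node(k)
    tail = tail.right

balanced = rebuild_subtree(chain)
print(height(chain), height(balanced), balanced.key)  # prints: 0 2 4
```

The old root (key 1) ends up as a leaf after the rebuild, which is why its height prints as 0. The caller would then attach the returned node where the old subtree root used to hang.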
Hope this helps!
Rotations can be made simple (if you need only simplicity).
If the insertion traffic is left, the balance -1 is the red-light.
If the insertion traffic is right, the balance 1 is the red-light.
This is a (simplified) coarse-graining (2-adic rounding) of the normalized fundamental AVL balance:
{left,even,right} ~ {low,even,high} ~ {green,green,red}
Walk the insertion route and rotate every red-light (before the insertion). If the next light is green, you can just rotate the red-light 1 or 2 times. You may have to re-balance the next subtrees before each rotation, because inner subtrees are invariant. This is simple, but it takes a very long time. You have to move down the green-light before each rotation. You can always move down the green-light to the root, and you can rotate the tree-top to generate a new green-light.
The red-light rotations naturally move down the green-light.
At this point, you have only the green-lights for the insertion.
The cost structure of this naive method is topologically simplified as
df(h)/dh=∫f(h)dh
such as sin(h),sinh(h),etc.

Why in-order traversal of a threaded tree is O(N)?

I can't seem to figure out how the in-order traversal of a threaded binary tree is O(N).
You have to descend the links to find the leftmost child and then go back up by the thread when you want to add the parent to the traversal path. Wouldn't that be O(N^2)?
Thanks!
The traversal of a tree (threaded or not) is O(N) because visiting any node, starting from its parent, is O(1). The visitation of a node consists of three fixed operations: descending to the node from parent, the visitation proper (spending time at the node), and then returning to the parent. O(1 * N) is O(N).
The ultimate way to look at it is that the tree is a graph, and the traversal crosses each edge in the graph only twice. And the number of edges is proportional to the number of nodes since there are no cycles or redundant edges (each node can be reached by one unique path). A tree with N nodes has exactly N-1 edges: each node has an edge leading to it from its parent node, except for the root node of the tree.
At times it appears as if visiting a node requires more than one descent. For instance, after visiting the rightmost node in a subtree, we have to pop back up numerous levels before we can march to the right into the next subtree. But we did not descend all the way down just to visit that node. Each one-level descent can be accounted for as being necessary for visiting just the node immediately below, and the opposite ascent's cost is lumped with that. By visiting a node V, we also gain access to all the nodes below it, but all those nodes benefit from and share the edge traversal from V's parent down to V, and back up again.
This is related to amortized analysis, which applies in situations where we can globally understand the overall cost based on some general observation about the structure of the problem, but at the detailed level of the individual operations, the costs are distributed in an uneven way that appears confusing.
Amortized analysis helps us understand that, for instance, N insertions into a hash table which resizes itself by growing exponentially are O(N). Most of the insertion operations are quick, but from time to time, we grow the table and process its contents. This is similar to how, from time to time during a tree traversal, we have to perform numerous consecutive ascents to climb out of a deep subtree.
The global observation about the hash table is that, averaged over all items, each item is moved to a larger table only a constant number of times (with doubling, the total number of moves stays below twice the number of insertions), so each insertion can be regarded as "prepaying" for a fixed number of re-insertions, which is a fixed cost. Of course, "older" items will be moved more times, but this is offset by "younger" entries that move fewer times, diluting the cost. And the global observation about the tree was already noted above: it has N-1 edges, each of which is traversed exactly twice during the traversal, so the visitation of each node "pays" for the double traversal of its respective edge. Because this is so easy to see, we don't actually have to formally apply amortized analysis to tree traversal.
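The hash table accounting can be checked with a few lines of Python. This sketch only counts element moves for a table that doubles when it is full. The function name, the starting capacity of 1 and the "resize when full" policy are assumptions made for the example, not a description of any particular hash table.

```python
def count_moves(n, initial_capacity=1):
    # Simulate n insertions into a table that doubles when it is full
    # and count how many existing items get moved in total.
    capacity = initial_capacity
    size = 0
    moves = 0
    for _ in range(n):
        if size == capacity:
            moves += size      # every existing item moves to the new table
            capacity *= 2
        size += 1
    return moves

print(count_moves(1000))  # prints: 1023
```

The moves are 1 + 2 + 4 + ... + 512 = 1023, which is less than 2 * 1000, even though the oldest item is moved ten times.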
Now suppose we performed an individual search for each node (and the tree is a balanced search tree). Then the traversal would still not be O(N*N), but rather O(N log N). Suppose we have an ordered search tree which holds consecutive integers. If we increment over the integers and perform individual searches for each value, then each search is O(log N), and we end up doing N of these. In this situation, the edge traversals are no longer shared, so amortization does not apply. To reach some given node that we are searching for which is found at depth D, we have to cross D edges twice, for the sake of that node and that node alone. The next search in the loop for another integer will be completely independent of the previous one.
It may also help you to think of a linked list, which can be regarded as a very unbalanced tree. To visit all the items in a linked list of length N and return back to the head node is obviously O(N). Searching for each item individually is O(N*N), but in a traversal, we are not searching for each node individually, but using each predecessor as a springboard into finding the next node.
There is no loop to find the parent. Put another way, you go through each arc between two nodes at most twice. That is 2 * (number of arcs) = 2 * (number of nodes - 1) link traversals, which is O(N).
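If you want to convince yourself, here is a minimal sketch of a right-threaded BST that counts every pointer it follows during an in-order traversal. The class and function names are made up, and only right threads are used, which is one common variant and may differ from the one you have in mind.

```python
class TNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None    # right child, or thread to the in-order successor
        self.thread = False  # True when `right` is a thread

def insert(root, key):
    node = TNode(key)
    if root is None:
        return node
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = node
                node.right, node.thread = cur, True  # successor is the parent
                return root
            cur = cur.left
        else:
            if cur.thread or cur.right is None:
                # The new right child inherits cur's old thread, if any.
                node.right, node.thread = cur.right, cur.thread
                cur.right, cur.thread = node, False
                return root
            cur = cur.right

def inorder(root):
    keys, hops = [], 0
    cur = root
    while cur is not None and cur.left is not None:  # go to the leftmost node
        cur = cur.left
        hops += 1
    while cur is not None:
        keys.append(cur.key)
        came_by_thread = cur.thread
        cur = cur.right
        if cur is None:
            break
        hops += 1
        if not came_by_thread:
            # Entered a right subtree: descend to its leftmost node.
            while cur.left is not None:
                cur = cur.left
                hops += 1
    return keys, hops

root = None
data = [50, 30, 70, 20, 40, 60, 80, 10, 45, 65]
for k in data:
    root = insert(root, k)
keys, hops = inorder(root)
print(keys == sorted(data), hops, 2 * (len(data) - 1))  # prints: True 13 18
```

Every left link, right link and thread is followed exactly once, so the hop count never exceeds 2 * (N - 1), no matter how the tree is shaped.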
