Is RBT always full? - algorithm

As I understand, binary trees do not have to be full. However, it seems that RBTs have to be full (sometimes children are NIL). Is that true, or am I missing something?

The path from the root node to all leaf nodes of any given red-black tree all have the same number of black nodes. In that sense I suppose you could say that red-black trees are always 'full' but I don't see that being a very useful definition.
The general idea of the red-black algorithm is to constrain the actual maximum difference in total height of leaf nodes (not just black height) between the leaf node with the shortest total path and the leaf node with the longest total path. If you use that as your basis then a RB tree is 'full' if all leaf nodes have the same total height (just as a regular binary tree is full if all leaves are at the same depth) and an RB tree does not have to be filled.

No. Red Black Trees are not always full. In fact that's a seldom event. You can learn more about it by reading the book Introduction to Algorithms (Cormen, page 308), 3rd Edition (it has some figures illustrating the answer at page 310, i am not showing them because copyright).

Related

Red Black Tree Insertion & Deletion Uniqueness

I've been learning and working on implementing a red-black tree data structure. I'm following this article on red-black tree deletion examples and looking at example 5 they have:
When I insert the same nodes into my tree, I get the following:
I understand that red black trees are not unique (I think), therefore both of the above trees are valid since they don't violate any of the properties.
In the example article, after deleting node 1, they get the following:
But after deleting node 1 in my code, I get the following:
Since in my case, node 1 is red, I don't call my delete_fix function which takes care of re-arranging the tree and such. The deletion algorithm I was following simply states to call a delete_fix function if the node to be deleted is black.
However, after comparing my tree with the one in the example article I can see that mine is not exactly optimized. It still follows the rules of the red-black tree though. Is this to be expected with red-black trees or am I missing something here?
However, after comparing my tree with the one in the example article I can see that mine is not exactly optimized.
It is optimised. Your tree will be fast at deleting nodes 5, 7, 20 & 28. The other only 5 & 7.
Bear in mind that for Red-Black Trees, they can be bushy in one direction. If the black tree height of real nodes is N, then the minimum path from root to leaf node is N (all black) and maximum path from root to leaf node is 2 * N (alternatively black-red-black-red etc). If you try to add a new node to the bushy path that is at maximum height, the tree will recolour and/or rebalance.
If you want a more balanced search tree you should use an AVL tree. Red-Black trees favour minimal insertion/deletion fixups over finding a node. Your tree is fine.

Height difference between leaves in an AVL tree

What is the maximum difference between any two leaves in an AVL tree? If I take an example, my tree becomes unbalanced, if the height difference is more than 2(for any two leaves), but the answer is the difference can be any value. I really don't understand, how this is possible.Can anyone explain with examples?
The difference in levels of any two leaves can be any value! Definition of AVL describes height difference only on two sub-trees from one node.
So you need to fill subtrees with equal height then add new nodes just to create that single node difference. But nobody said that that subtree doesn't contain some subtrees with the exact same definition. Of course tree is selfbalanced but if we'll be that accurate to not touch it's balance then we can create any height difference between some leaves.
Example with leaf 24 on level 3 and leaf 10 on level 6:
According to the explanation in this Wikipedia article, the balancing operations in an AVL tree successfully aim at rearranging the tree such that the height of any two leaves differs no more than one. This is the key property of the data structure which makes the retrieval of nodes efficient (namely logarithmic in the number of nodes of the tree, as a path from the root to a leaf is traversed in the worst case).

What is the name (if any) for this kind of tree?

I have this tree which, for each node, has exactly 10 childnodes (0-9). Each node has some associated data (say, for example, a name and a tag and a color) which, I guess, isn't important for this question. Each of the childnodes has exactly 10 childnodes. A node can be null (which 'ends' the branch') or contain another node.
To visualize what I'm talking about I made this diagram (fear my paintz0r skillz!):
A black box is a null-node. A white box is a node which contains data and childnodes. As you can see, even the root, each node has exactly 10 childnodes. Because of simplicity and to keep the diagram sane I have drawn some nodes very tiny but you can imagine these tiny nodes being the same.
This structure allows me to traverse a path consisting of digits very quickly: a path of 47352 would lead me down the "orange path" to the final destination; 4->7->3->5 where the final 2 cannot be resolved because that last one is a null-node (although colored red) and contains no childnodes.
My question is pretty simple actually: what is this kind of tree called? I have gone through all trees on Wikipedia's Tree (data structure) lemma and the closest I (think I) could get is the Octree and/or K-ary tree. Along those lines of reasoning my tree would be called a Dectree, Decitree, 10-ary tree or 10-way tree or something. But there might be a better name for this. So: anyone?
K-ary tree with K=10
In graph theory, a k-ary tree is a rooted tree in which each node has
no more than k children
It is also sometimes known as a k-way tree, an N-ary tree, or an M-ary
tree. A binary tree is the special case where k=2.
This is something like B-Tree.

Complete binary tree definitions

I have some questions on binary trees:
Wikipedia states that a binary tree is complete when "A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible." What does the last "as far left as possible" passage mean?
A well-formed binary tree is said to be "height-balanced" if (1) it is empty, or (2) its left and right children are height-balanced and the height of the left tree is within 1 of the height of the right tree, taken from How to determine if binary tree is balanced?, is this correct or there's "jitter" on the 1-value? I read on the answer I linked that there could be also a difference factor of 4 between the height of the right and the left tree
Do the complete and height-balanced definitions just apply to binary tree or just any other tree?
Following the reference of the definition in wikipedia, I got to
this page. The definition was taken from there but modified:
Definition: A binary tree in which every level, except possibly the deepest, is completely filled. At depth n, the height of the
tree, all nodes must be as far left as possible.
It continues with a note below though,
A complete binary tree has 2k nodes at every depth k < n and between 2n and 2^(n+1) - 1 nodes altogether.
Sometimes, definitions vary according to convenience (be useful for something). That passage might be a variation which, as I understand, requires leaf nodes to fill first the left side of the deepest level (that is, fill from left to right). The definition that I usually found is exactly as described above but without that
passage.
Usually the definition taken for height-balanced tree is the one you
described. In other words:
A tree is balanced if and only if for every node the heights of its two subtrees differ by at most 1.
That definition was taken from here. Again, sometimes definitions are made more flexible to serve specific purposes. For example, the definition of an AVL tree says that
In an AVL tree, the heights of the two child subtrees of any node
differ by at most one
Still, I remember once I had to rewrite an algorithm so that the tree
would be considered height-balanced if the two child subtrees of any
node differed by at most 2. Note that the definition you gave is recursive, this is very common for binary trees.
In a tree whose number of children is variable, you wouldn't be able to say that it is complete (any parent could have the number of children that you want). Still, it can apply to n-ary trees (with a fixed amount of n children).
Do the complete and height-balanced definitions just apply to binary
tree or just any other tree?
Short answer: Yes, it can be extended to any n-ary tree.

How does a red-black tree work?

There are lots of questions around about red-black trees but none of them answer how they work. Why is it called red-black? How does this keep the tree balanced (thus increasing performance over an unbalanced normal binary search tree)? I'm just looking for an overview of how and why it works.
For searches and traversals, it's the same as any binary tree.
For inserts and deletes, more sophisticated algorithms are applied which aim to ensure that the tree cannot be too unbalanced. These guarantee that all single-item operations will always run in at worst O(log n) time, whereas in a simple binary tree the binary tree can become so unbalanced that it's effectively a linked list, giving O(n) worst case performance for each single-item operation.
The basic idea of the red-black tree is to imitate a B-tree with up to 3 keys and 4 children per node. B-trees (or variations such as B+ trees) are mainly used for database indexes and for data stored on hard disk.
Each binary tree node has a "colour" - red or black. Each black node is, in the B-tree analogy, the subtree root for the subtree that fits within that B-tree node. If this node has red children, they are also considered part of the same B-tree node. So it is possible (though not done in practice) to convert a red-black tree to a B-tree and back, with (most) structure preserved. The only possible anomoly is that when a B-tree node has two keys and three children, you have a choice of which key to goes in the black node in the equivalent red-black tree.
For example, with red-black trees, every line from root to leaf has the same number of black nodes. This rule is derived from the B-tree rule that all leaf nodes are at the same depth.
Although this is the basic idea from which red-black trees are derived, the algorithms used in practice for inserts and deletes are modified to enforce all the B-tree rules (there might be a minor exception - I forget) during updates, but are tailored for the binary tree form. This means that doing a red-black tree insert or delete may give a different structure for the result than that you'd expect comparing with doing the B-tree insert or delete.
For more detail, follow the Wikipedia link that MigDus already supplied.
A red-black tree is an ordered binary tree where each vertex is coloured red or black. The intuition is that a red vertex should be seen as being at the same height as its parent (i.e., an edge to a red vertex is thought of as "horizontal" rather than "descending").
[I don't believe the Wikipedia entry makes this point clear.]
The usual rules for red-black trees require that a red vertex never point to another red vertex. This means that the possible vertex arrangements for any subtree rooted with a black vertex (bbb, bbr, rbb, rbr -- for [left child][root][right child]) correspond to 234 trees.
Searching a red-black tree is just the same as searching an ordinary binary tree. Insertion and deletion are similar, except that a "fix-up" rotation may be required at some point to preserve the red-black invariant.
Cheers!

Resources