Applications of 2-3-4 Tree - binary-tree

What are the applications of 2-3-4 trees? Are they widely used in applications for providing better performance of the applications ?
Edit : Which algorithms make the best use of 2-3-4 trees ?

2-3-4 trees are self-balancing and usually are usually very efficient for finding, adding and deleting elements, so like all trees they can be used for storing and retrieving elements in non-linear order. Unfortunately they tend to use more memory than other trees, because even nodes with only 2 data items still need to have enough memory to possibly store 4 of them.
This is why 2-3-4 trees are used as models for red-black trees, which are like standard BSTs except that nodes can be either red or black, and various rules exist about how to choose which colour a node is.
The key is that algorithms for searching/adding/deleting in a 2-3-4 tree are VERY similar to the ones for a red-black tree, so usually 2-3-4 trees are studied as a way of understanding red-black trees. The red-black trees themselves are quite widely used - I believe the standard Java Collections Framework tree is a red-black tree.

Answer to the applications of 2-3-4 Trees are:
• Linux Kernel.
• Completely Fair Scheduler
• To keep track of Virtual Memory Segments of a Process.
Because it is similar of Red Black Trees and as #Adam said we use it in Java Collections Framework itself.

Related

Is there a tree structure with multiple root nodes, if so, what is it called?

A classical tree has one root node. Example:
Is there a tree with multiple initial roots like in the picture?:
As #Joe Sewell noted in the comments, a collection of independent trees is called a forest. This term applies both to collections of directed rooted trees like the one you showed above, plus collections of undirected, unrooted trees as well.
Many data structures and algorithms make use of forests. The binomial and Fibonacci heap data structures store their items in a collection of smaller independent trees. Link/cut trees, which are used in some maximum flow algorithms, work with independent collections of trees as well.

Is kd-tree always balanced?

I have used kd-tree algoritham and make tree.
But i found that tree is not balanced so my question is if we used kd-tree algoritham then that tree is always balanced if not then how can we make it balance ?.
We can use another algoritham likes AVL or Red-Black for balancing kd tree ?
I have some sample data for that i used kd-tree algoritham but that tree is not balanced.
(14,31), (15,32), (17,42), (16,44), (18,52), (16,62)
This is a fairly broad topic and the questions themselves are kind of general.
Hopefully this will give you some useful insights and material to work with:
Kd tree is not always balanced.
AVL and Red-Black will not work with K-D Trees, you will have either construct some balanced variant such as K-D-B-tree or use other balancing techniques.
K-d Tree are commonly used to store GeoSpatial data because they let you search over more then one key, contrary to 'traditional' tree which lets you do single dimensional search. GeoSpatial data certainly cannot be represented in single dimension.
Note that there are also specialized databases working with GeoSpatial data so it might be worth checking if the overhead could be shifted to them instead of making your own solution: Although i don't have much experience with this, maybe it is worth checking the postgis.
postgis
Here are some useful links showing how to build balanced K-D tree variant and usage of K-D trees with Spatial data:
balancing K-D-Tree
K-D-B-tree
spatial data k-d-trees
It depends on how you build the tree.
If built as originally published, the tree will be balanced, i.e. only at the leaf level it will have at most a height difference of 1. If your data set has 2^n-1 elements, the tree will be perfectly balanced.
When constructed with the median, then half of the objects must be on either branch of the tree, thus it has minimal height and is balanced.
However, this tree cannot be changed then. I am not aware of an insert or remove algorithm that would preserve this property, but YMMV. I bet there are two dozens of kd-tree extensions that aim at rebalancing and making insertions/deletions more effective.
The k-d-tree is not designed for changes, and will quickly lose efficiency. It relies on the median, and thus any change to the tree would worst-case propagate through all of the tree. Therefore, you need to allow some tolerance in the tree quality to support changes. It appears to be a common approach to just keep track of insertions/deletions and rebuild the tree eventually. You cannot combine it with red-black-trees or AVL-trees, because data with more than 1 dimension is not ordered; these trees only work for ordered data. Upon rotation of the tree the splitting axis changes; and there may be elements in either half that suddenly would need to move to the other branch. This does not happen in AVL or red-black trees.
But as you can imagine, people have published several indexes that remain balanced. Such as k-d-b-trees, and R-trees. These also work better for large data that needs to be stored on disk.
In order to make your kd-tree balanced use median value.
(14,31), (15,32), (17,42), (16,44), (18,52), (16,62)
In the root choose median of x-cordinates [14,15,16,16,17,18] which is 16,
So all the elements less than 16 goes to left part of the tree and
greater than or equal to goes to right side of tree.
as of now,
left part tree consists of [14,31],[15,32] ,now for y-axis find the median for [31,32]
so that the tree is balanced

Why storing data only in the leaf nodes of a balanced binary-search tree?

I have bought a nice little book about computational geometry. While reading it here and there, I often stumbled over the use of this special kind of binary search tree. These trees are balanced and should store the data only in the leaf nodes, whereas inner nodes should only store values to guide the search down to the leaves.
The following image shows an example of this trees (where the leaves are rectangles and the inner nodes are circles).
I have two questions:
What is the advantage of not storing data in the inner nodes?
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
Any kind of helpful resource is very welcome.
What is the advantage of not storing data in the inner nodes?
There are some tree data structures that, by design, require that no data is stored in the inner nodes, such as Huffman code trees and B+ trees. In the case of Huffman trees, the requirement is that no two leaves have the same prefix (i.e. the path to node 'A' is 101 whereas the path to node 'B' is 10). In the case of B+ trees, it comes from the fact that it is optimized for block-search (this also means that every internal node has a lot of children, and that the tree is usually only a few levels deep).
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
Sure! An AVL tree is not extremely complicated, so it's a good candidate for learning.
It is common to have other kinds of binary trees with data at the leaves instead of the interior nodes, but fairly uncommon for binary SEARCH trees.
One reason you might WANT to do this is educational -- it's often EASIER to implement a binary search tree this way then the traditional way. Why? Almost entirely because of deletions. Deleting a leaf is usually very easy, whereas deleting an interior node is harder/messier. If your data is only at the leaves, then you are always in the easy case!
It's worth thinking about where the keys on interior nodes come from. Often they are duplicates of keys that are also at the leaves (with data). Later, if the key at the leaf is deleted, the key at the interior nodes might still hang around.
What is the advantage of not storing data in the inner nodes?
In general, there is no advantage in not storing data in the inner nodes. For example, a red-black tree is a balanced tree and it stores its data into the inner and leaf nodes.
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
In my opinion, it is.
One benefit to only keeping the data in leaf nodes (e.g., B+ tree) is that scanning/reading the data is exceedingly simple. The leaf nodes are linked together. So to read the next item when you are at the "end" (right or left) of the data within a given leaf node, you just read the link/pointer to the next (or previous) node and jump to the next leaf page.
With a B tree where data is in every node, you have to traverse the tree to read the data in order. That is certainly a well-defined process but is arguably more complex and typically requires more state information.
I am reading the same book and they say it could be done either way, data storage at external or at internal nodes.
The trees they use are Red-Black.
In any case, here is an article that stores data at internal nodes of a Red Black Tree and then links these data nodes together as a list.
Balanced binary search tree with a doubly linked list in C++
by Arjan van den Boogaard
http://archive.gamedev.net/archive/reference/programming/features/TStorage/default.html

How are red-black trees isomorphic to 2-3-4 trees?

I have a basic understanding of both red black trees and 2-3-4 trees and how they maintain the height balance to make sure that the worst case operations are O(n logn).
But, I am not able to understand this text from Wikipedia
2-3-4 trees are an isometry of red-black trees, meaning that they are equivalent data structures. In other words, for every 2-3-4 tree, there exists at least one red-black tree with data elements in the same order. Moreover, insertion and deletion operations on 2-3-4 trees that cause node expansions, splits and merges are equivalent to the color-flipping and rotations in red-black trees.
I don't see how the operations are equivalent. Is this quote on Wikipedia accurate? How can one see that the operations are equivalent?
rb-tree(red-black-tree) is not isomorphic to 2-3-4-tree. Because the 3-node in 2-3-4-tree can be lean left or right if we try to map this 3-node to a rb-tree. But llrb-tree(Left-leaning red-black tree) does.
Words from Robert Sedgewick(In Introduction section):
In particular, the paper describes a way to maintain
a correspondence between red-black trees and 2-3-4 trees,
by interpreting red links as internal links in 3-nodes and
4-nodes. Since red links can lean either way in 3-nodes
(and, for some implementations in 4-nodes), the correspondence is not necessarily 1-1
Also Page29 and Page30 of presentation from Robert Sedgewick. This a presentation about LLRB tree.
And "Analogy to B-trees of order 4" section of "Red-black Tree" in the wikipedia, it contains a good graph.

How does a red-black tree work?

There are lots of questions around about red-black trees but none of them answer how they work. Why is it called red-black? How does this keep the tree balanced (thus increasing performance over an unbalanced normal binary search tree)? I'm just looking for an overview of how and why it works.
For searches and traversals, it's the same as any binary tree.
For inserts and deletes, more sophisticated algorithms are applied which aim to ensure that the tree cannot be too unbalanced. These guarantee that all single-item operations will always run in at worst O(log n) time, whereas in a simple binary tree the binary tree can become so unbalanced that it's effectively a linked list, giving O(n) worst case performance for each single-item operation.
The basic idea of the red-black tree is to imitate a B-tree with up to 3 keys and 4 children per node. B-trees (or variations such as B+ trees) are mainly used for database indexes and for data stored on hard disk.
Each binary tree node has a "colour" - red or black. Each black node is, in the B-tree analogy, the subtree root for the subtree that fits within that B-tree node. If this node has red children, they are also considered part of the same B-tree node. So it is possible (though not done in practice) to convert a red-black tree to a B-tree and back, with (most) structure preserved. The only possible anomoly is that when a B-tree node has two keys and three children, you have a choice of which key to goes in the black node in the equivalent red-black tree.
For example, with red-black trees, every line from root to leaf has the same number of black nodes. This rule is derived from the B-tree rule that all leaf nodes are at the same depth.
Although this is the basic idea from which red-black trees are derived, the algorithms used in practice for inserts and deletes are modified to enforce all the B-tree rules (there might be a minor exception - I forget) during updates, but are tailored for the binary tree form. This means that doing a red-black tree insert or delete may give a different structure for the result than that you'd expect comparing with doing the B-tree insert or delete.
For more detail, follow the Wikipedia link that MigDus already supplied.
A red-black tree is an ordered binary tree where each vertex is coloured red or black. The intuition is that a red vertex should be seen as being at the same height as its parent (i.e., an edge to a red vertex is thought of as "horizontal" rather than "descending").
[I don't believe the Wikipedia entry makes this point clear.]
The usual rules for red-black trees require that a red vertex never point to another red vertex. This means that the possible vertex arrangements for any subtree rooted with a black vertex (bbb, bbr, rbb, rbr -- for [left child][root][right child]) correspond to 234 trees.
Searching a red-black tree is just the same as searching an ordinary binary tree. Insertion and deletion are similar, except that a "fix-up" rotation may be required at some point to preserve the red-black invariant.
Cheers!

Resources