Is a kd-tree always balanced?

I have used the kd-tree algorithm to build a tree.
But I found that the tree is not balanced, so my question is: if we use the kd-tree algorithm, is the resulting tree always balanced? If not, how can we make it balanced?
Can we use another algorithm, like AVL or red-black, to balance a kd-tree?
I have some sample data for which I used the kd-tree algorithm, but the resulting tree is not balanced:
(14,31), (15,32), (17,42), (16,44), (18,52), (16,62)

This is a fairly broad topic and the questions themselves are kind of general.
Hopefully this will give you some useful insights and material to work with:
A k-d tree is not always balanced.
AVL and red-black balancing will not work with k-d trees; you will have to either construct a balanced variant such as the K-D-B-tree or use other balancing techniques.
K-d trees are commonly used to store geospatial data because they let you search over more than one key, in contrast to a 'traditional' tree, which only supports single-dimensional search. Geospatial data certainly cannot be represented in a single dimension.
Note that there are also specialized databases for geospatial data, so it might be worth checking whether the overhead could be shifted to them instead of building your own solution. Although I don't have much experience with this, PostGIS may be worth a look.
Here are some useful links showing how to build a balanced k-d tree variant and the usage of k-d trees with spatial data:
balancing K-D-Tree
K-D-B-tree
spatial data k-d-trees

It depends on how you build the tree.
If built as originally published, the tree will be balanced, i.e. only at the leaf level will there be a height difference of at most 1. If your data set has 2^n-1 elements, the tree will be perfectly balanced.
When constructed with the median, half of the objects must lie on either branch of the tree, so it has minimal height and is balanced.
However, such a tree cannot be changed afterwards. I am not aware of an insert or remove algorithm that would preserve this property, but YMMV; I bet there are two dozen k-d tree extensions that aim at rebalancing and making insertions/deletions more effective.
The k-d tree is not designed for changes and will quickly lose efficiency. It relies on the median, so any change to the tree would, in the worst case, propagate through the whole tree. Therefore, you need to allow some tolerance in tree quality to support changes. A common approach is to just keep track of insertions/deletions and rebuild the tree eventually. You cannot combine it with red-black or AVL trees, because data with more than one dimension is not ordered; those trees only work for ordered data. Upon rotation the splitting axis would change, and there may be elements in either half that suddenly would need to move to the other branch. This does not happen in AVL or red-black trees.
But as you can imagine, people have published several indexes that do remain balanced, such as K-D-B-trees and R-trees. These also work better for large data that needs to be stored on disk.
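For reference, the median construction itself fits in a few lines; here is a rough Python sketch (function and field names are my own, not from any library), which you could simply re-run over the collected points to rebuild the tree after a batch of changes:

    # Rough sketch of median-based (balanced) k-d tree construction.
    # Illustrative only; rebuilding after many changes just means calling
    # this again on the full point set.
    def build_kd(points, depth=0, k=2):
        if not points:
            return None
        axis = depth % k                                  # cycle through the axes
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2                            # median index
        return {
            "point": points[mid],                         # splitting element
            "axis": axis,
            "left": build_kd(points[:mid], depth + 1, k),
            "right": build_kd(points[mid + 1:], depth + 1, k),
        }

Since each call sends half of the remaining points to each side, the height is minimal by construction.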

In order to make your kd-tree balanced, use the median value.
(14,31), (15,32), (17,42), (16,44), (18,52), (16,62)
At the root, choose the median of the x-coordinates [14,15,16,16,17,18], which is 16.
All the elements with x less than 16 go to the left part of the tree, and those with x greater than or equal to 16 go to the right side.
So far, the left subtree consists of (14,31) and (15,32); now, on the y-axis, find the median of [31,32] and split on it.
Continue alternating axes this way so that the tree stays balanced.
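A quick self-contained check of that first split, using only the sample data from the question (plain Python, just for illustration):

    points = [(14, 31), (15, 32), (17, 42), (16, 44), (18, 52), (16, 62)]

    # Median of the x-coordinates [14, 15, 16, 16, 17, 18] is 16.
    xs = sorted(p[0] for p in points)
    median_x = xs[len(xs) // 2]                       # -> 16

    left  = [p for p in points if p[0] <  median_x]   # [(14, 31), (15, 32)]
    right = [p for p in points if p[0] >= median_x]   # the other four points
    print(median_x, left, right)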

Related

Dynamically building a balanced BST with values "in the leaves"?

In their book Computational Geometry (2008), de Berg, et al., describe the data structure underlying their range search algorithm as a balanced BST where "leaves of T store the points of P and the internal nodes of T store splitting values to guide the search."
The Wikipedia page on range trees, which cites de Berg, says: "A 1-dimensional range tree on a set of n points is a binary search tree" such that "each node which is not a leaf stores the largest value of its left subtree."
Examples online construct such trees statically, by first sorting the set of points and then recursively pairing up nodes.
Does there exist an algorithm to build a BST of this nature dynamically (i.e., with the ability to insert additional values into the tree)? Where is it described?
It's possible to adapt just about any tree balancing procedure to work with these two examples, just by treating the leaves separately -- make a balanced tree of the internal nodes, and then take care to keep the leaves in order. Each operation, including balancing, will require you to recalculate the "summary statistics" on at most O(log N) nodes. Those are all the nodes that were updated and their ancestors.
This can be a little complicated, though, and doesn't work for the multi-dimensional range tree, because every level is treated differently from the ones above and below, and that makes tree rotations (which most balancing operations require) invalid.
For these kinds of trees, therefore, where different levels are handled differently, it is usually best to just avoid tree rotations by using a low-order B+tree variant like a 2-3 tree. In a tree like this, nodes can be split and merged, but they never have to change height -- you can implement them so that leaves are always leaves and internal nodes are always internal. The height of the tree is only ever changed by adding or removing the root.
Of course, if you use a tree that can have more than 2 children per node, then your search algorithms will need to change, but the changes are typically trivial.
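To make the static construction from the question concrete, here is a rough Python sketch (class and function names are only illustrative, not from de Berg et al.): sort the keys into leaves, pair nodes up level by level, and let each internal node store the largest key of its left subtree.

    # Sketch: build a leaf-storing BST statically by sorting and pairing up.
    class Leaf:
        def __init__(self, key):
            self.key = key

    class Internal:
        def __init__(self, split, left, right):
            self.split = split          # largest key in the left subtree
            self.left = left
            self.right = right

    def build_leaf_tree(keys):
        if not keys:
            return None
        level = [Leaf(k) for k in sorted(keys)]
        maxes = sorted(keys)            # largest key under each node in `level`
        while len(level) > 1:
            next_level, next_maxes = [], []
            for i in range(0, len(level) - 1, 2):     # pair up neighbours
                next_level.append(Internal(maxes[i], level[i], level[i + 1]))
                next_maxes.append(maxes[i + 1])
            if len(level) % 2:                        # odd node is carried up
                next_level.append(level[-1])
                next_maxes.append(maxes[-1])
            level, maxes = next_level, next_maxes
        return level[0]

    def search(node, key):
        # Descend to the leaf whose range contains `key`.
        while isinstance(node, Internal):
            node = node.left if key <= node.split else node.right
        return node

A dynamic version then has to preserve roughly this shape under insertion, which is exactly where the 2-3 tree suggestion above comes in.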

Comparison between optimal binary search tree and AVL Trees?

I was preparing for my finals and while going through these topics, I got stuck with few queries.
I know that optimal binary search trees give an optimal structure with respect to the access frequencies of the nodes. The idea is to keep the nodes with higher frequency (while still satisfying the BST property) near the top for an optimal solution.
AVL trees, on the other hand, are BSTs that balance themselves through rotations and hence provide a balanced tree, but frequencies don't play a role there.
So, my query is this: suppose that, for the given node frequencies, we arrive at an optimal BST structure which is of the form:
E.g.:
10
  \
   12
     \
      14
        \
         16 ... and so on
I.e., a linked-list-like structure, but it is optimal because the frequencies are, for example, [15, 13, 10, 4, 2, ...].
But if this is handled with an AVL tree, the AVL balancing will produce a different structure, making it not so optimal for these frequencies.
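One concrete way to compare the two shapes is the weighted path length, i.e. the sum of frequency × depth over all nodes; a small illustrative sketch (the depth lists are placeholders you fill in for each candidate shape):

    # Weighted path length: sum of frequency * depth (root has depth 1).
    # Whichever shape gives the smaller total is better for these frequencies.
    def weighted_cost(freqs, depths):
        return sum(f * d for f, d in zip(freqs, depths))

    freqs = [15, 13, 10, 4, 2]        # example frequencies from above
    chain_depths = [1, 2, 3, 4, 5]    # the degenerate right-chain shape
    # avl_depths = [...]              # fill in the depths of the AVL-balanced shape
    print(weighted_cost(freqs, chain_depths))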
So, my question is: if we know the respective frequencies of the nodes, is applying AVL to such data the wrong choice compared to an optimal binary search tree?
Is an optimal binary search tree a better choice in such situations?
I would be thankful if anyone can clear this doubt of mine.
Thanks in advance.

Keeping avl tree balanced without rotations

A B-tree is a self-balancing tree, like an AVL tree. HERE we can see how left and right rotations are used to keep an AVL tree balanced.
And HERE is a link which explains insertion into a B-tree. This insertion technique does not involve any rotations, if I am not wrong, to keep the tree balanced, and therefore it looks simpler.
Question: Is there any similar technique (or any other technique that does not use rotations) to keep an AVL tree balanced?
The answer is... yes and no.
B-trees don't need to perform rotations because they have some slack with how many different keys they can pack into a node. As you add more and more keys into a B-tree, you can avoid the tree becoming lopsided by absorbing those keys into the nodes themselves.
Binary trees don't have this luxury. If you insert a key into a binary tree, it will increase the height of some branch in that tree by 1 in all cases because that key needs to go into its own node. Rotations combat the overall growth of the tree by ensuring that if certain branches grow too much, that height is shuffled into the rest of the tree.
Most balanced BSTs have some sort of rebalancing strategy that involves rotations, but not all do. One notable example of a strategy that doesn't directly involve rotations is the scapegoat tree, which rebalances by tearing huge subtrees out of the master tree, optimally rebuilding them, then gluing the subtree back into the main tree. This approach doesn't technically involve any rotations and is a pretty clean way to implement a balanced tree.
That said - the most space-efficient implementations of scapegoat trees do indeed use rotations to convert an imbalanced tree into a perfectly balanced one. You don't have to use rotations to do this, though if space is short it's probably the best way to do so.
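As a rough sketch of that rebuild step (not a complete scapegoat tree, just the 'flatten the subtree and rebuild it perfectly balanced' part, assuming plain nodes with key/left/right fields):

    # Flatten an unbalanced subtree into sorted order, then rebuild it
    # perfectly balanced -- no rotations involved.
    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def flatten(node, out):
        # In-order traversal -> nodes in sorted key order.
        if node:
            flatten(node.left, out)
            out.append(node)
            flatten(node.right, out)

    def rebuild(nodes, lo, hi):
        # Rebuild nodes[lo:hi] into a perfectly balanced subtree.
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        root = nodes[mid]
        root.left = rebuild(nodes, lo, mid)
        root.right = rebuild(nodes, mid + 1, hi)
        return root

    def rebalance_subtree(root):
        nodes = []
        flatten(root, nodes)
        return rebuild(nodes, 0, len(nodes))    # new root of the subtree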
Hope this helps!
Rotations can be made simple (if you need only simplicity).
If the insertion traffic is left, the balance -1 is the red-light.
If the insertion traffic is right, the balance 1 is the red-light.
This is a (simplified) coarse-graining (2-adic rounding) of the normalized fundamental AVL balance:
{left,even,right} ~ {low,even,high} ~ {green,green,red}
Walk the insertion route and rotate every red-light (before the insertion). If the next light is green, you can just rotate the red-light 1 or 2 times. You may have to re-balance the next subtrees before each rotation, because inner subtrees are invariant. This is simple, but it takes a very long time. You have to move down the green-light before each rotation. You can always move down the green-light to the root, and you can rotate the tree-top to generate a new green-light.
The red-light rotations naturally move down the green-light.
At this point, you have only the green-lights for the insertion.
The cost structure of this naive method is topologically simplified as df(h)/dh = ∫ f(h) dh, with solutions such as sin(h), sinh(h), etc.

Why store data only in the leaf nodes of a balanced binary search tree?

I have bought a nice little book about computational geometry. While reading it here and there, I often stumbled over the use of this special kind of binary search tree. These trees are balanced and should store the data only in the leaf nodes, whereas inner nodes should only store values to guide the search down to the leaves.
The following image shows an example of such a tree (the leaves are rectangles and the inner nodes are circles).
I have two questions:
What is the advantage of not storing data in the inner nodes?
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
Any kind of helpful resource is very welcome.
What is the advantage of not storing data in the inner nodes?
There are some tree data structures that, by design, require that no data is stored in the inner nodes, such as Huffman code trees and B+ trees. In the case of Huffman trees, the requirement is that no code is a prefix of another (e.g. you cannot place symbol 'B' on the path 10 while symbol 'A' sits below it at 101, since 10 would be a prefix of 101); storing symbols only at the leaves guarantees this. In the case of B+ trees, it comes from the fact that they are optimized for block search (this also means that every internal node has a lot of children, and that the tree is usually only a few levels deep).
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
Sure! An AVL tree is not extremely complicated, so it's a good candidate for learning.
It is common to have other kinds of binary trees with data at the leaves instead of the interior nodes, but fairly uncommon for binary SEARCH trees.
One reason you might WANT to do this is educational -- it's often EASIER to implement a binary search tree this way than the traditional way. Why? Almost entirely because of deletions. Deleting a leaf is usually very easy, whereas deleting an interior node is harder/messier. If your data is only at the leaves, then you are always in the easy case!
It's worth thinking about where the keys on interior nodes come from. Often they are duplicates of keys that are also at the leaves (with data). Later, if the key at the leaf is deleted, the key at the interior nodes might still hang around.
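For example, the 'easy case' delete can be sketched like this (my own illustration; the node classes are hypothetical): unlink the leaf by replacing its parent with the leaf's sibling, and simply leave any stale routing copy of the key in place.

    # Deleting a leaf in a leaf-oriented BST: the leaf's parent is replaced
    # by the leaf's sibling. Routing keys higher up may keep a copy of the
    # deleted key; that is harmless, they only guide the search.
    class Interior:
        def __init__(self, split, left, right):
            self.split, self.left, self.right = split, left, right

    class Leaf:
        def __init__(self, key):
            self.key = key

    def delete_leaf(grandparent, parent, leaf):
        sibling = parent.right if parent.left is leaf else parent.left
        if grandparent.left is parent:
            grandparent.left = sibling      # parent disappears with the leaf
        else:
            grandparent.right = sibling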
What is the advantage of not storing data in the inner nodes?
In general, there is no advantage in not storing data in the inner nodes. For example, a red-black tree is a balanced tree, and it stores its data in both inner and leaf nodes.
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
In my opinion, it is.
One benefit to only keeping the data in leaf nodes (e.g., B+ tree) is that scanning/reading the data is exceedingly simple. The leaf nodes are linked together. So to read the next item when you are at the "end" (right or left) of the data within a given leaf node, you just read the link/pointer to the next (or previous) node and jump to the next leaf page.
With a B tree where data is in every node, you have to traverse the tree to read the data in order. That is certainly a well-defined process but is arguably more complex and typically requires more state information.
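A tiny sketch of that leaf chaining (a simplified illustration, not any particular B+-tree implementation): each leaf holds a sorted block of keys plus a pointer to the next leaf, so an in-order scan is just a walk along a linked list.

    # Data lives only in the leaves, and the leaves are linked together,
    # so a full ordered scan never has to climb back up the tree.
    class LeafPage:
        def __init__(self, keys):
            self.keys = keys          # sorted keys/records in this leaf
            self.next = None          # pointer to the next leaf page

    def scan(first_leaf):
        leaf = first_leaf
        while leaf is not None:
            yield from leaf.keys      # emit this page's keys in order
            leaf = leaf.next

    # Example: three chained leaf pages.
    a, b, c = LeafPage([1, 2, 3]), LeafPage([4, 5]), LeafPage([6, 7, 8])
    a.next, b.next = b, c
    print(list(scan(a)))              # [1, 2, 3, 4, 5, 6, 7, 8]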
I am reading the same book and they say it could be done either way, data storage at external or at internal nodes.
The trees they use are Red-Black.
In any case, here is an article that stores data at internal nodes of a Red Black Tree and then links these data nodes together as a list.
Balanced binary search tree with a doubly linked list in C++
by Arjan van den Boogaard
http://archive.gamedev.net/archive/reference/programming/features/TStorage/default.html

Why is avl tree faster for searching than red black tree?

I have read in a couple of places that AVL trees search faster, but I am not able to understand why. As I understand it:
max height of red-black tree = 2*log(N+1)
height of AVL tree = 1.44*log(N+1)
Is it because AVL is shorter?
Yes.
The number of steps required to find an item depends on the distance between the item and the root.
Since the AVL tree is packed tighter (i.e. it has a lower max height) it means more items are closer to the root than in the red-black case.
The extra tight packing also means the AVL tree requires more work when inserting elements.
The best choice for any app depends on whether it is insert intensive or search intensive...
An AVL tree is better than a red-black tree if the input keys are almost ascending/descending, because then we only have to do single rotations (the left-left or right-right cases) to add elements. Also, since the tree is tightly balanced, searches are faster.
But for randomly ordered input keys, red-black trees are better, since they require fewer rotations for insertion in comparison to AVL.
Overall, it depends on the input sequence, which decides how tilted the tree is, and on the operations performed. For insert-intensive workloads use a red-black tree, and for search-intensive workloads use AVL.
AVL trees and red-black trees each have their own advantages and disadvantages. You'll perceive that better once you've learned how they work.
AVL is slightly faster than a red-black tree for insertion because at most one rotation is involved per insertion, while there may be two for a red-black tree.
A red-black tree requires at most three rotations for deletion, but this is not guaranteed in AVL, so it can delete nodes faster than AVL.
However, above all, they both have strictly logarithmic tree heights.
Pick any subtree: the property that makes AVL "balanced" guarantees that the difference in height between the two child subtrees is at most one, which is to say, intuitively, the whole tree is rigidly balanced.
But when it comes to a red-black tree, the rules are "looser", since the red-black properties can only guarantee that the depth of the tree is no more than twice the logarithm of the total number of nodes.
Here are some facts that may be more precise:
An AVL tree's height is strictly less than 1.44*log2(n+2) - 0.328 (approximately).
A red-black tree's height is at most 2*log2(n+1).
See https://en.wikipedia.org/wiki/AVL_tree#Comparison_to_other_structures for detailed information.
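To put the two bounds side by side numerically, here is a quick sketch evaluating them (base-2 logarithms, per the formulas above):

    import math

    # Worst-case height bounds quoted above.
    def avl_height_bound(n):
        return 1.44 * math.log2(n + 2) - 0.328

    def rb_height_bound(n):
        return 2.0 * math.log2(n + 1)

    for n in (10**3, 10**6, 10**9):
        print(n, round(avl_height_bound(n), 1), round(rb_height_bound(n), 1))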

Resources