All link-cut tree operations are described assuming that we already have the node we want to perform the operation on, but how should I store the tree to be able to get to that node in the first place?
I can't just store the root of the abstract tree, because link-cut trees don't support searching for descendants, so most of the nodes in the tree would be unreachable.
I had the idea of storing the roots of every solid tree (splay tree) in a vector, but there's another problem: solid trees are constantly changing their structure. That means I would have to recalculate the root of each tree after every operation, which is complicated, slow, and surely not the right approach.
So what is the correct way to store it?
Related
Is there a data structure for a sorted set that allows quick lookup of the n-th item (i.e. the n-th smallest)? That is, something like a hybrid between a rope and a red-black tree.
Seems like it should be possible to either keep track of the size of the left subtree and update it through rotations or do something else clever and I'm hoping someone smart has already worked this out.
Seems like it should be possible to either keep track of the size of the left subtree and update it through rotations […]
Yes, this is quite possible; but instead of keeping track of the size of the left subtree, it's a bit simpler to keep track of the size of the complete subtree rooted at a given node. (You can then get the size of its left subtree by examining its left-child's size.) It's not as tricky as you might think, because you can always recalculate a node's size as long as its children are up-to-date, so you don't need any extra bookkeeping beyond making sure that you recalculate sizes by working your way up the tree.
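To make the size-tracking idea concrete, here is a minimal Python sketch. It assumes a plain, unbalanced BST: balancing is omitted for brevity, and in a red-black or AVL tree each rotation would also call `update` on the two nodes it moves. The names `Node`, `update`, `select` and `rank` are illustrative, not from any library.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in the subtree rooted here


def size(n):
    return n.size if n is not None else 0


def update(n):
    # A node's size can always be recomputed from its (up-to-date) children.
    n.size = 1 + size(n.left) + size(n.right)


def insert(n, key):
    """Unbalanced BST insert; sizes are fixed on the way back up."""
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    else:
        n.right = insert(n.right, key)
    update(n)
    return n


def select(n, k):
    """Return the k-th smallest key (0-based)."""
    while n is not None:
        left = size(n.left)
        if k < left:
            n = n.left
        elif k == left:
            return n.key
        else:
            k -= left + 1
            n = n.right
    raise IndexError("k is out of range")


def rank(n, key):
    """Return how many keys in the tree are strictly less than key."""
    r = 0
    while n is not None:
        if key <= n.key:
            n = n.left
        else:
            r += size(n.left) + 1
            n = n.right
    return r
```

`select` is the "n-th least item" lookup from the question, and `rank` is its inverse. Both take time proportional to the tree height.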
Note that, in most mutable red-black tree implementations, 'put' and 'delete' stop walking back up the tree once they've restored the invariants, whereas with this approach you need to walk all the way back up the tree in all cases. That'll be a small performance hit, but at least it's not hard to implement. (In purely functional red-black tree implementations, even that isn't a problem, because those always have to walk the full path back up to create the new parent nodes. So you can just put the size-calculation in the constructor — very simple.)
Edited in response to your comment:
I was vaguely hoping this data structure already had a name so I could just find some implementations out there and that there was something clever one could do to minimize the updating but (while I can find plenty of papers on data structures that are variations of balanced binary trees) I can't figure out a good search term to look for papers that let one look up the nth least element.
The fancy term for the nth smallest value in a collection is order statistic; so a tree structure that enables fast lookup by order statistic is called an order statistic tree. That second link includes some references that may help you — not sure, I haven't looked at them — but regardless, that should give you some good search terms. :-)
Yes, this is fully possible. Self-balancing tree algorithms do not actually need to operate on search trees; that is simply the typical presentation. The actual requirement is that nodes be ordered in some fashion (which a rope provides).
What is required is to update the tree weights on insert and erase. Rotations do not require a full update; a local one is enough (this works if each node's weight counts the nodes in its left subtree plus the node itself). For example, in a left rotation the old parent becomes the left child of its former right child, so the weight of the old parent must be added to the new parent: the new parent's left subtree now also contains the old parent and the old parent's left subtree. Nothing in the new parent's original left subtree needs to be revisited, since it simply becomes the old parent's right subtree. Similarly, for a right rotation it is only necessary to subtract the weight of the new parent from the old parent, since the new parent's right subtree becomes the old parent's left subtree.
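Here is a minimal Python sketch of the local rotation update. It assumes, as the description implies, that each node's `weight` counts the nodes in its left subtree plus the node itself. `count`, `at` and the rotation helpers are hypothetical names.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        # weight = nodes in the left subtree + 1 (the node itself)
        self.weight = count(left) + 1


def count(n):
    """Total nodes in a subtree, summed from weights along its right spine."""
    total = 0
    while n is not None:
        total += n.weight
        n = n.right
    return total


def rotate_left(x):
    """x's right child y becomes the subtree root; returns y."""
    y = x.right
    x.right, y.left = y.left, x
    y.weight += x.weight  # y's left subtree gained x and x's left subtree
    return y  # x.weight is unchanged: x's left subtree did not change


def rotate_right(y):
    """y's left child x becomes the subtree root; returns x."""
    x = y.left
    y.left, x.right = x.right, y
    y.weight -= x.weight  # y's left subtree lost x and x's left subtree
    return x


def at(n, i):
    """Return the value at 0-based in-order position i."""
    while n is not None:
        if i + 1 == n.weight:
            return n.value
        if i + 1 < n.weight:
            n = n.left
        else:
            i -= n.weight
            n = n.right
    raise IndexError(i)
```

Each rotation touches exactly one stored weight, so the positional lookup in `at` stays correct without walking any subtree.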
I suppose it would be possible to write an insert that updates the weights as it performs rotations and then adds the weight up any remaining ancestors, but I didn't bother when I was solving this problem. I simply added the new node's weight all the way up the tree, then did rotations as needed. Similarly, for erase, I did the fix-up rotations, then subtracted the weight of the node being removed before finally unhooking the node from the tree.
I'm aware of ways to keep binary search trees balanced/self-balancing using rotations.
I am not sure if my case needs to be that complicated. I don't need to maintain any sorted-order property as with self-balancing BSTs. I just have an ordinary binary tree in which I may need to delete or insert nodes, and I need to try to keep the tree balanced. For simplicity, my binary tree is similar to a segment tree, and every time a node is deleted, all the nodes along the path from the root to this node will be affected (in my case, it's just some subtraction of the nodal values). Similarly, every time a node is inserted, all the nodes from the root to the inserted node's final location will be affected (an addition to nodal values this time).
What would be the most straightforward way to keep a tree such as this balanced? It doesn't need to be strictly as height balanced as AVL trees, but something like RB trees or maybe slightly less balanced is acceptable as well.
If a new node does not have to be inserted at a particular spot -- possibly determined by its own value and the values in the tree -- but you are completely free to choose its location, then you could maintain the shape of the tree as a complete tree:
In a complete binary tree every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible.
An array is a very efficient data structure for a complete tree, as you can store the nodes in their order in a breadth-first traversal. Because the tree is given to be complete, the array has no gaps. This structure is commonly used for heaps:
Heaps are usually implemented with an array, as follows:
Each element in the array represents a node of the heap, and
The parent / child relationship is defined implicitly by the elements' indices in the array.
[Figure: example of a complete binary max-heap with node keys being integers from 1 to 100, and how it would be stored in an array.]
In the array, the first index contains the root element. The next two indices of the array contain the root's children. The next four indices contain the four children of the root's two child nodes, and so on. Therefore, given a node at index i, its children are at indices 2i + 1 and 2i + 2, and its parent is at index floor((i-1)/2). This simple indexing scheme makes it efficient to move "up" or "down" the tree.
Operations
In your case, you would define the insert/delete operations as follows:
Insert: append the node to the end of the array. Then perform the mutation needed to its ancestors (as you described in your question)
Delete: replace the node to be deleted with the node that currently sits at the very end of the array, and shorten the array by 1. Then make the updates that follow from the change at these two locations, so two root-to-node paths are affected. (If the node to be deleted is itself the last one, simply remove it and update its single path.)
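Here is a minimal Python sketch of these two operations. It assumes each node keeps its own value plus the sum of its subtree, a hypothetical stand-in for the "nodal values" in the question. It also assumes that node placement is free, so the array order carries no meaning. `CompleteTree` and its methods are illustrative names.

```python
class CompleteTree:
    """Complete binary tree stored breadth-first in Python lists."""

    def __init__(self):
        self.val = []  # each node's own value
        self.agg = []  # hypothetical aggregate: sum of the node's subtree

    @staticmethod
    def parent(i):
        return (i - 1) // 2

    @staticmethod
    def children(i):
        return 2 * i + 1, 2 * i + 2

    def _add_to_path(self, i, delta):
        """Add delta to node i and to every ancestor up to the root."""
        while True:
            self.agg[i] += delta
            if i == 0:
                break
            i = self.parent(i)

    def insert(self, value):
        # Append at the end of the array, then update the one affected path.
        self.val.append(value)
        self.agg.append(0)
        self._add_to_path(len(self.val) - 1, value)

    def delete(self, i):
        last = len(self.val) - 1
        moved = self.val[last]
        # Path 1: remove the last node's contribution, then drop its slot.
        self._add_to_path(last, -moved)
        self.val.pop()
        self.agg.pop()
        if i != last:
            # Path 2: the former last value takes over slot i.
            self._add_to_path(i, moved - self.val[i])
            self.val[i] = moved
```

The tree never needs rebalancing: its shape is always complete, so its height stays at floor(log2 n).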
When balancing non-BSTs, the big question to ask is
Can your tree efficiently support rotations?
Some types of binary trees, like k-d trees, have a specific layer-by-layer structure that makes rotations infeasible. Others, like range trees, have auxiliary metadata in each node that's expensive to update after a rotation. But if you can handle rotations, then you can use just about any of the balancing strategies out there. The simplest option might be to model your tree on a treap: put a randomly-chosen weight field into each node, and then, during insertions, rotate your newly-added leaf up as long as its weight is less than its parent's. To delete, repeatedly rotate the node with its lighter child until it's a leaf, then delete it.
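Here is a minimal Python sketch of the treap approach. The question's tree has no search key, so the sketch assumes nodes are placed by in-order position (an "implicit" treap). It uses a hypothetical `total` field as a stand-in for the per-node values that change along a root-to-node path. The `weight` field keeps a min-heap order, matching "rotate with its lighter child". `delete` splices the node out as soon as it has at most one child, which gives the same result as rotating it all the way down to a leaf.

```python
import random


class Node:
    def __init__(self, value):
        self.value = value
        self.weight = random.random()  # random treap priority
        self.left = None
        self.right = None
        self.size = 1       # nodes in this subtree (used to locate positions)
        self.total = value  # hypothetical aggregate: sum of values in subtree


def size(n):
    return n.size if n else 0


def total(n):
    return n.total if n else 0


def pull(n):
    """Recompute n's metadata from its children; all a rotation needs."""
    n.size = 1 + size(n.left) + size(n.right)
    n.total = n.value + total(n.left) + total(n.right)


def rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    pull(n)
    pull(l)
    return l


def rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    pull(n)
    pull(r)
    return r


def insert(n, pos, value):
    """Insert value as the pos-th node (0-based, in-order); return new root."""
    if n is None:
        return Node(value)
    if pos <= size(n.left):
        n.left = insert(n.left, pos, value)
        pull(n)
        if n.left.weight < n.weight:  # heap order violated: rotate child up
            n = rotate_right(n)
    else:
        n.right = insert(n.right, pos - size(n.left) - 1, value)
        pull(n)
        if n.right.weight < n.weight:
            n = rotate_left(n)
    return n


def delete(n, pos):
    """Remove the pos-th node (0-based, in-order); return new root."""
    k = size(n.left)
    if pos < k:
        n.left = delete(n.left, pos)
    elif pos > k:
        n.right = delete(n.right, pos - k - 1)
    else:
        if n.left is None:   # at most one child left: splice it out
            return n.right
        if n.right is None:
            return n.left
        # Rotate the lighter child up, pushing the target one level down.
        if n.left.weight < n.right.weight:
            n = rotate_right(n)
            n.right = delete(n.right, size(n.right.left))
        else:
            n = rotate_left(n)
            n.left = delete(n.left, size(n.left.left))
    pull(n)
    return n


def to_list(n):
    return to_list(n.left) + [n.value] + to_list(n.right) if n else []
```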
If you cannot support rotations, you'll need a rebalancing strategy that does not require them. Perhaps the easiest option there is to model your tree after a scapegoat tree, which works by lazily detecting a node that's too deep for the tree to be balanced, then rebuilding the smallest imbalanced subtree possible into a perfectly-balanced tree to get everything back into order. Deletions are handled by rebuilding the whole tree once the number of nodes drops by some constant factor.
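Here is a rough Python sketch of the scapegoat idea for insertions only. It assumes a balance parameter of alpha = 2/3. To keep it short, it also assumes that new nodes are always appended as the right-most node. Deletions and arbitrary insertion positions are left out, and `ScapegoatTree`, `flatten` and `build` are hypothetical names. The rebuild step is the part that needs no rotations: it flattens the subtree in order and relinks it into a perfectly balanced shape.

```python
import math


class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def size(n):
    return 0 if n is None else 1 + size(n.left) + size(n.right)


def height(n):
    return -1 if n is None else 1 + max(height(n.left), height(n.right))


def flatten(n, out):
    """Append the subtree's nodes to out in in-order."""
    if n is not None:
        flatten(n.left, out)
        out.append(n)
        flatten(n.right, out)
    return out


def build(nodes, lo, hi):
    """Relink nodes[lo:hi] into a perfectly balanced subtree."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    root = nodes[mid]
    root.left = build(nodes, lo, mid)
    root.right = build(nodes, mid + 1, hi)
    return root


class ScapegoatTree:
    def __init__(self):
        self.root = None
        self.n = 0

    def append(self, value):
        """Insert value as the new right-most node, rebuilding if too deep."""
        node = Node(value)
        self.n += 1
        if self.root is None:
            self.root = node
            return
        path = [self.root]
        while path[-1].right is not None:
            path.append(path[-1].right)
        path[-1].right = node
        path.append(node)
        if len(path) - 1 <= math.log(self.n, 1.5):  # depth within log_{3/2} n
            return
        # Walk up to the deepest ancestor whose child holds > 2/3 of its nodes.
        for i in range(len(path) - 2, -1, -1):
            if 3 * size(path[i + 1]) > 2 * size(path[i]):
                nodes = flatten(path[i], [])
                rebuilt = build(nodes, 0, len(nodes))
                if i == 0:
                    self.root = rebuilt
                else:
                    path[i - 1].right = rebuilt  # every step on path went right
                return
```

Subtree sizes are recomputed from scratch here for clarity; a real implementation would store them or count them only where needed.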
Why do nodes of a binary tree have links only from parent to children? I know that there are threaded binary trees, but those are harder to implement. A binary tree with links in both directions would allow iterative traversal in both directions without a stack or queue.
I do not know of any such design. If there is one please let me know.
Edit1: Let me conjure a problem for this. I want to do traversal without recursion and without using extra memory in form of stack or queue.
PS: I am afraid that I am going to get flak and downvotes for this stupid question.
Some binary trees do require children to keep track of their parent, or even their grandparent, e.g. splay trees. However, this is only needed to balance or splay the tree. The reason we usually traverse a tree only from parent to children is that we are typically searching for a specific node. As long as the binary tree is ordered so that all left descendants are less than the parent and all right descendants are greater (or vice versa), links in one direction are enough to find that node. We start the search at the root and iterate down, and if the node is in the tree, we are guaranteed to find it. If we started at a leaf, there would be no guarantee of finding the node we want by going back toward the root. Child-to-parent links are left out because they are unnecessary for searches. Hope this helps.
It can be; however, we should consider the trade-off between memory usage and complexity.
Yes, you can traverse the binary tree with an extra link in each node, but you are then spending comparable extra memory to a traversal with a queue, and the queue-based traversal may even run faster.
What binary search trees are good at is solving many search problems in O(log N) time (when the tree is balanced) while being fast and memory-efficient.
Let me conjure a problem for this. I want to do traversal without recursion and without using extra memory in form of stack or queue.
Have you considered that the parent pointers in the tree occupy space themselves?
They add O(N) memory to the tree, one parent pointer per node, in order to avoid using O(h) stack space during recursion, where h is the tree height (O(log N) for a balanced tree).
What parent pointers allow us to do is to support an API whereby the caller can pass a pointer to a node and request an operation on it like "find the next node in order" (for example).
In this situation, we do not have a stack holding the path to the root; we just receive a node "out of the blue" from the caller. With parent pointers, given a tree node, we can find its successor in amortized constant time O(1). A single call may climb O(h) levels, but stepping through all N nodes this way costs O(N) in total.
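Here is a minimal Python sketch of that successor query. It assumes a plain BST whose insert also fills in the parent pointer; `Node`, `insert`, `minimum`, `successor` and `in_order` are illustrative names.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.left = None
        self.right = None
        self.parent = parent


def insert(root, key):
    """Plain (unbalanced) BST insert that also sets the parent pointer."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key, cur)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key, cur)
                return root
            cur = cur.right


def minimum(node):
    while node.left is not None:
        node = node.left
    return node


def successor(node):
    """In-order successor of a node handed to us, found without a stack."""
    if node.right is not None:
        return minimum(node.right)  # leftmost node of the right subtree
    # Otherwise climb until we arrive at a parent from its left child.
    while node.parent is not None and node is node.parent.right:
        node = node.parent
    return node.parent  # None when node was the maximum


def in_order(root):
    """Yield all keys in order using only successor() steps."""
    node = minimum(root) if root else None
    while node is not None:
        yield node.key
        node = successor(node)
```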
Implementations which don't require this functionality can save space by not including the parent pointers in the tree, and using recursion or an explicit stack structure for the root to leaf traversals.
A doubly linked list enables idiomatic traversal of a linked list in both directions, and I thought: why not for a binary tree? Traditionally, binary trees (and trees in general) are unidirectional, which implies that, in a large tree with many nodes, finding a leaf node can be costly.
If, after finding such a node, I could find the next one by traversing the tree back toward the root, would that not be advantageous compared to another depth-first search through every node of the tree? I had never considered this until realizing that marrying a doubly linked list and a binary tree could add some benefit.
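For example, if I employed an inner class:
class Tree<T> {
    private inner class TwoWayNode(var data: T) {
        var left: TwoWayNode? = null      // root of the left subtree
        var right: TwoWayNode? = null     // root of the right subtree
        var previous: TwoWayNode? = null  // parent node; null for the root
    }
}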
The left and right links are used as normal to traverse the respective subtrees of each node, and previous would hold a pointer to the parent node to enable idiomatic traversal. Would something like this work well, and what are some of the potential problems or pitfalls?
Given that you store a previous reference, you can walk to the leftmost node first. Upon arrival at the leaf node, you go back up one level, then traverse right.
You can always compare your "walker" (the node you just came from) with the current node's children, so you can tell whether you last came up from the left or from the right. This keeps the traversal nearly stateless (only the current and the previous node are tracked), and you do not even need recursion; this is suitable for very large datasets.
Now, every time you have just left a right subtree, you go back up one more level.
This algorithm is a depth-first search.*
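Here is a minimal Python sketch of this parent-pointer traversal. It reuses the question's field name `previous` for the parent link; the Python class is a hypothetical port of the question's `TwoWayNode`. It tracks only the current node and the one it came from, and emits nodes in order. For a pre-order or post-order visit, move the `append` to the point where a node is first entered or finally left.

```python
class TwoWayNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right
        self.previous = None  # parent link; stays None for the root
        for child in (left, right):
            if child is not None:
                child.previous = self


def inorder_without_stack(root):
    """In-order traversal using parent links and O(1) extra state."""
    out = []
    came_from, cur = None, root
    while cur is not None:
        if came_from is cur.previous:
            # Arrived from the parent: descend left first if possible.
            if cur.left is not None:
                came_from, cur = cur, cur.left
                continue
            out.append(cur.data)  # no left subtree, so visit now
        elif came_from is cur.left:
            out.append(cur.data)  # back from the left subtree: visit now
        else:
            # Back from the right subtree: this node is finished.
            came_from, cur = cur, cur.previous
            continue
        # cur has been visited; go right if possible, otherwise back up.
        if cur.right is not None:
            came_from, cur = cur, cur.right
        else:
            came_from, cur = cur, cur.previous
    return out
```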
Making it faster:
Given that you could define some deterministic condition for the order of traversal, this can become quite flexible, and even be used in applications like ray tracing.
*: http://en.wikipedia.org/wiki/Depth-first_search
Bonus: a paper on traversal algorithms for kd-trees in ray tracing: Review: Kd-tree Traversal Algorithms for Ray Tracing (http://dcgi.felk.cvut.cz/home/havran/ARTICLES/cgf2011.pdf)
Indeed, nodes of a binary tree are often implemented with pointers to the left and right child and the parent (see this implementation of red-black trees).
But you do not always need a parent pointer:
For an inorder-traversal you can use a recursive algorithm so that the call stack takes care of that for you.
If you want to access the min or max node, you can simply maintain an extra pointer to them.
Sometimes you can use a finger tree.
Or organize your pointers extra cleverly (see Self-Adjusting Binary Search Trees, page 666):
The left pointer of a node points to the first (left) child
The right pointer of a node points to either the sibling (if it is a left child) or back to the parent (if it is a right child)
Extra cool: Threaded binary search trees for extra easy inorder (and reverse order) traversal without a stack - so O(1) space!
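Here is a minimal Python sketch of a right-threaded BST. It assumes duplicates go to the right and uses a hypothetical `right_is_thread` flag to tell a real right child from a thread to the in-order successor. Only insertion and in-order traversal are shown. The output list aside, the traversal itself keeps just one pointer.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.right_is_thread = False  # True: `right` is the in-order successor


def insert(root, key):
    """Insert key into a right-threaded BST and return the root."""
    node = Node(key)
    if root is None:
        return node
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = node
                node.right, node.right_is_thread = cur, True  # successor is cur
                return root
            cur = cur.left
        else:
            if cur.right is None or cur.right_is_thread:
                # The new right child takes over cur's successor thread (if any).
                node.right, node.right_is_thread = cur.right, cur.right_is_thread
                cur.right, cur.right_is_thread = node, False
                return root
            cur = cur.right


def leftmost(node):
    while node is not None and node.left is not None:
        node = node.left
    return node


def inorder(root):
    """In-order traversal with no stack and no recursion."""
    out = []
    cur = leftmost(root)
    while cur is not None:
        out.append(cur.key)
        if cur.right_is_thread:
            cur = cur.right  # follow the thread straight to the successor
        else:
            cur = leftmost(cur.right)  # None once we pass the maximum
    return out
```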
I have bought a nice little book about computational geometry. While reading it here and there, I often came across this special kind of binary search tree. These trees are balanced and should store the data only in the leaf nodes, whereas inner nodes should only store values to guide the search down to the leaves.
The following image shows an example of these trees (where the leaves are rectangles and the inner nodes are circles).
I have two questions:
What is the advantage of not storing data in the inner nodes?
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
Any kind of helpful resource is very welcome.
What is the advantage of not storing data in the inner nodes?
There are some tree data structures that, by design, require that no data is stored in the inner nodes, such as Huffman code trees and B+ trees. In the case of Huffman trees, the requirement is that no symbol's code is a prefix of another symbol's code. For example, if the path to node 'B' is 10, the path to node 'A' cannot be 101. This is guaranteed when symbols sit only at the leaves. In the case of B+ trees, it comes from the fact that they are optimized for block-oriented search (this also means that every internal node has a lot of children, and that the tree is usually only a few levels deep).
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
Sure! An AVL tree is not extremely complicated, so it's a good candidate for learning.
It is common to have other kinds of binary trees with data at the leaves instead of the interior nodes, but fairly uncommon for binary SEARCH trees.
One reason you might WANT to do this is educational: it's often EASIER to implement a binary search tree this way than the traditional way. Why? Almost entirely because of deletions. Deleting a leaf is usually very easy, whereas deleting an interior node is harder/messier. If your data is only at the leaves, then you are always in the easy case!
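Here is a minimal, unbalanced Python sketch of a leaf-oriented BST. It assumes the convention that keys smaller than an interior node's key go left and all others go right. Note how `delete` only ever removes a leaf and replaces its parent with the leaf's sibling. `Leaf`, `Internal` and `LeafTree` are illustrative names.

```python
class Leaf:
    def __init__(self, key, value):
        self.key = key
        self.value = value


class Internal:
    def __init__(self, key, left, right):
        self.key = key  # routing key only: smaller keys go left
        self.left = left
        self.right = right


class LeafTree:
    def __init__(self):
        self.root = None

    def _descend(self, key):
        """Return (grandparent, parent, leaf) on the search path for key."""
        grand, parent, n = None, None, self.root
        while isinstance(n, Internal):
            grand, parent = parent, n
            n = n.left if key < n.key else n.right
        return grand, parent, n

    def _replace(self, parent, old, new):
        if parent is None:
            self.root = new
        elif parent.left is old:
            parent.left = new
        else:
            parent.right = new

    def find(self, key):
        _, _, leaf = self._descend(key)
        return leaf.value if leaf is not None and leaf.key == key else None

    def insert(self, key, value):
        _, parent, leaf = self._descend(key)
        if leaf is None:
            self.root = Leaf(key, value)
        elif leaf.key == key:
            leaf.value = value
        else:
            new = Leaf(key, value)
            lo, hi = (new, leaf) if key < leaf.key else (leaf, new)
            # The old leaf becomes one child of a fresh interior node.
            self._replace(parent, leaf, Internal(hi.key, lo, hi))

    def delete(self, key):
        """Always the easy case: drop a leaf and splice in its sibling."""
        grand, parent, leaf = self._descend(key)
        if leaf is None or leaf.key != key:
            return False
        if parent is None:
            self.root = None
        else:
            sibling = parent.right if parent.left is leaf else parent.left
            self._replace(grand, parent, sibling)
        return True

    def keys(self):
        """All stored keys in sorted order (leaves, left to right)."""
        def walk(n):
            if n is None:
                return []
            if isinstance(n, Leaf):
                return [n.key]
            return walk(n.left) + walk(n.right)
        return walk(self.root)
```

Routing keys above a deleted leaf are left in place; they still separate the remaining keys correctly.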
It's worth thinking about where the keys on interior nodes come from. Often they are duplicates of keys that are also at the leaves (with data). Later, if the key at the leaf is deleted, the key at the interior nodes might still hang around.
What is the advantage of not storing data in the inner nodes?
In general, there is no advantage in not storing data in the inner nodes. For example, a red-black tree is a balanced tree and it stores its data in the inner and leaf nodes.
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
In my opinion, it is.
One benefit to only keeping the data in leaf nodes (e.g., B+ tree) is that scanning/reading the data is exceedingly simple. The leaf nodes are linked together. So to read the next item when you are at the "end" (right or left) of the data within a given leaf node, you just read the link/pointer to the next (or previous) node and jump to the next leaf page.
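Here is a minimal Python sketch of that scan. It assumes a hypothetical `LeafPage` type for the linked leaf level of a B+ tree. The internal levels, which would locate the starting leaf, are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LeafPage:
    """A B+ tree leaf page: sorted keys plus a link to the next leaf."""
    keys: List[int]
    next: Optional["LeafPage"] = None


def scan(leaf: Optional[LeafPage], start: int, count: int) -> List[int]:
    """Read up to `count` keys in order, starting at leaf.keys[start].

    Moving on to the next page is a single pointer hop; there is no walk
    back up through the internal nodes.
    """
    out: List[int] = []
    while leaf is not None and len(out) < count:
        out.extend(leaf.keys[start:start + (count - len(out))])
        leaf, start = leaf.next, 0
    return out
```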
With a B tree where data is in every node, you have to traverse the tree to read the data in order. That is certainly a well-defined process but is arguably more complex and typically requires more state information.
I am reading the same book, and it says this could be done either way: data can be stored at external or at internal nodes.
The trees they use are Red-Black.
In any case, here is an article that stores data at internal nodes of a Red Black Tree and then links these data nodes together as a list.
Balanced binary search tree with a doubly linked list in C++
by Arjan van den Boogaard
http://archive.gamedev.net/archive/reference/programming/features/TStorage/default.html