Splitting a treap at a key that exists inside the tree - algorithm

I've been trying to implement and understand the split/merge operations on a treap. Every node has two keys: a heap key and a tree key. Looking only at the heap keys you should see a valid heap, and looking only at the tree keys you should see a valid binary search tree.
Splitting a treap is easier than usual because you can just insert a dummy node with the maximum or minimum priority (depending on whether it's a max-heap or min-heap). However, this link just says to assume that the splitting key isn't in the tree. What if I always want an existing key to end up inside the right tree, or the left tree? What do I do?

Find the node with the key in question.
Move it up to become the new root (by giving it a very high -- or very low -- priority).
Split off the left (or right) subtree.
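
A rough sketch of one way to get the same result, assuming a max-heap on priorities. Note that this uses the common recursive split rather than the rotate-to-root steps above; the only thing that decides where an existing key lands is whether the comparison is `<` or `<=`. The names `Node`, `split`, `insert`, `inorder` and the `key_goes_right` flag are made up for illustration.

```python
import random


class Node:
    def __init__(self, key, priority=None):
        self.key = key
        # Assumption: random priorities, larger priority is closer to the root.
        self.priority = random.random() if priority is None else priority
        self.left = None
        self.right = None


def split(node, key, key_goes_right=True):
    """Split the treap into (left, right).

    With key_goes_right=True the left part holds keys < key and the right
    part holds keys >= key, so an existing `key` lands on the right.
    With key_goes_right=False the existing `key` lands on the left.
    """
    if node is None:
        return None, None
    goes_left = node.key < key if key_goes_right else node.key <= key
    if goes_left:
        # node and its whole left subtree belong to the left result.
        lo, hi = split(node.right, key, key_goes_right)
        node.right = lo
        return node, hi
    # node and its whole right subtree belong to the right result.
    lo, hi = split(node.left, key, key_goes_right)
    node.left = hi
    return lo, node


def insert(root, node):
    """Insert by splitting at the first place the new priority wins."""
    if root is None:
        return node
    if node.priority > root.priority:
        node.left, node.right = split(root, node.key)
        return node
    if node.key < root.key:
        root.left = insert(root.left, node)
    else:
        root.right = insert(root.right, node)
    return root


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

Calling `split(root, 4)` on a treap holding 1..7 should put 4 at the front of the right part, while `split(root, 4, key_goes_right=False)` should put it at the end of the left part.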

Related

How to get the n-th value of a b-tree

Is there general pseudocode or related data structure to get the nth value of a b-tree? For example, the eighth value of this tree is 13 [1,4,9,9,11,11,12,13].
If I have some values sorted in a b-tree, I would like to find the nth value without having to go through the entire tree. Is there a better structure for this problem? The data order could update anytime.
You are looking for an order statistic tree. The idea is that, in addition to any data stored in the nodes, each node also stores the size of its subtree, and these sizes are kept up to date on insertions and deletions.
Since you are "touching" O(logn) nodes for each insert/delete operation - keeping it up to date still keeps the O(logn) behavior of these.
FindKth() then works by skipping every subtree whose largest index is still smaller than k and descending into the next one. Since you don't need to go to the depth of each subtree, only directly to the required one (checking the nodes on the path to this element), you need to "touch" O(logn) nodes, which makes this operation O(logn) as well.
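
A minimal sketch of the idea, assuming a plain binary search tree with a `size` field rather than a B-tree, and without any rebalancing (so the O(logn) bound only holds if the tree happens to stay balanced). `Node`, `insert` and `find_kth` are illustrative names, and k is 1-based.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in the subtree rooted here


def size(node):
    return node.size if node is not None else 0


def insert(node, key):
    """Plain BST insert that keeps subtree sizes up to date.

    Duplicates go to the right. No rebalancing is done here.
    """
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size += 1
    return node


def find_kth(node, k):
    """Return the k-th smallest key (1-based), touching one node per level."""
    while node is not None:
        left = size(node.left)
        if k <= left:
            node = node.left          # answer is inside the left subtree
        elif k == left + 1:
            return node.key           # this node is the k-th one
        else:
            k -= left + 1             # skip the left subtree and this node
            node = node.right
    raise IndexError("k is out of range")
```

With the values from the question inserted in any order, `find_kth(root, 8)` should give 13.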

Uniqueness of B-tree

Say that I have a sequence of key values to be inserted into a B-tree of any given order. After insertion of all the elements, I am performing a deletion operation on some of those elements. Does it always give a unique result (in the form of a B-tree), or can it differ according to the deletion operation?
Quoted from wiki :
link:https://en.wikipedia.org/wiki/B-tree
Deletion from an internal node
Each element in an internal node acts as a separation value for two
subtrees, therefore we need to find a replacement for separation. Note
that the largest element in the left subtree is still less than the
separator. Likewise, the smallest element in the right subtree is
still greater than the separator. Both of those elements are in leaf
nodes, and either one can be the new separator for the two subtrees.
Algorithmically described below:
Choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from
the leaf node it is in, and replace the element to be deleted with the
new separator.
The previous step deleted an element (the new separator) from a leaf
node. If that leaf node is now deficient (has fewer than the required
number of nodes), then rebalance the tree starting from the leaf node.
I think according to the deletion operation it may vary because of the above lines quoted in bold letters. Am I right? help :)
If your question is whether two B-trees that contain the exact same collection of key values will always have identical nodes, then the answer is No.
Note that this is also true for e.g. simple binary trees.
However, in the case of B-trees this can be more pronounced because B-trees are optimized for minimizing page changes and thus the need to write back to slow secondary storage.

Why does a Binary Heap have to be a Complete Binary Tree?

The heap property says:
If A is a parent node of B then the key of node A is ordered with
respect to the key of node B with the same ordering applying across
the heap. Either the keys of parent nodes are always greater than or
equal to those of the children and the highest key is in the root node
(this kind of heap is called max heap) or the keys of parent nodes are
less than or equal to those of the children and the lowest key is in
the root node (min heap).
But why in this wiki, the Binary Heap has to be a Complete Binary Tree? The Heap Property doesn't imply that in my impression.
According to the wikipedia article you provided, a binary heap must conform to both the heap property (as you discussed) and the shape property (which mandates that it is a complete binary tree). Without the shape property, one would lose the runtime advantage that the data structure provides (i.e. the completeness ensures that there is a well defined way to determine the new root when an element is removed, etc.)
Every item in the array has a position in the binary tree, and this position is calculated from the array index. The positioning formula ensures that the tree is 'tightly packed'.
For example, this binary tree here:
is represented by the array
[1, 2, 3, 17, 19, 36, 7, 25, 100].
Notice that the array is ordered as if you're starting at the top of the tree, then reading each row from left-to-right.
If you add another item to this array, it will represent the slot below the 19 and to the right of the 100. If this new number is less than 19, then values will have to be swapped around, but nonetheless, that is the slot that will be filled by the 10th item of the array.
Another way to look at it: try constructing a binary heap which isn't a complete binary tree. You literally cannot.
You can only guarantee O(log(n)) insertion and (root) deletion if the tree is complete. Here's why:
If the tree is not complete, then it may be unbalanced and in the worst case, simply a linked list, requiring O(n) to find a leaf, and O(n) for insertion and deletion. With the shape requirement of completeness, you are guaranteed O(log(n)) operations since it takes constant time to find a leaf (last in array), and you are guaranteed that the tree is no deeper than log2(N), meaning the "bubble up" (used in insertion) and "sink down" (used in deletion) will require at most log2(N) modifications (swaps) of data in the heap.
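
To make the "bubble up" and "sink down" steps concrete, here is a small sketch of a min-heap kept in a Python list. The function names `push` and `pop` are just for illustration; it assumes 0-based indexing, so the parent of slot i is (i - 1) // 2.

```python
def push(heap, item):
    """Append at the next free leaf, then bubble up."""
    heap.append(item)  # constant time: the free leaf is the end of the list
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent


def pop(heap):
    """Remove the root, move the last leaf to the top, then sink it down."""
    last = heap.pop()  # raises IndexError on an empty heap
    if not heap:
        return last
    top, heap[0] = heap[0], last
    i, n = 0, len(heap)
    while True:
        child = 2 * i + 1
        if child >= n:
            break  # no children left
        if child + 1 < n and heap[child + 1] < heap[child]:
            child += 1  # pick the smaller of the two children
        if heap[i] <= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child
    return top
```

Both loops move one level per iteration, so on a complete tree they run at most about log2(N) times.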
This being said, you don't absolutely have to have a complete binary tree, but you just lose these runtime guarantees. In addition, as others have mentioned, having a complete binary tree makes it easy to store the tree in array format, forgoing object reference representation.
The point that 'complete' makes is that in a heap all interior (not leaf) nodes have two children, except where there are no children left -- all the interior nodes are 'complete'. As you add to the heap, the lowest level of nodes is filled (with childless leaf nodes), from the left, before a new level is started. As you remove nodes from the heap, the right-most leaf at the lowest level is removed (and pushed back in at the top). The heap is also perfectly balanced (hurrah!).
A binary heap can be looked at as a binary tree, but the nodes do not have child pointers, and insertion (push) and deletion (pop or from inside the heap) are quite different to those procedures for an actual binary tree.
This is a direct consequence of the way in which the heap is organised. The heap is held as a vector with no gaps between the nodes. The parent of the i'th item in the heap is item (i - 1) / 2, using integer division (assuming a binary heap, and assuming the top of the heap is item 0). The left child of the i'th item is (i * 2) + 1, and the right child one greater than that. When there are n nodes in the heap, the valid items are 0 to n - 1, so a node has no left child if (i * 2) + 1 is n or more, and no right child if (i * 2) + 2 is n or more.
The heap is a beautiful thing. Its one flaw is that you do need a vector large enough for all entries... unlike a real binary tree, you cannot allocate a node at a time. So if you have a heap for an indefinite number of items, you have to be ready to extend the underlying vector as and when needed -- or run some fragmented structure which can be addressed as if it was a vector.
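
A tiny sketch of that index arithmetic, assuming 0-based indexing where a child index counts as present only when it is less than n. It reuses the example array quoted earlier in this thread; the helper names are made up.

```python
def parent(i):
    # Only meaningful for i > 0; the root (index 0) has no parent.
    return (i - 1) // 2


def left(i):
    return 2 * i + 1


def right(i):
    return 2 * i + 2


def children(i, n):
    """Indices of the children of item i that exist in a heap of n items."""
    return [c for c in (left(i), right(i)) if c < n]


heap = [1, 2, 3, 17, 19, 36, 7, 25, 100]
n = len(heap)

# A 10th item would go to index 9, whose parent is index 4 (the value 19).
slot = n
print(heap[parent(slot)])
# The 17 at index 3 has both children; the 19 at index 4 has none yet.
print(children(3, n), children(4, n))
```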
FWIW: when stepping down the heap, I find it convenient to step to the right child -- (i + 1) * 2 -- if that is < n then both children are present, if it is == n only the left child is present, otherwise there are no children.
Maintaining a binary heap as a complete binary tree gives multiple advantages, such as:
1. A heap is a complete binary tree, so its height is the minimum possible, i.e. log(size of tree). Insertion and the build-heap operation depend on the height, so if the height is minimal their time complexity is reduced.
2. All the items of a complete binary tree are stored contiguously in an array, so random access is possible and the layout is cache-friendly.
In order for a binary tree to be considered a heap, it must meet two criteria: 1) it must have the heap property; 2) it must be a complete tree.
It is possible for a structure to have either of these properties and not have the other, but we would not call such a data structure a heap. You are right that the heap property does not entail the shape property. They are separate constraints.
The underlying structure of a heap is an array in which every node is an index into the array. If the tree were not complete, one of the indices would have to be left empty, which is not possible because the structure is coded in such a way that each node is an index. I have given a link below so that you can see how the heap structure is built.
http://www.sanfoundry.com/java-program-implement-min-heap/
Hope it helps
I find that all answers so far either do not address the question or are, essentially, saying "because the definition says so" or use a similar circular argument. They are surely true but (to me) not very informative.
To me it became immediately obvious that the heap must be a complete tree when I remembered that you insert a new element not at the root (as you do in a binary search tree) but, rather, at the bottom right.
Thus, in a heap, a new element propagates from the bottom up - it is "moved up" within the tree till it finds a suitable place.
In a binary search tree a newly inserted element moves the other way round - it is inserted at the root and it "moves down" till it finds its place.
The fact that each new element in a heap starts as the bottom right node means that the heap is going to be a complete tree at all times.

Sort list to ease construction of binary tree

I have a set of items that are supposed to form a balanced binary tree. Each item is of the form (data,parent), data being the useful information and parent being the index of the parent node in the binary tree.
Nodes in the tree are numbered left-to-right, row-by-row, like this:
            1
        ___/ \___
       /         \
      2           3
    _/ \_       _/ \_
   4     5     6     7
These elements come stored in a linked list. How should I order this list such that it's easier for me to build the tree? Each parent node will be referenced (by index) by exactly two child nodes; if I sort these by parent index, the sorting must be stable.
You can sort the list in any stable sort, according to the parent field, in increasing order.
The result will be a list like that:
[(d_1,nil), (d_2,1), (d_3,1) , (d_4,2), (d_5,2), ...(d_i,x), (d_i+1,x) ]
       ^
       the root has no parent...
Note that in this list, since we used a stable sort, for each two pairs (d_i,x), (d_i+1,x) in the sorted list, d_i is the left child!
Now you can populate the tree in breadth-first order.
Since it is homework, I still want you to make sure you understand everything on your own, so I do not want to "feed the answer". If you have any specific question, please comment, and I will try to edit and explain the relevant parts in more detail.
Bonus: The result of this organization is very common way to implement a binary heap structure, which is a complete binary tree, but for performance, we usually store it as an array, which is very similar to the output generated by this approach.
I don't think I understand what exactly are you trying to achieve. You have to write the function that inserts items in the tree. The red-black tree, for example, has the same complexity for insertions, O(log n), no matter how the input data is sorted. Is there a specific implementation that you have to use or a specific speed target that you must reach for inserts?
PS: Sounds like a homework to me :)
It sounds like you want a binary tree that allows you to go from a leaf node to its ancestors, using an array.
Usually sorting a list before putting it into a binary tree causes an unbalanced binary tree, unless you use a treap or other O(logn) datastructure.
The usual way of stashing a (complete) binary tree in an array, is to make node i have two children 2i and 2i+1.
Given this organization (not sorting but organization), you can go to a parent node from a leaf node by dividing the array index by 2 using integer arithmetic which will truncate fractions.
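
A short sketch of that arithmetic, assuming the 1-based numbering from the question (root is node 1) and a complete tree of n nodes. The helper names are invented for the example.

```python
def parent(i):
    """Parent of node i in 1-based numbering; the root (i == 1) has none."""
    return i // 2


def children(i, n):
    """Children of node i that exist in a complete tree of n nodes."""
    return [c for c in (2 * i, 2 * i + 1) if c <= n]


def path_to_root(i):
    """Node indices visited when walking from node i up to the root."""
    path = [i]
    while i > 1:
        i = parent(i)
        path.append(i)
    return path
```

For the seven-node tree in the question, `path_to_root(6)` walks 6, 3, 1 and `children(2, 7)` gives nodes 4 and 5.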
If your binary trees are not always complete, you'll probably be better served by forgetting about using an array, and instead using a more traditional tree structure with pointers/references.

Data structure supporting Add and Partial-Sum

Let A[1..n] be an array of real numbers. Design an algorithm to perform any sequence of the following operations:
Add(i,y) -- Add the value y to the ith number.
Partial-sum(i) -- Return the sum of the first i numbers, i.e. A[1] + A[2] + ... + A[i].
There are no insertions or deletions; the only change is to the values of the numbers. Each operation should take O(logn) steps. You may use one additional array of size n as a work space.
How to design a data structure for above algorithm?
Construct a balanced binary tree with n leaves; stick the elements along the bottom of the tree in their original order.
Augment each node in the tree with "sum of leaves of subtree"; a binary tree with n leaves has n-1 internal nodes, so this takes O(n) setup time (which we have).
Querying a partial-sum goes like this: Descend the tree towards the query (leaf) node, but whenever you descend right, add the subtree-sum on the left plus the element you just visited, since those elements are in the sum.
Modifying a value goes like this: Find the query (leaf) node. Calculate the difference you added. Travel to the root of the tree; as you travel to the root, update each node you visit by adding in the difference (you may need to visit adjacent nodes, depending on whether you're storing "sum of leaves of subtree" or "sum of left-subtree plus myself" or some variant); the main idea is that you appropriately update all the augmented branch data that needs updating, and that data will be on the root path or adjacent to it.
The two operations take O(log(n)) time (that's the height of a tree), and you do O(1) work at each node.
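
Here is one possible sketch of this scheme, assuming the "sum of leaves of subtree" variant stored in a flat list laid out like a heap (node i has children 2i and 2i+1) and padded to a power of two. That layout uses more than the single size-n work array the question allows, so treat it as an illustration of the idea only. `SumTree` and its methods are hypothetical names; indices are 1-based as in the question.

```python
class SumTree:
    def __init__(self, values):
        self.n = len(values)
        self.size = 1
        while self.size < self.n:
            self.size *= 2            # pad the leaf row to a power of two
        self.tree = [0] * (2 * self.size)
        self.tree[self.size:self.size + self.n] = values
        # Each internal node stores the sum of the leaves below it.
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def add(self, i, y):
        """Add y to the i-th number and to every node on its root path."""
        node = self.size + i - 1
        while node >= 1:
            self.tree[node] += y
            node //= 2

    def partial_sum(self, i):
        """Sum of the first i numbers, descending from the root to leaf i."""
        if i <= 0:
            return 0
        target = i - 1                # 0-based position of the leaf
        node, lo, width, total = 1, 0, self.size, 0
        while node < self.size:
            width //= 2
            if target < lo + width:
                node = 2 * node       # descend left, nothing to add
            else:
                total += self.tree[2 * node]  # whole left subtree is included
                lo += width
                node = 2 * node + 1
        return total + self.tree[node]
```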
You can probably use any search tree (e.g. a self-balancing binary search tree might allow for insertions, others for quicker access) but I haven't thought that one through.
You may use Fenwick Tree
See this question
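
For reference, a minimal Fenwick tree (binary indexed tree) sketch. It keeps a single work array of n + 1 entries (index 0 is unused), with 1-based positions as in the question. The class and method names are illustrative, and the constructor below builds the tree in O(n log n) by repeated `add` calls, which is simple rather than optimal.

```python
class FenwickTree:
    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)   # tree[0] is unused
        for i, v in enumerate(values, start=1):
            self.add(i, v)

    def add(self, i, y):
        """Add y to the i-th number (1-based)."""
        while i <= self.n:
            self.tree[i] += y
            i += i & -i                  # move to the next node covering i

    def partial_sum(self, i):
        """Return the sum of the first i numbers."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i                  # strip the lowest set bit
        return total
```

Both loops change one bit of the index per step, so each operation takes O(logn) steps.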
