Max-Heapify Function - max-heap

In maxHeapify, why is it necessary to check that the left and right indices are ≤ heap->size after calculating the index of the left and right children?
A suggested response: comparing the computed child indices against heap->size tells you whether the left or right child actually exists; without those checks you would read past the logical end of the heap whenever the node is a leaf or has only one child.
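To illustrate, here is a minimal sketch of maxHeapify over a 1-based array (index 0 unused), in Python rather than the asker's C; all names are illustrative, not taken from the original code.

```python
# Hedged sketch: maxHeapify for a max-heap stored in heap[1..size].

def max_heapify(heap, i, size):
    """Sink the element at index i to restore the max-heap property."""
    left = 2 * i
    right = 2 * i + 1
    largest = i
    # The bounds checks are essential: without `left <= size` and
    # `right <= size` we would index past the logical end of the heap
    # whenever node i is a leaf or has only a left child.
    if left <= size and heap[left] > heap[largest]:
        largest = left
    if right <= size and heap[right] > heap[largest]:
        largest = right
    if largest != i:
        heap[i], heap[largest] = heap[largest], heap[i]
        max_heapify(heap, largest, size)
```

For example, sinking the root of `[_, 2, 9, 8]` swaps 2 and 9; the recursion then stops at index 2 because both child indices exceed the heap size.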


Where in a max-heap might the smallest element reside, assuming that all elements are distinct?

I understand that in a max-heap the largest element is at the root and the smallest is in one of the leaves. I found an answer which says that it can be in any of the leaves, that is, among the elements with index ⌊n/2⌋+k, where k ≥ 1 — in other words, in the second half of the heap array.
Problem: can you please explain why the answer does not simply say it's one of the leaves? Why does it bring in ⌊n/2⌋+k? Second, why "the second half", when I'd expect it in the last level of the tree, given that every parent is greater than its children, so a child at height 1 is smaller than its parent but larger than its own children, and so on?
Edit: can you please explain why the indices of the leaves are ⌊n/2⌋+1, ⌊n/2⌋+2, ..., n? Or, equivalently, why the last non-leaf node of an array-based heap is at index ⌊n/2⌋? We know that the number of nodes at height h is at most ⌈n/2^(h+1)⌉, so the number of leaves is ⌈n/2⌉; I hope these extra details help.
In a zero-based indexed array, the root is at index 0. The children of the root are at index 1 and 2. In general the children of index 𝑖 are at indices 2𝑖+1 and 2𝑖+2.
So for a node to have children, we must have that 2𝑖+1 (i.e. the left child) is still within the range of the array, i.e. 2𝑖+1 < 𝑛, where 𝑛 is the size of the array (i.e. the last index is at 𝑛-1).
Which is the index of the first leaf? That would be the least value of 𝑖 for which 2𝑖+1 < 𝑛 is not true, i.e. when 2𝑖+1 ≥ 𝑛. From that we can derive that:
All indices that represent leaves are grouped together: they are exactly the indices 𝑖 for which 2𝑖+1 ≥ 𝑛.
The least index of a leaf is 𝑖 = ⌊𝑛/2⌋.
If you are working with a one-based indexed array (as is common in pseudo code), then you can adapt the above reasoning to derive that the first leaf is at index 𝑖 = ⌊𝑛/2⌋+1.
So the answer you quote is assuming a 1-based indexed array, and then it is correct to say that the first leaf is positioned at ⌊𝑛/2⌋+1, and any leaf is at a position ⌊𝑛/2⌋+𝑘, where 𝑘≥1.
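The 0-based derivation above is easy to check mechanically; a small sketch (my own, assuming a 0-based array of n elements):

```python
# A node i is a leaf exactly when its left child index 2*i + 1
# falls outside the array, i.e. 2*i + 1 >= n.

def first_leaf(n):
    """Index of the first leaf in a 0-based heap of n elements."""
    return n // 2  # the least i with 2*i + 1 >= n

# Sanity check: for small n, the leaf indices form the contiguous
# range [first_leaf(n), n).
for n in range(1, 10):
    leaves = [i for i in range(n) if 2 * i + 1 >= n]
    assert leaves == list(range(first_leaf(n), n))
```

Shifting everything by one reproduces the 1-based claim: the first leaf lands at ⌊n/2⌋+1.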

Binary tree without pointers

Below is a representation of a binary tree that I use in my project. In the bottom are the leaf nodes (orange boxes), and every level is the sum of the children below.
So, 3 on the leftmost node is the sum of 1 and 2 (its left and right children), and 10 is the sum of 3 and 7 (again, left and right children).
What I am trying to do is store this tree in a flat array, without using any pointers. The array is a plain integer array holding 2n-1 nodes (n being the number of leaf nodes).
The index of the root element is 0 (call it p), the index of its left child is 2p+1, and the index of its right child is 2p+2. Please see Binary Tree (Array implementation)
Everything works nicely if I know the number of leaf values beforehand but I can't seem to find a way to store this tree in a dynamically expanding array.
If I need to add 9, for example, as the 9th element, the structure changes and I'd have to recalculate all the indices, which I want to avoid because there may be hundreds of thousands of elements in the array at any time.
Does anyone know of an implementation that handles dynamic arrays with this implementation?
EDIT:
Below is a demonstration of what happens when I add new elements to the array: 36 was the root before; now it is a second-level element, and the new root array[0] is 114, which forces a new layout.
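For reference, the fixed-size layout described above can be built like this (a sketch of my own, not the asker's code, assuming the number of leaves is a power of two):

```python
# Flat sum tree: n leaves, 2n - 1 slots, root at index 0, children of
# node p at 2*p + 1 and 2*p + 2, each internal node holding the sum
# of its two children.

def build_sum_tree(leaves):
    n = len(leaves)            # assumed to be a power of two
    tree = [0] * (2 * n - 1)
    tree[n - 1:] = leaves      # leaves occupy the last n slots
    # Fill internal nodes bottom-up from their children.
    for p in range(n - 2, -1, -1):
        tree[p] = tree[2 * p + 1] + tree[2 * p + 2]
    return tree
```

With leaves `[1, 2, 3, 7]` this yields `[13, 3, 10, 1, 2, 3, 7]`, matching the example in the question (3 = 1+2, 10 = 3+7). For dynamic growth, one common compromise is to reserve leaf capacity up to the next power of two (padding unused leaves with zeros) and rebuild only when capacity is exhausted; doubling the capacity each time amortizes the relayout the asker wants to avoid.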

Why BST left <= parent <= right is not ideal?

When duplicates are allowed, BST nodes typically have one of the following properties:
left <= parent < right, or: left < parent <= right.
What is wrong with left <= parent <= right?
Your premise is incorrect. In a BST that allows duplicates, it's always left <= parent <= right. The code that picks the place to insert a new node will just pick one side or the other, but that is not a rule about how nodes must be linked and it is not an invariant that will be maintained.
That's because, for trees with duplicate values, the condition that the left or right branches contain only strictly larger elements is not compatible with balancing operations. Try this: link 20 copies of the same value into a tree. If you can only link equal values on the left or the right, then you have to make a singly-linked list. Your tree will be 20 levels deep and completely unbalanced.
The way to think about duplicate values in the tree is that there really aren't any duplicate values in the tree :-) A BST defines a total ordering, and valid rebalancing operations like rotations preserve this total ordering.
When you insert a duplicate into the tree, you'll put it either to the left or right of the existing match. If you put it on the left, then it will be smaller according to the ordering of the tree, and it will always be smaller after any rebalancing. If you put it on the right, then it will be larger, and will stay larger according to the tree's ordering.
If you think about it, it has to be this way because the balancing operations and in-order traversals don't even look at the values in the nodes. The way they're linked together determines the ordering, the ordering doesn't change as the tree is rebalanced, and the ordering is total -- every node is either before or after every other node in an inorder traversal.
Because you need to maintain the O(log n) complexity for search. Consider you are searching for a Node, then you will have to check it in both the Left and Right subtree to check for its existence. However, the correct condition enforces the constraint that the Node will exist in only one of the subtrees.
Consider a scenario where the BST Node contains an Integer and a String, and the key for building the BST is the Integer.
If you need all the Strings for an integer a, you will need to check for both the subtrees, which will lead to a worse time-complexity of O(n) rather than the O(log n) if you implement it according to the correct condition.
If left <= parent <= right, then in case of equality, where would you go: left or right? You need to be deterministic, not choose randomly. So, say you decide to always go left; then there you have it: left <= parent < right.
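The deterministic tie-breaking that the answers describe is a one-line decision in the insert routine. A sketch (all names mine) where equal keys always go left, giving left <= parent < right at insertion time:

```python
# Illustrative BST insert that resolves equal keys deterministically.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return Node(key)
    if key <= root.key:          # ties go left
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def inorder(root):
    """In-order traversal; duplicates come out adjacent and sorted."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)
```

Inserting 5, 3, 5, 7 in that order and traversing in-order gives [3, 5, 5, 7]: the duplicate 5 sits to the left of the original, and any order-preserving rotations would keep it there.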

RB tree with sum

I have some questions about augmenting data structures:
Let S = {k1, ..., kn} be a set of numbers. Design an efficient
data structure for S that supports the following two operations:
Insert(S, k), which inserts the number k into S (you may assume that
k is not yet contained in S), and TotalGreater(S, a), which returns
the sum of all keys ki ∈ S that are larger than a, that is,
Σ ki over all ki ∈ S with ki > a.
Argue the running time of both operations and give pseudo-code for TotalGreater(S, a) (you need not give pseudo-code for Insert(S, k)).
I don't understand how to do this, I was thinking of adding an extra field to the RB-tree called sum, but then it doesn't work because sometimes I need only the sum of the left nodes and sometimes I need the sum of the right nodes too.
So I was thinking of adding 2 fields called leftSum and rightSum and if the current node is > GivenValue then add the cached value of the sum of the sub nodes to the current sum value.
Can someone please help me with this?
You can add a field sum to each node, holding the sum of the keys in the subtree rooted at that node (this augmentation survives red-black rotations with only O(1) extra work per node touched). To answer TotalGreater(S, a), walk down from the root toward a. Two things can happen at each step: you go left or right. Every time you go left (because the current key is larger than a), add the current key plus the sum of its right subtree to the running total; every time you go right, add nothing.
There are two conditions for termination. 1) You find a node containing exactly the value a, in which case you add the sum of its right subtree to the total and stop. 2) You fall off the bottom of the tree, in which case the accumulated total is the answer.
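The same walk can be sketched concretely. For brevity this uses a plain unbalanced BST with a subtree-sum field; in the real solution the same fields would live on a red-black tree so the path length is O(log n). All names here are illustrative.

```python
# BST augmented with subtree sums, supporting TotalGreater(S, a).

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.subtree_sum = key   # sum of all keys in this subtree

def insert(root, key):
    if root is None:
        return Node(key)
    root.subtree_sum += key      # maintain the augmentation on the way down
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def total_greater(root, a):
    """Sum of all keys strictly greater than a."""
    total = 0
    node = root
    while node is not None:
        if node.key > a:
            # This node and its entire right subtree exceed a.
            total += node.key
            if node.right is not None:
                total += node.right.subtree_sum
            node = node.left
        else:
            node = node.right
    return total
```

Inserting 1, 5, 3, 8 and querying total_greater(root, 4) returns 13 (= 5 + 8): the walk visits only one root-to-leaf path, so on a balanced tree both operations run in O(log n).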
As Jordi describes, the keyword here is: augmented red-black tree.

Design a data structure

I am trying to design a data structure that stores elements according to some prescribed ordering, each element with its own value, and that supports each of the following
four operations in logarithmic time (amortized or worst-case, your choice):
add a new element of value v in the kth position
delete the kth element
returns the sum of the values of elements i through j
increase by x the values of elements i through j
Any idea will be appreciated.
Thanks
I suspect you could do it with a red-black tree. Over the classic red-black tree, each node would need the following additional fields:
size
sum
increment
The size field would track the number of nodes in each node's subtree, allowing log(n)-time insertion and deletion at a given position.
The sum field would track the sum of the values in each node's subtree, allowing log(n)-time range sums.
The increment field is the trickiest one: it would track a pending increment to every node in its subtree, to be added on when calculating sums. So, when calculating the final sum for a subtree, we would return sum + size*increment. By adding positive and negative increments at the appropriate nodes, it should be possible to adjust the returned sums correctly in all cases while altering only log(n) nodes.
Needless to say, implementation would be very tricky. Sum and increment fields would have to be updated after each insertion and deletion, and each would have at least five cases to deal with.
Update: I'm not going to try to solve this completely, but I would note that incrementing elements i through j by n is equivalent to incrementing the whole tree by n, then decrementing elements 0 through i by n and decrementing elements j through the end by n. A global increment can be done in constant time, and the other two operations are a 'left-side decrement' and a 'right-side decrement', which are symmetrical. A left-side decrement up to i would go something like: take the count of the left subtree of the root node. If the count is less than i, decrement the increment field on the left child of the root by n, then apply a left decrement of n to the right subtree of the root, up to i - count(left subtree) elements. Alternatively, if the count is greater than i, decrement the increment field of the left-left grandchild of the root by n, then apply a left decrement of n to the left-right subtree of the root, up to count(left-left subtree) elements. As the tree is balanced, the left-decrement operation need only be applied recursively log(n) times. The right decrement would be similar, but reversed.
What you're asking for isn't feasible.
Requirement #3 might be possible, but #4 just can't be done in logarithmic time: in the worst case you have to edit every node. Imagine i is 0 and j is n-1 — you'd have to touch all n nodes, and even with constant-time access that's linear time.
Edit:
Upon further consideration, if you kept track of "mass increases" you could potentially control access to a node, decorating it on the way out with whatever mass increase it required. I still think it would be entirely unwieldy, but I suppose it's possible.
Requirements 1, 2 and 3 can be satisfied by a Binary Indexed Tree (BIT, also known as a Fenwick tree):
http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees
I am thinking of a way to modify the BIT to handle #4 in logarithmic time as well.
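For requirements #3 and #4 over fixed positions (i.e. without the insert/delete-at-position operations), a standard trick is to keep two BITs, which together give range update and range sum in O(log n). A sketch with illustrative names, 1-indexed as is conventional for Fenwick trees:

```python
# Two Fenwick trees supporting range add and range sum in O(log n).

class RangeBIT:
    def __init__(self, n):
        self.n = n
        self.b1 = [0] * (n + 1)
        self.b2 = [0] * (n + 1)

    def _add(self, b, i, x):
        while i <= self.n:
            b[i] += x
            i += i & (-i)

    def _query(self, b, i):
        s = 0
        while i > 0:
            s += b[i]
            i -= i & (-i)
        return s

    def range_add(self, l, r, x):
        """Add x to every element in positions l..r (inclusive)."""
        self._add(self.b1, l, x)
        self._add(self.b1, r + 1, -x)
        self._add(self.b2, l, x * (l - 1))
        self._add(self.b2, r + 1, -x * r)

    def prefix_sum(self, i):
        """Sum of elements 1..i."""
        return self._query(self.b1, i) * i - self._query(self.b2, i)

    def range_sum(self, l, r):
        return self.prefix_sum(r) - self.prefix_sum(l - 1)
```

For example, on five zero-initialized positions, `range_add(2, 4, 3)` followed by `range_sum(1, 5)` returns 9. This does not help with requirements #1 and #2, since a BIT's positions are fixed; those still call for a balanced-tree solution like the augmented red-black tree above.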
