What do the different heap operations actually mean?

There are various heap operations, and several different names are used for the same operations.
I am overwhelmed by the names and aliases.
So please clarify: what are the differences, similarities, and relationships among the following heap operations?
(1) Heapify
(2) Insert
(3) Delete
(4) Shift-up
(5) Shift-down
For example, some resources talk about implementing Heapsort using shift-down, while others implement the same algorithm using Heapify. Some even implement it using Delete.

1) Heapify restores the heap condition. For example, if you change a node in the tree, the condition is no longer valid. You can restore the condition by moving nodes up or down the tree.
2) Insert a node into the tree
3) Delete a node from the tree
4) Move a node up in the tree, as far as needed (depending on the heap condition: min-heap or max-heap)
5) Move a node down in the tree, similar to 4)
It's probably best if you try to implement or understand real code and don't worry about the naming.

Take a peek over at Wikipedia and you can get all sorts of information on heaps:
http://en.wikipedia.org/wiki/Heap_%28data_structure%29

To add a note to duedl0r's answer: shift-up and shift-down are what is used to heapify the current structure. For example, in the case of a min-heap, when you insert an element that is less than some nodes in the tree, the data structure no longer satisfies the heap condition (in a min-heap, the value of a parent should be less than its children's), so you have to shift the new element up.
So in terms of code :
public void insert(int value) {
    if (heapSize == data.length)
        throw new HeapException("Heap's underlying storage has overflowed");
    else {
        heapSize++;
        data[heapSize - 1] = value;
        siftUp(heapSize - 1);
    }
}

private void siftUp(int nodeIndex) {
    int parentIndex, tmp;
    if (nodeIndex != 0) {
        parentIndex = getParentIndex(nodeIndex);
        /* if the parent's data is greater than the child's, swap */
        if (data[parentIndex] > data[nodeIndex]) {
            tmp = data[parentIndex];
            data[parentIndex] = data[nodeIndex];
            data[nodeIndex] = tmp;
            siftUp(parentIndex);
        }
    }
}
Here data is the array that represents the heap, and heapSize is the number of elements currently stored; it is also the index where the next new element will be placed.
Similarly, in the case of delete, you have to use shift-down to restructure your heap.
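For completeness, here is a sketch of the matching sift-down for the same class (removeMin and buildHeap are my illustrative additions, not part of the original answer):

private void siftDown(int nodeIndex) {
    int leftIndex = 2 * nodeIndex + 1;   // children of i live at 2i+1 and 2i+2
    int rightIndex = 2 * nodeIndex + 2;
    int smallest = nodeIndex;
    if (leftIndex < heapSize && data[leftIndex] < data[smallest])
        smallest = leftIndex;
    if (rightIndex < heapSize && data[rightIndex] < data[smallest])
        smallest = rightIndex;
    if (smallest != nodeIndex) {
        int tmp = data[nodeIndex];       // swap with the smaller child
        data[nodeIndex] = data[smallest];
        data[smallest] = tmp;
        siftDown(smallest);              // keep sinking the moved value
    }
}

public int removeMin() {
    if (heapSize == 0)
        throw new HeapException("Heap is empty");
    int min = data[0];
    data[0] = data[heapSize - 1];        // the last element replaces the root
    heapSize--;
    siftDown(0);                         // restore the heap condition
    return min;
}

// "Heapify" an entire array in O(n): sift down every internal node.
private void buildHeap() {
    for (int i = heapSize / 2 - 1; i >= 0; i--)
        siftDown(i);
}

This also shows how the names relate: heapify over a whole array is just sift-down applied to every internal node, and delete-min is sift-down applied at the root.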

By splitting the heapify logic into shiftUp and shiftDown, we can reduce the number of comparisons while inserting elements.
insert -> shift up -> only one comparison per level (with its parent)
remove -> shift down -> two comparisons per level (with its left and right children)
https://discuss.codecademy.com/t/what-are-some-differences-between-heapify-up-and-heapify-down/375384


Check if a tree is a mirror image?

Given a binary tree which is huge and cannot be placed in memory, how do you check if the tree is a mirror image?
I got this as an interview question.
If a tree is a mirror image of another tree, the inorder traversal of one tree will be the reverse of the other's.
So just do an inorder traversal on the first tree and a reverse inorder traversal on the other, and check that all the elements are the same.
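For what it's worth, here is a sketch of that simultaneous traversal in Java, streaming an inorder walk of one tree against a reverse inorder walk of the other with two explicit stacks (TreeNode is a hypothetical node class; note this compares value sequences only, so with duplicate values you would also want to verify that the shapes mirror):

import java.util.ArrayDeque;
import java.util.Deque;

class MirrorCheck {
    static class TreeNode {
        int val;
        TreeNode left, right;
    }

    // Inorder of `a` vs. reverse inorder of `b`, one node at a time.
    static boolean valuesMirror(TreeNode a, TreeNode b) {
        Deque<TreeNode> sa = new ArrayDeque<>(), sb = new ArrayDeque<>();
        TreeNode ca = a, cb = b;
        while (true) {
            while (ca != null) { sa.push(ca); ca = ca.left; }   // descend left in a
            while (cb != null) { sb.push(cb); cb = cb.right; }  // descend right in b
            if (sa.isEmpty() || sb.isEmpty())
                return sa.isEmpty() && sb.isEmpty();            // both exhausted together?
            TreeNode na = sa.pop(), nb = sb.pop();
            if (na.val != nb.val) return false;
            ca = na.right;   // continue inorder in a
            cb = nb.left;    // continue reverse inorder in b
        }
    }
}

Each stack holds at most one root-to-leaf path at a time.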
I can't take full credit for this reply of course; a handful of my colleagues helped with some assumptions and for poking holes in my original idea. Much thanks to them!
Assumptions
We can't have the entire tree in memory, so it's not ideal to use recursion. Let's assume, for simplicity's sake, that we can only hold a maximum of two nodes in memory.
We know n, the total number of levels in our tree.
We can perform seeks on the data with respect to the character or line position it's in.
The data that is on disk is ordered by depth. That is to say, the first entry on disk is the root, and the next two are its children, and the next four are its children's children, and so forth.
There are cases in which the data is perfectly mirrored, and cases in which it isn't. Blank data interlaced with non-blank data is considered "acceptable", unless otherwise specified.
We have freedom over using any data type we wish so long as the values can be compared for equivalence. Testing for object equivalence may not be ideal, so let's assume we're comparing primitives.
"Mirrored" means mirrored between the root's children. To use different terminologies, the grandparent's left child is mirrored with its right child, and the left child (parent)'s left child is mirrored with the grandparent's right child's right child. This is illustrated in the graph below; the matching symbols represent the mirroring we want to check for.
          G
     P*         P*
  C1&   C2^  C3^   C4&
Approach
We know how many nodes to expect on each level when we're reading from disk: 2^k nodes on level k. We can establish a double loop to iterate over the total depth of the tree and the count of the nodes in each level. Inside of this, we can simply compare the outermost values for equivalence, and short-circuit if we find an unequal value.
We can determine the location of each outer node using powers of 2. The leftmost child of any level k will always be at position 2^k, and the rightmost child will always be at position 2^(k+1)-1.
Small Proof: Outermost nodes on level 1 are 2 and 3; 2^1 = 2, and 2^(1+1)-1 = 2^2-1 = 3. Outermost nodes on level 2 are 4 and 7; 2^2 = 4, and 2^(2+1)-1 = 2^3-1 = 7. One could expand this all the way to the nth case.
Pseudocode
int k, i;
for (k = 1; k < n; k++) { // Skip root, trivially mirrored
    for (i = 0; i < pow(2, k) / 2; i++) {
        if (node_retrieve(pow(2, k) + i) != node_retrieve(pow(2, k+1) - 1 - i)) {
            return false;
        }
    }
}
return true;
Thoughts
This sort of question is a great interview question because, more than likely, they want to see how you would approach this problem. This approach may be horrible, it may be immaculate, but an employer would want you to take your time, draw things on a piece of paper or whiteboard, and ask them questions about how the data is stored, how it can be read, what limitations there are on seeks, etc etc.
It's not the coding aspect that interviewers are interested in, but the problem solving aspect.
Recursion is easy.
struct node {
    struct node *left;
    struct node *right;
    int payload;
};

int is_not_mirror(struct node *one, struct node *two)
{
    if (!one && !two) return 0;
    if (!one) return 1;
    if (!two) return 1;
    if (compare(one->payload, two->payload)) return 1;
    if (is_not_mirror(one->left, two->right)) return 1;
    if (is_not_mirror(one->right, two->left)) return 1;
    return 0;
}

Generalizing the find-min/find-max stack to arbitrary order statistics?

In this earlier question, the OP asked for a data structure similar to a stack supporting the following operations in O(1) time each:
Push, which adds a new element atop the stack,
Pop, which removes the top element from the stack,
Find-Max, which returns (but does not remove) the largest element of the stack, and
Find-Min, which returns (but does not remove) the smallest element of the stack.
A few minutes ago I found this related question asking for a clarification on a similar data structure that, instead of allowing the max and min to be queried, allows the median element of the stack to be queried. These two data structures seem to be special cases of a more general data structure supporting the following operations:
Push, which pushes an element atop the stack,
Pop, which pops the top of the stack, and
Find-Kth, which for a fixed k determined when the structure is created, returns the kth largest element of the stack.
It is possible to support all of these operations by storing a stack and a balanced binary search tree holding the top k elements, which would enable all these operations to run in O(log k) time. My question is this: is it possible to implement the above data structure faster than this? That is, could we get O(1) for all three operations? Or perhaps O(1) for push and pop and O(log k) for the order statistic lookup?
Since the structure can be used to sort k elements with O(k) push and find-kth operations, every comparison-based implementation has at least one of these operations costing Ω(log k), even in an amortized sense, with randomization.
Push can be O(log k) and pop/find-kth can be O(1) (use persistent data structures; push should precompute the order statistic). My gut feeling based on working with lower bounds for comparison-based algorithms is that O(1) push/pop and O(log k) find-kth is doable but requires amortization.
I think what tophat was saying is: implement a purely functional data structure that supports only O(log k) insert and O(1) find-kth (cached by insert), and then make a stack of these structures. Push inserts into the top version and pushes the update, pop pops the top version, and find-kth operates on the top version. This is O(log k)/O(1)/O(1), but super-linear space.
EDIT: I was working on O(1) push/O(1) pop/O(log k) find-kth, and I think it can't be done. The sorting algorithm that tophat referred to can be adapted to get √k evenly spaced order statistics of a length-k array in time O(k + (√k) log k). Problem is, the algorithm must know how each order statistic compares with all other elements (otherwise it might be wrong), which means that it has bucketed everything into one of √k + 1 buckets, which takes Ω(k log (√k + 1)) = Ω(k log k) comparisons on information theoretic grounds. Oops.
Replacing √k by k^eps for any eps > 0, with O(1) push/O(1) pop, I don't think find-kth can be O(k^(1-eps)), even with randomization and amortization.
Whether this is actually faster than your O(log k) implementation depends on which operations are used most frequently. I propose an implementation with O(1) Find-Kth and Pop and O(n) Push, where n is the stack size. And I also want to share this with SO, because at first sight it is just a hilarious data structure, but it might even be reasonable.
It's best described as a doubly doubly linked stack, or perhaps more easily as a hybrid of a linked stack and a doubly linked sorted list. Basically each node maintains 4 references to other nodes: the next and previous in stack order, and the next and previous in sorted order on the element size. These two linked lists can be implemented using the same nodes, but they work completely separately, i.e. the sorted linked list doesn't have to know about the stack order and vice versa.
Like a normal linked stack, the collection itself will need to maintain a reference to the top node (and to the bottom?). To accommodate the O(1) nature of the Find-Kth method, the collection will also keep a reference to the kth largest element.
The pop method works as follows:
The popped node gets removed from the sorted doubly linked list, just like a removal from a normal sorted linked list. It takes O(1) as the collection has a reference to the top. Depending on whether the popped element was larger or smaller than the kth element, the reference to the kth largest element is set to either the previous or the next. So the method still has O(1) complexity.
The push method works just like a normal addition to a sorted linked list, which is an O(n) operation. It starts with the smallest element and inserts the new node when a larger element is encountered. To maintain the correct reference to the kth largest element, again either the previous or the next element to the current kth largest element is selected, depending on whether the pushed node was larger or smaller than the kth largest element.
Of course, besides this, the reference to the 'top' of the stack has to be set in both methods. Also there's the problem of k > n, for which you haven't specified what the data structure should do. I hope it's clear how it works; otherwise I could add an example.
But OK, not entirely the complexity you had hoped for, but I find this an interesting 'solution'.
Edit: An implementation of the described structure
A bounty was issued on this question, which indicates my original answer wasn't good enough :P Perhaps the OP would like to see an implementation?
I have implemented both the median problem and the fixed-k problem, in C#. The implementation of the tracker of the median is just a wrapper around the tracker of the kth element, where k can mutate.
To recap the complexities:
Push takes O(n)
Pop takes O(1)
FindKth takes O(1)
Change k takes O(delta k)
I have already described the algorithm in reasonable detail in my original post. The implementation is then fairly straightforward (but not so trivial to get right, as there are a lot of inequality signs and if statements to consider). I have commented only to indicate what is done, but not the details of how, as it would otherwise become too large. The code is already quite lengthy for an SO post.
I do want to provide the contracts of all non-trivial public members:
K is the index of the element in the sorted linked list to keep a reference to. It is mutable, and when it is set, the structure is immediately corrected for that.
KthValue is the value at that index, unless the structure doesn’t have k elements yet, in which case it returns a default value.
HasKthValue exists to easily distinguish these default values from elements which happened to be the default value of its type.
Constructors: a null enumerable is interpreted as an empty enumerable, and a null comparer is interpreted as the default. This comparer defines the order used when determining the kth value.
So this is the code:
public sealed class KthTrackingStack<T>
{
    private readonly Stack<Node> stack;
    private readonly IComparer<T> comparer;
    private int k;
    private Node smallestNode;
    private Node kthNode;

    public int K
    {
        get { return this.k; }
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException();
            for (; k < value; k++)
            {
                if (kthNode.NextInOrder == null)
                    return;
                kthNode = kthNode.NextInOrder;
            }
            for (; k > value; k--)
            {
                if (kthNode.PreviousInOrder == null)
                    return;
                kthNode = kthNode.PreviousInOrder;
            }
        }
    }

    public T KthValue
    {
        get { return HasKthValue ? kthNode.Value : default(T); }
    }

    public bool HasKthValue
    {
        get { return k < Count; }
    }

    public int Count
    {
        get { return this.stack.Count; }
    }

    public KthTrackingStack(int k, IEnumerable<T> initialElements = null, IComparer<T> comparer = null)
    {
        if (k < 0) throw new ArgumentOutOfRangeException("k");
        this.k = k;
        this.comparer = comparer ?? Comparer<T>.Default;
        this.stack = new Stack<Node>();
        if (initialElements != null)
            foreach (T initialElement in initialElements)
                this.Push(initialElement);
    }

    public void Push(T value)
    {
        // just like in a normal sorted linked list, find the node before the inserted node
        Node nodeBeforeNewNode;
        if (smallestNode == null || comparer.Compare(value, smallestNode.Value) < 0)
            nodeBeforeNewNode = null;
        else
        {
            nodeBeforeNewNode = smallestNode; // untested optimization: nodeBeforeNewNode = comparer.Compare(value, kthNode.Value) < 0 ? smallestNode : kthNode;
            while (nodeBeforeNewNode.NextInOrder != null && comparer.Compare(value, nodeBeforeNewNode.NextInOrder.Value) > 0)
                nodeBeforeNewNode = nodeBeforeNewNode.NextInOrder;
        }
        // the following code includes the new node in the ordered linked list
        Node newNode = new Node
        {
            Value = value,
            PreviousInOrder = nodeBeforeNewNode,
            NextInOrder = nodeBeforeNewNode == null ? smallestNode : nodeBeforeNewNode.NextInOrder
        };
        if (newNode.NextInOrder != null)
            newNode.NextInOrder.PreviousInOrder = newNode;
        if (newNode.PreviousInOrder != null)
            newNode.PreviousInOrder.NextInOrder = newNode;
        else
            smallestNode = newNode;
        // the following code deals with changes to the kth node due to adding the new node
        if (kthNode != null && comparer.Compare(value, kthNode.Value) < 0)
        {
            if (HasKthValue)
                kthNode = kthNode.PreviousInOrder;
        }
        else if (!HasKthValue)
        {
            kthNode = newNode;
        }
        stack.Push(newNode);
    }

    public T Pop()
    {
        Node result = stack.Pop();
        // the following code deals with changes to the kth node
        if (HasKthValue)
        {
            if (comparer.Compare(result.Value, kthNode.Value) <= 0)
                kthNode = kthNode.NextInOrder;
        }
        else if (kthNode.PreviousInOrder != null || Count == 0)
        {
            kthNode = kthNode.PreviousInOrder;
        }
        // the following code maintains the order in the linked list
        if (result.NextInOrder != null)
            result.NextInOrder.PreviousInOrder = result.PreviousInOrder;
        if (result.PreviousInOrder != null)
            result.PreviousInOrder.NextInOrder = result.NextInOrder;
        else
            smallestNode = result.NextInOrder;
        return result.Value;
    }

    public T Peek()
    {
        return this.stack.Peek().Value;
    }

    private sealed class Node
    {
        public T Value { get; set; }
        public Node NextInOrder { get; internal set; }
        public Node PreviousInOrder { get; internal set; }
    }
}
public class MedianTrackingStack<T>
{
    private readonly KthTrackingStack<T> stack;

    public void Push(T value)
    {
        stack.Push(value);
        stack.K = stack.Count / 2;
    }

    public T Pop()
    {
        T result = stack.Pop();
        stack.K = stack.Count / 2;
        return result;
    }

    public T Median
    {
        get { return stack.KthValue; }
    }

    public MedianTrackingStack(IEnumerable<T> initialElements = null, IComparer<T> comparer = null)
    {
        stack = new KthTrackingStack<T>(initialElements == null ? 0 : initialElements.Count() / 2, initialElements, comparer);
    }
}
Of course, you're always free to ask any questions about this code, as I realize some things may not be obvious from the description and the sporadic comments.
The only actual working implementation I can wrap my head around is Push/Pop O(log k) and Kth O(1).
Stack (singly linked)
Min Heap (size k)
Stack2 (doubly linked)
The value nodes will be shared between the Stack, Heap and Stack2
PUSH:
    Push to the stack
    If value >= heap root
        If heap size < k
            Insert value in heap
        Else
            Remove heap root
            Push removed heap root to stack2
            Insert value in heap

POP:
    Pop from the stack
    If popped node has stack2 references
        Remove from stack2 (doubly linked list remove)
    If popped node has heap references
        Remove from the heap (swap with last element, perform heap-up-down)
        Pop from stack2
        If element popped from stack2 is not null
            Insert element popped from stack2 into heap

KTH:
    If heap is size k
        Return heap root value
You could use a skip list. (I first thought of a linked list, but insertion is O(n), and amit corrected me with skip list. I think this data structure could be pretty interesting in your case.)
With this data structure, inserting/deleting takes O(ln(k))
and finding the maximum takes O(1)
I would use:
a stack, containing your elements
a stack containing the history of the skip list (containing the k smallest elements)
(I realised it was the kth largest element... but it's pretty much the same problem)
when pushing (O(ln(k))):
if the element is less than the kth element, delete the kth element (O(ln(k))), put it in the LIFO pile (O(1)), then insert the element in the skip list (O(ln(k)))
otherwise it's not in the skip list; just put it on the pile (O(1))
When pushing you add a new skip list to the history; since this is similar to a copy on write, it wouldn't take more than O(ln(k))
when popping (O(1)):
you just pop from both stacks
getting the kth element (O(1)):
always take the maximum element in the list (O(1))
All the ln(k) are amortised costs.
Example:
I will take the same example as yours (from "Stack with find-min/find-max more efficient than O(n)"):
Suppose that we have a stack and add the values 2, 7, 1, 8, 3, and 9, in that order, with k = 3.
I will represent it this way :
[number in the stack] [ skip list linked with that number]
first I push 2, 7 and 1 (it doesn't make sense to look for the kth element in a list of fewer than k elements)
1 [7,2,1]
7 [7,2,null]
2 [2,null,null]
If I want the kth element I just need to take the max in the linked list: 7
now I push 8,3, 9
on the top of the stack I have :
8 [7,2,1] since 8 > kth element therefore skip list doesn't change
then :
3 [3,2,1] since 3 < kth element, the kth element has changed. I first delete 7, which was the previous kth element (O(ln(k))), then insert 3 (O(ln(k))) => total O(ln(k))
then :
9 [3,2,1] since 9 > kth element
Here is the stack I get :
9 [3,2,1]
3 [3,2,1]
8 [7,2,1]
1 [7,2,1]
7 [7,2,null]
2 [2,null,null]
find the kth element:
I get 3 in O(1)
now I can pop 9 and 3 (takes O(1)):
8 [7,2,1]
1 [7,2,1]
7 [7,2,null]
2 [2,null,null]
find the kth element:
I get 7 in O(1)
and push 0 (takes O(ln(k)): an insertion)
0 [2,1,0]
8 [7,2,1]
1 [7,2,1]
7 [7,2,null]
2 [2,null,null]
@tophat is right - since this structure could be used to implement a sort, it can't have less complexity than an equivalent sort algorithm. So how do you sort in less than O(N lg N)? Use radix sort.
Here is an implementation which makes use of a binary trie. Inserting items into a binary trie is essentially the same operation as performing a radix sort. The cost for inserting and deleting is O(m), where m is a constant: the number of bits in the key. Finding the next largest or smallest key is also O(m), accomplished by taking the next step in an in-order depth-first traversal.
So the general idea is to use the values pushed onto the stack as keys in the trie. The data to store is the occurrence count of that item in the stack. For each pushed item: if it exists in the trie, increment its count, else store it with a count of 1. When you pop an item, find it, decrement the count, and remove it if the count is now 0. Both of those operations are O(m).
To get O(1) FindKth, keep track of 2 values: The value of the Kth item, and how many instances of that value are in the first K item. (for example, for K=4 and a stack of [1,2,3,2,0,2], the Kth value is 2 and the "iCount" is 2.) Then when you push values < the KthValue, you simply decrement the instance count, and if it is 0, do a FindPrev on the trie to get the next smaller value.
When you pop values greater than the KthValue, increment the instance count if more instances of that value exist; else do a FindNext to get the next larger value.
(The rules are different if there are less than K items. In that case, you can simply track the max inserted value. When there are K items, the max will be the Kth.)
Here is a C implementation. It relies on a BinaryTrie (built using the example at PineWiki as a base) with this interface:
BTrie* BTrieInsert(BTrie* t, Item key, int data);
BTrie* BTrieFind(BTrie* t, Item key);
BTrie* BTrieDelete(BTrie* t, Item key);
BTrie* BTrieNextKey(BTrie* t, Item key);
BTrie* BTriePrevKey(BTrie* t, Item key);
Here is the Push function.
void KSStackPush(KStack* ks, Item val)
{
    BTrie* node;
    // resize if needed
    if (ks->ct == ks->sz) ks->stack = realloc(ks->stack, sizeof(Item) * (ks->sz *= 2));
    // push val
    ks->stack[ks->ct++] = val;
    // record count of value instances in trie
    node = BTrieFind(ks->trie, val);
    if (node) node->data++;
    else ks->trie = BTrieInsert(ks->trie, val, 1);
    // adjust kth if needed
    ksCheckDecreaseKth(ks, val);
}
Here is the helper to track the KthValue
// check if inserted val is in set of K
void ksCheckDecreaseKth(KStack* ks, Item val)
{
    // if fewer than K items, track the max.
    if (ks->ct <= ks->K) {
        if (ks->ct == 1) { ks->kthValue = val; ks->iCount = 1; } // 1st item
        else if (val == ks->kthValue) { ks->iCount++; }
        else if (val > ks->kthValue) { ks->kthValue = val; ks->iCount = 1; }
    }
    // else if value is one of the K, decrement instance count
    else if (val < ks->kthValue && (--ks->iCount <= 0)) {
        // if that was the only instance in the set,
        // find the previous value, include all its instances
        BTrie* node = BTriePrevKey(ks->trie, ks->kthValue);
        ks->kthValue = node->key;
        ks->iCount = node->data;
    }
}
Here is the Pop function
Item KSStackPop(KStack* ks)
{
    // pop val
    Item val = ks->stack[--ks->ct];
    // find in trie
    BTrie* node = BTrieFind(ks->trie, val);
    // decrement count, remove if no more instances
    if (--node->data == 0)
        ks->trie = BTrieDelete(ks->trie, val);
    // adjust kth if needed
    ksCheckIncreaseKth(ks, val);
    return val;
}
And the helper to increase the KthValue
// check if removing val causes Kth to increase
void ksCheckIncreaseKth(KStack* ks, Item val)
{
    // if fewer than K items, track max
    if (ks->ct < ks->K)
    {   // if removing the max,
        if (val == ks->kthValue) {
            // find the previous node, and set the instance count.
            BTrie* node = BTriePrevKey(ks->trie, ks->kthValue);
            ks->kthValue = node->key;
            ks->iCount = node->data;
        }
    }
    // if removed val was among the set of K, add a new item
    else if (val <= ks->kthValue)
    {
        BTrie* node = BTrieFind(ks->trie, ks->kthValue);
        // if more instances of kthValue exist, add 1 to set.
        if (node && ks->iCount < node->data) ks->iCount++;
        // else include 1 instance of next value
        else {
            BTrie* next = BTrieNextKey(ks->trie, ks->kthValue);
            ks->kthValue = next->key;
            ks->iCount = 1;
        }
    }
}
So this algorithm is O(1) for all 3 operations. It can also support the Median operation: start with KthValue = the first value, and whenever the stack size changes by 2, do an IncreaseKth or DecreaseKth operation. The downside is that the constant is large. It is only a win when m < lg K. However, for small keys and large K, this may be a good choice.
What if you paired the stack with a pair of Fibonacci Heaps? That could give amortized O(1) Push and FindKth, and O(lgN) delete.
The stack stores [value, heapPointer] pairs. The heaps store stack pointers.
Create one MaxHeap, one MinHeap.
On Push:
if MaxHeap has less than K items, insert the stack top into the MaxHeap;
else if the new value is less than the top of the MaxHeap, first insert the result of DeleteMax in the MinHeap, then insert the new item into MaxHeap;
else insert it into the MinHeap. O(1) (or O(lgK) if DeleteMax is needed)
On FindKth, return the top of the MaxHeap. O(1)
On Pop, also do a Delete(node) from the popped item's heap.
If it was in the MinHeap, you are done. O(lgN)
If it was in the MaxHeap, also perform a DeleteMin from the MinHeap and Insert the result in the MaxHeap. O(lgK)+O(lgN)+O(1)
Update:
I realized I wrote it up as K'th smallest, not K'th largest.
I also forgot a step when a new value is less than the current K'th smallest, and that step pushes the worst-case insert back to O(lg K). This may still be OK for uniformly distributed input and small K, as it will only hit that case on K/N insertions.
*moved New Idea to different answer - it got too large.
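To make the shape of this concrete, here is a sketch in Java (tracking the K'th smallest, per the update above). java.util.PriorityQueue is a binary heap, not a Fibonacci heap, and its remove(Object) is O(n), so the bounds here are illustrative rather than the amortized ones claimed:

import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.PriorityQueue;

class TwoHeapKthStack {
    private final int k;
    private final Deque<Integer> stack = new ArrayDeque<>();
    // maxHeap holds the k smallest values present; its top is the k'th smallest.
    private final PriorityQueue<Integer> maxHeap =
            new PriorityQueue<>(Comparator.reverseOrder());
    private final PriorityQueue<Integer> minHeap = new PriorityQueue<>();

    TwoHeapKthStack(int k) { this.k = k; }

    void push(int value) {
        stack.push(value);
        if (maxHeap.size() < k) {
            maxHeap.offer(value);
        } else if (value < maxHeap.peek()) {
            minHeap.offer(maxHeap.poll());   // evict the current k'th smallest
            maxHeap.offer(value);
        } else {
            minHeap.offer(value);
        }
    }

    int pop() {
        int value = stack.pop();
        if (maxHeap.remove(value)) {         // O(n) here; a heap with node
            if (!minHeap.isEmpty())          // handles would make this O(lg n)
                maxHeap.offer(minHeap.poll());
        } else {
            minHeap.remove(value);
        }
        return value;
    }

    Integer findKth() {                      // k'th smallest, or null if size < k
        return maxHeap.size() == k ? maxHeap.peek() : null;
    }
}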
Use a Trie to store your values. Tries already have an O(1) insert complexity. You only need to worry about two things, popping and searching, but if you tweak your program a little, it would be easy.
When inserting (pushing), have a counter for each path that stores the number of elements inserted there. This will allow each node to keep track of how many elements have been inserted using that path, i.e. the number represents the number of elements that are stored beneath that path. That way, when you try to look for the kth element, it would be a simple comparison at each path.
For popping, you can have a static object that has a link to the last stored object. That object can be accessed from the root object, hence O(1). Of course, you would need to add functions to retrieve the last object inserted, which means the newly pushed node must have a pointer to the previously pushed element (implemented in the push procedure; very simple, also O(1)). You also need to decrement the counter, which means each node must have a pointer to the parent node (also simple).
For finding the kth element (this is for the smallest kth element, but finding the largest is very similar): when you enter each node, you pass in k and the minimum index for the branch (for the root it would be 0). Then you do a simple comparison for each path: if k is between the minimum index and the minimum index + pathCounter, you enter that path, passing in k and the new minimum index as (minimum index + sum of all previous pathCounters, excluding the one you took). I think this is O(1), since increasing the amount of data within a certain range doesn't increase the difficulty of finding k.
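Here is a sketch of that counting idea in Java. It is my own minimal variant rather than the exact structure described above: a binary trie over fixed-width non-negative int keys, where each node counts the values stored beneath it, so selecting the kth smallest compares k against the left child's count at every bit:

class CountingTrie {
    private static final int BITS = 30;       // assumed key width (non-negative ints)

    private static final class Node {
        Node left, right;                     // bit 0 / bit 1
        int count;                            // values stored beneath this node
    }

    private final Node root = new Node();

    void insert(int key) {                    // O(BITS) per insert
        Node n = root;
        n.count++;
        for (int b = BITS - 1; b >= 0; b--) {
            if (((key >> b) & 1) != 0) {
                if (n.right == null) n.right = new Node();
                n = n.right;
            } else {
                if (n.left == null) n.left = new Node();
                n = n.left;
            }
            n.count++;
        }
    }

    void remove(int key) {                    // assumes key is present
        Node n = root;
        n.count--;
        for (int b = BITS - 1; b >= 0; b--) {
            n = (((key >> b) & 1) != 0) ? n.right : n.left;
            n.count--;
        }
    }

    int selectKthSmallest(int k) {            // 0-based; assumes k < root.count
        Node n = root;
        int value = 0;
        for (int b = BITS - 1; b >= 0; b--) {
            int leftCount = (n.left == null) ? 0 : n.left.count;
            if (k < leftCount) {
                n = n.left;                   // answer lies in the 0-bit branch
            } else {
                k -= leftCount;               // skip everything in the 0-bit branch
                n = n.right;
                value |= 1 << b;
            }
        }
        return value;
    }
}

Each operation walks one root-to-leaf path, so the cost is O(m) in the number of key bits, matching the trie answer earlier in this thread.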
I hope this helps, and if anything is not very clear, just let me know.

How to find the rank of a node in an AVL tree?

I need to implement two rank queries [rank(k) and select(r)]. But before I can start on this, I need to figure out how the two functions work.
As far as I know, rank(k) returns the rank of a given key k, and select(r) returns the key of a given rank r.
So my questions are:
1.) How do you calculate the rank of a node in an AVL(self balancing BST)?
2.) Is it possible for more than one key to have the same rank? And if so, what would select(r) return?
I'm going to include a sample AVL tree which you can refer to if it helps answer the question.
Thanks!
Your question really boils down to: "how is the term 'rank' normally defined with respect to an AVL tree?" (and, possibly, how is 'select' normally defined as well).
At least as I've seen the term used, "rank" means the position among the nodes in the tree -- i.e., how many nodes are to its left. You're typically given a pointer to a node (or perhaps a key value) and you need to count the number of nodes to its left.
"Select" is basically the opposite -- you're given a particular rank, and need to retrieve a pointer to the specified node (or the key for that node).
Two notes: First, since neither of these modifies the tree at all, it makes no real difference what form of balancing is used (e.g., AVL vs. red/black); for that matter a tree with no balancing at all is equivalent as well. Second, if you need to do this frequently, you can improve speed considerably by adding an extra field to each node recording how many nodes are to its left.
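A sketch in Java of the augmentation described in that second note: each node records the size of its left subtree, and both queries walk a single root-to-leaf path, giving O(log n) on a balanced tree. The AVL balancing itself (and updating leftSize during rotations) is omitted here:

class OrderStatTree {
    static final class Node {
        int key;
        int leftSize;              // number of nodes in the left subtree
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    // rank(k): 0-based count of keys less than k.
    int rank(int key) {
        int r = 0;
        Node n = root;
        while (n != null) {
            if (key < n.key) {
                n = n.left;
            } else if (key > n.key) {
                r += n.leftSize + 1;   // everything left of n, plus n itself
                n = n.right;
            } else {
                return r + n.leftSize;
            }
        }
        return r;                      // rank the key would have if inserted
    }

    // select(r): the key with 0-based rank r.
    int select(int r) {
        Node n = root;
        while (n != null) {
            if (r < n.leftSize) {
                n = n.left;
            } else if (r > n.leftSize) {
                r -= n.leftSize + 1;   // discard the left subtree and n
                n = n.right;
            } else {
                return n.key;
            }
        }
        throw new IllegalArgumentException("rank out of range");
    }
}

With distinct keys, select(rank(k)) == k, which also bears on question 2: no two nodes share a rank unless equal keys are allowed, in which case a convention (such as which subtree holds duplicates) decides what select returns.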
Rank is the number of nodes in the Left sub tree plus one, and is calculated for every node. I believe rank is not a concept specific to AVL trees - it can be calculated for any binary tree.
Select is just opposite to rank. A rank is given and you have to return a node matching that rank.
The following code will perform rank calculation:
void InitRank(struct TreeNode *Node)
{
    if (!Node)
    {
        return;
    }
    else
    {
        Node->rank = 1 + NumberOfNodesInTree(Node->LChild);
        InitRank(Node->LChild);
        InitRank(Node->RChild);
    }
}

int NumberOfNodesInTree(struct TreeNode *Node)
{
    if (!Node)
    {
        return 0;
    }
    else
    {
        return (1 + NumberOfNodesInTree(Node->LChild) + NumberOfNodesInTree(Node->RChild));
    }
}
Here is the code I wrote, which worked fine for an AVL tree, to get the rank of a particular value. The difference is just that the answer above used a node as the parameter and I used a key as the parameter. You can modify this your own way. Sample code:
public int rank(int data) {
    return rank(data, root);
}

private int rank(int data, AVLNode r) {
    int rank = 1;
    while (r != null) {
        if (data < r.data)
            r = r.left;
        else if (data > r.data) {
            rank += 1 + countNodes(r.left);
            r = r.right;
        }
        else {
            r.rank = rank + countNodes(r.left);
            return r.rank;
        }
    }
    return 0;
}
[N.B.] If you want to start your rank from 0, then initialize the variable rank = 0.
You will definitely need to implement the method countNodes() to execute this code.

Create Balanced Binary Search Tree from Sorted linked list

What's the best way to create a balanced binary search tree from a sorted singly linked list?
How about creating nodes bottom-up?
This solution's time complexity is O(N). Detailed explanation in my blog post:
http://www.leetcode.com/2010/11/convert-sorted-list-to-balanced-binary.html
Two traversals of the linked list are all we need. The first traversal gets the length of the list (which is then passed in as the parameter n into the function); then we create nodes in the list's order.
BinaryTree* sortedListToBST(ListNode *& list, int start, int end) {
    if (start > end) return NULL;
    // same as (start+end)/2, avoids overflow
    int mid = start + (end - start) / 2;
    BinaryTree *leftChild = sortedListToBST(list, start, mid - 1);
    BinaryTree *parent = new BinaryTree(list->data);
    parent->left = leftChild;
    list = list->next;
    parent->right = sortedListToBST(list, mid + 1, end);
    return parent;
}

BinaryTree* sortedListToBST(ListNode *head, int n) {
    return sortedListToBST(head, 0, n - 1);
}
You can't do better than linear time, since you have to at least read all the elements of the list, so you might as well copy the list into an array (linear time) and then construct the tree efficiently in the usual way, i.e. if you had the list [9,12,18,23,24,51,84], then you'd start by making 23 the root, with children 12 and 51, then 9 and 18 become children of 12, and 24 and 84 become children of 51. Overall, should be O(n) if you do it right.
The actual algorithm, for what it's worth, is "take the middle element of the list as the root, and recursively build BSTs for the sub-lists to the left and right of the middle element and attach them below the root".
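A short Java sketch of that approach, assuming the list has already been copied into a sorted array (TreeNode is a hypothetical node class):

class SortedArrayToBst {
    static final class TreeNode {
        int val;
        TreeNode left, right;
        TreeNode(int v) { val = v; }
    }

    static TreeNode build(int[] sorted, int lo, int hi) {
        if (lo > hi) return null;
        int mid = lo + (hi - lo) / 2;             // middle element becomes the root
        TreeNode root = new TreeNode(sorted[mid]);
        root.left = build(sorted, lo, mid - 1);   // left half -> left subtree
        root.right = build(sorted, mid + 1, hi);  // right half -> right subtree
        return root;
    }
}

For [9,12,18,23,24,51,84] this picks 23 as the root, then 12 and 51 as its children, and so on, exactly as described above.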
Best isn't only about asymptotic run time. The sorted linked list has all the information needed to create the binary tree directly, and I think this is probably what they are looking for.
Note that the first and third entries become children of the second, then the fourth node has children of the second and sixth (which has children the fifth and seventh), and so on...
in pseudocode:
read three elements, make a node from them, mark as level 1, push on stack
loop
    read three elements and make a node of them
    mark as level 1
    push on stack
    loop while top two entries on stack have same level (n)
        make node of top two entries, mark as level n + 1, push on stack
while elements remain in list
(with a bit of adjustment for when there are fewer than three elements left or an unbalanced tree at any point)
EDIT:
At any point, there is a left node of height N on the stack. The next step is to read one element, then read and construct another node of height N on the stack. To construct a node of height N, make and push a node of height N-1 on the stack, then read an element, make another node of height N-1 on the stack -- which is a recursive call.
Actually, this means the algorithm (even as modified) won't produce a balanced tree. If there are 2^N+1 nodes, it will produce a tree with 2^N-1 values on the left, and 1 on the right.
So I think @sgolodetz's answer is better, unless I can think of a way of rebalancing the tree as it's built.
Trick question!
The best way is to use the STL, and take advantage of the fact that the sorted associative container ADT, of which set is an implementation, demands that insertion of sorted ranges take amortized linear time. Any passable set of core data structures for any language should offer a similar guarantee. For a real answer, see the quite clever solutions others have provided.
What's that? I should offer something useful?
Hum...
How about this?
The smallest possible meaningful tree in a balanced binary tree is 3 nodes.
A parent, and two children. The very first instance of such a tree is the first three elements. Child-parent-Child. Let's now imagine this as a single node. Okay, well, we no longer have a tree. But we know that the shape we want is Child-parent-Child.
Done for a moment with our imaginings, we want to keep a pointer to the parent in that initial triumvirate. But it's singly linked!
We'll want to have four pointers, which I'll call A, B, C, and D. So, we move A to 1, set B equal to A and advance it one. Set C equal to B, and advance it two. The node under B already points to its right-child-to-be. We build our initial tree. We leave B at the parent of Tree one. C is sitting at the node that will have our two minimal trees as children. Set A equal to C, and advance it one. Set D equal to A, and advance it one. We can now build our next minimal tree. D points to the root of that tree, B points to the root of the other, and C points to the... the new root from which we will hang our two minimal trees.
How about some pictures?
[A][B][-][C]
With our image of a minimal tree as a node...
[B = Tree][C][A][D][-]
And then
[Tree A][C][Tree B]
Except we have a problem. The node two after D is our next root.
[B = Tree A][C][A][D][-][Roooooot?!]
It would be a lot easier on us if we could simply maintain a pointer to it instead of to it and C. Turns out, since we know it will point to C, we can go ahead and start constructing the node in the binary tree that will hold it, and as part of this we can enter C into it as a left-node. How can we do this elegantly?
Set the pointer of the Node under C to the node Under B.
It's cheating in every sense of the word, but by using this trick, we free up B.
Alternatively, you can be sane, and actually start building out the node structure. After all, you really can't reuse the nodes from the SLL, they're probably POD structs.
So now...
[TreeA]<-[C][A][D][-][B]
[TreeA]<-[C]->[TreeB][B]
And... Wait a sec. We can use this same trick to free up C, if we just let ourselves think of it as a single node instead of a tree. Because after all, it really is just a single node.
[TreeC]<-[B][A][D][-][C]
We can further generalize our tricks.
[TreeC]<-[B][TreeD]<-[C][-]<-[D][-][A]
[TreeC]<-[B][TreeD]<-[C]->[TreeE][A]
[TreeC]<-[B]->[TreeF][A]
[TreeG]<-[A][B][C][-][D]
[TreeG]<-[A][-]<-[C][-][D]
[TreeG]<-[A][TreeH]<-[D][B][C][-]
[TreeG]<-[A][TreeH]<-[D][-]<-[C][-][B]
[TreeG]<-[A][TreeJ]<-[B][-]<-[C][-][D]
[TreeG]<-[A][TreeJ]<-[B][TreeK]<-[D][-]<-[C][-]
[TreeG]<-[A][TreeJ]<-[B][TreeK]<-[D][-]<-[C][-]
We are missing a critical step!
[TreeG]<-[A]->([TreeJ]<-[B]->([TreeK]<-[D][-]<-[C][-]))
Becomes :
[TreeG]<-[A]->[TreeL->([TreeK]<-[D][-]<-[C][-])][B]
[TreeG]<-[A]->[TreeL->([TreeK]<-[D]->[TreeM])][B]
[TreeG]<-[A]->[TreeL->[TreeN]][B]
[TreeG]<-[A]->[TreeO][B]
[TreeP]<-[B]
Obviously, the algorithm can be cleaned up considerably, but I thought it would be interesting to demonstrate how one can optimize as you go by iteratively designing your algorithm. I think this kind of process is what a good employer should be looking for more than anything.
The trick, basically, is that each time we reach the next midpoint, which we know is a parent-to-be, we know that its left subtree is already finished. The other trick is that we are done with a node once it has two children and something pointing to it, even if all of the sub-trees aren't finished. Using this, we can get what I am pretty sure is a linear time solution, as each element is touched only 4 times at most. The problem is that this relies on being given a list that will form a truly balanced binary search tree. There are, in other words, some hidden constraints that may make this solution either much harder to apply, or impossible. For example, if you have an odd number of elements, or if there are a lot of non-unique values, this starts to produce a fairly silly tree.
Considerations:
Render the elements unique.
Insert a dummy element at the end if the number of nodes is odd.
Sing longingly for a more naive implementation.
Use a deque to keep the roots of completed subtrees and the midpoints in, instead of mucking around with my second trick.
This is a python implementation:
def sll_to_bbst(sll, start, end):
    """Build a balanced binary search tree from sorted linked list.

    This assumes that you have a class BinarySearchTree, with properties
    'l_child' and 'r_child'.

    Params:
        sll: sorted linked list, any data structure with 'popleft()' method,
            which removes and returns the leftmost element of the list. The
            easiest thing to do is to use 'collections.deque' for the sorted
            list.
        start: int, start index, on initial call set to 0
        end: int, on initial call should be set to len(sll)

    Returns:
        A balanced instance of BinarySearchTree

    This is a python implementation of solution found here:
    http://leetcode.com/2010/11/convert-sorted-list-to-balanced-binary.html
    """
    if start >= end:
        return None
    middle = (start + end) // 2
    l_child = sll_to_bbst(sll, start, middle)
    root = BinarySearchTree(sll.popleft())
    root.l_child = l_child
    root.r_child = sll_to_bbst(sll, middle+1, end)
    return root
Instead of a sorted linked list, I was asked about a sorted array (logically it doesn't matter, though the run time varies) to create a BST of minimal height. Following is the code I came up with:
typedef struct Node {
    struct Node *left;
    int info;
    struct Node *right;
} Node_t;

Node_t* Bin(int low, int high) {
    Node_t* node = NULL;
    int mid = 0;
    if (low <= high) {
        mid = (low + high) / 2;
        node = CreateNode(a[mid]);
        printf("DEBUG: creating node for %d\n", a[mid]);
        if (node->left == NULL) {
            node->left = Bin(low, mid - 1);
        }
        if (node->right == NULL) {
            node->right = Bin(mid + 1, high);
        }
        return node;
    } // if (low <= high)
    else {
        return NULL;
    }
} // Bin(low, high)

Node_t* CreateNode(int info) {
    Node_t* node = malloc(sizeof(Node_t));
    memset(node, 0, sizeof(Node_t));
    node->info = info;
    node->left = NULL;
    node->right = NULL;
    return node;
} // CreateNode(info)

// call for an array example: 6 7 8 9 10 11 12; it gets you the desired result
Bin(0, 6);
HTH Somebody..
This is the recursive pseudocode algorithm that I suggest.

createTree(treenode *root, linknode *start, linknode *end)
{
    if (start == end or start == end->next)
    {
        return;
    }
    ptrsingle = start;
    ptrdouble = start;
    while (ptrdouble != end and ptrdouble->next != end)
    {
        ptrsingle = ptrsingle->next;
        ptrdouble = ptrdouble->next->next;
    }
    // ptrsingle will now be at the middle element.
    treenode *cur_node = allocate memory;
    cur_node->data = ptrsingle->data;
    if (root == null)
    {
        root = cur_node;
    }
    else
    {
        if (cur_node->data < root->data)
            root->left = cur_node
        else
            root->right = cur_node
    }
    createTree(cur_node, start, ptrsingle);
    createTree(cur_node, ptrsingle, end);
}

Root = null;
The initial call will be createTree(Root, list, null);
We are doing the recursive building of the tree, but without using an intermediate array.
To get to the middle element each time, we advance two pointers: one by one element, the other by two elements. By the time the second pointer is at the end, the first pointer will be at the middle.
The running time will be O(n log n). The extra space will be O(log n). Not an efficient solution for a real situation, where you could use an R-B tree, which guarantees O(log n) insertion. But good enough for an interview.
Similar to @Stuart Golodetz and @Jake Kurzer, the important thing is that the list is already sorted.
In @Stuart's answer, the array he presented is the backing data structure for the BST. The find operation, for example, would just need to perform index calculations to traverse the tree. Growing the array and removing elements would be the trickier part, so I'd prefer a vector or another constant-time lookup data structure.
@Jake's answer also uses this fact, but unfortunately requires you to traverse the list each time you do a get(index) operation. But it requires no additional memory usage.
Unless it was specifically mentioned by the interviewer that they wanted an object structure representation of the tree, I would use @Stuart's answer.
In a question like this you'd be given extra points for discussing the tradeoffs and all the options that you have.
Hope the detailed explanation on this post helps:
http://preparefortechinterview.blogspot.com/2013/10/planting-trees_1.html
A slightly improved implementation from @1337c0d3r in my blog.
// create a balanced BST using #len elements starting from #head & move #head forward by #len
TreeNode *sortedListToBSTHelper(ListNode *&head, int len) {
    if (0 == len) return NULL;
    auto left = sortedListToBSTHelper(head, len / 2);
    auto root = new TreeNode(head->val);
    root->left = left;
    head = head->next;
    root->right = sortedListToBSTHelper(head, (len - 1) / 2);
    return root;
}

TreeNode *sortedListToBST(ListNode *head) {
    int n = length(head);
    return sortedListToBSTHelper(head, n);
}
If you know how many nodes are in the linked list, you can do it like this:
// Gives path to subtree being built. If branch[N] is false, branch
// less from the node at depth N; if true, branch greater.
bool branch[max depth];

// If rem[N] is true, then for the current subtree at depth N, its
// greater subtree has one more node than its less subtree.
bool rem[max depth];

// Depth of root node of current subtree.
unsigned depth = 0;

// Number of nodes in current subtree.
unsigned num_sub = Number of nodes in linked list;

// The algorithm relies on a stack of nodes whose less subtree has
// been built, but whose greater subtree has not yet been built. The
// stack is implemented as a linked list. The nodes are linked
// together by having the "greater" handle of a node set to the
// next node in the list. "less_parent" is the handle of the first
// node in the list.
Node *less_parent = nullptr;

// h is root of current subtree, child is one of its children.
Node *h, *child;

Node *p = head of the sorted linked list of nodes;

LOOP // loop unconditionally

    LOOP WHILE (num_sub > 2)
        // Subtract one for root of subtree.
        num_sub = num_sub - 1;
        rem[depth] = !!(num_sub & 1); // true if num_sub is an odd number
        branch[depth] = false;
        depth = depth + 1;
        num_sub = num_sub / 2;
    END LOOP

    IF (num_sub == 2)
        // Build a subtree with two nodes, slanting to greater.
        // I arbitrarily chose to always have the extra node in the
        // greater subtree when there is an odd number of nodes to
        // split between the two subtrees.
        h = p;
        p = the node after p in the linked list;
        child = p;
        p = the node after p in the linked list;
        make h and child into a two-element AVL tree;
    ELSE // num_sub == 1
        // Build a subtree with one node.
        h = p;
        p = the next node in the linked list;
        make h into a leaf node;
    END IF

    LOOP WHILE (depth > 0)
        depth = depth - 1;
        IF (not branch[depth])
            // We've completed a less subtree, exit while loop.
            EXIT LOOP;
        END IF
        // We've completed a greater subtree, so attach it to
        // its parent (that is less than it). We pop the parent
        // off the stack of less parents.
        child = h;
        h = less_parent;
        less_parent = h->greater_child;
        h->greater_child = child;
        num_sub = 2 * (num_sub - rem[depth]) + rem[depth] + 1;
        IF (num_sub & (num_sub - 1))
            // num_sub is not a power of 2
            h->balance_factor = 0;
        ELSE
            // num_sub is a power of 2
            h->balance_factor = 1;
        END IF
    END LOOP

    IF (num_sub == number of nodes in original linked list)
        // We've completed the full tree, exit outer unconditional loop
        EXIT LOOP;
    END IF

    // The subtree we've completed is the less subtree of the
    // next node in the sequence.
    child = h;
    h = p;
    p = the next node in the linked list;
    h->less_child = child;

    // Put h onto the stack of less parents.
    h->greater_child = less_parent;
    less_parent = h;

    // Proceed to creating greater than subtree of h.
    branch[depth] = true;
    num_sub = num_sub + rem[depth];
    depth = depth + 1;

END LOOP

// h now points to the root of the completed AVL tree.
For an encoding of this in C++, see the build member function (currently at line 361) in https://github.com/wkaras/C-plus-plus-intrusive-container-templates/blob/master/avl_tree.h . It's actually more general, a template using any forward iterator rather than specifically a linked list.

Homework: binary tree - level-order traversal

Is there a way to visit a binary tree from the lowest level up to the highest (the root)?
Not from the root level down to the lowest!!!
(And not using the level-order traversal with a stack...!!!) <--- that's the opposite..
So difficult... thank you!
There are a few challenges here that lead to different solutions:
Can you traverse up the tree? Often data structures are set up so you can only go down. You could find all leaf nodes, put them in a priority queue by level, and then traverse up.
Can you store O(n) additional data? You could traverse it in a normal breadth-first manner, inserting pointers into a priority queue by level, as with the previous solution, but this time inserting all nodes during the initial traversal. This will increase the maximum size of the auxiliary data used during traversal though.
Is the tree guaranteed to be balanced and full, like it might be in a Heap-like tree? If it is, you can traverse it in a simpler manner, by just going to the right places.
You probably could do it easily IF you maintained a pointer to the node at the greatest depth. If you don't, then you must find that node before beginning your traversal. Also, your nodes will all have to have pointers to their parents.
Let me explain in a better way. I have an algebraic expression tree (so not balanced). I have to evaluate it USING a queue (and only a queue). I asked this question because I think the only way is to take nodes starting from the lowest level, up to the root...
example:
tree ( + ( * ( 2 ) ( 2 ) ) ( 3 ) )
I take the queue and:
enqueue(2);
enqueue(2);
(*) -----> dequeue; dequeue; result = 2 * 2; enqueue(result);
enqueue(3);
(+) -----> dequeue; dequeue; result = 4 + 3; give result;
so I need to have this traversal: 2 ; 2 ; * ; 3 ; +
I don't know if it's clear...
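For what it's worth, here is a sketch in Java of the evaluation being described: operands are enqueued, and each operator dequeues two values and enqueues the result. This works for a token sequence listed bottom-up as in the example above; it is not a general postfix evaluator, which would need a stack:

import java.util.ArrayDeque;
import java.util.Queue;

class QueueEval {
    // evaluate(new String[]{"2", "2", "*", "3", "+"}) returns 7
    static int evaluate(String[] tokens) {
        Queue<Integer> queue = new ArrayDeque<>();
        for (String t : tokens) {
            if (t.equals("*") || t.equals("+")) {
                int a = queue.remove();          // dequeue two operands
                int b = queue.remove();
                queue.add(t.equals("*") ? a * b : a + b);
            } else {
                queue.add(Integer.parseInt(t));  // enqueue an operand
            }
        }
        return queue.remove();
    }
}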
Provided I understood your question correctly: if you want to traverse the tree visiting a leaf first and the root last, you can visit the nodes on the way back as you traverse the tree.
function traverse(node)
    for each child of node
        traverse(child)
    end
    visit(node)
end
If you want to visit the nodes in level order, you could do something like this (though using a stack -- I'm not sure whether you didn't want one at all or some particular solution using a stack):
queue.push(root)
stack.push(root)
while queue is not empty
    node = queue.pop()
    for each child of node
        queue.push(child)
        stack.push(child)
    end
end
while stack is not empty
    visit(stack.pop())
end
You can do it using only a queue, but with worse time complexity, if you do it like this:
for i = treedepth down to 0
    queue.push(root)
    while queue is not empty
        node = queue.pop()
        if node has depth i
            visit(node)
        else
            for each child of node
                queue.push(child)
            end
        end
    end
end
The tree depth and node levels can be found using an initial traversal, if needed.
However, if you are allowed to make recursive calls, you actually have access to a stack (the call stack). This can be exploited to implement the second solution, but making the stack implicit.
function unwind(queue)
    if queue is not empty
        node = queue.pop()
        unwind(queue)
        visit(node)
    end
end

queue.push(root)
queue2.push(root)
while queue is not empty
    node = queue.pop()
    for each child of node
        queue.push(child)
        queue2.push(child)
    end
end
unwind(queue2)
And of course, if you have access to almost any other data structure (list, array, priority queue, double ended queue, etc.) you can easily implement a stack yourself. But then it would be rather pointless to forbid stacks in the first place.
Queue is only useful for traversing level-order from root to leaf of the tree.
You can use a Depth-first Traversal for printing a certain level.
Like this:
void printLevel(BinaryTree *p, int level) {
    if (!p) return;
    if (level == 1) {
        cout << p->data << " ";
    } else {
        printLevel(p->left, level - 1);
        printLevel(p->right, level - 1);
    }
}
To print all levels from the leaves up to the root, you first need to find the maximum depth of the tree. This can also be done easily using a depth-first traversal (a minimal maxHeight is included below).
// helper (not shown in the original post): height via depth-first traversal
int maxHeight(BinaryTree *p) {
    if (!p) return 0;
    int left = maxHeight(p->left);
    int right = maxHeight(p->right);
    return 1 + (left > right ? left : right);
}

void printLevelOrder(BinaryTree *root) {
    int height = maxHeight(root);
    for (int level = height; level >= 1; level--) {
        printLevel(root, level);
        cout << endl;
    }
}
The run time complexity is, surprisingly, O(N) for a balanced tree, where N is the total number of nodes (for a skewed tree, the repeated passes degrade toward O(N^2)).
For more information and run-time complexity analysis, refer to the page below:
http://www.ihas1337code.com/2010/09/binary-tree-level-order-traversal-using_17.html
