Range function for values in a splay tree? - algorithm

I am a beginner at data structures.
I am trying to write some pseudocode for a range function with splay trees: Range(S, A, B), which changes S to the set of all its members for which the key value C satisfies A ≤ C ≤ B. I know that splay trees are a type of binary search tree that implements its own splay operation. Basically, I am trying to return a range of values that are between A and B. However, I am having trouble understanding how I should do this, where I should even begin, and what conditions I should check. I've read the definition of splay trees, and know they are like binary search trees with the move-to-front algorithm.
This is what I have so far:
Algorithm Range(Array S, int A, int B): array Set
S = new array(size) //Initialize an empty array of some size
if (A > B) then return NULL
I just feel somewhat lost after this point. I am not sure how to check the values of splay trees. Please let me know if I can provide additional information, or what directions I should go in.

According to Wikipedia,
A splay tree is a self-adjusting binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, look-up and removal in O(log n) amortized time.
However, since the "splay" operation applies only to random searches, the tree may be considered to be an ordinary "binary search tree".
The algorithm becomes,
Range (BSTree T, int A, int B)
int Array S
S ← Empty array
If A <= B then
For each C in T
If A <= C and C <= B then
Append C to S
Return S
That is, tree T is traversed, in order; and, for each item C meeting the condition, the item is added to array S. If no item meets the condition, an empty array is returned.
The For each, if not available in the implementation language, may be implemented using the algorithm described as in-order
inorder(node)
if (node = null)
return
inorder(node.left)
visit(node)
inorder(node.right)
where visit(node) is the place to test whether the item meets the condition.
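Because T is a binary search tree, the traversal does not have to visit every node: a subtree can be skipped whenever all of its keys must fall outside [A, B]. Here is a minimal Python sketch of that pruned in-order traversal. The `Node` class and the name `range_query` are illustrative assumptions, not part of the question.

```python
class Node:
    """Plain BST node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def range_query(node, a, b, out=None):
    """Return the keys C with a <= C <= b, in ascending order."""
    if out is None:
        out = []
    if node is None or a > b:
        return out
    # Keys in the left subtree are smaller than node.key, so they can
    # only be in range if a is smaller than node.key.
    if a < node.key:
        range_query(node.left, a, b, out)
    if a <= node.key <= b:
        out.append(node.key)
    # Symmetric pruning for the right subtree.
    if node.key < b:
        range_query(node.right, a, b, out)
    return out
```

With this pruning the work is proportional to the height of the tree plus the number of keys reported, rather than to the size of the whole tree.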

This is pretty late, but from the term "change" in the question prompt, it seems like it is asking you to modify the S tree so that it has only the elements within range.
So I would do it like this: splay the tree around the lower bound, and drop the left subtree, since everything in the left subtree will be lower value than the lower bound. Then splay the tree around upper bound, then dropping the right subtree, since everything in the right subtree will be higher value than the upper bound.
Here is how I would write it, in pseudocode
//assumes S is the root of an actual tree with elements
function Range(node S, int A, int B)
    node temp <- Splay(A, S) //splay around lower bound; the root is now A or one of its neighbors
    if (temp.key < A) //the root itself is below the range
        temp <- temp.right //drops the root and its left subtree; everything kept is > A
        if (temp == null) { S <- null; return } //there are no keys >= A, the result is empty
    else
        temp.left <- null //drops left subtree, bc they are all going to be lesser value
    temp <- Splay(B, temp) //splay around upper bound
    if (temp.key > B) //the root itself is above the range
        temp <- temp.left //drops the root and its right subtree; everything kept is < B
        if (temp == null) { S <- null; return } //there are no keys <= B, the result is empty
    else
        temp.right <- null //drops all right subtree
    S <- temp
Hope this helps! Since each splay takes O(log n) amortized time, this should also run in O(log n) amortized time.
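A runnable Python sketch of the splay-and-drop idea above. It assumes distinct integer keys and uses a top-down splay in the style of Sleator and Tarjan. The names `splay`, `range_trim`, `build` and `inorder` are hypothetical helpers for this illustration, and `build` uses plain BST insertion only to create test input.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def splay(t, key):
    """Top-down splay: returns the new root, which holds key or a neighbor of key."""
    if t is None:
        return None
    header = Node(None)          # temporary holder for the left and right trees
    l = r = header
    while True:
        if key < t.key:
            if t.left is None:
                break
            if key < t.left.key:          # zig-zig: rotate right
                y = t.left
                t.left, y.right = y.right, t
                t = y
                if t.left is None:
                    break
            r.left = t                    # link t into the right tree
            r, t = t, t.left
        elif key > t.key:
            if t.right is None:
                break
            if key > t.right.key:         # zig-zig: rotate left
                y = t.right
                t.right, y.left = y.left, t
                t = y
                if t.right is None:
                    break
            l.right = t                   # link t into the left tree
            l, t = t, t.right
        else:
            break
    l.right, r.left = t.left, t.right     # reassemble
    t.left, t.right = header.right, header.left
    return t


def range_trim(root, a, b):
    """Return the root of the tree restricted to keys in [a, b]."""
    if root is None or a > b:
        return None
    root = splay(root, a)
    if root.key < a:
        root = root.right                 # root and its left subtree are < a
    else:
        root.left = None                  # left subtree is < a
    if root is None:
        return None
    root = splay(root, b)
    if root.key > b:
        root = root.left                  # root and its right subtree are > b
    else:
        root.right = None                 # right subtree is > b
    return root


def build(keys):
    """Plain (non-splaying) BST insertion, used only to create an example tree."""
    root = None
    for k in keys:
        if root is None:
            root = Node(k)
            continue
        cur = root
        while True:
            side = 'left' if k < cur.key else 'right'
            if getattr(cur, side) is None:
                setattr(cur, side, Node(k))
                break
            cur = getattr(cur, side)
    return root


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)
```

The dropped subtrees are simply unlinked, so the only non-constant work is the two splay calls.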

Related

Algorithm to construct a binary search tree out of elements from another bst

I'm trying to come up with an algorithm to construct a binary search tree using the elements from another binary search tree, but with the restriction that those elements have to be greater or equal than some given integer, let's call it x.
I thought of a recursive approach (using in order traversal):
binary_tree (bst tree, int x) {
if (tree is empty)
return empty;
if (tree->element>=x)
insert tree->element in a new BST;
else ????
}
I have no idea what the last recursive call would be, I obviously can't write two returns like this:
else
return (tree->left, x)
return (tree->right, x)
And I can't think of anything else, sorry if this is a silly question! I'm just starting with recursion and it's really confusing.
Let's think about what we are doing here. We want to construct a tree from an existing binary search tree. Because the existing tree is a BST we get some helpful info.
For any node V, if V < x then every node in the subtree pointed to by V -> left is also smaller than x, so we no longer need to look in the left subtree. However, if we hit a node that is greater than or equal to x, we need to continue the recursion into both subtrees. Let's bring this all together in pseudocode:
newBST(root, x):
    if root is null
        return
    if root.val >= x
        addNewNode(root.val)
        newBST(root.right, x)
        newBST(root.left, x)
    else:
        newBST(root.right, x)
It's a little tricky to do this recursively, because there isn't a 1-1 correspondence between subtrees in the tree you have and subtrees in the tree you want.
The simplest way to do this is to copy the values >= x into a list in order, and then build a tree from the list recursively.
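A short Python sketch of the list-based approach: collect the values >= x in sorted order (skipping left subtrees that cannot contain any), then build a tree from the sorted list by always picking the middle element as the root. The `Node` class and the function names are assumptions made for this example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def collect_at_least(node, x, out):
    """Append every key >= x to out, in ascending order."""
    if node is None:
        return
    if node.key >= x:
        # Only a node that is itself >= x can have qualifying keys on its left.
        collect_at_least(node.left, x, out)
        out.append(node.key)
    collect_at_least(node.right, x, out)


def build_balanced(keys, lo, hi):
    """Build a BST from the sorted slice keys[lo:hi]."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(keys[mid],
                build_balanced(keys, lo, mid),
                build_balanced(keys, mid + 1, hi))


def filtered_bst(root, x):
    keys = []
    collect_at_least(root, x, keys)
    return build_balanced(keys, 0, len(keys))


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)
```

The resulting tree is balanced, which is a design choice of this sketch; the shape of the original tree is not preserved.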

Construct a binary tree from permutation in n log n time

The numbers 1 to n are inserted in a binary search tree in a specified order p_1, p_2,..., p_n. Describe an O(nlog n) time algorithm to construct the resulting final binary search tree.
Note that :-
I don't need average time n log n, but the worst time.
I need the exact tree that results when insertion takes place with the usual rules. AVL or red-black trees are not allowed.
This is an assignment question. It is very very non trivial. In fact it seemed impossible at first glance. I have thought on it much. My observations:-
The argument that we use to prove that sorting takes at least n log n time does not eliminate the existence of such an algorithm here.
If it is always possible to find a subtree in O(n) time whose size is between two fractions of the size of tree, the problem can be easily solved.
Choosing median or left child of root as root of subtree doesn't work.
The trick is not to use the constructed BST for lookups. Instead, keep an additional, balanced BST for lookups. Link the leaves.
For example, we might have
Constructed      Balanced
      3              2
     / \           /   \
    2   D         1     3
   / \           / |   | \
  1   C         a  b   c  d
 / \
A   B
where a, b, c, d are pointers to A, B, C, D respectively, and A, B, C, D are what would normally be null pointers.
To insert, insert into the balanced BST first (O(log n)), follow the pointer to the constructed tree (O(1)), do the constructed insert (O(1)), and relink the new leaves (O(1)).
As David Eisenstat doesn't have time to extend his answer, I'll try to put more details into a similar algorithm.
Intuition
The main intuition behind the algorithm is based on the following statements:
statement #1: if a BST contains values a and b (a < b) AND there are no values between them, then either A (node for value a) is a (possibly indirect) parent of B (node for value b) or B is a (possibly indirect) parent of A.
This statement is obviously true because if their lowest common ancestor C is some other node than A and B, its value c must be between a and b. Note that statement #1 is true for any BST (balanced or unbalanced).
statement #2: if a simple (unbalanced) BST contains values a and b (a < b) AND there are no values between them AND we are trying to add value x such that a < x < b, then X (node for value x) will be either direct right (greater) child of A or direct left (less) child of B whichever node is lower in the tree.
Let's assume that the lower of two nodes is a (the other case is symmetrical). During insertion phase value x will travel the same path as a during its insertion because tree doesn't contain any values between a and x i.e. at any comparison values a and x are indistinguishable. It means that value x will navigate tree till node A and will pass node B at some earlier step (see statement #1). As x > a it should become a right child of A. Direct right child of A must be empty at this point because A is in B's subtree i.e. all values in that subtree are less than b and since there are no values between a and b in the tree, no value can be right child of node A.
Note that statement #2 might potentially be not true for some balanced BST after re-balancing was performed although this should be a strange case.
statement #3: in a balanced BST for any value x not in the tree yet, you can find closest greater and closest less values in O(log(N)) time.
This follows directly from statements #1 and #2: all you need is find the potential insertion point for the value x in the BST (takes O(log(N))), one of the two values will be direct parent of the insertion point and to find the other you need to travel the tree back to the root (again takes O(log(N))).
So now the idea behind the algorithm becomes clear: for fast insertion into an unbalanced BST we need to find nodes with closest less and greater values. We can easily do it if we additionally maintain a balanced BST with the same keys as our target (unbalanced) BST and with corresponding nodes from that BST as values. Using that additional data structure we can find insertion point for each new value in O(log(N)) time and update this data structure with new value in O(log(N)) time as well.
Algorithm
1. Init "main" root and balancedRoot with null.
2. for each value x in the list do:
3. if this is the first value just add it as the root nodes to both trees and go to #2
4. in the tree specified by balancedRoot find nodes that correspond to the closest less (BalancedA, points to node A in the main BST) and closest greater (BalancedB, points to node B in the main BST) values.
5. If there is no closest lower value i.e. we are adding minimum element, add it as the left child to the node B
6. If there is no closest greater value i.e. we are adding maximum element, add it as the right child to the node A
7. Find whichever of nodes A or B is lower in the tree. You can use explicit level stored in the node. If the lower node is A (less node), add x as the direct right child of A else add x as the direct left child of B (greater node). Alternatively (and more cleverly) you may notice that from the statements #1 and #2 follows that exactly one of the two candidate insert positions (A's right child or B's left child) will be empty and this is where you want to insert your value x.
8. Add value x to the balanced tree (might re-use from step #4).
9. Go to step #2
As no inner step of the loop takes more than O(log(N)), total complexity is O(N*log(N))
Java implementation
I'm too lazy to implement balanced BST myself so I used standard Java TreeMap that implements Red-Black tree and has useful lowerEntry and higherEntry methods that correspond to step #4 of the algorithm (you may look at the source code to ensure that both are actually O(log(N))).
import java.util.Map;
import java.util.TreeMap;
public class BSTTest {
static class Node {
public final int value;
public Node left;
public Node right;
public Node(int value) {
this.value = value;
}
public boolean compareTree(Node other) {
return compareTrees(this, other);
}
public static boolean compareTrees(Node n1, Node n2) {
if ((n1 == null) && (n2 == null))
return true;
if ((n1 == null) || (n2 == null))
return false;
if (n1.value != n2.value)
return false;
return compareTrees(n1.left, n2.left) &&
compareTrees(n1.right, n2.right);
}
public void assignLeftSafe(Node child) {
if (this.left != null)
throw new IllegalStateException("left child is already set");
this.left = child;
}
public void assignRightSafe(Node child) {
if (this.right != null)
throw new IllegalStateException("right child is already set");
this.right = child;
}
@Override
public String toString() {
return "Node{" +
"value=" + value +
'}';
}
}
static Node insertToBst(Node root, int value) {
if (root == null)
root = new Node(value);
else if (value < root.value)
root.left = insertToBst(root.left, value);
else
root.right = insertToBst(root.right, value);
return root;
}
static Node buildBstDirect(int[] values) {
Node root = null;
for (int v : values) {
root = insertToBst(root, v);
}
return root;
}
static Node buildBstSmart(int[] values) {
Node root = null;
TreeMap<Integer, Node> balancedTree = new TreeMap<Integer, Node>();
for (int v : values) {
Node node = new Node(v);
if (balancedTree.isEmpty()) {
root = node;
} else {
Map.Entry<Integer, Node> lowerEntry = balancedTree.lowerEntry(v);
Map.Entry<Integer, Node> higherEntry = balancedTree.higherEntry(v);
if (lowerEntry == null) {
// adding minimum value
higherEntry.getValue().assignLeftSafe(node);
} else if (higherEntry == null) {
// adding max value
lowerEntry.getValue().assignRightSafe(node);
} else {
// adding some middle value
Node lowerNode = lowerEntry.getValue();
Node higherNode = higherEntry.getValue();
if (lowerNode.right == null)
lowerNode.assignRightSafe(node);
else
higherNode.assignLeftSafe(node);
}
}
// update balancedTree
balancedTree.put(v, node);
}
return root;
}
public static void main(String[] args) {
int[] input = new int[]{7, 6, 9, 4, 1, 8, 2, 5, 3};
Node directRoot = buildBstDirect(input);
Node smartRoot = buildBstSmart(input);
System.out.println(directRoot.compareTree(smartRoot));
}
}
Here's a linear-time algorithm. (I said that I wasn't going to work on this question, so if you like this answer, please award the bounty to SergGr.)
Create a doubly linked list with nodes 1..n and compute the inverse of p. For i from n down to 2, let q be the left neighbor of p_i in the list, and let r be the right neighbor. If r does not exist, or if both exist and p^-1(q) > p^-1(r), then make p_i the right child of q. Otherwise (q does not exist, or p^-1(q) < p^-1(r)), make p_i the left child of r. Delete p_i from the list. The one node that remains, p_1, is the root.
In Python:
class Node(object):
    __slots__ = ('left', 'key', 'right')

    def __init__(self, key):
        self.left = None
        self.key = key
        self.right = None


def construct(p):
    # Validate the input.
    p = list(p)
    n = len(p)
    assert set(p) == set(range(n))  # 0 .. n-1
    # Compute p^-1.
    p_inv = [None] * n
    for i in range(n):
        p_inv[p[i]] = i
    # Set up the list.
    nodes = [Node(i) for i in range(n)]
    for i in range(n):
        if i >= 1:
            nodes[i].left = nodes[i - 1]
        if i < n - 1:
            nodes[i].right = nodes[i + 1]
    # Process p.
    for i in range(n - 1, 0, -1):  # n-1, n-2 .. 1
        q = nodes[p[i]].left
        r = nodes[p[i]].right
        if r is None or (q is not None and p_inv[q.key] > p_inv[r.key]):
            print(p[i], 'is the right child of', q.key)
        else:
            print(p[i], 'is the left child of', r.key)
        # Unlink p[i] from the list.
        if q is not None:
            q.right = r
        if r is not None:
            r.left = q


construct([1, 3, 2, 0])
Here's my O(n log^2 n) attempt that doesn't require building a balanced tree.
Put nodes in an array in their natural order (1 to n). Also link them into a linked list in the order of insertion. Each node stores its order of insertion along with the key.
The algorithm goes like this.
The input is a node in the linked list, and a range (low, high) of indices in the node array
Call the input node root; its key is rootkey. Unlink it from the list.
Determine which subtree of the input node is smaller.
Traverse the corresponding array range, unlink each node from the linked list, then link them in a separate linked list and sort the list again in the insertion order.
Heads of the two resulting lists are children of the input node.
Perform the algorithm recursively on children of the input node, passing ranges (low, rootkey-1) and (rootkey+1, high) as index ranges.
The sorting operation at each level gives the algorithm the extra log n complexity factor.
Here's an O(n log n) algorithm that can also be adapted to O(n log log m) time, where m is the range, by using a Y-fast trie rather than a balanced binary tree.
In a binary search tree, lower values are left of higher values. The order of insertion corresponds with the right-or-left node choices when traveling along the final tree. The parent of any node, x, is either the least higher number previously inserted or the greatest lower number previously inserted, whichever was inserted later.
We can identify and connect the listed nodes with their correct parents using the logic above in O(n log n) worst-time by maintaining a balanced binary tree with the nodes visited so far as we traverse the order of insertion.
Explanation:
Let's imagine a proposed lower parent, p. Now imagine there's a number, l > p but still lower than x, inserted before p. Either (1) p passed l during insertion, in which case x would have had to pass l to get to p but that contradicts that x must have gone right if it reached l; or (2) p did not pass l, in which case p is in a subtree left of l but that would mean a number was inserted that's smaller than l but greater than x, a contradiction.
Clearly, a number, l < x, greater than p that was inserted after p would also contradict p as x's parent since either (1) l passed p during insertion, which means p's right child would have already been assigned when x was inserted; or (2) l is in a subtree to the right of p, which again would mean a number was inserted that's smaller than l but greater than x, a contradiction.
Therefore, for any node, x, with a lower parent, that parent must be the greatest number lower than and inserted before x. Similar logic covers the scenario of a higher proposed parent.
Now let's imagine x's parent, p < x, was inserted before h, the lowest number greater than and inserted before x. Then either (1) h passed p, in which case p's right node would have been already assigned when x was inserted; or (2) h is in a subtree right of p, which means a number lower than h and greater than x was previously inserted but that would contradict our assertion that h is the lowest number inserted so far that's greater than x.
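The parent rule argued above can be sketched in Python as follows. This is only an illustration under stated assumptions: it keeps the keys seen so far in a sorted Python list and uses the standard `bisect` module to find the two neighbors, so each list insertion costs O(n) and the sketch does not reach the O(n log n) bound. A balanced tree would replace the list for that. The function name `parents_by_neighbors` is hypothetical.

```python
import bisect


def parents_by_neighbors(order):
    """Map each key to its parent in the BST built by inserting `order`."""
    seen = []      # keys inserted so far, kept sorted
    when = {}      # key -> position in the insertion order
    parent = {}
    for t, x in enumerate(order):
        i = bisect.bisect_left(seen, x)
        lower = seen[i - 1] if i > 0 else None          # greatest lower key so far
        higher = seen[i] if i < len(seen) else None     # least higher key so far
        if lower is None and higher is None:
            parent[x] = None                            # first key is the root
        elif lower is None:
            parent[x] = higher
        elif higher is None:
            parent[x] = lower
        else:
            # Whichever neighbor was inserted later is the parent.
            parent[x] = lower if when[lower] > when[higher] else higher
        seen.insert(i, x)
        when[x] = t
    return parent
```

Whether x becomes a left or right child follows from comparing x with its parent's key.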
Since this is an assignment, I'm posting a hint instead of an answer.
Sort the numbers, while keeping the insertion order. Say you have input: [1,7,3,5,8,2,4]. Then after sorting you will have [[1,0], [2,5], [3,2], [4, 6], [5,3], [7,1], [8,4]] . This is actually the in-order traversal of the resulting tree. Think hard about how to reconstruct the tree given the in-order traversal and the insertion order (this part will be linear time).
More hints coming if you really need them.
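One possible reading of this hint, sketched in Python: a tree whose in-order traversal is the sorted keys and in which every parent was inserted before its children is a Cartesian tree on the insertion times, and such a tree can be built in linear time with a stack. The function name and the dictionary-based representation are assumptions of this sketch. It calls `sorted` for brevity, whereas for keys 1..n the sorted order is known without sorting.

```python
def build_from_insertion_order(order):
    """Return (root, left, right), where left/right map a key to its child key."""
    when = {x: t for t, x in enumerate(order)}   # key -> insertion time
    left, right = {}, {}
    stack = []   # rightmost path of the tree built so far, root at the bottom
    for key in sorted(order):
        last = None
        # Keys inserted later than `key` cannot be its ancestors: pop them.
        while stack and when[stack[-1]] > when[key]:
            last = stack.pop()
        if last is not None:
            left[key] = last          # the popped chain becomes the left subtree
        if stack:
            right[stack[-1]] = key    # key hangs to the right of an earlier key
        stack.append(key)
    root = stack[0] if stack else None
    return root, left, right
```

Each key is pushed and popped at most once, so the loop after sorting takes linear time.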

Segment tree time complexity analysis

How can we prove that the update and query operations on a segment tree (http://letuskode.blogspot.in/2013/01/segtrees.html) (not to be confused with an interval tree) are O(log n)?
I thought of a way which goes like this - At every node, we make at most two recursive calls on the left and right sub-trees. If we could prove that one of these calls terminates fairly quickly, the time complexity would be logarithmically bounded. But how do we prove this?
Lemma: at most 2 nodes are used at each level of the tree (a level is the set of nodes at a fixed distance from the root).
Proof: Let's assume that at level h at least 3 nodes were used (let's call them L, M and R). It means that the entire interval from the left bound of the L node to the right bound of the R node lies inside the query range. That's why M is fully covered by a node (let's call it UP) from level h - 1 that lies fully inside the query range. But that implies that M could not be visited at all, because the traversal would stop in the UP node or higher. Here are some pictures to clarify this step of the proof:
h - 1: UP UP UP
/\ /\ /\
h: L M R L M R L M R
That's why at most two nodes at each level are used. There are only log N levels in a segment tree, so at most 2 * log N nodes are used in total.
The claim is that there are at most 2 nodes which are expanded at each level. We will prove this by contradiction.
Consider the segment tree given below.
Let's say that there are 3 nodes that are expanded in this tree. This means that the range is from the leftmost colored node to the rightmost colored node. But notice that if the range extends to the rightmost node, then the full range of the middle node is covered. Thus, this node will immediately return the value and won't be expanded. Thus, we prove that at each level we expand at most 2 nodes, and since there are log n levels, the number of nodes that are expanded is at most 2 · log n = O(log n).
If we prove that there is at most a constant number of nodes to visit on each level, then, knowing that a binary segment tree has height at most log N, we can say that the query operation has O(log N) complexity. Other answers tell you that there are at most 2 nodes to visit on each level, but I claim that up to 4 nodes can be visited on a level. You can find the same statement, without proof, in other sources such as GeeksforGeeks.
Other answers show you too small segment tree. Consider this example: Segment tree with leaf nodes size - 16, indexes start from zero. You are looking for the range [0-14]
See example: Crossed are nodes that we are visiting
At each level L of the tree there are at most 2 nodes that can have a partial overlap with the query range. (If you are unable to prove why, please say so.)
So, at level L+1 we have to explore at most 4 nodes, and the total height of the tree is O(log(N)) (where N is the number of nodes). Hence the time complexity is O(4*Log(N)) ~ O(Log(N)).
PS: Please refer to the diagram attached by @Oleksandr Papchenko to get a better understanding.
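The two bounds discussed in these answers (at most 2 partially overlapping nodes per level, hence at most 4 visited nodes per level) can be checked empirically. The Python sketch below simulates the usual recursive query over the index range [0, n-1] without storing any values and counts, per level, how many nodes are visited and how many are expanded. The function names are hypothetical and the exhaustive check is only practical for small n.

```python
from collections import Counter


def count_visits(n, l, r):
    """Simulate a query for [l, r]; return per-level visit and expansion counts."""
    visited = Counter()
    expanded = Counter()

    def query(st, end, depth):
        visited[depth] += 1
        if l <= st and end <= r:      # node range fully inside the query range
            return
        if st > r or end < l:         # node range fully outside the query range
            return
        expanded[depth] += 1          # partial overlap: recurse into both children
        mid = (st + end) // 2
        query(st, mid, depth + 1)
        query(mid + 1, end, depth + 1)

    query(0, n - 1, 0)
    return visited, expanded


def worst_case(n):
    """Largest per-level (visited, expanded) counts over all query ranges."""
    max_visited = max_expanded = 0
    for l in range(n):
        for r in range(l, n):
            v, e = count_visits(n, l, r)
            max_visited = max(max_visited, max(v.values()))
            max_expanded = max(max_expanded, max(e.values(), default=0))
    return max_visited, max_expanded
```

For a tree with 16 leaves, the query [1, 14] is one range that visits 4 nodes on a single level while expanding only 2 of them.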
I will try to give simple mathematical explanation.
Look at the code below, the range_query function of a segment tree implementation:
int query(int node, int st, int end, int l, int r)
{
/*if range lies inside the query range*/
if(l <= st && end <= r )
{
return tree[node];
}
/*If range is totally outside the query range*/
if(st>r || end<l)
return INT_MAX;
/*If query range intersects both the children*/
int mid = (st+end)/2;
int ans1 = query(2*node, st, mid, l, r);
int ans2 = query(2*node+1, mid+1, end, l, r);
return min(ans1, ans2);
}
You go left and right, and if a node's range lies inside the query range you return its value.
So at each level at most 2 nodes are selected; let's call them leftMost and rightMost. If some other node between them (call it mid) were selected, then an ancestor covering it would lie entirely inside the query range and would already have been returned, so mid is never reached.
Thus, for a segment tree with log N levels:
Search at each level = 2
Total search = (search at each level ) * (number of levels) = (2logN)
Therefore search complexity = O(2logN) ~ O(logN).
P.S for space complexity (https://codeforces.com/blog/entry/49939 )

Convert Binary Tree -> BST (maintaining original tree shape)

I have a binary tree of some shape. I want to convert it to a binary search tree of the same shape. Is it possible?
I tried methods like -
Do an in-order traversal of the binary tree and put the contents into an array. Then map this into a BST keeping in mind the condition (left val <= root <= right val). This works for some cases but fails for others.
P.S.: I had a look at this - Binary Trees question. Checking for similar shape. But It's easy to compare 2 BST's for similarity in shape.
The short answer is: you can't. A BST requires that the nodes follow the rule left <= current < right. In the example you linked: http://upload.wikimedia.org/wikipedia/commons/f/f7/Binary_tree.svg, if you try to build a BST with the same shape you'll find that you can't.
However, if you want to stretch the definition of a BST so that it allows left <= current <= right (notice that here current <= right is allowed, as opposed to the stricter definition), you can. Sort all the elements and stick them in an array. Now do an in-order traversal, replacing the values at nodes with each element in your array. Here's some pseudocode:
// t is your non-BST tree, a is an array containing the sorted elements of t, i is the current index into a
index i = 0
create_bst(Tree t, Array a)
{
if(t is NIL)
return;
create_bst(t->left, a)
t->data = a[i]
i++
create_bst(t->right, a)
}
The result won't be a true BST however. If you want a true BST that's as close to the original shape as possible, then you again put the elements in a sorted array but this time insert them into a BST. The order in which you insert them is defined by the sizes of the subtrees of the original tree. Here's some pseudo-code:
// left is initially set to 0
// size(NIL) is taken to be 0
create_true_bst(Tree t, BST bt, array a, index left)
{
if(t is NIL)
return;
index i = left + size(t->left)
bt->insert(a[i])
create_true_bst(t->left, bt, a, left)
create_true_bst(t->right, bt, a, i + 1)
}
This won't guarantee that the shape is the same however.
Extract all elements of the tree, sort them, and then use a recursive in-order traversal to replace the values.
The method you describe is guaranteed to work if you implement it properly. The traversal order on a binary tree is unique, and defines an ordering of the elements. If you sort the elements by value and then stick them in according to that ordering, then it will always be true that
left subtree <= root <= right subtree
for every node, given that this is the order in which you traverse them, and given that you sorted them in that order.
I would simply do two in-order traversals. In the first traversal, get the values from the tree and put them into a heap. In the second, get the values in order from the heap and put them into the tree. This runs in O(n·log n) time and O(n) space.
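A minimal Python sketch of the two-traversal idea, using the standard `heapq` module as the heap. The `Node` class and function names are assumptions for this example. Only the stored values change; no node is moved, so the shape is untouched. With duplicate values the result satisfies left <= root <= right rather than a strict ordering.

```python
import heapq


class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def inorder_nodes(node):
    """Yield the nodes of the tree in in-order sequence."""
    if node is not None:
        yield from inorder_nodes(node.left)
        yield node
        yield from inorder_nodes(node.right)


def convert_to_bst(root):
    """Rewrite the values in place so that the same shape becomes a BST."""
    # First traversal: gather all values into a min-heap.
    heap = [n.value for n in inorder_nodes(root)]
    heapq.heapify(heap)
    # Second traversal: hand the values back out in ascending order.
    for n in inorder_nodes(root):
        n.value = heapq.heappop(heap)
    return root
```

Sorting the collected list once and walking it with an index would work equally well; the heap is used here only to mirror the description above.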

Finding last element of a binary heap

quoting Wikipedia:
It is perfectly acceptable to use a
traditional binary tree data structure
to implement a binary heap. There is
an issue with finding the adjacent
element on the last level on the
binary heap when adding an element
which can be resolved
algorithmically...
Any ideas on how such an algorithm might work?
I was not able to find any information about this issue, for most binary heaps are implemented using arrays.
Any help appreciated.
Recently, I have registered an OpenID account and am not able to edit my initial post nor comment answers. That's why I am responding via this answer. Sorry for this.
quoting Mitch Wheat:
@Yse: is your question "How do I find
the last element of a binary heap"?
Yes, it is.
Or to be more precise, my question is: "How do I find the last element of a non-array-based binary heap?".
quoting Suppressingfire:
Is there some context in which you're
asking this question? (i.e., is there
some concrete problem you're trying to
solve?)
As stated above, I would like to know a good way to "find the last element of a non-array-based binary heap" which is necessary for insertion and deletion of nodes.
quoting Roy:
It seems most understandable to me to
just use a normal binary tree
structure (using a pRoot and Node
defined as [data, pLeftChild,
pRightChild]) and add two additional
pointers (pInsertionNode and
pLastNode). pInsertionNode and
pLastNode will both be updated during
the insertion and deletion subroutines
to keep them current when the data
within the structure changes. This
gives O(1) access to both insertion
point and last node of the structure.
Yes, this should work. If I am not mistaken, it could be a little bit tricky to find the insertion node and the last node when their locations change to another subtree due to a deletion/insertion. But I'll give this a try.
quoting Zach Scrivena:
How about performing a depth-first
search...
Yes, this would be a good approach. I'll try that out, too.
Still, I am wondering whether there is a way to "calculate" the locations of the last node and the insertion point. The number of levels of a binary heap with N nodes can be calculated by taking the log (of base 2) of the smallest power of two that is larger than N. Perhaps it is possible to calculate the number of nodes on the deepest level, too. Then it might be possible to determine how the heap has to be traversed to reach the insertion point or the node for deletion.
Basically, the statement quoted refers to the problem of resolving the location for insertion and deletion of data elements into and from the heap. In order to maintain "the shape property" of a binary heap, the lowest level of the heap must always be filled from left to right leaving no empty nodes. To maintain the average O(1) insertion and deletion times for the binary heap, you must be able to determine the location for the next insertion and the location of the last node on the lowest level to use for deletion of the root node, both in constant time.
For a binary heap stored in an array (with its implicit, compacted data structure as explained in the Wikipedia entry), this is easy. Just insert the newest data member at the end of the array and then "bubble" it into position (following the heap rules). Or replace the root with the last element in the array "bubbling down" for deletions. For heaps in array storage, the number of elements in the heap is an implicit pointer to where the next data element is to be inserted and where to find the last element to use for deletion.
For a binary heap stored in a tree structure, this information is not as obvious, but because it's a complete binary tree, it can be calculated. For example, in a complete binary tree with 4 elements, the point of insertion will always be the right child of the left child of the root node. The node to use for deletion will always be the left child of the left child of the root node. And for any given arbitrary tree size, the tree will always have a specific shape with well defined insertion and deletion points. Because the tree is a "complete binary tree" with a specific structure for any given size, it is very possible to calculate the location of insertion/deletion in O(1) time. However, the catch is that even when you know where it is structurally, you have no idea where the node will be in memory. So, you have to traverse the tree to get to the given node which is an O(log n) process making all inserts and deletions a minimum of O(log n), breaking the usually desired O(1) behavior. Any search ("depth-first", or some other) will be at least O(log n) as well because of the traversal issue noted and usually O(n) because of the random nature of the semi-sorted heap.
The trick is to be able to both calculate and reference those insertion/deletion points in constant time, either by augmenting the data structure ("threading" the tree, as mentioned in the Wikipedia article) or by using additional pointers.
The implementation which seems to me to be the easiest to understand, with low memory and extra coding overhead, is to just use a normal simple binary tree structure (using a pRoot and Node defined as [data, pParent, pLeftChild, pRightChild]) and add two additional pointers (pInsert and pLastNode). pInsert and pLastNode will both be updated during the insertion and deletion subroutines to keep them current when the data within the structure changes. This implementation gives O(1) access to both insertion point and last node of the structure and should allow preservation of overall O(1) behavior in both insertion and deletions. The cost of the implementation is two extra pointers and some minor extra code in the insertion/deletion subroutines (aka, minimal).
EDIT: added pseudocode for an O(1) insert()
Here is pseudo code for an insert subroutine which is O(1), on average:
define Node = [T data, *pParent, *pLeft, *pRight]
void insert(T data)
{
do_insertion( data ); // do insertion, update count of data items in tree
# assume: pInsert points to the node location in the tree where the insertion just took place
# (aka, either shuffle only data during the insertion or keep pInsert updated during the bubble process)
int N = this->CountOfDataItems + 1; # note: CountOfDataItems will always be > 0 (and pRoot != null) after an insertion
p = new Node( <null>, null, null, null); // new empty node for the next insertion
# update pInsert (three cases to handle)
if ( int(log2(N)) == log2(N) )
{# #1 - N is an exact power of two
# O(log2(N))
# tree is currently a full complete binary tree ("perfect")
# ... must start a new lower level
# traverse from pRoot down tree thru each pLeft until empty pLeft is found for insertion
pInsert = pRoot;
while (pInsert->pLeft != null) { pInsert = pInsert->pLeft; } # log2(N) iterations
p->pParent = pInsert;
pInsert->pLeft = p;
}
else if ( isEven(N) )
{# #2 - N is even (and NOT a power of 2)
# O(1)
p->pParent = pInsert->pParent;
pInsert->pParent->pRight = p;
}
else
{# #3 - N is odd
# O(1)
p->pParent = pInsert->pParent->pParent->pRight;
pInsert->pParent->pParent->pRight->pLeft = p;
}
pInsert = p;
// update pLastNode
// ... [similar process]
}
So, insert(T) is O(1) on average: most insertions need only a constant amount of pointer work, and the O(log N) walk down the left edge is needed only when the tree must grow by one level. That happens when N reaches a power of two, i.e. about log2(N) times over N insertions (assuming no deletions). An additional pointer (pLeftmostLeaf) could remove that walk and would also avoid the possible pathological case of alternating insertion and deletion in a perfect binary tree. (Adding pLeftmostLeaf is left as an exercise [it's fairly easy].)
This is my first time participating on Stack Overflow.
Yes, the above answer by Zach Scrivena (god, I don't know how to properly refer to other people, sorry) is right. What I want to add is a simplified way for the case where we are given the count of nodes.
The basic idea is:
Given the position N of the target node in this complete binary tree (for an insertion, the current node count plus one), compute N % 2 and push the result onto a stack, then replace N with N / 2 (integer division). Repeat until N == 1. Then pop the results: 1 means go right, 0 means go left. The popped sequence is the route from the root to the target position.
Example:
The tree now has 10 nodes, and I want to insert another node at position 11. How do we route to it?
11 % 2 = 1 --> right (the quotient is 5, and push right into stack)
5 % 2 = 1 --> right (the quotient is 2, and push right into stack)
2 % 2 = 0 --> left (the quotient is 1, and push left into stack. End)
Then pop the stack: left -> right -> right. This is the path from the root.
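The routing rule above can be sketched in a few lines of Python. This is only an illustration: the function name `route_to_position` is made up here, and positions are assumed to be numbered from 1 at the root in level order.

```python
def route_to_position(position):
    """Return the turns from the root to a 1-based level-order position."""
    stack = []
    n = position
    while n > 1:
        # Remainder 1 means the node is a right child, 0 means a left child.
        stack.append("right" if n % 2 == 1 else "left")
        n //= 2  # the quotient is the position of the parent
    # Popping the stack yields the turns in root-to-target order.
    path = []
    while stack:
        path.append(stack.pop())
    return path


print(route_to_position(11))  # ['left', 'right', 'right']
```

The explicit stack mirrors the description; reversing a list would do the same job.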
You could use the binary representation of the size of the binary heap to find the location of the last node in O(log N). The size can be stored and incremented, which takes O(1) time. The fundamental concept behind this is the structure of the binary tree.
Suppose our heap size is 7. The binary representation of 7 is "111". Always omit the first bit, which leaves "11". Read the remaining bits from left to right. The first bit is '1', so go to the right child of the root node. The remaining string is "1", whose first bit is '1', so again go to the right child of the current node. As there are no more bits to process, you have reached the last node. In short: convert the size of the heap into bits, omit the first bit, and then for each remaining bit, from left to right, go to the right child of the current node if the bit is '1' and to the left child if it is '0'.
As you always go to the very end of the binary tree, this operation always takes O(log N) time. This is a simple and accurate procedure for finding the last node.
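As a rough sketch of the bit walk described above, assuming a linked tree whose nodes have `left` and `right` fields (the `Node` class and the helper `build_complete_tree` are hypothetical and exist only to make the example runnable):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def build_complete_tree(values):
    """Link nodes so that values[i] sits at level-order position i + 1."""
    nodes = [Node(v) for v in values]
    for i, node in enumerate(nodes):
        left, right = 2 * i + 1, 2 * i + 2
        if left < len(nodes):
            node.left = nodes[left]
        if right < len(nodes):
            node.right = nodes[right]
    return nodes[0] if nodes else None


def find_last_node(root, size):
    """Follow the bits of `size`, minus the leading 1, down from the root."""
    node = root
    # bin(7) == '0b111'; slicing from index 3 drops '0b' and the first bit.
    for bit in bin(size)[3:]:
        node = node.right if bit == "1" else node.left
    return node


root = build_complete_tree([10, 20, 30, 40, 50, 60, 70])
print(find_last_node(root, 7).value)  # 70
```

Passing `size + 1` and stopping one bit early would lead to the parent of the next insertion slot instead.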
You may not understand it on the first reading. Try working through this method on paper for different heap sizes, and I'm sure you'll get the intuition behind it. I think this is enough to solve your problem; if you want more explanation with figures, you can refer to my blog.
Hope my answer has helped you, if it did, let me know...! ☺
How about performing a depth-first search, visiting the left child before the right child, to determine the height of the tree? Thereafter, the first leaf you encounter at a shallower depth, or the first parent with a missing child, indicates where you should place the new node before "bubbling up".
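A minimal Python sketch of this search, under the assumption that the tree is a complete binary tree built from nodes with `left` and `right` fields. The names `Node`, `build_complete_tree` and `find_insertion_parent` are invented for the example; it treats "a leaf at a shallower depth" and "a parent with a missing child" as the same test, namely a node one level above the bottom that lacks a child.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def build_complete_tree(values):
    """Helper for the demo: values[i] sits at level-order position i + 1."""
    nodes = [Node(v) for v in values]
    for i, node in enumerate(nodes):
        if 2 * i + 1 < len(nodes):
            node.left = nodes[2 * i + 1]
        if 2 * i + 2 < len(nodes):
            node.right = nodes[2 * i + 2]
    return nodes[0]


def find_insertion_parent(root):
    """Return the node that should receive the next child."""
    # In a complete tree the leftmost path gives the height (in edges).
    height = 0
    leftmost = root
    while leftmost.left is not None:
        leftmost = leftmost.left
        height += 1
    if height == 0:
        return root  # single node: the new node becomes its left child

    def dfs(node, depth):
        # Only nodes one level above the bottom can have a free slot.
        if depth == height - 1:
            if node.left is None or node.right is None:
                return node
            return None
        found = dfs(node.left, depth + 1)  # left child before right child
        if found is None:
            found = dfs(node.right, depth + 1)
        return found

    parent = dfs(root, 0)
    # A full last level means a new level starts under the leftmost leaf.
    return parent if parent is not None else leftmost


root = build_complete_tree(list(range(1, 13)))  # 12 nodes, labelled by position
print(find_insertion_parent(root).value)  # 6
```

The search may visit a large part of the tree, which is what the node-count approach avoids.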
The depth-first search (DFS) approach above doesn't assume that you know the total number of nodes in the tree. If this information is available, then we can "zoom-in" quickly to the desired place, by making use of the properties of complete binary trees:
Let N be the total number of nodes in the tree, and H be the height of the tree.
Some values of (N,H) are (1,0), (2,1), (3,1), (4,2), ..., (7,2), (8, 3).
The general formula relating the two is H = ceil[log2(N+1)] - 1.
Now, given only N, we want to traverse from the root to the position for the new node, in the least number of steps, i.e. without any "backtracking".
We first compute the total number of nodes M in a perfect binary tree of height H = ceil[log2(N+1)] - 1, which is M = 2^(H+1) - 1.
If N == M, then our tree is perfect, and the new node should be added in a new level. This means that we can simply perform a DFS (left before right) until we hit the first leaf; the new node becomes the left child of this leaf. End of story.
However, if N < M, then there are still vacancies in the last level of our tree, and the new node should be added to the leftmost vacant spot.
The number of nodes that are already at the last level of our tree is just (N - 2^H + 1).
This means that the new node takes spot X = (N - 2^H + 2) from the left, at the last level.
Now, to get there from the root, you will need to make the correct turns (L vs R) at each level so that you end up at spot X at the last level. In practice, you would determine the turns with a little computation at each level. However, I think the following table shows the big picture and the relevant patterns without getting mired in the arithmetic (you may recognize this as a form of arithmetic coding for a uniform distribution):
0 0 0 0 0 X 0 0   <--- represents the last level in our tree, X marks the spot!
          ^
L L L L R R R R   <--- at level 0, proceed to the R child
L L R R L L R R   <--- at level 1, proceed to the L child
L R L R L R L R   <--- at level 2, proceed to the R child
          ^       (which is the position of the new node)
          this column tells us
          if we should proceed to the L or R child at each level
EDIT: Added a description on how to get to the new node in the shortest number of steps assuming that we know the total number of nodes in the tree.
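The arithmetic described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the answer's own code: the function name `turns_for_new_node` is hypothetical, N is assumed to be at least 1, and `int.bit_length()` stands in for ceil[log2(N+1)] to avoid floating-point rounding. The per-level decision simply halves the range of last-level slots that lie below the current node.

```python
def turns_for_new_node(n):
    """Return the L/R turns from the root to the slot for node n + 1 (n >= 1)."""
    height = n.bit_length() - 1            # H = ceil(log2(n + 1)) - 1 for n >= 1
    perfect_size = 2 ** (height + 1) - 1   # M: size of a perfect tree of height H
    if n == perfect_size:
        # The last level is full: follow left children down to the leftmost
        # leaf, then take one more left turn to start the new level.
        return ["L"] * (height + 1)
    spot = n - 2 ** height + 2             # X: 1-based slot on the last level
    lo, hi = 1, 2 ** height                # slots reachable from the current node
    turns = []
    for _ in range(height):
        mid = (lo + hi) // 2
        if spot <= mid:
            turns.append("L")              # target lies in the left half
            hi = mid
        else:
            turns.append("R")              # target lies in the right half
            lo = mid + 1
    return turns


# 12 nodes: H = 3 and X = 6, the situation drawn in the table above.
print(turns_for_new_node(12))  # ['R', 'L', 'R']
```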
A solution for the case where nodes have no reference to their parent:
To find the right place for the next node, there are 3 cases to handle:
case (1) The tree is perfect (the last level is full), i.e. N + 1 is a power of two
case (2) The node count N is even
case (3) The node count N is odd
Insert:
void Insert(Node root,Node n)
{
Node parent = findRequiredParentToInsertNewNode (root);
if(parent.left == null)
parent.left = n;
else
parent.right = n;
}
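Find the parent under which the new node must be inserted:
Node findRequiredParentToInsertNewNode(Node root){
    Node last = findLastNode(root);
    //Case 1: the tree is perfect (NodeCount + 1 is a power of two), so start a new level
    if(((NodeCount + 1) & NodeCount) == 0){
        while(root.left != null)
            root = root.left;
        return root;
    }
    //Case 2: the last node is a left child, so its parent still has a free right slot
    else if(NodeCount % 2 == 0){
        return findParentOfLastNode(root, last);
    }
    //Case 3: the last node is a right child, so move to the next node on the parent's level
    else {
        Node p = findParentOfLastNode(root, last);
        Node g = findParentOfLastNode(root, p);
        while(g.right == p){ // climb while p is a right child
            p = g;
            g = findParentOfLastNode(root, p);
        }
        Node n = g.right; // cross over to the sibling subtree
        while(n.left != null) // and descend to its leftmost node
            n = n.left;
        return n;
    }
}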
To find the last node you need to perform a BFS (breadth first search) and get the last element in the queue
Node findLastNode(Node root)
{
if (root.left == null)
return root;
Queue q = new Queue();
q.enqueue(root);
Node n = null;
while(!q.isEmpty()){
n = q.dequeue();
if ( n.left != null )
q.enqueue(n.left);
if ( n.right != null )
q.enqueue(n.right);
}
return n;
}
Find the parent of the last node, so that its link to the last node can be set to null when the last node replaces the root during a removal:
Node findParentOfLastNode(Node root ,Node lastNode)
{
if(root == null)
return root;
if( root.left == lastNode || root.right == lastNode )
return root;
Node n1= findParentOfLastNode(root.left,lastNode);
Node n2= findParentOfLastNode(root.right,lastNode);
return n1 != null ? n1 : n2;
}
I know this is an old thread, but I was looking for an answer to the same question. I could not afford an O(log n) solution, as I had to find the last node thousands of times in a few seconds. I did have an O(log n) algorithm, but my program was crawling because of the number of times it performed this operation. After much thought I finally found a fix for this. Not sure if anybody thinks this is interesting.
This solution is O(1) for search. For insertion it is definitely less than O(log n), although I cannot say it is O(1).
Just wanted to add that if there is interest, I can provide my solution as well.
The solution is to add the nodes in the binary heap to a queue. Every queue node has front and back pointers. We keep adding nodes to the end of this queue from left to right until we reach the last node in the binary heap. At this point, the last node in the binary heap is at the rear of the queue.
Every time we need to find the last node, we dequeue from the rear, and the second-to-last node now becomes the last node in the tree.
When we want to insert, we search backwards from the rear for the first node where we can insert and put the new node there. It is not exactly O(1), but it reduces the running time dramatically.
