Binary Tree Find number nearest and greater than the key - algorithm

Say I have a balanced binary tree. I wish to search for a key k in the tree. However, if k doesn't exist in the binary tree, it should give me the next greatest number nearest to k.
For example, suppose I have the numbers [1,5,6,8,10] as keys in the tree. If I search for 7 it should return 8, and if I search for 2 it should return 5, etc.
What would have to be the modifications in the binary tree to be able to perform such a search? I want an O(log n) solution as well please.

Assuming you mean "binary search tree" rather than "binary tree", you don't need any modifications to find the minimum element y in the tree such that y >= x.
search(n, x, best_so_far) ::=
    if n == nil { return best_so_far }
    if n.value == x { return x }
    if n.value > x { return search(n.left, x, min(best_so_far, n.value)) }
    if n.value < x { return search(n.right, x, best_so_far) }
You would call this function as search(root, x, +infinity).
The idea is that if you're exploring the left branch at a node n, you don't need to consider anything to the right of n: n.value is larger than x, and everything to the right is larger than n.value. Similarly, if you're exploring the right branch of a node n, then you can discard everything to the left of n: n.value is smaller than x and everything to the left of n is smaller than n.value.
The code's runtime is bounded by the height of the tree, so is O(log n) if the tree is balanced.
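For reference, here is a minimal iterative sketch of the same search in Python. The `Node` class and the name `ceiling` are assumptions made for this illustration and are not part of the pseudocode above; `math.inf` plays the role of +infinity.

```python
import math

class Node:
    # Hypothetical minimal BST node used only for this sketch.
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def ceiling(node, x, best_so_far=math.inf):
    """Return the smallest value >= x in the BST, or math.inf if there is none."""
    while node is not None:
        if node.value == x:
            return x
        if node.value > x:
            # node.value is a candidate; anything better must be to the left.
            best_so_far = min(best_so_far, node.value)
            node = node.left
        else:
            # node.value and its whole left subtree are too small.
            node = node.right
    return best_so_far

# Keys [1, 5, 6, 8, 10] from the question, arranged as a BST of height 2.
root = Node(6, Node(1, None, Node(5)), Node(8, None, Node(10)))
print(ceiling(root, 7), ceiling(root, 2))
```

The loop visits one node per level, so the cost is proportional to the height of the tree.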

Related

Construct a binary tree from permutation in n log n time

The numbers 1 to n are inserted in a binary search tree in a specified order p_1, p_2,..., p_n. Describe an O(nlog n) time algorithm to construct the resulting final binary search tree.
Note that :-
I don't need average time n log n, but the worst time.
I need the exact tree that results when insertion takes place with the usual rules. AVL or red-black trees are not allowed.
This is an assignment question. It is very very non trivial. In fact it seemed impossible at first glance. I have thought on it much. My observations:-
The argument that we use to prove that sorting takes at least n log n time does not eliminate the existence of such an algorithm here.
If it is always possible to find a subtree in O(n) time whose size is between two fractions of the size of tree, the problem can be easily solved.
Choosing median or left child of root as root of subtree doesn't work.
The trick is not to use the constructed BST for lookups. Instead, keep an additional, balanced BST for lookups. Link the leaves.
For example, we might have
Constructed         Balanced
        3               2
       / \            /   \
      2   D          1     3
     / \            / |   | \
    1   C          a  b   c  d
   / \
  A   B
where a, b, c, d are pointers to A, B, C, D respectively, and A, B, C, D are what would normally be null pointers.
To insert, insert into the balanced BST first (O(log n)), follow the pointer to the constructed tree (O(1)), do the constructed insert (O(1)), and relink the new leaves (O(1)).
As David Eisenstat doesn't have time to extend his answer, I'll try to put more details into a similar algorithm.
Intuition
The main intuition behind the algorithm is based on the following statements:
statement #1: if a BST contains values a and b (a < b) AND there are no values between them, then either A (node for value a) is a (possibly indirect) parent of B (node for value b) or B is a (possibly indirect) parent of A.
This statement is obviously true because if their lowest common ancestor C is some other node than A and B, its value c must be between a and b. Note that statement #1 is true for any BST (balanced or unbalanced).
statement #2: if a simple (unbalanced) BST contains values a and b (a < b) AND there are no values between them AND we are trying to add value x such that a < x < b, then X (node for value x) will be either direct right (greater) child of A or direct left (less) child of B whichever node is lower in the tree.
Let's assume that the lower of two nodes is a (the other case is symmetrical). During insertion phase value x will travel the same path as a during its insertion because tree doesn't contain any values between a and x i.e. at any comparison values a and x are indistinguishable. It means that value x will navigate tree till node A and will pass node B at some earlier step (see statement #1). As x > a it should become a right child of A. Direct right child of A must be empty at this point because A is in B's subtree i.e. all values in that subtree are less than b and since there are no values between a and b in the tree, no value can be right child of node A.
Note that statement #2 might potentially be not true for some balanced BST after re-balancing was performed although this should be a strange case.
statement #3: in a balanced BST for any value x not in the tree yet, you can find closest greater and closest less values in O(log(N)) time.
This follows directly from statements #1 and #2: all you need is find the potential insertion point for the value x in the BST (takes O(log(N))), one of the two values will be direct parent of the insertion point and to find the other you need to travel the tree back to the root (again takes O(log(N))).
So now the idea behind the algorithm becomes clear: for fast insertion into an unbalanced BST we need to find nodes with closest less and greater values. We can easily do it if we additionally maintain a balanced BST with the same keys as our target (unbalanced) BST and with corresponding nodes from that BST as values. Using that additional data structure we can find insertion point for each new value in O(log(N)) time and update this data structure with new value in O(log(N)) time as well.
Algorithm
1. Init "main" root and balancedRoot with null.
2. for each value x in the list do:
3. if this is the first value just add it as the root nodes to both trees and go to #2
4. in the tree specified by balancedRoot find nodes that correspond to the closest less (BalancedA, points to node A in the main BST) and closest greater (BalancedB, points to node B in the main BST) values.
5. If there is no closest lower value i.e. we are adding minimum element, add it as the left child to the node B
6. If there is no closest greater value i.e. we are adding maximum element, add it as the right child to the node A
7. Find whichever of nodes A or B is lower in the tree. You can use explicit level stored in the node. If the lower node is A (less node), add x as the direct right child of A else add x as the direct left child of B (greater node). Alternatively (and more cleverly) you may notice that from the statements #1 and #2 follows that exactly one of the two candidate insert positions (A's right child or B's left child) will be empty and this is where you want to insert your value x.
8. Add value x to the balanced tree (might re-use from step #4).
9. Go to step #2
As no inner step of the loop takes more than O(log(N)), total complexity is O(N*log(N))
Java implementation
I'm too lazy to implement balanced BST myself so I used standard Java TreeMap that implements Red-Black tree and has useful lowerEntry and higherEntry methods that correspond to step #4 of the algorithm (you may look at the source code to ensure that both are actually O(log(N))).
import java.util.Map;
import java.util.TreeMap;
public class BSTTest {
static class Node {
public final int value;
public Node left;
public Node right;
public Node(int value) {
this.value = value;
}
public boolean compareTree(Node other) {
return compareTrees(this, other);
}
public static boolean compareTrees(Node n1, Node n2) {
if ((n1 == null) && (n2 == null))
return true;
if ((n1 == null) || (n2 == null))
return false;
if (n1.value != n2.value)
return false;
return compareTrees(n1.left, n2.left) &&
compareTrees(n1.right, n2.right);
}
public void assignLeftSafe(Node child) {
if (this.left != null)
throw new IllegalStateException("left child is already set");
this.left = child;
}
public void assignRightSafe(Node child) {
if (this.right != null)
throw new IllegalStateException("right child is already set");
this.right = child;
}
@Override
public String toString() {
return "Node{" +
"value=" + value +
'}';
}
}
static Node insertToBst(Node root, int value) {
if (root == null)
root = new Node(value);
else if (value < root.value)
root.left = insertToBst(root.left, value);
else
root.right = insertToBst(root.right, value);
return root;
}
static Node buildBstDirect(int[] values) {
Node root = null;
for (int v : values) {
root = insertToBst(root, v);
}
return root;
}
static Node buildBstSmart(int[] values) {
Node root = null;
TreeMap<Integer, Node> balancedTree = new TreeMap<Integer, Node>();
for (int v : values) {
Node node = new Node(v);
if (balancedTree.isEmpty()) {
root = node;
} else {
Map.Entry<Integer, Node> lowerEntry = balancedTree.lowerEntry(v);
Map.Entry<Integer, Node> higherEntry = balancedTree.higherEntry(v);
if (lowerEntry == null) {
// adding minimum value
higherEntry.getValue().assignLeftSafe(node);
} else if (higherEntry == null) {
// adding max value
lowerEntry.getValue().assignRightSafe(node);
} else {
// adding some middle value
Node lowerNode = lowerEntry.getValue();
Node higherNode = higherEntry.getValue();
if (lowerNode.right == null)
lowerNode.assignRightSafe(node);
else
higherNode.assignLeftSafe(node);
}
}
// update balancedTree
balancedTree.put(v, node);
}
return root;
}
public static void main(String[] args) {
int[] input = new int[]{7, 6, 9, 4, 1, 8, 2, 5, 3};
Node directRoot = buildBstDirect(input);
Node smartRoot = buildBstSmart(input);
System.out.println(directRoot.compareTree(smartRoot));
}
}
Here's a linear-time algorithm. (I said that I wasn't going to work on this question, so if you like this answer, please award the bounty to SergGr.)
Create a doubly linked list with nodes 1..n and compute the inverse of p. For i from n down to 1, let q be the left neighbor of p_i in the list, and let r be the right neighbor. If p^-1(q) > p^-1(r), then make p_i the right child of q. If p^-1(q) < p^-1(r), then make p_i the left child of r. Delete p_i from the list.
In Python:
class Node(object):
    __slots__ = ('left', 'key', 'right')
    def __init__(self, key):
        self.left = None
        self.key = key
        self.right = None
def construct(p):
    # Validate the input.
    p = list(p)
    n = len(p)
    assert set(p) == set(range(n))  # 0 .. n-1
    # Compute p^-1.
    p_inv = [None] * n
    for i in range(n):
        p_inv[p[i]] = i
    # Set up the list.
    nodes = [Node(i) for i in range(n)]
    for i in range(n):
        if i >= 1:
            nodes[i].left = nodes[i - 1]
        if i < n - 1:
            nodes[i].right = nodes[i + 1]
    # Process p.
    for i in range(n - 1, 0, -1):  # n-1, n-2 .. 1
        q = nodes[p[i]].left
        r = nodes[p[i]].right
        if r is None or (q is not None and p_inv[q.key] > p_inv[r.key]):
            print(p[i], 'is the right child of', q.key)
        else:
            print(p[i], 'is the left child of', r.key)
        # Delete p[i] from the doubly linked list.
        if q is not None:
            q.right = r
        if r is not None:
            r.left = q
construct([1, 3, 2, 0])
Here's my O(n log^2 n) attempt that doesn't require building a balanced tree.
Put nodes in an array in their natural order (1 to n). Also link them into a linked list in the order of insertion. Each node stores its order of insertion along with the key.
The algorithm goes like this.
The input is a node in the linked list, and a range (low, high) of indices in the node array.
Call the input node root; its key is rootkey. Unlink it from the list.
Determine which subtree of the input node is smaller.
Traverse the corresponding array range, unlink each node from the linked list, then link them in a separate linked list and sort the list again in the insertion order.
Heads of the two resulting lists are children of the input node.
Perform the algorithm recursively on children of the input node, passing ranges (low, rootkey-1) and (rootkey+1, high) as index ranges.
The sorting operation at each level gives the algorithm the extra log n complexity factor.
Here's an O(n log n) algorithm that can also be adapted to O(n log log m) time, where m is the range, by using a Y-fast trie rather than a balanced binary tree.
In a binary search tree, lower values are left of higher values. The order of insertion corresponds with the right-or-left node choices when traveling along the final tree. The parent of any node, x, is either the least higher number previously inserted or the greatest lower number previously inserted, whichever was inserted later.
We can identify and connect the listed nodes with their correct parents using the logic above in O(n log n) worst-time by maintaining a balanced binary tree with the nodes visited so far as we traverse the order of insertion.
Explanation:
Let's imagine a proposed lower parent, p. Now imagine there's a number, l > p but still lower than x, inserted before p. Either (1) p passed l during insertion, in which case x would have had to pass l to get to p but that contradicts that x must have gone right if it reached l; or (2) p did not pass l, in which case p is in a subtree left of l but that would mean a number was inserted that's smaller than l but greater than x, a contradiction.
Clearly, a number, l < x, greater than p that was inserted after p would also contradict p as x's parent since either (1) l passed p during insertion, which means p's right child would have already been assigned when x was inserted; or (2) l is in a subtree to the right of p, which again would mean a number was inserted that's smaller than l but greater than x, a contradiction.
Therefore, for any node, x, with a lower parent, that parent must be the greatest number lower than and inserted before x. Similar logic covers the scenario of a higher proposed parent.
Now let's imagine x's parent, p < x, was inserted before h, the lowest number greater than and inserted before x. Then either (1) h passed p, in which case p's right node would have been already assigned when x was inserted; or (2) h is in a subtree right of p, which means a number lower than h and greater than x was previously inserted but that would contradict our assertion that h is the lowest number inserted so far that's greater than x.
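The parent rule above can be sketched in Python as follows. This is only an illustration: it keeps the keys seen so far in a sorted list via `bisect`, so each insertion costs O(n) rather than O(log n); a balanced tree or Y-fast trie would be needed to reach the stated bounds. The function name `parents_by_neighbors` is made up for this sketch.

```python
import bisect

def parents_by_neighbors(order):
    """Map each key to its parent key (None for the root) in the BST
    obtained by inserting the keys of `order` one by one."""
    sorted_keys = []   # keys inserted so far, kept sorted
    inserted_at = {}   # key -> position in the insertion order
    parent = {}
    for t, x in enumerate(order):
        i = bisect.bisect_left(sorted_keys, x)
        lower = sorted_keys[i - 1] if i > 0 else None           # greatest lower key so far
        higher = sorted_keys[i] if i < len(sorted_keys) else None  # least higher key so far
        if lower is None:
            parent[x] = higher          # also covers the root: both are None
        elif higher is None:
            parent[x] = lower
        else:
            # Whichever neighbor was inserted later is the parent.
            parent[x] = lower if inserted_at[lower] > inserted_at[higher] else higher
        sorted_keys.insert(i, x)        # O(n) stand-in for a balanced-tree insert
        inserted_at[x] = t
    return parent

print(parents_by_neighbors([7, 6, 9, 4, 1, 8, 2, 5, 3]))
```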
Since this is an assignment, I'm posting a hint instead of an answer.
Sort the numbers, while keeping the insertion order. Say you have input: [1,7,3,5,8,2,4]. Then after sorting you will have [[1,0], [2,5], [3,2], [4, 6], [5,3], [7,1], [8,4]] . This is actually the in-order traversal of the resulting tree. Think hard about how to reconstruct the tree given the in-order traversal and the insertion order (this part will be linear time).
More hints coming if you really need them.
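For readers who want to check their reconstruction, one possible way to do the linear-time part is a stack-based pass over the sorted pairs, in the style of a Cartesian tree build (my reading of the hint, not necessarily what the author had in mind). The name `build_from_sorted` and the dictionary output format are assumptions for this sketch.

```python
def build_from_sorted(pairs):
    """pairs: (key, insertion_index) tuples sorted by key.
    Returns {key: (left_child_key, right_child_key)}."""
    children = {key: [None, None] for key, _ in pairs}
    stack = []  # right spine of the tree built so far; insertion indices increase upward
    for key, t in pairs:
        last_popped = None
        # Keys inserted after `key` cannot be its ancestors, so pop them.
        while stack and stack[-1][1] > t:
            last_popped = stack.pop()[0]
        children[key][0] = last_popped           # the popped chain hangs to the left
        if stack:
            children[stack[-1][0]][1] = key      # key becomes the right child of the spine top
        stack.append((key, t))
    return {k: tuple(v) for k, v in children.items()}

order = [1, 7, 3, 5, 8, 2, 4]
pairs = sorted((key, t) for t, key in enumerate(order))
tree = build_from_sorted(pairs)
print(tree)
```

Each key is pushed and popped at most once, so the pass itself is linear; the sort dominates unless the keys are 1..n and can be bucketed directly.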

how to find the position of right most node in last level of complete binary tree?

I am working on binary tree problems and came across this one: find the rightmost node in the last level of a complete binary tree. Doing it in O(n) is simple, by traversing all the elements, but the time bound is where I am stuck. Is there a way to do this in any complexity less than O(n)? I have searched the internet a lot and couldn't find anything about it.
Thanks in advance.
Yes, you can do it in O(log(n)^2) by doing a variation of binary search.
This can be done by first going to the leftmost element(1), then to the 2nd leftmost element, then to the 4th, the 8th, ... until you find there is no such element.
Let's say the last element you found was the ith, and the first one you didn't find was the 2i-th.
Now you can simply do a binary search over that range.
This is O(log(n/2)) = O(log n) total iterations, and since each iteration goes down the entire tree, the total is O(log(n)^2) time.
(1) Here and in what follows, the "xth leftmost element" refers only to the nodes in the deepest level of the tree.
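A rough Python sketch of this idea follows. It uses a plain binary search over all positions of the deepest level instead of the doubling phase described above, which gives the same O(log(n)^2) bound. The `Node` class, `build_complete`, `leaf_at` and `last_node` are names invented for the illustration.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def build_complete(values, i=0):
    """Helper for the demo: build a complete tree from a level-order list."""
    if i >= len(values):
        return None
    return Node(values[i], build_complete(values, 2 * i + 1), build_complete(values, 2 * i + 2))

def leaf_at(root, height, index):
    """Return the index-th node (0-based, left to right) of the deepest level, or None."""
    node = root
    for bit in range(height - 1, -1, -1):
        if node is None:
            return None
        # The bits of index, most significant first, spell the left/right turns.
        node = node.right if (index >> bit) & 1 else node.left
    return node

def last_node(root):
    """Rightmost node on the last level of a complete binary tree."""
    if root is None:
        return None
    height, node = 0, root
    while node.left is not None:    # the leftmost path always reaches the deepest level
        height += 1
        node = node.left
    lo, hi = 0, (1 << height) - 1   # candidate positions on the deepest level
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if leaf_at(root, height, mid) is not None:
            lo = mid
        else:
            hi = mid - 1
    return leaf_at(root, height, lo)

print(last_node(build_complete(list(range(1, 7)))).val)
```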
I assume that you know the number of nodes. Let n be that number.
In a complete binary tree, level i has twice as many nodes as level i - 1.
So you can repeatedly divide n by 2 (numbering the nodes 1..n in level order, with the root as 1). If there is a remainder, then n is a right child; otherwise, it is a left child. Store in a sequence, preferably a stack, whether there was a remainder or not.
Something like:
Stack<char> s;
while (n > 1)
{
    if (n % 2 == 0)
        s.push('L');
    else
        s.push('R');
    n = n / 2; // n is an int, so the division rounds down
}
When the while finishes, the stack contains the path to the rightmost node.
The number of times that the while is executed is log_2(n).
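As a sketch of how the stack would then be used, here is the same computation in Python together with a walk down the tree. It assumes the nodes are numbered 1..n in level order with the root as 1; the `Node` class and the function names are made up for this example.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def path_to_last(n):
    """Path from the root to node number n, as a string of 'L'/'R' steps."""
    steps = []
    while n > 1:
        steps.append('L' if n % 2 == 0 else 'R')
        n //= 2                       # move to the parent's number
    return ''.join(reversed(steps))   # popping the stack gives root-to-node order

def follow(root, path):
    """Walk from root along the given 'L'/'R' steps."""
    node = root
    for step in path:
        node = node.left if step == 'L' else node.right
    return node

# Complete tree with 6 nodes, values equal to their level-order numbers.
root = Node(1, Node(2, Node(4), Node(5)), Node(3, Node(6)))
print(path_to_last(6), follow(root, path_to_last(6)).val)
```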
This is the recursive solution with time complexity O(lg n* lg n) and O(lg n) space complexity (considering stack storage space).
Space complexity can be reduced to O(1) using Iterative version of the below code.
// helper function: number of nodes on the leftmost path starting at node
int getLeftHeight(TreeNode * node) {
    int c = 0;
    while (node) {
        c++;
        node = node -> left;
    }
    return c;
}
int getRightMostElement(TreeNode * node) {
    int h = getLeftHeight(node);
    // base case: a single node, which is the rightmost element we are looking for
    if (h == 1)
        return node -> val;
    // answer lies in the right subtree
    else if ((h - 1) == getLeftHeight(node -> right))
        return getRightMostElement(node -> right);
    // answer lies in the left subtree
    else
        return getRightMostElement(node -> left);
}
Time Complexity derivation -
At each recursion step, we are considering either left subtree or right subtree i.e. n/2 elements for maximum height (lg n) function calls,
calculating height takes lg n time -
T(n) = T(n/2) + c1 lgn
= T(n/4) + c1 lgn + c2 (lgn - 1)
= ...
= T(1) + c [lgn + (lgn-1) + (lgn-2) + ... + 1]
= O(lgn*lgn)
Since it's a complete binary tree, going over all the right nodes until you reach the leaves will take O(logN), not O(N). In a regular binary tree it takes O(N) because in the worst case all the nodes are lined up to the right, but since it's a complete binary tree, that can't happen.

Count number of nodes within range inside Binary Search Tree in O(LogN)

Given a BST and two integers 'a' and 'b' (a < b), how can we find the number of nodes such that , a < node value < b, in O(log n)?
I know one can easily find the position of a and b in LogN time, but how to count the nodes in between without doing a traversal, which is O(n)?
In each node of your Binary Search Tree, also keep count of the number of values in the tree that are lesser than its value (or, for a different tree design mentioned in the footnote below, the nodes in its left subtree).
Now, first find the node containing the value a. Get the count of values lesser than a which has been stored in this node. This step is Log(n).
Now find the node containing the value b. Get the count of values lesser than b which are stored in this node. This step is also Log(n).
Subtract the two counts and you have the number of nodes between a and b. Total complexity of this search is 2*Log(n) = O(Log(n)).
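A small sketch of the subtree-count variant, assuming integer keys and a `size` field holding the number of nodes in each subtree (both the field and the function names are hypothetical). Unlike the description above, it does not require a and b to be present in the tree.

```python
def size(node):
    return node.size if node is not None else 0

class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.size = 1 + size(left) + size(right)  # nodes in this subtree

def count_less(root, x):
    """Number of keys strictly less than x, in O(height) steps."""
    count, node = 0, root
    while node is not None:
        if node.key < x:
            # node and its whole left subtree are smaller than x.
            count += 1 + size(node.left)
            node = node.right
        else:
            node = node.left
    return count

def count_between(root, a, b):
    """Number of keys k with a < k < b (integer keys assumed, hence a + 1)."""
    return count_less(root, b) - count_less(root, a + 1)

root = Node(5, Node(3, Node(1), Node(4)), Node(7, Node(6), Node(8)))
print(count_between(root, 3, 7))
```

In a real tree the `size` fields would also have to be updated on every insert, delete and rotation.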
See this video. The professor explains your question here by using Splay Trees.
Simple solution:
Start checking from the root node.
If the node falls within the range, increase the count by 1 and check the left and right children recursively.
If the node is not within the range, compare its value with the range. If both range values are less than the node's value, the only possible matches are in the left subtree. Otherwise, check the right subtree.
Here is the sample code. Hope it clears.
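int nodesWithInRange(Node node, int x, int y) {
    if (node == null) {
        return 0;
    } else if (node.data == x && node.data == y) {
        return 1;
    } else if (node.data >= x && node.data <= y) {
        // node is inside [x, y]: count it and search both subtrees
        return 1 + nodesWithInRange(node.left, x, y) + nodesWithInRange(node.right, x, y);
    } else if (node.data > x && node.data > y) {
        // node is above the range: only the left subtree can contain matches
        return nodesWithInRange(node.left, x, y);
    } else {
        // node is below the range: only the right subtree can contain matches
        return nodesWithInRange(node.right, x, y);
    }
}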
Time Complexity :- O(logn)+ O(K)
K is the number of elements between x and y.
It's not very ideal but good in case you would not like to modify the Binary Tree nodes definition.
Store the inorder traversal of the BST in an array (it will be sorted). Binary-searching for 'a' and 'b' takes log(n) time each; take the difference of their indices to get the number of nodes in the range 'a' to 'b'.
Space complexity: O(n).
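With the inorder traversal already in a sorted Python list, the two binary searches could look like this sketch (`count_between_sorted` is a made-up name; `bisect` is from the standard library). It counts keys strictly between a and b, as in the question.

```python
import bisect

def count_between_sorted(sorted_keys, a, b):
    """Number of keys k with a < k < b in a sorted list."""
    first_above_a = bisect.bisect_right(sorted_keys, a)  # index of first key > a
    first_at_or_above_b = bisect.bisect_left(sorted_keys, b)  # index of first key >= b
    return max(0, first_at_or_above_b - first_above_a)

inorder = [1, 3, 4, 5, 6, 7, 8]
print(count_between_sorted(inorder, 3, 7))
```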
Idea is simple.
Traverse the BST starting from root.
For every node check if it lies in range.
If it lies in range then count++. And recur for both of its children.
If current node is smaller than low value of range, then recur for right child, else recur for left child.
Time complexity will be O(height + number of nodes in range)..
As for your question of why it is not O(n):
We are not traversing the whole tree, that is, all n nodes. We only traverse the subtrees that can contain values in the range, as decided by the parent's data.
Pseudocode
int findCountInRange(Node root, int a, int b){
    if(root==null)
        return 0;
    if(root->data >= a && root->data <= b)
        return 1 + findCountInRange(root->left, a, b)+findCountInRange(root->right, a, b);
    else if(root->data < a)
        return findCountInRange(root->right, a, b);
    else
        return findCountInRange(root->left, a, b);
}

Time Complexity for Finding the Minimum Value of a Binary Tree

I wrote a recursive function for finding the min value of a binary tree (assume that it is not ordered).
The code is as below.
//assume node values are positive int; 0 stands for "empty subtree".
int minValue (Node n) {
    if(n == null) return 0;
    int leftmin = minValue(n.left);
    int rightmin = minValue(n.right);
    return min(n.data, leftmin, rightmin);
}
//minimum of a and whichever of b, c are non-zero (0 marks an empty subtree)
int min (int a, int b, int c) {
    int min = a;
    if(b != 0 && b < min) min = b;
    if(c != 0 && c < min) min = c;
    return min;
}
I guess the time complexity of the minValue function is O(n) by intuition.
Is this correct? Can someone show the formal proof of the time complexity of minValue function?
Assuming your binary tree is not ordered, your algorithm has O(N) running time, so your intuition is correct. The reason is that the minimum could be anywhere, so the function has to visit every node in the tree once, doing a constant amount of work at each. But this assumes that the tree is completely unordered.
For a sorted and balanced binary tree, searching will take O(logN). The reason for this is that the search will only ever have to traverse one single path down the tree. A balanced tree with N nodes will have a height of log(N), and this explains the complexity for searching. Consider the following tree for example:
5
/ \
3 7
/ \ / \
1 4 6 8
There are 7 nodes in the tree, but the height is only floor(log2(7)) = 2. You can convince yourself that you only ever have to follow a single root-to-leaf path through this tree to find a value or fail to do so.
Note that for a binary tree which is not balanced these complexities may not apply.
The number of comparisons is n-1. The proof is an old chestnut, usually applied to the problem of saying how many matches are needed in a single-elimination tennis tournament. Each comparison removes exactly one number from consideration, so if there are initially n numbers in the tree, you need n-1 comparisons to reduce that to 1.
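To see the n-1 count concretely, here is a small Python sketch that returns the minimum of an unordered tree together with the number of comparisons it made. The `Node` class and `min_and_comparisons` are names chosen for the example; it sidesteps the 0-as-sentinel convention from the question by skipping empty children.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def min_and_comparisons(node):
    """Return (minimum value, comparisons used) for a non-empty tree."""
    best, comparisons = node.value, 0
    for child in (node.left, node.right):
        if child is not None:
            child_min, child_cmp = min_and_comparisons(child)
            comparisons += child_cmp + 1   # one comparison merges the child's result
            if child_min < best:
                best = child_min
    return best, comparisons

# An unordered tree with 7 nodes.
tree = Node(5, Node(3, Node(9), Node(4)), Node(7, Node(2), Node(8)))
print(min_and_comparisons(tree))
```

Every edge of the tree contributes exactly one comparison, and a tree with n nodes has n-1 edges.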
You can lookup and remove the min/max of a BST in constant time O(1), if you implement it yourself and store a reference to head/tail. Most implementations don't do that, only storing the root-node. But if you analyze how a BST works, given a ref to min/max (or aliased as head/tail), then you can find the next min/max in constant time.
See this for more info:
https://stackoverflow.com/a/74905762/1223975

Numbers in Binary Search Tree(BST) adding to a certain value

Given a BST, is it possible to find two numbers that add up to a given value, in O(n) time and little additional memory. By little additional memory, it is implied that you can't copy the entire BST into an array.
This can be accomplished in O(n) time and O(1) additional memory if you have both child and parent pointers. You keep two pointers, x and y, and start x at the minimum element and y at the maximum. If the sum of these two elements is too low, you move x to its successor, and if it's too high you move y to its predecessor. You can report a failure once x points to a larger element than y. Each edge in the tree is traversed at most twice for a total of O(n) edge traversals, and your only memory usage is the two pointers. Without parent pointers, you need to remember the sequence of ancestors to the root, which is at least Omega(log n) and possibly higher if the tree is unbalanced.
To find a successor, you can use the following pseudocode (analogous code for predecessor):
succ(x) {
if (x.right != null) {
ret = x.right;
while (ret.left != null) ret = ret.left;
return ret;
} else {
retc = x;
while (retc.parent != null && retc.parent < x) retc = retc.parent;
if (retc.parent != null && retc.parent > x) return retc.parent;
else return null;
}
}
I think jonderry is very close, but the parent pointers require \Omega(n) memory, that is, they add substantially to memory usage. What he is doing is two coordinated traversals in opposite directions (small to large and vice versa), trying to keep the sum always close to the target. You can manage that with two stacks, and the stacks can only grow up to the depth of the tree, which is O(log n) for a balanced tree. I don't know if that's "little" additional memory, but certainly it is less additional memory and o(n). So this is exactly as in jonderry's own comment, but there is no runtime penalty because traversing a binary tree using only a stack is a well known, efficient and definitely O(n) operation. So you have an increasing iterator ii and a decreasing iterator di.
x = ii.next()
y = di.next()
while (true) {
try:
if x + y > target {y = di.next()}
if x + y < target {x = ii.next()}
if x + y == target {return (x,y)}
except IterError:
break
}
return None
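A runnable Python sketch of the two-iterator idea, using generators backed by explicit stacks. It assumes distinct keys and stops once the two iterators meet; the `Node` class and the names `ascending`, `descending` and `find_pair` are invented for this illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def ascending(root):
    """Yield keys in increasing order; the stack holds at most O(height) nodes."""
    stack, node = [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key
        node = node.right

def descending(root):
    """Mirror image of ascending: yield keys in decreasing order."""
    stack, node = [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.right
        node = stack.pop()
        yield node.key
        node = node.left

def find_pair(root, target):
    """Return (x, y) with x < y and x + y == target, or None if no such pair exists."""
    ii, di = ascending(root), descending(root)
    x, y = next(ii, None), next(di, None)
    while x is not None and y is not None and x < y:
        if x + y == target:
            return (x, y)
        if x + y < target:
            x = next(ii, None)   # sum too small: advance the increasing iterator
        else:
            y = next(di, None)   # sum too large: advance the decreasing iterator
    return None

root = Node(5, Node(3, Node(1), Node(4)), Node(7, Node(6), Node(8)))
print(find_pair(root, 15))
```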
I had just run into the same problem in the context of computing the pseudomedian, the median of all pairwise averages in a sample.
Yes, if you traverse it as if it were a sorted array.

Resources