The numbers 1 to n are inserted into a binary search tree in a specified order p_1, p_2, ..., p_n. Describe an O(n log n) time algorithm to construct the resulting final binary search tree.
Note that:
I don't need average-case O(n log n) time; I need the worst case.
I need the exact tree that results when insertion takes place with the usual rules. AVL or red-black trees are not allowed.
This is an assignment question. It is very non-trivial; in fact, it seemed impossible at first glance. I have thought about it a lot. My observations:
The argument we use to prove that sorting takes at least n log n time does not rule out the existence of such an algorithm here.
If it were always possible to find, in O(n) time, a subtree whose size is between two constant fractions of the size of the tree, the problem could be solved easily.
Choosing median or left child of root as root of subtree doesn't work.
The trick is not to use the constructed BST for lookups. Instead, keep an additional, balanced BST for lookups. Link the leaves.
For example, we might have
   Constructed        Balanced

       3                 2
      / \              /   \
     2   D            1     3
    / \              / |   | \
   1   C            a  b   c  d
  / \
 A   B
where a, b, c, d are pointers to A, B, C, D respectively, and A, B, C, D are what would normally be null pointers.
To insert, insert into the balanced BST first (O(log n)), follow the pointer to the constructed tree (O(1)), do the constructed insert (O(1)), and relink the new leaves (O(1)).
As David Eisenstat doesn't have time to extend his answer, I'll try to put more details into a similar algorithm.
Intuition
The main intuition behind the algorithm is based on the following statements:
statement #1: if a BST contains values a and b (a < b) AND there are no values between them, then either A (node for value a) is a (possibly indirect) parent of B (node for value b) or B is a (possibly indirect) parent of A.
This statement is obviously true: if their lowest common ancestor C were some node other than A and B, its value c would have to lie between a and b. Note that statement #1 is true for any BST (balanced or unbalanced).
statement #2: if a simple (unbalanced) BST contains values a and b (a < b) AND there are no values between them AND we are trying to add value x such that a < x < b, then X (node for value x) will be either direct right (greater) child of A or direct left (less) child of B whichever node is lower in the tree.
Let's assume that the lower of the two nodes is A (the other case is symmetrical). During the insertion phase, value x travels the same path as a did during its insertion, because the tree doesn't contain any values between a and x, i.e. at every comparison the values a and x are indistinguishable. This means value x will navigate the tree down to node A, having passed node B at some earlier step (see statement #1). As x > a, it must become the right child of A. The direct right child of A must be empty at this point: A is in B's left subtree, so all values in that subtree are less than b, and since there are no values between a and b in the tree, no existing value can be the right child of node A.
Note that statement #2 may fail to hold for a balanced BST after rebalancing has been performed, although this would be an unusual case.
statement #3: in a balanced BST for any value x not in the tree yet, you can find closest greater and closest less values in O(log(N)) time.
This follows directly from statements #1 and #2: all you need to do is find the potential insertion point for the value x in the BST (which takes O(log(N))); one of the two values will be the direct parent of the insertion point, and to find the other you travel back up the tree toward the root (again O(log(N))).
So now the idea behind the algorithm becomes clear: for fast insertion into an unbalanced BST we need to find nodes with closest less and greater values. We can easily do it if we additionally maintain a balanced BST with the same keys as our target (unbalanced) BST and with corresponding nodes from that BST as values. Using that additional data structure we can find insertion point for each new value in O(log(N)) time and update this data structure with new value in O(log(N)) time as well.
Algorithm
1. Init "main" root and balancedRoot with null.
2. For each value x in the list do:
3. If this is the first value, just add it as the root node to both trees and go to step #2.
4. In the tree specified by balancedRoot, find the nodes that correspond to the closest less (BalancedA, which points to node A in the main BST) and closest greater (BalancedB, which points to node B in the main BST) values.
5. If there is no closest less value, i.e. we are adding the minimum element, add it as the left child of node B. If there is no closest greater value, i.e. we are adding the maximum element, add it as the right child of node A. Otherwise, find whichever of nodes A or B is lower in the tree; you can use an explicit level stored in each node. If the lower node is A (the less node), add x as the direct right child of A; else add x as the direct left child of B (the greater node). Alternatively (and more cleverly), you may notice that from statements #1 and #2 it follows that exactly one of the two candidate insert positions (A's right child or B's left child) will be empty, and that is where you want to insert your value x.
6. Add value x to the balanced tree (you might re-use the lookup from step #4).
7. Go to step #2.
As no inner step of the loop takes more than O(log(N)), the total complexity is O(N log N).
Java implementation
I'm too lazy to implement a balanced BST myself, so I used the standard Java TreeMap, which implements a Red-Black tree and has the useful lowerEntry and higherEntry methods that correspond to step #4 of the algorithm (you may look at the source code to verify that both are actually O(log(N))).
import java.util.Map;
import java.util.TreeMap;

public class BSTTest {

    static class Node {
        public final int value;
        public Node left;
        public Node right;

        public Node(int value) {
            this.value = value;
        }

        public boolean compareTree(Node other) {
            return compareTrees(this, other);
        }

        public static boolean compareTrees(Node n1, Node n2) {
            if ((n1 == null) && (n2 == null))
                return true;
            if ((n1 == null) || (n2 == null))
                return false;
            if (n1.value != n2.value)
                return false;
            return compareTrees(n1.left, n2.left) &&
                    compareTrees(n1.right, n2.right);
        }

        public void assignLeftSafe(Node child) {
            if (this.left != null)
                throw new IllegalStateException("left child is already set");
            this.left = child;
        }

        public void assignRightSafe(Node child) {
            if (this.right != null)
                throw new IllegalStateException("right child is already set");
            this.right = child;
        }

        @Override
        public String toString() {
            return "Node{" +
                    "value=" + value +
                    '}';
        }
    }

    static Node insertToBst(Node root, int value) {
        if (root == null)
            root = new Node(value);
        else if (value < root.value)
            root.left = insertToBst(root.left, value);
        else
            root.right = insertToBst(root.right, value);
        return root;
    }

    static Node buildBstDirect(int[] values) {
        Node root = null;
        for (int v : values) {
            root = insertToBst(root, v);
        }
        return root;
    }

    static Node buildBstSmart(int[] values) {
        Node root = null;
        TreeMap<Integer, Node> balancedTree = new TreeMap<Integer, Node>();
        for (int v : values) {
            Node node = new Node(v);
            if (balancedTree.isEmpty()) {
                root = node;
            } else {
                Map.Entry<Integer, Node> lowerEntry = balancedTree.lowerEntry(v);
                Map.Entry<Integer, Node> higherEntry = balancedTree.higherEntry(v);
                if (lowerEntry == null) {
                    // adding the minimum value
                    higherEntry.getValue().assignLeftSafe(node);
                } else if (higherEntry == null) {
                    // adding the maximum value
                    lowerEntry.getValue().assignRightSafe(node);
                } else {
                    // adding some middle value
                    Node lowerNode = lowerEntry.getValue();
                    Node higherNode = higherEntry.getValue();
                    if (lowerNode.right == null)
                        lowerNode.assignRightSafe(node);
                    else
                        higherNode.assignLeftSafe(node);
                }
            }
            // update balancedTree
            balancedTree.put(v, node);
        }
        return root;
    }

    public static void main(String[] args) {
        int[] input = new int[]{7, 6, 9, 4, 1, 8, 2, 5, 3};
        Node directRoot = buildBstDirect(input);
        Node smartRoot = buildBstSmart(input);
        System.out.println(directRoot.compareTree(smartRoot));
    }
}
Here's a linear-time algorithm. (I said that I wasn't going to work on this question, so if you like this answer, please award the bounty to SergGr.)
Create a doubly linked list with nodes 1..n and compute the inverse of p. For i from n down to 1, let q be the left neighbor of p_i in the list and r the right neighbor (one of them may be missing at the ends of the list). If p^-1(q) > p^-1(r), make p_i the right child of q; if p^-1(q) < p^-1(r), make p_i the left child of r; if one neighbor is missing, use the other. Delete p_i from the list.
In Python:
class Node(object):
    __slots__ = ('left', 'key', 'right')

    def __init__(self, key):
        self.left = None
        self.key = key
        self.right = None


def construct(p):
    # Validate the input.
    p = list(p)
    n = len(p)
    assert set(p) == set(range(n))  # 0 .. n-1
    # Compute p^-1.
    p_inv = [None] * n
    for i in range(n):
        p_inv[p[i]] = i
    # Set up the list.
    nodes = [Node(i) for i in range(n)]
    for i in range(n):
        if i >= 1:
            nodes[i].left = nodes[i - 1]
        if i < n - 1:
            nodes[i].right = nodes[i + 1]
    # Process p.
    for i in range(n - 1, 0, -1):  # n-1, n-2 .. 1
        q = nodes[p[i]].left
        r = nodes[p[i]].right
        if r is None or (q is not None and p_inv[q.key] > p_inv[r.key]):
            print(p[i], 'is the right child of', q.key)
        else:
            print(p[i], 'is the left child of', r.key)
        if q is not None:
            q.right = r
        if r is not None:
            r.left = q


construct([1, 3, 2, 0])
Here's my O(n log^2 n) attempt that doesn't require building a balanced tree.
Put nodes in an array in their natural order (1 to n). Also link them into a linked list in the order of insertion. Each node stores its order of insertion along with the key.
The algorithm goes like this.
The input is a node in the linked list and a range (low, high) of indices in the node array.
1. Call the input node root; its key is rootkey. Unlink it from the list.
2. Determine which subtree of the input node is smaller.
3. Traverse the corresponding array range, unlink each node from the linked list, then link the nodes into a separate linked list and sort that list back into insertion order.
4. The heads of the two resulting lists are the children of the input node.
5. Perform the algorithm recursively on the children of the input node, passing (low, rootkey-1) and (rootkey+1, high) as the index ranges.
The sorting operation at each level gives the algorithm the extra log n complexity factor.
Here's an O(n log n) algorithm that can also be adapted to O(n log log m) time, where m is the range, by using a Y-fast trie rather than a balanced binary tree.
In a binary search tree, lower values are left of higher values. The order of insertion corresponds with the right-or-left node choices when traveling along the final tree. The parent of any node, x, is either the least higher number previously inserted or the greatest lower number previously inserted, whichever was inserted later.
We can identify and connect the listed nodes with their correct parents using the logic above in O(n log n) worst-case time by maintaining a balanced binary tree of the nodes visited so far as we traverse the order of insertion.
Explanation:
Let's imagine a proposed lower parent, p. Now imagine there's a number, l > p but still lower than x, inserted before p. Either (1) p passed l during insertion, in which case x would have had to pass l to get to p but that contradicts that x must have gone right if it reached l; or (2) p did not pass l, in which case p is in a subtree left of l but that would mean a number was inserted that's smaller than l but greater than x, a contradiction.
Clearly, a number, l < x, greater than p that was inserted after p would also contradict p as x's parent since either (1) l passed p during insertion, which means p's right child would have already been assigned when x was inserted; or (2) l is in a subtree to the right of p, which again would mean a number was inserted that's smaller than l but greater than x, a contradiction.
Therefore, for any node, x, with a lower parent, that parent must be the greatest number lower than and inserted before x. Similar logic covers the scenario of a higher proposed parent.
Now let's imagine x's parent, p < x, was inserted before h, the lowest number greater than and inserted before x. Then either (1) h passed p, in which case p's right node would have been already assigned when x was inserted; or (2) h is in a subtree right of p, which means a number lower than h and greater than x was previously inserted but that would contradict our assertion that h is the lowest number inserted so far that's greater than x.
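A minimal Python sketch of the parent rule described above, using the standard-library `bisect` module on a sorted list as a stand-in for the balanced tree (list insertion makes this sketch O(n^2) worst case; swapping in a real balanced BST or Y-fast trie gives the stated bounds). The names `build_bst` and `by_key` are my own, not from any library:

```python
import bisect

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def build_bst(order):
    """Build the BST produced by inserting `order` left to right."""
    root = Node(order[0])
    by_key = {order[0]: root}          # key -> node in the constructed tree
    sorted_keys = [order[0]]           # keys inserted so far, kept sorted
    for x in order[1:]:
        node = Node(x)
        i = bisect.bisect_left(sorted_keys, x)
        pred = by_key[sorted_keys[i - 1]] if i > 0 else None
        succ = by_key[sorted_keys[i]] if i < len(sorted_keys) else None
        # exactly one of the two candidate slots is free (see the argument above)
        if pred is not None and pred.right is None:
            pred.right = node
        else:
            succ.left = node
        sorted_keys.insert(i, x)
        by_key[x] = node
    return root
```

A single `if` suffices because, by the ancestor argument above, exactly one of the two candidate positions (predecessor's right child, successor's left child) is empty at insertion time.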
Since this is an assignment, I'm posting a hint instead of an answer.
Sort the numbers, while keeping the insertion order. Say you have input: [1,7,3,5,8,2,4]. Then after sorting you will have [[1,0], [2,5], [3,2], [4, 6], [5,3], [7,1], [8,4]] . This is actually the in-order traversal of the resulting tree. Think hard about how to reconstruct the tree given the in-order traversal and the insertion order (this part will be linear time).
More hints coming if you really need them.
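Just to make the first step of the hint concrete (pairing each value with its insertion index, then sorting by value), here is a tiny sketch; the reconstruction step is deliberately left out, as the hint intends:

```python
def annotate(order):
    # Pair each value with its insertion position, then sort by value
    # to obtain the in-order traversal annotated with insertion times.
    return sorted([v, i] for i, v in enumerate(order))

print(annotate([1, 7, 3, 5, 8, 2, 4]))
# [[1, 0], [2, 5], [3, 2], [4, 6], [5, 3], [7, 1], [8, 4]]
```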
I want a "pre-order" traversal of the nodes in BinaryTree to visit them in this order: [N0, N1, N2, N3]
What should I do with the following structure?
one sig Ordering { // model a linear order on nodes
    first: Node,        // the first node in the linear order
    order: Node -> Node // for each node n, n.(Ordering.order) represents the
                        // node (if any) immediately after n in order
}

fact LinearOrder { // the first node in the linear order is N0; and
                   // the four nodes are ordered as [N0, N1, N2, N3]
}

pred SymmetryBreaking(t: BinaryTree) { // if t has a root node, it is the
    // first node according to the linear order; and
    // a "pre-order" traversal of the nodes in t visits them according
    // to the linear order
}
Your question has 2 parts. First, define the ordering for N0, N1, .... Second, define the symmetry break using the linear ordering.
First, you can define the ordering by listing all the valid relationships, like
`N0.(Ordering.order) = N1`
Second, define the predicate stating that the pre-order of the tree follows the linear ordering. Basically, there are 2 cases; the first one is trivial: no t.root. However, when the root is not null, the tree must have the following 3 properties.
For all node n, if h = n.left, h.val = n.val.(Ordering.order)
For all node n, if h = n.right, h.val = (one node in n's left-sub-tree or n).(Ordering.order)
t.root = N0
If you translate the whole description into Alloy, it will be something like:
no t.root or
{
    t.root = N0 // 3
    all h : t.root.^(left+right) | one n : t.root.*(left+right) |
        (n.left = h and n.(Ordering.order) = h) or                                     // 1
        (one l : (n.left.*(left+right) + n) | n.right = h and l.(Ordering.order) = h)  // 2
}
Note: there may be many alternative solutions, and this one is definitely not a simple one.
Problem: Consider a complete k-ary tree with l levels, with nodes labelled by their rank in a breadth first traversal. Compute the list of labels in the order that they are traversed in the depth first traversal.
For example, for a binary tree with 3 levels, the required list is:
[0 1 3 7 8 4 9 10 2 5 11 12 6 13 14]
One way to do this is to actually construct a tree structure and traverse it twice: the first traversal is breadth first, labelling the nodes 0, 1, 2, ... as you go; the second traversal is depth first, reporting the labels 0, 1, 3, 7, ... as you go.
I'm interested in a method that avoids constructing a tree in memory. I realize that the size of such a tree is comparable to the size of the output, but I'm hoping that the solution will allow for a "streaming" output (ie one that needs not be stored entirely in memory).
I am also interested in the companion problem; start with a tree labelled according to a depth first traversal and generate the labels of a breadth first traversal. I imagine that the solution to this problem will be, in some sense, dual to the first problem.
You don't actually need to construct the tree. You can do the depth first traversal using just the BFS labels instead of pointers to actual nodes.
Using BFS position labels to represent the nodes in k-ary tree:
The root is 0
The first child of any node n is k*n + 1
The right sibling of a node n, if it has one, is n+1
in code it looks like this:
import java.util.ArrayList;
import java.util.List;

class Whatever
{
    static void addSubtree(List<Integer> list, int node, int k, int levelsleft)
    {
        if (levelsleft < 1)
        {
            return;
        }
        list.add(node);
        for (int i = 0; i < k; i++)
        {
            addSubtree(list, node*k + i + 1, k, levelsleft - 1);
        }
    }

    public static void main(String[] args) throws java.lang.Exception
    {
        int K = 2, LEVELS = 4;
        ArrayList<Integer> list = new ArrayList<>();
        addSubtree(list, 0, K, LEVELS);
        System.out.println(list);
    }
}
This is actually used all the time to represent a binary heap in an array -- the nodes are the array elements in BFS order, and the tree is traversed by performing these operations on indexes.
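As a small illustration of that heap correspondence (a generic sketch, not tied to any particular heap library), the index arithmetic for a k-ary heap stored in BFS order looks like this:

```python
heap = [1, 3, 2, 7, 4, 9, 5]  # a binary min-heap laid out in BFS (level) order

def children(i, k=2):
    # indices of the children of node i in a k-ary, BFS-ordered array
    return [k * i + 1 + j for j in range(k)]

def parent(i, k=2):
    # index of the parent of node i (undefined for the root)
    return (i - 1) // k

print(children(1))  # [3, 4] -- the values 7 and 4
print(parent(4))    # 1      -- the value 3
```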
You can use the standard DFS and BFS algorithms, but instead of getting the child nodes of a particular node from a pre-built tree structure, you can compute them as needed.
For a BFS-numbered, complete K-ary tree of height H, the i-th child of a node N at depth D is:
K*N + 1 + i
A derivation of this formula when i = 0 (first child) is provided here.
For a DFS-numbered, complete K-ary tree of height H, the i-th child of a node N at depth D is given by a much uglier formula:
N + 1 + i*step where step = (K^(H - D) - 1) / (K - 1)
Here is a rough explanation of this formula:
For a node N at depth D in a DFS-numbered K-ary tree of height H, its first child is simply N+1 because it is the next node to be visited in a depth-first traversal. The second child of N will be visited directly after visiting the entire sub-tree rooted at the first child (N+1), which is itself a complete K-ary tree of height H - (D + 1). The size of any complete, K-ary tree is given by the sum of a finite geometric series as explained here. The size of said sub-tree is the distance between the first and second children, and, in fact, it is the same distance between all siblings since each of their sub-trees are the same size. If we call this distance step, then:
1st child is N + 1
2nd child is N + 1 + step
3rd child is N + 1 + step + step
...and so on.
Below is a Python implementation (note: the dfs function uses the BFS formula, because it is converting from DFS to BFS, and vice-versa for the bfs function.):
def dfs(K, H):
    stack = list()
    push, pop = list.append, list.pop
    push(stack, (0, 0))
    while stack:
        label, depth = pop(stack)
        yield label
        if depth + 1 > H:  # leaf node
            continue
        for i in reversed(range(K)):
            push(stack, (K*label + 1 + i, depth + 1))


def bfs(K, H):
    from collections import deque
    queue = deque()
    push, pop = deque.append, deque.popleft
    push(queue, (0, 0))
    while queue:
        label, depth = pop(queue)
        yield label
        if depth + 1 > H:  # leaf node
            continue
        step = (K**(H - depth) - 1) // (K - 1)
        for i in range(K):
            push(queue, (label + 1 + i*step, depth + 1))


print(list(dfs(2, 3)))
print(list(bfs(2, 3)))
print(list(dfs(3, 2)))
print(list(bfs(3, 2)))
The above will print:
[0, 1, 3, 7, 8, 4, 9, 10, 2, 5, 11, 12, 6, 13, 14]
[0, 1, 8, 2, 5, 9, 12, 3, 4, 6, 7, 10, 11, 13, 14]
[0, 1, 4, 5, 6, 2, 7, 8, 9, 3, 10, 11, 12]
[0, 1, 5, 9, 2, 3, 4, 6, 7, 8, 10, 11, 12]
Here's some javascript that seems to solve the problem.
var arity = 2;
var depth = 3;

function look(rowstart, pos, dep) {
    var number = rowstart + pos;
    console.log(number);
    if (dep < depth - 1) {
        var rowlen = Math.pow(arity, dep);
        var newRowstart = rowstart + rowlen;
        for (var i = 0; i < arity; i++) {
            look(newRowstart, pos*arity + i, dep + 1);
        }
    }
}

look(0, 0, 0);
It's a depth-first search that calculates the BFS label of each node on the way down.
It calculates the label of a node using the current depth dep, the horizontal position in the current row (pos) and the label of the first node in the row (rowstart).
For each node in a BST, what is the length of the longest path from the node to a leaf? (worst case)
I think in the worst case we have a linear path from a node to a leaf. If there are n nodes in a tree, then the running time is O(n*n). Is this right?
You can do this in linear time, assuming it's "A given node to every leaf" or "Every node to a given leaf." If it's "Every Node to Every Leaf", that's a bit harder.
To do this: walk from the "target" to the root, marking each node by distance; color all of these nodes red. (So the root holds the depth of the target, and the target holds 0.) For each red node, walk its non-red children, adding 1 as you descend, starting from the value of the red node.
It's not O(n*n) because you're able to re-use a lot of your work; you don't find one path, then start completely over to find the next.
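A minimal Python sketch of that marking scheme (the names and the parent pointer on each node are my own assumptions): it computes the distance from one target node to every node, visiting each node once.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None

def distances_from(target):
    # Phase 1: walk up to the root, recording the distance along the way.
    # These path nodes are the "red" nodes.
    dist = {}
    node, d = target, 0
    while node is not None:
        dist[node] = d
        node, d = node.parent, d + 1

    # Phase 2: from each red node, descend into subtrees not yet marked,
    # adding 1 per level, starting from that red node's distance.
    def descend(n, d):
        if n is None or n in dist:
            return
        dist[n] = d
        descend(n.left, d + 1)
        descend(n.right, d + 1)

    for red, d in list(dist.items()):
        descend(red.left, d + 1)
        descend(red.right, d + 1)
    return dist
```

The longest path from the target to a leaf is then the maximum `dist` value over the leaves, and the whole computation is linear because every node is assigned a distance exactly once.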
The longest path from a node to a leaf would:
1. Go all the way from the node up to the root.
2. Then go from the root down to the deepest leaf.
3. Make sure never to traverse a node twice, because if that were allowed the path could be made infinitely long by going back and forth between two nodes.
  x
 / \
b   a
     \
      c
       \
        d
The longest path from c to a leaf would do two things:
1. Go from c up to x (count this length).
2. Go from x down to the deepest leaf that does not have c on its path (in this case, that leaf is b).
The code below has a time complexity of O(n) for finding the longest distance from a single node; finding the distance for all nodes would therefore be O(n^2).
public int longestPath(Node n, Node root) {
    int path = 0;
    Node x = n;
    while (x.parent != null) {
        x = x.parent;
        path++;
    }
    int height = maxHeight(root, n);
    return path + height;
}

private int maxHeight(Node x, Node exclude) {
    if (x == exclude)
        return -1;
    if (x == null)
        return -1;
    if (x.left == null && x.right == null)
        return 0;
    int l = maxHeight(x.left, exclude);
    int r = maxHeight(x.right, exclude);
    return l > r ? l + 1 : r + 1;
}
My solution would be:
1. Check if the node has a subtree; if yes, find the height of that subtree.
2. Find the height of the rest of the tree, as mentioned by @abhaybhatia.
Return the max of the two heights.
Suppose that you are given an arbitrary binary tree. We'll call the tree balanced if the following is true for all nodes:
That node is a leaf, or
The height of the left subtree and the height of the right subtree differ by at most ±1 and the left and right subtrees are themselves balanced.
Is there an efficient algorithm for determining the minimum number of nodes that need to be added to the tree in order to make it balanced? For simplicity, we'll assume that nodes can only be inserted as leaf nodes (like the way that a node is inserted into a binary search tree that does no rebalancing).
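For reference, testing whether a tree already satisfies this balance condition takes one linear pass (a minimal sketch; `None` represents an empty subtree, and the two-field `Node` is my own throwaway type):

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def check(t):
    # Returns (is_balanced, height); an empty subtree has height 0.
    if t is None:
        return True, 0
    ok_left, h_left = check(t.left)
    ok_right, h_right = check(t.right)
    balanced = ok_left and ok_right and abs(h_left - h_right) <= 1
    return balanced, 1 + max(h_left, h_right)
```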
The following tree fits into your definition, although it doesn't seem very balanced to me:
EDIT This answer is wrong, but it has enough interesting stuff in it that I don't feel like deleting it yet. The algorithm produces a balanced tree, but not a minimal one. The number of nodes it adds is:

    sum over all nodes n of: fib(lower(n)+1) + fib(lower(n)+2) + ... + fib(upper(n)-2)

where n ranges over all nodes in the tree, lower(n) is the depth of the child of n with the lower depth, and upper(n) is the depth of the child of n with the higher depth. Using the fact that the sum of the first k fibonacci numbers is fib(k+2)-1, we can replace the inner sum with fib(upper(n)) - fib(lower(n) + 2).
The formula is (more or less) derived from the following algorithm to add nodes to the tree, making it balanced (in python, only showing the relevant algorithms):
def balance(tree, label):
    if tree is None:
        return (None, 0)
    left, left_height = balance(tree.left_child, label)
    right, right_height = balance(tree.right_child, label)
    while left_height < right_height - 1:
        left = Node(label(), left, balanced_tree(left_height - 1, label))
        left_height += 1
    while right_height < left_height - 1:
        right = Node(label(), right, balanced_tree(right_height - 1, label))
        right_height += 1
    return (Node(tree.label, left, right), max(left_height, right_height) + 1)


def balanced_tree(depth, label):
    if depth <= 0:
        return None
    else:
        return Node(label(),
                    balanced_tree(depth - 1, label),
                    balanced_tree(depth - 2, label))
As requested: report the count instead of creating the tree:

def balance(tree):
    if tree is None:
        return (0, 0)
    left, left_height = balance(tree.left_child)
    right, right_height = balance(tree.right_child)
    while left_height < right_height - 1:
        left += balanced_tree(left_height - 1) + 1
        left_height += 1
    while right_height < left_height - 1:
        right += balanced_tree(right_height - 1) + 1
        right_height += 1
    return (left + right, max(left_height, right_height) + 1)


def balanced_tree(depth):
    if depth <= 0:
        return 0
    else:
        return (1 + balanced_tree(depth - 1)
                + balanced_tree(depth - 2))
Edit: Actually, other than computing the size of a minimal balanced tree of depth n more efficiently (i.e. memoizing it, or using the closed form: it's just fibonacci(n+1) - 1), that's probably as efficient as you can get, since you have to examine every node in the tree to test the balance condition, and this algorithm looks at every node exactly once.
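The memoization mentioned in the edit is a small change; here is a sketch that also checks the stated closed form, using the indexing convention fib(0) = fib(1) = 1, under which the minimal size of a balanced tree of depth n is fibonacci(n+1) - 1:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def min_size(depth):
    # number of nodes in the smallest balanced tree of the given depth
    if depth <= 0:
        return 0
    return 1 + min_size(depth - 1) + min_size(depth - 2)

def fib(n):
    # fib(0) = fib(1) = 1, matching the indexing convention above
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# closed form: min_size(n) == fib(n + 1) - 1 for all n >= 0
```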
Will this work?
Go recursively from the top. If a node A is imbalanced, add a node B on the short side, and add enough left nodes under B until A is balanced.
(Of course, count the nodes added.)
First let's find the height of the left child and the right child of each node.
Now consider the root of the tree; its height is
1 + max(height(root.left), height(root.right)).
Let's assume the left subtree has height n-1; then the right subtree must have height at least n-2. Let's define another relation here: req[node] -> the minimum required height of each node to make the tree balanced.
If you observe, for a node to be at height h, one of its children must be at height at least h-1, and to make the tree balanced the other child must be at height at least h-2.
Start from the root with req[root] = height of root.
The pseudo code is :
def chk_nodes(root, req):
    if root == NULL:
        return minNodes(req)
    if left[root] > right[root]:
        return chk_nodes(root.left, req-1) + chk_nodes(root.right, req-2)
    else:
        return chk_nodes(root.left, req-2) + chk_nodes(root.right, req-1)
Now what is minNodes(req)?
It's a function which returns the minimum number of nodes required to create a balanced binary tree of height req. The function is quite intuitive and self-explanatory.
def minNodes(req):
    if req < 0:
        return 0
    return 1 + minNodes(req-1) + minNodes(req-2)
In the minNodes function, a lookup table can be used to get O(1) lookup time with O(N) construction time.
When the chk_nodes function runs recursively and reaches a null child, we are left with the value req at that position. If req > 0, a new balanced subtree of height req is needed there; hence minNodes(req) nodes are required at that particular leaf position.
With only 2 traversals and O(N) time and O(N) space, the problem is solved.