Generating uniformly random curious binary trees - algorithm

A binary tree of N nodes is 'curious' if it is a binary tree whose node values are 1, 2, ..,N and which satisfy the property that
Each internal node of the tree has exactly one descendant which is greater than it.
Every number in 1,2, ..., N appears in the tree exactly once.
Example of a curious binary tree
4
/ \
5 2
/ \
1 3
Can you give an algorithm to generate a uniformly random curious binary tree of n nodes, which runs in O(n) guaranteed time?
Assume you only have access to a random number generator which can give you a (uniformly distributed) random number in the range [1, k] for any 1 <= k <= n. Assume the generator runs in O(1).
I would like to see an O(nlogn) time solution too.
Please follow the usual definition of labelled binary trees being distinct, to consider distinct curious binary trees.

There is a bijection between "curious" binary trees and standard heaps. Namely, given a heap, recursively (starting from the top) swap each internal node with its largest child. And, as I learned in StackOverflow not long ago, a heap is equivalent to a permutation of 1,2,...,N. So you should make a random permutation and turn it into a heap; or recursively make the heap in the same way that you would have made a random permutation. After that you can convert the heap to a "curious tree".

Aha, I think I've got how to create a random heap in O(N) time. (after which, use approach in Greg Kuperberg's answer to transform into "curious" binary tree.)
edit 2: Rough pseudocode for making a random min-heap directly. Max-heap is identical except the values inserted into the heap are in reverse numerical order.
struct Node {
Node left, right;
Object key;
constructor newNode() {
N = new Node;
N.left = N.right = null;
N.key = null;
}
}
function create-random-heap(RandomNumberGenerator rng, int N)
{
Node heap = Node.newNode();
// Creates a heap with an "incomplete" node containing a null, and having
// both child nodes as null.
List incompleteHeapNodes = [heap];
// use a vector/array type list to keep track of incomplete heap nodes.
for k = 1:N
{
// loop invariant: incompleteHeapNodes has k members. Order is unimportant.
int m = rng.getRandomNumber(k);
// create a random number between 0 and k-1
Node node = incompleteHeapNodes.get(m);
// pick a random node from the incomplete list,
// make it a complete node with key k.
// It is ok to do so since all of its parent nodes
// have values less than k.
node.left = Node.newNode();
node.right = Node.newNode();
node.key = k;
// Now remove this node from incompleteHeapNodes
// and add its children. (replace node with node.left,
// append node.right)
incompleteHeapNodes.set(m, node.left);
incompleteHeapNodes.append(node.right);
// All operations in this loop take O(1) time.
}
return prune-null-nodes(heap);
}
// get rid of all the incomplete nodes.
function prune-null-nodes(heap)
{
if (heap == null || heap.key == null)
return null;
heap.left = prune-null-nodes(heap.left);
heap.right = prune-null-nodes(heap.right);
}

Related

What's the most efficient way to convert a binomial tree into a sorted array of keys?

Say we are given a single-tree binomial heap, such that the binomial tree is of rank r, and thus holding 2^r keys. What's the most efficient way to convert it into a k<2^r length sorted array, with the k smallest keys of the tree? Let's assume we can't use any other data structure but Lazy Binomial Heaps, and Binomial Trees. Notice that at each level the children are unnecessarily linked by order, so you might have to make some comparisons at some point.
My solution was (assuming 1<=k<=2^r):
Create a new empty lazy binomial heap H.
Insert the root's key into the heap.
Create a new counter x, and set x=1.
For each level i=0,1,... (where the root is at level 0):
Let c be the number of nodes at level i.
Set x=x+c.
Iterate over the nodes in level i and:
Insert each node N to H. (In O(1))
If x < k, recursively make the same process for each node N, passing through x so the counting continues.
Repeat k times:
Extract the minimal key out of the heap and place it in the output array.
Delete the minimal key form the heap. (amortized cost: O(1))
Return output array.
There might be some holes in the pseudo-code, but I think the idea itself is clear. I also managed to implement it. However, I'm not sure that's the most efficient algorithm for this task.
Thanks to Gene's comment I see that the earlier algorithm I suggested will not always work, as it assumes the maximal node at level x is smaller than the minimal node at level x-1, which is not a reasonable assumption.
Yet, I believe this one makes the job efficiently:
public static int[] kMin(FibonacciHeap H, int k) {
if (H == null || H.isEmpty() || k <= 0)
return new int[0];
HeapNode tree = H.findMin();
int rank = tree.getRank();
int size = H.size();
size = (int) Math.min(size, Math.pow(2, rank));
if (k > size)
k = size;
int[] result = new int[k];
FibonacciHeap heap = new FibonacciHeap();
HeapNode next = H.findMin();
for(int i = 0; i < k; i++) { // k iterations
if(next != null)
for (Iterator<HeapNode> iter = next.iterator(); iter.hasNext(); ) { // rank nCr next.getParent().getRank() iterations.
next = iter.next();
HeapNode node = heap.insert(next.getKey()); // O(1)
node.setFreePointer(next);
}
next = heap.findMin().getFreePointer();
result[i] = next.getKey();
heap.deleteMin(); // O(log n) amortized cost.
next = next.child;
}
return result;
}
"freePointer" is a field in HeapNode, where I can store a pointer to another HeapNode. It is basically the "info field" most heaps have.
let r be the rank of the tree. Every iteration we insert at most r items to the external heap. In addition, every iteration we use Delete-Min to delete one item from the heap.
Therefore, the total cost of insertions is O(kr), and the total cost of Delete-Min is O(k*log(k)+k*log(r)). So the total cost of everything becomes O(k(log(k)+r))

Construct a binary tree from permutation in n log n time

The numbers 1 to n are inserted in a binary search tree in a specified order p_1, p_2,..., p_n. Describe an O(nlog n) time algorithm to construct the resulting final binary search tree.
Note that :-
I don't need average time n log n, but the worst time.
I need the the exact tree that results when insertion takes place with the usual rules. AVL or red black trees not allowed.
This is an assignment question. It is very very non trivial. In fact it seemed impossible at first glance. I have thought on it much. My observations:-
The argument that we use to prove that sorting takes atleast n log n time does not eliminate the existence of such an algorithm here.
If it is always possible to find a subtree in O(n) time whose size is between two fractions of the size of tree, the problem can be easily solved.
Choosing median or left child of root as root of subtree doesn't work.
The trick is not to use the constructed BST for lookups. Instead, keep an additional, balanced BST for lookups. Link the leaves.
For example, we might have
Constructed Balanced
3 2
/ \ / \
2 D 1 3
/ \ / | | \
1 C a b c d
/ \
A B
where a, b, c, d are pointers to A, B, C, D respectively, and A, B, C, D are what would normally be null pointers.
To insert, insert into the balanced BST first (O(log n)), follow the pointer to the constructed tree (O(1)), do the constructed insert (O(1)), and relink the new leaves (O(1)).
As David Eisenstat doesn't have time to extend his answer, I'll try to put more details into a similar algorithm.
Intuition
The main intuition behind the algorithm is based on the following statements:
statement #1: if a BST contains values a and b (a < b) AND there are no values between them, then either A (node for value a) is a (possibly indirect) parent of B (node for value b) or B is a (possibly indirect) parent of A.
This statement is obviously true because if their lowest common ancestor C is some other node than A and B, its value c must be between a and b. Note that statement #1 is true for any BST (balanced or unbalanced).
statement #2: if a simple (unbalanced) BST contains values a and b (a < b) AND there are no values between them AND we are trying to add value x such that a < x < b, then X (node for value x) will be either direct right (greater) child of A or direct left (less) child of B whichever node is lower in the tree.
Let's assume that the lower of two nodes is a (the other case is symmetrical). During insertion phase value x will travel the same path as a during its insertion because tree doesn't contain any values between a and x i.e. at any comparison values a and x are indistinguishable. It means that value x will navigate tree till node A and will pass node B at some earlier step (see statement #1). As x > a it should become a right child of A. Direct right child of A must be empty at this point because A is in B's subtree i.e. all values in that subtree are less than b and since there are no values between a and b in the tree, no value can be right child of node A.
Note that statement #2 might potentially be not true for some balanced BST after re-balancing was performed although this should be a strange case.
statement #3: in a balanced BST for any value x not in the tree yet, you can find closest greater and closest less values in O(log(N)) time.
This follows directly from statements #1 and #2: all you need is find the potential insertion point for the value x in the BST (takes O(log(N))), one of the two values will be direct parent of the insertion point and to find the other you need to travel the tree back to the root (again takes O(log(N))).
So now the idea behind the algorithm becomes clear: for fast insertion into an unbalanced BST we need to find nodes with closest less and greater values. We can easily do it if we additionally maintain a balanced BST with the same keys as our target (unbalanced) BST and with corresponding nodes from that BST as values. Using that additional data structure we can find insertion point for each new value in O(log(N)) time and update this data structure with new value in O(log(N)) time as well.
Algorithm
Init "main" root and balancedRoot with null.
for each value x in the list do:
if this is the first value just add it as the root nodes to both trees and go to #2
in the tree specified by balancedRoot find nodes that correspond to the closest less (BalancedA, points to node A in the main BST) and closest greater (BalancedB, points to node B in the main BST) values.
If there is no closest lower value i.e. we are adding minimum element, add it as the left child to the node B
If there is no closest greater value i.e. we are adding maximum element, add it as the right child to the node A
Find whichever of nodes A or B is lower in the tree. You can use explicit level stored in the node. If the lower node is A (less node), add x as the direct right child of A else add x as the direct left child of B (greater node). Alternatively (and more cleverly) you may notice that from the statements #1 and #2 follows that exactly one of the two candidate insert positions (A's right child or B's left child) will be empty and this is where you want to insert your value x.
Add value x to the balanced tree (might re-use from step #4).
Go to step #2
As no inner step of the loop takes more than O(log(N)), total complexity is O(N*log(N))
Java implementation
I'm too lazy to implement balanced BST myself so I used standard Java TreeMap that implements Red-Black tree and has useful lowerEntry and higherEntry methods that correspond to step #4 of the algorithm (you may look at the source code to ensure that both are actually O(log(N))).
import java.util.Map;
import java.util.TreeMap;
public class BSTTest {
static class Node {
public final int value;
public Node left;
public Node right;
public Node(int value) {
this.value = value;
}
public boolean compareTree(Node other) {
return compareTrees(this, other);
}
public static boolean compareTrees(Node n1, Node n2) {
if ((n1 == null) && (n2 == null))
return true;
if ((n1 == null) || (n2 == null))
return false;
if (n1.value != n2.value)
return false;
return compareTrees(n1.left, n2.left) &&
compareTrees(n1.right, n2.right);
}
public void assignLeftSafe(Node child) {
if (this.left != null)
throw new IllegalStateException("left child is already set");
this.left = child;
}
public void assignRightSafe(Node child) {
if (this.right != null)
throw new IllegalStateException("right child is already set");
this.right = child;
}
#Override
public String toString() {
return "Node{" +
"value=" + value +
'}';
}
}
static Node insertToBst(Node root, int value) {
if (root == null)
root = new Node(value);
else if (value < root.value)
root.left = insertToBst(root.left, value);
else
root.right = insertToBst(root.right, value);
return root;
}
static Node buildBstDirect(int[] values) {
Node root = null;
for (int v : values) {
root = insertToBst(root, v);
}
return root;
}
static Node buildBstSmart(int[] values) {
Node root = null;
TreeMap<Integer, Node> balancedTree = new TreeMap<Integer, Node>();
for (int v : values) {
Node node = new Node(v);
if (balancedTree.isEmpty()) {
root = node;
} else {
Map.Entry<Integer, Node> lowerEntry = balancedTree.lowerEntry(v);
Map.Entry<Integer, Node> higherEntry = balancedTree.higherEntry(v);
if (lowerEntry == null) {
// adding minimum value
higherEntry.getValue().assignLeftSafe(node);
} else if (higherEntry == null) {
// adding max value
lowerEntry.getValue().assignRightSafe(node);
} else {
// adding some middle value
Node lowerNode = lowerEntry.getValue();
Node higherNode = higherEntry.getValue();
if (lowerNode.right == null)
lowerNode.assignRightSafe(node);
else
higherNode.assignLeftSafe(node);
}
}
// update balancedTree
balancedTree.put(v, node);
}
return root;
}
public static void main(String[] args) {
int[] input = new int[]{7, 6, 9, 4, 1, 8, 2, 5, 3};
Node directRoot = buildBstDirect(input);
Node smartRoot = buildBstSmart(input);
System.out.println(directRoot.compareTree(smartRoot));
}
}
Here's a linear-time algorithm. (I said that I wasn't going to work on this question, so if you like this answer, please award the bounty to SergGr.)
Create a doubly linked list with nodes 1..n and compute the inverse of p. For i from n down to 1, let q be the left neighbor of p_i in the list, and let r be the right neighbor. If p^-1(q) > p^-1(r), then make p_i the right child of q. If p^-1(q) < p^-1(r), then make p_i the left child of r. Delete p_i from the list.
In Python:
class Node(object):
__slots__ = ('left', 'key', 'right')
def __init__(self, key):
self.left = None
self.key = key
self.right = None
def construct(p):
# Validate the input.
p = list(p)
n = len(p)
assert set(p) == set(range(n)) # 0 .. n-1
# Compute p^-1.
p_inv = [None] * n
for i in range(n):
p_inv[p[i]] = i
# Set up the list.
nodes = [Node(i) for i in range(n)]
for i in range(n):
if i >= 1:
nodes[i].left = nodes[i - 1]
if i < n - 1:
nodes[i].right = nodes[i + 1]
# Process p.
for i in range(n - 1, 0, -1): # n-1, n-2 .. 1
q = nodes[p[i]].left
r = nodes[p[i]].right
if r is None or (q is not None and p_inv[q.key] > p_inv[r.key]):
print(p[i], 'is the right child of', q.key)
else:
print(p[i], 'is the left child of', r.key)
if q is not None:
q.right = r
if r is not None:
r.left = q
construct([1, 3, 2, 0])
Here's my O(n log^2 n) attempt that doesn't require building a balanced tree.
Put nodes in an array in their natural order (1 to n). Also link them into a linked list in the order of insertion. Each node stores its order of insertion along with the key.
The algorithm goes like this.
The input is a node in the linked list, and a range (low, high) of indices in the node array
Call the input node root, Its key is rootkey. Unlink it from the list.
Determine which subtree of the input node is smaller.
Traverse the corresponding array range, unlink each node from the linked list, then link them in a separate linked list and sort the list again in the insertion order.
Heads of the two resulting lists are children of the input node.
Perform the algorithm recursively on children of the input node, passing ranges (low, rootkey-1) and (rootkey+1, high) as index ranges.
The sorting operation at each level gives the algorithm the extra log n complexity factor.
Here's an O(n log n) algorithm that can also be adapted to O(n log log m) time, where m is the range, by using a Y-fast trie rather than a balanced binary tree.
In a binary search tree, lower values are left of higher values. The order of insertion corresponds with the right-or-left node choices when traveling along the final tree. The parent of any node, x, is either the least higher number previously inserted or the greatest lower number previously inserted, whichever was inserted later.
We can identify and connect the listed nodes with their correct parents using the logic above in O(n log n) worst-time by maintaining a balanced binary tree with the nodes visited so far as we traverse the order of insertion.
Explanation:
Let's imagine a proposed lower parent, p. Now imagine there's a number, l > p but still lower than x, inserted before p. Either (1) p passed l during insertion, in which case x would have had to pass l to get to p but that contradicts that x must have gone right if it reached l; or (2) p did not pass l, in which case p is in a subtree left of l but that would mean a number was inserted that's smaller than l but greater than x, a contradiction.
Clearly, a number, l < x, greater than p that was inserted after p would also contradict p as x's parent since either (1) l passed p during insertion, which means p's right child would have already been assigned when x was inserted; or (2) l is in a subtree to the right of p, which again would mean a number was inserted that's smaller than l but greater than x, a contradiction.
Therefore, for any node, x, with a lower parent, that parent must be the greatest number lower than and inserted before x. Similar logic covers the scenario of a higher proposed parent.
Now let's imagine x's parent, p < x, was inserted before h, the lowest number greater than and inserted before x. Then either (1) h passed p, in which case p's right node would have been already assigned when x was inserted; or (2) h is in a subtree right of p, which means a number lower than h and greater than x was previously inserted but that would contradict our assertion that h is the lowest number inserted so far that's greater than x.
Since this is an assignment, I'm posting a hint instead of an answer.
Sort the numbers, while keeping the insertion order. Say you have input: [1,7,3,5,8,2,4]. Then after sorting you will have [[1,0], [2,5], [3,2], [4, 6], [5,3], [7,1], [8,4]] . This is actually the in-order traversal of the resulting tree. Think hard about how to reconstruct the tree given the in-order traversal and the insertion order (this part will be linear time).
More hints coming if you really need them.

How to build a binary tree in O(N ) time?

Following on from a previous question here I'm keen to know how to build a binary tree from an array of N unsorted large integers in order N time?
Unless you have some pre-conditions on the list that allow you to calculate the position in the tree for each item in constant time it is not possible to 'build', that is sequentially insert, items into a tree in O(N) time. Each insertion has to compare up to Log M times where M is the number of items already in the tree.
OK, just for completeness... The binary tree in question is built from an array and has a leaf for every array element. It keeps them in their original index order, not value order, so it doesn't magically let you sort a list in linear time. It also needs to be balanced.
To build such a tree in linear time, you can use a simple recursive algorithm like this (using 0-based indexes):
//build a tree of elements [start, end) in array
//precondition: end > start
buildTree(int[] array, int start, int end)
{
if (end-start > 1)
{
int mid = (start+end)>>1;
left = buildTree(array, start, mid);
right = buildTree(array, mid, end);
return new InternalNode(left,right);
}
else
{
return new LeafNode(array[start]);
}
}
I agree that this seems impossible in general (assuming we have a general, totally ordered set S of N items.) Below is an informal argument where I essentially reduce the building of a BST on S to the problem of sorting S.
Informal argument. Let S be a set of N elements. Now construct a binary search tree T that stores items from S in O(N) time.
Now do an inorder walk of the tree and print values of the leaves as you visit them. You essentially sorted the elements from S. This took you O(|T|) steps, where |T| is the size of the tree (i.e. the number of nodes). (The size of the BST is O(N log N) in the worst case.)
If |T|=o(N log N) then you just solved the general sorting problem in o(N log N) time which is a contradiction.
I have an idea, how it is possible.
Sort array with RadixSort, this is O(N). Thereafter, use recursive procedure to insert into leafs, like:
node *insert(int *array, int size) {
if(size <= 0)
return NULL;
node *rc = new node;
int midpoint = size / 2;
rc->data = array[midpoint];
rc->left = insert(array, midpoint);
rc->right = insert(array + midpoint + 1, size - midpoint - 1);
return rc;
}
Since we do not iterate tree from up to down, but always attach nodes to a current leafs, this is also O(1).

k-th largest element in B-tree [duplicate]

I'm trying to understand how I should think about getting the k-th key/element in a B-tree. Even if it's steps instead of code, it will still help a lot. Thanks
Edit: To clear up, I'm asking for the k-th smallest key in the B-tree.
There's no efficient way to do it using a standard B-tree. Broadly speaking, I see 2 options:
Convert the B-tree to an order statistic tree to allow for this operation in O(log n).
That is, for each node, keep a variable representing the size (number of elements) of the subtree rooted at that node (that node, all its children, all its children's children, etc.).
Whenever you do an insertion or deletion, you update this variable appropriately. You will only need to update nodes already being visited, so it won't change the complexity of those operations.
Getting the k-th element would involve adding up the sizes of the children until we get to k, picking the appropriate child to visit and decreasing k appropriately. Pseudo-code:
select(root, k) // initial call for root
// returns the k'th element of the elements in node
function select(node, k)
for i = 0 to t.elementCount
size = 0
if node.child[i] != null
size = node.sizeOfChild[i]
if k < size // element is in the child subtree
return select(node.child[i], k)
else if k == size // element is here
&& i != t.elementCount // only equal when k == elements in tree, i.e. k is not valid
return t.element[i]
else // k > size, element is to the right
k -= size + 1 // child[i] subtree + t.element[i]
return null // k > elements in tree
Consider child[i] to be directly to the left of element[i].
The pseudo-code for the binary search tree (not B-tree) provided on Wikipedia may explain the basic concept here better than the above.
Note that the size of a node's subtree should be store in its parent (note that I didn't use node.child[i].size above). Storing it in the node itself will be much less efficient, as reading nodes is considered a non-trivial or expensive operation for B-tree use cases (nodes must often be read from disk), thus you want to minimise the number of nodes read, even if that would make each node slightly bigger.
Do an in-order traversal until you've seen k elements - this will take O(n).
Pseudo-code:
select(root, *k) // initial call for root
// returns the k'th element of the elements in node
function select(node, *k) // pass k by pointer, allowing global update
if node == null
return null
for i = 0 to t.elementCount
element = select(node.child[i], k) // check if it's in the child's subtree
if element != null // element was found
return element
if i != t.elementCount // exclude last iteration
if k == 0 // element is here
return t.element[i]
(*k)-- // only decrease k for t.element[i] (i.e. by 1),
// k is decreased for node.child[i] in the recursive call
return null
You can use a new balanced binary search tree(like Splay or just using std::set) to record what elements are currently in the B-Tree. This will allow every operation to finish in O(logn), and its quite easy to implement(when using std::set) but will double the space cost.
Ok so, after a few sleepless hours I managed to do it, and for anyone who will wonder how, here it goes in pseudocode (k=0 for first element):
get_k-th(current, k):
for i = 0 to current.number_of_children_nodes
int size = size_of_B-tree(current.child[i])
if(k <= size-1)
return get_k-th(current.child[i], k)
else if(k == size && i < current.number_of_children_nodes)
return current.key[i]
else if (is_leaf_node(current) && k < current.number_of_children_nodes)
return node.key[k]
k = k - size - 1;
return null
I know this might look kinda weird, but it's what I came up with and thankfully it works. There might be a way to make this code clearer, and probably more efficient, but I hope it's good enough to help anyone else who might get stuck on the same obstacle as I did.

In a BST two nodes are randomly swapped we need to find those two nodes and swap them back

Given a binary search tree, in which any two nodes are swapped. So we need to find those two nodes and swap them back(we need to swap the nodes, not the data)
I tried to do this by making an additional array which has the inorder traversal of the tree and also saves the pointer to each node.
Then we just traverse the array and find the two nodes that are not in the sorted order, now these two nodes are searched in the tree and then swapped
So my question is how to solve this problem in O(1) space ?
In order traversal on a BST gives you a traversal on the elements, ordered.
You can use an in-order traversal, find the two out of place elements (store them in two pointers) and swap them back.
This method is actually creating the array you described on the fly, without actually creating it.
Note that however, space consumption is not O(1), it is O(h) where h is the height of the tree, due to the stack trace. If the tree is balanced, it will be O(logn)
Depending on the BST this can be done in O(1).
For instance, if the nodes of the tree have a pointer back to their parents you can use the implementation described in the nonRecrusiveInOrderTraversal section of the Wikipedia page for Tree_traversal.
As another possibility, if the BST is stored as a 1-dimensional array, instead of with pointers between nodes, you can simply walk over the array (and do the math to determine each node's parent and children).
While doing the traversal/walk, check to see if the current element is in the correct place.
If it isn't, then you can just see where in the tree it should be, and swap with the element in that location. (take some care for if the root is in the wrong place).
We can modified the isBST() method as below to swap those 2 randomly swapped nodes and correct it.
/* Returns true if the given tree is a binary search tree
(efficient version). */
int isBST(struct node* node)
{
struct node *x = NULL;
return(isBSTUtil(node, INT_MIN, INT_MAX,&x));
}
/* Returns true if the given tree is a BST and its
values are >= min and <= max. */
int isBSTUtil(struct node* node, int min, int max, struct node **x)
{
/* an empty tree is BST */
if (node==NULL)
return 1;
/* false if this node violates the min/max constraint */
if (node->data < min || node->data > max) {
if (*x == NULL) {
*x = node;//same the first incident where BST property is not followed.
}
else {
//here we second node, so swap it with *x.
int tmp = node->data;
node->data = (*x)->data;
(*x)->data = tmp;
}
//return 0;
return 1;//have to return 1; as we are not checking here if tree is BST, as we know it is not BST, and we are correcting it.
}
/* otherwise check the subtrees recursively,
tightening the min or max constraint */
return
isBSTUtil(node->left, min, node->data-1, x) && // Allow only distinct values
isBSTUtil(node->right, node->data+1, max, x); // Allow only distinct values
}

Resources