Given a BST, is it possible to find two numbers that add up to a given value in O(n) time and with little additional memory? By "little additional memory," it is implied that you can't copy the entire BST into an array.
This can be accomplished in O(n) time and O(1) additional memory if you have both child and parent pointers. You keep two pointers, x and y, starting x at the minimum element and y at the maximum. If the sum of the two elements is too low, you move x to its successor; if it's too high, you move y to its predecessor. You can report a failure once x points to a larger element than y. Each edge in the tree is traversed at most twice, for a total of O(n) edge traversals, and your only memory usage is the two pointers. Without parent pointers, you need to remember the sequence of ancestors back to the root, which takes at least Ω(log n) memory and possibly more if the tree is unbalanced.
To find a successor, you can use the following pseudocode (analogous code for predecessor):
succ(x) {
    if (x.right != null) {
        ret = x.right;
        while (ret.left != null) ret = ret.left;
        return ret;
    } else {
        retc = x;
        while (retc.parent != null && retc.parent.key < x.key) retc = retc.parent;
        if (retc.parent != null && retc.parent.key > x.key) return retc.parent;
        else return null;
    }
}
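To make the whole scheme concrete, here is a sketch of the pair search itself in Java (my own illustration, not the answerer's code; it assumes a Node class with key, left, right, and parent fields, plus succ and pred implemented as in the pseudocode above):

class Node { int key; Node left, right, parent; }

static Node minNode(Node t) { while (t.left != null) t = t.left; return t; }
static Node maxNode(Node t) { while (t.right != null) t = t.right; return t; }

// Walk x up from the minimum and y down from the maximum until the
// pointers cross; succ/pred between them traverse each edge at most
// twice overall, so the whole search is O(n) with O(1) extra memory.
static int[] findPair(Node root, int target) {
    Node x = minNode(root), y = maxNode(root);
    while (x != null && y != null && x.key < y.key) {
        int sum = x.key + y.key;
        if (sum == target) return new int[] { x.key, y.key };
        if (sum < target) x = succ(x);
        else y = pred(y);
    }
    return null; // no two distinct elements sum to target
}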
I think jonderry is very close, but the parent pointers require Ω(n) memory; that is, they add substantially to the memory usage. What he is doing is two coordinated traversals in opposite directions (small to large and vice versa), trying to keep the sum always close to the target. You can manage that with two stacks, and each stack can only grow to the depth of the tree, which is O(log n) for a balanced tree. I don't know if that counts as "little" additional memory, but it is certainly less additional memory, and o(n). So this is exactly as in jonderry's own comment, but there is no runtime penalty, because traversing a binary tree using only a stack is a well-known, efficient, and definitely O(n) operation. So you have an increasing iterator ii and a decreasing iterator di.
x = ii.next()
y = di.next()
while (x < y) {
    if (x + y > target) { y = di.next() }
    else if (x + y < target) { x = ii.next() }
    else { return (x, y) }
}
return None
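For reference, here is a minimal Java sketch of such a stack-based increasing iterator (my own illustration, reusing the Node class from the sketch above, though only the key, left, and right fields are needed); the decreasing iterator is the mirror image with left and right swapped:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

class InorderIterator {
    // Holds one root-to-node path, so memory is O(depth of the tree).
    private final Deque<Node> stack = new ArrayDeque<>();

    InorderIterator(Node root) { pushLeftSpine(root); }

    private void pushLeftSpine(Node n) {
        for (; n != null; n = n.left) stack.push(n);
    }

    boolean hasNext() { return !stack.isEmpty(); }

    int next() {
        if (stack.isEmpty()) throw new NoSuchElementException();
        Node n = stack.pop();
        pushLeftSpine(n.right); // the successor is the leftmost node of the right subtree
        return n.key;
    }
}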
I had just run into the same problem in the context of computing the pseudomedian, the median of all pairwise averages in a sample.
Yes, if you traverse it as if it were a sorted array.
Say we are given a single-tree binomial heap, such that the binomial tree is of rank r and thus holds 2^r keys. What's the most efficient way to convert it into a sorted array of length k < 2^r, containing the k smallest keys of the tree? Let's assume we can't use any data structure other than lazy binomial heaps and binomial trees. Notice that at each level the children are not necessarily linked in order, so you might have to make some comparisons at some point.
My solution was (assuming 1<=k<=2^r):
Create a new empty lazy binomial heap H.
Insert the root's key into the heap.
Create a new counter x, and set x=1.
For each level i=0,1,... (where the root is at level 0):
Let c be the number of nodes at level i.
Set x=x+c.
Iterate over the nodes in level i and:
Insert each node N to H. (In O(1))
If x < k, recursively make the same process for each node N, passing through x so the counting continues.
Repeat k times:
Extract the minimal key out of the heap and place it in the output array.
Delete the minimal key from the heap. (amortized cost: O(1))
Return output array.
There might be some holes in the pseudo-code, but I think the idea itself is clear. I also managed to implement it. However, I'm not sure that's the most efficient algorithm for this task.
Thanks to Gene's comment I see that the earlier algorithm I suggested will not always work, as it assumes the maximal node at level x is smaller than the minimal node at level x+1, which is not a reasonable assumption.
Yet, I believe this one makes the job efficiently:
public static int[] kMin(FibonacciHeap H, int k) {
    if (H == null || H.isEmpty() || k <= 0)
        return new int[0];

    HeapNode tree = H.findMin();
    int rank = tree.getRank();
    int size = H.size();
    size = (int) Math.min(size, Math.pow(2, rank));
    if (k > size)
        k = size;

    int[] result = new int[k];
    FibonacciHeap heap = new FibonacciHeap();
    HeapNode next = H.findMin();
    for (int i = 0; i < k; i++) { // k iterations
        if (next != null)
            for (Iterator<HeapNode> iter = next.iterator(); iter.hasNext(); ) { // at most next.getParent().getRank() iterations (next and its siblings)
                next = iter.next();
                HeapNode node = heap.insert(next.getKey()); // O(1)
                node.setFreePointer(next);
            }
        next = heap.findMin().getFreePointer();
        result[i] = next.getKey();
        heap.deleteMin(); // O(log n) amortized cost.
        next = next.child;
    }
    return result;
}
"freePointer" is a field in HeapNode, where I can store a pointer to another HeapNode. It is basically the "info field" most heaps have.
Let r be the rank of the tree. Every iteration we insert at most r items into the external heap, and every iteration we use Delete-Min to delete one item from the heap.
Therefore, the total cost of the insertions is O(kr), and the total cost of the Delete-Mins is O(k*log(k) + k*log(r)). So the total cost of everything comes to O(k(log(k) + r)).
For a binary search of a sorted array of 2^n-1 elements in which the element we are looking for appears, what is the amortized worst-case time complexity?
Found this on my review sheet for my final exam. I can't even figure out why we would want amortized time complexity for binary search, because its worst case is O(log n). According to my notes, the amortized cost calculates the upper bound of an algorithm and then divides it by the number of items, so wouldn't that be as simple as the worst-case time complexity divided by n, meaning O(log n)/(2^n - 1)?
For reference, here is the binary search I've been using:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0;                 // start
    int e = sorted.length - 1; // end
    while (s <= e) {
        int mid = s + (e - s) / 2;
        if (sorted[mid] == x)
            return true;
        else if (sorted[mid] < x)
            s = mid + 1;
        else
            e = mid - 1;
    }
    return false;
}
I'm honestly not sure what this means - I don't see how amortization interacts with binary search.
Perhaps the question is asking what the average cost of a successful binary search would be. You could imagine binary searching for all n elements of the array and looking at the average cost of such an operation. In that case, there's one element for which the search makes one probe, two for which the search makes two probes, four for which it makes three probes, etc. This averages out to O(log n).
Hope this helps!
Amortized cost is the total cost over all possible queries divided by the number of possible queries. You will get slightly different results depending on how you count queries that fail to find the item: either don't count them at all, or count one for each gap where a missing item could be.
So for a search of 2^n - 1 items (just as an example to keep the math simple), there is one item you would find on your first probe, 2 items would be found on the second probe, 4 on the third probe, ... 2^(n-1) on the nth probe. There are 2^n "gaps" for missing items (remembering to count both ends as gaps).
With your algorithm, finding an item on probe k costs 2k-1 comparisons. (That's 2 compares for each of the k-1 probes before the kth, plus one where the test for == returns true.) Searching for an item not in the table costs 2n comparisons.
I'll leave it to you to do the math, but I can't leave the topic without expressing how irked I am when I see binary search coded this way. Consider:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0;               // start
    int e = sorted.length;   // end
    // Loop invariant: if x is at sorted[k] then s <= k < e
    int mid = (s + e) / 2;
    while (mid != s) {
        if (sorted[mid] > x) e = mid; else s = mid;
        mid = (s + e) / 2;
    }
    return (mid < e) && (sorted[mid] == x); // mid == e means the array was empty
}
You don't short-circuit the loop when you hit the item you're looking for, which seems like a defect, but on the other hand you do only one comparison on every item you look at, instead of two comparisons on each item that doesn't match. Since half of all items are found at leaves of the search tree, what seems like a defect turns out to be a major gain. Indeed, the number of elements where short-circuiting the loop is beneficial is only about the square root of the number of elements in the array.
Grind through the arithmetic, computing the amortized search cost (counting "cost" as the number of comparisons against sorted[mid]), and you'll see that this version is approximately twice as fast. It also has constant cost (within ±1 comparison), depending only on the number of items in the array and not on where, or even whether, the item is found. Not that that's important.
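To sketch that arithmetic (my own working, using the counts from above): for 2^n - 1 items, the amortized cost of the original version over all successful searches is

avg = sum{ k = 1 .. n } 2^(k-1) * (2k-1) / (2^n - 1)
    = ((2n-3)*2^n + 3) / (2^n - 1)
    ≈ 2n - 3 comparisons,

while the one-comparison-per-probe version always makes about n + 1 comparisons (one per halving step, plus the final equality test), which is where the factor of roughly two comes from.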
I'm trying to understand how I should think about getting the k-th key/element in a B-tree. Even if it's steps instead of code, it will still help a lot. Thanks
Edit: To clear up, I'm asking for the k-th smallest key in the B-tree.
There's no efficient way to do it using a standard B-tree. Broadly speaking, I see 2 options:
Convert the B-tree to an order statistic tree to allow for this operation in O(log n).
That is, for each node, keep a variable representing the size (number of elements) of the subtree rooted at that node (that node, all its children, all its children's children, etc.).
Whenever you do an insertion or deletion, you update this variable appropriately. You will only need to update nodes already being visited, so it won't change the complexity of those operations.
Getting the k-th element would involve adding up the sizes of the children until we get to k, picking the appropriate child to visit and decreasing k appropriately. Pseudo-code:
select(root, k) // initial call for root
// returns the k'th element of the elements in node
function select(node, k)
    for i = 0 to node.elementCount
        size = 0
        if node.child[i] != null
            size = node.sizeOfChild[i]
        if k < size // element is in the child subtree
            return select(node.child[i], k)
        else if k == size // element is here
                && i != node.elementCount // only equal when k == number of elements in tree, i.e. k is not valid
            return node.element[i]
        else // k > size, element is to the right
            k -= size + 1 // child[i] subtree + node.element[i]
    return null // k > number of elements in tree
Consider child[i] to be directly to the left of element[i].
The pseudo-code for the binary search tree (not B-tree) provided on Wikipedia may explain the basic concept here better than the above.
Note that the size of a node's subtree should be stored in its parent (note that I didn't use node.child[i].size above). Storing it in the node itself will be much less efficient, as reading nodes is considered a non-trivial or expensive operation for B-tree use cases (nodes must often be read from disk), thus you want to minimise the number of nodes read, even if that makes each node slightly bigger.
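A direct Java transcription of the pseudocode above might look like this (illustrative only; BNode and its fields are assumed names, with sizeOfChild kept in the parent as just discussed, and k counted from 0):

class BNode {
    int elementCount;   // number of elements stored in this node
    int[] element;      // element[0 .. elementCount-1], sorted
    BNode[] child;      // child[0 .. elementCount]; child[i] is directly left of element[i]
    int[] sizeOfChild;  // subtree size of child[i], stored here in the parent
}

static Integer select(BNode node, int k) {
    for (int i = 0; i <= node.elementCount; i++) {
        int size = (node.child[i] != null) ? node.sizeOfChild[i] : 0;
        if (k < size)
            return select(node.child[i], k);  // k-th element is inside child[i]'s subtree
        if (k == size && i != node.elementCount)
            return node.element[i];           // k-th element is element[i] of this node
        k -= size + 1;                        // skip child[i]'s subtree and element[i]
    }
    return null;                              // k is not less than the number of elements
}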
Do an in-order traversal until you've seen k elements - this will take O(n).
Pseudo-code:
select(root, *k) // initial call for root
// returns the k'th element of the elements in node
function select(node, *k) // pass k by pointer, allowing global update
    if node == null
        return null
    for i = 0 to node.elementCount
        element = select(node.child[i], k) // check if it's in the child's subtree
        if element != null // element was found
            return element
        if i != node.elementCount // exclude last iteration
            if *k == 0 // element is here
                return node.element[i]
            (*k)-- // only decrease k for node.element[i] (i.e. by 1);
                   // k is decreased for node.child[i] in the recursive call
    return null
You can use a separate balanced binary search tree (like a splay tree, or just std::set) to record which elements are currently in the B-tree. This allows every operation to finish in O(log n), and it's quite easy to implement (when using std::set), but it will double the space cost.
Ok so, after a few sleepless hours I managed to do it, and for anyone who wonders how, here it is in pseudocode (k = 0 for the first element):
get_k-th(current, k):
    for i = 0 to current.number_of_children_nodes
        int size = size_of_B-tree(current.child[i])
        if (k <= size - 1)
            return get_k-th(current.child[i], k)
        else if (k == size && i < current.number_of_children_nodes)
            return current.key[i]
        else if (is_leaf_node(current) && k < current.number_of_children_nodes)
            return current.key[k]
        k = k - size - 1
    return null
I know this might look kinda weird, but it's what I came up with and thankfully it works. There might be a way to make this code clearer, and probably more efficient, but I hope it's good enough to help anyone else who might get stuck on the same obstacle as I did.
The RMQ problem can be extended like so:
Given is an array of n integers A.
query(x, y): given two integers 1 ≤ x, y ≤ n, find the minimum of A[x], A[x+1], ... A[y];
update(x, v): given an integer v and 1 ≤ x ≤ n do A[x] = v.
This problem can be solved in O(log n) for both operations using segment trees.
This is an efficient solution on paper, but in practice, segment trees involve a lot of overhead, especially if implemented recursively.
I know for a fact that there is a way to solve the problem in O(log^2 n) for one (or both, I'm not sure) of the operations, using binary indexed trees (more resources can be found, but this and this are, IMO, the most concise and exhaustive, respectively). This solution, for values of n that fit into memory, is faster in practice, because BITs have a lot less overhead.
However, I do not know how the BIT structure is used to perform the given operations. I only know how to use it to query an interval sum for example. How can I use it to find the minimum?
If it helps, I have code that others have written that does what I am asking for, but I cannot make sense of it. Here is one such piece of code:
int que( int l, int r ) {
    int p, q, m = 0;
    for( p = r - (r & -r); l <= r; r = p, p -= p & -p ) {
        q = ( p+1 >= l ) ? T[r] : (p = r-1) + 1;
        if( a[m] < a[q] )
            m = q;
    }
    return m;
}

void upd( int x ) {
    int y, z;
    for( y = x; x <= N; x += x & -x )
        if( T[x] == y ) {
            z = que( x - (x & -x) + 1, x - 1 );
            T[x] = (a[z] > a[x]) ? z : x;
        }
        else
            if( a[ T[x] ] < a[ y ] )
                T[x] = y;
}
In the above code, T is initialized to 0, a is the given array, and N is its size (they index from 1 for whatever reason); upd is called once for every value as it is read. Before upd is called, a[x] = v is executed.
Also, p & -p is the same as the p ^ (p & (p - 1)) seen in some BIT sources, and indexing starts from 1 with the zero element initialized to infinity.
Can anyone explain how the above works or how I could solve the given problem with a BIT?
I haven't looked at the code in detail, but it seems to be roughly consistent with the following scheme:
1) Keep the structure of the BIT, that is, impose a tree structure based on powers of two on the array.
2) At each node of the tree, keep the minimum value found at any descendant of that node.
3) Given an arbitrary range, put pointers at the start and end of the range and move them both upwards until they meet. If you move a pointer upwards and towards the other pointer, then you have just entered a node in which every descendant is a member of the range, so take note of the value at that node. If you move a pointer upwards and away from the other pointer, the node you have just joined records a minimum derived from values including those outside the range; since you have already taken note of every relevant value below that node inside the range, ignore the value at that node.
4) Once the two pointers are the same pointer, the minimum in the range is the minimum value in any node that you have taken note of.
From a level above the bit fiddling, this is what we have:
A normal BIT array g for integer data array a stores range sums.
g[k] = sum{ i = D(k) + 1 .. k } a[i]
where D(k) is just k with the lowest-order 1 bit set to 0. Here we have instead
T[k] = min{ i = D(k) + 1 .. k } a[i]
The query works exactly like a normal BIT range sum query with the change that minima of subranges are taken as the query proceeds rather than sums. For N items in a, there are ceiling(log N) bits in N, which determines the run time.
The update takes more work because O(log N) subrange minima - i.e. elements of T - are affected by the change, and each takes an O(log N) query of its own to resolve. This makes the update O(log^2 N) overall.
At the bit fiddling level this is fiendishly clever code. The statement x += x & -x clears the lowest-order consecutive string of 1's in x and then sets the next highest-order zero to 1. This is just what you need to "traverse" the BIT for the original integer x.
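For example (my own illustration), with x = 44 = 101100 in binary, x & -x is 100, and x += x & -x gives 48 = 110000: the lowest-order run of 1's (bits 2-3) is cleared and the next higher zero (bit 4) is set.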
Segment trees are an efficient solution in practice too. You don't implement them as trees, though. Round n up to the next power of two and use an array rmq of size 2*n. The last n entries of rmq are A. If j < n, then rmq[j] = min(rmq[2*j], rmq[2*j+1]). You only need to look at logarithmically many entries of rmq to answer a range-minimum query. And you only need to update logarithmically many entries of rmq when an entry of A is updated.
I don't understand your code, though, so I'm not going to remark on it.
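To make that layout concrete, here is a small Java sketch of the array-based segment tree just described (my own illustration; it pads to a power of two with Integer.MAX_VALUE as the identity for min and uses 0-based positions in A):

class SegTreeMin {
    private final int n;      // size rounded up to a power of two
    private final int[] rmq;  // rmq[n..2n-1] holds A; rmq[j] = min(rmq[2j], rmq[2j+1]) for j < n

    SegTreeMin(int[] a) {
        int m = 1;
        while (m < a.length) m *= 2;
        n = m;
        rmq = new int[2 * n];
        java.util.Arrays.fill(rmq, Integer.MAX_VALUE);
        for (int i = 0; i < a.length; i++) rmq[n + i] = a[i];
        for (int j = n - 1; j >= 1; j--) rmq[j] = Math.min(rmq[2 * j], rmq[2 * j + 1]);
    }

    void update(int x, int v) {            // A[x] = v; fixes O(log n) entries on the path to the root
        int j = n + x;
        rmq[j] = v;
        for (j /= 2; j >= 1; j /= 2) rmq[j] = Math.min(rmq[2 * j], rmq[2 * j + 1]);
    }

    int query(int l, int r) {              // min of A[l..r], inclusive; touches O(log n) entries
        int res = Integer.MAX_VALUE;
        for (l += n, r += n + 1; l < r; l /= 2, r /= 2) {
            if ((l & 1) == 1) res = Math.min(res, rmq[l++]);
            if ((r & 1) == 1) res = Math.min(res, rmq[--r]);
        }
        return res;
    }
}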
A binary tree of N nodes is 'curious' if its node values are 1, 2, ..., N and it satisfies the following properties:
Each internal node of the tree has exactly one descendant which is greater than it.
Every number in 1,2, ..., N appears in the tree exactly once.
Example of a curious binary tree:
    4
   / \
  5   2
     / \
    1   3
Can you give an algorithm to generate a uniformly random curious binary tree of n nodes, which runs in O(n) guaranteed time?
Assume you only have access to a random number generator which can give you a (uniformly distributed) random number in the range [1, k] for any 1 <= k <= n. Assume the generator runs in O(1).
I would like to see an O(n log n) time solution too.
Please follow the usual definition of distinctness for labelled binary trees when counting distinct curious binary trees.
There is a bijection between "curious" binary trees and standard heaps. Namely, given a heap, recursively (starting from the top) swap each internal node with its largest child. And, as I learned in StackOverflow not long ago, a heap is equivalent to a permutation of 1,2,...,N. So you should make a random permutation and turn it into a heap; or recursively make the heap in the same way that you would have made a random permutation. After that you can convert the heap to a "curious tree".
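To illustrate the transform (my own sketch, not code from the answer): assuming a max-heap, where each node's key is at least those of its descendants, swapping each internal node's key with the larger of its children's keys, top-down, pushes the maximum key down one path and leaves every internal node with exactly one greater descendant.

class TreeNode { int key; TreeNode left, right; }

// Converts a max-heap into a "curious" tree in place.
static void toCurious(TreeNode t) {
    if (t == null || (t.left == null && t.right == null)) return; // leaves are unconstrained
    TreeNode larger;
    if (t.left == null) larger = t.right;
    else if (t.right == null) larger = t.left;
    else larger = (t.left.key > t.right.key) ? t.left : t.right;
    int tmp = t.key;          // swap this node's key with its larger child's key
    t.key = larger.key;
    larger.key = tmp;
    toCurious(t.left);
    toCurious(t.right);
}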
Aha, I think I've got how to create a random heap in O(N) time (after which, use the approach in Greg Kuperberg's answer to transform it into a "curious" binary tree).
edit 2: Rough pseudocode for making a random min-heap directly. Max-heap is identical except the values inserted into the heap are in reverse numerical order.
struct Node {
    Node left, right;
    Object key;

    constructor newNode() {
        N = new Node;
        N.left = N.right = null;
        N.key = null;
    }
}

function create-random-heap(RandomNumberGenerator rng, int N)
{
    Node heap = Node.newNode();
    // Creates a heap with an "incomplete" node containing a null, and having
    // both child nodes as null.
    List incompleteHeapNodes = [heap];
    // use a vector/array type list to keep track of incomplete heap nodes.
    for k = 1:N
    {
        // loop invariant: incompleteHeapNodes has k members. Order is unimportant.
        int m = rng.getRandomNumber(k) - 1;
        // a random index between 0 and k-1
        Node node = incompleteHeapNodes.get(m);
        // pick a random node from the incomplete list,
        // make it a complete node with key k.
        // It is ok to do so since all of its parent nodes
        // have values less than k.
        node.left = Node.newNode();
        node.right = Node.newNode();
        node.key = k;
        // Now remove this node from incompleteHeapNodes
        // and add its children. (replace node with node.left,
        // append node.right)
        incompleteHeapNodes.set(m, node.left);
        incompleteHeapNodes.append(node.right);
        // All operations in this loop take O(1) time.
    }
    return prune-null-nodes(heap);
}

// get rid of all the incomplete nodes.
function prune-null-nodes(heap)
{
    if (heap == null || heap.key == null)
        return null;
    heap.left = prune-null-nodes(heap.left);
    heap.right = prune-null-nodes(heap.right);
    return heap;
}