How to compute the index of the n'th element in a binary search tree?

I'm storing a binary search tree (BST) in an array where the indices of the left and right children of each node are computed as follows:
N = parent node index
L = 2 * N + 1
R = 2 * N + 2
I would like to be able to quickly (ideally in O(1) time) compute the index of the n'th element in the BST.
For example, given the following binary tree ...
        A
      /   \
     B     C
    / \   /
   D   E F
... which would be stored in an array, like this ...
Node array[] = { A, B, C, D, E, F };
The following table shows the expected index for each element:

Element | Node | Index
--------+------+------
   0    |  D   |   3
   1    |  B   |   1
   2    |  E   |   4
   3    |  A   |   0
   4    |  F   |   5
   5    |  C   |   2
What would be the most efficient way to do this?
Note: The BST will always be perfectly balanced ...

To find the index, we need to know which layer of the tree the item is in, and its position within that layer.
Let's number the layers from the bottom up, starting at 1. The layer of the n'th item (1-based) is then 1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1, ... This is the ruler sequence: the layer is one plus the exponent of the largest power of two dividing n, i.e. layer(n) = log2(gcd(n, 2^n)) + 1. You can calculate gcd(n, 2^n) using the Euclidean algorithm in O(log n) time, or extract the same power of two in O(1) with the bit trick n & -n.
To work out where in the layer the item appears, observe that the distance between consecutive in-order positions in the same layer is 2^layer (you can see this by counting the gaps in the drawing above).
Combining these two pieces of information, you should be able to get the index.
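
Here is a minimal sketch of that combination in Python (my own illustration, not part of the original answer; it assumes a perfectly balanced tree, 0-based element numbers, and the layer/position formulas derived above):

def bst_index(n, size):
    m = n + 1                               # 1-based in-order position
    layer = (m & -m).bit_length()           # ruler sequence: 1 + number of trailing zero bits
    height = size.bit_length()              # number of layers in the tree
    depth = height - layer                  # distance of the item's layer from the root
    k = (m // (1 << (layer - 1)) - 1) // 2  # position of the item within its layer
    return (1 << depth) - 1 + k             # first array index at this depth, plus offset

# [bst_index(n, 6) for n in range(6)]  ->  [3, 1, 4, 0, 5, 2], matching the table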

Related

Find the number of elements greater than x in a given range

Given an array with n elements, how do you find the number of elements greater than or equal to a given value x in a given range, index i to index j, in O(log n) time?
The queries are of the form (i, j, x), which means: find the number of elements greater than x from the i-th to the j-th element of the array.
The array is not sorted, and the elements of the array are static.
Edit: i, j and x can all be different for different queries!
If we know all queries beforehand, we can solve this problem offline using a Fenwick tree.
First, we sort all elements of the array and all queries together, based on their values.
So, assuming we have the array [5, 4, 2, 1, 3] and the queries (0, 1, 6) and (2, 5, 2), after sorting we have: [1, 2, 2, 3, 4, 5, 6]
Now, we process each item in descending order (on ties, elements come before queries, so elements equal to the query value are counted, matching the "greater than or equal" formulation):
If we encounter an element of the array, we update its index in the Fenwick tree, which takes O(log n).
If we encounter a query, we check how many indices inside the query's range have already been added to the tree, which also takes O(log n).
For the above example, the process is:
1st item is the query for value 6; as the Fenwick tree is empty, the result is 0.
2nd item is element 5 -> add index 0 to the Fenwick tree.
3rd item is element 4 -> add index 1 to the tree.
4th item is element 3 -> add index 4 to the tree.
5th item is element 2 -> add index 2 to the tree.
6th item is the query for range (2, 5); we query the tree and get the answer 2.
7th item is element 1 -> add index 3 to the tree.
Finished.
So, in total, the time complexity of our solution is O((m + n) log(m + n)), where m and n are the number of queries and the number of elements in the input array, respectively.
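
A rough Python sketch of this offline approach (the names solve_offline, update and prefix are mine; it assumes 0-based inclusive ranges (i, j) with j < n and counts elements >= x, as in the walkthrough):

def solve_offline(arr, queries):              # queries: list of (i, j, x)
    n = len(arr)
    tree = [0] * (n + 1)                      # Fenwick tree over array indices

    def update(i):                            # mark index i as added
        i += 1
        while i <= n:
            tree[i] += 1
            i += i & -i

    def prefix(i):                            # how many marked indices are in [0, i]
        i += 1
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    events = [(v, 0, i) for i, v in enumerate(arr)]
    events += [(q[2], 1, qi) for qi, q in enumerate(queries)]
    events.sort(key=lambda e: (-e[0], e[1]))  # descending; elements before queries on ties
    ans = [0] * len(queries)
    for value, kind, payload in events:
        if kind == 0:
            update(payload)                   # an array element: mark its index
        else:
            i, j, x = queries[payload]        # a query: count marked indices in [i, j]
            ans[payload] = prefix(j) - prefix(i - 1)
    return ans

# solve_offline([5, 4, 2, 1, 3], [(0, 1, 6), (2, 4, 2)])  ->  [0, 2]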
That is possible in O(log n) only if the array is sorted. In that case, binary search for the smallest value passing your condition; its position splits your index range into two intervals, and the count is simply the length of the interval passing the condition.
If the array is not sorted and you need to preserve its order, you can use an index sort. Put together:
definitions
Let <i0,i1> be your used index range and x be your value.
index sort array part <i0,i1>
Create an array of size m = i1 - i0 + 1 and index sort it. This task is O(m log m), where m <= n.
binary search x position in index array
This task is O(log m): you want the smallest index j in <0,m) for which array[ix[j]] >= x.
compute count
Simply count how many indexes there are from j up to m:
count = m-j;
As you can see, if the array is sorted you get O(log m) complexity, but if it is not, you need an O(m log m) sort, which is worse than the naive O(m) approach; the naive approach should be used when the array changes often and cannot be kept sorted.
[Edit1] What I mean by Index sort
By index sort I mean this: let us have an array a
a[] = { 4,6,2,9,6,3,5,1 }
An index sort creates a new array ix of indexes in sorted order, so an ascending index sort means:
a[ix[i]] <= a[ix[i+1]]
In our example, an index bubble sort goes like this:
// init indexes
a[ix[i]]= { 4,6,2,9,6,3,5,1 }
ix[] = { 0,1,2,3,4,5,6,7 }
// bubble sort 1st iteration
a[ix[i]]= { 4,2,6,6,3,5,1,9 }
ix[] = { 0,2,1,4,5,6,7,3 }
// bubble sort 2nd iteration
a[ix[i]]= { 2,4,6,3,5,1,6,9 }
ix[] = { 2,0,1,5,6,7,4,3 }
// bubble sort 3rd iteration
a[ix[i]]= { 2,4,3,5,1,6,6,9 }
ix[] = { 2,0,5,6,7,1,4,3 }
// bubble sort 4th iteration
a[ix[i]]= { 2,3,4,1,5,6,6,9 }
ix[] = { 2,5,0,7,6,1,4,3 }
// bubble sort 5th iteration
a[ix[i]]= { 2,3,1,4,5,6,6,9 }
ix[] = { 2,5,7,0,6,1,4,3 }
// bubble sort 6th iteration
a[ix[i]]= { 2,1,3,4,5,6,6,9 }
ix[] = { 2,7,5,0,6,1,4,3 }
// bubble sort 7th iteration
a[ix[i]]= { 1,2,3,4,5,6,6,9 }
ix[] = { 7,2,5,0,6,1,4,3 }
So the result of ascending index sort is this:
// ix: 0 1 2 3 4 5 6 7
a[] = { 4,6,2,9,6,3,5,1 }
ix[] = { 7,2,5,0,6,1,4,3 }
The original array stays unchanged; only the index array changes. The items a[ix[i]], where i = 0,1,2,3..., are sorted ascending.
So now if x=4 on this interval, you need to find (by binary search) the smallest i that still satisfies a[ix[i]] >= x:
// ix: 0 1 2 3 4 5 6 7
a[] = { 4,6,2,9,6,3,5,1 }
ix[] = { 7,2,5,0,6,1,4,3 }
a[ix[i]]= { 1,2,3,4,5,6,6,9 }
//                *
i = 3; m=8; count = m-i = 8-3 = 5;
So the answer is: 5 items are >= 4.
[Edit2] Just to be sure you know what binary search means for this:

i=0; // init value, marked by `*`
j=4; // max power of 2 < m; i+j is marked by `^`

// ix:      0 1 2 3 4 5 6 7     i  j  i+j  a[ix[i+j]]
a[ix[i]]= { 1,2,3,4,5,6,6,9 }   0  4   4   5>=4 ->        j>>=1;
            *       ^
a[ix[i]]= { 1,2,3,4,5,6,6,9 }   0  2   2   3< 4 -> i+=j;  j>>=1;
            *   ^
a[ix[i]]= { 1,2,3,4,5,6,6,9 }   2  1   3   4>=4 ->        j>>=1;
                * ^
a[ix[i]]= { 1,2,3,4,5,6,6,9 }   2  0   ->  stop
                *

a[ix[i]] < x -> a[ix[i+1]] >= x -> i = 2+1 = 3, found in O(log(m))
So you need an index i and a binary bit mask j (powers of 2). At first set i to zero and j to the biggest power of 2 still smaller than m. For example something like this:
i=0; for (j=1;j<=m;j<<=1); j>>=1;
Now in each iteration, test whether a[ix[i+j]] satisfies the search condition. If yes, update i+=j; otherwise leave i as is. After that move to the next bit, j>>=1; if j==0 stop, else iterate again. At the end the found value is a[ix[i]] and its index is i, obtained in log2(m) iterations, which is also the number of bits needed to represent m-1.
In the example above I use the condition a[ix[i]] < 4, so the found value is the biggest number still < 4 in the array. As we also need to include 4, I just increment the index once at the end (I could have used <= 4 instead, but was too lazy to rewrite the whole thing).
The count of such items is then just the number of elements in the array (or interval) minus the resulting index i.
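
A compact Python sketch of the whole recipe (my own illustration; count_ge is a hypothetical name, and bisect_left plays the role of the hand-rolled binary search above):

from bisect import bisect_left

def count_ge(a, i0, i1, x):
    # index sort of the part <i0,i1>: O(m log m)
    ix = sorted(range(i0, i1 + 1), key=lambda t: a[t])
    vals = [a[t] for t in ix]          # a[ix[i]] in ascending order
    j = bisect_left(vals, x)           # smallest j with a[ix[j]] >= x: O(log m)
    return len(vals) - j               # count = m - j

# count_ge([4, 6, 2, 9, 6, 3, 5, 1], 0, 7, 4)  ->  5, matching the example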
The previous answer describes an offline solution using a Fenwick tree, but this problem can also be solved online (and even with updates to the array) at slightly worse complexity. I'll describe such a solution using a segment tree and an AVL tree (any self-balancing BST would do the trick).
First let's see how to solve this problem using a segment tree alone. We do this by keeping, in every node, the actual elements of the range it covers. So for the array A = [9, 4, 5, 6, 1, 3, 2, 8] we have:
[9 4 5 6 1 3 2 8] Node 1
[9 4 5 6] [1 3 2 8] Node 2-3
[9 4] [5 6] [1 3] [2 8] Node 4-7
[9] [4] [5] [6] [1] [3] [2] [8] Node 8-15
Since the height of our segment tree is log(n) and every level keeps n elements in total, the amount of memory used is O(n log n).
The next step is to sort these arrays, which looks like this:
[1 2 3 4 5 6 8 9] Node 1
[4 5 6 9] [1 2 3 8] Node 2-3
[4 9] [5 6] [1 3] [2 8] Node 4-7
[9] [4] [5] [6] [1] [3] [2] [8] Node 8-15
NOTE: You first need to build the tree and only then sort the arrays, so that the positions of elements in the original array determine the tree structure.
Now we can run range queries, which work basically the same way as in a regular segment tree, except that when we reach a completely overlapped interval, we additionally count the number of elements greater than x. This can be done with binary search in log(n) time, by finding the index of the first element greater than x and subtracting it from the number of elements in that interval.
Say our query is (0, 5, 4). The segment search on the interval [0, 5] ends up with the arrays [4, 5, 6, 9] and [1, 3]. Binary searching these arrays for the number of elements greater than 4 gives 3 (from the first) and 0 (from the second), for a total of 3 - our query answer.
An interval search in a segment tree decomposes into up to O(log n) nodes, which means O(log n) arrays, and since we binary search each of them, the complexity is O(log^2 n) per query.
Now, if we want to update the array: with segment trees it's impossible to add or remove elements efficiently, but we can replace them. Using AVL trees (or other balanced BSTs that allow replacement and lookup in log(n) time) as the per-node containers instead of sorted arrays, we can support replacement within the same overall complexity (log(n) per affected node).
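
A small Python sketch of this merge-sort-tree idea (the names build and query are mine; it keeps per-node sorted lists in a dict and counts elements greater than x on the inclusive range [i, j]):

from bisect import bisect_right

def build(arr, lo, hi, tree, node=1):
    if lo == hi:
        tree[node] = [arr[lo]]
    else:
        mid = (lo + hi) // 2
        build(arr, lo, mid, tree, 2 * node)
        build(arr, mid + 1, hi, tree, 2 * node + 1)
        tree[node] = sorted(tree[2 * node] + tree[2 * node + 1])
    return tree

def query(tree, node, lo, hi, i, j, x):
    if j < lo or hi < i:                       # no overlap
        return 0
    if i <= lo and hi <= j:                    # complete overlap: one binary search
        return len(tree[node]) - bisect_right(tree[node], x)
    mid = (lo + hi) // 2
    return (query(tree, 2 * node, lo, mid, i, j, x) +
            query(tree, 2 * node + 1, mid + 1, hi, i, j, x))

A = [9, 4, 5, 6, 1, 3, 2, 8]
T = build(A, 0, len(A) - 1, {})
print(query(T, 1, 0, len(A) - 1, 0, 5, 4))     # -> 3, as in the example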
This is a special variant of orthogonal range counting queries in 2D.
Each element el[i] is transformed into the point (i, el[i]) on the plane, and the query (i, j, x) becomes: count all points in the rectangle [i, j] x [x, +infty].
You can use 2D range trees (for example: http://www.cs.uu.nl/docs/vakken/ga/slides5b.pdf) for this type of query.
The simple idea is a tree that stores the points in its leaves (each leaf contains a single point), ordered by the X-axis. Each internal node of the tree contains an additional tree that stores all points of its subtree, ordered by the Y-axis.
The space used is O(n log n).
The simple version does the counting in O(log^2 n) time, but using fractional cascading this can be reduced to O(log n).
There is a better solution by Chazelle from 1988 (https://www.cs.princeton.edu/~chazelle/pubs/FunctionalDataStructures.pdf) with O(n) preprocessing and O(log n) query time.
You can find solutions with better query time, but they are far more complicated.
I will try to give you a simple approach.
You must have studied merge sort. In merge sort we keep dividing the array into subarrays and then build it back up; in this approach, instead of discarding the sorted subarrays, we store them as nodes of a binary tree.
This takes O(n log n) space and O(n log n) time to build.
Then, for each query, you just have to find the covering subarrays, which takes O(log n) on average and O(log^2 n) in the worst case.
This structure is commonly called a merge sort tree (note that it is not a Fenwick tree, although both support range counting).
If you want simple code, I can provide you with that.

Computing Probability of an rBST

Let's say we insert A, B, C into an rBST (randomly built binary search tree) in random order; there are 5 possible outcomes:

1)    B
     / \
    A   C

2)  A
     \
      B
       \
        C

3)      C
       /
      B
     /
    A

4)    C
     /
    A
     \
      B

5)  A
     \
      C
     /
    B

a) What is the probability of getting each of these trees?
b) What would the probability be if we added a "D" and the tree looked like this:

A
 \
  B
   \
    C
     \
      D

The worst-case probability? Thanks for your time!
The first thing to notice is that you have 3 elements initially.
You can think of constructing a BST as a recursive process: first you select the root, and then you recursively construct the left and the right subtrees; both are determined by the root.
If you have n items, the probability that a particular one of them is selected as the root of the tree is clearly 1/n (I assume that random means uniformly random and independent of previous choices).
Of course, if you have 1 element or 0 elements, only one tree is possible, so the probability of constructing that tree is 1.
Case 1:

  B
 / \
A   C

Pr = Pr(select B as the root of the whole tree)
   * Pr(tree consisting of 1 element, because only A is less than B)
   * Pr(tree consisting of 1 element, because only C is greater than B)
   = 1/3 * 1 * 1 = 1/3
Case 2:

A
 \
  B
   \
    C

Pr = Pr(select A as the root of the whole tree)
   * Pr(tree of 0 elements, because no element is less than A)
   * Pr(select B as the root of the tree of elements greater than A)
   * Pr(tree of 0 elements, because no remaining element is less than B)
   * Pr(tree of 1 element, because C is greater than B)
   = 1/3 * 1 * 1/2 * 1 * 1 = 1/6
Cases 3, 4, 5:
Constructing any of the remaining trees is analogous to Case 2, because each of them is a chain of one-child nodes - you can compute the probabilities and check it.
Summary
Every possible BST on 3 elements is listed above, so the probabilities of these trees should sum to 1. Let's check:
Pr(Case 1) + 4 * Pr(Case 2) = 1/3 + 4 * 1/6 = 1/3 + 2/3 = 1
You can figure out the answer to your second question by applying the same method.
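
A tiny Python sketch of the general rule behind these computations (my own illustration: under a uniformly random insertion order, the probability of a given BST equals the product, over all nodes, of 1/(size of that node's subtree)):

def probability(tree):                  # tree = (key, left, right) or None
    def walk(t):
        if t is None:
            return 0, 1.0               # (subtree size, probability)
        ls, lp = walk(t[1])
        rs, rp = walk(t[2])
        size = ls + rs + 1              # this node's subtree size
        return size, lp * rp / size     # subtree root chosen with probability 1/size
    return walk(tree)[1]

balanced = ('B', ('A', None, None), ('C', None, None))
chain    = ('A', None, ('B', None, ('C', None, None)))
print(probability(balanced))            # 0.333... = 1/3, Case 1
print(probability(chain))               # 0.166... = 1/6, Case 2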

find kth smallest number in O(logn) time

Here is the problem: given an unsorted array a[n], I need to find the kth smallest number in the range [i, j], where 1 <= i <= j <= n and k <= j - i + 1.
Typically I would use quickselect to do the job, but it is not fast enough when there are many query requests with different ranges [i, j], and I can hardly figure out an algorithm that answers a query in O(log n) time (preprocessing is allowed).
Any idea is appreciated.
PS
Let me make the problem easier to understand. Any kind of preprocessing is allowed, but each query needs to be done in O(log n) time. And there will be many (more than 1) queries, like find the 1st in range [3,7], or the 3rd in range [10,17], or the 11th in range [33,52].
By range [i, j] I mean positions in the original array, not in some sorted order.
For example, for a[5] = {3,1,7,5,9}: the 1st in range [3,4] is 5, the 2nd in range [1,3] is 5, and the 3rd in range [0,2] is 7.
If pre-processing is allowed and not counted towards the time complexity, just use that to construct sub-lists so that you can efficiently find the element you're looking for. As with most optimisations, this trades space for time.
Your pre-processing step is to take your original list of n numbers and create a number of new sublists.
Each of these sublists is a portion of the original, starting at index n, extending through index n+m, and then sorted. So your original list of:
{3, 1, 7, 5, 9}
gives you:
list[0][0] = {3}
list[0][1] = {1, 3}
list[0][2] = {1, 3, 7}
list[0][3] = {1, 3, 5, 7}
list[0][4] = {1, 3, 5, 7, 9}
list[1][0] = {1}
list[1][1] = {1, 7}
list[1][2] = {1, 5, 7}
list[1][3] = {1, 5, 7, 9}
list[2][0] = {7}
list[2][1] = {5, 7}
list[2][2] = {5, 7, 9}
list[3][0] = {5}
list[3][1] = {5,9}
list[4][0] = {9}
This isn't a cheap operation (in time or space), so you may want to maintain a "dirty" flag on the list and only perform it the first time after a modifying operation (insert, delete, change).
In fact, you can use lazy evaluation for even more efficiency. Basically set all sublists to an empty list when you start and whenever you perform a modifying operation. Then, whenever you attempt to access a sublist and it's empty, calculate that sublist (and that one only) before trying to get the kth value out of it.
That ensures sublists are evaluated only when needed and cached to prevent unnecessary recalculation. For example, if you never ask for a value from the 3-through-6 sublist, it's never calculated.
The pseudo-code for creating all the sublists is basically (for loops inclusive at both ends):
for n = 0 to a.lastindex:
    create array list[n]
    for m = 0 to a.lastindex - n:
        create array list[n][m]
        for i = 0 to m:
            list[n][m][i] = a[n+i]
        sort list[n][m]
The code for lazy evaluation is a little more complex (but only a little), so I won't provide pseudo-code for that.
Then, in order to find the kth smallest number in the range i through j (where i and j are the original indexes), you simply look up lists[i][j-i][k-1], a very fast O(1) operation:
1st in range [3,4] (values 5,9):    list[3][4-3=1][1-1=0] = 5
2nd in range [1,3] (values 1,7,5):  list[1][3-1=2][2-1=1] = 5
3rd in range [0,2] (values 3,1,7):  list[0][2-0=2][3-1=2] = 7
Here's some Python code which shows this in action:
orig = [3,1,7,5,9]
print orig
print "====="
list = []
for n in range(len(orig)):
    list.append([])
    for m in range(len(orig) - n):
        list[-1].append([])
        for i in range(m + 1):
            list[-1][-1].append(orig[n + i])
        list[-1][-1] = sorted(list[-1][-1])
        print "(%d,%d)=%s" % (n, m, list[-1][-1])
print "====="
# Gives xth smallest in index range y through z inclusive.
x = 1; y = 3; z = 4; print "(%d,%d,%d)=%d" % (x, y, z, list[y][z-y][x-1])
x = 2; y = 1; z = 3; print "(%d,%d,%d)=%d" % (x, y, z, list[y][z-y][x-1])
x = 3; y = 0; z = 2; print "(%d,%d,%d)=%d" % (x, y, z, list[y][z-y][x-1])
print "====="
print "====="
As expected, the output is:
[3, 1, 7, 5, 9]
=====
(0,0)=[3]
(0,1)=[1, 3]
(0,2)=[1, 3, 7]
(0,3)=[1, 3, 5, 7]
(0,4)=[1, 3, 5, 7, 9]
(1,0)=[1]
(1,1)=[1, 7]
(1,2)=[1, 5, 7]
(1,3)=[1, 5, 7, 9]
(2,0)=[7]
(2,1)=[5, 7]
(2,2)=[5, 7, 9]
(3,0)=[5]
(3,1)=[5, 9]
(4,0)=[9]
=====
(1,3,4)=5
(2,1,3)=5
(3,0,2)=7
=====
The current solution is O((log n)^2). I am pretty sure it can be modified to run in O(log n). The main advantage of this algorithm over paxdiablo's is space efficiency: it needs O(n log n) space, not O(n^2).
First, the complexity of finding the kth smallest element from two sorted arrays of lengths m and n is O(log m + log n). The complexity of finding the kth smallest element from arrays of lengths a, b, c, d, ... is O(log a + log b + ...).
Now, sort the whole array and store it. Sort the first half and the second half of the array and store them, and so on. You will have 1 sorted array of length n, 2 sorted arrays of length n/2, 4 sorted arrays of length n/4, and so on. Total memory required: 1*n + 2*(n/2) + 4*(n/4) + 8*(n/8) + ... = n log n.
Once you have i and j, figure out the list of subarrays which, when concatenated, give you the range [i, j]. There are going to be O(log n) of them. Finding the kth smallest number among them then takes O((log n)^2) time.
Example for the last paragraph:
Assume the array is of size 8 (indexed from 0 to 7). You have the following sorted lists:
A:0-7, B:0-3, C:4-7, D:0-1, E:2-3, F:4-5, G:6-7.
Now construct a tree with pointers to these arrays such that every node points to its immediate constituents: A is the root, B and C are its children, and so on.
Now implement a recursive function that returns the list of covering arrays:
def getArrays(node, i, j):
    if i == node.min and j == node.max:
        return [node]
    if j <= node.left.max:
        # (i,j) is located entirely within the left node
        return getArrays(node.left, i, j)
    elif i > node.left.max:
        # (i,j) is located entirely within the right node
        return getArrays(node.right, i, j)
    else:
        # (i,j) is spread over the left and right nodes
        return (getArrays(node.left, i, node.left.max) +
                getArrays(node.right, node.right.min, j))
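
Once the covering arrays are collected, one simple way to select the kth smallest among them is to binary search on the value, using the root's fully sorted array as the candidate pool. This sketch is mine and costs an extra log factor over the O((log n)^2) bound mentioned above, but it shows the counting idea:

from bisect import bisect_right

def kth_smallest(arrays, k, all_sorted):     # arrays: the covering sorted lists
    lo, hi = 0, len(all_sorted) - 1          # all_sorted: list A (the whole array, sorted)
    while lo < hi:
        mid = (lo + hi) // 2
        x = all_sorted[mid]
        # how many elements of the query range are <= x?
        if sum(bisect_right(a, x) for a in arrays) >= k:
            hi = mid
        else:
            lo = mid + 1
    return all_sorted[lo]

# range [0,5] of [9,4,5,6,1,3,2,8] -> covering arrays [4,5,6,9] and [1,3];
# kth_smallest([[4,5,6,9], [1,3]], 2, [1,2,3,4,5,6,8,9])  ->  3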
Preprocess: make an n x n matrix where the [k][r] entry is the kth smallest of the first r elements (1-indexed for convenience).
Then, given some particular range [i, j] and value for k, do the following:
Find the element at the [k][j] slot of the matrix; call it x.
Go down column i-1 of the matrix and find how many values in it are smaller than or equal to x (treat column 0 as having 0 smaller entries). By construction every column is sorted, so this can be found in log time. Call this value s.
Find the element in the [k+s][j] slot of the matrix. This is your answer.
E.g., given 3 1 7 5 9
3 1 1 1 1
X 3 3 3 3
X X 7 5 5
X X X 7 7
X X X X 9
Now, if we're asked for the 2nd smallest in the [2,4] range (again, 1-indexed), I first find the 2nd smallest in the [1,4] range, which is 3. I then look at column 1 and see that there is 1 element less than or equal to 3. Finally, I find the 3rd smallest in the [1,4] range at the [3][4] slot, which is 5, as desired.
This takes O(n^2) space and O(log n) lookup time.
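
A Python sketch of these steps (my own transcription; note that when elements before position i interleave with the range's values, a single correction pass can undershoot, so this version conservatively repeats steps 1-2 until s stabilizes):

from bisect import bisect_right

def preprocess(a):
    # M[r] is the sorted list of the first r elements: column r of the matrix
    return [sorted(a[:r]) for r in range(len(a) + 1)]

def kth(M, i, j, k):                       # kth smallest of a[i..j], 1-indexed
    s, prev = 0, -1
    while s != prev:                       # iterate until the shift s is stable
        prev = s
        x = M[j][k + s - 1]                # (k+s)th smallest of the first j elements
        s = bisect_right(M[i - 1], x)      # entries <= x among the first i-1 elements
    return M[j][k + s - 1]

M = preprocess([3, 1, 7, 5, 9])
print(kth(M, 2, 4, 2))                     # -> 5, as in the example above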
This one does not require preprocessing but is somewhat slower than O(log N). It's significantly faster than a naive iterate-and-count, and it supports dynamic modification of the sequence.
It goes like this. Suppose the length n satisfies n = 2^x for some x. Construct a segment tree whose root node represents [0, n-1]. For each node representing [a, b] with b > a, let it have two child nodes representing [a, (a+b)/2] and [(a+b)/2+1, b] (that is, do a recursive divide-by-two).
Then, at each node, maintain a separate binary search tree for the numbers within that segment. Each modification of the sequence therefore takes O(log N) [on the segment tree] * O(log N) [on the BSTs]. Queries can be done like this: let Q(a, b, x) be the rank of x within the segment [a, b]. Obviously, if Q(a, b, x) can be computed efficiently, a binary search on x computes the desired answer (with an extra O(log E) factor, where E is the size of the value domain).
Q(a, b, x) can be computed as follows: find the smallest set of segments that make up [a, b], which can be done in O(log N) on the segment tree. For each such segment, query its binary search tree for the number of elements less than x. Add all these numbers to get Q(a, b, x).
This should be O(log N * log E * log N) overall. Not exactly what you asked for, though.
Without preprocessing, in O(log n) time it's not even possible to read all of the elements of the array; since the array is not sorted and there's no other provided information, this is impossible.
There's no way you can do better than O(n) in both the worst and average case without preprocessing: you have to look at every single element.

Algorithms: matrix traversal variation

There is an N × N square mesh-shaped grid of wires. Nodes of the grid are at points (X, Y), where X and Y are integers from 0 to N−1. An electric current flows through the grid, between the nodes at (0, 0) and (N−1, N−1).
Initially, all the wires conduct the current, but the wires burn out at a rate of one per second. The burnouts are described by three zero-indexed arrays of integers, A, B and C, each of size M. For each moment T (0 ≤ T < M), in the T-th second the wire between nodes (A[T], B[T]) and:
(A[T], B[T] + 1), if C[T] = 0 or
(A[T] + 1, B[T]), if C[T] = 1
burns out. You can assume that the arrays describe existing wires, and that no wire burns out more than once. Your task is to determine when the current stops flowing between the nodes at (0,0) and (N−1,N−1).
Write a function:
int wire_burnouts(int N, int A[], int M, int B[], int M2, int C[], int M3);
that, given integer N and arrays A, B and C, returns the number of seconds after which the current stops flowing between the nodes at (0, 0) and (N−1, N−1). If the current keeps flowing even after all M wires burn out, the function should return −1.
For example, given N = 4, M = 9 and the following arrays:
A[0] = 0   B[0] = 0   C[0] = 0
A[1] = 1   B[1] = 1   C[1] = 1
A[2] = 1   B[2] = 1   C[2] = 0
A[3] = 2   B[3] = 1   C[3] = 0
A[4] = 3   B[4] = 2   C[4] = 0
A[5] = 2   B[5] = 2   C[5] = 1
A[6] = 1   B[6] = 3   C[6] = 1
A[7] = 0   B[7] = 1   C[7] = 0
A[8] = 0   B[8] = 0   C[8] = 1
your function should return 8, because just after the eighth wire burns out, there is no connection between the nodes at (0, 0) and (N−1, N−1).
Given N = 4, M = 1 and the following arrays:
A[0] = 0   B[0] = 0   C[0] = 0
your function should return −1, because burning out a single wire cannot break the connection between the nodes at (0, 0) and (N−1, N−1).
Assume that:
N is an integer within the range [1..400];
M is an integer within the range [0..2*N*(N−1)];
each element of array A is an integer within the range [0..N−1];
each element of array B is an integer within the range [0..N−1];
each element of array C is an integer within the range [0..1].
Complexity:
expected worst-case time complexity is O(N^2 * log(N));
expected worst-case space complexity is O(N^2), beyond input storage (not counting the storage required for input arguments).
Construct the complete grid of wires, then destroy the first M/2 wires and check connectivity with a depth-first search. If still connected, destroy M/4 more wires; if not, restore the M/4 most recently destroyed wires. Continue this binary search until the proper T is found.
The time complexity is determined by the number of depth-first searches, O(log M) = O(log N), times the complexity of each depth-first search, O(N^2).
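
A rough Python sketch of this binary search (my own illustration, not the exact C signature from the question; connected(t) runs a DFS on the grid with the first t wires removed):

def wire_burnouts(N, A, B, C):
    M = len(A)

    def connected(t):                    # is (0,0) still linked to (N-1,N-1)?
        burned = set()
        for k in range(t):
            p = (A[k], B[k])
            q = (A[k], B[k] + 1) if C[k] == 0 else (A[k] + 1, B[k])
            burned.add((p, q))
        stack, seen = [(0, 0)], {(0, 0)}
        while stack:
            x, y = stack.pop()
            if (x, y) == (N - 1, N - 1):
                return True
            for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= q[0] < N and 0 <= q[1] < N and q not in seen:
                    e = (min((x, y), q), max((x, y), q))  # normalize edge orientation
                    if e not in burned:
                        seen.add(q)
                        stack.append(q)
        return False

    if connected(M):
        return -1
    lo, hi = 1, M                        # find smallest t that breaks the connection
    while lo < hi:
        mid = (lo + hi) // 2
        if connected(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo

Each connected() costs O(N^2 + M), and there are O(log M) of them, matching the expected O(N^2 * log(N)) bound.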
The previous result may be improved with a disjoint-set data structure.
Construct the complete grid of wires, then destroy M wires as directed by arrays A, B, and C. Add the remaining connected components of the grid to the disjoint-set data structure.
Then sequentially restore the wires, starting from the last elements of these arrays and moving toward the first. While doing this, take unions of the sets in the disjoint-set structure. Stop when the sets containing nodes (0, 0) and (N−1, N−1) are joined together.
If the disjoint-set data structure uses the union by rank and path compression approaches, the time complexity of the whole algorithm is O(N^2 * α(N)), where α is the inverse Ackermann function. This is practically as good as O(N^2).
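
A sketch of this reverse-restore approach (again my own illustration; it uses union by size with path halving, and returns k+1 when restoring the wire burned in second k re-joins the corners, since that wire is the one that broke the connection):

def wire_burnouts_dsu(N, A, B, C):
    M = len(A)
    parent = list(range(N * N))
    size = [1] * (N * N)

    def node(x, y):
        return x * N + y

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]        # path halving
            v = parent[v]
        return v

    def union(u, v):
        u, v = find(u), find(v)
        if u != v:
            if size[u] < size[v]:
                u, v = v, u
            parent[v] = u
            size[u] += size[v]

    def endpoints(k):
        p = (A[k], B[k])
        q = (A[k], B[k] + 1) if C[k] == 0 else (A[k] + 1, B[k])
        return p, q

    burned = {endpoints(k) for k in range(M)}
    for x in range(N):                           # union all surviving wires
        for y in range(N):
            for q in ((x, y + 1), (x + 1, y)):
                if q[0] < N and q[1] < N and ((x, y), q) not in burned:
                    union(node(x, y), node(*q))

    src, dst = node(0, 0), node(N - 1, N - 1)
    if find(src) == find(dst):
        return -1                                # connected even after all burnouts
    for k in range(M - 1, -1, -1):               # restore wires, last burned first
        p, q = endpoints(k)
        union(node(*p), node(*q))
        if find(src) == find(dst):
            return k + 1                         # the wire of second k broke the connection
    return 0                                     # unreachable for a valid grid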
The previous result may be improved further by using the graph dual to the original grid of wires: a node of the dual graph corresponds to a face of the original graph, and each edge of the dual graph crosses the corresponding edge of the original graph. Two additional nodes are needed: a node L connected to every top and left face of the dual graph, and a node R connected to every bottom and right face.
If this dual graph contains a path from L to R, the nodes (0, 0) and (N−1, N−1) cannot be connected to each other. If there is no path from L to R, they are connected.
Initially the dual graph is completely disconnected. While removing edges from the original graph, we add the corresponding edges to the dual graph, updating the disjoint-set data structure at the same time. Stop as soon as the sets containing nodes L and R are joined together.
This algorithm needs to visit the elements of its input arrays A, B, and C only once, which makes it an online algorithm.
The most limiting factor for time complexity is now the initialization of the array of the dual graph's nodes: O(N^2). If there is a way to avoid this initialization, we get an asymptotically more efficient O(M * α(M)) algorithm. There are several approaches to the initialization problem:
Use this trick to initialize the array in O(1) time. This gives an O(M * α(M)) worst-case time algorithm. But in practice it is rarely possible to allocate memory without initializing it (for security reasons).
Initialize the array once and then run this algorithm many times. This gives an O(M * α(M)) amortized time algorithm.
Use a hash table to store the dual graph's nodes. This gives an O(M * α(M)) expected time algorithm, and also improves the space complexity to O(M).

Counting Treaps

Consider the problem of counting the number of structurally distinct binary search trees:
Given N, find the number of structurally distinct binary search trees containing the values 1 .. N
It's pretty easy to give an algorithm that solves this: fix every possible number as the root, then recursively solve the problem for the left and right subtrees:
countBST(numKeys)
    if numKeys <= 1
        return 1
    else
        result = 0
        for i = 1 .. numKeys
            leftBST = countBST(i - 1)
            rightBST = countBST(numKeys - i)
            result += leftBST * rightBST
        return result
I've recently been familiarizing myself with treaps, and I posed the following problem to myself:
Given N, find the number of distinct treaps containing the values 1 .. N with priorities 1 .. N. Two treaps are distinct if they are structurally different relative to EITHER the key OR the priority (read on for clarification).
I've been trying to figure out a formula or an algorithm that can solve this for a while now, but I haven't been successful. This is what I noticed though:
The answers for n = 2 and n = 3 seem to be 2 and 6, based on me drawing trees on paper.
If we ignore the part that says treaps can also be different relative to the priority of the nodes, the problem seems to be identical to counting just binary search trees, since we'll be able to assign priorities to each BST such that it also respects the heap invariant. I haven't proven this though.
I think the hard part is accounting for the possibility to permute the priorities without changing the structure. For example, consider this treap, where the nodes are represented as (key, priority) pairs:
          (3, 5)
         /      \
    (2, 3)      (4, 4)
    /                \
(1, 1)              (5, 2)
We can permute the priorities of both the second and third levels while still maintaining the heap invariant, so we get more solutions even though no keys switch place. This probably gets even uglier for bigger trees. For example, this is a different treap from the one above:
          (3, 5)
         /      \
    (2, 4)      (4, 3)    // swapped priorities
    /                \
(1, 1)              (5, 2)
I'd appreciate if anyone can share any ideas on how to approach this. It seemed like an interesting counting problem when I thought about it. Maybe someone else thought about it too and even solved it!
Interesting question! I believe the answer is N factorial!
Given a tree structure, there is exactly one way to fill in the binary search tree key values.
Thus all we need to do is count the different number of heaps.
Given a heap, consider an in-order traversal of the tree.
This corresponds to a permutation of the numbers 1 to N.
Now, given any permutation of {1, 2, ..., N}, you can construct a heap as follows:
Find the position of the largest element. The elements to its left form the left subtree and the elements to its right form the right subtree. These subtrees are formed recursively by finding the largest element and splitting there.
This gives rise to a heap, as we always choose the max element, and the in-order traversal of that heap is the permutation we started with. Thus we have a way of going from a heap to a permutation and back, uniquely.
Thus the required number is N!.
As an example:

    5
   / \
  3   4        in-order traversal -> 3 5 1 4 2
     / \
    1   2

Now start with 35142. The largest is 5, so 3 forms the left subtree and 142 the right:

  5
 / \
3  {142}

In 142, 4 is the largest, with 1 to its left and 2 to its right, so we get:

    5
   / \
  3   4
     / \
    1   2

The only way to fill in binary search keys for this is:

       (2, 5)
      /      \
 (1, 3)      (4, 4)
             /    \
        (3, 1)    (5, 2)
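
A quick Python sketch of this bijection (my own illustration; it builds the heap from a permutation by recursively splitting at the maximum, with the keys then forced by in-order rank):

def heap_from_permutation(p):
    if not p:
        return None
    m = p.index(max(p))                  # split at the maximum element
    return (p[m], heap_from_permutation(p[:m]), heap_from_permutation(p[m+1:]))

print(heap_from_permutation([3, 5, 1, 4, 2]))
# (5, (3, None, None), (4, (1, None, None), (2, None, None))), the tree above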
For a more formal proof:
If H_N is the number of heaps on 1...N, then we have

H_N = Sum_{L=0}^{N-1} H_L * H_{N-1-L} * C(N-1, L)

(basically, we pick the max and assign it to the root, choose the size L of the left subtree, choose which L elements go there, and recurse on the left and right).
Now,

H_0 = 1
H_1 = 1
H_2 = 2
H_3 = 6

and if H_n = n! for 0 <= n <= K, then

H_{K+1} = Sum_{L=0}^{K} L! * (K-L)! * K!/(L! * (K-L)!) = Sum_{L=0}^{K} K! = (K+1)!
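
A short check of this recurrence against N! (my own sketch, using the formula above):

from math import comb, factorial

def H(n, memo={0: 1}):
    if n not in memo:
        memo[n] = sum(H(L) * H(n - 1 - L) * comb(n - 1, L) for L in range(n))
    return memo[n]

print(all(H(n) == factorial(n) for n in range(10)))   # True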
def countBST(numKeys: Long): Long = numKeys match {
  case 0L => 1L
  case 1L => 1L
  case _  => (1L to numKeys).map{ i => countBST(i - 1) * countBST(numKeys - i) }.sum
}
You didn't actually define structural similarity for treaps -- you just gave examples. I'm going to assume the following definition: two trees are structurally different if and only if they have a different shape, or there exist nodes a (from tree A) and b (from tree B) in the same position such that the priorities of the children of a are in the opposite order of the priorities of the children of b. (It's obvious that if two treaps on the same values have the same shape, then the values in corresponding nodes are the same.)
In other words, if we visualize two trees by just giving the priorities on the nodes, the following two trees are structurally similar:
        7                     7
    6       5             6       5
  4   3   2   1         2   1   4   3   <--- does not change the relative order
                                             of the children of any node:
                                             6's left child is still greater than 6's right child;
                                             5's left child is still greater than 5's right child
but the following two trees are structurally different:
        7                     7
    5       6             6       5     <--- changes the relative order of the
  4   3   2   1         4   3   2   1        children of node 7
Thus for the treap problem, each internal node with two children contributes 2 orderings, and these orderings do not otherwise affect the shape of the tree. So...

def countTreap(numKeys: Long): Long = numKeys match {
  case 0L => 1L
  case 1L => 1L
  case _  => 2 * countTreap(numKeys - 1) +  // 2 cases where the root has only 1 child
             2 * (2L to (numKeys - 1)).map{ i => countTreap(i - 1) * countTreap(numKeys - i) }.sum
             // for each case where the root has 2 children, the root contributes
             // 2 orderings of its children's priorities; the recursion must use
             // countTreap so that every lower level contributes its orderings too
}
