Find the kth node in Nth level of binary tree [duplicate] - algorithm

Possible Duplicate:
K-th element in a heap tree
Consider a binary tree whose root is 0. If a parent is 0, its left child is 0 and its right child is 1; if a parent is 1, its left child is 1 and its right child is 0. Find the value of the kth node on the Nth level.
I tried to solve it this way. The first level is 0, the second level is 01, and the third level is 0110 (the second half, 10, is the complement of the first half, 01).
Similarly, the fourth level is 0110 1001.
How can I generalize this solution, or is there another way to solve this question?

To generalize your idea, you could write a recursive procedure that returns the list of the elements of the nth level of the tree, since (as you said) every level can be obtained by concatenating the level above it with its complement:
getLevel(level)
    if level == 0
        return [0]
    upperLevel = getLevel(level - 1)
    return upperLevel + complement(upperLevel)
Here [...] is a list, + is list concatenation, and complement changes 0 into 1 and vice versa.
With this, you just have to take the kth element of the list generated by getLevel(n).
This is probably not the optimal solution (the list for level n has 2^n elements, so time and memory grow exponentially with n); it's just built on your idea, and it's easy.

I manually generated the first several bits and got 0110100110010110. Google reveals that this is the Thue-Morse sequence, sequence A010060 in the OEIS. The comments on the OEIS page include this line:
a(n) = S2(n) mod 2, where S2(n) = sum of digits of n, n in base-2 notation.
Here n is your k (counted from 0), and your N does not matter, because every level is a prefix of the next one. So, to determine a(n), count the number of 1s in the binary representation of n and take the least significant bit of that count.
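A minimal Python sketch of the digit-sum rule quoted above. The function name `kth_node` is made up for illustration, and it assumes k is counted from 0 (pass k - 1 if you number the nodes from 1):

```python
def kth_node(k):
    """Value of the k-th node (0-based) on any level that has more than k nodes.

    Implements a(k) = (number of 1 bits in k) mod 2, the Thue-Morse rule.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return bin(k).count("1") % 2


# First 16 values, joined into a string for comparison with the hand-generated bits.
first_sixteen = "".join(str(kth_node(k)) for k in range(16))
```

The level number is not needed as an argument because each level is a prefix of the next; it only bounds which k are valid.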

Related

Point Updates in Fenwick Tree

I have a hard time understanding how adding the LSB to the current index gives us the next position that contains the given point.
void update(int k, int x) {
    while (k <= n) {
        tree[k] += x;
        k += k&-k; // adding LSB (least significant bit)
    }
}
Can anyone explain this to me or refer me to some resources? All the resources I've seen just tell you that it works, but do not explain why. I know how the query works, though.
Thanks.
P.S. I've seen similar questions here, but I still don't get it, since they do not really explain it.
The Fenwick Tree data structure can be quite tricky to grasp, but once you understand the underlying mathematics it becomes much easier. So I will try to explain the hows and whys of Fenwick Trees.
Fenwick Tree is based on the Binary Representation of the array index
First and foremost, what you should firmly understand is this:
The idea of the Fenwick Tree is based on the fact that every positive integer can be represented as a binary number, i.e. as a sum of different powers of 2, and that this representation is unique; e.g. the integer 14 can be represented as 2^3 + 2^2 + 2^1.
Note that "different" is an important keyword in this definition, so you should not represent 14 as 2^3 + 2^1 + 2^1 + 2^1.
How Fenwick Tree is populated
I will not implement the Fenwick Tree population algorithm here (you said you understand how the tree is populated, and it is irrelevant to the question anyway); however, I will stress that a Fenwick Tree is usually implemented as an array in which the slot at index k holds the sum of a range of the original array, where:
the right bound of that range is k itself;
the number of elements in that range is the smallest addend in the sum-of-powers-of-two representation of k (so you count that many elements from k leftwards to get the range).
For example, if the Fenwick Tree stores some value v at index 24, this means that the sum of the interval [17, 24] of the original array is v.
Q: Why is 17 the left bound?
A: Because 24 is 2^4 + 2^3, and the smallest addend in this expression is 2^3 = 8. According to the definition given above, the range that sums up to the element at index 24 of the Fenwick Tree array contains 8 elements. Since the right bound is index 24 itself, counting 8 consecutive elements from right to left gets us to the left bound, index 17. Therefore we have 8 elements in the inclusive range [17, 24], and the value at index 24 is v, the sum of the elements in [17, 24].
This image illustrates what I wrote above even more clearly:
Important note:
Representing an integer as a sum of different powers of 2 stems from the principles of the binary numeral system.
For instance, 1011 can be written as 2^3 + 2^1 + 2^0.
The leftmost column of this binary representation corresponds to 2 to the power of 3, and the rightmost column to 2 to the power of 0. In a binary representation, the power of 2 increases by 1 with each step from the rightmost column to the left.
If you understand the binary numeral system, you will see that when a number N is written as a sum of different powers of two, the smallest addend in that sum is the value of the lowest set bit of N, i.e. the rightmost 1 in N's binary representation together with the zeros to its right. (This is what "LSB" refers to throughout this answer.) It equals 2 to the power of indexOf(LSB)-1 if you index the bits from 1 starting at the right, or 2 to the power of indexOf(LSB) if you index them from 0.
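To make the slot-to-range rule concrete, here is a small Python sketch. The helper names `lowbit` and `covered_range` are made up for illustration; the only assumption is the usual two's-complement trick `k & -k` for isolating the lowest set bit, which is the same expression used in the question's update loop.

```python
def lowbit(k):
    """Value of the lowest set bit of k, i.e. the smallest power of two in k's binary sum."""
    return k & -k


def covered_range(k):
    """Inclusive range of the original (1-based) array summed in Fenwick slot k."""
    return k - lowbit(k) + 1, k


# 24 = 2^4 + 2^3, so its smallest addend is 8 and slot 24 covers 8 elements.
example = covered_range(24)
```

For odd k the lowest set bit is 1, so `covered_range(k)` collapses to the single element `(k, k)`.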
What does all this give?
Faster Range Queries
Let's see how a range query works in the Fenwick Tree.
I hope you understand that we need prefix sums for range queries.
In order to calculate the prefix sum of original[1, index], instead of iterating over the entire array you just cascade down in the corresponding Fenwick Tree starting from that index: you repeatedly subtract the LSB from the index while summing up the values stored at the indices you visit (each of which is the sum of a range of the original array).
This looks like:
// LSB(index) is the value of the lowest set bit, e.g. index & -index
int prefixSum(int index) {
    int sum = 0;
    while (index != 0) {
        sum += fenwickTree[index];
        index = index - LSB(index);
    }
    return sum;
}
Q: Why does this work?
A: It should be clear by now, but if it is not, pay close attention to why we subtract LSB(index). After you have added fenwickTree[index] to the running sum, the next slot storing the adjacent slice of the original array is at index - LSB(index), because in the Fenwick Tree slot k stores the sum of the interval [k - LSB(k) + 1, k], whose length is LSB(k).
So, according to what we have just shown (cascading, summing, and index-LSB(index)), with the Fenwick Tree, the prefix sum for index 11 (for example), will be calculated as:
prefixSum = fenwickTree[11] + fenwickTree[10] + fenwickTree[8]
because:
fenwickTree[11] stores sum of original[11] (odd indices store only values at those indices);
fenwickTree[10] stores sum of original[9,10];
fenwickTree[8] stores sum of original[1, 8].
You basically have 3 slices to sum up: [1,8], [9,10] and [11].
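The decomposition above can be checked with a short Python sketch. `prefix_slices` is a hypothetical helper, not part of any library; it only lists the slices that a prefix-sum query would visit, using `index & -index` for the LSB.

```python
def prefix_slices(index):
    """Inclusive (left, right) slices of the original 1-based array that a
    Fenwick prefix-sum query for `index` adds up, in the order they are visited."""
    slices = []
    while index > 0:
        low = index & -index              # LSB(index)
        slices.append((index - low + 1, index))
        index -= low                      # jump to the slot holding the next slice
    return slices


slices_for_11 = prefix_slices(11)
```

The slices never overlap and always end at 1, which is why their sums add up to exactly the prefix sum.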
Faster Point Updates
Let's see how a point update works in the Fenwick Tree.
It should now be clear why and how a point update works. In terms of the LSB, it is the opposite of the range query: instead of subtracting LSB(index), you add LSB(index), cascading UP through the indices and updating the corresponding slots in the Fenwick Tree.
For instance, to add a value at index 9, you have to find all the slots whose ranges contain index 9 and update each of them. Start at slot 9 itself, then repeatedly add LSB(i) to the index: 9 (1001) -> 10 (1010) -> 12 (1100) -> 16 (10000). These slots store the ranges [9, 9], [9, 10], [9, 12] and [1, 16]. Adding LSB(i) works because i + LSB(i) is the smallest index greater than i whose range still contains i. You keep repeating this until the index exceeds n, the size of the tree. That's it.
void update(int i, int x) {
    while (i <= n) {
        fenwickTree[i] += x;
        i += LSB(i); // this will give you the next slot which is used as an addend
    }
}
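Putting the two loops together, a minimal Python sketch of a 1-based Fenwick Tree might look like the following. The class and method names are illustrative only, and `update_path` is an extra hypothetical helper that shows which slots an update touches.

```python
class FenwickTree:
    """1-based Fenwick Tree supporting point updates and prefix sums."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)   # slot 0 is unused

    def update(self, i, x):
        """Add x to original[i] by updating every slot whose range contains i."""
        while i <= self.n:
            self.tree[i] += x
            i += i & -i             # next slot whose range contains the point

    def prefix_sum(self, i):
        """Sum of original[1..i]."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i             # jump to the slot holding the next slice
        return total


def update_path(i, n):
    """Slots visited when updating index i in a tree of size n."""
    path = []
    while i <= n:
        path.append(i)
        i += i & -i
    return path


path_for_9 = update_path(9, 16)
```

For a tree of size 16, updating index 9 touches slots 9, 10, 12 and 16, matching the walk described above.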
I really hope this helps you and sheds some light on your understanding.

find 4th smallest element in linear time

So i had an exercise given to me about 2 months ago, that says the following:
Given n (n>=4) distinct elements, design a divide & conquer algorithm to compute the 4th smallest element. Your algorithm should run in linear time in the worst case.
I had an extremely hard time with this problem, and could only find relevant algorithms that run in O(n*k) in the worst case. After several weeks of trying, we managed, with the help of our teacher, to "solve" this problem. The final algorithm is as follows:
Rules: The input size can only be of size 2^k
(1): Divide input into n/2. One left array, one right array.
(2): If input size == 4, sort the arrays using merge sort.
(2.1) Merge left array with right array into a new result array with length 4.
(2.2) Return element at index [4-1]
(3): Repeat step 1
This is solved recursively, and our base case is at step 2. Step 2.2 means that after all
of the recursive calls have returned, we get a final result array of length 4, and at that
point we can just return the element at index [4-1].
My teacher claims that this algorithm runs in linear time. My problem with that statement is that we keep dividing the input until we reach sub-arrays of size 4, and then those are sorted. So for an input size of 8, we would sort 2 sub-arrays of length 4, since 8/4 = 2. How is this linear time in any case? We are still sorting the whole input, just in blocks, aren't we? This really does not make sense to me. Does it matter whether we sort the whole input as it is, or divide it into sub-arrays of size 4 and sort those? Won't the worst case still be O(n*log(n))?
I would appreciate some explanations on this!
To make it easier to prove that the algorithm runs in linear time, let's modify it a bit (we only change the order of dividing and merging blocks, nothing more):
(1): Divide the input into n/4 blocks of size 4 and sort each block.
(2): While there is more than one block, repeat:
    Merge each pair of adjacent blocks into one block of size 4.
    (For example, if we have 4 blocks, we split them into 2 pairs -
    the first pair contains the first and second blocks,
    the second pair contains the third and fourth blocks.
    After merging we have 2 blocks -
    the first one contains the 4 smallest elements from blocks 1 and 2,
    the second one contains the 4 smallest elements from blocks 3 and 4).
(3): The answer is the last element of the one block that is left.
Proof: An array of constant length (in your case, 4) can be sorted in constant time, so sorting each of the n/4 initial blocks takes O(n) in total. Let k = log2(n). Loop (2) runs k-2 iterations (on each iteration the number of elements left is halved, until 4 elements are left).
Before the i-th iteration (0 <= i < k-2) there are 2^(k-i) elements left, so there are 2^(k-i-2) blocks and we merge 2^(k-i-3) pairs of blocks. Let's count how many pairs we merge over all iterations:
mergeOperationsCount = 2^(k-3) + 2^(k-4) + ... + 2^0 =
= 2^(k-3) * (1 + 1/2 + 1/4 + ... + 1/2^(k-3)) < 2^(k-2) = O(2^k) = O(n)
Since each pair can be merged in constant time (because the blocks have constant size), and merging pairs is the only other operation we perform, the algorithm runs in O(n).
After this proof, I want to note that there is another linear algorithm which is trivial, but it is not divide-and-conquer.
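The block-merging procedure from steps (1) to (3) above can be sketched in Python as follows. The function names are made up for illustration, and the sketch assumes, as in the question, that the input length is a power of two and at least 4.

```python
def merge_keep4(a, b):
    """Merge two sorted blocks of 4 elements and keep only the 4 smallest (constant time)."""
    out = []
    i = j = 0
    while len(out) < 4:
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out


def fourth_smallest(values):
    """4th smallest element; assumes len(values) is a power of two and >= 4."""
    n = len(values)
    if n < 4 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 4")
    # Step (1): n/4 sorted blocks of constant size 4.
    blocks = [sorted(values[i:i + 4]) for i in range(0, n, 4)]
    # Step (2): halve the number of blocks until one is left.
    while len(blocks) > 1:
        blocks = [merge_keep4(blocks[i], blocks[i + 1])
                  for i in range(0, len(blocks), 2)]
    # Step (3): the last element of the remaining block.
    return blocks[0][3]


answer = fourth_smallest([9, 1, 8, 2, 7, 3, 6, 4])
```

Inside `merge_keep4` neither index can run past its block, because at most three elements have been taken in total whenever a comparison is made.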

Sequence increasing and decreasing by turns

Let's assume we've got a sequence of integers of given length n. We want to delete some elements (maybe none), so that the sequence is increasing and decreasing by turns in result. It means, that every element should have neighbouring elements either both bigger or both smaller than itself.
For example 1 3 2 7 6 and 5 1 4 2 10 are both sequences increasing and decreasing by turns.
We want to delete some elements to transform our sequence that way, but we also want to maximize the sum of elements left. So, for example, from sequence 2 18 6 7 8 2 10 we want to delete 6 and make it 2 18 7 8 2 10.
I am looking for an efficient solution to this problem. The example above shows that the most naive greedy algorithm (delete the first element that breaks the pattern) won't work: it would delete 7 instead of 6, which would not maximize the sum of the remaining elements.
Any ideas on how to solve this efficiently (probably in O(n) or O(n log n)) and correctly?
For every element of the sequence with index i we will calculate F(i, high) and F(i, low), where F(i, high) is the biggest sum of a subsequence with the wanted characteristics that ends with the i-th element, with this element being a "high peak". (I'll mainly explain the "high" part; the "low" part can be done similarly.) We can calculate these functions using the following relations:
F(i, high) = a[i] + max(F(j, low)) over all j < i with a[j] < a[i]
F(i, low) = a[i] + max(F(j, high)) over all j < i with a[j] > a[i]
(the max is taken to be 0 when no such j exists).
The answer is the maximum among all F(i, high) and F(i, low) values.
That gives us a rather simple dynamic programming solution with O(n^2) time complexity. But we can go further.
We can optimize the calculation of the max(F(j, low)) part. What we need is the biggest value among the previously calculated F(j, low) subject to the condition a[j] < a[i]. This can be done with segment trees.
First of all, we'll "squeeze" our initial sequence. We need the real value of the element a[i] only when calculating the sum. But we need only the relative order of the elements when checking that a[j] is less than a[i]. So we'll map every element to its index in the sorted elements array without duplicates. For example, sequence a = 2 18 6 7 8 2 10 will be translated to b = 0 5 1 2 3 0 4. This can be done in O(n*log(n)).
The biggest element of b will be less than n, as a result, we can build a segment tree on the segment [0, n] with every node containing the biggest sum within the segment (we need two segment trees for "high" and "low" part accordingly). Now let's describe the step i of the algorithm:
Find the biggest sum max_low on the segment [0, b[i]-1] using the "low" segment tree (initially all nodes of the tree contain zero).
F(i, high) is equal to max_low + a[i].
Find the biggest sum max_high on the segment [b[i]+1, n] using the "high" segment tree.
F(i, low) is equal to max_high + a[i].
Update the [b[i], b[i]] segment of the "high" segment tree with F(i, high) value recalculating maximums of the parent nodes (and [b[i], b[i]] node itself).
Do the same for "low" segment tree and F(i, low).
Complexity analysis: b sequence calculation is O(n*log(n)). Segment tree max/update operations have O(log(n)) complexity and there are O(n) of them. The overall complexity of this algorithm is O(n*log(n)).
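The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the elements are positive integers as in the example, the class and function names are invented here, and a point update keeps the larger of the old and new value so that repeated values of a[i] do not overwrite a better sum.

```python
class MaxSegmentTree:
    """Iterative segment tree over positions 0..size-1 holding maxima (all zeros initially)."""

    def __init__(self, size):
        self.size = size
        self.tree = [0] * (2 * size)

    def update(self, pos, value):
        # Keep the larger value at the leaf, then recompute the ancestors.
        i = pos + self.size
        self.tree[i] = max(self.tree[i], value)
        i //= 2
        while i >= 1:
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def query(self, lo, hi):
        # Maximum over the inclusive range [lo, hi]; 0 if the range is empty.
        best = 0
        lo += self.size
        hi += self.size + 1
        while lo < hi:
            if lo & 1:
                best = max(best, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = max(best, self.tree[hi])
            lo //= 2
            hi //= 2
        return best


def max_zigzag_sum(a):
    """Largest sum of a subsequence of a that alternately rises and falls."""
    rank = {v: r for r, v in enumerate(sorted(set(a)))}   # the "squeezed" sequence b
    m = len(rank)
    high, low = MaxSegmentTree(m), MaxSegmentTree(m)
    best = 0
    for x in a:
        b = rank[x]
        f_high = x + low.query(0, b - 1)        # previous element is a smaller "low"
        f_low = x + high.query(b + 1, m - 1)    # previous element is a bigger "high"
        high.update(b, f_high)
        low.update(b, f_low)
        best = max(best, f_high, f_low)
    return best


example_sum = max_zigzag_sum([2, 18, 6, 7, 8, 2, 10])
```

For the example sequence this yields 47, the sum of 2 18 7 8 2 10.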

Efficient approach to find co-prime subarrays

Given an array, is it possible to find the number of co-prime sub arrays of the array in better than O(N²) time? Co-prime arrays are defined as a contiguous subset of an array such that GCD of all elements is 1.
Consider adding one element to the end of the array. Now find the rightmost position, if any, such that the sub-array from that position to the element you have just added is co-prime. Since it is rightmost, no shorter array ending with the element added is co-prime. Since it is co-prime, every array that starts to its left and ends with the new element is co-prime. So you have worked out the number of co-prime sub-arrays that end with the new element. If you can find the rightmost position efficiently - say in O(log n) instead of O(n) - then you can count the number of co-prime sub-arrays in O(n log n) by extending the array one element at a time.
To make it possible to find rightmost positions, think of the full array as the leaves of a complete binary tree, padded out to make its length a power of two. At each node put the GCD of all of the elements below that node; you can do this from the bottom up in time O(n). Every contiguous interval within the array can be covered by a collection of O(log n) nodes such that the interval consists of exactly the leaves underneath those nodes, so you can compute the GCD of the interval in time O(log n).
To find the rightmost position forming a co-prime subarray with your current element, start with the current element and check to see if it is 1. If it is, you are finished. If not, look at the element to its left, take a GCD with that, and push the result on a stack. If the result is 1, you are finished, if not, do the same, but look to see if there is a sub-tree of 2 elements you can use to add 2 elements at once. At each of the succeeding steps you double the size of the sub-tree you are trying to find. You won't always find a convenient sub-tree of the size you want, but because every interval can be covered by O(log n) subtrees you should get lucky often enough to go through this step in time O(log n).
Now you have either found that whole array to the current element is not co-prime or you have found a section that is co-prime, but may go further to the left than it needs. The value at the top of the stack was computed by taking the GCD of the value just below it on the stack and the GCD at the top of a sub-tree. Pop it off the stack and take the GCD of the value just below it and the right half of the sub-tree. If you are still co-prime then you didn't need the left half of the sub-tree. If not, then you needed it, but perhaps not all of it. In either case you can continue down to find the rightmost match in time O(log n).
So I think you can find the rightmost position forming a co-prime subarray with the current element in time O(log n) (admittedly with some very fiddly programming), so you can count the number of co-prime sub-arrays in time O(n log n).
Two examples:
List 1, 3, 5, 7. The next level is 1, 1 and the root is 1. If the current element is 13 then I check against 7 and find that GCD(7, 13) = 1. Therefore I immediately know that GCD(5, 7, 13) = GCD(3, 5, 7, 13) = GCD(1, 3, 5, 7, 13) = 1.
List 2, 4, 8, 16. The next level is 2, 8 and the root is 2. If the current number is 32 then I check against 16 and find that GCD(16, 32) = 16 != 1, so then I check against 8 and find that GCD(8, 32) = 8, and then I check against 2 and find that GCD(2, 32) = 2, so there is no interval in the extended array which has GCD = 1.

Enumerate search trees

According to this question the number of different search trees of a certain size is equal to a catalan number. Is it possible to enumerate those trees? That is, can someone implement the following two functions:
Node* id2tree(int id); // return root of tree
int tree2id(Node* root); // return id of tree
(I ask because the binary code for the tree (see one of the answers to this question) would be a very efficient code for representing arbitrarily large integers of unknown range, i.e. a variable-length code for integers
0 -> 0
1 -> 100
2 -> 11000
3 -> 10100
4 -> 1110000
5 -> 1101000
6 -> 1100100
7 -> 1011000
8 -> 1010100
etc
notice that the number of integers of each code length is 1, 1, 2, 5,.. (the catalan sequence). )
It should be possible to convert the id to tree and back.
The id and bitstrings being:
0 -> 0
1 -> 100
2 -> 11000
3 -> 10100
4 -> 1110000
5 -> 1101000
6 -> 1100100
7 -> 1011000
8 -> 1010100
First consider the fact that, given a bitstring, we can easily move to the tree (with a recursive method) and vice versa (preorder traversal, outputting 1 for a parent and 0 for a leaf).
The main challenge comes from trying to map the id to the bitstring and vice versa.
Suppose we listed out the trees of n nodes as follows:
Left sub-tree n-1 nodes, right sub-tree 0 nodes. (C(n-1)*C(0) of them)
Left sub-tree n-2 nodes, right sub-tree 1 node. (C(n-2)*C(1) of them)
Left sub-tree n-3 nodes, right sub-tree 2 nodes. (C(n-3)*C(2) of them)
...
...
Left sub-tree 0 nodes, right sub-tree n-1 nodes. (C(0)*C(n-1) of them)
C(r) = r-th Catalan number.
The enumeration you have given seems to come from the following procedure: we keep the left subtree fixed, enumerate through the right subtrees. Then move onto the next left subtree, enumerate through the right subtrees, and so on. We start with the maximum size left subtree, then next one is max size -1, etc.
So say we have an id S (ids start at 0, as in the list above). We first find an n such that
C(0) + C(1) + C(2) + ... + C(n) <= S < C(0) + C(1) + C(2) + ... + C(n+1)
Then S corresponds to a tree with n+1 nodes. (S = 0 is the empty tree.)
So you now consider P = S - (C(0) + C(1) + C(2) + ... + C(n)), which is the 0-based position in the enumeration of the trees with n+1 nodes.
Now we figure out an r such that C(n)*C(0) + C(n-1)*C(1) + ... + C(n-r+1)*C(r-1) <= P < C(n)*C(0) + C(n-1)*C(1) + ... + C(n-r)*C(r)
This tells us how many nodes the left and right subtrees have: n-r in the left subtree and r in the right subtree.
Considering Q = P - (C(n)*C(0) + C(n-1)*C(1) + ... + C(n-r+1)*C(r-1)), we can now figure out the exact enumeration position of the left subtree (Q div C(r), counting only trees of that size) and of the right subtree (Q mod C(r)), and recursively form the bitstring.
Mapping the bitstring to the id should be similar, as we know what the left subtree and right subtrees look like, all we would need to do is find the corresponding positions and do some arithmetic to get the ID.
Not sure how helpful it is though. You will be working with some pretty huge numbers all the time.
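The id-to-bitstring mapping described above can be sketched in Python, whose integers are unbounded and so cope with the huge numbers. All names here are invented for illustration, and the sketch assumes ids start at 0 and that, within a block with fixed subtree sizes, the left subtree is held fixed while the right subtrees are enumerated, as in the list above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def catalan(n):
    """n-th Catalan number via C(0) = 1, C(n) = sum of C(i) * C(n-1-i)."""
    if n == 0:
        return 1
    return sum(catalan(i) * catalan(n - 1 - i) for i in range(n))


def unrank(n, pos):
    """Bitstring of the pos-th (0-based) tree with n nodes."""
    if n == 0:
        return "0"
    for right in range(n):                     # right subtree size 0, 1, ..., n-1
        left = n - 1 - right
        block = catalan(left) * catalan(right)
        if pos < block:
            lpos, rpos = divmod(pos, catalan(right))
            return "1" + unrank(left, lpos) + unrank(right, rpos)
        pos -= block
    raise ValueError("position out of range")


def id2bits(tree_id):
    n = 0
    while tree_id >= catalan(n):               # skip all trees with fewer nodes
        tree_id -= catalan(n)
        n += 1
    return unrank(n, tree_id)


def rank(bits, i=0):
    """Parse one tree starting at bits[i]; return (node count, position, next index)."""
    if bits[i] == "0":
        return 0, 0, i + 1
    left, lpos, i = rank(bits, i + 1)
    right, rpos, i = rank(bits, i)
    n = left + right + 1
    pos = sum(catalan(n - 1 - r) * catalan(r) for r in range(right))
    return n, pos + lpos * catalan(right) + rpos, i


def bits2id(bits):
    n, pos, _ = rank(bits)
    return sum(catalan(m) for m in range(n)) + pos


table = [id2bits(i) for i in range(9)]
```

Converting a bitstring to linked `Node` objects and back is the straightforward preorder recursion mentioned at the start of the answer, so it is left out here.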
For general (non-search) binary trees I can see how this would be possible, since when building up the tree there are three choices (the number of children) for every node, restricted only by the requirement that the total be exactly N. You could find a way to represent such a tree as a sequence of choices (by building up the tree in a specific order), and represent that sequence as a base-3 number (or perhaps a variable base would be more appropriate).
But for binary search trees, not every organisation of elements is acceptable. You have to obey the numeric ordering constraints as well. On the other hand, since insertion into a binary search tree is well-defined, you can represent an entire tree of N elements by having a list of N numbers in a specific insertion order. By permuting the numbers to be in a different order, you can generate a different tree.
Permutations are of course easily counted by using variable-base numbers: You have N choices for the first item, N-1 for the second, etc. That gives you a sequence of N numbers that you can encode as a number with base varying from N to 1. Encoding and decoding from variable-base to binary or decimal is trivially adapted from a normal fixed-base conversion algorithm. (The ones that use modulus and division operations).
So you can convert a number to and from a permutation, and given a list of numbers you can convert a permutation (of that list) from and to a binary search tree. Now I think that you could get all the possible binary search trees of size N by permuting just the integers 1 to N, but I'm not entirely sure, and attempting to prove that is a bit too much for this post.
I hope this is a good starting point for a discussion.
