How to find which position has prefix sum M in a BIT? - algorithm

Suppose I have built a Binary Indexed Tree (BIT) for prefix sums over an array of length N. The underlying array contains only 0s and 1s. Now I want to find which index has prefix sum M (that is, exactly M 1s at or before it).
For example, my array is a[]={1,0,0,1,1};
its prefix sums are {1,1,1,2,3};
so index 3 (0-based) has a prefix sum of 2.
How can I find this index with a BIT?
Thanks in advance.

Why can't you binary search for that index? It takes O(log n * log n) time. Here is a simple implementation:
int findIndex(int sum) {
    int l = 1, r = n;
    while (l <= r) {
        int mid = (l + r) >> 1;
        int cur = read(mid);           // prefix sum of [1, mid]
        if (cur == sum) return mid;    // some index with this prefix sum, not necessarily the first
        else if (cur < sum) l = mid + 1;
        else r = mid - 1;
    }
    return -1;                         // no index has prefix sum equal to sum
}
I used the read(x) function, which should return the sum of the interval [1,x] in O(log n) time. The binary search makes O(log n) such calls, so the overall complexity is O(log^2 n).
Hope it helps.

If the elements of the array a[n] are non-negative (so the prefix sum array p[n] is non-decreasing), you can locate an index by its prefix sum in much the same way as you query a prefix sum by index from the BIT, in O(log n) time. The only difference is that at each level you compare the partial sum stored in the node with your remaining target to decide which subtree to search next: if the node's sum is smaller than the target, subtract it from the target and continue in the right subtree; otherwise, continue in the left subtree. Repeat this until you reach the node at which the accumulated sum equals the desired prefix sum, and return the index of that node. The idea is analogous to binary search, because the prefix sums are sorted when the elements are non-negative. If there are negative values in a[n], this method won't work, since the prefix sums are no longer sorted.
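The descent described above can be sketched as follows. This is a minimal sketch in Python rather than the answer's own code: the class name FenwickTree and the method find_prefix_sum are names chosen here for illustration, indices are 1-based, and all values are assumed to be non-negative.

```python
class FenwickTree:
    """Binary Indexed Tree over non-negative values (1-based indices)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values, start=1):
            self.add(i, v)

    def add(self, i, delta):
        """Add delta to the element at 1-based index i."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        """Sum of the elements at indices 1..i."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def find_prefix_sum(self, target):
        """Smallest 1-based index whose prefix sum equals target, or -1."""
        # Largest power of two not exceeding n (1 when n is 0).
        step = 1
        while step * 2 <= self.n:
            step *= 2
        pos, remaining = 0, target
        while step:
            nxt = pos + step
            # tree[nxt] covers indices pos+1..nxt; skip the whole block
            # while its sum is still strictly below what we need.
            if nxt <= self.n and self.tree[nxt] < remaining:
                pos = nxt
                remaining -= self.tree[nxt]
            step //= 2
        idx = pos + 1  # first index whose prefix sum is >= target
        if idx > self.n or self.prefix_sum(idx) != target:
            return -1
        return idx
```

For the array from the question, FenwickTree([1, 0, 0, 1, 1]).find_prefix_sum(2) gives the 1-based index 4, which is index 3 in 0-based terms. The final prefix_sum check only matters for arrays that are not 0/1, where the first prefix sum reaching the target may overshoot it.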

Related

Divide and Conquer Algorithms- Binary search variant

This is a practice question for the understanding of Divide and conquer algorithms.
You are given an array of N sorted integers. All the elements are distinct except one
element is repeated twice. Design an O (log N) algorithm to find that element.
I get that the array needs to be divided to see whether an equal counterpart is found at the next index, some variant of binary search, I believe. But I can't find any solution or guidance regarding that.
You cannot do it in O(log n) time in general, because at any step, even if you divide the array into two parts, you cannot decide which part to process further and which to discard.
On the other hand, if the array holds consecutive numbers, then by comparing an index with the value stored at that index we can decide whether the duplicate lies in the left or the right part of the array.
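For the consecutive-numbers case, the index-versus-value comparison can be sketched like this. It is an illustrative sketch in Python, with a function name chosen here, and it assumes the input is sorted, consists of consecutive integers, and contains exactly one value twice. Before the second copy of the duplicate, a[i] - a[0] equals i; from the second copy onward it equals i - 1, so that predicate can be binary searched.

```python
def find_duplicate_in_run(a):
    """Return the repeated value in a sorted run of consecutive integers.

    Assumes exactly one value appears twice and every other value once.
    """
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] - a[0] < mid:
            hi = mid        # second copy is at mid or to its left
        else:
            lo = mid + 1    # values still match their offsets; look right
    return a[lo]
```

Each step halves the range, so this runs in O(log n). Without the consecutive-values assumption the offset test carries no information, which is why the general case needs a linear scan.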
D&C should look something like this:
// Returns the index of the first copy of the duplicate in a[i..j], or -1.
int Twice (int a[], int i, int j) {
    if (i >= j)
        return -1;
    int k = (i+j)/2;                 // i <= k < j, so a[k+1] is in range
    if (a[k] == a[k+1])
        return k;
    if (k > i && a[k] == a[k-1])     // guard against reading before a[i]
        return k-1;
    int m = Twice(a,i,k-1);
    int n = Twice(a,k+1,j);
    return m != -1 ? m : n;
}
int Twice (int a[], int n) {
    return Twice(a,0,n-1);           // j is the last valid index
}
But it has complexity O(n). As said above, it is not possible to find an O(lg n) algorithm for this problem.

Insertion sort comparison?

How can I count the number of comparisons made by insertion sort in less than O(n^2) time?
When we're inserting an element, we alternate comparisons and swaps until either (1) the element compares not less than the element to its left or (2) we hit the beginning of the array. In case (1), there is one comparison not paired with a swap. In case (2), every comparison is paired with a swap. The upward adjustment for the number of comparisons can be computed by counting the number of successive minima from left to right (or however your insertion sort works), in time O(n).
# num_swaps is the number of swaps the sort performs, i.e. the number of inversions
num_comparisons = num_swaps
min_so_far = array[0]
for i in range(1, len(array)):
    if array[i] < min_so_far:
        min_so_far = array[i]       # case (2): new minimum, travels to the front
    else:
        num_comparisons += 1        # case (1): one comparison without a swap
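The snippet above takes num_swaps as given. One way to obtain it without running the sort is to count inversions with merge sort in O(n log n). The following is a sketch under that assumption; the function names are chosen here and are not from the answer.

```python
def count_inversions(a):
    """Return (sorted copy of a, number of pairs i < j with a[i] > a[j])."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_left = count_inversions(a[:mid])
    right, inv_right = count_inversions(a[mid:])
    merged, inv = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left.
            merged.append(right[j])
            j += 1
            inv += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv


def insertion_sort_comparisons(a):
    """Comparisons made by a swap-based insertion sort, without sorting in O(n^2)."""
    if not a:
        return 0
    _, num_swaps = count_inversions(a)
    num_comparisons = num_swaps
    min_so_far = a[0]
    for x in a[1:]:
        if x < min_so_far:
            min_so_far = x          # reaches the front: no unpaired comparison
        else:
            num_comparisons += 1    # stops early: one unpaired comparison
    return num_comparisons
```

For [3, 1, 2] there are two inversions, and only the element 2 stops before the front, giving 3 comparisons in total.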
As commented, doing it in less than O(n^2) is hard, maybe impossible if you must pay the price for sorting. If you already know the number of comparisons done at each outer iteration, then it would be possible in O(n), but the price for sorting was paid sometime before.
Here is a way of counting the comparisons inside the method (in pseudo C++):
void insertion_sort(int p[], const size_t n, size_t & count)
{
    for (long i = 1, j; i < static_cast<long>(n); ++i)
    {
        auto tmp = p[i];
        for (j = i - 1; j >= 0 and p[j] > tmp; --j) // shift larger elements right to open a gap for tmp
            p[j + 1] = p[j];
        // one comparison per shift, plus one more if the loop stopped at an element <= tmp
        count += (i - 1 - j) + (j >= 0 ? 1 : 0);
        p[j + 1] = tmp;
    }
}
n is the number of elements and count is the comparison counter, which must be passed a variable set to zero.
If I remember correctly, this is how insertion sort works:
A = unsorted input array
B := []; //sorted output array
while(A is not empty) {
    remove first element from A and add it to B, preserving B's sorting
}
If the insertion into B is implemented by linear search from the left until you find a greater element, then the number of comparisons is essentially the number of pairs (i,j) such that i < j and A[i] <= A[j] (I'm considering the stable variant), plus one terminating comparison for each insertion that stops at a greater element.
In other words, for each element x, count the number of elements before x that have a lesser or equal value. That can be done by scanning A from the left, adding its elements to some balanced binary search tree that also remembers the number of elements under each node. In such a tree, you can find the number of elements less than or equal to a certain value in O(log n). Total time: O(n log n).
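The counting step can be sketched as below. Note the substitution: instead of the balanced binary search tree mentioned above, this sketch uses a Fenwick tree over value ranks, which answers the same "how many earlier elements are <= x" query in O(log n). The function name is chosen here for illustration.

```python
def left_scan_comparisons(a):
    """Comparisons made when each element is inserted by scanning B from the left."""
    # Compress values to ranks 1..m so they can index the Fenwick tree.
    ranks = {v: r for r, v in enumerate(sorted(set(a)), start=1)}
    tree = [0] * (len(ranks) + 1)
    total = 0
    for seen, x in enumerate(a):
        # Count earlier elements with value <= x (prefix sum up to rank of x).
        i, not_greater = ranks[x], 0
        while i > 0:
            not_greater += tree[i]
            i -= i & -i
        # One comparison per element <= x, plus one for the greater element
        # that stops the scan, if such an element exists.
        total += not_greater + (1 if not_greater < seen else 0)
        # Record x in the tree.
        i = ranks[x]
        while i < len(tree):
            tree[i] += 1
            i += i & -i
    return total
```

For [3, 1, 2] the insertions cost 0, 1 and 2 comparisons, so the function returns 3.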

Search a sorted integer array for an element equal to its index, where A may have duplicate entries

My question is very similar to Q1 and Q2, except that I want to deal with the case where the array may have duplicate entries.
Assume the array A consists of integers sorted in increasing order. If its entries are all distinct, you can do this easily in O(log n) with binary search. But if there are duplicate entries, it's more complicated. Here's my approach:
int search(const vector<int>& A) {
    int left = 0, right = A.size() - 1;
    return binarySearchHelper(A, left, right);
}
int binarySearchHelper(const vector<int>& A, int left, int right) {
    int indexFound = -1;
    if (left <= right) {
        int mid = left + (right - left) / 2;
        if (A[mid] == mid) {
            return mid;
        } else {
            if (A[mid] <= right) {
                indexFound = binarySearchHelper(A, mid + 1, right);
            }
            if (indexFound == -1 && A[left] <= mid) {
                indexFound = binarySearchHelper(A, left, mid - 1);
            }
        }
    }
    return indexFound;
}
In the worst case (A has no element equal to its index), binarySearchHelper makes 2 recursive calls with input size halved at each level of recursion, meaning it has a worst-case time complexity of O(n). That's the same as the O(n) approach where you just read through the array in order. Is this really the best you can do? Also, is there a way to measure the algorithm's average time complexity? If not, is there some heuristic for deciding when to use the basic O(n) read-through approach and when to try a recursive approach such as mine?
If A has negative integers, then it's necessary to check the condition if (left <= right) in binarySearchHelper. Since, for example, if A = [-1], then the algorithm would recurse from bsh(A, 0, 0) to bsh(A,1,0) and to bsh(A,0,-1). My intuition leads me to believe the check if (left <= right) is necessary if and only if A has some negative integers. Can anyone help me verify this?
I would take a different approach. First I would eliminate all negative numbers in O(log n) simply by doing a binary search for the first non-negative number. This is allowed because no negative number can be equal to its index. Let's say the index of the first non-negative element is i.
Now I will keep doing the following until I find the element or find that it doesn't exist:
1. If i is not inside A, return false.
2. If i < A[i], set i = A[i]. It would take A[i] - i duplicates for i to 'catch up' to A[i], so we increment i by A[i] - i, which is equivalent to setting i to A[i]. Go to 1.
3. If i == A[i], return true (and the index if you want it).
4. Otherwise A[i] < i. Find the first index j greater than i such that i <= A[j], and set i = j. You can do this with a 'binary search from the left': increment the index by 1, 2, 4, 8, etc. and then do a binary search on the last interval you found it in. If no such index exists, return false; otherwise go to 1.
In the worst case the above is still O(n), but it has many tricks to speed it up well beyond that in better cases.
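The steps above can be sketched as follows. This is an illustrative Python sketch with a function name chosen here. One deliberate simplification: it uses the standard library's bisect_left over the rest of the array in place of the 1, 2, 4, 8 galloping search, which keeps the code short but makes the worst case O(n log n) rather than O(n).

```python
from bisect import bisect_left


def find_fixed_point(a):
    """Return the smallest index i with a[i] == i in the sorted list a, or -1."""
    n = len(a)
    i = bisect_left(a, 0)  # skip negatives: they can never equal an index
    while i < n:
        if a[i] == i:
            return i
        if a[i] > i:
            # Every index k in (i, a[i]) holds a value >= a[i] > k.
            i = a[i]
        else:
            # a[i] < i: every later index holding a value < i is too small
            # as well, so jump to the first j > i with a[j] >= i.
            i = bisect_left(a, i, i + 1)
    return -1
```

Only indices that provably cannot satisfy a[i] == i are skipped, so the first match found is also the smallest one.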

How to find K smallest values using quick sort

The problem is simple if I sort all the values and pick the first K values. But that wastes too much time, because the smallest K values may already be in place before the whole array is sorted. I think the way to solve this problem is to add a flag in the code, but I cannot figure out where to put the flag to judge whether the smallest K values have been sorted.
You can use the randomized selection algorithm (quickselect) to solve this problem in expected O(n) time. In the end, just return the sub-array from 0 to k.
I think the problem can be solved by finding the kth smallest value. Suppose the signature of the function partition in quicksort is int partition(int* array, int start, int end), and that it returns the final index of the pivot. Here is pseudocode which illustrates the basic idea:
// Rearranges a[start..end] so that the element of 0-based rank k ends up at index k; returns that index.
int select(int a[], int start, int end, int k)
{
    int j = partition(a, start, end);
    if (k == j)
        return j;
    if (k < j)
        return select(a, start, j - 1, k);
    return select(a, j + 1, end, k);   // indices are absolute, so k is unchanged
}
index = select(a, 0, length_of_a - 1, k - 1);
Then a[0...index] holds the k smallest values in array a. You can further sort a[0...index] if you want them sorted.
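A runnable version of the same idea might look like this. It is a sketch, not the answer's code: it assumes a Lomuto-style partition with a random pivot, replaces the recursion with a loop, and works on a copy so the caller's list is left untouched. The function names are chosen here.

```python
import random


def partition(a, start, end):
    """Partition a[start..end] around a random pivot; return the pivot's final index."""
    p = random.randint(start, end)
    a[p], a[end] = a[end], a[p]
    pivot = a[end]
    store = start
    for i in range(start, end):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[end] = a[end], a[store]
    return store


def k_smallest(values, k):
    """Return the k smallest values (in no particular order) in expected O(n) time."""
    a = list(values)
    if k <= 0:
        return []
    if k >= len(a):
        return a
    start, end, target = 0, len(a) - 1, k - 1
    while True:
        j = partition(a, start, end)
        if j == target:
            break
        if target < j:
            end = j - 1
        else:
            start = j + 1
    # Everything left of index k-1 is <= a[k-1], so a[:k] is the answer.
    return a[:k]
```

The result is unordered; calling sorted() on it afterwards costs only O(k log k).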

Number of all increasing subsequences in given sequence?

You may have heard about the well-known problem of finding the longest increasing subsequence. The optimal algorithm has O(n*log(n)) complexity.
I was thinking about the problem of finding the number of all increasing subsequences in a given sequence. I have found a solution to the problem of counting increasing subsequences of length k, which has O(n*k*log(n)) complexity (where n is the length of the sequence).
Of course, this algorithm can be used for my problem by running it for every length, but then the solution has O(n*k*log(n)*n) = O(n^2*k*log(n)) complexity, I suppose. I think there must be a better (I mean faster) solution, but I don't know one yet.
If you know how to solve the problem of finding all increasing subsequences in a given sequence in optimal time (in this case, optimal = better than O(n^2*k*log(n))), please let me know.
Finally: this problem is not homework. The longest increasing subsequence problem was mentioned in my lecture, and I started thinking about the general idea of all increasing subsequences in a given sequence.
I don't know if this is optimal (probably not), but here's a DP solution in O(n^2).
Let dp[i] = number of increasing subsequences with i as the last element
for i = 1 to n do
    dp[i] = 1
    for j = 1 to i - 1 do
        if input[j] < input[i] then
            dp[i] = dp[i] + dp[j] // we can just append input[i] to every subsequence ending with j
Then it's just a matter of summing all the entries in dp.
You can compute the number of increasing subsequences in O(n log n) time as follows.
Recall the algorithm for the length of the longest increasing subsequence:
For each element, compute the predecessor element among previous elements, and add one to that length.
This algorithm runs naively in O(n^2) time, and runs in O(n log n) (or even better, in the case of integers), if you compute the predecessor using a data structure like a balanced binary search tree (BST) (or something more advanced like a van Emde Boas tree for integers).
To amend this algorithm for computing the number of sequences, store in the BST in each node the number of sequences ending at that element. When processing the next element in the list, you simply search for the predecessor, count the number of sequences ending at an element that is less than the element currently being processed (in O(log n) time), and store the result in the BST along with the current element. Finally, you sum the results for every element in the tree to get the result.
As a caveat, note that the number of increasing sequences could be very large, so that the arithmetic no longer takes O(1) time per operation. This needs to be taken into consideration.
Pseudocode:
ret = 0
T = empty_augmented_bst() // with an integer field in addition to the key
for x in X:
    // sum of auxiliary fields of keys less than x
    // computed in O(log n) time using augmented BSTs
    count = 1 + T.sum_less(x)
    T.insert(x, count) // sets x's auxiliary field to count (adds count to it if x is already present)
    ret += count // keep track of return value
return ret
I'm assuming without loss of generality that the input A[0..(n-1)] consists of all integers in {0, 1, ..., n-1}.
Let DP[i] = number of increasing subsequences ending in A[i].
We have the recurrence:
DP[i] = 1 + sum of DP[j] over all j < i with A[j] < A[i]
To compute DP[i], we only need to compute DP[j] for all j where A[j] < A[i]. Therefore, we can compute the DP array in ascending order of the values of A. This leaves DP[k] = 0 for all k where A[k] > A[i].
The problem boils down to computing the sum DP[0] to DP[i-1]. Supposing we have already calculated DP[0] to DP[i-1], we can calculate DP[i] in O(log n) using a Fenwick tree.
The final answer is then DP[0] + DP[1] + ... + DP[n-1]. The algorithm runs in O(n log n).
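A Fenwick-tree version of this count can be sketched as follows. It is an illustrative sketch with one deliberate difference from the description above: it scans positions from left to right and indexes the tree by value rank, which is the mirror image of processing in value order and summing over positions. Ranks come from sorting the distinct values, so the input does not have to be a permutation of 0..n-1. The function name is chosen here.

```python
def count_increasing_subsequences(seq):
    """Number of non-empty strictly increasing subsequences of seq, in O(n log n)."""
    # Compress values to ranks 1..m so they can index the Fenwick tree.
    ranks = {v: r for r, v in enumerate(sorted(set(seq)), start=1)}
    tree = [0] * (len(ranks) + 1)
    total = 0
    for x in seq:
        r = ranks[x]
        # Sum of DP values of earlier elements with a strictly smaller value.
        i, smaller = r - 1, 0
        while i > 0:
            smaller += tree[i]
            i -= i & -i
        ending_here = 1 + smaller  # x alone, or x appended to a shorter one
        total += ending_here
        # Make this DP value visible to later, larger elements.
        i = r
        while i < len(tree):
            tree[i] += ending_here
            i += i & -i
    return total
```

Python integers grow as needed, so the very large counts mentioned in the caveat do not overflow, although arithmetic on them is no longer constant time.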
This is an O(nk log n) solution where n is the length of the input array and k is the size of the increasing sub-sequences. It is based on the solution mentioned in the question.
vector<int> values, an array of length n, is the array to be searched for increasing sub-sequences.
vector<int> temp(n); // Array for sorting
map<int, int> mapIndex; // This will translate from the value in index to the 1-based count of values less than it
partial_sort_copy(values.cbegin(), values.cend(), temp.begin(), temp.end());
for(auto i = 0; i < n; ++i){
    mapIndex.insert(make_pair(temp[i], i + 1)); // insert will only allow each number to be added to the map the first time
}
mapIndex now contains a ranking of all numbers in values.
vector<vector<int>> binaryIndexTree(k, vector<int>(n)); // A 2D binary index tree with depth k
auto result = 0;
for(auto it = values.cbegin(); it != values.cend(); ++it){
    auto rank = mapIndex[*it];
    auto value = 1; // Number of sequences to be added to this rank and all subsequent ranks
    update(rank, value, binaryIndexTree[0]); // Populate the binary index tree for sub-sequences of length 1
    for(auto i = 1; i < k; ++i){ // Iterate over all sub-sequence lengths 2 - k
        value = getValue(rank - 1, binaryIndexTree[i - 1]); // Retrieve all possible shorter sub-sequences of strictly lesser rank
        update(rank, value, binaryIndexTree[i]); // Update the binary index tree for sub-sequences of this length
    }
    result += value; // Add the possible sub-sequences of length k for this rank
}
After all n elements of values have been placed into all k dimensions of binaryIndexTree, the values collected into result represent the total number of increasing sub-sequences of length k.
The binary index tree functions used to obtain this result are:
void update(int rank, int increment, vector<int>& binaryIndexTree)
{
    while (rank <= static_cast<int>(binaryIndexTree.size())) { // Increment the current rank and all higher ranks
        binaryIndexTree[rank - 1] += increment;
        rank += (rank & -rank);
    }
}
int getValue(int rank, const vector<int>& binaryIndexTree)
{
    auto result = 0;
    while (rank > 0) { // Search the current rank and all lower ranks
        result += binaryIndexTree[rank - 1]; // Sum any value found into result
        rank -= (rank & -rank);
    }
    return result;
}
Filling the binary index trees is obviously O(nk log n), but it is the ability to fill them out sequentially that creates the possibility of using them for a solution.
mapIndex creates a rank for each number in values, such that the smallest number in values has a rank of 1. (For example, if values is "2, 3, 4, 3, 4, 1" then mapIndex will contain: "{1, 1}, {2, 2}, {3, 3}, {4, 5}". Note that "4" has a rank of "5" because there are 2 "3"s in values.)
binaryIndexTree has k different trees; level x represents the total number of increasing sub-sequences of length x that can be formed. Any number in values can create a sub-sequence of length 1, so each element will increment its rank and all ranks above it by 1.
At higher levels, an increasing sub-sequence depends on there already being a sub-sequence available of a shorter length and lower rank.
Because elements are inserted into the binary index tree according to their order in values, the order of occurrence in values is preserved, so if an element has been inserted in binaryIndexTree, that is because it preceded the current element in values.
An excellent description of how a binary index tree works is available here: http://www.geeksforgeeks.org/binary-indexed-tree-or-fenwick-tree-2/
You can find an executable version of the code here: http://ideone.com/GdF0me
Let us take an example.
Take the array {7, 4, 6, 8}.
If you also count each individual element as a subsequence, then the increasing subsequences that can be formed are:
{7} {4} {6} {4,6} {8} {7,8} {4,8} {6,8} {4,6,8}
A total of 9 increasing subsequences can be formed for this array.
So the answer is 9.
The code is as follows:
int arr[] = {7, 4, 6, 8};
int T[] = new int[arr.length];   // T[i] = number of increasing subsequences ending at arr[i]
for(int i=0; i<arr.length; i++)
    T[i] = 1;
int sum = 1;                     // accounts for T[0]
for(int i=1; i<arr.length; i++){
    for(int j=0; j<i; j++){
        if(arr[i] > arr[j]){
            T[i] = T[i] + T[j];
        }
    }
    sum += T[i];
}
System.out.println(sum);
The complexity of the code is O(N^2), because of the two nested loops.
You can use a sparse segment tree to get an optimal solution in O(n log(n)).
The solution runs as follows:
for(int i=0;i<n;i++)
{
    dp[i]=1+query(0,a[i]-1);   // sum of dp over earlier elements with a strictly smaller value
    update(a[i],dp[i]);        // add dp[i] at position a[i]
}
The query parameters are: query(first position, last position)
The update parameters are: update(position, value)
And the final answer is the sum of all values of the dp array.
Java version as an example:
int[] A = {1, 2, 0, 0, 0, 4};
int[] dp = new int[A.length];
for (int i = 0; i < A.length; i++) {
    dp[i] = 1;
    for (int j = 0; j <= i - 1; j++) {
        if (A[j] < A[i]) {
            dp[i] = dp[i] + dp[j];
        }
    }
}

Resources