Number of all increasing subsequences in given sequence? - algorithm

You may have heard about the well-known problem of finding the longest increasing subsequence. The optimal algorithm has O(n*log(n)) complexity.
I was thinking about the problem of counting all increasing subsequences in a given sequence. I have found a solution for the problem of counting the increasing subsequences of length k, which has O(n*k*log(n)) complexity (where n is the length of the sequence).
Of course, this algorithm can be used for my problem, but then the solution has O(n*k*log(n)*n) = O(n^2*k*log(n)) complexity, I suppose. I think that there must be a better (I mean faster) solution, but I don't know of one yet.
If you know how to count all increasing subsequences in a given sequence in optimal time (in this case, optimal = better than O(n^2*k*log(n))), please let me know.
Finally: this problem is not homework. The longest increasing subsequence problem was mentioned in my lecture, and I started thinking about the general idea of all increasing subsequences in a given sequence.

I don't know if this is optimal - probably not, but here's a DP solution in O(n^2).
Let dp[i] = number of increasing subsequences with i as the last element
for i = 1 to n do
    dp[i] = 1
    for j = 1 to i - 1 do
        if input[j] < input[i] then
            dp[i] = dp[i] + dp[j] // we can just append input[i] to every subsequence ending with j
Then it's just a matter of summing all the entries in dp

You can compute the number of increasing subsequences in O(n log n) time as follows.
Recall the algorithm for the length of the longest increasing subsequence:
For each element, compute the predecessor element among previous elements, and add one to that length.
This algorithm runs naively in O(n^2) time, and runs in O(n log n) (or even better, in the case of integers), if you compute the predecessor using a data structure like a balanced binary search tree (BST) (or something more advanced like a van Emde Boas tree for integers).
To amend this algorithm for computing the number of sequences, store in the BST in each node the number of sequences ending at that element. When processing the next element in the list, you simply search for the predecessor, count the number of sequences ending at an element that is less than the element currently being processed (in O(log n) time), and store the result in the BST along with the current element. Finally, you sum the results for every element in the tree to get the result.
As a caveat, note that the number of increasing sequences could be very large, so that the arithmetic no longer takes O(1) time per operation. This needs to be taken into consideration.
Pseudocode:
ret = 0
T = empty_augmented_bst() // with an integer field in addition to the key
for x in X:
    // sum of auxiliary fields of keys less than x,
    // computed in O(log n) time using augmented BSTs
    count = 1 + T.sum_less(x)
    T.insert(x, count) // sets x's auxiliary field to count (add to it if x is already present)
    ret += count // keep track of return value
return ret

I'm assuming without loss of generality that the input A[0..(n-1)] consists of all integers in {0, 1, ..., n-1}.
Let DP[i] = number of increasing subsequences ending in A[i].
We have the recurrence:
DP[i] = 1 + sum of DP[j] over all j < i with A[j] < A[i]
To compute DP[i], we only need DP[j] for the j where A[j] < A[i]. Therefore, we can compute the DP array in ascending order of the values of A. At the moment DP[i] is computed, DP[k] is still 0 for all k where A[k] > A[i].
The problem then boils down to computing the prefix sum DP[0] + ... + DP[i-1] over positions: entries for larger values still contribute 0, so the sum covers exactly the j < i with A[j] < A[i]. With DP stored in a Fenwick tree indexed by position, each DP[i] can be calculated in O(log n).
The final answer is then DP[0] + DP[1] + ... + DP[n-1]. The algorithm runs in O(n log n).
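As a rough illustration of the O(n log n) counting idea, here is a minimal Python sketch. It scans the sequence left to right and keeps a Fenwick tree indexed by value rank (a small variation on indexing by position); the function name `count_increasing_subsequences` and the coordinate-compression step are assumptions made for this example, not something taken from the answers above. Python integers are arbitrary precision, so very large counts are handled, at a cost that is no longer O(1) per addition.

```python
from bisect import bisect_left


def count_increasing_subsequences(seq):
    """Count the non-empty strictly increasing subsequences of seq."""
    ranks = sorted(set(seq))        # coordinate compression: value -> rank 1..m
    tree = [0] * (len(ranks) + 1)   # 1-based Fenwick tree of DP sums per rank

    def prefix_sum(pos):
        """Sum of the DP values stored at ranks 1..pos."""
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    def add(pos, value):
        """Add value to the DP total stored at rank pos."""
        while pos < len(tree):
            tree[pos] += value
            pos += pos & -pos

    answer = 0
    for x in seq:
        rank = bisect_left(ranks, x) + 1
        # x on its own, or appended to any subsequence ending in a smaller value
        dp = 1 + prefix_sum(rank - 1)
        add(rank, dp)
        answer += dp
    return answer
```

For example, `count_increasing_subsequences([7, 4, 6, 8])` counts the nine subsequences {7} {4} {6} {8} {4,6} {7,8} {4,8} {6,8} {4,6,8}.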

This is an O(n*k*log(n)) solution where n is the length of the input array and k is the length of the increasing sub-sequences to count. It is based on the solution mentioned in the question.
vector<int> values, an array of length n, is the array to be searched for increasing sub-sequences.
vector<int> temp(n); // Array for sorting
map<int, int> mapIndex; // Translates a value to its 1-based rank (1 + the count of values less than it)
partial_sort_copy(values.cbegin(), values.cend(), temp.begin(), temp.end());
for(auto i = 0; i < n; ++i){
    mapIndex.insert(make_pair(temp[i], i + 1)); // insert only adds each number to the map the first time it is seen
}
mapIndex now contains a ranking of all numbers in values.
vector<vector<int>> binaryIndexTree(k, vector<int>(n)); // A 2D binary index tree with depth k
auto result = 0;
for(auto it = values.cbegin(); it != values.cend(); ++it){
    auto rank = mapIndex[*it];
    auto value = 1; // Number of sequences to be added to this rank and all subsequent ranks
    update(rank, value, binaryIndexTree[0]); // Populate the binary index tree for sub-sequences of length 1
    for(auto i = 1; i < k; ++i){ // Iterate over all sub-sequence lengths 2 - k
        value = getValue(rank - 1, binaryIndexTree[i - 1]); // Count the shorter sub-sequences that end at a strictly lower rank
        update(rank, value, binaryIndexTree[i]); // Update the binary index tree for sub-sequences of this length
    }
    result += value; // Add the possible sub-sequences of length k for this rank
}
After all n elements of values have been placed into all k levels of binaryIndexTree, the values collected into result represent the total number of increasing sub-sequences of length k.
The binary index tree functions used to obtain this result are:
void update(int rank, int increment, vector<int>& binaryIndexTree)
{
    // rank is 1-based and stored at index rank - 1, so the last valid rank equals size()
    while (rank <= static_cast<int>(binaryIndexTree.size())) { // Increment the current rank and all higher ranks
        binaryIndexTree[rank - 1] += increment;
        rank += (rank & -rank);
    }
}
int getValue(int rank, const vector<int>& binaryIndexTree)
{
    auto result = 0;
    while (rank > 0) { // Search the current rank and all lower ranks
        result += binaryIndexTree[rank - 1]; // Sum any value found into result
        rank -= (rank & -rank);
    }
    return result;
}
Filling the k binary index trees takes O(n*k*log(n)) time in total; what makes them usable for this problem is that they can be filled sequentially, one element of values at a time.
mapIndex creates a rank for each number in values, such that the smallest number in values has a rank of 1. (For example if values is "2, 3, 4, 3, 4, 1" then mapIndex will contain: "{1, 1}, {2, 2}, {3, 3}, {4, 5}". Note that "4" has a rank of "5" because there are 2 "3"s in values.)
binaryIndexTree has k different trees; level x represents the total number of increasing sub-sequences that can be formed of length x. Any number in values can create a sub-sequence of length 1, so each element will increment its rank and all ranks above it by 1.
At higher levels an increasing sub-sequence depends on there already being a sub-sequence available of a shorter length and lower rank.
Because elements are inserted into the binary index tree according to their order in values, the order of occurrence in values is preserved, so if an element has been inserted in binaryIndexTree that is because it preceded the current element in values.
An excellent description of how a binary index tree works is available here: http://www.geeksforgeeks.org/binary-indexed-tree-or-fenwick-tree-2/
You can find an executable version of the code here: http://ideone.com/GdF0me
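For readers who want to experiment without the C++ toolchain, here is a hedged Python sketch of the same layered idea: one Fenwick tree per sub-sequence length, filled in input order. The name `count_increasing_of_length` and the use of `bisect` for ranking distinct values are assumptions of this sketch; it is not a translation of the linked code.

```python
from bisect import bisect_left


def count_increasing_of_length(seq, k):
    """Count strictly increasing subsequences of seq with exactly k elements."""
    if k <= 0:
        return 0
    ranks = sorted(set(seq))                        # distinct values, rank = index + 1
    size = len(ranks) + 1
    trees = [[0] * size for _ in range(k)]          # trees[j] covers length j + 1

    def prefix_sum(tree, pos):
        """Sum of the entries at ranks 1..pos."""
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    def add(tree, pos, value):
        """Add value at rank pos."""
        while pos < size:
            tree[pos] += value
            pos += pos & -pos

    result = 0
    for x in seq:
        rank = bisect_left(ranks, x) + 1
        ways = 1                                    # x alone is a subsequence of length 1
        add(trees[0], rank, ways)
        for level in range(1, k):
            # extend every subsequence of length `level` that ends in a smaller value
            ways = prefix_sum(trees[level - 1], rank - 1)
            add(trees[level], rank, ways)
        result += ways                              # subsequences of length k ending at x
    return result
```

Summing the result over k = 1..n gives the count of all increasing subsequences, which is the slower route the question describes.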

Let us take an example -
Take an array {7, 4, 6, 8}
Now if you consider each individual element also as a subsequence, then the increasing subsequences that can be formed are -
{7} {4} {6} {4,6} {8} {7,8} {4,8} {6,8} {4,6,8}
A total of 9 increasing subsequences can be formed for this array.
So the answer is 9.
The code is as follows -
int arr[] = {7, 4, 6, 8};
int T[] = new int[arr.length];
for(int i=0; i<arr.length; i++)
    T[i] = 1;
int sum = 1;
for(int i=1; i<arr.length; i++){
    for(int j=0; j<i; j++){
        if(arr[i] > arr[j]){
            T[i] = T[i] + T[j];
        }
    }
    sum += T[i];
}
System.out.println(sum);
The complexity of the code is O(N^2), because of the two nested loops over the array.

You can use a sparse segment tree to get an optimal solution in O(n*log(n)).
The solution runs as follows:
for(int i=0;i<n;i++)
{
    dp[i]=1+query(0,a[i]-1); // assuming inclusive bounds: only strictly smaller values may precede a[i]
    update(a[i],dp[i]);
}
The query parameters are: query(first position, last position)
The update parameters are: update(position, value)
And the final answer is the sum of all values of the dp array.

Java version as an example:
int[] A = {1, 2, 0, 0, 0, 4};
int[] dp = new int[A.length];
for (int i = 0; i < A.length; i++) {
    dp[i] = 1;
    for (int j = 0; j <= i - 1; j++) {
        if (A[j] < A[i]) {
            dp[i] = dp[i] + dp[j];
        }
    }
}

Related

How to solve "fixed size maximum subarray" using divide and conquer approach?

Disclaimer: I know this problem can be solved very efficiently with a single pass over the array, but I am interested in doing this with divide and conquer because it is a bit different from the typical problems we tackle with divide and conquer.
Suppose you are given a floating point array X[1:n] of size n and an interval length l. The problem is to design a divide and conquer algorithm to find the sub-array of length l from the array that has the maximum sum.
Here is what I came up with. For an array of length n there are n-l+1 sub-arrays of l consecutive elements. For example, for an array of length n = 10 and l = 3, there will be 8 sub-arrays of length 3.
Now, to divide the problem into two halves, I decided to break the array at (n-l+1)/2 so that an equal number of sub-arrays will be distributed to both halves of my division, as depicted in the algorithm below. Again, for n = 10, l = 3, n-l+1 = 8, so I divided the problem at (n-l+1)/2 = 4. But for the 4th sub-array I need array elements up to 6, i.e. (n+l-1)/2.
void FixedLengthMS(input: X[1:n], l, output: k, max_sum)
{
    if(l==n){ // only one sub-array
        max_sum = Sumof(X[1:n]);
        k = 1;
        return; // base case: stop the recursion here
    }
    int kl, kr;
    float sum_l, sum_r;
    FixedLengthMS(X[1:(n+l-1)/2], l, kl, sum_l);
    FixedLengthMS(X[(n-l+3)/2:n], l, kr, sum_r);
    if(sum_l >= sum_r){
        max_sum = sum_l;
        k = kl;
    }
    else{
        max_sum = sum_r;
        k = (n-l+1)/2 + kr; // shift the right half's index back into X[1:n]
    }
}
Note: to clarify the array indexing,
for the sub-array starting at (n-l+1)/2 we need array elements up to (n-l+1)/2 + l-1 = (n+l-1)/2
My concern:
To apply divide and conquer I have used some data elements in both halves, so I am looking for another method that avoids the extra storage.
A faster method will be appreciated.
Please ignore the syntax of the code section; I am just trying to give an overview of the algorithm.
You don't need divide and conquer. A simple one-pass algorithm can be used for the task. Let's suppose that the array is big enough (n >= l). Then:
double sum = 0;
for (size_t i = 0; i < l; ++i)
    sum += X[i];
size_t max_index = 0;
double max_sum = sum;
for (size_t i = 0; i + l < n; ++i) {
    sum += X[i + l] - X[i]; // slide the window: it now starts at i + 1
    if (sum > max_sum) {
        max_sum = sum;
        max_index = i + 1;
    }
}
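If a divide and conquer formulation is still wanted, one possible sketch (an assumption of this example, not the asker's method) is to recurse over the range of window start indices rather than over copies of the array, and to read each window sum from a prefix-sum table. No elements are duplicated between the halves, at the cost of one O(n) prefix array; with floating point input the prefix sums may accumulate rounding error. The name `fixed_length_max_sum` is hypothetical.

```python
from itertools import accumulate


def fixed_length_max_sum(xs, l):
    """Return (start, total) for the length-l window of xs with the largest sum."""
    n = len(xs)
    if not 1 <= l <= n:
        raise ValueError("l must satisfy 1 <= l <= len(xs)")
    prefix = [0, *accumulate(xs)]           # prefix[i] == sum(xs[:i])

    def solve(lo, hi):
        """Best window whose 0-based start index lies in [lo, hi)."""
        if hi - lo == 1:
            return lo, prefix[lo + l] - prefix[lo]
        mid = (lo + hi) // 2
        left = solve(lo, mid)               # windows starting in the left half
        right = solve(mid, hi)              # windows starting in the right half
        return left if left[1] >= right[1] else right

    return solve(0, n - l + 1)              # there are n - l + 1 possible starts
```

The recursion visits each start index once, so the running time is O(n), the same as the single pass; the split is only a different way to organise the comparison of the n - l + 1 candidates.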

Insertion sort comparison?

How to count number of comparisons in insertion sort in less than O(n^2) ?
When we're inserting an element, we alternate comparisons and swaps until either (1) the element compares not less than the element to its left or (2) we hit the beginning of the array. In case (1), there is one comparison not paired with a swap. In case (2), every comparison is paired with a swap. The upward adjustment for the number of comparisons can be computed by counting the number of successive minima from left to right (or however your insertion sort works), in time O(n).
num_comparisons = num_swaps
min_so_far = array[0]
for i in range(1, len(array)):
    if array[i] < min_so_far:
        min_so_far = array[i]
    else:
        num_comparisons += 1
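The snippet above takes num_swaps as given. A hedged way to make the whole count sub-quadratic is to note that the number of swaps equals the number of inversions, which merge sort can count in O(n log n). The sketch below combines the two; `count_inversions`, `insertion_sort_comparisons` and the reference `simulate_insertion_sort` are names invented for this example, and the simulation assumes the swap-based insertion sort described above.

```python
def count_inversions(values):
    """Number of pairs i < j with values[i] > values[j], via merge sort."""
    def sort(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort(a[:mid])
        right, inv_right = sort(a[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if right[j] < left[i]:
                inv += len(left) - i        # right[j] jumps over the rest of left
                merged.append(right[j])
                j += 1
            else:
                merged.append(left[i])
                i += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv
    return sort(list(values))[1]


def insertion_sort_comparisons(values):
    """Predict the comparison count of insertion sort in O(n log n)."""
    if not values:
        return 0
    comparisons = count_inversions(values)  # one comparison per swap
    min_so_far = values[0]
    for x in values[1:]:
        if x < min_so_far:
            min_so_far = x                  # x travels to the front: no extra comparison
        else:
            comparisons += 1                # one failed comparison ends the insertion
    return comparisons


def simulate_insertion_sort(values):
    """Reference: run a swap-based insertion sort and count comparisons."""
    a, count = list(values), 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            count += 1
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]
                j -= 1
            else:
                break
    return count
```

Comparing `insertion_sort_comparisons` with `simulate_insertion_sort` on small inputs is an easy sanity check of the argument.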
As commented, doing it in less than O(n^2) is hard, maybe impossible if you must pay the price for sorting. If you already know the number of comparisons done at each outer iteration then it would be possible in O(n), but the price for sorting was paid sometime before.
Here is a way for counting the comparisons inside the method (in pseudo C++):
void insertion_sort(int p[], const size_t n, size_t & count)
{
    for (long i = 1, j; i < static_cast<long>(n); ++i)
    {
        auto tmp = p[i];
        for (j = i - 1; j >= 0 and p[j] > tmp; --j) // open a gap in which to put tmp
            p[j + 1] = p[j];
        // i - j element comparisons if an element <= tmp stopped the loop,
        // but only i if the loop ran off the front of the array (j == -1)
        count += (j >= 0) ? i - j : i;
        p[j + 1] = tmp;
    }
}
n is the number of elements and count is the comparison counter, which must be passed a variable initialized to zero.
If I remember correctly, this is how insertion sort works:
A = unsorted input array
B := []; //sorted output array
while(A is not empty) {
    remove first element from A and add it to B, preserving B's sorting
}
If the insertion to B is implemented by linear search from the left until you find a greater element, then the number of comparisons is the number of pairs (i,j) such that i < j and A[i] <= A[j] (I'm considering the stable variant), plus one final comparison for every insertion that stops at a greater element.
In other words, for each element x, count the number of elements before x that have a lesser or equal value. That can be done by scanning A from the left, adding its elements to some balanced binary search tree that also remembers the number of elements under each node. In such a tree, you can find the number of elements less than or equal to a certain value in O(log n). Total time: O(n log n).

Maximize sum of list with no more than k consecutive elements from input

I have an array of N numbers and I want to remove elements from it so that the new list never keeps more than K numbers that were adjacent to each other in the original. Several lists satisfy this restriction; I want the one in which the sum of the remaining numbers is maximum, and I only need to print that sum.
The algorithm that I have come up with so far has a time complexity of O(n^2). Is it possible to get a better algorithm for this problem?
Link to the question.
Here's my attempt:
int main()
{
    //Total Number of elements in the list
    int count = 6;
    //Maximum number of elements that can be together
    int maxTogether = 1;
    //The list of numbers
    int billboards[] = {4, 7, 2, 0, 8, 9};
    int maxSum = 0;
    for(int k = 0; k<=maxTogether ; k++){
        int sum=0;
        int size= k;
        for (int i = 0; i< count; i++) {
            if(size != maxTogether){
                sum += billboards[i];
                size++;
            }else{
                size = 0;
            }
        }
        printf("%i\n", sum);
        if(sum > maxSum)
        {
            maxSum = sum;
        }
    }
    return 0;
}
The O(NK) dynamic programming solution is fairly easy:
I swapped the problem around to make it slightly simpler - instead of finding the maximum of remaining elements, I find the minimum of removed elements and then return total sum - removed sum.
Let A[i] be the smallest sum of removed elements among the first i elements, assuming we're removing the i-th element as well (with A[0] = 0 standing for an imaginary removed element before the array).
At most K elements may be kept between two removed ones, so we can calculate A[i] by looking back K+1 elements:
A[i] = infinity
for j = 1 to K+1 (while i-j >= 0)
    A[i] = min(A[i], A[i-j])
A[i] += input[i]
And, at the end, just look through the last K+1 elements of A and pick the smallest one: that is the removed sum.
But this is too slow.
Let's do better.
So A[i] finds the best from A[i-1], A[i-2], ..., A[i-K], A[i-K-1].
So A[i+1] finds the best from A[i], A[i-1], A[i-2], ..., A[i-K].
There's a lot of redundancy there - we already know the best from indices i-1 through i-K-1 because of A[i]'s calculation, but then we find the best of all of those except i-K-1 (with i) again in A[i+1].
So we can just store all of them in an ordered data structure and then remove A[i-K-1] and insert A[i]. My choice - a binary search tree to find the minimum, along with a circular array of size K+1 of tree nodes, so we can easily find the one we need to remove.
High-level pseudo-code:
for each i in input
    add (i + the smallest value in the BST) to the BST
    add the above node to the circular array
    if it wrapped around, remove the overwritten element from the BST
// now the remaining nodes in the BST are the last K+1 elements
return (the total sum - the smallest value in the BST)
Running time:
O(n log k)
Java code:
// Node is assumed to be a small class holding an int value and ordered by that value,
// with a tie-breaker (e.g. a unique id) so that equal sums can coexist in the TreeSet.
int getBestSum(int[] input, int K)
{
    Node[] array = new Node[K+1];
    TreeSet<Node> nodes = new TreeSet<Node>();
    Node n = new Node(0);
    nodes.add(n);
    array[0] = n;
    int arrPos = 0;
    int sum = 0;
    for (int i: input)
    {
        sum += i;
        Node oldNode = nodes.first();
        Node newNode = new Node(oldNode.value + i);
        arrPos = (arrPos + 1) % array.length;
        if (array[arrPos] != null)
            nodes.remove(array[arrPos]);
        array[arrPos] = newNode;
        nodes.add(newNode);
    }
    return sum - nodes.first().value;
}
getBestSum(new int[]{1,2,3,1,6,10}, 2) returns 21, as required.
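The same "minimum removed sum" recurrence can also be sketched in Python. Note that this sketch swaps the binary search tree for a monotonic deque (a sliding-window minimum), which is a different technique from the TreeSet above and brings the running time down to O(n); the name `best_sum` and the sentinel at index 0 are assumptions of this example.

```python
from collections import deque


def best_sum(values, k):
    """Largest sum of kept elements when at most k consecutive elements may be kept."""
    # cost[i]: smallest removed total for the first i elements when element i is removed.
    cost = [0]                      # cost[0] is a sentinel "removed" slot before the array
    window = deque([0])             # indices into cost whose costs are strictly increasing
    for i, v in enumerate(values, start=1):
        while window[0] < i - k - 1:
            window.popleft()        # the previous removal must be within k + 1 positions
        cost.append(v + cost[window[0]])
        while window and cost[window[-1]] >= cost[i]:
            window.pop()            # dominated candidates can never be the minimum again
        window.append(i)
    n = len(values)
    while window[0] < n - k:
        window.popleft()            # the last removal must leave at most k kept at the end
    return sum(values) - cost[window[0]]
```

With the inputs used above, `best_sum([1, 2, 3, 1, 6, 10], 2)` gives 21, and the asker's billboard list with `k = 1` gives 16.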
Let f[i] be the maximum total value you can get with the first i numbers, while you don't choose the last (i.e. the i-th) one. Then we have
f[i] = max{
    f[i-1],
    max{f[j] + sum(j + 1, i - 1) | (i - j - 1) <= k}
}
Here sum(j + 1, i - 1) covers the i - j - 1 chosen numbers between positions j and i, which is why that count is bounded by k. You can use a heap-like data structure to maintain the options and get the maximum one in log(n) time; keep a global delta for the running sum, and pay attention to the range i - j - 1 <= k.
The following algorithm has O(N*K) complexity.
Examine the first K elements (0 to K-1) of the array. There can be at most 1 gap in this region.
Reason: If there were two gaps, then there would not be any reason to have the lower (earlier) gap.
For each index i of these K gap options, the following holds true:
1. The sum up to i-1 is the present score of each option.
2. If the next gap is after a distance of d, then the options for d are (K - i) to K
For every possible position of a gap, calculate the best sum up to that position among the options.
The latter part of the array can be traversed similarly, independently of the past gap history.
Traverse the array further till the end.

finding the sum of smaller elements on left

I came across a problem of finding the number of smaller elements on the left of each element in an array of integers, which can be solved in O(n lg n) by using balanced binary search trees (like AVL, etc.) or merge sort. Using an AVL tree one can calculate the size of the left sub-tree for each element, and this would be the required answer. However, I can't work out how to calculate the sum of the smaller elements to the left of each element efficiently. For each element, do I have to traverse the left sub-tree and sum the values at the nodes, or is there a better way (using merge sort etc.)?
E.g. for the array: 4,7,1,3,2 the required answer would be: 0,4,0,1,1
Thanks.
In such an augmented binary search tree you store the number of child nodes for every node. This allows you to find the number of nodes preceding each node (the number of smaller elements).
For this task, you can store the sum of child node values for every node of the binary search tree. This allows you to find the sum of values for preceding nodes (the sum of smaller elements), also in O(n*log(n)).
Check this tutorial on Binary Indexed Tree. This is a structure that uses O(n) memory and can perform the following operations:
1. Change the value of a[i] by x, call this add(i,x);
2. Return the sum of all a[i], i<=m, call this get(m).
Both run in O(log n).
Now, how to use this for your task. You can do this in 2 steps.
Step 1. Copy, sort and remove duplicates from the original array. Now you can remap the numbers, so they are in the range [1...n].
Step 2. Now walk through the array from left to right. Let A[i] be the value in the original array and new[i] the mapped value (if A = [2, 7, 11, -3, 7] then new = [2, 3, 4, 1, 3]).
The answer is get(new[i]-1).
Update the values: add(new[i], 1) for counting, add(new[i], A[i]) for the sum.
All in all: sorting and remapping is O(n log n). Working on the array is n * O(log n) = O(n log n). So the total complexity is O(n log n).
Alternatively, use treap (in russian).
EDIT: Building the new array.
Suppose the original array A = [2, 7, 11, -3, 7]
Copy it to B and sort, B = [-3, 2, 7, 7, 11]
Do a unique, B = [-3, 2, 7, 11].
Now to get new, you can do either of the following:
add all of the elements to a map in increasing order, e.g. (-3 -> 1, 2 -> 2, 7 -> 3, 11 -> 4)
for each element in A, do a binary search over B
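The two steps can be sketched in Python as follows. This is a minimal illustration under the assumptions stated in the answer (remap by binary search over the sorted distinct values, then one Fenwick tree holding sums); the function name `sums_of_smaller_on_left` is made up for the example.

```python
from bisect import bisect_left


def sums_of_smaller_on_left(values):
    """For each element, the sum of strictly smaller elements to its left."""
    distinct = sorted(set(values))          # step 1: copy, sort, remove duplicates
    tree = [0] * (len(distinct) + 1)        # 1-based Fenwick tree of value sums

    def add(pos, delta):
        """Add delta at mapped position pos."""
        while pos < len(tree):
            tree[pos] += delta
            pos += pos & -pos

    def get(pos):
        """Sum of everything stored at mapped positions 1..pos."""
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    result = []
    for v in values:                        # step 2: walk left to right
        mapped = bisect_left(distinct, v) + 1
        result.append(get(mapped - 1))      # only strictly smaller values count
        add(mapped, v)
    return result
```

For the question's array, `sums_of_smaller_on_left([4, 7, 1, 3, 2])` yields `[0, 4, 0, 1, 1]`.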
The following code has a complexity of O(nlogn).
It uses a binary indexed tree to solve the problem.
#include <cstdio>
using namespace std;
// Assumes n <= MX_SIZE and max(a) - min(a) < MX_RANGE
const int MX_RANGE = 100000, MX_SIZE = 100000;
long long tree[MX_RANGE + 1] = {0}; // 1-based, so index MX_RANGE must be valid
int a[MX_SIZE];
int main() {
    int n, mn = MX_RANGE, shift = 0;
    scanf("%d", &n);
    for(int i = 0; i < n; i++) {
        scanf("%d", &a[i]);
        if(a[i] < mn) mn = a[i];
    }
    shift = 1-mn; // we need to remap all values to start from 1
    for(int i = 0; i < n; i++) {
        // Read answer: sum of the values stored at positions below a[i]
        long long sum = 0;
        int idx = a[i]+shift-1;
        while(idx>0) {
            sum += tree[idx];
            idx -= (idx&-idx);
        }
        printf("%lld ", sum);
        // Update tree
        idx = a[i]+shift;
        while(idx<=MX_RANGE) {
            tree[idx] += a[i];
            idx += (idx&-idx);
        }
    }
    printf("\n");
}

selection algorithm problem

Suppose you have an array A of n items, and you want to find the k items in A closest
to the median of A. For example, if A contains the 9 values {7, 14, 10, 12, 2, 11, 29, 3, 4}
and k = 5, then the answer would be the values {7, 14, 10, 12, 11}, since the median
is 10 and these are the five values in A closest to the value 10. Give an algorithm
to solve this problem in O(n) time.
I know that a selection algorithm (deep selection) is the appropriate algorithm for this problem, but I think that would run in O(n*logn) time instead of O(n). Any help would be greatly appreciated :)
You will first need to find the median, which can be done in O(n) (for example using Hoare's Quickselect algorithm).
Then you will need to implement a sorting algorithm which sorts the elements in the array according to their absolute distance to the median (smallest distances first).
If you were to sort the entire array this way, this would typically take somewhere from O(n * log n) to O(n^2), depending on the algorithm being used. However since you only need the first k values, the complexity can be reduced to O(k * log n) to O(k * n).
Since k is a constant and does not depend on the size of the array, the overall complexity in a worst case scenario will be: O(n) (for finding the median) + O(k * n) (sorting), which is O(n) overall.
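If k is not treated as a constant, the partial sort can be replaced by a second selection step, which keeps the whole procedure at expected O(n). The Python sketch below is one way to do that, assuming a randomized quickselect (expected, not worst-case, linear time) and the lower median for even n; `quickselect` and `k_closest_to_median` are names chosen for this example.

```python
import random


def quickselect(items, index):
    """Element that would sit at position index if items were sorted (expected O(n))."""
    items = list(items)
    lo, hi = 0, len(items) - 1
    while True:
        pivot = items[random.randint(lo, hi)]
        lt, i, gt = lo, lo, hi              # three-way partition of items[lo..hi]
        while i <= gt:
            if items[i] < pivot:
                items[lt], items[i] = items[i], items[lt]
                lt += 1
                i += 1
            elif items[i] > pivot:
                items[gt], items[i] = items[i], items[gt]
                gt -= 1
            else:
                i += 1
        if index < lt:
            hi = lt - 1                     # answer lies among the smaller elements
        elif index > gt:
            lo = gt + 1                     # answer lies among the larger elements
        else:
            return pivot                    # index falls inside the block equal to pivot


def k_closest_to_median(values, k):
    """Return k elements of values closest to the median, in no particular order."""
    n = len(values)
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= len(values)")
    if k == 0:
        return []
    median = quickselect(values, (n - 1) // 2)
    distances = [abs(v - median) for v in values]
    cutoff = quickselect(distances, k - 1)  # k-th smallest distance to the median
    closer = [v for v in values if abs(v - median) < cutoff]
    ties = [v for v in values if abs(v - median) == cutoff]
    return closer + ties[:k - len(closer)]  # fill the remaining slots from the ties
```

On the question's example, `sorted(k_closest_to_median([7, 14, 10, 12, 2, 11, 29, 3, 4], 5))` is `[7, 10, 11, 12, 14]`.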
I think you can do this using a variant on quicksort.
You start with a set S of n items and are looking for the "middle" k items. You can think of this as partitioning S into three parts of sizes (n - k)/2 (the "lower" items), k (the "middle" items), and (n - k)/2 (the "upper" items).
This gives us a strategy: first remove the lower (n - k)/2 items from S, leaving S'. Then remove the upper (n - k)/2 items from S', leaving S'', which is the middle k items of S.
You can easily partition a set this way using "half a quicksort": choose a pivot, partition the set into L and U (lower and upper elements w.r.t. the pivot), then you know the items to discard in the partition must be either all of L and some of U or vice versa: recurse accordingly.
[Thinking further, this may not be exactly what you want if you define "closest to the median" in some other way, but it's a start.]
Assumption: we care about the k values in A that are closest to the median. If we had an A={1,2,2,2,2,2,2,2,2,2,2,2,3}, and k=3, the answer is {2,2,2}. Similarly, if we have A={0,1,2,3,3,4,5,6}, and k=3, answers {2,3,3} and {3,3,4} are equally valid. Furthermore, we are not interested in the indices from which these values came, though I imagine some small tweaks to the algorithm would work.
As Grodrigues states, first find the median in O(n) time. While we're at it, keep track of the largest and smallest number
Next, create an array K, k items long. This array will contain the distance an item is from the median. (note that
Copy the first k items from A into K.
For each item A[i], compare the distance of A[i] from the median to each item of K. If A[i] is closer to the median than the farthest item from the median in K, replace that item. As an optimization, we could also track K's closest and farthest items from the median, so we have a faster comparison to K, or we could keep K sorted, but neither optimization is necessary to operate in O(n) time.
Pseudocode, C++ ish:
/* n = length of array
 * array = A, given in the problem
 * result is a pre-allocated array where the result will be placed
 * k is the length of result
 *
 * returns
 * 0 for success
 * -1 for invalid input
 * 1 for other errors
 *
 * Implementation note: optimizations are skipped.
 */
#include <algorithm>
#include <cstdlib>
#include <vector>
#define SUCCESS 0
#define INVALID_INPUT -1
#define ERROR 1
// Assumed median helper (not part of the original answer): std::nth_element
// finds the lower median in O(n) on average, working on a copy of the input.
int median_of(const int array[], int n)
{
    std::vector<int> copy(array, array + n);
    std::nth_element(copy.begin(), copy.begin() + (n - 1) / 2, copy.end());
    return copy[(n - 1) / 2];
}
int find_k_closest(int n, const int array[], int k, int result[])
{
    // if we're looking for more results than possible,
    // it's impossible to give a valid result.
    if( k < 0 || k > n ) return INVALID_INPUT;
    // populate result with the first k elements of array.
    for( int i=0; i<k; i++ )
    {
        result[i] = array[i];
    }
    // if we're looking for n items of an n length array (or for none),
    // we don't need to do any comparisons
    // Up to this point, function is O(k). Worst case, k==n,
    // and we're O(n)
    if( k==n || k==0 ) return SUCCESS;
    // Note that we don't bother finding the median if there's an
    // error or if the output is the input.
    int median = median_of(array, n);
    // Convert the result array to hold signed distances from the median,
    // not actual numbers
    for( int i=0; i<k; i++)
    {
        result[i] = result[i]-median;
        // e.g. with median=3: 1 becomes -2, 4 becomes 1.
    }
    // Up to this point, function is O(2k+n) = O(n)
    // find the closest items.
    // Outer loop runs O(n) times, inner loop is O(k),
    // so this part is O(k*n), which is O(n) when k is treated as a constant.
    // Note that we start at k, since the first k elements
    // of array are already in result.
    for( int i=k; i<n; i++ )
    {
        int distance = array[i]-median;
        // find the result farthest from the median
        int idx = 0;
        for( int j=1; j<k; j++ )
        {
            if( std::abs(result[j]) > std::abs(result[idx]) ) idx = j;
        }
        // If array[i] is closer to the median than the farthest element of
        // result, replace the farthest element of result with array[i]
        if( std::abs(distance) < std::abs(result[idx]) ){ result[idx] = distance; }
    }
    // convert result from distances back to values
    for( int i=0; i<k; i++)
    {
        result[i] = median + result[i];
        // e.g. with median=3: -2 becomes 1, 1 becomes 4.
    }
    return SUCCESS;
}

Resources