Constrained Longest Increasing Subsequence - algorithm

Consider an array of N integers and an index i, which can take values from 1 through N. The element at this index must always be present in the LIS we generate. Calculate the LIS for each value at i.
How can we solve this efficiently? My straightforward solution is to recompute the LIS from scratch for every value, which takes O(N^2 log(N)) time in total. Can it be beaten?
Example:
N = 2, i = 1.
Say the value at index i is 1 or 2, giving [1,2] or [2,2].
The longest strictly increasing subsequence through index i is 2 and 1, respectively.

The canonical dynamic program for LIS computes, for each k, the longest increasing subsequence of the elements at indices 1..k that includes the element at index k. Combining this with the mirror-image data for the range k..n, the LIS through index k is the longest subsequence ending at k joined with the longest one starting at k (counting the element at k only once).
The whole computation runs in O(n log n).
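A minimal sketch of this prefix/suffix idea (0-indexed; the function names and the patience-sorting helper are my own, not from the question):

```python
from bisect import bisect_left

def lis_through_each_index(a):
    """For every index k, the length of the longest strictly increasing
    subsequence of a that passes through a[k].  O(n log n) overall."""
    def lis_ending_at(seq):
        # Patience sorting: tails[p] = smallest tail of an increasing
        # subsequence of length p + 1 seen so far.
        tails, out = [], []
        for x in seq:
            pos = bisect_left(tails, x)
            if pos == len(tails):
                tails.append(x)
            else:
                tails[pos] = x
            out.append(pos + 1)   # LIS length ending exactly at x
        return out

    fwd = lis_ending_at(a)
    # An LIS starting at k equals an LIS ending at k in the reversed,
    # negated array (the mirror-image data).
    bwd = lis_ending_at([-x for x in reversed(a)])[::-1]
    # Join the best prefix and suffix, counting a[k] once.
    return [f + b - 1 for f, b in zip(fwd, bwd)]
```

For the example [1, 2] this returns [2, 2]: both elements lie on the LIS of length 2.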

Having an index i that must be in the subsequence makes it easy to look to the left and right and see how far you can extend while remaining strictly increasing. This takes at most O(N) steps.
The straightforward solution just repeats this for all N values at index i, which gives a total effort of O(N^2).
But note that when changing the value at index i, the earlier calculations can be reused: it is only necessary to check whether the sequence can be extended beyond i in either direction. If yes, you already know how far (or can calculate it now, once and for all).
This brings the total effort down to O(N).
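Following this answer's reading (extend contiguously left and right from index i), a small sketch; the 0-based function best_run_through is my own illustration. The two while-loop extents depend only on the neighbours of i, so they can be precomputed once and reused for every candidate value:

```python
def best_run_through(a, i, v):
    """Length of the longest strictly increasing contiguous run that
    passes through index i when a[i] is replaced by the value v."""
    b = a[:]
    b[i] = v
    lo = i
    while lo > 0 and b[lo - 1] < b[lo]:            # extend to the left
        lo -= 1
    hi = i
    while hi < len(b) - 1 and b[hi] < b[hi + 1]:   # extend to the right
        hi += 1
    return hi - lo + 1
```

On the example above, value 1 at index 0 gives a run of length 2 ([1, 2]) and value 2 gives length 1 ([2, 2]).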

Related

Counting inversions which involves swaps between elements greater than a certain value

This is a homework question which I have spent quite a lot of time thinking about.
Suppose I have an unsorted array of integers and a given integer d. The task is to count the number of inversions whose two elements differ by more than d.
For example, given the input array (3, 4, 3, 1) and d = 2, the number of such inversions is 1, as only the inversion between 4 and 1 is counted. The other inversions have a difference of at most 2.
Of course, an easy way to count the inversions is to iterate through every number in the list and add the number of later elements that are smaller by more than d. However, this is an O(n^2) algorithm; an O(n log n) algorithm is needed instead.
Another more efficient algorithm is given here, where we perform mergesort on the input array and directly count from there.
https://www.geeksforgeeks.org/counting-inversions-subarrays-given-size/
However, I'm having trouble modifying this to get a correct answer.
My approach is something like this:
During the 'merge' step of mergesort, if the first item of the left subarray is smaller, I just add it to the sorted array and continue.
Otherwise, I increment the inversion count by the number of items in the left subarray that are greater than the first item of the right subarray by more than d.
Nonetheless, I'm still having trouble getting the correct answer.
Can someone please help me? Thank you.
During the merging process, if the element at index i of the left sub-array, A[i], is greater than the element at index j of the right sub-array, A[j], there are j-i inversions there, since every element of A[i...j-1] is greater than A[j].
Among A[i...j-1], which is sorted, do a binary search for the smallest element that exceeds A[j] by more than d. The number of inversions between elements differing by more than d is then j minus the index of that smallest element.
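A sketch of the modified merge sort, counting pairs whose earlier element exceeds the later one by more than d; the binary search over the sorted left half plays the role described above (the function name is mine):

```python
from bisect import bisect_right

def count_big_inversions(a, d):
    """Count pairs i < j with a[i] - a[j] > d via a modified merge sort."""
    def sort_count(arr):
        if len(arr) <= 1:
            return arr, 0
        mid = len(arr) // 2
        left, cl = sort_count(arr[:mid])
        right, cr = sort_count(arr[mid:])
        count = cl + cr
        # The left half is sorted: for each right element x, the left
        # elements exceeding x + d form a suffix, found by binary search.
        for x in right:
            count += len(left) - bisect_right(left, x + d)
        # Ordinary stable merge.
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, count
    return sort_count(list(a))[1]
```

On the example (3, 4, 3, 1) with d = 2 this yields 1, and with d = 0 it degenerates to ordinary inversion counting.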

Finding number of length 3 increasing (or decreasing) subsequences?

Given an array of positive integers, how can I find the number of increasing (or decreasing) subsequences of length 3? E.g. [1,6,3,7,5,2,9,4,8] has 24 of these, such as [3,4,8] and [6,7,9].
I've found solutions for length k, but I believe those solutions can be made more efficient since we're only looking at k = 3.
For example, a naive O(n^3) solution can be made faster by looping over the elements, counting how many elements to each element's left are smaller and how many to its right are larger, multiplying these two counts, and adding the product to a running sum. This is O(n^2), which obviously doesn't translate easily to k > 3.
The same idea can be made O(n log n): loop over the elements, and for each one count how many elements to its left are smaller using a segment tree, which answers each such query in O(log n); in the same way, count how many elements to its right are larger. Multiply the two counts and add the product to the sum.
You can learn more about segment tree algorithm over here:
Segment Tree Tutorial
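A sketch of this counting approach using a Fenwick (binary indexed) tree in place of the segment tree for the O(log n) counts; the names are mine. It counts strictly increasing triplets only; the decreasing count is symmetric (reverse the array):

```python
def count_increasing_triplets(a):
    """Count strictly increasing triplets (i < j < k with a[i] < a[j] < a[k])
    in O(n log n) using a Fenwick tree over value ranks."""
    ranks = {v: r for r, v in enumerate(sorted(set(a)), 1)}
    m = len(ranks)

    def update(bit, i):          # add 1 at rank i
        while i <= m:
            bit[i] += 1
            i += i & -i

    def query(bit, i):           # how many inserted ranks are <= i
        s = 0
        while i > 0:
            s += bit[i]
            i -= i & -i
        return s

    n = len(a)
    less_left = [0] * n          # elements before k that are smaller
    bit = [0] * (m + 1)
    for k, v in enumerate(a):
        less_left[k] = query(bit, ranks[v] - 1)
        update(bit, ranks[v])

    greater_right = [0] * n      # elements after k that are larger
    bit = [0] * (m + 1)
    for k in range(n - 1, -1, -1):
        r = ranks[a[k]]
        greater_right[k] = query(bit, m) - query(bit, r)
        update(bit, r)

    # Each index, taken as the middle of a triplet, contributes the product.
    return sum(l * g for l, g in zip(less_left, greater_right))
```

On the question's example [1,6,3,7,5,2,9,4,8] this gives the stated 24.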
For each curr element, count how many elements on the left and on the right have smaller and greater values.
This curr element can form less[left] * greater[right] + greater[left] * less[right] triplets (covering both increasing and decreasing ones).
Complexity Considerations
The straightforward approach to counting elements on the left and right yields a quadratic solution. You might be tempted to use a set or something similar to count elements in O(log n) time.
You can find an element in a set in O(log n); however, counting the elements before and after it will still be linear, unless you implement a BST where each node tracks the count of its left children.
Check the solution here:
https://leetcode.com/problems/count-number-of-teams/discuss/554795/C%2B%2BJava-O(n-*-n)

Finding a maximal sorted subsequence

Assume that we're given a set of pairs S={(x_1,y_1),...,(x_n,y_n)} of integers. What is the most efficient way of computing a maximal sequence of elements (a_1,b_1),...,(a_m,b_m) in S with the property that
a_i <= a_{i+1}
b_i <= b_{i+1}
for i=1,...,m-1, i.e. the sequence is ordered with respect to both components. I can come up with a quadratic algorithm that does the following:
We sort the elements of S with respect to the first coordinate, giving (c_1,d_1),...,(c_n,d_n), where c_i <= c_{i+1}.
Using dynamic programming, for each (c_i,d_i) we compute the longest sequence ordered with respect to both components that ends in (c_i,d_i). This can be done in linear time for each i, once we know the longest such sequence for (c_1,d_1),...,(c_{i-1},d_{i-1}).
Since we have to perform an O(nlogn) sort in step 1 and a linear search for each index in step 2, which is quadratic, we end up with a quadratic runtime.
I've been trying to figure out whether there's a faster, i.e. O(nlogn) way of generating the maximal sequence from having two sorts of the set S: one with respect to the first component, and one with respect to the second. Is this possible?
Yes, it is possible to do it in O(n log n).
Let's sort the elements of the set in lexicographical order. The first components are now ordered correctly, so we can forget about them.
Take any sorted subsequence of this sorted sequence: its second components form a non-decreasing subsequence. That's why we can just find the longest non-decreasing subsequence of the second components (completely ignoring the first components, as they are already in order). The longest non-decreasing subsequence of an array can be found in O(n log n) time with the same technique as the classic longest increasing subsequence; it is a well-known problem.
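A short sketch of the sort-then-LIS step (the function name is mine); since the order only has to be non-decreasing, the patience step uses bisect_right so that equal second components may repeat:

```python
from bisect import bisect_right

def longest_chain_length(pairs):
    """Length of the longest subsequence ordered in both coordinates."""
    tails = []                       # tails[p] = smallest possible last
                                     # second component of a chain of length p + 1
    for _, b in sorted(pairs):       # lexicographic sort; forget the firsts
        pos = bisect_right(tails, b) # non-decreasing: ties extend the chain
        if pos == len(tails):
            tails.append(b)
        else:
            tails[pos] = b
    return len(tails)
```

For example, longest_chain_length([(1,2), (2,1), (3,3), (2,2)]) is 3, via (2,1), (2,2), (3,3).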

Counting number of contiguous subarrays with positive sum in O(nlogn) using O(n) transformation

I am interested in finding the number of contiguous sub-arrays that sum to a positive value (sum>0).
More formally, Given an array of integers A[1,...,n] I am looking to count the pairs of integers (i,j) such that 1<=i<=j<=n and A[i]+...+A[j]>0.
I am familiar with Kadane's algorithm for finding the maximum sum sub-array in O(n), and using a similar approach I can count the number of these sub-arrays in O(n^2).
To do this I take the cumulative sums T(i). I then compute T(j)-T(i-1) for all j=1,...,n and i=1,...,j and count the differences that come out positive.
Apparently though there is an O(n) time routine that transforms this problem into a problem of counting the number of inversions (which can be achieved in O(nlogn) using say merge-sort). Try as I may though, I have been unable to find this transformation.
I do understand though that somehow I must match this inversion to the fact that the sum of elements between a pair (i,j) is positive.
Does anyone have any guidance as to how to do this? Any help is greatly appreciated.
EDIT: I am actually looking for the O(n) transformation rather than an alternative (not based on transformation plus inversion counting) solution to finding the number of sub arrays.
Using the original array A, build another array sumA such that:
sumA[i] = A[0] + A[1] + ... + A[i].
Now in this sumA[] array, if there are two indices i, j (i < j) such that sumA[i] < sumA[j],
then sumA[j] - sumA[i] > 0. This is exactly the sum A[i+1] + ... + A[j].
Hence the problem reduces to counting the non-inverted pairs of sumA[], i.e. the number of inversions of its reverse. This can be done by sorting sumA[] in descending order with mergesort and counting the inversions encountered in the process. (To also count the sub-arrays that start at index 0, prepend a 0 to sumA[] first, so that each sumA[j] > 0 is picked up as a pair on its own.)
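A sketch of the whole pipeline (the function name is mine): build the prefix sums with a leading 0, so sub-arrays starting at the first element are included, then count the pairs p < q with prefix[p] < prefix[q] during merge sort:

```python
def count_positive_subarrays(a):
    """Count pairs (i, j), i <= j, with a[i] + ... + a[j] > 0."""
    prefix = [0]                     # prefix[k] = a[0] + ... + a[k-1]
    for x in a:
        prefix.append(prefix[-1] + x)

    # Count pairs p < q with prefix[p] < prefix[q]; each such pair is a
    # sub-array a[p..q-1] with a strictly positive sum.
    def sort_count(arr):
        if len(arr) <= 1:
            return arr, 0
        mid = len(arr) // 2
        left, cl = sort_count(arr[:mid])
        right, cr = sort_count(arr[mid:])
        merged, count, i, j = [], cl + cr, 0, 0
        while i < len(left) and j < len(right):
            if left[i] < right[j]:
                # left[i] is strictly below every remaining right element
                count += len(right) - j
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, count

    return sort_count(prefix)[1]
```

For [1, -2, 3] this counts 4 positive sub-arrays: [1], [3], [-2, 3] and [1, -2, 3].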

Average number of intervals from an input in 0..N

The question sprang up when examining the "Find the K missing numbers in this set supposed to cover [0..N]" question.
The author of the question asked for CS answers instead of equation-based answers, and his proposal was to sort the input and then iterate over it to list the K missing numbers.
While this seems fine to me, it also seems wasteful. Let's take an example:
N = 200
K = 2 (we will consider K << N)
missing elements: 53, 75
The "sorted" set can be represented as [0, 52] U [54, 74] U [76, 200], which is far more compact than enumerating all values of the set (and allows retrieving the missing numbers in O(K) operations, compared with O(N) if the set is merely sorted).
This is only the final result, however; during construction the list of intervals might be much larger, as we feed the elements one at a time...
Let us therefore introduce another variable: let I be the number of elements of the set fed to the structure so far. Then we may at worst have min((N-K)/2, I) intervals (I think...).
From which we deduce that the number of intervals reached during the construction is the maximum encountered for I in [0..N], the worst case being (N-K)/2 thus O(N).
I have, however, a gut feeling that if the input is random, instead of being specially crafted, we might get a much lower bound... and thus the always so tricky question:
How many intervals, on average?
Your approach vs. the proposed one with sorting seems to be a classical trade-off of which operation is cheap and which one is expensive.
I find your notation a bit confusing, so please allow me to use my own:
Let S be the set. Let n be the number of items in the set: n = |S|. Let max be the biggest number in the set: max = max(S). Let k be the number of elements not in the set: k = |{0,...,max} \ S|.
For the sorting solution, we could very cheaply insert all n elements into S using hashing. That would take expected O(n). Then for finding the k missing elements, we sort the set in O(nlogn), and then determine the missing elements in O(n).
That is, the overall cost for adding n elements and then finding the missing k elements takes expected O(n) + O(nlogn) + O(n) = O(nlogn).
You suggest a different approach in which we represent the set as a list of dense subsets of S. How would you implement such a data structure? I suggest a sorted tree (instead of a list) so that an insert becomes efficient. Because what do you have to do for an insert of a new element e? I think you have to:
1. Find the potential candidate subset(s) in the tree where e could be added.
2. If a subset already contains e, nothing has to be done.
3. If a subset contains e+1 and another subset contains e-1, merge the subsets together and add e to the result.
4. If a subset already contains e+1, but e-1 is not contained in S, add e to that subset.
5. If a subset already contains e-1, but e+1 is not contained in S, add e to that subset.
6. Otherwise, create a new subset holding only the element e and insert it into the tree.
We can expect that finding the subsets needed for the above operations takes O(logn). The operations of 4. or 5. take constant time if we represent the subsets as pairs of integers (we just have to decrement the lower or increment the upper boundary). 3. and 6. potentially require changing the tree structure, but we expect that to take at most O(logn), so the whole "insert" will not take more than O(logn).
Now with such a datastructure in place, we can easily determine the k missing numbers by traversing the tree in order and collecting the numbers not in any of the subsets. The costs are linear in the number of nodes in the tree, which is <= n/2, so the total costs are O(n) for that.
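A sketch of such a structure; it uses a plain sorted Python list with bisect instead of a balanced tree (so a single insert costs O(#intervals) in the worst case from list shifts, but the six-case analysis is exactly the one above):

```python
import bisect

class IntervalSet:
    """A set of integers kept as disjoint, sorted, inclusive [lo, hi] intervals."""

    def __init__(self):
        self.ivals = []                       # sorted list of [lo, hi] pairs

    def insert(self, e):
        iv = self.ivals
        # First interval whose lower bound lies strictly above e.
        i = bisect.bisect_right(iv, [e, float('inf')])
        if i > 0 and iv[i - 1][1] >= e:       # case 2: e already covered
            return
        touch_left = i > 0 and iv[i - 1][1] == e - 1
        touch_right = i < len(iv) and iv[i][0] == e + 1
        if touch_left and touch_right:        # case 3: merge two intervals
            iv[i - 1][1] = iv[i][1]
            del iv[i]
        elif touch_left:                      # case 5: extend left neighbour up
            iv[i - 1][1] = e
        elif touch_right:                     # case 4: extend right neighbour down
            iv[i][0] = e
        else:                                 # case 6: new singleton interval
            iv.insert(i, [e, e])

    def missing(self, n):
        """All numbers of 0..n not inserted, by walking the gaps in order."""
        out, prev = [], -1
        for lo, hi in self.ivals:
            out.extend(range(prev + 1, lo))
            prev = hi
        out.extend(range(prev + 1, n + 1))
        return out
```

Feeding it 0..200 without 53 and 75 leaves exactly the three intervals of the example, and missing(200) returns [53, 75].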
However, if we consider again the complete sequence operations, we get for n inserts O(nlogn) + O(n) for finding the k missing numbers, so the overall costs are again O(nlogn).
This is not better than the expected costs of the first algorithm.
A third solution is to use a boolean array to represent the set and a single integer max for the biggest element in the set.
If an element e is added to the Set, you set array[e] = true. You can implement the variable size of the set using table expansion, so the costs for inserting an element into the array is constant.
To retrieve the missing elements, you just collect those elements f where array[f] == false. This will take O(max).
The overall costs for inserting n elements and finding the k missing ones are thus O(n) + O(max). However, max = n + k - 1, and so we get overall costs of O(n + k).
A fourth method which is a cross-over between the third one and the one using hashing is the following one, which also uses hashing, but doesn't require sorting.
Store your set S in a hash set, and also store the maximum element in S in a variable max. To find the k missing ones, first generate a result set R containing all numbers {0,...,max}. Then iterate over S and delete every element in S from R.
The costs for that are also O(n + k).
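This fourth method is only a few lines; a sketch (the name missing_numbers is mine, and the list comprehension is equivalent to generating R = {0,...,max} and deleting the elements of S from it):

```python
def missing_numbers(s):
    """Hash the elements, then subtract them from the full range 0..max."""
    present = set(s)                  # expected O(n) inserts
    return [x for x in range(max(present) + 1) if x not in present]
```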