Finding consecutively increasing subsequence stored in a haphazard manner - algorithm

The problem is this, given an array of length say N, we have to find all subsequences of length W such that those W elements, when sorted, forms an arithmetic progression with interval 1. So for an array like [1,4,6,3,5,2,7,9], and W as 5, the slice [4,6,3,5,2] can be regarded as one such subsequence, since, when sorted, it yields [2,3,4,5,6], an A.P with common difference 1.
The immediate solution which comes to the mind is to have a sliding window, for each new element, pop the old one, push the new one, sort the window, and if for that window, window[w-1] - window[0] + 1 = w, then it is such a subsequence. However, it takes O(NlogN) time, whereas the solution at Codechef proposes a O(N) time algorithm that uses double-ended queue. I am having difficulty in understanding the algorithm, what is being pushed and popped, and why so, and how it maintains the window in sorted order without the need to resort with each new element. Can anybody explain it?

You are correct in observing that a segment is valid if max(segment) - min(segment) + 1 = W. So, the problem reduces to finding the min and max of all length W segments in O(N).
For this, we can use a deque D. Suppose we want to find the min. We will store the indexes of elements in D, assuming 0-based indexing. Let A be the original array.
for i = 0 to N - 1:
if D.first() == i - W:
D.popFirst() <- this means that the element is too old,
so we no longer care about it
while not D.empty() and A[ D.last() ] >= A[i]:
D.popLast()
D.pushBack(i)
For each i, this will give you the minimum in [i - W + 1, i] as the element at index D.first().
popFirst() removes the first element from D. We have to do this when the first element in D is more than W steps away from i, because it will not contribute to the minimum in the interval above.
popLast() removes the last element from D. We do this to maintain the sorted order: if the last element in D is the index of an element larger than A[i], then adding i at the end of D would break the order. So we have to keep removing the last element to ensure that D stays sorted.
pushBack() adds an element at the end of D. After adding it, D will definitely remain sorted.
This is O(1) (to find a min, the above pseudocode is O(n)) because each element will be pushed and popped to / from D at most once.
This works because D will always be a sliding window of indexes sorted by their associated value in A. When we are at an element that would break this order, we can pop elements from D (the sliding window) until the order is restored. Since the new element is smaller than those we are popping, there is no way those can contribute to a solution.
Note that you can implement this even without the methods I used by keeping two pointers associated with D: start and end. Then make D an array of length N and you are done.

Related

How to find 2 special elements in the array in O(n)

Let a1,...,an be a sequence of real numbers. Let m be the minimum of the sequence, and let M be the maximum of the sequence.
I proved that there exists 2 elements in the sequence, x,y, such that |x-y|<=(M-m)/n.
Now, is there a way to find an algorithm that finds such 2 elements in time complexity of O(n)?
I thought about sorting the sequence, but since I dont know anything about M I cannot use radix/bucket or any other linear time algorithm that I'm familier with.
I'd appreciate any idea.
Thanks in advance.
First find out n, M, m. If not already given they can be determined in O(n).
Then create a memory storage of n+1 elements; we will use the storage for n+1 buckets with width w=(M-m)/n.
The buckets cover the range of values equally: Bucket 1 goes from [m; m+w[, Bucket 2 from [m+w; m+2*w[, Bucket n from [m+(n-1)*w; m+n*w[ = [M-w; M[, and the (n+1)th bucket from [M; M+w[.
Now we go once through all the values and sort them into the buckets according to the assigned intervals. There should be at a maximum 1 element per bucket. If the bucket is already filled, it means that the elements are closer together than the boundaries of the half-open interval, e.g. we found elements x, y with |x-y| < w = (M-m)/n.
If no such two elements are found, afterwards n buckets of n+1 total buckets are filled with one element. And all those elements are sorted.
We once more go through all the buckets and compare the distance of the content of neighbouring buckets only, whether there are two elements, which fulfil the condition.
Due to the width of the buckets, the condition cannot be true for buckets, which are not adjoining: For those the distance is always |x-y| > w.
(The fulfilment of the last inequality in 4. is also the reason, why the interval is half-open and cannot be closed, and why we need n+1 buckets instead of n. An alternative would be, to use n buckets and make the now last bucket a special case with [M; M+w]. But O(n+1)=O(n) and using n+1 steps is preferable to special casing the last bucket.)
The running time is O(n) for step 1, 0 for step 2 - we actually do not do anything there, O(n) for step 3 and O(n) for step 4, as there is only 1 element per bucket. Altogether O(n).
This task shows, that either sorting of elements, which are not close together or coarse sorting without considering fine distances can be done in O(n) instead of O(n*log(n)). It has useful applications. Numbers on computers are discrete, they have a finite precision. I have sucessfuly used this sorting method for signal-processing / fast sorting in real-time production code.
About #Damien 's remark: The real threshold of (M-m)/(n-1) is provably true for every such sequence. I assumed in the answer so far the sequence we are looking at is a special kind, where the stronger condition is true, or at least, for all sequences, if the stronger condition was true, we would find such elements in O(n).
If this was a small mistake of the OP instead (who said to have proven the stronger condition) and we should find two elements x, y with |x-y| <= (M-m)/(n-1) instead, we can simplify:
-- 3. We would do steps 1 to 3 like above, but with n buckets and the bucket width set to w = (M-m)/(n-1). The bucket n now goes from [M; M+w[.
For step 4 we would do the following alternative:
4./alternative: n buckets are filled with one element each. The element at bucket n has to be M and is at the left boundary of the bucket interval. The distance of this element y = M to the element x in the n-1th bucket for every such possible element x in the n-1thbucket is: |M-x| <= w = (M-m)/(n-1), so we found x and y, which fulfil the condition, q.e.d.
First note that the real threshold should be (M-m)/(n-1).
The first step is to calculate the min m and max M elements, in O(N).
You calculate the mid = (m + M)/2value.
You concentrate the value less than mid at the beginning, and more than mid at the end of he array.
You select the part with the largest number of elements and you iterate until very few numbers are kept.
If both parts have the same number of elements, you can select any of them. If the remaining part has much more elements than n/2, then in order to maintain a O(n) complexity, you can keep onlyn/2 + 1 of them, as the goal is not to find the smallest difference, but one difference small enough only.
As indicated in a comment by #btilly, this solution could fail in some cases, for example with an input [0, 2.1, 2.9, 5]. For that, it is needed to calculate the max value of the left hand, and the min value of the right hand, and to test if the answer is not right_min - left_max. This doesn't change the O(n) complexity, even if the solution becomes less elegant.
Complexity of the search procedure: O(n) + O(n/2) + O(n/4) + ... + O(2) = O(2n) = O(n).
Damien is correct in his comment that the correct results is that there must be x, y such that |x-y| <= (M-m)/(n-1). If you have the sequence [0, 1, 2, 3, 4] you have 5 elements, but no two elements are closer than (M-m)/n = (4-0)/5 = 4/5.
With the right threshold, the solution is easy - find M and m by scanning through the input once, and then bucket the input into (n-1) buckets of size (M-m)/(n-1), putting values that are on the boundaries of a pair of buckets into both buckets. At least one bucket must have two values in it by the pigeon-hole principle.

Find the largest subarray with all ones

Given a binary array (element is 0 or 1), I need to find the maximum length of sub array having all ones for given range(l and r) in the array.
I know the O(n) approach to find such sub array but if there are O(n) queries the overall complexity becomes O(n^2).
I know that segment tree is used for such type of problems but I cant figure out how to build tree for this problem.
Can I build a segment tree using which I can answer queries in log(n) time such that for O(n) queries overall complexity will become O(nlog(n)).
Let A be your binary array.
Build two array IL and IR:
- IL contains, in order, every i such that A[i] = 1 and (i = 0 or A[i-1] = 0);
- IR contains, in order, every i such that A[i-1] = 1 and (i = N or A[i] = 0).
In other words, for any i, the range defined by IL[i] inclusive and IR[i] non-inclusive corresponds to a sequence of 1s in A.
Now, for any query {L, R} (for the range [L; R] inclusive), let S = 0. Traverse both IL and IR with i, until IL[i] >= L. At this point, if IR[i-1] > L, set S = IR[i-1]-L. Continue traversing IL and IR, setting S = max(S, IR[i]-IL[i]), until IR[i] > R. Finally, if IL[i] <= R, set S = max(S, R-IL[i]).
S is now the size of the greatest sequence of 1s in A between L and R.
The complexity of building IL and IR is O(N), and the complexity of answering a query is O(M), with M the length of IL or IR.
Yes, you can use a segment tree to solve this problem.
Let's try to think what that tree must look like. Obviously, every node must contain the length of max subarray of 1s and 0s in that range.
Now, how do we join two nodes into a bigger one. In other words, you have a node representing [low, mid) and a node representing [mid, high). You have to obtain max subarray for [low, high).
First things first, max for whole will at least be max for parts. So we have to take the maximum among the left and right values.
But what if the real max subarray overlaps both nodes? Well, then it must be the rightmost part of left node and leftmost part of right node. So, we need to keep track of longest subarray at start and end as well.
Now, how to update these left and rightmost subarray lengths? Well, leftmost of parent node must be leftmost of left child, unless leftmost of left child spans the entire left node. In that case, leftmost of parent node will be leftmost of left + leftmost of right node.
A similar rule applies to tracking the rightmost subarray of 1s.
And we're finished. Here's the final rules in pseudo code.
max_sub[parent] = max(max_sub[left], max_sub[right], right_sub[left] + left_sub[right])
left_sub[parent] = left_sub[left] if left_sub[left] < length[left] else left_sub[left] + left_sub[right]
right_sub[parent] = right_sub[right] if right_sub[right] < length[right] else right_sub[right] + right_sub[left]
Note that you will need to take similar steps when finding the result for a range.
Here's an example tree for the array [0, 1, 1, 0, 1, 1, 1, 0].

Find two elements with smallest absolute difference in an interval

I'm given an array and a list of queries of type L R which mean find the smallest absolute difference between any two array elements such that their indices are between L and R inclusive (Here the starting index of array is at 1 instead of at 0)
For example take the array a with elements 2 1 8 5 11 then the query 1-3 which would be (2 1 8) the answer would be 1=2-1, or the query 2-4 (1 8 5) where the answer would be 3=8-5
Now this is easy if you have to look at one interval you sort the interval and then compare i-th element with i+1-th and store the minimum difference for each i.
The problem is that I'll have a lot of intervals to check I have to keep the original array intact.
What I've done is I constructed a new array b with indices from the first one such that a[b[i]] <= a[b[j]] for i <= j. Now for each query I loop through the whole array and look if b[j] is between L and R if it is compare its absolute difference to the first next element that is also between L and R keep the minimum and then do the same for that element until you get to the end.
This is inefficient because for each query I have to check all elements of the array especially if the query is small compared to the size of array. I'm looking for a time efficient approach.
EDIT: The numbers don't have to be consecutive, perhaps I gave a bad array as an example, What I've meant for example if it's 1 5 2 then the smallest difference is 1=2-1. In a sorted array the smallest difference is guaranteed to be between two consecutive elements, that's why I've thought of sorting
I'll sketch an O(n (√n) log n)-time solution, which might be fast enough? When I gave up sport programming, computers were a lot slower.
The high-level idea is to apply Mo's trick to a data structure with the following operations.
insert(x) - inserts x into the underlying multiset
delete(x) - deletes one copy of x from the underlying multiset
min-abs-diff() - returns the minimum absolute difference
between two elements of the multiset
(0 if some element has multiplicity >1)
Read in all of the query intervals [l, r], sort them in order of lexicographically nondecreasing (floor(l / sqrt(n)), r) where n is the length of the input, and then to process an interval I, insert the elements in I - I' where I' was the previous interval, delete the elements in I' - I, and report the minimum absolute difference. (The point of the funny sort order is to reduce the number of operations from O(n^2) to O(n √n) assuming n queries.)
There are a couple ways to implement the data structure to have O(log n)-time operations. I'm going to use a binary search tree for clarity of exposition, but you could also sort the array and use a segment tree (less work if you don't have a BST implementation that lets you specify decorations).
Add three fields to each BST node: min (minimum value in the subtree rooted at this node), max (maximum value in the subtree rooted at this node), min-abs-diff (minimum absolute difference between values in the subtree rooted at this node). These fields can be computed bottom-up like so.
if node v has left child u and right child w:
v.min = u.min
v.max = w.max
v.min-abs-diff = min(u.min-abs-diff, v.value - u.max,
w.min - v.value, w.min-abs-diff)
if node v has left child u and no right child:
v.min = u.min
v.max = v.value
v.min-abs-diff = min(u.min-abs-diff, v.value - u.max)
if node v has no left child and right child w:
v.min = v.value
v.max = w.max
v.min-abs-diff = min(w.min - v.value, w.min-abs-diff)
if node v has no left child and no right child:
v.min = v.value
v.max = v.value
v.min-abs-diff = ∞
This logic can be implemented pretty compactly.
if v has a left child u:
v.min = u.min
v.min-abs-diff = min(u.min-abs-diff, v.value - u.max)
else:
v.min = v.value
v.min-abs-diff = ∞
if v has a right child w:
v.max = w.max
v.min-abs-diff = min(v.min-abs-diff, w.min - v.value, w.min-abs-diff)
else:
v.max = v.value
insert and delete work as usual, except that the decorations need to be updated along the traversal path. The total time is still O(log n) for reasonable container choices.
min-abs-diff is implemented by returning root.min-abs-diff where root is the root of the tree.
EDIT #2: My answer determines the smallest difference between any two adjacent values in a sequence, not the smallest difference between any two values in the sequence.
When you say that you have a lot of intervals to check, do you happen to mean that you have to perform checks of many intervals over the same sequence of numbers? If so, what if you just pre-computed the differences from one number to the next? E.g., in Python:
elements = [2, 1, 8, 5, 11]
def get_differences(sequence):
"""Yield absolute differences between each pair of items in the sequence"""
it = iter(sequence)
sentinel = object()
previous = next(it, sentinel)
if previous is sentinel:
return ()
for current in it:
yield abs(previous - current)
previous = current
differences = list(get_differences(elements)) # differences = [1, 7, 3, 6]
Then when you have to find the minimum difference, just return min(differences[start_index:stop_index-1].
EDIT: I missed your paragraph:
Now this is easy if you have to look at one interval you sort the interval and then compare i-th element with i+1-th and store the minimum difference for each i.
But I still think what I'm saying makes sense; you don't have to sort the entire collection but you still need to do an O(n) operation. If you're dealing with numeric values on a platform where the numbers can be represented as machine integers or floats, then as long as you use an array-like container, this should be cache friendly and relatively efficient. If you happen to have repeated queries, you might be able to do some memoization to cache pre-computed results.

Efficient approach to find co-prime subarrays

Given an array, is it possible to find the number of co-prime sub arrays of the array in better than O(N²) time? Co-prime arrays are defined as a contiguous subset of an array such that GCD of all elements is 1.
Consider adding one element to the end of the array. Now find the rightmost position, if any, such that the sub-array from that position to the element you have just added is co-prime. Since it is rightmost, no shorter array ending with the element added is co-prime. Since it is co-prime, every array that starts to its left and ends with the new element is co-prime. So you have worked out the number of co-prime sub-arrays that end with the new element. If you can find the rightmost position efficiently - say in O(log n) instead of O(n) - then you can count the number of co-prime sub-arrays in O(n log n) by extending the array one element at a time.
To make it possible to find rightmost positions, think of the full array as the leaves of a complete binary tree, padded out to make its a length a power of two. At each node put the GCD of all of the elements below that node - you can do this from the bottom up in time O(n). Every contiguous interval within the array can be covered by a collection of nodes of size O(log n) such that the interval consists of the leaves underneath the nodes, so you can compute the GCD of the interval is time O(log n).
To find the rightmost position forming a co-prime subarray with your current element, start with the current element and check to see if it is 1. If it is, you are finished. If not, look at the element to its left, take a GCD with that, and push the result on a stack. If the result is 1, you are finished, if not, do the same, but look to see if there is a sub-tree of 2 elements you can use to add 2 elements at once. At each of the succeeding steps you double the size of the sub-tree you are trying to find. You won't always find a convenient sub-tree of the size you want, but because every interval can be covered by O(log n) subtrees you should get lucky often enough to go through this step in time O(log n).
Now you have either found that whole array to the current element is not co-prime or you have found a section that is co-prime, but may go further to the left than it needs. The value at the top of the stack was computed by taking the GCD of the value just below it on the stack and the GCD at the top of a sub-tree. Pop it off the stack and take the GCD of the value just below it and the right half of the sub-tree. If you are still co-prime then you didn't need the left half of the sub-tree. If not, then you needed it, but perhaps not all of it. In either case you can continue down to find the rightmost match in time O(log n).
So I think you can find the rightmost position forming a co-prime subarray with the current element in time O(log n) (admittedly with some very fiddly programming) so you can count the number of coprime sub-arrays in time O(n log n)
Two examples:
List 1, 3, 5, 7. The next level is 1, 1 and the root is 1. If the current element is 13 then I check against 7 and find that gcd(7, 13) = 1. Therefore I immediately know that GCD(5, 7, 13) = GCD(3, 5, 7, 13) = GCD(1, 3, 4, 7, 13) = 1.
List 2, 4, 8, 16. The next level is 2, 8 and the root is 2. If the current numbers is 32 then I check against 16 and find that gcd(16, 32) = 16 != 1 so then I check against 8 and find that GCD(8, 32) = 8 and then I check against 2 and find that GCD(2, 32) = 2 so there is no interval in the extended array which has GCD = 1.

How to find kth largest number in pairwise sums like setA + setB?

Here are two integers set, say A and B
and we can get another set C, in which every element is sum of element a in A and element b in B.
For example, A = {1,2}, B = {3,4} and we get C = {4, 5, 6} where 4=1+3, 5=1+4=2+3, 6=2+4
Now I want to find out which number is the kth largest one in set C, for example 5 is 2nd largest one in above example.
Is there a efficient solution?
I know that pairwise sums sorting is an open problem and has a n^2 lower time bound. But since only kth largest number is needed, maybe we can learn from the O(n) algorithm of finding median number in an unsorted array.
Thanks.
If k is very close to 1 or N, any algorithm that generates the sorted sums lazily could simply be run until the kth or N-kth item pops out.
In particular, I'm thinking of best-first search of the following space: (a,b) means the ath item from A, the first list, added to the bth from B, the second.
Keep in a best=lowest priority queue pairs (a,b) with cost(a,b) = A[a]+B[b].
Start with just (1,1) in the priority queue, which is the minimum.
Repeat until k items popped:
pop the top (a,b)
if a<|A|, push (a+1,b)
if a=1 and b<|B|, push (a,b+1)
This gives you a saw-tooth comb connectivity and saves you from having to mark each (a,b) visited in an array. Note that cost(a+1,b)>=cost(a,b) and cost(a,b+1)>=cost(a,b) because A and B are sorted.
Here's a picture of a comb to show the successor generation rule above (you start in the upper left corner; a is the horizontal direction):
|-------
|-------
|-------
It's just best-first exploration of (up to) all |A|*|B| tuples and their sums.
Note that the most possible items pushed before popping k is 2*k, because each item has either 1 or 2 successors. Here's a possible queue state, where items pushed into the queue are marked *:
|--*----
|-*-----
*-------
Everything above and to the left of the * frontier has already been popped.
For the N-k<k case, do the same thing but with reversed priority queue order and exploration order (or, just negate and reverse the values, get the (N-k)th least, then negate and return the answer).
See also: sorted list of pairwise sums on SO, or the Open problems project.
Sort arrays A & B : O(mlogm + nlogn)
Apply a modified form of algorithm for merging 2 sorted arrays : O(m+n)
i.e. at each point, u sum the the two elements.
When u have got (m+n-k+1)th element in C, stop merging. That element is essentially kth largest.
E.g.
{1,2} & {3,4} : Sorted
C:
{1+3,(1+4)|(2+3),2+4}
Well, O(n) would be a lower bound (probably not tight though), otherwise you could run the O(n) algorithm n times to get a sorted list in O(n^2).
Can you assume the two sets are sorted (you present them in sorted order above)? If so, you could possibly get something with an average case that's decently better by doing an "early out", starting at the last pair of elements, etc. Just a hunch though.

Resources