Find two elements with smallest absolute difference in an interval - algorithm

I'm given an array and a list of queries of type L R which mean find the smallest absolute difference between any two array elements such that their indices are between L and R inclusive (Here the starting index of array is at 1 instead of at 0)
For example take the array a with elements 2 1 8 5 11 then the query 1-3 which would be (2 1 8) the answer would be 1=2-1, or the query 2-4 (1 8 5) where the answer would be 3=8-5
Now this is easy if you have to look at one interval you sort the interval and then compare i-th element with i+1-th and store the minimum difference for each i.
The problem is that I'll have a lot of intervals to check I have to keep the original array intact.
What I've done is I constructed a new array b with indices from the first one such that a[b[i]] <= a[b[j]] for i <= j. Now for each query I loop through the whole array and look if b[j] is between L and R if it is compare its absolute difference to the first next element that is also between L and R keep the minimum and then do the same for that element until you get to the end.
This is inefficient because for each query I have to check all elements of the array especially if the query is small compared to the size of array. I'm looking for a time efficient approach.
EDIT: The numbers don't have to be consecutive, perhaps I gave a bad array as an example, What I've meant for example if it's 1 5 2 then the smallest difference is 1=2-1. In a sorted array the smallest difference is guaranteed to be between two consecutive elements, that's why I've thought of sorting

I'll sketch an O(n (√n) log n)-time solution, which might be fast enough? When I gave up sport programming, computers were a lot slower.
The high-level idea is to apply Mo's trick to a data structure with the following operations.
insert(x) - inserts x into the underlying multiset
delete(x) - deletes one copy of x from the underlying multiset
min-abs-diff() - returns the minimum absolute difference
between two elements of the multiset
(0 if some element has multiplicity >1)
Read in all of the query intervals [l, r], sort them in order of lexicographically nondecreasing (floor(l / sqrt(n)), r) where n is the length of the input, and then to process an interval I, insert the elements in I - I' where I' was the previous interval, delete the elements in I' - I, and report the minimum absolute difference. (The point of the funny sort order is to reduce the number of operations from O(n^2) to O(n √n) assuming n queries.)
There are a couple ways to implement the data structure to have O(log n)-time operations. I'm going to use a binary search tree for clarity of exposition, but you could also sort the array and use a segment tree (less work if you don't have a BST implementation that lets you specify decorations).
Add three fields to each BST node: min (minimum value in the subtree rooted at this node), max (maximum value in the subtree rooted at this node), min-abs-diff (minimum absolute difference between values in the subtree rooted at this node). These fields can be computed bottom-up like so.
if node v has left child u and right child w:
v.min = u.min
v.max = w.max
v.min-abs-diff = min(u.min-abs-diff, v.value - u.max,
w.min - v.value, w.min-abs-diff)
if node v has left child u and no right child:
v.min = u.min
v.max = v.value
v.min-abs-diff = min(u.min-abs-diff, v.value - u.max)
if node v has no left child and right child w:
v.min = v.value
v.max = w.max
v.min-abs-diff = min(w.min - v.value, w.min-abs-diff)
if node v has no left child and no right child:
v.min = v.value
v.max = v.value
v.min-abs-diff = ∞
This logic can be implemented pretty compactly.
if v has a left child u:
v.min = u.min
v.min-abs-diff = min(u.min-abs-diff, v.value - u.max)
else:
v.min = v.value
v.min-abs-diff = ∞
if v has a right child w:
v.max = w.max
v.min-abs-diff = min(v.min-abs-diff, w.min - v.value, w.min-abs-diff)
else:
v.max = v.value
insert and delete work as usual, except that the decorations need to be updated along the traversal path. The total time is still O(log n) for reasonable container choices.
min-abs-diff is implemented by returning root.min-abs-diff where root is the root of the tree.

EDIT #2: My answer determines the smallest difference between any two adjacent values in a sequence, not the smallest difference between any two values in the sequence.
When you say that you have a lot of intervals to check, do you happen to mean that you have to perform checks of many intervals over the same sequence of numbers? If so, what if you just pre-computed the differences from one number to the next? E.g., in Python:
elements = [2, 1, 8, 5, 11]
def get_differences(sequence):
"""Yield absolute differences between each pair of items in the sequence"""
it = iter(sequence)
sentinel = object()
previous = next(it, sentinel)
if previous is sentinel:
return ()
for current in it:
yield abs(previous - current)
previous = current
differences = list(get_differences(elements)) # differences = [1, 7, 3, 6]
Then when you have to find the minimum difference, just return min(differences[start_index:stop_index-1].
EDIT: I missed your paragraph:
Now this is easy if you have to look at one interval you sort the interval and then compare i-th element with i+1-th and store the minimum difference for each i.
But I still think what I'm saying makes sense; you don't have to sort the entire collection but you still need to do an O(n) operation. If you're dealing with numeric values on a platform where the numbers can be represented as machine integers or floats, then as long as you use an array-like container, this should be cache friendly and relatively efficient. If you happen to have repeated queries, you might be able to do some memoization to cache pre-computed results.

Related

Maximize minimum distance between arrays

Lets say that you are given n sorted arrays of numbers and you need to pick one number from each array such that the minimum distance between the n chosen elements is maximized.
Example:
arrays:
[0, 500]
[100, 350]
[200]
2<=n<=10 and every array could have ~10^3-10^4 elements.
In this example the optimal solution to maximize minimum distance is pick numbers: 500, 350, 200 or 0, 200, 350 where min distance is 150 and is the maximum possible of every combination.
I am looking for an algorithm to solve this. I know that I could binary search the max min distance but I can't see how to decide is there is a solution with max min distance of at least d, in order for the binary search to work. I am thinking maybe dynamic programming could help but haven't managed to find a solution with dp.
Of course generating all combination with n elements is not efficient. I have already tried backtracking but it is slow since it tries every combination.
n ≤ 10 suggests that we can take an exponential dependence on n. Here's
an O(2n m n)-time algorithm where m is the total size of the
arrays.
The dynamic programming approach I have in mind is, for each subset of
arrays, calculate all of the pairs (maximum number, minimum distance) on
the efficient frontier, where we have to choose one number from each of
the arrays in the subset. By efficient frontier I mean that if we have
two pairs (a, b) ≠ (c, d) with a ≤ c and b ≥ d, then (c, d) is not on
the efficient frontier. We'll want to keep these frontiers sorted for
fast merges.
The base case with the empty subset is easy: there's one pair, (minimum
distance = ∞, maximum number = −∞).
For every nonempty subset of arrays in some order that extends the
inclusion order, we compute a frontier for each array in the subset,
representing the subset of solutions where that array contributes the
maximum number. Then we merge these frontiers. (Naively this costs us
another factor of log n, which maybe isn't worth the hassle to avoid
given that n ≤ 10, but we can avoid it by merging the arrays once at the
beginning to enable future merges to use bucketing.)
To construct a new frontier from a subset of arrays and another array
also involves a merge. We initialize an iterator at the start of the
frontier (i.e., least maximum number) and an iterator at the start of
the array (i.e., least number). While neither iterator is past the end,
Emit a candidate pair (min(minimum distance, array number − maximum
number), array number).
If the min was less than or equal to minimum distance, increment the
frontier iterator. If the min was less than or equal to array number
− maximum number, increment the array iterator.
Cull the candidate pairs to leave only the efficient frontier. There is
an elegant way to do this in code that is more trouble to explain.
I am going to give an algorithm that for a given distance d, will output whether it is possible to make a selection where the distance between any pair of chosen numbers is at least d. Then, you can binary-search the maximum d for which the algorithm outputs "YES", in order to find the answer to your problem.
Assume the minimum distance d be given. Here is the algorithm:
for every permutation p of size n do:
last := -infinity
ok := true
for p_i in p do:
x := take the smallest element greater than or equal to last+d in the p_i^th array (can be done efficiently with binary search).
if no such x was found; then
ok = false
break
end
last = x
done
if ok; then
return "YES"
end
done
return "NO"
So, we brute-force the order of arrays. Then, for every possible order, we use a greedy method to choose elements from each array, following the order. For example, take the example you gave:
arrays:
[0, 500]
[100, 350]
[200]
and assume d = 150. For the permutation 1 3 2, we first take 0 from the 1st array, then we find the smallest element in the 3rd array that is greater than or equal to 0+150 (it is 200), then we find the smallest element in the 2nd array which is greater than or equal to 200+150 (it is 350). Since we could find an element from every array, the algorithm outputs "YES". But for d = 200 for instance, the algorithm would output "NO" because none of the possible orderings would result in a successful selection.
The complexity for the above algorithm is O(n! * n * log(m)) where m is the maximum number of elements in an array. I believe it would be sufficient, since n is very small. (For m = 10^4, 10! * 10 * 13 ~ 5*10^8. It can be computed under a second on a modern CPU.)
Lets look at an example with optimal choices, x (horizontal arrays A, B, C, D):
A x
B b x b
C x c
D d x
Our recurrence based on range could be: let f(low, excluded) represent the maximum closest distance between two chosen elements (from arrays 1 to n) of the subset without elements in excluded, where low is the lowest chosen element. Then:
(1)
f(low, excluded) when |excluded| = n-1:
max(low)
for low in the only permitted array
(2)
f(low, excluded):
max(
min(
a - low,
f(a, excluded')
)
)
for a ≥ low, a not in excluded'
where excluded' = excluded ∪ {low's array}
We can limit a. For one thing the maximum we can achieve is
(3)
m = (highest - low) / (n - |excluded| - 1)
which means a need not go higher than low + m.
Secondly, we can store results for all f(a, excluded'), keyed by excluded' (we have 2^10 possible keys), each in a decorated binary tree ordered by a. The decoration will be the highest result achievable in the right subtree, meaning we can find the max for all f(v, excluded'), v ≥ a in logarithmic time.
The latter establishes a dominance relationship and clearly we are intetested in both a larger a and a larger f(a, excluded') so as to maximise the min function in (2). Picking an a in the middle, we can use a binary search. If we have:
a - low < max(v, excluded'), v ≥ a
where max(v, excluded') is the lookup
for a in the decorated tree
then we look to the right since max(v, excluded) indicates there's a better answer on the right, where a - low is also larger.
And if we have:
a - low ≥ max(v, excluded), v ≥ a
then we record this candidate and look to the left since to the right, the answer is fixed at max(v, excluded), given that a - low could not decrease.
In order to conduct the binary search on the range, [low, low + m] (see (3)), rather than merge and label all the arrays at the outset, we can keep them separate and compare the closest candidates to mid out of each array we are currently permitted to choose a from. (The trees have the mixed results, keyed by subset.) (The flow of this part is not completely clear to me.)
Worst case with this method, given that n = C is constant seems to be
O(C * array_length * 2^C * C * log(array_length) * log(C * array_length))
C * array_length is the iteration on low
Each low can be paired with 2^C inclusions
C * log(array_length) is the separated binary-search
And log(C * array_length) is the tree lookup
Simplifying:
= O(array_length * log^2(array_length))
although in practice, there could be many dead-end branches that exit early where a full selection wouldn't be possible.
In case, it wasn't clear, the iteration is on a fixed lowest element in the selection. In other words, we want the best f(low, excluded) for all different lows (and excludeds). For bottom-up, we would iterate from the highest value down so our results for a get stored as we iterate.

Find the largest subarray with all ones

Given a binary array (element is 0 or 1), I need to find the maximum length of sub array having all ones for given range(l and r) in the array.
I know the O(n) approach to find such sub array but if there are O(n) queries the overall complexity becomes O(n^2).
I know that segment tree is used for such type of problems but I cant figure out how to build tree for this problem.
Can I build a segment tree using which I can answer queries in log(n) time such that for O(n) queries overall complexity will become O(nlog(n)).
Let A be your binary array.
Build two array IL and IR:
- IL contains, in order, every i such that A[i] = 1 and (i = 0 or A[i-1] = 0);
- IR contains, in order, every i such that A[i-1] = 1 and (i = N or A[i] = 0).
In other words, for any i, the range defined by IL[i] inclusive and IR[i] non-inclusive corresponds to a sequence of 1s in A.
Now, for any query {L, R} (for the range [L; R] inclusive), let S = 0. Traverse both IL and IR with i, until IL[i] >= L. At this point, if IR[i-1] > L, set S = IR[i-1]-L. Continue traversing IL and IR, setting S = max(S, IR[i]-IL[i]), until IR[i] > R. Finally, if IL[i] <= R, set S = max(S, R-IL[i]).
S is now the size of the greatest sequence of 1s in A between L and R.
The complexity of building IL and IR is O(N), and the complexity of answering a query is O(M), with M the length of IL or IR.
Yes, you can use a segment tree to solve this problem.
Let's try to think what that tree must look like. Obviously, every node must contain the length of max subarray of 1s and 0s in that range.
Now, how do we join two nodes into a bigger one. In other words, you have a node representing [low, mid) and a node representing [mid, high). You have to obtain max subarray for [low, high).
First things first, max for whole will at least be max for parts. So we have to take the maximum among the left and right values.
But what if the real max subarray overlaps both nodes? Well, then it must be the rightmost part of left node and leftmost part of right node. So, we need to keep track of longest subarray at start and end as well.
Now, how to update these left and rightmost subarray lengths? Well, leftmost of parent node must be leftmost of left child, unless leftmost of left child spans the entire left node. In that case, leftmost of parent node will be leftmost of left + leftmost of right node.
A similar rule applies to tracking the rightmost subarray of 1s.
And we're finished. Here's the final rules in pseudo code.
max_sub[parent] = max(max_sub[left], max_sub[right], right_sub[left] + left_sub[right])
left_sub[parent] = left_sub[left] if left_sub[left] < length[left] else left_sub[left] + left_sub[right]
right_sub[parent] = right_sub[right] if right_sub[right] < length[right] else right_sub[right] + right_sub[left]
Note that you will need to take similar steps when finding the result for a range.
Here's an example tree for the array [0, 1, 1, 0, 1, 1, 1, 0].

Finding longest overlapping interval pair

Say I have a list of n integral intervals [a,b] each representing set S = {a, a+1, ...b}. An overlap is defined as |S_1 \cap S_2|. Example: [3,6] and [5,9] overlap on [5,6] so the length of that is 2. The task is to find two intervals with the longest overlap in Little-O(n^2) using just recursion and not dynamic programming.
Naive approach is obviously brute force, which does not hold with time complexity condition. I was also unsuccessful trying sweep line algo and/or Longest common subsequence algorithm.
I just cannot find a way of dividing it into subproblems. Any ideas would be appreciated.
Also found this, which in my opinion does not work at all:
Finding “maximum” overlapping interval pair in O(nlog(n))
Here is an approach that takes N log(N) time.
Breakdown every interval [a,b] [c,d] into an array of pair like this:
pair<a,-1>
pair<b,a>
pair<c,-1>
pair<d,c>
sort these pairs in increasing order. Since interval starts are marked as -1, in case of ties interval they should come ahead of interval ends.
for i = 0 to end of the pair array
if current pair represents interval start
put it in a multiset
else
remove the interval start corresponding to this interval end from the multiset.
if the multiset is not empty
update the maxOverlap with (current_interval_end - max(minimum_value_in_multiset,start_value_of_current_interval)+1)
This approach should update the maxOverlap to the highest possible value.
Keep info about the two largest overlapping intervals max1 and max2 (empty in the beginning).
Sort the input list [x1, y1] .. [xn, yn] = I1..In by the value x, discarding the shorter of two intervals if equality is encountered. While throwing intervals out, keep max1 and max2 updated.
For each interval, add an attribute max in linear time, showing the largest y value of all preceding intervals (in sorted list):
rollmax = −∞
for j = 1..n do
Ij.max = rollmax
rollmax = max(rollmax, Ij.y)
On sorted, filtered, and expanded input list perform the following query. It uses an ever expanding sublist of intervals smaller then currently searched for interval Ii as input into recursive function SearchOverlap.
for i = 2..n do
SearchOverlap(Ii, 1, i − 1)
return {max1, max2}
Function SearchOverlap uses divide and conquer approach to traverse the sorted list Il, .. Ir. It imagines such list as a complete binary tree, with interval Ic as its local root. The test Ic.max < I.max is used to always decide to traverse the binary tree (go left/right) in direction of interval with largest overlap with I. Note, that I is the queried for interval, which is compared to log(n) other intervals. Also note, that the largest possible overlapping interval might be passed in such traversal, hence the check for largest overlap in the beginning of function SearchOverlap.
SearchOverlap(I , l, r)
c = ceil(Avg(l, r)) // Central element of queried list
if Overlap(max1, max2) < Overlap(I , Ic) then
max1 = I
max2 = Ic
if l ≥ r then
return
if Ic.max < I.max then
SearchOverlap(I , c + 1, r)
else
SearchOverlap(I , l, c − 1)
return
Largest overlapping intervals (if not empty) are returned at the end. Total complexity is O(n log(n)).

Maximum of all possible subarrays of an array

How do I find/store maximum/minimum of all possible non-empty sub-arrays of an array of length n?
I generated the segment tree of the array and the for each possible sub array if did query into segment tree but that's not efficient. How do I do it in O(n)?
P.S n <= 10 ^7
For eg. arr[]= { 1, 2, 3 }; // the array need not to be sorted
sub-array min max
{1} 1 1
{2} 2 2
{3} 3 3
{1,2} 1 2
{2,3} 2 3
{1,2,3} 1 3
I don't think it is possible to store all those values in O(n). But it is pretty easy to create, in O(n), a structure that makes possible to answer, in O(1) the query "how many subsets are there where A[i] is the maximum element".
Naïve version:
Think about the naïve strategy: to know how many such subsets are there for some A[i], you could employ a simple O(n) algorithm that counts how many elements to the left and to the right of the array that are less than A[i]. Let's say:
A = [... 10 1 1 1 5 1 1 10 ...]
This 5 up has 3 elements to the left and 2 to the right lesser than it. From this we know there are 4*3=12 subarrays for which that very 5 is the maximum. 4*3 because there are 0..3 subarrays to the left and 0..2 to the right.
Optimized version:
This naïve version of the check would take O(n) operations for each element, so O(n^2) after all. Wouldn't it be nice if we could compute all these lengths in O(n) in a single pass?
Luckily there is a simple algorithm for that. Just use a stack. Traverse the array normally (from left to right). Put every element index in the stack. But before putting it, remove all the indexes whose value are lesser than the current value. The remaining index before the current one is the nearest larger element.
To find the same values at the right, just traverse the array backwards.
Here's a sample Python proof-of-concept that shows this algorithm in action. I implemented also the naïve version so we can cross-check the result from the optimized version:
from random import choice
from collections import defaultdict, deque
def make_bounds(A, fallback, arange, op):
stack = deque()
bound = [fallback] * len(A)
for i in arange:
while stack and op(A[stack[-1]], A[i]):
stack.pop()
if stack:
bound[i] = stack[-1]
stack.append(i)
return bound
def optimized_version(A):
T = zip(make_bounds(A, -1, xrange(len(A)), lambda x, y: x<=y),
make_bounds(A, len(A), reversed(xrange(len(A))), lambda x, y: x<y))
answer = defaultdict(lambda: 0)
for i, x in enumerate(A):
left, right = T[i]
answer[x] += (i-left) * (right-i)
return dict(answer)
def naive_version(A):
answer = defaultdict(lambda: 0)
for i, x in enumerate(A):
left = next((j for j in range(i-1, -1, -1) if A[j]>A[i]), -1)
right = next((j for j in range(i+1, len(A)) if A[j]>=A[i]), len(A))
answer[x] += (i-left) * (right-i)
return dict(answer)
A = [choice(xrange(32)) for i in xrange(8)]
MA1 = naive_version(A)
MA2 = optimized_version(A)
print 'Array: ', A
print 'Naive: ', MA1
print 'Optimized:', MA2
print 'OK: ', MA1 == MA2
I don't think it is possible to it directly in O(n) time: you need to iterate over all the elements of the subarrays, and you have n of them. Unless the subarrays are sorted.
You could, on the other hand, when initialising the subarrays, instead of making them normal arrays, you could build heaps, specifically min heaps when you want to find the minimum and max heaps when you want to find the maximum.
Building a heap is a linear time operation, and retrieving the maximum and minimum respectively for a max heap and min heap is a constant time operation, since those elements are found at the first place of the heap.
Heaps can be easily implemented just using a normal array.
Check this article on Wikipedia about binary heaps: https://en.wikipedia.org/wiki/Binary_heap.
I do not understand what exactly you mean by maximum of sub-arrays, so I will assume you are asking for one of the following
The subarray of maximum/minimum length or some other criteria (in which case the problem will reduce to finding max element in a 1 dimensional array)
The maximum elements of all your sub-arrays either in the context of one sub-array or in the context of the entire super-array
Problem 1 can be solved by simply iterating your super-array and storing a reference to the largest element. Or building a heap as nbro had said. Problem 2 also has a similar solution. However a linear scan is through n arrays of length m is not going to be linear. So you will have to keep your class invariants such that the maximum/minimum is known after every operation. Maybe with the help of some data structure like a heap.
Assuming you mean contiguous sub-arrays, create the array of partial sums where Yi = SUM(i=0..i)Xi, so from 1,4,2,3 create 0,1,1+4=5,1+4+2=7,1+4+2+3=10. You can create this from left to right in linear time, and the value of any contiguous subarray is one partial sum subtracted from another, so 4+2+3 = 1+4+2+3 - 1= 9.
Then scan through the partial sums from left to right, keeping track of the smallest value seen so far (including the initial zero). At each point subtract this from the current value and keep track of the highest value produced in this way. This should give you the value of the contiguous sub-array with largest sum, and you can keep index information, too, to find where this sub-array starts and ends.
To find the minimum, either change the above slightly or just reverse the sign of all the numbers and do exactly the same thing again: min(a, b) = -max(-a, -b)
I think the question you are asking is to find the Maximum of a subarry.
bleow is the code that cand do that in O(n) time.
int maxSumSubArr(vector<int> a)
{
int maxsum = *max_element(a.begin(), a.end());
if(maxsum < 0) return maxsum;
int sum = 0;
for(int i = 0; i< a.size; i++)
{
sum += a[i];
if(sum > maxsum)maxsum = sum;
if(sum < 0) sum = 0;
}
return maxsum;
}
Note: This code is not tested please add comments if found some issues.

Finding consecutively increasing subsequence stored in a haphazard manner

The problem is this, given an array of length say N, we have to find all subsequences of length W such that those W elements, when sorted, forms an arithmetic progression with interval 1. So for an array like [1,4,6,3,5,2,7,9], and W as 5, the slice [4,6,3,5,2] can be regarded as one such subsequence, since, when sorted, it yields [2,3,4,5,6], an A.P with common difference 1.
The immediate solution which comes to the mind is to have a sliding window, for each new element, pop the old one, push the new one, sort the window, and if for that window, window[w-1] - window[0] + 1 = w, then it is such a subsequence. However, it takes O(NlogN) time, whereas the solution at Codechef proposes a O(N) time algorithm that uses double-ended queue. I am having difficulty in understanding the algorithm, what is being pushed and popped, and why so, and how it maintains the window in sorted order without the need to resort with each new element. Can anybody explain it?
You are correct in observing that a segment is valid if max(segment) - min(segment) + 1 = W. So, the problem reduces to finding the min and max of all length W segments in O(N).
For this, we can use a deque D. Suppose we want to find the min. We will store the indexes of elements in D, assuming 0-based indexing. Let A be the original array.
for i = 0 to N - 1:
if D.first() == i - W:
D.popFirst() <- this means that the element is too old,
so we no longer care about it
while not D.empty() and A[ D.last() ] >= A[i]:
D.popLast()
D.pushBack(i)
For each i, this will give you the minimum in [i - W + 1, i] as the element at index D.first().
popFirst() removes the first element from D. We have to do this when the first element in D is more than W steps away from i, because it will not contribute to the minimum in the interval above.
popLast() removes the last element from D. We do this to maintain the sorted order: if the last element in D is the index of an element larger than A[i], then adding i at the end of D would break the order. So we have to keep removing the last element to ensure that D stays sorted.
pushBack() adds an element at the end of D. After adding it, D will definitely remain sorted.
This is O(1) (to find a min, the above pseudocode is O(n)) because each element will be pushed and popped to / from D at most once.
This works because D will always be a sliding window of indexes sorted by their associated value in A. When we are at an element that would break this order, we can pop elements from D (the sliding window) until the order is restored. Since the new element is smaller than those we are popping, there is no way those can contribute to a solution.
Note that you can implement this even without the methods I used by keeping two pointers associated with D: start and end. Then make D an array of length N and you are done.

Resources