Longest Length sub array with elements in a given range - algorithm

If I have a list of integers, in an array, how do I find the length of the longest sub array, such that the difference between the minimum and maximum element of that array is less than a given integer, say M.
So if we had an array with 3 elements,
[1, 2, 4]
And if M were equal to 2
Then the longest subarry would be [1, 2]
Because if we included 4, and we started from the beginning, the difference would be 3, which is greater than M ( = 2), and if we started from 2, the difference between the largest (4) and smallest element (2) would be 2 and that is not less than 2 (M)
The best I can think of is to start from the left, then go as far right as possible without the sub array range getting too high. Of course at each step we have to keep track of the minimum and maximum element so far. This has an n squared time complexity though, can't we get it faster?

I have an improvement to David Winder's algorithm. The idea is that instead of using two heaps to find the minimum and maximum elements, we can use what I call the deque DP optimization trick (there's probably a proper name for this somewhere).
To understand this, we can look at a simpler problem: finding the minimum element in all subarrays of some size k in an array. The idea is that we keep a double-ended queue containing potential candidates for the minimum element. When we encounter a new element, we pop off all the elements at the back end of the queue more than or equal to the current element before pushing the current element into the back.
We can do this because we know that any subarray we encounter in the future which includes an element that we pop off will also include the current element, and since the current element is less than those elements that gets popped off, those elements will never be the minimum.
After pushing the current element, we pop off the front element in the queue if it is more than k elements away. The minimum element in the current subarray is simply the first element in the queue because the way we popped off the elements from the back of the queue kept it increasing.
To use this algorithm in your problem, we would have two deques to store the minimum and maximum elements. When we encounter a new element which is too much larger than the minimum element, we pop off the front of the deque until the element is no longer too large. The beginning of the longest array ending at that position is then the index of the last element we popped off plus 1.
This makes the solution O(n).
C++ implementation:
int best = std::numeric_limits<int>::lowest(), beg = 0;
//best = length of the longest subarray that meets the requirements so far
//beg = the beginning of the longest subarray ending at the current index
std::deque<int> least, greatest;
//these two deques store the indices of the elements which could cause trouble
for (int i = 0; i < n; i++)
{
while (!least.empty() && a[least.back()] >= a[i])
{
least.pop_back();
//we can pop this off since any we encounter subarray which includes this
//in the future will also include the current element
}
least.push_back(i);
while (!greatest.empty() && a[greatest.back()] <= a[i])
{
greatest.pop_back();
//we can pop this off since any we encounter subarray which includes this
//in the future will also include the current element
}
greatest.push_back(i);
while (a[least.front()] < a[i] - m)
{
beg = least.front() + 1;
least.pop_front();
//remove elements from the beginning if they are too small
}
while (a[greatest.front()] > a[i] + m)
{
beg = greatest.front() + 1;
greatest.pop_front();
//remove elements from the beginning if they are too large
}
best = std::max(best, i - beg + 1);
}

Consider the following idea:
Let create MaxLen array (size of n) which define as: MaxLen[i] = length of the max sub-array till the i-th place.
After we will fill this array it will be easy (O(n)) to find your max sub-array.
How do we fill the MaxLen array? Assume you know MaxLen[i], What will be in MaxLen[i+1]?
We have 2 option - if the number in originalArr[i+1] do not break your constrains of exceed diff of m in the longest sub-array ending at index i then MaxLen[i+1] = MaxLen[i] + 1 (because we just able to make our previous sub array little bit longer. In the other hand, if originalArr[i+1] bigger or smaller with diff m with one of the last sub array we need to find the element that has diff of m and (let call its index is k) and insert into MaxLen[i+1] = i - k + 1 because our new max sub array will have to exclude the originalArr[k] element.
How do we find this "bad" element? we will use Heap. After every element we pass we insert it value and index to both min and max heap (done in log(n)). When you have the i-th element and you want to check if there is someone in the previous last array who break your sequence you can start extract element from the heap until no element is bigger or smaller the originalArr[i] -> take the max index of the extract element and that your k - the index of the element who broke your sequence.
I will try to simplify with pseudo code (I only demonstrate for min-heap but it the same as the max heap)
Array is input array of size n
min-heap = new heap()
maxLen = array(n) // of size n
maxLen[0] = 1; //max subArray for original Array with size 1
min-heap.push(Array[0], 0)
for (i in (1,n)) {
if (Array[i] - min-heap.top < m) // then all good
maxLen[i] = maxLen[i-1] + 1
else {
maxIndex = min-heap.top.index;
while (Array[i] - min-heap.top.value > m)
maxIndex = max (maxIndex , min-heap.pop.index)
if (empty(min-heap))
maxIndex = i // all element are "bad" so need to start new sub-array
break
//max index is our k ->
maxLen[i] = i - k + 1
}
min-heap.push(Array[i], i)
When you done, run on your max length array and choose the max value (from his index you can extract the begin an end indexes of the original array).
So we had loop over the array (n) and in each insert to 2 heaps (log n).
You would probably saying: Hi! But you also had un-know times of heap extract which force heapify (log n)! But notice that this heap can have max of n element and element can be extract twice so calculate accumolate complecsity and you will see its still o(1).
So bottom line: O(n*logn).
Edited:
This solution can be simplify by using AVL tree instead of 2 heaps - finding min and max are both O(logn) in AVL tree - same goes for insert, find and delete - so just use tree with element of the value and there index in the original array.
Edited 2:
#Fei Xiang even came up with better solution of O(n) using deques.

Related

Algorithm for searching for a key in 1,000,000,000 elements with the key being in the first n indices without n being specified beforehand

The question is as follows:
Suppose you are given a very large integer array A with 1,000,000,000 elements, with elements sorted in descending order. However, only the first n elements contain data which are positive integers, but the value of n is unknown. The rest of the array elements contain zeroes.
Give the algorithm for a method search(A,k) to search for a key k in the array A. It returns the index of the array where k is found, or -1 otherwise. Your algorithm must run in worst case O(logn) time.
You may use binarySearch(A, k, left, right) that searches for k in A[left] through A[right] (assuming left to right is sorted in descending order).
The approaches that I have thought of so far:
Use a for loop to iterate from index 0 till the index containing the first 0 and compare against k. This takes O(n) time so doesn't fit the time limit.
Binary search on A itself. This takes O(log 1,000,000,000) time and exceeds the time limit.
I am kind of stuck here and unable to think of any other approaches.
What would be an approach that runs in worst case O(logn)?
Hi this approach is based on a modified binary search
Start at first index if matches return
Double the index untill value is found or you the end
a) if Value is greater than k continue
b) else if value is less than k BinarySearch(index/2, index, k)
So basically what we did is we narrowed down our search space by jumping forward, Like previous answer by nice_dev; But While Jumping we also checked if our value is in specific window if it is we stop jumping and binary search the last window
The idea here will be to first find the value of n. We can do a jump similar to concept of jump arrays like below:
Start from index 1(1-based indexing).
If current value is not zero, jump to index * 2.
Repeat step 2 if current index value is not zero.
If current index value is zero, iterate one by one from last non zero index to find value of n.
Pseudocode:
index = 1
prev = 1
while index <= arr.length and arr[index - 1] != 0:
prev = index
index = index * 2
while prev <= arr.length and arr[prev - 1] != 0
prev++
The below linear run will be minimal as you have reached the last value of n in log n
time.
If binarySearch routine is already provided, then you can just do binarySearch(A, k, 0, prev - 1) to get the final answer.

Find majority element when the values are unknown

Suppose I have an array of elements.
I cannot read the values of the elements. I can only compare any two elements from the array to know whether they are the same or not, but even then I don't get to know their actual values.
Suppose this array has a majority of elements of the same value. I need to find and return any of the majority elements. How would I do it?
We have to be be able to do it in a big thet.of n l0g n.
Keep track of two indices, i & j. Initialize i=0, j=1. Repeatedly compare arr[i] to arr[j].
if arr[i] == arr[j], increment j.
if arr[i] != arr[j]
eliminate both from the array
increment i to the next index that hasn't been eliminated.
increment j to the next index >i that hasn't been eliminated.
The elimination operation will eliminate at least one non-majority element each time it eliminates a majority element, so majority is preserved. When you've gone through the array, all elements not eliminated will be in the majority, and you're guaranteed at least one.
This is O(n) time, but also O(n) space to keep track of eliminations.
Given:
an implicit array a of length n, which is known to have a majority element
an oracle function f, such that f(i, j) = a[i] == a[j]
Asked:
Return an index i, such that a[i] is a majority element of a.
Main observation:
If
m is a majority element of a, and
for some even k < n each element of a[0, k) occurs at most k / 2 times
then m is a majority element of a[k, n).
We can use that observation by assuming that the first element is the majority element. We move through the array until we reach a point where that element occurred exactly half the time. Then we discard the prefix and continue again from that point on. This is exactly what the Boyer-Moore algorithm does, as pointed out by Rici in the comments.
In code:
result = 0 // index where the majority element is
count = 0 // the number of times we've seen that element in the current prefix
for i = 0; i < n; i++ {
// we've seen the current majority candidate exactly half of the time:
// discard the current prefix and start over
if (count == 0) {
result = i
}
// keep track of how many times we've seen the current majority candidate in the prefix
if (f(result, i)) {
count++
} else {
count--
}
}
return result
For completeness: this algorithm uses two variables and a single loop, so it runs in O(n) time and O(1) space.
Assuming you can determine if elements are <, >, or == what you can do is go through the list and build a tree. The trees values will be like buckets, the item and count of how many you've seen. When you come by a node where you get == then just increment the count. Then at the end go through the tree and find the one with the highest count.
Assuming you build a balanced tree, this should be O(n log n). Red Black trees might help with making a balanced tree. Else you could build the tree by adding randomly selected elements and this would give you O(n log n) on average.

Maximum sum in array with special conditions

Assume we have an array with n Elements ( n%3 = 0).In each step, a number is taken from the array. Either you take the leftmost or the rightmost one. If you choose the left one, this element is added to the sum and the two right numbers are removed and vice versa.
Example: A = [100,4,2,150,1,1], sum = 0.
take the leftmost element. A = [4,2,150] sum = 0+100 =100
2.take the rightmost element. A = [] sum = 100+150 = 250
So the result for A should be 250 and the sequence would be Left, Right.
How can I calculate the maximum sum I can get in an array? And how can I determine the sequence in which I have to extract the elements?
I guess this problem can best be solved with dynamic programming and the concrete sequence can then be determined by backtracking.
The underlying problem can be solved via dynamic programming as follows. The state space can be defined by letting
M(i,j) := maximum value attainable by chosing from the subarray of
A starting at index i and ending at index j
for any i, j in {1, N} where `N` is the number of elements
in the input.
where the recurrence relation is as follows.
M(i,j) = max { M(i+1, j-2) + A[i], M(i+2, j-1) + A[j] }
Here, the first value corresponds to the choice of adding the beginning of the array while the second value connesponds to the choice of subtracting the end of the array. The base cases are the states of value 0 where i=j.

Finding the sum of maximum difference possible from all subintervals of a given array

I have a problem designing an algorithm. The problem is that this should be executed in O(n) time.
Here is the assignment:
There is an unsorted array "a" with n numbers.
mij=min{ai, ai+1, ..., aj}, Mij=max{ai, ai+1, ..., aj}
Calculate:
S=SUM[i=1,n] { SUM[j=i,n] { (Mij - mij) } }
I am able to solve this in O(nlogn) time. This is a university research assignment. Everything that I tried suggests that this is not possible. I would be very thankful if you could point me in the right direction where to find the solution. Or at least prove that this is not possible.
Further explanation:
Given i and j, find the maximum and minimum elements of the array slice a[i:j]. Subtract those to get the range of the slice, a[max]-a[min].
Now, add up the ranges of all slices for all (i, j) such that 1 <= i <= j <= n. Do it in O(n) time.
This is pretty straight forward problem.
I will assume that it is array of objects (like pair of values or tuples) not numbers. First value is index in array and the second is value.
Right question here is how many time we need to multiply each number and add/subtract it from the sum ie in how many sub-sequences it is maximum and minimum element.
This problem is connected to finding next greatest element (nge), you can see here, just to know it for future problems.
I will write it in pseudo code.
subsum (A):
returnSum = 0
//i am pushing object into the stack. Firt value is index in array, secong is value
lastStackObject.push(-1, Integer.MAX_INT)
for (int i=1; i<n; i++)
next = stack.pop()
stack.push(next)
while (next.value < A[i].value)
last = stack.pop()
beforeLast = stack.peek()
retrunSum = returnSum + last.value*(i-last.index)*(last.index-beforeLast.index)
stack.push(A[i])
while stack is not empty:
last = stack.pop()
beforeLast = stack.peek()
retrunSum = returnSum + last.value*(A.length-last.index)*(last.index-beforeLast.index)
return returnSum
sum(A)
//first we calculate sum of maximum values in subarray, and then sum of minimum values. This is done by simply multiply each value in array by -1
retrun subsum(A)+subsum(-1 for x in A.value)
Time complexity of this code is O(n).
Peek function is just to read next value in stack without popping it.

Maximum of all possible subarrays of an array

How do I find/store maximum/minimum of all possible non-empty sub-arrays of an array of length n?
I generated the segment tree of the array and the for each possible sub array if did query into segment tree but that's not efficient. How do I do it in O(n)?
P.S n <= 10 ^7
For eg. arr[]= { 1, 2, 3 }; // the array need not to be sorted
sub-array min max
{1} 1 1
{2} 2 2
{3} 3 3
{1,2} 1 2
{2,3} 2 3
{1,2,3} 1 3
I don't think it is possible to store all those values in O(n). But it is pretty easy to create, in O(n), a structure that makes possible to answer, in O(1) the query "how many subsets are there where A[i] is the maximum element".
Naïve version:
Think about the naïve strategy: to know how many such subsets are there for some A[i], you could employ a simple O(n) algorithm that counts how many elements to the left and to the right of the array that are less than A[i]. Let's say:
A = [... 10 1 1 1 5 1 1 10 ...]
This 5 up has 3 elements to the left and 2 to the right lesser than it. From this we know there are 4*3=12 subarrays for which that very 5 is the maximum. 4*3 because there are 0..3 subarrays to the left and 0..2 to the right.
Optimized version:
This naïve version of the check would take O(n) operations for each element, so O(n^2) after all. Wouldn't it be nice if we could compute all these lengths in O(n) in a single pass?
Luckily there is a simple algorithm for that. Just use a stack. Traverse the array normally (from left to right). Put every element index in the stack. But before putting it, remove all the indexes whose value are lesser than the current value. The remaining index before the current one is the nearest larger element.
To find the same values at the right, just traverse the array backwards.
Here's a sample Python proof-of-concept that shows this algorithm in action. I implemented also the naïve version so we can cross-check the result from the optimized version:
from random import choice
from collections import defaultdict, deque
def make_bounds(A, fallback, arange, op):
stack = deque()
bound = [fallback] * len(A)
for i in arange:
while stack and op(A[stack[-1]], A[i]):
stack.pop()
if stack:
bound[i] = stack[-1]
stack.append(i)
return bound
def optimized_version(A):
T = zip(make_bounds(A, -1, xrange(len(A)), lambda x, y: x<=y),
make_bounds(A, len(A), reversed(xrange(len(A))), lambda x, y: x<y))
answer = defaultdict(lambda: 0)
for i, x in enumerate(A):
left, right = T[i]
answer[x] += (i-left) * (right-i)
return dict(answer)
def naive_version(A):
answer = defaultdict(lambda: 0)
for i, x in enumerate(A):
left = next((j for j in range(i-1, -1, -1) if A[j]>A[i]), -1)
right = next((j for j in range(i+1, len(A)) if A[j]>=A[i]), len(A))
answer[x] += (i-left) * (right-i)
return dict(answer)
A = [choice(xrange(32)) for i in xrange(8)]
MA1 = naive_version(A)
MA2 = optimized_version(A)
print 'Array: ', A
print 'Naive: ', MA1
print 'Optimized:', MA2
print 'OK: ', MA1 == MA2
I don't think it is possible to it directly in O(n) time: you need to iterate over all the elements of the subarrays, and you have n of them. Unless the subarrays are sorted.
You could, on the other hand, when initialising the subarrays, instead of making them normal arrays, you could build heaps, specifically min heaps when you want to find the minimum and max heaps when you want to find the maximum.
Building a heap is a linear time operation, and retrieving the maximum and minimum respectively for a max heap and min heap is a constant time operation, since those elements are found at the first place of the heap.
Heaps can be easily implemented just using a normal array.
Check this article on Wikipedia about binary heaps: https://en.wikipedia.org/wiki/Binary_heap.
I do not understand what exactly you mean by maximum of sub-arrays, so I will assume you are asking for one of the following
The subarray of maximum/minimum length or some other criteria (in which case the problem will reduce to finding max element in a 1 dimensional array)
The maximum elements of all your sub-arrays either in the context of one sub-array or in the context of the entire super-array
Problem 1 can be solved by simply iterating your super-array and storing a reference to the largest element. Or building a heap as nbro had said. Problem 2 also has a similar solution. However a linear scan is through n arrays of length m is not going to be linear. So you will have to keep your class invariants such that the maximum/minimum is known after every operation. Maybe with the help of some data structure like a heap.
Assuming you mean contiguous sub-arrays, create the array of partial sums where Yi = SUM(i=0..i)Xi, so from 1,4,2,3 create 0,1,1+4=5,1+4+2=7,1+4+2+3=10. You can create this from left to right in linear time, and the value of any contiguous subarray is one partial sum subtracted from another, so 4+2+3 = 1+4+2+3 - 1= 9.
Then scan through the partial sums from left to right, keeping track of the smallest value seen so far (including the initial zero). At each point subtract this from the current value and keep track of the highest value produced in this way. This should give you the value of the contiguous sub-array with largest sum, and you can keep index information, too, to find where this sub-array starts and ends.
To find the minimum, either change the above slightly or just reverse the sign of all the numbers and do exactly the same thing again: min(a, b) = -max(-a, -b)
I think the question you are asking is to find the Maximum of a subarry.
bleow is the code that cand do that in O(n) time.
int maxSumSubArr(vector<int> a)
{
int maxsum = *max_element(a.begin(), a.end());
if(maxsum < 0) return maxsum;
int sum = 0;
for(int i = 0; i< a.size; i++)
{
sum += a[i];
if(sum > maxsum)maxsum = sum;
if(sum < 0) sum = 0;
}
return maxsum;
}
Note: This code is not tested please add comments if found some issues.

Resources