How to make the run time of the program Θ(n) - algorithm

The requirement is that the input is a set of integers ranging from -5 to 5, and the result should be the longest subset of those integers whose total is greater than or equal to zero.
I can only come up with the following:
The input will be input[0 to n-1]
let start, longestStart, end, longestEnd, sum = 0
for i = 0 to n-1
    start = i
    sum = input[i]
    for j = i+1 to n-1
        sum = sum + input[j]
        if sum >= 0 then
            end = j
            if end - start > longestEnd - longestStart then
                longestStart = start
                longestEnd = end
However, this is Θ(n^2). I would like to know how to make this loop Θ(n).
Thank you

Since
a - b == (a + n) - (b + n)
for any a, b or n, we can apply this to the array of numbers, keeping a running total of all elements from 0 to current. From the above equation, the sum of any subarray from index a+1 to b is sum(elements 0-b) - sum(elements 0-a).
By keeping track of local minima and maxima of the running total, and the sums to them, you can find the longest qualifying subarray in one pass, i.e. O(n).
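One way to turn the running-total idea into code is sketched below in Python. The function name longest_nonneg_subarray and the stack-of-prefix-minima technique are assumptions chosen for illustration, not necessarily what this answer had in mind. The sketch makes a few linear passes over the prefix sums rather than a single one, and returns a half-open (start, end) index pair.

```python
def longest_nonneg_subarray(values):
    """Return (start, end) so that values[start:end] is a longest
    contiguous run whose sum is >= 0; (0, 0) if no such run exists."""
    # prefix[k] is the sum of the first k elements.
    prefix = [0]
    for v in values:
        prefix.append(prefix[-1] + v)

    # Only indices where the prefix sum reaches a new strict minimum
    # can be the best left endpoint.
    stack = []
    for i, p in enumerate(prefix):
        if not stack or p < prefix[stack[-1]]:
            stack.append(i)

    best = (0, 0)
    # Scan right endpoints from the far right; each left candidate is
    # popped once it has been matched with its farthest right endpoint.
    for j in range(len(prefix) - 1, -1, -1):
        while stack and prefix[stack[-1]] <= prefix[j]:
            i = stack.pop()
            if j - i > best[1] - best[0]:
                best = (i, j)
    return best


print(longest_nonneg_subarray([-1, 2, -3, 1, 1, -5]))  # (0, 5)
```

Every index is pushed and popped at most once, so the total work is linear in the input length.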

Related

Find scalar interval containing maximum elements from population A and zero elements from population B

Given two large sets A and B of scalar (floating point) values, what algorithm would you use to find the (scalar) range [x0,x1] containing zero elements from B and the maximum number of elements from A?
Is sorting complexity (O(n log n)) unavoidable?
Create a single list with all values, where each value is marked with two counts: one count that relates to set A, and another that relates to set B. Initially these counts are 1 and 0, when the value comes from set A, and 0 and 1 when the value comes from set B. So entries in this list could be tuples (value, countA, countB). This operation is O(n).
Sort these tuples. O(nlogn)
Merge tuples with duplicate values into one tuple, and accumulate the values in the corresponding counters, so that the tuple tells us how many times the value occurs in set A and how many times in set B. O(n)
Traverse this list in sorted order and maintain the largest sum of counts for countA of a series of adjacent tuples where countB is always 0, and the minimum and maximum value of that range. O(n)
The sorting is the determining factor of the time complexity: O(nlogn).
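The steps above could be sketched roughly as follows in Python. The function name best_interval and the (count, x0, x1) return shape are hypothetical, and collections.Counter stands in for the explicit (value, countA, countB) tuples, which merges duplicates before sorting instead of after.

```python
from collections import Counter


def best_interval(a_values, b_values):
    """Return (count, x0, x1): the closed interval [x0, x1] that holds the
    most elements of A and none of B, or None if A has no usable element."""
    count_a = Counter(a_values)
    count_b = Counter(b_values)

    best = None
    run_count = 0        # elements of A in the current B-free run
    run_lo = None        # smallest value in the current run
    for value in sorted(set(count_a) | set(count_b)):  # O(n log n)
        if count_b[value] > 0:
            # A value from B ends the current run.
            run_count = 0
            run_lo = None
            continue
        if run_count == 0:
            run_lo = value
        run_count += count_a[value]
        if best is None or run_count > best[0]:
            best = (run_count, run_lo, value)
    return best


print(best_interval([1.0, 2.0, 2.0, 5.0, 6.0, 7.0], [3.0, 8.0]))  # (3, 1.0, 2.0)
```

On ties this sketch keeps the first interval found; the traversal itself is linear, so the sort still dominates.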
Sort both A and B in O(|A| log |A| + |B| log |B|). Then apply the following algorithm, which has complexity O(|A| + |B|):
i = j = k = 0
best_interval = (0, 1)
while i < len(B) - 1:
    lo = B[i]
    hi = B[i+1]
    j = k  # We can skip ahead from last iteration.
    while j < len(A) and A[j] <= lo:
        j += 1
    k = j  # We can skip ahead from the above loop.
    while k < len(A) and A[k] < hi:
        k += 1
    if k - j > best_interval[1] - best_interval[0]:
        best_interval = (j, k)
    i += 1
x0 = A[best_interval[0]]
x1 = A[best_interval[1]-1]
It may look quadratic at first inspection, but note that we never decrease j or k; it really is just a linear scan with three pointers.

Bonetrousle | Find B distinct positive integers below K such that their sum is N or say that it is not possible. | Timeout error

I am attempting the Bonetrousle HackerRank challenge.
The problem is the following:
Find B distinct positive integers below K such that their sum is N or say that it is not possible.
Constraints:
n, k <= 10^18
b <= 10^5
You can check that a solution exists if the given N lies between the minimum(take first B elements) and maximum(take last B elements) possible sum.
From there on, I start with the minimum sum, and try to make it to N by assigning each element the maximum possible value without breaking the constraint. (no duplication, sum == N)
Below is the code I wrote.
def foo1(n,k,b):
    minSum = (b*(b+1))//2
    maxSum = (b)*(k-b+1+k)//2
    #maxSum = (k*(k+1))//2 - minSum
    #print(minSum, maxSum)
    if n>=minSum and n<=maxSum:
        minArr = [i for i in range(1,b+1)]
        minArr.reverse()
        sumA = sum(minArr)
        maxA = k
        for i in range(len(minArr)):
            tmp = minArr[i]
            minArr[i] = maxA
            sumA = sumA-tmp+minArr[i]
            while sumA > n:
                sumA -=1
                minArr[i] -= 1
            maxA = minArr[i]-1
            """
            while sumA+1 <= n and minArr[i]+1 <= k and minArr[i]+1 != maxA:
                #print(minArr, maxA)
                minArr[i]+=1
                sumA +=1
            maxA = minArr[i]
            if sumA == n:
                break
            """
    else:
        return [-1]
    return minArr
The code outputs correct solutions; however, it times out on HackerRank for 4 test cases. (sample n,b,k : 19999651, 20000000, 6324)
It gives an answer within 3 seconds on my machine for the same test case.
Initially I thought the issue was with the commented-out code, since I was trying to increment each array element 1-by-1 until the sum was reached. I modified the code to assign each element the maximum possible value and then decrement it if it breaks the constraints; however, it apparently did not help much.
Any suggestion on modifying the code to get it to pass the timing constraint or a much faster algorithm?
First, find the B largest consecutive integers with sum <= N. The problem is impossible if this sequence starts at an integer < 1 or ends at an integer > K.
The sum of B integers starting at x is B*(2x+B-1)/2, so just solve for x directly.
If you were to add one to each of the integers in the sequence starting at x, you'd get the next B consecutive integers, whose sum is > N, so you don't need to increment that many. Writing sum for the total of the sequence starting at x, just add 1 to the highest (N - sum) integers in the sequence to make the sum come out right. If that pushes the largest integer above K, the problem is also impossible.
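A minimal Python sketch of this construction follows. The function name bonetrousle and the [-1] failure value mirror the question's conventions and are otherwise assumptions. Feasibility is checked up front with the minimum and maximum possible sums from the question, which also covers the case where the bumped-up top element would exceed K.

```python
def bonetrousle(n, k, b):
    """Return b distinct integers in 1..k summing to n, or [-1]."""
    min_sum = b * (b + 1) // 2           # 1 + 2 + ... + b
    max_sum = b * (2 * k - b + 1) // 2   # (k-b+1) + ... + k
    if not (min_sum <= n <= max_sum):
        return [-1]

    # Largest x with b*x + b*(b-1)/2 <= n, solved directly.
    x = (n - b * (b - 1) // 2) // b
    base_sum = b * x + b * (b - 1) // 2
    extra = n - base_sum                 # 0 <= extra < b

    result = list(range(x, x + b))
    # Add 1 to the highest `extra` integers to reach n exactly.
    for idx in range(b - extra, b):
        result[idx] += 1
    return result


print(bonetrousle(14, 8, 3))  # [3, 5, 6]
```

Everything except building the output list is constant-time arithmetic, so the running time is O(B) regardless of how large N and K are.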

Find longest sequences with sufficient average score

I have a long list of scores between 0 and 1. How do I efficiently find all contiguous sublists longer than x elements such that the average score in each sublist is not less than y?
E.g., how do I find all contiguous sublists longer than 300 elements such that the average score of these sublists is not less than 0.8?
I'm mainly interested in the LONGEST sublists that fulfill these criteria, not actually all sublists. So I'm looking for all longest sublists.
If you want only the longest such substrings, this can be solved in O(n log n) time by transforming the problem slightly and then binary-searching over maximum solution lengths.
Let the input list of scores be x[1], ..., x[n]. Let's transform this list by subtracting y from each element, to form the list z[1], ..., z[n], whose elements may be positive or negative. Notice that any sublist x[i .. j] has average score at least y if and only if the sum of elements in the corresponding sublist in z (i.e., z[i] + z[i+1] + ... + z[j]) is at least 0. So, if we had a way to compute the maximum sum T of any sublist in z[] efficiently (spoiler: we do), this would, as a side effect, tell us if there is any sublist in x[] that has average score at least y: if T >= 0 then there is at least 1 such sublist, while if T < 0 then there is no sublist in x[] (not even a single-element sublist) that has average score at least y. But this doesn't yet give us all the information we need to answer your original question, since nothing forces the maximum-sum sublist in z to have maximum length: it could well be that a longer sublist exists that has lower overall average, while still having average at least y.
This can be addressed by generalising the problem of finding the sublist with maximum sum: instead of asking for a sublist with maximum sum overall, we will now ask for a sublist having maximum sum among all sublists having length at least some given k. I'll now describe an algorithm that, given a list of numbers z[1], ..., z[n], each of which can be positive or negative, and any positive integer k, will compute the maximum sum of any sublist of z[] having length at least k, as well as the location of a particular sublist that achieves this sum, and has longest possible length among all sublists having this sum. It's a slight generalisation of Kadane's algorithm.
FindMaxSumLongerThan(z[], k):
    v = 0   # Sum of the rightmost k numbers in the current sublist
    For i from 1 to k:
        v = v + z[i]
    best = v
    bestStart = 1
    bestEnd = k
    # Now for each i, with k+1 <= i <= n, find the biggest sum ending at position i.
    tail = -1   # Will contain the maximum sum among all lists ending at i-k
    tailLen = 0 # The length of the longest list having the above sum
    For i from k+1 to n:
        If tail >= 0:
            tail = tail + z[i-k]
            tailLen = tailLen + 1
        Else:
            tail = z[i-k]
            tailLen = 1
        If tail >= 0:
            nonnegTail = tail
            nonnegTailLen = tailLen
        Else:
            nonnegTail = 0
            nonnegTailLen = 0
        v = v + z[i] - z[i-k]   # Slide the window right 1 position
        If v + nonnegTail > best:
            best = v + nonnegTail
            bestStart = i - k - nonnegTailLen + 1
            bestEnd = i
The above algorithm takes O(n) time and O(1) space, returning the maximum sum in best and the beginning and ending positions of some sublist that achieves that sum in bestStart and bestEnd, respectively.
How is the above useful? For a given input list x[], suppose we first transform x[] into z[] by subtracting y from each element as described above; this will be the z[] passed into every call to FindMaxSumLongerThan(). We can view the value of best that results from calling the function with z[] and a given minimum sublist length k as a mathematical function of k: best(k). Since FindMaxSumLongerThan() finds the maximum sum of any sublist of z[] having length at least k, best(k) is a nonincreasing function of k. (Say we set k=5 and found that the maximum sum of any sublist is 42; then we are guaranteed to find a total of at least 42 if we try again with k=4 or k=3.) That means we can binary search on k to find the largest k such that best(k) >= 0: that k will then be the length of the longest sublist of x[] that has average value at least y. The resulting bestStart and bestEnd will identify a particular sublist having this property; it's easy to modify the algorithm to find all (at most n -- one per rightmost position) of these sublists without increasing the time complexity.
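The binary search could look roughly like this in Python. Note the swap: instead of the Kadane-style tail bookkeeping in the pseudocode, this sketch computes best(k) from prefix sums and a running minimum, which gives the same maximum sum but does not track bestStart and bestEnd. The names max_sum_at_least and longest_with_average are hypothetical.

```python
def max_sum_at_least(z, k):
    """Maximum sum over contiguous sublists of z with length >= k
    (assumes 1 <= k <= len(z))."""
    prefix = [0.0]
    for value in z:
        prefix.append(prefix[-1] + value)
    best = float("-inf")
    min_prefix = float("inf")
    for end in range(k, len(z) + 1):
        # Smallest prefix sum among start positions at least k back.
        min_prefix = min(min_prefix, prefix[end - k])
        best = max(best, prefix[end] - min_prefix)
    return best


def longest_with_average(scores, y):
    """Length of the longest contiguous sublist with average >= y
    (0 if there is none)."""
    z = [s - y for s in scores]
    lo, hi = 0, len(z)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if max_sum_at_least(z, mid) >= 0:
            lo = mid        # some sublist of length >= mid qualifies
        else:
            hi = mid - 1
    return lo


# Values chosen to be exact in binary floating point.
print(longest_with_average([0.0, 1.0, 0.5, 0.0, 0.0], 0.5))  # 3
```

Each probe costs O(n) and there are O(log n) probes. With arbitrary real-valued scores, floating-point rounding in the comparison against 0 may need a small tolerance.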
I think the general solution is always O(N^2). I will demonstrate some code in Python and some optimizations you can implement to increase the performance by several orders of magnitude.
Let's generate some data:
from random import random
scores_list = [random() for i in range(10000)]
scores_len = len(scores_list)
Let's say these are our target values:
# Your average
avg = 0.55
# Your min length
min_len = 10
Here is a naive brute force solution
res = []
for i in range(scores_len - min_len):
    for j in range(i+min_len, scores_len):
        l = scores_list[i:j]
        if sum(l) / (j - i) >= avg:
            res.append(l)
That will run very slowly because it has to perform 10000^2 (10^8) operations.
Here is how we can do it better. It is still quadratic, but there are some tricks which allow it to run much, much faster:
res = []
i = 0
while i < scores_len - min_len:
    j = i + min_len
    di = scores_len
    dj = 0
    current_sum = sum(scores_list[i:j])
    while j < scores_len:
        current_sum += sum(scores_list[j-dj:j])
        current_avg = current_sum/(j - i)
        if current_avg >= avg:
            res.append(scores_list[i:j])
            dj = 1
            di = 1
        else:
            dj = max(1, int((avg * (j - i) - current_sum)/(1 - avg)))
            di = min(di, max(1, int(((j-i) * avg - current_sum)/avg)))
        j += dj
    i += di
For a uniform distribution (which we have here) and the given target values, it performs fewer than 10^6 operations (~7 * 10^5), which is two orders of magnitude fewer than the brute-force solution.
So basically, if there are only a few qualifying sublists, it will perform very well. If there are a lot of them, this algorithm will be about the same as the brute-force one.

Find largest continuous sum such that the minimum of it and its complement is largest

I'm given a sequence of numbers a_1,a_2,...,a_n. Its sum is S=a_1+a_2+...+a_n and I need to find a subsequence a_i,...,a_j such that min(S-(a_i+...+a_j),a_i+...+a_j) is the largest possible (both sums must be non-empty).
Example:
1,2,3,4,5 the sequence is 3,4, because then min(S-(a_i+...+a_j),a_i+...+a_j)=min(8,7)=7 (and it's the largest possible which can be checked for other subsequences).
I tried to do this the hard way.
I load all values into the array tab[n].
I do tab[i]+=tab[i-1] n-1 times, so that tab[j] is the sum from the beginning up to j.
I check all possible sums a_i+...+a_j=tab[j]-tab[i-1], subtract each from the total sum, take the minimum, and see if it's larger than before.
It takes O(n^2). This makes me very sad and miserable. Is there a better way?
Seems like this can be done in O(n) time.
Compute the sum S. The ideal subsequence sum is the longest one which gets closest to S/2.
Start with i=j=0 and increase j until sum(a_i..a_j) and sum(a_i..a_{j+1}) are as close as possible to S/2. Note whichever is closer and save the values of i_best,j_best,sum_best.
Increment i and then increase j again until sum(a_i..a_j) and sum(a_i..a_{j+1}) are as close as possible to S/2. Note whichever is closer and replace the values of i_best,j_best,sum_best if they are better. Repeat this step until done.
Note that both i and j are never decremented, so they are changed a total of at most O(n) times. Since all other operations take only constant time, this results in an O(n) runtime for the entire algorithm.
Let's first do some clarifications.
A subsequence of a sequence is actually a subset of the indices of the sequence. Having said that, and specifically in the case where your sequence has distinct elements, your problem reduces to the famous Partition problem, which is known to be NP-complete. If that is the case, you can solve the problem in O(Sn), where "n" is the number of elements and "S" is the total sum. This is not polynomial time, as "S" can be arbitrarily large.
So let's consider the case of a contiguous subsequence. You need to pass over the array elements twice. The first pass sums them up into some "S". In the second pass you carefully adjust the window. Let's assume you know that a[i] + a[i + 1] + ... + a[j] > S / 2. Then you let i = i + 1 to reduce the sum. Conversely, if it were smaller, you would increase j.
This code runs in O(n).
Python code:
from math import fabs

a = [1, 2, 3, 4, 5]
i = 0
j = 0
S = sum(a)
s = 0
# Grow the window from the left until adding a[j] would exceed S / 2.
while s + a[j] <= S / 2:
    s = s + a[j]
    j = j + 1
s = s + a[j]
best_case = (i, j)
best_difference = fabs(S / 2 - s)
while True:
    # Keep the window whose sum is closest to S / 2.
    if fabs(S / 2 - s) < best_difference:
        best_case = (i, j)
        best_difference = fabs(S / 2 - s)
    if s > S / 2:
        # Sum too large: drop the leftmost element.
        s -= a[i]
        i += 1
    else:
        # Sum too small: extend the window to the right.
        j += 1
        if j == len(a):
            break
        s += a[j]
print(best_case)
i = best_case[0]
j = best_case[1]
print("Best subarray = ", a[i:j + 1])
print("Best sum = ", sum(a[i:j + 1]))

The expected number of inversions--From Introduction to Algorithms by Cormen

Let A[1 .. n] be an array of n distinct numbers. If i < j and A[i] > A[j], then the pair (i, j) is called an inversion of A. (See Problem 2-4 for more on inversions.) Suppose that each element of A is chosen randomly, independently, and uniformly from the range 1 through n. Use indicator random variables to compute the expected number of inversions.
The problem is from exercise 5.2-5 in Introduction to Algorithms by Cormen. Here is my recursive solution:
Suppose x(i) is the number of inversions in a[1..i], and E(i) is the expected value of x(i); then E(i+1) can be computed as follows:
Imagine we have i+1 positions to place all the numbers. If we place i+1 in the first position, then x(i+1) = i + x(i); if we place i+1 in the second position, then x(i+1) = i-1 + x(i), and so on. So E(i+1) = 1/(i+1) * sum(k) + E(i), where k ranges over [0,i]. Finally we get E(i+1) = i/2 + E(i).
Because we know that E(2) = 0.5, recursively we get: E(n) = (n-1 + n-2 + ... + 2)/2 + 0.5 = n*(n-1)/4.
Although the deduction above seems to be right, I am still not very sure of it. So I share it here.
If there is something wrong, please correct me.
All the solutions seem to be correct, but the problem says that we should use indicator random variables. So here is my solution using the same:
Let Eij be the event that i < j and A[i] > A[j].
Let Xij = I{Eij} = 1 if (i, j) is an inversion of A,
                   0 if (i, j) is not an inversion of A.
Let X = Σ(i=1 to n)Σ(j=1 to n)(Xij) = No. of inversions of A.
E[X] = E[Σ(i=1 to n)Σ(j=1 to n)(Xij)]
     = Σ(i=1 to n)Σ(j=1 to n)(E[Xij])
     = Σ(i=1 to n)Σ(j=1 to n)(P(Eij))
     = Σ(i=1 to n)Σ(j=i + 1 to n)(P(Eij))    (as we must have i < j)
     = Σ(i=1 to n)Σ(j=i + 1 to n)(1/2)       (we can choose the two numbers in C(n, 2) ways and arrange them as required, so P(Eij) = C(n, 2) / (n(n-1)) = 1/2)
     = Σ(i=1 to n)((n - i)/2)
     = n(n - 1)/4
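As a sanity check on the n(n-1)/4 result, the expectation can be computed exactly for small n by brute force. This Python sketch assumes the uniform-random-permutation reading of the problem (all n values distinct); the helper names count_inversions and expected_inversions are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations


def count_inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j] (quadratic, fine for tiny n)."""
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def expected_inversions(n):
    """Exact average inversion count over all permutations of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(count_inversions(p) for p in perms), len(perms))


for n in range(1, 7):
    assert expected_inversions(n) == Fraction(n * (n - 1), 4)
print(expected_inversions(4))  # 3
```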
Another solution is even simpler, IMO, although it does not use "indicator random variables".
Since all of the numbers are distinct, every pair of elements is either an inversion (i < j with A[i] > A[j]) or a non-inversion (i < j with A[i] < A[j]). Put another way, every pair of numbers is either in order or out of order.
So for any given permutation, the total number of inversions plus non-inversions is just the total number of pairs, or n*(n-1)/2.
By symmetry of "less than" and "greater than", the expected number of inversions equals the expected number of non-inversions.
Since the expectation of their sum is n*(n-1)/2 (constant for all permutations), and they are equal, they are each half of that or n*(n-1)/4.
[Update 1]
Apparently my "symmetry of 'less than' and 'greater than'" statement requires some elaboration.
For any array of numbers A in the range 1 through n, define ~A as the array you get when you subtract each number from n+1. For example, if A is [2,3,1], then ~A is [2,1,3].
Now, observe that for any pair of numbers in A that are in order, the corresponding elements of ~A are out of order. (Easy to show because negating two numbers exchanges their ordering.) This mapping explicitly shows the symmetry (duality) between less-than and greater-than in this context.
So, for any A, the number of inversions equals the number of non-inversions in ~A. But for every possible A, there corresponds exactly one ~A; when the numbers are chosen uniformly, both A and ~A are equally likely. Therefore the expected number of inversions in A equals the expected number of inversions in ~A, because these expectations are being calculated over the exact same space.
Therefore the expected number of inversions in A equals the expected number of non-inversions. The sum of these expectations is the expectation of the sum, which is the constant n*(n-1)/2, or the total number of pairs.
[Update 2]
A simpler symmetry: For any array A of n elements, define ~A as the same elements but in reverse order. Associate the element at position i in A with the element at position n+1-i in ~A. (That is, associate each element with itself in the reversed array.)
Now any inversion in A is associated with a non-inversion in ~A, just as with the construction in Update 1 above. So the same argument applies: The number of inversions in A equals the number of non-inversions in ~A; both A and ~A are equally likely sequences; etc.
The point of the intuition here is that the "less than" and "greater than" operators are just mirror images of each other, which you can see either by negating the arguments (as in Update 1) or by swapping them (as in Update 2). So the expected number of inversions and non-inversions is the same, since you cannot tell whether you are looking at any particular array through a mirror or not.
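Both mirror constructions can be checked mechanically on small cases. In this Python sketch (the helper names inversions and complement are hypothetical), the inversions of A and of ~A always add up to the total number of pairs, for the subtract-from-n+1 map of Update 1 and for the reversal of Update 2.

```python
from itertools import permutations


def inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j]."""
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def complement(seq):
    """Update 1: subtract each number from n + 1."""
    n = len(seq)
    return [n + 1 - v for v in seq]


n = 5
pairs = n * (n - 1) // 2
for perm in permutations(range(1, n + 1)):
    # Every inversion of A is a non-inversion of ~A, and vice versa.
    assert inversions(perm) + inversions(complement(perm)) == pairs
    assert inversions(perm) + inversions(perm[::-1]) == pairs

print(complement([2, 3, 1]))  # [2, 1, 3]
```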
Even simpler (similar to Aman's answer above, but perhaps clearer) ...
Let Xij be a random variable with Xij=1 if A[i] > A[j] and Xij=0 otherwise.
Let X=sum(Xij) over i, j where i < j
Number of pairs (ij)*: n(n-1)/2
Probability that Xij=1 (Pr(Xij=1)): 1/2
By linearity of expectation**:
E(X) = E(sum(Xij))
     = sum(E(Xij))
     = sum(Pr(Xij=1))
     = n(n-1)/2 * 1/2
     = n(n-1)/4
* I think of this as the size of the upper triangle of a square matrix.
** All sums here are over i, j, where i < j.
I think it's right, but I think the proper way to prove it is to use conditional expectations:
for all X and Y we have : E[X] =E [E [X|Y]]
then in your case :
E(i+1) = E[x(i+1)] = E[E[x(i+1) | x(i)]] = E[SUM(k)/(1+i) + x(i)] = i/2 + E[x(i)] = i/2 + E(i)
about the second statement :
if :
E(n) = n* (n-1)/4.
then E(n+1) = (n+1)*n/4 = (n-1)*n/4 + 2*n/4 = (n-1)*n/4 + n/2 = E(n) +n/2
So n*(n-1)/4 satisfies the recurrence relation for all n >= 2, and it also holds for the base case n = 2 (E(2) = 0.5).
So E(n) = n*(n-1)/4
Hope I understood your problem and it helps
Using indicator random variables:
Let X = random variable which is equal to the number of inversions.
Let Xij = 1 if A[i] and A[j] form an inversion pair, and Xij = 0 otherwise.
Number of inversion pairs = Sum over 1 <= i < j <= n of (Xij)
Now P[Xij = 1] = P[A[i] > A[j]] = (n choose 2) / (2! * n choose 2) = 1/2
E[X] = E[sum over all ij pairs such that i < j of Xij] = sum over all ij pairs such that i < j of E[Xij] = n(n - 1) / 4

Resources