Find scalar interval containing maximum elements from population A and zero elements from population B - algorithm

Given two large sets A and B of scalar (floating point) values, what algorithm would you use to find the (scalar) range [x0,x1] containing zero elements from B and the maximum number of elements from A?
Is sorting complexity (O(n log n)) unavoidable?

Create a single list with all values, where each value is marked with two counts: one count that relates to set A, and another that relates to set B. Initially these counts are 1 and 0, when the value comes from set A, and 0 and 1 when the value comes from set B. So entries in this list could be tuples (value, countA, countB). This operation is O(n).
Sort these tuples. O(nlogn)
Merge tuples with duplicate values into one tuple, and accumulate the values in the corresponding counters, so that the tuple tells us how many times the value occurs in set A and how many times in set B. O(n)
Traverse this list in sorted order and maintain the largest sum of counts for countA of a series of adjacent tuples where countB is always 0, and the minimum and maximum value of that range. O(n)
The sorting is the determining factor of the time complexity: O(nlogn).

Sort both A and B in O(|A| log |A| + |B| log |B|). Then apply the following algorithm, which has complexity O(|A| + |B|):
i = j = k = 0
best_interval = (0, 1)
while i < len(B) - 1:
lo = B[i]
hi = B[i+1]
j = k # We can skip ahead from last iteration.
while j < len(A) and A[j] <= lo:
j += 1
k = j # We can skip ahead from the above loop.
while k < len(A) and A[k] < hi:
k += 1
if k - j > best_interval[1] - best_interval[0]:
best_interval = (j, k)
i += 1
x0 = A[best_interval[0]]
x1 = A[best_interval[1]-1]
It may look quadratic at a first inspection but note we never decrease j and k - it really is just a linear scan with three pointers.

Related

Find continuous subarrays that have at least 1 pair adding up to target sum - Optimization

I took this assessment that had this prompt, and I was able to pass 18/20 tests, but not the last 2 due to hitting the execution time limit. Unfortunately, the input values were not displayed for these tests.
Prompt:
// Given an array of integers **a**, find how many of its continuous subarrays of length **m** that contain at least 1 pair of integers with a sum equal to **k**
Example:
const a = [1,2,3,4,5,6,7];
const m = 5, k = 5;
solution(a, m, k) will yield 2, because there are 2 subarrays in a that have at least 1 pair that add up to k
a[0]...a[4] - [1,2,3,4,5] - 2 + 3 = k ✓
a[1]...a[5] - [2,3,4,5,6] - 2 + 3 = k ✓
a[2]...a[6] - [3,4,5,6,7] - no two elements add up to k ✕
Here was my solution:
// strategy: check each subarray if it contains a two sum pair
// time complexity: O(n * m), where n is the size of a and m is the subarray length
// space complexity: O(m), where m is the subarray length
function solution(a, m, k) {
let count = 0;
for(let i = 0; i <= a.length - m; i++){
let set = new Set();
for(let j = i; j < i + m; j++){
if(set.has(k - a[j])){
count++;
break;
}
else
set.add(a[j]);
}
}
return count;
}
I thought of ways to optimize this algo, but failed to come up with any. Is there any way this can be optimized further for time complexity - perhaps for any edge cases?
Any feedback would be much appreciated!
maintain a map of highest position of the last m values (add/remove/query is O(1)) and highest position of the first value of a complementary pair
for each array element, check if complementary element is in the map, update the highest position if necessary.
if at least m elements were processed and higest position is in the range, increase counter
O(n) overall. Python:
def solution(a, m, k):
count = 0
last_pos = {} # value: last position observed
max_complement_pos = -1
for head, num in enumerate(a, 1): # advance head by one
tail = head - m
# deletion part is to keep space complexity O(m).
# If this is not a concern (likely), safe to omit
if tail > 0 and last_pos[a[tail]] <= tail: # time to pop last element
del last_pos[a[tail]]
max_complement_pos = max(max_complement_pos, last_pos.get(k-num, -1))
count += head >= m and max_complement_pos > tail
last_pos[num] =head # add element at head
return count
Create a counting hash: elt -> count.
When the window moves:
add/increment the new element
decrement the departing element
check if (k - new_elt) is in your hash with a count >= 1. If it is, you've found a good subarray.

What is the minimum cost of arranging the range a (n) (a [i] <= 20) such that the equal values will form a continuous segment?

You provide 1 string: a1, a2..an (a [i] <= 20)
Requirement: The minimum cost (number of steps) to swap any two elements in the sequence so that the final sequence obtained has equal values ​​that lie in succession:
Each step you can only choose 2 adjacent values to swap: swap (a [i], a [i + 1]) = 1steps
example:
1 1 3 1 3 2 3 2
Swap (a [3], a [4])
Swap (a [6], a [7])
-> 1 1 1 3 3 3 2 2
minimum = 2
I need your help.
Note that since A[i] <= 20 we can go ahead and enumerate every subset of all A[i] and fit comfortably within any time constraints.
Let M be the number of unique A[i], then there is a O(NM + M * 2^M) dynamic programming solution with bitmasks.
note that when I say moving an A[i] I mean moving every element with value A[i]
To understand how we do this let's first consider the brute force solution. We have some set of unique A[i] moved to the front of the string, and then at each step we pick the next A[i] to move behind what we had originally. This is O(M! * N).
There's one important observation to be made here: if we have some set of A[i] at the start of the string, and then we move the next one, the order of our original set of A[i] doesn't actually matter. Any move will cost the same regardless of the order.
Let cost(subset, A[i]) be the cost of moving all A[i] behind that subset of A[i] at the front of the string. Then we can write the following:
dp = [float('inf')] * (1 << M) # every subset of A[i]
dp[0] = 0
for mask in range(len(dp)):
for bit in range(M):
# if this A[i] hasn't been moved to the front, we move it to the front
if (mask >> bit) & 1 == 0:
dp[mask^(1 << bit)] = min(dp[mask^(1 << bit)], dp[mask] + cost(mask, bit))
If we compute cost naively then we have O(M * 2^M * N). However we can precompute every value of cost with O(1) per value.
Here's how we can do this:
Idea: The number of swaps needed to sort an array is the number of inversions.
Let's define a new array inversions[M][M], where inversions[i][j] is the number of times j comes after i in the arr. For clarity here's how we would compute it naively:
for i in range(len(arr)):
for j in range(i + 1, len(arr)):
if arr[i] != arr[j]: inversions[arr[i]][arr[j]] += 1
Assume that we have inversions, then we can compute cost(subset, A[i]) like so:
cost = 0
for bit in range(M):
# if bit isn't in the mask and thus needs to get swapped with A[i]
if (subset >> bit) & 1 == 0:
cost += inversions[bit][A[i]]
What's left is the following:
Compute inversions in O(NM). This can be done with keeping a count of each M at each index in N.
Currently cost is O(M) and not O(1). We can run a separate dynamic programming on cost to build an array cost[(1 << M)][M], where cost[i][j] is the cost to move item j to subset i.
For sake of completeness here is a complete code written in C++. It's my submission for the same problem on codeforces. Note that in that code cost is named contribution.

Finding best algorithm for sum of a section of an array's values

Given an array of n integers in the locations A[1], A[2], …, A[n], describe an O(n^2) time algorithm to
compute the sum A[i] + A[i+1] + … + A[j] for all i, j, 1 ≤ i < j ≤ n.
I've tried multiple ways of solving this problem but none have in O(n^2) time.
So for an array containing {1,2,3,4}
You would output:
1+2 = 3
1+2+3 = 6
1+2+3+4 = 10
2+3 = 5
2+3+4 = 9
3+4 = 7
The answer does not need to be in a specific language, pseudocode is preferred.
A good preperation is everything.
You could create an array of integrals:
I[0..n] = (0, I[0] + A[1], I[1] + A[2], ..., I[n-1]+A[n]);
This will cost you O(n) * O(1) (looping over all elements and doing one addition);
Now you can calculate each Sum(A, i, j) with just a single subtraction: I[j] - I[i-1];
so this has O(1)
Looping over all combinations of i and j with 1 <= (i,j) <= n has O(n^2).
So you end up with O(n) * O(1) + O(n^2) * O(1) = O(n^2) .
Edit:
Your array A starts at 1 - adapted to this - this also solves the little quirk with i-1
So the integral array I starts with index 0 and is 1 element larger than A
Edit:
First you'll maybe have thought about the most naive idea:
Naive idea
Create a function that for given values of i and of j will return the sum A[i] + ... + A[j].
function sumRange(A, i, j):
sum = 0
for k = i to j
sum = sum + A[k]
return sum
Then generate all pairs of i and j (with i < j) and call the above function for each pair:
for i = 1 to n
for j = i+1 to n
output sumRange(A, i, j)
This is not O(n²), because already the two loops on i and j represent O(n²) iterations, and then the function will perform yet another loop, making it O(n³).
Better idea
The above can be improved. Look at the repetition it performs. The sum that was calculated for given values of i and j could be reused to calculate the sum for when j has increased with 1, without starting from scratch and summing the values between i and (now) j-1 again, only to add that one more value to it.
We should just remember what the previous sum was, and add A[j] to it.
So without a separate function:
for i = 1 to n
sum = A[i]
for j = i+1 to n
sum = sum + A[j]
output sum
Note how the sum is not reset to 0 once it is output. It is preserved, so that when j is incremented, only one value needs to be added to it.
Now it is O(n²). Note also how it does not require an extra array for storage. It only needs the memory for a few variables (i, j, sum), so its space complexity is O(1).
As the number of sums you need to output is O(n²), there is no way to improve this time complexity any further.
NB: I assume here that single array values do not constitute a "sum". As you stated in your question, i < j, and also in your example you only showed sums of at least two array values. The above can be easily adapted to also include single value "sums" if ever that were needed.

How to make the run time of the program to ϴ n

The requirement is that the input will be set of integer ranging from -5 to 5, the result should give the longest subset of the integer, in which the total must be greater or equal to zero.
I can only come up with the following:
The input will be input[0 to n]
let start, longestStart, end, longestEnd, sum = 0
for i=0 to n-1
start = i
sum = input[i]
for j=1 to n
if sum + input[j] >= 0 then
end=j;
if end - start > longestEnd - longestStart then
longestStart = start;
longestEnd = end;
However this is ϴ(n^2). I would like to know what are the ways to make this loop become ϴ(n)
Thank you
Since
a - b == (a + n) - (b + n)
for any a, b or n, we can apply this to the array of numbers, keeping a running total of all elements from 0 to current. From the above equation, the sum of any subarray from index a to b is sum(elements 0-b) - sum(elements 0-a).
By keeping track of local minima and maxima, and the sums to them, you can find the subarray with the greatest range in one pass, ie O(n).

What is the fastest algorithm for intersection of two sorted lists?

Say that there are two sorted lists: A and B.
The number of entries in A and B can vary. (They can be very small/huge. They can be similar to each other/significantly different).
What is the known to be the fastest algorithm for this functionality?
Can any one give me an idea or reference?
Assume that A has m elements and B has n elements, with m ≥ n. Information theoretically, the best we can do is
(m + n)!
lg -------- = n lg (m/n) + O(n)
m! n!
comparisons, since in order to verify an empty intersection, we essentially have to perform a sorted merge. We can get within a constant factor of this bound by iterating through B and keeping a "cursor" in A indicating the position at which the most recent element of B should be inserted to maintain sorted order. We use exponential search to advance the cursor, for a total cost that is on the order of
lg x_1 + lg x_2 + ... + lg x_n,
where x_1 + x_2 + ... + x_n = m + n is some integer partition of m. This sum is O(n lg (m/n)) by the concavity of lg.
I don't know if this is the fastest option but here's one that runs in O(n+m) where n and m are the sizes of your lists:
Loop over both lists until one of them is empty in the following way:
Advance by one on one list.
Advance on the other list until you find a value that is either equal or greater than the current value of the other list.
If it is equal, the element belongs to the intersection and you can append it to another list
If it is greater that the other element, advance on the other list until you find a value equal or greater than this value
as said, repeat this until one of the lists is empty
Here is a simple and tested Python implementation that uses bisect search to advance pointers of both lists.
It assumes both input lists are sorted and contain no duplicates.
import bisect
def compute_intersection_list(l1, l2):
# A is the smaller list
A, B = (l1, l2) if len(l1) < len(l2) else (l2, l1)
i = 0
j = 0
intersection_list = []
while i < len(A) and j < len(B):
if A[i] == B[j]:
intersection_list.append(A[i])
i += 1
j += 1
elif A[i] < B[j]:
i = bisect.bisect_left(A, B[j], lo=i+1)
else:
j = bisect.bisect_left(B, A[i], lo=j+1)
return intersection_list
# test on many random cases
import random
MM = 100 # max value
for _ in range(10000):
M1 = random.randint(0, MM) # random max value
N1 = random.randint(0, M1) # random number of values
M2 = random.randint(0, MM) # random max value
N2 = random.randint(0, M2) # random number of values
a = sorted(random.sample(range(M1), N1)) # sampling without replacement to have no duplicates
b = sorted(random.sample(range(M2), N2))
assert compute_intersection_list(a, b) == sorted(set(a).intersection(b))

Resources