Evenly pruning an ordered set - algorithm

Given an ordered set of n elements (a1, a2, ..., an), what algorithm can I use to pick M of these elements (M < n) such that the sampling from the original set is as spread out as possible?
For example, if the starting set is the numbers from 1 to 9 (i.e. n=9), and I want to evenly sample it so I end up with only 5 of those numbers (i.e. M=5), I'd select 1, 3, 5, 7, 9. To get 3 of the original 9, I'd go with 1, 5, 9, and so on. But what would the pseudo-code for picking the elements look like for any n and M?
The mathematical formulation for this problem would be as follows: given M < n, find the indices q(1), q(2), ..., q(M) such that 1 <= q(k) < q(k+1) <= n for every k in [1, M-1], and the sum of q(k+1) - q(k) over k in [1, M-1] is the maximum possible.

Following Nico's suggestion of maximizing the minimum difference (the form of the objective function is fairly flexible), there's a simple O(n^2 M)-time dynamic program that goes something like this.
def prune(a, M):
    n = len(a)
    # The [i][j] entry is the max min gap for a[0], ..., a[i] choose j + 1
    # (a[0] is always chosen and a[i] is the last element chosen).
    table = [[float('inf')] * M for i in range(n)]
    for i in range(1, n):
        for j in range(1, M):
            table[i][j] = max(min(table[k][j - 1], a[i] - a[k])
                              for k in range(i))
    # Trace back the sequence of argmaxes to recover the chosen indexes;
    # the optimal objective value is table[n - 1][M - 1].
This can be improved to O(n M) time using total monotonicity. The idea is that, as i increases, the proper value for k can only increase too, and as soon as the objective decreases when we increase k, we can move on to the next i.
Both of the above algorithms handle fairly general objectives. If maximizing the minimum gap is OK as the objective, then there's an O(n log n)-time algorithm that uses binary search to guess the minimum gap and then checks whether it's feasible. I'll wait for you to update the question.
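Here is a minimal sketch of that binary-search idea (mine, not the answer's code; it assumes integer elements, so the gap is binary-searched over the value range, giving O(n log(range)) rather than strictly O(n log n)):

def prune_max_min_gap(a, M):
    # a is sorted ascending; return the largest achievable minimum gap
    # when keeping M of its elements.
    def feasible(gap):
        # Greedy check: keep the first element, then every element that lies
        # at least `gap` beyond the last kept one.
        kept, last = 1, a[0]
        for x in a[1:]:
            if x - last >= gap:
                kept += 1
                last = x
        return kept >= M

    lo, hi, best = 0, a[-1] - a[0], 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best

Re-running the greedy pass with the returned gap recovers one optimal selection of elements.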

Related

Count of divisors of numbers till N in O(N)?

So, we can count the divisors of each number from 1 to N with an O(N log N) sieve:
int n;
cin >> n;
for (int i = 1; i <= n; i++) {
    for (int j = i; j <= n; j += i) {
        cnt[j]++; // here cnt[x] means count of divisors of x
    }
}
Is there a way to reduce it to O(N)?
Thanks in advance.
Here is a simple optimization on #גלעד ברקן's solution. Rather than use sets, use arrays. This is about 10x as fast as the set version.
n = 100
answer = [None for i in range(0, n+1)]
answer[1] = 1
small_factors = [1]
p = 1
while (p < n):
    p = p + 1
    if answer[p] is None:
        print("\n\nPrime: " + str(p))
        limit = n // p
        new_small_factors = []
        for i in small_factors:
            j = i
            while j <= limit:
                new_small_factors.append(j)
                answer[j * p] = answer[j] + answer[i]
                j = j * p
        small_factors = new_small_factors
print("\n\nAnswer: " + str([(k,d) for k,d in enumerate(answer)]))
It is worth noting that this is also an O(n) algorithm for enumerating primes. However, with the use of a wheel generated from all of the primes below size log(n)/2, it can create a prime list in time O(n/log(log(n))).
How about this? Start with the prime 2 and keep a list of tuples, (k, d_k), where d_k is the number of divisors of k, starting with (1,1):
for each prime, p (ascending and lower than or equal to n / 2):
    for each tuple (k, d_k) in the list:
        if k * p > n:
            remove the tuple from the list
            continue
        power = 1
        while p * k <= n:
            add the tuple to the list if k * p^power <= n / p
            k = k * p
            output (k, (power + 1) * d_k)
            power = power + 1
the next number the output has skipped is the next prime
(since clearly all numbers up to the next prime are
either smaller primes or composites of smaller primes)
The method above also generates the primes, relying on O(n) memory to keep finding the next prime. Having a more efficient, independent stream of primes could allow us to avoid appending any tuples (k, d_k) to the list, where k * next_prime > n, as well as free up all memory holding output greater than n / next_prime.
Python code
Consider the total of those counts, sum(d(i) for i = 1..N), where d(i) denotes the number of divisors of i. That sum is O(N log N), so any O(N) solution would have to bypass individual counting.
This suggests that any improvement would need to depend on prior results (dynamic programming). We already know that d(i) is the product of each prime degree plus one. For instance, 12 = 2^2 * 3^1. The degrees are 2 and 1, respectively, so d(12) = (2+1)*(1+1) = 6, and indeed 12 has 6 divisors: 1, 2, 3, 4, 6, 12.
This "reduces" the question to whether you can leverage the prior knowledge to get an O(1) way to compute the number of divisors directly, without having to count them individually.
Think about the given case ... divisor counts so far include:
d(1) = 1
d(2) = 2
d(3) = 2
d(4) = 3
d(6) = 4
Is there an O(1) way to get d(12) = 6 from these figures?
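Not an O(1) lookup, but for reference, here is a sketch (mine, not part of the answer above) of the usual way to reuse prior results via a smallest-prime-factor sieve; it illustrates the prime-degree formula but runs in roughly O(N log N) overall, not O(N):

def divisor_counts(N):
    # spf[x] = smallest prime factor of x (0 for x < 2), built with a plain sieve.
    spf = [0] * (N + 1)
    for i in range(2, N + 1):
        if spf[i] == 0:                    # i is prime
            for j in range(i, N + 1, i):
                if spf[j] == 0:
                    spf[j] = i
    d = [0] * (N + 1)
    d[1] = 1
    for i in range(2, N + 1):
        p, m, e = spf[i], i, 0
        while m % p == 0:                  # strip the full power of the smallest prime
            m //= p
            e += 1
        d[i] = d[m] * (e + 1)              # reuse the previously computed d(m)
    return d                               # divisor_counts(12)[12] == 6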
Here is an algorithm that is theoretically better than O(n log(n)) but may be worse for reasonable n. I believe that its running time is O(n lg*(n)), where lg* is the iterated logarithm (https://en.wikipedia.org/wiki/Iterated_logarithm).
First of all you can find all primes up to n in time O(n) using the Sieve of Atkin. See https://en.wikipedia.org/wiki/Sieve_of_Atkin for details.
Now the idea is that we will build up our list of counts only inserting each count once. We'll go through the primes one by one, and insert values for every number whose largest prime factor is that prime. However, in order to do that we need a data structure with the following properties:
We can store a value (specifically the divisor count) at each number.
We can walk the list of inserted values forwards and backwards in O(1).
We can find the last inserted number below i "efficiently".
Insertion should be "efficient".
(Quotes are the parts that are hard to estimate.)
The first is trivial, each slot in our data structure needs a spot for the value. The second can be done with a doubly linked list. The third can be done with a clever variation on a skip-list. The fourth falls out from the first 3.
We can do this with an array of nodes (which do not start out initialized) with the following fields that look like a doubly linked list:
value The answer we are looking for.
prev The last previous value that we have an answer for.
next The next value that we have an answer for.
Now if i is in the list and j is the next value, the skip-list trick will be that we will also fill in prev for the first even after i, the first divisible by 4, divisible by 8 and so on until we reach j. So if i = 81 and j = 96 we would fill in prev for 82, 84, 88 and then 96.
Now suppose that we want to insert a value v at k between an existing i and j. How do we do it? I'll present pseudocode starting with only k known then fill it out for i = 81, j = 96 and k = 90.
k.value := v
for temp in searching down from k for increasing factors of 2:
    if temp has a value:
        our_prev := temp
        break
    else if temp has a prev:
        our_prev := temp.prev
        break
our_next := our_prev.next
our_prev.next := k
k.next := our_next
our_next.prev := k
for temp in searching up from k for increasing factors of 2:
    if j <= temp:
        break
    temp.prev := k
k.prev := our_prev
In our particular example we were willing to search downwards from 90 through 90, 88, 80, 64, 0. But we actually get told that prev is 81 when we get to 88. We would be willing to search up through 90, 92, 96, 128, 256, ...; however, we just have to set 92.prev and 96.prev and we are done.
Now this is a complicated bit of code, but its performance is O(log(k-i) + log(j-k) + 1). Which means that it starts off as O(log(n)) but gets better as more values get filled in.
So how do we initialize this data structure? Well we initialize an array of uninitialized values then set 1.value := 0, 1.next := n+1, and 2.prev := 4.prev := 8.prev := 16.prev := ... := 1. And then we start processing our primes.
When we reach prime p we start by searching for the previous inserted value below n/p. Walking backwards from there we keep inserting values for x*p, x*p^2, ... until we hit our limit. (The reason for backwards is that we do not want to try to insert, say, 18 once for 3 and once for 9. Going backwards prevents that.)
Now what is our running time? Finding the primes is O(n). Finding the initial inserts is also easily O(n/log(n)) operations of time O(log(n)) for another O(n). Now what about the inserts of all of the values? That is trivially O(n log(n)) but can we do better?
Well first all of the inserts to density 1/log(n) filled in can be done in time O(n/log(n)) * O(log(n)) = O(n). And then all of the inserts to density 1/log(log(n)) can likewise be done in time O(n/log(log(n))) * O(log(log(n))) = O(n). And so on with increasing numbers of logs. The number of such factors that we get is O(lg*(n)) for the O(n lg*(n)) estimate that I gave.
I haven't shown that this estimate is as good as you can do, but I think that it is.
So, not O(n), but pretty darned close.

Minimum sum that cant be obtained from a set

Given a set S of positive integers whose elements need not be distinct, I need to find the minimal non-negative sum that can't be obtained from any subset of the given set.
Example : if S = {1, 1, 3, 7}, we can get 0 as (S' = {}), 1 as (S' = {1}), 2 as (S' = {1, 1}), 3 as (S' = {3}), 4 as (S' = {1, 3}), 5 as (S' = {1, 1, 3}), but we can't get 6.
Now we are given one array A, consisting of N positive integers. There are M queries, each consisting of two integers Li and Ri that describe the i-th query: we need to find this sum that can't be obtained from the array elements {A[Li], A[Li+1], ..., A[Ri-1], A[Ri]}.
I know how to find it with a brute force approach in O(2^n), but given 1 ≤ N, M ≤ 100,000 this can't be done.
So is there any efficient approach to do it?
Concept
Suppose we had an array of bool representing which numbers so far haven't been found (by way of summing).
For each number n we encounter in the ordered (increasing values) subset of S, we do the following:
For each existing True value at position i in numbers, we set numbers[i + n] to True
We set numbers[n] to True
With this sort of a sieve, we would mark all the found numbers as True, and iterating through the array when the algorithm finishes would find us the minimum unobtainable sum.
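Taken literally, the concept might look like this sketch (mine, not the answer author's); the limit cap is hypothetical, and the inner loop runs downwards so each occurrence of a number is used at most once:

def min_unobtainable_sieve(S, limit):
    # numbers[s] is True once some subset of the processed elements sums to s.
    numbers = [False] * (limit + 1)
    for x in sorted(S):
        for i in range(limit - x, 0, -1):   # downwards, so x itself is not reused
            if numbers[i]:
                numbers[i + x] = True
        if x <= limit:
            numbers[x] = True
    return next((s for s in range(1, limit + 1) if not numbers[s]), limit + 1)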
Refinement
Obviously, we can't have a solution like this because the array would have to be infinite in order to work for all sets of numbers.
The concept could be improved by making a few observations. With an input of 1, 1, 3, the set of positions marked True becomes, in sequence: {1}, then {1, 2}, then {1, 2, 3, 4, 5}.
An important observation can be made:
(3) For each next number, if the previous numbers had already been found it will be added to all those numbers. This implies that if there were no gaps before a number, there will be no gaps after that number has been processed.
For the next input of 7 we can assert that:
(4) Since the input set is ordered, there will be no number less than 7
(5) If there is no number less than 7, then 6 cannot be obtained
We can come to a conclusion that:
(6) the first gap represents the minimum unobtainable number.
Algorithm
Because of (3) and (6), we don't actually need the numbers array, we only need a single value, max to represent the maximum number found so far.
This way, if the next number n is greater than max + 1, then a gap would have been made, and max + 1 is the minimum unobtainable number.
Otherwise, max becomes max + n. If we've run through the entire S, the result is max + 1.
Actual code (C#, easily converted to C):
static int Calculate(int[] S)
{
    int max = 0;
    for (int i = 0; i < S.Length; i++)
    {
        if (S[i] <= max + 1)
            max = max + S[i];
        else
            return max + 1;
    }
    return max + 1;
}
It should run pretty fast, since it's obviously linear time (O(n)). Since the input to the function should be sorted, with quicksort each query becomes O(n log n). I've managed to get results for M = N = 100000 on 8 cores in just under 5 minutes.
With an upper limit of 10^9 on the numbers, a radix sort could be used to get approximately O(n) time for the sorting; however, this would still be way over 2 seconds because of the sheer number of sorts required.
But we can use the statistical improbability of a 1 being drawn to eliminate subsets before sorting. At the start, check if 1 exists in S; if not, then every query's result is 1, because 1 cannot be obtained.
Statistically, if we draw 10^5 values at random from a range of 10^9 numbers, we have a 99.9% chance of not getting a single 1.
Before each sort, check if that subset contains a 1; if not, then its result is 1.
With this modification, the code runs in 2 milliseconds on my machine. Here's that code on http://pastebin.com/rF6VddTx
This is a variation of the subset-sum problem, which is NP-Complete, but there is a pseudo-polynomial Dynamic Programming solution you can adopt here, based on the recursive formula:
f(S, i) = f(S - arr[i], i - 1) OR f(S, i - 1)
f(S, i) = false   if S < 0
f(S, i) = false   if i < 0
f(0, i) = true
The recursive formula is basically an exhaustive search: each sum can be achieved either with element i or without element i.
The dynamic programming is achieved by building a (SUM+1) x (n+1) table (where SUM is the sum of all elements and n is the number of elements), and filling it bottom-up.
Something like:
table <- (SUM+1) x (n+1) table
// init:
for each j from 0 to n:
    table[0][j] = true
for each i from 1 to SUM:
    table[i][0] = false
// fill the table:
for each i from 1 to SUM:
    for each j from 1 to n:
        if i < arr[j]:
            table[i][j] = table[i][j-1]
        else:
            table[i][j] = table[i-arr[j]][j-1] OR table[i][j-1]
Once you have the table, you need the smallest i such that for all j: table[i][j] = false
Complexity of the solution is O(n*SUM), where SUM is the sum of all elements, but note that the algorithm can actually be stopped once the required number is found, without the need to go on to the next rows, which are not needed for the solution.
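A compact Python sketch of the same pseudo-polynomial idea, using a one-dimensional reachability array instead of the full table (my illustration, not the answer's code):

def min_unobtainable_dp(arr):
    # reachable[s] becomes True once some subset of arr sums to s.
    total = sum(arr)
    reachable = [False] * (total + 2)
    reachable[0] = True
    for x in arr:
        for s in range(total, x - 1, -1):   # downwards so each element is used once
            if reachable[s - x]:
                reachable[s] = True
    return next(s for s in range(total + 2) if not reachable[s])

For example, min_unobtainable_dp([1, 1, 3, 7]) returns 6, matching the example in the question.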

Efficient iteration over sorted partial sums

I have a list of N positive numbers sorted in ascending order, L[0] to L[N-1].
I want to iterate over subsets of M distinct list elements (without replacement, order not important), 1 <= M <= N, sorted according to their partial sum. M is not fixed, the final result should consider all possible subsets.
I only want the K smallest subsets efficiently (ideally polynomial in K). The obvious algorithm of enumerating all subsets with M <= K is O(K!).
I can reduce the problem to subsets of fixed size M, by placing K iterators (1 <= M <= K) in a min-heap and having the master iterator operate on the heap root.
Essentially I need the Python function call:
sorted(itertools.combinations(L, M), key=sum)[:K]
... but efficient (N ~ 200, K ~ 30), should run in less than 1sec.
Example:
L = [1, 2, 5, 10, 11]
K = 8
answer = [(1,), (2,), (1,2), (5,), (1,5), (2,5), (1,2,5), (10,)]
Answer:
As David's answer shows, the important trick is that for a subset S to be outputted, all subsets of S must have been previously outputted, in particular the subsets where only 1 element has been removed. Thus, every time you output a subset, you can add all 1-element extensions of this subset for consideration (a maximum of K), and still be sure that the next outputted subset will be in the list of all considered subsets up to this point.
Fully working, more efficient Python function:
def sorted_subsets(L, K):
    candidates = [(L[i], (i,)) for i in range(min(len(L), K))]
    for j in range(K):
        new = candidates.pop(0)
        yield tuple(L[i] for i in new[1])
        new_candidates = [(L[i] + new[0], (i,) + new[1]) for i in range(new[1][0])]
        candidates = sorted(candidates + new_candidates)[:K - j - 1]
UPDATE, found an O(K log K) algorithm.
This is similar to the trick above, but instead of adding all 1-element extensions with the elements added greater than the max of the subset, you consider only 2 extensions: one that adds max(S)+1, and the other one that shifts max(S) to max(S) + 1 (that would eventually generate all 1-element extensions to the right).
import heapq

def sorted_subsets_faster(L, K):
    candidates = [(L[0], (0,))]
    for j in range(K):
        new = heapq.heappop(candidates)
        yield tuple(L[i] for i in new[1])
        i = new[1][-1]
        if i + 1 < len(L):
            heapq.heappush(candidates, (new[0] + L[i+1], new[1] + (i+1,)))
            heapq.heappush(candidates, (new[0] - L[i] + L[i+1], new[1][:-1] + (i+1,)))
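Checking it against the example from the question:

>>> list(sorted_subsets_faster([1, 2, 5, 10, 11], 8))
[(1,), (2,), (1, 2), (5,), (1, 5), (2, 5), (1, 2, 5), (10,)]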
From my benchmarks, it is faster for ALL values of K.
Also, it is not necessary to supply in advance the value of K, we can just iterate and stop whenever, without changing the efficiency of the algorithm. Also note that the number of candidates is bounded by K+1.
It might be possible to improve even further by using a priority deque (min-max heap) instead of a priority queue, but frankly I'm satisfied with this solution. I'd be interested in a linear algorithm though, or a proof that it's impossible.
Here's some rough Python-ish pseudo-code:
final = []
L = L[:K]  # Anything after the first K is too big already
sorted_candidates = L[]
while len( final ) < K:
    final.append( sorted_candidates[0] )  # We keep it sorted so the first option
                                          # is always the smallest sum not
                                          # already included
    # If you just added a subset of size A, make a bunch of subsets of size A+1
    expansion = [sorted_candidates[0].add( x )
                 for x in L and x not already included in sorted_candidates[0]]
    # We're done with the first element, so remove it
    sorted_candidates = sorted_candidates[1:]
    # Now go through and build a new set of sorted candidates by getting the
    # smallest possible ones from sorted_candidates and expansion
    new_candidates = []
    for i in range(K - len( final )):
        if sum( expansion[0] ) < sum( sorted_candidates[0] ):
            new_candidates.append( expansion[0] )
            expansion = expansion[1:]
        else:
            new_candidates.append( sorted_candidates[0] )
            sorted_candidates = sorted_candidates[1:]
    sorted_candidates = new_candidates
We'll assume that you will do things like removing the first element of an array in an efficient way, so the only real work in the loop is in building expansion and in rebuilding sorted_candidates. Both of these have fewer than K steps, so as an upper bound, you're looking at a loop that is O(K) and that is run K times, so O(K^2) for the algorithm.

Sum-subset with a fixed subset size

The sum-subset problem states:
Given a set of integers, is there a non-empty subset whose sum is zero?
This problem is NP-complete in general. I'm curious if the complexity of this slight variant is known:
Given a set of integers, is there a subset of size k whose sum is zero?
For example, if k = 1, you can do a binary search to find the answer in O(log n). If k = 2, then you can get it down to O(n log n) (e.g. see Find a pair of elements from an array whose sum equals a given number). If k = 3, then you can do O(n^2) (e.g. see Finding three elements in an array whose sum is closest to a given number).
Is there a known bound that can be placed on this problem as a function of k?
As motivation, I was thinking about this question How do you partition an array into 2 parts such that the two parts have equal average? and trying to determine if it is actually NP-complete. The answer lies in whether or not there is a formula as described above.
Barring a general solution, I'd be very interested in knowing an optimal bound for k=4.
For k = 4: space complexity O(n), time complexity O(n^2 * log(n))
Sort the array. Starting from 2 smallest and 2 largest elements, calculate all lesser sums of 2 elements (a[i] + a[j]) in the non-decreasing order and all greater sums of 2 elements (a[k] + a[l]) in the non-increasing order. Increase lesser sum if total sum is less than zero, decrease greater one if total sum is greater than zero, stop when total sum is zero (success) or a[i] + a[j] > a[k] + a[l] (failure).
The trick is to iterate through all the indexes i and j in such a way, that (a[i] + a[j]) will never decrease. And for k and l, (a[k] + a[l]) should never increase. A priority queue helps to do this:
1. Put key = (a[i] + a[j]), value = (i = 0, j = 1) into the priority queue.
2. Pop (sum, i, j) from the priority queue.
3. Use sum in the above algorithm.
4. Put (a[i+1] + a[j]), i+1, j and (a[i] + a[j+1]), i, j+1 into the priority queue, but only if these elements were not already used. To keep track of used elements, maintain an array of the maximal used 'j' for each 'i'. It is enough to use only values of 'j' that are greater than 'i'.
5. Continue from step 2 (see the sketch after this list).
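Here is a rough Python sketch of that pair-sum iteration (mine, not the answer's code); it uses a seen set instead of the per-i array of maximal used j, which costs extra memory but keeps the idea visible:

import heapq

def pair_sums_ascending(a):
    # Yield (a[i] + a[j], i, j), i < j, in non-decreasing order of the sum.
    # Assumes a is sorted in ascending order.
    n = len(a)
    if n < 2:
        return
    heap = [(a[0] + a[1], 0, 1)]
    seen = {(0, 1)}
    while heap:
        s, i, j = heapq.heappop(heap)
        yield s, i, j
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < nj < n and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + a[nj], ni, nj))

Running the same generator over [-x for x in reversed(a)] yields the pair sums of a in non-increasing order (negated), which is what the two-sided scan above needs for the greater sums.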
For k>4
If space complexity is limited to O(n), I cannot find anything better than using brute force for k-4 of the values and the above algorithm for the remaining 4 values. Time complexity: O(n^(k-2) * log(n)).
For very large k integer linear programming may give some improvement.
Update
If n is very large (on the same order as the maximum integer value), it is possible to implement an O(1) priority queue, improving the complexities to O(n^2) and O(n^(k-2)).
If n >= k * INT_MAX, a different algorithm with O(n) space complexity is possible: precalculate a bitset for all possible sums of k/2 values, and use it to check the sums of the other k/2 values. Time complexity is O(n^(ceil(k/2))).
The problem of determining whether 0 in W + X + Y + Z = {w + x + y + z | w in W, x in X, y in Y, z in Z} is basically the same except for not having annoying degenerate cases (i.e., the problems are inter-reducible with minimal resources).
This problem (and thus the original for k = 4) has an O(n^2 log n)-time, O(n)-space algorithm. The O(n log n)-time algorithm for k = 2 (to determine whether 0 is in A + B) accesses A in sorted order and B in reverse sorted order. Thus all we need is an O(n)-space iterator for A = W + X, which can be reused symmetrically for B = Y + Z. Let W = {w_1, ..., w_n} in sorted order. For all x in X, insert a key-value item (w_1 + x, (1, x)) into a priority queue. Repeatedly remove the min element (w_i + x, (i, x)) and insert (w_{i+1} + x, (i+1, x)).
Question that is very similar:
Is this variant of the subset sum problem easier to solve?
It's still NP-complete.
If it were not, subset-sum would also be in P, since it could be decided as F(1) OR F(2) OR ... OR F(n), where F(k) is your fixed-size variant; the total running time would be a sum of polynomials, which is still polynomial, contradicting the known NP-completeness of subset-sum (unless P = NP).
Note that if you have certain bounds on the inputs you can achieve polynomial time.
Also note that the brute-force runtime can be calculated with binomial coefficients.
The solution for k = 4 in O(n^2 log(n)):
Step 1: Calculate the pairwise sums and sort the list. There are n(n-1)/2 sums, so the complexity is O(n^2 log(n)). Keep track of which pair of individual elements makes each sum.
Step 2: For each element in the above list, search for the complement (the negated sum) and make sure the two pairs don't share any individual elements. There are n^2 searches, each with complexity O(log(n)).
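A minimal Python sketch of those two steps (my illustration, not the answer's code); disjointness is checked naively among equal sums, which can degrade the bound if many pair sums collide, and it uses the O(n^2) space that the edit below discusses reducing:

from bisect import bisect_left

def four_sum_zero(a):
    # Is there a 4-element subset (4 distinct indices) of a summing to zero?
    n = len(a)
    pairs = sorted((a[i] + a[j], i, j) for i in range(n) for j in range(i + 1, n))
    sums = [s for s, _, _ in pairs]
    for s, i, j in pairs:
        k = bisect_left(sums, -s)          # first pair whose sum is >= -s
        while k < len(pairs) and sums[k] == -s:
            _, p, q = pairs[k]
            if len({i, j, p, q}) == 4:     # the two pairs must be disjoint
                return True
            k += 1
    return False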
EDIT: The space complexity of the original algorithm is O(n^2). The space complexity can be reduced to O(1) by simulating a virtual 2D matrix (O(n), if you count the space to store the sorted version of the array).
First, about the 2D matrix: sort the numbers and create a matrix X of pairwise sums. Now the matrix is such that every row and every column is sorted. To search for a value in this matrix, search along the diagonal. If the value lies between X[i,i] and X[i+1,i+1], you can basically halve the search space, recursing into the two submatrices X[i:N, 0:i] and X[0:i, i:N]. The resulting search algorithm is O(log^2 n) (I AM NOT VERY SURE. CAN SOMEBODY CHECK IT?).
Now, instead of using a real matrix, use a virtual matrix where X[i,j] is calculated as needed instead of pre-computed.
Resulting time complexity: O((n log n)^2).
PS: In the following link, it says the complexity of 2D sorted-matrix search is O(n). If that is true (i.e. O(log^2 n) is incorrect), then the final complexity is O(n^3).
To build on awesomo's answer... if we can assume that the numbers are sorted, we can do better than O(n^k) for a given k: simply take all O(n^(k-1)) subsets of size (k-1), then do a binary search in what remains for a number that, when added to the first (k-1), gives the target. This is O(n^(k-1) log n), so the complexity is certainly less than O(n^k).
In fact, if we know that the complexity is O(n^2) for k=3, we can do even better for k > 3: choose all (k-3)-subsets, of which there are O(n^(k-3)), and then solve the problem in O(n^2) on the remaining elements. This is O(n^(k-1)) for k >= 3.
However, maybe you can do even better? I'll think about this one.
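As a concrete sketch of that reduction (my code, not the poster's): brute-force the (k-3)-subsets and run a classic O(n^2) two-pointer 3-sum on the rest, for k >= 3 and target sum zero:

from itertools import combinations

def ksum_zero(a, k):
    # Is there a k-element subset (k distinct indices) of a summing to zero?
    a = sorted(a)
    n = len(a)
    for idx in combinations(range(n), k - 3):
        chosen = set(idx)
        target = -sum(a[i] for i in idx)
        rest = [a[i] for i in range(n) if i not in chosen]
        # classic two-pointer 3-sum on the remaining elements
        for i in range(len(rest) - 2):
            lo, hi = i + 1, len(rest) - 1
            while lo < hi:
                s = rest[i] + rest[lo] + rest[hi]
                if s == target:
                    return True
                elif s < target:
                    lo += 1
                else:
                    hi -= 1
    return False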
EDIT: I was initially going to add a lot proposing a different take on this problem, but I've decided to post an abridged version. I encourage other posters to see whether they believe this idea has any merit. The analysis is tough, but it might just be crazy enough to work.
We can use the fact that we have a fixed k, and that sums of odd and even numbers behave in certain ways, to define a recursive algorithm to solve this problem.
First, modify the problem so that you have both even and odd numbers in the list (this can be accomplished by dividing by two if all are even, or by subtracting 1 from numbers and k from the target sum if all are odd, and repeating as necessary).
Next, use the fact that even target sums can be reached only by using an even number of odd numbers, and odd target sums can be reached using only an odd number of odd numbers. Generate appropriate subsets of the odd numbers, and call the algorithm recursively using the even numbers, the sum minus the sum of the subset of odd numbers being examined, and k minus the size of the subset of odd numbers. When k = 1, do binary search. If ever k > n (not sure this can happen), return false.
If you have very few odd numbers, this could allow you to very quickly pick up terms that must be part of a winning subset, or discard ones that cannot. You can transform problems with lots of even numbers to equivalent problems with lots of odd numbers by using the subtraction trick. The worst case must therefore be when the numbers of even and odd numbers are very similar... and that's where I am right now. A uselessly loose upper bound on this is many orders of magnitude worse than brute force, but I feel like this is probably at least as good as brute force. Thoughts are welcome!
EDIT2: An example of the above, for illustration.
{1, 2, 2, 6, 7, 7, 20}, k = 3, sum = 20.
Subset {}:
    {2, 2, 6, 20}, k = 3, sum = 20
    = {1, 1, 3, 10}, k = 3, sum = 10
    Subset {}:
        {10}, k = 3, sum = 10
        Failure
    Subset {1, 1}:
        {10}, k = 1, sum = 8
        Failure
    Subset {1, 3}:
        {10}, k = 1, sum = 6
        Failure
Subset {1, 7}:
    {2, 2, 6, 20}, k = 1, sum = 12
    Failure
Subset {7, 7}:
    {2, 2, 6, 20}, k = 1, sum = 6
    Success
The time complexity is trivially O(n^k) (number of k-sized subsets from n elements).
Since k is a given constant, a (possibly quite high-order) polynomial upper bounds the complexity as a function of n.

Algorithm to determine indices i..j of array A containing all the elements of another array B

I came across this question on an interview questions thread. Here is the question:
Given two integer arrays A[1..n] and B[1..m], find the smallest window in A that contains all elements of B. In other words, find a pair <i, j> such that A[i..j] contains B[1..m]. If A doesn't contain all the elements of B, then i, j can be returned as -1.
The integers in A need not be in the same order as they are in B. If there is more than one smallest window (different windows, but with the same size), then it's enough to return one of them.
Example: A[1,2,5,11,2,6,8,24,101,17,8] and B[5,2,11,8,17]. The algorithm should return i = 2 (index of 5 in A) and j = 9 (index of 17 in A).
Now I can think of two variations.
Let's suppose that B has duplicates.
Variation 1: This variation doesn't consider the number of times each element occurs in B. It just checks for all the unique elements that occur in B and finds the smallest corresponding window in A that satisfies the above problem. For example, if A[1,2,4,5,7] and B[2,2,5], this variation doesn't bother about there being two 2's in B and just checks A for the unique integers in B, namely 2 and 5, and hence returns i=1, j=3.
Variation 2: This variation accounts for duplicates in B. If there are two 2's in B, then it expects to see at least two 2's in A as well. If not, it returns -1,-1.
When you answer, please do let me know which variation you are answering. Pseudocode should do. Please mention space and time complexity if it is tricky to calculate it. Mention if your solution assumes array indices to start at 1 or 0 too.
Thanks in advance.
Complexity
Time: O((m+n)log m)
Space: O(m)
The following is provably optimal up to a logarithmic factor. (I believe the log factor cannot be got rid of, and so it's optimal.)
Variant 1 is just a special case of variant 2 with all the multiplicities being 1, after removing duplicates from B. So it's enough to handle the latter variant; if you want variant 1, just remove duplicates in O(m log m) time. In the following, let m denote the number of distinct elements in B. We assume m < n, because otherwise we can just return -1, in constant time.
For each index i in A, we will find the smallest index s[i] such that A[i..s[i]] contains B[1..m], with the right multiplicities. The crucial observation is that s[i] is non-decreasing, and this is what allows us to do it in amortised linear time.
Start with i=j=1. We will keep a tuple (c[1], c[2], ... c[m]) of the number of times each element of B occurs, in the current window A[i..j]. We will also keep a set S of indices (a subset of 1..m) for which the count is "right" (i.e., k for which c[k]=1 in variant 1, or c[k] = <the right number> in variant 2).
So, for i=1, starting with j=1, increment each c[A[j]] (if A[j] was an element of B), check if c[A[j]] is now "right", and add or remove j from S accordingly. Stop when S has size m. You've now found s[1], in at most O(n log m) time. (There are O(n) j's, and each set operation took O(log m) time.)
Now for computing successive s[i]s, do the following. Increment i, decrement c[A[i]], update S accordingly, and, if necessary, increment j until S has size m again. That gives you s[i] for each i. At the end, report the (i,s[i]) for which s[i]-i was smallest.
Note that although it seems that you might be performing up to O(n) steps (incrementing j) for each i, the second pointer j only moves to the right: so the total number of times you can increment j is at most n. (This is amortised analysis.) Each time you increment j, you might perform a set operation that takes O(log m) time, so the total time is O(n log m). The space required was for keeping the tuple of counts, the set of elements of B, the set S, and some constant number of other variables, so O(m) in all.
There is an obvious O(m+n) lower bound, because you need to examine all the elements. So the only question is whether we can prove the log factor is necessary; I believe it is.
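For reference, here is a compact Python sketch of the same two-pointer scheme (mine, not the answer's code); it uses hashing instead of an ordered set, so it runs in expected O(n + m) rather than O((m + n) log m), handles variant 2, and uses 0-based indices:

from collections import Counter

def smallest_window(A, B):
    need = Counter(B)            # required multiplicity of each element of B
    missing = len(need)          # distinct values that still lack the right count
    have = Counter()
    best = (-1, -1)
    i = 0
    for j, x in enumerate(A):
        if x in need:
            have[x] += 1
            if have[x] == need[x]:
                missing -= 1
        while missing == 0:      # window A[i..j] covers B; record it and shrink
            if best == (-1, -1) or j - i < best[1] - best[0]:
                best = (i, j)
            y = A[i]
            if y in need:
                if have[y] == need[y]:
                    missing += 1
                have[y] -= 1
            i += 1
    return best

On the question's example, smallest_window([1,2,5,11,2,6,8,24,101,17,8], [5,2,11,8,17]) returns (2, 9), which matches the expected answer.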
Here is the solution I thought of (but it's not very neat).
I am going to illustrate it using the example in the question.
Let A[1,2,5,11,2,6,8,24,101,17,8] and B[5,2,11,8,17]
Sort B. (So B = [2,5,8,11,17]). This step takes O(m log m).
Allocate an array C of size A. Iterate through the elements of A, binary searching for each one in the sorted B; if it is found, enter its "index in sorted B + 1" in C. If it's not found, enter -1. After this step,
A = [1 , 2, 5, 11, 2, 6, 8, 24, 101, 17, 8] (no changes, quoting for ease).
C = [-1, 1, 2, 4 , 1, -1, 3, -1, -1, 5, 3]
Time: O(n log m), Space: O(n).
Find the smallest window in C that has all the numbers from 1 to m. For finding the window, I can think of two general directions:
a. A bit-oriented approach wherein I set the bit corresponding to each position and finally check by some kind of ANDing.
b. Create another array D of size m, go through C and when I encounter p in C, increment D[p]. Use this for finding the window.
Please leave comments regarding the general approach as such, as well as for 3a and 3b.
My solution:
a. Create a hash table H with m keys, one for each value in B. Each key in H maps to a dynamic array of sorted indices containing indices in A that are equal to B[i]. This takes O(n) time. We go through each index j in A. If key A[j] exists in H (O(1) time), then add the index j to the list of indices that H[A[j]] maps to.
At this point we have 'binned' n elements into m bins. However, total storage is just O(n).
b. The 2nd part of the algorithm involves maintaining a ‘left’ index and a ‘right’ index for each list in H. Let's create two arrays of size m called L and R that contain these values; initially, L points at the first position of each list and R at the last (as in the worked example below).
We also keep track of the “best” minimum window.
We then iterate over the following actions on L and R which are inherently greedy:
i. In each iteration, we compute the minimum and maximum values in L and R.
For L, Lmax - Lmin is the window and for R, Rmax - Rmin is the window. We update the best window if one of these windows is better than the current best window. We use a min heap to keep track of the minimum element in L and a max heap to keep track of the largest element in R. These take O(m*log(m)) time to build.
ii. From a ‘greedy’ perspective, we want to take the action that will minimize the window size in each L and R. For L it intuitively makes sense to increment the minimum index, and for R, it makes sense to decrement the maximum index.
We want to increment the array position for the minimum value until it is larger than the 2nd smallest element in L, and similarly, we want to decrement the array position for the largest value in R until it is smaller than the 2nd largest element in R.
Next, we make a key observation:
If L[i] is the minimum value in L and R[i] is less than the 2nd smallest element in L, ie, if R[i] were to still be the minimum value in L if L[i] were replaced with R[i], then we are done. We now have the “best” index in list i that can contribute to the minimum window. Also, all the other elements in R cannot contribute to the best window since their L values are all larger than L[i]. Similarly if R[j] is the maximum element in R and L[j] is greater than the 2nd largest value in R, we are also done by setting R[j] = L[j]. Any other index in array i to the left of L[j] has already been accounted for as have all indices to the right of R[j], and all indices between L[j] and R[j] will perform poorer than L[j].
Otherwise, we simply increment the array position L[i] until it is larger than the 2nd smallest element in L and decrement array position R[j] (where R[j] is the max in R) until it is smaller than the 2nd largest element in R. We compute the windows and update the best window if one of the L or R windows is smaller than the best window. We can do a Fibonacci search to optimally do the increment / decrement. We keep incrementing L[i] using Fibonacci increments until we are larger than the 2nd largest element in L. We can then perform binary search to get the smallest element L[i] that is larger than the 2nd largest element in L, similar for the set R. After the increment / decrement, we pop the largest element from the max heap for R and the minimum element for the min heap for L and insert the new values of L[i] and R[j] into the heaps. This is an O(log(m)) operation.
Step ii. would terminate when Lmin can’t move any more to the right or Rmax can’t move any more to the left (as the R/L values are the same). Note that we can have scenarios in which L[i] = R[i] but if it is not the minimum element in L or the maximum element in R, the algorithm would still continue.
Runtime analysis:
a. Creation of the hash table takes O(n) time and O(n) space.
b. Creation of heaps: O(m*log(m)) time and O(m) space.
c. The greedy iterative algorithm is a little harder to analyze. Its runtime is really bounded by the distribution of elements. Worst case, we cover all the elements in each array in the hash table. For each element, we perform an O(log(m)) heap update.
Worst case runtime is hence O(n*log(m)) for the iterative greedy algorithm. In the best case, we discover very fast that L[i] = R[i] for the minimum element in L or the maximum element in R…run time is O(1)*log(m) for the greedy algorithm!
Average case seems really hard to analyze. What is the average “convergence” of this algorithm to the minimum window. If we were to assume that the Fibonacci increments / binary search were to help, we could say we only look at m*log(n/m) elements (every list has n/m elements) in the average case. In that case, the running time of the greedy algorithm would be m*log(n/m)*log(m).
Total running time
Best case: O(n + m*log(m) + log(m)) time = O(n) assuming m << n
Average case: O(n + m*log(m) + m*log(n/m)*log(m)) time = O(n) assuming m << n.
Worst case: O(n + n*log(m) + m*log(m)) = O(n*log(m)) assuming m << n.
Space: O(n + m) (hashtable and heaps) always.
Edit: Here is a worked out example:
A[5, 1, 1, 5, 6, 1, 1, 5]
B[5, 6]
H:
{
5 => {1, 4, 8}
6 => {5}
}
Greedy Algorithm:
L => {1, 1}
R => {3, 1}
Iteration 1:
a. Lmin = 1 (since H{5}[1] < H{6}[1]), Lmax = 5. Window: 5 - 1 + 1= 5
Increment Lmin pointer, it now becomes 2.
L => {2, 1}
Rmin = H{6}[1] = 5, Rmax = H{5}[3] = 8. Window = 8 - 5 + 1 = 4. Best window so far = 4 (less than 5 computed above).
We also note the indices in A (5, 8) for the best window.
Decrement Rmax, it now becomes 2 and the value is 4.
R => {2, 1}
b. Now, Lmin = 4 (H{5}[2]) and the index i in L is 1. Lmax = 5 (H{6}[1]) and the index in L is 2.
We can't increment Lmin since L[1] = R[1] = 2. Thus we just compute the window now.
The window = Lmax - Lmin + 1 = 2 which is the best window so far.
Thus, the best window in A = (4, 5).
#include <cstddef>
#include <map>

struct Pair {
    int i;
    int j;
};

// Helpers: smallest / largest array index currently stored in the map.
static int findSmallestVal(const std::map<int, int> &m)
{
    int best = -1;
    for (std::map<int, int>::const_iterator it = m.begin(); it != m.end(); ++it)
        if (best == -1 || it->second < best)
            best = it->second;
    return best;
}

static int findLargestVal(const std::map<int, int> &m)
{
    int best = -1;
    for (std::map<int, int>::const_iterator it = m.begin(); it != m.end(); ++it)
        if (it->second > best)
            best = it->second;
    return best;
}

Pair
find_smallest_subarray_window(int *A, size_t n, int *B, size_t m)
{
    Pair p;
    p.i = -1;
    p.j = -1;
    // key is array value, value is array index
    std::map<int, int> map;
    size_t count = 0;
    size_t i;
    size_t j;
    for (i = 0; i < n; ++i) {
        for (j = 0; j < m; ++j) {
            if (A[i] == B[j]) {
                if (map.find(A[i]) == map.end()) {
                    map.insert(std::pair<int, int>(A[i], i));
                    ++count;    // one more distinct element of B is covered
                } else {
                    // Try moving this element's index forward; keep the move
                    // only if it does not widen the current window.
                    int start = findSmallestVal(map);
                    int end = findLargestVal(map);
                    int oldLength = end - start;
                    int oldIndex = map[A[i]];
                    map[A[i]] = i;
                    int _start = findSmallestVal(map);
                    int _end = findLargestVal(map);
                    int newLength = _end - _start;
                    if (newLength > oldLength) {
                        // revert back
                        map[A[i]] = oldIndex;
                    }
                }
            }
        }
        if (count == m) {
            break;
        }
    }
    p.i = findSmallestVal(map);
    p.j = findLargestVal(map);
    return p;
}

Resources