algorithm that finds the element ai in a sequence - algorithm

A sequence A=[a1, a2,...,an] is a valley sequence, if there's an index i with 1 < i < n such that:
a1 > a2 > .... > ai
and
ai < ai+1 < .... < an.
It is given that a valley sequence must contain at least three elements in it.
What i'm really confused about is, how do we find an algorithm that finds the element ai, as described above, in O(log n) time?
Will it be similar to an O(log n) binary search?
And if we do have a binary search algorithm which find an element of an array in O(log n) time, can we improve the runtime to O(log log n) ?

To have a BIG-O(logn) algorithm, we will have to reduce the problem size by half in constant time.
In this problem specifically, we can select a mid-point, and check if its slope is increasing, decreasing or a bottom.
If the slope is increasing, the part after the mid-point could be ignored
else if the slope is decreasing, the part before the mid-point could be ignored
else the mid-point should be the bottom, hence we find our target.
Java code example :
Input: [99, 97, 89, 1, 2, 4, 6], output: 1
public int findBottomValley(int[] valleySequence) {
int start = 0, end = valleySequence.length - 1, mid;
while (start + 1 < end) {
mid = start + (end - start) / 2;
if (checkSlope(mid, valleySequence) < 0) {
// case decreasing
start = mid;
} else if (checkSlope(mid, valleySequence) > 0) {
// case increasing
end = mid;
} else {
// find our target
return valleySequence[mid];
}
}
// left over with two points
if (valleySequence[start] < valleySequence[end]) {
return valleySequence[start];
}
return valleySequence[end];
}
The helper function checkSlope(index, list) will check the slope at the index of the list, it will check three points including index - 1, index and index + 1. If the slope is decreasing, return negative numbers; if the slope is increasing, return positive numbers; if the numbers at index - 1 and index + 1 are both larger than the number at index, return 0;
Note: the algorithm makes assumptions that:
the list has at least three items
the slope at the adjacent elements cannot be flat, the reason behind this is that if there are adjacent numbers that are equal, then we are unable to decide which side the bottom is. It could appear at the left of such flat slope or on the right, hence we will have to do a linear search.
Since random access of an array is already constant O(1), having an O(logn) access time may not help the algorithm.

There is a solution that works a lot like binary search. Set a = 2 and b = n - 1. At each step, we will only need to consider candidates with index k such that a <= k <= b. Compute m = (a + b) / 2 (integer divide, so round down) and then consider array elements at indices m - 1, m and m + 1. If these elements are decreasing, then set a = m and keep searching. If these elements are increasing, then set b = m and keep searching. If these elements form a valley sequence, then return m as the answer. If b - a < 2, then there is no valley.
Since we halve the search space each time, the complexity is logarithmic. Yes, we access three elements and perform two comparisons at each stage, but calculation will show that just affects constant factors.
Note that this answer depends on these sequences being strictly decreasing and then increasing. If consecutive elements can repeat, the best solution is linear in the worst case.
Just saw the second part. In general, no, a way to find specific elements in logarithmic time - even constant time - is useless in general. The problem is that we really have no useful idea what to look for. If the spacing of all elements' values is greater than their spacing in the array - this isn't hard to arrange - then I can't see how you'd pick something to search for.

Related

Q: Count array pairs with bitwise AND > k ~ better than O(N^2) possible?

Given an array nums
Count no. of pairs (two elements) where bitwise AND is greater than K
Brute force
for i in range(0,n):
for j in range(i+1,n):
if a[i]&a[j] > k:
res += 1
Better version:
preprocess to remove all elements ≤k
and then brute force
But i was wondering, what would be the limit in complexity here?
Can we do better with a trie, hashmap approach like two-sum?
( I did not find this problem on Leetcode so I thought of asking here )
Let size_of_input_array = N. Let the input array be of B-bit numbers
Here is an easy to understand and implement solution.
Eliminate all values <= k.
The above image shows 5 10-bit numbers.
Step 1: Adjacency Graph
Store a list of set bits. In our example, 7th bit is set for numbers at index 0,1,2,3 in the input array.
Step 2: The challenge is to avoid counting the same pairs again.
To solve this challenge we take help of union-find data structure as shown in the code below.
//unordered_map<int, vector<int>> adjacency_graph;
//adjacency_graph has been filled up in step 1
vector<int> parent;
for(int i = 0; i < input_array.size(); i++)
parent.push_back(i);
int result = 0;
for(int i = 0; i < adjacency_graph.size(); i++){ // loop 1
auto v = adjacency_graph[i];
if(v.size() > 1){
int different_parents = 1;
for (int j = 1; j < v.size(); j++) { // loop 2
int x = find(parent, v[j]);
int y = find(parent, v[j - 1]);
if (x != y) {
different_parents++;
union(parent, x, y);
}
}
result += (different_parents * (different_parents - 1)) / 2;
}
}
return result;
In the above code, find and union are from union-find data structure.
Time Complexity:
Step 1:
Build Adjacency Graph: O(BN)
Step 2:
Loop 1: O(B)
Loop 2: O(N * Inverse of Ackermann’s function which is an extremely slow-growing function)
Overall Time Complexity
= O(BN)
Space Complexity
Overall space complexity = O(BN)
First, prune everything <= k. Also Sort the value list.
Going from the most significant bit to the least significant we are going to keep track of the set of numbers we are working with (initially all ,s=0, e=n).
Let p be the first position that contains a 1 in the current set at the current position.
If the bit in k is 0, then everything that would yield a 1 world definetly be good and we need to investigate the ones that get a 0. We have (end - p) * (end-p-1) /2 pairs in the current range and (end-p) * <total 1s in this position larger or equal to end> combinations with larger previously good numbers, that we can add to the solution. To continue we update end = p. We want to count 1s in all the numbers above, because we only counted them before in pairs with each other, not with the numbers this low in the set.
If the bit in k is 1, then we can't count any wins yet, but we need to eliminate everything below p, so we update start = p.
You can stop once you went through all the bits or start==end.
Details:
Since at each step we eliminate either everything that has a 0 or everything that has a 1, then everything between start and end will have the same bit-prefix. since the values are sorted we can do a binary search to find p.
For <total 1s in this position larger than p>. We already have the values sorted. So we can compute partial sums and store for every position in the sorted list the number of 1s in every bit position for all numbers above it.
Complexity:
We got bit-by-bit so L (the bit length of the numbers), we do a binary search (logN), and lookup and updates O(1), so this is O(L logN).
We have to sort O(NlogN).
We have to compute partial bit-wise sums O(L*N).
Total O(L logN + NlogN + L*N).
Since N>>L, L logN is subsummed by NlogN. Since L>>logN (probably, as in you have 32 bit numbers but you don't have 4Billion of them), then NlogN is subsummed by L*N. So complexity is O(L * N). Since we also need to keep the partial sums around the memory complexity is also O(L * N).

How to find pair with kth largest sum?

Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array). For example, with arrays
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
The pairs with largest sums are
13 + 16 = 29
13 + 12 = 25
8 + 16 = 24
13 + 8 = 21
8 + 12 = 20
So the pair with the 4th largest sum is (13, 8). How to find the pair with the kth largest possible sum?
Also, what is the fastest algorithm? The arrays are already sorted and sizes M and N.
I am already aware of the O(Klogk) solution , using Max-Heap given here .
It also is one of the favorite Google interview question , and they demand a O(k) solution .
I've also read somewhere that there exists a O(k) solution, which i am unable to figure out .
Can someone explain the correct solution with a pseudocode .
P.S.
Please DON'T post this link as answer/comment.It DOESN'T contain the answer.
I start with a simple but not quite linear-time algorithm. We choose some value between array1[0]+array2[0] and array1[N-1]+array2[N-1]. Then we determine how many pair sums are greater than this value and how many of them are less. This may be done by iterating the arrays with two pointers: pointer to the first array incremented when sum is too large and pointer to the second array decremented when sum is too small. Repeating this procedure for different values and using binary search (or one-sided binary search) we could find Kth largest sum in O(N log R) time, where N is size of the largest array and R is number of possible values between array1[N-1]+array2[N-1] and array1[0]+array2[0]. This algorithm has linear time complexity only when the array elements are integers bounded by small constant.
Previous algorithm may be improved if we stop binary search as soon as number of pair sums in binary search range decreases from O(N2) to O(N). Then we fill auxiliary array with these pair sums (this may be done with slightly modified two-pointers algorithm). And then we use quickselect algorithm to find Kth largest sum in this auxiliary array. All this does not improve worst-case complexity because we still need O(log R) binary search steps. What if we keep the quickselect part of this algorithm but (to get proper value range) we use something better than binary search?
We could estimate value range with the following trick: get every second element from each array and try to find the pair sum with rank k/4 for these half-arrays (using the same algorithm recursively). Obviously this should give some approximation for needed value range. And in fact slightly improved variant of this trick gives range containing only O(N) elements. This is proven in following paper: "Selection in X + Y and matrices with sorted rows and columns" by A. Mirzaian and E. Arjomandi. This paper contains detailed explanation of the algorithm, proof, complexity analysis, and pseudo-code for all parts of the algorithm except Quickselect. If linear worst-case complexity is required, Quickselect may be augmented with Median of medians algorithm.
This algorithm has complexity O(N). If one of the arrays is shorter than other array (M < N) we could assume that this shorter array is extended to size N with some very small elements so that all calculations in the algorithm use size of the largest array. We don't actually need to extract pairs with these "added" elements and feed them to quickselect, which makes algorithm a little bit faster but does not improve asymptotic complexity.
If k < N we could ignore all the array elements with index greater than k. In this case complexity is equal to O(k). If N < k < N(N-1) we just have better complexity than requested in OP. If k > N(N-1), we'd better solve the opposite problem: k'th smallest sum.
I uploaded simple C++11 implementation to ideone. Code is not optimized and not thoroughly tested. I tried to make it as close as possible to pseudo-code in linked paper. This implementation uses std::nth_element, which allows linear complexity only on average (not worst-case).
A completely different approach to find K'th sum in linear time is based on priority queue (PQ). One variation is to insert largest pair to PQ, then repeatedly remove top of PQ and instead insert up to two pairs (one with decremented index in one array, other with decremented index in other array). And take some measures to prevent inserting duplicate pairs. Other variation is to insert all possible pairs containing largest element of first array, then repeatedly remove top of PQ and instead insert pair with decremented index in first array and same index in second array. In this case there is no need to bother about duplicates.
OP mentions O(K log K) solution where PQ is implemented as max-heap. But in some cases (when array elements are evenly distributed integers with limited range and linear complexity is needed only on average, not worst-case) we could use O(1) time priority queue, for example, as described in this paper: "A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics Simulations" by Gerald Paul. This allows O(K) expected time complexity.
Advantage of this approach is a possibility to provide first K elements in sorted order. Disadvantages are limited choice of array element type, more complex and slower algorithm, worse asymptotic complexity: O(K) > O(N).
EDIT: This does not work. I leave the answer, since apparently I am not the only one who could have this kind of idea; see the discussion below.
A counter-example is x = (2, 3, 6), y = (1, 4, 5) and k=3, where the algorithm gives 7 (3+4) instead of 8 (3+5).
Let x and y be the two arrays, sorted in decreasing order; we want to construct the K-th largest sum.
The variables are: i the index in the first array (element x[i]), j the index in the second array (element y[j]), and k the "order" of the sum (k in 1..K), in the sense that S(k)=x[i]+y[j] will be the k-th greater sum satisfying your conditions (this is the loop invariant).
Start from (i, j) equal to (0, 0): clearly, S(1) = x[0]+y[0].
for k from 1 to K-1, do:
if x[i+1]+ y[j] > x[i] + y[j+1], then i := i+1 (and j does not change) ; else j:=j+1
To see that it works, consider you have S(k) = x[i] + y[j]. Then, S(k+1) is the greatest sum which is lower (or equal) to S(k), and such as at least one element (i or j) changes. It is not difficult to see that exactly one of i or j should change.
If i changes, the greater sum you can construct which is lower than S(k) is by setting i=i+1, because x is decreasing and all the x[i'] + y[j] with i' < i are greater than S(k). The same holds for j, showing that S(k+1) is either x[i+1] + y[j] or x[i] + y[j+1].
Therefore, at the end of the loop you found the K-th greater sum.
tl;dr: If you look ahead and look behind at each iteration, you can start with the end (which is highest) and work back in O(K) time.
Although the insight underlying this approach is, I believe, sound, the code below is not quite correct at present (see comments).
Let's see: first of all, the arrays are sorted. So, if the arrays are a and b with lengths M and N, and as you have arranged them, the largest items are in slots M and N respectively, the largest pair will always be a[M]+b[N].
Now, what's the second largest pair? It's going to have perhaps one of {a[M],b[N]} (it can't have both, because that's just the largest pair again), and at least one of {a[M-1],b[N-1]}. BUT, we also know that if we choose a[M-1]+b[N-1], we can make one of the operands larger by choosing the higher number from the same list, so it will have exactly one number from the last column, and one from the penultimate column.
Consider the following two arrays: a = [1, 2, 53]; b = [66, 67, 68]. Our highest pair is 53+68. If we lose the smaller of those two, our pair is 68+2; if we lose the larger, it's 53+67. So, we have to look ahead to decide what our next pair will be. The simplest lookahead strategy is simply to calculate the sum of both possible pairs. That will always cost two additions, and two comparisons for each transition (three because we need to deal with the case where the sums are equal);let's call that cost Q).
At first, I was tempted to repeat that K-1 times. BUT there's a hitch: the next largest pair might actually be the other pair we can validly make from {{a[M],b[N]}, {a[M-1],b[N-1]}. So, we also need to look behind.
So, let's code (python, should be 2/3 compatible):
def kth(a,b,k):
M = len(a)
N = len(b)
if k > M*N:
raise ValueError("There are only %s possible pairs; you asked for the %sth largest, which is impossible" % M*N,k)
(ia,ib) = M-1,N-1 #0 based arrays
# we need this for lookback
nottakenindices = (0,0) # could be any value
nottakensum = float('-inf')
for i in range(k-1):
optionone = a[ia]+b[ib-1]
optiontwo = a[ia-1]+b[ib]
biggest = max((optionone,optiontwo))
#first deal with look behind
if nottakensum > biggest:
if optionone == biggest:
newnottakenindices = (ia,ib-1)
else: newnottakenindices = (ia-1,ib)
ia,ib = nottakenindices
nottakensum = biggest
nottakenindices = newnottakenindices
#deal with case where indices hit 0
elif ia <= 0 and ib <= 0:
ia = ib = 0
elif ia <= 0:
ib-=1
ia = 0
nottakensum = float('-inf')
elif ib <= 0:
ia-=1
ib = 0
nottakensum = float('-inf')
#lookahead cases
elif optionone > optiontwo:
#then choose the first option as our next pair
nottakensum,nottakenindices = optiontwo,(ia-1,ib)
ib-=1
elif optionone < optiontwo: # choose the second
nottakensum,nottakenindices = optionone,(ia,ib-1)
ia-=1
#next two cases apply if options are equal
elif a[ia] > b[ib]:# drop the smallest
nottakensum,nottakenindices = optiontwo,(ia-1,ib)
ib-=1
else: # might be equal or not - we can choose arbitrarily if equal
nottakensum,nottakenindices = optionone,(ia,ib-1)
ia-=1
#+2 - one for zero-based, one for skipping the 1st largest
data = (i+2,a[ia],b[ib],a[ia]+b[ib],ia,ib)
narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
print (narrative) #this will work in both versions of python
if ia <= 0 and ib <= 0:
raise ValueError("Both arrays exhausted before Kth (%sth) pair reached"%data[0])
return data, narrative
For those without python, here's an ideone: http://ideone.com/tfm2MA
At worst, we have 5 comparisons in each iteration, and K-1 iterations, which means that this is an O(K) algorithm.
Now, it might be possible to exploit information about differences between values to optimise this a little bit, but this accomplishes the goal.
Here's a reference implementation (not O(K), but will always work, unless there's a corner case with cases where pairs have equal sums):
import itertools
def refkth(a,b,k):
(rightia,righta),(rightib,rightb) = sorted(itertools.product(enumerate(a),enumerate(b)), key=lamba((ia,ea),(ib,eb):ea+eb)[k-1]
data = k,righta,rightb,righta+rightb,rightia,rightib
narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
print (narrative) #this will work in both versions of python
return data, narrative
This calculates the cartesian product of the two arrays (i.e. all possible pairs), sorts them by sum, and takes the kth element. The enumerate function decorates each item with its index.
The max-heap algorithm in the other question is simple, fast and correct. Don't knock it. It's really well explained too. https://stackoverflow.com/a/5212618/284795
Might be there isn't any O(k) algorithm. That's okay, O(k log k) is almost as fast.
If the last two solutions were at (a1, b1), (a2, b2), then it seems to me there are only four candidate solutions (a1-1, b1) (a1, b1-1) (a2-1, b2) (a2, b2-1). This intuition could be wrong. Surely there are at most four candidates for each coordinate, and the next highest is among the 16 pairs (a in {a1,a2,a1-1,a2-1}, b in {b1,b2,b1-1,b2-1}). That's O(k).
(No it's not, still not sure whether that's possible.)
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
Merge the 2 arrays and note down the indexes in the sorted array. Here is the index array looks like (starting from 1 not 0)
[1, 2, 4, 6, 8]
[3, 5, 7, 9]
Now start from end and make tuples. sum the elements in the tuple and pick the kth largest sum.
public static List<List<Integer>> optimization(int[] nums1, int[] nums2, int k) {
// 2 * O(n log(n))
Arrays.sort(nums1);
Arrays.sort(nums2);
List<List<Integer>> results = new ArrayList<>(k);
int endIndex = 0;
// Find the number whose square is the first one bigger than k
for (int i = 1; i <= k; i++) {
if (i * i >= k) {
endIndex = i;
break;
}
}
// The following Iteration provides at most endIndex^2 elements, and both arrays are in ascending order,
// so k smallest pairs must can be found in this iteration. To flatten the nested loop, refer
// 'https://stackoverflow.com/questions/7457879/algorithm-to-optimize-nested-loops'
for (int i = 0; i < endIndex * endIndex; i++) {
int m = i / endIndex;
int n = i % endIndex;
List<Integer> item = new ArrayList<>(2);
item.add(nums1[m]);
item.add(nums2[n]);
results.add(item);
}
results.sort(Comparator.comparing(pair->pair.get(0) + pair.get(1)));
return results.stream().limit(k).collect(Collectors.toList());
}
Key to eliminate O(n^2):
Avoid cartesian product(or 'cross join' like operation) of both arrays, which means flattening the nested loop.
Downsize iteration over the 2 arrays.
So:
Sort both arrays (Arrays.sort offers O(n log(n)) performance according to Java doc)
Limit the iteration range to the size which is just big enough to support k smallest pairs searching.

Find triplets in better than linear time such that A[n-1] >= A[n] <= A[n+1]

A sequence of numbers was given in an interview such that A[0] >= A[1] and A[N-1] >= A[N-2]. I was asked to find at-least one triplet such that A[n-1] >= A[n] <= A[n+1].
I tried to solve in iterations. Interviewer expected better than linear time solution. How should I approach this question?
Example: 9 8 5 4 3 2 6 7
Answer: 3 2 6
We can solve this in O(logn) time using divide & conquer aka. binary search. Better than linear time. So we need to find a triplet such that A[n-1] >= A[n] <= A[n+1].
First find the mid of the given array. If mid is smaller than its left and greater than its right. then return, thats your answer. Incidentally this would be a basecase in your recursion. Also if len(arr) < 3 then too return. another basecase.
Now comes the recursion scenarios. When to recurse, we would need to inspect further right. For that, If mid is greater than the element on its left then consider start to left of the array as a subproblem and recurse with this new array. i.e. in tangible terms at this point we would have ...26... with index n being 6. So we move left to see if the element to the left of 2 completes the triplet.
Otherwise if mid is greater than element on its right subarray then consider mid+1 to right of the array as a subproblem and recurse.
More Theory: The above should be sufficient to understand the problem but read on. The problem essentially boils down to finding local minima in a given set of elements. A number in the array is called local minima if it is smaller than both its left and right numbers which precisely boils down to A[n-1] >= A[n] <= A[n+1].
A given array such that its first 2 elements are decreasing and last 2 elements are increasing HAS to have a local minima. Why is that? Lets prove this by negation. If first two numbers are decreasing, and there is no local minima, that means 3rd number is less than 2nd number. otherwise 2nd number would have been local minima. Following the same logic 4th number will have to be less than 3rd number and so on and so forth. So the numbers in the array will have to be in decreasing order. Which violates the constraint of last two numbers being in increasing order. This proves by negation that there need to be a local minima.
The above theory suggests a O(n) linear approach but we definitely can do better. But the theory definitely gives us a different perspective about the problem.
Code: Here's python code (fyi - was typed in stackoverflow text editor freehand, it might misbheave).
def local_minima(arr, start, end):
mid = (start+end)/2
if mid-2 < 0 and mid+1 >= len(arr):
return -1;
if arr[mid-2] > arr[mid-1] and arr[mid-1] < arr[mid]: #found it!
return mid-1;
if arr[mid-1] > arr[mid-2]:
return local_minima(arr, start, mid);
else:
return local_minima(arr, mid, end);
Note that I just return the index of the n. To print out the triple just do -1 and +1 to the returned index. source
It sounds like what you're asking is this:
You have a sequence of numbers. It starts decreasing and continues to decrease until element n, then it starts increasing until the end of the sequence. Find n.
This is a (non-optimal) solution in linear time:
for (i = 1; i < length(A) - 1; i++)
{
if ((A[i-1] >= A[i]) && (A[i] <= A[i+1]))
return i;
}
To do better than linear time, you need to use the information that you get from the fact that the series decreases then increases.
Consider the difference between A[i] and A[i+1]. If A[i] > A[i+1], then n > i, since the values are still decreasing. If A[i] <= A[i+1], then n <= i, since the values are now increasing. In this case you need to check the difference between A[i-1] and A[i].
This is a solution in log time:
int boundUpper = length(A) - 1;
int boundLower = 1;
int i = (boundUpper + boundLower) / 2; //initial estimate
while (true)
{
if (A[i] > A[i+1])
boundLower = i + 1;
else if (A[i-1] >= A[i])
return i;
else
boundUpper = i;
i = (boundLower + boundUpper) / 2;
}
I'll leave it to you to add in the necessary error check in the case that A does not have an element satisfying the criteria.
Linear you could just do by iterating through the set, comparing them all.
You could also check the slope of the first two, then do a kind of binary chop/in order traversal comparing pairs until you find one of the opposite slope. That would amortize to a better than n time, I think, though it's not guaranteed.
edit: just realised what your ordering meant. The binary chop method is guaranteed to do this in <n time, as there is guaranteed to be a point of change (assuming that your N-1, N-2 are the last two elements of the list).
This means you just need to find it/one of them, in which case binary chop will do it in order log(n)

Find subset with elements that are furthest apart from eachother

I have an interview question that I can't seem to figure out. Given an array of size N, find the subset of size k such that the elements in the subset are the furthest apart from each other. In other words, maximize the minimum pairwise distance between the elements.
Example:
Array = [1,2,6,10]
k = 3
answer = [1,6,10]
The bruteforce way requires finding all subsets of size k which is exponential in runtime.
One idea I had was to take values evenly spaced from the array. What I mean by this is
Take the 1st and last element
find the difference between them (in this case 10-1) and divide that by k ((10-1)/3=3)
move 2 pointers inward from both ends, picking out elements that are +/- 3 from your previous pick. So in this case, you start from 1 and 10 and find the closest elements to 4 and 7. That would be 6.
This is based on the intuition that the elements should be as evenly spread as possible. I have no idea how to prove it works/doesn't work. If anyone knows how or has a better algorithm please do share. Thanks!
This can be solved in polynomial time using DP.
The first step is, as you mentioned, sort the list A. Let X[i,j] be the solution for selecting j elements from first i elements A.
Now, X[i+1, j+1] = max( min( X[k,j], A[i+1]-A[k] ) ) over k<=i.
I will leave initialization step and memorization of subset step for you to work on.
In your example (1,2,6,10) it works the following way:
1 2 6 10
1 - - - -
2 - 1 5 9
3 - - 1 4
4 - - - 1
The basic idea is right, I think. You should start by sorting the array, then take the first and the last elements, then determine the rest.
I cannot think of a polynomial algorithm to solve this, so I would suggest one of the two options.
One is to use a search algorithm, branch-and-bound style, since you have a nice heuristic at hand: the upper bound for any solution is the minimum size of the gap between the elements picked so far, so the first guess (evenly spaced cells, as you suggested) can give you a good baseline, which will help prune most of the branches right away. This will work fine for smaller values of k, although the worst case performance is O(N^k).
The other option is to start with the same baseline, calculate the minimum pairwise distance for it and then try to improve it. Say you have a subset with minimum distance of 10, now try to get one with 11. This can be easily done by a greedy algorithm -- pick the first item in the sorted sequence such that the distance between it and the previous item is bigger-or-equal to the distance you want. If you succeed, try increasing further, if you fail -- there is no such subset.
The latter solution can be faster when the array is large and k is relatively large as well, but the elements in the array are relatively small. If they are bound by some value M, this algorithm will take O(N*M) time, or, with a small improvement, O(N*log(M)), where N is the size of the array.
As Evgeny Kluev suggests in his answer, there is also a good upper bound on the maximum pairwise distance, which can be used in either one of these algorithms. So the complexity of the latter is actually O(N*log(M/k)).
You can do this in O(n*(log n) + n*log(M)), where M is max(A) - min(A).
The idea is to use binary search to find the maximum separation possible.
First, sort the array. Then, we just need a helper function that takes in a distance d, and greedily builds the longest subarray possible with consecutive elements separated by at least d. We can do this in O(n) time.
If the generated array has length at least k, then the maximum separation possible is >=d. Otherwise, it's strictly less than d. This means we can use binary search to find the maximum value. With some cleverness, you can shrink the 'low' and 'high' bounds of the binary search, but it's already so fast that sorting would become the bottleneck.
Python code:
def maximize_distance(nums: List[int], k: int) -> List[int]:
"""Given an array of numbers and size k, uses binary search
to find a subset of size k with maximum min-pairwise-distance"""
assert len(nums) >= k
if k == 1:
return [nums[0]]
nums.sort()
def longest_separated_array(desired_distance: int) -> List[int]:
"""Given a distance, returns a subarray of nums
of length k with pairwise differences at least that distance (if
one exists)."""
answer = [nums[0]]
for x in nums[1:]:
if x - answer[-1] >= desired_distance:
answer.append(x)
if len(answer) == k:
break
return answer
low, high = 0, (nums[-1] - nums[0])
while low < high:
mid = (low + high + 1) // 2
if len(longest_separated_array(mid)) == k:
low = mid
else:
high = mid - 1
return longest_separated_array(low)
I suppose your set is ordered. If not, my answer will be changed slightly.
Let's suppose you have an array X = (X1, X2, ..., Xn)
Energy(Xi) = min(|X(i-1) - Xi|, |X(i+1) - Xi|), 1 < i <n
j <- 1
while j < n - k do
X.Exclude(min(Energy(Xi)), 1 < i < n)
j <- j + 1
n <- n - 1
end while
$length = length($array);
sort($array); //sorts the list in ascending order
$differences = ($array << 1) - $array; //gets the difference between each value and the next largest value
sort($differences); //sorts the list in ascending order
$max = ($array[$length-1]-$array[0])/$M; //this is the theoretical max of how large the result can be
$result = array();
for ($i = 0; i < $length-1; $i++){
$count += $differences[i];
if ($length-$i == $M - 1 || $count >= $max){ //if there are either no more coins that can be taken or we have gone above or equal to the theoretical max, add a point
$result.push_back($count);
$count = 0;
$M--;
}
}
return min($result)
For the non-code people: sort the list, find the differences between each 2 sequential elements, sort that list (in ascending order), then loop through it summing up sequential values until you either pass the theoretical max or there arent enough elements remaining; then add that value to a new array and continue until you hit the end of the array. then return the minimum of the newly created array.
This is just a quick draft though. At a quick glance any operation here can be done in linear time (radix sort for the sorts).
For example, with 1, 4, 7, 100, and 200 and M=3, we get:
$differences = 3, 3, 93, 100
$max = (200-1)/3 ~ 67
then we loop:
$count = 3, 3+3=6, 6+93=99 > 67 so we push 99
$count = 100 > 67 so we push 100
min(99,100) = 99
It is a simple exercise to convert this to the set solution that I leave to the reader (P.S. after all the times reading that in a book, I've always wanted to say it :P)

Algorithm to determine indices i..j of array A containing all the elements of another array B

I came across this question on an interview questions thread. Here is the question:
Given two integer arrays A [1..n] and
B[1..m], find the smallest window
in A that contains all elements of
B. In other words, find a pair < i , j >
such that A[i..j] contains B[1..m].
If A doesn't contain all the elements of
B, then i,j can be returned as -1.
The integers in A need not be in the same order as they are in B. If there are more than one smallest window (different, but have the same size), then its enough to return one of them.
Example: A[1,2,5,11,2,6,8,24,101,17,8] and B[5,2,11,8,17]. The algorithm should return i = 2 (index of 5 in A) and j = 9 (index of 17 in A).
Now I can think of two variations.
Let's suppose that B has duplicates.
This variation doesn't consider the number of times each element occurs in B. It just checks for all the unique elements that occur in B and finds the smallest corresponding window in A that satisfies the above problem. For example, if A[1,2,4,5,7] and B[2,2,5], this variation doesn't bother about there being two 2's in B and just checks A for the unique integers in B namely 2 and 5 and hence returns i=1, j=3.
This variation accounts for duplicates in B. If there are two 2's in B, then it expects to see at least two 2's in A as well. If not, it returns -1,-1.
When you answer, please do let me know which variation you are answering. Pseudocode should do. Please mention space and time complexity if it is tricky to calculate it. Mention if your solution assumes array indices to start at 1 or 0 too.
Thanks in advance.
Complexity
Time: O((m+n)log m)
Space: O(m)
The following is provably optimal up to a logarithmic factor. (I believe the log factor cannot be got rid of, and so it's optimal.)
Variant 1 is just a special case of variant 2 with all the multiplicities being 1, after removing duplicates from B. So it's enough to handle the latter variant; if you want variant 1, just remove duplicates in O(m log m) time. In the following, let m denote the number of distinct elements in B. We assume m < n, because otherwise we can just return -1, in constant time.
For each index i in A, we will find the smallest index s[i] such that A[i..s[i]] contains B[1..m], with the right multiplicities. The crucial observation is that s[i] is non-decreasing, and this is what allows us to do it in amortised linear time.
Start with i=j=1. We will keep a tuple (c[1], c[2], ... c[m]) of the number of times each element of B occurs, in the current window A[i..j]. We will also keep a set S of indices (a subset of 1..m) for which the count is "right" (i.e., k for which c[k]=1 in variant 1, or c[k] = <the right number> in variant 2).
So, for i=1, starting with j=1, increment each c[A[j]] (if A[j] was an element of B), check if c[A[j]] is now "right", and add or remove j from S accordingly. Stop when S has size m. You've now found s[1], in at most O(n log m) time. (There are O(n) j's, and each set operation took O(log m) time.)
Now for computing successive s[i]s, do the following. Increment i, decrement c[A[i]], update S accordingly, and, if necessary, increment j until S has size m again. That gives you s[i] for each i. At the end, report the (i,s[i]) for which s[i]-i was smallest.
Note that although it seems that you might be performing up to O(n) steps (incrementing j) for each i, the second pointer j only moves to the right: so the total number of times you can increment j is at most n. (This is amortised analysis.) Each time you increment j, you might perform a set operation that takes O(log m) time, so the total time is O(n log m). The space required was for keeping the tuple of counts, the set of elements of B, the set S, and some constant number of other variables, so O(m) in all.
There is an obvious O(m+n) lower bound, because you need to examine all the elements. So the only question is whether we can prove the log factor is necessary; I believe it is.
Here is the solution I thought of (but it's not very neat).
I am going to illustrate it using the example in the question.
Let A[1,2,5,11,2,6,8,24,101,17,8] and B[5,2,11,8,17]
Sort B. (So B = [2,5,8,11,17]). This step takes O(log m).
Allocate an array C of size A. Iterate through elements of A, binary search for it in the sorted B, if it is found enter it's "index in sorted B + 1" in C. If its not found, enter -1. After this step,
A = [1 , 2, 5, 11, 2, 6, 8, 24, 101, 17, 8] (no changes, quoting for ease).
C = [-1, 1, 2, 4 , 1, -1, 3, -1, -1, 5, 3]
Time: (n log m), Space O(n).
Find the smallest window in C that has all the numbers from 1 to m. For finding the window, I can think of two general directions:
a. A bit oriented approach where in I set the bit corresponding to each position and finally check by some kind of ANDing.
b. Create another array D of size m, go through C and when I encounter p in C, increment D[p]. Use this for finding the window.
Please leave comments regarding the general approach as such, as well as for 3a and 3b.
My solution:
a. Create a hash table with m keys, one for each value in B. Each key in H maps to a dynamic array of sorted indices containing indices in A that are equal to B[i]. This takes O(n) time. We go through each index j in A. If key A[i] exists in H (O(1) time) then add an value containing the index j of A to the list of indices that H[A[i]] maps to.
At this point we have 'binned' n elements into m bins. However, total storage is just O(n).
b. The 2nd part of the algorithm involves maintaining a ‘left’ index and a ‘right’ index for each list in H. Lets create two arrays of size m called L and R that contain these values. Initially in our example,
We also keep track of the “best” minimum window.
We then iterate over the following actions on L and R which are inherently greedy:
i. In each iteration, we compute the minimum and maximum values in L and R.
For L, Lmax - Lmin is the window and for R, Rmax - Rmin is the window. We update the best window if one of these windows is better than the current best window. We use a min heap to keep track of the minimum element in L and a max heap to keep track of the largest element in R. These take O(m*log(m)) time to build.
ii. From a ‘greedy’ perspective, we want to take the action that will minimize the window size in each L and R. For L it intuitively makes sense to increment the minimum index, and for R, it makes sense to decrement the maximum index.
We want to increment the array position for the minimum value until it is larger than the 2nd smallest element in L, and similarly, we want to decrement the array position for the largest value in R until it is smaller than the 2nd largest element in R.
Next, we make a key observation:
If L[i] is the minimum value in L and R[i] is less than the 2nd smallest element in L, ie, if R[i] were to still be the minimum value in L if L[i] were replaced with R[i], then we are done. We now have the “best” index in list i that can contribute to the minimum window. Also, all the other elements in R cannot contribute to the best window since their L values are all larger than L[i]. Similarly if R[j] is the maximum element in R and L[j] is greater than the 2nd largest value in R, we are also done by setting R[j] = L[j]. Any other index in array i to the left of L[j] has already been accounted for as have all indices to the right of R[j], and all indices between L[j] and R[j] will perform poorer than L[j].
Otherwise, we simply increment the array position L[i] until it is larger than the 2nd smallest element in L and decrement array position R[j] (where R[j] is the max in R) until it is smaller than the 2nd largest element in R. We compute the windows and update the best window if one of the L or R windows is smaller than the best window. We can do a Fibonacci search to optimally do the increment / decrement. We keep incrementing L[i] using Fibonacci increments until we are larger than the 2nd largest element in L. We can then perform binary search to get the smallest element L[i] that is larger than the 2nd largest element in L, similar for the set R. After the increment / decrement, we pop the largest element from the max heap for R and the minimum element for the min heap for L and insert the new values of L[i] and R[j] into the heaps. This is an O(log(m)) operation.
Step ii. would terminate when Lmin can’t move any more to the right or Rmax can’t move any more to the left (as the R/L values are the same). Note that we can have scenarios in which L[i] = R[i] but if it is not the minimum element in L or the maximum element in R, the algorithm would still continue.
Runtime analysis:
a. Creation of the hash table takes O(n) time and O(n) space.
b. Creation of heaps: O(m*log(m)) time and O(m) space.
c. The greedy iterative algorithm is a little harder to analyze. Its runtime is really bounded by the distribution of elements. Worst case, we cover all the elements in each array in the hash table. For each element, we perform an O(log(m)) heap update.
Worst case runtime is hence O(n*log(m)) for the iterative greedy algorithm. In the best case, we discover very fast that L[i] = R[i] for the minimum element in L or the maximum element in R…run time is O(1)*log(m) for the greedy algorithm!
Average case seems really hard to analyze. What is the average “convergence” of this algorithm to the minimum window. If we were to assume that the Fibonacci increments / binary search were to help, we could say we only look at m*log(n/m) elements (every list has n/m elements) in the average case. In that case, the running time of the greedy algorithm would be m*log(n/m)*log(m).
Total running time
Best case: O(n + m*log(m) + log(m)) time = O(n) assuming m << n
Average case: O(n + m*log(m) + m*log(n/m)*log(m)) time = O(n) assuming m << n.
Worst case: O(n + n*log(m) + m*log(m)) = O(n*log(m)) assuming m << n.
Space: O(n + m) (hashtable and heaps) always.
Edit: Here is a worked out example:
A[5, 1, 1, 5, 6, 1, 1, 5]
B[5, 6]
H:
{
5 => {1, 4, 8}
6 => {5}
}
Greedy Algorithm:
L => {1, 1}
R => {3, 1}
Iteration 1:
a. Lmin = 1 (since H{5}[1] < H{6}[1]), Lmax = 5. Window: 5 - 1 + 1= 5
Increment Lmin pointer, it now becomes 2.
L => {2, 1}
Rmin = H{6}[1] = 5, Rmax = H{5}[3] = 8. Window = 8 - 5 + 1 = 4. Best window so far = 4 (less than 5 computed above).
We also note the indices in A (5, 8) for the best window.
Decrement Rmax, it now becomes 2 and the value is 4.
R => {2, 1}
b. Now, Lmin = 4 (H{5}[2]) and the index i in L is 1. Lmax = 5 (H{6}[1]) and the index in L is 2.
We can't increment Lmin since L[1] = R[1] = 2. Thus we just compute the window now.
The window = Lmax - Lmin + 1 = 2 which is the best window so far.
Thus, the best window in A = (4, 5).
struct Pair {
int i;
int j;
};
Pair
find_smallest_subarray_window(int *A, size_t n, int *B, size_t m)
{
Pair p;
p.i = -1;
p.j = -1;
// key is array value, value is array index
std::map<int, int> map;
size_t count = 0;
int i;
int j;
for(i = 0; i < n, ++i) {
for(j = 0; j < m; ++j) {
if(A[i] == B[j]) {
if(map.find(A[i]) == map.end()) {
map.insert(std::pair<int, int>(A[i], i));
} else {
int start = findSmallestVal(map);
int end = findLargestVal(map);
int oldLength = end-start;
int oldIndex = map[A[i]];
map[A[i]] = i;
int _start = findSmallestVal(map);
int _end = findLargestVal(map);
int newLength = _end - _start;
if(newLength > oldLength) {
// revert back
map[A[i]] = oldIndex;
}
}
}
}
if(count == m) {
break;
}
}
p.i = findSmallestVal(map);
p.j = findLargestVal(map);
return p;
}

Resources