Sorting as much as possible: values can travel no more than k positions to their left - algorithm

Given an array of length N and an integer K, sort the array as much as possible such that no element travels more than K positions to its left. An element however can travel as much as it likes to its right.
Let's define sortedness as the number of disordered pairs, i.e.: sortedness(1,2,3) = 0 and sortedness(3,1,2) = 2.
Clarification: If the first k+1 items of the array are moved to the end of the array, the other ones should be considered moved k+1 positions to the left.
This is an interview question. I thought of using a bubble sort. The outer loop would run K times with a run-time of O(nk). The smallest integer would be the only integer shifted to the left K times. The other integers would be shifted to the left less than K times.
Is there a more efficient way to approach this problem?

Use a min heap to sort the list of n elements in O(n log k).
Add the first k+1 unsorted elements to the heap.
Repeat this step: pop the min element off the heap, append it to the end of the sorted list, then add the next unsorted element to the heap. Once the input is exhausted, keep popping until the heap is empty.
Because the heap always has at most k+1 elements regardless of n, all heap operations are O(log k), and the total running time is O(n log k).
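The steps above can be sketched in Python as follows. This is a minimal illustration, not the answerer's own code; the function name `sort_within_k` is made up for the example. It uses `heapq.heapreplace`, which pops the minimum before pushing the new element, so a newly read element can never jump more than k positions to the left.

```python
import heapq

def sort_within_k(arr, k):
    """Greedy k-limited sort: no element moves more than k positions left."""
    heap = list(arr[:k + 1])      # the k+1 candidates for output position 0
    heapq.heapify(heap)
    out = []
    for x in arr[k + 1:]:
        # Pop the current minimum first, then push the next unsorted element.
        out.append(heapq.heapreplace(heap, x))
    while heap:                   # input exhausted: drain the heap in order
        out.append(heapq.heappop(heap))
    return out

print(sort_within_k([5, 3, 4, 7, 8, 2, 1, 0], 2))
```

With k = 0 the function returns the input unchanged, and with k >= n - 1 it is an ordinary heap sort.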
Why is this correct?
Suppose it isn't. Then for some inputs my algorithm gives non-optimal sorts. Let I be such an input, let A be the output of my algorithm on I, and let B be the optimal sort.
Let i be the first index where A and B disagree. Let x = A[i], y = B[i], and let j be the index of x in B.
I claim that swapping x and y in B improves the sortedness of B, which is a contradiction.
Because A and B are identical for positions before i, the same set of k+1 elements are eligible to go into position i for both. Because my algorithm chose x to be the min of those elements, we know that x is less than y. We also know j is greater than i.
What happens when we swap x and y in B?
First, note that the change in sortedness is unaffected by anything to the left of i or to the right of j, because their positions relative to both x and y are unchanged by the swap.
We know there are no elements between i and j that are less than x, because my sort chose the smallest available element. Therefore all elements between i and j are at least as large as x.
For each element between i and j equal to x, swapping x and y improves sortedness by 1 because we improve y relative to these elements and x is unaffected.
For each element between i and j greater than x, the sortedness of x relative to these is improved by 1, and in the worst case the sortedness of y relative to these is degraded by 1, so the net effect is at worst 0.
Furthermore, swapping x and y improves the sortedness of x relative to y by 1, so this swap strictly improves overall sortedness.
Contradiction.

Naive approach:
Iterate over the array from left to right. For each position i, consider the subarray from i to i+k. Find the minimum element in this subarray and swap it with the first element of the subarray. Then go to position i+1 and do the same.
Optimized Approach:
We can use segment tree to solve this. Using this data structure you can find the minimum value between any range of an array and also edit any data online in O(logn). In your problem, we can get the solution array using following steps,
arr[1] = minimum value between position 1 to min(k+1,n), then edit this position with infinity
arr[2] = min value between position 1 to min(k+2,n), then edit this position with infinity
arr[3] = min value between position 1 to min(k+3,n), then edit this position with infinity
arr[4] = min value between position 1 to min(k+4,n), then edit this position with infinity
...
...
arr[n] = min value between position 1 to min(k+n,n), then edit this position with infinity
Overall complexity O(nlogn)
for example:
given array = 5 3 4 7 8 2 1 0 and K = 2
using this algorithm you will get the solution array as this:
3 4 5 2 1 0 7 8 sortedness value = 12
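A possible Python sketch of the segment-tree approach is shown here. It assumes numeric values, uses 0-based positions (so output position i draws from input positions 0 to min(i+k, n-1)), and stores (value, index) pairs so the chosen element can be overwritten with infinity. The function name is invented for the example.

```python
import math

def sort_within_k_segtree(arr, k):
    """Repeatedly take the minimum of a growing prefix, then blank it out."""
    n = len(arr)
    size = 1
    while size < max(n, 1):
        size *= 2
    INF = (math.inf, -1)                      # sentinel for removed/empty slots
    tree = [INF] * (2 * size)
    for i, v in enumerate(arr):
        tree[size + i] = (v, i)               # leaves hold (value, index)
    for i in range(size - 1, 0, -1):
        tree[i] = min(tree[2 * i], tree[2 * i + 1])

    def query(lo, hi):
        """Minimum (value, index) over the half-open range [lo, hi)."""
        best = INF
        lo += size
        hi += size
        while lo < hi:
            if lo & 1:
                best = min(best, tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = min(best, tree[hi])
            lo >>= 1
            hi >>= 1
        return best

    def remove(i):
        """Overwrite leaf i with infinity and repair its ancestors."""
        i += size
        tree[i] = INF
        i >>= 1
        while i >= 1:
            tree[i] = min(tree[2 * i], tree[2 * i + 1])
            i >>= 1

    out = []
    for pos in range(n):
        value, idx = query(0, min(pos + k + 1, n))
        out.append(value)
        remove(idx)
    return out

print(sort_within_k_segtree([5, 3, 4, 7, 8, 2, 1, 0], 2))
```

Each of the n steps does one range query and one point update, which gives the O(n log n) total stated above.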
Hope it helps!
Best regards,
Agassaa

Related

Maximum Score a Mathematician can score

For an array A of n integers, a mathematician can perform the following move on the array:
1. Choose an index i(0<=i<length(A)) and add A[i] to the scores.
2. Discard either the left partition(i.e A[0....i-1]) or the right
partition(i.e A[i+1 ... length(A)-1]). the partition discarded can
be empty too. The selected partition becomes the new value of A and
is used for subsequent operations.
Starting from the initial score of 0 mathematician wishes to find the maximum score achievable after K moves.
Example:
A = [4,6,-10,-1,10,-20], K = 4
Maximum Score is 19
Explanation:
- Select A[4](0-based indexing) and keep the left subarray. Now the
score is 10 and A = [4,6,-10,-1].
- Select A[0] and keep the right subarray. Now Score is 10+4=14 and A =
[6,-10,-1].
- Select A[0] and keep the right subarray. Now the score is 14+6=20,
and A = [-10,-1].
- Select A[1] and then right subarray. Now score is 20-1=19 and A = []
So, after K=4 moves, the maximum score is 19
I tried a dynamic programming solution with the following subproblem and recurrence relation:
- opt(i,j,k) = maximum score possible using element from index i to j
in k moves
- opt(i,j,k) = max( opt(i,j,k), a[l] + max(opt(i,l-1,k-1),
opt(l+1,j,k-1)) ) for l ranging from i to j (inclusive).
The complexity of the above DP solution is O(n^3 k).
Can you help me with a better solution?
Let M be the multiset of the K largest values in A. No sequence of K moves can score more than the sum of the elements of M, because each move adds one distinct element of A. That sum is also always achievable: the mathematician can first find M and then repeatedly select the leftmost remaining value that belongs to M, discarding the left part of the array each time, so every other member of M stays in the kept partition. Hence the answer is the sum of M.
You can use Quickselect to achieve O(n) performance on average. If you want to avoid the worst-case performance O(n^2) you can find M using a min heap of size K storing the K largest numbers as you iterate over A. This would lead to O(n * log(K)) time complexity.
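As a small illustration of the heap variant, Python's `heapq.nlargest` keeps a bounded heap internally, so summing its result gives the answer directly. The function name `max_score` is made up for this sketch, and it assumes K <= len(A).

```python
import heapq

def max_score(A, K):
    """Sum of the K largest values of A (assumes K <= len(A))."""
    return sum(heapq.nlargest(K, A))

print(max_score([4, 6, -10, -1, 10, -20], 4))
```

On the example from the question this selects 10, 6, 4 and -1.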

Select pairs of numbers with the minimum overall difference

Given n pairs of numbers, select k pairs so that the difference between the minimum value and the maximum value is minimal. Note that 2 numbers in 1 pair cannot be separated. Example (n=5, k=3):
INPUT    OUTPUT (return the indices of the pairs)
5 4      1 2 4
1 5
9 8
1 0
2 7
In this case, choosing (5,4) (1,5) (1,0) will give a difference of 5 (max is 5, min is 0). I'm looking for an efficient way (n log n) of doing this since the input will be pretty large and I don't want to go through every possible case.
Thank you.
NOTE: No code is needed. An explanation of the solution is enough.
Here's a method with O(n log n) time complexity:
First sort the array according to the smaller number in the pair. Now iterate back from the last element in the sorted array (the pair with the highest minimum).
As we go backwards, the elements already visited will necessarily have an equal or higher minimum than the current element. Store the visited pairs in a max heap according to the maximal number in the visited pair. If the heap size is smaller than k-1, keep adding to the heap.
Once the heap size equals k-1, begin recording and comparing the best interval so far. If the heap size exceeds k-1, pop the maximal element off. The heap is guaranteed to contain the first k-1 pairs where the minimal number is greater than or equal to the current minimal number and the maximal is smallest (since we keep popping off the maximal element when the heap size exceeds k-1).
Total time O(n log n) for sorting + O(n log n) to iterate and maintain the heap = O(n log n) in total.
Example:
5 4
1 5
9 8
1 0
2 7
k = 3
Sort pairs by the smaller number in each pair:
[(1,0),(1,5),(2,7),(5,4),(9,8)]
Iterate from end to start:
i = 4; Insert (9,8) into heap
i = 3; Insert (5,4) into heap
i = 2; Range = 2-9
i = 1; Pop (9,8) from heap; Range = 1-7
i = 0; Pop (2,7) from heap; Range = 0-5
Minimal interval [0,5] (find k matching indices in O(n) time)
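The walkthrough above could be coded roughly like this. It is a sketch under the assumption that only the width of the best interval is needed (recovering the k indices is the separate O(n) pass mentioned above); `min_spread` is a hypothetical name, and the max heap is simulated by pushing negated values onto Python's min heap.

```python
import heapq

def min_spread(pairs, k):
    """Smallest (max - min) over all numbers in any k chosen pairs."""
    # Normalise each pair to (low, high) and sort by the low value.
    spans = sorted((min(a, b), max(a, b)) for a, b in pairs)
    heap = []      # negated highs: the k-1 smallest highs among visited pairs
    best = None
    for low, high in reversed(spans):
        if len(heap) == k - 1:
            # Current pair supplies the minimum; the heap supplies k-1 partners.
            top = max(high, -heap[0]) if heap else high
            if best is None or top - low < best:
                best = top - low
        heapq.heappush(heap, -high)
        if len(heap) > k - 1:
            heapq.heappop(heap)    # drop the pair with the largest high
    return best

print(min_spread([(5, 4), (1, 5), (9, 8), (1, 0), (2, 7)], 3))
```

The intermediate widths on the example are 7, 6 and 5, matching the ranges 2-9, 1-7 and 0-5 in the trace.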
Let's keep two sorted arrays: one sorted by the minimal number in each pair and the other by the maximal. Iterate over the first array, fixing the minimal number of the answer. Keep a pointer to the k-th number in the second array. When we move to the next pair, we remove from the second array all pairs with a smaller minimal value and advance the pointer if needed. To find a pair's position in the second array in O(log n) time, we can keep an additional map from pair to position.

Maximum of all possible subarrays of an array

How do I find/store maximum/minimum of all possible non-empty sub-arrays of an array of length n?
I built a segment tree for the array and then, for each possible sub-array, queried the segment tree, but that's not efficient. How do I do it in O(n)?
P.S. n <= 10^7
E.g. arr[] = { 1, 2, 3 }; // the array need not be sorted
sub-array min max
{1} 1 1
{2} 2 2
{3} 3 3
{1,2} 1 2
{2,3} 2 3
{1,2,3} 1 3
I don't think it is possible to store all those values in O(n). But it is pretty easy to create, in O(n), a structure that makes it possible to answer in O(1) the query "how many subarrays are there where A[i] is the maximum element".
Naïve version:
Think about the naïve strategy: to know how many such subarrays there are for some A[i], you could employ a simple O(n) algorithm that counts how many consecutive elements immediately to the left and to the right of A[i] are less than A[i]. Let's say:
A = [... 10 1 1 1 5 1 1 10 ...]
The 5 above has 3 smaller elements immediately to its left and 2 immediately to its right. From this we know there are 4*3=12 subarrays for which that very 5 is the maximum: the subarray can extend 0 to 3 elements to the left (4 choices) and 0 to 2 elements to the right (3 choices).
Optimized version:
This naïve version of the check would take O(n) operations for each element, so O(n^2) after all. Wouldn't it be nice if we could compute all these lengths in O(n) in a single pass?
Luckily there is a simple algorithm for that. Just use a stack. Traverse the array normally (from left to right) and push every element's index onto the stack. But before pushing an index, pop all the indexes whose values are less than the current value. The index left on top of the stack, if any, is the nearest larger element to the left.
To find the same values at the right, just traverse the array backwards.
Here's a sample Python proof-of-concept that shows this algorithm in action. I implemented also the naïve version so we can cross-check the result from the optimized version:
from random import choice
from collections import defaultdict, deque

def make_bounds(A, fallback, arange, op):
    # For each i, record the nearest earlier index (in traversal order)
    # that survives the popping rule `op`, or `fallback` if none does.
    stack = deque()
    bound = [fallback] * len(A)
    for i in arange:
        while stack and op(A[stack[-1]], A[i]):
            stack.pop()
        if stack:
            bound[i] = stack[-1]
        stack.append(i)
    return bound

def optimized_version(A):
    # Left bound: nearest strictly larger; right bound: nearest larger-or-equal,
    # so that ties are attributed to exactly one position.
    T = list(zip(make_bounds(A, -1, range(len(A)), lambda x, y: x <= y),
                 make_bounds(A, len(A), reversed(range(len(A))), lambda x, y: x < y)))
    answer = defaultdict(lambda: 0)
    for i, x in enumerate(A):
        left, right = T[i]
        answer[x] += (i - left) * (right - i)
    return dict(answer)

def naive_version(A):
    answer = defaultdict(lambda: 0)
    for i, x in enumerate(A):
        left = next((j for j in range(i - 1, -1, -1) if A[j] > A[i]), -1)
        right = next((j for j in range(i + 1, len(A)) if A[j] >= A[i]), len(A))
        answer[x] += (i - left) * (right - i)
    return dict(answer)

A = [choice(range(32)) for i in range(8)]
MA1 = naive_version(A)
MA2 = optimized_version(A)
print('Array:    ', A)
print('Naive:    ', MA1)
print('Optimized:', MA2)
print('OK:       ', MA1 == MA2)
I don't think it is possible to do it directly in O(n) time: you need to iterate over all the elements of the subarrays, and you have n of them, unless the subarrays are sorted.
On the other hand, when initialising the subarrays, instead of making them normal arrays you could build heaps: min heaps when you want to find the minimum and max heaps when you want to find the maximum.
Building a heap is a linear time operation, and retrieving the maximum and minimum respectively for a max heap and min heap is a constant time operation, since those elements are found at the first place of the heap.
Heaps can be easily implemented just using a normal array.
Check this article on Wikipedia about binary heaps: https://en.wikipedia.org/wiki/Binary_heap.
I do not understand what exactly you mean by maximum of sub-arrays, so I will assume you are asking for one of the following
The subarray of maximum/minimum length or some other criteria (in which case the problem will reduce to finding max element in a 1 dimensional array)
The maximum elements of all your sub-arrays either in the context of one sub-array or in the context of the entire super-array
Problem 1 can be solved by simply iterating over your super-array and storing a reference to the largest element, or by building a heap as nbro said. Problem 2 has a similar solution. However, a linear scan through n arrays of length m is not going to be linear, so you will have to keep your class invariants such that the maximum/minimum is known after every operation, maybe with the help of a data structure like a heap.
Assuming you mean contiguous sub-arrays, create the array of partial sums Y, where Y_0 = 0 and Y_i = X_1 + ... + X_i, so from 1,4,2,3 create 0, 1, 1+4=5, 1+4+2=7, 1+4+2+3=10. You can create this from left to right in linear time, and the sum of any contiguous subarray is one partial sum subtracted from another, so 4+2+3 = (1+4+2+3) - 1 = 9.
Then scan through the partial sums from left to right, keeping track of the smallest value seen so far (including the initial zero). At each point subtract this from the current value and keep track of the highest value produced in this way. This should give you the value of the contiguous sub-array with largest sum, and you can keep index information, too, to find where this sub-array starts and ends.
To find the minimum, either change the above slightly or just reverse the sign of all the numbers and do exactly the same thing again: min(a, b) = -max(-a, -b)
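A short Python sketch of this partial-sum scan, under the answer's interpretation that "maximum of a subarray" means its sum. The function names are invented for the example; the minimum is obtained by negation as described above.

```python
def max_subarray_sum(xs):
    """Largest sum of a non-empty contiguous subarray, via partial sums."""
    best = None
    prefix = 0        # current partial sum
    min_prefix = 0    # smallest earlier partial sum, including the initial zero
    for x in xs:
        prefix += x
        candidate = prefix - min_prefix
        if best is None or candidate > best:
            best = candidate
        min_prefix = min(min_prefix, prefix)
    return best

def min_subarray_sum(xs):
    """min(a, b) = -max(-a, -b): negate, maximise, negate back."""
    return -max_subarray_sum([-x for x in xs])

print(max_subarray_sum([1, 4, 2, 3]), min_subarray_sum([1, 4, 2, 3]))
```

Updating `min_prefix` only after the candidate is computed keeps the subarray non-empty, so an all-negative input returns its largest single element.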
I think the question you are asking is how to find the maximum sum of a subarray.
Below is code that can do that in O(n) time.
#include <algorithm>
#include <vector>

// Assumes `a` is non-empty.
int maxSumSubArr(const std::vector<int>& a)
{
    // If every element is negative, the best subarray is the largest single element.
    int maxsum = *std::max_element(a.begin(), a.end());
    if (maxsum < 0) return maxsum;
    int sum = 0;
    for (int x : a)
    {
        sum += x;
        if (sum > maxsum) maxsum = sum;
        if (sum < 0) sum = 0;  // a negative running sum never helps; restart
    }
    return maxsum;
}
Note: this code is not tested; please add a comment if you find any issues.

Maximise the minimum difference [duplicate]

We are given N elements in the form of an array A. We have to choose K of the N indexes such that the minimum value of |A[i]-A[j]| over any two chosen indexes i and j is as large as possible. We need to report this maximum value.
Let's take an example: let N=5 and K=2 and the array be [1,5,3,7,11]. The answer is 10, as we can simply choose the first and last positions, and the difference is 11-1=10.
Example 2: let N=10 and K=3 and the array A be [3 9 6 11 15 20 23]. The answer is 8, as we can select [3,11,23] or [3,15,23].
Now given N , K and Array A we need to find this maximum difference.
We are given that 1 ≤ N ≤ 10^5 and 1 ≤ S ≤ 10^7
Let's sort the array.
Now we can do a binary search over the answer.
For a fixed candidate x, we can just pick the elements greedily (iterating over the sorted array and taking each element whose distance from the last taken element is at least x). If the number of elements we have picked is not less than K, x is feasible. Otherwise, it is not.
The time complexity is O(N * log N + N * log (MAX_ELEMENT - MIN_ELEMENT))
Pseudocode:
bool isFeasible(int x):
    cnt = 1
    last = a[0]
    for i <- 1 ... n - 1:
        if a[i] - last >= x:
            last = a[i]
            cnt++
    return cnt >= k

sort(a)
low = 0
high = a[n - 1] - a[0] + 1
while high - low > 1:
    mid = low + (high - low) / 2
    if isFeasible(mid):
        low = mid
    else:
        high = mid
print(low)
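The pseudocode translates almost line for line into Python. The sketch below assumes integer values and 2 <= K <= N; the name `max_min_difference` is made up for the example.

```python
def max_min_difference(values, k):
    """Largest x such that k values can be chosen pairwise at least x apart."""
    a = sorted(values)

    def is_feasible(x):
        # Greedy: always keep the first element, then every element
        # that is at least x away from the last kept one.
        count, last = 1, a[0]
        for v in a[1:]:
            if v - last >= x:
                last = v
                count += 1
        return count >= k

    low, high = 0, a[-1] - a[0] + 1   # low is feasible, high is not
    while high - low > 1:
        mid = low + (high - low) // 2
        if is_feasible(mid):
            low = mid
        else:
            high = mid
    return low

print(max_min_difference([1, 5, 3, 7, 11], 2))
```

The loop maintains the invariant that `low` is feasible and `high` is infeasible, so it ends with `low` equal to the largest feasible distance.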
I think this can be dealt with as a dynamic programming problem. Start off by sorting A, and then the problem is to mark K elements in A such that the minimum difference between adjacent marked items is as large as possible. As a starter, you can always mark the first and last elements.
Moving from left to right, at each position for i=1..N work out the largest minimum difference you can get by marking i elements in the sub-array terminating at this position. You can work out the largest minimum difference for k items terminating at this position by considering the largest minimum difference for k-1 items terminating at each position to the left of the position you are working on. The obvious thing to do is to consider each possible position up to the position you are currently working on as ending a stretch of k-1 items with minimum difference, but you may be able to do a binary search here to speed things up.
Once you have worked all the way to the right hand end you know the maximum possible value for the original problem. If you need to know where to put the K elements, you can take notes as you go along so that you can backtrack to find out the elements chosen that lead to this solution, working from right to left.

Generate a random integer from 0 to N-1 which is not in the list

You are given N and an int K[].
The task at hand is to generate a uniformly distributed random number between 0 and N-1 which doesn't exist in K.
N is an integer >= 0.
K.length < N-1 and 0 <= K[i] <= N-1. Also assume K is sorted and each element of K is unique.
You are given a function uniformRand(int M) which generates a uniform random number in the range 0 to M-1. Assume this function's complexity is O(1).
Example:
N = 7
K = {0, 1, 5}
the function should return any random number { 2, 3, 4, 6 } with equal
probability.
I could get an O(N) solution for this: first generate a random number in the range 0 to N - K.length - 1, then map that random number to a number not in K. The second step takes the complexity to O(N). Can it be done better, maybe in O(log N)?
You can use the fact that all the numbers in K[] are between 0 and N-1 and they are distinct.
For your example case, you generate a random number from 0 to 3. Say you get a random number r. Now you conduct binary search on the array K[].
Initialize i = K.length/2.
Find K[i] - i. This will give you the count of numbers missing from the array in the range 0 to K[i].
For example K[2] = 5, so 5 - 2 = 3 numbers are missing between 0 and K[2] (namely 2, 3, 4).
Because you know r, you can decide whether to continue the search in the first part of array K or in the latter part.
This search has complexity O(log(K.length)).
EDIT: For example,
N = 7
K = {0, 1, 4} // modified the array to clarify the algorithm steps.
the function should return any random number { 2, 3, 5, 6 } with equal probability.
Random number generated between 0 and N-K.length = random{0-3}. Say we get 3. Hence we require the 4th missing number in array K.
Conduct binary search on array K[].
Initial i = K.length/2 = 1.
Now we see K[1] - 1 = 0. Hence no number is missing upto i = 1. Hence we search on the latter part of the array.
Now i = 2. K[2] - 2 = 4 - 2 = 2. Hence there are 2 missing numbers up to index i = 2. But we need the 4th missing element. So we again have to search in the latter part of the array.
Now we reach an empty array. What should we do now? If we reach an empty array between say K[j] & K[j+1] then it simply means that all elements between K[j] and K[j+1] are missing from the array K.
Hence all elements above K[2] are missing from the array, namely 5 and 6. We need the 4th element out of which we have already discarded 2 elements. Hence we will choose the second element which is 6.
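One way to write this binary search in Python is sketched below. It is an illustration under stated assumptions: `random.randrange` stands in for the `uniformRand` function from the question, r is 0-based (so r = 3 means the 4th missing number), and both function names are invented here.

```python
import random

def kth_missing(K, r):
    """r-th (0-based) smallest non-negative integer that is not in sorted K."""
    lo, hi = 0, len(K)
    # Find the first index i where K[i] - i (the count of missing
    # numbers below K[i]) exceeds r.
    while lo < hi:
        mid = (lo + hi) // 2
        if K[mid] - mid > r:
            hi = mid
        else:
            lo = mid + 1
    # Exactly lo elements of K are smaller than the answer.
    return r + lo

def random_excluding(N, K, uniform_rand=random.randrange):
    """Uniform random integer in [0, N) that is not in K."""
    return kth_missing(K, uniform_rand(N - len(K)))

print([kth_missing([0, 1, 4], r) for r in range(4)])
```

For K = {0, 1, 4} and r = 3 the search runs off the right end of K (lo = 3), and the result is 3 + 3 = 6, as in the walkthrough.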
Binary search.
The basic algorithm:
(not quite the same as the other answer - the number is only generated at the end)
Start in the middle of K.
By looking at the current value and its index, we can determine the number of pickable numbers (numbers not in K) to the left.
Similarly, by including N, we can determine the number of pickable numbers to the right.
Now randomly go either left or right, weighted based on the count of pickable numbers on each side.
Repeat in the chosen subarray until the subarray is empty.
Then generate a random number in the range consisting of the numbers before and after the subarray in the array.
The running time would be O(log |K|), and, since |K| < N-1, O(log N).
The exact mathematics for number counts and weights can be derived from the example below.
Extension with K containing a bigger range:
Now let's say (for enrichment purposes) K can also contain values N or larger.
Then, instead of starting with the entire K, we start with a subarray up to position min(N, |K|), and start in the middle of that.
It's easy to see that the N-th position in K (if one exists) will be >= N, so this chosen range includes any possible number we can generate.
From here, we need to do a binary search for N (which would give us a point where all values to the left are < N, even if N could not be found) (the above algorithm doesn't deal with K containing values greater than N).
Then we just run the algorithm as above with the subarray ending at the last value < N.
The running time would be O(log N), or, more specifically, O(log min(N, |K|)).
Example:
N = 10
K = {0, 1, 4, 5, 8}
So we start in the middle - 4.
Given that we're at index 2, we know there are 2 elements to the left, and the value is 4, so there are 4 - 2 = 2 pickable values to the left.
Similarly, there are 10 - (4+1) - 2 = 3 pickable values to the right.
So now we go left with probability 2/(2+3) and right with probability 3/(2+3).
Let's say we went right, and our next middle value is 5.
We are at the first position in this subarray, and the previous value is 4, so we have 5 - (4+1) = 0 pickable values to the left.
And there are 10 - (5+1) - 1 = 3 pickable values to the right.
We can't go left (0 probability). If we go right, our next middle value would be 8.
There would be 2 pickable values to the left, and 1 to the right.
If we go left, we'd have an empty subarray.
So then we'd generate a number between 5 and 8, which would be 6 or 7 with equal probability.
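The weighted descent could look like this in Python. It is a sketch, not the answerer's code: `pick_by_descent` is a hypothetical name, `rng.randrange` plays the role of `uniformRand`, and the midpoint convention may differ from the example above (which does not change the distribution).

```python
import random

def pick_by_descent(N, K, rng=random):
    """Uniform random integer in [0, N) not in sorted K, by weighted descent."""
    lo, hi = 0, len(K)            # current subarray is K[lo:hi]
    low_val, high_val = -1, N     # values bounding the subarray on each side
    while lo < hi:
        mid = (lo + hi) // 2
        # Pickable numbers strictly between low_val and K[mid].
        left = K[mid] - low_val - 1 - (mid - lo)
        # Pickable numbers strictly between K[mid] and high_val.
        right = high_val - K[mid] - 1 - (hi - mid - 1)
        # Go left with probability left / (left + right).
        if rng.randrange(left + right) < left:
            hi, high_val = mid, K[mid]
        else:
            lo, low_val = mid + 1, K[mid]
    # The subarray is empty: every number between the bounds is pickable.
    return rng.randrange(low_val + 1, high_val)

print(pick_by_descent(10, [0, 1, 4, 5, 8]))
```

At the first step for N = 10 and K = {0, 1, 4, 5, 8} the counts are left = 2 and right = 3, as in the example. A side with zero pickable numbers is never chosen, so the final range is never empty.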
This can be solved by basically solving this:
Find the rth smallest number not in the given array, K, subject to
conditions in the question.
For that consider the implicit array D, defined by
D[i] = K[i] - i for 0 <= i < L, where L is length of K
We also set D[-1] = 0 and D[L] = N
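We also define K[-1] = -1, so that when no element of K precedes the answer the formula below yields r - 1, the rth smallest non-negative integer.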
Note, we don't actually need to construct D. Also note that D is sorted (and all elements non-negative), as the numbers in K[] are unique and increasing.
Now we make the following claim:
CLAIM: To find the rth smallest number not in K[], let r' be the largest number in D that is < r, and let j be the position of the rightmost occurrence of r' in D. Such an r' exists, because D[-1] = 0. Once we find r' and j, the number we are looking for is r-r' + K[j].
Proof: The definition of r' and j tells us that there are exactly r' numbers missing from 0 to K[j], and at least r numbers missing from 0 to K[j+1]. Thus all the numbers from K[j]+1 to K[j+1]-1 are missing (and there are at least r-r' of them), and the number we seek is among them, given by K[j] + r-r'.
Algorithm:
In order to find (r',j) all we need to do is a (modified) binary search for r in D, where we keep moving to the left even if we find r in the array.
This is an O(log L) algorithm, where L is the length of K.
If you are running this many times, it probably pays to speed up your generation operation, for cases where O(log N) time per call just isn't acceptable.
Make an empty array G. Starting at zero, count upwards while progressing through the values of K. If a value isn't in K add it to G. If it is in K don't add it and progress your K pointer. (This relies on K being sorted.)
Now you have an array G which has only acceptable numbers.
Use your random number generator to choose a value from G.
This requires O(N) preparatory work and each generation happens in O(1) time. After N look-ups the amortized time of all operations is O(1).
A Python mock-up:
import random

class PRNG:
    def __init__(self, K, N):
        # Precompute G, the list of all numbers in [0, N) that are not in K.
        self.G = []
        kptr = 0
        for i in range(N):
            if kptr < len(K) and K[kptr] == i:
                kptr += 1
            else:
                self.G.append(i)

    def getRand(self):
        # random.randint is inclusive on both ends.
        rn = random.randint(0, len(self.G) - 1)
        return self.G[rn]

prng = PRNG([0, 1, 5], 7)
for i in range(20):
    print(prng.getRand())

Resources