Smallest missing integer algorithm that runs in O(n)? - algorithm

What algorithm might find a missing integer in O(n) time, from an array?
Say we have an array A with elements in a value-range {1,2,3...2n}. Half the elements are missing so length of A = n.
E.g:
A = [1,2,5,3,10] , n=5
Output = 4

The smallest missing integer must be in the range [1, ..., n+1]. So create an array of flags, all initially false, indicating the presence of that integer. Then an algorithm is:
Scan the input array, setting flags to true as you encounter values in the range. This operation is O(n). (That is, set flag[A[i]] to true for each position i in the input array, provided A[i] <= n.)
Scan the flag array for the first false flag. This operation is also O(n). The index of the first false flag is the smallest missing integer.
EDIT: O(n) time algorithm with O(1) extra space:
If A is writable and there are some extra bits available in the elements of A, then a constant-extra-space algorithm is possible. For instance, if the elements of A are signed values, and since all the numbers are positive, we can use the sign bit of the numbers in the original array as the flags, rather than creating a new flag array. So the algorithm would be:
For each position i of the original array, if abs(A[i]) < n+1, make the value at A[abs(A[i])] negative. (This assumes array indexes are based at 1. Adjust in the obvious way if you are using 0-based arrays.) Don't just negate the value, in case there are duplicate values in A.
Find the index of the first element of A that is positive. That index is the smallest missing number in A. If all positions are negative, then A must be a permutation of {1, ..., n} and hence the smallest missing number is n+1.
If the elements are unsigned, but can hold values as high as 4 n + 1, then in step 1, instead of making the element negative, add 2 n + 1 (provided the element is <= 2 n) and use (A[i] mod (2n+1)) instead of abs(A[i]). Then in step 2, find the first element < 2 n + 1 instead of the first positive element. Other such tricks are possible as well.

You can do this in O(1) additional space, assuming that the only valid operations on the array is to read elements, and to swap pairs of elements.
First note that the specification of the problem excludes the possibility of the array containing duplicates: it contains half of the numbers from 1 to 2N.
We perform a quick-select type algorithm. Start with m=1, M=2N+1, and pivot the array on (m + M)/2. If the size of the left part of the array (elements <= (m+M)/2) is less than (m + M)/2 - m + 1, then the first missing number must be there. Otherwise, it must be in the right part of the array. Repeat on the left or right side accordingly until you find the missing number.
The size of the slice of the array under consideration halves each time and pivoting an array of size n can be done in O(n) time and O(1) space. So overall, the time complexity is 2N + N + N/2 + ... + 1 <= 4N = O(N).

An implementation of Paul Hankin's idea in C++
#include <iostream>
using namespace std;
const int MAX = 1000;
int a[MAX];
int n;
void swap(int &a, int &b) {
int tmp = a;
a = b;
b = tmp;
}
// Rearranges elements of a[l..r] in such a way that first come elements
// lower or equal to M, next come elements greater than M. Elements in each group
// come in no particular order.
// Returns an index of the first element among a[l..r] which is greater than M.
int rearrange(int l, int r, int M) {
int i = l, j = r;
while (i <= j)
if (a[i] <= M) i++;
else swap(a[i], a[j--]);
return i;
}
int main() {
cin >> n;
for (int i = 0; i < n; i++) cin >> a[i];
int L = 1, R = 2 * n;
int l = 0, r = n - 1;
while (L < R) {
int M = (L + R) / 2; // pivot element
int m = rearrange(l, r, M);
if (m - l == M - L + 1)
l = m, L = M + 1;
else
r = m - 1, R = M;
}
cout << L;
return 0;
}

Related

O(n) solution to counting sub-arrays with sum constraints

I'm trying to improve my intuition around the following two sub-array problems.
Problem one
Return the length of the shortest, non-empty, contiguous sub-array of A with sum at least
K. If there is no non-empty sub-array with sum at least K, return -1
I've come across an O(N) solution online.
public int shortestSubarray(int[] A, int K) {
int N = A.length;
long[] P = new long[N+1];
for (int i = 0; i < N; ++i)
P[i+1] = P[i] + (long) A[i];
// Want smallest y-x with P[y] - P[x] >= K
int ans = N+1; // N+1 is impossible
Deque<Integer> monoq = new LinkedList(); //opt(y) candidates, as indices of P
for (int y = 0; y < P.length; ++y) {
// Want opt(y) = largest x with P[x] <= P[y] - K;
while (!monoq.isEmpty() && P[y] <= P[monoq.getLast()])
monoq.removeLast();
while (!monoq.isEmpty() && P[y] >= P[monoq.getFirst()] + K)
ans = Math.min(ans, y - monoq.removeFirst());
monoq.addLast(y);
}
return ans < N+1 ? ans : -1;
}
It seems to be maintaining a sliding window with a deque. It looks like a variant of Kadane's algorithm.
Problem two
Given an array of N integers (positive and negative), find the number of
contiguous sub array whose sum is greater or equal to K (also, positive or
negative)"
The best solution I've seen to this problem is O(nlogn) as described in the following answer.
tree = an empty search tree
result = 0
// This sum corresponds to an empty prefix.
prefixSum = 0
tree.add(prefixSum)
// Iterate over the input array from left to right.
for elem <- array:
prefixSum += elem
// Add the number of subarrays that have this element as the last one
// and their sum is not less than K.
result += tree.getNumberOfLessOrEqual(prefixSum - K)
// Add the current prefix sum the tree.
tree.add(prefixSum)
print result
My questions
Is my intuition that algorithm one is a variant of Kandane's algorithm correct?
If so, is there a variant of this algorithm (or another O(n) solution) that can be used to solve problem two?
Why can problem two only be solved in O(nlogn) time when they look so similar?

Finding subarray with target bitwise AND value

Given an array A of size N and an integer P, find the subarray B = A[i...j] such that i <= j, compute the bitwise value of subarray elements say K = B[i] & B[i + 1] & ... & B[j].
Output the minimum value of |K-P| among all possible values of K.
Here is a a quasilinear approach, assuming the elements of the array have a constant number of bits.
The rows of the matrix K[i,j] = A[i] & A[i + 1] & ... & A[j] are monotonically decreasing (ignore the lower triangle of the matrix). That means the absolute value of the difference between K[i,:] and the search parameter P is unimodal and a minimum (not necessarily the minimum as the same minimum may occur several times, but then they will do so in a row) can be found in O(log n) time with ternary search (assuming access to elements of K can be arranged in constant time). Repeat this for every row and output the position of the lowest minimum, bringing it up to O(n log n).
Performing the row-minimum search in a time less than the size of row requires implicit access to the elements of the matrix K, which could be accomplished by creating b prefix-sum arrays, one for each bit of the elements of A. A range-AND can then be found by calculating all b single-bit range-sums and comparing them with the length of the range, each comparison giving a single bit of the range-AND. This takes O(nb) preprocessing and gives O(b) (so constant, by the assumption I made at the beginning) access to arbitrary elements of K.
I had hoped that the matrix of absolute differences would be a Monge matrix allowing the SMAWK algorithm to be used, but that does not seem to be the case and I could not find a way to push to towards that property.
Are you familiar with the Find subarray with given sum problem? The solution I'm proposing uses the same method as in the efficient solution in the link. It is highly recommended to read it before continuing.
First let's notice that the longer a subarray its K will be it will be smaller, since the & operator between two numbers can create only a smaller number.
So if I have a subarray from i to j and I want want to make its K smaller I'll add more elements (now the subarray is from i to j + 1), if I want to make K larger I'll remove elements (i + 1 to j).
If we review the solution to Find subarray with given sum we see that we can easily transform it to our problem - the given sum is K and summing is like using the & operator, but more elements is smaller K so we can flip the comparison of the sums.
This problem tells you if the solution exist but if you simply maintain the minimal difference you found so far you can solve your problem as well.
Edit
This solution is true if all the numbers are positive, as mentioned in the comments, if not all the numbers are positive the solution is slightly different.
Notice that if not all of the numbers are negative, the K will be positive, so in order to find a negative P we can consider only the negatives in the algorithm, than use the algorithm as shown above.
Here an other quasi-linear algorithm, mixing the yonlif Find subarray with given sum problem solution with Harold idea to compute K[i,j]; therefore I don't use pre-processing which if memory-hungry. I use a counter to keep trace of bits and compute at most 2N values of K, each costing at most O(log N). since log N is generally smaller than the word size (B), it's faster than a linear O(NB) algorithm.
Counts of bits of N numbers can be done with only ~log N words :
So you can compute A[i]&A[i+1]& ... &A[I+N-1] with only log N operations.
Here the way to manage the counter: if
counter is C0,C1, ...Cp, and
Ck is Ck0,Ck1, ...Ckm,
Then Cpq ... C1q,C0q is the binary representation of the number of bits equal to 1 among the q-th bit of {A[i],A[i+1], ... ,A[j-1]}.
The bit-level implementation (in python); all bits are managed in parallel.
def add(counter,x):
k = 0
while x :
x, counter[k] = x & counter[k], x ^ counter[k]
k += 1
def sub(counter,x):
k = 0
while x :
x, counter[k] = x & ~counter[k], x ^ counter[k]
k += 1
def val(counter,count): # return A[i] & .... & A[j-1] if count = j-i.
k = 0
res = -1
while count:
if count %2 > 0 : res &= counter[k]
else: res &= ~counter[k]
count //= 2
k += 1
return res
And the algorithm :
def solve(A,P):
counter = np.zeros(32, np.int64) # up to 4Go
n = A.size
i = j = 0
K=P # trig fill buffer
mini = np.int64(2**63-1)
while i<n :
if K<P or j == n : # dump buffer
sub(counter,A[i])
i += 1
else: # fill buffer
add(counter,A[j])
j += 1
if j>i:
K = val(counter, count)
X = np.abs(K - P)
if mini > X: mini = X
else : K = P # reset K
return mini
val,sub and add are O(ln N) so the whole process is O(N ln N)
Test :
n = 10**5
A = np.random.randint(0, 10**8, n, dtype=np.int64)
P = np.random.randint(0, 10**8, dtype=np.int64)
%time solve(A,P)
Wall time: 0.8 s
Out: 452613036735
A numba compiled version (decorate the 4 functions by #numba.jit) is 200x faster (5 ms).
Yonlif answer is wrong.
In the Find subaray with given sum solution we have a loop where we do substruction.
while (curr_sum > sum && start < i-1)
curr_sum = curr_sum - arr[start++];
Since there is no inverse operator of a logical AND, we cannot rewrite this line and we cannot use this solution directly.
One would say that we can recalculate the sum every time when we increase the lower bound of a sliding window (which would lead us to O(n^2) time complexity), but this solution would not work (I'll provide the code and counter example in the end).
Here is brute force solution that works in O(n^3)
unsigned int getSum(const vector<int>& vec, int from, int to) {
unsigned int sum = -1;
for (auto k = from; k <= to; k++)
sum &= (unsigned int)vec[k];
return sum;
}
void updateMin(unsigned int& minDiff, int sum, int target) {
minDiff = std::min(minDiff, (unsigned int)std::abs((int)sum - target));
}
// Brute force solution: O(n^3)
int maxSubArray(const std::vector<int>& vec, int target) {
auto minDiff = UINT_MAX;
for (auto i = 0; i < vec.size(); i++)
for (auto j = i; j < vec.size(); j++)
updateMin(minDiff, getSum(vec, i, j), target);
return minDiff;
}
Here is O(n^2) solution in C++ (thanks to B.M answer) The idea is to update current sum instead calling getSum for every two indices. You should also look at B.M answer as it contains conditions for early braak. Here is C++ version:
int maxSubArray(const std::vector<int>& vec, int target) {
auto minDiff = UINT_MAX;
for (auto i = 0; i < vec.size(); i++) {
unsigned int sum = -1;
for (auto j = i; j < vec.size(); j++) {
sum &= (unsigned int)vec[j];
updateMin(minDiff, sum, target);
}
}
return minDiff;
}
Here is NOT working solution with a sliding window: This is the idea from Yonlif answer with the precomputation of the sum in O(n^2)
int maxSubArray(const std::vector<int>& vec, int target) {
auto minDiff = UINT_MAX;
unsigned int sum = -1;
auto left = 0, right = 0;
while (right < vec.size()) {
if (sum > target)
sum &= (unsigned int)vec[right++];
else
sum = getSum(vec, ++left, right);
updateMin(minDiff, sum, target);
}
right--;
while (left < vec.size()) {
sum = getSum(vec, left++, right);
updateMin(minDiff, sum, target);
}
return minDiff;
}
The problem with this solution is that we skip some sequences which can actually be the best ones.
Input: vector = [26,77,21,6], target = 5.
Ouput should be zero as 77&21=5, but sliding window approach is not capable of finding that one as it will first consider window [0..3] and than increase lower bound, without possibility to consider window [1..2].
If someone have a linear or log-linear solution which works it would be nice to post.
Here is a solution that i wrote and that takes time complexity of the order O(n^2).
The below code snippet is written in Java .
class Solution{
public int solve(int[] arr,int p){
int maxk = Integer.MIN_VALUE;
int mink = Integer.MAX_VALUE;
int size = arr.length;
for(int i =0;i<size;i++){
int temp = arr[i];
for(int j = i;j<size;j++){
temp &=arr[j];
if(temp<=p){
if(temp>maxk)
maxk = temp;
}
else{
if(temp < mink)
mink = temp;
}
}
}
int min1 = Math.abs(mink -p);
int min2 = Math.abs(maxk -p);
return ( min1 < min2 ) ? min1 : min2;
}
}
It is simple brute force approach where 2 numbers let us say x and y , such that x <= k and y >=k are found where x and y are some different K = arr[i]&arr[i+1]&...arr[j] where i<=j for different i and j for x,y .
Answer will be just the minimum of |x-p| and |y-p| .
This is a Python implementation of the O(n) solution based on the broad idea from Yonlif's answer. There were doubts about whether this solution could work since no implementation was provided, so here's an explicit writeup.
Some caveats:
The code technically runs in O(n*B), where n is the number of integers and B is the number of unique bit positions set in any of the integers. With constant-width integers that's linear, but otherwise it's not generally linear in actual input size. You can get a true linear solution for exponentially large inputs with more bookkeeping.
Negative numbers in the array aren't handled, since their bit representation isn't specified in the question. See the comments on Yonlif's answer for hints on how to handle fixed-width two's complement signed integers.
The contentious part of the sliding window solution seems to be how to 'undo' bitwise &. The trick is to store the counts of set-bits in each bit-position of elements in your sliding window, not just the bitwise &. This means adding or removing an element from the window turns into adding or removing 1 from the bit-counters for each set-bit in the element.
On top of testing this code for correctness, it isn't too hard to prove that a sliding window approach can solve this problem. The bitwise & function on subarrays is weakly-monotonic with respect to subarray inclusion. Therefore the basic approach of increasing the right pointer when the &-value is too large, and increasing the left pointer when the &-value is too small, will cause our sliding window to equal an optimal sliding window at some point.
Here's a small example run on Dejan's testcase from another answer:
A = [26, 77, 21, 6], Target = 5
Active sliding window surrounded by []
[26], 77, 21, 6
left = 0, right = 0, AND = 26
----------------------------------------
[26, 77], 21, 6
left = 0, right = 1, AND = 8
----------------------------------------
[26, 77, 21], 6
left = 0, right = 2, AND = 0
----------------------------------------
26, [77, 21], 6
left = 1, right = 2, AND = 5
----------------------------------------
26, 77, [21], 6
left = 2, right = 2, AND = 21
----------------------------------------
26, 77, [21, 6]
left = 2, right = 3, AND = 4
----------------------------------------
26, 77, 21, [6]
left = 3, right = 3, AND = 6
So the code will correctly output 0, as the value of 5 was found for [77, 21]
Python code:
def find_bitwise_and(nums: List[int], target: int) -> int:
"""Find smallest difference between a subarray-& and target.
Given a list on nonnegative integers, and nonnegative target
returns the minimum value of abs(target - BITWISE_AND(B))
over all nonempty subarrays B
Runs in linear time on fixed-width integers.
"""
def get_set_bits(x: int) -> List[int]:
"""Return indices of set bits in x"""
return [i for i, x in enumerate(reversed(bin(x)[2:]))
if x == '1']
def counts_to_bitwise_and(window_length: int,
bit_counts: Dict[int, int]) -> int:
"""Given bit counts for a window of an array, return
bitwise AND of the window's elements."""
return sum((1 << key) for key, count in bit_counts.items()
if count == window_length)
current_AND_value = nums[0]
best_diff = abs(current_AND_value - target)
window_bit_counts = Counter(get_set_bits(nums[0]))
left_idx = right_idx = 0
while right_idx < len(nums):
# Expand the window to decrease & value
if current_AND_value > target or left_idx > right_idx:
right_idx += 1
if right_idx >= len(nums):
break
window_bit_counts += Counter(get_set_bits(nums[right_idx]))
# Shrink the window to increase & value
else:
window_bit_counts -= Counter(get_set_bits(nums[left_idx]))
left_idx += 1
current_AND_value = counts_to_bitwise_and(right_idx - left_idx + 1,
window_bit_counts)
# No nonempty arrays allowed
if left_idx <= right_idx:
best_diff = min(best_diff, abs(current_AND_value - target))
return best_diff

Majority element of JPEG images using Divide and conquer [duplicate]

An array is said to have a majority element if more than half of its elements are the same. Is there a divide-and-conquer algorithm for determining if an array has a majority element?
I normally do the following, but it is not using divide-and-conquer. I do not want to use the Boyer-Moore algorithm.
int find(int[] arr, int size) {
int count = 0, i, mElement;
for (i = 0; i < size; i++) {
if (count == 0) mElement = arr[i];
if (arr[i] == mElement) count++;
else count--;
}
count = 0;
for (i = 0; i < size; i++) {
if (arr[i] == mElement) count++;
}
if (count > size / 2) return mElement;
return -1;
}
I can see at least one divide and conquer method.
Start by finding the median, such as with Hoare's Select algorithm. If one value forms a majority of the elements, the median must have that value, so we've just found the value we're looking for.
From there, find (for example) the 25th and 75th percentile items. Again, if there's a majority element, at least one of those would need to have the same value as the median.
Assuming you haven't ruled out there being a majority element yet, you can continue the search. For example, let's assume the 75th percentile was equal to the median, but the 25th percentile wasn't.
When then continue searching for the item halfway between the 25th percentile and the median, as well as the one halfway between the 75th percentile and the end.
Continue finding the median of each partition that must contain the end of the elements with the same value as the median until you've either confirmed or denied the existence of a majority element.
As an aside: I don't quite see how Boyer-Moore would be used for this task. Boyer-Moore is a way of finding a substring in a string.
There is, and it does not require the elements to have an order.
To be formal, we're dealing with multisets (also called bags.) In the following, for a multiset S, let:
v(e,S) be the multiplicity of an element e in S, i.e. the number of times it occurs (the multiplicity is zero if e is not a member of S at all.)
#S be the cardinality of S, i.e. the number of elements in S counting multiplicity.
⊕ be the multiset sum: if S = L ⊕ R then S contains all the elements of L and R counting multiplicity, i.e. v(e;S) = v(e;L) + v(e;R) for any element e. (This also shows that the multiplicity can be calculated by 'divide-and-conquer'.)
[x] be the largest integer less than or equal to x.
The majority element m of S, if it exists, is that element such that 2 v(m;S) > #S.
Let's call L and R a splitting of S if L ⊕ R = S and an even splitting if |#L - #R| ≤ 1. That is, if n=#S is even, L and R have exactly half the elements of S, and if n is odd, than one has cardinality [n/2] and the other has cardinality [n/2]+1.
For an arbitrary split of S into L and R, two observations:
If neither L nor R has a majority element, then S cannot: for any element e, 2 v(e;S) = 2 v(e;L) + 2 v(e;R) ≤ #L + #R = #S.
If one of L and R has a majority element m with multiplicity k, then it is the majority element of S only if it has multiplicity r in the other half, with 2(k+r) > #S.
The algorithm majority(S) below returns either a pair (m,k), indicating that m is the majority element with k occurrences, or none:
If S is empty, return none; if S has just one element m, then return (m,1). Otherwise:
Make an even split of S into two halves L and R.
Let (m,k) = majority(L), if not none:
a. Let k' = k + v(m;R).
b. Return (m,k') if 2 k' > n.
Otherwise let (m,k) = majority(R), if not none:
a. Let k' = k + v(m;L).
b. Return (m,k') if 2 k' > n.
Otherwise return none.
Note that the algorithm is still correct even if the split is not an even one. Splitting evenly though is likely to perform better in practice.
Addendum
Made the terminal case explicit in the algorithm description above. Some sample C++ code:
struct majority_t {
int m; // majority element
size_t k; // multiplicity of m; zero => no majority element
constexpr majority_t(): m(0), k(0) {}
constexpr majority_t(int m_,size_t k_): m(m_), k(k_) {}
explicit operator bool() const { return k>0; }
};
static constexpr majority_t no_majority;
size_t multiplicity(int x,const int *arr,size_t n) {
if (n==0) return 0;
else if (n==1) return arr[0]==x?1:0;
size_t r=n/2;
return multiplicity(x,arr,r)+multiplicity(x,arr+r,n-r);
}
majority_t majority(const int *arr,size_t n) {
if (n==0) return no_majority;
else if (n==1) return majority_t(arr[0],1);
size_t r=n/2;
majority_t left=majority(arr,r);
if (left) {
left.k+=multiplicity(left.m,arr+r,n-r);
if (left.k>r) return left;
}
majority_t right=majority(arr+r,n-r);
if (right) {
right.k+=multiplicity(right.m,arr,r);
if (right.k>r) return right;
}
return no_majority;
}
A simpler divide and conquer algorithm works for the case that there exists more than 1/2 elements which are the same and there are n = 2^k elements for some integer k.
FindMost(A, startIndex, endIndex)
{ // input array A
if (startIndex == endIndex) // base case
return A[startIndex];
x = FindMost(A, startIndex, (startIndex + endIndex - 1)/2);
y = FindMost(A, (startIndex + endIndex - 1)/2 + 1, endIndex);
if (x == null && y == null)
return null;
else if (x == null && y != null)
return y;
else if (x != null && y == null)
return x;
else if (x != y)
return null;
else return x
}
This algorithm could be modified so that it works for n which is not exponent of 2, but boundary cases must be handled carefully.
Lets say the array is 1, 2, 1, 1, 3, 1, 4, 1, 6, 1.
If an array contains more than half of elements same then there should be a position where the two consecutive elements are same.
In the above example observe 1 is repeated more than half times. And the indexes(index start from 0) index 2 and index 3 have same element.

Maximize sum of list with no more than k consecutive elements from input

I have an array of N numbers and I want remove only those elements from the list which when removed will create a new list where there are no more K numbers adjacent to each other. There can be multiple lists that can be created with this restriction. So I just want that list in which the sum of the remaining numbers is maximum and as an output print that sum only.
The algorithm that I have come up with so far has a time complexity of O(n^2). Is it possible to get better algorithm for this problem?
Link to the question.
Here's my attempt:
int main()
{
//Total Number of elements in the list
int count = 6;
//Maximum number of elements that can be together
int maxTogether = 1;
//The list of numbers
int billboards[] = {4, 7, 2, 0, 8, 9};
int maxSum = 0;
for(int k = 0; k<=maxTogether ; k++){
int sum=0;
int size= k;
for (int i = 0; i< count; i++) {
if(size != maxTogether){
sum += billboards[i];
size++;
}else{
size = 0;
}
}
printf("%i\n", sum);
if(sum > maxSum)
{
maxSum = sum;
}
}
return 0;
}
The O(NK) dynamic programming solution is fairly easy:
Let A[i] be the best sum of the elements to the left subject to the not-k-consecutive constraint (assuming we're removing the i-th element as well).
Then we can calculate A[i] by looking back K elements:
A[i] = 0;
for j = 1 to k
A[i] = max(A[i], A[i-j])
A[i] += input[i]
And, at the end, just look through the last k elements from A, adding the elements to the right to each and picking the best one.
But this is too slow.
Let's do better.
So A[i] finds the best from A[i-1], A[i-2], ..., A[i-K+1], A[i-K].
So A[i+1] finds the best from A[i], A[i-1], A[i-2], ..., A[i-K+1].
There's a lot of redundancy there - we already know the best from indices i-1 through i-K because of A[i]'s calculation, but then we find the best of all of those except i-K (with i) again in A[i+1].
So we can just store all of them in an ordered data structure and then remove A[i-K] and insert A[i]. My choice - A binary search tree to find the minimum, along with a circular array of size K+1 of tree nodes, so we can easily find the one we need to remove.
I swapped the problem around to make it slightly simpler - instead of finding the maximum of remaining elements, I find the minimum of removed elements and then return total sum - removed sum.
High-level pseudo-code:
for each i in input
add (i + the smallest value in the BST) to the BST
add the above node to the circular array
if it wrapper around, remove the overridden element from the BST
// now the remaining nodes in the BST are the last k elements
return (the total sum - the smallest value in the BST)
Running time:
O(n log k)
Java code:
int getBestSum(int[] input, int K)
{
Node[] array = new Node[K+1];
TreeSet<Node> nodes = new TreeSet<Node>();
Node n = new Node(0);
nodes.add(n);
array[0] = n;
int arrPos = 0;
int sum = 0;
for (int i: input)
{
sum += i;
Node oldNode = nodes.first();
Node newNode = new Node(oldNode.value + i);
arrPos = (arrPos + 1) % array.length;
if (array[arrPos] != null)
nodes.remove(array[arrPos]);
array[arrPos] = newNode;
nodes.add(newNode);
}
return sum - nodes.first().value;
}
getBestSum(new int[]{1,2,3,1,6,10}, 2) prints 21, as required.
Let f[i] be the maximum total value you can get with the first i numbers, while you don't choose the last(i.e. the i-th) one. Then we have
f[i] = max{
f[i-1],
max{f[j] + sum(j + 1, i - 1) | (i - j) <= k}
}
you can use a heap-like data structure to maintain the options and get the maximum one in log(n) time, keep a global delta or whatever, and pay attention to the range i - j <= k.
The following algorithm is of O(N*K) complexity.
Examine the 1st K elements (0 to K-1) of the array. There can be at most 1 gap in this region.
Reason: If there were two gaps, then there would not be any reason to have the lower (earlier gap).
For each index i of these K gap options, following holds true:
1. Sum upto i-1 is the present score of each option.
2. If the next gap is after a distance of d, then the options for d are (K - i) to K
For every possible position of gap, calculate the best sum upto that position among the options.
The latter part of the array can be traversed similarly independently from the past gap history.
Traverse the array further till the end.

Generating M distinct random numbers (one at a time) from a given range 0..N-1 in less than O(M) memory

Is there any method to do this?
I mean, we even cannot work with "in" array of {0,1,..,N-1} (because it's at least O(N) memory).
M can be = N. N can be > 2^64. Result should be uniformly random and would better be every possible sequence (but may not).
Also full-range PRNGs (and friends) aren't suitable, because it will give same sequence each time.
Time complexity doesn't matter.
If you don't care what order the random selection comes out in, then it can be done in constant memory. The selection comes out in order.
The answer hinges on estimating the probability that the smallest value in a random selection of M distinct values of the set {0, ..., N-1} is i, for each possible i. Call this value p(i, M, N). With more mathematics than I have the patience to type into an interface which doesn't support Latex, you can derive some pretty good estimates for the p function; here, I'll just show the simple, non-time-efficient approach.
Let's just focus on p(0, M, N), which is the probability that a random selection of M out of N objects will include the first object. Then we can iterate through the objects (that is, the numbers 0...N-1) one at a time; deciding for each one whether it is included or not by flipping a weighted coin. We just need to compute the coin's weights for each flip.
By definition, there are MCN possible M-selections of a set of N objects. Of these MCN-1 do not include the first element. (That's the count of M-selections of N-1 objects, which is all the M-selections of the set missing one element). Similarly, M-1CN-1 selections do include the first element (that is, all the M-1-selections of the N-1-set, with the first element added to each selection).
These two values add up to MCN; the well-known recursive algorithm for computing C.
So p(0, M, N) is just M-1CN-1/MCN. Since MCN = N!/(M!*(N-M)!), we can simplify that fraction to M/N. As expected, if M == N, that works out to 1 (M of N objects must include every object).
So now we know what the probability that the first object will be in the selection. We can then reduce the size of the set, and either reduce the remaining selection size or not, depending on whether the coin flip determined that we did or did not include the first object. So here's the final algorithm, in pseudo-code, based on the existence of the weighted random boolean function:
w(x, y) => true with probability X / Y; otherwise false.
I'll leave the implementation of w for the reader, since it's trivial.
So:
Generate a random M-selection from the set 0...N-1
Parameters: M, N
Set i = 0
while M > 0:
if w(M, N):
output i
M = M - 1
N = N - 1
i = i + 1
It might not be immediately obvious that that works, but note that:
the output i statement must be executed exactly M times, since it is coupled with a decrement of M, and the while loop executes until M is 0
The closer M gets to N, the higher the probability that M will be decremented. If we ever get to the point where M == N, then both will be decremented in lockstep until they both reach 0.
i is incremented exactly when N is decremented, so it must always be in the range 0...N-1. In fact, it's redundant; we could output N-1 instead of outputting i, which would change the algorithm to produce sets in decreasing order instead of increasing order. I didn't do that because I think the above is easier to understand.
The time complexity of that algorithm is O(N+M) which must be O(N). If N is large, that's not great, but the problem statement said that time complexity doesn't matter, so I'll leave it there.
PRNGs that don't map their state space to a lower number of bits for output should work fine. Examples include Linear Congruential Generators and Tausworthe generators. They will give the same sequence if you use the same seed to start them, but that's easy to change.
Brute force:
if time complexity doesn't matter it would be a solution for 0 < M <= N invariant. nextRandom(N) is a function which returns random integer in [0..N):
init() {
for (int idx = 0; idx < N; idx++) {
a[idx] = -1;
}
for (int idx = 0; idx < M; idx++) {
getNext();
}
}
int getNext() {
for (int idx = 1; idx < M; idx++) {
a[idx -1] = a[idx];
}
while (true) {
r = nextRandom(N);
idx = 0;
while (idx < M && a[idx] != r) idx++;
if (idx == M) {
a[idx - 1] = r;
return r;
}
}
}
O(M) solution: It is recursive solution for simplicity. It supposes to run nextRandom() which returns a random number in [0..1):
rnd(0, 0, N, M); // to get next M distinct random numbers
int rnd(int idx, int n1, int n2, int m) {
if (n1 >= n2 || m <= 0) return idx;
int r = nextRandom(n2 - n1) + n1;
int m1 = (int) ((m-1.0)*(r-n1)/(n2-n1) + nextRandom()); // gives [0..m-1]
int m2 = m - m1 - 1;
idx = rnd(idx, n1, r-1, m1);
print r;
return rnd(idx+1, r+1, n2, m2);
}
the idea is to select a random r in between [0..N) on first step which splits the range on two sub-ranges by N1 and N2 elements in each (N1+N2==N-1). We need to repeat the same step for [0..r) which has N1 elements and [r+1..N) (N2 elements) choosing M1 and M2 (M1+M2==M-1) so as M1/M2 == N1/N2. M1 and M2 must be integers, but the proportion can give real results, we need to round values with probabilities (1.2 will give 1 with p=0.8 and 2 with p=0.2 etc.).

Resources