3-PARTITION problem - algorithm

Here is another dynamic programming question (Vazirani ch. 6):

Consider the following 3-PARTITION problem. Given integers a_1, ..., a_n, we want to determine whether it is possible to partition {1, ..., n} into three disjoint subsets I, J, K such that

sum(I) = sum(J) = sum(K) = (1/3) * sum(ALL)

For example, for input (1, 2, 3, 4, 4, 5, 8) the answer is yes, because there is the partition (1, 8), (4, 5), (2, 3, 4). On the other hand, for input (2, 2, 3, 5) the answer is no. Devise and analyze a dynamic programming algorithm for 3-PARTITION that runs in time polynomial in n and sum(a_i).
How can I solve this problem? I know the 2-partition problem, but I still can't solve this one.

It's easy to generalize the 2-set solution to the 3-set case.
In the original version, you create an array of booleans sums, where sums[i] tells whether sum i can be reached using numbers from the set. Then, once the array is built, you just check whether sums[TOTAL/2] is true.
Since you said you already know the old version, I'll describe only the difference.
In the 3-partition case, you keep a two-dimensional array of booleans sums, where sums[i][j] tells whether the first set can have sum i and the second sum j. Then, once the array is built, you just check whether sums[TOTAL/3][TOTAL/3] is true.
If the original complexity is O(TOTAL*n), here it's O(TOTAL^2*n).
It may not be polynomial in the strictest sense of the word, but then the original version isn't strictly polynomial either :)

I think by reduction it goes like this:
Reducing 2-partition to 3-partition:
Let S be the original set and A its total sum; then let S' = union({A/2}, S).
Performing a 3-partition on the set S' yields three sets X, Y, Z.
Among X, Y, Z, one of them must be {A/2} (the numbers are positive, so no other elements can join it); say it's set Z. Then X and Y form a 2-partition.
The witnesses of the 3-partition on S' are the witnesses of the 2-partition on S; thus 2-partition reduces to 3-partition.
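A minimal C++ sketch of this reduction; threePartition here is a placeholder for any 3-PARTITION decider (e.g. the DP below), not a specific library function:

#include <numeric>
#include <vector>
using namespace std;

bool threePartition(vector<int> s); // any 3-PARTITION decider

// Decides 2-PARTITION of s by a single call to a 3-PARTITION oracle.
bool twoPartition(vector<int> s) {
    int total = accumulate(s.begin(), s.end(), 0);
    if (total % 2 != 0) return false; // odd total: no 2-partition exists
    s.push_back(total / 2);           // build S' = union(S, {A/2})
    return threePartition(s);         // each part of S' must sum to A/2
}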

If this problem is to be solvable, then sum(ALL)/3 must be an integer. Any solution must have sum(J) + sum(K) = sum(I) + sum(ALL)/3. This represents a solution to the 2-partition problem over concat(ALL, {sum(ALL)/3}).
You say you have a 2-partition implementation: use it to solve that problem. Then (at least) one of the two partitions will contain the number sum(ALL)/3 - remove the number from that partition, and you've found I. For the other partition, run 2-partition again to split J from K; after all, J and K must be equal in sum themselves.
Edit: This solution is probably incorrect - the 2-partition of the concatenated set will have several solutions (at least one for each of I, J, K) - however, if there are other solutions, then the "other side" may not consist of the union of two of I, J, K, and may not be splittable at all. You'll need to actually think, I fear :-).
Try 2: Iterate over the multiset, maintaining the following map: R(i,j,k) :: Boolean, which represents whether, up to the current iteration, the numbers permit division into three multisets that have sums i, j, k. I.e., for any true R(i,j,k) and next number n, in the next state R' it holds that R'(i+n,j,k), R'(i,j+n,k) and R'(i,j,k+n) are true. Note that the complexity (as per the exercise) depends on the magnitude of the input numbers; this is a pseudo-polynomial-time algorithm. Nikita's solution is conceptually similar but more efficient, since it doesn't track the third set's sum: that's unnecessary because you can trivially compute it.

As I answered in another similar question, a C++ implementation would look something like this:
bool partition3(vector<int> &A)
{
    int sum = accumulate(A.begin(), A.end(), 0);
    if (sum % 3 != 0)
    {
        return false;
    }
    int size = A.size();
    // dp[j][k] == true means sums j and k are reachable by two disjoint subsets
    vector<vector<bool>> dp(sum + 1, vector<bool>(sum + 1, false));
    dp[0][0] = true;
    // process the numbers one by one
    for (int i = 0; i < size; i++)
    {
        // iterate downwards so each number is used at most once
        for (int j = sum; j >= 0; --j)
        {
            for (int k = sum; k >= 0; --k)
            {
                if (dp[j][k])
                {
                    if (j + A[i] <= sum) dp[j + A[i]][k] = true;
                    if (k + A[i] <= sum) dp[j][k + A[i]] = true;
                }
            }
        }
    }
    return dp[sum / 3][sum / 3];
}
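As a quick sanity check, a sketch of a driver running the two instances from the question (assuming the partition3 above is in scope):

#include <iostream>
#include <numeric>
#include <vector>
using namespace std;

// bool partition3(vector<int> &A) as defined above

int main() {
    vector<int> yes = {1, 2, 3, 4, 4, 5, 8};
    vector<int> no  = {2, 2, 3, 5};
    cout << boolalpha << partition3(yes) << "\n"; // expected: true
    cout << boolalpha << partition3(no)  << "\n"; // expected: false
}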

Let's say you want to partition the sequence $X = {x_1, ..., x_n}$ into $k$ partitions.
Create an $n \times k$ table. Let the cost $M[i,j]$ be the optimal value (the minimum achievable maximum partition sum) for splitting the first $i$ elements into $j$ partitions. Just recursively use the following optimality criterion to fill it:
M[n,k] = \min_{i \leq n} \max ( M[i, k-1], \sum_{j=i+1}^{n} x_j )
using these initial values for the table:
M[i,1] = \sum_{j=1}^{i} x_j \qquad M[1,j] = x_1
The running time is $O(kn^2)$ (polynomial).
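A minimal C++ sketch of this recurrence; note that it splits the sequence into k contiguous ranges and minimizes the largest range sum:

#include <algorithm>
#include <climits>
#include <vector>
using namespace std;

// Minimum achievable maximum range sum when cutting x into k contiguous parts.
int linearPartition(const vector<int> &x, int k) {
    int n = x.size();
    vector<int> prefix(n + 1, 0); // prefix[i] = x_1 + ... + x_i
    for (int i = 0; i < n; i++)
        prefix[i + 1] = prefix[i] + x[i];
    // M[i][j] = optimal cost for the first i elements in j parts
    vector<vector<int>> M(n + 1, vector<int>(k + 1, 0));
    for (int i = 1; i <= n; i++)
        M[i][1] = prefix[i];
    for (int j = 1; j <= k; j++)
        M[1][j] = x[0];
    for (int i = 2; i <= n; i++)
        for (int j = 2; j <= k; j++) {
            M[i][j] = INT_MAX;
            for (int p = 1; p < i; p++) // the last part is x_{p+1..i}
                M[i][j] = min(M[i][j], max(M[p][j - 1], prefix[i] - prefix[p]));
        }
    return M[n][k];
}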

Create a three-dimensional array, where size is the count of elements and part is the sum of all elements divided by 3. Each cell array[seq][sum1][sum2] tells whether you can create sum1 and sum2 using only the first seq elements of the given array A[]. Compute all values of the array; the result will be in the cell array[using all elements][sum of all elements / 3][sum of all elements / 3]: if you can create two disjoint sets each equal to sum/3, the remaining elements form the third set.
Logic of checking: exclude element A[seq] to the third sum (not stored) and check the cell without that element for the same two sums; OR include it in sum1 - check whether it is possible to get two sets without the seq element, where sum1 is smaller by the value of A[seq] and sum2 is unchanged; OR include it in sum2, checked like the previous case.
bool partition3(vector<int> &A)
{
    int part = 0;
    for (int a : A)
        part += a;
    if (part % 3)
        return false;
    int size = A.size() + 1;
    part = part / 3 + 1;
    // array[seq][sum1][sum2]: can sums sum1 and sum2 be formed from the first seq elements?
    // (a variable-length "bool array[size][part][part]" is not standard C++, so use vectors)
    vector<vector<vector<bool>>> array(size,
        vector<vector<bool>>(part, vector<bool>(part, false)));
    // sequence from 0 integers inside to all inside
    for (int seq = 0; seq < size; seq++)
        for (int sum1 = 0; sum1 < part; sum1++)
            for (int sum2 = 0; sum2 < part; sum2++) {
                bool curRes;
                if (seq == 0)
                    curRes = (sum1 == 0 && sum2 == 0);
                else {
                    int curInSeq = seq - 1;
                    bool excludeFrom = array[seq - 1][sum1][sum2];
                    bool includeToSum1 = (sum1 >= A[curInSeq]
                        && array[seq - 1][sum1 - A[curInSeq]][sum2]);
                    bool includeToSum2 = (sum2 >= A[curInSeq]
                        && array[seq - 1][sum1][sum2 - A[curInSeq]]);
                    curRes = excludeFrom || includeToSum1 || includeToSum2;
                }
                array[seq][sum1][sum2] = curRes;
            }
    return array[size - 1][part - 1][part - 1];
}

Another example in C++ (based on the previous answers):
bool partition3(vector<int> const &A) {
    int sum = 0;
    for (int i = 0; i < A.size(); i++) {
        sum += A[i];
    }
    if (sum % 3 != 0) {
        return false;
    }
    // E[i][j][k] = the largest total that can be packed into two disjoint
    // subsets of the first i elements, with capacities j and k respectively.
    vector<vector<vector<int>>> E(A.size() + 1,
        vector<vector<int>>(sum / 3 + 1, vector<int>(sum / 3 + 1, 0)));
    for (int i = 1; i <= A.size(); i++) {
        for (int j = 0; j <= sum / 3; j++) {
            for (int k = 0; k <= sum / 3; k++) {
                E[i][j][k] = E[i - 1][j][k];
                if (A[i - 1] <= k) {
                    E[i][j][k] = max(E[i][j][k], E[i - 1][j][k - A[i - 1]] + A[i - 1]);
                }
                if (A[i - 1] <= j) {
                    E[i][j][k] = max(E[i][j][k], E[i - 1][j - A[i - 1]][k] + A[i - 1]);
                }
            }
        }
    }
    // both capacities are filled exactly iff the total packed is 2 * (sum / 3)
    return (E.back().back().back() == 2 * (sum / 3));
}

You really want Korf's Complete Karmarkar-Karp algorithm (http://ac.els-cdn.com/S0004370298000861/1-s2.0-S0004370298000861-main.pdf, http://ijcai.org/papers09/Papers/IJCAI09-096.pdf). A generalization to three-way partitioning is given. The algorithm is surprisingly fast given the complexity of the problem, but requires some implementation effort.
The essential idea of KK is to ensure that large blocks of similar size appear in different partitions. One groups pairs of blocks, which can then be treated as a smaller block of size equal to the difference in sizes and placed as normal: by doing this recursively, one ends up with small blocks that are easy to place. One then does a two-coloring of the block groups to ensure that the opposite placements are respected. The extension to 3-partition is a bit more complicated. Korf's extension is to use depth-first search in KK order to find all possible solutions, or to find a solution quickly.
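To give a feel for the differencing idea, here is a minimal sketch of the plain two-way Karmarkar-Karp heuristic (not Korf's complete search, and two-way rather than three-way): repeatedly replace the two largest numbers by their difference; the last number left is the partition difference the heuristic achieves.

#include <queue>
#include <vector>
using namespace std;

// Two-way KK differencing heuristic: returns the achieved |sum(A) - sum(B)|.
int karmarkarKarp(const vector<int> &nums) {
    priority_queue<int> pq(nums.begin(), nums.end()); // max-heap
    while (pq.size() > 1) {
        int a = pq.top(); pq.pop();
        int b = pq.top(); pq.pop();
        pq.push(a - b); // commit the two largest to opposite partitions
    }
    return pq.empty() ? 0 : pq.top();
}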

Related

What is the maximum water collected between two histograms?

I recently came across this problem:
You are given the heights of n histograms, each of width 1. You have to choose two histograms such that, if it starts raining and all other histograms (except the two you have selected) are removed, the water collected between the two histograms is maximised.
Input:
9
3 2 5 9 7 8 1 4 6
Output:
25
Between the third and the last histogram.
This is a variant of the trapping rain water problem.
I tried two solutions, but both had a worst-case complexity of N^2. How can we optimise further?
Sol1: Brute force for every pair.
int maxWaterCollected(vector<int> hist, int n) {
    int ans = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            ans = max(ans, min(hist[i], hist[j]) * (j - i - 1));
        }
    }
    return ans;
}
Sol2: Keep a sequence of histograms in increasing order of height. For every histogram, find its best counterpart in this sequence. However, if all histograms are in increasing order, then this solution also becomes N^2.
int maxWaterCollected(vector<int> hist, int n) {
    vector< pair<int, int> > increasingSeq(1, make_pair(hist[0], 0)); // initialised with the 1st element
    int ans = 0;
    for (int i = 1; i < n; i++) {
        // compute the best result from the current increasing sequence
        for (int j = 0; j < increasingSeq.size(); j++) {
            ans = max(ans, min(hist[i], increasingSeq[j].first) * (i - increasingSeq[j].second - 1));
        }
        // add this histogram to the sequence
        if (hist[i] > increasingSeq.back().first) {
            increasingSeq.push_back(make_pair(hist[i], i));
        }
    }
    return ans;
}
Use 2 iterators, one from begin() and one from end() - 1.
Until the 2 iterators meet:
- Compare the current result with the max, and keep the max.
- Move the iterator with the smaller value (begin -> end or end -> begin).
Complexity: O(n).
Jarod42 has the right idea, but it's unclear from his terse post why his algorithm, described below in Python, is correct:
def candidates(hist):
    l = 0
    r = len(hist) - 1
    while l < r:
        yield (r - l - 1) * min(hist[l], hist[r])
        if hist[l] <= hist[r]:
            l += 1
        else:
            r -= 1

def maxwater(hist):
    return max(candidates(hist))
The proof of correctness is by induction: the optimal solution either (1) belongs to the candidates yielded so far or (2) chooses both histograms inside [l, r]. The base case is simple, because all histograms are inside [0, len(hist) - 1].
Inductively, suppose that we're about to advance either l or r. These cases are symmetric, so let's assume that we're about to advance l. We know that hist[l] <= hist[r], so the value is (r - l - 1) * hist[l]. Given any other right endpoint r1 < r, the value is (r1 - l - 1) * min(hist[l], hist[r1]), which is no greater because r - l - 1 > r1 - l - 1 and hist[l] >= min(hist[l], hist[r1]). We can rule out all of these solutions as no better than a candidate already yielded, so it's safe to advance l.

Counting the number of subset solutions in the subset sum (knapsack) problem with dynamic programming

So I want to know how to count all the solutions of a knapsack problem. Namely, I'm interested in finding the number of possible subsets from a set of numbers whose sum equals a target K.
e.g. we have a set of items of sizes {3, 2, 5, 6, 7} and the target is K = 13. The solutions are {5, 6, 2} and {6, 7}, so there are two solutions; I want my dynamic programming algorithm to report that there are two possible solutions.
This can be done with dynamic programming. The basic strategy is to build a memoization table, d[i][j], which stores the number of combinations using the first j numbers that sum to i. Note that j = 0 represents an empty set of numbers. Here is a sample implementation:
int countCombinations(int[] numbers, int target) {
    // d[i][j] = n means there are n combinations of the first j numbers summing to i.
    int[][] d = new int[target + 1][numbers.length + 1];
    // There is always 1 combination summing to 0, namely the empty set.
    for (int j = 0; j <= numbers.length; ++j) {
        d[0][j] = 1;
    }
    // For each total i, calculate the effect of using or omitting each number j.
    for (int i = 1; i <= target; ++i) {
        for (int j = 1; j <= numbers.length; ++j) {
            // "First j numbers" is 1-indexed, our array is 0-indexed.
            int number = numbers[j - 1];
            // How many combinations were there before considering the jth number?
            d[i][j] = d[i][j - 1];
            // How many things summed to i - number?
            if (i - number >= 0) {
                d[i][j] += d[i - number][j - 1];
            }
        }
    }
    // Return the entry storing the combos of all the numbers summing to target.
    // (All numbers means j = numbers.length, not numbers.length - 1.)
    return d[target][numbers.length];
}
Just to add some Google keywords: this problem is also known as summing n coins without repeats to a target sum.
There is a dynamic knapsack solution for this task. In the dp array, dp[i] stores the number of subsets whose sum is i. In this case your answer is dp[K].
dp[0] = 1;
for (int i = 0; i < N; i++)
    for (int j = K - a[i]; j >= 0; j--)
        dp[j + a[i]] += dp[j];
I don't think Max's algorithm works for the case [0, 0, 1] with a target of 1. The answer is 4, but his algorithm will output 1. His algorithm only works for positive integers, because it assumes that a sum of 0 can only be achieved with the empty set. However, it can also be achieved if 0 exists in the array. A more robust way of tackling this problem (which is also more space efficient) is using a 1D dp array. The pseudo-code is the following:
int[] dp = new int[target + 1];
dp[0] = 1; // the empty set sums to 0
for (int num : nums) {
    for (int s = target; s >= 0; s--) {
        if (s >= num) { // can include num
            dp[s] += dp[s - num];
        } else { // cannot include num (can be omitted, here for better explanation)
            dp[s] += 0;
        }
    }
}
return dp[target];
The reason I go backwards from target to 0 in the inner for loop is to avoid duplication. Think about the example [2, 2, 2] with a target sum of 4. If you iterate upward from index 0, then you would double count a 2 when you are at dp[4] (the array should be [1 0 1 0 0] instead of [1 0 1 0 1] after one pass of the inner loop).
Hope this helps.

How to find the minimum positive contiguous subsequence in O(n) time?

We have an algorithm for finding the maximum-sum contiguous subsequence of a given sequence in O(n) time. Can anybody suggest a similar algorithm for finding the minimum positive contiguous subsequence?
For example:
If the given sequence is 1, 2, 3, 4, 5, the answer should be 1.
For [5, -4, 3, 5, 4] -> 1 is the minimum positive sum, achieved by the elements [5, -4].
There cannot be such an algorithm. The lower bound for this problem is Ω(n log n). I'll prove it by reducing the element distinctness problem to it (actually to the non-negative variant of it).
Let's suppose we have an O(n) algorithm for this problem (the minimum non-negative subarray).
We want to find out if an array (e.g. A = [1, 2, -3, 4, 2]) has only distinct elements. To solve this problem, I could construct an array with the differences between consecutive elements (e.g. A' = [1, -5, 7, -2]) and run the O(n) algorithm we have. The sum of any subarray of A' is the difference of two elements of A, so the original array has only distinct elements if and only if the minimum non-negative subarray sum of A' is greater than 0.
If we had an O(n) algorithm for your problem, we would have an O(n) algorithm for the element distinctness problem, which is known to be impossible in the algebraic decision tree model.
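To make the reduction concrete, a C++ sketch; minNonNegativeSubarraySum is the hypothetical O(n) routine we assumed exists:

#include <vector>
using namespace std;

long long minNonNegativeSubarraySum(const vector<long long> &a); // hypothetical

bool allDistinct(const vector<long long> &A) {
    // Differences of consecutive elements: a subarray sum of diff
    // equals A[j] - A[i] for some i < j.
    vector<long long> diff;
    for (size_t i = 1; i < A.size(); i++)
        diff.push_back(A[i] - A[i - 1]);
    return minNonNegativeSubarraySum(diff) > 0; // 0 is achievable iff duplicates exist
}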
We can have an O(n log n) algorithm as follows:
Assume we have an array prefix, where index i stores the sum of array A from 0 to i, so the sum of the sub-array [i, j] is prefix[j] - prefix[i - 1].
Thus, in order to find the minimum positive sub-array ending at index j, we need to find the maximum element prefix[x] which is less than prefix[j], with x < j. We can find that element in O(log n) time if we use a binary search tree.
A runnable C++ version of that idea, using std::set as the balanced binary search tree:

#include <algorithm>
#include <climits>
#include <set>
#include <vector>
using namespace std;

long long minPositiveSubarray(const vector<int> &A) {
    set<long long> tree; // prefix sums seen so far
    tree.insert(0);      // empty prefix, so subarrays starting at index 0 count
    long long prefix = 0;
    long long result = LLONG_MAX;
    for (int a : A) {
        prefix += a;
        // the largest prefix sum strictly less than the current one
        auto it = tree.lower_bound(prefix);
        if (it != tree.begin()) {
            --it;
            result = min(result, prefix - *it);
        }
        tree.insert(prefix);
    }
    return result; // LLONG_MAX means no positive subarray exists
}
I believe there's an O(n) algorithm, see below.
Note: it has a scale factor that might make it less attractive in practical applications: it depends on the range of the (input) values to be processed, see the remarks in the code.
private int GetMinimumPositiveContiguousSubsequence(List<Int32> values)
{
    // Note: this method has no precautions against integer over/underflow, which may occur
    // if large (abs) values are present in the input-list.

    // There must be at least 1 item.
    if (values == null || values.Count == 0)
        throw new ArgumentException("There must be at least one item provided to this method.");

    // 1. Scan once to:
    //    a) Get the minimum positive element;
    //    b) Get the value of the MAX contiguous sequence;
    //    c) Get the value of the MIN contiguous sequence - allowing negative values: the mirror of the MAX contiguous sequence;
    //    d) Pinpoint the (index of the) first negative value.
    int minPositive = 0;
    int maxSequence = 0;
    int currentMaxSequence = 0;
    int minSequence = 0;
    int currentMinSequence = 0;
    int indxFirstNegative = -1;
    for (int k = 0; k < values.Count; k++)
    {
        int value = values[k];
        if (value > 0)
        {
            if (minPositive == 0 || value < minPositive)
                minPositive = value;
        }
        else if (indxFirstNegative == -1 && value < 0)
            indxFirstNegative = k;
        currentMaxSequence += value;
        if (currentMaxSequence <= 0)
            currentMaxSequence = 0;
        else if (currentMaxSequence > maxSequence)
            maxSequence = currentMaxSequence;
        currentMinSequence += value;
        if (currentMinSequence >= 0)
            currentMinSequence = 0;
        else if (currentMinSequence < minSequence)
            minSequence = currentMinSequence;
    }

    // 2. We're done if (a) there are no negatives, or (b) the minPositive (single) value is 1 (or 0...).
    if (minSequence == 0 || minPositive <= 1)
        return minPositive;

    // 3. Real work to do.
    // The strategy is as follows, iterating over the input values:
    // a) Keep track of the cumulative value of ALL items - the sequence that starts with the very first item.
    // b) Register each such cumulative value as "existing" in a bool array 'initialSequence' as we go along.
    //    We know already the max/min contiguous sequence values, so we can properly size that array in advance.
    //    Since negative sequence values occur, we'll have an offset to match the index in that bool array
    //    with the corresponding value of the initial sequence.
    // c) For each next input value to process, scan the 'initialSequence' bool array to see whether relevant entries are TRUE.
    //    We don't need to go over the complete array, as we're only interested in entries that would produce a subsequence with
    //    a value that is positive and also smaller than best-so-far.
    //    (As we go along, the range to check will normally shrink as we get better and better results.
    //    Also: initially the range is already limited by the single-minimum-positive value that we have found.)
    // Performance-wise this approach (which is O(n)) is suitable IFF the number of input values is large (or at least: not small) relative to
    // the spread between maxSequence and minSequence: the latter two define the size of the array in which we will do (partial) linear traversals.
    // If this condition is not met, it may be more efficient to replace the bool array by a (binary) search tree
    // (which will result in O(n log n) performance).
    // Since we know the relevant parameters at this point, we might have both strategies implemented and decide at run time
    // which to choose. The current implementation has only the fixed bool array approach.

    // Initialize a variable to keep track of the best result 'so far'; it will also be the return value.
    int minPositiveSequence = minPositive;

    // The bool array to keep track of which (total) cumulative values (always with the sequence starting at element #0) have occurred so far,
    // and the 'offset' - see remark 3b above.
    int offset = -minSequence;
    bool[] initialSequence = new bool[maxSequence + offset + 1];
    int valueCumulative = 0;
    for (int k = 0; k < indxFirstNegative; k++)
    {
        int value = values[k];
        valueCumulative += value;
        initialSequence[offset + valueCumulative] = true;
    }
    for (int k = indxFirstNegative; k < values.Count; k++)
    {
        int value = values[k];
        valueCumulative += value;
        initialSequence[offset + valueCumulative] = true;
        // Check whether the difference with any previous "cumulative" may improve the optimum-so-far.
        // The index that, if the entry is TRUE, would yield the best possible result.
        int indexHigh = valueCumulative + offset - 1;
        // The last (lowest) index that, if the entry is TRUE, would still yield an improvement over what we have so far.
        int indexLow = Math.Max(0, valueCumulative + offset - minPositiveSequence + 1);
        for (int indx = indexHigh; indx >= indexLow; indx--)
        {
            if (initialSequence[indx])
            {
                minPositiveSequence = valueCumulative - indx + offset;
                if (minPositiveSequence == 1)
                    return minPositiveSequence;
                break;
            }
        }
    }
    return minPositiveSequence;
}

Efficient way to count subsets with given sum

Given N numbers, I need to count the subsets whose sum is S.
Note: the numbers in the array need not be distinct.
My current code is:
int countSubsets(vector<int> numbers, int sum)
{
    vector<int> DP(sum + 1);
    DP[0] = 1;
    int currentSum = 0;
    for (int i = 0; i < numbers.size(); i++)
    {
        currentSum += numbers[i];
        for (int j = min(sum, currentSum); j >= numbers[i]; j--)
            DP[j] += DP[j - numbers[i]];
    }
    return DP[sum];
}
Is there any more efficient way than this?
Constraints are :
1 ≤ N ≤ 14
1 ≤ S ≤ 100000
1 ≤ A[i] ≤ 10000
Also, there are 100 test cases in a single file, so please help if there exists a better solution than this one.
N is small (2^20 is about 1 million, so 2^14 is a really small value) - just iterate over all subsets; below I wrote a pretty fast way to do that (bit hacking). Treat integers as sets: that enumerates the subsets in lexicographical order.
int length = array.Length;
int subsetCount = 0;
for (int i = 0; i < (1 << length); ++i)
{
    int currentSet = i;
    int tempIndex = length - 1;
    int currentSum = 0;
    while (currentSet > 0) // iterate over bits "from the right side"
    {
        if ((currentSet & 1) == 1) // if the current bit is "1"
            currentSum += array[tempIndex];
        currentSet >>= 1;
        tempIndex--;
    }
    subsetCount += (currentSum == targetSum) ? 1 : 0;
}
You can use the fact that N is small: it is possible to generate all subsets of the given array and check for each of them whether its sum is S. The time complexity is O(N * 2 ** N) or O(2 ** N), depending on the way of generation. This solution should be fast enough for the given constraints.
Here is pseudo code of an O(2 ** N) solution:
result = 0

void generate(int curPos, int curSum):
    if curPos == N:
        if curSum == S:
            result++
        return
    // Do not take the current element.
    generate(curPos + 1, curSum)
    // Take it.
    generate(curPos + 1, curSum + numbers[curPos])

generate(0, 0)
A faster solution is based on the meet in the middle technique:
Let's generate all subsets for the first half of the array using the algorithm described above and put their sums into a map (which maps a sum to the number of subsets that have it; it can be either a hash table or just an array, because S is relatively small). This step takes O(2 ** (N / 2)) time.
Now let's generate all subsets for the second half and, for each of them, add to the answer the number of subsets in the first half that sum up to S - currentSum (using the map constructed in step 1), where currentSum is the sum of all elements in the current subset. Again, we have O(2 ** (N / 2)) subsets, and each of them is processed in O(1).
The total time complexity is O(2 ** (N / 2)).
A pseudo code for this solution:
Map<int, int> count = new HashMap<int, int>() // or an array of size S + 1.
result = 0

void generate1(int[] numbers, int pos, int currentSum):
    if pos == numbers.length:
        count[currentSum]++
        return
    generate1(numbers, pos + 1, currentSum)
    generate1(numbers, pos + 1, currentSum + numbers[pos])

void generate2(int[] numbers, int pos, int currentSum):
    if pos == numbers.length:
        result += count[S - currentSum]
        return
    generate2(numbers, pos + 1, currentSum)
    generate2(numbers, pos + 1, currentSum + numbers[pos])

generate1(the first half of numbers, 0, 0)
generate2(the second half of numbers, 0, 0)
If N is odd, the middle element can go to either the first half or to the second one. It doesn't matter where it goes as long as it goes to exactly one of them.
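A runnable C++ version of the same meet-in-the-middle idea might look like this (a sketch; it keeps the first-half counts in an unordered_map):

#include <unordered_map>
#include <vector>
using namespace std;

// Counts subsets of `numbers` whose sum is exactly S, in O(2^(N/2)) time.
long long countSubsetsMITM(const vector<int> &numbers, int S) {
    int n = numbers.size(), half = n / 2;
    // Sums of all subsets of the first half -> how many subsets reach each sum.
    unordered_map<int, long long> count;
    for (int mask = 0; mask < (1 << half); ++mask) {
        int sum = 0;
        for (int i = 0; i < half; ++i)
            if (mask & (1 << i)) sum += numbers[i];
        ++count[sum];
    }
    // For each subset of the second half, look up the complementary sum.
    long long result = 0;
    int rest = n - half;
    for (int mask = 0; mask < (1 << rest); ++mask) {
        int sum = 0;
        for (int i = 0; i < rest; ++i)
            if (mask & (1 << i)) sum += numbers[half + i];
        auto it = count.find(S - sum);
        if (it != count.end()) result += it->second;
    }
    return result;
}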

Find the missing and duplicate elements in an array in linear time and constant space

You're given an array of N 64-bit integers. N may be very large. You know that every integer 1..N appears once in the array, except there is one integer missing and one integer duplicated.
Write a linear time algorithm to find the missing and duplicated numbers. Further, your algorithm should run in small constant space and leave the array untouched.
Source: http://maxschireson.com/2011/04/23/want-a-job-working-on-mongodb-your-first-online-interview-is-in-this-post/
If all numbers were present in the array, the sum would be N(N+1)/2.
Determine the actual sum by summing up all numbers in the array in O(n); let this be Sum(Actual).
One number is missing (let this be j) and one number is duplicated (let this be k). That means that
Sum(Actual) = N(N+1)/2 + k - j
and derived from that
k = Sum(Actual) - N(N+1)/2 + j
Also we can calculate the sum of squares in the array, which would sum up to N^3/3 + N^2/2 + N/6 if all numbers were present.
Now we can calculate the actual sum of squares in O(n); let this be Sum(Actual Squares).
Sum(Actual Squares) = N^3/3 + N^2/2 + N/6 + k^2 - j^2
Now we have two equations with which we can determine j and k.
The XOR trick works in two passes with a read-only array.
This avoids the problem of possible integer overflows which the sum and sum of squares solution has.
Let the two numbers be x and y, one of which is the missing number and the other repeated.
XOR all the elements of the array, along with 1,2,...,N.
The result is w = x XOR y.
Now since x and y are distinct, w is non-zero.
Pick any non-zero bit of w. x and y differ in this bit. Say the position of the bit is k.
Now consider a split of the array (and the numbers 1,2,...,N) into two sets, based on whether the bit at position k is 0 or 1.
Now if we compute the XOR (separately) of the elements of the two sets, the result has to be x and y.
Since the criterion for splitting is just checking whether a bit is set or not, we can compute the two XORs of the two sets by making another pass through the array and keeping two variables, each of which holds the XOR of the elements seen so far (and of 1, 2, ..., N), one per set. At the end, when we are done, those two variables will hold x and y.
Related:
Finding missing elements in an array which can be generalized to m appearing twice and m missing.
Find three numbers appeared only once which is about three missing.
Using the basic idea from a related interview question you could do:
Sum up all the numbers (we shall call this S1) and their squares (S2).
Compute the expected sum of the numbers without modifications, i.e. E1 = n*(n+1)/2, and of their squares, E2 = n*(n+1)*(2n+1)/6.
Now you know that E1 - S1 = m - d and E2 - S2 = m^2 - d^2, where d is the duplicated number and m the missing one (the array has one extra copy of d and lacks m).
Solve this system of equations and you'll find that:
m = 1/2 ((E2 - S2)/(E1 - S1) + (E1 - S1))
d = 1/2 ((E2 - S2)/(E1 - S1) - (E1 - S1)) // or even simpler: d = m - (E1 - S1)
As a check, for [3, 1, 2, 5, 3] (n = 5, missing 4, duplicated 3): E1 - S1 = 15 - 14 = 1 and E2 - S2 = 55 - 48 = 7, so m = (7 + 1)/2 = 4 and d = (7 - 1)/2 = 3.
$S1 = $S2 = 0;
foreach ($nums as $num) {
    $S1 += $num;
    $S2 += $num * $num;
}
$D1 = $n * ($n + 1) / 2 - $S1;                // = m - d
$D2 = $n * ($n + 1) * (2 * $n + 1) / 6 - $S2; // = m^2 - d^2
$m = 1/2 * ($D2 / $D1 + $D1);
$d = 1/2 * ($D2 / $D1 - $D1);
Here is a Java implementation based on @Aryabhatta's idea:
Input: [3 1 2 5 3]
Output: [3, 4]
public ArrayList<Integer> repeatedNumber(final List<Integer> A) {
    ArrayList<Integer> ret = new ArrayList<>();
    int xor = 0, x = 0, y = 0;
    // XOR of all array elements and of 1..N leaves xor = duplicate ^ missing.
    for (int i = 0; i < A.size(); i++) {
        xor ^= A.get(i);
    }
    for (int i = 1; i <= A.size(); i++) {
        xor ^= i;
    }
    // Lowest set bit of xor: the duplicate and the missing number differ here.
    int setBit = xor & ~(xor - 1);
    // Partition the array elements and 1..N by that bit; XOR each group.
    for (int i = 0; i < A.size(); i++) {
        if ((A.get(i) & setBit) != 0) {
            x ^= A.get(i);
        } else {
            y ^= A.get(i);
        }
    }
    for (int i = 1; i <= A.size(); i++) {
        if ((i & setBit) != 0) {
            x ^= i;
        } else {
            y ^= i;
        }
    }
    // Decide which of x, y is the duplicate: the duplicate occurs in the array.
    for (int i = 0; i < A.size(); i++) {
        if (A.get(i) == x) {
            ret.add(x);
            ret.add(y);
            return ret;
        }
        if (A.get(i) == y) {
            ret.add(y);
            ret.add(x);
            return ret;
        }
    }
    return ret;
}
The solution proposed by BrokenGlass covers the case of two unknowns (corresponding to one duplicate number and one missing number), using two formulas:

sum_{k=1..n} k = n(n+1)/2 and sum_{k=1..n} k^2 = n(n+1)(2n+1)/6

The formulas yield the generalized harmonic numbers of order n of -1 and -2, respectively (power sums).
This solution is generalizable to 3 unknowns by also including the value of the generalized harmonic number of order n of -3.
To solve for m unknowns (duplicates and missing numbers), use m generalized harmonic numbers of orders n of -1 through -m.
Moron notes that this approach was discussed earlier in StackOverflow at Easy interview question got harder.
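In equation form (my notation, not the original answerer's): with duplicates d_1, ..., d_p and missing values m_1, ..., m_q, comparing the actual power sums S_r of the array against the expected ones gives one equation per power r:

\sum_{i=1}^{p} d_i^{r} - \sum_{j=1}^{q} m_j^{r} = S_r - \sum_{k=1}^{n} k^{r}, \qquad r = 1, 2, \dots, p + q

Solving this system of p + q polynomial equations recovers all the unknowns; the two-unknown case above is exactly r = 1, 2.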
Taking the "leave the array untouched" requirement literally (i.e. the array can be temporarily modified as long as it does not change in the end), a programming-oriented solution can be suggested.
I assume that the array size N is much smaller than 2^64, which would be an utterly unrealistic amount of memory. So we can safely assume that N < 2^P such that P << 64 (significantly smaller). In other words, this means that all numbers in the array have some high bits unused. So let's just use the highest bit as a flag marking whether the index of that position has been seen in the array. The algorithm goes as follows:
set HIGH = 2^63 // a number with only the highest bit set
scan the array, for each number k do
    if array[k] < HIGH: array[k] = array[k] + HIGH // set the highest bit
    else: k is the duplicate
for each i in 1..N do
    if array[i] < HIGH: i is missing
    else: array[i] = array[i] - HIGH // restore the original number
This is linear time and requires very little computation.
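A C++ sketch of this flagging idea (my own illustration; it assumes the values 1..N are stored in a 0-based vector, so value v maps to slot v-1):

#include <cstdint>
#include <utility>
#include <vector>
using namespace std;

// Returns {duplicate, missing}; temporarily sets the top bit of visited slots.
pair<uint64_t, uint64_t> findDupMissing(vector<uint64_t> &a) {
    const uint64_t HIGH = 1ULL << 63;
    uint64_t dup = 0, missing = 0;
    for (size_t i = 0; i < a.size(); i++) {
        uint64_t k = a[i] & ~HIGH;     // original value, flag stripped
        if (a[k - 1] & HIGH) dup = k;  // slot already flagged: k is duplicated
        else a[k - 1] |= HIGH;         // flag slot k-1 as "value k seen"
    }
    for (size_t i = 0; i < a.size(); i++) {
        if (a[i] & HIGH) a[i] &= ~HIGH; // restore the original number
        else missing = i + 1;           // never flagged: value i+1 is missing
    }
    return {dup, missing};
}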
long long int len = A.size();
long long int sumOfN = (len * (len + 1)) / 2;
long long int sumOfNsq = (len * (len + 1) * (2 * len + 1)) / 6;
for (int i = 0; i < A.size(); i++) {
    sumOfN -= (long long int)A[i];                          // ends up as missing - repeating
    sumOfNsq -= (long long int)A[i] * (long long int)A[i];  // ends up as missing^2 - repeating^2
}
long long int missingNumber = (sumOfN + sumOfNsq / sumOfN) / 2;
long long int repeatingNumber = missingNumber - sumOfN;
Pseudo code, assuming the set is sorted:

missing = nil
duplicate = nil
for i = 0, i < set.size - 1, i += 1
    if set[i] == set[i + 1]
        duplicate = set[i]
    else if (set[i] + 1) != set[i + 1]
        missing = set[i] + 1
    if missing != nil && duplicate != nil
        break
return (missing, duplicate)
