Given an array of ints and a number n, calculate the number of ways to sum to n using the ints - algorithm

I saw this problem in my interview preparation.
Given an array of ints and a number n, calculate the number of ways to
sum to n using the ints
The following code is my solution. I tried to solve this by recursion. The subproblem is: for each int in the array, we can either pick it or not.
public static int count(List<Integer> list, int n) {
    System.out.print(list.size() + ", " + n);
    System.out.println();
    if (n < 0 || list.size() == 0)
        return 0;
    if (list.get(0) == n)
        return 1;
    int e = list.remove(0);
    return count(list, n) + count(list, n - e);
}
I tried to use [10, 1, 2, 7, 6, 1, 5] for the ints, and set n to 8. The result should be 4. However, I got 0. To debug, I printed what I have on each layer of the stack, as shown in the code. The following is what I got:
7, 8
6, 8
5, 8
4, 8
3, 8
2, 8
1, 8
0, 8
0, 3
0, 7
0, 2
0, 1
0, 6
0, 7
0, -2
This result confuses me. It looks right from the beginning up to (0, 3). Starting from (0, 7), it looks wrong to me; I expect (1, 7) there, because, if I understand correctly, this is the count(list, n - e) call on the second-to-bottom layer of the stack. The list operation on the lower layer shouldn't affect the list on the current layer.
So my questions are:
why is it (0, 7) instead of (1, 7) based on my current code?
what adjustment should I do to my current code to get the correct result?
Thanks!

The reason why your algorithm is not working is that you are using one list that is modified before the recursive calls.
Since the list is passed by reference, what ends up happening is that you recursively call remove until there is nothing left in the list, and then all of your recursive calls return 0.
You could create two copies of the list on every recursive step. However, this would be way too inefficient.
A better way would be to use an index i that marks the element in the list that is being looked at during the call:
public static int count(List<Integer> list, int n, int i) {
    //System.out.print(list.size() + ", " + n);
    //System.out.println();
    if (n < 0 || i < 0)
        return 0;
    int e = list.get(i); // e is the i-th element in the list
    if (e == n)
        return 1 + count(list, n, i-1); // Return 1 + check for more possibilities without picking e
    return count(list, n, i-1) + count(list, n - e, i-1); // Result if e is not picked + result if e is picked
}
You would then pass yourList.size() - 1 for i on the initial function call.
One more point is that when you return 1, you still have to add the number of possibilities for when your element e is not picked to be part of a sum. Otherwise, if - for example - your last element in the list was n, the recursion would end on the first step only returning 1 and not checking for more possible number combinations.
Finally, you might want to rewrite the algorithm using a dynamic approach, since that would give you a way better running time.
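To illustrate that last point, here is a minimal bottom-up sketch in Python (the function name is mine, and it assumes non-negative ints): ways[s] counts how many subsets of the elements processed so far sum to s. Note that, like the pick-or-skip recursion above, it counts subsets of positions, so the two 1s in the example are treated as distinct elements.

def count_ways(nums, n):
    # Bottom-up version of the pick-or-skip recursion, assuming non-negative ints.
    ways = [0] * (n + 1)
    ways[0] = 1                          # the empty subset sums to 0
    for e in nums:
        for s in range(n, e - 1, -1):    # go downwards so each element is used at most once
            ways[s] += ways[s - e]
    return ways[n]

print(count_ways([10, 1, 2, 7, 6, 1, 5], 8))  # prints 6: the two 1s count as different elements; the question's expected 4 counts distinct value combinations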

Related

Minimum common remainder of division

I have n pairs of numbers: ( p[1], s[1] ), ( p[2], s[2] ), ... , ( p[n], s[n] )
Where p[i] is integer greater than 1; s[i] is integer : 0 <= s[i] < p[i]
Is there any way to determine minimum positive integer a , such that for each pair :
( s[i] + a ) mod p[i] != 0
Anything better than brute force ?
It is possible to do better than brute force. Brute force would be O(A·n), where A is the minimum valid value for a that we are looking for.
The approach described below uses a min-heap and achieves O(n·log(n) + A·log(n)) time complexity.
First, notice that replacing a with a value of the form (p[i] - s[i]) + k * p[i] leads to a remainder equal to zero in the ith pair, for any non-negative integer k. Thus, the numbers of that form are invalid a values (the solution that we are looking for is different from all of them).
The proposed algorithm is an efficient way to generate the numbers of that form (for all i and k), i.e. the invalid values for a, in increasing order. As soon as the current value differs from the previous one by more than 1, it means that there was a valid a in-between.
The pseudocode below details this approach.
1. construct a min-heap from all the following pairs (p[i] - s[i], p[i]),
where the heap comparator is based on the first element of the pairs.
2. a0 = 0; maxA = lcm(p[i])
3. Repeat
3a. Retrieve and remove the root of the heap, (a, p[i]).
3b. If a - a0 > 1 then the result is a0 + 1. Exit.
3c. if a is at least maxA, then no solution exists. Exit.
3d. Insert into the heap the value (a + p[i], p[i]).
3e. a0 = a
Remark: it is possible for such an a to not exist. If a valid a is not found below LCM(p[1], p[2], ... p[n]), then it is guaranteed that no valid a exists.
I'll show below an example of how this algorithm works.
Consider the following (p, s) pairs: { (2, 1), (5, 3) }.
The first pair indicates that a should avoid values like 1, 3, 5, 7, ..., whereas the second pair indicates that we should avoid values like 2, 7, 12, 17, ... .
The min-heap initially contains the first element of each sequence (step 1 of the pseudocode), shown in brackets below:
[1], 3, 5, 7, ...
[2], 7, 12, 17, ...
We retrieve and remove the root of the heap, i.e. the minimum of the two bracketed values, which is 1, and we add into the heap the next element from that sequence, so the heap now contains the elements 2 and 3:
1, [3], 5, 7, ...
[2], 7, 12, 17, ...
We again retrieve the root of the heap, this time the value 2, and add the next element of that sequence (7) into the heap, which now contains 3 and 7:
1, [3], 5, 7, ...
2, [7], 12, 17, ...
The algorithm continues: we next retrieve the value 3 and add 5 into the heap, which then contains 5 and 7:
1, 3, [5], 7, ...
2, [7], 12, 17, ...
Finally, we retrieve the value 5. At this point we realize that the value 4 is not among the invalid values for a, so that is the solution we are looking for.
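Here is a minimal Python sketch of the pseudocode above (the function name and the (p, s) pair format are mine); it starts the previous value a0 at 0 so that the returned a is positive, matching the worked example:

import heapq
from functools import reduce
from math import lcm

def min_valid_a(pairs):
    # pairs is a list of (p, s); returns the smallest positive a with (s + a) % p != 0
    # for every pair, or None if no such a is found below lcm(p[i]).
    max_a = reduce(lcm, (p for p, s in pairs))
    heap = [(p - s, p) for p, s in pairs]   # smallest positive invalid value per pair
    heapq.heapify(heap)
    prev = 0
    while True:
        a, p = heapq.heappop(heap)
        if a - prev > 1:
            return prev + 1                 # gap in the invalid values: prev + 1 is valid
        if a >= max_a:
            return None
        heapq.heappush(heap, (a + p, p))
        prev = a

print(min_valid_a([(2, 1), (5, 3)]))        # 4, as in the example above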
I can think of two different solutions. First:
p_max = lcm(p[0], p[1], ..., p[n]) - 1;
for a = 0 to p_max:
    zero_found = false;
    for i = 0 to n:
        if (s[i] + a) mod p[i] == 0:
            zero_found = true;
            break;
    if !zero_found:
        return a;
return -1;
I suppose this is the one you call "brute force". Notice that p_max is the least common multiple of the p[i]s, minus 1 (the solution is either in the closed interval [0, p_max], or it does not exist). The complexity of this solution is O(n * p_max) in the worst case (plus the running time for calculating the lcm!). There is a better solution regarding the time complexity, but it uses an additional binary array - a classical time-space tradeoff. Its idea is similar to the Sieve of Eratosthenes, but for remainders instead of primes :)
p_max = lcm(p[0], p[1], ..., p[n]) - 1;
int remainders[p_max + 1] = {0};
for i = 0 to n:
    int rem = s[i] - p[i];
    while rem >= -p_max:
        remainders[-rem] = 1;
        rem -= p[i];
for i = 0 to p_max:
    if !remainders[i]:
        return i;
return -1;
Explanation of the algorithm: first, we create an array remainders that will indicate whether a certain negative remainder exists in the whole set. What is a negative remainder? It's simple: notice that 6 = 2 (mod 4) is equivalent to 6 = -2 (mod 4). If remainders[i] == 1, it means that if we add i to one of the s[j], we get a multiple of p[j] (i.e. 0 mod p[j], which is what we want to avoid). The array is populated with all possible negative remainders, up to -p_max. Now all we have to do is search for the first i such that remainders[i] == 0 and return it, if it exists; notice that the solution does not have to exist. In the problem text you indicated that you are searching for the minimum positive integer, but I don't see why zero would not fit (if all s[i] are positive). However, if that is a strong requirement, just change the final for loop to start from 1 instead of 0, and increment p_max.
The complexity of this algorithm is n + sum(p_max / p[i]) = n + p_max * sum(1 / p[i]), where i goes from 0 to n. Since all the p[i]s are at least 2, that is asymptotically better than the brute-force solution.
An example for better understanding: suppose that the input is (5,4), (5,1), (2,0). p_max is lcm(5,5,2) - 1 = 10 - 1 = 9, so we create array with 10 elements, initially filled with zeros. Now let's proceed pair by pair:
from the first pair, we have remainders[1] = 1 and remainders[6] = 1
second pair gives remainders[4] = 1 and remainders[9] = 1
last pair gives remainders[0] = 1, remainders[2] = 1, remainders[4] = 1, remainders[6] = 1 and remainders[8] = 1.
Therefore, the first index with a zero value in the array is 3, which is the desired solution.
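Here is a quick Python sketch of the same sieve idea (the function name is mine; it marks the forbidden values directly and searches from 1, since the question asks for a positive a), run on the example above:

from functools import reduce
from math import lcm

def min_valid_a_sieve(pairs):
    # pairs is a list of (p, s); marks every a in [0, p_max] with (s + a) % p == 0.
    p_max = reduce(lcm, (p for p, s in pairs)) - 1
    forbidden = [False] * (p_max + 1)
    for p, s in pairs:
        a = (p - s) % p                  # smallest a >= 0 that makes (s + a) % p == 0
        while a <= p_max:
            forbidden[a] = True
            a += p
    for a in range(1, p_max + 1):        # minimum *positive* a, per the question
        if not forbidden[a]:
            return a
    return -1

print(min_valid_a_sieve([(5, 4), (5, 1), (2, 0)]))   # 3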

Maximum sum increasing subsequence, changing algorithm to use memoization

I have the following code, which implements a recursive solution for this problem. Instead of using the reference variable 'x' to store the overall max, how can I return the result from the recursion itself, so that I don't need 'x'? That would make memoization easier.
// Test Cases:
// Input: {1, 101, 2, 3, 100, 4, 5} Output: 106
// Input: {3, 4, 5, 10} Output: 22
int sum(vector<int> seq)
{
    int x = INT32_MIN;
    helper(seq, seq.size(), x);
    return x;
}

int helper(vector<int>& seq, int n, int& x)
{
    if (n == 1) return seq[0];
    int maxTillNow = seq[0];
    int res = INT32_MIN;
    for (int i = 1; i < n; ++i)
    {
        res = helper(seq, i, x);
        if (seq[i - 1] < seq[n - 1] && res + seq[n - 1] > maxTillNow) maxTillNow = res + seq[n - 1];
    }
    x = max(x, maxTillNow);
    return maxTillNow;
}
First, I don't think this implementation is correct. For this input {5, 1, 2, 3, 4} it gives 14 while the correct result is 10.
For writing a recursive solution for this problem, you don't need to pass x as a parameter, as x is the result you expect to get from the function itself. Instead, you can construct a state as the following:
Current index: this is the index you're processing at the current step.
Last taken number: This is the value of the last number you included in your result subsequence so far. This is to make sure that you pick larger numbers in the following steps to keep the result subsequence increasing.
So your function definition is something like sum(current_index, last_taken_number) = the maximum increasing sum from current_index to the end, given that you may only pick elements greater than last_taken_number (to keep the result subsequence increasing). The answer you want is then sum(0, a small value), since that covers the whole sequence; by "a small value" I mean something smaller than any other value in the whole sequence.
sum(current_index, last_taken_number) could be calculated recursively using smaller substates. First assume the simple cases:
N = 0, result is 0 since you don't have a sequence at all.
N = 1, the sequence contains only one number, the result is either that number or 0 in case the number is negative (I'm considering an empty subsequence as a valid subsequence, so not taking any number is a valid answer).
Now to the tricky part, when N >= 2.
Assume that N = 2. In this case you have two options:
Either ignore the first number; then the problem reduces to the N=1 version where that number is the last one in the sequence. In this case the result is the same as sum(1, MIN_VAL), where current_index=1 since we already processed index=0 and decided to ignore it, and MIN_VAL is the small value we mentioned above.
Take the first number. Assume its value is X. Then the result is X + sum(1, X). That means the solution includes X, since you decided to include it in the subsequence, plus whatever the result of sum(1, X) is. Note that we're calling sum with MIN_VAL=X since we decided to take X, so the following values that we pick have to be greater than X.
Both decisions are valid. The result is whatever the maximum of these two. So we can deduce the general recurrence as the following:
sum(current_index, MIN_VAL) = max(
sum(current_index + 1, MIN_VAL) // ignore,
seq[current_index] + sum(current_index + 1, seq[current_index]) // take
).
The second decision is not always valid, so you have to make sure that the current element > MIN_VAL in order to be valid to take it.
This is a pseudo code for the idea:
sum(current_index, MIN_VAL){
    if(current_index == END_OF_SEQUENCE) return 0
    if( state[current_index, MIN_VAL] was calculated before ) return the previously calculated result
    decision_1 = sum(current_index + 1, MIN_VAL) // ignore case
    if(sequence[current_index] > MIN_VAL) // decision_2 is valid
        decision_2 = sequence[current_index] + sum(current_index + 1, sequence[current_index]) // take case
    else
        decision_2 = INT_MIN
    result = max(decision_1, decision_2)
    memoize result for the state[current_index, MIN_VAL]
    return result
}
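A minimal Python sketch of that pseudocode, memoizing the (current_index, last taken value) states with lru_cache (the names are mine):

from functools import lru_cache

def max_increasing_sum(seq):
    @lru_cache(maxsize=None)
    def solve(i, last):
        if i == len(seq):
            return 0
        ignore = solve(i + 1, last)            # skip seq[i]
        take = float('-inf')
        if seq[i] > last:                      # taking seq[i] keeps the subsequence increasing
            take = seq[i] + solve(i + 1, seq[i])
        return max(ignore, take)

    return solve(0, float('-inf'))

print(max_increasing_sum((1, 101, 2, 3, 100, 4, 5)))   # 106
print(max_increasing_sum((3, 4, 5, 10)))               # 22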

How to find the longest increasing subsequence starting at each position within the array in O(n log n) time

How could we find the longest increasing subsequence starting at each position of the array in O(n log n) time? I have seen techniques to find the longest increasing subsequence ending at each position of the array, but I am unable to find the other way round.
e.g.
for the sequence " 3 2 4 4 3 2 3 "
output must be " 2 2 1 1 1 2 1 "
I made a quick and dirty JavaScript implementation (note: it is O(n^2)):
function lis(a) {
    var tmpArr = Array(),
        result = Array(),
        i = a.length;
    while (i--) {
        var theValue = a[i],
            longestFound = tmpArr[theValue] || 1;
        for (var j = theValue + 1; j < tmpArr.length; j++) {
            if (tmpArr[j] >= longestFound) {
                longestFound = tmpArr[j] + 1;
            }
        }
        result[i] = tmpArr[theValue] = longestFound;
    }
    return result;
}
jsFiddle: http://jsfiddle.net/Bwj9s/1/
We run through the array right-to-left, keeping previous calculations in a separate temporary array for subsequent lookups.
The tmpArray contains the previously found subsequences beginning with any given value, so tmpArray[n] will represent the longest subsequence found (to the right of the current position) beginning with the value n.
The loop goes like this: For every index, we look up the value (and all higher values) in our tmpArray to see if we already found a subsequence which the value could be prepended to. If we find one, we simply add 1 to that length, update the tmpArray for the value, and move to the next index. If we don't find a working (higher) subsequence, we set the tmpArray for the value to 1 and move on.
In order to make it O(n log n) we observe that the tmpArray will always be a decreasing array -- it can and should use a binary search rather than a partial loop.
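As a concrete illustration of getting O(n log n), here is a Python sketch (mine) that uses the standard patience-sorting "tails" trick rather than the value-indexed tmpArr above: it scans right-to-left over negated values, so "increasing to the right of i" becomes "increasing ending at i":

from bisect import bisect_left

def lis_starting_at_each(a):
    tails, result = [], [0] * len(a)
    for i in range(len(a) - 1, -1, -1):
        v = -a[i]
        pos = bisect_left(tails, v)        # strictly increasing subsequences
        result[i] = pos + 1
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return result

print(lis_starting_at_each([3, 2, 4, 4, 3, 2, 3]))   # [2, 2, 1, 1, 1, 2, 1]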
EDIT: I didn't read the post completely, sorry. I thought you needed the longest increasing sub-sequence for all sequence. Re-edited the code to make it work.
I think it is possible to do it in linear time, actually. Consider this code:
int a[10] = {4, 2, 6, 10, 5, 3, 7, 5, 4, 10};
int maxLength[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; // array of zeros
int n = 10; // size of the array
int b = 0;
while (b != n) {
    int e = b;
    while (++e < n && a[b] < a[e]) {} // while the sequence is increasing, ++e
    while (b != e) {
        maxLength[b] = e - b - 1;
        b++;
    }
}

Algorithm to generate all unique permutations of fixed-length integer partitions?

I'm searching for an algorithm that generates all permutations of fixed-length partitions of an integer. Order does not matter.
For example, for n=4 and length L=3:
[(0, 2, 2), (2, 0, 2), (2, 2, 0),
(2, 1, 1), (1, 2, 1), (1, 1, 2),
(0, 1, 3), (0, 3, 1), (3, 0, 1), (3, 1, 0), (1, 3, 0), (1, 0, 3),
(0, 0, 4), (4, 0, 0), (0, 4, 0)]
I bumbled about with integer partitions + permutations for partitions whose length is less than L; but that was too slow because I got the same partition multiple times (because [0, 0, 1] may be a permutation of [0, 0, 1] ;-)
Any help appreciated, and no, this isn't homework -- personal interest :-)
Okay. First, forget about the permutations and just generate the partitions of length L (as suggested by #Svein Bringsli). Note that for each partition, you may impose an ordering on the elements, such as >. Now just "count," maintaining your ordering. For n = 4, k = 3:
(4, 0, 0)
(3, 1, 0)
(2, 2, 0)
(2, 1, 1)
So, how to implement this? It looks like: while subtracting 1 from position i and adding it to the next position maintains our order, subtract 1 from position i, add 1 to position i + 1, and move to the next position. If we're in the last position, step back.
Here's a little python which does just that:
def partition_helper(l, i, result):
    if i == len(l) - 1:
        return
    while l[i] - 1 >= l[i + 1] + 1:
        l[i] -= 1
        l[i + 1] += 1
        result.append(list(l))
        partition_helper(l, i + 1, result)

def partition(n, k):
    l = [n] + [0] * (k - 1)
    result = [list(l)]
    partition_helper(l, 0, result)
    return result
Now you have a list of lists (really a list of multisets), and generating all permutations of each multiset of the list gives you your solution. I won't go into that, there's a recursive algorithm which basically says, for each position, choose each unique element in the multiset and append the permutations of the multiset resulting from removing that element from the multiset.
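Here is a minimal Python sketch of that recursive idea (the generator name is mine); combined with the partition function above it yields the full answer:

def multiset_permutations(ms):
    if not ms:
        yield []
        return
    for x in sorted(set(ms)):                 # each distinct element once per position
        rest = list(ms)
        rest.remove(x)
        for tail in multiset_permutations(rest):
            yield [x] + tail

# All unique permutations of all length-3 partitions of 4 (15 lists in total):
all_perms = [p for part in partition(4, 3) for p in multiset_permutations(part)]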
Given that you ask this out of interest, you would probably be interested in an authoritative answer! It can be found in "7.2.1.2 - Generating all permutations" of Knuth's The Art of Computer Programming (subvolume 4A).
Also, 3 concrete algorithms can be found here.
As noted by #pbarranis, the code by #rlibby does not include all lists when n equals k. Below is Python code which does include all lists. This code is non-recursive, which may be more efficient with respect to memory usage.
def successor(n, l):
    idx = [j for j in range(len(l)) if l[j] < l[0]-1]
    if not idx:
        return False
    i = idx[0]
    l[1:i+1] = [l[i]+1]*(len(l[1:i+1]))
    l[0] = n - sum(l[1:])
    return True

def partitions(n, k):
    l = [0]*k
    l[0] = n
    results = []
    results.append(list(l))
    while successor(n, l):
        results.append(list(l))
    return results
The lists are created in colexicographic order (algorithm and more description here).
I found that using a recursive function was not good for larger lengths and integers because it chews up too much RAM, and using a generator / resumable-function (that 'yields' values) was too slow and required a large library to make it cross-platform.
So here's a non-recursive solution in C++ that produces the partitions in sorted order (which is ideal for permutations too). I've found this to be over 10 times faster than seemingly clever and concise recursive solutions I tried for partition lengths of 4 or greater, but for lengths of 1-3 the performance is not necessarily better (and I don't care about short lengths because they're fast with either approach).
// Inputs
unsigned short myInt = 10;
unsigned short len = 3;

// Partition variables.
vector<unsigned short> partition(len);
unsigned short last = len - 1;
unsigned short penult = last - 1;
short cur = penult; // Can dip into negative value when len is 1 or 2. Can be changed to unsigned if len is always >=3.
unsigned short sum = 0;

// Prefill partition with 0.
fill(partition.begin(), partition.end(), 0);

do {
    // Calculate remainder.
    partition[last] = max(0, myInt - sum); // Would only need "myInt - sum" if partition vector contains signed ints.

    /*
     *
     * DO SOMETHING WITH "partition" HERE.
     *
     */

    if (partition[cur + 1] <= partition[cur] + 1) {
        do {
            cur--;
        } while (
            cur > 0 &&
            accumulate(partition.cbegin(), partition.cbegin() + cur, 0) + (len - cur) * (partition[cur] + 1) > myInt
        );

        // Escape if seeked behind too far.
        // I think this if-statement is only useful when len is 1 or 2, can probably be removed if len is always >=3.
        if (cur < 0) {
            break;
        }

        // Increment the new cur position.
        sum++;
        partition[cur]++;

        // The value in each position must be at least as large as the
        // value in the previous position.
        for (unsigned short i = cur + 1; i < last; ++i) {
            sum = sum - partition[i] + partition[i - 1];
            partition[i] = partition[i - 1];
        }

        // Reset cur for next time.
        cur = penult;
    }
    else {
        sum++;
        partition[penult]++;
    }
} while (myInt - sum >= partition[penult]);
Where I've written DO SOMETHING WITH "partition" HERE is where you would actually consume the value. (On the last iteration the code will continue to execute the remainder of the loop, but I found this to be better than constantly checking for exit conditions; it's optimised for larger operations.) For myInt = 10 and len = 3, the partitions produced are:
0,0,10
0,1,9
0,2,8
0,3,7
0,4,6
0,5,5
1,1,8
1,2,7
1,3,6
1,4,5
2,2,6
2,3,5
2,4,4
3,3,4
Oh I've used "unsigned short" because I know my length and integer won't exceed certain limits, change that if it's not suitable for you :) Check the comments; one variable there (cur) had to be signed to handle lengths of 1 or 2 and there's a corresponding if-statement that goes with that, and I've also noted in a comment that if your partition vector has signed ints there is another line that can be simplified.
To get all the compositions, in C++ I would use this simple permutation strategy which thankfully does not produce any duplicates:
do {
    // Your code goes here.
} while (next_permutation(partition.begin(), partition.end()));
Nest that in the DO SOMETHING WITH "partition" HERE spot, and you're good to go.
An alternative to finding the compositions (based on the Java code here https://www.nayuki.io/page/next-lexicographical-permutation-algorithm) is as follows. I've found this to perform better than next_permutation().
// Process lexicographic permutations of partition (compositions).
composition = partition;
do {
    // Your code goes here.

    // Find longest non-increasing suffix
    i = last;
    while (i > 0 && composition[i - 1] >= composition[i]) {
        --i;
    }
    // Now i is the head index of the suffix

    // Are we at the last permutation already?
    if (i <= 0) {
        break;
    }

    // Let array[i - 1] be the pivot
    // Find rightmost element that exceeds the pivot
    j = last;
    while (composition[j] <= composition[i - 1])
        --j;
    // Now the value array[j] will become the new pivot
    // Assertion: j >= i

    // Swap the pivot with j
    temp = composition[i - 1];
    composition[i - 1] = composition[j];
    composition[j] = temp;

    // Reverse the suffix
    j = last;
    while (i < j) {
        temp = composition[i];
        composition[i] = composition[j];
        composition[j] = temp;
        ++i;
        --j;
    }
} while (true);
You'll notice some undeclared variables there; just declare them earlier in the code, before all your do-loops: i, j, pos, and temp (unsigned shorts), and composition (same type and length as partition). You can reuse the declaration of i for its use in a for-loop in the partitions code snippet. Also note that variables like last are used, which assumes this code is nested within the partitions code given earlier.
Again "Your code goes here" is where you consume the composition for your own purposes.
For reference here are my headers.
#include <vector> // for std::vector
#include <numeric> // for std::accumulate
#include <algorithm> // for std::next_permutation and std::max
using namespace std;
Despite the massive increase in speed using these approaches, for any sizeable integers and partition lengths this will still make you mad at your CPU :)
Like I mentioned above, I couldn't get #rlibby's code to work for my needs, and I needed code where n=l, so just a subset of your need. Here's my code below, in C#. I know it's not perfectly an answer to the question above, but I believe you'd only have to modify the first method to make it work for different values of l; basically add the same code #rlibby did, making the array of length l instead of length n.
public static List<int[]> GetPartitionPermutations(int n)
{
    int[] l = new int[n];
    var results = new List<int[]>();
    GeneratePermutations(l, n, n, 0, results);
    return results;
}

private static void GeneratePermutations(int[] l, int n, int nMax, int i, List<int[]> results)
{
    if (n == 0)
    {
        for (; i < l.Length; ++i)
        {
            l[i] = 0;
        }
        results.Add(l.ToArray());
        return;
    }
    for (int cnt = Math.Min(nMax, n); cnt > 0; --cnt)
    {
        l[i] = cnt;
        GeneratePermutations(l, (n - cnt), cnt, i + 1, results);
    }
}
A lot of searching led to this question. Here is an answer that includes the permutations:
#!/usr/bin/python
from itertools import combinations_with_replacement as cr

def all_partitions(n, k):
    """
    Return all possible combinations that add up to n
    i.e. divide n objects in k DISTINCT boxes in all possible ways
    """
    all_part = []
    for div in cr(range(n+1), k-1):
        counts = [div[0]]
        for i in range(1, k-1):
            counts.append(div[i] - div[i-1])
        counts.append(n - div[-1])
        all_part.append(counts)
    return all_part
For instance, all_partitions(4, 3) as asked by the OP gives:
[[0, 0, 4],
[0, 1, 3],
[0, 2, 2],
[0, 3, 1],
[0, 4, 0],
[1, 0, 3],
[1, 1, 2],
[1, 2, 1],
[1, 3, 0],
[2, 0, 2],
[2, 1, 1],
[2, 2, 0],
[3, 0, 1],
[3, 1, 0],
[4, 0, 0]]

Algorithm to find two repeated numbers in an array, without sorting

There is an array of size n (numbers are between 0 and n - 3) and only 2 numbers are repeated. Elements are placed randomly in the array.
E.g. in {2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, and repeated numbers are 3 and 5.
What is the best way to find the repeated numbers?
P.S. [You should not use sorting]
There is a O(n) solution if you know what the possible domain of input is. For example if your input array contains numbers between 0 to 100, consider the following code.
bool flags[100];
for(int i = 0; i < 100; i++)
    flags[i] = false;

for(int i = 0; i < input_size; i++)
    if(flags[input_array[i]])
        return input_array[i];
    else
        flags[input_array[i]] = true;
Of course there is the additional memory but this is the fastest.
OK, seems I just can't give it a rest :)
Simplest solution
int A[N] = {...};

int signed_1(int n) { return n%2<1 ? +n : -n; } // 0,-1,+2,-3,+4,-5,+6,-7,...
int signed_2(int n) { return n%4<2 ? +n : -n; } // 0,+1,-2,-3,+4,+5,-6,-7,...

long S1 = 0; // or int64, or long long, or some user-defined class
long S2 = 0; // so that it has enough bits to contain the sum without overflow

for (int i=0; i<N-2; ++i)
{
    S1 += signed_1(A[i]) - signed_1(i);
    S2 += signed_2(A[i]) - signed_2(i);
}
for (int i=N-2; i<N; ++i)
{
    S1 += signed_1(A[i]);
    S2 += signed_2(A[i]);
}
S1 = abs(S1);
S2 = abs(S2);
assert(S1 != S2); // this algorithm fails in this case
p = (S1+S2)/2;
q = abs(S1-S2)/2;
One sum (S1 or S2) contains p and q with the same sign, the other sum - with opposite signs, all other members are eliminated.
S1 and S2 must have enough bits to accommodate the sums; the algorithm does not tolerate overflow, because of the abs().
If abs(S1) == abs(S2) then the algorithm fails, though this value will still be the difference between p and q (i.e. abs(p - q) == abs(S1)).
Previous solution
I doubt somebody will ever encounter such a problem in the field ;)
and I guess, I know the teacher's expectation:
Lets take array {0,1,2,...,n-2,n-1},
The given one can be produced by replacing last two elements n-2 and n-1 with unknown p and q (less order)
so, the sum of elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
the sum of squares (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2
Simple math remains:
(1) p+q = S1
(2) p^2+q^2 = S2
Surely you won't solve it as math classes teach to solve square equations.
First, calculate everything modulo 2^32, that is, allow for overflow.
Then check pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to modulo and squaring)
And finally check found candidates if they really are present in array twice.
You know that your Array contains every number from 0 to n-3 and the two repeating ones (p & q). For simplicity, lets ignore the 0-case for now.
You can calculate the sum and the product over the array, resulting in:
1 + 2 + ... + n-3 + p + q = p + q + (n-3)(n-2)/2
So if you substract (n-3)(n-2)/2 from the sum of the whole array, you get
sum(Array) - (n-3)(n-2)/2 = x = p + q
Now do the same for the product:
1 * 2 * ... * (n-3) * p * q = (n - 3)! * p * q
prod(Array) / (n - 3)! = y = p * q
You now have these terms:
x = p + q
y = p * q
By Vieta's formulas, p and q are the roots of t^2 - x*t + y = 0.
Solving that quadratic gives you p and q.
Insert each element into a set/hashtable, first checking whether it is already in it.
You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + the two repeated numbers.
Edit: As others have noted, combined with the sum-of-squares, you can use this, I was just a little slow in figuring it out.
Check this old but good paper on the topic:
Finding Repeated Elements (PDF)
Some answers to the question: Algorithm to determine if array contains n…n+m? contain as a subproblem solutions which you can adopt for your purpose.
For example, here's a relevant part from my answer:
bool has_duplicates(int* a, int m, int n)
{
    /** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')

        Whether a[] array has duplicates.

        precondition: all values are in [n, n+m) range.
        feature: It marks visited items using a sign bit.
    */
    assert((INT_MIN - (INT_MIN - 1)) == 1); // check n == INT_MIN
    for (int *p = a; p != &a[m]; ++p) {
        *p -= (n - 1); // [n, n+m) -> [1, m+1)
        assert(*p > 0);
    }
    // determine: are there duplicates
    bool has_dups = false;
    for (int i = 0; i < m; ++i) {
        const int j = abs(a[i]) - 1;
        assert(j >= 0);
        assert(j < m);
        if (a[j] > 0)
            a[j] *= -1; // mark
        else { // already seen
            has_dups = true;
            break;
        }
    }
    // restore the array
    for (int *p = a; p != &a[m]; ++p) {
        if (*p < 0)
            *p *= -1; // unmark
        // [1, m+1) -> [n, n+m)
        *p += (n - 1);
    }
    return has_dups;
}
The program leaves the array unchanged (the array must be writeable, but its values are restored on exit).
It works for array sizes up to INT_MAX (9223372036854775807 on systems where int is 64 bits).
Suppose the array is
a[0], a[1], a[2], ..., a[n-1]
sumA = a[0] + a[1] + ... + a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + ... + a[n-1]*a[n-1]
sumFirstN = N*(N+1)/2 where N = n-3, so
sumFirstN = (n-3)(n-2)/2
similarly
sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6
Suppose the repeated elements are X and Y; then
X + Y = sumA - sumFirstN
X*X + Y*Y = sumASquare - sumFirstNSquare
Solving this quadratic gives the values of X and Y.
Time complexity = O(n)
Space complexity = O(1)
I know the question is very old but I suddenly hit it and I think I have an interesting answer to it.
We know this is a brainteaser and a trivial solution (i.e. HashMap, Sort, etc) no matter how good they are would be boring.
As the numbers are integers, they have constant bit size (i.e. 32). Let us assume we are working with 4 bit integers right now. We look for A and B which are the duplicate numbers.
We need 4 buckets, one for each bit. Bucket i holds the sum of the numbers whose bit i is 1. For example, bucket 1 gets 2, 3, 6, 7, ...:
Bucket 0 : Sum ( x where: x & 2^0 != 0 )
...
Bucket i : Sum ( x where: x & 2^i != 0 )
We know what would be the sum of each bucket if there was no duplicate. I consider this as prior knowledge.
Once above buckets are generated, a bunch of them would have values more than expected. By constructing the number from buckets we will have (A OR B for your information).
We can calculate (A XOR B) as follows:
A XOR B = Array[0] XOR Array[1] XOR ... XOR Array[n-1] XOR 0 XOR 1 XOR ... XOR (n-3)
Now going back to buckets, we know exactly which buckets have both our numbers and which ones have only one (from the XOR bit).
For the buckets that contain only one of the two numbers we can extract it as num = (bucket sum - expected bucket sum). We only need to recover one of the duplicate numbers this way, so as long as A XOR B has at least one set bit, we've got the answer.
But what if A XOR B is zero?
Well this case is only possible if both duplicate numbers are the same number, which then our number is the answer of A OR B.
Sorting the array would seem to be the best solution. A simple sort would then make the search trivial and would take a whole lot less time/space.
Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment each as you go through the array. something like this:
int count[10] = {0};
for (int i = 0; i < arraylen; i++) {
    count[array[i]]++;
}
Then just search your array for any numbers greater than 1. Those are the items with duplicates. Only requires one pass across the original array and one pass across the count array.
Here's an implementation in Python of #eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used then it requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrarily large integer sequences (it reads one integer at a time, so the whole sequence doesn't need to be in memory).
def two_repeated(iterable):
    s1, s2 = 0, 0
    for i, j in enumerate(iterable):
        s1 += j - i        # number_of_digits(s1) ~ 2 * number_of_digits(i)
        s2 += j*j - i*i    # number_of_digits(s2) ~ 4 * number_of_digits(i)
    s1 += (i - 1) + i
    s2 += (i - 1)**2 + i**2
    p = (s1 - int((2*s2 - s1**2)**.5)) // 2
    # `Decimal().sqrt()` could replace `int()**.5` for really large integers
    # or any function to compute integer square root
    return p, s1 - p
Example:
>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)
A more verbose version of the above code follows with explanation:
def two_repeated_seq(arr):
    """Return the only two duplicates from `arr`.

    >>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
    (3, 5)
    """
    n = len(arr)
    assert all(0 <= i < n - 2 for i in arr)  # all in range [0, n-2)
    assert len(set(arr)) == (n - 2)          # number of unique items
    s1 = (n-2) + (n-1)        # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits
    s2 = (n-2)**2 + (n-1)**2  # where k is a number of digits in `max(arr)`
    for i, j in enumerate(arr):
        s1 += j - i
        s2 += j*j - i*i
    """
    s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
       = sum(arr) - sum(range(n-2))
       = sum(range(n-2)) + p + q - sum(range(n-2))
       = p + q
    """
    assert s1 == (sum(arr) - sum(range(n-2)))
    """
    s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
       = sum(i*i for i in arr) - sum(i*i for i in range(n-2))
       = p*p + q*q
    """
    assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))
    """
    s1 = p+q
    -> s1**2 = (p+q)**2
    -> s1**2 = p*p + 2*p*q + q*q
    -> s1**2 - (p*p + q*q) = 2*p*q
    s2 = p*p + q*q
    -> p*q = (s1**2 - s2)/2

    Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1 then from Viete theorem follows
    that p and q are roots of x**2 - B*x + C = 0
    -> p = (B + sqrtD) / 2
    -> q = (B - sqrtD) / 2
       where sqrtD = sqrt(B**2 - 4*C)
    -> p = (s1 + sqrt(2*s2 - s1**2))/2
    """
    sqrtD = (2*s2 - s1**2)**.5
    assert int(sqrtD)**2 == (2*s2 - s1**2)  # perfect square
    sqrtD = int(sqrtD)
    assert (s1 - sqrtD) % 2 == 0  # even
    p = (s1 - sqrtD) // 2
    q = s1 - p
    assert q == ((s1 + sqrtD) // 2)
    assert sqrtD == (q - p)
    return p, q
NOTE: calculating integer square root of a number (~ N**4) makes the above algorithm non-linear.
Since a range is specified, you can perform radix sort. This would sort your array in O(n). Searching for duplicates in a sorted array is then O(n)
You can use a simple nested for loop:
int[] numArray = new int[] { 1, 2, 3, 4, 5, 7, 8, 3, 7 };
for (int i = 0; i < numArray.Length; i++)
{
    for (int j = i + 1; j < numArray.Length; j++)
    {
        if (numArray[i] == numArray[j])
        {
            //DO SOMETHING
        }
    }
}
Or you can filter the array and use a recursive function if you want to get the count of occurrences:
int[] array = { 1, 2, 3, 4, 5, 4, 4, 1, 8, 9, 23, 4, 6, 8, 9, 1, 4 };
int[] myNewArray = null;
int a = 1;

void GetDuplicates(int[] array)
{
    for (int i = 0; i < array.Length; i++)
    {
        for (int j = i + 1; j < array.Length; j++)
        {
            if (array[i] == array[j])
            {
                a += 1;
            }
        }
        Console.WriteLine(" {0} occurred {1} time/s", array[i], a);
        IEnumerable<int> num = from n in array where n != array[i] select n;
        myNewArray = null;
        a = 1;
        myNewArray = num.ToArray();
        break;
    }
    GetDuplicates(myNewArray);
}
answer to 18..
You have an array of 9 elements whose values start from 0, so the maximum element in your array will be 6. Take the sum of the numbers 0 to 6 and the sum of the array elements, and compute their difference (say d); this is p + q. Now take the XOR of the numbers 0 to 6 (say x1), and the XOR of the array elements (say x2). x2 is the XOR of all elements from 0 to 6 except the two repeated elements, since those cancel each other out. Now, for each array element a[i], take it as a candidate p; you can compute the corresponding q by subtracting that element from d. XOR p and q with x2 and check whether the result equals x1. Doing this for all elements, you will find the pair for which this condition holds, and you are done in O(n). Keep coding!
check this out ...
O(n) time and O(1) space complexity
for(i = 0; i < n; i++)
    xor = xor ^ arr[i];
for(i = 1; i <= n-3; i++)
    xor = xor ^ i;
So in the given example you will get the xor of 3 and 5
xor = xor & -xor; // Isolate the lowest set bit
for(i = 0; i < n; i++)
{
    if(arr[i] & xor)
        x = x ^ arr[i];
    else
        y = y ^ arr[i];
}
for(i = 1; i <= n-3; i++)
{
    if(i & xor)
        x = x ^ i;
    else
        y = y ^ i;
}
x and y are your answers
For each number: check if it exists in the rest of the array.
Without sorting you're going to have a keep track of numbers you've already visited.
In pseudocode this would basically be (done this way so I'm not just giving you the answer):

for each number in the list
    if number not already in unique numbers list
        add it to the unique numbers list
    else
        return that number as it is a duplicate
    end if
end for each
How about this:
for (i=0; i<n-1; i++) {
    for (j=i+1; j<n; j++) {
        if (a[i] == a[j]) {
            printf("%d appears more than once\n", a[i]);
            break;
        }
    }
}
Sure it's not the fastest, but it's simple and easy to understand, and requires
no additional memory. If n is a small number like 9, or 100, then it may well be the "best". (i.e. "Best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop etc..)
In c:
int arr[] = {2, 3, 6, 1, 5, 4, 0, 3, 5};
int num = 0, i;
for (i = 0; i < 9; i++)    /* XOR of all array elements */
    num = num ^ arr[i];
for (i = 0; i <= 6; i++)   /* XOR of the values 0..n-3 */
    num = num ^ i;
Since x^x = 0, every value that appears an even number of times across the array and the range 0..n-3 cancels out, leaving only the two repeated numbers. Let's call them a and b. We are left with a^b. We know a^b != 0, since a != b. Choose any set bit of a^b and use it as a mask, i.e. choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists -- one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that the pairs of a and b are in different buckets. So we can now apply the same method used above to each bucket independently, and discover what a and b are.
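Here is a compact Python sketch of that splitting idea (the function name is mine), run on the example from the question:

def two_repeated_xor(arr, n):
    x = 0
    for v in arr:
        x ^= v
    for v in range(n - 2):          # the values 0 .. n-3
        x ^= v
    low = x & -x                    # isolate one set bit of a ^ b
    a = b = 0
    for v in list(arr) + list(range(n - 2)):
        if v & low:
            a ^= v
        else:
            b ^= v
    return a, b

print(two_repeated_xor([2, 3, 6, 1, 5, 4, 0, 3, 5], 9))   # (3, 5), in some order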
I have written a small program which finds the elements that are not repeated. Please go through it and let me know your opinion; at the moment I assume an even number of elements, but it can easily be extended to odd numbers as well.
My idea is to first sort the numbers and then apply my algorithm; quicksort can be used to sort the elements.
Lets take an input array as below
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
The numbers 2, 10 and 4 are not repeated, and they are already in sorted order; if not sorted, use quicksort to sort the array first.
Lets apply my programme on this
#include <cstdio>
#include <vector>
using namespace std;

int main()
{
    //int arr[] = {2, 9, 6, 1, 1, 4, 2, 3, 5};
    int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
    int i = 0;
    vector<int> vec;
    int var = arr[0];
    for(i = 1; i < sizeof(arr)/sizeof(arr[0]); i += 2)
    {
        var = var ^ arr[i];
        if(var != 0)
        {
            //put in vector
            var = arr[i-1];
            vec.push_back(var);
            i = i-1;
        }
        var = arr[i+1];
    }
    for(int i = 0; i < vec.size(); i++)
        printf("value not repeated = %d\n", vec[i]);
}
This gives the output:
value not repeated= 2
value not repeated= 10
value not repeated= 4
It's simple and very straightforward, just use XOR, man.
for(i = 0; i < n-1; i++) {
    if(!(arr[i] ^ arr[i+1]))
        printf("Found Repeated number %5d", arr[i]);
}
Here is an algorithm that uses order statistics and runs in O(n).
You can solve this by repeatedly calling SELECT with the median as parameter.
You also rely on the fact that After a call to SELECT,
the elements that are less than or equal to the median are moved to the left of the median.
Call SELECT on A with the median as the parameter.
If the median value is floor(n/2) then the repeated values are to the right of the median, so you continue with the right half of the array.
Otherwise, a repeated value is to the left of the median, so you continue with the left half of the array.
You continue this way recursively.
For example:
When A={2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, then the median should be the value 4.
After the first call to SELECT
A={3, 2, 0, 1, <3>, 4, 5, 6, 5} The median value is smaller than 4 so we continue with the left half.
A={3, 2, 0, 1, 3}
After the second call to SELECT
A={1, 0, <2>, 3, 3} then the median should be 2 and it is so we continue with the right half.
A={3, 3}, found.
This algorithm runs in O(n+n/2+n/4+...)=O(n).
What about using the https://en.wikipedia.org/wiki/HyperLogLog?
Redis does http://redis.io/topics/data-types-intro#hyperloglogs
A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.
Well, using a nested for loop, and assuming the question is to find the numbers that occur exactly twice in an array:
def repeated(ar, n):
    count = 0
    for i in range(n):
        for j in range(i+1, n):
            if ar[i] == ar[j]:
                count += 1
        if count == 1:
            count = 0
            print("repeated:", ar[i])

arr = [2, 3, 6, 1, 5, 4, 0, 3, 5]
n = len(arr)
repeated(arr, n)
Why should we try doing maths (especially solving quadratic equations)? These are costly operations. A better way to solve this would be to construct a bitmap of size (n-3) bits, i.e. ((n-3)+7)/8 bytes. It is better to calloc this memory, so every single bit is initialized to 0. Then traverse the list and set the corresponding bit to 1 when a number is encountered; if the bit is already 1 for that number, then that is a repeated number.
This can be extended to find out whether any number is missing from the array.
This solution is O(n) in time complexity.
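A quick Python sketch of the bitmap idea, using a Python int as the bit array (the function name is mine):

def find_repeats_bitmap(arr):
    seen = 0
    repeats = []
    for v in arr:
        if seen >> v & 1:            # bit already set: v has been seen before
            repeats.append(v)
        else:
            seen |= 1 << v
    return repeats

print(find_repeats_bitmap([2, 3, 6, 1, 5, 4, 0, 3, 5]))   # [3, 5]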

Resources