Minimum transfer to make array equal - algorithm

This question was asked in an interview, and I still cannot figure out the right approach to it.
Given an array such as [7,2,2], find the minimum number of transfers required to make the array elements almost equal. If they cannot all be made equal, the larger elements should end up on the left side.
In the above example the final state of the array would be [4,4,3] and the answer would be 2 + 1 = 3.
We transfer 2 from the 7 to the first 2, and then another 1 from the 7 to the second 2.
If the input is [2,2,7] then the answer is 4, since we need to keep the bigger elements on the left side.
Final state = [4,4,3]
2 is transferred from the 7 to each of the two 2s, which makes the final count 4.

The minimum amount of transfers done 1 unit at a time is half the total amount by which the input differs from the desired array. "Almost equal" doesn't seem to mean any complication according to what you've given.

The solution is to imagine what the target array will be. This target array will depend only on the sum of the values in the original array, and the length of the array (which obviously must remain the same).
If the sum of the values is a multiple of the array length, then all values in the target array will be the same. If there is a remainder, however, that remainder is the number of values at the start of the array that will be one more than the remaining value(s) at the end of the array.
We don't actually have to store that target array. It is implicitly defined by the quotient and the remainder of the division of the sum by the array length.
The output of the function is the sum of differences with the actual input array value and the expected value at any array index. We should only count positive differences (i.e. transfers out of a value) as otherwise we would count transfers twice -- once on the outgoing side and again on the incoming side.
Here is an implementation in basic JavaScript:
function solve(arr) {
// Sum all array values
let sum = 0;
for (let i = 0; i < arr.length; i++) {
sum += arr[i];
}
// Get the integer quotient and remainder
let quotient = Math.floor(sum / arr.length);
let remainder = sum % arr.length;
// Determine the target value until the remainder is completely consumed:
let expected = quotient + 1;
// Collect all the positive differences with the expected value
let result = 0;
for (let i = 0; i < arr.length; i++) {
// If we have consumed the remainder, reduce the expected value
if (i == remainder) {
expected = quotient;
}
let transfer = arr[i] - expected;
// Only account for positive transfers to avoid double counting
if (transfer > 0) {
result += transfer;
}
}
return result;
}
let array = [7,2,2];
console.log(solve(array)); // 3

Let's start from the target array. What is it?
Having {7, 2, 2} we want to obtain {4, 4, 3}. So every item is at least 3 and some top items are 3 + 1 == 4.
The algorithm is
let sum = sum(original)
let rem = sum(original) % length(original) # here % stands for remainder
target[i] = sum / length(original) + (i < rem ? 1 : 0)
Having original and target
original: 7 2 2
target: 4 4 3
transfer: 3 2 1 (6 in total)
note, that
transfer[i] is just an absolute difference: abs(original[i] - target[i])
we count each transfer twice: once we subtract and then we add.
So the answer is
sum(transfer[i]) / 2 == sum(abs(original[i] - target[i])) / 2
Code (c#):
private static int Solve(int[] initial) {
// Don't forget about degenerated cases
if (initial is null || initial.Length <= 0)
return 0;
int sum = initial.Sum();
int rem = sum % initial.Length;
int result = 0;
for (int i = 0; i < initial.Length; ++i)
result += Math.Abs(sum / initial.Length + ((i < rem) ? 1 : 0) - initial[i]);
return result / 2;
}
Demo: (Fiddle)
int[][] tests = new int[][] {
new int[] {7, 2, 2},
new int[] {2, 2, 7},
new int[] {},
new int[] {2, 2, 2},
new int[] {1, 2, 3},
};
string report = string.Join(Environment.NewLine, tests
.Select(test => $"[{string.Join(", ", test)}] => {Solve(test)}"));
Console.Write(report);
Outcome:
[7, 2, 2] => 3
[2, 2, 7] => 4
[] => 0
[2, 2, 2] => 0
[1, 2, 3] => 1

Seems to me like a simple problem that can be solved with greedy approach.
Steps:
Sum up the input array elements to get S, and divide by the length n. Let's say the quotient is Q and the remainder (mod) is R. Then the target array will have its first R elements equal to Q+1, and the rest of the elements equal to Q.
Number of transfers will be half of the sum of absolute difference at each (corresponding) position in input and target arrays.
Example:
Input [7, 2, 2]
S=11 n=3 Q=11/3=3 R=11%3=2
Target [3+1, 3+1, 3]
Answer = (abs(7-4) + abs(2-4) + abs(2-3)) / 2 = 3
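The steps above could be sketched in Python roughly as follows. This is only a minimal sketch, assuming that one transfer moves a single unit between any two positions; the name min_transfers is an illustrative choice and not part of the question.

```python
def min_transfers(arr):
    """Count the unit transfers needed to reach the 'almost equal' target array."""
    n = len(arr)
    if n == 0:
        return 0
    # Q is the base value, R is how many leading elements get Q + 1
    q, r = divmod(sum(arr), n)
    total = 0
    for i, value in enumerate(arr):
        target = q + 1 if i < r else q
        total += abs(value - target)
    # Every transfer was counted twice: once where it leaves, once where it arrives
    return total // 2


print(min_transfers([7, 2, 2]))  # 3
print(min_transfers([2, 2, 7]))  # 4
```

The target array is never stored; it is implied by the quotient and the remainder.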

Related

Interview Question - Which numbers shows up most times in a list of intervals

I only heard of this question, so I don't know the exact limits. You are given a list of positive integers. Each two consecutive values form a closed interval. Find the number that appears in most intervals. If two values appear the same amount of times, select the smallest one.
Example: [4, 1, 6, 5] results in [1, 4], [1, 6], [5, 6] with 1, 2, 3, 4, 5, 6 each showing up twice. The correct answer would be 1 since it's the smallest.
I unfortunately have no idea how this can be done without going for an O(n^2) approach. The only optimisation I could think of was merging consecutive descending or ascending intervals, but this doesn't really work since [4, 3, 2] would count 3 twice.
Edit: Someone commented (but then deleted) a solution with this link http://www.zrzahid.com/maximum-number-of-overlapping-intervals/. I find this one the most elegant, even though it doesn't take into account the fact that some elements in my input would be both the beginning and end of some intervals.
Sort the intervals by their starting value. Then run a sweep line from the left (the global smallest value) to the right (the global maximum value). At each meeting point (start or end of an interval), count the number of intervals intersecting the sweep line (in O(log(n))). The time complexity of this algorithm is O(n log(n)), where n is the number of intervals.
The major observation is that the result will be one of the numbers in the input (proof left to the reader as simple exercise, yada yada).
My solution is inspired by @Prune's solution. The important step is mapping the input numbers to their order among all the distinct numbers in the input.
I will work with the C++ standard library. We first load all the numbers into a set. We then create a map from it, which maps each number to its order among all numbers.
int solve(const vector<int>& input) {
set<int> vals;
for (int n : input) {
vals.insert(n);
}
map<int, int> numberOrder;
int order = 0;
for (int n : vals) { // values in a set are ordered
numberOrder[n] = order++;
}
We then create the process array (similar to @Prune's solution).
vector<int> process(numberOrder.size() + 1, 0); // adding past-the-end element
// Each pair of consecutive values forms one closed interval
for (size_t i = 1; i < input.size(); ++i) {
    int last = input[i - 1];
    int curr = input[i];
    process[numberOrder[min(last, curr)]]++;
    process[numberOrder[max(last, curr)] + 1]--;
}
int appear = 0;
int maxAppear = 0;
int maxOrder = 0;
for (size_t i = 0; i < process.size(); ++i) {
    appear += process[i];
    // Strict comparison keeps the smallest value on ties
    if (appear > maxAppear) {
        maxAppear = appear;
        maxOrder = (int)i;
    }
}
Last, we need to find our found value in the map.
for (pair<int, int> a : numberOrder) {
if (a.second == maxOrder) {
return a.first;
}
}
}
This solution has O(n * log(n)) time complexity and O(n) space complexity, which is independent of the maximum input number size (unlike other solutions).
If the maximum number in the range array is less than the maximum size limit of an array, my solution will work with complexity O(n).
1- I create a new array to process the ranges and use it to find the number that appears in most intervals. For simplicity let's use your example: the input is [1, 4], [1, 6], [5, 6]. Let's call the new array process, give it length 6 and initialize it with 0s: process = [0,0,0,0,0,0].
2- Then loop through all the intervals, marking the start of each range with (+1) and the cell immediately after its end with (-1):
for range [1,4] process = [1,0,0,0,-1,0]
for range [1,6] process = [2,0,0,0,-1,0]
for range [5,6] process = [2,0,0,0,0,0]
3- The process array works as a cumulative array. Initialize a variable, let's call it appear, with process[0], which equals 2 in our case. Go through process and keep accumulating. What do you notice? Elements 1,2,3,4,5,6 all have appear = 2, because each of them appears twice in the given ranges.
4- Track the maximum while you loop through the process array and you will find the solution.
public class Test {
public static void main(String[] args) {
int[] arr = new int[] { 4, 1, 6, 5 };
System.out.println(solve(arr));
}
public static int solve(int[] range) {
// I assume that every number is smaller than 1e8
int size = (int) 1e8;
int[] process = new int[size];
// fill process array
for (int i = 0; i < range.length - 1; ++i) {
int start = Math.min(range[i], range[i + 1]);
int end = Math.max(range[i], range[i + 1]);
process[start]++;
if (end + 1 < size)
process[end + 1]--;
}
// Find the number that appears in most intervals (smallest one)
int appear = process[0];
int max = appear;
int solu = 0;
for (int i = 1; i < size; ++i) {
appear += process[i];
if (appear > max){
solu = i;
max = appear;
}
}
return solu;
}
}
Think of these as parentheses: ( to start an interval, ) to end it. Now check the bounds for each pair [a, b], and tally interval start/end markers for each position: the lower number gets an interval start to the left; the larger number gets a close interval to the right. For the given input:
Process [4, 1]
result: [0, 1, 0, 0, 0, -1]
Process [1, 6]
result: [0, 2, 0, 0, 0, -1, 0, -1]
Process [6, 5]
result: [0, 2, 0, 0, 0, 0, 0, -2]
Now, merely make a cumulative sum of this list; the position of the largest value is your desired answer.
result: [0, 2, 0, 0, 0, 0, 0, -2]
cumsum: [0, 2, 2, 2, 2, 2, 2, 0]
Note that the final sum must be 0, and can never be negative. The largest value is 2, which appears first at position 1. Thus, 1 is the lowest integer that appears the maximum (2) quantity.
Now that's one pass on the input, and one pass on the range of numbers. Note that with a simple table of values, you can save storage. The processing table would look something like:
[(1, 2)
(4, -1)
(5, 1)
(6, -2)]
If you have input with intervals both starting and stopping at a number, then you need to handle the starts first. For instance, [4, 3, 2] would look like
[(2, 1)
(3, 1)
(3, -1)
(4, -1)]
NOTE: maintaining a sorted insert list is O(n^2) time on the size of the input; sorting the list afterward is O(n log n). Either is O(n) space.
My first suggestion, indexing on the number itself, is O(n) time, but O(r) space on the range of input values.
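The sorted table of start/end markers described above could be sketched in Python like this. It is a rough sketch rather than the answerer's own code: the name most_covered and the event encoding (0 for a start, 1 for an end, so that starts sort first at equal values) are assumptions made for illustration.

```python
def most_covered(values):
    """Return the smallest number contained in the most intervals.

    Each pair of consecutive values forms one closed interval.
    Returns None when there are fewer than two values.
    """
    events = []
    for a, b in zip(values, values[1:]):
        lo, hi = min(a, b), max(a, b)
        events.append((lo, 0))  # interval start
        events.append((hi, 1))  # interval end, handled after starts at the same value
    events.sort()

    best_value, best_count, active = None, 0, 0
    for value, kind in events:
        if kind == 0:
            active += 1
            # Coverage only grows at a start, so maxima are found here;
            # the strict comparison keeps the smallest value on ties.
            if active > best_count:
                best_count, best_value = active, value
        else:
            active -= 1
    return best_value


print(most_covered([4, 1, 6, 5]))  # 1
print(most_covered([4, 3, 2]))     # 3
```

Sorting the events dominates the cost, so this runs in O(n log n) time and O(n) space, independent of how large the values are.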

How to divide a number into multiple parts(not equal) so that there sum is equal to input?

I want to divide a number, e.g. 40, into a given number of tokens (30 parts), each randomly selected from a range, such that their sum equals the input number, i.e. 40.
Edit:
The maximum of the range should be 40% and the minimum should be 0.
example:
range = (0,4)
1+1+0+1+1+0+3+0+3+0+0+2+0+4+4+1+1+0+1+1+0+3+0+4+0+2+2+0+4+1 = 40.
In the real-world scenario, I have a total of product user expressions which I need to distribute randomly into a record set for each day of the last month. I am using PHP but have been unable to come up with an algorithm for this situation.
Simple approach exploits "trial and error" method. Suitable for reasonable small input values.
Note - it might work long time when n is close to p*maxx. If such case is possible, it would more wise to distribute "holes" rather than "ones" (the second code)
import random

def randparts(n, p, maxx):
    lst = [0] * p
    while n > 0:
        r = random.randrange(p)
        if lst[r] < maxx:
            n -= 1
            lst[r] += 1
    return lst

print(randparts(20, 10, 4))
>>> [2, 0, 3, 2, 4, 2, 1, 3, 0, 3]
def randparts(n, p, maxx):
    if p * maxx >= n * 2:
        # Mostly empty: distribute "ones" into a list of zeros
        lst = [0] * p
        while n > 0:
            r = random.randrange(p)
            if lst[r] < maxx:
                n -= 1
                lst[r] += 1
    else:
        # Mostly full: distribute "holes" into a list of maximums
        lst = [maxx] * p
        n = maxx * p - n
        while n > 0:
            r = random.randrange(p)
            if lst[r] > 0:
                n -= 1
                lst[r] -= 1
    return lst

print(randparts(16, 10, 4))
print(randparts(32, 10, 4))
>> [2, 0, 0, 3, 4, 0, 0, 3, 2, 2]
>> [3, 4, 4, 4, 4, 0, 3, 3, 4, 3]
Since you mentioned that it is for 'a record set for each day in last month', I assume that the number of tokens could also be 28, or 31, and since you said 'randomly', here is what I would do:
1. create a function that takes in:
a. The number to sum to (40 in your example).
b. The maximum number of a single token (4 in your example).
c. The number of tokens (30 in your example).
2. Within the function, create an array the size of the number of tokens (28, 30, 31, or whatever)
3. Initialize all elements of the array to zero.
4. Check to make sure that it is possible to achieve the sum given the maximum single token value and number of tokens.
5. While I need to increment a token (sum > 0):
a. Select a random token.
b. Determine if the value of the token can be incremented without going over the max single token value.
c. If it can, then increment the token value and decrement the sum.
d. If the token cannot be incremented, then go back to 5a.
6. Return the array of tokens, or however you want them back (you didn't specify).
Here is an example in c#:
public int[] SegmentSum(int sum, int maxPart, int parts)
{
if (sum < 0 || maxPart < 0 || parts < 0 || parts * maxPart < sum)
throw new ArgumentOutOfRangeException();
Random rnd = new Random();
int[] tokens = Enumerable.Repeat(0, parts).ToArray();
while(sum > 0)
{
int token = rnd.Next(parts);
if (tokens[token] < maxPart)
{
tokens[token]++;
sum--;
}
}
return tokens;
}
Hope this helps you.

the least adding numbers--algorithm

I came across this problem online.
Given an integer:N and an array int arr[], you have to add some
elements to the array so that you can generate from 1 to N by using
(add) the element in the array.
Please keep in mind that you can only use each element in the array once when generating a certain x (1<=x<=N). Return the number of the least adding numbers.
For example:
N=6, arr = [1, 3]
1 is already in arr.
add 2 to the arr.
3 is already in arr
4 = 1 + 3
5 = 2 + 3
6 = 1 + 2 + 3
So we return 1 since we only need to add one element which is 2.
Can anyone give some hints?
N can always be made by adding a subset of the numbers 1 to N - 1, except for N = 2 and N = 1. So a number X can always be made when the previous consecutive elements 1 to X - 1 are already in the array.
Example -
arr[] = {1, 2, 5}, N = 9
ans := 0
1 is already present.
2 is already present.
3 is absent. But prior 1 to (3 - 1) elements are present. So 3 is added in the array. But as 3 is built using already existed elements, so answer won't increase.
same rule for 4 and 5
So, ans is 0
arr[] = {3, 4}, for any N >= 2
ans = 2
arr[] = {1, 3}, for any N >= 2
ans = 1
So, it seems that, if only 1 and 2 is not present in the array, we have to add that element regardless of the previous elements are already in array or not. All later numbers can be made by using previous elements. And when trying to making any number X (> 2), we will already found previous 1 to X - 1 elements in the array. So X can always be made.
So, basically we need to check if 1 and 2 is present or not. So answer of this problem won't be bigger than 2
Constraint 2
In above algorithm, we assume, when a new element X is not present in the array but it can be made using already existed elements of the array, then answer won't increase but X will be added in the array to be used for next numbers building. What if X can't be added in the array?
Then, Basically it will turn into a subset sum problem. For every missing number we have to check if the number can be made using any subset of elements in the array. Its a typical O(N^2) dynamic programming algorithm.
int subsetSum(vector<int>& arr, int N)
{
// The value of subset[i][j] will be true if there is a subset of arr[0..j-1]
// with sum equal to i
bool subset[N + 1][arr.size() + 1];
// If sum is 0, then answer is true
for (int i = 0; i <= arr.size(); i++)
subset[0][i] = true;
// If sum is not 0 and set is empty, then answer is false
for (int i = 1; i <= N; i++)
subset[i][0] = false;
// Fill the subset table in bottom-up manner
for (int i = 1; i <= N; i++)
{
    for (int j = 1; j <= arr.size(); j++)
    {
        // Either skip arr[j - 1] ...
        subset[i][j] = subset[i][j - 1];
        // ... or use it, if it does not exceed the sum i
        if (i >= arr[j - 1])
            subset[i][j] = subset[i][j] || subset[i - arr[j - 1]][j - 1];
    }
}
unordered_map<int, bool> exist;
for(int i = 0; i < arr.size(); ++i) {
exist[arr[i]] = true;
}
int ans = 0;
for(int i = 1; i <= N; ++i) {
if(!exist[i] && !subset[i][arr.size()]) { // missing and not buildable from any subset
ans++;
}
}
return ans;
}
Let A be the collection of input numbers.
Initialize a boolean array B to store in B[i] whether or not we can 'make' i by adding the numbers in A as described in the problem. Make all B[i] initially FALSE.
Then, pseudocode:
for i = 1 to N
    if B[i] && (not A.Contains(i))
        continue next i
    if not A.Contains(i)
        countAdded++
    for j = N-i downTo 1
        if B[j] then B[j+i] = TRUE
    B[i] = TRUE
next i
Explanation:
Within the (main) loop (i): B contains TRUE for the values that we can construct with the values in A that are lower than i. Initially, therefore, with i=1 all B are FALSE.
Then, for each i we have two aspects to consider: (a) is B[i] already TRUE? If not we'll have to add i; (b) is i present in A? because, see previous remark, at this point we haven't yet processed that A-value. So, even if B[i] is already TRUE we'll have to flag TRUE for all (other) B that we may reach with i.
Consequently:
For each i we first determine if either of these two cases applies, and if not, we skip to the next i.
Then, if A does NOT (yet) contain i, it must be the case that B[i] is FALSE, see skip-condition, and therefore we'll add i (to A, conceptually, but it's not necessary to actually put it into A).
Next, either we had i in A initially, or we have just added it. In any case, we'll need to flag B TRUE for all values that can be constructed with this new i. To do so, we better scan existing B in downward fashion; otherwise we may add i to a "new" B-value that has i already as constituent.
Finally, B[i] itself is set TRUE (it may already be TRUE...), simply because i is in A (orginally, or by adding)
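For anyone who wants to run it, the pseudocode above might translate to Python roughly as follows. This is a sketch, not the answerer's code: min_additions is a made-up name, the array A is turned into a set (so duplicates and values above N are effectively ignored), and the list can plays the role of B.

```python
def min_additions(n, arr):
    """Return how many numbers must be added so that 1..n can all be generated."""
    present = set(arr)        # A, for O(1) membership tests
    can = [False] * (n + 1)   # B: can[i] is True once i can be generated
    added = 0
    for i in range(1, n + 1):
        if i not in present:
            if can[i]:
                continue      # already reachable, nothing to do
            added += 1        # unreachable and missing: add i
        # i is now available (originally or by adding); scan downward so
        # that i is not used twice within the same sum
        for j in range(n - i, 0, -1):
            if can[j]:
                can[j + i] = True
        can[i] = True
    return added


print(min_additions(6, [1, 3]))      # 1
print(min_additions(10, [1, 2, 6]))  # 1
```

The nested loops make this O(N^2) in time and O(N) in space.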
One way can be to make a set of all possible numbers that can be generated by the array. This can be done in O(n^2) time. Then, check whether numbers from 1 to n are present in the set in O(1) time. If a number is not present, add it to the count of least adding numbers which was initially zero and make a new empty set. Take all elements of previous set and add not present number to them and add them (set-add method) to the new set. Replace original set with the union of original and new set. Doing this from 1 to n will give the sum of least adding numbers in O(n^3) time.
Sort the array (NLogN)
I think this should work (assuming the elements of arr are distinct):
max_sum = 0  # every number from 1 to max_sum can currently be generated
numbers_added = 0  # this will contain your final answer
for i in range(1, N+1):
    if i in arr:
        max_sum += i
    elif i > max_sum:
        numbers_added += 1
        max_sum += i
print(numbers_added)
For each number starting from 1 we may either
Have it in the arr. In such case we update the list of numbers we can make.
Don't have it in the arr but we can form it with existing numbers. We simply ignore it.
We don't have it in the arr and we cannot form it with existing numbers. We add it to the arr and update the list of numbers we can make.
For example:
N=10, arr = [1, 2, 6]
1 is already in arr.
2 is already in arr.
3 = 1 + 2
3 is not in the arr but we can already form 3.
4 is not present in arr and we cannot form 4 either with existing numbers.
So add 4 to the arr and update.
5 = 1 + 4
6 = 2 + 4
7 = 1 + 2 + 4
5 is not in arr but we can form 5.
6 is in array. So update
8 = 2 + 6
9 = 1 + 2 + 6
10 = 4 + 6
So we return 1 since we only need to add one element which is 4.
And following might be an implementation:
int calc(bool arr[], bool can[], int N) {
// arr[i] is true if we already have number
// can[i] is true if we have been able to form number i
int count=0;
for(int i=1;i<=N;i++) {
if(arr[i]==false && can[i]==true) { // case 2
continue;
} else if(arr[i]==false && can[i]==false) { // case 3
count++;
}
for(int j=N-i;j>=1;j--) { // update for case 1 and case 3
if(can[j]==true) can[i+j]=true;
}
can[i]=1;
}
return count;
}

How to find longest increasing sequence starting at each position within the array in O(n log n) time,

How could we find longest increasing sub-sequence starting at each position of the array in O(n log n) time, I have seen techniques to find longest increasing sequence ending at each position of the array but I am unable to find the other way round.
e.g.
for the sequence " 3 2 4 4 3 2 3 "
output must be " 2 2 1 1 1 2 1 "
I made a quick and dirty JavaScript implementation (note: it is O(n^2)):
function lis(a) {
var tmpArr = Array(),
result = Array(),
i = a.length;
while (i--) {
var theValue = a[i],
longestFound = tmpArr[theValue] || 1;
for (var j=theValue+1; j<tmpArr.length; j++) {
if (tmpArr[j] >= longestFound) {
longestFound = tmpArr[j]+1;
}
}
result[i] = tmpArr[theValue] = longestFound;
}
return result;
}
jsFiddle: http://jsfiddle.net/Bwj9s/1/
We run through the array right-to-left, keeping previous calculations in a separate temporary array for subsequent lookups.
The tmpArray contains the previously found subsequences beginning with any given value, so tmpArray[n] will represent the longest subsequence found (to the right of the current position) beginning with the value n.
The loop goes like this: For every index, we look up the value (and all higher values) in our tmpArray to see if we already found a subsequence which the value could be prepended to. If we find one, we simply add 1 to that length, update the tmpArray for the value, and move to the next index. If we don't find a working (higher) subsequence, we set the tmpArray for the value to 1 and move on.
In order to make it O(n log n) we observe that the tmpArray will always be a decreasing array -- it can and should use a binary search rather than a partial loop.
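One way the binary-search idea could look in Python is sketched below. Note that this swaps in the standard "tails" technique (via the bisect module) instead of the tmpArray described above: the array is scanned right-to-left with negated values, so a strictly increasing subsequence starting at a position becomes one ending there. The name lis_starting_at_each is made up for this example.

```python
from bisect import bisect_left


def lis_starting_at_each(a):
    """For each index, length of the longest strictly increasing subsequence starting there."""
    # tails[k] is the smallest negated value that can end a chain of length k + 1
    # among the elements seen so far (those to the right of the current index)
    tails = []
    result = [0] * len(a)
    for i in range(len(a) - 1, -1, -1):
        key = -a[i]
        # bisect_left keeps the chains strict: equal values replace, never extend
        k = bisect_left(tails, key)
        if k == len(tails):
            tails.append(key)
        else:
            tails[k] = key
        result[i] = k + 1
    return result


print(lis_starting_at_each([3, 2, 4, 4, 3, 2, 3]))  # [2, 2, 1, 1, 1, 2, 1]
```

Each element costs one binary search, which gives O(n log n) overall.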
EDIT: I didn't read the post completely, sorry. I thought you needed the longest increasing sub-sequence for all sequence. Re-edited the code to make it work.
I think it is possible to do it in linear time, actually. Consider this code:
int a[10] = {4, 2, 6, 10, 5, 3, 7, 5, 4, 10};
int maxLength[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; // array of zeros
int n = 10; // size of the array;
int b = 0;
while (b != n) {
int e = b;
while (++e < n && a[b] < a[e]) {} //while the sequence is increasing, ++e
while (b != e) { maxLength[b++] = e-b-1; }
}

Algorithm to find two repeated numbers in an array, without sorting

There is an array of size n (numbers are between 0 and n - 3) and only 2 numbers are repeated. Elements are placed randomly in the array.
E.g. in {2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, and repeated numbers are 3 and 5.
What is the best way to find the repeated numbers?
P.S. [You should not use sorting]
There is an O(n) solution if you know the possible domain of the input. For example, if your input array contains numbers between 0 and 99, consider the following code.
bool flags[100];
for(int i = 0; i < 100; i++)
flags[i] = false;
for(int i = 0; i < input_size; i++)
if(flags[input_array[i]])
return input_array[i];
else
flags[input_array[i]] = true;
Of course there is the additional memory but this is the fastest.
OK, seems I just can't give it a rest :)
Simplest solution
int A[N] = {...};
int signed_1(int n) { return n%2<1 ? +n : -n; } // 0,-1,+2,-3,+4,-5,+6,-7,...
int signed_2(int n) { return n%4<2 ? +n : -n; } // 0,+1,-2,-3,+4,+5,-6,-7,...
long S1 = 0; // or int64, or long long, or some user-defined class
long S2 = 0; // so that it has enough bits to contain sum without overflow
for (int i=0; i<N-2; ++i)
{
S1 += signed_1(A[i]) - signed_1(i);
S2 += signed_2(A[i]) - signed_2(i);
}
for (int i=N-2; i<N; ++i)
{
S1 += signed_1(A[i]);
S2 += signed_2(A[i]);
}
S1 = abs(S1);
S2 = abs(S2);
assert(S1 != S2); // this algorithm fails in this case
p = (S1+S2)/2;
q = abs(S1-S2)/2;
One sum (S1 or S2) contains p and q with the same sign, the other sum - with opposite signs, all other members are eliminated.
S1 and S2 must have enough bits to accommodate sums, the algorithm does not stand for overflow because of abs().
if abs(S1)==abs(S2) then the algorithm fails, though this value will still be the difference between p and q (i.e. abs(p - q) == abs(S1)).
Previous solution
I doubt somebody will ever encounter such a problem in the field ;)
and I guess, I know the teacher's expectation:
Lets take array {0,1,2,...,n-2,n-1},
The given one can be produced by replacing last two elements n-2 and n-1 with unknown p and q (less order)
so, the sum of elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
the sum of squares (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2
Simple math remains:
(1) p+q = S1
(2) p^2+q^2 = S2
Surely you won't solve it as math classes teach to solve square equations.
First, calculate everything modulo 2^32, that is, allow for overflow.
Then check pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to modulo and squaring)
And finally check found candidates if they really are present in array twice.
You know that your Array contains every number from 0 to n-3 and the two repeating ones (p & q). For simplicity, lets ignore the 0-case for now.
You can calculate the sum and the product over the array, resulting in:
1 + 2 + ... + n-3 + p + q = p + q + (n-3)(n-2)/2
So if you subtract (n-3)(n-2)/2 from the sum of the whole array, you get
sum(Array) - (n-3)(n-2)/2 = x = p + q
Now do the same for the product:
1 * 2 * ... * (n - 3) * p * q = (n - 3)! * p * q
prod(Array) / (n - 3)! = y = p * q
You now have these terms:
x = p + q
y = p * q
=> p and q are the two roots of t^2 - x*t + y = 0
Solving this quadratic gives p and q as (x ± sqrt(x^2 - 4y)) / 2.
Insert each element into a set/hashtable, first checking whether it is already in it.
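A minimal Python sketch of that idea, using the built-in set type (the function name find_repeated is just for illustration):

```python
def find_repeated(values):
    """Return the values that occur a second time, in order of their second occurrence."""
    seen = set()
    repeated = []
    for v in values:
        if v in seen:
            repeated.append(v)  # already inserted earlier, so it is a repeat
        else:
            seen.add(v)
    return repeated


print(find_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # [3, 5]
```

This is a single pass with O(n) extra memory and needs no assumption about the range of the values.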
You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + the two repeated numbers.
Edit: As others have noted, combined with the sum-of-squares, you can use this, I was just a little slow in figuring it out.
Check this old but good paper on the topic:
Finding Repeated Elements (PDF)
Some answers to the question: Algorithm to determine if array contains n…n+m? contain as a subproblem solutions which you can adopt for your purpose.
For example, here's a relevant part from my answer:
bool has_duplicates(int* a, int m, int n)
{
/** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')
Whether a[] array has duplicates.
precondition: all values are in [n, n+m) range.
feature: It marks visited items using a sign bit.
*/
assert((INT_MIN - (INT_MIN - 1)) == 1); // check n == INT_MIN
for (int *p = a; p != &a[m]; ++p) {
*p -= (n - 1); // [n, n+m) -> [1, m+1)
assert(*p > 0);
}
// determine: are there duplicates
bool has_dups = false;
for (int i = 0; i < m; ++i) {
const int j = abs(a[i]) - 1;
assert(j >= 0);
assert(j < m);
if (a[j] > 0)
a[j] *= -1; // mark
else { // already seen
has_dups = true;
break;
}
}
// restore the array
for (int *p = a; p != &a[m]; ++p) {
if (*p < 0)
*p *= -1; // unmark
// [1, m+1) -> [n, n+m)
*p += (n - 1);
}
return has_dups;
}
The program leaves the array unchanged (the array should be writeable but its values are restored on exit).
It works for array sizes up to INT_MAX (typically 2147483647, since int is usually 32 bits wide even on 64-bit systems).
suppose array is
a[0], a[1], a[2] ..... a[n-1]
sumA = a[0] + a[1] +....+a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + .... + a[n-1]*a[n-1]
sumFirstN = (N*(N+1))/2 where N=n-3 so
sumFirstN = (n-3)(n-2)/2
similarly
sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6
Suppose repeated elements are = X and Y
so X + Y = sumA - sumFirstN;
X*X + Y*Y = sumASquare - sumFirstNSquare;
So on solving this quadratic we can get value of X and Y.
Time Complexity = O(n)
space complexity = O(1)
I know the question is very old but I suddenly hit it and I think I have an interesting answer to it.
We know this is a brainteaser and a trivial solution (i.e. HashMap, Sort, etc) no matter how good they are would be boring.
As the numbers are integers, they have constant bit size (i.e. 32). Let us assume we are working with 4 bit integers right now. We look for A and B which are the duplicate numbers.
We need 4 buckets, one for each bit. Each bucket holds the sum of the numbers in which that specific bit is 1. For example, bucket 1 gets 2, 3, 6, 7, ...:
Bucket 0 : Sum ( x where: x & 2 power 0 != 0 )
...
Bucket i : Sum ( x where: x & 2 power i != 0 )
We know what would be the sum of each bucket if there was no duplicate. I consider this as prior knowledge.
Once above buckets are generated, a bunch of them would have values more than expected. By constructing the number from buckets we will have (A OR B for your information).
We can calculate (A XOR B) as follows:
A XOR B = (Array[0] XOR Array[1] XOR ... XOR Array[n-1]) XOR (0 XOR 1 XOR ... XOR n-3)
Now going back to buckets, we know exactly which buckets have both our numbers and which ones have only one (from the XOR bit).
For the buckets that have only one number we can extract the number num = (sum - expected sum of bucket). However, we should be good only if we can find one of the duplicate numbers so if we have at least one bit in A XOR B, we've got the answer.
But what if A XOR B is zero?
Well this case is only possible if both duplicate numbers are the same number, which then our number is the answer of A OR B.
Sorting the array would seem to be the best solution. A simple sort would then make the search trivial and would take a whole lot less time/space.
Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment each as you go through the array. something like this:
int count[10] = {0}; // counters must start at zero
for (int i = 0; i < arraylen; i++) {
count[array[i]]++;
}
Then just search the count array for any entries greater than 1. Those are the items with duplicates. This only requires one pass across the original array and one pass across the count array.
Here's an implementation in Python of @eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used, then it requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrarily large integer sequences (it reads one integer at a time, so the whole sequence doesn't need to be in memory).
def two_repeated(iterable):
    s1, s2 = 0, 0
    for i, j in enumerate(iterable):
        s1 += j - i     # number_of_digits(s1) ~ 2 * number_of_digits(i)
        s2 += j*j - i*i # number_of_digits(s2) ~ 4 * number_of_digits(i)
    s1 += (i - 1) + i
    s2 += (i - 1)**2 + i**2
    p = (s1 - int((2*s2 - s1**2)**.5)) // 2
    # `Decimal().sqrt()` could replace `int()**.5` for really large integers
    # or any function to compute integer square root
    return p, s1 - p
Example:
>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)
A more verbose version of the above code follows with explanation:
def two_repeated_seq(arr):
    """Return the only two duplicates from `arr`.
    >>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
    (3, 5)
    """
    n = len(arr)
    assert all(0 <= i < n - 2 for i in arr) # all in range [0, n-2)
    assert len(set(arr)) == (n - 2) # number of unique items
    s1 = (n-2) + (n-1)       # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits
    s2 = (n-2)**2 + (n-1)**2 # where k is a number of digits in `max(arr)`
    for i, j in enumerate(arr):
        s1 += j - i
        s2 += j*j - i*i
    """
    s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
       = sum(arr) - sum(range(n-2))
       = sum(range(n-2)) + p + q - sum(range(n-2))
       = p + q
    """
    assert s1 == (sum(arr) - sum(range(n-2)))
    """
    s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
       = sum(i*i for i in arr) - sum(i*i for i in range(n-2))
       = p*p + q*q
    """
    assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))
    """
    s1 = p+q
    -> s1**2 = (p+q)**2
    -> s1**2 = p*p + 2*p*q + q*q
    -> s1**2 - (p*p + q*q) = 2*p*q
    s2 = p*p + q*q
    -> p*q = (s1**2 - s2)/2
    Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1 then from Viete theorem follows
    that p and q are roots of x**2 - B*x + C = 0
    -> p = (B + sqrtD) / 2
    -> q = (B - sqrtD) / 2
    where sqrtD = sqrt(B**2 - 4*C)
    -> p = (s1 + sqrt(2*s2 - s1**2))/2
    """
    sqrtD = (2*s2 - s1**2)**.5
    assert int(sqrtD)**2 == (2*s2 - s1**2) # perfect square
    sqrtD = int(sqrtD)
    assert (s1 - sqrtD) % 2 == 0 # even
    p = (s1 - sqrtD) // 2
    q = s1 - p
    assert q == ((s1 + sqrtD) // 2)
    assert sqrtD == (q - p)
    return p, q
NOTE: calculating integer square root of a number (~ N**4) makes the above algorithm non-linear.
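For large inputs the floating-point `**.5` can also lose precision. One possible way around that, sketched here under the assumption that Python 3.8 or later is available, is the standard library's exact `math.isqrt`. The name `two_repeated_exact` is made up for this illustration, and the input is assumed to be valid, so that `2*s2 - s1**2` is a non-negative perfect square.

```python
import math


def two_repeated_exact(iterable):
    """Same single pass as two_repeated, but with an exact integer square root."""
    s1 = s2 = 0
    count = 0
    for i, j in enumerate(iterable):
        s1 += j - i
        s2 += j*j - i*i
        count += 1
    last = count - 1                  # index of the final element
    s1 += (last - 1) + last           # now s1 == p + q
    s2 += (last - 1)**2 + last**2     # now s2 == p*p + q*q
    root = math.isqrt(2*s2 - s1*s1)   # exact: equals q - p for valid input
    p = (s1 - root) // 2
    return p, s1 - p


print(two_repeated_exact([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
```

This does not change the cost remark in the note above; it only removes the rounding risk of the float conversion.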
Since a range is specified, you can perform radix sort. This would sort your array in O(n). Searching for duplicates in a sorted array is then O(n).
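A rough sketch of that idea in Python, assuming non-negative integers. The helper names `radix_sort` and `duplicates_after_sort` and the choice of base 256 are illustrative assumptions, not something prescribed by the answer.

```python
def radix_sort(values, base=256):
    """Least-significant-digit radix sort for non-negative integers."""
    values = list(values)
    if not values:
        return values
    largest = max(values)
    place = 1
    while place <= largest:
        # distribute by the current digit, keeping the existing order (stable)
        buckets = [[] for _ in range(base)]
        for v in values:
            buckets[(v // place) % base].append(v)
        values = [v for bucket in buckets for v in bucket]
        place *= base
    return values


def duplicates_after_sort(values):
    """Sort, then report each value that equals its left neighbour."""
    ordered = radix_sort(values)
    found = []
    for prev, cur in zip(ordered, ordered[1:]):
        if prev == cur and (not found or found[-1] != cur):
            found.append(cur)
    return found


print(duplicates_after_sort([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # [3, 5]
```

The number of passes depends on how many base-256 digits the largest value has, so the sort is linear only while that digit count is bounded.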
You can use a simple nested for loop:
int[] numArray = new int[] { 1, 2, 3, 4, 5, 7, 8, 3, 7 };
for (int i = 0; i < numArray.Length; i++)
{
    for (int j = i + 1; j < numArray.Length; j++)
    {
        if (numArray[i] == numArray[j])
        {
            //DO SOMETHING
        }
    }
}
Or you can filter the array and use a recursive function if you want to get the count of occurrences:
// needs: using System; using System.Collections.Generic; using System.Linq;
int[] array = { 1, 2, 3, 4, 5, 4, 4, 1, 8, 9, 23, 4, 6, 8, 9, 1, 4 };
int[] myNewArray = null;
int a = 1;
void GetDuplicates(int[] array)
{
    // Base case: nothing left to count, so stop the recursion
    if (array.Length == 0)
        return;
    for (int i = 0; i < array.Length; i++)
    {
        // Count how many later elements equal array[i]
        for (int j = i + 1; j < array.Length; j++)
        {
            if (array[i] == array[j])
            {
                a += 1;
            }
        }
        Console.WriteLine(" {0} occurred {1} time/s", array[i], a);
        // Drop every copy of array[i] and recurse on what remains
        IEnumerable<int> num = from n in array where n != array[i] select n;
        a = 1;
        myNewArray = num.ToArray();
        break;
    }
    GetDuplicates(myNewArray);
}
answer to 18..
You have an array of 9 elements whose values start at 0, so the largest element in your array is 6. Take the sum of the numbers from 0 to 6 and the sum of the array elements, and compute their difference (say d); this is p + q. Now take the XOR of the numbers from 0 to 6 (say x1) and the XOR of the array elements (say x2). x2 is the XOR of all numbers from 0 to 6 except the two repeated ones, since those cancel each other out. Then, for each element a[i] of the array, let p = a[i] and compute q = d - p. XOR p and q with x2 and check whether the result equals x1. Doing this for all elements gives you the elements for which the condition holds, and you are done in O(n). Note that the sum and the XOR together do not always identify a single pair (for values 0 to 6, the pairs (1, 6), (2, 5) and (3, 4) all have sum 7 and XOR 7), so a candidate pair may still need one verification pass over the array. Keep coding!
check this out ...
O(n) time and O(1) space complexity
int xor = 0, x = 0, y = 0;
for(i = 0; i < n; i++)
    xor = xor ^ arr[i];
for(i = 1; i <= n-3; i++)
    xor = xor ^ i;
So in the given example you will get the xor of 3 and 5
xor = xor & -xor; //Isolate the lowest set bit
for(i = 0; i < n; i++)
{
    if(arr[i] & xor)
        x = x ^ arr[i];
    else
        y = y ^ arr[i];
}
for(i = 1; i <= n-3; i++)
{
    if(i & xor)
        x = x ^ i;
    else
        y = y ^ i;
}
x and y are your answers
For each number: check if it exists in the rest of the array.
Without sorting you're going to have to keep track of numbers you've already visited.
In pseudocode this would basically be (done this way so I'm not just giving you the answer):
for each number in the list
    if number not already in unique numbers list
        add it to the unique numbers list
    else
        return that number as it is a duplicate
    end if
end for each
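For reference, one way the pseudocode above might look in Python, using a `set` as the "unique numbers list" so that each membership test takes expected constant time. The names `first_duplicate` and `all_duplicates` are illustrative only.

```python
def first_duplicate(numbers):
    """Return the first value seen a second time, or None if all are unique."""
    seen = set()
    for number in numbers:
        if number in seen:
            return number          # already visited: it is a duplicate
        seen.add(number)
    return None


def all_duplicates(numbers):
    """Variant that keeps going and collects every duplicated value once."""
    seen, found = set(), []
    for number in numbers:
        if number in seen:
            if number not in found:
                found.append(number)
        else:
            seen.add(number)
    return found


print(first_duplicate([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # 3
print(all_duplicates([2, 3, 6, 1, 5, 4, 0, 3, 5]))   # [3, 5]
```

This trades O(n) extra memory for a single pass over the input.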
How about this:
for (i=0; i<n-1; i++) {
    for (j=i+1; j<n; j++) {
        if (a[i] == a[j]) {
            printf("%d appears more than once\n",a[i]);
            break;
        }
    }
}
Sure it's not the fastest, but it's simple and easy to understand, and requires no additional memory. If n is a small number like 9, or 100, then it may well be the "best". (i.e. "Best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop etc.)
In C:
int arr[] = {2, 3, 6, 1, 5, 4, 0, 3, 5};
int n = 9, num = 0, i;
for (i = 0; i < n; i++)
    num = num ^ arr[i];   /* XOR of all array elements */
for (i = 0; i <= n - 3; i++)
    num = num ^ i;        /* XOR with every value in the range 0..n-3 */
Since x^x=0, the numbers that occur an even number of times are neutralized. Let's call the two repeated numbers a and b. We are left with a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b, and use that as a mask, i.e. choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists: one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b are in different buckets. So we can now apply the same method used above to each bucket independently, and discover what a and b are.
I have written a small program that finds the elements that are not repeated. Go through it and let me know your opinion. At the moment I assume the number of elements is even, but it can easily be extended to odd counts as well.
My idea is to first sort the numbers and then apply my algorithm. Quicksort can be used to sort the elements.
Let's take an input array as below:
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
The numbers 2, 10 and 4 are not repeated. Equal elements are already next to each other here; if they are not, use quicksort to sort the array first.
Let's apply my program to this:
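#include <cstdio>
#include <vector>
using namespace std;
int main()
{
    //int arr[] = {2, 9, 6, 1, 1, 4, 2, 3, 5};
    int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
    const int n = sizeof(arr)/sizeof(arr[0]);
    int i = 0;
    vector<int> vec;
    int var = arr[0];
    for(i = 1 ; i < n; i += 2)
    {
        var = var ^ arr[i];   // zero only when arr[i-1] == arr[i]
        if(var != 0 )
        {
            //put in vector
            var = arr[i-1];
            vec.push_back(var);
            i = i-1;          // step back so the next pair starts at arr[i]
        }
        if(i + 1 < n)         // do not read past the end of the array
            var = arr[i+1];
    }
    for(size_t k = 0 ; k < vec.size() ; k++)
        printf("value not repeated = %d\n",vec[k]);
    return 0;
}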
This gives the output:
value not repeated = 2
value not repeated = 10
value not repeated = 4
It's simple and very straightforward: sort the array, then just use XOR on neighbouring elements.
for(i = 0; i < n - 1; i++) {
    if(!(arr[i] ^ arr[i+1]))
        printf("Found Repeated number %5d",arr[i]);
}
Here is an algorithm that uses order statistics and runs in O(n).
You can solve this by repeatedly calling SELECT with the median as parameter.
It also relies on the fact that, after a call to SELECT, the elements that are less than or equal to the median are moved to the left of the median.
Call SELECT on A with the median as the parameter.
If the median value is floor(n/2), then the repeated values are to the right of the median, so you continue with the right half of the array.
Otherwise a repeated value is to the left of the median, so you continue with the left half of the array.
You continue this way recursively.
For example:
When A={2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, then the median should be the value 4.
After the first call to SELECT
A={3, 2, 0, 1, <3>, 4, 5, 6, 5} The median value is smaller than 4 so we continue with the left half.
A={3, 2, 0, 1, 3}
After the second call to SELECT
A={1, 0, <2>, 3, 3} then the median should be 2 and it is so we continue with the right half.
A={3, 3}, found.
This algorithm runs in O(n+n/2+n/4+...)=O(n).
What about using HyperLogLog (https://en.wikipedia.org/wiki/HyperLogLog)?
Redis implements it: http://redis.io/topics/data-types-intro#hyperloglogs
A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.
Well, using a nested for loop and assuming each repeated number occurs only twice in the array:
def repeated(ar, n):
    for i in range(n):
        count = 0                  # matches found after position i
        for j in range(i+1, n):
            if ar[i] == ar[j]:
                count += 1
        if count == 1:
            print("repeated:", ar[i])

arr = [2, 3, 6, 1, 5, 4, 0, 3, 5]
n = len(arr)
repeated(arr, n)
Why should we try doing maths (especially solving quadratic equations)? These are costly operations. The best way to solve this would be to construct a bitmap of (n - 3) bits, i.e. ((n - 3) + 7) / 8 bytes. It is better to calloc this memory, so that every single bit is initialized to 0. Then traverse the list and set the corresponding bit to 1 for each number encountered; if the bit is already set to 1 for that number, then that is the repeated number.
This can be extended to find out whether any number is missing from the array.
This solution is O(n) in time complexity.
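A small sketch of the bitmap idea in Python, with a zero-filled `bytearray` standing in for the calloc'ed memory. The function name `repeated_with_bitmap` and the `max_value` parameter are assumptions made for this example; the bitmap here is sized for the values 0..max_value, and each repeated value is assumed to occur exactly twice.

```python
def repeated_with_bitmap(arr, max_value):
    """Return the values whose bit was already set when they were visited."""
    # one bit per possible value 0..max_value, rounded up to whole bytes
    bitmap = bytearray((max_value + 1 + 7) // 8)
    repeated = []
    for v in arr:
        byte_index = v >> 3        # which byte holds the bit for v
        mask = 1 << (v & 7)        # which bit inside that byte
        if bitmap[byte_index] & mask:
            repeated.append(v)     # bit already set: v was seen before
        else:
            bitmap[byte_index] |= mask
    return repeated


print(repeated_with_bitmap([2, 3, 6, 1, 5, 4, 0, 3, 5], 6))  # [3, 5]
```

Any bit in the range that is still 0 after the pass corresponds to a missing number, which is the extension mentioned above.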

Resources