Algorithm to find two repeated numbers in an array, without sorting

There is an array of size n (numbers are between 0 and n - 3) and only 2 numbers are repeated. Elements are placed randomly in the array.
E.g. in {2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, and repeated numbers are 3 and 5.
What is the best way to find the repeated numbers?
P.S. [You should not use sorting]

There is an O(n) solution if you know the possible domain of the input. For example, if your input array contains numbers between 0 and 99, consider the following code.
bool flags[100];
for(int i = 0; i < 100; i++)
    flags[i] = false;
for(int i = 0; i < input_size; i++)
    if(flags[input_array[i]])
        return input_array[i]; // first repeated value found
    else
        flags[input_array[i]] = true;
Of course this needs additional memory, but it is the fastest. To report both repeated numbers, record each hit instead of returning on the first one.

OK, seems I just can't give it a rest :)
Simplest solution
int A[N] = {...};
int signed_1(int n) { return n%2<1 ? +n : -n; } // 0,-1,+2,-3,+4,-5,+6,-7,...
int signed_2(int n) { return n%4<2 ? +n : -n; } // 0,+1,-2,-3,+4,+5,-6,-7,...
long S1 = 0; // or int64, or long long, or some user-defined class
long S2 = 0; // so that it has enough bits to contain sum without overflow
for (int i=0; i<N-2; ++i)
{
S1 += signed_1(A[i]) - signed_1(i);
S2 += signed_2(A[i]) - signed_2(i);
}
for (int i=N-2; i<N; ++i)
{
S1 += signed_1(A[i]);
S2 += signed_2(A[i]);
}
S1 = abs(S1);
S2 = abs(S2);
assert(S1 != S2); // this algorithm fails in this case
long p = (S1+S2)/2;
long q = abs(S1-S2)/2;
One sum (S1 or S2) contains p and q with the same sign, the other sum contains them with opposite signs; all other members cancel out.
S1 and S2 must have enough bits to hold the sums: because of abs(), the algorithm does not tolerate overflow.
If abs(S1)==abs(S2) the algorithm fails, though that common value still equals either p + q or abs(p - q).
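For illustration, the signed-sum idea above can be sketched in Python, where integers do not overflow. The function name `two_repeated_signed` and the choice to return `None` in the failing case are assumptions of this sketch, not part of the original answer.

```python
def signed_1(n):
    # Sign pattern 0,-1,+2,-3,...: the sign is decided by bit 0.
    return n if n % 2 < 1 else -n


def signed_2(n):
    # Sign pattern 0,+1,-2,-3,+4,...: the sign is decided by bit 1.
    return n if n % 4 < 2 else -n


def two_repeated_signed(a):
    """Return the two repeated values of `a`, or None when S1 == S2."""
    n = len(a)
    # Every value in 0..n-3 cancels, so only the extra copies of p and q remain.
    s1 = sum(signed_1(x) for x in a) - sum(signed_1(i) for i in range(n - 2))
    s2 = sum(signed_2(x) for x in a) - sum(signed_2(i) for i in range(n - 2))
    s1, s2 = abs(s1), abs(s2)
    if s1 == s2:
        return None  # the case in which the algorithm fails
    return (s1 + s2) // 2, abs(s1 - s2) // 2
```

With the array from the question the two sums are 8 and 2, which separate into 5 and 3.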
Previous solution
I doubt somebody will ever encounter such a problem in the field ;)
and I guess, I know the teacher's expectation:
Let's take the array {0,1,2,...,n-2,n-1}.
The given array can be produced from it by replacing the last two elements, n-2 and n-1, with the unknown p and q (order does not matter),
so the sum of its elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
and the sum of squares will be (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2
Subtracting the known terms from the measured sums leaves simple math, where S1 and S2 now denote those remainders:
(1) p+q = S1
(2) p^2+q^2 = S2
Surely you won't solve it the way math classes teach you to solve quadratic equations.
First, calculate everything modulo 2^32, that is, allow for overflow.
Then check pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to modulo and squaring).
And finally check whether the candidates found really are present in the array twice.

You know that your array contains every number from 0 to n-3 plus the two repeated ones (p and q). For simplicity, let's ignore the 0 case for now, since a single 0 makes the product vanish.
You can calculate the sum and the product over the array, resulting in:
1 + 2 + ... + (n-3) + p + q = p + q + (n-3)(n-2)/2
So if you subtract (n-3)(n-2)/2 from the sum of the whole array, you get
sum(Array) - (n-3)(n-2)/2 = x = p + q
Now do the same for the product:
1 * 2 * ... * (n-3) * p * q = (n-3)! * p * q
prod(Array) / (n-3)! = y = p * q
You now have these terms:
x = p + q
y = p * q
=> p and q are the two roots of t^2 - x*t + y = 0
Solving this quadratic gives p and q as (x ± sqrt(x^2 - 4y)) / 2. Note that the product grows like a factorial, so it needs arbitrary-precision integers for all but tiny n.
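A minimal Python sketch of the sum-and-product idea follows. The function name and the way the 0 case is handled (skipping zeros in the product, and treating two zeros as "0 is one of the repeated values") are assumptions of this sketch rather than something the answer spells out.

```python
from math import factorial, isqrt, prod


def two_repeated_sum_product(arr):
    """Recover the two repeated values from the sum and the product."""
    n = len(arr)
    x = sum(arr) - (n - 3) * (n - 2) // 2     # x = p + q
    if arr.count(0) == 2:                     # 0 itself is one of the repeated values
        return 0, x
    nonzero = prod(v for v in arr if v != 0)  # skip the single 0 so the product survives
    y = nonzero // factorial(n - 3)           # y = p * q
    root = isqrt(x * x - 4 * y)               # square root of the discriminant, q - p
    return (x - root) // 2, (x + root) // 2
```

Python integers are arbitrary precision, which is what keeps the factorial-sized product exact here.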

Insert each element into a set/hashtable, first checking whether it is already in it.
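As a rough sketch of that idea in Python (the function name is just illustrative):

```python
def two_repeated_set(arr):
    """Collect every value that is already in the set when it is seen again."""
    seen = set()
    repeated = []
    for value in arr:
        if value in seen:
            repeated.append(value)  # second occurrence of this value
        else:
            seen.add(value)
    return repeated
```

This uses O(n) extra memory but makes a single pass and needs no knowledge of the value range.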

You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + the two repeated numbers.
Edit: As others have noted, combined with the sum-of-squares, you can use this, I was just a little slow in figuring it out.

Check this old but good paper on the topic:
Finding Repeated Elements (PDF)

Some answers to the question: Algorithm to determine if array contains n…n+m? contain as a subproblem solutions which you can adopt for your purpose.
For example, here's a relevant part from my answer:
bool has_duplicates(int* a, int m, int n)
{
/** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')
Whether a[] array has duplicates.
precondition: all values are in [n, n+m) range.
feature: It marks visited items using a sign bit.
*/
assert((INT_MIN - (INT_MIN - 1)) == 1); // check n == INT_MIN
for (int *p = a; p != &a[m]; ++p) {
*p -= (n - 1); // [n, n+m) -> [1, m+1)
assert(*p > 0);
}
// determine: are there duplicates
bool has_dups = false;
for (int i = 0; i < m; ++i) {
const int j = abs(a[i]) - 1;
assert(j >= 0);
assert(j < m);
if (a[j] > 0)
a[j] *= -1; // mark
else { // already seen
has_dups = true;
break;
}
}
// restore the array
for (int *p = a; p != &a[m]; ++p) {
if (*p < 0)
*p *= -1; // unmark
// [1, m+1) -> [n, n+m)
*p += (n - 1);
}
return has_dups;
}
The program leaves the array unchanged (the array must be writable, but its values are restored on exit).
It works for array sizes up to INT_MAX (typically 2147483647, since int is usually 32 bits even on 64-bit systems).

suppose array is
a[0], a[1], a[2] ..... a[n-1]
sumA = a[0] + a[1] +....+a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + .... + a[n-1]*a[n-1]
sumFirstN = (N*(N+1))/2 where N=n-3 so
sumFirstN = (n-3)(n-2)/2
similarly
sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6
Suppose the repeated elements are X and Y
so X + Y = sumA - sumFirstN;
X*X + Y*Y = sumASquare - sumFirstNSquare;
Since 2*X*Y = (X + Y)^2 - (X*X + Y*Y), X and Y are the roots of t^2 - (X + Y)*t + X*Y = 0, so solving this quadratic gives the values of X and Y.
Time Complexity = O(n)
space complexity = O(1)

I know the question is very old but I suddenly hit it and I think I have an interesting answer to it.
We know this is a brainteaser, and a trivial solution (i.e. HashMap, sort, etc.), no matter how good, would be boring.
As the numbers are integers, they have a constant bit size (i.e. 32). Let us assume we are working with 4-bit integers for now. We look for A and B, which are the duplicate numbers.
We need 4 buckets, one for each bit. Each bucket sums the numbers whose specific bit is 1. For example, bucket 1 gets 2, 3, 6, 7, ...:
Bucket 0 : Sum ( x where: x & 2 power 0 != 0 )
...
Bucket i : Sum ( x where: x & 2 power i != 0 )
We know what the sum of each bucket would be if there were no duplicates. I consider this prior knowledge.
Once the above buckets are generated, some of them will have values larger than expected. Constructing a number from those buckets gives A OR B (for your information).
We can calculate (A XOR B) as follows:
A XOR B = Array[0] XOR Array[1] XOR ... XOR Array[n-1] XOR 0 XOR 1 XOR ... XOR (n-3)
Now going back to the buckets, we know exactly which buckets have both our numbers and which ones have only one (from the XOR bits).
For a bucket that has only one number, we can extract that number as num = (sum - expected sum of bucket). We only need to find one of the duplicate numbers this way, since the other is then num XOR (A XOR B); so if at least one bit is set in A XOR B, we've got the answer.
But what if A XOR B is zero?
Well, this case is only possible if both duplicate numbers are the same number, in which case that number is A OR B.
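The bucket idea can be sketched in Python as below. This is only an illustration under the question's assumptions (values 0..n-3, two distinct repeated values); the function name and the `excess` list are made up for the sketch.

```python
def two_repeated_buckets(arr):
    """Find A and B from per-bit bucket sums and the XOR of everything."""
    n = len(arr)
    bits = max(1, (n - 3).bit_length())
    # excess[i] = (sum of array values with bit i set) - (same sum over 0..n-3)
    excess = [0] * bits
    xor = 0
    for x in arr:
        xor ^= x
        for i in range(bits):
            if (x >> i) & 1:
                excess[i] += x
    for v in range(n - 2):
        xor ^= v
        for i in range(bits):
            if (v >> i) & 1:
                excess[i] -= v
    # xor is now A XOR B; pick a bit where A and B differ.
    i = (xor & -xor).bit_length() - 1
    a = excess[i]  # only one of A, B landed in this bucket
    return a, a ^ xor
```

For the array in the question, A XOR B is 6, bucket 1 is over by 3, and the other value is 3 XOR 6 = 5.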

Sorting the array would seem to be the best solution. A simple sort would then make the search trivial and would take a whole lot less time/space.
Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment each as you go through the array. Something like this:
int count[10] = {0}; // one zero-initialized counter per possible value
for (int i = 0; i < arraylen; i++) {
    count[array[i]]++;
}
Then just search the count array for any entries greater than 1. Those are the items with duplicates. This only requires one pass across the original array and one pass across the count array.

Here's an implementation in Python of @eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used, then it requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrarily large integer sequences (it reads one integer at a time, so the whole sequence doesn't need to be in memory).
def two_repeated(iterable):
    s1, s2 = 0, 0
    for i, j in enumerate(iterable):
        s1 += j - i # number_of_digits(s1) ~ 2 * number_of_digits(i)
        s2 += j*j - i*i # number_of_digits(s2) ~ 4 * number_of_digits(i)
    s1 += (i - 1) + i
    s2 += (i - 1)**2 + i**2
    p = (s1 - int((2*s2 - s1**2)**.5)) // 2
    # `Decimal().sqrt()` could replace `int()**.5` for really large integers
    # or any function to compute integer square root
    return p, s1 - p
Example:
>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)
A more verbose version of the above code follows with explanation:
def two_repeated_seq(arr):
    """Return the only two duplicates from `arr`.
    >>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
    (3, 5)
    """
    n = len(arr)
    assert all(0 <= i < n - 2 for i in arr) # all in range [0, n-2)
    assert len(set(arr)) == (n - 2) # number of unique items
    s1 = (n-2) + (n-1) # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits
    s2 = (n-2)**2 + (n-1)**2 # where k is a number of digits in `max(arr)`
    for i, j in enumerate(arr):
        s1 += j - i
        s2 += j*j - i*i
    """
    s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
       = sum(arr) - sum(range(n-2))
       = sum(range(n-2)) + p + q - sum(range(n-2))
       = p + q
    """
    assert s1 == (sum(arr) - sum(range(n-2)))
    """
    s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
       = sum(i*i for i in arr) - sum(i*i for i in range(n-2))
       = p*p + q*q
    """
    assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))
    """
    s1 = p+q
    -> s1**2 = (p+q)**2
    -> s1**2 = p*p + 2*p*q + q*q
    -> s1**2 - (p*p + q*q) = 2*p*q
    s2 = p*p + q*q
    -> p*q = (s1**2 - s2)/2
    Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1 then from Viete theorem follows
    that p and q are roots of x**2 - B*x + C = 0
    -> p = (B + sqrtD) / 2
    -> q = (B - sqrtD) / 2
    where sqrtD = sqrt(B**2 - 4*C)
    -> p = (s1 + sqrt(2*s2 - s1**2))/2
    """
    sqrtD = (2*s2 - s1**2)**.5
    assert int(sqrtD)**2 == (2*s2 - s1**2) # perfect square
    sqrtD = int(sqrtD)
    assert (s1 - sqrtD) % 2 == 0 # even
    p = (s1 - sqrtD) // 2
    q = s1 - p
    assert q == ((s1 + sqrtD) // 2)
    assert sqrtD == (q - p)
    return p, q
NOTE: calculating integer square root of a number (~ N**4) makes the above algorithm non-linear.

Since a range is specified, you can perform radix sort. This would sort your array in O(n). Searching for duplicates in a sorted array is then O(n)

You can use a simple nested for loop:
int[] numArray = new int[] { 1, 2, 3, 4, 5, 7, 8, 3, 7 };
for (int i = 0; i < numArray.Length; i++)
{
    for (int j = i + 1; j < numArray.Length; j++)
    {
        if (numArray[i] == numArray[j])
        {
            // DO SOMETHING
        }
    }
}
*OR you can filter the array and use a recursive function if you want to get the count of occurrences*
int[] array = { 1, 2, 3, 4, 5, 4, 4, 1, 8, 9, 23, 4, 6, 8, 9, 1, 4 };
void GetDuplicates(int[] array)
{
    if (array.Length == 0)
        return; // base case: nothing left to count
    int a = 1; // occurrences of array[0]
    for (int j = 1; j < array.Length; j++)
    {
        if (array[0] == array[j])
        {
            a += 1;
        }
    }
    Console.WriteLine(" {0} occurred {1} time/s", array[0], a);
    // Drop every copy of array[0] and recurse on the rest (needs System.Linq)
    int[] myNewArray = (from n in array where n != array[0] select n).ToArray();
    GetDuplicates(myNewArray);
}
GetDuplicates(array);

answer to 18..
You are taking an array of 9 elements whose values start from 0, so the maximum element will be 6. Take the sum of the numbers from 0 to 6 and the sum of the array elements, and compute their difference (say d). This is p + q. Now take the XOR of the numbers from 0 to 6 (say x1) and the XOR of the array elements (say x2). x2 is the XOR of all numbers from 0 to 6 except the two repeated elements, since each repeated pair cancels itself out, so x1 XOR x2 = p XOR q. Now, for each element a[i] of the array, let p be that element and compute q by subtracting it from d. XOR p and q together with x2 and check whether the result equals x1. The elements for which this condition holds are your candidates, and you are done in O(n). Because a sum and an XOR do not always determine a unique pair, confirm the candidates by counting their occurrences. Keep coding!

Check this out:
O(n) time and O(1) space complexity
int xor = 0, x = 0, y = 0;
for(i = 0; i < n; i++)
    xor = xor ^ arr[i];
for(i = 1; i <= n-3; i++)
    xor = xor ^ i;
So in the given example you will get the xor of 3 and 5
xor = xor & -xor; // isolate the lowest set bit
for(i = 0; i < n; i++)
{
    if(arr[i] & xor)
        x = x ^ arr[i];
    else
        y = y ^ arr[i];
}
for(i = 1; i <= n-3; i++)
{
    if(i & xor)
        x = x ^ i;
    else
        y = y ^ i;
}
x and y are your answers
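The same XOR partitioning can be sketched in Python. The function name is illustrative, and the sketch assumes, as the question does, that the values are 0..n-3 with exactly two distinct repeated values.

```python
def two_repeated_xor(arr):
    """Split everything on one bit where the two repeated values differ."""
    n = len(arr)
    acc = 0
    for v in arr:
        acc ^= v
    for v in range(n - 2):      # values 0..n-3
        acc ^= v
    low = acc & -acc            # lowest set bit of p XOR q
    x = y = 0
    for v in arr:
        if v & low:
            x ^= v
        else:
            y ^= v
    for v in range(n - 2):
        if v & low:
            x ^= v
        else:
            y ^= v
    return x, y
```

Within each half every other value appears twice and cancels, so x and y end up holding one repeated value each.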

For each number: check if it exists in the rest of the array.

Without sorting you're going to have to keep track of numbers you've already visited.
In pseudocode this would basically be (done this way so I'm not just giving you the answer):
for each number in the list
if number not already in unique numbers list
add it to the unique numbers list
else
return that number as it is a duplicate
end if
end for each

How about this:
for (i=0; i<n-1; i++) {
for (j=i+1; j<n; j++) {
if (a[i] == a[j]) {
printf("%d appears more than once\n",a[i]);
break;
}
}
}
Sure it's not the fastest, but it's simple and easy to understand, and requires
no additional memory. If n is a small number like 9, or 100, then it may well be the "best". (i.e. "Best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop etc..)

In C:
int arr[] = {2, 3, 6, 1, 5, 4, 0, 3, 5};
int num = 0, i;
for (i = 0; i < 9; i++)
    num = num ^ arr[i]; // XOR of every array element
for (i = 0; i < 7; i++)
    num = num ^ i; // XOR of every value in 0..n-3
Since x^x=0, every number that appears an even number of times in the combined list is neutralized. Let's call the repeated numbers a and b; each appears three times in total, so we are left with a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b, and use that as a mask, i.e. choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the combined list (the array plus the values 0..n-3) into two sublists: one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b are in different buckets. So we can now apply the same method used above to each bucket independently, and discover what a and b are.

I have written a small program which finds the elements that are not repeated. Just go through it and let me know your opinion. At the moment I assume the number of elements is even, but it can easily be extended to odd counts as well.
So my idea is to first sort the numbers and then apply my algorithm. Quicksort can be used to sort the elements.
Let's take an input array as below
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
The numbers 2, 10 and 4 are not repeated. The repeated values must sit next to each other, as they do after sorting; if they do not, use quicksort to sort the array first.
Let's apply my program to this
#include <cstdio>
#include <vector>
using namespace std;
int main()
{
//int arr[] = {2, 9, 6, 1, 1, 4, 2, 3, 5};
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
int i = 0;
vector<int> vec;
int var = arr[0];
for(i = 1 ; i < sizeof(arr)/sizeof(arr[0]); i += 2)
{
var = var ^ arr[i];
if(var != 0 )
{
//put in vector
var = arr[i-1];
vec.push_back(var);
i = i-1;
}
var = arr[i+1];
}
for(int i = 0 ; i < vec.size() ; i++)
printf("value not repeated = %d\n",vec[i]);
}
This gives the output:
value not repeated= 2
value not repeated= 10
value not repeated= 4
Its simple and very straight forward, just use XOR man.

// assumes arr is sorted, so equal values are adjacent
for(i = 0; i < n-1; i++) {
    if(!(arr[i] ^ arr[i+1]))
        printf("Found Repeated number %5d",arr[i]);
}

Here is an algorithm that uses order statistics and runs in O(n).
You can solve this by repeatedly calling SELECT with the median as the parameter.
You also rely on the fact that, after a call to SELECT,
the elements that are less than or equal to the median are moved to the left of the median.
Call SELECT on A with the median as the parameter.
If the median value is floor(n/2), then the repeated values are to the right of the median, so you continue with the right half of the array.
Otherwise, a repeated value is to the left of the median, so you continue with the left half of the array.
You continue this way recursively.
For example:
When A={2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, then the median should be the value 4.
After the first call to SELECT
A={3, 2, 0, 1, <3>, 4, 5, 6, 5} The median value is smaller than 4 so we continue with the left half.
A={3, 2, 0, 1, 3}
After the second call to SELECT
A={1, 0, <2>, 3, 3} then the median should be 2 and it is so we continue with the right half.
A={3, 3}, found.
This algorithm runs in O(n+n/2+n/4+...)=O(n).

What about using the https://en.wikipedia.org/wiki/HyperLogLog?
Redis does http://redis.io/topics/data-types-intro#hyperloglogs
A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.

Well, using a nested for loop and assuming the question is to find the numbers that occur twice in an array.
def repeated(ar,n):
    for i in range(n):
        count=0  # matches for ar[i] among the later elements
        for j in range(i+1,n):
            if ar[i] == ar[j]:
                count+=1
        if count == 1:
            print("repeated:",ar[i])
arr= [2, 3, 6, 1, 5, 4, 0, 3, 5]
n = len(arr)
repeated(arr,n)

Why should we try doing maths (especially solving quadratic equations)? These are costly operations. The best way to solve this would be to construct a bitmap with one bit per possible value, i.e. n-2 bits for the values 0 to n-3, which is (n - 2 + 7) / 8 bytes. It is better to calloc this memory, so that every single bit is initialized to 0. Then traverse the list and set the corresponding bit to 1 when a number is encountered; if the bit is already set to 1 for that number, then it is a repeated number.
This can be extended to find out whether any number is missing from the array.
This solution is O(n) in time complexity.
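A small Python sketch of the bitmap approach, using a zero-filled `bytearray` in place of calloc. The function name is illustrative and the sketch assumes values in 0..n-3.

```python
def two_repeated_bitmap(arr):
    """Mark each value in a bitmap; a bit that is already set marks a repeat."""
    n = len(arr)
    bitmap = bytearray((n - 2 + 7) // 8)    # one bit per value in 0..n-3, all zero
    repeated = []
    for v in arr:
        byte, mask = v >> 3, 1 << (v & 7)   # byte index and bit within that byte
        if bitmap[byte] & mask:
            repeated.append(v)
        else:
            bitmap[byte] |= mask
    return repeated
```

For the 9-element example a single byte is enough, since only 7 distinct values can occur.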

Related

Minimum transfer to make array equal

This question was asked in an interview. I am still not able to find the right approach to this problem.
Given an array = [7,2,2], find the minimum number of transfers required to make the array elements almost equal. If exact equality is not possible, the larger elements should come on the left side.
In the above example the final state of the array would be [4,4,3] and the answer will be 2 + 1 = 3.
We transfer 2 from 7 to the first 2 and then transfer another 1 from 7 to the second 2.
If the input is [2,2,7] then the answer will be 4, since we need to keep the bigger elements on the left side.
final state = [4,4,3]
2 is transferred from 7 to each of the 2s to bring them up to 4.
The minimum amount of transfers done 1 unit at a time is half the total amount by which the input differs from the desired array. "Almost equal" doesn't seem to mean any complication according to what you've given.
The solution is to imagine what the target array will be. This target array will depend only on the sum of the values in the original array, and the length of the array (which obviously must remain the same).
If the sum of the values is a multiple of the array length, then in the target array all values will be the same. If however there is a remainder, that remainder represents the number of array values that will be one more than some of the value(s) at the end of the array.
We don't actually have to store that target array. It is implicitly defined by the quotient and the remainder of the division of the sum by the array length.
The output of the function is the sum of differences with the actual input array value and the expected value at any array index. We should only count positive differences (i.e. transfers out of a value) as otherwise we would count transfers twice -- once on the outgoing side and again on the incoming side.
Here is an implementation in basic JavaScript:
function solve(arr) {
// Sum all array values
let sum = 0;
for (let i = 0; i < arr.length; i++) {
sum += arr[i];
}
// Get the integer quotient and remainder
let quotient = Math.floor(sum / arr.length);
let remainder = sum % arr.length;
// Determine the target value until the remainder is completely consumed:
let expected = quotient + 1;
// Collect all the positive differences with the expected value
let result = 0;
for (let i = 0; i < arr.length; i++) {
// If we have consumed the remainder, reduce the expected value
if (i == remainder) {
expected = quotient;
}
let transfer = arr[i] - expected;
// Only account for positive transfers to avoid double counting
if (transfer > 0) {
result += transfer;
}
}
return result;
}
let array = [7,2,2];
console.log(solve(array)); // 3
Let's start from the target array. What is it?
Having {7, 2, 2} we want to obtain {4, 4, 3}. So every item is at least 3 and some top items are 3 + 1 == 4.
The algorithm is
let sum = sum(original)
let rem = sum(original) % length(original) # here % stands for remainder
target[i] = sum / length(original) + (i < rem ? 1 : 0)
Having original and target
original: 7 2 2
target: 4 4 3
transfer: 3 2 1 (6 in total)
note, that
transfer[i] is just an absolute difference: abs(original[i] - target[i])
we count each transfer twice: once we subtract and then we add.
So the answer is
sum(transfer[i]) / 2 == sum(abs(original[i] - target[i])) / 2
Code (c#):
private static int Solve(int[] initial) {
// Don't forget about degenerated cases
if (initial is null || initial.Length <= 0)
return 0;
int sum = initial.Sum();
int rem = sum % initial.Length;
int result = 0;
for (int i = 0; i < initial.Length; ++i)
result += Math.Abs(sum / initial.Length + ((i < rem) ? 1 : 0) - initial[i]);
return result / 2;
}
Demo: (Fiddle)
int[][] tests = new int[][] {
new int[] {7, 2, 2},
new int[] {2, 2, 7},
new int[] {},
new int[] {2, 2, 2},
new int[] {1, 2, 3},
};
string report = string.Join(Environment.NewLine, tests
.Select(test => $"[{string.Join(", ", test)}] => {Solve(test)}"));
Console.Write(report);
Outcome:
[7, 2, 2] => 3
[2, 2, 7] => 4
[] => 0
[2, 2, 2] => 0
[1, 2, 3] => 1
Seems to me like a simple problem that can be solved with greedy approach.
Steps:
Sum up the input array elements as S and divide by the length n. Let's say the quotient is Q and the remainder (mod) is R. Then the target array will have its first R elements equal to Q+1. The rest of the elements will be Q.
Number of transfers will be half of the sum of absolute difference at each (corresponding) position in input and target arrays.
Example:
Input [7, 2, 2]
S=11 n=3 Q=11/3=3 R=11%3=2
Target [3+1, 3+1, 3]
Answer = (abs(7-4) + abs(2-4) + abs(2-3)) / 2 = 3

Given an array of numbers. At each step we can pick a number like N in this array and sum N with another number that exist in this array

I'm stuck on this problem.
Given an array of numbers. At each step we can pick a number like N in this array and sum N with another number that exist in this array. We continue this process until all numbers in this array equals to zero. What is the minimum number of steps required? (We can guarantee initially the sum of numbers in this array is zero).
Example: -20,-15,1,3,7,9,15
Step 1: pick -15 and sum with 15 -> -20,0,1,3,7,9,0
Step 2: pick 9 and sum with -20 -> -11,0,1,3,7,0,0
Step 3: pick 7 and sum with -11 -> -4,0,1,3,0,0,0
Step 4: pick 3 and sum with -4 -> -1,0,1,0,0,0,0
Step 5: pick 1 and sum with -1 -> 0,0,0,0,0,0,0
So the answer of this example is 5.
I've tried using a greedy algorithm. It works like this:
At each step we pick the maximum and minimum numbers currently in the array and sum them, until all numbers in the array equal zero.
But it doesn't work and gives me the wrong answer. Can anyone help me solve this problem?
#include <bits/stdc++.h>
using namespace std;
int a[] = {-20,-15,1,3,7,9,15};
int bruteforce(){
bool isEqualToZero = 1;
for (int i=0;i<(sizeof(a)/sizeof(int));i++)
if (a[i] != 0){
isEqualToZero = 0;
break;
}
if (isEqualToZero)
return 0;
int tmp=0,m=1e9;
for (int i=0;i<(sizeof(a)/sizeof(int));i++){
for (int j=i+1;j<(sizeof(a)/sizeof(int));j++){
if (a[i]*a[j] >= 0) continue;
tmp = a[j];
a[i] += a[j];
a[j] = 0;
m = min(m,bruteforce());
a[j] = tmp;
a[i] -= tmp;
}
}
return m+1;
}
int main()
{
cout << bruteforce();
}
This is the brute force approach that I've written for this problem. Is there any algorithm to solve this problem faster?
This has an np-complete feel, but the following search does an A* search through all possible normalized partial sums on the way to a single non-zero term. Which solves your problem, and means that you don't get into an infinite loop if the sum is not zero.
If greedy works, this will explore the greedy path first, verify that you can't do better, and return fairly quickly. If greedy doesn't work, this may...take a lot longer.
Implementation in Python because that is easy for me. Translation into another language is an exercise for the reader.
import heapq
# NOTE: min_steps_remaining() is called below but is not shown in this snippet.
def find_minimal_steps (numbers):
    normalized = tuple(sorted(numbers))
    seen = set([normalized])
    todo = [(min_steps_remaining(normalized), 0, normalized, None)]
    while todo[0][0] < 7:
        step_limit, steps_taken, prev, path = heapq.heappop(todo)
        steps_taken = -1 * steps_taken # We store negative for sort order
        if min_steps_remaining(prev) == 0:
            decoded_path = []
            while path is not None:
                decoded_path.append((path[0], path[1]))
                path = path[2]
            return steps_taken, list(reversed(decoded_path))
        prev_numbers = list(prev)
        for i in range(len(prev_numbers)):
            for j in range(len(prev_numbers)):
                if i != j:
                    # Track what they were
                    num_i = prev_numbers[i]
                    num_j = prev_numbers[j]
                    # Sum them
                    prev_numbers[i] += num_j
                    prev_numbers[j] = 0
                    normalized = tuple(sorted(prev_numbers))
                    if (normalized not in seen):
                        seen.add(normalized)
                        heapq.heappush(todo, (
                            min_steps_remaining(normalized) + steps_taken + 1,
                            -steps_taken - 1, # More steps is smaller is looked at first
                            normalized,
                            (num_i, num_j, path)))
                    # set them back.
                    prev_numbers[i] = num_i
                    prev_numbers[j] = num_j
print(find_minimal_steps([-20,-15,1,3,7,9,15]))
For fun I also added a linked list implementation that doesn't just tell you how many minimal steps, but which ones it found. In this case its steps were (-15, 15), (7, 9), (3, 16), (1, 19), (-20, 20) meaning add 15 to -15, 9 to 7, 16 to 3, 19 to 1, and 20 to -20.
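The search code calls a helper, `min_steps_remaining`, whose body does not appear in the answer. One plausible lower bound, which is purely an assumption of this sketch and may differ from the author's helper, counts the non-zero terms: a single step can clear at most two of them, and the search stops once at most one non-zero term is left.

```python
def min_steps_remaining(numbers):
    """Hypothetical lower bound on the steps needed to reach <= 1 non-zero term."""
    nonzero = sum(1 for x in numbers if x != 0)
    # Each step zeroes one term, or two when the pair sums to zero,
    # so at least nonzero // 2 steps are needed to get down to one term.
    return nonzero // 2
```

Because this bound never overestimates, a best-first search ordered by steps taken plus the bound would still find a minimal step count; it returns 0 exactly when zero or one non-zero terms remain, which matches the stopping test in the code above.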

Iterate binary numbers with the same quantity of ones (or zeros) in random order

I need to generate binary numbers with the same quantity of ones (or zeros) in random order.
Does anyone know any efficient algorithm for fixed-length binary numbers?
Example for 2 ones and 4 digits (just to be more clear):
1100
1010
1001
0110
0101
0011
UPDATE
Random order w/o repetitions is significant. Sequence of binary numbers required, not single permutation.
If you have enough memory to store all the possible bit sequences, and you don't mind generating them all before you have the first result, then the solution would be to use some efficient generator to produce all possible sequences into a vector and then shuffle the vector using the Fisher-Yates shuffle. That's easy and unbiased (as long as you use a good random number generator to do the shuffle) but it can use a lot of memory if n is large, particularly if you are not sure you will need to complete the iteration.
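A minimal Python sketch of this generate-everything-then-shuffle approach follows. The function name and the `rng` parameter are illustrative; it relies on `random.shuffle`, which to my knowledge is a Fisher-Yates shuffle in CPython.

```python
import random
from itertools import combinations


def shuffled_words(n, k, rng=random):
    """Return all n-bit words with exactly k ones, in random order."""
    words = []
    for ones in combinations(range(n), k):  # positions of the k one-bits
        word = 0
        for pos in ones:
            word |= 1 << pos
        words.append(word)
    rng.shuffle(words)                      # in-place shuffle of the full list
    return words
```

For example, `[format(w, "04b") for w in shuffled_words(4, 2)]` yields the six words from the question in some random order, with no repetitions.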
But there are a couple of solutions which do not require keeping all the possible words in memory. (C implementations of the two solutions follow the text.)
1. Bit shuffle an enumeration
The fastest one (I think) is to first generate a random shuffle of bit values, and then iterate over the possible words one at a time applying the shuffle to the bits of each value. In order to avoid the complication of shuffling actual bits, the words can be generated in a Gray code order in which only two bit positions are changed from one word to the next. (This is also known as a "revolving-door" iteration because as each new 1 is added, some other 1 must be removed.) This allows the bit mask to be updated rapidly, but it means that successive entries are highly correlated, which may be unsuitable for some purposes. Also, for small values of n the number of possible bit shuffles is very limited, so there will not be a lot of different sequences produced. (For example, for the case where n is 4 and k is 2, there are 6 possible words which could be sequenced in 6! (720) different ways, but there are only 4! (24) bit-shuffles. This could be ameliorated slightly by starting the iteration at a random position in the sequence.)
It is always possible to find a Gray code. Here's an example for n=6, k=3: (The bold bits are swapped at each step. I wanted to underline them but for some inexplicable reason SO allows strikethrough but not underline.)
111000 010110 100011 010101
101100 001110 010011 001101
011100 101010 001011 101001
110100 011010 000111 011001
100110 110010 100101 110001
This sequence can be produced by a recursive algorithm similar to that suggested by @JasonBoubin -- the only difference is that the second half of each recursion needs to be produced in reverse order -- but it's convenient to use a non-recursive version of the algorithm. The one in the sample code below comes from Frank Ruskey's unpublished manuscript on Combinatorial Generation (Algorithm 5.7 on page 130). I modified it to use 0-based indexing, as well as adding the code to keep track of the binary representations.
2. Randomly generate an integer sequence and convert it to combinations
The "more" random but somewhat slower solution is to produce a shuffled list of enumeration indices (which are sequential integers in [0, n choose k)) and then find the word corresponding to each index.
The simplest pseudo-random way to produce a shuffled list of integers in a contiguous range is to use a randomly-chosen Linear Congruential Generator (LCG). An LCG is the recursive sequence x_i = (a * x_{i-1} + c) mod m. If m is a power of 2, a mod 4 is 1 and c mod 2 is 1, then that recursion will cycle through all m possible values. To cycle through the range [0, n choose k), we simply select m to be the next larger power of 2, and then skip any values which are not in the desired range. (That will be fewer than half the values produced, for obvious reasons.)
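To make the skipping step concrete, here is a small Python sketch. The function name, the `seed` parameter and the example constants are assumptions for illustration; a and c would normally be chosen at random subject to the two conditions.

```python
def lcg_permutation(count, a, c, seed=0):
    """Yield each integer in [0, count) exactly once, in LCG order."""
    m = 1
    while m < count:                 # smallest power of 2 that is >= count
        m <<= 1
    assert a % 4 == 1 and c % 2 == 1, "conditions for a full cycle"
    x = seed % m
    for _ in range(m):               # one full cycle visits every value in [0, m)
        if x < count:                # skip values outside the desired range
            yield x
        x = (a * x + c) % m
```

With `count=6, a=5, c=3` the cycle modulo 8 is 0, 3, 2, 5, 4, 7, 6, 1, so the generator yields 0, 3, 2, 5, 4, 1.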
To convert the enumeration index into an actual word, we perform a binomial decomposition of the index based on the fact that the set of n choose k words consists of n-1 choose k words starting with a 0 and n-1 choose k-1 words starting with a 1. So to produce the ith word:
if i < n-1 choose k we output a 0 and then the ith word in the set of n-1 bit words with k bits set;
otherwise, we output a 1 and then subtract n-1 choose k from i as the index into the set of n-1 bit words with k-1 bits set.
It's convenient to precompute all the useful binomial coefficients.
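The binomial decomposition described above can be sketched in Python as follows. The function name `unrank` is illustrative, and for brevity the sketch calls `math.comb` instead of using a precomputed table.

```python
from math import comb


def unrank(i, n, k):
    """Return the i-th n-bit word with k ones (0 <= i < comb(n, k)) as a string."""
    bits = []
    while n > 0:
        zeros_first = comb(n - 1, k)   # number of words that start with a 0
        if i < zeros_first:
            bits.append("0")
        else:
            bits.append("1")
            i -= zeros_first           # index within the words that start with a 1
            k -= 1
        n -= 1
    return "".join(bits)
```

Feeding it the indices 0 to 5 for n=4, k=2 gives 0011, 0101, 0110, 1001, 1010, 1100; feeding it shuffled indices gives the words in shuffled order.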
LCGs suffer from the disadvantage that they are quite easy to predict after the first few terms are seen. Also, some of the randomly-selected values of a and c will produce index sequences where successive indices are highly correlated. (Also, the low-order bits are always quite non-random.) Some of these problems could be slightly ameliorated by also applying a random bit-shuffle to the final result. This is not illustrated in the code below but it would slow things down very little and it should be obvious how to do it. (It basically consists of replacing 1UL<<n with a table lookup into the shuffled bits).
The C code below uses some optimizations which make it a bit challenging to read. The binomial coefficients are stored in a lower-diagonal array:
row
index
[ 0] 1
[ 1] 1 1
[ 3] 1 2 1
[ 6] 1 3 3 1
[10] 1 4 6 4 1
As can be seen, the array index for binom(n, k) is n(n+1)/2 + k, and if we have that index, we can find binom(n-1, k) by simply subtracting n, and binom(n-1, k-1) by subtracting n+1. In order to avoid needing to store zeros in the array, we make sure that we never look up a binomial coefficient where k is negative or greater than n. In particular, if we have arrived at a point in the recursion where k == n or k == 0, we can definitely know that the index to look up is 0, because there is only one possible word. Furthermore, index 0 in the set of words with some n and k will consist precisely of n-k zeros followed by k ones, which is the n-bit binary representation of 2^k - 1. By short-cutting the algorithm when the index reaches 0, we can avoid having to worry about the cases where one of binom(n-1, k) or binom(n-1, k-1) is not a valid index.
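The index arithmetic for the triangular array can be checked with a small Python sketch. The helper names build_triangle and tri_index are hypothetical and exist only for this illustration.

```python
from math import comb

def build_triangle(max_n):
    """Flatten rows 0..max_n of Pascal's triangle into one list."""
    return [comb(n, k) for n in range(max_n + 1) for k in range(n + 1)]

def tri_index(n, k):
    """Position of binom(n, k) in the flattened triangle."""
    return n * (n + 1) // 2 + k

b = build_triangle(6)
idx = tri_index(5, 2)
print(b[idx], b[idx - 5], b[idx - 6])  # binom(5,2), binom(4,2), binom(4,1)
```

Subtracting n moves up one row in the same column, and subtracting n+1 moves up one row and left one column, which is why the C code can walk the table with pointer decrements.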
C code for the two solutions
Gray code with shuffled bits
void gray_combs(int n, int k) {
/* bit[i] is the ith shuffled bit */
uint32_t bit[n+1];
{
uint32_t mask = 1;
for (int i = 0; i < n; ++i, mask <<= 1)
bit[i] = mask;
bit[n] = 0;
shuffle(bit, n);
}
/* comb[i] for 0 <= i < k is the index of the ith bit
* in the current combination. comb[k] is a sentinel. */
int comb[k + 1];
for (int i = 0; i < k; ++i) comb[i] = i;
comb[k] = n;
/* Initial word has the first k (shuffled) bits set */
uint32_t word = 0;
for (int i = 0; i < k; ++i) word |= bit[i];
/* Now iterate over all combinations */
int j = k - 1; /* See Ruskey for meaning of j */
do {
handle(word, n);
if (j < 0) {
word ^= bit[comb[0]] | bit[comb[0] - 1];
if (--comb[0] == 0) j += 2;
}
else if (comb[j + 1] == comb[j] + 1) {
word ^= bit[comb[j + 1]] | bit[j];
comb[j + 1] = comb[j]; comb[j] = j;
if (comb[j + 1] == comb[j] + 1) j += 2;
}
else if (j > 0) {
word ^= bit[comb[j - 1]] | bit[comb[j] + 1];
comb[j - 1] = comb[j]; ++comb[j];
j -= 2;
}
else {
word ^= bit[comb[j]] | bit[comb[j] + 1];
++comb[j];
}
} while (comb[k] == n);
}
LCG with enumeration index to word conversion
static const uint32_t* binom(unsigned n, unsigned k) {
static const uint32_t b[] = {
1,
1, 1,
1, 2, 1,
1, 3, 3, 1,
1, 4, 6, 4, 1,
1, 5, 10, 10, 5, 1,
1, 6, 15, 20, 15, 6, 1,
// ... elided for space
};
return &b[n * (n + 1) / 2 + k];
}
static uint32_t enumerate(const uint32_t* b, uint32_t r, unsigned n, unsigned k) {
uint32_t rv = 0;
while (r) {
do {
b -= n;
--n;
} while (r < *b);
r -= *b;
--b;
--k;
rv |= 1UL << n;
}
return rv + (1UL << k) - 1;
}
static bool lcg_combs(unsigned n, unsigned k) {
const uint32_t* b = binom(n, k);
uint32_t count = *b;
uint32_t m = 1; while (m < count) m <<= 1;
uint32_t a = 4 * randrange(1, m / 4) + 1;
uint32_t c = 2 * randrange(0, m / 2) + 1;
uint32_t x = randrange(0, m);
while (count--) {
do
x = (a * x + c) & (m - 1);
while (x >= *b);
handle(enumerate(b, x, n, k), n);
}
return true;
}
Note: I didn't include the implementation of randrange or shuffle; code is readily available. randrange(low, lim) produces a random integer in the range [low, lim); shuffle(vec, n) randomly shuffles the integer vector vec of length n.
Also, the loop calls handle(word, n) for each generated word. That call must be replaced with whatever is to be done with each combination.
With handle defined as a function which does nothing, gray_combs took 150 milliseconds on my laptop to find all 40,116,600 28-bit words with 14 bits set. lcg_combs took 5.5 seconds.
Integers with exactly k bits set are easy to generate in order.
You can do that, and then change the order by applying a bit-permutation to the results (see below), for example here's a randomly generated 16-bit (you should pick one with the right number of bits, based on the word size not on the number of set bits) bit-permutation (not tested):
uint permute(uint x) {
x = bit_permute_step(x, 0x00005110, 1); // Butterfly, stage 0
x = bit_permute_step(x, 0x00000709, 4); // Butterfly, stage 2
x = bit_permute_step(x, 0x000000a1, 8); // Butterfly, stage 3
x = bit_permute_step(x, 0x00005404, 1); // Butterfly, stage 0
x = bit_permute_step(x, 0x00000231, 2); // Butterfly, stage 1
return x;
}
uint bit_permute_step(uint x, uint m, int shift) {
uint t;
t = ((x >> shift) ^ x) & m;
x = (x ^ t) ^ (t << shift);
return x;
}
Generating the re-ordered sequence is easy:
uint i = (1u << k) - 1;
uint max = i << (wordsize - k);
do
{
yield permute(i);
i = nextPermutation(i);
} while (i != max);
yield permute(i); // for max
Where nextPermutation comes from the linked question,
uint nextPermutation(uint v) {
uint t = (v | (v - 1)) + 1;
uint w = t | ((((t & -t) / (v & -v)) >> 1) - 1);
return w;
}
The bit-permutation should be chosen as a random permutation (eg take 0..(wordsize-1) and shuffle) and then converted to bfly masks (I used programming.sirrida.de/calcperm.php), not as randomly generated bfly masks.
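For readers who want to try the in-order generation step on its own, here is a minimal Python sketch of the same nextPermutation formula, without the bit-permutation. The function names are hypothetical, and Python's unbounded integers stand in for uint, so the word size is passed explicitly.

```python
def next_same_popcount(v):
    """Next larger integer with the same number of set bits as v (v > 0)."""
    t = (v | (v - 1)) + 1
    return t | ((((t & -t) // (v & -v)) >> 1) - 1)

def words_in_order(wordsize, k):
    """Yield all wordsize-bit integers with exactly k bits set, ascending."""
    v = (1 << k) - 1              # lowest k bits set
    last = v << (wordsize - k)    # highest k bits set
    while True:
        yield v
        if v == last:
            break
        v = next_same_popcount(v)

print([format(w, "04b") for w in words_in_order(4, 2)])
```

Each yielded value could then be passed through permute before use, as in the loop above.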
I think you can use Heap's algorithm. This algorithm generates all possible permutations of n objects. Just create a simple array and use the algorithm to generate all of its permutations.
This algorithm is not efficient if you want to iterate over binary numbers using bitwise operations. For bitwise operations you can use an LFSR.
An LFSR is a simple method for iterating over all numbers. I think you can make some simple modifications to an LFSR so that it generates numbers with a fixed count of zeros.
How about this solution in Python which does permutations?
from itertools import permutations
fixed_length = 4
perms = [''.join(p) for p in permutations('11' + '0' * (fixed_length - 2))]
unique_perms = set(perms)
This would return the numbers as strings, easily convertible with int(num, 2).
As for efficiency, running this took 0.021 milliseconds on my machine.
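Note that permutations treats equal characters as distinct, so the set above is built from many duplicate strings. A possible alternative, sketched here under the assumption that integer results are acceptable, is to choose the positions of the set bits with itertools.combinations instead; the function name words_with_k_bits is hypothetical.

```python
from itertools import combinations

def words_with_k_bits(n, k):
    """Yield each n-bit integer with exactly k bits set, once each."""
    for positions in combinations(range(n), k):
        word = 0
        for p in positions:
            word |= 1 << p   # set the bit at each chosen position
        yield word

print(sorted(format(w, "04b") for w in words_with_k_bits(4, 2)))
```

This produces each word exactly once, so no deduplication step is needed.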
You can modify the general permutation algorithm to work with binary. Here's an implementation in C++:
#include<iostream>
#include<string>
void binaryPermutation(int ones, int digits, std::string current){
if(digits <= 0 && ones <= 0){
std::cout<<current<<std::endl;
}
else if(digits > 0){
if(ones > 0){
binaryPermutation(ones-1, digits-1, current+"1");
}
binaryPermutation(ones, digits-1, current+"0");
}
}
int main()
{
binaryPermutation(2, 4, "");
return 0;
}
This code outputs the following:
1100
1010
1001
0110
0101
0011
You can modify it to store these outputs in a collection or do something other than simply print them.

Find number of continuous subarray having sum zero

You are given an array and have to find the number of contiguous subarrays whose sum is zero.
Examples:
1) 0, 1, -1, 0 => 6 {{0}, {1,-1}, {0,1,-1}, {1,-1,0}, {0,1,-1,0}, {0}};
2) 5, 2, -2, 5, -5, 9 => 3.
It can be done in O(n^2). I am trying to find a solution with lower complexity.
Consider S[0..N] - prefix sums of your array, i.e. S[k] = A[0] + A[1] + ... + A[k-1] for k from 0 to N.
Now sum of elements from L to R-1 is zero if and only if S[R] = S[L]. It means that you have to find number of indices 0 <= L < R <= N such that S[L] = S[R].
This problem can be solved with a hash table. Iterate over the elements of S[] while maintaining, for each value X, the number of times it has occurred in the already processed part of S[]. These counts should be stored in a hash map, where the number X is the key and the count H[X] is the value. When you meet a new element S[i], add H[S[i]] to your answer (this accounts for the substrings ending with the (i-1)-st element), then increment H[S[i]] by one.
Note that if sum of absolute values of array elements is small, you can use a simple array instead of hash table. The complexity is linear on average.
Here is the code:
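#include <unordered_map>
#include <vector>
using namespace std;
long long CountZeroSubstrings(vector<int> A) {
int n = A.size();
// S[k] is the sum of the first k elements
vector<long long> S(n+1, 0);
for (int i = 0; i < n; i++)
S[i+1] = S[i] + A[i];
long long answer = 0;
// H[X] is the number of earlier prefix sums equal to X
unordered_map<long long, int> H;
for (int i = 0; i <= n; i++) {
if (H.count(S[i]))
answer += H[S[i]];
H[S[i]]++;
}
return answer;
}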
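The same counting idea can be sketched in Python without storing the whole prefix-sum array. This is only an illustration of the approach described above; the function name count_zero_sum_subarrays is hypothetical.

```python
def count_zero_sum_subarrays(arr):
    """Count contiguous subarrays of arr whose elements sum to zero."""
    seen = {0: 1}    # prefix sum -> how many times it has occurred so far
    prefix = 0
    answer = 0
    for x in arr:
        prefix += x
        # every earlier occurrence of this prefix sum closes a zero-sum subarray
        answer += seen.get(prefix, 0)
        seen[prefix] = seen.get(prefix, 0) + 1
    return answer

print(count_zero_sum_subarrays([0, 1, -1, 0]))        # 6
print(count_zero_sum_subarrays([5, 2, -2, 5, -5, 9]))  # 3
```

Seeding the map with {0: 1} plays the role of S[0] = 0, so subarrays that start at index 0 are counted as well.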
This can be solved in linear time by keeping a hash table of sums reached during the array traversal. The number of subarrays can then be directly calculated from the counts of revisited sums.
Haskell version:
import qualified Data.Map as M
import Data.List (foldl')
f = foldl' (\b a -> b + div (a * (a + 1)) 2) 0 . M.elems . snd
  . foldl' (\(s,m) x -> let s' = s + x in case M.lookup s' m of
      Nothing -> (s',M.insert s' 0 m)
      otherwise -> (s',M.adjust (+1) s' m)) (0,M.fromList[(0,0)])
Output:
*Main> f [0,1,-1,0]
6
*Main> f [5,2,-2,5,-5,9]
3
*Main> f [0,0,0,0]
10
*Main> f [0,1,0,0]
4
*Main> f [0,1,0,0,2,3,-3]
5
*Main> f [0,1,-1,0,0,2,3,-3]
11
C# version of @stgatilov's answer https://stackoverflow.com/a/31489960/3087417 with readable variables:
int[] sums = new int[arr.Count() + 1];
for (int i = 0; i < arr.Count(); i++)
sums[i + 1] = sums[i] + arr[i];
int numberOfFragments = 0;
Dictionary<int, int> sumToNumberOfRepetitions = new Dictionary<int, int>();
foreach (int item in sums)
{
if (sumToNumberOfRepetitions.ContainsKey(item))
numberOfFragments += sumToNumberOfRepetitions[item];
else
sumToNumberOfRepetitions.Add(item, 0);
sumToNumberOfRepetitions[item]++;
}
return numberOfFragments;
If you want to have sum not only zero but any number k, here is the hint:
int numToFind = currentSum - k;
if (sumToNumberOfRepetitions.ContainsKey(numToFind))
numberOfFragments += sumToNumberOfRepetitions[numToFind];
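To make the hint concrete, here is a minimal Python sketch of the generalisation to an arbitrary target sum k. The function name count_subarrays_with_sum is hypothetical; the lookup of the current prefix sum minus k corresponds to numToFind in the C# snippet.

```python
def count_subarrays_with_sum(arr, k):
    """Count contiguous subarrays of arr whose elements sum to k."""
    seen = {0: 1}     # prefix sum -> number of occurrences so far
    current = 0
    total = 0
    for x in arr:
        current += x
        # a subarray ending here sums to k if an earlier prefix equals current - k
        total += seen.get(current - k, 0)
        seen[current] = seen.get(current, 0) + 1
    return total

print(count_subarrays_with_sum([1, 1, 1], 2))  # 2
```

With k = 0 this reduces to the zero-sum count.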
I feel it can be solved using DP.
Let the state be:
DP[i][j] represents the number of subarrays ending at i whose sum is j.
Transitions:
For every element, as the initial step,
increase the number of ways to form Element[i] by 1, i.e. count the subarray of length 1 starting and ending at i:
DP[i][Element[i]]++;
Then, for every j in the range [-S, S], where S is the sum of the absolute values of the elements (no subarray sum can fall outside this range):
DP[i][j]+=DP[i-1][j-Element[i]];
Your answer will then be the sum of all the DP[i][0] (the number of subarrays ending at i with sum 0), where i varies from 1 to the number of elements.
Complexity is O(S * number of elements).
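A compact way to try this DP is to keep only the previous row and store it as a dictionary, so that sums which never occur take no space. The following Python sketch is an illustration under that assumption; the function name is hypothetical.

```python
def count_zero_sum_subarrays_dp(arr):
    """Count zero-sum contiguous subarrays with a row-by-row DP."""
    ending_here = {}   # sum -> number of subarrays ending at the previous index
    total = 0
    for x in arr:
        nxt = {x: 1}   # the length-1 subarray consisting of x alone
        for s, c in ending_here.items():
            # extend every subarray that ended at the previous index by x
            nxt[s + x] = nxt.get(s + x, 0) + c
        ending_here = nxt
        total += ending_here.get(0, 0)
    return total

print(count_zero_sum_subarrays_dp([0, 1, -1, 0]))  # 6
```

Each row can hold as many distinct sums as there are subarrays ending at that index, so this is slower than the prefix-sum hash table in the worst case.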
https://www.techiedelight.com/find-sub-array-with-0-sum/
This would be an exact solution.
# Utility function to insert <key, value> into the dict
def insert(dict, key, value):
    # if the key is seen for the first time, initialize the list
    dict.setdefault(key, []).append(value)
# Function to print all sub-lists with 0 sum present
# in the given list
def printallSublists(A):
    # create an empty dict to store ending index of all
    # sub-lists having same sum
    dict = {}
    # insert (0, -1) pair into the dict to handle the case when
    # sub-list with 0 sum starts from index 0
    insert(dict, 0, -1)
    result = 0
    sum = 0
    # traverse the given list
    for i in range(len(A)):
        # sum of elements so far
        sum += A[i]
        # if sum is seen before, there exists at-least one
        # sub-list with 0 sum
        if sum in dict:
            list = dict.get(sum)
            result += len(list)
            # find all sub-lists with same sum
            for value in list:
                print("Sublist is", (value + 1, i))
        # insert (sum so far, current index) pair into the dict
        insert(dict, sum, i)
    print("length :", result)
if __name__ == '__main__':
    A = [0, 1, 2, -3, 0, 2, -2]
    printallSublists(A)
I don't know what the complexity of my suggestion would be, but I have an idea :)
You can try to remove the elements of the main array that cannot contribute to a solution.
Suppose the elements are -10, 5, 2, -2, 5, 7, -5, 9, 11, 19.
You can see that -10, 9, 11 and 19 are elements
that will never be useful for making a sum of 0 in this case,
so try to remove -10, 9, 11 and 19 from the main array.
To do this:
1) Create two sub-arrays from the main array:
`positive {5,7,2,9,11,19}` and `negative {-10,-2,-5}`
2) Remove each element of the positive array that does not satisfy the condition.
condition -> the value can be constructed from an element of the negative array
or from a sum of its elements
i.e.
5 = -5 // so keep it // ignore the sign
7 = (-5 + -2) // keep
2 = -2 // keep
9 // cannot be constructed using -10, -2, -5
same for 11 and 19
3) Remove each element of the negative array that does not satisfy the condition.
condition -> the value can be constructed from an element of the positive array
or from a sum of its elements
i.e. -10 // cannot be constructed, so discard
-2 = 2 // keep
-5 = 5 // keep
So you finally get an array containing -2, -5, 5, 7, 2. Create all possible sub-arrays from it and check for sum = 0.
(Note: if your input array contains 0, add all the 0's to the final array.)

Number of unique sequences of 3 digits (-1,0,1) given a length that matches a sum

Say you have a vertical game board of length n (being the number of spaces). And you have a three-sided die that has the options: go forward one, stay and go back one. If you go below or above the number of board game spaces it is an invalid game. The only valid move once you reach the end of the board is "stay". Given an exact number of die rolls t, is it possible to algorithmically work out the number of unique dice rolls that result in a winning game?
So far I've tried producing a list of every possible combination of (-1,0,1) for the given number of die rolls and sorting through the list to see if any add up to the length of the board and also meet all the requirements for being a valid game. But this is impractical for dice rolls above 20.
For example:
t=1, n=2; Output=1
t=3, n=2; Output=3
You can use a dynamic programming approach. The sketch of a recurrence is:
M(0, 1) = 1
M(t, n) = M(t-1, n-1) + M(t-1, n) + M(t-1, n+1)
Of course you have to consider the border cases (like going off the board or not allowing to exit the end of the board, but it's easy to code that).
Here's some Python code:
def solve(N, T):
    M, M2 = [0]*N, [0]*N
    M[0] = 1
    for i in xrange(T):
        M, M2 = M2, M
        for j in xrange(N):
            M[j] = (j>0 and M2[j-1]) + M2[j] + (j+1<N-1 and M2[j+1])
    return M[N-1]
print solve(3, 2) #1
print solve(2, 1) #1
print solve(2, 3) #3
print solve(5, 20) #19535230
Bonus: fancy "one-liner" with list comprehension and reduce
def solve(N, T):
    return reduce(
        lambda M, _: [(j>0 and M[j-1]) + M[j] + (j<N-2 and M[j+1]) for j in xrange(N)],
        xrange(T), [1]+[0]*(N-1))[-1]
Let M[i, j] be an N by N matrix with M[i, j] = 1 if |i-j| <= 1 and 0 otherwise (and the special case for the "stay" rule of M[N, N-1] = 0)
This matrix counts paths of length 1 from position i to position j.
To find paths of length t, simply raise M to the t'th power. This can be performed efficiently by linear algebra packages.
The solution can be read off: M^t[1, N].
For example, computing paths of length 20 on a board of size 5 in an interactive Python session:
>>> import numpy
>>> M = numpy.matrix('1 1 0 0 0;1 1 1 0 0; 0 1 1 1 0; 0 0 1 1 1; 0 0 0 0 1')
>>> M
matrix([[1, 1, 0, 0, 0],
[1, 1, 1, 0, 0],
[0, 1, 1, 1, 0],
[0, 0, 1, 1, 1],
[0, 0, 0, 0, 1]])
>>> M ** 20
matrix([[31628466, 51170460, 51163695, 31617520, 19535230],
[51170460, 82792161, 82787980, 51163695, 31617520],
[51163695, 82787980, 82792161, 51170460, 31628465],
[31617520, 51163695, 51170460, 31628466, 19552940],
[ 0, 0, 0, 0, 1]])
So there's M^20[1, 5], or 19535230 paths of length 20 from start to finish on a board of size 5.
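If numpy is not available, the same computation can be sketched in plain Python with exponentiation by squaring. The function names below are hypothetical, and indices are 0-based here, so the answer is read from row 0, column n-1.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def count_paths(n, t):
    """Number of valid roll sequences of length t on a board of n squares."""
    # one-step matrix: 1 where |i - j| <= 1 ...
    m = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]
    if n > 1:
        m[n - 1][n - 2] = 0   # ... except that the final square only allows "stay"
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while t:
        if t & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        t >>= 1
    return result[0][n - 1]

print(count_paths(5, 20))  # 19535230, matching the numpy session above
```

Squaring needs only O(log t) matrix multiplications, and Python integers do not overflow for large t.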
Try a backtracking algorithm. Recursively "dive down" to depth t and only continue with dice values that could still result in a valid state, probably by passing a "remaining budget" around.
For example, with n=10, t=20: when you have reached depth 10 of 20 and your budget is still 10 (i.e. the steps forward and backward so far have cancelled out), the remaining recursion steps down to depth t would discard the 0 and -1 possibilities, because they could no longer result in a valid state at the end.
A backtracking algorithm for this case is still very heavy (exponential), but better than first blowing up a bubble with all possibilities and then filtering.
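A minimal Python sketch of this pruned recursion might look as follows. The names are hypothetical, and the "budget" is expressed as the distance still to cover compared with the rolls left.

```python
def count_games(n, t):
    """Count roll sequences of length t that finish on the last of n squares."""
    last = n - 1

    def dive(pos, rolls_left):
        # prune: the end can no longer be reached with the rolls that remain
        if last - pos > rolls_left:
            return 0
        if rolls_left == 0:
            return 1   # the pruning test above guarantees pos == last here
        if pos == last:
            return dive(pos, rolls_left - 1)   # only "stay" is valid at the end
        total = dive(pos + 1, rolls_left - 1) + dive(pos, rolls_left - 1)
        if pos > 0:
            total += dive(pos - 1, rolls_left - 1)   # stepping back stays on the board
        return total

    return dive(0, t)

print(count_games(2, 3))  # 3
```

This still explores an exponential number of branches, so it is only practical for small t.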
Since zeros can be added anywhere, we'll multiply those possibilities by the different arrangements of (-1)'s:
X (space 1) X (space 2) X (space 3) X (space 4) X
(-1)'s can only appear in spaces 1,2 or 3, not in space 4. I got help with the mathematical recurrence that counts the number of ways to place minus ones without skipping backwards.
JavaScript code:
function C(n,k){if(k==0||n==k)return 1;var p=n;for(var i=2;i<=k;i++)p*=(n+1-i)/i;return p}
function sumCoefficients(arr,cs){
var s = 0, i = -1;
while (arr[++i]){
s += cs[i] * arr[i];
}
return s;
}
function f(n,t){
var numMinusOnes = (t - (n-1)) >> 1,
result = C(t,n-1),
numPlaces = n - 2,
cs = [];
for (var i=1; numPlaces-i>=i-1; i++){
cs.push(-Math.pow(-1,i) * C(numPlaces + 1 - i,i));
}
var As = new Array(cs.length),
An;
As[0] = 1;
for (var m=1; m<=numMinusOnes; m++){
var zeros = t - (n-1) - 2*m;
An = sumCoefficients(As,cs);
As.unshift(An);
As.pop();
result += An * C(zeros + 2*m + n-1,zeros);
}
return result;
}
Output:
console.log(f(5,20))
19535230

Resources