How to generate a list of subsets with restrictions? - algorithm

I am trying to figure out an efficient algorithm to take a list of items and generate all unique subsets that result from splitting the list into exactly 2 sublists. I'm sure there is a general purpose way to do this, but I'm interested in a specific case. My list will be sorted, and there can be duplicate items.
Some examples:
Input
{1,2,3}
Output
{{1},{2,3}}
{{2},{1,3}}
{{3},{1,2}}
Input
{1,2,3,4}
Output
{{1},{2,3,4}}
{{2},{1,3,4}}
{{3},{1,2,4}}
{{4},{1,2,3}}
{{1,2},{3,4}}
{{1,3},{2,4}}
{{1,4},{2,3}}
Input
{1,2,2,3}
Output
{{1},{2,2,3}}
{{2},{1,2,3}}
{{3},{1,2,2}}
{{1,2},{2,3}}
{{1,3},{2,2}}
I can do this on paper, but I'm struggling to figure out a simple way to do it programmatically. I'm only looking for a quick pseudocode description of how to do this, not any specific code examples.
Any help is appreciated. Thanks.

If you were generating all subsets you would end up generating 2n subsets for a list of length n. A common way to do this is to iterate through all the numbers i from 0 to 2n-1 and use the bits that are set in i to determine which items are in the ith subset. This works because any item either is or is not present in any particular subset, so by iterating through all the combinations of n bits you iterate through the 2n subsets.
For example, to generate the subsets of (1, 2, 3) you would iterate through the numbers 0 to 7:
0 = 000b → ()
1 = 001b → (1)
2 = 010b → (2)
3 = 011b → (1, 2)
4 = 100b → (3)
5 = 101b → (1, 3)
6 = 110b → (2, 3)
7 = 111b → (1, 2, 3)
In your problem you can generate each subset and its complement to get your pair of mutually exclusive subsets. Each pair would be repeated when you do this so you only need to iterate up to 2n-1 - 1 and then stop.
1 = 001b → (1) + (2, 3)
2 = 010b → (2) + (1, 3)
3 = 011b → (1, 2) + (3)
To deal with duplicate items you could generate subsets of list indices instead of subsets of list items. Like with the list (1, 2, 2, 3) generate subsets of the list (0, 1, 2, 3) instead and then use those numbers as indices into the (1, 2, 2, 3) list. Add a level of indirection, basically.
Here's some Python code putting this all together.
#!/usr/bin/env python
def split_subsets(items):
subsets = set()
for n in xrange(1, 2 ** len(items) / 2):
# Use ith index if ith bit of n is set.
l_indices = [i for i in xrange(0, len(items)) if n & (1 << i) != 0]
# Use the indices NOT present in l_indices.
r_indices = [i for i in xrange(0, len(items)) if i not in l_indices]
# Get the items corresponding to the indices above.
l = tuple(items[i] for i in l_indices)
r = tuple(items[i] for i in r_indices)
# Swap l and r if they are reversed.
if (len(l), l) > (len(r), r):
l, r = r, l
subsets.add((l, r))
# Sort the subset pairs so the left items are in ascending order.
return sorted(subsets, key = lambda (l, r): (len(l), l))
for l, r in split_subsets([1, 2, 2, 3]):
print l, r
Output:
(1,) (2, 2, 3)
(2,) (1, 2, 3)
(3,) (1, 2, 2)
(1, 2) (2, 3)
(1, 3) (2, 2)

The following C++ function does exactly what you need, but the order differs from the one in examples:
// input contains all input number with duplicates allowed
void generate(std::vector<int> input) {
typedef std::map<int,int> Map;
std::map<int,int> mp;
for (size_t i = 0; i < input.size(); ++i) {
mp[input[i]]++;
}
std::vector<int> numbers;
std::vector<int> mult;
for (Map::iterator it = mp.begin(); it != mp.end(); ++it) {
numbers.push_back(it->first);
mult.push_back(it->second);
}
std::vector<int> cur(mult.size());
for (;;) {
size_t i = 0;
while (i < cur.size() && cur[i] == mult[i]) cur[i++] = 0;
if (i == cur.size()) break;
cur[i]++;
std::vector<int> list1, list2;
for (size_t i = 0; i < cur.size(); ++i) {
list1.insert(list1.end(), cur[i], numbers[i]);
list2.insert(list2.end(), mult[i] - cur[i], numbers[i]);
}
if (list1.size() == 0 || list2.size() == 0) continue;
if (list1 > list2) continue;
std::cout << "{{";
for (size_t i = 0; i < list1.size(); ++i) {
if (i > 0) std::cout << ",";
std::cout << list1[i];
}
std::cout << "},{";
for (size_t i = 0; i < list2.size(); ++i) {
if (i > 0) std::cout << ",";
std::cout << list2[i];
}
std::cout << "}\n";
}
}

A bit of Erlang code, the problem is that it generates duplicates when you have duplicate elements, so the result list still needs to be filtered...
do([E,F]) -> [{[E], [F]}];
do([H|T]) -> lists:flatten([{[H], T}] ++
[[{[H|L1],L2},{L1, [H|L2]}] || {L1,L2} <- all(T)]).
filtered(L) ->
lists:usort([case length(L1) < length(L2) of true -> {L1,L2};
false -> {L2,L1} end
|| {L1,L2} <- do(L)]).
in pseudocode this means that:
for a two long list {E,F} the result is {{E},{F}}
for longer lists take the first element H and the rest of the list T and return
{{H},{T}} (the first element as a single element list, and the remaining list)
also run the algorithm recursively for T, and for each {L1,L2} element in the resulting list return {{H,L1},{L2}} and {{L1},{H,L2}}

My suggestion is...
First, count how many of each value you have, possibly in a hashtable. Then calculate the total number of combinations to consider - the product of the counts.
Iterate through that number of combinations.
At each combination, copy your loop count (as x), then start an inner loop through your hashtable items.
For each hashtable item, use (x modulo count) as your number of instances of the hashtable key in the first list. Divide x by the count before repeating the inner loop.
If you are worried that the number of combinations might overflow your integer type, the issue is avoidable. Use an array with each item (one for every hashmap key) starting from zero, and 'count' through the combinations treating each array item as a digit (so the whole array represents the combination number), but with each 'digit' having a different base (the corresponding count). That is, to 'increment' the array, first increment item 0. If it overflows (becomes equal to its count), set it to zero and increment the next array item. Repeat the overflow checks until If overflows continue past the end of the array, you have finished.
I think sergdev is using a very similar approach to this second one, but using std::map rather than a hashtable (std::unordered_map should work). A hashtable should be faster for large numbers of items, but won't give you the values in any particular order. The ordering for each loop through the keys in a hashtable should be consistent, though, unless you add/remove keys.

Related

Find number of continuous subarray having sum zero

You have given a array and You have to give number of continuous subarray which the sum is zero.
example:
1) 0 ,1,-1,0 => 6 {{0},{1,-1},{0,1,-1},{1,-1,0},{0}};
2) 5, 2, -2, 5 ,-5, 9 => 3.
With O(n^2) it can be done.I am trying to find the solution below this complexity.
Consider S[0..N] - prefix sums of your array, i.e. S[k] = A[0] + A[1] + ... + A[k-1] for k from 0 to N.
Now sum of elements from L to R-1 is zero if and only if S[R] = S[L]. It means that you have to find number of indices 0 <= L < R <= N such that S[L] = S[R].
This problem can be solved with a hash table. Iterate over elements of S[] while maintaining for each value X number of times it was met in the already processed part of S[]. These counts should be stored in a hash map, where the number X is a key, and the count H[X] is the value. When you meet a new elements S[i], add H[S[i]] to your answer (these account for substrings ending with (i-1)-st element), then increment H[S[i]] by one.
Note that if sum of absolute values of array elements is small, you can use a simple array instead of hash table. The complexity is linear on average.
Here is the code:
long long CountZeroSubstrings(vector<int> A) {
int n = A.size();
vector<long long> S(n+1, 0);
for (int i = 0; i < n; i++)
S[i+1] = S[i] + A[i];
long long answer = 0;
unordered_map<long long, int> H;
for (int i = 0; i <= n; i++) {
if (H.count(S[i]))
answer += H[S[i]];
H[S[i]]++;
}
return answer;
}
This can be solved in linear time by keeping a hash table of sums reached during the array traversal. The number of subsets can then be directly calculated from the counts of revisited sums.
Haskell version:
import qualified Data.Map as M
import Data.List (foldl')
f = foldl' (\b a -> b + div (a * (a + 1)) 2) 0 . M.elems . snd
. foldl' (\(s,m) x -> let s' = s + x in case M.lookup s' m of
Nothing -> (s',M.insert s' 0 m)
otherwise -> (s',M.adjust (+1) s' m)) (0,M.fromList[(0,0)])
Output:
*Main> f [0,1,-1,0]
6
*Main> f [5,2,-2,5,-5,9]
3
*Main> f [0,0,0,0]
10
*Main> f [0,1,0,0]
4
*Main> f [0,1,0,0,2,3,-3]
5
*Main> f [0,1,-1,0,0,2,3,-3]
11
C# version of #stgatilov answer https://stackoverflow.com/a/31489960/3087417 with readable variables:
int[] sums = new int[arr.Count() + 1];
for (int i = 0; i < arr.Count(); i++)
sums[i + 1] = sums[i] + arr[i];
int numberOfFragments = 0;
Dictionary<int, int> sumToNumberOfRepetitions = new Dictionary<int, int>();
foreach (int item in sums)
{
if (sumToNumberOfRepetitions.ContainsKey(item))
numberOfFragments += sumToNumberOfRepetitions[item];
else
sumToNumberOfRepetitions.Add(item, 0);
sumToNumberOfRepetitions[item]++;
}
return numberOfFragments;
If you want to have sum not only zero but any number k, here is the hint:
int numToFind = currentSum - k;
if (sumToNumberOfRepetitions.ContainsKey(numToFind))
numberOfFragments += sumToNumberOfRepetitions[numToFind];
I feel it can be solved using DP:
Let the state be :
DP[i][j] represents the number of ways j can be formed using all the subarrays ending at i!
Transitions:
for every element in the initial step ,
Increase the number of ways to form Element[i] using i elements by 1 i.e. using the subarray of length 1 starting from i and ending with i i.e
DP[i][Element[i]]++;
then for every j in Range [ -Mod(highest Magnitude of any element ) , Mod(highest Magnitude of any element) ]
DP[i][j]+=DP[i-1][j-Element[i]];
Then your answer will be the sum of all the DP[i][0] (Number of ways to form 0 using subarrays ending at i ) where i varies from 1 to Number of elements
Complexity is O(MOD highest magnitude of any element * Number of Elements)
https://www.techiedelight.com/find-sub-array-with-0-sum/
This would be an exact solution.
# Utility function to insert <key, value> into the dict
def insert(dict, key, value):
# if the key is seen for the first time, initialize the list
dict.setdefault(key, []).append(value)
# Function to print all sub-lists with 0 sum present
# in the given list
def printallSublists(A):
# create an empty -dict to store ending index of all
# sub-lists having same sum
dict = {}
# insert (0, -1) pair into the dict to handle the case when
# sub-list with 0 sum starts from index 0
insert(dict, 0, -1)
result = 0
sum = 0
# traverse the given list
for i in range(len(A)):
# sum of elements so far
sum += A[i]
# if sum is seen before, there exists at-least one
# sub-list with 0 sum
if sum in dict:
list = dict.get(sum)
result += len(list)
# find all sub-lists with same sum
for value in list:
print("Sublist is", (value + 1, i))
# insert (sum so far, current index) pair into the -dict
insert(dict, sum, i)
print("length :", result)
if __name__ == '__main__':
A = [0, 1, 2, -3, 0, 2, -2]
printallSublists(A)
I don't know what the complexity of my suggestion would be but i have an idea :)
What you can do is try to reduce element from main array which are not able to contribute for you solution
suppose elements are -10, 5, 2, -2, 5,7 ,-5, 9,11,19
so you can see that -10,9,11 and 19 are element
that are never gone be useful to make sum 0 in your case
so try to remove -10,9,11, and 19 from your main array
to do this what you can do is
1) create two sub array from your main array
`positive {5,7,2,9,11,19}` and `negative {-10,-2,-5}`
2) remove element from positive array which does not satisfy condition
condition -> value should be construct from negative arrays element
or sum of its elements
ie.
5 = -5 //so keep it //don't consider the sign
7 = (-5 + -2 ) // keep
2 = -2 // keep
9 // cannot be construct using -10,-2,-5
same for all 11 and 19
3) remove element form negative array which does not satisfy condition
condition -> value should be construct from positive arrays element
or sum of its elements
i.e. -10 // cannot be construct so discard
-2 = 2 // keep
-5 = 5 // keep
so finally you got an array which contains -2,-5,5,7,2 create all possible sub array form it and check for sum = 0
(Note if your input array contains 0 add all 0's in final array)

Generating k-combinations lexicographically

I'm not aware nor could I find an algorithm to generate combinations of k items (i.e. k-subsets) lexicographically. I do know algorithms to generate combinations of n choose k, but they don't generate the k-subsets lexicographically.
Can somebody help me out with this or point me in the right direction?
The following algorithm will generate all combinations of elements of a set:
procedure all_combinations(S)
if length(S) == 0
return {}
else
all_comb = {}
x = first element of S
Sx = S-{x}
for each C in all_combinations(Sx)
all_comb += C
all_comb += {x} ∪ C
return all_comb
For the set {1,2,3}, this algorithm does…
all_combinations({2,3})
all_combinations({3})
all_combinations({}), which returns {}
all_combinations({3}) returns {{}, {3}}
all_combinations({2,3}) returns {{}, {2}, {3}, {2,3}}
all_combinations({1,2,3}) returns {{}, {1}, {2}, {1,2}, {3}, {1,3}, {2,3}, {1,2,3}}
This algorithm basically uses some simple rules to determine the next token in the combination: assume a set N of size n and an unfinished combination C that is so far filled with c < k elements (searched combinations have length = k). Now C[c + 1] must lie in the range between (both inclusive) N[indexOf(C[c]) + 1] (each element must be higher than the previous to ensure order) and N[k - c + 1] (there are k - (c - 1) free spaces for remaining elements, which must aswell be higher than their previous element). Using this we can generate the combinations pretty easy recursively:
define combinationsLex(T[] set , int k)
sort(set)
//initialize the combinations with their first element
for int i in [0 , length(set) - k]
int c_init[k]
c_init[0] = set[i]
combinationsLexRec(set , c_init , 1)
//set is the alphabet from which the combinations are created, c is the current
//incomplete combination and at is the position in c at which the next element will be inserted
define combinationsLexRec(T[] set , int[] c , int at)
if at == length(c)
//do whatever you want with the combination
for int i in [c[at] + 1 , length(set) - c[at]]
int[] nc = copy(c)
nc[at] = set[i]
combinationsLexRec(set , nc , at + 1)
Notes:
this implementation assumes 0-based arrayindices
the result is only ordered lexicographically if the input is ordered aswell

how to write iterative algorithm for generate all subsets of a set?

I wrote recursive backtracking algorithm for finding all subsets of a given set.
void backtracke(int* a, int k, int n)
{
if (k == n)
{
for(int i = 1; i <=k; ++i)
{
if (a[i] == true)
{
std::cout << i << " ";
}
}
std::cout << std::endl;
return;
}
bool c[2];
c[0] = false;
c[1] = true;
++k;
for(int i = 0; i < 2; ++i)
{
a[k] = c[i];
backtracke(a, k, n);
a[k] = INT_MAX;
}
}
now we have to write the same algorithm but in an iterative form, how to do it ?
You can use the binary counter approach. Any unique binary string of length n represents a unique subset of a set of n elements. If you start with 0 and end with 2^n-1, you cover all possible subsets. The counter can be easily implemented in an iterative manner.
The code in Java:
public static void printAllSubsets(int[] arr) {
byte[] counter = new byte[arr.length];
while (true) {
// Print combination
for (int i = 0; i < counter.length; i++) {
if (counter[i] != 0)
System.out.print(arr[i] + " ");
}
System.out.println();
// Increment counter
int i = 0;
while (i < counter.length && counter[i] == 1)
counter[i++] = 0;
if (i == counter.length)
break;
counter[i] = 1;
}
}
Note that in Java one can use BitSet, which makes the code really shorter, but I used a byte array to illustrate the process better.
There are a few ways to write an iterative algorithm for this problem. The most commonly suggested would be to:
Count (i.e. a simply for-loop) from 0 to 2numberOfElements - 1
If we look at the variable used above for counting in binary, the digit at each position could be thought of a flag indicating whether or not the element at the corresponding index in the set should be included in this subset. Simply loop over each bit (by taking the remainder by 2, then dividing by 2), including the corresponding elements in our output.
Example:
Input: {1,2,3,4,5}.
We'd start counting at 0, which is 00000 in binary, which means no flags are set, so no elements are included (this would obviously be skipped if you don't want the empty subset) - output {}.
Then 1 = 00001, indicating that only the last element would be included - output {5}.
Then 2 = 00010, indicating that only the second last element would be included - output {4}.
Then 3 = 00011, indicating that the last two elements would be included - output {4,5}.
And so on, all the way up to 31 = 11111, indicating that all the elements would be included - output {1,2,3,4,5}.
* Actually code-wise, it would be simpler to turn this on its head - output {1} for 00001, considering that the first remainder by 2 will then correspond to the flag of the 0th element, the second remainder, the 1st element, etc., but the above is simpler for illustrative purposes.
More generally, any recursive algorithm could be changed to an iterative one as follows:
Create a loop consisting of parts (think switch-statement), with each part consisting of the code between any two recursive calls in your function
Create a stack where each element contains each necessary local variable in the function, and an indication of which part we're busy with
The loop would pop elements from the stack, executing the appropriate section of code
Each recursive call would be replaced by first adding it's own state to the stack, and then the called state
Replace return with appropriate break statements
A little Python implementation of George's algorithm. Perhaps it will help someone.
def subsets(S):
l = len(S)
for x in range(2**l):
yield {s for i,s in enumerate(S) if ((x / 2**i) % 2) // 1 == 1}
Basically what you want is P(S) = S_0 U S_1 U ... U S_n where S_i is a set of all sets contained by taking i elements from S. In other words if S= {a, b, c} then S_0 = {{}}, S_1 = {{a},{b},{c}}, S_2 = {{a, b}, {a, c}, {b, c}} and S_3 = {a, b, c}.
The algorithm we have so far is
set P(set S) {
PS = {}
for i in [0..|S|]
PS = PS U Combination(S, i)
return PS
}
We know that |S_i| = nCi where |S| = n. So basically we know that we will be looping nCi times. You may use this information to optimize the algorithm later on. To generate combinations of size i the algorithm that I present is as follows:
Suppose S = {a, b, c} then you can map 0 to a, 1 to b and 2 to c. And perumtations to these are (if i=2) 0-0, 0-1, 0-2, 1-0, 1-1, 1-2, 2-0, 2-1, 2-2. To check if a sequence is a combination you check if the numbers are all unique and that if you permute the digits the sequence doesn't appear elsewhere, this will filter the above sequence to just 0-1, 0-2 and 1-2 which are later mapped back to {a,b},{a,c},{b,c}. How to generate the long sequence above you can follow this algorithm
set Combination(set S, integer l) {
CS = {}
for x in [0..2^l] {
n = {}
for i in [0..l] {
n = n U {floor(x / |S|^i) mod |S|} // get the i-th digit in x base |S|
}
CS = CS U {S[n]}
}
return filter(CS) // filtering described above
}

How can I efficiently determine if two lists contain elements ordered in the same way?

I have two ordered lists of the same element type, each list having at most one element of each value (say ints and unique numbers), but otherwise with no restrictions (one may be a subset of the other, they may be completely disjunct, or share some elements but not others).
How do I efficiently determine if A is ordering any two items in a different way than B is? For example, if A has the items 1, 2, 10 and B the items 2, 10, 1, the property would not hold as A lists 1 before 10 but B lists it after 10. 1, 2, 10 vs 2, 10, 5 would be perfectly valid however as A never mentions 5 at all, I cannot rely on any given sorting rule shared by both lists.
You can get O(n) as follows. First, find the intersection of the two sets using hashing. Second, test whether A and B are identical if you only consider elements from the intersection.
My approach would be to first make sorted copies of A and B which also record the positions of elements in the original lists:
for i in 1 .. length(A):
Apos[i] = (A, i)
sortedApos = sort(Apos[] by first element of each pair)
for i in 1 .. length(B):
Bpos[i] = (B, i)
sortedBpos = sort(Bpos[] by first element of each pair)
Now find those elements in common using a standard list merge that records the positions in both A and B of the shared elements:
i = 1
j = 1
shared = []
while i <= length(A) && j <= length(B)
if sortedApos[i][1] < sortedBpos[j][1]
++i
else if sortedApos[i][1] > sortedBpos[j][1]
++j
else // They're equal
append(shared, (sortedApos[i][2], sortedBpos[j][2]))
++i
++j
Finally, sort shared by its first element (position in A) and check that all its second elements (positions in B) are increasing. This will be the case iff the elements common to A and B appear in the same order:
sortedShared = sort(shared[] by first element of each pair)
for i = 2 .. length(sortedShared)
if sortedShared[i][2] < sortedShared[i-1][2]
return DIFFERENT
return SAME
Time complexity: 2*(O(n) + O(nlog n)) + O(n) + O(nlog n) + O(n) = O(nlog n).
General approach: store all the values and their positions in B as keys and values in a HashMap. Iterate over the values in A and look them up in B's HashMap to get their position in B (or null). If this position is before the largest position value you've seen previously, then you know that something in B is in a different order than A. Runs in O(n) time.
Rough, totally untested code:
boolean valuesInSameOrder(int[] A, int[] B)
{
Map<Integer, Integer> bMap = new HashMap<Integer, Integer>();
for (int i = 0; i < B.length; i++)
{
bMap.put(B[i], i);
}
int maxPosInB = 0;
for (int i = 0; i < A.length; i++)
{
if(bMap.containsKey(A[i]))
{
int currPosInB = bMap.get(A[i]);
if (currPosInB < maxPosInB)
{
// B has something in a different order than A
return false;
}
else
{
maxPosInB = currPosInB;
}
}
}
// All of B's values are in the same order as A
return true;
}

Algorithm to find two repeated numbers in an array, without sorting

There is an array of size n (numbers are between 0 and n - 3) and only 2 numbers are repeated. Elements are placed randomly in the array.
E.g. in {2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, and repeated numbers are 3 and 5.
What is the best way to find the repeated numbers?
P.S. [You should not use sorting]
There is a O(n) solution if you know what the possible domain of input is. For example if your input array contains numbers between 0 to 100, consider the following code.
bool flags[100];
for(int i = 0; i < 100; i++)
flags[i] = false;
for(int i = 0; i < input_size; i++)
if(flags[input_array[i]])
return input_array[i];
else
flags[input_array[i]] = true;
Of course there is the additional memory but this is the fastest.
OK, seems I just can't give it a rest :)
Simplest solution
int A[N] = {...};
int signed_1(n) { return n%2<1 ? +n : -n; } // 0,-1,+2,-3,+4,-5,+6,-7,...
int signed_2(n) { return n%4<2 ? +n : -n; } // 0,+1,-2,-3,+4,+5,-6,-7,...
long S1 = 0; // or int64, or long long, or some user-defined class
long S2 = 0; // so that it has enough bits to contain sum without overflow
for (int i=0; i<N-2; ++i)
{
S1 += signed_1(A[i]) - signed_1(i);
S2 += signed_2(A[i]) - signed_2(i);
}
for (int i=N-2; i<N; ++i)
{
S1 += signed_1(A[i]);
S2 += signed_2(A[i]);
}
S1 = abs(S1);
S2 = abs(S2);
assert(S1 != S2); // this algorithm fails in this case
p = (S1+S2)/2;
q = abs(S1-S2)/2;
One sum (S1 or S2) contains p and q with the same sign, the other sum - with opposite signs, all other members are eliminated.
S1 and S2 must have enough bits to accommodate sums, the algorithm does not stand for overflow because of abs().
if abs(S1)==abs(S2) then the algorithm fails, though this value will still be the difference between p and q (i.e. abs(p - q) == abs(S1)).
Previous solution
I doubt somebody will ever encounter such a problem in the field ;)
and I guess, I know the teacher's expectation:
Lets take array {0,1,2,...,n-2,n-1},
The given one can be produced by replacing last two elements n-2 and n-1 with unknown p and q (less order)
so, the sum of elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
the sum of squares (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2
Simple math remains:
(1) p+q = S1
(2) p^2+q^2 = S2
Surely you won't solve it as math classes teach to solve square equations.
First, calculate everything modulo 2^32, that is, allow for overflow.
Then check pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to modulo and squaring)
And finally check found candidates if they really are present in array twice.
You know that your Array contains every number from 0 to n-3 and the two repeating ones (p & q). For simplicity, lets ignore the 0-case for now.
You can calculate the sum and the product over the array, resulting in:
1 + 2 + ... + n-3 + p + q = p + q + (n-3)(n-2)/2
So if you substract (n-3)(n-2)/2 from the sum of the whole array, you get
sum(Array) - (n-3)(n-2)/2 = x = p + q
Now do the same for the product:
1 * 2 * ... * n - 3 * p * q = (n - 3)! * p * q
prod(Array) / (n - 3)! = y = p * q
Your now got these terms:
x = p + q
y = p * q
=> y(p + q) = x(p * q)
If you transform this term, you should be able to calculate p and q
Insert each element into a set/hashtable, first checking if its are already in it.
You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + two missing numbers.
Edit: As others have noted, combined with the sum-of-squares, you can use this, I was just a little slow in figuring it out.
Check this old but good paper on the topic:
Finding Repeated Elements (PDF)
Some answers to the question: Algorithm to determine if array contains n…n+m? contain as a subproblem solutions which you can adopt for your purpose.
For example, here's a relevant part from my answer:
bool has_duplicates(int* a, int m, int n)
{
/** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')
Whether a[] array has duplicates.
precondition: all values are in [n, n+m) range.
feature: It marks visited items using a sign bit.
*/
assert((INT_MIN - (INT_MIN - 1)) == 1); // check n == INT_MIN
for (int *p = a; p != &a[m]; ++p) {
*p -= (n - 1); // [n, n+m) -> [1, m+1)
assert(*p > 0);
}
// determine: are there duplicates
bool has_dups = false;
for (int i = 0; i < m; ++i) {
const int j = abs(a[i]) - 1;
assert(j >= 0);
assert(j < m);
if (a[j] > 0)
a[j] *= -1; // mark
else { // already seen
has_dups = true;
break;
}
}
// restore the array
for (int *p = a; p != &a[m]; ++p) {
if (*p < 0)
*p *= -1; // unmark
// [1, m+1) -> [n, n+m)
*p += (n - 1);
}
return has_dups;
}
The program leaves the array unchanged (the array should be writeable but its values are restored on exit).
It works for array sizes upto INT_MAX (on 64-bit systems it is 9223372036854775807).
suppose array is
a[0], a[1], a[2] ..... a[n-1]
sumA = a[0] + a[1] +....+a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + .... + a[n]*a[n]
sumFirstN = (N*(N+1))/2 where N=n-3 so
sumFirstN = (n-3)(n-2)/2
similarly
sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6
Suppose repeated elements are = X and Y
so X + Y = sumA - sumFirstN;
X*X + Y*Y = sumASquare - sumFirstNSquare;
So on solving this quadratic we can get value of X and Y.
Time Complexity = O(n)
space complexity = O(1)
I know the question is very old but I suddenly hit it and I think I have an interesting answer to it.
We know this is a brainteaser and a trivial solution (i.e. HashMap, Sort, etc) no matter how good they are would be boring.
As the numbers are integers, they have constant bit size (i.e. 32). Let us assume we are working with 4 bit integers right now. We look for A and B which are the duplicate numbers.
We need 4 buckets, each for one bit. Each bucket contains numbers which its specific bit is 1. For example bucket 1 gets 2, 3, 4, 7, ...:
Bucket 0 : Sum ( x where: x & 2 power 0 == 0 )
...
Bucket i : Sum ( x where: x & 2 power i == 0 )
We know what would be the sum of each bucket if there was no duplicate. I consider this as prior knowledge.
Once above buckets are generated, a bunch of them would have values more than expected. By constructing the number from buckets we will have (A OR B for your information).
We can calculate (A XOR B) as follows:
A XOR B = Array[i] XOR Array[i-1] XOR ... 0, XOR n-3 XOR n-2 ... XOR 0
Now going back to buckets, we know exactly which buckets have both our numbers and which ones have only one (from the XOR bit).
For the buckets that have only one number we can extract the number num = (sum - expected sum of bucket). However, we should be good only if we can find one of the duplicate numbers so if we have at least one bit in A XOR B, we've got the answer.
But what if A XOR B is zero?
Well this case is only possible if both duplicate numbers are the same number, which then our number is the answer of A OR B.
Sorting the array would seem to be the best solution. A simple sort would then make the search trivial and would take a whole lot less time/space.
Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment each as you go through the array. something like this:
int count [10];
for (int i = 0; i < arraylen; i++) {
count[array[i]]++;
}
Then just search your array for any numbers greater than 1. Those are the items with duplicates. Only requires one pass across the original array and one pass across the count array.
Here's implementation in Python of #eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used then it is requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrary large integer sequences (it reads one integer at a time therefore a whole sequence doesn't require to be in memory).
def two_repeated(iterable):
s1, s2 = 0, 0
for i, j in enumerate(iterable):
s1 += j - i # number_of_digits(s1) ~ 2 * number_of_digits(i)
s2 += j*j - i*i # number_of_digits(s2) ~ 4 * number_of_digits(i)
s1 += (i - 1) + i
s2 += (i - 1)**2 + i**2
p = (s1 - int((2*s2 - s1**2)**.5)) // 2
# `Decimal().sqrt()` could replace `int()**.5` for really large integers
# or any function to compute integer square root
return p, s1 - p
Example:
>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)
A more verbose version of the above code follows with explanation:
def two_repeated_seq(arr):
"""Return the only two duplicates from `arr`.
>>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)
"""
n = len(arr)
assert all(0 <= i < n - 2 for i in arr) # all in range [0, n-2)
assert len(set(arr)) == (n - 2) # number of unique items
s1 = (n-2) + (n-1) # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits
s2 = (n-2)**2 + (n-1)**2 # where k is a number of digits in `max(arr)`
for i, j in enumerate(arr):
s1 += j - i
s2 += j*j - i*i
"""
s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
= sum(arr) - sum(range(n-2))
= sum(range(n-2)) + p + q - sum(range(n-2))
= p + q
"""
assert s1 == (sum(arr) - sum(range(n-2)))
"""
s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
= sum(i*i for i in arr) - sum(i*i for i in range(n-2))
= p*p + q*q
"""
assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))
"""
s1 = p+q
-> s1**2 = (p+q)**2
-> s1**2 = p*p + 2*p*q + q*q
-> s1**2 - (p*p + q*q) = 2*p*q
s2 = p*p + q*q
-> p*q = (s1**2 - s2)/2
Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1 then from Viete theorem follows
that p and q are roots of x**2 - B*x + C = 0
-> p = (B + sqrtD) / 2
-> q = (B - sqrtD) / 2
where sqrtD = sqrt(B**2 - 4*C)
-> p = (s1 + sqrt(2*s2 - s1**2))/2
"""
sqrtD = (2*s2 - s1**2)**.5
assert int(sqrtD)**2 == (2*s2 - s1**2) # perfect square
sqrtD = int(sqrtD)
assert (s1 - sqrtD) % 2 == 0 # even
p = (s1 - sqrtD) // 2
q = s1 - p
assert q == ((s1 + sqrtD) // 2)
assert sqrtD == (q - p)
return p, q
NOTE: calculating integer square root of a number (~ N**4) makes the above algorithm non-linear.
Since a range is specified, you can perform radix sort. This would sort your array in O(n). Searching for duplicates in a sorted array is then O(n)
You can use simple nested for loop
int[] numArray = new int[] { 1, 2, 3, 4, 5, 7, 8, 3, 7 };
for (int i = 0; i < numArray.Length; i++)
{
for (int j = i + 1; j < numArray.Length; j++)
{
if (numArray[i] == numArray[j])
{
//DO SOMETHING
}
}
*OR you can filter the array and use recursive function if you want to get the count of occurrences*
int[] array = { 1, 2, 3, 4, 5, 4, 4, 1, 8, 9, 23, 4, 6, 8, 9, 1,4 };
int[] myNewArray = null;
int a = 1;
void GetDuplicates(int[] array)
for (int i = 0; i < array.Length; i++)
{
for (int j = i + 1; j < array.Length; j++)
{
if (array[i] == array[j])
{
a += 1;
}
}
Console.WriteLine(" {0} occurred {1} time/s", array[i], a);
IEnumerable<int> num = from n in array where n != array[i] select n;
myNewArray = null;
a = 1;
myNewArray = num.ToArray() ;
break;
}
GetDuplicates(myNewArray);
answer to 18..
you are taking an array of 9 and elements are starting from 0..so max ele will be 6 in your array. Take sum of elements from 0 to 6 and take sum of array elements. compute their difference (say d). This is p + q. Now take XOR of elements from 0 to 6 (say x1). Now take XOR of array elements (say x2). x2 is XOR of all elements from 0 to 6 except two repeated elements since they cancel out each other. now for i = 0 to 6, for each ele of array, say p is that ele a[i] so you can compute q by subtracting this ele from the d. do XOR of p and q and XOR them with x2 and check if x1==x2. likewise doing for all elements you will get the elements for which this condition will be true and you are done in O(n). Keep coding!
check this out ...
O(n) time and O(1) space complexity
for(i=0;i< n;i++)
xor=xor^arr[i]
for(i=1;i<=n-3;i++)
xor=xor^i;
So in the given example you will get the xor of 3 and 5
xor=xor & -xor //Isolate the last digit
for(i = 0; i < n; i++)
{
if(arr[i] & xor)
x = x ^ arr[i];
else
y = y ^ arr[i];
}
for(i = 1; i <= n-3; i++)
{
if(i & xor)
x = x ^ i;
else
y = y ^ i;
}
x and y are your answers
For each number: check if it exists in the rest of the array.
Without sorting you're going to have a keep track of numbers you've already visited.
in psuedocode this would basically be (done this way so I'm not just giving you the answer):
for each number in the list
if number not already in unique numbers list
add it to the unique numbers list
else
return that number as it is a duplicate
end if
end for each
How about this:
for (i=0; i<n-1; i++) {
for (j=i+1; j<n; j++) {
if (a[i] == a[j]) {
printf("%d appears more than once\n",a[i]);
break;
}
}
}
Sure it's not the fastest, but it's simple and easy to understand, and requires
no additional memory. If n is a small number like 9, or 100, then it may well be the "best". (i.e. "Best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop etc..)
In c:
int arr[] = {2, 3, 6, 1, 5, 4, 0, 3, 5};
int num = 0, i;
for (i=0; i < 8; i++)
num = num ^ arr[i] ^i;
Since x^x=0, the numbers that are repeated odd number of times are neutralized. Let's call the unique numbers a and b.We are left with a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b, and use that as a mask ie.choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists -- one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that the pairs of a and b are in different buckets. So we can now apply the same method used above to each bucket independently, and discover what a and b are.
I have written a small programme which finds out the number of elements not repeated, just go through this let me know your opinion, at the moment I assume even number of elements are even but can easily extended for odd numbers also.
So my idea is to first sort the numbers and then apply my algorithm.quick sort can be use to sort this elements.
Lets take an input array as below
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
the number 2,10 and 4 are not repeated ,but they are in sorted order, if not sorted use quick sort to first sort it out.
Lets apply my programme on this
using namespace std;
main()
{
//int arr[] = {2, 9, 6, 1, 1, 4, 2, 3, 5};
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
int i = 0;
vector<int> vec;
int var = arr[0];
for(i = 1 ; i < sizeof(arr)/sizeof(arr[0]); i += 2)
{
var = var ^ arr[i];
if(var != 0 )
{
//put in vector
var = arr[i-1];
vec.push_back(var);
i = i-1;
}
var = arr[i+1];
}
for(int i = 0 ; i < vec.size() ; i++)
printf("value not repeated = %d\n",vec[i]);
}
This gives the output:
value not repeated= 2
value not repeated= 10
value not repeated= 4
Its simple and very straight forward, just use XOR man.
for(i=1;i<=n;i++) {
if(!(arr[i] ^ arr[i+1]))
printf("Found Repeated number %5d",arr[i]);
}
Here is an algorithm that uses order statistics and runs in O(n).
You can solve this by repeatedly calling SELECT with the median as parameter.
You also rely on the fact that After a call to SELECT,
the elements that are less than or equal to the median are moved to the left of the median.
Call SELECT on A with the median as the parameter.
If the median value is floor(n/2) then the repeated values are right to the median. So you continue with the right half of the array.
Else if it is not so then a repeated value is left to the median. So you continue with the left half of the array.
You continue this way recursively.
For example:
When A={2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, then the median should be the value 4.
After the first call to SELECT
A={3, 2, 0, 1, <3>, 4, 5, 6, 5} The median value is smaller than 4 so we continue with the left half.
A={3, 2, 0, 1, 3}
After the second call to SELECT
A={1, 0, <2>, 3, 3} then the median should be 2 and it is so we continue with the right half.
A={3, 3}, found.
This algorithm runs in O(n+n/2+n/4+...)=O(n).
What about using the https://en.wikipedia.org/wiki/HyperLogLog?
Redis does http://redis.io/topics/data-types-intro#hyperloglogs
A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.
Well using the nested for loop and assuming the question is to find the number occurred only twice in an array.
def repeated(ar,n):
count=0
for i in range(n):
for j in range(i+1,n):
if ar[i] == ar[j]:
count+=1
if count == 1:
count=0
print("repeated:",ar[i])
arr= [2, 3, 6, 1, 5, 4, 0, 3, 5]
n = len(arr)
repeated(arr,n)
Why should we try out doing maths ( specially solving quadratic equations ) these are costly op . Best way to solve this would be t construct a bitmap of size (n-3) bits , i.e, (n -3 ) +7 / 8 bytes . Better to do a calloc for this memory , so every single bit will be initialized to 0 . Then traverse the list & set the particular bit to 1 when encountered , if the bit is set to 1 already for that no then that is the repeated no .
This can be extended to find out if there is any missing no in the array or not.
This solution is O(n) in time complexity

Resources