Algorithm for Permutation with Buckets

I am looking for an algorithm that works like this:
permutateBuckets([A,B,C])
and gives the following result:
[ [[A,B,C]],
[[A,B],[C]], [[A,C],[B]], [[B,C],[A]], [[A],[B,C]], [[B],[A,C]], [[C],[A,B]],
[[A],[B],[C]], [[A],[C],[B]], [[B],[A],[C]], [[B],[C],[A]], [[C],[A],[B]], [[C],[B],[A]]
]
In general:
The permutation for [1,2,...,n] should include every possible arrangement of the input values into 1 up to n buckets. The order of values within a bucket is irrelevant (e.g. [1,2] equals [2,1]); only the order of the buckets themselves matters (e.g. [[1,2],[3]] is different from [[3],[1,2]]).
Each input element has to be in exactly one bucket for a result to be valid: for an input of [1,2], neither [[1]] (missing 2) nor [[1,2],[1]] (1 appears twice) is a valid output.

The simplest approach is recursive:
Start with the list [[A]].
Insert the next item in all possible places:
before the first sublist,
between adjacent sublists,
after the last sublist,
and into every existing sublist.
For example, the list [[B],[A]] produces 5 new lists with item C; the places to insert C are:
[ [B] [A] ]
 ^ ^ ^ ^ ^
and the three level-2 lists [[A],[B]], [[B],[A]], [[A,B]] produce 5+5+3=13 level-3 lists.
Alternative way:
Generate all length-n nondecreasing sequences that start at 1 and step by 0 or 1, from 1,1,...,1 up to 1,2,...,n, and generate the unique permutations of every such sequence.
The values in these permutations give the bucket number for every item. For example, the sequence 122 has 3 unique permutations, corresponding to these distributions:
1 2 2 [1],[2, 3]
2 1 2 [2],[1, 3]
2 2 1 [3],[1, 2]
In any case, the number of distributions rises very quickly; they are the ordered Bell numbers 1, 3, 13, 75, 541, 4683, 47293, 545835, 7087261, 102247563, ...
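Before the Delphi version below, here is a compact Python sketch of this alternative (my own illustration, not from the answer; it deduplicates permutations with a set, which is simpler but less efficient than a true unique-permutation generator):
from itertools import permutations

def permutate_buckets(items):
    n = len(items)
    result = []
    # each of the 2^(n-1) bit patterns encodes a nondecreasing sequence
    # of bucket numbers starting at 0 whose increments are 0 or 1
    for seq in range(1 << (n - 1)):
        data, mx, t = [0], 0, seq
        for _ in range(n - 1):
            mx += t & 1
            data.append(mx)
            t >>= 1
        # set() discards duplicate permutations of the sequence
        for perm in sorted(set(permutations(data))):
            dist = [[] for _ in range(mx + 1)]
            for item, bucket in zip(items, perm):
                dist[bucket].append(item)
            result.append(dist)
    return result

print(len(permutate_buckets(['A', 'B', 'C'])))  # 13, an ordered Bell number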
Implementation of iterative approach in Delphi (full FP-compatible code at ideone)
procedure GenDistributions(N: Integer);
var
  seq, t, i, mx: Integer;
  Data: array of Byte;
  Dist: TBytes2D;
begin
  SetLength(Data, N);
  // there are n-1 places for incrementing,
  // so 2^(n-1) possible sequences
  for seq := 0 to 1 shl (N - 1) - 1 do begin
    t := seq;
    mx := 0;
    Data[0] := mx;
    for i := 1 to N - 1 do begin
      mx := mx + (t and 1); // check the lowest bit
      Data[i] := mx;
      t := t shr 1;
    end;
    // here Data contains a nondecreasing sequence 0..mx, each increment is 0 or 1;
    // Data[i] is the number of the sublist that item i belongs to
    repeat
      Dist := nil;
      SetLength(Dist, mx + 1); // reset result array into [][][] state
      for i := 0 to N - 1 do
        Dist[Data[i]] := Dist[Data[i]] + [i]; // add item to its sublist
      PrintOut(Dist);
    until not NextPerm(Data); // generates next permutation if possible
  end;
end;
And now Python recursive implementation (ideone)
import copy

cnt = 0

def ModifySublist(Ls, idx, value):
    res = copy.deepcopy(Ls)
    res[idx].append(value)
    return res

def InsertSublist(Ls, idx, value):
    res = copy.deepcopy(Ls)
    res.insert(idx, [value])
    return res

def GenDists(AList, Level, Limit):
    global cnt
    if Level == Limit:
        print(AList)
        cnt += 1
    else:
        for i in range(len(AList)):
            GenDists(ModifySublist(AList, i, Level), Level + 1, Limit)
            GenDists(InsertSublist(AList, i, Level), Level + 1, Limit)
        GenDists(InsertSublist(AList, len(AList), Level), Level + 1, Limit)

GenDists([], 0, 3)
print(cnt)
Edit: @mhmnn cloned this code in JavaScript using custom items for output.


Queries add a number, remove a number, replace all A[i] in array with A[i] xor X, find sum of K smallest numbers

The problem is:
Initially, the sequence is empty. There are n queries and 4 types of queries:
Add(x): add x to the sequence; if x is already in the sequence, add another copy.
Remove(x): remove one occurrence of x from the sequence.
Xor(x): replace every element N of the sequence with N xor x.
Sum(K): find the sum of the K smallest elements in the sequence.
0 <= x, n, K <= 10^5
For each Sum(K) query, output the sum of the K smallest elements in the sequence.
Input:
7
Add(4) // A[] = {4}
Remove(3) // A[] = {4}
Add(2) // A[] = {4, 2}
Sum(2) // A[] = {4, 2} => Output: 6
Xor(2) // A[] = {4^2, 2^2} = {6, 0}
Sum(1) // A[] = {6, 0} => Output: 0
Sum(2) // A[] = {6, 0} => Output: 6
I solved the problem in the following way:
Use a vector A to hold the sequence of numbers, and an array Count[] where Count[x] is the number of occurrences of x in A. Initially A is empty and every Count[x] = 0.
For each Add(x) query, append x to A and set Count[x] = Count[x] + 1.
For each Remove(x) query, if Count[x] = 0 then skip it; otherwise remove one x from A and set Count[x] = Count[x] - 1.
For each Xor(x) query, replace every A[i] with A[i]^x.
For each Sum(K) query, sort A in ascending order and take the sum of the first K numbers.
It seems that my approach is at least O(n^2), so for n <= 100000 it cannot work. Is there a better way to solve this problem? Thanks a lot.
My code runs well for n <= 5000. Here it is:
int Count[100001];
vector<int> A;

void Add(int x) {
    A.push_back(x);
    Count[x] = Count[x] + 1;
}

void Remove(int x) {
    if (Count[x] == 0) return;
    Count[x] = Count[x] - 1;
    auto Find = find(A.begin(), A.end(), x);
    A.erase(Find);
}

void Xor(int x) {
    for (int& i : A)
        i = i ^ x;
}

int Sum(int x) {
    sort(A.begin(), A.end()); // the x smallest elements come first after sorting
    int Num = 0, S = 0;
    for (int i : A) {
        if (Num + 1 > x) return S;
        S = S + i;
        Num = Num + 1;
    }
    return S;
}
I'll describe a data structure that supports Add(x)/Remove(x)/Count()/SumXorWith(x) (returns the sum of all elements xor x; doesn't modify the sequence) and then sketch how to extend it to a full solution where each operation is O(log^2 n) (taking n to be both the number of operations and the upper bound on the values).
First observe that Count and SumXorWith can be used to count, for each bit position, how many numbers have that position set (e.g., for the low order bit, it's (Count() + SumXorWith(0) - SumXorWith(1)) / 2). Conversely, it's enough to maintain these counts. In pseudocode:
*** Variables, initially zero:
count : int
bit_count : int[17]
*** Operations:
Add(x):
    increment count
    for j from 0 to 16, add the j'th bit of x to bit_count[j]
Remove(x):
    decrement count
    for j from 0 to 16, subtract the j'th bit of x from bit_count[j]
Count():
    return count
SumXorWith(x):
    return the sum for j from 0 to 16 of
        2**j * (if j'th bit of x = 0 then bit_count[j] else count - bit_count[j])
To extend this data structure to handle Xor(x)/Sum(), we could just replace bit_count[j] by count - bit_count[j] for each bit j set in x, but for efficiency (which we'll need later), there's a trick. The idea is that we store the sequence xor cum_xor. More pseudocode:
*** Additional variable, initially zero
cum_xor : int
*** Operations:
Add(x): super.Add(x xor cum_xor)
Remove(x): super.Remove(x xor cum_xor)
Xor(x): cum_xor <- cum_xor xor x
Count(): return super.Count()
Sum(): return super.SumXorWith(cum_xor)
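A small Python sketch of these two layers (my own illustration of the pseudocode above; the class names are mine):
BITS = 17  # ceil(log2(100001))

class BitCounter:
    def __init__(self):
        self.count = 0
        self.bit_count = [0] * BITS
    def add(self, x):
        self.count += 1
        for j in range(BITS):
            self.bit_count[j] += (x >> j) & 1
    def remove(self, x):
        self.count -= 1
        for j in range(BITS):
            self.bit_count[j] -= (x >> j) & 1
    def sum_xor_with(self, x):
        # bit j of (element xor x) is set for bit_count[j] elements if bit j
        # of x is 0, and for count - bit_count[j] elements if it is 1
        return sum((1 << j) * (self.count - self.bit_count[j]
                               if (x >> j) & 1 else self.bit_count[j])
                   for j in range(BITS))

class XorSequence:
    def __init__(self):
        self.inner = BitCounter()
        self.cum_xor = 0
    def add(self, x):
        self.inner.add(x ^ self.cum_xor)
    def remove(self, x):
        self.inner.remove(x ^ self.cum_xor)
    def xor(self, x):
        self.cum_xor ^= x
    def sum(self):
        return self.inner.sum_xor_with(self.cum_xor)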
Finally, we need to handle Sum(x), with selection. This is, frankly, the tedious part. We set up a height-17 (ceiling of log2(100000)) trie on big-endian bit patterns, with one of the data structures above at each node of the trie. To Add/Remove, we descend the trie, doing Add/Remove at each node. Xor we handle as before, by updating cum_xor. Sum(x) is the trickiest, of course. Starting at the root of the trie, we examine the current node. If it has at most x elements, just sum it. Otherwise, its "favored" child is the one that agrees with cum_xor, and its "disfavored" child is the one that disagrees. If the favored child has at least x elements, then we can operate recursively on it and ignore the disfavored child. Otherwise, we sum the whole favored child and operate recursively on the disfavored child, decreasing x by the number of elements in the favored child.
(For maximum practical efficiency, we'd want something with higher fan-out than the trie and likely the naive implementation near the leaves, but this is as simple as I can make it and likely fast enough.)
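To make the whole scheme concrete, here is a Python sketch of the trie solution (my own illustration of the design described above; untuned, and all names are mine):
BITS = 17  # ceil(log2(100001))

class Node:
    def __init__(self):
        self.count = 0
        self.bit_count = [0] * BITS
        self.children = [None, None]
    def update(self, s, delta):
        self.count += delta
        for j in range(BITS):
            self.bit_count[j] += delta * ((s >> j) & 1)
    def sum_xor_with(self, x):
        return sum((1 << j) * (self.count - self.bit_count[j]
                               if (x >> j) & 1 else self.bit_count[j])
                   for j in range(BITS))

class XorTrie:
    def __init__(self):
        self.root = Node()
        self.cum_xor = 0
    def _change(self, x, delta):
        s = x ^ self.cum_xor  # stored representation, as in the pseudocode
        node = self.root
        node.update(s, delta)
        for d in range(BITS - 1, -1, -1):  # big-endian descent
            b = (s >> d) & 1
            if node.children[b] is None:
                node.children[b] = Node()
            node = node.children[b]
            node.update(s, delta)
    def add(self, x):
        self._change(x, 1)
    def remove(self, x):  # caller must ensure x is present
        self._change(x, -1)
    def xor(self, x):
        self.cum_xor ^= x
    def sum_smallest(self, k):
        def go(node, d, k):
            if node is None or k <= 0:
                return 0
            if node.count <= k:  # the whole subtree fits into the k smallest
                return node.sum_xor_with(self.cum_xor)
            fav = (self.cum_xor >> d) & 1  # favored child agrees with cum_xor
            favored, disfavored = node.children[fav], node.children[1 - fav]
            nf = favored.count if favored else 0
            if nf >= k:
                return go(favored, d - 1, k)
            return ((favored.sum_xor_with(self.cum_xor) if favored else 0)
                    + go(disfavored, d - 1, k - nf))
        return go(self.root, BITS - 1, k)

t = XorTrie()
t.add(4); t.add(2)        # Remove(3) is skipped by the Count[] presence check
print(t.sum_smallest(2))  # 6
t.xor(2)
print(t.sum_smallest(1))  # 0
print(t.sum_smallest(2))  # 6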

Algorithms. Add two n-bit binary numbers. What is a loop invariant of this problem?

I'm solving exercise 2.1-4 from CLRS, "Introduction to Algorithms".
The problem is described as:
Consider the problem of adding two n-bit binary integers, stored in two n-element arrays A and B. The sum of the two integers should be stored in binary form in an (n+1)-element array C.
What is the loop invariant of this problem?
I have some thoughts about this question, and wrote them as comments in my solution to this problem, written in golang.
package additoin_binary

/*
Loop invariant:
At the start of each iteration of the loop, the digits in the subarray r[len(r)-1-i:] are:
    a[len(a)-1-i:] + b[len(b)-1-i:] + carry, provided that (len(a)-1-i) and (len(b)-1-i) are non-negative
    a[len(a)-1-i:] + carry, provided that (len(a)-1-i) is non-negative and (len(b)-1-i) is negative
    carry, provided that (len(a)-1-i) and (len(b)-1-i) are negative
The carry going into step i is 1 if the digit sum at step i-1 (including its incoming carry) was at least 2, else 0.
*/
func BinaryAddition(a, b []int) []int {
    // a and b can have an arbitrary number of digits.
    // Keep the second term no longer than the first; otherwise swap them.
    if len(b) > len(a) {
        b, a = a, b
    }
    // Loop invariant initialization:
    // before the first iteration (i=0), index b_i is out of the array range (b[-1]),
    // so there is no second term and sum = a.
    // Another way to think about it: if we aren't given b as an argument, then sum = a, too.
    if len(b) == 0 {
        return a
    }
    // result array to store the sum
    r := make([]int, len(a)+1)
    // overflow of summing two bits (1 + 1)
    carry := 0
    // Loop invariant maintenance:
    // we have the right digits (after addition) in r for indexes r[len(r)-1-i:]
    for i := 0; i < len(r); i++ {
        a_i := len(a) - 1 - i // index of the current digit of a
        b_i := len(b) - 1 - i // index of the current digit of b
        r_i := len(r) - 1 - i // index of the current digit of r
        var s int
        if b_i >= 0 && a_i >= 0 {
            s = a[a_i] + b[b_i] + carry
        } else if a_i >= 0 {
            s = a[a_i] + carry
        } else { // all indexes have run out of the game (a_i < 0, b_i < 0)
            s = carry
        }
        if s > 1 {
            r[r_i] = s - 2 // s is 2 or 3, so keep the low bit
            carry = 1
        } else {
            r[r_i] = s
            carry = 0
        }
    }
    // Loop invariant termination:
    // i goes from 0 to len(r)-1, so r[len(r)-1-(len(r)-1):] => r[0:].
    // This means that for every index in r we have the right digit of the sum.
    // The leading digit r[0] can be 0, so we explicitly check for that before returning r.
    if r[0] == 0 {
        return r[1:]
    }
    return r
}
Edit 1: I extended the original problem. Now arrays A and B can have arbitrary lengths, respectively m and n. Example: A = [1,0,1], B = [1,0] (m=3, n=2).
Consider the problem of adding two n-bit binary integers, stored in two n-element arrays A and B. The sum of the two integers should be stored in binary form in an (n+1)-element array C.
The problem guarantees that A and B are n-element arrays; I think that's an important condition which reduces the amount of code needed.
What is a loop invariant?
In simple words, a loop invariant is some predicate (condition) that holds for every iteration of the loop.
In this problem, letting len = len(C) and iterating i over [0, len), the loop invariant is that C[len-1-i : len] always equals the sum of A[len-2-i : len-1] and B[len-2-i : len-1] in the lower i+1 bits. Because each iteration makes one more bit correct, this proves the algorithm correct.
The loop invariant can be taken as the number of bits yet to be added, n - p (assuming you start by adding the least significant bits first, from right to left), where p is the current bit position and n is the size of the augend and addend bit sequences.
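To make the invariant tangible, here is a small Python sketch (my own, not from the answers) that asserts it on every iteration for equal-length inputs, as in the original CLRS statement:
def to_int(bits):
    # interpret a big-endian bit list as an integer; an empty list is 0
    return int(''.join(map(str, bits)), 2) if bits else 0

def binary_add(a, b):
    # a and b are n-element big-endian bit arrays, as in CLRS 2.1-4
    n = len(a)
    c = [0] * (n + 1)
    carry = 0
    for i in range(n):
        # Invariant: the last i digits of c, plus carry * 2^i,
        # equal the sum of the last i digits of a and of b.
        assert to_int(c[n + 1 - i:]) + (carry << i) == \
               to_int(a[n - i:]) + to_int(b[n - i:])
        s = a[n - 1 - i] + b[n - 1 - i] + carry
        c[n - i] = s % 2
        carry = s // 2
    c[0] = carry  # at termination all n low digits plus the final carry are correct
    return c

print(binary_add([1, 0, 1], [0, 1, 1]))  # 5 + 3 = 8 -> [1, 0, 0, 0]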

How to generate 2 non-adjacent random numbers in a range

I need to generate 2 random numbers in a range [A..B] with the restriction that the numbers can not be adjacent.
I would like to do it in constant time (I don't want to keep drawing until the 2nd value is good).
I can think of several ways to do this:
Pick the first, then pick one that fits after it: Draw V1 from the range [A..B-2], then draw V2 from the range [V1+2..B]
Pick the distance between them, then place them: Draw d from [2..B-A] and V1 from [0..B-A-d] then V2=V1+d
Pick the first, then pick an offset to the second one: Draw V1 from the whole range, then draw d from the range [A+2-V1..B-V1-1], and set V2= d<=0 ? V1-2+d : V1+1+d
Pick the first, then pick the distance to the second with wrapping: draw V1 from [A..B] and d from [2..B-A-1], set V2 = V1+d, then wrap: V2 = V2 > B ? V2-(B-A+1) : V2
I want the most random method (the one that generates the most entropy and has the most even distribution). I think the last two are equivalent and more random than the first two. Is there an even better way?
Assume that the range is [0, n). For random unordered nonadjacent pairs, it suffices to generate a random unordered pair from [0, n-2) and increase the greater element by 2. The latter can be accomplished by a bijective mapping from [0, (n+1)n/2).
import random

def randnonadjpair(n):
    i, j = randunordpair(n - 2)
    return i, j + 2

def randunordpair(n):
    i = random.randrange((n + 1) * n // 2)
    if n % 2 == 1:
        if i < n:
            return i, n - 1
        i -= n
        n -= 1
    h = n // 2
    q, r = divmod(i, h)
    if q < h:
        return q, h + r
    q -= h
    if q <= r:
        return q, r
    return n - q, n - 1 - r
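A quick sanity check of the mapping (my own test, assuming the functions above are in scope): sample many pairs, then verify that none are adjacent, that every valid pair occurs, and that the frequencies come out roughly equal.
from collections import Counter

n = 7
counts = Counter(randnonadjpair(n) for _ in range(100000))
assert all(j - i >= 2 for i, j in counts)      # no adjacent (or equal) pairs
assert len(counts) == (n - 2) * (n - 1) // 2   # all 15 valid pairs occur
print(counts.most_common(3))                   # frequencies are roughly equal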
(This answer is for ordered pairs.)
There are 2(n-2) + (n-2)(n-3) = n^2 - 3n + 2 ways to choose two ordered nonadjacent elements from a range of length n. Generate a random number x between 0 inclusive and n^2 - 3n + 2 exclusive and then map it bijectively to a valid outcome:
def biject(n, x):
    if x < n - 2:
        return (0, x + 2)
    x -= n - 2
    if x < n - 2:
        return (n - 1, x)
    x -= n - 2
    q, r = divmod(x, n - 3)
    q += 1  # first elements 0 and n-1 were handled above, so q covers 1..n-2
    return (q, r if r < q - 1 else r + 3)
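A quick exhaustive check of the bijection (my own test, relying on the off-by-one adjustment commented in the code above): every ordered nonadjacent pair should appear exactly once.
n = 8
outcomes = [biject(n, x) for x in range(n * n - 3 * n + 2)]
assert len(set(outcomes)) == len(outcomes)        # no duplicates
assert all(abs(i - j) >= 2 for i, j in outcomes)  # all pairs nonadjacent
print(len(outcomes))                              # 42 = 8^2 - 3*8 + 2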
If you want maximum entropy then your two picks have to be independent. Thus the value of the second pick cannot be limited by the first pick; both have to be chosen from the entire range available. That means picking the two numbers independently, checking them as a pair and rejecting both if the pair is unsuitable. In pseudocode, that looks something like:
function pickPair()
    repeat
        num1 <- random(A, B)
        num2 <- random(A, B)
    until (notAdjacent(num1, num2))
    return (num1, num2)
end function
You check the constraints on the two numbers in the method notAdjacent().
You do not state the size of the range [A..B]. Given a reasonably large range, the chances of having to reject a pair are low. Alternatively, always pick a fixed number of pairs and return any pair that matches your criterion:
function constantTimePickPair()
    pairFound <- false
    repeats <- 5 // Or enough to ensure certainty of a valid pair.
    do repeats times
        num1 <- random(A, B)
        num2 <- random(A, B)
        if (notAdjacent(num1, num2))
            pairFound <- true
            result <- (num1, num2)
        end if
    end do
    if (NOT pairFound)
        throw error "Pair not found."
    end if
    return result
end function
You will need to set enough repeats to make statistically certain of finding a valid pair.
How about the following approach:
V1 = rand(A..B)
V2 = rand(A+2..B-1)
V2 += V2 > V1 ? 1 : -2
Also, it should be mentioned that you can't get an even distribution here for the second choice.
Border items on the left and on the right will have slightly more chances to be picked.
The probability for inner numbers is (B-A-3)/(B-A), while probability for border elements is (B-A-2)/(B-A).
Here's my current plan:
Given the target range [A..B], its length is L = B-A+1. We want to choose V1, V2 such that V2 is not in the range [V1-1..V1+1] (taking V2 above V1, so each unordered pair is counted once).
If V1 is A, there are L-2 possibilities for V2.
If V1 is A+1, there are L-3 possibilities for V2.
...
Extending this pattern, we get the total number of possibilities P as sum([1..L-2]). (This is half the number @David Eisenstat came up with.)
If we pick a number N in the range [0,P), then we can generate the corresponding combination with:
V1 = A
T = L-2
while (N >= T):
    N -= T
    T -= 1
    V1 += 1
V2 = V1 + N + 2
I would do it as follows:
Draw V1 from [A..B]
If V1 == A || V1 == B draw V2 from [A..B-1], else draw V2 from [A..B-2]
Do:
if(V2 >= V1 - 1) V2++;
if(V2 >= V1 + 1) V2++;
The first check makes sure that V1 - 1 can not be the value of V2
The second check makes sure that V1 + 1 can not be the value of V2.
Or, in other words, this remaps the values to [A..V1-2][V1][V1+2..B].
Since this does not discard nor repeat any values, the distribution should be good.
This answer currently assumes V1 == V2 is valid.
In fact, no, the distribution of the above would be biased.
If N = B - A + 1:
for a number equal to A or B, there are N-2 pairs containing it,
for a number in [A+1..B-1], there are only N-3 pairs containing it.
Calculate the number of pairs M, draw a number in [1..M], and map it back to the corresponding pair, as detailed for example in David Eisenstat's answer.

Find number of continuous subarray having sum zero

You are given an array, and you have to count the number of contiguous subarrays whose sum is zero.
Examples:
1) 0, 1, -1, 0 => 6 {{0},{1,-1},{0,1,-1},{1,-1,0},{0,1,-1,0},{0}};
2) 5, 2, -2, 5, -5, 9 => 3.
It can be done in O(n^2). I am trying to find a solution below this complexity.
Consider S[0..N] - prefix sums of your array, i.e. S[k] = A[0] + A[1] + ... + A[k-1] for k from 0 to N.
Now the sum of elements from L to R-1 is zero if and only if S[R] = S[L]. This means you have to find the number of index pairs 0 <= L < R <= N such that S[L] = S[R].
This problem can be solved with a hash table. Iterate over the elements of S[], maintaining, for each value X, the number of times it has been met in the already processed part of S[]. These counts are stored in a hash map, where the number X is the key and the count H[X] is the value. When you meet a new element S[i], add H[S[i]] to your answer (this accounts for the zero-sum subarrays ending at element i-1), then increment H[S[i]] by one.
Note that if the sum of absolute values of the array elements is small, you can use a simple array instead of a hash table. The complexity is linear on average.
Here is the code:
#include <vector>
#include <unordered_map>
using namespace std;

long long CountZeroSubstrings(vector<int> A) {
    int n = A.size();
    vector<long long> S(n + 1, 0);
    for (int i = 0; i < n; i++)
        S[i + 1] = S[i] + A[i];
    long long answer = 0;
    unordered_map<long long, int> H;
    for (int i = 0; i <= n; i++) {
        if (H.count(S[i]))
            answer += H[S[i]];
        H[S[i]]++;
    }
    return answer;
}
This can be solved in linear time by keeping a hash table of the prefix sums reached during the array traversal. The number of subarrays can then be calculated directly from the counts of revisited sums.
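The same prefix-sum idea fits in a few lines of Python (my sketch, not from either answer):
from collections import Counter

def count_zero_subarrays(a):
    seen = Counter({0: 1})  # the empty prefix has sum 0
    total = s = 0
    for x in a:
        s += x
        total += seen[s]    # one subarray per earlier prefix with the same sum
        seen[s] += 1
    return total

print(count_zero_subarrays([0, 1, -1, 0]))         # 6
print(count_zero_subarrays([5, 2, -2, 5, -5, 9]))  # 3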
Haskell version:
import qualified Data.Map as M
import Data.List (foldl')

f = foldl' (\b a -> b + div (a * (a + 1)) 2) 0 . M.elems . snd
  . foldl' (\(s, m) x -> let s' = s + x in case M.lookup s' m of
        Nothing -> (s', M.insert s' 0 m)
        _       -> (s', M.adjust (+1) s' m)) (0, M.fromList [(0, 0)])
Output:
*Main> f [0,1,-1,0]
6
*Main> f [5,2,-2,5,-5,9]
3
*Main> f [0,0,0,0]
10
*Main> f [0,1,0,0]
4
*Main> f [0,1,0,0,2,3,-3]
5
*Main> f [0,1,-1,0,0,2,3,-3]
11
C# version of @stgatilov's answer https://stackoverflow.com/a/31489960/3087417 with readable variable names:
int[] sums = new int[arr.Count() + 1];
for (int i = 0; i < arr.Count(); i++)
    sums[i + 1] = sums[i] + arr[i];

int numberOfFragments = 0;
Dictionary<int, int> sumToNumberOfRepetitions = new Dictionary<int, int>();
foreach (int item in sums)
{
    if (sumToNumberOfRepetitions.ContainsKey(item))
        numberOfFragments += sumToNumberOfRepetitions[item];
    else
        sumToNumberOfRepetitions.Add(item, 0);
    sumToNumberOfRepetitions[item]++;
}
return numberOfFragments;
If you want the target sum to be not only zero but any number k, here is the hint:
int numToFind = currentSum - k;
if (sumToNumberOfRepetitions.ContainsKey(numToFind))
    numberOfFragments += sumToNumberOfRepetitions[numToFind];
I feel it can be solved using DP.
Let the state be: DP[i][j] is the number of subarrays ending at i whose sum is j.
Transitions: for every element i, first count the length-1 subarray that starts and ends at i:
DP[i][Element[i]]++;
then, for every j in the range [-M, M], where M is the highest magnitude of any element:
DP[i][j] += DP[i-1][j - Element[i]];
Your answer is then the sum of all DP[i][0] (the number of ways to form 0 using subarrays ending at i), where i varies from 1 to the number of elements.
The complexity is O(M * number of elements).
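A direct transcription of this DP into Python (my sketch; I key the table by the reachable sums in a dict instead of the full [-M, M] range):
def count_zero_subarrays_dp(a):
    total = 0
    prev = {}  # prev[j] = number of subarrays ending at the previous index with sum j
    for x in a:
        cur = {x: 1}  # the length-1 subarray [x]
        for j, ways in prev.items():
            cur[j + x] = cur.get(j + x, 0) + ways
        total += cur.get(0, 0)  # subarrays ending here that sum to 0
        prev = cur
    return total

print(count_zero_subarrays_dp([0, 1, -1, 0]))  # 6
Note that this is still O(n * number of distinct sums), so it is no faster than the hash-table answers above.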
https://www.techiedelight.com/find-sub-array-with-0-sum/
This would be an exact solution.
# Utility function to insert <key, value> into the dict
def insert(dict, key, value):
    # if the key is seen for the first time, initialize the list
    dict.setdefault(key, []).append(value)

# Function to print all sub-lists with 0 sum present
# in the given list
def printallSublists(A):
    # create an empty dict to store the ending index of all
    # sub-lists having the same sum
    dict = {}
    # insert the (0, -1) pair into the dict to handle the case when
    # a sub-list with 0 sum starts from index 0
    insert(dict, 0, -1)
    result = 0
    sum = 0
    # traverse the given list
    for i in range(len(A)):
        # sum of elements so far
        sum += A[i]
        # if the sum is seen before, there exists at least one
        # sub-list with 0 sum
        if sum in dict:
            list = dict.get(sum)
            result += len(list)
            # find all sub-lists with the same sum
            for value in list:
                print("Sublist is", (value + 1, i))
        # insert the (sum so far, current index) pair into the dict
        insert(dict, sum, i)
    print("length :", result)

if __name__ == '__main__':
    A = [0, 1, 2, -3, 0, 2, -2]
    printallSublists(A)
I don't know what the complexity of my suggestion would be, but I have an idea :)
What you can do is remove from the main array the elements that cannot contribute to a zero sum.
Suppose the elements are -10, 5, 2, -2, 5, 7, -5, 9, 11, 19.
You can see that -10, 9, 11 and 19 are elements that can never help make a sum of 0 in this case, so try to remove them from your main array.
To do this:
1) Create two subarrays from your main array:
positive {5, 7, 2, 9, 11, 19} and negative {-10, -2, -5}
2) Remove every element from the positive array that cannot be constructed
from an element of the negative array or a sum of its elements (ignoring signs), i.e.
5 = -5 // keep (don't consider the sign)
7 = (-5 + -2) // keep
2 = -2 // keep
9 // cannot be constructed using -10, -2, -5, so discard
same for 11 and 19
3) Remove every element from the negative array that cannot be constructed
from an element of the positive array or a sum of its elements, i.e.
-10 // cannot be constructed, so discard
-2 = 2 // keep
-5 = 5 // keep
Finally you get an array that contains -2, -5, 5, 7, 2; create all possible subarrays from it and check for sum = 0.
(Note: if your input array contains 0s, add all of them to the final array.)

How to generate a list of subsets with restrictions?

I am trying to figure out an efficient algorithm to take a list of items and generate all unique subsets that result from splitting the list into exactly 2 sublists. I'm sure there is a general purpose way to do this, but I'm interested in a specific case. My list will be sorted, and there can be duplicate items.
Some examples:
Input
{1,2,3}
Output
{{1},{2,3}}
{{2},{1,3}}
{{3},{1,2}}
Input
{1,2,3,4}
Output
{{1},{2,3,4}}
{{2},{1,3,4}}
{{3},{1,2,4}}
{{4},{1,2,3}}
{{1,2},{3,4}}
{{1,3},{2,4}}
{{1,4},{2,3}}
Input
{1,2,2,3}
Output
{{1},{2,2,3}}
{{2},{1,2,3}}
{{3},{1,2,2}}
{{1,2},{2,3}}
{{1,3},{2,2}}
I can do this on paper, but I'm struggling to figure out a simple way to do it programmatically. I'm only looking for a quick pseudocode description of how to do this, not any specific code examples.
Any help is appreciated. Thanks.
If you were generating all subsets you would end up generating 2^n subsets for a list of length n. A common way to do this is to iterate through all the numbers i from 0 to 2^n - 1 and use the bits that are set in i to determine which items are in the i-th subset. This works because any item either is or is not present in any particular subset, so by iterating through all the combinations of n bits you iterate through all 2^n subsets.
For example, to generate the subsets of (1, 2, 3) you would iterate through the numbers 0 to 7:
0 = 000b → ()
1 = 001b → (1)
2 = 010b → (2)
3 = 011b → (1, 2)
4 = 100b → (3)
5 = 101b → (1, 3)
6 = 110b → (2, 3)
7 = 111b → (1, 2, 3)
In your problem you can generate each subset and its complement to get your pair of mutually exclusive subsets. Each pair would be repeated when you do this, so you only need to iterate up to 2^(n-1) - 1 and then stop.
1 = 001b → (1) + (2, 3)
2 = 010b → (2) + (1, 3)
3 = 011b → (1, 2) + (3)
To deal with duplicate items you could generate subsets of list indices instead of subsets of list items. Like with the list (1, 2, 2, 3) generate subsets of the list (0, 1, 2, 3) instead and then use those numbers as indices into the (1, 2, 2, 3) list. Add a level of indirection, basically.
Here's some Python code putting this all together.
#!/usr/bin/env python3
def split_subsets(items):
    subsets = set()
    for n in range(1, 2 ** len(items) // 2):
        # Use the ith index if the ith bit of n is set.
        l_indices = [i for i in range(len(items)) if n & (1 << i) != 0]
        # Use the indices NOT present in l_indices.
        r_indices = [i for i in range(len(items)) if i not in l_indices]
        # Get the items corresponding to the indices above.
        l = tuple(items[i] for i in l_indices)
        r = tuple(items[i] for i in r_indices)
        # Swap l and r if they are reversed.
        if (len(l), l) > (len(r), r):
            l, r = r, l
        subsets.add((l, r))
    # Sort the subset pairs so the left items are in ascending order.
    return sorted(subsets, key=lambda pair: (len(pair[0]), pair[0]))

for l, r in split_subsets([1, 2, 2, 3]):
    print(l, r)
Output:
(1,) (2, 2, 3)
(2,) (1, 2, 3)
(3,) (1, 2, 2)
(1, 2) (2, 3)
(1, 3) (2, 2)
The following C++ function does exactly what you need, but the order differs from the one in examples:
#include <iostream>
#include <map>
#include <vector>

// input contains all input numbers, duplicates allowed
void generate(std::vector<int> input) {
    typedef std::map<int, int> Map;
    Map mp;
    for (size_t i = 0; i < input.size(); ++i) {
        mp[input[i]]++;
    }
    std::vector<int> numbers;
    std::vector<int> mult;
    for (Map::iterator it = mp.begin(); it != mp.end(); ++it) {
        numbers.push_back(it->first);
        mult.push_back(it->second);
    }
    std::vector<int> cur(mult.size());
    for (;;) {
        // increment cur as a mixed-radix counter; digit i has base mult[i]+1
        size_t i = 0;
        while (i < cur.size() && cur[i] == mult[i]) cur[i++] = 0;
        if (i == cur.size()) break;
        cur[i]++;
        std::vector<int> list1, list2;
        for (size_t i = 0; i < cur.size(); ++i) {
            list1.insert(list1.end(), cur[i], numbers[i]);
            list2.insert(list2.end(), mult[i] - cur[i], numbers[i]);
        }
        if (list1.size() == 0 || list2.size() == 0) continue;
        if (list1 > list2) continue; // emit each unordered pair only once
        std::cout << "{{";
        for (size_t i = 0; i < list1.size(); ++i) {
            if (i > 0) std::cout << ",";
            std::cout << list1[i];
        }
        std::cout << "},{";
        for (size_t i = 0; i < list2.size(); ++i) {
            if (i > 0) std::cout << ",";
            std::cout << list2[i];
        }
        std::cout << "}}\n";
    }
}
A bit of Erlang code; the problem is that it generates duplicates when you have duplicate elements, so the result list still needs to be filtered:
do([E, F]) -> [{[E], [F]}];
do([H|T]) -> lists:flatten([{[H], T}] ++
                 [[{[H|L1], L2}, {L1, [H|L2]}] || {L1, L2} <- do(T)]). % recurse on the tail

filtered(L) ->
    lists:usort([case length(L1) < length(L2) of
                     true  -> {L1, L2};
                     false -> {L2, L1}
                 end || {L1, L2} <- do(L)]).
In pseudocode this means:
for a two-element list {E,F} the result is {{E},{F}}
for longer lists, take the first element H and the rest of the list T, and return:
{{H},{T}} (the first element as a single-element list, plus the remaining list)
and also run the algorithm recursively on T; for each element {L1,L2} of the resulting list, return {{H,L1},{L2}} and {{L1},{H,L2}} (H prepended to either side); a Python transcription follows below.
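The same recursion transcribed into Python (my transcription; filtered dedupes by sorting within each list and putting the shorter list first):
def do(lst):
    # base case: a two-element list has exactly one split
    if len(lst) == 2:
        return [([lst[0]], [lst[1]])]
    h, t = lst[0], lst[1:]
    res = [([h], t)]
    for l1, l2 in do(t):
        res.append(([h] + l1, l2))
        res.append((l1, [h] + l2))
    return res

def filtered(lst):
    pairs = set()
    for l1, l2 in do(lst):
        a, b = sorted(l1), sorted(l2)
        if (len(a), a) > (len(b), b):
            a, b = b, a
        pairs.add((tuple(a), tuple(b)))
    return sorted(pairs)

print(filtered([1, 2, 2, 3]))  # the 5 unique splits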
My suggestion is...
First, count how many of each value you have, possibly in a hashtable. Then calculate the total number of combinations to consider: the product over the distinct values of (count + 1), since each value can contribute anywhere from 0 up to count copies to the first list.
Iterate through that number of combinations.
At each combination, copy your loop count (as x), then start an inner loop through your hashtable items.
For each hashtable item, use x modulo (count + 1) as your number of instances of the hashtable key in the first list. Divide x by (count + 1) before repeating the inner loop.
If you are worried that the number of combinations might overflow your integer type, the issue is avoidable. Use an array with one item per hashmap key, each starting from zero, and 'count' through the combinations treating each array item as a digit (so the whole array represents the combination number), but with each 'digit' having a different base (the corresponding count + 1). That is, to 'increment' the array, first increment item 0. If it overflows (reaches its base), set it to zero and increment the next array item. Repeat the overflow checks until there is no overflow; if the overflow propagates past the end of the array, you have finished. A sketch of this scheme appears after the next paragraph.
I think sergdev is using a very similar approach to this second one, but using std::map rather than a hashtable (std::unordered_map should work). A hashtable should be faster for large numbers of items, but won't give you the values in any particular order. The ordering for each loop through the keys in a hashtable should be consistent, though, unless you add/remove keys.
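Here is a minimal Python sketch of that counting scheme (my own illustration; each 'digit' has base count+1, and since every split also appears with the sides swapped, pairs are deduped):
from collections import Counter

def split_pairs(items):
    counts = Counter(items)                # multiplicity of each distinct value
    keys = sorted(counts)
    bases = [counts[k] + 1 for k in keys]  # digit for key k ranges over 0..counts[k]
    total = 1
    for b in bases:
        total *= b
    seen = set()
    for x in range(1, total - 1):          # skip the two one-sided splits
        left, right = [], []
        for k, b in zip(keys, bases):
            x, take = divmod(x, b)         # how many copies of k go to the left list
            left += [k] * take
            right += [k] * (b - 1 - take)
        pair = tuple(sorted((tuple(left), tuple(right)),
                            key=lambda s: (len(s), s)))
        if pair not in seen:
            seen.add(pair)
            print(pair)

split_pairs([1, 2, 2, 3])  # prints the 5 unique splits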
