Generate all binary strings of length n with k bits set - algorithm

What's the best algorithm to find all binary strings of length n that contain k bits set? For example, if n=4 and k=3, there are...
0111
1011
1101
1110
I need a good way to generate these given any n and any k so I'd prefer it to be done with strings.

This method will generate all integers with exactly N '1' bits.
From https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
Compute the lexicographically next bit permutation
Suppose we have a pattern of N bits set to 1 in an integer and we want
the next permutation of N 1 bits in a lexicographical sense. For
example, if N is 3 and the bit pattern is 00010011, the next patterns
would be 00010101, 00010110, 00011001, 00011010, 00011100, 00100011,
and so forth. The following is a fast way to compute the next
permutation.
unsigned int v; // current permutation of bits
unsigned int w; // next permutation of bits
unsigned int t = v | (v - 1); // t gets v's least significant 0 bits set to 1
// Next set to 1 the most significant bit to change,
// set to 0 the least significant ones, and add the necessary 1 bits.
w = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
The __builtin_ctz(v) GNU C compiler intrinsic for x86 CPUs returns the number of trailing zeros. If you are using Microsoft compilers for
x86, the intrinsic is _BitScanForward. These both emit a bsf
instruction, but equivalents may be available for other architectures.
If not, then consider using one of the methods for counting the
consecutive zero bits mentioned earlier. Here is another version that
tends to be slower because of its division operator, but it does not
require counting the trailing zeros.
unsigned int t = (v | (v - 1)) + 1;
w = t | ((((t & -t) / (v & -v)) >> 1) - 1);
Thanks to Dario Sneidermanis of Argentina, who provided this on November 28, 2009.

Python
import itertools
def kbits(n, k):
result = []
for bits in itertools.combinations(range(n), k):
s = ['0'] * n
for bit in bits:
s[bit] = '1'
result.append(''.join(s))
return result
print kbits(4, 3)
Output: ['1110', '1101', '1011', '0111']
Explanation:
Essentially we need to choose the positions of the 1-bits. There are n choose k ways of choosing k bits among n total bits. itertools is a nice module that does this for us. itertools.combinations(range(n), k) will choose k bits from [0, 1, 2 ... n-1] and then it's just a matter of building the string given those bit indexes.
Since you aren't using Python, look at the pseudo-code for itertools.combinations here:
http://docs.python.org/library/itertools.html#itertools.combinations
Should be easy to implement in any language.

Forget about implementation ("be it done with strings" is obviously an implementation issue!) -- think about the algorithm, for Pete's sake... just as in, your very first TAG, man!
What you're looking for is all combinations of K items out of a set of N (the indices, 0 to N-1 , of the set bits). That's obviously simplest to express recursively, e.g., pseudocode:
combinations(K, setN):
if k > length(setN): return "no combinations possible"
if k == 0: return "empty combination"
# combinations including the first item:
return ((first-item-of setN) combined combinations(K-1, all-but-first-of setN))
union combinations(K, all-but-first-of setN)
i.e., the first item is either present or absent: if present, you have K-1 left to go (from the tail aka all-but-firs), if absent, still K left to go.
Pattern-matching functional languages like SML or Haskell may be best to express this pseudocode (procedural ones, like my big love Python, may actually mask the problem too deeply by including too-rich functionality, such as itertools.combinations, which does all the hard work for you and therefore HIDES it from you!).
What are you most familiar with, for this purpose -- Scheme, SML, Haskell, ...? I'll be happy to translate the above pseudocode for you. I can do it in languages such as Python too, of course -- but since the point is getting you to understand the mechanics for this homework assignment, I won't use too-rich functionality such as itertools.combinations, but rather recursion (and recursion-elimination, if needed) on more obvious primitives (such as head, tail, and concatenation). But please DO let us know what pseudocode-like language you're most familiar with! (You DO understand that the problem you state is identically equipotent to "get all combinations of K items out or range(N)", right?).

This C# method returns an enumerator that creates all combinations. As it creates the combinations as you enumerate them it only uses stack space, so it's not limited by memory space in the number of combinations that it can create.
This is the first version that I came up with. It's limited by the stack space to a length of about 2700:
static IEnumerable<string> BinStrings(int length, int bits) {
if (length == 1) {
yield return bits.ToString();
} else {
if (length > bits) {
foreach (string s in BinStrings(length - 1, bits)) {
yield return "0" + s;
}
}
if (bits > 0) {
foreach (string s in BinStrings(length - 1, bits - 1)) {
yield return "1" + s;
}
}
}
}
This is the second version, that uses a binary split rather than splitting off the first character, so it uses the stack much more efficiently. It's only limited by the memory space for the string that it creates in each iteration, and I have tested it up to a length of 10000000:
static IEnumerable<string> BinStrings(int length, int bits) {
if (length == 1) {
yield return bits.ToString();
} else {
int first = length / 2;
int last = length - first;
int low = Math.Max(0, bits - last);
int high = Math.Min(bits, first);
for (int i = low; i <= high; i++) {
foreach (string f in BinStrings(first, i)) {
foreach (string l in BinStrings(last, bits - i)) {
yield return f + l;
}
}
}
}
}

One problem with many of the standard solutions to this problem is that the entire set of strings is generated and then those are iterated through, which may exhaust the stack. It quickly becomes unwieldy for any but the smallest sets. In addition, in many instances, only a partial sampling is needed, but the standard (recursive) solutions generally chop the problem into pieces that are heavily biased to one direction (eg. consider all the solutions with a zero starting bit, and then all the solutions with a one starting bit).
In many cases, it would be more desireable to be able to pass a bit string (specifying element selection) to a function and have it return the next bit string in such a way as to have a minimal change (this is known as a Gray Code) and to have a representation of all the elements.
Donald Knuth covers a whole host of algorithms for this in his Fascicle 3A, section 7.2.1.3: Generating all Combinations.
There is an approach for tackling the iterative Gray Code algorithm for all ways of choosing k elements from n at http://answers.yahoo.com/question/index?qid=20081208224633AA0gdMl
with a link to final PHP code listed in the comment (click to expand it) at the bottom of the page.

One possible 1.5-liner:
$ python -c 'import itertools; \
print set([ n for n in itertools.permutations("0111", 4)])'
set([('1', '1', '1', '0'), ('0', '1', '1', '1'), ..., ('1', '0', '1', '1')])
.. where k is the number of 1s in "0111".
The itertools module explains equivalents for its methods; see the equivalent for the permutation method.

One algorithm that should work:
generate-strings(prefix, len, numBits) -> String:
if (len == 0):
print prefix
return
if (len == numBits):
print prefix + (len x "1")
generate-strings(prefix + "0", len-1, numBits)
generate-strings(prefix + "1", len-1, numBits)
Good luck!

In a more generic way, the below function will give you all possible index combinations for an N choose K problem which you can then apply to a string or whatever else:
def generate_index_combinations(n, k):
possible_combinations = []
def walk(current_index, indexes_so_far=None):
indexes_so_far = indexes_so_far or []
if len(indexes_so_far) == k:
indexes_so_far = tuple(indexes_so_far)
possible_combinations.append(indexes_so_far)
return
if current_index == n:
return
walk(current_index + 1, indexes_so_far + [current_index])
walk(current_index + 1, indexes_so_far)
if k == 0:
return []
walk(0)
return possible_combinations

I would try recursion.
There are n digits with k of them 1s. Another way to view this is sequence of k+1 slots with n-k 0s distributed among them. That is, (a run of 0s followed by a 1) k times, then followed by another run of 0s. Any of these runs can be of length zero, but the total length needs to be n-k.
Represent this as an array of k+1 integers. Convert to a string at the bottom of the recursion.
Recursively call to depth n-k, a method that increments one element of the array before a recursive call and then decrements it, k+1 times.
At the depth of n-k, output the string.
int[] run = new int[k+1];
void recur(int depth) {
if(depth == 0){
output();
return;
}
for(int i = 0; i < k + 1; ++i){
++run[i];
recur(depth - 1);
--run[i];
}
public static void main(string[] arrrgghhs) {
recur(n - k);
}
It's been a while since I have done Java, so there are probably some errors in this code, but the idea should work.

Are strings faster than an array of ints? All the solutions prepending to strings probably result in a copy of the string at each iteration.
So probably the most efficient way would be an array of int or char that you append to. Java has efficient growable containers, right? Use that, if it's faster than string. Or if BigInteger is efficient, it's certainly compact, since each bit only takes a bit, not a whole byte or int. But then to iterate over the bits you need to & mask a bit, and bitshift the mask to the next bit position. So probably slower, unless JIT compilers are good at that these days.
I would post this a a comment on the original question, but my karma isn't high enough. Sorry.

Python (functional style)
Using python's itertools.combinations you can generate all choices of k our of n and map those choices to a binary array with reduce
from itertools import combinations
from functools import reduce # not necessary in python 2.x
def k_bits_on(k,n):
one_at = lambda v,i:v[:i]+[1]+v[i+1:]
return [tuple(reduce(one_at,c,[0]*n)) for c in combinations(range(n),k)]
Example usage:
In [4]: k_bits_on(2,5)
Out[4]:
[(0, 0, 0, 1, 1),
(0, 0, 1, 0, 1),
(0, 0, 1, 1, 0),
(0, 1, 0, 0, 1),
(0, 1, 0, 1, 0),
(0, 1, 1, 0, 0),
(1, 0, 0, 0, 1),
(1, 0, 0, 1, 0),
(1, 0, 1, 0, 0),
(1, 1, 0, 0, 0)]

Well for this question (where you need to iterate over all the submasks in increasing order of their number of set bits), which has been marked as a duplicate of this.
We can simply iterate over all the submasks add them to a vector and sort it according to the number of set bits.
vector<int> v;
for(ll i=mask;i>0;i=(i-1)&mask)
v.push_back(i);
auto cmp = [](const auto &a, const auto &b){
return __builtin_popcountll(a) < __builtin_popcountll(b);
}
v.sort(v.begin(), v.end(), cmp);
Another way would be to iterate over all the submasks N times and add a number to the vector if the number of set bits is equal to i in the ith iteration.
Both ways have complexity of O(n*2^n)

Best and Easy Solution
This is an easy problem. We just need to use Dynamic Programming.
I can give my solution which stores integeres. After that you can convert integers to bitwise strings.
List<Long> dp[]=new List[m+1];
for(int i=0;i<=m;i++) dp[i]=new ArrayList<>();
// dp[i] stores all possible bit masks of n length and i bits set
dp[0].add(0l);
for(int i=1;i<=m;i++){
// transitions
for(int j=0;j<dp[i-1].size();j++){
long num=dp[i-1].get(j);
for(int p=0;p<n;p++){
if((num&(1l<<p))==0) dp[i].add(num|(1l<<p));
}
}
}
// dp[m] contains all possible numbers having m bits set of len n
But dp[m] contains duplicates because adding 1 to 10 or 01 gives 11 two times. To handle that we can use HashSet
Set<Long> set=new HashSet<>();
for(int i=0;i<dp[m].size();i++) set.add(dp[m].get(i));

if you want to solve this problem recursively, you can do this by a D&C algorithm :
def binlist(n,k,s):
if n==0:
if s.count('1')==k:
print(s)
else:
binlist(n-1,k,s+'1')
binlist(n-1,k,s+'0')
binlist(5,3,'')
the output will be :
11100
11010
11001
10110
10101
10011
01110
01101
01011
00111

Related

How to turn integers into Fibonacci coding efficiently?

Fibonacci sequence is obtained by starting with 0 and 1 and then adding the two last numbers to get the next one.
All positive integers can be represented as a sum of a set of Fibonacci numbers without repetition. For example: 13 can be the sum of the sets {13}, {5,8} or {2,3,8}. But, as we have seen, some numbers have more than one set whose sum is the number. If we add the constraint that the sets cannot have two consecutive Fibonacci numbers, than we have a unique representation for each number.
We will use a binary sequence (just zeros and ones) to do that. For example, 17 = 1 + 3 + 13. Then, 17 = 100101. See figure 2 for a detailed explanation.
I want to turn some integers into this representation, but the integers may be very big. How to I do this efficiently.
The problem itself is simple. You always pick the largest fibonacci number less than the remainder. You can ignore the the constraint with the consecutive numbers (since if you need both, the next one is the sum of both so you should have picked that one instead of the initial two).
So the problem remains how to quickly find the largest fibonacci number less than some number X.
There's a known trick that starting with the matrix (call it M)
1 1
1 0
You can compute fibbonacci number by matrix multiplications(the xth number is M^x). More details here: https://www.nayuki.io/page/fast-fibonacci-algorithms . The end result is that you can compute the number you're look in O(logN) matrix multiplications.
You'll need large number computations (multiplications and additions) if they don't fit into existing types.
Also store the matrices corresponding to powers of two you compute the first time, since you'll need them again for the results.
Overall this should be O((logN)^2 * large_number_multiplications/additions)).
First I want to tell you that I really liked this question, I didn't know that All positive integers can be represented as a sum of a set of Fibonacci numbers without repetition, I saw the prove by induction and it was awesome.
To respond to your question I think that we have to figure how the presentation is created. I think that the easy way to find this is that from the number we found the closest minor fibonacci item.
For example if we want to present 40:
We have Fib(9)=34 and Fib(10)=55 so the first element in the presentation is Fib(9)
since 40 - Fib(9) = 6 and (Fib(5) =5 and Fib(6) =8) the next element is Fib(5). So we have 40 = Fib(9) + Fib(5)+ Fib(2)
Allow me to write this in C#
class Program
{
static void Main(string[] args)
{
List<int> fibPresentation = new List<int>();
int numberToPresent = Convert.ToInt32(Console.ReadLine());
while (numberToPresent > 0)
{
int k =1;
while (CalculateFib(k) <= numberToPresent)
{
k++;
}
numberToPresent = numberToPresent - CalculateFib(k-1);
fibPresentation.Add(k-1);
}
}
static int CalculateFib(int n)
{
if (n == 1)
return 1;
int a = 0;
int b = 1;
// In N steps compute Fibonacci sequence iteratively.
for (int i = 0; i < n; i++)
{
int temp = a;
a = b;
b = temp + b;
}
return a;
}
}
Your result will be in fibPresentation
This encoding is more accurately called the "Zeckendorf representation": see https://en.wikipedia.org/wiki/Fibonacci_coding
A greedy approach works (see https://en.wikipedia.org/wiki/Zeckendorf%27s_theorem) and here's some Python code that converts a number to this representation. It uses the first 100 Fibonacci numbers and works correctly for all inputs up to 927372692193078999175 (and incorrectly for any larger inputs).
fibs = [0, 1]
for _ in xrange(100):
fibs.append(fibs[-2] + fibs[-1])
def zeck(n):
i = len(fibs) - 1
r = 0
while n:
if fibs[i] <= n:
r |= 1 << (i - 2)
n -= fibs[i]
i -= 1
return r
print bin(zeck(17))
The output is:
0b100101
As the greedy approach seems to work, it suffices to be able to invert the relation N=Fn.
By the Binet formula, Fn=[φ^n/√5], where the brackets denote the nearest integer. Then with n=floor(lnφ(√5N)) you are very close to the solution.
17 => n = floor(7.5599...) => F7 = 13
4 => n = floor(4.5531) => F4 = 3
1 => n = floor(1.6722) => F1 = 1
(I do not exclude that some n values can be off by one.)
I'm not sure if this is an efficient enough for you, but you could simply use Backtracking to find a(the) valid representation.
I would try to start the backtracking steps by taking the biggest possible fib number and only switch to smaller ones if the consecutive or the only once constraint is violated.

Generating a non-repeating set from a random seed, and extract result by index

p.s. I have referred to this as Random, but this is a Seed Based Random Shuffle, where the Seed will be generated by a PRNG, but with the same Seed, the same "random" distribution will be observed.
I am currently trying to find a method to assist in doing 2 things:
1) Generate Non-Repeating Sequence
This will take 2 arguments: Seed; and N. It will generate a sequence, of size N, populated with numbers between 1 and N, with no repetitions.
I have found a few good methods to do this, but most of them get stumped by feasibility with the second thing.
2) Extract an entry from the Sequence
This will take 3 arguments: Seed; N; and I. This is for determining what value would appear at position I in a Sequence that would be generated with Seed and N. However, in order to work with what I have in mind, it absolutely cannot use a generated sequence, and pick out an element.
I initially worked with pre-calculating the sequence, then querying it, but this only really works in test cases, as the number of Seeds, and the value of N that will be used would create a database into the Petabytes.
From what I can tell, having a method that implements requirement 1 by using requirement 2 would be the most ideal method.
i.e. a sequence is generated by:
function Generate_Sequence(int S, int N) {
int[] sequence = new int[N];
for (int i = 0; i < N; i++) {
sequence[i] = Extract_From_Sequence(S, N, i);
}
return sequence;
}
For Example
GS = Generate Sequence
ES = Extract from Sequence
for:
S = 1
N = 5
I = 4
GS(S, N) = { 4, 2, 5, 1, 3 }
ES(S, N, I) = 1
let S = 2
GS(S, N) = { 3, 5, 2, 4, 1 }
ES(S, N, I) = 4
One way to do this is to make a permutation over the bit positions of the number. Assume that N is a power of two (I will discuss the general case later!).
Use the seed S to generate a permutation \sigma over the set of {1,2,...,log(n)}. Then permute the bits of I according to the \sigma to obtain I'. In other words, the bit of I' at the position \sigma(x) is obtained from the bit of I at the position x.
One problem with this method is its linearity (It is closed under the XOR operation). To overcome this, you can find a number p with gcd(p,N)=1 (this can be done easily even for very large Ns) and generate a random number (q < N) using the seed S. The output of the Extract_From_Sequence(S, N, I) would be (p*I'+q mod N).
Now the case where N is not a complete power of two. The problem arises when the I' falls outside the range of [1,N]. In that case, we return the most significant bits of I to their initial position until the resulting value falls into the desired range. This is done by changing the \sigma(log(n)) bit of I' with the log(n) bit, and so on ....

Efficient way to find Next Smaller number with same number of 1 bits

I'm working on this question and come up with a solution (May be one or two condition needs to be added) but not sure if this is the right way to do it and find it cumbersome to use two loops and not sure if this is the efficient way of doing it. It would be great if anyone has some nice trick to do it or any better approach would be welcome :). (Language is not a barrier)
My Algorithm:
First find the first '0' lsb bit in the number
then find the next set bit which is next to this '0' bit
Change the one you find '0' bit to 1 & '1' bit to '0'
The number you'll get is next smaller
if all the bit is set then you don't have any number which is next smaller with the same number of '1' bits.
void nextSmaller(int number) {
int firstZeroBitHelper = 1, nextOneBitHelper;
while (firstZeroBitHelper < number) {
// when we find first lsb zero bit we'll stop
bool bit = number & firstZeroBitHelper;
if (bit == false)
break;
firstZeroBitHelper = firstZeroBitHelper << 1;
}
if (firstZeroBitHelper >= number) {
cout << "No minimum number exists" << endl;
return;
}
nextOneBitHelper = firstZeroBitHelper;
nextOneBitHelper = nextOneBitHelper << 1;
while (nextOneBitHelper < number) {
// when we get '1' after the previous zero we stop
bool bit = number & nextOneBitHelper;
if (bit == true)
break;
nextOneBitHelper = nextOneBitHelper << 1;
}
// change the first zero to 1
number = number | firstZeroBitHelper;
// change the next set bit to zero
number = number & ~nextOneBitHelper;
cout << number << endl;
}
Continuing from my comment..
Well, I found it, and pretty quickly too. See The Art of Computer Programming chapter 7.1.3 (in volume 4A), answer to question 21: "the reverse of Gosper's hack".
It looks like this:
t = y + 1;
u = t ^ y;
v = t & y;
x = v - (v & -v) / (u + 1);
Where y is the input and x the result. The same optimizations as in Gosper's hack apply to that division.
Going upwards:
Find the rightmost occurrence of "01" in the number and make it "10".
Justify all following 1-bits as far to the right as possible.
Going downwards:
Find the rightmost occurrence of "10" in the number and make it "01".
Left-justify all following 1-bits (i.e. don't do anything if the bit you just set is already followed by a 1).
An example to make the downwards case clear:
225 = 0b11100001
Swap: 0b11010001
Left-justify: 0b11011000 = 216
I'll explain the case of going upwards first, because it feels less tricky to me. We want to find the least-significant position where we can move a 1-bit one position left (in other words, the rightmost 0 that has a 1 to its right). It should be clear that this is the rightmost bit that we can possibly set, since we need to clear a bit somewhere else for every bit we set, and we need to clear a bit somewhere to the right of the bit we set, or else the number will get smaller instead of larger.
Now that we've set this bit, we want to clear one bit (to restore the total number of set bits), and reshuffle the remaining bits so that the number is as small as possible (this makes it the next greatest number with the same number of set bits). We can clear the bit to the right of the one we just set, and we can push any remaining 1-bits as far right as possible without fear of going below our original number, since all the less-significant bits together still add up to less than the single bit we just set.
Finding the next lower number instead of the next higher is basically the same, except that we're looking for the rightmost position where we can move a set bit one position right, and after doing that we want to move all less-significant bits as far left as possible.
It looks like others have got the bit-twiddling versions of this well in hand, but I wanted to see if I could give a good explanation of the logical/mathematical side of the algorithm.
anatolyg covered your algorithm pretty well, but there's a more efficient solution.
You can use Gosper's hack with a clever realization that if you flip the bits around, then Gosper's produces the values in descending order.
Something like this pseudocode would work:
let k := number
let n := num bits in k (log base 2)
k = k ^ ((1 << n) - 1)
k = gosper(k)
k = k ^ ((1 << n) - 1)
return k
This gives you a nice O(1) (or O(log n) if you consider xor to be linear time) algorithm. :)
There are some cases you have to consider, like if k=2^x-1 for some x, but that's pretty easy to catch.
The algorithm you described is not quite correct; it does everything right except one detail. Any binary number has the following form, found in the middle of your algorithm:
xxxxx...10000...1111...
---n---// f //
Here xxx... are arbitrary bits, and the numbers of consecutive zeros and ones are determined by firstZeroBitHelper and nextOneBitHelper (f and n).
Now you have to decrease this number leaving the same number of set bits, which necessarily turns the most significant 1 to 0:
xxxxx...0????...????...
-----n+f------
Note that any value for bits ??? makes the new number less than the original one, and you really want to choose these bits such that the resulting number has maximal value:
xxxxx...011111...0000...
---f+1--//n-1//
So you have to flip not just 2 bits, but f+2 bits (one bit from 1 to 0, and f+1 others from 0 to 1).
One way to do that is as follows.
First turn off all relevant bits:
number &= ~nextOneBitHelper;
number &= ~(nextOneBitHelper - 1);
Now turn on the needed bits, starting from MSB:
nextOneBitHelper >>= 1;
while (firstZeroBitHelper != 0)
{
number |= nextOneBitHelper;
nextOneBitHelper >>= 1;
firstZeroBitHelper >>= 1;
}
It is possible to implement the bit twiddling described above without loops; for that you would need to calculate n and f. Having done that:
unsigned mask = (1 << (f + 1)) - 1; // has f+1 bits set to 1
mask <<= n - 1; // now has these bits at correct positions
number |= mask; // now the number has these bits set
#include <iostream>
bool AlmostdivBy2(int num) {
return (-~num & (num)) == 0;
}
void toggleright(int &num) {
int t;
for (t = -1; num & 1; t++)
num >>= 1;
++num = (num << t) | ~-(1 << t);
}
void toggleleft(int &num) {
while (~num & 1)
num >>= 1; //Simply keep chopping off zeros
//~num & 1 checks if the number is even
//Even numbers have a zero at bit at the rightmost spot
}
int main() {
int value;
std::cin >> value;
if (!AlmostdivBy2(value)) {
(~value & 1) ? toggleleft(value) : toggleright(value);
}
std::cout << value << "\n";
return 0;
}
I think I might have over thought this one, but here is my explanation:
If the number is close to being a power of 2 i.e values like 1, 3, 7, 15, 31, ..., then there is no value smaller than it that could have the same number of ones in their binary representation. Therefore we don't worry about these numbers.
if the number is even, that is another easy fix, we simply keep chopping off zeros from the end until the number is odd
Odd numbers presented the most challange which is why it is recursive. First you had to find the first zero bit starting from the right of the number. When this is found, you add one to that number which will turn that last bit to a 1. As the recursion unwinds you keep shifting the bits to the left and adding one. When this is done, you have the next smallest.
Hope I didn't confuse you
EDIT
Worked on it more and here is a non recursive version of toggleright
void toggleright(int &num) {
int t = 1;
while ( (num >>= 1) & 1 && t++ );
num = (-~num << ~-t) | ~-(1 << t);
}

number to unique permutation mapping of a sequence containing duplicates

I am looking for an algorithm that can map a number to a unique permutation of a sequence. I have found out about Lehmer codes and the factorial number system thanks to a similar question, Fast permutation -> number -> permutation mapping algorithms, but that question doesn't deal with the case where there are duplicate elements in the sequence.
For example, take the sequence 'AAABBC'. There are 6! = 720 ways that could be arranged, but I believe there are only 6! / (3! * 2! * 1!) = 60 unique permutation of this sequence. How can I map a number to a permutation in these cases?
Edit: changed the term 'set' to 'sequence'.
From Permutation to Number:
Let K be the number of character classes (example: AAABBC has three character classes)
Let N[K] be the number of elements in each character class. (example: for AAABBC, we have N[K]=[3,2,1], and let N= sum(N[K])
Every legal permutation of the sequence then uniquely corresponds to a path in an incomplete K-way tree.
The unique number of the permutation then corresponds to the index of the tree-node in a post-order traversal of the K-ary tree terminal nodes.
Luckily, we don't actually have to perform the tree traversal -- we just need to know how many terminal nodes in the tree are lexicographically less than our node. This is very easy to compute, as at any node in the tree, the number terminal nodes below the current node is equal to the number of permutations using the unused elements in the sequence, which has a closed form solution that is a simple multiplication of factorials.
So given our 6 original letters, and the first element of our permutation is a 'B', we determine that there will be 5!/3!1!1! = 20 elements that started with 'A', so our permutation number has to be greater than 20. Had our first letter been a 'C', we could have calculated it as 5!/2!2!1! (not A) + 5!/3!1!1! (not B) = 30+ 20, or alternatively as
60 (total) - 5!/3!2!0! (C) = 50
Using this, we can take a permutation (e.g. 'BAABCA') and perform the following computations:
Permuation #= (5!/2!2!1!) ('B') + 0('A') + 0('A')+ 3!/1!1!1! ('B') + 2!/1!
= 30 + 3 +2 = 35
Checking that this works: CBBAAA corresponds to
(5!/2!2!1! (not A) + 5!/3!1!1! (not B)) 'C'+ 4!/2!2!0! (not A) 'B' + 3!/2!1!0! (not A) 'B' = (30 + 20) +6 + 3 = 59
Likewise, AAABBC =
0 ('A') + 0 'A' + '0' A' + 0 'B' + 0 'B' + 0 'C = 0
Sample implementation:
import math
import copy
from operator import mul
def computePermutationNumber(inPerm, inCharClasses):
permutation=copy.copy(inPerm)
charClasses=copy.copy(inCharClasses)
n=len(permutation)
permNumber=0
for i,x in enumerate(permutation):
for j in xrange(x):
if( charClasses[j]>0):
charClasses[j]-=1
permNumber+=multiFactorial(n-i-1, charClasses)
charClasses[j]+=1
if charClasses[x]>0:
charClasses[x]-=1
return permNumber
def multiFactorial(n, charClasses):
val= math.factorial(n)/ reduce(mul, (map(lambda x: math.factorial(x), charClasses)))
return val
From Number to Permutation:
This process can be done in reverse, though I'm not sure how efficiently:
Given a permutation number, and the alphabet that it was generated from, recursively subtract the largest number of nodes less than or equal to the remaining permutation number.
E.g. Given a permutation number of 59, we first can subtract 30 + 20 = 50 ('C') leaving 9. Then we can subtract 'B' (6) and a second 'B'(3), re-generating our original permutation.
Here is an algorithm in Java that enumerates the possible sequences by mapping an integer to the sequence.
public class Main {
private int[] counts = { 3, 2, 1 }; // 3 Symbols A, 2 Symbols B, 1 Symbol C
private int n = sum(counts);
public static void main(String[] args) {
new Main().enumerate();
}
private void enumerate() {
int s = size(counts);
for (int i = 0; i < s; ++i) {
String p = perm(i);
System.out.printf("%4d -> %s\n", i, p);
}
}
// calculates the total number of symbols still to be placed
private int sum(int[] counts) {
int n = 0;
for (int i = 0; i < counts.length; i++) {
n += counts[i];
}
return n;
}
// calculates the number of different sequences with the symbol configuration in counts
private int size(int[] counts) {
int res = 1;
int num = 0;
for (int pos = 0; pos < counts.length; pos++) {
for (int den = 1; den <= counts[pos]; den++) {
res *= ++num;
res /= den;
}
}
return res;
}
// maps the sequence number to a sequence
private String perm(int num) {
int[] counts = this.counts.clone();
StringBuilder sb = new StringBuilder(n);
for (int i = 0; i < n; ++i) {
int p = 0;
for (;;) {
while (counts[p] == 0) {
p++;
}
counts[p]--;
int c = size(counts);
if (c > num) {
sb.append((char) ('A' + p));
break;
}
counts[p]++;
num -= c;
p++;
}
}
return sb.toString();
}
}
The mapping used by the algorithm is as follows. I use the example given in the question (3 x A, 2 x B, 1 x C) to illustrate it.
There are 60 (=6!/3!/2!/1!) possible sequences in total, 30 (=5!/2!/2!/1!) of them have an A at the first place, 20 (=5!/3!/1!/1!) have a B at the first place, and 10 (=5!/3!/2!/0!) have a C at the first place.
The numbers 0..29 are mapped to all sequences starting with an A, 30..49 are mapped to the sequences starting with B, and 50..59 are mapped to the sequences starting with C.
The same process is repeated for the next place in the sequence, for example if we take the sequences starting with B we have now to map numbers 0 (=30-30) .. 19 (=49-30) to the sequences with configuration (3 x A, 1 x B, 1 x C)
A very simple algorithm to mapping a number for a permutation consists of n digits is
number<-digit[0]*10^(n-1)+digit[1]*10^(n-2)+...+digit[n]*10^0
You can find plenty of resources for algorithms to generate permutations. I guess you want to use this algorithm in bioinformatics. For example you can use itertools.permutations from Python.
Assuming the resulting number fits inside a word (e.g. 32 or 64 bit integer) relatively easily, then much of the linked article still applies. Encoding and decoding from a variable base remains the same. What changes is how the base varies.
If you're creating a permutation of a sequence, you pick an item out of your bucket of symbols (from the original sequence) and put it at the start. Then you pick out another item from your bucket of symbols and put it on the end of that. You'll keep picking and placing symbols at the end until you've run out of symbols in your bucket.
What's significant is which item you picked out of the bucket of the remaining symbols each time. The number of remaining symbols is something you don't have to record because you can compute that as you build the permutation -- that's a result of your choices, not the choices themselves.
The strategy here is to record what you chose, and then present an array of what's left to be chosen. Then choose, record which index you chose (packing it via the variable base method), and repeat until there's nothing left to choose. (Just as above when you were building a permuted sequence.)
In the case of duplicate symbols it doesn't matter which one you picked, so you can treat them as the same symbol. The difference is that when you pick a symbol which still has a duplicate left, you didn't reduce the number of symbols in the bucket to pick from next time.
Let's adopt a notation that makes this clear:
Instead of listing duplicate symbols left in our bucket to choose from like c a b c a a we'll list them along with how many are still in the bucket: c-2 a-3 b-1.
Note that if you pick c from the list, the bucket has c-1 a-3 b-1 left in it. That means next time we pick something, we have three choices.
But on the other hand, if I picked b from the list, the bucket has c-2 a-3 left in it. That means next time we pick something, we only have two choices.
When reconstructing the permuted sequence we just maintain the bucket the same way as when we were computing the permutation number.
The implementation details aren't trivial, but they're straightforward with standard algorithms. The only thing that might heckle you is what to do when a symbol in your bucket is no longer available.
Suppose your bucket was represented by a list of pairs (like above): c-1 a-3 b-1 and you choose c. Your resulting bucket is c-0 a-3 b-1. But c-0 is no longer a choice, so your list should only have two entries, not three. You could move the entire list down by 1 resulting in a-3 b-1, but if your list is long this is expensive. A fast an easy solution: move the last element of the bucket into the removed location and decrease your bucket size: c0 a-3 b-1 becomes b-1 a-3 <empty> or just b-1 a-3.
Note that we can do the above because it doesn't matter what order the symbols in the bucket are listed in, as long as it's the same way when we encode or decode the number.
As I was unsure of the code in gbronner's answer (or of my understanding), I recoded it in R as follows
ritpermz=function(n, parclass){
return(factorial(n) / prod(factorial(parclass)))}
rankum <- function(confg, parclass){
n=length(confg)
permdex=1
for (i in 1:(n-1)){
x=confg[i]
if (x > 1){
for (j in 1:(x-1)){
if(parclass[j] > 0){
parclass[j]=parclass[j]-1
permdex=permdex + ritpermz(n-i, parclass)
parclass[j]=parclass[j]+1}}}
parclass[x]=parclass[x]-1
}#}
return(permdex)
}
which does produce a ranking with the right range of integers

Finding if a random number has occured before or not

Let me be clear at start that this is a contrived example and not a real world problem.
If I have a problem of creating a random number between 0 to 10. I do this 11 times making sure that a previously occurred number is not drawn again, if I get a repeated number,
I create another random number again to make sure it has not be seen earlier. So essentially I get a a sequence of unique numbers from 0 - 10 in a random order
e.g. 3 1 2 0 5 9 4 8 10 6 7 and so on
Now to come up with logic to make sure that the random numbers are unique and not one which we have drawn before, we could use many approaches
Use C++ std::bitset and set the bit corresponding to the index equal to value of each random no. and check it next time when a new random number is drawn.
Or
Use a std::map<int,int> to count the number of times or even simple C array with some sentinel values stored in that array to indicate if that number has occurred or not.
If I have to avoid these methods above and use some mathematical/logical/bitwise operation to find whether a random number has been draw before or not, is there a way?
You don't want to do it the way you suggest. Consider what happens when you have already selected 10 of the 11 items; your random number generator will cycle until it finds the missing number, which might be never, depending on your random number generator.
A better solution is to create a list of numbers 0 to 10 in order, then shuffle the list into a random order. The normal algorithm for doing this is due to Knuth, Fisher and Yates: starting at the first element, swap each element with an element at a location greater than the current element in the array.
function shuffle(a, n)
for i from n-1 to 1 step -1
j = randint(i)
swap(a[i], a[j])
We assume an array with indices 0 to n-1, and a randint function that sets j to the range 0 <= j <= i.
Use an array and add all possible values to it. Then pick one out of the array and remove it. Next time, pick again until the array is empty.
Yes, there is a mathematical way to do it, but it is a bit expansive.
have an array: primes[] where primes[i] = the i'th prime number. So its beginning will be [2,3,5,7,11,...].
Also store a number mult Now, once you draw a number (let it be i) you check if mult % primes[i] == 0, if it is - the number was drawn before, if it wasn't - then the number was not. chose it and do mult = mult * primes[i].
However, it is expansive because it might require a lot of space for large ranges (the possible values of mult increases exponentially
(This is a nice mathematical approach, because we actually look at a set of primes p_i, the array of primes is only the implementation to the abstract set of primes).
A bit manipulation alternative for small values is using an int or long as a bitset.
With this approach, to check a candidate i is not in the set you only need to check:
if (pow(2,i) & set == 0) // not in the set
else //already in the set
To enter an element i to the set:
set = set | pow(2,i)
A better approach will be to populate a list with all the numbers, shuffle it with fisher-yates shuffle, and iterate it for generating new random numbers.
If I have to avoid these methods above and use some
mathematical/logical/bitwise operation to find whether a random number
has been draw before or not, is there a way?
Subject to your contrived constraints yes, you can imitate a small bitset using bitwise operations:
You can choose different integer types on the right according to what size you need.
bitset code bitwise code
std::bitset<32> x; unsigned long x = 0;
if (x[i]) { ... } if (x & (1UL << i)) { ... }
// assuming v is 0 or 1
x[i] = v; x = (x & ~(1UL << i)) | ((unsigned long)v << i);
x[i] = true; x |= (1UL << i);
x[i] = false; x &= ~(1UL << i);
For a larger set (beyond the size in bits of unsigned long long), you will need an array of your chosen integer type. Divide the index by the width of each value to know what index to look up in the array, and use the modulus for the bit shifts. This is basically what bitset does.
I'm assuming that the various answers that tell you how best to shuffle 10 numbers are missing the point entirely: that your contrived constraints are there because you do not in fact want or need to know how best to shuffle 10 numbers :-)
Keep a variable too map the drawn numbers. The i'th bit of that variable will be 1 if the number was drawn before:
int mapNumbers = 0;
int generateRand() {
if (mapNumbers & ((1 << 11) - 1) == ((1 << 11) - 1)) return; // return if all numbers have been generated
int x;
do {
x = newVal();
} while (!x & mapNumbers);
mapNumbers |= (1 << x);
return x;
}

Resources