Number of 1s in the two's complement binary representations of integers in a range - algorithm

This problem is from the 2011 Codesprint (http://csfall11.interviewstreet.com/):
One of the basics of Computer Science is knowing how numbers are represented in 2's complement. Imagine that you write down all numbers between A and B inclusive in 2's complement representation using 32 bits. How many 1's will you write down in all ?
Input:
The first line contains the number of test cases T (<1000). Each of the next T lines contains two integers A and B.
Output:
Output T lines, one corresponding to each test case.
Constraints:
-2^31 <= A <= B <= 2^31 - 1
Sample Input:
3
-2 0
-3 4
-1 4
Sample Output:
63
99
37
Explanation:
For the first case, -2 contains 31 1's followed by a 0, -1 contains 32 1's and 0 contains 0 1's. Thus the total is 63.
For the second case, the answer is 31 + 31 + 32 + 0 + 1 + 1 + 2 + 1 = 99
I realize that you can use the fact that the number of 1s in -X is equal to the number of 0s in the complement of (-X) = X-1 to speed up the search. The solution claims that there is a O(log X) recurrence relation for generating the answer but I do not understand it. The solution code can be viewed here: https://gist.github.com/1285119
I would appreciate it if someone could explain how this relation is derived!

Well, it's not that complicated...
The single-argument solve(int a) function is the key. It is short, so I will cut&paste it here:
long long solve(int a)
{
if(a == 0) return 0 ;
if(a % 2 == 0) return solve(a - 1) + __builtin_popcount(a) ;
return ((long long)a + 1) / 2 + 2 * solve(a / 2) ;
}
It only works for non-negative a, and it counts the number of 1 bits in all integers from 0 to a inclusive.
The function has three cases:
a == 0 -> returns 0. Obviously.
a even -> returns the number of 1 bits in a plus solve(a-1). Also pretty obvious.
The final case is the interesting one. So, how do we count the number of 1 bits from 0 to an odd number a?
Consider all of the integers between 0 and a, and split them into two groups: The evens, and the odds. For example, if a is 5, you have two groups (in binary):
000 (aka. 0)
010 (aka. 2)
100 (aka. 4)
and
001 (aka 1)
011 (aka 3)
101 (aka 5)
Observe that these two groups must have the same size (because a is odd and the range is inclusive). To count how many 1 bits there are in each group, first count all but the last bits, then count the last bits.
All but the last bits looks like this:
00
01
10
...and it looks like this for both groups. The number of 1 bits here is just solve(a/2). (In this example, it is the number of 1 bits from 0 to 2. Also, recall that integer division in C/C++ rounds down.)
The last bit is zero for every number in the first group and one for every number in the second group, so those last bits contribute (a+1)/2 one bits to the total.
So the third case of the recursion is (a+1)/2 + 2*solve(a/2), with appropriate casts to long long to handle the case where a is INT_MAX (and thus a+1 overflows).
This is an O(log N) solution. To generalize it to solve(a,b), you just compute solve(b) - solve(a), plus the appropriate logic for worrying about negative numbers. That is what the two-argument solve(int a, int b) is doing.

Cast the array into a series of integers. Then for each integer do:
int NumberOfSetBits(int i)
{
i = i - ((i >> 1) & 0x55555555);
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
Also this is portable, unlike __builtin_popcount
See here: How to count the number of set bits in a 32-bit integer?

when a is positive, the better explanation was already been posted.
If a is negative, then on a 32-bit system each negative number between a and zero will have 32 1's bits less the number of bits in the range from 0 to the binary representation of positive a.
So, in a better way,
long long solve(int a) {
if (a >= 0){
if (a == 0) return 0;
else if ((a %2) == 0) return solve(a - 1) + noOfSetBits(a);
else return (2 * solve( a / 2)) + ((long long)a + 1) / 2;
}else {
a++;
return ((long long)(-a) + 1) * 32 - solve(-a);
}
}

In the following code, the bitsum of x is defined as the count of 1 bits in the two's complement representation of the numbers between 0 and x (inclusive), where Integer.MIN_VALUE <= x <= Integer.MAX_VALUE.
For example:
bitsum(0) is 0
bitsum(1) is 1
bitsum(2) is 1
bitsum(3) is 4
..etc
10987654321098765432109876543210 i % 10 for 0 <= i <= 31
00000000000000000000000000000000 0
00000000000000000000000000000001 1
00000000000000000000000000000010 2
00000000000000000000000000000011 3
00000000000000000000000000000100 4
00000000000000000000000000000101 ...
00000000000000000000000000000110
00000000000000000000000000000111 (2^i)-1
00000000000000000000000000001000 2^i
00000000000000000000000000001001 (2^i)+1
00000000000000000000000000001010 ...
00000000000000000000000000001011 x, 011 = x & (2^i)-1 = 3
00000000000000000000000000001100
00000000000000000000000000001101
00000000000000000000000000001110
00000000000000000000000000001111
00000000000000000000000000010000
00000000000000000000000000010001
00000000000000000000000000010010 18
...
01111111111111111111111111111111 Integer.MAX_VALUE
The formula of the bitsum is:
bitsum(x) = bitsum((2^i)-1) + 1 + x - 2^i + bitsum(x & (2^i)-1 )
Note that x - 2^i = x & (2^i)-1
Negative numbers are handled slightly differently than positive numbers. In this case the number of zeros is subtracted from the total number of bits:
Integer.MIN_VALUE <= x < -1
Total number of bits: 32 * -x.
The number of zeros in a negative number x is equal to the number of ones in -x - 1.
public class TwosComplement {
//t[i] is the bitsum of (2^i)-1 for i in 0 to 31.
private static long[] t = new long[32];
static {
t[0] = 0;
t[1] = 1;
int p = 2;
for (int i = 2; i < 32; i++) {
t[i] = 2*t[i-1] + p;
p = p << 1;
}
}
//count the bits between x and y inclusive
public static long bitsum(int x, int y) {
if (y > x && x > 0) {
return bitsum(y) - bitsum(x-1);
}
else if (y >= 0 && x == 0) {
return bitsum(y);
}
else if (y == x) {
return Integer.bitCount(y);
}
else if (x < 0 && y == 0) {
return bitsum(x);
} else if (x < 0 && x < y && y < 0 ) {
return bitsum(x) - bitsum(y+1);
} else if (x < 0 && x < y && 0 < y) {
return bitsum(x) + bitsum(y);
}
throw new RuntimeException(x + " " + y);
}
//count the bits between 0 and x
public static long bitsum(int x) {
if (x == 0) return 0;
if (x < 0) {
if (x == -1) {
return 32;
} else {
long y = -(long)x;
return 32 * y - bitsum((int)(y - 1));
}
} else {
int n = x;
int sum = 0; //x & (2^i)-1
int j = 0;
int i = 1; //i = 2^j
int lsb = n & 1; //least significant bit
n = n >>> 1;
while (n != 0) {
sum += lsb * i;
lsb = n & 1;
n = n >>> 1;
i = i << 1;
j++;
}
long tot = t[j] + 1 + sum + bitsum(sum);
return tot;
}
}
}

Related

Count number of 1 digits in 11 to the power of N

I came across an interesting problem:
How would you count the number of 1 digits in the representation of 11 to the power of N, 0<N<=1000.
Let d be the number of 1 digits
N=2 11^2 = 121 d=2
N=3 11^3 = 1331 d=2
Worst time complexity expected O(N^2)
The simple approach where you compute the number and count the number of 1 digits my getting the last digit and dividing by 10, does not work very well. 11^1000 is not even representable in any standard data type.
Powers of eleven can be stored as a string and calculated quite quickly that way, without a generalised arbitrary precision math package. All you need is multiply by ten and add.
For example, 111 is 11. To get the next power of 11 (112), you multiply by (10 + 1), which is effectively the number with a zero tacked the end, added to the number: 110 + 11 = 121.
Similarly, 113 can then be calculated as: 1210 + 121 = 1331.
And so on:
11^2 11^3 11^4 11^5 11^6
110 1210 13310 146410 1610510
+11 +121 +1331 +14641 +161051
--- ---- ----- ------ -------
121 1331 14641 161051 1771561
So that's how I'd approach, at least initially.
By way of example, here's a Python function to raise 11 to the n'th power, using the method described (I am aware that Python has support for arbitrary precision, keep in mind I'm just using it as a demonstration on how to do this an an algorithm, which is how the question was tagged):
def elevenToPowerOf(n):
# Anything to the zero is 1.
if n == 0: return "1"
# Otherwise, n <- n * 10 + n, once for each level of power.
num = "11"
while n > 1:
n = n - 1
# Make multiply by eleven easy.
ten = num + "0"
num = "0" + num
# Standard primary school algorithm for adding.
newnum = ""
carry = 0
for dgt in range(len(ten)-1,-1,-1):
res = int(ten[dgt]) + int(num[dgt]) + carry
carry = res // 10
res = res % 10
newnum = str(res) + newnum
if carry == 1:
newnum = "1" + newnum
# Prepare for next multiplication.
num = newnum
# There you go, 11^n as a string.
return num
And, for testing, a little program which works out those values for each power that you provide on the command line:
import sys
for idx in range(1,len(sys.argv)):
try:
power = int(sys.argv[idx])
except (e):
print("Invalid number [%s]" % (sys.argv[idx]))
sys.exit(1)
if power < 0:
print("Negative powers not allowed [%d]" % (power))
sys.exit(1)
number = elevenToPowerOf(power)
count = 0
for ch in number:
if ch == '1':
count += 1
print("11^%d is %s, has %d ones" % (power,number,count))
When you run that with:
time python3 prog.py 0 1 2 3 4 5 6 7 8 9 10 11 12 1000
you can see that it's both accurate (checked with bc) and fast (finished in about half a second):
11^0 is 1, has 1 ones
11^1 is 11, has 2 ones
11^2 is 121, has 2 ones
11^3 is 1331, has 2 ones
11^4 is 14641, has 2 ones
11^5 is 161051, has 3 ones
11^6 is 1771561, has 3 ones
11^7 is 19487171, has 3 ones
11^8 is 214358881, has 2 ones
11^9 is 2357947691, has 1 ones
11^10 is 25937424601, has 1 ones
11^11 is 285311670611, has 4 ones
11^12 is 3138428376721, has 2 ones
11^1000 is 2469932918005826334124088385085221477709733385238396234869182951830739390375433175367866116456946191973803561189036523363533798726571008961243792655536655282201820357872673322901148243453211756020067624545609411212063417307681204817377763465511222635167942816318177424600927358163388910854695041070577642045540560963004207926938348086979035423732739933235077042750354729095729602516751896320598857608367865475244863114521391548985943858154775884418927768284663678512441565517194156946312753546771163991252528017732162399536497445066348868438762510366191040118080751580689254476068034620047646422315123643119627205531371694188794408120267120500325775293645416335230014278578281272863450085145349124727476223298887655183167465713337723258182649072572861625150703747030550736347589416285606367521524529665763903537989935510874657420361426804068643262800901916285076966174176854351055183740078763891951775452021781225066361670593917001215032839838911476044840388663443684517735022039957481918726697789827894303408292584258328090724141496484460001, has 105 ones
real 0m0.609s
user 0m0.592s
sys 0m0.012s
That may not necessarily be O(n2) but it should be fast enough for your domain constraints.
Of course, given those constraints, you can make it O(1) by using a method I call pre-generation. Simply write a program to generate an array you can plug into your program which contains a suitable function. The following Python program does exactly that, for the powers of eleven from 1 to 100 inclusive:
def mulBy11(num):
# Same length to ease addition.
ten = num + '0'
num = '0' + num
# Standard primary school algorithm for adding.
result = ''
carry = 0
for idx in range(len(ten)-1, -1, -1):
digit = int(ten[idx]) + int(num[idx]) + carry
carry = digit // 10
digit = digit % 10
result = str(digit) + result
if carry == 1:
result = '1' + result
return result
num = '1'
print('int oneCountInPowerOf11(int n) {')
print(' static int numOnes[] = {-1', end='')
for power in range(1,101):
num = mulBy11(num)
count = sum(1 for ch in num if ch == '1')
print(',%d' % count, end='')
print('};')
print(' if ((n < 0) || (n > sizeof(numOnes) / sizeof(*numOnes)))')
print(' return -1;')
print(' return numOnes[n];')
print('}')
The code output by this script is:
int oneCountInPowerOf11(int n) {
static int numOnes[] = {-1,2,2,2,2,3,3,3,2,1,1,4,2,3,1,4,2,1,4,4,1,5,5,1,5,3,6,6,3,6,3,7,5,7,4,4,2,3,4,4,3,8,4,8,5,5,7,7,7,6,6,9,9,7,12,10,8,6,11,7,6,5,5,7,10,2,8,4,6,8,5,9,13,14,8,10,8,7,11,10,9,8,7,13,8,9,6,8,5,8,7,15,12,9,10,10,12,13,7,11,12};
if ((n < 0) || (n > sizeof(numOnes) / sizeof(*numOnes)))
return -1;
return numOnes[n];
}
which should be blindingly fast when plugged into a C program. On my system, the Python code itself (when you up the range to 1..1000) runs in about 0.6 seconds and the C code, when compiled, finds the number of ones in 111000 in 0.07 seconds.
Here's my concise solution.
def count1s(N):
# When 11^(N-1) = result, 11^(N) = (10+1) * result = 10*result + result
result = 1
for i in range(N):
result += 10*result
# Now count 1's
count = 0
for ch in str(result):
if ch == '1':
count += 1
return count
En c#:
private static void Main(string[] args)
{
var res = Elevento(1000);
var countOf1 = res.Select(x => int.Parse(x.ToString())).Count(s => s == 1);
Console.WriteLine(countOf1);
}
private static string Elevento(int n)
{
if (n == 0) return "1";
//Otherwise, n <- n * 10 + n, once for each level of power.
var num = "11";
while (n > 1)
{
n--;
// Make multiply by eleven easy.
var ten = num + "0";
num = "0" + num;
//Standard primary school algorithm for adding.
var newnum = "";
var carry = 0;
foreach (var dgt in Enumerable.Range(0, ten.Length).Reverse())
{
var res = int.Parse(ten[dgt].ToString()) + int.Parse(num[dgt].ToString()) + carry;
carry = res/10;
res = res%10;
newnum = res + newnum;
}
if (carry == 1)
newnum = "1" + newnum;
// Prepare for next multiplication.
num = newnum;
}
//There you go, 11^n as a string.
return num;
}

Display all the possible numbers having its digits in ascending order

Write a program that can display all the possible numbers in between given two numbers, having its digits in ascending order.
For Example:-
Input: 5000 to 6000
Output: 5678 5679 5689 5789
Input: 90 to 124
Output: 123 124
Brute force approach can make it count to all numbers and check of digits for each one of them. But I want approaches that can skip some numbers and can bring complexity lesser than O(n). Do any such solution(s) exists that can give better approach for this problem?
I offer a solution in Python. It is efficient as it considers only the relevant numbers. The basic idea is to count upwards, but handle overflow somewhat differently. While we normally set overflowing digits to 0, here we set them to the previous digit +1. Please check the inline comments for further details. You can play with it here: http://ideone.com/ePvVsQ
def ascending( na, nb ):
assert nb>=na
# split each number into a list of digits
a = list( int(x) for x in str(na))
b = list( int(x) for x in str(nb))
d = len(b) - len(a)
# if both numbers have different length add leading zeros
if d>0:
a = [0]*d + a # add leading zeros
assert len(a) == len(b)
n = len(a)
# check if the initial value has increasing digits as required,
# and fix if necessary
for x in range(d+1, n):
if a[x] <= a[x-1]:
for y in range(x, n):
a[y] = a[y-1] + 1
break
res = [] # result set
while a<=b:
# if we found a value and add it to the result list
# turn the list of digits back into an integer
if max(a) < 10:
res.append( int( ''.join( str(k) for k in a ) ) )
# in order to increase the number we look for the
# least significant digit that can be increased
for x in range( n-1, -1, -1): # count down from n-1 to 0
if a[x] < 10+x-n:
break
# digit x is to be increased
a[x] += 1
# all subsequent digits must be increased accordingly
for y in range( x+1, n ):
a[y] = a[y-1] + 1
return res
print( ascending( 5000, 9000 ) )
Sounds like task from Project Euler. Here is the solution in C++. It is not short, but it is straightforward and effective. Oh, and hey, it uses backtracking.
// Higher order digits at the back
typedef std::vector<int> Digits;
// Extract decimal digits of a number
Digits ExtractDigits(int n)
{
Digits digits;
while (n > 0)
{
digits.push_back(n % 10);
n /= 10;
}
if (digits.empty())
{
digits.push_back(0);
}
return digits;
}
// Main function
void PrintNumsRec(
const Digits& minDigits, // digits of the min value
const Digits& maxDigits, // digits of the max value
Digits& digits, // digits of current value
int pos, // current digits with index greater than pos are already filled
bool minEq, // currently filled digits are the same as of min value
bool maxEq) // currently filled digits are the same as of max value
{
if (pos < 0)
{
// Print current value. Handle leading zeros by yourself, if need
for (auto pDigit = digits.rbegin(); pDigit != digits.rend(); ++pDigit)
{
if (*pDigit >= 0)
{
std::cout << *pDigit;
}
}
std::cout << std::endl;
return;
}
// Compute iteration boundaries for current position
int first = minEq ? minDigits[pos] : 0;
int last = maxEq ? maxDigits[pos] : 9;
// The last filled digit
int prev = digits[pos + 1];
// Make sure generated number has increasing digits
int firstInc = std::max(first, prev + 1);
// Iterate through possible cases for current digit
for (int d = firstInc; d <= last; ++d)
{
digits[pos] = d;
if (d == 0 && prev == -1)
{
// Mark leading zeros with -1
digits[pos] = -1;
}
PrintNumsRec(minDigits, maxDigits, digits, pos - 1, minEq && (d == first), maxEq && (d == last));
}
}
// High-level function
void PrintNums(int min, int max)
{
auto minDigits = ExtractDigits(min);
auto maxDigits = ExtractDigits(max);
// Make digits array of the same size
while (minDigits.size() < maxDigits.size())
{
minDigits.push_back(0);
}
Digits digits(minDigits.size());
int pos = digits.size() - 1;
// Placeholder for leading zero
digits.push_back(-1);
PrintNumsRec(minDigits, maxDigits, digits, pos, true, true);
}
void main()
{
PrintNums(53, 297);
}
It uses recursion to handle arbitrary amount of digits, but it is essentially the same as the nested loops approach. Here is the output for (53, 297):
056
057
058
059
067
068
069
078
079
089
123
124
125
126
127
128
129
134
135
136
137
138
139
145
146
147
148
149
156
157
158
159
167
168
169
178
179
189
234
235
236
237
238
239
245
246
247
248
249
256
257
258
259
267
268
269
278
279
289
Much more interesting problem would be to count all these numbers without explicitly computing it. One would use dynamic programming for that.
There is only a very limited number of numbers which can match your definition (with 9 digits max) and these can be generated very fast. But if you really need speed, just cache the tree or the generated list and do a lookup when you need your result.
using System;
using System.Collections.Generic;
namespace so_ascending_digits
{
class Program
{
class Node
{
int digit;
int value;
List<Node> children;
public Node(int val = 0, int dig = 0)
{
digit = dig;
value = (val * 10) + digit;
children = new List<Node>();
for (int i = digit + 1; i < 10; i++)
{
children.Add(new Node(value, i));
}
}
public void Collect(ref List<int> collection, int min = 0, int max = Int16.MaxValue)
{
if ((value >= min) && (value <= max)) collection.Add(value);
foreach (Node n in children) if (value * 10 < max) n.Collect(ref collection, min, max);
}
}
static void Main(string[] args)
{
Node root = new Node();
List<int> numbers = new List<int>();
root.Collect(ref numbers, 5000, 6000);
numbers.Sort();
Console.WriteLine(String.Join("\n", numbers));
}
}
}
Why the brute force algorithm may be very inefficient.
One efficient way of encoding the input is to provide two numbers: the lower end of the range, a, and the number of values in the range, b-a-1. This can be encoded in O(lg a + lg (b - a)) bits, since the number of bits needed to represent a number in base-2 is roughly equal to the base-2 logarithm of the number. We can simplify this to O(lg b), because intuitively if b - a is small, then a = O(b), and if b - a is large, then b - a = O(b). Either way, the total input size is O(2 lg b) = O(lg b).
Now the brute force algorithm just checks each number from a to b, and outputs the numbers whose digits in base 10 are in increasing order. There are b - a + 1 possible numbers in that range. However, when you represent this in terms of the input size, you find that b - a + 1 = 2lg (b - a + 1) = 2O(lg b) for a large enough interval.
This means that for an input size n = O(lg b), you may need to check in the worst case O(2 n) values.
A better algorithm
Instead of checking every possible number in the interval, you can simply generate the valid numbers directly. Here's a rough overview of how. A number n can be thought of as a sequence of digits n1 ... nk, where k is again roughly log10 n.
For a and a four-digit number b, the iteration would look something like
for w in a1 .. 9:
for x in w+1 .. 9:
for y in x+1 .. 9:
for x in y+1 .. 9:
m = 1000 * w + 100 * x + 10 * y + w
if m < a:
next
if m > b:
exit
output w ++ x ++ y ++ z (++ is just string concatenation)
where a1 can be considered 0 if a has fewer digits than b.
For larger numbers, you can imagine just adding more nested for loops. In general, if b has d digits, you need d = O(lg b) loops, each of which iterates at most 10 times. The running time is thus O(10 lg b) = O(lg b) , which is a far better than the O(2lg b) running time you get by checking if every number is sorted or not.
One other detail that I have glossed over, which actually does affect the running time. As written, the algorithm needs to consider the time it takes to generate m. Without going into the details, you could assume that this adds at worst a factor of O(lg b) to the running time, resulting in an O(lg2 b) algorithm. However, using a little extra space at the top of each for loop to store partial products would save lots of redundant multiplication, allowing us to preserve the originally stated O(lg b) running time.
One way (pseudo-code):
for (digit3 = '5'; digit3 <= '6'; digit3++)
for (digit2 = digit3+1; digit2 <= '9'; digit2++)
for (digit1 = digit2+1; digit1 <= '9'; digit1++)
for (digit0 = digit1+1; digit0 <= '9'; digit0++)
output = digit3 + digit2 + digit1 + digit0; // concatenation

Caculating total combinations

I don't know how to go about this programming problem.
Given two integers n and m, how many numbers exist such that all numbers have all digits from 0 to n-1 and the difference between two adjacent digits is exactly 1 and the number of digits in the number is atmost 'm'.
What is the best way to solve this problem? Is there a direct mathematical formula?
Edit: The number cannot start with 0.
Example:
for n = 3 and m = 6 there are 18 such numbers (210, 2101, 21012, 210121 ... etc)
Update (some people have encountered an ambiguity):
All digits from 0 to n-1 must be present.
This Python code computes the answer in O(nm) by keeping track of the numbers ending with a particular digit.
Different arrays (A,B,C,D) are used to track numbers that have hit the maximum or minimum of the range.
n=3
m=6
A=[1]*n # Number of ways of being at digit i and never being to min or max
B=[0]*n # number of ways with minimum being observed
C=[0]*n # number of ways with maximum being observed
D=[0]*n # number of ways with both being observed
A[0]=0 # Cannot start with 0
A[n-1]=0 # Have seen max so this 1 moves from A to C
C[n-1]=1 # Have seen max if start with highest digit
t=0
for k in range(m-1):
A2=[0]*n
B2=[0]*n
C2=[0]*n
D2=[0]*n
for i in range(1,n-1):
A2[i]=A[i+1]+A[i-1]
B2[i]=B[i+1]+B[i-1]
C2[i]=C[i+1]+C[i-1]
D2[i]=D[i+1]+D[i-1]
B2[0]=A[1]+B[1]
C2[n-1]=A[n-2]+C[n-2]
D2[0]=C[1]+D[1]
D2[n-1]=B[n-2]+D[n-2]
A=A2
B=B2
C=C2
D=D2
x=sum(d for d in D2)
t+=x
print t
After doing some more research, I think there may actually be a mathematical approach after all, although the math is advanced for me. Douglas S. Stones pointed me in the direction of Joseph Myers' (2008) article, BMO 2008–2009 Round 1 Problem 1—Generalisation, which derives formulas for calculating the number of zig-zag paths across a rectangular board.
As I understand it, in Anirudh's example, our board would have 6 rows of length 3 (I believe this would mean n=3 and r=6 in the article's terms). We can visualize our board so:
0 1 2 example zig-zag path: 0
0 1 2 1
0 1 2 0
0 1 2 1
0 1 2 2
0 1 2 1
Since Myers' formula m(n,r) would generate the number for all the zig-zag paths, that is, the number of all 6-digit numbers where all adjacent digits are consecutive and digits are chosen from (0,1,2), we would still need to determine and subtract those that begin with zero and those that do not include all digits.
If I understand correctly, we may do this in the following way for our example, although generalizing the concept to arbitrary m and n may prove more complicated:
Let m(3,6) equal the number of 6-digit numbers where all adjacent digits
are consecutive and digits are chosen from (0,1,2). According to Myers,
m(3,r) is given by formula and also equals OEIS sequence A029744 at
index r+2, so we have
m(3,6) = 16
How many of these numbers start with zero? Myers describes c(n,r) as the
number of zig-zag paths whose colour is that of the square in the top
right corner of the board. In our case, c(3,6) would include the total
for starting-digit 0 as well as starting-digit 2. He gives c(3,2r) as 2^r,
so we have
c(3,6) = 8. For starting-digit 0 only, we divide by two to get 4.
Now we need to obtain only those numbers that include all the digits in
the range, but how? We can do this be subtracting m(n-1,r) from m(n,r).
In our case, we have all the m(2,6) that would include only 0's and 1's,
and all the m(2,6) that would include 1's and 2's. Myers gives
m(2,anything) as 2, so we have
2*m(2,6) = 2*2 = 4
But we must remember that one of the zero-starting numbers is included
in our total for 2*m(2,6), namely 010101. So all together we have
m(3,6) - c(3,6)/2 - 4 + 1
= 16 - 4 - 4 + 1
= 9
To complete our example, we must follow a similar process for m(3,5),
m(3,4) and m(3,3). Since it's late here, I might follow up tomorrow...
One approach could be to program it recursively, calling the function to add as well as subtract from the last digit.
Haskell code:
import Data.List (sort,nub)
f n m = concatMap (combs n) [n..m]
combs n m = concatMap (\x -> combs' 1 [x]) [1..n - 1] where
combs' count result
| count == m = if test then [concatMap show result] else []
| otherwise = combs' (count + 1) (result ++ [r + 1])
++ combs' (count + 1) (result ++ [r - 1])
where r = last result
test = (nub . sort $ result) == [0..n - 1]
Output:
*Main> f 3 6
["210","1210","1012","2101","12101","10121","21210","21012"
,"21010","121210","121012","121010","101212","101210","101012"
,"212101","210121","210101"]
In response to Anirudh Rayabharam's comment, I hope the following code will be more 'pseudocode' like. When the total number of digits reaches m, the function g outputs 1 if the solution has hashed all [0..n-1], and 0 if not. The function f accumulates the results for g for starting digits [1..n-1] and total number of digits [n..m].
Haskell code:
import qualified Data.Set as S
g :: Int -> Int -> Int -> Int -> (S.Set Int, Int) -> Int
g n m digitCount lastDigit (hash,hashCount)
| digitCount == m = if test then 1 else 0
| otherwise =
if lastDigit == 0
then g n m d' (lastDigit + 1) (hash'',hashCount')
else if lastDigit == n - 1
then g n m d' (lastDigit - 1) (hash'',hashCount')
else g n m d' (lastDigit + 1) (hash'',hashCount')
+ g n m d' (lastDigit - 1) (hash'',hashCount')
where test = hashCount' == n
d' = digitCount + 1
hash'' = if test then S.empty else hash'
(hash',hashCount')
| hashCount == n = (S.empty,hashCount)
| S.member lastDigit hash = (hash,hashCount)
| otherwise = (S.insert lastDigit hash,hashCount + 1)
f n m = foldr forEachNumDigits 0 [n..m] where
forEachNumDigits numDigits accumulator =
accumulator + foldr forEachStartingDigit 0 [1..n - 1] where
forEachStartingDigit startingDigit accumulator' =
accumulator' + g n numDigits 1 startingDigit (S.empty,0)
Output:
*Main> f 3 6
18
(0.01 secs, 571980 bytes)
*Main> f 4 20
62784
(1.23 secs, 97795656 bytes)
*Main> f 4 25
762465
(11.73 secs, 1068373268 bytes)
model your problem as 2 superimposed lattices in 2 dimensions, specifically as pairs (i,j) interconnected with oriented edges ((i0,j0),(i1,j1)) where i1 = i0 + 1, |j1 - j0| = 1, modified as follows:
dropping all pairs (i,j) with j > 9 and its incident edges
dropping all pairs (i,j) with i > m-1 and its incident edges
dropping edge ((0,0), (1,1))
this construction results in a structure like in this diagram:
:
the requested numbers map to paths in the lattice starting at one of the green elements ((0,j), j=1..min(n-1,9)) that contain at least one pink and one red element ((i,0), i=1..m-1, (i,n-1), i=0..m-1 ). to see this, identify the i-th digit j of a given number with point (i,j). including pink and red elements ('extremal digits') guarantee that all available diguts are represented in the number.
Analysis
for convenience, let q1, q2 denote the position-1.
let q1 be the position of a number's first digit being either 0 or min(n-1,9).
let q2 be the position of a number's first 0 if the digit at position q1 is min(n-1,9) and vv.
case 1: first extremal digit is 0
the number of valid prefixes containing no 0 can be expressed as sum_{k=1..min(n-1,9)} (paths_to_0(k,1,q1), the function paths_to_0 being recursively defined as
paths_to_0(0,q1-1,q1) = 0;
paths_to_0(1,q1-1,q1) = 1;
paths_to_0(digit,i,q1) = 0; if q1-i < digit;
paths_to_0(x,_,_) = 0; if x >= min(n-1,9)
// x=min(n-1,9) mustn't occur before position q2,
// x > min(n-1,9) not at all
paths_to_0(x,_,_) = 0; if x <= 0;
// x=0 mustn't occur before position q1,
// x < 0 not at all
and else paths_to_0(digit,i,q1) =
paths_to_0(digit+1,i+1,q1) + paths_to_0(digit-1,i+1,q1);
similarly we have
paths_to_max(min(n-1,9),q2-1,q2) = 0;
paths_to_max(min(n-2,8),q2-1,q2) = 1;
paths_to_max(digit,i,q2) = 0 if q2-i < n-1;
paths_to_max(x,_,_) = 0; if x >= min(n-1,9)
// x=min(n-1,9) mustn't occur before
// position q2,
// x > min(n-1,9) not at all
paths_to_max(x,_,_) = 0; if x < 0;
and else paths_to_max(digit,q1,q2) =
paths_max(digit+1,q1+1,q2) + paths_to_max(digit-1,q1+1,q2);
and finally
paths_suffix(digit,length-1,length) = 2; if digit > 0 and digit < min(n-1,9)
paths_suffix(digit,length-1,length) = 1; if digit = 0 or digit = min(n-1,9)
paths_suffix(digit,k,length) = 0; if length > m-1
or length < q2
or k > length
paths_suffix(digit,k,0) = 1; // the empty path
and else paths_suffix(digit,k,length) =
paths_suffix(digit+1,k+1,length) + paths_suffix(digit-1,k+1,length);
... for a grand total of
number_count_case_1(n, m) =
sum_{first=1..min(n-1,9), q1=1..m-1-(n-1), q2=q1..m-1, l_suffix=0..m-1-q2} (
paths_to_0(first,1,q1)
+ paths_to_max(0,q1,q2)
+ paths_suffix(min(n-1,9),q2,l_suffix+q2)
)
case 2: first extremal digit is min(n-1,9)
case 2.1: initial digit is not min(n-1,9)
this is symmetrical to case 1 with all digits d replaced by min(n,10) - d. as the lattice structure is symmetrical, this means number_count_case_2_1 = number_count_case_1.
case 2.2: initial digit is min(n-1,9)
note that q1 is 1 and the second digit must be min(n-2,8).
thus
number_count_case_2_2 (n, m) =
sum_{q2=1..m-2, l_suffix=0..m-2-q2} (
paths_to_max(1,1,q2)
+ paths_suffix(min(n-1,9),q2,l_suffix+q2)
)
so the grand grand total will be
number_count ( n, m ) = 2 * number_count_case_1 (n, m) + number_count_case_2_2 (n, m);
Code
i don't know whether a closed expression for number_count exists, but the following perl code will compute it (the code is but a proof of concept as it does not use memoization techniques to avoid recomputing results already obtained):
use strict;
use warnings;
my ($n, $m) = ( 5, 7 ); # for example
$n = ($n > 10) ? 10 : $n; # cutoff
sub min
sub paths_to_0 ($$$) {
my (
$d
, $at
, $until
) = #_;
#
if (($d == 0) && ($at == $until - 1)) { return 0; }
if (($d == 1) && ($at == $until - 1)) { return 1; }
if ($until - $at < $d) { return 0; }
if (($d <= 0) || ($d >= $n))) { return 0; }
return paths_to_0($d+1, $at+1, $until) + paths_to_0($d-1, $at+1, $until);
} # paths_to_0
sub paths_to_max ($$$) {
my (
$d
, $at
, $until
) = #_;
#
if (($d == $n-1) && ($at == $until - 1)) { return 0; }
if (($d == $n-2) && ($at == $until - 1)) { return 1; }
if ($until - $at < $n-1) { return 0; }
if (($d < 0) || ($d >= $n-1)) { return 0; }
return paths_to_max($d+1, $at+1, $until) + paths_to_max($d-1, $at+1, $until);
} # paths_to_max
sub paths_suffix ($$$) {
my (
$d
, $at
, $until
) = #_;
#
if (($d < $n-1) && ($d > 0) && ($at == $until - 1)) { return 2; }
if ((($d == $n-1) && ($d == 0)) && ($at == $until - 1)) { return 1; }
if (($until > $m-1) || ($at > $until)) { return 0; }
if ($until == 0) { return 1; }
return paths_suffix($d+1, $at+1, $until) + paths_suffix($d-1, $at+1, $until);
} # paths_suffix
#
# main
#
number_count =
sum_{first=1..min(n-1,9), q1=1..m-1-(n-1), q2=q1..m-1, l_suffix=0..m-1-q2} (
paths_to_0(first,1,q1)
+ paths_to_max(0,q1,q2)
+ paths_suffix(min(n-1,9),q2,l_suffix+q2)
)
my ($number_count, $number_count_2_2) = (0, 0);
my ($first, $q1, i, $l_suffix);
for ($first = 1; $first <= $n-1; $first++) {
for ($q1 = 1; $q1 <= $m-1 - ($n-1); $q1++) {
for ($q2 = $q1; $q2 <= $m-1; $q2++) {
for ($l_suffix = 0; $l_suffix <= $m-1 - $q2; $l_suffix++) {
$number_count =
$number_count
+ paths_to_0($first,1,$q1)
+ paths_to_max(0,$q1,$q2)
+ paths_suffix($n-1,$q2,$l_suffix+$q2)
;
}
}
}
}
#
# case 2.2
#
for ($q2 = 1; $q2 <= $m-2; $q2++) {
for ($l_suffix = 0; $l_suffix <= $m-2 - $q2; $l_suffix++) {
$number_count_2_2 =
$number_count_2_2
+ paths_to_max(1,1,$q2)
+ paths_suffix($n-1,$q2,$l_suffix+$q2)
;
}
}
$number_count = 2 * $number_count + number_count_2_2;

How to find the number of values in a given range divisible by a given value?

I have three numbers x, y , z.
For a range between numbers x and y.
How can i find the total numbers whose % with z is 0 i.e. how many numbers between x and y are divisible by z ?
It can be done in O(1): find the first one, find the last one, find the count of all other.
I'm assuming the range is inclusive. If your ranges are exclusive, adjust the bounds by one:
find the first value after x that is divisible by z. You can discard x:
x_mod = x % z;
if(x_mod != 0)
x += (z - x_mod);
find the last value before y that is divisible by y. You can discard y:
y -= y % z;
find the size of this range:
if(x > y)
return 0;
else
return (y - x) / z + 1;
If mathematical floor and ceil functions are available, the first two parts can be written more readably. Also the last part can be compressed using math functions:
x = ceil (x, z);
y = floor (y, z);
return max((y - x) / z + 1, 0);
if the input is guaranteed to be a valid range (x >= y), the last test or max is unneccessary:
x = ceil (x, z);
y = floor (y, z);
return (y - x) / z + 1;
(2017, answer rewritten thanks to comments)
The number of multiples of z in a number n is simply n / z
/ being the integer division, meaning decimals that could result from the division are simply ignored (for instance 17/5 => 3 and not 3.4).
Now, in a range from x to y, how many multiples of z are there?
Let see how many multiples m we have up to y
0----------------------------------x------------------------y
-m---m---m---m---m---m---m---m---m---m---m---m---m---m---m---
You see where I'm going... to get the number of multiples in the range [ x, y ], get the number of multiples of y then subtract the number of multiples before x, (x-1) / z
Solution: ( y / z ) - (( x - 1 ) / z )
Programmatically, you could make a function numberOfMultiples
function numberOfMultiples(n, z) {
return n / z;
}
to get the number of multiples in a range [x, y]
numberOfMultiples(y) - numberOfMultiples(x-1)
The function is O(1), there is no need of a loop to get the number of multiples.
Examples of results you should find
[30, 90] ÷ 13 => 4
[1, 1000] ÷ 6 => 166
[100, 1000000] ÷ 7 => 142843
[777, 777777777] ÷ 7 => 111111001
For the first example, 90 / 13 = 6, (30-1) / 13 = 2, and 6-2 = 4
---26---39---52---65---78---91--
^ ^
30<---(4 multiples)-->90
I also encountered this on Codility. It took me much longer than I'd like to admit to come up with a good solution, so I figured I would share what I think is an elegant solution!
Straightforward Approach 1/2:
O(N) time solution with a loop and counter, unrealistic when N = 2 billion.
Awesome Approach 3:
We want the number of digits in some range that are divisible by K.
Simple case: assume range [0 .. n*K], N = n*K
N/K represents the number of digits in [0,N) that are divisible by K, given N%K = 0 (aka. N is divisible by K)
ex. N = 9, K = 3, Num digits = |{0 3 6}| = 3 = 9/3
Similarly,
N/K + 1 represents the number of digits in [0,N] divisible by K
ex. N = 9, K = 3, Num digits = |{0 3 6 9}| = 4 = 9/3 + 1
I think really understanding the above fact is the trickiest part of this question, I cannot explain exactly why it works.
The rest boils down to prefix sums and handling special cases.
Now we don't always have a range that begins with 0, and we cannot assume the two bounds will be divisible by K.
But wait! We can fix this by calculating our own nice upper and lower bounds and using some subtraction magic :)
First find the closest upper and lower in the range [A,B] that are divisible by K.
Upper bound (easier): ex. B = 10, K = 3, new_B = 9... the pattern is B - B%K
Lower bound: ex. A = 10, K = 3, new_A = 12... try a few more and you will see the pattern is A - A%K + K
Then calculate the following using the above technique:
Determine the total number of digits X between [0,B] that are divisible by K
Determine the total number of digits Y between [0,A) that are divisible by K
Calculate the number of digits between [A,B] that are divisible by K in constant time by the expression X - Y
Website: https://codility.com/demo/take-sample-test/count_div/
class CountDiv {
public int solution(int A, int B, int K) {
int firstDivisible = A%K == 0 ? A : A + (K - A%K);
int lastDivisible = B%K == 0 ? B : B - B%K; //B/K behaves this way by default.
return (lastDivisible - firstDivisible)/K + 1;
}
}
This is my first time explaining an approach like this. Feedback is very much appreciated :)
This is one of the Codility Lesson 3 questions. For this question, the input is guaranteed to be in a valid range. I answered it using Javascript:
function solution(x, y, z) {
var totalDivisibles = Math.floor(y / z),
excludeDivisibles = Math.floor((x - 1) / z),
divisiblesInArray = totalDivisibles - excludeDivisibles;
return divisiblesInArray;
}
https://codility.com/demo/results/demoQX3MJC-8AP/
(I actually wanted to ask about some of the other comments on this page but I don't have enough rep points yet).
Divide y-x by z, rounding down. Add one if y%z < x%z or if x%z == 0.
No mathematical proof, unless someone cares to provide one, but test cases, in Perl:
#!perl
use strict;
use warnings;
use Test::More;
sub multiples_in_range {
my ($x, $y, $z) = #_;
return 0 if $x > $y;
my $ret = int( ($y - $x) / $z);
$ret++ if $y%$z < $x%$z or $x%$z == 0;
return $ret;
}
for my $z (2 .. 10) {
for my $x (0 .. 2*$z) {
for my $y (0 .. 4*$z) {
is multiples_in_range($x, $y, $z),
scalar(grep { $_ % $z == 0 } $x..$y),
"[$x..$y] mod $z";
}
}
}
done_testing;
Output:
$ prove divrange.pl
divrange.pl .. ok
All tests successful.
Files=1, Tests=3405, 0 wallclock secs ( 0.20 usr 0.02 sys + 0.26 cusr 0.01 csys = 0.49 CPU)
Result: PASS
Let [A;B] be an interval of positive integers including A and B such that 0 <= A <= B, K be the divisor.
It is easy to see that there are N(A) = ⌊A / K⌋ = floor(A / K) factors of K in interval [0;A]:
1K 2K 3K 4K 5K
●········x········x··●·····x········x········x···>
0 A
Similarly, there are N(B) = ⌊B / K⌋ = floor(B / K) factors of K in interval [0;B]:
1K 2K 3K 4K 5K
●········x········x········x········x···●····x···>
0 B
Then N = N(B) - N(A) equals to the number of K's (the number of integers divisible by K) in range (A;B]. The point A is not included, because the subtracted N(A) includes this point. Therefore, the result should be incremented by one, if A mod K is zero:
N := N(B) - N(A)
if (A mod K = 0)
N := N + 1
Implementation in PHP
function solution($A, $B, $K) {
if ($K < 1)
return 0;
$c = floor($B / $K) - floor($A / $K);
if ($A % $K == 0)
$c++;
return (int)$c;
}
In PHP, the effect of the floor function can be achieved by casting to the integer type:
$c = (int)($B / $K) - (int)($A / $K);
which, I think, is faster.
Here is my short and simple solution in C++ which got 100/100 on codility. :)
Runs in O(1) time. I hope its not difficult to understand.
int solution(int A, int B, int K) {
// write your code in C++11
int cnt=0;
if( A%K==0 or B%K==0)
cnt++;
if(A>=K)
cnt+= (B - A)/K;
else
cnt+=B/K;
return cnt;
}
(floor)(high/d) - (floor)(low/d) - (high%d==0)
Explanation:
There are a/d numbers divisible by d from 0.0 to a. (d!=0)
Therefore (floor)(high/d) - (floor)(low/d) will give numbers divisible in the range (low,high] (Note that low is excluded and high is included in this range)
Now to remove high from the range just subtract (high%d==0)
Works for integers, floats or whatever (Use fmodf function for floats)
Won't strive for an o(1) solution, this leave for more clever person:) Just feel this is a perfect usage scenario for function programming. Simple and straightforward.
> x,y,z=1,1000,6
=> [1, 1000, 6]
> (x..y).select {|n| n%z==0}.size
=> 166
EDIT: after reading other's O(1) solution. I feel shamed. Programming made people lazy to think...
Division (a/b=c) by definition - taking a set of size a and forming groups of size b. The number of groups of this size that can be formed, c, is the quotient of a and b. - is nothing more than the number of integers within range/interval ]0..a] (not including zero, but including a) that are divisible by b.
so by definition:
Y/Z - number of integers within ]0..Y] that are divisible by Z
and
X/Z - number of integers within ]0..X] that are divisible by Z
thus:
result = [Y/Z] - [X/Z] + x (where x = 1 if and only if X is divisible by Y otherwise 0 - assuming the given range [X..Y] includes X)
example :
for (6, 12, 2) we have 12/2 - 6/2 + 1 (as 6%2 == 0) = 6 - 3 + 1 = 4 // {6, 8, 10, 12}
for (5, 12, 2) we have 12/2 - 5/2 + 0 (as 5%2 != 0) = 6 - 2 + 0 = 4 // {6, 8, 10, 12}
The time complexity of the solution will be linear.
Code Snippet :
int countDiv(int a, int b, int m)
{
int mod = (min(a, b)%m==0);
int cnt = abs(floor(b/m) - floor(a/m)) + mod;
return cnt;
}
here n will give you count of number and will print sum of all numbers that are divisible by k
int a = sc.nextInt();
int b = sc.nextInt();
int k = sc.nextInt();
int first = 0;
if (a > k) {
first = a + a/k;
} else {
first = k;
}
int last = b - b%k;
if (first > last) {
System.out.println(0);
} else {
int n = (last - first)/k+1;
System.out.println(n * (first + last)/2);
}
Here is the solution to the problem written in Swift Programming Language.
Step 1: Find the first number in the range divisible by z.
Step 2: Find the last number in the range divisible by z.
Step 3: Use a mathematical formula to find the number of divisible numbers by z in the range.
func solution(_ x : Int, _ y : Int, _ z : Int) -> Int {
var numberOfDivisible = 0
var firstNumber: Int
var lastNumber: Int
if y == x {
return x % z == 0 ? 1 : 0
}
//Find first number divisible by z
let moduloX = x % z
if moduloX == 0 {
firstNumber = x
} else {
firstNumber = x + (z - moduloX)
}
//Fist last number divisible by z
let moduloY = y % z
if moduloY == 0 {
lastNumber = y
} else {
lastNumber = y - moduloY
}
//Math formula
numberOfDivisible = Int(floor(Double((lastNumber - firstNumber) / z))) + 1
return numberOfDivisible
}
public static int Solution(int A, int B, int K)
{
int count = 0;
//If A is divisible by K
if(A % K == 0)
{
count = (B / K) - (A / K) + 1;
}
//If A is not divisible by K
else if(A % K != 0)
{
count = (B / K) - (A / K);
}
return count;
}
This can be done in O(1).
Here you are a solution in C++.
auto first{ x % z == 0 ? x : x + z - x % z };
auto last{ y % z == 0 ? y : y - y % z };
auto ans{ (last - first) / z + 1 };
Where first is the first number that ∈ [x; y] and is divisible by z, last is the last number that ∈ [x; y] and is divisible by z and ans is the answer that you are looking for.

How to calculate the index (lexicographical order) when the combination is given

I know that there is an algorithm that permits, given a combination of number (no repetitions, no order), calculates the index of the lexicographic order.
It would be very useful for my application to speedup things...
For example:
combination(10, 5)
1 - 1 2 3 4 5
2 - 1 2 3 4 6
3 - 1 2 3 4 7
....
251 - 5 7 8 9 10
252 - 6 7 8 9 10
I need that the algorithm returns the index of the given combination.
es: index( 2, 5, 7, 8, 10 ) --> index
EDIT: actually I'm using a java application that generates all combinations C(53, 5) and inserts them into a TreeMap.
My idea is to create an array that contains all combinations (and related data) that I can index with this algorithm.
Everything is to speedup combination searching.
However I tried some (not all) of your solutions and the algorithms that you proposed are slower that a get() from TreeMap.
If it helps: my needs are for a combination of 5 from 53 starting from 0 to 52.
Thank you again to all :-)
Here is a snippet that will do the work.
#include <iostream>
int main()
{
const int n = 10;
const int k = 5;
int combination[k] = {2, 5, 7, 8, 10};
int index = 0;
int j = 0;
for (int i = 0; i != k; ++i)
{
for (++j; j != combination[i]; ++j)
{
index += c(n - j, k - i - 1);
}
}
std::cout << index + 1 << std::endl;
return 0;
}
It assumes you have a function
int c(int n, int k);
that will return the number of combinations of choosing k elements out of n elements.
The loop calculates the number of combinations preceding the given combination.
By adding one at the end we get the actual index.
For the given combination there are
c(9, 4) = 126 combinations containing 1 and hence preceding it in lexicographic order.
Of the combinations containing 2 as the smallest number there are
c(7, 3) = 35 combinations having 3 as the second smallest number
c(6, 3) = 20 combinations having 4 as the second smallest number
All of these are preceding the given combination.
Of the combinations containing 2 and 5 as the two smallest numbers there are
c(4, 2) = 6 combinations having 6 as the third smallest number.
All of these are preceding the given combination.
Etc.
If you put a print statement in the inner loop you will get the numbers
126, 35, 20, 6, 1.
Hope that explains the code.
Convert your number selections to a factorial base number. This number will be the index you want. Technically this calculates the lexicographical index of all permutations, but if you only give it combinations, the indexes will still be well ordered, just with some large gaps for all the permutations that come in between each combination.
Edit: pseudocode removed, it was incorrect, but the method above should work. Too tired to come up with correct pseudocode at the moment.
Edit 2: Here's an example. Say we were choosing a combination of 5 elements from a set of 10 elements, like in your example above. If the combination was 2 3 4 6 8, you would get the related factorial base number like so:
Take the unselected elements and count how many you have to pass by to get to the one you are selecting.
1 2 3 4 5 6 7 8 9 10
2 -> 1
1 3 4 5 6 7 8 9 10
3 -> 1
1 4 5 6 7 8 9 10
4 -> 1
1 5 6 7 8 9 10
6 -> 2
1 5 7 8 9 10
8 -> 3
So the index in factorial base is 1112300000
In decimal base, it's
1*9! + 1*8! + 1*7! + 2*6! + 3*5! = 410040
This is Algorithm 2.7 kSubsetLexRank on page 44 of Combinatorial Algorithms by Kreher and Stinson.
r = 0
t[0] = 0
for i from 1 to k
if t[i - 1] + 1 <= t[i] - 1
for j from t[i - 1] to t[i] - 1
r = r + choose(n - j, k - i)
return r
The array t holds your values, for example [5 7 8 9 10]. The function choose(n, k) calculates the number "n choose k". The result value r will be the index, 251 for the example. Other inputs are n and k, for the example they would be 10 and 5.
zero-base,
# v: array of length k consisting of numbers between 0 and n-1 (ascending)
def index_of_combination(n,k,v):
idx = 0
for p in range(k-1):
if p == 0: arrg = range(1,v[p]+1)
else: arrg = range(v[p-1]+2, v[p]+1)
for a in arrg:
idx += combi[n-a, k-1-p]
idx += v[k-1] - v[k-2] - 1
return idx
Null Set has the right approach. The index corresponds to the factorial-base number of the sequence. You build a factorial-base number just like any other base number, except that the base decreases for each digit.
Now, the value of each digit in the factorial-base number is the number of elements less than it that have not yet been used. So, for combination(10, 5):
(1 2 3 4 5) == 0*9!/5! + 0*8!/5! + 0*7!/5! + 0*6!/5! + 0*5!/5!
== 0*3024 + 0*336 + 0*42 + 0*6 + 0*1
== 0
(10 9 8 7 6) == 9*3024 + 8*336 + 7*42 + 6*6 + 5*1
== 30239
It should be pretty easy to calculate the index incrementally.
If you have a set of positive integers 0<=x_1 < x_2< ... < x_k , then you could use something called the squashed order:
I = sum(j=1..k) Choose(x_j,j)
The beauty of the squashed order is that it works independent of the largest value in the parent set.
The squashed order is not the order you are looking for, but it is related.
To use the squashed order to get the lexicographic order in the set of k-subsets of {1,...,n) is by taking
1 <= x1 < ... < x_k <=n
compute
0 <= n-x_k < n-x_(k-1) ... < n-x_1
Then compute the squashed order index of (n-x_k,...,n-k_1)
Then subtract the squashed order index from Choose(n,k) to get your result, which is the lexicographic index.
If you have relatively small values of n and k, you can cache all the values Choose(a,b) with a
See Anderson, Combinatorics on Finite Sets, pp 112-119
I needed also the same for a project of mine and the fastest solution I found was (Python):
import math
def nCr(n,r):
f = math.factorial
return f(n) / f(r) / f(n-r)
def index(comb,n,k):
r=nCr(n,k)
for i in range(k):
if n-comb[i]<k-i:continue
r=r-nCr(n-comb[i],k-i)
return r
My input "comb" contained elements in increasing order You can test the code with for example:
import itertools
k=3
t=[1,2,3,4,5]
for x in itertools.combinations(t, k):
print x,index(x,len(t),k)
It is not hard to prove that if comb=(a1,a2,a3...,ak) (in increasing order) then:
index=[nCk-(n-a1+1)Ck] + [(n-a1)C(k-1)-(n-a2+1)C(k-1)] + ... =
nCk -(n-a1)Ck -(n-a2)C(k-1) - .... -(n-ak)C1
There's another way to do all this. You could generate all possible combinations and write them into a binary file where each comb is represented by it's index starting from zero. Then, when you need to find an index, and the combination is given, you apply a binary search on the file. Here's the function. It's written in VB.NET 2010 for my lotto program, it works with Israel lottery system so there's a bonus (7th) number; just ignore it.
Public Function Comb2Index( _
ByVal gAr() As Byte) As UInt32
Dim mxPntr As UInt32 = WHL.AMT.WHL_SYS_00 '(16.273.488)
Dim mdPntr As UInt32 = mxPntr \ 2
Dim eqCntr As Byte
Dim rdAr() As Byte
modBinary.OpenFile(WHL.WHL_SYS_00, _
FileMode.Open, FileAccess.Read)
Do
modBinary.ReadBlock(mdPntr, rdAr)
RP: If eqCntr = 7 Then GoTo EX
If gAr(eqCntr) = rdAr(eqCntr) Then
eqCntr += 1
GoTo RP
ElseIf gAr(eqCntr) < rdAr(eqCntr) Then
If eqCntr > 0 Then eqCntr = 0
mxPntr = mdPntr
mdPntr \= 2
ElseIf gAr(eqCntr) > rdAr(eqCntr) Then
If eqCntr > 0 Then eqCntr = 0
mdPntr += (mxPntr - mdPntr) \ 2
End If
Loop Until eqCntr = 7
EX: modBinary.CloseFile()
Return mdPntr
End Function
P.S. It takes 5 to 10 mins to generate 16 million combs on a Core 2 Duo. To find the index using binary search on file takes 397 milliseconds on a SATA drive.
Assuming the maximum setSize is not too large, you can simply generate a lookup table, where the inputs are encoded this way:
int index(a,b,c,...)
{
int key = 0;
key |= 1<<a;
key |= 1<<b;
key |= 1<<c;
//repeat for all arguments
return Lookup[key];
}
To generate the lookup table, look at this "banker's order" algorithm. Generate all the combinations, and also store the base index for each nItems. (For the example on p6, this would be [0,1,5,11,15]). Note that by you storing the answers in the opposite order from the example (LSBs set first) you will only need one table, sized for the largest possible set.
Populate the lookup table by walking through the combinations doing Lookup[combination[i]]=i-baseIdx[nItems]
EDIT: Never mind. This is completely wrong.
Let your combination be (a1, a2, ..., ak-1, ak) where a1 < a2 < ... < ak. Let choose(a,b) = a!/(b!*(a-b)!) if a >= b and 0 otherwise. Then, the index you are looking for is
choose(ak-1, k) + choose(ak-1-1, k-1) + choose(ak-2-1, k-2) + ... + choose (a2-1, 2) + choose (a1-1, 1) + 1
The first term counts the number of k-element combinations such that the largest element is less than ak. The second term counts the number of (k-1)-element combinations such that the largest element is less than ak-1. And, so on.
Notice that the size of the universe of elements to be chosen from (10 in your example) does not play a role in the computation of the index. Can you see why?
Sample solution:
class Program
{
static void Main(string[] args)
{
// The input
var n = 5;
var t = new[] { 2, 4, 5 };
// Helping transformations
ComputeDistances(t);
CorrectDistances(t);
// The algorithm
var r = CalculateRank(t, n);
Console.WriteLine("n = 5");
Console.WriteLine("t = {2, 4, 5}");
Console.WriteLine("r = {0}", r);
Console.ReadKey();
}
static void ComputeDistances(int[] t)
{
var k = t.Length;
while (--k >= 0)
t[k] -= (k + 1);
}
static void CorrectDistances(int[] t)
{
var k = t.Length;
while (--k > 0)
t[k] -= t[k - 1];
}
static int CalculateRank(int[] t, int n)
{
int k = t.Length - 1, r = 0;
for (var i = 0; i < t.Length; i++)
{
if (t[i] == 0)
{
n--;
k--;
continue;
}
for (var j = 0; j < t[i]; j++)
{
n--;
r += CalculateBinomialCoefficient(n, k);
}
n--;
k--;
}
return r;
}
static int CalculateBinomialCoefficient(int n, int k)
{
int i, l = 1, m, x, y;
if (n - k < k)
{
x = k;
y = n - k;
}
else
{
x = n - k;
y = k;
}
for (i = x + 1; i <= n; i++)
l *= i;
m = CalculateFactorial(y);
return l/m;
}
static int CalculateFactorial(int n)
{
int i, w = 1;
for (i = 1; i <= n; i++)
w *= i;
return w;
}
}
The idea behind the scenes is to associate a k-subset with an operation of drawing k-elements from the n-size set. It is a combination, so the overall count of possible items will be (n k). It is a clue that we could seek the solution in Pascal Triangle. After a while of comparing manually written examples with the appropriate numbers from the Pascal Triangle, we will find the pattern and hence the algorithm.
I used user515430's answer and converted to python3. Also this supports non-continuous values so you could pass in [1,3,5,7,9] as your pool instead of range(1,11)
from itertools import combinations
from scipy.special import comb
from pandas import Index
debugcombinations = False
class IndexedCombination:
def __init__(self, _setsize, _poolvalues):
self.setsize = _setsize
self.poolvals = Index(_poolvalues)
self.poolsize = len(self.poolvals)
self.totalcombinations = 1
fast_k = min(self.setsize, self.poolsize - self.setsize)
for i in range(1, fast_k + 1):
self.totalcombinations = self.totalcombinations * (self.poolsize - fast_k + i) // i
#fill the nCr cache
self.choose_cache = {}
n = self.poolsize
k = self.setsize
for i in range(k + 1):
for j in range(n + 1):
if n - j >= k - i:
self.choose_cache[n - j,k - i] = comb(n - j,k - i, exact=True)
if debugcombinations:
print('testnth = ' + str(self.testnth()))
def get_nth_combination(self,index):
n = self.poolsize
r = self.setsize
c = self.totalcombinations
#if index < 0 or index >= c:
# raise IndexError
result = []
while r:
c, n, r = c*r//n, n-1, r-1
while index >= c:
index -= c
c, n = c*(n-r)//n, n-1
result.append(self.poolvals[-1 - n])
return tuple(result)
def get_n_from_combination(self,someset):
n = self.poolsize
k = self.setsize
index = 0
j = 0
for i in range(k):
setidx = self.poolvals.get_loc(someset[i])
for j in range(j + 1, setidx + 1):
index += self.choose_cache[n - j, k - i - 1]
j += 1
return index
#just used to test whether nth_combination from the internet actually works
def testnth(self):
n = 0
_setsize = self.setsize
mainset = self.poolvals
for someset in combinations(mainset, _setsize):
nthset = self.get_nth_combination(n)
n2 = self.get_n_from_combination(nthset)
if debugcombinations:
print(str(n) + ': ' + str(someset) + ' vs ' + str(n2) + ': ' + str(nthset))
if n != n2:
return False
for x in range(_setsize):
if someset[x] != nthset[x]:
return False
n += 1
return True
setcombination = IndexedCombination(5, list(range(1,10+1)))
print( str(setcombination.get_n_from_combination([2,5,7,8,10])))
returns 188

Resources