Overflow detection in unsigned division algorithm

I have a question about the 64/32-bit division algorithm as it appears in Hacker's Delight, Chapter 9-4 Unsigned Long Division, Figure 9-3, "divlu". Online it can be seen here, from where I copy-pasted it as follows:
unsigned divlu2(unsigned u1, unsigned u0, unsigned v,
unsigned *r) {
const unsigned b = 65536; // Number base (16 bits).
unsigned un1, un0, // Norm. dividend LSD's.
vn1, vn0, // Norm. divisor digits.
q1, q0, // Quotient digits.
un32, un21, un10,// Dividend digit pairs.
rhat; // A remainder.
int s; // Shift amount for norm.
if (u1 >= v) { // If overflow, set rem.
if (r != NULL) // to an impossible value,
*r = 0xFFFFFFFF; // and return the largest
return 0xFFFFFFFF;} // possible quotient.
s = nlz(v); // 0 <= s <= 31.
v = v << s; // Normalize divisor.
vn1 = v >> 16; // Break divisor up into
vn0 = v & 0xFFFF; // two 16-bit digits.
un32 = (u1 << s) | (u0 >> 32 - s) & (-s >> 31);
un10 = u0 << s; // Shift dividend left.
un1 = un10 >> 16; // Break right half of
un0 = un10 & 0xFFFF; // dividend into two digits.
q1 = un32/vn1; // Compute the first
rhat = un32 - q1*vn1; // quotient digit, q1.
again1:
if (q1 >= b || q1*vn0 > b*rhat + un1) {
q1 = q1 - 1;
rhat = rhat + vn1;
if (rhat < b) goto again1;}
un21 = un32*b + un1 - q1*v; // Multiply and subtract.
q0 = un21/vn1; // Compute the second
rhat = un21 - q0*vn1; // quotient digit, q0.
again2:
if (q0 >= b || q0*vn0 > b*rhat + un0) {
q0 = q0 - 1;
rhat = rhat + vn1;
if (rhat < b) goto again2;}
if (r != NULL) // If remainder is wanted,
*r = (un21*b + un0 - q0*v) >> s; // return it.
return q1*b + q0;
}
Specifically, I'm interested in the bounds of the variable un21. How large can it be? Somewhat surprisingly, it can be larger than v, but by how much?
In other words, under again2 there is the test q0 >= b. If I wanted to know whether the division (q0 = un21/vn1) eventually overflows, is it enough to test (un21 >> 16) == vn1, or does it have to read (un21 >> 16) >= vn1, instead of q0 >= b?
The idea is to know in advance, prior to calculating the quotient, whether the division overflows or not.
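One way to explore this question empirically is to model the routine in a language with unbounded integers and record un21 alongside the normalized divisor. The following is a minimal Python sketch of divlu2 under that assumption; the name divlu2_model, the extra return values, and the use of None for the overflow case are choices made for this illustration and are not part of the original C code.

```python
MASK32 = 0xFFFFFFFF
B = 1 << 16  # number base, as in the C code


def divlu2_model(u1, u0, v):
    """Model of divlu2 on 32-bit words.

    Returns (quotient, remainder, un21, normalized_v), or None when
    u1 >= v, i.e. when the quotient would not fit in 32 bits.
    """
    if u1 >= v:
        return None
    s = 32 - v.bit_length()              # nlz(v); v >= 1 because u1 < v
    vn = (v << s) & MASK32               # normalized divisor
    vn1, vn0 = vn >> 16, vn & 0xFFFF
    # The C code masks this term with (-s >> 31) so that s == 0 yields 0.
    hi_bits = (u0 >> (32 - s)) if s else 0
    un32 = ((u1 << s) | hi_bits) & MASK32
    un10 = (u0 << s) & MASK32
    un1, un0 = un10 >> 16, un10 & 0xFFFF

    q1, rhat = divmod(un32, vn1)         # first quotient digit estimate
    while q1 >= B or q1 * vn0 > B * rhat + un1:
        q1 -= 1
        rhat += vn1
        if rhat >= B:                    # mirrors "if (rhat < b) goto again1"
            break
    un21 = (un32 * B + un1 - q1 * vn) & MASK32   # wraps like unsigned in C

    q0, rhat = divmod(un21, vn1)         # second quotient digit estimate
    while q0 >= B or q0 * vn0 > B * rhat + un0:
        q0 -= 1
        rhat += vn1
        if rhat >= B:
            break
    r = ((un21 * B + un0 - q0 * vn) & MASK32) >> s
    return (q1 * B + q0) & MASK32, r, un21, vn
```

Feeding many (u1, u0, v) triples through this model and tracking the largest observed un21 relative to the returned normalized divisor is one way to probe the bound before attempting a proof.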

Related

How to perform division of two Q15 values in Verilog , with out using '/' (division) Operator?

The division operation (/) is expensive on an FPGA. Is it possible to perform division of two Q15 format numbers (16-bit fixed-point numbers) with basic shift operations?
Could someone help me by providing some example?
Thanks in advance!
Fixed-point arithmetic is just integer arithmetic with a bit of scaling thrown in. Q15 is a purely fractional format stored as a signed 16-bit integer with a scale factor of 2^15, able to represent values in the interval [-1, 1). Clearly, division only makes sense in Q15 when the divisor's magnitude exceeds the dividend's magnitude, as otherwise the quotient's magnitude exceeds the representable range.
Before embarking on a custom Verilog implementation of fixed-point division, you would want to check your FPGA vendor's library offerings, as a fixed-point library including pipelined division is often available. There are also open source projects that may be relevant, such as this one.
When using integer division operators for fixed-point division, we need to adjust for the fact that the division will remove the scale factor, i.e. (a * 2^scale) / (b * 2^scale) = (a/b), while the correct fixed-point result is (a/b * 2^scale). This is easily fixed by pre-multiplying the dividend by 2^scale, as in the following C implementation:
int16_t div_q15 (int16_t dividend, int16_t divisor)
{
return (int16_t)(((int32_t)dividend << 15) / (int32_t)divisor);
}
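To see the scaling argument at work with concrete numbers, here is a small Python sketch of the same pre-multiplication. The helper names to_q15 and div_q15 are made up for this illustration; the sign handling is there because Python's // rounds toward negative infinity, whereas C's / truncates toward zero.

```python
Q15_ONE = 1 << 15  # scale factor 2^15


def to_q15(x):
    """Quantize a float in [-1, 1) to a Q15 integer (truncating toward zero)."""
    return int(x * Q15_ONE)


def div_q15(dividend, divisor):
    """Q15 quotient of two Q15 integers, truncated toward zero like C's '/'.

    The caller is assumed to ensure |dividend| < |divisor| so that the
    result is representable in Q15.
    """
    num = abs(dividend) << 15           # pre-multiply by the scale factor
    mag = num // abs(divisor)           # division of magnitudes truncates
    return mag if (dividend < 0) == (divisor < 0) else -mag
```

For example, dividing the Q15 encoding of 0.25 (8192) by that of 0.5 (16384) gives 16384, the Q15 encoding of 0.5.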
Wikipedia gives a reasonable overview of how to implement binary division on a bit-by-bit basis using add, subtract, and shift operations. These methods are closely related to the longhand division taught in grade school. For FPGAs, the use of the non-restoring method is often preferred, as pointed out by this paper, for example:
Nikolay Sorokin, "Implementation of high-speed fixed-point dividers on FPGA". Journal of Computer Science & Technology, Vol. 6, No. 1, April 2006, pp. 8-11.
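As a point of reference for the bit-by-bit schemes mentioned above, the simpler restoring method can be sketched in a few lines. This is a minimal Python illustration for unsigned 16-bit operands, not the non-restoring method of the paper; the function name restoring_div_u16 is an assumption of this example.

```python
def restoring_div_u16(n, d):
    """Unsigned 16-bit restoring division using only shift, compare, subtract.

    Returns (quotient, remainder); d must be non-zero.
    """
    if d == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    q = 0
    r = 0
    for i in range(15, -1, -1):
        r = (r << 1) | ((n >> i) & 1)   # bring down the next dividend bit
        if r >= d:                      # trial subtraction succeeds
            r -= d
            q |= 1 << i                 # record quotient bit 1
        # otherwise the partial remainder is left unchanged ("restored")
    return q, r
```

Each iteration produces one quotient bit, which is why an unrolled hardware version needs one stage per bit.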
Here is C code that shows how the non-restoring method may be used for the division of 16-bit two's-complement operands:
/* bit-wise non-restoring two's complement division */
void int16_div (int16_t dividend, int16_t divisor, int16_t *quot, int16_t *rem)
{
const int operand_bits = (int) (sizeof (int16_t) * CHAR_BIT);
uint16_t d = (uint16_t)divisor;
uint16_t nd = 0 - d; /* -divisor */
uint16_t r, q = 0; /* remainder, quotient */
uint32_t dd = (uint32_t)d << operand_bits; /* expanded divisor */
uint32_t pp = dividend; /* partial remainder */
int i;
for (i = operand_bits - 1; i >= 0; i--) {
if ((int32_t)(pp ^ dd) < 0) {
q = (q << 1) + 0; /* record quotient bit -1 (as 0) */
pp = (pp << 1) + dd;
} else {
q = (q << 1) + 1; /* record quotient bit +1 (as 1) */
pp = (pp << 1) - dd;
}
}
/* convert quotient from digit set {-1,1} to plain two's complement */
q = (q << 1) + 1;
/* remainder is upper half of partial remainder */
r = (uint16_t)(pp >> operand_bits);
/* fix up cases where we worked past a partial remainder of zero */
if (r == d) { /* remainder equal to divisor */
q = q + 1;
r = 0;
} else if (r == nd) { /* remainder equal to -divisor */
q = q - 1;
r = 0;
}
/* for truncating division, remainder must have same sign as dividend */
if (r && ((int16_t)(dividend ^ r) < 0)) {
if ((int16_t)q < 0) {
q = q + 1;
r = r - d;
} else {
q = q - 1;
r = r + d;
}
}
*quot = (int16_t)q;
*rem = (int16_t)r;
}
Note that there are multiple ways of dealing with the various special cases that arise in non-restoring division. For example, one frequently sees code that detects a zero partial remainder pp and exits the loop over the quotient bits early in this case. Here I assume that an FPGA implementation would unroll the loop completely to create a pipelined implementation, in which case early termination is not helpful. Instead, a final correction is applied to those quotients that are affected by ignoring a partial remainder of zero.
In order to create a Q15 division from the above, we have to make just a single change: incorporating the up-scaling of the dividend. Instead of:
uint32_t pp = dividend; /* partial remainder */
we now use this:
uint32_t pp = dividend << 15; /* partial remainder; incorporate Q15 scaling */
The resulting C code (sorry, I won't provide ready-to-use Verilog code) including the test framework is:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <limits.h>
#include <math.h>
/* bit-wise non-restoring two's complement division */
void q15_div (int16_t dividend, int16_t divisor, int16_t *quot, int16_t *rem)
{
const int operand_bits = (int) (sizeof (int16_t) * CHAR_BIT);
uint16_t d = (uint16_t)divisor;
uint16_t nd = 0 - d; /* -divisor */
uint16_t r, q = 0; /* remainder, quotient */
uint32_t dd = (uint32_t)d << operand_bits; /* expanded divisor */
uint32_t pp = dividend << 15; /* partial remainder, incorporate Q15 scaling */
int i;
for (i = operand_bits - 1; i >= 0; i--) {
if ((int32_t)(pp ^ dd) < 0) {
q = (q << 1) + 0; /* record quotient bit -1 (as 0) */
pp = (pp << 1) + dd;
} else {
q = (q << 1) + 1; /* record quotient bit +1 (as 1) */
pp = (pp << 1) - dd;
}
}
/* convert quotient from digit set {-1,1} to plain two's complement */
q = (q << 1) + 1;
/* remainder is upper half of partial remainder */
r = (uint16_t)(pp >> operand_bits);
/* fix up cases where we worked past a partial remainder of zero */
if (r == d) { /* remainder equal to divisor */
q = q + 1;
r = 0;
} else if (r == nd) { /* remainder equal to -divisor */
q = q - 1;
r = 0;
}
/* for truncating division, remainder must have same sign as dividend */
if (r && ((int16_t)(dividend ^ r) < 0)) {
if ((int16_t)q < 0) {
q = q + 1;
r = r - d;
} else {
q = q - 1;
r = r + d;
}
}
*quot = (int16_t)q;
*rem = (int16_t)r;
}
int main (void)
{
uint16_t dividend, divisor, ref_q, res_q, res_r;
double quot, fxscale = (1 << 15);
dividend = 0;
do {
printf ("\r%04x", dividend);
divisor = 1;
do {
quot = trunc (fxscale * (int16_t)dividend / (int16_t)divisor);
/* Q15 can only represent numbers in [-1, 1) */
if ((quot >= -1.0) && (quot < 1.0)) {
ref_q = (int16_t)((((int32_t)(int16_t)dividend) << 15) /
((int32_t)(int16_t)divisor));
q15_div ((int16_t)dividend, (int16_t)divisor,
(int16_t *)&res_q, (int16_t *)&res_r);
if (res_q != ref_q) {
printf ("!r dividend=%04x (%f) divisor=%04x (%f) res=%04x (%f) ref=%04x (%f)\n",
dividend, (int16_t)dividend / fxscale,
divisor, (int16_t)divisor / fxscale,
res_q, (int16_t)res_q / fxscale,
ref_q, (int16_t)ref_q / fxscale);
}
}
divisor++;
} while (divisor);
dividend++;
} while (dividend);
return EXIT_SUCCESS;
}

Portable efficient alternative to PDEP without using BMI2?

The documentation for the parallel deposit instruction (PDEP) in Intel's Bit Manipulation Instruction Set 2 (BMI2) describes the following serial implementation for the instruction (C-like pseudocode):
U64 _pdep_u64(U64 val, U64 mask) {
U64 res = 0;
for (U64 bb = 1; mask; bb += bb) {
if (val & bb)
res |= mask & -mask;
mask &= mask - 1;
}
return res;
}
See also Intel's pdep insn ref manual entry.
This algorithm is O(n), where n is the number of set bits in mask, which obviously has a worst case of O(k) where k is the total number of bits in mask.
Is a more efficient worst case algorithm possible?
Is it possible to make a faster version that assumes that val has at most one bit set, ie either equals 0 or equals 1<<r for some value of r from 0 to 63?
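To make the serial algorithm concrete, here is a direct Python transcription of the pseudocode above, intended only as a reference model for checking faster variants. The function name pdep is chosen for this sketch.

```python
def pdep(val, mask):
    """Reference model of parallel bit deposit.

    Successive low-order bits of val are placed at the positions of the
    set bits of mask, from least to most significant.
    """
    res = 0
    bb = 1                       # selects the next bit of val
    while mask:
        if val & bb:
            res |= mask & -mask  # isolate the lowest set bit of mask
        mask &= mask - 1         # clear the lowest set bit of mask
        bb <<= 1
    return res
```

For instance, with mask 0b11010 the set bits are at positions 1, 3 and 4, so val = 0b101 deposits its bit 0 at position 1 and its bit 2 at position 4, giving 0b10010.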
The second part of the question, about the special case of a 1-bit deposit, requires two steps. In the first step, we need to determine the bit index r of the single 1-bit in val, with a suitable response in case val is zero. This can easily be accomplished via the POSIX function ffs, or if r is known by other means, as alluded to by the asker in comments. In the second step we need to identify bit index i of the r-th 1-bit in mask, if it exists. We can then deposit the r-th bit of val at bit i.
One way of finding the index of the r-th 1-bit in mask is to tally the 1-bits using a classical population count algorithm based on binary partitioning, and record all of the intermediate group-wise bit counts. We then perform a binary search on the recorded bit-count data to identify the position of the desired bit.
The following C-code demonstrates this using 64-bit data. Whether this is actually faster than the iterative method will very much depend on typical values of mask and val.
#include <stdint.h>
#include <string.h> /* ffsll(); a glibc extension declared via <string.h> */
/* Find the index of the n-th 1-bit in mask, n >= 0
The index of the least significant bit is 0
Return -1 if there is no such bit
*/
int find_nth_set_bit (uint64_t mask, int n)
{
int t, i = n, r = 0;
const uint64_t m1 = 0x5555555555555555ULL; // even bits
const uint64_t m2 = 0x3333333333333333ULL; // even 2-bit groups
const uint64_t m4 = 0x0f0f0f0f0f0f0f0fULL; // even nibbles
const uint64_t m8 = 0x00ff00ff00ff00ffULL; // even bytes
uint64_t c1 = mask;
uint64_t c2 = c1 - ((c1 >> 1) & m1);
uint64_t c4 = ((c2 >> 2) & m2) + (c2 & m2);
uint64_t c8 = ((c4 >> 4) + c4) & m4;
uint64_t c16 = ((c8 >> 8) + c8) & m8;
uint64_t c32 = (c16 >> 16) + c16;
int c64 = (int)(((c32 >> 32) + c32) & 0x7f);
t = (c32 ) & 0x3f; if (i >= t) { r += 32; i -= t; }
t = (c16>> r) & 0x1f; if (i >= t) { r += 16; i -= t; }
t = (c8 >> r) & 0x0f; if (i >= t) { r += 8; i -= t; }
t = (c4 >> r) & 0x07; if (i >= t) { r += 4; i -= t; }
t = (c2 >> r) & 0x03; if (i >= t) { r += 2; i -= t; }
t = (c1 >> r) & 0x01; if (i >= t) { r += 1; }
if (n >= c64) r = -1;
return r;
}
/* val is either zero or has a single 1-bit.
Return -1 if val is zero, otherwise the index of the 1-bit
The index of the least significant bit is 0
*/
int find_bit_index (uint64_t val)
{
return ffsll (val) - 1;
}
uint64_t deposit_single_bit (uint64_t val, uint64_t mask)
{
uint64_t res = (uint64_t)0;
int r = find_bit_index (val);
if (r >= 0) {
int i = find_nth_set_bit (mask, r);
if (i >= 0) res = (uint64_t)1 << i;
}
return res;
}
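For experimenting with the search step outside of C, the following Python sketch ports find_nth_set_bit line by line and pairs it with a naive loop that can serve as a cross-check. The name nth_set_bit_naive is introduced here for illustration; masks are assumed to fit in 64 bits.

```python
M1 = 0x5555555555555555  # even bits
M2 = 0x3333333333333333  # even 2-bit groups
M4 = 0x0F0F0F0F0F0F0F0F  # even nibbles
M8 = 0x00FF00FF00FF00FF  # even bytes


def find_nth_set_bit(mask, n):
    """Index of the n-th (0-based) 1-bit of a 64-bit mask, or -1 if none."""
    # Population count by binary partitioning, keeping every partial result.
    c1 = mask
    c2 = c1 - ((c1 >> 1) & M1)
    c4 = ((c2 >> 2) & M2) + (c2 & M2)
    c8 = ((c4 >> 4) + c4) & M4
    c16 = ((c8 >> 8) + c8) & M8
    c32 = (c16 >> 16) + c16
    c64 = ((c32 >> 32) + c32) & 0x7F
    if n >= c64:
        return -1
    # Binary search over the recorded group counts.
    i, r = n, 0
    for counts, field_mask, width in ((c32, 0x3F, 32), (c16, 0x1F, 16),
                                      (c8, 0x0F, 8), (c4, 0x07, 4),
                                      (c2, 0x03, 2), (c1, 0x01, 1)):
        t = (counts >> r) & field_mask   # 1-bits in the lower half at r
        if i >= t:                       # target lies in the upper half
            r += width
            i -= t
    return r


def nth_set_bit_naive(mask, n):
    """Same result, obtained by clearing the lowest set bit n times."""
    for _ in range(n):
        mask &= mask - 1
    return (mask & -mask).bit_length() - 1 if mask else -1
```

Comparing the two functions over many masks is a cheap way to gain confidence in the bit-twiddling version before translating it further.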

Smallest number in a range [a,b] with maximum number of '1' in binary representation

Given a range [a,b] (both inclusive) I need to find the smallest number with the maximum number of '1's in binary representation. My current approach is I find the number of bits set in all numbers from a to b and keep track of the maximum.
However this is very slow, any faster method?
Let's find the most significant bit that differs between a and b. It will be 0 in a and 1 in b. If we set all the bits to the right of it in a to 1, the resulting number is still in the range [a; b], and it is the number with the maximum number of ones in its representation.
EDIT. This algorithm always returns the number with n-1 bits set to one, where n is the number of bits that can be changed. As pointed out in the comments, there is a bug in the case where all n of these bits in b are set to 1. Here is the fixed code snippet:
int maximizeBits(int a, int b) {
if (a == b) {
return a;
}
int m = a ^ b, pow2 = 1; // MSB of m=a^b is bit that we need to find
while (m > pow2) { // Set other bits to 0
if ((m & pow2) != 0) {
m ^= pow2;
}
pow2 <<= 1;
}
int res = a | (m - 1); // Now m is in form of 2^n and m - 1 would be mask of n-1 bits
if ((res | b) <= b) { // Fix of problem if all n bits in b are set to 1
res = b;
}
return res;
}
You can replace the loop in Jarlax' answer by a "parallel suffix OR", like this
uint32_t m = (a ^ b) >> 1;
m |= m >> 1;
m |= m >> 2;
m |= m >> 4;
m |= m >> 8;
m |= m >> 16;
uint32_t res = a | m;
if ((res | b) <= b)
res = b;
return res;
It generalizes to integers of other sizes, using ceil(log2(k)) steps for k-bit integers. The initial test a == b is not necessary: a ^ b would be zero, therefore m is zero, so nothing interesting happens anyway.
Alternatively, here's a completely different approach: keep changing the lowest 0 to a 1 until it is no longer possible.
unsigned x = a;
while (x < b) {
unsigned newx = (x + 1) | x; // set lowest 0
if (newx <= b)
x = newx;
else
break;
}
return x;
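A quick way to gain confidence in the two variants above is to compare them against brute force on small ranges. The Python sketch below does that for non-negative integers; the function names are invented for this example, and the smear loop is written for Python's unbounded integers rather than a fixed 32-bit width.

```python
def max_bits_smear(a, b):
    """Parallel suffix OR variant, for 0 <= a <= b."""
    m = (a ^ b) >> 1
    shift = 1
    while shift < m.bit_length():   # smear the top bit of m downward
        m |= m >> shift
        shift <<= 1
    res = a | m
    if (res | b) <= b:              # every changeable bit of b is 1
        res = b
    return res


def max_bits_greedy(a, b):
    """Keep setting the lowest 0 bit while the value stays <= b."""
    x = a
    while x < b:
        nxt = (x + 1) | x           # set lowest 0
        if nxt > b:
            break
        x = nxt
    return x


def max_bits_brute(a, b):
    """Smallest number in [a, b] with the maximum number of 1 bits."""
    return min(range(a, b + 1), key=lambda x: (-bin(x).count("1"), x))
```

On the range [8, 14], for example, all three return 11 (binary 1011), the smallest of the values with three 1 bits.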

Add two numbers without using + and - operators

Suppose you have two numbers, both signed integers, and you want to sum them but can't use your language's conventional + and - operators. How would you do that?
Based on http://www.ocf.berkeley.edu/~wwu/riddles/cs.shtml
Not mine, but cute
int a = 42;
int b = 17;
char *ptr = (char*)a;
int result = (int)&ptr[b];
Using Bitwise operations just like Adder Circuits
Cringe. Nobody builds an adder from 1-bit adders anymore.
do {
sum = a ^ b;
carry = a & b;
a = sum;
b = carry << 1;
} while (b);
return sum;
Of course, arithmetic here is assumed to be unsigned modulo 2^n or two's complement. It's only guaranteed to work in C if you convert to unsigned, perform the calculation unsigned, and then convert back to signed.
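The convert, compute, convert-back recipe can be spelled out explicitly. Below is a minimal Python sketch that emulates 32-bit words with a mask, since Python integers are unbounded; the name add32 and the choice of 32 bits are assumptions of this example.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def add32(a, b):
    """Add two signed 32-bit integers using only bitwise operations.

    The result wraps around like two's-complement hardware.
    """
    a &= MASK32                     # convert to unsigned representation
    b &= MASK32
    while b:
        # XOR adds without carries; AND finds the carries, which move left.
        a, b = a ^ b, ((a & b) << 1) & MASK32
    # Convert back to signed: extend the sign bit upward if it is set.
    return a | ~MASK32 if a & SIGN32 else a
```

The loop runs at most 32 times, because each pass shifts the remaining carries one position to the left and the mask discards anything past bit 31.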
Since ++ and -- are not + and - operators:
int add(int lhs, int rhs) {
if (lhs < 0)
while (lhs++) --rhs;
else
while (lhs--) ++rhs;
return rhs;
}
Using bitwise logic:
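int sum = 0;
int carry = 0;
int i = 0; // position of the bit being produced
while (n1 > 0 || n2 > 0) {
int b1 = n1 % 2;
int b2 = n2 % 2;
int sumBits = b1 ^ b2 ^ carry;
sum = sum | (sumBits << i); // place the new bit at position i
i++;
carry = (b1 & b2) | (b1 & carry) | (b2 & carry);
n1 /= 2;
n2 /= 2;
}
sum = sum | (carry << i); // do not lose the final carry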
Here's something different than what's been posted already. Use the facts that:
log (a^b) = b * log a
e^a * e^b = e^(a + b)
So:
log (e^(a + b)) = log(e^a * e^b) = a + b (if the log is base e)
So just find log(e^a * e^b).
Of course this is just theoretical, in practice this is going to be inefficient and most likely inexact too.
If we're obeying the letter of the rules:
a += b;
Otherwise http://www.geekinterview.com/question_details/67647 has a pretty complete list.
This version has a restriction on the number range:
(((int64_t)a << 32) | ((int64_t)b & INT64_C(0xFFFFFFFF))) % 0xFFFFFFFF
This also counts under the "letter of the rules" category.
Simple example in Python, complete with a simple test:
NUM_BITS = 32
def adder(a, b, carry):
    # one-bit full adder: returns (sum bit, carry out)
    sum = a ^ b ^ carry
    carry = (a & b) | (carry & (a ^ b))
    #print("%d + %d = %d (carry %d)" % (a, b, sum, carry))
    return sum, carry
def add_two_numbers(a, b):
    carry = 0
    result = 0
    for n in range(NUM_BITS):
        mask = 1 << n
        bit_a = (a & mask) >> n
        bit_b = (b & mask) >> n
        sum, carry = adder(bit_a, bit_b, carry)
        result = result | (sum << n)
    return result
if __name__ == '__main__':
    assert add_two_numbers(2, 3) == 5
    assert add_two_numbers(57, 23) == 80
    for a in range(10):
        for b in range(10):
            result = add_two_numbers(a, b)
            print("%d + %d == %d" % (a, b, result))
            assert result == a + b
In Common Lisp:
(defun esoteric-sum (a b)
(let ((and (logand a b)))
(if (zerop and)
;; No carrying necessary.
(logior a b)
;; Combine the partial sum with the carried bits again.
(esoteric-sum (logxor a b) (ash and 1)))))
That's taking the bitwise-and of the numbers, which figures out which bits need to carry, and, if there are no bits that require shifting, returns the bitwise-or of the operands. Otherwise, it shifts the carried bits one to the left and combines them again with the bitwise-exclusive-or of the numbers, which sums all the bits that don't need to carry, until no more carrying is necessary.
Here's an iterative alternative to the recursive form above:
(defun esoteric-sum-iterative (a b)
(loop for first = a then (logxor first second)
for second = b then (ash and 1)
for and = (logand first second)
until (zerop and)
finally (return (logior first second))))
Note that the function needs another concession to overcome Common Lisp's reluctance to employ fixed-width two's complement arithmetic—normally an immeasurable asset—but I'd rather not cloud the form of the function with that accidental complexity.
If you need more detail on why that works, please ask a more detailed question to probe the topic.
Not very creative, I know, but in Python:
sum([a,b])
I realize that this might not be the most elegant solution to the problem, but I figured out a way to do this using the len(list) function as a substitute for the addition operator.
'''
Addition without operators: This program obtains two integers from the user
and then adds them together without using operators. This is one of the 'hard'
questions from 'Cracking the Coding Interview' by
'''
print('Welcome to addition without a plus sign!')
item1 = int(input('Please enter the first number: '))
item2 = int(input('Please enter the second number: '))
item1_list = []
item2_list = []
total = 0
total_list = []
marker = 'x'
placeholder = 'placeholder'
while len(item1_list) < item1:
    item1_list.append(marker)
while len(item2_list) < item2:
    item2_list.append(marker)
item1_list.insert(1, placeholder)
item1_list.insert(1, placeholder)
for item in range(1, len(item1_list)):
    total_list.append(item1_list.pop())
for item in range(1, len(item2_list)):
    total_list.append(item2_list.pop())
total = len(total_list)
print('The sum of', item1, 'and', item2, 'is', total)
#include <stdio.h>
int main()
{
int n1=5,n2=55,i=0;
int sum = 0;
int carry = 0;
while (n1 > 0 || n2 > 0)
{
int b1 = n1 % 2;
int b2 = n2 % 2;
int sumBits = b1 ^ b2 ^ carry;
sum = sum | ( sumBits << i);
i++;
carry = (b1 & b2) | (b1 & carry) | (b2 & carry);
n1 /= 2;
n2 /= 2;
}
sum = sum | ( carry << i );
printf("%d",sum);
return 0;
}

Algorithm to calculate the number of 1s for a range of numbers in binary

So I just got back from the ACM Programming competition and did pretty well, but there was one problem that not one team got.
The Problem.
Start with an integer N_0 which is greater than 0. Let N_1 be the number of ones in the binary representation of N_0. So, if N_0 = 27, N_1 = 4. For all i > 0, let N_i be the number of ones in the binary representation of N_(i-1). This sequence will always converge to one. For any starting number N_0, let K be the minimum value of i >= 0 for which N_i = 1. For example, if N_0 = 31, then N_1 = 5, N_2 = 2, N_3 = 1, so K = 3.
Given a range of consecutive numbers and a value of X how many numbers in the range have a K value equal to X?
Input
There will be several test cases in the input. Each test case will consist of three integers on a single line:
LO HI X
Where LO and HI (1 <= LO <= HI <= 10^18) are the lower and upper limits of a range of integers, and X (0 <= X <= 10) is the target value for K. The input will end with a line of three 0s.
Output
For each test case output a single integer, representing the number of integers in the range from LO to HI (inclusive) which have a K value equal to X in the input. Print each Integer on its own line with no spaces. Do not print any blank lines between answers.
Sample Input
31 31 3
31 31 1
27 31 1
27 31 2
1023 1025 1
1023 1025 2
0 0 0
Sample Output
1
0
0
3
1
1
If you guys want, I can include our answer, because solving the problem for a small range is easy. But I will give you a hint first: your program needs to run in seconds, not minutes. We had a correct solution, but not an algorithm efficient enough to handle a range similar to
48238 10^18 9
Anyway good luck and if the community likes these we had some more we could not solve that could be some good brain teasers for you guys. The competition allows you to use Python, C++, or Java—all three are acceptable in an answer.
So as a hint my coach said to think of how binary numbers count rather than checking every bit. I think that gets us a lot closer.
I think a key is first understanding the pattern of K values and how rapidly it grows. Basically, you have:
K(1) = 0
K(X) = K(bitcount(X))+1 for X > 1
So finding the smallest X values for a given K we see
K(1) = 0
K(2) = 1
K(3) = 2
K(7) = 3
K(127) = 4
K(170141183460469231731687303715884105727) = 5
So for an example like 48238 10^18 9 the answer is trivially 0. K=0 only for 1, and K=1 only for powers of 2, so in the range of interest, we'll pretty much only see K values of 2, 3 or 4, and never see K >= 5
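The recurrence and the table of smallest values are easy to reproduce. The following Python sketch is only an illustration of the definitions above; the helper smallest_with_K is a name made up here, and it relies on the observation that the smallest number with p one-bits is 2^p - 1.

```python
def K(x):
    """Number of popcount steps needed to reach 1, for x >= 1."""
    steps = 0
    while x > 1:
        x = bin(x).count("1")
        steps += 1
    return steps


def smallest_with_K(k):
    """Smallest x >= 1 whose K value equals k."""
    if k == 0:
        return 1
    if k == 1:
        return 2
    # Need the smallest popcount p with K(p) == k - 1, packed into low bits.
    return (1 << smallest_with_K(k - 1)) - 1
```

Since every number up to 10^18 has fewer than 127 one-bits, this also shows why K never exceeds 4 in the range of the problem.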
edit
Ok, so we're looking for an algorithm to count the number of values with K=2,3,4 in a range of values LO..HI without iterating over the entire range. So the first step is to find the number of values in the range with bitcount(x)==i for i = 1..59 (since we only care about values up to 10^18 and 10^18 < 2^60). So break down the range lo..hi into subranges that are a power of 2 in size and differ only in their lower n bits -- a range of the form x*(2^n)..(x+1)*(2^n)-1. We can break down the arbitrary lo..hi range into such subranges easily. For each such subrange there will be choose(n, i) values with i+bitcount(x) set bits.
So we just add all the subranges together to get a vector of counts for 1..59, which we then iterate over, adding together those elements with the same K value to get our answer.
edit (fixed again to be C89 compatible and work for lo=1/k=0)
Here's a C program to do what I previously described:
#include <stdio.h>
#include <string.h>
#include <assert.h>
int bitcount(long long x) {
int rv = 0;
while(x) { rv++; x &= x-1; }
return rv; }
long long choose(long long m, long long n) {
long long rv = 1;
int i;
for (i = 0; i < n; i++) {
rv *= m-i;
rv /= i+1; }
return rv; }
void bitcounts_p2range(long long *counts, long long base, int l2range) {
int i;
assert((base & ((1LL << l2range) - 1)) == 0);
counts += bitcount(base);
for (i = 0; i <= l2range; i++)
counts[i] += choose(l2range, i); }
void bitcounts_range(long long *counts, long long lo, long long hi) {
int l2range = 0;
while (lo + (1LL << l2range) - 1 <= hi) {
if (lo & (1LL << l2range)) {
bitcounts_p2range(counts, lo, l2range);
lo += 1LL << l2range; }
l2range++; }
while (l2range >= 0) {
if (lo + (1LL << l2range) - 1 <= hi) {
bitcounts_p2range(counts, lo, l2range);
lo += 1LL << l2range; }
l2range--; }
assert(lo == hi+1); }
int K(int x) {
int rv = 0;
while(x > 1) {
x = bitcount(x);
rv++; }
return rv; }
int main() {
long long counts[64];
long long lo, hi, total;
int i, k;
while (scanf("%lld%lld%d", &lo, &hi, &k) == 3) {
if (lo < 1 || lo > hi || k < 0) break; /* also catches the terminating "0 0 0" line; k == 0 is a valid query */
total = 0;
if (lo == 1) {
lo++;
if (k == 0) total++; }
memset(counts, 0, sizeof(counts));
bitcounts_range(counts, lo, hi);
for (i = 1; i < 64; i++)
if (K(i)+1 == k)
total += counts[i];
printf("%lld\n", total); }
return 0; }
which runs just fine for values up to 2^63-1 (LLONG_MAX).
For 48238 1000000000000000000 3 it gives 513162479025364957, which certainly seems plausible
edit
giving the inputs of
48238 1000000000000000000 1
48238 1000000000000000000 2
48238 1000000000000000000 3
48238 1000000000000000000 4
gives outputs of
44
87878254941659920
513162479025364957
398959266032926842
Those add up to 999999999999951763 which is correct. The value for k=1 is correct (there are 44 powers of two in that range 2^16 up to 2^59). So while I'm not sure the other 3 values are correct, they're certainly plausible.
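As an independent cross-check of this approach, here is a Python sketch that builds the same kind of popcount histogram from aligned power-of-two blocks, but by walking the bits of the upper bound and taking a difference of two prefix histograms rather than splitting lo..hi directly. The function names are invented for this example, math.comb requires Python 3.8 or later, and lo >= 1 is assumed.

```python
from math import comb


def popcount_histogram(n, width=64):
    """hist[c] = how many x in [0, n) have exactly c one-bits (n < 2**width)."""
    hist = [0] * (width + 1)
    ones_above = 0
    for p in range(n.bit_length() - 1, -1, -1):
        if (n >> p) & 1:
            # Block of 2**p values: same bits as n above p, bit p cleared,
            # low p bits free, so choose(p, j) of them add j more ones.
            for j in range(p + 1):
                hist[ones_above + j] += comb(p, j)
            ones_above += 1
    return hist


def K(x):
    """Number of popcount steps needed to reach 1, for x >= 1."""
    steps = 0
    while x > 1:
        x = bin(x).count("1")
        steps += 1
    return steps


def count_with_K(lo, hi, target):
    """How many x in [lo, hi] (with lo >= 1) have K(x) == target."""
    upper = popcount_histogram(hi + 1)
    lower = popcount_histogram(lo)
    has_one = lo <= 1 <= hi
    total = 1 if (target == 0 and has_one) else 0
    for c in range(1, len(upper)):
        in_range = upper[c] - lower[c]
        if c == 1 and has_one:
            in_range -= 1            # x == 1 has K = 0, counted separately
        if K(c) + 1 == target:       # any x > 1 with c ones has K = K(c) + 1
            total += in_range
    return total
```

Running count_with_K(48238, 10**18, k) for k = 1..4 gives figures that can be compared with the outputs listed above.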
The idea behind this answer can help you develop a very fast solution. For ranges 0..2^N, the complexity of a potential algorithm would be O(N) in the worst case (assuming that the complexity of long arithmetic is O(1)). If programmed correctly, it should easily handle N = 1000000 in a matter of milliseconds.
Imagine we have the following values:
LO = 0; (0000000000000000000000000000000)
HI = 2147483647; (1111111111111111111111111111111)
The lowest possible N1 in range LO..HI is 0
The highest possible N1 in range LO..HI is 31
So the computation of N2..NN part is done only for one of 32 values (i.e. 0..31).
Which can be done simply, even without a computer.
Now lets compute the amount of N1=X for a range of values LO..HI
When we have X = 0 we have count(N1=X) = 1 this is the following value:
1 0000000000000000000000000000000
When we have X = 1 we have count(N1=X) = 31 these are the following values:
01 1000000000000000000000000000000
02 0100000000000000000000000000000
03 0010000000000000000000000000000
...
30 0000000000000000000000000000010
31 0000000000000000000000000000001
When we have X = 2 we have the following pattern:
1100000000000000000000000000000
How many unique strings can be formed with 29 - '0' and 2 - '1'?
Imagine the rightmost '1'(#1) is cycling from left to right, we get the following picture:
01 1100000000000000000000000000000
02 1010000000000000000000000000000
03 1001000000000000000000000000000
...
30 1000000000000000000000000000001
Now we've got 30 unique strings while moving the '1'(#1) from left to right, it is now impossible to
create a unique string by moving the '1'(#1) in any direction. This means we should move '1'(#2) to the right,
let's also reset the position of '1'(#1) as left as possible remaining uniqueness, we get:
01 0110000000000000000000000000000
now we do the cycling of '1'(#1) once again
02 0101000000000000000000000000000
03 0100100000000000000000000000000
...
29 0100000000000000000000000000001
Now we've got 29 unique strings, continuing this whole operation 28 times we get the following expression
count(N1=2) = 30 + 29 + 28 + ... + 1 = 465
When we have X = 3 the picture remains similar but we are moving '1'(#1), '1'(#2), '1'(#3)
Moving the '1'(#1) creates 29 unique strings, when we start moving '1'(#2) we get
29 + 28 + ... + 1 = 435 unique strings, after that we are left to process '1'(#3) so we have
29 + 28 + ... + 1 = 435
28 + ... + 1 = 406
...
+ 1 = 1
435 + 406 + 378 + 351 + 325 + 300 + 276 +
253 + 231 + 210 + 190 + 171 + 153 + 136 +
120 + 105 + 091 + 078 + 066 + 055 + 045 +
036 + 028 + 021 + 015 + 010 + 006 + 003 + 001 = 4495
Let's try to solve the general case i.e. when we have N zeros and M ones.
The overall number of permutations of a string of length (N + M) is (N + M)!
The N zeros are interchangeable, which accounts for N! duplicates.
The M ones are interchangeable, which accounts for M! duplicates.
Thus the overall number of unique strings formed of N zeros and M ones is
F(N, M) = (N + M)! / (N! * M!)
For the 31-bit example above with X = 3 (N = 28 zeros, M = 3 ones):
F(28, 3) = 31! / (28! * 3!) = (31 * 30 * 29) / 6 = 4495
Edit:
F(N, M) = Binomial(N + M, M)
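The counts derived by hand above can be confirmed with a few lines of Python. This is only a sanity check of the binomial formula; count_strings and count_by_enumeration are names chosen for this sketch, and math.comb requires Python 3.8 or later.

```python
from math import comb


def count_strings(zeros, ones):
    """Number of distinct bit strings with the given counts of 0s and 1s."""
    return comb(zeros + ones, ones)


def count_by_enumeration(width, ones):
    """Same count obtained by testing every value of the given bit width."""
    return sum(1 for x in range(1 << width) if bin(x).count("1") == ones)


if __name__ == "__main__":
    # 31-bit strings with two and three 1 bits, as worked out above.
    print(count_strings(29, 2), count_strings(28, 3))
```

The enumeration helper is exponential in the width and is only meant for checking the formula on small cases.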
Now let's consider a real life example:
LO = 43797207; (0000010100111000100101011010111)
HI = 1562866180; (1011101001001110111001000000100)
So how do we apply our unique permutations formula to this example? Since we don't know how
many '1' is located below LO and how many '1' is located above HI.
So lets count these permutations below LO and above HI.
Lets remember how we cycled '1'(#1), '1'(#2), ...
1111100000000000000000000000000 => 2080374784
1111010000000000000000000000000 => 2046820352
1111001000000000000000000000000 => 2030043136
1111000000000000000000000000001 => 2013265921
1110110000000000000000000000000 => 1979711488
1110101000000000000000000000000 => 1962934272
1110100100000000000000000000000 => 1954545664
1110100010000000000000000000001 => 1950351361
As you see this cycling process decreases the decimal values smoothly. So we need to count amount of
cycles until we reach HI value. But we shouldn't be counting these values by one because
the worst case can generate up to 32!/(16!*16!) = 601080390 cycles, which we will be cycling very long :)
So we need cycle chunks of '1' at once.
Having our example we would want to count the amount of cycles of a transformation
1111100000000000000000000000000 => 1011101000000000000000000000000
1011101001001110111001000000100
So how many cycles causes the transformation
1111100000000000000000000000000 => 1011101000000000000000000000000
?
Lets see, the transformation:
1111100000000000000000000000000 => 1110110000000000000000000000000
is equal to following set of cycles:
01 1111100000000000000000000000000
02 1111010000000000000000000000000
...
27 1111000000000000000000000000001
28 1110110000000000000000000000000
So we need 28 cycles to transform
1111100000000000000000000000000 => 1110110000000000000000000000000
How many cycles do we need to transform
1111100000000000000000000000000 => 1101110000000000000000000000000
performing following moves we need:
1110110000000000000000000000000 28 cycles
1110011000000000000000000000000 27 cycles
1110001100000000000000000000000 26 cycles
...
1110000000000000000000000000011 1 cycle
and 1 cycle for receiving:
1101110000000000000000000000000 1 cycle
thus receiving 28 + 27 + ... + 1 + 1 = 406 + 1
but we have seen this value before and it was the result for the amount of unique permutations, which was
computed for 2 '1' and 27 '0'. This means that amount of cycles while moving
11100000000000000000000000000 => 01110000000000000000000000000
is equal to moving
_1100000000000000000000000000 => _0000000000000000000000000011
plus one additional cycle
so this means if we have M zeros and N ones and want to move the chunk of U '1' to the right we will need to
perform the following amount of cycles:
(U - 1 + M)!
1 + =============== = f(U, M)
M! * (U - 1)!
Edit:
f(U, M) = 1 + Binomial(U - 1 + M, M)
Now let's come back to our real life example:
LO = 43797207; (0000010100111000100101011010111)
HI = 1562866180; (1011101001001110111001000000100)
so what we want to do is count the amount cycles needed to perform the following
transformations (suppose N1 = 6)
1111110000000000000000000000000 => 1011101001000000000000000000000
1011101001001110111001000000100
this is equal to:
1011101001000000000000000000000 1011101001000000000000000000000
------------------------------- -------------------------------
_111110000000000000000000000000 => _011111000000000000000000000000 f(5, 25) = 118756
_____11000000000000000000000000 => _____01100000000000000000000000 f(2, 24) = 301
_______100000000000000000000000 => _______010000000000000000000000 f(1, 23) = 24
________10000000000000000000000 => ________01000000000000000000000 f(1, 22) = 23
thus resulting 119104 'lost' cycles which are located above HI
Regarding LO, there is actually no difference in what direction we are cycling
so for computing LO we can do reverse cycling:
0000010100111000100101011010111 0000010100111000100101011010111
------------------------------- -------------------------------
0000000000000000000000000111___ => 0000000000000000000000001110___ f(3, 25) = 2926
00000000000000000000000011_____ => 00000000000000000000000110_____ f(2, 24) = 301
This gives 3227 'lost' cycles located below LO, which means that
overall amount of lost cycles = 119104 + 3227 = 122331
overall amount of all possible cycles = F(6, 25) = 736281
N1 in range 43797207..1562866180 is equal to 736281 - 122331 = 613950
I won't provide the remaining part of the solution. It is not that hard to grasp. Good luck!
I think this is a problem in discrete mathematics.
Assume LOW is 0; otherwise we can add a function that sums the numbers below LOW and subtract it.
From the numbers shown, I understand that the longest number has at most 60 binary digits.
alg(HIGH, k)
{
    l = len(HIGH);
    sum = 0;
    for (i = 0; i <= l; i++)
    {
        count = (l choose i);
        nwia = numbers_with_i_above(i, HIGH);
        if canreach(i, k) sum += (count - nwia);
    }
    return sum;
}
Every number is counted, and none is counted twice.
numbers_with_i_above(i, HIGH), the count of numbers with i bits set that lie above HIGH, is trivial.
canreach(i, k), for numbers up to 60, is easy.
len is the length of the binary representation.
Zobgib,
The key to this problem is not how rapidly K's pattern grows, but HOW it grows. The first step is to understand (as your coach said) how binary numbers count, as this determines everything about how K is determined. As binary numbers count up, the number of positive bits follows a distinct pattern: a single, progressive, repeating pattern. I am going to demonstrate in an unusual way...
Assume i is an integer value. Assume b is the number of positive bits in i
i = 1;
b = 1;
i = 2; 3;
b = 1; 2;
i = 4; 5; 6; 7;
b = 1; 2; 2; 3;
i = 8; 9; 10; 11; 12; 13; 14; 15;
b = 1; 2; 2; 3; 2; 3; 3; 4;
i = 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31;
b = 1; 2; 2; 3; 2; 3; 3; 4; 2; 3; 3; 4; 3; 4; 4; 5;
I assure you, this pattern holds to infinity, but if needed you
should be able to find or construct a proof easily.
If you look at the data above, you'll notice a distinct pattern related to 2^n. Each time i reaches a power of 2, the pattern resets: the new row consists of every term of the previous row, followed by every term of the previous row incremented by 1. So to get K, you just apply the new number to the pattern above. The key is to find a single, efficient expression that gives you the number of bits.
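The doubling rule can be sketched in a few lines of C++. This is only an illustration, not the poster's code: the name bitCountPattern is hypothetical, and the rule is written in an equivalent form, where the block starting at 2^n is the whole sequence before it (taking b = 0 for i = 0) with 1 added to every term.

```cpp
#include <cassert>
#include <vector>

// Builds b[0..limit], where b[i] is the number of positive bits in i, using only
// the doubling rule: the block that starts at a power of 2 repeats everything
// that came before it (b[0] = 0 included) with 1 added to each term.
std::vector<int> bitCountPattern(int limit) {
    assert(limit >= 1);
    std::vector<int> b(limit + 1, 0);  // b[0] = 0 seeds the pattern
    int blockStart = 1;                // power of 2 that opens the current block
    for (int i = 1; i <= limit; ++i) {
        if (i == 2 * blockStart) blockStart = i;  // reached the next power of 2
        b[i] = b[i - blockStart] + 1;
    }
    return b;
}
```

Calling bitCountPattern(31) should reproduce the b rows listed above without counting the bits of any individual number.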
To demonstrate further, you can extrapolate a new pattern from this one, because it is static and follows the same progression. Below is the original data extended with its K value (based on the recursion).
Assume i is an integer value. Assume b is the number of positive bits in i
i = 1;
b = 1;
K = 1;
i = 2; 3;
b = 1; 2;
K = 1; 2;
i = 4; 5; 6; 7;
b = 1; 2; 2; 3;
K = 1; 2; 2; 3;
i = 8; 9; 10; 11; 12; 13; 14; 15;
b = 1; 2; 2; 3; 2; 3; 3; 4;
K = 1; 2; 2; 3; 2; 3; 3; 2;
i = 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31;
b = 1; 2; 2; 3; 2; 3; 3; 4; 2; 3; 3; 4; 3; 4; 4; 5;
K = 1; 2; 2; 3; 2; 3; 3; 2; 2; 3; 3; 2; 3; 2; 2; 3;
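The K rows in this table can be reproduced with a short C++ sketch. It assumes a definition of K inferred from the table itself: the number of times the bit count has to be taken, at least once, until the value becomes 1 (so K = 1 for i = 1). The names bitCount and computeKOf are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of positive (set) bits in x.
int bitCount(std::uint64_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}

// Assumed definition of K, inferred from the table: how many times the bit count
// must be applied (at least once) before the value becomes 1.
int computeKOf(std::uint64_t i) {
    assert(i >= 1);
    int k = 0;
    do {
        i = static_cast<std::uint64_t>(bitCount(i));
        ++k;
    } while (i != 1);
    return k;
}
```

For example, 15 has b = 4, and 4 has a single bit set, so two applications are needed, which matches K = 2 in the table.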
If you notice, K follows a similar pattern, with a special condition: every time b is a power of 2, the K value is actually lowered by 2. So if you follow the binary progression, you should be able to map your K values easily. Since this pattern depends on powers of 2, and on finding the nearest power of 2 and starting there, I propose the following solution. Take your LOW value and find the largest power of 2 (p) such that 2^p <= LOW. This can be done by "counting the bits" for just the lowest number. Again, once you know which exponent it is, you don't have to count the bits for any other number. You just increment through the pattern and you will have your b and hence K (which follows the same pattern).
Note: If you are particularly observant, you can use the previous b or K to determine the next. If the current i is odd, add 1 to the previous b. If the current i is divisible by 4, then you decrement b by either 1 or 2, depending on whether it's in the first half of the pattern or the second half. And, of course, if i is a power of 2, start over at 1.
Fuzzical Logic
Pseudo-code Example (non-Optimized)
{
    var LOW, HIGH
    var power = 0
    // Get the largest power of 2 that is <= LOW
    for (var i = 0 to 60) {
        if ((2 ^ i) <= LOW) {
            set power to i
        }
        else {
            // Found the Power: end the for loop
            set i to 61
        }
    }
    // Automatically 1 at a Power of 2
    set numOfBits to 1
    array numbersWithPositiveBits with 64 integers = 0
    // Must create the pattern from Power of 2
    set foundLOW to false
    for (var j = (2^power) to HIGH) {
        set lenOfPattern to (power + 1)
        // Don't record until we have found the LOW value
        if ((foundLOW is false) and (j is equal to LOW)) {
            set foundLOW to true
        }
        if (j is equal to (2^(power + 1))) {
            // We are at the next power
            increment power
            // Start pattern over
            set numOfBits to 1
        }
        else if (j > (2^power)) {
            // If j is odd, increment numOfBits
            if ((1 bitAND j) is equal to 1) {
                increment numOfBits
            }
            else if (j modulus 4 == 0) {
                decrement numOfBits accordingly //Figure this one out yourself, please
            }
        }
        // Record if appropriate
        if (foundLOW is equal to true) {
            increment element numOfBits in array numbersWithPositiveBits
        }
    }
    // From here, derive your K values.
}
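For cross-checking the histogram that the pseudo-code builds, a direct brute-force version is easy to write in C++. This is only a sketch for short ranges, not a replacement for the pattern-based loop; bitCountHistogram is a hypothetical name, and the array has 65 slots so that every bit count from 0 to 64 has its own index.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// result[c] is how many integers in [low, high] have exactly c positive bits.
// Brute force over the whole range, so only practical when the range is short.
std::array<std::uint64_t, 65> bitCountHistogram(std::uint64_t low, std::uint64_t high) {
    assert(low >= 1 && low <= high);
    std::array<std::uint64_t, 65> histogram{};  // zero-initialised
    for (std::uint64_t j = low;; ++j) {
        int bits = 0;
        for (std::uint64_t x = j; x != 0; x &= x - 1) ++bits;  // one pass per set bit
        ++histogram[bits];
        if (j == high) break;  // checked here so that j never wraps around
    }
    return histogram;
}
```

For the range 8..15, the b row shown earlier contains one 1, three 2s, three 3s and one 4, which is what this function should report.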
You can solve this efficiently as follows:
ret = 0;
for (i = 1; i <= 64; i++) {
if (computeK(i) != desiredK) continue;
ret += numBelow(HIGH, i) - numBelow(LO - 1, i);
}
return ret;
The function numBelow(high, numSet) computes the number of integers less than or equal to high and greater than zero that have numSet bits set. To implement numBelow(high, numSet) efficiently, you can use something like the following:
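numBelow(high, numSet) {
    // Number of integers in [1, high] with exactly numSet bits set.
    if (high <= 0 || numSet <= 0) return 0;
    ret = 0;
    if (numBitsSet(high) == numSet) ret++;  // high itself
    for (t = floor(lg(high)); t >= 0 && numSet >= 0; t--) {
        if (((high >> t) & 1) == 0) continue;
        // Keep the bits of high above t, clear bit t, and place the
        // remaining numSet ones anywhere in the t lower bits.
        ret += nchoosek(t, numSet);  // 0 when numSet > t
        numSet--;
    }
    return ret;
}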
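The counting idea behind numBelow can be sketched as self-contained C++. The names binomial and countWithBitsUpTo are hypothetical, and the sketch assumes high < 2^61 (the thread mentions numbers of at most 60 binary digits) so that the binomial coefficients fit in 64 bits. For every set bit t of high, it keeps the bits above t, clears bit t, and counts the ways to place the remaining ones in the t lower positions.

```cpp
#include <cassert>
#include <cstdint>

// C(n, k) for 0 <= n <= 60; returns 0 when k is out of range.
std::uint64_t binomial(int n, int k) {
    if (k < 0 || k > n) return 0;
    if (k > n - k) k = n - k;  // use the smaller of k and n - k
    std::uint64_t result = 1;
    for (int j = 1; j <= k; ++j) {
        // After this step result equals C(n - k + j, j), so the division is exact.
        result = result * static_cast<std::uint64_t>(n - k + j) / j;
    }
    return result;
}

// Counts the integers in [1, high] that have exactly numSet bits set.
std::uint64_t countWithBitsUpTo(std::uint64_t high, int numSet) {
    assert(numSet >= 1);
    assert(high < (std::uint64_t{1} << 61));
    std::uint64_t total = 0;
    int remaining = numSet;  // ones still to be placed below the current prefix
    for (int t = 60; t >= 0 && remaining >= 0; --t) {
        if (((high >> t) & 1) == 0) continue;
        // Same prefix as high above bit t, bit t cleared, remaining ones placed
        // freely among the t lower positions.
        total += binomial(t, remaining);
        --remaining;
    }
    if (remaining == 0) ++total;  // high itself has exactly numSet bits set
    return total;
}
```

With this, the count for a range is countWithBitsUpTo(HIGH, n) - countWithBitsUpTo(LO - 1, n), which mirrors the loop shown before numBelow.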
This is a full working example in C++17:
#include <algorithm>
#include <cstdio>
#include <iostream>
#include <vector>
using namespace std;
#define BASE_MAX 61
typedef unsigned long long ll;
// combination[n][k] = C(n, k)
ll combination[BASE_MAX][BASE_MAX];
// NK[k - 1] lists the bit counts that give a number the value K = k
vector<vector<ll>> NK(4);
int count_bit(ll n) {
    int ret = 0;
    while (n) {
        if (n & 1) {
            ret++;
        }
        n >>= 1;
    }
    return ret;
}
int get_leftmost_bit_index(ll n) {
    int ret = 0;
    while (n > 1) {
        ret++;
        n >>= 1;
    }
    return ret;
}
void pre_calculate() {
    // Pascal's triangle
    for (int i = 0; i < BASE_MAX; i++)
        combination[i][0] = 1;
    for (int i = 1; i < BASE_MAX; i++) {
        for (int j = 1; j < BASE_MAX; j++) {
            combination[i][j] = combination[i - 1][j] + combination[i - 1][j - 1];
        }
    }
    NK[0].push_back(1);
    for (int i = 2; i < BASE_MAX; i++) {
        int bitCount = count_bit(i);
        if (find(NK[0].begin(), NK[0].end(), bitCount) != NK[0].end()) {
            NK[1].push_back(i);
        }
    }
    for (int i = 1; i < BASE_MAX; i++) {
        int bitCount = count_bit(i);
        if (find(NK[1].begin(), NK[1].end(), bitCount) != NK[1].end()) {
            NK[2].push_back(i);
        }
    }
    for (int i = 1; i < BASE_MAX; i++) {
        int bitCount = count_bit(i);
        if (find(NK[2].begin(), NK[2].end(), bitCount) != NK[2].end()) {
            NK[3].push_back(i);
        }
    }
}
ll how_many_numbers_have_n_bit_in_range(ll lo, ll hi, int bit_count) {
    if (bit_count == 0) {
        if (lo == 0) return 1;
        else return 0;
    }
    if (lo == hi) {
        return count_bit(lo) == bit_count;
    }
    int lo_leftmost = get_leftmost_bit_index(lo); // 100 -> 2
    int hi_leftmost = get_leftmost_bit_index(hi); // 1101 -> 3
    if (lo_leftmost == hi_leftmost) {
        // Same top bit: strip it and look for one bit fewer
        return how_many_numbers_have_n_bit_in_range(lo & ~(1ULL << lo_leftmost), hi & ~(1ULL << hi_leftmost),
                                                    bit_count - 1);
    }
    if (lo != 0) {
        return how_many_numbers_have_n_bit_in_range(0, hi, bit_count) -
               how_many_numbers_have_n_bit_in_range(0, lo - 1, bit_count);
    }
    // lo == 0: everything below the top bit of hi, plus the part that shares it
    ll ret = combination[hi_leftmost][bit_count];
    ret += how_many_numbers_have_n_bit_in_range(1ULL << hi_leftmost, hi, bit_count);
    return ret;
}
int main(void) {
    pre_calculate();
    while (true) {
        ll LO, HI;
        int X;
        // ll is unsigned, so read with %llu; stop on end of input as well
        if (scanf("%llu%llu%d", &LO, &HI, &X) != 3)
            break;
        if (LO == 0 && HI == 0 && X == 0)
            break;
        switch (X) {
            case 0:
                cout << (LO == 1) << endl;
                break;
            case 1: {
                // Count the powers of 2 (starting at 2) inside [LO, HI]
                int ret = 0;
                ll power2 = 1;
                for (int i = 0; i < BASE_MAX; i++) {
                    power2 *= 2;
                    if (power2 > HI)
                        break;
                    if (power2 >= LO)
                        ret++;
                }
                cout << ret << endl;
                break;
            }
            case 2:
            case 3:
            case 4: {
                vector<ll> &addedBitsSizes = NK[X - 1];
                ll ret = 0;
                for (auto bit_count_to_added: addedBitsSizes) {
                    ll result = how_many_numbers_have_n_bit_in_range(LO, HI, bit_count_to_added);
                    ret += result;
                }
                cout << ret << endl;
                break;
            }
            default:
                cout << 0 << endl;
                break;
        }
    }
    return 0;
}

Resources