Generate a random int given a random bit function

Suppose you're given an int randBit() function which returns 0 or 1, uniformly distributed.
Write a randNumber(int max) function.
This is my implementation, but I can't prove/disprove that it's right.
// max number of bits
int i = (int)Math.floor(Math.log(max) / Math.log(2)) + 1;
int ret = randBit();
while (i-- > 0) {
ret = ret << 1 | randBit();
}
return ret;
The basic idea I had is to find the number of bits needed to represent max, then generate the number by repeatedly appending a random LSB until that bit length is reached.

The approach of filling an int with random bits is the right way in my opinion. However, since your algorithm only works when max + 1 is a power of 2, and the loop is off by one (the initial randBit() plus i more calls produce i + 1 bits), I'd suggest this modification:
// max number of bits
int i = (int)Math.floor(Math.log(max) / Math.log(2)) + 1;
int rnd = 0;
int mask = 1;
while (i-- > 0) {
rnd = rnd << 1 | randBit();
mask <<= 1; // or: mask *= 2
}
double q = (double)rnd / mask; // range is [0, 1)
return (int)((max + 1) * q);
Let's take a look at this:
i will always be equal to the number of bits that max occupies. When the loop is finished, rnd will contain that number of bits filled randomly with 0 or 1, and mask-1 will contain that number of bits filled with 1s. So it's safe to assume that the quotient of rnd and mask-1 is spread evenly between 0 and 1. Multiplying it by max would yield results in the range between 0 and max, also evenly spread, in terms of floating-point/real values.
Now this result has to be mapped to integers, and of course you'd want them also to be uniformly distributed. The only catch here is the 1. If the quotient of rnd and mask-1 is exactly 1, there'd be an edge case that would cause trouble when scaling to the desired result range: There would be 0 .. max-1 values uniformly distributed, but max would be a rare exception.
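To take care of this condition, the quotient has to be built so that it ranges from 0 to 1, with 1 excluded. This is achieved by rnd / mask. This range can easily be mapped to the integers 0 .. max by multiplying by max+1 and casting to int. (The mapping is exactly uniform only when max+1 is a power of 2; otherwise the 2^i possible values of rnd cannot be split evenly among the max+1 results, so some results are slightly more likely than others.)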
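One way to get an exactly uniform result from randBit() alone is rejection sampling: draw just enough bits to cover max, and start over whenever the value comes out larger than max. Scaling cannot do this in general, because 2^i equally likely values of rnd cannot be split evenly into max + 1 results unless max + 1 is a power of 2 (for max = 2, two of the four possible rnd values end up as 0). The sketch below is in C and assumes max >= 0; the randBit() in it is a hypothetical stand-in built on rand() so that the example compiles on its own, and should be replaced by the real one.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the question's randBit(): 0 or 1 with equal
   probability, taken from the upper or lower half of rand()'s range
   (assumes RAND_MAX + 1 is even, as it is when RAND_MAX = 2^k - 1). */
static int randBit(void) {
    return rand() > RAND_MAX / 2;
}

/* Returns a uniformly distributed integer in [0, max]; assumes max >= 0.
   Rejection sampling: draw just enough bits to cover max and retry when
   the value is too large. */
int randNumber(int max) {
    int bits = 0;
    for (int v = max; v > 0; v >>= 1)
        bits++;                        /* number of bits needed to write max */

    for (;;) {
        int r = 0;
        for (int i = 0; i < bits; i++)
            r = (r << 1) | randBit();  /* append one random bit */
        if (r <= max)
            return r;                  /* every value in [0, max] equally likely */
        /* r > max: discard it and draw a fresh set of bits */
    }
}
```

The loop has no fixed bound, but at least half of the 2^bits possible values are <= max, so each attempt is accepted with probability above 1/2 and fewer than two attempts are needed on average.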

Related

Change the range of IRAND() in Fortran 77

This is a follow on from a previously posted question:
How to generate a random number in C?
I wish to be able to generate a random number from within a particular range, such as 1 to 6 to mimic the sides of a die.
How would I go about doing this?
All the answers so far are mathematically wrong. Returning rand() % N does not uniformly give a number in the range [0, N) unless N divides the length of the interval into which rand() returns (which, since that length is usually a power of 2, means N must be a power of 2 too). Furthermore, one has no idea whether the moduli of rand() are independent: it's possible that they go 0, 1, 2, ..., which is uniform but not very random. The only assumption it seems reasonable to make is that rand() puts out a Poisson distribution: any two nonoverlapping subintervals of the same size are equally likely and independent. For a finite set of values, this implies a uniform distribution and also ensures that the values of rand() are nicely scattered.
This means that the only correct way of changing the range of rand() is to divide it into boxes; for example, if RAND_MAX == 11 and you want a range of 1..6, you should assign {0,1} to 1, {2,3} to 2, and so on. These are disjoint, equally-sized intervals and thus are uniformly and independently distributed.
The suggestion to use floating-point division is mathematically plausible but suffers from rounding issues in principle. Perhaps double is high-enough precision to make it work; perhaps not. I don't know and I don't want to have to figure it out; in any case, the answer is system-dependent.
The correct way is to use integer arithmetic. That is, you want something like the following:
#include <stdlib.h> // For random(), RAND_MAX
// Assumes 0 <= max <= RAND_MAX
// Returns in the closed interval [0, max]
long random_at_most(long max) {
unsigned long
// max <= RAND_MAX < ULONG_MAX, so this is okay.
num_bins = (unsigned long) max + 1,
num_rand = (unsigned long) RAND_MAX + 1,
bin_size = num_rand / num_bins,
defect = num_rand % num_bins;
long x;
do {
x = random();
}
// This is carefully written not to overflow
while (num_rand - defect <= (unsigned long)x);
// Truncated division is intentional
return x/bin_size;
}
The loop is necessary to get a perfectly uniform distribution. For example, if you are given random numbers from 0 to 2 and you want only ones from 0 to 1, you just keep pulling until you don't get a 2; it's not hard to check that this gives 0 or 1 with equal probability. This method is also described in the link that nos gave in their answer, though coded differently. I'm using random() rather than rand() as it has a better distribution (as noted by the man page for rand()).
If you want to get random values outside the default range [0, RAND_MAX], then you have to do something tricky. Perhaps the most expedient is to define a function random_extended() that pulls n bits (using random_at_most()) and returns in [0, 2**n), and then apply random_at_most() with random_extended() in place of random() (and 2**n - 1 in place of RAND_MAX) to pull a random value less than 2**n, assuming you have a numerical type that can hold such a value. Finally, of course, you can get values in [min, max] using min + random_at_most(max - min), including negative values.
Following on from Ryan Reich's answer, I thought I'd offer my cleaned-up version. The first bounds check isn't required given the second bounds check, and I've made it iterative rather than recursive. It returns values in the range [min, max], where max >= min and 1+max-min < RAND_MAX.
unsigned int rand_interval(unsigned int min, unsigned int max)
{
unsigned int r; /* unsigned, to match limit in the comparison below */
const unsigned int range = 1 + max - min;
const unsigned int buckets = RAND_MAX / range;
const unsigned int limit = buckets * range;
/* Create equal size buckets all in a row, then fire randomly towards
* the buckets until you land in one of them. All buckets are equally
* likely. If you land off the end of the line of buckets, try again. */
do
{
r = rand();
} while (r >= limit);
return min + (r / buckets);
}
Here is a formula for when you know the min and max values of a range and want to generate numbers in that range, inclusive:
r = (rand() % (max + 1 - min)) + min
unsigned int
randr(unsigned int min, unsigned int max)
{
double scaled = (double)rand() / ((double)RAND_MAX + 1.0); /* in [0, 1), so the result never exceeds max */
return (max - min +1)*scaled + min;
}
Wouldn't you just do:
srand(time(NULL));
int r = ( rand() % 6 ) + 1;
% is the modulus operator: it divides by 6 and returns the remainder, which is 0 to 5, so adding 1 gives 1 to 6.
For those who understand the bias problem but can't stand the unpredictable run-time of rejection-based methods, this series produces a progressively less biased random integer in the [0, n-1] interval:
r = n / 2;
r = (rand() * n + r) / (RAND_MAX + 1);
r = (rand() * n + r) / (RAND_MAX + 1);
r = (rand() * n + r) / (RAND_MAX + 1);
...
It does so by synthesising a high-precision fixed-point random number of i * log_2(RAND_MAX + 1) bits (where i is the number of iterations) and performing a long multiplication by n.
When the number of bits is sufficiently large compared to n, the bias becomes immeasurably small.
It does not matter if RAND_MAX + 1 is less than n (as in this question), or if it is not a power of two, but care must be taken to avoid integer overflow if RAND_MAX * n is large.
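A minimal C sketch of that series, assuming rand() as the source, n >= 1 and at least one round; the function name low_bias_range and the rounds parameter are illustrative. Doing the arithmetic in uint64_t is one way to take the overflow care mentioned above: since RAND_MAX fits in a (32-bit) int and n < 2^32, rand() * n + r stays below 2^64.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns a value in [0, n - 1] whose bias shrinks with each extra round,
   following r = n / 2; r = (rand() * n + r) / (RAND_MAX + 1) repeated.
   Assumes n >= 1 and rounds >= 1. */
uint32_t low_bias_range(uint32_t n, int rounds) {
    const uint64_t base = (uint64_t)RAND_MAX + 1;
    uint64_t r = n / 2;                         /* initial carry, as in the series */
    for (int i = 0; i < rounds; i++)
        r = ((uint64_t)rand() * n + r) / base;  /* r always stays below n */
    return (uint32_t)r;
}
```

Each extra round multiplies the number of equally likely underlying values by RAND_MAX + 1, which is why the bias keeps shrinking, while the run time stays fixed at exactly rounds calls to rand().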
Here is a slightly simpler algorithm than Ryan Reich's solution:
#include <stdint.h> // uint32_t, uint64_t
#include <stdlib.h> // rand(), RAND_MAX
/// Begin and end are *inclusive*; => [begin, end]
uint32_t getRandInterval(uint32_t begin, uint32_t end) {
uint32_t range = (end - begin) + 1;
uint32_t limit = ((uint64_t)RAND_MAX + 1) - (((uint64_t)RAND_MAX + 1) % range);
/* Imagine range-sized buckets all in a row, then fire randomly towards
* the buckets until you land in one of them. All buckets are equally
* likely. If you land off the end of the line of buckets, try again. */
uint32_t randVal = rand();
while (randVal >= limit) randVal = rand();
/// Return the position you hit in the bucket + begin as random number
return (randVal % range) + begin;
}
Example (RAND_MAX := 16, begin := 2, end := 7)
=> range := 6 (1 + end - begin)
=> limit := 12 (RAND_MAX + 1) - ((RAND_MAX + 1) % range)
The limit is always a multiple of the range,
so we can split it into range-sized buckets:
Possible-rand-output:   0  1  2  3  4  5 |  6  7  8  9 10 11 | 12 13 14 15 16
Buckets:                0  1  2  3  4  5 |  0  1  2  3  4  5 |  X  X  X  X  X
Buckets + begin:        2  3  4  5  6  7 |  2  3  4  5  6  7 |  X  X  X  X  X
1st call to rand() => 13
→ 13 is not in the bucket-range anymore (>= limit), while-condition is true
→ retry...
2nd call to rand() => 7
→ 7 is in the bucket-range (< limit), while-condition is false
→ Get the corresponding bucket-value 1 (randVal % range) and add begin
=> 3
In order to avoid the modulo bias (suggested in other answers) you can always use:
arc4random_uniform(MAX-MIN)+MIN
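Where "MAX" is the exclusive upper bound and "MIN" is the lower bound: arc4random_uniform(n) returns a value in [0, n), so use arc4random_uniform(MAX-MIN+1)+MIN if MAX itself should be possible. For example, for numbers from 10 to 19: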
arc4random_uniform(20-10)+10
arc4random_uniform(10)+10
Simple solution and better than using "rand() % N".
While Ryan is correct, the solution can be much simpler based on what is known about the source of the randomness. To re-state the problem:
There is a source of randomness, outputting integer numbers in range [0, MAX) with uniform distribution.
The goal is to produce uniformly distributed random integer numbers in range [rmin, rmax] where 0 <= rmin < rmax < MAX.
In my experience, if the number of bins (or "boxes") is significantly smaller than the range of the original numbers, and the original source is cryptographically strong - there is no need to go through all that rigamarole, and simple modulo division would suffice (like output = rnd.next() % (rmax+1), if rmin == 0), and produce random numbers that are distributed uniformly "enough", and without any loss of speed. The key factor is the randomness source (i.e., kids, don't try this at home with rand()).
Here's an example/proof of how it works in practice. I wanted to generate random numbers from 1 to 22, having a cryptographically strong source that produced random bytes (based on Intel RDRAND). The results are:
Rnd distribution test (22 boxes, numbers of entries in each box):
1: 409443 4.55%
2: 408736 4.54%
3: 408557 4.54%
4: 409125 4.55%
5: 408812 4.54%
6: 409418 4.55%
7: 408365 4.54%
8: 407992 4.53%
9: 409262 4.55%
10: 408112 4.53%
11: 409995 4.56%
12: 409810 4.55%
13: 409638 4.55%
14: 408905 4.54%
15: 408484 4.54%
16: 408211 4.54%
17: 409773 4.55%
18: 409597 4.55%
19: 409727 4.55%
20: 409062 4.55%
21: 409634 4.55%
22: 409342 4.55%
total: 100.00%
This is as close to uniform as I need for my purpose (fair dice throw, generating cryptographically strong codebooks for WWII cipher machines such as http://users.telenet.be/d.rijmenants/en/kl-7sim.htm, etc). The output does not show any appreciable bias.
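To put a number on "uniformly enough": with a source of m equally likely values, reducing by % n gives every result either floor(m / n) or floor(m / n) + 1 of the source values. The helper below (modulo_bias, an illustrative name) computes the resulting relative skew between the most and least likely results, which shows how the skew depends on the size of the source range m as well as on n.

```c
#include <stdint.h>

/* Relative difference between the most and least likely results of
   (source % n), for a uniform source with m equally likely values.
   Assumes 1 <= n <= m. Returns 0.0 when n divides m exactly. */
double modulo_bias(uint64_t m, uint64_t n) {
    uint64_t low = m / n;                /* source values per result, at least */
    uint64_t high = low + (m % n != 0);  /* the first m % n results get one more */
    return (double)(high - low) / (double)low;
}
```

For example, a single random byte (m = 256) reduced into 22 boxes makes some results about 9% more likely than others (modulo_bias(256, 22) is 1/11), while a 32-bit source gives a skew of about 5e-9, far below what roughly nine million draws like the ones above could detect.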
Here's the source of the cryptographically strong (true) random number generator:
Intel Digital Random Number Generator
and sample code that produces 64-bit (unsigned) random numbers.
int rdrand64_step(unsigned long long int *therand)
{
unsigned long long int foo;
int cf_error_status;
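/* volatile: the asm has no inputs, so otherwise the compiler may merge or drop calls.
   On success RDRAND sets CF, so cmovae leaves edx = 1; on failure it copies rax (0), so edx = 0. */
__asm__ __volatile__(
"rdrand %%rax\n\t"
"mov $1, %%edx\n\t"
"cmovae %%rax, %%rdx\n\t"
"mov %%edx, %1\n\t"
"mov %%rax, %0"
: "=r"(foo), "=r"(cf_error_status)
:
: "%rax", "%rdx", "cc");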
*therand = foo;
return cf_error_status;
}
I compiled it on Mac OS X with clang-6.0.1 (straight), and with gcc-4.8.3 using the "-Wa,-q" flag (because GAS does not support these new instructions).
As said before, modulo isn't sufficient because it skews the distribution. Here's my code, which masks off bits and uses them to ensure the distribution isn't skewed.
#include <stdint.h>
#include <stdlib.h>
static uint32_t randomInRange(uint32_t a,uint32_t b) {
uint32_t v;
uint32_t range;
uint32_t upper;
uint32_t lower;
uint32_t mask;
if(a == b) {
return a;
}
if(a > b) {
upper = a;
lower = b;
} else {
upper = b;
lower = a;
}
range = upper - lower;
mask = 0;
//XXX calculate range with log and mask? nah, too lazy :).
while(1) {
if(mask >= range) {
break;
}
mask = (mask << 1) | 1;
}
while(1) {
v = rand() & mask;
if(v <= range) {
return lower + v;
}
}
}
The following simple code lets you look at the distribution:
#include <stdint.h>
#include <stdio.h>
int main() {
unsigned long long int i;
unsigned int n = 10;
unsigned int numbers[n];
for (i = 0; i < n; i++) {
numbers[i] = 0;
}
for (i = 0 ; i < 10000000 ; i++){
uint32_t r = randomInRange(0, n - 1); // named r so it doesn't shadow rand()
if(r >= n){
printf("bug: rand out of range %u\n",(unsigned int)r);
return 1;
}
numbers[r] += 1;
}
for(i = 0; i < n; i++) {
printf("%llu: %u\n",i,numbers[i]);
}
return 0;
}
This macro returns a floating-point number in the closed range [0, 1]:
#define rand01() (((double)random())/((double)(RAND_MAX)))

Given a binary number, how to find the nth set bit from the right in O(1) time?

Let's say x = 1110 (14 in decimal) and I want to find the 2nd set bit from the right, 0100 (4 in decimal).
Another example: let's say x = 10110010 (178 in decimal) and I want the 3rd set bit from the right, i.e., 00100000 (32 in decimal).
How to find it? Is there a hack?
Subtracting one from a number will clear the least-significant bit which was set, while setting bits below that. ANDing with the original number will then leave a number which was equal to the original except with the original lowest set bit clear. This procedure may be iterated N times to yield a number with the lowest N set bits clear. The bit which is changed by the Nth iteration (if any) will be the Nth lowest bit that was set in the original.
Assuming a two's complement signed 32-bit integer called number is the input (hence only counting bits 0 to 30 in the for loop):
int number = (1 << 3) | 1; // this is the input, it can be whatever you like
int currentLsbCount = 0;
int desiredLsbCount = 2; // this is your n
int foundLsb = 0;
int foundLsbIndex = 0;
for (int i = 0; i < 31; i++)
{
int bit = (number >> i) & 1;
if (bit == 1)
{
++currentLsbCount;
}
if (currentLsbCount == desiredLsbCount)
{
foundLsb = number & (1 << i);
foundLsbIndex = i;
break;
}
}
foundLsb will hold the value of that bit, or zero if the input has fewer than desiredLsbCount set bits (including when the input is zero); foundLsbIndex will hold the index of the bit.
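The clear-the-lowest-set-bit procedure described at the start of this answer can also be written directly with x & (x - 1), without testing every bit position. This is a sketch in C for unsigned 32-bit values (nth_set_bit is an illustrative name); it takes n steps for the nth set bit rather than O(1), but it skips over runs of zero bits.

```c
#include <stdint.h>

/* Returns the nth set bit of x counting from the least significant end
   (n = 1 is the lowest set bit) as a single-bit mask, or 0 when x has
   fewer than n set bits or n == 0. */
uint32_t nth_set_bit(uint32_t x, unsigned n) {
    if (n == 0)
        return 0;
    /* x & (x - 1) clears the lowest set bit; do that n - 1 times. */
    for (unsigned i = 1; i < n && x != 0; i++)
        x &= x - 1;
    /* x & (~x + 1) isolates the lowest remaining set bit (0 if x == 0). */
    return x & (~x + 1u);
}
```

With the question's examples, nth_set_bit(14, 2) gives 4 and nth_set_bit(178, 3) gives 32.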
As far as I know you would have to iterate. There is no quicker method than looping through the bits. You could add some skip logic in, but it would not improve the worst case timing. For instance:
if ((number & ((1 << x) - 1)) == 0)
{
// the bottom x bits are zero...
}
This would increase the number of operations for the worst case.
In VB.NET, I'd possibly do the following:
Private Function ReturnBit(input As Long, num As Long) As Long
Dim iResult As Long = 0 'Counts set bits.
Dim work As Long = input 'Working copy of input.
'Looping from the LSB to the MSB of a byte. Adjust for desired
'length, 15 for 2 bytes, 31 for 4 bytes, etc.
For i As Integer = 0 To 7
'If the working variable is 0, the input does not contain as
'many set bits as required. Return -1 if you wish.
If work = 0 Then Return 0
'Add the now LSB if 1, 0 otherwise.
iResult += (work And 1)
'iResult contains the number of set bits now. If this is
'the requested number, return this number. If you're just after
'the position, just return i instead. Instead of 2^i it could be
'more efficient to use 1<<i, but I'd rely on the compiler for
'this.
If iResult = num Then Return CLng(2 ^ i)
'Remove the LSB from the working copy.
work >>= 1
Next
Return 0 'Not enough set bits in input.
End Function

Non repetitive random number generator in c

I want to make a program that will give me 4 random numbers in the range 1 - 20 without any of them being the same. It does give me 4 different random numbers but every couple of tries 2 numbers are the same. I don't want that.
Here's my code:
int main(){
int g;
srand(time(0));
start:;
scanf("%d",&g);
switch(g){
case 1:RNG_4_10();
break;
default:exit(0);
break;
}
goto start;
}
int RNG_4_10(){
int a,n,i,c;
for(c=0;c<10;c++){
printf("\n");
for(i=0;i<4;i++){
a = (rand() % 20 + 1); //give a random value to a;
n = a; //assign n the value of a;
while(a == n){
a = rand() % 20 + 1;
}
printf("%d\t",a);
}
}
}
Also, I know that RNGs have a probability of repeating numbers, and in theory they could generate the same number forever, but what I don't get is how I can get two identical numbers in the same run. I added that while loop to avoid that. Is this code wrong, or is my understanding awful?
Most random number generators will have a probability of repeating values. If they didn't their behaviour would be less random by various measures.
If you want four random values in the range 1-20, then create an array of 20 elements with all those values, and shuffle it with the help of your random number generator. Then pick the first four values.
A common technique to shuffle is (in pseudocode)
/* shuffle an array of n elements */
for (i = n-1; i > 0; --i)
{
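swap(array[i], array[gen(i + 1)]); /* zero-based array indexing; the index is drawn from 0..i */
}
where gen(k) returns a suitably random value between 0 and k-1, possibly with repetition. Drawing from 0..i (rather than from the whole array on every step) is what makes this the Fisher-Yates shuffle; using gen(n) at every step produces some orderings more often than others.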
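A C sketch of that approach for the question's case (four distinct values from 1 to 20). The helper gen() is hypothetical: it plays the role of the gen above, using rejection sampling on rand() so that each index is equally likely. pick_distinct is likewise an illustrative name, and the fixed pool size of 1000 is an assumption of the sketch.

```c
#include <stdlib.h>

/* Hypothetical gen(k): uniform integer in [0, k - 1], assuming
   1 <= k <= RAND_MAX. Values of rand() above the largest multiple of k
   are rejected so that % k is unbiased. */
static int gen(int k) {
    int excess = (RAND_MAX % k + 1) % k;  /* (RAND_MAX + 1) % k without overflow */
    int r;
    do {
        r = rand();
    } while (r > RAND_MAX - excess);
    return r % k;
}

/* Stores `count` distinct values from lo..hi (inclusive) in out[],
   assuming 1 <= count <= hi - lo + 1 <= 1000. */
void pick_distinct(int lo, int hi, int count, int out[]) {
    int pool[1000];
    int n = hi - lo + 1;
    for (int i = 0; i < n; i++)
        pool[i] = lo + i;
    /* Fisher-Yates: swap each position with a random index at or below it */
    for (int i = n - 1; i > 0; --i) {
        int j = gen(i + 1);
        int tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
    }
    for (int i = 0; i < count; i++)
        out[i] = pool[i];
}
```

Calling pick_distinct(1, 20, 4, out) after srand() fills out[0..3] with four different numbers, since they are taken from distinct positions of a permutation of 1..20.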

A problem from a programming competition... Digit Sums

I need help solving problem N from this earlier competition:
Problem N: Digit Sums
Given 3 positive integers A, B and C, find how many positive integers less than or equal to A, when expressed in base B, have digits which sum to C.
Input will consist of a series of lines, each containing three integers, A, B and C, 2 ≤ B ≤ 100, 1 ≤ A, C ≤ 1,000,000,000. The numbers A, B and C are given in base 10 and are separated by one or more blanks. The input is terminated by a line containing three zeros.
Output will be the number of numbers, for each input line (it must be given in base 10).
Sample input
100 10 9
100 10 1
750000 2 2
1000000000 10 40
100000000 100 200
0 0 0
Sample output
10
3
189
45433800
666303
The relevant rules:
Read all input from the keyboard, i.e. use stdin, System.in, cin or equivalent. Input will be redirected from a file to form the input to your submission.
Write all output to the screen, i.e. use stdout, System.out, cout or equivalent. Do not write to stderr. Do NOT use, or even include, any module that allows direct manipulation of the screen, such as conio, Crt or anything similar. Output from your program is redirected to a file for later checking. Use of direct I/O means that such output is not redirected and hence cannot be checked. This could mean that a correct program is rejected!
Unless otherwise stated, all integers in the input will fit into a standard 32-bit computer word. Adjacent integers on a line will be separated by one or more spaces.
Of course, it's fair to say that I should learn more before trying to solve this, but I'd really appreciate it if someone here told me how it's done.
Thanks in advance, John.
Other people pointed out the trivial solution: iterate over all numbers from 1 to A. But this problem can actually be solved in nearly constant time: O(length of A), which is O(log(A)).
Code provided is for base 10. Adapting it for arbitrary base is trivial.
To reach the above time estimate, you need to add memoization to the recursion. Let me know if you have questions about that part.
Now, the recursive function itself. It's written in Java, but everything should work in C#/C++ without any changes. It's big, but mostly because of the comments where I try to clarify the algorithm.
// returns amount of numbers strictly less than 'num' with sum of digits 'sum'
// pay attention to word 'strictly'
int count(int num, int sum) {
// no numbers with negative sum of digits
if (sum < 0) {
return 0;
}
int result = 0;
// imagine, 'num' == 1234
// let's check numbers 1233, 1232, 1231, 1230 manually
while (num % 10 > 0) {
--num;
// check if current number is good
if (sumOfDigits(num) == sum) {
// one more result
++result;
}
}
if (num == 0) {
// zero reached, no more numbers to check
return result;
}
num /= 10;
// Using example above (1234), now we're left with numbers
// strictly less than 1230 to check (1..1229)
// It means, any number less than 123 with arbitrary digit appended to the right
// E.g., if this digit in the right (last digit) is 3,
// then sum of the other digits must be "sum - 3"
// and we need to add to result 'count(123, sum - 3)'
// let's iterate over all possible values of last digit
for (int digit = 0; digit < 10; ++digit) {
result += count(num, sum - digit);
}
return result;
}
Helper function
// returns sum of digits, plain and simple
int sumOfDigits(int x) {
int result = 0;
while (x > 0) {
result += x % 10;
x /= 10;
}
return result;
}
Now, let's write a little tester
int A = 12345;
int C = 13;
// recursive solution
System.out.println(count(A + 1, C));
// brute-force solution
int total = 0;
for (int i = 1; i <= A; ++i) {
if (sumOfDigits(i) == C) {
++total;
}
}
System.out.println(total);
You can write more comprehensive tester checking all values of A, but overall solution seems to be correct. (I tried several random A's and C's.)
Don't forget, you can't test the solution for A == 1000000000 without memoization: it'll run too long. But with memoization, you can test it even for A == 10^1000.
Edit: just to prove the concept, here's a poor man's memoization (in Java; in other languages hashtables are declared differently). But if you want to learn something, it might be better to try to do it yourself.
// hold values here
private Map<String, Integer> mem = new HashMap<String, Integer>(); // needs java.util.Map and java.util.HashMap
int count(int num, int sum) {
// no numbers with negative sum of digits
if (sum < 0) {
return 0;
}
String key = num + " " + sum;
if (mem.containsKey(key)) {
return mem.get(key);
}
// ...
// continue as above...
// ...
mem.put(key, result);
return result;
}
Here's the same memoized recursive solution that Rybak posted, but with a simpler implementation, in my humble opinion:
HashMap<String, Integer> cache = new HashMap<String, Integer>();
int count(int bound, int base, int sum) {
// No negative digit sums.
if (sum < 0)
return 0;
// Handle one digit case.
if (bound < base)
return (sum <= bound) ? 1 : 0;
String key = bound + " " + base + " " + sum; // include base so one cache can serve several input lines
if (cache.containsKey(key))
return cache.get(key);
int count = 0;
for (int digit = 0; digit < base; digit++)
count += count((bound - digit) / base, base, sum - digit);
cache.put(key, count);
return count;
}
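For comparison, here is a bottom-up C sketch (a different formulation from the recursion above) that needs no hash map: it precomputes ways[k][s], the number of k-digit base-B strings whose digits sum to s, and then walks the base-B digits of A from the most significant end. The function and table names are illustrative, and the array bounds assume the problem's limits (2 ≤ B ≤ 100, A and C up to 1,000,000,000).

```c
#include <string.h>

#define MAX_DIGITS 32  /* A <= 1e9 has at most 30 digits, even in base 2 */
#define MAX_SUM 512    /* len * (B - 1) is at most 495 within the limits */

/* Counts x with 1 <= x <= A whose base-B digits sum to C (C >= 1). */
long long count_digit_sums(long long A, int B, long long C) {
    int digits[MAX_DIGITS];  /* base-B digits of A, least significant first */
    int len = 0;
    for (long long v = A; v > 0; v /= B)
        digits[len++] = (int)(v % B);

    if (C > (long long)len * (B - 1))
        return 0;            /* C is out of reach for len-digit numbers */
    int target = (int)C;

    /* ways[k][s]: k-digit strings (leading zeros allowed) with digit sum s */
    static long long ways[MAX_DIGITS + 1][MAX_SUM];
    memset(ways, 0, sizeof ways);
    ways[0][0] = 1;
    for (int k = 1; k <= len; k++)
        for (int s = 0; s <= target; s++)
            for (int d = 0; d < B && d <= s; d++)
                ways[k][s] += ways[k - 1][s - d];

    long long total = 0;
    int remaining = target;  /* digit sum still needed below the prefix */
    for (int i = len - 1; i >= 0 && remaining >= 0; i--) {
        /* The prefix matches A above position i; a smaller digit d here
           leaves the i lower positions free. */
        for (int d = 0; d < digits[i] && d <= remaining; d++)
            total += ways[i][remaining - d];
        remaining -= digits[i];  /* or copy A's digit and keep going */
    }
    if (remaining == 0)
        total += 1;              /* A itself has digit sum C */
    return total;
}
```

A complete submission would wrap this in a loop that reads A, B and C with scanf until it sees 0 0 0 and prints each count; the work per line is at most about len * C * B steps, which stays small because C is capped at len * (B - 1).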
This is not the complete solution (no input parsing). To get the number in base B, repeatedly take the modulo B, and then divide by B until the result is 0. This effectively computes the base-B digit from the right, and then shifts the number right.
int A,B,C; // from input
int count = 0;
for (int x=1; x<=A; x++) // A itself is included
{
int sumDigits = 0;
int v = x;
while (v!=0) {
sumDigits += (v % B);
v /= B;
}
if (sumDigits==C)
count++;
}
cout << count << endl;
This is a brute force approach. It may be possible to compute this quicker by determining which sets of base B digits add up to C, arranging these in all permutations that are less than A, and then working backwards from that to create the original number.
Yum.
Try this:
int number, digitSum, resultCounter = 0;
for(int i=1; i<=A; i++)
{
number = i; //to avoid screwing up our counter
digitSum = 0;
while(number > 1)
{
//this is the next "digit" of the number as it would be in base B;
//works with any base including 10.
digitSum += (number % B);
//remove this digit from the number (integer division by B), rinse, repeat
number /= B;
}
digitSum += number;
//Does the sum match?
if(digitSum == C)
resultCounter++;
}
That's your basic algorithm for one line. Now you wrap this in another For loop for each input line you received, preceded by the input collection phase itself. This process can be simplified, but I don't feel like coding your entire answer to see if my algorithm works, and this looks right whereas the simpler tricks are harder to pass by inspection.
The way this works is by modulo dividing by powers of the base. Simple example, 1234 in base 10:
1234 % 10 = 4
1234 / 10 = 123 //integer division truncates any fraction
123 % 10 = 3 //sum is 7
123 / 10 = 12
12 % 10 = 2 //sum is 9
12 / 10 = 1 //end condition, add this and the sum is 10
A harder example to figure out by inspection would be the same number in base 12:
1234 % 12 = 10 //you can call it "A" like in hex, but we need a sum anyway
1234 / 12 = 102
102 % 12 = 6 // sum 16
102/12 = 8
8 % 12 = 8 //sum 24
8 / 12 = 0 //end condition, sum still 24.
So 1234 in base 12 would be written 86A. Check the math:
8*12^2 + 6*12 + 10 = 1152 + 72 + 10 = 1234
Have fun wrapping the rest of the code around this.

How to find a binary logarithm very fast? (O(1) at best)

Is there any very fast method to find a binary logarithm of an integer number? For example, given a number
x=52656145834278593348959013841835216159447547700274555627155488768 such algorithm must find y=log(x,2) which is 215. x is always a power of 2.
The problem seems to be really simple. All that is required is to find the position of the most significant 1 bit. There is a well-known method FloorLog, but it is not very fast, especially for very long multi-word integers.
What is the fastest method?
A quick hack: Most floating-point number representations automatically normalise values, meaning that they effectively perform the loop Christoffer Hammarström mentioned in hardware. So simply converting from an integer to FP and extracting the exponent should do the trick, provided the numbers are within the FP representation's exponent range! (In your case, your integer input requires multiple machine words, so multiple "shifts" will need to be performed in the conversion.)
If the integers are stored in a uint32_t a[], then my obvious solution would be as follows:
Run a linear search over a[] to find the highest-valued non-zero uint32_t value a[i] in a[] (test using uint64_t for that search if your machine has native uint64_t support)
Apply the bit twiddling hacks to find the binary log b of the uint32_t value a[i] you found in step 1.
Evaluate 32*i+b.
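Those three steps might look like this in C, assuming the number is stored least significant word first (a[0] holds the lowest 32 bits). log2_words and log2_u32 are illustrative names, and log2_u32 uses a shift-based binary search, which is one of several bit-twiddling options for step 2.

```c
#include <stddef.h>
#include <stdint.h>

/* floor(log2(x)) for x > 0, by halving the shift each step (16, 8, 4, 2, 1) */
static int log2_u32(uint32_t x) {
    int r = 0;
    if (x >= UINT32_C(1) << 16) { x >>= 16; r += 16; }
    if (x >= UINT32_C(1) << 8)  { x >>= 8;  r += 8; }
    if (x >= UINT32_C(1) << 4)  { x >>= 4;  r += 4; }
    if (x >= UINT32_C(1) << 2)  { x >>= 2;  r += 2; }
    if (x >= UINT32_C(1) << 1)  { r += 1; }
    return r;
}

/* floor(log2) of the multi-word number a[0..len-1] (a[0] least significant),
   or -1 if every word is zero. */
int log2_words(const uint32_t *a, size_t len) {
    for (size_t i = len; i-- > 0; )       /* step 1: highest non-zero word */
        if (a[i] != 0)
            return (int)(32 * i) + log2_u32(a[i]);  /* steps 2 and 3 */
    return -1;
}
```

For the number in the question, 2^215 is stored with only bit 23 of a[6] set, and 32 * 6 + 23 = 215.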
The answer is implementation or language dependent. Any implementation can store the number of significant bits along with the data, as it is often useful. If it must be calculated, then find the most significant word/limb and the most significant bit in that word.
If you're using fixed-width integers then the other answers already have you pretty-well covered.
If you're using arbitrarily large integers, like int in Python or BigInteger in Java, then you can take advantage of the fact that their variable-size representation uses an underlying array, so the base-2 logarithm can be computed easily and quickly in O(1) time using the length of the underlying array. The base-2 logarithm of a power of 2 is simply one less than the number of bits required to represent the number.
So when n is an integer power of 2:
In Python, you can write n.bit_length() - 1.
In Java, you can write n.bitLength() - 1.
You can create an array of logarithms beforehand. This will find logarithmic values up to log(N):
#define N 100000
int naj[N + 1]; // N + 1 entries so that naj[N] is valid
naj[1] = 0;
naj[2] = 1;
for ( int i = 3; i <= N; i++ )
{
naj[i] = naj[i-1];
if ( (1 << (naj[i]+1)) <= i )
naj[i]++;
}
The array naj holds the logarithms: naj[k] = floor(log2(k)), i.e. the log is base two, rounded down.
This uses binary search for finding the closest power of 2.
public static int binLog(int x,boolean shouldRoundResult){
// assuming 32-bit integer
int lo=0;
int hi=31;
int rangeDelta=hi-lo;
int expGuess=0;
int guess;
while(rangeDelta>1){
expGuess=(lo+hi)/2; // or (lo+hi)>>1
guess=1<<expGuess;
if(guess<x){
lo=expGuess;
} else if(guess>x){
hi=expGuess;
} else {
lo=hi=expGuess;
}
rangeDelta=hi-lo;
}
if(shouldRoundResult && hi>lo){
int loGuess=1<<lo;
int hiGuess=1<<hi;
int loDelta=Math.abs(x-loGuess);
int hiDelta=Math.abs(hiGuess-x);
if(loDelta<hiDelta)
expGuess=lo;
else
expGuess=hi;
} else {
expGuess=lo;
}
int result=expGuess;
return result;
}
The best option off the top of my head would be an O(log(log n)) approach, using binary search. Here is an example for a 64-bit ( <= 2^63 - 1 ) number (in C++):
int log2(int64_t num) {
int res = 0;
for(int i = 32; i > 0; i >>= 1) { // halve the step: 32, 16, 8, 4, 2, 1
res += i;
if(((1ULL << res) - 1) & num) // unsigned shift, so res == 63 is well-defined
res -= i;
}
return res;
}
This algorithm will basically provide me with the highest number res such that ((2^res - 1) & num) == 0, which for a power of two is its logarithm. Of course, for any number, you can work it out in a similar manner:
int log2_better(int64_t num) {
int res = 0;
for(int i = 32; i > 0; i >>= 1) {
if( res + i < 63 && (1LL << (res + i)) <= num ) // 2^63 would overflow int64_t and exceeds any num anyway
res += i;
}
return res;
}
Note that this method relies on the fact that the bitshift operation is more or less O(1). If this is not the case, you would have to precompute either all the powers of 2, or the numbers of the form 2^2^i (2^1, 2^2, 2^4, 2^8, etc.) and do some multiplications (which in this case aren't O(1) anymore).
The example in the OP is an integer string of 65 characters, which is not representable by an INT64 or even an INT128. It is still very easy to get Log(2,x) from this string by converting it to a double-precision number. This at least gives you easy access to integers up to 2^1023.
Below you find some form of pseudocode
# 1. read the string
string="52656145834278593348959013841835216159447547700274555627155488768"
# 2. extract the length of the string
l=length(string) # l = 65
# 3. read the first min(l,17) digits in a float
float=to_float(string(1: min(17,l) ))
# 4. multiply with the correct power of 10
float = float * 10^(l-min(17,l) ) # float = 5.2656145834278593E64
# 5. Take the log2 of this number and round to the nearest integer
log2 = Round( Log(float,2) ) # 215
Note:
some computer languages can convert arbitrary strings into a double precision number. So steps 2,3 and 4 could be replaced by x=to_float(string)
Step 5 could be done quicker by just reading the double-precision exponent field (bits 52 to 62, counting the least significant bit as bit 0) and subtracting 1023 from it.
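A C sketch of that shortcut, assuming doubles are IEEE 754 binary64 and that strtod rounds correctly (glibc and other mainstream C libraries do). For an exact power of two below 2^1024, the decimal string converts to exactly that power, so the biased exponent minus 1023 is the answer; for other inputs, rounding in strtod can bump the result up by one just below a power of two. log2_decimal_string is an illustrative name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns the unbiased binary exponent of the double nearest to the decimal
   string s, i.e. log2(x) when x is a power of two between 1 and 2^1023.
   Assumes IEEE 754 binary64 doubles, a correctly rounding strtod and a
   normal (non-zero, finite) result. */
int log2_decimal_string(const char *s) {
    double x = strtod(s, NULL);                /* decimal string -> nearest double */
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);            /* reinterpret the bit pattern */
    int biased = (int)((bits >> 52) & 0x7FF);  /* exponent field, bits 52..62 */
    return biased - 1023;                      /* remove the IEEE 754 bias */
}
```

Using memcpy rather than a pointer cast keeps the reinterpretation well-defined in C, and nothing from the math library is needed.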
Quick example code: If you have awk you can quickly test this algorithm.
The following code creates the first 300 powers of two:
awk 'BEGIN{for(n=0;n<300; n++) print 2^n}'
The following reads the input and does the above algorithm:
awk '{ l=length($0); m = (l > 17 ? 17 : l)
x = substr($0,1,m) * 10^(l-m)
print log(x)/log(2)
}'
So the following bash-command is a convoluted way to create a consecutive list of numbers from 0 to 299:
$ awk 'BEGIN{for(n=0;n<300; n++) print 2^n}' | awk '{ l=length($0); m = (l > 17 ? 17 : l); x = substr($0,1,m) * 10^(l-m); print log(x)/log(2) }'
0
1
2
...
299
