Intel Secure Key (RDRAND) possibly strange behaviour - random

I've been using the Intel-provided RNG feature for some time, to provide myself with some randomness by means of a C++/CLI program I wrote myself.
However, after some time, something struck me as particularly suspicious. Among other uses, I ask for a random number between 1 and 4 and wrote the result down on paper each time. Here are the results :
2, 3, 3, 2, 1, 3, 4, 2, 3, 2, 3, 1, 3, 2, 3, 1, 2, 4, 2, 2, 1, 2, 1, 3, 1, 3, 3, 3, 3.
Number of 1s : 6
Number of 2s : 9
Number of 3s : 12
Number of 4s : 2
Total : 29
I'm actually wondering if there's a problem either with Intel's RNG, my algorithm, methodology or something else maybe ? Or do you consider the bias not to be significant enough yet ?
I'm using Windows 10 Pro, my CPU is an Intel Core i7-4710MQ.
Compiled with VS2017.
Methodology :
Start a Powershell command prompt
Load my assembly with Add-Type -Path <mydll>
Invoke [rdrw.Random]::Next(4)
Add one to the result
A detail that may be of importance : I don't ask for that number very often, so there's some time between draws and it usually comes when the RNG hasn't been used for some time (one hour at least).
And yes it's a lazy algorithm, I didn't want to bother myself with exceptions.
Algorithm follows :
#include <immintrin.h>
namespace rdrw {
#pragma managed(push,off)
unsigned long long getRdRand() {
unsigned long long val = 0;
while (!_rdrand64_step(&val));
return val;
}
#pragma managed(pop)
public ref class Random abstract sealed
{
public:
// Returns a random 64 bit unsigned integer
static unsigned long long Next() {
return getRdRand();
}
// Return a random unsigned integer between 0 and max-1 (inclusive)
static unsigned long long Next(unsigned long long max) {
unsigned long long nb = max - 1;
unsigned long long mask = 1;
unsigned long long draw = 0;
if (max <= 1)
return 0;
// Create a bitmask that's at least as big as the biggest acceptable value
while ((nb&mask) != nb)
{
mask <<= 1;
mask |= 1;
}
do
{
// Throw unnecessary bits
draw = Next() & mask;
} while (draw>nb);
return draw;
}
// return a random unsigned integer between min and max-1 inclusive
static unsigned long long Next(unsigned long long min, unsigned long long max) {
if (max == min)
return min;
if (max < min)
return 0;
unsigned long long diff = max - min;
return Next(diff) + min;
}
};
}
Thanks for your insights !

Related

Change the range of IRAND() in Fortran 77 [duplicate]

This is a follow on from a previously posted question:
How to generate a random number in C?
I wish to be able to generate a random number from within a particular range, such as 1 to 6 to mimic the sides of a die.
How would I go about doing this?
All the answers so far are mathematically wrong. Returning rand() % N does not uniformly give a number in the range [0, N) unless N divides the length of the interval into which rand() returns (i.e. is a power of 2). Furthermore, one has no idea whether the moduli of rand() are independent: it's possible that they go 0, 1, 2, ..., which is uniform but not very random. The only assumption it seems reasonable to make is that rand() puts out a Poisson distribution: any two nonoverlapping subintervals of the same size are equally likely and independent. For a finite set of values, this implies a uniform distribution and also ensures that the values of rand() are nicely scattered.
This means that the only correct way of changing the range of rand() is to divide it into boxes; for example, if RAND_MAX == 11 and you want a range of 1..6, you should assign {0,1} to 1, {2,3} to 2, and so on. These are disjoint, equally-sized intervals and thus are uniformly and independently distributed.
The suggestion to use floating-point division is mathematically plausible but suffers from rounding issues in principle. Perhaps double is high-enough precision to make it work; perhaps not. I don't know and I don't want to have to figure it out; in any case, the answer is system-dependent.
The correct way is to use integer arithmetic. That is, you want something like the following:
#include <stdlib.h> // For random(), RAND_MAX
// Assumes 0 <= max <= RAND_MAX
// Returns in the closed interval [0, max]
long random_at_most(long max) {
unsigned long
// max <= RAND_MAX < ULONG_MAX, so this is okay.
num_bins = (unsigned long) max + 1,
num_rand = (unsigned long) RAND_MAX + 1,
bin_size = num_rand / num_bins,
defect = num_rand % num_bins;
long x;
do {
x = random();
}
// This is carefully written not to overflow
while (num_rand - defect <= (unsigned long)x);
// Truncated division is intentional
return x/bin_size;
}
The loop is necessary to get a perfectly uniform distribution. For example, if you are given random numbers from 0 to 2 and you want only ones from 0 to 1, you just keep pulling until you don't get a 2; it's not hard to check that this gives 0 or 1 with equal probability. This method is also described in the link that nos gave in their answer, though coded differently. I'm using random() rather than rand() as it has a better distribution (as noted by the man page for rand()).
If you want to get random values outside the default range [0, RAND_MAX], then you have to do something tricky. Perhaps the most expedient is to define a function random_extended() that pulls n bits (using random_at_most()) and returns in [0, 2**n), and then apply random_at_most() with random_extended() in place of random() (and 2**n - 1 in place of RAND_MAX) to pull a random value less than 2**n, assuming you have a numerical type that can hold such a value. Finally, of course, you can get values in [min, max] using min + random_at_most(max - min), including negative values.
Following on from #Ryan Reich's answer, I thought I'd offer my cleaned up version. The first bounds check isn't required given the second bounds check, and I've made it iterative rather than recursive. It returns values in the range [min, max], where max >= min and 1+max-min < RAND_MAX.
unsigned int rand_interval(unsigned int min, unsigned int max)
{
int r;
const unsigned int range = 1 + max - min;
const unsigned int buckets = RAND_MAX / range;
const unsigned int limit = buckets * range;
/* Create equal size buckets all in a row, then fire randomly towards
* the buckets until you land in one of them. All buckets are equally
* likely. If you land off the end of the line of buckets, try again. */
do
{
r = rand();
} while (r >= limit);
return min + (r / buckets);
}
Here is a formula if you know the max and min values of a range, and you want to generate numbers inclusive in between the range:
r = (rand() % (max + 1 - min)) + min
unsigned int
randr(unsigned int min, unsigned int max)
{
double scaled = (double)rand()/RAND_MAX;
return (max - min +1)*scaled + min;
}
See here for other options.
Wouldn't you just do:
srand(time(NULL));
int r = ( rand() % 6 ) + 1;
% is the modulus operator. Essentially it will just divide by 6 and return the remainder... from 0 - 5
For those who understand the bias problem but can't stand the unpredictable run-time of rejection-based methods, this series produces a progressively less biased random integer in the [0, n-1] interval:
r = n / 2;
r = (rand() * n + r) / (RAND_MAX + 1);
r = (rand() * n + r) / (RAND_MAX + 1);
r = (rand() * n + r) / (RAND_MAX + 1);
...
It does so by synthesising a high-precision fixed-point random number of i * log_2(RAND_MAX + 1) bits (where i is the number of iterations) and performing a long multiplication by n.
When the number of bits is sufficiently large compared to n, the bias becomes immeasurably small.
It does not matter if RAND_MAX + 1 is less than n (as in this question), or if it is not a power of two, but care must be taken to avoid integer overflow if RAND_MAX * n is large.
Here is a slight simpler algorithm than Ryan Reich's solution:
/// Begin and end are *inclusive*; => [begin, end]
uint32_t getRandInterval(uint32_t begin, uint32_t end) {
uint32_t range = (end - begin) + 1;
uint32_t limit = ((uint64_t)RAND_MAX + 1) - (((uint64_t)RAND_MAX + 1) % range);
/* Imagine range-sized buckets all in a row, then fire randomly towards
* the buckets until you land in one of them. All buckets are equally
* likely. If you land off the end of the line of buckets, try again. */
uint32_t randVal = rand();
while (randVal >= limit) randVal = rand();
/// Return the position you hit in the bucket + begin as random number
return (randVal % range) + begin;
}
Example (RAND_MAX := 16, begin := 2, end := 7)
=> range := 6 (1 + end - begin)
=> limit := 12 (RAND_MAX + 1) - ((RAND_MAX + 1) % range)
The limit is always a multiple of the range,
so we can split it into range-sized buckets:
Possible-rand-output: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Buckets: [0, 1, 2, 3, 4, 5][0, 1, 2, 3, 4, 5][X, X, X, X, X]
Buckets + begin: [2, 3, 4, 5, 6, 7][2, 3, 4, 5, 6, 7][X, X, X, X, X]
1st call to rand() => 13
→ 13 is not in the bucket-range anymore (>= limit), while-condition is true
→ retry...
2nd call to rand() => 7
→ 7 is in the bucket-range (< limit), while-condition is false
→ Get the corresponding bucket-value 1 (randVal % range) and add begin
=> 3
In order to avoid the modulo bias (suggested in other answers) you can always use:
arc4random_uniform(MAX-MIN)+MIN
Where "MAX" is the upper bound and "MIN" is lower bound. For example, for numbers between 10 and 20:
arc4random_uniform(20-10)+10
arc4random_uniform(10)+10
Simple solution and better than using "rand() % N".
While Ryan is correct, the solution can be much simpler based on what is known about the source of the randomness. To re-state the problem:
There is a source of randomness, outputting integer numbers in range [0, MAX) with uniform distribution.
The goal is to produce uniformly distributed random integer numbers in range [rmin, rmax] where 0 <= rmin < rmax < MAX.
In my experience, if the number of bins (or "boxes") is significantly smaller than the range of the original numbers, and the original source is cryptographically strong - there is no need to go through all that rigamarole, and simple modulo division would suffice (like output = rnd.next() % (rmax+1), if rmin == 0), and produce random numbers that are distributed uniformly "enough", and without any loss of speed. The key factor is the randomness source (i.e., kids, don't try this at home with rand()).
Here's an example/proof of how it works in practice. I wanted to generate random numbers from 1 to 22, having a cryptographically strong source that produced random bytes (based on Intel RDRAND). The results are:
Rnd distribution test (22 boxes, numbers of entries in each box):
1: 409443 4.55%
2: 408736 4.54%
3: 408557 4.54%
4: 409125 4.55%
5: 408812 4.54%
6: 409418 4.55%
7: 408365 4.54%
8: 407992 4.53%
9: 409262 4.55%
10: 408112 4.53%
11: 409995 4.56%
12: 409810 4.55%
13: 409638 4.55%
14: 408905 4.54%
15: 408484 4.54%
16: 408211 4.54%
17: 409773 4.55%
18: 409597 4.55%
19: 409727 4.55%
20: 409062 4.55%
21: 409634 4.55%
22: 409342 4.55%
total: 100.00%
This is as close to uniform as I need for my purpose (fair dice throw, generating cryptographically strong codebooks for WWII cipher machines such as http://users.telenet.be/d.rijmenants/en/kl-7sim.htm, etc). The output does not show any appreciable bias.
Here's the source of cryptographically strong (true) random number generator:
Intel Digital Random Number Generator
and a sample code that produces 64-bit (unsigned) random numbers.
int rdrand64_step(unsigned long long int *therand)
{
unsigned long long int foo;
int cf_error_status;
asm("rdrand %%rax; \
mov $1,%%edx; \
cmovae %%rax,%%rdx; \
mov %%edx,%1; \
mov %%rax, %0;":"=r"(foo),"=r"(cf_error_status)::"%rax","%rdx");
*therand = foo;
return cf_error_status;
}
I compiled it on Mac OS X with clang-6.0.1 (straight), and with gcc-4.8.3 using "-Wa,q" flag (because GAS does not support these new instructions).
As said before modulo isn't sufficient because it skews the distribution. Heres my code which masks off bits and uses them to ensure the distribution isn't skewed.
static uint32_t randomInRange(uint32_t a,uint32_t b) {
uint32_t v;
uint32_t range;
uint32_t upper;
uint32_t lower;
uint32_t mask;
if(a == b) {
return a;
}
if(a > b) {
upper = a;
lower = b;
} else {
upper = b;
lower = a;
}
range = upper - lower;
mask = 0;
//XXX calculate range with log and mask? nah, too lazy :).
while(1) {
if(mask >= range) {
break;
}
mask = (mask << 1) | 1;
}
while(1) {
v = rand() & mask;
if(v <= range) {
return lower + v;
}
}
}
The following simple code lets you look at the distribution:
int main() {
unsigned long long int i;
unsigned int n = 10;
unsigned int numbers[n];
for (i = 0; i < n; i++) {
numbers[i] = 0;
}
for (i = 0 ; i < 10000000 ; i++){
uint32_t rand = random_in_range(0,n - 1);
if(rand >= n){
printf("bug: rand out of range %u\n",(unsigned int)rand);
return 1;
}
numbers[rand] += 1;
}
for(i = 0; i < n; i++) {
printf("%u: %u\n",i,numbers[i]);
}
}
Will return a floating point number in the range [0,1]:
#define rand01() (((double)random())/((double)(RAND_MAX)))

What would be the most efficient algorithm to find kth digit from right of a^b i.e a raised to power b

I recently came across this question. I know the naive approach i.e to find a^b and then extract least significant digits of this number 'k' times.
I am looking for a better approach.
'a' and 'b' are integers.
The naive approach breaks when a^p < 10^k, but a^(p+1) overflows. A solution which only requires 2*10^k-2 to fit into the variables is to write the (a*a) mod 10^k using Russian peasant multiplication. It calculates the product of a*b by multiplying a and dividing b with steps of two and hence prevents the overflow as you can take the modulus between each step.
Here is a c++ implementation of function calculating (a*b)%m without an overflow:
unsigned long long abModm(unsigned long long a, unsigned long long b,unsigned long long m){
unsigned long long res=0;
a=a%m;
b=b%m;
while (b>0){
if (b&1==1){//is b odd
res=(res+a)%m;//collect the result
}
a=(a<<1)%m;//multiply a
b>>=1;//divide b
}
return res;
}
Then you can use this to solve the problem as already suggested by others:
int kthDigit(unsigned long long a, unsigned long long b, int k){
unsigned long long m=1;
for (int i=0;i<k;++i) m*=10;
unsigned long long res=1;
for (int i=0;i<b;++i){
res=abModm(res,a,m);
}
m/=10;
return res/m;
}
The exponent calculation is O(b) you can do it in O(log(b)) with
unsigned long long res=1;
while (b){
if (b&1) res=abModm(res,a,m);
b>>=1;
a=abModm(a,a,m);
}
Check for the special case that a is divisible by 10. If k < b the result is 0, if k ≥ b then it's the (k - b'th) digit of (a/10)^b.
Do the calculation modulo 10^(k + 1). Replace a with a modulo 10^(k + 1). With 64 bit arithmetic, the calculation is easy if k ≤ 18 and a < 2^32.
Do the power by multiplying in steps, and in each step, discard the highest digits that will not influence the digit you're looking for. This will allow you to go beyond the integer size limitations of your implementation. In Javascript, which is limited to 253-1, you can calculate e.g. the 9th digit of 999999999999.
function powerDigit(a, b, k) {
var c = 1, max = Math.pow(10, k);
a %= max;
while (b--) {
c *= a;
// if (c >= Math.pow(2, 53)) return NaN; // Javascript limitation
c %= max;
}
return Math.floor(c * 10 / max);
}
document.write(powerDigit(9, 9, 9) + "<BR>"); // 3 ; 387420489
document.write(powerDigit(99, 9, 9) + "<BR>"); // 4 ; 913517247483640899
document.write(powerDigit(99, 99, 9) + "<BR>"); // 2 ; 3.697296376497267726e+197
document.write(powerDigit(999, 999, 9) + "<BR>"); // 4 ; 3.680634882592232678e+2996
document.write(powerDigit(999999, 999999, 9)); // 9 ; millions of digits
First you need to find a^b then you divide this by 10^(k-1) and from result you find modulo 10 and you get your kth number from right.
Here i give example of c code:
double r=pow(a,b)/pow(10,k-1);
int result=(int)r%10;

Iterating through a bit sequence and finding each set bit

My question is more or less what's in the title; I'm wondering if there's a fast way to going through a sequence of bits and finding each bit that's set.
More detailed information:
I'm currently working on a data stucture that represents a set of objects. In order to support some operations I need, the structure must be able to perform very fast intersection of subsets internally. The solution I've come up with is to have each subset of the structure's superset represented by a "bit array", where each bit maps to an index in the array that holds the superset's data. Example: if bit #1 is set in a subset, then the element at index 1 in the superset's array is present in the subset.
Each subset consists of an array of ulong big enough that there's enough bits to represent the entire superset (if the superset contains 256 elements, the size of the array must be 256 / 64 = 4). To find the intersection of 2 subsets, S1 and S2, I can simply iterate through S1 and S2's array, and find the bitwise-and between the ulongs at each index.
Now back to what my question is really about:
In order to return the data of a subset, I have to iterate through all the bits in the subset's "bit array" and find the bits that are set. This is how I curently do it:
/// <summary>
/// Gets an enumerator that enables enumeration over the strings in the subset.
/// </summary>
/// <returns> An enumerator. </returns>
public IEnumerator<string> GetEnumerator()
{
int bitArrayChunkIndex = 0;
int bitArrayChunkOffset = 0;
int bitArrayChunkCount = this.bitArray.Length;
while(bitArrayChunkIndex < bitArrayChunkCount)
{
ulong bitChunk = bitArray[bitArrayChunkIndex];
// RELEVANT PART
if (bitChunk != 0)
{
int bit = 0;
while (bit < BIT_ARRAY_CHUNK_SIZE /* 64 */)
{
if(bitChunk.BitIsSet(bit))
yield return supersetData[bitArrayChunkOffset + bit];
bit++;
}
}
bitArrayChunkIndex++;
bitArrayChunkOffset += BIT_ARRAY_CHUNK_SIZE;
// END OF RELEVANT PART
}
}
Is there any obvious ways to optimize this? Any bit hacks to enable it to be done very fast? Thanks!
On INTEL 386+, you can use machine instruction BitSearchFirst.
Following - sample for gcc. This is little tricky for process 64-bit words,
but anyway works quick and efficient.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(int argc, char **argv) {
uint64_t val;
sscanf(argv[1], "%llx", &val);
printf("val=0x%llx\n", val);
uint32_t result;
if((uint32_t)val) { // first bit is inside lowest 32
asm("bsfl %1,%0" : "=r"(result) : "r"(val));
} else { // first bit is outside lowest 32
asm("bsfl %1,%0" : "=r"(result) : "r"(val >> 32));
result += 32;
}
printf("val=%llu; result=%u\n", val, result);
return 0;
}
Also, in your use x64 architecture, you can try to use bsfq instruction, and remove "if/else"
Take an array of sixteen integers, initialized with the number of bits set for the integers from zero to fifteen (i.e. 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4). Now take bitchunk % 16, and look up the result in the int array - that's the number of set bits in the first four bits of the chunk. Right shift four times, and repeat the entire operation fifteen more times.
You can do this with an array of 256 integers and 8 bit sub-chunks instead. I wouldn't recommend using an array of 4096 integers with 12 bit sub-chunks, that's getting a bit ridiculous.
int[] lookup = new int[16] {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
int bitCount = 0;
for(int i = 0; i < 16; i++) {
int firstFourBits = bitChunk % 16;
bitCount += lookup[firstFourBits];
bitChunk = butChunk >> 4;
}

Usage of _mm_shuffle_epi8 intrinsic

Can someone please explain the _mm_shuffle_epi8 SSSE3 intrinsic?
I know it shuffles 16 8-bit integers in an __m128i but not sure how I could use this.
I basically want to use _mm_shuffle_epi8 to modify the function below to get better performance.
while(not done)
dest[i+0] = (src+j).a;
dest[i+1] = (src+j).b;
dest[i+2] = (src+j).c;
dest[i+3] = (src+j+1).a;
dest[i+4] = (src+j+1).b;
dest[i+5] = (src+j+1).c;
i+=6;
j+=2;
_mm_shuffle_epi8 (better known as pshufb), essentially does this:
temp = dst;
for (int i = 0; i < 16; i++)
dst[i] = (src[i] & 0x80) == 0 ? temp[src[i] & 15] : 0;
As for whether you can use it here, it's impossible to tell without knowing the types involved. It won't be "nice" anyway because the destination is a block of 6 bytes (or words? or dwords?). You could make that work by unrolling and doing a lot of shifting and or-ing.
here's an example of using the intrinsic; you'll have to find out how to apply it to your particular situation. this code endian-swaps 4 32-bit integers at a time:
unsigned int *bswap(unsigned int *destination, unsigned int *source, int length) {
int i;
__m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3);
for (i = 0; i < length; i += 4) {
_mm_storeu_si128((__m128i *)&destination[i],
_mm_shuffle_epi8(_mm_loadu_si128((__m128i *)&source[i]), mask));
}
return destination;
}

find the index of the highest bit set of a 32-bit number without loops obviously

Here's a tough one(atleast i had a hard time :P):
find the index of the highest bit set of a 32-bit number without using any loops.
With recursion:
int firstset(int bits) {
return (bits & 0x80000000) ? 31 : firstset((bits << 1) | 1) - 1;
}
Assumes [31,..,0] indexing
Returns -1 if no bits set
| 1 prevents stack overflow by capping the number of shifts until a 1 is reached (32)
Not tail recursive :)
Very interesting question, I will provide you an answer with benchmark
Solution using a loop
uint8_t highestBitIndex( uint32_t n )
{
uint8_t r = 0;
while ( n >>= 1 )
r++;
return r;
}
This help to better understand the question but is highly inefficient.
Solution using log
This approach can also be summarize by the log method
uint8_t highestSetBitIndex2(uint32_t n) {
return (uint8_t)(log(n) / log(2));
}
However it is also inefficient (even more than above one, see benchmark)
Solution using built-in instruction
uint8_t highestBitIndex3( uint32_t n )
{
return 31 - __builtin_clz(n);
}
This solution, while very efficient, suffer from the fact that it only work with specific compilers (gcc and clang will do) and on specific platforms.
NB: It is 31 and not 32 if we want the index
Solution with intrinsic
#include <x86intrin.h>
uint8_t highestSetBitIndex5(uint32_t n)
{
return _bit_scan_reverse(n); // undefined behavior if n == 0
}
This will call the bsr instruction at assembly level
Solution using inline assembly
LZCNT and BSR can be summarize in assembly with the below functions:
uint8_t highestSetBitIndex4(uint32_t n) // undefined behavior if n == 0
{
__asm__ __volatile__ (R"(
.intel_syntax noprefix
bsr eax, edi
.att_syntax noprefix
)"
);
}
uint8_t highestSetBitIndex7(uint32_t n) // undefined behavior if n == 0
{
__asm__ __volatile__ (R"(.intel_syntax noprefix
lzcnt ecx, edi
mov eax, 31
sub eax, ecx
.att_syntax noprefix
)");
}
NB: Do Not Use unless you know what you are doing
Solution using lookup table and magic number multiplication (probably the best AFAIK)
First you use the following function to clear all the bits except the highest one:
uint32_t keepHighestBit( uint32_t n )
{
n |= (n >> 1);
n |= (n >> 2);
n |= (n >> 4);
n |= (n >> 8);
n |= (n >> 16);
return n - (n >> 1);
}
Credit: The idea come from Henry S. Warren, Jr. in his book Hacker's Delight
Then we use an algorithm based on DeBruijn's Sequence to perform a kind of binary search:
uint8_t highestBitIndex8( uint32_t b )
{
static const uint32_t deBruijnMagic = 0x06EB14F9; // equivalent to 0b111(0xff ^ 3)
static const uint8_t deBruijnTable[64] = {
0, 0, 0, 1, 0, 16, 2, 0, 29, 0, 17, 0, 0, 3, 0, 22,
30, 0, 0, 20, 18, 0, 11, 0, 13, 0, 0, 4, 0, 7, 0, 23,
31, 0, 15, 0, 28, 0, 0, 21, 0, 19, 0, 10, 12, 0, 6, 0,
0, 14, 27, 0, 0, 9, 0, 5, 0, 26, 8, 0, 25, 0, 24, 0,
};
return deBruijnTable[(keepHighestBit(b) * deBruijnMagic) >> 26];
}
Another version:
void propagateBits(uint32_t *n) {
*n |= *n >> 1;
*n |= *n >> 2;
*n |= *n >> 4;
*n |= *n >> 8;
*n |= *n >> 16;
}
uint8_t highestSetBitIndex8(uint32_t b)
{
static const uint32_t Magic = (uint32_t) 0x07C4ACDD;
static const int BitTable[32] = {
0, 9, 1, 10, 13, 21, 2, 29,
11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7,
19, 27, 23, 6, 26, 5, 4, 31,
};
propagateBits(&b);
return BitTable[(b * Magic) >> 27];
}
Benchmark with 100 million calls
compiling with g++ -std=c++17 highestSetBit.cpp -O3 && ./a.out
highestBitIndex1 136.8 ms (loop)
highestBitIndex2 183.8 ms (log(n) / log(2))
highestBitIndex3 10.6 ms (de Bruijn lookup Table with power of two, 64 entries)
highestBitIndex4 4.5 ms (inline assembly bsr)
highestBitIndex5 6.7 ms (intrinsic bsr)
highestBitIndex6 4.7 ms (gcc lzcnt)
highestBitIndex7 7.1 ms (inline assembly lzcnt)
highestBitIndex8 10.2 ms (de Bruijn lookup Table, 32 entries)
I would personally go for highestBitIndex8 if portability is your focus, else gcc built-in is nice.
Floor of logarithm-base-two should do the trick (though you have to special-case 0).
Floor of log base 2 of 0001 is 0 (bit with index 0 is set).
" " of 0010 is 1 (bit with index 1 is set).
" " of 0011 is 1 (bit with index 1 is set).
" " of 0100 is 2 (bit with index 2 is set).
and so on.
On an unrelated note, this is actually a pretty terrible interview question (I say this as someone who does technical interviews for potential candidates), because it really doesn't correspond to anything you do in practical programming.
Your boss isn't going to come up to you one day and say "hey, so we have a rush job for this latest feature, and it needs to be implemented without loops!"
You could do it like this (not optimised):
int index = 0;
uint32_t temp = number;
if ((temp >> 16) != 0) {
temp >>= 16;
index += 16;
}
if ((temp >> 8) != 0) {
temp >>= 8
index += 8;
}
...
sorry for bumping an old thread, but how about this
inline int ilog2(unsigned long long i) {
union { float f; int i; } = { i };
return (u.i>>23)-27;
}
...
int highest=ilog2(x); highest+=(x>>highest)-1;
// and in case you need it
int lowest = ilog2((x^x-1)+1)-1;
this can be done as a binary search, reducing complexity of O(N) (for an N-bit word) to O(log(N)). A possible implementation is:
int highest_bit_index(uint32_t value)
{
if(value == 0) return 0;
int depth = 0;
int exponent = 16;
while(exponent > 0)
{
int shifted = value >> (exponent);
if(shifted > 0)
{
depth += exponent;
if(shifted == 1) return depth + 1;
value >>= exponent;
}
exponent /= 2;
}
return depth + 1;
}
the input is a 32 bit unsigned integer.
it has a loop that can be converted into 5 levels of if-statements , therefore resulting in 32 or so if-statements. you could also use recursion to get rid of the loop, or the absolutely evil "goto" ;)
Let
n - Decimal number for which bit location to be identified
start - Indicates decimal value of ( 1 << 32 ) - 2147483648
bitLocation - Indicates bit location which is set to 1
public int highestBitSet(int n, long start, int bitLocation)
{
if (start == 0)
{
return 0;
}
if ((start & n) > 0)
{
return bitLocation;
}
else
{
return highestBitSet(n, (start >> 1), --bitLocation);
}
}
long i = 1;
long startIndex = (i << 31);
int bitLocation = 32;
int value = highestBitSet(64, startIndex, bitLocation);
System.out.println(value);
int high_bit_set(int n, int pos)
{
if(pos<0)
return -1;
else
return (0x80000000 & n)?pos:high_bit_set((n<<1),--pos);
}
main()
{
int n=0x23;
int high_pos = high_bit_set(n,31);
printf("highest index = %d",high_pos);
}
From your main call function high_bit_set(int n , int pos) with the input value n, and default 31 as the highest position. And the function is like above.
Paislee's solution is actually pretty easy to make tail-recursive, though, it's a much slower solution than the suggested floor(log2(n));
int firstset_tr(int bits, int final_dec) {
// pass in 0 for final_dec on first call, or use a helper function
if (bits & 0x80000000) {
return 31-final_dec;
} else {
return firstset_tr( ((bits << 1) | 1), final_dec+1 );
}
}
This function also works for other bit sizes, just change the check,
e.g.
if (bits & 0x80) { // for 8-bit
return 7-final_dec;
}
Note that what you are trying to do is calculate the integer log2 of an integer,
#include <stdio.h>
#include <stdlib.h>
unsigned int
Log2(unsigned long x)
{
unsigned long n = x;
int bits = sizeof(x)*8;
int step = 1; int k=0;
for( step = 1; step < bits; ) {
n |= (n >> step);
step *= 2; ++k;
}
//printf("%ld %ld\n",x, (x - (n >> 1)) );
return(x - (n >> 1));
}
Observe that you can attempt to search more than 1 bit at a time.
unsigned int
Log2_a(unsigned long x)
{
unsigned long n = x;
int bits = sizeof(x)*8;
int step = 1;
int step2 = 0;
//observe that you can move 8 bits at a time, and there is a pattern...
//if( x>1<<step2+8 ) { step2+=8;
//if( x>1<<step2+8 ) { step2+=8;
//if( x>1<<step2+8 ) { step2+=8;
//}
//}
//}
for( step2=0; x>1L<<step2+8; ) {
step2+=8;
}
//printf("step2 %d\n",step2);
for( step = 0; x>1L<<(step+step2); ) {
step+=1;
//printf("step %d\n",step+step2);
}
printf("log2(%ld) %d\n",x,step+step2);
return(step+step2);
}
This approach uses a binary search
unsigned int
Log2_b(unsigned long x)
{
unsigned long n = x;
unsigned int bits = sizeof(x)*8;
unsigned int hbit = bits-1;
unsigned int lbit = 0;
unsigned long guess = bits/2;
int found = 0;
while ( hbit-lbit>1 ) {
//printf("log2(%ld) %d<%d<%d\n",x,lbit,guess,hbit);
//when value between guess..lbit
if( (x<=(1L<<guess)) ) {
//printf("%ld < 1<<%d %ld\n",x,guess,1L<<guess);
hbit=guess;
guess=(hbit+lbit)/2;
//printf("log2(%ld) %d<%d<%d\n",x,lbit,guess,hbit);
}
//when value between hbit..guess
//else
if( (x>(1L<<guess)) ) {
//printf("%ld > 1<<%d %ld\n",x,guess,1L<<guess);
lbit=guess;
guess=(hbit+lbit)/2;
//printf("log2(%ld) %d<%d<%d\n",x,lbit,guess,hbit);
}
}
if( (x>(1L<<guess)) ) ++guess;
printf("log2(x%ld)=r%d\n",x,guess);
return(guess);
}
Another binary search method, perhaps more readable,
unsigned int
Log2_c(unsigned long x)
{
unsigned long v = x;
unsigned int bits = sizeof(x)*8;
unsigned int step = bits;
unsigned int res = 0;
for( step = bits/2; step>0; )
{
//printf("log2(%ld) v %d >> step %d = %ld\n",x,v,step,v>>step);
while ( v>>step ) {
v>>=step;
res+=step;
//printf("log2(%ld) step %d res %d v>>step %ld\n",x,step,res,v);
}
step /= 2;
}
if( (x>(1L<<res)) ) ++res;
printf("log2(x%ld)=r%ld\n",x,res);
return(res);
}
And because you will want to test these,
int main()
{
unsigned long int x = 3;
for( x=2; x<1000000000; x*=2 ) {
//printf("x %ld, x+1 %ld, log2(x+1) %d\n",x,x+1,Log2(x+1));
printf("x %ld, x+1 %ld, log2_a(x+1) %d\n",x,x+1,Log2_a(x+1));
printf("x %ld, x+1 %ld, log2_b(x+1) %d\n",x,x+1,Log2_b(x+1));
printf("x %ld, x+1 %ld, log2_c(x+1) %d\n",x,x+1,Log2_c(x+1));
}
return(0);
}
well from what I know the function Log is Implemented very efficiently in most programming languages, and even if it does contain loops , it is probably very few of them , internally
So I would say that in most cases using the log would be faster , and more direct.
you do have to check for 0 though and avoid taking the log of 0, as that would cause the program to crash.

Resources