I'm interested in a fast method for "expanding bits," which can be defined as the following:
Let B be a binary number with n bits, i.e. B \in {0,1}^n
Let P be the positions of all 1/true bits in B, i.e. (1 << p[i]) & B != 0, and |P| = k
For another given number, A \in {0,1}^k, let Ap be the bit-expanded form of A given B, such that bit p[j] of Ap equals A[j] (each bit of A is deposited at the position of the corresponding set bit of B).
The result of the "bit expansion" is Ap.
A couple examples:
Given B: 0010 1110, A: 0110, then Ap should be 0000 1100
Given B: 1001 1001, A: 1101, then Ap should be 1001 0001
The following is a straightforward algorithm, but I can't shake the feeling that there's a faster/easier way to do this.
unsigned int expand_bits(unsigned int A, unsigned int B, int n) {
    int k = popcount(B); // CUDA function, but there are good methods for this
    unsigned int Ap = 0;
    int j = k - 1;
    // Starting at the most significant bit,
    for (int i = n - 1; i >= 0; --i) {
        Ap <<= 1;
        // if B is 1, add the value at A[j] to Ap, decrement j.
        if (B & (1 << i)) {
            Ap += (A >> j--) & 1;
        }
    }
    return Ap;
}
The question appears to be asking for a CUDA emulation of the BMI2 instruction PDEP, which takes a source operand a, and deposits its bits based on the positions of the 1-bits of a mask b. There is no hardware support for an identical, or a similar, operation on currently shipping GPUs; that is, up to and including the Maxwell architecture.
I am assuming, based on the two examples given, that the mask b in general is sparse, and that we can minimize work by only iterating over the 1-bits of b. This could cause divergent branches on the GPU, but the exact trade-off in performance is unknown without knowledge of a specific use case. For now, I am assuming that the exploitation of sparsity in the mask b has a stronger positive influence on performance compared to the negative impact of divergence.
In the emulation code below, I have reduced the use of potentially "expensive" shift operations, instead relying mostly on simple ALU instructions. On various GPUs, shift instructions are executed with lower throughput than simple integer arithmetic. I have retained a single shift, off the critical path through the code, to avoid becoming execution limited by the arithmetic units. If desired, the expression 1U << i can be replaced by addition: introduce a variable m that is initialized to 1 before the loop and doubled each time through the loop.
The basic idea is to isolate each 1-bit of mask b in turn (starting at the least significant end), AND it with the value of the i-th bit of a, and incorporate the result into the expanded destination. After a 1-bit from b has been used, we remove it from the mask, and iterate until the mask becomes zero.
In order to avoid shifting the i-th bit of a into place, we simply isolate it and then replicate its value to all more significant bits by simple negation, taking advantage of the two's complement representation of integers.
/* Emulate PDEP: deposit the bits of 'a' (starting with the least significant
   bit) at the positions indicated by the set bits of the mask stored in 'b'.
*/
__device__ unsigned int my_pdep (unsigned int a, unsigned int b)
{
    unsigned int l, s, r = 0;
    int i;
    for (i = 0; b; i++) {        // iterate over 1-bits in mask, until mask becomes 0
        l = b & (0 - b);         // extract mask's least significant 1-bit
        b = b ^ l;               // clear mask's least significant 1-bit
        s = 0 - (a & (1U << i)); // spread i-th bit of 'a' to more signif. bits
        r = r | (l & s);         // deposit i-th bit of 'a' at position of mask's 1-bit
    }
    return r;
}
The variant without any shift operations alluded to above looks as follows:
/* Emulate PDEP: deposit the bits of 'a' (starting with the least significant
   bit) at the positions indicated by the set bits of the mask stored in 'b'.
*/
__device__ unsigned int my_pdep (unsigned int a, unsigned int b)
{
    unsigned int l, s, r = 0, m = 1;
    while (b) {          // iterate over 1-bits in mask, until mask becomes 0
        l = b & (0 - b); // extract mask's least significant 1-bit
        b = b ^ l;       // clear mask's least significant 1-bit
        s = 0 - (a & m); // spread i-th bit of 'a' to more significant bits
        r = r | (l & s); // deposit i-th bit of 'a' at position of mask's 1-bit
        m = m + m;       // mask for next bit of 'a'
    }
    return r;
}
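As a quick sanity check (my own addition, not part of the original answer), the emulation can be exercised against the first example from the question with a trivial kernel that uses device-side printf:
#include <cstdio>
__global__ void test_my_pdep (void)
{
    // B = 0010 1110, A = 0110  ->  expected Ap = 0000 1100 (0xc)
    printf ("my_pdep(0x6, 0x2e) = 0x%x\n", my_pdep (0x6, 0x2e));
}
// host side: test_my_pdep<<<1,1>>>(); cudaDeviceSynchronize();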
In comments below, @Evgeny Kluev pointed to a shift-free PDEP emulation at the chessprogramming website that looks potentially faster than either of my two implementations above; it seems worth a try.
Related
I am trying to generate single-precision floating-point random numbers using an FPGA by generating bit patterns between 0 and 0x3F800000 (the IEEE-754 encoding of 1.0). But since there are more discrete points near zero than near one, I am not getting a uniform distribution. Is there any transformation I can apply to mimic uniform generation? I am using a 32-bit LFSR and Xoshiro random number generation.
A standard way to generate uniformly distributed floats in [0,1) from uniformly distributed 32-bit unsigned integers is to multiply the integers by 2^-32. Obviously we wouldn't instantiate a floating-point multiplier on the FPGA just for this purpose, and we do not have to, since the multiplier is a power of two. In essence what is needed is a conversion of the integer to a floating-point number, then decrementing the exponent of the floating-point number by 32. This does not work for a zero input, which has to be handled as a special case. In the ISO-C99 code below I am assuming that float is mapped to the IEEE-754 binary32 type.
Other than for certain special cases, the significand of an IEEE-754 binary floating-point number is normalized to [1,2). To convert an integer into the significand, we need to normalize it, so the most significant bit is set. We can do this by counting the number of leading zero bits, then left shifting the number by that amount. The count of leading zeros is also needed to adjust the exponent.
The significand of a binary32 number comprises 24 bits, of which only 23 bits are stored; the most significant bit (the integer bit) is always one and therefore implicit. This means not all of the 32 bits of the integer can be incorporated into the binary32, so in converting a 32-bit unsigned integer one usually rounds to 24-bit precision. To simplify the implementation, in the code below I simply truncate by cutting off the least significant eight bits, which should have no noticeable effect on the uniform distribution. For the exponent part, we can combine the adjustment due to the normalization step with the subtraction due to the scale factor of 2^-32.
The code below is written using hardware-centric primitives. Extracting a bit is just a question of grabbing the correct wire, and shifts by fixed amounts are likewise simply wire shifts. The circuit needed to count the number of leading zeros is typically called a priority encoder.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

#define USE_FP_MULTIPLY (0)

uint32_t bit (uint32_t, uint32_t);
uint32_t mux (uint32_t, uint32_t, uint32_t);
uint32_t clz (uint32_t);
float uint32_as_float (uint32_t);

/* uniform float in [0, 1) from uniformly distributed random integers */
float uniform_rand_01 (uint32_t i)
{
    const uint32_t FP32_EXPO_BIAS = 127;
    const uint32_t FP32_MANT_BITS = 24;
    const uint32_t FP32_STORED_MANT_BITS = FP32_MANT_BITS - 1;
    uint32_t lz, r;
    // compute shift amount needed for normalization
    lz = clz (i);
    // normalize so that msb is set, except when input is zero
    i = mux (bit (lz, 4), i << 16, i);
    i = mux (bit (lz, 3), i <<  8, i);
    i = mux (bit (lz, 2), i <<  4, i);
    i = mux (bit (lz, 1), i <<  2, i);
    i = mux (bit (lz, 0), i <<  1, i);
    // build bit pattern for IEEE-754 binary32 floating-point number
    r = (((FP32_EXPO_BIAS - 2 - lz) << FP32_STORED_MANT_BITS) +
         (i >> (32 - FP32_MANT_BITS)));
    // handle special case of zero input
    r = mux (i == 0, i, r);
    // treat bit-pattern as 'float'
    return uint32_as_float (r);
}

// extract bit i from x
uint32_t bit (uint32_t x, uint32_t i)
{
    return (x >> i) & 1;
}

// simulate 2-to-1 multiplexer: c ? a : b ; c must be in {0,1}
uint32_t mux (uint32_t c, uint32_t a, uint32_t b)
{
    uint32_t m = c * 0xffffffff;
    return (a & m) | (b & ~m);
}

// count leading zeros. A priority encoder in hardware.
uint32_t clz (uint32_t x)
{
    uint32_t m, c, y, n = 32;
    y = x >> 16; m = n - 16; c = (y != 0); n = mux (c, m, n); x = mux (c, y, x);
    y = x >>  8; m = n -  8; c = (y != 0); n = mux (c, m, n); x = mux (c, y, x);
    y = x >>  4; m = n -  4; c = (y != 0); n = mux (c, m, n); x = mux (c, y, x);
    y = x >>  2; m = n -  2; c = (y != 0); n = mux (c, m, n); x = mux (c, y, x);
    y = x >>  1; m = n -  2; c = (y != 0); n = mux (c, m, n - x);
    return n;
}

// re-interpret bit pattern of a 32-bit integer as an IEEE-754 binary32
float uint32_as_float (uint32_t a)
{
    float r;
    memcpy (&r, &a, sizeof r);
    return r;
}
// George Marsaglia's KISS PRNG, period 2**123. Newsgroup sci.math, 21 Jan 1999
// Bug fix: Greg Rose, "KISS: A Bit Too Simple" http://eprint.iacr.org/2011/007
static uint32_t kiss_z=362436069, kiss_w=521288629;
static uint32_t kiss_jsr=123456789, kiss_jcong=380116160;
#define znew (kiss_z=36969*(kiss_z&65535)+(kiss_z>>16))
#define wnew (kiss_w=18000*(kiss_w&65535)+(kiss_w>>16))
#define MWC ((znew<<16)+wnew )
#define SHR3 (kiss_jsr^=(kiss_jsr<<13),kiss_jsr^=(kiss_jsr>>17), \
              kiss_jsr^=(kiss_jsr<<5))
#define CONG (kiss_jcong=69069*kiss_jcong+1234567)
#define KISS ((MWC^CONG)+SHR3)

#define N 100
uint32_t bucket [N];

int main (void)
{
    for (int i = 0; i < 100000; i++) {
        uint32_t i = KISS;
#if USE_FP_MULTIPLY
        float r = i * 0x1.0p-32f;
#else // USE_FP_MULTIPLY
        float r = uniform_rand_01 (i);
#endif // USE_FP_MULTIPLY
        bucket [(int)(r * N)]++;
    }
    for (int i = 0; i < N; i++) {
        printf ("bucket [%2d]: [%.5f,%.5f): %u\n",
                i, 1.0f*i/N, (i+1.0f)/N, bucket[i]);
    }
    return EXIT_SUCCESS;
}
Please check xoshiro128+ here: https://prng.di.unimi.it/xoshiro128plus.c
A VHDL implementation can be found here:
https://github.com/jorisvr/vhdl_prng/tree/master/rtl
The seed value is generated from another random number generation algorithm, so don't get confused by this.
Depending on the seed value used, it should give a uniform distribution.
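Whichever generator is used (LFSR, xoshiro128+, or the KISS generator from the answer above), the step that actually produces the uniform float is the conversion of the 32-bit integer output. A minimal sketch of the scaling approach described in the first answer (my own code, with the low 8 bits dropped first so the conversion is exact and the result cannot round up to 1.0f):
#include <stdint.h>
/* map a uniformly distributed 32-bit integer to a float in [0, 1) */
float u32_to_unit_float (uint32_t x)
{
    return (float)(x >> 8) * 0x1.0p-24f; /* 24-bit value scaled by 2^-24 */
}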
Note: This question is different from Fastest way to calculate a 128-bit integer modulo a 64-bit integer.
Here's a C# fiddle:
https://dotnetfiddle.net/QbLowb
Given the pseudocode:
UInt64 a = 9228496132430806238;
UInt32 d = 585741;
How do I calculate
UInt32 r = a % d?
The catch, of course, is that I am not in a compiler that supports the UInt64 data type.¹ But I do have access to the Windows ULARGE_INTEGER union:
typedef struct ULARGE_INTEGER {
    DWORD LowPart;
    DWORD HighPart;
} ULARGE_INTEGER;
Which really means that I can turn my code above into:
//9228496132430806238 = 0x80123456789ABCDE
UInt32 a = 0x80123456; //high part
UInt32 b = 0x789ABCDE; //low part
UInt32 d = 585741;
How to do it
But now comes how to do the actual calculation. I can start with the pencil-and-paper long division:
________________________
585741 ) 0x80123456 0x789ABCDE
To make it simpler, we can work in variables:
Now we are working entirely with 32-bit unsigned types, which my compiler does support.
u1 = a / d; //integer truncation math
v1 = a % d; //modulus
But now I've brought myself to a standstill, because now I have to calculate:
(v1||b) / d
In other words, I have to perform division of a 64-bit value, which is what I was unable to perform in the first place!
This must be a solved problem already. But the only questions I can find on Stack Overflow are people trying to calculate:
a^b mod n
or other cryptographically large multi-precision operations, or approximate floating point.
Bonus Reading
Microsoft Research: Division and Modulus for Computer Scientists
https://stackoverflow.com/questions/36684771/calculating-large-mods-by-hand
Fastest way to calculate a 128-bit integer modulo a 64-bit integer (an unrelated question, despite the similar title)
¹ It does support Int64, but I don't think that helps me.
Working with Int64 support
I was hoping for a generic solution for performing modulus on a ULARGE_INTEGER (and even a LARGE_INTEGER) in a compiler without native 64-bit support. That would be the correct, good, perfect, and ideal answer, which other people would be able to use when they need it.
But there is also the reality of the problem I have, and it can lead to an answer that is generally not useful to anyone else:
cheating by calling one of the Win32 large integer functions (although there is none for modulus)
cheating by using 64-bit support for signed integers
I can check if a is positive. If it is, I know my compiler's built-in support for Int64 will handle:
UInt32 r = a % d; //for a >= 0
Then there's the question of how to handle the other case: a is negative.
UInt32 ModU64(ULARGE_INTEGER a, UInt32 d)
{
    //Hack: Our compiler does support Int64, just not UInt64.
    //Use that Int64 support if the high bit in a isn't set.
    Int64 sa = (Int64)a.QuadPart;
    if (sa >= 0)
        return (UInt32)(sa % d);

    //sa is negative. What to do...what to do.
    //If we want to continue to work with 64-bit integers,
    //we could now treat our number as two 64-bit signed values:
    //  a == (aHigh + aLow)
    //  aHigh = 0x8000000000000000
    //  aLow  = 0x0fffffffffffffff
    //
    //  a mod d = (aHigh + aLow) % d
    //          = ((aHigh % d) + (aLow % d)) % d  //<--Is this even true!?
    Int64 aLow  = sa & 0x0fffffffffffffff;
    Int64 aHigh = 0x8000000000000000;
    UInt32 rLow  = aLow % d;   //remainder from low portion
    UInt32 rHigh = aHigh % d;  //this doesn't work, because it's "-1 mod d"
    Int64 r = (rHigh + rLow) % d;
    return (UInt32)r;
}
Answer
It took a while, but I finally got an answer. I would have posted it as an answer, but people mistakenly decided that my unique question was an exact duplicate.
UInt32 ModU64(ULARGE_INTEGER a, UInt32 d)
{
    //a = Ah*2^32 + Al, and 2^32 % d == ((0xFFFFFFFF % d) + 1) % d.
    //The product is formed in Int64 so the intermediate calculations cannot
    //overflow (as long as (d-1)^2 fits in Int64, i.e. d <= ~3.0e9).
    UInt32 Al = a.LowPart;
    UInt32 Ah = a.HighPart;
    UInt32 remainder = (UInt32)((((Int64)(Ah % d) * (((0xFFFFFFFF % d) + 1) % d)) + (Al % d)) % d);
    return remainder;
}
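For illustration, a usage sketch with the numbers from the question (my addition; field names follow the ULARGE_INTEGER declaration above):
ULARGE_INTEGER a;
a.HighPart = 0x80123456;      //high 32 bits of 9228496132430806238
a.LowPart  = 0x789ABCDE;      //low 32 bits
UInt32 r = ModU64(a, 585741); //r == 9228496132430806238 % 585741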
Fiddle
I just updated my ALU32 class code in this related QA:
Cant make value propagate through carry
CPU-assembly-independent code for mul/div was requested there. The divider solves all your problems. However, it uses binary long division, so it is a bit slower than stacking up 32-bit mul/mod/div operations. Here is the relevant part of the code:
void ALU32::div(DWORD &c, DWORD &d, DWORD ah, DWORD al, DWORD b)
{
    DWORD ch, cl, bh, bl, h, l, mh, ml;
    int e;
    // edge cases
    if (!b ){ c=0xFFFFFFFF; d=0xFFFFFFFF; cy=1; return; }
    if (!ah){ c=al/b; d=al%b; cy=0; return; }
    // align a,b for binary long division; m is the shifted mask of b's lsb
    for (bl=b, bh=0, mh=0, ml=1; bh<0x80000000;)
    {
        e=0; if (ah>bh) e=+1;    // e = cmp a,b {-1,0,+1}
        else if (ah<bh) e=-1;
        else if (al>bl) e=+1;
        else if (al<bl) e=-1;
        if (e<=0) break;         // a<=b ?
        shl(bl); rcl(bh);        // b<<=1
        shl(ml); rcl(mh);        // m<<=1
    }
    // binary long division
    for (ch=0, cl=0;;)
    {
        sub(l,al,bl);            // a-b
        sbc(h,ah,bh);
        if (cy)                  // a<b ?
        {
            if (ml==1) break;
            shr(mh); rcr(ml);    // m>>=1
            shr(bh); rcr(bl);    // b>>=1
            continue;
        }
        al=l; ah=h;              // a>=b ?
        add(cl,cl,ml);           // c+=m
        adc(ch,ch,mh);
    }
    cy=0; c=cl; d=al;
    if ((ch)||(ah)) cy=1;        // overflow
}
See the linked QA for a description of the class and the subfunctions used. The idea behind a/b is simple:
definition
Let's assume that we have 64/64-bit division (the modulus falls out as a by-product) and want to use 32-bit arithmetic, so:
(ah,al) / (bh,bl) = (ch,cl)
Each 64-bit QWORD is split into a high and a low 32-bit DWORD.
align a,b
Exactly as when computing division on paper, we must align b with a, so find sh such that:
(bh,bl)<<sh <= (ah,al)
(bh,bl)<<(sh+1) > (ah,al)
and compute m so that:
(mh,ml) = 1<<sh
Beware: once bh >= 0x80000000, stop the shifting or we would overflow ...
divide
Set the result c = 0 and then simply subtract b from a while a >= b. For each subtraction, add m to c. Once a < b, shift both b and m right to align again. Stop if m == 0 or a == 0.
result
c will hold the 64-bit result of the division, so use cl; similarly, a holds the remainder, so use al as your modulus result. You can check whether ch and ah are zero; if they are not, an overflow occurred (the result is bigger than 32 bits). The same goes for edge cases like division by zero ...
Now, as you want 64-bit/32-bit, simply set bh = 0 ... To do this I needed 64-bit operations (+, -, <<, >>), which I built by stacking up 32-bit operations with carry (that is the reason my ALU32 class was created in the first place); for more info see the link above.
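For the numbers from the original question, a usage sketch (my own, assuming the ALU32 class from the linked QA compiles as posted and that the carry flag cy is accessible) would look like this:
ALU32 alu;
DWORD q, r;
// dividend (ah,al) = 0x80123456:789ABCDE, divisor 585741
alu.div(q, r, 0x80123456, 0x789ABCDE, 585741);
// q = low 32 bits of the quotient, r = remainder (the modulus you want)
// alu.cy != 0 would signal that the quotient did not fit into 32 bits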
I recently came across this question. I know the naive approach, i.e., to find a^b and then extract the least significant digits of this number 'k' times.
I am looking for a better approach.
'a' and 'b' are integers.
The naive approach breaks when some power a^p still fits (a^p < 10^k), but a^(p+1) already overflows. A solution which only requires values up to 2*10^k - 2 to fit into the variables is to compute (a*b) mod 10^k using Russian peasant multiplication. It builds the product a*b by doubling a and halving b in steps of two, and prevents overflow because you can take the modulus after each step.
Here is a c++ implementation of function calculating (a*b)%m without an overflow:
unsigned long long abModm(unsigned long long a, unsigned long long b, unsigned long long m){
    unsigned long long res = 0;
    a = a % m;
    b = b % m;
    while (b > 0){
        if (b & 1){               // is b odd?
            res = (res + a) % m;  // collect the result
        }
        a = (a << 1) % m;         // double a
        b >>= 1;                  // halve b
    }
    return res;
}
Then you can use this to solve the problem as already suggested by others:
int kthDigit(unsigned long long a, unsigned long long b, int k){
    unsigned long long m = 1;
    for (int i = 0; i < k; ++i) m *= 10;          // m = 10^k
    unsigned long long res = 1;
    for (unsigned long long i = 0; i < b; ++i){   // res = a^b mod 10^k
        res = abModm(res, a, m);
    }
    m /= 10;
    return res / m;                               // k-th digit from the right
}
The exponentiation above is O(b); you can do it in O(log(b)) with:
unsigned long long res = 1;
while (b){
    if (b & 1) res = abModm(res, a, m);
    b >>= 1;
    a = abModm(a, a, m);
}
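Putting the two pieces together, a complete variant of kthDigit that uses the O(log(b)) exponentiation could look like this (my sketch, following the same conventions as the code above):
int kthDigitFast(unsigned long long a, unsigned long long b, int k){
    unsigned long long m = 1;
    for (int i = 0; i < k; ++i) m *= 10;    // m = 10^k
    unsigned long long res = 1;
    a = a % m;
    while (b){                              // exponentiation by squaring mod m
        if (b & 1) res = abModm(res, a, m);
        b >>= 1;
        a = abModm(a, a, m);
    }
    m /= 10;
    return res / m;                         // k-th digit from the right
}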
Check for the special case that a is divisible by 10. If k < b, the result is 0; if k ≥ b, then it's the (k - b)th digit of (a/10)^b.
Do the calculation modulo 10^(k + 1). Replace a with a modulo 10^(k + 1). With 64 bit arithmetic, the calculation is easy if k ≤ 18 and a < 2^32.
Do the power by multiplying in steps, and in each step, discard the highest digits that will not influence the digit you're looking for. This will allow you to go beyond the integer size limitations of your implementation. In Javascript, which is limited to 2^53 - 1, you can calculate e.g. the 9th digit of 999999^999999.
function powerDigit(a, b, k) {
    var c = 1, max = Math.pow(10, k);
    a %= max;
    while (b--) {
        c *= a;
        // if (c >= Math.pow(2, 53)) return NaN; // Javascript limitation
        c %= max;
    }
    return Math.floor(c * 10 / max);
}
document.write(powerDigit(9, 9, 9) + "<BR>"); // 3 ; 387420489
document.write(powerDigit(99, 9, 9) + "<BR>"); // 4 ; 913517247483640899
document.write(powerDigit(99, 99, 9) + "<BR>"); // 2 ; 3.697296376497267726e+197
document.write(powerDigit(999, 999, 9) + "<BR>"); // 4 ; 3.680634882592232678e+2996
document.write(powerDigit(999999, 999999, 9)); // 9 ; millions of digits
First you need to find a^b, then divide this by 10^(k-1), and take the result modulo 10; that gives you the k-th digit from the right.
Here is an example in C:
double r = pow(a, b) / pow(10, k - 1);
int result = (int)r % 10;
Possible Duplicate:
Best algorithm to count the number of set bits in a 32-bit integer?
Given a 32-bit unsigned integer, we want to count the number of non-zero bits in its binary representation. What is the fastest way to do that?
We want to do this N ~ 10^10 times.
Note: using a large look-up table is usually not a good idea because of the architecture of current CPUs. It is much faster to calculate the count locally than to use a huge array that requires accesses to external memory.
There are actually several options; I presume the native way is way too slow for this.
You can go with a lookup table for 8-bit values and do it in parallel for all four bytes of the unsigned int value, then sum the results; a minimal sketch of this follows below. This approach is also quite parallelizable (be it multi-core, or maybe even some SSE3/4 could help).
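For reference, the table-based variant could look like this (my sketch, not part of the original answer; init_popcount8 must be called once before use):
static unsigned char popcount8[256];   // per-byte bit counts, filled once at startup
void init_popcount8(void)
{
    for (int i = 0; i < 256; i++)
        popcount8[i] = (unsigned char)((i & 1) + popcount8[i >> 1]);
}
unsigned int popcount32(unsigned int v)
{
    return popcount8[v & 0xff] + popcount8[(v >> 8) & 0xff] +
           popcount8[(v >> 16) & 0xff] + popcount8[(v >> 24) & 0xff];
}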
You can also go with Brian Kernighan's solution:
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
    v &= v - 1; // clear the least significant bit set
}
And the last possible way I found somewhere some time ago is (on 64-bit machines, as the modulo operation would be really fast there):
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
c = ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
Is there any very fast method to find the binary logarithm of an integer number? For example, given the number
x = 52656145834278593348959013841835216159447547700274555627155488768, such an algorithm must find y = log2(x), which is 215. x is always a power of 2.
The problem seems to be really simple. All that is required is to find the position of the most significant 1-bit. There is a well-known method, FloorLog, but it is not very fast, especially for very long multi-word integers.
What is the fastest method?
A quick hack: Most floating-point number representations automatically normalise values, meaning that they effectively perform the loop Christoffer Hammarström mentioned in hardware. So simply converting from an integer to FP and extracting the exponent should do the trick, provided the numbers are within the FP representation's exponent range! (In your case, your integer input requires multiple machine words, so multiple "shifts" will need to be performed in the conversion.)
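For a value that fits in a single machine word, this amounts to a one-liner; a minimal sketch (mine, assuming the input is a power of two and that double is IEEE-754 binary64) is:
#include <stdint.h>
#include <string.h>
int log2_pow2_u64 (uint64_t x)
{
    double d = (double)x;            /* exact, since x is a power of two */
    uint64_t bits;
    memcpy (&bits, &d, sizeof bits); /* reinterpret the binary64 bit pattern */
    return (int)((bits >> 52) & 0x7ff) - 1023; /* unbiased exponent */
}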
If the integers are stored in a uint32_t a[], then my obvious solution would be as follows:
Run a linear search over a[] to find the most significant (highest-index) non-zero uint32_t word a[i] in a[] (the test can be done on uint64_t values if your machine has native uint64_t support)
Apply the bit twiddling hacks to find the binary log b of the uint32_t value a[i] you found in step 1.
Evaluate 32*i+b. A minimal sketch of these three steps follows below.
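A possible implementation (my own sketch; step 2 here uses a plain loop instead of the bit twiddling hacks):
#include <stdint.h>
/* floor(log2) of a multi-word integer stored little-endian in a[0..nwords-1];
   assumes at least one word is non-zero */
int floor_log2_bignum (const uint32_t a[], int nwords)
{
    int i = nwords - 1;
    while (i > 0 && a[i] == 0) i--;  /* step 1: most significant non-zero word */
    uint32_t x = a[i];
    int b = 0;                       /* step 2: binary log of that word */
    while (x >>= 1) b++;
    return 32 * i + b;               /* step 3 */
}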
The answer is implementation or language dependent. Any implementation can store the number of significant bits along with the data, as it is often useful. If it must be calculated, then find the most significant word/limb and the most significant bit in that word.
If you're using fixed-width integers then the other answers already have you pretty-well covered.
If you're using arbitrarily large integers, like int in Python or BigInteger in Java, then you can take advantage of the fact that their variable-size representation uses an underlying array, so the base-2 logarithm can be computed easily and quickly in O(1) time using the length of the underlying array. The base-2 logarithm of a power of 2 is simply one less than the number of bits required to represent the number.
So when n is an integer power of 2:
In Python, you can write n.bit_length() - 1 (docs).
In Java, you can write n.bitLength() - 1 (docs).
You can create an array of logarithms beforehand. This will find logarithmic values up to log(N):
#define N 100000
int naj[N];
naj[2] = 1;
for ( int i = 3; i < N; i++ )
{
    naj[i] = naj[i-1];
    if ( (1 << (naj[i]+1)) <= i )
        naj[i]++;
}
The array naj holds your logarithm values, where naj[k] = floor(log2(k)); the logarithm is base two.
This uses binary search for finding the closest power of 2.
public static int binLog(int x, boolean shouldRoundResult){
    // assuming 32-bit integer
    int lo = 0;
    int hi = 31;
    int rangeDelta = hi - lo;
    int expGuess = 0;
    int guess;
    while(rangeDelta > 1){
        expGuess = (lo + hi) / 2; // or (lo+hi)>>1
        guess = 1 << expGuess;
        if(guess < x){
            lo = expGuess;
        } else if(guess > x){
            hi = expGuess;
        } else {
            lo = hi = expGuess;
        }
        rangeDelta = hi - lo;
    }
    if(shouldRoundResult && hi > lo){
        int loGuess = 1 << lo;
        int hiGuess = 1 << hi;
        int loDelta = Math.abs(x - loGuess);
        int hiDelta = Math.abs(hiGuess - x);
        if(loDelta < hiDelta)
            expGuess = lo;
        else
            expGuess = hi;
    } else {
        expGuess = lo;
    }
    int result = expGuess;
    return result;
}
The best option off the top of my head would be an O(log(log n)) approach, using binary search. Here is an example for a 64-bit (<= 2^63 - 1) number (in C++):
int log2(int64_t num) {
    int res = 0;
    for(int i = 32; i > 0; i >>= 1) {
        res += i;
        if(((1ULL << res) - 1) & (uint64_t)num)
            res -= i;
    }
    return res;
}
This algorithm will basically provide me with the highest number res such that ((2^res - 1) & num) == 0. Of course, for any number, you can work it out in a similar manner:
int log2_better(int64_t num) {
    int res = 0;
    for(int i = 32; i > 0; i >>= 1) {
        if( (1ULL << (res + i)) <= (uint64_t)num )
            res += i;
    }
    return res;
}
Note that this method relies on the fact that the "bitshift" operation is more or less O(1). If this is not the case, you would have to precompute either all the powers of 2, or the numbers of the form 2^(2^i) (2^1, 2^2, 2^4, 2^8, etc.) and do some multiplications (which in this case aren't O(1)) instead.
The example in the OP is an integer string of 65 characters, which is not representable by an INT64 or even an INT128. It is still very easy to get Log(2,x) from this string by converting it to a double-precision number. This at least gives you easy access to integers up to 2^1023.
Below you will find some form of pseudocode:
# 1. read the string
string="52656145834278593348959013841835216159447547700274555627155488768"
# 2. extract the length of the string
l=length(string) # l = 65
# 3. read the first min(l,17) digits in a float
float=to_float(string(1: min(17,l) ))
# 4. multiply with the correct power of 10
float = float * 10^(l-min(17,l) ) # float = 5.2656145834278593E64
# 5. Take the log2 of this number and round to the nearest integer
log2 = Round( Log(float,2) ) # 215
Note:
some computer languages can convert arbitrary strings into a double precision number. So steps 2,3 and 4 could be replaced by x=to_float(string)
Step 5 could be done more quickly by just reading the double-precision exponent field (bits 52 up to and including 62) and subtracting 1023 from it.
Quick example code: If you have awk you can quickly test this algorithm.
The following code creates the first 300 powers of two:
awk 'BEGIN{for(n=0;n<300; n++) print 2^n}'
The following reads the input and does the above algorithm:
awk '{ l=length($0); m = (l > 17 ? 17 : l)
x = substr($0,1,m) * 10^(l-m)
print log(x)/log(2)
}'
So the following bash-command is a convoluted way to create a consecutive list of numbers from 0 to 299:
$ awk 'BEGIN{for(n=0;n<300; n++) print 2^n}' | awk '{ l=length($0); m = (l > 17 ? 17 : l); x = substr($0,1,m) * 10^(l-m); print log(x)/log(2) }'
0
1
2
...
299