In Golang, how do you set and clear individual bits of an integer? For example, functions that behave like this:
clearBit(129, 7) // returns 1
setBit(1, 7) // returns 129
Here's a function to set a bit. First, shift the number 1 left by the specified number of places (so it becomes 0010, 0100, etc.). Then OR it with the original input. This leaves the other bits unaffected but always sets the target bit to 1.
// Sets the bit at pos in the integer n.
func setBit(n int, pos uint) int {
    n |= (1 << pos)
    return n
}
Here's a function to clear a bit. First, shift the number 1 left by the specified number of places (so it becomes 0010, 0100, etc.). Then flip every bit in the mask with the ^ operator (so 0010 becomes 1101). A bitwise AND with this mask leaves every bit AND'ed with 1 untouched, but clears the target bit, which is AND'ed with 0.
// Clears the bit at pos in n.
func clearBit(n int, pos uint) int {
    mask := ^(1 << pos)
    n &= mask
    return n
}
Finally, here's a function to check whether a bit is set. Shift the number 1 left by the specified number of places (so it becomes 0010, 0100, etc.) and AND it with the target number. If the result is greater than 0 (it'll be 1, 2, 4, 8, etc.), then the bit is set.
func hasBit(n int, pos uint) bool {
    val := n & (1 << pos)
    return (val > 0)
}
There is also a compact notation to clear a bit. The operator for that is &^, called "AND NOT" (or "bit clear").
Using this operator the clearBit function can be written like this:
// Clears the bit at pos in n.
func clearBit(n int, pos uint) int {
    n &^= (1 << pos)
    return n
}
Or like this:
// Clears the bit at pos in n.
func clearBit(n int, pos uint) int {
    return n &^ (1 << pos)
}
Related
Suppose N is an arbitrary number represented according to IEEE754 single precision standards. I want to find the most precise possible representation of N/2 again in IEEE754.
I want to find a general algorithm (described in words, I just want the necessary steps and cases to take into account) for obtaining the representation.
My approach is :
Say the number is represented: b_0 b_1 b_2 ... b_31.
Isolate the first bit which determines the sign (-/+) of the number.
Calculate the power (p) from the unsigned representation of the exponent field b_1...b_8.
If power = 128 we have a special case. If all the bits of the mantissa are equal to 0, we have, depending on b_0, either minus or plus infinity. We don't change anything. If the mantissa has at least one bit equal to 1 then we have NaN value. Again we change nothing.
If e is inside ]-126, 127[ then we have a normalized mantissa m. The new power can be calculated as p' = p - 1 and belongs in the interval ]-127, 126]. We then calculate m/2 and represent it starting from the right, losing any bits that cannot be included in the 23 bits of the mantissa.
If e = -126, then in calculating the half of this number we pass into a denormalized mantissa. We represent p = -127, calculate half of the mantissa, and represent it again starting from the right, losing any information that cannot be included.
Finally, if e = -127 we have a denormalized mantissa. As long as m/2 can be represented in the number of bits available in the mantissa without losing information, we represent that and keep p = -127. In any other case we represent the number as a positive or negative 0, depending on b_0.
Any steps I have missed, any improvements ( I am sure there are ) that can be made or anything that seems completely wrong?
I implemented a divide by two algorithm in Java and verified it for all 32-bit inputs. I tried to follow your pseudocode, but there were three places where I diverged. First, the infinity/NaN exponent is 128. Second, in case 4 (normal -> normal), there's no need to operate on the fraction. Third, you didn't describe how round half to even works when you do operate on the fraction. LGTM otherwise.
public final class FloatDivision {
    public static float divideFloatByTwo(float value) {
        int bits = Float.floatToIntBits(value);
        int sign = bits >>> 31;
        int biased_exponent = (bits >>> 23) & 0xff;
        int exponent = biased_exponent - 127;
        int fraction = bits & 0x7fffff;
        if (exponent == 128) {
            // value is NaN or infinity
        } else if (exponent == -126) {
            // value is normal, but result is subnormal
            biased_exponent = 0;
            fraction = divideNonNegativeIntByTwo(0x800000 | fraction);
        } else if (exponent == -127) {
            // value is subnormal or zero
            fraction = divideNonNegativeIntByTwo(fraction);
        } else {
            // value and result are normal
            biased_exponent--;
        }
        return Float.intBitsToFloat((sign << 31) | (biased_exponent << 23) | fraction);
    }

    private static int divideNonNegativeIntByTwo(int value) {
        // round half to even
        return (value >>> 1) + ((value >>> 1) & value & 1);
    }

    public static void main(String[] args) {
        int bits = Integer.MIN_VALUE;
        do {
            if (bits % 0x800000 == 0) {
                System.out.println(bits);
            }
            float value = Float.intBitsToFloat(bits);
            if (Float.floatToIntBits(divideFloatByTwo(value)) != Float.floatToIntBits(value / 2)) {
                System.err.println(bits);
                break;
            }
        } while (++bits != Integer.MIN_VALUE);
    }
}
Suppose you're given a int randBit() function which returns, uniformly distributed, 0 or 1.
Write a randNumber(int max) function.
This is my implementation, but I can't prove/disprove that it's right.
// max number of bits
int i = (int)Math.floor(Math.log(max) / Math.log(2)) + 1;
int ret = randBit();
while (i-- > 0) {
    ret = ret << 1 | randBit();
}
return ret;
The basic idea I had is that
find the number of bits present in the number
then generate the number by continuously concatenating the LSB until the bitlength is met
The approach of filling an int with random bits is the right way, in my opinion. However, since your algorithm only works when max is a power of 2 and is off by one in the loop, I'd suggest this modification:
// max number of bits
int i = (int)Math.floor(Math.log(max) / Math.log(2)) + 1;
int rnd = 0;
int mask = 1;
while (i-- > 0) {
    rnd = rnd << 1 | randBit();
    mask <<= 1; // or: mask *= 2
}
double q = (double)rnd / mask; // range is [0, 1)
return (int)((max + 1) * q);
Let's take a look at this:
i will always be equal to the number of bits that max occupies. When the loop is finished, rnd will contain that number of bits filled randomly with 0 or 1, and mask-1 will contain that number of bits filled with 1s. So it's safe to assume that the quotient of rnd and mask-1 is uniformly distributed between 0 and 1. This, multiplied by max, would yield results in the range between 0 and max, also uniformly distributed in terms of floating/real values.
Now this result has to be mapped to integers, and of course you'd want them also to be uniformly distributed. The only catch here is the 1. If the quotient of rnd and mask-1 is exactly 1, there'd be an edge case that would cause trouble when scaling to the desired result range: There would be 0 .. max-1 values uniformly distributed, but max would be a rare exception.
To take care of this condition, the quotient has to be built such that it ranges from 0 to 1, with 1 exclusive. This is achieved by rnd / mask. This range can easily be mapped to uniformly spread integers 0 .. max by multiplying by max+1 and casting to int.
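Here's the suggested modification as a runnable Go sketch. The randBit source is simulated with math/rand purely for illustration; any unbiased 0/1 source would do:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// randBit stands in for the given uniform 0/1 source.
func randBit() int { return rand.Intn(2) }

// randNumber follows the answer's approach: fill enough random bits,
// form a quotient in [0, 1), and scale to 0..max.
func randNumber(max int) int {
	// max number of bits
	i := int(math.Floor(math.Log(float64(max))/math.Log(2))) + 1
	rnd, mask := 0, 1
	for ; i > 0; i-- {
		rnd = rnd<<1 | randBit()
		mask <<= 1
	}
	q := float64(rnd) / float64(mask) // range is [0, 1)
	return int(float64(max+1) * q)
}

func main() {
	for i := 0; i < 5; i++ {
		fmt.Println(randNumber(10)) // values in 0..10
	}
}
```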
I'm interested in a fast method for "expanding bits," which can be defined as the following:
Let B be a binary number with n bits, i.e. B \in {0,1}^n
Let P be the positions of all 1/true bits in B, i.e. (B >> p[i]) & 1 == 1, with |P| = k
For another given number, A \in {0,1}^k, let Ap be the bit-expanded form of A given B, such that Ap[j] == A[j] << p[j].
The result of the "bit expansion" is Ap.
A couple examples:
Given B: 0010 1110, A: 0110, then Ap should be 0000 1100
Given B: 1001 1001, A: 1101, then Ap should be 1001 0001
Following is a straightforward algorithm, but I can't help shake the feeling that there's a faster/easier way to do this.
unsigned int expand_bits(unsigned int A, unsigned int B, int n) {
    int k = popcount(B); // CUDA provides __popc(), but there are good methods for this
    unsigned int Ap = 0;
    int j = k - 1;
    // Starting at the most significant bit,
    for (int i = n - 1; i >= 0; --i) {
        Ap <<= 1;
        // if B is 1, add the value at A[j] to Ap, decrement j.
        if (B & (1 << i)) {
            Ap += (A >> j--) & 1;
        }
    }
    return Ap;
}
The question appears to be asking for a CUDA emulation of the BMI2 instruction PDEP, which takes a source operand a, and deposits its bits based on the positions of the 1-bits of a mask b. There is no hardware support for an identical, or a similar, operation on currently shipping GPUs; that is, up to and including the Maxwell architecture.
I am assuming, based on the two examples given, that the mask b in general is sparse, and that we can minimize work by only iterating over the 1-bits of b. This could cause divergent branches on the GPU, but the exact trade-off in performance is unknown without knowledge of a specific use case. For now, I am assuming that the exploitation of sparsity in the mask b has a stronger positive influence on performance compared to the negative impact of divergence.
In the emulation code below, I have reduced the use of potentially "expensive" shift operations, instead relying mostly on simple ALU instructions. On various GPUs, shift instructions are executed with lower throughput than simple integer arithmetic. I have retained a single shift, off the critical path through the code, to avoid becoming execution limited by the arithmetic units. If desired, the expression 1U << i can be replaced by addition: introduce a variable m that is initialized to 1 before the loop and doubled each time through the loop.
The basic idea is to isolate each 1-bit of mask b in turn (starting at the least significant end), AND it with the value of the i-th bit of a, and incorporate the result into the expanded destination. After a 1-bit from b has been used, we remove it from the mask, and iterate until the mask becomes zero.
In order to avoid shifting the i-th bit of a into place, we simply isolate it and then replicate its value to all more significant bits by simple negation, taking advantage of the two's complement representation of integers.
/* Emulate PDEP: deposit the bits of 'a' (starting with the least significant
   bit) at the positions indicated by the set bits of the mask stored in 'b'.
*/
__device__ unsigned int my_pdep (unsigned int a, unsigned int b)
{
    unsigned int l, s, r = 0;
    int i;
    for (i = 0; b; i++) {        // iterate over 1-bits in mask, until mask becomes 0
        l = b & (0 - b);         // extract mask's least significant 1-bit
        b = b ^ l;               // clear mask's least significant 1-bit
        s = 0 - (a & (1U << i)); // spread i-th bit of 'a' to more signif. bits
        r = r | (l & s);         // deposit i-th bit of 'a' at position of mask's 1-bit
    }
    return r;
}
The variant without any shift operations alluded to above looks as follows:
/* Emulate PDEP: deposit the bits of 'a' (starting with the least significant
   bit) at the positions indicated by the set bits of the mask stored in 'b'.
*/
__device__ unsigned int my_pdep (unsigned int a, unsigned int b)
{
    unsigned int l, s, r = 0, m = 1;
    while (b) {          // iterate over 1-bits in mask, until mask becomes 0
        l = b & (0 - b); // extract mask's least significant 1-bit
        b = b ^ l;       // clear mask's least significant 1-bit
        s = 0 - (a & m); // spread i-th bit of 'a' to more significant bits
        r = r | (l & s); // deposit i-th bit of 'a' at position of mask's 1-bit
        m = m + m;       // mask for next bit of 'a'
    }
    return r;
}
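The shift-free variant ports almost verbatim to Go, which makes it easy to sanity-check against the question's two examples (a sketch; the lowercase name pdep is mine):

```go
package main

import "fmt"

// pdep deposits the bits of a (LSB first) at the positions of the 1-bits of b.
func pdep(a, b uint32) uint32 {
	var r uint32
	m := uint32(1)
	for b != 0 {
		l := b & -b   // extract mask's least significant 1-bit
		b ^= l        // clear it
		s := -(a & m) // spread current bit of a to more significant bits
		r |= l & s    // deposit it at the mask bit's position
		m += m        // mask for next bit of a
	}
	return r
}

func main() {
	fmt.Printf("%08b\n", pdep(0b0110, 0b00101110)) // 00001100
	fmt.Printf("%08b\n", pdep(0b1101, 0b10011001)) // 10010001
}
```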
In comments below, @Evgeny Kluev pointed to a shift-free PDEP emulation on the chessprogramming website that looks potentially faster than either of my two implementations above; it seems worth a try.
Let's say x = 1110 (14 in dec) and I want to find the 2nd set bit from the right, 0100 (4 in dec).
Another example: let's say x = 10110010 (178 in dec) and I want the 3rd set bit from the right, i.e., 00100000 (32 in dec).
How to find it? Is there a hack?
Subtracting one from a number will clear the least-significant bit which was set, while setting bits below that. ANDing with the original number will then leave a number which was equal to the original except with the original lowest set bit clear. This procedure may be iterated N times to yield a number with the lowest N set bits clear. The bit which is changed by the Nth iteration (if any) will be the Nth lowest bit that was set in the original.
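That procedure can be expressed as a small Go sketch (the function name is mine; it returns 0 when x has fewer than n set bits):

```go
package main

import "fmt"

// nthSetBit returns the value of the n-th (1-based) lowest set bit of x,
// or 0 if x has fewer than n set bits.
func nthSetBit(x uint, n int) uint {
	for ; n > 1 && x != 0; n-- {
		x &= x - 1 // clear the lowest set bit
	}
	return x & -x // isolate the (now) lowest set bit
}

func main() {
	fmt.Println(nthSetBit(0b1110, 2))     // 4
	fmt.Println(nthSetBit(0b10110010, 3)) // 32
}
```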
Assuming a two's complement signed 32-bit integer called number is the input (hence only counting bits 0 to 30 in the for loop):
int number = (1 << 3) | 1; // this is the input, it can be whatever you like
int currentLsbCount = 0;
int desiredLsbCount = 2; // this is your n

int foundLsb = 0;
int foundLsbIndex = 0;

for (int i = 0; i < 31; i++)
{
    int bit = (number >> i) & 1;
    if (bit == 1)
    {
        ++currentLsbCount;
    }
    if (currentLsbCount == desiredLsbCount)
    {
        foundLsb = number & (1 << i);
        foundLsbIndex = i;
        break;
    }
}
foundLsb will hold the value or will be zero if the input was zero; foundLsbIndex will hold the index of the bit.
As far as I know you would have to iterate. There is no quicker method than looping through the bits. You could add some skip logic in, but it would not improve the worst case timing. For instance:
if ((number & ((1 << x) - 1)) == number)
{
    // all set bits of number are within the bottom x bits...
}
This would increase the number of operations for the worst case.
In VB.NET, I'd possibly do the following:
Private Function ReturnBit(input As Long, num As Long) As Long
    Dim iResult As Long = 0 'Counts set bits.
    Dim work As Long = input 'Working copy of input.
    'Looping from the LSB to the MSB of a byte. Adjust for desired
    'length, 15 for 2 bytes, 31 for 4 bytes, etc.
    For i As Integer = 0 To 7
        'If the working variable is 0, the input does not contain as
        'many set bits as required. Return -1 if you wish.
        If work = 0 Then Return 0
        'Add the now LSB if 1, 0 otherwise.
        iResult += (work And 1)
        'iResult contains the number of set bits now. If this is
        'the requested number, return this number. If you're just after
        'the position, just return i instead. Instead of 2^i it could be
        'more efficient to use 1<<i, but I'd rely on the compiler for
        'this.
        If iResult = num Then Return CLng(2 ^ i)
        'Remove the LSB from the working copy.
        work >>= 1
    Next
    Return 0 'Not enough set bits in input.
End Function
The following is the implementation of BitSet in the solution of question 10-4 in cracking the coding interview book. Why is it allocating an array of size/32 not (size/32 + 1). Am I missing something here or this is a bug?
If I pass 33 to the constructor of BitSet, it will allocate only one int, and if I try to set or get bit 32, I will get an out-of-bounds error!
package Question10_4;

class BitSet {
    int[] bitset;

    public BitSet(int size) {
        bitset = new int[size >> 5]; // divide by 32
    }

    boolean get(int pos) {
        int wordNumber = (pos >> 5); // divide by 32
        int bitNumber = (pos & 0x1F); // mod 32
        return (bitset[wordNumber] & (1 << bitNumber)) != 0;
    }

    void set(int pos) {
        int wordNumber = (pos >> 5); // divide by 32
        int bitNumber = (pos & 0x1F); // mod 32
        bitset[wordNumber] |= 1 << bitNumber;
    }
}
From what I can gather from reading the solution you mention (on page 205), and the little I understand about computer programming, this seems to be a special-purpose implementation of a bitset, meant to take an argument of 32,000 in its construction (see the checkDuplicates function; the question asks about examining an array with numbers from 1 to N, where N is at most 32,000, with only 4KB of memory).
This way, an array of 1000 elements is created, each one used for 32 bits in the bit set. You can see in the bitset class that to get a bit's position, we (floor) divide by 32 to get the array index, and then mod 32 to get the specific bit position.
Yes, answer in the book is incorrect.
Correct answer:
bitset = new int[(size + 31) >> 5]; // divide by 32
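To see the fix in action, here's a minimal Go sketch of the same structure with the corrected (size + 31) / 32 allocation (all names here are mine):

```go
package main

import "fmt"

// bitset stores bits packed into 32-bit words, rounding the word
// count up so the last partial word is allocated.
type bitset struct{ words []uint32 }

func newBitset(size int) bitset {
	return bitset{words: make([]uint32, (size+31)>>5)} // divide by 32, rounding up
}

func (b bitset) set(pos int)      { b.words[pos>>5] |= 1 << (pos & 0x1f) }
func (b bitset) get(pos int) bool { return b.words[pos>>5]&(1<<(pos&0x1f)) != 0 }

func main() {
	bs := newBitset(33) // 33 bits now allocates two words, not one
	bs.set(32)          // no out-of-range panic
	fmt.Println(bs.get(32), bs.get(31)) // true false
}
```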