Is there an algorithm to bit shift every n-long bits from a number without overflow?

For example, if I have a number 0101 1111 and I want to shift every 4 bit long section to the left to get 1010 1110. While I could just modulo off each section to get two 4-bit numbers, is there an algorithm that doesn't need to do this?

A naive approach
A first naive approach is to slice out the 4-bit groups and process them individually. For the first group of 4 bits, the expected result is obtained with:
(((x & 0xf) // take only 4 bits
<< 1) // shift them by 1
& 0xf) // get rid of potential overflow
For the (n+1)-th group of 4 bits, it is:
(((x & (0xf<<(n*4)))
<< 1)
& (0xf<<(n*4)))
Since this is designed so that there is no overlap between the processed 4-bit groups, you can iterate over all the groups and binary-OR the partial results.
A less naive approach
Another approach is to simply shift the full x by 1, causing every 4-bit group to be shifted at once:
0101 1111 -> 1011 1110
We can then easily get rid of the overflow, and at the same time make sure that 0's are injected on the left, by clearing every 4th bit in the result of the shift:
1011 1110
& 1110 1110
---------
1010 1110
1110 is e in hexadecimal, so you need to build a mask with as many 0xe digits as there are 4-bit segments: 0xee if it's just 8 bits, 0xeeeeeeeeeeeeeeee if it's 64 bits. This mask was suggested in the comments; the above is the explanation behind it.
Be careful if your underlying data type is signed, because of the sign bit: do this processing on unsigned integers to avoid any surprise.
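As a minimal sketch in C, assuming 32-bit unsigned values (the mask simply widens with the integer type):

// Shift every 4-bit group left by 1, discarding each group's overflow.
// 0xeeeeeeee repeats the binary pattern 1110 across all eight groups.
unsigned shift_nibbles_left(unsigned x) {
    return (x << 1) & 0xeeeeeeeeu;
}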

Here is one way.
int bits = 0b1111_0001_0011_0111;
int result = 0;
int m = 0b1111;
while (m != 0) {
    result |= ((bits & m) << 1) & m;
    m <<= 4;
}
System.out.printf("%-7s = %s%n", "src", Integer.toBinaryString(bits));
System.out.printf("%-7s = %s%n", "result", Integer.toBinaryString(result));
Prints
src = 1111000100110111
result = 1110001001101110


Looping over a bitmask

This is a submission to the TopCoder SRM 466 "Lottery Ticket" problem. I've seen this pattern used multiple times for this problem.
Nick likes to play the lottery. The cost of a single lottery ticket is price. Nick has exactly four banknotes with values b1, b2, b3 and b4 (some of the values may be equal). He wants to know if it's possible to buy a single lottery ticket without getting any change back. In other words, he wants to pay the exact price of a ticket using any subset of his banknotes. Return "POSSIBLE" if it is possible or "IMPOSSIBLE" if it is not (all quotes for clarity).
string buy(int p, int b1, int b2, int b3, int b4) {
    int arr[] = {b1, b2, b3, b4};
    for (int msk = 0; msk < (1 << 4); ++msk) {
        int sum = 0;
        for (int i = 0; i < 4; ++i) {
            if (msk & (1 << i)) {
                sum += arr[i];
            }
        }
        if (sum == p) return "POSSIBLE";
    }
    return "IMPOSSIBLE";
}
Can someone explain how this works? I don't understand why he puts the values into an array and loops using two nested for loops.
This problem can be extended to any number of banknotes, but let's go over this example.
The idea of this solution is a brute-force approach: try all possible subsets, and if one of them works, then the result is positive.
A working solution in this case means that the sum of the banknotes picked is equal to p.
Let's look at this piece of code first:
for (int msk = 0; msk < (1 << 4); ++msk)
This says I'm going to iterate over all numbers from 0 to 2^4-1, i.e. 0-15.
If you write these numbers in binary notation, you will notice that they cover all possible combinations of length 4 (we don't have to write the leading zeroes, though an int actually has 32 bits in total).
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
Let's pick one of the examples, e.g. 1010. This means that I will pick the numbers at positions 1 and 3 (0-based, counting from right to left). Then I'll check if the sum of these two numbers is equal to p.
The next for-loop sums all the numbers at positions that have 1:
for (int i = 0; i < 4; ++i) {
    if (msk & (1 << i)) {
        sum += arr[i];
    }
}
If we break it down, we have msk, which represents the current combination we are checking, and (1 << i), which is just a left shift that gives us 2^i, or in binary notation:
0001 = 1 << 0
0010 = 1 << 1
0100 = 1 << 2
1000 = 1 << 3
NOTE: (1 << i) is parenthesized for clarity. (In C++, << actually binds tighter than &, so the parentheses are not strictly required here, but writing them out avoids precedence mistakes.)
If you use the & operator between two integers, you get a bitwise AND of their bits, e.g.
1010 & 1000 = 1000 // this is greater than 0
1010 & 0100 = 0000 // this is equal to 0
Therefore if (msk & (1 << i)) will only be true for positions that have 1 in the current combination, i.e. msk.
I hope this also explains why he puts the values inside the array: it assigns each banknote an index, so a note is used exactly when the mask has a 1 at its position, instead of having to figure out which of the 4 variables should be used.
It just generates all 16 possible combinations. Every combination is represented by 4 bits, 1 meaning the banknote is used, 0 meaning it is not used.
Then it calculates the sum of each combination, and if the sum is correct, it returns "POSSIBLE".
For each currency note, you have two choices, take it or leave it (on or off).
With 4 notes, you can see them as 4 bits, and you go through all possible combinations of picking them if you go through 0000 to 1111 in binary.
That's what the bitvector does. The outer loop generates all possible subsets and the inner loop evaluates a subset to see if it matches the required sum.
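The same pattern generalizes to n banknotes (for small n, since the loop is O(2^n * n)). A hedged sketch in C, with hypothetical names:

#include <stddef.h>

// Return 1 if some subset of notes[0..n-1] sums exactly to price.
int can_pay_exactly(const int *notes, size_t n, int price) {
    for (unsigned msk = 0; msk < (1u << n); ++msk) { // each msk encodes a subset
        int sum = 0;
        for (size_t i = 0; i < n; ++i)
            if (msk & (1u << i))                     // note i is in this subset
                sum += notes[i];
        if (sum == price) return 1;
    }
    return 0;
}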

Parallel radix sort, how would this implementation actually work? Are there some heuristics?

I am working on a Udacity quiz for their parallel programming course. I am pretty stuck on how I should start the assignment, because I am not sure I understand it correctly.
For the assignment (in code) we are given two arrays: an array of values and an array of positions. We are supposed to sort the array of values with a parallelized radix sort, moving the positions along with them.
I completely understand radix sort and how it works. What I don't understand is how they want us to implement it. Here is the template given to start the assignment:
//Udacity HW 4
//Radix Sorting
#include "reference_calc.cpp"
#include "utils.h"
/* Red Eye Removal
===============
For this assignment we are implementing red eye removal. This is
accomplished by first creating a score for every pixel that tells us how
likely it is to be a red eye pixel. We have already done this for you - you
are receiving the scores and need to sort them in ascending order so that we
know which pixels to alter to remove the red eye.
Note: ascending order == smallest to largest
Each score is associated with a position, when you sort the scores, you must
also move the positions accordingly.
Implementing Parallel Radix Sort with CUDA
==========================================
The basic idea is to construct a histogram on each pass of how many of each
"digit" there are. Then we scan this histogram so that we know where to put
the output of each digit. For example, the first 1 must come after all the
0s so we have to know how many 0s there are to be able to start moving 1s
into the correct position.
1) Histogram of the number of occurrences of each digit
2) Exclusive Prefix Sum of Histogram
3) Determine relative offset of each digit
For example [0 0 1 1 0 0 1]
-> [0 1 0 1 2 3 2]
4) Combine the results of steps 2 & 3 to determine the final
output location for each element and move it there
LSB Radix sort is an out-of-place sort and you will need to ping-pong values
between the input and output buffers we have provided. Make sure the final
sorted results end up in the output buffer! Hint: You may need to do a copy
at the end.
*/
void your_sort(unsigned int* const d_inputVals,
               unsigned int* const d_inputPos,
               unsigned int* const d_outputVals,
               unsigned int* const d_outputPos,
               const size_t numElems)
{
}
I specifically don't understand how those 4 steps end up sorting the array.
So for the first step, I am supposed to create a histogram of the "digits" (why is that in quotes?). So given an input value n, I need to count the 0s and 1s into a histogram. Should step 1 create an array of histograms, one for each input value?
And well, for the rest of the steps it breaks down pretty quickly. Could someone show me how these steps are supposed to implement a radix sort?
The basic idea behind a radix sort is that we will consider each element to be sorted digit by digit, from least significant to most significant. For each digit, we will move the elements so that those digits are in increasing order.
Let's take a really simple example. Let's sort four quantities, each of which has 4 binary digits. Let's choose 1, 4, 7, and 14. We'll mix them up and also visualize the binary representation:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
First we will consider bit 0:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
bit 0: 1 0 0 1
Now the radix sort algorithm says we must move the elements in such a way that (considering only bit 0) all the zeroes are on the left, and all the ones are on the right. Let's do this while preserving the order of the elements with a zero bit and preserving the order of the elements with a one bit. We could do that like this:
Element # 2 3 1 4
Value: 14 4 7 1
Binary: 1110 0100 0111 0001
bit 0: 0 0 1 1
The first step of our radix sort is complete. The next step is to consider the next (binary) digit:
Element # 3 2 1 4
Value: 4 14 7 1
Binary: 0100 1110 0111 0001
bit 1: 0 1 1 0
Once again, we must move elements so that the digit in question (bit 1) is arranged in ascending order:
Element # 3 4 2 1
Value: 4 1 14 7
Binary: 0100 0001 1110 0111
bit 1: 0 0 1 1
Now we must move to the next higher digit:
Element # 3 4 2 1
Value: 4 1 14 7
Binary: 0100 0001 1110 0111
bit 2: 1 0 1 1
And move them again:
Element # 4 3 2 1
Value: 1 4 14 7
Binary: 0001 0100 1110 0111
bit 2: 0 1 1 1
Now we move to the last (highest order) digit:
Element # 4 3 2 1
Value: 1 4 14 7
Binary: 0001 0100 1110 0111
bit 3: 0 0 1 0
And make our final move:
Element # 4 3 1 2
Value: 1 4 7 14
Binary: 0001 0100 0111 1110
bit 3: 0 0 0 1
And the values are now sorted. This hopefully seems clear, but in the description so far we've glossed over the details of things like "how do we know which elements to move?" and "how do we know where to put them?" So let's repeat our example, but we'll use the specific methods and sequence suggested in the prompt, in order to answer these questions. Starting over with bit 0:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
bit 0: 1 0 0 1
First let's build a histogram of the number of zero bits in bit 0 position, and the number of 1 bits in bit 0 position:
bit 0: 1 0 0 1
zero bits one bits
--------- --------
1)histogram: 2 2
Now let's do an exclusive prefix-sum on these histogram values:
zero bits one bits
--------- --------
1)histogram: 2 2
2)prefix sum: 0 2
An exclusive prefix-sum is just the sum of all preceding values. There are no preceding values in the first position, and in the second position the preceding value is 2 (the number of elements with a 0 bit in bit 0 position). Now, as an independent operation, let's determine the relative offset of each 0 bit amongst all the zero bits, and each one bit amongst all the one bits:
bit 0: 1 0 0 1
3)offset: 0 0 1 1
This can actually be done programmatically using exclusive prefix-sums again, considering the 0-group and 1-group separately, and treating each position as if it has a value of 1:
0 bit 0: 1 1
3)ex. psum: 0 1
1 bit 0: 1 1
3)ex. psum: 0 1
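For reference, here is a minimal sequential sketch (in C) of the exclusive prefix sum used throughout these steps; a parallel scan computes the same result:

#include <stddef.h>

// out[i] = sum of in[0..i-1]; out[0] = 0.
void exclusive_scan(const unsigned *in, unsigned *out, size_t n) {
    unsigned running = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;
        running += in[i];
    }
}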
Now, step 4 of the given algorithm says:
4) Combine the results of steps 2 & 3 to determine the final output location for each element and move it there
What this means is, for each element, we will select the histogram-bin prefix sum value corresponding to its bit value (0 or 1) and add to that, the offset associated with its position, to determine the location to move that element to:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
bit 0: 1 0 0 1
hist psum: 2 0 0 2
offset: 0 0 1 1
new index: 2 0 1 3
Moving each element to its "new index" position, we have:
Element # 2 3 1 4
Value: 14 4 7 1
Binary: 1110 0100 0111 0001
Which is exactly the result we expect for the completion of our first digit-move, based on the previous walk-through. This completes the pass for the first (least-significant) digit; we still have the remaining digits to process, creating a new histogram and new prefix sums at each step.
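For illustration only (this is not the assignment's CUDA code), here is a sequential C sketch of one pass combining steps 1-4; a full sort repeats this for every bit, ping-ponging between the two buffers:

#include <stddef.h>

// One stable binary LSD pass on the given bit: histogram, exclusive
// prefix sum, per-element relative offset, and scatter.
void radix_pass(const unsigned *in, unsigned *out, size_t n, int bit) {
    size_t hist[2] = {0, 0};
    for (size_t i = 0; i < n; ++i)            // 1) histogram of digit counts
        hist[(in[i] >> bit) & 1]++;
    size_t base[2] = {0, hist[0]};            // 2) exclusive prefix sum
    size_t offset[2] = {0, 0};                // 3) running relative offsets
    for (size_t i = 0; i < n; ++i) {          // 4) scatter to final locations
        unsigned d = (in[i] >> bit) & 1;
        out[base[d] + offset[d]++] = in[i];
    }
}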
Notes:
Radix-sort, even in a computer, does not have to be done based strictly on binary digits. It's possible to construct a similar algorithm with digits of different sizes, perhaps consisting of 2, 3, or 4 bits.
One of the optimizations we can perform on a radix sort is to only sort based on the number of digits that are actually meaningful. For example, if we are storing quantities in 32-bit values, but we know that the largest quantity present is 1023 (2^10-1), we need not sort on all 32 bits. We can stop, expecting a proper sort, after proceeding through the first 10 bits.
What does any of this have to do with GPUs? In so far as the above description goes, not much. The practical application is to consider using parallel algorithms for things like the histogram, the prefix-sums, and the data movement. This decomposition of radix-sort allows one to locate and use parallel algorithms already developed for these more basic operations, in order to construct a fast parallel sort.
What follows is a worked example. This may help with your understanding of radix sort. I don't think it will help with your assignment, because this example performs a 32-bit radix sort at the warp level, for a single warp, i.e. for 32 quantities. But a possible advantage from an understanding point of view is that things like histogramming and prefix sums can be done at the warp level in just a few instructions, taking advantage of various CUDA intrinsics. For your assignment, you won't be able to use these techniques, and you will need to come up with full-featured parallel prefix sums, histograms, etc. that can operate on an arbitrary dataset size.
#include <stdio.h>
#include <stdlib.h>

#define WSIZE 32
#define LOOPS 100000
#define UPPER_BIT 31
#define LOWER_BIT 0

__device__ unsigned int ddata[WSIZE];

// naive warp-level bitwise radix sort
__global__ void mykernel(){
    __shared__ volatile unsigned int sdata[WSIZE*2];
    // load from global into shared variable
    sdata[threadIdx.x] = ddata[threadIdx.x];
    unsigned int bitmask = 1<<LOWER_BIT;
    unsigned int offset = 0;
    unsigned int thrmask = 0xFFFFFFFFU << threadIdx.x;
    unsigned int mypos;
    // for each LSB to MSB
    for (int i = LOWER_BIT; i <= UPPER_BIT; i++){
        unsigned int mydata = sdata[((WSIZE-1)-threadIdx.x)+offset];
        unsigned int mybit = mydata&bitmask;
        // get population of ones and zeroes (cc 2.0 ballot)
        unsigned int ones = __ballot(mybit); // cc 2.0
        unsigned int zeroes = ~ones;
        offset ^= WSIZE; // switch ping-pong buffers
        // do zeroes, then ones
        if (!mybit) // threads with a zero bit
            // get my position in ping-pong buffer
            mypos = __popc(zeroes&thrmask);
        else // threads with a one bit
            // get my position in ping-pong buffer
            mypos = __popc(zeroes)+__popc(ones&thrmask);
        // move to buffer (or use shfl for cc 3.0)
        sdata[mypos-1+offset] = mydata;
        // repeat for next bit
        bitmask <<= 1;
    }
    // save results to global
    ddata[threadIdx.x] = sdata[threadIdx.x+offset];
}

int main(){
    unsigned int hdata[WSIZE];
    for (int lcount = 0; lcount < LOOPS; lcount++){
        unsigned int range = 1U<<UPPER_BIT;
        for (int i = 0; i < WSIZE; i++) hdata[i] = rand()%range;
        cudaMemcpyToSymbol(ddata, hdata, WSIZE*sizeof(unsigned int));
        mykernel<<<1, WSIZE>>>();
        cudaMemcpyFromSymbol(hdata, ddata, WSIZE*sizeof(unsigned int));
        for (int i = 0; i < WSIZE-1; i++)
            if (hdata[i] > hdata[i+1]) {
                printf("sort error at loop %d, hdata[%d] = %d, hdata[%d] = %d\n",
                       lcount, i, hdata[i], i+1, hdata[i+1]);
                return 1;
            }
        // printf("sorted data:\n");
        // for (int i = 0; i < WSIZE; i++) printf("%u\n", hdata[i]);
    }
    printf("Success!\n");
    return 0;
}
The methodology that Robert Crovella gives is absolutely correct and very helpful. It is mildly different from the process they explain in the Udacity videos. I'll record one iteration of their method (watchable here) in this answer, jumping off from Robert Crovella's example:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
LSB: 1 0 0 1
Predicate: 0 __1__ __1__ 0
Pred. Scan: 0 __0__ __1__ 2
Number of ones in predicate: 2
!Predicate:__1__ 0 0 __1__
!Pred. Scan: 0 1 1 1
Offset for !Pred. Scan = Number of ones in predicate = 2
!Pred. Scan + Offset:
__2__ 3 3 __3__
Final indexes to move values after 1 iteration (on LSB):
2 0 1 3
Values after 1 iteration (on LSB):
14 4 7 1
I placed emphasis (__ __) on the values that indicate or contain the index to move the value to.
Terms (from Udacity video):
LSB = least significant bit
Predicate (for LSB): (x & 1) == 0
for the next significant bit: (x & 2) == 0
for the one after that: (x & 4) == 0
and so on, with more left shifting (<<)
Pred. Scan = Predicate Scan = Predicate exclusive prefix sum
!Pred. = bits of predicate flipped (0->1 and 1->0)
Number of ones in predicate
note that this is not necessarily the last entry in the scan; you can instead get this value (the sum/reduction of the predicate) as an intermediate of the Blelloch scan
A summary of the above is (a sequential sketch follows this list):
Get the predicate of your list (bit in common, starting from the LSB)
Scan the predicate, and record the sum of the predicate in the process
Blelloch Scan on the GPU
note that your predicate will be of arbitrary size, so read the section on Blelloch scans for arrays of arbitrary size instead of 2^n size
Flip bits of the predicate, and scan that
Move the values in your array with the following rule:
For the ith element in the array:
if the ith predicate is TRUE, move the ith value to the index in the ith element of the predicate scan
else, move the ith value to the index in the ith element of the !Predicate scan plus the sum of the Predicate
Move to the next significant bit (NSB)
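As a sequential sketch of one such pass (my illustration, with hypothetical names), moving values and positions together as the assignment requires:

#include <stddef.h>

// One predicate-scan pass on the given bit; predicate TRUE means the bit is 0.
void predicate_pass(const unsigned *inVals, const unsigned *inPos,
                    unsigned *outVals, unsigned *outPos,
                    size_t n, unsigned bit) {
    size_t numTrue = 0;                        // sum (reduction) of the predicate
    for (size_t i = 0; i < n; ++i)
        if ((inVals[i] & (1u << bit)) == 0) numTrue++;
    size_t scanTrue = 0, scanFalse = 0;        // running exclusive scans
    for (size_t i = 0; i < n; ++i) {
        size_t dst;
        if ((inVals[i] & (1u << bit)) == 0)
            dst = scanTrue++;                  // predicate scan value
        else
            dst = numTrue + scanFalse++;       // !predicate scan + predicate sum
        outVals[dst] = inVals[i];
        outPos[dst]  = inPos[i];
    }
}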
For reference, you can consult my solution for this HW assignment in CUDA.

beauty of a binary number game

This is a fairly well-known problem (similar question: number of setbits in a number and a game based on setbits, but the answer is not clear):
The beauty of a number X is the number of 1s in the binary representation of X. Two players are playing a game. There is a number n written on a blackboard. The game is played as follows:
Each turn, a player chooses an integer k (k >= 0) such that 2^k is less than n and (n - 2^k) is exactly as beautiful as n. Then n is erased from the blackboard and replaced with n - 2^k. The player that cannot continue the game (there is no k satisfying the constraints) loses the game.
The first player starts the game and the players alternate turns. Knowing that both players play optimally, you must determine the winner.
Now the solution I came up with is this:
Moving a 1 bit one position to its right subtracts 2^p from the number, where p is the (0-based) position the bit moves to. Example: 11001 --> 25; changing it to 10101 ---> 21 (25 - 2^2).
A player can't make two or more such right moves in one round, because the amounts subtracted can't sum to a single power of 2. So each round a player is left with moving one set bit exactly one position to its right. This means the game lasts exactly R rounds, where R is the total number of times a set bit can be moved one position to the right. So the winner will always be the first player if R is odd and the second player if R is even.
Original#: 101001 41
after 1st: 11001 25 (41-16)
after 2nd: 10101 21 (25-4)
after 1st: 1101 13 (21-8)
after 2nd: 1011 11 (13-2)
after 1st: 111 7 (11-4) --> the game will end at this point
I'm not sure about the correctness of the approach, is this correct? or am I missing something big?
Your approach is on the right track. The observation to make here, as illustrated in the example you gave, is that the game ends when all the ones occupy the least significant bits. So we basically need to count how many single-position swaps it takes to move all the zeros to the most significant bits.
Let's take an example. Say the initial number from which the game starts is 12; then the game states are as follows:
Initial state 1100 (12) ->
A makes move 1010 (10) ->
B makes move 1001 (9) ->
A makes move 0101 (5) ->
B makes 0011 (3)
A cannot move further and B wins
This can be achieved programmatically (Java 7) as:
public int identifyWinner(int n) {
    int total = 0, numZeros = 0;
    while (n != 0) {
        if ((n & 0x01) == 1) {
            total += numZeros;
        } else {
            numZeros++;
        }
        n >>= 1;
    }
    return (total & 0b1) == 1 ? 1 : 2;
}
Note also that even when a player has multiple choices for the next move, as illustrated below, the outcome does not change, though the intermediate states leading to it may.
Again let us look at the state flow taking the same example of initial number 12
Initial state 1100 (12) ->
A makes move 1010 (10) ->
(B here has multiple choices) B makes move 0110 (6)
A makes move 0101 (5) ->
B makes 0011 (3)
A cannot move further and B wins
A cannot move further: for n = 3, the constraint 2^k < n leaves k = 0 and k = 1 as the only plausible choices, and for neither does n - 2^k have the same beauty as n, so B wins.
Multiple choices are possible with 41 as starting point as well, but A will win always (41(S) -> 37(A) -> 35(B) -> 19(A) -> 11(B) -> 7(A)).
Hope it helps!
Yes, each turn a 1 can move right if there is a 0 to its right.
But, no, the number of moves is not related to the number of zeros. Counterexample:
101 (1 possible move)
versus
110 (2 possible moves)
The number of moves in the game is the sum of the total 1's to the left of each 0. (Or conversely the sum of the total 0's to the right of each 1.)
(i.e. 11000 has 2 + 2 + 2 = 6 moves, but 10100 has 1 + 2 + 2 = 5 moves, because one 0 has one fewer 1 to its left)
The winner of the game will be the first player if the total moves in the game is odd, and will be the second player if the number of moves in the game is even.
Proof:
On any given move a player must choose a bit corresponding to a 0 immediately to the right of a 1. Otherwise the total number of 1's will increase if a bit corresponding to a different 0 is chosen, and will decrease if a bit corresponding to a 1 is chosen. Such a move results in the 1 immediately to the left of the chosen bit moving one position to its right.
Given this observation, each 1 has to move through every 0 to its right, and every 0 it moves through consumes one move. Note that regardless of the choices either player makes on any given move, the total number of moves in the game remains fixed.
Since Harshdeep has already posted a correct solution looping over each bit (the O(n) solution), I'll post an optimized divide-and-conquer O(log(n)) solution (in C/C++) reminiscent of a similar algorithm for calculating the Hamming weight. Of course, using Big-Oh to describe the algorithm here is somewhat dubious since the number of bits is constant.
I've verified that the code below gives the same result as the linear algorithm for all 32-bit unsigned integers. It runs over all values in order in 45 seconds on my machine, while the linear code takes 6 minutes and 45 seconds.
Code:
#include <stdbool.h>

bool FastP1Win(unsigned n) {
    unsigned t;
    // In each 2-bit solution:
    //   lo bit: 0/1 count parity
    //   hi bit: move count parity
    // 00 -> 00 : 00 >>1-> 00 &01-> 00 ; 00 |00-> 00 ; 00 &01-> 00 &00-> 00 *11-> 00 ^00-> 00
    // 01 -> 01 : 01 >>1-> 00 &01-> 00 ; 01 |00-> 01 ; 01 &01-> 01 &00-> 00 *11-> 00 ^01-> 01
    // 10 -> 11 : 10 >>1-> 01 &01-> 01 ; 10 |01-> 11 ; 10 &01-> 00 &01-> 00 *11-> 00 ^11-> 11
    // 11 -> 00 : 11 >>1-> 01 &01-> 01 ; 11 |01-> 11 ; 11 &01-> 01 &01-> 01 *11-> 11 ^11-> 00
    t = (n >> 1) & 0x55555555;
    n = (n | t) ^ ((n & t & 0x55555555) * 0x3);
    t = n << 2;                           // line up every right 2-bit solution with the left 2-bit solution
    n ^= ((n & t & 0x44444444) << 1) ^ t; // merge the right 2-bit solution into the left 2-bit solution
    t = n << 4;                           // line up every right 4-bit solution with the left 4-bit solution
    n ^= ((n & t & 0x40404040) << 1) ^ t; // merge the right 4-bit solution into the left (each 4-bit solution sits in the high 2 bits of its 4 bits)
    t = n << 8;                           // line up every right 8-bit solution with the left 8-bit solution
    n ^= ((n & t & 0x40004000) << 1) ^ t; // merge the right 8-bit solution into the left (each 8-bit solution sits in the high 2 bits of its 8 bits)
    t = n << 16;                          // line up every right 16-bit solution with the left 16-bit solution
    n ^= ((n & t) << 1) ^ t;              // merge the right 16-bit solution into the left (each 16-bit solution sits in the high 2 bits of its 16 bits)
    return (int)n < 0;                    // the move count parity of the overall solution is now in the sign bit
}
Explanation:
To find the number of moves in the game, one can divide the problem into smaller pieces and combine the pieces. One must track the number of 0's in any given piece, and also the number of moves in any given piece.
For instance, if we divide the problem into two 16-bit pieces then the following equation expresses the combination of the solutions:
totalmoves = leftmoves + rightmoves + (rightzeros * (16 - leftzeros)); // 16 - leftzeros yields the leftones count
Since we don't care about the total moves, just whether the value is even or odd (the parity), we only need to track the parity.
Here is the truth table for the parity of addition:
even + even = even
even + odd = odd
odd + even = odd
odd + odd = even
Given the above truth table, the parity of addition can be expressed with an XOR.
And the truth table for the parity of multiplication:
even * even = even
even * odd = even
odd * even = even
odd * odd = odd
Given the above truth table, the parity of multiplication can be expressed with an AND.
If we divide the problem into pieces of even size, then the parity of the zero count and the parity of the one count will always be equal, and we need not track or calculate them separately.
At any given stage of the algorithm we need to know the parity of the zero/one count, and the parity of the number of moves in that piece of the solution. This requires two bits. So let's transform every two bits of the input so that the high bit becomes the move count parity, and the low bit becomes the zero/one count parity.
This is accomplished with this computation:
unsigned t;
t = (n >> 1) & 0x55555555;
n = (n | t) ^ ((n & t & 0x55555555) * 0x3);
From here we combine every adjacent 2-bit solution into a 4-bit solution (using & for multiplication, ^ for addition, and the relationships described above), then every adjacent 4-bit solution into an 8-bit solution, then every adjacent 8-bit solution into a 16-bit solution, and finally every adjacent 16-bit solution into a 32-bit solution.
At the end, the parity of the number of moves ends up in the high bit of the final two-bit solution, i.e. the sign bit of n, which is what the function returns.
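For comparison, here is a minimal sketch (my illustration, not part of the original answer) of the linear per-bit parity count that the above can be verified against:

#include <stdbool.h>

// Parity of the total move count: each 1 bit contributes one move per 0 below it.
bool LinearP1Win(unsigned n) {
    unsigned parity = 0, zeros = 0;
    while (n) {
        if (n & 1) parity ^= zeros & 1; // this 1 contributes 'zeros' moves
        else zeros++;
        n >>= 1;
    }
    return parity != 0;                 // odd total move count: the first player wins
}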

the nth gray code

The formula for calculating the nth Gray code is:
(n-1) XOR (floor((n-1)/2))
(Source: Wikipedia)
I encoded it as:
int gray(int n)
{
    n--;
    return n ^ (n >> 1);
}
Can someone explain how the above formula works, or possibly its derivation?
If you look at the binary counting sequence, you'll note that neighboring codes differ in their last several bits (with no holes), so XORing them yields a pattern of several trailing 1s. Also, shifting distributes over XOR: (A xor B) >> N == (A >> N) xor (B >> N).
N                  N>>1               gray
0000 .             0000 .             0000 .
  |  >xor = 0001     >xor = 0000        >xor = 0001
0001 .             0000 .             0001 .
  || >xor = 0011   | >xor = 0001        >xor = 0010
0010 .             0001 .             0011 .
  |  >xor = 0001     >xor = 0000        >xor = 0001
0011 .             0001 .             0010 .
  ||| >xor = 0111  || >xor = 0011       >xor = 0100
0100               0010               0110
The original XOR results and the shifted XOR results differ in a single bit (marked by a dot above). This means that if you XOR them, you get a pattern with exactly one bit set. So,
(A xor B) xor (A>>1 xor B>>1) == (A xor A>>1) xor (B xor B>>1) == gray(A) xor gray(B)
As XOR gives us 1s exactly in the differing bits, this proves that neighboring codes differ in a single bit, which is the main property a Gray code must have.
For completeness, it should also be proven that N can be restored from its N ^ (N >> 1) value: knowing the nth bit of the code, we can restore the (n-1)th bit of the number using XOR.
A_[bit n-1] = A_[bit n] xor gray(A)_[bit n-1]
Starting from the largest bit (which is XORed with 0), we can restore the whole number.
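As a sketch of that restoration in C (assuming a 32-bit unsigned value), recovering each bit from the one above it:

// Inverse of gray(): recover n from g = n ^ (n >> 1), top bit first.
unsigned inverse_gray(unsigned g) {
    unsigned n = 0, above = 0; // 'above' holds the already-recovered higher bit
    for (int bit = 31; bit >= 0; --bit) {
        unsigned cur = above ^ ((g >> bit) & 1u); // A[bit] = A[bit+1] xor gray[bit]
        n |= cur << bit;
        above = cur;
    }
    return n;
}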
Prove by induction.
Hint: The 1<<kth to (1<<(k+1))-1th values are twice the 1<<(k-1)th to (1<<k)-1th values, plus either zero or one.
Edit: That was way too confusing. What I really mean is,
gray(2*n) and gray(2*n+1) are 2*gray(n) and 2*gray(n)+1 in some order.
The Wikipedia entry you refer to explains the equation in a very circuitous manner.
However, it helps to start with this:
Therefore the coding is stable, in the sense that once a binary number appears in Gn it appears in the same position in all longer lists; so it makes sense to talk about the reflective Gray code value of a number: G(m) = the m-th reflecting Gray code, counting from 0.
In other words, once a code appears in Gn-1 it keeps its position in Gn, and the second half of Gn is the first half reflected (listed in reverse order) with the new top bit set. So for m < 2^(n-1), Gn(m) = Gn-1(m), and for m >= 2^(n-1), Gn(m) = 2^(n-1) | Gn-1(2^n - 1 - m). For example, G2(3) = 2 | G1(2^2 - 1 - 3) = 2 | G1(0) = 2.
In other words:
G(m, bits), k = 2^(bits - 1)
G(m, bits) = m >= k ? (k | G(2*k - 1 - m, bits - 1)) : G(m, bits - 1)
G(m, 1) = m
Working out the math in its entirety, you get (m ^ (m >> 1)) for the zero-based Gray code.
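For illustration, here is a direct transcription of that recursion in C (hypothetical helper G); it agrees with m ^ (m >> 1) for every m < 2^bits:

// Reflected Gray code by construction: lower half unchanged,
// upper half is the reflected lower half with the top bit set.
unsigned G(unsigned m, unsigned bits) {
    if (bits == 1) return m;            // G(m, 1) = m
    unsigned k = 1u << (bits - 1);
    if (m >= k)
        return k | G(2 * k - 1 - m, bits - 1);
    return G(m, bits - 1);
}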
Incrementing a number, when you look at it bitwise, flips all trailing ones to zeros and the lowest zero to one. That's a whole lot of bits flipped, and the purpose of Gray code is to make it exactly one. This transformation makes both numbers (before and after the increment) equal on all the bits being flipped, except the highest one.
Before:
011...11
+ 1
---------
100...00
After:
010...00
+ 1
---------
110...00
^<--------This is the only bit that differs
(might be flipped in both numbers by carry over from higher position)
n ^ (n >> 1) is easier to compute but it seems that only changing the trailing 011..1 to 010..0 (i.e. zeroing the whole trailing block of 1's except the highest 1) and 10..0 to 11..0 (i.e flipping the highest 0 in the trailing 0's) is enough to obtain a Gray code.

Query about working out whether number is a power of 2

Using the classic code snippet:
if ((x & (x - 1)) == 0)
If the answer is 1, then it is false and not a power of 2. However, working on 5 (not a power of 2) and 4 results in:
0001 1111
0001 1111
0000 1111
That's 4 1s.
Working on 8 and 7:
1111 1111
0111 1111
0111 1111
The 0 is first, but we have 4.
In this link (http://www.exploringbinary.com/ten-ways-to-check-if-an-integer-is-a-power-of-two-in-c/) for both cases, the answer starts with 0 and there is a variable number of 0s/1s. How does this answer whether the number is a power of 2?
You need to refresh yourself on how binary works. 5 is not represented as 0001 1111 (5 bits on); it's represented as 0000 0101 (2^2 + 2^0), and 4 is likewise not 0000 1111 (4 bits on) but rather 0000 0100 (2^2). The numbers you wrote are actually in unary.
Wikipedia, as usual, has a pretty thorough overview.
Any power-of-two number can be represented in binary with a single 1 and multiple 0s.
eg.
10000(16)
1000(8)
100(4)
If you subtract 1 from any power of two number, you will get all 1s to the right of where the original one was.
10000(16) - 1 = 01111(15)
ANDing these two numbers will give you 0 every time.
In the case of a non-power of two number, subtracting one will leave at least one "1" unchanged somewhere in the number like:
10010(18) - 1 = 10001(17)
ANDing these two will result in
10000(16) != 0
Keep in mind that if x is a power of 2, exactly one bit is set. Subtract 1, and you know two things: the resulting value is not a power of two, and the bit that was set in x is no longer set. So when you compute the bitwise AND, the single bit set in x lines up with a 0 in (x-1), and every bit set in (x-1) lines up with a 0 in x, so the AND of each bit pair is 0.
In other words, whenever x is a power of two, you are guaranteed that (x & (x-1)) is zero.
((n & (n-1)) == 0)
It checks whether the value of n is a power of 2.
Example:
if n = 8, the bit representation is 1000
n & (n-1) = (1000) & (0111) = (0000)
So it returns zero only if the value is a power of 2.
The only exception to this is 0:
0 & (0-1) = 0, but 0 is not a power of two.
Why does this make sense?
Imagine what happens when you subtract 1 from a string of bits. You read from right to left, turning each 0 into a 1 until you hit a 1, at which point that bit is flipped:
1000100100 -> (subtract 1) -> 1000100011
Thus, every bit up through the first 1 is flipped. If there's exactly one 1 in the number, then every bit (other than the leading zeros) will be flipped. Thus, n & (n-1) == 0 if there's exactly one 1, and if there's exactly one 1, then the number must be a power of two.
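Putting the trick together with the zero caveat noted earlier, a minimal sketch in C:

#include <stdbool.h>

// True iff x is a power of two; the x != 0 test handles the lone exception.
bool is_power_of_two(unsigned x) {
    return x != 0 && (x & (x - 1)) == 0;
}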
