This is task from algorithms book.
The thing is that I completely don't know where to start!
Trace the following non-recursive algorithm to generate the binary reflexive
Gray code of order 4. Start with the n-bit string of all 0’s.
For i = 1, 2, ... 2^n-1, generate the i-th bit string by flipping bit b in the
previous bit string, where b is the position of the least significant 1 in the
binary representation of i.
So I know the Gray code for 1 bit should be 0 1, for 2 00 01 11 10 etc.
Many questions
1) Do I know that for n = 1 I can start of with 0 1?
2) How should I understand "start with the n-bit string of all 0's"?
3) "Previous bit string"? Which string is the "previous"? Previous means from lower n-bit? (for instance for n=2, previous is the one from n=1)?
4) How do I even convert 1-bit strings to 2-bit strings if the only operation there is to flip?
This confuses me a lot. The only "human" method I understand so far is: take sets from lower n-bit, duplicate them, invert the 2nd set, add 0's to every element in 1st set, add 1's do every elements in 2nd set. Done (example: 0 1 -> 0 1 | 0 1 -> 0 1 | 1 0 -> 00 01 | 11 10 -> 11 01 11 10 done.
Thanks for any help
The answer to all four your questions is that this algorithm does not start with lower values of n. All strings it generates have the same length, and the i-th (for i = 1, ..., 2n-1) string is generated from the (i-1)-th one.
Here is the fist few steps for n = 4:
Start with G0 = 0000
To generate G1, flip 0-th bit in G0, as 0 is the position of the least significant 1 in the binary representation of 1 = 0001b. G1 = 0001.
To generate G2, flip 1-st bit in G1, as 1 is the position of the least significant 1 in the binary representation of 2 = 0010b. G2 = 0011.
To generate G3, flip 0-th bit in G2, as 0 is the position of the least significant 1 in the binary representation of 3 = 0011b. G3 = 0010.
To generate G4, flip 2-nd bit in G3, as 2 is the position of the least significant 1 in the binary representation of 4 = 0100b. G4 = 0110.
To generate G5, flip 0-th bit in G4, as 0 is the position of the least significant 1 in the binary representation of 5 = 0101b. G5 = 0111.
Related
We are required to compute the bit wise AND amongst all natural numbers lying between A and B, both inclusive.I came across this problem on a website and here is the approach they used but i couldn't understand the method.Can anyone explain this more clearly with an example ?
In order to solve this problem, we just need to focus on the occurrences of each power 2, which turn out to be cyclic. Now for each 2^i(the length of the cycle will be 2^(i+1) having 2^i zeros followed by same number of ones) we just need to compute if 1 remains constant in the given interval, which is done by simple arithmetic. If so, that power of 2 will be present in the answer, otherwise it won't.
Let's count (unsigned) with 3 bits to visualize some numbers first:
000 = 0
001 = 1
010 = 2
011 = 3
100 = 4
101 = 5
110 = 6
111 = 7
If you look at the columns, you can see that the lowest bit is alternating with a cycle of 1, the next with a cycle of 2, then 4, and the nth lowest bit is alternating with a cycle of 2^(n-1).
As soon as a bit was 0 once it is always 0 (because 0 and whatever is 0).
You could also say the nth bit is only 1 if the nth bit of A and B is 1 and d < 2^(n-1). In other words a bit will only be 1 if it is 1 at the beginning and the end and didn't had time to change to 0 in between because its cycle is too large.
I am working on an Udacity quiz for their parallel programming course. I am pretty stuck on how I should start on the assignment because I am not sure if I understand it correctly.
For the assignment (in code) we are given two arrays and array on values and an array of positions. We are supposed to sort the array of values with a parallelized radix sort, along with setting the positions correctly too.
I completely understand radix sort and how it works. What I don't understand is how they want us to implemented it. Here is the template given to start the assignment
//Udacity HW 4
//Radix Sorting
#include "reference_calc.cpp"
#include "utils.h"
/* Red Eye Removal
===============
For this assignment we are implementing red eye removal. This is
accomplished by first creating a score for every pixel that tells us how
likely it is to be a red eye pixel. We have already done this for you - you
are receiving the scores and need to sort them in ascending order so that we
know which pixels to alter to remove the red eye.
Note: ascending order == smallest to largest
Each score is associated with a position, when you sort the scores, you must
also move the positions accordingly.
Implementing Parallel Radix Sort with CUDA
==========================================
The basic idea is to construct a histogram on each pass of how many of each
"digit" there are. Then we scan this histogram so that we know where to put
the output of each digit. For example, the first 1 must come after all the
0s so we have to know how many 0s there are to be able to start moving 1s
into the correct position.
1) Histogram of the number of occurrences of each digit
2) Exclusive Prefix Sum of Histogram
3) Determine relative offset of each digit
For example [0 0 1 1 0 0 1]
-> [0 1 0 1 2 3 2]
4) Combine the results of steps 2 & 3 to determine the final
output location for each element and move it there
LSB Radix sort is an out-of-place sort and you will need to ping-pong values
between the input and output buffers we have provided. Make sure the final
sorted results end up in the output buffer! Hint: You may need to do a copy
at the end.
*/
void your_sort(unsigned int* const d_inputVals,
unsigned int* const d_inputPos,
unsigned int* const d_outputVals,
unsigned int* const d_outputPos,
const size_t numElems)
{
}
I specifically don't understand how those 4 steps end up sorting the array.
So for the first step, I am supposed to create a histogram of the "digits" (why is that in quotes..?). So given a input value n I need to make a count of the 0's and 1's into a histogram. So, should step 1 create an array of histograms, one for each input value?
And well, for the rest of the steps it breaks down pretty quickly. Could someone show me how these steps are supposed to implement a radix sort?
The basic idea behind a radix sort is that we will consider each element to be sorted digit by digit, from least significant to most significant. For each digit, we will move the elements so that those digits are in increasing order.
Let's take a really simple example. Let's sort four quantities, each of which have 4 binary digits. Let's choose 1, 4, 7, and 14. We'll mix them up and also visualize the binary representation:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
First we will consider bit 0:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
bit 0: 1 0 0 1
Now the radix sort algorithm says we must move the elements in such a way that (considering only bit 0) all the zeroes are on the left, and all the ones are on the right. Let's do this while preserving the order of the elements with a zero bit and preserving the order of the elements with a one bit. We could do that like this:
Element # 2 3 1 4
Value: 14 4 7 1
Binary: 1110 0100 0111 0001
bit 0: 0 0 1 1
The first step of our radix sort is complete. The next step is to consider the next (binary) digit:
Element # 3 2 1 4
Value: 4 14 7 1
Binary: 0100 1110 0111 0001
bit 1: 0 1 1 0
Once again, we must move elements so that the digit in question (bit 1) is arranged in ascending order:
Element # 3 4 2 1
Value: 4 1 14 7
Binary: 0100 0001 1110 0111
bit 1: 0 0 1 1
Now we must move to the next higher digit:
Element # 3 4 2 1
Value: 4 1 14 7
Binary: 0100 0001 1110 0111
bit 2: 1 0 1 1
And move them again:
Element # 4 3 2 1
Value: 1 4 14 7
Binary: 0001 0100 1110 0111
bit 2: 0 1 1 1
Now we move to the last (highest order) digit:
Element # 4 3 2 1
Value: 1 4 14 7
Binary: 0001 0100 1110 0111
bit 3: 0 0 1 0
And make our final move:
Element # 4 3 1 2
Value: 1 4 7 14
Binary: 0001 0100 0111 1110
bit 3: 0 0 0 1
And the values are now sorted. This hopefully seems clear, but in the description so far we've glossed over the details of things like "how do we know which elements to move?" and "how do we know where to put them?" So let's repeat our example, but we'll use the specific methods and sequence suggested in the prompt, in order to answer these questions. Starting over with bit 0:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
bit 0: 1 0 0 1
First let's build a histogram of the number of zero bits in bit 0 position, and the number of 1 bits in bit 0 position:
bit 0: 1 0 0 1
zero bits one bits
--------- --------
1)histogram: 2 2
Now let's do an exclusive prefix-sum on these histogram values:
zero bits one bits
--------- --------
1)histogram: 2 2
2)prefix sum: 0 2
An exclusive prefix-sum is just the sum of all preceding values. There are no preceding values in the first position, and in the second position the preceding value is 2 (the number of elements with a 0 bit in bit 0 position). Now, as an independent operation, let's determine the relative offset of each 0 bit amongst all the zero bits, and each one bit amongst all the one bits:
bit 0: 1 0 0 1
3)offset: 0 0 1 1
This can actually be done programmatically using exclusive prefix-sums again, considering the 0-group and 1-group separately, and treating each position as if it has a value of 1:
0 bit 0: 1 1
3)ex. psum: 0 1
1 bit 0: 1 1
3)ex. psum: 0 1
Now, step 4 of the given algorithm says:
4) Combine the results of steps 2 & 3 to determine the final output location for each element and move it there
What this means is, for each element, we will select the histogram-bin prefix sum value corresponding to its bit value (0 or 1) and add to that, the offset associated with its position, to determine the location to move that element to:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
bit 0: 1 0 0 1
hist psum: 2 0 0 2
offset: 0 0 1 1
new index: 2 0 1 3
Moving each element to its "new index" position, we have:
Element # 2 3 1 4
Value: 14 4 7 1
Binary: 0111 1110 0111 0001
Which is exactly the result we expect for the completion of our first digit-move, based on the previous walk-through. This has completed step 1, i.e. the first (least-significant) digit; we still have the remaining digits to process, creating a new histogram and new prefix sums at each step.
Notes:
Radix-sort, even in a computer, does not have to be done based strictly on binary digits. It's possible to construct a similar algorithm with digits of different sizes, perhaps consisting of 2,3, or 4 bits.
One of the optimizations we can perform on a radix sort is to only sort based on the number of digits that are actually meaningful. For example, if we are storing quantities in 32-bit values, but we know that the largest quantity present is 1023 (2^10-1), we need not sort on all 32 bits. We can stop, expecting a proper sort, after proceeding through the first 10 bits.
What does any of this have to do with GPUs? In so far as the above description goes, not much. The practical application is to consider using parallel algorithms for things like the histogram, the prefix-sums, and the data movement. This decomposition of radix-sort allows one to locate and use parallel algorithms already developed for these more basic operations, in order to construct a fast parallel sort.
What follows is a worked example. This may help with your understanding of radix sort. I don't think it will help with your assignment, because this example performs a 32-bit radix sort at the warp level, for a single warp, ie. for 32 quantities. But a possible advantage from an understanding point of view is that things like histogramming and prefix sums can be done at the warp level in just a few instructions, taking advantage of various CUDA intrinsics. For your assignment, you won't be able to use these techniques, and you will need to come up with full-featured parallel prefix sums, histograms, etc. that can operate on an arbitrary dataset size.
#include <stdio.h>
#include <stdlib.h>
#define WSIZE 32
#define LOOPS 100000
#define UPPER_BIT 31
#define LOWER_BIT 0
__device__ unsigned int ddata[WSIZE];
// naive warp-level bitwise radix sort
__global__ void mykernel(){
__shared__ volatile unsigned int sdata[WSIZE*2];
// load from global into shared variable
sdata[threadIdx.x] = ddata[threadIdx.x];
unsigned int bitmask = 1<<LOWER_BIT;
unsigned int offset = 0;
unsigned int thrmask = 0xFFFFFFFFU << threadIdx.x;
unsigned int mypos;
// for each LSB to MSB
for (int i = LOWER_BIT; i <= UPPER_BIT; i++){
unsigned int mydata = sdata[((WSIZE-1)-threadIdx.x)+offset];
unsigned int mybit = mydata&bitmask;
// get population of ones and zeroes (cc 2.0 ballot)
unsigned int ones = __ballot(mybit); // cc 2.0
unsigned int zeroes = ~ones;
offset ^= WSIZE; // switch ping-pong buffers
// do zeroes, then ones
if (!mybit) // threads with a zero bit
// get my position in ping-pong buffer
mypos = __popc(zeroes&thrmask);
else // threads with a one bit
// get my position in ping-pong buffer
mypos = __popc(zeroes)+__popc(ones&thrmask);
// move to buffer (or use shfl for cc 3.0)
sdata[mypos-1+offset] = mydata;
// repeat for next bit
bitmask <<= 1;
}
// save results to global
ddata[threadIdx.x] = sdata[threadIdx.x+offset];
}
int main(){
unsigned int hdata[WSIZE];
for (int lcount = 0; lcount < LOOPS; lcount++){
unsigned int range = 1U<<UPPER_BIT;
for (int i = 0; i < WSIZE; i++) hdata[i] = rand()%range;
cudaMemcpyToSymbol(ddata, hdata, WSIZE*sizeof(unsigned int));
mykernel<<<1, WSIZE>>>();
cudaMemcpyFromSymbol(hdata, ddata, WSIZE*sizeof(unsigned int));
for (int i = 0; i < WSIZE-1; i++) if (hdata[i] > hdata[i+1]) {printf("sort error at loop %d, hdata[%d] = %d, hdata[%d] = %d\n", lcount,i, hdata[i],i+1, hdata[i+1]); return 1;}
// printf("sorted data:\n");
//for (int i = 0; i < WSIZE; i++) printf("%u\n", hdata[i]);
}
printf("Success!\n");
return 0;
}
The methodology that #Robert Crovella gives is absolutely correct and very helpful. It is mildly different than the process that they explain in the Udacity videos. I'll record one iteration of their method, watchable here, in this answer, jumping off from Robert Crovella's example:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
LSB: 1 0 0 1
Predicate: 0 __1__ __1__ 0
Pred. Scan: 0 __0__ __1__ 2
Number of ones in predicate: 2
!Predicate:__1__ 0 0 __1__
!Pred. Scan: 0 1 1 1
Offset for !Pred. Scan = Number of ones in predicate = 2
!Pred. Scan + Offset:
__2__ 3 3 __3__
Final indexes to move values after 1 iteration (on LSB):
2 0 1 3
Values after 1 iteration (on LSB):
14 4 7 1
I placed emphasis (__ __) on the values that indicate or contain the index to move the value to.
Terms (from Udacity video):
LSB = least significant bit
Predicate (for LSB): (x & 1) == 0
for the next significant bit: (x & 2) == 0
for the one after that: (x & 4) == 0
and so on, with more left shifting (<<)
Pred. Scan = Predicate Scan = Predicate exclusive prefix sum
!Pred. = bits of predicate flipped (0->1 and 1->0)
Number of ones in predicate
note that this is not necessarily the last entry in the scan, you can instead get this value (sum/reduction of the predicate) as an intermediate of the Blelloch scan
A summary of the above is:
Get the predicate of your list (bit in common, starting from the LSB)
Scan the predicate, and record the sum of the predicate in the process
Blelloch Scan on the GPU
note that your predicate will be of arbitrary size, so read the section on Blelloch Scan for arrays of arbitrary instead of 2^n size
Flip bits of the predicate, and scan that
Move the values in your array with the following rule:
For the ith element in the array:
if the ith predicate is TRUE, move the ith value to the index in the ith element of the predicate scan
else, move the ith value to the index in the ith element of the !Predicate scan plus the sum of the Predicate
Move to the next significant bit (NSB)
For reference, you can consult my solution for this HW assignment in CUDA.
This a fairly known problem ( similar question: number of setbits in a number and a game based on setbits but answer not clear ):
The beauty of a number X is the number of 1s in the binary
representation of X. Two players are plaing a game. There is a number n
written on a blackboard. The game is played as following:
Each time a player chooses an integer number (0 <= k) so that 2^k is
less than n and (n-2^k) is equally as beautiful as n. Then n is removed from
blackboard and replaced with n-2^k instead. The player that cannot continue
the game (there is no such k that satisfies the constrains) loses the
game.
The First player starts the game and they alternate turns.
Knowing that both players play optimally must specify the
winner.
Now the solution I came up with is this:
Moving a 1 bit to its right, is subtracting the number by 2^p where ( p = position the bit moved to - 1). Example: 11001 --> 25 now if I change it to 10101 ---> 21 ( 25-(2^2))
A player can't make 2 or more such right shift in 1 round (not the programmatic right shift) as they can't sum to a power of 2. So the player are left with moving the set bit to some position to its right just once each round. This means there can be only R rounds where R is the number of times a set bit can be moved to a more right position. So the winner will always be the 1st player if R is Odd number and 2nd player if R is even number.
Original#: 101001 41
after 1st: 11001 25 (41-16)
after 2nd: 10101 21 (25-4)
after 1st: 1101 13 (21-8)
after 2nd: 1011 11 (13-2)
after 1st: 111 7 (11-4) --> the game will end at this point
I'm not sure about the correctness of the approach, is this correct? or am I missing something big?
Your approach is on the right track. The observation to be made here is that, also as illustrated in the example you gave, the game ends when all ones are on the least significant bits. So we basically need to count how many swaps we need to make the zeros go to the most significant bits.
Let's take an example, say the initial number from which game starts is 12 the the game state is as follows:
Initial state 1100 (12) ->
A makes move 1010 (10) ->
B makes move 1001 (9) ->
A makes move 0101 (5) ->
B makes 0011 (3)
A cannot move further and B wins
This can be programmatically (java program v7) achieved as
public int identifyWinner (int n) {
int total = 0, numZeros = 0;
while (n != 0) {
if ((n & 0x01) == 1) {
total += numZeros;
} else {
numZeros++;
}
n >>= 1;
}
return (total & 0b1) == 1 ? 1 : 2;
}
Also to note that even if there are multiple choices available with a player to make the next move, as illustrated below, the outcome will not change though the intermediate changes leading to outcome may change.
Again let us look at the state flow taking the same example of initial number 12
Initial state 1100 (12) ->
A makes move 1010 (10) ->
(B here has multiple choices) B makes move 0110 (6)
A makes move 0101 (5) ->
B makes 0011 (3)
A cannot move further and B wins
A cannot move further as for no k (k >=0 and n < 2**k so k =0, 1 are the only plausible choices here) does n-2^k has same beauty as n so B wins.
Multiple choices are possible with 41 as starting point as well, but A will win always (41(S) -> 37(A) -> 35(B) -> 19(A) -> 11(B) -> 7(A)).
Hope it helps!
Yes, each turn a 1 can move right if there is a 0 to its right.
But, no, the number of moves is not related to number of zeros. Counterexample:
101 (1 possible move)
versus
110 (2 possible moves)
The number of moves in the game is the sum of the total 1's to the left of each 0. (Or conversely the sum of the total 0's to the right of each 1.)
(i.e. 11000 has 2 + 2 + 2 = 6 moves, but 10100 has 1 + 2 + 2 = 5 moves because one 0 has one less 1 to its right)
The winner of the game will be the first player if the total moves in the game is odd, and will be the second player if the number of moves in the game is even.
Proof:
On any given move a player must choose a bit corresponding to
a 0 immediately to the right of a 1. Otherwise the total number of
1's will increase if a bit corresponding to a different 0 is chosen,
and will decrease if a bit corresponding to a 1 is chosen. Such a move
will result in the 1 to the right of the corresponding chosen bit
being moved one position to its right.
Given this observation, each
1 has to move through every 0 to its right; and every 0 it moves
through consumes one move. Note that regardless of the choices either
player makes on any given move, the total number of moves in the game
remains fixed.
Since Harshdeep has already posted a correct solution looping over each bit (the O(n) solution), I'll post an optimized divide and conquer O(log(n)) solution (in C/C++) reminiscent of a similar algorithm to calculate Hamming Weight. Of course using Big-Oh to describe the algorithm here is somewhat dubious since the number of bits is constant.
I've verified that the below code on all 32-bit unsigned integers gives the same result as the linear algorithm. This code runs over all values in order in 45 seconds on my machine, while the linear code takes 6 minutes and 45 seconds.
Code:
bool FastP1Win(unsigned n) {
unsigned t;
// lo: 0/1 count parity
// hi: move count parity
// 00 -> 00 : 00 >>1-> 00 &01-> 00 ; 00 |00-> 00 ; 00 &01-> 00 &00-> 00 *11-> 00 ^00-> 00
// 01 -> 01 : 01 >>1-> 00 &01-> 00 ; 01 |00-> 01 ; 01 &01-> 01 &00-> 00 *11-> 00 ^01-> 01
// 10 -> 11 : 10 >>1-> 01 &01-> 01 ; 10 |01-> 11 ; 10 &01-> 00 &01-> 00 *11-> 00 ^11-> 11
// 11 -> 00 : 11 >>1-> 01 &01-> 01 ; 11 |01-> 11 ; 11 &01-> 01 &01-> 01 *11-> 11 ^11-> 00
t = (n >> 1) & 0x55555555;
n = (n | t) ^ ((n & t & 0x55555555) * 0x3);
t = n << 2; // move every right 2-bit solution to line up with the every left 2-bit solution
n ^= ((n & t & 0x44444444) << 1) ^ t; // merge the right 2-bit solution into the left 2-bit solution
t = (n << 4); // move every right 4-bit solution to line up with the every left 4-bit solution
n ^= ((n & t & 0x40404040) << 1) ^ t; // merge the right 4-bit solution into the left 4-bit solution (stored in the high 2 bits of every 4 bits)
t = n << 8; // move every right 8-bit solution to line up with the every left 8-bit solution
n ^= ((n & t & 0x40004000) << 1) ^ t; // merge the right 8-bit solution into the left 8-bit solution (stored in the high 2 bits of every 8 bits)
t = n << 16; // move every right 16-bit solution to line up with the every left 16-bit solution
n ^= ((n & t) << 1) ^ t; // merge the right 16-bit solution into the left 16-bit solution (stored in the high 2 bits of every 16 bits)
return (int)n < 0; // return the parity of the move count of the overall solution (now stored in the sign bit)
}
Explanation:
To find number of moves in the game, one can divide the problem into smaller pieces and combine the pieces. One must track the number of 0's in any given piece, and also the number of moves in any given piece.
For instance, if we divide the problem into two 16-bit pieces then the following equation expresses the combination of the solutions:
totalmoves = leftmoves + rightmoves + (rightzeros * (16 - leftzeros)); // 16 - leftzeros yields the leftones count
Since we don't care about the total moves, just weather the value is even or odd (the parity) we only need to track the parity.
Here is the truth table for the parity of addition:
even + even = even
even + odd = odd
odd + even = odd
odd + odd = even
Given the above truth table, the parity of addition can be expressed with an XOR.
And the truth table for the parity of multiplication:
even * even = even
even * odd = even
odd * even = even
odd * odd = odd
Given the above truth table, the parity of multiplication can be expressed with an AND.
If we divide the problem into pieces of even size, then the parity of the zero count, and the one count, will always be equal and we need not track or calculate them separately.
At any given stage of the algorithm we need to know the parity of the zero/one count, and the parity of the number of moves in that piece of the solution. This requires two bits. So, lets transform every two bits in the solution so that the high bit becomes the move count parity, and the low bit becomes the zero/one count parity.
This is accomplished with this computation:
unsigned t;
t = (n >> 1) & 0x55555555;
n = (n | t) ^ ((n & t & 0x55555555) * 0x3);
From here we combine every adjacent 2-bit solution into a 4-bit solution (using & for multiplication, ^ for addition, and the relationships described above), then every adjacent 4-bit solution into a 8-bit solution, then every adjacent 8-bit solution into a 16-bit solution, and finally every adjacent 16-bit solution into a 32-bit solution.
At the end, only the parity of the number of moves is returned stored in the second least significant bit.
I am trying to think of an algorithm to implement this for a given n bit binary number. I tried out many examples, but am unable to find out any pattern. So how shall I proceed?
How about this:
Convert the number to base 4 (this is trivial by simply combining pairs of bits). 5 in base 4 is 11. The values base 4 that are divisible by 11 are somewhat familiar: 11, 22, 33, 110, 121, 132, 203, ...
The rule for divisibility by 11 is that you add all the odd digits and all the even digits and subtract one from the other. If the result is divisible by 11 (which remember is 5), then it's divisible by 11 (which remember is 5).
For example:
123456d = 1 1110 0010 0100 0000b = 132021000_4
The even digits are 1 2 2 0 0 : sum = 5d
The odd digits are 3 0 1 0 : sum = 4d
Difference is 1, which is not divisble by 5
Or another one:
123455d = 1 1110 0010 0011 1111b = 132020333_4
The even digits are 1 2 2 3 3 : sum = 11d
The odd digits are 3 0 0 3 : sum = 6d
Difference is 5, which is a 5 or a 0
This should have a fairly efficient HW implementation because it's mostly bit-slicing, followed by N/2 adders, where N is the number of bits in the number you're interested in.
Note that after adding the digits and subtracting, the maximum value is 3/4 * N, so if you have 16-bit numbers max, you can get at most 12 as a result, so you only need to check for 0, ±5 and ±10 explicitly. If you're using 32-bit numbers then you can get at most 24 as a result, so you need to also check if the result is ±15 or ±20.
Make a Deterministic Finite Automaton (DFA) to implement the divisibility check and implement the DFA in hardware.
Creating a DFA for divisibility by 5 is easy. You just need to notice the remainders and check what 2r (mod 5) and 2r + 1(mod 5) map to. There are many websites that discuss this. For example this one.
There are well-known examples to convert DFA to a hardware representation as well.
Well , I just figured out ...
number mod 5 = a0 * 2^0 mod 5 + a1 * 2^1 mod 5 +a2* 2^2 mod 5 + a3 * 2^3 mod 5 + a4 * 2^4 mod 5 + ....
= a0 (1) + a1(2) +a2 (-1) +a3 (-2) +a4 (1) repeats ...
Hence difference of odd digits + 2 times difference of even digits = divisible by 5
for example ... consider 110010
odd digits differnce = 0-0+1 = 1 or 01
even digits difference = 1-0+1 = 2 or 10
difference of odd digits + 2 times difference of even digits = 01 + 2*(10)=01 + 100 = 101 is divisible by 5 .
The contribution of each bit toward being divisible by five is a four bit pattern 3421.
You could shift through any binary number 4 bits at a time adding the corresponding value for positive bits.
Example:
100011
take 0011
apply the pattern 0021
sum 3
next four bits 0010
apply the pattern 0020
sum = 5
We can design a Deterministic Finite Automaton (DFA) for the same. The DFA, then can be implemented in Hardware. This is similar to this answer.
We will simulate a Deterministic Finite Automaton (DFA) that accepts Binary Representation of Integers which are divisible by 5
Now, by accept, we mean that when we are done with scanning string, we should be in one of the multiple possible Final States.
Approach to Design DFA : Essentially, we need to divide the Binary Representation of Integer by 5, and track the remainder. If after consuming/scanning [From Left to Right] the entire string, remainder is Zero, then we should end up in Final State, and if remainder isn't zero we should be in Non-Final States.
Now, DFA is defined by Quintuple/5-Tuple (Q,q₀,F,Σ,δ). We will obtain these five components step-by-step.
Q : Finite Set of States
We need to track remainder. On dividing any integer by 5, we can get remainder as 0,1, 2, 3 or 4. Hence, we will have Five States Z, O, T, Th and F for each possible remainder.
Q={Z, O, T, Th, F}
If after scanning certain part of Binary String, we are in state Z, this means that integer defined from Left to this part will give remainder Zero when divided by 5. Similarly, O for remainder One, and so on.
Now, we can write these three states by Euclidean Division Algorithm as
Z : 5m
O : 5m+1
T : 5m+2
Th : 5m+3
F : 5m+4
where m is Integer.
q₀ : an initial/start state from set Q
Now, start state can be thought in terms of empty string (ɛ). An ɛ directly gets into q₀.
What remainder does ɛ gives when divided by 5?
We can append as many 0s in left hand side of a Binary Number. In the similar fashion, we can append ɛ in left hand side of a Binary String. Thus, ɛ in left can be thought of as 0. And 0 when divided by 5 gives remainder 0. Hence, ɛ should end in State Z. But ɛ ends up in q₀.
Thus, q₀=Z
F : a set of accept states
Now we want all strings which are divisible by 5, or which gives remainder 0 when divided by 5, or which after complete scanning should end up in state Z, and gets accepted.
Hence,
F={Z}
Σ : Alphabet (a finite set of input symbols)
Since we are scanning/reading a Binary String. Hence,
Σ={0,1}
δ : Transition Function (δ : Q × Σ → Q)
Now this δ tells us that if we are in state x (in Q) and next input to be scanned is y (in Σ), then at which state z (in Q) should we go.
If the string upto this point gives remainder 3/Th when divided by 5, and if we append 1 to string, then what remainder will resultant string give.
Now, this can be analyzed by observing how magnitude of a binary string changes on appending 0 and 1.
a.
In Decimal (Base-10), if we add/append 0, then magnitude gets multiplied by 10 . 53, on appending 0 it becomes 530
Also, if we append 8 to decimal, then Magnitude gets multiplied by 10, and then we add 8 to multiplied magnitude.
b.
In Binary (Base-2), if we add/append 0, then magnitude gets multiplied by 2 (The Positional Weight of each Bit get multiplied by 2)
Example : (1010)2 [which is (10)10], on appending 0 it becomes (10100)2 [which is (20)10]
Similarly, In Binary, if we append 1, then Magnitude gets multiplied by 2, and then we add 1.
Example : (10)2 [which is (2)10], on appending 1 it becomes (101)2 [which is (5)10]
Thus, we can say that for Binary String x,
x0=2|x|
x1=2|x|+1
We will use these relation to analyze Five States
Any string in Z can be written as 5m
- On 0, it becomes 2(5m), which is 5(2m), nothing but state Z.
- On 1, it becomes 2(5m)+1, which is 5(2m)+1, that is O. [This can be read as if a Binary String is presently divisible by 5, and we append 1, then resultant string will give remainder as 1]
Any string in O can be written as 5m+1
- On 0, it becomes 2(5m+1) = 10m+2, which is 5(2m)+2, state T.
- On 1, it becomes 2(5m+1)+1 = 10m+3, which is 5(2m)+3, that is state Th.
Any string in T can be written as 5m+2
- On 0, it becomes 2(5m+2) = 10m+4, which is 5(2m)+4, state F.
- On 1, it becomes 2(5m+2)+1 = 10m+5, which is 5(2m+1), state Z. [If m is integer, so is (2m+1)]
Any string in Th can be written as 5m+3
- On 0, it becomes 2(5m+3) = 10m+6, which is 5(2m+1)+1, state V.
- On 1, it becomes 2(5m+3)+1 = 10m+7, which is 5(2m+1)+2, that is state T.
Any string in F can be written as 5m+4
- On 0, it becomes 2(5m+4) = 10m+8, which is 5(2m+1)+3, state Th.
- On 1, it becomes 2(5m+4)+1 = 10m+9, which is 5(2m+1)+4, that is state F.
Hence, the final DFA combining Everything (creating using Tool)
We can even write code [in High Level Language] for the same. But it would go beyond main aim of this question. If readers wish to see the same, they can check here.
As any assignment this would have been an answer for is bound to be way overdue a year later:
in the binary representation of a natural divisible by five the parities of bits 4n and 4n+2 equal, as well as those for bits 4n+1 and 4n+3.
(This is entirely equivalent to the answers of JoshG79, notsogeek, or james: 4≡-1(mod 5), 3≡-2(mod 5) (with reduced hand-waving about recursion in argumentation, and no dispensable handling of carries in circuitry))
Using the classic code snippet:
if (x & (x-1)) == 0
If the answer is 1, then it is false and not a power of 2. However, working on 5 (not a power of 2) and 4 results in:
0001 1111
0001 1111
0000 1111
That's 4 1s.
Working on 8 and 7:
1111 1111
0111 1111
0111 1111
The 0 is first, but we have 4.
In this link (http://www.exploringbinary.com/ten-ways-to-check-if-an-integer-is-a-power-of-two-in-c/) for both cases, the answer starts with 0 and there is a variable number of 0s/1s. How does this answer whether the number is a power of 2?
You need refresh yourself on how binary works. 5 is not represented as 0001 1111 (5 bits on), it's represented as 0000 0101 (2^2 + 2^0), and 4 is likewise not 0000 1111 (4 bits on) but rather 0000 0100 (2^2). The numbers you wrote are actually in unary.
Wikipedia, as usual, has a pretty thorough overview.
Any power of two number can be represent in binary with a single 1 and multiple 0s.
eg.
10000(16)
1000(8)
100(4)
If you subtract 1 from any power of two number, you will get all 1s to the right of where the original one was.
10000(16) - 1 = 01111(15)
ANDing these two numbers will give you 0 every time.
In the case of a non-power of two number, subtracting one will leave at least one "1" unchanged somewhere in the number like:
10010(18) - 1 = 10001(17)
ANDing these two will result in
10000(16) != 0
Keep in mind that if x is a power of 2, there is exactly 1 bit set. Subtract 1, and you know two things: the resulting value is not a power of two, and the bit that was set is no longer set. So, when you do a bitwise and &, every bit that was set in x is not unset, and all the bits in (x-1) that are set must be matched against bits not set in x. So the and of each bit is always 0.
In other words, for any bit pattern, you are guaranteed that (x&(x-1)) is zero.
((n & (n-1)) == 0)
It checks whether the value of “n” is a power of 2.
Example:
if n = 8, the bit representation is 1000
n & (n-1) = (1000) & ( 0111) = (0000)
So it return zero only if its value is in power of 2.
The only exception to this is ‘0’.
0 & (0-1) = 0 but ‘0’ is not the power of two.
Why does this make sense?
Imagine what happens when you subtract 1 from a string of bits. You read from left to right,
turning each 0 to a 1 until you hit a 1, at which point that bit is flipped:
1000100100 -> (subtract 1) -> 1000100011
Thus, every bit, up through the first 1, is flipped. If there’s exactly one 1 in the number, then every bit (other than the leading zeros) will be flipped. Thus, n & (n-1) == 0 if there’s exactly one 1. If there’s exactly one 1, then it must be a power of two.