hamming code parity distribution issue - bit

Can someone please clarify where I'm going wrong? I've been at this for 2 hours... I know that the first parity bit includes itself and then covers every other position after it, in a yes/no alternating sequence. The second covers itself and the following position, then skips the next two, and so on in pairs. The 4th should include itself and the next 3 positions, then skip 4.
These are my message bits in their original form: 1101011011000110
and I want to add the Hamming parity bits onto them.
? = Parity
so this means ??1?101?01101100011?0
Parity 1 = ?110110010
Parity 2 = ?101111011
Parity 3 = ?1010110?0 (this is where my issue is so I can't move on)
Parity 4 = can't get to this part...

It looks like you have the 5th parity bit in the wrong location.
You have:
??1?101?01101100011?0
But it should be:
??1?101?0110110?00110
So, recalculating the parity bits (the bracketed groups below are the ones each parity bit covers; everything else is skipped), we follow the pattern "keep 1, skip 1, keep 1...":
[?] ? [1] ? [1] 0 [1] ? [0] 1 [1] 0 [1] 1 [0] ? [0] 0 [1] 1 [0]
P1: ?1110110010 (even number of 1s, so 0 parity)
(discard first 1 bit) Follow the pattern "keep 2, skip 2, keep 2...":
[?1] ?1 [01] ?0 [11] 01 [10] ?0 [01] 10
P2: ?101111001 (even number of 1s, so 0 parity)
(discard next 2 bits) Follow the pattern "keep 4, skip 4, keep 4...":
[?101] ?011 [0110] ?001 [10]
P4: ?101011010 (odd number of 1s, so 1 parity)
(discard next 4 bits) Follow the pattern "keep 8, skip 8, keep 8..." (but we can only do keep 8, skip 6, so that's good enough here):
[?0110110] ?00110
P8: ?0110110 (even number of 1s, so 0 parity)
(discard next 8 bits) Follow the pattern "keep 16, skip 16..." (only 6 bits remain, so we keep them all):
[?00110]
P16: ?00110 (even number of 1s, so 0 parity)
So the final Hamming (21,16) encoded value is:
001110100110110000110
(21 bits, 16 of which are data bits)
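If you want to double-check the arithmetic, here is a small Python sketch (not part of the original answer) that lays the data bits out around parity positions 1, 2, 4, 8, 16 and applies the same even-parity rule used above:

# Sketch: Hamming(21,16) even-parity encoder matching the walk-through above.
def hamming_encode(data_bits):
    n = 21
    code = [0] * (n + 1)                 # index 0 unused; positions 1..21
    parity_positions = [1, 2, 4, 8, 16]
    data = iter(data_bits)
    for pos in range(1, n + 1):          # drop data bits into non-parity positions
        if pos not in parity_positions:
            code[pos] = next(data)
    for p in parity_positions:           # parity p covers every position with bit p set
        covered = [code[i] for i in range(1, n + 1) if i & p and i != p]
        code[p] = sum(covered) % 2       # even parity: make the total number of 1s even
    return code[1:]

message = [int(b) for b in "1101011011000110"]
print("".join(map(str, hamming_encode(message))))
# expected: 001110100110110000110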

Why n bitwise AND -n always returns the rightmost set bit

Here is a Python code snippet:
1 & -1 # 1
2 & -2 # 2
3 & -3 # 1
...
It seems n & -n always returns the rightmost set bit; I don't really know why. Can someone help me understand this?
It's due to the way that negative numbers are represented in binary, which is called two's complement representation.
To create the two's complement of some number n (in other words, to create the representation of -n):
Invert all the bits
Add 1
So in other words, when you write 1 & -1 it really means 1 & ((~1)+1). (Let's stick with 8 bits for these examples.) The initial ~1 gives the value 11111110 and adding one gives 11111111. ANDing that value with 1 gives just 1.
In the next case, 2 & -2 means 2 & ((~2)+1). Inverting 2 gives 11111101 and adding one gives 11111110. Then AND with 2 (10 in binary) gives 2.
Finally 3 & -3 means 3 & ((~3)+1). Invert 3 gives 11111100, add 1 gives 11111101, and AND with 3 (11 binary) gives 1.
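Here is a quick Python check of those three cases (my own snippet, not from the answer); the mask keeps everything at 8 bits, since Python integers are otherwise arbitrary-precision:

MASK = 0xFF  # pretend we only have 8 bits

for n in (1, 2, 3, 12, 40):
    neg = (~n + 1) & MASK   # two's complement of n within 8 bits
    print(f"{n:3d}: {n & MASK:08b} & {neg:08b} = {(n & neg):08b}  ->  {n & neg}")

The last two lines of output show the same behaviour on numbers with trailing zeros: 12 & -12 gives 4 and 40 & -40 gives 8, i.e. the lowest set bit each time.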
~x = -1 -x
so
-x = ~x + 1
When you take the complement of x (~x), all the 0-bits turn to 1 and all the 1-bits turn to 0, e.g. 101100 -> 010011.
When you add 1, the consecutive 1s on the right change to 0 and the first 0 bit gets set to 1: 010011 -> 010100
If you & that with the original: the 0 bits at the top that changed to 1 come out 0, and the 1 bits at the bottom that the addition flipped back to 0 come out 0. Only the rightmost 1 bit, which turned into the rightmost 0 bit in the complement and got set back to 1 by the addition, is 1 on both sides: 101100 & 010100 -> 000100
Integers are stored in memory in binary form. Non-negative integers are stored as their plain binary representation, but negative numbers are stored in two's complement form. For example, take an arbitrary number, 158.
158 = 0000000010011110
while its negative ie.
-158 = 1111111101100010
Take any number and bitwise AND it with its negative and you will get the rightmost set bit. This is because, in the process of converting to two's complement, we start from the right and copy the bits as they are until we encounter the first set bit; that is, the rightmost set bit is written as it is. Then we flip the digits to the left of it.
However, the above procedure is just a shortcut method for calculating the 2's complement. For the actual process, you first take the 1's complement of the number (flipping all set bits to unset and unset bits to set) and then add 1 to the whole result. Here is some insight into why this shortcut works every time:
Why does this two's complement shortcut work?
This will give you more insight https://www.geeksforgeeks.org/efficient-method-2s-complement-binary-string/
Take some numbers, work through the examples, and see for yourself.
The catch is in calculating the 2's complement, not in performing the bitwise AND operation; AND only gives 1 where both operands have a 1.
What the 2's complement is doing is making the rightmost set bit the same in both numbers taking part in the bitwise AND. Look below at how the 2's complement is calculated: we represent the number (decimal) in binary, and then, starting from the right, we copy each binary digit unchanged until we see the first 1; after that first 1, we flip everything to its left.
The catch is:
Say you want the 2's complement of 4, i.e. (-4). Represent the decimal in binary and copy all bits (0s) from the right until you see the first 1; after that, reverse all the 0s and 1s.
Example: we want the 2's complement of 6 -> 0 1 1 0 = 1 0 1 0. From the right of 0110 we copy exactly what is there up to and including the first 1, then we reverse all the remaining 0s and 1s.
Another example with the 2's complement:
0100
1100 (the rightmost "100" is copied unchanged; only the bit to its left is flipped)
Now it is obvious that when you do the bitwise AND, only the rightmost set bit makes it through the operation, since AND needs both bits to be 1.
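As a quick sanity check of the shortcut (my own sketch, with the made-up helper name twos_complement_shortcut), the "copy up to and including the first 1 from the right, flip the rest" rule gives the same bits as (~n + 1) in a fixed width:

def twos_complement_shortcut(n, width=8):
    # assumes 0 < n < 2**width
    bits = format(n, f"0{width}b")
    i = bits.rfind("1")                                   # rightmost set bit
    flipped = "".join("1" if b == "0" else "0" for b in bits[:i])
    return flipped + bits[i:]                             # flip the left part, copy the rest

for n in (4, 6, 158):
    assert twos_complement_shortcut(n) == format(-n & 0xFF, "08b")
    print(n, "->", twos_complement_shortcut(n))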

Decode number coded with some strange algorithm

There is such an encoding algorithm: we're given some number x, let's say 7923. We add zeros to the end and the beginning of it - 079230. Now we sum digits, beginning from the right side - the first digit with the second, the second with the third, etc. If the sum is bigger than 9, we subtract 10 from it and carry 1 to the next operation.
So for the example of 079230: 0+3 = 3, 3+2 = 5, 2+9 = 11, 9+7+1 = 17, 7+0+1 = 8. The encoded number is made of the results of these operations, so we end up with: 87153.
Now, we are given the encoded number, but with one digit missing - for example 871X3. Now we need to decode it - get 7923 from it.
So shortly speaking - 871X3 is the input, the output should be 7923. Only one digit could be missing.
That is kind of a weird problem for me. I guess we need to somehow deduce the original number, maybe with some backtracking? The first digit of the original number could be the first digit of the encoded number, or that digit minus 1. But I can't see any path leading from there to the solution.
Look at your algorithm: this is merely performing long multiplication. The "encoded" number is the original times 11.
Since the multiplier (11) is greater than 9, each possible value of the missing digit leaves a different remainder mod 11, so there can be only one possibility for the missing digit. Find it with modular arithmetic. Put in 0 as the missing digit and check:
87103 mod 11 ==> 5
Now, the "parity" of the number changes with the position. Counting from the right, even-numbered positions will return the missing digit itself (factors of 0, 99, 9999, 999999, etc. will drop out). Odd-numbered positions will give the additive inverse: subtract that digit from 11 to get the missing one. For the given example, try this for each of the digits:
87150 mod 11 ==> 8 missing digit is 3 (11-8)
87103 mod 11 ==> 5 missing digit is 5
87053 mod 11 ==> 10 missing digit is 1 (11-10)
80153 mod 11 ==> 7 missing digit is 7
7153 mod 11 ==> 3 missing digit is 8 (11-3)
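Here is a small Python sketch of that idea (mine, not from the answer); pos counts digit positions from the right starting at 1, and the position of the missing digit is assumed to be known:

def missing_digit(encoded_with_zero, pos):
    # encoded_with_zero: the encoded number with 0 substituted for the missing digit
    r = encoded_with_zero % 11
    if pos % 2 == 0:
        return r                # even position: remainder is the digit itself
    return (11 - r) % 11        # odd position: remainder is the additive inverse

print(missing_digit(87150, 1))  # 3
print(missing_digit(87103, 2))  # 5
print(missing_digit(87053, 3))  # 1
print(missing_digit(80153, 4))  # 7
print(missing_digit(7153, 5))   # 8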
Say the digits of the original number are, from left to right, d, c, b, a.
(0 + a) mod 10 = 3, so a = 3
(3 + b) mod 10 = 5, so b = 2
(2 + c) mod 10 = 1, so c = 9 (this sum carries a 1)
(9 + 1 + d) mod 10 = 7, so d = 7 (again carrying a 1)
(7 + 1) mod 10 = 8, as expected.
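And a Python sketch of this digit-by-digit back-substitution (my own illustration; it is equivalent to simply dividing the completed encoded number by 11):

def decode(encoded):
    # Recover x from encoded = x * 11, digit by digit from the right.
    digits = [int(c) for c in str(encoded)][::-1]   # least significant digit first
    prev, carry, original = 0, 0, []
    for e in digits[:-1]:                           # the top encoded digit is only a final check
        d = (e - prev - carry) % 10                 # undo "previous digit + this digit + carry"
        carry = 1 if prev + d + carry > 9 else 0
        original.append(d)
        prev = d
    return int("".join(str(d) for d in reversed(original)))

print(decode(87153))  # 7923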

How can I understand the result of this testcase in this code challenge?

I am trying to understand the first testcase of this challenge in codeforces.
The description is:
Sergey is testing a next-generation processor. Instead of bytes the processor works with memory cells consisting of n bits. These bits are numbered from 1 to n. An integer is stored in the cell in the following way: the least significant bit is stored in the first bit of the cell, the next significant bit is stored in the second bit, and so on; the most significant bit is stored in the n-th bit.
Now Sergey wants to test the following instruction: "add 1 to the value of the cell". As a result of the instruction, the integer that is written in the cell must be increased by one; if some of the most significant bits of the resulting number do not fit into the cell, they must be discarded.
Sergey wrote certain values of the bits in the cell and is going to add one to its value. How many bits of the cell will change after the operation?
Summary
Given a binary number, add 1 to its value and count how many bits change after the operation.
Testcases
4
1100
= 3
4
1111
= 4
Note
In the first sample the cell ends up with value 0010, in the second sample — with 0000.
In the second test case 1111 is 15, so 15 + 1 = 16 (10000 in binary), so all the 1s change, therefore the answer is 4.
But in the first test case 1100 is 12, so 12 + 1 = 13 (01101); here it looks like just the 1 at the end changes, so why is the result 3?
You've missed the crucial part: the least significant bit is the first one (i.e. the leftmost one), not the last one, as we usually write binary.
Thus, 1100 is not 12 but 3. And so, 1100 + 1 = 3 + 1 = 4 = 0010, so 3 bits are changed.
The "least significant bit" means literally a bit that is not the most significant, so you can understand it as "the one representing the smallest value". In binary, the bit representing 2^0 is the least significant. So the binary code in your task is written as follows:
bit no.:  0    1    2    3    4   (...)
value:   2^0  2^1  2^2  2^3  2^4  (...)
(the least significant bit is on the left, the most significant bit is on the right)
that's why 1100 is:
1100 = 1 * 2^0 + 1 * 2^1 + 0 * 2^2 + 0 * 2^3 = 1 + 2 + 0 + 0 = 3
not the other way around (as we write usually).
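A tiny Python sketch of the whole task (my own illustration; changed_bits and its arguments are made-up names): reverse the cell string to get the usual bit order, add 1, truncate to n bits, and count the positions that differ:

def changed_bits(n, cell):
    # cell is the n-character string as given, least significant bit first
    value = int(cell[::-1], 2)            # reverse to normal MSB-first order
    new = (value + 1) % (1 << n)          # add 1, discard overflow beyond n bits
    return bin(value ^ new).count("1")    # count differing bits

print(changed_bits(4, "1100"))  # 3
print(changed_bits(4, "1111"))  # 4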

Parallel radix sort, how would this implementation actually work? Are there some heuristics?

I am working on a Udacity quiz for their parallel programming course. I am pretty stuck on how I should start the assignment because I am not sure if I understand it correctly.
For the assignment we are given two arrays: an array of values and an array of positions. We are supposed to sort the array of values with a parallelized radix sort, along with setting the positions correctly too.
I completely understand radix sort and how it works. What I don't understand is how they want us to implement it. Here is the template given to start the assignment:
//Udacity HW 4
//Radix Sorting
#include "reference_calc.cpp"
#include "utils.h"
/* Red Eye Removal
===============
For this assignment we are implementing red eye removal. This is
accomplished by first creating a score for every pixel that tells us how
likely it is to be a red eye pixel. We have already done this for you - you
are receiving the scores and need to sort them in ascending order so that we
know which pixels to alter to remove the red eye.
Note: ascending order == smallest to largest
Each score is associated with a position, when you sort the scores, you must
also move the positions accordingly.
Implementing Parallel Radix Sort with CUDA
==========================================
The basic idea is to construct a histogram on each pass of how many of each
"digit" there are. Then we scan this histogram so that we know where to put
the output of each digit. For example, the first 1 must come after all the
0s so we have to know how many 0s there are to be able to start moving 1s
into the correct position.
1) Histogram of the number of occurrences of each digit
2) Exclusive Prefix Sum of Histogram
3) Determine relative offset of each digit
For example [0 0 1 1 0 0 1]
-> [0 1 0 1 2 3 2]
4) Combine the results of steps 2 & 3 to determine the final
output location for each element and move it there
LSB Radix sort is an out-of-place sort and you will need to ping-pong values
between the input and output buffers we have provided. Make sure the final
sorted results end up in the output buffer! Hint: You may need to do a copy
at the end.
*/
void your_sort(unsigned int* const d_inputVals,
               unsigned int* const d_inputPos,
               unsigned int* const d_outputVals,
               unsigned int* const d_outputPos,
               const size_t numElems)
{
}
I specifically don't understand how those 4 steps end up sorting the array.
So for the first step, I am supposed to create a histogram of the "digits" (why is that in quotes..?). So given an input value n, I need to make a count of the 0s and 1s in a histogram. So, should step 1 create an array of histograms, one for each input value?
And well, for the rest of the steps it breaks down pretty quickly. Could someone show me how these steps are supposed to implement a radix sort?
The basic idea behind a radix sort is that we will consider each element to be sorted digit by digit, from least significant to most significant. For each digit, we will move the elements so that those digits are in increasing order.
Let's take a really simple example. Let's sort four quantities, each of which has 4 binary digits. Let's choose 1, 4, 7, and 14. We'll mix them up and also visualize the binary representation:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
First we will consider bit 0:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
bit 0: 1 0 0 1
Now the radix sort algorithm says we must move the elements in such a way that (considering only bit 0) all the zeroes are on the left, and all the ones are on the right. Let's do this while preserving the order of the elements with a zero bit and preserving the order of the elements with a one bit. We could do that like this:
Element # 2 3 1 4
Value: 14 4 7 1
Binary: 1110 0100 0111 0001
bit 0: 0 0 1 1
The first step of our radix sort is complete. The next step is to consider the next (binary) digit:
Element # 2 3 1 4
Value: 14 4 7 1
Binary: 1110 0100 0111 0001
bit 1: 1 0 1 0
Once again, we must move elements so that the digit in question (bit 1) is arranged in ascending order:
Element # 3 4 2 1
Value: 4 1 14 7
Binary: 0100 0001 1110 0111
bit 1: 0 0 1 1
Now we must move to the next higher digit:
Element # 3 4 2 1
Value: 4 1 14 7
Binary: 0100 0001 1110 0111
bit 2: 1 0 1 1
And move them again:
Element # 4 3 2 1
Value: 1 4 14 7
Binary: 0001 0100 1110 0111
bit 2: 0 1 1 1
Now we move to the last (highest order) digit:
Element # 4 3 2 1
Value: 1 4 14 7
Binary: 0001 0100 1110 0111
bit 3: 0 0 1 0
And make our final move:
Element # 4 3 1 2
Value: 1 4 7 14
Binary: 0001 0100 0111 1110
bit 3: 0 0 0 1
And the values are now sorted. This hopefully seems clear, but in the description so far we've glossed over the details of things like "how do we know which elements to move?" and "how do we know where to put them?" So let's repeat our example, but we'll use the specific methods and sequence suggested in the prompt, in order to answer these questions. Starting over with bit 0:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
bit 0: 1 0 0 1
First let's build a histogram of the number of zero bits in bit 0 position, and the number of 1 bits in bit 0 position:
bit 0: 1 0 0 1
zero bits one bits
--------- --------
1)histogram: 2 2
Now let's do an exclusive prefix-sum on these histogram values:
zero bits one bits
--------- --------
1)histogram: 2 2
2)prefix sum: 0 2
An exclusive prefix-sum is just the sum of all preceding values. There are no preceding values in the first position, and in the second position the preceding value is 2 (the number of elements with a 0 bit in bit 0 position). Now, as an independent operation, let's determine the relative offset of each 0 bit amongst all the zero bits, and each one bit amongst all the one bits:
bit 0: 1 0 0 1
3)offset: 0 0 1 1
This can actually be done programmatically using exclusive prefix-sums again, considering the 0-group and 1-group separately, and treating each position as if it has a value of 1:
0 bit 0: 1 1
3)ex. psum: 0 1
1 bit 0: 1 1
3)ex. psum: 0 1
Now, step 4 of the given algorithm says:
4) Combine the results of steps 2 & 3 to determine the final output location for each element and move it there
What this means is, for each element, we will select the histogram-bin prefix sum value corresponding to its bit value (0 or 1) and add to that, the offset associated with its position, to determine the location to move that element to:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
bit 0: 1 0 0 1
hist psum: 2 0 0 2
offset: 0 0 1 1
new index: 2 0 1 3
Moving each element to its "new index" position, we have:
Element # 2 3 1 4
Value: 14 4 7 1
Binary: 1110 0100 0111 0001
Which is exactly the result we expect for the completion of our first digit-move, based on the previous walk-through. This completes the first pass, i.e. the first (least-significant) digit; we still have the remaining digits to process, creating a new histogram and new prefix sums on each pass.
Notes:
Radix-sort, even in a computer, does not have to be done based strictly on binary digits. It's possible to construct a similar algorithm with digits of different sizes, perhaps consisting of 2, 3, or 4 bits.
One of the optimizations we can perform on a radix sort is to only sort based on the number of digits that are actually meaningful. For example, if we are storing quantities in 32-bit values, but we know that the largest quantity present is 1023 (2^10-1), we need not sort on all 32 bits. We can stop, expecting a proper sort, after proceeding through the first 10 bits.
What does any of this have to do with GPUs? In so far as the above description goes, not much. The practical application is to consider using parallel algorithms for things like the histogram, the prefix-sums, and the data movement. This decomposition of radix-sort allows one to locate and use parallel algorithms already developed for these more basic operations, in order to construct a fast parallel sort.
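To tie the four steps together, here is a short serial Python sketch of a single split pass on bit 0 (my own illustration, not from the original answer); it reproduces the histogram, prefix sum, offsets, and new indexes from the walk-through above:

def split_on_bit(values, bit):
    bits = [(v >> bit) & 1 for v in values]
    hist = [bits.count(0), bits.count(1)]      # step 1: histogram of the digit values
    psum = [0, hist[0]]                        # step 2: exclusive prefix sum of the histogram
    seen = [0, 0]
    offset, new_index = [], []
    for b in bits:                             # step 3: relative offset within each digit group
        offset.append(seen[b])
        seen[b] += 1
    for b, off in zip(bits, offset):           # step 4: scatter address = psum[digit] + offset
        new_index.append(psum[b] + off)
    out = [0] * len(values)
    for v, idx in zip(values, new_index):
        out[idx] = v
    return out, new_index

print(split_on_bit([7, 14, 4, 1], 0))   # ([14, 4, 7, 1], [2, 0, 1, 3])

Repeating this pass for each higher bit (on the output of the previous pass) completes the sort.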
What follows is a worked example. This may help with your understanding of radix sort. I don't think it will help with your assignment, because this example performs a 32-bit radix sort at the warp level, for a single warp, i.e. for 32 quantities. But a possible advantage from an understanding point of view is that things like histogramming and prefix sums can be done at the warp level in just a few instructions, taking advantage of various CUDA intrinsics. For your assignment, you won't be able to use these techniques, and you will need to come up with full-featured parallel prefix sums, histograms, etc. that can operate on an arbitrary dataset size.
#include <stdio.h>
#include <stdlib.h>
#define WSIZE 32
#define LOOPS 100000
#define UPPER_BIT 31
#define LOWER_BIT 0
__device__ unsigned int ddata[WSIZE];
// naive warp-level bitwise radix sort
__global__ void mykernel(){
  __shared__ volatile unsigned int sdata[WSIZE*2];
  // load from global into shared variable
  sdata[threadIdx.x] = ddata[threadIdx.x];
  unsigned int bitmask = 1<<LOWER_BIT;
  unsigned int offset  = 0;
  unsigned int thrmask = 0xFFFFFFFFU << threadIdx.x;
  unsigned int mypos;
  // for each LSB to MSB
  for (int i = LOWER_BIT; i <= UPPER_BIT; i++){
    unsigned int mydata = sdata[((WSIZE-1)-threadIdx.x)+offset];
    unsigned int mybit  = mydata&bitmask;
    // get population of ones and zeroes (cc 2.0 ballot)
    unsigned int ones   = __ballot(mybit); // cc 2.0
    unsigned int zeroes = ~ones;
    offset ^= WSIZE; // switch ping-pong buffers
    // do zeroes, then ones
    if (!mybit) // threads with a zero bit
      // get my position in ping-pong buffer
      mypos = __popc(zeroes&thrmask);
    else        // threads with a one bit
      // get my position in ping-pong buffer
      mypos = __popc(zeroes)+__popc(ones&thrmask);
    // move to buffer (or use shfl for cc 3.0)
    sdata[mypos-1+offset] = mydata;
    // repeat for next bit
    bitmask <<= 1;
  }
  // save results to global
  ddata[threadIdx.x] = sdata[threadIdx.x+offset];
}

int main(){
  unsigned int hdata[WSIZE];
  for (int lcount = 0; lcount < LOOPS; lcount++){
    unsigned int range = 1U<<UPPER_BIT;
    for (int i = 0; i < WSIZE; i++) hdata[i] = rand()%range;
    cudaMemcpyToSymbol(ddata, hdata, WSIZE*sizeof(unsigned int));
    mykernel<<<1, WSIZE>>>();
    cudaMemcpyFromSymbol(hdata, ddata, WSIZE*sizeof(unsigned int));
    for (int i = 0; i < WSIZE-1; i++)
      if (hdata[i] > hdata[i+1]) {
        printf("sort error at loop %d, hdata[%d] = %d, hdata[%d] = %d\n", lcount, i, hdata[i], i+1, hdata[i+1]);
        return 1;
      }
    // printf("sorted data:\n");
    // for (int i = 0; i < WSIZE; i++) printf("%u\n", hdata[i]);
  }
  printf("Success!\n");
  return 0;
}
The methodology that Robert Crovella gives is absolutely correct and very helpful. It is mildly different from the process they explain in the Udacity videos. I'll record one iteration of their method (watchable here) in this answer, jumping off from Robert Crovella's example:
Element # 1 2 3 4
Value: 7 14 4 1
Binary: 0111 1110 0100 0001
LSB: 1 0 0 1
Predicate: 0 __1__ __1__ 0
Pred. Scan: 0 __0__ __1__ 2
Number of ones in predicate: 2
!Predicate:__1__ 0 0 __1__
!Pred. Scan: 0 1 1 1
Offset for !Pred. Scan = Number of ones in predicate = 2
!Pred. Scan + Offset:
__2__ 3 3 __3__
Final indexes to move values after 1 iteration (on LSB):
2 0 1 3
Values after 1 iteration (on LSB):
14 4 7 1
I placed emphasis (__ __) on the values that indicate or contain the index to move the value to.
Terms (from Udacity video):
LSB = least significant bit
Predicate (for LSB): (x & 1) == 0
for the next significant bit: (x & 2) == 0
for the one after that: (x & 4) == 0
and so on, with more left shifting (<<)
Pred. Scan = Predicate Scan = Predicate exclusive prefix sum
!Pred. = bits of predicate flipped (0->1 and 1->0)
Number of ones in predicate
note that this is not necessarily the last entry in the scan, you can instead get this value (sum/reduction of the predicate) as an intermediate of the Blelloch scan
A summary of the above is:
Get the predicate of your list (bit in common, starting from the LSB)
Scan the predicate, and record the sum of the predicate in the process
Blelloch Scan on the GPU
note that your predicate will be of arbitrary size, so read the section on Blelloch Scan for arrays of arbitrary size instead of 2^n size
Flip bits of the predicate, and scan that
Move the values in your array with the following rule:
For the ith element in the array:
if the ith predicate is TRUE, move the ith value to the index in the ith element of the predicate scan
else, move the ith value to the index in the ith element of the !Predicate scan plus the sum of the Predicate
Move to the next significant bit (NSB)
For reference, you can consult my solution for this HW assignment in CUDA.
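To make that summary concrete, here is a small serial Python sketch of the predicate/scan formulation (my own illustration, not from the Udacity material or the linked solution); it carries a positions array along with the values, as the assignment requires:

def exclusive_scan(xs):
    total, out = 0, []
    for x in xs:
        out.append(total)
        total += x
    return out, total

def radix_sort(values, positions, bits=32):
    for b in range(bits):                                  # LSB to MSB
        pred = [((v >> b) & 1) == 0 for v in values]       # predicate: current bit is 0
        scan0, zeros = exclusive_scan([int(p) for p in pred])
        scan1, _ = exclusive_scan([int(not p) for p in pred])
        dest = [scan0[i] if pred[i] else zeros + scan1[i] for i in range(len(values))]
        new_v = [0] * len(values)
        new_p = [0] * len(values)
        for i, d in enumerate(dest):                       # scatter values and positions together
            new_v[d] = values[i]
            new_p[d] = positions[i]
        values, positions = new_v, new_p
    return values, positions

print(radix_sort([7, 14, 4, 1], [0, 1, 2, 3], bits=4))
# ([1, 4, 7, 14], [3, 2, 0, 1])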

Query about working out whether number is a power of 2

Using the classic code snippet:
if (x & (x-1)) == 0
If the answer is 1, then it is false and not a power of 2. However, working on 5 (not a power of 2) and 4 results in:
0001 1111
0000 1111
0000 1111
That's 4 1s.
Working on 8 and 7:
1111 1111
0111 1111
0111 1111
The 0 is first, but we have 4.
In this link (http://www.exploringbinary.com/ten-ways-to-check-if-an-integer-is-a-power-of-two-in-c/) for both cases, the answer starts with 0 and there is a variable number of 0s/1s. How does this answer whether the number is a power of 2?
You need to refresh yourself on how binary works. 5 is not represented as 0001 1111 (5 bits on), it's represented as 0000 0101 (2^2 + 2^0), and 4 is likewise not 0000 1111 (4 bits on) but rather 0000 0100 (2^2). The numbers you wrote are actually in unary.
Wikipedia, as usual, has a pretty thorough overview.
Any power-of-two number can be represented in binary as a single 1 followed only by 0s.
e.g.
10000(16)
1000(8)
100(4)
If you subtract 1 from any power of two number, you will get all 1s to the right of where the original one was.
10000(16) - 1 = 01111(15)
ANDing these two numbers will give you 0 every time.
In the case of a non-power of two number, subtracting one will leave at least one "1" unchanged somewhere in the number like:
10010(18) - 1 = 10001(17)
ANDing these two will result in
10000(16) != 0
Keep in mind that if x is a power of 2, there is exactly 1 bit set. Subtract 1, and you know two things: the resulting value is not a power of two, and the bit that was set is no longer set. So, when you do a bitwise AND (&), the bit that was set in x is not set in (x-1), and all the bits that are set in (x-1) are matched against bits that are not set in x. So the AND in each bit position is always 0.
In other words, for any bit pattern with a single set bit, you are guaranteed that (x & (x-1)) is zero.
((n & (n-1)) == 0)
It checks whether the value of “n” is a power of 2.
Example:
if n = 8, the bit representation is 1000
n & (n-1) = (1000) & ( 0111) = (0000)
So it returns zero only if the value of n is a power of 2.
The only exception to this is '0'.
0 & (0-1) = 0, but '0' is not a power of two.
Why does this make sense?
Imagine what happens when you subtract 1 from a string of bits. You read from right to left,
turning each 0 into a 1 until you hit a 1, at which point that bit is flipped to 0:
1000100100 -> (subtract 1) -> 1000100011
Thus, every bit, up through the first 1, is flipped. If there’s exactly one 1 in the number, then every bit (other than the leading zeros) will be flipped. Thus, n & (n-1) == 0 if there’s exactly one 1. If there’s exactly one 1, then it must be a power of two.
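Putting it together in Python (a small sketch of my own; note the explicit n != 0 guard for the exception mentioned above):

def is_power_of_two(n):
    return n != 0 and (n & (n - 1)) == 0   # exactly one set bit, and not zero

for n in range(17):
    if is_power_of_two(n):
        print(n, bin(n))
# prints 1, 2, 4, 8, 16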
