My task is to write a 16-bit ALU in Verilog. I ran into difficulties with the part that rotates an operand and with the two's complement addition and subtraction. I know how to work these out with paper and pencil, but I can't figure out how to do them in Verilog.
for example:
A is denoted as a15 a14 a13 a12 a11 a10 a9 a8 a7 a6 a5 a4 a3 a2 a1 a0
if I am going to rotate left by 4 bits,
the answer would be
a11 a10 a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 a15 a14 a13 a12
I tried concatenation, but the result was incorrect.
I need your help.
The following will work using one shifter:
assign A_out = {A_in,A_in} >> (16-shift[3:0]);
When shift is 0, the left A_in is selected. As shift increases, the left A_in shifts to the left and the MSBs of the right A_in fill in.
If synthesizing, you may want to use muxes, as dynamic shift logic tends to require more gates. A 16-bit barrel shifter requires 4 levels of 2-to-1 muxes.
wire [15:0] tmp [3:1];
assign tmp[3] = shift[3] ? { A_in[ 7:0],  A_in[15: 8]} : A_in;     // rotate left by 8
assign tmp[2] = shift[2] ? {tmp[3][11:0], tmp[3][15:12]} : tmp[3]; // rotate left by 4
assign tmp[1] = shift[1] ? {tmp[2][13:0], tmp[2][15:14]} : tmp[2]; // rotate left by 2
assign A_out  = shift[0] ? {tmp[1][14:0], tmp[1][15]}    : tmp[1]; // rotate left by 1
assign A_out = A_in << bits_to_rotate;
Where bits_to_rotate can be a variable value (either a signal or a reg).
This will infer a generic shifter using multiplexers, or a barrel shifter, whichever suits the target hardware better. The synthesizer will take care of that.
Oh, well. If you want to rotate instead of shift, the thing is just a bit trickier:
assign A_out = (A_in << bits_to_rotate) | (A_in >> (16 - bits_to_rotate));
Why is concatenation incorrect? This should do what you ask.
assign A_out[15:0] = {A_in[11:0], A_in[15:12]};
The best way I found to do this is finding a pattern. When you rotate an 8-bit signal left by 1 position (8'b00001111 << 1) the result is 8'b00011110; rotating left by 9 positions (8'b00001111 << 9) gives the same result, and so does rotating by 17 positions. This reduces your possibilities to the following pattern:
If you look, the three lowest bits of all the numbers equivalent to rotating by 1 position (1, 9, 17, 25, ..., 249) are equal to 001 (1).
The three lowest bits of all the numbers equivalent to rotating by 6 positions (6, 14, 22, 30, ..., 254) are equal to 110 (6).
So you can apply a mask (8'b00000111) to extract the effective shift amount by zeroing all the other bits:
reg_out_temp <= reg_in_1 << (reg_in_2 & 8'h07);
reg_out_temp must be twice the width of reg_in_1; in this case reg_out_temp is 16 bits and reg_in_1 is 8 bits, so the bits carried out by the shift land in the upper byte, and you can combine the two halves using an OR expression:
reg_out <= reg_out_temp[15:8] | reg_out_temp[7:0];
So in two clock cycles you have the result. For a 16-bit rotation, your mask would be 8'b00001111 (8'h0F), since rotating by 16 is the same as rotating by 0, and your temporary register would be 32 bits.
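Here is a minimal C sketch of the same double-width trick (the helper name rotl8 and the test values are illustrative, not from the answer above):

#include <stdint.h>
#include <stdio.h>

/* Rotate an 8-bit value left: mask the amount mod 8, shift into a
   16-bit temporary, then OR the two halves back together. */
static uint8_t rotl8(uint8_t value, unsigned amount)
{
    uint16_t tmp = (uint16_t)value << (amount & 0x07); /* mask = 8'h07 */
    return (uint8_t)((tmp >> 8) | tmp); /* upper byte holds the carried-out bits */
}

int main(void)
{
    printf("%02x\n", rotl8(0x0F, 1)); /* 1e */
    printf("%02x\n", rotl8(0x0F, 9)); /* also 1e: rotating by 9 equals rotating by 1 */
    return 0;
}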
I need to construct the (31,26) Hamming code of 0x444.
After reading Wikipedia and the algorithm shown on GeeksforGeeks, I still can't understand how this works, as my construction ended up different from the result of a calculator I found on the internet.
My result is: 0100 0100 0010 0010, or 0x4422.
Is it correct?
As I understand:
P1 = Bitwise XOR(C1,C3,C5,C7,C9,C11,C13,C15,C17..) = 0
P2 = Bitwise XOR(C2,C3,C6,C7,C10,C11,C14,C15..) = 1
P3 = Bitwise XOR(C4,C5,C6,C7,C12,C13,C14,C15..) = 0
P4 = Bitwise XOR(C8,C9,C10,C11,C12,C13,C14,C15..) = 0
P5 = Bitwise XOR(C16,C17..) = 0
Another thing I can't understand: if a (31,26) Hamming code is supposed to output a 31-bit result with 5 parity bits and 26 data bits, why does a (7,4) Hamming code transform each 4 bits into a 7-bit representation, rather than producing one 7-bit representation with 3 parity bits?
Thanks.
Yes, assuming you are numbering the bits from 1 at the right-hand end, then 0x000444 is encoded as 0x00004422 for a (31,26) Hamming Code -- for an even parity code-word.
Where C1, C2, etc. are bits 1, 2, etc. of the code-word, and P1, P2, etc. are parity bits 1, 2, etc. I think it is clearer to say that:
P1 = C1 = Bitwise_XOR(C3, C5, C7, C9, ...)
so that:
Bitwise_XOR(C1, C3, C5, C7, C9, ...) == 0
and so on. This is even parity.
You do not say which "calculator" you tried, but it could be that the discrepancy you see has to do with which end you number from. I note that Wikipedia gives:
If a byte of data to be encoded is 10011010, then the data word (using _ to represent the parity bits) would be __1_001_1010, and the code word is 011100101010.
which is clearly counting bits from the left-hand end.
I regret I do not understand your second question. I can say that a (31,26) Hamming code does indeed take 26 bits of data and add 5 parity bits to produce a 31-bit code-word, and that a (7,4) Hamming code does likewise for 4 bits of data, 3 parity bits and a 7-bit code-word.
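For what it's worth, here is a small C sketch of that construction (even parity, bit positions numbered from 1 at the right-hand end; the function name hamming_31_26 is mine):

#include <stdint.h>
#include <stdio.h>

/* Build a (31,26) even-parity Hamming code word. */
static uint32_t hamming_31_26(uint32_t data)
{
    uint32_t code = 0;
    int pos, src = 0;

    /* scatter the 26 data bits over the non-power-of-two positions 3,5,6,7,9,... */
    for (pos = 1; pos <= 31; pos++) {
        if ((pos & (pos - 1)) == 0)      /* power of two: reserved for parity */
            continue;
        if ((data >> src++) & 1)
            code |= 1u << (pos - 1);
    }

    /* the parity bit at position 2^k makes the XOR over all positions
       whose index has bit k set come out even */
    for (int k = 0; k < 5; k++) {
        uint32_t parity = 0;
        for (pos = 1; pos <= 31; pos++)
            if (pos & (1 << k))
                parity ^= (code >> (pos - 1)) & 1;
        if (parity)
            code |= 1u << ((1 << k) - 1);
    }
    return code;
}

int main(void)
{
    printf("0x%08x\n", hamming_31_26(0x444)); /* prints 0x00004422 */
    return 0;
}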
In a logic circuit, I have an 8-bit data vector that is fed into an ECC IC for which I am supposed to develop the logic, and which outputs a vector of 5 parity bits. My first step in developing the logic (with logic gates, XOR) is to figure out which parity bit checks which data bits (since they are interleaved). I am using even parity, and following the general Hamming code rules (a parity bit at every position 2^n), I get the following output sequence:
P1 P2 D1 P3 D2 D3 D4 P4 D5 D6 D7 D8 P5
Following the general Hamming algorithm: for each parity bit at position n = 1, 2, 4, 8, 16, and so on (powers of 2), we skip the first n-1 positions, then alternately check n bits and skip n bits, repeating to the end of the output array (P1 P2 D1 P3 D2 D3 D4 P4 D5 D6 D7 D8 P5).
Following that convention, I get:
P1 Checks data bits -> XOR(3 5 7 9 10 12)
P2 Checks data bits -> XOR(3 6 7 10 11)
P3 Checks data bits -> XOR(5 6 10 11 12)
P4 Checks data bits -> XOR(9 10 11)
Am I right? The thing that confuses me is whether I should count the parity bit itself as one of the 2^n bits to be checked, or start one bit after that specific parity bit. It pretty much comes down to whether the range is inclusive or not.
Thank you for your help in advance!
Cheers!
You can follow this scheme. The bits marked in each row must sum to 0 (mod 2); in other words, for the marked positions in each row, the number of set bits must be even.
P1 P2 D1 P3 D2 D3 D4 P4 D5 D6 D7 D8
x     x     x     x     x     x
   x  x        x  x        x  x
         x  x  x  x              x
                     x  x  x  x  x
I don't understand why you have P5 in the scheme; with 8 data bits the code-word is only 12 bits long, so the parity bits at positions 1, 2, 4 and 8 already cover everything.
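In C, the four parity equations read straight off the rows above (the function name hamming_12_8 and the MSB-first bit ordering are my own choices for illustration):

#include <stdio.h>

/* Assemble P1 P2 D1 P3 D2 D3 D4 P4 D5 D6 D7 D8 from a data byte
   d = D1..D8 (D1 in bit 7). Each parity bit covers the positions
   whose 1-based index has the corresponding bit set, itself included. */
static unsigned hamming_12_8(unsigned d)
{
    unsigned D[9];
    for (int i = 1; i <= 8; i++)
        D[i] = (d >> (8 - i)) & 1;

    unsigned P1 = D[1] ^ D[2] ^ D[4] ^ D[5] ^ D[7]; /* data positions 3,5,7,9,11 */
    unsigned P2 = D[1] ^ D[3] ^ D[4] ^ D[6] ^ D[7]; /* data positions 3,6,7,10,11 */
    unsigned P3 = D[2] ^ D[3] ^ D[4] ^ D[8];        /* data positions 5,6,7,12 */
    unsigned P4 = D[5] ^ D[6] ^ D[7] ^ D[8];        /* data positions 9,10,11,12 */

    return (P1 << 11) | (P2 << 10) | (D[1] << 9) | (P3 << 8) |
           (D[2] << 7) | (D[3] << 6) | (D[4] << 5) | (P4 << 4) |
           (D[5] << 3) | (D[6] << 2) | (D[7] << 1) | D[8];
}

int main(void)
{
    /* 1001 1010 is the example byte from the Wikipedia quote earlier;
       prints 0x72a = 0111 0010 1010, matching its code word */
    printf("0x%03x\n", hamming_12_8(0x9A));
    return 0;
}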
This is the diagram we were given for class:
Why wouldn't you just use C4 in this image? If C4 is 1, then the last addition resulted in an overflow, which is what we're wondering. Why do we need to look at C3?
The overflow flag indicates an overflow condition for a signed operation.
Some points to remember in a signed operation:
The MSB is always reserved to indicate the sign of the number.
Negative numbers are represented in 2's complement.
An overflow means the result is invalid.
Two's complement overflow rules:
If the sum of two positive numbers yields a negative result, the sum has overflowed.
If the sum of two negative numbers yields a positive result, the sum has overflowed.
Otherwise, the sum has not overflowed.
For example:
Ex1:
0111 (carry)
0101 ( 5)
+ 0011 ( 3)
==================
1000 ( 8) ;invalid (V=1) (C3=1) (C4=0)
Ex2:
1011 (carry)
1001 (-7)
+ 1011 (−5)
==================
0100 ( 4) ;invalid (V=1) (C3=0) (C4=1)
Ex3:
1110 (carry)
0111 ( 7)
+ 1110 (−2)
==================
0101 ( 5) ;valid (V=0) (C3=1) (C4=1)
In a signed operation, if the two leftmost carry bits (the ones on the far left of the top row in these examples) are both 1 or both 0, the result is valid; if they are "1 0" or "0 1", a sign overflow has occurred. Conveniently, an XOR operation on these two bits can quickly determine if an overflow condition exists. (Ref: Two's complement)
Overflow vs. carry: overflow can be considered a two's complement form of carry. In a signed operation the overflow flag is monitored and the carry flag is ignored; similarly, in an unsigned operation the carry flag is monitored and the overflow flag is ignored.
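To make the C3/C4 rule concrete, here is a small C sketch that recomputes the three examples (the helper name add4 is just for illustration):

#include <stdio.h>

/* 4-bit addition: C3 = carry into the sign bit, C4 = carry out of it,
   V = C3 XOR C4. */
static void add4(unsigned a, unsigned b)
{
    unsigned sum = (a + b) & 0xF;
    unsigned c3  = (((a & 0x7) + (b & 0x7)) >> 3) & 1; /* carry into bit 3 */
    unsigned c4  = ((a + b) >> 4) & 1;                 /* carry out of bit 3 */
    printf("%x + %x = %x  C3=%u C4=%u V=%u\n", a, b, sum, c3, c4, c3 ^ c4);
}

int main(void)
{
    add4(0x5, 0x3); /* Ex1:  5 + 3  -> V=1 */
    add4(0x9, 0xB); /* Ex2: -7 + -5 -> V=1 */
    add4(0x7, 0xE); /* Ex3:  7 + -2 -> V=0 */
    return 0;
}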
Overflow for signed numbers occurs when the carry-in into the most significant bit is not equal to the carry out.
For example, working with 8 bits, 65 + 64 = 129 actually results in an overflow. This is because 129 is 1000 0001 in binary, which is also -127 in 2's complement. If you work through this example, you can see that the overflow results from the carry out not equalling the carry in.
It is possible to have a correct computation even when the carry flag is high.
Consider
1000 1000 = -120
+ 1111 1111 = -1
= (1) 1000 0111 = -121
There is a carry out of 1, but there has been no overflow.
I would like to give a more general answer to this question, for any positive natural number of bits.
Let's call the last carry output C1, the second-to-last carry output C0, the sign bit of the sum S0, and the sign bits of A and B respectively A0 and B0.
Then the following holds:
S0 = A0 XOR B0 XOR C0
C1 = A0*B0 + A0*C0 + B0*C0
(S0 is the sum output of the full adder at the sign position; C1 is its carry output, the majority function.)
Let's now walk through the possibilities.
If C1 == 1 there are two possibilities:
if C0 == 0: A0 and B0 must both have been 1 (and thus both A and B must be negative). This means S0 has to be 0, meaning the result was positive while A and B were negative => overflow
if C0 == 1: either A and B have opposite signs, so overflow is not possible => no overflow; or A0 and B0 are both 1 (and thus A and B must both be negative), which means S0 has to be 1, meaning the result was negative => no overflow
If C1 == 0 there are two possibilities:
if C0 == 0: either A0 and B0 are both 0 (and thus A and B must both be positive), which means S0 has to be 0, meaning the result was positive => no overflow; or A and B have opposite signs => no overflow
if C0 == 1: A0 and B0 must both be 0 (and thus A and B must both be positive). This means S0 has to be 1, meaning the result was negative while A and B were positive => overflow
Hope that helps someone out there.
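The case analysis can also be checked exhaustively. This C sketch enumerates all eight (A0, B0, C0) combinations and confirms that overflow occurs exactly when C0 != C1:

#include <stdio.h>

int main(void)
{
    for (int A0 = 0; A0 < 2; A0++)
        for (int B0 = 0; B0 < 2; B0++)
            for (int C0 = 0; C0 < 2; C0++) {
                int S0 = A0 ^ B0 ^ C0;                      /* sum bit */
                int C1 = (A0 & B0) | (A0 & C0) | (B0 & C0); /* carry out */
                /* overflow: operands share a sign that differs from S0 */
                int ovf = (A0 == B0) && (S0 != A0);
                printf("A0=%d B0=%d C0=%d  S0=%d C1=%d  overflow=%d  C0^C1=%d\n",
                       A0, B0, C0, S0, C1, ovf, C0 ^ C1);
            }
    return 0;
}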
I have some data coming from the hardware. Data comes in blocks of 32 bytes, and there are potentially millions of blocks. Data blocks are scattered in two halves the following way (a letter is one block):
A C E G I K M O B D F H J L N P
or if numbered
0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15
First come all blocks with even indexes, then the odd ones. Is there a specialized algorithm to reorder the data correctly (into alphabetical order)?
The constraints are mainly on space: I don't want to allocate another buffer to reorder, just one extra block. But I'd also like to keep the number of moves low: a simple quicksort would be O(N log N). Is there a faster solution in O(N) for this special reordering case?
Since this data is always in the same order, sorting in the classical sense is not needed at all. You do not need any comparisons, since you already know in advance which of two given data points comes first.
Instead you can produce the permutation of the data directly. If you transform it into cyclic form, it will tell you exactly which swaps to do to turn the permuted data into ordered data.
Here is an example for your data:
0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Now calculate the inverse (I'll skip this step, because I am lazy here, assume instead the permutation I have given above actually is the inverse already).
Here is the cyclic form:
(0)(1 8 4 2)(3 9 12 6)(5 10)(7 11 13 14)(15)
So if you want to reorder a sequence structured like this, you would do
# first cycle
# nothing to do
# second cycle
swap 1 8
swap 8 4
swap 4 2
# third cycle
swap 3 9
swap 9 12
swap 12 6
# so on for the other cycles
If you do this for the inverse instead of the original permutation, you get the correct sequence with a provably minimal number of swaps.
EDIT:
For more details on something like this, see the chapter on Permutations in TAOCP for example.
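Here is a C sketch of cycle-following for the 16-block case (ints stand in for the 32-byte blocks; the done[] flags are index bookkeeping only, and the single tmp plays the role of the one spare block):

#include <stdio.h>

/* perm[i] names the block currently stored in slot i; walk each cycle,
   pushing every saved block forward into the slot with its own number. */
static void reorder(int data[16])
{
    static const int perm[16] = { 0, 2, 4, 6, 8, 10, 12, 14,
                                  1, 3, 5, 7, 9, 11, 13, 15 };
    int done[16] = { 0 };

    for (int start = 0; start < 16; start++) {
        if (done[start] || perm[start] == start)
            continue;
        int tmp = data[start];     /* the one extra block of storage */
        int j = perm[start];
        while (j != start) {
            int next = data[j];
            data[j] = tmp;         /* block lands in its own slot */
            done[j] = 1;
            tmp = next;
            j = perm[j];
        }
        data[start] = tmp;
        done[start] = 1;
    }
}

int main(void)
{
    int data[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 };
    reorder(data);
    for (int i = 0; i < 16; i++)
        printf("%d ", data[i]);    /* 0 1 2 ... 15 */
    printf("\n");
    return 0;
}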
So you have data coming in in a pattern like
a0 a2 a4...a14 a1 a3 a5...a15
and you want to have it sorted to
b0 b1 b2...b15
With some reordering the permutation can be written like:
a0 -> b0
a8 -> b1
a1 -> b2
a2 -> b4
a4 -> b8
a9 -> b3
a3 -> b6
a6 -> b12
a12 -> b9
a10 -> b5
a5 -> b10
a11 -> b7
a7 -> b14
a14 -> b13
a13 -> b11
a15 -> b15
So if you want to sort it in place with only one block of additional space in a temporary t, this could be done in O(1) (for this fixed N) with
t = a8; a8 = a4; a4 = a2; a2 = a1; a1 = t
t = a9; a9 = a12; a12 = a6; a6 = a3; a3 = t
t = a10; a10 = a5; a5 = t
t = a11; a11 = a13; a13 = a14; a14 = a7; a7 = t
Edit: The general case (for N != 16), if it is solvable in O(N), is actually an interesting question. I suspect the cycles always start with a prime p which satisfies p < N/2 && N mod p != 0, and that the indices follow the recurrence i(n+1) = 2*i(n) mod N, but I am not able to prove it. If this is the case, deriving an O(N) algorithm is trivial.
Maybe I'm misunderstanding, but if the order is always identical to the one given, then you can "pre-program" (i.e. avoid all comparisons) the optimum solution, which is going to be the one with the minimum number of swaps from the given string to ABCDEFGHIJKLMNOP and which, for something this small, you can work out by hand -- see LiKao's answer.
It is easier for me to label your set with numbers:
0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15
Start from the 14 and move all even numbers into place (7 swaps). You will get this:
0 1 2 9 4 5 6 13 8 3 10 7 12 11 14 15
Now you need another 3 swaps (9 with 3, 7 with 13, 11 with the 13 moved from 7).
A total of 10 swaps. Not a general solution, but it could give you some hints.
You can also view the intended permutation as a shuffle of the address bits, 'abcd <-> dabc' (with abcd the individual bits of the index), like:
#include <stdio.h>

#define ROTATE(v,n,i) (((v)>>(i)) | (((v) & ((1u <<(i))-1)) << ((n)-(i))))

/******************************************************/
int main (int argc, char **argv)
{
    unsigned i, a, b;

    for (i = 0; i < 16; i++) {
        a = ROTATE(i, 4, 1);
        b = ROTATE(a, 4, 3);
        fprintf(stdout, "i=%u a=%u b=%u\n", i, a, b);
    }
    return 0;
}
/******************************************************/
That was counting sort, I believe.
I think this is not really possible, but it is worth asking anyway. Say I have two small numbers (each ranging from 0 to 11). Is there a way I can compress them into one byte and get them back later? How about with four numbers of similar size?
What I need is something like: a1 + a2 = x, where I only know x and from it can get back a1 and a2.
For the second part: a1 + a2 + a3 + a4 = x, where I only know x and from it can get back a1, a2, a3 and a4.
Note: I know you cannot "un-add"; I'm just illustrating my question.
x must be one byte. a1, a2, a3, a4 range over [0, 11].
That's trivial with bit masks. The idea is to divide the byte into smaller units and dedicate them to different elements.
For 2 numbers, it can be like this: the first 4 bits are number1, the rest are number2. You would use number1 = (x & 0b11110000) >> 4 and number2 = (x & 0b00001111) to retrieve the values, and x = (number1 << 4) | number2 to compress them.
For two numbers, sure. Each one has 12 possible values, so the pair has a total of 12^2 = 144 possible values, and that's less than the 256 possible values of a byte. So you could do e.g.
x = 12*a1 + a2
a1 = x / 12
a2 = x % 12
(If you only have signed bytes, e.g. in Java, it's a little trickier)
For four numbers from 0 to 11, there are 12^4 = 20736 values, so you couldn't fit them in one byte, but you could do it with two.
x = 12^3*a1 + 12^2*a2 + 12*a3 + a4
a1 = x / 12^3
a2 = (x / 12^2) % 12
a3 = (x / 12) % 12
a4 = x % 12
EDIT: the other answers talk about storing one number per four bits and using bit-shifting. That's faster.
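As a C sketch of the arithmetic version (pack4/unpack4 are illustrative names):

#include <stdio.h>

/* Pack four base-12 digits (0..11) into one value (fits in two bytes). */
static unsigned pack4(unsigned a1, unsigned a2, unsigned a3, unsigned a4)
{
    return ((a1 * 12 + a2) * 12 + a3) * 12 + a4; /* 12^3*a1 + 12^2*a2 + 12*a3 + a4 */
}

static void unpack4(unsigned x, unsigned a[4])
{
    a[3] = x % 12;  x /= 12;
    a[2] = x % 12;  x /= 12;
    a[1] = x % 12;  x /= 12;
    a[0] = x;       /* already < 12 */
}

int main(void)
{
    unsigned a[4];
    unpack4(pack4(3, 11, 0, 7), a);
    printf("%u %u %u %u\n", a[0], a[1], a[2], a[3]); /* 3 11 0 7 */
    return 0;
}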
The 0-11 example is pretty easy -- you can store each number in four bits, so putting them into a single byte is just a matter of shifting one of them 4 bits to the left and ORing the two together.
Four numbers of similar sizes won't fit -- four bits apiece times four gives a minimum of 16 bits to hold them.
Let's say it in general: suppose you want to mix N numbers a1, a2, ... aN, a1 ranging from 0..k1-1, a2 from 0..k2-1, ... and aN from 0 .. kN-1.
Then, the encoded number is:
encoded = a1 + k1*a2 + k1*k2*a3 + ... k1*k2*..*k(N-1)*aN
The decoding is trickier and proceeds stepwise:
rest = encoded
a1 = rest mod k1
rest = rest div k1
a2 = rest mod k2
rest = rest div k2
...
a(N-1) = rest mod k(N-1)
rest = rest div k(N-1)
aN = rest # rest is already < kN
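A generic C sketch of this scheme (a[] holds a1..aN, k[] the radices k1..kN; the names are mine):

#include <stdio.h>

/* encoded = a1 + k1*a2 + k1*k2*a3 + ... */
static unsigned long encode(const unsigned a[], const unsigned k[], int n)
{
    unsigned long x = 0, w = 1;
    for (int i = 0; i < n; i++) {
        x += a[i] * w;
        w *= k[i];
    }
    return x;
}

/* stepwise mod/div, as above */
static void decode(unsigned long x, unsigned a[], const unsigned k[], int n)
{
    for (int i = 0; i < n - 1; i++) {
        a[i] = x % k[i];
        x /= k[i];
    }
    a[n - 1] = (unsigned)x;   /* rest is already < kN */
}

int main(void)
{
    const unsigned k[4] = { 12, 12, 12, 12 };
    unsigned a[4] = { 7, 0, 11, 3 }, b[4];
    decode(encode(a, k, 4), b, k, 4);
    printf("%u %u %u %u\n", b[0], b[1], b[2], b[3]); /* 7 0 11 3 */
    return 0;
}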
If the numbers 0-11 aren't evenly distributed you can do even better by using shorter bit sequences for common values and longer ones for rarer values. It costs at least one bit to code which length you are using so there is a whole branch of CS devoted to proving when it's worth doing.
So a byte can hold up to 256 values, or 0xFF in hex, which means you can encode two numbers from 0-15 in a byte.
byte a1 = 0xf;
byte a2 = 0x9;
byte compress = a1 << 4 | (0x0F & a2); // should yield 0xf9 in one byte.
Four numbers you can do if you reduce the range to 0-3.
Since a single byte is 8 bits, you can easily subdivide it into smaller ranges of values. The extreme limit of this is eight single-bit integers, which is called a bit field.
If you want to store two 4-bit integers (which gives you 0-15 for each), you simply have to do this:
value = a * 16 + b;
As long as you do proper bounds checking, you will never lose any information here.
To get the two values back, you just have to do this:
a = floor(value / 16)
b = value MOD 16
MOD is the modulus operator; it gives the "remainder" of a division.
If you want to store four 2-bit integers (0-3), you can do this:
value = a * 64 + b * 16 + c * 4 + d
And, to get them back:
a = floor(value / 64)
b = floor(value / 16) MOD 4
c = floor(value / 4) MOD 4
d = value MOD 4
I leave the last division as an exercise for the reader ;)
@Mike Caron
Your last example (4 integers between 0-3) is much faster with bit-shifting. No need for floor().
value = (a << 6) | (b << 4) | (c << 2) | d;
a = (value >> 6);
b = (value >> 4) % 4;
c = (value >> 2) % 4;
d = (value) % 4;
Use bit masking or bit shifting. The latter is faster.
Test out binary trees for some fun. (They will be handy later on in dev life regarding data and all sorts of dev voodoo, lol.)
Packing four values into one number will require at least 15 bits. This doesn't fit in a single byte, but in two.
What you need to do is a conversion from base 12 to base 65536 and conversely.
B = A1 + 12*(A2 + 12*(A3 + 12*A4))
A1 = B % 12
A2 = (B / 12) % 12
A3 = (B / 144) % 12
A4 = B / 1728
As this takes 2 bytes anyway, conversion from base 12 to (packed) base 16 is by far preferable:
B1 = A1 + 16*A2
B2 = A3 + 16*A4
A1 = B1 % 16
A2 = B1 / 16
A3 = B2 % 16
A4 = B2 / 16
The modulos and divisions are implemented by masking and shifts.
0-9 works out much more easily. You can store 11 random-order decimal digits in 4 1/2 bytes, which is tighter compression than log(256)/log(10), just by creative mapping. Remember, not all compression has to do with dictionaries, redundancies, or sequences.
If you are talking about random numbers 0-9, you can fit 4 digits into 14 bits, not 15.