I am asked on HW to reverse bits, like mirror flip them so for example 1011 0100 becomes 0010 1101, using shift and rotate combination. I understand how those commands work but I can't think of a way to flip them. Thanks.
I need to do it using SAL assembly language.
If you need to flip a b-bit word using only shifts, you could emulate a stack:
b times{
right shift on the input register, setting a carry flag.
left shift on the output register, reading the carry flag.
}
Note that x86 has the "rotate through carry" instructions - they serve both purposes (or use rotation without carry on the input register to preserve the input). If left shift from carry is not available but right shift from carry is, reverse the words "left" and "right" in the previous algorithm. If no shift from carry is available, you need to emulate by an "ordinary" logical shift followed by setting the correct bit, but...
If you can use AND and OR as well and b is known ahead and is a power of two, there is a faster way. Reverse two bits within each pair, then two pairs within each nibble, then two nibbles within each byte, then two bytes within each word...
for 8-bit x:
//1234 5678
x = (0x55 & x)<< 1 | (0xAA & x)>> 1 //2143 6587
x = (0x33 & x)<< 2 | (0xCC & x)>> 2 //4321 8765
x = (0x0F & x)<< 4 | (0xF0 & x)>> 4 //8765 4321
Related
i am running a HDL code written in VHDL and i have an input vector with maximum length of 512 bits. Some of my inputs are less than the max size. So i want to find if there is a way to find the actual length of every input, in order to cut the unwanted zeros at the most significant bits of the input vector. Is there any possible way to do this kind of stuff?
I guess you are looking for an unambiguous padding method for your data. What I would recommend in your case is an adaption of the ISO/IEC 9797-1 padding method 2 as follows:
For every input data (even if it already has 512 bits), you add a leading '1' bit. Then you add leading '0' bits (possibly none) to fill up your vector.
To implement this scheme you would have to enlargen your input vector to 513 bits (because you have to always add at least one bit).
To remove the padding, you simple go through the vector starting at the MSB and find the first '1' bit, which marks the end of your apdding pattern.
Example (for 8+1 bit):
input: 10101
padded: 0001 10101
input: 00000000
padded: 1 00000000
So I have a word, and I want to loop through and test the left most bit. I have my word and I'm passing it to my subroutine, I know how to build a loop, I'm just not sure how to test the left most bit in the word.
Thanks for any help
The best way to do this is with bit masking -- perform a bitwise AND between the word you want to check and a bit mask with a 1 in any position you wish to test. i.e. in binary:
my word: 11
bitmask: 10
& ==
10
you can see that the 1 in the left side drops out. So to do something similar on a 16bit number:
0x0230 & 0x8000 = 0x0000
0xC020 & 0x8000 = 0x8000 != 0x0000
The important thing to note here is that if the bit is not present the AND returns a 0, and if the bit is present it returns something else. It doesn't matter what it is, just that it's not zero.
Not sure if it applies to your specific task but a simple approach could be performing a logical/arithmetic left shift to the word. This is simply done by adding the word to itself (which is equal to multiplying by 2 and thus shifting all the bits to the left 1 "spot). After doing this, the condition codes will be set (assuming you're using the GPRs) and you can test if the left most bit is a 1 or a 0 by checking if the shifted word is positive OR zero (hence the left most bit is a 0), or negative (hence the left most bit is a 1). Loop over the whole word following this approach and you'll be able to determine the value of each bit in your word. Hope this helps.
Now I'm trying to figure out best method for iterating over bits in FPGA. I'm using some variation of fast powering algorithm, a.k.a exponentiation by squaring (more precisely it's doubling and add algorithm for elliptic curve mathematics). To implement it on hardware, I know I must use FSM which does iteration. My problem is how to properly "handle" moving from bit to bit. My first thought was to switch order of bytes, but when my k = 17 is 32bit, I must discard first 27 bits, so it's rather stupid idea. Another concept was with "moving" 0001000 pattern and bitwise & it with number, but it also requires to find first nonzero bit.
TL&DR
Got for example k = 17 (32bits, so: 17x0 10001) and want to iterate 5 times (that means I start iteration on first "real" bit of number) knowing each bit I iterate over.
Language doesn't matter - I need only the algorithm, not solution in specific language. However, if it is easily done in Verilog, I wouldn't mind. :P
A dedicated combinatorial circuit to find the first nonzero bit, shift it to the first position and tell you the shift amount should be fairly light on resources.
In principle, the compiler should be able to find this solution on its own and improve on it:
if none of the top 16 bits are set, set bit 4 of the shift amount, and shift by 16.
if none of the top 8 bits are set, set bit 3 of the shift amount, and shift by 8.
...
The compiler should be able to find further optimizations on this.
Don not code for FPGA but still:
rewrite algorithm to iterate number x from LSB to MSB
then in each iteration bit shift x right by 1 bit
stop if x==0.
this way you have bit-scan inside your main loop and do not need additional cycles for it.
x!=0 is done easily by ORing all its bits together
C++ code example:
DWORD x = ...;
for (; x != 0; x >>= 1)
{
//here is your iteration loop stuff like:
if (DWORD(x & 1) !=0 ) ...;
}
Something like:
always # *
casex(num)
8XXX_XXXX: k = 32;
4XXX_XXXX: k = 31;
2XXX_XXXX: k = 30;
...
Should give you the value of k.
You can have a shift register which can be parallel loaded so you can write a 1 to the kth bit, so you know when your iterations have ended.
If you loop from 0 to 31 and discard the 27 leading zeros...you aren't necessarily wasting cycles. Depends on whether you've surrounded this with a synchronous process, or a asynchronous one.
One gives you a rather small clocked circuit with a 32 clock latency.
The other gives you a giant rats nest of ANDs and ORs which won't run at a very high frequency.
Depends on what you want. Remember though, that even if you do decide to loop over 32 clocks, you can PIPELINE it such that you start a new calculation every clock. It might take you 32 clocks to get an answer, but you CAN do them at high speed.
In two-complement to invert the sign of a number you usually just negate every bit and add 1.
For example:
011 (3)
100 + 1 = 101 (-3)
In VHDL is:
a <= std_logic_vector(unsigned(not(a)) + 1);
In this way the synthesizer uses an N-bit adder.
Is there another more efficient solution without using the adder?
I would guess there's not an easier way to do it, but the adder is probably not as bad as you think it is.
If you are trying to say invert a 32-bit number, the synthesis tool might start with a 32-bit adder structure. However then upon seeing that the B input is always tied to 1, it can 'hollow out' a lot of the structure due to the unused gates (AND gates with one pin tied to ground, OR gates with one pin tied to logic 1, etc).
So what you're left with I'd imagine would be a reasonably efficient blob of logic which just increments an input number.
If you are just trying to create a two's complement bit pattern then a unary - also works.
a = 3'b001 ; // 1
b = -a ; //3'b111 -1
c = ~a + 1 ; //3'b111 -1
Tim has also correctly pointed out that just because you use a + or imply one through a unary -, the synthesis tools are free to optimise this.
A full adder has 3 inputs (A, B, Carry_in) and 2 outputs (Sum Carry_out). Since for our use the second input is only 1 bit wide, and at the LSB there is no carry, we do not need a 'full adder'.
A half adder which has 2 inputs (A, B) and 2 outputs (Sum Carry), is perfect here.
For the LSB the half adders B input will be high, the +1. The rest of the bits B inputs will be used to propagate the Carry from the previous bit.
There is no way that I am aware of to write in verilog that you want a half adder, but any size number plus 1 bit only requires a half adder rather than a fulladder.
Could you please suggest an error detection scheme for detecting
one possible bit flip in the first 32 bytes of a 33-byte message using
no more than 8 bits of additional data?
Could Pearson hashing be a solution?
Detecting a single bit-flip in any message requires only one extra bit, independent of the length of the message: simply xor together all the bits in the message and tack that on the end. If any single bit flips, the parity bit at the end won't match up.
If you're asking to detect which bit flipped, that can't be done, and a simple argument shows it: the extra eight bits can represent up to 256 classes of 32-byte messages, but the zero message and the 256 messages with one on bit each must all be in different classes. Thus, there are 257 messages which must be distinctly classified, and only 256 classes.
You can detect one bit flip with just one extra bit in any length message (as stated by #Daniel Wagner). The parity bit can, simply put, indicate whether the total number of 1-bits is odd or even. Obviously, if the number of bits that are wrong is even, then the parity bit will fail, so you cannot detect 2-bit errors.
Now, for a more accessible understanding of why you can't error-correct 32 bytes (256 bits) with just 8 bits, please read about the Hamming code (like used in ECC memory). Such a scheme uses special error-correcting parity bits (henceforth called "EC parity") that only encode the parity of a subset of the total number of bits. For every 2^m - 1 total bits, you need to use m EC bits. These represent each possible different mask following the pattern "x bits on, x bits off" where x is a power of 2. Thus, the larger the number of bits at once, the better the data/parity bit ratio you get. For example, 7 total bits would allow encoding only 4 data bits after losing 3 EC bits, but 31 total bits can encode 26 data bits after losing 5 EC bits.
Now, to really understand this probably will take an example. Consider the following sets of masks. The first two rows are to be read top down, indicating the bit number (the "Most Significant Byte" I've labeled MSB):
MSB LSB
| |
v v
33222222 22221111 11111100 0000000|0
10987654 32109876 54321098 7654321|0
-------- -------- -------- -------|-
1: 10101010 10101010 10101010 1010101|0
2: 11001100 11001100 11001100 1100110|0
3: 11110000 11110000 11110000 1111000|0
4: 11111111 00000000 11111111 0000000|0
5: 11111111 11111111 00000000 0000000|0
The first thing to notice is that the binary values for 0 to 31 are represented in each column going from right to left (reading the bits in rows 1 through 5). This means that each vertical column is different from each other one (the important part). I put a vertical extra line between bit numbers 0 and 1 for a particular reason: Column 0 is useless because it has no bits set in it.
To perform error-correcting, we will bitwise-AND the received data bits against each EC bit's predefined mask, then compare the resulting parity to the EC bit. For any calculated parities discovered to not match, find the column in which only those bits are set. For example, if error-correcting bits 1, 4, and 5 are wrong when calculated from the received data value, then column #25--containing 1s in only those masks--must be the incorrect bit and can be corrected by flipping it. If only a single error-correcting bit is wrong, then the error is in that error-correcting bit. Here's an analogy to help you understand why this works:
There are 32 identical boxes, with one containing a marble. Your task is to locate the marble using just an old-style scale (the kind with two balanced platforms to compare the weights of different objects) and you are only allowed 5 weighing attempts. The solution is fairly easy: you put 16 boxes on each side of the scale and the heavier side indicates which side the marble is on. Discarding the 16 boxes on the lighter side, you then weigh 8 and 8 boxes keeping the heavier, then 4 and 4, then 2 and 2, and finally locate the marble by comparing the weights of the last 2 boxes 1 to 1: the heaviest box contains the marble. You have completed the task in only 5 weighings of 32, 16, 8, 4, and 2 boxes.
Similarly, our bit patterns have divided up the boxes in 5 different groups. Going backwards, the fifth EC bit determines whether an error is on the left side or the right side. In our scenario with bit #25, it is wrong, so we know that the error bit is on the left side of the group (bits 16-31). In our next mask for EC bit #4 (still stepping backward), we only consider bits 16-31, and we find that the "heavier" side is the left one again, so we have narrowed down the bits 24-31. Following the decision tree downward and cutting the number of possible columns in half each time, by the time we reach EC bit 1 there is only 1 possible bit left--our "marble in a box".
Note: The analogy is useful, though not perfect: 1-bits are not represented by marbles--the erroring bit location is represented by the marble.
Now, some playing around with these masks and thinking how to arrange things will reveal that there is a problem: If we try to make all 31 bits data bits, then we need 5 more bits for EC. But how, then, will we tell if the EC bits themselves are wrong? Just a single EC bit wrong will incorrectly tell us that some data bit needs correction, and we'll wrongly flip that data bit. The EC bits have to somehow encode for themselves! The solution is to position the parity bits inside of the data, in columns from the bit patterns above where only one bit is set. This way, any data bit being wrong will trigger two EC bits to be wrong, making it so that if only one EC bit is wrong, we know it is wrong itself instead of it signifying a data bit is wrong. The columns that satisfy the one-bit condition are 1, 2, 4, 8, and 16. The data bits will be interleaved between these starting at position 2. (Remember, we are not using position 0 as it would never provide any information--none of our EC bits would be set at all).
Finally, adding one more bit for overall parity will allow detecting 2-bit errors and reliably correcting 1-bit errors, as we can then compare the EC bits to it: if the EC bits say something is wrong, but the parity bit says otherwise, we know there are 2 bits wrong and cannot perform correction. We can use the discarded bit #0 as our parity bit! In fact, now we are encoding the following pattern:
0: 11111111 11111111 11111111 11111111
This gives us a final total of 6 Error-Checking and Correcting (ECC) bits. Extending the scheme of using different masks indefinitely looks like this:
32 bits - 6 ECC bits = 26 data
64 bits - 7 ECC bits = 57 data
128 bits - 8 ECC bits = 120 data
256 bits - 9 ECC bits = 247 data
512 bits - 10 ECC bits = 502 data
Now, if we are sure that we only will get a 1-bit error, we can dispense with the #0 parity bit, so we have the following:
31 bits - 5 ECC bits = 26 data
63 bits - 6 ECC bits = 57 data
127 bits - 7 ECC bits = 120 data
255 bits - 8 ECC bits = 247 data
511 bits - 9 ECC bits = 502 data
This is no change because we don't get any more data bits. Oops! 32 bytes (256 bits) as you requested cannot be error-corrected with a single byte, even if we know we can have only a 1-bit error at worst, and we know the ECC bits will be correct (allowing us to move them out of the data region and use them all for data). We need TWO more bits than we have--one must slide up to the next range of 512 bits, then leave out 246 data bits to get our 256 data bits. So that's one more ECC bit AND one more data bit (as we only have 255, exactly what Daniel told you).
Summary:: You need 33 bytes + 1 bit to detect which bit flipped in the first 32 bytes.
Note: if you are going to send 64 bytes, then you're under the 32:1 ratio, as you can error correct that in just 10 bits. But it's that in real world applications, the "frame size" of your ECC can't keep going up indefinitely for a few reasons: 1) The number of bits being worked with at once may be much smaller than the frame size, leading to gross inefficiencies (think ECC RAM). 2) The chance of being able to accurately correct a bit gets less and less, since the larger the frame, the greater the chance it will have more errors, and 2 errors defeats error-correction ability, while 3 or more can defeat even error-detection ability. 3) Once an error is detected, the larger the frame size, the larger the size of the corrupted piece that must be retransmitted.
If you need to use a whole byte instead of a bit, and you only need to detect errors, then the standard solution is to use a cyclic redundancy check (CRC). There are several well-known 8-bit CRCs to choose from.
A typical fast implementation of a CRC uses a table with 256 entries to handle a byte of the message at a time. For the case of an 8 bit CRC this is a special case of Pearson's algorithm.