Is there any bit-level error detection algorithm that uses a minimum number of extra bits?

I have a 32-bit number that is created by encoding some data. I want to be more confident that the data (at most a 32-bit number) has not changed when decoding it, so I am going to add some error detection bits.
I need to keep the data as short as possible, so I can only add a few bits for error detection, in some cases just 1 bit.
I'm looking for an algorithm that detects more bit changes and needs fewer extra bits.
I was thinking of calculating a checksum or CRC and just dropping the extra bits, or maybe XOR-folding the result to make it shorter, but I'm not sure whether the error detection would remain good enough.
Thanks in advance for any help.

A 1-bit CRC, with polynomial x+1, would simply be the parity of your 32 message bits. That will detect any one-bit error in the resulting 33 bits. For a 2-bit CRC, you can use x^2+1. You can define a CRC of any length. See Koopman's list for good CRC polynomials for CRCs of degree 3 and higher.
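For illustration, here is a minimal C sketch of both cases, assuming the 32-bit message is already packed into a uint32_t; parity32 and crc2 are hypothetical helper names, not part of any particular library:

    #include <stdint.h>

    /* Parity of a 32-bit word: the 1-bit CRC with polynomial x+1.
       Detects any single-bit (in fact any odd-weight) error. */
    static unsigned parity32(uint32_t x) {
        x ^= x >> 16;
        x ^= x >> 8;
        x ^= x >> 4;
        x ^= x >> 2;
        x ^= x >> 1;
        return x & 1u;
    }

    /* 2-bit CRC with polynomial x^2 + 1, computed bit by bit, MSB first. */
    static unsigned crc2(uint32_t data) {
        unsigned crc = 0;                      /* 2-bit shift register */
        for (int i = 31; i >= 0; i--) {
            unsigned bit = (data >> i) & 1u;   /* next message bit */
            unsigned top = (crc >> 1) & 1u;    /* bit leaving the register */
            crc = (crc << 1) & 3u;
            if (top ^ bit)
                crc ^= 1u;                     /* feedback tap at x^0 */
        }
        return crc;
    }

Appending parity32(x) to the message gives the 1-bit check; appending crc2(x) gives the 2-bit check.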

Related

Probability of a collision using 32 bit CRC of a unique 32 byte array

I am trying to figure out if using a 32-bit CRC will produce collisions on a 32-byte array.
Background
My system reads some configuration from an external flash whenever it boots up. I store the SHA-256 hash of the last known configuration, and whenever I read the configuration I calculate its SHA-256 hash and compare the two. If the two hashes are different, then the data is different.
I need to take that SHA-256 hash and turn it into a 32-bit hash for another part of the system (due to some legacy code restrictions).
Questions
Will there be a high number of collisions if I compute a 32-bit CRC on the 32-byte hash from SHA-256?
I calculate the probability of a collision to be 0. Can you let me know if this is correct?
The number of samples k is always 2 in my problem (I think), because I am calculating the 32-bit CRC on two 32-byte arrays (SHA-256 digests).
see calculation here
That's correct, if by "0" you mean that very small number. That small number is the probability that you would get a 32-bit CRC from random data that accidentally matches what you were expecting. It is simply 2^-32.
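For the folding step itself, a table-free CRC-32 over the 32-byte digest might look like the sketch below. It uses the common zlib/PKZIP polynomial in reflected form, which is an assumption on my part, and crc32_bytes is a hypothetical helper name:

    #include <stdint.h>
    #include <stddef.h>

    /* Bitwise (table-free) CRC-32, reflected form, polynomial 0xEDB88320. */
    static uint32_t crc32_bytes(const uint8_t *buf, size_t len) {
        uint32_t crc = 0xFFFFFFFFu;
        for (size_t i = 0; i < len; i++) {
            crc ^= buf[i];
            for (int k = 0; k < 8; k++)
                crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return ~crc;
    }

    /* Usage: uint32_t id = crc32_bytes(sha256_digest, 32);
       Two different digests map to the same id with probability about 2^-32. */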

How are code lengths limited to 16 bits at maximum in JPEG?

According to ITU T.81, which describes the JPEG format, BITS stores the "code length counts". Its creation is described in Annex K, Figure K.2 of the specification. The specification expects that, while encoding, symbols may exist that require Huffman codes up to 32 bits in length, yet it limits Huffman code lengths to a maximum of 16 bits in the encoded data. For this purpose the code lengths must be reduced to 16 bits. The procedure for this is given in Annex K, Figure K.3, shown below:
My question is: can BITS also take negative values when we do BITS(I)-2 and BITS(I)-1? Does it have to be declared as signed? If so, what do negative values mean? I have implemented this in code, but it gives me negative values. Some images encode just fine, but for images where BITS has to be manipulated down to 16 bits, the output always gets corrupted.
As I understand it, negative values should be fine, since they only occur at indices i in [17, 32], which are not used once you are done reducing the lengths to 16 bits. The algorithm assumes signed math: notice the BITS(i) > 0 condition; negative values fall through the "No" branch, and the loop eventually ends after dealing with BITS(17).
In your implementation, I think you could use unsigned math if you really want to, and just clamp the underflow to 0 (naively, something like BITS(i) = BITS(i) > 2 ? BITS(i) - 2 : 0).
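For reference, a signed-math sketch of the Annex K.3 flow could look like the C below; adjust_bits is a hypothetical name, and bits[] is assumed to be indexed 1..32 as in the specification:

    /* Limit Huffman code lengths to 16 bits, following T.81 Annex K, Figure K.3.
       bits[L] holds the number of codes of length L. Counts at lengths 17..32
       may go negative along the way, but those slots are unused afterwards. */
    static void adjust_bits(int bits[33]) {
        int i = 32;
        while (i > 16) {
            if (bits[i] > 0) {
                int j = i - 1;
                do { j--; } while (bits[j] <= 0); /* longest shorter non-empty length */
                bits[i]     -= 2;                 /* remove two codes from length i    */
                bits[i - 1] += 1;                 /* one of them moves up to length i-1 */
                bits[j + 1] += 2;                 /* two new codes appear at length j+1 */
                bits[j]     -= 1;                 /* a length-j code became their prefix */
            } else {
                i--;                              /* nothing left at this length */
            }
        }
        while (bits[i] == 0)                      /* drop the reserved all-ones code point */
            i--;
        bits[i]--;
    }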

Fixed Point Multiplication for FFT

I'm writing a Radix-2 DIT FFT algorithm in VHDL, which requires some fractional multiplication of input data by twiddle factors (TF). I use fixed-point arithmetic to achieve that, with every word being 16 bits long, where 1 bit is the sign bit and the rest is distributed between the integer and fraction parts. Hence my dilemma:
I have no idea what range my input data will be in. If I just decide that 4 bits go to the integer part and the remaining 11 bits to the fraction, then as soon as I get integer values larger than 4 bits can hold (15 decimal), I'm screwed. The same applies if I split it roughly 50/50, say 7 bits to the integer part and the rest to the fraction. And if the numbers are very small, I'm screwed because of truncation or rounding, i.e.:
Let's assume I have the integer "3" (0000 0011) on the input and a TF of "0.7071" (0.10110101 in 8 bits), and let's assume, for simplicity, that my data is 8 bits long. Therefore:
3 x 0.7071 = 2.1213 (exact)
3 x 0.7071 = 0000 0010.0001 1111 = 2.12109375 (as a 16-bit product).
Here comes the catch: I need to round or truncate those 16 bits down to 8 bits, so I get 0000 0010, i.e. 2, and the error is way too high.
My questions are:
How would you solve this problem of range vs. precision if you don't know the range of your input data and your numbers are represented in fixed point?
Should I make a process that decides after every multiplication where to put the radix point? Wouldn't that make the multiplication slower?
The Xilinx IP core has three different options for fixed-point arithmetic: Unscaled (similar to what I want to do, just truncate in case overflow happens), Scaled fixed point (I would assume that in this case it decides after each multiplication where the radix point should go and what should be rounded), and Block Floating Point (no idea what it is or how it works; I would appreciate an explanation). So how does this IP core decide where to put the radix point? If the decision is made based on the highest value in my dataset, then if I have just one high peak and the rest of the data is low, the error will be very high.
I will appreciate any ideas or information on any known methods.
You don't need to know the fixed-point format of your input. You can safely treat it as a normalized -1 to 1 range, or as a full integer range.
The reason is that your output will have the same format as the input, or, more likely for an FFT, a known relationship to it, such as a 3-bit growth, meaning the output has 3 more integer bits than the input.
It is the core user's burden to know where the binary point will end up; you have to document the change in dynamic range, of course.
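As a sketch of the normalized approach, assuming a Q1.15 format (1 sign bit, 15 fraction bits) and a hypothetical helper named q15_mul, a rounded fixed-point multiply could look like this:

    #include <stdint.h>

    /* Multiply two Q1.15 values. The full product is Q2.30; adding half an LSB
       before shifting right by 15 rounds to nearest instead of truncating.
       Assumes arithmetic right shift of negatives (true on most targets). */
    static int16_t q15_mul(int16_t a, int16_t b) {
        int32_t p = (int32_t)a * (int32_t)b;   /* Q2.30 intermediate */
        p = (p + (1 << 14)) >> 15;             /* round back to Q1.15 */
        if (p >  32767) p =  32767;            /* saturate the -1 * -1 case */
        if (p < -32768) p = -32768;
        return (int16_t)p;
    }

    /* Example: 0.7071 is about 23170 in Q1.15; q15_mul(23170, 16384) gives 11585,
       i.e. roughly 0.3536, as expected for 0.7071 * 0.5. */

Rounding instead of truncating halves the worst-case error of each multiply; the bit growth per butterfly stage still has to be accounted for separately, e.g. by shifting or by letting the integer part grow as described above.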

How to uniquely represent 99,999 bits as a byte, word, or double word

I have 99,999 bit flags that I need to represent uniquely with 32 bits or fewer. Any of the bits can be set, and I need to know whether the set bits differ from a comparable set of bits. I am considering using a CRC as a unique hash value, but I am not sure whether collisions will be a problem. Ideally, fewer than 500 of these bits will be set at any given time, but they will not be known ahead of time.
Is there a suitable hash or other algorithm to uniquely represent these bits?
NO!
Without some other information about those bit flags to identify that certain combinations are impossible, this cannot be done. If all combinations are possible, then you will need to use 99,999 bits to store your 99,999 bit flags.
Edit:
Based on the background information that this is to reduce network usage, and the expectation that only about 500 of the bits are set, there are techniques that can be used, but none are a simple hash, and none are efficient enough to fit in 32 bits. I would start by looking at arithmetic coding. This uses a probability distribution of the symbols that you want to send (0.5% ones, 99.5% zeros) to compress data. By my computations, you can "expect" a compression of about 22 times. But for bit patterns the model considers rare, you will pay the price by needing to transmit more than your starting 99,999 bits.
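Here is a quick sketch of where that roughly 22x figure comes from, using the binary entropy function; the 500-in-99,999 density is taken from the question above:

    #include <math.h>
    #include <stdio.h>

    /* Entropy of a biased bit: H(p) = -p*log2(p) - (1-p)*log2(1-p).
       With about 500 of 99,999 bits set (p ~ 0.005), H(p) ~ 0.045 bits per flag,
       so an ideal arithmetic coder needs roughly 99,999 * 0.045 ~ 4,500 bits,
       about a 22x reduction -- still far more than 32 bits. */
    int main(void) {
        double p = 500.0 / 99999.0;
        double h = -p * log2(p) - (1.0 - p) * log2(1.0 - p);
        printf("entropy ~ %.4f bits per flag\n", h);
        printf("expected size ~ %.0f bits (%.1fx smaller)\n", 99999.0 * h, 1.0 / h);
        return 0;
    }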

Expected collisions for perfect 32bit crc

I'm trying to determine how my CRC compares to an "ideal" 32-bit CRC.
So I ran my CRC over 1 million completely random samples of data and counted the collisions; I want to compare this number to the number of collisions I could expect from the "ideal" CRC.
Does anyone know how to calculate the expected number of collisions for an "ideal" 32-bit CRC?
Compare your own CRC against one using the polynomial 0x1EDC6F41 as your "ideal" reference.
Having said that, there is no single ideal 32-bit CRC. Different polynomials have different collision characteristics depending on the length of the data hashed. However, a 1993 paper by Castagnoli found what is considered the best 32-bit CRC polynomial over the broadest range of data lengths, 0x1EDC6F41 (CRC-32C). This polynomial is used by network protocols such as iSCSI and by the x86 CRC32 instruction.
This answer explains the "Birthday Problem" nicely, and how to predict the collision probability: CRC32 Hash Collision Probability
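As a rough sketch of that birthday-bound estimate: for n independent random samples and an ideal 32-bit hash, the expected number of colliding pairs is about n(n-1)/2 divided by 2^32, which for n = 1,000,000 works out to roughly 116:

    #include <stdio.h>

    /* Expected number of colliding pairs for n random inputs under an ideal
       32-bit hash (birthday bound): n*(n-1)/2 / 2^32. This is the baseline
       your own CRC's collision count can be compared against. */
    int main(void) {
        double n = 1000000.0;
        double expected = n * (n - 1.0) / 2.0 / 4294967296.0;  /* 2^32 */
        printf("expected collisions ~ %.1f\n", expected);      /* about 116.4 */
        return 0;
    }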
