Generating the polynomial key for a CRC algorithm

With reference to this article:
https://www.digikey.com/eewiki/display/microcontroller/CRC+Basics
The polynomial key is an important part of the CRC. Keys are not just random polynomials; they are generated using a set of mathematical formulas and are intended to increase the number of mistakes that the CRC process identifies. The polynomial is typically defined by the network protocol or external device. Since there is a well-established set of keys available, the processes for defining keys are not discussed here.
I understand how to calculate a CRC with a given polynomial key; however, how does one generate the polynomial key, and how is it ensured that it catches as many errors as possible for a given protocol?
I assume the polynomial key has something to do with:
data length
data speed
others?

The part about using mathematical formulas to generate CRC polynomials is somewhat misleading. The number of 1 bits in a CRC polynomial is the maximum possible Hamming distance (HD) for the polynomial, and generally the actual Hamming distance will be less, depending on the data length. The maximum number of bit errors guaranteed to be detected is the Hamming distance minus 1.
Generally, CRC polynomials that detect a higher number of bit errors are the product of multiple prime polynomials. For example, for a 32-bit CRC that can detect up to 7 errors for a data + CRC length of 1024 bits, the 33-bit CRC polynomial 0x1f1922815 = 0x787 * 0x557 * 0x465 * 0x3 * 0x3. The 0x3 factor detects any odd number of bit errors, so verifying detection of up to 7 errors reduces to checking that the CRC detects all possible 6-bit errors in 1024 bits.
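To make the factor claim concrete, the product can be checked with carry-less (GF(2)) polynomial multiplication; a minimal Python sketch (the clmul helper is purely illustrative):

    def clmul(a, b):
        # Carry-less (GF(2)) multiplication of two bit-encoded polynomials.
        result = 0
        while b:
            if b & 1:
                result ^= a
            a <<= 1
            b >>= 1
        return result

    product = 1
    for factor in (0x787, 0x557, 0x465, 0x3, 0x3):
        product = clmul(product, factor)
    print(hex(product))   # 0x1f1922815, the 33-bit polynomial quoted above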
I'm not aware of a formula to determine the maximum bit error detection; generally a somewhat optimized brute-force search is done. As an example, say a 32-bit CRC polynomial is being checked to see if it can detect all 6-bit errors for a data + CRC length of 1024 bits. The number of possible 6-bit error patterns in 1024 bits is comb(1024,6) = 1,577,953,087,760,896. To reduce this to something reasonable, the number of possible 3-bit errors, comb(1024,3) = 178,433,024, is used to create a large table, each entry containing the CRC and the 3 bit indexes. The table is sorted and then used to check for collisions, where one 3-bit pattern's CRC is the same as a different 3-bit pattern's CRC. A check for failure on 4-bit patterns is also needed (checking for collisions between two different 2-bit patterns).
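A scaled-down Python sketch of that kind of search, shown here with a 64-bit data + CRC length so the 3-bit table stays tiny (illustrative only; the real search over 1024 bits is the same idea at a much larger scale):

    from itertools import combinations

    POLY = 0x1F1922815                      # 33-bit generator polynomial from above
    POLY_DEGREE = POLY.bit_length() - 1     # 32

    def crc_of_error_pattern(bit_positions):
        # Remainder of the error polynomial (x**i + x**j + ...) modulo the generator, in GF(2).
        e = 0
        for p in bit_positions:
            e ^= 1 << p
        while e.bit_length() > POLY_DEGREE:
            e ^= POLY << (e.bit_length() - POLY.bit_length())
        return e

    def weight6_collisions(total_bits):
        # Two different 3-bit patterns with the same CRC combine (by XOR) into an
        # undetectable error of weight <= 6 within total_bits (data + CRC).
        table = sorted((crc_of_error_pattern(c), c)
                       for c in combinations(range(total_bits), 3))
        return [(a[1], b[1]) for a, b in zip(table, table[1:]) if a[0] == b[0]]

    # 64 bits keeps the table at comb(64, 3) = 41,664 entries; the real search over
    # 1024 bits works the same way with comb(1024, 3) = 178,433,024 entries.
    print(weight6_collisions(64))           # expected to be empty for this polynomial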
Generally as the data length gets smaller, the maximum number of error bits guaranteed to be detected increases. Here is a link to a bunch of CRC polynomials and their error detection capability.
https://users.ece.cmu.edu/~koopman/crc/crc32.html
The notes page explains the table entries:
http://users.ece.cmu.edu/~koopman/crc/notes.html

Related

Shuffle sequential numbers without a buffer

I am looking for a shuffle algorithm to shuffle a set of sequential numbers without buffering. Another way to state this is that I’m looking for a random sequence of unique numbers that have a given period.
Your typical Fisher–Yates shuffle needs to have all of the elements it is going to shuffle in memory, so that isn’t going to work.
A Linear-Feedback Shift Register (LFSR) does what I want, but only works for periods that are powers-of-two less two. Here is an example of using a 4-bit LFSR to shuffle the numbers 1-14:
    Input:   1  2  3  4  5  6  7  8  9 10 11 12 13 14
    Output:  8 12 14  7  4 10  5 11  6  3  2  1  9 13
The first row is the input, and the second row is the output. What’s nice is that the state is very small, just the current index. You can start at any index and get a different set of numbers (starting at 1 yields: 8, 12, 14; starting at 9: 6, 3, 2), although the sequence is always the same (5 is always followed by 11). If I want a different sequence, I can pick a different generator polynomial.
The limitations of the LFSR are that the period is always a power of two less two (the min and max are always the same, thus unshuffled) and that there are not enough generator polynomials to allow every possible random sequence.
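For reference, a minimal Python sketch of a 4-bit Fibonacci LFSR (the tap choice here is arbitrary, so the sequence will not necessarily match the table above; note that a maximal-length 4-bit LFSR cycles through all 15 nonzero states, so producing exactly the 14 values above presumably involves skipping one value):

    def lfsr4_step(state):
        # One step of a 4-bit Fibonacci LFSR: shift left, feed back the XOR of
        # bits 3 and 2 (a maximal-length tap choice with period 15).
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        return ((state << 1) | feedback) & 0xF

    # Starting from any nonzero state, this LFSR visits all 15 nonzero 4-bit
    # values before repeating; the successor of a given value is always fixed,
    # matching the behaviour described above.
    s, seq = 1, []
    for _ in range(15):
        s = lfsr4_step(s)
        seq.append(s)
    print(seq)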
A block cipher algorithm would work. Every key produces a uniquely shuffled set of numbers. However, all block ciphers (that I know about) have power-of-two block sizes, and usually a fixed or limited number of block sizes. A block cipher with an arbitrary non-binary block size would be perfect, if such a thing exists.
There are a couple of projects I have that could benefit from such an algorithm. One is for small embedded micros that need to produce a shuffled sequence of numbers with a period larger than the memory they have available (think Arduino Uno needing to shuffle 1 to 100,000).
Does such an algorithm exist? If not, what things might I search for to help me develop such an algorithm? Or is this simply not possible?
Edit 2022-01-30
I have received a lot of good feedback and I need to better explain what I am searching for.
In addition to the Arduino example, where memory is an issue, there is also the shuffle of a large number of records (billions to trillions). The desire is to have a shuffle applied to these records without needing a buffer to hold the shuffle order array, or the time needed to build that array.
I do not need an algorithm that could produce every possible permutation, but a large number of permutations. Something like a typical block cipher in counter mode where each key produces a unique sequence of values.
A Linear Congruential Generator using coefficients to produce the desired sequence period will only produce a single sequence. This is the same problem for a Linear Feedback Shift Register.
Format-Preserving Encryption (FPE), such as AES FFX, shows promise and is where I am currently focusing my attention. Additional feedback welcome.
It is certainly not possible to produce an algorithm which could potentially generate every possible sequence of length N with less than N (log2 N - 1.45) bits of state, because there are N! possible sequences and each state can generate exactly one sequence. If your hypothetical Arduino application could produce every possible sequence of 100,000 numbers, it would require at least 1,516,705 bits of state, a bit more than 185 KiB, which is probably more memory than you want to devote to the problem [Note 1].
That's also a lot more memory than you would need for the shuffle buffer; that's because the PRNG driving the shuffle algorithm also doesn't have enough state to come close to being able to generate every possible sequence. It can't generate more different sequences than the number of different possible states that it has.
So you have to make some compromise :-)
One simple algorithm is to start with some parametrisable generator which can produce non-repeating sequences for a large variety of block sizes. Then you just choose a block size which is at least as large as your target range but not "too much larger"; say, less than twice as large. Then you select a subrange of the block size and start generating numbers. If the generated number is inside the subrange, you return its offset; if not, you throw it away and generate another number. If the generator's range is less than twice the desired range, then you will throw away less than half of the generated values, and producing the next element in the sequence will be amortised O(1). In theory, it might take a long time to generate an individual value, but that's not very likely, and if you use a not-very-good PRNG like a linear congruential generator, you can make it very unlikely indeed by restricting the possible generator parameters.
For LCGs you have a couple of possibilities. You could use a power-of-two modulus, with an odd offset and a multiplier which is 5 mod 8 (and not too far from the square root of the block size), or you could use a prime modulus with almost arbitrary offset and multiplier. Using a prime modulus is computationally more expensive but the deficiencies of LCG are less apparent. Since you don't need to handle arbitrary primes, you can preselect a geometrically-spaced sample and compute the efficient division-by-multiplication algorithm for each one.
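A minimal Python sketch of the power-of-two variant described above (the multiplier, offset, and starting value are arbitrary placeholders, not tuned parameters):

    def shuffled_range(n, multiplier=21, offset=13, start=1):
        # Full-period LCG over a power-of-two modulus m >= n: an odd offset and a
        # multiplier that is 5 mod 8 satisfy the full-period (Hull-Dobell) conditions,
        # so every residue 0..m-1 appears exactly once per period.
        m = 1
        while m < n:
            m *= 2
        x = start % m
        for _ in range(m):
            x = (multiplier * x + offset) % m
            if x < n:          # rejection step: keep only values inside the target range
                yield x

    print(list(shuffled_range(10)))   # one particular permutation of 0..9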
Since you're free to use any subrange of the generator's range, you have an additional potential parameter: the offset of the start of the subrange. (Or even offsets, since the subrange doesn't need to be contiguous.) You can also increase the apparent randomness by doing any bijective transformation (XOR/rotates are good, if you're using a power-of-two block size.)
Depending on your application, there are known algorithms to produce block ciphers for subword bit lengths [Note 2], which gives you another possible way to increase randomness and/or add some more bits to the generator state.
Notes
1. The approximation for the minimum number of bits of state comes directly from Stirling's approximation for N!, but I computed the number of bits by using the commonly available lgamma function.
2. With about 30 seconds of googling, I found this paper on researchgate.net; I'm far from knowledgeable enough in crypto to offer an opinion, but it looks credible; also, there are references to other algorithms in its footnotes.

1024 bit pseudo random generator in verilog for FPGA

I want to generate random vectors of length 1024 in Verilog. I have looked at certain implementations like Tausworthe generators and Mersenne Twisters.
Most Mersenne Twisters have 32-bit or 64-bit outputs. I want to simulate an error pattern of 1024 bits where each bit is set with some probability p. So I generate a 32-bit random number (uniformly distributed) using the Mersenne Twister; this number will be in the range 0 to 2^32-1. I then set the corresponding bit to 1 if this 32-bit value is less than p*(2^32-1); otherwise the bit is mapped to 0 in my 1024-bit vector. Basically, each 32-bit number is used to generate one bit of the 1024-bit vector according to the desired probability distribution.
The above method implies that I need 1024 clock cycles to generate each 1024-bit vector. Is there any other way which allows me to do this more quickly? I understand that I could use several instances of the Mersenne Twister in parallel with different seed values, but I was afraid that those numbers would not be truly random and that there would be collisions. Is there something that I am doing wrong or something that I am missing? I would really appreciate your help.
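For reference, the per-bit thresholding described above can be modelled in a few lines of Python (a software model only, not the Verilog implementation; random.getrandbits stands in for the Mersenne Twister output):

    import random

    def error_pattern(p, width=1024):
        # One 32-bit uniform draw per bit, exactly as described: the bit is set
        # when the draw falls below p * (2**32 - 1), so it is 1 with probability ~p.
        threshold = int(p * (2**32 - 1))
        bits = 0
        for i in range(width):
            draw = random.getrandbits(32)      # stand-in for one Mersenne Twister output
            if draw < threshold:
                bits |= 1 << i
        return bits

    pattern = error_pattern(0.01)
    print(bin(pattern).count("1"))             # on average about 0.01 * 1024 = ~10 set bits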
Okay,
So I read a bit about Mersenne Twisters in general from Wikipedia. I accept I didn't outright get all of it, but I got this: given a seed value (to initialise the array), the module generates 32-bit random numbers.
Now, from your description above, it takes one cycle to compute one random number.
So your problem basically boils down to its mathematics rather than being about Verilog as such.
I would try to explain the math of it as best as I can.
You have a 32-bit uniformly distributed random number, so the probability of any one bit being high or low is exactly (well, close to, because it's pseudo-random) 0.5.
Let's forget that this is a pseudo-random generator, because that is the best you are going to get (so let's consider this our ideal).
Even if we generate 5 numbers one after the other, the probability of each one being any particular number is still uniformly distributed. So if we concatenate these five numbers, we will get a 160-bit completely random number.
If it's still not clear, consider this way.
I'm gonna break the problem down. Let's say we have a 4-bit random number generator (RNG), and we require 16-bit random numbers.
Each output of the RNG would be a hex digit with a uniform probability distribution. So the probability of getting some particular digit (say... A) is 1/16. Now I want to make a 4-digit hex number (say... 0xA019).
Probability of getting A as the Most Significant digit = 1/16
Probability of getting 0 as digit number 2 = 1/16
Probability of getting 1 as digit number 3 = 1/16
Probability of getting 9 as the Least Significant digit = 1/16
So the probability of getting 0xA019 = 1/(2^16). In fact, the probability of getting any four-digit hex number would be exactly the same. Now extend the same logic to a base-2^32 number system with 32-digit numbers as the required output, and you have your solution.
So, we see, we could do with just 32 repetitions of the Mersenne Twister to get the 1024-bit output (that would take 32 cycles, still kinda slow). What you could also do is synthesise 32 twisters in parallel (that would give you the output in one stroke but would be very heavy on the FPGA in terms of area and power constraints).
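The concatenation idea, modelled in Python (software sketch only; each getrandbits call stands in for one PRNG output word):

    import random

    def wide_random(width=1024, word=32):
        # Concatenate width // word independent uniform draws (32 draws of 32 bits
        # for a 1024-bit result); the concatenation is itself uniformly distributed.
        value = 0
        for _ in range(width // word):
            value = (value << word) | random.getrandbits(word)
        return value

    print(hex(wide_random()))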
The best way to go about this would be to try for some middle ground (maybe 4 parallel twisters running in 8 cycles). This would really be a question of the end application of the module and the power and timing constraints that you need for that application.
As for giving different seed values, most PRNGs usually have a provision for input seeds just to increase randomness, and from what I read on Mersenne Twisters, that is the case here as well.
Hope that answers your question.

Compression by quasi-logarithmic scale

I need to compress a large set of (unsigned) integer values, where the goal is to keep their relative accuracy. Simply speaking, there is a big difference between 1, 2, 3, but the difference between 1001, 1002, 1003 is minor.
So I need a sort of lossy transformation. The natural choice is to build a logarithmic scale, but the drawback is that conversion to/from it requires floating-point operations, log/exp calculations, and so on.
OTOH, I don't need a truly logarithmic scale; I just need it to resemble one in some sense.
I came up with the idea of encoding numbers in a floating-point manner. That is, I allocate N bits for each compressed number, of which some represent the mantissa and the remaining ones the order. The choice of the sizes of the mantissa and order would depend on the needed range and accuracy.
My question is: is it a good idea? Or perhaps there exists a better encoding scheme w.r.t. computation complexity vs quality (similarity to logarithmic scale).
What I implemented, in detail:
As I said, there are bits for the mantissa and bits for the order. The order bits are leading, so that the greater the encoded number, the greater the raw one.
The actual number is decoded by appending an extra leading bit to the mantissa (aka the implicit bit) and left-shifting it by the encoded order. The smallest decoded number would be 1 << M, where M is the size of the mantissa. If the needed range should start from 0 (as in my case), this number can be subtracted.
Encoding a number is also simple. Add 1 << M, then find its order, i.e. how much it should be right-shifted until it fits in the mantissa with the implicit leading bit; then the encoding is trivial. Finding the order is done via a median search, which comes down to just a few ifs (for example, if there are 4 order bits, the max order is 15, and it's found within 4 ifs).
I call this a "quasi-logarithmic" scale. The absolute precision decreases as the number gets larger. But unlike a true logarithmic scale, where the granularity increases continuously, in our case it jumps by a factor of 2 after each fixed-size range.
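A minimal Python sketch of the scheme as described, assuming an 8-bit mantissa and 5 order bits (both sizes are arbitrary choices for illustration):

    M = 8   # mantissa bits (illustrative choice)
    E = 5   # order bits (illustrative choice)

    def encode(n):
        # Add the implicit-bit offset, then find how far the value must be
        # right-shifted to fit into M mantissa bits plus the implicit leading bit.
        v = n + (1 << M)
        order = max(v.bit_length() - (M + 1), 0)
        assert order < (1 << E), "value out of range for this format"
        mantissa = (v >> order) & ((1 << M) - 1)   # implicit leading bit is dropped
        return (order << M) | mantissa             # order bits lead, so encoding is monotonic

    def decode(code):
        order = code >> M
        mantissa = code & ((1 << M) - 1)
        v = ((1 << M) | mantissa) << order         # restore the implicit bit and shift back
        return v - (1 << M)                        # shift the range back so it starts at 0

    for n in (0, 1, 1000, 1001, 1002, 123456):
        print(n, encode(n), decode(encode(n)))     # 1000..1002 collapse to the same code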
The advantages of this encoding:
Fast encoding and decoding
No floating-point numbers, no implicit precision loss during manipulations with them, no boundary cases, etc.
Not dependent on standard libraries, complex math functions, etc.
Encoding and decoding may be implemented via C++ template stuff, so that conversion may even be implemented in compile-time. This is convenient to define some compile-time constants in a human-readable way.
In your compression algorithm, every group of numbers that results in the same compressed output will be decompressed to the lowest number in that group. If you changed that to the number in the middle of the group, the average error would be reduced.
E.g. for an 8-bit mantissa and a 5-bit exponent, the numbers in the range [0x1340, 0x1350) will all be translated into 0x1340 by decompress(compress(x)). If the entire range were first compressed and afterwards decompressed, the total difference would be 120. If the output were 0x1348 instead, the total error would only be 64, which reduces the error by a solid 46.7%. So simply adding 2^(exponent - 1), i.e. 1 << (exponent - 1), to the output will significantly reduce the error of the compression scheme.
Apart from that I don't see much of an issue with this scheme. Just keep in mind that you'll need a specific encoding for 0. There would be alternative encodings, but without knowing anything specific about the input this one will be the best you can get.
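In terms of the sketch above, returning the middle of each group during decompression is a small change on the decode side (same assumed 8-bit mantissa; illustrative only):

    M = 8   # mantissa bits, as in the sketch above

    def decode_midpoint(code):
        order = code >> M
        mantissa = code & ((1 << M) - 1)
        v = ((1 << M) | mantissa) << order
        if order > 0:
            v += 1 << (order - 1)   # middle of the 2**order-wide group instead of its low end
        return v - (1 << M)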
EDIT:
While it is possible to move the correction of the result from the decompression step to the compression step, this comes at the expense of enlarging the exponent range by one. This is due to the fact that for the numbers with the MSB set, only half of the numbers will use the corresponding exponent (the other half will be populated by numbers with the second-most-significant bit set). The upper half of the numbers with the MSB set will be placed in the next-higher order.
So, for example, for 32-bit numbers encoded with a 15-bit mantissa, only numbers up to 0x8FFF FFFF will have order 15 (Mantissa = 0x1FFF and Exponent = 15). All higher values will have order 16 (Mantissa = 0x?FFF and Exponent = 16). While the increase of the exponent by 1 in itself doesn't seem like much, in this example it already costs an additional bit for the exponent.
In addition, the decompression step for the above example will produce an integer overflow, which may be problematic under certain circumstances (e.g. C# will throw an exception if the decompression is done in checked mode). The same applies to the compression step: unless properly handled, adding 2^(order(n) - 1) to the input n will cause an overflow, thus placing the number in order 0.
I would recommend moving the correction to the decompression step (as shown above) to remove potential integer overflows as a source of problems/bugs and to keep the number of exponents that need to be encoded minimal.
EDIT2:
Another issue with this approach is the fact that half of the numbers (excluding the lowest order) wind up in a larger "group" when the correction is done on compression, thus reducing precision.

Ideas for an efficient way of hashing a 15-puzzle state

I am implementing a 15-puzzle solver using Ant Colony Optimization, and I am thinking of a way to efficiently hash each state into a number, so that I waste the least amount of bytes.
A state is represented by a list of 16 numbers, from 0 to 15 (0 is the hole).
Like:
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0]
So I want to create an unique number to identify that state.
I could convert all the digits to a base-16 number, but I don't think that's very efficient.
Any ideas?
Thanks
Your state is isomorphic to the permutations of 16 elements. A 45-bit number is enough to enumerate those (log2 16!), but we might as well round up to 64 bits if it's beneficial. The problem reduces to finding an efficient conversion from the state to its position in the enumeration of states.
Knowing that each number in 0..15 occurs only once, we could create 16 variables of log2 16 = 4 bits each, where the i-th variable denotes the position at which the number i occurs. This has quite a bit of redundancy: it takes log2(16) * 16 bits, but that's exactly 64 bits. It can be implemented pretty efficiently (untested sketch):
    def state2number(state):
        # Pack, for each tile value, the position where it occurs into a 4-bit slot.
        idx = 0
        for i in range(16):
            val = state[i]
            idx |= i << (val * 4)   # slot `val` holds position i
        return idx
I don't know if this is what you meant by "convert all the digits to a base 16 number". It is insanely efficient; when unrolled and otherwise micro-optimized it's only a few dozen cycles. It takes two bytes more than necessary, but 64 bits is still pretty space efficient, and directly using it as an index into some array isn't feasible for 64 nor for 45 bits.
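For completeness, a possible inverse of the packing above (illustrative; it just reads each 4-bit slot back out):

    def number2state(idx):
        # Reverse the packing above: 4-bit slot `val` holds the position of tile `val`.
        state = [0] * 16
        for val in range(16):
            position = (idx >> (val * 4)) & 0xF
            state[position] = val
        return state

    # Round trip with state2number as defined above:
    state = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0]
    assert number2state(state2number(state)) == state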
There are 16! = 2.09*10^13 possible states, which need about 44.25 bits to be encoded.
So if you want to encode the state in bytes, you need at least 6 bytes to do it.
Why not encode it this way:
Let us name the values a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
The reduced values would be
b` := b - ((b>a)?1:0)
c` := c - ((c>a)?1:0) - ((c>b)?1:0)
d` := d - ((d>a)?1:0) - ((d>b)?1:0) - ((d>c)?1:0)
....
hashNumber = a + b`*16 + c`*16*15 + d`*16*15*14 + ....
This will give you a bijective mapping of each possible state to a number fitting in 6 bytes.
Converting the number back to its corresponding state is also quite easy, if you need to do it.
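A Python sketch of this encoding and its inverse, using the corrected radices above (a straightforward O(n^2) version, purely for illustration):

    def state_to_number(state):
        # Reduce each value by how many earlier values are smaller, then weight the
        # reduced digits by the shrinking radices 16, 15, 14, ... as in the formula above.
        number, scale = 0, 1
        for i, v in enumerate(state):
            reduced = v - sum(1 for earlier in state[:i] if earlier < v)
            number += reduced * scale
            scale *= len(state) - i
        return number

    def number_to_state(number, n=16):
        # Invert: peel off one mixed-radix digit at a time and use it as an index
        # into the sorted list of values that have not been placed yet.
        available = list(range(n))
        state = []
        for i in range(n):
            number, digit = divmod(number, n - i)
            state.append(available.pop(digit))
        return state

    state = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0]
    assert number_to_state(state_to_number(state)) == state
    assert state_to_number(state) < 2**48    # fits in 6 bytes, as claimed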
Not optimal but fast:
Use 4 bits for each number (you can leave out the last one, because it can be computed from the previous 15 numbers); that needs 15*4 bits = 60 bits.
This can be stored in 7.5 bytes, or, if you are OK with wasting a bit more, simply use 8 bytes.

Understanding assumptions about machine word size in analyzing computer algorithms

I am reading the book Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. In the second chapter, under "Analyzing Algorithms", it is mentioned that:
We also assume a limit on the size of each word of data. For example, when working with inputs of size n, we typically assume that integers are represented by c lg n bits for some constant c >= 1. We require c >= 1 so that each word can hold the value of n, enabling us to index the individual input elements, and we restrict c to be a constant so that the word size doesn't grow arbitrarily. (If the word size could grow arbitrarily, we could store huge amounts of data in one word and operate on it all in constant time - clearly an unrealistic scenario.)
My questions are: why is it assumed that each integer is represented by c lg n bits, and how does requiring c >= 1 allow us to index the individual input elements?
First, by lg they apparently mean log base 2, so lg n is (roughly) the number of bits in n.
Then what they are saying is that if they have an algorithm that takes a list of numbers (I am being more specific in my example to help make it easier to understand) like 1, 2, 3, ..., n, then they assume that:
a "word" in memory is big enough to hold any of those numbers.
a "word" in memory is not big enough to hold all the numbers (in one single word, packed in somehow).
when calculating the number of "steps" in an algorithm, an operation on one "word" takes one step.
The reason they are doing this is to keep the analysis realistic (you can only store numbers up to some size in "native" types; after that you need to switch to arbitrary-precision libraries) without choosing a particular example (like 32-bit integers) that might be inappropriate in some cases, or become outdated.
You need at least lg n bits to represent integers of size n, so that's a lower bound on the number of bits needed to store inputs of size n. Requiring the constant c >= 1 ensures the word size meets that lower bound; if the constant multiplier were less than 1, you wouldn't have enough bits to store n.
This is a simplifying step in the RAM model. It allows you to treat each individual input value as though it were accessible in a single slot (or "word") of memory, instead of worrying about complications that might arise otherwise. (Loading, storing, and copying values of different word sizes would take differing amounts of time if we used a model that allowed varying word lengths.) This is what's meant by "enabling us to index the individual input elements." Each input element of the problem is assumed to be accessible at a single address, or index (meaning it fits in one word of memory), simplifying the model.
This question was asked very long ago and the explanations really helped me, but I feel like there could still be a little more clarification about how the lg n came about. For me talking through things really helps:
Let's choose a random number in base 10, like 27; we need 5 bits to store this. Why? Well, because 27 is 11011 in binary. Notice 11011 has 5 digits; each 'digit' is what we call a bit, hence 5 bits.
Think of each bit as being a slot. For binary, each of those slots can hold a 0 or 1. What's the largest number I can store with 5 bits? Well, the largest number would fill each slot: 11111
11111 = 31 = 2^5 - 1, so the largest number we can store in 5 bits is 2^5 - 1.
Generally (and I will use very explicit names for clarity):
largestNumStorable = 2 ^ numBits - 1
Since log is the mathematical inverse of the exponent, the number of bits needed to store a value n is:
numBitsNeeded = floor(log2(n)) + 1
which is roughly log2(n), matching the c lg n in the book. Applying this to our example of 31:
floor(log2(31)) + 1 = floor(4.954196310386876) + 1 = 5 bits
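A quick Python sanity check of that relationship (int.bit_length() returns exactly floor(log2(n)) + 1 for positive n):

    import math

    for n in (27, 31, 32, 100, 1024):
        bits = n.bit_length()                       # exactly floor(log2(n)) + 1
        assert bits == math.floor(math.log2(n)) + 1
        print(n, bits)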

Resources