QR code generation algorithm implementation case analysis

I'm implementing a QR code generation algorithm as explained on thonky.com and I'm trying to understand one of the cases:
As stated in this page and this page, I can deduce that if the code is protected with the M error correction level and the chosen mask is No. 0, the first 5 bits of the format string (non-XORed) are '00000', and because of this the whole 15-bit string is zeros.
The next step is to remove all leading zeros, which are, again, all of them. That means there's nothing to XOR the generator polynomial string (10100110111) with, giving a final string of 15 zeros, which means that the final (XORed) string will simply be the mask string (101010000010010).
I'm seeking confirmation that my logic is right.
Thank you all very much in advance for the help.

Your logic is correct.
"remove all leading zeros"
The actual process can be described as appending 10 zero bits to the 5 data bits and treating the 15 bits as 15 single-bit coefficients of a polynomial, dividing that polynomial by the 11-bit generator polynomial to get a 10-bit remainder polynomial, and then subtracting the remainder from the (5 data bits + 10 zero bits) polynomial. Since this is binary math, add and subtract are both XOR operations, and since the 10 appended bits are zero bits, the process can just append the 10 remainder bits to the 5 data bits.
As commented above, rather than actually implementing a BCH encode function, since there are only 32 possible format strings, you can just do a table lookup:
https://www.thonky.com/qr-code-tutorial/format-version-tables
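If you do want to compute it rather than look it up, here is a minimal Python sketch of the BCH(15,5) computation described above (the generator and mask constants are the ones from the tutorial; the function name and structure are my own):
GENERATOR = 0b10100110111        # generator polynomial string (10100110111)
FORMAT_MASK = 0b101010000010010  # fixed format mask string (101010000010010)
def format_bits(data5):
    # data5 is the 5-bit value: 2 error-correction-level bits followed by 3 mask bits
    rem = data5 << 10                   # append ten zero bits
    for shift in range(4, -1, -1):      # polynomial division over GF(2)
        if rem & (1 << (shift + 10)):
            rem ^= GENERATOR << shift
    return ((data5 << 10) | rem) ^ FORMAT_MASK
# M level + mask 0 gives data bits 00000, so the non-XORed string is all zeros
# and the final string is just the mask pattern:
# format(format_bits(0b00000), '015b') == '101010000010010'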

Related

Algorithm for Converting large integer to string without modulo base

I have looked for a while to find an algorithm which converts integers to strings. My requirement is to do this manually as I am using my own large number type. I have +, -, * and / (with remainder) defined, but need to find a way to print a single number from a double int (high and low; if int is 64 bits, 128 bits total).
I have seen some answers such as
Convert integer to string without access to libraries
Converting a big integer to decimal string
but was wondering if a faster algorithm is possible. I am open to working with bits directly (e.g. base-2 to base-10 string - I could not find such an algorithm, however), but I was just hoping to avoid repeated division by 10 for numbers possibly as large as 2^128.
You can use divide-and-conquer in such a way that the parts can be converted to string using your standard library (which will typically be quite efficient at that job).
So instead of dividing by 10 in every iteration, you can e.g. divide by 10**15, and have your library convert the chunks to 15-digit strings. After at most three steps, you're finished.
Of course you have to do some string manipulation regarding the zero-padding. But maybe your library can help you here as well, if you use something like a %015d zero-padding format for all the lower parts, and for the highest non-zero part use a non-padding %d format.
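As a rough illustration in Python (an ordinary integer stands in for the custom 128-bit type here, since the only operation needed from it is division with remainder by 10**15):
CHUNK = 10 ** 15  # 15 decimal digits per chunk
def to_decimal_string(x):
    chunks = []
    while True:
        x, low = divmod(x, CHUNK)   # one division with remainder per chunk
        chunks.append(low)
        if x == 0:
            break
    # highest chunk printed without padding, the lower ones zero-padded to 15 digits
    return str(chunks[-1]) + ''.join('%015d' % c for c in reversed(chunks[:-1]))
# to_decimal_string(2**128 - 1) needs only three divisions by 10**15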
You may try your luck with a contrived method, as follows.
Numbers can be represented using the Binary-Coded Decimal representation. In this representation, every decimal digit is stored on 4 bits, and when performing an addition, if the sum of two digits exceeds 9, you add 6 and carry to the left.
If you have pre-stored the BCD representations of all powers of 2, then it takes at most 128 additions to perform the conversion. You can save a little by noting that for low powers you don't need the full-length addition (39 digits).
But this sounds like a lot of operations. You can "parallelize" them by packing several BCD digits into a single integer: an integer addition on 32 bits is equivalent to 8 simultaneous BCD digit additions. But we have a problem with the carries. To work around this, we can store the digits on 5 bits instead of 4, and the carries will appear in the fifth bit. Then we can obtain the carries by masking, add them to the next digits (shift left 5), and adjust the digit sums (multiply the carries by 10 and subtract).
For example, 23456 + 76921, digit by digit:
   2  3  4  5  6
+  7  6  9  2  1
=  9  9 13  7  7
Carries:
   0  0  1  0  0
Adjustments (10 times the carries, subtracted):
   9  9 13  7  7
-  0  0 10  0  0
=  9  9  3  7  7
Actually, you have to handle possible cascaded carries (here, adding the carry to the 9 on its left overflows again, eventually giving 1 0 0 3 7 7 = 100377), so the sum will involve the two addends and carries in, and generate a sum and carries out.
32-bit operations allow you to process 6 digits at a time (7 rounds for 39 digits), and 64-bit operations 12 digits (4 rounds for 39 digits).
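Here is a small Python sketch of the overall structure, with one decimal digit per array slot rather than packed 4- or 5-bit fields, just to show the add-the-powers-of-two idea (the function names are mine):
def add_decimal(acc, addend):
    # digit-wise decimal addition, least significant digit first
    carry = 0
    for i in range(len(acc)):
        s = acc[i] + addend[i] + carry
        carry = 1 if s > 9 else 0
        acc[i] = s - 10 if s > 9 else s
def binary_to_decimal_digits(x, width=128):
    ndigits = 40                        # 2**128 has at most 39 decimal digits
    result = [0] * ndigits
    power = [0] * ndigits
    power[0] = 1                        # decimal digits of 2**0
    for bit in range(width):
        if (x >> bit) & 1:
            add_decimal(result, power)  # at most `width` additions in total
        add_decimal(power, power[:])    # double: power of two for the next bit
    return result                       # least significant digit first
The packed version described above would replace add_decimal with a handful of word-wide mask, shift, add and subtract operations.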
If you want to just encode your numbers as a string
use hex numbers; that is fast, since you can produce all the digits just by bit operations... Using Base64 encoding is also doable with just bit operations plus a translation table. Both representations can be done on small-int arithmetic only, in O(n), where n is the count of printed digits.
If you need base 10
then print a hex string and convert it to decimal on strings, like this:
str_hex2dec
This is much slower than the hex/Base64 option but still doable on small-int arithmetic... You can also do this in reverse (read a number from a string) by using dec2hex...
For bigint libs there are also other ways of easing the string/integer conversions:
BCD
Binary-coded decimal... the number printed as hex reads as the decimal number, so each digit takes 4 bits. This wastes some memory, but many CPUs have BCD support and can do operations on such integers natively.
Base 10^n
Sometimes base 10^n is used instead of 2^m, where
10^n <= 2^m
Here m is the bit width of your atomic integer and n is the number of decimal digits that fit inside it.
For example, if your atomic unsigned integer is 16 bits, it can hold up to 65536 values in base 2. If you use base 10000 instead, you can print each atom as a decimal number, zero-padded from the left, and simply concatenate all such prints together.
This also wastes some memory, but usually not too much (if the bit width is reasonably selected), and you can use standard instructions on the integers. Only the carry propagation will change a bit...
For example, for 32-bit words:
2^32 = 4294967296 >= 1000000000
so we waste log2(4.2949...) = ~2.1 bits per 32 bits. This is much better than BCD's log2(16/10)*(32/4) = ~5.42 bits, and usually even better with higher bit widths.
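A quick Python sketch of printing a number stored as base-10^9 limbs (the limb layout and function name here are just for illustration):
BASE = 10 ** 9   # largest power of ten that fits in a 32-bit unsigned word
def limbs_to_string(limbs):
    # limbs: base-10**9 digits of the number, least significant limb first
    parts = [str(limbs[-1])]                                   # top limb, no padding
    parts += ['%09d' % limb for limb in reversed(limbs[:-1])]  # zero-pad the rest
    return ''.join(parts)
# limbs_to_string([567890123, 1234]) == '1234567890123'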

Binary Number Having same number of 0s and 1s [duplicate]

Given a binary string or a binary number (one is free to take it either way), I need to find the next smaller binary number while retaining the number of 0s and 1s of the original binary string or number.
For e.g.
If the given binary number or string was 11100000, the required output would be 11010000.
If the given binary number or string was 11010000, the required output would be 11001000.
Of course, I can do this with a brute-force approach, but I need a better solution. What could be an optimal way of doing it? I was wondering if someone could help me reach a solution to this in O(1) using bitwise operations.
This is an elaboration on Setzer22's answer, which was close but which lacked one vital piece.
FindNextSmallestWithSameNumberOfBits(string[1...n])
1.  for i = n - 1 to 1 do
2.      if string[i+1] = 0 and string[i] = 1 then
3.          string[i] := 0
4.          string[i+1] := 1
5.          sort(string[i+2...n], descending)
6.          return string[1...n]
7.  return "no solution"
This is an O(n) algorithm, which is a provably optimal asymptotic bound for this problem when the input size is unrestricted; while this is "bitwise" in the sense that it operates on bits, it clearly doesn't use what one would typically think of as "bitwise operations." Luckily, for inputs which can be of arbitrary length, there can be no asymptotic advantage to using traditional "bitwise operations" over this method. For inputs of fixed length, to which asymptotic analysis does not readily apply, one might do better using a technique such as those linked to by Asuka in the other answer to this question.
Note, based on comments, that sorting on line 5 can be replaced with simply reversing the string. The reason for this is that this substring is guaranteed to be of the form 0...01...1 (that is, any 0s to the left of any 1s) since, if it weren't, we'd have already found an occurrence of the string 10 and satisfied the condition on line 2.
The key that was missing in Setzer22's answer is that, once you move the rightmost 1 that has a 0 to its right one position to the right, you then need to left-shift all the 1s that are even further right as far left as they will go. The reason for this is that the 1 bit shifted to the right is more significant than the bits to the right of it, so left-shifting any 1s which are less significant will give a larger number, but not large enough to undo the effect of reducing the more significant bit.
Clarification based on comments: notice that in line 7 of the pseudocode presented above, it's possible that the algorithm won't return a valid string. The reason for this is that, sometimes, there is no string with the same number of 1s which represents a smaller number. This occurs if and only if the string 01 does not appear as a substring in the input string (in which case the condition on line 2 is never satisfied).
This isn't the clearest explanation of all time, so please let me know if it needs more work. Here's an example:
10011 // input
01011 // right-shift the right-most 1 bit with a 0 to the right of it
01110 // left-shift all 1 bits to the right of the right-shifted bit as far as possible
1010100011 // input
1010010011 // right-shift the right-most 1 bit with a 0 to the right of it
1010011100 // left-shift all 1 bits to the right of the right-shifted bit as far as possible
One way to clarify this which just occurred to me: right-shifting the 1 bit guarantees that the result will be smaller than the original number; left-shifting the 1s to its right guarantees that the result will be no smaller than necessary.
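A direct Python transcription of the pseudocode above, using 0-based indexing and reversing the suffix instead of sorting it (per the note about line 5):
def next_smaller_same_ones(s):
    b = list(s)
    n = len(b)
    for i in range(n - 2, -1, -1):            # scan from the right for a "10" pair
        if b[i] == '1' and b[i + 1] == '0':
            b[i], b[i + 1] = '0', '1'         # move that 1 one place to the right
            b[i + 2:] = reversed(b[i + 2:])   # suffix is 0...01...1, so reversing sorts it descending
            return ''.join(b)
    return None                               # no smaller arrangement exists
# next_smaller_same_ones('11100000') == '11010000'
# next_smaller_same_ones('10011')    == '01110'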
Maybe this is what you are looking for:
https://github.com/hcs0/Hackers-Delight/blob/master/snoob.c.txt
The functions snoob(), snoob1(), snoob2(), snoob3(), snoob4() and next_set_of_n_elements() are various implementations.
These functions are helper functions which are called by the above functions:
ntz() stands for "number of trailing zeros"
nlz() stands for "number of leading zeros"
pop() stands for "population count" (the number of bits set, i.e. the number of "1"s, in the word)
This is very efficient but only works on fixed-size integers (e.g. 32-bit, 64-bit).
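For reference, the core trick in that file is the classic snoob() function, which returns the next larger value with the same number of set bits; here it is rendered in Python (getting the question's "next smaller" direction would need the logic adapted, e.g. by complementing the input and output within the fixed width):
def snoob(x):
    # next larger integer with the same number of set bits (Hacker's Delight)
    smallest = x & -x               # isolate the rightmost set bit
    ripple = x + smallest           # the carry propagates into the next 0 bit
    ones = x ^ ripple               # the bits that changed
    ones = (ones >> 2) // smallest  # right-adjust the trailing 1s
    return ripple | ones
# snoob(0b11010000) == 0b11100000, i.e. the reverse direction of the
# question's example 11100000 -> 11010000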

Fastest method for adding/summing the individual digit components of a number

I saw a question on a math forum a while back where a person was discussing adding up the digits in a number over and over again until a single digit is achieved. (i.e. "362" would become "3+6+2", which would become "11"... then "11" would become "1+1", which would become "2"; therefore "362" would return 2...) I wrote some nice code to get an answer to this and posted it, only to be outdone by a user who suggested that any number modulo 9 is equal to this "infinite digit sum". I checked it and he was right... well, almost right: if zero was returned you had to switch it out with a "9", but that was a very quick fix...
362 = 3+6+2 = 11 = 1+1 = 2
or...
362%9 = 2
Anyway, the mod-9 method works fantastically for repeatedly adding the digits until you are left with just a single digit... but what about only doing it once (i.e. 362 would just return "11")? Can anyone think of fast algorithms?
There's a cool trick for summing the 1 digits in binary with a fixed-width integer. At each iteration, you separate the digits into two values, bit-shift one value down, then add. In the first iteration you separate every other digit; in the second, pairs of digits; and so on.
Given that 27 is 00011011 as 8-bit binary, the process is...
00010001 + 00000101 = 00010110 <- every other digit step
00010010 + 00000001 = 00010011 <- pairs of digits
00000011 + 00000001 = 00000100 <- quads, giving final result 4
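The binary version is the familiar masked pairwise-sum popcount; an 8-bit Python sketch of the steps above:
def popcount8(x):
    x = (x & 0b01010101) + ((x >> 1) & 0b01010101)  # every other bit
    x = (x & 0b00110011) + ((x >> 2) & 0b00110011)  # pairs of bits
    x = (x & 0b00001111) + ((x >> 4) & 0b00001111)  # quads
    return x
# popcount8(0b00011011) == 4, matching the worked example for 27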
You could do a similar trick with decimal, but it would be less efficient than a simple loop unless you had a direct representation of decimal numbers with fast operations to zero out selected digits and to do digit-shifting. So for 12345678 you get...
02040608 + 01030507 = 03071115 <- every other digit
00070015 + 00030011 = 00100026 <- pairs
00000026 + 00000010 = 00000036 <- quads, final result
So 1+2+3+4+5+6+7+8 = 36, which is correct, but you can only do this efficiently if your number representation is fixed-width decimal. It always takes lg(n) iterations, where lg means the base two logarithm, and you round upwards.
To expand on this a little (based on in-comments discussions), let's pretend this was sane, for a bit...
If you count single-digit additions, there's actually more work than a simple loop here. The idea, as with the bitwise trick for counting bits, is to re-order those additions (using associativity) and then to compute as many as possible in parallel, using a single full-width addition to implement two half-width additions, four quarter-width additions etc. There's significant overhead for the digit-clearing and digit-shifting operations, and even more if you implement this as a loop (calculating or looking up the digit-masking and shift-distance values for each step). The "loop" should probably be fully unrolled and those masks and shift-distances be included as constants in the code to avoid that.
A processor with support for Binary Coded Decimal (BCD) could handle this. Digit masking and digit shifting would be implemented using bit masking and bit shifting, as each decimal digit would be encoded in 4 (or more) bits, independent of the encoding of other digits.
One issue is that BCD support is quite rare these days. It used to be fairly common in the 8 bit and 16 bit days, but as far as I'm aware, processors that still support it now do so mainly for backward compatibility. Reasons include...
Very early processors didn't include hardware multiplication and division. Hardware support for these operations means it's easier and more efficient to convert binary to decimal now. Binary is used for almost everything now, and BCD is mostly forgotten.
There are decimal number representations around in libraries, but few if any high level languages ever provided portable support to hardware BCD, so since assembler stopped being a real-world option for most developers BCD support simply stopped being used.
As numbers get larger, even packed BCD is quite inefficiently packed. Number representations base 10^x have the most important properties of base 10, and are easily decoded as decimal. Base 1000 only needs 10 bits per three digits, not 12, because 2^10 is 1024. That's enough to show you get an extra decimal digit for 32 bits - 9 digits instead of 8 - and you've still got 2 bits left over, e.g. for a sign bit.
The thing is, for this digit-totalling algorithm to be worthwhile at all, you need to be working with fixed-width decimal of probably at least 32 bits (8 digits). That gives 12 operations (6 masks, 3 shifts, 3 additions) rather than 15 additions for the (fully unrolled) simple loop. That's a borderline gain, though - and other issues in the code could easily mean it's actually slower.
The efficiency gain is clearer at 64 bits (16 decimal digits) as there's still only 16 operations (8 masks, 4 shifts, 4 additions) rather than 31, but the odds of finding a processor that supports 64-bit BCD operations seems slim. And even if you did, how often do you need this anyway? It seems unlikely that it could be worth the effort and loss of portability.
Here's something in Haskell:
sumDigits n =
  if n == 0
    then 0
    else let a = mod n 10
         in a + sumDigits (div n 10)
Oh, but I just read you're doing that already...
(then there's also the obvious:
sumDigits n = sum $ map (read . (:[])) . show $ n
)
For short code, try this:
int digit_sum(int n) {
    if (n < 10) return n;
    return n % 10 + digit_sum(n / 10);
}
Or, in words,
- If the number is less than ten, then the digit sum is the number itself.
- Otherwise, the digit sum is the current last digit (a.k.a. n mod 10 or n%10), plus the digit sum of everything to the left of that digit (n divided by 10, using integer division).
- This algorithm can also be generalized for any base, substituting the base in for 10 (see the sketch below).
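A possible Python sketch of that generalization (the parameter names are mine):
def digit_sum(n, base=10):
    # same recursion as the C version above, with the base as a parameter
    if n < base:
        return n
    return n % base + digit_sum(n // base, base)
# digit_sum(362) == 11; digit_sum(0b11011, 2) == 4 (the popcount of 27)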
The same function written as a loop instead of a recursion:
int digit_sum(int n) {
    int sum = 0;
    do {
        sum += n % 10;   /* add the last digit */
        n /= 10;         /* and drop it */
    } while (n > 0);
    return sum;
}

How to determine maximal error correction/detection method for 6 + 1 digits?

I have the following constraints for a number that will be recognized from an image:
6 digits data
1 digit error correction
The first 6 digits cannot be changed (they must be human readable)
The check digit must remain a numeral
The error correction scheme currently is based on a checksum, such that the 7th digit is the last digit of the sum of the first 6 digits.
E.g.
123456 => 1234561
999999 => 9999994
472912 => 4729125
219274 => 2192745
How can I determine the number and types of errors this scheme can detect/correct, and is there a scheme that will provide better error detection? (Error detection is more important than error correction for my use case).
You can try Luhn; it's a little more complex than what you describe, but it will meet your requirements.
A copy-paste from Wikipedia:
The Luhn algorithm will detect any single-digit error, as well as almost all transpositions of adjacent digits. It will not, however, detect transposition of the two-digit sequence 09 to 90 (or vice versa). It will detect 7 of the 10 possible twin errors (it will not detect 22 ↔ 55, 33 ↔ 66 or 44 ↔ 77).
Other, more complex check-digit algorithms (such as the Verhoeff algorithm and the Damm algorithm) can detect more transcription errors. The Luhn mod N algorithm is an extension that supports non-numerical strings.
Because the algorithm operates on the digits in a right-to-left manner and zero digits affect the result only if they cause shift in position, zero-padding the beginning of a string of numbers does not affect the calculation. Therefore, systems that pad to a specific number of digits (by converting 1234 to 0001234 for instance) can perform Luhn validation before or after the padding and achieve the same result.
Prepending a 0 to odd-length numbers enables you to process the number from left to right rather than right to left, doubling the odd-place digits.
The algorithm appeared in a US Patent for a hand-held, mechanical device for computing the checksum. It was therefore required to be rather simple. The device took the mod 10 sum by mechanical means. The substitution digits, that is, the results of the double and reduce procedure, were not produced mechanically. Rather, the digits were marked in their permuted order on the body of the machine.
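For the 6-digit case, computing a Luhn check digit might look roughly like this in Python (the function name is mine; doubling starts from the rightmost data digit so that the appended check digit ends up in an undoubled position):
def luhn_check_digit(data):
    # data: string of the 6 human-readable digits; returns the digit to append
    total = 0
    for i, ch in enumerate(reversed(data)):
        d = int(ch)
        if i % 2 == 0:      # double every second digit, starting from the right
            d *= 2
            if d > 9:
                d -= 9      # the "double and reduce" step
        total += d
    return (10 - total % 10) % 10
# code = '123456'
# print(code + str(luhn_check_digit(code)))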

Least significant bit first

While working on ruby I came across:
> "cc".unpack('b8B8')
=> ["11000110", "01100011"]
Then I tried Googling to find a good answer on "least significant bit", but could not find any.
Anyone care to explain, or point me in the right direction where I can understand the difference between "LSB first" and "MSB first".
It has to do with the direction of the bits. Notice that in this example it's unpacking two ASCII "c" characters, and yet the bits are mirror images of each other. LSB first means the rightmost (least significant) bit is the first bit. MSB first means the leftmost (most significant) bit is the first bit.
As a simple example, consider the number 5, which in "normal" (readable) binary looks like this:
00000101
The least significant bit is the rightmost 1, because that is the 2^0 position (or just plain 1). It doesn't impact the value too much. The one next to it is the 2^1 position (or just plain 0 in this case), which is a bit more significant. The bit to its left (2^2, or just plain 4) is more significant still. So we say this is MSB notation because the most significant bit (2^7) comes first. If we change it to LSB, it simply becomes:
10100000
Easy right?
(And yes, for all you hardware gurus out there I'm aware that this changes from one architecture to another depending on endianness, but this is a simple answer for a simple question)
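The same demonstration in Python terms (format() always produces MSB-first output, so the LSB-first form is just the reversed string; this mirrors the B8 and b8 directives in the question):
def bits_msb_first(byte):
    return format(byte, '08b')          # most significant bit first
def bits_lsb_first(byte):
    return format(byte, '08b')[::-1]    # same bits, least significant first
# ord('c') == 0b01100011
# bits_msb_first(ord('c')) == '01100011'   (like B8)
# bits_lsb_first(ord('c')) == '11000110'   (like b8)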
The term "significance" of a bit or byte only makes sense in the context of interpreting a sequence of bits or bytes as an integer. The bigger the impact of the bit or byte on the value of the resulting integer - the higher its significance. The more "significant" it is to the value.
So, for example, when we talk about a sequence of four bytes having the least significant byte first (aka little-endian), what we mean is that when we interpret those four bytes as a 32-bit integer, the first byte denotes the lowest eight binary digits of the integer, the second byte denotes the 9th through 16th, the third the 17th through 24th, and the last byte denotes the highest eight bits of the integer.
Likewise, if we say a sequence of 8 bits is in most significant bit first order, what we mean is that if we interpret the 8 bits as an 8-bit integer, the first bit in the sequence denotes the highest binary digit of the integer, the second bit the second highest, and so on, until the last bit denotes the lowest binary digit of the integer.
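You can see the byte-order half of this directly in Python with int.from_bytes:
data = bytes([0x01, 0x02, 0x03, 0x04])
int.from_bytes(data, 'little')   # 0x04030201 == 67305985, first byte least significant
int.from_bytes(data, 'big')      # 0x01020304 == 16909060, first byte most significant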
Another thing to think about is that one can say that the usual decimal notation has a convention that is most significant digit first. For example, a decimal number like:
1250
is read to mean:
1 x 1000 +
2 x 100 +
5 x 10 +
0 x 1
Right? Now imagine a different convention that is least significant digit first. The same number would be written:
0521
and would be read as:
0 x 1 +
5 x 10 +
2 x 100 +
1 x 1000
Another thing you should observe in passing is that in the C family of languages (most modern programming languages), the shift-left operator (<<) and shift-right operator (>>) point in a most-significant-bit direction. That is, shifting a bit left increases its significance and shifting it right decreases it, meaning that left is most significant (and the left side is usually what we mean by "first", at least in the West).
