Error-detecting code and Hamming distance - ASCII

The Hamming distance of v and w equals 2, but without the parity bit it would be just 1. Why is this the case?

This would be more appropriately asked in the theoretical computer science section of StackExchange, but since you've been honest and flagged it as homework ...
ASCII uses 7 bits to specify a character. (In ASCII, 'X' is represented by the 7 bits 1011000.) If you start with any ASCII sequence, you only need to flip 1 bit to get to another legitimate ASCII sequence. Therefore the Hamming distance between plain ASCII sequences is 1.
However, if a parity bit is added (for a total of 8 bits: the 7 ASCII bits plus one parity bit, conventionally shown in the leftmost position), then any single-bit flip in the sequence will cause the result to have incorrect parity. Following the example, with even parity 'X' is represented by 11011000, because the parity bit is chosen to give an even number of 1s in the sequence. If you now flip any single bit in that sequence, the result will be unacceptable because it will have incorrect parity. To arrive at an acceptable new sequence with even parity you must change a minimum of two bits. Therefore, when parity is in effect, the Hamming distance between acceptable sequences is 2.
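Here is a quick sketch of that argument in Python (Python is just for illustration; the helper names are mine): build the 8-bit even-parity code for a character, flip one bit, and watch the parity check catch it, while two valid codewords differ in at least two positions.
def with_even_parity(ch):
    bits = format(ord(ch), '07b')         # the 7 ASCII bits, e.g. 'X' -> 1011000
    parity = str(bits.count('1') % 2)     # chosen so the total number of 1s is even
    return parity + bits                  # parity bit in the leftmost position

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

x = with_even_parity('X')                                  # '11011000'
flipped = x[:3] + ('0' if x[3] == '1' else '1') + x[4:]    # flip a single bit
print(flipped.count('1') % 2 == 0)                         # False: the flip is detected
print(hamming(x, with_even_parity('Y')))                   # 2: these two valid codewords differ in two bits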

Related

QR code generation algorithm implementation case analysis

I'm implementing a QR code generation algorithm as explained on thonky.com and I'm trying to understand one of the cases:
As stated on this page and this page, I can deduce that if the code is protected with error correction level M and the chosen mask is No. 0, the first 5 bits of the format string (non-XORed) are '00000', and because of this the whole string of 15 bits is zeros.
The next step is to remove all leading zeros, which are, again, all of them. That means there is nothing to XOR the generator polynomial string (10100110111) with, giving a final string of 15 zeros, which means that the final (XORed) string will simply be the mask string (101010000010010).
I'm seeking confirmation that my logic is right.
Thank you all very much in advance for the help.
Your logic is correct.
remove all leading zeroes
The actual process could be described as appending 10 zero bits to the 5 data bits, treating the 15 bits as single-bit coefficients of a polynomial, and dividing that polynomial by the 11-bit generator polynomial, which leaves a 10-bit remainder polynomial that is then subtracted from the (5 data bits + 10 zero bits) polynomial. Since this is binary math, add and subtract are both XOR operations, and since the 10 appended bits are zeros, the process can just append the 10 remainder bits to the 5 data bits.
As commented above, rather than actually implementing a BCH encode function, since there are only 32 possible format strings, you can just do a table lookup.
https://www.thonky.com/qr-code-tutorial/format-version-tables
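For what it's worth, here is that divide-and-append process sketched in Python (the function name is mine; the generator 10100110111 and the mask 101010000010010 are the ones from the tutorial). For the all-zero M/mask-0 input it reproduces your conclusion: the result is just the mask string.
def qr_format_bits(data5):
    # data5: the 5 format bits (2 for the EC level, 3 for the mask pattern)
    GEN = 0b10100110111            # generator polynomial
    MASK = 0b101010000010010       # fixed format mask
    rem = data5 << 10              # append 10 zero bits
    for i in range(14, 9, -1):     # GF(2) long division by the generator
        if rem & (1 << i):
            rem ^= GEN << (i - 10)
    return ((data5 << 10) | rem) ^ MASK

print(format(qr_format_bits(0b00000), '015b'))   # 101010000010010 (M, mask 0: just the mask string)
print(format(qr_format_bits(0b00001), '015b'))   # 101000100100101 (M, mask 1, matching the tables linked above)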

Accessing individual bits in MIPS

I'm writing a program in MIPS that solves a maze using a left-hand rule algorithm. I already have my algorithm written, but I need to find a way to keep track of the spaces in the maze that I've already visited so that I can find the "best" and most direct solution to solve the maze.
In the program, register $t9 is a 32 bit number that stores information about the location of the car that traverses the maze, including column and row position, which is what I need to isolate. Basically, all I need to know is how to work with/isolate those specific bits.
Bits 31-24 are an 8-bit number representing the row in 2's complement
Bits 23-16 are an 8-bit number representing the column in 2's complement
tl;dr I just need to extract the first 8 bits, and then the next 8 bits, from a 32-bit number located in $t9 in MIPS
Thank you!
To get bits 31-24, perform a logical shift right (SRL) by 24. The remaining number corresponds to the value of those bits, interpreted as an 8-bit integer.
To get bits 23-16, shift right by 16, then AND with 0xff.
Will you figure out the MIPS commands for that?
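I won't spell out the full MIPS, since that is the exercise, but here is the arithmetic sketched in Python (the value of t9 below is a made-up example). The unsigned shift corresponds to srl and the mask to andi; because both fields are 2's complement, you also need a sign-extension step (or an arithmetic shift, sra, for the top byte) if you want signed row/column values.
t9 = 0xF30A0000                  # made-up example: row byte 0xF3 (-13), column byte 0x0A (10)

row_bits = t9 >> 24              # bits 31-24 (srl $t0, $t9, 24)
col_bits = (t9 >> 16) & 0xFF     # bits 23-16 (srl $t1, $t9, 16 then andi $t1, $t1, 0xff)

# Sign-extend the 8-bit two's complement fields to get signed values:
row = row_bits - 256 if row_bits >= 128 else row_bits
col = col_bits - 256 if col_bits >= 128 else col_bits
print(row, col)                  # -13 10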

Best way to represent numbers of unbounded length?

What's the most space-efficient way to represent integers of unbounded length?
(The numbers range from zero to positive-infinity)
Some sample number inputs can be found here (each number is shown on its own line).
Is there a compression algorithm that specializes in compressing numbers?
You've basically got two alternatives for variable-length integers:
Use 1 bit out of every k as an end terminator. That's the way Google protobuf does it, for example (in their case, one bit from every byte, so there are 7 useful bits in every byte); a short sketch of this follows below.
Output the bit-length first, and then the bits. That's how ASN.1 works, except for OIDs, which are represented in form 1.
If the numbers can be really big, option 2 is better, although it's more complicated and you have to apply it recursively, since you may have to output the length of the length, and then the length, and then the number. A common technique is to use option 1 (bit markers) for the length field.
For smallish numbers, option 1 is better. Consider the case where most numbers would fit in 64 bits. The overhead of storing them 7 bits per byte is 1/7; with eight bytes, you'd represent 56 bits. Using even the 7/8 representation for length would also represent 56 bits in eight bytes: one length byte and seven data bytes. Any number shorter than 48 bits would benefit from the self-terminating code.
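As a rough illustration of option 1, here is a protobuf-style varint codec sketched in Python (the function names are mine, and this ignores every wire-format detail beyond the continuation bit):
def encode_varint(n):
    # Non-negative integer -> bytes, 7 data bits per byte,
    # high bit set on every byte except the last.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)    # more bytes follow
        else:
            out.append(byte)           # last byte: continuation bit clear
            return bytes(out)

def decode_varint(data):
    n, shift = 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return n

print(encode_varint(300))                   # b'\xac\x02'
print(decode_varint(encode_varint(300)))    # 300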
"Truly random numbers" of unbounded length are, on average, infinitely long, so that's probably not what you've got. More likely, you have some idea of the probability distribution of number sizes, and could choose between the above options.
Note that none of these "compress" (except relative to the bloated ASCII-decimal format). The asymptote of log n / n is 0, so as the numbers get bigger, the space spent recording a number's size tends to occupy no (relative) space. But it still needs to be represented somehow, so the total representation will always be a bit bigger than log2 of the number.
You cannot compress per se, but you can encode, which may be what you're looking for. You have files with sequences of ASCII decimal digits separated by line feeds. You should simply Huffman-encode the characters; since ten roughly equiprobable digits carry about log2(10) ≈ 3.32 bits each (and the newline adds a little), you won't do much better than about 3.5 bits per character.

How to determine maximal error correction/detection method for 6 + 1 digits?

I have the following constraints for a number that will be recognized from an image:
6 digits data
1 digit error correction
The first 6 digits cannot be changed (they must be human readable)
The check digit must remain a numeral
The error-correction scheme is currently based on a checksum, such that the 7th digit is the last digit of the sum of the first 6 digits.
E.g.
123456 => 1234561
999999 => 9999994
472912 => 4729125
219274 => 2192745
How can I determine the number and types of errors this scheme can detect/correct, and is there a scheme that will provide better error detection? (Error detection is more important than error correction for my use case).
You can try the Luhn algorithm; it's a little more complex than what you describe, but it will meet your requirements.
A copy-paste from Wikipedia:
The Luhn algorithm will detect any single-digit error, as well as almost all transpositions of adjacent digits. It will not, however, detect transposition of the two-digit sequence 09 to 90 (or vice versa). It will detect 7 of the 10 possible twin errors (it will not detect 22 ↔ 55, 33 ↔ 66 or 44 ↔ 77).
Other, more complex check-digit algorithms (such as the Verhoeff algorithm and the Damm algorithm) can detect more transcription errors. The Luhn mod N algorithm is an extension that supports non-numerical strings.
Because the algorithm operates on the digits in a right-to-left manner and zero digits affect the result only if they cause a shift in position, zero-padding the beginning of a string of numbers does not affect the calculation. Therefore, systems that pad to a specific number of digits (by converting 1234 to 0001234, for instance) can perform Luhn validation before or after the padding and achieve the same result.
Prepending a 0 to odd-length numbers enables you to process the number from left to right rather than right to left, doubling the odd-place digits.
The algorithm appeared in a US Patent for a hand-held, mechanical device for computing the checksum. It was therefore required to be rather simple. The device took the mod 10 sum by mechanical means. The substitution digits, that is, the results of the double-and-reduce procedure, were not produced mechanically. Rather, the digits were marked in their permuted order on the body of the machine.
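In case it helps, here is a sketch in Python of computing a Luhn check digit for your 6 human-readable digits (the function name is mine; the appended digit stays a numeral, so it fits your constraints):
def luhn_check_digit(payload):
    # payload: the 6 human-readable digits as a string, e.g. "123456"
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:        # these positions are doubled once the check digit is appended
            d *= 2
            if d > 9:
                d -= 9        # the "double and reduce" step
        total += d
    return str((10 - total % 10) % 10)

print("123456" + luhn_check_digit("123456"))   # 1234566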

Least significant bit first

While working in Ruby I came across:
> "cc".unpack('b8B8')
=> ["11000110", "01100011"]
Then I tried Googling to find a good answer on "least significant bit", but could not find one.
Would anyone care to explain, or point me in the right direction, so I can understand the difference between "LSB first" and "MSB first"?
It has to do with the direction of the bits. Notice that in this example it's unpacking two ASCII "c" characters, and yet the bits are mirror images of each other. LSB means the rightmost (least significant) bit is the first bit. MSB means the leftmost (most significant) bit is the first bit.
As a simple example, consider the number 5, which in "normal" (readable) binary looks like this:
00000101
The least significant bit is the rightmost 1, because that is the 2^0 position (or just plain 1). It doesn't impact the value too much. The one next to it is the 2^1 position (or just plain 0 in this case), which is a bit more significant. The bit to its left (2^2 or just plain 4) is more significant still. So we say this is MSB notation because the most significant bit (2^7) comes first. If we change it to LSB, it simply becomes:
10100000
Easy right?
(And yes, for all you hardware gurus out there I'm aware that this changes from one architecture to another depending on endianness, but this is a simple answer for a simple question)
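Here is the same unpacking sketched in Python rather than Ruby, just to make the mirroring explicit (the helper names are made up):
def bits_msb_first(byte):
    return format(byte, '08b')           # most significant bit written first

def bits_lsb_first(byte):
    return bits_msb_first(byte)[::-1]    # same bits, written in reverse order

c = ord('c')                             # 0x63
print(bits_lsb_first(c))                 # 11000110  (Ruby's 'b8')
print(bits_msb_first(c))                 # 01100011  (Ruby's 'B8')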
The term "significance" of a bit or byte only makes sense in the context of interpreting a sequence of bits or bytes as an integer. The bigger the impact of the bit or byte on the value of the resulting integer - the higher its significance. The more "significant" it is to the value.
So, for example, when we talk about a sequence of four bytes having the least significant byte first (aka little-endian), what we mean is that when we interpret those four bytes as a 32-bit integer, the first byte denotes the lowest eight binary digits of the integer, the second byte denotes the 9th through 16th binary digits, the third denotes the 17th through 24th, and the last byte denotes the highest eight bits of the integer.
Likewise, if we say a sequence of 8 bits is in most significant bit first order, what we mean is that if we interpret the 8 bits as an 8-bit integer, the first bit in the sequence denotes the highest binary digit of the integer, the second bit the second highest, and so on, until the last bit denotes the lowest binary digit of the integer.
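A small Python example of that byte-order point, using the struct module (the four example bytes are arbitrary):
import struct

data = b'\x01\x02\x03\x04'
little, = struct.unpack('<I', data)   # least significant byte first
big,    = struct.unpack('>I', data)   # most significant byte first
print(hex(little))                    # 0x4030201
print(hex(big))                       # 0x1020304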
Another way to think about it: the usual decimal notation follows a most-significant-digit-first convention. For example, a decimal number like:
1250
is read to mean:
1 x 1000 +
2 x 100 +
5 x 10 +
0 x 1
Right? Now imagine a different convention that is least significant digit first. The same number would be written:
0521
and would be read as:
0 x 1 +
5 x 10 +
2 x 100 +
1 x 1000
Another thing you should observe in passing is that in the C family of languages (and most modern programming languages), the shift-left operator (<<) and shift-right operator (>>) point in the direction of significance. That is, shifting a bit left increases its significance and shifting it right decreases it, meaning that left is most significant (and the left side is usually what we mean by "first", at least in the West).
