Huffman code tables - image

I don't understand what the Huffman tables in a JPEG contain. Could someone explain this to me?
Thanks

Huffman encoding is a variable-length data compression method. It works by assigning the most frequent values in an input stream to the encodings with the smallest bit lengths.
For example, the input "Seems every eel eeks elegantly." may encode the letter e as binary 1 and all other letters as various other longer codes, all starting with 0. That way, the resultant bit stream would be smaller than if every letter had a fixed size. By way of example, let's examine the quantities of each character and construct a tree that puts the common ones at the top.
Letter      Count
----------  -----
e              10
<SPC>           4
l               3
s y             2 (each)
Smvrkgant.      1 (each)
<EOF>           1
The end of file marker EOF is there since you generally have to have a multiple of eight bits in your file. It's to stop any padding at the end from being treated as a real character.
                              ______________#______________
               ______________/______________               \
       _______/_______                 _____\_____          e
    __/__           __\__           __/__         \
   /     \         /     \         /     \       / \
  /       \       /       \       /       SPC   l   s
 / \     / \     / \     / \     / \
y   S   m   v   /   k   g   \   n   t
               / \         / \
              r   .       a   EOF
Now this isn't necessarily the most efficient tree but it's enough to establish how the encodings are done. Let's first look at the uncompressed data. Assuming an eight-bit encoding, those thirty-one characters (we don't need the EOF for the uncompressed data) are going to take up 248 bits.
But, if you use the tree above to locate the characters, outputting a zero bit if you take the left sub-tree and a one bit if you take the right, you get the following:
Section     Encoding
----------  --------
Seems<SPC>  00001 1 1 00010 0111 0101                       (20 bits)
every<SPC>  1 00011 1 001000 00000 0101                     (22 bits)
eel<SPC>    1 1 0110 0101                                   (10 bits)
eeks<SPC>   1 1 00101 0111 0101                             (15 bits)
elegantly   1 0110 1 00110 001110 01000 01001 0110 00000    (36 bits)
.<EOF>      001001 001111                                   (12 bits)
That gives a grand total of 115 bits, rounded up to 120 since it needs to be a multiple of a byte, but that's still about half the size of the uncompressed data.
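As a rough check of those figures, here is a minimal Python sketch that encodes the sentence with the codes read off the tree and then decodes it again. The names CODES, encode and decode are made up for this illustration, and the bit stream is kept as a string of '0' and '1' characters for readability rather than packed into real bytes.

```python
# Code table read off the tree above (0 = left branch, 1 = right branch).
CODES = {
    "e": "1", " ": "0101", "l": "0110", "s": "0111",
    "y": "00000", "S": "00001", "m": "00010", "v": "00011",
    "k": "00101", "g": "00110", "n": "01000", "t": "01001",
    "r": "001000", ".": "001001", "a": "001110", "<EOF>": "001111",
}


def encode(text):
    """Return (raw bits, bits padded with zeros to a whole number of bytes)."""
    bits = "".join(CODES[ch] for ch in text) + CODES["<EOF>"]
    padding = -len(bits) % 8
    return bits, bits + "0" * padding


def decode(bits):
    """Walk the bit string, emitting a symbol whenever a full code is seen."""
    lookup = {code: sym for sym, code in CODES.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            if lookup[current] == "<EOF>":
                break  # anything after EOF is padding
            out.append(lookup[current])
            current = ""
    return "".join(out)


raw, padded = encode("Seems every eel eeks elegantly.")
print(len(raw), len(padded))  # 115 120
print(decode(padded))
```

Because no code is a prefix of another, the decoder never needs to look ahead, and the EOF code is what stops the five padding bits from being read as data.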
Now that's usually not worth it for a small file like this, since you have to add the space taken up by the actual tree itself(a), otherwise you cannot decode it at the other end. But certainly, for larger files where the distribution of characters isn't even, it can lead to impressive savings in space.
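The tree above was drawn by hand. The usual way to build an optimal one is to keep merging the two least frequent nodes until a single root remains. The following is only a sketch of that idea using Python's heapq; the function name huffman_code_lengths is an assumption of this example, and it tracks code lengths only, not the actual bit patterns.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for an optimal Huffman code."""
    tie = count()  # tie-breaker so the heap never has to compare the dicts
    heap = [(weight, next(tie), {sym: 0}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Every symbol under the merged node ends up one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


text = "Seems every eel eeks elegantly."
freqs = Counter(text)
freqs["<EOF>"] = 1
lengths = huffman_code_lengths(freqs)
total_bits = sum(freqs[sym] * lengths[sym] for sym in freqs)
print(total_bits)
```

For this sentence the total should come out at 111 bits, a little better than the 115 bits of the hand-built tree, which fits the remark that the tree shown is not necessarily the most efficient one.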
So, after all that, the Huffman tables in a JPEG are simply the tables that allow you to uncompress the stream into usable information.
The encoding process for JPEG consists of a few different steps (color conversion, chroma resolution reduction, block-based discrete cosine transforms, and so on) but the final step is a lossless Huffman encoding on each block which is what those tables are used to reverse when reading the image.
(a) Probably the best case for minimal storage of this table would be something like:
Size of length section (8 bits) = 3 (longest bit length of 6 takes 3 bits)
Repeated for each byte:
    Actual length (3 bits, holding value between 1..6 inclusive)
    Encoding (n bits, where n is the actual length)
    Byte (8 bits)
End of table marker (3 bits) = 0 to distinguish from actual length above
For the text above, that would be:
00000011                  8 bits
  n  bits    byte
---  ------  -----
001  1       'e'         12 bits
100  0101    <SPC>       15 bits
101  00001   'S'         16 bits
101  00010   'm'         16 bits
100  0111    's'         15 bits
101  00011   'v'         16 bits
110  001000  'r'         17 bits
101  00000   'y'         16 bits
101  00101   'k'         16 bits
100  0110    'l'         15 bits
101  00110   'g'         16 bits
110  001110  'a'         17 bits
101  01000   'n'         16 bits
101  01001   't'         16 bits
110  001001  '.'         17 bits
110  001111  <EOF>       17 bits
000                       3 bits
That makes the table 264 bits which totally wipes out the savings from compression. However, as stated, the impact of the table becomes far less as the input file becomes larger and there's a way to avoid the table altogether.
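The 264-bit figure can be checked by adding up the fields of the layout described in the footnote. This is only an arithmetic sketch; the name table_size_bits and its parameters are invented here, and the layout is the hypothetical one from the footnote, not any real file format.

```python
# Codes from the example tree; only their lengths matter here.
CODES = {
    "e": "1", " ": "0101", "l": "0110", "s": "0111",
    "y": "00000", "S": "00001", "m": "00010", "v": "00011",
    "k": "00101", "g": "00110", "n": "01000", "t": "01001",
    "r": "001000", ".": "001001", "a": "001110", "<EOF>": "001111",
}


def table_size_bits(codes, length_field_bits=3, symbol_bits=8):
    header = 8  # "size of length section" byte
    entries = sum(length_field_bits + len(code) + symbol_bits
                  for code in codes.values())
    terminator = length_field_bits  # end-of-table marker (all zeros)
    return header + entries + terminator


print(table_size_bits(CODES))  # 264
```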
That way involves the use of another variant of Huffman, called Adaptive Huffman. This is where the table isn't actually stored in the compressed data.
Instead, during compression, the table starts with just EOF and a special bit sequence meant to introduce a new real byte into the table.
When introducing a new byte into the table, you would output the introducer bit sequence followed by the full eight bits of that byte.
Then, after each byte is output and the counts updated, the table/tree is rebalanced based on the new counts to be the most space-efficient. The rebalancing may be deferred to improve speed; you just have to ensure the same deferral happens during decompression. For example, you could rebalance every time you add a byte for the first 1K of input, then every 10K of input after that (assuming you've added new bytes since the last rebalance).
This means that the table itself can be built in exactly the same way at the other end (decompression), starting with the same minimal table with just the EOF and introducer sequence.
During decompression, when you see the introducer sequence, you can add the byte following it (the next eight bits) to the table with a count of zero, output the byte, then adjust the count and re-balance (or defer as previously mentioned).
That way, you do not have to have the table shipped with the compressed file. This, of course, costs a little more time during compression and decompression in that you're periodically rebalancing the table but, as with most things in life, it's a trade-off.

The DHT marker doesn't specify directly which symbol is associated with a code. It contains a vector with counts of how many codes there are of a given length. After that it contains a vector with symbol values.
So when you want to decode, you have to generate the Huffman codes from the first vector and then associate each code with a symbol in the second vector.
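A minimal sketch of that code generation, assuming the usual JPEG convention that the count vector covers code lengths 1 to 16 and that codes are handed out in increasing numeric order, shortest lengths first. The function name build_codes is made up for this example, and the sample data is what I understand to be the typical luminance DC table from the JPEG specification's annex of example tables.

```python
def build_codes(counts, symbols):
    """Map each symbol to its Huffman code (as a bit string).

    counts[i] is the number of codes of length i + 1;
    symbols lists the symbol values in code order.
    """
    codes = {}
    code = 0
    k = 0
    for length, n in enumerate(counts, start=1):
        for _ in range(n):
            codes[symbols[k]] = format(code, "0{}b".format(length))
            code += 1
            k += 1
        code <<= 1  # moving to the next length appends a zero bit
    return codes


# Assumed example: typical luminance DC table (12 symbols, lengths 2..9).
counts = [0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
symbols = list(range(12))
for sym, code in build_codes(counts, symbols).items():
    print(sym, code)
```

The decoder only needs these two vectors from the DHT segment, which is why the file never stores the bit patterns themselves.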

Related

hpack encoding integer significance

After reading this, https://httpwg.org/specs/rfc7541.html#integer.representation
I am confused about quite a few things, although I seem to have the overall gist of the idea.
For one, What are the 'prefixes' exactly/what is their purpose?
For two:
C.1.1. Example 1: Encoding 10 Using a 5-Bit Prefix
The value 10 is to be encoded with a 5-bit prefix.
10 is less than 31 (2^5 - 1) and is represented using the 5-bit prefix.
  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| X | X | X | 0 | 1 | 0 | 1 | 0 |   10 stored on 5 bits
+---+---+---+---+---+---+---+---+
What are the leading Xs? What is the starting 0 for?
>>> bin(10)
'0b1010'
>>>
Typing this in the python IDE, you see almost the same output... Why does it differ?
This is when the number fits within the number of prefix bits though, making it seemingly simple.
C.1.2. Example 2: Encoding 1337 Using a 5-Bit Prefix
The value I=1337 is to be encoded with a 5-bit prefix.
1337 is greater than 31 (2^5 - 1).
The 5-bit prefix is filled with its max value (31).
I = 1337 - (2^5 - 1) = 1306.
I (1306) is greater than or equal to 128, so the while loop body executes:
I % 128 == 26
26 + 128 == 154
154 is encoded in 8 bits as: 10011010
I is set to 10 (1306 / 128 == 10)
I is no longer greater than or equal to 128, so the while loop terminates.
I, now 10, is encoded in 8 bits as: 00001010.
The process ends.
  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| X | X | X | 1 | 1 | 1 | 1 | 1 |  Prefix = 31, I = 1306
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |  1306>=128, encode(154), I=1306/128
| 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |  10<128, encode(10), done
+---+---+---+---+---+---+---+---+
The octet-like diagram shows three different numbers being produced... Since the numbers are produced throughout the loop, how do you replicate this octet-like diagram within an integer? What is the actual final result? The diagram or "I" being 10, or 00001010.
def f(a, b):
    if a < 2**b - 1:
        print(a)
    else:
        c = 2**b - 1
        remain = a - c
        print(c)
        if remain >= 128:
            while 1:
                e = remain % 128
                g = e + 128
                remain = remain / 128
                if remain >= 128:
                    continue
                else:
                    print(remain)
                    c+=int(remain)
                    print(c)
                    break
As I'm trying to figure this out, I wrote a quick Python implementation of it. It seems that I am left with a few useless variables, one being g, which in the documentation is the 26 + 128 == 154.
Lastly, where does 128 come from? I can't find any relation between the numbers besides the fact 2 raised to the 7th power is 128, but why is that significant? Is this because the first bit is reserved as a continuation flag? and an octet contains 8 bits so 8 - 1 = 7?
For one, What are the 'prefixes' exactly/what is their purpose?
Integers are used in a few places in HPACK messages and often they have leading bits that cannot be used for the actual integer. Therefore, there will often be a few leading bits that are unavailable for the integer itself. They are represented by the X. For the purposes of this calculation it doesn't matter what those Xs are: could be 000, or 111, or 010 or...etc. Also, there will not always be 3 Xs - that is just an example. There could be only one leading X, or two, or four...etc.
For example, to look up a previous HPACK decoded header, we use 6.1. Indexed Header Field Representation which starts with a leading 1, followed by the table index value. Therefore that 1 is the X in the previous example. We have 7 bits (instead of only 5 bits in the original example in your question). If the table index value is 126 or less we can represent it using those 7 bits. If it's >= 127 then we need to do some extra work (we'll come back to this).
If it's a new value we want to add to the table (to reuse in future requests), but we already have that header name in the table (so it's just a new value for that name we want as a new entry) then we use 6.2.1. Literal Header Field with Incremental Indexing. This has 2 bits at the beginning (01 - which are the Xs), and we only have 6-bits this time to represent the index of the name we want to reuse. So in this case there are two Xs.
So don't worry about there being 3 Xs - that's just an example. In the above examples there was one X (as first bit had to be 1), and two Xs (as first two bits had to be 01) respectively. The Integer Representation section is telling you how to handle any prefixed integer, whether prefixed by 1, 2, 3... etc unusable "X" bits.
What are the leading Xs? What is the starting 0 for?
The leading Xs are discussed above. The starting 0 is just because, in this example we have 5-bits to represent the integers and only need 4-bits. So we pad it with 0. If the value to encode was 20 it would be 10100. If the value was 40, we couldn't fit it in 5-bits so need to do something else.
Typing this in the python IDE, you see almost the same output... Why does it differ?
Python uses 0b to show it's a binary number. It doesn't bother showing any leading zeros. So 0b1010 is the same as 0b01010 and also the same as 0b00001010.
This is when the number fits within the number of prefix bits though, making it seemingly simple.
Exactly. If you need more than the number of bits you have, you don't have space for it. You can't just use more bits as HPACK will not know whether you are intending to use more bits (so should look at next byte) or if it's just a straight number (so only look at this one byte). It needs a signal to know that. That signal is using all 1s.
So to encode 40 in 5 bits, we need to use 11111 to say "it's not big enough", overflow to next byte. 11111 in binary is 31, so we know it's bigger than that, so we'll not waste that, and instead use it, and subtract it from the 40 to give 9 left to encode in the next byte. A new additional byte gives us 8 new bits to play with (well actually only 7 as we'll soon discover, as the first bit is used to signal a further overflow). This is enough so we can use 00001001 to encode our 9. So our complex number is represented in two bytes: XXX11111 and 00001001.
If we want to encode a value bigger than can fit in the prefix bits of the first byte, AND the leftover is bigger than the 127 that would fit into the available 7 bits of the second byte, then we can't use this overflow mechanism using two bytes. Instead we use another "overflow, overflow" mechanism using three bytes:
For this "overflow, overflow" mechanism, we set the first byte bits to 1s as usual for an overflow (XXX11111) and then set the first bit of the second byte to 1. This leaves 7 bits available to encode the value, plus the next 8 bits in the third byte we're going to have to use (actually only 7 bits of the third byte, because again it uses the first bit to indicate another overflow).
There are various ways they could have gone about this using the second and third bytes. What they decided to do was encode this as two numbers: the 128 mod, and the 128 multiplier.
1337 = 31 + (128 * 10) + 26
So that means the first byte is set to 31 as per the previous example, the second byte is set to 26 (which is 0011010 in 7 bits) plus the leading 1 to show we're using the overflow overflow method (so 10011010), and the third byte is set to 10 (or 00001010).
So 1337 is encoded in three bytes: XXX11111 10011010 00001010 (including setting X to whatever those values were).
Using 128 mod and multiplier is quite efficient and means this large number (and in fact any number whose leftover, after subtracting the prefix maximum, is up to 16,383) can be represented in three bytes. That is, not coincidentally, also the max integer that can be represented in 7 + 7 = 14 bits. But it does take a bit of getting your head around!
If the leftover is bigger than 16,383 then we need to do another round of overflow in a similar manner.
All this seems horrendously complex but is actually relatively simply, and efficiently, coded up. Computers can do this pretty easily and quickly.
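To make that concrete, here is a minimal Python sketch of the prefixed-integer scheme as described in the RFC's pseudocode. The function names encode_int and decode_int and the flags parameter (standing in for the leading X bits) are assumptions of this example, not part of any HPACK library.

```python
def encode_int(value, prefix_bits, flags=0):
    """Encode value with an N-bit prefix; flags holds the leading X bits."""
    limit = (1 << prefix_bits) - 1      # e.g. 31 for a 5-bit prefix
    if value < limit:
        return bytes([flags | value])   # fits in the prefix itself
    out = [flags | limit]               # prefix all 1s: "more bytes follow"
    value -= limit
    while value >= 128:
        out.append(value % 128 + 128)   # low 7 bits plus continuation bit
        value //= 128                   # integer division, not "/"
    out.append(value)                   # last byte has its top bit clear
    return bytes(out)


def decode_int(data, prefix_bits):
    """Return (value, number of bytes consumed)."""
    limit = (1 << prefix_bits) - 1
    value = data[0] & limit
    if value < limit:
        return value, 1
    shift, pos = 0, 1
    while True:
        byte = data[pos]
        pos += 1
        value += (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:             # continuation bit clear: done
            return value, pos


print(encode_int(10, 5).hex())    # 0a
print(encode_int(1337, 5).hex())  # 1f9a0a
```

The final result is the whole byte sequence, so for 1337 it is the three bytes 1f 9a 0a together, not just the last value of I.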
It seems that i am left with a few useless variables, one being g
You are not printing this value in the if statement, only the leftover value in the else. You need to print both.
which in the documentation is the 26 + 128 == 154.
Lastly, where does 128 come from? I can't find any relation between the numbers besides the fact 2 raised to the 7th power is 128, but why is that significant? Is this because the first bit is reserved as a continuation flag? and an octet contains 8 bits so 8 - 1 = 7?
Exactly, it's because the first bit (value 128) needs to be set as per explanation above, to show we are continuing/overflowing into needing a third byte.

I want to ask about the Google document protocol-buffers encoding

https://developers.google.com/protocol-buffers/docs/encoding?hl=eng#structure
I understand why the MSB is dropped from the "value".
However, I don't understand why the MSB is dropped on the "key" side.
? 000 1000: key
1 001 0110: value (msb: 1)
0 000 0001: value (msb: 0 = end)
The "key" has a byte after it, so I would expect its MSB to be 1 (1 000 1000),
or
there is no reason to drop the MSB at all, because the "key" is 1 byte (0000 1000).
But the MSB was dropped, and the MSB bit is 0.
Why is the value 0 even though the MSB was dropped?
You need to think of the key, in isolation, as an integer, and varint encode that integer by itself using varint encoding. The integer shown (b1000) only needs 4 bits, so in varint encoding that is a single byte with a 0 MSB. But keys can be much much larger. Think of field number 1708 - the number (not including the wire type) is b11010101100 - so that's 11 bits for the number, plus 3 bits for the wire type, so we have 14 bits of data to encode. Varint is a 7-bit encoding, so we can store that in 2 bytes: the first will have the MSB 1, and the second will have the MSB 0.
Or in other words: any MSB when discussing the key is there to help you find where the key stops being encoded. It has nothing to do with the payload that comes afterwards.
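A small Python sketch of this, assuming the usual protobuf layout where the key is (field_number << 3) | wire_type and is then varint-encoded on its own. The helper names encode_varint and encode_key are invented for this illustration.

```python
def encode_varint(n):
    """Encode a non-negative integer as a varint (7 data bits per byte)."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow: MSB = 1
        else:
            out.append(low7)         # last byte of this varint: MSB = 0
            return bytes(out)


def encode_key(field_number, wire_type):
    return encode_varint((field_number << 3) | wire_type)


print(encode_key(1, 0).hex())     # 08   -> one byte, MSB 0
print(encode_varint(150).hex())   # 9601 -> the two value bytes
print(encode_key(1708, 0).hex())  # e06a -> two bytes, MSBs 1 then 0
```

The key for field 1 ends with MSB 0 because the key's own varint is finished after one byte; the value bytes that follow are a separate varint with their own MSBs.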

Why this exponent got calculated in this way at this example?

Number: 0.110111 (base 2) × 2^–3 (the first bit is included in this example in the mantissa)
where 8 bits are used for the characteristic, and the exponent bias is 2^7 – 1
Their solution:
The sign bit is 0. The characteristic is –3 + 2^7 – 1, represented as an 8-bit binary number. The simplest way to calculate the characteristic here is to find the 7-bit 2’s complement of the binary representation of 4 (= 3 + 1), and adjoin a leading zero:
Binary representation of 4: 0000100
2’s complement: 1111100
Characteristic: 0111 1100
Why? My approach was to take the 8-bit complement instead of the 7-bit one, 1111 1100, and add it to the 8-bit representation of 128, 1000 0000.
That gave me 1 0111 1100; ignoring the ninth bit, I got the same answer, but I don't follow the author's approach.
An explanation would be highly appreciated.
Thanks
The idea behind the original approach is to rewrite the expression
–3 + 2^7 – 1
as
2^7 - 4
The lower seven bits of this expression are the 7-bit two's complement of 4 (i.e. the representation of -4 in 7 bits). Since the number is obviously in the range 0-127, the eighth bit must be zero.
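Both routes can be checked with a few lines of Python. This is just an arithmetic sketch; the helper names characteristic_bits and twos_complement are made up for the example.

```python
def characteristic_bits(exponent, field_bits=8):
    """Biased exponent as a bit string, with bias 2^(field_bits-1) - 1."""
    bias = 2 ** (field_bits - 1) - 1
    return format(exponent + bias, "0{}b".format(field_bits))


def twos_complement(value, bits):
    """Bit pattern of -value in the given number of bits."""
    return format((-value) & ((1 << bits) - 1), "0{}b".format(bits))


# Author's route: 7-bit two's complement of 4, then a leading zero.
print("0" + twos_complement(4, 7))   # 01111100
# Direct route: -3 + 127 = 124.
print(characteristic_bits(-3))       # 01111100
# Asker's route: 8-bit complement plus 128, dropping the ninth bit.
print(format((0b11111100 + 0b10000000) & 0xFF, "08b"))  # 01111100
```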

Ruby - How to represent message length as 2 binary bytes

I'm using Ruby and I'm communicating with a network endpoint that requires the formatting of a 'header' prior to sending the message itself.
The first field in the header must be the message length which is defined as a 2 binary byte message length in network byte order.
For example, my message is 1024 in length. How do I represent 1024 as binary two-bytes?
The standard tools for byte wrangling in Ruby (and Perl and Python and ...) are pack and unpack. Ruby's pack is in Array. You have a length that should be two bytes long and in network byte order, that sounds like a job for the n format specifier:
n | Integer | 16-bit unsigned, network (big-endian) byte order
So if the length is in length, you'd get your two bytes thusly:
two_bytes = [ length ].pack('n')
If you need to do the opposite, have a look at String#unpack:
length = two_bytes.unpack('n').first
See Array#pack.
[1024].pack("n")
This packs the number as the network-order byte sequence \x04\x00.
The way this works is that each byte is 8 binary bits. 1024 in binary is 10000000000. If we break this up into octets of 8 (8 bits per byte), we get: 00000100 00000000.
A byte can represent (2 states) ^ (8 positions) = 256 unique values. However, since we don't have 256 ascii-printable characters, we visually represent bytes as hexadecimal pairs, since a hexadecimal digit can represent 16 different values and 16 * 16 = 256. Thus, we can take the first byte, 00000100 and break it into two hexadecimal quads as 0000 0100. Translating binary to hex gives us 0x04. The second byte is trivial, as 0000 0000 is 0x00. This gives us our hexadecimal representation of the two-byte string.
It's worth noting that because you are constrained to a 2-byte (16-bit) header, you are limited to a maximum value of 11111111 11111111, or 2^16 - 1 = 65535 bytes. Any message larger than that cannot accurately represent its length in two bytes.

Confusion regarding genetic algorithms

My book (Artificial Intelligence: A Modern Approach) says that genetic algorithms begin with a set of k randomly generated states, called the population. Each state is represented as a string over a finite alphabet, most commonly a string of 0s and 1s. For example, an 8-queens state must specify the positions of 8 queens, each in a column of 8 squares, and so requires 8 * log2(8) = 24 bits. Alternatively, the state could be represented as 8 digits, each in the range from 1 to 8.
[ http://en.wikipedia.org/wiki/Eight_queens_puzzle ]
I don't understand the expression 8 * log2(8) = 24 bits. Why log2(8)? And what are these 24 bits supposed to be for?
If we take the first example on the Wikipedia page, the solution can be encoded as [2,4,6,8,3,1,7,5]: the first digit gives the row number for the queen in column A, the second for the queen in column B, and so on. Now, instead of starting the row numbering at 1, we will start at 0. The solution is then encoded as [1,3,5,7,2,0,6,4]. Any position can be encoded this way.
We only have digits between 0 and 7; if we write them in binary, 3 bits (= log2(8)) are enough:
000 -> 0
001 -> 1
...
110 -> 6
111 -> 7
A position can be encoded using 8 times 3 bits, e.g. from [1,3,5,7,2,0,6,4] we get [001,011,101,111,010,000,110,100] or more briefly 001011101111010000110100: 24 bits.
In the other direction, the bitstring 000010001011100101111110 decodes as 000.010.001.011.100.101.111.110, then [0,2,1,3,4,5,7,6], which gives [1,3,2,4,5,6,8,7]: the queen in column A is on row 1, the queen in column B is on row 3, etc.
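The same conversion in a few lines of Python, as a sketch only; the function names encode_queens and decode_queens are made up here, and rows are numbered 1 to 8 as on the Wikipedia page.

```python
def encode_queens(rows):
    """rows[i] is the row (1..8) of the queen in column i; 3 bits per queen."""
    return "".join(format(r - 1, "03b") for r in rows)


def decode_queens(bits):
    """Split the 24-bit string into 3-bit groups and shift back to 1..8."""
    return [int(bits[i:i + 3], 2) + 1 for i in range(0, len(bits), 3)]


print(encode_queens([2, 4, 6, 8, 3, 1, 7, 5]))    # 001011101111010000110100
print(decode_queens("000010001011100101111110"))  # [1, 3, 2, 4, 5, 6, 8, 7]
```

A fixed-width bit string like this is convenient for a genetic algorithm because crossover and mutation can operate on any bit position without producing an invalid encoding.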
The number of bits needed to store the possible squares (8 possibilities, 0-7) is log2(8) = 3. Note that 111 in binary is 7 in decimal. You have to specify the square for 8 columns, so you need 3 bits 8 times.

Resources