Length values from deflate algorithm - algorithm

I compressed the text "TestingTesting" and the hex result was: 0B 49 2D 2E C9 CC 4B 0F 81 50 00. I can't figure out the length and distance codes. The binary below is reversed because the RFC says to read the bits from right to left (thanks Matthew Slattery for the help). Here is what was parsed so far:
1 BFINAL (last block)
01 BTYPE (static)
1000 0100 132-48= 84 T
1001 0101 149-48= 101 e
1010 0011 163-48= 115 s
1010 0100 164-48= 116 t
1001 1001 153-48= 105 i
1001 1110 158-48= 110 n
1001 0111 151-48= 103 g
These are the remaining bits that I don't know how to parse:
1000 0100 0000 1000 0101 0000 0000 0
The final 10 bits (the end-of-block value, 0x100) are the only part I can parse. I think the length and distance values should both be 7 (binary 0111), since "Testing" is 7 letters long and gets copied 7 characters after its original position, but I can't figure out how this is represented in the remaining bits. What am I doing wrong?

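The distance code is 5, but distance code 5 is followed by one "extra bit" that selects an actual distance of either 7 or 8. (See the second table in section 3.2.5 of RFC 1951.) Note also that the compressor emitted the second 'T' as a literal, so the match that follows has length 6, not 7.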
The complete decoding of the data is:
1 BFINAL
01 BTYPE=static
10000100 'T'
10010101 'e'
10100011 's'
10100100 't'
10011001 'i'
10011110 'n'
10010111 'g'
10000100 another 'T'
0000100 literal/length code 260 = length 6
00101 distance code 5
0 extra bit => the distance is 7
0000000 literal/length code 256 = end of block
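
The decoding above can be reproduced with a short script. This is a minimal sketch, not a general inflater: it handles only a single fixed-Huffman block (BTYPE=01), and the names `BitReader`, `read_fixed_symbol` and `inflate_fixed_block` are made up for illustration. The base and extra-bit tables are copied from section 3.2.5 of RFC 1951, and the result is cross-checked against Python's `zlib` in raw-deflate mode (`wbits=-15`).

```python
import zlib

DATA = bytes.fromhex("0B492D2EC9CC4B0F815000")

# Base values and extra-bit counts for length codes 257..285 and
# distance codes 0..29 (RFC 1951, section 3.2.5).
LENGTH_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
               35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LENGTH_EXTRA = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
                3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0]
DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
             257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
             8193, 12289, 16385, 24577]
DIST_EXTRA = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
              7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13]


class BitReader:
    """Hands out the bits of each byte, least significant bit first."""

    def __init__(self, data):
        self._bits = ((byte >> i) & 1 for byte in data for i in range(8))

    def read_int(self, count):
        # Header fields and extra bits are stored least significant bit first.
        return sum(next(self._bits) << i for i in range(count))

    def read_code(self, count):
        # Huffman codes are stored most significant bit first.
        value = 0
        for _ in range(count):
            value = (value << 1) | next(self._bits)
        return value


def read_fixed_symbol(reader):
    """Decode one literal/length symbol using the fixed Huffman table."""
    code = reader.read_code(7)
    if code <= 0b0010111:                    # 7-bit codes: symbols 256..279
        return 256 + code
    code = (code << 1) | reader.read_code(1)
    if 0b00110000 <= code <= 0b10111111:     # 8-bit codes: literals 0..143
        return code - 0b00110000
    if 0b11000000 <= code <= 0b11000111:     # 8-bit codes: symbols 280..287
        return 280 + code - 0b11000000
    code = (code << 1) | reader.read_code(1)
    return 144 + code - 0b110010000          # 9-bit codes: literals 144..255


def inflate_fixed_block(data):
    reader = BitReader(data)
    reader.read_int(1)                       # BFINAL (ignored in this sketch)
    if reader.read_int(2) != 1:
        raise ValueError("only BTYPE=01 (fixed Huffman) is handled here")
    out = bytearray()
    while True:
        symbol = read_fixed_symbol(reader)
        if symbol < 256:                     # literal byte
            out.append(symbol)
        elif symbol == 256:                  # end of block
            return bytes(out)
        else:                                # length/distance pair
            i = symbol - 257
            length = LENGTH_BASE[i] + reader.read_int(LENGTH_EXTRA[i])
            d = reader.read_code(5)          # fixed 5-bit distance code
            distance = DIST_BASE[d] + reader.read_int(DIST_EXTRA[d])
            for _ in range(length):
                out.append(out[-distance])


print(inflate_fixed_block(DATA))             # b'TestingTesting'
print(zlib.decompress(DATA, -15))            # b'TestingTesting'
```

The two read methods reflect the split in the format: header fields and extra bits are read starting from the least significant bit, while Huffman codes are read starting from their most significant bit.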

Related

Is there an algorithm to find the shortest binary representation for every entry within a given range?

I have an encoding scheme, but I don't know the name of it. I know there must be an algorithm to encode/decode integers into this binary scheme. The scheme is as follows:
     1    2    3    4    5    6    7    8    9     etc.
0    -    0    0    00   00   00   00   000  000
1         1    10   01   01   01   010  001  001
2              11   10   10   100  011  010  010
3                   11   110  101  100  011  011
4                        111  110  101  100  100
5                             111  110  101  101
6                                  111  110  110
7                                       111  1110
8                                            1111
etc.
Example:
When you have a range of 6 integers (0 to 5) you can use column 6. With this you can save a bit on the numbers 0 and 1. When using column 9, you will save a bit on every number except on 7 and 8.
The saving is relative to using fixed-width words of 2, 3, 4, or N bits.
I tried to Google this, but I can't find the right search keywords. Could someone point me in the right direction?
Thanks!
This appears to be Huffman encoding with an assumed uniform distribution across all values in any given range.
So, for instance, column 6 is just a Huffman encoding for the character set [0-5] (inclusive), assuming all 6 numbers are equally likely to occur.
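
The columns of the table can be generated directly. The sketch below uses the construction usually called truncated binary encoding, a name that does not appear in the thread; it happens to reproduce every column shown in the question, and it is one valid Huffman code for a uniform distribution. The function name `truncated_binary` is made up for illustration.

```python
def truncated_binary(x, n):
    """Codeword for x in a range of n values (0 <= x < n), as a bit string."""
    if not 0 <= x < n:
        raise ValueError("x must be in range(n)")
    k = n.bit_length() - 1          # floor(log2(n))
    short = (1 << (k + 1)) - n      # how many values get the shorter k-bit code
    if x < short:
        return format(x, "b").zfill(k) if k else ""
    return format(x + short, "b").zfill(k + 1)


# Column 6 of the table (values 0..5):
print([truncated_binary(x, 6) for x in range(6)])
# ['00', '01', '100', '101', '110', '111']
```

For a range of 9 values this gives 3-bit codewords for 0 to 6 and the 4-bit codewords 1110 and 1111 for 7 and 8, matching column 9.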

What is the byte/bit order in this Microsoft document?

This is the documentation for the Windows .lnk shortcut format:
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-shllink/16cb4ca1-9339-4d0c-a68d-bf1d6cc0f943
The ShellLinkHeader structure is described like this:
This is a file:
Looking at HeaderSize, the bytes are 4c 00 00 00 and it's supposed to mean 76 decimal. This is a little-endian integer, no surprise here.
Next is the LinkCLSID with the bytes 01 14 02 00 00 00 00 00 c0 00 00 00 00 00 00 46, representing the value "00021401-0000-0000-C000-000000000046". This answer seems to explain why the byte order changes: the last 8 bytes are a byte array, while the other fields are little-endian numbers.
My question is about the LinkFlags part.
The LinkFlags part is described like this:
And the bytes in my file are 9b 00 08 00, or in binary:
9    b    0    0    0    8    0    0
1001 1011 0000 0000 0000 1000 0000 0000
 ^
By comparing different files I found out that the bit marked with ^ is bit 6/G in the documentation (marked in red).
How to interpret this? The bytes are in the same order as in the documentation but each byte has its bits reversed?
The issue here is that the list of bits shown in these specs is not meant to have a number written underneath it at all. It is meant to have a list of bits underneath it, and that list goes from the lowest bit to the highest bit, which is the exact reverse of how we read numbers from left to right.
The list clearly shows bits numbered from 0 to 31, though, meaning this is indeed one 32-bit value and not four separate bytes. Specifically, the bytes as read need to be interpreted as a single 32-bit integer before doing anything else. As with all the other values, that means reading it as a little-endian number, with its bytes reversed.
So your 9b 00 08 00 becomes 0008009b, or, in binary, 0000 0000 0000 1000 0000 0000 1001 1011.
But, as I said, that list in the specs shows the bits from lowest to highest. So to fit them under that, reverse the binary version:
0           1            2           3
0123 4567 8901 2345 6789 0123 4567 8901
ABCD EFGH IJKL MNOP QRST UVWX YZ#_ ____
---------------------------------------
1101 1001 0000 0000 0001 0000 0000 0000
       ^
So bit 6, indicated in the specs as 'G', is 0.
This whole thing makes a lot more sense if you invert the specs, though, and list the bits logically, from highest to lowest:
 3           2            1           0
1098 7654 3210 9876 5432 1098 7654 3210
____ _#ZY XWVU TSRQ PONM LKJI HGFE DCBA
---------------------------------------
0000 0000 0000 1000 0000 0000 1001 1011
                               ^
0    0    0    8    0    0    9    b
This makes the alphabetic references look a lot less intuitive, but it does perfectly fit the numeric versions underneath. The bit matches your findings (third bit on what you have as value '9'), and you can also clearly see that the highest 5 bits are unused.
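
The same steps can be checked in a few lines. This is a small sketch using the four bytes from the question; the letter mapping (bit 0 = A, bit 1 = B, and so on) follows the diagram above and covers only the 26 single-letter flags.

```python
RAW = bytes.fromhex("9b000800")          # LinkFlags bytes as they appear in the file

# Step 1: interpret the four bytes as one little-endian 32-bit integer.
value = int.from_bytes(RAW, "little")
print(hex(value))                        # 0x8009b
print(format(value, "032b"))             # highest bit first, as numbers are written

# Step 2: test individual bits by number; bit 6 is flag 'G'.
bit_g = (value >> 6) & 1
print(bit_g)                             # 0

# Step 3: list every set bit with its letter from the spec diagram.
set_bits = [i for i in range(32) if (value >> i) & 1]
letters = [chr(ord("A") + i) for i in set_bits if i < 26]
print(set_bits)                          # [0, 1, 3, 4, 7, 19]
print(letters)                           # ['A', 'B', 'D', 'E', 'H', 'T']
```

Testing bits with shifts and masks avoids having to reverse the bit string at all; the reversal is only needed to line the value up visually with the diagram in the specs.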

Self complementing Codes

This statement was deemed true: Given any self-complementing decimal code scheme, if we know the codes for the number 283, then we can deduce the codes for 671.
I wanna know why. I took Excess-3 BCD as the self complementing code:
0-0011
1-0100
2-0101
3-0110
4-0111
5-1000
6-1001
7-1010
8-1011
9-1100
So 283 = 0101 1011 0110.
671 = 1001 1010 0100
So why does the statement hold, when the Excess-3 code for 283 is not the 1's complement of the code for 671?
Since it is a self-complementing decimal code scheme, the code for the 9's complement of 283 can be obtained by taking the 1's complement of the code for 283.
9's complement of 283 = 716
283 = 0101 1011 0110, so its 1's complement, 1010 0100 1001, is the code for 716.
From this: the code for 7 = 1010, for 1 = 0100, and for 6 = 1001.
So the code for 671 = 1001 1010 0100.
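
The deduction can be checked mechanically. The sketch below uses Excess-3 as the example code, as in the question; the helper names are made up for illustration. Only the codes for the digits 2, 8 and 3 are treated as known, and the codes for 7, 1 and 6 are derived by complementing them.

```python
def excess3(digit):
    """4-bit Excess-3 code of a decimal digit, as a bit string."""
    return format(digit + 3, "04b")


def ones_complement(code):
    return "".join("1" if bit == "0" else "0" for bit in code)


# Codes we are given: the digits of 283.
known = {digit: excess3(digit) for digit in (2, 8, 3)}

# Self-complementing property: code(9 - d) is the 1's complement of code(d).
for digit, code in list(known.items()):
    known[9 - digit] = ones_complement(code)

print(sorted(known))                               # [1, 2, 3, 6, 7, 8]
code_671 = " ".join(known[int(ch)] for ch in "671")
print(code_671)                                    # 1001 1010 0100
```

The digits of 671 are exactly the 9's complements of the digits of 283 (in a different order), which is why knowing the codes for 283 is enough.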

How does a direct mapped cache with 16 one-word blocks vary from one with 8 two-word blocks

I am trying to determine the binary address, tag, index, and hit or miss for two direct-mapped caches, one with 16 one-word blocks and one with 8 two-word blocks, both assumed empty at the beginning.
Say I have the referenced instructions 4, 4, 32, 31, 5, 32
For the first cache (16 one-word blocks), you first convert 4 to binary and then split that binary value into tag and index. If the same index is referenced again with the same tag, the access is marked as a hit.
That being said, I believe the table below to be correct using this method.
Ref | Binary   | Tag  | Index | Hit or Miss
4   | 00000100 | 0000 | 0100  | miss
4   | 00000100 | 0000 | 0100  | hit
32  | 00100000 | 0010 | 0000  | miss
31  | 00011111 | 0001 | 1111  | miss
5   | 00000101 | 0000 | 0101  | miss
32  | 00100000 | 0010 | 0000  | hit
I wish to do the same for the second cache (8 two-word blocks), but I am unsure how to continue.
I figure the binary is the same for the numbers however I am confused on how to determine tag and index from it and whether there was a hit or a miss on the same referenced instructions as the first cache.
How would one determine the tag, index, and whether or not it was a hit or miss in this cache?
It varies in that you have half as many cache lines to work with, giving 4 bits of tag, 3 bits of index and 1 bit of displacement within the cache line (indicating which word of a two-word block is addressed). For the example given, the wider fetches will garner one additional hit since accessing 4 fetches 5 as well.
Ref | Binary   | Tag  | Index | Disp | Hit or Miss
4   | 00000100 | 0000 | 010   | 0    | miss
4   | 00000100 | 0000 | 010   | 0    | hit
32  | 00100000 | 0010 | 000   | 0    | miss
31  | 00011111 | 0001 | 111   | 1    | miss
5   | 00000101 | 0000 | 010   | 1    | *hit
32  | 00100000 | 0010 | 000   | 0    | hit
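
Both tables can be reproduced with a small simulation. This is a sketch under the same assumptions as the tables above: the references are word addresses, the cache is direct mapped, and it starts empty. The function name `simulate` and its parameters are made up for illustration.

```python
def simulate(refs, num_blocks, words_per_block):
    """Return (address, tag, index, displacement, outcome) for each reference."""
    lines = {}                                # index -> tag currently stored there
    results = []
    for addr in refs:
        block = addr // words_per_block       # block number in memory
        disp = addr % words_per_block         # word within the block
        index = block % num_blocks            # cache line the block maps to
        tag = block // num_blocks             # remaining high-order bits
        hit = lines.get(index) == tag
        lines[index] = tag                    # on a miss the whole block is loaded
        results.append((addr, tag, index, disp, "hit" if hit else "miss"))
    return results


REFS = [4, 4, 32, 31, 5, 32]

for row in simulate(REFS, num_blocks=16, words_per_block=1):
    print(row)
print()
for row in simulate(REFS, num_blocks=8, words_per_block=2):
    print(row)
```

With two-word blocks, the miss on address 4 loads the block holding words 4 and 5, so the later reference to 5 finds tag 0 already stored at index 2 and counts as a hit.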

decoding HID data

I am using an rs232 HID reader.
Its manual says that its output is
CCDDDDDDDDDDXX
where CC is reserved for HID
DDDDDDDDDD is the transponder (the card) data
XX is a checksum
The checksum is well explained and irrelevant here. About DDDDDDDDDD the manual only says that valid values are 0000000000 to 1FFFFFFFFF, with no indication of how they convert to what is printed on the front face of the card.
I have 3 sample cards, sadly in a short range (edit: plus an extra one). Here they are:
read from RS-232    shown on card
00000602031C27      00398
00000602031F2A      00399
0000060203202B      00400
00000601B535F1      55962  **new
I also have a DB with 1000 cards loaded (what is printed on the front), so I need the decoding path from what I read on RS-232 to what is printed on the front.
Some values from the DB (I have seen the cards, but I have no physical access to them now):
55503
60237
00833
Thanks a lot to every one.
Googling for the string "CCDDDDDDDDDDXX" returns http://www.rfideas.com/downloads/SerialAppNote8.pdf, which seems to describe how to decode the numbers. I can't guarantee that it is accurate.
Decoding the Standard 26-bit Format
Message sent by the reader:
C C D D D D D D D D D D X X
---------------------------
0 0 0 0 0 6 0 2 0 3 1 C 2 7
0 0 0 0 0 6 0 2 0 3 1 F 2 A
0 0 0 0 0 6 0 2 0 3 2 0 2 B
0 0 0 0 0 6 0 1 B 5 3 5 F 1
Stripping off the checksum, X, and reducing the data to binary gives:
C C D D D D D D D D D D
cccc cccc zzzz zzzz zzzz zspf ffff fffn nnnn nnnn nnnn nnnp
-----------------------------------------------------------
0000 0000 0000 0000 0000 0110 0000 0010 0000 0011 0001 1100
0000 0000 0000 0000 0000 0110 0000 0010 0000 0011 0001 1111
0000 0000 0000 0000 0000 0110 0000 0010 0000 0011 0010 0000
0000 0000 0000 0000 0000 0110 0000 0001 1011 0101 0011 0101
All the Card Data Characters to the left of the 7th can be ignored.
c = HID Specific Code.
z = leading zeros
s = start sentinel (it is always a 1)
p = parity bits: the leading one gives even parity over the first 12 data bits, the trailing one gives odd parity over the last 12
f = Facility Code 8 bits
n = Card Number 16 bits
From this we can see that
00000602031C27 → n = 0b0000000110001110 = 398
00000602031F2A → n = 0b0000000110001111 = 399
0000060203202B → n = 0b0000000110010000 = 400
00000601B535F1 → n = 0b1101101010011010 = 55962
So, for your example (assuming the same facility code, 1), we would probably get:
55503
(f, n) = 0b0000_0001__1101_1000_1100_1111
leading parity bit (even parity over the first 12 bits) = 0
trailing parity bit (odd parity over the last 12 bits) = 0
result = 00000403b19e56
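
The bit layout above translates into a short decoder and encoder. This is a sketch, not the reader's documented algorithm: the checksum rule (low byte of the sum of the six payload bytes) is an assumption inferred from the four sample messages, and the function names `checksum`, `decode` and `encode` are made up for illustration.

```python
def checksum(payload_hex):
    # Assumption inferred from the samples: sum of the payload bytes, modulo 256.
    return sum(bytes.fromhex(payload_hex)) & 0xFF


def decode(message):
    """Split a 14-character CCDDDDDDDDDDXX message into facility and card number."""
    payload, check = message[:12], int(message[12:], 16)
    if checksum(payload) != check:
        raise ValueError("checksum mismatch")
    value = int(payload, 16)
    return {
        "facility": (value >> 17) & 0xFF,     # 8 bits after the leading parity bit
        "card": (value >> 1) & 0xFFFF,        # 16 bits before the trailing parity bit
    }


def encode(facility, card):
    """Build the message for a 26-bit card (inverse of decode)."""
    data = (facility << 16) | card            # the 24 data bits (f, n)
    leading = bin(data >> 12).count("1") % 2          # even parity, first 12 bits
    trailing = 1 - bin(data & 0xFFF).count("1") % 2   # odd parity, last 12 bits
    value = (1 << 26) | (leading << 25) | (data << 1) | trailing
    payload = f"{value:012X}"
    return payload + f"{checksum(payload):02X}"


print(decode("00000602031C27"))   # {'facility': 1, 'card': 398}
print(decode("00000601B535F1"))   # {'facility': 0, 'card': 55962}
print(encode(1, 55503))           # 00000403B19E56
```

Note that the fourth sample decodes to facility code 0 while the first three decode to facility code 1, so the facility code of a card in the DB has to be known or guessed before its message can be predicted.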
