I am reverse engineering an AC remote control. When I send the temperature values (from 17C to 30C) I get the following stream.
Temperature - Binary - Hex - Decimal
17C - 00000000 - 0x00 - 0
18C - 00010000 - 0x10 - 16
19C - 00110000 - 0x30 - 48
20C - 00100000 - 0x20 - 32
21C - 01100000 - 0x60 - 96
22C - 01110000 - 0x70 - 112
23C - 01010000 - 0x50 - 80
24C - 01000000 - 0x40 - 64
25C - 11000000 - 0xc0 - 192
26C - 11010000 - 0xd0 - 208
27C - 10010000 - 0x90 - 144
28C - 10000000 - 0x80 - 128
29C - 10100000 - 0xa0 - 160
30C - 10110000 - 0xb0 - 176
What possible method could they have used to encode the temperature data in the byte?
Is there any method to reverse engineer the stream?
Looking at it, it's clear that they are adding powers of 2, but I can't figure out the logic behind it.
The first four bits form a 4-bit Gray code, since every two consecutive bit sequences differ in exactly one position:
0000
0001
0011
0010
0110
0111
0101
0100
1100
1101
1001
1000
1010
1011
[1111] (missing, probably 31C)
Your sample does not give any indication about what happens outside the [17, 31] range.
As for why this pattern would be used, Wikipedia provides some examples:
Gray codes are used in position encoders (linear encoders and rotary encoders), in preference to straightforward binary encoding. This avoids the possibility that, when several bits change in the binary representation of an angle, a misread will result from some of the bits changing before others.
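If you want to check that property in code, here is a minimal Python sketch over the bytes from the table above (it only verifies the one-bit-change claim; it is not the vendor's actual encoder):

# Bytes captured for 17C..30C, straight from the table above.
codes = [0x00, 0x10, 0x30, 0x20, 0x60, 0x70, 0x50, 0x40,
         0xC0, 0xD0, 0x90, 0x80, 0xA0, 0xB0]

for temp, (prev, cur) in zip(range(18, 31), zip(codes, codes[1:])):
    changed = bin(prev ^ cur).count("1")   # how many bits differ between neighbours
    print(f"{temp - 1}C -> {temp}C: {changed} bit(s) changed")
    assert changed == 1                    # Gray code: exactly one bit flips per step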
So, I know that when we type characters, each character maps to a number in a character set, and then that number is transformed into a binary format so a computer can understand it. The way that number is transformed into a binary format (how many bits get allocated) depends on the character encoding.
So, if I type L, it represents 76. Then 76 gets transformed into a 1-byte binary format because of, let's say, UTF-8.
Now, I've read the following somewhere:
The Devanagari character क, with code point 2325 (which is 915 in
hexadecimal notation), will be represented by two bytes when using the
UTF-16 encoding (09 15), three bytes with UTF-8 (E0 A4 95), or four
bytes with UTF-32 (00 00 09 15).
So, as you can see, it says three bytes with UTF-8 (E0 A4 95). How are E0 A4 95 bytes? I am asking because I have no idea where E0 A4 95 came from. Why do we need this? If we know that the code point is 2325, and we know that UTF-8 will need 3 bytes to encode it, why do we need E0 A4 95, and what is it?
E0 A4 95 is the 3-byte UTF-8 encoding of U+0915. In binary:
E   0    A   4    9   5      (hex)
11100000 10100100 10010101   (binary)
1110xxxx 10xxxxxx 10xxxxxx   (3-byte UTF-8 encoding pattern)
    0000   100100   010101   (data bits)
     00001001  00010101      (regrouped data bits into 8-bit bytes)
     0   9     1   5         (hex)
     U+0915                  (Unicode code point)
The first byte's binary pattern 1110xxxx is a lead byte indicating a 3-byte encoding and carrying 4 bits of data. Continuation bytes start with 10xxxxxx and each provide 6 more bits of data. A lead byte that indicates a 3-byte encoding is therefore followed by exactly two continuation bytes.
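To see the same bit arithmetic in code, here is a small Python 3 sketch (just an illustration of the pattern above, using the standard library):

ch = "\u0915"                        # Devanagari ka, code point U+0915
encoded = ch.encode("utf-8")
print(encoded.hex(" "))              # e0 a4 95

# Rebuild the code point by hand from the three bytes:
b1, b2, b3 = encoded
code_point = ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)
print(hex(code_point))               # 0x915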
For more information read the Wikipedia article on UTF-8 and the standard RFC-3629.
This is the documentation for the Windows .lnk shortcut format:
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-shllink/16cb4ca1-9339-4d0c-a68d-bf1d6cc0f943
The ShellLinkHeader structure is described there field by field: HeaderSize, then LinkCLSID, then LinkFlags, and so on.
I am comparing that description against a hex dump of one of my .lnk files.
Looking at HeaderSize, the bytes are 4c 00 00 00 and it's supposed to mean 76 decimal. This is a little-endian integer, no surprise here.
Next is the LinkCLSID with the bytes 01 14 02 00 00 00 00 00 c0 00 00 00 00 00 00 46, representing the value "00021401-0000-0000-C000-000000000046". This answer seems to explain why the byte order changes: the last 8 bytes are a byte array, while the other fields are little-endian numbers.
My question is about the LinkFlags part.
The LinkFlags part is described as a row of 32 single-bit flags, numbered 0 through 31 and labelled A, B, C, and so on.
And the bytes in my file are 9b 00 08 00, or in binary:
9    b    0    0    0    8    0    0
1001 1011 0000 0000 0000 1000 0000 0000
 ^
By comparing different files I found out that the bit marked with ^ is bit 6/G in the documentation.
How to interpret this? The bytes are in the same order as in the documentation but each byte has its bits reversed?
The issue here springs from the fact that the list of bits shown in these specs is not meant to have a number fitted underneath it at all. It is meant to have a list of bits underneath it, and that list runs from the lowest bit to the highest bit, which is the complete inverse of how we read numbers from left to right.
The list clearly shows bits numbered from 0 to 31, though, meaning this is indeed one 32-bit value, and not four separate bytes. Specifically, this means the bytes as read from the file need to be interpreted as a single 32-bit integer before doing anything else. Like all other values, that integer is read as little-endian, with its bytes reversed.
So your 9b 00 08 00 becomes 0008009b, or, in binary, 0000 0000 0000 1000 0000 0000 1001 1011.
But, as I said, that list in the specs shows the bits from lowest to highest. So to fit them under that, reverse the binary version:
0           1            2           3
0123 4567 8901 2345 6789 0123 4567 8901
ABCD EFGH IJKL MNOP QRST UVWX YZ#_ ____
---------------------------------------
1101 1001 0000 0000 0001 0000 0000 0000
       ^
So bit 6, indicated in the specs as 'G', is 0.
This whole thing makes a lot more sense if you invert the specs, though, and list the bits logically, from highest to lowest:
 3           2            1           0
1098 7654 3210 9876 5432 1098 7654 3210
____ _#ZY XWVU TSRQ PONM LKJI HGFE DCBA
---------------------------------------
0000 0000 0000 1000 0000 0000 1001 1011
                               ^
0    0    0    8    0    0    9    b
This makes the alphabetic references look a lot less intuitive, but it does perfectly fit the numeric versions underneath. The marked bit matches your findings (it is the second bit from the left in the nibble you wrote as '9'), and you can also clearly see that the highest 5 bits are unused.
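If it helps, the same interpretation can be reproduced in a few lines of Python (a sketch only; the letters follow the spec's A, B, C, ... labelling discussed above):

import struct

raw = bytes.fromhex("9b000800")      # the four LinkFlags bytes as stored in the file
flags, = struct.unpack("<I", raw)    # read them as one little-endian 32-bit integer
print(hex(flags))                    # 0x8009b

# Flag 'A' is bit 0, 'B' is bit 1, ..., 'G' is bit 6, and so on.
for i, letter in enumerate("ABCDEFGH"):
    print(letter, (flags >> i) & 1)  # G (bit 6) prints 0, as derived above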
The setup
Say I've got:
A series of numbers resulting from LZW compression of a bitmap:
256 1 258 258 0 261 261 259 260 262 0 264 1 266 267 258 2 273 2 262 259 274 275 270 278 259 262 281 265 276 264 270 268 288 264 257
An LZW-compressed, variable-length-encoded bytestream (including the LZW code size header and sub-block markers) which represents this same series of numbers:
00001000 00101001 00000000 00000011 00001000 00010100 00001000 10100000
01100000 11000001 10000001 00000100 00001101 00000010 01000000 00011000
01000000 11100001 01000010 10000001 00000010 00100010 00001010 00110000
00111000 01010000 11100010 01000100 10000111 00010110 00000111 00011010
11001100 10011000 10010000 00100010 01000010 10000111 00001100 01000001
00100010 00001100 00001000 00000000
And an initial code width of 8.
The problem
I'm trying to derive the initial series of numbers (the integer array) from the bytestream.
From what I've read, the procedure here is to take the initial code width, scan right-to-left, reading initial code width + 1 bits at a time, to extract the integers from the bytestream. For example:
iteration #1: 1001011011100/001/ yield return 4
iteration #2: 1001011011/100/001 yield return 1
iteration #3: 1001011/011/100001 yield return 6
iteration #4: 1001/011/011100001 yield return 6
This procedure will not work for iteration #5, which will yield 1:
iteration #5: 1/001/011011100001 yield return 1 (expected 9)
The code width should have been increased by one.
The question
How am I supposed to know when to increase the code width when reading the variable-length-encoded bytestream? Do I have all of the required information necessary to decompress this bytestream? Am I conceptually missing something?
UPDATE:
After a long discussion with greybeard - I found out that I was reading the binary string incorrectly: 00000000 00000011 00 is to be interpreted as 256, 1. The bytestream is not read as big-endian.
And very roughly speaking, if you are decoding a bytestream, you increase the number of bits read after roughly every 2^(N-1) codes, where N is the current code width.
Decompressing, you are supposed to build a dictionary in much the same way as the compressor. You know you need to increase the code width as soon as the compressor might use a code too wide for the current width.
As long as the dictionary is not full (the maximum code is not assigned), a new code is assigned for every (regular) code put out (not the Clear Code or End Of Information codes).
With the example in the presentation you linked, 8 is assigned when the second 6 is "transmitted" - you need to switch to four bits before reading the next code.
(This is where the example and your series of numbers differ - the link presents 4, 1, 6, 6, 2, 9.)
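For completeness, here is a rough Python sketch of reading LSB-first, variable-width codes the way a GIF-style decoder would. It is only an illustration of the rule above: the exact moment of the width increase (and the handling of clear codes near a full table) differs slightly between implementations, and the data passed in is assumed to be the raw LZW bytes with the code-size byte and sub-block length bytes already stripped.

def read_codes(data, min_code_size):
    # Unpack LSB-first, variable-width LZW codes (GIF-style packing).
    clear_code = 1 << min_code_size
    end_code = clear_code + 1
    code_size = min_code_size + 1          # current code width in bits
    next_code = end_code + 1               # next dictionary slot to be assigned
    first_after_clear = True               # the first code after a clear adds no entry
    bit_buffer = bit_count = 0
    codes = []
    for byte in data:
        bit_buffer |= byte << bit_count    # new bytes land above the bits already buffered
        bit_count += 8
        while bit_count >= code_size:
            code = bit_buffer & ((1 << code_size) - 1)
            bit_buffer >>= code_size
            bit_count -= code_size
            codes.append(code)
            if code == clear_code:
                code_size = min_code_size + 1
                next_code = end_code + 1
                first_after_clear = True
            elif code == end_code:
                return codes
            elif first_after_clear:
                first_after_clear = False
            else:
                next_code += 1             # one new dictionary entry per regular code
                if next_code == (1 << code_size) and code_size < 12:
                    code_size += 1         # widen before a code could stop fitting
    return codes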
I am looking to determine the binary address, tag, index, and hit or miss status for a cache with 16 one-word blocks, and for one which uses 8 two-word blocks, both assumed empty at the beginning.
Say I have the referenced instructions 4, 4, 32, 31, 5, 32
For the first cache (16 one-word blocks) you first convert 4 to binary, then split that binary value into a tag and an index; if that index is found again with the same tag, it is marked as a hit.
That being said, I believe the table below to be correct using this method.
Ref | Binary   | Tag  | Index | Hit or Miss
  4 | 00000100 | 0000 | 0100  | miss
  4 | 00000100 | 0000 | 0100  | hit
 32 | 00100000 | 0010 | 0000  | miss
 31 | 00011111 | 0001 | 1111  | miss
  5 | 00000101 | 0000 | 0101  | miss
 32 | 00100000 | 0010 | 0000  | hit
I wish to do the same for the second cache (8 two-word blocks), however I am unsure how to continue.
I figure the binary is the same for the numbers, but I am confused about how to determine the tag and index from it, and whether there was a hit or a miss for the same referenced instructions as with the first cache.
How would one determine the tag, index, and whether or not it was a hit or miss in this cache?
It varies in that you have half as many cache lines to work with, giving 4 bits of tag, 3 bits of index and 1 bit of displacement within the cache line (indicating which word of a two-word block is addressed). For the example given, the wider fetches will garner one additional hit since accessing 4 fetches 5 as well.
Ref | Binary   | Tag  | Index | Disp | Hit or Miss
  4 | 00000100 | 0000 | 010   | 0    | miss
  4 | 00000100 | 0000 | 010   | 0    | hit
 32 | 00100000 | 0010 | 000   | 0    | miss
 31 | 00011111 | 0001 | 111   | 1    | miss
  5 | 00000101 | 0000 | 010   | 1    | *hit
 32 | 00100000 | 0010 | 000   | 0    | hit
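A small Python sketch of the same bookkeeping, in case it helps to check other reference strings (direct-mapped cache over word addresses; the two calls at the bottom are the two configurations from the question):

def simulate(refs, num_blocks, words_per_block, addr_bits=8):
    # Direct-mapped cache: address -> block number -> (tag, index).
    offset_bits = (words_per_block - 1).bit_length()
    index_bits = (num_blocks - 1).bit_length()
    tag_bits = addr_bits - index_bits - offset_bits
    cache = {}                                    # index -> tag currently stored there
    for addr in refs:
        block = addr >> offset_bits               # which memory block the word lives in
        index, tag = block % num_blocks, block // num_blocks
        hit = cache.get(index) == tag
        cache[index] = tag
        print(f"{addr:3} {addr:0{addr_bits}b} tag={tag:0{tag_bits}b} "
              f"index={index:0{index_bits}b} {'hit' if hit else 'miss'}")

simulate([4, 4, 32, 31, 5, 32], num_blocks=16, words_per_block=1)
print()
simulate([4, 4, 32, 31, 5, 32], num_blocks=8, words_per_block=2)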
I compressed the text "TestingTesting" and the hex result was: 0B 49 2D 2E C9 CC 4B 0F 81 50 00. I can't figure out the length and distance codes. The binary below is reversed because the RFC says to read the bits from right to left (thanks Matthew Slattery for the help). Here is what was parsed so far:
1 BFINAL (last block)
01 BTYPE (static)
1000 0100 132-48= 84 T
1001 0101 149-48= 101 e
1010 0011 163-48= 115 s
1010 0100 164-48= 116 t
1001 1001 153-48= 105 i
1001 1110 158-48= 110 n
1001 0111 151-48= 103 g
These are the remaining bits that I don't know how to parse:
1000 0100 0000 1000 0101 0000 0000 0
The final 10 bits (end of block value is x100) are the only part I can parse. I think the length and distance values should be 7 (binary 0111), since the length of "Testing" is 7 letters and it gets copied from 7 characters back, but I can't figure out how this is represented in the remaining bits. What am I doing wrong?
The distance code is 5, but a distance code of 5 is followed by one "extra bit" to indicate an actual distance of either 7 or 8. (See the second table in paragraph 3.2.5 of the RFC.)
The complete decoding of the data is:
1 BFINAL
01 BTYPE=static
10000100 'T'
10010101 'e'
10100011 's'
10100100 't'
10011001 'i'
10011110 'n'
10010111 'g'
10000100 another 'T'
0000100 literal/length code 260 = length 6
00101 distance code 5
0 extra bit => the distance is 7
0000000 literal/length code 256 = end of block
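As a sanity check, Python's zlib can decode the original hex dump as a raw DEFLATE stream (the negative wbits tells zlib there is no zlib header; if you re-compress, the exact bytes may differ with level and strategy):

import zlib

raw = bytes.fromhex("0B 49 2D 2E C9 CC 4B 0F 81 50 00")
print(zlib.decompress(raw, wbits=-15))           # b'TestingTesting'

comp = zlib.compressobj(9, zlib.DEFLATED, -15)   # raw deflate, no zlib header
print((comp.compress(b"TestingTesting") + comp.flush()).hex(" "))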